Use these functions to convert a string of text to an unambiguous array of bytes and vice versa.
In VB6/VBA, use the StrConv function.
    Dim abData() As Byte
    Dim Str As String
    Dim i As Long

    Str = "Hello world!"
    ' Convert string to bytes
    abData = StrConv(Str, vbFromUnicode)
    For i = 0 To UBound(abData)
        Debug.Print Hex(abData(i)); "='" & Chr(abData(i)) & "'"
    Next
    ' Convert bytes to string
    Str = StrConv(abData, vbUnicode)
    Debug.Print "'" & Str & "'"
    48='H'
    65='e'
    6C='l'
    6C='l'
    6F='o'
    20=' '
    77='w'
    6F='o'
    72='r'
    6C='l'
    64='d'
    21='!'
    'Hello world!'
VB6 stores its strings internally in "Unicode" format, two bytes per character, but the StrConv function will convert to an array of bytes encoded in "ANSI" format using your default code page.
In VB.NET, use System.Text.Encoding.
    Dim abData() As Byte
    Dim Str As String
    Dim i As Integer   ' array indexes are Integer in VB.NET

    Str = "Hello world!"
    ' Convert string to bytes
    abData = System.Text.Encoding.Default.GetBytes(Str)
    For i = 0 To UBound(abData)
        Console.WriteLine(Hex(abData(i)) & "='" & Chr(abData(i)) & "'")
    Next
    ' Convert bytes to string
    Str = System.Text.Encoding.Default.GetString(abData)
    Console.WriteLine("'" & Str & "'")
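In .NET, strings are stored internally in "Unicode" format (UTF-16) and the GetBytes method can extract an array of bytes in any encoding you want.
On the .NET Framework, the .Default encoding uses the default code page on your system, which is usually 1252 (Western European) but may be different on your setup. On .NET Core and .NET 5 or later, .Default is always UTF-8, so name the encoding explicitly if you need a particular code page.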
If you want ISO-8859-1 (Latin-1), you can replace .Default with .GetEncoding(28591) (code page 28591 is ISO-8859-1, which is identical to Windows-1252 except for characters in the range 0x80 to 0x9F).
Alternatively, use System.Text.Encoding.GetEncoding("iso-8859-1").GetBytes(Str).
If you want UTF-8-encoded bytes, use System.Text.Encoding.UTF8.GetBytes(Str).
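To make the difference between these encodings concrete, the following sketch encodes a single character both ways. It is written in C so that it does not depend on any .NET API. The function names encode_latin1 and encode_utf8 are made up for this illustration, and the sketch assumes code points no higher than U+00FF.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: encode a code point (0x00..0xFF only) as ISO-8859-1.
   Always writes exactly one byte and returns 1. */
size_t encode_latin1(unsigned int cp, unsigned char *out)
{
    assert(out != NULL && cp <= 0xFF);
    out[0] = (unsigned char)cp;
    return 1;
}

/* Hypothetical helper: encode a code point (0x00..0xFF only) as UTF-8.
   Writes one byte for 0x00..0x7F and two bytes for 0x80..0xFF;
   returns the number of bytes written. */
size_t encode_utf8(unsigned int cp, unsigned char *out)
{
    assert(out != NULL && cp <= 0xFF);
    if (cp < 0x80) {
        out[0] = (unsigned char)cp;
        return 1;
    }
    out[0] = (unsigned char)(0xC0 | (cp >> 6));    /* lead byte: 110xxxxx */
    out[1] = (unsigned char)(0x80 | (cp & 0x3F));  /* continuation: 10xxxxxx */
    return 2;
}
```

For example, the character U+00E9 (e with acute accent) becomes the single byte E9 in ISO-8859-1 but the two bytes C3 A9 in UTF-8, while 'H' is the single byte 48 in both.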
In C#, use the same System.Text.Encoding class, which behaves exactly as it does in VB.NET.
    byte[] abData;
    string Str;
    int i;

    Str = "Hello world!";
    // Convert string to bytes
    abData = System.Text.Encoding.Default.GetBytes(Str);
    for (i = 0; i < abData.Length; i++)
    {
        Console.WriteLine("{0:X}", abData[i]);
    }
    // Convert bytes to string
    Str = System.Text.Encoding.Default.GetString(abData);
    Console.WriteLine("'{0}'", Str);
In C and C++, the distinction between a string and an array of bytes is often blurred.
A string is a zero-terminated sequence of char types, and bytes are stored in the unsigned char type.
A string needs an extra character for the null terminating character; a byte array does not, but it needs its length to be stored in a separate variable.
A byte array can contain a zero (NUL) value but a string cannot.
    #include <stdio.h>
    #include <string.h>
    #include <stdlib.h>

    /* Print bytes in hex format + newline */
    static void pr_hexbytes(const unsigned char *bytes, size_t nbytes)
    {
        size_t i;
        for (i = 0; i < nbytes; i++)
            printf("%02X ", bytes[i]);
        printf("\n");
    }

    int main(void)
    {
        char szStr[] = "Hello world!";
        unsigned char *lpData;
        size_t nbytes;
        char *lpszCopy;

        /* Convert string to bytes */
        /* (a) simply re-cast */
        lpData = (unsigned char*)szStr;
        nbytes = strlen(szStr);
        pr_hexbytes(lpData, nbytes);

        /* (b) make a copy (the terminating zero is not copied) */
        lpData = (unsigned char*)malloc(nbytes);
        if (lpData == NULL) return 1;
        memcpy(lpData, szStr, nbytes);
        pr_hexbytes(lpData, nbytes);

        /* Convert bytes to a zero-terminated string */
        lpszCopy = (char*)malloc(nbytes + 1);
        if (lpszCopy == NULL) { free(lpData); return 1; }
        memcpy(lpszCopy, lpData, nbytes);
        lpszCopy[nbytes] = '\0';
        printf("'%s'\n", lpszCopy);

        free(lpData);
        free(lpszCopy);
        return 0;
    }
    48 65 6C 6C 6F 20 77 6F 72 6C 64 21
    48 65 6C 6C 6F 20 77 6F 72 6C 64 21
    'Hello world!'
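The point that a byte array can contain a zero value while a string cannot is illustrated by the short sketch below. The helper name length_as_string is invented for this example; it reports how many bytes the string functions would see if the array were wrongly treated as a zero-terminated string.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: the length that strlen-style functions would report
   for this byte array. Stops at the first zero byte, or returns nbytes
   if the array contains no zero byte at all. */
size_t length_as_string(const unsigned char *bytes, size_t nbytes)
{
    const unsigned char *p;

    assert(bytes != NULL);
    p = (const unsigned char *)memchr(bytes, 0, nbytes);
    return p ? (size_t)(p - bytes) : nbytes;
}
```

For the four bytes 48 69 00 21, this returns 2 even though the array holds 4 bytes, which is why the length of a byte array must be carried in a separate variable.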
Plain char may be signed or unsigned depending on your compiler and platform, so it might behave identically to unsigned char on your system, or it might not.
We strongly recommend that you explicitly distinguish between strings and byte arrays in your code by using the correct type and consistently treating them differently.
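One practical consequence: where plain char is signed, a byte value of 0x80 or above held in a char is negative, and it is sign-extended when passed to printf, so "%02X" will typically print something like FFFFFFE9 instead of E9. The following is a minimal sketch that avoids this by converting to unsigned char first. The helper name byte_to_hex is hypothetical.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: format one char as two upper-case hex digits.
   Converting to unsigned char first yields a value in 0..255 whether or
   not plain char is signed on this platform.
   out must have room for 3 chars (two digits plus the terminating zero). */
void byte_to_hex(char c, char *out)
{
    assert(out != NULL);
    snprintf(out, 3, "%02X", (unsigned int)(unsigned char)c);
}
```

Called with the character 'H' on an ASCII system, this writes "48"; called with a char holding the byte value 0xE9, it writes "E9" regardless of the signedness of char.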
If your string is a Unicode string, then it consists of a sequence of wchar_t types.
Converting wide-character strings to a sequence of bytes in C is more problematic.
You can either convert the Unicode string directly to a string of bytes (in which case, with a 16-bit wchar_t as on Windows, every second byte will be zero for US-ASCII characters), or use the stdlib wcstombs function or the Windows WideCharToMultiByte function to convert to a sequence of multi-byte characters (some will be one byte long, some two or more, depending on the encoding) and then convert the multi-byte string to bytes (you can do this with a simple cast).
Each party encrypting and decrypting must agree on which way to do it.
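The wcstombs route can be sketched as follows. This is an illustration under stated assumptions, not a definitive implementation: the helper name wide_to_bytes is hypothetical, and the bytes produced depend on the multi-byte encoding of the current locale (as selected with setlocale), so both parties would need to use the same one.

```c
#include <assert.h>
#include <stdlib.h>
#include <wchar.h>

/* Hypothetical helper: convert a wide string to a newly allocated array of
   bytes using wcstombs and the current locale's multi-byte encoding.
   On success, stores the byte count (terminator excluded) in *nbytes and
   returns the array, which the caller must free. Returns NULL on failure. */
unsigned char *wide_to_bytes(const wchar_t *wstr, size_t *nbytes)
{
    size_t maxlen, n;
    char *mbs;

    assert(wstr != NULL && nbytes != NULL);

    /* Worst case: MB_CUR_MAX bytes per wide character, plus terminator */
    maxlen = wcslen(wstr) * MB_CUR_MAX + 1;
    mbs = (char *)malloc(maxlen);
    if (mbs == NULL)
        return NULL;

    n = wcstombs(mbs, wstr, maxlen);
    if (n == (size_t)-1) {      /* a character has no multi-byte form */
        free(mbs);
        return NULL;
    }

    *nbytes = n;
    return (unsigned char *)mbs;    /* the "simple cast" to bytes */
}
```

With the pure US-ASCII input L"Hello world!" this yields the 12 bytes 48 65 6C 6C 6F 20 77 6F 72 6C 64 21. In the default "C" locale, characters outside US-ASCII may fail to convert, in which case the function returns NULL.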