Converting strings to bytes and vice versa

CryptoSys PKI Pro Manual

Converting strings to bytes and vice versa

Use these functions to convert a string of text to an unambiguous array of bytes and vice versa.

In VB6/VBA, use the StrConv function.

Dim abData() As Byte
Dim Str As String
Dim i As Long
Str = "Hello world!"
' Convert string to bytes
abData = StrConv(Str, vbFromUnicode)
For i = 0 To UBound(abData)
    Debug.Print Hex(abData(i)); "='" & Chr(abData(i)) & "'"
Next
' Convert bytes to string
Str = StrConv(abData, vbUnicode)
Debug.Print "'" & Str & "'"

48='H'
65='e'
6C='l'
6C='l'
6F='o'
20=' '
77='w'
6F='o'
72='r'
6C='l'
64='d'
21='!'
'Hello world!'

VB6 stores its strings internally in "Unicode" format, two bytes per character, but the StrConv function will convert to an array of bytes encoded in "ANSI" format using your default code page.

In VB.NET use System.Text.Encoding.

Dim abData() As Byte
Dim Str As String
Dim i As Long
Str = "Hello world!"
' Convert string to bytes
abData = System.Text.Encoding.Default.GetBytes(Str)
For i = 0 To UBound(abData)
    Console.WriteLine(Hex(abData(i)) & "='" & Chr(abData(i)) & "'")
Next
' Convert bytes to string
Str = System.Text.Encoding.Default.GetString(abData)
Console.WriteLine("'" & Str & "'")

In .NET strings are stored internally in "Unicode" format (UTF-16) and the GetBytes method can extract an array of bytes in any encoding you want.

The .Default encoding uses the default code page on your system which is usually 1252 (Western European) but may be different on your setup. If you want ISO-8859-1 (Latin-1) you can replace .Default with .GetEncoding(28591) (code page 28591 is ISO-8859-1 which is identical to Windows-1252 except for characters in the range 0x80 to 0x9F). Alternatively use System.Text.Encoding.GetEncoding("iso-8859-1").GetBytes(Str). If you want UTF-8-encoded bytes, use System.Text.Encoding.UTF8.GetBytes(Str).

In C#, use System.Text.Encoding, which has identical behaviour to the function in VB.NET.

byte[] abData;
string Str;
int i;
Str = "Hello world!";
// Convert string to bytes
abData = System.Text.Encoding.Default.GetBytes(Str);
for (i = 0; i < abData.Length; i++)
{
	Console.WriteLine("{0:X}", abData[i]);
}
// Convert bytes to string
Str = System.Text.Encoding.Default.GetString(abData);
Console.WriteLine("'{0}'", Str);

In C and C++, the distinction between a string and an array of bytes is often blurred. A string is a zero-terminated sequence of char types and bytes are stored in the unsigned char type. A string needs an extra character for the null terminating character; a byte array does not, but it needs its length to be stored in a separate variable A byte array can can contain a zero (NUL) value but a string cannot.

#include <stdio.h>
#include <string.h>
#include <stdlib.h>

static void pr_hexbytes(const unsigned char *bytes, int nbytes)
/* Print bytes in hex format + newline */
{
   int i;
   for (i = 0; i < nbytes; i++)
   	printf("%02X ", bytes[i]);
   printf("\n");
}

int main()
{
   char szStr[] = "Hello world!";
   unsigned char *lpData;
   long nbytes;
   char *lpszCopy;

   /* Convert string to bytes */
   /* (a) simply re-cast */
   lpData = (unsigned char*)szStr;
   nbytes = strlen(szStr);
   pr_hexbytes(lpData, nbytes);

   /* (b) make a copy */
   lpData = malloc(nbytes);
   memcpy(lpData, (unsigned char*)szStr, nbytes);
   pr_hexbytes(lpData, nbytes);

   /* Convert bytes to a zero-terminated string */
   lpszCopy = malloc(nbytes + 1);
   memcpy(lpszCopy, lpData, nbytes);
   lpszCopy[nbytes] = '\0';
   printf("'%s'\n", lpszCopy);

   free(lpData);
   free(lpszCopy);

   return 0;
}

48 65 6C 6C 6F 20 77 6F 72 6C 64 21
48 65 6C 6C 6F 20 77 6F 72 6C 64 21
'Hello world!'

The types char and unsigned char might be identical on your system, or they might not be. We strongly recommend that you explictly distinguish between strings and byte arrays in your code by using the correct type and consistently treating them differently.

If your string is a Unicode string, then it consists of a sequence of wchar_t types. Converting wide-character strings to a sequence of bytes in C is more problematic. You can either convert the Unicode string directly to a string of bytes (in which case every second byte will be zero for US-ASCII characters), or use the stdlib wcstombs function or the Windows WideCharToMultiByte function to convert to a sequence of multi-byte characters (some will be one byte long, some two) and then convert the multi-byte string to bytes (you can do this with a simple cast). Each party encrypting and decrypting must agree on which way to do it.

[Contents] [Index]

[PREV: "Hello world" programs...] [Contents] [Index]

[NEXT: Converting VB6 to VB.NET...]