Converts a string of 8-bit Latin-1 characters into a UTF-8 encoded array of bytes.
Public Declare Function CNV_UTF8BytesFromLatin1 Lib "diCrPKI.dll" (ByRef abOutput As Byte, ByVal nOutBytes As Long, ByVal strInput As String) As Long
nLen = CNV_UTF8BytesFromLatin1(abOutput(0), nOutBytes, strInput)
abOutput: Byte array suitably dimensioned to receive output.
nOutBytes: Long specifying the maximum number of bytes to be received.
strInput: String of Latin-1 characters to be converted.
long _stdcall CNV_UTF8BytesFromLatin1(unsigned char *lpOutput, long nOutBytes, const char *szInput);
Long: If successful, the return value is a positive number indicating the
number of bytes in the output array, or number of bytes required if nOutBytes is set to zero;
otherwise it returns a negative error code.
Use System.Text.Encoding.UTF8.GetBytes(Str).
The function writes at most nOutBytes bytes to the output array. If nOutBytes is zero, it returns the required number of bytes.
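The sizing rule follows from the encoding itself: each Latin-1 character below 0x80 occupies one UTF-8 byte and every other character occupies two. The C function below is a minimal sketch of that contract, not the code inside diCrPKI.dll. The name sketch_utf8_from_latin1, the -1 error value, and the choice to stop before a character that does not fit in the buffer are assumptions made for illustration only.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in that mimics the documented calling pattern.
 * It is NOT the library's implementation.
 * - nout == 0 : returns the number of UTF-8 bytes required.
 * - nout  > 0 : writes at most nout bytes and returns the count written.
 * Assumption: a character that does not fit completely is not written. */
long sketch_utf8_from_latin1(unsigned char *out, long nout, const char *in)
{
    const unsigned char *p = (const unsigned char *)in;
    long need = 0;
    long written = 0;

    assert(in != NULL);

    /* Count: 1 byte for 0x00-0x7F, 2 bytes for 0x80-0xFF. */
    for (const unsigned char *q = p; *q != 0; q++)
        need += (*q < 0x80) ? 1 : 2;

    if (nout == 0)
        return need;            /* size query only */
    if (out == NULL || nout < 0)
        return -1;              /* hypothetical error code */

    for (; *p != 0; p++) {
        if (*p < 0x80) {
            if (written + 1 > nout)
                break;
            out[written++] = *p;
        } else {
            if (written + 2 > nout)
                break;
            /* U+0080..U+00FF encodes as 110000xx 10xxxxxx. */
            out[written++] = (unsigned char)(0xC0 | (*p >> 6));
            out[written++] = (unsigned char)(0x80 | (*p & 0x3F));
        }
    }
    return written;
}
```

A caller would first pass 0 for nout to obtain the required length, allocate a buffer of that size, and then call again with the buffer, which is the same two-step pattern used with the real function.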
Dim strData As String
Dim abDataUTF8() As Byte
Dim nRet As Long
Dim nBytes As Long
Dim nChars As Long
Dim strNew As String

' Our original string data contains 5 non-ASCII characters
strData = "abcóéíáñ"
Debug.Print "Latin-1 string='" & strData & "'"
Debug.Print " (" & Len(strData) & " characters)"

' Convert directly to array of bytes in UTF-8 encoding
' Find required length first
nBytes = CNV_UTF8BytesFromLatin1(vbNull, 0, strData)
If nBytes <= 0 Then
    Debug.Print "Failed to convert to UTF-8: " & nBytes
    Exit Sub
End If
' Pre-dimension
ReDim abDataUTF8(nBytes - 1)
nBytes = CNV_UTF8BytesFromLatin1(abDataUTF8(0), nBytes, strData)

' Display in hex
Debug.Print "UTF-8=(0x)" & cnvHexStrFromBytes(abDataUTF8)
Debug.Print " (" & nBytes & " bytes)"

' Check if this is valid UTF-8 encoding
nRet = CNV_CheckUTF8Bytes(abDataUTF8(0), nBytes)
Debug.Print "CNV_CheckUTF8Bytes returns " & nRet & " (expected 2)"

' Now put back into a string
nChars = CNV_Latin1FromUTF8Bytes("", 0, abDataUTF8(0), nBytes)
If nChars <= 0 Then
    Debug.Print "Failed to convert to string: " & nChars
    Exit Sub
End If
strNew = String(nChars, " ")
nChars = CNV_Latin1FromUTF8Bytes(strNew, nChars, abDataUTF8(0), nBytes)
Debug.Print "New string='" & strNew & "' (" & nChars & " characters)"
This should result in the output:
Latin-1 string='abcóéíáñ'
 (8 characters)
UTF-8=(0x)616263C3B3C3A9C3ADC3A1C3B1
 (13 bytes)
CNV_CheckUTF8Bytes returns 2 (expected 2)
New string='abcóéíáñ' (8 characters)
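The 13 bytes in the hex output are three single-byte ASCII characters (61 62 63) followed by five two-byte sequences, each starting with C3. To illustrate why the round trip recovers 8 characters, here is a minimal C sketch of the reverse mapping. It is a hypothetical helper written for this explanation, not the implementation of CNV_Latin1FromUTF8Bytes. Its name, its parameter order, and its -1 return value for input outside the Latin-1 range are all assumptions.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper, not part of diCrPKI.dll.
 * Decodes UTF-8 bytes that represent only U+0000..U+00FF into
 * single-byte Latin-1 characters. Pass out == NULL to count only.
 * Returns the number of characters, or -1 (an assumed error value)
 * if the input is malformed or contains a character above U+00FF. */
long sketch_latin1_from_utf8(unsigned char *out,
                             const unsigned char *in, long nin)
{
    long n = 0;
    long i = 0;

    assert(in != NULL);

    while (i < nin) {
        unsigned char b = in[i];
        unsigned char c;

        if (b < 0x80) {
            /* One-byte sequence: plain ASCII. */
            c = b;
            i += 1;
        } else if ((b == 0xC2 || b == 0xC3) && i + 1 < nin
                   && (in[i + 1] & 0xC0) == 0x80) {
            /* Two-byte sequence 110000xx 10xxxxxx: U+0080..U+00FF. */
            c = (unsigned char)(((b & 0x03) << 6) | (in[i + 1] & 0x3F));
            i += 2;
        } else {
            return -1;
        }
        if (out != NULL)
            out[n] = c;
        n++;
    }
    return n;
}
```

Applied to the sequence C3 B3, the sketch yields ((0x03 << 6) | 0x33) = 0xF3, the Latin-1 code for ó, and the other four pairs decode to é, í, á and ñ in the same way.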
CNV_Latin1FromUTF8Bytes CNV_CheckUTF8Bytes