CNV_CheckUTF8 checks whether a string contains only valid UTF-8 characters.
Public Declare Function CNV_CheckUTF8 Lib "diCrPKI.dll"
(ByVal strInput As String) As Long
nRet = CNV_CheckUTF8(strInput)
szInput: String to be checked.
long __stdcall CNV_CheckUTF8(const char *szInput);
Long: Returns zero if the string is invalid UTF-8,
or a positive number if the string is valid UTF-8,
where the value of the number indicates the nature of the encoded characters:
| Constant | Value | Meaning |
|---|---|---|
| PKI_CHRS_NOT_UTF8 | 0 | Not valid UTF-8 |
| PKI_CHRS_ALL_ASCII | 1 | Valid UTF-8, all chars are 7-bit ASCII |
| PKI_CHRS_ANSI8 | 2 | Valid UTF-8, contains at least one 8-bit ANSI character |
| PKI_CHRS_MULTIBYTE | 3 | Valid UTF-8, contains at least one multi-byte character that cannot be represented in a single-byte character set. |
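As an illustration of how a caller might act on these return values, here is a minimal C sketch. The enum simply copies the values from the table; a real program would presumably take the constants from the library's own header. The helper names `describe_check_result` and `can_convert_to_latin1` are hypothetical and are not part of the library.

```c
/* Values copied from the table above (assumption: a real program would
   use the constants supplied with the library instead of redefining them). */
enum {
    PKI_CHRS_NOT_UTF8  = 0,
    PKI_CHRS_ALL_ASCII = 1,
    PKI_CHRS_ANSI8     = 2,
    PKI_CHRS_MULTIBYTE = 3
};

/* Hypothetical helper: maps a CNV_CheckUTF8 return value to a short description. */
const char *describe_check_result(long code)
{
    switch (code) {
    case PKI_CHRS_NOT_UTF8:  return "not valid UTF-8";
    case PKI_CHRS_ALL_ASCII: return "valid UTF-8, 7-bit ASCII only";
    case PKI_CHRS_ANSI8:     return "valid UTF-8, fits in an 8-bit character set";
    case PKI_CHRS_MULTIBYTE: return "valid UTF-8, needs multi-byte characters";
    default:                 return "unexpected return value";
    }
}

/* Hypothetical helper: nonzero if every character in the checked string
   can be represented in a single-byte (Latin-1) character set. */
int can_convert_to_latin1(long code)
{
    return code == PKI_CHRS_ALL_ASCII || code == PKI_CHRS_ANSI8;
}
```

A caller would pass the value returned by CNV_CheckUTF8 straight into these helpers, for example to decide whether a Latin-1 conversion is worth attempting.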
"Overlong" UTF-8 sequences and illegal surrogates are rejected as invalid. Strings that return
PKI_CHRS_ANSI8 (2) can be converted to Latin-1 format using the
CNV_Latin1FromUTF8 function. Strings that return
PKI_CHRS_MULTIBYTE (3) cannot be converted to Latin-1, and strings that return
PKI_CHRS_ALL_ASCII (1) need no conversion because they consist only of 7-bit ASCII characters.
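To make the rules above concrete, the following C sketch classifies a NUL-terminated byte string into the same four categories. It is not the library's implementation; the function name `classify_utf8` is hypothetical, and the sketch assumes that "8-bit ANSI character" means a code point in the range U+0080 to U+00FF, which is what conversion to Latin-1 requires.

```c
#include <assert.h>
#include <stddef.h>

/* Sketch only (not the library's code): returns 0 for invalid UTF-8,
   1 for all 7-bit ASCII, 2 if the highest code point is <= U+00FF,
   3 if any code point is above U+00FF. */
long classify_utf8(const char *s)
{
    const unsigned char *p = (const unsigned char *)s;
    long result = 1;                    /* all ASCII until shown otherwise */

    assert(s != NULL);
    while (*p) {
        unsigned long cp;               /* decoded code point */
        int extra, i;                   /* number of continuation bytes expected */

        if (*p < 0x80) { p++; continue; }                           /* 7-bit ASCII */
        else if (*p >= 0xC2 && *p <= 0xDF) { cp = *p & 0x1F; extra = 1; }
        else if (*p >= 0xE0 && *p <= 0xEF) { cp = *p & 0x0F; extra = 2; }
        else if (*p >= 0xF0 && *p <= 0xF4) { cp = *p & 0x07; extra = 3; }
        else return 0;  /* stray continuation byte, overlong lead 0xC0/0xC1, or lead > 0xF4 */

        p++;
        for (i = 0; i < extra; i++, p++) {
            if ((*p & 0xC0) != 0x80) return 0;  /* not a continuation byte (or early NUL) */
            cp = (cp << 6) | (unsigned long)(*p & 0x3F);
        }

        if (extra == 2 && cp < 0x800) return 0;       /* overlong 3-byte form */
        if (extra == 3 && cp < 0x10000) return 0;     /* overlong 4-byte form */
        if (cp >= 0xD800 && cp <= 0xDFFF) return 0;   /* UTF-16 surrogate */
        if (cp > 0x10FFFF) return 0;                  /* beyond the Unicode range */

        if (cp > 0xFF) result = 3;                    /* needs more than 8 bits */
        else if (result < 2) result = 2;              /* fits in Latin-1 */
    }
    return result;
}
```

Under these assumptions, the UTF-8 bytes for "México" (`4D C3 A9 78 69 63 6F`) classify as 2, while the same word in Latin-1 (`4D E9 78 69 63 6F`) is rejected because `E9` announces a three-byte sequence that never arrives.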
Dim strData As String
Dim strDataUTF8 As String
Dim nRet As Long
Dim nLen As Long

' Our original string data is in "Latin-1" encoding
strData = "Asociación Mexicana de Estándares para el Comercio Electrónico A.C.|México|"
Debug.Print "Latin-1 string:"
Debug.Print strData

' Check if this is valid UTF-8 (it's not)
nRet = CNV_CheckUTF8(strData)
Debug.Print "CNV_CheckUTF8 returns " & nRet

' So convert to UTF-8
nLen = CNV_UTF8FromLatin1("", 0, strData)
If nLen < 0 Then
    Debug.Print "Failed to convert to UTF-8: " & nLen
    Exit Function
End If
strDataUTF8 = String(nLen, " ")
nLen = CNV_UTF8FromLatin1(strDataUTF8, nLen, strData)

' Which may not display correctly in VB6...!
Debug.Print "UTF-8 string:"
Debug.Print strDataUTF8

' And check again (expected result = 2
' => Valid UTF-8, contains at least one 8-bit ANSI character)
nRet = CNV_CheckUTF8(strDataUTF8)
Debug.Print "CNV_CheckUTF8 returns " & nRet
This should give a result like:
Latin-1 string:
Asociación Mexicana de Estándares para el Comercio Electrónico A.C.|México|
CNV_CheckUTF8 returns 0
UTF-8 string:
Asociación Mexicana de Estándares para el Comercio Electrónico A.C.|México|
CNV_CheckUTF8 returns 2
To view UTF-8 data properly, you need to use a UTF-8-compatible text editor.
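The change in the check result from 0 to 2 in the example follows from how Latin-1 bytes map to UTF-8: each byte of 0x80 or above becomes a two-byte sequence. The C sketch below illustrates that mapping, using the same "call once to get the length, then again to fill the buffer" pattern as the example. The function `latin1_to_utf8` is a hypothetical stand-in written for this illustration; it is not CNV_UTF8FromLatin1 and makes no claim about how the library works internally.

```c
#include <stddef.h>

/* Hypothetical stand-in for illustration only.
   Returns the number of bytes the UTF-8 form of `in` requires (excluding
   any terminating NUL). If `out` is non-NULL, writes at most `outlen`
   bytes and does not add a terminating NUL. */
size_t latin1_to_utf8(char *out, size_t outlen, const char *in)
{
    const unsigned char *p = (const unsigned char *)in;
    size_t n = 0;

    for (; *p; p++) {
        if (*p < 0x80) {
            /* 7-bit ASCII is identical in Latin-1 and UTF-8 */
            if (out && n < outlen) out[n] = (char)*p;
            n += 1;
        } else {
            /* U+0080..U+00FF: lead byte 110000xx, continuation byte 10xxxxxx */
            if (out && n + 1 < outlen) {
                out[n]     = (char)(0xC0 | (*p >> 6));
                out[n + 1] = (char)(0x80 | (*p & 0x3F));
            }
            n += 2;
        }
    }
    return n;
}
```

For instance, the six Latin-1 bytes of "México" (`4D E9 78 69 63 6F`) need seven bytes in UTF-8 (`4D C3 A9 78 69 63 6F`), which is why a viewer that assumes a single-byte character set shows two characters in place of each accented letter.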
See also: CNV_Latin1FromUTF8, CNV_UTF8FromLatin1