CryptoSys PKI Toolkit Manual

CNV_CheckUTF8

CNV_CheckUTF8 checks whether a string contains only valid UTF-8 characters.

VB6/VBA Syntax

Public Declare Function CNV_CheckUTF8 Lib "diCrPKI.dll" (ByVal strInput As String) As Long

nRet = CNV_CheckUTF8(strInput)

Parameters

strInput
[in] String to be checked.

C/C++ Syntax

long _stdcall CNV_CheckUTF8(const char *szInput);

Returns (VB6/C)

Long: Returns zero if the string is invalid UTF-8, or a positive number if the string is valid UTF-8, where the value of the number indicates the nature of the encoded characters:

Constant             Value  Result
PKI_CHRS_NOT_UTF8    0      Not valid UTF-8
PKI_CHRS_ALL_ASCII   1      Valid UTF-8, all chars are 7-bit ASCII
PKI_CHRS_ANSI8       2      Valid UTF-8, contains at least one 8-bit ANSI character
PKI_CHRS_MULTIBYTE   3      Valid UTF-8, contains at least one multi-byte character that cannot be represented in a single-byte character set
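
To illustrate how the four categories relate to the bytes of the input, here is a minimal standalone sketch in C. It is not the toolkit's implementation: the function name sketch_check_utf8 and the SKETCH_ constants are hypothetical, and it assumes that "8-bit ANSI character" means a code point in the range U+0080 to U+00FF and that an empty string counts as all-ASCII. The sketch rejects overlong forms and surrogate code points.

```c
/* Local copies of the return values, so the sketch compiles on its own. */
enum {
    SKETCH_NOT_UTF8  = 0,
    SKETCH_ALL_ASCII = 1,
    SKETCH_ANSI8     = 2,
    SKETCH_MULTIBYTE = 3
};

/* Hypothetical stand-in for illustration only, NOT CNV_CheckUTF8 itself.
   Classifies a NUL-terminated byte string by its largest code point. */
static long sketch_check_utf8(const char *s)
{
    const unsigned char *p = (const unsigned char *)s;
    long result = SKETCH_ALL_ASCII;

    while (*p) {
        unsigned long cp;   /* decoded code point */
        unsigned long min;  /* smallest code point legal for this length */
        int extra;          /* continuation bytes still expected */

        if (*p < 0x80)                { cp = *p;        extra = 0; min = 0; }
        else if ((*p & 0xE0) == 0xC0) { cp = *p & 0x1F; extra = 1; min = 0x80; }
        else if ((*p & 0xF0) == 0xE0) { cp = *p & 0x0F; extra = 2; min = 0x800; }
        else if ((*p & 0xF8) == 0xF0) { cp = *p & 0x07; extra = 3; min = 0x10000; }
        else return SKETCH_NOT_UTF8;  /* stray continuation or bad lead byte */
        p++;

        while (extra-- > 0) {
            /* A terminating NUL also fails this test, so a truncated
               sequence is rejected without reading past the end. */
            if ((*p & 0xC0) != 0x80) return SKETCH_NOT_UTF8;
            cp = (cp << 6) | (unsigned long)(*p & 0x3F);
            p++;
        }

        if (cp < min) return SKETCH_NOT_UTF8;                     /* overlong */
        if (cp >= 0xD800 && cp <= 0xDFFF) return SKETCH_NOT_UTF8; /* surrogate */
        if (cp > 0x10FFFF) return SKETCH_NOT_UTF8;                /* out of range */

        if (cp > 0xFF)
            result = SKETCH_MULTIBYTE;
        else if (cp > 0x7F && result == SKETCH_ALL_ASCII)
            result = SKETCH_ANSI8;
    }
    return result;
}
```

Under these assumptions, the UTF-8 bytes for "México" (with the accented letter encoded as 0xC3 0xA9) classify as 2, while the same word in Latin-1 (a single 0xE9 byte) classifies as 0 because 0xE9 is a lead byte that is not followed by continuation bytes.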

.NET Equivalent

Cnv.CheckUTF8 Method

Remarks

'Overlong' UTF-8 sequences and illegal surrogates are rejected as invalid. Strings that return PKI_CHRS_ANSI8 (2) can be converted to Latin-1 format using the CNV_Latin1FromUTF8 function. Strings that return PKI_CHRS_MULTIBYTE (3) cannot be converted to Latin-1. Strings that return PKI_CHRS_ALL_ASCII (1) need no conversion because they consist only of 7-bit ASCII characters.
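
The decision described above can be sketched as a small C helper that maps a CNV_CheckUTF8 return value to the next step for a caller who wants Latin-1 output. The helper name latin1_action, the ACTION_ values and the CHK_ constants are hypothetical and defined locally for illustration; only the numeric values 0 to 3 come from the table of return values.

```c
/* Local copies of the documented return values (0..3). */
enum { CHK_NOT_UTF8 = 0, CHK_ALL_ASCII = 1, CHK_ANSI8 = 2, CHK_MULTIBYTE = 3 };

/* Hypothetical action codes, not part of the toolkit. */
enum latin1_action {
    ACTION_REJECT,        /* input is not valid UTF-8 */
    ACTION_USE_AS_IS,     /* pure 7-bit ASCII, no conversion needed */
    ACTION_CONVERT,       /* convert with CNV_Latin1FromUTF8 */
    ACTION_NOT_LATIN1     /* valid UTF-8 but not representable in Latin-1 */
};

/* Map a CNV_CheckUTF8 return value to the step a caller would take. */
static enum latin1_action latin1_action(long check_result)
{
    switch (check_result) {
    case CHK_ALL_ASCII: return ACTION_USE_AS_IS;
    case CHK_ANSI8:     return ACTION_CONVERT;
    case CHK_MULTIBYTE: return ACTION_NOT_LATIN1;
    case CHK_NOT_UTF8:  /* fall through */
    default:            return ACTION_REJECT;  /* treat unknown values as invalid */
    }
}
```

A caller would pass the value returned by CNV_CheckUTF8 straight into latin1_action and call CNV_Latin1FromUTF8 only when the result is ACTION_CONVERT.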

Example

Dim strData As String
Dim strDataUTF8 As String
Dim nRet As Long
Dim nLen As Long

' Our original string data is in "Latin-1" encoding
strData = "Asociación Mexicana de Estándares para el Comercio Electrónico A.C.|México|"
Debug.Print "Latin-1 string:"
Debug.Print strData

' Check if this is valid UTF-8 (it's not)
nRet = CNV_CheckUTF8(strData)
Debug.Print "CNV_CheckUTF8 returns " & nRet

' So convert to UTF-8
nLen = CNV_UTF8FromLatin1("", 0, strData)
If nLen < 0 Then
    Debug.Print "Failed to convert to UTF-8: " & nLen
    Exit Function
End If
strDataUTF8 = String(nLen, " ")
nLen = CNV_UTF8FromLatin1(strDataUTF8, nLen, strData)
' Which may not display correctly in VB6...!
Debug.Print "UTF-8 string:"
Debug.Print strDataUTF8

' And check again (expected result = 2
' => Valid UTF-8, contains at least one 8-bit ANSI character)
nRet = CNV_CheckUTF8(strDataUTF8)
Debug.Print "CNV_CheckUTF8 returns " & nRet

This should give a result like:

Latin-1 string:
Asociación Mexicana de Estándares para el Comercio Electrónico A.C.|México|
CNV_CheckUTF8 returns 0
UTF-8 string:
Asociación Mexicana de Estándares para el Comercio Electrónico A.C.|México|
CNV_CheckUTF8 returns 2

To view UTF-8 data properly, you need to use a UTF-8-compatible text editor.

See Also

CNV_Latin1FromUTF8
CNV_UTF8FromLatin1

Copyright © 2004-9 D.I. Management Services Pty Ltd. All rights reserved.