CryptoSys PKI Pro Manual

UTF-8 and Latin-1

Deprecated and obsolete UTF-8 functions

The following UTF-8-related functions are deprecated and should be replaced. The .NET method Cnv.CheckUTF8(String) is obsolete and was withdrawn in [v11.0].

Deprecated                          Replaced by
CNV_UTF8FromLatin1                  CNV_UTF8BytesFromLatin1
CNV_Latin1FromUTF8                  CNV_Latin1FromUTF8Bytes
CNV_CheckUTF8                       CNV_CheckUTF8Bytes
Cnv.CheckUTF8(String) (withdrawn)   Cnv.CheckUTF8(Byte[]) method

The change is subtle. Strictly speaking, the concept of "converting" a string of characters from one character encoding scheme to another is meaningless. What we really mean is that we want to change the byte array that represents the string in Latin-1 encoding to a new byte array that represents the same string using UTF-8 encoding.
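To make the distinction concrete, here is a minimal sketch in Python, using only the standard library rather than this Toolkit. It shows one string represented by two different byte arrays; the variable names are illustrative only.

```python
# The same string, two different byte arrays.
text = "mañana"

latin1_bytes = text.encode("latin-1")  # one byte per character
utf8_bytes = text.encode("utf-8")      # 'ñ' (U+00F1) needs two bytes in UTF-8

print(latin1_bytes.hex())  # 6d61f1616e61
print(utf8_bytes.hex())    # 6d61c3b1616e61

# Decoding each array with its own encoding recovers the same string.
assert latin1_bytes.decode("latin-1") == utf8_bytes.decode("utf-8") == text
```

The string itself never changes; only the bytes chosen to represent it do.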

The deprecated string-based functions just "forced" the UTF-8-encoded bytes back into a string type. This forcing trick works in VB6 and C but not in .NET (strictly, it can be done there, but it takes a lot of effort and is pointless). The resulting "UTF-8 strings" will print "funny", but you could pass them to HASH_HexFromString and obtain the correct hash digest.
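The forcing trick can be imitated in Python as a rough analogy (this is not how the Toolkit implements it). We assume a string type that holds one character per byte, which Latin-1 decoding emulates here.

```python
import hashlib

text = "mañana"
utf8_bytes = text.encode("utf-8")

# "Force" the UTF-8 bytes into a string, one character per byte.
forced = utf8_bytes.decode("latin-1")
print(forced)  # maÃ±ana  (prints "funny")

# Reading the forced string back one byte per character restores the
# original UTF-8 bytes, so a hash over it gives the same digest.
recovered = forced.encode("latin-1")
assert recovered == utf8_bytes
assert hashlib.sha1(recovered).hexdigest() == hashlib.sha1(utf8_bytes).hexdigest()
```

The forced string looks wrong on screen because each byte of the two-byte sequence for 'ñ' is displayed as a separate Latin-1 character, yet the underlying bytes, and therefore the digest, are unchanged.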

Aside: We had intended to completely remove these three string-based VB6/VBA functions in v11.0 but then discovered we used them ourselves in an Access VBA application to fix up UTF-8 strings from emails before saving in a database. So, er, they are reprieved.

If you need to "convert" a string to UTF-8 while using this cryptography toolkit, you probably intend to pass the result to a message digest hash or signature function. The underlying hash functions work with byte arrays anyway and so you should really just go directly from the Latin-1 string to a byte array containing bytes of the correct UTF-8 encoding. Then you can check you have the correct bytes and can pass this array directly to the "Bytes" version of the hash function.
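As an illustration of going straight from the string to UTF-8 bytes and then to a "Bytes" hash, here is a sketch using Python's standard hashlib in place of the Toolkit's hash functions. If the bytes are correct, the printed digest should agree with the one shown in the sample code output on this page.

```python
import hashlib

text = "Estándares de Electrónica de México para mañana"

utf8_bytes = text.encode("utf-8")
latin1_bytes = text.encode("latin-1")

# Check we have the bytes we expect: 47 characters, of which
# four (á, ó, é, ñ) need two bytes each in UTF-8.
print(len(latin1_bytes), len(utf8_bytes))  # 47 51

# Hash the UTF-8 byte array directly.
digest_utf8 = hashlib.sha1(utf8_bytes).hexdigest()
print("Digest=" + digest_utf8)

# Hashing the Latin-1 bytes instead gives a different digest.
digest_latin1 = hashlib.sha1(latin1_bytes).hexdigest()
assert digest_utf8 != digest_latin1
```

Checking the byte count before hashing is a cheap way to catch the common mistake of hashing the Latin-1 bytes when the UTF-8 bytes were intended.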

We have also added the new function CNV_ByteEncoding and equivalent method Cnv.ByteEncoding to convert encoding in a byte array between UTF-8 and Latin-1. Again, by working with byte arrays, we are doing it the right way.
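The idea of re-encoding a byte array can be sketched in Python as follows. The helper names latin1_to_utf8 and utf8_to_latin1 are hypothetical and are not part of this Toolkit; they only illustrate the kind of conversion that CNV_ByteEncoding is described as performing.

```python
def latin1_to_utf8(data: bytes) -> bytes:
    """Re-encode Latin-1 bytes as UTF-8 (hypothetical helper)."""
    return data.decode("latin-1").encode("utf-8")


def utf8_to_latin1(data: bytes) -> bytes:
    """Re-encode UTF-8 bytes as Latin-1 (hypothetical helper).

    Raises UnicodeDecodeError if data is not valid UTF-8, and
    UnicodeEncodeError if a character has no Latin-1 equivalent.
    """
    return data.decode("utf-8").encode("latin-1")


latin1 = bytes.fromhex("4de97869636f")  # "México" in Latin-1
utf8 = latin1_to_utf8(latin1)
print(utf8.hex())  # 4dc3a97869636f

# The round trip restores the original bytes.
assert utf8_to_latin1(utf8) == latin1

# Latin-1 bytes with accented characters are generally not valid UTF-8.
try:
    utf8_to_latin1(latin1)
except UnicodeDecodeError:
    print("input is not valid UTF-8")
```

Note that the conversion from UTF-8 to Latin-1 can fail, either because the input is not valid UTF-8 or because it contains characters outside the Latin-1 range, whereas any Latin-1 byte array can be re-encoded as UTF-8.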

Both VB6 and .NET store strings internally in "Unicode" encoding (more accurately, UTF-16), but when passed to this Toolkit they are automatically converted to strings of "ANSI" characters. For more information, see Converting strings to bytes and vice versa.

Sample code

Here's how to change existing VBA code to obtain the SHA-1 digest of a UTF-8-encoded string using the new functions. Each line marked OLD uses a deprecated string-based function and is shown commented out; the line marked NEW that follows it is the replacement.

Dim strData As String
' OLD: Dim strDataUTF8 As String
Dim abDataUTF8() As Byte    ' NEW
Dim strDigest As String
Dim nRet As Long
Dim nLen As Long

' Our original string data
strData = "Estándares de Electrónica de México para mañana"
' "Convert" to UTF-8: first find the required length
' OLD: nLen = CNV_UTF8FromLatin1("", 0, strData)
nLen = CNV_UTF8BytesFromLatin1(0, 0, strData)    ' NEW
If nLen <= 0 Then
    Debug.Print "Failed to convert to UTF-8: " & nLen
    Exit Function
End If
' Dimension the output and do the conversion
' OLD: strDataUTF8 = String(nLen, " ")
ReDim abDataUTF8(nLen - 1)    ' NEW
' OLD: nLen = CNV_UTF8FromLatin1(strDataUTF8, nLen, strData)
nLen = CNV_UTF8BytesFromLatin1(abDataUTF8(0), nLen, strData)    ' NEW

' Create a hash but first dimension the string to receive it
strDigest = String(PKI_SHA1_CHARS, " ")
' OLD: nRet = HASH_HexFromString(strDigest, Len(strDigest), strDataUTF8, Len(strDataUTF8), PKI_HASH_SHA1)
nRet = HASH_HexFromBytes(strDigest, Len(strDigest), abDataUTF8(0), nLen, PKI_HASH_SHA1)    ' NEW
Debug.Print "Digest=" & strDigest
This should result in the output:
Digest=3eeb1871c14cd03af6d586850e3058fa80cbbe51

The same result can be obtained with the much briefer (and safer) VBA wrapper functions.

Dim strData As String
Dim strDigest As String
strData = "Estándares de Electrónica de México para mañana"
' Compute SHA-1 hash over UTF-8 encoded bytes in one line
strDigest = hashHexFromBytes(cnvUTF8BytesFromLatin1(strData), PKI_HASH_SHA1)
Debug.Print "Digest=" & strDigest

Here is code to do the same thing in VB.NET.

Dim strData As String
Dim abDataUTF8() As Byte
Dim strDigest As String
' Our original string data
strData = "Estándares de Electrónica de México para mañana"
' "Convert" to UTF-8
abDataUTF8 = System.Text.Encoding.UTF8.GetBytes(strData)
' Compute hash value
strDigest = Hash.HexFromBytes(abDataUTF8, HashAlgorithm.Sha1)
Console.WriteLine("Digest=" & strDigest)

And in C#, doing it all in one line.

string strData;
string strDigest;
strData = "Estándares de Electrónica de México para mañana";
strDigest = Hash.HexFromBytes(System.Text.Encoding.UTF8.GetBytes(strData), HashAlgorithm.Sha1);
Console.WriteLine("Digest=" + strDigest);


Copyright © 2004-24 D.I. Management Services Pty Ltd. All rights reserved. Generated 2024-09-23T07:52:09Z.