Accented characters and UTF-8 in XML-DSIG signatures


On this page we work through a simple example of creating an XML-DSIG signature for an XML document that contains accented characters such as áéíóúñ.

The problem

Question: I am trying to create the signature for the <Book> element in this XML document, but I cannot compute the correct SHA-1 digest value.

<?xml version="1.0" encoding="ISO-8859-1"?>
<References>
<Book xml:id="F01">
<FirstName>Bruceñ</FirstName>
</Book>
</References>

I can do it for the first name "Bruce" but it fails when I add the ñ character.

The answer

Answer: The usual cause is that the canonicalized (c14n'd) form must be encoded in UTF-8, so the ñ character (LATIN SMALL LETTER N WITH TILDE) has to be represented by its two-byte UTF-8 sequence rather than by the single byte used in ISO-8859-1.
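To make the encoding difference concrete, here is a minimal sketch in Python (standard library only, not the VB6 toolkit used for the rest of this page). It encodes the same string both ways and shows that the byte sequences, and therefore any digest computed over them, differ. The variable names are illustrative only.

```python
import hashlib

name = "Bruce\u00f1"  # "Bruceñ": U+00F1 is LATIN SMALL LETTER N WITH TILDE

latin1 = name.encode("latin-1")  # one byte per character
utf8 = name.encode("utf-8")      # the ñ becomes two bytes

print("Latin-1:", latin1.hex().upper())
print("UTF-8:  ", utf8.hex().upper())

# Different input bytes give different SHA-1 digests.
same_digest = hashlib.sha1(latin1).digest() == hashlib.sha1(utf8).digest()
print("same digest:", same_digest)

# For plain ASCII text the two encodings are byte-for-byte identical,
# which is why "Bruce" on its own causes no trouble.
ascii_same = "Bruce".encode("latin-1") == "Bruce".encode("utf-8")
print("ASCII identical:", ascii_same)
```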

Here is some VB6 code that takes the input as it would be read from the original file in ISO-8859-1 (Latin-1) form and computes the SHA-1 digest value of the canonicalized form.

Dim strData As String
Dim abData() As Byte
Dim strDigest As String
Dim nRet As Long
Dim nDataLen As Long
Dim strSig64 As String

' Input as (would be) read from original file (with LATIN SMALL LETTER N WITH TILDE)
strData = "<Book xml:id=""F01"">" & vbCrLf & _
    "<FirstName>Bruceñ</FirstName>" & vbCrLf & _
    "</Book>"
Debug.Print strData
Debug.Print "ORIG= " & cnvHexStrFromString(strData)
' Line breaks are normalized to "#xA": i.e. convert CR-LF pairs to LF
strData = Replace(strData, vbCrLf, vbLf)
' Convert to UTF-8
nDataLen = CNV_UTF8BytesFromLatin1(vbNull, 0, strData)
ReDim abData(nDataLen - 1)
nDataLen = CNV_UTF8BytesFromLatin1(abData(0), nDataLen, strData)
' Display input as sequence of bytes in hex form
Debug.Print "INPUT=" & cnvHexStrFromBytes(abData)
' Form SHA-1 digest of input
strDigest = String(PKI_SHA1_CHARS, " ")
nRet = HASH_HexFromBytes(strDigest, Len(strDigest), abData(0), nDataLen, PKI_HASH_SHA1)
Debug.Print "DIGEST(hex)=" & strDigest
' Encode in base64
strSig64 = cnvB64StrFromHexStr(strDigest)
' Return base64-encoded digest...
Debug.Print "DIGEST(base64)=" & strSig64

Running this code produces the following output:

<Book xml:id="F01">
<FirstName>Bruceñ</FirstName>
</Book>
ORIG= 3C426F6F6B20786D6C3A69643D22463031223E0D0A3C46697273744E616D653E4272756365F13C2F46697273744E616D653E0D0A3C2F426F6F6B3E
INPUT=3C426F6F6B20786D6C3A69643D22463031223E0A3C46697273744E616D653E4272756365C3B13C2F46697273744E616D653E0A3C2F426F6F6B3E
DIGEST(hex)=ff1ae390056f0aea04dc8e6db9d19d2325391a1d
DIGEST(base64)=/xrjkAVvCuoE3I5tudGdIyU5Gh0=

Note: This can be done much more easily using our SC14N XML canonicalization utility. See the examples below, which use SC14N to compute the digest of the Book element and of the SignedInfo element directly.

The Digest Value

The short answer is that the "DigestValue" we require for the XML signature is /xrjkAVvCuoE3I5tudGdIyU5Gh0=.

The longer answer for those of you who want to debug your own programs is that we convert the sequence of bytes in the original XML fragment <Book>...</Book>

3C426F6F6B20786D6C3A69643D22463031223E0D0A3C46697273744E616D653E4272756365F13C2F46697273744E616D653E0D0A3C2F426F6F6B3E

to another sequence of bytes, the canonicalized form of exactly 58 bytes:

3C426F6F6B20786D6C3A69643D22463031223E0A3C46697273744E616D653E4272756365C3B13C2F46697273744E616D653E0A3C2F426F6F6B3E

where all CR-LF pairs (0x)0D 0A are converted to a single 0A, and (in this case) the only non-ASCII character ñ (F1) is represented by two bytes (0x)C3 B1 in UTF-8 encoding.

Note that this sequence should always begin with a "<" character (3C) and always end with a ">" (3E). Be aware, too, that more general c14n may require many other changes, but these two are all that is needed for our simple example.

This sequence of 58 bytes is input to the SHA-1 digest algorithm to yield the result (0x)FF1AE390056F0AEA04DC8E6DB9D19D2325391A1D or /xrjkAVvCuoE3I5tudGdIyU5Gh0= in base64. Only this exact sequence as input will give the correct digest value.
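For readers debugging in another language, the steps above can be cross-checked with a short Python sketch using only the standard library. It is not the VB6 code from this page and it is not a general canonicalizer: it only normalizes line endings and re-encodes as UTF-8, which is all this particular fragment needs.

```python
import base64
import hashlib

# The <Book> fragment as read from the ISO-8859-1 file, with CR-LF line endings.
latin1_bytes = (
    '<Book xml:id="F01">\r\n'
    "<FirstName>Bruce\u00f1</FirstName>\r\n"
    "</Book>"
).encode("latin-1")

# Decode from Latin-1, convert CR-LF pairs to LF, then encode as UTF-8.
text = latin1_bytes.decode("latin-1").replace("\r\n", "\n")
canonical = text.encode("utf-8")

digest = hashlib.sha1(canonical).digest()
digest_b64 = base64.b64encode(digest).decode("ascii")

print("ORIG=  " + latin1_bytes.hex().upper())
print("INPUT= " + canonical.hex().upper())
print("LENGTH=", len(canonical))
print("DIGEST(hex)=" + digest.hex())
print("DIGEST(base64)=" + digest_b64)
```

If the INPUT line printed by your own program differs from the 58-byte sequence shown above by even one byte, the digest will not match.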

Using SC14N to compute the digest of the Book element directly

Using SC14N on the base XML file, we transform the subset for the element with tag name Book and compute its digest value using the default SHA-1 algorithm.

> sc14n -d -s Book utf8-base.xml
/xrjkAVvCuoE3I5tudGdIyU5Gh0=

In VBA/VB6:

strDigest = sc14nFile2Digest(strFileName, "Book", "", SC14N_TRAN_SUBSETBYTAG)

Or we could use the URI reference instead (capital '-S' option):

> sc14n -d -S "xml:id=F01" utf8-base.xml
/xrjkAVvCuoE3I5tudGdIyU5Gh0=

The Signature Value

Having obtained this digest value, we next need to compute the signature value. The input to this process is the canonicalized form of the "SignedInfo" element. Our exact input in this case is

<SignedInfo xmlns="http://www.w3.org/2000/09/xmldsig#">
<CanonicalizationMethod Algorithm="http://www.w3.org/TR/2001/REC-xml-c14n-20010315"></CanonicalizationMethod>
<SignatureMethod Algorithm="http://www.w3.org/2000/09/xmldsig#rsa-sha1"></SignatureMethod>
<Reference URI="#F01">
<Transforms>
<Transform Algorithm="http://www.w3.org/TR/2001/REC-xml-c14n-20010315"></Transform>
</Transforms>
<DigestMethod Algorithm="http://www.w3.org/2000/09/xmldsig#sha1"></DigestMethod>
<DigestValue>/xrjkAVvCuoE3I5tudGdIyU5Gh0=</DigestValue>
</Reference>
</SignedInfo>

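That is, a sequence of exactly 554 bytes. Note again that the first and last characters in this sequence are "<" and ">", respectively, and that the line endings are single LF characters 0x0A. Compared with the SignedInfo element as it appears in the final signed document, the canonicalized form writes each empty element as a start-end tag pair and carries the xmlns declaration inherited from the parent Signature element.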

Note also that this SignedInfo element contains only plain ASCII characters, so it can be directly hashed without having to do the Latin-1 to UTF-8 conversion above. Plain ASCII is a valid subset of UTF-8.

This input has SHA-1 digest value (0x)3910b5e6a669140c374978e55fe08d87ddd81e74 or ORC15qZpFAw3SXjlX+CNh93YHnQ= in base64.
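As a cross-check, the following Python sketch (standard library only) derives the 554-byte input from the SignedInfo element as it is written inside the final document. The regular expression and the string replacement are shortcuts that happen to be sufficient for this one element. They are an assumption of this sketch, not a general c14n implementation.

```python
import base64
import hashlib
import re

XMLDSIG_NS = "http://www.w3.org/2000/09/xmldsig#"
C14N_ALG = "http://www.w3.org/TR/2001/REC-xml-c14n-20010315"

# SignedInfo as it appears inside <Signature> in the signed document (LF line endings).
signed_info_in_doc = "\n".join([
    "<SignedInfo>",
    f'<CanonicalizationMethod Algorithm="{C14N_ALG}"/>',
    f'<SignatureMethod Algorithm="{XMLDSIG_NS}rsa-sha1"/>',
    '<Reference URI="#F01">',
    "<Transforms>",
    f'<Transform Algorithm="{C14N_ALG}"/>',
    "</Transforms>",
    f'<DigestMethod Algorithm="{XMLDSIG_NS}sha1"/>',
    "<DigestValue>/xrjkAVvCuoE3I5tudGdIyU5Gh0=</DigestValue>",
    "</Reference>",
    "</SignedInfo>",
])

# Step 1: write every empty element <X .../> as a start-end pair <X ...></X>.
expanded = re.sub(r"<(\w+)([^<>]*)/>", r"<\1\2></\1>", signed_info_in_doc)

# Step 2: add the namespace declaration inherited from the parent Signature element.
with_ns = expanded.replace("<SignedInfo>", f'<SignedInfo xmlns="{XMLDSIG_NS}">', 1)

# Plain ASCII, so encoding as UTF-8 changes nothing.
canonical = with_ns.encode("utf-8")
digest = hashlib.sha1(canonical).digest()
digest_b64 = base64.b64encode(digest).decode("ascii")

print("LENGTH=", len(canonical))
print("DIGEST(hex)=" + digest.hex())
print("DIGEST(base64)=" + digest_b64)
```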

Using SC14N to compute the digest of the SignedInfo directly

Using SC14N on the intermediate XML file, we transform the subset for the element with tag name SignedInfo and compute its digest value using the default SHA-1 algorithm.

> sc14n -d -s SignedInfo utf8-inter.xml
ORC15qZpFAw3SXjlX+CNh93YHnQ=

In VBA/VB6:

strDigest = sc14nFile2Digest(strFileName, "SignedInfo", "", SC14N_TRAN_SUBSETBYTAG)

This digest value is incorporated in the actual signature value. We use Alice's 1024-bit RSA private key, which is stored in encrypted form (password="password"), to create the signature. See the sample code below.

You can see the XML form of Alice's public key in the <RSAKeyValue> element of the final XML document below. There are example procedures in the sample code showing how to form this using either Alice's certificate or her private key file.

The final signed XML document

<?xml version="1.0" encoding="ISO-8859-1"?>
<References>
<Book xml:id="F01">
<FirstName>Bruceñ</FirstName>
</Book>
<Signature xmlns="http://www.w3.org/2000/09/xmldsig#">
<SignedInfo>
<CanonicalizationMethod Algorithm="http://www.w3.org/TR/2001/REC-xml-c14n-20010315"/>
<SignatureMethod Algorithm="http://www.w3.org/2000/09/xmldsig#rsa-sha1"/>
<Reference URI="#F01">
<Transforms>
<Transform Algorithm="http://www.w3.org/TR/2001/REC-xml-c14n-20010315"/>
</Transforms>
<DigestMethod Algorithm="http://www.w3.org/2000/09/xmldsig#sha1"/>
<DigestValue>/xrjkAVvCuoE3I5tudGdIyU5Gh0=</DigestValue>
</Reference>
</SignedInfo>
<SignatureValue>
PZab37+BAm8XXXLOL4CxF0M0Ep9okIl2IDfvOEaejCv68lrRv0zCF3zXOkl6x09e
prbrWK1adS3bNzqK0KxDiUBuOAKgX0wF1MrTPCJmlwU+HSsVmbFj49jlx+9q5YGi
oEf3bdOFO3Mj9+1snhwEAIPVVe8n+jGvbTD/d4CyPIk=
</SignatureValue>
<KeyInfo>
<KeyValue>
<RSAKeyValue>
<Modulus>
4IlzOY3Y9fXoh3Y5f06wBbtTg94Pt6vcfcd1KQ0FLm0S36aGJtTSb6pYKfyX7PqC
UQ8wgL6xUJ5GRPEsu9gyz8ZobwfZsGCsvu40CWoT9fcFBZPfXro1Vtlh/xl/yYHm
+Gzqh0Bw76xtLHSfLfpVOrmZdwKmSFKMTvNXOFd0V18=
</Modulus>
<Exponent>AQAB</Exponent>
</RSAKeyValue>
</KeyValue>
</KeyInfo>
</Signature>
</References>

You can check this at the Online XML Digital Signature Verifier.
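A local sanity check can also be sketched in Python with only the standard library. It assumes the SignatureValue is an RSASSA-PKCS1-v1_5 signature over SHA-1, which is what the rsa-sha1 algorithm identifier denotes, and it takes the SignedInfo digest quoted earlier on this page as given. The function names are illustrative, and this is a debugging aid rather than a hardened verifier.

```python
import base64

# Values copied from the signed document above.
SIGNATURE_VALUE = """
PZab37+BAm8XXXLOL4CxF0M0Ep9okIl2IDfvOEaejCv68lrRv0zCF3zXOkl6x09e
prbrWK1adS3bNzqK0KxDiUBuOAKgX0wF1MrTPCJmlwU+HSsVmbFj49jlx+9q5YGi
oEf3bdOFO3Mj9+1snhwEAIPVVe8n+jGvbTD/d4CyPIk=
"""
MODULUS = """
4IlzOY3Y9fXoh3Y5f06wBbtTg94Pt6vcfcd1KQ0FLm0S36aGJtTSb6pYKfyX7PqC
UQ8wgL6xUJ5GRPEsu9gyz8ZobwfZsGCsvu40CWoT9fcFBZPfXro1Vtlh/xl/yYHm
+Gzqh0Bw76xtLHSfLfpVOrmZdwKmSFKMTvNXOFd0V18=
"""
EXPONENT = "AQAB"

# SHA-1 digest of the canonicalized SignedInfo, as quoted earlier on this page.
SIGNED_INFO_DIGEST = bytes.fromhex("3910b5e6a669140c374978e55fe08d87ddd81e74")

# DER-encoded DigestInfo prefix for SHA-1 used by the PKCS #1 v1.5 encoding.
SHA1_DIGESTINFO_PREFIX = bytes.fromhex("3021300906052b0e03021a05000414")


def b64_to_int(text: str) -> int:
    """Decode base64 (ignoring line breaks) to a big-endian unsigned integer."""
    return int.from_bytes(base64.b64decode("".join(text.split())), "big")


def expected_encoded_message(digest: bytes, key_len: int) -> bytes:
    """Build 00 01 FF..FF 00 || DigestInfo prefix || digest, key_len bytes long."""
    tail = SHA1_DIGESTINFO_PREFIX + digest
    padding = b"\xff" * (key_len - len(tail) - 3)
    return b"\x00\x01" + padding + b"\x00" + tail


def signature_matches(sig_b64: str, mod_b64: str, exp_b64: str, digest: bytes) -> bool:
    """Raise the signature to the public exponent and compare with the expected block."""
    n = b64_to_int(mod_b64)
    e = b64_to_int(exp_b64)
    s = b64_to_int(sig_b64)
    key_len = (n.bit_length() + 7) // 8
    if s >= n:
        return False
    recovered = pow(s, e, n).to_bytes(key_len, "big")
    return recovered == expected_encoded_message(digest, key_len)


print("modulus bits:", b64_to_int(MODULUS).bit_length())
print("public exponent:", b64_to_int(EXPONENT))
print("signature matches SignedInfo digest:",
      signature_matches(SIGNATURE_VALUE, MODULUS, EXPONENT, SIGNED_INFO_DIGEST))
```

Note that this checks only the SignatureValue against the SignedInfo digest. A full verification must also recompute the DigestValue of the referenced Book element.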

New 2022-03-20: See "Troubleshooting problems on the 'Online XML Digital Signature Verifier' site".

The Code

Here is our sample code in VB6/VBA and in VB.NET/VB200x. This zipped file (7.2 kB) includes the source code as well as Alice's private key file (password="password"), X.509 certificate and the final XML document.

See also

  1. Signing an XML document using XMLDSIG: a simple example signing a straightforward text string and storing the result in an XML document.
  2. XML-Dsig and the Chile SII: creating digital signatures in XML documents (XML-Dsig) using the standards for electronic invoices set by the Servicio de Impuestos Internos (SII) of Chile.
  3. New (re-released 2018-08-09): SC14N, a straightforward XML canonicalization utility.

This page first published 4 December 2010. Last updated 15 November 2022.
