Flattening an XML document


To "flatten" an XML document means to remove all whitespace between tags, typically producing a document on a single line.

This is the recommended way to submit a document signed using XML-DSIG to a server. It eliminates the problem of whitespace between tags being altered and invalidating the signature. To most applications this whitespace has no semantic meaning, but it is significant for XML-DSIG signatures, so it is best avoided.

A pretty-printed document

doc-pretty.xml:

<doc>
 <node>
    <data>text</data>
    <f>
    </f>
 </node>
</doc>

The same document flattened to one line for safe transmission

doc-as-signed.xml:

<doc><node><data>text</data><f></f></node></doc>

It's up to you whether you do this, but we recommend it. If the whitespace between tags is changed in transmission, any signature will be invalidated.
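
To see why whitespace that looks meaningless still matters, here is a minimal Python sketch that hashes the two documents shown above. It is an illustration only: a real XML-DSIG digest is computed over the canonicalized form of the signed content rather than the raw file, but because C14N keeps whitespace between tags the effect is the same. The helper name `sha256_hex` is our own.

```python
import hashlib

# The pretty-printed and flattened forms of the same document.
pretty = "<doc>\n <node>\n    <data>text</data>\n    <f>\n    </f>\n </node>\n</doc>"
flat = "<doc><node><data>text</data><f></f></node></doc>"

def sha256_hex(text: str) -> str:
    """Return the SHA-256 digest of the UTF-8 encoding of text, in hex."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()

# The digests differ, so a signature made over one form
# cannot verify against the other.
print(sha256_hex(pretty) == sha256_hex(flat))  # False
```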

To get the flattened form using a Perl regex:
$xmldata =~ s/>\s+</></gs;
Or, in C#:
s = Regex.Replace(s, @">\s+<", @"><", RegexOptions.Singleline);
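
The same substitution can be sketched in Python and checked against the example document. The function name `flatten` is our own. Like the Perl and C# versions, it works on the raw text rather than on parsed XML, so it would also change a matching `>`, whitespace, `<` sequence inside a comment or CDATA section; treat it as a convenience, not a general-purpose transformation.

```python
import re

def flatten(xml_text: str) -> str:
    """Remove runs of whitespace that sit between a '>' and the next '<'."""
    # \s already matches newlines in Python, so no extra flag is needed.
    return re.sub(r">\s+<", "><", xml_text)

pretty = """<doc>
 <node>
    <data>text</data>
    <f>
    </f>
 </node>
</doc>"""

print(flatten(pretty))  # <doc><node><data>text</data><f></f></node></doc>
```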

The "flatten" option in SC14N

Suppose you have signed a document in flattened (one-line) form, but because that form is inconvenient to work with you have since pretty-printed it. Can you compute the C14N transformation of the original from the pretty-printed version? Yes, you can.

Computing the straight C14N transformation of the pretty-printed version gives the wrong result, because the output keeps the whitespace between tags and so is not what was signed.
> sc14n doc-pretty.xml
<doc>
 <node>
    <data>text</data>
    <f>
    </f>
 </node>
</doc>
But using the "--flatten" option gives the correct result.
> sc14n --flatten doc-pretty.xml
<doc><node><data>text</data><f></f></node></doc>

Note that this is just a convenience when working with pretty-printed documents. It's not an official XML transformation. We find it useful for doing analysis on documents that have already been signed.
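
For readers without SC14N to hand, the idea can be approximated with Python's standard library (3.8 or later). This is a sketch under assumptions, not how SC14N is implemented: `flatten_c14n` is a hypothetical helper, `xml.etree.ElementTree.canonicalize` implements C14N 2.0 rather than the older C14N versions, and the default parser discards comments and processing instructions. For a simple document like the example, the output has the same shape.

```python
import xml.etree.ElementTree as ET

XML_WS = " \t\r\n"  # the four whitespace characters defined by XML 1.0

def flatten_c14n(xml_text: str) -> str:
    """Drop whitespace-only text nodes, then canonicalize (hypothetical helper)."""
    root = ET.fromstring(xml_text)
    for el in root.iter():
        # el.text is the text before the first child; el.tail follows the end tag.
        if el.text is not None and el.text.strip(XML_WS) == "":
            el.text = None
        if el.tail is not None and el.tail.strip(XML_WS) == "":
            el.tail = None
    return ET.canonicalize(ET.tostring(root, encoding="unicode"))

pretty = "<doc>\n <node>\n    <data>text</data>\n    <f>\n    </f>\n </node>\n</doc>"

print(ET.canonicalize(pretty))  # whitespace between tags is kept
print(flatten_c14n(pretty))     # <doc><node><data>text</data><f></f></node></doc>
```

Working on the parsed tree rather than the raw text means only whole text nodes consisting of whitespace are removed; text that contains anything else is left exactly as it was.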

A subtle issue

The "flatten" option will remove what is strictly called "ignorable whitespace" from an XML document. There is a subtle issue here. Consider the document ignorable_ws.xml:
<?xml version="1.0" encoding="UTF-8"?>
<root>
 <node>
    <data>text</data>
    <withspaces>   some text with spaces and
    newlines
    </withspaces>
    <e />
    <f>
    </f>
 </node>
</root>
The correct "flattened" version is:
> sc14n -f ignorable_ws.xml
<root><node><data>text</data><withspaces>   some text with spaces and
    newlines
    </withspaces><e></e><f></f></node></root>
The element <f>, which has nothing but whitespace as content, is flattened to the empty element <f></f>, and the empty-element tag <e /> is written out as <e></e>. But the content of the element <withspaces> is left unchanged, including all leading and trailing whitespace and newlines inside the tags.
<withspaces>   some text with spaces and
    newlines
    </withspaces>
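
The distinction matters if you try to reproduce the behaviour with another tool. As an illustration, Python's `xml.etree.ElementTree.canonicalize` (Python 3.8 or later, C14N 2.0) has a `strip_text` option that trims leading and trailing whitespace from every text node, which is not the same thing: it also alters text that contains real content. The shortened document below is our own example.

```python
import xml.etree.ElementTree as ET

doc = "<root>\n <withspaces>   some text\n    </withspaces>\n <f>\n </f>\n</root>"

# strip_text=True trims whitespace around ALL text content,
# including the text inside <withspaces>.
stripped = ET.canonicalize(doc, strip_text=True)
print(stripped)  # <root><withspaces>some text</withspaces><f></f></root>
```

A digest computed over this output would not match one computed over a document in which the spaces inside <withspaces> were part of the signed content.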

This page last updated 15 December 2019
