xmlsq is a simple lightweight utility to query XML documents using XPath 1.0. New release 18 July 2021.
Did you ever want a simple utility that just went and got you the text value out of an XML file? And to do this without the overhead of a huge XML library? Our xmlsq utility will do that for you. xmlsq is provided both as a stand-alone Windows command-line executable and as a separate API you can call from various programming languages including C#, VB.NET, C, C++, VBA and VB6. And it's free.
xmlsq uses XPath 1.0 and operates on either an XML file or a string that represents a valid XML document.
The default mode of xmlsq ("get-text") is simply to return the text value for the first occurrence of the node that matches the XPath query. That is the text content from an element or the attribute value from an attribute. There is also a "full-query" mode which returns the result of a full XPath query as a string, and "count" mode which returns the number of nodes that match the query.
In "get-text" and "count" mode the XPath query must point to a node in the XML document. Queries that return a number or boolean type will fail.
Queries in "full-query" mode will return the result in a string. For example, the numeric value 1.5 will be returned as the string "1.500000".
We assume you are familiar with XPath 1.0. XPath gets its name from its use of a path notation as in URLs for navigating through the hierarchical structure of an XML document. For more information on XPath 1.0 see the References below.
For the command-line syntax, type xmlsq --help
Usage: xmlsq {[-g]|-f|-c} [OPTION]... QUERY [INFILE]
Perform XPath 1.0 queries on an XML document.
  -g, --get-text      get text for first matching node [default]
  -f, --full-query    execute full XPath 1.0 query
  -c, --count         return the integer value of `count(query)`
OPTIONS:
  -@, --stdin         read input from stdin [default=INFILE]
  -i, --input=INFILE  optional way to specify INFILE
  -a, --asciify       output non-ASCII chars as XML character references
  -r, --raw           output nodeset in raw format [default=prettify]
  -t, --trim          trim leading/trailing whitespace
                      (and collapse whitespace for an attribute value)
  -d, --delim=DELIM   enclose output in delimiter(s), eg ' or []
  -v, --version       print program version and exit
  -h, --help          print this help and exit
  -E, --examples      print examples and exit
INFILE must be specified unless `--stdin` option is used.
Exit status is 0 on success or 1 if error.
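If you want to drive the command-line program from a script, the usage text above maps fairly directly onto an argument list. The following is only a sketch in Python: the helper names `build_xmlsq_args` and `run_xmlsq` are hypothetical, it assumes `xmlsq` is on the PATH, and it uses only the options listed in the usage text. It is not the separate Python interface.

```python
import subprocess

# Long options for the three documented modes.
MODE_FLAGS = {
    "get-text": "--get-text",
    "full-query": "--full-query",
    "count": "--count",
}

def build_xmlsq_args(query, infile, mode="get-text", trim=False, raw=False, delim=None):
    """Return the argument list for one xmlsq call (hypothetical helper)."""
    args = ["xmlsq", MODE_FLAGS[mode]]
    if trim:
        args.append("--trim")
    if raw:
        args.append("--raw")
    if delim is not None:
        args.append("--delim=" + delim)
    # Usage order: mode, options, then QUERY and INFILE.
    args += [query, infile]
    return args

def run_xmlsq(query, infile, **options):
    """Run xmlsq and return its output; raise if the exit status is not 0."""
    result = subprocess.run(build_xmlsq_args(query, infile, **options),
                            capture_output=True, text=True)
    if result.returncode != 0:
        # Assumption: the error text may be on stdout or stderr, so report both.
        raise RuntimeError((result.stdout + result.stderr).strip())
    return result.stdout.rstrip("\r\n")

# Show the argument lists without running anything.
print(build_xmlsq_args("//b/@foo", "hello.xml"))
print(build_xmlsq_args("//b", "hello.xml", mode="count"))
```

Passing the arguments as a list avoids shell quoting problems with characters such as `'` and `[` in XPath expressions.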
If the file hello.xml contains
<a>
  <b foo='baz'>hello</b>
  <b>world</b>
</a>
then the simple "get-text" query for "//b" will return just the character data inside the first element b, which is "hello":
> xmlsq //b hello.xml
hello
Similarly, the query "//b/@foo" will return the value of the attribute foo for the first occurrence of element b, which is "baz":
> xmlsq //b/@foo hello.xml
baz
To get the text content of the second element b, use the predicate [2]:
> xmlsq //b[2] hello.xml
world
By contrast, using the "full-query" mode (the full XPath 1.0 query) returns the set of all matching nodes:
> xmlsq --full-query //b hello.xml
<b foo="baz">hello</b>
<b>world</b>
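The difference between "first match only" and "all matches" can be illustrated outside xmlsq. The sketch below uses Python's standard `xml.etree.ElementTree` module purely as a stand-in. It is not xmlsq, it supports only a subset of XPath (hence the `.//b` spelling), and the helper `first_text` is a hypothetical name.

```python
import xml.etree.ElementTree as ET

# Same content as hello.xml, written without whitespace between elements.
HELLO_XML = "<a><b foo='baz'>hello</b><b>world</b></a>"
root = ET.fromstring(HELLO_XML)

def first_text(root, path):
    """Text of the first element matching path, or None if nothing matches."""
    matches = root.findall(path)
    return matches[0].text if matches else None

matches = root.findall(".//b")

# The "get-text" idea: only the first matching node matters.
print(first_text(root, ".//b"))
print(matches[0].get("foo"))
print(first_text(root, ".//b[2]"))

# The "full-query" idea: every matching node is returned.
print([ET.tostring(m, encoding="unicode") for m in matches])
```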
In our work with XML documents (mostly security related, XML-DSIG and XMLENC) the simple "get-text" mode is exactly the behaviour we want.
More details below in Get-text mode and Full-query mode.
The default "get-text" mode is a simplified form of XPath designed to return the text contents of the first element that matches the XPath query.
If the element is a text-only node with only character data inside, then this text is returned. If the query is for an attribute, then the attribute's text value is returned. If the element contains child elements or mixed child elements and character data, then the contents will be returned as a "prettified" string.
> xmlsq /a hello.xml
<b foo="baz">hello</b>
<b>world</b>
The XPath expression in get-text mode must evaluate to a node or node set (else **ERROR: BAD_XPATH:Expression does not evaluate to node set**).
Using the --full-query or -f mode returns the full XPath result, not the simplified text-only results returned for the "get-text" mode. Any valid XPath expression may be used.
Query | Default | Full-query |
---|---|---|
//b | hello | <b foo="baz">hello</b> <b>world</b> |
//b[2] | world | <b>world</b> |
//b/@foo | baz | foo="baz" |
/a | <b foo="baz">hello</b> <b>world</b> | <a> <b foo="baz">hello</b> <b>world</b> </a> |
Node set output is prettified by default. Use the --raw option if you don't want this.
> xmlsq -f /a hello.xml
<a>
  <b foo="baz">hello</b>
  <b>world</b>
</a>
> xmlsq --raw -f /a hello.xml
<a><b foo="baz">hello</b><b>world</b></a>
You can use any valid XPath 1.0 expression in full-query mode. Use a dummy XML document, e.g. <a/>, for expressions that do not operate on an XML document. For example:
> xmlsq -f "3 + 5 div 2" "<a/>"
5.500000
> xmlsq -f "string-length('abc')" "<a/>"
3.000000
> xmlsq -f "substring('abcdefghij',3,4)" "<a/>"
cdef
> xmlsq -f "normalize-space(' my node ')" "<a/>"
my node
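Because full-query mode returns numbers as strings with six decimal places, a calling program will usually want to convert them back. A minimal sketch in Python follows; `parse_number_result` is a hypothetical helper name. The observation that the output looks like fixed-point formatting is drawn only from the examples above and is not a documented guarantee.

```python
def parse_number_result(text):
    """Convert a full-query numeric string such as "5.500000" to int or float."""
    value = float(text)
    # Return an int when there is no fractional part, e.g. "3.000000" -> 3.
    return int(value) if value.is_integer() else value

print(parse_number_result("5.500000"))
print(parse_number_result("3.000000"))

# Python's fixed-point format produces the same six-decimal shape.
print(format(1.5, "f"))
```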
Count mode (--count or -c) returns the integer value of count(query), the number of nodes that match the query.
> xmlsq --count "//b" hello.xml
2
Use the count mode to find how many nodes match a query, so that each one can then be retrieved in turn using (query)[i] for i = 1 to count.
> xmlsq -f "/" "<a><e /></a>"
<a>
  <e />
</a>
> xmlsq --count "//e" "<a><e /></a>"
1
> xmlsq --count "//notthere" "<a><e /></a>"
0
As in get-text mode, the query in count mode must evaluate to a node set (else **ERROR: BAD_XPATH:Expression does not evaluate to node set** or **ERROR -3).
> xmlsq --count "1+2" "<a/>"
**ERROR -3
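The (query)[i] pattern for i = 1 to count is easy to generate once the count is known. The sketch below, in Python, only builds the query strings; `indexed_queries` is a hypothetical helper name, and each string would then be passed to xmlsq in get-text mode.

```python
def indexed_queries(query, count):
    """Return the queries (query)[1] .. (query)[count] as strings."""
    # XPath positions are 1-based, so the range starts at 1.
    return [f"({query})[{i}]" for i in range(1, count + 1)]

# With a count of 2 for //b, as reported by xmlsq --count "//b" hello.xml
for q in indexed_queries("//b", 2):
    print(q)
```

A count of 0 gives an empty list, so a loop over the result simply does nothing when no nodes match.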
Download the latest version of xmlsq for Windows from one of the links below. This is free for personal and commercial use subject to the license conditions†.
Most recent production version 1.0.0 compiled 18 July 2021.
Either unzip the zip file and run the install.exe program inside it, or download the exe program directly and run it.
These installation programs should be signed by verified publisher "d.i. management services pty limited".
The minimum required operating system is Windows XP-SP2 (that is, XP/Vista/W7/W8/W10) or Windows Server 2003.
Trouble installing: If Microsoft Defender Smartscreen gives you a warning, see Unrecognized app error. (TL;DR Click "More info" then "Run anyway"). Check that you see "Publisher: D.I. MANAGEMENT SERVICES PTY LIMITED".
After installing, test by opening a command line window and typing xmlsq --help. See Command-line syntax for more details.
† If your organisation has a problem with the license conditions as stated, please contact us and we'll work something out.
The Python interface to xmlsq is available separately from the Python Package Index (PyPI). For details see Python programming.
Added 18 July 2021.
The C++ interface is an alternative interface for C++ programmers who want to avoid the memory allocation hassles of using the "raw" C interface.
All strings are std::string and all byte arrays are stored in a std::vector.
It raises exceptions if the input parameters are wrong (invalid XML data or XPath expression) or a file is missing. We are not that keen on exceptions as a rule, but we'll go with the flow here. The documentation is here and example code is here.
The code should compile under the C++11 standard or later. We've tested it using MSVC++ Version 12.0 (2013) and g++ version 8.3.0 (x86_64-w64-mingw32/8.3.0). Could the code be more concise? Goodness, it's C++, not Haskell! If you think the C++ code could be improved, please let us know.
xmlsq conforms to the W3C XPath 1.0 specification except for the following incompatibilities (Ref: pugixml v1.10: 8.6. Conformance to W3C specification).
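Consecutive text nodes sharing the same parent are not merged; that is, in <node>text1 <![CDATA[data]]> text2</node> node should have one text node child, but instead has three.
Since the document type declaration is not used for parsing, the id() function always returns an empty node set.
Namespace nodes are not supported (this affects the namespace:: axis).
Name tests are performed on QNames in the XML document instead of expanded names; for <foo xmlns:ns1='uri' xmlns:ns2='uri'><ns1:child/><ns2:child/></foo>, the query foo/ns1:* will return only the first child, not both of them. Compliant XPath implementations can return both nodes if the user provides appropriate namespace declarations.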
Note that xmlsq does not support namespace nodes. It is "namespace ignorant". We consider this a bonus! Just do a web search on "XML namespaces are keeping me from selecting nodes" or "XML namespaces are breaking my XPath searches" to see how much confusion namespaces cause with XPath. You either need to invoke a "Namespace Manager" (which we deliberately don't offer) or use incredibly complicated XPath expressions.
In xmlsq name tests are performed on qualified names (QNames) instead of expanded names. That means a search on "//ds:Signature" will work as expected provided there are not two different namespaces declared for the prefix "ds" (and, honestly, if you have documents like that you deserve all the heartache you get).
If you really need some sort of namespace manager for your XPath queries, then this lightweight tool is not for you.
TL;DR Just search on the actual tag name used in the document.
Example: xmlenc document using default xmlns for xmldsig element <KeyInfo>:
<EncryptedData xmlns="http://www.w3.org/2001/04/xmlenc#" MimeType="text/plain">
  <EncryptionMethod Algorithm="http://www.w3.org/2001/04/xmlenc#aes128-cbc" />
  <KeyInfo xmlns="http://www.w3.org/2000/09/xmldsig#">
    <KeyName>job</KeyName>
  </KeyInfo>
  <!-- ... -->
</EncryptedData>
The query //KeyName works on the above document:
> xmlsq //KeyName encrypt-data-aes128-cbc.xml
job
Equivalent document using prefix "ds:"
<EncryptedData xmlns="http://www.w3.org/2001/04/xmlenc#" xmlns:ds="http://www.w3.org/2000/09/xmldsig#" MimeType="text/plain">
  <EncryptionMethod Algorithm="http://www.w3.org/2001/04/xmlenc#aes128-cbc" />
  <ds:KeyInfo>
    <ds:KeyName>job</ds:KeyName>
  </ds:KeyInfo>
  <!-- ... -->
</EncryptedData>
In this case, we must use the prefix "ds:" in the query "//ds:KeyName" to get the value we want:
> xmlsq --delim=' //KeyName encrypt-data-aes128-cbc-ds.xml
''
> xmlsq --delim=' //ds:KeyName encrypt-data-aes128-cbc-ds.xml
'job'
Alternatively, use the XPath local-name() function:
> xmlsq "//*[local-name()='KeyName']" encrypt-data-aes128-cbc-ds.xml
job
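For contrast, here is roughly what the same lookup involves in a namespace-aware library, where the caller has to supply the namespace URI behind the prefix. The sketch uses Python's standard `xml.etree.ElementTree` module only as an example of such a library. It is not xmlsq, and `find_by_local_name` is a hypothetical helper that mimics the local-name() approach.

```python
import xml.etree.ElementTree as ET

DOC = """<EncryptedData xmlns="http://www.w3.org/2001/04/xmlenc#"
    xmlns:ds="http://www.w3.org/2000/09/xmldsig#" MimeType="text/plain">
  <EncryptionMethod Algorithm="http://www.w3.org/2001/04/xmlenc#aes128-cbc" />
  <ds:KeyInfo>
    <ds:KeyName>job</ds:KeyName>
  </ds:KeyInfo>
</EncryptedData>"""

root = ET.fromstring(DOC)

# A namespace-aware library needs the prefix mapped to its URI by the caller.
NS = {"ds": "http://www.w3.org/2000/09/xmldsig#"}

# The bare name finds nothing, because the element lives in a namespace.
print(root.find(".//KeyName"))

# With the mapping supplied, the prefixed query works.
print(root.find(".//ds:KeyName", NS).text)

def find_by_local_name(root, name):
    """Return the first element whose tag, ignoring any namespace, equals name."""
    for el in root.iter():
        # ElementTree stores tags as "{uri}local"; keep only the local part.
        if el.tag.rsplit("}", 1)[-1] == name:
            return el
    return None

print(find_by_local_name(root, "KeyName").text)
```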
To check that the program is installed correctly, open a command line window and type
xmlsq --help
If this fails, re-install the program.
If you are getting error messages like "Cannot find DLL" or similar, please read Using on a 64-bit platform.
This software is based on the pugixml library (https://pugixml.org). Pugixml is Copyright (C) 2006-2019 Arseny Kapoulkine. Pugixml is a light-weight, simple and fast XML parser for C++ with XPath support and excellent documentation. We highly recommend it.
We use code from Bjoern Hoehrmann's Flexible and Economical UTF-8 Decoder Copyright (c) 2008-2009 Bjoern Hoehrmann <bjoern@hoehrmann.de>. "This page presents [a decoder] that is very easy to use correctly, short, small, fast, and free."
Some tests use reference files from W3C XML Encryption Implementation and Interoperability Report > interop samples > merlin-xmlenc-five.tar.gz.
diXmlsqNet.dll using .NET 4.0 (instead of 4.5) for better backwards compatibility.
To contact us or comment on this page, please send us a message.
This page first published 2 June 2020. Last updated 30 December 2021.