LinuxCommandLibrary

xml-select

Extract data from XML documents

TLDR

Select all elements matching "XPATH1" and print the value of their sub-element "XPATH2"
$ xml [[sel|select]] [[-t|--template]] [[-m|--match]] "[XPATH1]" [[-v|--value-of]] "[XPATH2]" [path/to/input.xml|URI]

Match "XPATH1" and print the value of "XPATH2" as text with new-lines
$ xml [[sel|select]] [[-T|--text]] [[-t|--template]] [[-m|--match]] "[XPATH1]" [[-v|--value-of]] "[XPATH2]" [[-n|--nl]] [path/to/input.xml|URI]
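
To make the two match/value-of commands above concrete, here is a minimal sketch that runs the same template against a small sample file. It assumes the `xml` command is xmlstarlet (some distributions install the binary as `xmlstarlet`); the sample document and the `XML_BIN`, `sample`, and `titles` names are hypothetical and exist only for this illustration.

```shell
# Locate xmlstarlet: the binary is named `xmlstarlet` or `xml` depending on the distribution.
XML_BIN=$(command -v xmlstarlet || command -v xml || true)

# Hypothetical sample document used only for this illustration.
sample=$(mktemp)
cat > "$sample" <<'EOF'
<library>
  <book><title>Dune</title><author>Frank Herbert</author></book>
  <book><title>Emma</title><author>Jane Austen</author></book>
</library>
EOF

if [ -n "$XML_BIN" ]; then
  # For every <book> (XPATH1), print its <title> child (XPATH2), one per line.
  titles=$("$XML_BIN" sel -T -t -m "//book" -v "title" -n "$sample")
  printf '%s\n' "$titles"
else
  echo "xmlstarlet not found; skipping" >&2
fi

rm -f "$sample"
```

Note that "XPATH2" is evaluated relative to each node matched by "XPATH1", which is why `title` is written without a leading `//`.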

Count the elements of "XPATH1"
$ xml [[sel|select]] [[-t|--template]] [[-v|--value-of]] "count([XPATH1])" [path/to/input.xml|URI]
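
The counting form can be sketched the same way. This again assumes `xml` is xmlstarlet; the sample document and variable names are made up for the example.

```shell
# Locate xmlstarlet: the binary is named `xmlstarlet` or `xml` depending on the distribution.
XML_BIN=$(command -v xmlstarlet || command -v xml || true)

# Hypothetical sample document with three <book> elements.
sample=$(mktemp)
printf '%s\n' '<library><book/><book/><book/></library>' > "$sample"

if [ -n "$XML_BIN" ]; then
  # count() is an XPath 1.0 function, so no -m/--match loop is needed:
  # the whole expression is evaluated once and its value is printed.
  book_count=$("$XML_BIN" sel -t -v "count(//book)" "$sample")
  echo "books: $book_count"
fi

rm -f "$sample"
```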

Count all nodes in one or more XML documents
$ xml [[sel|select]] [[-T|--text]] [[-t|--template]] [[-f|--inp-name]] [[-o|--output]] " " [[-v|--value-of]] "count(//node())" [[-n|--nl]] [path/to/input1.xml|URI] [path/to/input2.xml|URI]
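
A sketch of the per-file report, assuming `xml` is xmlstarlet. The two sample files and all variable names are hypothetical. The expression used here is `count(//node())`, which descends through the whole tree; a bare `node()` evaluated at the document root would select only the root's direct children.

```shell
# Locate xmlstarlet: the binary is named `xmlstarlet` or `xml` depending on the distribution.
XML_BIN=$(command -v xmlstarlet || command -v xml || true)

# Two hypothetical input documents.
workdir=$(mktemp -d)
printf '%s\n' '<a><b/><c/></a>' > "$workdir/one.xml"                   # 3 nodes: a, b, c
printf '%s\n' '<a><b>text</b><!-- note --></a>' > "$workdir/two.xml"   # 4 nodes: a, b, text, comment

if [ -n "$XML_BIN" ]; then
  # -f prints the input name, -o " " a literal separator, -v the count, -n a newline;
  # the template is applied once per input file.
  report=$("$XML_BIN" sel -T -t -f -o " " -v "count(//node())" -n \
    "$workdir/one.xml" "$workdir/two.xml")
  printf '%s\n' "$report"
fi

rm -rf "$workdir"
```

Each output line consists of the input name followed by a space and that document's node count.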

Display help
$ xml [[sel|select]] --help

SYNOPSIS

xml-select [OPTIONS] XPath_expression [FILE...]
xml-select [OPTIONS] -f XPATH_FILE [FILE...]

PARAMETERS

XPath_expression
    The XPath expression used to select nodes from the XML document. When it
    is not supplied with -e or -f, it is the first non-option argument.

FILE...
    One or more XML files to process. If no files are specified, xml-select
    reads XML data from standard input.

-e expression
    Specifies the XPath expression to evaluate explicitly, instead of
    passing it as the first non-option argument.

-f XPATH_FILE
    Reads the XPath expression from the specified XPATH_FILE instead of
    from the command line.

-i
    Ignores insignificant whitespace when parsing the XML document. This can
    be useful for cleaner output or to avoid matching whitespace-only text nodes.

-p
    Pretty-prints the XML output, adding indentation and newlines to the
    resulting XML fragments.

-v
    Outputs only the value (text content for elements, attribute value for
    attributes) of the selected node(s), stripping all XML tags from the output.

-N prefix=uri
    Defines a namespace mapping. This allows XPath expressions to use prefixes
    for elements and attributes that belong to specific XML namespaces (e.g.,
    -N soap=http://schemas.xmlsoap.org/soap/envelope/). Can be used
    multiple times to define multiple namespaces.
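
    As an illustration of why a prefix mapping is needed, the sketch below
    uses the `xml sel` form from the TLDR section (assumed to be xmlstarlet,
    whose `sel` subcommand also accepts -N prefix=uri) rather than the
    xml-select synopsis above. The namespace URI, the prefix "x", the sample
    document, and the variable names are all hypothetical.

```shell
# Locate xmlstarlet: the binary is named `xmlstarlet` or `xml` depending on the distribution.
XML_BIN=$(command -v xmlstarlet || command -v xml || true)

# Hypothetical document whose elements live in a default namespace.
sample=$(mktemp)
cat > "$sample" <<'EOF'
<data xmlns="http://example.com/ns">
  <item>first</item>
  <item>second</item>
</data>
EOF

if [ -n "$XML_BIN" ]; then
  # An unprefixed //item would match nothing here, because in XPath 1.0 an
  # unprefixed name only matches elements in no namespace.
  # Binding the prefix "x" to the document's namespace URI makes them reachable.
  items=$("$XML_BIN" sel -N x=http://example.com/ns \
    -t -m "//x:data/x:item" -v "." -n "$sample")
  printf '%s\n' "$items"
fi

rm -f "$sample"
```

    The prefix chosen on the command line does not have to match any prefix
    used in the document; only the namespace URI has to agree.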

-h
    Displays a brief help message and exits.

-V
    Displays version information for xml-select and exits.

DESCRIPTION

xml-select is a command-line utility for extracting data from XML documents. It uses
XPath expressions to select specific elements, attributes, or their text content.
Input is read from one or more files or, if none are given, from standard input. The
matching XML fragments or their values are written to standard output, which makes the
tool suited to scripting, data extraction, and quick inspection of XML structures without
writing custom scripts or full XSLT transformations. It is built on the
Perl XML::Twig module.

CAVEATS

xml-select targets XPath 1.0, so features from
XPath 2.0 or later may not be available. Because it is a Perl script, it may
be slower on very large XML files than natively compiled XML parsers.
The tool is part of the xml-twig-tools package, which is not
pre-installed on all Linux distributions.

USAGE EXAMPLES

Extracting an element:
xml-select '//book/title' library.xml

Extracting text content:
xml-select -v '//book/author' library.xml

Handling namespaces:
xml-select -N x='http://example.com/ns' '//x:data/x:item' document.xml

Piping input:
curl -s http://example.com/api/data.xml | xml-select '//item/price'
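
The piping pattern can also be tried without a network request. The sketch below uses the `xml sel` form from the TLDR section (assumed to be xmlstarlet, which reads standard input when no file argument is given) and substitutes `printf` for `curl`; the inline document and variable names are hypothetical.

```shell
# Locate xmlstarlet: the binary is named `xmlstarlet` or `xml` depending on the distribution.
XML_BIN=$(command -v xmlstarlet || command -v xml || true)

if [ -n "$XML_BIN" ]; then
  # No file argument is passed, so the document is read from standard input.
  prices=$(printf '%s\n' \
    '<items><item><price>9.50</price></item><item><price>12.00</price></item></items>' \
    | "$XML_BIN" sel -t -m "//item/price" -v "." -n)
  printf '%s\n' "$prices"
fi
```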

XPATH VERSION SUPPORT

xml-select's XPath engine primarily conforms to the
XPath 1.0 specification. Expressions that rely on XPath 2.0 or
later features require a processor that implements those versions.

HISTORY

xml-select is part of the xml-twig-tools suite,
developed by Michel Rodriguez. It is based on the
XML::Twig Perl module, which parses documents in an
event-driven fashion and can process large XML documents
in chunks rather than loading the entire document into memory.

SEE ALSO

xmlstarlet(1), xsltproc(1), xmllint(1), grep(1), awk(1)
