xml-select
Extract data from XML documents
TLDR
Select all elements matching "XPATH1" and print the value of their sub-element "XPATH2"
Match "XPATH1" and print the value of "XPATH2" as text with newlines
Count the elements of "XPATH1"
Count all nodes in one or more XML documents
Display help
SYNOPSIS
xml-select [OPTIONS] XPath_expression [FILE...]
xml-select [OPTIONS] -e XPath_expression [FILE...]
xml-select [OPTIONS] -f XPATH_FILE [FILE...]
PARAMETERS
XPath_expression
The XPath expression used to select nodes from the XML document. When
neither -e nor -f is given, it is taken from the first non-option argument.
FILE...
One or more XML files to process. If no files are specified, xml-select
reads XML data from standard input.
-e expression
Specifies the XPath expression to evaluate explicitly, instead of passing
it as the first non-option argument.
-f XPATH_FILE
Reads the XPath expression from the specified XPATH_FILE instead of
from the command line.
-i
Ignores insignificant whitespace when parsing the XML document. This can
be useful for cleaner output or to avoid matching whitespace-only text nodes.
-p
Pretty-prints the XML output, adding indentation and newlines to enhance
readability of the resulting XML fragments.
-v
Outputs only the value (text content for elements, attribute value for
attributes) of the selected node(s), stripping all XML tags from the output.
-N prefix=uri
Defines a namespace mapping. This allows XPath expressions to use prefixes
for elements and attributes that belong to specific XML namespaces (e.g.,
-N soap=http://schemas.xmlsoap.org/soap/envelope/). Can be used
multiple times to define multiple namespaces.
-h
Displays a brief help message and exits.
-V
Displays version information for xml-select and exits.
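As a rough illustration of how the options above combine, the following sketch writes a small namespaced document and an expression file, then queries them with -N, -v and -f. The file names, the x prefix and the http://example.com/ns URI are made up for the example, and the flags are assumed to behave as described in this section. The query only runs if xml-select is found on the PATH.

```shell
# Sketch only: file names and the namespace URI are hypothetical.
workdir=$(mktemp -d)

# A small namespaced document to query.
cat > "$workdir/feed.xml" <<'EOF'
<?xml version="1.0"?>
<x:data xmlns:x="http://example.com/ns">
  <x:item id="1">first</x:item>
  <x:item id="2">second</x:item>
</x:data>
EOF

# Keep the XPath expression in a file so it can be passed with -f.
printf '%s\n' '//x:data/x:item' > "$workdir/items.xpath"

if command -v xml-select >/dev/null 2>&1; then
  # -N maps the prefix used in the expression, -v prints values only,
  # -f reads the expression from the file written above.
  xml-select -N x='http://example.com/ns' -v \
    -f "$workdir/items.xpath" "$workdir/feed.xml"
else
  echo "xml-select is not installed; skipping the query" >&2
fi
```

The temporary directory in $workdir is left in place so the files can be inspected; remove it when done.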
DESCRIPTION
xml-select is a command-line utility for extracting data from XML documents. It uses XPath
expressions to select elements, attributes, or their text content. Input comes from one or more
files or, if none are given, from standard input; the matching XML fragments or their values are
written to standard output. This makes it suited to scripting, data extraction, and quick
inspection of XML structures without writing custom scripts or full XSLT transformations.
It is built on the Perl XML::Twig module.
CAVEATS
xml-select primarily supports XPath 1.0, so advanced features from
XPath 2.0 or later may not be available. As a Perl script, its performance
on extremely large XML files might not match native compiled XML parsers.
The tool is part of the xml-twig-tools package, which may not be
pre-installed on all Linux distributions.
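Because the tool may be absent, a script can check for it before use. This is a minimal sketch using the standard command -v lookup; the status variable is just an illustrative name.

```shell
# Check whether xml-select is on the PATH before relying on it.
if command -v xml-select >/dev/null 2>&1; then
  status="available"
else
  status="missing"
fi
echo "xml-select: $status"
```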
USAGE EXAMPLES
Extracting an element:
xml-select '//book/title' library.xml
Extracting text content:
xml-select -v '//book/author' library.xml
Handling namespaces:
xml-select -N x='http://example.com/ns' '//x:data/x:item' document.xml
Piping input:
curl http://example.com/api/data.xml | xml-select '//item/price'
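The examples above refer to a library.xml whose contents are not shown. To try them, a hypothetical file with matching element names can be created as sketched below; the titles and authors are placeholder data, and the commands run only if xml-select is installed.

```shell
# Hypothetical sample data; element names match the expressions used above.
sample=$(mktemp -d)/library.xml
cat > "$sample" <<'EOF'
<library>
  <book>
    <title>First Title</title>
    <author>Author One</author>
  </book>
  <book>
    <title>Second Title</title>
    <author>Author Two</author>
  </book>
</library>
EOF

if command -v xml-select >/dev/null 2>&1; then
  xml-select '//book/title' "$sample"        # matching <title> elements
  xml-select -v '//book/author' "$sample"    # author values only
  xml-select -v '//book/title' < "$sample"   # same data read from standard input
else
  echo "xml-select not found on PATH" >&2
fi
```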
XPATH VERSION SUPPORT
xml-select's XPath engine primarily conforms to the
XPath 1.0 specification. Users requiring XPath 2.0 or
later features might need to explore alternative XML processing tools.
HISTORY
xml-select is part of the xml-twig-tools suite, developed by
Michel Rodriguez. It is built on the XML::Twig Perl module, whose
event-driven parser handles large XML documents by processing them
in chunks rather than loading the entire document into memory.
SEE ALSO
xmlstarlet(1), xsltproc(1), xmllint(1), grep(1), awk(1)