LinuxCommandLibrary

xmllint

Validate and format XML documents

TLDR

Return all elements (tags) named "foo"
$ xmllint --xpath "//foo" [source_file.xml]

Return the text content of the first element named "foo" as a string
$ xmllint --xpath "string(//foo)" [source_file.xml]

Return the href attribute of the second anchor element in an HTML file
$ xmllint --html --xpath "string((//a)[2]/@href)" webpage.xhtml

Return human-readable (indented) XML from file
$ xmllint --format [source_file.xml]

Check that an XML file meets the requirements of its DOCTYPE declaration
$ xmllint --valid [source_file.xml]

Validate an XML file against a DTD hosted online
$ xmllint --dtdvalid [URL] [source_file.xml]

SYNOPSIS

xmllint [OPTIONS] [FILE... | -]
FILE... can be one or more XML/HTML files. Use - to read from standard input.
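As a minimal sketch of the "-" convention, the snippet below pipes a small, hypothetical document into xmllint on standard input and pretty-prints it with --format. The sample element names are illustrative only.

```shell
#!/bin/sh
# Sketch: pretty-print a document read from standard input ("-" as FILE).
# The sample document is hypothetical.
set -eu
printf '<root><item>a</item><item>b</item></root>' | xmllint --format -
# Expected output:
# <?xml version="1.0"?>
# <root>
#   <item>a</item>
#   <item>b</item>
# </root>
```

The same "-" argument works with the other options, so xmllint can sit at the end of a pipeline that generates XML.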

PARAMETERS

--noout
    Suppress output of the resulting document. Combine with validation options such as --valid or --schema to run integrity checks that report only errors.

--valid
    Validate the input document against the DTD referenced by its DOCTYPE declaration. Reports validity errors.

--dtdvalid URL
    Validate the input document against an external DTD specified by URL.

--schema URL
    Validate the document against an XML Schema specified by URL.

--relaxng URL
    Validate the document against a RELAX NG grammar specified by URL.

--xpath EXPR
    Evaluate an XPath expression EXPR on the document and print the results.

--html
    Parse the input document as HTML instead of XML, applying HTML-specific parsing rules.

--format
    Reformat and reindent the output, pretty-printing the XML document. The XMLLINT_INDENT environment variable sets the indentation string (two spaces by default).

--noblanks
    Remove ignorable blank nodes (whitespace) from the output, often used with --format.

--c14n
    Canonicalize the output, producing a canonical XML representation (Canonical XML 1.0).

--encoding ENCODING
    Output the document in the given character encoding (e.g., UTF-8, ISO-8859-1). This applies to full documents, not to fragments or XPath results.

--output FILE
    Direct the output of xmllint to the specified FILE instead of standard output.

--nowarning
    Do not display warnings, only errors.

--noent
    Substitute entity references with their values.

--version
    Display the version of xmllint and its underlying libxml2 library.

--help
    Print a summary of command-line options.
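One practical use of --c14n is checking whether two documents are equivalent even though their serializations differ. The sketch below assumes two hypothetical one-line documents that differ only in attribute order, quote style, spacing inside the tag, and empty-element syntax; after canonicalization both produce the same bytes.

```shell
#!/bin/sh
# Sketch: compare two (hypothetical) documents by canonicalizing both.
# They differ only in attribute order, quoting, spacing, and empty-element form.
set -eu
a='<config b="1" a="2"/>'
b="<config a='2'  b='1'></config>"

# Canonical XML 1.0 sorts attributes, uses double quotes, and expands empty elements.
ca=$(printf '%s' "$a" | xmllint --c14n -)
cb=$(printf '%s' "$b" | xmllint --c14n -)

printf '%s\n' "$ca"   # <config a="2" b="1"></config>
if [ "$ca" = "$cb" ]; then
  echo "canonically equal"
else
  echo "different"
fi
```

Comparing canonical forms avoids false differences that a plain text diff would report.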

DESCRIPTION

xmllint is a command-line utility from the libxml2 project for parsing, validating, and formatting XML and HTML documents. It checks documents for well-formedness (syntactic correctness) and, on request, for validity against a DTD, an XML Schema, or a RELAX NG grammar. It can also pretty-print XML for human review and evaluate XPath expressions to extract specific data. Its error messages give the file, line number, and nature of each problem, which makes it useful for debugging and for checking the integrity of configuration files, web-service payloads, and other XML data.

CAVEATS

xmllint expects well-formed XML; input that is not well-formed is reported as an error (the --html parser is more lenient). By default it builds the entire document tree in memory, so extremely large or complex documents can consume significant memory and CPU; the --stream option uses the streaming reader API instead, which helps when validating files too large to hold in memory. Its error messages are detailed but can be cryptic to users unfamiliar with XML parsing. Network access for DTD and schema resolution can be disabled with --nonet, for security or performance reasons.
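A minimal sketch of checking a large file without network access: it generates a hypothetical 10,000-record document in a temporary directory, then parses it with --stream and --nonet. The file name and element names are assumptions for illustration.

```shell
#!/bin/sh
# Sketch: generate a hypothetical large file, then check it with the
# streaming reader (--stream) and with network fetches disabled (--nonet).
set -eu
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT

# Write 10,000 small records; the element names are illustrative only.
{
  echo '<records>'
  i=0
  while [ "$i" -lt 10000 ]; do
    echo "  <record id=\"$i\"/>"
    i=$((i + 1))
  done
  echo '</records>'
} > "$tmp/big.xml"

# Record the exit status instead of aborting on failure.
status=0
xmllint --nonet --stream --noout "$tmp/big.xml" || status=$?
echo "exit status: $status"
```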

EXIT STATUS

xmllint returns 0 on success: the document is well-formed, passes any requested validation, and the command completed normally. A non-zero exit status signals a failure such as a parse error, a validation failure, or invalid arguments; distinct codes distinguish classes of failure (see xmllint(1)). This makes xmllint suitable for shell scripts and automated build processes.
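As a sketch of branching on the exit status, the loop below checks two hypothetical inline documents, the second of which has a mismatched closing tag. Only the zero/non-zero distinction is relied on; the exact non-zero value is printed rather than assumed.

```shell
#!/bin/sh
# Sketch: branch on xmllint's exit status. The two sample documents are
# hypothetical; the second has a mismatched closing tag.
for doc in '<a><b/></a>' '<a><b></a>'; do
  status=0
  # --noout: parse only; "-" reads the document from standard input.
  printf '%s' "$doc" | xmllint --noout - 2>/dev/null || status=$?
  if [ "$status" -eq 0 ]; then
    echo "well-formed: $doc"
  else
    echo "not well-formed (exit status $status): $doc"
  fi
done
```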

COMMON USE CASES

Beyond basic validation, xmllint is frequently used in CI/CD pipelines to validate configuration files (e.g., Jenkins job XMLs, Maven POMs), perform quick syntax checks on generated XML output, and pretty-print XML data for easier debugging and review. Its XPath capabilities are invaluable for extracting specific data points from XML documents in scripts.
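A common pitfall when extracting values from files such as Maven POMs is the default namespace: a plain "//version" matches nothing, because unprefixed names in XPath 1.0 select only elements with no namespace. The sketch below works around this with local-name() on a minimal, hypothetical POM.

```shell
#!/bin/sh
# Sketch: pull the version out of a minimal, hypothetical Maven POM.
# POM elements live in a default namespace, so "//version" would not match;
# local-name() ignores the namespace and compares only the element name.
set -eu
pom='<project xmlns="http://maven.apache.org/POM/4.0.0">
  <modelVersion>4.0.0</modelVersion>
  <version>1.2.3</version>
</project>'

version=$(printf '%s\n' "$pom" |
  xmllint --xpath "string(/*[local-name()='project']/*[local-name()='version'])" -)
echo "version: $version"   # version: 1.2.3
```

Wrapping the expression in string() makes xmllint print the text value rather than the serialized element, which is easier to capture in a shell variable.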

HISTORY

xmllint originated as a command-line interface to libxml2, a C library for XML parsing developed by Daniel Veillard starting around 1999. libxml2 was designed to be fast, robust, and compliant with W3C XML standards. xmllint quickly became a popular and often default XML tool on Unix-like operating systems due to its reliability and comprehensive feature set, providing a simple yet powerful way to interact with XML documents without complex programming. It has since been a staple in scripting, development workflows, and automated validation tasks.

SEE ALSO

xmlstarlet(1), xsltproc(1), grep(1), awk(1), libxml2(3)
