xml-c14n

Canonicalize XML documents

TLDR

View documentation for the original command

$ tldr xml canonic

SYNOPSIS

xml-c14n [OPTIONS] [FILE]
Reads XML from FILE or standard input and outputs canonicalized XML.

-o, --output file
    Specifies the output file for the canonicalized XML. If not specified, output goes to standard output.

-i, --in-place
    Performs canonicalization on the input file directly, overwriting its content. Use with caution.

-e, --exc-c14n
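    Uses Exclusive XML Canonicalization 1.0, which emits only the namespace declarations that the output visibly uses, so canonicalized fragments do not depend on namespaces declared in their enclosing context.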

-C, --with-comments
    Includes comments in the canonicalized output. Without this option, comments are omitted, which matches the default (without comments) form of C14N.

-I, --id id
    Canonicalizes only the element with the given ID and its descendants, treating that element as the root of the output.

-P, --prefix-ns
    Used with exclusive canonicalization, prefixes namespace attributes with 'xml' if necessary.

--version
    Displays version information and exits.

--help
    Displays a help message with usage information and exits.
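The default and --with-comments behaviors can be compared with a short Python sketch. It assumes Python 3.8 or later and uses the standard library's xml.etree.ElementTree.canonicalize as a stand-in for xml-c14n. That function implements Canonical XML 2.0 rather than the 1.0 variants described here, but it handles comments the same way: omitted by default, kept when with_comments=True.

```python
import xml.etree.ElementTree as ET

# Stand-in for xml-c14n: Python's canonicalize implements C14N 2.0, not 1.0.
doc = "<config><!-- tuned for production --><timeout>30</timeout></config>"

# Default behavior: comments are dropped from the canonical form.
without_comments = ET.canonicalize(doc)

# Comparable to passing --with-comments: comments are kept.
with_comments = ET.canonicalize(doc, with_comments=True)

print(without_comments)  # <config><timeout>30</timeout></config>
print(with_comments)     # <config><!-- tuned for production --><timeout>30</timeout></config>
```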

DESCRIPTION

xml-c14n is a command-line tool for canonicalizing XML documents. XML canonicalization (C14N) transforms an XML document into a single physical representation, a byte stream, so that logically equivalent documents yield the same bytes regardless of which parser or environment last serialized them. This consistency is crucial for applications such as XML digital signatures: the signature is computed over the canonical form, so serialization differences like attribute order, quote style, or namespace-declaration placement do not invalidate it, while any change to the document's logical content still does.

The tool reads an XML document from a file or standard input and writes its canonical form to standard output or an output file. It supports both Canonical XML 1.0 and Exclusive XML Canonicalization 1.0. The exclusive form, common in SOAP and WS-Security, omits namespace declarations that the canonicalized content does not visibly use, so a signed fragment does not depend on its surrounding document. By default, xml-c14n removes comments, replaces character and entity references with the text they stand for, sorts attributes, and normalizes line endings and whitespace inside tags. Whitespace in character content is significant and is preserved. The result is a deterministic byte-for-byte representation of the document's logical content.
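To illustrate that determinism, the following sketch canonicalizes two different serializations of the same document and compares the results. It assumes Python 3.8+ and substitutes xml.etree.ElementTree.canonicalize (Canonical XML 2.0) for xml-c14n, so it shows the rules the two standards share for attribute order, quoting, empty elements, and character references, not the tool's exact output.

```python
import xml.etree.ElementTree as ET

# Stand-in for xml-c14n: Python's canonicalize implements C14N 2.0, not 1.0.
# Two serializations of the same logical document. They differ in attribute
# order, quote style, whitespace inside tags, empty-element syntax, and the
# use of a character reference (&#65; is "A").
doc_a = '<order id="7" currency="EUR"><item sku="A1"/></order>'
doc_b = "<order   currency='EUR' id='7' ><item sku='&#65;1'></item></order >"

canon_a = ET.canonicalize(doc_a)
canon_b = ET.canonicalize(doc_b)

print(canon_a)             # <order currency="EUR" id="7"><item sku="A1"></item></order>
print(canon_a == canon_b)  # True
```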

CAVEATS

Canonicalization operates on the parsed document, so DTDs and external entities can affect the result: Canonical XML 1.0 adds default attribute values declared in a DTD and expands entity references. While xml-c14n handles standard cases, edge cases involving DTD processing or unresolvable external references may lead to unexpected results or errors. Use the --in-place option with extreme caution: it overwrites the original file, which risks data loss if an error occurs.
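One common way to reduce the data-loss risk of in-place rewriting is to write the result to a temporary file and rename it over the original only after canonicalization succeeds. The sketch below shows that pattern; it is not a description of how xml-c14n implements --in-place. The helper name canonicalize_in_place is hypothetical, and Python's xml.etree.ElementTree.canonicalize (Canonical XML 2.0, Python 3.8+) stands in for the tool.

```python
import os
import shutil
import tempfile
import xml.etree.ElementTree as ET


def canonicalize_in_place(path):
    """Hypothetical helper: replace the file at `path` with its canonical form.

    The canonical output goes to a temporary file in the same directory and is
    renamed over the original only if canonicalization succeeds, so a parse
    error leaves the original file untouched. Uses C14N 2.0 as a stand-in.
    """
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".c14n.tmp")
    try:
        # newline="" stops Python from translating "\n" on platforms like Windows.
        with os.fdopen(fd, "w", encoding="utf-8", newline="") as tmp:
            ET.canonicalize(from_file=path, out=tmp)
        shutil.copymode(path, tmp_path)  # keep the original file's permissions
        os.replace(tmp_path, path)       # atomic rename on POSIX, same filesystem
    except BaseException:
        os.unlink(tmp_path)  # discard the partial output; original is intact
        raise
```

Keeping the temporary file in the same directory matters: a rename is only atomic within one filesystem.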

CANONICALIZATION STANDARDS

xml-c14n implements Canonical XML 1.0 by default; the --exc-c14n option selects Exclusive XML Canonicalization 1.0 instead. These standards define precise rules for transforming an XML document into canonical form, ensuring that logically equivalent documents produce identical byte sequences regardless of serialization details such as attribute order, quote style, or whitespace inside tags. Whitespace in character content is part of the logical content and is preserved.
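The namespace behavior that Exclusive XML Canonicalization is known for, and the preservation of text whitespace, can be observed with a short Python sketch. This is an approximation under stated assumptions: xml.etree.ElementTree.canonicalize (Python 3.8+) implements Canonical XML 2.0, which, like the exclusive form, declares a namespace only on elements that use it. It is not an implementation of Exclusive XML Canonicalization 1.0 itself, and the urn:example:* namespace URIs are placeholders.

```python
import xml.etree.ElementTree as ET

# Stand-in: C14N 2.0 via Python's canonicalize, not Exclusive C14N 1.0.
# A SOAP-like envelope that declares the "app" prefix on the root element,
# although only the nested <app:Data> element uses it.
envelope = (
    '<soap:Envelope xmlns:soap="urn:example:soap" xmlns:app="urn:example:app">'
    "<soap:Body><app:Data>42</app:Data></soap:Body>"
    "</soap:Envelope>"
)

canonical = ET.canonicalize(envelope)
print(canonical)
# The unused xmlns:app declaration is dropped from <soap:Envelope> and
# emitted instead on <app:Data>, the first element that uses the prefix.

# Whitespace inside tags is normalized, but whitespace in text content is kept.
tag_ws_equal = ET.canonicalize('<p  lang="en" >a b</p>') == ET.canonicalize('<p lang="en">a b</p>')
text_ws_equal = ET.canonicalize('<p lang="en">a b</p>') == ET.canonicalize('<p lang="en">a  b</p>')
print(tag_ws_equal, text_ws_equal)  # True False
```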

INPUT AND OUTPUT

The command reads XML from the FILE argument if one is given; otherwise it reads from standard input (stdin), so it can consume the output of another command through a pipe. The canonicalized output goes to standard output (stdout) unless an output file is specified with the --output option.
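The FILE-or-stdin convention can be sketched as a small Python filter. This is an illustration under assumptions, not xml-c14n's source: the function name run is hypothetical, only the FILE argument is handled (no option parsing), and xml.etree.ElementTree.canonicalize (Canonical XML 2.0, Python 3.8+) replaces the tool's own canonicalizer.

```python
import sys
import xml.etree.ElementTree as ET


def run(argv, stdin=None, stdout=None):
    """Hypothetical filter: canonicalize argv[1] if present, else standard input.

    The canonical XML (C14N 2.0 stand-in) is written to `stdout`,
    which defaults to sys.stdout.
    """
    stdin = sys.stdin if stdin is None else stdin
    stdout = sys.stdout if stdout is None else stdout
    # A path argument wins; otherwise read the document from the input stream.
    source = argv[1] if len(argv) > 1 else stdin
    ET.canonicalize(from_file=source, out=stdout)


if __name__ == "__main__":
    run(sys.argv)
```

Saved under a hypothetical name such as c14n_filter.py, it could sit in a pipeline like `cat doc.xml | python3 c14n_filter.py > canonical.xml`. The stream parameters default to the process streams but can be overridden, which keeps the function testable without touching real standard input or output.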

HISTORY

XML canonicalization was standardized by the World Wide Web Consortium (W3C) to provide a consistent byte-stream representation of XML documents, primarily for XML digital signatures. The first recommendation, Canonical XML 1.0, was published in 2001. Exclusive XML Canonicalization 1.0 followed in 2002 for Web Services scenarios in which a signed XML fragment, such as part of a SOAP message, is moved into a different document context. The xml-c14n command builds on libxml2, a widely used XML toolkit in the Linux ecosystem.