xml-escape
Escape characters for XML/HTML
TLDR
Escape special XML characters in a string
Escape special XML characters from stdin
Display help
SYNOPSIS
xml-escape
DESCRIPTION
The xml-escape command is a utility designed to convert special characters found in text data into their corresponding XML entity references. This process, known as XML escaping, is crucial when embedding arbitrary text within XML documents to prevent parsing errors and ensure the document remains well-formed.
It typically reads input from standard input (stdin) and writes the escaped output to standard output (stdout). The characters commonly escaped include: less than (< becomes &lt;), greater than (> becomes &gt;), ampersand (& becomes &amp;), single quote (' becomes &apos;), and double quote (" becomes &quot;).
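Where xml-escape is not installed, the mapping described above can be approximated with sed(1). This is only a sketch: the function name xml_escape_sed is hypothetical, and the real command's output may differ in detail (for example, in how it represents the single quote). The ampersand is handled first so that the entities produced by the later substitutions are not escaped a second time.

```shell
# Hypothetical stand-in for xml-escape, built from sed substitutions.
# Reads stdin, writes stdout. '&' must be replaced first; otherwise the
# '&' in '&lt;' and the other entities would itself be escaped again.
xml_escape_sed() {
    sed -e 's/&/\&amp;/g' \
        -e 's/</\&lt;/g' \
        -e 's/>/\&gt;/g' \
        -e "s/'/\&apos;/g" \
        -e 's/"/\&quot;/g'
}

# Usage: printf '%s\n' '<a href="x">Q&A</a>' | xml_escape_sed
```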
This command is often used in scripting pipelines where data needs to be prepared for inclusion in XML element content or attribute values, ensuring that the XML parser interprets the content as text rather than as structural markup. Text destined for a CDATA section should not be escaped this way, because entity references are not expanded inside CDATA and would appear literally.
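As a rough illustration of preparing text for an XML document, the sketch below escapes a value once and places it in both an attribute and an element body. The function name make_item, the item element, the title attribute, and the XML_ESCAPE override variable are all hypothetical; the sketch assumes xml-escape reads stdin and writes stdout as described above.

```shell
# make_item TEXT: print an <item> element whose attribute and body hold TEXT.
# XML_ESCAPE is a hypothetical override hook; it defaults to xml-escape.
make_item() {
    escaped=$(printf '%s' "$1" | "${XML_ESCAPE:-xml-escape}") || return 1
    # Double quotes delimit the attribute, so '"' must be among the
    # characters the filter escapes.
    printf '<item title="%s">%s</item>\n' "$escaped" "$escaped"
}

# Usage: make_item 'Tom & Jerry <uncut>'
```

Escaping happens before the markup is assembled, so only the untrusted text is transformed and the surrounding tags stay intact.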
CAVEATS
Limited Scope: The command primarily focuses on the five standard XML predefined entities (&lt;, &gt;, &amp;, &apos;, &quot;). It does not handle other character encoding conversions or arbitrary Unicode character escaping beyond these basic entities.
No Validation: xml-escape is a simple filter; it does not perform any XML validation on its input. It assumes the input is plain text or partial XML content that needs specific characters escaped.
Input/Output Streams: It typically operates on standard input (stdin) and standard output (stdout). Users must manage file redirection appropriately, as it does not usually take file arguments directly.
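Because the command is described as a pure stdin/stdout filter, handling several files means wrapping it in a loop with explicit redirection. The following is a minimal sketch under that assumption; the function name escape_files, the .escaped suffix, and the XML_ESCAPE override variable are hypothetical.

```shell
# escape_files FILE...: write an escaped copy of each FILE to FILE.escaped.
# XML_ESCAPE is a hypothetical override hook; it defaults to xml-escape,
# which is assumed to read stdin and write stdout.
escape_files() {
    for f in "$@"; do
        # Redirect each file in and out; stop at the first failure.
        "${XML_ESCAPE:-xml-escape}" < "$f" > "$f.escaped" || return 1
    done
}

# Usage: escape_files notes.txt titles.txt
```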
COMMON USAGE
The xml-escape command is designed to be used as a filter in shell pipelines: it reads data from standard input and prints the escaped data to standard output, which makes it easy to slot into data-transformation scripts.
Examples:
1. Escaping a literal string:
echo "<tag>Value & More</tag>" | xml-escape
2. Escaping content from a file:
xml-escape < input.txt > escaped.txt
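3. Wrapping escaped output in an element before piping it to another XML utility (escaped text alone is not a well-formed document):
{ printf '<data>'; generate_data_command | xml-escape; printf '</data>\n'; } | xmllint --format -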
HISTORY
The concept of XML escaping is fundamental to XML itself, introduced with its specification. Utilities like xml-escape emerged as part of larger XML processing toolkits, such as the libxml2 library. libxml2 is a widely used C library for parsing and manipulating XML documents, developed by Daniel Veillard. Simple command-line tools like xml-escape are often provided as convenient wrappers around core library functions, making basic XML operations accessible from the shell. Its development history is tied to the evolution of XML tools on Unix-like systems, emphasizing simplicity and pipeline compatibility.
SEE ALSO
xmlstarlet(1), xmllint(1), sed(1), awk(1)