xml-unescape
Convert XML entities to their literal characters
TLDR
Unescape special XML characters from a string
Unescape special XML characters from stdin
Display help
SYNOPSIS
xml-unescape [OPTIONS] [FILE...]
PARAMETERS
-h, --help
Displays a brief help message and exits.
-v, --version
Shows version information about the utility and exits.
-o FILE, --output=FILE
Writes the unescaped output to the specified FILE instead of standard output (stdout).
--input-encoding=ENCODING
Specifies the character encoding of the input data. This helps in correctly interpreting multi-byte characters and entity references.
--output-encoding=ENCODING
Specifies the character encoding for the output data. The output will be transcoded to this encoding if different from the input.
--strict
Enables strict parsing mode. The command will exit with an error if it encounters malformed or unrecognized XML entities, instead of attempting to ignore or fix them.
FILE...
One or more input files to be processed. If no files are specified, the command reads from standard input (stdin).
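Because a packaged xml-unescape is rarely available, the options above are easiest to read as the interface of a small script. What follows is a hypothetical Python sketch, not the source of any existing utility. The names unescape_xml and main, the UTF-8 default for --input-encoding, the fallback of --output-encoding to the input encoding, and the choice to leave unrecognized text untouched in non-strict mode are all assumptions made for illustration.

```python
import argparse
import re
import sys

# Matches every "&". Group 1 is set only when the ampersand starts a
# syntactically well-formed reference such as &lt; &#60; or &#x3C;.
ENTITY_RE = re.compile(r"&(?:(#x[0-9A-Fa-f]+|#[0-9]+|[A-Za-z_][\w.-]*);)?")
PREDEFINED = {"lt": "<", "gt": ">", "amp": "&", "quot": '"', "apos": "'"}


def _decode(body):
    """Return the character for an entity body, or None if it is unknown."""
    try:
        if body.startswith("#x"):
            return chr(int(body[2:], 16))
        if body.startswith("#"):
            return chr(int(body[1:]))
    except (ValueError, OverflowError):
        return None  # code point outside the Unicode range
    return PREDEFINED.get(body)


def unescape_xml(text, strict=False):
    """Decode references in a single pass; decoded output is never re-scanned."""
    def replace(match):
        body = match.group(1)
        char = _decode(body) if body is not None else None
        if char is None:
            if strict:
                raise ValueError(
                    f"malformed or unrecognized entity at offset {match.start()}"
                )
            return match.group(0)  # assumed lenient behavior: keep text as-is
        return char

    return ENTITY_RE.sub(replace, text)


def main(argv=None):
    # -h/--help is provided automatically by argparse.
    parser = argparse.ArgumentParser(prog="xml-unescape")
    parser.add_argument("-v", "--version", action="version",
                        version="%(prog)s (illustrative sketch)")
    parser.add_argument("-o", "--output", metavar="FILE")
    parser.add_argument("--input-encoding", default="utf-8", metavar="ENCODING")
    parser.add_argument("--output-encoding", default=None, metavar="ENCODING")
    parser.add_argument("--strict", action="store_true")
    parser.add_argument("files", nargs="*", metavar="FILE")
    args = parser.parse_args(argv)

    # Read all named files in order, or standard input when none are given.
    if args.files:
        chunks = []
        for name in args.files:
            with open(name, "rb") as handle:
                chunks.append(handle.read())
        raw = b"".join(chunks)
    else:
        raw = sys.stdin.buffer.read()

    try:
        result = unescape_xml(raw.decode(args.input_encoding), args.strict)
    except ValueError as exc:  # also covers undecodable input bytes
        print(f"xml-unescape: {exc}", file=sys.stderr)
        return 1

    data = result.encode(args.output_encoding or args.input_encoding)
    if args.output:
        with open(args.output, "wb") as handle:
            handle.write(data)
    else:
        sys.stdout.buffer.write(data)
    return 0
```

To use the sketch as a command, save it to a file and call sys.exit(main()) under an if __name__ == "__main__": guard. Decoding is done in one regular-expression pass so that text produced by a replacement, such as the & that results from &amp;, is never interpreted as the start of another entity.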
DESCRIPTION
xml-unescape converts XML entity and character references (such as &lt; for <, &amp; for &, &#DD; for decimal character codes, or &#xHH; for hexadecimal character codes) back into their literal characters. This is needed when content has been escaped so that it can be embedded in XML elements, attribute values, or plain text without conflicting with XML's structural markup.
The primary purpose is to recover the content in its original form, whether for human reading or for further machine processing. For example, if an XML document contains the data "This is &lt;XML&gt; content", unescaping it yields "This is <XML> content".
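As a minimal sketch of this operation, Python's standard library offers xml.sax.saxutils.unescape. By default it replaces only &amp;, &lt; and &gt;, so the quote entities are supplied through its optional entities argument. The sample string and the EXTRA_ENTITIES name are illustrative, and numeric references such as &#60; are not decoded by this function.

```python
from xml.sax.saxutils import unescape

# unescape() handles &amp;, &lt; and &gt; on its own; the two quote
# entities are passed explicitly through the `entities` mapping.
EXTRA_ENTITIES = {"&quot;": '"', "&apos;": "'"}

escaped = "This is &lt;XML&gt; content &amp; &quot;more&quot;"
plain = unescape(escaped, EXTRA_ENTITIES)
print(plain)  # This is <XML> content & "more"
```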
CAVEATS
xml-unescape is not a standard standalone command on most Linux distributions; the name describes a common operation on XML data rather than a widely packaged tool. Users usually perform it with XML toolkits such as xmlstarlet, with text utilities such as sed or perl and regular expressions for simple cases, or with scripts in languages like Python or Perl that provide robust XML parsing libraries (e.g., lxml in Python, XML::LibXML in Perl). Invoking 'xml-unescape' directly will therefore fail unless a specific package or custom script provides it.
COMMON XML ENTITIES
The most frequently unescaped XML entities include:
- &lt; converts to < (less than sign)
- &gt; converts to > (greater than sign)
- &amp; converts to & (ampersand)
- &quot; converts to " (double quotation mark)
- &apos; converts to ' (apostrophe or single quotation mark)
- &#DD; converts to the character represented by the decimal number DD (e.g., &#32; for space)
- &#xHH; converts to the character represented by the hexadecimal number HH (e.g., &#x20; for space)
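When the five named entities are decoded with plain string replacement, the order of the replacements matters: &amp; has to be handled last, otherwise text such as &amp;lt; is unescaped twice. The sketch below illustrates this with a deliberately wrong counter-example. The function names and the sample string are hypothetical, and numeric references (&#DD;, &#xHH;) are left out to keep the ordering issue in focus.

```python
# The five predefined XML entities. "&amp;" is deliberately last:
# decoding it earlier would create new entity-like text that a later
# replacement would then decode a second time.
PREDEFINED = [
    ("&lt;", "<"),
    ("&gt;", ">"),
    ("&quot;", '"'),
    ("&apos;", "'"),
    ("&amp;", "&"),
]


def unescape_predefined(text: str) -> str:
    """Decode the five predefined entities, ampersand last."""
    for entity, char in PREDEFINED:
        text = text.replace(entity, char)
    return text


def unescape_wrong_order(text: str) -> str:
    """Counter-example: decoding "&amp;" first double-unescapes."""
    for entity, char in reversed(PREDEFINED):
        text = text.replace(entity, char)
    return text


sample = "a &lt; b &amp;&amp; x == &quot;&amp;lt;&quot;"
print(unescape_predefined(sample))   # a < b && x == "&lt;"
print(unescape_wrong_order(sample))  # a < b && x == "<"
```

The second result is wrong because the original data contained the literal text &lt;, which had been escaped once to &amp;lt; and should come back unchanged.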
USE CASES
XML unescaping is typically performed when:
- Extracting plain text content from an XML document that might contain escaped characters.
- Preparing XML data for display in environments (like web browsers or text editors) that are not XML-aware and expect literal characters.
- Processing data that was stored in an XML-escaped format (e.g., in a database field) and needs to be returned to its original form for further processing or analysis.
- Debugging or inspecting raw XML content whose extensive escaping makes it hard to read.
HISTORY
The need for XML unescaping dates back to the inception of XML itself. XML requires < and & to be escaped wherever they appear as data in element content or attribute values, and a quotation mark to be escaped inside an attribute value delimited by that same mark; > and the remaining quote character are commonly escaped as well. As XML usage grew, so did the need for tools that reverse this escaping, particularly for extracting or displaying plain text content. While dedicated xml-unescape commands are rare, the functionality is built into numerous XML parsers, validators, and command-line processing utilities as a fundamental capability.
SEE ALSO
xmlstarlet(1), sed(1), perl(1), python(1), recode(1)