recode

Convert text file character encodings

SYNOPSIS

recode [options] [recode-chain] [files...]

-f, --force
    Proceed with the recoding even when it is not reversible, that is, when some characters cannot be mapped back to the original encoding.

-i, --sequence=files
    Use intermediate files to pass data between the successive steps of a recoding.

-l, --list[=FORMAT]
    List all known character sets and their aliases. When a charset name is given, print that charset's table instead, in FORMAT (decimal, octal, hexadecimal, or full).

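-t, --touch
    Update the timestamps of recoded files after replacement. recode has no output-file option: named files are overwritten in place, so to write the result elsewhere, run recode as a filter and redirect standard input and standard output in the shell.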

-s, --strict
    Use strict mappings only. Characters with no explicit counterpart in the target charset are dropped rather than given an arbitrary assignment that keeps the recoding reversible.

-v, --verbose
    Explain the sequence of recoding steps chosen for the conversion and report progress as files are recoded.

-q, --quiet, --silent
    Suppress the messages recode normally issues when a recoding is not reversible, that is, when data may be lost.

-d, --diacritics
    When recoding to HTML or LaTeX, convert only diacritics and similar special characters, leaving the characters that belong to the markup syntax untouched.

--help
    Display a summary of command usage and options, then exit. There is no short form: -h selects --header, which writes a charset table to standard output as source code.

--version
    Display version information and exit. This option has no short form.
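
The kind of information loss that -f accepts can be illustrated outside recode. The sketch below uses Python's built-in codecs, not recode's own tables, and the helper name is_reversible is hypothetical; recode's handling of unmappable characters differs in detail.

```python
def is_reversible(text: str, target: str) -> bool:
    """Return True if text survives a round trip through the target encoding."""
    try:
        return text.encode(target).decode(target) == text
    except UnicodeEncodeError:
        # At least one character has no representation in the target.
        return False


print(is_reversible("café", "latin-1"))        # True
print(is_reversible("price: 5 €", "latin-1"))  # False: Latin-1 has no euro sign

# Forcing the conversion anyway substitutes a placeholder and loses data.
lossy = "price: 5 €".encode("latin-1", errors="replace")
print(lossy)  # b'price: 5 ?'
```

Once the euro sign has become a question mark, no later step can recover it, which is why such conversions are refused unless forced.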

DESCRIPTION

The recode command converts text between character sets and related representations. With no file arguments it acts as a filter, reading standard input and writing the recoded text to standard output. When files are named, each file is recoded in place, replacing its original contents.

recode supports a wide range of character sets, including the ISO-8859 family, KOI8-R, EBCDIC variants, UTF-8, and many legacy encodings. Beyond charsets, it handles "surfaces", which are representations layered on top of a charset: line-ending conventions (CR-LF, CR), Quoted-Printable (QP), and Base64. Markup representations such as HTML entities, LaTeX, and Texinfo are available as charsets in their own right.
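
To make the distinction between a character set change and a format transformation concrete, here is a minimal Python sketch of what a conversion from Latin-1 text with CR-LF line endings to Base64-wrapped UTF-8 involves at the byte level. Python's codecs and base64 module stand in for recode's own steps, so details such as Base64 line wrapping may differ from what recode produces.

```python
import base64

# Latin-1 bytes with DOS (CR-LF) line endings: "café" and "naïve".
source = b"caf\xe9\r\nna\xefve\r\n"

# Step 1: undo the line-ending convention, leaving bare LF.
unwrapped = source.replace(b"\r\n", b"\n")

# Step 2: change the character set from Latin-1 to UTF-8.
utf8 = unwrapped.decode("latin-1").encode("utf-8")
print(utf8)  # b'caf\xc3\xa9\nna\xc3\xafve\n'

# Step 3: layer Base64 on top of the UTF-8 bytes.
wrapped = base64.b64encode(utf8)
print(wrapped)
```

Each step is independent of the others, which is why they can be combined freely in a single conversion.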

Conversions are specified as "recoding chains" (called requests in the recode documentation), which name the encodings to convert between and may list intermediate steps. This makes recode useful for moving text between operating systems, applications, and network protocols that expect different encodings.

CAVEATS

The recoding chain syntax is extensive and takes time to learn, and complex conversions often require some understanding of character encodings and of the steps recode applies internally. Because named files are overwritten in place, work on copies or use recode as a filter until a chain is known to be correct. For simple, standard character set conversions, iconv(1) is often preferred for its simpler interface and wide availability.

RECODING CHAIN SYNTAX

The behavior of recode is driven by its recoding chain argument, which names the source and target encodings and, optionally, intermediate steps. The basic form is source_encoding..target_encoding, for example latin1..utf8. Additional encodings can be chained (a..b..c), and several chains can be given at once, separated by commas.

Common elements within a recoding chain include:

Encoding Names: e.g., LATIN-1, UTF-8, KOI8-R, EBCDIC-US. Names are case-insensitive, punctuation in them is ignored, and unambiguous abbreviations are accepted.
Line Ending Surfaces: written after a slash, e.g., latin1/CR-LF (Windows/DOS) or latin1/CR (classic Macintosh). On Unix-like systems bare LF is the norm and needs no surface.
Transfer Encoding Surfaces: /QP (Quoted-Printable) and /Base64, also written after a slash.
Markup Charsets: HTML (character entities), LaTeX, Texinfo.
Transliteration: Steps to approximate characters not representable in the target encoding.
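
The shape of a chain can be sketched with a small parser. In recode's documented grammar, a request is one or more comma-separated sub-requests, each sub-request is a sequence of encodings joined by two dots, and each encoding is a charset name optionally followed by slash-separated surfaces such as CR-LF or QP. The names Endpoint and parse_request are hypothetical, and the sketch ignores abbreviation and alias matching.

```python
from typing import List, NamedTuple


class Endpoint(NamedTuple):
    charset: str
    surfaces: List[str]


def parse_request(request: str) -> List[List[Endpoint]]:
    """Split a request into sub-requests, each a list of endpoints."""
    parsed = []
    for sub in request.split(","):
        endpoints = []
        for part in sub.split(".."):
            charset, *surfaces = part.split("/")
            # A trailing slash yields an empty surface name; drop it here.
            endpoints.append(Endpoint(charset, [s for s in surfaces if s]))
        parsed.append(endpoints)
    return parsed


print(parse_request("latin1/CR-LF..utf8"))
print(parse_request("koi8-r..ucs2..utf8/QP"))
```

The first call yields one sub-request with two endpoints, the second one sub-request with three; recode itself then has to work out the steps that connect each pair of endpoints.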

recode searches its internal graph of charsets and elementary recoding steps for the cheapest sequence of steps that performs the requested conversion. The -v option prints the sequence it chose.
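
The idea of finding a path through a graph of steps can be sketched as a cheapest-path search. The graph below is invented for illustration: the node names, the available steps, and their costs are assumptions, not recode's actual internal tables, and the search shown is ordinary Dijkstra rather than recode's own code.

```python
import heapq

# Hypothetical step graph: charset -> [(reachable charset, step cost)].
STEPS = {
    "KOI8-R": [("UCS-2", 1)],
    "UCS-2": [("UTF-8", 1), ("Latin-1", 2)],
    "Latin-1": [("UCS-2", 1), ("ASCII", 3)],
    "UTF-8": [("UCS-2", 1)],
    "ASCII": [],
}


def cheapest_path(graph, start, goal):
    """Return (total cost, list of charsets) or None if goal is unreachable."""
    queue = [(0, start, [start])]
    seen = set()
    while queue:
        cost, node, path = heapq.heappop(queue)
        if node == goal:
            return cost, path
        if node in seen:
            continue
        seen.add(node)
        for neighbor, step_cost in graph.get(node, []):
            if neighbor not in seen:
                heapq.heappush(queue, (cost + step_cost, neighbor, path + [neighbor]))
    return None


print(cheapest_path(STEPS, "KOI8-R", "ASCII"))
# (6, ['KOI8-R', 'UCS-2', 'Latin-1', 'ASCII'])
```

No single step joins KOI8-R to ASCII in this toy graph, so the conversion is assembled from three steps through intermediate charsets.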

HISTORY

The recode utility was written by François Pinard and distributed as part of the GNU project. It was designed as a general-purpose character set and format converter and predates the widespread adoption of UTF-8, which it also supports.