recode

Convert text file character encodings

SYNOPSIS

recode [OPTION]... [ [CHARSET] | REQUEST [FILE]... ]

PARAMETERS

-d, --diacritics
    Convert only diacritics or similar characters for HTML/LaTeX.

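-l, --list[=FORMAT]
    List one or all known charsets and aliases. With no FORMAT and no CHARSET, list the available charsets and surfaces. FORMAT is 'decimal', 'octal', 'hexadecimal' or 'full'.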

-k, --known=PAIRS
    Restrict the listed charsets to those consistent with PAIRS. PAIRS is a comma-separated list of 'BEFORE:AFTER' character codes (for example '130:233'), which helps identify an unknown charset from a few characters whose meaning is already known.

-c, --colons
    Use colons instead of double quotes for diaeresis.

-C, --copyright
    Show copyright and copying conditions.

-f, --force
    Force recodings even when they are not reversible.

-t, --touch
    Touch the recoded files after replacement. Without this option, the time stamps of files recoded in place are preserved.

-i, --sequence=files
    Use intermediate files for sequencing passes.

-p, --sequence=pipe
    Use pipes for sequencing passes. On systems without pipe support this is the same as -i.

-q, --quiet, --silent
    Inhibit messages about irreversible recodings.

-s, --strict
    Use strict mappings, even at the cost of losing characters.

-v, --verbose
    Explain sequence of steps being taken.

--help
    Display this help and exit.

--version
    Output version information and exit.

DESCRIPTION

The recode command converts files between character sets and encodings. It does not detect the encoding of its input: the conversion is described by a REQUEST of the form BEFORE..AFTER, and an omitted charset stands for the usual (default, typically locale-dependent) charset. When FILE arguments are given, each file is recoded in place, replacing its previous contents. recode supports a wide range of charsets and can also convert between line-ending conventions (DOS/Windows, Unix/Linux, classic Mac), which are expressed as surfaces appended to a charset name, as in ISO-8859-1/CR-LF. This makes it useful for cleaning up text files, preparing them for another environment, or enforcing a consistent encoding across a system.

It can also act as a filter, operating on standard input and output if no explicit files are given. Recode is especially useful for situations where simple text editors or other tools fail to correctly handle the encoding of a file, leading to garbled characters or other display issues.
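
As a rough illustration of filter use, the sketch below wraps recode in a small shell function. The function name latin1_to_utf8 is hypothetical, and the example assumes recode is installed and knows the charset names ISO-8859-1 and UTF-8.

```shell
# latin1_to_utf8: hypothetical helper, not part of recode.
# Reads ISO-8859-1 text on standard input and writes UTF-8 to standard output.
# No FILE argument is given, so recode acts as a filter.
latin1_to_utf8() {
    recode ISO-8859-1..UTF-8
}
```

For example, printf 'caf\351\n' | latin1_to_utf8 feeds the Latin-1 byte 0xE9 through the filter, which should come out as the two-byte UTF-8 sequence 0xC3 0xA9.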

CAVEATS

recode relies on conversion tables to perform character set conversions, and it does not detect the input encoding. If no conversion path exists between the two charsets, or if the BEFORE charset named in the request does not match the actual encoding of the data, the recoding may fail or produce garbled output.

Be cautious when recoding files in place, because the original contents are overwritten. A conversion is irreversible when the input contains characters that have no equivalent in the target charset; forcing such a conversion with -f loses those characters.
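
One way to guard against a bad in-place conversion is to keep a copy of the file first. The following is a minimal sketch, not a feature of recode itself: the function name recode_with_backup and the .orig suffix are assumptions chosen for illustration, and the sketch assumes recode exits with a non-zero status when a conversion fails.

```shell
# recode_with_backup FILE REQUEST
# Hypothetical helper, not part of recode: copies FILE to FILE.orig,
# recodes FILE in place, and restores the copy if recode reports failure.
recode_with_backup() {
    rwb_file=$1
    rwb_request=$2

    # Keep an untouched copy (with its time stamps) before recoding.
    cp -p -- "$rwb_file" "$rwb_file.orig" || return 1

    if ! recode "$rwb_request" "$rwb_file"; then
        # Conversion failed: put the original contents back.
        mv -- "$rwb_file.orig" "$rwb_file"
        return 1
    fi
}
```

A call such as recode_with_backup notes.txt ISO-8859-1..UTF-8 leaves the converted text in notes.txt and the original bytes in notes.txt.orig (notes.txt is a placeholder file name).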

CHARACTER SET SPECIFICATIONS

A charset can be named by its canonical name or by any of its aliases, some of which are purely numeric.

For example, 'ISO-8859-1', 'latin1' and 'l1' all name the same charset, as do 'IBM850', 'CP850' and '850'. Running recode -l prints every canonical name together with its aliases.
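
To check which names a given installation accepts, the output of recode -l can be searched. The helper below is a small sketch; the name find_charset is hypothetical, and it assumes that recode -l prints each charset and its aliases on one line.

```shell
# find_charset PATTERN
# Hypothetical helper, not part of recode: prints the lines of
# `recode -l` that contain PATTERN, ignoring case.
find_charset() {
    recode -l | grep -i -- "$1"
}
```

For instance, find_charset latin1 should print the line that groups ISO-8859-1 with its aliases.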

ENCODING ORDER

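When converting from one charset to another, recode automatically determines a sequence of elementary conversion steps, possibly passing through intermediate charsets. The user can influence this choice by naming intermediate charsets explicitly in the request (BEFORE..INTERMEDIATE..AFTER) or by excluding a charset from consideration with -x, --ignore=CHARSET. The -v option prints the sequence actually chosen. The --sequence option (memory, files or pipe) only selects how data is passed between successive steps.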
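
The sequence of steps chosen for a request can be inspected without converting any data. The sketch below assumes that recode -v writes a diagnostic describing the chosen steps and that empty input is accepted; the function name show_recode_path is hypothetical.

```shell
# show_recode_path REQUEST
# Hypothetical helper, not part of recode: runs REQUEST on empty input
# with -v, so the only visible result is the diagnostic that describes
# the sequence of steps recode selected.
show_recode_path() {
    recode -v "$1" </dev/null
}
```

Calling show_recode_path ISO-8859-1..UTF-8 reports the automatically selected steps, while show_recode_path ISO-8859-1..IBM850..UTF-8 reports a sequence forced through an explicitly named intermediate charset.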

HISTORY

The recode command has been available in Unix-like systems for a long time, filling the need to convert text files between a myriad of different encodings. Its development focused on expanding the support for a large number of character sets and their conversions, making it a valuable tool for tasks such as internationalizing software, processing text from different sources, and ensuring data consistency.

Over the years, the rise of Unicode and UTF-8 has somewhat reduced the need for frequent recoding, as these encodings have become the de facto standard. However, legacy data and specific requirements still make the command useful for conversions involving older or less common character sets.

SEE ALSO

iconv(1), enca(1)
