recode
Convert text file character encodings
SYNOPSIS
recode [OPTION]... [ [CHARSET] | REQUEST [FILE]... ]
PARAMETERS
-d, --diacritics
Convert only diacritics or the like, for HTML or LaTeX.
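-l, --list[=FORMAT]
List one or all known charsets and aliases. FORMAT is 'decimal', 'octal', 'hexadecimal' or 'full'. With no FORMAT and no CHARSET, list the available charsets and surfaces.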
-k, --known=PAIRS
Restrict charsets according to the known PAIRS list. PAIRS has the form 'BEF1:AFT1,BEF2:AFT2,...', where each BEF and AFT is a character code given as a decimal number. The possible "before" charsets are listed for the given "after" CHARSET.
-C, --copyright
Show copyright and copying conditions.
-f, --force
Force recodings even when they are not reversible.
-t, --touch
Touch the recoded files after replacement.
-i, --sequence=files
Use intermediate files for sequencing passes.
-p, --sequence=pipe
Use pipes for sequencing passes.
-q, --quiet, --silent
Suppress messages about irreversible recodings.
-s, --strict
Use strict mappings, even at the cost of losing characters.
-v, --verbose
Explain sequence of steps being taken.
--help
Display this help and exit.
--version
Output version information and exit.
DESCRIPTION
The recode command converts files between character sets and encodings. It does not detect the input encoding. Instead, the conversion is described by a request, usually of the form BEFORE..AFTER, where BEFORE is the charset the data is currently in and AFTER is the charset wanted. A charset omitted from the request defaults to the locale-dependent encoding, unless the DEFAULT_CHARSET environment variable is set. Each FILE named on the command line is recoded over itself, replacing the original contents.
Besides charsets, a request can name surfaces such as /CR-LF, /Base64 or /Quoted-Printable. Surfaces are how recode handles end-of-line conventions and similar transformations. This makes recode useful for cleaning up text files, preparing them for other environments, and keeping encodings consistent across a system.
It can also act as a filter, operating on standard input and output if no explicit files are given. Recode is especially useful for situations where simple text editors or other tools fail to correctly handle the encoding of a file, leading to garbled characters or other display issues.
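The two modes can be sketched as follows. This is a minimal illustration, assuming recode is installed and that the input really is Latin-1; the scratch directory and the file names are made up for the example. Note that when a FILE is named, recode rewrites that file in place.

```shell
# Scratch directory and sample file (hypothetical names, for illustration).
workdir=$(mktemp -d)
# "café" in Latin-1: é is the single byte 0xE9 (octal 351).
printf 'caf\351\n' > "$workdir/notes.txt"

# Skip the conversions when recode is not installed.
if command -v recode >/dev/null 2>&1; then
    # Filter mode: read stdin, write stdout, leave the original untouched.
    recode latin1..utf8 < "$workdir/notes.txt" > "$workdir/notes.utf8.txt"

    # File mode: notes.txt itself is overwritten with the UTF-8 result.
    recode latin1..utf8 "$workdir/notes.txt"
fi
```

Filter mode is the safer choice when the original must be kept, since the output goes to a separate file.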
CAVEATS
Recode relies on conversion tables to perform character set conversions. If no sequence of conversions links the two charsets in a request, or if the BEFORE charset named in the request does not match the file's actual encoding, recoding may fail or produce garbled text.
Be cautious: recode overwrites each FILE in place, so keep a backup or use filter mode when the original matters. Not every conversion is reversible, because characters with no equivalent in the target charset cannot be represented there. The -f option forces such recodings anyway.
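One way to guard against a damaging conversion is to keep a copy and check that recoding back reproduces the original bytes. The sketch below assumes recode is installed and that the sample is Latin-1; the file names and the roundtrip variable are hypothetical.

```shell
workdir=$(mktemp -d)
# Latin-1 sample: é is the single byte 0xE9 (octal 351).
printf 'caf\351\n' > "$workdir/report.txt"

# Skip the check when recode is not installed.
if command -v recode >/dev/null 2>&1; then
    # Keep a copy, because recode replaces report.txt in place.
    cp -p "$workdir/report.txt" "$workdir/report.txt.orig"
    recode latin1..utf8 "$workdir/report.txt"

    # Round trip: recoding back should reproduce the original bytes.
    if recode utf8..latin1 < "$workdir/report.txt" |
        cmp -s - "$workdir/report.txt.orig"; then
        roundtrip=ok
    else
        roundtrip=differs
    fi
    echo "round trip: $roundtrip"
fi
```

If the round trip differs, the copy in report.txt.orig is still available to restore from.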
CHARACTER SET SPECIFICATIONS
A charset can be named by its canonical name or by any of its aliases. Run recode -l to see the names your installation knows.
For example, 'latin1', 'ISO-8859-1' and 'l1' all name the same charset, and 'CP850' is an alias for 'IBM850'.
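To check which names an installation accepts, the -l listing can be searched. This is a small sketch, assuming recode is installed; the variable name latin1_lines is made up for the example, and the exact listing varies between versions.

```shell
# Skip the lookup when recode is not installed.
if command -v recode >/dev/null 2>&1; then
    # The listing puts a charset and its aliases on one line.
    # Keep the lines that mention latin1 (case-insensitive match).
    latin1_lines=$(recode -l | grep -i 'latin1')
    printf '%s\n' "$latin1_lines"
fi
```

Any name shown on a matching line should be usable on either side of a request.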
ENCODING ORDER
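When converting from one charset to another, recode works out a sequence of elementary steps on its own. The -v option shows the sequence it chose, and -x CHARSET excludes a charset from consideration while the path is chosen. Intermediate charsets can also be spelled out in the request itself, as in BEFORE..MIDDLE..AFTER. The -i and -p options (--sequence=files and --sequence=pipe) only select how successive steps pass data to one another, through intermediate files or through pipes. If neither is given, recode presumes -p when no FILE is named and -i otherwise.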
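The chosen sequence can be inspected without converting anything by feeding recode empty input. This is a sketch assuming recode is installed; plan_status is a hypothetical variable name, and the wording of the report depends on the version.

```shell
# Skip the inspection when recode is not installed.
if command -v recode >/dev/null 2>&1; then
    # -v reports the sequence of steps chosen for the request.
    # With empty input there is nothing to convert, so only the
    # plan is reported.
    recode -v latin1..utf8 < /dev/null
    plan_status=$?
    echo "recode exit status: $plan_status"
fi
```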
HISTORY
The recode command has long been available on Unix-like systems, where it fills the need to convert text files between many different encodings. Its development focused on supporting a large number of character sets and the conversions between them, which makes it useful for internationalizing software, processing text from different sources, and keeping data consistent.
Over the years, the rise of Unicode and UTF-8 has somewhat reduced the need for frequent recoding, as these encodings have become the de facto standard. However, legacy data and specific requirements still make the command useful for conversions involving older or less common character sets.