iconv
Convert text file character encoding
TLDR
Convert file to a specific encoding, and print to stdout
Convert file to the current locale's encoding, and output to a file
List supported encodings
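The three tasks above can be sketched as follows. The scratch directory, the file names input.txt and output.txt, and the ISO-8859-1 source encoding are assumptions chosen for illustration; substitute your own paths and encodings.

```shell
# Hypothetical sample file: plain ASCII text, which is also valid ISO-8859-1.
workdir=$(mktemp -d)
printf 'hello\n' > "$workdir/input.txt"

# Convert a file to a specific encoding and print the result to stdout.
iconv -f ISO-8859-1 -t UTF-8 "$workdir/input.txt"

# Convert a file to the current locale's encoding (no -t given)
# and write the result to a file.
iconv -f ISO-8859-1 "$workdir/input.txt" > "$workdir/output.txt"

# List the encodings this iconv knows about.
iconv -l
```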
SYNOPSIS
iconv [-f FROMCODE] [-t TOCODE] [-o OUTPUTFILE] [-c] [-s] [--verbose] [--list] [INPUTFILE ...]
PARAMETERS
-f FROMCODE, --from-code=FROMCODE
Specifies the original character encoding of the input. If omitted, the system's current locale encoding is used.
-t TOCODE, --to-code=TOCODE
Specifies the desired character encoding for the output. If omitted, the system's current locale encoding is used.
-o OUTPUTFILE, --output=OUTPUTFILE
Writes the converted output to OUTPUTFILE instead of standard output.
-c
Omit characters that are invalid in the input encoding or cannot be converted to the output encoding. Without this option, iconv stops with an error at the first such character.
-s, --silent
Suppress warnings about invalid characters or failed conversions. Errors will still be reported.
--verbose
Print progress information during conversion, useful for large files or debugging.
-l, --list
Lists all known character set encodings supported by the iconv implementation on your system.
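The effect of -c is easiest to see side by side with the default behavior. The following is a minimal sketch, assuming a glibc or GNU libiconv build of iconv; the sample text and the variable names strict_result and lossy are made up for the example.

```shell
# The sample input is "café" in UTF-8; the é (bytes C3 A9, octal 303 251)
# has no ASCII equivalent.

# Without -c, iconv stops at the unconvertible character and exits non-zero.
if printf 'caf\303\251\n' | iconv -f UTF-8 -t ASCII >/dev/null 2>&1; then
    strict_result=converted
else
    strict_result=failed
fi
echo "without -c: $strict_result"

# With -c, the unconvertible character is dropped from the output.
# The exit status may still be non-zero on some implementations,
# hence the "|| true".
lossy=$(printf 'caf\303\251\n' | iconv -f UTF-8 -t ASCII -c 2>/dev/null || true)
echo "with -c: $lossy"
```

On a glibc system the first conversion should fail and the second should print caf, with the accented character silently lost.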
DESCRIPTION
iconv is a command-line utility that converts files or standard input from one character encoding to another. Its purpose is to make text readable by systems or applications that expect a different character set. For instance, it can convert a file encoded in ISO-8859-1 to UTF-8, a common step when migrating legacy data.
iconv reads its input, decodes it according to the source encoding (--from-code), re-encodes it in the target encoding (--to-code), and writes the result. If no input file is specified, it reads from standard input; if no output file is specified, it writes to standard output. The utility is built on the iconv() C library function, which most Unix-like systems provide in their C library (glibc, for example) or in a separate library such as GNU libiconv. The exact set of supported encodings therefore varies by system. Options such as -c and -s control how invalid or unconvertible characters are handled.
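To make the stdin-to-stdout behavior concrete, the following sketch pipes a single ISO-8859-1 byte through iconv and dumps the result as hex. The variable name utf8_hex is an assumption for the example; od and tr are used only to make the bytes visible.

```shell
# Feed one ISO-8859-1 byte (0xE9, "é", octal 351) to iconv on stdin
# and capture the UTF-8 result as a hex string.
utf8_hex=$(printf '\351' | iconv -f ISO-8859-1 -t UTF-8 | od -An -tx1 | tr -d ' \n')
echo "$utf8_hex"   # prints c3a9
```

The single input byte becomes the two-byte UTF-8 sequence C3 A9, which is why file sizes usually change after conversion.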
CAVEATS
When converting from an encoding with a large character repertoire (like UTF-8) to one with a smaller repertoire (like ASCII), some characters have no equivalent representation. By default iconv stops with an error at the first such character. With -c it drops them silently, which loses data. Some implementations, including glibc and GNU libiconv, also accept a //TRANSLIT suffix on the target encoding (for example, -t ASCII//TRANSLIT) that substitutes an approximation where one exists. iconv does not detect the source encoding: an incorrect -f value typically produces garbled output rather than an error, so identify the source encoding before converting. The set of supported encodings depends on the specific iconv implementation available on your system, usually provided by the C library (e.g., glibc).
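The garbling caused by a wrong -f value can be reproduced with two bytes. This sketch converts the UTF-8 encoding of "é" to UTF-16BE twice, once with the correct source encoding and once with a deliberately wrong one; the variable names right and wrong are assumptions for the example.

```shell
# "é" stored as UTF-8 is the two bytes C3 A9 (octal 303 251).

# Correct source encoding: the bytes decode to one character, U+00E9.
right=$(printf '\303\251' | iconv -f UTF-8 -t UTF-16BE | od -An -tx1 | tr -d ' \n')

# Wrong source encoding: each byte is treated as its own ISO-8859-1
# character, giving U+00C3 and U+00A9 ("Ã©"). No error is reported.
wrong=$(printf '\303\251' | iconv -f ISO-8859-1 -t UTF-16BE | od -An -tx1 | tr -d ' \n')

echo "correct -f: $right"   # prints correct -f: 00e9
echo "wrong -f:   $wrong"   # prints wrong -f:   00c300a9
```

Both commands exit successfully, so a wrong guess at the source encoding has to be caught by inspecting the output, not by checking the exit status.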
FINDING SUPPORTED ENCODINGS
To see a comprehensive list of all character encodings that your system's iconv supports, you can run the command:
iconv --list
This list is extensive and includes various aliases for common encodings.
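Because the list is long, it is usually more practical to search it than to read it. A minimal sketch, assuming grep is available; UTF-8 is only an example name, and the layout of the list (separators, trailing slashes) differs between implementations, so the match is kept loose and case-insensitive.

```shell
# Count the lines of the encoding list that mention the name we care about.
matches=$(iconv --list | grep -ci 'utf-8')
echo "lines mentioning UTF-8: $matches"
```

A count of zero suggests the encoding is unsupported or listed under a different alias.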
COMMON USE CASES
One of the most frequent uses of iconv is converting legacy text files to the widely adopted UTF-8 encoding. For example, to convert a file named old_file.txt from ISO-8859-1 to UTF-8 and save it as new_file.txt:
iconv -f ISO-8859-1 -t UTF-8 -o new_file.txt old_file.txt
When piping output from other commands, iconv can be used as a filter:
cat input.txt | iconv -f UTF-16 -t UTF-8 > output.txt
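The same conversion is often applied to a whole directory of legacy files. The loop below is a sketch under stated assumptions: the scratch directory, the file names a.txt and b.txt, and the utf8 output directory are all hypothetical, and every input is assumed to be ISO-8859-1.

```shell
# Hypothetical setup: a scratch directory holding two ISO-8859-1 files.
workdir=$(mktemp -d)
printf 'caf\351\n' > "$workdir/a.txt"    # "café"
printf 'na\357ve\n' > "$workdir/b.txt"   # "naïve"

# Write converted copies to a separate directory so the originals stay intact.
mkdir "$workdir/utf8"
for src in "$workdir"/*.txt; do
    dest="$workdir/utf8/$(basename "$src")"
    if iconv -f ISO-8859-1 -t UTF-8 "$src" > "$dest"; then
        echo "converted: $src"
    else
        echo "failed: $src" >&2
        rm -f "$dest"   # do not leave a partial output file behind
    fi
done
```

Writing to a separate path matters: redirecting the output onto the input file itself would truncate that file before iconv has read it.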
HISTORY
iconv was standardized in the X/Open Portability Guide, Issue 4 (XPG4) in the early 1990s and was later incorporated into the Single UNIX Specification and POSIX. Its underlying C library function, iconv(), provides the core character set conversion capabilities. Standardization made character encoding conversion a portable feature across Unix implementations, which mattered as software increasingly had to handle international text. The utility has remained a stable part of Linux and other Unix-like operating systems ever since.