LinuxCommandLibrary

nkf

Convert character encodings of text files

TLDR

Convert to UTF-8 encoding

$ nkf -w [path/to/file.txt]

Convert to Shift_JIS encoding
$ nkf -s [path/to/file.txt]

Convert to UTF-8 encoding and overwrite the file
$ nkf -w --overwrite [path/to/file.txt]

Convert line endings to LF (Unix style) and overwrite the file
$ nkf -d --overwrite [path/to/file.txt]

Convert line endings to CRLF (Windows style) and overwrite the file
$ nkf -c --overwrite [path/to/file.txt]

Decode MIME-encoded text and overwrite the file
$ nkf -m --overwrite [path/to/file.txt]

SYNOPSIS

nkf [options] [file ...]

PARAMETERS

-s
    Converts output to Shift_JIS encoding.

-e
    Converts output to EUC-JP encoding.

-j
    Converts output to ISO-2022-JP (JIS) encoding.

-w
    Converts output to UTF-8 encoding.

-Lu, -d
    Converts line endings to Unix (LF) format.

-Lw, -c
    Converts line endings to Windows (CRLF) format.

-Lm
    Converts line endings to Mac (CR) format.

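-m
    Decodes MIME-encoded text with strict checking (this is also the default behavior). Use -m0 to disable MIME decoding.

-mB
    Decodes a MIME Base64-encoded stream.

-mQ
    Decodes a MIME quoted-printable stream.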

-g
    Prints the guessed input encoding to standard output without performing conversion.

-O
    Writes output to a file instead of standard output, as in nkf -O infile outfile. If no output file name is given, nkf writes to nkf.out.

-v
    Displays version information and exits.

DESCRIPTION

nkf (Network Kanji Filter) is a command-line utility for character encoding conversion, designed mainly for Japanese character sets (JIS, EUC-JP, Shift_JIS, UTF-8, etc.). It also converts line-ending conventions (Unix LF, Windows CRLF, Mac CR) and decodes MIME-encoded text (Base64 and Quoted-Printable). Because it detects the input encoding automatically in many cases, it is useful for processing text files from mixed sources and for preparing text for a different operating system or application.
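
As a minimal sketch of the detect-then-convert workflow described above, the script below writes a small UTF-8 sample with a CRLF line ending, asks nkf to guess its encoding, and converts it to Shift_JIS with LF line endings. The temporary directory and the file names sample.txt and sample_sjis.txt are hypothetical, and the script assumes an installed nkf that supports -g and -Lu.

```shell
#!/bin/sh
# Hypothetical scratch directory for the example files.
workdir=$(mktemp -d)

# Write three kanji in UTF-8 (as octal byte escapes) followed by CRLF.
printf '\346\227\245\346\234\254\350\252\236\r\n' > "$workdir/sample.txt"

if command -v nkf >/dev/null 2>&1; then
    # Report the guessed input encoding without converting anything.
    nkf -g "$workdir/sample.txt"

    # -s selects Shift_JIS output, -Lu rewrites line endings as LF.
    nkf -s -Lu "$workdir/sample.txt" > "$workdir/sample_sjis.txt"

    # Guess again on the converted copy to confirm the change.
    nkf -g "$workdir/sample_sjis.txt"
else
    echo "nkf is not installed; skipping conversion" >&2
fi
```

Redirecting to a second file, rather than using --overwrite, keeps the original intact while the result is checked.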

CAVEATS

While nkf supports UTF-8, its primary development focus has been legacy Japanese character sets. Automatic encoding detection can fail, especially on very short or ambiguous input. The uppercase options -J, -S, -E, and -W declare the input encoding, whereas their lowercase counterparts select the output encoding, so confusing the two can silently garble text. In Shift_JIS and JIS X 0201 environments the byte 0x5C is commonly displayed as a yen sign rather than a backslash, which can be surprising outside Japanese contexts.
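
Because auto-detection can misjudge short input, one way to sidestep it is to declare the input encoding explicitly. The sketch below assumes the uppercase -S flag (treat input as Shift_JIS) combined with -w (write UTF-8). The file names legacy_sjis.txt and legacy_utf8.txt are hypothetical.

```shell
#!/bin/sh
# Hypothetical scratch directory for the example files.
workdir=$(mktemp -d)

# Write three kanji as Shift_JIS bytes (octal escapes) plus a newline.
# Input this short is the kind that auto-detection may misjudge.
printf '\223\372\226\173\214\352\n' > "$workdir/legacy_sjis.txt"

if command -v nkf >/dev/null 2>&1; then
    # -S: declare the input as Shift_JIS instead of relying on detection.
    # -w: write the result as UTF-8.
    nkf -S -w "$workdir/legacy_sjis.txt" > "$workdir/legacy_utf8.txt"
else
    echo "nkf is not installed; skipping conversion" >&2
fi
```

Stating both ends of the conversion makes the command deterministic, which matters most in batch jobs where a single misdetected file is easy to miss.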

INPUT/OUTPUT HANDLING

By default, nkf reads from standard input if no file arguments are provided. The converted output is sent to standard output unless it is redirected to a file or the input files are rewritten in place with --overwrite. When multiple input files are specified, their converted contents are concatenated and written to the same output destination.
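
The input and output behavior above can be sketched with two small cases: filtering a pipe, and converting two files into one combined output. The file names a.txt, b.txt, piped.txt, and combined.txt are hypothetical, and the output encoding is set explicitly with -w so the result does not depend on the build's default.

```shell
#!/bin/sh
# Hypothetical scratch directory for the example files.
workdir=$(mktemp -d)

# Two small input files with Windows (CRLF) line endings.
printf 'first\r\n' > "$workdir/a.txt"
printf 'second\r\n' > "$workdir/b.txt"

if command -v nkf >/dev/null 2>&1; then
    # No file argument: nkf filters standard input to standard output.
    printf 'piped\r\n' | nkf -w -Lu > "$workdir/piped.txt"

    # Several file arguments: converted contents are written one after
    # another to the same output stream.
    nkf -w -Lu "$workdir/a.txt" "$workdir/b.txt" > "$workdir/combined.txt"
else
    echo "nkf is not installed; skipping conversion" >&2
fi
```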

HISTORY

nkf was developed in Japan in the late 1980s and early 1990s to address character encoding and line-ending differences across computing environments, particularly within the Japanese internet and UNIX communities. Its primary purpose was to ease the exchange of text encoded in different Japanese character sets (JIS, Shift_JIS, EUC-JP) and to reconcile the line-ending conventions of Unix, MS-DOS, and Macintosh systems. It became a de facto standard tool for these conversions in Japan and has since added support for newer encodings such as UTF-8. It remains widely used today.

SEE ALSO

iconv(1), recode(1), dos2unix(1), unix2dos(1)
