isutf8

Check if file is valid UTF-8

TLDR

Check whether the specified files contain valid UTF-8

$ isutf8 [path/to/file1 path/to/file2 ...]

Print errors using multiple lines

$ isutf8 [[-v|--verbose]] [path/to/file1 path/to/file2 ...]

Print nothing to stdout and report the result only through the exit code

$ isutf8 [[-q|--quiet]] [path/to/file1 path/to/file2 ...]

Only print the names of the files containing invalid UTF-8

$ isutf8 [[-l|--list]] [path/to/file1 path/to/file2 ...]

Same as --list but inverted, i.e., only print the names of the files containing valid UTF-8

$ isutf8 [[-i|--invert]] [path/to/file1 path/to/file2 ...]

PARAMETERS

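-q, --quiet
    Quiet mode: print nothing and report the result through the exit status only.

-l, --list
    Print only the names of files containing invalid UTF-8.

-i, --invert
    Invert --list: print only the names of files containing valid UTF-8.

-v, --verbose
    Print more detailed, multi-line messages about invalid UTF-8.

-h, --help
    Print usage help and exit.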

isutf8 is a small command-line tool from the moreutils package that checks whether input is syntactically valid UTF-8. It reads the files named on the command line or, if none are given, standard input, and checks every byte sequence against the UTF-8 encoding rules.

UTF-8 is the dominant Unicode encoding on Unix-like systems, but files can still contain invalid byte sequences because of corruption, mixed encodings, or legacy data such as Latin-1 text. isutf8 detects problems such as overlong encodings, encoded surrogate halves, and bytes that can never occur in UTF-8 (for example 0xFF). This makes it useful in data-validation pipelines, in script checks, and as a pre-flight check before handing files to tools that expect UTF-8.
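To see these checks in action, the sketch below writes four small sample files using printf octal escapes (the file names are hypothetical) and checks each one with isutf8 -q, branching on the exit status. Per the description above, only valid.txt should pass; how the surrogate case is reported may depend on the isutf8 version installed.

```shell
# Sample inputs (hypothetical names); octal escapes keep printf portable.
printf 'caf\303\251\n'  > valid.txt       # "café": U+00E9 encoded as C3 A9
printf '\300\257\n'     > overlong.txt    # C0 AF: overlong encoding of "/"
printf '\355\240\200\n' > surrogate.txt   # ED A0 80: encoded surrogate U+D800
printf '\377\n'         > impossible.txt  # FF never occurs in UTF-8

# -q prints nothing; the exit status alone says valid (0) or not (non-zero).
for f in valid.txt overlong.txt surrogate.txt impossible.txt; do
    if isutf8 -q "$f"; then
        echo "$f: valid"
    else
        echo "$f: invalid"
    fi
done
```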

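isutf8 prints nothing for input that is valid UTF-8. For invalid input it prints a notice to standard output naming the file and where the invalid sequence occurs; -v makes this message more detailed and spreads it over multiple lines. The exit status is 0 when every input is valid UTF-8 and non-zero when any input is invalid or an error occurs, for example when a file cannot be read.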
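A minimal sketch of consuming the exit status from a script, assuming isutf8 is on PATH. The two sample files are hypothetical: notes.txt is plain ASCII (which is also valid UTF-8), while data.csv deliberately contains the Latin-1 byte 0xE9, which is not valid UTF-8. The script only distinguishes zero from non-zero, because one status covers all inputs.

```shell
# Hypothetical sample inputs.
printf 'plain ASCII is also valid UTF-8\n' > notes.txt
printf 'id,name\n1,Jos\351\n' > data.csv   # \351 = byte E9: Latin-1 "é"

# Standard input works too; valid input produces no output.
printf 'caf\303\251\n' | isutf8
echo "stdin exit status: $?"

# Several files at once: the single exit status covers all of them.
isutf8 notes.txt data.csv
status=$?
if [ "$status" -eq 0 ]; then
    echo "all inputs are valid UTF-8"
else
    echo "some input is invalid or unreadable (exit status $status)" >&2
fi
```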

The -q (--quiet) option suppresses all output so that scripts can rely on the exit status alone. This makes isutf8 a good fit for automation, such as validating log files, CSV imports, or web content before parsing.
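In automation, -q (--quiet) works as a single pass/fail gate, and -l (--list) from the TLDR examples can then name the offending files. The sketch below assumes a hypothetical incoming/ directory of CSV files, created on the spot for illustration; substitute your own paths and import step.

```shell
# Hypothetical import directory with one good and one bad CSV file.
dir=incoming
mkdir -p "$dir"
printf 'a,b\n1,2\n'    > "$dir/good.csv"
printf 'a,b\n1,\377\n' > "$dir/bad.csv"   # byte FF is never valid UTF-8

# Gate the import on the combined exit status of all files.
if isutf8 -q "$dir"/*.csv; then
    echo "all CSV files are valid UTF-8; proceeding with import"
else
    echo "invalid UTF-8 found in:" >&2
    # -l prints only the names of the files that failed.
    isutf8 -l "$dir"/*.csv >&2
fi
```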
