LinuxCommandLibrary

file

Determine file type

TLDR

Give a description of the type of the specified file

$ file [path/to/file]

Look inside a zipped file and determine the file type(s) inside
$ file [[-z|--uncompress]] [path/to/file.zip]

Allow file to work with special or device files
$ file [[-s|--special-files]] [path/to/file]

Don't stop at first file type match; keep going until the end of the file
$ file [[-k|--keep-going]] [path/to/file]

Determine the MIME type and character encoding of a file
$ file [[-i|--mime]] [path/to/file]

SYNOPSIS

file [OPTION...] [FILE...]

PARAMETERS

-b, --brief
    Do not prepend filenames to output lines (e.g., only show the file type description).

-i, --mime
    Output MIME type strings along with character set information (e.g., text/plain; charset=us-ascii) instead of the human-readable description.

-k, --keep-going
    Don't stop at the first match. This allows identifying all applicable types for a file.

-L, --dereference
    Follow symbolic links. By default, file examines the link itself, not the target.

-s, --special-files
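    Normally, file only attempts to read and determine the type of argument files that stat(2) reports as ordinary files, because reading special files can have unwanted side effects. With this option, file also reads argument files that are block or character special files.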

-z, --uncompress
    Try to look inside compressed files. It automatically uncompresses common compression types like gzip, bzip2, xz, etc.

-f namefile, --files-from namefile
    Read the names of the files to be examined from namefile (one per line) instead of the command line.

-m magicfile, --magic-file magicfile
    Specify an alternate list of magic number files to use instead of the default ones.
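
A few of the options above can be combined in one short session. The following is a minimal sketch, assuming a scratch directory created with mktemp; the file names a.txt, b.sh, and names.list are hypothetical and exist only for this demonstration.

```shell
# Scratch directory and sample files (names are arbitrary for this demo).
tmpdir=$(mktemp -d)
printf 'hello\n' > "$tmpdir/a.txt"
printf '#!/bin/sh\necho hi\n' > "$tmpdir/b.sh"

# Build a name list with one path per line, as -f expects.
printf '%s\n' "$tmpdir/a.txt" "$tmpdir/b.sh" > "$tmpdir/names.list"

# Examine every file named in the list; one output line per file.
report=$(file -f "$tmpdir/names.list")
echo "$report"

# Same list, but -b drops the "filename:" prefix from each line.
brief=$(file -b -f "$tmpdir/names.list")
echo "$brief"

rm -rf "$tmpdir"
```

The exact descriptions printed depend on the installed magic database, so treat the wording of the output as version-specific.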

DESCRIPTION

The file command is a standard Unix utility that identifies the type of a given file. It's crucial for understanding the nature of files, especially when extensions are missing or misleading.

Unlike simply looking at a file's extension, file examines the file's content, header bytes, and file system information. It primarily uses a database of "magic numbers" (predefined byte sequences) located in files like /usr/share/misc/magic or /etc/magic (and its compiled version, magic.mgc), along with internal heuristics. This allows it to distinguish between various types like text files, executable binaries, libraries, archives, image formats, and more. It can also identify character sets and compression types.
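
To illustrate content-based detection, the sketch below gives a plain-text file a misleading .jpg extension and asks file to identify it. The scratch directory and the name photo.jpg are assumptions made for this example only.

```shell
# Create a text file whose extension claims it is a JPEG image.
tmpdir=$(mktemp -d)
printf 'just some text\n' > "$tmpdir/photo.jpg"

# file inspects the bytes, not the name, so it should report text here.
result=$(file -b "$tmpdir/photo.jpg")
echo "$result"

rm -rf "$tmpdir"
```

On typical installations the description mentions ASCII text rather than JPEG image data, though the exact wording can differ between versions.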

Its output provides valuable insights for users and scripts, helping to determine how to process or open a file correctly. It's widely used in shell scripts for conditional processing based on file type.
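
A script that branches on file type might look roughly like the following. This is a sketch rather than a recommended implementation: classify is a hypothetical helper name, the categories are arbitrary, and it relies on the --mime-type option, which prints only the MIME type without the character set.

```shell
# classify (hypothetical helper): print a coarse category for the file in $1.
classify() {
    # -b suppresses the filename; --mime-type prints only e.g. "text/plain".
    mime=$(file -b --mime-type "$1")
    case "$mime" in
        text/*)                              echo "text" ;;
        image/*)                             echo "image" ;;
        application/gzip|application/x-gzip) echo "gzip" ;;
        *)                                   echo "other ($mime)" ;;
    esac
}

# Usage: classify a small text file created for the demo.
sample=$(mktemp)
printf 'hello\n' > "$sample"
classify "$sample"
rm -f "$sample"
```

Matching on MIME types tends to be more stable in scripts than matching on the human-readable description, whose wording can change between releases. Both gzip spellings are listed because different versions of file have reported different names for it.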

CAVEATS

file relies heavily on a database of "magic numbers" and heuristics, so its results are not always accurate, especially for unusual or corrupted files. It can be fooled by files intentionally disguised to appear as a different type. Accuracy depends on how complete and current the magic files are.

MAGIC FILES (/usr/share/misc/magic)

file primarily determines file types by consulting a database of "magic numbers" and byte patterns stored in files like /usr/share/misc/magic or /etc/magic (and its compiled version, magic.mgc). These files contain rules that describe characteristic byte sequences at specific offsets within a file, allowing the command to identify various formats like JPEG images, ELF executables, ZIP archives, and more.
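
As a rough illustration of such a rule, the sketch below writes a one-line magic file and passes it to file with -m. The "MYFMT" signature, the description text, and the file names mymagic and sample.bin are all invented for this example; real magic files contain many more rules and richer syntax, documented in magic(5).

```shell
tmpdir=$(mktemp -d)

# One rule with four fields: offset, type, value to match, description.
# It reads: "at offset 0, if the bytes are the string MYFMT, report this text".
printf '0\tstring\tMYFMT\tMy custom data format\n' > "$tmpdir/mymagic"

# A sample file that starts with the invented signature, then a few raw bytes.
printf 'MYFMT\001\002\003' > "$tmpdir/sample.bin"

# Use only the custom magic file instead of the system default.
custom=$(file -b -m "$tmpdir/mymagic" "$tmpdir/sample.bin")
echo "$custom"

rm -rf "$tmpdir"
```

Because -m replaces the default database, files that do not match the custom rule are classified only by file's built-in tests, for example as plain text or generic data.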

OUTPUT FORMAT

By default, the output format for a single file is filename: file_type_description. For example, myfile.txt: ASCII text or myprogram: ELF 64-bit LSB executable, x86-64, version 1 (SYSV), dynamically linked, interpreter /lib64/ld-linux-x86-64.so.2, for GNU/Linux 3.2.0, BuildID[sha1]=..., stripped.

HISTORY

The file command is one of the oldest Unix utilities, with origins in the early versions of Research Unix in the 1970s. It was later included in System V Unix and became part of the POSIX standard. The widely used modern implementation was originally written by Ian Darwin and is now maintained by Christos Zoulas and other contributors, who continue to expand its magic database to recognize new file formats. Its core mechanism of inspecting file content to identify the type has remained consistent throughout its development.

SEE ALSO

stat(1), od(1), strings(1), identify(1)
