sum
Calculate checksum and block count of files
TLDR
Compute a checksum with the BSD-compatible algorithm and 1024-byte blocks: sum path/to/file
Compute a checksum with the System V-compatible algorithm and 512-byte blocks: sum --sysv path/to/file
SYNOPSIS
sum [OPTION]... [FILE]...
PARAMETERS
-r
Use the BSD algorithm and count 1024-byte blocks. In GNU sum this is the default when no algorithm option is given.
-s, --sysv
Use the System V algorithm and count 512-byte blocks. This algorithm generally produces a different checksum than the BSD algorithm for the same file.
--help
Display a help message and exit.
--version
Output version information and exit.
FILE
The path to the file(s) for which to calculate the checksum and block count. If no FILE is specified, sum reads from standard input.
DESCRIPTION
The sum command is a traditional Unix utility that computes a 16-bit checksum for each file and reports the number of blocks it occupies (1024-byte blocks with the default BSD algorithm, 512-byte blocks with the System V algorithm). It reads each specified FILE (or standard input if no file is given) and outputs the checksum, the block count, and the filename. While historically useful for simple integrity checks and quick comparisons, a 16-bit checksum has only 65,536 possible values, so different files can easily produce the same checksum.
For more robust integrity verification, modern alternatives are recommended: cksum (which uses a 32-bit CRC) for detecting accidental corruption, or cryptographic hash tools such as md5sum and sha256sum, with sha256sum the better choice where deliberate tampering is a concern. The sum command remains primarily for backward compatibility and specific legacy system interactions. It supports two algorithms: the default BSD algorithm and the System V algorithm, which can be selected via options.
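As a rough illustration of the default BSD algorithm mentioned above, the following Python sketch rotates a 16-bit accumulator right by one bit before adding each byte, then rounds the byte count up to 1024-byte blocks. The function name bsd_sum is hypothetical, and the logic follows the commonly documented GNU behavior rather than any particular system's source, so treat it as an approximation.

```python
from typing import Tuple


def bsd_sum(data: bytes) -> Tuple[int, int]:
    """Return (checksum, blocks) in the style of the BSD algorithm (assumed behavior)."""
    checksum = 0
    for byte in data:
        # Rotate the 16-bit accumulator right by one bit.
        checksum = (checksum >> 1) | ((checksum & 1) << 15)
        # Add the next byte and keep only the low 16 bits.
        checksum = (checksum + byte) & 0xFFFF
    # Round the size up to whole 1024-byte blocks.
    blocks = (len(data) + 1023) // 1024
    return checksum, blocks


if __name__ == "__main__":
    # Zero-padded checksum followed by the block count.
    print("%05d %5d" % bsd_sum(b"abc"))
```

Because of the rotation, the result depends on byte order, which a plain byte sum would not capture.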
CAVEATS
Weak Checksum: The 16-bit checksum provided by sum is not cryptographically secure and is prone to collisions. It should not be used for verifying file integrity where security or strong data assurance is required.
Block Size Inconsistency: The block count is reported in 1024-byte blocks with the BSD algorithm and in 512-byte blocks with the System V algorithm, so block counts from the two modes are not directly comparable.
Algorithm Dependence: The checksum result depends on the specific algorithm used (BSD vs. System V), so comparing checksums requires knowing which algorithm was applied.
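The caveats above can be made concrete with a small Python sketch of the System V style of checksum: a plain sum of all bytes folded down to 16 bits, with the size rounded up to 512-byte blocks. The function name sysv_sum is hypothetical and the folding steps reflect the commonly documented GNU behavior, which is an assumption here. Since addition ignores byte order, reordered data produces the same checksum.

```python
from typing import Tuple


def sysv_sum(data: bytes) -> Tuple[int, int]:
    """Return (checksum, blocks) in the style of the System V algorithm (assumed behavior)."""
    # Plain byte sum, kept to 32 bits.
    total = sum(data) & 0xFFFFFFFF
    # Fold the 32-bit sum into 16 bits, twice to absorb any carry.
    folded = (total & 0xFFFF) + (total >> 16)
    checksum = (folded & 0xFFFF) + (folded >> 16)
    # Round the size up to whole 512-byte blocks.
    blocks = (len(data) + 511) // 512
    return checksum, blocks


if __name__ == "__main__":
    # Reordering bytes does not change a plain sum, so these two collide.
    print(sysv_sum(b"abc"), sysv_sum(b"cba"))
```

Both calls in the usage lines return the same pair, which is one reason this checksum should only be trusted to catch gross accidental corruption.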
OUTPUT FORMAT
The output of the sum command typically consists of three whitespace-separated fields: the 16-bit checksum, the block count (1024-byte blocks by default, 512-byte blocks with the System V algorithm), and the filename. For example, "23456 123 filename.txt". When reading from standard input, the filename field is omitted.
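For scripts that consume this output, a minimal parsing sketch in Python might look like the following. The names SumRecord and parse_sum_line are hypothetical, and the code assumes the whitespace-separated layout described above, with the filename absent when sum read standard input.

```python
from typing import NamedTuple, Optional


class SumRecord(NamedTuple):
    checksum: int
    blocks: int
    filename: Optional[str]  # None when sum read standard input


def parse_sum_line(line: str) -> SumRecord:
    """Parse one line of sum output into its fields."""
    # Split on runs of whitespace at most twice so spaces in filenames survive.
    parts = line.rstrip("\n").split(None, 2)
    if len(parts) < 2:
        raise ValueError(f"not a sum output line: {line!r}")
    filename = parts[2] if len(parts) == 3 else None
    return SumRecord(int(parts[0]), int(parts[1]), filename)


if __name__ == "__main__":
    print(parse_sum_line("23456 123 filename.txt"))
```

Limiting the split to two separators keeps a filename that contains spaces in one piece, and int() accepts a zero-padded checksum field.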
HISTORY
The sum command is one of the oldest utilities in the Unix ecosystem, dating back to early versions of Unix. Its original purpose was to provide a quick way to verify that a file had not been accidentally corrupted during transfer or storage. As computing evolved and the need for more robust and secure integrity checks arose, its limitations (primarily the 16-bit checksum's susceptibility to collisions) became apparent.
This led to the development of commands like cksum (which uses a 32-bit CRC) and later, tools based on cryptographic hash functions (MD5, SHA). The sum command itself is not specified by POSIX, which standardizes cksum instead, partly because historical implementations of sum disagreed on the algorithm. Despite its age, sum is still shipped with GNU coreutils and the BSDs, primarily for backward compatibility with older scripts and systems that rely on its specific output format and checksum algorithms.