LinuxCommandLibrary

diffoscope

Compare differences between files, directories, or archives

TLDR

Compare two files

$ diffoscope [path/to/file1] [path/to/file2]

Compare two files without displaying a progress bar
$ diffoscope --no-progress [path/to/file1] [path/to/file2]

Compare two files and write an HTML report to a file (use - for stdout)
$ diffoscope --html [path/to/outfile|-] [path/to/file1] [path/to/file2]

Compare two directories excluding files with a name matching a specified pattern
$ diffoscope --exclude [pattern] [path/to/directory1] [path/to/directory2]

Compare two directories and control whether directory metadata is considered
$ diffoscope --exclude-directory-metadata [auto|yes|no|recursive] [path/to/directory1] [path/to/directory2]

SYNOPSIS

diffoscope [OPTIONS] <FILE_OR_DIR1> <FILE_OR_DIR2>

PARAMETERS

-h, --help
    Show a help message and exit.

--version
    Show program's version number and exit.

--exclude-directory-metadata {auto,yes,no,recursive}
    Control whether filesystem metadata such as permissions, timestamps and extended attributes is compared. With auto (the default), it is compared when both arguments are directories and ignored otherwise; yes always ignores it; no always compares it; recursive also ignores the metadata of members inside archives.

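--exclude GLOB_PATTERN
    Exclude files whose names (including any directory part) match GLOB_PATTERN. May be given multiple times.

--exclude-command REGEX_PATTERN
    Do not run external commands whose command line matches REGEX_PATTERN, e.g. slow tools whose output is unlikely to matter.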

--max-container-depth DEPTH
    Maximum depth of nested containers (archives, packages, filesystem images) to unpack; containers nested more deeply are not recursed into.

--no-default-limits
    Disable most default output limits and diff cutoffs, so that full differences are shown even for very large files.

--progress, --no-progress
    Show or hide an approximate progress bar during the comparison.

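--text OUTPUT_FILE
    Write a plain-text report to OUTPUT_FILE (use - for stdout). If no output option is given, a text report is printed to stdout.

--html OUTPUT_FILE
    Write an HTML report to OUTPUT_FILE (use - for stdout), suitable for web browsers.

--html-dir OUTPUT_DIR
    Write an HTML report split across several files in OUTPUT_DIR, which suits very large diffs.

--json OUTPUT_FILE
    Write a JSON report to OUTPUT_FILE (use - for stdout), suitable for machine parsing.

--markdown OUTPUT_FILE
    Write a Markdown report to OUTPUT_FILE (use - for stdout).

--restructured-text OUTPUT_FILE
    Write a reStructuredText report to OUTPUT_FILE (use - for stdout).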

--new-file
    Treat absent files as empty, so that a file present on only one side is still compared.

--debug
    Show debugging messages during execution.

--profile OUTPUT_FILE
    Write profiling information to OUTPUT_FILE (use - for stdout) to help identify performance bottlenecks.

--max-report-size BYTES
    Maximum number of bytes written to a report; output beyond this limit is truncated. 0 disables the limit.

--max-text-report-size BYTES
    Maximum number of bytes written to a --text report.

--max-diff-block-lines LINES
    Maximum number of lines shown per unified-diff block; longer blocks are truncated.

--list-tools
    List the external tools that diffoscope can use for the file formats it supports.

--list-missing-tools
    List the external tools that diffoscope would use but cannot find on this system.
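Several of these options are typically combined. The following is a minimal POSIX sh sketch, assuming diffoscope is on the PATH, that compares two release trees while ignoring filesystem metadata and build logs. The function name compare_trees, the glob patterns and the DIFFOSCOPE variable (which only lets the command be swapped out for testing) are hypothetical and not part of diffoscope.

```sh
#!/bin/sh
# Minimal sketch: compare two directory trees while ignoring
# filesystem-level metadata and some noisy generated files.
# DIFFOSCOPE is a hypothetical override (not a diffoscope feature);
# it defaults to the real diffoscope binary.
compare_trees() {
    # $1 = old tree, $2 = new tree
    # The patterns are quoted so the shell passes them to diffoscope
    # unexpanded. The function's exit status is diffoscope's own.
    "${DIFFOSCOPE:-diffoscope}" \
        --no-progress \
        --exclude-directory-metadata yes \
        --exclude '*.log' \
        --exclude '*.pyc' \
        --text - \
        "$1" "$2"
}
```

Calling compare_trees old-release/ new-release/ (hypothetical paths) prints the text report to stdout.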

DESCRIPTION

diffoscope is a tool for recursively comparing two files or directories. Unlike diff(1), it understands many file formats and archives (e.g. .deb, .rpm, .jar, .zip, .tar, and disk images such as .iso and .img) and unpacks them to compare their contents, metadata, permissions and timestamps.

It can even unpack and compare nested archives or filesystems, providing a comprehensive report of all differences found. diffoscope is a crucial component in the Reproducible Builds effort, helping developers ensure that identical source code produces identical build artifacts, byte-for-byte. It provides detailed, human-readable output, and also supports machine-readable JSON output for automated analysis. Its ability to "look inside" many file types makes it invaluable for debugging build irreproducibility or understanding subtle differences between software versions.

CAVEATS

  • Can be resource-intensive for very large or deeply nested comparisons, potentially consuming significant CPU and memory.
  • Requires external tools and libraries to parse certain file formats (e.g. binutils for ELF files, libarchive for many archive formats). When a tool is missing, diffoscope falls back to a less detailed, often binary, comparison; diffoscope --list-missing-tools shows which tools are absent.
  • Output can be very verbose, especially for complex differences. Parsing plain text output programmatically might be challenging; consider using the --json output for automated analysis.
  • Not all differences are immediately obvious without understanding the underlying file formats and their specific metadata or internal structures.
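To keep a large comparison manageable, it can help to check for missing helper tools first (diffoscope --list-missing-tools) and then bound the run. The sketch below assumes diffoscope's --max-container-depth option (how many levels of nested containers are unpacked), --max-report-size (a cap, in bytes, on the written report) and --exclude-command (skip external commands whose command line matches a regular expression). The limit values and the readelf pattern are illustrative assumptions, not recommended settings; bounded_compare and DIFFOSCOPE are hypothetical names.

```sh
#!/bin/sh
# Minimal sketch: bound a potentially expensive comparison.
# The limits and the readelf pattern are illustrative assumptions.
# DIFFOSCOPE is a hypothetical override (not a diffoscope feature).
bounded_compare() {
    # $1 = old artifact, $2 = new artifact
    # Unpack at most 3 levels of nested containers, cap the report at
    # roughly 20 MB, and skip readelf's slow DWARF debug-info dumps.
    "${DIFFOSCOPE:-diffoscope}" \
        --no-progress \
        --max-container-depth 3 \
        --max-report-size 20000000 \
        --exclude-command '^readelf.*--debug-dump' \
        --text - \
        "$1" "$2"
}
```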

OUTPUT FORMATS

diffoscope offers several output formats, each taking an output file argument (- for stdout). Besides plain text (--text), which is printed to stdout when no format is requested, it can write HTML (--html, or --html-dir for a multi-file report) for viewing in a web browser, with side-by-side, colour-highlighted differences, as well as Markdown (--markdown) and reStructuredText (--restructured-text). Structured JSON (--json) is intended for machine parsing and for integration into automated workflows or custom reporting tools.
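Because each format option takes its own output file, a single run should be able to produce several reports at once, for example a text summary on stdout plus HTML for people and JSON for scripts. A minimal sketch, assuming diffoscope is installed; write_reports, the report file names and the DIFFOSCOPE override are hypothetical.

```sh
#!/bin/sh
# Minimal sketch: one comparison, three reports.
# DIFFOSCOPE is a hypothetical override (not a diffoscope feature).
write_reports() {
    # $1 = old path, $2 = new path, $3 = directory for report files
    # Return 2 (diffoscope's error status) if the directory cannot be made.
    mkdir -p "$3" || return 2
    # Text goes to stdout; HTML and JSON go to files in $3.
    "${DIFFOSCOPE:-diffoscope}" \
        --no-progress \
        --text - \
        --html "$3/report.html" \
        --json "$3/report.json" \
        "$1" "$2"
}
```

Running write_reports old.tar new.tar reports/ would print the text diff to the terminal and leave reports/report.html and reports/report.json behind.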

REPRODUCIBLE BUILDS INTEGRATION

While useful for general file comparison, diffoscope's design is heavily influenced by the requirements of the Reproducible Builds effort. Rather than normalising away sources of non-determinism, it reports them (embedded timestamps, build paths, file ordering, user and group names) so that they can be fixed in the build itself, for example by honouring the SOURCE_DATE_EPOCH environment variable. Options such as --exclude-directory-metadata keep the focus on what is actually distributed by ignoring filesystem-level metadata of the compared files. This makes it a key tool for auditing and improving the reproducibility of software build processes.
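In a reproducibility check, diffoscope's exit status is often more useful than its report: it exits with 0 when the inputs are identical, 1 when differences were found and 2 on error. The sketch below turns that into a pass/fail gate for a CI job; the function name, messages and DIFFOSCOPE override are hypothetical, and building the artifact twice is left to the surrounding pipeline.

```sh
#!/bin/sh
# Sketch of a reproducibility gate built on diffoscope's exit status:
#   0 = no differences, 1 = differences found, anything else = an error.
# DIFFOSCOPE is a hypothetical override (not a diffoscope feature).
check_reproducible() {
    # $1, $2 = the same artifact produced by two independent builds
    status=0
    # Capture the status with || so the gate also works under set -e.
    "${DIFFOSCOPE:-diffoscope}" --no-progress --text - "$1" "$2" || status=$?
    case $status in
        0) echo "reproducible: $1 and $2 are identical" ;;
        1) echo "not reproducible: $1 and $2 differ (see report above)" >&2 ;;
        *) echo "diffoscope failed with exit status $status" >&2 ;;
    esac
    return "$status"
}
```

For example, check_reproducible build-a/foo.deb build-b/foo.deb (hypothetical paths) prints the text diff and returns 1 if the two builds differ, which fails a job running under set -e.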

HISTORY

diffoscope emerged as a fundamental tool within the Reproducible Builds project, an initiative focused on ensuring that software can be reliably rebuilt from source, yielding identical binary artifacts every time. Development began around 2014 under the name debbindiff, within Debian's reproducible builds work; it was later renamed diffoscope as its scope grew beyond Debian packages. It was driven by the need for a robust diffing utility capable of inspecting complex, nested binary formats that standard tools like diff could not handle. Its primary purpose is to help identify and debug non-deterministic elements introduced during the build process, such as varying timestamps, build paths, or user/group IDs, which can cause 'unreproducible' builds. By providing detailed insights into these differences, diffoscope plays a vital role in enhancing software trustworthiness and security.

SEE ALSO

diff(1), cmp(1), meld(1), file(1), ar(1), dpkg(1), rpm(1), tar(1), zip(1)
