diffoscope
Compare differences between files, directories, or archives
TLDR
Compare two files
Compare two files without displaying a progress bar
Compare two files and write an HTML report to a file (use - for stdout)
Compare two directories excluding files with a name matching a specified pattern
Compare two directories and control whether directory metadata is considered
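The tasks listed above can be sketched as command lines. The Python below only builds argument lists and does not run anything; the helper names and example paths are hypothetical, and the flags used (--no-progress, --html, --exclude, --exclude-directory-metadata) are assumed to be available in the installed diffoscope version.

```python
from typing import List

# Values accepted by --exclude-directory-metadata (assumed from diffoscope's help text).
METADATA_MODES = ("auto", "yes", "no", "recursive")


def diffoscope_argv(path1: str, path2: str, *options: str) -> List[str]:
    """Build an argument list: program name, options, then the two paths."""
    return ["diffoscope", *options, path1, path2]


def compare(path1: str, path2: str) -> List[str]:
    """Compare two files."""
    return diffoscope_argv(path1, path2)


def compare_without_progress(path1: str, path2: str) -> List[str]:
    """Compare two files without displaying a progress bar."""
    return diffoscope_argv(path1, path2, "--no-progress")


def html_report(path1: str, path2: str, outfile: str = "-") -> List[str]:
    """Compare two files and write an HTML report ('-' means stdout)."""
    return diffoscope_argv(path1, path2, "--html", outfile)


def compare_excluding(dir1: str, dir2: str, pattern: str) -> List[str]:
    """Compare two directories, skipping names that match a pattern."""
    return diffoscope_argv(dir1, dir2, "--exclude", pattern)


def compare_with_metadata_mode(dir1: str, dir2: str, mode: str = "auto") -> List[str]:
    """Compare two directories and choose how directory metadata is treated."""
    if mode not in METADATA_MODES:
        raise ValueError(f"mode must be one of {METADATA_MODES}, got {mode!r}")
    return diffoscope_argv(dir1, dir2, "--exclude-directory-metadata", mode)
```

Any of these lists can be passed to subprocess.run() to execute the comparison.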
SYNOPSIS
diffoscope [OPTIONS] <FILE_OR_DIR1> <FILE_OR_DIR2>
PARAMETERS
-h, --help
Show a help message and exit.
--version
Show program's version number and exit.
--exclude-directory-metadata {auto,yes,no,recursive}
Control whether filesystem-level directory metadata (such as permissions, ownership, timestamps, and extended attributes) is excluded from the comparison. With recursive, the metadata of archive members is excluded as well.
--exclude GLOB_PATTERN
Exclude files whose names (including any directory part) match GLOB_PATTERN.
--max-container-depth DEPTH
Recurse into containers (archives, filesystem images, and similar) at most DEPTH levels deep.
--no-default-limits
Disable most default output and diff-calculation limits, so that full differences are shown even for very large files.
--shallow
Do not recurse into archives or filesystems (e.g., compare a .zip file as a plain file, not its contents).
--progress
Show an approximate progress bar during the comparison.
--no-progress
Do not show a progress bar.
--text OUTPUT_FILE
Write plain-text output to OUTPUT_FILE (use - for stdout). Plain text on stdout is the default when no output option is given.
--html OUTPUT_FILE
Write an HTML report, suitable for web browsers, to OUTPUT_FILE (use - for stdout).
--json OUTPUT_FILE
Write JSON output, suitable for machine parsing, to OUTPUT_FILE (use - for stdout).
--unified
Output unified diff format (implies --text).
--new-directory
Use
--old-directory
Use
--use-file-mtime
When comparing, treat modification times of files as 0 (epoch) to ignore timestamp differences.
--use-source-date-epoch
When comparing, treat modification times of files as the SOURCE_DATE_EPOCH environment variable, used in reproducible builds.
--debug
Show debugging messages during execution.
--profile [OUTPUT_FILE]
Write profiling information to OUTPUT_FILE (use - for stdout) to help identify performance bottlenecks.
--max-report-size BYTES
Maximum size of a report in a given format. Output beyond this limit is truncated.
--max-diff-block-lines LINES
Maximum number of lines to output per unified-diff block.
--max-diff-input-lines LINES
Maximum number of lines fed to the diff calculation.
--no-timestamps
Do not output timestamps in the diff header.
--list-tools [DISTRO]
List the external tools that diffoscope may need to perform comparisons. If DISTRO is given, list the packages that provide them on that distribution.
--difftool TOOL
Compare differences one by one using the external tool TOOL.
DESCRIPTION
diffoscope recursively compares two files, directories, or archives. Unlike standard diff, it understands many file formats and containers (e.g., .deb, .rpm, .jar, .zip, .tar, and disk images such as .iso and .img), so it can unpack them and compare their contents, metadata, permissions, and timestamps.
It also unpacks nested archives and filesystems and reports every difference it finds. diffoscope is developed as part of the Reproducible Builds effort, where it helps developers check that identical source code produces byte-for-byte identical build artifacts. Its output is human-readable by default, with JSON available for automated analysis. Because it can look inside many file types, it is useful for debugging build irreproducibility and for understanding differences between software versions.
CAVEATS
- Can be resource-intensive for very large or deeply nested comparisons, potentially consuming significant CPU and memory.
- Requires external tools and libraries for parsing many file formats (e.g., binutils for ELF analysis, libarchive for various archives). When a tool is missing, the affected files are compared in less detail; --list-missing-tools reports which tools are absent.
- Output can be very verbose, especially for complex differences. Parsing plain text output programmatically might be challenging; consider using the --json output for automated analysis.
- Not all differences are immediately obvious without understanding the underlying file formats and their specific metadata or internal structures.
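The dependency caveat above can be checked ahead of a long comparison. The sketch below looks for a few helper programs on PATH; the tool names in EXAMPLE_TOOLS are an illustrative assumption, and the set that actually matters depends on the formats being compared.

```python
import shutil
from typing import Iterable, List

# Hypothetical selection of helper programs; adjust to the formats you compare.
EXAMPLE_TOOLS = ["readelf", "objdump", "xxd"]


def missing_tools(tools: Iterable[str]) -> List[str]:
    """Return the names from `tools` that cannot be found on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]
```

Calling missing_tools(EXAMPLE_TOOLS) returns the subset that is not installed; an empty list means all of them were found.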
OUTPUT FORMATS
diffoscope supports several output formats. The default is human-readable plain text (--text). --html produces a report for viewing in a web browser, with visual cues for differences. --json produces structured output for machine parsing and for integration into automated workflows or custom reporting tools.
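As an illustration of machine parsing, the sketch below walks a JSON report. It assumes each node is an object with "source1", "source2", an optional "unified_diff", and an optional "details" list of child nodes; this layout is an assumption and should be verified against the output of the installed version. The function names and the sample data are hypothetical.

```python
import json
from typing import Any, Dict, Iterator, Tuple


def load_report(path: str) -> Dict[str, Any]:
    """Load a JSON report previously written with --json."""
    with open(path, encoding="utf-8") as handle:
        return json.load(handle)


def iter_differences(
    node: Dict[str, Any], depth: int = 0
) -> Iterator[Tuple[int, str, str, bool]]:
    """Yield (depth, source1, source2, has_diff) for a node and all its children."""
    yield (
        depth,
        node.get("source1", ""),
        node.get("source2", ""),
        bool(node.get("unified_diff")),
    )
    for child in node.get("details", []):
        yield from iter_differences(child, depth + 1)


# Hand-written sample in the assumed layout, not real diffoscope output.
SAMPLE = {
    "source1": "a.tar",
    "source2": "b.tar",
    "unified_diff": None,
    "details": [
        {
            "source1": "file list",
            "source2": "file list",
            "unified_diff": "@@ -1 +1 @@\n-x\n+y\n",
        }
    ],
}
```

Iterating over iter_differences(SAMPLE) gives one tuple for the archive itself and one for the nested "file list" entry, which is the only node carrying a diff.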
REPRODUCIBLE BUILDS INTEGRATION
While useful for general file comparison, diffoscope's design is driven by the requirements of the Reproducible Builds effort. Options such as --exclude-directory-metadata let a comparison ignore filesystem-level metadata (permissions, timestamps, extended attributes) so that the report concentrates on genuine content differences, which makes it practical for verifying deterministic builds and for auditing the reproducibility of build processes.
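A reproducibility check can be sketched as a thin wrapper around the exit status. The code assumes the diff-style convention (0 for identical inputs, 1 for differences, anything else for an error) and assumes the --exclude-directory-metadata option is available; the function names and the injectable runner are hypothetical and exist only to make the logic testable without running diffoscope.

```python
import subprocess
from typing import Callable, Sequence


def default_runner(argv: Sequence[str]) -> int:
    """Run the command and return its exit status."""
    return subprocess.run(list(argv)).returncode


def is_reproducible(
    build_a: str,
    build_b: str,
    run: Callable[[Sequence[str]], int] = default_runner,
) -> bool:
    """Return True if diffoscope reports no differences between two build outputs.

    Assumed exit statuses: 0 = identical, 1 = differences found, other = error.
    """
    # Ignore filesystem-level metadata so only content differences count.
    argv = ["diffoscope", "--exclude-directory-metadata", "yes", build_a, build_b]
    status = run(argv)
    if status == 0:
        return True
    if status == 1:
        return False
    raise RuntimeError(f"diffoscope failed with exit status {status}")
```

In a CI job, is_reproducible("build1/pkg.deb", "build2/pkg.deb") could gate a release on two independent builds producing identical artifacts; the paths here are placeholders.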
HISTORY
diffoscope originated in the Reproducible Builds project, an initiative to ensure that software can be rebuilt from source into identical binary artifacts every time. Development began around 2013-2014, initially under the name debbindiff, driven by the need for a diffing utility that could inspect complex, nested binary formats that standard tools like diff could not handle. Its primary purpose is to help identify and debug non-deterministic elements introduced during the build process, such as varying timestamps, build paths, or user/group IDs, which make builds unreproducible. By exposing these differences in detail, diffoscope supports efforts to make software more trustworthy and secure.