diff-pdf
Visually compare two PDF files
TLDR
Compare PDFs, indicating changes using return codes (0 = no difference, 1 = PDFs differ)
Compare PDFs, outputting a PDF with visually highlighted differences
Compare PDFs, viewing differences in a simple GUI
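The three invocations above can be sketched as a small Python helper that builds the matching argument lists. The `--output-diff` and `--view` flags are taken from diff-pdf's own help output; the helper name `diff_pdf_command` and the file names are illustrative assumptions, not part of diff-pdf.

```python
from typing import List, Optional


def diff_pdf_command(
    file1: str,
    file2: str,
    output_diff: Optional[str] = None,
    view: bool = False,
) -> List[str]:
    """Build a diff-pdf argument list (hypothetical helper, not part of diff-pdf)."""
    cmd = ["diff-pdf"]
    if output_diff is not None:
        # Write a PDF with the differences highlighted
        cmd.append(f"--output-diff={output_diff}")
    if view:
        # Open the simple GUI viewer
        cmd.append("--view")
    cmd += [file1, file2]
    return cmd


# 1. Comparison reported through the exit status only
print(" ".join(diff_pdf_command("a.pdf", "b.pdf")))
# 2. Highlighted differences written to diff.pdf
print(" ".join(diff_pdf_command("a.pdf", "b.pdf", output_diff="diff.pdf")))
# 3. Differences shown in the GUI
print(" ".join(diff_pdf_command("a.pdf", "b.pdf", view=True)))
```

The returned list can be passed directly to `subprocess.run`, which avoids shell quoting problems with file names that contain spaces.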
SYNOPSIS
diff-pdf [OPTIONS] FILE1.pdf FILE2.pdf
PARAMETERS
--output-diff=FILE
Writes a PDF with the visual differences highlighted to FILE. If neither this option nor --view is given, the result is reported only through the exit status.
-s, --skip-identical
Outputs only the pages that contain differences, omitting identical pages from the result.
--dpi=DPI
Sets the rasterization resolution used for the comparison (default: 300 dpi). Higher values catch finer differences but make rendering slower and use more memory.
-m, --mark-differences
Additionally marks differences on the left side of each page, which makes small changes easier to locate.
-g, --grayscale
Shows unchanged parts of each page in gray so that only the differences appear in color.
--channel-tolerance=NUM
Treats color channel values as equal if they differ by no more than NUM, from 0 (exact matching, the default) to 255. Useful for ignoring faint rendering noise such as anti-aliasing differences.
--per-page-pixel-tolerance=NUM
Sets the total number of pixels that may differ on a page before that page is reported as different.
--view
Opens a window for viewing the differences interactively.
-v, --verbose
Prints more detailed output during processing.
-h, --help
Shows the help message with the available options, then exits.
DESCRIPTION
diff-pdf is a command-line tool for visually comparing two PDF files. Unlike text-based diff tools, it renders each page of both PDFs to an image and compares the results pixel by pixel, so it catches changes in text, fonts, layout, images, and formatting that a text diff would miss. The result can be reported through the exit status alone, shown in a simple graphical viewer, or written to a new PDF with the differences highlighted. This makes it useful for quality assurance, document version control, and verifying the output of PDF processing tools, particularly when two PDFs must look identical even though their internal structure differs.
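The pixel-by-pixel comparison described above can be illustrated with a minimal pure-Python sketch. It is not diff-pdf's actual implementation: the page representation (nested lists of RGB tuples), the function names, and the requirement that both pages have equal dimensions are all assumptions made for the example. The two tolerance parameters mirror the idea of a per-channel tolerance and a per-page pixel budget.

```python
from typing import List, Tuple

Pixel = Tuple[int, int, int]  # (R, G, B), each channel 0..255
Image = List[List[Pixel]]     # rows of pixels


def count_differing_pixels(a: Image, b: Image, channel_tolerance: int = 0) -> int:
    """Count pixels where any channel differs by more than channel_tolerance."""
    if len(a) != len(b) or any(len(ra) != len(rb) for ra, rb in zip(a, b)):
        # Assumption for this sketch: both rendered pages have the same size
        raise ValueError("images must have the same dimensions")
    differing = 0
    for row_a, row_b in zip(a, b):
        for pa, pb in zip(row_a, row_b):
            if any(abs(ca - cb) > channel_tolerance for ca, cb in zip(pa, pb)):
                differing += 1
    return differing


def pages_differ(a: Image, b: Image,
                 channel_tolerance: int = 0, pixel_tolerance: int = 0) -> bool:
    """A page counts as different once more than pixel_tolerance pixels differ."""
    return count_differing_pixels(a, b, channel_tolerance) > pixel_tolerance


white, black, near_white = (255, 255, 255), (0, 0, 0), (250, 250, 250)
page1 = [[white, white], [white, black]]
page2 = [[white, near_white], [white, white]]

print(count_differing_pixels(page1, page2))                       # 2
print(count_differing_pixels(page1, page2, channel_tolerance=8))  # 1
print(pages_differ(page1, page2, channel_tolerance=8, pixel_tolerance=1))  # False
```

With exact matching, the faint near-white pixel counts as a difference; a small channel tolerance absorbs it while the black-versus-white pixel is still detected.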
CAVEATS
diff-pdf performs a visual (raster) comparison, not a structural analysis. Slight differences in font rendering or anti-aliasing can therefore be reported as differences even when the content is identical and the change is negligible to the human eye. Because every page is rendered as an image, comparison can be slow and memory-hungry for large PDFs or high DPI settings. Rendering is done by Poppler, so accuracy and supported PDF features are tied to that library. Scanned documents, which contain page images rather than embedded text, may not compare well.
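To give a rough sense of why resolution affects speed and memory, the sketch below estimates the size of one rendered page. It assumes page dimensions in points (1/72 inch) and a 4-bytes-per-pixel buffer; the buffer format and the function names are assumptions for illustration, not details taken from diff-pdf.

```python
import math
from typing import Tuple

POINTS_PER_INCH = 72


def raster_size(width_pt: float, height_pt: float, dpi: int) -> Tuple[int, int]:
    """Pixel dimensions of a page rendered at the given DPI."""
    width_px = math.ceil(width_pt * dpi / POINTS_PER_INCH)
    height_px = math.ceil(height_pt * dpi / POINTS_PER_INCH)
    return width_px, height_px


def raster_bytes(width_pt: float, height_pt: float, dpi: int,
                 bytes_per_pixel: int = 4) -> int:
    """Approximate buffer size for one page (bytes_per_pixel is an assumption)."""
    width_px, height_px = raster_size(width_pt, height_pt, dpi)
    return width_px * height_px * bytes_per_pixel


# US Letter is 612 x 792 points
print(raster_size(612, 792, 100))   # (850, 1100)
print(raster_size(612, 792, 300))   # (2550, 3300)
print(raster_bytes(612, 792, 300))  # 33660000
```

Pixel count grows with the square of the DPI, so tripling the resolution from 100 to 300 dpi multiplies the per-page buffer by nine, and two such pages are held for each comparison.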
EXIT STATUS
diff-pdf returns 0 if the files are visually identical and 1 if differences are found; a value greater than 1 indicates an error. This makes it well suited to scripts, automated tests, and continuous integration/delivery (CI/CD) pipelines that verify PDF output.
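A script can act on these exit codes as in the following sketch. The mapping follows the description above (0 identical, 1 different, anything else treated as an error); the function names and the injectable `run` parameter are assumptions added so the logic can be exercised without diff-pdf installed.

```python
import subprocess
from typing import Callable, Optional, Sequence


def interpret_exit_status(code: int) -> str:
    """Map a diff-pdf exit status to a label, per the description above."""
    if code == 0:
        return "identical"
    if code == 1:
        return "different"
    return "error"  # any other status is treated as a failure


def compare_pdfs(
    file1: str,
    file2: str,
    run: Optional[Callable[[Sequence[str]], int]] = None,
) -> str:
    """Run diff-pdf on two files and classify the result.

    `run` is a hypothetical hook that takes the argument list and returns
    an exit status; by default it invokes the real diff-pdf binary.
    """
    if run is None:
        run = lambda cmd: subprocess.run(cmd).returncode
    return interpret_exit_status(run(["diff-pdf", file1, file2]))


# Example with a stand-in runner that pretends the PDFs differ
print(compare_pdfs("expected.pdf", "actual.pdf", run=lambda cmd: 1))  # different
```

In a CI job, a result other than "identical" would typically fail the build, with "error" reported separately from a genuine visual difference.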
DEPENDENCIES
diff-pdf is built on Poppler (through its GLib bindings, `poppler-glib`) for rendering PDFs, Cairo for drawing, and wxWidgets for its graphical viewer. These dependencies are typically available through standard Linux package managers.
HISTORY
diff-pdf was written by Václav Slavík as a visual comparison utility built on existing PDF rendering libraries such as Poppler. Its development was driven by the need to compare PDF documents visually, since purely textual diff tools cannot show changes in layout, fonts, or images.