pdfcrop

Crop PDF files to remove excess whitespace

TLDR

Automatically detect and remove the margin for each page in a PDF file
$ pdfcrop [path/to/input_file.pdf] [path/to/output_file.pdf]

Set the margins of each page to a specific value
$ pdfcrop [path/to/input_file.pdf] --margins '[left] [top] [right] [bottom]' [path/to/output_file.pdf]

Set the margins of each page to a specific value, using the same value for left, top, right and bottom
$ pdfcrop [path/to/input_file.pdf] --margins [300] [path/to/output_file.pdf]

Use a user-defined bounding box for cropping instead of automatically detecting it (coordinates are measured from the lower-left corner of the page)
$ pdfcrop [path/to/input_file.pdf] --bbox '[left] [bottom] [right] [top]' [path/to/output_file.pdf]

Use different user-defined bounding boxes for odd and even pages
$ pdfcrop [path/to/input_file.pdf] --bbox-odd '[left] [bottom] [right] [top]' --bbox-even '[left] [bottom] [right] [top]' [path/to/output_file.pdf]

Automatically detect margins using a lower resolution for improved performance
$ pdfcrop [path/to/input_file.pdf] --resolution [72] [path/to/output_file.pdf]

SYNOPSIS

pdfcrop [options] input.pdf [output.pdf]

PARAMETERS

--hires
    Use the `%%HiResBoundingBox` values reported by Ghostscript instead of `%%BoundingBox`, for more precise (non-integer) cropping.

--margins "<left> <top> <right> <bottom>" | "<all>"
    Add extra margins around the detected bounding box. Values are plain numbers in bp (big points, 1/72 inch). A single number applies to all four sides; if two numbers are given, they are also used for right and bottom.

[output.pdf]
    Optional second argument naming the output file; include a directory in the path to write the cropped PDF elsewhere. If omitted, `pdfcrop` appends -crop to the input filename.

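--bbox-odd "<left> <bottom> <right> <top>", --bbox-even "<left> <bottom> <right> <top>"
    Apply a user-defined bounding box to odd or even pages only. Without these options, every page is cropped to its own detected bounding box.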

--verbose
    Display detailed information during processing, including detected bounding box coordinates.

--restricted
    Turn on restricted mode, intended for calls made through TeX's restricted `\write18` shell escape. In this mode pdfcrop validates option values more strictly and limits which external program names can be set.

--debug
    Print debugging information and keep the temporary files created during the cropping process.
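
pdfcrop's help text gives the unit for --margins as bp (72 bp per inch), so a margin planned in millimetres has to be converted first. The following is a minimal sketch of that arithmetic; `mm_to_bp` is a hypothetical helper, not part of pdfcrop, and the file names in the usage comment are placeholders.

```shell
#!/bin/sh
# Hypothetical helper (not part of pdfcrop): convert millimetres to bp.
# 1 inch = 25.4 mm = 72 bp, so bp = mm * 72 / 25.4.
mm_to_bp() {
    # LC_ALL=C keeps the decimal separator a dot regardless of locale.
    LC_ALL=C awk -v mm="$1" 'BEGIN { printf "%.2f\n", mm * 72 / 25.4 }'
}

mm_to_bp 25.4    # one inch expressed in bp

# Possible use, with placeholder file names:
# pdfcrop --margins "$(mm_to_bp 5)" input.pdf output.pdf
```

Rounding to two decimals is an arbitrary choice for readability.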

DESCRIPTION

pdfcrop is a Perl script that determines the bounding box of each page in a PDF file and crops the page to fit its content. It uses Ghostscript for bounding box detection and a TeX engine (pdfTeX by default; XeTeX or LuaTeX on request) to write the cropped PDF. This makes it useful for removing white margins around content produced by LaTeX, scanners, or other PDF creation tools, so that a document is more compact and better suited to embedding or viewing on smaller screens. Options let you add extra margins, override the detected bounding box, or apply different bounding boxes to odd and even pages. pdfcrop is distributed with TeX Live.

CAVEATS

pdfcrop relies on external tools: Ghostscript for bounding box detection and a TeX engine (pdfTeX by default) for writing the output.
It may crop poorly when the detected bounding box does not match the visible content, for example scans where background noise or a full-page image counts as content.
Processing large or numerous PDF files can be slow, because every page is analysed by Ghostscript and the whole document is regenerated.
pdfcrop only crops pages; it is not a general-purpose PDF editor.
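
When many files need cropping, a small loop keeps the work manageable. This is a sketch only: `crop_all` and the `PDFCROP` override variable are hypothetical names introduced here, and the loop assumes the default `-crop` naming so that earlier results are skipped rather than cropped again.

```shell
#!/bin/sh
# Hypothetical wrapper (not part of pdfcrop): crop every PDF in a directory.
# Set PDFCROP=echo for a dry run that only prints the planned commands.
crop_all() {
    dir=${1:-.}
    for f in "$dir"/*.pdf; do
        [ -e "$f" ] || continue                   # glob matched nothing
        case $f in *-crop.pdf) continue ;; esac   # skip earlier results
        ${PDFCROP:-pdfcrop} "$f" "${f%.pdf}-crop.pdf"
    done
}

# Example (dry run in the current directory):
# PDFCROP=echo crop_all .
```

Files are processed one at a time, so total run time grows with the number of pages across all inputs.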

HOW IT WORKS

pdfcrop operates in two main steps:
1. Bounding Box Detection: It runs Ghostscript (`gs`) with the `bbox` device, which reports a bounding box for the content of each page.
2. PDF Regeneration: It then generates a small TeX file that includes each page of the original PDF with the page size set to the calculated bounding box (plus any extra margins), and runs pdfTeX on that file to create the new, cropped PDF.
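
The detection step can be explored by hand. Ghostscript's `bbox` device prints one `%%BoundingBox:` line (and one `%%HiResBoundingBox:` line) per page on standard error. The sketch below parses such lines; `parse_bboxes` is a hypothetical helper, and the sample coordinates are invented for illustration rather than taken from a real file.

```shell
#!/bin/sh
# Hypothetical helper: print "page left bottom right top" for each
# %%BoundingBox line read from standard input.
parse_bboxes() {
    awk '/^%%BoundingBox:/ { page++; print page, $2, $3, $4, $5 }'
}

# Sample lines in the format the bbox device prints (values invented).
printf '%s\n' \
    '%%BoundingBox: 72 90 523 770' \
    '%%HiResBoundingBox: 72.5 90.1 522.9 769.8' \
    '%%BoundingBox: 70 95 520 765' | parse_bboxes

# With a real file (placeholder name), redirect stderr into the pipe:
# gs -q -dBATCH -dNOPAUSE -sDEVICE=bbox input.pdf 2>&1 | parse_bboxes
```

This shows only how the coordinates are obtained; pdfcrop does its own parsing internally.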

OUTPUT NAMING CONVENTION

By default, if no output filename is specified, pdfcrop creates a new PDF file by appending -crop to the original input filename (e.g., `document.pdf` becomes `document-crop.pdf`). To choose a different name, pass the output filename as the second argument.
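
The default naming rule can be sketched as a small shell function, which is handy in scripts that need to know the output name in advance. `default_crop_name` is a hypothetical helper, not part of pdfcrop, and it handles only the `.pdf` and `.PDF` spellings of the extension.

```shell
#!/bin/sh
# Hypothetical helper: predict the default output name for an input file.
default_crop_name() {
    input=$1
    case $input in
        *.pdf|*.PDF) base=${input%.*} ;;   # drop the extension
        *)           base=$input ;;        # no extension to drop
    esac
    printf '%s-crop.pdf\n' "$base"
}

default_crop_name document.pdf          # prints document-crop.pdf
default_crop_name reports/figure1.pdf   # prints reports/figure1-crop.pdf
```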

HISTORY

pdfcrop is a Perl script written by Heiko Oberdiek and distributed with TeX Live. It was created to automate the otherwise tedious manual cropping of excess white margins in PDFs, particularly those generated by LaTeX. Over time it has gained more robust bounding box detection, finer control over margins, and support for TeX engines other than pdfTeX.

SEE ALSO

gs(1), pdflatex(1), pdfinfo(1), pdfjam(1)