LinuxCommandLibrary

qpdf

Transform and manipulate PDF files

TLDR

Extract pages 1-3, 5 and 6-10 from a PDF file and save them as another one

$ qpdf --empty --pages [path/to/input.pdf] [1-3,5,6-10] -- [path/to/output.pdf]

Merge (concatenate) all the pages of multiple PDF files and save the result as a new PDF
$ qpdf --empty --pages [path/to/file1.pdf file2.pdf ...] -- [path/to/output.pdf]

Merge (concatenate) given pages from a list of PDF files and save the result as a new PDF
$ qpdf --empty --pages [path/to/file1.pdf] [1,6-8] [path/to/file2.pdf] [3,4,5] -- [path/to/output.pdf]

Write each group of n pages to a separate output file with a given filename pattern
$ qpdf --split-pages=[n] [path/to/input.pdf] [path/to/out_%d.pdf]

Rotate specific pages of a PDF by a given angle
$ qpdf --rotate=[90:2,4,6] --rotate=[180:7-8] [path/to/input.pdf] [path/to/output.pdf]

Remove the password from a password-protected file
$ qpdf --password=[password] --decrypt [path/to/input.pdf] [path/to/output.pdf]

SYNOPSIS

qpdf [options] input.pdf [output.pdf]
qpdf --empty --pages input.pdf [pagespec] -- output.pdf
qpdf --json input.pdf
qpdf {--help|--version}

PARAMETERS

--encrypt user_password owner_password key_bits [restrictions] --
    Encrypts the output PDF with the given user and owner passwords and key length (40, 128, or 256 bits). The encryption option list must be terminated with --.

--decrypt
    Removes encryption from the output file. May require --password to open the input.

--linearize
    Optimizes the output PDF for web viewing (fast web view) by linearizing its internal structure.

--qdf
    Writes the PDF in QDF mode, a normalized form with uncompressed streams that is easier to read and edit in a text editor.

--empty --pages input.pdf [pagespec] --
    Starts from an empty PDF and adds the selected pages from one or more input files, which allows merging or extracting pages. The --pages list must be terminated with --.

--password=password
    Provides the password for accessing an encrypted input PDF file.

--flatten-annotations=[all|print|screen]
    Pushes annotation appearances (for example, form field widgets) into the page content so that they become static.

--remove-unreferenced-resources=[auto|yes|no]
    Controls whether, when pages are split or extracted, resources that a page does not reference are removed from its resource dictionary, which can reduce output size.

--json
    Outputs a JSON representation of the PDF's structural information, useful for programmatic analysis.

--coalesce-contents
    Attempts to combine multiple content streams on a single page into one, simplifying the page description.

--compression-level=level
    Sets the zlib compression level (1 to 9) used when writing Flate-compressed streams. Already compressed streams are affected only when --recompress-flate is also given.

--replace-input
    Overwrites the input file with the generated output; no output file name is given. Use with caution.

--help
    Displays a help message with detailed command-line options and usage.

--version
    Displays the version information of the qpdf utility.
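
Several of the options above can be combined in a single invocation. The following is a minimal sketch, assuming a reasonably recent qpdf release; the function name optimize_pdf and the file names are placeholders, and --object-streams, --recompress-flate and --check-linearization are qpdf options that are not listed on this page.

```shell
#!/bin/sh
# Sketch: produce a linearized, tightly compressed copy of a PDF.
# optimize_pdf and the file names are illustrative placeholders.
optimize_pdf() {
    # $1 = input file, $2 = output file
    # On success, the second command checks the linearization data of the result.
    qpdf --linearize \
         --object-streams=generate \
         --recompress-flate --compression-level=9 \
         "$1" "$2" &&
        qpdf --check-linearization "$2"
}

# Run only when qpdf and the placeholder input file are actually present.
if command -v qpdf >/dev/null 2>&1 && [ -f input.pdf ]; then
    optimize_pdf input.pdf optimized.pdf
fi
```

Whether the higher compression level is worth the extra processing time depends on the document, so comparing file sizes before and after is a reasonable check.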

DESCRIPTION

qpdf is a command-line tool for structural, content-preserving transformations of PDF files. It can encrypt and decrypt documents, linearize them for fast web viewing, merge or split PDFs, and uncompress streams for analysis. It is also useful for inspecting the internal object structure of a PDF, for example when debugging or performing security analysis. qpdf changes how a document is stored without altering the content of its pages.

CAVEATS

qpdf performs structural transformations; it does not edit content such as text or images. Options that overwrite files, especially --replace-input, can cause data loss if used carelessly. Complex or proprietary PDF features may not be fully preserved. QDF output is intended for inspection and debugging; a hand-edited QDF file is a valid PDF only if the edits are syntactically correct, and the fix-qdf tool shipped with qpdf can repair stream lengths and the cross-reference table afterwards.
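
One way to reduce the data-loss risk of --replace-input is to copy the original before rewriting it. This is a minimal sketch; the function name rewrite_in_place and the .bak suffix are illustrative choices, not qpdf conventions.

```shell
#!/bin/sh
# Sketch: rewrite a PDF in place, but keep a copy of the original first.
# rewrite_in_place and the ".bak" suffix are illustrative choices.
rewrite_in_place() {
    # $1 = PDF file to rewrite
    # Stop if the backup copy cannot be made.
    cp -p -- "$1" "$1.bak" || return 1
    # With --replace-input, no output file name is given.
    qpdf --linearize --replace-input "$1"
}

# Run only when qpdf and the placeholder input file are actually present.
if command -v qpdf >/dev/null 2>&1 && [ -f input.pdf ]; then
    rewrite_in_place input.pdf
fi
```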

INSPECTING PDF STRUCTURE

qpdf can expose a PDF's internal structure in detail. Options such as --json, --show-pages, --show-object, and --qdf help when debugging malformed PDFs, studying PDF vulnerabilities, or checking how a specific feature is implemented in a document.
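
A short inspection pass might look like the sketch below. The function name inspect_pdf and the file name are placeholders; --show-npages and --check are additional qpdf inspection options that this page does not list.

```shell
#!/bin/sh
# Sketch: print a short structural report for a PDF.
# inspect_pdf is an illustrative name.
inspect_pdf() {
    # $1 = PDF file to inspect
    echo "pages: $(qpdf --show-npages "$1")"
    # Object IDs of each page and of its content streams.
    qpdf --show-pages "$1"
    # The trailer dictionary is the entry point to the object graph.
    qpdf --show-object=trailer "$1"
    # Checks the file's syntax, cross-reference data, and streams.
    qpdf --check "$1"
}

# Run only when qpdf and the placeholder input file are actually present.
if command -v qpdf >/dev/null 2>&1 && [ -f input.pdf ]; then
    inspect_pdf input.pdf
fi
```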

SECURITY APPLICATIONS

Beyond encryption and decryption, qpdf can help sanitize PDF files. By default it writes only objects that are reachable from the document trailer, so rewriting a file drops orphaned objects that could carry hidden data. The restriction flags of --encrypt set permission bits, and passwords can be added or removed to control access to sensitive documents.
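
Permission bits are set through restriction flags placed between the key length and the closing -- of --encrypt. The sketch below assumes 256-bit encryption; the function name restrict_pdf and its arguments are placeholders. Note that these permissions are honored by compliant viewers rather than enforced cryptographically when the user password is empty.

```shell
#!/bin/sh
# Sketch: apply 256-bit encryption with restrictive permissions.
# restrict_pdf and its arguments are illustrative placeholders.
restrict_pdf() {
    # $1 = input file, $2 = output file, $3 = owner password
    # An empty user password lets anyone open the file, while the
    # permission flags ask viewers to block printing, changes, and extraction.
    qpdf --encrypt '' "$3" 256 \
         --print=none --modify=none --extract=n \
         -- "$1" "$2" &&
        qpdf --show-encryption "$2"
}

# Run only when qpdf and the placeholder input file are actually present.
if command -v qpdf >/dev/null 2>&1 && [ -f input.pdf ]; then
    restrict_pdf input.pdf restricted.pdf 'owner-secret'
fi
```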

HISTORY

qpdf was created by Jay Berkenbilt, and its first public release appeared in 2008. Written in C++, it was designed to manipulate PDF files reliably at a structural level. It supports complex PDF features such as multiple encryption schemes and object streams. Development has continued steadily, tracking changes in the PDF specification and covering use cases that range from simple merges to detailed structural analysis.

SEE ALSO

pdftk(1), mutool(1), gs(1), pdfinfo(1), pdfunite(1)
