LinuxCommandLibrary

pdfinfo

Extract metadata from PDF files

TLDR

Print PDF file information

$ pdfinfo [path/to/file.pdf]

Specify the user password needed to open an encrypted PDF file
$ pdfinfo -upw [password] [path/to/file.pdf]

Specify owner password for PDF file to bypass security restrictions
$ pdfinfo -opw [password] [path/to/file.pdf]

SYNOPSIS

pdfinfo [options] PDF-file

PARAMETERS

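-f number
    Specifies the first page to examine. Defaults to 1.

-l number
    Specifies the last page to examine. When -f and -l request more than one page, pdfinfo prints the size (and, with -box, the bounding boxes) of each requested page; otherwise only a single page is examined.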
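When a page range is requested, each page gets its own size line. As a minimal sketch, the hypothetical shell function pdf_page_sizes below extracts the page number, width, and height (in points) from those lines. It assumes the Poppler layout "Page    N size: W x H pts (...)", which may differ between versions.

```shell
# Hypothetical helper (not part of pdfinfo): print "page width height",
# sizes in points, for each per-page size line read from stdin.
# Assumes lines shaped like "Page    2 size: 612 x 792 pts (letter)";
# the single-page "Page size:" line is deliberately not matched.
pdf_page_sizes() {
    awk '$1 == "Page" && $3 == "size:" { print $2, $4, $6 }'
}
```

For example, to list the sizes of the first three pages:
$ pdfinfo -f 1 -l 3 path/to/file.pdf | pdf_page_sizes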

-opw password
    Specifies the owner password for an encrypted PDF file. Supplying it bypasses all security restrictions.

-upw password
    Specifies the user password for an encrypted PDF file. It is required to open documents protected by a non-empty user password.

-box
    Prints the page boxes (MediaBox, CropBox, BleedBox, TrimBox, and ArtBox) for each examined page.
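Box coordinates are given in PDF points (1/72 inch), and the last four numbers on each box line are the lower-left and upper-right corners (x1 y1 x2 y2). The hypothetical helper pdf_box_mm below is a sketch that converts one named box into millimetres; it assumes that line layout and has not been checked against every Poppler release.

```shell
# Hypothetical helper (not part of pdfinfo): convert a page box reported
# by "pdfinfo -box" (read from stdin) into "width x height mm".
# Assumes each box line ends with x1 y1 x2 y2 in points, e.g.
#   "MediaBox:     0.00     0.00   612.00   792.00"   (single page)
#   "Page    2 TrimBox:  10.00 20.00 602.00 772.00"   (with -f/-l)
# $1: box name such as MediaBox, CropBox or TrimBox (default: MediaBox).
pdf_box_mm() {
    awk -v box="${1:-MediaBox}:" '
        $1 == box || ($1 == "Page" && $3 == box) {
            # 1 pt = 25.4/72 mm
            w = ($(NF-1) - $(NF-3)) * 25.4 / 72
            h = ($NF - $(NF-2)) * 25.4 / 72
            printf "%.1f x %.1f mm\n", w, h
        }'
}
```

For example, to print the trim size of each page in a range:
$ pdfinfo -box -f 1 -l 2 path/to/file.pdf | pdf_box_mm TrimBox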

-enc encoding-name
    Sets the text encoding used for output. Use -listenc to list the available encodings.

-isodates
    Prints dates in ISO 8601 format, including the time-zone offset (YYYY-MM-DDTHH:MM:SS±HH:MM).

-rawdates
    Prints dates in the raw string format as stored in the PDF file.
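Raw dates follow the PDF date syntax D:YYYYMMDDHHmmSSOHH'mm', where O is +, - or Z. As a sketch of how that form relates to the ISO 8601 output, the hypothetical function pdf_rawdate_to_iso below rewrites such a string; it only handles dates given to the second and passes anything shorter through unchanged.

```shell
# Hypothetical helper (not part of pdfinfo): convert raw PDF dates read
# from stdin, e.g. D:20240131120000+01'00', to ISO 8601 such as
# 2024-01-31T12:00:00+01:00. Lines without at least 14 date digits
# (YYYYMMDDHHmmSS) are printed unchanged.
pdf_rawdate_to_iso() {
    awk '{
        d = $0
        sub(/^D:/, "", d)
        if (length(d) < 14 || substr(d, 1, 14) !~ /^[0-9]+$/) { print; next }
        iso = substr(d, 1, 4) "-" substr(d, 5, 2) "-" substr(d, 7, 2)
        iso = iso "T" substr(d, 9, 2) ":" substr(d, 11, 2) ":" substr(d, 13, 2)
        tz = substr(d, 15, 1)
        if (tz == "Z") {
            iso = iso "Z"
        } else if (tz == "+" || tz == "-") {
            # Offset hours sit at positions 16-17, minutes at 19-20.
            m = substr(d, 19, 2)
            if (m == "") m = "00"
            iso = iso tz substr(d, 16, 2) ":" m
        }
        print iso
    }'
}
```

For example:
$ pdfinfo -rawdates path/to/file.pdf | awk '/^CreationDate:/ {print $2}' | pdf_rawdate_to_iso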

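-meta
    Prints document-level metadata, i.e. the XML Metadata stream from the PDF file's Catalog object, if present. (Poppler's pdfinfo has no JSON output option; for structured data, parse the default "Key: value" lines or this XML stream.)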
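Because pdfinfo's default text output is one "Key: value" pair per line, it is easy to convert for programs that expect JSON. The sketch below uses a hypothetical shell function, pdfinfo_to_json, that splits each line at its first colon and emits a flat JSON object with every value kept as a string. It escapes only backslashes, double quotes, and tabs, which assumes the metadata contains no other control characters.

```shell
# Hypothetical helper (not part of pdfinfo): turn "Key: value" lines read
# from stdin into one flat JSON object; all values stay strings.
pdfinfo_to_json() {
    awk '
        # Escape backslashes, double quotes and tabs for a JSON string.
        function esc(s,    out, i, c) {
            out = ""
            for (i = 1; i <= length(s); i++) {
                c = substr(s, i, 1)
                if (c == "\\" || c == "\"") out = out "\\" c
                else if (c == "\t") out = out "\\t"
                else out = out c
            }
            return out
        }
        BEGIN { printf "{"; n = 0 }
        {
            i = index($0, ":")
            if (i == 0) next              # skip lines without a key
            key = substr($0, 1, i - 1)
            val = substr($0, i + 1)
            sub(/^[ \t]+/, "", val)       # drop the alignment padding
            printf "%s\"%s\": \"%s\"", (n++ ? ", " : ""), esc(key), esc(val)
        }
        END { print "}" }'
}
```

For example:
$ pdfinfo path/to/file.pdf | pdfinfo_to_json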

-v
    Prints the copyright and version information for the pdfinfo utility.

DESCRIPTION

The pdfinfo command extracts information about a PDF file.
It ships with Poppler's command-line utilities (packaged as poppler-utils on many Linux distributions) and provides a command-line interface for inspecting PDF metadata.
This includes the title, author, subject, keywords, creation and modification dates, page count, page size (and, with -box, the media, crop, bleed, trim, and art boxes), encryption status, and more.
This makes it useful for scripting, document management, or quickly checking a PDF file's properties without opening it in a viewer.
pdfinfo is read-only: it only reads information from the PDF and never modifies the original document.

CAVEATS

pdfinfo expects a valid PDF file. On malformed or corrupted documents it may print error messages, report incomplete information, or fail to open the file.
If a PDF is protected by a non-empty user password, the user or owner password must be supplied with -upw or -opw; otherwise pdfinfo reports an error and exits. Files that set only an owner password (to restrict actions such as printing or copying) open without either option.
The amount and type of metadata available depend entirely on what was embedded into the PDF during its creation. Some PDFs may have very sparse information.
This command is designed for metadata extraction only; it does not extract the textual content of the PDF. Use pdftotext for that purpose.
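In scripts, the exit status is the most reliable way to tell whether pdfinfo could open a file. The hypothetical function pdfinfo_status_msg below maps a status to the meanings listed under EXIT CODES in Poppler's manual page; the table is an assumption to verify against "man pdfinfo" for the installed version.

```shell
# Hypothetical helper (not part of pdfinfo): describe a pdfinfo exit
# status using the codes listed in the Poppler manual page.
pdfinfo_status_msg() {
    case "$1" in
        0)  echo "No error" ;;
        1)  echo "Error opening a PDF file" ;;
        2)  echo "Error opening an output file" ;;
        3)  echo "Error related to PDF permissions" ;;
        99) echo "Other error" ;;
        *)  echo "Unknown exit status $1" ;;
    esac
}
```

For example:
$ pdfinfo path/to/file.pdf >/dev/null 2>&1; pdfinfo_status_msg "$?"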

TYPICAL OUTPUT FIELDS

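By default, pdfinfo prints one field per line in "Key: value" form. Typical fields include:
Title, Author, Creator, Producer, CreationDate, ModDate, Tagged, UserProperties, Pages, Encrypted, Page size, Page rot, File size, Optimized, PDF version.
Document-information fields such as Title or Author appear only when the document sets them, and the exact set of fields varies between Poppler versions.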
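Since each field sits on its own line, a single value can be pulled out with awk. The hypothetical function pdf_field below is a minimal sketch that assumes the value starts after "Key:" and is padded with spaces for alignment.

```shell
# Hypothetical helper (not part of pdfinfo): print the value of one field,
# e.g. Pages, Title or "PDF version", from pdfinfo output on stdin.
# Assumes "Key: value" lines; only the first matching line is used.
pdf_field() {
    awk -v key="$1" '
        index($0, key ":") == 1 {
            val = substr($0, length(key) + 2)
            sub(/^[ \t]+/, "", val)   # drop the alignment padding
            print val
            exit
        }'
}
```

For example, to get the page count:
$ pdfinfo path/to/file.pdf | pdf_field Pages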

USAGE WITH ENCRYPTED FILES

pdfinfo does not prompt for a password. If a PDF requires a user password to open, supply it with -upw, or supply the owner password with -opw; either one decrypts the document, and without one pdfinfo prints an error and exits. The owner password additionally grants full access, bypassing all security restrictions such as limits on printing or copying.
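Once a file is open, the permissions it declares are summarised on the Encrypted line. The hypothetical function pdf_permission below sketches how a script might check one of them; it assumes the Poppler layout "Encrypted: yes (print:yes copy:no change:no addNotes:no ...)" or "Encrypted: no", and treats an unencrypted file as unrestricted.

```shell
# Hypothetical helper (not part of pdfinfo): print yes/no for one
# permission (print, copy, change or addNotes) using the "Encrypted:"
# line of pdfinfo output on stdin. Prints "unknown" if the permission
# is not listed, and nothing if there is no Encrypted line.
pdf_permission() {
    awk -v perm="$1" '
        /^Encrypted:/ {
            if ($2 == "no") { print "yes"; exit }
            for (i = 3; i <= NF; i++) {
                f = $i
                gsub(/[()]/, "", f)       # strip the surrounding parentheses
                if (index(f, perm ":") == 1) {
                    print substr(f, length(perm) + 2)
                    exit
                }
            }
            print "unknown"
            exit
        }'
}
```

For example:
$ pdfinfo -upw [password] path/to/file.pdf | pdf_permission copy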

HISTORY

pdfinfo originated as part of the Xpdf project, a free PDF viewer and toolkit. Later, the core PDF rendering library and utilities, including pdfinfo, were forked and continued development under the Poppler project.
Poppler provides an open-source PDF rendering library, used by many Linux desktop applications, together with command-line tools such as pdfinfo. Poppler's pdfinfo has gained options over time, for example -isodates for ISO 8601 dates, reflecting its continued use in automated PDF processing.

SEE ALSO

pdftotext(1), pdffonts(1), pdfimages(1), pdftops(1), file(1), exiftool(1)
