LinuxCommandLibrary

pdf-parser

analyzes PDF file structure

TLDR

Parse PDF structure
$ pdf-parser [file.pdf]
Search for keyword
$ pdf-parser -s [keyword] [file.pdf]
Show specific object
$ pdf-parser -o [5] [file.pdf]
Dump the stream of a specific object to a file (add -f to decode it first)
$ pdf-parser -d [output.bin] -o [5] [file.pdf]
Show statistics
$ pdf-parser -a [file.pdf]
Filter by object type
$ pdf-parser -t [/JavaScript] [file.pdf]
Pass streams through their filters (e.g. FlateDecode)
$ pdf-parser -f [file.pdf]

SYNOPSIS

pdf-parser [-s search] [-o id] [-t type] [-f] [options] file

DESCRIPTION

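pdf-parser parses a PDF file into its fundamental elements (indirect objects, streams, cross-reference tables, trailers) without rendering it. It is mainly used for malware analysis and forensics.
Run without options, it lists every element in the file. For each indirect object it shows the ID, the type, the objects it references, and its dictionary.
Searching with -s matches a string inside indirect objects, which helps locate embedded scripts, URLs, or other suspicious content. JavaScript and launch actions are common malware vectors.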
Stream extraction with -d writes an object's stream to a file. With -f, the stream is first passed through its filters, so compressed or encoded data such as FlateDecode comes out decoded.
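To picture what applying a stream filter involves, here is a minimal Python sketch that inflates a FlateDecode stream with the standard zlib module. It is an illustration only, not pdf-parser's own code; the function name decode_flate and the sample content are hypothetical.

```python
import zlib


def decode_flate(raw: bytes) -> bytes:
    """Inflate the raw bytes of a stream whose /Filter is /FlateDecode."""
    # FlateDecode data uses the zlib/deflate format, so zlib can inflate it.
    return zlib.decompress(raw)


# Build a compressed sample so the sketch is self-contained (hypothetical data).
original = b"BT /F1 12 Tf (Hello) Tj ET"
compressed = zlib.compress(original)

decoded = decode_flate(compressed)
print(decoded)
```

Real streams may chain several filters or be deliberately malformed, which this sketch does not handle.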
Statistics (-a) summarize how many elements of each kind, and indirect objects of each type, the file contains. This quickly flags files with unusual structures.
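The idea behind a per-type summary can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not how pdf-parser is implemented: it uses regular expressions on uncompressed objects only, and the names OBJ_RE, TYPE_RE, type_statistics, and SAMPLE are all hypothetical.

```python
import re
from collections import Counter

# Matches "<id> <generation> obj ... endobj" for plain, uncompressed objects.
OBJ_RE = re.compile(rb"(\d+)\s+(\d+)\s+obj\b(.*?)\bendobj", re.DOTALL)
# Matches the value of a /Type key, with or without whitespace before the name.
TYPE_RE = re.compile(rb"/Type\s*(/[A-Za-z0-9]+)")

# Hypothetical PDF fragment; object 4 is an action with no /Type key.
SAMPLE = b"""1 0 obj
<< /Type /Catalog /Pages 2 0 R /OpenAction 4 0 R >>
endobj
2 0 obj
<< /Type /Pages /Kids [3 0 R] /Count 1 >>
endobj
3 0 obj
<< /Type /Page /Parent 2 0 R >>
endobj
4 0 obj
<< /S /JavaScript /JS (app.alert) >>
endobj
"""


def type_statistics(pdf_bytes):
    """Count indirect objects by the value of their /Type key."""
    stats = Counter()
    for match in OBJ_RE.finditer(pdf_bytes):
        type_match = TYPE_RE.search(match.group(3))
        name = type_match.group(1).decode("ascii") if type_match else "(no /Type)"
        stats[name] += 1
    return stats


stats = type_statistics(SAMPLE)
for name, count in stats.items():
    print(name, count)
```

In this sample the JavaScript action carries no /Type key, so a type-based summary alone would not name it; a keyword search complements the statistics.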
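Reference following with -r selects the objects that reference a given object ID. Tracing these references reveals how the document is put together, for example which object triggers a script.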
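As a rough illustration of reference tracing, the Python sketch below finds which objects contain an indirect reference (written "N G R" in PDF syntax) to a target object. It is a simplified stand-in rather than pdf-parser's implementation: it ignores the generation number, handles only uncompressed objects, and the names OBJ_RE, referencing_objects, and SAMPLE are hypothetical.

```python
import re

# Matches "<id> <generation> obj ... endobj" for plain, uncompressed objects.
OBJ_RE = re.compile(rb"(\d+)\s+(\d+)\s+obj\b(.*?)\bendobj", re.DOTALL)

# Hypothetical PDF fragment used only for this example.
SAMPLE = b"""1 0 obj
<< /Type /Catalog /Pages 2 0 R /OpenAction 4 0 R >>
endobj
2 0 obj
<< /Type /Pages /Kids [3 0 R] /Count 1 >>
endobj
3 0 obj
<< /Type /Page /Parent 2 0 R >>
endobj
4 0 obj
<< /S /JavaScript /JS (app.alert) >>
endobj
"""


def referencing_objects(pdf_bytes, target_id):
    """Return IDs of objects whose body contains a reference to target_id."""
    # "(?<!\d)" stops ID 2 from matching inside a reference such as "12 0 R".
    ref_re = re.compile(rb"(?<!\d)%d\s+\d+\s+R\b" % target_id)
    return [
        int(match.group(1))
        for match in OBJ_RE.finditer(pdf_bytes)
        if ref_re.search(match.group(3))
    ]


print(referencing_objects(SAMPLE, 2))  # [1, 3]
print(referencing_objects(SAMPLE, 4))  # [1]
```

Here the catalog (object 1) is the only object pointing at the JavaScript action, which is the kind of relationship an analyst looks for when checking what runs on open.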

PARAMETERS

-s STRING
Search for STRING in indirect objects.
-o ID
Select object by ID.
-t TYPE
Filter by type.
-f
Pass stream objects through their filters.
-d FILE
Dump stream to file.
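-a
Display statistics for the PDF document.
-w
Raw output for data and filters.
-r N
Select the objects that reference object N.
-c
Display the content of objects without streams or with unfiltered streams.
-v
Display malformed PDF elements.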

CAVEATS

Malicious PDFs may crash parsers, so analyze untrusted files in an isolated environment. Output can be very large. Not all PDF features are supported.

HISTORY

pdf-parser was created by Didier Stevens for PDF malware analysis. It's part of his toolkit for analyzing suspicious documents and is widely used in incident response.

SEE ALSO

pdfinfo(1), pdftotext(1), pdfid(1), strings(1)
