ptargrep

Search within tar archives without extracting

TLDR

Search for a pattern within one or more .tar archives

$ ptargrep "[search_pattern]" [path/to/file1 path/to/file2 ...]

Extract matching files to the current directory, using only the basename of each file from the archive

$ ptargrep [[-b|--basename]] "[search_pattern]" [path/to/file]

Search for a pattern case-insensitively within a .tar archive

$ ptargrep [[-i|--ignore-case]] "[search_pattern]" [path/to/file]

SYNOPSIS

ptargrep [options] pattern archive ...

-b, --basename
    When matching files are extracted, ignore the directory path stored in the archive and write each file to the current directory under its basename. If two matching files share a basename, the one extracted later overwrites the earlier one.

-i, --ignore-case
    Match the pattern case-insensitively.

-l, --list-only
    Print the path of each matching file in the archive to standard output instead of extracting it. Without this option, every matching file is extracted.

-v, --verbose
    Write debugging messages to standard error.

-?, --help
    Display the full documentation and exit.

DESCRIPTION

ptargrep applies a pattern to the contents of the files stored inside one or more tar archives. It reads archive members one at a time, so the archive never has to be unpacked in full, which saves time and disk space when only a few files in a large archive are of interest. By default, each file whose content matches the pattern is extracted to disk; with -l, only the paths of the matching files are printed. Unlike grep, it reports or extracts whole files and does not print the matching lines.

ptargrep is a Perl script distributed with the Archive::Tar module, which is bundled with Perl, so it is usually available wherever a full Perl installation is present; some distributions package it separately from the core perl package. It is not part of libarchive or bsdtar. Compressed archives are read through Archive::Tar, which can decompress gzip and bzip2 input (and, in recent versions, xz) when the corresponding Perl modules are installed.

CAVEATS

The availability of ptargrep depends on the Perl installation, not on the system's tar implementation; archives created by GNU tar or bsdtar can both be searched. Unless -l is given, matching files are written to disk, and with -b files that share a basename overwrite one another. Each member is matched as a whole, so for very large archives or complex patterns performance can still be a consideration. ptargrep never modifies the contents of the archive.

ARCHIVE SPECIFICATION

Every argument after the pattern names a tar archive to be searched. Several archives may be given; they are processed in turn.

FILE FILTERING

ptargrep takes no arguments for restricting the search to particular files or directories within an archive: every argument after the pattern is treated as another archive. All regular files in each archive are searched; directories and other non-file entries are skipped.

REGULAR EXPRESSION SUPPORT

The pattern is interpreted as a Perl regular expression, not as a grep-style basic or extended regular expression, and there is no switch for selecting another dialect. Perl syntax such as \d, \s, \b, non-capturing groups, and non-greedy quantifiers is available for complex search patterns, and -i makes the match case-insensitive.

HISTORY

ptargrep was written by Grant McLean (copyright 2010) and is distributed as part of the Archive::Tar Perl module, alongside the ptar and ptardiff utilities. It was developed to offer a convenient way to find files in tar archives by their content and to list or extract only those files, without a full extraction of the archive.