bzegrep
Search bzip2-compressed files for an extended regular expression
TLDR
Search for extended regex (supporting ?, +, {}, () and |) in a compressed file (case-sensitive)
Search for extended regex (supporting ?, +, {}, () and |) in a compressed file (case-insensitive)
Search for lines that do not match a pattern
Print file name and line number for each match
Search for lines matching a pattern, printing only the matched text
Search the decompressed contents of a bzip2 compressed tar archive for a pattern (the archive is read as a single stream, not member by member)
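The first five cases above can be sketched as follows. This is an illustration rather than canonical usage: the sample logs and the workdir variable are hypothetical, and it assumes bzegrep and bzip2 are installed. Short options are used because they are passed straight through to grep. File names are shown by giving more than one file, since with a single file the common script implementation prints only the matching lines.

```shell
# Hypothetical sample data: two small bzip2-compressed logs in a temp directory.
workdir=$(mktemp -d)
printf 'error: disk full\nwarning: low memory\nERROR: timeout\ninfo: ok\n' > "$workdir/app.log"
printf 'info: started\nerror: refused\n' > "$workdir/db.log"
bzip2 "$workdir/app.log" "$workdir/db.log"   # creates app.log.bz2 and db.log.bz2

# Extended regex, case-sensitive
bzegrep 'error|warning' "$workdir/app.log.bz2"

# Extended regex, case-insensitive
bzegrep -i 'error|warning' "$workdir/app.log.bz2"

# Lines that do not match the pattern
bzegrep -v 'error|warning' "$workdir/app.log.bz2"

# Line numbers, with file names added because several files are given
bzegrep -n 'error' "$workdir/app.log.bz2" "$workdir/db.log.bz2"

# Only the matched text
bzegrep -o 'error|warning' "$workdir/app.log.bz2"
```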
SYNOPSIS
bzegrep [grep_options] PATTERN [FILE...]
PARAMETERS
grep_options
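bzegrep passes most options through to the underlying grep command. Some script implementations inspect the option letters themselves, so the short forms (for example -i rather than --ignore-case) tend to be the more portable choice. Common examples include: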
-i, --ignore-case
Ignore case distinctions in patterns and data.
-v, --invert-match
Invert the sense of matching, to select non-matching lines.
-r, -R, --recursive
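Search directories recursively in grep itself. Implementations of bzegrep that feed each decompressed file to grep on standard input generally cannot recurse into directories of compressed files; to search a directory tree, pair find with bzegrep instead.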
-l, --files-with-matches
Print only the names of files containing matches, one per file.
-n, --line-number
Prefix each line of output with the 1-based line number within its respective file.
-c, --count
Print only a count of matching lines per file.
-A NUM, --after-context=NUM
Print NUM lines of trailing context after matching lines.
-B NUM, --before-context=NUM
Print NUM lines of leading context before matching lines.
-C NUM, --context=NUM
Print NUM lines of output context (leading and trailing) around matching lines.
-E, --extended-regexp
Interpret PATTERN as an extended regular expression (ERE). This is already the default for bzegrep, which typically runs egrep or grep -E internally, so the option is normally redundant.
-F, --fixed-strings
Interpret PATTERN as a list of fixed strings, separated by newlines, any of which is to be matched.
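A few of these options in use, as a minimal sketch. The sample file and the variable d are hypothetical, and the option handling shown (in particular writing -A with a separate number) assumes the common script implementation shipped with bzip2.

```shell
# Hypothetical sample data for the option examples.
d=$(mktemp -d)
printf 'start\nwarning: low memory\nretrying\ndone\n' > "$d/a.log"
bzip2 "$d/a.log"                      # creates a.log.bz2

# -c: count matching lines
bzegrep -c 'start|done' "$d/a.log.bz2"

# -l: print the file name if the file contains a match
bzegrep -l 'warning' "$d/a.log.bz2"

# -A 1: one line of trailing context after each match
bzegrep -A 1 'warning' "$d/a.log.bz2"

# -B 1: one line of leading context before each match
bzegrep -B 1 'warning' "$d/a.log.bz2"
```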
DESCRIPTION
bzegrep searches files compressed with bzip2 (.bz2 extension) for an extended regular expression without explicit prior decompression. It is the egrep counterpart of bzgrep and behaves like bzgrep -E: it typically functions as a shell script wrapper that decompresses each file with bzip2 and passes the content to the underlying grep command. It does not handle gzip (.gz) files; zgrep and zegrep serve that purpose.
This provides a seamless way to search through compressed logs and other data files, eliminating the manual steps of decompressing and then searching. Most grep options are passed through, allowing control over matching (e.g., case-insensitive or inverted matches) and over output format (e.g., line numbers, context lines). Options that depend on grep seeing real file names or directories, such as recursive searching, generally do not carry over, because grep receives the decompressed data on standard input.
CAVEATS
- Performance Overhead: Searching compressed files requires on-the-fly decompression, which can be slower and consume more CPU/memory than searching uncompressed files, especially for very large files.
- Non-Standard Command: bzegrep is not specified by POSIX and is not guaranteed to exist on every system. It usually ships with the bzip2 package, where it is often a symlink to, or a copy of, the bzgrep script, and implementations vary between platforms. For gzip-compressed files the corresponding commands are zgrep and zegrep.
- Error Handling: May not provide verbose error messages for corrupted or malformed compressed files; it might simply fail or produce incomplete output.
- Dependency: Relies on grep (or egrep) and bzip2 being installed and accessible in the system's PATH.
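A quick way to check the dependency caveat on a given system is to ask the shell where each tool lives. This is only a sketch; the list of tool names is an assumption based on the commands mentioned in this page.

```shell
# Report which of the tools discussed above are on PATH.
for tool in bzegrep bzgrep bzip2 grep; do
    if command -v "$tool" >/dev/null 2>&1; then
        echo "$tool: $(command -v "$tool")"
    else
        echo "$tool: not found"
    fi
done
```

If bzegrep is found, ls -l on the reported path shows whether it is a symlink to bzgrep or a separate script.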
USAGE WITH PIPES
With no FILE argument, bzegrep reads standard input and decompresses it if necessary, so bzip2 data can be redirected or piped in directly. For example,
bzegrep PATTERN < file.bz2
or cat file.bz2 | bzegrep PATTERN. Input that has already been decompressed, as in bzcat file.bz2 | bzegrep PATTERN, is typically passed through unchanged, although plain grep -E does the same job in that case. bzegrep does not detect gzip data on standard input; use zgrep -E or zegrep for .gz input.
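A minimal sketch of feeding bzip2 data on standard input, assuming the common implementation in which bzegrep decompresses stdin when no FILE is given. The sample access log and the dir variable are hypothetical.

```shell
# Hypothetical sample: a tiny access log compressed straight from a pipe.
dir=$(mktemp -d)
printf 'GET /index 200\nGET /missing 404\nPOST /login 500\n' | bzip2 -c > "$dir/access.bz2"

# Redirect the compressed file to standard input: lines ending in a 4xx/5xx status
bzegrep ' (4|5)[0-9]{2}$' < "$dir/access.bz2"

# Pipe the compressed bytes in and count request lines
cat "$dir/access.bz2" | bzegrep -c 'GET|POST'
```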
UNDERLYING MECHANISM
The typical mechanism for bzegrep involves spawning a decompression command (e.g., bzip2 -dc, which is equivalent to bunzip2 -c or bzcat) for each file named on the command line. The decompressed content is then piped directly to grep running in extended-regex mode. This 'on-the-fly' decompression ensures that the original compressed files remain untouched and avoids the need for temporary uncompressed files, saving disk I/O and space.
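The mechanism can be approximated with a few lines of shell. This is a sketch under stated assumptions, not the actual bzegrep script: bz_search is a hypothetical helper, and it omits the option parsing and file-name prefixing that real implementations add.

```shell
# Hypothetical helper approximating what bzegrep does for each file:
# decompress to stdout and let grep -E read the stream.
bz_search() {
    pattern=$1
    shift
    for f in "$@"; do
        # -d decompress, -c write to stdout; the .bz2 file itself is not modified
        bzip2 -dc -- "$f" | grep -E -- "$pattern"
    done
}

tmp=$(mktemp -d)
printf 'alpha 1\nbeta 22\ngamma 333\n' > "$tmp/data.txt"
bzip2 -k "$tmp/data.txt"              # -k keeps data.txt next to data.txt.bz2
bz_search '[0-9]{2,}' "$tmp/data.txt.bz2"
```

Because the data only flows through a pipe, no decompressed copy is written to disk and data.txt.bz2 is left as it was.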
HISTORY
bzegrep does not have a distinct, long-standing historical development as a standalone project like grep itself. It is distributed with bzip2, first released in 1996, alongside bzgrep and bzfgrep. bzgrep was adapted from the zgrep script shipped with gzip, and bzegrep is usually the same script, or a link to it, invoked under a different name so that grep runs in extended-regex mode. Its existence reflects a common user requirement in Unix-like systems to work transparently with compressed data, especially logs and archives, which became prevalent for disk space efficiency. It exemplifies the Unix philosophy of combining smaller, specialized tools (grep, bzip2) to achieve a more complex and user-friendly function.