bzip2
Compress files using a block-sorting lossless compression algorithm
TLDR
Compress a file
Decompress a file
Decompress a file to stdout
Test the integrity of a compressed file
Show the compression ratio for each file processed with detailed information
Decompress a file overwriting existing files
Display help
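The tasks above map onto bzip2 invocations roughly as follows. This is a sketch run in a scratch directory; the file name sample.txt is only an illustration.

```shell
# Work in a scratch directory so no real files are touched.
workdir=$(mktemp -d)
cd "$workdir" || exit 1
printf 'hello bzip2\n' > sample.txt

# Compress a file (sample.txt is replaced by sample.txt.bz2)
bzip2 sample.txt

# Decompress a file (sample.txt.bz2 is replaced by sample.txt)
bzip2 -d sample.txt.bz2

# Compress again, this time keeping the original, so both files exist
bzip2 -k sample.txt

# Decompress a file to stdout (the .bz2 file is left in place)
bzip2 -dc sample.txt.bz2

# Test the integrity of a compressed file (silent on success)
bzip2 -t sample.txt.bz2

# Show the compression ratio (the report goes to stderr)
bzip2 -v -c sample.txt > /dev/null

# Decompress a file, overwriting the existing sample.txt
bzip2 -df sample.txt.bz2

# Display help
bzip2 --help
```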
SYNOPSIS
bzip2 [ -cdfkqstvzVL ] [ --best ] [ --fast ] [ -num ] [ filename ... ]
bunzip2 [ -fkqvVL ] [ filename ... ]
bzcat [ -s ] [ filename ... ]
PARAMETERS
-c, --stdout
Compress or decompress to standard output. This option keeps the original file intact.
-d, --decompress
Force decompression. bunzip2 is an alias for bzip2 -d.
-k, --keep
Keep original (input) files. By default, bzip2 deletes the original file after compression or decompression.
-f, --force
Force overwrite of output files. By default, bzip2 does not prompt; it refuses to overwrite an existing output file and skips that input. This option removes that safeguard.
-s, --small
Reduce memory usage for compression, decompression, and testing. Useful on systems with limited RAM, but decompression runs at about half the normal speed.
-t, --test
Test integrity of compressed files. Checks if the compressed file is intact.
-v, --verbose
Be verbose during operation. Displays compression ratio and other information.
-z, --compress
Force compression. This is the default behavior if no other operation is specified.
-num
Set the block size to 100k multiplied by 'num', where 'num' ranges from 1 to 9. Larger blocks (e.g., -9, --best) usually give better compression but use more memory. The default is -9. -1 (--fast) uses the least memory but is not significantly faster. This option has no effect when decompressing.
--best
Alias for -9, which is already the default. Uses the largest block size, so it generally gives the best compression ratio and uses the most memory.
--fast
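Alias for -1. Uses the smallest block size and the least memory, usually at the cost of a lower compression ratio. Provided mainly for gzip compatibility; it does not make compression significantly faster.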
-L, --license
Display license information.
-V, --version
Display version information.
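The effect of the block-size options can be checked on a test file. The following is a sketch: the generated file numbers.txt and the output names are illustrative, and the actual sizes depend on the input.

```shell
# Work in a scratch directory.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Generate about 1.3 MB of text so that more than one 100k block is needed.
awk 'BEGIN { for (i = 1; i <= 200000; i++) print i }' > numbers.txt

# Compress the same input with the smallest and the largest block size.
bzip2 -1 -c numbers.txt > numbers.fast.bz2
bzip2 -9 -c numbers.txt > numbers.best.bz2

# Compare the sizes in bytes.
wc -c numbers.txt numbers.fast.bz2 numbers.best.bz2

# Decompress with reduced memory (-s) and confirm the data is unchanged.
bzip2 -d -s -c numbers.best.bz2 | cmp - numbers.txt

# Test both files verbosely.
bzip2 -tv numbers.fast.bz2 numbers.best.bz2
```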
DESCRIPTION
bzip2 is a command-line utility for compressing and decompressing files on Linux and other Unix-like systems. It uses the Burrows-Wheeler block-sorting text compression algorithm, combined with run-length encoding and Huffman coding, and typically achieves better compression ratios than older tools such as gzip. The smaller output comes at a cost: bzip2 is generally slower than gzip for both compression and decompression and needs more memory, especially when compressing. By default, bzip2 replaces each input file with a compressed version that has .bz2 appended to its name; decompression removes the .bz2 extension and restores the original file. Unlike archiving tools such as tar, bzip2 does not combine multiple files into a single archive; it operates on individual files. To archive directories or multiple files, it is commonly used together with tar, producing files such as archive.tar.bz2. It is best suited to cases where output size matters more than speed, such as distributing large software packages or storing backups where disk space is at a premium.
CAVEATS
- Not an archiver: bzip2 only compresses single files. It cannot combine multiple files or directories into one archive directly. For that, it must be used in conjunction with tar.
- Slower than gzip: Compression and decompression are generally slower than with gzip, and the -s option slows decompression further.
- Memory usage: Higher memory consumption during compression than gzip, especially for larger block sizes (e.g., -9). Decompression memory usage is lower but can be reduced further with -s.
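- Corruption: bzip2 compresses data in independent blocks, so damage to one block makes that block unrecoverable but leaves the others intact. The bzip2recover utility, distributed with bzip2, can extract the undamaged blocks from a corrupted file.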
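The memory caveat can be made concrete with the approximate formulas given in the bzip2 manual page: 400k + 8 x block size for compression, 100k + 4 x block size for decompression, and 100k + 2.5 x block size for decompression with -s. The helper below is a sketch; the function name bzip2_mem_kb is hypothetical, and the figures are estimates rather than measurements.

```shell
# Print approximate memory needs in kilobytes for a compression level (1-9):
# compression, decompression, and decompression with -s, in that order.
# Formulas as given in the bzip2 manual page; block size is level x 100k.
bzip2_mem_kb() {
    block_kb=$(( $1 * 100 ))
    echo "$(( 400 + 8 * block_kb )) $(( 100 + 4 * block_kb )) $(( 100 + block_kb * 5 / 2 ))"
}

for level in 1 5 9; do
    # Split the three numbers into $1, $2, and $3.
    set -- $(bzip2_mem_kb "$level")
    printf 'level %s: compress %sk, decompress %sk, decompress -s %sk\n' \
        "$level" "$1" "$2" "$3"
done
```

At the default level of 9 this gives about 7600k for compression, 3700k for decompression, and 2350k for decompression with -s.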
USAGE WITH TAR
While bzip2 compresses single files, it is commonly used with the tar command to create compressed archives of multiple files or entire directories. For example, tar -jcvf archive.tar.bz2 /path/to/directory archives a directory and compresses it into a single .tar.bz2 file. To extract it, run tar -jxvf archive.tar.bz2.
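A complete round trip might look like the sketch below. It assumes a tar implementation that supports the -j option (GNU tar and bsdtar do); the directory and file names are illustrative.

```shell
# Work in a scratch directory with a small sample tree.
workdir=$(mktemp -d)
cd "$workdir" || exit 1
mkdir project
printf 'alpha\n' > project/a.txt
printf 'beta\n' > project/b.txt

# c = create, j = filter through bzip2, f = archive file name
tar -cjf project.tar.bz2 project

# t = list the archive contents without extracting
tar -tjf project.tar.bz2

# x = extract, -C = extract into another directory
mkdir restored
tar -xjf project.tar.bz2 -C restored

# The same archive can be built with an explicit pipeline.
tar -cf - project | bzip2 -c > project-piped.tar.bz2
```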
INPUT/OUTPUT BEHAVIOR
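By default, when given file names, bzip2 replaces each input file with its compressed or decompressed counterpart. For example, bzip2 filename.txt creates filename.txt.bz2 and deletes filename.txt. To keep the original, use the -k (--keep) option. To write the result to standard output, for example to pipe it into another command, use -c (--stdout). When no file names are given, bzip2 reads from standard input and writes to standard output, and it refuses to write compressed data to a terminal.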
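The behaviors described above can be tried in a scratch directory. This is a sketch; filename.txt and copy.bz2 are illustrative names.

```shell
# Work in a scratch directory.
workdir=$(mktemp -d)
cd "$workdir" || exit 1
printf 'some text\n' > filename.txt

# -k keeps the input: afterwards filename.txt and filename.txt.bz2 both exist.
bzip2 -k filename.txt

# -c writes to stdout and leaves the input untouched.
bzip2 -c filename.txt > copy.bz2

# With no file names, bzip2 acts as a filter from stdin to stdout.
printf 'piped text\n' | bzip2 | bzip2 -d
```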
HISTORY
bzip2 was developed by Julian Seward and first released in 1996. It was designed to improve on the compression achieved by gzip by using the Burrows-Wheeler Transform (BWT), a block-sorting algorithm, combined with a move-to-front transform and Huffman coding. The BWT processes the input one block at a time and permutes each block so that characters occurring in similar contexts end up next to each other, which produces long runs of similar symbols that the later stages can encode compactly. This approach often gives bzip2 better compression ratios than gzip, which led to its widespread adoption, especially for distributing software and large data files where disk space was a primary concern.
SEE ALSO
gzip(1): A widely used compression utility, generally faster but usually with lower compression ratios.
bunzip2(1): Usually a link to bzip2; equivalent to bzip2 -d.
bzcat(1): Usually a link to bzip2; equivalent to bzip2 -dc, which decompresses to standard output and leaves the compressed file in place.
tar(1): A utility for creating, extracting, and managing archive files. Often used with bzip2 (e.g., tar -jcvf archive.tar.bz2 directory).
xz(1): A newer compression utility using the LZMA family of algorithms, often achieving even better compression ratios than bzip2.