bzip2

Compress files using a block-sorting lossless compression algorithm

TLDR

Compress a file

$ bzip2 [path/to/file_to_compress]

Decompress a file

$ bzip2 [[-d|--decompress]] [path/to/compressed_file.bz2]

Decompress a file to stdout

$ bzip2 [[-dc|--decompress --stdout]] [path/to/compressed_file.bz2]

Test the integrity of a compressed file

$ bzip2 [[-t|--test]] [path/to/compressed_file.bz2]

Show the compression ratio and other details for each file compressed

$ bzip2 [[-v|--verbose]] [path/to/file_to_compress]

Decompress a file, overwriting an existing output file

$ bzip2 [[-df|--decompress --force]] [path/to/compressed_file.bz2]

Display help

$ bzip2 [[-h|--help]]

SYNOPSIS

bzip2 [ -cdfkqstvzVL ] [ --best ] [ --fast ] [ -num ] [ filename ... ]
bunzip2 [ -fkqvVL ] [ filename ... ]
bzcat [ -s ] [ filename ... ]

-c, --stdout
    Compress or decompress to standard output. This option keeps the original file intact.

-d, --decompress
    Force decompression. bunzip2 is an alias for bzip2 -d.

-k, --keep
    Keep original (input) files. By default, bzip2 deletes the original file after compression or decompression.

-f, --force
    Force overwrite of output files. Normally bzip2 refuses to overwrite an existing output file; it does not prompt. This option also forces bzip2 to break hard links to files, which it otherwise will not do.

-s, --small
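    Reduce memory usage for compression, decompression and testing. Decompression and testing use a modified algorithm that needs only about 2.5 bytes per block byte, at roughly half the normal speed. During compression, -s selects a 200k block size, limiting memory use at the expense of compression ratio.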

-t, --test
    Test the integrity of the specified compressed files without writing decompressed output. bzip2 performs a trial decompression and discards the result.

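-q, --quiet
    Suppress non-essential warning messages. Messages about I/O errors and other critical events are still shown.

-v, --verbose
    Be verbose during operation. When compressing, bzip2 reports the compression ratio, bits per byte and percentage of space saved for each file.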

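-z, --compress
    Force compression, regardless of the name under which the program is invoked (for example, bunzip2 -z compresses). This is the complement of -d; compression is already the default operation for bzip2.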

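-num
    Set the block size to 100,000 bytes multiplied by num, where num is a digit from 1 to 9 (that is, 100k to 900k). The block size applies only when compressing and has no effect when decompressing. Larger blocks generally give better compression but need more memory for both compression and decompression. The default is -9.

--best
    Alias for -9, kept mainly for gzip compatibility. Because -9 is already the default, it merely selects the default behavior.

--fast
    Alias for -1, kept mainly for gzip compatibility. It selects the smallest block size, which lowers memory use and compression ratio; despite its name, it does not make compression significantly faster.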

-L, --license
    Display license information.

-V, --version
    Display version information.
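The bzip2 manual gives approximate memory requirements as a function of the block size chosen with -1 to -9: compression needs 400k + 8 × block size, decompression 100k + 4 × block size, and decompression with -s 100k + 2.5 × block size. The hypothetical helper below only evaluates those formulas, in units of k (1000 bytes) as the manual uses; actual usage on a given system may differ slightly.

```python
def memory_k(level: int) -> dict:
    """Approximate memory use (in k) for a bzip2 block-size level 1-9.

    Formulas as given in the bzip2 manual:
      compression:          400k + 8   * block size
      decompression:        100k + 4   * block size
      decompression (-s):   100k + 2.5 * block size
    """
    if not 1 <= level <= 9:
        raise ValueError("level must be between 1 and 9")
    block_k = 100 * level  # -1 -> 100k blocks, ..., -9 -> 900k blocks
    return {
        "compress": 400 + 8 * block_k,
        "decompress": 100 + 4 * block_k,
        "decompress_small": 100 + 2.5 * block_k,
    }


if __name__ == "__main__":
    for level in range(1, 10):
        m = memory_k(level)
        print(f"-{level}: compress {m['compress']}k, "
              f"decompress {m['decompress']}k, "
              f"-s decompress {m['decompress_small']:g}k")
```

For the default -9 this yields 7600k to compress, 3700k to decompress and 2350k to decompress with -s, which is why -s mainly matters on machines with very little RAM.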

DESCRIPTION

bzip2 is a command-line utility for compressing and decompressing files on Linux and other Unix-like systems. It uses the Burrows-Wheeler block-sorting text compression algorithm, together with run-length encoding and Huffman coding, and typically achieves better compression ratios than older tools such as gzip. The trade-off is speed and memory: bzip2 is generally slower than gzip for both compression and decompression, and it needs more memory, especially when compressing.

bzip2 works on individual files. By default, compressing a file writes a new file with the .bz2 extension appended and then deletes the original; decompressing strips the .bz2 extension to recreate the original file name and deletes the compressed file. Unlike archiving tools such as tar, bzip2 does not combine multiple files into a single archive, so directories or groups of files are usually bundled with tar first, producing files like archive.tar.bz2. Its main strength is a high compression ratio on large individual files, which makes it a common choice for distributing software packages and for backups where disk space is at a premium.
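The output naming rule can be sketched as two small functions. Beyond the .bz2 case described above, the bzip2 manual lists .bz, .tbz2 and .tbz as recognised endings (the last two become .tar) and falls back to appending .out when it cannot guess the original name. The function names here are hypothetical; this is an illustration of the rule, not bzip2's code.

```python
# Suffix rewrites bzip2 applies when decompressing, per the bzip2 manual.
DECOMPRESS_SUFFIXES = [
    (".bz2", ""),
    (".bz", ""),
    (".tbz2", ".tar"),
    (".tbz", ".tar"),
]


def compressed_name(name: str) -> str:
    """Compression appends .bz2 to the input file name."""
    return name + ".bz2"


def decompressed_name(name: str) -> str:
    """Strip or rewrite a recognised suffix; otherwise append .out."""
    for suffix, replacement in DECOMPRESS_SUFFIXES:
        if name.endswith(suffix):
            return name[: -len(suffix)] + replacement
    # bzip2 warns that it cannot guess the original name in this case.
    return name + ".out"


if __name__ == "__main__":
    for name in ("report.txt.bz2", "src.tbz2", "src.tbz", "old.bz", "mystery"):
        print(name, "->", decompressed_name(name))
```

The suffixes do not shadow each other: "src.tbz2" does not end with ".bz2" (its last four characters are "tbz2"), so checking the list in order is safe.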

CAVEATS

Not an archiver: bzip2 only compresses single files. It cannot combine multiple files or directories into one archive directly. For that, it must be used in conjunction with tar.
Slower than gzip: Compression and decompression are generally slower than with gzip.
Memory usage: bzip2 uses more memory than gzip during compression, especially with larger block sizes such as -9 (the default). Decompression needs less memory, and -s reduces it further at the cost of speed.
Corruption: Damage to a compressed file makes the whole block containing it (up to 900,000 bytes of original data) unrecoverable. Because each block is compressed independently, however, the companion tool bzip2recover can often salvage the undamaged blocks of a multi-block file.

USAGE WITH TAR

While bzip2 compresses only single files, it is commonly used with the tar command to create compressed archives of multiple files or entire directories. The -j option tells tar to filter the archive through bzip2. For example, tar -jcvf archive.tar.bz2 /path/to/directory archives and compresses a directory into a single .tar.bz2 file, and tar -jxvf archive.tar.bz2 extracts it again.
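The same tar-then-bzip2 combination is available from Python's standard `tarfile` module through its "w:bz2" and "r:bz2" modes. The sketch below is only a rough equivalent of the tar commands above (it does not invoke tar or the bzip2 program), and the helper names and the throwaway directory layout are hypothetical.

```python
import os
import tarfile
import tempfile


def create_tar_bz2(archive_path: str, directory: str) -> None:
    """Roughly `tar -jcf archive_path directory`: bundle, then bzip2-compress."""
    with tarfile.open(archive_path, "w:bz2") as tar:
        tar.add(directory, arcname=os.path.basename(directory))


def extract_tar_bz2(archive_path: str, dest: str) -> list:
    """Roughly `tar -jxf archive_path -C dest`; returns the member names."""
    with tarfile.open(archive_path, "r:bz2") as tar:
        names = tar.getnames()
        # Use the safer "data" extraction filter where this Python supports it.
        if hasattr(tarfile, "data_filter"):
            tar.extractall(dest, filter="data")
        else:
            tar.extractall(dest)
    return names


def demo():
    """Archive a small directory tree, extract it elsewhere, and check it."""
    with tempfile.TemporaryDirectory() as tmp:
        src = os.path.join(tmp, "project")
        os.makedirs(os.path.join(src, "docs"))
        for rel in ("README", os.path.join("docs", "guide.txt")):
            with open(os.path.join(src, rel), "w") as f:
                f.write("hello\n")
        archive = os.path.join(tmp, "archive.tar.bz2")
        create_tar_bz2(archive, src)
        out = os.path.join(tmp, "out")
        names = extract_tar_bz2(archive, out)
        restored = os.path.isfile(os.path.join(out, "project", "docs", "guide.txt"))
        return sorted(names), restored


if __name__ == "__main__":
    print(demo())
```

As with tar, the archive is a single bzip2 stream wrapped around the whole tar file, so bzip2 still only ever sees one "file".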

INPUT/OUTPUT BEHAVIOR

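By default, bzip2 replaces each input file with its output: bzip2 filename.txt creates filename.txt.bz2 and then deletes filename.txt, and bzip2 -d filename.txt.bz2 does the reverse. To keep the original, use the -k (--keep) option. To write the result to standard output instead, for example to pipe it into another command, use -c (--stdout). If no file names are given, bzip2 acts as a filter, reading from standard input and writing to standard output; in that case it refuses to write compressed data to a terminal.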
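When bzip2 writes to standard output, it processes data as a stream rather than as whole files. A minimal Python sketch of that stream-to-stream behavior uses the incremental `bz2.BZ2Compressor` and `bz2.BZ2Decompressor` classes; the function names are hypothetical, and unlike bzip2 itself this decompressor handles only a single stream, not several concatenated .bz2 files.

```python
import bz2
import io

CHUNK_SIZE = 64 * 1024  # arbitrary read size for this sketch


def compress_stream(src, dst, level: int = 9) -> None:
    """Read raw bytes from src and write one bzip2 stream to dst."""
    compressor = bz2.BZ2Compressor(level)
    while chunk := src.read(CHUNK_SIZE):
        dst.write(compressor.compress(chunk))  # may be empty until a block fills
    dst.write(compressor.flush())  # emit the final block and stream trailer


def decompress_stream(src, dst) -> None:
    """Decompress a single bzip2 stream from src to dst.

    Any bytes after the first stream's end-of-stream marker are ignored.
    """
    decompressor = bz2.BZ2Decompressor()
    while not decompressor.eof and (chunk := src.read(CHUNK_SIZE)):
        dst.write(decompressor.decompress(chunk))
    if not decompressor.eof:
        raise EOFError("compressed stream ended before the end-of-stream marker")


def round_trip(data: bytes) -> bytes:
    """Compress then decompress in memory, as a quick self-check."""
    packed, unpacked = io.BytesIO(), io.BytesIO()
    compress_stream(io.BytesIO(data), packed)
    packed.seek(0)
    decompress_stream(packed, unpacked)
    return unpacked.getvalue()
```

Passing `sys.stdin.buffer` and `sys.stdout.buffer` to `compress_stream` gives something close to `bzip2 -c < input > output`, with memory use bounded by the block size rather than the input size.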

HISTORY

bzip2 was developed by Julian Seward and first released in 1996. It was designed to improve on the compression achieved by gzip by using the Burrows-Wheeler Transform (BWT), a block-sorting algorithm, combined with a move-to-front transform and Huffman coding. The BWT splits the input into blocks and, within each block, reorders the bytes by sorting all rotations of the block, so that bytes occurring in similar contexts end up next to each other. The result tends to contain long runs of repeated bytes that the later stages compress well, and the transform is reversible given the position of the original block among the sorted rotations. This approach often achieved superior compression ratios, which led to bzip2's widespread adoption, especially for distributing software and large data files where disk space was a primary concern. Its development marked a significant step forward in general-purpose lossless data compression.
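The BWT and move-to-front steps described above can be illustrated with a deliberately naive sketch. It sorts every rotation of a block directly, which is far too slow for real 900k blocks, and it applies plain move-to-front over all 256 byte values. Real bzip2 uses a much faster sorting algorithm, an initial run-length encoding pass, move-to-front over only the symbols in use, zero-run coding and multiple Huffman tables, none of which are shown; the function names are illustrative, not part of any bzip2 API.

```python
from typing import List, Tuple


def bwt(block: bytes) -> Tuple[bytes, int]:
    """Naive Burrows-Wheeler Transform of one block.

    Returns the last column of the sorted rotation matrix and the row index
    of the original block, which the inverse transform needs.
    """
    n = len(block)
    if n == 0:
        return b"", 0
    # Sort rotation start positions by the rotation they begin.
    rotations = sorted(range(n), key=lambda i: block[i:] + block[:i])
    # The last byte of the rotation starting at i is block[i - 1].
    last_column = bytes(block[(i - 1) % n] for i in rotations)
    return last_column, rotations.index(0)


def inverse_bwt(last_column: bytes, index: int) -> bytes:
    """Rebuild the block using the last-to-first column mapping."""
    # A stable sort of positions by byte value links each first-column row
    # to the last-column row holding the same occurrence of that byte.
    order = sorted(range(len(last_column)), key=lambda i: last_column[i])
    out = bytearray()
    row = index
    for _ in range(len(last_column)):
        row = order[row]  # step to the rotation that starts one byte later
        out.append(last_column[row])
    return bytes(out)


def move_to_front(data: bytes) -> List[int]:
    """Replace each byte by its rank in a list of recently used bytes."""
    alphabet = list(range(256))
    ranks = []
    for byte in data:
        rank = alphabet.index(byte)
        ranks.append(rank)
        alphabet.pop(rank)
        alphabet.insert(0, byte)  # recently seen bytes get small ranks
    return ranks


if __name__ == "__main__":
    transformed, idx = bwt(b"banana")
    print(transformed, idx)              # b'nnbaaa' 3
    print(move_to_front(transformed))    # [110, 0, 99, 99, 0, 0]
    print(inverse_bwt(transformed, idx))  # b'banana'
```

The example shows the point of the pipeline: the BWT groups the repeated letters of "banana" into runs, and move-to-front turns those runs into zeros, which an entropy coder such as Huffman coding can represent very compactly.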