bzip2recover

Recover data from damaged bzip2 files

TLDR

Extract every block of a damaged .bz2 file into its own .bz2 file

$ bzip2recover [damaged_file.bz2]

bzip2recover is a specialized utility for salvaging data from corrupted bzip2 (.bz2) files. bzip2 compresses data in independent blocks of 100,000 to 900,000 bytes of input (set by the -1 to -9 options), and each block carries its own CRC checksum. When a file is damaged by disk errors, transmission problems, or incomplete writes, bunzip2 reports an integrity error and fails on the whole file, even if most of its blocks are intact.
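
A minimal Python sketch of this failure mode, using the standard-library bz2 module, which wraps libbzip2, the library behind bunzip2. The sample data, compression level 1 (chosen to get several small blocks), and the single corrupted byte are illustrative assumptions; try_decompress is a hypothetical helper name.

```python
import bz2
from typing import Optional


def try_decompress(blob: bytes) -> Optional[bytes]:
    """Return the decompressed data, or None if libbzip2 rejects the stream."""
    try:
        return bz2.decompress(blob)
    except (OSError, ValueError):  # data/CRC error, or stream ended early
        return None


# About 350 kB of sample text, which spans several blocks at level 1.
original = b"".join(b"%d\n" % i for i in range(60000))
damaged = bytearray(bz2.compress(original, compresslevel=1))
# Illustrative assumption: corrupt one byte roughly in the middle of the file.
damaged[len(damaged) // 2] ^= 0xFF

# The intact blocks before and after the bad byte are lost along with it.
print(try_decompress(bytes(damaged)) is None)
```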

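bzip2recover scans the input file bit by bit, because blocks are not aligned to byte boundaries, looking for the 48-bit magic numbers that mark the start of each block and the end of the stream. It writes each block it finds to a separate single-block file: for an input named file.bz2, the outputs are rec00001file.bz2, rec00002file.bz2, and so on. It does not check CRCs, so damaged blocks are extracted along with intact ones; a truncated final block with no marker after it is not written.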
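
The scan can be sketched in Python. This is a simplified re-implementation for illustration, not bzip2recover's source, and the function names are hypothetical. It renders the file as a string of bits, finds the block magic 0x314159265359 and the end-of-stream magic 0x177245385090 at every bit offset, and rewraps each block that is followed by another marker as a one-block stream. Writing a BZh9 header and reusing the block CRC as the stream's combined CRC follows the approach bzip2recover takes; holding the whole file as a bit string is a simplifying assumption suited only to small files.

```python
import bz2

# 48-bit magic numbers from the bzip2 format, as strings of '0'/'1'.
BLOCK_MAGIC = f"{0x314159265359:048b}"  # starts every compressed block
EOS_MAGIC = f"{0x177245385090:048b}"    # marks the end of a stream


def to_bits(data: bytes) -> str:
    """Render bytes as a '0'/'1' string, most significant bit first."""
    return "".join(f"{byte:08b}" for byte in data)


def to_bytes(bits: str) -> bytes:
    """Pack a '0'/'1' string into bytes, zero-padding the final byte."""
    bits += "0" * (-len(bits) % 8)
    return bytes(int(bits[i:i + 8], 2) for i in range(0, len(bits), 8))


def marker_positions(bits: str) -> list[int]:
    """Bit offsets of every block or end-of-stream magic, at any alignment."""
    found = []
    for magic in (BLOCK_MAGIC, EOS_MAGIC):
        pos = bits.find(magic)
        while pos != -1:
            found.append(pos)
            pos = bits.find(magic, pos + 1)
    return sorted(found)


def split_blocks(data: bytes) -> list[bytes]:
    """Rewrap each terminated block as a standalone one-block .bz2 stream."""
    bits = to_bits(data)
    markers = marker_positions(bits)
    pieces = []
    # A block runs from its magic up to the next marker of either kind; a
    # final block with no following marker (truncated file) is dropped.
    for here, nxt in zip(markers, markers[1:]):
        if bits[here:here + 48] != BLOCK_MAGIC:
            continue  # end-of-stream marker: nothing to extract
        body = bits[here + 48:nxt]  # 32-bit block CRC, then compressed data
        block_crc = body[:32]
        # No CRC is verified here. For a one-block stream the combined CRC
        # equals the block CRC, so it is copied after the end marker.
        pieces.append(to_bytes(
            to_bits(b"BZh9") + BLOCK_MAGIC + body + EOS_MAGIC + block_crc))
    return pieces


if __name__ == "__main__":
    sample = b"".join(b"%d\n" % i for i in range(60000))
    pieces = split_blocks(bz2.compress(sample, compresslevel=1))
    # Each piece decompresses on its own; in order they rebuild the input.
    assert b"".join(bz2.decompress(p) for p in pieces) == sample
    print(len(pieces), "blocks extracted")
```

Working on a bit string keeps the non-byte-aligned block boundaries easy to follow at the cost of memory; a real tool would stream the input through a 48-bit shift register instead.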

After recovery, test each output file with bzip2 -t and delete those that fail. The zero-padded numbers make a wildcard such as rec*file.bz2 expand in block order, so bzip2 -dc rec*file.bz2 > recovered_data decompresses the remaining blocks and concatenates them in sequence, with a gap wherever a damaged block was removed. bzip2recover cannot repair or reconstruct damaged blocks, so it is most useful when damage is confined to a few of them. It processes one file per invocation and overwrites existing output files with matching names, so run it where no earlier rec* files need to be kept.
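
The test-and-concatenate step can also be sketched in Python, assuming the rec00001file.bz2 naming used by bzip2 1.0.x and output files sitting next to the damaged file; reassemble is a hypothetical helper name. bz2.decompress raises OSError or ValueError on a damaged or truncated stream, so failing pieces are skipped and reported instead of aborting the run.

```python
import bz2
import glob
import os
from typing import List


def reassemble(damaged_path: str, out_path: str) -> List[str]:
    """Decompress bzip2recover's pieces in block order into out_path.

    Assumes pieces named rec00001<name>, rec00002<name>, ... in the same
    directory as the damaged file. Returns the pieces that failed.
    """
    directory, name = os.path.split(damaged_path)
    pattern = os.path.join(glob.escape(directory),
                           "rec" + "[0-9]" * 5 + glob.escape(name))
    failed = []
    with open(out_path, "wb") as out:
        # Zero-padded numbers make lexical order equal to block order.
        for piece in sorted(glob.glob(pattern)):
            with open(piece, "rb") as f:
                blob = f.read()
            try:
                out.write(bz2.decompress(blob))
            except (OSError, ValueError):
                failed.append(piece)  # damaged block: its data is lost
    return failed
```

Decompressing each piece fully before writing means a damaged block contributes nothing to the output rather than a partial, possibly garbled prefix.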

CAVEATS

Cannot repair damaged blocks; their data is lost.
Overwrites existing output files such as rec00001file.bz2.
Extracts damaged blocks too; test outputs with bzip2 -t and remove failures before decompressing.

USAGE WORKFLOW

1. Run: bzip2recover damaged.bz2
2. Test outputs: bzip2 -t rec*damaged.bz2 (delete any file that fails)
3. Decompress and concatenate in order: bzip2 -dc rec*damaged.bz2 > recovered.txt

OUTPUT NAMING

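Outputs are named rec00001file.bz2, rec00002file.bz2, and so on, for an input named file.bz2. The number of outputs equals the number of blocks found, not the number that will decompress successfully.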

HISTORY

Written by Julian Seward as part of bzip2, which was first released in 1996. bzip2recover has changed little since and still works purely at the block level; bzip2 blocks hold 100,000 to 900,000 bytes of input, selected with -1 to -9 (900,000 by default).