ddrescue

Recover data from failing storage devices

TLDR

Take an image of a device, creating a log file

$ sudo ddrescue [/dev/sdb] [path/to/image.dd] [path/to/log.txt]

Clone Disk A to Disk B, creating a log file

$ sudo ddrescue [[-f|--force]] [[-n|--no-scrape]] [/dev/sdX] [/dev/sdY] [path/to/log.txt]

SYNOPSIS

ddrescue [options] infile outfile [mapfile]

--help
    Display help message and exit.

--version
    Display version information and exit.

--verbose, -v
    Be verbose; show detailed progress and error information.

--force, -f
    Overwrite the output file. Required when outfile is a device or partition rather than a regular file. Use with extreme caution.

--no-trim, -N
    Skip the trimming phase of recovery (the second phase, which tries to recover data from the edges of bad areas).

--retrim, -M
    Mark all failed blocks as non-trimmed, so that a later run trims and scrapes them again.

--reverse, -R
    Read backwards. Can be effective for drives that struggle when reading forward through bad sectors.

--idirect, -d
    Use direct disk access (O_DIRECT) for the input file. Bypasses the kernel's page cache, which can be beneficial for failing hardware. The matching option for the output file is --odirect, -D.

--sparse, -S
    Use sparse writes for the output file. Blocks of zeros are not allocated on the output file system, saving space for partial recoveries.

--synchronous, -y
    Use synchronous writes for the output file, so data is flushed to the device instead of lingering in cache. Sometimes used for USB drives or unreliable media.

--cluster-size=SECTORS, -c SECTORS
    Set the number of sectors to copy at a time. The default corresponds to 64 KiB (128 sectors of 512 bytes). Larger values can speed up reads from healthy areas.

--min-read-rate=RATE, -a RATE
    Set the minimum read rate for good areas, in bytes per second. If reading falls below this rate, ddrescue skips ahead and leaves the skipped blocks for a later pass.

--max-read-rate=RATE
    Set the maximum reading rate in bytes/sec. Useful for limiting I/O on fragile drives to prevent further damage.

--timeout=SECONDS, -T SECONDS
    Give up once SECONDS have passed since the last successful read. Prevents indefinite hangs on unresponsive drives, especially useful for automation.

mapfile
    Not an option but the third positional argument (see SYNOPSIS). Names the file that records bad areas and progress. Essential for resuming operations and multi-pass recovery.

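--max-read-errors=N, -X N
    Give up after N read errors. Can prevent ddrescue from spending excessive time on severely damaged areas with no hope of recovery. A related limit, --max-bad-areas=N (-e N), counts bad areas instead of individual errors.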

--no-scrape, -n
    Skip the scraping phase (called --no-split in older releases), leaving the slow sector-by-sector reads for a later run. Consult the man page of the installed version for the exact spelling.

--unidirectional, -u
    Run all passes in the same direction: forward by default, backward when combined with --reverse.

--size=BYTES, -s BYTES
    Limit the amount of input data to copy to BYTES. May also be needed when ddrescue cannot determine the size of the input file or device.

--input-position=BYTES, -i BYTES
    Start reading at BYTES offset from the beginning of the input file. Defaults to 0.

--output-position=BYTES, -o BYTES
    Start writing at BYTES offset from the beginning of the output file. Defaults to the input position.

--verify-on-error, -J
    After every read error, reread the most recent good sector and check that it returns the same data. Useful as an integrity check that the drive is still responding rather than returning garbage.

--test-mode=FILE, -H FILE
    Simulate read errors for testing purposes: only blocks marked as finished in mapfile FILE are read, and all others are treated as read errors. Do not use for real recovery.

--fill-mode=TYPES, -F TYPES
    Fill the blocks of the output file whose mapfile status is one of TYPES (for example '-' for bad sectors) with data read from infile. Useful for wiping or marking bad areas.

DESCRIPTION

ddrescue is a data recovery tool. It copies data from one file or block device (like a hard drive or SSD) to another, doing its best to rescue the good parts first when read errors occur. Plain dd stops at the first read error unless told otherwise, and with conv=noerror,sync it pads failed reads with zeros without recording where they are; ddrescue is designed specifically for damaged or failing storage media. It reads the good parts of the input first, then returns to the problematic or "bad" areas with slower, finer-grained reads.

A key feature is its use of a "mapfile" (called a log file in older versions). This file records the status of each block of the input: finished (copied), not yet tried, or failed at some stage (non-trimmed, non-scraped, or bad sector). This allows ddrescue to resume an interrupted recovery from where it left off without re-reading data it already has. It also enables multiple runs with different strategies (e.g., reading the remaining bad areas in reverse). By rescuing the easy data first and returning to unreadable blocks later, ddrescue maximizes the amount of data salvaged before a failing drive gets worse, which makes it a common choice for forensic analysis and disaster recovery.

CAVEATS

ddrescue only attempts to copy data; it does not repair the source drive itself.
Always ensure the destination is at least as large as infile: either an output device of equal or greater size, or a file system with enough free space for the image.
Be extremely careful when specifying infile and outfile, as an error can lead to irreversible data loss on the wrong device. Using sudo is often required for raw device access, increasing the risk if misused.
Do not mount the infile (source) partition during recovery, and unmount it first if it is mounted, to prevent further damage or read errors.
Recovery from severely damaged drives may still be partial or impossible. It cannot recover data from physically destroyed platters.
The mapfile is crucial for effective recovery and resuming operations; always specify it.
The recovery process can be very time-consuming for large or severely damaged drives.
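
The first caveats can be partly automated before any ddrescue command is typed. The following is a minimal sketch, not part of ddrescue: the function name check_rescue_paths is hypothetical, and it assumes a Linux system where /proc/mounts lists mounted devices in its first field. It uses a prefix match so that a mounted /dev/sdb1 also blocks imaging /dev/sdb, which may be stricter than needed for unusual device names.

```shell
#!/usr/bin/env bash
# Hypothetical pre-flight check: refuse obviously dangerous infile/outfile pairs.
# Usage: check_rescue_paths INFILE OUTFILE [MOUNTS_FILE]
check_rescue_paths() {
    local infile=$1 outfile=$2 mounts=${3:-/proc/mounts}

    if [ "$infile" = "$outfile" ]; then
        echo "infile and outfile are the same path" >&2
        return 1
    fi

    # First field of each mounts line is the device; a prefix match also
    # catches partitions (e.g. /dev/sdb1) of a whole-disk infile (/dev/sdb).
    if [ -r "$mounts" ] && awk -v dev="$infile" \
        'index($1, dev) == 1 { found = 1 } END { exit !found }' "$mounts"; then
        echo "$infile (or one of its partitions) is mounted; unmount it first" >&2
        return 1
    fi
    return 0
}
```

A typical use would be `check_rescue_paths /dev/sdX image.dd && sudo ddrescue /dev/sdX image.dd rescue.map`, with /dev/sdX standing in for the real source device.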

DEFAULT RECOVERY STRATEGY

By default, ddrescue works through the input in several phases to maximize data recovery:
1. Copying Phase: It first reads and copies large contiguous blocks of data, skipping over error areas quickly.
2. Trimming Phase: It then reads the edges of the skipped error areas more carefully, a sector at a time, to find where the readable data ends.
3. Scraping Phase: Finally, it reads the blocks that remain, sector by sector.
This approach prioritizes speed for the healthy parts and then dedicates more time to problematic areas. Sectors that still fail are retried only if retry passes are requested (--retry-passes, -r).
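
The phases above are often split across two runs that share one mapfile: a quick run that skips scraping, then a slower run that scrapes and retries. The sketch below only prints the two commands so they can be reviewed before being run with sudo. The function name rescue_plan is hypothetical, and the long options --no-scrape, --idirect, and --retry-passes are assumed to be available in the installed ddrescue version; check `ddrescue --help` first.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print (do not run) a two-stage ddrescue plan.
# Usage: rescue_plan INFILE OUTFILE MAPFILE
rescue_plan() {
    if [ "$#" -ne 3 ]; then
        echo "usage: rescue_plan INFILE OUTFILE MAPFILE" >&2
        return 2
    fi
    # Stage 1: copying and trimming only; scraping is deferred.
    printf 'ddrescue --no-scrape %s %s %s\n' "$1" "$2" "$3"
    # Stage 2: same mapfile, so finished blocks are not read again;
    # scrape what is left, then make up to 3 retry passes with direct access.
    printf 'ddrescue --idirect --retry-passes=3 %s %s %s\n' "$1" "$2" "$3"
}
```

For example, `rescue_plan /dev/sdX image.dd rescue.map` prints both stages for a placeholder device /dev/sdX. Add --force to each command only when the outfile is itself a device.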

IMPORTANCE OF THE MAPFILE

The mapfile (the optional third positional argument, after infile and outfile) is critical. It records the progress of the rescue and the location of bad sectors. This allows ddrescue to:
- Resume an interrupted operation without losing progress.
- Perform multiple passes with different strategies (e.g., --reverse).
- Avoid re-reading already copied data, significantly speeding up the recovery process.
- Merge several damaged copies of the same file or disc: rerunning with a second copy as infile and the same outfile and mapfile fills in only the blocks still missing.
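
Because the mapfile is plain text, progress can be inspected without running ddrescue. The sketch below totals bytes per status. It assumes the mapfile layout described in the ddrescue manual, which is not spelled out in this page: lines starting with '#' are comments, the first remaining line holds the current position and status, and every later line is "pos size status" with hexadecimal numbers, where '+' means finished, '-' means bad sector, and '?', '*', '/' mean non-tried, non-trimmed, and non-scraped. The function name mapfile_summary is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: total the bytes in a ddrescue mapfile by status.
# Usage: mapfile_summary MAPFILE
mapfile_summary() {
    local file=$1 pos size status seen_status_line=0
    local finished=0 bad=0 pending=0

    while read -r pos size status || [ -n "$pos" ]; do
        case $pos in ''|'#'*) continue ;; esac   # skip blank and comment lines
        if [ "$seen_status_line" -eq 0 ]; then   # first data line: current pos/status
            seen_status_line=1
            continue
        fi
        [ -n "$status" ] || continue             # ignore malformed lines
        case $status in
            '+') finished=$(( finished + $size )) ;;
            '-') bad=$(( bad + $size )) ;;
            *)   pending=$(( pending + $size )) ;;  # '?', '*' or '/'
        esac
    done < "$file"

    printf 'finished=%d bad=%d pending=%d\n' "$finished" "$bad" "$pending"
}
```

The ddrescue package also ships ddrescuelog, which reports on and manipulates mapfiles and is the better choice when it is installed.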

HISTORY

GNU ddrescue was developed by Antonio Diaz Diaz as part of the GNU project. It emerged as a more robust alternative to the traditional dd command for data recovery, specifically addressing dd's limitations when dealing with read errors on failing storage devices. The first releases date back to 2004, and development has continued since, adding features such as the mapfile for progress tracking and more advanced recovery strategies. Its focus on efficiency and fault tolerance made it a go-to tool for system administrators, forensic experts, and anyone needing to salvage data from compromised hard drives.