duperemove
filesystem extent deduplication tool
TLDR
Search for duplicate extents
$ duperemove -r [path/to/directory]
Deduplicate on Btrfs or XFS
$ duperemove -r -d [path/to/directory]
Use a hash file for persistence
$ duperemove -r -d --hashfile=[path/to/hashfile] [path/to/directory]
Limit thread counts
$ duperemove -r -d --hashfile=[path/to/hashfile] --io-threads=[n] --cpu-threads=[n] [path/to/directory]
SYNOPSIS
duperemove [options] paths...
DESCRIPTION
duperemove finds duplicate filesystem extents and optionally schedules them for deduplication. An extent is a contiguous area of storage allocated for a file. On filesystems such as Btrfs and XFS, identical data blocks can be shared between files, saving disk space.
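As a minimal sketch of the idea (the directory and file names below are illustrative assumptions, not part of the tool's documentation), two files with identical contents can be created and then scanned; without -d, duperemove only reports the duplicate extents it finds and modifies nothing:

```shell
# Sketch: create two files with identical contents, then scan them
# for duplicate extents (report-only, since -d is not given).
# Paths and sizes are illustrative assumptions.
mkdir -p /tmp/dedup-demo
dd if=/dev/urandom of=/tmp/dedup-demo/a.bin bs=1M count=1 status=none
cp /tmp/dedup-demo/a.bin /tmp/dedup-demo/b.bin

# Report duplicate extents; safe on any filesystem because nothing
# is changed without -d. Guarded in case duperemove is not installed.
if command -v duperemove >/dev/null; then
    duperemove -r /tmp/dedup-demo
fi
```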
PARAMETERS
-r
Recursively process directories
-d
Deduplicate (schedule duplicate extents for deduplication)
--hashfile file
Store hashes in file for reuse across runs
--io-threads n
I/O thread count
--cpu-threads n
CPU thread count for hash comparison
-h
Print human-readable sizes
-v
Verbose output
--dedupe-options=OPTIONS
Comma-separated dedupe options (e.g., partial, same)
-b SIZE
Block size for hashing (default: 128K)
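The options above can be combined in one invocation. As a sketch (the path /srv/data is an illustrative assumption), a run with a larger hashing block size and partial-extent matching enabled might look like this:

```shell
# Sketch: combine a larger hashing block size (-b) with
# partial-extent matching (--dedupe-options=partial).
# /srv/data is an illustrative assumption.
CMD="duperemove -r -d -b 1024K --dedupe-options=partial /srv/data"

# Run only when the tool and the target directory actually exist.
if command -v duperemove >/dev/null && [ -d /srv/data ]; then
    $CMD
fi
```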
CAVEATS
Only works on filesystems supporting extent-level deduplication (Btrfs, XFS). Deduplication is handled by the kernel via the `FIDEDUPERANGE` ioctl. Using a hashfile is strongly recommended for large datasets as it reduces memory usage and enables incremental scans across runs. Without `-d`, the tool only reports duplicates without deduplicating. Read-only files can still be deduplicated since dedup operates at the filesystem level.
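The hashfile workflow described above can be sketched as a repeatable job: the first pass hashes every file, and later passes reuse the stored hashes so only changed files are rescanned. The mount point and hashfile path below are illustrative assumptions:

```shell
# Sketch of an incremental dedupe run with a persistent hashfile.
# /data and the hashfile location are illustrative assumptions.
HASHFILE=/var/tmp/duperemove.hash

# First run: hash all files and deduplicate on a Btrfs/XFS mount.
# Subsequent runs reuse $HASHFILE and rescan only changed files.
if command -v duperemove >/dev/null && [ -d /data ]; then
    duperemove -r -d --hashfile="$HASHFILE" /data
fi
```

Running this from cron or a systemd timer keeps space savings current without rehashing the whole dataset each time.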
