LinuxCommandLibrary

duperemove

filesystem extent deduplication tool

TLDR

Search for duplicate extents
$ duperemove -r [path/to/directory]
Deduplicate on Btrfs or XFS
$ duperemove -r -d [path/to/directory]
Use hash file for persistence
$ duperemove -r -d --hashfile=[path/to/hashfile] [path/to/directory]
Limit threads
$ duperemove -r -d --hashfile=[path/to/hashfile] --io-threads=[n] --cpu-threads=[n] [path/to/directory]

SYNOPSIS

duperemove [options] paths...

DESCRIPTION

duperemove finds duplicate filesystem extents and optionally schedules them for deduplication. An extent is a contiguous area of storage allocated for a file. On filesystems such as Btrfs and XFS, identical data blocks can be shared between files, saving disk space.
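As an illustration of extent sharing, a minimal sketch (the /tmp/dedup-demo path is hypothetical, and the duperemove call is guarded so the snippet does nothing where the tool is not installed):

```shell
# Create two files with identical contents -- candidates for extent sharing.
mkdir -p /tmp/dedup-demo
dd if=/dev/urandom of=/tmp/dedup-demo/a.bin bs=128K count=8 status=none
cp /tmp/dedup-demo/a.bin /tmp/dedup-demo/b.bin

# Report duplicate extents without modifying anything (no -d given).
if command -v duperemove >/dev/null 2>&1; then
    duperemove -h -r /tmp/dedup-demo
fi
```

On Btrfs, comparing `btrfs filesystem du /tmp/dedup-demo` before and after a `-d` run shows how much data the two files end up sharing.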

PARAMETERS

-r
Recursively process directories
-d
Deduplicate (schedule duplicates for dedup)
--hashfile file
Store hashes in file for reuse
--io-threads n
I/O thread count
--cpu-threads n
CPU thread count for hash comparison
-h
Print human-readable sizes
-v
Verbose output
--dedupe-options=OPTIONS
Comma-separated dedupe options (e.g., partial, same)
-b SIZE
Block size for hashing (default: 128K)
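A sketch combining several of the parameters above (the directory and the numeric values are illustrative; the call is guarded so it is a no-op where duperemove is absent):

```shell
TARGET=/tmp/dedup-demo   # directory to scan (hypothetical path)
mkdir -p "$TARGET"

if command -v duperemove >/dev/null 2>&1; then
    # Hash in 64K blocks, allow dedupe between extents of the same file,
    # cap both thread pools at 4, and print human-readable sizes.
    duperemove -r -d -h -b 64K --dedupe-options=same \
        --io-threads=4 --cpu-threads=4 "$TARGET"
fi
```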

CAVEATS

Only works on filesystems that support extent-level deduplication (Btrfs, XFS). Deduplication is handled by the kernel via the `FIDEDUPERANGE` ioctl; duperemove only identifies candidate extents and submits them. Using a hashfile is strongly recommended for large datasets: it reduces memory usage and enables incremental scans across runs. Without `-d`, the tool only reports duplicates without deduplicating. Read-only files can still be deduplicated when run with sufficient privileges, since dedup operates at the extent level rather than through normal file writes.
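The incremental hashfile workflow recommended above can be sketched as (paths are hypothetical; guarded as before):

```shell
HASHFILE=/tmp/dupes.hash   # persistent hash database (hypothetical path)
TARGET=/tmp/dedup-demo     # directory to scan (hypothetical path)
mkdir -p "$TARGET"

if command -v duperemove >/dev/null 2>&1; then
    # First run: hash every file, dedupe, and persist hashes in $HASHFILE.
    duperemove -r -d --hashfile="$HASHFILE" "$TARGET"
    # Later runs consult the hashfile: unchanged files are not rehashed,
    # so rescanning a large dataset is far cheaper.
    duperemove -r -d --hashfile="$HASHFILE" "$TARGET"
fi
```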

SEE ALSO

btrfs(8), fdupes(1), rmlint(1)
