LinuxCommandLibrary

duperemove

filesystem extent deduplication tool

TLDR

Search a directory recursively for duplicate extents (report only)
$ duperemove -r [path/to/directory]

Deduplicate the duplicate extents found (Btrfs or XFS)
$ duperemove -r -d [path/to/directory]

Store hashes in a hashfile so that later runs can reuse them
$ duperemove -r -d --hashfile=[path/to/hashfile] [path/to/directory]

Limit the number of I/O and CPU threads
$ duperemove -r -d --hashfile=[path/to/hashfile] --io-threads=[n] --cpu-threads=[n] [path/to/directory]

SYNOPSIS

duperemove [options] paths...

DESCRIPTION

duperemove finds duplicate filesystem extents and, when run with -d, submits them to the kernel for deduplication. On filesystems such as Btrfs and XFS, identical data blocks can then be shared between files, which saves disk space.
An extent is a contiguous area of storage allocated to a file.

PARAMETERS

-r
Recurse into directories
-d
Deduplicate the duplicate extents found; without this option duperemove only reports them
--hashfile file
Store hashes in file so that later runs can reuse them
--io-threads n
Number of threads used for I/O
--cpu-threads n
Number of threads used for CPU-bound work (hash comparison)
-v
Verbose output

CAVEATS

Deduplication (-d) only works on filesystems that support extent-level deduplication (Btrfs, XFS). The deduplication itself is performed by the kernel; duperemove only submits the duplicate extents to it. Using a hashfile reduces memory usage and enables incremental scans.
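
One way to respect the filesystem restriction is to check the filesystem type before passing -d. The following is a minimal sketch, assuming GNU coreutils stat is available; the helper names fs_type_of and supports_dedupe are hypothetical, and the list of accepted types simply mirrors the two filesystems named above.

```shell
#!/bin/sh
# Hypothetical helper: print the filesystem type of a path (GNU coreutils stat).
fs_type_of() {
    stat -f -c %T -- "$1"
}

# Hypothetical helper: succeed only for the filesystem types named in this page.
supports_dedupe() {
    case "$1" in
        btrfs|xfs) return 0 ;;
        *) return 1 ;;
    esac
}

# Example (directory is a placeholder):
#   dir=/srv/data
#   if supports_dedupe "$(fs_type_of "$dir")"; then
#       duperemove -r -d "$dir"
#   else
#       echo "dedupe not supported on $dir" >&2
#   fi
```

The check only looks at the filesystem type, so it cannot tell whether a particular volume was created with the features the kernel needs for deduplication.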

SEE ALSO

btrfs(8), fdupes(1)
