rdfind
Find and handle duplicate files efficiently
TLDR
Find duplicate files in a directory (a report is written to results.txt):
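rdfind path/to/directory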
SYNOPSIS
rdfind [options] directory [directory ...]
DESCRIPTION
rdfind (redundant data find) efficiently locates duplicate files across one or more directory trees using a multi-phase detection algorithm. It first groups files by size, then computes partial checksums on the first bytes of same-sized files, and finally performs full checksums only on files that still match, making it fast even on large file sets.
Once duplicates are identified, rdfind can delete them, replace them with hardlinks (saving disk space while keeping the same path), or replace them with symbolic links. A results file lists all duplicates found for manual review, and the -dryrun flag simulates operations without modifying the filesystem. The first file encountered in the argument order is always kept as the original.
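For example, to preview and then perform hardlink replacement across two trees, keeping the files under /mnt/master as the originals (the paths here are illustrative):

rdfind -dryrun true -makehardlinks true /mnt/master /mnt/backup
rdfind -makehardlinks true /mnt/master /mnt/backup

Because /mnt/master is listed first, its copies are kept and the duplicates under /mnt/backup become hardlinks to them.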
PARAMETERS
-deleteduplicates BOOL
Delete duplicate files, keeping the first occurrence.
-makehardlinks BOOL
Replace duplicates with hardlinks to the kept file.
-makesymlinks BOOL
Replace duplicates with symbolic links to the kept file.
-dryrun BOOL
Report what would be done without modifying the filesystem.
-ignoreempty BOOL
Ignore empty (zero-byte) files.
-removeidentinode BOOL
Treat files that share the same inode and device as a single file, so existing hardlinks are not reported as duplicates.
-outputname FILE
Write the list of duplicates to FILE instead of the default results.txt.
-minsize BYTES
Only consider files of at least BYTES bytes.
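For example, to simulate a run that skips files smaller than 1 MiB and writes the report to a custom file (the path and filename are illustrative):

rdfind -dryrun true -minsize 1048576 -outputname large_dupes.txt path/to/directory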
CAVEATS
Hardlinks can only be created within a single filesystem. Symbolic links break if the kept original is later moved or deleted. Deletion is irreversible, so review the results file or run with -dryrun true before deleting.
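A cautious deletion workflow is to simulate first, review the reported duplicates, and only then delete (the path is illustrative):

rdfind -dryrun true -deleteduplicates true path/to/directory
rdfind -deleteduplicates true path/to/directory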
HISTORY
rdfind (redundant data find) was created for efficient duplicate detection. Its multi-stage algorithm handles large file sets quickly.
