rmlint
Find duplicate files and filesystem lint
TLDR
Find duplicates in directory
SYNOPSIS
rmlint [-T types] [-k] [-o output] [options] paths [// paths]
DESCRIPTION
rmlint finds duplicate files, empty files, broken symlinks, and other lint. It generates scripts to remove or manage found items.
Duplicate detection uses progressive matching: size first, then partial hashes, finally full hashes or paranoid byte comparison. This minimizes I/O for large collections.
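The staged matching described above can be illustrated with a short sketch. This is not rmlint's implementation: the function names, the SHA-256 hash, and the 4096-byte prefix size are all assumptions chosen for illustration. It only shows why cheap checks first keep the expensive full reads to a minimum.

```python
import hashlib
import os
from collections import defaultdict

PARTIAL_BYTES = 4096  # assumed prefix size for the cheap partial hash


def _digest(path, limit=None):
    """Hash the whole file, or only its first `limit` bytes."""
    h = hashlib.sha256()
    remaining = limit
    with open(path, "rb") as fh:
        while True:
            want = 65536 if remaining is None else min(65536, remaining)
            if want == 0:
                break
            chunk = fh.read(want)
            if not chunk:
                break
            h.update(chunk)
            if remaining is not None:
                remaining -= len(chunk)
    return h.hexdigest()


def _split(groups, key):
    """Regroup candidates by `key`, dropping groups that shrink to one file."""
    result = []
    for group in groups:
        buckets = defaultdict(list)
        for path in group:
            buckets[key(path)].append(path)
        result.extend(g for g in buckets.values() if len(g) > 1)
    return result


def find_duplicates(paths):
    """Return groups of identical files: size, then partial hash, then full hash."""
    groups = _split([list(paths)], os.path.getsize)               # stage 1: no reads
    groups = _split(groups, lambda p: _digest(p, PARTIAL_BYTES))  # stage 2: small reads
    groups = _split(groups, _digest)                              # stage 3: full reads
    return [sorted(g) for g in groups]
```

Files with a unique size are never opened, and files that differ within the first few kilobytes are never read in full.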
The double-slash (//) separator splits the path list into untagged and tagged paths. Paths after // are tagged, and their files are preferred as originals; when both sides hold a copy, the files in the paths before // are the ones treated as duplicates. This enables controlled cleanup against a backup or mirror directory.
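A minimal sketch of the tagging idea, assuming (as rmlint's own documentation states) that the paths after // are the tagged ones. The helper names split_paths, is_tagged, and pick_original are hypothetical and only model the behaviour; they are not part of rmlint.

```python
import os


def split_paths(args):
    """Split path arguments at the first "//" into (untagged, tagged)."""
    if "//" in args:
        i = args.index("//")
        return list(args[:i]), list(args[i + 1:])
    return list(args), []


def is_tagged(path, tagged_roots):
    """True if `path` is, or lies under, one of the tagged root directories."""
    path = os.path.normpath(path)
    for root in tagged_roots:
        root = os.path.normpath(root)
        if path == root or path.startswith(root + os.sep):
            return True
    return False


def pick_original(group, tagged_roots):
    """Prefer a file from a tagged path; otherwise fall back to the first file."""
    for path in group:
        if is_tagged(path, tagged_roots):
            return path
    return group[0]
```

For example, with the arguments `["new", "//", "backup"]`, a group holding one copy under `new` and one under `backup` keeps the `backup` copy as the original.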
Output includes a shell script (rmlint.sh) with removal commands. The script is cautious by default, requiring confirmation and keeping originals. JSON and CSV outputs enable custom processing.
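As an example of custom processing, the sketch below lists the files a JSON report marks as removable duplicates. The field names `type`, `path`, and `is_original`, and the `duplicate_file` type value, are assumptions about the report layout; check them against an actual report before relying on them. The function name is hypothetical.

```python
import json


def duplicates_to_remove(report_text):
    """Return paths of duplicate entries that are not flagged as the original."""
    entries = json.loads(report_text)
    return [
        entry["path"]
        for entry in entries
        # Header and footer objects have no "type" and are skipped here.
        if entry.get("type") == "duplicate_file"
        and not entry.get("is_original", False)
    ]


# Hand-written sample in the assumed layout, not real rmlint output.
SAMPLE = """[
  {"description": "sample header"},
  {"type": "duplicate_file", "path": "/data/a.txt", "is_original": true},
  {"type": "duplicate_file", "path": "/backup/a.txt", "is_original": false},
  {"type": "emptyfile", "path": "/data/empty"},
  {"total_files": 3}
]"""

removable = duplicates_to_remove(SAMPLE)
```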
Sorting criteria (-S) determine which duplicate is kept: by age, path depth, basename length, or alphabetically. Multiple criteria combine for fine-grained control.
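Combining criteria amounts to a multi-key sort in which later criteria only break ties left by earlier ones. The sketch below is loosely modelled on that idea: the single-letter codes, the convention that an uppercase letter inverts the order, and the dictionary layout of a file record are assumptions for illustration, not rmlint's exact rules.

```python
import os


def _key_for(letter):
    """Map a criterion letter to (key function, reverse flag)."""
    keys = {
        "m": lambda f: f["mtime"],                        # age: oldest first
        "a": lambda f: os.path.basename(f["path"]),       # alphabetical by basename
        "d": lambda f: f["path"].count("/"),              # depth (POSIX paths assumed)
        "l": lambda f: len(os.path.basename(f["path"])),  # basename length
    }
    return keys[letter.lower()], letter.isupper()


def rank(files, criteria):
    """Order files so that the preferred original comes first."""
    ranked = list(files)
    # list.sort is stable, so apply the least significant criterion first.
    for letter in reversed(criteria):
        key, reverse = _key_for(letter)
        ranked.sort(key=key, reverse=reverse)
    return ranked
```

With the criteria string `"md"`, the oldest file wins, and among files of equal age the one with the shallowest path is kept.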
Additional lint types include: empty directories, broken symlinks, files with bad user/group, and non-stripped binaries.
PARAMETERS
-T, --types TYPES
Find types: df (duplicates), ef (empty files), ed (empty dirs).
-k, --keep-all-tagged
Keep all files in tagged paths (those given after //).
-m, --must-match-tagged
Only report duplicate groups with at least one file in a tagged path.
-o, --output FMT
Output format: sh, csv, json, py.
-c, --config FMT:KEY=VALUE
Configure an output handler.
-s, --size RANGE
Filter by file size.
-d, --max-depth N
Maximum directory depth.
--dry-run
Don't write output files.
-g, --progress
Show a progress bar.
-p, --paranoid
Byte-by-byte comparison.
-S CRITERIA, --sortcriteria CRITERIA
Sorting criteria for selecting the original.
-n, --newer-than-stamp FILE
Only consider files newer than the timestamp stored in FILE.
-r, --hidden
Include hidden files.
-f, --followlinks
Follow symbolic links.
CAVEATS
Hash-based detection has theoretical collision risk. Large filesystems need significant memory for tracking. Follow-symlink mode can expand search dramatically. Removal scripts should be reviewed before execution. Some filesystems don't track modification time accurately.
HISTORY
rmlint was created by Christopher Pahl (sahib) around 2012 as a fast, modern duplicate finder, with Daniel Thomas (SeeSpotRun) later joining as co-author. Written in C, it replaced slower Python predecessors. The project emphasizes safety (generating reviewable scripts) and performance (parallel hashing, incremental matching).
SEE ALSO
fdupes(1), jdupes(1), duperemove(1), rdfind(1)
