LinuxCommandLibrary

rmlint

Find duplicate files and filesystem lint

TLDR

Find duplicates in directory

$ rmlint [/path/to/directory]
Find duplicates and write the removal script to an explicit file
$ rmlint -o sh:rmlint.sh [/path/to/directory]
Find only duplicate files (not empty files/dirs)
$ rmlint -T df [/path/to/directory]
Never delete duplicates that lie in tagged paths (those after //)
$ rmlint -k [/path/to/duplicates] // [/path/to/originals]
Compare two directories (prefer originals from the tagged path after //)
$ rmlint [/path/to/duplicates] // [/path/to/originals]
Find empty files and directories
$ rmlint -T ef,ed [/path/to/directory]
Preview what the generated script would do, without changing anything
$ ./rmlint.sh -n
JSON output
$ rmlint -o json [/path/to/directory]

SYNOPSIS

rmlint [-T types] [-k] [-o output] [options] paths [// tagged_paths]

DESCRIPTION

rmlint finds duplicate files, empty files, broken symlinks, and other lint. It generates scripts to remove or manage found items.
Duplicate detection uses progressive matching: size first, then partial hashes, finally full hashes or paranoid byte comparison. This minimizes I/O for large collections.
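The staged narrowing can be modeled in a few lines of Python. This is a simplified sketch and not rmlint's C implementation. It assumes SHA-256 and a fixed 4 KiB head hash, and both are arbitrary choices made for illustration. The helper names digest, refine and find_duplicates are hypothetical. Stage 3 is the point where paranoid mode (-p) would substitute a byte-by-byte comparison for the full hash.

```python
import hashlib
import os
from collections import defaultdict


def digest(path, limit=None, chunk=64 * 1024):
    """SHA-256 of the first `limit` bytes of a file, or of the whole file."""
    h = hashlib.sha256()
    remaining = limit
    with open(path, "rb") as f:
        while remaining is None or remaining > 0:
            size = chunk if remaining is None else min(chunk, remaining)
            block = f.read(size)
            if not block:
                break
            h.update(block)
            if remaining is not None:
                remaining -= len(block)
    return h.hexdigest()


def refine(groups, key):
    """Split each candidate group by `key`; drop buckets left with one file."""
    refined = []
    for group in groups:
        buckets = defaultdict(list)
        for path in group:
            buckets[key(path)].append(path)
        refined.extend(b for b in buckets.values() if len(b) > 1)
    return refined


def find_duplicates(paths, head_bytes=4096):
    """Narrow candidates in stages so most files are never read in full."""
    by_size = defaultdict(list)
    for path in paths:
        size = os.path.getsize(path)
        if size > 0:  # empty files are reported as their own lint type (ef)
            by_size[size].append(path)
    groups = [g for g in by_size.values() if len(g) > 1]      # stage 1: size
    groups = refine(groups, lambda p: digest(p, head_bytes))  # stage 2: file head
    groups = refine(groups, digest)                           # stage 3: whole file
    return groups
```

Files with a unique size are never opened. Files that differ early are read only up to head_bytes.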
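The double-slash (//) separator divides untagged paths from tagged ones. Files in paths named after // are tagged and are preferred as originals. -k protects every tagged file from removal, and -m restricts the search to duplicate groups that contain at least one tagged file. Together, these options enable controlled cleanup of backup or mirror directories.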
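The way tagging restricts a cleanup can be expressed as a small filter over one duplicate group. This is a hypothetical model, and the Candidate type and the removable function are invented for illustration. It assumes that each file already carries a flag recording whether it came from a tagged path, and that the group's original has already been chosen and placed first.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    path: str
    tagged: bool  # True if the file was found under a tagged path


def removable(group, keep_all_tagged=False, must_match_tagged=False):
    """Paths in one duplicate group that a cleanup may remove.

    group[0] is assumed to be the original that is always kept.
    """
    if must_match_tagged and not any(c.tagged for c in group):
        return []  # like -m: skip groups without any tagged copy
    dupes = group[1:]
    if keep_all_tagged:
        dupes = [c for c in dupes if not c.tagged]  # like -k: never touch tagged files
    return [c.path for c in dupes]
```

With both flags set, only untagged copies of files that also exist under a tagged path are ever offered for removal.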
By default rmlint writes two files to the current directory: a shell script (rmlint.sh) containing the removal commands, and a JSON report (rmlint.json). The script is cautious. It asks for confirmation before it runs and never removes originals. JSON and CSV outputs allow custom processing.
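As an example of custom processing, the following Python sketch lists the duplicates in rmlint.json that are not marked as originals. It makes three assumptions. First, the report is a JSON array. Second, its lint entries carry type, path, size and is_original keys, with duplicates typed "duplicate_file". Third, header and footer objects without a type are mixed in. Verify these field names against the rmlint.json your rmlint version produces. load_report and duplicates_to_review are hypothetical helpers.

```python
import json


def load_report(path="rmlint.json"):
    """Read an rmlint JSON report (assumed to be a JSON array of objects)."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def duplicates_to_review(entries):
    """Yield (path, size) for each duplicate not marked as an original.

    Assumed entry keys: "type", "path", "size", "is_original". Objects
    without a "type" key (such as a header or footer) are skipped.
    """
    for entry in entries:
        if not isinstance(entry, dict):
            continue
        if entry.get("type") == "duplicate_file" and not entry.get("is_original", False):
            yield entry["path"], entry.get("size", 0)
```

Typical use: `for path, size in duplicates_to_review(load_report()): print(size, path)`, followed by summing the sizes to estimate reclaimable space.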
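Sorting criteria (-S) determine which file in each duplicate group is kept as the original. Files can be ranked by modification time, path depth, basename length, or alphabetical order. Multiple criteria are applied in sequence, and each later criterion only breaks ties left by the earlier ones.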
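Chained criteria behave like a tuple sort key. In the following Python sketch, the criteria order is an assumption chosen for illustration and is not rmlint's default: oldest modification time first, then shallowest path, then shortest basename, then alphabetical path. FileInfo, rank_key and split_group are hypothetical names.

```python
from dataclasses import dataclass
from pathlib import PurePosixPath


@dataclass
class FileInfo:
    path: str
    mtime: float  # modification time, seconds since the epoch


def rank_key(f):
    """Tuple compared left to right, so later fields only break ties."""
    p = PurePosixPath(f.path)
    return (f.mtime, len(p.parts), len(p.name), f.path)


def split_group(group):
    """Return (original, duplicates) for one group of identical files."""
    ordered = sorted(group, key=rank_key)
    return ordered[0], ordered[1:]
```

Reordering the tuple fields corresponds to reordering the criteria string passed to -S.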
Additional lint types include: empty directories, broken symlinks, files with bad user/group, and non-stripped binaries.

PARAMETERS

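-T, --types TYPES
Comma-separated lint types to find, e.g. df (duplicates), ef (empty files), ed (empty dirs), bl (broken symlinks), bi (bad user/group IDs), ns (non-stripped binaries).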
-k, --keep-all-tagged
Never delete duplicates that lie in tagged paths (those after //).
-m, --must-match-tagged
Only report duplicate groups that contain at least one file in a tagged path.
-o, --output FMT[:FILE]
Output format: sh, csv, json, py. An optional :FILE names the destination.
-c, --config FMT:KEY=VALUE
Configure output handler.
-s, --size RANGE
Filter by file size.
-d, --max-depth N
Maximum directory depth.
--dry-run
Don't write output files.
-g, --progress
Show progress bar.
-p, --paranoid
Byte-by-byte comparison.
-S CRITERIA, --sortcriteria CRITERIA
Criteria for choosing the original in each duplicate group.
-n, --newer-than-stamp FILE
Only consider files modified after the timestamp stored in FILE. FILE is updated with the current time after each run.
-r, --hidden
Include hidden files.
-f, --followlinks
Follow symbolic links.

CAVEATS

Hash-based detection carries a theoretical collision risk. Paranoid mode (-p) avoids this risk by comparing files byte by byte. Large filesystems need significant memory for tracking. Following symlinks (-f) can expand the search dramatically. Review removal scripts before running them. Some filesystems don't track modification time accurately.
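The collision caveat can be quantified with a union bound over file pairs. This calculation assumes an ideal b-bit hash and n files with distinct contents. The file count and the 256-bit width used here are illustrative assumptions, not rmlint's defaults. Because rmlint only compares files of equal size, the number of pairs it actually hashes against each other is smaller, so the result is an upper bound.

```latex
P(\text{collision}) \le \binom{n}{2}\, 2^{-b} = \frac{n(n-1)}{2^{b+1}}

n = 10^{9},\; b = 256:\quad
P \le \frac{10^{9}\,(10^{9}-1)}{2^{257}} \approx \frac{10^{18}}{2.3 \times 10^{77}} \approx 4.3 \times 10^{-60}
```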

HISTORY

rmlint was created by Christopher Pahl (sahib) around 2010, and Daniel T. (SeeSpotRun) joined as co-author in 2014. It was written in C as a fast, modern duplicate finder and replaced slower Python predecessors. The project emphasizes safety (generating review-able scripts) and performance (parallel hashing, incremental matching).

SEE ALSO

fdupes(1), jdupes(1), duperemove(1), rdfind(1)

> TERMINAL_GEAR

Curated for the Linux community
