rmlint
Find and remove duplicate files
TLDR
Check directories for duplicated, empty and broken files
Check for duplicates bigger than a specific size, preferably keeping files in tagged directories (after the double slash)
Check for space wasters, keeping everything in the untagged directories
Delete duplicate files found by an execution of rmlint
Find duplicate directory trees based on data, ignoring names
Mark files at lower path [d]epth as originals, on tie choose shorter [l]ength
Find files with identical filename and contents, and link rather than delete the duplicates
Use data as master directory. Find only duplicates in backup that are also in data. Do not delete any files in data
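The commands below sketch how several of these tasks might be invoked. Treat them as illustrations rather than exact recipes: the paths are placeholders, and option spellings can differ slightly between rmlint versions (check `rmlint --help` or the man page).

    # Scan two directories for duplicates, empty files and broken links
    rmlint path/to/directory1 path/to/directory2
    # Run the cleanup script produced by a previous scan, without prompting
    ./rmlint.sh -d
    # Treat directories with identical contents as duplicates, ignoring names
    rmlint --merge-directories path/to/directory
    # Keep the copy at the lowest path depth; break ties by shortest basename
    rmlint -S dl path/to/directory
    # Use data as the untouchable master: only report files in backup that also exist in data
    rmlint --types "duplicates" --must-match-tagged --keep-all-tagged path/to/backup // path/to/data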
SYNOPSIS
rmlint [options] paths... [// tagged_paths...]
PARAMETERS
-o format[:file]
Specify the output format, optionally with a target file (e.g., `-o sh:rmlint.sh`). Common formats include `sh` (a shell script that performs the cleanup), `json`, and `csv`.
-T types
Define what kind of "lint" to find. Multiple types can be comma-separated, e.g., `duplicates,emptyfiles,badlinks`.
-c sh:symlink, -c sh:hardlink
Configure the generated shell script to replace duplicates with symbolic or hard links to the kept original instead of deleting them.
-D, --merge-directories
Treat directories whose contents are identical as duplicates of each other, regardless of file names.
-s, --size range
Only consider files within the given size range (e.g., `100K-1M`), which is useful for ignoring very small files.
-S, --rank-by criteria
Choose which copy in a duplicate group is kept as the original, e.g. `-S dl` keeps the file at the lowest path depth and breaks ties by shortest basename.
-k, --keep-all-tagged
Never remove files from tagged directories (those listed after `//`); combine with `-m` (`--must-match-tagged`) to only report duplicates that also have a copy in a tagged directory.
-t, --threads N
Use N threads for traversal and hashing, speeding up scans on multi-core systems.
--version
Display version information and exit.
-h, --help
Show the help message and exit.
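As a hedged sketch of how these options can combine (the directory and size range are placeholders, and `-c sh:hardlink` relies on the shell-script formatter described below):

    # Find duplicates between 1 MiB and 1 GiB, prefer the copy at the lowest path depth,
    # and have the generated script hardlink duplicates instead of deleting them
    rmlint -T duplicates -s 1M-1G -S d -c sh:hardlink ~/photos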
DESCRIPTION
rmlint is a fast command-line utility for finding and cleaning up various kinds of "lint" on a file system. Its primary strength is identifying and managing duplicate files, but it also detects empty files, empty directories, broken symbolic links, and non-stripped binaries. Rather than comparing every file against every other, rmlint narrows candidates progressively: first by size, then by incremental hashing, and optionally by full byte-by-byte comparison for maximum certainty. It provides flexible output formats, most notably a shell script, so users can review and execute cleanup operations themselves; this design emphasizes data safety by preventing accidental deletion. By detecting redundant data and facilitating its removal or replacement with links, rmlint helps reclaim disk space and keep a file system tidy, making it a valuable tool for routine maintenance.
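For example, to look only for some of the non-duplicate lint types mentioned above, a run along these lines should suffice (the directory is a placeholder):

    # Report only empty files, empty directories and broken symlinks
    rmlint -T "emptyfiles,emptydirs,badlinks" ~/projects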
CAVEATS
Always review the generated shell script (typically `rmlint.sh`) before executing it to ensure no critical files are inadvertently deleted.
Be particularly cautious when executing the generated script non-interactively or with its confirmation prompts disabled, as this bypasses the review step.
Replacing duplicates with hardlinks or symlinks (e.g., via `-c sh:hardlink` or `-c sh:symlink`) modifies file system metadata; understand the implications for backups and other tools.
Scanning large file systems can be resource-intensive, consuming significant CPU and disk I/O.
OUTPUT FORMATS
rmlint supports several output formats, specified via the `-o` option. The most commonly used is `sh`, which generates a shell script (by default `rmlint.sh`) containing the commands needed to remove or link the identified lint. This lets users inspect exactly what will be deleted or linked before anything happens. Other formats such as `json` and `csv` are available for programmatic integration or further analysis.
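Assuming the `format:file` spelling described above, a report-only run might look like this (file names are placeholders):

    # Write a CSV report of the findings instead of the default shell script
    rmlint -o csv:duplicates.csv path/to/directory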
SAFETY AND REVIEW
A core principle of rmlint's design is safety. By default, it does not directly delete files or modify the file system. Instead, it generates a shell script containing the necessary commands to perform the cleanup (e.g., `rm`, `ln`). Users are strongly encouraged to review this generated script (typically found in the current directory as `rmlint.sh`) to ensure that only intended files are affected. This crucial safety net helps prevent accidental data loss and provides full control over the cleanup process.
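A typical review-then-execute session might look like the following; the `-n` dry-run flag of the generated script is an assumption that may not be present in every version:

    rmlint ~/music      # scan; writes rmlint.sh (and a JSON report) to the current directory
    less rmlint.sh      # inspect exactly what would be removed or linked
    ./rmlint.sh -n      # dry run: print the planned actions without performing them (if supported)
    ./rmlint.sh         # perform the cleanup once satisfied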
HISTORY
rmlint was created by Christopher Pahl and has been actively developed as a modern, fast, and feature-rich alternative to older duplicate file finders. Its design prioritizes speed on large datasets through efficient hashing and parallel processing, and it gained popularity for its safety-first default of generating reviewable shell scripts rather than modifying the file system directly.