fdupes
Identify and optionally delete duplicate files
TLDR
Search a single directory:
    fdupes path/to/directory
Search multiple directories:
    fdupes path/to/directory1 path/to/directory2
Search a directory recursively:
    fdupes -r path/to/directory
Search multiple directories, one of them recursively:
    fdupes path/to/directory1 -R path/to/directory2
Search recursively, considering hardlinked files as duplicates:
    fdupes -rH path/to/directory
Search recursively and choose interactively which duplicates to keep, deleting the others:
    fdupes -rd path/to/directory
Search recursively and delete duplicates without prompting:
    fdupes -rdN path/to/directory
SYNOPSIS
fdupes [options] directory ...
PARAMETERS
-r, --recurse
Recursively scan specified directories.
-s, --symlinks
Follow symbolic links found in arguments and during recursive scans.
-d, --delete
Prompt user for files to preserve, delete all others in a set of duplicates.
-N, --noprompt
Used with --delete; preserve the first file, delete others without prompting. Use with extreme caution!
-f, --omitfirst
Omit the first file in each set of duplicates from output.
-H, --hardlinks
Normally, files that are hard links to the same data are treated as non-duplicates; this option treats them as duplicates instead.
-S, --size
Show the size of duplicate files.
-m, --summarize
Summarize duplicate file information, showing total duplicates and space saved.
-n, --noempty
Exclude zero-length files from consideration.
-1, --sameline
List each set of duplicate files on a single line, separated by spaces.
-q, --quiet
Do not show progress indicator.
-p, --permissions
Don't consider files with different owner/group or permission bits as duplicates.
-v, --version
Display fdupes version information and exit.
-h, --help
Display usage instructions and exit.
DESCRIPTION
fdupes is a command-line utility, written in C, for finding and managing duplicate files within specified directories. It identifies duplicates by first comparing file sizes, then MD5 signatures of file contents, and finally performing a byte-by-byte comparison to confirm each match. Filtering by size and signature first keeps the expensive byte-by-byte comparisons to a minimum, which makes the tool fast in practice.
Once duplicates are found, fdupes offers several actions: it can list them, summarize them, or delete them, either interactively (the user chooses which files to preserve in each set) or automatically without prompting. It supports recursive scanning, following symbolic links, and several output formats. This makes fdupes a handy tool for cleaning up cluttered storage, reclaiming disk space, and keeping a file system organized.
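The size-then-signature-then-byte-compare pipeline described above can be sketched in Python. This is an illustrative model of the technique, not fdupes's actual C implementation; the function name find_duplicates is ours:

```python
import hashlib
import os
from collections import defaultdict
from filecmp import cmp

def find_duplicates(paths):
    """Group files by size, then by MD5, then confirm matches byte-by-byte."""
    by_size = defaultdict(list)
    for path in paths:
        by_size[os.path.getsize(path)].append(path)

    duplicate_sets = []
    for candidates in by_size.values():
        if len(candidates) < 2:
            continue  # a file with a unique size cannot have a duplicate
        by_hash = defaultdict(list)
        for path in candidates:
            with open(path, "rb") as f:
                by_hash[hashlib.md5(f.read()).hexdigest()].append(path)
        for group in by_hash.values():
            # Byte-by-byte confirmation guards against hash collisions.
            confirmed = [p for p in group[1:] if cmp(group[0], p, shallow=False)]
            if confirmed:
                duplicate_sets.append([group[0], *confirmed])
    return duplicate_sets
```

A production tool such as fdupes reads and hashes in chunks and can short-circuit on a partial signature, rather than reading whole files up front as this sketch does.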
CAVEATS
- Using options like --delete, especially in combination with --noprompt, can lead to irreversible data loss if used incorrectly. Always double-check commands before execution.
- fdupes does not scan directories recursively by default; pass --recurse (-r) explicitly when you want subdirectories included.
- Deleting one of a set of "duplicates" that are actually hard links to the same data frees no space, and replacing duplicates with hard links or symlinks (by hand or with other tools) changes the filesystem structure. Be aware of the implications, especially for backups or applications that expect distinct file paths.
- fdupes uses MD5 signatures as a fast filter, but it confirms each candidate match with a byte-by-byte comparison, so a hash collision alone will not cause two different files to be reported as duplicates.
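The hard-link caveat above is easy to see concretely: two hard-linked names share one inode, so the data is stored only once and removing one name frees nothing. A small Python illustration on a POSIX filesystem (the file names are arbitrary):

```python
import os
import tempfile

d = tempfile.mkdtemp()
original = os.path.join(d, "report.txt")
with open(original, "w") as f:
    f.write("same bytes")

# Create a second name for the same inode, as `ln report.txt copy.txt` would.
link = os.path.join(d, "copy.txt")
os.link(original, link)

# Both names resolve to the same inode; its link count is now 2.
same_inode = os.stat(original).st_ino == os.stat(link).st_ino
link_count = os.stat(original).st_nlink
print(same_inode, link_count)
```

This is why fdupes treats hard-linked files as non-duplicates unless -H is given: deleting one of them would not reclaim any space.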
PERFORMANCE CONSIDERATIONS
fdupes groups files by size, which quickly discards most non-duplicates before any file content is read. For large numbers of files, especially on slow or remote storage, performance is typically dominated by disk I/O; using faster storage or narrowing the scope of the scan improves speed.
CHECKSUM ALGORITHMS
fdupes uses MD5 signatures to compare file contents. Because every candidate match is additionally confirmed with a byte-by-byte comparison, the choice of hash affects speed rather than correctness. Check your version's man page for the exact comparison behavior.
INTERACTIVE DELETION
The interactive deletion mode (-d) presents each set of duplicate files and asks which ones to keep. This is the safest way to remove duplicates, since it gives granular control: the files are numbered, and the user enters the numbers of the files to preserve, separated by commas or spaces, or the word all to preserve every file in the current set.
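The selection step can be modeled with a small parser. This is a simplified sketch of the prompt semantics described above, assuming input is numbers separated by commas or spaces, or the word all; parse_preserve is a hypothetical helper, not part of fdupes:

```python
def parse_preserve(reply, count):
    """Return the 1-based indices of files to keep from a set of `count` files.

    Numbers may be separated by commas or spaces; the word "all" preserves
    every file in the set. Out-of-range tokens are ignored.
    """
    reply = reply.strip()
    if reply.lower() == "all":
        return set(range(1, count + 1))
    keep = set()
    for token in reply.replace(",", " ").split():
        if token.isdigit() and 1 <= int(token) <= count:
            keep.add(int(token))
    return keep
```

Every file whose number is not returned by the parser would then be deleted, which is why the prompt repeats for each set rather than acting in bulk.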
HISTORY
fdupes was originally written by Adrian Lopez and has been publicly available since the early 2000s. It quickly became a popular, lightweight utility for duplicate file detection on Unix-like systems thanks to its efficiency and straightforward approach. Over the years it has received contributions from various developers, adding refinements such as result ordering, summary output, and additional options for handling duplicates. Development has focused on keeping it a fast, reliable tool for a common system-administration task.