jdupes
Find and delete duplicate files
TLDR
Search a single directory
Search multiple directories
Search all directories recursively
Search directory recursively and let user choose files to preserve
Search multiple directories and follow subdirectories under directory2, not directory1
Search multiple directories and keep the directory order in result
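The corresponding invocations, in the same order as the list above, might look like the following (a sketch assuming the standard jdupes flags --recurse, --delete, --recurse: and --param-order; directory names are placeholders):
jdupes path/to/directory
jdupes directory1 directory2
jdupes --recurse path/to/directory
jdupes --delete --recurse path/to/directory
jdupes directory1 --recurse: directory2
jdupes --param-order directory1 directory2 directory3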
SYNOPSIS
jdupes [options] file_or_directory ...
PARAMETERS
-r, --recurse
Recurse into subdirectories to find duplicates.
-m, --summarize
Summarize duplicate file usage and statistics.
-d, --delete
Delete duplicate files. Defaults to interactive mode unless -N is used.
-L, --link-hard
Hard link all duplicate files together without prompting, conserving disk space.
-l, --link-soft
Replace duplicate files with relative symlinks to the preserved copy, without prompting.
(no action option)
When no action option such as --delete, --link-hard, or --link-soft is given, jdupes takes no action and simply prints the sets of matching files.
-o, --order BY
Select the sort order used for printing, linking, and deleting: by filename (default) or modification time.
-v, --version
Display the jdupes version and license information.
-z, --zero-match
Consider zero-length files to be duplicates (they are excluded from matching by default).
-X, --ext-filter SPEC
Exclude files from consideration based on criteria such as extension or size. Can be used multiple times.
-N, --no-prompt
Together with --delete, preserve the first file in each set of duplicates and delete the rest without prompting.
-1, --one-file-system
Do not cross file system boundaries when recursing into directories.
-S, --size
Show the size of duplicate files in the output.
-H, --hard-links
Treat files that are already hard linked to each other as duplicates. By default, hard-linked copies are not reported as duplicates.
-Q, --quick
Skip the byte-for-byte confirmation of duplicates and trust the hash comparison alone. Faster, but with a small risk of false matches.
-s, --symlinks
Follow symlinks and treat the files they point to as regular files.
-j, --json
Output results in JSON format for programmatic parsing.
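As an illustration of how these options combine, the following invocations are a sketch assuming the flags documented above behave as described; paths are placeholders:
jdupes -r -S ~/Downloads            # recursive scan, showing the size of each duplicate file
jdupes -r -d -N ~/backups           # recursively delete duplicates, keeping the first file in each set
jdupes -r -j ~/music > dupes.json   # recursive scan with JSON output for scripting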
DESCRIPTION
jdupes is a command-line utility for finding and acting on duplicate files within specified directories or paths. It is a heavily optimized and robust rewrite of fdupes, focusing on performance and reliability, especially with large datasets. jdupes identifies duplicates by first comparing file sizes, then performing a partial hash check, and finally a full-file hash comparison (followed, by default, by byte-for-byte verification) when the earlier checks suggest a match. It can simply list duplicates, delete them (interactively or automatically), replace them with hard links or symlinks to conserve disk space, or summarize its findings. Its efficient algorithm and C implementation make it suitable for managing very large collections of files, offering a flexible solution for disk space management and data organization.
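For example, a space-conserving run of the kind described above might look like this (a sketch assuming the -r, -m, and -L flags behave as documented in PARAMETERS):
jdupes -r -m ~/photos    # report how much space duplicates are wasting
jdupes -r -L ~/photos    # hard link the duplicates together to reclaim that space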
CAVEATS
jdupes deletion actions are permanent; run without any action option first (the default is simply to list matches) or use interactive deletion to verify what will be removed.
By default, jdupes will cross mount points when recursing; use -1/--one-file-system to confine a scan to a single file system.
Care must be taken when using --link-hard across different file systems, as hard links can only be created between files residing on the same file system (symlinks made with --link-soft do not have this restriction).
jdupes requires read access to all files and directories it scans.
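Following these caveats, one cautious pattern is to confine a scan to a single file system and review the listing before any destructive pass (a sketch; the path is a placeholder):
jdupes -r -1 /srv/data      # list duplicate sets only, staying on one file system
jdupes -r -1 -d /srv/data   # once the listing looks right, delete interactively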
ALGORITHM DETAILS
jdupes finds duplicates using a multi-stage process:
1. It groups files by size.
2. It computes a partial hash over the first block of files that have identical sizes.
3. For files whose partial hashes match, it computes a full-file hash (xxHash64 in current releases, jodyhash in older ones, selected at compile time) and, unless -Q/--quick is given, confirms each match with a byte-for-byte comparison. This approach minimizes disk I/O and CPU usage by avoiding full comparisons of files that cannot match.
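A rough manual equivalent of these stages, using ordinary shell tools, is shown below; it only illustrates the idea and is not how jdupes is implemented (stat -c is the GNU coreutils form, and the 4096-byte window is an arbitrary choice for this sketch):
stat -c %s file1 file2                          # stage 1: compare sizes
head -c 4096 file1 | sha256sum                  # stage 2: partial hash of the first block
head -c 4096 file2 | sha256sum
cmp --silent file1 file2 && echo duplicates     # stage 3: full byte-for-byte confirmation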
EXIT STATUS
jdupes typically exits with a status of 0 on success.
A non-zero exit status indicates an error or an invalid command-line invocation.
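In scripts, the status can be tested directly; a minimal sketch (the path and output file are placeholders):
if jdupes -r /srv/data > dupes.txt; then
    echo "Scan completed; matches written to dupes.txt"
else
    echo "jdupes reported an error" >&2
fi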
HISTORY
jdupes began as a modernized, performance-focused fork of the popular fdupes utility. Written in C, its primary goal was to address performance bottlenecks in fdupes, most notably through a faster hash function and reduced disk I/O, making it more efficient for scanning very large file systems. It has since gained numerous options for output formatting, action types, and fine-tuning scanning behavior, positioning it as a robust solution for duplicate file management on Linux and other Unix-like systems.