jdupes
Find and delete duplicate files
TLDR
Search a single directory
Search multiple directories
Search all directories recursively
Search directory recursively and let user choose files to preserve
Search multiple directories and follow subdirectories under directory2, not directory1
Search multiple directories and keep the directory order in result
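The corresponding invocations, in the same order as the list above, might look like the following (a sketch assuming the standard jdupes flags --recurse, --delete, --recurse: and --param-order; directory names are placeholders):
jdupes path/to/directory
jdupes directory1 directory2
jdupes --recurse path/to/directory
jdupes --delete --recurse path/to/directory
jdupes directory1 --recurse: directory2
jdupes --param-order directory1 directory2 directory3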
SYNOPSIS
jdupes [options] file_or_directory ...
PARAMETERS
-r, --recurse
Recurse into subdirectories to find duplicates.
-m, --summarize
Summarize duplicate file usage and statistics.
-d, --delete
Delete duplicate files. Defaults to interactive mode unless -N is used.
-L, --link-hard
Hard link all duplicate files together without prompting, conserving disk space.
-l, --link-soft
Replace duplicate files with relative symlinks to the preserved copy, without prompting.
(no action option)
When no action option such as --delete, --link-hard, or --link-soft is given, jdupes takes no action and simply prints the sets of matching files.
-o, --order BY
Select the sort order used for printing, linking, and deleting: by filename (default) or modification time.
-v, --version
Display the jdupes version and license information.
-z, --zero-match
Consider zero-length files to be duplicates (they are excluded from matching by default).
-X, --ext-filter SPEC
Exclude files from consideration based on criteria such as extension or size. Can be used multiple times.
-N, --no-prompt
Together with --delete, preserve the first file in each set of duplicates and delete the rest without prompting.
-1, --one-file-system
Do not cross file system boundaries when recursing into directories.
-S, --size
Show the size of duplicate files in the output.
-H, --hard-links
Treat files that are already hard linked to each other as duplicates. By default, hard-linked copies are not reported as duplicates.
-Q, --quick
Skip the byte-for-byte confirmation of duplicates and trust the hash comparison alone. Faster, but with a small risk of false matches.
-s, --symlinks
Follow symlinks and treat the files they point to as regular files.
-j, --json
Output results in JSON format for programmatic parsing.
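As an illustration of how these options combine, the following invocations are a sketch assuming the flags documented above behave as described; paths are placeholders:
jdupes -r -S ~/Downloads            # recursive scan, showing the size of each duplicate file
jdupes -r -d -N ~/backups           # recursively delete duplicates, keeping the first file in each set
jdupes -r -j ~/music > dupes.json   # recursive scan with JSON output for scripting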
DESCRIPTION
jdupes is a command-line utility for finding and acting on duplicate files within specified directories or paths. It is a heavily optimized and robust rewrite of fdupes, focusing on performance and reliability, especially with large datasets. jdupes identifies duplicates by first comparing file sizes, then performing a partial hash check, and finally a full-file hash comparison (followed, by default, by byte-for-byte verification) when the earlier checks suggest a match. It can simply list duplicates, delete them (interactively or automatically), replace them with hard links or symlinks to conserve disk space, or summarize its findings. Its efficient algorithm and C implementation make it suitable for managing very large collections of files, offering a flexible solution for disk space management and data organization.
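For example, a space-conserving run of the kind described above might look like this (a sketch assuming the -r, -m, and -L flags behave as documented in PARAMETERS):
jdupes -r -m ~/photos    # report how much space duplicates are wasting
jdupes -r -L ~/photos    # hard link the duplicates together to reclaim that space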
CAVEATS
jdupes deletion actions are permanent; run without any action option first (the default is simply to list matches) or use interactive deletion to verify what will be removed.
By default, jdupes will cross mount points when recursing; use -1/--one-file-system to confine a scan to a single file system.
Care must be taken when using --link-hard across different file systems, as hard links can only be created between files residing on the same file system (symlinks made with --link-soft do not have this restriction).
jdupes requires read access to all files and directories it scans.
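Following these caveats, one cautious pattern is to confine a scan to a single file system and review the listing before any destructive pass (a sketch; the path is a placeholder):
jdupes -r -1 /srv/data      # list duplicate sets only, staying on one file system
jdupes -r -1 -d /srv/data   # once the listing looks right, delete interactively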
ALGORITHM DETAILS
jdupes finds duplicates using a multi-stage process:
1. It groups files by size.
2. It computes a partial hash over the first block of files that have identical sizes.
3. For files whose partial hashes match, it computes a full-file hash (xxHash64 in current releases, jodyhash in older ones, selected at compile time) and, unless -Q/--quick is given, confirms each match with a byte-for-byte comparison. This approach minimizes disk I/O and CPU usage by avoiding full comparisons of files that cannot match.
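A rough manual equivalent of these stages, using ordinary shell tools, is shown below; it only illustrates the idea and is not how jdupes is implemented (stat -c is the GNU coreutils form, and the 4096-byte window is an arbitrary choice for this sketch):
stat -c %s file1 file2                          # stage 1: compare sizes
head -c 4096 file1 | sha256sum                  # stage 2: partial hash of the first block
head -c 4096 file2 | sha256sum
cmp --silent file1 file2 && echo duplicates     # stage 3: full byte-for-byte confirmation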
EXIT STATUS
jdupes typically exits with a status of 0 on success.
A non-zero exit status indicates an error or an invalid command-line invocation.
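In scripts, the status can be tested directly; a minimal sketch (the path and output file are placeholders):
if jdupes -r /srv/data > dupes.txt; then
    echo "Scan completed; matches written to dupes.txt"
else
    echo "jdupes reported an error" >&2
fi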
HISTORY
jdupes began as a modernized, performance-focused fork of the popular fdupes utility. Written in C, its primary goal was to address performance bottlenecks in fdupes, most notably through a faster hash function and reduced disk I/O, making it more efficient for scanning very large file systems. It has since gained numerous options for output formatting, action types, and fine-tuning scanning behavior, positioning it as a robust solution for duplicate file management on Linux and other Unix-like systems.