LinuxCommandLibrary

fclones

Find duplicate files efficiently

TLDR

Search for duplicate files in the current directory
$ fclones group .

Search multiple directories for duplicate files and cache the results
$ fclones group --cache [path/to/directory1 path/to/directory2 ...]

Search only the specified directory for duplicate files, skipping subdirectories, and save the results to a file
$ fclones group [path/to/directory] --depth 1 > [path/to/file.txt]

Move the duplicate files listed in a TXT file to a different directory
$ fclones move [path/to/target_directory] < [path/to/file.txt]

Perform a dry run of creating soft links for the duplicates listed in a TXT file, without actually linking
$ fclones link --soft --dry-run < [path/to/file.txt] 2> /dev/null

Delete the newest duplicates from the current directory without saving the group report to a file
$ fclones group . | fclones remove --priority newest

Preprocess JPEG files in the current directory by using an external command to strip their EXIF data before matching for duplicates
$ fclones group . --name '*.jpg' -i --transform 'exiv2 -d a $IN' --in-place
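
The report-consuming subcommands (remove, link, move) can also read a group report from a pipe, as in the remove example above; a sketch combining the commands on this page, same placeholder conventions:

$ fclones group [path/to/directory] | fclones move [path/to/target_directory]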

SYNOPSIS

fclones group|remove|link|move [options] [path…]

PARAMETERS

-h, --help
    Print help for subcommand

-V, --version
    Print version info

-v, --verbose
    Increase verbosity (repeat for more)

-q, --quiet
    Suppress non-essential output

--color
    Control colored output: auto|never|always

--format
    Output format: text|json|csv|jsonlines (default: text)

--hash-algo
    Hash algorithm: xxhash|blake3 (default: xxhash)

--hash-threshold
    Hash files larger than this (default: 2KiB)

--size
    Minimum file size to consider

--threads
    Number of threads (default: CPU cores)

-i, --input
    Read paths from file instead of args

-o, --output
    Write output to file

--prune-empty-dirs
    Remove directories left empty after duplicates are removed

--skip-pending
    Skip files with pending deletions

--exclude
    Exclude paths matching glob

--include
    Include only matching globs

-f, --follow-links
    Follow symlinks

--allow-missing
    Continue on missing paths
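
A sketch combining several of the options above, with flag spellings exactly as listed on this page (spellings can differ between fclones versions, so confirm with fclones group --help):

$ fclones group . --format json -o [path/to/report.json] --threads 4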

DESCRIPTION

fclones is a high-speed command-line tool for finding duplicate files across large filesystems. Written in Rust, it leverages parallel I/O, streaming hash computation (xxhash, blake3), and size-based prefiltering, which lets it outperform tools like fdupes or rdfind, often by a wide margin.

It operates via subcommands: group identifies duplicates and prints them as groups; remove, link, and move consume a group report and delete, link, or relocate the redundant copies. Files are first grouped by size and hashed only when several files share a size, minimizing disk reads. Ideal for backups, storage optimization, or cleaning media libraries. Supports exclusions, symlinks, permission checks, and output formats like JSON or CSV. Handles millions of files efficiently and can scan terabytes in minutes on fast storage.
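
A sketch of the report-driven workflow this implies, using only subcommands and flags shown in the TLDR above (paths are placeholders):

$ fclones group ~/photos > dupes.txt          # group duplicates, save the report
$ fclones link --soft --dry-run < dupes.txt   # preview the planned soft links
$ fclones link --soft < dupes.txt             # apply once the preview looks right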

CAVEATS

Memory use grows with the number of candidate duplicates, so massive trees (e.g. >100 GB of dupes) can be demanding. Performance is best on SSDs. Root is not needed, but files you cannot read are not checked, so watch permissions. xxhash collisions are theoretically possible but negligible in practice.
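
One way to bound a scan suggested by these caveats, using options from PARAMETERS above (--size, --depth, and --threads as listed on this page; verify against your build's --help):

$ fclones group [path/to/big_tree] --size 1M --depth 2 --threads 2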

SUBCOMMANDS DETAIL

group: Prints duplicate groups (size, hash, paths).
remove: Deletes redundant copies listed in a group report.
link: Replaces redundant copies with links (--soft for symbolic links).
move: Moves redundant copies to a target directory.

EXAMPLE

fclones group ~ /data > dupes.txt   # find dups and save the report
fclones remove < dupes.txt          # delete redundant copies listed in the report

HISTORY

Developed by Piotr Kołaczkowski, starting in 2020; written in Rust for speed. GitHub: pkolaczk/fclones. Widely packaged (Arch, Fedora, NixOS).

SEE ALSO

fdupes(1), rdfind(1), rmlint(1), find(1)
