LinuxCommandLibrary

fclones

Find duplicate files efficiently

TLDR

Search for duplicate files in the current directory

$ fclones group .

Search multiple directories for duplicate files and cache the results
$ fclones group --cache [path/to/directory1] [path/to/directory2]

Search only the specified directory for duplicate files, skipping subdirectories, and save the results to a file
$ fclones group [path/to/directory] --depth 1 > [path/to/file.txt]

Move the duplicate files listed in a text file to a different directory
$ fclones move [path/to/target_directory] < [path/to/file.txt]

Perform a dry run: show which duplicates listed in a text file would be replaced with soft links, without actually linking (2> /dev/null discards the log output)
$ fclones link --soft < [path/to/file.txt] --dry-run 2> /dev/null

Delete the newest duplicates from the current directory without storing them in a file
$ fclones group . | fclones remove --priority newest

Preprocess JPEG files in the current directory by using an external command to strip their EXIF data before matching for duplicates
$ fclones group . --name '*.jpg' -i --transform 'exiv2 -d a $IN' --in-place

SYNOPSIS

fclones [COMMAND] [OPTIONS] [PATH...]

Common commands:
fclones group [OPTIONS] [PATH...]
fclones remove [OPTIONS]
fclones move [OPTIONS] [TARGET_DIR]
fclones link [OPTIONS]
fclones dedupe [OPTIONS]

The remove, move, link, and dedupe commands act on the list of duplicate groups produced by group, read from standard input.

PARAMETERS

--help, -h
    Displays a help message for the command or subcommand.

--version
    Shows the version information of fclones.

--dry-run, -n
    Simulates the operation without making any actual changes to the file system.

--min-size <SIZE>, -s <SIZE>
    Only considers files larger than or equal to the specified size (e.g., 1K, 1M).

--max-size <SIZE>
    Only considers files smaller than or equal to the specified size (e.g., 10G).

--exclude <PATTERN>
    Excludes files or directories matching the given glob pattern.

--hidden
    Includes hidden files and directories in the scan.

--priority <PRIORITY>
    Defines which duplicates in each group are dropped first by remove, move, and link (e.g., newest, oldest, biggest, smallest).

--no-confirm, -f
    Performs actions without asking for user confirmation. Use with caution.

--soft
    Makes the link command create symbolic links instead of hard links, which are the default.

--xfstype <TYPE>
    Excludes files residing on filesystems of a specific type (e.g., tmpfs).

--one-filesystem, -x
    Prevents crossing filesystem boundaries when scanning directories.

DESCRIPTION

fclones is a highly optimized command-line utility, written in Rust, for efficiently identifying and managing duplicate files. It uses a multi-stage approach: it first compares file sizes, then hashes small file fragments, and only computes a full cryptographic hash (e.g., Blake3) for files that still match, ensuring accuracy while sharply reducing I/O and CPU work compared to simpler tools. fclones can find and group duplicates, then delete redundant copies or replace them with hard links, freeing disk space. It provides safety features such as a dry-run mode and flexible filtering options to target specific files or directories, and its performance makes it suitable for large datasets.

CAVEATS

Data Loss Risk: The remove and dedupe commands can permanently destroy file data. Always use --dry-run first, and check which files the chosen priority keeps before deleting or linking.
Hard Link Limitations: Hard links cannot span different filesystems; attempting to create one across a filesystem boundary fails with an error.
Performance on Large Scale: While highly optimized, scanning an extremely large number of files, or files on very slow storage, can still be time-consuming and I/O-intensive. Consider limiting scan depth or applying filters for very large datasets.
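The hard-link caveat can be made concrete with a short Python sketch (an illustration only, not how fclones itself is implemented; file names are made up). It replaces a duplicate with a hard link to the original and verifies that both names now share one inode; os.link raises OSError with errno EXDEV if the two paths are on different filesystems:

```python
import os
import tempfile

def replace_with_hardlink(original, duplicate):
    """Replace `duplicate` with a hard link to `original`.

    Link under a temporary name first, then swap atomically, so the
    duplicate survives intact if os.link fails (e.g. with errno EXDEV
    when the two paths are on different filesystems).
    """
    tmp = duplicate + ".hardlink.tmp"
    os.link(original, tmp)
    os.replace(tmp, duplicate)

# Demo in a throwaway directory (same filesystem by construction).
d = tempfile.mkdtemp()
original = os.path.join(d, "keep.txt")
duplicate = os.path.join(d, "dupe.txt")
for path in (original, duplicate):
    with open(path, "w") as f:
        f.write("same bytes")

replace_with_hardlink(original, duplicate)

assert os.path.samefile(original, duplicate)  # one inode, two names
print(os.stat(original).st_nlink)  # 2
```

Linking under a temporary name and swapping with os.replace is a defensive pattern: if linking fails, the duplicate file is left untouched.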

PERFORMANCE OPTIMIZATION

fclones employs a multi-stage detection process: an initial size comparison, then hashing of small file fragments, and finally a full cryptographic hash (Blake3 is a common choice thanks to its speed and collision resistance). Because most non-duplicates are eliminated by the cheap early stages, only a small fraction of the data is ever read in full, which yields large performance gains over tools that hash every file completely, especially on large datasets.
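The staged pipeline described above can be sketched in a few lines of Python (a simplified illustration, not fclones' actual implementation: the standard library's BLAKE2 stands in for Blake3, and the intermediate fragment-hashing stage is omitted):

```python
import hashlib
import os
import tempfile
from collections import defaultdict

def group_duplicates(paths):
    """Group files with identical content in two stages:
    1) bucket by file size (a cheap metadata lookup),
    2) hash only files whose size collides with another file's.
    """
    by_size = defaultdict(list)
    for path in paths:
        by_size[os.path.getsize(path)].append(path)

    by_digest = defaultdict(list)
    for candidates in by_size.values():
        if len(candidates) < 2:
            continue  # a unique size can never have a duplicate
        for path in candidates:
            with open(path, "rb") as f:
                # BLAKE2 stands in for Blake3, which is not in the stdlib.
                by_digest[hashlib.blake2b(f.read()).hexdigest()].append(path)

    return [group for group in by_digest.values() if len(group) > 1]

# Demo: two identical files plus one of equal size but different content.
d = tempfile.mkdtemp()
paths = []
for name, content in [("a.txt", b"hello"), ("b.txt", b"hello"), ("c.txt", b"vello")]:
    p = os.path.join(d, name)
    with open(p, "wb") as f:
        f.write(content)
    paths.append(p)

groups = group_duplicates(paths)
print(len(groups))  # 1: only a.txt and b.txt share content
```

Only size buckets containing at least two files are ever opened and hashed, which is where the I/O savings come from.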

MODULAR SUBCOMMANDS

fclones is structured as distinct subcommands for different operations (group, remove, move, link, dedupe), which allows granular control over its functionality. Users can first group duplicates, review the resulting report, and only then proceed with actions like removal or linking, enhancing both safety and flexibility in managing duplicate files.

HISTORY

fclones is a relatively modern utility, first released around 2020. It was created by Piotr Kołaczkowski with the primary goal of providing a faster and more efficient alternative to existing duplicate file finders, leveraging the performance of the Rust programming language and fast hashing algorithms such as Blake3. Its development focuses on speed, accuracy, and safe operation on large file collections, making it a popular choice for disk space management on modern Linux systems.

SEE ALSO

fdupes(1), jdupes(1), rdfind(1), rm(1), ln(1), find(1)
