czkawka_cli
Find and remove duplicate files
TLDR
List duplicates in specific directories and write the results to a file
Find duplicate files in specific directories and delete them (default: NONE)
Find similar-looking images with a specific similarity level (default: High)
Display help
SYNOPSIS
czkawka_cli [GLOBAL_OPTIONS] COMMAND [COMMAND_OPTIONS]
czkawka_cli {duplicates|empty_folders|big_files|empty_files|temporary_files|broken_files|invalid_images|similar_images|similar_videos|music_files|symlinks} [OPTIONS]
PARAMETERS
-h, --help
Displays help information for the command or specific subcommand.
-V, --version
Prints the version information of Czkawka.
-p, --path
Adds a directory path to scan. Can be specified multiple times.
-e, --excluded_paths
Adds a directory path to exclude from scanning. Can be specified multiple times.
-f, --file_to_save
Saves the results of the scan to the specified file.
-s, --allowed_extensions
Restricts the search to files with the given extensions (e.g., `jpg,png`).
-x, --excluded_extensions
Excludes files with the specified extensions.
-a, --minimal_file_size
Ignores files smaller than the given size (e.g., `1M`, `500k`).
-z, --maximal_file_size
Ignores files larger than the given size.
--dry_run
Performs a simulated run without making any actual changes (e.g., deletion, moving).
--delete_group
Deletes files from a specific result group (e.g., `1` for the first group).
--move_group
Moves files from a specific group to a destination directory.
--hardlink_group
Creates hardlinks for files in a group, keeping only the file at the specified index.
--delete_out_of_path
Allows deleting files even if they are outside the initial search paths (use with caution).
--print_text
Prints results in a human-readable text format to stdout.
--print_json
Prints results in JSON format to stdout.
--csv_output
Saves results to the specified file in CSV format.
--json_output
Saves results to the specified file in JSON format.
--check_for_symlinks, --follow_symlinks
Follows symbolic links when scanning directories.
--skip_hidden_files, --exclude_hidden_dirs
Skips hidden files and directories during the scan.
--same_file_name
Only considers files with the exact same name for comparison (e.g., duplicates).
--same_extension
Only considers files with the exact same extension for comparison.
--exclude_file_names
Excludes files whose names match the given pattern.
--exclude_directory_names
Excludes directories whose names match the given pattern.
--not_recursive
Disables recursive scanning; only checks the top-level directories.
--allow_empty_folders
When searching for empty folders, allows them to contain only other empty folders.
--hash_type
(Duplicates only) Specifies the hashing algorithm for file comparison (`BLAKE3`, `CRC32`, `XXH3`).
--search_method
(Duplicates only) Defines how duplicates are matched: by `NAME`, by `SIZE`, or by `HASH` (file content).
--delete_all_except_first
(Duplicates only) Deletes all duplicate files except for the first one found in each group.
--delete_all_except_oldest
(Duplicates only) Deletes all duplicate files except for the oldest one in each group.
--delete_all_except_newest
(Duplicates only) Deletes all duplicate files except for the newest one in each group.
--delete_all_except_shortest_name
(Duplicates only) Deletes all duplicate files except for the one with the shortest name in each group.
--delete_all_except_longest_name
(Duplicates only) Deletes all duplicate files except for the one with the longest name in each group.
--hardlink_all_except_first
(Duplicates only) Creates hardlinks for all duplicates to the first file in each group.
--number_of_files
(Big Files only) Specifies the maximum number of largest files to list.
--hash_size
(Similar Images only) Sets the size of the hash used for image comparison (e.g., `8`, `16`, `32`).
--minimal_comparison
(Similar Images only) Sets the minimum similarity percentage for images to be considered similar (e.g., `75`).
--compare_tags
(Music Files only) Compares music files based on their ID3 tags (e.g., artist, album, title).
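
The "delete all except ..." options differ only in which file of a duplicate group is kept. The following Python sketch illustrates that selection logic in isolation. It is not taken from Czkawka's source: the `FileEntry` record, the strategy names, and the sample paths are hypothetical and exist only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FileEntry:
    """Hypothetical record for one file in a duplicate group."""
    path: str
    modified: float  # modification time as a Unix timestamp


def file_name(entry: FileEntry) -> str:
    """Return the last path component, e.g. 'report.txt'."""
    return entry.path.rsplit("/", 1)[-1]


# Each strategy picks the single file to KEEP from a group.
KEEP_STRATEGIES = {
    "first": lambda group: group[0],
    "oldest": lambda group: min(group, key=lambda e: e.modified),
    "newest": lambda group: max(group, key=lambda e: e.modified),
    "shortest_name": lambda group: min(group, key=lambda e: len(file_name(e))),
    "longest_name": lambda group: max(group, key=lambda e: len(file_name(e))),
}


def files_to_delete(group, strategy):
    """Return every entry in the group except the one the strategy keeps."""
    keeper = KEEP_STRATEGIES[strategy](group)
    return [entry for entry in group if entry is not keeper]


# Illustrative group of three identical files (paths and times are made up).
group = [
    FileEntry("/data/a/report_final.txt", modified=300.0),
    FileEntry("/data/b/report.txt", modified=100.0),
    FileEntry("/data/c/report_v2.txt", modified=200.0),
]

for name in KEEP_STRATEGIES:
    print(name, [entry.path for entry in files_to_delete(group, name)])
```

Whatever the strategy, exactly one file per group survives, which is why reviewing the groups before deleting matters more than the choice of strategy.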
DESCRIPTION
czkawka_cli is the command-line interface for Czkawka, a powerful, fast, and free disk cleaner written in Rust. It efficiently helps users reclaim disk space by identifying various types of redundant or unwanted files. Its capabilities include finding duplicate files based on content or name, locating empty folders, identifying large files that consume significant space, detecting empty files, and cleaning up temporary files. Beyond basic cleanup, it can also find broken symlinks, invalid images, and even visually similar images or duplicate music files based on tags.
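
To make "duplicate files based on content" concrete, here is a minimal Python sketch of the usual two-stage approach: group files by size first, then hash only the files whose size collides. It is an illustration under stated assumptions, not Czkawka's implementation. SHA-256 from the standard library stands in for whichever hash the tool uses, and the function names are hypothetical.

```python
import hashlib
from collections import defaultdict
from pathlib import Path


def hash_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large files are never fully loaded into memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_duplicates(root: Path, min_size: int = 1):
    """Return groups (lists of paths) of files under root with identical content."""
    # Stage 1: bucket by size, which needs only file metadata.
    by_size = defaultdict(list)
    for path in root.rglob("*"):
        if path.is_file() and not path.is_symlink():
            size = path.stat().st_size
            if size >= min_size:
                by_size[size].append(path)

    # Stage 2: hash only files that share a size with at least one other file.
    groups = []
    for same_size in by_size.values():
        if len(same_size) < 2:
            continue  # a unique size cannot have a duplicate
        by_hash = defaultdict(list)
        for path in same_size:
            by_hash[hash_file(path)].append(path)
        groups.extend(sorted(g) for g in by_hash.values() if len(g) > 1)
    return groups
```

Calling `find_duplicates(Path("some/directory"))` returns one list per set of identical files. The size pre-filter is the main reason such scans stay fast: most files are ruled out without being read.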
The tool is highly configurable, allowing users to specify search paths, exclude directories or file extensions, and set file size limits. It offers several output formats, including plain text, CSV, and JSON, making it suitable for scripting and integration into automated workflows. For actions such as deleting or moving files, the `--dry_run` option simulates the operation first, so users can review the planned changes before anything on the filesystem is modified.
CAVEATS
Using deletion or modification commands (`--delete_group`, `--move_group`, `--hardlink_group`) without first performing a `--dry_run` or careful review of the output can lead to unintended data loss. The `--delete_out_of_path` option is particularly dangerous as it overrides safety measures preventing deletion of files outside specified search directories. Performance can vary significantly depending on the number of files, disk speed, and chosen scan methods (e.g., hash type for duplicates).
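
The two safeguards described here, a dry run and a refusal to touch files outside the search paths, can be sketched in a few lines of Python. This is an illustrative model of the behaviour, not code from Czkawka; `delete_files`, `is_inside`, and their parameters are hypothetical names.

```python
from pathlib import Path


def is_inside(path: Path, roots) -> bool:
    """True if path resolves to a location within one of the root directories."""
    resolved = path.resolve()
    for root in roots:
        root_resolved = Path(root).resolve()
        if resolved == root_resolved or root_resolved in resolved.parents:
            return True
    return False


def delete_files(paths, roots, dry_run=True, allow_outside=False):
    """Delete files and return (acted_on, skipped).

    With dry_run=True (the default) nothing is removed; acted_on lists
    what WOULD be deleted. Files outside the roots are skipped unless
    allow_outside is set.
    """
    acted_on, skipped = [], []
    for path in map(Path, paths):
        if not allow_outside and not is_inside(path, roots):
            skipped.append(path)
            continue
        if not dry_run:
            path.unlink()
        acted_on.append(path)
    return acted_on, skipped
```

Making the dry run the default, as in this sketch, means a destructive call has to be requested explicitly with `dry_run=False`, and deleting outside the roots requires a second deliberate opt-in.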
SCAN MODES (COMMANDS)
Czkawka CLI operates through various subcommands, each dedicated to a specific type of file cleanup or analysis:
duplicates: Finds files with identical content.
empty_folders: Locates directories that contain no files or only other empty directories.
big_files: Lists the largest files consuming significant disk space.
empty_files: Identifies files with zero bytes.
temporary_files: Searches for files typically created as temporary by applications.
broken_files: Detects corrupted files that can no longer be opened, such as damaged archives, PDFs, or audio files.
invalid_images: Finds images that are corrupted or cannot be properly opened.
similar_images: Discovers images that are visually similar, even if they are not exact duplicates.
similar_videos: Identifies videos that are visually similar.
music_files: Finds duplicate music files, optionally comparing based on ID3 tags.
symlinks: Finds invalid symbolic links, such as links whose target no longer exists.
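
Similar-image detection generally relies on perceptual hashing rather than byte-for-byte comparison. The Python sketch below shows the simplest variant, an average hash compared by Hamming distance, on tiny hand-written pixel grids. It illustrates the general technique only and makes no claim about the specific algorithm Czkawka uses; all function names and sample data are hypothetical.

```python
def average_hash(pixels):
    """Return a tuple of bits: 1 where a pixel is at or above the grid's mean brightness."""
    flat = [value for row in pixels for value in row]
    mean = sum(flat) / len(flat)
    return tuple(1 if value >= mean else 0 for value in flat)


def hamming_distance(hash_a, hash_b):
    """Count the bit positions where the two hashes differ."""
    return sum(a != b for a, b in zip(hash_a, hash_b))


def similarity_percent(hash_a, hash_b):
    """Express similarity as the percentage of matching bits."""
    return 100.0 * (1 - hamming_distance(hash_a, hash_b) / len(hash_a))


# Tiny 4x4 grayscale "images" (values 0-255). A real tool would first
# downscale and desaturate each photo to a small grid like this.
original = [
    [200, 200, 50, 50],
    [200, 200, 50, 50],
    [50, 50, 200, 200],
    [50, 50, 200, 200],
]
brighter = [[min(255, v + 20) for v in row] for row in original]  # same picture, brighter
inverted = [[255 - v for v in row] for row in original]           # opposite pattern

print(similarity_percent(average_hash(original), average_hash(brighter)))  # 100.0
print(similarity_percent(average_hash(original), average_hash(inverted)))  # 0.0
```

Because the hash records only whether each cell is lighter or darker than average, a uniformly brightened copy still matches fully, while a byte-level comparison would report the two files as different. A larger hash grid captures more detail at the cost of more computation.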
OUTPUT FORMATS
The command supports multiple output formats beyond just printing to the console. Results can be saved to a file in plain text, CSV, or JSON format, which is particularly useful for integration with other scripts or data analysis tools. The `--print_json` option outputs directly to stdout, facilitating pipeline integration.
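
As an illustration of pipeline integration, the Python sketch below consumes JSON results and computes how much space could be reclaimed. The JSON layout is an assumption made for this example (a list of groups, each a list of objects with `path` and `size` keys); the real schema may differ between versions and should be checked against actual output before use.

```python
import json

# HYPOTHETICAL sample: a list of duplicate groups, each a list of
# {"path", "size"} records. The tool's real JSON schema may differ.
SAMPLE = """
[
  [{"path": "/data/a.bin", "size": 2048},
   {"path": "/backup/a.bin", "size": 2048}],
  [{"path": "/data/b.txt", "size": 10},
   {"path": "/tmp/b.txt", "size": 10},
   {"path": "/old/b.txt", "size": 10}]
]
"""


def reclaimable_bytes(groups):
    """Bytes freed if the first file of each group is kept and the rest removed."""
    return sum(sum(entry["size"] for entry in group[1:]) for group in groups)


groups = json.loads(SAMPLE)
print(len(groups), "groups,", reclaimable_bytes(groups), "bytes reclaimable")
```

In a real pipeline, `SAMPLE` would be replaced by `sys.stdin.read()` so the script can sit at the end of a shell pipe.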
HISTORY
Czkawka (Polish for "hiccup") was created by Rafał Mikrut as an open-source project to provide a modern, fast, and cross-platform alternative to older disk cleaning tools, particularly those for Linux. Written in Rust, it emphasizes performance and memory safety. The czkawka_cli component offers a command-line interface suited to scripting and server environments, while czkawka_gui provides a graphical interface. Its development has covered a wide range of cleanup tasks, including advanced features such as similar image and video detection, which are often missing in simpler tools.


