convmv
Convert file names from one encoding to another
TLDR
Test filename encoding conversion (don't actually change the filename)
Convert filename encoding and rename the file to the new encoding
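The two tasks above can be sketched as follows. This is a minimal example under stated assumptions: the scratch directory, the file name, and the choice of ISO-8859-1 as the source encoding are all made up for illustration, and the convmv calls are skipped if the tool is not installed.

```shell
# Scratch directory holding one file whose name is encoded in ISO-8859-1
# ("café.txt", where é is the single byte 0xE9). Names are illustrative only.
workdir=$(mktemp -d)
name=$(printf 'caf\351.txt')
touch "$workdir/$name"

if command -v convmv >/dev/null 2>&1; then
    # Test the conversion: convmv only reports what it would do.
    convmv -f iso-8859-1 -t utf-8 "$workdir/$name"

    # Convert for real: --notest makes convmv rename the file.
    convmv -f iso-8859-1 -t utf-8 --notest "$workdir/$name"
fi

ls "$workdir"
```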
SYNOPSIS
convmv -f ENCODING_FROM -t ENCODING_TO [OPTIONS] [FILE_OR_DIR...]
PARAMETERS
-f ENCODING_FROM
Specifies the current character encoding of the filenames that need to be converted.
-t ENCODING_TO
Specifies the target character encoding to which filenames should be converted.
-r
Recursively converts filenames in subdirectories as well as the specified directory.
--notest
Performs the actual renaming. Without this option convmv runs in dry-run mode and only prints what it would do. Dry-run is the default, so no separate option is needed to request it.
--list
Lists all character encodings available to convmv for conversion.
--replace
If the file name that convmv wants to rename to already exists, overwrites that file, but only when both files have identical content.
-i
Prompts for confirmation before converting each individual file.
Logging
convmv does not document a dedicated log-file option. It prints each rename it performs or would perform, so redirect its output to a file with the shell to keep a record.
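As a rough illustration of how the options can be combined, the sketch below lists encodings and then previews a recursive conversion. It assumes the listing option is spelled --list, as in the upstream convmv manual, and it uses a made-up directory tree and file name. No --notest is given, so nothing is renamed.

```shell
# Hypothetical tree with one ISO-8859-1 encoded name in a subdirectory
# ("naïve.txt", where ï is the single byte 0xEF).
tree=$(mktemp -d)
mkdir -p "$tree/sub"
touch "$tree/sub/$(printf 'na\357ve.txt')"

if command -v convmv >/dev/null 2>&1; then
    # Check which ISO-8859 encoding names this installation knows about.
    convmv --list | grep -i 'iso-8859' || true

    # Preview a recursive conversion of the whole tree (dry run).
    convmv -r -f iso-8859-1 -t utf-8 "$tree"
fi
```

Adding -i to the second command would make convmv ask before each rename once --notest is also given.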
DESCRIPTION
convmv is a utility for converting filenames from one character encoding to another. It is particularly useful for files created on systems that used a different encoding (e.g., ISO-8859-1, EUC-JP) whose names must display and work correctly on a system that uses UTF-8. It converts only the names of files, never their content.
The command runs in dry-run mode by default, so users can preview the changes before applying them, which matters because a wrong rename is hard to undo. It can operate on individual files, on directories, or recursively on every filename in a directory tree, which makes it suitable for migrating large archives between systems with differing locale settings. convmv is a Perl script and relies on Perl's Encode module for the actual character set conversion.
CAVEATS
Always perform a dry-run first (the default behavior) to preview changes, as filename alterations can be irreversible.
It is highly recommended to create a backup of your data before running convmv with the --notest option.
convmv changes only filenames, not the content of the files themselves. Converting file content is a separate task for a tool such as iconv.
Filenames with severely corrupted or completely unknown original encodings might not be fully recoverable.
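One way to follow the backup advice above is to archive the directory before renaming anything. This is only a sketch: the directory, file name, and archive location are hypothetical, and tar is just one of several reasonable backup tools.

```shell
# Hypothetical directory with one ISO-8859-1 encoded name ("résumé.txt").
target=$(mktemp -d)
touch "$target/$(printf 'r\351sum\351.txt')"

# Archive the directory first; tar stores the original name bytes as they are.
backup="$(mktemp -d)/names-backup.tar"
tar -cf "$backup" -C "$target" .

if command -v convmv >/dev/null 2>&1; then
    # Only after the archive exists, apply the conversion for real.
    convmv -r -f iso-8859-1 -t utf-8 --notest "$target"
fi

# If the result is wrong, the original names can be restored from the archive.
tar -tf "$backup"
```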
DEFAULT DRY-RUN MODE
By default, convmv operates in a 'dry-run' mode, meaning it will only show you which filenames would be converted without making any actual changes. This is a crucial safety feature to verify the intended outcome before committing changes. To apply the changes, you must explicitly use the --notest option.
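The preview-then-apply workflow described here might look like the following. It is a sketch with made-up names: the directory, the file, and the preview file path are all assumptions, and the source encoding is taken to be ISO-8859-1.

```shell
# Hypothetical directory with one ISO-8859-1 encoded name
# ("München.txt", where ü is the single byte 0xFC).
dir=$(mktemp -d)
touch "$dir/$(printf 'M\374nchen.txt')"
preview="$dir.preview.txt"

if command -v convmv >/dev/null 2>&1; then
    # Step 1: default dry run. Save the report so it can be reviewed.
    convmv -r -f iso-8859-1 -t utf-8 "$dir" >"$preview" 2>&1
    cat "$preview"

    # Step 2: once the report looks right, repeat the command with --notest.
    convmv -r -f iso-8859-1 -t utf-8 --notest "$dir"
else
    echo "convmv not found; nothing was renamed" >"$preview"
fi
```

Keeping the saved preview also gives a simple record of which names were changed.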
ENCODING DETECTION AND SPECIFICITY
convmv does not guess the source encoding. Its automatic checking is limited: when converting to UTF-8, it detects filenames that are already valid UTF-8 and skips them, which prevents double encoding (the --nosmart option disables this check). You must therefore specify both the source (-f) and target (-t) encodings yourself, and a wrong source encoding produces wrong names without any error.
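Because the source encoding has to be chosen by hand, it can help to inspect a problem name before running convmv. The sketch below uses iconv rather than convmv and a hypothetical file name. It first checks whether the name is valid UTF-8, then decodes the same bytes under two candidate encodings to show that the choice changes the result.

```shell
# Hypothetical file name: "prix_10", then the single byte 0xA4, then ".txt".
name=$(printf 'prix_10\244.txt')

# A name that iconv accepts as UTF-8 input is already valid UTF-8.
if printf '%s' "$name" | iconv -f utf-8 -t utf-8 >/dev/null 2>&1; then
    verdict="valid UTF-8"
else
    verdict="not valid UTF-8"
fi
echo "$verdict"

# Decode the same bytes under two candidate source encodings.
# 0xA4 is the currency sign in ISO-8859-1 but the euro sign in ISO-8859-15.
for enc in iso-8859-1 iso-8859-15; do
    printf '%s -> ' "$enc"
    printf '%s\n' "$name" | iconv -f "$enc" -t utf-8 || echo "(iconv cannot decode as $enc)"
done
```

Whichever candidate yields the expected name is the value to pass to -f.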
HISTORY
convmv was developed to address character encoding mismatches in filenames, a common problem when files move between operating systems or locales. As UTF-8 became the predominant encoding in modern Linux distributions, tools like convmv became essential for repairing garbled or incorrectly displayed filenames that originated on systems using older encodings such as ISO-8859-1 or the various Windows code pages.