aspell-import
Convert personal word lists to the Aspell format
SYNOPSIS
aspell-import [OPTIONS] [FILE...]
PARAMETERS
-l, --lang=<lang>
Specifies the language of the word list, which is essential for proper character set handling and linguistic rules.
--encoding=<encoding>
Sets the character encoding of the input FILE(s) (e.g., UTF-8, ISO-8859-1). Defaults to the system's locale encoding.
--normalize
Performs Unicode normalization on the words, useful for handling characters with combining diacritics.
--ignore-case
Processes words without regard to their casing, converting them to a common case (usually lowercase) before sorting and uniqueness checks.
--rem-accents
Removes diacritical marks (accents) from words during processing.
--clean
Cleans the word list by removing invalid characters or applying other cleaning rules based on the language.
--allow-dash
Allows hyphens to be considered valid characters within words, preventing words with hyphens from being split or discarded.
--allow-apostrophe
Allows apostrophes to be considered valid characters within words.
--extra-chars=<chars>
Defines additional characters that should be considered valid within words, beyond the default alphanumeric set.
--rem-all-extra-chars
Removes all characters from words that are not alphanumeric or spaces, effectively stripping punctuation and symbols.
--master=<name>
Specifies the master dictionary name to use for context or word processing rules, affecting how words are normalized or considered.
--personal=<name>
Specifies the personal dictionary name to use for context. aspell-import itself writes to standard output, so its result can be piped to other Aspell tools.
DESCRIPTION
The aspell-import command is a utility provided with the GNU Aspell spell checker. Its primary purpose is to convert plain text word lists (typically one word per line) into the sorted, duplicate-free form used by Aspell dictionaries. This is useful when creating or updating personal dictionaries, specialized vocabularies, or custom language packs. It handles encoding, case folding, accent removal, and character normalization so that the imported words are recognized by Aspell's spell-checking engine. The command reads from standard input by default, or from the specified files, and writes the processed word list to standard output.
CAVEATS
aspell-import expects a plain text word list, typically with one word per line. Using incorrect character encoding for the input file can lead to corrupted or improperly processed words. The effectiveness of options like --normalize or --rem-accents depends on the underlying Aspell library's support for the specified language and its character set.
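The two input caveats above (encoding and one word per line) can be checked before importing. The following is a minimal sketch using only iconv and awk; check_wordlist is a hypothetical helper name, not part of Aspell, and it assumes the input is meant to be UTF-8.

```shell
#!/bin/sh
# check_wordlist is a hypothetical helper, not part of Aspell.
# Succeeds only if FILE is valid UTF-8 and every line holds exactly one word.
check_wordlist() {
    file=$1
    # iconv exits non-zero when it meets a byte sequence that is not valid UTF-8.
    if ! iconv -f UTF-8 -t UTF-8 "$file" >/dev/null 2>&1; then
        echo "$file: not valid UTF-8" >&2
        return 1
    fi
    # awk splits on whitespace; NF != 1 flags empty and multi-word lines.
    if ! awk 'NF != 1 { bad = 1 } END { exit bad + 0 }' "$file"; then
        echo "$file: expected exactly one word per line" >&2
        return 1
    fi
}
```

For a list in another encoding, the same idea applies with a different -f argument to iconv.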
USAGE WITH DICTIONARY CREATION
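The output of aspell-import is often piped to other utilities such as word-list-compress to produce a compressed word list (.cwl by convention). word-list-compress requires a mode argument: c to compress or d to decompress. The compiled .rws dictionaries that Aspell loads are built separately with aspell create master. For example:
cat my_words.txt | aspell-import --lang=en --encoding=UTF-8 | word-list-compress c > my_custom_dict.cwl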
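A pipeline like the one above leaves an empty or partial output file if one of the tools is not installed. A small guard, sketched here with a hypothetical require_tools helper (not part of Aspell), can confirm that both commands are on PATH before anything is written.

```shell
#!/bin/sh
# require_tools is a hypothetical helper, not part of Aspell.
# Returns 0 only if every named command is found on PATH.
require_tools() {
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            echo "missing required tool: $tool" >&2
            return 1
        fi
    done
}

# Example guard, placed before the pipeline shown in this section:
# require_tools aspell-import word-list-compress || exit 1
```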
DEFAULT BEHAVIOR
By default, aspell-import sorts the words, removes duplicates, and converts them to lowercase. It also tries to remove or otherwise handle characters that are invalid under the language's rules, unless told otherwise by options such as --dont-clean (the negated form of --clean) or --extra-chars.
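The sort, de-duplicate, and lowercase steps described above can be approximated with standard tools, which is handy for previewing what a list will look like. This is only a sketch: normalize_wordlist is a hypothetical name, it is not how aspell-import is implemented, it folds case for ASCII letters only, and it does not reproduce any language-specific cleaning.

```shell
#!/bin/sh
# normalize_wordlist is a hypothetical helper that mimics the defaults
# described above; it is not aspell-import's implementation.
normalize_wordlist() {
    # LC_ALL=C keeps the result byte-wise deterministic, so only ASCII
    # letters are lowercased here.
    LC_ALL=C tr '[:upper:]' '[:lower:]' | LC_ALL=C sort -u
}
```

It reads from standard input and writes to standard output, for example: normalize_wordlist < my_words.txt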
HISTORY
aspell-import is a core utility that ships with GNU Aspell, a free and open-source spell checker designed to replace Ispell. Aspell's development began in the late 1990s with the goal of improving upon Ispell's functionality, particularly in handling different character sets and intelligent suggestion generation. aspell-import was created to facilitate the integration of custom word lists into Aspell's efficient dictionary format, a fundamental requirement for users wishing to extend or customize their spell-checking capabilities.
SEE ALSO
aspell(1), word-list-compress(1)