LinuxCommandLibrary

aspell-import

Convert personal word list formats to Aspell

SYNOPSIS

aspell-import [OPTIONS] [FILE...]

PARAMETERS

-l, --lang=<lang>
    Specifies the language of the word list, which is essential for proper character set handling and linguistic rules.

--encoding=<encoding>
    Sets the character encoding of the input FILE(s) (e.g., UTF-8, ISO-8859-1). Defaults to the system's locale encoding.

--normalize
    Performs Unicode normalization on the words, useful for handling characters with combining diacritics.

--ignore-case
    Processes words without regard to their casing, converting them to a common case (usually lowercase) before sorting and uniqueness checks.

--rem-accents
    Removes diacritical marks (accents) from words during processing.

--clean
    Cleans the word list by removing invalid characters or applying other cleaning rules based on the language.

--allow-dash
    Allows hyphens to be considered valid characters within words, preventing words with hyphens from being split or discarded.

--allow-apostrophe
    Allows apostrophes to be considered valid characters within words.

--extra-chars=<chars>
    Defines additional characters that should be considered valid within words, beyond the default alphanumeric set.

--rem-all-extra-chars
    Removes all characters from words that are not alphanumeric or spaces, effectively stripping punctuation and symbols.

--master=<name>
    Specifies the master dictionary name to use for context or word processing rules, affecting how words are normalized or considered.

--personal=<name>
    Specifies the personal dictionary name to use for context. aspell-import itself writes its result to standard output, so the output can be piped to other Aspell tools.

DESCRIPTION

The aspell-import command is a utility provided with the GNU Aspell spell checker. It converts plain text word lists (typically one word per line) into the sorted, duplicate-free form that Aspell's dictionary tools expect. This is useful when creating or updating personal dictionaries, specialized vocabularies, or custom language packs. Along the way it handles encoding, case folding, accent removal, and character normalization so that the imported words are recognized by Aspell's spell-checking engine. The command reads from standard input by default, or from the specified files, and writes the processed word list to standard output.

CAVEATS

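aspell-import expects a plain text word list, typically with one word per line. Declaring the wrong character encoding for the input file can corrupt words or cause them to be processed incorrectly. The effectiveness of options like --normalize or --rem-accents depends on the underlying Aspell library's support for the specified language and its character set. Option availability also varies between Aspell releases: the upstream manual page documents aspell-import as a script that imports old Ispell and Aspell personal dictionaries, so check man aspell-import on your system before relying on the options listed here.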
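
The encoding caveat can be guarded against before any import step. The following is a minimal sketch using the standard iconv tool, not part of aspell-import; the helper name to_utf8 and the sample word are hypothetical. It converts a word list from a known source encoding to UTF-8, and iconv exits with a non-zero status when the input is not valid in the declared encoding.

```shell
# Hypothetical helper (not an Aspell command): convert a word list on stdin
# from a known source encoding to UTF-8 on stdout.
# $1 = source encoding name, e.g. ISO-8859-1
to_utf8() {
    iconv -f "$1" -t UTF-8
}

# Sample input: "cafe" with an accented e, encoded as ISO-8859-1
# (octal 351 is byte 0xE9). The output is the same word in UTF-8.
printf 'caf\351\n' | to_utf8 ISO-8859-1
```

Converting everything to a single encoding up front means only one --encoding value ever has to be passed downstream.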

USAGE WITH DICTIONARY CREATION

The output of aspell-import is often piped to other Aspell utilities. Note that word-list-compress requires a mode argument (c to compress, d to decompress) and produces a compressed word list, not a compiled dictionary; compiled .rws dictionaries are built separately with aspell create master. For example:
aspell-import --lang=en --encoding=UTF-8 < my_words.txt | word-list-compress c > my_words.cwl

DEFAULT BEHAVIOR

By default, aspell-import sorts the words, removes duplicates, and converts them to lowercase. It also removes or otherwise handles characters that are invalid under the language's rules, unless options such as --extra-chars, --allow-dash, or --allow-apostrophe widen the set of accepted characters.
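
The sort, deduplicate, and lowercase steps described here can be approximated with standard tools, which is handy for previewing what a word list will look like after processing. This is only a sketch under stated assumptions: the function name normalize_words is hypothetical, case folding is ASCII-only, and no language-specific cleaning is attempted.

```shell
# Hypothetical helper (not an Aspell command): lowercase, sort, and
# deduplicate a one-word-per-line list read from stdin.
# LC_ALL=C makes the case folding ASCII-only and the sort order byte-wise,
# so the result is reproducible across systems.
normalize_words() {
    LC_ALL=C tr '[:upper:]' '[:lower:]' | LC_ALL=C sort -u
}

# Four input lines collapse to two unique lowercase words.
printf 'Zebra\napple\nApple\nzebra\n' | normalize_words
```

Lowercasing before sort -u matters: doing it afterwards would leave "Apple" and "apple" as separate entries until a second deduplication pass.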

HISTORY

aspell-import is a core utility that ships with GNU Aspell, a free and open-source spell checker designed to replace Ispell. Aspell's development began in the late 1990s with the goal of improving upon Ispell's functionality, particularly in handling different character sets and intelligent suggestion generation. aspell-import was created to facilitate the integration of custom word lists into Aspell's efficient dictionary format, a fundamental requirement for users wishing to extend or customize their spell-checking capabilities.

SEE ALSO
