hunspell
Spell check files or text
TLDR
Check the spelling of a file
hunspell path/to/file
Check the spelling of a file with the en_US dictionary
hunspell -d en_US path/to/file
List misspelled words in a file
hunspell -l path/to/file
SYNOPSIS
hunspell [options] [-d dictionary[,dictionary2,...]] [file ...]
hunspell -d dictionary_name [options] < text_file
PARAMETERS
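-d dictionary[,dictionary2,...]
Specifies the dictionary to use, either by name (e.g., en_US) or by path without the .aff/.dic extension. Several dictionaries can be given as a comma-separated list and are used together.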
-i encoding
Sets the input text encoding (e.g., UTF-8, ISO-8859-1).
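-a
Ispell-compatible pipe mode, intended for use by other programs. hunspell prints a version banner, then reads input lines and replies with one line per word (* for a correct word, & followed by suggestions for a misspelled word, # when no suggestion is found), ending the reply to each input line with a blank line.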
-m
Morphological analysis mode. Prints the morphological description of each input word, as encoded in the dictionary.
-l
Lists only the misspelled words, one per line. Useful for piping output.
-s
Stemming mode. Prints the stem (root form) of each input word.
-D
Displays the dictionary search path, the path of the detected dictionary, and the list of available dictionaries.
-p file
Specifies a personal dictionary file whose words are accepted in addition to the main dictionary.
-w
Prints the misspelled words from input that contains one word per line.
-v, --version
Displays the version information of hunspell.
-h, --help
Shows a brief help message with available command-line options.
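hunspell's -a option provides the Ispell-compatible pipe interface that editors and other programs drive, so callers have to parse its replies. The sketch below is a minimal Python parser for those reply lines, assuming the conventional Ispell codes: * (correct), + ROOT (accepted through affix rules), - (accepted as a compound), & word count offset: suggestions, ? (guesses, same layout as &), and # word offset (no suggestions). PipeResult and parse_pipe_line are hypothetical names, and the parser does not start or talk to hunspell itself.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class PipeResult:
    kind: str                   # "ok", "affix", "compound", "miss" or "none"
    word: Optional[str] = None  # misspelled word, or the root for "affix" lines
    offset: Optional[int] = None
    suggestions: List[str] = field(default_factory=list)


def parse_pipe_line(line: str) -> Optional[PipeResult]:
    """Parse one reply line of an Ispell-style pipe; None for banner/blank lines."""
    line = line.rstrip("\n")
    if not line or line.startswith("@(#)"):
        return None  # blank line ends a reply; "@(#)" is the startup banner
    tag = line[0]
    if tag == "*":
        return PipeResult("ok")
    if tag == "+":
        return PipeResult("affix", word=line[1:].strip() or None)
    if tag == "-":
        return PipeResult("compound")
    if tag in "&?":
        # "& word count offset: sugg1, sugg2, ..."
        head, _, tail = line.partition(":")
        _, word, _count, offset = head.split()
        suggestions = [s.strip() for s in tail.split(",") if s.strip()]
        return PipeResult("miss", word=word, offset=int(offset), suggestions=suggestions)
    if tag == "#":
        # "# word offset": misspelled, no suggestions available
        _, word, offset = line.split()
        return PipeResult("none", word=word, offset=int(offset))
    raise ValueError(f"unrecognised pipe-mode line: {line!r}")
```

A caller would write text lines to the standard input of hunspell -a -d en_US, read replies until the blank line that closes each input line, and pass every reply line to parse_pipe_line. The actual suggestions depend on the dictionary in use.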
DESCRIPTION
hunspell is a robust command-line spell checker and morphological analyzer widely used in many open-source projects, including LibreOffice, Firefox, and Chromium. It supports a vast number of languages through its flexible dictionary format, which consists of an affix file (.aff) and a dictionary file (.dic).
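To make the .dic half of the format concrete, here is a minimal Python reader. It assumes the default flag format, in which each character after the slash is one affix flag (affix files can switch to long or numeric flags with the FLAG directive, and words may contain escaped slashes; this sketch handles neither), and it discards any morphological fields that follow the word. The function name parse_dic is illustrative.

```python
from typing import Dict, Set


def parse_dic(text: str) -> Dict[str, Set[str]]:
    """Map each word in a .dic file to its set of single-character affix flags."""
    entries: Dict[str, Set[str]] = {}
    for line in text.splitlines()[1:]:  # first line: approximate entry count
        fields = line.split()
        if not fields:
            continue
        # fields[0] is "word" or "word/FLAGS"; later fields (e.g. "po:verb")
        # are morphological data, ignored here.
        word, _, flags = fields[0].partition("/")
        entries.setdefault(word, set()).update(flags)
    return entries


sample = "3\nhello\nwalk/DGS\nwork/S\tpo:verb\n"
print(sorted(parse_dic(sample)["walk"]))  # ['D', 'G', 'S']
```

The flags only mean something together with the matching SFX/PFX rules in the .aff file, which is why a dictionary always ships as a pair of files.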
The command can operate in various modes: checking for misspellings, suggesting corrections, performing stemming (reducing words to their root form), and providing morphological analysis (breaking down words into their components and grammatical features). It can process input from a specified file or standard input, making it highly versatile for integration into scripts and larger text processing workflows.
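hunspell checks words in two stages. First, it accepts a word if it is listed in the dictionary or can be derived from a listed stem through the affix rules (and, for languages that need it, the compounding rules). Only if that fails does it generate suggestions, by trying character-level edits and the replacement tables defined in the affix file and by ranking similar dictionary words with n-gram similarity. Because inflected forms are produced by rules instead of being listed one by one, this approach is particularly effective for languages with complex morphology.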
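The two stages can be illustrated with a deliberately simplified Python model. The SUFFIXES table is a hypothetical stand-in for SFX rules in an .aff file (a stripped string and an added string, with no conditions, prefixes or compounding), and the suggestion stage only tries single-character deletions, transpositions, replacements and insertions; real hunspell also consults tables such as REP, MAP, KEY and TRY and ranks candidates by n-gram similarity.

```python
import string
from typing import Dict, List, Set, Tuple

# Hypothetical suffix rules, loosely modelled on .aff SFX entries:
# flag -> list of (strip, add). Conditions, prefixes and compounding are omitted.
SUFFIXES: Dict[str, List[Tuple[str, str]]] = {
    "S": [("", "s")],
    "D": [("", "ed"), ("e", "ed")],
    "G": [("", "ing"), ("e", "ing")],
}


def is_correct(word: str, dic: Dict[str, Set[str]]) -> bool:
    """Stage 1: accept listed words and words derivable from a flagged stem."""
    if word in dic:
        return True
    for flag, rules in SUFFIXES.items():
        for strip, add in rules:
            if word.endswith(add):
                # Undo the rule: remove the added ending, restore the stripped one.
                stem = word[: len(word) - len(add)] + strip
                if flag in dic.get(stem, set()):
                    return True
    return False


def single_edits(word: str) -> Set[str]:
    """All strings one deletion, transposition, replacement or insertion away."""
    letters = string.ascii_lowercase
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = {a + b[1:] for a, b in splits if b}
    swaps = {a + b[1] + b[0] + b[2:] for a, b in splits if len(b) > 1}
    replaces = {a + c + b[1:] for a, b in splits if b for c in letters}
    inserts = {a + c + b for a, b in splits for c in letters}
    return deletes | swaps | replaces | inserts


def suggest(word: str, dic: Dict[str, Set[str]]) -> List[str]:
    """Stage 2 runs only when stage 1 rejects the word."""
    if is_correct(word, dic):
        return []
    return sorted(w for w in single_edits(word) if is_correct(w, dic))


dic = {"walk": {"D", "G", "S"}, "bake": {"D", "G"}, "hello": set()}
print(is_correct("baking", dic))  # True (bake + G rule)
print(suggest("walkd", dic))      # ['walk', 'walked', 'walks']
```

With a single stem entry such as walk/DGS, the forms walks, walked and walking are accepted without being listed, which is the economy affix rules buy for heavily inflected languages.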
CAVEATS
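hunspell needs both the .aff and the .dic file of a dictionary and searches for them in a set of system-wide and user-specific directories, which can be extended with the DICPATH environment variable; hunspell -D shows the search path in effect. If the requested dictionary cannot be found, hunspell reports an error instead of checking the text.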
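The lookup can be sketched as follows. This Python code assumes that a dictionary name resolves to the first directory containing both NAME.aff and NAME.dic, that a name with a directory component is used as a path prefix as given, and that DICPATH holds directories separated like PATH. It does not reproduce hunspell's exact built-in search order (hunspell -D shows the real one), and /usr/share/hunspell is only an example of a common distribution location.

```python
import os
from pathlib import Path
from typing import Iterable, List, Optional, Tuple


def dicpath_dirs() -> List[Path]:
    """Directories listed in DICPATH (assumed to be separated like PATH)."""
    return [Path(p) for p in os.environ.get("DICPATH", "").split(os.pathsep) if p]


def find_dictionary(name: str, dirs: Iterable[Path]) -> Optional[Tuple[Path, Path]]:
    """Return (aff, dic) for the first candidate that has both files, else None."""
    given = Path(name)
    if given.parent != Path("."):
        candidates = [given]  # e.g. /opt/dicts/en_US: use the prefix as given
    else:
        candidates = [Path(d) / name for d in dirs]
    for prefix in candidates:
        aff = Path(str(prefix) + ".aff")
        dic = Path(str(prefix) + ".dic")
        if aff.is_file() and dic.is_file():  # a lone .dic is not enough
            return aff, dic
    return None


# Hypothetical usage: DICPATH entries first, then an example system directory.
search = dicpath_dirs() + [Path("/usr/share/hunspell")]
print(find_dictionary("en_US", search))
```

Requiring both files mirrors the caveat above: a directory that only holds the .dic half of a dictionary is skipped rather than half-loaded.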
While hunspell supports many encodings, mismatches between input file encoding and the specified encoding can lead to incorrect spell checking or garbled output.
Performance can vary significantly depending on the dictionary size, the complexity of affix rules, and the size of the input text.
HISTORY
hunspell is the successor to MySpell, the spell checker originally developed for OpenOffice.org. It was created by László Németh to handle languages with rich, agglutinative morphology (where words are formed by combining multiple morphemes), starting with Hungarian, from which it takes its name. Its affix and compounding support, together with its open-source licensing, led to its adoption as the default spell checker in many prominent open-source projects, including the Mozilla Firefox browser and the LibreOffice office suite.