LinuxCommandLibrary

codespell

Find and fix spelling errors in code

TLDR

Check for typos in all text files in the current directory, recursively

$ codespell

Correct all typos found in-place
$ codespell --write-changes

Skip files with names that match the specified pattern (accepts a comma-separated list of patterns using wildcards)
$ codespell --skip "[pattern]"

Use a custom dictionary file when checking (--dictionary can be used multiple times)
$ codespell --dictionary [path/to/file.txt]

Do not check words that are listed in the specified file
$ codespell --ignore-words [path/to/file.txt]

Do not check the specified words
$ codespell --ignore-words-list [ignored_word1,ignored_word2,...]

Print 3 lines of context around, before or after each match
$ codespell --[context|before-context|after-context] [3]

Check file names for typos, in addition to file contents
$ codespell --check-filenames

SYNOPSIS

codespell [OPTIONS] [file|directory ...]

PARAMETERS

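-w, --write-changes
    Writes corrections to files in place where possible.

-D, --dictionary DICTIONARY
    Uses a custom dictionary file of spelling corrections. May be given multiple times; "-" selects the default dictionary.

--builtin BUILTIN_LIST
    Selects a comma-separated list of built-in dictionaries to use (e.g., 'clear,rare,code'). The default is 'clear,rare'.

-x, --exclude-file FILE
    Ignores whole lines that exactly match a line in FILE.

-C, --context LINES
    Prints LINES lines of context around each match. -B (--before-context) and -A (--after-context) print context on one side only.

-c, --enable-colors
    Enables colored output.

-d, --disable-colors
    Disables colored output.

-S, --skip SKIP
    Skips files or directories matching the given comma-separated glob patterns.

-I, --ignore-words FILE
    Ignores the words listed in FILE, one per line.

-L, --ignore-words-list WORDS
    Ignores the given comma-separated list of words.

-r, --regex REGEX
    Sets the regular expression used to find words. This is not a recursion flag: directories are searched recursively by default.

-s, --summary
    Prints a summary of the misspellings found.

-q, --quiet-level LEVEL
    Bitmask that suppresses selected messages, such as warnings about encoding or binary files.

-f, --check-filenames
    Also checks file names for misspellings.

-H, --check-hidden
    Also checks hidden files and directories (names starting with '.').

--version
    Shows the program's version number and exits.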

DESCRIPTION

codespell is a command-line utility that scans source code, documentation, and comments for common spelling mistakes. Unlike general-purpose spell checkers, which flag any word missing from a dictionary, codespell looks only for entries in a curated list of known misspellings. This keeps noise from variable names and technical jargon low. It can report errors or, optionally, fix them in place. Given a directory, it checks the files inside recursively. Because it is fast and needs little configuration, it fits well in continuous integration (CI) pipelines, pre-commit hooks, and routine local checks.

CAVEATS

False Positives: While optimized for code, codespell may still flag legitimate technical terms, variable names, or acronyms as errors, requiring manual review or adding to an ignore list.
Limited Scope: It primarily focuses on common spelling errors and does not perform grammar checks or understand the linguistic context of the text.
In-place Modification: Using the -w option modifies files directly. It is highly recommended to use codespell with version control (e.g., Git) to easily revert changes if unintended modifications occur.
Dictionary Dependent: Its effectiveness relies on its built-in and user-defined dictionaries. Uncommon or domain-specific typos might be missed if not present in the dictionaries.
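
The in-place caveat above can be handled with a small guard. The following is a minimal sketch, assuming the project is a Git repository; safe_fix is a hypothetical helper name, not part of codespell.

```shell
#!/bin/sh
# Sketch: apply codespell fixes only on a clean Git work tree, so every
# change it makes can be reviewed and reverted.
# safe_fix is a hypothetical helper name, not part of codespell.
safe_fix() {
    target="${1:-.}"
    # Refuse to run when tracked files have unstaged or staged changes.
    if ! { git diff --quiet && git diff --cached --quiet; }; then
        echo "work tree has uncommitted changes; commit or stash first" >&2
        return 1
    fi
    # codespell exits non-zero when it finds typos, so ignore its status here.
    codespell --write-changes "$target" || true
    # Summarize what was rewritten; `git checkout -- .` would undo it.
    git diff --stat
}
```

Calling `safe_fix src` would then rewrite files under src only when nothing else is pending, which keeps the spelling fixes in their own reviewable diff.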

INTEGRATION WITH CI/CD

codespell is frequently integrated into Continuous Integration/Continuous Deployment (CI/CD) pipelines. This ensures that any new or modified code is automatically checked for spelling errors before it's merged, preventing common typos from entering the codebase. Tools like GitHub Actions, GitLab CI/CD, or Jenkins can easily run codespell as part of their build or test stages.
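
A CI step usually only needs to run codespell and let its exit status fail the job. The sketch below shows one way to wrap that; run_spellcheck is a hypothetical helper name, and the skip patterns are illustrative assumptions to adapt per repository.

```shell
#!/bin/sh
# Sketch of a CI step: fail the job when codespell reports typos.
# run_spellcheck and the skip patterns are assumptions for illustration.
run_spellcheck() {
    if ! command -v codespell >/dev/null 2>&1; then
        echo "codespell is not installed (try: pip install codespell)" >&2
        return 2
    fi
    # Skip generated files; also check file names, not just contents.
    codespell --skip "*.lock,*.min.js" --check-filenames "$@"
}
```

In a pipeline script, `run_spellcheck . || exit 1` would stop the build on the first run that finds a misspelling.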

CUSTOM DICTIONARIES

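Custom dictionary files (-D) add project-specific misspellings and their corrections, one per line in the form misspelling->correction. They do not mark words as correct: terminology, acronyms, or proper nouns that codespell wrongly flags belong in an ignore list instead (-I FILE or -L WORDS). Passing -D replaces the default dictionary unless "-D -" is also given. Used together, the two mechanisms reduce false positives and tailor the tool to a project.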
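
As an illustration of the two file types codespell reads, the sketch below creates a small correction dictionary and an ignore list, then runs codespell on a sample file if the tool is installed. All file names and entries are hypothetical examples.

```shell
#!/bin/sh
# Sketch: build a project dictionary and an ignore list, then run codespell.
# File names and entries are hypothetical examples.
workdir="$(mktemp -d)"

# Custom dictionary: one "misspelling->correction" entry per line.
printf '%s\n' 'kubernets->kubernetes' 'postgress->postgres' \
    > "$workdir/project-dict.txt"

# Ignore list: one word per line that codespell should leave alone.
printf '%s\n' 'crate' 'nd' > "$workdir/ignore-words.txt"

# Sample input containing one of the misspellings above.
printf 'deploy to the kubernets cluster\n' > "$workdir/sample.txt"

if command -v codespell >/dev/null 2>&1; then
    # "-D -" keeps the default dictionary; the second -D adds the custom one.
    codespell -D - -D "$workdir/project-dict.txt" \
        -I "$workdir/ignore-words.txt" "$workdir/sample.txt" || true
fi
```

Keeping both files in the repository lets every contributor and the CI job run with the same word lists.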

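EXIT STATUS AND REPORT CONTROL

codespell does not categorize findings by error code. It reports its result through the process exit status: 0 when no misspellings are found and non-zero otherwise. What gets reported is controlled through dictionary selection (--builtin, -D), ignore lists (-I, -L, --ignore-regex), and the -q/--quiet-level bitmask, which suppresses messages such as warnings about encoding or binary files.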
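
Scripts typically react to codespell through its exit status, which is zero when nothing is found and non-zero otherwise. A minimal sketch follows; report_status is a hypothetical helper name, and the exact non-zero value is deliberately not assumed.

```shell
#!/bin/sh
# Sketch: branch on codespell's exit status.
# report_status is a hypothetical helper name, not part of codespell.
report_status() {
    codespell "$@"
    status=$?
    if [ "$status" -eq 0 ]; then
        echo "no typos found"
    else
        echo "codespell exited with status $status"
    fi
    # Pass the original status through to the caller.
    return "$status"
}
```

Because the helper returns codespell's own status, it can be dropped into a pre-commit hook or CI step without changing pass/fail behavior.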

HISTORY

codespell was created by Lucas De Marchi, with development beginning around 2010. It grew out of the need for a lightweight, code-centric spell checker that could find common typos in source code without the overhead and false positives of traditional spell checkers. Development has since been community-driven, expanding the dictionaries and features. Its focused approach and easy integration into development workflows have made it popular for catching misspellings, especially in comments and documentation.

SEE ALSO

aspell(1), hunspell(1), grep(1), sed(1), git(1)
