texcount
Count words in LaTeX documents
TLDR
Count words in a TeX file
Count words in a document and subdocuments built with \input or \include
Count words in a document and subdocuments, listing each file separately (and a total count)
Count words in a document and subdocuments, producing subcounts by chapter (instead of subsection)
Count words with verbose output
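The tasks listed above correspond roughly to the invocations below. This is a sketch under stated assumptions: sample.tex is a placeholder file created only for this example, -merge and -inc are texcount's options for following \input and \include, and the commands are skipped when texcount is not installed.

```shell
# Create a small placeholder document (sample.tex is an illustrative name).
cat > sample.tex <<'EOF'
\documentclass{article}
\begin{document}
\section{Introduction}
Some text to count.
\end{document}
EOF

# Run the examples only when texcount is available on this machine.
if command -v texcount >/dev/null 2>&1; then
  texcount sample.tex                      # count words in one file
  texcount -merge sample.tex               # merge \input/\include files into the count
  texcount -inc sample.tex                 # count included files separately, plus a total
  texcount -merge -sub=chapter sample.tex  # subcounts by chapter instead of subsection
  texcount -v sample.tex                   # verbose output
fi
```

The placeholder has no subdocuments, so -merge and -inc only make a visible difference on a real project that uses \input or \include.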
SYNOPSIS
texcount [OPTIONS] FILES...
PARAMETERS
-total, --total
Prints only the total over all input files, omitting the per-file sums.
-sum, --sum
Adds a sum of the word and formula counts to the output, alongside the separate counts for text, headers, and captions.
-char, --char
Counts letters instead of words; spaces are not included.
-word, --word
Counts words (this is the default behavior).
-sentence, --sentence
Counts sentences within the document.
-incbib, --incbib
Includes the bibliography (e.g., from .bbl files) in the total word count.
-nobib, --nobib
Excludes the bibliography from the total word count (this is the default).
-incfig, --incfig
Includes the text from figure and table captions in the word count.
-sub[=LEVEL], --sub[=LEVEL]
Generates subcounts per sectional unit. LEVEL is one of none, part, chapter, section, or subsection; plain -sub defaults to subsection.
-exclcmd=CMD, --exclude-command=CMD
Excludes the argument of a specific LaTeX command (e.g., \footnote) from counting.
-exclenv=ENV, --exclude-environment=ENV
Excludes content within a specific LaTeX environment (e.g., figure, abstract) from counting.
-only-section=SEC
Counts only the content within a specified section or subsection name.
-out=FILE
Writes the output to FILE instead of standard output.
-html
Produces the output as HTML instead of plain text.
-q, --quiet
Runs in quiet mode, suppressing error messages.
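-v
Produces verbose output, printing the parsed text with markers that show how each part was counted. The numbered forms -v0 to -v4 select the level of detail.
-ver, -version
Displays version information and exits.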
-h, --help
Displays a help message with command options and exits.
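As a rough illustration of how these options combine, the sketch below counts a small document per section and then extracts a single number for use in a script. The file name report.tex is a placeholder, and -1 (one line of output, which together with -sum is just the summed count) is a texcount option not listed above; the commands are skipped when texcount is not installed.

```shell
# Create a placeholder document with two sections (report.tex is illustrative).
cat > report.tex <<'EOF'
\documentclass{article}
\begin{document}
\section{Methods}
We measured things.
\section{Results}
They were fine.
\end{document}
EOF

# Run the examples only when texcount is available on this machine.
if command -v texcount >/dev/null 2>&1; then
  texcount -sub=section report.tex   # subcounts per section
  texcount -sum -total report.tex    # totals only, with a summed count
  texcount -incbib report.tex        # include the bibliography in the count

  # Capture a single summed number for use elsewhere in a script.
  words=$(texcount -1 -sum report.tex)
  echo "word count: $words"
fi
```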
DESCRIPTION
texcount is a Perl script that counts the words in LaTeX documents, reporting words in text, headers, and captions separately, along with the number of headers, floats, and formulae.
It parses LaTeX commands, environments, and document structure according to rules for which macro arguments and environments contain countable text, so comments and markup are left out of the count.
Unlike a generic word counter, texcount does not treat LaTeX code as words, which gives a more reliable count of the visible text.
It can write its results as plain text or HTML, which makes it usable for academic, publishing, and reporting purposes.
CAVEATS
texcount relies on parsing LaTeX syntax, so malformed or highly unconventional LaTeX code might lead to inaccurate counts.
Its definition of a 'word' is based on whitespace and common punctuation, which may not always align with very specific linguistic counting rules.
It requires a Perl interpreter to run.
CONFIGURATION FILES
texcount can read command-line options from a plain text file passed with -opt=FILE, so that a project's usual options are kept in one place.
Counting rules for specific LaTeX macros and environments can also be set with %TC: instructions written as comments inside the document, which allows customization for particular document types or workflows.
COUNTING RULES AND INCLUSIONS
texcount splits text into words at whitespace and certain punctuation.
By default, it does not count LaTeX commands, comments, or the contents of environments such as verbatim, and it counts formulae by number rather than by words.
The defaults can be changed: %TC: instructions placed in comments can mark regions to ignore or redefine how specific macros and environments are counted, which gives fine-grained control over the final count.
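One mechanism texcount provides for this kind of control is the pair of %TC:ignore and %TC:endignore instructions, written as LaTeX comments around text that should not be counted. The sketch below uses a placeholder file, tc-demo.tex, created only for the example, and skips the run when texcount is not installed.

```shell
# Create a placeholder document with an ignored region (tc-demo.tex is illustrative).
cat > tc-demo.tex <<'EOF'
\documentclass{article}
\begin{document}
This sentence is counted.
%TC:ignore
This sentence is skipped by texcount.
%TC:endignore
This one is counted again.
\end{document}
EOF

# Run the examples only when texcount is available on this machine.
if command -v texcount >/dev/null 2>&1; then
  texcount tc-demo.tex      # the ignored region is left out of the count
  texcount -v tc-demo.tex   # verbose output shows which text was counted
fi
```

Because the instructions are ordinary LaTeX comments, they do not affect how the document is typeset.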
HISTORY
texcount is a long-established Perl script, written to provide accurate word counts for academic and publishing workflows involving LaTeX.
It has evolved over many years to support newer LaTeX features, packages, and more complex document structures, and is widely used by LaTeX users who need document statistics.