LinuxCommandLibrary

texcount

Count words in LaTeX documents

TLDR

Count words in a TeX file

$ texcount [path/to/file.tex]

Count words in a document and subdocuments built with \input or \include
$ texcount -merge [file.tex]

Count words in a document and subdocuments, listing each file separately (and a total count)
$ texcount -inc [file.tex]

Count words in a document and subdocuments, producing subcounts by chapter (instead of subsection)
$ texcount -merge -sub=chapter [file.tex]

Count words with verbose output
$ texcount -v [path/to/file.tex]

SYNOPSIS

texcount [OPTIONS] FILES...

PARAMETERS

-total, --total
    Prints only the total over all files, without a separate sum for each file.

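-sum, -sum=#[,#...]
    Adds a sum of the word and equation counts to the output. With -sum=#[,#...], up to seven weights control how the individual counts (text words, header words, caption words, headers, floats, inline formulae, displayed formulae) enter the sum.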

-char, --char
    Counts letters (characters) instead of words; spaces are not included.

-word, --word
    Counts words (this is the default behavior).

-sentence, --sentence
    Not part of texcount's built-in help (texcount -help); texcount reports words, headers, floats, and formulae, not sentences.

-incbib, --incbib
    Includes the bibliography (e.g., from .bbl files) in the total word count.

-nobib, --nobib
    Excludes the bibliography from the total word count (this is the default).

-incfig, --incfig
    Not part of texcount's built-in help (texcount -help). Words in captions are reported by default on their own line, separate from words in the main text.

-sub[=LEVEL], --sub[=LEVEL]
    Generates subcounts per sectional unit. LEVEL is one of none, part, chapter, section, or subsection; -sub on its own means subsection.

-exclcmd=CMD, -exclenv=ENV, -only-section=SEC
    These options do not appear in texcount's built-in help (texcount -help). To exclude material, wrap it in %TC:ignore and %TC:endignore comment lines in the document, or define how a command or environment is counted with a %TC:macro or %TC:envir instruction.

-html
    Writes the output as HTML instead of plain text.

-out=FILE
    Sends the output to FILE instead of standard output.

-q, -quiet
    Quiet mode: suppresses error messages.

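-v
    Verbose output: prints the parsed text with markup showing how each part was counted. -v0 to -v4 select the level of detail.

-ver, -version
    Displays version information and exits.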

-h, --help
    Displays a help message with command options and exits.
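
The options above can be combined in scripts. The following is a minimal sketch, assuming that -1 together with -sum prints a single total on one line; check_word_limit is a hypothetical helper name, not something texcount provides.

```shell
#!/bin/sh
# check_word_limit FILE LIMIT  (hypothetical helper, not part of texcount)
# Exit status: 0 if FILE is within LIMIT, 1 if it is over, 2 if no count could be read.
check_word_limit() {
    file=$1
    limit=$2
    # -1 with -sum should print one total; -merge pulls in \input and \include files.
    # Keep only the leading digits of the first line in case extra text follows the number.
    count=$(texcount -1 -sum -merge "$file" | head -n 1 | sed 's/[^0-9].*//')
    if [ -z "$count" ]; then
        echo "could not read a count for $file" >&2
        return 2
    fi
    echo "$file: $count words (limit $limit)"
    [ "$count" -le "$limit" ]
}
```

A call such as check_word_limit thesis.tex 10000 can then gate a build step on the exit status.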

DESCRIPTION

texcount is a Perl script that counts words in LaTeX documents; with -char it counts letters instead.
It parses LaTeX commands, environments, and document structure, so markup and comments are left out of the count and headers, captions, and formulae are reported separately from the main text.
A generic word counter such as wc counts every whitespace-separated token, markup included, so texcount gives a more reliable figure for LaTeX sources.
Output can be plain text or HTML, and the level of detail ranges from a one-line total to a fully annotated listing of the parsed document.
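
To make the difference from a generic counter concrete, here is a small sketch that counts a made-up sample document with wc and, if texcount happens to be installed, with texcount as well. The sample text and the temporary file are assumptions for illustration only.

```shell
#!/bin/sh
# Write a tiny LaTeX document to a temporary file (sample content is made up).
sample=$(mktemp)
cat > "$sample" <<'EOF'
\documentclass{article}
\begin{document}
\section{Intro}
Hello world. % a comment
\end{document}
EOF

# wc counts every whitespace-separated token, so markup and the comment are included.
naive=$(wc -w < "$sample" | tr -d ' ')
echo "wc -w: $naive"

# texcount, when available, counts only what ends up as text, headers, and formulae.
if command -v texcount >/dev/null 2>&1; then
    texcount -1 -sum "$sample"
fi

rm -f "$sample"
```

Here wc reports 9 tokens, although only "Hello world." and the section title are actual document text.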

CAVEATS

texcount relies on parsing LaTeX syntax, so malformed or highly unconventional LaTeX code might lead to inaccurate counts.
Its definition of a 'word' is based on whitespace and common punctuation, which may not always align with very specific linguistic counting rules.
It requires a Perl interpreter to run.

CONFIGURATION FILES

texcount can read additional options from a file passed with -opt=FILE.
Counting rules can also be placed in the document itself as comment lines starting with %TC:, for example %TC:ignore and %TC:endignore to skip a region, or %TC:macro and %TC:envir to define how a particular command or environment is counted.

COUNTING RULES AND INCLUSIONS

texcount treats runs of letters separated by whitespace or punctuation as words.
By default, it excludes LaTeX commands, comments, and the contents of environments such as verbatim.
The handling of individual commands, environments, and regions of the document can be adjusted, which gives fine-grained control over what enters the final count.
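
One way to exclude a passage is to fence it with %TC:ignore and %TC:endignore comment lines in the source. The sketch below builds a made-up sample document containing such a region and, if texcount is installed, runs it in verbose mode so the skipped passage is visible in the output; the file contents are illustrative assumptions.

```shell
#!/bin/sh
# Build a small sample document (contents are illustrative) in which one
# passage is fenced off with %TC:ignore / %TC:endignore comment lines.
doc=$(mktemp)
cat > "$doc" <<'EOF'
\documentclass{article}
\begin{document}
These five words are counted.
%TC:ignore
This passage is skipped by texcount.
%TC:endignore
\end{document}
EOF

# Count the %TC: marker lines present in the document.
markers=$(grep -c '^%TC:' "$doc")
echo "TC marker lines: $markers"

# Verbose mode shows how each part of the text was treated; run only if installed.
if command -v texcount >/dev/null 2>&1; then
    texcount -v "$doc"
fi

rm -f "$doc"
```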

HISTORY

texcount is a Perl script written by Einar Andreas Rødland to give accurate word counts for LaTeX documents in academic and publishing workflows.
Over successive releases it has gained support for more LaTeX packages and more complex document structures, and it is widely used by LaTeX authors who need document statistics.

SEE ALSO

pdflatex(1), latex(1), grep(1), wc(1), bibtex(1)
