indxbib
Create inverted index for bibliographic databases
SYNOPSIS
indxbib [-vw] [-c file] [-d dir] [-f file] [-h n] [-i string] [-k n] [-l n] [-n n] [-o basename] [-t n] [file ...]
PARAMETERS
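-c file
Read the list of common words from file instead of the default list installed with groff.
-d dir
Store dir in the index as the path of the current working directory, instead of the path reported by pwd.
-f file
Read the names of the files to be indexed from file. If file is -, the names are read from standard input.
-h n
Use the first prime number greater than or equal to n as the hash table size. The default is 997.
-i string
Do not index the contents of fields whose names appear in string. The default is XYZ.
-k n
Use at most n keys per input record. The default is 100.
-l n
Discard keys shorter than n characters. The default is 3.
-n n
Discard the n most common words. The default is 100; use -n 0 to discard none.
-o basename
Name the index basename.i instead of deriving the name from the input file. This is useful when several bibliography files are indexed together.
-t n
Truncate keys to n characters. The default is 6.
-v
Print the version number.
-w
Index whole files. Each file is treated as a single record.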
DESCRIPTION
indxbib is part of the groff typesetting system. It builds an inverted index for the bibliographic databases used by refer, lookbib, and lkbib. Its input is one or more bibliography files whose records follow refer's conventions (for example, lines starting with %A for an author or %T for a title). The index is written to a temporary file, which is then renamed to file.i. With the index in place, lookbib and refer can find matching entries without scanning the whole bibliography, which matters most for large databases. By default indxbib discards the most common words, drops very short keys, and truncates long keys to keep the index small; these limits can be adjusted with command-line options.
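To make the idea of an inverted index concrete, the following is a minimal Python sketch of the concept only. It is not indxbib's algorithm or its on-disk format: the names build_index, lookup, and COMMON_WORDS are hypothetical, the common-word set is a tiny stand-in for the list groff ships, and the two sample records are illustrative data.

```python
import re
from collections import defaultdict

# Hypothetical stand-in for a common-words list; not the list groff installs.
COMMON_WORDS = {"the", "and", "for", "with", "of"}

def words_of(text):
    """Lowercase the text, strip leading %X field markers, and split into words."""
    body = re.sub(r"^%\S+\s*", "", text, flags=re.MULTILINE)
    return [w for w in re.findall(r"[a-z0-9]+", body.lower())
            if w not in COMMON_WORDS]

def build_index(records):
    """Map each kept word to the set of record numbers that contain it."""
    index = defaultdict(set)
    for number, text in enumerate(records):
        for word in words_of(text):
            index[word].add(number)
    return index

def lookup(index, query):
    """Return the sorted record numbers containing every kept query word."""
    words = words_of(query)
    if not words:
        return []
    return sorted(set.intersection(*(index.get(w, set()) for w in words)))

records = [
    "%A Brian W. Kernighan\n%T The Unix Programming Environment",
    "%A Donald E. Knuth\n%T The Art of Computer Programming",
]
index = build_index(records)
print(lookup(index, "programming"))        # matches both records
print(lookup(index, "knuth programming"))  # matches only the second record
```

A lookup touches only the sets for the query words instead of rereading every record, which is the property that makes indexed searches fast on large bibliographies.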
CAVEATS
By default indxbib takes its common words from a system-wide list installed with groff. To use a different list, supply it with -c file.
OUTPUT FILES
indxbib creates a single output file:
file.i: The inverted index, mapping keys to bibliography entries. With -o basename, the file is named basename.i instead.
The index is written to a temporary file first and then renamed to its final name. No separate dictionary or vocabulary files are produced.
BIBLIOGRAPHY FORMAT
indxbib expects bibliography files to be structured according to refer's conventions. Records are separated by blank lines, and each field within a record begins with a percent sign (%) followed by a key letter (e.g., %A for Author, %T for Title) and a space.
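The record layout described above can be illustrated with a small Python sketch. This is not code from groff: parse_refer is a hypothetical helper, the sample entries are illustrative data, and the handling of continuation lines (a line without a leading % extends the previous field) is an assumption made for this example.

```python
def parse_refer(text):
    """Split refer-style text into records.

    Each record is a dict mapping a key letter (such as "A" or "T")
    to the list of values given for that key.
    """
    records = []
    current = {}
    last_key = None
    for line in text.splitlines():
        if not line.strip():
            # A blank line ends the current record.
            if current:
                records.append(current)
            current, last_key = {}, None
        elif line.startswith("%") and len(line) >= 2:
            # "%A value" starts a new field with key letter "A".
            last_key = line[1]
            current.setdefault(last_key, []).append(line[2:].strip())
        elif last_key is not None:
            # Assumption: other lines continue the previous field.
            current[last_key][-1] += " " + line.strip()
    if current:
        records.append(current)
    return records

sample = """%A Brian W. Kernighan
%A Rob Pike
%T The Unix Programming Environment
%D 1984

%A Donald E. Knuth
%T The Art of Computer
Programming
%D 1968
"""

parsed = parse_refer(sample)
for record in parsed:
    print(record["A"], record["T"])
```

Storing each field as a list lets a record carry repeated keys, such as several %A lines for a work with more than one author.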
HISTORY
indxbib is a long-standing utility that originated with the refer preprocessor suite of the AT&T Unix troff typesetting system, where it enabled efficient searching and retrieval of the bibliographic references cited in roff documents. groff (GNU roff), a free reimplementation of troff, ships its own implementation of indxbib that serves the same purpose for academic and technical document preparation. Its design reflects the Unix philosophy of small, focused tools.