LinuxCommandLibrary

strfile

Create random access files from text files

SYNOPSIS

strfile [ options ] file [ outfile ]

PARAMETERS

-a or --alt-delimiter
    Use `\n%\n` as the string delimiter, useful for multi-line entries.

-c char or --character=char
    Specify a custom character (char) as the string delimiter. Default is `%`.

-i or --ignore-case
    Ignore case when ordering strings. Only meaningful together with -o.

-l or --length
    Print the length of the longest string found to standard output.

-o
    Order the strings alphabetically. The offset table is sorted by the text of the entries it references, ignoring any leading non-alphanumeric characters; the source text file itself is not reordered.

-r or --random
    Randomize the order of strings in the generated data file.

-s or --silent
    Run silently: do not print the summary message when finished. Errors are still reported.

-v or --verbose
    Display verbose output (e.g., counts of strings, characters). This is the default.

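-x
    Mark the strings as rot13-encoded (each alphabetic character rotated 13 positions). This sets a flag in the .dat header so that readers such as fortune decode the text before displaying it; strfile does not encode the source file itself.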

--debug
    Enable debugging output (developer option).

--help
    Display a help message and exit.

--version
    Display version information and exit.

DESCRIPTION

strfile creates a random access data file for programs such as fortune. It reads a text file in which entries (groups of lines) are separated by a line containing only the delimiter character, a percent sign (%) by default.
The result is a binary companion file, named filename.dat unless outfile is given. It holds a header and a table of byte offsets, one for each string in the original text file. With this index, a program can seek straight to a randomly chosen string instead of reading the source file sequentially.
This matters most for large text collections, where scanning the whole file on every lookup would be slow.
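
The indexing idea described above can be illustrated with a minimal Python sketch. It is not strfile's implementation: the names `index_strings` and `read_entry` are hypothetical, and the sketch assumes a delimiter line holds nothing but the delimiter character.

```python
import random

def index_strings(data: bytes, delim: bytes = b"%"):
    """Return a list of (offset, length) pairs, one per entry.

    Assumption: an entry ends at a line containing only `delim`.
    """
    entries = []
    start = 0   # byte offset where the current entry begins
    pos = 0     # byte offset of the line being examined
    for line in data.splitlines(keepends=True):
        if line.rstrip(b"\r\n") == delim:
            if pos > start:                  # skip empty entries
                entries.append((start, pos - start))
            start = pos + len(line)          # next entry starts after the delimiter line
        pos += len(line)
    if pos > start:                          # trailing entry without a final delimiter
        entries.append((start, pos - start))
    return entries

def read_entry(data: bytes, entry):
    """Slice one entry out of the text using its (offset, length) pair."""
    offset, length = entry
    return data[offset:offset + length]

if __name__ == "__main__":
    text = b"one\n%\ntwo\nlines\n%\nthree\n"
    table = index_strings(text)
    # A reader only needs the table to jump to a random entry.
    print(read_entry(text, random.choice(table)).decode())
```

Once the table exists, retrieving an entry costs one slice (or one seek and read on a real file), regardless of how large the text is.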

CAVEATS

The input file must contain strings separated by the delimiter.
The delimiter character must appear alone on its own line; a line with any other content, including trailing spaces, is treated as part of a string.
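The header fields and offsets in the .dat file are written in network (big-endian) byte order, so endianness alone does not prevent sharing the file. The width of the fields can: implementations that store them as unsigned long produce different layouts on 32-bit and 64-bit systems. If a reader rejects a .dat file built elsewhere, regenerate it on the target system.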
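Very large source files can exceed the range of the offsets stored in the .dat file, particularly on builds that use 32-bit offsets. Splitting the text into several smaller files avoids the problem.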

FILE FORMAT

The .dat file generated by strfile is a binary file that begins with a header structure (format version, number of strings, lengths of the longest and shortest strings, flag bits, and the delimiter character), followed by a table of byte offsets, one per string, into the original text file. A reader picks an entry from the table and seeks directly to that offset instead of scanning the text.
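
As an illustration of reading such a file, the following Python sketch parses a header and offset table. The exact layout is an assumption, not something this page specifies: five 32-bit big-endian unsigned integers (version, string count, longest length, shortest length, flags), one delimiter byte, three padding bytes, then 32-bit big-endian offsets. Older or other builds may use wider fields, so treat `HEADER` and `parse_dat` as hypothetical names for a sketch only.

```python
import struct

# Assumed layout: 5 x uint32 (big-endian), 1 delimiter byte, 3 padding bytes.
HEADER = struct.Struct(">5Ic3x")

def parse_dat(blob: bytes):
    """Decode the assumed header and every 32-bit offset that follows it."""
    version, numstr, longlen, shortlen, flags, delim = HEADER.unpack_from(blob, 0)
    # Read as many offsets as the file actually contains rather than
    # trusting a particular relationship between numstr and the table size.
    count = (len(blob) - HEADER.size) // 4
    offsets = struct.unpack_from(f">{count}I", blob, HEADER.size)
    return {
        "version": version,
        "numstr": numstr,
        "longlen": longlen,
        "shortlen": shortlen,
        "flags": flags,
        "delim": delim,
        "offsets": offsets,
    }

if __name__ == "__main__":
    # Synthetic data for demonstration; no real .dat file is read here.
    demo = struct.pack(">5Ic3x", 1, 2, 10, 4, 0, b"%") + struct.pack(">3I", 0, 6, 18)
    print(parse_dat(demo))
```

To inspect a real file, read it in binary mode and pass the bytes to `parse_dat`; if the values look implausible, the file probably uses a different field width than the one assumed here.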

USAGE WITH FORTUNE

To use a custom fortune file, place the text file (e.g., myfortunes) in a fortune directory (e.g., /usr/share/games/fortune/) and run strfile myfortunes in that directory. This creates myfortunes.dat next to the text file. fortune needs both files, so keep them together and rerun strfile whenever the text changes.
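
The workflow above can be scripted. The sketch below is one possible approach in Python: it writes a small fortune-format file into a temporary directory and runs strfile on it only if the program is found on PATH. The helper names `write_fortune_file` and `build_index` are hypothetical, and the temporary directory stands in for a real fortune directory.

```python
import shutil
import subprocess
import tempfile
from pathlib import Path

def write_fortune_file(path, entries, delim="%"):
    """Write each entry followed by a line holding only the delimiter."""
    with open(path, "w", encoding="utf-8") as fh:
        for entry in entries:
            fh.write(entry.rstrip("\n") + "\n" + delim + "\n")

def build_index(path):
    """Run `strfile path` if strfile is installed; return True on success."""
    exe = shutil.which("strfile")
    if exe is None:
        return False    # strfile is not on PATH (it may live in /usr/sbin or /usr/games)
    result = subprocess.run([exe, str(path)], capture_output=True, text=True)
    return result.returncode == 0

if __name__ == "__main__":
    workdir = Path(tempfile.mkdtemp())
    source = workdir / "myfortunes"
    write_fortune_file(source, ["First quote.", "Second quote,\non two lines."])
    if build_index(source):
        print("created", source.with_name(source.name + ".dat"))
    else:
        print("strfile not found or failed; only", source, "was written")
```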

HISTORY

strfile is part of the fortune-mod package and traces its origins to early BSD Unix systems. It was written to give programs like fortune an index from which they can retrieve a random message or quote without parsing the whole text file each time. Its core behavior has changed little since.

SEE ALSO

fortune(6), diction(1), mmap(2)
