strfile
Create random access files from text files
SYNOPSIS
strfile [ options ] file [ outfile ]
PARAMETERS
-c char
Use char as the delimiter character instead of the default `%`. A string ends at a line that contains only this character.
-i
Ignore case when ordering strings; meaningful together with -o.
-o
Sort the offset table in alphabetical order of the strings it references, ignoring any leading non-alphanumeric characters. Sets the STR_ORDERED flag in the header.
-r
Randomize the order of the offset table. The text file itself is not rewritten. Sets the STR_RANDOM flag in the header.
-s
Run silently; do not print the summary of string counts and lengths when finished.
-x
Mark the strings as ROT13-encoded by setting the STR_ROTATED flag in the header. strfile does not encode the text itself; the source file must already be rotated.
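The effect of -r on the offset table can be sketched as a Fisher-Yates shuffle. This is an illustration under assumptions, not strfile's own code: the helper name `shuffle_offsets` is hypothetical, offsets are assumed to be `uint32_t`, and it uses `rand()` for brevity even though `rand() % (i + 1)` carries a slight modulo bias. Only the order of the offsets changes; the strings in the text file stay where they are.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical helper: shuffle an offset table in place (Fisher-Yates),
 * illustrating what -r does to the index. Uses rand() for brevity;
 * rand() % (i + 1) has a slight modulo bias. */
static void shuffle_offsets(uint32_t *offsets, size_t n)
{
    if (n < 2)
        return;
    for (size_t i = n - 1; i > 0; i--) {
        size_t j = (size_t)rand() % (i + 1);  /* pick 0 <= j <= i */
        uint32_t tmp = offsets[i];            /* swap entries i and j */
        offsets[i] = offsets[j];
        offsets[j] = tmp;
    }
}
```

Seeding with `srand()` before the call controls the resulting order; every offset is kept, only its position in the table moves.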
DESCRIPTION
strfile builds a random access index for a text file of strings, most commonly for use by fortune(6). The input text file holds individual entries separated by delimiter lines; by default, a delimiter line consists of a single percent sign (`%`) and nothing else.
strfile writes the index to a binary companion file. Unless outfile is given, its name is the input file name with `.dat` appended (for example, myfortunes.dat). The index records the byte offset at which each string starts in the original text file, so other programs can seek directly to a chosen string instead of reading the source file sequentially.
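The indexing step can be sketched in C. This is a simplified illustration, not strfile's source: the function name `index_entries` is hypothetical, offsets are stored as `uint32_t` by assumption, and it records only where each non-empty entry begins, so its table is not byte-for-byte what strfile writes. It follows the rule that a delimiter line contains nothing but the delimiter character.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: record the byte offset at which each entry starts.
 * An entry ends at a line that is exactly "<delim>\n". Empty entries
 * (two delimiter lines in a row) are skipped. Returns the number of
 * offsets stored, at most `max`. */
static size_t index_entries(const char *text, size_t len, char delim,
                            uint32_t *offsets, size_t max)
{
    size_t count = 0;
    size_t start = 0;  /* offset where the current entry begins */
    size_t pos = 0;    /* offset of the line being examined */

    while (pos < len) {
        const char *nl = memchr(text + pos, '\n', len - pos);
        size_t next = nl ? (size_t)(nl - text) + 1 : len;

        /* A delimiter line holds only the delimiter and its newline. */
        if (next - pos == 2 && text[pos] == delim && text[pos + 1] == '\n') {
            if (pos > start && count < max)
                offsets[count++] = (uint32_t)start;
            start = next;  /* the next entry begins after this line */
        }
        pos = next;
    }
    /* A final entry without a trailing delimiter line still counts. */
    if (len > start && count < max)
        offsets[count++] = (uint32_t)start;
    return count;
}
```

For the text `"one\n%\ntwo\nlines\n%\n%\nthree\n"` this yields the offsets 0, 6 and 20; the doubled `%` line produces no empty entry, and a `%` in the middle of a line is treated as ordinary text.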
Because fetching a string takes one lookup in the index and one seek into the text file, the cost does not grow with the size of the text file, which makes the approach suitable for large collections.
CAVEATS
The input file must contain strings separated by the specified delimiter.
A delimiter line must contain only the delimiter character; a `%` (or the character given with -c) that shares its line with other text is treated as part of the string.
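In the BSD and fortune-mod implementations, the header fields and offsets are written in network (big-endian) byte order, so a .dat file does not depend on the host's endianness. Older or modified builds may differ in header layout, so if a .dat file copied from another system is rejected, regenerate it with the local strfile.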
The stored offsets refer to byte positions in the text file, so strfile must be rerun whenever the text file changes; a stale .dat file yields truncated or misaligned strings.
FILE FORMAT
The .dat file generated by strfile is a binary file that begins with a fixed-size header recording the format version, the number of strings, the lengths of the longest and shortest strings, a set of flag bits (randomized, ordered, rotated), and the delimiter character. The header is followed by a table of byte offsets giving the starting position of each string in the original text file. A program can therefore read one offset from the table and seek straight to that string.
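A reader for this format might look like the following C sketch. It assumes the fortune-mod layout: a 24-byte header of five 32-bit big-endian fields plus a delimiter byte padded to four bytes, followed by 32-bit big-endian offsets. Other strfile implementations may use a different version number or offset width, so treat the layout, the flag values, and the names `dat_header`, `parse_header` and `entry_offset` as assumptions rather than a specification. It works on a buffer already read into memory to keep file handling out of the way.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumed layout (fortune-mod style): five 32-bit big-endian fields,
 * then 4 bytes whose first byte is the delimiter; 24 bytes in all. */
#define DAT_HEADER_SIZE 24

/* Flag bits, assumed to match fortune-mod's strfile.h. */
#define STR_RANDOM  0x1u  /* offset table was shuffled (-r) */
#define STR_ORDERED 0x2u  /* offset table was sorted (-o) */
#define STR_ROTATED 0x4u  /* text is ROT13-encoded (-x) */

struct dat_header {
    uint32_t version;   /* format version */
    uint32_t numstr;    /* number of strings */
    uint32_t longlen;   /* length of the longest string */
    uint32_t shortlen;  /* length of the shortest string */
    uint32_t flags;     /* STR_* bits */
    char     delim;     /* delimiter character */
};

/* Decode a 32-bit big-endian (network order) value on any host. */
static uint32_t be32(const unsigned char *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8)  |  (uint32_t)p[3];
}

/* Parse the header; returns 0 on success, -1 if the buffer is too short. */
static int parse_header(const unsigned char *buf, size_t len,
                        struct dat_header *h)
{
    if (len < DAT_HEADER_SIZE)
        return -1;
    h->version  = be32(buf);
    h->numstr   = be32(buf + 4);
    h->longlen  = be32(buf + 8);
    h->shortlen = be32(buf + 12);
    h->flags    = be32(buf + 16);
    h->delim    = (char)buf[20];
    return 0;
}

/* Return the text-file byte offset of string i, or -1 if i is out of
 * range or the offset table is truncated. */
static int64_t entry_offset(const unsigned char *buf, size_t len,
                            const struct dat_header *h, uint32_t i)
{
    size_t at = DAT_HEADER_SIZE + (size_t)i * 4;
    if (i >= h->numstr || at + 4 > len)
        return -1;
    return (int64_t)be32(buf + at);
}
```

To pick a random string, a caller would choose an index below `h.numstr` (for example with `rand()`), seek the text file to `entry_offset(...)`, and read lines until it reaches a delimiter line. Decoding byte by byte keeps the reader independent of the host's byte order.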
USAGE WITH FORTUNE
To use strfile with a custom fortune file, you typically place your text file (e.g., myfortunes) in a fortune directory (e.g., /usr/share/games/fortune/) and then run strfile myfortunes. This will create myfortunes.dat, making your fortunes available to the fortune command.
HISTORY
strfile is distributed with the fortune-mod package and traces its origins to early Unix and BSD systems. It was written so that fortune could retrieve a random message from a large text file without parsing the whole file on every run, and its core behavior has changed little since.
SEE ALSO
fortune(6)


