flex

Generate lexical analyzers (scanners)

TLDR

Generate an analyzer from a lex/flex specification file, saving it to the file lex.yy.c

$ flex [analyzer.l]

Write analyzer to stdout
$ flex [[-t|--stdout]] [analyzer.l]

Specify the output file
$ flex [analyzer.l] [[-o|--outfile]] [analyzer.c]

Generate a batch scanner instead of an interactive scanner
$ flex [[-B|--batch]] [analyzer.l]

Compile a C file generated by flex
$ cc [path/to/lex.yy.c] -o [executable]
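
If the scanner relies on the default main() and yywrap() supplied by the flex runtime library (rather than defining its own or using %option noyywrap), a common variant is to also link against libfl
$ cc [path/to/lex.yy.c] -o [executable] -lfl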

SYNOPSIS

flex [options] [filename...]

PARAMETERS

-o file
    Specifies the output filename for the generated C code, overriding the default lex.yy.c.

-i
    Makes the generated scanner case-insensitive, allowing patterns to match regardless of character casing.

-P prefix
    Changes the default yy prefix of all scanner functions and variables (e.g., yylex, yytext); see the example after this list.

-t
    Writes the generated scanner C code to standard output instead of creating the default lex.yy.c file.

-L
    Suppresses the generation of #line directives in the output file, which can simplify debugging of the generated code itself.

-v
    Provides a summary of the generated scanner's statistics, such as table sizes and number of states, to standard error.

-F
    Optimizes the generated scanner for speed (Fast Tables) at the cost of potentially larger memory usage for internal tables.
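
As a hedged illustration of -P (the prefixes and file names below are hypothetical), generating two scanners with different prefixes lets both be linked into the same program without symbol collisions; with -P cfg, yylex, yytext, and yyin become cfglex, cfgtext, and cfgin:

$ flex -P cfg -o cfg_lexer.c cfg_rules.l
$ flex -P cmd -o cmd_lexer.c cmd_rules.l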

DESCRIPTION

flex (Fast Lexical Analyzer Generator) generates C source code for programs that perform pattern matching on text. It is a free and faster re-implementation of the classic lex utility.

Given a specification file (typically with a .l or .lex extension) containing a set of regular expressions and corresponding C code actions, flex generates a C source file (by default lex.yy.c). This file contains a function, yylex(), which reads input, identifies tokens based on the defined patterns, and executes the associated C actions.

Widely used in conjunction with parser generators such as bison (or yacc), flex is a fundamental building block for compilers, interpreters, and advanced text-processing utilities, efficiently breaking input streams into meaningful sequences of characters known as tokens.
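
A minimal sketch of this workflow, with hypothetical file names: the specification below counts lines and characters, flex translates it into C, and the C compiler produces an executable. Note that %{ ... %}, %%, and the patterns must all begin in the first column.

%{
#include <stdio.h>
int lines = 0, chars = 0;
%}
%option noyywrap
%%
\n      { lines++; chars++; }
.       { chars++; }
%%
int main(void) {
    yylex();
    printf("%d lines, %d characters\n", lines, chars);
    return 0;
}

$ flex -o count.c count.l
$ cc count.c -o count
$ ./count < input.txt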

CAVEATS

Understanding regular expressions is crucial for effective use. The generated code can be verbose, and debugging complex lexical rules or interactions with parser generators requires careful attention to error handling within actions.

INTEGRATION WITH PARSERS

flex scanners are most often used as the frontend for parsers generated by tools like bison or yacc. The yylex() function, generated by flex, reads input and returns integer token codes (and optionally semantic values via yylval) that correspond to the terminal symbols expected by the parser.
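
A hedged sketch of such a scanner fragment; the header name parser.tab.h, the token names NUMBER and IDENT, and the yylval fields num and str are assumptions about a matching bison grammar (for example, one declaring %union { int num; char *str; }):

%{
#include <stdlib.h>
#include <string.h>
#include "parser.tab.h"   /* token codes and yylval declaration from bison */
%}
%option noyywrap
%%
[0-9]+                   { yylval.num = atoi(yytext); return NUMBER; }
[A-Za-z_][A-Za-z0-9_]*   { yylval.str = strdup(yytext); return IDENT; }
[ \t\r\n]+               { /* skip whitespace */ }
.                        { return yytext[0]; }
%%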

SPECIFICATION FILE STRUCTURE

A flex specification file is divided into three sections, separated by %%: the definitions section (macros, included C code), the rules section (regular expressions and corresponding C actions), and the user code section (additional C functions or data for the actions).
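
A bare skeleton with hypothetical contents; everything between the two %% lines is the rules section:

%{
/* Definitions section: C declarations copied to the top of lex.yy.c. */
#include <stdio.h>
%}
WORD    [A-Za-z]+
%%
{WORD}    { printf("word: %s\n", yytext); }
%%
/* User code section: copied verbatim to the end of lex.yy.c. */
int yywrap(void) { return 1; }
int main(void) { return yylex(); }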

HISTORY

flex is a free software re-implementation of the original lex program, which was developed at Bell Labs in the 1970s. flex was written in 1987 by Vern Paxson to provide a more robust, faster, and freely distributable alternative to lex while remaining largely backward compatible with existing lex specifications. It has since become the de facto standard lexical analyzer generator in Unix-like environments due to its performance and feature set.

SEE ALSO

bison(1), lex(1), yacc(1), grep(1), sed(1)
