LinuxCommandLibrary

bison

Create a parser from a grammar definition

TLDR

Compile a bison definition file

$ bison [path/to/file.y]

Compile in debug mode, which instruments the resulting parser to write trace information to stderr when the yydebug variable is nonzero
$ bison [[-t|--debug]] [path/to/file.y]

Specify the output filename
$ bison [[-o|--output]] [path/to/output.c] [path/to/file.y]

Be verbose when compiling, writing a report of the grammar and parser states to a '.output' file
$ bison [[-v|--verbose]] [path/to/file.y]

SYNOPSIS

bison [OPTION]... FILE

PARAMETERS

-d, --defines
    Generate a header file (e.g., .tab.h) with token definitions and other declarations for the parser.

-o FILE, --output=FILE
    Specify the name of the output parser file, e.g., 'parser.c'.

-g, --graph
    Output a graph of the parser automaton for visualization. Current versions write Graphviz DOT; old versions wrote VCG (Visualization of Compiler Graphs).

-v, --verbose
    Output a verbose description of the grammar, including conflicts (shift/reduce, reduce/reduce) and states.

-t, --debug
    Instrument the generated parser for tracing. Trace output is written at run time when 'yydebug' is set to a nonzero value.

-y, --yacc
    Emulate POSIX Yacc. Output files are named as Yacc names them ('y.tab.c', 'y.tab.h', 'y.output'), and some default behaviors change to match.

--file-prefix=PREFIX
    Specify a prefix for all generated file names, overriding the default based on the input file.

--language=LANG
    Specify the target programming language for the generated parser (e.g., 'c', 'c++', 'java').

--name-prefix=PREFIX
    Rename the parser's external symbols (functions, variables) so they begin with PREFIX instead of the default 'yy'; for example, 'yyparse' becomes 'PREFIXparse'.

DESCRIPTION

bison is the GNU Project's general-purpose parser generator.
It is compatible with Yacc (Yet Another Compiler Compiler).
It takes a context-free grammar description (LALR(1) by default) and generates source code in C, C++, Java, or another supported language for a parser that recognizes valid sentences of that grammar.
Generated parsers are LALR(1) by default and report syntax errors in their input.
It is commonly used in conjunction with a lexical analyzer generator like flex (Fast Lexical Analyzer), where flex handles tokenization (lexing) and bison handles parsing the sequence of tokens according to the grammar rules.
This combination is fundamental in building compilers, interpreters, and other language processing tools.
bison helps automate the tedious and error-prone task of writing parsers manually, allowing developers to focus on the grammar rules themselves.
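
The flex/bison split described above can be sketched roughly as follows. This is a minimal illustration, not a canonical setup: the file names 'lexer.l', 'parser.y', and 'nums' are hypothetical, and the script assumes that bison, flex, and a C compiler named 'cc' are installed (it skips the build otherwise).

```shell
# Grammar: the parser accepts a sequence of NUM tokens and prints each one.
cat > parser.y <<'EOF'
%{
#include <stdio.h>
int yylex(void);
void yyerror(const char *msg);
%}
%token NUM
%%
input : /* empty */
      | input NUM   { printf("number %d\n", $2); }
      ;
%%
void yyerror(const char *msg) { fprintf(stderr, "%s\n", msg); }
int main(void) { return yyparse(); }
EOF

# Lexer: flex turns characters into the tokens declared in parser.tab.h.
cat > lexer.l <<'EOF'
%option noyywrap
%{
#include <stdlib.h>
#include "parser.tab.h"
%}
%%
[0-9]+    { yylval = atoi(yytext); return NUM; }
[ \t\n]+  { /* skip whitespace */ }
.         { return yytext[0]; }
%%
EOF

if command -v bison >/dev/null && command -v flex >/dev/null && command -v cc >/dev/null; then
    bison -d parser.y                     # writes parser.tab.c and parser.tab.h
    flex lexer.l                          # writes lex.yy.c
    cc -o nums parser.tab.c lex.yy.c      # link parser and lexer together
    echo '4 8 15' | ./nums                # one "number N" line per token
else
    echo "bison, flex, or cc not found; skipping the build step"
fi
```

The header produced by '-d' is what lets the lexer and parser agree on token numbers and on the type of 'yylval'.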

CAVEATS

bison, by default, generates LALR(1) parsers.
This means they cannot handle all context-free grammars, especially those requiring more than one token of lookahead to resolve ambiguities.
Common issues include:
Shift/Reduce Conflicts: Occur when the parser cannot decide whether to shift the next token onto the stack or reduce a grammar rule. Bison defaults to shifting.
Reduce/Reduce Conflicts: Occur when the parser has two or more grammar rules that apply to the same sequence of tokens. Bison resolves by choosing the rule that appears first in the grammar file.
Both types of conflicts often indicate ambiguities in the grammar that need to be resolved by the grammar designer to ensure correct parsing behavior.
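
As a rough illustration of a shift/reduce conflict, the sketch below builds a deliberately ambiguous grammar and a variant that resolves the ambiguity with a precedence declaration. The file names 'ambig.y' and 'fixed.y' are hypothetical, and the exact wording of bison's diagnostics varies between versions.

```shell
# Ambiguous: "1 - 2 - 3" can be grouped as (1 - 2) - 3 or 1 - (2 - 3).
cat > ambig.y <<'EOF'
%token NUM
%%
expr : NUM
     | expr '-' expr
     ;
EOF

# Same rules, but %left declares '-' left-associative, which removes the conflict.
cat > fixed.y <<'EOF'
%token NUM
%left '-'
%%
expr : NUM
     | expr '-' expr
     ;
EOF

if command -v bison >/dev/null; then
    bison -v ambig.y                      # warns about a shift/reduce conflict
    grep -i 'shift/reduce' ambig.output   # the report names the conflicting state
    bison -v fixed.y                      # no conflict warning expected
else
    echo "bison not found; skipping"
fi
```

Without the '%left' line, the default choice of shifting would group the expression as 1 - (2 - 3), which is usually not what a subtraction operator should do.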

INPUT FILE FORMAT

bison expects a grammar description file, conventionally named with a '.y' extension (e.g., 'grammar.y').
This file is typically divided into three main sections, separated by '%%':
1. Declarations (e.g., token definitions, C code inclusions).
2. Grammar Rules (the core of the language definition).
3. Additional C/C++/Java Code (e.g., custom functions, error reporting).
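
The three sections can be seen in a small, self-contained grammar such as the one below. It is only a sketch: the file name 'calc.y' and the hand-written 'yylex' are illustrative assumptions, and the build step runs only if bison and a C compiler named 'cc' are available.

```shell
cat > calc.y <<'EOF'
%{
/* Section 1: declarations. Code between %{ and %} is copied into the parser. */
#include <stdio.h>
#include <ctype.h>
int yylex(void);
void yyerror(const char *msg);
%}
%token NUM
%left '+'
%left '*'
%%
/* Section 2: grammar rules with C actions. */
input : /* empty */
      | input line
      ;
line  : '\n'
      | expr '\n'       { printf("%d\n", $1); }
      ;
expr  : NUM
      | expr '+' expr   { $$ = $1 + $3; }
      | expr '*' expr   { $$ = $1 * $3; }
      | '(' expr ')'    { $$ = $2; }
      ;
%%
/* Section 3: additional C code, here a tiny hand-written lexer and main(). */
int yylex(void) {
    int c;
    while ((c = getchar()) == ' ' || c == '\t')
        ;                                   /* skip blanks */
    if (c == EOF) return 0;                 /* 0 tells the parser: end of input */
    if (isdigit(c)) {
        ungetc(c, stdin);
        if (scanf("%d", &yylval) != 1) return 0;
        return NUM;
    }
    return c;                               /* operators, parentheses, newline */
}
void yyerror(const char *msg) { fprintf(stderr, "%s\n", msg); }
int main(void) { return yyparse(); }
EOF

if command -v bison >/dev/null && command -v cc >/dev/null; then
    bison calc.y                  # writes calc.tab.c
    cc -o calc calc.tab.c
    echo '1 + 2 * 3' | ./calc     # prints 7
else
    echo "bison or cc not found; skipping the build step"
fi
```

Because '*' is declared after '+', it binds more tightly, so the input is evaluated as 1 + (2 * 3).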

OUTPUT FILES

By default, bison names the parser source file after the input file, replacing its extension with '.tab.c' (e.g., 'grammar.y' produces 'grammar.tab.c'); C++ grammars conventionally use a matching extension, so 'grammar.yy' produces 'grammar.tab.cc'.
This file contains the parser logic, including the 'yyparse()' function.
When the '-d' or '--defines' option is used, it also generates a corresponding header file (e.g., 'grammar.tab.h') containing token definitions and other necessary declarations for linking with the lexical analyzer.
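
The naming rules can be checked with a throwaway grammar, as in the sketch below. The file names 'tiny.y' and 'parser.c' are hypothetical, and the script does nothing beyond writing the grammar if bison is not installed.

```shell
# A minimal grammar, just enough for bison to generate output files.
cat > tiny.y <<'EOF'
%token WORD
%%
list : /* empty */
     | list WORD
     ;
EOF

if command -v bison >/dev/null; then
    bison tiny.y                  # tiny.tab.c
    bison -d tiny.y               # tiny.tab.c and tiny.tab.h
    bison -d -o parser.c tiny.y   # parser.c and parser.h
    bison -y -d tiny.y            # Yacc-style names: y.tab.c and y.tab.h
    ls
else
    echo "bison not found; skipping"
fi
```

When '-o' is combined with '-d', the header name follows the chosen output name rather than the input file name.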

HISTORY

bison was developed as part of the GNU Project to provide a free software replacement for the proprietary Yacc (Yet Another Compiler Compiler).
It was originally written by Robert Corbett in the mid-1980s, and Richard Stallman later made it Yacc-compatible.
Over the years, bison has significantly evolved, adding support for C++, Java, and other languages, as well as more advanced features like GLR parsing (Generalized LR) for ambiguous grammars, although LALR(1) remains the default.
Its development is ongoing, reflecting advancements in language parsing theory and software engineering practices.
It has become a cornerstone tool for many open-source projects requiring custom language processing, cementing its role in the compiler construction ecosystem.

SEE ALSO

flex(1), yacc(1), lex(1), gcc(1), make(1)
