yacc
Generate a parser from grammar specification
TLDR
Generate the C parser source y.tab.c from a grammar file, together with the header y.tab.h holding the token definitions (the header is written only when -d is given): yacc -d grammar.y
Also write y.output, a description of the parser and a report of the conflicts caused by ambiguities in the grammar: yacc -d -v grammar.y
Use prefix instead of y at the start of the output file names: yacc -d -v -b prefix grammar.y
SYNOPSIS
yacc [options] filename
PARAMETERS
-b file_prefix
Uses file_prefix instead of the default y as the prefix of the output file names (e.g., file_prefix.tab.c, file_prefix.tab.h).
-d
Generates a header file (default: y.tab.h) containing definitions for the token names (enums or #defines) used in the grammar. The lexical analyzer includes this file so that it returns the same token codes the parser expects.
-l
Suppresses the generation of #line directives in the output C code. Without them, compiler diagnostics and debuggers refer to lines in the generated C file rather than in the yacc input file, which helps when debugging the generated parser itself.
-o outfile
Specifies the name of the output parser C source file instead of the default y.tab.c. This option is not part of POSIX yacc, but byacc and bison both support it.
-p prefix
Changes the prefix of the external names used in the generated parser (e.g., yyparse becomes prefixparse, yylex becomes prefixlex). Useful for avoiding name collisions when linking multiple parsers into one executable.
-v
Generates a verbose description file (default: y.output) that summarizes the grammar, the parser's state machine, and any conflicts (shift/reduce, reduce/reduce) yacc found. This file is the main aid for tracking down grammar ambiguities.
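To see what a conflict report looks like, the following deliberately ambiguous grammar can be run through yacc -v. It is a minimal sketch: the token name NUMBER is made up for illustration, and the file contains no lexer or main(), so it is meant only for processing with yacc, not for linking.

```yacc
/* Ambiguous on purpose: "1 + 2 + 3" can be grouped as (1 + 2) + 3
 * or as 1 + (2 + 3), so after reading "expr + expr" with another '+'
 * ahead, the parser cannot decide whether to shift or to reduce. */
%token NUMBER

%%
expr
    : expr '+' expr
    | NUMBER
    ;
```

yacc should report a shift/reduce conflict for this grammar, and y.output lists the state in which it occurs. The exact layout of y.output differs between implementations. Declaring the operator's associativity in the declarations section, for example with %left '+', removes the conflict.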
DESCRIPTION
yacc, short for "Yet Another Compiler-Compiler", generates a parser (the syntactic analysis part of a compiler) from a formal grammar specification. It is widely used to build compilers, interpreters, and other language processing tools.
yacc reads a grammar file, conventionally named with a .y extension, that describes a language in a notation similar to Backus-Naur Form (BNF) and optionally attaches a C code action to each grammar rule. From it, yacc generates a C source file (y.tab.c by default) containing the LALR(1) parsing tables and the parser routine yyparse(). This generated code is then compiled with a standard C compiler.
yacc is usually paired with a lexical analyzer generator such as lex or flex: the lexer splits the input into tokens (lexical analysis), and the generated parser recognizes the grammar's structure in that token stream and runs the matching actions (syntactic analysis). yacc does not build a syntax tree on its own; the actions construct one if the program needs it.
CAVEATS
The original yacc is rarely installed on modern Linux distributions. Where a yacc command exists, it is usually provided by Berkeley Yacc (byacc) or by GNU Bison running in its yacc-compatibility mode (bison -y). Bison offers more features and better error reporting, and it is actively maintained. It accepts most yacc grammars, but the two are not fully interchangeable: some behavior differs between implementations, and grammars that rely on Bison extensions will not build with a traditional yacc. For new projects, bison is generally recommended over traditional yacc.
GRAMMAR FILE STRUCTURE
A yacc grammar file (e.g., parser.y) typically consists of three sections, separated by %% delimiters:
1. Declarations: Contains C declarations enclosed in %{ and %}, plus directives such as %token, %type, and %start that define the tokens, the types of non-terminals, and the start symbol.
2. Grammar Rules: Defines the language's syntax using production rules (e.g., expression: term '+' term ;) and the associated C code actions, which run when a rule is matched.
3. User Subroutines: Includes the additional C functions the parser needs, such as main(), yyerror() (for reporting errors), and yylex() (the lexical analyzer function, usually provided by lex/flex). This section is copied unchanged into the generated file and may be omitted.
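The three sections can be sketched in one small grammar file. This is an illustrative example rather than a reference implementation: the file name calc.y, the rule names input, line, and expr, and the hand-written yylex() are assumptions made for the sketch. It relies on the default int semantic value type.

```yacc
%{
/* Section 1, declarations: C code between %{ and %} is copied
 * verbatim to the top of the generated parser. */
#include <stdio.h>
#include <ctype.h>

int yylex(void);
int yyparse(void);
void yyerror(const char *msg);
%}

%token NUMBER        /* token returned by yylex() for integer literals */
%left '+'            /* lowest precedence, left associative */
%left '*'            /* binds tighter than '+' */
%start input

%%
/* Section 2, grammar rules with C actions. */
input
    : /* empty */
    | input line
    ;

line
    : '\n'
    | expr '\n'          { printf("%d\n", $1); }
    ;

expr
    : NUMBER             { $$ = $1; }
    | expr '+' expr      { $$ = $1 + $3; }
    | expr '*' expr      { $$ = $1 * $3; }
    | '(' expr ')'       { $$ = $2; }
    ;
%%
/* Section 3, user subroutines. */

/* Hand-written lexer: skips blanks, returns NUMBER for digit runs,
 * 0 at end of input, and any other character as its own token. */
int yylex(void)
{
    int c;

    while ((c = getchar()) == ' ' || c == '\t')
        ;
    if (c == EOF)
        return 0;
    if (isdigit(c)) {
        int value = 0;
        do {
            value = value * 10 + (c - '0');
            c = getchar();
        } while (isdigit(c));
        ungetc(c, stdin);      /* put back the first non-digit */
        yylval = value;        /* semantic value seen as $1 in the rules */
        return NUMBER;
    }
    return c;
}

void yyerror(const char *msg)
{
    fprintf(stderr, "error: %s\n", msg);
}

int main(void)
{
    return yyparse();
}
```

Assuming the file is saved as calc.y, it can be built with yacc calc.y followed by cc y.tab.c -o calc. Given the input line 2+3*4, the program prints 14, because the %left declarations give '*' higher precedence than '+'.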
OUTPUT FILES
When invoked with a grammar file (e.g., grammar.y), yacc always writes y.tab.c and, if -d is given, also writes y.tab.h:
y.tab.c: The C source file containing the generated parser function (yyparse() unless renamed with -p) and the LALR(1) parsing tables. It is compiled and linked with the lexical analyzer and any other program code to produce an executable.
y.tab.h: A C header file, generated only when the -d option is used, defining the token numbers for the terminal symbols declared in the grammar. The lexical analyzer (e.g., one generated by lex or flex) includes this file so that it uses the same token definitions as the parser.
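The role of the shared token definitions can be illustrated with a hand-written lexer. The sketch below is self-contained, so it defines stand-ins for what y.tab.h and the generated parser would normally supply. The token names NUMBER and IDENT, their numeric values, and the helper lexer_set_input() are assumptions made for illustration; in a real build the names come from the grammar's %token declarations and the numbers are chosen by yacc.

```c
#include <ctype.h>

/* Stand-ins for the contents of y.tab.h. In a real project, replace
 * these two lines with: #include "y.tab.h" */
#define NUMBER 257   /* illustrative value; yacc assigns the real one */
#define IDENT  258   /* illustrative value; yacc assigns the real one */

/* Semantic value of the current token. Normally this variable is
 * defined in the generated parser, not in the lexer. */
int yylval;

/* Hypothetical helper: the text the lexer reads from. */
static const char *input_cursor = "";

void lexer_set_input(const char *text)
{
    input_cursor = text;
}

/* Returns the next token code: NUMBER or IDENT for multi-character
 * tokens, the character itself for single-character tokens such as
 * '+', and 0 at end of input. */
int yylex(void)
{
    while (*input_cursor == ' ' || *input_cursor == '\t')
        input_cursor++;

    if (*input_cursor == '\0')
        return 0;

    unsigned char c = (unsigned char)*input_cursor;

    if (isdigit(c)) {
        int value = 0;
        while (isdigit((unsigned char)*input_cursor)) {
            value = value * 10 + (*input_cursor - '0');
            input_cursor++;
        }
        yylval = value;
        return NUMBER;
    }

    if (isalpha(c) || c == '_') {
        while (isalnum((unsigned char)*input_cursor) || *input_cursor == '_')
            input_cursor++;
        return IDENT;
    }

    input_cursor++;
    return c;
}
```

Because parser and lexer agree on the token codes only through these definitions, hard-coding the numbers as done here is fragile. Including the header produced by yacc -d keeps both sides in step whenever the grammar changes.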
HISTORY
yacc was originally developed by Stephen C. Johnson at Bell Labs in the early 1970s for the Unix operating system. It became a standard tool for compiler construction and influenced the design of later language processing utilities. Its adoption led to compatible and extended implementations, most notably GNU Bison, a free software replacement for yacc that adds numerous extensions and improvements. Although the original yacc is uncommon on modern systems, its grammar format lives on in bison and similar tools.