NAME
yacc - Generates an LR(1) parsing program from input consisting of a
context-free grammar specification
SYNOPSIS
yacc [-vltds] [-b prefix] [-N number] [-p symbol_prefix] [-P pathname]
grammar
The yacc command converts a context-free grammar specification into a set
of tables for a simple automaton that executes an LR(1) parsing algorithm.
FLAGS
-b prefix
Uses prefix instead of y as the prefix for all output filenames
(prefix.tab.c, prefix.tab.h, and prefix.output).
-d Produces the y.tab.h file, which contains the #define statements that
associate the yacc-assigned token codes with your token names. This
allows source files other than y.tab.c to access the token codes by
including this header file.
-l Includes no #line constructs in y.tab.c. Use this only after the gram-
mar and associated actions are fully debugged.
-N number
Provides yacc with extra storage for building its LALR tables, which
may be necessary when compiling very large grammars. The number should
be larger than 40,000 when you use this flag.
-p symbol_prefix
Allows multiple yacc parsers to be linked together. Use symbol_prefix
instead of yy to prefix global symbols.
-P pathname
Specifies an alternative parser (instead of /usr/ccs/lib/yaccpar). The
pathname specifies the filename of the skeleton to be used in place of
yaccpar.
-s Breaks the yyparse() function into several smaller functions. Because
its size is somewhat proportional to that of the grammar, it is possi-
ble for yyparse() to become too large to compile, optimize, or execute
efficiently.
-t Compiles run-time debugging code. By default, this code is not
included when y.tab.c is compiled. If YYDEBUG has a nonzero value, the
C compiler (cc) includes the debugging code, whether or not the -t flag
was used. Without compiling this code, yyparse() will run more
quickly.
-v Produces the y.output file, which contains a readable description of
parsing tables and a report on conflicts generated by grammar ambiguities.
The yacc program reads its skeleton parser from the file
/usr/ccs/lib/yaccpar. Use the environment variable PARSER to specify
another location for yacc to read from.
Syntax for yacc Input
This section contains a formal description of the yacc input file (or gram-
mar file), which is normally named with a .y suffix. The section provides
a listing of the special values, macros, and functions recognized by yacc.
The general format of the yacc input file is:
[ definitions ]
%%
[ rules ]
[ %%
[ user functions ] ]
where
definitions
Is the section where you define the variables to be used later in
the grammar, such as in the rules section. It is also where
files are included (#include) and processing conditions are
defined. This section is optional.
rules Is the section that contains grammar rules for the parser. A
yacc input file must have a rules section.
user functions
Is the section that contains user-supplied functions that can be
used by the actions in the rules section. This section is
optional.
The NULL character must not be used in grammar rules or literals. Each
line in the definitions can be:
%{
%} When placed on lines by themselves, these enclose C code to be
passed into the global definitions of the output file. Such
lines commonly include preprocessor directives and declarations
of external variables and functions.
%token [type] token [number] [token [number]] ...
Lists tokens or terminal symbols to be used in the rest of the input
file. This line is needed for tokens that do not appear in other
% definitions. If type is present, the C type for all tokens on
this line is declared to be the type referenced by type. If a
positive integer number follows a token, that value is assigned
to the token.
that the token cannot be used associatively.
%start symbol
Indicates the highest-level production rule to be reduced; in
other words, the rule where the parser can consider its work done
and terminate. If this definition is not included, the parser
uses the first production rule. The symbol must be non-terminal
(not a token).
%type < type > symbol [ symbol ... ]
Defines each symbol as data type type, to resolve ambiguities. If
this construct is present, yacc performs type checking; otherwise,
it assumes all symbols to be of type integer.
%union union-def
Defines the yylval global variable as a union, where union-def is
a standard C definition in the format:
{ type member ; [ type member ; ... ] }
At least one member should be an int. Any valid C data type can
be defined, including structures. When you run yacc with the -d
option, the definition of yylval is placed in the y.tab.h file
and can be referred to in a lex input file.
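As a rough illustration of the %union definition described above, the following C sketch shows approximately what a declaration such as %union { int ival; char *sval; } corresponds to in the generated code. The member names ival and sval and the two helper functions are hypothetical; the exact text that yacc emits varies between implementations.

```c
#include <assert.h>
#include <string.h>

/* Approximate C equivalent of:  %union { int ival; char *sval; }
   The member names are hypothetical and chosen for illustration. */
typedef union {
    int   ival;   /* for example, the value of a numeric token */
    char *sval;   /* for example, the text of an identifier token */
} YYSTYPE;

YYSTYPE yylval;   /* written by yylex(), read by the parser */

/* Hypothetical helpers showing how a lexical analyzer fills in yylval. */
static void set_number(int n)    { yylval.ival = n; }
static void set_name(char *text) { yylval.sval = text; }
```

Only one member of the union holds a meaningful value at a time, so the lexical analyzer and the grammar must agree on which member belongs to which token.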
Every token (terminal symbol) must be listed in one of the preceding %
definitions. Multiple tokens can be separated by white space or commas.
All the tokens in %left, %right, and %nonassoc definitions are assigned a
precedence with tokens in later definitions having precedence over those in
earlier definitions.
In addition to symbols, a token can be a literal character enclosed in single
quotes. (Multibyte characters are recognized by the lexical analyzer and
returned as tokens.) The following special characters can be used, just as
in C programs:
\a Alert
\n Newline
\t Tab
\v Vertical tab
\r Carriage Return
\b Backspace
\f Form Feed
\\ Backslash
\' Single Quote
minal symbols must be declared in %token definitions.
Each symbol-sequence represents an alternative way of reducing the rule. A
symbol can appear recursively in its own rule. Always use left-recursion
(where the recursive symbol appears before the terminating case in
symbol-sequence).
The specific sequence:
%prec token
indicates that the current sequence of symbols is to be preferred over oth-
ers, at the level of precedence assigned to token in the definitions sec-
tion.
The specially defined token error matches any unrecognized sequence of
input. This token causes the parser to invoke the yyerror function. By
default, the parser tries to synchronize with the input and continue pro-
cessing it by reading and discarding all input up to the symbol following
error. (You can override this behavior through the yyerrok action.) If no
error token appears in the yacc input file, the parser exits with an error
message upon encountering unrecognized input.
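The default recovery described above can be modeled, in a much simplified form, as skipping tokens until the symbol that follows error is seen. The sketch below is not the code that yacc generates; the function name skip_to_sync and its array interface are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified model of the default error recovery: starting at position
   pos, discard tokens until sync_token (the symbol that follows "error"
   in the rule) is found. Returns the index of the first token after
   sync_token, or count if sync_token never appears.
   The name and interface are hypothetical. */
static size_t skip_to_sync(const int *tokens, size_t count,
                           size_t pos, int sync_token)
{
    while (pos < count && tokens[pos] != sync_token)
        pos++;                       /* discard one token */
    return (pos < count) ? pos + 1 : count;
}
```

With a rule such as list error '\n', the synchronizing token would be the newline character, so one bad input line is dropped and parsing resumes with the next line.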
The parser always executes action after encountering the symbol that pre-
cedes it. Thus, an action can appear in the middle of a symbol-sequence,
after each symbol-sequence, or after multiple instances of symbol-sequence.
In the last case, action is executed when the parser matches any of the
sequences.
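To make the timing of an action concrete, the following C sketch mimics what happens when the parser has matched expr '+' expr and runs an action such as { $$ = $1 + $3; } (taken from calc.y in the EXAMPLES section). It assumes a miniature value stack; the names value_stack, push_value, and reduce_add are hypothetical, and a real yacc parser keeps its own, differently named stacks.

```c
#include <assert.h>

/* Hypothetical miniature value stack. */
#define STACK_MAX 64
static int value_stack[STACK_MAX];
static int top = 0;                 /* number of values currently stacked */

static void push_value(int v)
{
    assert(top < STACK_MAX);
    value_stack[top++] = v;
}

/* Mimics reducing:  expr : expr '+' expr { $$ = $1 + $3; }  */
static int reduce_add(void)
{
    assert(top >= 3);
    int *rhs = &value_stack[top - 3];   /* rhs[0] is $1, rhs[1] is $2, rhs[2] is $3 */
    int result = rhs[0] + rhs[2];       /* $$ = $1 + $3 */
    top -= 3;                           /* pop the matched symbols */
    push_value(result);                 /* push $$ in their place */
    return result;
}
```

The point of the sketch is that the action runs only once all the symbols before it have been recognized, and that its result replaces their values.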
The action consists of standard C code within braces and can also take the
following values, variables, and keywords.
yylval If the token returned by the yylex function is associated with a
significant value, yylex should place the value in this global
variable. By default, yylval is of type int. The definitions
section can include a %union definition to associate with other
data types, including structures. If you run yacc with the -d
option, the full yylval definition is passed into the y.tab.h
file for access by lex.
yyerrok Causes the parser to start parsing tokens immediately after an
erroneous sequence, instead of performing the default action of
reading and discarding tokens up to a synchronization token. The
yyerrok action should appear immediately after the error token.
$ [ <type> ] n
Refers to symbol n, a token index in the production, counting
from the beginning of the production rule, where the first symbol
after the colon is $1. The type variable is the name of one of
the union lines listed in the %union directive in the declaration
section. The <type> syntax (non-standard) allows the value to be
cast to a specific data type. Note that you will rarely need to
The following functions, which are contained in the user functions section,
are invoked within the yyparse function generated by yacc.
yylex() The lexical analyzer called by yyparse to recognize each token of
input. Usually this function is created by lex. yylex reads
input, recognizes expressions within the input, and returns a
token number representing the kind of token read. The function
returns an int value. A return value of 0 (zero) means the end
of input.
If the parser and yylex do not agree on these token numbers,
reliable communication between them cannot occur. For (one char-
acter) literals, the token is simply the numeric value of the
character in the current character set. The numbers for other
tokens can either be chosen by yacc, or by the user. In either
case, the #define construct of C is used to allow yylex () to
return these numbers symbolically. The #define statements are put
into the code file, and the header file if that file is
requested. The set of characters permitted by yacc in an identif-
ier is larger than that permitted by C. Token names found to con-
tain such characters will not be included in the #define declara-
tions.
If the token numbers are chosen by yacc, the tokens other than
literals are assigned numbers greater than 256, although no
order is implied. A token can be explicitly assigned a number by
following its first appearance in the declaration section with a
number. Names and literals not defined this way retain their
default definition. All assigned token numbers are unique and
distinct from the token numbers used for literals. If duplicate
token numbers cause conflicts in parser generation, yacc reports
an error; otherwise, it is unspecified whether the token assign-
ment is accepted or an error is reported.
The end of the input is marked by a special token called the end-
marker that has a token number that is zero or negative. All lex-
ical analyzers return zero or negative as a token number upon
reaching the end of their input. If the tokens up to, but not
including, the endmarker form a structure that matches the start
symbol, the parser accepts the input. If the endmarker is seen
in any other context, it is considered an error.
yyerror(string)
The function that the parser calls upon encountering an input
error. The default function, defined in liby.a, simply prints
string to the standard error. The user can redefine the func-
tion. The function's type is void.
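The token numbering conventions described for yylex() can be illustrated with a small hand-written lexical analyzer. This is only a sketch: the token codes 257 and 258, the set_input helper, and the choice to scan a string instead of standard input are all assumptions made for the example. In practice the codes come from the #define statements that yacc generates.

```c
#include <assert.h>
#include <ctype.h>

/* Hypothetical token codes; yacc -d normally writes such #define lines
   to y.tab.h. Values above 256 cannot collide with character literals. */
#define DIGIT  257
#define LETTER 258

int yylval;                        /* value of the current token */
static const char *input = "";     /* text still to be scanned */

/* Hypothetical helper: point the scanner at a new string. */
void set_input(const char *text) { input = text; }

int yylex(void)
{
    while (*input == ' ')          /* skip blanks */
        input++;
    if (*input == '\0')
        return 0;                  /* endmarker: end of input */

    unsigned char c = (unsigned char)*input++;
    if (isdigit(c)) { yylval = c - '0'; return DIGIT; }
    if (islower(c)) { yylval = c - 'a'; return LETTER; }
    return c;                      /* literal: its own character code */
}
```

For the input m = 4 this sketch returns LETTER, then the character code of '=', then DIGIT, and finally 0 to mark the end of input.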
The liby.a library contains default main() and yyerror() functions. These
look like the following, respectively:
Comments, in C syntax, can appear anywhere in the user functions or defini-
tions sections. In the rules section, comments can appear wherever a sym-
bol is allowed. Blank lines or lines consisting of white space can be
inserted anywhere in the file, and are ignored.
EXAMPLES
This section describes the example programs for the lex and yacc commands,
which together create a simple desk calculator program that performs addi-
tion, subtraction, multiplication, and division operations. The calculator
program also allows you to assign values to variables (each designated by a
single lowercase ASCII letter), and then use the variables in calculations.
The files that contain the program are as follows:
calc.l
The lex specification file that defines the lexical analysis rules.
calc.y
The yacc grammar file that defines the parsing rules and calls the
yylex() function created by lex to provide input.
The remaining text expects that the current directory is the directory that
contains the lex and yacc example program files.
Compiling the Example Program
Perform the following steps to create the example program using lex and
yacc:
1. Process the yacc grammar file using the -d flag. The -d flag tells
yacc to create a file that defines the tokens it uses in addition to
the C language source code.
yacc -d calc.y
2. The following files are created (the *.o files are created temporarily
and then removed):
y.tab.c
The C language source file that yacc created for the parser.
y.tab.h
A header file containing #define statements for the tokens used by
the parser.
3. Process the lex specification file:
lex calc.l
4. The following file is created:
The object file for lex.yy.c.
calc
The executable program file.
You can then run the program directly by entering:
calc
Then enter numbers and operators in calculator fashion. After you
press <Return>, the program displays the result of the operation. If
you assign a value to a variable as follows, the cursor moves to the
next line:
m=4 <Return>
_
You can then use the variable in calculations and it will have the
value assigned to it:
m+5 <Return>
9
The Parser Source Code
The text that follows shows the contents of the file calc.y. This file has
entries in all three of the sections of a yacc grammar file: declarations,
rules, and programs.
%{
#include <stdio.h>
int regs[26];
int base;
%}
%start list
%token DIGIT LETTER
%left '|'
%left '&'
%left '+' '-'
%left '*' '/' '%'
%left UMINUS /*supplies precedence for unary minus */
%% /*beginning of rules section */
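list : /*empty */
| list stat '\n'
| list error '\n'
{ yyerrok; }
;
stat : expr
{ printf("%d\n",$1); }
| LETTER '=' expr
{ regs[$1] = $3; }
;
expr : '(' expr ')'
{ $$ = $2; }
| expr '*' expr
{ $$ = $1 * $3; }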
| expr '/' expr
{ $$ = $1 / $3; }
| expr '%' expr
{ $$ = $1 % $3; }
| expr '+' expr
{ $$ = $1 + $3; }
| expr '-' expr
{ $$ = $1 - $3; }
| expr '&' expr
{ $$ = $1 & $3; }
| expr '|' expr
{ $$ = $1 | $3; }
| '-' expr %prec UMINUS
{ $$ = -$2; }
| LETTER
{ $$ = regs[$1]; }
| number
;
number : DIGIT
{ $$ = $1; base = ($1==0) ? 8:10; }
| number DIGIT
{ $$ = base * $1 + $2; }
;
%%
main()
{
return(yyparse());
}
yyerror(s)
char *s;
{
fprintf(stderr,"%s\n",s);
}
yywrap()
{
return(1);
}
Declarations Section
This section contains entries that perform the following functions:
+ Includes standard I/O header file.
+ Defines global variables.
tines are included in this file, you do not need to use the yacc library
when processing this file.
main() The required main program that calls yyparse() to start the pro-
gram.
yyerror(s) This error handling routine only prints a syntax error message.
yywrap() The wrap-up routine that returns a value of 1 when the end of
input occurs.
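The two number rules in calc.y build a value one digit at a time and choose the base from the first digit. The following C sketch restates that logic outside the grammar; the function name number_from_digits and its array interface are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Mirrors the two "number" rules in calc.y:
     number : DIGIT          { $$ = $1; base = ($1==0) ? 8:10; }
            | number DIGIT   { $$ = base * $1 + $2; }
   The function name and array interface are hypothetical. */
static int number_from_digits(const int *digits, size_t count)
{
    assert(count > 0);
    int base  = (digits[0] == 0) ? 8 : 10;    /* first rule */
    int value = digits[0];
    for (size_t i = 1; i < count; i++)
        value = base * value + digits[i];     /* second rule, left to right */
    return value;
}
```

A leading zero therefore selects octal, as in C. Like the grammar, the sketch does not reject the digits 8 and 9 in an octal number.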
The Lexical Analyzer Source Code
This shows the contents of the file calc.l. This file contains include
statements for standard input and output, as well as for the y.tab.h file.
The yacc program generates that file from the yacc grammar file
information, if you use the -d flag with the yacc command. The file y.tab.h
contains definitions for the tokens that the parser program uses. In
addition, calc.l contains the rules used to generate the tokens from the
input stream.
%{
#include <stdio.h>
#include "y.tab.h"
int c;
extern YYSTYPE yylval;
%}
%%
" " ;
[a-z] {
c = yytext[0];
yylval = c - 'a';
return(LETTER);
}
[0-9] {
c = yytext[0];
yylval = c - '0';
return(DIGIT);
}
[^a-z 0-9] {
c = yytext[0];
return(c);
}
FILES
y.output A readable description of parsing tables and a report on con-
flicts generated by grammar ambiguities.
y.tab.c Output file.
RELATED INFORMATION
Commands: lex(1).
Programming Support Tools