A formal grammar is a mathematical construct. To define the language for Bison, you must write a file expressing the grammar in Bison syntax: a Bison grammar file. See Bison Grammar Files.
A nonterminal symbol in the formal grammar is represented in Bison input as an identifier, like an identifier in C. By convention, it should be in lower case, such as expr, stmt or declaration.
The Bison representation for a terminal symbol is also called a token type. Token types can also be represented as C-like identifiers. By convention, these identifiers should be upper case to distinguish them from nonterminals: for example, INTEGER, IDENTIFIER, IF or RETURN. A terminal symbol that stands for a particular keyword in the language should be named after that keyword converted to upper case. The terminal symbol error is reserved for error recovery. See Symbols.
A terminal symbol can also be represented as a character literal, just like a C character constant. You should do this whenever a token is just a single character (parenthesis, plus-sign, etc.): use that same character in a literal as the terminal symbol for that token.
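The parser sees every token type as an integer code returned by the lexer. A named token type gets a code that Bison assigns, while a character-literal token's code is simply that character's value. The following is a minimal C sketch of a hand-written lexer built on that convention. The enum values, END_OF_INPUT and the function name next_token are hypothetical stand-ins for what Bison would generate and for the real yylex interface. They are assumptions for illustration only.

```c
#include <ctype.h>
#include <string.h>

/* Hypothetical stand-in for the token-type codes Bison would put in its
   generated header. The values are assumptions, kept above 255 so they
   cannot collide with any single-character token code. */
enum token_type {
    END_OF_INPUT = 0,
    INTEGER = 258,
    IDENTIFIER,
    IF,
    RETURN
};

/* Read one token from *input, advance *input past it, and return its
   token-type code. (next_token is a hypothetical name, not yylex.) */
int next_token(const char **input)
{
    const char *p = *input;

    while (isspace((unsigned char) *p))
        p++;

    if (*p == '\0') {
        *input = p;
        return END_OF_INPUT;
    }

    if (isdigit((unsigned char) *p)) {
        while (isdigit((unsigned char) *p))
            p++;
        *input = p;
        return INTEGER;
    }

    if (isalpha((unsigned char) *p) || *p == '_') {
        const char *start = p;
        while (isalnum((unsigned char) *p) || *p == '_')
            p++;
        size_t len = (size_t) (p - start);
        *input = p;
        /* Keywords get their own upper-case token type. */
        if (len == 2 && strncmp(start, "if", 2) == 0)
            return IF;
        if (len == 6 && strncmp(start, "return", 6) == 0)
            return RETURN;
        return IDENTIFIER;
    }

    /* Any other single character (parenthesis, plus sign, semicolon...)
       is its own token type: the code is the character itself. */
    *input = p + 1;
    return (unsigned char) *p;
}
```

Scanning "return x + 42;" with this sketch yields RETURN, IDENTIFIER, '+', INTEGER, ';' and then END_OF_INPUT. Returning the character itself means the lexer needs no name for single-character tokens, which matches how they are written as character literals in the grammar.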
A third way to represent a terminal symbol is with a C string constant containing several characters. See Symbols, for more information.
The grammar rules also have an expression in Bison syntax. For example, here is the Bison rule for a C return statement. The semicolon in quotes is a literal character token, representing part of the C syntax for the statement; the naked semicolon, and the colon, are Bison punctuation used in every rule.
stmt: RETURN expr ';' ;
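To show what this rule accepts, the following is a hand-written recursive-descent recognizer in C. Note that this is a different technique from the LALR parser Bison actually generates. It reads the rule's components left to right: the named token RETURN, the nonterminal expr, and the character token ';'. The enum values and function names are hypothetical. The rule expr: INTEGER | IDENTIFIER ; is an assumption made only so the sketch is complete, since this section does not define expr.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical token-type codes standing in for a Bison-generated header. */
enum token_type { END_OF_INPUT = 0, INTEGER = 258, IDENTIFIER, RETURN };

/* Cursor over an array of token-type codes ending in END_OF_INPUT. */
struct cursor {
    const int *tokens;
    size_t pos;
};

/* Consume the next token if it has the expected type. */
static bool accept(struct cursor *c, int expected)
{
    if (c->tokens[c->pos] == expected) {
        c->pos++;
        return true;
    }
    return false;
}

/* Assumed rule, for illustration only:  expr: INTEGER | IDENTIFIER ;  */
static bool parse_expr(struct cursor *c)
{
    return accept(c, INTEGER) || accept(c, IDENTIFIER);
}

/* stmt: RETURN expr ';' ;
   ';' is a character token, so its code is the character value itself. */
static bool parse_stmt(struct cursor *c)
{
    return accept(c, RETURN) && parse_expr(c) && accept(c, ';');
}

/* True if the whole token array is exactly one stmt. */
bool is_return_stmt(const int *tokens)
{
    struct cursor c = { tokens, 0 };
    return parse_stmt(&c) && c.tokens[c.pos] == END_OF_INPUT;
}
```

For example, the token sequence RETURN, IDENTIFIER, ';' is accepted. RETURN, INTEGER with no ';' is rejected, and so is RETURN, ';' with no expr.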