Chapter 2:
A Simple Syntax-directed Translator
Salam R. Al-E’mari
Adham University College
1
Languages are described by their syntaxes
and semantics
What is the meaning of Syntax?
•It means the form or structure of the expressions, statements and program
units. The Syntax rules of a language specify which strings of characters from
the language’s alphabet are in the language.
What is the meaning of Semantics?
It means the meaning of the expressions, statements, and program units.
2
3
character stream
Lexical Analyzer
Syntax Analyzer (Parser)
Semantic Analyzer
Intermediate Code Generator
Machine-Independent Code Optimizer
Code Generator
Machine-Dependent Code Optimizer
token stream
syntax tree
syntax tree
intermediate representation
intermediate representation
target-machine code
target-machine code
Symbol Table Phases of a compiler
Analysis part
Front end of the
compiler
Synthesis part
Back end of the
compiler
The Role of the Lexical
Analyzer
Syntax Definition
1. Definition of Grammars
2. Derivations
3. Parse trees
4. Ambiguity
5. Associativity of operators
6. Precedence of operators
4
5
1.Definition of Grammars
• Context-free grammar is a 4-tuple with
• A set of tokens (terminal symbols)
• A set of nonterminals
• A set of productions
• A designated start symbol
• Lexeme is basic component during lexical analysis. A lexeme consists of related alphabet
from the language. e.g. numeric literals, operators (+), and special words (begin).
• Token (variable names) : of a language is a category of its lexemes, and a lexeme is an
instance of a token.
Lexemes Tokens
index identifier
= equal_sign
2 int_literal
* mult_op
17 int_literal
; semicolon
Example: consider the following Java statement
index = 2 * count +17;
The lexemes and tokens of this statement are represented in the
following table
6
Formal Methods of Describing Syntax
The formal language that is used to describe the syntax of programming
languages is called grammar.
BNF (Backus-Naur Form) and Context-Free Grammars (FG)
BNF is widely accepted way to describe the syntax of a programming language.
Context-Free Grammars
Regular expression and context-free grammar are useful in describing the syntax of a
programming language. Regular expression describes how a token is made of alphabets, and
context-free grammar determines how the tokens are put together as a valid sentence in the
grammar.
7
BNF Fundamentals
• Metalanguage is a language that is used to describe another language. BNF is
a metalanguage for programming languages.
• The BNF consists of rules (or productions). A rule has a left-hand side
(LHS) as the abstraction, and a right-hand side (RHS) as the definition. As in
java assignment statement the definition could be as follows:
<assign> → <var> = <expression>
• The abstractions in a BNF description, or grammar, are often called
nonterminal symbols, or simply nonterminals.
• Lexemes and tokens of the rules are called terminal symbols, or simply
terminals.
8
9
Example Grammar
list  list + digit
list  list - digit
list  digit
digit  0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
G = <{list,digit}, {+,-,0,1,2,3,4,5,6,7,8,9}, P, list>
with productions P =
Context-free grammar for simple expressions:
10
2. Derivation
• Given a CF grammar we can determine the set of all strings (sequences of
tokens) generated by the grammar using derivation
• We begin with the start symbol
• In each step, we replace one nonterminal in the current sentential form with one
of the right-hand sides of a production for that nonterminal
11
Derivation for the Example Grammar
list
 list + digit
 list - digit + digit
 digit - digit + digit
 9 - digit + digit
 9 - 5 + digit
 9 - 5 + 2
This is an example leftmost derivation, because we replaced
the leftmost nonterminal (underlined) in each step.
Likewise, a rightmost derivation replaces the rightmost
nonterminal in each step
A derivation of a program in this
language follows:
<program> => begin <stmt_list> end
=> begin <stmt> ; <stmt_list> end
=> begin <var> = <expression>; <stmt_list> end
=> begin A= <expression>; <stmt_list> end
=> begin A = <var> + < var> ; <stmt_list> end
=> begin A = B+ < var> ; <stmt_list> end
=> begin A = B + C; <stmt_list> end
=> begin A = B + C; <stmt> end
=> begin A = B + C ; <var> = <expression>
end
=> begin A = B + C ; B = <expression> end
=> begin A = B + C ; B = <var> end
=> begin A = B + C ; B = C end
<program> → begin <stmt_list> end
<stmt_list> → <stmt>
| <stmt>; <stmt_list>
<stmt> → <var> = <expression>
<var> → A | B | C
<expression> → <var> + <var>
| <var> - <var>
| <var>
Example :
12
A = B*(A+C) is generated by
the leftmost derivation:
Homework:
a grammar for a Simple
Assignment Statements
<assign> → <id> = <expr>
<id> → A | B | C |
<expr> → <id> + <expr>
| <id>*<expr>
| (<expr>)
| <id>
13
14
3. Parse Trees
• The root of the tree is labeled by the start symbol
• Each leaf of the tree is labeled by a terminal (=token) or 
• Each interior node is labeled by a nonterminal
• If A  X1 X2 … Xn is a production, then node A has
immediate children X1, X2, …, Xn where Xi is a (non)terminal
or  ( denotes the empty string)
15
Parse Tree for the Example Grammar
Parse tree of the string 9-5+2 using grammar G
list
digit
9 - 5 + 2
list
list digit
digit
The sequence of
leaf is called the
yield of the parse tree
<assign>
A parse tree for the simple statement:
A = B * (A + C)
<id>
=
<expr>
<id>
*
<expr>
(
<expr>
)
<id>
+
<expr>
A
B
A
<id>
C
Parse Tree for the Example Grammar
16
17
4. Ambiguity
string  string + string | string - string | 0 | 1 | … | 9
G = <{string}, {+,-,0,1,2,3,4,5,6,7,8,9}, P, string>
with production P =
Consider the following context-free grammar:
This grammar is ambiguous, because more than one parse tree
represents the string 9-5+2
18
Ambiguity (cont’d)
string
string
9 - 5 + 2
string
string string
string
string
9 - 5 + 2
string
string string
<assign>
Ambiguity Example : Two distinct parse trees (generated by the grammar on
the next slide) for the same sentence, A = B +C * A
<id>
=
<expr>
<expr>
+
<expr>
<expr>
*
<expr>
<id>
A
<id>
C
<id>
A
B
<assign>
<id>
=
<expr>
<expr>
*
<expr>
<expr>
+
<expr>
<id>
A
<id>
B
<id>
C
A
19
20
5- Associativity of Operators
right  term = right | term
left  left + term | term
Left-associative operators have left-recursive productions
Right-associative operators have right-recursive productions
String a=b=c has the same meaning as a=(b=c)
String a+b+c has the same meaning as (a+b)+c
Operator associativity can also be indicated by a grammar.
<expr> => <expr> + const | const (unambiguous)
<expr> => <expr> + <expr> | const (ambiguous)
<expr>
<expr> + const
const
<expr> + const
Associativity of Operators
21
22
6. Precedence of Operators
expr  expr + term | term
term  term * factor | factor
factor  number | ( expr )
Operators with higher precedence “bind more tightly”
String 2+3*5 has the same meaning as 2+(3*5)
expr
expr term
factor
+
2 3 * 5
term
factor
term
factor
number
number
number
Example:
If we use the parse tree to indicate precedence levels of the
operators, we cannot have ambiguity
<expr> → <expr> - <term> | <term>
<term> → <term> / const | const
23
- Optional parts are placed in brackets ([ ])
<if_stmt> → if (<expression>) <statement> [else
<statement>]
- Alternative parts of RHSs in parentheses and separate them with
vertical bars
<term> → <term> (* | / | %) <factor>
- Put repetitions (0 or more) are placed inside braces ({})
<ident_list> → <identifier> {, <identifier>}
- EBNF has the same expression power as BNF
- The addition includes optional constructs (parts), repetition, and multiple choices,
very much like regular expression.
Extended BNF
24
BNF
<expr>  <expr> + <term>
| <expr> - <term>
| <term>
<term>  <term> * <factor>
| <term> / <factor>
| <factor>
EBNF
<expr>  <term> {(+ | -) <term>}
<term>  <factor> {(* | /) <factor>}
25
26
Syntax-Directed Translation
• Uses a CF grammar to specify the syntactic structure of the
language
• AND associates a set of attributes with the terminals and
nonterminals of the grammar
• AND associates with each production a set of semantic rules
to compute values of attributes
• A parse tree is traversed and semantic rules applied: after
the tree traversal(s) are completed, the attribute values on
the nonterminals contain the translated form of the input
27
Synthesized and Inherited Attributes
• An attribute is said to be:
• Synthesized if its value at a parse-tree node is determined from the attribute
values at the children of the node.
• Inherited if its value at a parse-tree node is determined by the parent (by
enforcing the parent’s semantic rules)
28
Example Attribute Grammar
expr  expr1 + term
expr  expr1 - term
expr  term
term  0
term  1
…
term  9
expr.t := expr1.t // term.t // “+”
expr.t := expr1.t // term.t // “-”
expr.t := term.t
term.t := “0”
term.t := “1”
…
term.t := “9”
Production Semantic Rule
String concat operator
29
Example Annotated Parse Tree
expr.t = “95-2+”
term.t = “2”
9 - 5 + 2
expr.t = “95-”
expr.t = “9” term.t = “5”
term.t = “9”
30
Depth-First Traversals
procedure visit(n : node);
begin
for each child m of n, from left to right do
visit(m);
evaluate semantic rules at node n
end
31
Depth-First Traversals (Example)
expr.t = “95-2+”
term.t = “2”
9 - 5 + 2
expr.t = “95-”
expr.t = “9” term.t = “5”
term.t = “9”
Note: all attributes are
of the synthesized type
32
Translation Schemes
• A translation scheme is a CF grammar embedded with semantic actions
rest  + term { print(“+”) } rest
Embedded
semantic action
rest
term rest
+ { print(“+”) }
33
Example Translation Scheme
expr  expr + term
expr  expr - term
expr  term
term  0
term  1
…
term  9
{ print(“+”) }
{ print(“-”) }
{ print(“0”) }
{ print(“1”) }
…
{ print(“9”) }
34
Example Translation Scheme (cont’d)
expr
term
9
-
5
+
2
expr
expr term
term
{ print(“-”) }
{ print(“+”) }
{ print(“9”) }
{ print(“5”) }
{ print(“2”) }
Translates 9-5+2 into postfix 95-2+
35
Parsing
• Parsing = process of determining if a string of tokens can be
generated by a grammar
• For any CF grammar there is a parser that takes at most O(n3
)
time to parse a string of n tokens
• Linear algorithms suffice for parsing programming language
source code
• Top-down parsing “constructs” a parse tree from root to leaves
• Bottom-up parsing “constructs” a parse tree from leaves to root
36
Predictive Parsing
• Recursive descent parsing is a top-down parsing method
• Each nonterminal has one (recursive) procedure tha tis responsible
for parsing the nonterminal’s syntactic category of input tokens
• When a nonterminal has multiple productions, each production is
implemented in a branch of a selection statement based on input
look-ahead information
• Predictive parsing is a special form of recursive descent
parsing where we use one lookahead token to
unambiguously determine the parse operations
37
Example Predictive Parser (Grammar)
type  simple
| ^ id
| array [ simple ] of type
simple  integer
| char
| num dotdot num
38
Example Predictive Parser (Program
Code)
procedure match(t : token);
begin
if lookahead = t then
lookahead := nexttoken()
else error()
end;
procedure type();
begin
if lookahead in { ‘integer’, ‘char’, ‘num’ } then
simple()
else if lookahead = ‘^’ then
match(‘^’); match(id)
else if lookahead = ‘array’ then
match(‘array’); match(‘[‘); simple();
match(‘]’); match(‘of’); type()
else error()
end;
procedure simple();
begin
if lookahead = ‘integer’ then
match(‘integer’)
else if lookahead = ‘char’ then
match(‘char’)
else if lookahead = ‘num’ then
match(‘num’);
match(‘dotdot’);
match(‘num’)
else error()
end;
39
Example Predictive Parser (Execution Step
1)
type()
match(‘array’)
array [ num num
dotdot ] of integer
Input:
lookahead
Check lookahead
and call match
40
Example Predictive Parser (Execution Step
2)
match(‘array’)
array [ num num
dotdot ] of integer
Input:
lookahead
match(‘[’)
type()
41
Example Predictive Parser (Execution Step
3)
simple()
match(‘array’)
array [ num num
dotdot ] of integer
Input:
lookahead
match(‘[’)
match(‘num’)
type()
42
Example Predictive Parser (Execution Step
4)
simple()
match(‘array’)
array [ num num
dotdot ] of integer
Input:
lookahead
match(‘[’)
match(‘num’) match(‘dotdot’)
type()
43
Example Predictive Parser (Execution Step
5)
simple()
match(‘array’)
array [ num num
dotdot ] of integer
Input:
lookahead
match(‘[’)
match(‘num’) match(‘num’)
match(‘dotdot’)
type()
44
Example Predictive Parser (Execution Step
6)
simple()
match(‘array’)
array [ num num
dotdot ] of integer
Input:
lookahead
match(‘[’) match(‘]’)
match(‘num’) match(‘num’)
match(‘dotdot’)
type()
45
Example Predictive Parser (Execution Step
7)
simple()
match(‘array’)
array [ num num
dotdot ] of integer
Input:
lookahead
match(‘[’) match(‘]’) match(‘of’)
match(‘num’) match(‘num’)
match(‘dotdot’)
type()
46
Example Predictive Parser (Execution Step
8)
simple()
match(‘array’)
array [ num num
dotdot ] of integer
Input:
lookahead
match(‘[’) match(‘]’) type()
match(‘of’)
match(‘num’) match(‘num’)
match(‘dotdot’)
match(‘integer’)
type()
simple()
47
Left Factoring
When more than one production for nonterminal A starts
with the same symbols, the FIRST sets are not disjoint
We can use left factoring to fix the problem
stmt  if expr then stmt endif
| if expr then stmt else stmt endif
stmt  if expr then stmt opt_else
opt_else  else stmt endif
| endif
48
Left Recursion
When a production for nonterminal A starts with a
self reference then a predictive parser loops forever
A  A 
| 
| 
We can eliminate left recursive productions by systematically
rewriting the grammar using right recursive productions
A   R
|  R
R   R
| 
49
A Translator for Simple Expressions
expr  expr + term
expr  expr - term
expr  term
term  0
term  1
…
term  9
{ print(“+”) }
{ print(“-”) }
{ print(“0”) }
{ print(“1”) }
…
{ print(“9”) }
expr  term rest
rest  + term { print(“+”) } rest | - term { print(“-”) } rest | 
term  0 { print(“0”) }
term  1 { print(“1”) }
…
term  9 { print(“9”) }
After left recursion elimination:
50
Thank you
Thank you 


Compiler design lessons notes from Semester

  • 1.
    Chapter 2: A SimpleSyntax-directed Translator Salam R. Al-E’mari Adham University College 1
  • 2.
    Languages are describedby their syntaxes and semantics What is the meaning of Syntax? •It means the form or structure of the expressions, statements and program units. The Syntax rules of a language specify which strings of characters from the language’s alphabet are in the language. What is the meaning of Semantics? It means the meaning of the expressions, statements, and program units. 2
  • 3.
    3 character stream Lexical Analyzer SyntaxAnalyzer (Parser) Semantic Analyzer Intermediate Code Generator Machine-Independent Code Optimizer Code Generator Machine-Dependent Code Optimizer token stream syntax tree syntax tree intermediate representation intermediate representation target-machine code target-machine code Symbol Table Phases of a compiler Analysis part Front end of the compiler Synthesis part Back end of the compiler The Role of the Lexical Analyzer
  • 4.
    Syntax Definition 1. Definitionof Grammars 2. Derivations 3. Parse trees 4. Ambiguity 5. Associativity of operators 6. Precedence of operators 4
  • 5.
    5 1.Definition of Grammars •Context-free grammar is a 4-tuple with • A set of tokens (terminal symbols) • A set of nonterminals • A set of productions • A designated start symbol • Lexeme is basic component during lexical analysis. A lexeme consists of related alphabet from the language. e.g. numeric literals, operators (+), and special words (begin). • Token (variable names) : of a language is a category of its lexemes, and a lexeme is an instance of a token.
  • 6.
    Lexemes Tokens index identifier =equal_sign 2 int_literal * mult_op 17 int_literal ; semicolon Example: consider the following Java statement index = 2 * count +17; The lexemes and tokens of this statement are represented in the following table 6
  • 7.
    Formal Methods ofDescribing Syntax The formal language that is used to describe the syntax of programming languages is called grammar. BNF (Backus-Naur Form) and Context-Free Grammars (FG) BNF is widely accepted way to describe the syntax of a programming language. Context-Free Grammars Regular expression and context-free grammar are useful in describing the syntax of a programming language. Regular expression describes how a token is made of alphabets, and context-free grammar determines how the tokens are put together as a valid sentence in the grammar. 7
  • 8.
    BNF Fundamentals • Metalanguageis a language that is used to describe another language. BNF is a metalanguage for programming languages. • The BNF consists of rules (or productions). A rule has a left-hand side (LHS) as the abstraction, and a right-hand side (RHS) as the definition. As in java assignment statement the definition could be as follows: <assign> → <var> = <expression> • The abstractions in a BNF description, or grammar, are often called nonterminal symbols, or simply nonterminals. • Lexemes and tokens of the rules are called terminal symbols, or simply terminals. 8
  • 9.
    9 Example Grammar list list + digit list  list - digit list  digit digit  0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 G = <{list,digit}, {+,-,0,1,2,3,4,5,6,7,8,9}, P, list> with productions P = Context-free grammar for simple expressions:
  • 10.
    10 2. Derivation • Givena CF grammar we can determine the set of all strings (sequences of tokens) generated by the grammar using derivation • We begin with the start symbol • In each step, we replace one nonterminal in the current sentential form with one of the right-hand sides of a production for that nonterminal
  • 11.
    11 Derivation for theExample Grammar list  list + digit  list - digit + digit  digit - digit + digit  9 - digit + digit  9 - 5 + digit  9 - 5 + 2 This is an example leftmost derivation, because we replaced the leftmost nonterminal (underlined) in each step. Likewise, a rightmost derivation replaces the rightmost nonterminal in each step
  • 12.
    A derivation ofa program in this language follows: <program> => begin <stmt_list> end => begin <stmt> ; <stmt_list> end => begin <var> = <expression>; <stmt_list> end => begin A= <expression>; <stmt_list> end => begin A = <var> + < var> ; <stmt_list> end => begin A = B+ < var> ; <stmt_list> end => begin A = B + C; <stmt_list> end => begin A = B + C; <stmt> end => begin A = B + C ; <var> = <expression> end => begin A = B + C ; B = <expression> end => begin A = B + C ; B = <var> end => begin A = B + C ; B = C end <program> → begin <stmt_list> end <stmt_list> → <stmt> | <stmt>; <stmt_list> <stmt> → <var> = <expression> <var> → A | B | C <expression> → <var> + <var> | <var> - <var> | <var> Example : 12
  • 13.
    A = B*(A+C)is generated by the leftmost derivation: Homework: a grammar for a Simple Assignment Statements <assign> → <id> = <expr> <id> → A | B | C | <expr> → <id> + <expr> | <id>*<expr> | (<expr>) | <id> 13
  • 14.
    14 3. Parse Trees •The root of the tree is labeled by the start symbol • Each leaf of the tree is labeled by a terminal (=token) or  • Each interior node is labeled by a nonterminal • If A  X1 X2 … Xn is a production, then node A has immediate children X1, X2, …, Xn where Xi is a (non)terminal or  ( denotes the empty string)
  • 15.
    15 Parse Tree forthe Example Grammar Parse tree of the string 9-5+2 using grammar G list digit 9 - 5 + 2 list list digit digit The sequence of leaf is called the yield of the parse tree
  • 16.
    <assign> A parse treefor the simple statement: A = B * (A + C) <id> = <expr> <id> * <expr> ( <expr> ) <id> + <expr> A B A <id> C Parse Tree for the Example Grammar 16
  • 17.
    17 4. Ambiguity string string + string | string - string | 0 | 1 | … | 9 G = <{string}, {+,-,0,1,2,3,4,5,6,7,8,9}, P, string> with production P = Consider the following context-free grammar: This grammar is ambiguous, because more than one parse tree represents the string 9-5+2
  • 18.
    18 Ambiguity (cont’d) string string 9 -5 + 2 string string string string string 9 - 5 + 2 string string string
  • 19.
    <assign> Ambiguity Example :Two distinct parse trees (generated by the grammar on the next slide) for the same sentence, A = B +C * A <id> = <expr> <expr> + <expr> <expr> * <expr> <id> A <id> C <id> A B <assign> <id> = <expr> <expr> * <expr> <expr> + <expr> <id> A <id> B <id> C A 19
  • 20.
    20 5- Associativity ofOperators right  term = right | term left  left + term | term Left-associative operators have left-recursive productions Right-associative operators have right-recursive productions String a=b=c has the same meaning as a=(b=c) String a+b+c has the same meaning as (a+b)+c
  • 21.
    Operator associativity canalso be indicated by a grammar. <expr> => <expr> + const | const (unambiguous) <expr> => <expr> + <expr> | const (ambiguous) <expr> <expr> + const const <expr> + const Associativity of Operators 21
  • 22.
    22 6. Precedence ofOperators expr  expr + term | term term  term * factor | factor factor  number | ( expr ) Operators with higher precedence “bind more tightly” String 2+3*5 has the same meaning as 2+(3*5) expr expr term factor + 2 3 * 5 term factor term factor number number number
  • 23.
    Example: If we usethe parse tree to indicate precedence levels of the operators, we cannot have ambiguity <expr> → <expr> - <term> | <term> <term> → <term> / const | const 23
  • 24.
    - Optional partsare placed in brackets ([ ]) <if_stmt> → if (<expression>) <statement> [else <statement>] - Alternative parts of RHSs in parentheses and separate them with vertical bars <term> → <term> (* | / | %) <factor> - Put repetitions (0 or more) are placed inside braces ({}) <ident_list> → <identifier> {, <identifier>} - EBNF has the same expression power as BNF - The addition includes optional constructs (parts), repetition, and multiple choices, very much like regular expression. Extended BNF 24
  • 25.
    BNF <expr>  <expr>+ <term> | <expr> - <term> | <term> <term>  <term> * <factor> | <term> / <factor> | <factor> EBNF <expr>  <term> {(+ | -) <term>} <term>  <factor> {(* | /) <factor>} 25
  • 26.
    26 Syntax-Directed Translation • Usesa CF grammar to specify the syntactic structure of the language • AND associates a set of attributes with the terminals and nonterminals of the grammar • AND associates with each production a set of semantic rules to compute values of attributes • A parse tree is traversed and semantic rules applied: after the tree traversal(s) are completed, the attribute values on the nonterminals contain the translated form of the input
  • 27.
    27 Synthesized and InheritedAttributes • An attribute is said to be: • Synthesized if its value at a parse-tree node is determined from the attribute values at the children of the node. • Inherited if its value at a parse-tree node is determined by the parent (by enforcing the parent’s semantic rules)
  • 28.
    28 Example Attribute Grammar expr expr1 + term expr  expr1 - term expr  term term  0 term  1 … term  9 expr.t := expr1.t // term.t // “+” expr.t := expr1.t // term.t // “-” expr.t := term.t term.t := “0” term.t := “1” … term.t := “9” Production Semantic Rule String concat operator
  • 29.
    29 Example Annotated ParseTree expr.t = “95-2+” term.t = “2” 9 - 5 + 2 expr.t = “95-” expr.t = “9” term.t = “5” term.t = “9”
  • 30.
    30 Depth-First Traversals procedure visit(n: node); begin for each child m of n, from left to right do visit(m); evaluate semantic rules at node n end
  • 31.
    31 Depth-First Traversals (Example) expr.t= “95-2+” term.t = “2” 9 - 5 + 2 expr.t = “95-” expr.t = “9” term.t = “5” term.t = “9” Note: all attributes are of the synthesized type
  • 32.
    32 Translation Schemes • Atranslation scheme is a CF grammar embedded with semantic actions rest  + term { print(“+”) } rest Embedded semantic action rest term rest + { print(“+”) }
  • 33.
    33 Example Translation Scheme expr expr + term expr  expr - term expr  term term  0 term  1 … term  9 { print(“+”) } { print(“-”) } { print(“0”) } { print(“1”) } … { print(“9”) }
  • 34.
    34 Example Translation Scheme(cont’d) expr term 9 - 5 + 2 expr expr term term { print(“-”) } { print(“+”) } { print(“9”) } { print(“5”) } { print(“2”) } Translates 9-5+2 into postfix 95-2+
  • 35.
    35 Parsing • Parsing =process of determining if a string of tokens can be generated by a grammar • For any CF grammar there is a parser that takes at most O(n3 ) time to parse a string of n tokens • Linear algorithms suffice for parsing programming language source code • Top-down parsing “constructs” a parse tree from root to leaves • Bottom-up parsing “constructs” a parse tree from leaves to root
  • 36.
    36 Predictive Parsing • Recursivedescent parsing is a top-down parsing method • Each nonterminal has one (recursive) procedure tha tis responsible for parsing the nonterminal’s syntactic category of input tokens • When a nonterminal has multiple productions, each production is implemented in a branch of a selection statement based on input look-ahead information • Predictive parsing is a special form of recursive descent parsing where we use one lookahead token to unambiguously determine the parse operations
  • 37.
    37 Example Predictive Parser(Grammar) type  simple | ^ id | array [ simple ] of type simple  integer | char | num dotdot num
  • 38.
    38 Example Predictive Parser(Program Code) procedure match(t : token); begin if lookahead = t then lookahead := nexttoken() else error() end; procedure type(); begin if lookahead in { ‘integer’, ‘char’, ‘num’ } then simple() else if lookahead = ‘^’ then match(‘^’); match(id) else if lookahead = ‘array’ then match(‘array’); match(‘[‘); simple(); match(‘]’); match(‘of’); type() else error() end; procedure simple(); begin if lookahead = ‘integer’ then match(‘integer’) else if lookahead = ‘char’ then match(‘char’) else if lookahead = ‘num’ then match(‘num’); match(‘dotdot’); match(‘num’) else error() end;
  • 39.
    39 Example Predictive Parser(Execution Step 1) type() match(‘array’) array [ num num dotdot ] of integer Input: lookahead Check lookahead and call match
  • 40.
    40 Example Predictive Parser(Execution Step 2) match(‘array’) array [ num num dotdot ] of integer Input: lookahead match(‘[’) type()
  • 41.
    41 Example Predictive Parser(Execution Step 3) simple() match(‘array’) array [ num num dotdot ] of integer Input: lookahead match(‘[’) match(‘num’) type()
  • 42.
    42 Example Predictive Parser(Execution Step 4) simple() match(‘array’) array [ num num dotdot ] of integer Input: lookahead match(‘[’) match(‘num’) match(‘dotdot’) type()
  • 43.
    43 Example Predictive Parser(Execution Step 5) simple() match(‘array’) array [ num num dotdot ] of integer Input: lookahead match(‘[’) match(‘num’) match(‘num’) match(‘dotdot’) type()
  • 44.
    44 Example Predictive Parser(Execution Step 6) simple() match(‘array’) array [ num num dotdot ] of integer Input: lookahead match(‘[’) match(‘]’) match(‘num’) match(‘num’) match(‘dotdot’) type()
  • 45.
    45 Example Predictive Parser(Execution Step 7) simple() match(‘array’) array [ num num dotdot ] of integer Input: lookahead match(‘[’) match(‘]’) match(‘of’) match(‘num’) match(‘num’) match(‘dotdot’) type()
  • 46.
    46 Example Predictive Parser(Execution Step 8) simple() match(‘array’) array [ num num dotdot ] of integer Input: lookahead match(‘[’) match(‘]’) type() match(‘of’) match(‘num’) match(‘num’) match(‘dotdot’) match(‘integer’) type() simple()
  • 47.
    47 Left Factoring When morethan one production for nonterminal A starts with the same symbols, the FIRST sets are not disjoint We can use left factoring to fix the problem stmt  if expr then stmt endif | if expr then stmt else stmt endif stmt  if expr then stmt opt_else opt_else  else stmt endif | endif
  • 48.
    48 Left Recursion When aproduction for nonterminal A starts with a self reference then a predictive parser loops forever A  A  |  |  We can eliminate left recursive productions by systematically rewriting the grammar using right recursive productions A   R |  R R   R | 
  • 49.
    49 A Translator forSimple Expressions expr  expr + term expr  expr - term expr  term term  0 term  1 … term  9 { print(“+”) } { print(“-”) } { print(“0”) } { print(“1”) } … { print(“9”) } expr  term rest rest  + term { print(“+”) } rest | - term { print(“-”) } rest |  term  0 { print(“0”) } term  1 { print(“1”) } … term  9 { print(“9”) } After left recursion elimination:
  • 50.