Top-down parsers accept a more restricted class of grammars than bottom-up parsers. However, ANTLR generates top-down parsers. This chapter describes parse tables and recursive descent parsers.
5 top-down-parsers
1. 5/11/2021 Saeed Parsa 1
Compiler Design
Top down parsers
Saeed Parsa
Room 332,
School of Computer Engineering,
Iran University of Science & Technology
parsa@iust.ac.ir
Winter 2021
2. Who, when/where, and what?
• Who are we?
• Lecturer
• Saeed Parsa
• Associate Professor in IUST
• Research Area: Software Engineering, Software Testing,
Software Debugging, Reverse Engineering, etc.
• Email: parsa@iust.ac.ir
• More Information:
• http://parsa.iust.ac.ir/
• Slide Share
• https://www.slideshare.net/SaeedParsa
5. 5/11/2021 Saeed Parsa 5
Top-Down Parsing
A predictive parser is characterized by its ability to choose the production to
apply solely on the basis of the next input symbol and the current
nonterminal being processed.
Top-down parsing starts with the start symbol and applies
productions until it arrives at the desired string.
7. 5/11/2021 Saeed Parsa 7
Predictive parsers
A predictive parser uses the next input symbol,
as a look-ahead to determine the production
rule for expanding the current nonterminal.
9. 5/11/2021 Saeed Parsa 9
LL(1) Grammars
A grammar is LL(1) if it can be parsed by considering only the current
nonterminal and the next token, as look-ahead, in the input stream.
Example: The following grammar is LL(1):
S ::= a S B | d B
B ::= b B | a
Parsing table:
N.T.   a              b            d
S      S ::= a S B                 S ::= d B
B      B ::= a        B ::= b B
10. 5/11/2021 Saeed Parsa 10
The following grammar is LL(1):
S ::= a S B | d B
B ::= b B | a
Parsing table:
N.T.   a              b            d
S      S ::= a S B                 S ::= d B
B      B ::= a        B ::= b B
Input string: adbaa
Parsing: starting from S, use the parsing table to select a production at each step.
Example
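The table lookup described on this slide can be sketched in Python. This is a minimal sketch, not the lecturer's implementation: the dictionary `TABLE` encodes the parsing table for S ::= a S B | d B and B ::= b B | a, and the names `ll1_parse`, `TABLE` and `NONTERMINALS` are hypothetical.

```python
# Parsing table for S ::= a S B | d B, B ::= b B | a,
# keyed by (nonterminal, look-ahead). Names are illustrative only.
TABLE = {
    ("S", "a"): ["a", "S", "B"],
    ("S", "d"): ["d", "B"],
    ("B", "b"): ["b", "B"],
    ("B", "a"): ["a"],
}
NONTERMINALS = {"S", "B"}


def ll1_parse(text, start="S"):
    """Return the productions applied while parsing text, or raise SyntaxError."""
    tokens = list(text) + ["$"]      # one character per token, then end marker
    stack = ["$", start]             # top of the stack is the end of the list
    pos = 0
    applied = []
    while stack:
        top = stack.pop()
        look = tokens[pos]
        if top in NONTERMINALS:
            rhs = TABLE.get((top, look))
            if rhs is None:
                raise SyntaxError(f"no production for ({top}, {look}) at position {pos}")
            applied.append(f"{top} -> {' '.join(rhs)}")
            stack.extend(reversed(rhs))   # push the right-hand side, leftmost symbol on top
        elif top == look:
            pos += 1                      # terminal (or $) matches: consume it
        else:
            raise SyntaxError(f"expected {top} but found {look} at position {pos}")
    return applied


if __name__ == "__main__":
    for production in ll1_parse("adbaa"):
        print(production)
```

For the input adbaa the sketch selects S -> a S B, S -> d B, B -> b B, B -> a and B -> a, in that order.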
11. 5/11/2021 Saeed Parsa 11
LL(1) Grammars definition
A grammar, G, is LL(1) if and only if:
∀ "A → β1 | β2 | … | βn" ∈ G:
First(βi) ∩ First(βj) = ∅, ∀ i, j ∈ 1..n, i ≠ j
where
A → a β ⇒ First(A) ⊇ {a}, a ∈ Terminal Symbols & β is any string
A → B β ⇒ First(A) ⊇ First(B), B ∈ Nonterminal Symbols & …
A → β1 | β2 | … | βn ⇒
First(A) = First(β1) ∪ First(β2) ∪ … ∪ First(βn)
12. 5/11/2021 Saeed Parsa 12
First Set: Definition
• Suppose A is a nonterminal. First(A) consists of the first terminals
that can be derived from A.
if A ⇒* aβ, then a ∈ First(A)
if A ⇒* ε (A is nullable), then ε ∈ First(A)
E.g.
A → aB | CD
B → Bb | ε
C → c | ε
D → d
First(A) = First(aB) + First(CD)
= {a} + (First(C) − {ε}) + First(D)   (First(D) is included because C is nullable)
= {a, c, d}
Another example, from the same grammar:
First(B) = {b, ε}
14. 5/11/2021 Saeed Parsa 14
First Set: Properties
1. If X is a terminal or ε, then First(X) = {X}
2. Suppose X is a nonterminal and X → Y1 Y2 ... Yk
- if for some i, Y1...Yi-1 ⇒* ε, then First(X) ⊇ First(Yi) − {ε}
- if Y1...Yk ⇒* ε, then ε ∈ First(X)
E.g.
A → aB | CD
B → Bb | ε
C → c | ε
D → d
First(A) = {a, c, d}
Why exclude ε? Because ε ∈ First(Yi) only says that Yi can vanish; X derives ε only when every one of Y1...Yk is nullable.
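The two properties above translate into a fixed-point computation. The following Python sketch assumes a grammar stored as a dictionary from nonterminal to a list of alternatives, with the empty list standing for ε; the names `GRAMMAR`, `first_of_sequence` and `first_sets` are hypothetical.

```python
EPS = "ε"

# Example grammar from the slide; [] is the ε-alternative.
GRAMMAR = {
    "A": [["a", "B"], ["C", "D"]],
    "B": [["B", "b"], []],
    "C": [["c"], []],
    "D": [["d"]],
}


def first_of_sequence(symbols, first):
    """First set of a symbol string Y1 ... Yk, given the current First sets."""
    result = set()
    for sym in symbols:
        # A terminal is its own First set; a nonterminal uses the table.
        sym_first = first[sym] if sym in first else {sym}
        result |= sym_first - {EPS}
        if EPS not in sym_first:
            return result            # sym is not nullable: stop here
    result.add(EPS)                  # every symbol was nullable (or the string is empty)
    return result


def first_sets(grammar):
    """Iterate until no First set changes any more."""
    first = {nt: set() for nt in grammar}
    changed = True
    while changed:
        changed = False
        for nt, alternatives in grammar.items():
            for alt in alternatives:
                new = first_of_sequence(alt, first)
                if not new <= first[nt]:
                    first[nt] |= new
                    changed = True
    return first


if __name__ == "__main__":
    for nt, symbols in first_sets(GRAMMAR).items():
        print(nt, sorted(symbols))
```

The iteration also copes with the left-recursive rule B → Bb, because a set that has not been filled yet simply contributes nothing in that round.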
16. 5/11/2021 Saeed Parsa 16
Example 1
G: A → a A | B b | C d a
   B → e a
   C → b C | d
First(B) = First(ea) = {e}
First(C) = First(bC) + First(d) = {b, d}
First(A) = {a} + First(B) + First(C) = {a, b, d, e}
G is LL(1) because:
A → a A | B b | C d a:
   First(aA) ∩ First(Bb) = {a} ∩ {e} = ∅
   First(aA) ∩ First(Cda) = {a} ∩ {b, d} = ∅
   First(Bb) ∩ First(Cda) = {e} ∩ {b, d} = ∅
C → b C | d:
   First(bC) ∩ First(d) = {b} ∩ {d} = ∅
17. 5/11/2021 Saeed Parsa 17
Example 2
G: A → a A | a B d | C d a
   C → d C | a
   B → a e
G is not LL(1) because:
A → a A | a B d | C d a
1. First(aA) ∩ First(aBd) = {a} ≠ ∅
2. also First(Cda) = {d, a} ⊇ {a}
If the look-ahead is a and the current nonterminal is A,
then it is not possible to determine which production to choose.
18. 5/11/2021 Saeed Parsa 18
i. Left Factoring
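G: A → a e A | a B d | C d a
   C → d C | a
   B → a e
Substituting for C in the productions of A:
   A → a e A | a B d | d C d a | a d a
   C → d C | a
   B → a e
Factoring out the common prefix a:
   A → a A’ | d C d a
   A’ → e A | B d | d a
   C → d C | a
   B → a e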
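The factoring step can be sketched in Python. This is a simplified sketch under stated assumptions: alternatives are lists of symbols, only the first symbol of each alternative is factored, and the helper name `left_factor` is hypothetical.

```python
from collections import defaultdict


def left_factor(nt, alternatives):
    """Factor alternatives of nt that share the same first symbol (one level only)."""
    groups = defaultdict(list)
    for alt in alternatives:
        groups[alt[0] if alt else None].append(alt)   # None groups the ε-alternative

    result = {nt: []}
    counter = 0
    for head, alts in groups.items():
        if head is None or len(alts) == 1:
            result[nt].extend(alts)          # nothing to factor for this first symbol
            continue
        counter += 1
        new_nt = nt + "'" * counter          # fresh nonterminal: A', A'', ...
        result[nt].append([head, new_nt])
        result[new_nt] = [alt[1:] for alt in alts]
    return result


if __name__ == "__main__":
    # A -> a e A | a B d | d C d a | a d a  (after substituting for C)
    factored = left_factor("A", [["a", "e", "A"], ["a", "B", "d"],
                                 ["d", "C", "d", "a"], ["a", "d", "a"]])
    for lhs, alts in factored.items():
        print(lhs, "->", " | ".join(" ".join(alt) for alt in alts))
```

A longer common prefix would need the function to be applied again to the new nonterminal; that repetition is left out to keep the sketch short.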
19. 5/11/2021 Saeed Parsa 19
We say that a nonterminal X is nullable if the empty sequence can be
derived from it.
X ⇒* ε then Nullable(X) = true
Nullable
To be LL(1), for any nullable nonterminal Y:
Y ⇒* ε
First(Y) ∩ Follow(Y) should be empty.
Example:
A → B b e d
B → b d | ε
First(B) ∩ Follow(B) = {b}, so this grammar is not LL(1).
20. 5/11/2021 Saeed Parsa 20
LL(1) grammars
A grammar is LL(1) if and only if:
No two distinct productions with the same LHS can generate the same
first terminal symbol (e.g. A → a | a β is not LL(1)).
No nullable symbol A has the same terminal symbol a in both its First
and Follow sets.
There is only one way to derive ε from a nullable symbol.
21. 5/11/2021 Saeed Parsa 21
Follow:
A → αBβ, then Follow(B) ⊇ First(β) − {ε}
A → αBβC and Nullable(β) = true, then Follow(B) = Follow(B) ∪ (First(β) − {ε}) ∪ (First(C) − {ε})
A → αBβ and Nullable(β) = true, then Follow(B) = Follow(B) ∪ (First(β) − {ε}) ∪ Follow(A)
Predict(A, a) = {A → β : a ∈ First(β)} ∪ {A → β : β is nullable and a ∈ Follow(A)}
LL(1) grammars
22. 5/11/2021 Saeed Parsa 22
• G: S → a b | a d
Is G LL(1)? First(ab) ∩ First(ad) = {a}
G is not LL(1); left factoring should be applied:
G: S → a A’
   A’ → b | d
• G: S → a B d
     B → d e | ε
Is G LL(1)? First(B) ∩ Follow(B) = {d}
G is not LL(1); the ε-production should be removed:
G: S → a d e d | a d
Left factoring then gives:
   S → a d S’
   S’ → e d | ε
LL(1)
23. 5/11/2021 Saeed Parsa 23
• Example: determine the follow set where required
S → a B C d | C A
B → b B | ε
C → a A e | ε
A → e
First(S) = {a} + First(C)
First(B) = {b, ε}; B is nullable, so Follow(B) is required: Follow(B) = (First(C) − {ε}) + {d} = {a, d}
First(C) = {a, ε}; C is nullable, so Follow(C) is required: Follow(C) = {d} + First(A) = {d, e}
Example - 1
24. 5/11/2021 Saeed Parsa 24
• Example: determine the follow set where required
S → a B C d | C A
B → b B | ε
C → a A e | ε
A → e
First(S) = {a} + (First(C) − {ε}) + First(A) = {a, e}
Look-ahead symbols for B: {b} + Follow(B) = {b} + {a, d} = {b, a, d}
Look-ahead symbols for C: {a} + Follow(C) = {a} + {d} + First(A) = {a, d, e}
Example - 2
25. 5/11/2021 Saeed Parsa 25
Production Rules
S -> aBDh
B -> cC
C -> bC | d
D -> EF
E -> g | λ
F -> f | λ
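First and Follow sets (for a nullable nonterminal, ε is replaced here by its Follow set, which gives the look-ahead symbols used for prediction)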
First(D) = First(E) = {g, f, h}
First(E) = {g} + Follow(E) = {g, f, h}
Follow(E) = First(F) = {f, h}
First(F) = {f} + Follow(F) = {f, h}
Follow(F) = Follow(D) = {h}
Follow(S) = {$}
Follow(B) = First(D) = {g, f, h}
Follow(C) = Follow(B) = {g, f, h}
Follow(D) = {h}
Example - 3
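The Follow sets of this example can be checked mechanically. The sketch below is a minimal Python version of the Follow rules, assuming the grammar is stored as a dictionary of alternatives with [] for λ; the function names `first_of`, `first_sets` and `follow_sets` are hypothetical.

```python
EPS, END = "λ", "$"

GRAMMAR = {
    "S": [["a", "B", "D", "h"]],
    "B": [["c", "C"]],
    "C": [["b", "C"], ["d"]],
    "D": [["E", "F"]],
    "E": [["g"], []],
    "F": [["f"], []],
}


def first_of(seq, first):
    """First set of a symbol string; terminals stand for themselves."""
    out = set()
    for sym in seq:
        f = first.get(sym, {sym})
        out |= f - {EPS}
        if EPS not in f:
            return out
    return out | {EPS}


def first_sets(grammar):
    first = {nt: set() for nt in grammar}
    changed = True
    while changed:
        changed = False
        for nt, alts in grammar.items():
            for alt in alts:
                new = first_of(alt, first) - first[nt]
                if new:
                    first[nt] |= new
                    changed = True
    return first


def follow_sets(grammar, start):
    first = first_sets(grammar)
    follow = {nt: set() for nt in grammar}
    follow[start].add(END)                       # $ always follows the start symbol
    changed = True
    while changed:
        changed = False
        for lhs, alts in grammar.items():
            for alt in alts:
                for i, sym in enumerate(alt):
                    if sym not in grammar:
                        continue                 # only nonterminals have Follow sets
                    rest = first_of(alt[i + 1:], first)
                    new = rest - {EPS}
                    if EPS in rest:              # the rest is nullable: add Follow(lhs)
                        new |= follow[lhs]
                    if not new <= follow[sym]:
                        follow[sym] |= new
                        changed = True
    return follow


if __name__ == "__main__":
    for nt, symbols in follow_sets(GRAMMAR, "S").items():
        print(f"Follow({nt}) =", sorted(symbols))
```

Run on the grammar of this example, the sketch reproduces the Follow sets listed on the slide.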
26. 5/11/2021 Saeed Parsa 26
Production Rules:
S -> aBDh
B -> cC
C -> bC | d
D -> EF | h
E -> g | λ
F -> f | λ
First(D) = First(E) = {g, f, h}
First(E) = {g} + Follow(E) = {g, f, h}
Follow(E) = First(F) = {f, h}
First(F) = {f} + Follow(F) = {f, h}
Follow(F) = Follow(D) = {h}
Follow(S) = {$}
Follow(B) = First(D) = {g, f, h}
Follow(C) = Follow(B) = {g, f, h}
Follow(D) = {h}
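Is this grammar LL(1)?
C -> bC | d: First(bC) ∩ First(d) = ∅
D -> EF | h: EF is nullable and h ∈ Follow(D), so the look-ahead symbols of EF and of h share {h}
Expanding E and F: D -> g F | F | h, then D -> g f | g | f | λ | h
First(D) ∩ Follow(D) = { h }, so the λ-production is removed:
D -> g f | g | f | h
S -> a B D h | a B h
Left factoring S:
S -> a B S’
S’ -> D h | h
D -> gf | g | f | h
Substituting for D in S’:
S’ -> g f h | g h | f h | h h | h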
Example - 4
28. 5/11/2021 Saeed Parsa 28
Follow sets
• What ?
The Follow set of a nonterminal, A, is the set of terminals that can come
immediately after A.
• Why ?
For a grammar, G, to be LL(1), as described before:
∀ B ∈ G | Nullable(B): First(B) ∩ Follow(B) = ∅
• How ?
A → αBβ ⇒ Follow(B) ⊇ First(β) − {ε}
if Nullable(β) ⇒ Follow(B) = (First(β) − {ε}) + Follow(A)
• Examples ? Let me finish with the description and then …
30. 5/11/2021 Saeed Parsa 30
Notice:
The input to a compiler is a source file;
A source file, like any other file, ends with an end-of-file marker;
The end-of-file marker is represented by $;
Since a source program is supposed to be an instance of the start
symbol:
$ is always considered a member of the Follow set of the start
symbol.
Follow Sets
31. 5/11/2021 Saeed Parsa 31
Follow Set Example 1
• Notice:
Always add the end of
File marker, $, to the
Follow(start symbol)
32. 5/11/2021 Saeed Parsa 32
Follow Set Example 1
S → C => Follow(C) ⊇ Follow(S)
S → ASb => Follow(S) ⊇ {b}
S is the starting symbol => $ ∈ Follow(S)
=> Follow(S) = {b, $}
S → C => Follow(C) = {b, $}
C → cC, C nullable => look-ahead symbols for C = {c} + Follow(C)
= {c, b, $}
S → ASb => Follow(A) ⊇ First(Sb)
S → ASb => First(S) ⊇ First(A) = {a}
S → C => First(S) ⊇ First(C) − {ε} = {c}, and S is nullable because C is nullable
S → ASb => Follow(A) = (First(S) − {ε}) + {b} = {a, c, b}
($ cannot follow A, because A is always followed by S b)
Follow(A) = {a, c, b}
Follow(C) = {b, $}
Follow(S) = {b, $}
33. 5/11/2021 Saeed Parsa 33
Follow Set Example 2
G: S → L B
   L → id : | ε
   B → i c t S E
   E → e S | ε
E → e S | ε => E is nullable; look-ahead symbols for E = {e} + Follow(E) = {e, $}
B → i c t S E => Follow(E) ⊇ Follow(B)
S → L B => Follow(B) ⊇ Follow(S)
B → i c t S E => Follow(S) = {$} + (First(E) − {ε}) + Follow(B) = {e, $}
Follow(B) = Follow(S) = {e, $}
S → L B => Follow(L) = First(B) = {i}
B → i c t S E => First(B) = {i}
34. 5/11/2021 Saeed Parsa 34
Example
Transform the following grammar into LL(1)
G: LabeledSt → Label Statement
   Label → id : | ε
   Statement → AssignmentSt | IfSt | WhileSt | CallSt
   AssignmentSt → id := Expression
   WhileSt → while Expression do Statement
   IfSt → if Expression do Statement
   CallSt → id ( Params )
1. The grammar is not LL(1) because:
Nullable(Label) ∧ (First(Label) ∩ Follow(Label) = {id} ≠ ∅) ⇒ ¬LL(1)
First(Label) = {id, ε}, Follow(Label) = First(Statement), and id ∈ First(Statement)
35. 5/11/2021 Saeed Parsa 35
ii. Null Production Removal
LabeledSt → Label Statement
Label → id : | ε
1. Replace Label with its expansion:
LabeledSt → id : Statement | Statement
2. First(id : Statement) ∩ First(Statement) = {id}
Left factor into LL(1)
37. 5/11/2021 Saeed Parsa 37
iii. Left Recursion Elimination
A grammar is not LL(1) if
it includes a left recursive production:
X → X α | β
because: First(X α) = First(X) ⊇ First(β) ⇒ First(X α) ∩ First(β) ≠ ∅
Left recursion is eliminated by converting the grammar into an
equivalent right recursive grammar.
X → X α | β is converted to
1. BNF: X → β X’
        X’ → α X’ | ε
2. EBNF: X → β {α}
38. 5/11/2021 Saeed Parsa 38
X → X α | β is equivalent to
1. BNF: X → β X’
        X’ → α X’ | ε
2. EBNF: X → β {α}
Because both forms generate the same strings β, βα, βαα, …:
1. X → X α | β     2. X → β {α}
39. 5/11/2021 Saeed Parsa 39
Consider the arithmetic expressions grammar
E → E + T | E - T | T
T → T * F | T / F | F
F → Id | No | ( E )
1. Left factoring (BNF)
E → E + T | E - T | T   ⇒   E → E E” | T
                              E” → + T | - T
T → T * F | T / F | F   ⇒   T → T T” | F
                              T” → * F | / F
1. Left factoring (EBNF)
E → E + T | E - T | T   ⇒   E → E (+ T | - T) | T
T → T * F | T / F | F   ⇒   T → T (* F | / F) | F
40. 5/11/2021 Saeed Parsa 40
X → X α | β is converted to
1. BNF: X → β X’
        X’ → α X’ | ε
2. EBNF: X → β {α}
2. Left Recursion Elimination (BNF)
E → E E” | T   ⇒   E → T E’
                    E’ → E” E’ | ε
                    E’ → + T E’ | - T E’ | ε
T → T T” | F   ⇒   T → F T’
                    T’ → * F T’ | / F T’ | ε
2. Left Recursion Elimination (EBNF)
E → E (+ T | - T) | T   ⇒   E → T {+ T | - T}
T → T (* F | / F) | F   ⇒   T → F {* F | / F}
Example
41. 5/11/2021 Saeed Parsa 41
Example
Consider the arithmetic expressions grammar
E → E + T | E - T | T
T → T * F | T / F | F
F → Id | No | ( E )
- Equivalent G. (BNF)
E → T E’
E’ → + T E’ | - T E’ | ε
T → F T’
T’ → * F T’ | / F T’ | ε
F → Id | No | ( E )
- Equivalent G. (EBNF)
E → T {+ T | - T}
T → F {* F | / F}
F → Id | No | ( E )
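The rewriting rule X → X α | β ⇒ X → β X’, X’ → α X’ | ε can be sketched as a small Python function. This is an illustrative sketch for immediate left recursion only; the name `eliminate_left_recursion` and the list-of-symbols representation are assumptions, and [] stands for ε.

```python
def eliminate_left_recursion(nt, alternatives):
    """Rewrite nt -> nt α1 | ... | β1 | ... into right-recursive BNF."""
    # α parts: alternatives that start with nt itself, with that nt removed.
    recursive = [alt[1:] for alt in alternatives if alt and alt[0] == nt]
    # β parts: all remaining alternatives.
    others = [alt for alt in alternatives if not alt or alt[0] != nt]
    if not recursive:
        return {nt: alternatives}            # nothing to do
    new_nt = nt + "'"
    return {
        nt: [beta + [new_nt] for beta in others],
        new_nt: [alpha + [new_nt] for alpha in recursive] + [[]],
    }


if __name__ == "__main__":
    grammar = {
        "E": [["E", "+", "T"], ["E", "-", "T"], ["T"]],
        "T": [["T", "*", "F"], ["T", "/", "F"], ["F"]],
        "F": [["Id"], ["No"], ["(", "E", ")"]],
    }
    for nt, alts in grammar.items():
        for lhs, new_alts in eliminate_left_recursion(nt, alts).items():
            print(lhs, "->", " | ".join(" ".join(a) or "ε" for a in new_alts))
```

Applied to the expression grammar, the sketch yields the same BNF as the equivalent grammar listed above, with E' and T' as the fresh nonterminals.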
42. 5/11/2021 Saeed Parsa 42
To transform a grammar into LL(1) form, we must do the following:
1. Eliminate any common left prefixes (left factoring),
2. Eliminate any left recursion,
3. Eliminate nullable productions, if they cause a conflict.
1. Left factoring:
A → αβ1 | αβ2 | δ
is replaced with:
A → αA′ | δ
A′ → β1 | β2
Or in extended BNF:
A → α (β1 | β2) | δ
How to transform to LL(1)
43. 5/11/2021 Saeed Parsa 43
2. Left Recursion Elimination
X → X α | β
is converted to
X → β X’
X’ → α X’ | ε
Or in extended BNF:
X → β {α}
3. No nullable symbol A may have the same terminal symbol a in both its First
and Follow sets.
How to transform to LL(1)
44. 5/11/2021 Saeed Parsa 44
The key problem during predictive parsing is that of determining the
production to be applied for a nonterminal.
This is done by using a parsing table.
A parsing table is a two-dimensional array M[A,a] where A is a nonterminal,
and a is a terminal or the symbol $, meaning "end of input string".
The other inputs of a predictive parser are:
◦ The input buffer, which contains the string to be parsed followed by $.
◦ The stack, which contains a sequence of grammar symbols; initially it holds $S
(the end-of-input marker and the start symbol).
Parse tables
45. 5/11/2021 Saeed Parsa 45
• The purpose of parsing table is to determine which production rule to use next.
• Consider the following grammar:
G1:
S → d A B | B a B
A → d A | B a
B → b B | ε
Example 1
1. Transform the grammar into LL(1) form,
2. Use First and follow sets to construct the parsing table,
3. Use the parsing table to parse given input strings.
46. 5/11/2021 Saeed Parsa 46
1. Convert G1 into the LL(1) form
- B → b B | ε
- First(B) = {b, ε} => First(B) ∩ Follow(B) should be empty.
- A → B a => a ∈ Follow(B)
- S → d A B => Follow(B) ⊇ Follow(S), so Follow(B) = {a} + {$} = {a, $}
- It is assumed that always: $ ∈ Follow(start symbol)
- => $ ∈ Follow(S) => (S → d A B) => $ ∈ Follow(B)
- => First(B) ∩ Follow(B) = {b} ∩ {a, $} = ∅
- First(B) = {b, ε}, Follow(B) = {a, $}
- Look-ahead symbols for B: {b, a, $}
Example 1-Continued
47. 5/11/2021 Saeed Parsa 47
- A → d A | B a
- First(dA) ∩ First(Ba) = {d} ∩ {b, a} = ∅
- First(A) = First(dA) ∪ First(Ba) = {d, b, a}
- S → d A B | B a B
- First(dAB) ∩ First(BaB) = {d} ∩ {b, a} = ∅
- First(S) = First(dAB) ∪ First(BaB)
- First(S) = {d} ∪ {b, a} = {d, b, a}
2. Use First sets to work out the parsing table
Example 1-Continued
48. 5/11/2021 Saeed Parsa 48
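- First(S) = {d, b, a}
- First(A) = {d, b, a}
- First(B) = {b, ε}; look-ahead symbols for B: {b, a, $}
- Follow(B) = {a, $}
Example 1-Continued
G1:
S → d A B | B a B
A → d A | B a
B → b B | ε
Parsing table ($ selects only B → ε, because $ is in Follow(B) but not in First(BaB) or First(Ba)):
       d          a          b          $
S      S → dAB    S → BaB    S → BaB
A      A → dA     A → Ba     A → Ba
B                 B → ε      B → bB     B → ε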
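Filling the table follows the Predict rule: a production A → β is entered under every terminal in First(β), and under every symbol of Follow(A) when β is nullable. The Python sketch below assumes the First and Follow sets of G1 are supplied as dictionaries (Follow(A) = {b, $} is derived here from S → d A B and is not listed on the slides); the name `build_table` is hypothetical.

```python
EPS = "ε"

GRAMMAR = {
    "S": [["d", "A", "B"], ["B", "a", "B"]],
    "A": [["d", "A"], ["B", "a"]],
    "B": [["b", "B"], []],                   # [] is the ε-production
}
FIRST = {"S": {"d", "b", "a"}, "A": {"d", "b", "a"}, "B": {"b", EPS}}
FOLLOW = {"S": {"$"}, "A": {"b", "$"}, "B": {"a", "$"}}


def first_of(seq, first):
    """First set of a symbol string; terminals stand for themselves."""
    out = set()
    for sym in seq:
        f = first.get(sym, {sym})
        out |= f - {EPS}
        if EPS not in f:
            return out
    return out | {EPS}


def build_table(grammar, first, follow):
    """Return {(nonterminal, terminal): right-hand side}; raise on an LL(1) conflict."""
    table = {}
    for nt, alternatives in grammar.items():
        for alt in alternatives:
            look = first_of(alt, first)
            targets = look - {EPS}
            if EPS in look:                  # nullable alternative: predict on Follow(nt)
                targets |= follow[nt]
            for terminal in targets:
                if (nt, terminal) in table:
                    raise ValueError(f"LL(1) conflict at M[{nt}, {terminal}]")
                table[(nt, terminal)] = alt
    return table


if __name__ == "__main__":
    for (nt, terminal), rhs in sorted(build_table(GRAMMAR, FIRST, FOLLOW).items()):
        print(f"M[{nt}, {terminal}] = {nt} -> {' '.join(rhs) or EPS}")
```

A raised `ValueError` corresponds to a cell with two productions, which is exactly the situation in which the grammar is not LL(1).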
49. 5/11/2021 Saeed Parsa 49
Build parsing table for this grammar:
G2:
S → ( L ) | a
L → L S | S
Example 2
1- Eliminate left recursion
G2:
S → ( L ) | a
L → S L’
L’ → S L’ | λ
50. 5/11/2021 Saeed Parsa 50
Example 2-2
2- Define the First sets and, if required,
the Follow sets for the nonterminals.
First(S) = {(, a}
First(L) = First(S) = {(, a}
First(L’) = First(S) + {λ} = {(, a, λ}
Follow(L) = { ) }
Follow(L’) = Follow(L) = { ) }
Parsing table:
       a            (            )           $
S      S → a        S → (L)      -           -
L      L → SL’      L → SL’
L’     L’ → SL’     L’ → SL’     L’ → λ      -
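Parsing (a(aa)) (top of the stack on the right; Delete = the terminal on top of the stack matches the input and both are removed):
Parse stack       Input        Rule
$ S               (a(aa))$     S → (L)
$ ) L (           (a(aa))$     Delete
$ ) L             a(aa))$      L → SL’
$ ) L’ S          a(aa))$      S → a
$ ) L’ a          a(aa))$      Delete
$ ) L’            (aa))$       L’ → SL’
$ ) L’ S          (aa))$       S → (L)
$ ) L’ ) L (      (aa))$       Delete
$ ) L’ ) L        aa))$        L → SL’
$ ) L’ ) L’ S     aa))$        S → a
$ ) L’ ) L’ a     aa))$        Delete
$ ) L’ ) L’       a))$         L’ → SL’
$ ) L’ ) L’ S     a))$         S → a
$ ) L’ ) L’ a     a))$         Delete
$ ) L’ ) L’       ))$          L’ → λ
$ ) L’ )          ))$          Delete
$ ) L’            )$           L’ → λ
$ )               )$           Delete
$                 $            Accept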
53. 5/11/2021 Saeed Parsa 53
• The third homework : Insert your slides from this slide on
54. 5/11/2021 Saeed Parsa 54
1. Convert this grammar to LL(1)
G1:
S ::= ε | A S
A ::= id := id
A ::= if id then A
A ::= if id then A' else A
A' ::= id := id
A' ::= if id then A' else A'
Exercise -1
56. 5/11/2021 Saeed Parsa 56
1. A ::= ABd | Aa | a
i. Left factoring
A ::= A A” | a
A” ::= Bd | a
ii. Eliminate left recursion
A ::= A A” | a => A ::= a A’
                  A’ ::= A” A’ | ε
               => A’ ::= Bd A’ | a A’ | ε
B ::= Be | b   => B ::= b B’
                  B’ ::= e B’ | ε
2. A ::= A b | A c | a | b
i. Left factoring
=> A ::= A A” | a | b
   A” ::= b | c
ii. Eliminate left recursion
=> A ::= a A’ | b A’
   A’ ::= b A’ | c A’ | ε
3. A ::= A B | A c | a | aa
i. Left factoring
=> A ::= A A” | a D
   A” ::= B | c
   D ::= a | ε
ii. Eliminate left recursion
=> A ::= a D A’
   A’ ::= B A’ | c A’ | ε
Solution
57. 5/11/2021 Saeed Parsa 57
Consider the grammar G12
a) Point out all aspects of Grammar G12 which are
not LL(1).
b) Write a new grammar which accepts the same
language, but avoids left recursion and common
left prefixes.
c) Write the FIRST and FOLLOW sets for the new
grammar.
d) Write out the LL(1) parse table for the new
grammar.
e) Is the new grammar an LL(1) grammar? Explain
your answer carefully.
Exercise -3
58. 5/11/2021 Saeed Parsa 58
Exercise -4
Consider the assignment statements grammar
A → id := E
E → E + T | E - T | T
T → T * F | T / F | F
F → Id | No | ( E )
Convert the grammar to LL(1).
Construct the parsing table for the grammar.
Use the table to parse the statement: a := (b/c*3 – e*f)/2
62. 5/11/2021 Saeed Parsa 62
A recursive-descent parser is structured as a set of mutually recursive
procedures, one for each nonterminal in the grammar.
The procedure corresponding to nonterminal A recognizes an instance of A in
the input stream.
To recognize a nonterminal B on some right-hand side for A, the parser
invokes the procedure corresponding to B.
Thus, the grammar itself serves as a guide to the parser's implementation.
Recursive descent parsers
63. 5/11/2021 Saeed Parsa 63
• To test for the presence of a nonterminal, say ’A’, the code invokes a
procedure, named A.
• Suppose: A → a B D
Recursive descent parsers
public class Parser
{ private Symbols currentSymbol;
  Parser() { currentSymbol = nextSymbol(); A(); }
  public void A()
  { /* A → a B D */ Expect(Symbols.a); B(); D(); }
  public void Expect(Symbols expectedSymbol)
  { if (currentSymbol == expectedSymbol) currentSymbol = nextSymbol();
    else syntaxError(); }
}
64. 5/11/2021 Saeed Parsa 64
For instance:
G: S → if E then S | if E then S else S | begin S L | print E
   L → end | ; S L
   E → i
A recursive descent parser has a procedure / method for each
nonterminal A, with the same name as the nonterminal.
There are three nonterminals, S, L, and E, in the grammar.
Three methods S(), L() and E() should be written.
A lexical analyzer method nextSymbol() is invoked to get the next token
from the input file.
nextSymbol() copies the token into a global variable called currentSymbol.
It is assumed that the next unprocessed symbol is always available in currentSymbol
before a parsing method examines it.
Recursive descent parsers
66. 5/11/2021 Saeed Parsa 66
// S → if E then S | if E then S else S | begin S L | print E
public void S()
{ if (currentSymbol == "if")
  { nextSymbol(); E(); Expect("then"); S();
    // Left factored: the optional else part is checked after the common prefix
    if (currentSymbol == "else") { nextSymbol(); S(); }
  }
  else if (currentSymbol == "begin") { nextSymbol(); S(); L(); }
  else if (currentSymbol == "print") { nextSymbol(); E(); }
  else { throw new IllegalTokenException("Procedure S() expected an 'if' or 'begin' or 'print' token "
         + "but received: " + currentSymbol); }
}
Recursive descent parsers
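The three methods S(), L() and E() can be put together in a small runnable Python sketch. It is not the lecturer's code: tokens are assumed to be separated by blanks, the class name `Parser` and the helper `parse` are hypothetical, and syntax errors simply raise `SyntaxError`.

```python
class Parser:
    """Recursive descent recognizer for:
       S -> if E then S [else S] | begin S L | print E
       L -> end | ; S L
       E -> i
    """

    def __init__(self, tokens):
        self.tokens = list(tokens) + ["$"]   # $ marks the end of the input
        self.pos = 0

    @property
    def current(self):
        return self.tokens[self.pos]

    def next_symbol(self):
        self.pos += 1

    def expect(self, symbol):
        if self.current != symbol:
            raise SyntaxError(f"expected {symbol!r} but received {self.current!r}")
        self.next_symbol()

    def S(self):
        if self.current == "if":
            self.next_symbol()
            self.E()
            self.expect("then")
            self.S()
            if self.current == "else":       # optional else part (left factored)
                self.next_symbol()
                self.S()
        elif self.current == "begin":
            self.next_symbol()
            self.S()
            self.L()
        elif self.current == "print":
            self.next_symbol()
            self.E()
        else:
            raise SyntaxError(f"S expected 'if', 'begin' or 'print' but received {self.current!r}")

    def L(self):
        if self.current == "end":
            self.next_symbol()
        else:
            self.expect(";")
            self.S()
            self.L()

    def E(self):
        self.expect("i")


def parse(text):
    """Return True if text is a sentence of the grammar, else raise SyntaxError."""
    parser = Parser(text.split())
    parser.S()
    parser.expect("$")
    return True


if __name__ == "__main__":
    print(parse("begin print i ; if i then print i else print i end"))
```

Each method mirrors one nonterminal, so the call structure during a parse follows the shape of the parse tree.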
67. 5/11/2021 Saeed Parsa 67
• A recursive-descent parser is structured as a set of mutually recursive
procedures, one for each nonterminal in the grammar.
• For instance consider the arithmetic expressions grammar
G: E → E + T | E - T | T
   T → T * F | T / F | F
   F → Id | No | ( E )
1. Transform G into LL(1):
G: E → T E’
   E’ → + T E’ | - T E’ | ε
   T → F T’
   T’ → * F T’ | / F T’ | ε
   F → Id | No | ( E )
- Equivalent G. (EBNF)
G: E → T {+ T | - T}
   T → F {* F | / F}
   F → Id | No | ( E )
Recursive descent parsers
68. 5/11/2021 Saeed Parsa 68
• The procedure corresponding to nonterminal A recognizes an instance of A
in the input stream.
// E → T E’
public void E()
{ /* E → T E’ */ T(); EPrime(); }
• To recognize a nonterminal B on some right-hand side for A, the parser
invokes the procedure corresponding to B.
// E’ → + T E’ | - T E’ | ε   (E’ is written EPrime, since ’ is not allowed in an identifier)
public void EPrime()
{ if (currentSymbol == s_Add)
  { /* E’ → + T E’ */ nextSymbol(); T(); EPrime(); }
  else if (currentSymbol == s_Sub)
  { /* E’ → - T E’ */ nextSymbol(); T(); EPrime(); }
  /* otherwise E’ → ε: nothing is consumed */
}
Recursive descent parsers
69. 5/11/2021 Saeed Parsa 69
• For building parsers (especially bottom-up) a BNF grammar is often better
than EBNF. But it’s easy to convert an EBNF grammar to BNF:
Convert every repetition { E } to a fresh non-terminal X and add
X ::= ε | E X.
Convert every option [ E ] to a fresh non-terminal X and add
X ::= ε | E.
Convert every group ( E ) to a fresh non-terminal X and add
X ::= E.
We can even do away with alternatives by having several productions
with the same non-terminal.
X ::= E | E’. becomes X ::= E. X ::= E’.
From EBNF to BNF
70. 5/11/2021 Saeed Parsa 70
For a recursive descent parser it is easier to use extended BNF.
G: E T {+ T | - T}
T F {* F | / F}
F Id | No | ( E )
public class Parser
{ private Symbols currentSymbol;
  Parser() { // Gets the next symbol, as currentSymbol, before calling E
    nextSymbol(); E(); }
  // G: E → T {+ T | - T}
  public void E( )
  { T();
    while ( currentSymbol == S_Add || currentSymbol == S_Sub)
    { nextSymbol(); T(); }
  }
From EBNF to BNF
71. 5/11/2021 Saeed Parsa 71
// T → F {* F | / F}
public void T( )
{ F();
  while ( currentSymbol == S_Mul || currentSymbol == S_Div)
  { nextSymbol(); F(); } }
// F → Id | No | ( E )
public void F( )
{ if (currentSymbol == S_Id || currentSymbol == S_No) nextSymbol();
  else { /* F → ( E ) */
    Expect(S_openPar); E(); Expect(S_closePar); } }
public void Expect(Symbols expectedSymbol )
{ if (currentSymbol == expectedSymbol) nextSymbol(); else syntaxError(); }
public void nextSymbol( ) { … }
} // End of Parser class.
From EBNF to BNF
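The EBNF-based methods E(), T() and F() can be sketched in Python, where each repetition { … } becomes a while loop. The sketch makes several assumptions that are not in the slides: a regular-expression tokenizer, Id as an identifier and No as an unsigned integer, and a nested-tuple parse tree as the result; `tokenize`, `ExprParser` and `parse` are hypothetical names.

```python
import re

# Numbers, identifiers and the single-character operators of the grammar.
TOKEN = re.compile(r"\s*(\d+|[A-Za-z_]\w*|[-+*/()])")


def tokenize(text):
    tokens, pos = [], 0
    text = text.rstrip()
    while pos < len(text):
        match = TOKEN.match(text, pos)
        if not match:
            raise SyntaxError(f"unexpected character at position {pos}")
        tokens.append(match.group(1))
        pos = match.end()
    return tokens + ["$"]


class ExprParser:
    """E -> T {+ T | - T},  T -> F {* F | / F},  F -> Id | No | ( E )"""

    def __init__(self, text):
        self.tokens = tokenize(text)
        self.pos = 0

    @property
    def current(self):
        return self.tokens[self.pos]

    def next_symbol(self):
        self.pos += 1

    def expect(self, symbol):
        if self.current != symbol:
            raise SyntaxError(f"expected {symbol!r} but received {self.current!r}")
        self.next_symbol()

    def E(self):
        node = self.T()
        while self.current in ("+", "-"):    # the repetition {+ T | - T}
            op = self.current
            self.next_symbol()
            node = (op, node, self.T())
        return node

    def T(self):
        node = self.F()
        while self.current in ("*", "/"):    # the repetition {* F | / F}
            op = self.current
            self.next_symbol()
            node = (op, node, self.F())
        return node

    def F(self):
        token = self.current
        if token == "(":
            self.next_symbol()
            node = self.E()
            self.expect(")")
            return node
        if token[0].isdigit() or token[0].isalpha() or token[0] == "_":
            self.next_symbol()               # Id or No
            return token
        raise SyntaxError(f"F expected Id, No or '(' but received {token!r}")


def parse(text):
    parser = ExprParser(text)
    tree = parser.E()
    parser.expect("$")
    return tree


if __name__ == "__main__":
    print(parse("a + 2 * (b - 3)"))
```

Building the tree inside the loops keeps + and - (and * and /) left associative, which the left-recursive original grammar expresses directly.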
72. 5/11/2021 Saeed Parsa 72
A Mini Pascal Compiler
• Consider the mini-pascal grammar:
ProgramX → Program id ; BlockBody .
BlockBody → [ ConstantDefPart ] [ TypeDefPart ] [ VarDefPart ]
            { FunctionDef | ProcedureDef } CompoundStatement
ConstantDefPart → Const ConstantDef { ConstantDef }
ConstantDef → id = ( No | id ) ;
TypeDefPart → Type TypeDef { TypeDef }
TypeDef → id = ( integer | real | character )
VarDefPart → Var VarDef { VarDef }
VarDef → id : ( integer | real | character )
74. 5/11/2021 Saeed Parsa 74
Mini Pascal R.D. parser
Begin
init(); // Initializes the Mini-Pascal parser
NextSymbol(); // Get a lookahead
ProgramX(); // Call Starting symbol function
End.
public class Parser
{ public Symbols currentSymbol;
  Parser(String SourceFile)
  { init(SourceFile); // Open Source and …
    NextSymbol(); // currentSymbol = next symbol;
    ProgramX(); // Call Start-symbol
  }
  …
}
75. 5/11/2021 Saeed Parsa 75
Recursive descent parsers start by calling the starting symbol of the grammar.
/* ProgramX → Program id ; blockBody . */
public void ProgramX( )
{
Expect( S_Program ); // Expect the "program" keyword
Expect( S_id ); // Expect an identifier
Expect( S_Semi ); // Expect a semicolon
blockBody( ); // Invoke blockBody()
Expect( S_Dot ); // Expect a dot
}
76. 5/11/2021 Saeed Parsa 76
Mini Pascal R.D. parser - 3
/* blockBody → [ constantDefPart ] [ typeDefPart ] [ varDefPart ]
               { functionDef | procedureDef } compoundStatement */
public void blockBody( )
{ if (currentSymbol == S_Const) constantDefPart();
  if (currentSymbol == S_Type) typeDefPart();
  if (currentSymbol == S_Var) varDefPart();
  while (currentSymbol == S_Procedure || currentSymbol == S_Function)
    if (currentSymbol == S_Procedure) procedureDef();
    else functionDef();
  compoundStatement( );
}
77. 5/11/2021 Saeed Parsa 77
Mini Pascal R.D. parser - 4
/* constantDefPart → Const constantDef { constantDef } */
public void constantDefPart( )
{ Expect( S_Const );
  constantDef();
  // while currentSymbol is in First(constantDef)
  while (currentSymbol == S_Id) constantDef();
}
/* constantDef → id = ( No | id ) ; */
public void constantDef( )
{ Expect( S_Id ); Expect( S_Equal );
  if (currentSymbol == S_No) nextSymbol();
  else Expect( S_Id );
  Expect( S_Semi ); }
78. 5/11/2021 Saeed Parsa 78
Error Recovery
Error recovery is the process of acting on an error in order to reduce its negative
effect.
If the next symbol does not match the expected symbol, then skip input
symbols until the next expected symbol is observed.
79. 5/11/2021 Saeed Parsa 79
Error Recovery: definition
Error recovery is a process to act against the error in order to reduce the negative
effect of the error.
Internally the error recovery works as follows:
◦ The location of the syntax error is reported.
◦ If possible, the tokens that would be a legal continuation of the program are
reported.
◦ The tokens that can serve to continue parsing are computed. A minimal
sequence of tokens is skipped until one of these tokens is found.
80. 5/11/2021 Saeed Parsa 80
Error recovery
• Consider the “Expect” method:
public void Expect( Symbols expectedSymbol )
{
if (currentSymbol == expectedSymbol)
nextSymbol( );
else
syntaxError( );
}
• We are going to complete the "syntaxError" method:
public void syntaxError( )
{
Console.WriteLine("Syntax Error");
nextSymbol(); // Get the next look-ahead symbol
}
81. 5/11/2021 Saeed Parsa 81
Motivating Example .1
• Now, consider this grammar:
• Consider the following code:
82. 5/11/2021 Saeed Parsa 82
Motivating Example .2
/* compoundSt ::= begin Sts end*/
procedure compoundSt( )
begin
Expect(S_Begin);
Sts();
Expect(S_end);
end;
/* Sts ::= St; Sts | */
procedure Sts( )
begin
St( ); Expect(S_semicolon);
Sts();
end;
Look ahead : begin
83. 5/11/2021 Saeed Parsa 83
Motivating Example .3
/* St ::= ifSt | whileSt | assSt | compoundSt */
procedure St( )
begin
if (currentSymbol = s_if) then ifSt( )
else if (currentSymbol = s_while) then whileSt( )
else if (currentSymbol = s_id) then assSt( )
else compoundSt( );
end;
Look ahead : begin
Look ahead : jf
84. 5/11/2021 Saeed Parsa 84
Motivating Example .4
/* assSt ::= id := E */
procedure assSt( )
begin
nextSymbol( );
Expect(s_assign);
E();
end;
begin
Jf i = 5 then i := i+1;
while j< 5 di i := i*j;
end
public void Expect( Symbols expectedSymbol )
{
if (currentSymbol == expectedSymbol)
nextSymbol( );
else
syntaxError( );
}
public void syntaxError( )
{
Console.WriteLine("Syntax Error");
nextSymbol(); // Get the next look-ahead symbol
}
Expected: S_id
Look ahead: i
85. 5/11/2021 Saeed Parsa 85
Error Recovery: Approach
Suppose the parser is expecting a nonterminal, Yi, in this production:
X → Y1 Y2 … Yi … Yn
In fact the parser is expecting a terminal symbol s ∈ First(Yi).
The error recovery works as follows:
◦ Skip the next symbols, s, until arriving at a symbol
s ∈ First(Yi+1 … Yn).
◦ More generally, keep ignoring the next symbols, s, until arriving at a symbol
s ∈ Stop(Yi),
◦ where for i ∈ [1..n-1]: Stop(Yi) = First(Yi+1) + … + First(Yn) + Stop(X)
◦ Stop(Yn) = Stop(X)
◦ Stop(Start Symbol) always includes the end of file marker, $.
86. 5/11/2021 Saeed Parsa 86
Stop set
G1:
St → ifSt | whileSt | assSt | compoundSt
=> Stop(St) = [s_eof] since St is the start symbol,
=> Stop(ifSt) = Stop(whileSt) = Stop(assSt) = Stop(compoundSt) = Stop(St)
= [s_eof]
compoundSt → begin Sts end
=> Stop(s_begin) = First(Sts) + [s_end] + Stop(compoundSt) = First(Sts) + [s_end, s_eof]
=> Stop(Sts) = [s_end] + Stop(compoundSt) = [s_eof, s_end]
=> Stop(s_end) = Stop(compoundSt) = [s_eof]
Sts → St ; Sts | St
=> Stop(St) = [s_semicolon] + First(St) + Stop(Sts)
= [s_semicolon] + First(ifSt) + First(whileSt) + First(assSt) +
First(compoundSt) + [s_eof, s_end];
87. 5/11/2021 Saeed Parsa 87
Error Recovery
• The “Expect” method is modified as follows:
public void Expect( Symbols expectedSymbol, HashSet<Symbols> Stop )
{
if (currentSymbol == expectedSymbol)
nextSymbol( );
else syntaxError( Stop );
}
• We are going to complete the "syntaxError" method:
public void syntaxError( HashSet<Symbols> Stop )
{
Console.WriteLine("Syntax Error");
// Stop must contain the end-of-file symbol; otherwise this loop may not terminate
while( !Stop.Contains( currentSymbol ) )
nextSymbol();
}
88. 5/11/2021 Saeed Parsa 88
Error Recovery
The Expect method is modified as follows:
(* Expect compares expected symbol S with the current symbol *)
Procedure Expect ( ExpectedSymbol : Symbols , Stop : Set of Symbols ) ;
Begin
if Currentsymbol = ExpectedSymbol Then Nextsymbol
Else SyntaxError( Stop ) ;
End {Expect};
Procedure SyntaxError(Stop: Set of Symbols);
Begin
promptMsg(‘ Syntax error at‘, LineNo, ColNo) ;
While not (CurrentSymbol in Stop) DO NextSymbol() ;
End{ SyntaxError };
89. 5/11/2021 Saeed Parsa 89
Error Recovery
• The main body of the Mini-pascal compiler is modified as follows:
Program MiniPascalCompiler;
Type
Symbols = ( S_if , S_while, S_repeat , S_for, S_Case, S_then, S_else, S_do,
S_program, S_uses, S_interface, S_unit, S_begin, S_end,
S_label, S_const, S_type, S_var, S_procedure, S_function,
S_integer , S_real , S_char, S_array, S_record, S_pointer,
S_lt , S_gt , S_eq , S_le , S_ge, S_ne, S_add, S_sub, S_or, S_mul,
S_div, S_and, S_id, S_no, S_not, S_comma, S_colon, S_semicolon,
S_dot, S_OpBracket, S_ClBracket, S_OpCurlyB, S_ClCurlyB,
S_OpSquB, S_ClSquB, S_EOF );
Var
CurrentSymbol : Symbols ;
90. 5/11/2021 Saeed Parsa 90
Error Recovery : MiniPascal
Begin
init( ) ; (* Initialize variables, open source and create target files *)
nextSymbol( ); (*Detects and saves the first lexicon in currentSymbol *)
ProgramX ([S_EOF] ); (* End of file marker is expected after the start symbol,
ProgramX *)
End.
92. 5/11/2021 Saeed Parsa 92
Error Recovery : MiniPascal
(* BlockBody → [ ConstantDefPart ] [ TypeDefPart ] [ VarDefPart ]
   { FunctionDef | ProcedureDef } CompoundStatement *)
Procedure BlockBody( Stop : Set of Symbols);
Begin
if CurrentSymbol = S_const then
ConstantDefPart( Stop + [S_Type, S_Var, S_Procedure, S_Function, S_Begin] );
if CurrentSymbol = S_type then
TypeDefPart( Stop + [ S_Var, S_Procedure, S_Function, S_Begin] );
if CurrentSymbol = S_var then
VarDefPart( Stop + [ S_Procedure, S_Function, S_Begin] );
93. 5/11/2021 Saeed Parsa 93
Example 1
Convert this grammar into LL(1) form and write a recursive-descent parser
for it:
G1:
S → aB | aC | dD
D → Da | Db | d
B → BC | b
C → Cd | d
G1 in EBNF, after left factoring and left recursion elimination:
S → a ( B | C ) | d D
D → d { a | b }
B → b { C }
C → d { d }
First(S) = {a, d}
First(D) = {d}
First(B) = {b}
First(C) = {d}
94. 5/11/2021 Saeed Parsa 94
Error Recovery : Example – 1.1
Begin
init();
NextSymbol();
S([S_EOF]);
End.
(* S → a ( B | C ) | d D *)
Procedure S( Stop : Set of Symbols);
Begin
If (CurrentSymbol = S_d) Then Begin NextSymbol; D( Stop );
End
Else Begin
Expect(S_a, [S_b, S_d] + Stop);
If CurrentSymbol = S_b Then B( Stop ) Else C( Stop );
End;
End;
95. 5/11/2021 Saeed Parsa 95
Error Recovery : Example – 1.2
(* D → d { a | b } *)
Procedure D( Stop : Set of Symbols);
Begin
Expect(S_d, [S_a, S_b] + Stop);
While (CurrentSymbol = S_a) Or (CurrentSymbol = S_b) do NextSymbol;
End;
(* B → b { d } *)
Procedure B( Stop : Set of Symbols);
Begin
Expect(S_b, [S_d] + Stop);
While (CurrentSymbol = S_d) do NextSymbol;
End;
(* C → d { d } *)
Procedure C( Stop : Set of Symbols);
Begin
Expect(S_d, [S_d] + Stop);
While (CurrentSymbol = S_d) do NextSymbol;
End;
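The procedures for this example can be ported to a runnable Python sketch that shows the stop sets at work. Several details are assumptions made for illustration: each input character is one token, errors are collected as token positions instead of being printed, and the names `RecoveringParser` and `parse` are hypothetical.

```python
EOF = "$"


class RecoveringParser:
    """S -> a (B | C) | d D,  D -> d {a | b},  B -> b {d},  C -> d {d}"""

    def __init__(self, text):
        self.tokens = list(text) + [EOF]
        self.pos = 0
        self.errors = []                     # positions at which an error was reported

    @property
    def current(self):
        return self.tokens[self.pos]

    def next_symbol(self):
        if self.current != EOF:              # never move past the end marker
            self.pos += 1

    def syntax_error(self, stop):
        self.errors.append(self.pos)
        # Skip input until a symbol from the stop set (or the end of input) is reached.
        while self.current not in stop and self.current != EOF:
            self.next_symbol()

    def expect(self, symbol, stop):
        if self.current == symbol:
            self.next_symbol()
        else:
            self.syntax_error(stop)

    def S(self, stop):
        if self.current == "d":
            self.next_symbol()
            self.D(stop)
        else:
            self.expect("a", {"b", "d"} | stop)
            if self.current == "b":
                self.B(stop)
            else:
                self.C(stop)

    def D(self, stop):
        self.expect("d", {"a", "b"} | stop)
        while self.current in ("a", "b"):
            self.next_symbol()

    def B(self, stop):
        self.expect("b", {"d"} | stop)
        while self.current == "d":
            self.next_symbol()

    def C(self, stop):
        self.expect("d", {"d"} | stop)
        while self.current == "d":
            self.next_symbol()


def parse(text):
    """Return the list of error positions; an empty list means the input was accepted."""
    parser = RecoveringParser(text)
    parser.S({EOF})
    if parser.current != EOF:                # unread input left after S
        parser.errors.append(parser.pos)
    return parser.errors


if __name__ == "__main__":
    for sample in ("abdd", "ddab", "axbd"):
        print(sample, parse(sample))
```

In the third sample the unexpected x is reported once, the parser skips ahead to the next d, and parsing continues from there instead of stopping at the first error.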
96. 5/11/2021 Saeed Parsa 96
Example 2
• Convert this grammar into LL(1) and write a recursive-descent parser for it:
G2:
S → Aa | Bd | Sc
A → Aa | ε
B → Bb | ε
A → {a}   A is nullable
B → {b}   B is nullable
First(A) = {a, ε}
Follow(A) = {a}
First(A) ∩ Follow(A) = {a}, so A has to be substituted in S
First(B) = {b, ε}
Follow(B) = {d}
First(B) ∩ Follow(B) = ∅
S → {a} a | B d | S c
S → a {a} | {b} d | S c
S → ( a {a} | {b} d ) {c}
First(S) = {a, b, d}
97. 5/11/2021 Saeed Parsa 97
Example 2.1
G2:
S → Aa | Bd | Sc
A → Aa | ε
B → Bb | ε
G2 in EBNF:
S → ( a{a} | {b} d ) {c}
/* S → ( a{a} | {b} d ) {c} */
public void S(HashSet<char> Stop)
{ if (currentSymbol == 'a')
    while (currentSymbol == 'a') nextSymbol();
  else
  { while (currentSymbol == 'b') nextSymbol();
    // After d, either c or a symbol from Stop may follow
    HashSet<char> Follow = new HashSet<char>(Stop);
    Follow.Add('c');
    Expect('d', Follow); }
  while (currentSymbol == 'c') nextSymbol();
}
98. 5/11/2021 Saeed Parsa 98
Example 3
Convert this grammar into LL(1) and write a recursive-descent parser for it:
G3:
Expression → SimpleExp { RelOp SimpleExp }
RelOp → < | <= | = | <> | >= | > | IN
SimpleExp → Term { ( ‘+’ | ‘-’ | OR ) Term }
Term → Factor { ( ‘/’ | ‘*’ | DIV | AND ) Factor }
Factor → Number | NOT Factor | ‘(‘ Expression ‘)’ | Variable
Variable → Identifier { ‘[‘ Dim ‘]’ }
Dim → Expression { ‘,’ Expression }
100. 5/11/2021 Saeed Parsa 100
Example 3.2
Begin
Init( );
NextSymbol( );
Expression([S_Eof]);
End.
(* Expression → SimpleExp { RelOp SimpleExp } *)
procedure Expression( Stop : Set of Symbols);
Begin
FirstSetOfRelop := [ <, <=, =, <>, >=, >, IN ];
FirstSimpleExp := [ Number, Not, ‘(‘, Identifier ];
SimpleExp( Stop + FirstSetOfRelop );
while CurrentSymbol in FirstSetOfRelop do
Begin
RelOp( Stop + FirstSimpleExp );
SimpleExp( Stop + FirstSetOfRelop );
End;
End {Expression};
101. 5/11/2021 Saeed Parsa 101
Example 4
• Convert this grammar into LL(1) form and write a recursive-descent parser for
it:
G4:
S → L D
L → id : | ε
D → A | C | I | B
A → id := no + id
C → id ( )
B → begin T end
T → T ; S | S
I → if ( id > no ) goto id
102. 5/11/2021 Saeed Parsa 102
Example 4.2
S → L D => First(S) = First(L) + First(D), since L is nullable
L → id : | ε => First(L) = [id, ε], nullable
S → L D => Follow(L) = First(D)
D → A | C | I | B => First(D) = [id, if, begin]
=> First(A) ∩ First(C) = [id] => Not LL(1)
=> First(L) ∩ Follow(L) = [id] => Not LL(1)
=> D → id := no + id | id ( ) | I | B
Left factoring: D → id G | I | B
                G → := no + id | ( )
Null production elimination:
S → id : D | id G | I | B
Left factoring: S → id ( : D | G ) | I | B
Example 4.3
G4:
S → id ( : D | G ) | I | B => First(S) = [id, if, begin]
D → id G | I | B => First(D) = [id, if, begin]
G → := no + id | ( ) => First(G) = [:=, ( ]
B → begin T end => First(B) = [begin]
T → S { ; S } => First(T) = [id, if, begin]
I → if ( id > no ) goto id => First(I) = [if]
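The FIRST sets of the converted grammar can be checked mechanically. The following Python sketch uses the usual fixed-point computation; it assumes the grammar is written with plain alternatives, so the EBNF parts are expressed through two hypothetical helper nonterminals, `S_rest` for ( : D | G ) and `T_rest` for { ; S }. G is taken as := no + id | ( ), the result of left factoring A → id := no + id and C → id ( ), so its FIRST set is [:=, ( ].

```python
EPS = "ε"

# LL(1) form of G4; a nonterminal maps to a list of alternatives,
# and each alternative is a list of symbols ([] is the empty string).
GRAMMAR = {
    "S": [["id", "S_rest"], ["I"], ["B"]],
    "S_rest": [[":", "D"], ["G"]],
    "D": [["id", "G"], ["I"], ["B"]],
    "G": [[":=", "no", "+", "id"], ["(", ")"]],
    "B": [["begin", "T", "end"]],
    "T": [["S", "T_rest"]],
    "T_rest": [[";", "S", "T_rest"], []],
    "I": [["if", "(", "id", ">", "no", ")", "goto", "id"]],
}


def first_sets(grammar):
    """Compute FIRST for every nonterminal by iterating to a fixed point."""
    first = {nt: set() for nt in grammar}
    changed = True
    while changed:
        changed = False
        for nt, alternatives in grammar.items():
            for alt in alternatives:
                before = len(first[nt])
                all_nullable = True
                for sym in alt:
                    if sym in grammar:                 # nonterminal
                        first[nt] |= first[sym] - {EPS}
                        if EPS not in first[sym]:
                            all_nullable = False
                            break
                    else:                              # terminal
                        first[nt].add(sym)
                        all_nullable = False
                        break
                if all_nullable:
                    first[nt].add(EPS)
                if len(first[nt]) != before:
                    changed = True
    return first


if __name__ == "__main__":
    for nt, symbols in first_sets(GRAMMAR).items():
        print(nt, sorted(symbols))
```

The alternatives of every nonterminal have pairwise disjoint FIRST sets here, which is the LL(1) condition for alternatives that are not nullable.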
Example 4.4
// S → id ( : D | G ) | I | B => First(S) = [id, if, begin]
public void S(HashSet<String> Stop)
{   HashSet<String> First = new HashSet<String>();
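The listing on the slide stops after the first declaration. As a rough illustration of how the whole parser could look, here is a minimal Python sketch for the LL(1) form of G4, with one method per nonterminal. It is an assumption-laden sketch rather than the original C# code: the input is a list of token strings, `G4Parser`, `SyntaxErrorG4` and `parse` are hypothetical names, G is taken as := no + id | ( ), and error recovery with stop sets is omitted so that the first error raises an exception.

```python
class SyntaxErrorG4(Exception):
    """Raised on the first token that does not fit the grammar."""


class G4Parser:
    def __init__(self, tokens):
        self.tokens = list(tokens) + ["$"]   # '$' marks end of input
        self.pos = 0

    @property
    def current(self):
        return self.tokens[self.pos]

    def expect(self, *expected):
        """Consume the given tokens in order or raise an error."""
        for tok in expected:
            if self.current != tok:
                raise SyntaxErrorG4(
                    f"expected {tok!r}, found {self.current!r} at token {self.pos}")
            self.pos += 1

    def S(self):                      # S -> id ( : D | G ) | I | B
        if self.current == "id":
            self.expect("id")
            if self.current == ":":
                self.expect(":")
                self.D()
            else:
                self.G()
        elif self.current == "if":
            self.I()
        else:
            self.B()

    def D(self):                      # D -> id G | I | B
        if self.current == "id":
            self.expect("id")
            self.G()
        elif self.current == "if":
            self.I()
        else:
            self.B()

    def G(self):                      # G -> := no + id | ( )
        if self.current == ":=":
            self.expect(":=", "no", "+", "id")
        else:
            self.expect("(", ")")

    def B(self):                      # B -> begin T end
        self.expect("begin")
        self.T()
        self.expect("end")

    def T(self):                      # T -> S { ; S }
        self.S()
        while self.current == ";":
            self.expect(";")
            self.S()

    def I(self):                      # I -> if ( id > no ) goto id
        self.expect("if", "(", "id", ">", "no", ")", "goto", "id")


def parse(tokens):
    parser = G4Parser(tokens)
    parser.S()
    parser.expect("$")
    return True


if __name__ == "__main__":
    print(parse("id : id := no + id".split()))
    print(parse("begin id ( ) ; if ( id > no ) goto id end".split()))
```

Each method chooses its alternative from the current token alone, using the FIRST sets of the converted grammar.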
The third homework: insert your slides from this slide on
Exercise 1
1. Convert this grammar to LL(1) in EBNF and write a recursive-descent parser in Python or C#.
G1:
S ::= ε | A S
A ::= id := id
A ::= if id then A
A ::= if id then A' else A
A' ::= id := id
A' ::= if id then A' else A'
1. A' could be removed.
2. The grammar is not acceptable.
3. If you substitute for S in the production S ::= A S you will have S ::= {A}.
Exercise 2
Consider the grammar G12.
a) Point out all aspects of grammar G12 which are not LL(1).
b) Convert it into LL(1) in EBNF.
c) Write the FIRST and FOLLOW sets for the new grammar.
d) Write out the LL(1) recursive-descent parser.
e) Do not forget error recovery.
Exercise 3
Consider the assignment statements grammar:
A → id := E
E → E + T | E - T | T
T → T * F | T / F | F
F → id | no | ( E )
Convert the grammar into LL(1), using EBNF.
Write out the recursive-descent parser in C#, C++ or Python.
Do not forget error recovery.
Parse-Tree Listeners & Visitors
ANTLR provides support for two tree-walking mechanisms in its runtime library.
By default, ANTLR generates a parse-tree listener interface that responds to
events triggered by the built-in tree walker.
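The listeners are analogous to SAX document handlers for XML parsers, which receive notification of events such as startDocument and endDocument.
ANTLR can also generate tree walkers that follow the visitor design pattern (requested with the -visitor option).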
Parse-Tree Listeners
To walk a tree and trigger calls into a listener, ANTLR’s runtime provides class
ParseTreeWalker.
ANTLR generates a ParseTreeListener subclass specific to each grammar with enter and
exit methods for each rule.
As the walker encounters the node for rule assign, for example, it triggers enterAssign()
and passes it the AssignContext parse-tree node.
The beauty of the listener mechanism is that it’s all automatic. We don’t have to write a
parse-tree walker, and our listener methods don’t have to explicitly visit their children.
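The enter/exit mechanism can be imitated in a few lines of Python. This sketch deliberately does not use the ANTLR runtime: `Node`, `TraceListener` and `walk` are hypothetical stand-ins for the generated context classes, a generated listener and ParseTreeWalker, and only the depth-first order of the callbacks is meant to match.

```python
class Node:
    """Hypothetical parse-tree node: a rule name plus child nodes."""

    def __init__(self, rule, children=()):
        self.rule = rule
        self.children = list(children)


class TraceListener:
    """Records every enter/exit event it receives from the walker."""

    def __init__(self):
        self.events = []

    def enter(self, node):
        self.events.append("enter" + node.rule.capitalize())

    def exit(self, node):
        self.events.append("exit" + node.rule.capitalize())


def walk(listener, node):
    """Depth-first walk: enter the node, visit its children, then exit."""
    listener.enter(node)
    for child in node.children:
        walk(listener, child)
    listener.exit(node)


if __name__ == "__main__":
    tree = Node("stat", [Node("assign", [Node("expr")])])
    listener = TraceListener()
    walk(listener, tree)
    print(listener.events)
```

The listener never calls `walk` itself; the walker alone decides when children are visited, which is why listener methods do not have to visit their children explicitly.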
Parse-Tree Visitors
The thick dashed line shows a depth-first walk of the parse tree.
The thin dashed lines indicate the method call sequence among the visitor methods.
Build a language application
The first step to building a language application is to create a grammar that describes a
language’s syntactic rules (the set of valid sentences).
Run ANTLR (class org.antlr.v4.Tool) on the grammar file.
antlr4 ArrayInit.g4 # Generate parser and lexer using antlr4 alias
• From grammar ArrayInit.g4, ANTLR generates lots of files that we’d normally have to
write by hand.
Write syntactic and lexical rules
starter/ArrayInit.g4
/** Grammars always start with a grammar header. This grammar is called
* ArrayInit and must match the filename: ArrayInit.g4
*/
grammar ArrayInit;
/** A rule called init that matches comma-separated values between {...}. */
init : '{' value (',' value)* '}' ; // must match at least one value
/** A value can be either a nested array/struct or a simple integer (INT) */
value : init
| INT
;
// parser rules start with lowercase letters, lexer rules with uppercase
INT : [0-9]+ ; // Define token INT as one or more digits
WS : [ \t\r\n]+ -> skip ; // Define whitespace rule, toss it out
Run the program
The program prints a LISP-style parse tree for a given input.
Here’s how to compile everything and run Test:
javac ArrayInit*.java Test.java
java Test
• Input
➾ {1,{2,3},4}
➾EOF
• output
❮ (init { (value 1) , (value (init { (value 2) , (value 3) })) , (value 4) })
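To see where this text form comes from, the two parser rules init and value can be written by hand. The following Python sketch is not the code ANTLR generates; it is a small recursive-descent parser under assumptions, with hypothetical function names, a regular-expression tokenizer that silently ignores any character other than braces, commas and digits, and a tree that is built directly as a string.

```python
import re


def tokenize(text):
    """Return the '{', '}', ',' and integer tokens of the input, in order."""
    return re.findall(r"[{},]|[0-9]+", text)


def expect(tokens, pos, tok):
    if pos >= len(tokens) or tokens[pos] != tok:
        raise SyntaxError(f"expected {tok!r} at token {pos}")
    return pos + 1


def parse_init(tokens, pos):
    """init : '{' value (',' value)* '}' ; returns (tree_text, next_pos)."""
    pos = expect(tokens, pos, "{")
    parts = ["{"]
    text, pos = parse_value(tokens, pos)
    parts.append(text)
    while pos < len(tokens) and tokens[pos] == ",":
        parts.append(",")
        text, pos = parse_value(tokens, pos + 1)
        parts.append(text)
    pos = expect(tokens, pos, "}")
    parts.append("}")
    return "(init " + " ".join(parts) + ")", pos


def parse_value(tokens, pos):
    """value : init | INT ; returns (tree_text, next_pos)."""
    if pos < len(tokens) and tokens[pos] == "{":
        text, pos = parse_init(tokens, pos)
        return "(value " + text + ")", pos
    if pos < len(tokens) and tokens[pos].isdigit():
        return "(value " + tokens[pos] + ")", pos + 1
    raise SyntaxError(f"value expected at token {pos}")


def to_string_tree(text):
    tokens = tokenize(text)
    tree, pos = parse_init(tokens, 0)
    if pos != len(tokens):
        raise SyntaxError("unexpected input after the closing brace")
    return tree


if __name__ == "__main__":
    print(to_string_tree("{1,{2,3},4}"))
```

For the input {1,{2,3},4} the sketch prints the same text as the output shown on the slide.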
ANTLR 4 with Python3 Detailed Example
ANTLR4 introduced a handy listener-based API, but sometimes it's
better not to use it.
https://dzone.com/articles/antlr-4-with-python-2-detailed-example
ANTLR 4 with Python3 Detailed Example
As before, we run ANTLR on the grammar to generate code.
antlr4 -Dlanguage=Python3 arithmetic.g4
This generates a lexer, parser, and a base class for a listener;
I'll give the main body of the code first:
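import antlr4
from arithmeticLexer import arithmeticLexer    # generated by ANTLR from arithmetic.g4
from arithmeticParser import arithmeticParser  # generated by ANTLR from arithmetic.g4

def main():
    lexer = arithmeticLexer(antlr4.StdinStream())   # read the expression from stdin
    stream = antlr4.CommonTokenStream(lexer)
    parser = arithmeticParser(stream)
    tree = parser.expression()                      # parse starting at rule expression
    handleExpression(tree)

if __name__ == '__main__':
    main()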
Iterate over the children
The ANTLR API provides us with the means to iterate over the children of a node.
We can walk through the children in order.
def handleExpression(expr):
    adding = True    # operator flag: True after "+", False after "-"
    value = 0
    for child in expr.getChildren():
        if isinstance(child, antlr4.tree.Tree.TerminalNode):
            adding = child.getText() == "+"
        else:
            multValue = handleMultiply(child)
            if adding:
                value += multValue
            else:
                value -= multValue
    print("Parsed expression %s has value %s" % (expr.getText(), value))
Iterate over the children
We iterate over the children; where we find a multiplying expression, we evaluate it.
Where we find an operator, we use it to set a flag indicating the next operation to perform.
def handleMultiply(expr):
    multiplying = True    # operator flag: True after "*", False after "/"
    value = 1
    for child in expr.getChildren():
        if isinstance(child, antlr4.tree.Tree.TerminalNode):
            multiplying = child.getText() == "*"
        else:
            if multiplying:
                value *= int(child.getText())
            else:
                # "/" is true division in Python 3 and yields a float; use //= for integer division
                value /= int(child.getText())

    return value
The place of IUST in the world
https://www.researchgate.net/publication/328099969_Software_Fault_Localisation_A_Systematic_Mapping_Study