PARSING
Top Down Parser
TOP-DOWN PARSING
 The parse tree is constructed
– From the top
– From left to right
• Terminals are seen in order of
appearance in the token stream:
t2 t5 t6 t8 t9
TOP-DOWN PARSING
Top-down parser
 Recursive-Descent Parsing
 Backtracking is needed (If a choice of a production
rule does not work, we backtrack to try other
alternatives.)
 It is a general parsing technique, but not widely used.
 Not efficient
 Predictive Parsing
 no backtracking
 efficient
 needs a special form of grammars (LL(1) grammars).
 Non-Recursive (Table Driven) Predictive Parser is also
known as LL(1) parser.
 Recursive Predictive Parsing is a special form of
Recursive Descent parsing without backtracking.
RECURSIVE-DESCENT PARSING (USES
BACKTRACKING)
 Backtracking is needed.
 It tries to find the left-most derivation.
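S → aBc
B → bc | b
Input: abc
[Figure: two attempts to build the parse tree for S. First attempt: S → a B c with B → b c; the final c cannot be matched, so the parse fails and backtracks. Second attempt: S → a B c with B → b; the parse succeeds.]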
RECURSIVE DESCENT PARSING
 Consider the grammar
E → T + E | T
T → ( E ) | int | int * T
Input: int * int
 Start with top-level non-terminal E
 Try the rules for E in order
RECURSIVE DESCENT PARSING. EXAMPLE (CONT.)
Try E → T + E
Then try T → ( E )
But ( does not match the input token int.
Try T → int. The token matches.
But + after T does not match the input token *.
Try T → int * T
This matches int * int, but the + after T is then unmatched.
The choices for T are exhausted.
Backtrack and try the other production for E.
RECURSIVE DESCENT PARSING. EXAMPLE (CONT.)
Try E → T
Follow same steps as before for T
– And succeed with T → int * T and T → int
– With the following parse tree:
E
  T
    int
    *
    T
      int
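RECURSIVE-DESCENT PARSING (BACKTRACKING
PROBLEM)
 Consider the following productions
S → aAb
A → c | cd
Let the input string be acdb.
A parser that tries A → c first matches c, then fails to match b
against d, and must backtrack to try A → cd.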
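The backtracking on this grammar can be sketched in C as follows. This is a minimal illustration, not a general parser: the helper names match, A_ALTS and parse_S are hypothetical, and the input is assumed to be a plain character string with one character per token.

```c
#include <stddef.h>
#include <string.h>

/* Match the literal `lit` at input[pos]; return the new position, or -1. */
static int match(const char *input, int pos, const char *lit) {
    size_t n = strlen(lit);
    return strncmp(input + pos, lit, n) == 0 ? pos + (int)n : -1;
}

/* A -> c | cd: alternatives are tried in the order they are written. */
static const char *const A_ALTS[] = { "c", "cd" };

/* S -> a A b. Returns 1 if the whole input is derived from S, else 0. */
int parse_S(const char *input) {
    int after_a = match(input, 0, "a");
    if (after_a < 0) return 0;
    for (size_t i = 0; i < sizeof A_ALTS / sizeof A_ALTS[0]; i++) {
        int after_A = match(input, after_a, A_ALTS[i]);
        if (after_A < 0) continue;            /* this alternative does not match */
        int after_b = match(input, after_A, "b");
        if (after_b >= 0 && input[after_b] == '\0') return 1;
        /* Backtrack: forget after_A and retry from after_a with the next alternative. */
    }
    return 0;
}
```

On the input acdb the first alternative A → c is tried and abandoned before A → cd succeeds, which is the wasted work that predictive parsing avoids.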
EXAMPLE 2
 Consider the following productions
S → BA | AB
A → a | SA
B → b | SB
w = abab
Parse w using recursive-descent
parsing and identify the problem the recursive-descent
parser runs into.
PREDICTIVE PARSER
 When rewriting a non-terminal in a derivation
step, a predictive parser can uniquely choose a
production rule by looking only at the current symbol in
the input string.
A → α1 | ... | αn        input: ... a .......   (a is the current token)
 Unlike a backtracking recursive-descent parser, a predictive parser can
“predict” which production to use.
– By looking at the next few tokens (one token for LL(1)).
– No backtracking.
PREDICTIVE PARSER (EXAMPLE)
stmt → if ...... |
while ...... |
begin ...... |
for .....
 When we are trying to rewrite the non-terminal stmt, if the
current token is if, we have to choose the first production rule.
 When we are trying to rewrite the non-terminal stmt, we can
uniquely choose the production rule by looking only at the
current token.
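The choice described above can be sketched in C as a single switch on the current token. The Token names and the function choose_stmt_production are hypothetical; the productions are numbered in the order they appear in the stmt rule.

```c
/* Hypothetical token codes for the four statement keywords. */
typedef enum { TOK_IF, TOK_WHILE, TOK_BEGIN, TOK_FOR, TOK_OTHER } Token;

/* Return the number (1..4) of the stmt production selected by the
   current token, or 0 if no production starts with that token. */
int choose_stmt_production(Token current) {
    switch (current) {
    case TOK_IF:    return 1;   /* stmt -> if ...    */
    case TOK_WHILE: return 2;   /* stmt -> while ... */
    case TOK_BEGIN: return 3;   /* stmt -> begin ... */
    case TOK_FOR:   return 4;   /* stmt -> for ...   */
    default:        return 0;   /* syntax error      */
    }
}
```

Because each alternative starts with a different terminal, one token of lookahead is enough and no alternative ever has to be retried.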
CONSTRUCTING THE LL(1) PARSING
TABLE
EXAMPLE
A → BC
B → DE
D → FG
F → HI
H → xY
First(A) = {x}
TASK
Write the FIRST and FOLLOW sets of the following grammar:
S -> Ty
T -> AB
T -> sT
A -> aA
A -> λ
B -> bB
B -> λ
 Example 2.
Calculate the FIRST and FOLLOW sets for the
given grammar:
S → aBDh
B → cC
C → bC | ε
D → EF
E → g | ε
F → f | ε
 Solution:
The FIRST and FOLLOW sets are as follows.
FIRST sets:
First(S) = { a }
First(B) = { c }
First(C) = { b , ε }
First(D) = { First(E) – ε } ∪ First(F) = { g , f , ε }
First(E) = { g , ε }
First(F) = { f , ε }
FOLLOW sets:
Follow(S) = { $ }
Follow(B) = { First(D) – ε } ∪ First(h) = { g , f , h }
Follow(C) = Follow(B) = { g , f , h }
Follow(D) = First(h) = { h }
Follow(E) = { First(F) – ε } ∪ Follow(D) = { f , h }
Follow(F) = Follow(D) = { h }
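The FIRST sets of this example can also be computed mechanically by repeating the rules until nothing changes. The following C sketch covers FIRST only. Its representation is an assumption made for illustration: upper-case letters are non-terminals, lower-case letters are terminals, an empty right-hand side stands for the empty string, and each set is a bitmask. The names GRAMMAR, first_of_string and compute_first are hypothetical.

```c
#include <ctype.h>
#include <string.h>

#define EPS (1u << 26)                 /* bit used for the empty string  */
#define BIT(c) (1u << ((c) - 'a'))     /* bit for a terminal 'a'..'z'    */

typedef struct { char lhs; const char *rhs; } Production;

/* Grammar of Example 2; an empty rhs is the empty-string production. */
static const Production GRAMMAR[] = {
    {'S', "aBDh"}, {'B', "cC"}, {'C', "bC"}, {'C', ""},
    {'D', "EF"},   {'E', "g"},  {'E', ""},   {'F', "f"}, {'F', ""},
};
enum { NPROD = sizeof GRAMMAR / sizeof GRAMMAR[0] };

static unsigned first[26];             /* first[X - 'A'] for non-terminal X */

/* FIRST of a symbol string, using the current approximation of first[]. */
static unsigned first_of_string(const char *s) {
    unsigned result = 0;
    for (; *s; s++) {
        if (islower((unsigned char)*s)) return result | BIT(*s);
        unsigned f = first[*s - 'A'];
        result |= f & ~EPS;
        if (!(f & EPS)) return result; /* this symbol cannot vanish: stop */
    }
    return result | EPS;               /* every symbol can derive empty   */
}

/* Iterate over all productions until no FIRST set grows any more. */
void compute_first(void) {
    int changed = 1;
    memset(first, 0, sizeof first);
    while (changed) {
        changed = 0;
        for (int i = 0; i < NPROD; i++) {
            unsigned *f = &first[GRAMMAR[i].lhs - 'A'];
            unsigned updated = *f | first_of_string(GRAMMAR[i].rhs);
            if (updated != *f) { *f = updated; changed = 1; }
        }
    }
}
```

After compute_first() returns, first['D' - 'A'] holds the bits for g, f and the empty string, matching First(D) in the solution.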
 Calculate the FIRST and FOLLOW sets for the
given grammar:
 S → AaAb | BbBa
 A → ε
 B → ε
 Example 3 (# denotes ε).
 E -> T R
 R -> + T R | #
 T -> F Y
 Y -> * F Y | #
 F -> ( E ) | i
 Output:
 First(E) = { (, i }
 First(R) = { +, # }
 First(T) = { (, i }
 First(Y) = { *, # }
 First(F) = { (, i }
 Follow(E) = { $, ) }
 Follow(R) = { $, ) }
 Follow(T) = { +, $, ) }
 Follow(Y) = { +, $, ) }
 Follow(F) = { *, +, $, ) }
 E → T X
 X → + E
 X → ε
 T → int Y
 T → ( E )
 Y → * T
 Y → ε
LL(1) GRAMMAR
Grammar 1:
1. Q -> aQbQ
2. Q -> bQaQ
3. Q -> Ɛ
Grammar 2:
1. S -> ab
2. S -> Ɛ
3. B -> bC
4. B -> Ɛ
5. C -> cS
6. C -> Ɛ
NON-RECURSIVE PREDICTIVE PARSING -- LL(1)
PARSER
 Non-Recursive predictive parsing is a table-driven
parser.
 It is a top-down parser.
 It is also known as LL(1) Parser.
[Figure: block diagram of the non-recursive predictive parser, which reads from the input buffer, uses a stack, and produces the output.]
EXAMPLE PARSE TABLE CONSTRUCTION
S → B c | D B
B → a b | c S
D → d | ε
For this grammar:
 Construct FIRST and FOLLOW Sets
 Apply algorithm to calculate parse table
EXAMPLE PARSE TABLE CONSTRUCTION
X     FIRST(X)       FOLLOW(X)
---------------------------------------------------
D     { d, ε }       { a, c }
B     { a, c }       { c, $ }
S     { a, c, d }    { $, c }
Bc    { a, c }
DB    { d, a, c }
ab    { a }
cS    { c }
d     { d }
ε     { ε }
PARSE TABLE
      a         b     c         d     $
S     Bc, DB          Bc, DB    DB
B
D     ε               ε
Finish Filling In Table
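The usual table-filling rule can be sketched in C: a production A → α goes into M[A, a] for every terminal a in FIRST(α), and, if α can derive ε, also into M[A, b] for every b in FOLLOW(A). The sketch below hard-codes the FIRST and FOLLOW sets listed for this grammar rather than computing them. The names Rule, RULES, follow and cell_count are hypothetical.

```c
#include <string.h>

typedef struct {
    char lhs;              /* non-terminal on the left-hand side      */
    const char *rhs;       /* right-hand side, "" for epsilon         */
    const char *first;     /* terminals in FIRST(rhs), epsilon left out */
    int nullable;          /* 1 if epsilon is in FIRST(rhs)           */
} Rule;

/* S -> Bc | DB,  B -> ab | cS,  D -> d | epsilon */
static const Rule RULES[] = {
    {'S', "Bc", "ac",  0}, {'S', "DB", "dac", 0},
    {'B', "ab", "a",   0}, {'B', "cS", "c",   0},
    {'D', "d",  "d",   0}, {'D', "",   "",    1},
};
enum { NRULES = sizeof RULES / sizeof RULES[0] };

/* FOLLOW sets as strings of terminals; '$' is the end marker. */
static const char *follow(char nt) {
    switch (nt) {
    case 'S': return "$c";
    case 'B': return "c$";
    case 'D': return "ac";
    default:  return "";
    }
}

/* Number of productions that the rule places in cell M[nt, terminal]. */
int cell_count(char nt, char terminal) {
    int count = 0;
    for (int i = 0; i < NRULES; i++) {
        const Rule *r = &RULES[i];
        if (r->lhs != nt) continue;
        if (strchr(r->first, terminal) != NULL)
            count++;                                   /* via FIRST(rhs) */
        else if (r->nullable && strchr(follow(nt), terminal) != NULL)
            count++;                                   /* via FOLLOW(lhs) */
    }
    return count;
}
```

A cell with a count above 1, such as M[S, a], holds more than one production, which means the grammar as given is not LL(1).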
LL(1) PARSER
input buffer
 our string to be parsed. We will assume that its
end is marked with a special symbol $.
stack
 contains the grammar symbols
 at the bottom of the stack, there is a special end
marker symbol $.
 initially the stack contains only the symbol $ and
the starting symbol S. $S ← initial stack
 when the stack is emptied (i.e. only $ left in the
stack), the parsing is completed.
LL(1) PARSER
output
a production rule representing a step of the
derivation sequence (left-most derivation) of
the string in the input buffer.
parsing table
 a two-dimensional array M[A,a]
 each row is a non-terminal symbol
 each column is a terminal symbol & the special
symbol $
 each entry holds a production rule.
LL(1) PARSER – PARSER ACTIONS
 The symbol at the top of the stack (say X) and the
current symbol in the input string (say a)
determine the parser action.
 There are four possible parser actions.
1. If X and a are both $ → the parser halts (successful completion).
2. If X and a are the same terminal symbol (different from $) →
the parser pops X from the stack and moves to the next symbol in the
input buffer.
3. If X is a non-terminal → the parser looks at the entry M[X,a]. If
M[X,a] holds a production rule X → Y1Y2...Yk, it pops X and pushes Yk,Yk-1,...,Y1
onto the stack. The parser also outputs the production rule X → Y1Y2...Yk
to represent a step of the derivation.
4. None of the above → error
– All empty entries in the parsing table are errors.
– If X is a terminal symbol different from a, this is also an error case.
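The four parser actions can be sketched as a short C loop. This sketch hard-codes the parsing table for the small grammar S → aBa, B → bB | ε and uses one character per grammar symbol. The function names table and ll1_parse, the fixed-size stack, and the convention that the end marker $ is implied by the end of the string are all assumptions made for illustration.

```c
#include <stddef.h>
#include <string.h>

/* Parsing table lookup M[X, a] for S -> aBa, B -> bB | epsilon.
   Returns the right-hand side to push, or NULL for an empty (error) entry. */
static const char *table(char nonterminal, char lookahead) {
    if (nonterminal == 'S' && lookahead == 'a') return "aBa";
    if (nonterminal == 'B' && lookahead == 'b') return "bB";
    if (nonterminal == 'B' && lookahead == 'a') return "";   /* B -> epsilon */
    return NULL;
}

/* Returns 1 if `input` (written without the end marker) is accepted. */
int ll1_parse(const char *input) {
    char stack[256];
    size_t top = 0, pos = 0, len = strlen(input);
    stack[top++] = '$';                       /* bottom-of-stack marker */
    stack[top++] = 'S';                       /* start symbol           */
    for (;;) {
        char X = stack[top - 1];
        char a = pos < len ? input[pos] : '$';
        if (X == '$') return pos == len;              /* action 1: halt    */
        if (X == 'S' || X == 'B') {                   /* action 3: expand  */
            const char *rhs = table(X, a);
            if (rhs == NULL) return 0;                /* empty table entry */
            size_t n = strlen(rhs);
            top--;                                    /* pop X             */
            if (top + n > sizeof stack) return 0;     /* stack guard       */
            for (size_t i = n; i > 0; i--)            /* push Yk ... Y1    */
                stack[top++] = rhs[i - 1];
        } else if (X == a) {                          /* action 2: match   */
            top--;
            pos++;
        } else {
            return 0;                                 /* action 4: error   */
        }
    }
}
```

Pushing the right-hand side in reverse order leaves Y1 on top of the stack, so the symbols are processed left to right and the output corresponds to a left-most derivation.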
LL(1) PARSER
EXAMPLE TO PARSE ID+ID
stack        input     output
$E           id+id$    E → TE’
$E’T         id+id$    T → FT’
$E’T’F       id+id$    F → id
$E’T’id      id+id$
$E’T’        +id$      T’ → ε
$E’          +id$      E’ → +TE’
$E’T+        +id$
$E’T         id$       T → FT’
$E’T’F       id$       F → id
$E’T’id      id$
$E’T’        $         T’ → ε
$E’          $         E’ → ε
$            $         accept
Parsing table:
      id          +             $
E     E → TE’
E’                E’ → +TE’     E’ → ε
T     T → FT’
T’                T’ → ε        T’ → ε
F     F → id
LL(1) PARSER – ANOTHER EXAMPLE
S → aBa
B → bB | ε
w = abba
stack    input    output
$S       abba$    S → aBa
$aBa     abba$
$aB      bba$     B → bB
$aBb     bba$
$aB      ba$      B → bB
$aBb     ba$
$aB      a$       B → ε
$a       a$
$        $        accept, successful completion
LL(1) Parsing Table
     a           b           $
S    S → aBa
B    B → ε       B → bB
LL(1) PARSER – ANOTHER EXAMPLE (CONT.)
Outputs: S → aBa   B → bB   B → bB   B → ε
Derivation (left-most): S ⇒ aBa ⇒ abBa ⇒ abbBa ⇒ abba
Parse tree:
S
  a
  B
    b
    B
      b
      B
        ε
  a
RECURSIVE DESCENT
PREDICTIVE PARSING
RECURSIVE DESCENT PREDICTIVE PARSING
Original grammar:
PROGRAM → begin DECLIST comma STATELIST end
DECLIST → d semi DECLIST
DECLIST → d
STATELIST → s semi STATELIST
STATELIST → s
After left factoring, the grammar is changed to
PROGRAM → begin DECLIST comma STATELIST end
DECLIST → dX
X → semi DECLIST | є
STATELIST → sY
Y → semi STATELIST | є
First(X) = {semi, є}    Follow(X) = {comma}
First(Y) = {semi, є}    Follow(Y) = {end}
Write functions for each nonterminal.
PROGRAM → begin DECLIST comma
STATELIST end
DECLIST → dX
X → semi DECLIST | є
STATELIST → sY
Y → semi STATELIST | є
int main()
{
    token = lexical();      /* read the first token */
    PROGRAM();
    return 0;
}
/* PROGRAM → begin DECLIST comma STATELIST end */
void PROGRAM()
{
    if (token != begin) error();
    token = lexical();
    DECLIST();
    if (token != comma) error();
    token = lexical();
    STATELIST();
    if (token != end) error();
}
/* DECLIST → dX */
void DECLIST()
{
    if (token != d) error();
    token = lexical();
    X();
}
/* X → semi DECLIST | є */
void X()
{
    if (token == semi)
    {
        token = lexical();
        DECLIST();
    }
    else if (token == comma)
        ;                   /* є: comma is in Follow(X), do nothing */
    else
        error();
}
/* STATELIST → sY */
void STATELIST()
{
    if (token != s) error();
    token = lexical();
    Y();
}
/* Y → semi STATELIST | є */
void Y()
{
    if (token == semi)
    {
        token = lexical();
        STATELIST();
    }
    else if (token == end)
        ;                   /* є: end is in Follow(Y), do nothing */
    else
        error();
}
CHANGING RECURSION INTO ITERATION
Change productions into an extended notation
that includes the *.
PROGRAM → begin DECLIST comma
STATELIST end
DECLIST → dX
X → semi DECLIST | є
STATELIST → sY
Y → semi STATELIST | є
PROGRAM → begin DECLIST comma
STATELIST end
DECLIST → d (semi d)*
STATELIST → s (semi s)*
CHANGING RECURSION INTO ITERATION
/* DECLIST → d (semi d)* */
void DECLIST()
{
    if (token != d) error();
    token = lexical();
    while (token == semi)
    {
        token = lexical();
        if (token != d) error();
        token = lexical();
    }
}
CHANGING RECURSION INTO ITERATION
/* STATELIST → s (semi s)* */
void STATELIST()
{
    if (token != s) error();
    token = lexical();
    while (token == semi)
    {
        token = lexical();
        if (token != s) error();
        token = lexical();
    }
}
CHANGING RECURSION INTO ITERATION
Removal of recursion is not always possible. A
context-free grammar might contain middle
recursion (such as F → ‘(‘ E ‘)’ below), and this cannot be replaced by
iteration. For example
E → E ‘+’ T
E → T
T → T ‘*’ F
T → F
F → ‘(‘ E ‘)’
F → ‘x’
Transforming the grammar into LL(1)
E → E ‘+’ T
E → T
T → T ‘*’ F
T → F
F → ‘(‘ E ‘)’
F → ‘x’
Eliminating left recursion gives
E → TX
X → ‘+’ TX | є
T → FY
Y → ‘*’ FY | є
F → ‘(‘ E ‘)’ | ‘x’
Replacing recursion by iteration, where
possible, we have
E → T( ‘+’ T)*
T → F(‘*’ F)*
F → ‘(‘ E ‘)’ | ‘x’
/* E → T (‘+’ T)* */
void E()
{
    T();
    while (token == plus)
    {
        token = lexical();
        T();
    }
}
/* T → F (‘*’ F)* */
void T()
{
    F();
    while (token == times)
    {
        token = lexical();
        F();
    }
}
/* F → ‘(‘ E ‘)’ | ‘x’ */
void F()
{
    if (token == obracket)
    {
        token = lexical();
        E();
        if (token == cbracket)
            token = lexical();
        else
            error();
    }
    else if (token == x)
        token = lexical();
    else
        error();
}
int main()
{
    token = lexical();      /* read the first token */
    E();
    return 0;
}
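The functions above rely on a global token, a lexical() routine and an error() routine that the slides do not define. The following self-contained C sketch supplies one possible version of each so that the parser can be run on a string. The TOK_ names, the src and failed variables, and the accepts() wrapper are assumptions, not part of the original code. Here error() only records the failure instead of stopping the program.

```c
#include <stddef.h>

typedef enum { TOK_X, TOK_PLUS, TOK_TIMES, TOK_OBRACKET,
               TOK_CBRACKET, TOK_END, TOK_BAD } Token;

static const char *src;   /* remaining input characters */
static Token token;       /* current lookahead token    */
static int failed;        /* set to 1 by error()        */

/* Return the next token, skipping blanks. */
static Token lexical(void) {
    while (*src == ' ') src++;
    switch (*src) {
    case '\0': return TOK_END;
    case 'x':  src++; return TOK_X;
    case '+':  src++; return TOK_PLUS;
    case '*':  src++; return TOK_TIMES;
    case '(':  src++; return TOK_OBRACKET;
    case ')':  src++; return TOK_CBRACKET;
    default:   src++; return TOK_BAD;
    }
}

static void error(void) { failed = 1; }

static void E(void);

/* F -> '(' E ')' | 'x' */
static void F(void) {
    if (failed) return;
    if (token == TOK_OBRACKET) {
        token = lexical();
        E();
        if (token == TOK_CBRACKET) token = lexical();
        else error();
    } else if (token == TOK_X) {
        token = lexical();
    } else {
        error();
    }
}

/* T -> F ('*' F)* */
static void T(void) {
    F();
    while (!failed && token == TOK_TIMES) { token = lexical(); F(); }
}

/* E -> T ('+' T)* */
static void E(void) {
    T();
    while (!failed && token == TOK_PLUS) { token = lexical(); T(); }
}

/* Returns 1 if the whole input is an expression of the grammar. */
int accepts(const char *input) {
    src = input;
    failed = 0;
    token = lexical();
    E();
    return !failed && token == TOK_END;
}
```

For example, accepts("(x+x)*x") returns 1, while accepts("x+") returns 0 because F finds the end of input where it expects ( or x. The final check against TOK_END rejects trailing input such as "x x".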

Editor's Notes

  • #5 December 25, 2023