CSE340 - Principles of
Programming Languages
Lecture 15:
Parsing Techniques III
Javier Gonzalez-Sanchez
javiergs@asu.edu
BYENG M1-38
Office Hours: By appointment
Javier Gonzalez-Sanchez | CSE340 | Summer 2015 | 2
Parser | Error Recovery
[Diagram: error recovery for the PROGRAM rule]
Errors reported: "Line N: expected {" and "Line N: expected }"
Recovery: currentToken++; while searching for FIRST(BODY) or }
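The recovery idea on this slide can be sketched as follows. This is a minimal sketch under stated assumptions, not the course's parser: tokens are plain strings, FIRST_BODY is filled in from the FIRST/FOLLOW table, BODY itself is not parsed, and the names skip_until and rule_program are hypothetical.

```python
# Hypothetical sketch of error recovery for PROGRAM -> '{' BODY '}'.
# Assumption: tokens are plain strings and "identifier" is the token name
# used for any identifier.
FIRST_BODY = {"print", "identifier", "int", "float", "boolean", "void",
              "char", "string", "while", "if", "return"}


def skip_until(tokens, current, stop_set):
    """Advance current (currentToken++) until a token in stop_set or end of input."""
    while current < len(tokens) and tokens[current] not in stop_set:
        current += 1
    return current


def rule_program(tokens, line=1):
    """Return the list of error messages found while matching '{' BODY '}'."""
    errors = []
    current = 0
    if current < len(tokens) and tokens[current] == "{":
        current += 1
    else:
        errors.append(f"Line {line}: expected {{")
        # Search for FIRST(BODY) or } so that parsing can continue.
        current = skip_until(tokens, current, FIRST_BODY | {"}"})
    # BODY is not parsed in this sketch; its tokens are skipped up to '}'.
    current = skip_until(tokens, current, {"}"})
    if current < len(tokens):
        current += 1  # consume '}'
    else:
        errors.append(f"Line {line}: expected }}")
    return errors
```

The point of the design is that one missing brace produces one message and parsing resumes at a token the grammar can handle, instead of stopping at the first error.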
Parser | Error Recovery
Rule         FIRST set and FOLLOW set
PROGRAM      FIRST: {      FOLLOW: EOF
BODY         FIRST: FIRST(PRINT) U FIRST(ASSIGNMENT) U FIRST(VARIABLE) U FIRST(WHILE) U FIRST(IF) U FIRST(RETURN)      FOLLOW: }
PRINT        FIRST: print      FOLLOW: ;
ASSIGNMENT   FIRST: identifier      FOLLOW: ;
VARIABLE     FIRST: int, float, boolean, void, char, string      FOLLOW: ;
WHILE        FIRST: while      FOLLOW: } U FIRST(BODY)
IF           FIRST: if      FOLLOW: } U FIRST(BODY)
RETURN       FIRST: return      FOLLOW: ;
EXPRESSION   FIRST: FIRST(X)      FOLLOW: ), ;
X            FIRST: FIRST(Y)      FOLLOW: | U FOLLOW(EXPRESSION)
Y            FIRST: ! U FIRST(R)      FOLLOW: & U FOLLOW(X)
R            FIRST: FIRST(E)      FOLLOW: FOLLOW(Y)
E            FIRST: FIRST(A)      FOLLOW: !=, ==, >, < U FOLLOW(R)
A            FIRST: FIRST(B)      FOLLOW: -, + U FOLLOW(E)
B            FIRST: - U FIRST(C)      FOLLOW: *, / U FOLLOW(A)
C            FIRST: integer, octal, hexadecimal, binary, true, false, string, char, float, identifier, (      FOLLOW: FOLLOW(B)
Calculating the First Set
FIRST set
Definition
FIRST (a) is the set of tokens that can begin the construction a.
Example
<E> → <A> {+ <A>}
<A> → <B> {* <B>}
<B> → -<C> | <C>
<C> → integer
FIRST(E) = {-, integer}
FIRST(A) = {-, integer}
FIRST(B) = {-, integer}
FIRST(C) = {integer}
FIRST set
Define FIRST (BODY)
FIRST(BODY) =
FIRST (PRINT) U FIRST (ASSIGNMENT) U FIRST(VARIABLE) U FIRST(WHILE) U
FIRST(IF) U FIRST(RETURN)
FIRST set
Define FIRST (C)
FIRST set
Define FIRST (A)
Define FIRST (B)
FIRST set
<S> → <A><B><C>
<S> → <F>
<A> → <E><F>d
<A> → a
<B> → a<B>b
<B> → ε
<C> → c<C>
<C> → d
<E> → e<E>
<E> → <F>
<F> → <F>f
<F> → ε
Calculate the FIRST set
1. FIRST(X) = {X} if X is a terminal.
2. FIRST(ε) = {ε}. Note that this is not covered by the first rule because ε is not a terminal.
3. If A → Xα, add FIRST(X) − {ε} to FIRST(A).
4. If A → A1 A2 A3 ... Ai Ai+1 ... Ak and
   ε ∈ FIRST(A1) and ε ∈ FIRST(A2) and ... and ε ∈ FIRST(Ai),
   then add FIRST(Ai+1) − {ε} to FIRST(A).
5. If A → A1 A2 A3 ... Ak and
   ε ∈ FIRST(A1) and ε ∈ FIRST(A2) and ... and ε ∈ FIRST(Ak),
   then add ε to FIRST(A).
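The five rules above can be applied repeatedly until no FIRST set grows. The following is a minimal sketch of that fixed-point loop, assuming a grammar stored as a dictionary from nonterminal to a list of alternatives (an empty tuple stands for an ε-production). The names GRAMMAR, EPS, and compute_first are illustrative assumptions; the grammar is the S, A, B, C, E, F example from the slides.

```python
EPS = "ε"

# Grammar from the slides: nonterminal -> list of alternatives (tuples of symbols).
# An empty tuple stands for an ε-production.
GRAMMAR = {
    "S": [("A", "B", "C"), ("F",)],
    "A": [("E", "F", "d"), ("a",)],
    "B": [("a", "B", "b"), ()],
    "C": [("c", "C"), ("d",)],
    "E": [("e", "E"), ("F",)],
    "F": [("F", "f"), ()],
}


def compute_first(grammar):
    """Return FIRST for every nonterminal, iterating until nothing changes."""
    first = {nonterminal: set() for nonterminal in grammar}
    changed = True
    while changed:  # loop until no set grows
        changed = False
        for head, alternatives in grammar.items():
            for body in alternatives:
                before = len(first[head])
                all_nullable = True
                for symbol in body:
                    if symbol not in grammar:  # rule 1: a terminal begins itself
                        first[head].add(symbol)
                        all_nullable = False
                        break
                    first[head] |= first[symbol] - {EPS}  # rules 3 and 4
                    if EPS not in first[symbol]:
                        all_nullable = False
                        break
                if all_nullable:  # rules 2 and 5: every symbol can vanish
                    first[head].add(EPS)
                if len(first[head]) != before:
                    changed = True
    return first


FIRST = compute_first(GRAMMAR)
```

For this grammar the loop ends with the same final sets as the evolution table for the example, for instance FIRST["A"] == {"a", "e", "f", "d"}.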
Calculate the FIRST set
Apply the rules in a loop until no FIRST set changes.
FIRST set
S → ABC
S → F
A → EFd
A → a
B → aBb
B → ε
C → cC
C → d
E → eE
E → F
F → Ff
F → ε
Rule   FIRST set evolution
S      ø   {a, ε}   {a, ε, e, f}   {a, ε, e, f, d}
A      ø   {a}   {a, e}   {a, e, f, d}
B      ø   {a, ε}
C      ø   {c, d}
E      ø   {e}   {e, ε}   {e, ε, f}
F      ø   {ε}   {ε, f}
FIRST set | Exercise
<X> → <A> | <A> a
<A> → <B> | <B> b
<B> → <C><D><E> | c d e | <C> c <D> d <E> e
<C> → one
<D> → two
<E> → three
FIRST set | Exercise
<X> → <A>
<X> → <A> a
<A> → <B>
<A> → <B> b
<B> → <C><D><E>
<B> → c d e
<B> → <C> c <D> d <E> e
<C> → one
<D> → two
<E> → three
OPTION 1:
FIRST(X) = {c, one}
FIRST(A) = {c, one}
FIRST(B) = {c, one}
FIRST(C) = {one}
FIRST(D) = {two}
FIRST(E) = {three}
OPTION 2:
FIRST(X) = {c, one, ε}
FIRST(A) = {b, c, one}
FIRST(B) = {c, one}
FIRST(C) = {one}
FIRST(D) = {two}
FIRST(E) = {three}
FIRST set | Exercise
<X> → <A>
<X> → <A> a
<A> → <B>
<A> → <B> b
<B> → <C><D><E>
<B> → c d e
<B> → <C> c <D> d <E> e
<C> → one
<C> → ε
<D> → two
<D> → ε
<E> → three
FIRST(X) = {c, one, three, two}
FIRST(A) = {c, one, three, two}
FIRST(B) = {c, one, three, two}
FIRST(C) = {one, ε}
FIRST(D) = {two, ε}
FIRST(E) = {three}
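The effect of the ε-productions on FIRST(B) can be checked one alternative at a time. The sketch below assumes the FIRST sets of C, D, and E listed above and treats any symbol without an entry as a terminal; first_of_sequence is a hypothetical helper name, not something defined in the slides.

```python
EPS = "ε"


def first_of_sequence(symbols, first):
    """FIRST of a string of grammar symbols.

    `first` maps nonterminals to their FIRST sets; symbols missing from
    it are treated as terminals (an assumption of this sketch).
    """
    result = set()
    for symbol in symbols:
        symbol_first = first.get(symbol, {symbol})
        result |= symbol_first - {EPS}
        if EPS not in symbol_first:
            return result  # this symbol cannot vanish, so stop here
    result.add(EPS)  # every symbol in the sequence can derive ε
    return result


FIRST = {"C": {"one", EPS}, "D": {"two", EPS}, "E": {"three"}}
alternatives_of_B = [
    ["C", "D", "E"],
    ["c", "d", "e"],
    ["C", "c", "D", "d", "E", "e"],
]
first_B = set()
for alternative in alternatives_of_B:
    first_B |= first_of_sequence(alternative, FIRST)
```

Because C and D can both derive ε, the first alternative contributes one, two, and three, which is why two and three appear in FIRST(B), FIRST(A), and FIRST(X).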
Calculating the Follow Set
FOLLOW set
Definition
FOLLOW (a) is the set of tokens that can follow the construction a.
Example
<E> → <A> {+ <A>}
<A> → <B> {* <B>}
<B> → -<C> | <C>
<C> → integer
5 + 4 + -7 * 12 + 75
5 + 4 + ((-7) * 12) + 75
What follows <C> ?
What follows <A> ?
What follows <E> ?
FOLLOW set
Definition
FOLLOW (a) is the set of tokens that can follow the construction a.
Example
<E> → <A> {+ <A>}
<A> → <B> {* <B>}
<B> → -<C> | <C>
<C> → integer
FOLLOW(E) = {$} // $ represents end of input, i.e., EOF
FOLLOW(A) = {+, $}
FOLLOW(B) = {*, +, $}
FOLLOW(C) = {*, +, $}
FOLLOW set
Define FOLLOW (BODY)
FOLLOW(BODY) = }
FOLLOW set
Define FOLLOW (C)
FOLLOW set
<S> → <A><B><C>
<S> → <F>
<A> → <E><F>d
<A> → a
<B> → a<B>b
<B> → ε
<C> → c<C>
<C> → d
<E> → e<E>
<E> → <F>
<F> → <F>f
<F> → ε
Calculate the FOLLOW set
1. First put $ (the end-of-input marker) in FOLLOW(S), where S is the start symbol.
2. If there is a production A → aBb
   (where a and b can each be a whole string of symbols),
   then everything in FIRST(b) except ε is placed in FOLLOW(B).
   (Compute FIRST(b) with rule 4 of the FIRST set calculation.)
3. If there is a production A → aB,
   then add FOLLOW(A) to FOLLOW(B).
4. If there is a production A → aBb
   where FIRST(b) contains ε,
   then add FOLLOW(A) to FOLLOW(B).
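The four FOLLOW rules can also be run as a loop that stops when no set grows. This is a minimal sketch for the S, A, B, C, E, F grammar from the slides; it assumes the FIRST sets are already known and hard-codes them, and the names GRAMMAR, first_of, and compute_follow are illustrative assumptions.

```python
EPS, EOF = "ε", "eof"

# Grammar from the slides; an empty tuple stands for an ε-production.
GRAMMAR = {
    "S": [("A", "B", "C"), ("F",)],
    "A": [("E", "F", "d"), ("a",)],
    "B": [("a", "B", "b"), ()],
    "C": [("c", "C"), ("d",)],
    "E": [("e", "E"), ("F",)],
    "F": [("F", "f"), ()],
}

# FIRST sets of the nonterminals, assumed to be computed beforehand.
FIRST = {
    "S": {"a", "e", "f", "d", EPS},
    "A": {"a", "e", "f", "d"},
    "B": {"a", EPS},
    "C": {"c", "d"},
    "E": {"e", "f", EPS},
    "F": {"f", EPS},
}


def first_of(symbols):
    """FIRST of a string of symbols; contains ε when the whole string can vanish."""
    result = set()
    for symbol in symbols:
        symbol_first = FIRST.get(symbol, {symbol})  # terminals begin themselves
        result |= symbol_first - {EPS}
        if EPS not in symbol_first:
            return result
    return result | {EPS}


def compute_follow(grammar, start):
    follow = {nonterminal: set() for nonterminal in grammar}
    follow[start].add(EOF)  # rule 1
    changed = True
    while changed:
        changed = False
        for head, alternatives in grammar.items():
            for body in alternatives:
                for i, symbol in enumerate(body):
                    if symbol not in grammar:
                        continue  # FOLLOW is only kept for nonterminals
                    rest = first_of(body[i + 1:])
                    new = rest - {EPS}  # rule 2
                    if EPS in rest:  # rules 3 and 4: the rest can vanish
                        new |= follow[head]
                    if not new <= follow[symbol]:
                        follow[symbol] |= new
                        changed = True
    return follow


FOLLOW = compute_follow(GRAMMAR, "S")
```

Rule 3 falls out of rule 4 here because first_of returns {ε} for an empty remainder.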
Calculate the FOLLOW set
FOLLOW set
S → ABC
S → F
A → EFd
A → a
B → aBb
B → ε
C → cC
C → d
E → eE
E → F
F → Ff
F → ε
Rule   FOLLOW set evolution
S      {eof}
A      {a}   {a, c, d}
B      {c, d}   {c, d, b}
C      {eof}
E      {f}   {f, d}
F      {eof}   {eof, d}   {eof, d, f}
FIRST sets:
S = {a, ε, e, f, d}
A = {a, e, f, d}
B = {a, ε}
C = {c, d}
E = {e, ε, f}
F = {ε, f}
Another Example
<E> → <T> {+<T>}
<T> → <F> {*<F>}
<F> → (<E>) | integer
FIRST(E) = {(, integer}
FIRST(T) = {(, integer}
FIRST(F) = {(, integer}
FOLLOW(E) = {$, )}
FOLLOW(T) = {$, ), +}
FOLLOW(F) = {$, ), +, *}
Prediction Rules
Prediction Rules
Rule 1.
It should always be possible to choose among several
alternatives in a grammar rule.
FIRST(R1) ∩ FIRST(R2) ∩ FIRST(R3) ∩ ... ∩ FIRST(Rn) = Ø
BODY
Prediction Rules
Rule 1.1
The FIRST sets of any two choices in one rule must not
have tokens in common in order to implement a single-symbol
lookahead predictive parser.
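Rule 1.1 amounts to a pairwise disjointness test on the FIRST sets of the alternatives. The sketch below assumes the FIRST sets of the BODY alternatives from the FIRST/FOLLOW table and, as a failing case, the three alternatives of <B> from the exercise grammar without ε-productions. The function name alternatives_are_disjoint is a hypothetical choice.

```python
from itertools import combinations


def alternatives_are_disjoint(first_sets):
    """True when no two alternatives' FIRST sets share a token (Rule 1.1)."""
    return all(a.isdisjoint(b) for a, b in combinations(first_sets, 2))


# FIRST sets of the BODY alternatives: PRINT, ASSIGNMENT, VARIABLE,
# WHILE, IF, RETURN.
body_alternatives = [
    {"print"},
    {"identifier"},
    {"int", "float", "boolean", "void", "char", "string"},
    {"while"},
    {"if"},
    {"return"},
]

# FIRST sets of <C><D><E>, c d e, and <C> c <D> d <E> e when <C> -> one.
b_alternatives = [{"one"}, {"c"}, {"one"}]

body_ok = alternatives_are_disjoint(body_alternatives)
b_ok = alternatives_are_disjoint(b_alternatives)
```

BODY passes because each alternative starts with a different token; <B> fails because two of its alternatives both begin with one, so a single lookahead token cannot pick between them.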
Prediction Rules
Rule 2.
For any optional part, no token beginning the optional part
can also come after the optional part.
FIRST(RULE) ∩ FOLLOW(RULE) = Ø
BODY PROGRAM
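Rule 2 can be checked the same way, by comparing the tokens that begin an optional part with the tokens that may come after it. The sketch below uses the E, T, F example: the optional part {+<T>} in <E> begins with + and is followed by FOLLOW(E), and {*<F>} in <T> begins with * and is followed by FOLLOW(T). The function name and the failing sets at the end are hypothetical illustrations.

```python
def optional_part_is_predictable(first_of_optional, follow_of_optional):
    """True when no token that begins the optional part can also follow it (Rule 2)."""
    return first_of_optional.isdisjoint(follow_of_optional)


# <E> -> <T> {+<T>}: the optional part begins with +, FOLLOW(E) = {$, )}
e_ok = optional_part_is_predictable({"+"}, {"$", ")"})

# <T> -> <F> {*<F>}: the optional part begins with *, FOLLOW(T) = {$, ), +}
t_ok = optional_part_is_predictable({"*"}, {"$", ")", "+"})

# Hypothetical failing case: + both begins the optional part and may follow it.
bad = optional_part_is_predictable({"+"}, {"+", "$"})
```

When the check fails, the parser cannot tell from one lookahead token whether to enter the optional part or to skip it.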
Homework
Programming Assignment #2
CSE340 - Principles of Programming Languages
Javier Gonzalez-Sanchez
javiergs@asu.edu
Summer 2015
Disclaimer. These slides can only be used as study material for the class CSE340 at ASU. They cannot be distributed or used for another purpose.
