Ch4a

Syntax Analysis, Part I
Chapter 4
COP5621 Compiler Construction
Copyright Robert van Engelen, Florida State University, 2007-2009

Position of a Parser in the Compiler Model
[Figure: Source Program → Lexical Analyzer → Parser and rest of front-end → Intermediate representation. The parser issues "Get next token" and receives "Token, tokenval"; both components use the Symbol Table. Labeled error kinds: Lexical error, Syntax error, Semantic error.]

The Parser

Syntax-Directed Translation

Error Handling

Viable-Prefix Property
Examples:
… for (;) …
… DO 10 I = 1;0 …
[Figure: in each example an arrow marks where the error is detected ("Error is detected here"); the input before that point is labeled "Prefix".]

Error Recovery Strategies

Grammars (Recap)

Notational Conventions Used

Derivations (Recap)

Derivation (Example)
Grammar G = ({E}, {+, *, (, ), -, id}, P, E) with productions P =
E → E + E
E → E * E
E → ( E )
E → - E
E → id
Example derivations:
E ⇒ - E ⇒ - id
E ⇒rm E + E ⇒rm E + id ⇒rm id + id
E ⇒* E
E ⇒* id + id
E ⇒+ id * id + id
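The rightmost derivation on this slide can be replayed mechanically. The following is a minimal Python sketch, not part of the original slides; the names PRODUCTIONS, derive_step and rightmost_derivation are hypothetical helpers introduced only for illustration.

```python
# Productions of the example grammar G; the only nonterminal is E.
PRODUCTIONS = {
    "E": [["E", "+", "E"], ["E", "*", "E"], ["(", "E", ")"], ["-", "E"], ["id"]],
}

def derive_step(form, body, rightmost=False):
    """Rewrite the leftmost (or rightmost) nonterminal of `form` with `body`."""
    positions = [i for i, sym in enumerate(form) if sym in PRODUCTIONS]
    if not positions:
        raise ValueError("sentential form contains no nonterminal")
    i = positions[-1] if rightmost else positions[0]
    if body not in PRODUCTIONS[form[i]]:
        raise ValueError("not a production of " + form[i])
    return form[:i] + body + form[i + 1:]

def rightmost_derivation(start, bodies):
    """Apply the given production bodies in order, always at the rightmost nonterminal."""
    forms = [[start]]
    for body in bodies:
        forms.append(derive_step(forms[-1], body, rightmost=True))
    return forms

steps = rightmost_derivation("E", [["E", "+", "E"], ["id"], ["id"]])
print(" =>rm ".join(" ".join(form) for form in steps))
```

Passing rightmost=False to derive_step gives the corresponding leftmost step instead.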

Chomsky Hierarchy: Language Classification

Chomsky Hierarchy
L(regular) ⊂ L(context free) ⊂ L(context sensitive) ⊂ L(unrestricted)
Where L(T) = { L(G) | G is of type T }
That is: the set of all languages generated by grammars G of type T
Examples:
Every finite language is regular! (construct a FSA for strings in L(G))
L1 = { a^n b^n | n ≥ 1 } is context free
L2 = { a^n b^n c^n | n ≥ 1 } is context sensitive

Parsing

Top-Down Parsing
Grammar:
E → T + T
T → ( E )
T → - E
T → id
Leftmost derivation:
E ⇒lm T + T ⇒lm id + T ⇒lm id + id
[Figure: parse trees built top-down, one per step of the leftmost derivation.]

Left Recursion (Recap)

A General Systematic Left Recursion Elimination Method
Input: Grammar G with no cycles or ε-productions
Arrange the nonterminals in some order A_1, A_2, …, A_n
for i = 1, …, n do
    for j = 1, …, i-1 do
        replace each A_i → A_j γ
        with A_i → δ_1 γ | δ_2 γ | … | δ_k γ
        where A_j → δ_1 | δ_2 | … | δ_k
    enddo
    eliminate the immediate left recursion in A_i
enddo

Immediate Left-Recursion Elimination
Rewrite every left-recursive production
A → A α | β | γ | A δ
into a right-recursive production:
A → β A_R | γ A_R
A_R → α A_R | δ A_R | ε
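The rewrite rule above can be sketched in a few lines of Python. This is an illustrative sketch only: the tuple-based grammar representation, the "_R" naming scheme and the function name eliminate_immediate are assumptions made here, not something the slides prescribe.

```python
EPS = ()  # the empty body stands for ε

def eliminate_immediate(nt, bodies):
    """Turn A -> A α | β into A -> β A_R and A_R -> α A_R | ε.

    `bodies` is a list of tuples of grammar symbols.
    Returns a dict mapping nonterminals to their new bodies.
    """
    recursive = [b[1:] for b in bodies if b and b[0] == nt]  # the α and δ parts
    others = [b for b in bodies if not b or b[0] != nt]      # the β and γ parts
    if not recursive:
        return {nt: list(bodies)}  # nothing to do
    new_nt = nt + "_R"
    return {
        nt: [b + (new_nt,) for b in others],
        new_nt: [a + (new_nt,) for a in recursive] + [EPS],
    }

# Example: E -> E + T | T becomes E -> T E_R and E_R -> + T E_R | ε
result = eliminate_immediate("E", [("E", "+", "T"), ("T",)])
print(result)
```

As on the slide, the sketch assumes the input has no cycles; a production such as A → A would otherwise yield the useless rule A_R → A_R.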

Example Left Recursion Elim.
A → B C | a
B → C A | A b
C → A B | C C | a
Choose arrangement: A, B, C
i = 1: nothing to do
i = 2, j = 1:
    B → C A | A b
    ⇒ B → C A | B C b | a b
    ⇒(imm) B → C A B_R | a b B_R
           B_R → C b B_R | ε
i = 3, j = 1:
    C → A B | C C | a
    ⇒ C → B C B | a B | C C | a
i = 3, j = 2:
    C → B C B | a B | C C | a
    ⇒ C → C A B_R C B | a b B_R C B | a B | C C | a
    ⇒(imm) C → a b B_R C B C_R | a B C_R | a C_R
           C_R → A B_R C B C_R | C C_R | ε
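The worked example can be checked with a small Python sketch of the general method. The grammar encoding (dict of tuples), the "_R" suffix and the function names are assumptions chosen for illustration; the arrangement A, B, C is the one chosen on the slide.

```python
def eliminate_immediate(nt, bodies):
    """A -> A α | β  becomes  A -> β A_R,  A_R -> α A_R | ε (ε is the empty tuple)."""
    rec = [b[1:] for b in bodies if b and b[0] == nt]
    rest = [b for b in bodies if not b or b[0] != nt]
    if not rec:
        return {nt: bodies}
    r = nt + "_R"
    return {nt: [b + (r,) for b in rest], r: [a + (r,) for a in rec] + [()]}

def eliminate_left_recursion(grammar, order):
    """General method: substitute earlier nonterminals, then remove immediate recursion."""
    g = {nt: list(bodies) for nt, bodies in grammar.items()}
    for i, ai in enumerate(order):
        for aj in order[:i]:
            expanded = []
            for body in g[ai]:
                if body and body[0] == aj:
                    # replace A_i -> A_j γ by A_i -> δ γ for every A_j -> δ
                    expanded += [delta + body[1:] for delta in g[aj]]
                else:
                    expanded.append(body)
            g[ai] = expanded
        g.update(eliminate_immediate(ai, g[ai]))
    return g

GRAMMAR = {
    "A": [("B", "C"), ("a",)],
    "B": [("C", "A"), ("A", "b")],
    "C": [("A", "B"), ("C", "C"), ("a",)],
}
result = eliminate_left_recursion(GRAMMAR, ["A", "B", "C"])
for nt, bodies in result.items():
    print(nt, "->", " | ".join(" ".join(b) or "ε" for b in bodies))
```

A different arrangement of the nonterminals generally produces a different, but equivalent, grammar.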

Left Factoring

Predictive Parsing

FIRST (Revisited)

LL(1) Grammar

Recursive-Descent Parsing (Recap)

Using FIRST and FOLLOW in a Recursive-Descent Parser
expr → term rest
rest → + term rest | - term rest | ε
term → id

FIRST(+ term rest) = { + }
FIRST(- term rest) = { - }
FOLLOW(rest) = { $ }

procedure rest();
begin
    if lookahead in FIRST(+ term rest) then
        match('+'); term(); rest()
    else if lookahead in FIRST(- term rest) then
        match('-'); term(); rest()
    else if lookahead in FOLLOW(rest) then
        return
    else error()
end;
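The procedure above translates almost line by line into a runnable parser. The Python sketch below is one possible rendering, assuming the input is already a list of token names and that "$" marks the end of input; the class and function names are illustrative, not taken from the slides.

```python
class ParseError(Exception):
    pass

class Parser:
    def __init__(self, tokens):
        self.tokens = list(tokens) + ["$"]  # "$" is the end-of-input marker
        self.pos = 0

    @property
    def lookahead(self):
        return self.tokens[self.pos]

    def match(self, expected):
        if self.lookahead != expected:
            raise ParseError(f"expected {expected!r}, found {self.lookahead!r}")
        self.pos += 1

    def expr(self):              # expr -> term rest
        self.term()
        self.rest()

    def rest(self):
        if self.lookahead == "+":        # FIRST(+ term rest) = { + }
            self.match("+"); self.term(); self.rest()
        elif self.lookahead == "-":      # FIRST(- term rest) = { - }
            self.match("-"); self.term(); self.rest()
        elif self.lookahead == "$":      # FOLLOW(rest) = { $ }: choose rest -> ε
            return
        else:
            raise ParseError(f"unexpected {self.lookahead!r}")

    def term(self):              # term -> id
        self.match("id")

def parse(tokens):
    """Return True if `tokens` is derivable from expr, otherwise raise ParseError."""
    parser = Parser(tokens)
    parser.expr()
    parser.match("$")
    return True

print(parse(["id", "+", "id", "-", "id"]))
```

Each nonterminal becomes one method, and the FIRST and FOLLOW sets decide which alternative of rest to take with a single token of lookahead.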

Non-Recursive Predictive Parsing: Table-Driven Parsing
[Figure: the predictive parsing program (driver) reads the input (a + b $), maintains a stack (X Y Z $), consults the parsing table M, and produces output.]

Constructing an LL(1) Predictive Parsing Table
for each production A → α do
    for each a ∈ FIRST(α) do
        add A → α to M[A, a]
    enddo
    if ε ∈ FIRST(α) then
        for each b ∈ FOLLOW(A) do
            add A → α to M[A, b]
        enddo
    endif
enddo
Mark each undefined entry in M error

Example Table
E → T E_R
E_R → + T E_R | ε
T → F T_R
T_R → * F T_R | ε
F → ( E ) | id

A → α             FIRST(α)   FOLLOW(A)
E → T E_R         ( id       $ )
E_R → + T E_R     +          $ )
E_R → ε           ε          $ )
T → F T_R         ( id       + $ )
T_R → * F T_R     *          + $ )
T_R → ε           ε          + $ )
F → ( E )         (          * + $ )
F → id            id         * + $ )

      id             +                *                (              )          $
E     E → T E_R                                        E → T E_R
E_R                  E_R → + T E_R                                    E_R → ε    E_R → ε
T     T → F T_R                                        T → F T_R
T_R                  T_R → ε          T_R → * F T_R                   T_R → ε    T_R → ε
F     F → id                                           F → ( E )
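The table construction can be reproduced in Python for the expression grammar of this slide. The sketch below computes FIRST and FOLLOW by fixed-point iteration and then fills M; the data layout (lists for bodies, the empty list for ε, missing keys for error entries) and all function names are assumptions made for illustration.

```python
EPS, END = "ε", "$"

GRAMMAR = {  # nonterminal -> list of bodies; [] is the ε-body
    "E": [["T", "E_R"]],
    "E_R": [["+", "T", "E_R"], []],
    "T": [["F", "T_R"]],
    "T_R": [["*", "F", "T_R"], []],
    "F": [["(", "E", ")"], ["id"]],
}

def first_of(body, first):
    """FIRST of a string of symbols, given the FIRST sets of the nonterminals."""
    result = set()
    for sym in body:
        sym_first = first[sym] if sym in first else {sym}  # a terminal is its own FIRST
        result |= sym_first - {EPS}
        if EPS not in sym_first:
            return result
    result.add(EPS)  # every symbol can derive ε (or the body is empty)
    return result

def compute_first(grammar):
    first = {nt: set() for nt in grammar}
    changed = True
    while changed:  # iterate until no set grows any more
        changed = False
        for nt, bodies in grammar.items():
            for body in bodies:
                new = first_of(body, first)
                if not new <= first[nt]:
                    first[nt] |= new
                    changed = True
    return first

def compute_follow(grammar, first, start):
    follow = {nt: set() for nt in grammar}
    follow[start].add(END)
    changed = True
    while changed:
        changed = False
        for nt, bodies in grammar.items():
            for body in bodies:
                for i, sym in enumerate(body):
                    if sym not in grammar:
                        continue  # FOLLOW is only defined for nonterminals
                    tail = first_of(body[i + 1:], first)
                    new = tail - {EPS}
                    if EPS in tail:
                        new |= follow[nt]
                    if not new <= follow[sym]:
                        follow[sym] |= new
                        changed = True
    return follow

def build_table(grammar, start):
    """M[(A, a)] is the list of bodies added for that cell; absent keys are errors."""
    first = compute_first(grammar)
    follow = compute_follow(grammar, first, start)
    table = {}
    for nt, bodies in grammar.items():
        for body in bodies:
            body_first = first_of(body, first)
            lookaheads = body_first - {EPS}
            if EPS in body_first:
                lookaheads |= follow[nt]
            for a in lookaheads:
                table.setdefault((nt, a), []).append(body)
    return table

table = build_table(GRAMMAR, "E")
for (nt, a), bodies in sorted(table.items()):
    print(f"M[{nt}, {a}] =", " ; ".join(" ".join(b) or EPS for b in bodies))
```

Storing a list per cell keeps multiply-defined entries visible, which is how a grammar that is not LL(1) would show up.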

LL(1) Grammars are Unambiguous
Ambiguous grammar
S → i E t S S_R | a
S_R → e S | ε
E → b

A → α              FIRST(α)   FOLLOW(A)
S → i E t S S_R    i          e $
S → a              a          e $
S_R → e S          e          e $
S_R → ε            ε          e $
E → b              b          t

     a        b        e                     i                  t   $
S    S → a                                   S → i E t S S_R
S_R                    S_R → ε, S_R → e S                           S_R → ε
E             E → b

Error: duplicate table entry (M[S_R, e] holds both S_R → ε and S_R → e S)
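The duplicate entry can be found mechanically by filling the table from the FIRST and FOLLOW columns of this slide and looking for cells that receive more than one production. The Python sketch below hard-codes those sets as given on the slide; the names ROWS, build_entries and conflicts are hypothetical.

```python
# (head, body, FIRST(body), FOLLOW(head)) as listed on the slide; "" stands for ε.
ROWS = [
    ("S", "i E t S S_R", {"i"}, {"e", "$"}),
    ("S", "a", {"a"}, {"e", "$"}),
    ("S_R", "e S", {"e"}, {"e", "$"}),
    ("S_R", "", {""}, {"e", "$"}),
    ("E", "b", {"b"}, {"t"}),
]

def build_entries(rows):
    """Fill M[(A, a)] with every production body the LL(1) rule places there."""
    table = {}
    for head, body, first, follow in rows:
        lookaheads = first - {""}
        if "" in first:            # ε in FIRST(α): also use FOLLOW(A)
            lookaheads |= follow
        for a in lookaheads:
            table.setdefault((head, a), []).append(body)
    return table

def conflicts(table):
    """Cells holding more than one production; non-empty means not LL(1)."""
    return {cell: bodies for cell, bodies in table.items() if len(bodies) > 1}

print(conflicts(build_entries(ROWS)))
```

The only conflicting cell is M[S_R, e], which is the dangling-else ambiguity of this grammar.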

Predictive Parsing Program (Driver)
push($)
push(S)
a := lookahead
repeat
    X := pop()
    if X is a terminal or X = $ then
        match(X)  // moves to next token and a := lookahead
    else if M[X, a] = X → Y_1 Y_2 … Y_k then
        push(Y_k, Y_k-1, …, Y_2, Y_1)  // such that Y_1 is on top
        … invoke actions and/or produce IR output …
    else error()
    endif
until X = $
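A direct Python rendering of the driver is sketched below. It assumes the LL(1) table of the expression grammar (E, E_R, T, T_R, F) is given as a dictionary and that its "output" is simply the list of productions applied; the names TABLE, NONTERMINALS, ParseError and ll1_parse are illustrative choices.

```python
class ParseError(Exception):
    pass

NONTERMINALS = {"E", "E_R", "T", "T_R", "F"}
TABLE = {  # M[(X, a)] -> body; [] is the ε-body, missing keys are error entries
    ("E", "id"): ["T", "E_R"], ("E", "("): ["T", "E_R"],
    ("E_R", "+"): ["+", "T", "E_R"], ("E_R", ")"): [], ("E_R", "$"): [],
    ("T", "id"): ["F", "T_R"], ("T", "("): ["F", "T_R"],
    ("T_R", "+"): [], ("T_R", "*"): ["*", "F", "T_R"],
    ("T_R", ")"): [], ("T_R", "$"): [],
    ("F", "id"): ["id"], ("F", "("): ["(", "E", ")"],
}

def ll1_parse(tokens, start="E"):
    """Return the productions applied, in order, or raise ParseError."""
    tokens = list(tokens) + ["$"]
    pos = 0
    stack = ["$", start]
    output = []
    while True:
        x = stack.pop()
        a = tokens[pos]
        if x not in NONTERMINALS:          # X is a terminal or $
            if x != a:
                raise ParseError(f"expected {x!r}, found {a!r}")
            if x == "$":
                return output              # stack and input are both exhausted
            pos += 1                       # match: move to the next token
        elif (x, a) in TABLE:
            body = TABLE[(x, a)]
            stack.extend(reversed(body))   # push Y_k ... Y_1 so that Y_1 is on top
            output.append((x, body))
        else:
            raise ParseError(f"unexpected {a!r} while expanding {x}")

applied = ll1_parse(["id", "+", "id", "*", "id"])
for head, body in applied:
    print(head, "->", " ".join(body) or "ε")
```

The grammar lives entirely in TABLE, so the same loop works for any LL(1) grammar once its table has been built.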

Example Table-Driven Parsing
Stack            Input        Production applied
$ E              id+id*id$    E → T E_R
$ E_R T          id+id*id$    T → F T_R
$ E_R T_R F      id+id*id$    F → id
$ E_R T_R id     id+id*id$
$ E_R T_R        +id*id$      T_R → ε
$ E_R            +id*id$      E_R → + T E_R
$ E_R T +        +id*id$
$ E_R T          id*id$       T → F T_R
$ E_R T_R F      id*id$       F → id
$ E_R T_R id     id*id$
$ E_R T_R        *id$         T_R → * F T_R
$ E_R T_R F *    *id$
$ E_R T_R F      id$          F → id
$ E_R T_R id     id$
$ E_R T_R        $            T_R → ε
$ E_R            $            E_R → ε
$                $

Panic Mode Recovery
FOLLOW(E) = { ) $ }
FOLLOW(E_R) = { ) $ }
FOLLOW(T) = { + ) $ }
FOLLOW(T_R) = { + ) $ }
FOLLOW(F) = { + * ) $ }
Add synchronizing actions to undefined entries based on FOLLOW
synch: the driver pops the current nonterminal A and skips input until a synch token, or skips input until a token in FIRST(A) is found
Pro: Can be automated
Cons: Error messages are needed

      id             +                *                (              )          $
E     E → T E_R                                        E → T E_R      synch      synch
E_R                  E_R → + T E_R                                    E_R → ε    E_R → ε
T     T → F T_R      synch                             T → F T_R      synch      synch
T_R                  T_R → ε          T_R → * F T_R                   T_R → ε    T_R → ε
F     F → id         synch            synch            F → ( E )      synch      synch
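One way to wire the synch entries into a table-driven driver is sketched below in Python. The recovery policy is a simplifying assumption, not necessarily the exact policy intended on the slide: on a synch entry the nonterminal is popped, on any other undefined entry the offending token is skipped, and an unmatched terminal on the stack is popped as if it had been inserted. All names are illustrative.

```python
SYNCH = "synch"
NONTERMINALS = {"E", "E_R", "T", "T_R", "F"}
TABLE = {  # the LL(1) table with synch entries taken from the FOLLOW sets
    ("E", "id"): ["T", "E_R"], ("E", "("): ["T", "E_R"],
    ("E", ")"): SYNCH, ("E", "$"): SYNCH,
    ("E_R", "+"): ["+", "T", "E_R"], ("E_R", ")"): [], ("E_R", "$"): [],
    ("T", "id"): ["F", "T_R"], ("T", "("): ["F", "T_R"],
    ("T", "+"): SYNCH, ("T", ")"): SYNCH, ("T", "$"): SYNCH,
    ("T_R", "+"): [], ("T_R", "*"): ["*", "F", "T_R"],
    ("T_R", ")"): [], ("T_R", "$"): [],
    ("F", "id"): ["id"], ("F", "("): ["(", "E", ")"],
    ("F", "+"): SYNCH, ("F", "*"): SYNCH, ("F", ")"): SYNCH, ("F", "$"): SYNCH,
}

def parse_with_recovery(tokens, start="E"):
    """Parse to the end of input and return the list of error messages."""
    tokens = list(tokens) + ["$"]
    pos, stack, errors = 0, ["$", start], []
    while stack:
        x, a = stack[-1], tokens[pos]
        if x not in NONTERMINALS:                  # terminal or $ on top
            if x == a:
                stack.pop()
                if x != "$":
                    pos += 1
            elif x == "$":                         # leftover input: skip it
                errors.append(f"token {pos}: skipped trailing {a!r}")
                pos += 1
            else:                                  # pretend the terminal was there
                errors.append(f"token {pos}: missing {x!r}")
                stack.pop()
            continue
        entry = TABLE.get((x, a))
        if entry is None and a != "$":             # undefined entry: skip the token
            errors.append(f"token {pos}: skipped {a!r}")
            pos += 1
        elif entry is None or entry == SYNCH:      # synch: pop the nonterminal
            errors.append(f"token {pos}: gave up on {x} at {a!r}")
            stack.pop()
        else:
            stack.pop()
            stack.extend(reversed(entry))
    return errors

print(parse_with_recovery(["id", "*", "+", "id"]))
```

For the input id * + id the sketch reports a single error (the missing operand after *) and then continues to the end of the input, which is the point of panic-mode recovery.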

Phrase-Level Recovery
Change input stream by inserting missing tokens
For example: id id is changed into id * id
insert *: the driver inserts the missing * and retries the production; parsing can then continue
Pro: Can be automated
Cons: Recovery not always intuitive

      id             +                *                (              )          $
E     E → T E_R                                        E → T E_R      synch      synch
E_R                  E_R → + T E_R                                    E_R → ε    E_R → ε
T     T → F T_R      synch                             T → F T_R      synch      synch
T_R   insert *       T_R → ε          T_R → * F T_R                   T_R → ε    T_R → ε
F     F → id         synch            synch            F → ( E )      synch      synch

Error Productions
E → T E_R
E_R → + T E_R | ε
T → F T_R
T_R → * F T_R | ε
F → ( E ) | id
Add "error production": T_R → F T_R to ignore a missing *, e.g.: id id
Pro: Powerful recovery method
Cons: Cannot be automated

      id             +                *                (              )          $
E     E → T E_R                                        E → T E_R      synch      synch
E_R                  E_R → + T E_R                                    E_R → ε    E_R → ε
T     T → F T_R      synch                             T → F T_R      synch      synch
T_R   T_R → F T_R    T_R → ε          T_R → * F T_R                   T_R → ε    T_R → ε
F     F → id         synch            synch            F → ( E )      synch      synch
