SYNTAX
ANALYSIS
Syntax Analyzer
 Syntax Analyzer creates the syntactic structure of the given
source program.
 This syntactic structure - parse tree.
 Syntax Analyzer is also known as parser.
 The syntax analyzer (parser) checks whether a given source
program satisfies the rules implied by a context-free grammar
or not.
 If it satisfies, the parser creates the parse tree of that program.
 Otherwise the parser gives the error messages.
INTRODUCTION
 Every programming language has precise rules that prescribe the syntactic
structure of well-formed programs.
 Program is made up of functions, a function out of declarations and statements, a
statement out of expressions
 The syntax of programming language constructs can be specified by context-
free grammars
 A context-free grammar
 gives a precise syntactic specification of a programming language.
 the design of the grammar is an initial phase of the design of a
compiler.
 a grammar can be directly converted into a parser by some tools.
Parser
• Parser works on a stream of tokens.
• The smallest item is a token.
Lexical
Analyzer
Parser
source
program
token
get next token
parse tree
Parsing
 Parsing is the process of determining whether a string of tokens can be
generated by a grammar.
 Parsing methods
 The top-down
 Bottom-up methods.
 Top-down parsing, construction starts at the root and proceeds to the
leaves.
 Bottom-up parsing, construction starts at the leaves and proceeds towards
the root.
 Top-down parsers are easy to build by hand.
 Bottom-up parsing,
 Can handle a larger class of grammars.
 They are not as easy to build, but tools for generating them directly from a grammar are available.
 Both top-down and bottom-up parsers scan the input from left to right (one symbol at a time).
Top- Down Parsing
 Done by starting with the root, labeled with the starting nonterminal stmt,
and repeatedly performing the following two steps.
 At node N, labeled with nonterminal A, select one of the productions for A and
construct children at N for the symbols in the production body.
 Find the next node at which a subtree is to be constructed, typically the leftmost
unexpanded nonterminal of the tree.
 The current terminal being scanned in the input is frequently referred to as
the lookahead symbol.
Top- Down Parsing
Top- Down Parsing
Top- Down Parsing
Top-Down Parsing
 Top-Down Parsing is an attempt to find a left-most
derivation for an input string
 Example:
S  cAd Find a derivation for
A  ab | a for w  cad
S S Backtrack S
/ |   / |   / | 
c A d c A d c A d
/  |
a b a
Predictive Parsing
 Recursive-descent parsing is a top-down method of syntax analysis in
which a set of recursive procedures is used to process the input.
 Simple form of recursive descent – Predictive Parsing
Syntax Error Handling
 Goals in error handling
 Report the presence of errors clearly and accurately.
 Recover from each error quickly enough to detect subsequent errors.
 Add minimal overhead to the processing of correct programs.
Error-Recovery Strategies
 The simplest approach is for the parser to quit with an informative error
message when it detects the first error.
 Panic-mode recovery
 Phrase-level recovery
 Error-productions
 Global-correction.
Panic-Mode Recovery
 The parser discards input symbols one at a time until one of a designated set of
synchronizing tokens is found.
 The synchronizing tokens are usually delimiters, such as ; or }.
 Skips a considerable amount of input without checking for additional errors
 It has the advantage of simplicity, and is guaranteed not to go into an infinite
loop.
Phrase-Level Recovery
 Perform local correction on the remaining input;
 It may replace a prefix of the remaining input by some string that allows the
parser to continue.
 A typical local correction is to replace a comma by a semicolon.
 Delete an extraneous semicolon.
 Insert a missing semicolon.
 Disadvantage in coping with situations in which the actual error has occurred
before the point of detection.
Error Productions
 Expand the grammar for the language at hand with productions that generate the
erroneous constructs.
 The parser can then generate appropriate error diagnostics about the erroneous
construct that has been recognized in the input.
Global Correction
 Compiler to make as few changes as possible in processing an incorrect input
string.
 Given an incorrect input string x and grammar G, algorithms will find a parse
tree for a related string y, such that the number of insertions, deletions, and
changes of tokens required to transform x into y is as small as possible.
 Not implemented.
Syntax Definition
 A grammar describes the hierarchical structure of programming language constructs.
 Eg: if ( expression ) statement else statement
 An if-else statement is the concatenation of the keyword if, an opening parenthesis, an
expression, a closing parenthesis, a statement, the keyword else, and another statement.
 Stmt -> if ( expr ) stmt else stmt
 Rule is called a production.
 In a production, lexical elements if and the parentheses are called terminals.
 Variables like expr and stmt are called nonterminals.
A Context Free Grammar
 A context-free grammar has four components:
 A set of terminal symbols, sometimes referred to as "tokens.“
 A set of nonterminals, sometimes called "syntactic variables."
 A set of productions, where each production consists of a nonterminal,called the head or
left side of the production, an arrow, and a sequence of terminals and/or nonterminals ,
called the body or right side of the production
 A designation of one of the nonterminals as the start symbol.
A Context Free Grammar
The terminal symbols are
Notational Conventions
These symbols are terminals:
 Lowercase letters early in the alphabet, such as a, b, c.
 Operator symbols such as +, *, and so on.
 Punctuation symbols such as parentheses, comma, and so on.
 The digits 0, 1, . . . , 9.
 Boldface strings such as id or if, each of which represents a single terminal
symbol.
Notational Conventions
These symbols are nonterminals:
 Uppercase letters early in the alphabet, such as A, B, C.
 The letter s, which, when it appears, is usually the start symbol.
 Lowercase, italic names such as expr or stmt.
 Uppercase letters may be used t o represent nonterminals for the constructs.
For example, nonterminals for expressions, terms, and factors are often
represented by E, T, and F, respectively.
Notational Conventions
 Uppercase letters late in the alphabet, such as X, Y, Z, represent grammar
symbols; that is, either nonterminals or terminals.
 Lowercase letters late in the alphabet , chiefly u, v, ... ,z, represent (possibly
empty) strings of terminals.
 Lowercase greek letters,α, β, γ for example, represent (possibly empty) strings
of grammar symbols.
 A set of productions a -> α 1 , a -> α2, ... , a -> α k with a common head
 A (call them a-productions) , may be written A -> α 1 I α 2 I . , . I α k · call α1 ,
α2 , ... ,αk the alternatives for A.
 Unless stated otherwise, the head of the first production is the start symbol
Notational Conventions
Derivations
 E  E+E : E+E derives from E
 E  E+E  id+E  id+id
 A sequence of replacements of non-terminal symbols is called a derivation
of id+id from E.
 A   if there is a production rule A in our grammar and  and
 are arbitrary strings of terminal and non-terminal symbols
1  2  ...  n (n derives from 1 or 1 derives n )
 : derives in one step
 : derives in zero or more steps
 : derives in one or more steps
*
+
CFG - Terminology
 L(G) is the language of G (the language generated by G) which is a set of
sentences.
 A sentence of L(G) is a string of terminal symbols of G.
 If S is the start symbol of G then
 is a sentence of L(G) iff S   where  is a string of terminals of G
 If G is a context-free grammar, L(G) is a context-free language.
 Two grammars are equivalent if they produce the same language.
 S   - If  contains non-terminals, it is called as a sentential form of G.
- If  does not contain non-terminals, it is called as a sentence of G.
*
*
Derivation Example
 E  -E  -(E)  -(E+E)  -(id+E)  -(id+id)
OR
 E  -E  -(E)  -(E+E)  -(E+id)  -(id+id)
 At each derivation step, we can choose any of the non-terminal in the
sentential form of G for the replacement.
 If we always choose the left-most non-terminal in each derivation
step, this derivation is called as left-most derivation.
 If we always choose the right-most non-terminal in each derivation
step, this derivation is called as right-most derivation.
Left-Most and Right-Most Derivations
Left-Most Derivation
E  -E  -(E)  -(E+E)  -(id+E)  -(id+id)
Right-Most Derivation
E  -E  -(E)  -(E+E)  -(E+id)  -(id+id)
 We will see that the top-down parsers try to find the left-most derivation of the
given source program.
 We will see that the bottom-up parsers try to find the right-most derivation of
the given source program in the reverse order.
lmlmlmlmlm
rmrmrmrmrm
Parse Trees and Derivations
 A parse tree is a graphical representation of a derivation that filters out the
order in which productions are applied to replace nonterminals.
 The interior node is labeled with the nonterminal A in the head of the
production;
 The children of the node are labeled, from left to right, by the symbols in the
body of the production
 The leaves of a parse tree are labeled by nonterminals or terminals
 Read from left to right, constitute a sentential form, called the yield or frontier
of the tree.
 There is a many-to-one relationship between derivations and parse trees.
Ambiguity
 a grammar that produces more than one parse tree for some sentence is said
to be ambiguous
1
2
3
4
a
b
c
d
e
f
Writing a Grammar
 Grammars are capable of describing most, of the syntax of programming
languages .
 Grammar should be unambiguous.
 Left-recursion elimination and left factoring - are useful for rewriting
grammars .
 From the resulting grammar we can create top down parsers without
backtracking.
 Such parsers are called predictive parsers or recursive-descent parser
Eliminating Ambiguity
 ambiguous grammar can be rewritten to eliminate the ambiguity.
 stmt -> if expr then stmt
|if expr then stmt else stmt
|other
 is ambiguous since the string
 if E1 then if E2 then S1 else S2 has the two parse trees
Two parse trees for an ambiguous sentence
Eliminating Ambiguity
 The general rule is, "Match each else with the closest unmatched then."
Left Recursion
 A grammar is left recursive if it has a non-terminal A such that there is a
derivation.
A  A for some string 
 Top-down parsing techniques cannot handle left-recursive grammars.
 The left-recursion may appear in a single step of the derivation (immediate left-
recursion), or may appear in more than one step of the derivation.
*
Immediate Left-Recursion
A  A  |  where  does not start with A
 eliminate immediate left recursion
A   A’
A’   A’ | 
A  A 1 | ... | A m | 1 | ... | n where 1 ... n do not start with A
 eliminate immediate left recursion
A  1 A’ | ... | n A’
A’  1 A’ | ... | m A’ |  an equivalent grammar
In general,
Left-Recursion -- Problem
• A grammar cannot be immediately left-recursive, but it still can
be left-recursive.
S  Aa | b
A  Sc | d
S  Aa  Sca
A  Sc  Aac causes to a left-recursion
Eliminate Left-Recursion -- Algorithm
- Arrange non-terminals in some order: A1 ... An
- for i from 1 to n do {
- for j from 1 to i-1 do {
replace each production
Ai  Aj 
by
Ai  1  | ... | k 
where Aj  1 | ... | k
}
- eliminate immediate left-recursions among Ai productions
}
Eliminate Left-Recursion
S  Aa | b
A  Ac | Sd | f
- Order of non-terminals: S, A
- A  Ac | Aad | bd | f
- Eliminate the immediate left-recursion in A
A  bdA’ | fA’
A’  cA’ | adA’ | 
So, the resulting equivalent grammar which is not left-recursive is:
S  Aa | b
A  bdA’ | fA’
A’  cA’ | adA’ | 
Eliminate Left-Recursion – Example2
S  Aa | b
A  Ac | Sd | f
- Order of non-terminals: A, S
- Eliminate the immediate left-recursion in A
A  SdA’ | fA’
A’  cA’ | 
- Replace S  Aa with S  SdA’a | fA’a
- Eliminate the immediate left-recursion in S
S  fA’aS’ | bS’
S’  dA’aS’ | 
So, the resulting equivalent grammar which is not left-recursive is:
S  fA’aS’ | bS’
S’  dA’aS’ | 
A  SdA’ | fA’
A’  cA’ | 
Left-Recursive Grammars III
 Here is an example of a (directly) left-recursive grammar:
E  E + T | T
T  T * F | F
F  ( E ) | id
 This grammar can be re-written as the following non left-
recursive grammar:
E  T E’ E’  + TE’ | є
T  F T’ T’  * F T’ | є
F  (E) | id
Left Factoring
 Left factoring is a grammar transformation that is useful for
producing a grammar suitable for predictive, or top-down,
parsing.
 Stmt -> if expr then stmt else stmt
|if expr then stmt
 A ->α 1 | α 2
 So it should be left factored as
Left-Factoring -- Algorithm
 For each non-terminal A with two or more alternatives (production rules)
with a common non-empty prefix
A  1 | ... | n | 1 | ... | m
convert it into
A  A’ | 1 | ... | m
A’  1 | ... | n
Left-Factoring – Example1
A  abB | aB | cdg | cdeB | cdfB

A  aA’ | cdg | cdeB | cdfB
A’  bB | B

A  aA’ | cdA’’
A’  bB | B
A’’  g | eB | fB
Left-Factoring – Example2
A  ad | a | ab | abc | b

A  aA’ | b
A’  d |  | b | bc

A  aA’ | b
A’  d |  | bA’’
A’’   | c
Top-Down Parsing
 The parse tree is created top to bottom.
 Top-down parser
 Recursive-descent parsing
 Backtracking is needed
 It is a general parsing technique, but not widely used.
 Not efficient
 Predictive parsing
 No backtracking
 Efficient
 Needs a special form of grammars - (LL(1) grammars).
 Recursive predictive parsing is a special form of recursive descent parsing without
backtracking.
 Non-recursive (table driven) predictive parser is also known as LL(1) parser.
Recursive Predictive Parsing
Each non-terminal corresponds to a procedure.
Ex: A  aBb
proc A {
- match the current token with a, and move to the next
token;
- call ‘B’;
- match the current token with b, and move to the next
token;
}
Recursive Predictive Parsing (cont.)
A  aBb | bAB
proc A {
case of the current token {
‘a’: - match the current token with a, and move to the next token;
- call ‘B’;
- match the current token with b, and move to the next token;
‘b’: - match the current token with b, and move to the next token;
- call ‘A’;
- call ‘B’;
}
}
Top-down parse for id + id * id
FIRST and FOLLOW
 FIRST and FOLLOW allow us to choose which production toapply, based on the
next input symbol.
 FIRST(α), where α is any string of grammar symbols, to be the set of terminals that
begin strings derived from α.
 If α => ε, then ε is also in FIRST(α) .
 A => cY, so c is in FIRST(A)
 FOLLOW(A) is the set of the terminals which occur immediately after (follow) the
non-terminal A in the strings derived from the starting symbol.
 a terminal a is in FOLLOW(A) if S  Aa
 $ is in FOLLOW(A) if S  A
*
*
FIRST
1. If X is a terminal, then FIRST(X) = {X}.
2. If X is a nonterminal and X -> YI Y2 ... Yk is a production for some k>=1, then
place a in FIRST(X) if for some i, a is in FIRST(Yi), and ε is in all of
FIRST(YI), ... ,FIRST(Yi-I); that is , YI Y2 ... Yi-1 => ε. If ε is in FIRST (Yj) for
all j = 1, 2,... ,k, then add ε to FIRST (X). For example, everything in FIRST(Y1)
is surely in FIRST(X) . If Yi does not derive ε then we add nothing more to
FIRST (X) , but if Y1 => ε, then we add FIRST(Y2), and so on.
3. 3. If X => ε is a production, then add ε to FIRST (X).
*
FOLLOW
LL ( 1 ) Grammars
 L: scanning the input from left to right
 L: producing a leftmost derivation
 1 : one input symbol of lookahead at each step
 A grammar G is LL(1) if and only if whenever A -> α | β are two distinct
productions of G, the following conditions hold:
Construction of a predictive parsing
table.

Syntax analysis

  • 1.
  • 2.
    Syntax Analyzer  SyntaxAnalyzer creates the syntactic structure of the given source program.  This syntactic structure - parse tree.  Syntax Analyzer is also known as parser.  The syntax analyzer (parser) checks whether a given source program satisfies the rules implied by a context-free grammar or not.  If it satisfies, the parser creates the parse tree of that program.  Otherwise the parser gives the error messages.
  • 3.
    INTRODUCTION  Every programminglanguage has precise rules that prescribe the syntactic structure of well-formed programs.  Program is made up of functions, a function out of declarations and statements, a statement out of expressions  The syntax of programming language constructs can be specified by context- free grammars  A context-free grammar  gives a precise syntactic specification of a programming language.  the design of the grammar is an initial phase of the design of a compiler.  a grammar can be directly converted into a parser by some tools.
  • 4.
    Parser • Parser workson a stream of tokens. • The smallest item is a token. Lexical Analyzer Parser source program token get next token parse tree
  • 5.
    Parsing  Parsing isthe process of determining whether a string of tokens can be generated by a grammar.  Parsing methods  The top-down  Bottom-up methods.  Top-down parsing, construction starts at the root and proceeds to the leaves.  Bottom-up parsing, construction starts at the leaves and proceeds towards the root.  Top-down parsers are easy to build by hand.  Bottom-up parsing,  Can handle a larger class of grammars.  They are not as easy to build, but tools for generating them directly from a grammar are available.  Both top-down and bottom-up parsers scan the input from left to right (one symbol at a time).
  • 6.
    Top- Down Parsing Done by starting with the root, labeled with the starting nonterminal stmt, and repeatedly performing the following two steps.  At node N, labeled with nonterminal A, select one of the productions for A and construct children at N for the symbols in the production body.  Find the next node at which a subtree is to be constructed, typically the leftmost unexpanded nonterminal of the tree.  The current terminal being scanned in the input is frequently referred to as the lookahead symbol.
  • 7.
  • 8.
  • 9.
  • 10.
    Top-Down Parsing  Top-DownParsing is an attempt to find a left-most derivation for an input string  Example: S  cAd Find a derivation for A  ab | a for w  cad S S Backtrack S / |  / |  / | c A d c A d c A d / | a b a
  • 11.
    Predictive Parsing  Recursive-descentparsing is a top-down method of syntax analysis in which a set of recursive procedures is used to process the input.  Simple form of recursive descent – Predictive Parsing
  • 12.
    Syntax Error Handling Goals in error handling  Report the presence of errors clearly and accurately.  Recover from each error quickly enough to detect subsequent errors.  Add minimal overhead to the processing of correct programs.
  • 13.
    Error-Recovery Strategies  Thesimplest approach is for the parser to quit with an informative error message when it detects the first error.  Panic-mode recovery  Phrase-level recovery  Error-productions  Global-correction.
  • 14.
    Panic-Mode Recovery  Theparser discards input symbols one at a time until one of a designated set of synchronizing tokens is found.  The synchronizing tokens are usually delimiters, such as ; or }.  Skips a considerable amount of input without checking for additional errors  It has the advantage of simplicity, and is guaranteed not to go into an infinite loop.
  • 15.
    Phrase-Level Recovery  Performlocal correction on the remaining input;  It may replace a prefix of the remaining input by some string that allows the parser to continue.  A typical local correction is to replace a comma by a semicolon.  Delete an extraneous semicolon.  Insert a missing semicolon.  Disadvantage in coping with situations in which the actual error has occurred before the point of detection.
  • 16.
    Error Productions  Expandthe grammar for the language at hand with productions that generate the erroneous constructs.  The parser can then generate appropriate error diagnostics about the erroneous construct that has been recognized in the input.
  • 17.
    Global Correction  Compilerto make as few changes as possible in processing an incorrect input string.  Given an incorrect input string x and grammar G, algorithms will find a parse tree for a related string y, such that the number of insertions, deletions, and changes of tokens required to transform x into y is as small as possible.  Not implemented.
  • 18.
    Syntax Definition  Agrammar describes the hierarchical structure of programming language constructs.  Eg: if ( expression ) statement else statement  An if-else statement is the concatenation of the keyword if, an opening parenthesis, an expression, a closing parenthesis, a statement, the keyword else, and another statement.  Stmt -> if ( expr ) stmt else stmt  Rule is called a production.  In a production, lexical elements if and the parentheses are called terminals.  Variables like expr and stmt are called nonterminals.
  • 19.
    A Context FreeGrammar  A context-free grammar has four components:  A set of terminal symbols, sometimes referred to as "tokens.“  A set of nonterminals, sometimes called "syntactic variables."  A set of productions, where each production consists of a nonterminal,called the head or left side of the production, an arrow, and a sequence of terminals and/or nonterminals , called the body or right side of the production  A designation of one of the nonterminals as the start symbol.
  • 20.
    A Context FreeGrammar The terminal symbols are
  • 21.
    Notational Conventions These symbolsare terminals:  Lowercase letters early in the alphabet, such as a, b, c.  Operator symbols such as +, *, and so on.  Punctuation symbols such as parentheses, comma, and so on.  The digits 0, 1, . . . , 9.  Boldface strings such as id or if, each of which represents a single terminal symbol.
  • 22.
    Notational Conventions These symbolsare nonterminals:  Uppercase letters early in the alphabet, such as A, B, C.  The letter s, which, when it appears, is usually the start symbol.  Lowercase, italic names such as expr or stmt.  Uppercase letters may be used t o represent nonterminals for the constructs. For example, nonterminals for expressions, terms, and factors are often represented by E, T, and F, respectively.
  • 23.
    Notational Conventions  Uppercaseletters late in the alphabet, such as X, Y, Z, represent grammar symbols; that is, either nonterminals or terminals.  Lowercase letters late in the alphabet , chiefly u, v, ... ,z, represent (possibly empty) strings of terminals.  Lowercase greek letters,α, β, γ for example, represent (possibly empty) strings of grammar symbols.  A set of productions a -> α 1 , a -> α2, ... , a -> α k with a common head  A (call them a-productions) , may be written A -> α 1 I α 2 I . , . I α k · call α1 , α2 , ... ,αk the alternatives for A.  Unless stated otherwise, the head of the first production is the start symbol
  • 24.
  • 25.
    Derivations  E E+E : E+E derives from E  E  E+E  id+E  id+id  A sequence of replacements of non-terminal symbols is called a derivation of id+id from E.  A   if there is a production rule A in our grammar and  and  are arbitrary strings of terminal and non-terminal symbols 1  2  ...  n (n derives from 1 or 1 derives n )  : derives in one step  : derives in zero or more steps  : derives in one or more steps * +
  • 26.
    CFG - Terminology L(G) is the language of G (the language generated by G) which is a set of sentences.  A sentence of L(G) is a string of terminal symbols of G.  If S is the start symbol of G then  is a sentence of L(G) iff S   where  is a string of terminals of G  If G is a context-free grammar, L(G) is a context-free language.  Two grammars are equivalent if they produce the same language.  S   - If  contains non-terminals, it is called as a sentential form of G. - If  does not contain non-terminals, it is called as a sentence of G. * *
  • 27.
    Derivation Example  E -E  -(E)  -(E+E)  -(id+E)  -(id+id) OR  E  -E  -(E)  -(E+E)  -(E+id)  -(id+id)  At each derivation step, we can choose any of the non-terminal in the sentential form of G for the replacement.  If we always choose the left-most non-terminal in each derivation step, this derivation is called as left-most derivation.  If we always choose the right-most non-terminal in each derivation step, this derivation is called as right-most derivation.
  • 28.
    Left-Most and Right-MostDerivations Left-Most Derivation E  -E  -(E)  -(E+E)  -(id+E)  -(id+id) Right-Most Derivation E  -E  -(E)  -(E+E)  -(E+id)  -(id+id)  We will see that the top-down parsers try to find the left-most derivation of the given source program.  We will see that the bottom-up parsers try to find the right-most derivation of the given source program in the reverse order. lmlmlmlmlm rmrmrmrmrm
  • 29.
    Parse Trees andDerivations  A parse tree is a graphical representation of a derivation that filters out the order in which productions are applied to replace nonterminals.  The interior node is labeled with the nonterminal A in the head of the production;  The children of the node are labeled, from left to right, by the symbols in the body of the production  The leaves of a parse tree are labeled by nonterminals or terminals  Read from left to right, constitute a sentential form, called the yield or frontier of the tree.  There is a many-to-one relationship between derivations and parse trees.
  • 30.
    Ambiguity  a grammarthat produces more than one parse tree for some sentence is said to be ambiguous
  • 31.
  • 32.
    Writing a Grammar Grammars are capable of describing most, of the syntax of programming languages .  Grammar should be unambiguous.  Left-recursion elimination and left factoring - are useful for rewriting grammars .  From the resulting grammar we can create top down parsers without backtracking.  Such parsers are called predictive parsers or recursive-descent parser
  • 33.
    Eliminating Ambiguity  ambiguousgrammar can be rewritten to eliminate the ambiguity.  stmt -> if expr then stmt |if expr then stmt else stmt |other  is ambiguous since the string  if E1 then if E2 then S1 else S2 has the two parse trees
  • 34.
    Two parse treesfor an ambiguous sentence
  • 35.
    Eliminating Ambiguity  Thegeneral rule is, "Match each else with the closest unmatched then."
  • 36.
    Left Recursion  Agrammar is left recursive if it has a non-terminal A such that there is a derivation. A  A for some string   Top-down parsing techniques cannot handle left-recursive grammars.  The left-recursion may appear in a single step of the derivation (immediate left- recursion), or may appear in more than one step of the derivation. *
  • 37.
    Immediate Left-Recursion A A  |  where  does not start with A  eliminate immediate left recursion A   A’ A’   A’ |  A  A 1 | ... | A m | 1 | ... | n where 1 ... n do not start with A  eliminate immediate left recursion A  1 A’ | ... | n A’ A’  1 A’ | ... | m A’ |  an equivalent grammar In general,
  • 38.
    Left-Recursion -- Problem •A grammar cannot be immediately left-recursive, but it still can be left-recursive. S  Aa | b A  Sc | d S  Aa  Sca A  Sc  Aac causes to a left-recursion
  • 39.
    Eliminate Left-Recursion --Algorithm - Arrange non-terminals in some order: A1 ... An - for i from 1 to n do { - for j from 1 to i-1 do { replace each production Ai  Aj  by Ai  1  | ... | k  where Aj  1 | ... | k } - eliminate immediate left-recursions among Ai productions }
  • 40.
    Eliminate Left-Recursion S Aa | b A  Ac | Sd | f - Order of non-terminals: S, A - A  Ac | Aad | bd | f - Eliminate the immediate left-recursion in A A  bdA’ | fA’ A’  cA’ | adA’ |  So, the resulting equivalent grammar which is not left-recursive is: S  Aa | b A  bdA’ | fA’ A’  cA’ | adA’ | 
  • 41.
    Eliminate Left-Recursion –Example2 S  Aa | b A  Ac | Sd | f - Order of non-terminals: A, S - Eliminate the immediate left-recursion in A A  SdA’ | fA’ A’  cA’ |  - Replace S  Aa with S  SdA’a | fA’a - Eliminate the immediate left-recursion in S S  fA’aS’ | bS’ S’  dA’aS’ |  So, the resulting equivalent grammar which is not left-recursive is: S  fA’aS’ | bS’ S’  dA’aS’ |  A  SdA’ | fA’ A’  cA’ | 
  • 42.
    Left-Recursive Grammars III Here is an example of a (directly) left-recursive grammar: E  E + T | T T  T * F | F F  ( E ) | id  This grammar can be re-written as the following non left- recursive grammar: E  T E’ E’  + TE’ | є T  F T’ T’  * F T’ | є F  (E) | id
  • 43.
    Left Factoring  Leftfactoring is a grammar transformation that is useful for producing a grammar suitable for predictive, or top-down, parsing.  Stmt -> if expr then stmt else stmt |if expr then stmt  A ->α 1 | α 2  So it should be left factored as
  • 45.
    Left-Factoring -- Algorithm For each non-terminal A with two or more alternatives (production rules) with a common non-empty prefix A  1 | ... | n | 1 | ... | m convert it into A  A’ | 1 | ... | m A’  1 | ... | n
  • 46.
    Left-Factoring – Example1 A abB | aB | cdg | cdeB | cdfB  A  aA’ | cdg | cdeB | cdfB A’  bB | B  A  aA’ | cdA’’ A’  bB | B A’’  g | eB | fB
  • 47.
    Left-Factoring – Example2 A ad | a | ab | abc | b  A  aA’ | b A’  d |  | b | bc  A  aA’ | b A’  d |  | bA’’ A’’   | c
  • 48.
    Top-Down Parsing  Theparse tree is created top to bottom.  Top-down parser  Recursive-descent parsing  Backtracking is needed  It is a general parsing technique, but not widely used.  Not efficient  Predictive parsing  No backtracking  Efficient  Needs a special form of grammars - (LL(1) grammars).  Recursive predictive parsing is a special form of recursive descent parsing without backtracking.  Non-recursive (table driven) predictive parser is also known as LL(1) parser.
  • 49.
    Recursive Predictive Parsing Eachnon-terminal corresponds to a procedure. Ex: A  aBb proc A { - match the current token with a, and move to the next token; - call ‘B’; - match the current token with b, and move to the next token; }
  • 50.
    Recursive Predictive Parsing(cont.) A  aBb | bAB proc A { case of the current token { ‘a’: - match the current token with a, and move to the next token; - call ‘B’; - match the current token with b, and move to the next token; ‘b’: - match the current token with b, and move to the next token; - call ‘A’; - call ‘B’; } }
  • 51.
    Top-down parse forid + id * id
  • 53.
    FIRST and FOLLOW FIRST and FOLLOW allow us to choose which production toapply, based on the next input symbol.  FIRST(α), where α is any string of grammar symbols, to be the set of terminals that begin strings derived from α.  If α => ε, then ε is also in FIRST(α) .  A => cY, so c is in FIRST(A)  FOLLOW(A) is the set of the terminals which occur immediately after (follow) the non-terminal A in the strings derived from the starting symbol.  a terminal a is in FOLLOW(A) if S  Aa  $ is in FOLLOW(A) if S  A * *
  • 54.
    FIRST 1. If Xis a terminal, then FIRST(X) = {X}. 2. If X is a nonterminal and X -> YI Y2 ... Yk is a production for some k>=1, then place a in FIRST(X) if for some i, a is in FIRST(Yi), and ε is in all of FIRST(YI), ... ,FIRST(Yi-I); that is , YI Y2 ... Yi-1 => ε. If ε is in FIRST (Yj) for all j = 1, 2,... ,k, then add ε to FIRST (X). For example, everything in FIRST(Y1) is surely in FIRST(X) . If Yi does not derive ε then we add nothing more to FIRST (X) , but if Y1 => ε, then we add FIRST(Y2), and so on. 3. 3. If X => ε is a production, then add ε to FIRST (X). *
  • 55.
  • 57.
    LL ( 1) Grammars  L: scanning the input from left to right  L: producing a leftmost derivation  1 : one input symbol of lookahead at each step  A grammar G is LL(1) if and only if whenever A -> α | β are two distinct productions of G, the following conditions hold:
  • 58.
    Construction of apredictive parsing table.