Chapter โ€“ three
Syntax analysis
Top Down Parsing
And
Bottom Up Parsing
1
Outline
๏ฒ Introduction
๏ฒ Context free grammar (CFG)
๏ฒ Derivation
๏ฒ Parse tree
๏ฒ Ambiguity
๏ฒ Top-down parsing
๏ฒ Bottom up parsing
2
Introduction
๏ฒ Syntax: the way in which tokens are put together to
form expressions, statements, or blocks of statements.
๏ฑ The rules governing the formation of statements in a
programming language.
๏ฒ Syntax analysis: the task concerned with fitting a
sequence of tokens into a specified syntax.
๏ฒ Parsing: To break a sentence down into its component
parts with an explanation of the form, function, and
syntactical relationship of each part.
๏ฒ The syntax of a programming language is usually given
by the grammar rules of a context free grammar (CFG).
3
4
Parser
lexical
analyzer
Syntax
analyzer
symbol
table
get next
token
Source
Program
get next
char
next char next token
(Contains a record
for each identifier)
Lexical
Error
Syntax
Error
Parse tree
Introductionโ€ฆ
๏ฒ The syntax analyzer (parser) checks whether a given
source program satisfies the rules implied by a CFG
or not.
๏ญ If it satisfies, the parser creates the parse tree of that
program.
๏ญ Otherwise, the parser gives the error messages.
๏ฒ A CFG:
๏ญ gives a precise syntactic specification of a
programming language.
๏ญ A grammar can be directly converted in to a parser by
some tools (yacc).
5
Introductionโ€ฆ
๏ฒ The parser can be categorized into two groups:
๏ฒ Top-down parser
๏ญ The parse tree is created top to bottom, starting from
the root to leaves.
๏ฒ Bottom-up parser
๏ญ The parse tree is created bottom to top, starting from
the leaves to root.
๏ฒ Both top-down and bottom-up parser scan the input
from left to right (one symbol at a time).
๏ฒ Efficient top-down and bottom-up parsers can be
implemented by making use of context-free-
grammar.
๏ญ LL for top-down parsing
๏ญ LR for bottom-up parsing
6
Context free grammar (CFG)
๏ฒ A context-free grammar is a specification for the
syntactic structure of a programming language.
๏ฒ Context-free grammar has 4-tuples:
G = (T, N, P, S) where
๏ญ T is a finite set of terminals (a set of tokens)
๏ญ N is a finite set of non-terminals (syntactic variables)
๏ญ P is a finite set of productions of the form
A โ†’ ฮฑ where A is non-terminal and
ฮฑ is a strings of terminals and non-terminals (including the
empty string)
๏ฑ S โˆˆ N is a designated start symbol (one of the non-
terminal symbols)
7
Example: grammar for simple arithmetic
expressions
expression ๏ƒ  expression + term
expression ๏ƒ  expression - term
expression ๏ƒ  term
term ๏ƒ  term * factor
term ๏ƒ  term / factor
term ๏ƒ  factor
factor ๏ƒ  (expression )
factor ๏ƒ  id
8
Terminal symbols
id + - * / ( )
Non-terminals
expression
term
Factor
Start symbol
expression
Derivation
๏ฒ A derivation is a sequence of replacements of structure names
by choices on the right hand sides of grammar rules.
๏ฒ Example: E โ†’ E + E | E โ€“ E | E * E | E / E | -E
E โ†’ ( E )
E โ†’ id
E => E + E means that E + E is derived from E
- we can replace E by E + E
- we have to have a production rule E โ†’ E+E in our grammar.
E=>E+E =>id+E=>id+id means that a sequence of replacements of
non-terminal symbols is called a derivation of id+id from E.
9
Derivationโ€ฆ
๏ฑ If we always choose the left-most non-terminal in each
derivation step, this derivation is called left-most derivation.
Example: E=>-E=>-(E)=>-(E+E)=>-(id+E)=>-(id+id)
๏ฑ If we always choose the right-most non-terminal in each
derivation step, this derivation is called right-most
derivation.
Example: E=>-E=>-(E)=>-(E+E)=>-(E+id)=>-(id+id)
๏ฑ We will see that the top-down parser try to find the left-most
derivation of the given source program.
๏ฑ We will see that the bottom-up parser try to find right-most
derivation of the given source program in the reverse order.
10
Parse tree
๏ฒ A parse tree is a graphical representation of a
derivation.
๏ฒ It filters out the order in which productions are applied
to replace non-terminals.
๏ฒ A parse tree corresponding to a derivation is a labeled
tree in which:
โ€ข the interior nodes are labeled by non-terminals,
โ€ข the leaf nodes are labeled by terminals, and
โ€ข the children of each internal node represent the
replacement of the associated non-terminal in one
step of the derivation.
11
12
Parse tree and Derivation
E ๏‚ฎ E + E | E ๏€ช E | ( E ) | - E | id
Lets examine this derivation:
E ๏ƒž -E ๏ƒž -(E) ๏ƒž -(E + E) ๏ƒž -(id + id)
E E
E
-
E
E
-
E
( )
E
E
-
E
( )
+
E E
E
E
-
E
( )
+
E E
id id
This is a top-down derivation
because we start building the
parse tree at the top parse tree
Grammar
13
Which parse tree is correct?
Ambiguity: example
Construct parse tree for the expression: id + id ๏€ช id
E E
+
E E
E
+
E E
๏€ช
E E
E
+
E E
๏€ช
E E
id id
id
E E
๏€ช
E E
E
๏€ช
E E
+
E E
E
๏€ช
E E
+
E E
id id
id
E ๏‚ฎ E + E | E ๏€ช E | ( E ) | - E | id
14
According to the grammar, both are correct.
Ambiguity: exampleโ€ฆ
Find a derivation for the expression: id + id ๏€ช id
E
+
E E
๏€ช
E E
id id
id
E
*
E E
+
E E
id id
id
A grammar that produces more than one
parse tree for any input sentence is said
to be an ambiguous grammar.
E ๏‚ฎ E + E | E ๏€ช E | ( E ) | - E | id
Syntax analysis
๏ฒ Every language has rules that prescribe the syntactic
structure of well formed programs.
๏ฒ The syntax can be described using Context Free
Grammars (CFG) notation.
๏ฒ The use of CFGs has several advantages:
๏ญ helps in identifying ambiguities
๏ญ it is possible to have a tool which produces automatically
a parser using the grammar
๏ญ a properly designed grammar helps in modifying the
parser easily when the language changes
15
Top-down parsing
Recursive Descent Parsing (RDP)
๏ฒ This method of top-down parsing can be considered as
an attempt to find the left most derivation for an input
string. It may involve backtracking.
๏ฒ To construct the parse tree using RDP:
๏ญ we create one node tree consisting of S.
๏ญ two pointers, one for the tree and one for the input, will
be used to indicate where the parsing process is.
๏ญ initially, they will be on S and the first input symbol,
respectively.
๏ญ then we use the first S-production to expand the tree.
The tree pointer will be positioned on the left most
symbol of the newly created sub-tree.
16
Bottom up Parsing
๏ฒ Top-down parsers:
โ€ข Starts constructing the parse tree at the
top (root) of the tree and move down
towards the leaves.
โ€ข Easy to implement by hand, but work
with restricted grammars.
๏ฒ example: predictive parsers
18
Bottom-Up and Top-Down
Parsers
Top-down parsers:
โ€ข Starts constructing the parse tree at the top (root) of the
tree and move down towards the leaves.
โ€ข Easy to implement by hand, but work with restricted
grammars.
example: predictive parsers
Bottom-up parsers:
โ€ข build the nodes on the bottom of the parse tree first.
โ€ข Suitable for automatic parser generation, handle a larger
class of grammars.
examples: shift-reduce parser (or LR(k) parsers)
19
Bottom-Up Parser
A bottom-up parser, or a shift-reduce parser, begins
at the leaves and works up to the top of the tree.
The reduction steps trace a rightmost derivation
on reverse.
S ๏‚ฎ aABe
A ๏‚ฎ Abc | b
B ๏‚ฎ d
Consider the Grammar:
We want to parse the input string abbcde.
This parser is known as an LR Parser because
it scans the input from Left to right, and it constructs
a Rightmost derivation in reverse order.
Bottom-up parser (LR parsing)
S ๏ƒ  aABe
A ๏ƒ  Abc | b
B ๏ƒ  d
abbcde ๏ƒŸ aAbcde ๏ƒŸ aAde ๏ƒŸ aABe ๏ƒŸ S
๏ฒ At each step, we have to find ฮฑ such that ฮฑ is a
substring of the sentence and replace ฮฑ by A, where
A ๏ƒ  ฮฑ
20
21
12/24/2023
.

Chapter-3 compiler.pptx course materials

  • 1.
    Chapter โ€“ three Syntaxanalysis Top Down Parsing And Bottom Up Parsing 1
  • 2.
    Outline ๏ฒ Introduction ๏ฒ Contextfree grammar (CFG) ๏ฒ Derivation ๏ฒ Parse tree ๏ฒ Ambiguity ๏ฒ Top-down parsing ๏ฒ Bottom up parsing 2
  • 3.
    Introduction ๏ฒ Syntax: theway in which tokens are put together to form expressions, statements, or blocks of statements. ๏ฑ The rules governing the formation of statements in a programming language. ๏ฒ Syntax analysis: the task concerned with fitting a sequence of tokens into a specified syntax. ๏ฒ Parsing: To break a sentence down into its component parts with an explanation of the form, function, and syntactical relationship of each part. ๏ฒ The syntax of a programming language is usually given by the grammar rules of a context free grammar (CFG). 3
  • 4.
    4 Parser lexical analyzer Syntax analyzer symbol table get next token Source Program get next char nextchar next token (Contains a record for each identifier) Lexical Error Syntax Error Parse tree
  • 5.
    Introductionโ€ฆ ๏ฒ The syntaxanalyzer (parser) checks whether a given source program satisfies the rules implied by a CFG or not. ๏ญ If it satisfies, the parser creates the parse tree of that program. ๏ญ Otherwise, the parser gives the error messages. ๏ฒ A CFG: ๏ญ gives a precise syntactic specification of a programming language. ๏ญ A grammar can be directly converted in to a parser by some tools (yacc). 5
  • 6.
    Introductionโ€ฆ ๏ฒ The parsercan be categorized into two groups: ๏ฒ Top-down parser ๏ญ The parse tree is created top to bottom, starting from the root to leaves. ๏ฒ Bottom-up parser ๏ญ The parse tree is created bottom to top, starting from the leaves to root. ๏ฒ Both top-down and bottom-up parser scan the input from left to right (one symbol at a time). ๏ฒ Efficient top-down and bottom-up parsers can be implemented by making use of context-free- grammar. ๏ญ LL for top-down parsing ๏ญ LR for bottom-up parsing 6
  • 7.
    Context free grammar(CFG) ๏ฒ A context-free grammar is a specification for the syntactic structure of a programming language. ๏ฒ Context-free grammar has 4-tuples: G = (T, N, P, S) where ๏ญ T is a finite set of terminals (a set of tokens) ๏ญ N is a finite set of non-terminals (syntactic variables) ๏ญ P is a finite set of productions of the form A โ†’ ฮฑ where A is non-terminal and ฮฑ is a strings of terminals and non-terminals (including the empty string) ๏ฑ S โˆˆ N is a designated start symbol (one of the non- terminal symbols) 7
  • 8.
    Example: grammar forsimple arithmetic expressions expression ๏ƒ  expression + term expression ๏ƒ  expression - term expression ๏ƒ  term term ๏ƒ  term * factor term ๏ƒ  term / factor term ๏ƒ  factor factor ๏ƒ  (expression ) factor ๏ƒ  id 8 Terminal symbols id + - * / ( ) Non-terminals expression term Factor Start symbol expression
  • 9.
    Derivation ๏ฒ A derivationis a sequence of replacements of structure names by choices on the right hand sides of grammar rules. ๏ฒ Example: E โ†’ E + E | E โ€“ E | E * E | E / E | -E E โ†’ ( E ) E โ†’ id E => E + E means that E + E is derived from E - we can replace E by E + E - we have to have a production rule E โ†’ E+E in our grammar. E=>E+E =>id+E=>id+id means that a sequence of replacements of non-terminal symbols is called a derivation of id+id from E. 9
  • 10.
    Derivationโ€ฆ ๏ฑ If wealways choose the left-most non-terminal in each derivation step, this derivation is called left-most derivation. Example: E=>-E=>-(E)=>-(E+E)=>-(id+E)=>-(id+id) ๏ฑ If we always choose the right-most non-terminal in each derivation step, this derivation is called right-most derivation. Example: E=>-E=>-(E)=>-(E+E)=>-(E+id)=>-(id+id) ๏ฑ We will see that the top-down parser try to find the left-most derivation of the given source program. ๏ฑ We will see that the bottom-up parser try to find right-most derivation of the given source program in the reverse order. 10
  • 11.
    Parse tree ๏ฒ Aparse tree is a graphical representation of a derivation. ๏ฒ It filters out the order in which productions are applied to replace non-terminals. ๏ฒ A parse tree corresponding to a derivation is a labeled tree in which: โ€ข the interior nodes are labeled by non-terminals, โ€ข the leaf nodes are labeled by terminals, and โ€ข the children of each internal node represent the replacement of the associated non-terminal in one step of the derivation. 11
  • 12.
    12 Parse tree andDerivation E ๏‚ฎ E + E | E ๏€ช E | ( E ) | - E | id Lets examine this derivation: E ๏ƒž -E ๏ƒž -(E) ๏ƒž -(E + E) ๏ƒž -(id + id) E E E - E E - E ( ) E E - E ( ) + E E E E - E ( ) + E E id id This is a top-down derivation because we start building the parse tree at the top parse tree Grammar
  • 13.
    13 Which parse treeis correct? Ambiguity: example Construct parse tree for the expression: id + id ๏€ช id E E + E E E + E E ๏€ช E E E + E E ๏€ช E E id id id E E ๏€ช E E E ๏€ช E E + E E E ๏€ช E E + E E id id id E ๏‚ฎ E + E | E ๏€ช E | ( E ) | - E | id
  • 14.
    14 According to thegrammar, both are correct. Ambiguity: exampleโ€ฆ Find a derivation for the expression: id + id ๏€ช id E + E E ๏€ช E E id id id E * E E + E E id id id A grammar that produces more than one parse tree for any input sentence is said to be an ambiguous grammar. E ๏‚ฎ E + E | E ๏€ช E | ( E ) | - E | id
  • 15.
    Syntax analysis ๏ฒ Everylanguage has rules that prescribe the syntactic structure of well formed programs. ๏ฒ The syntax can be described using Context Free Grammars (CFG) notation. ๏ฒ The use of CFGs has several advantages: ๏ญ helps in identifying ambiguities ๏ญ it is possible to have a tool which produces automatically a parser using the grammar ๏ญ a properly designed grammar helps in modifying the parser easily when the language changes 15
  • 16.
    Top-down parsing Recursive DescentParsing (RDP) ๏ฒ This method of top-down parsing can be considered as an attempt to find the left most derivation for an input string. It may involve backtracking. ๏ฒ To construct the parse tree using RDP: ๏ญ we create one node tree consisting of S. ๏ญ two pointers, one for the tree and one for the input, will be used to indicate where the parsing process is. ๏ญ initially, they will be on S and the first input symbol, respectively. ๏ญ then we use the first S-production to expand the tree. The tree pointer will be positioned on the left most symbol of the newly created sub-tree. 16
  • 17.
    Bottom up Parsing ๏ฒTop-down parsers: โ€ข Starts constructing the parse tree at the top (root) of the tree and move down towards the leaves. โ€ข Easy to implement by hand, but work with restricted grammars. ๏ฒ example: predictive parsers
  • 18.
    18 Bottom-Up and Top-Down Parsers Top-downparsers: โ€ข Starts constructing the parse tree at the top (root) of the tree and move down towards the leaves. โ€ข Easy to implement by hand, but work with restricted grammars. example: predictive parsers Bottom-up parsers: โ€ข build the nodes on the bottom of the parse tree first. โ€ข Suitable for automatic parser generation, handle a larger class of grammars. examples: shift-reduce parser (or LR(k) parsers)
  • 19.
    19 Bottom-Up Parser A bottom-upparser, or a shift-reduce parser, begins at the leaves and works up to the top of the tree. The reduction steps trace a rightmost derivation on reverse. S ๏‚ฎ aABe A ๏‚ฎ Abc | b B ๏‚ฎ d Consider the Grammar: We want to parse the input string abbcde. This parser is known as an LR Parser because it scans the input from Left to right, and it constructs a Rightmost derivation in reverse order.
  • 20.
    Bottom-up parser (LRparsing) S ๏ƒ  aABe A ๏ƒ  Abc | b B ๏ƒ  d abbcde ๏ƒŸ aAbcde ๏ƒŸ aAde ๏ƒŸ aABe ๏ƒŸ S ๏ฒ At each step, we have to find ฮฑ such that ฮฑ is a substring of the sentence and replace ฮฑ by A, where A ๏ƒ  ฮฑ 20
  • 21.