SlideShare a Scribd company logo
1 of 170
Download to read offline
Syntax Analysis
Announcements
● Written Assignment 1 out, due Friday, July
6th at 5PM.
● Explore the theoretical aspects of scanning.
● See the limits of maximal-munch scanning.
● Class mailing list:
● There is an issue with SCPD students and the
course mailing list.
●
Email the staff immediately if you haven't
gotten any of our emails.
Where We Are
Lexical Analysis
Syntax Analysis
Semantic Analysis
IR Generation
IR Optimization
Code Generation
Optimization
Source
Code
Machine
Code
Where We Are
Lexical Analysis
Syntax Analysis
Semantic Analysis
IR Generation
IR Optimization
Code Generation
Optimization
Source
Code
Machine
Code
Flex-pert
Flex-pert
Where We Are
Lexical Analysis
Syntax Analysis
Semantic Analysis
IR Generation
IR Optimization
Code Generation
Optimization
Source
Code
Machine
Code
while (ip < z)
++ip;
w h i l e ( i < z ) n t + i p ;
while (ip < z)
++ip;
p + +
w h i l e ( i < z ) n t + i p ;
while (ip < z)
++ip;
p + +
T_While ( T_Ident < T_Ident ) ++ T_Ident
ip z ip
w h i l e ( i < z ) n t + i p ;
while (ip < z)
++ip;
p + +
T_While ( T_Ident < T_Ident ) ++ T_Ident
ip z ip
While
++
Ident
<
Ident Ident
ip z ip
do[for] = new 0;
do[for] = new 0;
d o [ f ] n e w 0 ;
=
o r
do[for] = new 0;
T_Do [ T_For T_New T_IntConst
0
d o [ f ] n e w 0 ;
=
o r
] =
do[for] = new 0;
T_Do [ T_For T_New T_IntConst
0
d o [ f ] n e w 0 ;
=
o r
] =
What is Syntax Analysis?
● After lexical analysis (scanning), we have
a series of tokens.
● In syntax analysis (or parsing), we
want to interpret what those tokens
mean.
● Goal: Recover the structure described by
that series of tokens.
● Goal: Report errors if those tokens do not
properly encode a structure.
Outline
● Today: Formalisms for syntax analysis.
● Context-Free Grammars
● Derivations
● Concrete and Abstract Syntax Trees
● Ambiguity
● Next Week: Parsing algorithms.
● Top-Down Parsing
● Bottom-Up Parsing
Formal Languages
● An alphabet is a set Ξ£ of symbols that act
as letters.
● A language over Ξ£ is a set of strings made
from symbols in Ξ£.
● When scanning, our alphabet was ASCII or
Unicode characters. We produced tokens.
● When parsing, our alphabet is the set of
tokens produced by the scanner.
The Limits of Regular Languages
● When scanning, we used regular expressions to
define each token.
● Unfortunately, regular expressions are (usually)
too weak to define programming languages.
● Cannot define a regular expression matching all
expressions with properly balanced parentheses.
● Cannot define a regular expression matching all
functions with properly nested block structure.
● We need a more powerful formalism.
Context-Free Grammars
● A context-free grammar (or CFG) is a
formalism for defining languages.
● Can define the context-free languages,
a strict superset of the the regular
languages.
● CFGs are best explained by example...
Arithmetic Expressions
● Suppose we want to describe all legal arithmetic
expressions using addition, subtraction,
multiplication, and division.
● Here is one possible CFG:
E β†’ int
E β†’ E Op E
E β†’ (E)
Op β†’ +
Op β†’ -
Op β†’ *
Op β†’ /
E
β‡’ E Op E
β‡’ E Op (E)
β‡’ E Op (E Op E)
β‡’ E * (E Op E)
β‡’ int * (E Op E)
β‡’ int * (int Op E)
β‡’ int * (int Op int)
β‡’ int * (int + int)
Arithmetic Expressions
● Suppose we want to describe all legal arithmetic
expressions using addition, subtraction,
multiplication, and division.
● Here is one possible CFG:
E β†’ int
E β†’ E Op E
E β†’ (E)
Op β†’ +
Op β†’ -
Op β†’ *
Op β†’ /
E
β‡’ E Op E
β‡’ E Op int
β‡’ int Op int
β‡’ int / int
Context-Free Grammars
● Formally, a context-free
grammar is a collection of four
objects:
● A set of nonterminal symbols
(or variables),
● A set of terminal symbols,
● A set of production rules saying
how each nonterminal can be
converted by a string of terminals
and nonterminals, and
● A start symbol that begins the
derivation.
E β†’ int
E β†’ E Op E
E β†’ (E)
Op β†’ +
Op β†’ -
Op β†’ *
Op β†’ /
A Notational Shorthand
E β†’ int
E β†’ E Op E
E β†’ (E)
Op β†’ +
Op β†’ -
Op β†’ *
Op β†’ /
A Notational Shorthand
E β†’ int | E Op E | (E)
Op β†’ + | - | * | /
Not Notational Shorthand
● The syntax for regular expressions does
not carry over to CFGs.
● Cannot use *, |, or parentheses.
S β†’ a*b
Not Notational Shorthand
● The syntax for regular expressions does
not carry over to CFGs.
● Cannot use *, |, or parentheses.
S β†’ Ab
Not Notational Shorthand
● The syntax for regular expressions does
not carry over to CFGs.
● Cannot use *, |, or parentheses.
S β†’ Ab
A β†’ Aa | Ξ΅
Not Notational Shorthand
● The syntax for regular expressions does
not carry over to CFGs.
● Cannot use *, |, or parentheses.
S β†’ a(b|c*)
Not Notational Shorthand
● The syntax for regular expressions does
not carry over to CFGs.
● Cannot use *, |, or parentheses.
S β†’ aX
X β†’ (b|c*)
Not Notational Shorthand
● The syntax for regular expressions does
not carry over to CFGs.
● Cannot use *, |, or parentheses.
S β†’ aX
X β†’ b | c*
Not Notational Shorthand
● The syntax for regular expressions does
not carry over to CFGs.
● Cannot use *, |, or parentheses.
S β†’ aX
X β†’ b | C
Not Notational Shorthand
● The syntax for regular expressions does
not carry over to CFGs.
● Cannot use *, |, or parentheses.
S β†’ aX
X β†’ b | C
C β†’ Cc | Ξ΅
More Context-Free Grammars
● Chemicals!
C19
H14
O5
S
Cu3
(CO3
)2
(OH)2
MnO4
-
S2-
Form β†’ Cmp | Cmp Ion
Cmp β†’ Term | Term Num | Cmp Cmp
Term β†’ Elem | (Cmp)
Elem β†’ H | He | Li | Be | B | C | …
Ion β†’ + | - | IonNum + | IonNum -
IonNum β†’ 2 | 3 | 4 | ...
Num β†’ 1 | IonNum
CFGs for Chemistry
Form
β‡’ Cmp Ion
β‡’ Cmp Cmp Ion
β‡’ Cmp Term Num Ion
β‡’ Term Term Num Ion
β‡’ Elem Term Num Ion
β‡’ Mn Term Num Ion
β‡’ Mn Elem Num Ion
β‡’ MnO Num Ion
β‡’ MnO IonNum Ion
β‡’ MnO4
Ion
β‡’ MnO4
-
Form β†’ Cmp | Cmp Ion
Cmp β†’ Term | Term Num | Cmp Cmp
Term β†’ Elem | (Cmp)
Elem β†’ H | He | Li | Be | B | C | …
Ion β†’ + | - | IonNum + | IonNum -
IonNum β†’ 2 | 3 | 4 | ...
Num β†’ 1 | IonNum
CFGs for Programming Languages
BLOCK β†’ STMT
| { STMTS }
STMTS β†’ Ξ΅
| STMT STMTS
STMT β†’ EXPR;
| if (EXPR) BLOCK
| while (EXPR) BLOCK
| do BLOCK while (EXPR);
| BLOCK
| …
EXPR β†’ identifier
| constant
| EXPR + EXPR
| EXPR – EXPR
| EXPR * EXPR
| ...
Some CFG Notation
● We will be discussing generic
transformations and operations on CFGs
over the next two weeks.
● Let's standardize our notation.
Some CFG Notation
● Capital letters at the beginning of the
alphabet will represent nonterminals.
● i.e. A, B, C, D
● Lowercase letters at the end of the
alphabet will represent terminals.
● i.e. t, u, v, w
● Lowercase Greek letters will represent
arbitrary strings of terminals and
nonterminals.
● i.e. Ξ±, Ξ³, Ο‰
Examples
● We might write an arbitrary production as
A β†’ Ο‰
● We might write a string of a nonterminal followed
by a terminal as
At
● We might write an arbitrary production containing
a nonterminal followed by a terminal as
B → αAtω
Derivations
β‡’ E
β‡’ E Op E
β‡’ E Op (E)
β‡’ E Op (E Op E)
β‡’ E * (E Op E)
β‡’ int * (E Op E)
β‡’ int * (int Op E)
β‡’ int * (int Op int)
β‡’ int * (int + int)
● This sequence of steps is called
a derivation.
● A string Ξ±AΟ‰ yields string
Ξ±Ξ³Ο‰ iff A β†’ Ξ³ is a production.
● If Ξ± yields Ξ², we write Ξ± β‡’ Ξ².
● We say that Ξ± derives Ξ² iff
there is a sequence of strings
where
Ξ± β‡’ Ξ±1
β‡’ Ξ±2
β‡’ … β‡’ Ξ²
● If Ξ± derives Ξ², we write Ξ± β‡’* Ξ².
Leftmost Derivations
β†’ STMTS
β‡’ STMT STMTS
β‡’ EXPR; STMTS
β‡’ EXPR = EXPR; STMTS
β‡’ id = EXPR; STMTS
β‡’ id = EXPR + EXPR; STMTS
β‡’ id = id + EXPR; STMTS
β‡’ id = id + constant; STMTS
β‡’ id = id + constant;
BLOCK β†’ STMT
| { STMTS }
STMTS β†’ Ξ΅
| STMT STMTS
STMT β†’ EXPR;
| if (EXPR) BLOCK
| while (EXPR) BLOCK
| do BLOCK while (EXPR);
| BLOCK
| …
EXPR β†’ identifier
| constant
| EXPR + EXPR
| EXPR – EXPR
| EXPR * EXPR
| EXPR = EXPR
| ...
Leftmost Derivations
● A leftmost derivation is a derivation in
which each step expands the leftmost
nonterminal.
● A rightmost derivation is a derivation
in which each step expands the rightmost
nonterminal.
● These will be of great importance when
we talk about parsing next week.
Related Derivations
β‡’ E
β‡’ E Op E
β‡’ int Op E
β‡’ int * E
β‡’ int * (E)
β‡’ int * (E Op E)
β‡’ int * (int Op E)
β‡’ int * (int + E)
β‡’ int * (int + int)
β‡’ E
β‡’ E Op E
β‡’ E Op (E)
β‡’ E Op (E Op E)
β‡’ E Op (E Op int)
β‡’ E Op (E + int)
β‡’ E Op (int + int)
β‡’ E * (int + int)
β‡’ int * (int + int)
Derivations Revisited
● A derivation encodes two pieces of
information:
● What productions were applied produce the
resulting string from the start symbol?
● In what order were they applied?
● Multiple derivations might use the same
productions, but apply them in a
different order.
Parse Trees
β‡’ E
β‡’ E Op E
β‡’ int Op E
β‡’ int * E
β‡’ int * (E)
β‡’ int * (E Op E)
β‡’ int * (int Op E)
β‡’ int * (int + E)
β‡’ int * (int + int)
Parse Trees
β‡’ E
β‡’ E Op E
β‡’ int Op E
β‡’ int * E
β‡’ int * (E)
β‡’ int * (E Op E)
β‡’ int * (int Op E)
β‡’ int * (int + E)
β‡’ int * (int + int)
E
Parse Trees
β‡’ E
β‡’ E Op E
β‡’ int Op E
β‡’ int * E
β‡’ int * (E)
β‡’ int * (E Op E)
β‡’ int * (int Op E)
β‡’ int * (int + E)
β‡’ int * (int + int)
E
Parse Trees
β‡’ E
β‡’ E Op E
β‡’ int Op E
β‡’ int * E
β‡’ int * (E)
β‡’ int * (E Op E)
β‡’ int * (int Op E)
β‡’ int * (int + E)
β‡’ int * (int + int)
E
E E
Op
Parse Trees
β‡’ E
β‡’ E Op E
β‡’ int Op E
β‡’ int * E
β‡’ int * (E)
β‡’ int * (E Op E)
β‡’ int * (int Op E)
β‡’ int * (int + E)
β‡’ int * (int + int)
E
E E
Op
Parse Trees
β‡’ E
β‡’ E Op E
β‡’ int Op E
β‡’ int * E
β‡’ int * (E)
β‡’ int * (E Op E)
β‡’ int * (int Op E)
β‡’ int * (int + E)
β‡’ int * (int + int)
E
E E
Op
int
Parse Trees
β‡’ E
β‡’ E Op E
β‡’ int Op E
β‡’ int * E
β‡’ int * (E)
β‡’ int * (E Op E)
β‡’ int * (int Op E)
β‡’ int * (int + E)
β‡’ int * (int + int)
E
E E
Op
int
Parse Trees
β‡’ E
β‡’ E Op E
β‡’ int Op E
β‡’ int * E
β‡’ int * (E)
β‡’ int * (E Op E)
β‡’ int * (int Op E)
β‡’ int * (int + E)
β‡’ int * (int + int)
E
E E
Op
int *
Parse Trees
β‡’ E
β‡’ E Op E
β‡’ int Op E
β‡’ int * E
β‡’ int * (E)
β‡’ int * (E Op E)
β‡’ int * (int Op E)
β‡’ int * (int + E)
β‡’ int * (int + int)
E
E E
Op
int *
Parse Trees
β‡’ E
β‡’ E Op E
β‡’ int Op E
β‡’ int * E
β‡’ int * (E)
β‡’ int * (E Op E)
β‡’ int * (int Op E)
β‡’ int * (int + E)
β‡’ int * (int + int)
E
E E
Op
int *
E
( )
Parse Trees
β‡’ E
β‡’ E Op E
β‡’ int Op E
β‡’ int * E
β‡’ int * (E)
β‡’ int * (E Op E)
β‡’ int * (int Op E)
β‡’ int * (int + E)
β‡’ int * (int + int)
E
E E
Op
int *
E
( )
Parse Trees
β‡’ E
β‡’ E Op E
β‡’ int Op E
β‡’ int * E
β‡’ int * (E)
β‡’ int * (E Op E)
β‡’ int * (int Op E)
β‡’ int * (int + E)
β‡’ int * (int + int)
E
E E
Op
int *
E
( )
E E
Op
Parse Trees
β‡’ E
β‡’ E Op E
β‡’ int Op E
β‡’ int * E
β‡’ int * (E)
β‡’ int * (E Op E)
β‡’ int * (int Op E)
β‡’ int * (int + E)
β‡’ int * (int + int)
E
E E
Op
int *
E
( )
E E
Op
Parse Trees
β‡’ E
β‡’ E Op E
β‡’ int Op E
β‡’ int * E
β‡’ int * (E)
β‡’ int * (E Op E)
β‡’ int * (int Op E)
β‡’ int * (int + E)
β‡’ int * (int + int)
E
E E
Op
int *
E
( int )
E E
Op
Parse Trees
β‡’ E
β‡’ E Op E
β‡’ int Op E
β‡’ int * E
β‡’ int * (E)
β‡’ int * (E Op E)
β‡’ int * (int Op E)
β‡’ int * (int + E)
β‡’ int * (int + int)
E
E E
Op
int *
E
( int )
E E
Op
Parse Trees
β‡’ E
β‡’ E Op E
β‡’ int Op E
β‡’ int * E
β‡’ int * (E)
β‡’ int * (E Op E)
β‡’ int * (int Op E)
β‡’ int * (int + E)
β‡’ int * (int + int)
E
E E
Op
int *
E
( int + )
E E
Op
Parse Trees
β‡’ E
β‡’ E Op E
β‡’ int Op E
β‡’ int * E
β‡’ int * (E)
β‡’ int * (E Op E)
β‡’ int * (int Op E)
β‡’ int * (int + E)
β‡’ int * (int + int)
E
E E
Op
int *
E
( int + )
E E
Op
Parse Trees
β‡’ E
β‡’ E Op E
β‡’ int Op E
β‡’ int * E
β‡’ int * (E)
β‡’ int * (E Op E)
β‡’ int * (int Op E)
β‡’ int * (int + E)
β‡’ int * (int + int)
E
E E
Op
int *
E
( int + int )
E E
Op
Parse Trees
β‡’ E
β‡’ E Op E
β‡’ E Op (E)
β‡’ E Op (E Op E)
β‡’ E Op (E Op int)
β‡’ E Op (E + int)
β‡’ E Op (int + int)
β‡’ E * (int + int)
β‡’ int * (int + int)
Parse Trees
β‡’ E
β‡’ E Op E
β‡’ E Op (E)
β‡’ E Op (E Op E)
β‡’ E Op (E Op int)
β‡’ E Op (E + int)
β‡’ E Op (int + int)
β‡’ E * (int + int)
β‡’ int * (int + int)
E
Parse Trees
β‡’ E
β‡’ E Op E
β‡’ E Op (E)
β‡’ E Op (E Op E)
β‡’ E Op (E Op int)
β‡’ E Op (E + int)
β‡’ E Op (int + int)
β‡’ E * (int + int)
β‡’ int * (int + int)
E
Parse Trees
β‡’ E
β‡’ E Op E
β‡’ E Op (E)
β‡’ E Op (E Op E)
β‡’ E Op (E Op int)
β‡’ E Op (E + int)
β‡’ E Op (int + int)
β‡’ E * (int + int)
β‡’ int * (int + int)
E
E E
Op
Parse Trees
β‡’ E
β‡’ E Op E
β‡’ E Op (E)
β‡’ E Op (E Op E)
β‡’ E Op (E Op int)
β‡’ E Op (E + int)
β‡’ E Op (int + int)
β‡’ E * (int + int)
β‡’ int * (int + int)
E
E E
Op
Parse Trees
β‡’ E
β‡’ E Op E
β‡’ E Op (E)
β‡’ E Op (E Op E)
β‡’ E Op (E Op int)
β‡’ E Op (E + int)
β‡’ E Op (int + int)
β‡’ E * (int + int)
β‡’ int * (int + int)
E
E E
Op
E
( )
Parse Trees
β‡’ E
β‡’ E Op E
β‡’ E Op (E)
β‡’ E Op (E Op E)
β‡’ E Op (E Op int)
β‡’ E Op (E + int)
β‡’ E Op (int + int)
β‡’ E * (int + int)
β‡’ int * (int + int)
E
E E
Op
E
( )
Parse Trees
β‡’ E
β‡’ E Op E
β‡’ E Op (E)
β‡’ E Op (E Op E)
β‡’ E Op (E Op int)
β‡’ E Op (E + int)
β‡’ E Op (int + int)
β‡’ E * (int + int)
β‡’ int * (int + int)
E
E E
Op
E
( )
E E
Op
Parse Trees
β‡’ E
β‡’ E Op E
β‡’ E Op (E)
β‡’ E Op (E Op E)
β‡’ E Op (E Op int)
β‡’ E Op (E + int)
β‡’ E Op (int + int)
β‡’ E * (int + int)
β‡’ int * (int + int)
E
E E
Op
E
( )
E E
Op
Parse Trees
β‡’ E
β‡’ E Op E
β‡’ E Op (E)
β‡’ E Op (E Op E)
β‡’ E Op (E Op int)
β‡’ E Op (E + int)
β‡’ E Op (int + int)
β‡’ E * (int + int)
β‡’ int * (int + int)
E
E E
Op
E
( int )
E E
Op
Parse Trees
β‡’ E
β‡’ E Op E
β‡’ E Op (E)
β‡’ E Op (E Op E)
β‡’ E Op (E Op int)
β‡’ E Op (E + int)
β‡’ E Op (int + int)
β‡’ E * (int + int)
β‡’ int * (int + int)
E
E E
Op
E
( int )
E E
Op
Parse Trees
β‡’ E
β‡’ E Op E
β‡’ E Op (E)
β‡’ E Op (E Op E)
β‡’ E Op (E Op int)
β‡’ E Op (E + int)
β‡’ E Op (int + int)
β‡’ E * (int + int)
β‡’ int * (int + int)
E
E E
Op
E
( + int )
E E
Op
Parse Trees
β‡’ E
β‡’ E Op E
β‡’ E Op (E)
β‡’ E Op (E Op E)
β‡’ E Op (E Op int)
β‡’ E Op (E + int)
β‡’ E Op (int + int)
β‡’ E * (int + int)
β‡’ int * (int + int)
E
E E
Op
E
( + int )
E E
Op
Parse Trees
β‡’ E
β‡’ E Op E
β‡’ E Op (E)
β‡’ E Op (E Op E)
β‡’ E Op (E Op int)
β‡’ E Op (E + int)
β‡’ E Op (int + int)
β‡’ E * (int + int)
β‡’ int * (int + int)
E
E E
Op
E
( int + int )
E E
Op
Parse Trees
β‡’ E
β‡’ E Op E
β‡’ E Op (E)
β‡’ E Op (E Op E)
β‡’ E Op (E Op int)
β‡’ E Op (E + int)
β‡’ E Op (int + int)
β‡’ E * (int + int)
β‡’ int * (int + int)
E
E E
Op
E
( int + int )
E E
Op
Parse Trees
β‡’ E
β‡’ E Op E
β‡’ E Op (E)
β‡’ E Op (E Op E)
β‡’ E Op (E Op int)
β‡’ E Op (E + int)
β‡’ E Op (int + int)
β‡’ E * (int + int)
β‡’ int * (int + int)
E
E E
Op
*
E
( int + int )
E E
Op
Parse Trees
β‡’ E
β‡’ E Op E
β‡’ E Op (E)
β‡’ E Op (E Op E)
β‡’ E Op (E Op int)
β‡’ E Op (E + int)
β‡’ E Op (int + int)
β‡’ E * (int + int)
β‡’ int * (int + int)
E
E E
Op
*
E
( int + int )
E E
Op
Parse Trees
β‡’ E
β‡’ E Op E
β‡’ E Op (E)
β‡’ E Op (E Op E)
β‡’ E Op (E Op int)
β‡’ E Op (E + int)
β‡’ E Op (int + int)
β‡’ E * (int + int)
β‡’ int * (int + int)
E
E E
Op
int *
E
( int + int )
E E
Op
For Comparison
E
E E
Op
int *
E
( int + int )
E E
Op
E
E E
Op
int *
E
( int + int )
E E
Op
Parse Trees
● A parse tree is a tree encoding the steps
in a derivation.
● Internal nodes represent nonterminal
symbols used in the production.
● Inorder walk of the leaves contains the
generated string.
● Encodes what productions are used, not
the order in which those productions are
applied.
The Goal of Parsing
● Goal of syntax analysis: Recover the
structure described by a series of
tokens.
● If language is described as a CFG, goal is
to recover a parse tree for the the input
string.
● Usually we do some simplifications on the
tree; more on that later.
● We'll discuss how to do this next week.
Challenges in Parsing
E
E E
Op
int * int + int
E E
Op
E
E E
Op
int * int
E E
Op
+ int
A Serious Problem
int * (int + int) (int * int) + int
Ambiguity
● A CFG is said to be ambiguous if there is at
least one string with two or more parse trees.
● Note that ambiguity is a property of grammars,
not languages.
● There is no algorithm for converting an
arbitrary ambiguous grammar into an
unambiguous one.
● Some languages are inherently ambiguous, meaning
that no unambiguous grammar exists for them.
● There is no algorithm for detecting whether an
arbitrary grammar is ambiguous.
Is Ambiguity a Problem?
● Depends on semantics.
E β†’ int | E + E
Is Ambiguity a Problem?
● Depends on semantics.
E β†’ int | E + E
R
+ R
E
+ E
R
R + R
5 R R
+
3 7
E
E + E
5 E E
+
3 7
R
5
R
E
+
E E 7
5 3
Is Ambiguity a Problem?
● Depends on semantics.
E β†’ int | E + E | E - E
Is Ambiguity a Problem?
● Depends on semantics.
E β†’ int | E + E | E - E
R
+ R
E
+ E
R
R + R
5 R R
+
3 7
E
E - E
5 E E
+
3 7
R
5
R
E
-
E E 7
5 3
Resolving Ambiguity
● If a grammar can be made unambiguous
at all, it is usually made unambiguous
through layering.
● Have exactly one way to build each piece
of the string.
● Have exactly one way of combining those
pieces back together.
Example: Balanced Parentheses
● Consider the language of all strings of
balanced parentheses.
● Examples:
● Ξ΅
● ()
● (()())
● ((()))(())()
● Here is one possible grammar for balanced
parentheses:
P β†’ Ξ΅ | PP | (P)
Balanced Parentheses
● Given the grammar P β†’ Ξ΅ | PP | (P)
● How might we generate the string (()())?
Balanced Parentheses
● Given the grammar P β†’ Ξ΅ | PP | (P)
● How might we generate the string (()())?
P
Balanced Parentheses
● Given the grammar P β†’ Ξ΅ | PP | (P)
● How might we generate the string (()())?
P
P
( )
Balanced Parentheses
● Given the grammar P β†’ Ξ΅ | PP | (P)
● How might we generate the string (()())?
P
P
P P
( )
Balanced Parentheses
● Given the grammar P β†’ Ξ΅ | PP | (P)
● How might we generate the string (()())?
P
P
P P
P
( ( ) )
Balanced Parentheses
● Given the grammar P β†’ Ξ΅ | PP | (P)
● How might we generate the string (()())?
P
P
P P
P
( ( ) )
Ξ΅
Balanced Parentheses
● Given the grammar P β†’ Ξ΅ | PP | (P)
● How might we generate the string (()())?
P
P
P P
P P
( ( ) ( ) )
Ξ΅
Balanced Parentheses
● Given the grammar P β†’ Ξ΅ | PP | (P)
● How might we generate the string (()())?
P
P
P P
P P
( ( ) ( ) )
Ξ΅ Ξ΅
Balanced Parentheses
P
P
P P
P P
( ( ) ( ) )
Ξ΅ Ξ΅
P
P
Ξ΅
How to resolve this ambiguity?
( ( ) ( ) ) ( ) ( ( ) )
( ( ) ( ) ) ( ) ( ( ) )
( ( ) ( ) ) ( ) ( ( ) )
( ( ) ( ) ) ( ) ( ( ) )
Rethinking Parentheses
● A string of balanced parentheses is a
sequence of strings that are themselves
balanced parentheses.
● To avoid ambiguity, we can build the
string in two steps:
● Decide how many different substrings we
will glue together.
● Build each substring independently.
Let's ask the Internet for help!
Um... what?
● The way the nyancat flies across the sky
is similar to how we can build up strings
of balanced parentheses.
Um... what?
● The way the nyancat flies across the sky
is similar to how we can build up strings
of balanced parentheses.
Um... what?
● The way the nyancat flies across the sky
is similar to how we can build up strings
of balanced parentheses.
Um... what?
● The way the nyancat flies across the sky
is similar to how we can build up strings
of balanced parentheses.
Um... what?
● The way the nyancat flies across the sky
is similar to how we can build up strings
of balanced parentheses.
Um... what?
● The way the nyancat flies across the sky
is similar to how we can build up strings
of balanced parentheses.
Um... what?
● The way the nyancat flies across the sky
is similar to how we can build up strings
of balanced parentheses.
Um... what?
● The way the nyancat flies across the sky
is similar to how we can build up strings
of balanced parentheses.
β†’
β†’ Ξ΅
β†’
β†’
Building Parentheses
● Spread a string of parentheses across the
string. There is exactly one way to do
this for any number of parentheses.
● Expand out each substring by adding in
parentheses and repeating.
S
P
β†’
β†’
S
P | Ξ΅
( )
S
Building Parentheses
S
β‡’ PS
β‡’ PPS
β‡’ PP
β‡’ (S)P
β‡’ (S)(S)
β‡’ (PS)(S)
β‡’ (P)(S)
β‡’ ((S))(S)
β‡’ (())(S)
β‡’ (())()
S
P
β†’
β†’
S
P | Ξ΅
( )
S
Context-Free Grammars
● A regular expression can be
● Any letter
● Ξ΅
● The concatenation of regular expressions.
● The union of regular expressions.
● The Kleene closure of a regular expression.
● A parenthesized regular expression.
Context-Free Grammars
● This gives us the following CFG:
R β†’ a | b | c | …
R β†’ β€œΞ΅β€
R β†’ RR
R β†’ R β€œ|” R
R β†’ R*
R β†’ (R)
R β†’ a | b | c | …
R β†’ β€œΞ΅β€
R β†’ RR
R β†’ R β€œ|” R
R β†’ R*
R β†’ (R)
a | b *
R
R
R
R
a | b *
R R
R
R
a | (b*) (a | b)*
An Ambiguous Grammar
Resolving Ambiguity
● We can try to resolve the ambiguity via
layering:
R β†’ a | b | c | …
R β†’ β€œΞ΅β€
R β†’ RR
R β†’ R β€œ|” R
R β†’ R*
R β†’ (R)
|
a a *
b
Resolving Ambiguity
● We can try to resolve the ambiguity via
layering:
R β†’ a | b | c | …
R β†’ β€œΞ΅β€
R β†’ RR
R β†’ R β€œ|” R
R β†’ R*
R β†’ (R)
|
a a *
b
Resolving Ambiguity
● We can try to resolve the ambiguity via
layering:
R β†’ a | b | c | …
R β†’ β€œΞ΅β€
R β†’ RR
R β†’ R β€œ|” R
R β†’ R*
R β†’ (R)
|
a a *
b
Resolving Ambiguity
● We can try to resolve the ambiguity via
layering:
R β†’ a | b | c | …
R β†’ β€œΞ΅β€
R β†’ RR
R β†’ R β€œ|” R
R β†’ R*
R β†’ (R)
|
a a *
b
Resolving Ambiguity
● We can try to resolve the ambiguity via
layering:
R β†’ a | b | c | …
R β†’ β€œΞ΅β€
R β†’ RR
R β†’ R β€œ|” R
R β†’ R*
R β†’ (R)
|
a a *
b
Resolving Ambiguity
● We can try to resolve the ambiguity via
layering:
R β†’ a | b | c | …
R β†’ β€œΞ΅β€
R β†’ RR
R β†’ R β€œ|” R
R β†’ R*
R β†’ (R)
|
a a *
b
Resolving Ambiguity
● We can try to resolve the ambiguity via
layering:
R β†’ a | b | c | …
R β†’ β€œΞ΅β€
R β†’ RR
R β†’ R β€œ|” R
R β†’ R*
R β†’ (R)
|
a a *
b
Resolving Ambiguity
● We can try to resolve the ambiguity via
layering:
R β†’ a | b | c | …
R β†’ β€œΞ΅β€
R β†’ RR
R β†’ R β€œ|” R
R β†’ R*
R β†’ (R)
R β†’ S | R β€œ|” S
S β†’ T | ST
T β†’ U | T*
U β†’ a | b | c | …
U β†’ β€œΞ΅β€
U β†’ (R)
Why is this unambiguous?
R β†’ S | R β€œ|” S
S β†’ T | ST
T β†’ U | T*
U β†’ a | b | …
U β†’ β€œΞ΅β€
U β†’ (R)
Why is this unambiguous?
Only generates
β€œatomic” expressions
R β†’ S | R β€œ|” S
S β†’ T | ST
T β†’ U | T*
U β†’ a | b | …
U β†’ β€œΞ΅β€
U β†’ (R)
Why is this unambiguous?
Only generates
β€œatomic” expressions
Puts stars onto
atomic expressions
R β†’ S | R β€œ|” S
S β†’ T | ST
T β†’ U | T*
U β†’ a | b | …
U β†’ β€œΞ΅β€
U β†’ (R)
Why is this unambiguous?
Only generates
β€œatomic” expressions
Puts stars onto
atomic expressions
Concatenates starred
expressions
R β†’ S | R β€œ|” S
S β†’ T | ST
T β†’ U | T*
U β†’ a | b | …
U β†’ β€œΞ΅β€
U β†’ (R)
Why is this unambiguous?
Only generates
β€œatomic” expressions
Puts stars onto
atomic expressions
Concatenates starred
expressions
Unions
concatenated
expressions
R β†’ S | R β€œ|” S
S β†’ T | ST
T β†’ U | T*
U β†’ a | b | …
U β†’ β€œΞ΅β€
U β†’ (R)
R β†’ S | R β€œ|” S
S β†’ T | ST
T β†’ U | T*
U β†’ a | b | c | …
U β†’ β€œΞ΅β€
U β†’ (R)
a b | c | a *
R
R β†’ S | R β€œ|” S
S β†’ T | ST
T β†’ U | T*
U β†’ a | b | c | …
U β†’ β€œΞ΅β€
U β†’ (R)
a b | c | a *
S
R
R
R β†’ S | R β€œ|” S
S β†’ T | ST
T β†’ U | T*
U β†’ a | b | c | …
U β†’ β€œΞ΅β€
U β†’ (R)
a b | c | a *
S
S
R
R
R
R β†’ S | R β€œ|” S
S β†’ T | ST
T β†’ U | T*
U β†’ a | b | c | …
U β†’ β€œΞ΅β€
U β†’ (R)
a b | c | a *
S
S
S
R
R
R
R β†’ S | R β€œ|” S
S β†’ T | ST
T β†’ U | T*
U β†’ a | b | c | …
U β†’ β€œΞ΅β€
U β†’ (R)
a b | c | a *
T
S
S
S
S
R
R
R
R β†’ S | R β€œ|” S
S β†’ T | ST
T β†’ U | T*
U β†’ a | b | c | …
U β†’ β€œΞ΅β€
U β†’ (R)
a b | c | a *
T
T
S
S
S
S
R
R
R
R β†’ S | R β€œ|” S
S β†’ T | ST
T β†’ U | T*
U β†’ a | b | c | …
U β†’ β€œΞ΅β€
U β†’ (R)
a b | c | a *
U
T
T
S
S
S
S
R
R
R
R β†’ S | R β€œ|” S
S β†’ T | ST
T β†’ U | T*
U β†’ a | b | c | …
U β†’ β€œΞ΅β€
U β†’ (R)
a b | c | a *
U
T
T
S
S
S
S
R
R
R
R β†’ S | R β€œ|” S
S β†’ T | ST
T β†’ U | T*
U β†’ a | b | c | …
U β†’ β€œΞ΅β€
U β†’ (R)
a b | c | a *
U U
T
T
S
S
S
S
R
R
R
R β†’ S | R β€œ|” S
S β†’ T | ST
T β†’ U | T*
U β†’ a | b | c | …
U β†’ β€œΞ΅β€
U β†’ (R)
a b | c | a *
U U
T
T
S
S
S
S
R
R
R
R β†’ S | R β€œ|” S
S β†’ T | ST
T β†’ U | T*
U β†’ a | b | c | …
U β†’ β€œΞ΅β€
U β†’ (R)
a b | c | a *
U U
T
T
T
S
S
S
S
R
R
R
R β†’ S | R β€œ|” S
S β†’ T | ST
T β†’ U | T*
U β†’ a | b | c | …
U β†’ β€œΞ΅β€
U β†’ (R)
a b | c | a *
U U U
T
T
T
S
S
S
S
R
R
R
R β†’ S | R β€œ|” S
S β†’ T | ST
T β†’ U | T*
U β†’ a | b | c | …
U β†’ β€œΞ΅β€
U β†’ (R)
a b | c | a *
U U U
T
T
T
S
S
S
S
R
R
R
R β†’ S | R β€œ|” S
S β†’ T | ST
T β†’ U | T*
U β†’ a | b | c | …
U β†’ β€œΞ΅β€
U β†’ (R)
a b | c | a *
U U U
T
T
T
S
S
S
S
R
R
R
T
R β†’ S | R β€œ|” S
S β†’ T | ST
T β†’ U | T*
U β†’ a | b | c | …
U β†’ β€œΞ΅β€
U β†’ (R)
a b | c | a *
U U U
T
T
T
T
S
S
S
S
R
R
R
T
R β†’ S | R β€œ|” S
S β†’ T | ST
T β†’ U | T*
U β†’ a | b | c | …
U β†’ β€œΞ΅β€
U β†’ (R)
a b | c | a *
U U U U
T
T
T
T
S
S
S
S
R
R
R
T
R β†’ S | R β€œ|” S
S β†’ T | ST
T β†’ U | T*
U β†’ a | b | c | …
U β†’ β€œΞ΅β€
U β†’ (R)
a b | c | a *
U U U U
T
T
T
T
S
S
S
S
R
R
R
T
Precedence Declarations
● If we leave the world of pure CFGs, we
can often resolve ambiguities through
precedence declarations.
● e.g. multiplication has higher precedence
than addition, but lower precedence than
exponentiation.
● Allows for unambiguous parsing of
ambiguous grammars.
● We'll see how this is implemented later
on.
R β†’ S | R β€œ|” S
S β†’ T | ST
T β†’ U | T*
U β†’ a | b | …
U β†’ β€œΞ΅β€
U β†’ (R)
The Structure of a Parse Tree
R β†’ S | R β€œ|” S
S β†’ T | ST
T β†’ U | T*
U β†’ a | b | …
U β†’ β€œΞ΅β€
U β†’ (R)
a a | b *
The Structure of a Parse Tree
R β†’ S | R β€œ|” S
S β†’ T | ST
T β†’ U | T*
U β†’ a | b | …
U β†’ β€œΞ΅β€
U β†’ (R)
a a | b *
U
U
U
T
T
T
S
S
R S
R
The Structure of a Parse Tree
a a | b *
U
U
U
T
T
T
S
S
R S
R
The Structure of a Parse Tree
a a b
concat
*
or
R β†’ S | R β€œ|” S
S β†’ T | ST
T β†’ U | T*
U β†’ a | b | …
U β†’ β€œΞ΅β€
U β†’ (R)
a a | b *
a ( b | c )
R β†’ S | R β€œ|” S
S β†’ T | ST
T β†’ U | T*
U β†’ a | b | …
U β†’ β€œΞ΅β€
U β†’ (R)
a ( b | c )
U U U
T T
T
S
S S
R
R
U
T
S
R
R β†’ S | R β€œ|” S
S β†’ T | ST
T β†’ U | T*
U β†’ a | b | …
U β†’ β€œΞ΅β€
U β†’ (R)
a ( b | c )
U U U
T T
T
S
S S
R
R
U
T
S
R
a
b c
or
concat
R β†’ S | R β€œ|” S
S β†’ T | ST
T β†’ U | T*
U β†’ a | b | …
U β†’ β€œΞ΅β€
U β†’ (R)
Abstract Syntax Trees (ASTs)
● A parse tree is a concrete syntax tree;
it shows exactly how the text was
derived.
● A more useful structure is an abstract
syntax tree, which retains only the
essential structure of the input.
How to build an AST?
● Typically done through semantic actions.
● Associate a piece of code to execute with
each production.
● As the input is parsed, execute this code to
build the AST.
● Exact order of code execution depends on the
parsing method used.
●
This is called a syntax-directed
translation.
Simple Semantic Actions
E β†’ T + E
E β†’ T
T β†’ int
T β†’ int * T
T β†’ (E)
E1
.val = T.val + E2
.val
E.val = T.val
T.val = int.val
T.val = int.val * T.val
T.val = E.val
Simple Semantic Actions
E β†’ T + E
E β†’ T
T β†’ int
T β†’ int * T
T β†’ (E)
E1
.val = T.val + E2
.val
E.val = T.val
T.val = int.val
T.val = int.val * T.val
T.val = E.val
int * + 7
5
26 int int
Simple Semantic Actions
E β†’ T + E
E β†’ T
T β†’ int
T β†’ int * T
T β†’ (E)
E1
.val = T.val + E2
.val
E.val = T.val
T.val = int.val
T.val = int.val * T.val
T.val = E.val
int * + 7
T 5
5
26 int int
Simple Semantic Actions
E β†’ T + E
E β†’ T
T β†’ int
T β†’ int * T
T β†’ (E)
E1
.val = T.val + E2
.val
E.val = T.val
T.val = int.val
T.val = int.val * T.val
T.val = E.val
int * + 7
T
T
5
130
5
26 int int
Simple Semantic Actions
E β†’ T + E
E β†’ T
T β†’ int
T β†’ int * T
T β†’ (E)
E1
.val = T.val + E2
.val
E.val = T.val
T.val = int.val
T.val = int.val * T.val
T.val = E.val
int * + 7
T
T
T
5
130
7
5
26 int int
Simple Semantic Actions
E β†’ T + E
E β†’ T
T β†’ int
T β†’ int * T
T β†’ (E)
E1
.val = T.val + E2
.val
E.val = T.val
T.val = int.val
T.val = int.val * T.val
T.val = E.val
int * + 7
T
T
T
E
5
130
7
7
5
26 int int
Simple Semantic Actions
E β†’ T + E
E β†’ T
T β†’ int
T β†’ int * T
T β†’ (E)
E1
.val = T.val + E2
.val
E.val = T.val
T.val = int.val
T.val = int.val * T.val
T.val = E.val
int * + 7
T
T
T
E
5
130
7
7
5
26 int int
E 137
R β†’ S R.ast = S.ast;
R β†’ R β€œ|” S R1
.ast = new Or(R2
.ast, S.ast);
S β†’ T S.ast = T.ast;
S β†’ ST S1
.ast = new Concat(S2
.ast, T.ast);
T β†’ U T.ast = U.ast;
T β†’ T* T1
.ast = new Star(T2
.ast);
U β†’ a U.ast = new SingleChar('a');
U β†’ β€œΞ΅β€ U.ast = new Epsilon();
U β†’ (R) U.ast = R.ast;
Semantic Actions to Build ASTs
Summary
●
Syntax analysis (parsing) extracts the structure from the
tokens produced by the scanner.
●
Languages are usually specified by context-free grammars
(CFGs).
● A parse tree shows how a string can be derived from a
grammar.
● A grammar is ambiguous if it can derive the same string
multiple ways.
● There is no algorithm for eliminating ambiguity; it must be
done by hand.
●
Abstract syntax trees (ASTs) contain an abstract
representation of a program's syntax.
●
Semantic actions associated with productions can be used
to build ASTs.
Next Time
● Top-Down Parsing
● Parsing as a Search
● Backtracking Parsers
● Predictive Parsers
● LL(1)

More Related Content

Similar to Syntax Analysis.pdf

Module 11
Module 11Module 11
Module 11bittudavis
Β 
Ch3_Syntax Analysis.pptx
Ch3_Syntax Analysis.pptxCh3_Syntax Analysis.pptx
Ch3_Syntax Analysis.pptxTameneTamire
Β 
Lecture 04 syntax analysis
Lecture 04 syntax analysisLecture 04 syntax analysis
Lecture 04 syntax analysisIffat Anjum
Β 
6-Role of Parser, Construction of Parse Tree and Elimination of Ambiguity-06-...
6-Role of Parser, Construction of Parse Tree and Elimination of Ambiguity-06-...6-Role of Parser, Construction of Parse Tree and Elimination of Ambiguity-06-...
6-Role of Parser, Construction of Parse Tree and Elimination of Ambiguity-06-...movocode
Β 
3b. LMD & RMD.pdf
3b. LMD & RMD.pdf3b. LMD & RMD.pdf
3b. LMD & RMD.pdfTANZINTANZINA
Β 
Programming_Language_Syntax.ppt
Programming_Language_Syntax.pptProgramming_Language_Syntax.ppt
Programming_Language_Syntax.pptAmrita Sharma
Β 
Interm codegen
Interm codegenInterm codegen
Interm codegenAnshul Sharma
Β 
Cfg part i
Cfg   part iCfg   part i
Cfg part iKashif Ali
Β 
04 Syntax Analysis.pdf
04 Syntax Analysis.pdf04 Syntax Analysis.pdf
04 Syntax Analysis.pdfmovamag594
Β 
Compiler design.ppt
Compiler design.pptCompiler design.ppt
Compiler design.pptAnkushPal34
Β 
New compiler design 101 April 13 2024.pdf
New compiler design 101 April 13 2024.pdfNew compiler design 101 April 13 2024.pdf
New compiler design 101 April 13 2024.pdfeliasabdi2024
Β 
Parsing in Compiler Design
Parsing in Compiler DesignParsing in Compiler Design
Parsing in Compiler DesignAkhil Kaushik
Β 
Chapter Three(2)
Chapter Three(2)Chapter Three(2)
Chapter Three(2)bolovv
Β 
Compiler Components and their Generators - Traditional Parsing Algorithms
Compiler Components and their Generators - Traditional Parsing AlgorithmsCompiler Components and their Generators - Traditional Parsing Algorithms
Compiler Components and their Generators - Traditional Parsing AlgorithmsGuido Wachsmuth
Β 

Similar to Syntax Analysis.pdf (20)

Module 11
Module 11Module 11
Module 11
Β 
Syntax analysis
Syntax analysisSyntax analysis
Syntax analysis
Β 
Ch3_Syntax Analysis.pptx
Ch3_Syntax Analysis.pptxCh3_Syntax Analysis.pptx
Ch3_Syntax Analysis.pptx
Β 
Lecture 04 syntax analysis
Lecture 04 syntax analysisLecture 04 syntax analysis
Lecture 04 syntax analysis
Β 
Syntax
SyntaxSyntax
Syntax
Β 
6-Role of Parser, Construction of Parse Tree and Elimination of Ambiguity-06-...
6-Role of Parser, Construction of Parse Tree and Elimination of Ambiguity-06-...6-Role of Parser, Construction of Parse Tree and Elimination of Ambiguity-06-...
6-Role of Parser, Construction of Parse Tree and Elimination of Ambiguity-06-...
Β 
3b. LMD & RMD.pdf
3b. LMD & RMD.pdf3b. LMD & RMD.pdf
3b. LMD & RMD.pdf
Β 
Programming_Language_Syntax.ppt
Programming_Language_Syntax.pptProgramming_Language_Syntax.ppt
Programming_Language_Syntax.ppt
Β 
Lexicalanalyzer
LexicalanalyzerLexicalanalyzer
Lexicalanalyzer
Β 
Lexicalanalyzer
LexicalanalyzerLexicalanalyzer
Lexicalanalyzer
Β 
Interm codegen
Interm codegenInterm codegen
Interm codegen
Β 
Lecture5 syntax analysis_1
Lecture5 syntax analysis_1Lecture5 syntax analysis_1
Lecture5 syntax analysis_1
Β 
Cfg part i
Cfg   part iCfg   part i
Cfg part i
Β 
04 Syntax Analysis.pdf
04 Syntax Analysis.pdf04 Syntax Analysis.pdf
04 Syntax Analysis.pdf
Β 
Compiler design.ppt
Compiler design.pptCompiler design.ppt
Compiler design.ppt
Β 
New compiler design 101 April 13 2024.pdf
New compiler design 101 April 13 2024.pdfNew compiler design 101 April 13 2024.pdf
New compiler design 101 April 13 2024.pdf
Β 
Parsing in Compiler Design
Parsing in Compiler DesignParsing in Compiler Design
Parsing in Compiler Design
Β 
Chapter Three(2)
Chapter Three(2)Chapter Three(2)
Chapter Three(2)
Β 
Ch2 (1).ppt
Ch2 (1).pptCh2 (1).ppt
Ch2 (1).ppt
Β 
Compiler Components and their Generators - Traditional Parsing Algorithms
Compiler Components and their Generators - Traditional Parsing AlgorithmsCompiler Components and their Generators - Traditional Parsing Algorithms
Compiler Components and their Generators - Traditional Parsing Algorithms
Β 

Recently uploaded

ACC 2024 Chronicles. Cardiology. Exam.pdf
ACC 2024 Chronicles. Cardiology. Exam.pdfACC 2024 Chronicles. Cardiology. Exam.pdf
ACC 2024 Chronicles. Cardiology. Exam.pdfSpandanaRallapalli
Β 
How to Configure Email Server in Odoo 17
How to Configure Email Server in Odoo 17How to Configure Email Server in Odoo 17
How to Configure Email Server in Odoo 17Celine George
Β 
Solving Puzzles Benefits Everyone (English).pptx
Solving Puzzles Benefits Everyone (English).pptxSolving Puzzles Benefits Everyone (English).pptx
Solving Puzzles Benefits Everyone (English).pptxOH TEIK BIN
Β 
Alper Gobel In Media Res Media Component
Alper Gobel In Media Res Media ComponentAlper Gobel In Media Res Media Component
Alper Gobel In Media Res Media ComponentInMediaRes1
Β 
Grade 9 Q4-MELC1-Active and Passive Voice.pptx
Grade 9 Q4-MELC1-Active and Passive Voice.pptxGrade 9 Q4-MELC1-Active and Passive Voice.pptx
Grade 9 Q4-MELC1-Active and Passive Voice.pptxChelloAnnAsuncion2
Β 
Like-prefer-love -hate+verb+ing & silent letters & citizenship text.pdf
Like-prefer-love -hate+verb+ing & silent letters & citizenship text.pdfLike-prefer-love -hate+verb+ing & silent letters & citizenship text.pdf
Like-prefer-love -hate+verb+ing & silent letters & citizenship text.pdfMr Bounab Samir
Β 
call girls in Kamla Market (DELHI) πŸ” >ΰΌ’9953330565πŸ” genuine Escort Service πŸ”βœ”οΈβœ”οΈ
call girls in Kamla Market (DELHI) πŸ” >ΰΌ’9953330565πŸ” genuine Escort Service πŸ”βœ”οΈβœ”οΈcall girls in Kamla Market (DELHI) πŸ” >ΰΌ’9953330565πŸ” genuine Escort Service πŸ”βœ”οΈβœ”οΈ
call girls in Kamla Market (DELHI) πŸ” >ΰΌ’9953330565πŸ” genuine Escort Service πŸ”βœ”οΈβœ”οΈ9953056974 Low Rate Call Girls In Saket, Delhi NCR
Β 
HỌC TỐT TIαΊΎNG ANH 11 THEO CHΖ―Ζ NG TRÌNH GLOBAL SUCCESS ĐÁP ÁN CHI TIαΊΎT - CαΊ’ NΔ‚...
HỌC TỐT TIαΊΎNG ANH 11 THEO CHΖ―Ζ NG TRÌNH GLOBAL SUCCESS ĐÁP ÁN CHI TIαΊΎT - CαΊ’ NΔ‚...HỌC TỐT TIαΊΎNG ANH 11 THEO CHΖ―Ζ NG TRÌNH GLOBAL SUCCESS ĐÁP ÁN CHI TIαΊΎT - CαΊ’ NΔ‚...
HỌC TỐT TIαΊΎNG ANH 11 THEO CHΖ―Ζ NG TRÌNH GLOBAL SUCCESS ĐÁP ÁN CHI TIαΊΎT - CαΊ’ NΔ‚...Nguyen Thanh Tu Collection
Β 
Hierarchy of management that covers different levels of management
Hierarchy of management that covers different levels of managementHierarchy of management that covers different levels of management
Hierarchy of management that covers different levels of managementmkooblal
Β 
Earth Day Presentation wow hello nice great
Earth Day Presentation wow hello nice greatEarth Day Presentation wow hello nice great
Earth Day Presentation wow hello nice greatYousafMalik24
Β 
Judging the Relevance and worth of ideas part 2.pptx
Judging the Relevance  and worth of ideas part 2.pptxJudging the Relevance  and worth of ideas part 2.pptx
Judging the Relevance and worth of ideas part 2.pptxSherlyMaeNeri
Β 
Atmosphere science 7 quarter 4 .........
Atmosphere science 7 quarter 4 .........Atmosphere science 7 quarter 4 .........
Atmosphere science 7 quarter 4 .........LeaCamillePacle
Β 
ECONOMIC CONTEXT - LONG FORM TV DRAMA - PPT
ECONOMIC CONTEXT - LONG FORM TV DRAMA - PPTECONOMIC CONTEXT - LONG FORM TV DRAMA - PPT
ECONOMIC CONTEXT - LONG FORM TV DRAMA - PPTiammrhaywood
Β 
Romantic Opera MUSIC FOR GRADE NINE pptx
Romantic Opera MUSIC FOR GRADE NINE pptxRomantic Opera MUSIC FOR GRADE NINE pptx
Romantic Opera MUSIC FOR GRADE NINE pptxsqpmdrvczh
Β 
ECONOMIC CONTEXT - PAPER 1 Q3: NEWSPAPERS.pptx
ECONOMIC CONTEXT - PAPER 1 Q3: NEWSPAPERS.pptxECONOMIC CONTEXT - PAPER 1 Q3: NEWSPAPERS.pptx
ECONOMIC CONTEXT - PAPER 1 Q3: NEWSPAPERS.pptxiammrhaywood
Β 
Difference Between Search & Browse Methods in Odoo 17
Difference Between Search & Browse Methods in Odoo 17Difference Between Search & Browse Methods in Odoo 17
Difference Between Search & Browse Methods in Odoo 17Celine George
Β 
DATA STRUCTURE AND ALGORITHM for beginners
DATA STRUCTURE AND ALGORITHM for beginnersDATA STRUCTURE AND ALGORITHM for beginners
DATA STRUCTURE AND ALGORITHM for beginnersSabitha Banu
Β 
AMERICAN LANGUAGE HUB_Level2_Student'sBook_Answerkey.pdf
AMERICAN LANGUAGE HUB_Level2_Student'sBook_Answerkey.pdfAMERICAN LANGUAGE HUB_Level2_Student'sBook_Answerkey.pdf
AMERICAN LANGUAGE HUB_Level2_Student'sBook_Answerkey.pdfphamnguyenenglishnb
Β 

Recently uploaded (20)

ACC 2024 Chronicles. Cardiology. Exam.pdf
ACC 2024 Chronicles. Cardiology. Exam.pdfACC 2024 Chronicles. Cardiology. Exam.pdf
ACC 2024 Chronicles. Cardiology. Exam.pdf
Β 
How to Configure Email Server in Odoo 17
How to Configure Email Server in Odoo 17How to Configure Email Server in Odoo 17
How to Configure Email Server in Odoo 17
Β 
Solving Puzzles Benefits Everyone (English).pptx
Solving Puzzles Benefits Everyone (English).pptxSolving Puzzles Benefits Everyone (English).pptx
Solving Puzzles Benefits Everyone (English).pptx
Β 
9953330565 Low Rate Call Girls In Rohini Delhi NCR
9953330565 Low Rate Call Girls In Rohini  Delhi NCR9953330565 Low Rate Call Girls In Rohini  Delhi NCR
9953330565 Low Rate Call Girls In Rohini Delhi NCR
Β 
Alper Gobel In Media Res Media Component
Alper Gobel In Media Res Media ComponentAlper Gobel In Media Res Media Component
Alper Gobel In Media Res Media Component
Β 
Grade 9 Q4-MELC1-Active and Passive Voice.pptx
Grade 9 Q4-MELC1-Active and Passive Voice.pptxGrade 9 Q4-MELC1-Active and Passive Voice.pptx
Grade 9 Q4-MELC1-Active and Passive Voice.pptx
Β 
Like-prefer-love -hate+verb+ing & silent letters & citizenship text.pdf
Like-prefer-love -hate+verb+ing & silent letters & citizenship text.pdfLike-prefer-love -hate+verb+ing & silent letters & citizenship text.pdf
Like-prefer-love -hate+verb+ing & silent letters & citizenship text.pdf
Β 
call girls in Kamla Market (DELHI) πŸ” >ΰΌ’9953330565πŸ” genuine Escort Service πŸ”βœ”οΈβœ”οΈ
call girls in Kamla Market (DELHI) πŸ” >ΰΌ’9953330565πŸ” genuine Escort Service πŸ”βœ”οΈβœ”οΈcall girls in Kamla Market (DELHI) πŸ” >ΰΌ’9953330565πŸ” genuine Escort Service πŸ”βœ”οΈβœ”οΈ
call girls in Kamla Market (DELHI) πŸ” >ΰΌ’9953330565πŸ” genuine Escort Service πŸ”βœ”οΈβœ”οΈ
Β 
HỌC TỐT TIαΊΎNG ANH 11 THEO CHΖ―Ζ NG TRÌNH GLOBAL SUCCESS ĐÁP ÁN CHI TIαΊΎT - CαΊ’ NΔ‚...
HỌC TỐT TIαΊΎNG ANH 11 THEO CHΖ―Ζ NG TRÌNH GLOBAL SUCCESS ĐÁP ÁN CHI TIαΊΎT - CαΊ’ NΔ‚...HỌC TỐT TIαΊΎNG ANH 11 THEO CHΖ―Ζ NG TRÌNH GLOBAL SUCCESS ĐÁP ÁN CHI TIαΊΎT - CαΊ’ NΔ‚...
HỌC TỐT TIαΊΎNG ANH 11 THEO CHΖ―Ζ NG TRÌNH GLOBAL SUCCESS ĐÁP ÁN CHI TIαΊΎT - CαΊ’ NΔ‚...
Β 
Hierarchy of management that covers different levels of management
Hierarchy of management that covers different levels of managementHierarchy of management that covers different levels of management
Hierarchy of management that covers different levels of management
Β 
Rapple "Scholarly Communications and the Sustainable Development Goals"
Rapple "Scholarly Communications and the Sustainable Development Goals"Rapple "Scholarly Communications and the Sustainable Development Goals"
Rapple "Scholarly Communications and the Sustainable Development Goals"
Β 
Earth Day Presentation wow hello nice great
Earth Day Presentation wow hello nice greatEarth Day Presentation wow hello nice great
Earth Day Presentation wow hello nice great
Β 
Judging the Relevance and worth of ideas part 2.pptx
Judging the Relevance  and worth of ideas part 2.pptxJudging the Relevance  and worth of ideas part 2.pptx
Judging the Relevance and worth of ideas part 2.pptx
Β 
Atmosphere science 7 quarter 4 .........
Atmosphere science 7 quarter 4 .........Atmosphere science 7 quarter 4 .........
Atmosphere science 7 quarter 4 .........
Β 
ECONOMIC CONTEXT - LONG FORM TV DRAMA - PPT
ECONOMIC CONTEXT - LONG FORM TV DRAMA - PPTECONOMIC CONTEXT - LONG FORM TV DRAMA - PPT
ECONOMIC CONTEXT - LONG FORM TV DRAMA - PPT
Β 
Romantic Opera MUSIC FOR GRADE NINE pptx
Romantic Opera MUSIC FOR GRADE NINE pptxRomantic Opera MUSIC FOR GRADE NINE pptx
Romantic Opera MUSIC FOR GRADE NINE pptx
Β 
ECONOMIC CONTEXT - PAPER 1 Q3: NEWSPAPERS.pptx
ECONOMIC CONTEXT - PAPER 1 Q3: NEWSPAPERS.pptxECONOMIC CONTEXT - PAPER 1 Q3: NEWSPAPERS.pptx
ECONOMIC CONTEXT - PAPER 1 Q3: NEWSPAPERS.pptx
Β 
Difference Between Search & Browse Methods in Odoo 17
Difference Between Search & Browse Methods in Odoo 17Difference Between Search & Browse Methods in Odoo 17
Difference Between Search & Browse Methods in Odoo 17
Β 
DATA STRUCTURE AND ALGORITHM for beginners
DATA STRUCTURE AND ALGORITHM for beginnersDATA STRUCTURE AND ALGORITHM for beginners
DATA STRUCTURE AND ALGORITHM for beginners
Β 
AMERICAN LANGUAGE HUB_Level2_Student'sBook_Answerkey.pdf
AMERICAN LANGUAGE HUB_Level2_Student'sBook_Answerkey.pdfAMERICAN LANGUAGE HUB_Level2_Student'sBook_Answerkey.pdf
AMERICAN LANGUAGE HUB_Level2_Student'sBook_Answerkey.pdf
Β 

Syntax Analysis.pdf

  • 2. Announcements ● Written Assignment 1 out, due Friday, July 6th at 5PM. ● Explore the theoretical aspects of scanning. ● See the limits of maximal-munch scanning. ● Class mailing list: ● There is an issue with SCPD students and the course mailing list. ● Email the staff immediately if you haven't gotten any of our emails.
  • 3. Where We Are Lexical Analysis Syntax Analysis Semantic Analysis IR Generation IR Optimization Code Generation Optimization Source Code Machine Code
  • 4. Where We Are Lexical Analysis Syntax Analysis Semantic Analysis IR Generation IR Optimization Code Generation Optimization Source Code Machine Code Flex-pert Flex-pert
  • 5. Where We Are Lexical Analysis Syntax Analysis Semantic Analysis IR Generation IR Optimization Code Generation Optimization Source Code Machine Code
  • 6. while (ip < z) ++ip;
  • 7. w h i l e ( i < z ) n t + i p ; while (ip < z) ++ip; p + +
  • 8. w h i l e ( i < z ) n t + i p ; while (ip < z) ++ip; p + + T_While ( T_Ident < T_Ident ) ++ T_Ident ip z ip
  • 9. w h i l e ( i < z ) n t + i p ; while (ip < z) ++ip; p + + T_While ( T_Ident < T_Ident ) ++ T_Ident ip z ip While ++ Ident < Ident Ident ip z ip
  • 11. do[for] = new 0; d o [ f ] n e w 0 ; = o r
  • 12. do[for] = new 0; T_Do [ T_For T_New T_IntConst 0 d o [ f ] n e w 0 ; = o r ] =
  • 13. do[for] = new 0; T_Do [ T_For T_New T_IntConst 0 d o [ f ] n e w 0 ; = o r ] =
  • 14. What is Syntax Analysis? ● After lexical analysis (scanning), we have a series of tokens. ● In syntax analysis (or parsing), we want to interpret what those tokens mean. ● Goal: Recover the structure described by that series of tokens. ● Goal: Report errors if those tokens do not properly encode a structure.
  • 15. Outline ● Today: Formalisms for syntax analysis. ● Context-Free Grammars ● Derivations ● Concrete and Abstract Syntax Trees ● Ambiguity ● Next Week: Parsing algorithms. ● Top-Down Parsing ● Bottom-Up Parsing
  • 16. Formal Languages ● An alphabet is a set Ξ£ of symbols that act as letters. ● A language over Ξ£ is a set of strings made from symbols in Ξ£. ● When scanning, our alphabet was ASCII or Unicode characters. We produced tokens. ● When parsing, our alphabet is the set of tokens produced by the scanner.
  • 17. The Limits of Regular Languages ● When scanning, we used regular expressions to define each token. ● Unfortunately, regular expressions are (usually) too weak to define programming languages. ● Cannot define a regular expression matching all expressions with properly balanced parentheses. ● Cannot define a regular expression matching all functions with properly nested block structure. ● We need a more powerful formalism.
  • 18. Context-Free Grammars ● A context-free grammar (or CFG) is a formalism for defining languages. ● Can define the context-free languages, a strict superset of the the regular languages. ● CFGs are best explained by example...
  • 19. Arithmetic Expressions ● Suppose we want to describe all legal arithmetic expressions using addition, subtraction, multiplication, and division. ● Here is one possible CFG: E β†’ int E β†’ E Op E E β†’ (E) Op β†’ + Op β†’ - Op β†’ * Op β†’ / E β‡’ E Op E β‡’ E Op (E) β‡’ E Op (E Op E) β‡’ E * (E Op E) β‡’ int * (E Op E) β‡’ int * (int Op E) β‡’ int * (int Op int) β‡’ int * (int + int)
  • 20. Arithmetic Expressions ● Suppose we want to describe all legal arithmetic expressions using addition, subtraction, multiplication, and division. ● Here is one possible CFG: E β†’ int E β†’ E Op E E β†’ (E) Op β†’ + Op β†’ - Op β†’ * Op β†’ / E β‡’ E Op E β‡’ E Op int β‡’ int Op int β‡’ int / int
  • 21. Context-Free Grammars ● Formally, a context-free grammar is a collection of four objects: ● A set of nonterminal symbols (or variables), ● A set of terminal symbols, ● A set of production rules saying how each nonterminal can be converted by a string of terminals and nonterminals, and ● A start symbol that begins the derivation. E β†’ int E β†’ E Op E E β†’ (E) Op β†’ + Op β†’ - Op β†’ * Op β†’ /
  • 22. A Notational Shorthand E β†’ int E β†’ E Op E E β†’ (E) Op β†’ + Op β†’ - Op β†’ * Op β†’ /
  • 23. A Notational Shorthand E β†’ int | E Op E | (E) Op β†’ + | - | * | /
  • 24. Not Notational Shorthand ● The syntax for regular expressions does not carry over to CFGs. ● Cannot use *, |, or parentheses. S β†’ a*b
  • 25. Not Notational Shorthand ● The syntax for regular expressions does not carry over to CFGs. ● Cannot use *, |, or parentheses. S β†’ Ab
  • 26. Not Notational Shorthand ● The syntax for regular expressions does not carry over to CFGs. ● Cannot use *, |, or parentheses. S β†’ Ab A β†’ Aa | Ξ΅
  • 27. Not Notational Shorthand ● The syntax for regular expressions does not carry over to CFGs. ● Cannot use *, |, or parentheses. S β†’ a(b|c*)
  • 28. Not Notational Shorthand ● The syntax for regular expressions does not carry over to CFGs. ● Cannot use *, |, or parentheses. S β†’ aX X β†’ (b|c*)
  • 29. Not Notational Shorthand ● The syntax for regular expressions does not carry over to CFGs. ● Cannot use *, |, or parentheses. S β†’ aX X β†’ b | c*
  • 30. Not Notational Shorthand ● The syntax for regular expressions does not carry over to CFGs. ● Cannot use *, |, or parentheses. S β†’ aX X β†’ b | C
  • 31. Not Notational Shorthand ● The syntax for regular expressions does not carry over to CFGs. ● Cannot use *, |, or parentheses. S β†’ aX X β†’ b | C C β†’ Cc | Ξ΅
  • 32. More Context-Free Grammars ● Chemicals! C19 H14 O5 S Cu3 (CO3 )2 (OH)2 MnO4 - S2- Form β†’ Cmp | Cmp Ion Cmp β†’ Term | Term Num | Cmp Cmp Term β†’ Elem | (Cmp) Elem β†’ H | He | Li | Be | B | C | … Ion β†’ + | - | IonNum + | IonNum - IonNum β†’ 2 | 3 | 4 | ... Num β†’ 1 | IonNum
  • 33. CFGs for Chemistry Form β‡’ Cmp Ion β‡’ Cmp Cmp Ion β‡’ Cmp Term Num Ion β‡’ Term Term Num Ion β‡’ Elem Term Num Ion β‡’ Mn Term Num Ion β‡’ Mn Elem Num Ion β‡’ MnO Num Ion β‡’ MnO IonNum Ion β‡’ MnO4 Ion β‡’ MnO4 - Form β†’ Cmp | Cmp Ion Cmp β†’ Term | Term Num | Cmp Cmp Term β†’ Elem | (Cmp) Elem β†’ H | He | Li | Be | B | C | … Ion β†’ + | - | IonNum + | IonNum - IonNum β†’ 2 | 3 | 4 | ... Num β†’ 1 | IonNum
  • 34. CFGs for Programming Languages BLOCK β†’ STMT | { STMTS } STMTS β†’ Ξ΅ | STMT STMTS STMT β†’ EXPR; | if (EXPR) BLOCK | while (EXPR) BLOCK | do BLOCK while (EXPR); | BLOCK | … EXPR β†’ identifier | constant | EXPR + EXPR | EXPR – EXPR | EXPR * EXPR | ...
  • 35. Some CFG Notation ● We will be discussing generic transformations and operations on CFGs over the next two weeks. ● Let's standardize our notation.
  • 36. Some CFG Notation ● Capital letters at the beginning of the alphabet will represent nonterminals. ● i.e. A, B, C, D ● Lowercase letters at the end of the alphabet will represent terminals. ● i.e. t, u, v, w ● Lowercase Greek letters will represent arbitrary strings of terminals and nonterminals. ● i.e. Ξ±, Ξ³, Ο‰
  • 37. Examples ● We might write an arbitrary production as A β†’ Ο‰ ● We might write a string of a nonterminal followed by a terminal as At ● We might write an arbitrary production containing a nonterminal followed by a terminal as B β†’ Ξ±AtΟ‰
  • 38. Derivations β‡’ E β‡’ E Op E β‡’ E Op (E) β‡’ E Op (E Op E) β‡’ E * (E Op E) β‡’ int * (E Op E) β‡’ int * (int Op E) β‡’ int * (int Op int) β‡’ int * (int + int) ● This sequence of steps is called a derivation. ● A string Ξ±AΟ‰ yields string Ξ±Ξ³Ο‰ iff A β†’ Ξ³ is a production. ● If Ξ± yields Ξ², we write Ξ± β‡’ Ξ². ● We say that Ξ± derives Ξ² iff there is a sequence of strings where Ξ± β‡’ Ξ±1 β‡’ Ξ±2 β‡’ … β‡’ Ξ² ● If Ξ± derives Ξ², we write Ξ± β‡’* Ξ².
  • 39. Leftmost Derivations β†’ STMTS β‡’ STMT STMTS β‡’ EXPR; STMTS β‡’ EXPR = EXPR; STMTS β‡’ id = EXPR; STMTS β‡’ id = EXPR + EXPR; STMTS β‡’ id = id + EXPR; STMTS β‡’ id = id + constant; STMTS β‡’ id = id + constant; BLOCK β†’ STMT | { STMTS } STMTS β†’ Ξ΅ | STMT STMTS STMT β†’ EXPR; | if (EXPR) BLOCK | while (EXPR) BLOCK | do BLOCK while (EXPR); | BLOCK | … EXPR β†’ identifier | constant | EXPR + EXPR | EXPR – EXPR | EXPR * EXPR | EXPR = EXPR | ...
  • 40. Leftmost Derivations ● A leftmost derivation is a derivation in which each step expands the leftmost nonterminal. ● A rightmost derivation is a derivation in which each step expands the rightmost nonterminal. ● These will be of great importance when we talk about parsing next week.
  • 41. Related Derivations β‡’ E β‡’ E Op E β‡’ int Op E β‡’ int * E β‡’ int * (E) β‡’ int * (E Op E) β‡’ int * (int Op E) β‡’ int * (int + E) β‡’ int * (int + int) β‡’ E β‡’ E Op E β‡’ E Op (E) β‡’ E Op (E Op E) β‡’ E Op (E Op int) β‡’ E Op (E + int) β‡’ E Op (int + int) β‡’ E * (int + int) β‡’ int * (int + int)
  • 42. Derivations Revisited ● A derivation encodes two pieces of information: ● What productions were applied produce the resulting string from the start symbol? ● In what order were they applied? ● Multiple derivations might use the same productions, but apply them in a different order.
  • 43. Parse Trees β‡’ E β‡’ E Op E β‡’ int Op E β‡’ int * E β‡’ int * (E) β‡’ int * (E Op E) β‡’ int * (int Op E) β‡’ int * (int + E) β‡’ int * (int + int)
  • 44. Parse Trees β‡’ E β‡’ E Op E β‡’ int Op E β‡’ int * E β‡’ int * (E) β‡’ int * (E Op E) β‡’ int * (int Op E) β‡’ int * (int + E) β‡’ int * (int + int) E
  • 45. Parse Trees β‡’ E β‡’ E Op E β‡’ int Op E β‡’ int * E β‡’ int * (E) β‡’ int * (E Op E) β‡’ int * (int Op E) β‡’ int * (int + E) β‡’ int * (int + int) E
  • 46. Parse Trees β‡’ E β‡’ E Op E β‡’ int Op E β‡’ int * E β‡’ int * (E) β‡’ int * (E Op E) β‡’ int * (int Op E) β‡’ int * (int + E) β‡’ int * (int + int) E E E Op
  • 47. Parse Trees β‡’ E β‡’ E Op E β‡’ int Op E β‡’ int * E β‡’ int * (E) β‡’ int * (E Op E) β‡’ int * (int Op E) β‡’ int * (int + E) β‡’ int * (int + int) E E E Op
  • 48. Parse Trees β‡’ E β‡’ E Op E β‡’ int Op E β‡’ int * E β‡’ int * (E) β‡’ int * (E Op E) β‡’ int * (int Op E) β‡’ int * (int + E) β‡’ int * (int + int) E E E Op int
  • 49. Parse Trees β‡’ E β‡’ E Op E β‡’ int Op E β‡’ int * E β‡’ int * (E) β‡’ int * (E Op E) β‡’ int * (int Op E) β‡’ int * (int + E) β‡’ int * (int + int) E E E Op int
  • 50. Parse Trees β‡’ E β‡’ E Op E β‡’ int Op E β‡’ int * E β‡’ int * (E) β‡’ int * (E Op E) β‡’ int * (int Op E) β‡’ int * (int + E) β‡’ int * (int + int) E E E Op int *
  • 51. Parse Trees β‡’ E β‡’ E Op E β‡’ int Op E β‡’ int * E β‡’ int * (E) β‡’ int * (E Op E) β‡’ int * (int Op E) β‡’ int * (int + E) β‡’ int * (int + int) E E E Op int *
  • 52. Parse Trees β‡’ E β‡’ E Op E β‡’ int Op E β‡’ int * E β‡’ int * (E) β‡’ int * (E Op E) β‡’ int * (int Op E) β‡’ int * (int + E) β‡’ int * (int + int) E E E Op int * E ( )
  • 53. Parse Trees β‡’ E β‡’ E Op E β‡’ int Op E β‡’ int * E β‡’ int * (E) β‡’ int * (E Op E) β‡’ int * (int Op E) β‡’ int * (int + E) β‡’ int * (int + int) E E E Op int * E ( )
  • 54. Parse Trees β‡’ E β‡’ E Op E β‡’ int Op E β‡’ int * E β‡’ int * (E) β‡’ int * (E Op E) β‡’ int * (int Op E) β‡’ int * (int + E) β‡’ int * (int + int) E E E Op int * E ( ) E E Op
  • 55. Parse Trees β‡’ E β‡’ E Op E β‡’ int Op E β‡’ int * E β‡’ int * (E) β‡’ int * (E Op E) β‡’ int * (int Op E) β‡’ int * (int + E) β‡’ int * (int + int) E E E Op int * E ( ) E E Op
  • 56. Parse Trees β‡’ E β‡’ E Op E β‡’ int Op E β‡’ int * E β‡’ int * (E) β‡’ int * (E Op E) β‡’ int * (int Op E) β‡’ int * (int + E) β‡’ int * (int + int) E E E Op int * E ( int ) E E Op
  • 57. Parse Trees β‡’ E β‡’ E Op E β‡’ int Op E β‡’ int * E β‡’ int * (E) β‡’ int * (E Op E) β‡’ int * (int Op E) β‡’ int * (int + E) β‡’ int * (int + int) E E E Op int * E ( int ) E E Op
  • 58. Parse Trees β‡’ E β‡’ E Op E β‡’ int Op E β‡’ int * E β‡’ int * (E) β‡’ int * (E Op E) β‡’ int * (int Op E) β‡’ int * (int + E) β‡’ int * (int + int) E E E Op int * E ( int + ) E E Op
  • 59. Parse Trees β‡’ E β‡’ E Op E β‡’ int Op E β‡’ int * E β‡’ int * (E) β‡’ int * (E Op E) β‡’ int * (int Op E) β‡’ int * (int + E) β‡’ int * (int + int) E E E Op int * E ( int + ) E E Op
  • 60. Parse Trees β‡’ E β‡’ E Op E β‡’ int Op E β‡’ int * E β‡’ int * (E) β‡’ int * (E Op E) β‡’ int * (int Op E) β‡’ int * (int + E) β‡’ int * (int + int) E E E Op int * E ( int + int ) E E Op
  • 61. Parse Trees β‡’ E β‡’ E Op E β‡’ E Op (E) β‡’ E Op (E Op E) β‡’ E Op (E Op int) β‡’ E Op (E + int) β‡’ E Op (int + int) β‡’ E * (int + int) β‡’ int * (int + int)
  • 62. Parse Trees β‡’ E β‡’ E Op E β‡’ E Op (E) β‡’ E Op (E Op E) β‡’ E Op (E Op int) β‡’ E Op (E + int) β‡’ E Op (int + int) β‡’ E * (int + int) β‡’ int * (int + int) E
  • 63. Parse Trees β‡’ E β‡’ E Op E β‡’ E Op (E) β‡’ E Op (E Op E) β‡’ E Op (E Op int) β‡’ E Op (E + int) β‡’ E Op (int + int) β‡’ E * (int + int) β‡’ int * (int + int) E
  • 64. Parse Trees β‡’ E β‡’ E Op E β‡’ E Op (E) β‡’ E Op (E Op E) β‡’ E Op (E Op int) β‡’ E Op (E + int) β‡’ E Op (int + int) β‡’ E * (int + int) β‡’ int * (int + int) E E E Op
  • 65. Parse Trees β‡’ E β‡’ E Op E β‡’ E Op (E) β‡’ E Op (E Op E) β‡’ E Op (E Op int) β‡’ E Op (E + int) β‡’ E Op (int + int) β‡’ E * (int + int) β‡’ int * (int + int) E E E Op
  • 66. Parse Trees β‡’ E β‡’ E Op E β‡’ E Op (E) β‡’ E Op (E Op E) β‡’ E Op (E Op int) β‡’ E Op (E + int) β‡’ E Op (int + int) β‡’ E * (int + int) β‡’ int * (int + int) E E E Op E ( )
  • 67. Parse Trees β‡’ E β‡’ E Op E β‡’ E Op (E) β‡’ E Op (E Op E) β‡’ E Op (E Op int) β‡’ E Op (E + int) β‡’ E Op (int + int) β‡’ E * (int + int) β‡’ int * (int + int) E E E Op E ( )
  • 68. Parse Trees β‡’ E β‡’ E Op E β‡’ E Op (E) β‡’ E Op (E Op E) β‡’ E Op (E Op int) β‡’ E Op (E + int) β‡’ E Op (int + int) β‡’ E * (int + int) β‡’ int * (int + int) E E E Op E ( ) E E Op
  • 69. Parse Trees β‡’ E β‡’ E Op E β‡’ E Op (E) β‡’ E Op (E Op E) β‡’ E Op (E Op int) β‡’ E Op (E + int) β‡’ E Op (int + int) β‡’ E * (int + int) β‡’ int * (int + int) E E E Op E ( ) E E Op
  • 70. Parse Trees β‡’ E β‡’ E Op E β‡’ E Op (E) β‡’ E Op (E Op E) β‡’ E Op (E Op int) β‡’ E Op (E + int) β‡’ E Op (int + int) β‡’ E * (int + int) β‡’ int * (int + int) E E E Op E ( int ) E E Op
  • 71. Parse Trees β‡’ E β‡’ E Op E β‡’ E Op (E) β‡’ E Op (E Op E) β‡’ E Op (E Op int) β‡’ E Op (E + int) β‡’ E Op (int + int) β‡’ E * (int + int) β‡’ int * (int + int) E E E Op E ( int ) E E Op
  • 72. Parse Trees β‡’ E β‡’ E Op E β‡’ E Op (E) β‡’ E Op (E Op E) β‡’ E Op (E Op int) β‡’ E Op (E + int) β‡’ E Op (int + int) β‡’ E * (int + int) β‡’ int * (int + int) E E E Op E ( + int ) E E Op
  • 73. Parse Trees β‡’ E β‡’ E Op E β‡’ E Op (E) β‡’ E Op (E Op E) β‡’ E Op (E Op int) β‡’ E Op (E + int) β‡’ E Op (int + int) β‡’ E * (int + int) β‡’ int * (int + int) E E E Op E ( + int ) E E Op
  • 74. Parse Trees β‡’ E β‡’ E Op E β‡’ E Op (E) β‡’ E Op (E Op E) β‡’ E Op (E Op int) β‡’ E Op (E + int) β‡’ E Op (int + int) β‡’ E * (int + int) β‡’ int * (int + int) E E E Op E ( int + int ) E E Op
  • 75. Parse Trees β‡’ E β‡’ E Op E β‡’ E Op (E) β‡’ E Op (E Op E) β‡’ E Op (E Op int) β‡’ E Op (E + int) β‡’ E Op (int + int) β‡’ E * (int + int) β‡’ int * (int + int) E E E Op E ( int + int ) E E Op
  • 76. Parse Trees β‡’ E β‡’ E Op E β‡’ E Op (E) β‡’ E Op (E Op E) β‡’ E Op (E Op int) β‡’ E Op (E + int) β‡’ E Op (int + int) β‡’ E * (int + int) β‡’ int * (int + int) E E E Op * E ( int + int ) E E Op
  • 77. Parse Trees β‡’ E β‡’ E Op E β‡’ E Op (E) β‡’ E Op (E Op E) β‡’ E Op (E Op int) β‡’ E Op (E + int) β‡’ E Op (int + int) β‡’ E * (int + int) β‡’ int * (int + int) E E E Op * E ( int + int ) E E Op
  • 78. Parse Trees β‡’ E β‡’ E Op E β‡’ E Op (E) β‡’ E Op (E Op E) β‡’ E Op (E Op int) β‡’ E Op (E + int) β‡’ E Op (int + int) β‡’ E * (int + int) β‡’ int * (int + int) E E E Op int * E ( int + int ) E E Op
  • 79. For Comparison E E E Op int * E ( int + int ) E E Op E E E Op int * E ( int + int ) E E Op
  • 80. Parse Trees ● A parse tree is a tree encoding the steps in a derivation. ● Internal nodes represent nonterminal symbols used in the production. ● Inorder walk of the leaves contains the generated string. ● Encodes what productions are used, not the order in which those productions are applied.
  • 81. The Goal of Parsing ● Goal of syntax analysis: Recover the structure described by a series of tokens. ● If language is described as a CFG, goal is to recover a parse tree for the the input string. ● Usually we do some simplifications on the tree; more on that later. ● We'll discuss how to do this next week.
  • 83. E E E Op int * int + int E E Op E E E Op int * int E E Op + int A Serious Problem int * (int + int) (int * int) + int
  • 84. Ambiguity ● A CFG is said to be ambiguous if there is at least one string with two or more parse trees. ● Note that ambiguity is a property of grammars, not languages. ● There is no algorithm for converting an arbitrary ambiguous grammar into an unambiguous one. ● Some languages are inherently ambiguous, meaning that no unambiguous grammar exists for them. ● There is no algorithm for detecting whether an arbitrary grammar is ambiguous.
  • 85. Is Ambiguity a Problem? ● Depends on semantics. E β†’ int | E + E
  • 86. Is Ambiguity a Problem? ● Depends on semantics. E β†’ int | E + E R + R E + E R R + R 5 R R + 3 7 E E + E 5 E E + 3 7 R 5 R E + E E 7 5 3
  • 87. Is Ambiguity a Problem? ● Depends on semantics. E β†’ int | E + E | E - E
  • 88. Is Ambiguity a Problem? ● Depends on semantics. E β†’ int | E + E | E - E R + R E + E R R + R 5 R R + 3 7 E E - E 5 E E + 3 7 R 5 R E - E E 7 5 3
  • 89. Resolving Ambiguity ● If a grammar can be made unambiguous at all, it is usually made unambiguous through layering. ● Have exactly one way to build each piece of the string. ● Have exactly one way of combining those pieces back together.
  • 90. Example: Balanced Parentheses ● Consider the language of all strings of balanced parentheses. ● Examples: ● Ξ΅ ● () ● (()()) ● ((()))(())() ● Here is one possible grammar for balanced parentheses: P β†’ Ξ΅ | PP | (P)
  • 91. Balanced Parentheses ● Given the grammar P β†’ Ξ΅ | PP | (P) ● How might we generate the string (()())?
  • 92. Balanced Parentheses ● Given the grammar P β†’ Ξ΅ | PP | (P) ● How might we generate the string (()())? P
  • 93. Balanced Parentheses ● Given the grammar P β†’ Ξ΅ | PP | (P) ● How might we generate the string (()())? P P ( )
  • 94. Balanced Parentheses ● Given the grammar P β†’ Ξ΅ | PP | (P) ● How might we generate the string (()())? P P P P ( )
  • 95. Balanced Parentheses ● Given the grammar P β†’ Ξ΅ | PP | (P) ● How might we generate the string (()())? P P P P P ( ( ) )
  • 96. Balanced Parentheses ● Given the grammar P β†’ Ξ΅ | PP | (P) ● How might we generate the string (()())? P P P P P ( ( ) ) Ξ΅
  • 97. Balanced Parentheses ● Given the grammar P β†’ Ξ΅ | PP | (P) ● How might we generate the string (()())? P P P P P P ( ( ) ( ) ) Ξ΅
  • 98. Balanced Parentheses ● Given the grammar P β†’ Ξ΅ | PP | (P) ● How might we generate the string (()())? P P P P P P ( ( ) ( ) ) Ξ΅ Ξ΅
  • 99. Balanced Parentheses P P P P P P ( ( ) ( ) ) Ξ΅ Ξ΅ P P Ξ΅
  • 100. How to resolve this ambiguity?
  • 101. ( ( ) ( ) ) ( ) ( ( ) )
  • 102. ( ( ) ( ) ) ( ) ( ( ) )
  • 103. ( ( ) ( ) ) ( ) ( ( ) )
  • 104. ( ( ) ( ) ) ( ) ( ( ) )
  • 105. Rethinking Parentheses ● A string of balanced parentheses is a sequence of strings that are themselves balanced parentheses. ● To avoid ambiguity, we can build the string in two steps: ● Decide how many different substrings we will glue together. ● Build each substring independently.
  • 106. Let's ask the Internet for help!
  • 107.
  • 108. Um... what? ● The way the nyancat flies across the sky is similar to how we can build up strings of balanced parentheses.
  • 109. Um... what? ● The way the nyancat flies across the sky is similar to how we can build up strings of balanced parentheses.
  • 110. Um... what? ● The way the nyancat flies across the sky is similar to how we can build up strings of balanced parentheses.
  • 111. Um... what? ● The way the nyancat flies across the sky is similar to how we can build up strings of balanced parentheses.
  • 112. Um... what? ● The way the nyancat flies across the sky is similar to how we can build up strings of balanced parentheses.
  • 113. Um... what? ● The way the nyancat flies across the sky is similar to how we can build up strings of balanced parentheses.
  • 114. Um... what? ● The way the nyancat flies across the sky is similar to how we can build up strings of balanced parentheses.
  • 115. Um... what? ● The way the nyancat flies across the sky is similar to how we can build up strings of balanced parentheses. β†’ β†’ Ξ΅ β†’ β†’
  • 116. Building Parentheses ● Spread a string of parentheses across the string. There is exactly one way to do this for any number of parentheses. ● Expand out each substring by adding in parentheses and repeating. S P β†’ β†’ S P | Ξ΅ ( ) S
  • 117. Building Parentheses S β‡’ PS β‡’ PPS β‡’ PP β‡’ (S)P β‡’ (S)(S) β‡’ (PS)(S) β‡’ (P)(S) β‡’ ((S))(S) β‡’ (())(S) β‡’ (())() S P β†’ β†’ S P | Ξ΅ ( ) S
  • 118. Context-Free Grammars ● A regular expression can be ● Any letter ● Ξ΅ ● The concatenation of regular expressions. ● The union of regular expressions. ● The Kleene closure of a regular expression. ● A parenthesized regular expression.
  • 119. Context-Free Grammars ● This gives us the following CFG: R β†’ a | b | c | … R β†’ β€œΞ΅β€ R β†’ RR R β†’ R β€œ|” R R β†’ R* R β†’ (R)
  • 120. R β†’ a | b | c | … R β†’ β€œΞ΅β€ R β†’ RR R β†’ R β€œ|” R R β†’ R* R β†’ (R) a | b * R R R R a | b * R R R R a | (b*) (a | b)* An Ambiguous Grammar
  • 121. Resolving Ambiguity ● We can try to resolve the ambiguity via layering: R β†’ a | b | c | … R β†’ β€œΞ΅β€ R β†’ RR R β†’ R β€œ|” R R β†’ R* R β†’ (R) | a a * b
  • 122. Resolving Ambiguity ● We can try to resolve the ambiguity via layering: R β†’ a | b | c | … R β†’ β€œΞ΅β€ R β†’ RR R β†’ R β€œ|” R R β†’ R* R β†’ (R) | a a * b
  • 123. Resolving Ambiguity ● We can try to resolve the ambiguity via layering: R β†’ a | b | c | … R β†’ β€œΞ΅β€ R β†’ RR R β†’ R β€œ|” R R β†’ R* R β†’ (R) | a a * b
  • 124. Resolving Ambiguity ● We can try to resolve the ambiguity via layering: R β†’ a | b | c | … R β†’ β€œΞ΅β€ R β†’ RR R β†’ R β€œ|” R R β†’ R* R β†’ (R) | a a * b
  • 125. Resolving Ambiguity ● We can try to resolve the ambiguity via layering: R β†’ a | b | c | … R β†’ β€œΞ΅β€ R β†’ RR R β†’ R β€œ|” R R β†’ R* R β†’ (R) | a a * b
  • 126. Resolving Ambiguity ● We can try to resolve the ambiguity via layering: R β†’ a | b | c | … R β†’ β€œΞ΅β€ R β†’ RR R β†’ R β€œ|” R R β†’ R* R β†’ (R) | a a * b
  • 127. Resolving Ambiguity ● We can try to resolve the ambiguity via layering: R β†’ a | b | c | … R β†’ β€œΞ΅β€ R β†’ RR R β†’ R β€œ|” R R β†’ R* R β†’ (R) | a a * b
  • 128. Resolving Ambiguity ● We can try to resolve the ambiguity via layering: R β†’ a | b | c | … R β†’ β€œΞ΅β€ R β†’ RR R β†’ R β€œ|” R R β†’ R* R β†’ (R) R β†’ S | R β€œ|” S S β†’ T | ST T β†’ U | T* U β†’ a | b | c | … U β†’ β€œΞ΅β€ U β†’ (R)
  • 129. Why is this unambiguous? R β†’ S | R β€œ|” S S β†’ T | ST T β†’ U | T* U β†’ a | b | … U β†’ β€œΞ΅β€ U β†’ (R)
  • 130. Why is this unambiguous? Only generates β€œatomic” expressions R β†’ S | R β€œ|” S S β†’ T | ST T β†’ U | T* U β†’ a | b | … U β†’ β€œΞ΅β€ U β†’ (R)
  • 131. Why is this unambiguous? Only generates β€œatomic” expressions Puts stars onto atomic expressions R β†’ S | R β€œ|” S S β†’ T | ST T β†’ U | T* U β†’ a | b | … U β†’ β€œΞ΅β€ U β†’ (R)
  • 132. Why is this unambiguous? Only generates β€œatomic” expressions Puts stars onto atomic expressions Concatenates starred expressions R β†’ S | R β€œ|” S S β†’ T | ST T β†’ U | T* U β†’ a | b | … U β†’ β€œΞ΅β€ U β†’ (R)
  • 133. Why is this unambiguous? Only generates β€œatomic” expressions Puts stars onto atomic expressions Concatenates starred expressions Unions concatenated expressions R β†’ S | R β€œ|” S S β†’ T | ST T β†’ U | T* U β†’ a | b | … U β†’ β€œΞ΅β€ U β†’ (R)
  • 134. R β†’ S | R β€œ|” S S β†’ T | ST T β†’ U | T* U β†’ a | b | c | … U β†’ β€œΞ΅β€ U β†’ (R) a b | c | a * R
  • 135. R β†’ S | R β€œ|” S S β†’ T | ST T β†’ U | T* U β†’ a | b | c | … U β†’ β€œΞ΅β€ U β†’ (R) a b | c | a * S R R
  • 136. R β†’ S | R β€œ|” S S β†’ T | ST T β†’ U | T* U β†’ a | b | c | … U β†’ β€œΞ΅β€ U β†’ (R) a b | c | a * S S R R R
  • 137. R β†’ S | R β€œ|” S S β†’ T | ST T β†’ U | T* U β†’ a | b | c | … U β†’ β€œΞ΅β€ U β†’ (R) a b | c | a * S S S R R R
  • 138. R β†’ S | R β€œ|” S S β†’ T | ST T β†’ U | T* U β†’ a | b | c | … U β†’ β€œΞ΅β€ U β†’ (R) a b | c | a * T S S S S R R R
  • 139. R β†’ S | R β€œ|” S S β†’ T | ST T β†’ U | T* U β†’ a | b | c | … U β†’ β€œΞ΅β€ U β†’ (R) a b | c | a * T T S S S S R R R
  • 140. R β†’ S | R β€œ|” S S β†’ T | ST T β†’ U | T* U β†’ a | b | c | … U β†’ β€œΞ΅β€ U β†’ (R) a b | c | a * U T T S S S S R R R
  • 141. R β†’ S | R β€œ|” S S β†’ T | ST T β†’ U | T* U β†’ a | b | c | … U β†’ β€œΞ΅β€ U β†’ (R) a b | c | a * U T T S S S S R R R
  • 142. R β†’ S | R β€œ|” S S β†’ T | ST T β†’ U | T* U β†’ a | b | c | … U β†’ β€œΞ΅β€ U β†’ (R) a b | c | a * U U T T S S S S R R R
  • 143. R β†’ S | R β€œ|” S S β†’ T | ST T β†’ U | T* U β†’ a | b | c | … U β†’ β€œΞ΅β€ U β†’ (R) a b | c | a * U U T T S S S S R R R
  • 144. R β†’ S | R β€œ|” S S β†’ T | ST T β†’ U | T* U β†’ a | b | c | … U β†’ β€œΞ΅β€ U β†’ (R) a b | c | a * U U T T T S S S S R R R
  • 145. R β†’ S | R β€œ|” S S β†’ T | ST T β†’ U | T* U β†’ a | b | c | … U β†’ β€œΞ΅β€ U β†’ (R) a b | c | a * U U U T T T S S S S R R R
  • 146. R β†’ S | R β€œ|” S S β†’ T | ST T β†’ U | T* U β†’ a | b | c | … U β†’ β€œΞ΅β€ U β†’ (R) a b | c | a * U U U T T T S S S S R R R
  • 147. R β†’ S | R β€œ|” S S β†’ T | ST T β†’ U | T* U β†’ a | b | c | … U β†’ β€œΞ΅β€ U β†’ (R) a b | c | a * U U U T T T S S S S R R R T
  • 148. R β†’ S | R β€œ|” S S β†’ T | ST T β†’ U | T* U β†’ a | b | c | … U β†’ β€œΞ΅β€ U β†’ (R) a b | c | a * U U U T T T T S S S S R R R T
  • 149. R β†’ S | R β€œ|” S S β†’ T | ST T β†’ U | T* U β†’ a | b | c | … U β†’ β€œΞ΅β€ U β†’ (R) a b | c | a * U U U U T T T T S S S S R R R T
  • 150. R β†’ S | R β€œ|” S S β†’ T | ST T β†’ U | T* U β†’ a | b | c | … U β†’ β€œΞ΅β€ U β†’ (R) a b | c | a * U U U U T T T T S S S S R R R T
  • 151. Precedence Declarations ● If we leave the world of pure CFGs, we can often resolve ambiguities through precedence declarations. ● e.g. multiplication has higher precedence than addition, but lower precedence than exponentiation. ● Allows for unambiguous parsing of ambiguous grammars. ● We'll see how this is implemented later on.
  • 152. R β†’ S | R β€œ|” S S β†’ T | ST T β†’ U | T* U β†’ a | b | … U β†’ β€œΞ΅β€ U β†’ (R) The Structure of a Parse Tree
  • 153. R β†’ S | R β€œ|” S S β†’ T | ST T β†’ U | T* U β†’ a | b | … U β†’ β€œΞ΅β€ U β†’ (R) a a | b * The Structure of a Parse Tree
  • 154. R β†’ S | R β€œ|” S S β†’ T | ST T β†’ U | T* U β†’ a | b | … U β†’ β€œΞ΅β€ U β†’ (R) a a | b * U U U T T T S S R S R The Structure of a Parse Tree
  • 155. a a | b * U U U T T T S S R S R The Structure of a Parse Tree a a b concat * or R β†’ S | R β€œ|” S S β†’ T | ST T β†’ U | T* U β†’ a | b | … U β†’ β€œΞ΅β€ U β†’ (R) a a | b *
  • 156. a ( b | c ) R β†’ S | R β€œ|” S S β†’ T | ST T β†’ U | T* U β†’ a | b | … U β†’ β€œΞ΅β€ U β†’ (R)
  • 157. a ( b | c ) U U U T T T S S S R R U T S R R β†’ S | R β€œ|” S S β†’ T | ST T β†’ U | T* U β†’ a | b | … U β†’ β€œΞ΅β€ U β†’ (R)
  • 158. a ( b | c ) U U U T T T S S S R R U T S R a b c or concat R β†’ S | R β€œ|” S S β†’ T | ST T β†’ U | T* U β†’ a | b | … U β†’ β€œΞ΅β€ U β†’ (R)
  • 159. Abstract Syntax Trees (ASTs) ● A parse tree is a concrete syntax tree; it shows exactly how the text was derived. ● A more useful structure is an abstract syntax tree, which retains only the essential structure of the input.
  • 160. How to build an AST? ● Typically done through semantic actions. ● Associate a piece of code to execute with each production. ● As the input is parsed, execute this code to build the AST. ● Exact order of code execution depends on the parsing method used. ● This is called a syntax-directed translation.
  • 161. Simple Semantic Actions E β†’ T + E E β†’ T T β†’ int T β†’ int * T T β†’ (E) E1 .val = T.val + E2 .val E.val = T.val T.val = int.val T.val = int.val * T.val T.val = E.val
  • 162. Simple Semantic Actions E β†’ T + E E β†’ T T β†’ int T β†’ int * T T β†’ (E) E1 .val = T.val + E2 .val E.val = T.val T.val = int.val T.val = int.val * T.val T.val = E.val int * + 7 5 26 int int
  • 163. Simple Semantic Actions E β†’ T + E E β†’ T T β†’ int T β†’ int * T T β†’ (E) E1 .val = T.val + E2 .val E.val = T.val T.val = int.val T.val = int.val * T.val T.val = E.val int * + 7 T 5 5 26 int int
  • 164. Simple Semantic Actions E β†’ T + E E β†’ T T β†’ int T β†’ int * T T β†’ (E) E1 .val = T.val + E2 .val E.val = T.val T.val = int.val T.val = int.val * T.val T.val = E.val int * + 7 T T 5 130 5 26 int int
  • 165. Simple Semantic Actions E β†’ T + E E β†’ T T β†’ int T β†’ int * T T β†’ (E) E1 .val = T.val + E2 .val E.val = T.val T.val = int.val T.val = int.val * T.val T.val = E.val int * + 7 T T T 5 130 7 5 26 int int
  • 166. Simple Semantic Actions E β†’ T + E E β†’ T T β†’ int T β†’ int * T T β†’ (E) E1 .val = T.val + E2 .val E.val = T.val T.val = int.val T.val = int.val * T.val T.val = E.val int * + 7 T T T E 5 130 7 7 5 26 int int
  • 167. Simple Semantic Actions E β†’ T + E E β†’ T T β†’ int T β†’ int * T T β†’ (E) E1 .val = T.val + E2 .val E.val = T.val T.val = int.val T.val = int.val * T.val T.val = E.val int * + 7 T T T E 5 130 7 7 5 26 int int E 137
  • 168. R β†’ S R.ast = S.ast; R β†’ R β€œ|” S R1 .ast = new Or(R2 .ast, S.ast); S β†’ T S.ast = T.ast; S β†’ ST S1 .ast = new Concat(S2 .ast, T.ast); T β†’ U T.ast = U.ast; T β†’ T* T1 .ast = new Star(T2 .ast); U β†’ a U.ast = new SingleChar('a'); U β†’ β€œΞ΅β€ U.ast = new Epsilon(); U β†’ (R) U.ast = R.ast; Semantic Actions to Build ASTs
  • 169. Summary ● Syntax analysis (parsing) extracts the structure from the tokens produced by the scanner. ● Languages are usually specified by context-free grammars (CFGs). ● A parse tree shows how a string can be derived from a grammar. ● A grammar is ambiguous if it can derive the same string multiple ways. ● There is no algorithm for eliminating ambiguity; it must be done by hand. ● Abstract syntax trees (ASTs) contain an abstract representation of a program's syntax. ● Semantic actions associated with productions can be used to build ASTs.
  • 170. Next Time ● Top-Down Parsing ● Parsing as a Search ● Backtracking Parsers ● Predictive Parsers ● LL(1)