2. Announcements
β Written Assignment 1 out, due Friday, July
6th at 5PM.
β Explore the theoretical aspects of scanning.
β See the limits of maximal-munch scanning.
β Class mailing list:
β There is an issue with SCPD students and the
course mailing list.
β
Email the staff immediately if you haven't
gotten any of our emails.
3. Where We Are
Lexical Analysis
Syntax Analysis
Semantic Analysis
IR Generation
IR Optimization
Code Generation
Optimization
Source
Code
Machine
Code
4. Where We Are
Lexical Analysis
Syntax Analysis
Semantic Analysis
IR Generation
IR Optimization
Code Generation
Optimization
Source
Code
Machine
Code
Flex-pert
Flex-pert
5. Where We Are
Lexical Analysis
Syntax Analysis
Semantic Analysis
IR Generation
IR Optimization
Code Generation
Optimization
Source
Code
Machine
Code
12. do[for] = new 0;
T_Do [ T_For T_New T_IntConst
0
d o [ f ] n e w 0 ;
=
o r
] =
13. do[for] = new 0;
T_Do [ T_For T_New T_IntConst
0
d o [ f ] n e w 0 ;
=
o r
] =
14. What is Syntax Analysis?
β After lexical analysis (scanning), we have
a series of tokens.
β In syntax analysis (or parsing), we
want to interpret what those tokens
mean.
β Goal: Recover the structure described by
that series of tokens.
β Goal: Report errors if those tokens do not
properly encode a structure.
15. Outline
β Today: Formalisms for syntax analysis.
β Context-Free Grammars
β Derivations
β Concrete and Abstract Syntax Trees
β Ambiguity
β Next Week: Parsing algorithms.
β Top-Down Parsing
β Bottom-Up Parsing
16. Formal Languages
β An alphabet is a set Ξ£ of symbols that act
as letters.
β A language over Ξ£ is a set of strings made
from symbols in Ξ£.
β When scanning, our alphabet was ASCII or
Unicode characters. We produced tokens.
β When parsing, our alphabet is the set of
tokens produced by the scanner.
17. The Limits of Regular Languages
β When scanning, we used regular expressions to
define each token.
β Unfortunately, regular expressions are (usually)
too weak to define programming languages.
β Cannot define a regular expression matching all
expressions with properly balanced parentheses.
β Cannot define a regular expression matching all
functions with properly nested block structure.
β We need a more powerful formalism.
18. Context-Free Grammars
β A context-free grammar (or CFG) is a
formalism for defining languages.
β Can define the context-free languages,
a strict superset of the the regular
languages.
β CFGs are best explained by example...
19. Arithmetic Expressions
β Suppose we want to describe all legal arithmetic
expressions using addition, subtraction,
multiplication, and division.
β Here is one possible CFG:
E β int
E β E Op E
E β (E)
Op β +
Op β -
Op β *
Op β /
E
β E Op E
β E Op (E)
β E Op (E Op E)
β E * (E Op E)
β int * (E Op E)
β int * (int Op E)
β int * (int Op int)
β int * (int + int)
20. Arithmetic Expressions
β Suppose we want to describe all legal arithmetic
expressions using addition, subtraction,
multiplication, and division.
β Here is one possible CFG:
E β int
E β E Op E
E β (E)
Op β +
Op β -
Op β *
Op β /
E
β E Op E
β E Op int
β int Op int
β int / int
21. Context-Free Grammars
β Formally, a context-free
grammar is a collection of four
objects:
β A set of nonterminal symbols
(or variables),
β A set of terminal symbols,
β A set of production rules saying
how each nonterminal can be
converted by a string of terminals
and nonterminals, and
β A start symbol that begins the
derivation.
E β int
E β E Op E
E β (E)
Op β +
Op β -
Op β *
Op β /
24. Not Notational Shorthand
β The syntax for regular expressions does
not carry over to CFGs.
β Cannot use *, |, or parentheses.
S β a*b
25. Not Notational Shorthand
β The syntax for regular expressions does
not carry over to CFGs.
β Cannot use *, |, or parentheses.
S β Ab
26. Not Notational Shorthand
β The syntax for regular expressions does
not carry over to CFGs.
β Cannot use *, |, or parentheses.
S β Ab
A β Aa | Ξ΅
27. Not Notational Shorthand
β The syntax for regular expressions does
not carry over to CFGs.
β Cannot use *, |, or parentheses.
S β a(b|c*)
28. Not Notational Shorthand
β The syntax for regular expressions does
not carry over to CFGs.
β Cannot use *, |, or parentheses.
S β aX
X β (b|c*)
29. Not Notational Shorthand
β The syntax for regular expressions does
not carry over to CFGs.
β Cannot use *, |, or parentheses.
S β aX
X β b | c*
30. Not Notational Shorthand
β The syntax for regular expressions does
not carry over to CFGs.
β Cannot use *, |, or parentheses.
S β aX
X β b | C
31. Not Notational Shorthand
β The syntax for regular expressions does
not carry over to CFGs.
β Cannot use *, |, or parentheses.
S β aX
X β b | C
C β Cc | Ξ΅
32. More Context-Free Grammars
β Chemicals!
C19
H14
O5
S
Cu3
(CO3
)2
(OH)2
MnO4
-
S2-
Form β Cmp | Cmp Ion
Cmp β Term | Term Num | Cmp Cmp
Term β Elem | (Cmp)
Elem β H | He | Li | Be | B | C | β¦
Ion β + | - | IonNum + | IonNum -
IonNum β 2 | 3 | 4 | ...
Num β 1 | IonNum
33. CFGs for Chemistry
Form
β Cmp Ion
β Cmp Cmp Ion
β Cmp Term Num Ion
β Term Term Num Ion
β Elem Term Num Ion
β Mn Term Num Ion
β Mn Elem Num Ion
β MnO Num Ion
β MnO IonNum Ion
β MnO4
Ion
β MnO4
-
Form β Cmp | Cmp Ion
Cmp β Term | Term Num | Cmp Cmp
Term β Elem | (Cmp)
Elem β H | He | Li | Be | B | C | β¦
Ion β + | - | IonNum + | IonNum -
IonNum β 2 | 3 | 4 | ...
Num β 1 | IonNum
34. CFGs for Programming Languages
BLOCK β STMT
| { STMTS }
STMTS β Ξ΅
| STMT STMTS
STMT β EXPR;
| if (EXPR) BLOCK
| while (EXPR) BLOCK
| do BLOCK while (EXPR);
| BLOCK
| β¦
EXPR β identifier
| constant
| EXPR + EXPR
| EXPR β EXPR
| EXPR * EXPR
| ...
35. Some CFG Notation
β We will be discussing generic
transformations and operations on CFGs
over the next two weeks.
β Let's standardize our notation.
36. Some CFG Notation
β Capital letters at the beginning of the
alphabet will represent nonterminals.
β i.e. A, B, C, D
β Lowercase letters at the end of the
alphabet will represent terminals.
β i.e. t, u, v, w
β Lowercase Greek letters will represent
arbitrary strings of terminals and
nonterminals.
β i.e. Ξ±, Ξ³, Ο
37. Examples
β We might write an arbitrary production as
A β Ο
β We might write a string of a nonterminal followed
by a terminal as
At
β We might write an arbitrary production containing
a nonterminal followed by a terminal as
B β Ξ±AtΟ
38. Derivations
β E
β E Op E
β E Op (E)
β E Op (E Op E)
β E * (E Op E)
β int * (E Op E)
β int * (int Op E)
β int * (int Op int)
β int * (int + int)
β This sequence of steps is called
a derivation.
β A string Ξ±AΟ yields string
Ξ±Ξ³Ο iff A β Ξ³ is a production.
β If Ξ± yields Ξ², we write Ξ± β Ξ².
β We say that Ξ± derives Ξ² iff
there is a sequence of strings
where
Ξ± β Ξ±1
β Ξ±2
β β¦ β Ξ²
β If Ξ± derives Ξ², we write Ξ± β* Ξ².
39. Leftmost Derivations
β STMTS
β STMT STMTS
β EXPR; STMTS
β EXPR = EXPR; STMTS
β id = EXPR; STMTS
β id = EXPR + EXPR; STMTS
β id = id + EXPR; STMTS
β id = id + constant; STMTS
β id = id + constant;
BLOCK β STMT
| { STMTS }
STMTS β Ξ΅
| STMT STMTS
STMT β EXPR;
| if (EXPR) BLOCK
| while (EXPR) BLOCK
| do BLOCK while (EXPR);
| BLOCK
| β¦
EXPR β identifier
| constant
| EXPR + EXPR
| EXPR β EXPR
| EXPR * EXPR
| EXPR = EXPR
| ...
40. Leftmost Derivations
β A leftmost derivation is a derivation in
which each step expands the leftmost
nonterminal.
β A rightmost derivation is a derivation
in which each step expands the rightmost
nonterminal.
β These will be of great importance when
we talk about parsing next week.
41. Related Derivations
β E
β E Op E
β int Op E
β int * E
β int * (E)
β int * (E Op E)
β int * (int Op E)
β int * (int + E)
β int * (int + int)
β E
β E Op E
β E Op (E)
β E Op (E Op E)
β E Op (E Op int)
β E Op (E + int)
β E Op (int + int)
β E * (int + int)
β int * (int + int)
42. Derivations Revisited
β A derivation encodes two pieces of
information:
β What productions were applied produce the
resulting string from the start symbol?
β In what order were they applied?
β Multiple derivations might use the same
productions, but apply them in a
different order.
43. Parse Trees
β E
β E Op E
β int Op E
β int * E
β int * (E)
β int * (E Op E)
β int * (int Op E)
β int * (int + E)
β int * (int + int)
44. Parse Trees
β E
β E Op E
β int Op E
β int * E
β int * (E)
β int * (E Op E)
β int * (int Op E)
β int * (int + E)
β int * (int + int)
E
45. Parse Trees
β E
β E Op E
β int Op E
β int * E
β int * (E)
β int * (E Op E)
β int * (int Op E)
β int * (int + E)
β int * (int + int)
E
46. Parse Trees
β E
β E Op E
β int Op E
β int * E
β int * (E)
β int * (E Op E)
β int * (int Op E)
β int * (int + E)
β int * (int + int)
E
E E
Op
47. Parse Trees
β E
β E Op E
β int Op E
β int * E
β int * (E)
β int * (E Op E)
β int * (int Op E)
β int * (int + E)
β int * (int + int)
E
E E
Op
48. Parse Trees
β E
β E Op E
β int Op E
β int * E
β int * (E)
β int * (E Op E)
β int * (int Op E)
β int * (int + E)
β int * (int + int)
E
E E
Op
int
49. Parse Trees
β E
β E Op E
β int Op E
β int * E
β int * (E)
β int * (E Op E)
β int * (int Op E)
β int * (int + E)
β int * (int + int)
E
E E
Op
int
50. Parse Trees
β E
β E Op E
β int Op E
β int * E
β int * (E)
β int * (E Op E)
β int * (int Op E)
β int * (int + E)
β int * (int + int)
E
E E
Op
int *
51. Parse Trees
β E
β E Op E
β int Op E
β int * E
β int * (E)
β int * (E Op E)
β int * (int Op E)
β int * (int + E)
β int * (int + int)
E
E E
Op
int *
52. Parse Trees
β E
β E Op E
β int Op E
β int * E
β int * (E)
β int * (E Op E)
β int * (int Op E)
β int * (int + E)
β int * (int + int)
E
E E
Op
int *
E
( )
53. Parse Trees
β E
β E Op E
β int Op E
β int * E
β int * (E)
β int * (E Op E)
β int * (int Op E)
β int * (int + E)
β int * (int + int)
E
E E
Op
int *
E
( )
54. Parse Trees
β E
β E Op E
β int Op E
β int * E
β int * (E)
β int * (E Op E)
β int * (int Op E)
β int * (int + E)
β int * (int + int)
E
E E
Op
int *
E
( )
E E
Op
55. Parse Trees
β E
β E Op E
β int Op E
β int * E
β int * (E)
β int * (E Op E)
β int * (int Op E)
β int * (int + E)
β int * (int + int)
E
E E
Op
int *
E
( )
E E
Op
56. Parse Trees
β E
β E Op E
β int Op E
β int * E
β int * (E)
β int * (E Op E)
β int * (int Op E)
β int * (int + E)
β int * (int + int)
E
E E
Op
int *
E
( int )
E E
Op
57. Parse Trees
β E
β E Op E
β int Op E
β int * E
β int * (E)
β int * (E Op E)
β int * (int Op E)
β int * (int + E)
β int * (int + int)
E
E E
Op
int *
E
( int )
E E
Op
58. Parse Trees
β E
β E Op E
β int Op E
β int * E
β int * (E)
β int * (E Op E)
β int * (int Op E)
β int * (int + E)
β int * (int + int)
E
E E
Op
int *
E
( int + )
E E
Op
59. Parse Trees
β E
β E Op E
β int Op E
β int * E
β int * (E)
β int * (E Op E)
β int * (int Op E)
β int * (int + E)
β int * (int + int)
E
E E
Op
int *
E
( int + )
E E
Op
60. Parse Trees
β E
β E Op E
β int Op E
β int * E
β int * (E)
β int * (E Op E)
β int * (int Op E)
β int * (int + E)
β int * (int + int)
E
E E
Op
int *
E
( int + int )
E E
Op
61. Parse Trees
β E
β E Op E
β E Op (E)
β E Op (E Op E)
β E Op (E Op int)
β E Op (E + int)
β E Op (int + int)
β E * (int + int)
β int * (int + int)
62. Parse Trees
β E
β E Op E
β E Op (E)
β E Op (E Op E)
β E Op (E Op int)
β E Op (E + int)
β E Op (int + int)
β E * (int + int)
β int * (int + int)
E
63. Parse Trees
β E
β E Op E
β E Op (E)
β E Op (E Op E)
β E Op (E Op int)
β E Op (E + int)
β E Op (int + int)
β E * (int + int)
β int * (int + int)
E
64. Parse Trees
β E
β E Op E
β E Op (E)
β E Op (E Op E)
β E Op (E Op int)
β E Op (E + int)
β E Op (int + int)
β E * (int + int)
β int * (int + int)
E
E E
Op
65. Parse Trees
β E
β E Op E
β E Op (E)
β E Op (E Op E)
β E Op (E Op int)
β E Op (E + int)
β E Op (int + int)
β E * (int + int)
β int * (int + int)
E
E E
Op
66. Parse Trees
β E
β E Op E
β E Op (E)
β E Op (E Op E)
β E Op (E Op int)
β E Op (E + int)
β E Op (int + int)
β E * (int + int)
β int * (int + int)
E
E E
Op
E
( )
67. Parse Trees
β E
β E Op E
β E Op (E)
β E Op (E Op E)
β E Op (E Op int)
β E Op (E + int)
β E Op (int + int)
β E * (int + int)
β int * (int + int)
E
E E
Op
E
( )
68. Parse Trees
β E
β E Op E
β E Op (E)
β E Op (E Op E)
β E Op (E Op int)
β E Op (E + int)
β E Op (int + int)
β E * (int + int)
β int * (int + int)
E
E E
Op
E
( )
E E
Op
69. Parse Trees
β E
β E Op E
β E Op (E)
β E Op (E Op E)
β E Op (E Op int)
β E Op (E + int)
β E Op (int + int)
β E * (int + int)
β int * (int + int)
E
E E
Op
E
( )
E E
Op
70. Parse Trees
β E
β E Op E
β E Op (E)
β E Op (E Op E)
β E Op (E Op int)
β E Op (E + int)
β E Op (int + int)
β E * (int + int)
β int * (int + int)
E
E E
Op
E
( int )
E E
Op
71. Parse Trees
β E
β E Op E
β E Op (E)
β E Op (E Op E)
β E Op (E Op int)
β E Op (E + int)
β E Op (int + int)
β E * (int + int)
β int * (int + int)
E
E E
Op
E
( int )
E E
Op
72. Parse Trees
β E
β E Op E
β E Op (E)
β E Op (E Op E)
β E Op (E Op int)
β E Op (E + int)
β E Op (int + int)
β E * (int + int)
β int * (int + int)
E
E E
Op
E
( + int )
E E
Op
73. Parse Trees
β E
β E Op E
β E Op (E)
β E Op (E Op E)
β E Op (E Op int)
β E Op (E + int)
β E Op (int + int)
β E * (int + int)
β int * (int + int)
E
E E
Op
E
( + int )
E E
Op
74. Parse Trees
β E
β E Op E
β E Op (E)
β E Op (E Op E)
β E Op (E Op int)
β E Op (E + int)
β E Op (int + int)
β E * (int + int)
β int * (int + int)
E
E E
Op
E
( int + int )
E E
Op
75. Parse Trees
β E
β E Op E
β E Op (E)
β E Op (E Op E)
β E Op (E Op int)
β E Op (E + int)
β E Op (int + int)
β E * (int + int)
β int * (int + int)
E
E E
Op
E
( int + int )
E E
Op
76. Parse Trees
β E
β E Op E
β E Op (E)
β E Op (E Op E)
β E Op (E Op int)
β E Op (E + int)
β E Op (int + int)
β E * (int + int)
β int * (int + int)
E
E E
Op
*
E
( int + int )
E E
Op
77. Parse Trees
β E
β E Op E
β E Op (E)
β E Op (E Op E)
β E Op (E Op int)
β E Op (E + int)
β E Op (int + int)
β E * (int + int)
β int * (int + int)
E
E E
Op
*
E
( int + int )
E E
Op
78. Parse Trees
β E
β E Op E
β E Op (E)
β E Op (E Op E)
β E Op (E Op int)
β E Op (E + int)
β E Op (int + int)
β E * (int + int)
β int * (int + int)
E
E E
Op
int *
E
( int + int )
E E
Op
80. Parse Trees
β A parse tree is a tree encoding the steps
in a derivation.
β Internal nodes represent nonterminal
symbols used in the production.
β Inorder walk of the leaves contains the
generated string.
β Encodes what productions are used, not
the order in which those productions are
applied.
81. The Goal of Parsing
β Goal of syntax analysis: Recover the
structure described by a series of
tokens.
β If language is described as a CFG, goal is
to recover a parse tree for the the input
string.
β Usually we do some simplifications on the
tree; more on that later.
β We'll discuss how to do this next week.
83. E
E E
Op
int * int + int
E E
Op
E
E E
Op
int * int
E E
Op
+ int
A Serious Problem
int * (int + int) (int * int) + int
84. Ambiguity
β A CFG is said to be ambiguous if there is at
least one string with two or more parse trees.
β Note that ambiguity is a property of grammars,
not languages.
β There is no algorithm for converting an
arbitrary ambiguous grammar into an
unambiguous one.
β Some languages are inherently ambiguous, meaning
that no unambiguous grammar exists for them.
β There is no algorithm for detecting whether an
arbitrary grammar is ambiguous.
85. Is Ambiguity a Problem?
β Depends on semantics.
E β int | E + E
86. Is Ambiguity a Problem?
β Depends on semantics.
E β int | E + E
R
+ R
E
+ E
R
R + R
5 R R
+
3 7
E
E + E
5 E E
+
3 7
R
5
R
E
+
E E 7
5 3
87. Is Ambiguity a Problem?
β Depends on semantics.
E β int | E + E | E - E
88. Is Ambiguity a Problem?
β Depends on semantics.
E β int | E + E | E - E
R
+ R
E
+ E
R
R + R
5 R R
+
3 7
E
E - E
5 E E
+
3 7
R
5
R
E
-
E E 7
5 3
89. Resolving Ambiguity
β If a grammar can be made unambiguous
at all, it is usually made unambiguous
through layering.
β Have exactly one way to build each piece
of the string.
β Have exactly one way of combining those
pieces back together.
90. Example: Balanced Parentheses
β Consider the language of all strings of
balanced parentheses.
β Examples:
β Ξ΅
β ()
β (()())
β ((()))(())()
β Here is one possible grammar for balanced
parentheses:
P β Ξ΅ | PP | (P)
105. Rethinking Parentheses
β A string of balanced parentheses is a
sequence of strings that are themselves
balanced parentheses.
β To avoid ambiguity, we can build the
string in two steps:
β Decide how many different substrings we
will glue together.
β Build each substring independently.
108. Um... what?
β The way the nyancat flies across the sky
is similar to how we can build up strings
of balanced parentheses.
109. Um... what?
β The way the nyancat flies across the sky
is similar to how we can build up strings
of balanced parentheses.
110. Um... what?
β The way the nyancat flies across the sky
is similar to how we can build up strings
of balanced parentheses.
111. Um... what?
β The way the nyancat flies across the sky
is similar to how we can build up strings
of balanced parentheses.
112. Um... what?
β The way the nyancat flies across the sky
is similar to how we can build up strings
of balanced parentheses.
113. Um... what?
β The way the nyancat flies across the sky
is similar to how we can build up strings
of balanced parentheses.
114. Um... what?
β The way the nyancat flies across the sky
is similar to how we can build up strings
of balanced parentheses.
115. Um... what?
β The way the nyancat flies across the sky
is similar to how we can build up strings
of balanced parentheses.
β
β Ξ΅
β
β
116. Building Parentheses
β Spread a string of parentheses across the
string. There is exactly one way to do
this for any number of parentheses.
β Expand out each substring by adding in
parentheses and repeating.
S
P
β
β
S
P | Ξ΅
( )
S
117. Building Parentheses
S
β PS
β PPS
β PP
β (S)P
β (S)(S)
β (PS)(S)
β (P)(S)
β ((S))(S)
β (())(S)
β (())()
S
P
β
β
S
P | Ξ΅
( )
S
118. Context-Free Grammars
β A regular expression can be
β Any letter
β Ξ΅
β The concatenation of regular expressions.
β The union of regular expressions.
β The Kleene closure of a regular expression.
β A parenthesized regular expression.
119. Context-Free Grammars
β This gives us the following CFG:
R β a | b | c | β¦
R β βΞ΅β
R β RR
R β R β|β R
R β R*
R β (R)
120. R β a | b | c | β¦
R β βΞ΅β
R β RR
R β R β|β R
R β R*
R β (R)
a | b *
R
R
R
R
a | b *
R R
R
R
a | (b*) (a | b)*
An Ambiguous Grammar
121. Resolving Ambiguity
β We can try to resolve the ambiguity via
layering:
R β a | b | c | β¦
R β βΞ΅β
R β RR
R β R β|β R
R β R*
R β (R)
|
a a *
b
122. Resolving Ambiguity
β We can try to resolve the ambiguity via
layering:
R β a | b | c | β¦
R β βΞ΅β
R β RR
R β R β|β R
R β R*
R β (R)
|
a a *
b
123. Resolving Ambiguity
β We can try to resolve the ambiguity via
layering:
R β a | b | c | β¦
R β βΞ΅β
R β RR
R β R β|β R
R β R*
R β (R)
|
a a *
b
124. Resolving Ambiguity
β We can try to resolve the ambiguity via
layering:
R β a | b | c | β¦
R β βΞ΅β
R β RR
R β R β|β R
R β R*
R β (R)
|
a a *
b
125. Resolving Ambiguity
β We can try to resolve the ambiguity via
layering:
R β a | b | c | β¦
R β βΞ΅β
R β RR
R β R β|β R
R β R*
R β (R)
|
a a *
b
126. Resolving Ambiguity
β We can try to resolve the ambiguity via
layering:
R β a | b | c | β¦
R β βΞ΅β
R β RR
R β R β|β R
R β R*
R β (R)
|
a a *
b
127. Resolving Ambiguity
β We can try to resolve the ambiguity via
layering:
R β a | b | c | β¦
R β βΞ΅β
R β RR
R β R β|β R
R β R*
R β (R)
|
a a *
b
128. Resolving Ambiguity
β We can try to resolve the ambiguity via
layering:
R β a | b | c | β¦
R β βΞ΅β
R β RR
R β R β|β R
R β R*
R β (R)
R β S | R β|β S
S β T | ST
T β U | T*
U β a | b | c | β¦
U β βΞ΅β
U β (R)
129. Why is this unambiguous?
R β S | R β|β S
S β T | ST
T β U | T*
U β a | b | β¦
U β βΞ΅β
U β (R)
130. Why is this unambiguous?
Only generates
βatomicβ expressions
R β S | R β|β S
S β T | ST
T β U | T*
U β a | b | β¦
U β βΞ΅β
U β (R)
131. Why is this unambiguous?
Only generates
βatomicβ expressions
Puts stars onto
atomic expressions
R β S | R β|β S
S β T | ST
T β U | T*
U β a | b | β¦
U β βΞ΅β
U β (R)
132. Why is this unambiguous?
Only generates
βatomicβ expressions
Puts stars onto
atomic expressions
Concatenates starred
expressions
R β S | R β|β S
S β T | ST
T β U | T*
U β a | b | β¦
U β βΞ΅β
U β (R)
133. Why is this unambiguous?
Only generates
βatomicβ expressions
Puts stars onto
atomic expressions
Concatenates starred
expressions
Unions
concatenated
expressions
R β S | R β|β S
S β T | ST
T β U | T*
U β a | b | β¦
U β βΞ΅β
U β (R)
134. R β S | R β|β S
S β T | ST
T β U | T*
U β a | b | c | β¦
U β βΞ΅β
U β (R)
a b | c | a *
R
135. R β S | R β|β S
S β T | ST
T β U | T*
U β a | b | c | β¦
U β βΞ΅β
U β (R)
a b | c | a *
S
R
R
136. R β S | R β|β S
S β T | ST
T β U | T*
U β a | b | c | β¦
U β βΞ΅β
U β (R)
a b | c | a *
S
S
R
R
R
137. R β S | R β|β S
S β T | ST
T β U | T*
U β a | b | c | β¦
U β βΞ΅β
U β (R)
a b | c | a *
S
S
S
R
R
R
138. R β S | R β|β S
S β T | ST
T β U | T*
U β a | b | c | β¦
U β βΞ΅β
U β (R)
a b | c | a *
T
S
S
S
S
R
R
R
139. R β S | R β|β S
S β T | ST
T β U | T*
U β a | b | c | β¦
U β βΞ΅β
U β (R)
a b | c | a *
T
T
S
S
S
S
R
R
R
140. R β S | R β|β S
S β T | ST
T β U | T*
U β a | b | c | β¦
U β βΞ΅β
U β (R)
a b | c | a *
U
T
T
S
S
S
S
R
R
R
141. R β S | R β|β S
S β T | ST
T β U | T*
U β a | b | c | β¦
U β βΞ΅β
U β (R)
a b | c | a *
U
T
T
S
S
S
S
R
R
R
142. R β S | R β|β S
S β T | ST
T β U | T*
U β a | b | c | β¦
U β βΞ΅β
U β (R)
a b | c | a *
U U
T
T
S
S
S
S
R
R
R
143. R β S | R β|β S
S β T | ST
T β U | T*
U β a | b | c | β¦
U β βΞ΅β
U β (R)
a b | c | a *
U U
T
T
S
S
S
S
R
R
R
144. R β S | R β|β S
S β T | ST
T β U | T*
U β a | b | c | β¦
U β βΞ΅β
U β (R)
a b | c | a *
U U
T
T
T
S
S
S
S
R
R
R
145. R β S | R β|β S
S β T | ST
T β U | T*
U β a | b | c | β¦
U β βΞ΅β
U β (R)
a b | c | a *
U U U
T
T
T
S
S
S
S
R
R
R
146. R β S | R β|β S
S β T | ST
T β U | T*
U β a | b | c | β¦
U β βΞ΅β
U β (R)
a b | c | a *
U U U
T
T
T
S
S
S
S
R
R
R
147. R β S | R β|β S
S β T | ST
T β U | T*
U β a | b | c | β¦
U β βΞ΅β
U β (R)
a b | c | a *
U U U
T
T
T
S
S
S
S
R
R
R
T
148. R β S | R β|β S
S β T | ST
T β U | T*
U β a | b | c | β¦
U β βΞ΅β
U β (R)
a b | c | a *
U U U
T
T
T
T
S
S
S
S
R
R
R
T
149. R β S | R β|β S
S β T | ST
T β U | T*
U β a | b | c | β¦
U β βΞ΅β
U β (R)
a b | c | a *
U U U U
T
T
T
T
S
S
S
S
R
R
R
T
150. R β S | R β|β S
S β T | ST
T β U | T*
U β a | b | c | β¦
U β βΞ΅β
U β (R)
a b | c | a *
U U U U
T
T
T
T
S
S
S
S
R
R
R
T
151. Precedence Declarations
β If we leave the world of pure CFGs, we
can often resolve ambiguities through
precedence declarations.
β e.g. multiplication has higher precedence
than addition, but lower precedence than
exponentiation.
β Allows for unambiguous parsing of
ambiguous grammars.
β We'll see how this is implemented later
on.
152. R β S | R β|β S
S β T | ST
T β U | T*
U β a | b | β¦
U β βΞ΅β
U β (R)
The Structure of a Parse Tree
153. R β S | R β|β S
S β T | ST
T β U | T*
U β a | b | β¦
U β βΞ΅β
U β (R)
a a | b *
The Structure of a Parse Tree
154. R β S | R β|β S
S β T | ST
T β U | T*
U β a | b | β¦
U β βΞ΅β
U β (R)
a a | b *
U
U
U
T
T
T
S
S
R S
R
The Structure of a Parse Tree
155. a a | b *
U
U
U
T
T
T
S
S
R S
R
The Structure of a Parse Tree
a a b
concat
*
or
R β S | R β|β S
S β T | ST
T β U | T*
U β a | b | β¦
U β βΞ΅β
U β (R)
a a | b *
156. a ( b | c )
R β S | R β|β S
S β T | ST
T β U | T*
U β a | b | β¦
U β βΞ΅β
U β (R)
157. a ( b | c )
U U U
T T
T
S
S S
R
R
U
T
S
R
R β S | R β|β S
S β T | ST
T β U | T*
U β a | b | β¦
U β βΞ΅β
U β (R)
158. a ( b | c )
U U U
T T
T
S
S S
R
R
U
T
S
R
a
b c
or
concat
R β S | R β|β S
S β T | ST
T β U | T*
U β a | b | β¦
U β βΞ΅β
U β (R)
159. Abstract Syntax Trees (ASTs)
β A parse tree is a concrete syntax tree;
it shows exactly how the text was
derived.
β A more useful structure is an abstract
syntax tree, which retains only the
essential structure of the input.
160. How to build an AST?
β Typically done through semantic actions.
β Associate a piece of code to execute with
each production.
β As the input is parsed, execute this code to
build the AST.
β Exact order of code execution depends on the
parsing method used.
β
This is called a syntax-directed
translation.
161. Simple Semantic Actions
E β T + E
E β T
T β int
T β int * T
T β (E)
E1
.val = T.val + E2
.val
E.val = T.val
T.val = int.val
T.val = int.val * T.val
T.val = E.val
162. Simple Semantic Actions
E β T + E
E β T
T β int
T β int * T
T β (E)
E1
.val = T.val + E2
.val
E.val = T.val
T.val = int.val
T.val = int.val * T.val
T.val = E.val
int * + 7
5
26 int int
163. Simple Semantic Actions
E β T + E
E β T
T β int
T β int * T
T β (E)
E1
.val = T.val + E2
.val
E.val = T.val
T.val = int.val
T.val = int.val * T.val
T.val = E.val
int * + 7
T 5
5
26 int int
164. Simple Semantic Actions
E β T + E
E β T
T β int
T β int * T
T β (E)
E1
.val = T.val + E2
.val
E.val = T.val
T.val = int.val
T.val = int.val * T.val
T.val = E.val
int * + 7
T
T
5
130
5
26 int int
165. Simple Semantic Actions
E β T + E
E β T
T β int
T β int * T
T β (E)
E1
.val = T.val + E2
.val
E.val = T.val
T.val = int.val
T.val = int.val * T.val
T.val = E.val
int * + 7
T
T
T
5
130
7
5
26 int int
166. Simple Semantic Actions
E β T + E
E β T
T β int
T β int * T
T β (E)
E1
.val = T.val + E2
.val
E.val = T.val
T.val = int.val
T.val = int.val * T.val
T.val = E.val
int * + 7
T
T
T
E
5
130
7
7
5
26 int int
167. Simple Semantic Actions
E β T + E
E β T
T β int
T β int * T
T β (E)
E1
.val = T.val + E2
.val
E.val = T.val
T.val = int.val
T.val = int.val * T.val
T.val = E.val
int * + 7
T
T
T
E
5
130
7
7
5
26 int int
E 137
168. R β S R.ast = S.ast;
R β R β|β S R1
.ast = new Or(R2
.ast, S.ast);
S β T S.ast = T.ast;
S β ST S1
.ast = new Concat(S2
.ast, T.ast);
T β U T.ast = U.ast;
T β T* T1
.ast = new Star(T2
.ast);
U β a U.ast = new SingleChar('a');
U β βΞ΅β U.ast = new Epsilon();
U β (R) U.ast = R.ast;
Semantic Actions to Build ASTs
169. Summary
β
Syntax analysis (parsing) extracts the structure from the
tokens produced by the scanner.
β
Languages are usually specified by context-free grammars
(CFGs).
β A parse tree shows how a string can be derived from a
grammar.
β A grammar is ambiguous if it can derive the same string
multiple ways.
β There is no algorithm for eliminating ambiguity; it must be
done by hand.
β
Abstract syntax trees (ASTs) contain an abstract
representation of a program's syntax.
β
Semantic actions associated with productions can be used
to build ASTs.
170. Next Time
β Top-Down Parsing
β Parsing as a Search
β Backtracking Parsers
β Predictive Parsers
β LL(1)