The document discusses lexical analysis in compiler construction, including an overview of the topics covered such as regular languages represented as regular grammars, regular expressions, and finite state automata. It also discusses the equivalence between these formalisms and techniques for constructing tools for lexical analysis.
Fostering Friendships - Enhancing Social Bonds in the Classroom
Compiler Components and their Generators - Lexical Analysis
1. Compiler Components & Generators
Lexical Analysis
Guido Wachsmuth
Delft
Course IN4303
University of
Technology Compiler Construction
Challenge the future
2. Recap: Garbage Collection
lessons learned
How can we collect unreachable records on the heap?
• reference counts
• mark reachable records, sweep unreachable records
• copy reachable records
How can we reduce heap space needed for garbage collection?
• pointer-reversal
• breadth-first search
• hybrid algorithms
Lexical Analysis 2
9. Recap: A Theory of Language
formal languages
Lexical Analysis 5
10. Recap: A Theory of Language
formal languages
vocabulary Σ
finite, nonempty set of elements (words, letters)
alphabet
Lexical Analysis 5
11. Recap: A Theory of Language
formal languages
vocabulary Σ
finite, nonempty set of elements (words, letters)
alphabet
string over Σ
finite sequence of elements chosen from Σ
word, sentence, utterance
Lexical Analysis 5
12. Recap: A Theory of Language
formal languages
vocabulary Σ
finite, nonempty set of elements (words, letters)
alphabet
string over Σ
finite sequence of elements chosen from Σ
word, sentence, utterance
formal language λ
set of strings over a vocabulary Σ
λ ⊆ Σ*
Lexical Analysis 5
13. Recap: A Theory of Language
formal grammars
Lexical Analysis 6
14. Recap: A Theory of Language
formal grammars
formal grammar G = (N, Σ, P, S)
nonterminal symbols N
terminal symbols Σ
production rules P ⊆ (N∪Σ)* N (N∪Σ)* × (N∪Σ)*
start symbol S∈N
Lexical Analysis 6
15. Recap: A Theory of Language
formal grammars
formal grammar G = (N, Σ, P, S)
nonterminal symbols N nonterminal symbol
terminal symbols Σ
production rules P ⊆ (N∪Σ)* N (N∪Σ)* × (N∪Σ)*
start symbol S∈N
Lexical Analysis 6
16. Recap: A Theory of Language
formal grammars
formal grammar G = (N, Σ, P, S)
nonterminal symbols N
terminal symbols Σ
production rules P ⊆ (N∪Σ)* N (N∪Σ)* × (N∪Σ)*
start symbol S∈N
context
Lexical Analysis 6
17. Recap: A Theory of Language
formal grammars
formal grammar G = (N, Σ, P, S)
nonterminal symbols N
terminal symbols Σ
production rules P ⊆ (N∪Σ)* N (N∪Σ)* × (N∪Σ)*
start symbol S∈N
replacement
Lexical Analysis 6
18. Recap: A Theory of Language
formal grammars
formal grammar G = (N, Σ, P, S)
nonterminal symbols N
terminal symbols Σ
production rules P ⊆ (N∪Σ)* N (N∪Σ)* × (N∪Σ)*
start symbol S∈N
grammar classes
type-0, unrestricted
type-1, context-sensitive: (a A c, a b c)
type-2, context-free: P ⊆ N × (N∪Σ)*
type-3, regular: (A, x) or (A, xB)
Lexical Analysis 6
19. Decimal Numbers
right regular grammar
Num → “0” Num Num → “0”
Num → “1” Num Num → “1”
Num → “2” Num Num → “2”
Num → “3” Num Num → “3”
Num → “4” Num Num → “4”
Num → “5” Num Num → “5”
Num → “6” Num Num → “6”
Num → “7” Num Num → “7”
Num → “8” Num Num → “8”
Num → “9” Num Num → “9”
Lexical Analysis 7
20. Identifiers
right regular grammar
Id → “a” R Id → “a”
… …
Id → “z” R Id → “z”
R → “a” R R → “a”
… …
R → “z” R R → “z”
R → “0” R R → “0”
… …
R → “9” R R → “9”
Lexical Analysis 8
21. Recap: A Theory of Language
formal languages
Lexical Analysis 9
22. Recap: A Theory of Language
formal languages
formal grammar G = (N, Σ, P, S)
Lexical Analysis 9
23. Recap: A Theory of Language
formal languages
formal grammar G = (N, Σ, P, S)
derivation relation G ⊆ (N∪Σ)* × (N∪Σ)*
w G w’
∃(p, q)∈P: ∃u,v∈(N∪Σ)*:
w=u p v ∧ w’=u q v
Lexical Analysis 9
24. Recap: A Theory of Language
formal languages
formal grammar G = (N, Σ, P, S)
derivation relation G ⊆ (N∪Σ)* × (N∪Σ)*
w G w’
∃(p, q)∈P: ∃u,v∈(N∪Σ)*:
w=u p v ∧ w’=u q v
formal language L(G) ⊆ Σ*
L(G) = {w∈Σ* | S G
*
w}
Lexical Analysis 9
25. Recap: A Theory of Language
formal languages
formal grammar G = (N, Σ, P, S)
derivation relation G ⊆ (N∪Σ)* × (N∪Σ)*
w G w’
∃(p, q)∈P: ∃u,v∈(N∪Σ)*:
w=u p v ∧ w’=u q v
formal language L(G) ⊆ Σ*
L(G) = {w∈Σ* | S G
*
w}
classes of formal languages
Lexical Analysis 9
32. Finite Automata
formal definition
finite automaton M = (Q, Σ, T, q0, F)
states Q
input symbols Σ
transition function T
start state q0∈Q
final states F⊆Q
Lexical Analysis 15
33. Finite Automata
formal definition
finite automaton M = (Q, Σ, T, q0, F)
states Q
input symbols Σ
transition function T
start state q0∈Q
final states F⊆Q
transition function
nondeterministic FA T : Q × Σ → P(Q)
NFA with ε-moves T : Q × (Σ ∪ {ε}) → P(Q)
deterministic FA T : Q × Σ → Q
Lexical Analysis 15
54. NFA construction
right regular grammar
formal grammar G = (N, Σ, P, S)
finite automaton M = (N ∪ {f}, Σ, T, S, F)
Lexical Analysis 23
55. NFA construction
right regular grammar
formal grammar G = (N, Σ, P, S)
finite automaton M = (N ∪ {f}, Σ, T, S, F)
transition function T
(X, aY)∈P : (X, a, Y)∈T
(X, a)∈P : (X, a, f)∈T
Lexical Analysis 23
56. NFA construction
right regular grammar
formal grammar G = (N, Σ, P, S)
finite automaton M = (N ∪ {f}, Σ, T, S, F)
transition function T
(X, aY)∈P : (X, a, Y)∈T
(X, a)∈P : (X, a, f)∈T
final states F
(S, ε)∈P : F = {S, f}
else: F = {f}
Lexical Analysis 23
57. NFA construction
example
Num → “0” Num Num → “0”
Num → “1” Num Num → “1”
Num → “2” Num Num → “2”
Num → “3” Num Num → “3”
Num → “4” Num Num → “4”
Num → “5” Num Num → “5”
Num → “6” Num Num → “6”
Num → “7” Num Num → “7”
Num → “8” Num Num → “8”
Num → “9” Num Num → “9”
Lexical Analysis 24
58. NFA construction
example
Num → “0” Num Num → “0”
Num → “1” Num Num → “1”
Num → “2” Num Num → “2”
N
Num → “3” Num Num → “3”
Num → “4” Num Num → “4”
Num → “5” Num Num → “5”
Num → “6” Num Num → “6”
Num → “7” Num Num → “7” f
Num → “8” Num Num → “8”
Num → “9” Num Num → “9”
Lexical Analysis 24
59. NFA construction
example
Num → “0” Num Num → “0”
Num → “1” Num Num → “1”
Num → “2” Num Num → “2”
N
Num → “3” Num Num → “3”
Num → “4” Num Num → “4”
Num → “5” Num Num → “5”
Num → “6” Num Num → “6”
Num → “7” Num Num → “7” f
Num → “8” Num Num → “8”
Num → “9” Num Num → “9”
Lexical Analysis 24
60. NFA construction
example
Num → “0” Num Num → “0”
Num → “1” Num Num → “1”
Num → “2” Num Num → “2”
N 0-9
Num → “3” Num Num → “3”
Num → “4” Num Num → “4”
Num → “5” Num Num → “5”
Num → “6” Num Num → “6”
Num → “7” Num Num → “7” f
Num → “8” Num Num → “8”
Num → “9” Num Num → “9”
Lexical Analysis 24
61. NFA construction
example
Num → “0” Num Num → “0”
Num → “1” Num Num → “1”
Num → “2” Num Num → “2”
N 0-9
Num → “3” Num Num → “3”
Num → “4” Num Num → “4”
Num → “5” Num Num → “5” 0-9
Num → “6” Num Num → “6”
Num → “7” Num Num → “7” f
Num → “8” Num Num → “8”
Num → “9” Num Num → “9”
Lexical Analysis 24
64. NFA construction
ε elimination
additional final states
• states with ε-moves into final states
• become final states themselves
additional transitions
• ε-move from source to target state
• transitions from target state
• add these transitions to the source state
Lexical Analysis 27
65. Powerset construction
eliminating nondeterminism
nondeterministic finite automaton M = (Q, Σ, T, q0, F)
deterministic finite automaton M’ = (P(Q), Σ, T’, {q0}, F’)
transition function T’
• T’({q1,..., qn}, x) = T({q1,..., qn}, x) = T(q1, x) ∪ … ∪ T(qn, x)
final states F’ = {S⎮S⊆Q, S∩F ≠ ∅}
• all states that include a final state of the original NFA
Lexical Analysis 28
66. Powerset construction
example
f f
3 4 23 24
a-e,g-z a-z
i i 0-9 0-9
a-z a-z a-z
1 2 1 2
0-9 a-h 0-9
j-z
Lexical Analysis 29
69. Summary
lessons learned
What are the formalisms to describe regular languages?
• regular grammars
• regular expressions
• finite state automata
Lexical Analysis 31
70. Summary
lessons learned
What are the formalisms to describe regular languages?
• regular grammars
• regular expressions
• finite state automata
Why are these formalisms equivalent?
• constructive proofs
Lexical Analysis 31
71. Summary
lessons learned
What are the formalisms to describe regular languages?
• regular grammars
• regular expressions
• finite state automata
Why are these formalisms equivalent?
• constructive proofs
How can we generate compiler tools from that?
• implement DFAs
• generate transition tables
Lexical Analysis 31
73. Literature
learn more
formal languages
Noam Chomsky: Three models for the description of language. 1956
J. E. Hopcroft, R. Motwani, J. D. Ullman: Introduction to Automata
Theory, Languages, and Computation. 2006
Lexical Analysis 32
74. Literature
learn more
formal languages
Noam Chomsky: Three models for the description of language. 1956
J. E. Hopcroft, R. Motwani, J. D. Ullman: Introduction to Automata
Theory, Languages, and Computation. 2006
lexical analysis
Andrew W. Appel, Jens Palsberg: Modern Compiler Implementation in
Java, 2nd edition. 2002
Lexical Analysis 32
75. Outlook
coming next
compiler components and their generators
• Lecture 12: Syntactical Analysis
• Lecture 13: SDF inside
• Lecture 14: Static Analysis
Lab Nov 26
• test cases
• finish Lab 2, do reference resolution next
• finish Lab 3, do hover help next
• content completion
Lexical Analysis 33
78. Pictures
copyrights
Slide 1:
Book Scanner by Ben Woosley, some rights reserved
Slides 5, 6, 9:
Noam Chomsky by Fellowsisters, some rights reserved
Slide 7, 8, 12, 16:
Tiger by Bernard Landgraf, some rights reserved
Slide 19:
Coffee Primo by Dominica Williamson, some rights reserved
Slide 33:
Pine Creek by Nicholas, some rights reserved
Slide 34:
Questions by Oberazzi, some rights reserved
Slide 35:
Too Much Credit by Andres Rueda, some rights reserved
Lexical Analysis 36
Editor's Notes
\n
\n
\n
\n
\n
\n
\n
linguistic theory\n\nNoam Chomsky\n
linguistic theory\n\nNoam Chomsky\n
linguistic theory\n\nNoam Chomsky\n
restrictions on production rules => grammar classes\n
restrictions on production rules => grammar classes\n
restrictions on production rules => grammar classes\n
restrictions on production rules => grammar classes\n
restrictions on production rules => grammar classes\n
restrictions on production rules => grammar classes\n
restrictions on production rules => grammar classes\n
restrictions on production rules => grammar classes\n
computer science: lexical syntax\n\ncan write this as a regular grammar\n
computer science: lexical syntax\n\ncan write this as a regular grammar\n
\n
\n
\n
\n
\n
\n
computer science: lexical syntax\n\ncan write this as a regular grammar\n
\n
\n
restrictions on production rules => grammar classes\n
restrictions on production rules => grammar classes\n
\n
\n
\n
\n
\n
\n
\n
\n
\n
\n
\n
\n
\n
\n
\n
\n
\n
\n
\n
restrictions on production rules => grammar classes\n
restrictions on production rules => grammar classes\n
restrictions on production rules => grammar classes\n
restrictions on production rules => grammar classes\n
computer science: lexical syntax\n\ncan write this as a regular grammar\n
computer science: lexical syntax\n\ncan write this as a regular grammar\n
computer science: lexical syntax\n\ncan write this as a regular grammar\n
computer science: lexical syntax\n\ncan write this as a regular grammar\n
computer science: lexical syntax\n\ncan write this as a regular grammar\n