Lexical Analysis

Slides for lecture on lexical analysis for Compiler Construction course at TU Delft


  1. 1. Lexical Analysis. Course IN4303 Compiler Construction, Delft University of Technology. Eduardo Souza, Guido Wachsmuth, Eelco Visser
  6. 6. Overview (today’s lecture) • Lexical analysis • Regular languages (regular grammars, regular expressions, finite state automata) • Equivalence of formalisms (constructive approach) • Tool generation
  7. 7. I. Regular Grammars
  11. 11. Recap: A Theory of Language (formal languages) • vocabulary Σ (alphabet): finite, nonempty set of elements (words, letters) • string over Σ (word, sentence, utterance): finite sequence of elements chosen from Σ • formal language λ: set of strings over a vocabulary Σ, λ ⊆ Σ*
  17. 17. Recap: A Theory of Language (formal grammars) • formal grammar G = (N, Σ, P, S) with nonterminal symbols N, terminal symbols Σ, production rules P ⊆ (N∪Σ)* N (N∪Σ)* × (N∪Σ)*, start symbol S∈N • in a production rule, the left-hand side is a nonterminal symbol within a context, and the right-hand side is its replacement • grammar classes: type-0, unrestricted; type-1, context-sensitive: (a A c, a b c); type-2, context-free: P ⊆ N × (N∪Σ)*; type-3, regular: (A, x) or (A, xB)
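The grammar classes above can be checked mechanically from the shape of the production rules. The following is a minimal sketch, assuming productions are stored as pairs of symbol tuples; the function names and the sample grammars `NUM`, `CS` and `CF` are illustrative and not part of the lecture.

```python
def is_context_free(productions, nonterminals):
    """Type-2: every left-hand side is exactly one nonterminal."""
    return all(len(lhs) == 1 and lhs[0] in nonterminals for lhs, _ in productions)


def is_right_regular(productions, nonterminals):
    """Type-3 (right regular): every rule has the shape (A, x) or (A, xB)."""
    if not is_context_free(productions, nonterminals):
        return False
    for _, rhs in productions:
        if len(rhs) == 1 and rhs[0] not in nonterminals:
            continue  # shape (A, x)
        if len(rhs) == 2 and rhs[0] not in nonterminals and rhs[1] in nonterminals:
            continue  # shape (A, xB)
        return False
    return True


# Decimal numbers: Num -> d Num and Num -> d for every digit d.
NUM = [(("Num",), (d, "Num")) for d in "0123456789"]
NUM += [(("Num",), (d,)) for d in "0123456789"]

# The context-sensitive rule shape from the slide: (a A c, a b c).
CS = [(("a", "A", "c"), ("a", "b", "c"))]

# A context-free rule that is not regular (illustrative): S -> a S b.
CF = [(("S",), ("a", "S", "b"))]

print(is_right_regular(NUM, {"Num"}))
print(is_context_free(CS, {"A"}))
print(is_context_free(CF, {"S"}), is_right_regular(CF, {"S"}))
```

The check follows the slide's two rule shapes literally, so ε-rules are rejected here even though some definitions of regular grammars allow them.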
  18. 18. Decimal Numbers (right regular grammar): Num → “0” Num; Num → “1” Num; Num → “2” Num; Num → “3” Num; Num → “4” Num; Num → “5” Num; Num → “6” Num; Num → “7” Num; Num → “8” Num; Num → “9” Num; Num → “0”; Num → “1”; Num → “2”; Num → “3”; Num → “4”; Num → “5”; Num → “6”; Num → “7”; Num → “8”; Num → “9”
  19. 19. Identifiers (right regular grammar): Id → “a” R … Id → “z” R; R → “a” R … R → “z” R; R → “0” R … R → “9” R; Id → “a” … Id → “z”; R → “a” … R → “z”; R → “0” … R → “9”
  24. 24. Recap: A Theory of Language (formal languages) • formal grammar G = (N, Σ, P, S) • derivation relation ⇒_G ⊆ (N∪Σ)* × (N∪Σ)*: w ⇒_G w’ ⇔ ∃(p, q)∈P: ∃u,v∈(N∪Σ)*: w = u p v ∧ w’ = u q v • formal language L(G) ⊆ Σ*: L(G) = {w∈Σ* | S ⇒_G* w} • classes of formal languages
  34. 34. Example G: S → a; S → aA; A → aB; B → aC; C → aD; D → aS. Is G regular? L(G) = ?
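One way to explore the question L(G) = ? is to enumerate derivations up to a length bound. The sketch below applies the derivation relation breadth-first to the grammar G of this example. It assumes single-character symbols (uppercase for nonterminals, lowercase for terminals) and relies on every rule of G adding one terminal, so the search terminates; the function name `words_up_to` is illustrative.

```python
from collections import deque

# Productions of G as (nonterminal, right-hand side) pairs.
G = [("S", "a"), ("S", "aA"), ("A", "aB"), ("B", "aC"), ("C", "aD"), ("D", "aS")]


def words_up_to(productions, start, max_len):
    """Collect all terminal words with at most max_len symbols derivable from start."""
    words, seen, queue = set(), {start}, deque([start])
    while queue:
        form = queue.popleft()
        if form.islower():  # no nonterminal left: form is a word
            words.add(form)
            continue
        for lhs, rhs in productions:
            i = form.find(lhs)
            while i != -1:
                # One derivation step: w = u p v becomes w' = u q v.
                nxt = form[:i] + rhs + form[i + 1:]
                terminals = sum(c.islower() for c in nxt)
                if terminals <= max_len and nxt not in seen:
                    seen.add(nxt)
                    queue.append(nxt)
                i = form.find(lhs, i + 1)
    return words


print(sorted(len(w) for w in words_up_to(G, "S", 12)))  # → [1, 6, 11]
```

The lengths found are all of the form 5k + 1, which suggests L(G) = {aⁿ | n = 5k + 1, k ≥ 0}. Every rule has the shape (A, x) or (A, xB), so G fits the type-3 definition.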
  35. 35. II. Regular Expressions
  36. 36. Recap: Regular Expressions (overview) • basics: symbol from an alphabet; ε • combinators: alternation E1 | E2; concatenation E1 E2; repetition E*; optional E? = E | ε; one or more E+ = E E*
  37. 37. Decimal Numbers & Identifiers (regular expressions) Num: (0|1|2|3|4|5|6|7|8|9)+ Id: (a|…|z)(a|…|z|0|…|9)*
  38. 38. Regular Expressions (formal languages) • basics: L(a) = {“a”}; L(ε) = {“”} • combinators: L(E1 | E2) = L(E1) ∪ L(E2); L(E1 E2) = L(E1) · L(E2); L(E*) = L(E)*
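The language equations above translate almost directly into set operations. This is a small sketch using Python sets of strings; since L(E)* is usually infinite, the hypothetical helper `star` only collects words up to a length bound, which is an assumption made to keep the result finite.

```python
def concat(l1, l2):
    """L(E1 E2) = L(E1) · L(E2): every word of l1 followed by every word of l2."""
    return {u + v for u in l1 for v in l2}


def star(l, max_len):
    """The words of L(E)* with length at most max_len."""
    result = {""}
    frontier = {""}
    while frontier:
        # Extend the newest words by one more word of l, keep only new short ones.
        frontier = {w for w in concat(frontier, l) if len(w) <= max_len} - result
        result |= frontier
    return result


a, b = {"a"}, {"b"}           # L(a) = {"a"}, L(b) = {"b"}
a_or_b = a | b                # L(a | b) = L(a) ∪ L(b)

print(sorted(concat(a, a_or_b)))   # → ['aa', 'ab']
print(sorted(star(a_or_b, 2)))     # → ['', 'a', 'aa', 'ab', 'b', 'ba', 'bb']
```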
  39. 39. Decimal Numbers & Identifiers (regular expressions) Num: (0|1|2|3|4|5|6|7|8|9)+ A valid Num is any word that consists of a sequence of one or more digits. Id: (a|…|z)(a|…|z|0|…|9)* A valid Id is any word that starts with a lowercase letter, followed by a sequence of zero or more lowercase letters or digits.
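The two expressions can be tried out with Python's standard `re` module. In this sketch the Num pattern is written exactly as on the slide, while the Id pattern uses character classes as shorthand for the elided alternations; the helper names `is_num` and `is_id` are illustrative.

```python
import re

# Num: (0|1|2|3|4|5|6|7|8|9)+
NUM = re.compile(r"(0|1|2|3|4|5|6|7|8|9)+")
# Id: (a|…|z)(a|…|z|0|…|9)*, with character classes for the alternations
ID = re.compile(r"[a-z][a-z0-9]*")


def is_num(word):
    """True if the whole word is a sequence of one or more digits."""
    return NUM.fullmatch(word) is not None


def is_id(word):
    """True if the whole word is a lowercase letter followed by letters or digits."""
    return ID.fullmatch(word) is not None


print(is_num("2024"), is_num(""), is_num("12a"))
print(is_id("x1"), is_id("1x"), is_id("Foo"))
```

`fullmatch` is used rather than `match` because membership in the language concerns the whole word, not just a prefix.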
  45. 45. Example Gr: S → ('A' | … | 'Z')*(0 | … | 9)* L(Gr) = ?
  51. 51. Example Gr: S → ('A' | … | 'Z')*(0 | … | 9)* L(Gr) = the set of words that start with zero or more capital letters, followed by zero or more decimal digits.
  52. 52. III. Finite Automata
  55. 55. Finite Automata (formal definition) • finite automaton M = (Q, Σ, T, q0, F) with states Q, input symbols Σ, transition function T, start state q0∈Q, final states F⊆Q • transition function: nondeterministic FA T : Q × Σ → P(Q); NFA with ε-moves T : Q × (Σ ∪ {ε}) → P(Q); deterministic FA T : Q × Σ → Q
  56. 56. Decimal Numbers & Identifiers (finite automata) [diagrams: two automata, each with states 1 and 2; transition labels 0-9, 0-9 for decimal numbers and a-z, a-z, 0-9 for identifiers]
  60. 60. Nondeterministic Finite Automata (formal languages) • finite automaton M = (Q, Σ, T, q0, F) • transition function T : Q × Σ → P(Q), extended to sets of states and to words: T({q1,..., qn}, x) := T(q1, x) ∪ … ∪ T(qn, x); T*({q1,..., qn}, ε) := {q1,..., qn}; T*({q1,..., qn}, xw) := T*(T({q1,..., qn}, x), w) • formal language L(M) ⊆ Σ*: L(M) = {w∈Σ* | T*({q0}, w) ∩ F ≠ ∅}
  61. 61. Nondeterministic Finite Automata (formal languages) [diagram: NFA with states 1, 2, 3, 4 and transition labels a-z, a-z, 0-9, i, f]
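The definitions of T on sets of states and of T* can be run directly. The sketch below encodes one plausible reading of the lost diagram: state 1 goes to state 2 on a-z, state 2 loops on a-z and 0-9, state 1 goes to state 3 on i, and state 3 goes to state 4 on f. Both this wiring and the choice of final states {2, 4} are assumptions, as are all names in the code.

```python
import string

LETTERS = string.ascii_lowercase
DIGITS = string.digits

# Transition function T : Q × Σ → P(Q), stored as {(state, symbol): set of states}.
T = {}


def add(src, symbols, dst):
    for x in symbols:
        T.setdefault((src, x), set()).add(dst)


add(1, LETTERS, 2)            # assumed: 1 --a-z--> 2
add(2, LETTERS + DIGITS, 2)   # assumed: 2 --a-z, 0-9--> 2
add(1, "i", 3)                # assumed: 1 --i--> 3
add(3, "f", 4)                # assumed: 3 --f--> 4

START, FINAL = 1, {2, 4}      # final states are an assumption


def step(states, x):
    """T({q1,...,qn}, x) = T(q1, x) ∪ … ∪ T(qn, x)."""
    return set().union(*(T.get((q, x), set()) for q in states))


def run(states, word):
    """T*(S, ε) = S and T*(S, xw) = T*(T(S, x), w)."""
    for x in word:
        states = step(states, x)
    return states


def accepts(word):
    """w ∈ L(M) iff T*({q0}, w) ∩ F ≠ ∅."""
    return bool(run({START}, word) & FINAL)


print(run({START}, "i"))    # → {2, 3}
print(run({START}, "if"))   # → {2, 4}
print(accepts("x9"), accepts("9x"))
```

Reading i leads to two states at once, which is exactly the nondeterminism that the set-valued T* tracks.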
  65. 65. Deterministic Finite Automata (formal languages) • finite automaton M = (Q, Σ, T, q0, F) • transition function T : Q × Σ → Q, extended to words: T*(q, ε) := q; T*(q, xw) := T*(T(q, x), w) • formal language L(M) ⊆ Σ*: L(M) = {w∈Σ* | T*(q0, w) ∈ F}
  66. 66. Deterministic Finite Automata (formal languages) [diagrams: two automata, each with states 1 and 2; transition labels 0-9, 0-9 and a-z, a-z, 0-9]
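A DFA is commonly implemented as a lookup table from (state, symbol) to the next state. The sketch below builds the two automata under the assumption that state 1 is the start state, state 2 is the only final state, and state 2 loops on the second label set; the diagram itself is lost, so this wiring is inferred from the Num and Id expressions. Missing table entries stand for an implicit dead state, which deviates from the total function T on the slide.

```python
import string


def make_dfa(first, rest):
    """Two-state DFA: state 1 --first--> state 2, state 2 --rest--> state 2."""
    table = {(1, x): 2 for x in first}
    table.update({(2, x): 2 for x in rest})
    return table


NUM = make_dfa(string.digits, string.digits)
ID = make_dfa(string.ascii_lowercase, string.ascii_lowercase + string.digits)


def accepts(table, word, start=1, final=frozenset({2})):
    """w ∈ L(M) iff T*(q0, w) ∈ F, computed by iterating T over the word."""
    state = start
    for x in word:
        state = table.get((state, x))  # a missing entry means the dead state
        if state is None:
            return False
    return state in final


print(accepts(NUM, "007"), accepts(NUM, ""), accepts(NUM, "7a"))
print(accepts(ID, "a1"), accepts(ID, "1a"))
```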
  67. 67. IV. Equivalence
  68. 68. Regular Languages (formalisms): right regular grammars, left regular grammars, regular expressions, NFAs, DFAs, NFAs with ε-moves
  80. 80. NFA construction (right regular grammar) • formal grammar G = (N, Σ, P, S) • finite automaton M = (N ∪ {f}, Σ, T, S, F) • transition function T: for (X, aY)∈P: (X, a, Y)∈T; for (X, a)∈P: (X, a, f)∈T • final states F: if (S, ε)∈P then F = {S, f}, else F = {f}
  85. 85. Example (NFA construction): Num → “0” Num; Num → “1” Num; Num → “2” Num; Num → “3” Num; Num → “4” Num; Num → “5” Num; Num → “6” Num; Num → “7” Num; Num → “8” Num; Num → “9” Num; Num → “0”; Num → “1”; Num → “2”; Num → “3”; Num → “4”; Num → “5”; Num → “6”; Num → “7”; Num → “8”; Num → “9” [diagram: NFA with states N and f and two transitions labeled 0-9]
  86. 86. Example (NFA construction) G: S → a; S → aA; A → aB; B → aC; C → aD; D → aS • formal grammar G = (N, Σ, P, S) • finite automaton M = (N ∪ {f}, Σ, T, S, F) • transition function T: for (X, aY)∈P: (X, a, Y)∈T; for (X, a)∈P: (X, a, f)∈T • final states F: if (S, ε)∈P then F = {S, f}, else F = {f}
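The construction can be written down almost rule by rule. This is a sketch under a few assumptions: productions are encoded as triples (X, a, Y) for X → aY, (X, a, None) for X → a and (X, None, None) for X → ε, the fresh state is named "f" and does not clash with a nonterminal, and ε-rules for nonterminals other than the start symbol are rejected because the slide does not cover them. The function names are illustrative.

```python
def grammar_to_nfa(productions, start):
    """Build (transitions, start state, final states) from a right regular grammar."""
    fresh = "f"  # the extra final state f
    transitions, finals = {}, {fresh}
    for lhs, a, rhs in productions:
        if a is None:  # rule X → ε
            if lhs != start:
                raise ValueError("only S → ε is covered by this construction")
            finals.add(start)  # (S, ε) ∈ P: F = {S, f}
            continue
        target = rhs if rhs is not None else fresh
        transitions.setdefault((lhs, a), set()).add(target)
    return transitions, start, finals


def accepts(nfa, word):
    """Simulate the NFA on sets of states."""
    transitions, start, finals = nfa
    states = {start}
    for x in word:
        states = set().union(*(transitions.get((q, x), set()) for q in states))
    return bool(states & finals)


# G: S → a, S → aA, A → aB, B → aC, C → aD, D → aS
G = [("S", "a", None), ("S", "a", "A"), ("A", "a", "B"),
     ("B", "a", "C"), ("C", "a", "D"), ("D", "a", "S")]
NFA_G = grammar_to_nfa(G, "S")

print([n for n in range(13) if accepts(NFA_G, "a" * n)])  # → [1, 6, 11]
```

The rules S → a and S → aA give two targets for the same state and symbol, so the result is a genuinely nondeterministic automaton.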
  88. 88. NFA construction (regular expressions) [diagrams: NFA fragments with ε-moves for a, (x), and x*]
  89. 89. NFA construction (regular expressions) [diagrams: NFA fragments with ε-moves for x|y and xy]
  90. 90. NFA construction (regular expressions) Gr: S → ('A' | … | 'Z')*(0 | … | 9)*
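An NFA with ε-moves for Gr can be assembled from small fragments, one per regular expression operator. The sketch below follows the usual Thompson-style construction; the exact fragment shapes on the slides are not recoverable from the transcript, so the wiring here is an assumption, as are the names `Frag`, `symbol`, `concat`, `alt`, `star` and `any_of`.

```python
import itertools
import string

EPS = None                 # label used for ε-moves
_ids = itertools.count()   # supply of fresh state numbers


class Frag:
    """NFA fragment with one start state, one accept state and a list of edges."""
    def __init__(self, start, accept, edges):
        self.start, self.accept, self.edges = start, accept, edges


def symbol(a):
    s, t = next(_ids), next(_ids)
    return Frag(s, t, [(s, a, t)])


def concat(x, y):
    return Frag(x.start, y.accept, x.edges + y.edges + [(x.accept, EPS, y.start)])


def alt(x, y):
    s, t = next(_ids), next(_ids)
    return Frag(s, t, x.edges + y.edges + [(s, EPS, x.start), (s, EPS, y.start),
                                           (x.accept, EPS, t), (y.accept, EPS, t)])


def star(x):
    s, t = next(_ids), next(_ids)
    return Frag(s, t, x.edges + [(s, EPS, x.start), (x.accept, EPS, t),
                                 (s, EPS, t), (x.accept, EPS, x.start)])


def any_of(symbols):
    """Alternation over all given symbols, e.g. 'A' | … | 'Z'."""
    frag = symbol(symbols[0])
    for a in symbols[1:]:
        frag = alt(frag, symbol(a))
    return frag


def closure(states, edges):
    """All states reachable from the given states via ε-moves only."""
    seen, stack = set(states), list(states)
    while stack:
        q = stack.pop()
        for src, label, dst in edges:
            if src == q and label is EPS and dst not in seen:
                seen.add(dst)
                stack.append(dst)
    return seen


def accepts(frag, word):
    states = closure({frag.start}, frag.edges)
    for x in word:
        moved = {dst for src, label, dst in frag.edges if src in states and label == x}
        states = closure(moved, frag.edges)
    return frag.accept in states


# Gr: ('A' | … | 'Z')*(0 | … | 9)*
GR = concat(star(any_of(string.ascii_uppercase)), star(any_of(string.digits)))

print(accepts(GR, ""), accepts(GR, "AB12"), accepts(GR, "1A"))
```

The empty word is accepted because both starred parts may be skipped through ε-moves, and "1A" is rejected because no move leads from the digit part back to the letter part.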
  92. 92. NFA construction (ε elimination) • additional final states: states with ε-moves into final states become final states themselves • additional transitions: for an ε-move from a source state to a target state, add the transitions from the target state to the source state
  93. 93. NFA construction (ε elimination) • additional final states: states with ε-moves into final states become final states themselves • additional transitions: for an ε-move from a source state to a target state, add the transitions from the target state to the source state • Gr: S → ('A' | … | 'Z')*(0 | … | 9)*
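Both rules can be applied in one pass if each state first collects everything it can reach through ε-moves, which also handles chains of several ε-moves. The sketch below does this for a small hand-built automaton; the two-state NFA used here is an illustrative reduction of Gr to the single letter A and the single digit 0, not the automaton from the slide, and the function name is an assumption.

```python
def eliminate_epsilon(states, edges, finals):
    """Return (edges, finals) of an equivalent NFA without ε-moves.

    edges is a set of (source, label, target) triples; label None marks an ε-move.
    """
    def closure(q):
        seen, stack = {q}, [q]
        while stack:
            p = stack.pop()
            for src, label, dst in edges:
                if src == p and label is None and dst not in seen:
                    seen.add(dst)
                    stack.append(dst)
        return seen

    new_edges, new_finals = set(), set(finals)
    for q in states:
        reach = closure(q)
        # Additional final states: q reaches a final state via ε-moves.
        if reach & set(finals):
            new_finals.add(q)
        # Additional transitions: copy the moves of every ε-reachable state to q.
        for src, label, dst in edges:
            if src in reach and label is not None:
                new_edges.add((q, label, dst))
    return new_edges, new_finals


# Illustrative ε-NFA for A*0*: 1 --A--> 1, 1 --ε--> 2, 2 --0--> 2, final state 2.
EDGES = {(1, "A", 1), (1, None, 2), (2, "0", 2)}
NEW_EDGES, NEW_FINALS = eliminate_epsilon([1, 2], EDGES, {2})

print(sorted(NEW_EDGES))   # → [(1, '0', 2), (1, 'A', 1), (2, '0', 2)]
print(sorted(NEW_FINALS))  # → [1, 2]
```

State 1 becomes final and inherits the 0-transition of state 2, matching the two rules on the slide.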
  95. 95. Powerset construction (eliminating nondeterminism) • nondeterministic finite automaton M = (Q, Σ, T, q0, F) • deterministic finite automaton M’ = (P(Q), Σ, T’, {q0}, F’) • transition function T’: T’({q1,..., qn}, x) = T({q1,..., qn}, x) = T(q1, x) ∪ … ∪ T(qn, x) • final states F’ = {S | S⊆Q, S∩F ≠ ∅}: all states that include a final state of the original NFA
  96. 96. Powerset construction (example) [diagram: NFA with states 1, 2, 3, 4 and transition labels a-z, a-z, 0-9, i, f]
  97. 97. Powerset construction (example) [diagrams: the NFA with states 1, 2, 3, 4 and the resulting DFA with states 1, 2, 23, 24; DFA transition labels i, f, a-h, j-z, a-e,g-z, a-z, 0-9]
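The powerset construction can be sketched as a worklist algorithm that only builds the subsets reachable from {q0}, rather than all of P(Q). The NFA wiring below (1 to 2 on a-z, 2 to 2 on a-z and 0-9, 1 to 3 on i, 3 to 4 on f, final states 2 and 4) is an assumption reconstructed from the labels of the lost diagram; the empty set is left out and treated as an implicit dead state.

```python
import string

LETTERS, DIGITS = string.ascii_lowercase, string.digits
SIGMA = LETTERS + DIGITS

# Assumed NFA for the example: {(state, symbol): set of successor states}.
NFA = {}


def add(src, symbols, dst):
    for x in symbols:
        NFA.setdefault((src, x), set()).add(dst)


add(1, LETTERS, 2)
add(2, SIGMA, 2)
add(1, "i", 3)
add(3, "f", 4)
NFA_FINALS = {2, 4}  # assumption


def powerset_construction(nfa, start, finals, alphabet):
    """Return (table, start state, final states, all states) of the DFA."""
    start_set = frozenset({start})
    table, todo, seen = {}, [start_set], {start_set}
    while todo:
        subset = todo.pop()
        for x in alphabet:
            # T'({q1,...,qn}, x) = T(q1, x) ∪ … ∪ T(qn, x)
            target = frozenset().union(*(nfa.get((q, x), set()) for q in subset))
            if not target:
                continue  # the empty set acts as the dead state
            table[(subset, x)] = target
            if target not in seen:
                seen.add(target)
                todo.append(target)
    # F' = all subsets that include a final state of the NFA.
    dfa_finals = {s for s in seen if s & finals}
    return table, start_set, dfa_finals, seen


TABLE, DFA_START, DFA_FINALS, DFA_STATES = powerset_construction(NFA, 1, NFA_FINALS, SIGMA)

print(sorted(sorted(s) for s in DFA_STATES))  # → [[1], [2], [2, 3], [2, 4]]
```

The four reachable subsets correspond to the DFA states labeled 1, 2, 23 and 24 in the example.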
  98. 98. V. Summary
  102. 102. Summary (lessons learned) • What are the formalisms to describe regular languages? Regular grammars, regular expressions, finite state automata. • Why are these formalisms equivalent? Constructive proofs. • How can we generate compiler tools from that? Implement DFAs.
  105. 105. Literature (learn more) • Formal languages: Noam Chomsky: Three models for the description of language. 1956; J. E. Hopcroft, R. Motwani, J. D. Ullman: Introduction to Automata Theory, Languages, and Computation. 2006 • Lexical analysis: Andrew W. Appel, Jens Palsberg: Modern Compiler Implementation in Java, 2nd edition. 2002
  106. 106. Copyrights
  108. 108. Copyrights (pictures) • Slide 1: Book Scanner by Ben Woosley, some rights reserved • Slides 4, 5, 8: Noam Chomsky by Fellowsisters, some rights reserved • Slides 6, 7, 11, 15: Tiger by Bernard Landgraf, some rights reserved • Slide 18: Coffee Primo by Dominica Williamson, some rights reserved • Slide 32: Pine Creek by Nicholas, some rights reserved