Compiler Components and their Generators - Lexical Analysis

Presentation slides for lecture 11 of course IN4303 on Compiler Construction at TU Delft.

Notes
  • linguistic theory (Noam Chomsky)
  • restrictions on production rules => grammar classes
  • computer science: lexical syntax; can write this as a regular grammar

    1. Compiler Components & Generators: Lexical Analysis
       Guido Wachsmuth, Delft University of Technology
       Course IN4303 Compiler Construction
    2. Recap: Garbage Collection (lessons learned)
       How can we collect unreachable records on the heap?
         • reference counts
         • mark reachable records, sweep unreachable records
         • copy reachable records
       How can we reduce heap space needed for garbage collection?
         • pointer-reversal
         • breadth-first search
         • hybrid algorithms
    3. Overview (today’s lecture)
       lexical analysis
       regular languages
         • regular grammars
         • regular expressions
         • finite state automata
       equivalence of formalisms
         • constructive approach
       tool generation
    4. I: regular grammars
    5. Recap: A Theory of Language (formal languages)
       vocabulary Σ
         finite, nonempty set of elements (words, letters); alphabet
       string over Σ
         finite sequence of elements chosen from Σ; word, sentence, utterance
       formal language λ
         set of strings over a vocabulary Σ; λ ⊆ Σ*
    6. Recap: A Theory of Language (formal grammars)
       formal grammar G = (N, Σ, P, S)
         nonterminal symbols N
         terminal symbols Σ
         production rules P ⊆ (N∪Σ)* N (N∪Σ)* × (N∪Σ)*
           (left-hand side: a nonterminal symbol within its context; right-hand side: the replacement)
         start symbol S∈N
       grammar classes
         type-0, unrestricted
         type-1, context-sensitive: (a A c, a b c)
         type-2, context-free: P ⊆ N × (N∪Σ)*
         type-3, regular: (A, x) or (A, xB)
    7. Decimal Numbers (right regular grammar)
       Num → “0” Num    Num → “0”
       Num → “1” Num    Num → “1”
       Num → “2” Num    Num → “2”
       Num → “3” Num    Num → “3”
       Num → “4” Num    Num → “4”
       Num → “5” Num    Num → “5”
       Num → “6” Num    Num → “6”
       Num → “7” Num    Num → “7”
       Num → “8” Num    Num → “8”
       Num → “9” Num    Num → “9”
    8. Identifiers (right regular grammar)
       Id → “a” R    Id → “a”
       …             …
       Id → “z” R    Id → “z”
       R → “a” R     R → “a”
       …             …
       R → “z” R     R → “z”
       R → “0” R     R → “0”
       …             …
       R → “9” R     R → “9”
    9. Recap: A Theory of Language (formal languages)
       formal grammar G = (N, Σ, P, S)
       derivation relation ⇒G ⊆ (N∪Σ)* × (N∪Σ)*
         w ⇒G w’ ⇔ ∃(p, q)∈P: ∃u,v∈(N∪Σ)*: w = u p v ∧ w’ = u q v
       formal language L(G) ⊆ Σ*
         L(G) = {w∈Σ* | S ⇒G* w}
       classes of formal languages
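The derivation relation above can be tried out on the decimal-number grammar. The following is a minimal sketch, not part of the slides: the encoding of productions as pairs of symbol tuples and the names `step` and `derives` are assumptions made for illustration.

```python
DIGITS = "0123456789"

# Hypothetical encoding: a production (p, q) is a pair of symbol tuples.
# These are the Num productions of the decimal-number grammar.
P = [(("Num",), (d, "Num")) for d in DIGITS] + [(("Num",), (d,)) for d in DIGITS]


def step(w, productions):
    """All w' such that w = u p v and w' = u q v for some (p, q) in productions."""
    results = set()
    for p, q in productions:
        for i in range(len(w) - len(p) + 1):
            if w[i:i + len(p)] == p:
                results.add(w[:i] + q + w[i + len(p):])
    return results


def derives(start, target, productions):
    """Check whether target is reachable from start in zero or more steps.

    Forms longer than the target are pruned. That is only safe because no
    production of this particular grammar shortens a sentential form.
    """
    frontier, seen = {start}, {start}
    while frontier:
        if target in frontier:
            return True
        next_frontier = set()
        for w in frontier:
            for w2 in step(w, productions):
                if len(w2) <= len(target) and w2 not in seen:
                    seen.add(w2)
                    next_frontier.add(w2)
        frontier = next_frontier
    return False
```

For example, `derives(("Num",), ("4", "2"), P)` follows the derivation Num ⇒ 4 Num ⇒ 4 2, so the string 42 is in L(G).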
    10. II: regular expressions
    11. Recap: Regular Expressions (overview)
        basics
          • symbol from an alphabet
          • ε
        combinators
          • alternation: E1 | E2
          • concatenation: E1 E2
          • repetition: E*
          • optional: E? = E | ε
          • one or more: E+ = E E*
    12. Decimal Numbers & Identifiers (regular expressions)
        Num: (0|1|2|3|4|5|6|7|8|9)+
        Id: (a|…|z)(a|…|z|0|…|9)*
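As a quick check of the two expressions above, they can be written with Python's standard `re` module. This is a sketch for illustration only; character classes such as `[0-9]` are shorthand for the alternations on the slide, and the helper names `is_num` and `is_id` are made up here.

```python
import re

NUM = re.compile(r"[0-9]+")         # (0|1|2|3|4|5|6|7|8|9)+
ID = re.compile(r"[a-z][a-z0-9]*")  # (a|…|z)(a|…|z|0|…|9)*


def is_num(s):
    """True if the whole string s is a decimal number."""
    return NUM.fullmatch(s) is not None


def is_id(s):
    """True if the whole string s is an identifier."""
    return ID.fullmatch(s) is not None
```

`fullmatch` is used so that the entire string has to be in the language, rather than just a prefix of it.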
    13. Regular Expressions (formal languages)
        basics
          • L(a) = {“a”}
          • L(ε) = {“”}
        combinators
          • L(E1 | E2) = L(E1) ∪ L(E2)
          • L(E1 E2) = L(E1) · L(E2)
          • L(E*) = L(E)*
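The equations above define L(E) by structural recursion, which translates almost directly into code. Since L(E*) is usually infinite, the sketch below only enumerates strings up to a length bound. The tuple-based syntax tree and the function name `lang` are assumptions for illustration.

```python
from itertools import product

# Hypothetical syntax tree for regular expressions:
#   ("sym", a), ("eps",), ("alt", e1, e2), ("cat", e1, e2), ("star", e)


def lang(e, max_len):
    """The strings of L(e) that have at most max_len characters."""
    kind = e[0]
    if kind == "sym":                      # L(a) = {"a"}
        return {e[1]} if len(e[1]) <= max_len else set()
    if kind == "eps":                      # L(ε) = {""}
        return {""}
    if kind == "alt":                      # union
        return lang(e[1], max_len) | lang(e[2], max_len)
    if kind == "cat":                      # concatenation of languages
        left, right = lang(e[1], max_len), lang(e[2], max_len)
        return {u + v for u, v in product(left, right) if len(u + v) <= max_len}
    if kind == "star":                     # Kleene closure, cut off at max_len
        base = lang(e[1], max_len)
        result, frontier = {""}, {""}
        while frontier:
            frontier = {u + v for u in frontier for v in base
                        if len(u + v) <= max_len} - result
            result |= frontier
        return result
    raise ValueError(f"unknown node kind: {kind}")
```

For instance, `lang(("cat", ("sym", "a"), ("star", ("sym", "b"))), 3)` yields the strings a, ab and abb.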
    14. III: finite automata
    15. Finite Automata (formal definition)
        finite automaton M = (Q, Σ, T, q0, F)
          states Q
          input symbols Σ
          transition function T
          start state q0∈Q
          final states F⊆Q
        transition function
          nondeterministic FA: T : Q × Σ → P(Q)
          NFA with ε-moves: T : Q × (Σ ∪ {ε}) → P(Q)
          deterministic FA: T : Q × Σ → Q
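    16. Decimal Numbers & Identifiers (finite automata)
        [diagrams: automaton for Num with states 1 and 2 and transitions labeled 0-9; automaton for Id with states 1 and 2 and transitions labeled a-z and 0-9]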
    17. Nondeterministic Finite Automata (formal languages)
        finite automaton M = (Q, Σ, T, q0, F)
        transition function T : Q × Σ → P(Q)
          T({q1,..., qn}, x) := T(q1, x) ∪ … ∪ T(qn, x)
          T*({q1,..., qn}, ε) := {q1,..., qn}
          T*({q1,..., qn}, xw) := T*(T({q1,..., qn}, x), w)
        formal language L(M) ⊆ Σ*
          L(M) = {w∈Σ* | T*({q0}, w) ∩ F ≠ ∅}
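The definitions above read almost like a program. A minimal sketch, assuming the transition function is stored as a dictionary from (state, symbol) to a set of states; the example automaton with states `"N"` and `"f"` for decimal numbers is an illustration, not taken from this slide.

```python
DIGITS = "0123456789"

# Hypothetical example NFA for decimal numbers: on a digit, state "N" can
# either stay in "N" or move to the final state "f".
T = {("N", d): {"N", "f"} for d in DIGITS}


def step(T, states, x):
    """T lifted to sets of states: the union of T(q, x) over all q in states."""
    result = set()
    for q in states:
        result |= T.get((q, x), set())
    return result


def run(T, states, w):
    """T* on a set of states, following the two defining equations."""
    if w == "":
        return set(states)
    return run(T, step(T, states, w[0]), w[1:])


def accepts(T, q0, F, w):
    """w is in L(M) iff running from {q0} reaches at least one final state."""
    return bool(run(T, {q0}, w) & F)
```

A missing dictionary entry stands for the empty set of successor states, so the automaton simply gets stuck on unexpected input.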
    18. Deterministic Finite Automata (formal languages)
        finite automaton M = (Q, Σ, T, q0, F)
        transition function T : Q × Σ → Q
          T*(q, ε) := q
          T*(q, xw) := T*(T(q, x), w)
        formal language L(M) ⊆ Σ*
          L(M) = {w∈Σ* | T*(q0, w) ∈ F}
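For a deterministic automaton, T* tracks a single state instead of a set. The sketch below encodes an identifier automaton with states 1 and 2 as a dictionary. Treating a missing entry as "reject" is an assumption of this sketch, since the formal definition uses a total transition function.

```python
import string

# Identifier automaton: state 1 is the start state, state 2 is final.
T = {}
for c in string.ascii_lowercase:
    T[(1, c)] = 2
    T[(2, c)] = 2
for d in string.digits:
    T[(2, d)] = 2


def run(q, w):
    """T*(q, w), computed iteratively; None if some transition is missing."""
    for x in w:
        q = T.get((q, x))
        if q is None:
            return None
    return q


def accepts(w, q0=1, F=frozenset({2})):
    """w is in L(M) iff T*(q0, w) is a final state."""
    return run(q0, w) in F
```

The loop replaces the recursion in the definition: each iteration consumes the first symbol x of the remaining input xw.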
    19. coffee break
    20. IV: equivalence
    21-22. Regular Languages (formalisms)
        [diagram relating the formalisms: left regular grammars, right regular grammars, regular expressions, NFAs with ε-moves, NFAs, DFAs]
    23. NFA construction (right regular grammar)
        formal grammar G = (N, Σ, P, S)
        finite automaton M = (N ∪ {f}, Σ, T, S, F)
        transition function T
          (X, aY)∈P : (X, a, Y)∈T
          (X, a)∈P : (X, a, f)∈T
        final states F
          (S, ε)∈P : F = {S, f}
          else: F = {f}
    24. NFA construction (example)
        Num → “0” Num    Num → “0”
        Num → “1” Num    Num → “1”
        Num → “2” Num    Num → “2”
        Num → “3” Num    Num → “3”
        Num → “4” Num    Num → “4”
        Num → “5” Num    Num → “5”
        Num → “6” Num    Num → “6”
        Num → “7” Num    Num → “7”
        Num → “8” Num    Num → “8”
        Num → “9” Num    Num → “9”
        [diagram: resulting NFA with states N and f and two transitions labeled 0-9]
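The construction from a right regular grammar can be sketched in a few lines. The encoding of productions as `(X, rhs)` pairs, the function names, and the choice of the string `"f"` for the extra final state are assumptions for illustration; the rules themselves follow the construction above.

```python
DIGITS = "0123456789"

# Hypothetical encoding: (X, (a, Y)) for X → a Y, (X, (a,)) for X → a,
# and (X, ()) for X → ε.
NUM_GRAMMAR = [("Num", (d, "Num")) for d in DIGITS] + [("Num", (d,)) for d in DIGITS]


def grammar_to_nfa(productions, start, final="f"):
    """Build (T, q0, F); final is assumed not to clash with a nonterminal name."""
    T, F = {}, {final}
    for X, rhs in productions:
        if len(rhs) == 2:            # (X, aY) in P gives (X, a, Y) in T
            a, Y = rhs
            T.setdefault((X, a), set()).add(Y)
        elif len(rhs) == 1:          # (X, a) in P gives (X, a, f) in T
            T.setdefault((X, rhs[0]), set()).add(final)
        elif X == start:             # (S, ε) in P makes S final as well
            F.add(start)
        # ε-productions of other nonterminals are not covered by the rules above
    return T, start, F


def accepts(nfa, w):
    """Simulate the NFA on w by tracking the set of reachable states."""
    T, q0, F = nfa
    current = {q0}
    for x in w:
        current = {r for q in current for r in T.get((q, x), ())}
    return bool(current & F)
```

For the decimal-number grammar this yields two states: `"Num"` loops on every digit and also moves to `"f"` on every digit.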
    25. NFA construction (regular expressions)
        [diagrams: NFA fragments for a, (x) and x*, connected by ε-moves]
    26. NFA construction (regular expressions)
        [diagrams: NFA fragments for x|y and xy, connected by ε-moves]
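One common way to build such fragments is in the style of Thompson's construction; the exact fragments drawn on the slides may differ from this sketch. The tuple-based syntax tree, the use of `None` as the ε label, and all function names are assumptions for illustration.

```python
from itertools import count

EPS = None          # label used for ε-moves in this sketch
_fresh = count()    # supplies fresh state numbers


def build(e, T):
    """Add an NFA fragment for e to T and return its (start, end) states.

    e is ("sym", a), ("eps",), ("alt", e1, e2), ("cat", e1, e2) or ("star", e1).
    T maps (state, label) to a set of target states.
    """
    s, t = next(_fresh), next(_fresh)

    def edge(p, label, q):
        T.setdefault((p, label), set()).add(q)

    kind = e[0]
    if kind == "sym":
        edge(s, e[1], t)
    elif kind == "eps":
        edge(s, EPS, t)
    elif kind == "alt":              # branch into either alternative
        for sub in e[1:]:
            a, b = build(sub, T)
            edge(s, EPS, a)
            edge(b, EPS, t)
    elif kind == "cat":              # chain the two fragments
        a1, b1 = build(e[1], T)
        a2, b2 = build(e[2], T)
        edge(s, EPS, a1)
        edge(b1, EPS, a2)
        edge(b2, EPS, t)
    elif kind == "star":             # skip the body, or repeat it
        a, b = build(e[1], T)
        edge(s, EPS, a)
        edge(s, EPS, t)
        edge(b, EPS, a)
        edge(b, EPS, t)
    else:
        raise ValueError(f"unknown node kind: {kind}")
    return s, t


def closure(T, states):
    """All states reachable from states using ε-moves only."""
    stack, seen = list(states), set(states)
    while stack:
        q = stack.pop()
        for r in T.get((q, EPS), ()):
            if r not in seen:
                seen.add(r)
                stack.append(r)
    return seen


def accepts(e, w):
    """Build the NFA for e and simulate it on w."""
    T = {}
    s, t = build(e, T)
    current = closure(T, {s})
    for x in w:
        moved = set()
        for q in current:
            moved |= T.get((q, x), set())
        current = closure(T, moved)
    return t in current
```

Each fragment has exactly one start and one end state, which is what makes the fragments composable without inspecting their insides.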
    27. NFA construction (ε elimination)
        additional final states
          • states with ε-moves into final states
          • become final states themselves
        additional transitions
          • ε-move from source to target state
          • transitions from target state
          • add these transitions to the source state
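The two rules above can be applied to every state at once by first computing which states it reaches through ε-moves. A minimal sketch, assuming ε-moves are stored under the label `None` and that chains of several ε-moves are followed transitively; `eliminate_eps` is a hypothetical name.

```python
def eliminate_eps(states, T, F, eps=None):
    """Return (new_T, new_F) for an equivalent NFA without ε-moves.

    T maps (state, label) to a set of target states; eps is the ε label.
    """
    def closure(q):
        # All states reachable from q by zero or more ε-moves.
        stack, seen = [q], {q}
        while stack:
            p = stack.pop()
            for r in T.get((p, eps), ()):
                if r not in seen:
                    seen.add(r)
                    stack.append(r)
        return seen

    new_T, new_F = {}, set()
    for q in states:
        reachable = closure(q)
        # A state with ε-moves into a final state becomes final itself.
        if reachable & F:
            new_F.add(q)
        # Copy the non-ε transitions of every ε-reachable state to q.
        for (p, x), targets in T.items():
            if p in reachable and x is not eps:
                new_T.setdefault((q, x), set()).update(targets)
    return new_T, new_F
```

States that were only reachable through ε-moves may become unreachable afterwards; the sketch leaves them in place rather than pruning them.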
    28. Powerset construction (eliminating nondeterminism)
        nondeterministic finite automaton M = (Q, Σ, T, q0, F)
        deterministic finite automaton M’ = (P(Q), Σ, T’, {q0}, F’)
        transition function T’
          • T’({q1,..., qn}, x) = T({q1,..., qn}, x) = T(q1, x) ∪ … ∪ T(qn, x)
        final states F’ = {S | S⊆Q, S∩F ≠ ∅}
          • all states that include a final state of the original NFA
    29. Powerset construction (example)
        [diagrams: an NFA with states 1, 2, 3, 4 and the resulting DFA with states 1, 2, 23, 24; transition labels include i, f, a-z, 0-9, a-h, j-z and a-e,g-z]
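The powerset construction can be sketched as a worklist algorithm. Unlike the definition, which takes all of P(Q) as the state set, this sketch only builds the subsets reachable from {q0}; that restriction, the dictionary encoding, and the function name are assumptions for illustration.

```python
def powerset_construction(T, q0, F, alphabet):
    """Return (dfa_T, start, dfa_F) for an NFA without ε-moves.

    T maps (state, symbol) to a set of states. DFA states are frozensets
    of NFA states; the empty frozenset acts as a dead state.
    """
    start = frozenset({q0})
    dfa_T, todo, seen = {}, [start], {start}
    while todo:
        S = todo.pop()
        for x in alphabet:
            # T'(S, x) is the union of T(q, x) over all q in S.
            target = frozenset(r for q in S for r in T.get((q, x), ()))
            dfa_T[(S, x)] = target
            if target not in seen:
                seen.add(target)
                todo.append(target)
    # A subset is final if it contains a final state of the NFA.
    dfa_F = {S for S in seen if S & F}
    return dfa_T, start, dfa_F
```

Applied to a two-state NFA for decimal numbers (state `"N"` goes to both `"N"` and `"f"` on every digit), this produces just two DFA states, {N} and {N, f}, of which the second is final.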
    30. V: summary
    31. Summary (lessons learned)
        What are the formalisms to describe regular languages?
          • regular grammars
          • regular expressions
          • finite state automata
        Why are these formalisms equivalent?
          • constructive proofs
        How can we generate compiler tools from that?
          • implement DFAs
          • generate transition tables
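To illustrate the last point, here is a minimal sketch of a table-driven scanner for Num and Id tokens. The table is written by hand where a generator would emit it, and the longest-match rule and the skipping of spaces are assumptions of this sketch rather than something stated on the slides.

```python
import string

# Combined DFA: state 0 is the start state, 1 means "inside a Num",
# 2 means "inside an Id".
TABLE = {}
for d in string.digits:
    TABLE[(0, d)] = 1
    TABLE[(1, d)] = 1
    TABLE[(2, d)] = 2
for c in string.ascii_lowercase:
    TABLE[(0, c)] = 2
    TABLE[(2, c)] = 2

TOKEN_OF = {1: "Num", 2: "Id"}  # final states and the token kind they signal


def tokenize(text):
    """Split text into (kind, lexeme) pairs using the longest match."""
    tokens, i = [], 0
    while i < len(text):
        if text[i] == " ":
            i += 1
            continue
        state, j, last = 0, i, None
        # Follow the table as far as possible, remembering the last final state.
        while j < len(text) and (state, text[j]) in TABLE:
            state = TABLE[(state, text[j])]
            j += 1
            if state in TOKEN_OF:
                last = (TOKEN_OF[state], text[i:j])
        if last is None:
            raise ValueError(f"unexpected character at position {i}")
        tokens.append(last)
        i += len(last[1])
    return tokens
```

The scanning loop is independent of the token definitions: changing the lexical syntax only changes `TABLE` and `TOKEN_OF`, which is what makes generating them attractive.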
    32. Literature (learn more)
        formal languages
          Noam Chomsky: Three models for the description of language. 1956
          J. E. Hopcroft, R. Motwani, J. D. Ullman: Introduction to Automata Theory, Languages, and Computation. 2006
        lexical analysis
          Andrew W. Appel, Jens Palsberg: Modern Compiler Implementation in Java, 2nd edition. 2002
    33. Outlook (coming next)
        compiler components and their generators
          • Lecture 12: Syntactical Analysis
          • Lecture 13: SDF inside
          • Lecture 14: Static Analysis
        Lab Nov 26
          • test cases
          • finish Lab 2, do reference resolution next
          • finish Lab 3, do hover help next
          • content completion
    34. questions
    35. credits
    36. Pictures (copyrights)
        Slide 1: Book Scanner by Ben Woosley, some rights reserved
        Slides 5, 6, 9: Noam Chomsky by Fellowsisters, some rights reserved
        Slides 7, 8, 12, 16: Tiger by Bernard Landgraf, some rights reserved
        Slide 19: Coffee Primo by Dominica Williamson, some rights reserved
        Slide 33: Pine Creek by Nicholas, some rights reserved
        Slide 34: Questions by Oberazzi, some rights reserved
        Slide 35: Too Much Credit by Andres Rueda, some rights reserved
