MELJUN CORTES Automata Theory (Automata12)

Published in: Technology

  1. 1. MELJUN P. CORTES, MBA, MPA, BSCS, ACS. CSC 3130: Automata theory and formal languages. Parsers for programming languages
  2. 2. CFG of the Java programming language
     Identifier: IDENTIFIER
     QualifiedIdentifier: Identifier { . Identifier }
     Literal:
         IntegerLiteral
         FloatingPointLiteral
         CharacterLiteral
         StringLiteral
         BooleanLiteral
         NullLiteral
     Expression: Expression1 [AssignmentOperator Expression1]
     AssignmentOperator: = += -= *= /= &= |= …
     from http://java.sun.com/docs/books/jls/second_edition/html/syntax.doc.html#52996
  3. 3. Parsing Java programs
     class Point2d {
         /* The X and Y coordinates of the point--instance variables */
         private double x;
         private double y;
         private boolean debug; // A trick to help with debugging
         public Point2d (double px, double py) { // Constructor
             x = px;
             y = py;
             debug = false; // turn off debugging
         }
         public Point2d () { // Default constructor
             this (0.0, 0.0); // Invokes 2 parameter Point2D constructor
         }
         // Note that a this() invocation must be the BEGINNING of
         // statement body of constructor
         public Point2d (Point2d pt) { // Another constructor
             x = pt.getX();
             y = pt.getY();
         }
         …
     }
     Simple Java program: about 1000 symbols
  4. 4. Parsing algorithms
     • How long would it take to parse this?
       – exhaustive algorithm: about 10^80 years (longer than the life of the universe)
       – CYK algorithm: about 1 week!
     • Can we parse faster?
     • No! CYK is the fastest known general-purpose parsing algorithm
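To make the CYK algorithm mentioned above concrete, here is a minimal recognizer sketch in Python. The grammar is a hypothetical example in Chomsky normal form for the language a^n b^n (it is not taken from the slides), and the names `BINARY`, `TERMINAL`, and `cyk` are illustrative. The three nested loops over length, start position, and split point are what give CYK its cubic running time in the input length.

```python
from itertools import product

# Hypothetical grammar in Chomsky normal form for { a^n b^n : n >= 1 }:
#   S -> AB | AX,  X -> SB,  A -> a,  B -> b
BINARY = {("A", "B"): {"S"}, ("A", "X"): {"S"}, ("S", "B"): {"X"}}
TERMINAL = {"a": {"A"}, "b": {"B"}}

def cyk(word, start="S"):
    """Return True if `word` is derivable from `start` (CYK recognition)."""
    n = len(word)
    if n == 0:
        return False  # this grammar has no epsilon production
    # table[i][l] = set of variables deriving word[i : i + l + 1]
    table = [[set() for _ in range(n)] for _ in range(n)]
    for i, ch in enumerate(word):
        table[i][0] = set(TERMINAL.get(ch, ()))
    for length in range(2, n + 1):               # substring length
        for i in range(n - length + 1):          # substring start
            for split in range(1, length):       # size of the left part
                left = table[i][split - 1]
                right = table[i + split][length - split - 1]
                for pair in product(left, right):
                    table[i][length - 1] |= BINARY.get(pair, set())
    return start in table[0][n - 1]

print(cyk("aabb"), cyk("aab"))
```

With this grammar, `cyk("aabb")` is true and `cyk("aab")` is false.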
  5. 5. Another way of thinking
     • Scientist: Find an algorithm that can parse strings in any grammar
     • Engineer: Design your grammar so it has a very fast parsing algorithm
  6. 6. An example
     Grammar: S → Tc (1)   T → TA (2) | A (3)   A → aTb (4) | ab (5)
     Input: abaabbc
     Stack   Input     Action
     ε       abaabbc   shift
     a       baabbc    shift
     ab      aabbc     reduce (5)
     A       aabbc     reduce (3)
     T       aabbc     shift
     Ta      abbc      shift
     Taa     bbc       shift
     Taab    bc        reduce (5)
     TaA     bc        reduce (3)
     TaT     bc        shift
     TaTb    c         reduce (4)
     TA      c         reduce (2)
     T       c         shift
     Tc      ε         reduce (1)
     S       ε
     (Figure: parse tree of abaabbc.)
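The trace above can be replayed mechanically. The following Python sketch applies the Action column step by step to the input abaabbc and checks that every reduce really finds its right-hand side on top of the stack. The names `PRODUCTIONS`, `ACTIONS`, and `replay` are illustrative; the stack is kept as a plain string for brevity.

```python
# Productions numbered as on the slide: number -> (left side, right side)
PRODUCTIONS = {
    1: ("S", "Tc"),
    2: ("T", "TA"),
    3: ("T", "A"),
    4: ("A", "aTb"),
    5: ("A", "ab"),
}

# Action column of the trace; an int means "reduce by that production"
ACTIONS = ["shift", "shift", 5, 3, "shift", "shift", "shift", 5, 3,
           "shift", 4, 2, "shift", 1]

def replay(word, actions):
    """Apply the actions in order and return the stack after each step."""
    stack, rest, history = "", word, []
    for action in actions:
        if action == "shift":
            # move the next input symbol onto the stack
            stack, rest = stack + rest[0], rest[1:]
        else:
            left, right = PRODUCTIONS[action]
            assert stack.endswith(right), f"handle {right} not on top of {stack}"
            # replace the right-hand side on top of the stack by the variable
            stack = stack[: len(stack) - len(right)] + left
        history.append(stack)
    assert rest == "", "input not fully consumed"
    return history

history = replay("abaabbc", ACTIONS)
print(history)
```

The returned list reproduces the Stack column from `a` down to `S`.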
  7. 7. Items
     Grammar: S → Tc (1)   T → TA (2)   T → A (3)   A → aTb (4)   A → ab (5)
     Items:
       S → •Tc    S → T•c    S → Tc•
       T → •TA    T → T•A    T → TA•
       T → •A     T → A•
       A → •aTb   A → a•Tb   A → aT•b   A → aTb•
       A → •ab    A → a•b    A → ab•
     Stack   Input     Action
     ε •     abaabbc   shift
     a •     baabbc    shift
     ab •    aabbc     reduce (5)
     A •     aabbc     reduce (3)
     T •     aabbc     shift
     Ta •    abbc      shift
     Idea of parsing algorithm: Try to match complete items to top of stack
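As a small illustration of items, the Python sketch below enumerates every item of the grammar by placing the dot at each position of each right-hand side, and then lists the complete items whose right-hand side matches the top of a given stack. The representation of an item as a triple `(left, before_dot, after_dot)` and the function names are assumptions made for this sketch.

```python
PRODUCTIONS = [("S", "Tc"), ("T", "TA"), ("T", "A"), ("A", "aTb"), ("A", "ab")]

def items(productions):
    """List every item (left, before_dot, after_dot) of the grammar."""
    return [(left, right[:i], right[i:])
            for left, right in productions
            for i in range(len(right) + 1)]

def show(item):
    left, before, after = item
    return f"{left} → {before}•{after}"

ALL_ITEMS = items(PRODUCTIONS)
# An item is complete when nothing is left after the dot
COMPLETE = [it for it in ALL_ITEMS if it[2] == ""]

def matching_reductions(stack):
    """Complete items whose right side sits on top of the stack."""
    return [show(it) for it in COMPLETE if stack.endswith(it[1])]

print(len(ALL_ITEMS))
print(matching_reductions("Taab"))
print(matching_reductions("TA"))
```

For the stack `TA` both `T → TA•` and `T → A•` match the top, so matching complete items alone does not always pick a unique reduction; tracking which items are valid is what resolves this.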
  8. 8. Some terminology
     Grammar: S → Tc (1)   T → TA (2) | A (3)   A → aTb (4) | ab (5)
     Input: abaabbc
     Stack   Input     Action
     ε       abaabbc   shift
     a       baabbc    shift
     ab      aabbc     reduce (5)
     A       aabbc     reduce (3)
     T       aabbc     shift
     Ta      abbc      shift
     Taa     bbc       shift
     Taab    bc        reduce (5)
     TaA     bc        reduce (3)
     TaT     bc        shift
     TaTb    c         reduce (4)
     TA      c         reduce (2)
     T       c         shift
     Tc      ε         reduce (1)
     S       ε
     Labels on the slide: handle; valid items: a•Tb, a•b; valid items: T•a, T•c, aT•b
  9. 9. Outline of LR(0) parsing algorithm
     • As the string is being read, it is pushed on a stack
     • Algorithm keeps track of all valid items
     • Algorithm can perform two actions:
       – shift: when no complete item is valid
       – reduce: when there is one valid item, and it is complete
  10. 10. Running the algorithm
      Grammar: A → aAb | ab        Derivation: A ⇒ aAb ⇒ aabb
      Stack   Input   Action   Valid items
      ε       aabb    S        A → •aAb, A → •ab
      a       abb     S        A → a•Ab, A → a•b, A → •aAb, A → •ab
      aa      bb      S        A → a•Ab, A → a•b, A → •aAb, A → •ab
      aab     b       R        A → ab•
      aA      b       S        A → aA•b
      aAb     ε       R        A → aAb•
      A       ε
  12. 12. How to update valid items
      • Initial set of valid items: S → •α for every production S → α
      • Updating valid items on “shift b”:
        – A → α•bβ is updated to A → αb•β
        – A → α•Xβ disappears if X ≠ b
        – After these updates, for every valid item A → α•Cβ and production C → δ, we also add C → •δ as a valid item
      • Notation: a, b: terminals; A, B: variables; X, Y: mixed symbols; α, β: mixed strings
  13. 13. How to update valid items
      • Updating valid items on “reduce β to B”:
        – First, we backtrack to the valid items before the reduce (that is, before β was pushed)
        – Then, we apply the same rules as for “shift B” (as if B were a terminal):
          A → α•Bβ is updated to A → αB•β
          A → α•Xβ disappears if X ≠ B
          C → •δ is added for every valid item A → α•Cβ and production C → δ
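The update rules for valid items can be sketched in Python as follows, using the grammar A → aAb | ab from the running example. Items are triples `(left, before_dot, after_dot)`; all function names are illustrative. One simplification is an assumption of this sketch: instead of remembering earlier item sets in order to backtrack on a reduce, `valid_items` recomputes the set from the initial one by advancing over every stack symbol, which yields the same result as "backtrack, then shift B".

```python
PRODUCTIONS = [("A", "aAb"), ("A", "ab")]
START = "A"
VARIABLES = {left for left, _ in PRODUCTIONS}

def closure(item_set):
    """Add C → •δ for every item with the dot in front of a variable C."""
    result, todo = set(item_set), list(item_set)
    while todo:
        _, _, after = todo.pop()
        if after and after[0] in VARIABLES:
            for left, right in PRODUCTIONS:
                new = (left, "", right)
                if left == after[0] and new not in result:
                    result.add(new)
                    todo.append(new)
    return frozenset(result)

def initial():
    """Initial valid items: S → •α for every production of the start variable."""
    return closure({(l, "", r) for l, r in PRODUCTIONS if l == START})

def advance(item_set, symbol):
    """Update rule for 'shift symbol' (a terminal, or a variable after a reduce)."""
    moved = {(l, before + symbol, after[1:])
             for l, before, after in item_set
             if after and after[0] == symbol}   # all other items disappear
    return closure(moved)

def valid_items(stack):
    """Valid items for a stack, recomputed from the initial set."""
    current = initial()
    for symbol in stack:
        current = advance(current, symbol)
    return current

def show(item_set):
    return sorted(f"{l} → {b}•{a}" for l, b, a in item_set)

for stack in ["", "a", "aa", "aab", "aA", "aAb"]:
    print(repr(stack), show(valid_items(stack)))
```

For example, reducing `ab` to `A` on the stack `aab` corresponds to `valid_items("aA")`, which contains only `A → aA•b`.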
  14. 14. Valid item updates by εNFA
      • States of εNFA will be items (plus a start state q0)
      • For every item S → •α we have a transition: q0 --ε--> (S → •α)
      • For every item A → α•Xβ we have a transition: (A → α•Xβ) --X--> (A → αX•β)
      • For every item A → α•Cβ and production C → δ we have a transition: (A → α•Cβ) --ε--> (C → •δ)
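  15. 15. Example
      Grammar: A → aAb | ab
      Transitions of the εNFA:
        q0 --ε--> (A → •aAb)
        q0 --ε--> (A → •ab)
        (A → •aAb) --a--> (A → a•Ab)
        (A → a•Ab) --A--> (A → aA•b)
        (A → aA•b) --b--> (A → aAb•)
        (A → a•Ab) --ε--> (A → •aAb)
        (A → a•Ab) --ε--> (A → •ab)
        (A → •ab) --a--> (A → a•b)
        (A → a•b) --b--> (A → ab•)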
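The three construction rules for the εNFA translate almost literally into code. The Python sketch below builds the transition set for the grammar A → aAb | ab; the triple representation of items and the name `build_enfa` are assumptions of this sketch.

```python
PRODUCTIONS = [("A", "aAb"), ("A", "ab")]
START = "A"
VARIABLES = {left for left, _ in PRODUCTIONS}
EPS = "ε"

def build_enfa():
    """Return the set of (source, label, target) transitions of the εNFA."""
    items = [(l, r[:i], r[i:]) for l, r in PRODUCTIONS for i in range(len(r) + 1)]
    transitions = set()
    for item in items:
        left, before, after = item
        # Rule 1: q0 --ε--> S → •α for the start variable S
        if left == START and before == "":
            transitions.add(("q0", EPS, item))
        if after:
            symbol = after[0]
            # Rule 2: A → α•Xβ --X--> A → αX•β
            transitions.add((item, symbol, (left, before + symbol, after[1:])))
            # Rule 3: A → α•Cβ --ε--> C → •δ for every production C → δ
            if symbol in VARIABLES:
                for l, r in PRODUCTIONS:
                    if l == symbol:
                        transitions.add((item, EPS, (l, "", r)))
    return transitions

TRANSITIONS = build_enfa()
print(len(TRANSITIONS))
```

For this grammar the construction yields nine transitions, four of them labeled ε.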
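  16. 16. Convert εNFA to DFA
      States (sets of valid items):
        1: A → •aAb, A → •ab
        2: A → a•Ab, A → a•b, A → •aAb, A → •ab
        3: A → ab•
        4: A → aA•b
        5: A → aAb•
        plus a dead state, labeled "die"
      Transitions: 1 --a--> 2, 2 --a--> 2, 2 --b--> 3, 2 --A--> 4, 4 --b--> 5
      • states correspond to sets of valid items
      • transitions are labeled by variables / terminals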
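The conversion to a DFA is the usual subset construction. The sketch below, in Python, computes ε-closures and moves directly on items of the grammar A → aAb | ab. Two simplifications are assumptions of this sketch: the start state q0 is folded into its ε-closure, and the dead state is represented by missing entries in `delta` rather than as an explicit state.

```python
PRODUCTIONS = [("A", "aAb"), ("A", "ab")]
START = "A"
VARIABLES = {l for l, _ in PRODUCTIONS}

def eps_targets(item):
    """Items reachable from `item` by one ε-move of the εNFA."""
    _, _, after = item
    if after and after[0] in VARIABLES:
        return {(l, "", r) for l, r in PRODUCTIONS if l == after[0]}
    return set()

def eps_closure(items):
    result, todo = set(items), list(items)
    while todo:
        for nxt in eps_targets(todo.pop()):
            if nxt not in result:
                result.add(nxt)
                todo.append(nxt)
    return frozenset(result)

def move(state, symbol):
    """Follow `symbol` from every item in `state`, then close under ε."""
    return eps_closure({(l, b + symbol, a[1:])
                        for l, b, a in state if a and a[0] == symbol})

def build_dfa():
    start = eps_closure({(l, "", r) for l, r in PRODUCTIONS if l == START})
    states, delta, todo = {start}, {}, [start]
    while todo:
        state = todo.pop()
        # only symbols that appear right after a dot lead to a live state
        for symbol in sorted({a[0] for _, _, a in state if a}):
            target = move(state, symbol)
            delta[(state, symbol)] = target
            if target not in states:
                states.add(target)
                todo.append(target)
    return start, states, delta

start, states, delta = build_dfa()
print(len(states), len(delta))
```

The result has five live states and five transitions, matching states 1 to 5 above.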
  17. 17. Attempt at parsing with DFA
      Grammar: A → aAb | ab        Derivation: A ⇒ aAb ⇒ aabb
      Stack   Input   Action   DFA state   Valid items
      ε       aabb    S        1           A → •aAb, A → •ab
      a       abb     S        2           A → a•Ab, A → a•b, A → •aAb, A → •ab
      aa      bb      S        2           A → a•Ab, A → a•b, A → •aAb, A → •ab
      aab     b       R        3           A → ab•
      aA      b                ?           A → aA•b
  18. 18. Remember the state in stack!
      Grammar: A → aAb | ab        Derivation: A ⇒ aAb ⇒ aabb
      Stack      Input   Action   DFA state   Valid items
      1          aabb    S        1           A → •aAb, A → •ab
      1a2        abb     S        2           A → a•Ab, A → a•b, A → •aAb, A → •ab
      1a2a2      bb      S        2           A → a•Ab, A → a•b, A → •aAb, A → •ab
      1a2a2b3    b       R        3           A → ab•
      1a2A4      b       S        4           A → aA•b
      1a2A4b5    ε       R        5           A → aAb•
      1A         ε
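Keeping DFA states on the stack, as in the trace above, can be sketched as a small table-driven parser in Python. The tables `DELTA` and `REDUCE` are transcribed from the five-state DFA for A → aAb | ab; the function name `parse` and the convention of returning the list of stacks are assumptions of this sketch. After a reduce, the state exposed on the stack tells the parser where to continue, which is exactly the information that was missing in the previous attempt.

```python
# DFA from the slides: (state, symbol) -> state; missing entries mean "die"
DELTA = {(1, "a"): 2, (2, "a"): 2, (2, "b"): 3, (2, "A"): 4, (4, "b"): 5}
# States that contain exactly one, complete item: state -> (left, right)
REDUCE = {3: ("A", "ab"), 5: ("A", "aAb")}
START_VARIABLE = "A"

def parse(word):
    """Return the trace of stacks if `word` is accepted, else None."""
    stack, pos, trace = [1], 0, []
    while True:
        trace.append("".join(map(str, stack)))
        state = stack[-1]
        if state in REDUCE:
            left, right = REDUCE[state]
            # pop the handle: one symbol and one state per right-side symbol
            del stack[len(stack) - 2 * len(right):]
            if left == START_VARIABLE and stack == [1] and pos == len(word):
                trace.append("1" + left)
                return trace
            target = DELTA.get((stack[-1], left))
            if target is None:
                return None
            stack += [left, target]
        else:
            if pos == len(word):
                return None          # input exhausted without a final reduce
            target = DELTA.get((state, word[pos]))
            if target is None:
                return None          # the DFA died
            stack += [word[pos], target]
            pos += 1

print(parse("aabb"))
print(parse("aab"))
```

For the input `aabb` the returned trace is the Stack column of the table, from `1` to `1A`; for `aab` the parser returns `None`.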
  19. 19. LR(0) grammars and deterministic PDAs
      • The parsing procedure can be implemented by a deterministic pushdown automaton
      • A PDA is deterministic if in every state there is at most one possible transition
        – for every input symbol and pop symbol, including ε
      • Example: PDA for w#w^R is deterministic, but PDA for ww^R is not
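To see why the marker # makes the PDA deterministic, here is a minimal Python simulation of a deterministic PDA for w#w^R over the alphabet {a, b}. The two-state design ("push" before the marker, "pop" after it) and the function name are assumptions of this sketch. Without the marker, as in ww^R, the automaton would have to guess where the middle of the input is.

```python
def accepts_w_hash_wr(word, alphabet="ab"):
    """Simulate a deterministic PDA for { w#w^R : w over `alphabet` }."""
    stack, state = [], "push"
    for ch in word:
        if state == "push":
            if ch == "#":
                state = "pop"          # the marker tells us where w ends
            elif ch in alphabet:
                stack.append(ch)
            else:
                return False
        else:  # state == "pop": match the input against the stack
            if not stack or stack.pop() != ch:
                return False
    # accept only if the marker was seen and the stack is empty
    return state == "pop" and not stack

print(accepts_w_hash_wr("ab#ba"), accepts_w_hash_wr("ab#ab"))
```

In every configuration at most one move is possible, so no backtracking is needed.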
  20. 20. LR(0) grammars and deterministic PDAs
      • Not every PDA can be made deterministic
      • Since PDAs recognize exactly the context-free languages, the LR(0) parsing algorithm must fail for some CFLs!
      • When does the LR(0) parsing algorithm fail?
  21. 21. Outline of LR(0) parsing algorithm
      • Algorithm can perform two actions:
        – shift (S): when no complete item is valid
        – reduce (R): when there is one valid item, and it is complete
      • What if:
        – some valid items complete, some not: S / R conflict
        – more than one valid complete item: R / R conflict
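The two kinds of conflict can be checked directly on a set of valid items. The Python sketch below follows the conditions exactly as stated above; items are triples `(left, before_dot, after_dot)`, and the two conflicting item sets in the usage lines are hypothetical examples, not taken from the slides.

```python
def conflicts(item_set):
    """Classify a set of valid items (left, before_dot, after_dot)."""
    complete = [it for it in item_set if it[2] == ""]
    incomplete = [it for it in item_set if it[2] != ""]
    found = []
    if complete and incomplete:
        found.append("S/R")   # some valid items complete, some not
    if len(complete) > 1:
        found.append("R/R")   # more than one valid complete item
    return found

# State 3 of the DFA for A → aAb | ab: a single complete item, no conflict
print(conflicts({("A", "ab", "")}))
# Hypothetical item sets that do conflict
print(conflicts({("S", "a", ""), ("S", "a", "b")}))
print(conflicts({("S", "a", ""), ("T", "a", "")}))
```

A grammar can be parsed by the LR(0) algorithm only if no reachable set of valid items produces a conflict.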
  22. 22. Hierarchy of context-free grammars
      context-free grammars: parse using CYK algorithm (slow)
        ⊃ LR(∞) grammars
        ⊃ …
        ⊃ LR(1) grammars
        ⊃ LR(0) grammars: parse using LR(0) algorithm
      Programming languages shown in the diagram: Java, Perl, Python, …
      … to be continued…
