Ambiguity Pilambda

Published in: Technology, Education

  1. 1. Approximating Context-Free Grammar Ambiguity
     Claus Brabrand [email_address]
     BRICS, Department of Computer Science, University of Aarhus, Denmark
  2. 2. // Abstract
     “Approximating Context-Free Grammar Ambiguity”
     Context-free grammar ambiguity is undecidable. However, just because it’s undecidable doesn’t mean there aren’t (good) approximations! Indeed, the whole area of static analysis works on “side-stepping undecidability”. We exhibit a characterization of context-free ambiguity which induces a whole framework for approximating the problem. In particular, we give an approximation, $A_{MN}$, based on the [Mohri-Nederhof, 2000] regular approximation of context-free grammars and show how to boost the precision even further.
  3. 3. // Outline
     • Introduction
     • Vertical / Horizontal Ambiguity
     • Characterization of Ambiguity
     • (Over-)Approximation Framework
     • Approximation ($A_{MN}$)
     • Assessment
     • Related Work
     • Conclusion
  4. 4. // Context-Free Grammar
     $G = \langle N, \Sigma, s, \pi \rangle$
     • $N$: finite set of nonterminals
     • $\Sigma$: finite set of terminals
     • $s \in N$: start nonterminal
     • $\pi : N \to \mathcal{P}(E^*)$: production function, where $E = N \cup \Sigma$
     Assume:
     • All $n \in N$ are reachable (from $s$)
     • All $n \in N$ derive some (finite) string
     $\mathcal{L} : G \to \mathcal{P}(\Sigma^*)$: the language of $G$, $\mathcal{L}(G)$
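The two assumptions on this slide (every nonterminal is reachable and derives some string) can be checked mechanically. The following is a minimal sketch, assuming a hypothetical encoding in which the production function is a dictionary from nonterminals to lists of symbol tuples and any symbol that is not a key is a terminal; the function names are illustrative and not taken from the talk.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical encoding of pi: nonterminal -> list of productions (tuples of symbols).
# Any symbol that is not a key of the dictionary is treated as a terminal.
Grammar = Dict[str, List[Tuple[str, ...]]]


def reachable(pi: Grammar, start: str) -> Set[str]:
    """Nonterminals reachable from the start nonterminal."""
    seen, todo = {start}, [start]
    while todo:
        n = todo.pop()
        for alpha in pi[n]:
            for sym in alpha:
                if sym in pi and sym not in seen:
                    seen.add(sym)
                    todo.append(sym)
    return seen


def productive(pi: Grammar) -> Set[str]:
    """Nonterminals that derive at least one finite terminal string."""
    prod: Set[str] = set()
    changed = True
    while changed:
        changed = False
        for n, alts in pi.items():
            if n in prod:
                continue
            # n is productive if some production uses only terminals
            # and nonterminals already known to be productive
            if any(all(s not in pi or s in prod for s in alpha) for alpha in alts):
                prod.add(n)
                changed = True
    return prod


def satisfies_assumptions(pi: Grammar, start: str) -> bool:
    return reachable(pi, start) == set(pi) and productive(pi) == set(pi)


# The expression grammar from slide 36, in the assumed encoding.
EXPR: Grammar = {
    "E": [("E", "+", "T"), ("T",)],
    "T": [("T", "*", "F"), ("F",)],
    "F": [("(", "E", ")"), ("x",)],
}
print(satisfies_assumptions(EXPR, "E"))
```

A grammar such as `{"S": [("S", "a")]}` fails the check because `S` never derives a finite string.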
  5. 5. // Relevant CFG Decision Problems
     • Decidable:
        • Membership: $\omega \in \mathcal{L}(G_{CFG})$
        • Emptiness: $\mathcal{L}(G_{CFG}) = \emptyset$
        • Intersection (w/ REG): $\mathcal{L}(G_{CFG}) \cap \mathcal{L}(R_{REG}) = \mathcal{L}(C_{CFG})$
        • … constructively
     • Undecidable:
        • Intersection (w/ CFG): $\mathcal{L}(G_{CFG}) \cap \mathcal{L}(G'_{CFG})$ ?
        • …
     • Ambiguity: $\exists \omega \in \Sigma^*$: 2 derivation trees for $\omega$?
  6. 6. // Ambiguity: Undecidable!
     • Ambiguity: $\exists \omega \in \Sigma^*$: 2 derivation trees for $\omega$?
     • Algorithms:
        • Undecidable!
        • However…
     [Figure: two derivation trees $T$ and $T'$ from $s$ for the same string; grammars divided into unambiguous / ambiguous]
  7. 7. // “Side-Stepping Undecidability”
     However, just because it’s undecidable doesn’t mean there aren’t (good) approximations! Indeed, the whole area of static analysis works on “side-stepping undecidability”.
     • Unsafe approximation
     • Safe approximation: safe (over-)approximation or safe (under-)approximation
     [Figures: the unambiguous / ambiguous divide drawn with a safe (over-)approximation, a safe (under-)approximation, and an unsafe approximation]
  8. 8. // Motivation
     • Use safe (over-)approximation:
     • “Yes!” $\Rightarrow$ “G guaranteed unambiguous”!!!
        • Safely use any GLR parser on G
           • Because: never two parses at runtime!
     • Hence:
        • dynamic parse ambiguity $\to$ static parse ambiguity
     [Figure: the “Yes!” region lies inside the unambiguous grammars]
  9. 9. // Motivation (cont’d)
     • Undecidability means: “there’ll always be a slack”
     • However, still useful!
        • Possible interpretations of “No?”:
           • Treat as error (reject grammar):
              • “Please redesign your grammar” (as in [LA]LR(k))
           • Treat as warning:
              • “Here are some potential problems”
     [Figure: the “No?” region covers the ambiguous grammars plus some unambiguous ones (the slack)]
  10. 10. // Vertical Ambiguity
      • “Vertical ambiguity”: $G$ is vertically unambiguous iff
        $\forall n \in N : \forall \alpha, \alpha' \in \pi(n) : \alpha \neq \alpha' \Rightarrow \mathcal{L}(\alpha) \cap \mathcal{L}(\alpha') = \emptyset$
      • Example:
          Z : x A y
            : x B y
          A : a
          B : a
        Ambiguous string: x a y
      • ~ “reduce/reduce conflict” in [Yacc]
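The vertical condition can be explored on small grammars by enumerating the language of each production up to a length bound and intersecting the results. This is only a sketch under stated assumptions: the dictionary encoding of the grammar, the single-character terminals, and the bound `k` are illustrative choices, and a bounded search can confirm a conflict but can never prove its absence (that is what the approximation framework of the later slides is for).

```python
from itertools import combinations


def bounded_language(pi, alpha, k):
    """Strings of length <= k derivable from the sentential form alpha."""
    lang = {n: set() for n in pi}

    def form(beta):
        # Concatenate the (bounded) languages of the symbols of beta.
        out = {""}
        for sym in beta:
            choices = lang[sym] if sym in pi else {sym}
            out = {u + w for u in out for w in choices if len(u) + len(w) <= k}
        return out

    changed = True
    while changed:  # least fixed point; terminates because of the bound k
        changed = False
        for n, alts in pi.items():
            for beta in alts:
                new = form(beta) - lang[n]
                if new:
                    lang[n] |= new
                    changed = True
    return form(alpha)


def vertical_conflicts(pi, k):
    """Pairs of distinct productions of one nonterminal sharing a string of length <= k."""
    found = []
    for n, alts in pi.items():
        for alpha, alpha2 in combinations(alts, 2):
            common = bounded_language(pi, alpha, k) & bounded_language(pi, alpha2, k)
            if common:
                found.append((n, alpha, alpha2, common))
    return found


# The example grammar from this slide, in the assumed encoding.
PI = {
    "Z": [("x", "A", "y"), ("x", "B", "y")],
    "A": [("a",)],
    "B": [("a",)],
}
print(vertical_conflicts(PI, 5))
```

On the slide's grammar the only reported conflict is between the two productions of `Z`, with the witness `xay`.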
  11. 11. // Horizontal Ambiguity
      • “Horizontal ambiguity”: $G$ is horizontally unambiguous iff
        $\forall n \in N : \forall \alpha \in \pi(n) : \forall i \in [1..|\alpha|-1] : \mathcal{L}(\alpha_0 .. \alpha_{i-1}) ⩀ \mathcal{L}(\alpha_i .. \alpha_{|\alpha|-1}) = \emptyset$
      • where:
        $⩀ : \mathcal{P}(\Sigma^*) \times \mathcal{P}(\Sigma^*) \to \mathcal{P}(\Sigma^*)$
        $X ⩀ Y = \{\, x a y \mid x, y \in \Sigma^* \wedge a \in \Sigma^+ \wedge x, xa \in X \wedge y, ay \in Y \,\}$
      • Example:
          Z : A B
          A : x a
            : x
          B : a y
            : y
        Ambiguous string: x a y
      • ~ “shift/reduce conflict” in [Yacc]
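For finite languages the overlap operator can be computed directly from its definition. The sketch below assumes languages are given as Python sets of strings; the function name `overlap` is an illustrative choice.

```python
def overlap(X, Y):
    """All strings x+a+y with a non-empty, x and x+a in X, and y and a+y in Y."""
    result = set()
    for x in X:
        for xa in X:
            # xa must extend x by a non-empty string a
            if len(xa) > len(x) and xa.startswith(x):
                a = xa[len(x):]
                for y in Y:
                    if a + y in Y:
                        result.add(x + a + y)
    return result


# Languages of A and B in the example grammar on this slide.
L_A = {"xa", "x"}
L_B = {"ay", "y"}
print(overlap(L_A, L_B))
```

For the slide's example the overlap is exactly `{"xay"}`: the string can be split as `x` + `ay` or as `xa` + `y`, which is the horizontal conflict in the production `Z : A B`.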
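  12. 12. // Characterization of Ambiguity
      • Theorem 1: $G$ vertically unambiguous $\wedge$ $G$ horizontally unambiguous $\iff$ $G$ unambiguous
         • Lemma 1a (“$\Rightarrow$”): $G$ vertically unambiguous $\wedge$ $G$ horizontally unambiguous $\Rightarrow$ $G$ unambiguous
         • Lemma 1b (“$\Leftarrow$”): $G$ vertically unambiguous $\wedge$ $G$ horizontally unambiguous $\Leftarrow$ $G$ unambiguous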
  13. 13. // Proof (Lemma 1a): “$\Rightarrow$”
      $G$ vertically unambiguous $\wedge$ $G$ horizontally unambiguous $\Rightarrow$ $G$ unambiguous
      • … or contrapositively: $G$ ambiguous $\Rightarrow$ $G$ vertically ambiguous $\vee$ $G$ horizontally ambiguous
      • Proof:
         • Assume $G$ ambiguous (i.e. $\exists$ 2 der. trees for $\omega$)
         • Show: $G$ vertically ambiguous $\vee$ $G$ horizontally ambiguous
            • by induction in max height of the 2 derivation trees
  14. 14. // Proof (Lemma 1a): “$\Rightarrow$” (Base)
      • Base case (height 1):
         • The ambiguity means that (for $p \neq p'$): $N \Rightarrow^{1}_{p} \alpha = \omega$ and $N \Rightarrow^{1}_{p'} \alpha' = \omega$
         • Which means: $\mathcal{L}(\alpha) \cap \mathcal{L}(\alpha') \supseteq \{\omega\} \neq \emptyset$
            • i.e., we have a vertical ambiguity
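  15. 15. // Proof (Lemma 1a): “$\Rightarrow$” (I.H.)
      • Induction step (height $n$):
         • Assume induction hypothesis (for height $n-1$)
         • The ambiguity means: $N \Rightarrow_{p} \alpha_0 .. \alpha_{|\alpha|-1}$ and $N \Rightarrow_{p'} \alpha'_0 .. \alpha'_{|\alpha'|-1}$, where each $\alpha_i \Rightarrow^{n-1} \omega_i$ and each $\alpha'_{i'} \Rightarrow^{n-1} \omega'_{i'}$, with $\omega_0 .. \omega_{|\alpha|-1} = \omega = \omega'_0 .. \omega'_{|\alpha'|-1}$
      [Figure: the two derivation trees for $\omega$, rooted at $N$, with top productions $p$ and $p'$]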
  16. 16. // Proof (Lemma 1a): “$\Rightarrow$” ($p \neq p'$)
      • Case $p \neq p'$ (different productions):
         • … but then $\mathcal{L}(\alpha) \cap \mathcal{L}(\alpha') \supseteq \{\omega\} \neq \emptyset$
            • i.e., we have a vertical ambiguity
      [Figure: the two derivation trees from slide 15, with $p \neq p'$]
  17. 17. // Proof (Lemma 1a): “$\Rightarrow$” ($p = p'$, 1)
      • Case $p = p'$ (same production $\alpha$):
         • i.e. “the tops of the trees are the same”
         • Case $\forall i : \omega_i = \omega'_i$:
            • $\Rightarrow$ ambiguity in some subtree $i$ (deriving the same $\omega_i$):
               • Induction hypothesis (on this subtree) $\Rightarrow$ $G$ vertically ambiguous $\vee$ $G$ horizontally ambiguous
      [Figure: the two derivation trees from slide 15, with $p = p'$ and identical $\omega_i$]
  18. 18. // Proof (Lemma 1a): “$\Rightarrow$” ($p = p'$, 2)
      • Case $p = p'$ (same production $\alpha$):
         • Case $\exists i : \omega_i \neq \omega'_i$:
            • … but then (assume WLOG …): $\exists i : \omega_i \neq \omega'_i \Rightarrow \exists j \neq i : \omega_j \neq \omega'_j$ (take the least such $i$ and the 2nd least such $j$)
               • Now pick any $k$: $i \leq k < j$
               • … then: $\mathcal{L}(\alpha_0 .. \alpha_k) ⩀ \mathcal{L}(\alpha_{k+1} .. \alpha_{|\alpha|-1}) \neq \emptyset$, i.e. we have a horizontal ambiguity
      [Figure: the two derivation trees with $p = p'$, differing in subtrees $i$ and $j$]
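  19. 19. // Proof (Lemma 1b): “$\Leftarrow$”
      $G$ vertically unambiguous $\wedge$ $G$ horizontally unambiguous $\Leftarrow$ $G$ unambiguous
      • Contrapositively: $G$ vertically ambiguous $\vee$ $G$ horizontally ambiguous $\Rightarrow$ $G$ ambiguous
      • Assume “$G$ vertically ambiguous” (vertical conflict):
         • Then for some nonterminal $N$: $N \Rightarrow \alpha \Rightarrow^* a$ and $N \Rightarrow \alpha' \Rightarrow^* a$, i.e. $\mathcal{L}(\alpha) \cap \mathcal{L}(\alpha') \supseteq \{a\} \neq \emptyset$
            • But then derive (using reachability + derivability of $N$):
              $s \Rightarrow^* x N \theta \Rightarrow x \alpha \theta \Rightarrow^* x a \theta \Rightarrow^* x a y$
              $s \Rightarrow^* x N \theta \Rightarrow x \alpha' \theta \Rightarrow^* x a \theta \Rightarrow^* x a y$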
  20. 20. // Proof (Lemma 1b): “$\Leftarrow$” (cont’d)
      • Assume “$G$ horizontally ambiguous” (horizontal conflict):
         • Then for some nonterminal $N$: $N \Rightarrow \alpha \beta$ with $\mathcal{L}(\alpha) ⩀ \mathcal{L}(\beta) \neq \emptyset$,
           i.e. $\exists x, y \in \Sigma^* : \exists a \in \Sigma^+ : x, xa \in \mathcal{L}(\alpha) \wedge y, ay \in \mathcal{L}(\beta)$
            • But then derive (using reachability + derivability of $N$):
              $s \Rightarrow^* v N \theta \Rightarrow v \alpha \beta \theta \Rightarrow^* v x \beta \theta \Rightarrow^* v x a y \theta \Rightarrow^* v x a y w$
              $s \Rightarrow^* v N \theta \Rightarrow v \alpha \beta \theta \Rightarrow^* v x a \beta \theta \Rightarrow^* v x a y \theta \Rightarrow^* v x a y w$
  21. 21. // (Over-)Approximation (A)
      • (Over-)Approximation $A : E^* \to \mathcal{P}(\Sigma^*)$ with $\forall \alpha \in E^* : \mathcal{L}(\alpha) \subseteq A(\alpha)$
         • $A$ decidable $\iff$ “$\cap$” and “$⩀$” decidable on co-dom($A$)
      • Approximated vertical ambiguity: $G$ is vertically unambiguous w.r.t. $A$ iff
        $\forall n \in N : \forall \alpha, \alpha' \in \pi(n) : \alpha \neq \alpha' \Rightarrow A(\alpha) \cap A(\alpha') = \emptyset$
      • Approximated horizontal ambiguity: $G$ is horizontally unambiguous w.r.t. $A$ iff
        $\forall n \in N : \forall \alpha \in \pi(n) : \forall i \in [1..|\alpha|-1] : A(\alpha_0 .. \alpha_{i-1}) ⩀ A(\alpha_i .. \alpha_{|\alpha|-1}) = \emptyset$
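  22. 22. // Ambiguity Approximation
      • Theorem 2: $G$ vertically unambiguous w.r.t. $A$ $\wedge$ $G$ horizontally unambiguous w.r.t. $A$ $\Rightarrow$ $G$ unambiguous
         • Proof:
            • “Conflicts w/ smaller sets $\Rightarrow$ conflicts w/ larger sets”:
              $A(\alpha) \cap A(\beta) = \emptyset \Rightarrow \mathcal{L}(\alpha) \cap \mathcal{L}(\beta) = \emptyset$
              $A(\alpha) ⩀ A(\beta) = \emptyset \Rightarrow \mathcal{L}(\alpha) ⩀ \mathcal{L}(\beta) = \emptyset$
            • Hence unambiguity w.r.t. $A$ implies vertical and horizontal unambiguity of $G$, and Theorem 1 gives $G$ unambiguous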
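The proof step on this slide rests on the fact that both operators are monotone in each argument. A worked version of that step, as a sketch that assumes only the over-approximation property $\mathcal{L}(\gamma) \subseteq A(\gamma)$ and the definition of the overlap operator from slide 11:

```latex
\begin{align*}
\omega \in \mathcal{L}(\alpha) \cap \mathcal{L}(\beta)
  &\implies \omega \in A(\alpha) \cap A(\beta)
  && \text{since } \mathcal{L}(\alpha) \subseteq A(\alpha),\ \mathcal{L}(\beta) \subseteq A(\beta) \\[4pt]
xay \in \mathcal{L}(\alpha) ⩀ \mathcal{L}(\beta)
  &\implies x, xa \in \mathcal{L}(\alpha) \wedge y, ay \in \mathcal{L}(\beta),\ a \in \Sigma^+ \\
  &\implies x, xa \in A(\alpha) \wedge y, ay \in A(\beta),\ a \in \Sigma^+ \\
  &\implies xay \in A(\alpha) ⩀ A(\beta)
\end{align*}
```

Taking contrapositives, an empty intersection (or overlap) of the approximations forces an empty intersection (or overlap) of the exact languages, which is the direction Theorem 2 needs.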
  23. 23. // Compositionality (of A’s)
      • Corollary 3: $A$, $A'$ decidable (over-)approximations $\Rightarrow$ $A \cap A'$ decidable (over-)approximation
         • Proof:
            • Follows from definition [omitted…]
         • i.e. “Approximations are compositional”!
      [Figures: the regions accepted by $A$, by $A'$, and by $A \cap A'$ within the unambiguous grammars]
  24. 24. // Choice(s) of A?
      • $A_{\Sigma^*}(\alpha) = \Sigma^*$ (constant)
         • Worst approximation
         • … but safe approximation!
      • Useless:
         • “Cannot determine that any grammars are unambiguous”
      [Figure: the worst approximation accepts no grammar as unambiguous]
  25. 25. // Choice(s) of A? (cont’d)
      • $A_{MN}(\alpha)$ = [Mohri-Nederhof]($\alpha$)
         • CFG $\to$ DFA (NFA) approximation
         • “Regular Approximation of Context-Free Grammars through Transformation” [Mohri-Nederhof, 2000]
      • Properties of this “black box”:
         • Good (over-)approximation!
         • Works on the language, $\mathcal{L}(G)$; not on the grammatical structure, $G$
      • Approximation parameterizable:
         • E.g. unfold nonterminals “n” times
  26. 26. // Decidability (of $A_{MN}$)
      • “$\cap$” decidable (using DFAs): $X \cap Y = \emptyset$
         • $O(|X_{NFA}| |Y_{NFA}|)$
      • “$⩀$” decidable (using DFAs): $X ⩀ Y = \emptyset$
         • $O(|X_{NFA}| |Y_{NFA}|)$
      • $A_{MN}$ decidable: $G$ vertically unambiguous w.r.t. $A_{MN}$ $\wedge$ $G$ horizontally unambiguous w.r.t. $A_{MN}$ $\Rightarrow$ $G$ unambiguous
         • With potential counterexamples (using DFAs)
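Emptiness of the intersection of two regular languages is commonly decided by exploring the product automaton, which is where a bound of the form $O(|X_{NFA}| |Y_{NFA}|)$ comes from. The sketch below assumes a hypothetical automaton encoding (start state, set of accepting states, and a dictionary from `(state, symbol)` to sets of successor states, without ε-transitions); it illustrates the standard product construction and is not the implementation used in the talk.

```python
from collections import deque


def intersection_empty(x, y):
    """Decide L(x) ∩ L(y) = ∅ by searching the reachable part of the product automaton.

    Each automaton is a triple (start, accepting_states, delta) where
    delta maps (state, symbol) to a set of successor states.
    """
    (sx, fx, dx), (sy, fy, dy) = x, y
    seen = {(sx, sy)}
    todo = deque(seen)
    while todo:
        p, q = todo.popleft()
        if p in fx and q in fy:
            return False  # some string is accepted by both automata
        for (p0, a), targets in dx.items():
            if p0 != p:
                continue
            # follow symbol a in both automata simultaneously
            for q2 in dy.get((q, a), ()):
                for p2 in targets:
                    if (p2, q2) not in seen:
                        seen.add((p2, q2))
                        todo.append((p2, q2))
    return True


# a*b and ab* share the string "ab"; a+ and b+ share nothing.
A_STAR_B = (0, {1}, {(0, "a"): {0}, (0, "b"): {1}})
A_B_STAR = (0, {1}, {(0, "a"): {1}, (1, "b"): {1}})
A_PLUS = (0, {1}, {(0, "a"): {1}, (1, "a"): {1}})
B_PLUS = (0, {1}, {(0, "b"): {1}, (1, "b"): {1}})
print(intersection_empty(A_STAR_B, A_B_STAR), intersection_empty(A_PLUS, B_PLUS))
```

Recording the path to an accepting product state would yield a concrete string in the intersection, i.e. a potential counterexample in the sense of this slide.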
  27. 27. // Decision Algorithm for ($X ⩀ Y$)
      • For $X$, $Y$ regular languages:
      • All overlappings, “x a y”, as DFAs; variant of the “$\cap$” construction!
      [Figure: $X_{NFA}$ and $Y_{NFA}$ combined into $[X;Y]_{NFA}$; an overlap corresponds to a path reading x in X, then a in both X and Y, then y in Y]
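One way to decide whether the overlap of two regular languages is non-empty is to chain three reachability searches over product automata: read x in X, read a non-empty a in X and Y in lockstep, then read y in two copies of Y in lockstep. The sketch below is an assumption-laden illustration of that idea (the `NFA` class, the helper names, and the ε-free encoding are all hypothetical) and is not claimed to be the construction drawn on the slide.

```python
from collections import deque


class NFA:
    def __init__(self, start, accept, delta):
        self.start = start            # single start state
        self.accept = set(accept)     # accepting states
        self.delta = delta            # dict: (state, symbol) -> set of states

    def step(self, q):
        """Yield (symbol, successor) pairs for transitions leaving q."""
        for (p, a), targets in self.delta.items():
            if p == q:
                for t in targets:
                    yield a, t


def reach(starts, succ):
    """All states reachable from `starts` (inclusive) under the successor function."""
    seen = set(starts)
    todo = deque(seen)
    while todo:
        s = todo.popleft()
        for t in succ(s):
            if t not in seen:
                seen.add(t)
                todo.append(t)
    return seen


def lockstep(A, B):
    """Successor function on state pairs that reads the same symbol in A and B."""
    def succ(pair):
        p, q = pair
        return {(p2, q2) for a, p2 in A.step(p) for b, q2 in B.step(q) if a == b}
    return succ


def overlap_nonempty(X, Y):
    # 1) x in X: accepting states of X reachable from its start state
    x_ends = reach([X.start], lambda q: {t for _, t in X.step(q)}) & X.accept
    # 2) non-empty a: at least one lockstep move, continuing X and starting Y
    xy = lockstep(X, Y)
    first = {t for q in x_ends for t in xy((q, Y.start))}
    after_a = {(q, p) for (q, p) in reach(first, xy) if q in X.accept}  # xa in X
    # 3) y: read the same y after a (so ay in Y) and from Y's start (so y in Y)
    ends = reach({(p, Y.start) for _, p in after_a}, lockstep(Y, Y))
    return any(p in Y.accept and q in Y.accept for p, q in ends)


# Slide 11: X = {x, xa}, Y = {ay, y} overlap on "xay".
X11 = NFA(0, {1, 2}, {(0, "x"): {1}, (1, "a"): {2}})
Y11 = NFA(0, {2}, {(0, "a"): {1}, (1, "y"): {2}, (0, "y"): {2}})
print(overlap_nonempty(X11, Y11))
```

Each search visits at most a product of two state sets, which is consistent with the $O(|X_{NFA}| |Y_{NFA}|)$ flavour of bound quoted on slide 26 when the automata are of comparable size.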
  28. 28. // Three Approximation Answers
      • “Y!”:
         • “G definitely not ambiguous”!
      • “? / D?”:
         • “?”: “Don’t know”?
            • … could not find any potential counterexamples.
         • “D?”: “Don’t know” – look at over-approx, D?
            • … and here are all potential counterexamples
               • Note: some strings do not even parse!
               • Improve: Parse $S \subseteq_{FIN} D$ $\Rightarrow$ subset of real counterexamples
      [Figure: the three answers placed against the true answer]
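The “Improve” step above (parse a finite sample of the potential counterexamples and keep the ones that are really ambiguous) can be sketched by counting derivation trees per candidate string. The code assumes a hypothetical dictionary encoding of the grammar, single-character terminals, no ε-productions and no unit cycles (so every symbol consumes at least one character and the recursion terminates); `count_trees` is an illustrative name.

```python
from functools import lru_cache


def count_trees(pi, start, s):
    """Number of derivation trees of the string s from `start`.

    Assumes: no epsilon-productions and no unit cycles in pi.
    """
    @lru_cache(maxsize=None)
    def sym(X, i, j):
        if X not in pi:  # terminal symbol
            return 1 if s[i:j] == X else 0
        return sum(seq(alpha, i, j) for alpha in pi[X])

    @lru_cache(maxsize=None)
    def seq(alpha, i, j):
        if not alpha:
            return 1 if i == j else 0
        rest = alpha[1:]
        # each remaining symbol needs at least one character, so leave room for them
        return sum(sym(alpha[0], i, k) * seq(rest, k, j)
                   for k in range(i + 1, j - len(rest) + 1))

    return sym(start, 0, len(s))


# Grammar from slide 10 and a finite sample of potential counterexamples.
PI10 = {
    "Z": [("x", "A", "y"), ("x", "B", "y")],
    "A": [("a",)],
    "B": [("a",)],
}
candidates = ["xay", "xy"]
real = [w for w in candidates if count_trees(PI10, "Z", w) >= 2]
print(real)
```

Here `xy` does not even parse (zero trees), while `xay` has two trees and is therefore a real counterexample.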
  29. 29. // Regaining Lost Precision!
      • Now parse all counterexamples!
         • i.e. parse the DFA, $D_{DFA}$:
         • 1) i.e. construct: $\mathcal{L}(C_{CFG}) = \mathcal{L}(D_{DFA}) \cap \mathcal{L}(G_{CFG})$
            • Decidable in $O(|D||G|)$
         • 2) Decide emptiness on C: $\mathcal{L}(C_{CFG}) = \emptyset$
            • Decidable in $O(|C|) = O(|D||G|)$
      • Only potential counterexamples that parse!
  30. 30. // Three Approximation Answers
      • “Y!”:
         • “G definitely not ambiguous”!
      • “? / C?”:
         • “?”: “Don’t know”?
            • … could not find any counterexamples.
         • “C?”: “Don’t know” – look at over-approx, C?
            • … and here are all potential counterexamples
               • Note: all strings actually parse (maybe not ambiguously)!
               • Improve: extract finite under-approximation...?
      [Figure: the three answers placed against the true answer]
  31. 31. // Asymptotic (Time) Complexity
      • [Mohri-Nederhof]: $O(n^2 v h)$
      • Vertical Amb: $O(n^3 v^4 h^4)$
      • Horizontal Amb: $O(n^3 v^3 h^5)$
      • Total: $O(n^3 v^3 h^4 (v+h)) \subseteq O(g^5)$
      where:
      • $n = |N|$
      • $v = \max\{\, |\pi(X)| : X \in N \,\}$
      • $h = \max\{\, |\alpha| : \alpha \in \pi(X), X \in N \,\}$
      • $g = nvh = |G|$
      [Figure: a grammar laid out as $n$ nonterminals, each with up to $v$ productions of up to $h$ symbols]
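The size parameters used in these bounds are easy to read off a concrete grammar. A small sketch, assuming a hypothetical dictionary encoding of the production function (nonterminal to list of symbol tuples):

```python
def size_parameters(pi):
    """Return (n, v, h, g) as defined on the complexity slide."""
    n = len(pi)                                                     # number of nonterminals
    v = max(len(alts) for alts in pi.values())                      # most productions per nonterminal
    h = max(len(alpha) for alts in pi.values() for alpha in alts)   # longest right-hand side
    return n, v, h, n * v * h


# Expression grammar from slide 36.
EXPR = {
    "E": [("E", "+", "T"), ("T",)],
    "T": [("T", "*", "F"), ("F",)],
    "F": [("(", "E", ")"), ("x",)],
}
print(size_parameters(EXPR))
```

For the expression grammar this gives n = 3, v = 2, h = 3 and g = 18.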
  32. 32. // Related Work (Dynamic)
      • Dynamic disambiguation:
         • “Disambiguation-by-convention”:
            • Longest match, most specific match, …
         • Customizable:
            • [Bison v. 1.5+]: %dprec, %merge
            • [ASF+SDF]: “disambiguation filters”
      • Dynamic ambiguity interception:
         • GLR ([Tomita], [Earley], [Bison], [ASF+SDF], …)
  33. 33. // Related Work (Static)
      • Static disambiguation:
         • “Disambiguation-by-convention”:
            • First match, most specific match, …
         • Customizable:
            • [Yacc]: %left, %right, %nonassoc, %prec
      • Static ambiguity interception:
         • LL(k), [LA-]LR(k), …
         • Our work goes here (but for GLR)!
  34. 34. // Implementation
      • disamb (Java)
      In progress…!
  35. 35. // Assessment
      • Quality of approximation ~ quantity of false-positives
         • Precision:
            • Ours $\subseteq$ LR(k)?
            • LR(k) $\subseteq$ Ours?
            • False-positives?
            • Characterize “?” / “N?”
               • In terms of grammatical structure?
      • Efficiency (in practice…)
      In progress…!
  36. 36. // Example: Expression chains
      • … !?
        E -> E + T
          -> T
        T -> T * F
          -> F
        F -> ( E )
          -> x
  37. 37. // Example: Balancing Structures
      • Nasty:
        S -> A A
        A -> x A x
          -> y
        Example string: xxyxx xyx
      • Requires:
         • Unbounded memory (# x’es)
            • i.e. CFG structure
         • Unbounded lookahead
            • i.e. any finite k is insufficient
      • $\Rightarrow$ False-positives!
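Why a regular approximation produces a false positive on this grammar can be seen by counting the ways a string splits into two members of the language of A. The sketch below assumes the regular over-approximation `x*yx*` of $\mathcal{L}(A) = \{x^n y x^n\}$; any regular superset that forgets the balancing would serve for the illustration, and no claim is made that this is exactly what [Mohri-Nederhof] returns. Two different split points for the same string correspond to a horizontal conflict in `S -> A A`.

```python
import re
from itertools import product

APPROX = re.compile(r"x*yx*")  # assumed regular over-approximation of L(A)


def in_approx(w):
    return APPROX.fullmatch(w) is not None


def in_exact(w):
    """Exact membership in L(A) = { x^n y x^n }."""
    n = w.find("y")
    return n >= 0 and w == "x" * n + "y" + "x" * n


def splits(w, member):
    """All ways to cut w into two non-empty parts that are both members."""
    return [(w[:i], w[i:]) for i in range(1, len(w)) if member(w[:i]) and member(w[i:])]


# The approximation sees two splits of "yxy": a potential counterexample.
print(splits("yxy", in_approx))

# The exact language never admits two splits (checked here up to length 8).
worst = max(len(splits("".join(t), in_exact))
            for k in range(9) for t in product("xy", repeat=k))
print(worst)
```

The string `yxy` is a potential counterexample that does not even belong to the language of `S`, which is the kind of false positive this slide warns about; the slide's own example string `xxyxxxyx` has exactly one split under exact membership.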
  38. 38. // Future Work
      • Permit: E -> E ⊕ E
         • With disambiguating conventions for:
            • Associativity
            • Precedence
      • Parsing optimization:
         • Exploit compile-time analysis information at runtime
      • …
  39. 39. // Conclusion
      “Approximating Context-Free Grammar Ambiguity”
      Context-free grammar ambiguity is undecidable. However, just because it’s undecidable doesn’t mean there aren’t (good) approximations! Indeed, the whole area of static analysis works on “side-stepping undecidability”. We exhibit a characterization of context-free ambiguity which induces a whole framework for (over-)approximation. In particular, we give an approximation based on the [Mohri-Nederhof, 2000] regular approximation of context-free grammars and show how to boost the precision even further.
      But wait, there’s more…
  40. 40. // Lessons Learned
      • Framework:
         • Plug in your favorite (over-)approximation of $\mathcal{L}(\alpha)$
            • Even take the intersection of them: $A = \bigcap_i A_i$
               • Approximation closed under intersection
      • Methodology:
         • Just because it’s undecidable doesn’t mean there aren’t (good) approximations
            • Quantity of false-positives (practically motivated)
            • What to do with false-positives (practically motivated)
      • Don’t be scared of undecidability
  41. 41. [bonus slides]
  42. 42. // Membership: Decidable!
      • Membership (aka. “parsing”):
         • Given $\omega \in \Sigma^*$:
            • “Is the string, $\omega$, in the language of G”: $\omega \in \mathcal{L}(G)$
      • Algorithms:
         • LL(k): $O(|\omega|)$
         • [LA-]LR(k): $O(|\omega|)$
         • GLR: $O(|\omega|^3)$
         • …
  43. 43. // Parsing Greedily Left-to-Right
      • The ambiguity problem for [X;Y] may occur in 2 cases:
         • (“too little”): Not possible (due to greediness)
         • (“too much”): Only this is a problem!
      • In fact, already a problem if x’ “goes too far”:
         • Thus, we only have a problem if (“X eats into Y”): $X \cap X;(\mathrm{prefix}(Y) \setminus \{\epsilon\}) \neq \emptyset$
            • Essentially disambiguation by picking longest match
      [Figure: a string split as x y by the intended parse and as x’ y’ by the greedy parse]
