Parsing Techniques for Lexicalized Context-Free Grammars

  1. Parsing Techniques for Lexicalized Context-Free Grammars. Giorgio Satta, University of Padua. Joint work with Jason Eisner and Mark-Jan Nederhof.
  2. Summary • Part I: Lexicalized Context-Free Grammars – motivations and definition – relation with other formalisms • Part II: standard parsing – top-down (TD) techniques – bottom-up (BU) techniques • Part III: novel algorithms – BU enhanced – TD enhanced
  3. Lexicalized grammars • each rule specialized for one or more lexical items • advantages over non-lexicalized formalisms: – express syntactic preferences that are sensitive to lexical words – control word selection
  4. Syntactic preferences • adjuncts: Workers [ dumped sacks ] into a bin / *Workers dumped [ sacks into a bin ] • N-N compounds: [ hydrogen ion ] exchange / *hydrogen [ ion exchange ]
  5. Word selection • lexical: Nora convened the meeting / ?Nora convened the party • semantics: Peggy solved two puzzles / ?Peggy solved two goats • world knowledge: Mary shelved some books / ?Mary shelved some cooks
  6. Lexicalized CFG. Motivations: • study computational properties common to generative formalisms used in state-of-the-art real-world parsers • develop parsing algorithms that can be directly applied to these formalisms
  7. Lexicalized CFG. Example parse of "dumped sacks into a bin": [VP[dump][sack] [VP[dump][sack] [V[dump] dumped] [NP[sack] [N[sack] sacks]]] [PP[into][bin] [P[into] into] [NP[bin] [Det[a] a] [N[bin] bin]]]]
  8. Lexicalized CFG. Context-free grammars with: • alphabet VT: dumped, sacks, into, ... • delexicalized nonterminals VD: NP, VP, ... • nonterminals VN: NP[sack], VP[dump][sack], ...
  9. Lexicalized CFG. Delexicalized nonterminals encode: • word sense: N, V, ... • grammatical features: number, tense, ... • structural information: bar level, subcategorization state, ... • other constraints: distribution, contextual features, ...
  10. Lexicalized CFG • productions have two forms: – V[dump] → dumped – VP[dump][sack] → VP[dump][sack] PP[into][bin] • lexical elements in the lhs are inherited from the rhs
  11. Lexicalized CFG • a production is k-lexical if it has k occurrences of lexical elements in its rhs: – NP[bin] → Det[a] N[bin] is 2-lexical – VP[dump][sack] → VP[dump][sack] PP[into][bin] is 4-lexical
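As a concrete reading of the definition, here is a minimal sketch in Python; the NT class, the lexicality helper, and all names are illustrative assumptions, not from the talk.

```python
# Hypothetical encoding of lexicalized nonterminals and the k-lexicality
# of a production's right-hand side (illustrative names throughout).
from dataclasses import dataclass

@dataclass(frozen=True)
class NT:
    sym: str        # delexicalized nonterminal in VD, e.g. "VP"
    heads: tuple    # lexical heads from VT, e.g. ("dump", "sack")

def lexicality(rhs):
    """k such that the rhs is k-lexical: count head occurrences."""
    return sum(len(x.heads) for x in rhs)

# The slide's 4-lexical production: VP[dump][sack] -> VP[dump][sack] PP[into][bin]
rhs = (NT("VP", ("dump", "sack")), NT("PP", ("into", "bin")))
assert lexicality(rhs) == 4
```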
  12. LCFG at work • 2-lexical CFG: – Alshawi 1996: Head Automata – Eisner 1996: Dependency Grammars – Charniak 1997: CFG – Collins 1997: generative model
  13. LCFG at work. A probabilistic LCFG G is strongly equivalent to a probabilistic grammar G’ iff • there is a one-to-one mapping between derivations • each direction of the mapping is a homomorphism • derivation probabilities are preserved
  14. LCFG at work. From Charniak 1997 to 2-lex CFG: the production NP_S[profits] → ADJ_NP[corporate] N_NP[profits] (subscripts record the parent category) receives probability Pr1(corporate | ADJ, NP, profits) × Pr1(profits | N, NP, profits) × Pr2(NP → ADJ N | NP, S, profits)
  15. LCFG at work. From Collins 1997 (Model #2) to 2-lex CFG: ⟨VP_S, {NP-C}, Δleft, ΔS⟩[bought] → ⟨N, Δ⟩[IBM] ⟨VP_S, {}, Δleft ⊕ Δ, ΔS ⊕ Δ⟩[bought], with probability Prleft(NP, IBM | VP, S, bought, Δleft, {NP-C})
  16. LCFG at work. Major limitation: cannot capture relations involving lexical items outside the actual constituent (cf. history-based models). Example: with a verb V[d0] governing NP[d1], and a PP[d2][d3] attaching inside NP[d1], the grammar cannot look at d0 when computing the PP attachment.
  17. LCFG at work • lexicalized context-free parsers that are not LCFG: – Magerman 1995: Shift-Reduce+ – Ratnaparkhi 1997: Shift-Reduce+ – Chelba & Jelinek 1998: Shift-Reduce+ – Hermjakob & Mooney 1997: LR
  18. Related work. Other frameworks for the study of lexicalized grammars: • Carroll & Weir 1997: Stochastic Lexicalized Grammars; emphasis on expressiveness • Goodman 1997: Probabilistic Feature Grammars; emphasis on parameter estimation
  19. Summary • Part I: Lexicalized Context-Free Grammars – motivations and definition – relation with other formalisms • Part II: standard parsing – TD techniques – BU techniques • Part III: novel algorithms – BU enhanced – TD enhanced
  20. Standard Parsing • standard parsing algorithms (CKY, Earley, LC, ...) run on LCFG in time O(|G| × |w|³) • for 2-lex CFG (the simplest case), |G| grows with |VD|³ × |VT|² !! Goal: get rid of the |VT| factors
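The count behind the grammar-size claim, spelled out (a back-of-the-envelope bound; it assumes, as in the earlier examples, that a binary 2-lexical production pairs three delexicalized symbols with two lexical heads):

```latex
% Counting binary 2-lexical productions such as A[d1] -> B[d1] C[d2]:
%   up to |V_D|^3 choices for (A, B, C) and |V_T|^2 choices for (d1, d2)
\[
  |G| = O\bigl(|V_D|^3\,|V_T|^2\bigr)
  \quad\Rightarrow\quad
  O\bigl(|G|\cdot|w|^3\bigr) = O\bigl(|V_D|^3\,|V_T|^2\,|w|^3\bigr).
\]
```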
  21. Standard Parsing: TD. Result (to be refined): algorithms satisfying the correct-prefix property are “unlikely” to run on LCFG in time independent of VT
  22. Correct-prefix property (Earley, Left-Corner, GLR, ...): at every reading position, the input consumed so far is a correct prefix, i.e. it can be extended to a sentence derivable from S. [Diagram: a derivation from S over w, read left to right up to the reading position.]
  23. On-line parsing. No grammar precompilation (Earley). [Diagram: G and w are fed directly to the parser, which produces the output.]
  24. Standard Parsing: TD. Result: on-line parsers with the correct-prefix property cannot run in time O(f(|VD|, |w|)), for any function f
  25. Off-line parsing. The grammar is precompiled (Left-Corner, LR). [Diagram: a precompilation step maps G to C(G); the parser then runs on C(G) and w to produce the output.]
  26. Standard Parsing: TD. Fact: we can simulate a nondeterministic FA M on w in time O(|M| × |w|). Conjecture: fix a polynomial p; we cannot simulate M on w in time p(|w|) unless we spend exponential time in precompiling M
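To make the Fact concrete, a minimal simulation sketch (an epsilon-free automaton and this dictionary encoding are assumptions for illustration):

```python
# Simulating a nondeterministic FA M on w in O(|M| * |w|) time by
# propagating the set of reachable states one input symbol at a time.
def nfa_accepts(delta, start, finals, w):
    """delta: dict mapping (state, symbol) -> set of successor states."""
    current = {start}
    for a in w:                      # |w| steps ...
        nxt = set()
        for q in current:            # ... each touching at most |M| transitions
            nxt |= delta.get((q, a), set())
        current = nxt
    return bool(current & finals)

# Toy M over {a, b} accepting strings that end in "ab":
delta = {(0, "a"): {0, 1}, (0, "b"): {0}, (1, "b"): {2}}
print(nfa_accepts(delta, 0, {2}, "aab"))   # True
```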
  27. Standard Parsing: TD. Assume our conjecture holds true. Result: off-line parsers with the correct-prefix property cannot run in time O(p(|VD|, |w|)), for any polynomial p, unless we spend exponential time in precompiling G
  28. Standard Parsing: BU. Common practice in lexicalized grammar parsing: • select the productions that are lexically grounded in w • parse BU with the selected subset of G. Problem: this removes the |VT| factors but introduces new |w| factors !!
  29. Standard Parsing: BU. Time charged for the basic step B[d1] (spanning i..k, head at d1) + C[d2] (spanning k..j, head at d2) ⇒ A[d2]: • i, k, j ⇒ |w|³ • A, B, C ⇒ |VD|³ • d1, d2 ⇒ |w|² Running time is O(|VD|³ × |w|⁵) !!
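A runnable toy version of this naive BU recognizer may help fix the index bookkeeping; the grammar encoding, the rules map, and all names are illustrative assumptions, not the talk's:

```python
# Items are (A, d, i, j): delexicalized label A, head position d, span w[i:j].
# Enumerating i, k, j plus the two head positions hidden in the item sets,
# times the grammar factor, is what yields O(|V_D|^3 * |w|^5).
def parse_naive(w, lex, rules, start):
    n = len(w)
    chart = set()
    for d, a in enumerate(w):                         # preterminals A[a] -> a
        for A in lex.get(a, ()):
            chart.add((A, d, d, d + 1))
    for width in range(2, n + 1):
        for i in range(n - width + 1):
            j = i + width
            for k in range(i + 1, j):                 # i, k, j: |w|^3
                left = [it for it in chart if it[2] == i and it[3] == k]
                right = [it for it in chart if it[2] == k and it[3] == j]
                for (B, d1, _, _) in left:            # d1: |w|
                    for (C, d2, _, _) in right:       # d2: |w|
                        # head_side 0/1 says which child's head projects
                        for (A, hs) in rules.get((B, w[d1], C, w[d2]), ()):
                            chart.add((A, (d1, d2)[hs], i, j))
    return any((start, d, 0, n) in chart for d in range(n))

# Toy 2-lex grammar: S[d'] -> NP[d] VP[d'], VP[d] -> V[d] NP[d']
lex = {"IBM": {"NP"}, "Lotus": {"NP"}, "bought": {"V"}}
rules = {("NP", "IBM", "VP", "bought"): {("S", 1)},
         ("V", "bought", "NP", "Lotus"): {("VP", 0)}}
print(parse_naive("IBM bought Lotus".split(), lex, rules, "S"))   # True
```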
  30. Standard BU: Exhaustive. [Log-log plot, parsing time vs. sentence length, exhaustive naive BU; empirical fit y = c·x^5.2019]
  31. Standard BU: Pruning. [Log-log plot, parsing time vs. sentence length, naive BU with pruning; empirical fit y = c·x^3.8282]
  32. Summary • Part I: Lexicalized Context-Free Grammars – motivations and definition – relation with other formalisms • Part II: standard parsing – TD techniques – BU techniques • Part III: novel algorithms – BU enhanced – TD enhanced
  33. BU enhanced. Result: parsing with 2-lex CFG in time O(|VD|³ × |w|⁴). Remark: the result transfers to the models in Alshawi 1996, Eisner 1996, Charniak 1997, Collins 1997. Remark: the technique extends to improve parsing of Lexicalized Tree-Adjoining Grammars
  34. Algorithm #1. Basic step in naive BU: B[d1] (spanning i..k) + C[d2] (spanning k..j) ⇒ A[d2] (spanning i..j). Idea: the indices d1 and j can be processed independently
  35. Algorithm #1 • Step 1: B[d1] (spanning i..k) + the head d2 of a C[d2] starting at k ⇒ A[d2] (spanning i..k); d1 is discarded • Step 2: A[d2] (spanning i..k) + C[d2] (spanning k..j) ⇒ A[d2] (spanning i..j); see the sketch below
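A hedged sketch of the two steps under the reading above (the item shapes and names are assumptions): because d1 is discarded in Step 1 before j appears in Step 2, no single step enumerates five positions, which is where the O(|w|⁴) bound comes from.

```python
def step1(b_items, c_heads_at, combine):
    """(B, d1, i, k) + 'some C[d2] starting at k' => (A, C, d2, i, k)."""
    out = set()
    for (B, d1, i, k) in b_items:
        for (C, d2) in c_heads_at.get(k, ()):     # free positions: i, k, d1, d2
            for A in combine(B, d1, C, d2):
                out.add((A, C, d2, i, k))         # d1 is discarded here
    return out

def step2(half_items, c_items):
    """(A, C, d2, i, k) + (C, d2, k, j) => (A, d2, i, j)."""
    out = set()
    for (A, C, d2, i, k) in half_items:
        for (C2, d22, k2, j) in c_items:          # free positions: i, k, j, d2
            if (C2, d22, k2) == (C, d2, k):
                out.add((A, d2, i, j))
    return out
```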
  36. BU enhanced. Upper bound provided by Algorithm #1: O(|w|⁴). Goal: can we go down to O(|w|³)?
  37. Spine. The spine of a parse tree is the path from the root to the root’s head. Example (“IBM bought Lotus last week”): in [S[buy] [S[buy] [NP[IBM] IBM] [VP[buy] [V[buy] bought] [NP[Lotus] Lotus]]] [AdvP[week] last week]], the spine is S[buy] → S[buy] → VP[buy] → V[buy] → bought
  38. Spine projection. The spine projection is the yield of the subtree composed of the spine and all its sibling nodes. For the tree above: NP[IBM] bought NP[Lotus] AdvP[week]
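A small illustrative implementation of both notions; the tree encoding and names are assumptions (a node is (label, head, children), a leaf is a word string):

```python
def spine(node):
    """Labels on the path from the root to the root's head word."""
    label, head, children = node
    for c in children:
        if isinstance(c, tuple) and c[1] == head:
            return [label] + spine(c)
    return [label]                           # preterminal above the head word

def spine_projection(node):
    """Yield of the spine plus all siblings along it, left to right."""
    label, head, children = node
    if len(children) == 1 and isinstance(children[0], str):
        return [children[0]]                 # preterminal: the head word itself
    def lab(c):                              # sibling label (or word)
        return c[0] if isinstance(c, tuple) else c
    for idx, c in enumerate(children):
        if isinstance(c, tuple) and c[1] == head:
            return ([lab(x) for x in children[:idx]]
                    + spine_projection(c)
                    + [lab(x) for x in children[idx + 1:]])

# "IBM bought Lotus last week", headed at "bought":
t = ("S", "buy",
     [("S", "buy",
       [("NP", "IBM", ["IBM"]),
        ("VP", "buy",
         [("V", "buy", ["bought"]), ("NP", "Lotus", ["Lotus"])])]),
      ("AdvP", "week", ["last week"])])
print(spine(t))             # ['S', 'S', 'VP', 'V']
print(spine_projection(t))  # ['NP', 'bought', 'NP', 'AdvP']
```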
  39. Split Grammars. Split the spine projections at the head. Problem: how much information do we need to store in order to construct new grammatical spine projections from splits?
  40. Split Grammars. Fact: the set of spine projections is a linear context-free language. Definition: a 2-lex CFG is split if its set of spine projections is a regular language. Remark: for split grammars, we can recombine splits using finite information
  41. Split Grammars. Non-split grammar: rules such as S[d] → AdvP[a] S1[d] and S1[d] → S[d] AdvP[b] yield spine projections with nested AdvP[a] ... AdvP[b] pairs around the head, i.e. • an unbounded # of dependencies between left and right dependents of the head • linguistically unattested and unlikely
  42. Split Grammars. Split grammar: rules such as S[d] → AdvP[a] S[d] and S[d] → S[d] AdvP[b] allow only a finite # of dependencies between the left and right dependents of the lexical head
  43. Split Grammars. Precompile the grammar so that the two splits are derived separately, e.g. S[buy] → NP[IBM] r3[buy], r3[buy] → r2[buy] AdvP[week], r2[buy] → r1[buy] NP[Lotus], r1[buy] → bought, where r3[buy] is a split symbol
  44. Split Grammars • t: max # of states per spine automaton • g: max # of split symbols per spine automaton (g < t) • m: # of delexicalized nonterminals that are maximal projections
  45. BU enhanced. Result: parsing with split 2-lexical CFG in time O(t² g² m² |w|³). Remark: the models in Alshawi 1996, Charniak 1997 and Collins 1997 are not split
  46. Algorithm #2. Idea: • recognize the left and right splits of B[d] separately, each paired with a split symbol s • collect head dependents one split at a time
  47. Algorithm #2. Running example: the spine projection NP[IBM] bought NP[Lotus] AdvP[week], split at its head
  48. Algorithm #2 [Diagram: the two combination steps of Algorithm #2, stated on split items with split symbols r1, r2, s1, s2, head positions d1, d2, and span boundaries i, k]
  49. Algorithm #2: Exhaustive. [Log-log plot, time vs. length: BU naive fit y = c·x^5.2019; BU split fit y = c·x^3.328]
  50. Algorithm #2: Pruning. [Log-log plot, time vs. length: BU naive fit y = c·x^3.8282; BU split fit y = c·x^2.8179]
  51. Related work. Cubic-time algorithms for lexicalized grammars: • Sleator & Temperley 1991: Link Grammars • Eisner 1997: Bilexical Grammars (improved by transfer of Algorithm #2)
  52. TD enhanced. Goal: introduce TD prediction for 2-lexical CFG parsing, without |VT| factors. Remark: must relax left-to-right parsing (because of the previous results)
  53. TD enhanced. Result: TD parsing with 2-lex CFG in time O(|VD|³ × |w|⁴). Open problem: an O(|w|³) extension to split grammars
  54. TD enhanced. Strongest version of the correct-prefix property. [Diagram: a derivation from S over w, with the current reading position marked]
  55. Data Structures. Productions with lhs A[d]: • A[d] → X1[d1] X2[d2] • A[d] → Y1[d3] Y2[d2] • A[d] → Z1[d2] Z2[d1] are stored in a trie for A[d], keyed by the sequences of lexical heads in their right-hand sides (d1 d2, d3 d2, d2 d1)
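One plausible realization of such a trie (an assumption for illustration, not necessarily the talk's exact data structure):

```python
# Index productions with lhs A[d] by the sequence of lexical heads in
# their right-hand sides, so lookups can follow head sequences incrementally.
def build_trie(prods):
    root = {}
    for heads, prod in prods:                    # heads: tuple of rhs heads
        node = root
        for h in heads:
            node = node.setdefault(h, {})
        node.setdefault(None, []).append(prod)   # None marks "production ends here"
    return root

trie = build_trie([
    (("d1", "d2"), "A[d] -> X1[d1] X2[d2]"),
    (("d3", "d2"), "A[d] -> Y1[d3] Y2[d2]"),
    (("d2", "d1"), "A[d] -> Z1[d2] Z2[d1]"),
])
print(trie["d1"]["d2"][None])   # ['A[d] -> X1[d1] X2[d2]']
```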
  56. Data Structures. Rightmost subsequence recognition, by precompiling the input w into a deterministic FA. Example input: a b a b c a c a c a b b
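A common way to support such queries, sketched here as an assumption about what the precompiled automaton computes: a table prev[i][c] giving the rightmost position j < i with w[j] = c, which lets a head sequence be matched right-to-left in time linear in its length.

```python
def build_prev(w):
    """prev[i] maps each symbol c to the rightmost j < i with w[j] == c."""
    prev, last = [], {}
    for i, a in enumerate(w):
        prev.append(dict(last))
        last[a] = i
    prev.append(dict(last))
    return prev

def rightmost_subsequence(prev, seq, k):
    """Rightmost match of seq as a subsequence of w[:k], or None."""
    pos, match = k, []
    for a in reversed(seq):          # greedy right-to-left matching
        pos = prev[pos].get(a)
        if pos is None:
            return None
        match.append(pos)
    return list(reversed(match))

w = list("ababcacacabb")             # the slide's example input
prev = build_prev(w)
print(rightmost_subsequence(prev, list("abc"), len(w)))   # [2, 3, 8]
```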
  57. Algorithm #3. Item representation (i, j, k) for A[d]: • i, j indicate the extension of the partial analysis of A[d] • k indicates the rightmost possible position for the completion of the analysis of A[d]
  58. Algorithm #3: Prediction • Step 1: find the rightmost subsequence before k that matches the lexical heads (d1, d2) of some A[d2] production • Step 2: make an Earley prediction
  59. Conclusions • standard parsing techniques are not suitable for processing lexicalized grammars • novel algorithms have been introduced, using enhanced dynamic programming • work to be done: extension to history-based models
  60. The End. Many thanks for helpful discussion to: Jason Eisner, Mark-Jan Nederhof
