K -best, Locally Pruned, Transition-based Dependency Parsing Using Robust Risk Minimization


We combine transition-based dependency parsing with a high-performing but relatively underexplored machine learning technique, Robust Risk Minimization. During decoding, we judiciously prune the next parsing states using k-best ranking. Moreover, we apply a simple post-processing step to ensure robustness. We evaluate our approach on the CoNLL'09 shared task English data and improve transition-based dependency parsing accuracy, reaching a labeled attachment score of 89.28%. We have also observed near-quadratic average running time for the algorithm in practice.



  1. K-best, Locally-pruned, Transition-based Dependency Parsing using Robust Risk Minimization
     Jinho D. Choi, University of Colorado at Boulder / J. D. Power and Associates, September 9, 2009
  2. Dependency Structure
     • What is dependency?
       - Syntactic or semantic relation between word-tokens
         • Syntactic: NMOD (a beautiful woman)
         • Semantic: LOC (places in this city), TMP (events in this year)
     • Phrase structure vs. dependency structure
       - Constituents vs. dependencies
       [Figure: phrase-structure tree vs. dependency tree (SBJ, OBJ, DET) for "she bought a car"]
  3. Dependency Graph
     • For a sentence s = w1 .. wn, a dependency graph Gs = (Vs, Es)
       - Vs = {w0 = root, w1, ..., wn}
       - Es = {(wi, r, wj) : wi ≠ wj, wi ∈ Vs, wj ∈ Vs − {w0}, r ∈ Rs}
       - Rs = a set of all dependency relations in s
     • A well-formed dependency graph
       - Unique root, single head, connected, acyclic ⇒ dependency tree
       - Projective vs. non-projective
       [Figure: a projective tree for "root She bought a car" (O(n)) vs. a non-projective tree for "root She bought a car yesterday that was blue" (O(n²))]
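A minimal sketch of this definition, purely for illustration (not from the slides): the graph is stored as (head, relation, dependent) arcs over token indices 0..n with 0 as the artificial root, and the check mirrors the well-formedness conditions above.

    # Sketch: well-formedness check for a dependency graph given as
    # (head, relation, dependent) triples over token indices 0..n (0 = root).
    def is_well_formed(n, arcs):
        heads = {}
        for head, rel, dep in arcs:
            if dep == 0 or head == dep:        # the root takes no head; no self-loops
                return False
            if dep in heads:                   # single-head condition
                return False
            heads[dep] = head
        if len(heads) != n:                    # every token 1..n needs a head (connectedness)
            return False
        for dep in heads:                      # acyclic: walking up the heads must reach the root
            seen, node = set(), dep
            while node != 0:
                if node in seen:
                    return False
                seen.add(node)
                node = heads[node]
        return True

    # Example sentence from the slides, "She bought a car" (1=She, 2=bought, 3=a, 4=car);
    # the labels here are only illustrative.
    print(is_well_formed(4, {(2, "SBJ", 1), (0, "ROOT", 2), (4, "DET", 3), (2, "OBJ", 4)}))   # True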
  4. Dependency Parsing Models
     • Transition-based parsing model
       - Transition: an operation that searches for a dependency relation between each pair of words (e.g. Left-Arc, Shift, etc.)
       - Greedy search that finds local optima (locally optimized transitions) ⇒ does better for short-distance dependencies
       - Nivre's algorithm (projective, O(n)), Covington's algorithm (non-projective, O(n²))
     • Graph-based parsing model
       - Builds a complete graph with directed, weighted edges and finds the tree with the highest score (sum of all weighted edges)
       - Exhaustive search that finds the global optimum (maximum spanning tree) ⇒ does better for long-distance dependencies
       - Eisner's algorithm (projective, O(n²)), Edmonds' algorithm (non-projective, O(n³))
  5. Nivre's List-based Algorithm
     • Transition-based, non-projective dependency parsing algorithm
     • λ1, λ2 := lists of partially processed tokens
       β := a list of remaining unprocessed tokens
     • Initialization: (λ1, λ2, β, A) = ([0], [ ], [1, 2, ..., n], { })
       Termination: (λ1, λ2, β, A) = ([...], [...], [ ], {...})
     • Deterministic shift vs. non-deterministic shift
       (a code sketch of the transitions follows the walkthrough below)
  6.–25. Nivre's List-based Algorithm: worked example
     [Figures: the parser state (λ1, λ2, β, A) after each transition for the sentence "root She bought a car"]
     • Transition sequence: Initialize → Shift (she) → Left-Arc (she ← bought) → Right-Arc (root → bought) → Shift (root, she, bought) → Shift (a) → Left-Arc (a ← car) → Right-Arc (bought → car) → Shift (bought, a, car) → Terminate
     • Resulting arc set A = {she ← bought, root → bought, a ← car, bought → car}
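A minimal Python sketch of the list-based transitions on a state (λ1, λ2, β, A), replaying the walkthrough above; this is an illustrative reconstruction, not the author's implementation, and arcs are simplified to unlabeled (head, dependent) pairs.

    # Illustrative sketch of Nivre's list-based transitions; a state is (lam1, lam2, beta, arcs),
    # arcs are unlabeled (head, dependent) pairs, and token 0 is the artificial root.
    def init_state(n):
        return ([0], [], list(range(1, n + 1)), set())

    def shift(state):                          # move lam2 and the next input token onto lam1
        lam1, lam2, beta, arcs = state
        return (lam1 + lam2 + [beta[0]], [], beta[1:], arcs)

    def no_arc(state):                         # no relation: just move lam1's last token to lam2
        lam1, lam2, beta, arcs = state
        return (lam1[:-1], [lam1[-1]] + lam2, beta, arcs)

    def left_arc(state):                       # lam1's last token becomes a dependent of beta[0]
        lam1, lam2, beta, arcs = state
        i, j = lam1[-1], beta[0]
        return (lam1[:-1], [i] + lam2, beta, arcs | {(j, i)})

    def right_arc(state):                      # beta[0] becomes a dependent of lam1's last token
        lam1, lam2, beta, arcs = state
        i, j = lam1[-1], beta[0]
        return (lam1[:-1], [i] + lam2, beta, arcs | {(i, j)})

    # Replay the walkthrough for "root She bought a car" (1=She, 2=bought, 3=a, 4=car);
    # No-Arc is part of the algorithm but is not needed for this sentence.
    state = init_state(4)
    for t in [shift, left_arc, right_arc, shift, shift, left_arc, right_arc, shift]:
        state = t(state)
    print(state[3])   # {(2, 1), (0, 2), (4, 3), (2, 4)} = she<-bought, root->bought, a<-car, bought->car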
  26. Robust Risk Minimization
     • Linear binary classification algorithm
       - Searches for a hyperplane h(x) = wᵀ·x − θ that separates two classes, −1 and 1, where class(xi) = (h(xi) < 0) ? −1 : 1.
       - Finds ŵ and θ̂ that solve the following optimization problem. [equation shown as an image on the slide]
     • Advantages
       - Learns irrelevant features faster (than Perceptron).
       - Deals with non-linearly separable data more flexibly.
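The decision rule above, written out as a small Python sketch; only the classification step is shown, since learning ŵ and θ̂ is RRM's optimization problem, whose objective appears only as an equation image on the original slide.

    # Decision rule of the linear classifier h(x) = w.x - theta (sketch of the slide's rule only).
    def h(w, theta, x):
        return sum(wi * xi for wi, xi in zip(w, x)) - theta

    def classify(w, theta, x):
        return -1 if h(w, theta, x) < 0 else 1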
  27. K-best, Locally-pruned Parsing
     • RRM is a binary classification algorithm.
       - One-against-all method using multiple classifiers.
       - What if more than one classifier predicts a transition?
         • Pick the transition with the highest score.
         • What if the highest scoring transition is not correct?
  28. K-best, Locally-pruned Parsing
     • Predicting a wrong transition at any state can generate a completely different tree (from the gold-standard tree).
     • It is better to use the k-best transitions instead of the 1-best.
       - Derive several trees and pick the one with the highest score.
       - score(tree) = Σ score(transition), ∀ transitions used to derive the tree
       - Problem with the above equation (addressed yesterday)
         • A tree derived by a longer sequence of transitions wins.
         • Normalize the score by the total number of transitions.
         • score(tree) = 1/|T| · Σ score(transition), ∀ transitions
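A sketch of the normalized tree score from the last bullet; representing each candidate as a (tree, list of transition scores) pair is an assumption made for illustration.

    # score(tree) = (1/|T|) * sum of transition scores, so a longer derivation
    # does not win simply by accumulating more terms.
    def tree_score(transition_scores):
        return sum(transition_scores) / len(transition_scores)

    def best_tree(candidates):
        # candidates: iterable of (tree, transition_scores) pairs from the k-best search
        return max(candidates, key=lambda c: tree_score(c[1]))[0]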
  29. Post-processing
     • The output from the transition-based parser is not guaranteed to be a tree but rather a forest.
       - It is possible for some tokens not to have found their heads.
       - For each such token, compare it against all other tokens and pick the one that gives the highest score to be the head.
       - For each such wj:
         • Compare it against all wi (i < j) and see which wi gives the highest scoring Right-Arc transition.
         • Compare it against all wk (j < k) and see which wk gives the highest scoring Left-Arc transition.
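A sketch of this head-recovery step; the `arc_score` function stands in for the classifier's Right-Arc/Left-Arc transition scores and is an assumption, not something defined on the slides.

    # Attach every headless token j to the candidate head with the highest arc score:
    # heads to the left are scored as Right-Arc, heads to the right as Left-Arc.
    def attach_headless(tokens, heads, arc_score):
        for j in tokens:
            if heads.get(j) is None:
                candidates = [(arc_score(i, j, "Right-Arc"), i) for i in tokens if i < j]
                candidates += [(arc_score(k, j, "Left-Arc"), k) for k in tokens if k > j]
                if candidates:
                    heads[j] = max(candidates)[1]
        return heads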
  30. Feature Space
     • About 14 million features
       [Figure: feature template table]
     • f: form, m: lemma, p: pos-tag, d: dependency label
     • lm(w): left-most dependent, ln(w): left-nearest dependent
       rm(w): right-most dependent, rn(w): right-nearest dependent
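The concrete feature templates appear only as a table image on the original slide and are not reproduced here; purely to illustrate the notation, a template such as p(w)+f(lm(w)) might be instantiated like this (hypothetical example).

    # Hypothetical instance of the notation only: combine the pos-tag of w with the
    # form of w's left-most dependent into one feature string.
    def feature_p_w_f_lmw(w, left_most_dep):
        return "p(w)+f(lm(w))=" + w["pos"] + "|" + left_most_dep["form"]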
  31. Evaluation
     • Models
       I. Greedy search using the highest scoring transition
       II. Best search using all predicted transitions
       III. II + using the upper bound of 1
       IV. III + using the lower bound of −0.1
       V. III + using the lower bound of −0.2
       VI. V + using the top 2 scoring transitions
       VII. VI + post-processing
  32. Evaluation
     • Parsing accuracies
       [Bar chart: Labeled and Unlabeled Attachment Scores for models I–VII; LAS ranges from 87.88 to 89.28, UAS from 89.21 to 90.97]
  33. Evaluation
     • Average number of transitions
       [Chart: average number of transitions per sentence-length bin (1-10, 11-20, 21-30, 31-40, 41-50, > 50) for models I, II-III, IV, V, VI-VII; y-axis from 0 to 1,500]
  34. Summary and Conclusions
     • Summary
       - Transition-based, non-projective dependency parsing
       - k-best, locally pruned dependency parsing
       - Post-processing
       - Robust Risk Minimization
     • Conclusions
       - It is possible to achieve higher parsing accuracy by considering k-best, locally pruned trees, while keeping near-quadratic running time in practice.
  35. Future Work
     • Parsing algorithm
       - Search transitions for both left and right sides of β[0].
       - Beam search.
       - Normalize scores and use priors for transitions.
     • Features
       - Cut off features occurring less often than a threshold.
       - Predicate-argument structure from frameset files.
     • Machine learning algorithm
       - Apply different values for learning parameters.
       - Compare with Perceptron, Support Vector Machine.
