
- 1. K-best, Locally-pruned, Transition-based Dependency Parsing using Robust Risk Minimization. Jinho D. Choi, University of Colorado at Boulder. J. D. Power and Associates, September 9, 2009.
- 2. Dependency Structure • What is dependency? - Syntactic or semantic relation between word tokens • Syntactic: NMOD (a beautiful woman) • Semantic: LOC (places in this city), TMP (events in this year) • Phrase structure vs. dependency structure - Constituents vs. dependencies [Figure: a phrase-structure tree and a dependency tree for "she bought a car"; the dependency tree has SBJ(bought → she), OBJ(bought → car), DET(car → a)]
- 3. Dependency Graph • For a sentence s = w_1 .. w_n, a dependency graph G_s = (V_s, E_s) - V_s = {w_0 = root, w_1, ..., w_n} - E_s = {(w_i, r, w_j) : w_i ≠ w_j, w_i ∈ V_s, w_j ∈ V_s − {w_0}, r ∈ R_s} - R_s = the set of all dependency relations in s • A well-formed dependency graph - Unique root, single head, connected, acyclic → dependency tree - Projective (e.g., "root She bought a car", O(n)) vs. non-projective (e.g., "root She bought a car yesterday that was blue", O(n²))
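The well-formedness conditions above can be checked mechanically. Below is a minimal sketch, not from the slides (the function name and the triple encoding are mine), that validates unique root, single head, connectedness, and acyclicity over (head, relation, dependent) triples:

```python
# A minimal sketch, assuming arcs are (head, relation, dependent) triples
# over token ids 0..n, with 0 = the artificial root.
def is_well_formed(n, arcs):
    heads = {}
    for head, rel, dep in arcs:
        if dep in heads:                  # single-head condition violated
            return False
        heads[dep] = head
    if set(heads) != set(range(1, n + 1)):  # every real token needs a head
        return False
    if sum(1 for h in heads.values() if h == 0) != 1:  # unique root
        return False
    for tok in range(1, n + 1):          # acyclic and connected:
        seen, cur = set(), tok           # every head chain must reach root
        while cur != 0:
            if cur in seen:              # found a cycle
                return False
            seen.add(cur)
            cur = heads[cur]
    return True

# "She bought a car": she <- bought, root -> bought, a <- car, bought -> car
arcs = [(2, "SBJ", 1), (0, "ROOT", 2), (4, "DET", 3), (2, "OBJ", 4)]
print(is_well_formed(4, arcs))  # True
```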
- 4. Dependency Parsing Models • Transition-based parsing model - Transition: an operation that searches for a dependency relation between each pair of words (e.g., Left-Arc, Shift) - Greedy search that finds local optima (locally optimized transitions) → does better for short-distance dependencies - Nivre's algorithm (projective, O(n)), Covington's algorithm (non-projective, O(n²)) • Graph-based parsing model - Build a complete graph with directed, weighted edges and find the tree with the highest score (sum of all weighted edges) - Exhaustive search that finds the global optimum (maximum spanning tree) → does better for long-distance dependencies - Eisner's algorithm (projective, O(n²)), Edmonds' algorithm (non-projective, O(n³))
- 5. Nivre's List-based Algorithm • Transition-based, non-projective dependency parsing algorithm • λ1, λ2: lists of partially processed tokens; β: the list of remaining unprocessed tokens • Initialization: (λ1, λ2, β, A) = ([0], [ ], [1, 2, ..., n], { }) • Termination: (λ1, λ2, β, A) = ([...], [...], [ ], {...}) • Deterministic shift vs. non-deterministic shift
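A hedged sketch of that configuration and the four list-based transitions (Left-Arc, Right-Arc, No-Arc, Shift), following the definitions above; class and function names are mine, and relation labels are omitted for brevity:

```python
class Config:
    def __init__(self, n):
        self.l1 = [0]                      # lambda_1 (partially processed)
        self.l2 = []                       # lambda_2 (partially processed)
        self.beta = list(range(1, n + 1))  # remaining input tokens
        self.arcs = set()                  # A: (head, dependent) pairs

def left_arc(c):    # front of beta becomes head of top of lambda_1
    i = c.l1.pop()
    c.arcs.add((c.beta[0], i))
    c.l2.insert(0, i)

def right_arc(c):   # top of lambda_1 becomes head of front of beta
    i = c.l1.pop()
    c.arcs.add((i, c.beta[0]))
    c.l2.insert(0, i)

def no_arc(c):      # pass over the top of lambda_1 without adding an arc
    c.l2.insert(0, c.l1.pop())

def shift(c):       # lambda_1 := lambda_1 . lambda_2 . [front of beta]
    c.l1 += c.l2 + [c.beta.pop(0)]
    c.l2 = []
```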
- 6.-25. Nivre's List-based Algorithm: a worked example on "root She bought a car". The original slides animate this one frame per slide; each step below gives the transition taken and the resulting configuration (λ1, λ2, β, A):
  • Initialize: λ1 = [root], λ2 = [ ], β = [she, bought, a, car], A = { }
  • Shift (she): λ1 = [root, she], β = [bought, a, car]
  • Left-Arc (she ← bought): λ1 = [root], λ2 = [she], A += {she ← bought}
  • Right-Arc (root → bought): λ1 = [ ], λ2 = [root, she], A += {root → bought}
  • Shift (root, she, bought): λ1 = [root, she, bought], λ2 = [ ], β = [a, car]
  • Shift (a): λ1 = [root, she, bought, a], β = [car]
  • Left-Arc (a ← car): λ1 = [root, she, bought], λ2 = [a], A += {a ← car}
  • Right-Arc (bought → car): λ1 = [root, she], λ2 = [bought, a], A += {bought → car}
  • Shift (bought, a, car): λ1 = [root, she, bought, a, car], λ2 = [ ], β = [ ]
  • Terminate: β is empty; A = {she ← bought, root → bought, a ← car, bought → car}
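Replaying this trace with the sketch from slide 5 reproduces the arc set A (No-Arc is never needed for this projective example):

```python
words = ["root", "she", "bought", "a", "car"]
c = Config(4)
for op in [shift, left_arc, right_arc, shift, shift,
           left_arc, right_arc, shift]:
    op(c)
assert c.beta == []                     # termination condition
print({(words[h], words[d]) for h, d in c.arcs})
# {('bought', 'she'), ('root', 'bought'), ('car', 'a'), ('bought', 'car')}
```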
- 26. Robust Risk Minimization • Linear binary classification algorithm - Searches for a hyperplane h(x) = wᵀx − θ that separates the two classes, -1 and 1, where class(x_i) = (h(x_i) < 0) ? -1 : 1. - Finds ŵ and θ̂ that solve the following optimization problem [the optimization problem appears only as an image on the original slide]. • Advantages - Learns to discount irrelevant features faster (than Perceptron). - Deals with non-linearly separable data more flexibly.
- 27. K-best, Locally-pruned Parsing • RRM is a binary classification algorithm. - One-against-all method using multiple classifiers. - What if more than one classifier predicts a transition? • Pick the transition with the highest score. • What if the highest-scoring transition is not correct?
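A hedged illustration of the one-against-all setup: one linear RRM-style classifier h(x) = wᵀx − θ per transition, keeping the k highest-scoring transitions rather than only the single best. The weights below are random placeholders, not trained RRM models:

```python
import numpy as np

def predict_transitions(x, models, k=2):
    """Return the k highest-scoring transitions for feature vector x."""
    scores = {label: w @ x - theta for label, (w, theta) in models.items()}
    return sorted(scores.items(), key=lambda kv: -kv[1])[:k]

rng = np.random.default_rng(0)
x = rng.random(8)                            # a feature vector (placeholder)
models = {t: (rng.random(8), 0.5)            # (w, theta) per transition
          for t in ["LEFT-ARC", "RIGHT-ARC", "NO-ARC", "SHIFT"]}
print(predict_transitions(x, models))        # 2-best transitions with scores
```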
- 28. K-best, Locally-pruned Parsing • Predicting a wrong transition at any state can generate a completely different tree (from the gold-standard tree). • It is better to use k-best transitions instead of 1-best. - Derive several trees and pick the one with the highest score. - score(tree) = Σ score(transition), over all transitions used to derive the tree - Problem with the above equation (addressed yesterday) • A tree derived by a longer sequence of transitions wins. • Normalize the score by the total number of transitions T. • score(tree) = (1/|T|) · Σ score(transition), over all transitions
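A tiny sketch of the normalized tree score, showing why the 1/|T| factor matters (the numbers are invented for illustration):

```python
# score(tree) = (1/|T|) * sum of transition scores, so a tree cannot win
# merely by being derived through a longer sequence of transitions.
def tree_score(transition_scores):
    return sum(transition_scores) / len(transition_scores)

short = [0.9, 0.8]                  # 2 confident transitions
long_ = [0.6, 0.6, 0.6, 0.6, 0.6]   # 5 mediocre transitions
print(tree_score(short) > tree_score(long_))  # True; raw sums would say otherwise
```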
- 29. Post-processing • The output from the transition-based parser is not guaranteed to be a tree but rather a forest. - It is possible for some tokens not to have found their heads. - For each such token, compare it against all other tokens and pick the one that gives the highest score to be the head. - For each such w_j: • Compare it against all w_i (i < j) and see which w_i gives the highest-scoring Right-Arc transition. • Compare it against all w_k (j < k) and see which w_k gives the highest-scoring Left-Arc transition.
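A hedged sketch of this post-processing pass; arc_score stands in for the classifier's Right-Arc/Left-Arc scores and is not part of the original slides:

```python
# A sketch, not the paper's implementation: heads maps dependent -> head
# (None if the parser left the token unattached); arc_score(i, j) stands in
# for the classifier's score for making token i the head of token j.
def attach_headless(n, heads, arc_score):
    for j in range(1, n + 1):                # token 0 is the root
        if heads.get(j) is None:
            candidates = [i for i in range(n + 1) if i != j]
            heads[j] = max(candidates, key=lambda i: arc_score(i, j))
    return heads

# Toy usage: token 3 is headless; a made-up scorer prefers nearby heads.
heads = {1: 2, 2: 0, 3: None, 4: 2}
print(attach_headless(4, heads, lambda i, j: -abs(i - j)))
# {1: 2, 2: 0, 3: 2, 4: 2}  (3 attaches to its highest-scoring candidate)
```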
- 30. Feature Space • About 14 million features • f: form, m: lemma, p: pos-tag, d: dependency label • lm(w): left-most dependent, ln(w): left-nearest dependent, rm(w): right-most dependent, rn(w): right-nearest dependent
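The extracted text does not preserve the actual feature templates, so the sketch below only illustrates the kind of conjoined form/lemma/POS templates such parsers typically build in the f/m/p notation above; the template set is invented, not the paper's 14 million features:

```python
# Illustrative only: conjoined templates over the top of lambda_1 and the
# front of beta, the two tokens a transition would relate.
def transition_features(l1_top, b_front):
    return {
        f"l1.p={l1_top['p']}",
        f"b.p={b_front['p']}",
        f"l1.m={l1_top['m']}|b.m={b_front['m']}",
        f"l1.p|b.p={l1_top['p']}|{b_front['p']}",
    }

she = {"f": "she", "m": "she", "p": "PRP"}
bought = {"f": "bought", "m": "buy", "p": "VBD"}
print(transition_features(she, bought))
```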
- 31. Evaluation • Models:
  I. Greedy search using the highest-scoring transition
  II. Best search using all predicted transitions
  III. II + using the upper bound of 1
  IV. III + using the lower bound of −0.1
  V. III + using the lower bound of −0.2
  VI. V + using the top 2 scoring transitions
  VII. VI + post-processing
- 32. Evaluation • Parsing accuracies [Figure: bar chart of Labeled and Unlabeled Attachment Scores for models I-VII; plotted values include 87.88, 87.96, 88.08, 88.62, 88.87, 88.87, 89.21, 89.28, 89.34, 89.42, 90.12, 90.47, 90.47, 90.97, but the per-model pairing is not recoverable from the extracted text]
- 33. Evaluation • Average number of transitions [Figure: average transition counts for model groups I, II-III, IV, V, VI-VII, plotted against sentence-length bins 1-10, 11-20, 21-30, 31-40, 41-50, > 50; y-axis 0 to 1,500]
- 34. Summary and Conclusions • Summary - Transition-based, non-projective dependency parsing - k-best, locally pruned dependency parsing - Post-processing - Robust Risk Minimization • Conclusions - It is possible to achieve higher parsing accuracy by considering k-best, locally pruned trees, while keeping near-quadratic running time in practice.
- 35. Future Work • Parsing Algorithm - Search transitions for both the left and right sides of β[0]. - Beam search. - Normalize scores and use priors for transitions. • Feature - Cut off features that occur less often than a threshold. - Predicate-argument structure from frameset files. • Machine learning algorithm - Apply different values for the learning parameters. - Compare with Perceptron and Support Vector Machines.
