K-best, Locally Pruned, Transition-based Dependency Parsing Using Robust Risk Minimization

    K-best, Locally Pruned, Transition-based Dependency Parsing Using Robust Risk Minimization - Presentation Transcript

    1. K-best, Locally-pruned, Transition-based Dependency Parsing using Robust Risk Minimization Jinho D. Choi University of Colorado at Boulder J. D. Power and Associates September 9, 2009
    2. Dependency Structure • What is dependency? - Syntactic or semantic relation between word-tokens • Syntactic: NMOD (a beautiful woman) • Semantic: LOC (places in this city), TMP (events in this year) • Phrase structure vs. dependency structure - Constituents vs. dependencies [Figure: "she bought a car" as a phrase-structure tree (S → NP VP; NP → Pro "she"; VP → V "bought" NP; NP → Det "a" N "car") and as a dependency tree (bought → she: SBJ; bought → car: OBJ; car → a: DET)]
    3. Dependency Graph • For a sentence $s = w_1 \ldots w_n$, a dependency graph $G_s = (V_s, E_s)$ - $V_s = \{w_0 = \mathrm{root}, w_1, \ldots, w_n\}$ - $E_s = \{(w_i, r, w_j) : w_i \neq w_j,\ w_i \in V_s,\ w_j \in V_s - \{w_0\},\ r \in R_s\}$ - $R_s$ = a set of all dependency relations in $s$ • A well-formed dependency graph - Unique root, single head, connected, acyclic ⇒ dependency tree - Projective vs. non-projective [Figure: projective example "root She bought a car", $O(n)$, vs. non-projective example "root She bought a car yesterday that was blue", $O(n^2)$]
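
The well-formedness conditions on the slide above can be checked mechanically. The following is a minimal sketch, not the author's code: it assumes an unlabeled tree stored as a `heads` dictionary (token index to head index, with 0 standing for the root $w_0$), and it tests projectivity by looking for crossing arcs. The function names are hypothetical, and the head assignments for the longer sentence are an assumed analysis, since the slide's figure did not survive in the transcript.

```python
from typing import Dict


def is_well_formed(heads: Dict[int, int], n: int) -> bool:
    """heads[j] = i means w_i is the head of w_j; w_0 is the root."""
    # Single head: each of w_1..w_n appears exactly once as a dependent.
    if set(heads) != set(range(1, n + 1)):
        return False
    # Heads must be valid tokens, and no token may head itself (w_i != w_j).
    if any(i == j or not 0 <= i <= n for j, i in heads.items()):
        return False
    # Connected and acyclic: walking up from any token must end at w_0.
    for start in heads:
        seen = set()
        node = start
        while node != 0:
            if node in seen:
                return False
            seen.add(node)
            node = heads[node]
    return True


def is_projective(heads: Dict[int, int]) -> bool:
    """True if no two arcs cross when drawn above the sentence."""
    arcs = [(min(i, j), max(i, j)) for j, i in heads.items()]
    return not any(a < c < b < d for a, b in arcs for c, d in arcs)


# "She bought a car": she <- bought, root -> bought, a <- car, bought -> car
short = {1: 2, 2: 0, 3: 4, 4: 2}
# "She bought a car yesterday that was blue" (assumed analysis: "was"
# attaches to "car", so that arc crosses the arc bought -> yesterday)
long_ = {1: 2, 2: 0, 3: 4, 4: 2, 5: 2, 6: 7, 7: 4, 8: 7}

print(is_well_formed(short, 4), is_projective(short))
print(is_well_formed(long_, 8), is_projective(long_))
```

Under these assumptions both examples are well-formed trees, but only the shorter one is projective.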
    4. Dependency Parsing Models • Transition-based parsing model - Transition: an operation that searches for a dependency relation between each pair of words (e.g. Left-Arc, Shift, etc.) - Greedy search that finds local optima (locally optimized transitions) → does better for short-distance dependencies - Nivre’s algorithm (projective, $O(n)$), Covington’s algorithm (non-projective, $O(n^2)$) • Graph-based parsing model - Build a complete graph with directed, weighted edges and find the tree with the highest score (sum of all edge weights) - Exhaustive search that finds the global optimum (maximum spanning tree) → does better for long-distance dependencies - Eisner’s algorithm (projective, $O(n^3)$), Edmonds’ algorithm (non-projective, $O(n^3)$)
    5. Nivre’s List-based Algorithm • Transition-based, non-projective dependency parsing algorithm • $\lambda_1$, $\lambda_2$ = lists of partially processed tokens; $\beta$ = a list of remaining unprocessed tokens • Initialization: $(\lambda_1, \lambda_2, \beta, A) = ([0], [\,], [1, 2, \ldots, n], \{\,\})$ • Termination: $(\lambda_1, \lambda_2, \beta, A) = ([\ldots], [\ldots], [\,], \{\ldots\})$ • Deterministic shift vs. non-deterministic shift
    6-25. Nivre’s List-based Algorithm [Animation frames for "root She bought a car", showing $\lambda_1$, $\lambda_2$, $\beta$ and $A$ after each step] • Initialize • Shift: she • Left-Arc: she ← bought • Right-Arc: root → bought • Shift: root, she, bought • Shift: a • Left-Arc: a ← car • Right-Arc: bought → car • Shift: bought, a, car • Terminate, with $A$ = {she ← bought, root → bought, a ← car, bought → car}
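
The walkthrough on these slides can be replayed in a few lines. This is a sketch under assumptions rather than the author's implementation: the transition definitions below are inferred from how the lists change in the animation (Shift moves all of λ2 plus the first token of β onto λ1; Left-Arc and Right-Arc link the last token of λ1 with the first token of β and then move the λ1 token to the front of λ2). Arcs are unlabeled `(head, dependent)` pairs, tokens are plain strings (which only works because no word repeats), and the function names are hypothetical.

```python
def shift(l1, l2, beta, arcs):
    # Move everything in lambda2 and the first token of beta onto lambda1.
    return l1 + l2 + [beta[0]], [], beta[1:], arcs


def left_arc(l1, l2, beta, arcs):
    # The last token of lambda1 becomes a dependent of the first token of beta.
    dep, head = l1[-1], beta[0]
    return l1[:-1], [dep] + l2, beta, arcs | {(head, dep)}


def right_arc(l1, l2, beta, arcs):
    # The first token of beta becomes a dependent of the last token of lambda1.
    head, dep = l1[-1], beta[0]
    return l1[:-1], [head] + l2, beta, arcs | {(head, dep)}


def run(tokens, transitions):
    # Initialization: lambda1 = [root], lambda2 = [], beta = all tokens, A = {}.
    config = (["root"], [], list(tokens), frozenset())
    for transition in transitions:
        config = transition(*config)
    return config


# The transition sequence listed on the slides, in order.
sequence = [shift, left_arc, right_arc, shift,
            shift, left_arc, right_arc, shift]
l1, l2, beta, arcs = run(["she", "bought", "a", "car"], sequence)
print(l1, l2, beta)
print(sorted(arcs))
```

Parsing terminates once β is empty, which is the state this sequence ends in.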
    26. Robust Risk Minimization • Linear binary classification algorithm - Searches for a hyperplane $h(x) = w^T \cdot x - \theta$ that separates two classes, -1 and 1, where class($x_i$) = ($h(x_i) < 0$) ? -1 : 1. - Finds $\hat{w}$ and $\hat{\theta}$ that solve an optimization problem [equation shown as an image on the slide; not preserved in the transcript]. • Advantages - Learns irrelevant features faster (than Perceptron). - Deals with non-linearly separable data more flexibly.
    27. K-best, Locally-pruned Parsing • RRM is a binary classification algorithm. - One-against-all method using multiple classifiers. - What if more than one classifier predicts a transition? • Pick the transition with the highest score. • What if the highest scoring transition is not correct?
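
The one-against-all decision rule described on this slide can be sketched as follows. This shows only prediction, not RRM training, and it assumes each binary classifier scores an input as w·x − θ, as on the preceding RRM slide. The weights, thresholds, and feature vector are made-up toy values, and `predict` is a hypothetical name.

```python
from typing import Dict, List, Tuple


def h(w: List[float], theta: float, x: List[float]) -> float:
    # Signed score of x against one hyperplane: w . x - theta.
    return sum(wi * xi for wi, xi in zip(w, x)) - theta


def predict(classifiers: Dict[str, Tuple[List[float], float]],
            x: List[float]) -> Tuple[str, Dict[str, float]]:
    # One binary classifier per transition; several may score above zero,
    # so the transition with the highest score is picked.
    scores = {t: h(w, theta, x) for t, (w, theta) in classifiers.items()}
    best = max(scores, key=scores.get)
    return best, scores


# Toy classifiers (assumed values, for illustration only).
classifiers = {
    "Shift": ([1.0, 0.0], 0.5),
    "Left-Arc": ([0.0, 1.0], 0.2),
    "Right-Arc": ([-1.0, -1.0], 0.0),
}
best, scores = predict(classifiers, [1.0, 1.0])
print(best, scores)
```

In this toy case both Shift and Left-Arc score above zero, which is exactly the situation the slide asks about; the rule resolves it by taking the larger score.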
    28. K-best, Locally-pruned Parsing • Predicting a wrong transition at any state can generate a completely different tree from the gold-standard one. • It is better to use k-best transitions instead of 1-best. - Derive several trees and pick the one with the highest score. - $\mathrm{score}(\mathit{tree}) = \sum_{t \in T} \mathrm{score}(t)$, where $T$ is the sequence of transitions used to derive the tree - Problem with the above equation (addressed yesterday) • A tree derived by a longer sequence of transitions wins. • Normalize the score by the total number of transitions. • $\mathrm{score}(\mathit{tree}) = \frac{1}{|T|} \sum_{t \in T} \mathrm{score}(t)$
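
A small numeric sketch of why the slide normalizes the tree score: with a plain sum, a tree reached through more transitions can beat a tree whose individual transitions all scored higher. The candidate names and score values below are invented for illustration.

```python
from typing import Callable, Dict, List


def sum_score(scores: List[float]) -> float:
    # Unnormalized tree score: sum of the transition scores.
    return sum(scores)


def normalized_score(scores: List[float]) -> float:
    # Tree score divided by the number of transitions |T|.
    return sum(scores) / len(scores)


def pick_best(candidates: Dict[str, List[float]],
              scorer: Callable[[List[float]], float]) -> str:
    # Return the name of the candidate tree with the highest score.
    return max(candidates, key=lambda name: scorer(candidates[name]))


# Two hypothetical derivations: a short confident one and a long weak one.
candidates = {
    "short_tree": [0.9, 0.9],
    "long_tree": [0.5, 0.5, 0.5, 0.5],
}
print(pick_best(candidates, sum_score))
print(pick_best(candidates, normalized_score))
```

Here the sum prefers the longer derivation (2.0 against 1.8), while the normalized score prefers the shorter one (0.9 against 0.5).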
    29. Post-processing • The output from the transition-based parser is not guaranteed to be a tree; it may be a forest. - Some tokens may not have found their heads. - For each such token, compare it against all other tokens and pick the one that gives the highest score to be the head. - For each such $w_j$: • Compare it against all $w_i$ with $i < j$ and see which $w_i$ gives the highest scoring Right-Arc transition. • Compare it against all $w_k$ with $j < k$ and see which $w_k$ gives the highest scoring Left-Arc transition.
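
The post-processing step can be sketched as a single pass over the headless tokens. This is an illustration under assumptions, not the author's code: `right_arc_score` and `left_arc_score` are hypothetical stand-ins for the classifier scores, the root $w_0$ is assumed to count among the candidates to the left, and no check is made that the new arc keeps the graph acyclic, since the slide does not say how that case is handled.

```python
from typing import Callable, Dict

Score = Callable[[int, int], float]


def attach_headless(n: int, heads: Dict[int, int],
                    right_arc_score: Score, left_arc_score: Score) -> Dict[int, int]:
    """Give every token in 1..n without a head its highest scoring head."""
    fixed = dict(heads)
    for j in range(1, n + 1):
        if j in fixed:
            continue
        # Candidates to the left (i < j) would attach with Right-Arc i -> j.
        candidates = [(right_arc_score(i, j), i) for i in range(0, j)]
        # Candidates to the right (j < k) would attach with Left-Arc j <- k.
        candidates += [(left_arc_score(j, k), k) for k in range(j + 1, n + 1)]
        fixed[j] = max(candidates)[1]
    return fixed


# "She bought a car" where "a" (token 3) was left without a head.
partial = {1: 2, 2: 0, 4: 2}
# Toy scores (assumed): only the Left-Arc from "a" to "car" scores well.
toy_right = lambda i, j: 0.1
toy_left = lambda j, k: 0.9 if (j, k) == (3, 4) else 0.0
print(attach_headless(4, partial, toy_right, toy_left))
```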
    30. Feature Space • About 14 million features • f: form, m: lemma, p: pos-tag, d: dependency label • lm(w): left-most dependent, ln(w): left-nearest dependent, rm(w): right-most dependent, rn(w): right-nearest dependent
    31. Evaluation • Models - I. Greedy search using the highest scoring transition - II. Best search using all predicted transitions - III. II + using the upper bound of 1 - IV. III + using the lower bound of -0.1 - V. III + using the lower bound of -0.2 - VI. V + using top 2 scoring transitions - VII. VI + post-processing
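
The slide names the pruning settings but not how they are applied, so the following is only one plausible reading, stated as an assumption: scores are capped at the upper bound, transitions scoring below the lower bound are dropped, and at most the top k survivors are kept. The lower bound is read here as a negative threshold such as -0.2, which is also an assumption. The fallback to the single best transition when nothing survives is a further assumption, and `select_transitions` is a hypothetical name.

```python
from typing import Dict, List, Tuple


def select_transitions(scores: Dict[str, float],
                       lower_bound: float = -0.2,
                       upper_bound: float = 1.0,
                       k: int = 2) -> List[Tuple[str, float]]:
    # Cap every score at the upper bound (assumed meaning of "upper bound").
    capped = {t: min(s, upper_bound) for t, s in scores.items()}
    # Drop transitions scoring below the lower bound (local pruning).
    kept = [(t, s) for t, s in capped.items() if s >= lower_bound]
    kept.sort(key=lambda pair: pair[1], reverse=True)
    if not kept:
        # Assumed fallback: behave greedily and keep the single best one.
        best = max(capped, key=capped.get)
        return [(best, capped[best])]
    # Keep at most the k highest scoring transitions.
    return kept[:k]


toy_scores = {"Shift": 1.4, "Left-Arc": -0.1, "Right-Arc": -0.5}
print(select_transitions(toy_scores))
print(select_transitions(toy_scores, k=1))
```

With these toy scores, Right-Arc is pruned, Shift is capped at 1.0, and setting k to 1 reduces the selection to greedy search.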
    32. Evaluation • Parsing accuracies [Bar chart: Labeled Attachment Score and Unlabeled Attachment Score (%) for models I to VII; y-axis from 80.00 to 95.00; plotted values range from 87.88 to 90.97]
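
For reference, the two metrics in this chart are conventionally computed per token: the unlabeled attachment score counts tokens whose predicted head is correct, and the labeled attachment score additionally requires the correct dependency label. The sketch below uses that standard definition; whether the slides excluded punctuation or applied other conventions is not stated. The example parse and the "ROOT" label are invented for illustration.

```python
from typing import List, Tuple

Arc = Tuple[int, str]  # (head index, dependency label) for one token


def attachment_scores(gold: List[Arc], pred: List[Arc]) -> Tuple[float, float]:
    """Return (LAS, UAS) as percentages over all tokens."""
    if not gold or len(gold) != len(pred):
        raise ValueError("gold and pred must be non-empty and equally long")
    # UAS: the head matches. LAS: both the head and the label match.
    uas_hits = sum(g[0] == p[0] for g, p in zip(gold, pred))
    las_hits = sum(g == p for g, p in zip(gold, pred))
    total = len(gold)
    return 100.0 * las_hits / total, 100.0 * uas_hits / total


# "She bought a car": one label is wrong in the prediction, all heads are right.
gold = [(2, "SBJ"), (0, "ROOT"), (4, "DET"), (2, "OBJ")]
pred = [(2, "SBJ"), (0, "ROOT"), (4, "DET"), (2, "NMOD")]
las, uas = attachment_scores(gold, pred)
print(las, uas)
```

Because every labeled hit is also an unlabeled hit, LAS can never exceed UAS.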
    33. Evaluation • Average number of transitions [Chart: series I, II-III, IV, V, VI-VII; y-axis from 0 to 1,500; x-axis labels 2007, 1-10, 11-20, 21-30, 31-40, 41-50, > 50]
    34. Summary and Conclusions • Summary - Transition-based, non-projective dependency parsing - k-best, locally pruned dependency parsing - Post-processing - Robust Risk Minimization • Conclusions - It is possible to achieve higher parsing accuracy by considering k-best, locally pruned trees, - while keeping near quadratic running time in practice.
    35. Future Work • Parsing Algorithm - Search transitions for both left and right sides of $\beta[0]$. - Beam search. - Normalize scores and use priors for transitions. • Feature - Cut off features that fall below a threshold. - Predicate-argument structure from frameset files. • Machine learning algorithm - Apply different values for learning parameters. - Compare with Perceptron, Support Vector Machine.

    Jinho D. Choi, 4 months ago

    We combine transition-based dependency parsing with more
