Getting the Most out of Transition-based Dependency Parsing


This paper suggests two ways of improving transition-based, non-projective dependency parsing. First, we add a transition to an existing non-projective parsing algorithm, so it can perform either projective or non-projective parsing as needed. Second, we present a bootstrapping technique that narrows down discrepancies between gold-standard and automatic parses used as features. The new addition to the algorithm shows a clear advantage in parsing speed. The bootstrapping technique gives a significant improvement to parsing accuracy, showing near state-of-the-art performance with respect to other parsing approaches evaluated on the same data set.

Published in: Technology, Education

Jinho D. Choi and Martha Palmer
Institute of Cognitive Science, University of Colorado at Boulder

Introduction

Why transition-based dependency parsing?
  • It is fast.
      • Projective parsing is O(n); non-projective parsing is O(n²).
  • Parse history can be used as features.
      • Parsing complexity is still preserved.

Can non-projective dependency parsing be any faster?
  • # of non-projective dependencies <<< # of projective dependencies.
      • Perform projective parsing for most cases and non-projective parsing only when it is needed.
  • Choi and Nicolov, 2009
      • Added a non-deterministic Shift transition to Nivre’s list-based non-projective algorithm.
      • Reduced the search space.
      • Achieved linear-time parsing speed in practice.
  • This work
      • Adds a transition from Nivre’s projective algorithm to Choi-Nicolov’s approach (Left-Pop).
      • Reduces the search space even more.

How do we use parse history as features?
  • Current approaches use gold-standard parses as features during training.
      • These are not necessarily what parsers encounter during decoding.
  • This work
      • Minimizes the gap between gold-standard and automatic parses using bootstrapping.

Parsing Algorithm

Transitions
  • Parsing states are represented as tuples (λ1, λ2, β, E).
      • λ1, λ2, and β are lists of word tokens.
      • E is a set of labeled edges (previously identified dependencies).
  • L is a dependency label, and i, j, k are indices of their corresponding word tokens.
  • The initial state is ([0], [ ], [1, …, n], E); 0 corresponds to the root node.
  • The final state is (λ1, λ2, [ ], E); the algorithm terminates when all tokens in β are consumed.
  • Left-Pop_L and Left-Arc_L are performed when w_j is the head of w_i with a dependency L.
      • Left-Pop removes w_i from λ1, assuming that the token is no longer needed.
      • Left-Arc keeps w_i so it can be the head of some token w_k (j < k ≤ n) in β.
  • Right-Arc_L is performed when w_i is the head of w_j with a dependency L.
  • Shift is performed when:
      • DT (deterministic): λ1 is empty.
      • NT (non-deterministic): there is no token in λ1 that is either the head or a dependent of w_j.
  • No-Arc buffers processed tokens so that each token in β can be compared to all (or some) of the tokens before it.

Parsing states
  • After Left-Pop is performed (#8), [w4 = my] is removed from the search space and no longer considered in the later parsing states (e.g., between #10 and #11).

Bootstrapping Technique
  • A simplified version of Searn, an algorithm for integrating search and learning to solve complex structured prediction problems (Daumé et al., 2009).
  • The first time that this idea has been applied to transition-based dependency parsing.
  • Gold-standard labels are obtained by comparing the dependency relation between w_i and w_j in the gold-standard tree.
  • Stop the procedure when the parsing accuracy of the current cross-validation is lower than the one from the previous iteration.

The range of subtree and head information
  • When w_i and w_j are compared, subtree and head information of these tokens is partially provided by previous parsing states.

Experiments

Experimental setup
  • Corpora: English and Czech data distributed by the CoNLL’09 shared task.
  • Machine learning algorithm: Liblinear L2-L1 SVM.

Accuracy comparisons
  • Our: ‘Choi and Nicolov’ + Left-Pop transition.
  • Our+: ‘Our’ + bootstrapping technique.
  • Gesmundo et al.: the best transition-based system for CoNLL’09.
  • Bohnet: the best graph-based system for CoNLL’09 (the overall rank is in parentheses).
  • LAS and UAS: Labeled and Unlabeled Attachment Scores.

Speed comparisons
  • ‘Our’ performed slightly faster than ‘Our+’ because it made more non-deterministic Shifts.
  • ‘Nivre’ indicates Nivre’s swap algorithm, which showed expected linear-time non-projective parsing complexity (Nivre, 2009); we used the implementation from MaltParser.
  • The curve shown by ‘Nivre’ might be caused by implementation details regarding feature extraction, which we included as part of parsing.

Average parsing speeds per sentence
  • Nivre: 2.86 ms
  • Choi-Nicolov: 2.69 ms
  • Our+: 2.29 ms
  • Our: 2.20 ms

Note: ‘Our’, not presented in the figure, showed a growth very similar to ‘Our+’.

Conclusion
  • The Left-Pop transition improves both parsing speed and accuracy, showing linear-time non-projective dependency parsing speed with respect to sentence length.
  • The bootstrapping technique gives a significant improvement to parsing accuracy, showing near state-of-the-art performance with respect to other parsing approaches.

ClearParser
  • Open source project:
  • Contact: Jinho D. Choi (

Acknowledgments
  • We gratefully acknowledge the support of the National Science Foundation Grants CISE-IIS-RI-0910992, Richer Representations for Machine Translation, a subcontract from the Mayo Clinic and Harvard Children’s Hospital based on a grant from the ONC, 90TR0002/01, Strategic Health Advanced Research Project Area 4: Natural Language Processing, and a grant from the Defense Advanced Research Projects Agency (DARPA/IPTO) under the GALE program, DARPA/CMO Contract No. HR0011-06-C-0022, subcontract from BBN, Inc.
  • Any opinions, findings, and conclusions or recommendations expressed in this material are those of the authors and do not necessarily reflect the views of the National Science Foundation.
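
The bootstrapping technique and its stopping rule can be illustrated with a short loop. This is only a sketch under assumptions, not the ClearParser implementation: `bootstrap`, `train`, and `cross_validate` are hypothetical names, the first model is assumed to be trained with gold-standard parse history as features, each later model is assumed to be trained with features taken from the previous model's automatic parses, and keeping the previous model when accuracy drops is also an assumption.

```python
from typing import Callable, Optional, Tuple, TypeVar

Model = TypeVar("Model")


def bootstrap(
    train: Callable[[Optional[Model]], Model],
    cross_validate: Callable[[Model], float],
    max_iterations: int = 10,
) -> Tuple[Model, float, int]:
    """Retrain until cross-validation accuracy drops below the previous iteration.

    train(None) -> model trained with gold-standard parse history as features.
    train(prev) -> model trained with features taken from parses produced by prev.
    cross_validate(model) -> parsing accuracy of that model.
    Returns (kept model, its accuracy, number of models trained).
    """
    model = train(None)
    accuracy = cross_validate(model)
    trained = 1
    for _ in range(max_iterations):
        candidate = train(model)
        candidate_accuracy = cross_validate(candidate)
        trained += 1
        if candidate_accuracy < accuracy:
            break  # accuracy dropped: stop and keep the previous model (assumption)
        model, accuracy = candidate, candidate_accuracy
    return model, accuracy, trained
```

Passing the training and evaluation steps in as callables keeps the sketch independent of any particular parser or learner; in practice `train` would wrap feature extraction plus classifier training, and `cross_validate` would report a score such as LAS.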
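
The transition system described on the poster can also be sketched in code. The following is a minimal, unlabeled sketch that derives a transition sequence from a gold-standard tree; it is an illustration under assumptions rather than the authors' code. The function name `oracle_transitions` is hypothetical, dependency labels are omitted, and the way Left-Arc, Right-Arc, and No-Arc move w_i onto λ2 (and Shift moves λ2 back onto λ1) is assumed from the list-based family of algorithms the poster builds on.

```python
from typing import List, Optional, Set, Tuple

Edge = Tuple[int, int]  # (head index, dependent index); labels omitted for brevity


def oracle_transitions(heads: List[Optional[int]]) -> Tuple[List[str], Set[Edge]]:
    """Derive a transition sequence from a gold tree.

    heads[k] is the gold head of token k; heads[0] is None because 0 is the root.
    """
    n = len(heads) - 1
    lam1: List[int] = [0]                      # λ1: tokens still to be compared with w_j
    lam2: List[int] = []                       # λ2: tokens already compared with w_j
    beta: List[int] = list(range(1, n + 1))    # β: unprocessed tokens
    edges: Set[Edge] = set()                   # E: dependencies found so far
    transitions: List[str] = []

    while beta:                                # final state: β is empty
        j = beta[0]
        related = any(heads[i] == j or heads[j] == i for i in lam1)
        if not related:
            # Shift: λ1 is empty (DT) or holds no head/dependent of w_j (NT).
            lam1, lam2 = lam1 + lam2 + [j], []
            beta.pop(0)
            transitions.append("Shift")
            continue

        i = lam1.pop()                         # w_i: rightmost token of λ1
        if heads[i] == j:                      # w_j is the head of w_i
            edges.add((j, i))
            if any(heads[k] == i for k in beta[1:]):
                lam2.insert(0, i)              # Left-Arc: w_i may still head a later token
                transitions.append("Left-Arc")
            else:
                transitions.append("Left-Pop")  # w_i leaves the search space
        elif heads[j] == i:                    # w_i is the head of w_j
            edges.add((i, j))
            lam2.insert(0, i)
            transitions.append("Right-Arc")
        else:
            lam2.insert(0, i)                  # No-Arc: buffer w_i and keep looking left
            transitions.append("No-Arc")
    return transitions, edges


if __name__ == "__main__":
    # 0 = root, 1 = "She", 2 = "reads", 3 = "books" (made-up example sentence)
    seq, found = oracle_transitions([None, 2, 0, 2])
    print(seq)            # ['Shift', 'Left-Pop', 'Right-Arc', 'Shift', 'Right-Arc', 'Shift']
    print(sorted(found))  # [(0, 2), (2, 1), (2, 3)]
```

In the example, "She" is removed by Left-Pop as soon as it is attached to "reads", so it is never compared with "books"; that is the search-space reduction the poster attributes to the new transition.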