Using Parallel Propbanks to Enhance Word-alignments

    Using Parallel Propbanks to Enhance Word-alignments - Presentation Transcript

    1. Using Parallel Propbanks to Enhance Word-Alignments
       The 3rd Linguistic Annotation Workshop at ACL ’09, August 7th, 2009
       Jinho D. Choi (Univ. of Colorado at Boulder)
       Martha Palmer (Univ. of Colorado at Boulder)
       Nianwen Xue (Brandeis University)
    2. Parallel Propbanks
       • Propbank - Corpus annotated with verbal propositions and their arguments (semantic roles)
         [Gansu Province] also actively [explored] [high risk business]
         Arg0: explorer; Arg1: things explored
       • Parallel Propbanks - Propbanks annotated on a parallel corpus
         [Chinese counterpart of the example sentence with Arg0 and Arg1 marked; characters lost in extraction]
    3. Word-Alignments
       • Given parallel sentences, discover the translation of each word
         [Chinese sentence; characters lost in extraction]
         Construction is a principal economic activity in developing Pudong
       • GIZA++: a statistical machine translation toolkit
         - It is hard to verify whether the alignments are correct.
         - Words with low frequencies may not get aligned.
         - It does not account for semantics.
    4. Predicate Matching (based on GIZA++)
       • English Chinese Parallel Treebank (ECTB)
         - Xinhua: Chinese newswire + literal translation
         - Sinorama: Chinese news magazine + non-literal translation
       [Two pie charts: Xinhua: 12,895 and Sinorama: 40,086; categories En.verb, En.be, En.else, En.none; percentage labels as extracted: 19%, 32%, 45%, 3% and 56%, 22%, 19%, 3%]
    5. Top-down Argument Matching
       • Verify word-alignments
         - For each Chinese verb vc aligned to some English verb ve
         - Verify that the alignment is correct if the arguments of vc and ve match
       Chinese: Arg0 ArgM ArgM Rel Arg1 [characters lost in extraction]
       English: [Gansu Province]Arg0 [also]ArgM [actively]ArgM [explored]Rel [high risk business]Arg1
       Bingo!
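
The verification step on slide 5 could be sketched roughly as follows. This is only an illustration under stated assumptions: the scoring function `match_score` is a stand-in (the transcript does not preserve the actual score formulas), the token indices and alignment pairs are hypothetical because the Chinese text was lost, and the default threshold of 0.5 is borrowed from the "Mac.th = 0.5 (TDAM)" setting on slide 11.

```python
from typing import List, Set, Tuple

Args = List[Tuple[str, Set[int]]]   # (role label, token indices) per argument
Alignment = Set[Tuple[int, int]]    # (Chinese token index, English token index)


def match_score(zh_args: Args, en_args: Args, alignment: Alignment) -> float:
    """Assumed score: mean, over Chinese arguments, of the fraction of tokens
    aligned to a token inside an English argument with the same label."""
    ratios = []
    for label, zh_tokens in zh_args:
        if not zh_tokens:
            continue
        # English tokens covered by any argument carrying the same label.
        en_tokens = set()
        for en_label, toks in en_args:
            if en_label == label:
                en_tokens |= toks
        matched = sum(
            1 for c in zh_tokens if any((c, e) in alignment for e in en_tokens)
        )
        ratios.append(matched / len(zh_tokens))
    return sum(ratios) / len(ratios) if ratios else 0.0


def verify_top_down(zh_args: Args, en_args: Args, alignment: Alignment,
                    threshold: float = 0.5) -> bool:
    """Keep a verb-to-verb alignment only if the arguments match well enough."""
    return match_score(zh_args, en_args, alignment) >= threshold


# Hypothetical indices for the slide's example sentence.
# English: Gansu(0) Province(1) also(2) actively(3) explored(4) high(5) risk(6) business(7)
zh_args = [("Arg0", {0}), ("ArgM", {1}), ("ArgM", {2}), ("Arg1", {4, 5})]
en_args = [("Arg0", {0, 1}), ("ArgM", {2}), ("ArgM", {3}), ("Arg1", {5, 6, 7})]
alignment = {(0, 0), (0, 1), (1, 2), (2, 3), (3, 4), (4, 5), (4, 6), (5, 7)}

print(verify_top_down(zh_args, en_args, alignment))
```

Arguments are kept as a list rather than a dictionary because a predicate can carry several arguments with the same label, as the two ArgM spans on the slide show.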
    6. Bottom-up Argument Matching
       • Expand word-alignments
         - For each Chinese verb vc aligned to no English word
         - Align vc to ve such that ve is an English verb that maximizes the argument matching with vc
       Chinese: Arg0 ArgM ArgM ArgM Arg1 Rel [characters lost in extraction]
       English: [Foreign funded enterprises in Gansu Province]Arg0 [no]ArgM [longer]ArgM [worry]Rel [about investment risk]Arg1
    7. Bottom-up Argument Matching
       • Expand word-alignments
         - For each Chinese verb vc aligned to no English word
         - Align vc to ve such that ve is an English verb that maximizes the argument matching with vc
       English (funded): [Foreign]ArgM [funded]Rel [enterprises]Arg1 in Gansu Province no longer worry about investment risk
       Chinese: Arg0 ArgM ArgM ArgM Arg1 Rel [characters lost in extraction]
       English (worry): [Foreign funded enterprises in Gansu Province]Arg0 [no]ArgM [longer]ArgM [worry]Rel [about investment risk]Arg1
    8. Argument Matching Score
       • Macro argument matching score
       • Micro argument matching score
       • Thresholds
         - Top-down: thresholds on macro score
         - Bottom-up: thresholds on both macro and micro scores
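
The score formulas themselves did not survive in this transcript, so the sketch below uses assumed definitions: the macro score averages a per-argument match ratio, and the micro score pools matched tokens over all argument tokens. It then applies both thresholds to the bottom-up case from slides 6 and 7, using the settings Mac.th = 0.8 and Mic.th = 0.6 from slide 12. All function names, token indices, and alignment pairs are hypothetical.

```python
from typing import Dict, List, Optional, Set, Tuple

Args = List[Tuple[str, Set[int]]]   # (role label, token indices) per argument
Alignment = Set[Tuple[int, int]]    # (Chinese token index, English token index)


def matched_counts(zh_args: Args, en_args: Args,
                   alignment: Alignment) -> List[Tuple[int, int]]:
    """Per Chinese argument: (tokens aligned into a same-label English argument, total tokens)."""
    counts = []
    for label, zh_tokens in zh_args:
        en_tokens = set()
        for en_label, toks in en_args:
            if en_label == label:
                en_tokens |= toks
        matched = sum(
            1 for c in zh_tokens if any((c, e) in alignment for e in en_tokens)
        )
        counts.append((matched, len(zh_tokens)))
    return counts


def macro_score(counts: List[Tuple[int, int]]) -> float:
    """Assumed definition: average of per-argument match ratios."""
    ratios = [m / n for m, n in counts if n > 0]
    return sum(ratios) / len(ratios) if ratios else 0.0


def micro_score(counts: List[Tuple[int, int]]) -> float:
    """Assumed definition: matched tokens over all argument tokens."""
    total = sum(n for _, n in counts)
    return sum(m for m, _ in counts) / total if total else 0.0


def align_bottom_up(zh_args: Args, en_verbs: Dict[str, Args], alignment: Alignment,
                    mac_th: float = 0.8, mic_th: float = 0.6) -> Optional[str]:
    """Return the English verb whose arguments best match, or None if no verb passes both thresholds."""
    best, best_key = None, None
    for verb, en_args in en_verbs.items():
        counts = matched_counts(zh_args, en_args, alignment)
        key = (macro_score(counts), micro_score(counts))
        if key[0] >= mac_th and key[1] >= mic_th and (best_key is None or key > best_key):
            best, best_key = verb, key
    return best


# English: Foreign(0) funded(1) enterprises(2) in(3) Gansu(4) Province(5)
#          no(6) longer(7) worry(8) about(9) investment(10) risk(11)
en_verbs = {
    "funded": [("ArgM", {0}), ("Arg1", {2})],
    "worry": [("Arg0", {0, 1, 2, 3, 4, 5}), ("ArgM", {6}), ("ArgM", {7}),
              ("Arg1", {9, 10, 11})],
}
# Hypothetical Chinese argument spans and word alignments (original text lost).
zh_args = [("Arg0", {0, 1, 2, 3}), ("ArgM", {4}), ("ArgM", {5}), ("ArgM", {6}),
           ("Arg1", {7, 8})]
alignment = {(0, 4), (0, 5), (1, 0), (2, 1), (3, 2), (4, 6), (5, 7), (7, 10), (8, 11)}

print(align_bottom_up(zh_args, en_verbs, alignment))
```

Under these assumptions the macro score treats every argument equally, so a single unaligned one-word ArgM costs as much as a long Arg0, while the micro score weights arguments by length. Requiring both is one plausible reading of why the bottom-up step thresholds on the two scores together.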
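    9. System Overview
       Source Language Corpus + Target Language Corpus → GIZA++ → Word Alignments
       Word Alignments → Verbs aligned to verbs → Top-down Matching → Verified Alignments
       Word Alignments → Verbs aligned to no word → Bottom-up Matching → Expanded Alignments
       Parallel Propbanks → Top-down Matching, Bottom-up Matching
       Verified Alignments + Expanded Alignments → Enhanced Alignments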
    10. Evaluations
        • Test Corpus
          - NIST-GALE Web Genre Test Data
          - 100 parallel sentences, 365 verb tokens, 273 verb types
        • Measurements
          - Term Coverage: how many Chinese verb-types are covered
          - Term Expansion: how many English verb-types are suggested
          - Alignment Accuracy: how many suggested English verb-types are correct
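
The three measurements on slide 10 could be computed along these lines. This is a minimal sketch with assumptions: suggestions and gold judgments are represented as dictionaries from a Chinese verb type to a set of English verb types, Term Expansion is counted per (Chinese verb, English verb) pair, and the verb names are placeholders rather than data from the test corpus.

```python
from typing import Dict, Set


def evaluate(suggested: Dict[str, Set[str]], gold: Dict[str, Set[str]]) -> Dict[str, float]:
    """Compute term coverage, term expansion, and alignment accuracy."""
    # Chinese verb types that received at least one English suggestion.
    covered = {zh for zh, ens in suggested.items() if ens}
    # Every suggested (Chinese verb type, English verb type) pair.
    pairs = {(zh, en) for zh, ens in suggested.items() for en in ens}
    # Suggested pairs that the gold judgments accept.
    correct = {(zh, en) for zh, en in pairs if en in gold.get(zh, set())}
    return {
        "term_coverage": len(covered),
        "term_expansion": len(pairs),
        "alignment_accuracy": len(correct) / len(pairs) if pairs else 0.0,
    }


# Placeholder verb types, not taken from the NIST-GALE test data.
suggested = {
    "zh_verb_1": {"explore", "investigate"},
    "zh_verb_2": {"worry"},
    "zh_verb_3": set(),
}
gold = {"zh_verb_1": {"explore"}, "zh_verb_2": {"worry"}}

result = evaluate(suggested, gold)
print(result)
```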
    11. Evaluations: Top-down
        Mac.th = 0.0 (GIZA++) vs. Mac.th = 0.5 (TDAM)
        [Bar chart: Term Coverage for Xinhua and Sinorama; bar labels as extracted: 129, 79, 76, 62]
        [Bar chart: Average Alignment Accuracy for Xinhua and Sinorama; bar labels as extracted: 83.35%, 83.71%, 78.09%, 57.76%]
    12. Evaluations: Bottom-up
        Mac.th = 0.8, Mic.th = 0.6
        [Bar chart: Term Coverage for Xinhua and Sinorama; bar labels as extracted: 27, 18]
        [Bar chart: Average Alignment Accuracy for Xinhua and Sinorama; bar labels as extracted: 63.89%, 14.46%]
        Chart annotations: 5.5% error-reduction, 17% abs-improvement
    13. Conclusions & Future Work
        • Conclusions
          - Top-down Argument Matching is most effective for verifying word-alignments based on non-literal translations that have proven difficult for GIZA++.
          - Bottom-up Argument Matching shows promise for expanding the coverage of GIZA++ alignments based on literal translations.
        • We will try to enhance word-alignments by using
          - Automatically labeled Propbanks
          - Nombanks, Named-entity tags
          - Parallel Propbanks prior to GIZA++
    14. Acknowledgements
        • We gratefully acknowledge the support of the National Science Foundation Grants IIS-0325646, Domain Independent Semantic Parsing, CISE-CRI-0551615, Towards a Comprehensive Linguistic Annotation, and a grant from the Defense Advanced Research Projects Agency (DARPA/IPTO) under the GALE program, DARPA/CMO Contract No. HR0011-06-C-0022, subcontract from BBN, Inc.
        • Special thanks to Daniel Gildea and Ding Liu (University of Rochester), who provided word-alignments; Wei Wang (Information Sciences Institute at University of Southern California), who provided the test-corpus; and Hua Zhong (University of Colorado at Boulder), who performed the evaluations.