Using Parallel Propbanks to Enhance Word-alignments

    Using Parallel Propbanks to Enhance Word-alignments - Presentation Transcript

    1. Using Parallel Propbanks to Enhance Word-alignments
      Jinho Choi, Martha Palmer, Nianwen Xue
      Institute of Cognitive Science, University of Colorado at Boulder
      Background
      • Propbank
      • - A corpus annotated with verbal propositions and their arguments.
      • - Adds semantic information (semantic roles) to the phrase structures.
      • e.g. John opened the door with his foot
      • Word-alignments
      • - Parallel sentences: sentences s and t are called parallel if t is a translation of s.
      • - Word alignment: given parallel sentences, align words that are semantically close.
      • - GIZA++: a statistical machine translation toolkit used to train word-alignment models.
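      To make the two background notions concrete, here is a minimal sketch of how a Propbank-style proposition and a word alignment might be held in memory. The class name, the role labels attached to the example sentence, and the alignment pairs are illustrative assumptions, not taken from the slides or from any Propbank or GIZA++ file format.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class Proposition:
    """A verb-predicate plus its labeled arguments (lists of token indices)."""
    predicate: int
    arguments: Dict[str, List[int]] = field(default_factory=dict)


tokens = "John opened the door with his foot".split()

# Role labels here are assumed for illustration only.
prop = Proposition(
    predicate=1,
    arguments={"ARG0": [0], "ARG1": [2, 3], "ARG2": [4, 5, 6]},
)


def argument_text(tokens: List[str], prop: Proposition, label: str) -> str:
    """Return the surface string of one labeled argument."""
    return " ".join(tokens[i] for i in prop.arguments[label])


# A word alignment maps source token indices to sets of target token indices.
# These pairs are placeholders for a hypothetical target sentence.
alignment: Dict[int, Set[int]] = {0: {0}, 1: {3}, 3: {2}}


def aligned_targets(alignment: Dict[int, Set[int]], src_index: int) -> Set[int]:
    """Return the target indices aligned to a source token (empty if unaligned)."""
    return alignment.get(src_index, set())
```

      Keeping unaligned tokens out of the mapping (rather than mapping them to an empty set) makes the "aligned to no word" case a simple lookup miss.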
      Phrase Structure (figure)
      System Overview (figure)
      Motivation
      • Issues with GIZA++ generated word-alignments
      • - It is hard to verify whether the alignments are correct.
      • - Words with low frequencies may not get aligned to any words.
      • - GIZA++ does not account for semantics.
      • Using parallel Propbanks to enhance word-alignments for verb-predicates
      • - Let S and T be a source and a target language, respectively.
      • - For each verb-predicate v_s ∈ S aligned to some word w_t ∈ T: if w_t is also a verb-predicate and the arguments of v_s and w_t match, consider the alignment correct (top-down matching).
      • - For each verb-predicate v_s ∈ S aligned to no word in T: if the arguments of v_s match the arguments of some verb-predicate v_t ∈ T, align v_s to v_t (bottom-up matching).
      Propbank Annotations (figure)
      Corpus Description
      • English Chinese Translation Treebank (ECTB)
      • - A parallel corpus between English and Chinese
      • - The corpus is divided into two parts:
      •   Xinhua: Chinese newswire with literal English translations (4,363 parallel sentences)
      •   Sinorama: Chinese news magazine with non-literal English translations (12,600 parallel sentences)
      Predicate Matching
      • For each Chinese verb-predicate v_c aligned to some English word w_e, we checked whether w_e is also a verb-predicate.
      • pred = predicates, be = be-verbs, else = non-verbs, none = no words
      Top-down Argument Matching
      • For each Chinese verb v_c aligned to an English verb v_e
      • - Convert all Chinese words in the arguments of v_c to their English alignments (skip those not aligned to any English word).
      • - Compare the converted arguments of v_c with the arguments of v_e.
      • For each argument, check how many words match. If the match is above a certain threshold, consider the alignment correct.
      • Measurements
      • - CA = the set of arguments of v_c, where ca_i ∈ CA
      • - EA = the set of arguments of v_e, where ea_i ∈ EA
      • Macro average argument matching score = (formula not preserved in this transcript)
      • Micro average argument matching score = (formula not preserved in this transcript)
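      The formulas for the two scores did not survive in this transcript, so the sketch below rests on assumptions. It takes the usual reading of the terms: the macro average is the mean of per-argument match ratios, and the micro average is total matched words over total converted words. It also assumes each converted Chinese argument is compared against its best-overlapping English argument. The function names and the toy tokens (c1, c2, ...) are hypothetical.

```python
from typing import Dict, List, Set, Tuple

Alignment = Dict[str, Set[str]]  # Chinese word -> aligned English words


def convert_argument(ca: List[str], alignment: Alignment) -> Set[str]:
    """Map the words of one Chinese argument to English, skipping unaligned words."""
    converted: Set[str] = set()
    for word in ca:
        converted |= alignment.get(word, set())
    return converted


def argument_scores(
    CA: List[List[str]], EA: List[List[str]], alignment: Alignment
) -> Tuple[float, float]:
    """Return (macro, micro) matching scores under the assumptions stated above."""
    per_arg = []  # (matched, total) for each Chinese argument with aligned words
    for ca in CA:
        converted = convert_argument(ca, alignment)
        if not converted:
            continue  # no word of this argument is aligned
        best = max((len(converted & set(ea)) for ea in EA), default=0)
        per_arg.append((best, len(converted)))
    if not per_arg:
        return 0.0, 0.0
    macro = sum(m / t for m, t in per_arg) / len(per_arg)
    micro = sum(m for m, _ in per_arg) / sum(t for _, t in per_arg)
    return macro, micro


# Toy example with hypothetical tokens.
alignment = {"c1": {"John"}, "c2": {"the"}, "c3": {"door"}, "c4": {"window"}}
CA = [["c1"], ["c2", "c3", "c4"]]
EA = [["John"], ["the", "door"]]
macro, micro = argument_scores(CA, EA, alignment)

# Accept the verb alignment when the score reaches a chosen threshold.
accepted = macro >= 0.7
```

      In the toy example the first argument matches fully (1 of 1) and the second matches 2 of 3 converted words, so the macro score is the mean of those two ratios and the micro score is 3 of 4.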
      Evaluations
      • Test corpus
      • - English-Chinese parallel corpus provided by Wei Wang (Information Sciences Institute at the Univ. of Southern California)
      • 100 parallel sentences, 273 Chinese verb-types (365 verb-tokens)
      • Test if word-alignments found in ECTB can correctly translate Chinese verbs to English verbs
      • Measurements
      • - Term coverage (TC): how many Chinese verb-types are covered by word-alignments found in ECTB
      • - Term expansion (TE): for each covered Chinese verb-type, how many English verb-types are suggested by the word-alignments
      • - Alignment accuracy (AA): how many suggested English verb-types are correct
      • Refining word-alignments
      • - Apply only the word-alignments whose macro-average scores are above a certain threshold
      • Thresholds: 0 (accept all alignments), 0.4 (accept alignments whose macro average scores are above 40%)
      • ATE = Average term expansion, AAA = Average alignment accuracy
      • Expanding word-alignments
      • Apply only the word-alignments whose macro and micro average scores are above certain thresholds
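      The three measurements can be sketched as follows. This is an illustration under assumptions: verb alignments arrive as (Chinese verb, English verb, macro score) triples, the gold data maps each Chinese verb-type in the test corpus to its correct English verb-types, AAA is averaged per covered verb-type, and "above a threshold" is read as greater than or equal so that a threshold of 0 accepts every alignment. All names and toy data are hypothetical.

```python
from typing import Dict, Iterable, Set, Tuple


def build_lexicon(
    alignments: Iterable[Tuple[str, str, float]], threshold: float
) -> Dict[str, Set[str]]:
    """Keep only verb alignments whose macro score reaches the threshold."""
    lexicon: Dict[str, Set[str]] = {}
    for zh, en, macro in alignments:
        if macro >= threshold:
            lexicon.setdefault(zh, set()).add(en)
    return lexicon


def evaluate(
    lexicon: Dict[str, Set[str]], gold: Dict[str, Set[str]]
) -> Tuple[int, float, float]:
    """Return (TC, ATE, AAA) for a lexicon against gold translations."""
    covered = [zh for zh in gold if zh in lexicon]
    tc = len(covered)
    if tc == 0:
        return 0, 0.0, 0.0
    ate = sum(len(lexicon[zh]) for zh in covered) / tc
    aaa = sum(len(lexicon[zh] & gold[zh]) / len(lexicon[zh]) for zh in covered) / tc
    return tc, ate, aaa


# Toy data with hypothetical verbs.
alignments = [("v1", "open", 0.9), ("v1", "close", 0.3), ("v2", "say", 0.5)]
gold = {"v1": {"open"}, "v2": {"say", "state"}, "v3": {"go"}}

all_kept = evaluate(build_lexicon(alignments, 0.0), gold)
refined = evaluate(build_lexicon(alignments, 0.4), gold)
```

      Raising the threshold drops the low-scoring ("v1", "close") pair, which lowers term expansion and raises accuracy in this toy case, the same trade-off the thresholds are meant to control.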
      Bottom-up Matching
      • For each Chinese verb v_c aligned to no English word
      • - Convert all Chinese words to their English alignments.
      • - Compare the converted arguments of v_c with the arguments of each English verb v_e that is not aligned to any Chinese verb, and find the one, say v_m, with the maximum micro average score.
      • - If the micro average score of v_c and v_m is above a certain threshold, align v_c to v_m.
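      The bottom-up steps above can be sketched as a search over unaligned English verbs. This is not the authors' implementation: the micro score is assumed to be total matched words over total converted words, with each Chinese argument compared to its best-overlapping English argument, and all names and toy tokens are hypothetical.

```python
from typing import Dict, List, Optional, Set

Alignment = Dict[str, Set[str]]  # Chinese word -> aligned English words


def micro_score(CA: List[List[str]], EA: List[List[str]], alignment: Alignment) -> float:
    """Assumed micro average: matched converted words / all converted words."""
    matched = total = 0
    for ca in CA:
        converted: Set[str] = set()
        for word in ca:
            converted |= alignment.get(word, set())
        if not converted:
            continue  # skip arguments with no aligned words
        matched += max((len(converted & set(ea)) for ea in EA), default=0)
        total += len(converted)
    return matched / total if total else 0.0


def bottom_up_match(
    CA: List[List[str]],
    candidates: Dict[str, List[List[str]]],
    alignment: Alignment,
    threshold: float,
) -> Optional[str]:
    """Return the unaligned English verb with the best micro score, if it reaches the threshold."""
    best_verb, best_score = None, 0.0
    for verb, EA in candidates.items():
        score = micro_score(CA, EA, alignment)
        if score > best_score:
            best_verb, best_score = verb, score
    if best_verb is not None and best_score >= threshold:
        return best_verb
    return None


# Toy example with hypothetical tokens.
alignment = {"c1": {"John"}, "c2": {"door"}}
CA = [["c1"], ["c2"]]
candidates = {"opened": [["John"], ["the", "door"]], "said": [["Mary"]]}
chosen = bottom_up_match(CA, candidates, alignment, threshold=0.7)
```

      A candidate whose arguments share no words with the converted Chinese arguments scores 0 and is never selected, so a verb can remain unaligned when no candidate fits.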
      Average Top-down Argument Matching Scores
                    Xinhua    Sinorama
      Macro Avg.    80.55%    53.56%
      Micro Avg.    83.91%    52.62%

      Average Bottom-up Argument Matching Scores
                    Xinhua              Sinorama
      Threshold     0.7       0.8       0.7       0.8
      Macro Avg.    80.74%    83.99%    77.70%    82.86%
      Micro Avg.    82.63%    86.46%    79.45%    85.07%

      Refining word-alignments (TH = macro-average threshold)
              Xinhua                    Sinorama
      TH      TC    ATE     AAA         TC     ATE     AAA
      0.0     79    1.77    83.35%      129    2.29    57.76%
      0.4     76    1.72    83.54%      93     1.8     65.88%
      0.5     76    1.68    83.71%      62     1.58    78.09%

      Expanding word-alignments (Micro = micro-average threshold)
                          Macro – 0.7               Macro – 0.8
      Corpus      Micro   TC    ATE     AAA         TC    ATE     AAA
      Xinhua      0.0     22    4.27    50.38%      20    3.35    57.50%
      Xinhua      0.6     21    3.9     54.76%      18    3.39    63.89%
      Xinhua      0.7     19    3.47    55.26%      17    3.12    61.76%
      Sinorama    0.0     37    3.59    18.01%      29    3.14    14.95%
      Sinorama    0.6     31    3.06    15.11%      27    2.93    14.46%
      Sinorama    0.7     21    2.81    11.99%      25    2.6     11.82%

      Summary and Future Work
      • Top-down Argument Matching is most effective with non-literal translations that have proven difficult for GIZA++.
      • Bottom-up Argument Matching shows promise for expanding the coverage of GIZA++ alignments that are based on literal translations.
      • In future work, we will try to enhance word-alignments by using automatically labeled Propbanks, Nombanks, and Named-entity tags.