Using Parallel Propbanks to enhance Word-alignments

Using Parallel Propbanks to
Enhance Word-Alignments
Jinho D. Choi (Univ. of Colorado at Boulder)
Martha Palmer (Univ. of Colorado at Boulder)
Niawen Xue (Brandeis University)
The 3rd Linguistic Annotation Workshop at ACL ’09
August 7th, 2009

Parallel Propbanks
• Propbank
- Corpus annotated with verbal propositions and their
arguments (semantic roles)
• Parallel Propbanks
- Propbanks annotated in parallel corpus
2
Gansu Province also actively explored high risk business[ ] [ ] [ ]
Arg0: explorer Arg1: things explored
!!" " # #$ % $% &'[ ] [ ] [ ]
Arg0 Arg1

Word-Alignments
• Given parallel sentences, discover translation for each
word
• GIZA++: a statistical machine translation toolkit
- It is hard to verify if the alignments are correct.
- Words with low frequencies may not get aligned.
- It does not account for semantics.
3
!" # ! $" % & # '( $% )&
is a principal economic activity in developing PudongConstruction

Predicate Matching (based on GIZA++)
• English Chinese Parallel Treebank (ECTB)
- Xinhua: Chinese newswire + literal translation
- Sinorama: Chinese news magazine + non-literal translation
6
32%
19% 3%
45%
56%
22%
3%
19%
En.verb
En.be
En.else
En.none
Xinhua: 12,895 Sinorama: 40,086

Top-down Argument Matching
• Verify word-alignments
- For each Chinese verb vc aligned to some English verb ve
- Verify that the alignment is correct if the arguments of
vc and ve match
7
!!" " # #$ % $% &'
Gansu Province also actively explored high risk business[ ][ ][ ] [ ][ ]
Arg0 ArgM ArgM Rel Arg1
Arg0 ArgM ArgM Rel Arg1
[ ] [ ] [ ] [ ] [ ]
Bingo!

Bottom-up Argument Matching
• Expand word-alignments
- For each Chinese verb vc aligned to no English word
- Align vc to ve such that ve is an English verb that maximizes
the argument matching with vc
8
!!" # $" %# & ' ( $ )" %& 担'
Foreign funded enterprises in Gansu Province no longer worry about investment risk[ ][ ][ ][ ][ ]
Arg0 A.M A.M Rel Arg1
Arg0 A.M A.M A.M Arg1 Rel
[ ] [ ] [ ][ ][ ] [ ]

Bottom-up Argument Matching
• Expand word-alignments
- For each Chinese verb vc aligned to no English word
- Align vc to ve such that ve is an English verb that maximizes
the argument matching with vc
8
ArgM Rel Arg1
[ ][ ][ ]Foreign funded enterprises in Gansu Province no longer worry about investment risk
!!" # $" %# & ' ( $ )" %& 担'
Foreign funded enterprises in Gansu Province no longer worry about investment risk
[ ] [ ] [ ][ ][ ] [ ]
Arg0 A.M A.M A.M Arg1 Rel
[ ][ ][ ][ ][ ]
Arg0 A.M A.M Rel Arg1

Argument Matching Score
• Macro argument matching score
• Micro argument matching score
• Thresholds
- Top-down: thresholds on macro score
- Bottom-up: thresholds on both macro and micro scores
9

System Overview
10
GIZA++
Word
AlignmentsVerbs aligned
to verbs
Verbs aligned
to no word
Source Language
Corpus
Target Language
Corpus
Parallel
PropbanksTop-down
Matching
Bottom-up
Matching
Veriﬁed
Alignments
Expanded
Alignments
Enhanced
Alignments

Evaluations
• Test Corpus
- NIST-GALE Web Genre Test Data
- 100 parallel sentences, 365 verb tokens, 273 verb types
• Measurements
- Term Coverage
: how many Chinese verb-types are covered
- Term Expansion
: how many English verb-types are suggested
- Alignment Accuracy
: how many suggested English verb-types are correct
11

Evaluations:Top-down
12
0
32.5
65.0
97.5
130.0
Xinhua Sinorama
62
76
129
79
Term Coverage
Mac.th = 0.0 (GIZA++) Mac.th = 0.5 (TDAM)
0%
22.5%
45.0%
67.5%
90.0%
Xinhua Sinorama
78.09%83.71%
57.76%
83.35%
Average Alignment Accuracy

Evaluations: Bottom-up
13
0
7.5
15.0
22.5
30.0
Xinhua Sinorama
27
18
Term Coverage
0%
17.5%
35.0%
52.5%
70.0%
Xinhua Sinorama
14.46%
63.89%
Average Alignment Accuracy
Mac.th = 0.8, Mic.th = 0.6
5.5% error-reduction
17% abs-improvement

Conclusions & Future Work
• Conclusions
- Top-down Argument Matching is most effective for verifying
word-alignments based on non-literal translations that have
proven difﬁcult for GIZA++.
- Bottom-up Argument Matching shows promise for expanding
the coverage of GIZA++ alignments based on literal
translations.
• We will try to enhance word-alignments by using
- Automatically labeled Propbanks
- Nombanks, Named-entity tags
- Parallel Propbanks prior to GIZA++
14

Acknowledgements
• We gratefully acknowledge the support of the National
Science Foundation Grants IIS-0325646, Domain
Independent Semantic Parsing, CISE-CRI-0551615,
Towards a Comprehensive Linguistic Annotation, and a
grant from the Defense Advanced Research Projects
Agency (DARPA/IPTO) under the GALE program,
DARPA/CMO Contract No. HR0011-06-C-0022,
subcontract from BBN, Inc.
• Special thanks to Daniel Gildea, Ding Liu (University of
Rochester) who provided word-alignments,Wei Wang
(Information Sciences Institute at University of Southern
California) who provided the test-corpus, and Hua
Zhong (University of Colorado at Boulder) who
performed the evaluations.
15

Using Parallel Propbanks to enhance Word-alignments

Recommended

Recommended

More Related Content

Similar to Using Parallel Propbanks to enhance Word-alignments

Similar to Using Parallel Propbanks to enhance Word-alignments (20)

More from Jinho Choi

More from Jinho Choi (20)

Recently uploaded

Recently uploaded (20)

Using Parallel Propbanks to enhance Word-alignments