Using Parallel Propbanks to Enhance Word-Alignments
Jinho D. Choi (Univ. of Colorado at Boulder)
Martha Palmer (Univ. of Colorado at Boulder)
Nianwen Xue (Brandeis University)
The 3rd Linguistic Annotation Workshop at ACL ’09
August 7th, 2009
Parallel Propbanks
• Propbank
- Corpus annotated with verbal propositions and their
arguments (semantic roles)
• Parallel Propbanks
- Propbanks annotated on both sides of a parallel corpus
Example: [Gansu Province]Arg0 also actively [explored]Rel [high risk business]Arg1
Arg0: explorer, Arg1: things explored
The parallel Chinese sentence is annotated with the same Arg0 and Arg1.
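A minimal Python sketch of how one Propbank proposition might be held in memory, using the English sentence from the example. The `Proposition` class and its fields are hypothetical illustrations, not the PropBank file format; a parallel Propbank would pair each such proposition with one built over the Chinese tokens.

```python
from dataclasses import dataclass, field


@dataclass
class Proposition:
    """A PropBank-style proposition: one predicate plus its labeled arguments.

    Hypothetical in-memory layout for illustration, not the PropBank file format.
    Token positions are 0-based indices into the tokenized sentence.
    """
    predicate: int                                            # position of the verb (Rel)
    args: dict[str, list[int]] = field(default_factory=dict)  # label -> token positions

    def span(self, tokens: list[str], label: str) -> str:
        """Words covered by argument `label` (empty string if the label is absent)."""
        return " ".join(tokens[i] for i in self.args.get(label, []))


# English side of the example: [Gansu Province]Arg0 also actively [explored]Rel [high risk business]Arg1
en_tokens = "Gansu Province also actively explored high risk business".split()
en_prop = Proposition(predicate=4, args={"Arg0": [0, 1], "Arg1": [5, 6, 7]})

print(en_tokens[en_prop.predicate])     # explored
print(en_prop.span(en_tokens, "Arg0"))  # Gansu Province
print(en_prop.span(en_tokens, "Arg1"))  # high risk business
```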
Word-Alignments
• Given a pair of parallel sentences, find the translation of each word
• GIZA++: a statistical word-alignment toolkit used for machine translation
- It is hard to verify whether the alignments are correct.
- Words with low frequencies may not get aligned.
- It does not account for semantics.
Example (English side): Construction is a principal economic activity in developing Pudong
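A word alignment can be sketched as a set of (Chinese position, English position) links. The toy sketch below, with hypothetical helpers `targets_of` and `unaligned_source`, reads translations off the links and lists Chinese words that received none, the situation the slide describes for low-frequency words. The Chinese positions are made up; only the English sentence comes from the example, and converting real GIZA++ output into this form is assumed.

```python
# (Chinese position, English position); a hypothetical in-memory form of aligner output
Link = tuple[int, int]


def targets_of(links: set[Link], zh_idx: int) -> list[int]:
    """English positions linked to the Chinese token at zh_idx, in sentence order."""
    return sorted(e for z, e in links if z == zh_idx)


def unaligned_source(n_zh: int, links: set[Link]) -> list[int]:
    """Chinese positions that no link touches (e.g. rare words the aligner left out)."""
    covered = {z for z, _ in links}
    return [i for i in range(n_zh) if i not in covered]


en_tokens = "Construction is a principal economic activity in developing Pudong".split()
# Toy links for a hypothetical 7-token Chinese sentence; the Chinese positions are made up.
links: set[Link] = {(0, 7), (1, 8), (2, 0), (3, 1), (4, 4), (5, 5)}

print([en_tokens[e] for e in targets_of(links, 3)])  # ['is']
print(unaligned_source(7, links))                     # [6]
```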
Predicate Matching (based on GIZA++)
• English-Chinese Parallel Treebank (ECTB)
- Xinhua: Chinese newswire + literal translation
- Sinorama: Chinese news magazine + non-literal translation
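Figure: GIZA++ alignments of Chinese verbs in Xinhua (12,895) and Sinorama (40,086), split by what each verb is aligned to in English: En.verb (an English verb), En.be (a form of "be"), En.else (another word), En.none (no word).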
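The four categories can be sketched as a bucketing step over each Chinese verb's links. This assumes Penn Treebank POS tags on the English side and a priority order (non-"be" verb, then "be", then anything else) that the slides do not specify; `classify` and `distribution` are hypothetical names, and the toy Chinese positions are made up.

```python
from collections import Counter

# Forms of "be"; contractions such as 's are ignored in this sketch.
BE_FORMS = {"be", "am", "is", "are", "was", "were", "been", "being"}


def classify(zh_verb, links, en_tokens, en_pos):
    """Bucket one Chinese verb by the English words it is linked to.

    Assumed priority when several words are linked: a non-"be" verb wins,
    then a form of "be", then anything else.
    """
    targets = [e for z, e in links if z == zh_verb]
    if not targets:
        return "En.none"
    if any(en_pos[e].startswith("VB") and en_tokens[e].lower() not in BE_FORMS for e in targets):
        return "En.verb"
    if any(en_tokens[e].lower() in BE_FORMS for e in targets):
        return "En.be"
    return "En.else"


def distribution(zh_verbs, links, en_tokens, en_pos):
    """Percentage of Chinese verbs falling into each bucket."""
    counts = Counter(classify(v, links, en_tokens, en_pos) for v in zh_verbs)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {label: 100.0 * n / total for label, n in counts.items()}


en_tokens = "Construction is a principal economic activity in developing Pudong".split()
en_pos = ["NN", "VBZ", "DT", "JJ", "JJ", "NN", "IN", "VBG", "NNP"]  # Penn Treebank tags
links = {(0, 7), (1, 8), (2, 0), (3, 1), (4, 4), (5, 5)}            # toy links, made-up positions
zh_verbs = [0, 2, 3, 6]                                              # hypothetical Chinese verb positions

print(distribution(zh_verbs, links, en_tokens, en_pos))
```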
Top-down Argument Matching
• Verify word-alignments
- For each Chinese verb vc aligned to some English verb ve
- Treat the alignment as verified if the arguments of vc and ve match
Example:
English: [Gansu Province]Arg0 [also]ArgM [actively]ArgM [explored]Rel [high risk business]Arg1
Chinese: Arg0 ArgM ArgM Rel Arg1
The arguments match, so the alignment is verified.
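Top-down matching can be sketched as a filter over verb-to-verb links. The score is an assumed macro score (mean, over Chinese arguments, of the share of words linked into a same-label English argument), and the default threshold of 0.5 follows the Mac.th used in the top-down evaluation. Function names and the Chinese positions are hypothetical; the English labels follow the example.

```python
# One argument = (label, token positions); a proposition's arguments exclude the Rel.
Args = list[tuple[str, list[int]]]
Links = set[tuple[int, int]]  # (Chinese position, English position)


def macro_score(zh_args: Args, en_args: Args, links: Links) -> float:
    """Assumed macro score: mean over Chinese arguments of the share of their words
    linked to a word in an English argument with the same label."""
    shares = []
    for label, zh_words in zh_args:
        if not zh_words:
            continue
        en_words = {e for lab, ws in en_args if lab == label for e in ws}
        hits = sum(1 for z in zh_words if any((z, e) in links for e in en_words))
        shares.append(hits / len(zh_words))
    return sum(shares) / len(shares) if shares else 0.0


def top_down(links: Links, zh_props: dict[int, Args], en_props: dict[int, Args],
             threshold: float = 0.5) -> Links:
    """Keep only verb-to-verb links whose argument structures match well enough."""
    return {(zv, ev) for zv, ev in links
            if zv in zh_props and ev in en_props
            and macro_score(zh_props[zv], en_props[ev], links) >= threshold}


# English: [Gansu Province]Arg0 [also]ArgM [actively]ArgM [explored]Rel [high risk business]Arg1
en_props = {4: [("Arg0", [0, 1]), ("ArgM", [2]), ("ArgM", [3]), ("Arg1", [5, 6, 7])]}
# Chinese side: hypothetical positions, verb at 3, same labels as on the slide
zh_props = {3: [("Arg0", [0]), ("ArgM", [1]), ("ArgM", [2]), ("Arg1", [4, 5, 6])]}
links = {(0, 0), (0, 1), (1, 2), (2, 3), (3, 4), (4, 5), (5, 6), (6, 7)}

print(top_down(links, zh_props, en_props))  # {(3, 4)}
```

If only the verb link (3, 4) existed, with no links between the arguments, the score would drop to 0.0 and the link would be rejected.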
Bottom-up Argument Matching
• Expand word-alignments
- For each Chinese verb vc aligned to no English word
- Align vc to the English verb ve whose arguments best match those of vc
Example: Foreign funded enterprises in Gansu Province no longer worry about investment risk
English: Arg0 ArgM ArgM Rel Arg1
Chinese: Arg0 ArgM ArgM ArgM Arg1 Rel
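Bottom-up matching can be sketched the same way, but run over Chinese verbs with no link: every English verb is scored and the best one is kept only if it clears both thresholds (defaults 0.8 and 0.6, the Mac.th and Mic.th from the bottom-up evaluation). The macro/micro definitions, the function names, the roles given to "funded", and the Chinese positions are assumptions for illustration.

```python
Args = list[tuple[str, list[int]]]   # (label, token positions), Rel excluded
Links = set[tuple[int, int]]         # (Chinese position, English position)


def scores(zh_args: Args, en_args: Args, links: Links) -> tuple[float, float]:
    """Assumed (macro, micro) argument matching scores of a Chinese/English verb pair."""
    counts = []  # (matched words, total words) per Chinese argument
    for label, zh_words in zh_args:
        if not zh_words:
            continue
        en_words = {e for lab, ws in en_args if lab == label for e in ws}
        hits = sum(1 for z in zh_words if any((z, e) in links for e in en_words))
        counts.append((hits, len(zh_words)))
    if not counts:
        return 0.0, 0.0
    macro = sum(h / n for h, n in counts) / len(counts)             # mean of per-argument shares
    micro = sum(h for h, _ in counts) / sum(n for _, n in counts)  # pooled over all words
    return macro, micro


def bottom_up(links: Links, zh_props: dict[int, Args], en_props: dict[int, Args],
              mac_th: float = 0.8, mic_th: float = 0.6) -> Links:
    """Propose a link for each Chinese verb that the aligner left unaligned."""
    aligned = {z for z, _ in links}
    new_links = set()
    for zv, zh_args in zh_props.items():
        if zv in aligned:
            continue
        best = None  # ((macro, micro), English verb position)
        for ev, en_args in en_props.items():
            macro, micro = scores(zh_args, en_args, links)
            if macro >= mac_th and micro >= mic_th and (best is None or (macro, micro) > best[0]):
                best = ((macro, micro), ev)
        if best is not None:
            new_links.add((zv, best[1]))
    return new_links


# English: Foreign funded enterprises in Gansu Province no longer worry about investment risk
en_props = {8: [("Arg0", [0, 1, 2, 3, 4, 5]), ("ArgM", [6]), ("ArgM", [7]), ("Arg1", [9, 10, 11])],
            1: [("Arg1", [2])]}  # "funded"; its roles here are illustrative
# Chinese side: hypothetical positions; the verb at 4 has no link
zh_props = {4: [("Arg0", [0, 1]), ("ArgM", [2]), ("ArgM", [3]), ("Arg1", [5, 6])]}
links = {(0, 4), (0, 5), (1, 2), (2, 6), (3, 7), (5, 10), (6, 11)}

print(bottom_up(links, zh_props, en_props))  # {(4, 8)}
```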
Argument Matching Score
• Macro argument matching score
• Micro argument matching score
• Thresholds
- Top-down: thresholds on macro score
- Bottom-up: thresholds on both macro and micro scores
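The slides name the two scores without their formulas. One plausible reading, assumed from the usual macro- vs micro-averaging distinction rather than taken from the talk: let A(v_c) be the arguments of the Chinese verb v_c, |a| the number of words in argument a, and m(a) the number of those words aligned to a word in an argument of the English verb v_e that carries the same label.

```latex
% Assumed definitions, not given on the slides
\begin{align*}
S_{\text{mac}}(v_c, v_e) &= \frac{1}{|A(v_c)|} \sum_{a \in A(v_c)} \frac{m(a)}{|a|} \\
S_{\text{mic}}(v_c, v_e) &= \frac{\sum_{a \in A(v_c)} m(a)}{\sum_{a \in A(v_c)} |a|}
\end{align*}
% Worked example: two arguments, with m(a)/|a| = 1/1 and 1/3
\begin{align*}
S_{\text{mac}} &= \tfrac{1}{2}\left(\tfrac{1}{1} + \tfrac{1}{3}\right) = \tfrac{2}{3} \approx 0.67 \\
S_{\text{mic}} &= \tfrac{1 + 1}{1 + 3} = 0.5
\end{align*}
```

Macro averaging lets a short, fully matched argument weigh as much as a long one, while micro averaging weighs every word equally; under this reading, thresholding both (Mac.th = 0.8, Mic.th = 0.6 in the bottom-up evaluation) guards against either effect dominating.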
System Overview
• The source- and target-language corpora are word-aligned with GIZA++.
• Verbs aligned to verbs -> Top-down matching (using the parallel Propbanks) -> Verified alignments
• Verbs aligned to no word -> Bottom-up matching (using the parallel Propbanks) -> Expanded alignments
• Verified + Expanded alignments = Enhanced alignments
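The overview's dispatch can be sketched as a function that splits Chinese verbs into the two branches and unions the results. `verify` and `expand` are hypothetical stand-ins for top-down and bottom-up matching, passed in as callables so the sketch only shows the data flow; the toy links are made up.

```python
from typing import Callable, Optional

Link = tuple[int, int]  # (Chinese position, English position)


def enhance(links: set[Link], zh_verbs: set[int], en_verbs: set[int],
            verify: Callable[[Link], bool],
            expand: Callable[[int], Optional[Link]]) -> set[Link]:
    """Combine the two branches of the overview into one set of enhanced alignments.

    verify: stand-in for top-down matching (keep or drop a verb-to-verb link).
    expand: stand-in for bottom-up matching (propose a link for an unaligned verb, or None).
    """
    aligned = {z for z, _ in links}
    verb_links = {(z, e) for z, e in links if z in zh_verbs and e in en_verbs}
    unaligned = sorted(z for z in zh_verbs if z not in aligned)

    verified = {link for link in verb_links if verify(link)}
    expanded = {link for link in map(expand, unaligned) if link is not None}
    return verified | expanded


links = {(0, 10), (1, 11), (2, 5), (3, 12)}  # toy aligner output
zh_verbs, en_verbs = {0, 1, 2, 4}, {10, 11, 12}

result = enhance(links, zh_verbs, en_verbs,
                 verify=lambda link: link != (1, 11),  # pretend (1, 11) fails argument matching
                 expand=lambda z: (z, 12))             # pretend English verb 12 matches best
print(sorted(result))  # [(0, 10), (4, 12)]
```

Chinese verbs linked to a non-verb (verb 2 above) enter neither branch, matching the two paths in the overview.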
Evaluations
• Test Corpus
- NIST-GALE Web Genre Test Data
- 100 parallel sentences, 365 verb tokens, 273 verb types
• Measurements
- Term coverage: how many Chinese verb types are covered
- Term expansion: how many English verb types are suggested
- Alignment accuracy: how many of the suggested English verb types are correct
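The three measurements can be sketched over a mapping from each Chinese verb type to the English verb types the alignments suggest. Assumptions: term expansion is the total number of suggested English types over the covered Chinese verbs, and accuracy is averaged per Chinese verb type (as the "Average Alignment Accuracy" panel title suggests). The function name, verb IDs, and judgments are hypothetical.

```python
def evaluate(suggestions: dict[str, set[str]], test_verbs: set[str],
             correct: dict[str, set[str]]) -> tuple[int, int, float]:
    """Return (term coverage, term expansion, average alignment accuracy).

    suggestions: Chinese verb type -> English verb types proposed by the alignments
    correct:     Chinese verb type -> English verb types judged correct
    """
    covered = [v for v in sorted(test_verbs) if suggestions.get(v)]
    if not covered:
        return 0, 0, 0.0
    expansion = sum(len(suggestions[v]) for v in covered)
    per_verb = [len(suggestions[v] & correct.get(v, set())) / len(suggestions[v])
                for v in covered]
    return len(covered), expansion, sum(per_verb) / len(per_verb)


# Hypothetical Chinese verb IDs c1..c4 and judgments, for illustration only
suggestions = {"c1": {"explore"}, "c2": {"worry", "fear"}, "c3": set()}
correct = {"c1": {"explore"}, "c2": {"worry"}}
print(evaluate(suggestions, {"c1", "c2", "c3", "c4"}, correct))  # (2, 3, 0.75)
```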
Evaluations: Top-down
Figure: Term Coverage and Average Alignment Accuracy on Xinhua and Sinorama, comparing Mac.th = 0.0 (GIZA++) with Mac.th = 0.5 (TDAM, top-down argument matching)
Evaluations: Bottom-up
Figure: Term Coverage and Average Alignment Accuracy on Xinhua and Sinorama for bottom-up matching with Mac.th = 0.8, Mic.th = 0.6
Reported: 5.5% error reduction, 17% absolute improvement
Conclusions & Future Work
• Conclusions
- Top-down Argument Matching is most effective for verifying
word-alignments based on non-literal translations that have
proven difficult for GIZA++.
- Bottom-up Argument Matching shows promise for expanding
the coverage of GIZA++ alignments based on literal
translations.
• Future work: we will try to enhance word-alignments by using
- Automatically labeled Propbanks
- Nombanks and named-entity tags
- Parallel Propbanks before running GIZA++
Acknowledgements
• We gratefully acknowledge the support of the National
Science Foundation Grants IIS-0325646, Domain
Independent Semantic Parsing, CISE-CRI-0551615,
Towards a Comprehensive Linguistic Annotation, and a
grant from the Defense Advanced Research Projects
Agency (DARPA/IPTO) under the GALE program,
DARPA/CMO Contract No. HR0011-06-C-0022,
subcontract from BBN, Inc.
• Special thanks to Daniel Gildea and Ding Liu (University of
Rochester), who provided the word-alignments; Wei Wang
(Information Sciences Institute, University of Southern
California), who provided the test corpus; and Hua
Zhong (University of Colorado at Boulder), who
performed the evaluations.
15

More Related Content

Similar to Using Parallel Propbanks to enhance Word-alignments

Essential Elements of Excellent Multilingual Search
Essential Elements of Excellent Multilingual SearchEssential Elements of Excellent Multilingual Search
Essential Elements of Excellent Multilingual Searchandrew_paulsen
 
How to expand your nlp solution to new languages using transfer learning
How to expand your nlp solution to new languages using transfer learningHow to expand your nlp solution to new languages using transfer learning
How to expand your nlp solution to new languages using transfer learningLena Shakurova
 
Detecting Cross-lingual Semantic Similarity Using Parallel PropBanks
Detecting Cross-lingual Semantic Similarity Using Parallel PropBanksDetecting Cross-lingual Semantic Similarity Using Parallel PropBanks
Detecting Cross-lingual Semantic Similarity Using Parallel PropBanksJinho Choi
 
Detecting Cross-lingual Semantic Similarity Using Parallel PropBanks
Detecting Cross-lingual Semantic Similarity Using Parallel PropBanksDetecting Cross-lingual Semantic Similarity Using Parallel PropBanks
Detecting Cross-lingual Semantic Similarity Using Parallel PropBanksJinho Choi
 
TwiSent: A Multi-Stage System for Analyzing Sentiment in Twitter
TwiSent: A Multi-Stage System for Analyzing Sentiment in TwitterTwiSent: A Multi-Stage System for Analyzing Sentiment in Twitter
TwiSent: A Multi-Stage System for Analyzing Sentiment in TwitterSubhabrata Mukherjee
 
Serverless Text Analytics with Amazon Comprehend
Serverless Text Analytics with Amazon ComprehendServerless Text Analytics with Amazon Comprehend
Serverless Text Analytics with Amazon ComprehendDonnie Prakoso
 
Korean-optimized Word Representations for Out of Vocabulary Problems caused b...
Korean-optimized Word Representations for Out of Vocabulary Problems caused b...Korean-optimized Word Representations for Out of Vocabulary Problems caused b...
Korean-optimized Word Representations for Out of Vocabulary Problems caused b...Seonghyun Kim
 
"Machine Translation 101" and the Challenge of Patents
"Machine Translation 101" and the Challenge of Patents"Machine Translation 101" and the Challenge of Patents
"Machine Translation 101" and the Challenge of PatentsIconic Translation Machines
 
Webinar: Simpler Semantic Search with Solr
Webinar: Simpler Semantic Search with SolrWebinar: Simpler Semantic Search with Solr
Webinar: Simpler Semantic Search with SolrLucidworks
 
Categorical Evaluation for Advanced Distributional Semantic Models
Categorical Evaluation for Advanced Distributional Semantic ModelsCategorical Evaluation for Advanced Distributional Semantic Models
Categorical Evaluation for Advanced Distributional Semantic ModelsJinho Choi
 
Data Science Your Vacation
Data Science Your VacationData Science Your Vacation
Data Science Your VacationTJ Stalcup
 
Lecture 7- Text Statistics and Document Parsing
Lecture 7- Text Statistics and Document ParsingLecture 7- Text Statistics and Document Parsing
Lecture 7- Text Statistics and Document ParsingSean Golliher
 
A career in IT requires an understanding of the various technologies.docx
A career in IT requires an understanding of the various technologies.docxA career in IT requires an understanding of the various technologies.docx
A career in IT requires an understanding of the various technologies.docxJospehStull43
 
WordNet Based Online Reverse Dictionary with Improved Accuracy and Parts-of-S...
WordNet Based Online Reverse Dictionary with Improved Accuracy and Parts-of-S...WordNet Based Online Reverse Dictionary with Improved Accuracy and Parts-of-S...
WordNet Based Online Reverse Dictionary with Improved Accuracy and Parts-of-S...IRJET Journal
 
[Yang, Downey and Boyd-Graber 2015] Efficient Methods for Incorporating Knowl...
[Yang, Downey and Boyd-Graber 2015] Efficient Methods for Incorporating Knowl...[Yang, Downey and Boyd-Graber 2015] Efficient Methods for Incorporating Knowl...
[Yang, Downey and Boyd-Graber 2015] Efficient Methods for Incorporating Knowl...Shuyo Nakatani
 
NLP_KASHK:Evaluating Language Model
NLP_KASHK:Evaluating Language ModelNLP_KASHK:Evaluating Language Model
NLP_KASHK:Evaluating Language ModelHemantha Kulathilake
 
WIA 2019 - Using Embeddings to Understand the Variance and Evolution of Data ...
WIA 2019 - Using Embeddings to Understand the Variance and Evolution of Data ...WIA 2019 - Using Embeddings to Understand the Variance and Evolution of Data ...
WIA 2019 - Using Embeddings to Understand the Variance and Evolution of Data ...Women in Analytics Conference
 
Georgetown Data Science - Team BuzzFeed
Georgetown Data Science - Team BuzzFeed Georgetown Data Science - Team BuzzFeed
Georgetown Data Science - Team BuzzFeed Joshua Erb
 

Similar to Using Parallel Propbanks to enhance Word-alignments (20)

Essential Elements of Excellent Multilingual Search
Essential Elements of Excellent Multilingual SearchEssential Elements of Excellent Multilingual Search
Essential Elements of Excellent Multilingual Search
 
How to expand your nlp solution to new languages using transfer learning
How to expand your nlp solution to new languages using transfer learningHow to expand your nlp solution to new languages using transfer learning
How to expand your nlp solution to new languages using transfer learning
 
Detecting Cross-lingual Semantic Similarity Using Parallel PropBanks
Detecting Cross-lingual Semantic Similarity Using Parallel PropBanksDetecting Cross-lingual Semantic Similarity Using Parallel PropBanks
Detecting Cross-lingual Semantic Similarity Using Parallel PropBanks
 
Detecting Cross-lingual Semantic Similarity Using Parallel PropBanks
Detecting Cross-lingual Semantic Similarity Using Parallel PropBanksDetecting Cross-lingual Semantic Similarity Using Parallel PropBanks
Detecting Cross-lingual Semantic Similarity Using Parallel PropBanks
 
TwiSent: A Multi-Stage System for Analyzing Sentiment in Twitter
TwiSent: A Multi-Stage System for Analyzing Sentiment in TwitterTwiSent: A Multi-Stage System for Analyzing Sentiment in Twitter
TwiSent: A Multi-Stage System for Analyzing Sentiment in Twitter
 
Serverless Text Analytics with Amazon Comprehend
Serverless Text Analytics with Amazon ComprehendServerless Text Analytics with Amazon Comprehend
Serverless Text Analytics with Amazon Comprehend
 
evonlp_slides.pdf
evonlp_slides.pdfevonlp_slides.pdf
evonlp_slides.pdf
 
Korean-optimized Word Representations for Out of Vocabulary Problems caused b...
Korean-optimized Word Representations for Out of Vocabulary Problems caused b...Korean-optimized Word Representations for Out of Vocabulary Problems caused b...
Korean-optimized Word Representations for Out of Vocabulary Problems caused b...
 
NLP & DBpedia
 NLP & DBpedia NLP & DBpedia
NLP & DBpedia
 
"Machine Translation 101" and the Challenge of Patents
"Machine Translation 101" and the Challenge of Patents"Machine Translation 101" and the Challenge of Patents
"Machine Translation 101" and the Challenge of Patents
 
Webinar: Simpler Semantic Search with Solr
Webinar: Simpler Semantic Search with SolrWebinar: Simpler Semantic Search with Solr
Webinar: Simpler Semantic Search with Solr
 
Categorical Evaluation for Advanced Distributional Semantic Models
Categorical Evaluation for Advanced Distributional Semantic ModelsCategorical Evaluation for Advanced Distributional Semantic Models
Categorical Evaluation for Advanced Distributional Semantic Models
 
Data Science Your Vacation
Data Science Your VacationData Science Your Vacation
Data Science Your Vacation
 
Lecture 7- Text Statistics and Document Parsing
Lecture 7- Text Statistics and Document ParsingLecture 7- Text Statistics and Document Parsing
Lecture 7- Text Statistics and Document Parsing
 
A career in IT requires an understanding of the various technologies.docx
A career in IT requires an understanding of the various technologies.docxA career in IT requires an understanding of the various technologies.docx
A career in IT requires an understanding of the various technologies.docx
 
WordNet Based Online Reverse Dictionary with Improved Accuracy and Parts-of-S...
WordNet Based Online Reverse Dictionary with Improved Accuracy and Parts-of-S...WordNet Based Online Reverse Dictionary with Improved Accuracy and Parts-of-S...
WordNet Based Online Reverse Dictionary with Improved Accuracy and Parts-of-S...
 
[Yang, Downey and Boyd-Graber 2015] Efficient Methods for Incorporating Knowl...
[Yang, Downey and Boyd-Graber 2015] Efficient Methods for Incorporating Knowl...[Yang, Downey and Boyd-Graber 2015] Efficient Methods for Incorporating Knowl...
[Yang, Downey and Boyd-Graber 2015] Efficient Methods for Incorporating Knowl...
 
NLP_KASHK:Evaluating Language Model
NLP_KASHK:Evaluating Language ModelNLP_KASHK:Evaluating Language Model
NLP_KASHK:Evaluating Language Model
 
WIA 2019 - Using Embeddings to Understand the Variance and Evolution of Data ...
WIA 2019 - Using Embeddings to Understand the Variance and Evolution of Data ...WIA 2019 - Using Embeddings to Understand the Variance and Evolution of Data ...
WIA 2019 - Using Embeddings to Understand the Variance and Evolution of Data ...
 
Georgetown Data Science - Team BuzzFeed
Georgetown Data Science - Team BuzzFeed Georgetown Data Science - Team BuzzFeed
Georgetown Data Science - Team BuzzFeed
 

More from Jinho Choi

Adaptation of Multilingual Transformer Encoder for Robust Enhanced Universal ...
Adaptation of Multilingual Transformer Encoder for Robust Enhanced Universal ...Adaptation of Multilingual Transformer Encoder for Robust Enhanced Universal ...
Adaptation of Multilingual Transformer Encoder for Robust Enhanced Universal ...Jinho Choi
 
Analysis of Hierarchical Multi-Content Text Classification Model on B-SHARP D...
Analysis of Hierarchical Multi-Content Text Classification Model on B-SHARP D...Analysis of Hierarchical Multi-Content Text Classification Model on B-SHARP D...
Analysis of Hierarchical Multi-Content Text Classification Model on B-SHARP D...Jinho Choi
 
Competence-Level Prediction and Resume & Job Description Matching Using Conte...
Competence-Level Prediction and Resume & Job Description Matching Using Conte...Competence-Level Prediction and Resume & Job Description Matching Using Conte...
Competence-Level Prediction and Resume & Job Description Matching Using Conte...Jinho Choi
 
Transformers to Learn Hierarchical Contexts in Multiparty Dialogue for Span-b...
Transformers to Learn Hierarchical Contexts in Multiparty Dialogue for Span-b...Transformers to Learn Hierarchical Contexts in Multiparty Dialogue for Span-b...
Transformers to Learn Hierarchical Contexts in Multiparty Dialogue for Span-b...Jinho Choi
 
The Myth of Higher-Order Inference in Coreference Resolution
The Myth of Higher-Order Inference in Coreference ResolutionThe Myth of Higher-Order Inference in Coreference Resolution
The Myth of Higher-Order Inference in Coreference ResolutionJinho Choi
 
Noise Pollution in Hospital Readmission Prediction: Long Document Classificat...
Noise Pollution in Hospital Readmission Prediction: Long Document Classificat...Noise Pollution in Hospital Readmission Prediction: Long Document Classificat...
Noise Pollution in Hospital Readmission Prediction: Long Document Classificat...Jinho Choi
 
Abstract Meaning Representation
Abstract Meaning RepresentationAbstract Meaning Representation
Abstract Meaning RepresentationJinho Choi
 
Semantic Role Labeling
Semantic Role LabelingSemantic Role Labeling
Semantic Role LabelingJinho Choi
 
CS329 - WordNet Similarities
CS329 - WordNet SimilaritiesCS329 - WordNet Similarities
CS329 - WordNet SimilaritiesJinho Choi
 
CS329 - Lexical Relations
CS329 - Lexical RelationsCS329 - Lexical Relations
CS329 - Lexical RelationsJinho Choi
 
Automatic Knowledge Base Expansion for Dialogue Management
Automatic Knowledge Base Expansion for Dialogue ManagementAutomatic Knowledge Base Expansion for Dialogue Management
Automatic Knowledge Base Expansion for Dialogue ManagementJinho Choi
 
Attention is All You Need for AMR Parsing
Attention is All You Need for AMR ParsingAttention is All You Need for AMR Parsing
Attention is All You Need for AMR ParsingJinho Choi
 
Graph-to-Text Generation and its Applications to Dialogue
Graph-to-Text Generation and its Applications to DialogueGraph-to-Text Generation and its Applications to Dialogue
Graph-to-Text Generation and its Applications to DialogueJinho Choi
 
Real-time Coreference Resolution for Dialogue Understanding
Real-time Coreference Resolution for Dialogue UnderstandingReal-time Coreference Resolution for Dialogue Understanding
Real-time Coreference Resolution for Dialogue UnderstandingJinho Choi
 
Topological Sort
Topological SortTopological Sort
Topological SortJinho Choi
 
Multi-modal Embedding Learning for Early Detection of Alzheimer's Disease
Multi-modal Embedding Learning for Early Detection of Alzheimer's DiseaseMulti-modal Embedding Learning for Early Detection of Alzheimer's Disease
Multi-modal Embedding Learning for Early Detection of Alzheimer's DiseaseJinho Choi
 
Building Widely-Interpretable Semantic Networks for Dialogue Contexts
Building Widely-Interpretable Semantic Networks for Dialogue ContextsBuilding Widely-Interpretable Semantic Networks for Dialogue Contexts
Building Widely-Interpretable Semantic Networks for Dialogue ContextsJinho Choi
 
How to make Emora talk about Sports Intelligently
How to make Emora talk about Sports IntelligentlyHow to make Emora talk about Sports Intelligently
How to make Emora talk about Sports IntelligentlyJinho Choi
 

More from Jinho Choi (20)

Adaptation of Multilingual Transformer Encoder for Robust Enhanced Universal ...
Adaptation of Multilingual Transformer Encoder for Robust Enhanced Universal ...Adaptation of Multilingual Transformer Encoder for Robust Enhanced Universal ...
Adaptation of Multilingual Transformer Encoder for Robust Enhanced Universal ...
 
Analysis of Hierarchical Multi-Content Text Classification Model on B-SHARP D...
Analysis of Hierarchical Multi-Content Text Classification Model on B-SHARP D...Analysis of Hierarchical Multi-Content Text Classification Model on B-SHARP D...
Analysis of Hierarchical Multi-Content Text Classification Model on B-SHARP D...
 
Competence-Level Prediction and Resume & Job Description Matching Using Conte...
Competence-Level Prediction and Resume & Job Description Matching Using Conte...Competence-Level Prediction and Resume & Job Description Matching Using Conte...
Competence-Level Prediction and Resume & Job Description Matching Using Conte...
 
Transformers to Learn Hierarchical Contexts in Multiparty Dialogue for Span-b...
Transformers to Learn Hierarchical Contexts in Multiparty Dialogue for Span-b...Transformers to Learn Hierarchical Contexts in Multiparty Dialogue for Span-b...
Transformers to Learn Hierarchical Contexts in Multiparty Dialogue for Span-b...
 
The Myth of Higher-Order Inference in Coreference Resolution
The Myth of Higher-Order Inference in Coreference ResolutionThe Myth of Higher-Order Inference in Coreference Resolution
The Myth of Higher-Order Inference in Coreference Resolution
 
Noise Pollution in Hospital Readmission Prediction: Long Document Classificat...
Noise Pollution in Hospital Readmission Prediction: Long Document Classificat...Noise Pollution in Hospital Readmission Prediction: Long Document Classificat...
Noise Pollution in Hospital Readmission Prediction: Long Document Classificat...
 
Abstract Meaning Representation
Abstract Meaning RepresentationAbstract Meaning Representation
Abstract Meaning Representation
 
Semantic Role Labeling
Semantic Role LabelingSemantic Role Labeling
Semantic Role Labeling
 
CKY Parsing
CKY ParsingCKY Parsing
CKY Parsing
 
CS329 - WordNet Similarities
CS329 - WordNet SimilaritiesCS329 - WordNet Similarities
CS329 - WordNet Similarities
 
CS329 - Lexical Relations
CS329 - Lexical RelationsCS329 - Lexical Relations
CS329 - Lexical Relations
 
Automatic Knowledge Base Expansion for Dialogue Management
Automatic Knowledge Base Expansion for Dialogue ManagementAutomatic Knowledge Base Expansion for Dialogue Management
Automatic Knowledge Base Expansion for Dialogue Management
 
Attention is All You Need for AMR Parsing
Attention is All You Need for AMR ParsingAttention is All You Need for AMR Parsing
Attention is All You Need for AMR Parsing
 
Graph-to-Text Generation and its Applications to Dialogue
Graph-to-Text Generation and its Applications to DialogueGraph-to-Text Generation and its Applications to Dialogue
Graph-to-Text Generation and its Applications to Dialogue
 
Real-time Coreference Resolution for Dialogue Understanding
Real-time Coreference Resolution for Dialogue UnderstandingReal-time Coreference Resolution for Dialogue Understanding
Real-time Coreference Resolution for Dialogue Understanding
 
Topological Sort
Topological SortTopological Sort
Topological Sort
 
Tries - Put
Tries - PutTries - Put
Tries - Put
 
Multi-modal Embedding Learning for Early Detection of Alzheimer's Disease
Multi-modal Embedding Learning for Early Detection of Alzheimer's DiseaseMulti-modal Embedding Learning for Early Detection of Alzheimer's Disease
Multi-modal Embedding Learning for Early Detection of Alzheimer's Disease
 
Building Widely-Interpretable Semantic Networks for Dialogue Contexts
Building Widely-Interpretable Semantic Networks for Dialogue ContextsBuilding Widely-Interpretable Semantic Networks for Dialogue Contexts
Building Widely-Interpretable Semantic Networks for Dialogue Contexts
 
How to make Emora talk about Sports Intelligently
How to make Emora talk about Sports IntelligentlyHow to make Emora talk about Sports Intelligently
How to make Emora talk about Sports Intelligently
 

Recently uploaded

CNIC Information System with Pakdata Cf In Pakistan
CNIC Information System with Pakdata Cf In PakistanCNIC Information System with Pakdata Cf In Pakistan
CNIC Information System with Pakdata Cf In Pakistandanishmna97
 
Strategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherStrategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherRemote DBA Services
 
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc
 
Vector Search -An Introduction in Oracle Database 23ai.pptx
Vector Search -An Introduction in Oracle Database 23ai.pptxVector Search -An Introduction in Oracle Database 23ai.pptx
Vector Search -An Introduction in Oracle Database 23ai.pptxRemote DBA Services
 
DBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor PresentationDBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor PresentationDropbox
 
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin WoodPolkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin WoodJuan lago vázquez
 
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...DianaGray10
 
AWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAndrey Devyatkin
 
Finding Java's Hidden Performance Traps @ DevoxxUK 2024
Finding Java's Hidden Performance Traps @ DevoxxUK 2024Finding Java's Hidden Performance Traps @ DevoxxUK 2024
Finding Java's Hidden Performance Traps @ DevoxxUK 2024Victor Rentea
 
AI in Action: Real World Use Cases by Anitaraj
AI in Action: Real World Use Cases by AnitarajAI in Action: Real World Use Cases by Anitaraj
AI in Action: Real World Use Cases by AnitarajAnitaRaj43
 
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost SavingRepurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost SavingEdi Saputra
 
Less Is More: Utilizing Ballerina to Architect a Cloud Data Platform
Less Is More: Utilizing Ballerina to Architect a Cloud Data PlatformLess Is More: Utilizing Ballerina to Architect a Cloud Data Platform
Less Is More: Utilizing Ballerina to Architect a Cloud Data PlatformWSO2
 
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdf
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdfRising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdf
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdfOrbitshub
 
AI+A11Y 11MAY2024 HYDERBAD GAAD 2024 - HelloA11Y (11 May 2024)
AI+A11Y 11MAY2024 HYDERBAD GAAD 2024 - HelloA11Y (11 May 2024)AI+A11Y 11MAY2024 HYDERBAD GAAD 2024 - HelloA11Y (11 May 2024)
AI+A11Y 11MAY2024 HYDERBAD GAAD 2024 - HelloA11Y (11 May 2024)Samir Dash
 
Elevate Developer Efficiency & build GenAI Application with Amazon Q​
Elevate Developer Efficiency & build GenAI Application with Amazon Q​Elevate Developer Efficiency & build GenAI Application with Amazon Q​
Elevate Developer Efficiency & build GenAI Application with Amazon Q​Bhuvaneswari Subramani
 
Platformless Horizons for Digital Adaptability
Platformless Horizons for Digital AdaptabilityPlatformless Horizons for Digital Adaptability
Platformless Horizons for Digital AdaptabilityWSO2
 
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ..."I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...Zilliz
 
TEST BANK For Principles of Anatomy and Physiology, 16th Edition by Gerard J....
TEST BANK For Principles of Anatomy and Physiology, 16th Edition by Gerard J....TEST BANK For Principles of Anatomy and Physiology, 16th Edition by Gerard J....
TEST BANK For Principles of Anatomy and Physiology, 16th Edition by Gerard J....rightmanforbloodline
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerThousandEyes
 
Six Myths about Ontologies: The Basics of Formal Ontology
Six Myths about Ontologies: The Basics of Formal OntologySix Myths about Ontologies: The Basics of Formal Ontology
Six Myths about Ontologies: The Basics of Formal Ontologyjohnbeverley2021
 

Recently uploaded (20)

CNIC Information System with Pakdata Cf In Pakistan
CNIC Information System with Pakdata Cf In PakistanCNIC Information System with Pakdata Cf In Pakistan
CNIC Information System with Pakdata Cf In Pakistan
 
Strategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherStrategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a Fresher
 
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
 
Vector Search -An Introduction in Oracle Database 23ai.pptx
Vector Search -An Introduction in Oracle Database 23ai.pptxVector Search -An Introduction in Oracle Database 23ai.pptx
Vector Search -An Introduction in Oracle Database 23ai.pptx
 
DBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor PresentationDBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor Presentation
 
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin WoodPolkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
 
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
 
AWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of Terraform
 
Finding Java's Hidden Performance Traps @ DevoxxUK 2024
Finding Java's Hidden Performance Traps @ DevoxxUK 2024Finding Java's Hidden Performance Traps @ DevoxxUK 2024
Finding Java's Hidden Performance Traps @ DevoxxUK 2024
 
AI in Action: Real World Use Cases by Anitaraj
Using Parallel Propbanks to enhance Word-alignments

  • 1. Using Parallel Propbanks to Enhance Word-Alignments. Jinho D. Choi (Univ. of Colorado at Boulder), Martha Palmer (Univ. of Colorado at Boulder), Nianwen Xue (Brandeis University). The 3rd Linguistic Annotation Workshop at ACL ’09, August 7th, 2009
  • 2. Parallel Propbanks • Propbank - A corpus annotated with verbal propositions and their arguments (semantic roles) • Parallel Propbanks - Propbanks annotated on a parallel corpus • [Figure: the English sentence "Gansu Province also actively explored high risk business" and its Chinese source, both annotated with Arg0 (explorer: Gansu Province) and Arg1 (things explored: high risk business)]
  • 3. Word-Alignments • Given parallel sentences, find the translation of each word • GIZA++: a statistical machine translation toolkit - It is hard to verify whether the alignments are correct. - Words with low frequencies may not get aligned. - It does not account for semantics. • [Figure: word alignment between a Chinese sentence and its English translation, "Construction is a principal economic activity in developing Pudong"]
  • 4. Predicate Matching (based on GIZA++) • English Chinese Parallel Treebank (ECTB) - Xinhua: Chinese newswire + literal translation - Sinorama: Chinese news magazine + non-literal translation • [Chart: GIZA++ alignment targets of verbs by category (En.verb, En.be, En.else, En.none) for Xinhua (12,895) and Sinorama (40,086); values 32%, 19%, 3%, 45% and 56%, 22%, 3%, 19%]
  • 5. Top-down Argument Matching • Verify word-alignments - For each Chinese verb vc aligned to some English verb ve - Consider the alignment correct if the arguments of vc and ve match • Example: the English sentence is bracketed as [Gansu Province] Arg0, [also] ArgM, [actively] ArgM, [explored] Rel, [high risk business] Arg1; the aligned Chinese sentence carries the same sequence Arg0 ArgM ArgM Rel Arg1, so every argument matches ("Bingo!")
  • 6. Bottom-up Argument Matching • Expand word-alignments - For each Chinese verb vc aligned to no English word - Align vc to the English verb ve that maximizes argument matching with vc • Example: English "Foreign funded enterprises in Gansu Province no longer worry about investment risk" (roles: Arg0 A.M A.M Rel Arg1) against its Chinese source (roles: Arg0 A.M A.M A.M Arg1 Rel), where A.M abbreviates ArgM
  • 7. Bottom-up Argument Matching (continued) • Same example as slide 6, with a second English role sequence, ArgM Rel Arg1, marked over "Foreign funded enterprises in Gansu Province no longer worry about investment risk"
  • 8. Argument Matching Score • Macro argument matching score • Micro argument matching score • Thresholds - Top-down: thresholds on the macro score - Bottom-up: thresholds on both the macro and micro scores
  • 9. System Overview • Source Language Corpus + Target Language Corpus → GIZA++ Word Alignments • Verbs aligned to verbs → Top-down Matching (using Parallel Propbanks) → Verified Alignments • Verbs aligned to no word → Bottom-up Matching (using Parallel Propbanks) → Expanded Alignments • Verified Alignments + Expanded Alignments → Enhanced Alignments
  • 10. Evaluations • Test Corpus - NIST-GALE Web Genre Test Data - 100 parallel sentences, 365 verb tokens, 273 verb types • Measurements - Term Coverage: how many Chinese verb types are covered - Term Expansion: how many English verb types are suggested - Alignment Accuracy: how many of the suggested English verb types are correct
  • 11. Evaluations: Top-down • [Charts: Term Coverage and Average Alignment Accuracy for Xinhua and Sinorama, comparing Mac.th = 0.0 (GIZA++) with Mac.th = 0.5 (TDAM); Term Coverage values: 62, 76, 129, 79; Average Alignment Accuracy values: 78.09%, 83.71%, 57.76%, 83.35%]
  • 12. Evaluations: Bottom-up • Mac.th = 0.8, Mic.th = 0.6 • [Charts: Term Coverage and Average Alignment Accuracy for Xinhua and Sinorama; Term Coverage values: 27, 18; Average Alignment Accuracy values: 14.46%, 63.89%] • 5.5% error reduction, 17% absolute improvement
  • 13. Conclusions & Future Work • Conclusions - Top-down Argument Matching is most effective for verifying word-alignments based on non-literal translations, which have proven difficult for GIZA++. - Bottom-up Argument Matching shows promise for expanding the coverage of GIZA++ alignments based on literal translations. • We will try to enhance word-alignments by using - Automatically labeled Propbanks - Nombanks and named-entity tags - Parallel Propbanks prior to GIZA++
  • 14. Acknowledgements • We gratefully acknowledge the support of National Science Foundation Grants IIS-0325646 (Domain Independent Semantic Parsing) and CISE-CRI-0551615 (Towards a Comprehensive Linguistic Annotation), and a grant from the Defense Advanced Research Projects Agency (DARPA/IPTO) under the GALE program, DARPA/CMO Contract No. HR0011-06-C-0022, subcontract from BBN, Inc. • Special thanks to Daniel Gildea and Ding Liu (University of Rochester), who provided word-alignments; Wei Wang (Information Sciences Institute at University of Southern California), who provided the test corpus; and Hua Zhong (University of Colorado at Boulder), who performed the evaluations.
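The top-down check of slides 5 and 8 can be sketched in a few lines of Python. This is a minimal sketch, not the authors' implementation: the slides do not define the macro argument matching score, so `macro_score` below is a hypothetical stand-in (Dice overlap of the two verbs' argument-label multisets), and the input format and function names are assumptions. Only the threshold, Mac.th = 0.5, comes from slide 11.

```python
from collections import Counter
from typing import Dict, List, Tuple

# Hypothetical input format: each verb maps to the list of its argument
# labels, e.g. ["Arg0", "ArgM", "ArgM", "Arg1"] (the predicate "Rel" left out).
Labels = List[str]


def macro_score(zh_labels: Labels, en_labels: Labels) -> float:
    """ASSUMED macro score: Dice overlap of the two label multisets.

    Returns 1.0 when both verbs carry exactly the same arguments.
    """
    zh, en = Counter(zh_labels), Counter(en_labels)
    total = sum(zh.values()) + sum(en.values())
    # Counter & Counter keeps the minimum count of each shared label.
    return 2 * sum((zh & en).values()) / total if total else 0.0


def top_down_verify(giza_pairs: List[Tuple[str, str]],
                    zh_frames: Dict[str, Labels],
                    en_frames: Dict[str, Labels],
                    mac_th: float = 0.5) -> List[Tuple[str, str]]:
    """Keep a GIZA++ verb-to-verb alignment (vc, ve) only when the
    (assumed) macro score of their arguments reaches mac_th."""
    return [(vc, ve) for vc, ve in giza_pairs
            if macro_score(zh_frames[vc], en_frames[ve]) >= mac_th]
```

On the slide 5 example both frames are Arg0 ArgM ArgM Arg1, so this hypothetical score is 1.0 and the alignment is kept; a pair whose argument labels share nothing scores 0.0 and is dropped.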
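The bottom-up expansion of slides 6 and 8 can be sketched the same way, under heavier assumptions. Frames here are hypothetical lists of (label, token-index set) pairs. `macro_score` is again an assumed Dice overlap of label multisets, and `micro_score` is an assumed word-level measure: of the GIZA++ links that start inside an argument of the Chinese verb, the fraction that end inside an English argument with the same label. Ranking candidates by (macro, micro) is also an assumption; only the thresholds Mac.th = 0.8 and Mic.th = 0.6 come from slide 12.

```python
from collections import Counter
from typing import Dict, FrozenSet, List, Optional, Set, Tuple

# Hypothetical frame format (not from the slides): one (label, token indices)
# pair per argument, with the predicate itself ("Rel") left out.
Frame = List[Tuple[str, FrozenSet[int]]]
Link = Tuple[int, int]  # (Chinese token index, English token index) from GIZA++


def macro_score(zh_frame: Frame, en_frame: Frame) -> float:
    """ASSUMED macro score: Dice overlap of the argument-label multisets."""
    zh = Counter(label for label, _ in zh_frame)
    en = Counter(label for label, _ in en_frame)
    total = sum(zh.values()) + sum(en.values())
    return 2 * sum((zh & en).values()) / total if total else 0.0


def label_at(frame: Frame, token: int) -> Optional[str]:
    """Label of the argument in `frame` that covers `token`, or None."""
    for label, span in frame:
        if token in span:
            return label
    return None


def micro_score(zh_frame: Frame, en_frame: Frame, links: Set[Link]) -> float:
    """ASSUMED micro score: share of links leaving a Chinese argument that
    land inside an English argument with the same label."""
    counted = matched = 0
    for zi, ei in links:
        zh_label = label_at(zh_frame, zi)
        if zh_label is None:
            continue  # link does not start inside any argument of vc
        counted += 1
        if label_at(en_frame, ei) == zh_label:
            matched += 1
    return matched / counted if counted else 0.0


def bottom_up_expand(zh_frame: Frame, en_frames: Dict[str, Frame],
                     links: Set[Link], mac_th: float = 0.8,
                     mic_th: float = 0.6) -> Optional[str]:
    """Return the English verb whose frame best matches an unaligned Chinese
    verb, or None if no candidate clears both thresholds."""
    best, best_key = None, None
    for ve, en_frame in en_frames.items():
        key = (macro_score(zh_frame, en_frame),
               micro_score(zh_frame, en_frame, links))
        if key[0] < mac_th or key[1] < mic_th:
            continue
        if best_key is None or key > best_key:
            best, best_key = ve, key
    return best
```

Under these assumed scores, the slide 6 role sequences (Chinese Arg0 ArgM ArgM ArgM Arg1 against English Arg0 ArgM ArgM Arg1) give a macro score of 8/9, which clears Mac.th = 0.8, while a hypothetical candidate frame with only ArgM and Arg1 scores 4/7 and is rejected.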
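The system overview in slide 9 routes GIZA++ output into two branches and merges their results. The sketch below assumes a hypothetical interface: `giza_pairs` holds (Chinese verb, English word) links, and the two matchers are passed in as callables, so the routing stays independent of how the matching scores are defined. The function and parameter names are illustrative, not taken from the slides.

```python
from typing import Callable, Dict, Iterable, List, Optional, Set, Tuple

Pair = Tuple[str, str]  # (Chinese verb id, English word id)


def enhance_alignments(
    zh_verbs: Iterable[str],
    en_verbs: Set[str],
    giza_pairs: Set[Pair],
    verify: Callable[[str, str], bool],      # hypothetical top-down matcher
    expand: Callable[[str], Optional[str]],  # hypothetical bottom-up matcher
) -> Dict[str, List[Pair]]:
    """Sketch of slide 9: verified + expanded alignments = enhanced ones."""
    verified: List[Pair] = []
    expanded: List[Pair] = []
    aligned_zh = set()
    for vc, target in sorted(giza_pairs):
        aligned_zh.add(vc)
        # Branch 1: verbs aligned to verbs go through top-down matching.
        if target in en_verbs and verify(vc, target):
            verified.append((vc, target))
    for vc in zh_verbs:
        # Branch 2: verbs aligned to no word go through bottom-up matching.
        if vc not in aligned_zh:
            ve = expand(vc)
            if ve is not None:
                expanded.append((vc, ve))
    return {"verified": verified, "expanded": expanded,
            "enhanced": verified + expanded}
```

Chinese verbs that GIZA++ aligned to English words outside `en_verbs` fall into neither branch of slide 9, so this sketch leaves them out of the enhanced set.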
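The three measurements of slide 10 can be read as simple counts over suggested verb types. This is one possible reading, sketched under assumptions: `judged_correct` stands in for the manual evaluation, Term Expansion is taken as the number of suggested English verb types summed over covered Chinese types, and "Average Alignment Accuracy" is taken as the per-type accuracy averaged over covered Chinese types.

```python
from typing import Dict, Set


def evaluate(suggested: Dict[str, Set[str]],
             judged_correct: Dict[str, Set[str]]) -> Dict[str, float]:
    """Hypothetical reading of the slide 10 measures.

    suggested:      Chinese verb type -> English verb types proposed for it
    judged_correct: Chinese verb type -> English verb types judged correct
    """
    covered = [zh for zh, en_types in suggested.items() if en_types]
    # Term Coverage: Chinese verb types that receive at least one suggestion.
    coverage = len(covered)
    # Term Expansion: English verb types suggested, summed over covered types.
    expansion = sum(len(suggested[zh]) for zh in covered)
    # Alignment Accuracy: per covered type, the share of suggestions judged
    # correct, averaged over the covered types.
    per_type = [len(suggested[zh] & judged_correct.get(zh, set()))
                / len(suggested[zh]) for zh in covered]
    accuracy = sum(per_type) / coverage if coverage else 0.0
    return {"term_coverage": coverage, "term_expansion": expansion,
            "avg_alignment_accuracy": accuracy}
```

For example, if one Chinese verb type gets two suggestions of which one is correct and another gets a single correct suggestion, coverage is 2, expansion is 3, and average accuracy is (0.5 + 1.0) / 2 = 0.75.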