Detecting Cross-lingual Semantic Similarity
Using Parallel PropBanks
Shumin Wu, Jinho Choi, Martha Palmer
Institute of Cognitive Science, University of Colorado at Boulder
Resource
PropBank
- A corpus annotated with verbal propositions and their arguments.
- Adds semantic information (semantic roles) to the phrase structures.
- e.g. John opened the door with his foot
Parallel corpus, each side with PropBank annotation
- Parallel sentences: two sentences s and t are called parallel if t is a
translation of s.
Chinese Sentence:
大 通道 建设 搞活 了 大 西南 的 物流、 人流、 信息流,
big passage construction invigorate big southwest material flow people flow information flow
促进了 沿海 港口 城市 经济 的 发展。
promote coastal port city economy develop
English Sentence:
Construction of the main passage has activated the flow of
materials , the flow of people and the flow of information in the
great southwest , and has promoted development in the coastal
port cities’ economies .
PropBank Annotations:
搞活.01:
Arg0: 大通道建设
Arg1: 大西南的物流、人流、信息流
促进.01:
Arg0: 大通道建设
Arg1: 沿海港口城市经济的发展
activate.01:
Arg0: construction of the main passage
Arg1: the flow of materials, the flow of people and the flow of information
ArgM-LOC: in the great southwest
promote.01:
Arg0: construction of the main passage
Arg1: development
ArgM-LOC: in the coastal port cities' economies
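The annotations above can be held in a small data structure. The following is a minimal sketch, not the authors' representation: the `Proposition` class and its `core_args` helper are hypothetical names introduced here for illustration, and only the English propositions from the example sentence are filled in.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class Proposition:
    """One PropBank proposition: a predicate roleset and its labelled arguments."""
    roleset: str                                        # e.g. "activate.01"
    args: Dict[str, str] = field(default_factory=dict)  # role label -> argument text

    def core_args(self) -> Dict[str, str]:
        """Return numbered arguments only (Arg0, Arg1, ...), skipping ArgM-* modifiers."""
        return {role: text for role, text in self.args.items()
                if role.lower().startswith("arg") and role[3:].isdigit()}


# The two English propositions from the example sentence.
english = [
    Proposition("activate.01", {
        "Arg0": "construction of the main passage",
        "Arg1": "the flow of materials, the flow of people and the flow of information",
        "ArgM-LOC": "in the great southwest",
    }),
    Proposition("promote.01", {
        "Arg0": "construction of the main passage",
        "Arg1": "development",
        "ArgM-LOC": "in the coastal port cities' economies",
    }),
]

for prop in english:
    print(prop.roleset, sorted(prop.core_args()))
```

Separating core arguments from modifiers is a design choice made for this sketch; it is convenient when the main arguments are treated differently from the rest.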
Motivation
Detecting Cross-lingual Semantic Similarity
- Align the PropBank annotations across the parallel corpus
- Group semantically similar Chinese propositions
: generate a Chinese semantic resource
- Deduce semantic similarity between the two languages
: use the semantic mapping to improve word alignment and machine
translation
Word Alignment
Word Alignment:
- Given parallel sentences, align words that are semantically close.
- GIZA++: a statistical machine translation toolkit used to train
word-alignment models.
: provides word-to-word alignment in each direction
: using only GIZA++ to find parallel predicate pairs misses close to
20% of predicate occurrences compared to a human annotator
(percentage of aligned predicates on 200 random sentences in the
Xinhua corpus)
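As a rough illustration of how per-direction predicate coverage could be measured, here is a minimal sketch. It assumes alignments are available as sets of (source index, target index) pairs; this is a simplification chosen for the example and not the GIZA++ output format. The function names and the toy indices are hypothetical.

```python
def aligned_fraction(predicate_positions, alignment):
    """Fraction of predicate positions that occur as a source index in `alignment`."""
    if not predicate_positions:
        return 0.0
    aligned_sources = {src for src, _ in alignment}
    hits = sum(1 for pos in predicate_positions if pos in aligned_sources)
    return hits / len(predicate_positions)


def intersect(forward, backward):
    """Keep only links proposed in both directions.

    `forward` holds (source, target) pairs; `backward` holds (target, source) pairs.
    """
    return forward & {(src, tgt) for tgt, src in backward}


# Toy alignments with made-up token indices.
zh_to_en = {(0, 2), (3, 6), (8, 9)}   # (Chinese index, English index)
en_to_zh = {(6, 3), (0, 2), (9, 8)}   # (English index, Chinese index)

both = intersect(zh_to_en, en_to_zh)
zh_predicates = [3, 11]               # hypothetical predicate positions

print(sorted(both))
print(aligned_fraction(zh_predicates, zh_to_en))
```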
Evaluating English-Chinese Semantic Classes
Chinese semantic classes
- No current Chinese verb class resource available
- Manually evaluate verb groups (that semantically map to the same
English verb) on a scale of 0-3
: score of 0: not related
: score of 1: related in context
: score of 2: hypernym/hyponym relations
: score of 3: direct/dictionary translation
English semantic classes
- Use WordNet semantic relations for evaluation
- Merging through hypernym relationship
Ex: the taxonomy of {decrease, drop, fall} indicates that decrease is a hypernym of drop
and a synonym of fall
- Sense merging
Ex: the taxonomy of {sponsor, hold} indicates that sponsor is a hyponym of support.1,
and support.4 is a synonym of hold
- Number of semantic classes after merging
Ex: the taxonomy of {appear, occur, emerge, exhibit}: even after sense merging,
WordNet did not find any relationship between exhibit and the other verbs,
resulting in 2 semantic classes
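The merging step can be sketched as grouping verbs that are connected by any hypernym or synonym link. The code below is a toy illustration using a union-find structure over hand-written relation pairs; it does not query WordNet. The specific links listed among appear, occur and emerge are assumptions made for the example, since the poster only states that exhibit is unrelated to the others.

```python
def semantic_classes(verbs, related_pairs):
    """Group verbs into classes: verbs linked by any chain of relations share a class."""
    parent = {v: v for v in verbs}

    def find(v):
        # Follow parent links to the class representative, compressing the path.
        while parent[v] != v:
            parent[v] = parent[parent[v]]
            v = parent[v]
        return v

    for a, b in related_pairs:
        parent[find(a)] = find(b)

    classes = {}
    for v in verbs:
        classes.setdefault(find(v), set()).add(v)
    return sorted(classes.values(), key=sorted)


# From the poster: decrease is a hypernym of drop and a synonym of fall.
print(semantic_classes(["decrease", "drop", "fall"],
                       [("decrease", "drop"), ("decrease", "fall")]))

# Assumed links among the first three verbs; exhibit has none.
print(semantic_classes(["appear", "occur", "emerge", "exhibit"],
                       [("appear", "occur"), ("appear", "emerge")]))
```

With these inputs the first verb set collapses to one class and the second to two, matching the counts given in the examples.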
Corpus Description
English Chinese Translation Treebank (ECTB)
- A parallel corpus between English and Chinese
- The corpus is divided into two parts
: Xinhua Chinese newswire with literal English translations
(4,363 parallel sentences)
: Sinorama Chinese news magazine with non-literal English
translations (not used for semantic mapping)
Symmetric Predicate Mapping (SPM)
- Based on GIZA++ word alignment
- Pair-wise similarity measure between semantic roles based on
aligned words of the predicate and arguments
: weighs alignment of the predicate and main arguments (ARG0, ARG1)
more heavily than other arguments
: uses both Chinese/English and English/Chinese GIZA++ word
alignment output to generate a bidirectional similarity measure
(harmonic mean of the two)
- Find the best one-to-one mapping (linear assignment problem)
using the Kuhn-Munkres method
- Ex:
Matches:
搞活.01 ↔ activate.01, 促进.01 ↔ promote.01
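The two steps, combining directional scores and choosing a one-to-one mapping, can be sketched as follows. This is an illustration under stated assumptions rather than the authors' implementation: for brevity it replaces the Kuhn-Munkres method with an exhaustive search over permutations, which finds the same optimum but is only practical for a handful of predicates. The function names are hypothetical; the similarity values are the ones given on the poster for this sentence pair.

```python
from itertools import permutations


def symmetric_similarity(sim_ce, sim_ec):
    """Harmonic mean of the two directional similarity scores."""
    if sim_ce + sim_ec == 0:
        return 0.0
    return 2 * sim_ce * sim_ec / (sim_ce + sim_ec)


def best_one_to_one(rows, cols, sim):
    """Return the one-to-one mapping with the highest total similarity.

    Brute force over permutations (a stand-in for Kuhn-Munkres).
    Assumes len(rows) <= len(cols).
    """
    best_score, best_pairs = float("-inf"), []
    for perm in permutations(range(len(cols)), len(rows)):
        score = sum(sim[i][j] for i, j in enumerate(perm))
        if score > best_score:
            best_score = score
            best_pairs = [(rows[i], cols[j]) for i, j in enumerate(perm)]
    return best_pairs, best_score


chinese = ["搞活.01", "促进.01"]
english = ["activate.01", "promote.01"]
sim = [[0.77, 0.25],   # 搞活.01 vs. activate.01, promote.01
       [0.23, 0.49]]   # 促进.01 vs. activate.01, promote.01

pairs, total = best_one_to_one(chinese, english, sim)
for zh, en in pairs:
    print(zh, "<->", en)
```

For larger predicate sets a polynomial-time assignment solver would be the natural replacement for the permutation loop.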
Results
Percentage of aligned predicates (GIZA++ vs. human annotator):
Alignment            GIZA++   Human annotator
Ch.pred → En.pred    48.1%    60.1%
En.pred → Ch.pred    59.2%    73.8%
Ch.pred ↔ En.pred    53.1%    66.3%
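Symmetric similarity:
\[ Sim_{SYM} = \frac{(1+\beta^{2})\, Sim_{EC}\, Sim_{CE}}{\beta^{2}\, Sim_{EC} + Sim_{CE}} \]
Best one-to-one mapping:
\[ \hat{x} = \arg\max_{x} \sum_{i \in C} \sum_{j \in E} x_{ij}\, Sim_{SYM}(i,j), \qquad \sum_{j} x_{ij} \le 1, \quad \sum_{i} x_{ij} \le 1 \]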
            activate.01   promote.01
搞活.01     0.77          0.25
促进.01     0.23          0.49
Method    Precision   Recall   F-score
GIZA++    84.2%       67.5%    74.9%
SPM       87.0%       88.1%    87.5%
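The F-scores in the table are consistent with the usual harmonic mean of precision and recall. A short check, assuming the standard F1 definition (the poster does not state the formula):

```python
def f_score(precision, recall, beta=1.0):
    """Weighted harmonic mean of precision and recall (F1 when beta == 1)."""
    if precision + recall == 0:
        return 0.0
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


# Precision and recall (in percent) as reported in the table.
results = {"GIZA++": (84.2, 67.5), "SPM": (87.0, 88.1)}

for method, (p, r) in results.items():
    print(f"{method}: F = {f_score(p, r):.1f}%")
# GIZA++: F = 74.9%
# SPM: F = 87.5%
```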
[Figure: word alignment between "Construction of the main passage has activated the flow of materials in the great southwest" and "大 通道 建设 搞活 了 大 西南 的 物流"]
English Semantic Class Results
Experiment Setup
- Start with the Chinese verbs in the previous section, retrieve
mapped English verbs from the Xinhua corpus (excluding light
verbs, single occurrences, and single-member verb sets)
Results
- Number of English verb sets: 57
- Total number of English verbs: 127
[Figure: taxonomy tree height from verbs to their lowest common hypernym within each verb set]
[Figure: number of sense merges required]
[Figure: resulting number of semantic classes for each verb set]
Summary and Acknowledgements
- Exploring symmetric predicate similarity using PropBank predicate-argument structure
- Automatically generating English-to-Chinese and Chinese-to-English semantic class mapping
- Verifying English semantic class mapping using WordNet
We gratefully acknowledge the support of the National Science Foundation Grants CISE-CRI-0551615, and a grant from the Defense Advanced Research Projects Agency
(DARPA/IPTO) under the GALE program, DARPA/CMO Contract No. HR0011-06-C-0022, subcontract from BBN, Inc. Any opinions, findings, and conclusions or recommendations
expressed in this material are those of the authors and do not necessarily reflect the views of the National Science Foundation.
Chinese Semantic Class Results
Experiment Setup
- Choose the 50 most diversely-mapped (to Chinese) English verbs
from the Xinhua corpus (excluding light verbs and single
occurrences)
Results
- Total number of Chinese verbs: 218
- Average membership of Chinese semantic class: 4.36
- Human score: [Figure: human scores for the Chinese semantic classes]
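Forming Chinese semantic classes from the predicate mapping amounts to grouping Chinese verbs by the English verb they map to. The sketch below illustrates this and checks the reported average (218 Chinese verbs over 50 English verbs). The function names are hypothetical, and the third mapped pair (推动 to promote) is an invented example, not taken from the corpus.

```python
from collections import defaultdict


def chinese_classes(mapped_pairs):
    """Group Chinese verbs by the English verb they were mapped to."""
    classes = defaultdict(set)
    for zh_verb, en_verb in mapped_pairs:
        classes[en_verb].add(zh_verb)
    return dict(classes)


def average_membership(classes):
    """Mean number of Chinese verbs per class."""
    if not classes:
        return 0.0
    return sum(len(members) for members in classes.values()) / len(classes)


# The first two pairs come from the example sentence; the third is hypothetical.
pairs = [("搞活", "activate"), ("促进", "promote"), ("推动", "promote")]
classes = chinese_classes(pairs)

print(classes)
print(average_membership(classes))

# Reported totals: 218 Chinese verbs across 50 English verbs.
print(round(218 / 50, 2))
```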

More Related Content

Similar to Detecting Cross-lingual Semantic Similarity Using Parallel PropBanks

Dr. Preslav Nakov — Combining, Adapting and Reusing Bi-texts between Related ...
Dr. Preslav Nakov — Combining, Adapting and Reusing Bi-texts between Related ...Dr. Preslav Nakov — Combining, Adapting and Reusing Bi-texts between Related ...
Dr. Preslav Nakov — Combining, Adapting and Reusing Bi-texts between Related ...Yandex
 
OpenWN-PT: a Brazilian Wordnet for all
OpenWN-PT: a Brazilian Wordnet for allOpenWN-PT: a Brazilian Wordnet for all
OpenWN-PT: a Brazilian Wordnet for allAlexandre Rademaker
 
Using Parallel Propbanks to enhance Word-alignments
Using Parallel Propbanks to enhance Word-alignmentsUsing Parallel Propbanks to enhance Word-alignments
Using Parallel Propbanks to enhance Word-alignmentsJinho Choi
 
OpenWordnet-PT: A Project Report
OpenWordnet-PT: A Project ReportOpenWordnet-PT: A Project Report
OpenWordnet-PT: A Project ReportAlexandre Rademaker
 
English kazakh parallel corpus for statistical machine translation
English kazakh parallel corpus for statistical machine translationEnglish kazakh parallel corpus for statistical machine translation
English kazakh parallel corpus for statistical machine translationijnlc
 
Word embeddings
Word embeddingsWord embeddings
Word embeddingsShruti kar
 
Apertium: an extensive and shared language resource base for MT and much more...
Apertium: an extensive and shared language resource base for MT and much more...Apertium: an extensive and shared language resource base for MT and much more...
Apertium: an extensive and shared language resource base for MT and much more...TAUS - The Language Data Network
 
Enhancing the Performance of Sentiment Analysis Supervised Learning Using Sen...
Enhancing the Performance of Sentiment Analysis Supervised Learning Using Sen...Enhancing the Performance of Sentiment Analysis Supervised Learning Using Sen...
Enhancing the Performance of Sentiment Analysis Supervised Learning Using Sen...cscpconf
 
ENHANCING THE PERFORMANCE OF SENTIMENT ANALYSIS SUPERVISED LEARNING USING SEN...
ENHANCING THE PERFORMANCE OF SENTIMENT ANALYSIS SUPERVISED LEARNING USING SEN...ENHANCING THE PERFORMANCE OF SENTIMENT ANALYSIS SUPERVISED LEARNING USING SEN...
ENHANCING THE PERFORMANCE OF SENTIMENT ANALYSIS SUPERVISED LEARNING USING SEN...csandit
 
Masterclass: Natural Language Processing in Trading with Terry Benzschawel & ...
Masterclass: Natural Language Processing in Trading with Terry Benzschawel & ...Masterclass: Natural Language Processing in Trading with Terry Benzschawel & ...
Masterclass: Natural Language Processing in Trading with Terry Benzschawel & ...QuantInsti
 
Using Parallel Propbanks to Enhance Word-alignments
Using Parallel Propbanks to Enhance Word-alignmentsUsing Parallel Propbanks to Enhance Word-alignments
Using Parallel Propbanks to Enhance Word-alignmentsJinho Choi
 
Natural Language Inference in SICK
Natural Language Inference in SICKNatural Language Inference in SICK
Natural Language Inference in SICKValeria de Paiva
 
Moore_slides.ppt
Moore_slides.pptMoore_slides.ppt
Moore_slides.pptbutest
 
A Distributed Architecture System for Recognizing Textual Entailment
A Distributed Architecture System for Recognizing Textual EntailmentA Distributed Architecture System for Recognizing Textual Entailment
A Distributed Architecture System for Recognizing Textual EntailmentFaculty of Computer Science
 
Beyond Word2Vec: Embedding Words and Phrases in Same Vector Space
Beyond Word2Vec: Embedding Words and Phrases in Same Vector SpaceBeyond Word2Vec: Embedding Words and Phrases in Same Vector Space
Beyond Word2Vec: Embedding Words and Phrases in Same Vector SpaceVijay Prakash Dwivedi
 
MACHINE-DRIVEN TEXT ANALYSIS
MACHINE-DRIVEN TEXT ANALYSISMACHINE-DRIVEN TEXT ANALYSIS
MACHINE-DRIVEN TEXT ANALYSISMassimo Schenone
 

Similar to Detecting Cross-lingual Semantic Similarity Using Parallel PropBanks (20)

NLP
NLPNLP
NLP
 
NLP
NLPNLP
NLP
 
Dr. Preslav Nakov — Combining, Adapting and Reusing Bi-texts between Related ...
Dr. Preslav Nakov — Combining, Adapting and Reusing Bi-texts between Related ...Dr. Preslav Nakov — Combining, Adapting and Reusing Bi-texts between Related ...
Dr. Preslav Nakov — Combining, Adapting and Reusing Bi-texts between Related ...
 
OpenWN-PT: a Brazilian Wordnet for all
OpenWN-PT: a Brazilian Wordnet for allOpenWN-PT: a Brazilian Wordnet for all
OpenWN-PT: a Brazilian Wordnet for all
 
Using Parallel Propbanks to enhance Word-alignments
Using Parallel Propbanks to enhance Word-alignmentsUsing Parallel Propbanks to enhance Word-alignments
Using Parallel Propbanks to enhance Word-alignments
 
OpenWordnet-PT: A Project Report
OpenWordnet-PT: A Project ReportOpenWordnet-PT: A Project Report
OpenWordnet-PT: A Project Report
 
English kazakh parallel corpus for statistical machine translation
English kazakh parallel corpus for statistical machine translationEnglish kazakh parallel corpus for statistical machine translation
English kazakh parallel corpus for statistical machine translation
 
Word embeddings
Word embeddingsWord embeddings
Word embeddings
 
Apertium: an extensive and shared language resource base for MT and much more...
Apertium: an extensive and shared language resource base for MT and much more...Apertium: an extensive and shared language resource base for MT and much more...
Apertium: an extensive and shared language resource base for MT and much more...
 
Enhancing the Performance of Sentiment Analysis Supervised Learning Using Sen...
Enhancing the Performance of Sentiment Analysis Supervised Learning Using Sen...Enhancing the Performance of Sentiment Analysis Supervised Learning Using Sen...
Enhancing the Performance of Sentiment Analysis Supervised Learning Using Sen...
 
ENHANCING THE PERFORMANCE OF SENTIMENT ANALYSIS SUPERVISED LEARNING USING SEN...
ENHANCING THE PERFORMANCE OF SENTIMENT ANALYSIS SUPERVISED LEARNING USING SEN...ENHANCING THE PERFORMANCE OF SENTIMENT ANALYSIS SUPERVISED LEARNING USING SEN...
ENHANCING THE PERFORMANCE OF SENTIMENT ANALYSIS SUPERVISED LEARNING USING SEN...
 
Masterclass: Natural Language Processing in Trading with Terry Benzschawel & ...
Masterclass: Natural Language Processing in Trading with Terry Benzschawel & ...Masterclass: Natural Language Processing in Trading with Terry Benzschawel & ...
Masterclass: Natural Language Processing in Trading with Terry Benzschawel & ...
 
Using Parallel Propbanks to Enhance Word-alignments
Using Parallel Propbanks to Enhance Word-alignmentsUsing Parallel Propbanks to Enhance Word-alignments
Using Parallel Propbanks to Enhance Word-alignments
 
Natural Language Inference in SICK
Natural Language Inference in SICKNatural Language Inference in SICK
Natural Language Inference in SICK
 
New word analogy corpus
New word analogy corpusNew word analogy corpus
New word analogy corpus
 
Moore_slides.ppt
Moore_slides.pptMoore_slides.ppt
Moore_slides.ppt
 
A Distributed Architecture System for Recognizing Textual Entailment
A Distributed Architecture System for Recognizing Textual EntailmentA Distributed Architecture System for Recognizing Textual Entailment
A Distributed Architecture System for Recognizing Textual Entailment
 
Beyond Word2Vec: Embedding Words and Phrases in Same Vector Space
Beyond Word2Vec: Embedding Words and Phrases in Same Vector SpaceBeyond Word2Vec: Embedding Words and Phrases in Same Vector Space
Beyond Word2Vec: Embedding Words and Phrases in Same Vector Space
 
MACHINE-DRIVEN TEXT ANALYSIS
MACHINE-DRIVEN TEXT ANALYSISMACHINE-DRIVEN TEXT ANALYSIS
MACHINE-DRIVEN TEXT ANALYSIS
 
A Proposition Bank of Urdu
A Proposition Bank of UrduA Proposition Bank of Urdu
A Proposition Bank of Urdu
 

More from Jinho Choi

Adaptation of Multilingual Transformer Encoder for Robust Enhanced Universal ...
Adaptation of Multilingual Transformer Encoder for Robust Enhanced Universal ...Adaptation of Multilingual Transformer Encoder for Robust Enhanced Universal ...
Adaptation of Multilingual Transformer Encoder for Robust Enhanced Universal ...Jinho Choi
 
Analysis of Hierarchical Multi-Content Text Classification Model on B-SHARP D...
Analysis of Hierarchical Multi-Content Text Classification Model on B-SHARP D...Analysis of Hierarchical Multi-Content Text Classification Model on B-SHARP D...
Analysis of Hierarchical Multi-Content Text Classification Model on B-SHARP D...Jinho Choi
 
Competence-Level Prediction and Resume & Job Description Matching Using Conte...
Competence-Level Prediction and Resume & Job Description Matching Using Conte...Competence-Level Prediction and Resume & Job Description Matching Using Conte...
Competence-Level Prediction and Resume & Job Description Matching Using Conte...Jinho Choi
 
Transformers to Learn Hierarchical Contexts in Multiparty Dialogue for Span-b...
Transformers to Learn Hierarchical Contexts in Multiparty Dialogue for Span-b...Transformers to Learn Hierarchical Contexts in Multiparty Dialogue for Span-b...
Transformers to Learn Hierarchical Contexts in Multiparty Dialogue for Span-b...Jinho Choi
 
The Myth of Higher-Order Inference in Coreference Resolution
The Myth of Higher-Order Inference in Coreference ResolutionThe Myth of Higher-Order Inference in Coreference Resolution
The Myth of Higher-Order Inference in Coreference ResolutionJinho Choi
 
Noise Pollution in Hospital Readmission Prediction: Long Document Classificat...
Noise Pollution in Hospital Readmission Prediction: Long Document Classificat...Noise Pollution in Hospital Readmission Prediction: Long Document Classificat...
Noise Pollution in Hospital Readmission Prediction: Long Document Classificat...Jinho Choi
 
Abstract Meaning Representation
Abstract Meaning RepresentationAbstract Meaning Representation
Abstract Meaning RepresentationJinho Choi
 
Semantic Role Labeling
Semantic Role LabelingSemantic Role Labeling
Semantic Role LabelingJinho Choi
 
CS329 - WordNet Similarities
CS329 - WordNet SimilaritiesCS329 - WordNet Similarities
CS329 - WordNet SimilaritiesJinho Choi
 
CS329 - Lexical Relations
CS329 - Lexical RelationsCS329 - Lexical Relations
CS329 - Lexical RelationsJinho Choi
 
Automatic Knowledge Base Expansion for Dialogue Management
Automatic Knowledge Base Expansion for Dialogue ManagementAutomatic Knowledge Base Expansion for Dialogue Management
Automatic Knowledge Base Expansion for Dialogue ManagementJinho Choi
 
Attention is All You Need for AMR Parsing
Attention is All You Need for AMR ParsingAttention is All You Need for AMR Parsing
Attention is All You Need for AMR ParsingJinho Choi
 
Graph-to-Text Generation and its Applications to Dialogue
Graph-to-Text Generation and its Applications to DialogueGraph-to-Text Generation and its Applications to Dialogue
Graph-to-Text Generation and its Applications to DialogueJinho Choi
 
Real-time Coreference Resolution for Dialogue Understanding
Real-time Coreference Resolution for Dialogue UnderstandingReal-time Coreference Resolution for Dialogue Understanding
Real-time Coreference Resolution for Dialogue UnderstandingJinho Choi
 
Topological Sort
Topological SortTopological Sort
Topological SortJinho Choi
 
Multi-modal Embedding Learning for Early Detection of Alzheimer's Disease
Multi-modal Embedding Learning for Early Detection of Alzheimer's DiseaseMulti-modal Embedding Learning for Early Detection of Alzheimer's Disease
Multi-modal Embedding Learning for Early Detection of Alzheimer's DiseaseJinho Choi
 
Building Widely-Interpretable Semantic Networks for Dialogue Contexts
Building Widely-Interpretable Semantic Networks for Dialogue ContextsBuilding Widely-Interpretable Semantic Networks for Dialogue Contexts
Building Widely-Interpretable Semantic Networks for Dialogue ContextsJinho Choi
 
How to make Emora talk about Sports Intelligently
How to make Emora talk about Sports IntelligentlyHow to make Emora talk about Sports Intelligently
How to make Emora talk about Sports IntelligentlyJinho Choi
 

More from Jinho Choi (20)

Adaptation of Multilingual Transformer Encoder for Robust Enhanced Universal ...
Adaptation of Multilingual Transformer Encoder for Robust Enhanced Universal ...Adaptation of Multilingual Transformer Encoder for Robust Enhanced Universal ...
Adaptation of Multilingual Transformer Encoder for Robust Enhanced Universal ...
 
Analysis of Hierarchical Multi-Content Text Classification Model on B-SHARP D...
Analysis of Hierarchical Multi-Content Text Classification Model on B-SHARP D...Analysis of Hierarchical Multi-Content Text Classification Model on B-SHARP D...
Analysis of Hierarchical Multi-Content Text Classification Model on B-SHARP D...
 
Competence-Level Prediction and Resume & Job Description Matching Using Conte...
Competence-Level Prediction and Resume & Job Description Matching Using Conte...Competence-Level Prediction and Resume & Job Description Matching Using Conte...
Competence-Level Prediction and Resume & Job Description Matching Using Conte...
 
Transformers to Learn Hierarchical Contexts in Multiparty Dialogue for Span-b...
Transformers to Learn Hierarchical Contexts in Multiparty Dialogue for Span-b...Transformers to Learn Hierarchical Contexts in Multiparty Dialogue for Span-b...
Transformers to Learn Hierarchical Contexts in Multiparty Dialogue for Span-b...
 
The Myth of Higher-Order Inference in Coreference Resolution
The Myth of Higher-Order Inference in Coreference ResolutionThe Myth of Higher-Order Inference in Coreference Resolution
The Myth of Higher-Order Inference in Coreference Resolution
 
Noise Pollution in Hospital Readmission Prediction: Long Document Classificat...
Noise Pollution in Hospital Readmission Prediction: Long Document Classificat...Noise Pollution in Hospital Readmission Prediction: Long Document Classificat...
Noise Pollution in Hospital Readmission Prediction: Long Document Classificat...
 
Abstract Meaning Representation
Abstract Meaning RepresentationAbstract Meaning Representation
Abstract Meaning Representation
 
Semantic Role Labeling
Semantic Role LabelingSemantic Role Labeling
Semantic Role Labeling
 
CKY Parsing
CKY ParsingCKY Parsing
CKY Parsing
 
CS329 - WordNet Similarities
CS329 - WordNet SimilaritiesCS329 - WordNet Similarities
CS329 - WordNet Similarities
 
CS329 - Lexical Relations
CS329 - Lexical RelationsCS329 - Lexical Relations
CS329 - Lexical Relations
 
Automatic Knowledge Base Expansion for Dialogue Management
Automatic Knowledge Base Expansion for Dialogue ManagementAutomatic Knowledge Base Expansion for Dialogue Management
Automatic Knowledge Base Expansion for Dialogue Management
 
Attention is All You Need for AMR Parsing
Attention is All You Need for AMR ParsingAttention is All You Need for AMR Parsing
Attention is All You Need for AMR Parsing
 
Graph-to-Text Generation and its Applications to Dialogue
Graph-to-Text Generation and its Applications to DialogueGraph-to-Text Generation and its Applications to Dialogue
Graph-to-Text Generation and its Applications to Dialogue
 
Real-time Coreference Resolution for Dialogue Understanding
Real-time Coreference Resolution for Dialogue UnderstandingReal-time Coreference Resolution for Dialogue Understanding
Real-time Coreference Resolution for Dialogue Understanding
 
Topological Sort
Topological SortTopological Sort
Topological Sort
 
Tries - Put
Tries - PutTries - Put
Tries - Put
 
Multi-modal Embedding Learning for Early Detection of Alzheimer's Disease
Multi-modal Embedding Learning for Early Detection of Alzheimer's DiseaseMulti-modal Embedding Learning for Early Detection of Alzheimer's Disease
Multi-modal Embedding Learning for Early Detection of Alzheimer's Disease
 
Building Widely-Interpretable Semantic Networks for Dialogue Contexts
Building Widely-Interpretable Semantic Networks for Dialogue ContextsBuilding Widely-Interpretable Semantic Networks for Dialogue Contexts
Building Widely-Interpretable Semantic Networks for Dialogue Contexts
 
How to make Emora talk about Sports Intelligently
How to make Emora talk about Sports IntelligentlyHow to make Emora talk about Sports Intelligently
How to make Emora talk about Sports Intelligently
 

Recently uploaded

Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Drew Madelung
 
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Igalia
 
Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101Paola De la Torre
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)Gabriella Davis
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processorsdebabhi2
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...Neo4j
 
Unblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesUnblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesSinan KOZAK
 
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfThe Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfEnterprise Knowledge
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonetsnaman860154
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘RTylerCroy
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking MenDelhi Call girls
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Scriptwesley chun
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024The Digital Insurer
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...apidays
 
Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Allon Mureinik
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024The Digital Insurer
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerThousandEyes
 
A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024Results
 
Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024The Digital Insurer
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc
 

Recently uploaded (20)

Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
 
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
 
Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processors
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
 
Unblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesUnblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen Frames
 
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfThe Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonets
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘
 

Detecting Cross-lingual Semantic Similarity Using Parallel PropBanks

Shumin Wu, Jinho Choi, Martha Palmer
Institute of Cognitive Science, University of Colorado at Boulder

Resource
PropBank
- A corpus annotated with verbal propositions and their arguments.
- Adds semantic information (semantic roles) to the phrase structures.
- e.g. John opened the door with his foot
Parallel Corpus, each side with PropBank Annotation
- Parallel sentences: sentences s and t are called parallel if t is a translation of s.
Chinese Sentence:
大 通道 建设 搞活 了 大 西南 的 物流、 人流、 信息流,
big passage construction invigorate big southwest material flow people flow information flow
促进了 沿海 港口 城市 经济 的 发展。
promote coastal port city economy develop
English Sentence:
Construction of the main passage has activated the flow of materials, the flow of people and the flow of information in the great southwest, and has promoted development in the coastal port cities’ economies.
PropBank Annotations:
搞活.01:
Arg0: 大通道建设
Arg1: 大西南的物流、人流、信息流
促进.01:
Arg0: 大通道建设
Arg1: 沿海港口城市经济的发展
activate.01:
Arg0: construction of the main passage
Arg1: the flow of materials, the flow of people and the flow of information
Argm-loc: in the great southwest
promote.01:
Arg0: construction of the main passage
Arg1: development
Argm-loc: in the coastal port cities’ economies

Motivation
Detecting Cross-lingual Semantic Similarity
- Align the PropBank annotations between the two sides of a parallel corpus
- Group semantically similar Chinese propositions
: generate a Chinese semantic resource
- Deduce semantic similarity between the two languages
: use semantic mapping to improve word alignment and machine translation

Word Alignment
- Given parallel sentences, align words that are semantically close.
- GIZA++: a statistical machine translation toolkit used to train word-alignment models.
: provides word-to-word alignment in each direction
: using only GIZA++ to find parallel predicate pairs misses close to 20% of predicate occurrences compared to a human annotator
[Figure: word alignment between "Construction of the main passage has activated the flow of materials in the great southwest" and "大 通道 建设 搞活 了 大 西南 的 物流"]
Percentage of aligned predicates on 200 random sentences in the Xinhua corpus:
Alignment            GIZA++   Human Annotator
Ch.pred → En.pred    48.1%    60.1%
En.pred → Ch.pred    59.2%    73.8%
Ch.pred ↔ En.pred    53.1%    66.3%

Evaluating English-Chinese Semantic Classes
Chinese semantic classes
- No Chinese verb class resource is currently available
- Manually evaluate verb groups (verbs that semantically map to the same English verb) on a scale of 0-3
: score of 0: not related
: score of 1: related in context
: score of 2: hypernym/hyponym relations
: score of 3: direct/dictionary translation
English semantic classes
- Use WordNet semantic relations for evaluation
- Merging through hypernym relationship
Ex: the taxonomy of {decrease, drop, fall} indicates that decrease is a hypernym of drop and a synonym of fall
- Sense merging
Ex: the taxonomy of {sponsor, hold} indicates that sponsor is a hyponym of support.1, and support.4 is a synonym of hold
- Number of semantic classes after merging
Ex: the taxonomy of {appear, occur, emerge, exhibit}. Even after sense merging, WordNet did not find any relationship between exhibit and the other verbs, resulting in 2 semantic classes

Corpus Description
English Chinese Translation Treebank (ECTB)
- A parallel corpus between English and Chinese
- The corpus is divided into two parts
: Xinhua Chinese newswire with literal English translations (4,363 parallel sentences)
: Sinorama Chinese news magazine with non-literal English translations (not used for semantic mapping)

Symmetric Predicate Mapping (SPM)
- Based on GIZA++ word alignment
- Pair-wise similarity measure between semantic roles based on aligned words of the predicate and arguments
: weighs alignment of the predicate and main arguments (ARG0, ARG1) more heavily than other arguments
: uses both Chinese/English and English/Chinese GIZA++ word alignment output to generate a bidirectional similarity measure (harmonic mean of the two when β = 1)
$$\mathrm{Sim}_{SYM} = \frac{(1+\beta^2)\,\mathrm{Sim}_{EC}\,\mathrm{Sim}_{CE}}{\beta^2\,\mathrm{Sim}_{EC} + \mathrm{Sim}_{CE}}$$
- Find the best one-to-one mapping (linear assignment problem) using the Kuhn-Munkres method:
$$\arg\max_{x} \sum_{i \in C,\, j \in E} \mathrm{Sim}_{SYM_{ij}}\, x_{ij}, \qquad \sum_{j} x_{ij} \le 1, \quad \sum_{i} x_{ij} \le 1$$
- Ex:
           activate.01   promote.01
搞活.01    0.77          0.25
促进.01    0.23          0.49
Matches: 搞活.01 ↔ activate.01, 促进.01 ↔ promote.01

Results
Method    precision   recall   F-score
GIZA++    84.2%       67.5%    74.9%
SPM       87.0%       88.1%    87.5%

Chinese Semantic Class Results
Experiment Setup
- Choose the 50 most diversely-mapped (to Chinese) English verbs from the Xinhua corpus (excluding light verbs and single occurrences)
Results
- Total number of Chinese verbs: 218
- Average membership of Chinese semantic class: 4.36
[Figure: human score]

English Semantic Class Results
Experiment Setup
- Start with the Chinese verbs in the previous section and retrieve the mapped English verbs from the Xinhua corpus (excluding light verbs, single occurrences, and single-member verb sets)
Results
- Number of English verb sets: 57
- Total number of English verbs: 127
[Figure: taxonomy tree height from the verbs to their lowest common hypernym within each verb set]
[Figure: number of sense merges required]
[Figure: resulting number of semantic classes for each verb set]

Summary and Acknowledgements
- Exploring symmetric predicate similarity using PropBank predicate-argument structure
- Automatically generating English-to-Chinese and Chinese-to-English semantic class mapping
- Verifying English semantic class mapping using WordNet
We gratefully acknowledge the support of the National Science Foundation Grants CISE-CRI-0551615, and a grant from the Defense Advanced Research Projects Agency (DARPA/IPTO) under the GALE program, DARPA/CMO Contract No. HR0011-06-C-0022, subcontract from BBN, Inc. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the authors and do not necessarily reflect the views of the National Science Foundation.
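
The Symmetric Predicate Mapping step described earlier (a bidirectional similarity followed by a best one-to-one mapping) can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names `symmetric_similarity` and `best_one_to_one` are hypothetical, the choice of which direction receives the β² weight is an assumption, and an exhaustive search over permutations stands in for the Kuhn-Munkres method (it finds the same optimum, but is only practical for the handful of predicates in one sentence pair). The scores from the poster's example matrix are assumed to be the already-combined similarities.

```python
from itertools import permutations


def symmetric_similarity(sim_a, sim_b, beta=1.0):
    """Combine two directional similarities F-beta style.

    With beta == 1 this is the harmonic mean of the two scores.
    Which direction is weighted by beta^2 is an assumption of this sketch.
    """
    if sim_a == 0.0 or sim_b == 0.0:
        return 0.0
    b2 = beta * beta
    return (1 + b2) * sim_a * sim_b / (b2 * sim_a + sim_b)


def best_one_to_one(rows, cols, sim):
    """Return the one-to-one (row, col) pairs that maximize total similarity.

    sim[i][j] is the combined similarity between rows[i] and cols[j].
    Brute force over permutations; stands in for Kuhn-Munkres.
    """
    if len(rows) > len(cols):
        # Transpose so the shorter side is always the one being assigned.
        flipped = best_one_to_one(cols, rows, [list(c) for c in zip(*sim)])
        return [(r, c) for c, r in flipped]
    best_total, best_pairs = -1.0, []
    for perm in permutations(range(len(cols)), len(rows)):
        total = sum(sim[i][j] for i, j in enumerate(perm))
        if total > best_total:
            best_total = total
            best_pairs = [(rows[i], cols[j]) for i, j in enumerate(perm)]
    return best_pairs


if __name__ == "__main__":
    chinese = ["搞活.01", "促进.01"]
    english = ["activate.01", "promote.01"]
    # Example scores taken from the poster's similarity matrix.
    scores = [[0.77, 0.25],
              [0.23, 0.49]]
    for zh, en in best_one_to_one(chinese, english, scores):
        print(zh, "<->", en)
```

On the poster's example matrix the diagonal pairing has the larger total (0.77 + 0.49 versus 0.25 + 0.23), so the sketch reproduces the matches 搞活.01 ↔ activate.01 and 促进.01 ↔ promote.01. For larger matrices a polynomial-time assignment solver would be the natural replacement for the permutation loop.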