Paper information
• Title
✓ Weakly Supervised Multilingual Causality Extraction from
Wikipedia
• URL
✓ https://www.aclweb.org/anthology/D19-1296/
• Author
✓ Chikara Hashimoto (Yahoo Japan)
• Conference
✓ EMNLP 2019
Background: Causality knowledge
• Much of the world consists of entities that causally
depend on each other
• Understanding causality knowledge is essential for
tasks such as why-QA, reading comprehension, and
event prediction
Example: Protectionism → Trade war
Background: Causality extraction from text
• There are many existing works on causality extraction
• Many of them overlook issues that are important
for constructing a causality knowledge base (CKB)
[Figure: General framework of causality (relation) extraction [Doan, '19]]
Background: Three desiderata for constructing a CKB
• Verifiability: extracted causalities must be verifiable, so that the CKB
can sustain the credibility of its information
• Translatability: to avoid duplicating the construction
effort for CKBs in different languages
• Connectivity: to connect the CKB to other KBs, which
supports KB maintenance efforts in various communities
[Figure: "Tobacco → Lung cancer", true or fake?; several CKBs]
Proposed: utilizing Wikipedia articles
The proposed method extracts causalities whose cause and
effect entities correspond to Wikipedia articles
• We can verify causalities using the Wikipedia articles and
connect them to other languages and KBs via Wikidata
Challenges: lack of training data and descriptions for classification
• There is no data marking causes in Wikipedia articles
that could be used to train a causality classifier
✓ Annotation is, of course, labor-intensive
• Wikipedia descriptions tend to avoid redundancy,
so meaningful contexts are scattered across sentences
✓ Since most relation extraction methods handle only a single
sentence, it is difficult to predict causality in this situation
Proposed method: Using distant supervision and multilingual data
1. Automatically and accurately collect causality
entities by exploiting properties of Wikipedia
2. Automatically collect contexts of causality entities
from multiple sentences in multilingual Wikipedia editions
Proposed method: Causality entity extraction
First, collect causality (seed) entities, which are
more likely to participate in causality than other entities
• Identify causality-describing sections by using predefined
keywords that appear in the titles of such sections
✓ Keywords: Cause, Causes, Effect, Effects
✓ To extend to the multilingual setting, translate the keywords into
eight languages: de, fr, es, it, pt, sv, nl, pl
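The section-title matching described on this slide can be sketched as follows. This is a minimal illustration, not the author's implementation: the function name `section_kind` and the per-language keyword table are assumptions, and only the English keywords listed on the slide are filled in.

```python
import re

# Keywords from the slide (English only). Entries for de, fr, es, it, pt, sv,
# nl, pl would hold translated keywords; they are left out here on purpose.
KEYWORDS = {
    "en": {"cause": {"cause", "causes"}, "effect": {"effect", "effects"}},
}

def section_kind(title, lang="en"):
    """Return 'cause', 'effect', or None for a section title.

    Matching is done on whole words, so 'Because' does not count as 'cause'.
    """
    words = set(re.findall(r"\w+", title.lower()))
    for kind, keywords in KEYWORDS.get(lang, {}).items():
        if words & keywords:
            return kind
    return None

print(section_kind("Causes"))              # cause
print(section_kind("Side effects"))        # effect
print(section_kind("Signs and symptoms"))  # None
```

Whole-word matching is a design choice made for this sketch; the slide only says that the keywords appear in the section titles.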
Proposed method: Seed causality extraction
Collect seed causalities as entity pairs (e1, e2) such that:
• e1 appears in a causality-describing section of e2
whose title contains Cause or Causes
• e2 appears in a causality-describing section of e1
whose title contains Effect or Effects
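A rough sketch of the pair collection above, under stated assumptions: the input dictionaries, the function name `seed_pairs`, and the toy data are hypothetical. The slide does not say whether one condition is enough or both must hold, so the sketch exposes that choice as a `require_both` flag rather than deciding it.

```python
def seed_pairs(cause_sections, effect_sections, require_both=False):
    """Collect (cause, effect) entity pairs.

    cause_sections[a]  = entities mentioned in the Cause(s) section of article a
    effect_sections[a] = entities mentioned in the Effect(s) section of article a
    """
    # e1 appears in the Cause(s) section of e2  ->  e1 causes e2
    from_cause = {(e1, e2) for e2, ents in cause_sections.items() for e1 in ents}
    # e2 appears in the Effect(s) section of e1 ->  e1 causes e2
    from_effect = {(e1, e2) for e1, ents in effect_sections.items() for e2 in ents}
    if require_both:
        return from_cause & from_effect
    return from_cause | from_effect

# Toy input for illustration only; not taken from real Wikipedia sections.
cause_sections = {"Lung cancer": {"Tobacco"}}
effect_sections = {"Tobacco": {"Lung cancer"}, "Protectionism": {"Trade war"}}

print(sorted(seed_pairs(cause_sections, effect_sections)))
print(sorted(seed_pairs(cause_sections, effect_sections, require_both=True)))
```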
Proposed method: Seed causality context extraction
• Collect the context of each seed causality (e1, e2), keeping
only contexts that are highly relevant to the target causality:
✓ Extract the context of e1 from the article of e2
✓ Extract the context of e2 from the article of e1
• Collect contexts in other languages in the same way
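The context collection above might look like the following sketch. It assumes that a "context" is the set of sentences mentioning the other entity, found by case-insensitive string matching with a naive sentence splitter; the slides do not specify these details, and the names `sentences_mentioning` and `pair_context` are hypothetical.

```python
import re

def sentences_mentioning(article_text, entity):
    """Return the sentences of article_text that mention entity (case-insensitive)."""
    # Naive splitter: break after ., !, ? or ; followed by whitespace.
    sentences = re.split(r"(?<=[.!?;])\s+", article_text.strip())
    return [s for s in sentences if entity.lower() in s.lower()]

def pair_context(articles, e1, e2):
    """Context of (e1, e2): mentions of e1 in e2's article plus mentions of e2 in e1's."""
    return (sentences_mentioning(articles.get(e2, ""), e1)
            + sentences_mentioning(articles.get(e1, ""), e2))

# Toy articles; the second Adipsia sentence is the one quoted in the slides.
articles = {
    "Adipsia": "Adipsia is a toy first sentence. Adipsia may be seen in conditions "
               "such as diabetes insipidus and may result in hypernatremia.",
    "Hypernatremia": "A toy article that does not mention the other entity.",
}
print(pair_context(articles, "Adipsia", "Hypernatremia"))
```

For the multilingual setting, the same function could be applied to each language edition's articles and the resulting sentence lists concatenated.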
Proposed method: Learning causality classifier
• Train a binary classifier on the collected examples:
✓ Positive examples: causality entity pairs with their contexts
✓ Negative examples: entity pairs with their contexts, such that
one entity (article) has a link to the other entity
ex: Barack Obama → Hillary Clinton
➢ These negative examples are sensible, meaning that they
are not random pairs but semantically related ones
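A sketch of how such linked-but-not-causal negatives could be sampled. The link graph format, the function name `sample_negatives`, and the decision to also exclude reversed positive pairs are assumptions made for illustration; the slides only state that one article links to the other.

```python
import random

def sample_negatives(links, positives, k, seed=0):
    """Sample up to k negative pairs (a, b) where article a links to article b.

    links[a] = set of articles that a links to.
    Pairs that are seed causalities (in either direction) are excluded.
    """
    excluded = set(positives) | {(b, a) for a, b in positives}
    candidates = sorted(
        (a, b)
        for a, targets in links.items()
        for b in targets
        if a != b and (a, b) not in excluded
    )
    rng = random.Random(seed)  # fixed seed keeps the sample reproducible
    return rng.sample(candidates, min(k, len(candidates)))

# Toy link graph for illustration only.
links = {"Barack Obama": {"Hillary Clinton"}, "Tobacco": {"Lung cancer"}}
positives = [("Tobacco", "Lung cancer")]
print(sample_negatives(links, positives, k=5))
```

Sampling the same number of negatives as positives would reproduce the balanced setup reported in the experimental settings.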
Experimental settings: Training and test data
• Training data: collected by the proposed method
✓ 879 positive examples
✓ 879 negative examples
• Test data: built from relation triples in Wikidata
✓ 1,524 positive examples: Wikidata triples (e1, relation, e2)
such that the relation is “has cause” or “has effect”
✓ 1,524 negative examples: Wikidata triples with other relations
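The test-set construction can be illustrated with a short sketch. It assumes triples are given as (subject, relation label, object) tuples and that positive pairs are normalized to (cause, effect) order; whether the paper normalizes direction this way is not stated in the slides, and `label_triples` is a hypothetical name.

```python
def label_triples(triples):
    """Turn (subject, relation, object) triples into (e1, e2, label) examples.

    label 1: causal, stored as (cause, effect); label 0: any other relation.
    """
    examples = []
    for subj, relation, obj in triples:
        if relation == "has cause":
            examples.append((obj, subj, 1))   # obj is the cause of subj
        elif relation == "has effect":
            examples.append((subj, obj, 1))   # subj is the cause of obj
        else:
            examples.append((subj, obj, 0))
    return examples

# Toy triples for illustration; not claimed to be actual Wikidata statements.
triples = [
    ("Onchocerciasis", "has cause", "Onchocerca volvulus"),
    ("Adipsia", "has effect", "Hypernatremia"),
    ("Barack Obama", "spouse", "Michelle Obama"),
]
for example in label_triples(triples):
    print(example)
```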
Experimental settings: Proposed and compared methods
• Proposed method (PROP):
✓ fastText-based classifier trained on the collected training data
• SECTION:
✓ Predicts positive if e1 and e2 appear with “cause” and “effect”
• RELATED:
✓ Predicts positive if e1 and e2 are semantically related according to
a Wikipedia-link-based distance measure
• ORACLE RE:
✓ Makes an oracle prediction if e1 and e2 co-occur in a sentence; this
method can be regarded as an upper bound for sentence-level RE methods
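To make the ORACLE RE baseline concrete, here is a minimal sketch. It assumes that the baseline returns the gold label whenever both entities are mentioned in one sentence and predicts negative otherwise, and that mentions are detected by case-insensitive string matching; the fallback and the matching rule are assumptions, and `oracle_re` is a hypothetical name.

```python
def oracle_re(gold_label, sentences, e1, e2):
    """Return the gold label if some sentence mentions both e1 and e2, else 0."""
    cooccur = any(
        e1.lower() in s.lower() and e2.lower() in s.lower() for s in sentences
    )
    return gold_label if cooccur else 0

# Both entities in one sentence: the oracle answers correctly.
print(oracle_re(1, ["Onchocerca volvulus is a nematode that causes onchocerciasis."],
                "Onchocerca volvulus", "Onchocerciasis"))
# No sentence mentions both entities: a true causality is missed.
print(oracle_re(1, ["hormone therapy, which sometimes causes pain flares;"],
                "Hormone therapy", "Cancer pain"))
```

Under these assumptions the baseline never produces a false positive but misses every causality whose entities do not share a sentence, which matches the pattern of perfect precision with lower recall.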
Experimental results: Overall results
• SECTION achieved 100% precision
✓ This indicates that the training data is accurate
• ORACLE RE achieved 100% precision but lower recall
✓ This indicates that important clues also exist in sentences
other than those in which both entities co-occur
• PROP achieved the best F1 score
Experimental results: Ablation test
Adding multilingual data boosts performance
Experimental results: Analysis of output
• PROP can correctly predict causality even if the two
entities do not co-occur in a sentence
✓ Adipsia → Hypernatremia
- Adipsia may be seen in conditions such as diabetes
insipidus and may result in hypernatremia.
✓ Hormone therapy → Cancer pain
- hormone therapy, which sometimes causes pain flares;
• Incorrect outputs include instances that lack clues for
predicting causality
Discussion: Desiderata (verifiability)
• Examined 100 samples from the causalities that
PROP correctly classified
✓ 76.5% of the samples are verifiable by reading the
articles of the individual entities
✓ Ex: Onchocerca volvulus → Onchocerciasis
- Onchocerca volvulus is a nematode that causes
onchocerciasis.
Conclusion
• Proposed a weakly supervised method for
extracting causality from Wikipedia articles
✓ Extracted causalities tend to be easy to verify,
translatable to other languages, and connectable to other KBs
• The proposed method achieved precision above 98% and
recall above 64%
✓ It could even extract causalities whose cause and effect
entities did not co-occur in a sentence
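As a quick worked check of what these figures imply for F1, the sketch below plugs the stated lower bounds into the standard harmonic-mean formula. The exact precision and recall are not given in the slides, so the result is only a lower bound on F1, not the reported score.

```python
def f1(precision, recall):
    """F1 score: harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# Lower bounds taken from the slide: precision > 0.98, recall > 0.64.
print(round(f1(0.98, 0.64), 3))  # 0.774
```

Because F1 increases with both precision and recall, the actual F1 would be somewhat above this value.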
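Impressions
• The problem and experimental settings are skillfully designed
✓ The paper designs a new task that questions the current trend in
relation extraction, with reasonably sound experiments and results
• The desiderata in the introduction feel somewhat forced
✓ Should verification also be done automatically?
• A good example showing that a paper can be accepted as an
EMNLP long paper without heavy use of neural methods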
