Paper information
• Title
✓ Weakly Supervised Multilingual Causality Extraction from Wikipedia
• URL
✓ https://www.aclweb.org/anthology/D19-1296/
• Author
✓ Chikara Hashimoto (Yahoo Japan)
• Conference
✓ EMNLP 2019
Background: Causality knowledge
• Much of the world consists of entities that causally depend on each other
• Causality knowledge is essential for tasks such as why-QA, reading comprehension, and event prediction
Example: Protectionism → Trade war
Background: Causality extraction from text
• There are many existing works on causality extraction
• Many of them overlook issues that are important for constructing a causality knowledge base (CKB)
[Figure: General framework of causality (relation) extraction [Doan, '19]]
Background: Three desiderata for constructing CKB
• Verifiability: extracted causalities must be verifiable so that the CKB can sustain the credibility of its information
• Translatability: to avoid duplicating the construction effort for CKBs in different languages
• Connectivity: to connect the CKB to other KBs, boosting KB maintenance efforts in various communities
Example: Tobacco → Lung cancer: true or fake?
Proposed: Utilizing Wikipedia articles
The proposed method extracts causalities whose cause and effect entities correspond to Wikipedia articles
• We can verify causalities using the Wikipedia articles and connect them to other languages and KBs via Wikidata
Challenges: Lack of training data and descriptions for classification
• No data marks causes in Wikipedia articles, so there is nothing to train a causality classifier on
✓ Annotation is, of course, labor-intensive
• Wikipedia descriptions tend to avoid redundancy, so meaningful contexts are scattered
✓ Since most relation extraction methods handle only a single sentence, predicting causality is difficult in this situation
Proposed method: Using distant supervision and multilingual data
1. Automatically and accurately collect causality entities by utilizing properties of Wikipedia
2. Automatically collect contexts of causality entities from multiple sentences in multilingual Wikipedia
Proposed method: Causality entity extraction
First, collect causality (seed) entities, which are more likely to participate in causality than others
• Identify causality-describing sections by using predefined keywords that appear in the titles of such sections
✓ Keywords: Cause, Causes, Effect, Effects
✓ To extend to multilingual settings, translate the keywords into eight languages: de, fr, es, it, pt, sv, nl, pl
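A minimal sketch of the section-title check, assuming each article has already been parsed into sections with their outgoing links. The `article` dictionary layout and the function names are hypothetical, and only the English keywords from the slide are included, since the translated keywords are not listed here.

```python
import re

# English section-title keywords from the slide. Translated keywords for the
# other eight languages would be added to these sets.
CAUSE_KEYWORDS = {"cause", "causes"}
EFFECT_KEYWORDS = {"effect", "effects"}


def section_kinds(title):
    """Return the set of kinds ('cause', 'effect') that a section title signals."""
    words = set(re.findall(r"\w+", title.lower()))
    kinds = set()
    if words & CAUSE_KEYWORDS:
        kinds.add("cause")
    if words & EFFECT_KEYWORDS:
        kinds.add("effect")
    return kinds


def seed_entities(article):
    """Collect entities linked from causality-describing sections.

    `article` is a hypothetical structure:
    {"sections": [{"title": str, "links": [str, ...]}, ...]}
    """
    found = {"cause": set(), "effect": set()}
    for section in article["sections"]:
        for kind in section_kinds(section["title"]):
            found[kind].update(section["links"])
    return found
```

Matching on whole words rather than substrings keeps a title such as "Because of ..." from being treated as a Cause section; whether the original system matches this way is not stated on the slide.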
Proposed method: Seed causality extraction
Collect seed causalities as entity pairs (e1, e2) such that:
• e1 appears in a causality-describing section of e2's article whose title contains "Cause" or "Causes"
• e2 appears in a causality-describing section of e1's article whose title contains "Effect" or "Effects"
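The pairing rule can be sketched as follows, assuming both conditions must hold for a pair to count as a seed (the slide lists them together without saying "and" or "or"). The two input mappings are hypothetical: each maps an article title to the set of entities mentioned in its Cause(s) or Effect(s) sections.

```python
def seed_causalities(cause_sections, effect_sections):
    """Return seed pairs (e1, e2), read as e1 -> e2 (cause -> effect).

    cause_sections[e]:  entities mentioned in a Cause(s) section of article e
    effect_sections[e]: entities mentioned in an Effect(s) section of article e
    """
    pairs = set()
    for e2, mentioned in cause_sections.items():
        for e1 in mentioned:
            # e1 appears in a Cause(s) section of e2; assumption: the mirror
            # condition (e2 in an Effect(s) section of e1) is also required.
            if e2 in effect_sections.get(e1, set()):
                pairs.add((e1, e2))
    return pairs
```

Requiring both directions trades coverage for accuracy, which fits the slide's aim of collecting seeds "automatically and accurately".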
Proposed method: Seed causality context extraction
• Collect the context of each seed causality (e1, e2), keeping only contexts that are highly relevant to the target causality:
✓ Extract the context of e1 from the article of e2
✓ Extract the context of e2 from the article of e1
• Collect contexts in the other languages in the same way
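A rough sketch of the context collection, under the assumption that "context of e1 from the article of e2" means the sentences of e2's article that mention e1. The sentence splitter is deliberately naive and all function names are hypothetical; for other languages the same function would be called with the localized entity names and article texts.

```python
import re


def split_sentences(text):
    """Naive sentence splitter on ., ! or ? followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def context_of(entity, article_text):
    """Sentences of `article_text` that mention `entity` (case-insensitive)."""
    needle = entity.lower()
    return [s for s in split_sentences(article_text) if needle in s.lower()]


def seed_context(e1, e1_article, e2, e2_article):
    """Context of e1 taken from e2's article, plus context of e2 from e1's article."""
    return context_of(e1, e2_article) + context_of(e2, e1_article)
```

Because sentences are gathered from two articles (and, in the multilingual setting, from several language editions), the classifier can see clues even when the two entities never share a sentence.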
Proposed method: Learning causality classifier
• Train a binary classifier on the collected examples:
✓ Positive examples: causality entity pairs with their contexts
✓ Negative examples: entity pairs with their contexts, where one entity's article has a link to the other entity
ex: Barack Obama → Hillary Clinton
➢ These negative examples are sensible: they are not random pairs but semantically related ones
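The example construction could look roughly like this. It is a sketch under assumptions: the `links` mapping, the exclusion of reversed seed pairs, and the label names are all hypothetical. The `__label__` prefix is fastText's default marker for labels in supervised training files.

```python
import random


def negative_pairs(links, positives, k, seed=0):
    """Sample up to k linked entity pairs that are not seed causalities.

    links: {article title: set of article titles it links to}
    positives: set of (cause, effect) seed pairs
    Assumption: a pair is also excluded when its reverse is a seed pair.
    """
    blocked = set(positives) | {(b, a) for a, b in positives}
    candidates = sorted(
        (a, b)
        for a, targets in links.items()
        for b in targets
        if (a, b) not in blocked
    )
    rng = random.Random(seed)
    return rng.sample(candidates, min(k, len(candidates)))


def to_fasttext_line(label, context_sentences):
    """Format one example as a single line: '__label__<label> <text>'."""
    text = " ".join(" ".join(context_sentences).split())
    return f"__label__{label} {text}"
```

Sampling as many negatives as there are positives yields a balanced training set like the one described in the experimental settings.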
Experimental settings: Training and test data
• Training data: collected by the proposed method
✓ 879 positive examples
✓ 879 negative examples
• Test data: built from relation triples in Wikidata
✓ 1,524 positive examples: Wikidata triples (e1, relation, e2) such that the relation is “has cause” or “has effect”
✓ 1,524 negative examples: triples with other relations
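A small sketch of how such a test set might be assembled from triples. The relation labels come from the slide; the balanced sampling of negatives and the function name are assumptions, and the sketch keeps each pair in triple order without normalizing cause/effect direction.

```python
import random

CAUSAL_RELATIONS = {"has cause", "has effect"}


def build_test_set(triples, seed=0):
    """Split (e1, relation, e2) triples into positive and negative entity pairs.

    Assumption: negatives are sampled from the other relations so that the
    two classes have the same size when enough triples are available.
    """
    positives = [(e1, e2) for e1, rel, e2 in triples if rel in CAUSAL_RELATIONS]
    others = [(e1, e2) for e1, rel, e2 in triples if rel not in CAUSAL_RELATIONS]
    rng = random.Random(seed)
    negatives = rng.sample(others, min(len(positives), len(others)))
    return positives, negatives
```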
Experimental settings: Proposed and compared methods
• Proposed method (PROP):
✓ fastText-based classifier trained on the collected training data
• SECTION:
✓ Predict positive if e1 and e2 appear in causality-describing sections (“cause” and “effect”)
• RELATED:
✓ Predict positive if e1 and e2 are semantically related according to a Wikipedia-link-based distance measure
• ORACLE RE:
✓ Make the oracle (correct) prediction whenever e1 and e2 co-occur in a sentence; this can be regarded as an upper bound for sentence-level RE methods
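Two of the baselines can be sketched as simple rules. This is an illustration under assumptions rather than the paper's implementation: SECTION is read here as the seed-extraction rule applied to a test pair, and ORACLE RE is assumed to predict negative when the entities never co-occur in a sentence. The data structures are hypothetical.

```python
def predict_section(e1, e2, cause_sections, effect_sections):
    """SECTION: positive if e1 is in a Cause(s) section of e2's article and
    e2 is in an Effect(s) section of e1's article."""
    return e1 in cause_sections.get(e2, set()) and e2 in effect_sections.get(e1, set())


def predict_oracle_re(e1, e2, gold_label, sentences):
    """ORACLE RE: return the gold label if some sentence mentions both entities.

    Assumption: without a co-occurring sentence the prediction is negative.
    """
    a, b = e1.lower(), e2.lower()
    cooccur = any(a in s.lower() and b in s.lower() for s in sentences)
    return gold_label if cooccur else False
```

Under these assumptions ORACLE RE never produces a false positive, but it misses every true causality whose entities do not share a sentence, which is why it bounds sentence-level methods from above.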
Experimental results: Overall results
• SECTION achieved 100% precision
✓ This indicates the accuracy of the training data
• ORACLE RE achieved 100% precision but lower recall
✓ This indicates that important clues exist in sentences other than those in which both entities co-occur
• PROP achieved the best F1 score
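For reference, the three reported metrics can be computed as below. The function is a generic sketch; any counts passed to it in examples are toy values, not figures from the paper.

```python
def precision_recall_f1(gold, predicted):
    """Compute precision, recall and F1 for boolean gold/predicted labels."""
    tp = sum(1 for g, p in zip(gold, predicted) if g and p)
    fp = sum(1 for g, p in zip(gold, predicted) if not g and p)
    fn = sum(1 for g, p in zip(gold, predicted) if g and not p)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1
```

A method that only answers when it is sure, like SECTION or ORACLE RE, can reach perfect precision while its recall, and therefore its F1, stays low.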
Experimental results: Ablation test
Adding multilingual data boosts the performance
Experimental results: Analysis of output
• PROP can correctly predict causality even when the two entities do not co-occur in a sentence
✓ Adipsia → Hypernatremia
- Adipsia may be seen in conditions such as diabetes insipidus and may result in hypernatremia.
✓ Hormone therapy → Cancer pain
- hormone therapy, which sometimes causes pain flares;
• Incorrect outputs include instances that lack clues for predicting causality
Discussion: Desiderata (verifiability)
• Examined 100 samples from the causalities that PROP correctly classified
✓ 76.5% of the samples are verifiable by reading the corresponding articles
✓ Ex: Onchocerca volvulus → Onchocerciasis
- Onchocerca volvulus is a nematode that causes onchocerciasis.
Conclusion
• Proposed a weakly supervised method for extracting causality from Wikipedia articles
✓ Extracted causalities tend to be easy to verify, translatable to other languages, and connectable to other KBs
• The proposed method achieved precision and recall above 98% and 64%, respectively
✓ It could even extract causalities whose cause and effect entities did not co-occur in a sentence
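Impressions
• The problem and experimental settings are skillfully crafted
✓ The paper designs a new task that questions the prevailing trend in relation extraction, with reasonably sound experiments and results
• The desiderata in the introduction feel somewhat forced
✓ Shouldn't verification itself also be automated?
• A model example showing that a paper can be accepted as an EMNLP long paper without heavy use of neural networks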
