Weakly Supervised Multilingual Causality Extraction from Wikipedia

Paper information
• Title
✓ Weakly Supervised Multilingual Causality Extraction from Wikipedia
• URL
✓ https://www.aclweb.org/anthology/D19-1296/
• Author
✓ Chikara Hashimoto (Yahoo Japan)
• Conference
✓ EMNLP 2019

Background: Causality knowledge
• Much of the world consists of entities that causally depend on each other
• Understanding causality knowledge is essential for tasks such as why-QA, reading comprehension, and event prediction
✓ Ex: Protectionism → Trade war

Background: Causality extraction from text
• There are many existing works on causality extraction
• However, many of them overlook issues that are important for constructing a causality knowledge base (CKB)
Figure: General framework of causality (relation) extraction [Doan, '19]

Background: Three desiderata for constructing a CKB
• Verifiability: extracted causalities must be verifiable so that the CKB can sustain the credibility of its information
✓ Ex: Tobacco → Lung cancer: true or fake?
• Translatability: avoids duplicating the construction effort for CKBs in different languages
• Connectivity: connects the CKB to other KBs, boosting KB maintenance efforts in various communities

Proposed: utilizing Wikipedia articles
The proposed method extracts causalities whose cause and effect entities correspond to Wikipedia articles
• Causalities can be verified using the Wikipedia articles and connected to other languages and KBs via Wikidata

Challenges: lack of training data and descriptions for classification
• There is no data marking causes in Wikipedia articles that could be used to train the causality classifier
✓ Annotation is of course labor-intensive
• Wikipedia descriptions tend to avoid redundancy, so meaningful contexts are scattered
✓ Since most relation extraction methods handle only a single sentence, it is difficult for them to predict causality in this situation

Proposed method: Using distant supervision and multilingual data
1. Automatically and accurately collect causality entities by exploiting properties of Wikipedia (such as section titles)
2. Automatically collect contexts of causality entities from multiple Wikipedia sentences in multiple languages

Proposed method: Causality entity extraction
First, collect causality (seed) entities, which are more likely to participate in causality than others
• Identify causality-describing sections using predefined keywords that appear in the titles of such sections
✓ Keywords: Cause, Causes, Effect, Effects
✓ To extend to multilingual settings, translate the keywords into eight languages: de, fr, es, it, pt, sv, nl, pl
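The section-title matching described on this slide could be sketched roughly as follows. This is only an illustration, not the paper's code: the function names and the `{section title: text}` article layout are hypothetical, and only the English keywords listed on the slide are filled in, since the translated keyword lists are not given here.

```python
import re
from typing import Dict, Optional

# English keywords taken from the slide. The translated keyword lists for the
# other eight languages are not shown in the slides, so they are left out.
CAUSE_KEYWORDS = {"cause", "causes"}
EFFECT_KEYWORDS = {"effect", "effects"}


def section_kind(title: str) -> Optional[str]:
    """Return "cause", "effect", or None for a section title (whole-word match)."""
    words = set(re.findall(r"[a-z]+", title.lower()))
    if words & CAUSE_KEYWORDS:
        return "cause"
    if words & EFFECT_KEYWORDS:
        return "effect"
    return None


def causality_sections(article: Dict[str, str]) -> Dict[str, str]:
    """Map each causality-describing section title to its kind.

    `article` is a hypothetical {section title: section text} mapping.
    """
    kinds = {}
    for title in article:
        kind = section_kind(title)
        if kind is not None:
            kinds[title] = kind
    return kinds
```

Matching on whole words rather than substrings keeps a title such as "Because of X" from being treated as a cause section; whether the original method matches this way is an assumption.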

Proposed method: Seed causality extraction
Collect seed causalities as entity pairs (e1, e2) such that:
• e1 appears in a causality-describing section of e2's article whose title contains Cause or Causes
• e2 appears in a causality-describing section of e1's article whose title contains Effect or Effects
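A minimal sketch of the pairing rule above, under stated assumptions: the input layout (`linked`) and the function name are hypothetical, and the slide does not say whether a pair must satisfy both conditions or just one, so the sketch exposes both readings through a `require_both` flag.

```python
from typing import Dict, Set, Tuple

Pair = Tuple[str, str]  # (e1, e2), read as "e1 causes e2"


def seed_causalities(linked: Dict[str, Dict[str, Set[str]]],
                     require_both: bool = False) -> Set[Pair]:
    """Collect seed (cause, effect) pairs from causality-describing sections.

    `linked` is a hypothetical structure:
        {article: {"cause": entities in its Cause(s) sections,
                   "effect": entities in its Effect(s) sections}}
    """
    from_cause_sections = set()
    from_effect_sections = set()
    for article, by_kind in linked.items():
        # e1 appears in a Cause(s) section of the article of e2
        for e1 in by_kind.get("cause", ()):
            from_cause_sections.add((e1, article))
        # e2 appears in an Effect(s) section of the article of e1
        for e2 in by_kind.get("effect", ()):
            from_effect_sections.add((article, e2))
    if require_both:
        return from_cause_sections & from_effect_sections
    return from_cause_sections | from_effect_sections


# Toy input for illustration only, not data from the paper.
toy = {
    "Lung cancer": {"cause": {"Tobacco"}},
    "Tobacco": {"effect": {"Lung cancer", "Heart disease"}},
}
either = seed_causalities(toy)
both = seed_causalities(toy, require_both=True)
```

With `require_both=True` only pairs confirmed from both articles survive, which trades coverage for precision.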

Proposed method: Seed causality context extraction
• Collect the seed causality context of (e1, e2) so that only contexts highly relevant to the target causality are extracted:
✓ Extract the context of e1 from the article of e2
✓ Extract the context of e2 from the article of e1
• Collect contexts in other languages in the same way
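The two-way context collection could look something like the sketch below. The slides do not define what a "context" is, so this assumes it is the set of sentences that mention the other entity by name; the sentence splitter, the function names, and the `{title: text}` layout are all hypothetical simplifications.

```python
import re
from typing import Dict, List


def split_sentences(text: str) -> List[str]:
    """Naive splitter on ., !, ? followed by whitespace (an assumption)."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def mention_context(article_text: str, entity: str) -> List[str]:
    """Sentences of `article_text` that mention `entity` as a whole word."""
    pattern = re.compile(r"\b" + re.escape(entity) + r"\b", re.IGNORECASE)
    return [s for s in split_sentences(article_text) if pattern.search(s)]


def seed_context(articles: Dict[str, str], e1: str, e2: str) -> List[str]:
    """Context of e1 from the article of e2, then context of e2 from the article of e1.

    `articles` is a hypothetical {article title: plain text} mapping for one
    language; calling this once per language would give the multilingual contexts.
    """
    return (mention_context(articles.get(e2, ""), e1)
            + mention_context(articles.get(e1, ""), e2))


# Toy articles for illustration only.
toy_articles = {
    "Rain": "Rain is liquid water. Heavy rain can lead to a flood.",
    "Flood": "A flood is an overflow of water. It often follows rain.",
}
context = seed_context(toy_articles, "Rain", "Flood")
```

Because the context is gathered from both articles, the classifier can see evidence from several sentences rather than a single one.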

Proposed method: Learning causality classifier
• Develop a binary classifier using the collected examples:
✓ Positive examples: causality entity pairs with their contexts
✓ Negative examples: entity pairs with their contexts, such that one entity (article) has a link to the other entity
- Ex: Barack Obama → Hillary Clinton
➢ These negative examples are sensible: they are not random pairs but semantically related ones
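As a rough sketch of how such examples might be assembled: negatives are drawn from article links, and each example is written as one line in fastText's supervised format (a `__label__` prefix followed by text). The helper names, the label strings, the way entities and context are concatenated, and the exclusion of known seed pairs from the negatives are all assumptions not stated in the slides.

```python
from typing import Dict, Iterable, List, Set, Tuple

Pair = Tuple[str, str]


def negative_candidates(links: Dict[str, Set[str]],
                        positives: Iterable[Pair]) -> List[Pair]:
    """Linked article pairs that are not known seed causalities.

    `links` is a hypothetical {article: set of linked articles} mapping.
    Dropping pairs that appear (in either direction) among the positives is
    an assumption made here to keep the two classes disjoint.
    """
    known = set(positives)
    result = []
    for src in sorted(links):
        for dst in sorted(links[src]):
            if (src, dst) not in known and (dst, src) not in known:
                result.append((src, dst))
    return result


def to_fasttext_line(label: str, e1: str, e2: str, context: List[str]) -> str:
    """Format one example as a single fastText supervised-training line."""
    text = " ".join([e1, e2, *context])
    text = " ".join(text.split())  # collapse newlines and repeated spaces
    return f"__label__{label} {text}"


# Toy data for illustration only.
toy_links = {"Barack Obama": {"Hillary Clinton"}, "Tobacco": {"Lung cancer"}}
negatives = negative_candidates(toy_links, [("Tobacco", "Lung cancer")])
line = to_fasttext_line("causal", "Tobacco", "Lung cancer",
                        ["Smoking  is a\nrisk factor."])
```

Lines produced this way can be written to a text file, one example per line, and used as training input for a supervised fastText model.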

Experimental settings: Training and test data
• Training data: collected by the proposed method
✓ 879 positive examples
✓ 879 negative examples
• Test data: built from relation triples in Wikidata
✓ 1,524 positive examples: Wikidata triples (e1, relation, e2) such that relation is “has cause” or “has effect”
✓ 1,524 negative examples: Wikidata triples with other relations
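The test-set split by relation label can be illustrated with a short sketch. The function name and the plain `(e1, relation, e2)` tuple layout are hypothetical, and how the negatives were selected down to 1,524 is not stated in the slides, so no sampling step is shown.

```python
from typing import Iterable, List, Tuple

Triple = Tuple[str, str, str]  # (e1, relation label, e2)

# Relation labels named on the slide.
CAUSAL_RELATIONS = {"has cause", "has effect"}


def split_test_triples(triples: Iterable[Triple]) -> Tuple[List[Triple], List[Triple]]:
    """Split triples into (positives, negatives) by relation label."""
    positives, negatives = [], []
    for triple in triples:
        relation = triple[1]
        if relation in CAUSAL_RELATIONS:
            positives.append(triple)
        else:
            negatives.append(triple)
    return positives, negatives


# Toy triples for illustration only.
toy_triples = [
    ("Lung cancer", "has cause", "Tobacco"),
    ("Tobacco", "has effect", "Lung cancer"),
    ("Barack Obama", "spouse", "Michelle Obama"),
]
pos, neg = split_test_triples(toy_triples)
```

Note that "has cause" and "has effect" point in opposite directions, so a real pipeline would likely need to normalize them to a single (cause, effect) order before evaluation.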

Experimental settings: Proposed and compared methods
• Proposed method (PROP):
✓ fastText-based classifier trained on the collected training data
• SECTION:
✓ Predicts positive if e1 and e2 appear with “cause” and “effect”
• RELATED:
✓ Predicts positive if e1 and e2 are semantically related according to a Wikipedia-link-based distance measure
• ORACLE RE:
✓ Makes an oracle prediction if e1 and e2 co-occur in a sentence; this method can be regarded as the upper bound of RE methods
Experimental results: Overall results
• SECTION achieved 100% precision
✓ This indicates the accuracy of the training data
• ORACLE RE achieved 100% precision but lower recall
✓ This indicates that further important clues exist in other sentences
• PROP achieved the best F1 score
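For reference, precision, recall, and F1 over binary predictions can be computed as in the sketch below. The function name is hypothetical and the toy labels are made up; they only illustrate how a method can reach 100% precision while still having low recall.

```python
from typing import Sequence, Tuple


def precision_recall_f1(gold: Sequence[bool],
                        pred: Sequence[bool]) -> Tuple[float, float, float]:
    """Precision, recall, and F1 for the positive (causal) class."""
    tp = sum(1 for g, p in zip(gold, pred) if g and p)
    fp = sum(1 for g, p in zip(gold, pred) if not g and p)
    fn = sum(1 for g, p in zip(gold, pred) if g and not p)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0.0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Toy labels: every positive prediction is right, but two causal pairs are missed.
toy_gold = [True, True, True, False]
toy_pred = [True, False, False, False]
p, r, f1 = precision_recall_f1(toy_gold, toy_pred)
```

Since F1 is the harmonic mean of precision and recall, a method with perfect precision and low recall can still score below a method that balances the two.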

Experimental results: Ablation test
Adding multilingual data boosts the performance

Experimental results: Analysis of output
• PROP can correctly predict causality even if the two entities do not co-occur in a sentence
✓ Adipsia → Hypernatremia
- Adipsia may be seen in conditions such as diabetes insipidus and may result in hypernatremia.
✓ Hormone therapy → Cancer pain
- hormone therapy, which sometimes causes pain flares;
• Wrong outputs include instances that lack clues for predicting causality

Discussion: Desiderata (verifiability)
• Examined 100 samples from the causalities that PROP correctly classified
✓ 76.5% of the samples are verifiable by reading their individual articles
✓ Ex: Onchocerca volvulus → Onchocerciasis
- Onchocerca volvulus is a nematode that causes onchocerciasis.

Conclusion
• Proposed a weakly supervised method for extracting causality from Wikipedia articles
✓ Extracted causalities tend to be easy to verify, translatable to other languages, and connectable to other KBs
• The proposed method achieved precision and recall above 98% and 64%, respectively
✓ It could even extract causalities whose cause and effect entities did not co-occur in a sentence

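Impressions
• The problem formulation and the experimental setup are skillfully designed
✓ Designs a new task that questions the prevailing trend in relation extraction, with reasonably sound experiments and results
• The desiderata in the introduction feel somewhat forced
✓ Should verification also be automated?
• A good example showing that a paper can be accepted as an EMNLP long paper without heavy use of neural methods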
