Weakly Supervised Multilingual Causality Extraction from Wikipedia
1. Paper information
• Title
✓ Weakly Supervised Multilingual Causality Extraction from Wikipedia
• URL
✓ https://www.aclweb.org/anthology/D19-1296/
• Author
✓ Chikara Hashimoto (Yahoo Japan)
• Conference
✓ EMNLP 2019
2. Background: Causality knowledge
• Much of the world consists of entities that causally
depend on each other
• Understanding causality knowledge is essential for
tasks such as why-QA, reading comprehension, and
event prediction
Protectionism → Trade war
3. Background:
Causality extraction from text
• Many works exist on causality extraction
• However, many of them overlook issues that are important
for constructing a causality knowledge base (CKB)
General framework of causality (relation) extraction [Doan, '19]
4. Background:
Three desiderata for constructing CKB
• Verifiability: extracted causalities must be verifiable so that
the CKB can sustain the credibility of its information
• Translatability: to avoid duplicating the construction
effort for CKBs in different languages
• Connectivity: to connect the CKB to other KBs, boosting
KB-maintenance efforts in various communities
Tobacco → Lung cancer: true or false? 🤔
5. Proposed: utilizing Wikipedia articles
The proposed method extracts causalities using cause and
effect entities that correspond to Wikipedia articles
• We can verify causalities using the Wikipedia articles and
connect them to other languages and KBs via Wikidata
6. Challenges: lack of training data and
descriptions for classification
• There is no data marking causes in Wikipedia articles
for training a causality classifier
✓ Annotation is, of course, labor-intensive
• Wikipedia descriptions tend to avoid redundancy,
so meaningful contexts are scattered across sentences
✓ Since most relation extraction methods handle only a single
sentence, it is difficult to predict causality in this situation
7. Proposed method: Using distant
supervision and multilingual data
1. Automatically and accurately collect causality
entities by exploiting the structure of Wikipedia
2. Automatically collect contexts of causality entities
from multiple, multilingual Wikipedia sentences
8. Proposed method:
Causality entity extraction
• Identify causality-describing sections using predefined
keywords that appear in the titles of such sections
✓ Keywords: Cause, Causes, Effect, Effects
✓ To extend to multilingual settings, translate the keywords into
eight languages: de, fr, es, it, pt, sv, nl, pl
First, collect causality (seed) entities that are
more likely to participate in causality than others
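The keyword matching over section titles could be sketched as follows (a minimal illustration with the four English keywords only; the exact matching logic and multilingual keyword lists are assumptions, not the paper's implementation):

```python
# Illustrative sketch: classify a Wikipedia section title as
# cause-describing, effect-describing, or neither.
CAUSE_KEYWORDS = {"cause", "causes"}
EFFECT_KEYWORDS = {"effect", "effects"}

def section_type(title):
    """Return 'cause', 'effect', or None for a section title."""
    words = set(title.lower().split())
    if words & CAUSE_KEYWORDS:
        return "cause"
    if words & EFFECT_KEYWORDS:
        return "effect"
    return None
```

In a multilingual setting, the same function would simply be run with translated keyword sets per language.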
9. Proposed method:
Seed causality extraction
Collect seed causalities as entity pairs (e1, e2) such that:
• e1 appears in a causality-describing section of e2,
whose title contains Cause or Causes
• e2 appears in a causality-describing section of e1,
whose title contains Effect or Effects
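The two collection rules above can be sketched as a small function (the article data structure below is a hypothetical simplification of the parsed Wikipedia dump):

```python
def seed_pairs(articles):
    """Collect seed causality pairs (e1, e2), where e1 causes e2.

    articles: {title: {"cause_sections": [entities linked from the
    article's 'Causes' section], "effect_sections": [entities linked
    from its 'Effects' section]}} -- an assumed representation.
    """
    pairs = set()
    for e2, art in articles.items():
        for e1 in art.get("cause_sections", []):
            pairs.add((e1, e2))  # e1 appears in e2's Causes section
    for e1, art in articles.items():
        for e2 in art.get("effect_sections", []):
            pairs.add((e1, e2))  # e2 appears in e1's Effects section
    return pairs
```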
10. Proposed method:
Seed causality context extraction
• Collect the seed causality context of (e1, e2) to extract
only highly relevant contexts for a target causality:
✓ Extract the context of e1 from the article of e2
✓ Extract the context of e2 from the article of e1
• Collect contexts in other languages in the same way
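The symmetric context-extraction rule could look like this (a sketch assuming pre-sentence-split articles and simple substring matching; the paper's mention detection is likely more careful, e.g. via links):

```python
def collect_contexts(e1, e2, sentences_by_article):
    """Gather sentences mentioning e1 in e2's article, and e2 in e1's."""
    contexts = []
    for mention, article in ((e1, e2), (e2, e1)):
        for sent in sentences_by_article.get(article, []):
            if mention.lower() in sent.lower():  # naive mention match
                contexts.append(sent)
    return contexts
```

Running the same loop over each language's Wikipedia yields the multilingual contexts.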
11. Proposed method:
Learning causality classifier
• Develop a binary classifier using the collected examples:
✓ Positive examples: causality entity pairs with their contexts
✓ Negative examples: entity pairs with their contexts, such that
one entity's article has a link to the other entity
ex: Barack Obama → Hillary Clinton
➢ These negative examples are sensible: they are not random
pairs but semantically related ones
12. Experimental settings:
Training and test data
• Training data: collected by proposed method
✓ 879 positive examples
✓ 879 negative examples
• Test data: built from relation triples in Wikidata
✓ 1,524 positive examples: Wikidata triples (e1, relation, e2)
such that the relation is "has cause" or "has effect"
✓ 1,524 negative examples: triples with other relations
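Deriving test labels from Wikidata triples could be sketched as below (using the Wikidata property IDs P828 "has cause" and P1542 "has effect"; whether the paper filtered by these exact IDs is an assumption):

```python
# Wikidata property IDs for causal relations.
CAUSAL_PROPERTIES = {"P828", "P1542"}  # "has cause", "has effect"

def split_test_pairs(triples):
    """Label Wikidata triples (e1, property, e2): causal relations
    become positive test pairs, any other relation becomes negative."""
    positives, negatives = [], []
    for e1, prop, e2 in triples:
        (positives if prop in CAUSAL_PROPERTIES else negatives).append((e1, e2))
    return positives, negatives
```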
13. Experimental settings:
Proposed and compared methods
• Proposed method (PROP):
✓ fastText-based classifier using the collected training data
• SECTION:
✓ Predicts positive if e1 and e2 appear with "cause" and "effect"
• RELATED:
✓ Predicts positive if e1 and e2 are semantically related, using a
Wikipedia-link-based distance measure
• ORACLE RE:
✓ Makes an oracle prediction if e1 and e2 co-occur in a sentence;
this can be regarded as an upper bound for RE methods
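Since PROP is fastText-based, the training data would be serialized in fastText's supervised input format, one `__label__`-prefixed example per line (a sketch of the serialization only; the paper's exact preprocessing and features are not shown):

```python
def to_fasttext_line(context, label):
    """Serialize one example in fastText's supervised format:
    '__label__<y> <text>', one example per line."""
    text = " ".join(context.split())  # collapse whitespace and newlines
    return "__label__{} {}".format(label, text)
```

A file of such lines can then be fed to `fasttext supervised -input train.txt`.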
14. Experimental results: Overall results
• SECTION achieved 100% precision
✓ This indicates the accuracy of the training data
• ORACLE RE achieved 100% precision but lower recall
✓ This indicates that important clues exist in other sentences
• PROP achieved the best F1 score
16. Experimental results: Analysis of output
• PROP can correctly predict causality even when the two
entities do not co-occur in a sentence
✓ Adipsia → Hypernatremia
- "Adipsia may be seen in conditions such as diabetes
insipidus and may result in hypernatremia."
✓ Hormone therapy → Cancer pain
- "hormone therapy, which sometimes causes pain flares;"
• Wrong outputs include instances that lack clues for
predicting causality
17. Discussion: Desiderata (verifiability)
• Examined 100 samples from the causalities that
PROP correctly classified
✓ 76.5% of the samples were verifiable by reading the
corresponding articles
✓ Ex: Onchocerca volvulus → Onchocerciasis
- "Onchocerca volvulus is a nematode that causes
onchocerciasis."
18. Conclusion
• Proposed a weakly supervised method for
extracting causalities from Wikipedia articles
✓ Extracted causalities tend to be easy to verify,
translatable to other languages, and connectable to other KBs
• The proposed method achieved precision and recall
above 98% and 64%, respectively
✓ It could even extract causalities whose cause and effect
entities did not co-occur in a sentence
19. Impressions
• The problem and experimental setup are skillfully crafted
✓ Designs a new task that questions the existing trend in
relation extraction, with reasonably sound experiments and results
• The desiderata in the introduction feel somewhat forced
✓ Shouldn't verification itself also be automated?
• A good example showing that an EMNLP long paper can be
accepted without heavy use of neural methods