Modeling Missing Data in Distant Supervision for Information Extraction 
Alan Ritter (CMU) 
Luke Zettlemoyer (University of Washington)
Mausam (University of Washington)
Oren Etzioni (Vulcan Inc.)
TACL, 1, 367-378, 2013.
Presented by Naoaki Okazaki (Tohoku University)
2014-09-05 Modeling Missing Data in Distant Supervision 
Relation instance extraction
Input: "Steven Spielberg's film Saving Private Ryan is loosely based on the brothers' story."
Extractor output (film-director relation): Film = Saving Private Ryan, Director = Steven Spielberg
• Fully-supervised learning (Zhou+ 05, …)
  • Uses ACE corpora to build relation-instance classifiers
  • Suffers from the limited amount of training data
• Unsupervised information extraction (Banko+ 07, …)
  • Extracts relational patterns between entities, and clusters the patterns into relations
  • Difficult to map clusters into relations of interest
• Bootstrap learning (Brin 98, …)
  • Uses seed instances to extract a new set of relational patterns
  • Often suffers from low precision (semantic drift)
• Distant supervision (Mintz+ 09, …)
  • Combines the advantages of the above approaches
Distant supervision (Mintz+, 09)
Knowledge base (Person, Birthplace): (Edwin Hubble, Marshfield), …
Automatic annotation: "Astronomer Edwin Hubble was born in Marshfield, Missouri." is labeled with the relation that the knowledge base holds for (Edwin Hubble, Marshfield).
Feature extraction: each row of the feature table presents a single feature. Features from different sentences containing the same entity pair are concatenated.
Mintz et al. (2009) Distant supervision for relation extraction without labeled data. ACL-2009, pages 1003–1011.
Problem: each entity pair receives a single relation label, so a pair cannot have multiple relations
E.g., Founded(Jobs, Apple) and CEO-of(Jobs, Apple) are both true.
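The automatic annotation step can be sketched as follows. This is a minimal illustration, assuming exact string matching of entity names; the function name `annotate` and the toy knowledge base are hypothetical, and practical systems rely on entity recognition and richer features.

```python
from typing import Dict, List, Tuple

# Toy knowledge base: (entity1, entity2) -> relation. Hypothetical data.
KB: Dict[Tuple[str, str], str] = {
    ("Edwin Hubble", "Marshfield"): "birthplace",
}

def annotate(sentences: List[str],
             kb: Dict[Tuple[str, str], str]) -> List[Tuple[str, Tuple[str, str], str]]:
    """Label each sentence mentioning both entities of a KB pair with that pair's relation."""
    labeled = []
    for sentence in sentences:
        for (e1, e2), relation in kb.items():
            # Assumption: plain substring match stands in for entity linking.
            if e1 in sentence and e2 in sentence:
                labeled.append((sentence, (e1, e2), relation))
    return labeled

sentences = [
    "Astronomer Edwin Hubble was born in Marshfield, Missouri.",
    "Edwin Hubble studied at the University of Chicago.",
]
labeled = annotate(sentences, KB)
for sentence, pair, relation in labeled:
    print(relation, pair, sentence)
```

Because this toy knowledge base maps each entity pair to exactly one relation, it also shows the limitation noted above: a pair such as (Jobs, Apple) could not carry both Founded and CEO-of.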
MultiR (Hoffmann+, 11)
Introduces latent variables ($z_i$) to indicate the relation expressed by sentence $x_i$
For the entity pair (Steve Jobs, Apple):
$x_1$: Steve Jobs was founder of Apple. ($z_1$ = founder)
$x_2$: Steve Jobs, Steve Wozniak and Ronald Wayne founded Apple. ($z_2$ = founder)
$x_3$: Steve Jobs is CEO of Apple. ($z_3$ = CEO-of)
$\mathbf{y}$: $y_{\text{born-in}} = 0$, $y_{\text{founder}} = 1$, $y_{\text{CEO-of}} = 1$, $y_{\text{capital-of}} = 0$
$$p(\mathbf{y}, \mathbf{z} \mid \mathbf{x}) = \frac{1}{Z_x} \prod_r \Phi^{\text{join}}(y_r, \mathbf{z}) \prod_i \Phi^{\text{extract}}(z_i, x_i)$$
$x_i$: a sentence containing the entity pair
$y_r \in \{0, 1\}$: 1 if the knowledge base includes the pair with relation $r$, 0 otherwise
$z_i \in R$: the relation expressed by sentence $x_i$
$$\Phi^{\text{extract}}(z_i, x_i) = \exp\Big(\sum_j \theta_j \phi_j(z_i, x_i)\Big)$$
(The same as Mintz+ 09)
$$\Phi^{\text{join}}(y_r, \mathbf{z}) = \mathbf{1}(\neg y_r \vee \exists i: r = z_i)$$
(Deterministic OR)
$\Phi^{\text{join}}$ ensures that a sentence $x_i$ expressing the relation $r$ exists if $r$ is true
Allows multiple relations for the same entity pair
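The two factors can be made concrete with a small sketch of the unnormalized joint score, i.e. the product of factors without the $1/Z_x$ term. The values in `theta_phi` (standing for $\sum_j \theta_j \phi_j(z_i, x_i)$) are hypothetical, and the function names are chosen for illustration only.

```python
import math
from typing import Dict, List

def phi_join(y_r: int, r: str, z: List[str]) -> int:
    """Indicator 1(not y_r OR some sentence expresses r), as written on the slide."""
    return 1 if (y_r == 0 or r in z) else 0

def unnormalized_score(y: Dict[str, int], z: List[str],
                       theta_phi: List[Dict[str, float]]) -> float:
    """Product of all join factors and all extract factors (no 1/Z_x)."""
    for r, y_r in y.items():
        if phi_join(y_r, r, z) == 0:
            return 0.0  # a KB fact has no supporting sentence
    # Phi_extract = exp(theta . phi), so the product is exp of the sum.
    return math.exp(sum(theta_phi[i][z_i] for i, z_i in enumerate(z)))

y = {"born-in": 0, "founder": 1, "CEO-of": 1, "capital-of": 0}
# Hypothetical log-scores theta . phi(z_i, x_i) for three sentences.
theta_phi = [
    {"born-in": -1.0, "founder": 2.0, "CEO-of": 0.5, "capital-of": -2.0},
    {"born-in": -1.0, "founder": 1.5, "CEO-of": 0.2, "capital-of": -2.0},
    {"born-in": -1.0, "founder": 0.3, "CEO-of": 1.0, "capital-of": -2.0},
]
consistent = unnormalized_score(y, ["founder", "founder", "CEO-of"], theta_phi)
violating = unnormalized_score(y, ["founder", "founder", "founder"], theta_phi)
print(consistent, violating)
```

The second assignment scores zero because the KB fact CEO-of is true but no sentence is labeled with it.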
MultiR: Training 
Hoffmann et al. (2011) Knowledge-Based Weak Supervision for Information Extraction of Overlapping Relations. ACL-2011, pages 541–550. 
Loop for passes over the training data
    Loop for entity pairs in the KB
        1. Predict sentence-level and KB-level relations (ignoring the facts in the KB)
        2. Find an optimal assignment of sentence-level relations consistent with the facts in the KB
        3. Update feature weights similarly to the perceptron algorithm
We need two kinds of inference (steps 1 and 2)
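The training loop above can be sketched roughly as below. This is a simplified illustration, not the authors' implementation: sentences are reduced to lists of hypothetical string features, the constrained inference is a greedy stand-in, and every entity pair is assumed to have at least one KB fact.

```python
from collections import defaultdict
from typing import List

def score(w, feats: List[str], r: str) -> float:
    return sum(w[(f, r)] for f in feats)

def predict(w, sents, relations):
    """Inference 1: best relation per sentence, ignoring the KB."""
    return [max(relations, key=lambda r: score(w, feats, r)) for feats in sents]

def constrained(w, sents, facts):
    """Inference 2 (greedy stand-in): each KB fact takes its best free sentence,
    then the remaining sentences take their best-scoring KB fact."""
    z = [None] * len(sents)
    for r in sorted(facts):
        free = [i for i in range(len(sents)) if z[i] is None]
        if free:
            z[max(free, key=lambda i: score(w, sents[i], r))] = r
    for i in range(len(sents)):
        if z[i] is None:
            z[i] = max(sorted(facts), key=lambda r: score(w, sents[i], r))
    return z

def train(data, relations, passes=5):
    w = defaultdict(float)  # (feature, relation) -> weight
    for _ in range(passes):
        for sents, facts in data:
            z_pred = predict(w, sents, relations)
            if set(z_pred) != facts:  # predicted KB-level facts disagree with the KB
                z_gold = constrained(w, sents, facts)
                for feats, zp, zg in zip(sents, z_pred, z_gold):
                    for f in feats:  # perceptron-style update
                        w[(f, zg)] += 1.0
                        w[(f, zp)] -= 1.0
    return w

relations = ["born-in", "founder", "CEO-of"]
data = [  # hypothetical features per sentence, and the KB facts of each pair
    ([["founder_of"], ["founded"], ["is_CEO_of"]], {"founder", "CEO-of"}),
    ([["founder_of"]], {"founder"}),
]
w = train(data, relations)
print(predict(w, data[0][0], relations))
```

With a single entity pair the learner can settle on a wrong sentence-level labeling that still matches the KB; the second pair in the toy data is what disambiguates the `founder_of` feature.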
MultiR: Inference 1: $\arg\max_{\mathbf{y},\mathbf{z}} p(\mathbf{y}, \mathbf{z} \mid \mathbf{x})$
For the entity pair (Steve Jobs, Apple), all of $\mathbf{y}$ and $\mathbf{z}$ are unknown:
$x_1$: Steve Jobs was founder of Apple.
$x_2$: Steve Jobs, Steve Wozniak and Ronald Wayne founded Apple.
$x_3$: Steve Jobs is CEO of Apple.
Extraction scores for each sentence and relation:
         born-in   founder   CEO-of   capital-of
$x_1$    0.5       16.0      9.0      0.1
$x_2$    8.0       11.0      6.0      0.1
$x_3$    7.0       8.0       7.0      0.2
1. Predict a relation label for each sentence independently
2. Aggregate sentence-level predictions into global-level predictions
MultiR: Inference 1: $\arg\max_{\mathbf{y},\mathbf{z}} p(\mathbf{y}, \mathbf{z} \mid \mathbf{x})$
Result for the entity pair (Steve Jobs, Apple):
$z_1$ = founder (16.0), $z_2$ = founder (11.0), $z_3$ = founder (8.0)
$y_{\text{born-in}} = 0$, $y_{\text{founder}} = 1$, $y_{\text{CEO-of}} = 0$, $y_{\text{capital-of}} = 0$
1. Predict a relation label for each sentence independently
2. Aggregate sentence-level predictions into global-level predictions
Very easy to find!
Computational cost: $O(|R| \cdot |\mathbf{x}|)$
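The two steps can be reproduced on the example scores with a few lines. This sketch assumes the twelve numbers on the slide are read as one row per sentence and one column per relation; the function name is hypothetical.

```python
RELATIONS = ["born-in", "founder", "CEO-of", "capital-of"]
# Rows: sentences x1..x3; columns follow RELATIONS (scores from the slide).
SCORES = [
    [0.5, 16.0, 9.0, 0.1],
    [8.0, 11.0, 6.0, 0.1],
    [7.0, 8.0, 7.0, 0.2],
]

def infer_unconstrained(scores, relations):
    # Step 1: pick the best relation for each sentence independently.
    z = [relations[max(range(len(relations)), key=row.__getitem__)] for row in scores]
    # Step 2: a KB-level fact is predicted if any sentence expresses it.
    y = {r: int(r in z) for r in relations}
    return y, z

y, z = infer_unconstrained(SCORES, RELATIONS)
print(z)  # ['founder', 'founder', 'founder']
print(y)  # {'born-in': 0, 'founder': 1, 'CEO-of': 0, 'capital-of': 0}
```

Each sentence scans every relation once, which matches the stated cost of one pass over $|R| \cdot |\mathbf{x}|$ scores.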
MultiR: Inference 2: $\arg\max_{\mathbf{z}} p(\mathbf{z} \mid \mathbf{x}, \mathbf{y})$
For the entity pair (Steve Jobs, Apple), $\mathbf{y}$ is given by the KB ($y_{\text{born-in}} = 0$, $y_{\text{founder}} = 1$, $y_{\text{CEO-of}} = 1$, $y_{\text{capital-of}} = 0$) and $\mathbf{z}$ is unknown:
$x_1$: Steve Jobs was founder of Apple.
$x_2$: Steve Jobs, Steve Wozniak and Ronald Wayne founded Apple.
$x_3$: Steve Jobs is CEO of Apple.
Edge weights between each node $y_r$ and each node $z_i$:
              $z_1$    $z_2$    $z_3$
born-in       0.5      8.0      7.0
founder       16.0     11.0     8.0
CEO-of        9.0      6.0      7.0
capital-of    0.1      0.1      0.2
Define an edge weight: $w(y_r, z_i) = \Phi^{\text{extract}}(r, x_i)$
A node with $y_r = 1$ must have at least one edge connecting to some $z_i$
Each node $z_i$ must have exactly one edge connecting to some $y_r$
Find the set of edges that maximizes the sum of weights
MultiR: Inference 2: $\arg\max_{\mathbf{z}} p(\mathbf{z} \mid \mathbf{x}, \mathbf{y})$
Solution for the entity pair (Steve Jobs, Apple), with $y_{\text{born-in}} = 0$, $y_{\text{founder}} = 1$, $y_{\text{CEO-of}} = 1$, $y_{\text{capital-of}} = 0$:
Only edges from nodes with $y_r = 1$ remain: founder (16, 11, 8) and CEO-of (9, 6, 7) for ($z_1$, $z_2$, $z_3$)
$z_1$ = founder (16), $z_2$ = founder (11), $z_3$ = CEO-of (7)
Define an edge weight: $w(y_r, z_i) = \Phi^{\text{extract}}(r, x_i)$
A node with $y_r = 1$ must have at least one edge connecting to some $z_i$
Each node $z_i$ must have exactly one edge connecting to some $y_r$
Find the set of edges that maximizes the sum of weights
Exact solution in polynomial time
In practice, an approximate solution by greedy search (assigning a $z_i$ to each node with $y_r = 1$) is sufficient
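The constrained assignment can be checked on the example with the sketch below. It assumes the scores are read as one row per sentence, uses brute-force enumeration (exponential, for verification on this toy case only, not the polynomial-time exact method mentioned above), and a simple greedy variant whose details are an assumption.

```python
from itertools import product

RELATIONS = ["born-in", "founder", "CEO-of", "capital-of"]
SCORES = [  # rows: sentences x1..x3; columns follow RELATIONS
    [0.5, 16.0, 9.0, 0.1],
    [8.0, 11.0, 6.0, 0.1],
    [7.0, 8.0, 7.0, 0.2],
]
Y = {"born-in": 0, "founder": 1, "CEO-of": 1, "capital-of": 0}

def total(z, scores, relations):
    return sum(scores[i][relations.index(r)] for i, r in enumerate(z))

def infer_exact(scores, relations, y):
    """Enumerate all assignments over true relations that cover every true relation."""
    active = [r for r in relations if y[r] == 1]
    feasible = [z for z in product(active, repeat=len(scores)) if set(z) == set(active)]
    return max(feasible, key=lambda z: total(z, scores, relations))

def infer_greedy(scores, relations, y):
    """Give each true relation its best free sentence, then fill the rest."""
    active = [r for r in relations if y[r] == 1]
    z = [None] * len(scores)
    for r in active:
        free = [i for i in range(len(scores)) if z[i] is None]
        if not free:
            break
        col = relations.index(r)
        z[max(free, key=lambda i: scores[i][col])] = r
    for i in range(len(scores)):
        if z[i] is None:
            z[i] = max(active, key=lambda r: scores[i][relations.index(r)])
    return tuple(z)

z_exact = infer_exact(SCORES, RELATIONS, Y)
z_greedy = infer_greedy(SCORES, RELATIONS, Y)
print(z_exact, total(z_exact, SCORES, RELATIONS))
print(z_greedy, total(z_greedy, SCORES, RELATIONS))
```

On this example both procedures select founder, founder, CEO-of with a total weight of 34, which agrees with the solution on the slide.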
Contribution of this work
• MultiR makes two assumptions (hard constraints):
  • If a fact is not found in the database, it cannot be mentioned in the text
  • If a fact is in the database, it must be mentioned in at least one sentence
• Relax MultiR to handle the situations where:
  • A fact in the database is not mentioned in the text (MIT: missing in text)
  • A fact mentioned in the text is missing in the database (MID: missing in database)
• Side effect of this relaxation
  • Incorporates the tendency that the knowledge base is likely to include popular entities and relations
Distant Supervision with Data Not Missing at Random (DNMAR)
For the entity pair (Steve Jobs, Apple):
$x_1$: Steve Jobs was founder of Apple. ($z_1$ = founder)
$x_2$: Steve Jobs, Steve Wozniak and Ronald Wayne founded Apple. ($z_2$ = founder)
$x_3$: Steve Jobs visited Apple store… ($z_3$ = visit)
Relation    $t_r$ (mentioned in text)   $y_r$ (in KB)
born-in     0                           0
founder     1                           1
CEO-of      0                           1
visit       1                           0
Introduce a layer of latent variables ($t_r$) to handle missing cases
Introduce a new factor:
$$\phi^{\text{miss}}(y_r, t_r) = \begin{cases} -\alpha_{\text{MIT}} & \text{if } y_r = 1 \wedge t_r = 0 \text{ (missing in text)} \\ -\alpha_{\text{MID}} & \text{if } y_r = 0 \wedge t_r = 1 \text{ (missing in DB)} \\ 0 & \text{otherwise} \end{cases}$$
Relaxing the two hard constraints in MultiR into soft ones with penalty factors $-\alpha_{\text{MIT}}$ and $-\alpha_{\text{MID}}$
Training algorithm is the same as the one used in MultiR
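The missing-data factor is small enough to write out directly. In this sketch the values of `ALPHA_MIT` and `ALPHA_MID` are hypothetical, and the $y$ and $t$ vectors are read from the figure for (Steve Jobs, Apple) assuming the relation order born-in, founder, CEO-of, visit.

```python
def phi_miss(y_r: int, t_r: int, alpha_mit: float, alpha_mid: float) -> float:
    """Soft penalty for disagreement between the KB (y_r) and the text (t_r)."""
    if y_r == 1 and t_r == 0:
        return -alpha_mit  # fact in the KB but missing in text
    if y_r == 0 and t_r == 1:
        return -alpha_mid  # fact in text but missing in the KB
    return 0.0

# y: facts in the KB; t: facts expressed by at least one sentence.
y = {"born-in": 0, "founder": 1, "CEO-of": 1, "visit": 0}
t = {"born-in": 0, "founder": 1, "CEO-of": 0, "visit": 1}

ALPHA_MIT, ALPHA_MID = 2.0, 5.0  # hypothetical penalty magnitudes
penalty = sum(phi_miss(y[r], t[r], ALPHA_MIT, ALPHA_MID) for r in y)
print(penalty)  # -7.0: CEO-of is missing in text, visit is missing in the KB
```

Under MultiR's hard constraints this configuration would be ruled out entirely; here it is merely penalized.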
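Constrained inference: $\arg\max_{\mathbf{z}} p(\mathbf{z} \mid \mathbf{x}, \mathbf{y})$
For the entity pair (Steve Jobs, Apple), $\mathbf{y}$ is given ($y_{\text{born-in}} = 0$, $y_{\text{founder}} = 1$, $y_{\text{CEO-of}} = 1$, $y_{\text{visit}} = 0$), while $\mathbf{z}$ and $\mathbf{t}$ are unknown:
$x_1$: Steve Jobs was founder of Apple.
$x_2$: Steve Jobs, Steve Wozniak and Ronald Wayne founded Apple.
$x_3$: Steve Jobs visited Apple store…
$$\mathbf{z}^* = \arg\max_{\mathbf{z}} \sum_{i=1}^{n} \theta \cdot \phi(z_i, x_i) + \sum_r \Big( \alpha_{\text{MIT}} \, \mathbf{1}(y_r \wedge \exists i: r = z_i) - \alpha_{\text{MID}} \, \mathbf{1}(\neg y_r \wedge \exists i: r = z_i) \Big)$$
Here $\theta \cdot \phi(z_i, x_i) = \log \Phi^{\text{extract}}(z_i, x_i)$, and $t_r = 1$ exactly when $\exists i: r = z_i$.
Inference became more challenging
A* search can find an exact solution, but does not scale to many variables
Present a greedy hill climbing approach for the inference:
1. Initialize $z_i$ at random
2. Obtain the neighborhood of the current solution
3. Move to the neighbor yielding the highest score
4. Repeat this process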
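The four steps above can be sketched as follows. This is a minimal illustration under stated assumptions: the neighborhood is "change the label of one sentence", the per-sentence scores and penalty values are hypothetical, the objective rewards mentioned KB facts by `alpha_mit` and penalizes mentioned non-KB facts by `alpha_mid`, and no random restarts are used.

```python
import random
from typing import Dict, List

def objective(z: List[str], scores: List[Dict[str, float]], y: Dict[str, int],
              alpha_mit: float, alpha_mid: float) -> float:
    local = sum(scores[i][r] for i, r in enumerate(z))
    mentioned = set(z)
    reward = sum(alpha_mit for r in mentioned if y[r] == 1)   # KB fact found in text
    penalty = sum(alpha_mid for r in mentioned if y[r] == 0)  # text fact missing in KB
    return local + reward - penalty

def hill_climb(scores, relations, y, alpha_mit, alpha_mid, seed=0, max_steps=100):
    rng = random.Random(seed)
    z = [rng.choice(relations) for _ in scores]           # 1. random initialization
    best = objective(z, scores, y, alpha_mit, alpha_mid)
    for _ in range(max_steps):
        neighbors = [z[:i] + [r] + z[i + 1:]               # 2. single-label changes
                     for i in range(len(z)) for r in relations if r != z[i]]
        if not neighbors:
            break
        top = max(neighbors, key=lambda c: objective(c, scores, y, alpha_mit, alpha_mid))
        top_score = objective(top, scores, y, alpha_mit, alpha_mid)
        if top_score <= best:                              # local optimum reached
            break
        z, best = top, top_score                           # 3. move to best neighbor
    return z, best                                         # 4. loop repeats until no gain

RELATIONS = ["born-in", "founder", "CEO-of", "visit"]
Y = {"born-in": 0, "founder": 1, "CEO-of": 1, "visit": 0}
SCORES = [  # hypothetical theta . phi(z_i, x_i) per sentence
    {"born-in": 0.1, "founder": 4.0, "CEO-of": 1.0, "visit": 0.2},
    {"born-in": 0.3, "founder": 3.0, "CEO-of": 0.5, "visit": 0.1},
    {"born-in": 0.2, "founder": 0.4, "CEO-of": 0.6, "visit": 3.5},
]
ALPHA_MIT, ALPHA_MID = 1.0, 2.0  # hypothetical
z_star, s_star = hill_climb(SCORES, RELATIONS, Y, ALPHA_MIT, ALPHA_MID)
print(z_star, s_star)
```

The result is only guaranteed to be a local optimum with respect to this neighborhood, which is the trade-off made for scalability compared with exact A* search.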
Incorporating popularity in KB
• We tune the penalty factors $\alpha_{\text{MIT}}$ and $\alpha_{\text{MID}}$ on a development set
• We can take into account how likely each fact is to be observed in the text and the knowledge base
  • Facts about Barack Obama are likely to exist
  • Facts about Naoaki Okazaki are unlikely to exist
• Control the penalty factor for each entity pair
  • Popularity of entities: $\alpha_{\text{MID}}^{(e_1, e_2)} = -\gamma \min(c(e_1), c(e_2))$
    • A larger penalty if the model predicts that a fact about a popular entity does not exist in KB
  • Well-aligned relations: assign 3 kinds of values of $\alpha_{\text{MIT}}^{r}$
    • A larger penalty if a popular relation such as contains, place_lived, and nationality does not exist in text
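The entity-popularity formula can be illustrated with a short sketch. The slides do not define $c(e)$, so it is assumed here to be some popularity count for entity $e$; the counts and the value of `gamma` below are hypothetical. The function returns the value exactly as the formula is written, and how its sign combines with the missing-data factor depends on a convention the slides do not spell out.

```python
from typing import Dict

def alpha_mid(e1: str, e2: str, counts: Dict[str, int], gamma: float) -> float:
    """Pair-specific factor: -gamma * min(c(e1), c(e2))."""
    return -gamma * min(counts[e1], counts[e2])

# Hypothetical popularity counts c(e).
counts = {"Barack Obama": 1000, "United States": 5000, "Naoaki Okazaki": 2}
gamma = 0.5  # hypothetical scaling constant

popular = alpha_mid("Barack Obama", "United States", counts, gamma)
rare = alpha_mid("Naoaki Okazaki", "United States", counts, gamma)
print(popular, rare)  # -500.0 -1.0
```

Taking the minimum means a pair counts as popular only when both entities are, so a single well-known entity does not inflate the factor for an obscure partner.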
Experiments
• Binary relation extraction
  • The standard setting (Riedel+, 10)
    • Knowledge base: Freebase relations
    • Text corpus: 1.8 million New York Times articles
  • Two kinds of evaluation
    • Sentence-level extractions using the dataset (Hoffmann+, 11)
    • Held-out evaluation on Freebase knowledge
• Unary relation extraction (NE categorization)
  • Twitter NE categorization dataset (Ritter+, 11)
    • Knowledge base: Freebase (instances and their categories)
    • Text corpus: tweets
  • Held-out evaluation
Results 
17% increase in area under the curve. 
Incorporating popularity yielded 27% increase over the baseline. 
This evaluation underestimates precision because many facts correctly extracted from text are missing in the database.
DNMAR doubled the recall. 
Ritter et al. (2013) Modeling Missing Data in Distant Supervision for Information Extraction, TACL(1), 367-378. 
Conclusion
• Investigated the problem of missing data in distant supervision
• Presented an extension of MultiR to handle missing data
  • Could incorporate the popularity of facts to be included in the knowledge base and text
• Presented a scalable inference algorithm based on greedy hill-climbing
• Demonstrated the effectiveness of the modeling
References
• Raphael Hoffmann, Congle Zhang, Xiao Ling, Luke Zettlemoyer, Daniel S. Weld. (2011) Knowledge-Based Weak Supervision for Information Extraction of Overlapping Relations. ACL-2011, pages 541–550.
  • Slides and code
• Mike Mintz, Steven Bills, Rion Snow, Dan Jurafsky. (2009) Distant supervision for relation extraction without labeled data. ACL-2009, pages 1003–1011.
Body fluids_tonicity_dehydration_hypovolemia_hypervolemia.pptxBody fluids_tonicity_dehydration_hypovolemia_hypervolemia.pptx
Body fluids_tonicity_dehydration_hypovolemia_hypervolemia.pptx
 
Astronomy Update- Curiosity’s exploration of Mars _ Local Briefs _ leadertele...
Astronomy Update- Curiosity’s exploration of Mars _ Local Briefs _ leadertele...Astronomy Update- Curiosity’s exploration of Mars _ Local Briefs _ leadertele...
Astronomy Update- Curiosity’s exploration of Mars _ Local Briefs _ leadertele...
 
Unveiling the Energy Potential of Marshmallow Deposits.pdf
Unveiling the Energy Potential of Marshmallow Deposits.pdfUnveiling the Energy Potential of Marshmallow Deposits.pdf
Unveiling the Energy Potential of Marshmallow Deposits.pdf
 
Mammalian Pineal Body Structure and Also Functions
Mammalian Pineal Body Structure and Also FunctionsMammalian Pineal Body Structure and Also Functions
Mammalian Pineal Body Structure and Also Functions
 
erythropoiesis-I_mechanism& clinical significance.pptx
erythropoiesis-I_mechanism& clinical significance.pptxerythropoiesis-I_mechanism& clinical significance.pptx
erythropoiesis-I_mechanism& clinical significance.pptx
 
general properties of oerganologametal.ppt
general properties of oerganologametal.pptgeneral properties of oerganologametal.ppt
general properties of oerganologametal.ppt
 
Richard's aventures in two entangled wonderlands
Richard's aventures in two entangled wonderlandsRichard's aventures in two entangled wonderlands
Richard's aventures in two entangled wonderlands
 
Comparing Evolved Extractive Text Summary Scores of Bidirectional Encoder Rep...
Comparing Evolved Extractive Text Summary Scores of Bidirectional Encoder Rep...Comparing Evolved Extractive Text Summary Scores of Bidirectional Encoder Rep...
Comparing Evolved Extractive Text Summary Scores of Bidirectional Encoder Rep...
 
PRESENTATION ABOUT PRINCIPLE OF COSMATIC EVALUATION
PRESENTATION ABOUT PRINCIPLE OF COSMATIC EVALUATIONPRESENTATION ABOUT PRINCIPLE OF COSMATIC EVALUATION
PRESENTATION ABOUT PRINCIPLE OF COSMATIC EVALUATION
 
Cancer cell metabolism: special Reference to Lactate Pathway
Cancer cell metabolism: special Reference to Lactate PathwayCancer cell metabolism: special Reference to Lactate Pathway
Cancer cell metabolism: special Reference to Lactate Pathway
 
Richard's entangled aventures in wonderland
Richard's entangled aventures in wonderlandRichard's entangled aventures in wonderland
Richard's entangled aventures in wonderland
 
platelets_clotting_biogenesis.clot retractionpptx
platelets_clotting_biogenesis.clot retractionpptxplatelets_clotting_biogenesis.clot retractionpptx
platelets_clotting_biogenesis.clot retractionpptx
 

Modeling missing data in distant supervision for information extraction (Ritter+, TACL 2013)

  • 1. Modeling Missing Data in Distant Supervision for Information Extraction
      Alan Ritter (CMU), Luke Zettlemoyer (University of Washington), Mausam (University of Washington), Oren Etzioni (Vulcan Inc.)
      TACL, 1, 367-378, 2013.
      Presented by Naoaki Okazaki (Tohoku University)
      2014-09-05 Modeling Missing Data in Distant Supervision
  • 2. Relation instance extraction
      Input: Steven Spielberg’s film Saving Private Ryan is loosely based on the brothers’ story.
      Extractor output (film-director relation): Film = Saving Private Ryan, Director = Steven Spielberg
      • Fully-supervised learning (Zhou+ 05, …)
          • Uses ACE corpora to build relation-instance classifiers
          • Suffers from the limited amount of training data
      • Unsupervised information extraction (Banko+ 07, …)
          • Extracts relational patterns between entities, and clusters the patterns into relations
          • Difficult to map clusters into relations of interest
      • Bootstrap learning (Brin98, …)
          • Uses seed instances to extract a new set of relational patterns
          • Often suffers from low precision (semantic drift)
      • Distant supervision (Mintz+ 09, …)
          • Combines the advantages of the above approaches
  • 3. Distant supervision (Mintz+, 09)
      Knowledge base (Person, Birthplace): (Edwin Hubble, Marshfield), …
      Automatic annotation: Astronomer Edwin Hubble was born in Marshfield, Missouri.
      Feature extraction: each row of the feature table presents a single feature; features from different sentences containing the same entity pair are concatenated.
      Problem: an entity pair cannot have multiple relations, even though, e.g., Founded(Jobs, Apple) and CEO-of(Jobs, Apple) are both true.
      Mintz et al. (2009) Distant supervision for relation extraction without labeled data. ACL-2009, pages 1003–1011.
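The automatic annotation step on slide 3 can be illustrated with a minimal Python sketch. It assumes plain substring matching for entity mentions; the `KB` dictionary, the `annotate` function, and the second sentence are hypothetical and are not taken from Mintz et al.

```python
from typing import Dict, List, Tuple

# Toy knowledge base: (entity1, entity2) -> relation (hypothetical data).
KB: Dict[Tuple[str, str], str] = {("Edwin Hubble", "Marshfield"): "born-in"}


def annotate(
    sentences: List[str], kb: Dict[Tuple[str, str], str]
) -> List[Tuple[str, Tuple[str, str], str]]:
    """Label each sentence that mentions both entities of a KB pair with that pair's relation."""
    labeled = []
    for sentence in sentences:
        for (e1, e2), relation in kb.items():
            # Assumption: a mention is a plain substring match, which is cruder than real entity linking.
            if e1 in sentence and e2 in sentence:
                labeled.append((sentence, (e1, e2), relation))
    return labeled


examples = annotate(
    [
        "Astronomer Edwin Hubble was born in Marshfield, Missouri.",
        "Edwin Hubble worked at Mount Wilson Observatory.",  # mentions only one entity of the pair
    ],
    KB,
)
```

Only the first sentence receives the `born-in` label, because the second one does not mention both entities of the pair.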
  • 4. MultiR (Hoffmann+, 11)
      Introduces latent variables ($z_i$) to indicate the relation expressed by sentence $x_i$.
      Example for the entity pair (Steve Jobs, Apple):
        $\mathbf{y}$: $y_{\text{born-in}} = 0$, $y_{\text{founder}} = 1$, $y_{\text{CEO-of}} = 1$, $y_{\text{capital-of}} = 0$
        $x_1$: Steve Jobs was founder of Apple. ($z_1$ = Founder)
        $x_2$: Steve Jobs, Steve Wozniak and Ronald Wayne founded Apple. ($z_2$ = Founder)
        $x_3$: Steve Jobs is CEO of Apple. ($z_3$ = CEO-of)
      Model:
        $$p(\mathbf{y}, \mathbf{z} \mid \mathbf{x}) = \frac{1}{Z_x} \prod_r \Phi^{\text{join}}(y_r, \mathbf{z}) \prod_i \Phi^{\text{extract}}(z_i, x_i)$$
      Variables:
        $x_i$: a sentence containing the entity pair
        $y_r \in \{0, 1\}$: 1 if the knowledge base includes the pair with relation $r$, 0 otherwise
        $z_i \in R$: the relation expressed by sentence $x_i$
      Factors:
        $$\Phi^{\text{extract}}(z_i, x_i) = \exp\Big(\sum_j \theta_j \phi_j(z_i, x_i)\Big)$$ (the same as (Mintz+ 09))
        $$\Phi^{\text{join}}(y_r, \mathbf{z}) = \mathbb{1}(\neg y_r \vee \exists i: r = z_i)$$ (deterministic OR)
      $\Phi^{\text{join}}$ ensures that a sentence $x_i$ expressing the relation $r$ exists if $r$ is true.
      Allows multiple relations for the same entity pair.
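As a rough illustration of the factorization on slide 4, the sketch below evaluates the unnormalized product of join and extract factors for one entity pair. The log-scores in `LOG_EXTRACT` stand in for $\sum_j \theta_j \phi_j(z_i, x_i)$ and are made-up numbers; the function names are hypothetical, and the join factor follows the formula as written on the slide.

```python
import math
from typing import Dict, List


def phi_join(y_r: int, r: str, z: List[str]) -> int:
    """Join factor from the slide: 1 unless y_r = 1 and no sentence expresses r."""
    return int(y_r == 0 or r in z)


def unnormalized_score(
    y: Dict[str, int], z: List[str], log_extract: List[Dict[str, float]]
) -> float:
    """Product of join factors times product of extract factors (before dividing by Z_x)."""
    for r, y_r in y.items():
        if phi_join(y_r, r, z) == 0:
            return 0.0  # a hard constraint is violated
    # Each extract factor is exp(theta . phi), so the product is exp of the summed log-scores.
    return math.exp(sum(log_extract[i][z_i] for i, z_i in enumerate(z)))


# Hypothetical log-scores theta . phi(z_i, x_i) for the three sentences.
LOG_EXTRACT = [
    {"founder": 2.0, "CEO-of": 1.0},
    {"founder": 1.5, "CEO-of": 0.5},
    {"founder": 0.5, "CEO-of": 1.0},
]
Y = {"born-in": 0, "founder": 1, "CEO-of": 1, "capital-of": 0}

consistent = unnormalized_score(Y, ["founder", "founder", "CEO-of"], LOG_EXTRACT)
violating = unnormalized_score(Y, ["founder", "founder", "founder"], LOG_EXTRACT)
```

The second assignment scores zero because `CEO-of` is true in the knowledge base but no sentence is labeled with it.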
  • 5. MultiR: Training
      Loop for passes over the training data
        Loop for entity pairs in the KB
          Predict sentence-level and KB-level relations (ignoring the facts in the KB)
          Find an optimal assignment of sentence-level relations consistent with the facts in the KB
          Update feature weights similarly to the perceptron algorithm
      We need two kinds of inference.
      Hoffmann et al. (2011) Knowledge-Based Weak Supervision for Information Extraction of Overlapping Relations. ACL-2011, pages 541–550.
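The training loop on slide 5 might look roughly like the following Python sketch. It is a simplification under stated assumptions, not the authors' implementation: features are bag-of-words indicators, the constrained inference just restricts each sentence to relations that are true in the KB (it does not enforce the at-least-one constraint and assumes at least one relation is true), and all names and the toy data are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

RELATIONS = ["founder", "CEO-of"]


def features(x: str, r: str) -> Dict[str, float]:
    """Hypothetical bag-of-words features conjoined with the relation label."""
    return {f"{w}|{r}": 1.0 for w in x.lower().split()}


def score(x: str, r: str, theta: Dict[str, float]) -> float:
    return sum(theta.get(f, 0.0) * v for f, v in features(x, r).items())


def infer_joint(xs: List[str], theta) -> Tuple[Dict[str, int], List[str]]:
    """Inference 1: label each sentence independently, then aggregate into y."""
    z = [max(RELATIONS, key=lambda r: score(x, r, theta)) for x in xs]
    return {r: int(r in z) for r in RELATIONS}, z


def infer_constrained(xs: List[str], y: Dict[str, int], theta) -> List[str]:
    """Simplified inference 2: restrict each sentence to relations that are true in the KB."""
    active = [r for r in RELATIONS if y[r] == 1]
    return [max(active, key=lambda r: score(x, r, theta)) for x in xs]


def train(pairs, passes: int = 5) -> Dict[str, float]:
    theta: Dict[str, float] = defaultdict(float)
    for _ in range(passes):
        for xs, y_kb in pairs:
            y_pred, z_pred = infer_joint(xs, theta)
            if y_pred != y_kb:
                z_star = infer_constrained(xs, y_kb, theta)
                # Perceptron-style update: promote the KB-consistent labels, demote the predicted ones.
                for x, good, bad in zip(xs, z_star, z_pred):
                    for f, v in features(x, good).items():
                        theta[f] += v
                    for f, v in features(x, bad).items():
                        theta[f] -= v
    return theta


# Toy training data (hypothetical): one sentence per entity pair.
PAIRS = [
    (["Jobs founded Apple"], {"founder": 1, "CEO-of": 0}),
    (["Cook is CEO of Apple"], {"founder": 0, "CEO-of": 1}),
]
theta = train(PAIRS)
```

On this toy data the weights stop changing after a few passes, and `infer_joint` then reproduces the KB facts for both pairs.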
  • 6. MultiR: Inference 1: $\operatorname{argmax}_{\mathbf{y}, \mathbf{z}} \; p(\mathbf{y}, \mathbf{z} \mid \mathbf{x})$
      For the entity pair (Steve Jobs, Apple), all of $y_{\text{born-in}}$, $y_{\text{founder}}$, $y_{\text{CEO-of}}$, $y_{\text{capital-of}}$ and $z_1$, $z_2$, $z_3$ are unknown.
      Extraction scores (born-in, founder, CEO-of, capital-of):
        $x_1$ (Steve Jobs was founder of Apple.): 0.5, 16.0, 9.0, 0.1
        $x_2$ (Steve Jobs, Steve Wozniak and Ronald Wayne founded Apple.): 8.0, 11.0, 6.0, 0.1
        $x_3$ (Steve Jobs is CEO of Apple.): 7.0, 8.0, 7.0, 0.2
      Predict a relation label for each sentence independently.
      Aggregate sentence-level predictions into global-level predictions.
  • 7. MultiR: Inference 1: $\operatorname{argmax}_{\mathbf{y}, \mathbf{z}} \; p(\mathbf{y}, \mathbf{z} \mid \mathbf{x})$
      Extraction scores (born-in, founder, CEO-of, capital-of):
        $x_1$ (Steve Jobs was founder of Apple.): 0.5, 16.0, 9.0, 0.1
        $x_2$ (Steve Jobs, Steve Wozniak and Ronald Wayne founded Apple.): 8.0, 11.0, 6.0, 0.1
        $x_3$ (Steve Jobs is CEO of Apple.): 7.0, 8.0, 7.0, 0.2
      Predict a relation label for each sentence independently: $z_1 = z_2 = z_3 = $ founder.
      Aggregate sentence-level predictions into global-level predictions: $y_{\text{born-in}} = 0$, $y_{\text{founder}} = 1$, $y_{\text{CEO-of}} = 0$, $y_{\text{capital-of}} = 0$.
      Very easy to find! Computational cost: $O(|R| \cdot |\mathbf{x}|)$
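Inference 1 amounts to a per-sentence argmax followed by an OR over sentences. A minimal Python sketch using the scores shown on the slide (the function and variable names are hypothetical):

```python
from typing import Dict, List, Tuple

RELATIONS = ["born-in", "founder", "CEO-of", "capital-of"]

# Extraction scores from the slide: one row per sentence x_1..x_3, one column per relation.
SCORES = [
    [0.5, 16.0, 9.0, 0.1],
    [8.0, 11.0, 6.0, 0.1],
    [7.0, 8.0, 7.0, 0.2],
]


def predict(scores: List[List[float]], relations: List[str]) -> Tuple[Dict[str, int], List[str]]:
    """Pick the best relation for each sentence, then set y_r = 1 iff some sentence chose r."""
    z = [relations[max(range(len(relations)), key=row.__getitem__)] for row in scores]
    y = {r: int(r in z) for r in relations}
    return y, z


y_hat, z_hat = predict(SCORES, RELATIONS)
```

Each sentence is scored against every relation once, which matches the stated cost of $O(|R| \cdot |\mathbf{x}|)$.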
  • 8. MultiR: Inference 2: $\operatorname{argmax}_{\mathbf{z}} \; p(\mathbf{z} \mid \mathbf{x}, \mathbf{y})$
      For the entity pair (Steve Jobs, Apple), $\mathbf{y}$ is given ($y_{\text{born-in}} = 0$, $y_{\text{founder}} = 1$, $y_{\text{CEO-of}} = 1$, $y_{\text{capital-of}} = 0$) and $z_1$, $z_2$, $z_3$ are unknown.
      Extraction scores (born-in, founder, CEO-of, capital-of):
        $x_1$ (Steve Jobs was founder of Apple.): 0.5, 16.0, 9.0, 0.1
        $x_2$ (Steve Jobs, Steve Wozniak and Ronald Wayne founded Apple.): 8.0, 11.0, 6.0, 0.1
        $x_3$ (Steve Jobs is CEO of Apple.): 7.0, 8.0, 7.0, 0.2
      Define an edge weight between $y_r$ and $z_i$: $w(y_r, z_i) = \Phi^{\text{extract}}(r, x_i)$
      A node with $y_r = 1$ must have at least one edge connecting it to some $z_i$.
      Each node $z_i$ must have exactly one edge connecting it to some $y_r$.
      Find a set of edges that maximizes the sum of weights.
  • 9. MultiR: Inference 2: $\operatorname{argmax}_{\mathbf{z}} \; p(\mathbf{z} \mid \mathbf{x}, \mathbf{y})$
      Given $y_{\text{born-in}} = 0$, $y_{\text{founder}} = 1$, $y_{\text{CEO-of}} = 1$, $y_{\text{capital-of}} = 0$, only edges from the nodes with $y_r = 1$ remain:
        founder: 16 ($z_1$), 11 ($z_2$), 8 ($z_3$)
        CEO-of: 9 ($z_1$), 6 ($z_2$), 7 ($z_3$)
      Result: $z_1$ = founder, $z_2$ = founder, $z_3$ = CEO-of.
      Define an edge weight between $y_r$ and $z_i$: $w(y_r, z_i) = \Phi^{\text{extract}}(r, x_i)$
      A node with $y_r = 1$ must have at least one edge connecting it to some $z_i$.
      Each node $z_i$ must have exactly one edge connecting it to some $y_r$.
      Find a set of edges that maximizes the sum of weights.
      An exact solution can be found in polynomial time.
      In practice, an approximate solution by greedy search (assigning a $z_i$ to each node with $y_r = 1$) is sufficient.
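One possible reading of the greedy search on slide 9 is sketched below in Python. The order in which relations claim sentences and the tie-breaking are assumptions, and the function name is hypothetical; the scores are those shown on the slide.

```python
from typing import Dict, List, Optional

RELATIONS = ["born-in", "founder", "CEO-of", "capital-of"]
SCORES = [
    [0.5, 16.0, 9.0, 0.1],
    [8.0, 11.0, 6.0, 0.1],
    [7.0, 8.0, 7.0, 0.2],
]


def greedy_assign(scores: List[List[float]], relations: List[str], y: Dict[str, int]) -> List[str]:
    """Greedy approximation: every relation with y_r = 1 first claims its best free sentence."""
    active = [j for j, r in enumerate(relations) if y[r] == 1]
    if not active:
        raise ValueError("at least one relation must be true in the KB")
    z: List[Optional[int]] = [None] * len(scores)
    # Step 1: satisfy the at-least-one-edge constraint for each true relation.
    for j in active:
        free = [i for i in range(len(scores)) if z[i] is None]
        if not free:
            break  # more true relations than sentences; the constraint cannot be fully met
        best = max(free, key=lambda i: scores[i][j])
        z[best] = j
    # Step 2: every remaining sentence takes its highest-scoring true relation.
    for i in range(len(scores)):
        if z[i] is None:
            z[i] = max(active, key=lambda j: scores[i][j])
    return [relations[j] for j in z]


Y = {"born-in": 0, "founder": 1, "CEO-of": 1, "capital-of": 0}
z_star = greedy_assign(SCORES, RELATIONS, Y)
```

On this example the greedy result has total weight 16 + 11 + 7 = 34, which is also the best among all assignments that give each true relation at least one sentence.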
  • 10. Contribution of this work
      • MultiR makes two assumptions (hard constraints):
          • If a fact is not found in the database, it cannot be mentioned in the text.
          • If a fact is in the database, it must be mentioned in at least one sentence.
      • Relax MultiR to handle the situations where:
          • A fact is not mentioned in the text (MIT, missing in text)
          • A fact mentioned in the text is missing in the database (MID, missing in database)
      • Side effect of this relaxation:
          • Incorporates the tendency that the knowledge base is likely to include popular entities and relations
  • 11. Distant Supervision with Data Not Missing at Random (DNMAR)
      Introduce a layer of latent variables ($t_r$) to handle missing cases.
      Example for the entity pair (Steve Jobs, Apple), with relations (born-in, founder, CEO-of, visit):
        $\mathbf{y}$ = (0, 1, 1, 0)
        $\mathbf{t}$ = (0, 1, 0, 1)
        $x_1$: Steve Jobs was founder of Apple. ($z_1$ = Founder)
        $x_2$: Steve Jobs, Steve Wozniak and Ronald Wayne founded Apple. ($z_2$ = Founder)
        $x_3$: Steve Jobs visited Apple store… ($z_3$ = visit)
      Introduce a new factor:
        $$\phi^{\text{miss}}(y_r, t_r) = \begin{cases} -\alpha_{\text{MIT}} & (y_r = 1 \wedge t_r = 0) \quad \text{(missing in text)} \\ -\alpha_{\text{MID}} & (y_r = 0 \wedge t_r = 1) \quad \text{(missing in DB)} \\ 0 & \text{(otherwise)} \end{cases}$$
      This relaxes the two hard constraints in MultiR into soft ones with penalty factors $-\alpha_{\text{MIT}}$ and $-\alpha_{\text{MID}}$.
      The training algorithm is the same as the one used in MultiR.
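The new factor on slide 11 is a three-way case split, which a short Python sketch makes concrete. The penalty values `ALPHA_MIT = 2.0` and `ALPHA_MID = 1.0` are made-up numbers for illustration; the $\mathbf{y}$ and $\mathbf{t}$ vectors are the ones shown on the slide.

```python
def phi_miss(y_r: int, t_r: int, alpha_mit: float, alpha_mid: float) -> float:
    """Soft penalty for disagreement between the KB (y_r) and the text (t_r)."""
    if y_r == 1 and t_r == 0:
        return -alpha_mit  # fact is in the KB but missing in text
    if y_r == 0 and t_r == 1:
        return -alpha_mid  # fact is in the text but missing in the KB
    return 0.0


# Slide example, relations in the order (born-in, founder, CEO-of, visit).
Y = [0, 1, 1, 0]
T = [0, 1, 0, 1]
ALPHA_MIT, ALPHA_MID = 2.0, 1.0  # hypothetical penalty values

penalties = [phi_miss(y_r, t_r, ALPHA_MIT, ALPHA_MID) for y_r, t_r in zip(Y, T)]
```

Here CEO-of incurs the missing-in-text penalty and visit incurs the missing-in-DB penalty, while born-in and founder agree between the KB and the text and cost nothing.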
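  • 12. Constrained inference: $\operatorname{argmax}_{\mathbf{z}} \; p(\mathbf{z} \mid \mathbf{x}, \mathbf{y})$
      For the entity pair (Steve Jobs, Apple), $\mathbf{y}$ = (0, 1, 1, 0) for (born-in, founder, CEO-of, visit) is given; $z_1$, $z_2$, $z_3$ and $\mathbf{t}$ are unknown.
        $x_1$: Steve Jobs was founder of Apple.
        $x_2$: Steve Jobs, Steve Wozniak and Ronald Wayne founded Apple.
        $x_3$: Steve Jobs visited Apple store…
      Objective, with the penalties of $\phi^{\text{miss}}$ applied whenever $t_r$ (whether some $z_i$ equals $r$) disagrees with $y_r$:
        $$\mathbf{z}^* = \operatorname{argmax}_{\mathbf{z}} \sum_{i=1}^{n} \boldsymbol{\theta} \cdot \boldsymbol{\phi}(z_i, x_i) - \sum_r \Big[ \alpha_{\text{MIT}} \cdot \mathbb{1}(y_r \wedge \neg\exists i: r = z_i) + \alpha_{\text{MID}} \cdot \mathbb{1}(\neg y_r \wedge \exists i: r = z_i) \Big]$$
        where $\boldsymbol{\theta} \cdot \boldsymbol{\phi}(z_i, x_i) = \log \Phi^{\text{extract}}(z_i, x_i)$.
      The inference became more challenging.
      A* search can find an exact solution, but does not scale to many variables.
      Present a greedy hill-climbing approach for the inference:
        1. Initialize $z_i$ at random
        2. Obtain the neighborhood of the current solution
        3. Move to the neighbor yielding the highest score
        4. Repeat this process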
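The four hill-climbing steps on slide 12 can be sketched in Python as follows. This is an illustration under assumptions, not the paper's algorithm: a neighbor is defined here as changing the label of a single sentence, the search stops at the first local optimum, and the log-scores in `SCORES` and the penalty values are invented. The objective applies the missing-in-text and missing-in-DB penalties from slide 11.

```python
import random
from typing import List, Tuple


def objective(z: List[int], scores: List[List[float]], y: List[int],
              alpha_mit: float, alpha_mid: float) -> float:
    """Sum of sentence-level log-scores minus the missing-data penalties."""
    total = sum(scores[i][r] for i, r in enumerate(z))
    expressed = set(z)  # relations r for which t_r = 1
    for r, y_r in enumerate(y):
        if y_r == 1 and r not in expressed:
            total -= alpha_mit  # in the KB, missing in text
        elif y_r == 0 and r in expressed:
            total -= alpha_mid  # in the text, missing in the KB
    return total


def hill_climb(scores: List[List[float]], y: List[int], alpha_mit: float,
               alpha_mid: float, seed: int = 0) -> Tuple[List[int], float]:
    rng = random.Random(seed)
    n, num_rel = len(scores), len(y)
    z = [rng.randrange(num_rel) for _ in range(n)]  # 1. random initialization
    best = objective(z, scores, y, alpha_mit, alpha_mid)
    while True:
        # 2. neighborhood: all assignments that differ in exactly one z_i (an assumption)
        neighbors = [z[:i] + [r] + z[i + 1:]
                     for i in range(n) for r in range(num_rel) if r != z[i]]
        # 3. move to the best neighbor, 4. repeat until nothing improves
        top = max(neighbors, key=lambda c: objective(c, scores, y, alpha_mit, alpha_mid))
        top_score = objective(top, scores, y, alpha_mit, alpha_mid)
        if top_score <= best:
            return z, best
        z, best = top, top_score


# Hypothetical log-scores for (born-in, founder, CEO-of, visit); y is taken from the slide.
SCORES = [[0.5, 4.0, 2.0, 0.1], [0.3, 3.0, 1.0, 0.2], [0.1, 0.5, 0.4, 3.5]]
Y = [0, 1, 1, 0]
z_hat, value = hill_climb(SCORES, Y, alpha_mit=1.0, alpha_mid=2.0)
```

The returned assignment is only guaranteed to be a local optimum under single-label moves; restarting from several random seeds is a common way to reduce the chance of a poor one.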
  • 13. Incorporating popularity in KB
      • We tune the penalty factors $\alpha_{\text{MIT}}$ and $\alpha_{\text{MID}}$ on a development set
      • We can take into account how likely each fact is to be observed in the text and the knowledge base
          • Facts about Barack Obama are likely to exist
          • Facts about Naoaki Okazaki are unlikely to exist
      • Control the penalty factor for each entity pair
          • Popularity of entities: $\alpha_{\text{MID}}^{(e_1, e_2)} = -\gamma \min(c(e_1), c(e_2))$
          • A larger penalty if the model predicts that a fact about a popular entity does not exist in the KB
      • Well-aligned relations: assign one of three values to $\alpha_{\text{MIT}}^{r}$
          • A larger penalty if a popular relation such as contains, place_lived, and nationality does not exist in the text
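The entity-popularity formula on slide 13 can be evaluated directly, as in the Python sketch below. The slide does not define $c(e)$, so the sketch assumes it is some popularity count for an entity; the counts, the second entity in each pair, and the value of $\gamma$ are all made up.

```python
from typing import Dict


def alpha_mid_for_pair(e1: str, e2: str, counts: Dict[str, int], gamma: float) -> float:
    """Evaluate -gamma * min(c(e1), c(e2)) as written on the slide."""
    return -gamma * min(counts.get(e1, 0), counts.get(e2, 0))


# Hypothetical popularity counts c(e).
COUNTS = {"Barack Obama": 500, "Hawaii": 300, "Naoaki Okazaki": 2, "Tohoku University": 40}

popular = alpha_mid_for_pair("Barack Obama", "Hawaii", COUNTS, gamma=0.01)
obscure = alpha_mid_for_pair("Naoaki Okazaki", "Tohoku University", COUNTS, gamma=0.01)
```

The pair of popular entities receives the value with the larger magnitude, so predicting that one of its facts is missing from the KB is penalized more heavily.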
  • 14. Experiments
      • Binary relation extraction
          • The standard setting (Riedel+, 10)
              • Knowledge base: Freebase relations
              • Text corpus: 1.8M New York Times articles
          • Two kinds of evaluation
              • Sentence-level extractions using the dataset of (Hoffmann+, 11)
              • Holdout evaluation on Freebase knowledge
      • Unary relation extraction (NE categorization)
          • Twitter NE categorization dataset (Ritter+, 11)
              • Knowledge base: Freebase (instances and their categories)
              • Text corpus: tweets
          • Holdout evaluation
  • 15. Results
      17% increase in area under the curve.
      Incorporating popularity yielded a 27% increase over the baseline.
      This evaluation underestimates precision because many facts correctly extracted from text are missing in the database.
      DNMAR doubled the recall.
      Ritter et al. (2013) Modeling Missing Data in Distant Supervision for Information Extraction, TACL(1), 367-378.
  • 16. Conclusion
      • Investigated the problem of missing data in distant supervision
      • Presented an extension of MultiR to handle missing data
          • Could incorporate the popularity of facts to be included in the knowledge base and text
      • Presented a scalable inference algorithm based on greedy hill-climbing
      • Demonstrated the effectiveness of the modeling
  • 17. References
      • Raphael Hoffmann, Congle Zhang, Xiao Ling, Luke Zettlemoyer, Daniel S. Weld. (2011) Knowledge-Based Weak Supervision for Information Extraction of Overlapping Relations. ACL-2011, pages 541–550.
          • Slides and codes
      • Mike Mintz, Steven Bills, Rion Snow, Dan Jurafsky. (2009) Distant supervision for relation extraction without labeled data. ACL-2009, pages 1003–1011.