
Modeling missing data in distant supervision for information extraction (Ritter+, TACL 2013)


Presentation slides from the 6th State-of-the-Art NLP Study Group (最先端NLP勉強会)
http://www.cl.ecei.tohoku.ac.jp/~y-matsu/snlp6/

Published in: Science


  1. Modeling Missing Data in Distant Supervision for Information Extraction. Alan Ritter (CMU), Luke Zettlemoyer (University of Washington), Mausam (University of Washington), Oren Etzioni (Vulcan Inc.). TACL, 1, 367-378, 2013. Presented by Naoaki Okazaki (Tohoku University), 2014-09-05.
  2. Relation instance extraction
     Example: from the sentence "Steven Spielberg's film Saving Private Ryan is loosely based on the brothers' story.", the extractor produces a film-director relation instance (Film: Saving Private Ryan, Director: Steven Spielberg).
     • Fully-supervised learning (Zhou+ 05, …)
       • Uses ACE corpora to build relation-instance classifiers
       • Suffers from the limited amount of training data
     • Unsupervised information extraction (Banko+ 07, …)
       • Extracts relational patterns between entities, and clusters the patterns into relations
       • Difficult to map clusters onto relations of interest
     • Bootstrap learning (Brin 98, …)
       • Uses seed instances to extract a new set of relational patterns
       • Often suffers from low precision (semantic drift)
     • Distant supervision (Mintz+ 09, …)
       • Combines the advantages of the above approaches
  3. Distant supervision (Mintz+, 09)
     • Knowledge base (Person, Birthplace): (Edwin Hubble, Marshfield), …
     • Automatic annotation: the knowledge-base fact labels the sentence "Astronomer Edwin Hubble was born in Marshfield, Missouri."
     • Feature extraction: features are extracted from the annotated sentence (in the feature table of Mintz et al. (2009), each row represents a single feature).
     • Features from different sentences containing the same entity pair are concatenated.
     • Problem: an entity pair cannot have multiple relations, even though, e.g., Founded(Jobs, Apple) and CEO-of(Jobs, Apple) are both true.
     Mintz et al. (2009) Distant supervision for relation extraction without labeled data. ACL-2009, pages 1003–1011.
  4. MultiR (Hoffmann+, 11)
     Introduces latent variables $z_i$ to indicate the relation expressed by sentence $x_i$.
     Example for the entity pair (Steve Jobs, Apple):
     • $x_1$: "Steve Jobs was founder of Apple.", $z_1$ = Founder
     • $x_2$: "Steve Jobs, Steve Wozniak and Ronald Wayne founded Apple.", $z_2$ = Founder
     • $x_3$: "Steve Jobs is CEO of Apple.", $z_3$ = CEO-of
     • $\boldsymbol{y}$: $y_{\text{born-in}} = 0$, $y_{\text{founder}} = 1$, $y_{\text{CEO-of}} = 1$, $y_{\text{capital-of}} = 0$
     Model:
     $$p(\boldsymbol{y}, \boldsymbol{z} \mid \boldsymbol{x}) = \frac{1}{Z_x} \prod_r \Phi^{\text{join}}(y_r, \boldsymbol{z}) \prod_i \Phi^{\text{extract}}(z_i, x_i)$$
     • $x_i$: a sentence containing the entity pair
     • $y_r \in \{0, 1\}$: 1 if the knowledge base includes the pair with relation $r$, 0 otherwise
     • $z_i \in R$: the relation expressed by sentence $x_i$
     • $\Phi^{\text{extract}}(z_i, x_i) = \exp\left(\sum_j \theta_j \phi_j(z_i, x_i)\right)$ (the same as in Mintz+ 09)
     • $\Phi^{\text{join}}(y_r, \boldsymbol{z}) = 1(\neg y_r \vee \exists i: r = z_i)$ (deterministic OR)
     $\Phi^{\text{join}}$ ensures that a sentence $x_i$ expressing the relation $r$ exists if $r$ is true. The model allows multiple relations for the same entity pair.
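The two factors on this slide can be sketched in a few lines of Python. This is only an illustration of the formulas as written above, not the authors' implementation: the feature names, the dict-based feature representation, and the function names are all assumptions made for the example, and the normalizer $Z_x$ is left out.

```python
import math

def phi_extract(theta, features):
    """Extraction factor: exp of the weighted feature sum for (z_i, x_i).

    `theta` and `features` are dicts keyed by hypothetical feature names.
    """
    return math.exp(sum(theta.get(name, 0.0) * value
                        for name, value in features.items()))

def phi_join(y_r, r, z):
    """Join factor (deterministic OR): 0 only if y_r = 1 and no z_i equals r."""
    return 1.0 if (not y_r) or (r in z) else 0.0

def unnormalized_score(y, z, theta, sentence_features):
    """Product of all join and extraction factors, without the 1/Z_x term.

    `sentence_features[i][rel]` holds the features of sentence i under
    relation rel (an assumed layout for this sketch).
    """
    score = 1.0
    for r, y_r in y.items():
        score *= phi_join(y_r, r, z)
    for i, z_i in enumerate(z):
        score *= phi_extract(theta, sentence_features[i][z_i])
    return score
```

Any configuration in which some $y_r = 1$ has no supporting sentence gets score 0, which is the hard constraint that the later slides relax.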
  5. MultiR: Training
     • Loop for passes over the training data
       • Loop for entity pairs in the KB
         • Predict sentence-level and KB-level relations (ignoring the facts in the KB)
         • Find an optimal assignment of sentence-level relations consistent with the facts in the KB
         • Update feature weights similarly to the perceptron algorithm
     • The prediction step and the assignment step require two kinds of inference.
     Hoffmann et al. (2011) Knowledge-Based Weak Supervision for Information Extraction of Overlapping Relations. ACL-2011, pages 541–550.
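The weight update in the inner loop can be sketched as a perceptron-style step. This is a simplified sketch under stated assumptions: the function name, the `feats[i][rel]` layout, and the learning rate `lr` are hypothetical, and the two assignments (`z_pred` from unconstrained prediction, `z_gold` from KB-consistent inference) are taken as given inputs.

```python
def perceptron_update(theta, feats, z_pred, z_gold, lr=1.0):
    """Move weights toward the KB-consistent assignment and away from the prediction.

    `feats[i][rel]` is a dict of feature values for sentence i under relation
    rel (an assumed layout). Sentences whose two labels agree are skipped.
    """
    for i, (z_p, z_g) in enumerate(zip(z_pred, z_gold)):
        if z_p == z_g:
            continue
        # Reward the features of the KB-consistent label.
        for name, value in feats[i][z_g].items():
            theta[name] = theta.get(name, 0.0) + lr * value
        # Penalize the features of the wrongly predicted label.
        for name, value in feats[i][z_p].items():
            theta[name] = theta.get(name, 0.0) - lr * value
    return theta
```

In a full training loop this step would run once per entity pair whenever the prediction disagrees with the facts in the KB.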
  6. MultiR: Inference 1: $\arg\max_{\boldsymbol{y}, \boldsymbol{z}} p(\boldsymbol{y}, \boldsymbol{z} \mid \boldsymbol{x})$
     For the entity pair (Steve Jobs, Apple), all of $\boldsymbol{y}$ and $\boldsymbol{z}$ are unknown. Scores of each sentence for the relations (born-in, founder, CEO-of, capital-of):
     • $x_1$ "Steve Jobs was founder of Apple.": 0.5, 16.0, 9.0, 0.1
     • $x_2$ "Steve Jobs, Steve Wozniak and Ronald Wayne founded Apple.": 8.0, 11.0, 6.0, 0.1
     • $x_3$ "Steve Jobs is CEO of Apple.": 7.0, 8.0, 7.0, 0.2
     Procedure:
     • Predict a relation label for each sentence independently
     • Aggregate sentence-level predictions into global-level predictions
  7. MultiR: Inference 1: $\arg\max_{\boldsymbol{y}, \boldsymbol{z}} p(\boldsymbol{y}, \boldsymbol{z} \mid \boldsymbol{x})$ (result)
     Same example and scores as on slide 6.
     • Predicting a relation label for each sentence independently gives $z_1 = z_2 = z_3$ = founder (scores 16.0, 11.0, 8.0).
     • Aggregating sentence-level predictions into global-level predictions gives $y_{\text{born-in}} = 0$, $y_{\text{founder}} = 1$, $y_{\text{CEO-of}} = 0$, $y_{\text{capital-of}} = 0$.
     • Very easy to find. Computational cost: $O(|R| \cdot |\boldsymbol{x}|)$
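The unconstrained inference on slides 6 and 7 amounts to a row-wise argmax followed by an OR over sentences. A minimal sketch using the slide's toy scores (the names `RELATIONS`, `SCORES`, and `predict` are chosen for this example only):

```python
RELATIONS = ["born-in", "founder", "CEO-of", "capital-of"]

# One row per sentence x_1..x_3, one column per relation (values from the slide).
SCORES = [
    [0.5, 16.0, 9.0, 0.1],
    [8.0, 11.0, 6.0, 0.1],
    [7.0, 8.0, 7.0, 0.2],
]

def predict(scores, relations):
    """Pick the best relation per sentence, then set y_r = 1 for every relation used."""
    z = []
    for row in scores:
        best = max(range(len(relations)), key=lambda j: row[j])
        z.append(relations[best])
    y = {r: int(r in z) for r in relations}
    return y, z

y, z = predict(SCORES, RELATIONS)
print(z)  # ['founder', 'founder', 'founder']
print(y)  # {'born-in': 0, 'founder': 1, 'CEO-of': 0, 'capital-of': 0}
```

Each sentence is scanned once per relation, which is where the linear cost in the number of relations and sentences comes from.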
  8. MultiR: Inference 2: $\arg\max_{\boldsymbol{z}} p(\boldsymbol{z} \mid \boldsymbol{x}, \boldsymbol{y})$
     For the entity pair (Steve Jobs, Apple), $\boldsymbol{y}$ is given ($y_{\text{born-in}} = 0$, $y_{\text{founder}} = 1$, $y_{\text{CEO-of}} = 1$, $y_{\text{capital-of}} = 0$) and $\boldsymbol{z}$ is unknown. Edge weights between each relation node and the sentences ($x_1$, $x_2$, $x_3$):
     • born-in: 0.5, 8, 7
     • founder: 16, 11, 8
     • CEO-of: 9, 6, 7
     • capital-of: 0.1, 0.1, 0.2
     Formulation:
     • Define an edge weight: $w(y_r, z_i) = \Phi^{\text{extract}}(r, x_i)$
     • A node with $y_r = 1$ must have at least one edge connecting to some $z_i$
     • Each node $z_i$ must have an edge connecting to some $y_r$
     • Find a set of edges that maximizes the sum of weights
  9. MultiR: Inference 2: $\arg\max_{\boldsymbol{z}} p(\boldsymbol{z} \mid \boldsymbol{x}, \boldsymbol{y})$ (result)
     Same example and formulation as on slide 8. Only the edges of the nodes with $y_r = 1$ remain: founder (16, 11, 8) and CEO-of (9, 6, 7).
     • Solution: $z_1$ = founder, $z_2$ = founder, $z_3$ = CEO-of
     • An exact solution can be found in polynomial time
     • In practice, an approximate solution by greedy search (assigning a $z_i$ for each node with $y_r = 1$) is sufficient
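The constrained assignment on slides 8 and 9 can be checked on the toy weights with a brute-force search, alongside one possible greedy variant. Both functions are sketches written for this example: the brute force is exponential and only meant for tiny inputs (it is not the polynomial-time exact method the slide refers to), and the greedy order (relations in the given order, then leftover sentences) is an assumption, not necessarily the order used in MultiR.

```python
from itertools import product

# Edge weights per sentence for the relations with y_r = 1 (values from the slide).
WEIGHTS = [
    {"founder": 16.0, "CEO-of": 9.0},
    {"founder": 11.0, "CEO-of": 6.0},
    {"founder": 8.0, "CEO-of": 7.0},
]
ACTIVE = ["founder", "CEO-of"]

def exact_assignment(weights, active):
    """Try every assignment that uses each active relation at least once."""
    best, best_score = None, float("-inf")
    for z in product(active, repeat=len(weights)):
        if set(z) != set(active):
            continue  # some relation with y_r = 1 would have no edge
        score = sum(weights[i][r] for i, r in enumerate(z))
        if score > best_score:
            best, best_score = list(z), score
    return best, best_score

def greedy_assignment(weights, active):
    """Give each active relation its best free sentence, then fill the rest."""
    z = [None] * len(weights)
    for r in active:
        free = [i for i in range(len(weights)) if z[i] is None]
        if not free:
            break
        z[max(free, key=lambda i: weights[i][r])] = r
    for i in range(len(weights)):
        if z[i] is None:
            z[i] = max(active, key=lambda r: weights[i][r])
    return z

print(exact_assignment(WEIGHTS, ACTIVE))   # (['founder', 'founder', 'CEO-of'], 34.0)
print(greedy_assignment(WEIGHTS, ACTIVE))  # ['founder', 'founder', 'CEO-of']
```

On this example the greedy result matches the exact one; with a different relation order it can be worse, which is why it is only an approximation.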
  10. Contribution of this work
      • MultiR makes two assumptions (hard constraints):
        • If a fact is not found in the database, it cannot be mentioned in the text
        • If a fact is in the database, it must be mentioned in at least one sentence
      • Relax MultiR to handle the situations where:
        • A fact is not mentioned in text (MIT: missing in text)
        • A fact mentioned in text is missing in the database (MID: missing in database)
      • Side effect of this relaxation:
        • Incorporates the tendency that the knowledge base is likely to include popular entities and relations
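  11. Distant Supervision with Data Not Missing at Random (DNMAR)
      Introduces a layer of latent variables $t_r$ to handle missing cases. Example for the entity pair (Steve Jobs, Apple):
      • $x_1$: "Steve Jobs was founder of Apple.", $z_1$ = Founder
      • $x_2$: "Steve Jobs, Steve Wozniak and Ronald Wayne founded Apple.", $z_2$ = Founder
      • $x_3$: "Steve Jobs visited Apple store…", $z_3$ = visit
      • $\boldsymbol{t}$: $t_{\text{born-in}} = 0$, $t_{\text{founder}} = 1$, $t_{\text{CEO-of}} = 0$, $t_{\text{visit}} = 1$
      • $\boldsymbol{y}$: $y_{\text{born-in}} = 0$, $y_{\text{founder}} = 1$, $y_{\text{CEO-of}} = 1$, $y_{\text{visit}} = 0$
      Here CEO-of is missing in text and visit is missing in the DB.
      Introduces a new factor:
      $$\phi^{\text{miss}}(y_r, t_r) = \begin{cases} -\alpha_{\text{MIT}} & (y_r = 1 \wedge t_r = 0) \quad \text{(missing in text)} \\ -\alpha_{\text{MID}} & (y_r = 0 \wedge t_r = 1) \quad \text{(missing in DB)} \\ 0 & \text{(otherwise)} \end{cases}$$
      • Relaxes the two hard constraints in MultiR into soft ones with penalty factors $-\alpha_{\text{MIT}}$ and $-\alpha_{\text{MID}}$
      • The training algorithm is the same as the one used in MultiR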
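The missing-data factor on slide 11 is a three-way case split, sketched here for the slide's example. The function names and the penalty values 3.0 and 7.0 are made up for illustration; the slides do not give concrete values for $\alpha_{\text{MIT}}$ and $\alpha_{\text{MID}}$.

```python
def phi_miss(y_r, t_r, alpha_mit, alpha_mid):
    """Soft penalty for a disagreement between the KB (y_r) and the text (t_r)."""
    if y_r == 1 and t_r == 0:
        return -alpha_mit  # fact in the KB but missing in text
    if y_r == 0 and t_r == 1:
        return -alpha_mid  # fact in the text but missing in the DB
    return 0.0

def total_miss(y, t, alpha_mit, alpha_mid):
    """Sum the factor over all relations of one entity pair."""
    return sum(phi_miss(y[r], t[r], alpha_mit, alpha_mid) for r in y)

# Values of y and t from the (Steve Jobs, Apple) example on the slide.
y = {"born-in": 0, "founder": 1, "CEO-of": 1, "visit": 0}
t = {"born-in": 0, "founder": 1, "CEO-of": 0, "visit": 1}

# Hypothetical penalties: CEO-of costs 3.0 (MIT), visit costs 7.0 (MID).
print(total_miss(y, t, alpha_mit=3.0, alpha_mid=7.0))  # -10.0
```

With very large penalties the factor behaves like MultiR's hard constraints; smaller values let the model tolerate the two kinds of missing data.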
  12. Constrained inference: $\arg\max_{\boldsymbol{z}} p(\boldsymbol{z} \mid \boldsymbol{x}, \boldsymbol{y})$
      For the entity pair (Steve Jobs, Apple), $\boldsymbol{y}$ is given ($y_{\text{born-in}} = 0$, $y_{\text{founder}} = 1$, $y_{\text{CEO-of}} = 1$, $y_{\text{visit}} = 0$); $\boldsymbol{z}$ and $\boldsymbol{t}$ are unknown.
      $$\boldsymbol{z}^* = \arg\max_{\boldsymbol{z}} \sum_{i=1}^{n} \log \Phi^{\text{extract}}(z_i, x_i) - \sum_r \left[ \alpha_{\text{MIT}} \cdot 1(y_r \wedge \neg\exists i: r = z_i) + \alpha_{\text{MID}} \cdot 1(\neg y_r \wedge \exists i: r = z_i) \right]$$
      where $\log \Phi^{\text{extract}}(z_i, x_i) = \sum_j \theta_j \phi_j(z_i, x_i)$ and $t_r = 1$ exactly when $\exists i: r = z_i$.
      • The inference became more challenging
      • A* search can find an exact solution, but does not scale to many variables
      • Present a greedy hill-climbing approach for the inference:
        1. Initialize $z_i$ at random
        2. Obtain the neighborhood of the current solution
        3. Move to the neighbor yielding the highest score
        4. Repeat this process
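The four hill-climbing steps can be sketched as follows. Several things here are assumptions made for the example: the neighborhood is "change the label of one sentence", the per-sentence scores are invented (the slide gives none for this example), and the penalties 2.0 and 4.0 are arbitrary. The score function adds per-sentence scores and subtracts a penalty for each relation that is missing in text or missing in the DB.

```python
import random

RELATIONS = ["born-in", "founder", "CEO-of", "visit"]
Y = {"born-in": 0, "founder": 1, "CEO-of": 1, "visit": 0}

# Hypothetical log-scores of each relation for sentences x_1..x_3.
SENT_SCORES = [
    {"born-in": 0.5, "founder": 16.0, "CEO-of": 9.0, "visit": 0.1},
    {"born-in": 8.0, "founder": 11.0, "CEO-of": 6.0, "visit": 0.1},
    {"born-in": 0.2, "founder": 3.0, "CEO-of": 2.0, "visit": 9.0},
]

def score(z, y, sent_scores, relations, a_mit, a_mid):
    """Sentence scores minus penalties for missing-in-text and missing-in-DB."""
    total = sum(sent_scores[i][r] for i, r in enumerate(z))
    for r in relations:
        mentioned = r in z  # plays the role of t_r
        if y[r] and not mentioned:
            total -= a_mit
        elif not y[r] and mentioned:
            total -= a_mid
    return total

def hill_climb(y, sent_scores, relations, a_mit, a_mid, rng, max_iters=100):
    """Greedy hill climbing over z; stops at a local optimum."""
    z = [rng.choice(relations) for _ in sent_scores]       # step 1
    current = score(z, y, sent_scores, relations, a_mit, a_mid)
    for _ in range(max_iters):                             # step 4
        best_z, best = None, current
        for i in range(len(z)):                            # step 2
            for r in relations:
                if r == z[i]:
                    continue
                cand = z[:i] + [r] + z[i + 1:]
                s = score(cand, y, sent_scores, relations, a_mit, a_mid)
                if s > best:
                    best_z, best = cand, s
        if best_z is None:
            break  # no neighbor improves the score
        z, current = best_z, best                          # step 3
    return z, current

z, s = hill_climb(Y, SENT_SCORES, RELATIONS, 2.0, 4.0, random.Random(0))
print(z, s)
```

Because the search only stops at a local optimum, restarting from several random initializations is a common way to reduce the risk of a poor solution.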
  13. Incorporating popularity in KB
      • We tune the penalty factors $\alpha_{\text{MIT}}$ and $\alpha_{\text{MID}}$ on a development set
      • We can take into account how likely each fact is to be observed in the text and the knowledge base
        • Facts about Barack Obama are likely to exist
        • Facts about Naoaki Okazaki are unlikely to exist
      • Control the penalty factor for each entity pair
      • Popularity of entities: the missing-in-DB penalty becomes $-\alpha_{\text{MID}}^{(e_1, e_2)} = -\gamma \min(c(e_1), c(e_2))$, where $c(e)$ measures the popularity of entity $e$
        • A larger penalty if the model predicts that a fact about a popular entity does not exist in the KB
      • Well-aligned relations: assign 3 kinds of values to $\alpha_{\text{MIT}}^{r}$
        • A larger penalty if a popular relation such as contains, place_lived, or nationality does not exist in text
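The entity-popularity penalty is a one-line formula, sketched here with invented numbers. The counts in `POPULARITY`, the value of `gamma`, and the function name are all hypothetical; the slide only gives the functional form $\gamma \min(c(e_1), c(e_2))$.

```python
# Hypothetical popularity counts c(e), e.g. how often each entity appears in the KB.
POPULARITY = {"Barack Obama": 1000, "Hawaii": 200, "Naoaki Okazaki": 2}

def missing_in_db_penalty(e1, e2, counts, gamma):
    """Penalty magnitude for a fact about (e1, e2) that is missing in the DB.

    The less popular entity of the pair bounds the penalty, so a fact
    involving an obscure entity is cheap to declare missing from the KB.
    """
    return gamma * min(counts[e1], counts[e2])

print(missing_in_db_penalty("Barack Obama", "Hawaii", POPULARITY, gamma=0.5))          # 100.0
print(missing_in_db_penalty("Naoaki Okazaki", "Hawaii", POPULARITY, gamma=0.5))        # 1.0
```

Taking the minimum means both entities must be popular before the model is strongly discouraged from predicting a fact that the KB lacks.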
  14. Experiments
      • Binary relation extraction
        • The standard setting (Riedel+, 10)
          • Knowledge base: Freebase relations
          • Text corpus: 1.8 million New York Times articles
        • Two kinds of evaluation
          • Sentence-level extractions using the dataset of (Hoffmann+, 11)
          • Holdout evaluation on Freebase knowledge
      • Unary relation extraction (NE categorization)
        • Twitter NE categorization dataset (Ritter+, 11)
          • Knowledge base: Freebase (instances and their categories)
          • Text corpus: tweets
        • Holdout evaluation
  15. Results
      • 17% increase in area under the curve
      • Incorporating popularity yielded a 27% increase over the baseline
      • This evaluation underestimates precision because many facts correctly extracted from text are missing in the database
      • DNMAR doubled the recall
      Ritter et al. (2013) Modeling Missing Data in Distant Supervision for Information Extraction, TACL(1), 367-378.
  16. Conclusion
      • Investigated the problem of missing data in distant supervision
      • Presented an extension of MultiR to handle missing data
        • Could incorporate the popularity of facts to be included in the knowledge base and text
      • Presented a scalable inference algorithm based on greedy hill-climbing
      • Demonstrated the effectiveness of the modeling
  17. References
      • Raphael Hoffmann, Congle Zhang, Xiao Ling, Luke Zettlemoyer, Daniel S. Weld. (2011) Knowledge-Based Weak Supervision for Information Extraction of Overlapping Relations. ACL-2011, pages 541–550.
        • Slides and code
      • Mike Mintz, Steven Bills, Rion Snow, Dan Jurafsky. (2009) Distant supervision for relation extraction without labeled data. ACL-2009, pages 1003–1011.
