Extreme Extraction – Machine Reading in a Week (Presentation Transcript)
[Freedman+ EMNLP11] Extreme Extraction – Machine Reading in a Week
23 Dec 2011, Nakatani Shuyo @ Cybozu Labs, Inc. (twitter: @shuyo)
Abstract
• Target:
  – Rapid construction of a concept and relation extraction system
• Method:
  – Extend an existing ACE system to new relations
  – in a short time with minimal training data
    • in a week (<50 person-hours) with <20 example pairs
  – Evaluate with a question-answering task
Phases
1. Ontology and resources
2. Extending the system for the new ontology
3. Extracting relations
4. Evaluation
1. Ontology and resources
• possibleTreatment(Substance, Condition)
  – SSRIs(S) are effective treatments for depression(C)
• expectedDateOnMarket(Substance, Date)
  – More drugs for type 2(S) expected on market soon(D)
• responsibleForTreatment(Substance, Agent)
  – Officials(A) Responsible for Treatment of War Dead(S)
• studiesDisease(Agent, Condition) (not sure)
  – cancer(C) researcher Dr. Henri Joyeux(A)
• hasSideEffect(Substance, Condition)
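The five relations above form the target ontology. As a minimal sketch (the representation is mine, not the paper's), they can be written as typed relation schemas, where each relation maps its argument slots to classes:

```python
# Sketch of the target ontology as typed relation schemas.
# The dict form and make_fact helper are illustrative assumptions.
ONTOLOGY = {
    "possibleTreatment":       ("Substance", "Condition"),
    "expectedDateOnMarket":    ("Substance", "Date"),
    "responsibleForTreatment": ("Substance", "Agent"),
    "studiesDisease":          ("Agent", "Condition"),
    "hasSideEffect":           ("Substance", "Condition"),
}

def make_fact(relation, *args):
    """Build a relation instance, checking arity against the schema."""
    slots = ONTOLOGY[relation]
    if len(args) != len(slots):
        raise ValueError(f"{relation} expects {len(slots)} arguments")
    return (relation, *args)

# "SSRIs(S) are effective treatments for depression(C)"
fact = make_fact("possibleTreatment", "SSRIs", "depression")
```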
2. Extending the system for the new ontology
• Add new relation/class detectors into "our" extraction system for the ACE task
  – Details of the system are not clear...
  • Class detectors with unsupervised word clustering
  • Bootstrap relation learner with a template and seeds
  • Pattern learning for relation extraction
• Annotate words for 4 classes
• Coreference
Bootstrap relation learner
• DAP (Double-Anchored Pattern) (Kozareva+ 08)
  – Web search with a query based on "<CLASS> such as <SEED> and *"
  – Add the words at the "*" position in each snippet to the class members as new seeds
  – Repeat the bootstrapping loop while new seeds are available
Relation detection with DAP
• CLASS = disease / SEED = cold
• Web search = "disease such as cold and"
  – disease such as cold and flu (9). ...
  – disease such as cold and heat, external ...
  – disease such as cold and pneumonia. ...
  – disease (such as cold and hot diseases), ...
  – disease such as cold and flu viruses. ...
  – disease such as cold and food poisoning. ...
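The DAP loop above can be sketched in a few lines. Since a live web search is out of scope, the snippets are stubbed with a tiny in-memory list (the snippet contents and function names here are assumptions for illustration, not the paper's code):

```python
import re

# Hedged sketch of the Double-Anchored Pattern (DAP) bootstrap loop
# [Kozareva+ 08]: query "<CLASS> such as <SEED> and *", harvest the word
# in the "*" slot, and recurse with the harvested words as new seeds.
SNIPPETS = [  # stand-in for web search snippets
    "disease such as cold and flu",
    "disease such as cold and pneumonia",
    "disease such as flu and measles",
]

def dap_bootstrap(cls, seed):
    members, frontier = {seed}, [seed]
    while frontier:  # repeat while new seeds are available
        current = frontier.pop()
        pattern = re.compile(rf"{cls} such as {re.escape(current)} and (\w+)")
        for snippet in SNIPPETS:
            m = pattern.search(snippet)
            if m and m.group(1) not in members:
                members.add(m.group(1))   # word at the "*" position
                frontier.append(m.group(1))
    return members

print(dap_bootstrap("disease", "cold"))
```

Starting from the single seed "cold", the loop harvests "flu" and "pneumonia" on the first pass, then "measles" via the new seed "flu".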
Four classes to annotate
• Substance-Name
  – medicine name
• Substance-Description
  – e.g. "new drugs"
• Condition-Name
  – name of a disease
• Condition-Description
  – e.g. "the illness"
Annotation
• Name tagging with active learning (Miller+ 04)
  – Unsupervised word clustering on a binary tree (Brown+ 90)
  – Tagging with clustering information
    • Averaged Perceptron (Collins 02)
  – Request annotation for selected sentences based on a "confidence score"
    • score = (highest perceptron score) - (second-highest one) !?
Results of Class Detection (table from [Freedman+ 11])
• What's GS (Gold Standard)?
• substances & conditions
  – -Name / -Description respectively
• without/with lists of known substances and conditions
Coreference
• It took the most time (20 of 43 hours)
• But its details are not clear...
  – domain-independent heuristics
  – appositive linking
3. Extracting relations
• Learned Patterns vs. Handwritten Patterns (table from [Freedman+ 11])
(figure from [Freedman+ 11])
4. Evaluation
• Question answering with the extracted information
• Query examples
  – Find possible treatments for diabetes
  – What is the expected date on market for Abilify?
Answer Example
• ACME produces a wide range of drugs including treatments for malaria and athlete's foot
  – responsibleForTreatment(drugs, ACME)
  – possibleTreatment(drugs, malaria)
  – possibleTreatment(drugs, athlete's foot)
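Queries like "Find possible treatments for X" are then answered by matching against the extracted tuples. A hypothetical sketch over the example sentence above (the tuple store and function name are illustrative assumptions):

```python
# Tuples extracted from the ACME example sentence.
FACTS = [
    ("responsibleForTreatment", "drugs", "ACME"),
    ("possibleTreatment", "drugs", "malaria"),
    ("possibleTreatment", "drugs", "athlete's foot"),
]

def find_possible_treatments(condition):
    """Answer 'Find possible treatments for <condition>'."""
    return [subst for rel, subst, cond in FACTS
            if rel == "possibleTreatment" and cond == condition]

print(find_possible_treatments("malaria"))  # -> ['drugs']
```

Note that the answer is only the vague description "drugs"; this is why the evaluation distinguishes useful from non-useful answers below.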
(figure from [Freedman+ 11])
• useful = answers a complex query
When non-useful answers are removed (figure from [Freedman+ 11])
• annotator's recall (A)
• combining both (C)
• only handwritten rules (H, HW)
• only learned patterns (L)
(figure from [Freedman+ 11])
Discussion (from [Freedman+ 11])
Conclusions
• The combined system achieves an F1 of 0.51 in a new domain in a week.
• It requires very little training data.
• The effectiveness of the learning algorithms is still not competitive with handwritten patterns.
References
• [Freedman+ 11] Extreme Extraction – Machine Reading in a Week
• [Kozareva+ 08] Semantic Class Learning from the Web with Hyponym Pattern Linkage Graphs
• [Miller+ 04] Name Tagging with Word Clusters and Discriminative Training
• [Brown+ 90] Class-based n-gram Models of Natural Language
• [Collins 02] Discriminative Training Methods for Hidden Markov Models: Theory and Experiments with Perceptron Algorithms