Extreme Extraction - Machine Reading in a Week

    Presentation Transcript

    • [Freedman+ EMNLP11] Extreme Extraction – Machine Reading in a Week
      23 Dec 2011
      Nakatani Shuyo @ Cybozu Labs, Inc.
      twitter: @shuyo
    • Abstract
      • Target:
        – Rapid construction of a concept and relation extraction system
      • Method:
        – Extend an existing ACE system for new relations
        – in a short time with minimum training data
          • in a week (<50 person-hours) with <20 example pairs
        – Evaluate by a question answering task
    • Phases
      1. Ontology and resources
      2. Extending the system for the new ontology
      3. Extracting relations
      4. Evaluation
    • 1. Ontology and resources
      • possibleTreatment( Substance, Condition )
        – SSRIs(S) are effective treatments for depression(C)
      • expectedDateOnMarket( Substance, Date )
        – More drugs for type 2(S) expected on market soon(D)
      • responsibleForTreatment( Substance, Agent )
        – Officials(A) Responsible for Treatment of War Dead(S)
      • studiesDisease( Agent, Condition ) ... not sure
        – cancer(C) researcher Dr. Henri Joyeux(A)
      • hasSideEffect( Substance, Condition )
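The target ontology above is just five typed relations over four argument classes. As a minimal sketch (the tuple encoding is my assumption, not the paper's data model), it can be written down directly:

```python
from collections import namedtuple

# Each relation type is a name plus two argument classes; an extracted
# instance is a plain (relation, arg1, arg2) tuple.
Relation = namedtuple("Relation", ["name", "arg1_type", "arg2_type"])

ONTOLOGY = [
    Relation("possibleTreatment", "Substance", "Condition"),
    Relation("expectedDateOnMarket", "Substance", "Date"),
    Relation("responsibleForTreatment", "Substance", "Agent"),
    Relation("studiesDisease", "Agent", "Condition"),
    Relation("hasSideEffect", "Substance", "Condition"),
]

# From the slide's example "SSRIs(S) are effective treatments for depression(C)":
instance = ("possibleTreatment", "SSRIs", "depression")
```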
    • 2. Extending the system for the new ontology
      • Add new relation/class detectors into “our” extraction system for the ACE task
        – Details of the system are not clear...
        • Class detectors with unsupervised word clustering
        • Bootstrap relation learner with a template and seeds
        • Pattern learning for relation extraction
      • Annotate words for 4 classes
      • Coreference
    • Bootstrap relation learner
      • DAP (Double-Anchored Pattern) (Kozareva+ 08)
        – Web search with a query based on “<CLASS> such as <SEED> and *”
        – Add the words at the “*” position in snippets to the class members as new seeds
        – Repeat the bootstrapping loop while seeds are available
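The loop above can be sketched as follows; this is a minimal reading of the DAP idea, with `search` a caller-supplied function (e.g. wrapping a web search API) that returns snippet strings, and `max_rounds` an assumed safety cap:

```python
import re

def dap_bootstrap(cls, seed, search, max_rounds=10):
    """Sketch of Double-Anchored Pattern bootstrapping (Kozareva+ 08 style).

    `search` maps a query string to a list of snippet strings.
    """
    members = {seed}   # discovered class members
    frontier = [seed]  # seeds not yet expanded
    for _ in range(max_rounds):
        if not frontier:
            break  # no seeds left -> loop terminates
        s = frontier.pop()
        # Harvest the word at the "*" position of "<CLASS> such as <SEED> and *"
        pattern = re.compile(
            re.escape(cls) + " such as " + re.escape(s) + r" and (\w+)",
            re.IGNORECASE)
        for snippet in search(f'"{cls} such as {s} and"'):
            for word in pattern.findall(snippet):
                if word not in members:
                    members.add(word)
                    frontier.append(word)  # new seed for the next round
    return members
```

Each harvested word is immediately re-used as a seed, which is what lets a single seed like "cold" fan out to members that never co-occur with it directly.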
    • Relation detection with DAP
      • CLASS = disease / SEED = cold
      • Web search = “disease such as cold and”
        – disease such as cold and flu (9). ...
        – disease such as cold and heat, external ...
        – disease such as cold and pneumonia. ...
        – disease (such as cold and hot diseases), ...
        – disease such as cold and flu viruses. ...
        – disease such as cold and food poisoning. ...
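Running the harvest step over exactly these snippets shows why bootstrapping needs filtering: good hyponyms arrive mixed with noise. A small sketch (Python 3.8+ for the assignment expression):

```python
import re

# The snippets listed on the slide for the query "disease such as cold and".
snippets = [
    "disease such as cold and flu (9). ...",
    "disease such as cold and heat, external ...",
    "disease such as cold and pneumonia. ...",
    "disease (such as cold and hot diseases), ...",
    "disease such as cold and flu viruses. ...",
    "disease such as cold and food poisoning. ...",
]

# Pull out the word at the "*" position after the double anchor.
pattern = re.compile(r"disease such as cold and (\w+)")
candidates = [m.group(1) for s in snippets
              if (m := pattern.search(s)) is not None]
# Real diseases ("flu", "pneumonia") arrive alongside noise ("heat", "food"),
# so the loop needs filtering or frequency thresholds before reseeding.
```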
    • Four classes to annotate
      • Substance-Name
        – medicine name
      • Substance-Description
        – e.g. “new drugs”
      • Condition-Name
        – name of disease
      • Condition-Description
        – e.g. “the illness”
    • Annotation
      • Name tagging with active learning (Miller+ 04)
        – Unsupervised word clustering on a binary tree (Brown+ 90)
        – Tagging with clustering information
          • Averaged Perceptron (Collins 02)
        – Request annotation for sentences selected by a “confidence score”
          • score = (highest perceptron score) - (second one) !?
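The confidence score on the slide is a margin between the two best perceptron scores: a small margin means the tagger is uncertain, so that sentence is the most informative one to hand to the annotator. A minimal sketch of the selection step (function names are mine):

```python
def margin(scores):
    """Slide's score: best perceptron score minus the runner-up."""
    top, second = sorted(scores, reverse=True)[:2]
    return top - second

def next_to_annotate(sentence_scores):
    """Pick the sentence whose best tagging wins by the smallest margin.

    sentence_scores maps a sentence to the perceptron scores of its
    candidate taggings.
    """
    return min(sentence_scores, key=lambda s: margin(sentence_scores[s]))
```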
    • Results of class detection (from [Freedman+ 11])
      • What’s GS (Gold Standard)?
      • substances & conditions
        – -Name / -Description respectively
      • without/with lists of known substances and conditions
    • Coreference
      • It took the most time (20 of 43 hours)
      • But its details are not clear...
        – domain-independent heuristics
        – appositive linking
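Since the paper gives no details, here is only an illustration of what "appositive linking" could mean, not the paper's actual rule: in text like "NP1, a NP2," treat NP1 and NP2 as coreferent. Both the pattern and the crude NP approximation (capitalized runs for names, lowercase runs for descriptions) are my assumptions:

```python
import re

# Guessed heuristic, NOT the paper's rule: "NP1, a/an/the NP2," links
# NP1 (a run of capitalized tokens) to NP2 (a lowercase description).
APPOSITIVE = re.compile(
    r"([A-Z][\w.]*(?: [A-Z][\w.]*)*), (?:an?|the) ([a-z][a-z ]*?),")

def appositive_links(text):
    return [(m.group(1), m.group(2)) for m in APPOSITIVE.finditer(text)]
```

On the ontology slide's example this would link "Dr. Henri Joyeux" to "cancer researcher", connecting an Agent mention to its description.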
    • 3. Extracting relations
      • Learned patterns vs. handwritten patterns (from [Freedman+ 11])
    • from [Freedman+ 11]
    • 4. Evaluation
      • Question answering with the extracted information
      • Query examples
        – Find possible treatments for diabetes
        – What is the expected date on market for Abilify?
    • Answer example
      • “ACME produces a wide range of drugs including treatments for malaria and athletes foot”
        – responsibleForTreatment(drugs, ACME)
        – possibleTreatment(drugs, malaria)
        – possibleTreatment(drugs, athletes foot)
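With the extracted tuples in hand, answering a query like "Find possible treatments for diabetes" reduces to a lookup with one argument bound. A sketch using the facts from this slide (plus the SSRIs/depression example from the ontology slide; the storage format is an assumption):

```python
# Extracted tuples as a tiny fact database: (relation, arg1, arg2).
facts = [
    ("responsibleForTreatment", "drugs", "ACME"),
    ("possibleTreatment", "drugs", "malaria"),
    ("possibleTreatment", "drugs", "athletes foot"),
    ("possibleTreatment", "SSRIs", "depression"),  # from the ontology slide
]

def possible_treatments(condition):
    """Answer 'Find possible treatments for <condition>'."""
    return [arg1 for (rel, arg1, arg2) in facts
            if rel == "possibleTreatment" and arg2 == condition]
```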
    • from [Freedman+ 11]
      • useful = answering a complex query
    • When non-useful answers are removed (from [Freedman+ 11])
      • annotator’s recall (A)
      • combining both (C)
      • using only handwritten rules (H, HW)
      • using only learned patterns (L)
    • from [Freedman+ 11]
    • Discussion from [Freedman+ 11]
    • Conclusions
      • The combined system can achieve an F1 of 0.51 in a new domain within a week.
      • It requires very little training data.
      • The effectiveness of the learning algorithms is still not competitive with handwritten patterns.
    • References
      • [Freedman+ 11] Extreme Extraction – Machine Reading in a Week
      • [Kozareva+ 08] Semantic Class Learning from the Web with Hyponym Pattern Linkage Graphs
      • [Miller+ 04] Name Tagging with Word Clusters and Discriminative Training
        – [Brown+ 90] Class-based n-gram Models of Natural Language
        – [Collins 02] Discriminative Training Methods for Hidden Markov Models: Theory and Experiments with Perceptron Algorithms