[Freedman+ EMNLP11] Extreme Extraction – Machine Reading in a Week

                23 Dec 2011
      Nakatani Shuyo @ Cybozu labs, Inc
               twitter : @shuyo
Abstract
• Target:
  – Rapid construction of concept and relation
    extraction system
• Method:
  – Extend an existing ACE system to new relations
  – in a short time with minimal training data
     • in a week (<50 person hours) with <20 example pairs
  – Evaluate by question answering task
Phases
1. Ontology and resources
2. Extending system for new ontology
3. Extracting relations
4. Evaluation
1. Ontology and resources
• possibleTreatment( Substance, Condition )
   – SSRIs(S) are effective treatments for depression(C)
• expectedDateOnMarket( Substance , Date )
   – More drugs for type 2(S) expected on market soon(D)
• responsibleForTreatment( Substance, Agent )
   – Officials(A) Responsible for Treatment of War Dead(S)
• studiesDisease( Agent , Condition ) (not sure)
   – cancer(C) researcher Dr. Henri Joyeux(A)
• hasSideEffect( Substance, Condition )
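
The five relations above can be written down as typed signatures. The following is a minimal sketch, not the representation used in [Freedman+ 11]: the names `ONTOLOGY`, `Mention`, `RelationInstance` and `is_well_typed` are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass

# Relation signatures as listed on the slide:
# relation name -> (type of first argument, type of second argument)
ONTOLOGY = {
    "possibleTreatment": ("Substance", "Condition"),
    "expectedDateOnMarket": ("Substance", "Date"),
    "responsibleForTreatment": ("Substance", "Agent"),
    "studiesDisease": ("Agent", "Condition"),
    "hasSideEffect": ("Substance", "Condition"),
}


@dataclass(frozen=True)
class Mention:
    """A text span with an argument type (hypothetical helper type)."""
    text: str
    type: str


@dataclass(frozen=True)
class RelationInstance:
    """One extracted relation between two mentions (hypothetical helper type)."""
    relation: str
    arg1: Mention
    arg2: Mention


def is_well_typed(inst: RelationInstance) -> bool:
    """Return True if the argument types match the relation's signature."""
    expected = ONTOLOGY.get(inst.relation)
    return expected == (inst.arg1.type, inst.arg2.type)


# Example from the slide: SSRIs(S) are effective treatments for depression(C)
example = RelationInstance(
    "possibleTreatment",
    Mention("SSRIs", "Substance"),
    Mention("depression", "Condition"),
)
```

A check like `is_well_typed(example)` is one simple way to reject extractions whose arguments are swapped or of the wrong class.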
2. Extending system for new ontology
• Add new relation/class detectors into “our”
  extraction system for ACE task
  – Details of the system are not clear...
     • Class detectors with unsupervised word clustering
     • Bootstrap relation learner with a template and seeds
     • Pattern learning for relation extraction

• Annotate words for 4 classes
• Coreference
Bootstrap relation learner
• DAP(Double-Anchored Pattern) (Kozareva+ 08)
  – Web search with a query based on “<CLASS>
    such as <SEED> and *”
  – Add the words found at the “*” position in each snippet
    to the class members as new seeds
  – Repeat “the bootstrapping loop” while seeds are
    available
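
The loop described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the implementation from (Kozareva+ 08) or [Freedman+ 11]: the web search is replaced by a caller-supplied `search` function, `max_queries` is an added safety limit, and `FAKE_WEB` is made-up data so the sketch runs offline.

```python
import re
from collections import deque
from typing import Callable, Iterable, Set


def dap_bootstrap(class_name: str, seed: str,
                  search: Callable[[str], Iterable[str]],
                  max_queries: int = 50) -> Set[str]:
    """Grow a set of class members from one seed with "<CLASS> such as <SEED> and *"."""
    members = {seed}
    queue = deque([seed])  # seeds that have not been used in a query yet
    queries = 0
    while queue and queries < max_queries:
        current = queue.popleft()
        query = f"{class_name} such as {current} and"
        queries += 1
        # The word right after the query text fills the "*" position.
        pattern = re.compile(re.escape(query) + r"\s+([A-Za-z][A-Za-z-]*)",
                             re.IGNORECASE)
        for snippet in search(f'"{query}"'):
            for match in pattern.finditer(snippet):
                word = match.group(1).lower()
                if word not in members:
                    members.add(word)
                    queue.append(word)  # new member becomes a new seed
    return members


# Hypothetical stand-in for a web search engine (made-up snippets).
FAKE_WEB = {
    '"disease such as cold and"': [
        "disease such as cold and flu (9).",
        "disease such as cold and pneumonia.",
    ],
    '"disease such as flu and"': [
        "a disease such as flu and measles spreads quickly",
    ],
}


def fake_search(query: str) -> Iterable[str]:
    return FAKE_WEB.get(query, [])


result = dap_bootstrap("disease", "cold", fake_search)
```

With the made-up snippets, the seed "cold" yields "flu" and "pneumonia", and "flu" in turn yields "measles"; the loop stops once no unused seeds remain.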
Relation detection with DAP
• CLASS = disease / SEED = cold
• Web search = “disease such as cold and”
  – disease such as cold and flu (9). ...
  – disease such as cold and heat, external ...
  – disease such as cold and pneumonia. ...
  – disease (such as cold and hot diseases), ...
  – disease such as cold and flu viruses. ...
  – disease such as cold and food poisoning. ...
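
As a rough illustration of how candidates could be pulled out of the snippets above, the sketch below takes the first word after "such as cold and" and counts it. Counting is a simplification chosen here and is not claimed to be the ranking used in (Kozareva+ 08); the function name `candidates_after_seed` is hypothetical.

```python
import re
from collections import Counter
from typing import Iterable

# The six snippets shown on the slide.
SNIPPETS = [
    "disease such as cold and flu (9). ...",
    "disease such as cold and heat, external ...",
    "disease such as cold and pneumonia. ...",
    "disease (such as cold and hot diseases), ...",
    "disease such as cold and flu viruses. ...",
    "disease such as cold and food poisoning. ...",
]


def candidates_after_seed(snippets: Iterable[str], seed: str) -> Counter:
    """Count the first word that follows 'such as <seed> and' in each snippet."""
    pattern = re.compile(r"such as " + re.escape(seed) + r" and\s+([A-Za-z]+)")
    counts: Counter = Counter()
    for snippet in snippets:
        for match in pattern.finditer(snippet):
            counts[match.group(1).lower()] += 1
    return counts


counts = candidates_after_seed(SNIPPETS, "cold")
```

On these snippets the single-word heuristic picks up noise such as "heat", "hot" and "food" (from "food poisoning") next to "flu" and "pneumonia", which suggests why some filtering or ranking of candidates is needed.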
Four classes to annotate
• Substance-Name
  – medicine name
• Substance-Description
  – e.g. “new drugs”
• Condition-Name
  – name of disease
• Condition-Description
  – e.g. “the illness”
Annotation
• Name tagging with active learning(Miller+ 04)
  – Unsupervised word clustering on binary tree
    (Brown+ 90)
  – Tagging with clustering information
     • Averaged Perceptron (Collins 02)

  – Request annotation for selected sentence based on
    “confidence score”
     • score = (highest perceptron score) - (second-highest score) !?
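
The confidence score above is a margin between the two best scores. A minimal sketch of margin-based selection follows, assuming each sentence comes with perceptron scores for its candidate labelings; the data layout and the names `confidence_margin` and `select_for_annotation` are assumptions for illustration, not taken from (Miller+ 04).

```python
from typing import Dict, List


def confidence_margin(label_scores: Dict[str, float]) -> float:
    """score = (highest score) - (second-highest score)."""
    ranked = sorted(label_scores.values(), reverse=True)
    if len(ranked) < 2:
        raise ValueError("need at least two candidate scores")
    return ranked[0] - ranked[1]


def select_for_annotation(candidates: Dict[str, Dict[str, float]],
                          k: int = 1) -> List[str]:
    """Return the k sentences with the smallest margin (least confident first)."""
    margins = {sent: confidence_margin(scores)
               for sent, scores in candidates.items()}
    return sorted(margins, key=lambda sent: margins[sent])[:k]


# Hypothetical scores for two sentences and two candidate labelings each.
candidates = {
    "Aspirin relieves headaches.": {"labeling_a": 9.0, "labeling_b": 2.0},
    "New drugs target the illness.": {"labeling_a": 4.1, "labeling_b": 3.9},
}
picked = select_for_annotation(candidates, k=1)
```

A small margin means the tagger can barely separate its top two choices, so that sentence is the one sent to the annotator.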
Results of Class Detection
(table from [Freedman+ 11]; presenter's question: what is GS (Gold Standard)?)
• substances & conditions
   – -Name / -Description respectively
• without/with lists of known substances and conditions
Coreference
• It took the most time (20 of 43 hours)
• But the details are not clear...
  – domain independent heuristics
  – appositive linking
3. Extracting relations
• Learned Patterns vs. Handwritten Patterns
(figures from [Freedman+ 11])
4. Evaluation
• Question Answering with extracted
  information


• Query examples
  – Find possible treatments for diabetes
  – What is the expected date to market for Abilify?
Answer Example
• ACME produces a wide range of drugs
  including treatments for malaria and
  athletes foot
  – responsibleForTreatment(drugs, ACME)
  – possibleTreatment(drugs, malaria)
  – possibleTreatment(drugs, athletes foot)
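
One way to picture question answering over these extractions is to index the triples by relation and look arguments up. This is a sketch under assumptions: the storage format and the helpers `build_index`, `first_args` and `second_args` are hypothetical, and the paper's QA setup may differ.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

Triple = Tuple[str, str, str]  # (relation, first argument, second argument)

# The three relations extracted from the example sentence on the slide.
extracted: List[Triple] = [
    ("responsibleForTreatment", "drugs", "ACME"),
    ("possibleTreatment", "drugs", "malaria"),
    ("possibleTreatment", "drugs", "athletes foot"),
]


def build_index(triples: Iterable[Triple]) -> Dict[str, List[Tuple[str, str]]]:
    """Group argument pairs by relation name."""
    index: Dict[str, List[Tuple[str, str]]] = defaultdict(list)
    for relation, arg1, arg2 in triples:
        index[relation].append((arg1, arg2))
    return index


def first_args(index, relation: str, second: str) -> List[str]:
    """First arguments of `relation` whose second argument equals `second`."""
    return [a1 for a1, a2 in index.get(relation, []) if a2.lower() == second.lower()]


def second_args(index, relation: str, first: str) -> List[str]:
    """Second arguments of `relation` whose first argument equals `first`."""
    return [a2 for a1, a2 in index.get(relation, []) if a1.lower() == first.lower()]


index = build_index(extracted)

# "Find possible treatments for malaria"
treatments = first_args(index, "possibleTreatment", "malaria")

# A two-step query: who is responsible for a treatment for malaria?
agents = [agent
          for substance in treatments
          for agent in second_args(index, "responsibleForTreatment", substance)]
```

The second query joins two relations through the shared Substance argument, which is the kind of multi-relation question that single extractions cannot answer on their own.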
(figure from [Freedman+ 11])
• useful = answering a complex query
When non-useful answers are removed
(figure from [Freedman+ 11])
• annotator’s recall (A)
• using both combined (C)
• using only handwritten rules (H, HW)
• using only learned patterns (L)
(figure from [Freedman+ 11])
Discussion
(figure from [Freedman+ 11])
Conclusions
• The combination system can achieve
  F1 of 0.51 in a new domain in a week.
• It requires very little training data.
• The effectiveness of learning algorithms is
  still not competitive with handwritten
  patterns.
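
For reference, F1 is the harmonic mean of precision and recall. The sketch below shows the arithmetic only; the counts are hypothetical and are not results from [Freedman+ 11].

```python
from typing import Tuple


def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall; 0.0 when both are zero."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def prf_from_counts(tp: int, fp: int, fn: int) -> Tuple[float, float, float]:
    """Precision, recall and F1 from true/false positive and false negative counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall, f1_score(precision, recall)


# Hypothetical counts, for illustration only.
p, r, f = prf_from_counts(tp=6, fp=4, fn=6)
```

Because it is a harmonic mean, F1 stays close to the lower of the two values, so a system cannot reach a high F1 by having high precision alone.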
References
• [Freedman+ 11] Extreme Extraction – Machine Reading in a Week
• [Kozareva+ 08] Semantic Class Learning from the Web with Hyponym Pattern Linkage Graphs
• [Miller+ 04] Name Tagging with Word Clusters and Discriminative Training
   – [Brown+ 90] Class-based n-gram models of natural language
   – [Collins 02] Discriminative Training Methods for Hidden Markov Models: Theory and Experiments with Perceptron Algorithms
