Distant supervision for relation extraction without labeled data
Mike Mintz, Steven Bills, Rion Snow, Dan Jurafsky

ACL 2009
Introduced by Makoto Morishita
Contribution of this paper
• Proposed “distant supervision” for the first time.
• With distant supervision, relations between entities can be extracted from sentences without manual annotation work.
Current training methods
• Supervised learning
• Unsupervised learning
• Self-training
• Active learning
Supervised learning
• Use only annotated data to train a model.
• Creating the annotated data is costly.
(Diagram: annotated data)
Unsupervised learning
• Use only unannotated data.
• The result may not be suitable for some purposes.
(Diagram: unannotated data)
Self-training
• Use annotated data as the seed to train a model, then let the model annotate the unlabeled data itself.
• Precision may be low, and the result may be biased toward the seed annotations.
(Diagram: unannotated data, annotated data)
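The self-training loop can be sketched as follows. This is a minimal illustration and not taken from the paper: the one-dimensional nearest-centroid classifier, the confidence margin, and the names `train`, `predict`, and `self_train` are all assumptions chosen to keep the example self-contained.

```python
from statistics import mean

def train(labeled):
    """Fit a nearest-centroid model: one mean value per label."""
    by_label = {}
    for x, y in labeled:
        by_label.setdefault(y, []).append(x)
    return {y: mean(xs) for y, xs in by_label.items()}

def predict(model, x):
    """Return (label, margin); margin is the distance gap between the two nearest centroids."""
    ranked = sorted(model.items(), key=lambda kv: abs(x - kv[1]))
    best_label, best_centroid = ranked[0]
    second_centroid = ranked[1][1]
    return best_label, abs(x - second_centroid) - abs(x - best_centroid)

def self_train(seed, unlabeled, min_margin=1.0, rounds=5):
    """Grow the labeled set with the model's own confident predictions."""
    labeled = list(seed)
    pool = list(unlabeled)
    for _ in range(rounds):
        model = train(labeled)
        confident, rest = [], []
        for x in pool:
            label, margin = predict(model, x)
            (confident if margin >= min_margin else rest).append((x, label))
        if not confident:
            break  # nothing new was labeled, so further rounds change nothing
        labeled.extend(confident)
        pool = [x for x, _ in rest]
    return train(labeled), labeled

# Hypothetical seed (two hand-labeled points) and unlabeled pool.
model, labeled = self_train([(0.0, "neg"), (10.0, "pos")], [1.0, 9.0, 4.0, 6.0])
```

Every label added after the seed comes from the model itself, which is where the bias toward the seed annotations enters.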
Active learning
• Use the existing model to evaluate which data should be annotated next, then annotate the selected data.
(Diagram: unannotated data, annotated data; steps: evaluate, annotate)
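One common way to "evaluate which data to annotate next" is uncertainty sampling: pick the items the current model is least sure about. The sketch below assumes this strategy (the slide does not name one), and the toy logistic model with its weight and bias is hypothetical.

```python
import math

def prob_positive(x, weight=1.0, bias=-5.0):
    """Toy 'existing model': a logistic function of one feature (assumed parameters)."""
    return 1.0 / (1.0 + math.exp(-(weight * x + bias)))

def select_for_annotation(pool, k=2):
    """Return the k items whose predicted probability is closest to 0.5."""
    return sorted(pool, key=lambda x: abs(prob_positive(x) - 0.5))[:k]

pool = [0.0, 4.5, 5.2, 9.0]
to_annotate = select_for_annotation(pool)  # these items go to a human annotator
```

Items far from the decision boundary (0.0 and 9.0 here) are left unannotated, which is how the annotation cost is kept down.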
Distant supervision
• Use an existing database together with unannotated data to train a classifier, then use the classifier to annotate new data.
(Diagram: the existing database and unannotated data train the classifier; the classifier annotates unannotated data)
In this paper…
What we want to do
• Extract the relation between entities from sentences.
• e.g.
sentence: Kyoto, a famous place in Japan.
entity: Japan, Kyoto
relation: location-contains <Japan, Kyoto>
In this work…
• Freebase: 102 relations, 940k entities, 1.8M instances.
(Diagram: Freebase and unannotated Wikipedia data train a multiclass logistic regression classifier; the classifier annotates unannotated Wikipedia data)
Training
• Find sentences that contain the two entities of a known relation.
- Such a sentence tends to express the relation.
- Entities are found by a named entity tagger.
• Train the classifier.
- The features are explained later.
Example
• Known relations:
location-contains <Virginia, Richmond>
location-contains <France, Nantes>
• Sentences found:
- Richmond, the capital of Virginia.
- Henry’s Edict of Nantes helped the Protestants of France.
• Train the classifier using these sentences.
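The labeling step in this example can be sketched in a few lines. This is an illustration under stated assumptions, not the authors' code: plain substring matching stands in for the named entity tagger, and the function name `label_sentences` is hypothetical.

```python
# Known relations from the slide: (relation, entity1, entity2).
KNOWN_RELATIONS = [
    ("location-contains", "Virginia", "Richmond"),
    ("location-contains", "France", "Nantes"),
]

SENTENCES = [
    "Richmond, the capital of Virginia.",
    "Henry's Edict of Nantes helped the Protestants of France.",
    "Astronomer Edwin Hubble was born in Marshfield, Missouri.",
]

def label_sentences(sentences, known_relations):
    """Pair each sentence with every known relation whose two entities it mentions."""
    examples = []
    for sentence in sentences:
        for relation, e1, e2 in known_relations:
            # Assumption: substring matching replaces a real named entity tagger.
            if e1 in sentence and e2 in sentence:
                examples.append((sentence, e1, e2, relation))
    return examples

training_examples = label_sentences(SENTENCES, KNOWN_RELATIONS)
```

The third sentence yields no training example because none of its entity pairs appears in the known relations. The labels are noisy by construction: the Edict of Nantes sentence mentions both entities without clearly stating that France contains Nantes.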
Testing
• Find sentences that contain two entities.
- Such a sentence tends to express a relation.
- Entities are found by a named entity tagger.
• Using the trained classifier, predict the relation between these entities.
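At test time the candidates are entity pairs that co-occur in a sentence. A minimal sketch of collecting them, assuming the named entity tagger's output is already available as a list of entity strings per sentence (the input format and the name `candidate_pairs` are hypothetical):

```python
from collections import defaultdict
from itertools import combinations

def candidate_pairs(tagged_sentences):
    """Map each co-occurring entity pair to the sentences that mention both."""
    mentions = defaultdict(list)
    for sentence, entities in tagged_sentences:
        # Entities are assumed to be listed in order of appearance.
        for e1, e2 in combinations(entities, 2):
            mentions[(e1, e2)].append(sentence)
    return dict(mentions)

tagged = [
    ("Astronomer Edwin Hubble was born in Marshfield, Missouri.",
     ["Edwin Hubble", "Marshfield", "Missouri"]),
]
pairs = candidate_pairs(tagged)
```

Each key of `pairs` is one candidate that the trained classifier would then score.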
Features
• Lexical features:
- specific words between and surrounding the two entities in the sentence.
• Syntactic features:
- the dependency path between the two entities.
Lexical features
• The sequence of words between the two entities.
• The part-of-speech tags of these words.
• A flag indicating which entity came first in the sentence.
• A window of k words to the left of Entity 1 and their part-of-speech tags.
• A window of k words to the right of Entity 2 and their part-of-speech tags.
Astronomer Edwin Hubble was born in Marshfield, Missouri.
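The lexical feature parts listed above can be computed for this sentence roughly as follows. This is a sketch under assumptions: the part-of-speech tags are assigned by hand with a coarse tag set, the windows are taken around whichever entity appears earlier or later in the sentence, and the function name and dictionary keys are hypothetical.

```python
TOKENS = ["Astronomer", "Edwin", "Hubble", "was", "born", "in",
          "Marshfield", ",", "Missouri", "."]
# Hand-assigned, illustrative tags; a real system would run a POS tagger.
TAGS = ["NOUN", "NOUN", "NOUN", "VERB", "VERB", "ADP",
        "NOUN", "PUNCT", "NOUN", "PUNCT"]

def lexical_features(tokens, tags, span1, span2, k=1):
    """Feature parts for two entity spans given as (start, end), end exclusive."""
    first, second = sorted([span1, span2])
    tagged = list(zip(tokens, tags))
    return {
        "entity1_first": span1 < span2,                        # order flag
        "left_window": tagged[max(0, first[0] - k):first[0]],  # k words before the earlier entity
        "between": tagged[first[1]:second[0]],                 # words between the entities
        "right_window": tagged[second[1]:second[1] + k],       # k words after the later entity
    }

# Entity 1 = "Edwin Hubble" (tokens 1-2), Entity 2 = "Marshfield" (token 6).
feats = lexical_features(TOKENS, TAGS, (1, 3), (6, 7), k=1)
```

Here `feats["between"]` holds "was born in" with its tags, the left window holds "Astronomer", and the right window holds the comma.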
Syntactic features
• A dependency path between the two entities.
• For each entity, one “window” node that is not part of the dependency path.
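A dependency path and its window nodes can be found with a breadth-first search over the parse. The sketch below uses hand-written, unlabeled edges for the Hubble sentence; they are illustrative only and not the output of the parser used in the paper, and the function names are hypothetical.

```python
from collections import deque

# Hand-written (head, dependent) edges; illustrative, not real parser output.
EDGES = [
    ("born", "Hubble"), ("born", "was"), ("born", "in"),
    ("in", "Marshfield"), ("Hubble", "Astronomer"),
    ("Hubble", "Edwin"), ("Marshfield", "Missouri"),
]

def dependency_path(edges, source, target):
    """Shortest path between two words, treating dependency edges as undirected."""
    neighbors = {}
    for head, dep in edges:
        neighbors.setdefault(head, []).append(dep)
        neighbors.setdefault(dep, []).append(head)
    queue, seen = deque([[source]]), {source}
    while queue:
        path = queue.popleft()
        if path[-1] == target:
            return path
        for nxt in neighbors.get(path[-1], []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(path + [nxt])
    return None  # the two words are not connected

def window_nodes(edges, path, entity):
    """Nodes attached to an entity that are not on the dependency path."""
    on_path = set(path)
    attached = [b if a == entity else a for a, b in edges if entity in (a, b)]
    return [node for node in attached if node not in on_path]

path = dependency_path(EDGES, "Hubble", "Marshfield")
left_windows = window_nodes(EDGES, path, "Hubble")
right_windows = window_nodes(EDGES, path, "Marshfield")
```

With these edges the path runs Hubble, born, in, Marshfield; "Astronomer" and "Edwin" are window candidates for the first entity and "Missouri" for the second.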
Result
Trained features
Automatic evaluation
Human evaluation
Conclusion
• With this method, relations can be extracted from unlabeled text.
• Because the labels come from a database, the extracted relations fit that database.
• The extracted relations appear to be accurate.
Example usage of distant supervision
Existing database | Target annotation
Freebase (relation between entities) | Wikipedia sentences (find new relations)
Emoticon | Tweet (annotate positive, negative)
Dependency parse tree, knowledge base | semantic parser
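The emoticon example works the same way as the Freebase one: an existing signal supplies labels for otherwise unannotated text. A minimal sketch, assuming a small hand-picked emoticon list and a hypothetical function name `label_tweet`:

```python
POSITIVE = (":)", ":-)", ":D")   # assumed emoticon lists, not from the slides
NEGATIVE = (":(", ":-(")

def label_tweet(text):
    """Return (text without emoticons, label), or None if no usable signal."""
    has_pos = any(e in text for e in POSITIVE)
    has_neg = any(e in text for e in NEGATIVE)
    if has_pos == has_neg:
        return None  # no emoticon at all, or conflicting emoticons
    label = "positive" if has_pos else "negative"
    for emoticon in POSITIVE + NEGATIVE:
        text = text.replace(emoticon, "")
    return text.strip(), label

examples = [label_tweet(t) for t in
            ["Great game today :)", "missed my train :(", "no emoticons here"]]
```

The emoticon is stripped from the text so that a classifier trained on these examples has to rely on the remaining words rather than on the labeling signal itself.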
Comments
• Distant supervision can be useful for other tasks.
- Currently, the method is used mainly for relation extraction.
• However, it assumes that a large database already exists.
END
