Distant supervision for relation extraction without labeled data
Mike Mintz, Steven Bills, Rion Snow, Dan Jurafsky

ACL 2009
Introduced by Makoto Morishita
Contribution of this paper
• Proposed "distant supervision" for the first time.
• With distant supervision, we can extract relations between entities from sentences without any manual annotation.
Current training methods
• Supervised learning
• Unsupervised learning
• Self-training
• Active learning
Supervised learning
• Use only annotated data to train a model.
• Creating the annotated data is expensive.
[Diagram: annotated data]
Unsupervised learning
• Use only unannotated data.
• The results may not suit the intended purpose.
[Diagram: unannotated data]
Self-training
• Use annotated data to train a seed model, then let the model annotate the unannotated data itself.
• Precision may be low, and the model can inherit bias from the seed annotated data.
[Diagram: annotated data, unannotated data]
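A minimal sketch of the self-training loop, assuming a toy nearest-centroid model on 1-D inputs and a hypothetical confidence measure (the distance margin between the two nearest centroids); `train`, `predict`, and `self_train` are illustrative names, not from the paper.

```python
from statistics import mean

def train(labeled):
    """Toy nearest-centroid 'model': the mean input value for each label."""
    by_label = {}
    for x, y in labeled:
        by_label.setdefault(y, []).append(x)
    return {y: mean(xs) for y, xs in by_label.items()}

def predict(model, x):
    """Return (label, confidence); confidence is the distance margin
    between the closest and second-closest centroid."""
    ranked = sorted((abs(x - c), y) for y, c in model.items())
    best_dist, best_label = ranked[0]
    margin = ranked[1][0] - best_dist if len(ranked) > 1 else float("inf")
    return best_label, margin

def self_train(seed, unlabeled, threshold=1.0, max_rounds=5):
    """Repeatedly add the model's own confident predictions to the training data."""
    labeled, pool = list(seed), list(unlabeled)
    for _ in range(max_rounds):
        model = train(labeled)
        confident, unsure = [], []
        for x in pool:
            label, conf = predict(model, x)
            if conf >= threshold:
                confident.append((x, label))
            else:
                unsure.append(x)
        if not confident:
            break
        # Self-assigned labels are never revisited, so early mistakes
        # (and any bias in the seed data) carry over to later rounds.
        labeled.extend(confident)
        pool = unsure
    return train(labeled), labeled

model, labeled = self_train([(0.0, "neg"), (10.0, "pos")], [1.0, 2.0, 8.0, 9.0, 5.0])
```

Here the ambiguous point 5.0 is never labeled; the threshold trades coverage against the precision risk noted on the slide.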
Active learning
• Use the existing model to decide which data should be annotated next, then annotate the selected data by hand.
[Diagram: unannotated data, annotated data; steps: evaluate, annotate]
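One common way for the model to "evaluate what to annotate next" is uncertainty sampling. A minimal sketch, assuming a hypothetical `toy_prob` model and an `annotate` callable standing in for the human annotator; neither comes from the slides.

```python
import math

def least_confident(prob_fn, pool, k):
    """Uncertainty sampling: the k pool items whose most likely label has the lowest probability."""
    return sorted(pool, key=lambda x: max(prob_fn(x).values()))[:k]

def active_learning_round(prob_fn, pool, annotate, k=2):
    """One round: the model chooses what to label next, a human annotator labels it."""
    chosen = least_confident(prob_fn, pool, k)
    return [(x, annotate(x)) for x in chosen]

def toy_prob(x):
    """Hypothetical current model: a logistic curve centered at 5 on 1-D inputs."""
    p = 1 / (1 + math.exp(-(x - 5)))
    return {"pos": p, "neg": 1 - p}

new_labels = active_learning_round(
    toy_prob, [0, 3, 5, 6.5, 10], annotate=lambda x: "pos" if x >= 5 else "neg"
)
```

The points nearest the decision boundary (5 and 6.5) are sent for annotation first, while confidently classified points such as 0 and 10 are skipped.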
Distant supervision
• Use an existing database together with unannotated data to train a classifier, then use the classifier to annotate new data.
[Diagram: existing database + unannotated data → (train) → classifier → (annotate) → new unannotated data]
In this paper…
What we want to do
• Extract relations between entities from sentences.
• e.g.
  sentence: Kyoto, the famous place in Japan.
  entities: Japan, Kyoto
  relation: location-contains <Japan, Kyoto>
In this work…
• Existing database: Freebase (102 relations, 940k entities, 1.8M instances).
• Unannotated data: Wikipedia.
• Classifier: multiclass logistic regression.
[Diagram: Freebase + Wikipedia → (train) → multiclass logistic regression classifier → (annotate) → Wikipedia]
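A minimal sketch of a multiclass logistic regression classifier over sparse string features, trained here with plain stochastic gradient ascent. The learning rate, epoch count, feature strings, and the `NO_RELATION` label are assumptions for illustration, not the paper's training setup.

```python
import math

class SoftmaxRegression:
    """Multiclass logistic regression over sparse binary features given as strings."""

    def __init__(self, labels, lr=0.5):
        self.labels = list(labels)
        self.lr = lr
        self.weights = {y: {} for y in self.labels}  # weights[label][feature]

    def probs(self, feats):
        """Softmax over per-label scores (sum of the weights of the active features)."""
        scores = {y: sum(self.weights[y].get(f, 0.0) for f in feats) for y in self.labels}
        top = max(scores.values())  # subtract the max score for numerical stability
        exps = {y: math.exp(s - top) for y, s in scores.items()}
        z = sum(exps.values())
        return {y: e / z for y, e in exps.items()}

    def fit(self, data, epochs=20):
        """Stochastic gradient ascent on the log-likelihood of (features, gold_label) pairs."""
        for _ in range(epochs):
            for feats, gold in data:
                p = self.probs(feats)
                for y in self.labels:
                    step = self.lr * ((1.0 if y == gold else 0.0) - p[y])
                    w = self.weights[y]
                    for f in feats:
                        w[f] = w.get(f, 0.0) + step
        return self

    def predict(self, feats):
        p = self.probs(feats)
        return max(p, key=p.get)

# Hypothetical feature strings and relation labels, for illustration only.
data = [
    ({"between:the capital of"}, "location-contains"),
    ({"between:was born in"}, "place-of-birth"),
    ({"between:visited"}, "NO_RELATION"),
]
clf = SoftmaxRegression(["location-contains", "place-of-birth", "NO_RELATION"]).fit(data)
```

Storing weights as nested dictionaries keeps the model sparse, which suits the very large, mostly-zero feature space that lexical and syntactic features produce.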
Training
• Find sentences that contain both entities of a relation known in Freebase.
  - Such a sentence tends to express that relation.
  - Entities are found by a named entity tagger.
• Train the classifier.
  - The features are explained later.
Example
• Known relations:
  location-contains <Virginia, Richmond>
  location-contains <France, Nantes>
• We find sentences such as:
  - Richmond, the capital of Virginia.
  - Henry’s Edict of Nantes helped the Protestants of France.
• Train the classifier on these sentences.
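The labeling step in this example can be sketched as follows. This is a minimal sketch, not the authors' code, assuming a hypothetical in-memory knowledge base of entity pairs and sentences whose entity mentions have already been tagged.

```python
# Hypothetical toy knowledge base: (entity1, entity2) -> relation name.
KB = {
    ("Virginia", "Richmond"): "location-contains",
    ("France", "Nantes"): "location-contains",
}

def label_sentences(sentences, kb):
    """Distant supervision heuristic: if a sentence mentions both entities of a
    known pair, label it with that pair's relation. Entities are assumed pre-tagged."""
    examples = []
    for text, entities in sentences:
        for (e1, e2), relation in kb.items():
            if e1 in entities and e2 in entities:
                examples.append((text, e1, e2, relation))
    return examples

sentences = [
    ("Richmond, the capital of Virginia.", {"Richmond", "Virginia"}),
    ("Henry's Edict of Nantes helped the Protestants of France.", {"Henry", "Nantes", "France"}),
    ("Kyoto is a city.", {"Kyoto"}),
]
training_examples = label_sentences(sentences, KB)
```

The Nantes sentence is labeled `location-contains` even though it does not actually state that France contains Nantes; the heuristic cannot tell the difference, so the resulting training data is noisy.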
Testing
• Find sentences that contain two entities.
  - Such a sentence tends to express a relation between them.
  - Entities are found by a named entity tagger.
• The trained classifier then predicts which relation holds between these entities.
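A minimal sketch of the testing step, assuming pre-tagged sentences and a hypothetical `toy_classify` function standing in for the trained classifier. As a design choice, this sketch pools every sentence that mentions the same entity pair before classifying.

```python
from collections import defaultdict
from itertools import combinations

def candidate_pairs(tagged_sentences):
    """Map each pair of co-occurring entities to every sentence that mentions both.
    Pairs are sorted alphabetically so mentions in either order pool together;
    deciding argument order is left to the classifier (a simplification)."""
    by_pair = defaultdict(list)
    for text, entities in tagged_sentences:
        for pair in combinations(sorted(entities), 2):
            by_pair[pair].append(text)
    return dict(by_pair)

def classify_pairs(tagged_sentences, classify):
    """Run a trained classifier (any callable: list of sentences -> label) on each pair."""
    return {pair: classify(texts) for pair, texts in candidate_pairs(tagged_sentences).items()}

def toy_classify(texts):
    # Hypothetical stand-in for the classifier trained in the previous step.
    return "location-contains" if any("capital" in t for t in texts) else "NO_RELATION"

tagged = [
    ("California's capital is Sacramento.", {"California", "Sacramento"}),
    ("Sacramento sits in the Central Valley of California.",
     {"Sacramento", "Central Valley", "California"}),
]
predictions = classify_pairs(tagged, toy_classify)
```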
Features
• Lexical features:
  - specific words between and surrounding the two entities in the sentence.
• Syntactic features:
  - the dependency path between the two entities.
Lexical features
• The sequence of words between the two entities.
• The part-of-speech tags of these words.
• A flag indicating which entity came first in the sentence.
• A window of k words to the left of Entity 1 and their part-of-speech tags.
• A window of k words to the right of Entity 2 and their part-of-speech tags.
Example sentence: Astronomer Edwin Hubble was born in Marshfield, Missouri.
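A minimal sketch of these lexical features on the example sentence. It assumes hand-assigned Penn Treebank-style POS tags, takes the left window from whichever entity comes first and the right window from the one that comes second, and conjoins all components into one string per window size k; the `#PAD#` token and the `|`-separated format are illustrative assumptions.

```python
def lexical_features(tokens, tags, span1, span2, ks=(0, 1, 2)):
    """One conjoined lexical feature string per window size k.
    span1 and span2 are (start, end) token offsets of entity 1 and entity 2, end exclusive."""
    first, second = sorted([span1, span2])
    order = "E1_FIRST" if first == span1 else "E2_FIRST"  # which entity came first

    def tagged(i, j):
        # word/POS pairs for tokens i..j-1
        return [f"{tokens[n]}/{tags[n]}" for n in range(i, j)]

    middle = " ".join(tagged(first[1], second[0]))  # words between the entities, with POS
    feats = []
    for k in ks:
        left = tagged(max(0, first[0] - k), first[0])
        right = tagged(second[1], min(len(tokens), second[1] + k))
        left = ["#PAD#"] * (k - len(left)) + left  # pad windows that run off the sentence
        right = right + ["#PAD#"] * (k - len(right))
        feats.append(f"{order}|{' '.join(left)}|ENT|{middle}|ENT|{' '.join(right)}")
    return feats

# Example sentence from the slide; POS tags are hand-assigned for illustration.
tokens = ["Astronomer", "Edwin", "Hubble", "was", "born", "in", "Marshfield", ",", "Missouri", "."]
tags = ["NN", "NNP", "NNP", "VBD", "VBN", "IN", "NNP", ",", "NNP", "."]
features = lexical_features(tokens, tags, (1, 3), (6, 7))
```

With k = 1 this yields `E1_FIRST|Astronomer/NN|ENT|was/VBD born/VBN in/IN|ENT|,/,`. Conjoining everything makes each feature very specific (high precision, low recall), which is workable only because distant supervision supplies a large amount of training text.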
Syntactic features
• A dependency path between the two entities.
• For each entity, one “window” node that is not part of the dependency path.
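A minimal sketch of both syntactic components on the example sentence. The dependency parse is hand-written with Stanford-style labels purely for illustration (it is not parser output), and choosing the first non-punctuation neighbor as the "window" node is an assumption of this sketch.

```python
# Hand-written dependency parse (illustrative labels, not produced by a real parser).
TOKENS = ["Astronomer", "Edwin", "Hubble", "was", "born", "in", "Marshfield", ",", "Missouri", "."]
HEADS = [2, 2, 4, 4, None, 4, 5, 6, 6, 4]  # index of each token's head; None marks the root
LABELS = ["nn", "nn", "nsubjpass", "auxpass", "root", "prep", "pobj", "punct", "appos", "punct"]

def ancestors(i, heads):
    """Token i followed by its chain of heads up to the root."""
    chain = [i]
    while heads[chain[-1]] is not None:
        chain.append(heads[chain[-1]])
    return chain

def lca_split(i, j, heads):
    """Both head chains, each cut off just after the lowest common ancestor."""
    up_i, up_j = ancestors(i, heads), ancestors(j, heads)
    lca = next(n for n in up_i if n in up_j)
    return up_i[: up_i.index(lca) + 1], up_j[: up_j.index(lca) + 1]

def dependency_path(i, j, tokens, heads, labels):
    """Edges and intermediate words between tokens i and j (endpoints omitted).
    An up-arrow marks an edge toward the root, a down-arrow an edge away from it."""
    up_i, up_j = lca_split(i, j, heads)
    parts = []
    for n in up_i[:-1]:  # climb from i to the common ancestor
        parts += [f"↑{labels[n]}", tokens[heads[n]]]
    for n in reversed(up_j[:-1]):  # descend from the common ancestor to j
        parts += [f"↓{labels[n]}", tokens[n]]
    if parts:
        parts.pop()  # the last word is entity j itself
    return " ".join(parts)

def window_node(e, i, j, heads, labels):
    """One non-punctuation neighbor of entity token e that is not on the i-j path, or None."""
    up_i, up_j = lca_split(i, j, heads)
    on_path = set(up_i) | set(up_j)
    neighbors = [n for n, h in enumerate(heads) if h == e]
    if heads[e] is not None:
        neighbors.append(heads[e])
    for n in neighbors:
        if n not in on_path and labels[n] != "punct":
            return n
    return None

path = dependency_path(2, 6, TOKENS, HEADS, LABELS)  # Hubble -> Marshfield
windows = (window_node(2, 2, 6, HEADS, LABELS), window_node(6, 2, 6, HEADS, LABELS))
```

Here the path is `↑nsubjpass born ↓prep in ↓pobj`, and the window nodes are "Astronomer" (for Hubble) and "Missouri" (for Marshfield).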
Result
Trained features
Automatic evaluation
Human evaluation
Conclusion
• With this method, we can extract relations from unlabeled text.
• Because the labels come from the database, the extracted relations follow the database's own relation types.
• The extracted relations appear to be accurate.
Example usage of distant supervision
Existing database → Target annotation
• Freebase (relations between entities) → Wikipedia sentences (find new relations)
• Emoticons → Tweets (annotate as positive or negative)
• Dependency parse trees, knowledge base → Semantic parser
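The emoticon row follows the same pattern: emoticons play the role of the existing database and supply noisy sentiment labels for tweets. A minimal sketch; the emoticon lists are assumptions, and stripping the emoticons from the text (so a classifier trained on it cannot simply memorize them) is a design choice of this sketch.

```python
POSITIVE = {":)", ":-)", ":D"}  # assumed positive emoticons
NEGATIVE = {":(", ":-("}  # assumed negative emoticons

def emoticon_label(tweet):
    """Return (text_without_emoticons, label), or None if the tweet has no emoticon
    or has both positive and negative ones."""
    tokens = tweet.split()
    has_pos = any(t in POSITIVE for t in tokens)
    has_neg = any(t in NEGATIVE for t in tokens)
    if has_pos == has_neg:
        return None  # no signal, or conflicting signals: skip this tweet
    text = " ".join(t for t in tokens if t not in POSITIVE and t not in NEGATIVE)
    return text, ("positive" if has_pos else "negative")
```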
Comments
• Distant supervision can be useful for other tasks.
  - Currently, it is used mainly for relation extraction.
• However, it assumes that a large database is already available.
END
