Extracting Relations between Non-Standard Entities using Distant Supervision and Imitation Learning
Isabelle Augenstein, Andreas Vlachos, Diana Maynard
Approach Overview
Musical Artist     Album
Michael Jackson    Music & Me
The Beatles        ?
Stealers Wheel     ?
Sentences
Preview songs from Forever, Michael by Michael Jackson on the iTunes Store.
The only Beatles album to occasion negative, even hostile reviews, there are
few other rock records as controversial as Let It Be.
The Beatles recorded ten songs during a single studio session for their debut
LP, Please Please Me.
Stealers Wheel are a Scottish folk rock band formed in Paisley, Renfrewshire.
Web Search
Sentences
Music & Me is the third studio album by
American singer Michael Jackson.
Stealers Wheel are a Scottish folk rock
band formed in Paisley, Renfrewshire.
NER
Training Instances
Label   Subject           Object                  Sentence
true    Michael Jackson   Forever, Michael        Preview songs from
false   Michael Jackson   iTunes Store            Preview songs from
Testing Instances
Label   Subject           Object                  Sentence
true    The Beatles       Let It Be               The only Beatles album to
false   The Beatles       LP                      The Beatles recorded ten
true    The Beatles       Please Please Me        The Beatles recorded ten
false   Stealers Wheel    Scottish                Stealers Wheel are a
false   Stealers Wheel    Paisley, Renfrewshire   Stealers Wheel are a
Distant Supervision
NE Classifier
Relation Extractor
Train
Predict
Output
Subject           Object
The Beatles       Let It Be
The Beatles       Please Please Me
Stealers Wheel
Distant Supervision
Train relation extractors without manually labeled
data, using a knowledge base
and unlabeled text
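The labelling step described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the names `Instance` and `label_instances` are hypothetical, the toy knowledge base is assumed to list both Michael Jackson albums, and the candidate objects per sentence are assumed to come from an upstream entity recogniser.

```python
from typing import Dict, List, NamedTuple, Set, Tuple


class Instance(NamedTuple):
    """One candidate (subject, object) pair found in a sentence."""
    label: bool
    subject: str
    obj: str
    sentence: str


def label_instances(
    kb: Dict[str, Set[str]],
    subject: str,
    sentences: List[Tuple[str, List[str]]],
) -> List[Instance]:
    """Label each candidate object True if the knowledge base already
    lists it for the subject, else False (the distant supervision heuristic)."""
    known = kb.get(subject, set())
    instances = []
    for sentence, candidates in sentences:
        for candidate in candidates:
            instances.append(Instance(candidate in known, subject, candidate, sentence))
    return instances


# Toy knowledge base for the album relation (assumed contents).
kb = {"Michael Jackson": {"Music & Me", "Forever, Michael"}}

# One sentence from the poster with its candidate objects (assumed NER output).
sentences = [
    (
        "Preview songs from Forever, Michael by Michael Jackson on the iTunes Store.",
        ["Forever, Michael", "iTunes Store"],
    )
]

for instance in label_instances(kb, "Michael Jackson", sentences):
    print(instance.label, instance.subject, "|", instance.obj)
```

No sentence is labelled by hand: the labels come entirely from the knowledge base lookup, which is why the quality of candidate recognition matters so much for the rest of the pipeline.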
How can we recognise arguments of
relations?
Named Entity Recognition (NER) and Named Entity Classification
(NEC) are typically run as a preprocessing step, using tools such
as Stanford NER or FIGER
Is off-the-shelf NER good
enough?
•  Experiments with 16 relations (e.g. album,
character, record label, author, origin)
Recall of Stanford NER compared to simple
POS-based heuristics
Improving NEC for RE: Imitation Learning
•  Simple solution (OS): adding NEC features to RE
•  Problem: NEC features (e.g. mention, mention context) overpower RE features (e.g. path between subject s and object o)
   -  OS would incorrectly predict Steven Spielberg, because the context is stronger
•  Solution (IL): Decomposing the learning task into a series of actions: NEC, then RE if the NEC prediction is positive
•  Classifiers are trained iteratively with the imitation learning algorithm DAGGER (Ross et al., 2011)
•  NEC stage is fairly permissive and enhances RE
   -  NEC prediction for both candidates is positive
   -  RE correctly predicts Alfred Hitchcock
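The contrast between OS and IL at prediction time can be sketched as follows. This is a toy illustration only: the scores are made up, the functions `one_stage` and `two_stage` and their weights and thresholds are hypothetical, reading OS as a single combined model is an assumption, and the iterative DAGGER training itself is not shown.

```python
from typing import Dict, List, Optional


def one_stage(candidates: List[str], nec: Dict[str, float],
              rel: Dict[str, float], nec_weight: float = 0.8) -> str:
    """Single model mixing NEC and RE evidence; with a large NEC weight
    the entity-type evidence dominates the relation evidence."""
    return max(candidates,
               key=lambda c: nec_weight * nec[c] + (1 - nec_weight) * rel[c])


def two_stage(candidates: List[str], nec: Dict[str, float],
              rel: Dict[str, float], nec_threshold: float = 0.3) -> Optional[str]:
    """Action 1: a permissive NEC filter. Action 2: RE decides among
    the candidates that passed, using relation evidence only."""
    accepted = [c for c in candidates if nec[c] >= nec_threshold]
    if not accepted:
        return None
    return max(accepted, key=lambda c: rel[c])


candidates = ["Steven Spielberg", "Alfred Hitchcock"]
# Made-up scores: "director Steven Spielberg" gives strong NEC context,
# while the path to "Psycho" favours Alfred Hitchcock.
nec_scores = {"Steven Spielberg": 0.9, "Alfred Hitchcock": 0.6}
rel_scores = {"Steven Spielberg": 0.2, "Alfred Hitchcock": 0.7}

print("one stage:", one_stage(candidates, nec_scores, rel_scores))
print("two stage:", two_stage(candidates, nec_scores, rel_scores))
```

With these illustrative numbers the combined model picks Steven Spielberg, while the decomposed version lets both candidates through NEC and then picks Alfred Hitchcock on relation evidence alone.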
Improving NEC for RE: Web Features
Results
[Figure: bar chart of average precision (P-avg), axis 0 to 0.7, for Rel only, Stanf, FIGER, OS and IL]
Example (director relation):
One of director Steven Spielberg’s greatest heroes was Alfred Hitchcock, the mastermind behind Psycho.
Candidates for director relation with subject Psycho: Steven Spielberg, Alfred Hitchcock
Example Web page, annotated with the markup features header, link, bold and list:
Arctic Monkeys
Arctic Monkeys are a rock band from Sheffield, famous for albums such as AM.
Albums:
- Whatever People Say I Am, That's What I'm Not
- AM
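One way such markup features might be computed is sketched below with Python's standard `html.parser`. Everything here is an assumption for illustration: the HTML snippet is a guess at how the example page could be marked up, the tag-to-feature mapping and the names `MarkupFeatureParser` and `markup_features` are hypothetical, and mentions are matched by plain case-sensitive substring search.

```python
from html.parser import HTMLParser
from typing import List, Set

# Assumed mapping from HTML tags to the four feature names on the poster.
TAG_TO_FEATURE = {
    "h1": "header", "h2": "header", "h3": "header",
    "a": "link", "b": "bold", "strong": "bold", "li": "list",
}


class MarkupFeatureParser(HTMLParser):
    """Collects the markup features of every text node containing a mention."""

    def __init__(self, mention: str) -> None:
        super().__init__()
        self.mention = mention
        self.open_features: List[str] = []  # features of currently open tags
        self.features: Set[str] = set()

    def handle_starttag(self, tag, attrs):
        if tag in TAG_TO_FEATURE:
            self.open_features.append(TAG_TO_FEATURE[tag])

    def handle_endtag(self, tag):
        feature = TAG_TO_FEATURE.get(tag)
        if feature in self.open_features:
            # Drop the most recently opened tag carrying this feature.
            last = len(self.open_features) - 1 - self.open_features[::-1].index(feature)
            del self.open_features[last]

    def handle_data(self, data):
        if self.mention in data:
            self.features.update(self.open_features)


def markup_features(html: str, mention: str) -> Set[str]:
    parser = MarkupFeatureParser(mention)
    parser.feed(html)
    parser.close()
    return parser.features


# Assumed markup for the Arctic Monkeys example page.
page = (
    "<h1>Arctic Monkeys</h1>"
    "<p><b>Arctic Monkeys</b> are a rock band from Sheffield, "
    "famous for albums such as <a href='#'>AM</a>.</p>"
    "<ul><li>Whatever People Say I Am, That's What I'm Not</li><li>AM</li></ul>"
)

print(sorted(markup_features(page, "AM")))
print(sorted(markup_features(page, "Arctic Monkeys")))
```

Features like these could then be added to the NEC feature set as binary indicators; a real system would need proper mention alignment rather than substring matching.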
[Figure: recall (axis 0 to 1) of Stanford NER vs. the POS-based heuristic, by entity type: PER, LOC, ORG, MISC]
Conclusions
•  Imitation learning approach outperforms baselines with supervised NEC
(Stanford NER and FIGER) by 10 points in average precision
•  Web features such as appearance in lists or links to other Web pages improve
average precision by 7 points
•  Sparse, high-precision features (such as parse features) outperform high-recall,
low-precision features (such as BOW features)
References
Stéphane Ross, Geoffrey J. Gordon, and Drew Bagnell.
2011. A Reduction of Imitation Learning and Structured
Prediction to No-Regret Online Learning. JMLR.
