Alleviating Manual Feature Engineering for Part-of-Speech Tagging of Twitter Microposts using Distributed Word Representations
Fréderic Godin, Baptist Vandersmissen, Azarakhsh Jalalvand, Wesley De Neve and Rik Van de Walle
Ghent University – iMinds, ELIS Department/Multimedia Lab
Gaston Crommenlaan 8 bus 201
B-9050 Ledeberg – Ghent, Belgium
http://multimedialab.elis.ugent.be
Workshop on Machine Learning and NLP, NIPS 2014
12/12/2014, Montreal, Canada
@frederic_godin, @BaptistV, @wmdeneve and @rvdwalle

Research Question
Can we avoid manual feature engineering when developing a Part-of-Speech tagger for Twitter microposts?

Solution
Automatically learn features that capture syntactic and semantic patterns from 400 million raw Twitter microposts, and feed them to a neural network.

Learn Features
400 million raw Twitter microposts → Word2vec Skip-gram → a 400D vector per word

Train the Part-of-Speech Tagger
Input: the 400D vectors of the words in a context window (e.g., "im doin good")
Hidden Layer (500D)
Output Layer (52D): the predicted tag of the center word (e.g., VBG for "doin")

Vote-Constrained Bootstrapping*
Tag microposts with both the ARK tagger and the GATE tagger (e.g., for "doin" in "im doin good", the ARK tagger outputs V and the GATE tagger outputs VBG).
Do the taggers agree? If so, automatically generate high-confidence labeled data.
Use this data to pre-train the neural network.
*Derczynski et al., 2013. "Twitter Part-of-Speech Tagging for All: Overcoming Sparse and Noisy Data"

Evaluation
Word2vec dataset | Pre-training dataset | Accuracy (validation set) | Accuracy (test set)
150M | / | 87.95% | 87.46%
150M | 50K | 89.64% | 88.82%
400M | 50K | 89.73% | 88.95%
400M | 125K | 90.09% | 88.90%
(/ = no pre-training)

Previously reported accuracy: Ritter et al. (2011) 84.55%; Derczynski et al. (2013) 88.69%
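The Learn Features step runs Word2vec's skip-gram model over 400 million microposts. To illustrate what skip-gram optimizes, here is a minimal pure-Python sketch, assuming the negative-sampling variant of word2vec (the poster does not say whether negative sampling or hierarchical softmax was used). Each word is paired with its neighbours inside a window, and each update pulls the word's vector towards the output vectors of its observed contexts and pushes it away from randomly drawn negative words. The function names, toy corpus, `window`, `dim`, and learning rate are illustrative assumptions; the poster's vectors are 400D.

```python
import math
import random
from typing import Iterator


def skipgram_pairs(tokens: list[str], window: int) -> Iterator[tuple[str, str]]:
    """Yield (center, context) pairs: each word predicts its neighbours
    within `window` positions on either side."""
    for i, center in enumerate(tokens):
        for j in range(max(0, i - window), min(len(tokens), i + window + 1)):
            if j != i:
                yield center, tokens[j]


def sigmoid(x: float) -> float:
    # Numerically stable logistic function.
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def sgns_step(w_in, w_out, center, context, negatives, lr):
    """One skip-gram negative-sampling update for a single (center, context) pair.

    w_in holds the word vectors (the learned features); w_out holds the
    output ("context") vectors that are only needed during training.
    """
    v = w_in[center]
    grad_v = [0.0] * len(v)
    # The observed context is a positive example (label 1); sampled words are negatives (label 0).
    for word, label in [(context, 1.0)] + [(n, 0.0) for n in negatives]:
        u = w_out[word]
        g = lr * (label - sigmoid(sum(a * b for a, b in zip(v, u))))
        for k in range(len(v)):
            grad_v[k] += g * u[k]  # accumulate with u before u is updated
            u[k] += g * v[k]
    for k in range(len(v)):
        v[k] += grad_v[k]


def train(corpus, dim=8, window=2, negatives=2, lr=0.05, epochs=50, seed=0):
    """Train toy skip-gram vectors; returns a dict mapping word -> vector."""
    rng = random.Random(seed)
    vocab = sorted({w for micropost in corpus for w in micropost})
    w_in = {w: [rng.uniform(-0.5, 0.5) / dim for _ in range(dim)] for w in vocab}
    w_out = {w: [0.0] * dim for w in vocab}
    for _ in range(epochs):
        for micropost in corpus:
            for center, context in skipgram_pairs(micropost, window):
                # Simplification: negatives drawn uniformly (word2vec samples from unigram^0.75).
                negs = [rng.choice(vocab) for _ in range(negatives)]
                sgns_step(w_in, w_out, center, context, negs, lr)
    return w_in
```

For example, `train([["im", "doin", "good"], ["u", "doin", "ok"]])` returns one small vector per word. At the poster's scale, the same objective would be run with 400-dimensional vectors over the full micropost collection using a dedicated Word2vec implementation rather than a pure-Python loop.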
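The Train the Part-of-Speech Tagger panel gives only the layer sizes: 400D word vectors in, a 500D hidden layer, and a 52D output layer. The forward pass below is a sketch under several assumptions the poster does not confirm: a three-word window centred on the word being tagged (matching the "im doin good" example), zero vectors for padding and for words without a vector, a tanh hidden activation, and a softmax output with one unit per tag. All function names are hypothetical, and the weights are random, so this shows shapes and data flow only, not a trained tagger.

```python
import math
import random

EMB_DIM, HIDDEN_DIM, NUM_TAGS = 400, 500, 52  # layer sizes taken from the poster
WINDOW = 1  # assumption: one context word on each side (3-word window)


def init_layer(n_in, n_out, rng):
    """Random (weights, bias) for a fully connected layer; weights is n_out x n_in."""
    scale = 1.0 / math.sqrt(n_in)
    weights = [[rng.uniform(-scale, scale) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out


def affine(weights, bias, x):
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, bias)]


def softmax(z):
    m = max(z)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]


def window_input(tokens, i, embeddings):
    """Concatenate the word vectors around position i into one input vector.

    Positions outside the micropost and words without a vector get zeros (assumption).
    """
    zero = [0.0] * EMB_DIM
    x = []
    for j in range(i - WINDOW, i + WINDOW + 1):
        word = tokens[j] if 0 <= j < len(tokens) else None
        x.extend(embeddings.get(word, zero))
    return x


def tag_probabilities(tokens, i, embeddings, hidden, output):
    """Probability distribution over the NUM_TAGS tags for tokens[i]."""
    x = window_input(tokens, i, embeddings)         # (2*WINDOW+1)*EMB_DIM values
    h = [math.tanh(v) for v in affine(*hidden, x)]  # HIDDEN_DIM values
    return softmax(affine(*output, h))              # NUM_TAGS values


if __name__ == "__main__":
    rng = random.Random(0)
    tokens = ["im", "doin", "good"]
    embeddings = {w: [rng.gauss(0.0, 0.1) for _ in range(EMB_DIM)] for w in tokens}
    hidden = init_layer((2 * WINDOW + 1) * EMB_DIM, HIDDEN_DIM, rng)
    output = init_layer(HIDDEN_DIM, NUM_TAGS, rng)
    probs = tag_probabilities(tokens, 1, embeddings, hidden, output)  # distribution for "doin"
    print(len(probs), round(sum(probs), 6))
```

In this sketch the learned Word2vec vectors are the only input features, which is the point of the approach: no hand-crafted features (capitalization, suffixes, lexicons) are fed to the network.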
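The Vote-Constrained Bootstrapping step keeps only automatically tagged data on which the ARK and GATE taggers agree. Because the two taggers use different tagsets (the poster's example shows V from one and VBG from the other), agreement has to be checked through a tag mapping. The sketch below assumes a micropost is kept only when the taggers agree on every token, and that kept microposts are labelled with the fine-grained PTB-style tags. `PTB_TO_COARSE` and `vote_constrained_select` are hypothetical, the mapping is deliberately incomplete, and the tagger outputs in the usage are made up rather than produced by either tagger.

```python
# Hypothetical, partial mapping from fine-grained Penn Treebank-style tags
# (GATE tagger) to coarse ARK-style tags. Illustrative only.
PTB_TO_COARSE = {
    "VB": "V", "VBD": "V", "VBG": "V", "VBN": "V", "VBP": "V", "VBZ": "V",
    "NN": "N", "NNS": "N", "NNP": "^", "NNPS": "^",
    "JJ": "A", "JJR": "A", "JJS": "A",
    "RB": "R", "RBR": "R", "RBS": "R",
    "PRP": "O", "UH": "!",
}


def vote_constrained_select(microposts, ark_tags, gate_tags):
    """Return microposts on which both taggers agree on every token,
    labelled with the fine-grained (GATE) tags, for use as pre-training data."""
    selected = []
    for tokens, ark, gate in zip(microposts, ark_tags, gate_tags):
        if not (len(tokens) == len(ark) == len(gate)):
            continue  # malformed tagger output: skip rather than guess
        # Unmapped fine-grained tags map to None and therefore never count as agreement.
        if all(PTB_TO_COARSE.get(g) == a for a, g in zip(ark, gate)):
            selected.append(list(zip(tokens, gate)))
    return selected


if __name__ == "__main__":
    microposts = [["im", "doin", "good"], ["so", "tired"]]
    ark = [["O", "V", "A"], ["R", "A"]]           # made-up ARK-style output
    gate = [["PRP", "VBG", "JJ"], ["RB", "VBN"]]  # made-up GATE-style output
    print(vote_constrained_select(microposts, ark, gate))
```

With these inputs only the first micropost survives, because VBN maps to V while the ARK-style tag is A. Requiring agreement on the whole micropost trades recall for precision: fewer microposts are kept, but their labels are more likely to be correct, which is what makes them useful as pre-training data.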
