ELIS – Multimedia Lab
Fréderic Godin, Baptist Vandersmissen,
Wesley De Neve & Rik Van de Walle
Multimedia Lab, Ghent University – iMinds
Find me at: @frederic_godin / www.fredericgodin.com
Named Entity Recognition for Twitter Microposts
(only) using Distributed Word Representations
NER in Twitter Microposts using distributed word representations
Fréderic Godin et al.
31 July 2015
Introduction
Goal: Recognizing 10 types of named entities (NEs)
in noisy Twitter microposts
Problem: Tweets contain spelling mistakes, slang
and lack uniform grammar rules
Traditional solutions
Typical features: Orthographic features, gazetteers,
corpus statistics, and the output of parsing techniques
(PoS tagging and chunking)
Typical machine learning techniques: CRF, HMM
An overview of the used approaches
Team           POS  Orthographic  Gazetteers  Brown clustering  Word embedding        ML                        F1 (%)
ousia          X    X             X           –                 GloVe                 entity linking using SVM  56.41
NLANGP         –    X             X           X                 word2vec & GloVe      CRF++                     51.40
nrc            –    –             X           X                 word2vec              semi-Markov MIRA          44.74
multimedialab  –    –             –           –                 word2vec              FFNN                      43.75
USFD           X    X             X           X                 –                     CRF L-BFGS                42.46
iitp           X    X             X           –                 –                     CRF++                     39.84
Hallym         X    –             –           X                 correlation analysis  CRFsuite                  37.21
lattice        X    X             –           X                 –                     CRF wapiti                16.47
Baseline       –    X             X           –                 –                     CRFsuite                  31.97
A simple, general but effective
neural network architecture
Use word2vec to generate good feature representations for
words (=unsupervised learning)
Feed those word representations to another neural network
(NN) for any classification task (=supervised learning)
Pipeline: Example → Feature representation → Machine learning → Label(s)
Feature representation: learn word2vec word representations once, in advance
Machine learning: train a new NN for any task
Word2vec: automatically learning good features
Figure: 2D projection of the 400-dimensional embedding space for the top 1,000 words used on Twitter.
The model was trained on 400 million tweets containing 5 billion words.
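A minimal sketch of how such a space can be queried: words that are used in similar contexts end up close together, so a nearest-neighbour lookup by cosine similarity tends to return spelling variants and related words. The 3-dimensional vectors and the words below are invented for illustration; they are not taken from the trained 400-dimensional model.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def nearest(word, embeddings, k=2):
    """Return the k words whose vectors are most similar to `word`."""
    query = embeddings[word]
    scored = [
        (cosine(query, vec), other)
        for other, vec in embeddings.items()
        if other != word
    ]
    scored.sort(reverse=True)  # highest similarity first
    return [other for _, other in scored[:k]]

# Toy vectors (hypothetical values, chosen so that variants cluster together).
toy = {
    "tomorrow": [0.90, 0.10, 0.00],
    "2morrow":  [0.80, 0.20, 0.10],
    "tmrw":     [0.85, 0.15, 0.05],
    "beijing":  [0.00, 0.90, 0.40],
    "london":   [0.10, 0.80, 0.50],
}

print(nearest("2morrow", toy))
print(nearest("beijing", toy, k=1))
```

With a real model, the same lookup runs over the full vocabulary instead of five hand-written entries.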
A simple, general but effective
neural network architecture (1)
Example: W(t-1), W(t), W(t+1) (window = 3)
Feature representation: lookup of one N-dim vector per word, concatenated (3N-dim)
Machine learning: feed-forward neural network
Label(s): Tag(W(t))
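The lookup, concatenation, and feed-forward steps can be sketched as follows. This is an illustrative forward pass only, not the authors' implementation: the zero vector for padding and unknown words, the random toy embeddings, the tiny layer sizes, and the helper names (`window_features`, `init_params`, `predict`) are all assumptions made for the example.

```python
import math
import random

def window_features(tokens, t, embeddings, dim, pad="<pad>"):
    """Concatenate the vectors of W(t-1), W(t), W(t+1) into one 3N-dim list.

    Assumption: positions outside the tweet and unknown words map to zeros.
    """
    zero = [0.0] * dim
    feats = []
    for i in (t - 1, t, t + 1):
        word = tokens[i] if 0 <= i < len(tokens) else pad
        feats.extend(embeddings.get(word, zero))
    return feats

def dense(x, weights, bias):
    """One fully connected layer: weights has one row per output unit."""
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weights, bias)]

def relu(x):
    return [max(0.0, v) for v in x]

def softmax(x):
    m = max(x)  # subtract the maximum for numerical stability
    exps = [math.exp(v - m) for v in x]
    total = sum(exps)
    return [e / total for e in exps]

def init_params(n_in, n_hidden, n_out, seed=0):
    """Small random weights and zero biases for a one-hidden-layer network."""
    rng = random.Random(seed)
    def mat(rows, cols):
        return [[rng.uniform(-0.1, 0.1) for _ in range(cols)]
                for _ in range(rows)]
    return {"W1": mat(n_hidden, n_in), "b1": [0.0] * n_hidden,
            "W2": mat(n_out, n_hidden), "b2": [0.0] * n_out}

def predict(x, params):
    """Hidden layer with ReLU, then a softmax over the tag set."""
    hidden = relu(dense(x, params["W1"], params["b1"]))
    return softmax(dense(hidden, params["W2"], params["b2"]))

# Toy setup: N = 4 instead of 400, 3 output tags instead of the full tag set.
N = 4
rng = random.Random(42)
tokens = ["from", "beijing", "to"]
toy_embeddings = {w: [rng.uniform(-1, 1) for _ in range(N)] for w in tokens}

x = window_features(tokens, 1, toy_embeddings, N)   # features for "beijing"
probs = predict(x, init_params(n_in=3 * N, n_hidden=5, n_out=3))
print(len(x), probs)
```

The weights here are untrained, so the probabilities carry no meaning; the point is only the shape of the computation: three N-dim lookups become one 3N-dim input, and the network emits one distribution over tags for the centre word.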
A simple, general but effective
neural network architecture (2)
Example: "from", "Beijing", "to" (window = 3)
Feature representation: lookup of one N-dim vector per word, concatenated (3N-dim)
Machine learning: feed-forward neural network
Label(s): Location
Postprocessing (1)
W(1), W(2), W(3) → FR (feature representation) → ML (machine learning) → Label(1), Label(2), Label(3) → Postprocessing → Label(1), Label(2), Label(3)
Correct for inconsistencies
NE starting with an I-tag
Multi-word expressions having different categories
Postprocessing (2)
"Manchester", "United", "is" → FR → ML → B-Loc, I-sportsteam, O → Postprocessing → B-sportsteam, I-sportsteam, O
Correct for inconsistencies
NE starting with an I-tag
Multi-word expressions having different categories
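The two corrections can be sketched as a single pass over the predicted tags. The slides show only the outcome for one example, so the resolution rule used here is an assumption: the most frequent category within an entity wins, and a tie goes to the category of the last word. The function name `fix_tags` is hypothetical.

```python
from collections import Counter

def fix_tags(tags):
    """Repair B-/I-/O tag sequences produced independently per word.

    1. An entity that starts with an I-tag gets a B-tag on its first word.
    2. All words of one entity get the same category (assumed rule:
       majority vote, ties broken in favour of the last word).
    """
    fixed = list(tags)
    i, n = 0, len(fixed)
    while i < n:
        if fixed[i] == "O":
            i += 1
            continue
        # An entity starts at i and extends over the following I-tags.
        j = i + 1
        while j < n and fixed[j].startswith("I-"):
            j += 1
        categories = [tag.split("-", 1)[1] for tag in fixed[i:j]]
        counts = Counter(categories)
        top = max(counts.values())
        winner = next(c for c in reversed(categories) if counts[c] == top)
        fixed[i] = "B-" + winner
        for k in range(i + 1, j):
            fixed[k] = "I-" + winner
        i = j
    return fixed

print(fix_tags(["B-Loc", "I-sportsteam", "O"]))
print(fix_tags(["O", "I-person", "I-person"]))
```

On the "Manchester United is" example the sketch reproduces the corrected sequence shown on the slide, and the second call shows an entity that wrongly begins with an I-tag being given a B-tag.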
Experimental setup
Feature Learning
Word2vec Skipgram with negative sampling
400 million raw English tweets (limited preprocessing)
Neural Network
One hidden layer, with 500 hidden units
Word embeddings of size 400, vocabulary of 3 million words
Mini-batch SGD and Dropout
Experiments with Tanh and ReLU
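To get a feel for the sizes involved, the figures above can be turned into a rough parameter count. Two values are assumptions rather than reported numbers: a window of 3 words (taken from the architecture slides) and 21 output tags (a B- and an I-tag for each of the 10 entity types, plus O).

```python
EMBEDDING_DIM = 400          # from the setup
HIDDEN_UNITS = 500           # from the setup
VOCAB_SIZE = 3_000_000       # from the setup
WINDOW = 3                   # assumption: window shown on the architecture slides
N_ENTITY_TYPES = 10          # from the introduction
N_TAGS = 2 * N_ENTITY_TYPES + 1   # assumption: B-/I- per type, plus O

input_dim = WINDOW * EMBEDDING_DIM                        # 1200
hidden_params = input_dim * HIDDEN_UNITS + HIDDEN_UNITS   # weights + biases
output_params = HIDDEN_UNITS * N_TAGS + N_TAGS            # weights + biases
classifier_params = hidden_params + output_params
embedding_params = VOCAB_SIZE * EMBEDDING_DIM

print("input dimension:      ", input_dim)
print("classifier parameters:", classifier_params)
print("embedding parameters: ", embedding_params)
```

Under these assumptions the classifier has about 0.6 million parameters, while the pretrained lookup table holds 1.2 billion values, so almost all of the model's capacity sits in the embeddings learned from unlabeled tweets.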
Word2vec results
Slang
- Wrong capitalization
- Sometimes not in Gazetteer
Spelling
Normalizing slang words/spelling
Dealing with capitalization + gazetteer functionality
Results
Team           POS  Orthographic  Gazetteers  Brown clustering  Word embedding        ML                        F1 (%)
ousia          X    X             X           –                 GloVe                 entity linking using SVM  56.41
NLANGP         –    X             X           X                 word2vec & GloVe      CRF++                     51.40
nrc            –    –             X           X                 word2vec              semi-Markov MIRA          44.74
multimedialab  –    –             –           –                 word2vec              FFNN                      43.75
USFD           X    X             X           X                 –                     CRF L-BFGS                42.46
iitp           X    X             X           –                 –                     CRF++                     39.84
Hallym         X    –             –           X                 correlation analysis  CRFsuite                  37.21
lattice        X    X             –           X                 –                     CRF wapiti                16.47
Baseline       –    X             X           –                 –                     CRFsuite                  31.97
Lessons learned
Feature Learning
A W2V window of 1 worked best
More syntax-oriented embeddings
Neural Networks
Multiple layers did not improve the F1-score
Dropout and ReLU worked best
Postprocessing
Multi-word expressions often have different categories
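Why a window of 1 pushes the embeddings towards syntax can be seen from the training pairs that Skip-gram generates: each word is only ever asked to predict its immediate neighbours, so words that fill the same slot in a sentence receive similar vectors. The sketch below only enumerates the (centre, context) pairs; negative sampling and subsampling are left out, and the function name and example tweet are made up for illustration.

```python
def skipgram_pairs(tokens, window=1):
    """List the (centre, context) pairs Skip-gram trains on for one sentence."""
    pairs = []
    for i, centre in enumerate(tokens):
        lo = max(0, i - window)
        hi = min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                pairs.append((centre, tokens[j]))
    return pairs

tweet = ["flying", "from", "beijing", "to"]
for pair in skipgram_pairs(tweet, window=1):
    print(pair)
```

With `window=1`, "beijing" is paired only with "from" and "to"; a larger window would also pair it with "flying", which adds topical rather than positional information.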
Conclusion
End-to-end semi-supervised neural network architecture
No feature engineering needed
Reusable architecture
Beats traditional systems that only use
hand-crafted features
#Questions?
The word2vec Twitter model is available at: http://www.fredericgodin.com/software/
@frederic_godin

Elite Class ➥8448380779▻ Call Girls In Nizammuddin Delhi NCR
 
Call Girls In Gurgaon Dlf pHACE 2 Women Delhi ncr
Call Girls In Gurgaon Dlf pHACE 2 Women Delhi ncrCall Girls In Gurgaon Dlf pHACE 2 Women Delhi ncr
Call Girls In Gurgaon Dlf pHACE 2 Women Delhi ncr
 

Named Entity Recognition for Twitter Microposts (only) using Distributed Word Representations

  • 1. Named Entity Recognition for Twitter Microposts (only) using Distributed Word Representations. Fréderic Godin, Baptist Vandersmissen, Wesley De Neve & Rik Van de Walle. ELIS – Multimedia Lab, Ghent University – iMinds. Find me at: @frederic_godin / www.fredericgodin.com
  • 2. ELIS – Multimedia Lab. NER in Twitter Microposts using distributed word representations. Fréderic Godin et al. 31 July 2015. Introduction. Goal: recognizing 10 types of named entities (NEs) in noisy Twitter microposts. Problem: tweets contain spelling mistakes and slang, and lack uniform grammar rules.
  • 3. Traditional solutions. Typical features: orthographic features, gazetteers, corpus statistics, or other parsing techniques (PoS tagging and chunking). Typical machine learning techniques: CRF, HMM.
  • 4. An overview of the used approaches
      Team           POS  Orthographic  Gazetteers  Brown clustering  Word embedding        ML                        F1 (%)
      ousia          X    X             X           –                 GloVe                 entity linking using SVM  56.41
      NLANGP         –    X             X           X                 word2vec & GloVe      CRF++                     51.40
      nrc            –    –             X           X                 word2vec              semi-Markov MIRA          44.74
      multimedialab  –    –             –           –                 word2vec              FFNN                      43.75
      USFD           X    X             X           X                 –                     CRF L-BFGS                42.46
      iitp           X    X             X           –                 –                     CRF++                     39.84
      Hallym         X    –             –           X                 correlation analysis  CRFsuite                  37.21
      lattice        X    X             –           X                 –                     CRF wapiti                16.47
      Baseline       –    X             X           –                 –                     CRFsuite                  31.97
  • 5. A simple, general but effective neural network architecture. Use word2vec to generate good feature representations for words (= unsupervised learning). Feed those word representations to another neural network (NN) for any classification task (= supervised learning). Diagram: Example -> Feature representation -> Machine learning -> Label(s). Learn word2vec word representations once in advance; train a new NN for any task.
  • 6. Word2vec: automatically learning good features. Figure: 2D projection of a 400D space of the top 1000 words used on Twitter. The model was trained on 400 million tweets containing 5 billion words.
  • 7. A simple, general but effective neural network architecture (1). Diagram (window = 3): the words W(t-1), W(t) and W(t+1) each go through a lookup that returns an N-dim vector; the three vectors are concatenated (3N-dim) and fed to a feed-forward neural network, which outputs Tag(W(t)). Stages: Example -> Feature representation -> Machine learning -> Label(s).
  • 8. A simple, general but effective neural network architecture (2). Diagram (window = 3): the words "from", "Beijing" and "to" each go through a lookup that returns an N-dim vector; the three vectors are concatenated (3N-dim) and fed to a feed-forward neural network, which outputs the label Location. Stages: Example -> Feature representation -> Machine learning -> Label(s).
  • 9. Postprocessing (1). Diagram: W(1), W(2), W(3) -> FR -> ML -> Label(1), Label(2), Label(3) -> Postprocessing -> Label(1), Label(2), Label(3). Correct for inconsistencies: NEs starting with an I-tag; multi-word expressions having different categories.
  • 10. Postprocessing (2). Diagram: "Manchester United is" -> FR -> ML -> B-Loc, I-sportsteam, O -> Postprocessing -> B-sportsteam, I-sportsteam, O. Correct for inconsistencies: NEs starting with an I-tag; multi-word expressions having different categories.
  • 11. Experimental setup. Feature learning: word2vec skip-gram with negative sampling; 400 million raw English tweets (limited preprocessing). Neural network: one hidden layer with 500 hidden units; word embeddings of size 400; vocabulary of 3 million words; mini-batch SGD and dropout; experiments with tanh and ReLU.
  • 12. Word2vec results. Figure labels: slang; wrong capitalization, sometimes not in gazetteer; spelling.
  • 13. Normalizing slang words/spelling
  • 14. Dealing with capitalization + gazetteer functionality
  • 15. Results
      Team           POS  Orthographic  Gazetteers  Brown clustering  Word embedding        ML                        F1 (%)
      ousia          X    X             X           –                 GloVe                 entity linking using SVM  56.41
      NLANGP         –    X             X           X                 word2vec & GloVe      CRF++                     51.40
      nrc            –    –             X           X                 word2vec              semi-Markov MIRA          44.74
      multimedialab  –    –             –           –                 word2vec              FFNN                      43.75
      USFD           X    X             X           X                 –                     CRF L-BFGS                42.46
      iitp           X    X             X           –                 –                     CRF++                     39.84
      Hallym         X    –             –           X                 correlation analysis  CRFsuite                  37.21
      lattice        X    X             –           X                 –                     CRF wapiti                16.47
      BASELINE       –    X             X           –                 –                     CRFsuite                  31.97
  • 16. Lessons learned. Feature learning: a word2vec window of 1 worked best (more syntax-oriented embeddings). Neural networks: multiple layers did not improve the F1-score; dropout and ReLU worked best. Postprocessing: multi-word expressions often have different categories.
  • 17. Conclusion. End-to-end semi-supervised neural network architecture. No feature engineering needed. Reusable architecture. Beats traditional systems that only use hand-crafted features.
  • 18. #Questions? The word2vec Twitter model is available at: http://www.fredericgodin.com/software/ Find me at: @frederic_godin
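
The window-based architecture on slides 7 and 8 can be illustrated with a minimal sketch. This is not the authors' code. The embedding values and network weights are random toy numbers, the dimensions are shrunk (the slides use N = 400 and 500 hidden units), the three-tag set is a hypothetical subset, and zero vectors for padding and unknown words as well as the softmax output layer are assumptions the slides do not confirm.

```python
import math
import random

EMBED_DIM = 4      # N; the slides use 400, shrunk here for readability
HIDDEN_UNITS = 5   # the slides use 500
TAGS = ["O", "B-location", "I-location"]  # hypothetical subset of the tag set

random.seed(0)

# Toy lookup table standing in for pretrained word2vec vectors (random values).
VOCAB = ["from", "Beijing", "to"]
EMBEDDINGS = {w: [random.uniform(-1, 1) for _ in range(EMBED_DIM)] for w in VOCAB}
ZERO = [0.0] * EMBED_DIM  # assumption: padding and unknown words map to zeros


def window_features(tokens, t):
    """Concatenate the vectors of W(t-1), W(t), W(t+1) into one 3N-dim vector."""
    features = []
    for i in (t - 1, t, t + 1):
        if 0 <= i < len(tokens):
            features.extend(EMBEDDINGS.get(tokens[i], ZERO))
        else:
            features.extend(ZERO)
    return features


def init_layer(n_in, n_out):
    """Random weight matrix (n_out rows of n_in weights) plus zero biases."""
    weights = [[random.gauss(0, 0.1) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out


def affine(weights, biases, x):
    """Compute weights * x + biases for one input vector."""
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, biases)]


def softmax(z):
    """Numerically stable softmax (assumed output layer)."""
    m = max(z)
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]


W1, B1 = init_layer(3 * EMBED_DIM, HIDDEN_UNITS)
W2, B2 = init_layer(HIDDEN_UNITS, len(TAGS))


def tag_probabilities(tokens, t):
    """Forward pass: window features -> ReLU hidden layer -> softmax over tags."""
    x = window_features(tokens, t)
    hidden = [max(0.0, h) for h in affine(W1, B1, x)]
    return softmax(affine(W2, B2, hidden))


probs = tag_probabilities(["from", "Beijing", "to"], 1)
print(dict(zip(TAGS, probs)))
```

Because the weights are untrained, the printed probabilities are meaningless. The sketch only shows how the lookup, the 3N-dim concatenation and the single hidden layer fit together.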
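
The two postprocessing corrections on slides 9 and 10 can be sketched in a similar way. The slides do not say how the category of an inconsistent multi-word expression is chosen. The hypothetical function `postprocess` below assumes that the last token of the expression decides, which reproduces the "Manchester United" example but may differ from the authors' actual rule.

```python
def postprocess(labels):
    """Return a copy of the labels with the two inconsistencies repaired."""
    fixed = list(labels)

    # Rule 1: an entity may not start with an I-tag, so promote it to a B-tag.
    for i, label in enumerate(fixed):
        if label.startswith("I-") and (i == 0 or fixed[i - 1] == "O"):
            fixed[i] = "B-" + label[2:]

    # Rule 2: give every token of a multi-word entity the same category.
    i = 0
    while i < len(fixed):
        if fixed[i].startswith("B-"):
            end = i + 1
            while end < len(fixed) and fixed[end].startswith("I-"):
                end += 1
            category = fixed[end - 1][2:]  # assumption: the last token decides
            fixed[i] = "B-" + category
            for j in range(i + 1, end):
                fixed[j] = "I-" + category
            i = end
        else:
            i += 1
    return fixed


print(postprocess(["B-Loc", "I-sportsteam", "O"]))  # ['B-sportsteam', 'I-sportsteam', 'O']
print(postprocess(["O", "I-person", "I-person"]))   # ['O', 'B-person', 'I-person']
```

A majority vote over the tokens of the expression would be an equally plausible tie-breaking rule. The slides give only the single example.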