ELIS – Multimedia Lab
Fréderic Godin, Baptist Vandersmissen,
Wesley De Neve & Rik Van de Walle
Multimedia Lab, Ghent University – iMinds
Find me at: @frederic_godin / www.fredericgodin.com
Named Entity Recognition for Twitter Microposts
(only) using Distributed Word Representations
NER in Twitter Microposts using distributed word representations
Fréderic Godin et al.
31 July 2015
Introduction
Goal: Recognizing 10 types of named entities (NEs)
in noisy Twitter microposts
Problem: Tweets contain spelling mistakes and slang,
and lack uniform grammar rules
Traditional solutions
Typical features: Orthographic features, gazetteers,
corpus statistics, or other parsing techniques (PoS
tagging and chunking)
Typical machine learning techniques: CRF, HMM
An overview of the used approaches
System         POS  Orthographic  Gazetteers  Brown clustering  Word embedding        ML                        F1 (%)
ousia          X    X             X           –                 GloVe                 entity linking using SVM  56.41
NLANGP         –    X             X           X                 word2vec & GloVe      CRF++                     51.40
nrc            –    –             X           X                 word2vec              semi-Markov MIRA          44.74
multimedialab  –    –             –           –                 word2vec              FFNN                      43.75
USFD           X    X             X           X                 –                     CRF L-BFGS                42.46
iitp           X    X             X           –                 –                     CRF++                     39.84
Hallym         X    –             –           X                 correlation analysis  CRFsuite                  37.21
lattice        X    X             –           X                 –                     CRF wapiti                16.47
Baseline       –    X             X           –                 –                     CRFsuite                  31.97
A simple, general but effective
neural network architecture
Use word2vec to generate good feature representations for
words (=unsupervised learning)
Feed those word representations to another neural network
(NN) for any classification task (=supervised learning)
Pipeline: Example -> Feature representation -> Machine learning -> Label(s)
- Feature representation: learn word2vec word representations once in advance
- Machine learning: train a new NN for any task
Word2vec: automatically learning good features
2D projection of a 400D space of the top 1000 words used on Twitter.
The model was trained on 400 million tweets containing 5 billion words.
A simple, general but effective
neural network architecture (1)
Input (window = 3): W(t-1), W(t), W(t+1)
  -> Lookup: one N-dim embedding per word
  -> Concatenate (3N-dim)
  -> Feed-forward neural network
  -> Tag(W(t))
Pipeline: Example -> Feature representation -> Machine learning -> Label(s)
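The lookup-and-concatenate step of this architecture can be sketched in a few lines of Python. This is a minimal sketch rather than the authors' implementation: the toy 2-dimensional embedding table, the zero vector used for padding and unknown words, and the names `lookup` and `window_features` are all assumptions made for illustration.

```python
# Illustrative sketch only: the toy embeddings, the zero-vector padding and
# all names below are assumptions, not taken from the slides.
from typing import Dict, List

N = 2  # toy embedding size (the slides use 400-dimensional embeddings)

# Hypothetical lookup table mapping a word to its N-dim embedding.
EMBEDDINGS: Dict[str, List[float]] = {
    "from": [0.1, 0.2],
    "Beijing": [0.9, 0.8],
    "to": [0.3, 0.4],
}
ZERO = [0.0] * N  # stands in for sentence boundaries and unknown words


def lookup(word: str) -> List[float]:
    """Return the embedding of a word, or the zero vector if it is unknown."""
    return EMBEDDINGS.get(word, ZERO)


def window_features(words: List[str], t: int, window: int = 3) -> List[float]:
    """Concatenate the embeddings of the `window` words centred on position t."""
    half = window // 2
    features: List[float] = []
    for i in range(t - half, t + half + 1):
        vec = lookup(words[i]) if 0 <= i < len(words) else ZERO
        features.extend(vec)
    return features  # length is window * N


sentence = ["from", "Beijing", "to"]
x = window_features(sentence, t=1)
print(len(x))  # 6, i.e. 3 * N values for the feed-forward network
```

Padding with a zero vector is only one possible choice; a dedicated padding embedding would work equally well in this sketch.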
A simple, general but effective
neural network architecture (2)
Input (window = 3): "from", "Beijing", "to"
  -> Lookup: one N-dim embedding per word
  -> Concatenate (3N-dim)
  -> Feed-forward neural network
  -> Location (the tag for "Beijing")
Pipeline: Example -> Feature representation -> Machine learning -> Label(s)
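The feed-forward step that turns the concatenated window into a tag can be sketched as follows. This is a hedged illustration in plain Python, not the authors' code: the weights are random and untrained, so the predicted tag is arbitrary, and the shortened tag list, the toy layer sizes and all function names are assumptions.

```python
# Illustrative sketch only: untrained random weights, toy sizes and the tag
# list are assumptions; the slides use 400-dim embeddings and 500 hidden units.
import math
import random
from typing import List

TAGS = ["O", "B-Location", "I-Location"]  # hypothetical, shortened tag set


def init_layer(n_in: int, n_out: int, rng: random.Random) -> List[List[float]]:
    """Create an n_out x (n_in + 1) weight matrix; the last column is the bias."""
    return [[rng.uniform(-0.1, 0.1) for _ in range(n_in + 1)] for _ in range(n_out)]


def affine(weights: List[List[float]], x: List[float]) -> List[float]:
    """Compute W x + b for a weight matrix whose last column holds the bias."""
    return [sum(w * v for w, v in zip(row[:-1], x)) + row[-1] for row in weights]


def relu(z: List[float]) -> List[float]:
    return [max(0.0, v) for v in z]


def softmax(z: List[float]) -> List[float]:
    m = max(z)  # subtract the maximum for numerical stability
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]


def forward(x: List[float], hidden_w: List[List[float]],
            output_w: List[List[float]]) -> List[float]:
    """One hidden layer with ReLU, followed by a softmax over the tags."""
    return softmax(affine(output_w, relu(affine(hidden_w, x))))


rng = random.Random(0)
x = [0.1, 0.2, 0.9, 0.8, 0.3, 0.4]        # concatenated window (3N-dim, N = 2)
hidden_w = init_layer(len(x), 4, rng)     # 4 hidden units in this toy example
output_w = init_layer(4, len(TAGS), rng)
probs = forward(x, hidden_w, output_w)
predicted = TAGS[probs.index(max(probs))]
print(predicted, [round(p, 3) for p in probs])
```

In a trained network the highest probability for the window "from Beijing to" would be expected on a location tag; here the weights are random, so only the shapes and the data flow are meaningful.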
Postprocessing (1)
W(1), W(2), W(3) -> FR -> ML -> Label(1), Label(2), Label(3) -> Postprocessing -> Label(1), Label(2), Label(3)
(FR = feature representation, ML = machine learning)
Correct for inconsistencies:
- NE starting with an I-tag
- Multi-word expressions having different categories
Postprocessing (2)
"Manchester United is" -> FR -> ML -> B-Loc, I-sportsteam, O -> Postprocessing -> B-sportsteam, I-sportsteam, O
(FR = feature representation, ML = machine learning)
Correct for inconsistencies:
- NE starting with an I-tag
- Multi-word expressions having different categories
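The two corrections listed above can be sketched as a small function over BIO labels. The slides name the inconsistencies but do not spell out the repair rules, so the rules used here are assumptions: an I-tag that starts an entity is turned into a B-tag, and a multi-word entity receives its most frequent category, with ties going to the category of the last word. The function name `postprocess` is hypothetical as well.

```python
# Illustrative sketch only. The repair rules below (I- becomes B- at the start
# of an entity; majority category, ties going to the last word) are
# assumptions that happen to reproduce the example on the slide.
from collections import Counter
from typing import List


def postprocess(labels: List[str]) -> List[str]:
    """Make a BIO label sequence consistent."""
    fixed = list(labels)

    # Rule 1: an entity must not start with an I-tag.
    for i, label in enumerate(fixed):
        if label.startswith("I-") and (i == 0 or fixed[i - 1] == "O"):
            fixed[i] = "B-" + label[2:]

    # Rule 2: every word of a multi-word entity gets the same category.
    i = 0
    while i < len(fixed):
        if fixed[i] == "O":
            i += 1
            continue
        j = i + 1
        while j < len(fixed) and fixed[j].startswith("I-"):
            j += 1
        categories = [label[2:] for label in fixed[i:j]]
        counts = Counter(categories)
        best = max(counts.values())
        # Among the most frequent categories, prefer the one seen last.
        category = next(c for c in reversed(categories) if counts[c] == best)
        fixed[i] = "B-" + category
        for k in range(i + 1, j):
            fixed[k] = "I-" + category
        i = j
    return fixed


print(postprocess(["B-Loc", "I-sportsteam", "O"]))
# ['B-sportsteam', 'I-sportsteam', 'O']
```

The same function also repairs a sequence such as `["O", "I-person", "I-person"]`, which becomes `["O", "B-person", "I-person"]` under the first rule.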
Experimental setup
Feature Learning
- Word2vec skip-gram with negative sampling
- 400 million raw English tweets (limited preprocessing)
Neural Network
- One hidden layer with 500 hidden units
- Word embeddings of size 400, vocabulary of 3 million words
- Mini-batch SGD and dropout
- Experiments with tanh and ReLU
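To get a feel for the sizes implied by this setup, the numbers can be plugged into a short calculation. This is illustrative arithmetic only: it assumes the window of 3 from the architecture slides also applies to the experiments, and it assumes 21 output tags (a B- and an I-tag for each of the 10 entity types, plus O). The class name `Setup` is hypothetical.

```python
# Illustrative arithmetic only. Embedding size, vocabulary size and hidden
# units come from the slides; the window and the number of tags are assumptions.
from dataclasses import dataclass


@dataclass(frozen=True)
class Setup:
    embedding_size: int = 400          # from the slides
    vocabulary_size: int = 3_000_000   # from the slides
    hidden_units: int = 500            # from the slides
    window: int = 3                    # assumption: same window as the architecture slides
    num_tags: int = 21                 # assumption: B/I tags for 10 types, plus O

    @property
    def input_size(self) -> int:
        """Size of the concatenated window fed to the hidden layer."""
        return self.window * self.embedding_size

    @property
    def lookup_parameters(self) -> int:
        """Number of values in the word2vec lookup table."""
        return self.vocabulary_size * self.embedding_size

    @property
    def network_parameters(self) -> int:
        """Weights and biases of the hidden and output layers."""
        hidden = (self.input_size + 1) * self.hidden_units
        output = (self.hidden_units + 1) * self.num_tags
        return hidden + output


setup = Setup()
print(setup.input_size)          # 1200
print(setup.lookup_parameters)   # 1200000000
print(setup.network_parameters)  # 611021
```

Under these assumptions the lookup table is roughly 2,000 times larger than the classifier itself, which fits the idea of learning the word representations once and reusing them across tasks.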
Word2vec results
Slang
- Wrong capitalization
- Sometimes not in Gazetteer
Spelling
Normalizing slang words/spelling
Dealing with capitalization + gazetteer functionality
Results
System         POS  Orthographic  Gazetteers  Brown clustering  Word embedding        ML                        F1 (%)
ousia          X    X             X           –                 GloVe                 entity linking using SVM  56.41
NLANGP         –    X             X           X                 word2vec & GloVe      CRF++                     51.40
nrc            –    –             X           X                 word2vec              semi-Markov MIRA          44.74
multimedialab  –    –             –           –                 word2vec              FFNN                      43.75
USFD           X    X             X           X                 –                     CRF L-BFGS                42.46
iitp           X    X             X           –                 –                     CRF++                     39.84
Hallym         X    –             –           X                 correlation analysis  CRFsuite                  37.21
lattice        X    X             –           X                 –                     CRF wapiti                16.47
Baseline       –    X             X           –                 –                     CRFsuite                  31.97
Lessons learned
Feature Learning
- A word2vec window of 1 worked best, giving more syntax-oriented embeddings
Neural Networks
- Multiple layers did not improve the F1-score
- Dropout and ReLU worked best
Postprocessing
- Multi-word expressions often have different categories
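The first lesson, that a word2vec window of 1 worked best, can be made concrete by listing the (target, context) pairs that skip-gram would see for different window sizes. The sketch below only enumerates pairs and does not train a model. The example tweet and the function name `skipgram_pairs` are assumptions, and the sketch uses a fixed window, whereas word2vec itself samples an effective window up to the configured maximum.

```python
# Illustrative sketch only: enumerates the (target, context) pairs for a fixed
# window; it does not train a model, and the example tweet is made up.
from typing import List, Tuple


def skipgram_pairs(tokens: List[str], window: int) -> List[Tuple[str, str]]:
    """Return every (target, context) pair within `window` positions."""
    pairs = []
    for t, target in enumerate(tokens):
        lo = max(0, t - window)
        hi = min(len(tokens), t + window + 1)
        for c in range(lo, hi):
            if c != t:
                pairs.append((target, tokens[c]))
    return pairs


tweet = ["going", "from", "Beijing", "to", "Ghent"]  # hypothetical example
narrow = skipgram_pairs(tweet, window=1)
wide = skipgram_pairs(tweet, window=2)
print([ctx for tgt, ctx in narrow if tgt == "Beijing"])  # ['from', 'to']
print([ctx for tgt, ctx in wide if tgt == "Beijing"])    # ['going', 'from', 'to', 'Ghent']
```

With a window of 1, each word is predicted only from its immediate neighbours, so words that fill the same grammatical slot tend to end up close together, which is one way to read the "more syntax-oriented" remark.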
Conclusion
End-to-end semi-supervised neural network architecture
No feature engineering needed
Reusable architecture
Beats traditional systems that only use hand-crafted features
#Questions?
The word2vec Twitter model is available at: http://www.fredericgodin.com/software/
@frederic_godin

Named Entity Recognition for Twitter Microposts (only) using Distributed Word Representations

  • 1. ELIS – Multimedia Lab Fréderic Godin, Baptist Vandersmissen, Wesley De Neve & Rik Van de Walle Multimedia Lab, Ghent University – iMinds Find me at: @frederic_godin / www.fredericgodin.com Named Entity Recognition for Twitter Microposts (only) using Distributed Word Representations
  • 2. 2 ELIS – Multimedia Lab NER in Twitter Microposts using distributed word representations Fréderic Godin et al. 31 July 2015 Introduction Goal: Recognizing 10 types of named entities (NEs) in noisy Twitter microposts Problem: Tweets contain spelling mistakes, slang and lack uniform grammar rules
  • 3. Traditional solutions
    Typical features: orthographic features, gazetteers, corpus statistics, or other parsing techniques (PoS tagging and chunking).
    Typical machine learning techniques: CRF, HMM.
  • 4. An overview of the used approaches
    Team           POS  Orthographic  Gazetteers  Brown clustering  Word embedding        ML                        F1 (%)
    ousia          X    X             X           –                 GloVe                 entity linking using SVM  56.41
    NLANGP         –    X             X           X                 word2vec & GloVe      CRF++                     51.40
    nrc            –    –             X           X                 word2vec              semi-Markov MIRA          44.74
    multimedialab  –    –             –           –                 word2vec              FFNN                      43.75
    USFD           X    X             X           X                 –                     CRF L-BFGS                42.46
    iitp           X    X             X           –                 –                     CRF++                     39.84
    Hallym         X    –             –           X                 correlation analysis  CRFsuite                  37.21
    lattice        X    X             –           X                 –                     CRF wapiti                16.47
    Baseline       –    X             X           –                 –                     CRFsuite                  31.97
  • 5. A simple, general but effective neural network architecture
    Use word2vec to generate good feature representations for words (= unsupervised learning).
    Feed those word representations to another neural network (NN) for any classification task (= supervised learning).
    Pipeline: Example → Feature representation (learn word2vec word representations once in advance) → Machine learning (train a new NN for any task) → Label(s)
  • 6. Word2vec: automatically learning good features
    [Figure: 2D projection of a 400D space of the top 1000 words used on Twitter.]
    The model was trained on 400 million tweets containing 5 billion words.
  • 7. A simple, general but effective neural network architecture (1)
    Diagram (Window = 3): W(t-1), W(t), W(t+1) → Lookup (N-dim each) → Concatenate (3N-dim) → Feed forward neural network → Tag(W(t))
    Stages: Example → Feature representation → Machine learning → Label(s)
  • 8. A simple, general but effective neural network architecture (2)
    Diagram (Window = 3): "from", "Beijing", "to" → Lookup (N-dim each) → Concatenate (3N-dim) → Feed forward neural network → Location
    Stages: Example → Feature representation → Machine learning → Label(s)
  • 9. Postprocessing (1)
    Diagram: W(1) W(2) W(3) → FR → ML → Label(1) Label(2) Label(3) → Postprocessing → Label(1) Label(2) Label(3)
    Correct for inconsistencies:
    - NE starting with an I-tag
    - Multi-word expressions having different categories
  • 10. Postprocessing (2)
    Diagram: "Manchester United is" → FR → ML → B-Loc I-sportsteam O → Postprocessing → B-sportsteam I-sportsteam O
    Correct for inconsistencies:
    - NE starting with an I-tag
    - Multi-word expressions having different categories
  • 11. Experimental setup
    Feature learning:
    - Word2vec, Skip-gram with negative sampling
    - 400 million raw English tweets (limited preprocessing)
    Neural network:
    - One hidden layer with 500 hidden units
    - Word embeddings of size 400, vocabulary of 3 million words
    - Mini-batch SGD and dropout
    - Experiments with tanh and ReLU
  • 12. Word2vec results
    [Figure labels: Slang; Spelling; Wrong capitalization; Sometimes not in gazetteer]
  • 13. Normalizing slang words/spelling
  • 14. Dealing with capitalization + gazetteer functionality
  • 15. Results
    Team           POS  Orthographic  Gazetteers  Brown clustering  Word embedding        ML                        F1 (%)
    ousia          X    X             X           –                 GloVe                 entity linking using SVM  56.41
    NLANGP         –    X             X           X                 word2vec & GloVe      CRF++                     51.40
    nrc            –    –             X           X                 word2vec              semi-Markov MIRA          44.74
    multimedialab  –    –             –           –                 word2vec              FFNN                      43.75
    USFD           X    X             X           X                 –                     CRF L-BFGS                42.46
    iitp           X    X             X           –                 –                     CRF++                     39.84
    Hallym         X    –             –           X                 correlation analysis  CRFsuite                  37.21
    lattice        X    X             –           X                 –                     CRF wapiti                16.47
    BASELINE       –    X             X           –                 –                     CRFsuite                  31.97
  • 16. Lessons learned
    Feature learning:
    - A W2V window of 1 worked best
    - More syntax-oriented embeddings
    Neural networks:
    - Multiple layers did not improve the F1-score
    - Dropout and ReLU worked best
    Postprocessing:
    - Multi-word expressions often have different categories
  • 17. Conclusion
    - End-to-end semi-supervised neural network architecture
    - No feature engineering needed
    - Reusable architecture
    - Beats traditional systems that only use hand-crafted features
  • 18. #Questions?
    The word2vec Twitter model is available at: http://www.fredericgodin.com/software/
    @frederic_godin
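
Slides 7 and 8 describe a lookup of N-dimensional word vectors for a window of three words, their concatenation into a 3N-dimensional input, and a feed-forward network that tags the centre word. The data flow might look roughly like the pure-Python sketch below. It rests on several assumptions that are not in the slides: toy 4-dimensional random vectors stand in for the 400-dimensional word2vec embeddings, the weights are untrained, positions outside the tweet are zero-padded, and the names window_features and forward are hypothetical, not the authors' code.

```python
import math
import random

def window_features(tokens, t, embeddings, dim, window=3):
    """Concatenate the embeddings of the words in a window centred on tokens[t]."""
    half = window // 2
    zero = [0.0] * dim  # assumption: zero vector for padding and unknown words
    features = []
    for i in range(t - half, t + half + 1):
        word = tokens[i] if 0 <= i < len(tokens) else None
        features.extend(embeddings.get(word, zero))
    return features  # length is window * dim

def forward(x, W1, b1, W2, b2):
    """One ReLU hidden layer followed by a softmax over the labels."""
    hidden = [max(0.0, sum(w * xi for w, xi in zip(row, x)) + b)
              for row, b in zip(W1, b1)]
    scores = [sum(w * h for w, h in zip(row, hidden)) + b
              for row, b in zip(W2, b2)]
    top = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

# Toy setup (hypothetical sizes; the slides use 400-dim embeddings, 500 hidden units).
random.seed(0)
DIM, HIDDEN = 4, 5
LABELS = ["O", "B-location", "I-location"]
embeddings = {w: [random.uniform(-1, 1) for _ in range(DIM)]
              for w in ["from", "beijing", "to"]}
W1 = [[random.uniform(-0.1, 0.1) for _ in range(3 * DIM)] for _ in range(HIDDEN)]
b1 = [0.0] * HIDDEN
W2 = [[random.uniform(-0.1, 0.1) for _ in range(HIDDEN)] for _ in range(len(LABELS))]
b2 = [0.0] * len(LABELS)

tokens = ["from", "beijing", "to"]
x = window_features(tokens, 1, embeddings, DIM)  # features for "beijing"
probs = forward(x, W1, b1, W2, b2)               # one probability per label
```

Because the weights here are random, the probabilities carry no meaning; the sketch only illustrates how one 3N-dimensional vector per word position is built and scored.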
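
The two corrections listed on slides 9 and 10 (an NE starting with an I-tag, and multi-word expressions having different categories) can be illustrated with a short sketch. The slides do not state how the final category is chosen, so the rule used here is an assumption: take the most frequent category in the span and break ties with the category of the last word, which reproduces the "Manchester United" example from slide 10. The function name fix_tags is hypothetical.

```python
from collections import Counter

def fix_tags(tags):
    """Make a BIO tag sequence consistent.

    A span is a first non-O tag followed by any run of I- tags.
    Every span is rewritten to start with B- and to use one category.
    """
    fixed = []
    i, n = 0, len(tags)
    while i < n:
        if tags[i] == "O":
            fixed.append("O")
            i += 1
            continue
        # Collect the span: the first tag (B- or a stray I-) plus following I- tags.
        j = i + 1
        while j < n and tags[j].startswith("I-"):
            j += 1
        categories = [tag[2:] for tag in tags[i:j]]
        counts = Counter(categories)
        best = max(counts.values())
        # Assumed rule: most frequent category, ties go to the last word.
        category = next(c for c in reversed(categories) if counts[c] == best)
        fixed.append("B-" + category)
        fixed.extend("I-" + category for _ in range(j - i - 1))
        i = j
    return fixed

print(fix_tags(["B-Loc", "I-sportsteam", "O"]))
print(fix_tags(["O", "I-person", "I-person"]))
```

The first call covers the mixed-category case from slide 10, the second an entity that wrongly starts with an I-tag.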
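
Slides 12 to 14 suggest that slang, spelling variants and differently capitalized forms of a word end up close together in the word2vec space. One common way to inspect this is a nearest-neighbour query by cosine similarity, sketched below. The three-dimensional vectors are made-up toy values chosen for illustration, not entries from the authors' Twitter model, and the helper names cosine and nearest are hypothetical.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def nearest(word, embeddings, k=3):
    """Return the k words whose vectors are most similar to the query word."""
    query = embeddings[word]
    scored = [(cosine(query, vec), other)
              for other, vec in embeddings.items() if other != word]
    return [other for _, other in sorted(scored, reverse=True)[:k]]

# Toy vectors (illustrative only): spelling variants are placed close together.
toy = {
    "tomorrow": [0.90, 0.10, 0.00],
    "2morrow":  [0.85, 0.15, 0.05],
    "tmrw":     [0.70, 0.30, 0.10],
    "beijing":  [0.00, 0.10, 0.95],
    "Beijing":  [0.05, 0.10, 0.90],
}

print(nearest("2morrow", toy, k=2))
print(nearest("beijing", toy, k=1))
```

With real embeddings, the same query would be run against the full vocabulary, where an approximate or vectorized search is usually preferable to this linear scan.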