As part of the Named Entity Recognition for Twitter Microposts shared task at ACL 2015, we propose a solution that uses only word embeddings. The word embeddings model is trained on 400 million tweets and is available at http://www.fredericgodin.com/software/.
Named Entity Recognition for Twitter Microposts (only) using Distributed Word Representations
ELIS – Multimedia Lab
Fréderic Godin, Baptist Vandersmissen, Wesley De Neve & Rik Van de Walle
Multimedia Lab, Ghent University – iMinds
Find me at: @frederic_godin / www.fredericgodin.com
Named Entity Recognition for Twitter Microposts (only) using Distributed Word Representations
NER in Twitter Microposts using distributed word representations
Fréderic Godin et al.
31 July 2015
Introduction

Goal: recognizing 10 types of named entities (NEs) in noisy Twitter microposts.
Problem: tweets contain spelling mistakes and slang, and lack uniform grammar rules.
Traditional solutions

Typical features: orthographic features, gazetteers, corpus statistics, or other parsing techniques (PoS tagging and chunking).
Typical machine learning techniques: CRF, HMM.
| Team          | POS | Orthographic | Gazetteers | Brown clustering | Word embedding       | ML                       | F1 (%) |
|---------------|-----|--------------|------------|------------------|----------------------|--------------------------|--------|
| ousia         | X   | X            | X          | –                | GloVe                | entity linking using SVM | 56.41  |
| NLANGP        | –   | X            | X          | X                | word2vec & GloVe     | CRF++                    | 51.40  |
| nrc           | –   | –            | X          | X                | word2vec             | semi-Markov MIRA         | 44.74  |
| multimedialab | –   | –            | –          | –                | word2vec             | FFNN                     | 43.75  |
| USFD          | X   | X            | X          | X                | –                    | CRF L-BFGS               | 42.46  |
| iitp          | X   | X            | X          | –                | –                    | CRF++                    | 39.84  |
| Hallym        | X   | –            | –          | X                | correlation analysis | CRFsuite                 | 37.21  |
| lattice       | X   | X            | –          | X                | –                    | CRF wapiti               | 16.47  |
| Baseline      | –   | X            | X          | –                | –                    | CRFsuite                 | 31.97  |

An overview of the approaches used
A simple, general but effective neural network architecture

- Use word2vec to generate good feature representations for words (= unsupervised learning)
- Feed those word representations to another neural network (NN) for any classification task (= supervised learning)

Pipeline: Example → Feature representation → Machine learning → Label(s)
- Learn the word2vec word representations once, in advance
- Train a new NN for any task
Word2vec: automatically learning good features

[Figure: 2D projection of the 400-dimensional embedding space for the top 1000 words used on Twitter. The model was trained on 400 million tweets containing 5 billion words.]
A simple, general but effective neural network architecture (1)

W(t-1), W(t), W(t+1) → Lookup → three N-dim embeddings → Concatenate (3N-dim, window = 3) → Feed-forward neural network → Tag(W(t))

Pipeline: Example → Feature representation → Machine learning → Label(s)
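The lookup-and-concatenate step can be sketched as follows. The vectors and names here are purely illustrative toy values; the actual model uses 400-dimensional word2vec embeddings.

```python
# Sketch of the window-based feature construction (toy values; the
# actual model uses N = 400 word2vec embeddings).
N = 4  # toy embedding size

# Toy lookup table standing in for the word2vec embeddings
embeddings = {
    "from":    [0.1] * N,
    "Beijing": [0.9] * N,
    "to":      [0.2] * N,
}
UNK = [0.0] * N  # fallback for out-of-vocabulary or padding positions

def window_features(words, t, window=3):
    """Concatenate the embeddings of the window centred on position t."""
    half = window // 2
    feats = []
    for i in range(t - half, t + half + 1):
        word = words[i] if 0 <= i < len(words) else None
        feats.extend(embeddings.get(word, UNK))
    return feats

x = window_features(["from", "Beijing", "to"], t=1)
assert len(x) == 3 * N  # the 3N-dim input for the feed-forward network
```

The resulting 3N-dimensional vector is what the feed-forward network classifies into a tag for the centre word W(t).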
A simple, general but effective neural network architecture (2)

Example: "from", "Beijing", "to" → Lookup → three N-dim embeddings → Concatenate (3N-dim, window = 3) → Feed-forward neural network → Location

Pipeline: Example → Feature representation → Machine learning → Label(s)
Postprocessing (1)

W(1), W(2), W(3) → Feature representation → Machine learning → Label(1), Label(2), Label(3) → Post-processing → corrected Label(1), Label(2), Label(3)

Correct for inconsistencies:
- A named entity starting with an I-tag
- Multi-word expressions having different categories
Postprocessing (2)

Example: "Manchester United is" → B-Loc, I-sportsteam, O → Post-processing → B-sportsteam, I-sportsteam, O
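The two correction rules can be sketched as follows. The slides state only the rules, not the implementation; in particular, the tie-breaking choice for multi-word expressions (preferring the category appearing later in the span) is an assumption, chosen so that the "Manchester United" example comes out as shown.

```python
# Sketch of the two postprocessing rules (hypothetical implementation;
# the tie-breaking rule below is an assumption, not from the slides).
from collections import Counter

def _majority(cats):
    counts = Counter(cats)
    top = max(counts.values())
    # Tie-break: prefer the category appearing latest in the span
    for c in reversed(cats):
        if counts[c] == top:
            return c

def postprocess(tags):
    tags = list(tags)
    # Rule 1: an entity may not start with an I-tag; promote it to B-.
    for i, tag in enumerate(tags):
        if tag.startswith("I-") and (i == 0 or tags[i - 1] == "O"):
            tags[i] = "B-" + tag[2:]
    # Rule 2: give each multi-word entity a single, majority category.
    i = 0
    while i < len(tags):
        if tags[i].startswith("B-"):
            j = i + 1
            while j < len(tags) and tags[j].startswith("I-"):
                j += 1
            best = _majority([t[2:] for t in tags[i:j]])
            tags[i] = "B-" + best
            for k in range(i + 1, j):
                tags[k] = "I-" + best
            i = j
        else:
            i += 1
    return tags

print(postprocess(["B-Loc", "I-sportsteam", "O"]))
# → ['B-sportsteam', 'I-sportsteam', 'O']
```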
Experimental setup

Feature learning:
- word2vec skip-gram with negative sampling
- 400 million raw English tweets (limited preprocessing)

Neural network:
- One hidden layer with 500 hidden units
- Word embeddings of size 400, vocabulary of 3 million words
- Mini-batch SGD and dropout
- Experiments with tanh and ReLU
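A minimal sketch of the classifier side of this setup: one hidden layer of 500 ReLU units with dropout, and a softmax output over the NE tags. The weight initialization and the output size of 21 (10 NE types in a BIO scheme plus O) are assumptions; the slides do not specify them.

```python
# Sketch of the one-hidden-layer feed-forward network from the setup
# above. Weights, initialization, and tag-set size are illustrative.
import numpy as np

rng = np.random.default_rng(0)

n_in, n_hidden, n_tags = 3 * 400, 500, 21  # 21 = 10 BIO types x 2 + O (assumed)

W1 = rng.normal(0, 0.01, (n_in, n_hidden))
b1 = np.zeros(n_hidden)
W2 = rng.normal(0, 0.01, (n_hidden, n_tags))
b2 = np.zeros(n_tags)

def forward(x, train=False, dropout=0.5):
    h = np.maximum(0.0, x @ W1 + b1)            # ReLU hidden layer
    if train:                                    # inverted dropout
        mask = rng.random(h.shape) < (1 - dropout)
        h = h * mask / (1 - dropout)
    z = h @ W2 + b2
    e = np.exp(z - z.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)     # softmax over tags

probs = forward(rng.normal(size=(2, n_in)))      # a mini-batch of 2 windows
assert probs.shape == (2, n_tags)
```

Training would minimize the cross-entropy of these softmax outputs with mini-batch SGD, as listed above.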
Word2vec results

[Figure: nearest neighbours of example words in the embedding space, illustrating slang and spelling variants, wrong capitalization, and words that are sometimes not in a gazetteer.]
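How such nearest neighbours are found can be sketched with cosine similarity over toy vectors. The words and values below are made up for illustration; the real model uses the 400-dimensional word2vec embeddings trained on 400 million tweets.

```python
# Sketch of nearest-neighbour lookup in an embedding space (toy
# vectors; not the actual Twitter word2vec model).
import math

toy_embeddings = {
    "tomorrow": [0.9, 0.1, 0.0],
    "tmrw":     [0.85, 0.15, 0.05],  # slang spelling, nearby vector
    "2morrow":  [0.8, 0.2, 0.1],     # another spelling variant
    "cat":      [0.0, 0.1, 0.9],     # unrelated word, far away
}

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm

def nearest(word, k=2):
    sims = [(cosine(toy_embeddings[word], v), w)
            for w, v in toy_embeddings.items() if w != word]
    return [w for _, w in sorted(sims, reverse=True)[:k]]

print(nearest("tomorrow"))  # → ['tmrw', '2morrow']
```

Because slang and misspellings end up close to their canonical forms, the embeddings provide normalization and gazetteer-like information without hand-crafted resources.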
Normalizing slang words/spelling
Dealing with capitalization + gazetteer functionality
Results

| Team          | POS | Orthographic | Gazetteers | Brown clustering | Word embedding       | ML                       | F1 (%) |
|---------------|-----|--------------|------------|------------------|----------------------|--------------------------|--------|
| ousia         | X   | X            | X          | –                | GloVe                | entity linking using SVM | 56.41  |
| NLANGP        | –   | X            | X          | X                | word2vec & GloVe     | CRF++                    | 51.40  |
| nrc           | –   | –            | X          | X                | word2vec             | semi-Markov MIRA         | 44.74  |
| multimedialab | –   | –            | –          | –                | word2vec             | FFNN                     | 43.75  |
| USFD          | X   | X            | X          | X                | –                    | CRF L-BFGS               | 42.46  |
| iitp          | X   | X            | X          | –                | –                    | CRF++                    | 39.84  |
| Hallym        | X   | –            | –          | X                | correlation analysis | CRFsuite                 | 37.21  |
| lattice       | X   | X            | –          | X                | –                    | CRF wapiti               | 16.47  |
| Baseline      | –   | X            | X          | –                | –                    | CRFsuite                 | 31.97  |
Lessons learned

Feature learning:
- A word2vec window of 1 worked best, giving more syntax-oriented embeddings

Neural networks:
- Multiple hidden layers did not improve the F1-score
- Dropout and ReLU worked best

Postprocessing:
- Multi-word expressions often have different categories
Conclusion

- End-to-end semi-supervised neural network architecture
- No feature engineering needed
- Reusable architecture
- Beats traditional systems that use only hand-crafted features
#Questions?

The word2vec Twitter model is available at: http://www.fredericgodin.com/software/
@frederic_godin