As part of the Named Entity Recognition for Twitter Microposts shared task at ACL 2015, we propose a solution that uses only word embeddings. The word embeddings model is trained on 400 million tweets and is available at http://www.fredericgodin.com/software/.
Named Entity Recognition for Twitter Microposts (only) using Distributed Word Representations
ELIS – Multimedia Lab
Fréderic Godin, Baptist Vandersmissen, Wesley De Neve & Rik Van de Walle
Multimedia Lab, Ghent University – iMinds
Find me at: @frederic_godin / www.fredericgodin.com
Named Entity Recognition for Twitter Microposts (only) using Distributed Word Representations
NER in Twitter Microposts using distributed word representations
Fréderic Godin et al.
31 July 2015
Introduction

Goal: recognizing 10 types of named entities (NEs) in noisy Twitter microposts.
Problem: tweets contain spelling mistakes and slang, and lack uniform grammar rules.
Traditional solutions

Typical features: orthographic features, gazetteers, corpus statistics, or other parsing techniques (PoS tagging and chunking).
Typical machine learning techniques: CRF, HMM.
| Team          | POS | Orthographic | Gazetteers | Brown clustering | Word embedding       | ML                       | F1 (%) |
|---------------|-----|--------------|------------|------------------|----------------------|--------------------------|--------|
| ousia         | X   | X            | X          | –                | GloVe                | entity linking using SVM | 56.41  |
| NLANGP        | –   | X            | X          | X                | word2vec & GloVe     | CRF++                    | 51.40  |
| nrc           | –   | –            | X          | X                | word2vec             | semi-Markov MIRA         | 44.74  |
| multimedialab | –   | –            | –          | –                | word2vec             | FFNN                     | 43.75  |
| USFD          | X   | X            | X          | X                | –                    | CRF L-BFGS               | 42.46  |
| iitp          | X   | X            | X          | –                | –                    | CRF++                    | 39.84  |
| Hallym        | X   | –            | –          | X                | correlation analysis | CRFsuite                 | 37.21  |
| lattice       | X   | X            | –          | X                | –                    | CRF wapiti               | 16.47  |
| Baseline      | –   | X            | X          | –                | –                    | CRFsuite                 | 31.97  |

An overview of the approaches used
A simple, general but effective neural network architecture

- Use word2vec to generate good feature representations for words (= unsupervised learning)
- Feed those word representations to another neural network (NN) for any classification task (= supervised learning)

Pipeline: Example → Feature representation → Machine learning → Label(s)
- Learn the word2vec word representations once, in advance
- Train a new NN for any task
Word2vec: automatically learning good features

[Figure: 2D projection of the 400-dimensional embedding space for the top 1000 words used on Twitter. The model was trained on 400 million tweets containing 5 billion words.]
A simple, general but effective neural network architecture (1)

W(t-1), W(t), W(t+1) → Lookup → three N-dim embeddings → Concatenate (3N-dim, window = 3) → Feed-forward neural network → Tag(W(t))

Pipeline: Example → Feature representation → Machine learning → Label(s)
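The lookup-and-concatenate step can be sketched as follows. The vectors and names here are purely illustrative toy values; the actual model uses 400-dimensional word2vec embeddings.

```python
# Sketch of the window-based feature construction (toy values; the
# actual model uses N = 400 word2vec embeddings).
N = 4  # toy embedding size

# Toy lookup table standing in for the word2vec embeddings
embeddings = {
    "from":    [0.1] * N,
    "Beijing": [0.9] * N,
    "to":      [0.2] * N,
}
UNK = [0.0] * N  # fallback for out-of-vocabulary or padding positions

def window_features(words, t, window=3):
    """Concatenate the embeddings of the window centred on position t."""
    half = window // 2
    feats = []
    for i in range(t - half, t + half + 1):
        word = words[i] if 0 <= i < len(words) else None
        feats.extend(embeddings.get(word, UNK))
    return feats

x = window_features(["from", "Beijing", "to"], t=1)
assert len(x) == 3 * N  # the 3N-dim input for the feed-forward network
```

The resulting 3N-dimensional vector is what the feed-forward network classifies into a tag for the centre word W(t).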
A simple, general but effective neural network architecture (2)

Example: "from", "Beijing", "to" → Lookup → three N-dim embeddings → Concatenate (3N-dim, window = 3) → Feed-forward neural network → Location

Pipeline: Example → Feature representation → Machine learning → Label(s)
Postprocessing (1)

W(1), W(2), W(3) → Feature representation → Machine learning → Label(1), Label(2), Label(3) → Post-processing → corrected Label(1), Label(2), Label(3)

Correct for inconsistencies:
- A named entity starting with an I-tag
- Multi-word expressions having different categories
Postprocessing (2)

Example: "Manchester United is" → B-Loc, I-sportsteam, O → Post-processing → B-sportsteam, I-sportsteam, O
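The two correction rules can be sketched as follows. The slides state only the rules, not the implementation; in particular, the tie-breaking choice for multi-word expressions (preferring the category appearing later in the span) is an assumption, chosen so that the "Manchester United" example comes out as shown.

```python
# Sketch of the two postprocessing rules (hypothetical implementation;
# the tie-breaking rule below is an assumption, not from the slides).
from collections import Counter

def _majority(cats):
    counts = Counter(cats)
    top = max(counts.values())
    # Tie-break: prefer the category appearing latest in the span
    for c in reversed(cats):
        if counts[c] == top:
            return c

def postprocess(tags):
    tags = list(tags)
    # Rule 1: an entity may not start with an I-tag; promote it to B-.
    for i, tag in enumerate(tags):
        if tag.startswith("I-") and (i == 0 or tags[i - 1] == "O"):
            tags[i] = "B-" + tag[2:]
    # Rule 2: give each multi-word entity a single, majority category.
    i = 0
    while i < len(tags):
        if tags[i].startswith("B-"):
            j = i + 1
            while j < len(tags) and tags[j].startswith("I-"):
                j += 1
            best = _majority([t[2:] for t in tags[i:j]])
            tags[i] = "B-" + best
            for k in range(i + 1, j):
                tags[k] = "I-" + best
            i = j
        else:
            i += 1
    return tags

print(postprocess(["B-Loc", "I-sportsteam", "O"]))
# → ['B-sportsteam', 'I-sportsteam', 'O']
```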
Experimental setup

Feature learning:
- word2vec skip-gram with negative sampling
- 400 million raw English tweets (limited preprocessing)

Neural network:
- One hidden layer with 500 hidden units
- Word embeddings of size 400, vocabulary of 3 million words
- Mini-batch SGD and dropout
- Experiments with tanh and ReLU
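A minimal sketch of the classifier side of this setup: one hidden layer of 500 ReLU units with dropout, and a softmax output over the NE tags. The weight initialization and the output size of 21 (10 NE types in a BIO scheme plus O) are assumptions; the slides do not specify them.

```python
# Sketch of the one-hidden-layer feed-forward network from the setup
# above. Weights, initialization, and tag-set size are illustrative.
import numpy as np

rng = np.random.default_rng(0)

n_in, n_hidden, n_tags = 3 * 400, 500, 21  # 21 = 10 BIO types x 2 + O (assumed)

W1 = rng.normal(0, 0.01, (n_in, n_hidden))
b1 = np.zeros(n_hidden)
W2 = rng.normal(0, 0.01, (n_hidden, n_tags))
b2 = np.zeros(n_tags)

def forward(x, train=False, dropout=0.5):
    h = np.maximum(0.0, x @ W1 + b1)            # ReLU hidden layer
    if train:                                    # inverted dropout
        mask = rng.random(h.shape) < (1 - dropout)
        h = h * mask / (1 - dropout)
    z = h @ W2 + b2
    e = np.exp(z - z.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)     # softmax over tags

probs = forward(rng.normal(size=(2, n_in)))      # a mini-batch of 2 windows
assert probs.shape == (2, n_tags)
```

Training would minimize the cross-entropy of these softmax outputs with mini-batch SGD, as listed above.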
Word2vec results

[Figure: nearest neighbours of example words in the embedding space, illustrating slang and spelling variants, wrong capitalization, and words that are sometimes not in a gazetteer.]
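How such nearest neighbours are found can be sketched with cosine similarity over toy vectors. The words and values below are made up for illustration; the real model uses the 400-dimensional word2vec embeddings trained on 400 million tweets.

```python
# Sketch of nearest-neighbour lookup in an embedding space (toy
# vectors; not the actual Twitter word2vec model).
import math

toy_embeddings = {
    "tomorrow": [0.9, 0.1, 0.0],
    "tmrw":     [0.85, 0.15, 0.05],  # slang spelling, nearby vector
    "2morrow":  [0.8, 0.2, 0.1],     # another spelling variant
    "cat":      [0.0, 0.1, 0.9],     # unrelated word, far away
}

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm

def nearest(word, k=2):
    sims = [(cosine(toy_embeddings[word], v), w)
            for w, v in toy_embeddings.items() if w != word]
    return [w for _, w in sorted(sims, reverse=True)[:k]]

print(nearest("tomorrow"))  # → ['tmrw', '2morrow']
```

Because slang and misspellings end up close to their canonical forms, the embeddings provide normalization and gazetteer-like information without hand-crafted resources.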
Normalizing slang words/spelling
Dealing with capitalization + gazetteer functionality
Results

| Team          | POS | Orthographic | Gazetteers | Brown clustering | Word embedding       | ML                       | F1 (%) |
|---------------|-----|--------------|------------|------------------|----------------------|--------------------------|--------|
| ousia         | X   | X            | X          | –                | GloVe                | entity linking using SVM | 56.41  |
| NLANGP        | –   | X            | X          | X                | word2vec & GloVe     | CRF++                    | 51.40  |
| nrc           | –   | –            | X          | X                | word2vec             | semi-Markov MIRA         | 44.74  |
| multimedialab | –   | –            | –          | –                | word2vec             | FFNN                     | 43.75  |
| USFD          | X   | X            | X          | X                | –                    | CRF L-BFGS               | 42.46  |
| iitp          | X   | X            | X          | –                | –                    | CRF++                    | 39.84  |
| Hallym        | X   | –            | –          | X                | correlation analysis | CRFsuite                 | 37.21  |
| lattice       | X   | X            | –          | X                | –                    | CRF wapiti               | 16.47  |
| Baseline      | –   | X            | X          | –                | –                    | CRFsuite                 | 31.97  |
Lessons learned

Feature learning:
- A word2vec window of 1 worked best, giving more syntax-oriented embeddings

Neural networks:
- Multiple hidden layers did not improve the F1-score
- Dropout and ReLU worked best

Postprocessing:
- Multi-word expressions often have different categories
Conclusion

- End-to-end semi-supervised neural network architecture
- No feature engineering needed
- Reusable architecture
- Beats traditional systems that use only hand-crafted features
#Questions?

The word2vec Twitter model is available at: http://www.fredericgodin.com/software/
@frederic_godin