Deep contextualized word representations
Matthew E. Peters, Mark Neumann, Mohit Iyyer, Matt Gardner, Christopher Clark, Kenton Lee, Luke Zettlemoyer
NAACL-HLT 2018
Overview
• Propose a new type of deep contextualised word representation (ELMo) that models:
‣ complex characteristics of word use (e.g., syntax and semantics)
‣ how these uses vary across linguistic contexts (i.e., to model polysemy)
• Show that ELMo can improve existing neural models on various NLP tasks
• Argue that ELMo captures more abstract linguistic characteristics in its higher layers
Example
GloVe mostly learns the sport-related context, whereas ELMo can distinguish the word sense based on the context.
Method
• Embeddings from Language Models: ELMo
• Learn word embeddings through building bidirectional language models (biLMs)
‣ biLMs consist of forward and backward LMs
✦ Forward: $p(t_1, t_2, \ldots, t_N) = \prod_{k=1}^{N} p(t_k \mid t_1, t_2, \ldots, t_{k-1})$
✦ Backward: $p(t_1, t_2, \ldots, t_N) = \prod_{k=1}^{N} p(t_k \mid t_{k+1}, t_{k+2}, \ldots, t_N)$
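A minimal PyTorch sketch of these two factorisations, assuming a generic token-level LSTM LM (layer sizes, module names, and the helper function are illustrative, not the paper's exact character-CNN configuration):

```python
import torch
import torch.nn as nn

class DirectionalLM(nn.Module):
    """One direction of a biLM: embed tokens, run stacked LSTMs,
    and predict the next token (forward) or previous token (backward)."""
    def __init__(self, vocab_size, emb_dim=128, hidden_dim=256, num_layers=2):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)
        self.lstm = nn.LSTM(emb_dim, hidden_dim, num_layers, batch_first=True)
        self.out = nn.Linear(hidden_dim, vocab_size)

    def forward(self, tokens):                  # tokens: (batch, seq_len)
        hidden, _ = self.lstm(self.embed(tokens))
        return self.out(hidden)                 # logits: (batch, seq_len, vocab)

def bilm_loss(fwd_lm, bwd_lm, tokens):
    """Joint negative log-likelihood of the forward and backward LMs.
    The forward LM predicts t_k from t_1..t_{k-1}; the backward LM predicts
    t_k from t_{k+1}..t_N, implemented here by reversing the sequence."""
    ce = nn.CrossEntropyLoss()
    # Forward direction: inputs t_1..t_{N-1}, targets t_2..t_N
    fwd_logits = fwd_lm(tokens[:, :-1])
    fwd = ce(fwd_logits.reshape(-1, fwd_logits.size(-1)), tokens[:, 1:].reshape(-1))
    # Backward direction: same objective on the reversed sequence
    rev = tokens.flip(dims=[1])
    bwd_logits = bwd_lm(rev[:, :-1])
    bwd = ce(bwd_logits.reshape(-1, bwd_logits.size(-1)), rev[:, 1:].reshape(-1))
    return fwd + bwd
```

In the paper the two directions share the token representation and softmax parameters while keeping separate LSTM parameters; the sketch keeps them fully separate for brevity.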
Method
With a long short-term memory (LSTM) network, the biLMs are built by predicting the next word in both directions.
[Figure: the forward LM architecture, expanded in the forward direction at position $k$: the embedding layer ($x_k$), stacked LSTM hidden layers ($h^{LM}_{k,1}, h^{LM}_{k,2}$), and the output layer ($o_k$) predicting the next token, illustrated on "have a nice one ...".]
Method
ELMo represents a word $t_k$ as a linear combination of the corresponding hidden layers (including its embedding).
[Figure: the forward and backward LMs run over the token $t_k$; their hidden layers are concatenated and combined with the task-specific weights $s^{task}_0, s^{task}_1, s^{task}_2$.]

$\mathrm{ELMo}^{task}_k = \gamma^{task} \sum_{j} s^{task}_j \, \mathbf{h}^{LM}_{k,j}$

where $\mathbf{h}^{LM}_{k,0} = [\mathbf{x}_k ; \mathbf{x}_k]$ is the token embedding layer and $\mathbf{h}^{LM}_{k,j} = [\overrightarrow{\mathbf{h}}^{LM}_{k,j} ; \overleftarrow{\mathbf{h}}^{LM}_{k,j}]$ concatenates the forward and backward LM hidden states at layer $j$.
Unlike usual word embeddings, ELMo is assigned to every token instead of every type.
ELMo is a task-specific representation: the downstream task learns the weighting parameters $s^{task}_j$ and $\gamma^{task}$, while the biLMs stay fixed.
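A minimal sketch of that weighted combination, assuming the per-layer representations $\mathbf{h}^{LM}_{k,j}$ have already been computed and stacked (module and tensor names are illustrative):

```python
import torch
import torch.nn as nn

class ScalarMix(nn.Module):
    """Task-specific ELMo combination:
    ELMo_k = gamma * sum_j softmax(s)_j * h_{k,j}."""
    def __init__(self, num_layers):
        super().__init__()
        self.s = nn.Parameter(torch.zeros(num_layers))   # s^task, learned by the task
        self.gamma = nn.Parameter(torch.ones(1))          # gamma^task, learned by the task

    def forward(self, layer_reps):
        # layer_reps: (num_layers, batch, seq_len, dim), one slice per biLM layer
        weights = torch.softmax(self.s, dim=0)
        mixed = (weights.view(-1, 1, 1, 1) * layer_reps).sum(dim=0)
        return self.gamma * mixed                          # (batch, seq_len, dim)
```

Normalising $s^{task}$ with a softmax keeps the combination a convex mixture over layers, while $\gamma^{task}$ lets the task rescale the whole ELMo vector to suit its own optimisation.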
Method
ELMo can be integrated into almost all neural NLP tasks by simply concatenating it to the embedding layer (see the sketch below).
[Figure: the usual inputs from the corpus ("have a nice ...") are enhanced with ELMo vectors produced by the pre-trained biLMs before the task model is trained.]
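A minimal sketch of that integration for a hypothetical sequence-tagging model: the `elmo` module below stands in for the frozen biLMs plus the scalar mix sketched earlier, and all names and dimensions are illustrative:

```python
import torch
import torch.nn as nn

class ELMoEnhancedTagger(nn.Module):
    """Hypothetical tagger: the task's usual word embeddings are
    concatenated with ELMo vectors before the task encoder."""
    def __init__(self, vocab_size, emb_dim, elmo_dim, hidden_dim, num_tags, elmo):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)   # usual (static) embeddings
        self.elmo = elmo                                  # biLMs + scalar mix, as above
        self.encoder = nn.LSTM(emb_dim + elmo_dim, hidden_dim, batch_first=True)
        self.classifier = nn.Linear(hidden_dim, num_tags)

    def forward(self, tokens):
        x = self.embed(tokens)                   # (batch, seq_len, emb_dim)
        e = self.elmo(tokens)                    # (batch, seq_len, elmo_dim)
        enhanced = torch.cat([x, e], dim=-1)     # enhanced input [x_k ; ELMo_k]
        hidden, _ = self.encoder(enhanced)
        return self.classifier(hidden)           # per-token tag logits
```

The paper also notes that for some tasks it helps to additionally add ELMo at the output of the task RNN, but the embedding-layer concatenation shown here is the default recipe.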
Evaluation
Many linguistic tasks are improved by using ELMo:
Q&A
Textual entailment
Sentiment analysis
Semantic role labelling
Coreference resolution
Named entity recognition
Analysis
The higher layers seemed to learn semantics, while the lower layers probably captured syntactic features.
[Results by biLM layer: word sense disambiguation and PoS tagging]
Analysis
The higher layers seemed to learn semantics, while the lower layers probably captured syntactic features???
Most models preferred the (probably) "syntactic" features, even in sentiment analysis.
Analysis
ELMo-enhanced models can make use of small
datasets more efficiently
[Figures: textual entailment and semantic role labelling with varying amounts of training data]
Comments
• Pre-trained ELMo models are available at https://allennlp.org/elmo (loading sketch below)
‣ AllenNLP is a deep NLP library on top of PyTorch
‣ AllenNLP is a product of AI2 (the Allen Institute for Artificial Intelligence), which also works on other interesting projects such as Semantic Scholar
• ELMo can process character-level inputs
‣ Japanese (Chinese, Korean, …) ELMo models are therefore likely to be feasible
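A minimal loading sketch using AllenNLP's `Elmo` module (the options/weights file names below are placeholders for the files linked from allennlp.org/elmo):

```python
from allennlp.modules.elmo import Elmo, batch_to_ids

# Placeholder paths: download the actual files linked from https://allennlp.org/elmo
options_file = "elmo_options.json"
weight_file = "elmo_weights.hdf5"

# One scalar-mixed output representation; no dropout at inference time
elmo = Elmo(options_file, weight_file, num_output_representations=1, dropout=0.0)

# Inputs are built from characters, so any token string works (hence the note on
# Japanese/Chinese/Korean models above)
sentences = [["have", "a", "nice", "one"], ["the", "play", "was", "fun"]]
character_ids = batch_to_ids(sentences)

outputs = elmo(character_ids)
embeddings = outputs["elmo_representations"][0]   # (batch, seq_len, dim)
```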
