A Review of Deep Contextualized Word Representations (Peters+, 2018)
Jun. 22, 2018
A brief review of the paper:
Peters, M. E., Neumann, M., Iyyer, M., Gardner, M., Clark, C., Lee, K., & Zettlemoyer, L. (2018). Deep contextualized word representations. In NAACL-HLT (pp. 2227–2237)
2. Overview
• Propose a new type of deep contextualised word representations (ELMo) that model:
‣ Complex characteristics of word use (e.g., syntax and semantics)
‣ How these uses vary across linguistic contexts (i.e., to model polysemy)
• Show that ELMo can improve existing neural models on various NLP tasks
• Argue that ELMo captures more abstract linguistic characteristics at the higher layers
4. Method
• Embeddings from Language Models: ELMo
• Learn word embeddings through building bidirectional language models (biLMs)
‣ biLMs consist of forward and backward LMs
✦ Forward: $p(t_1, t_2, \ldots, t_N) = \prod_{k=1}^{N} p(t_k \mid t_1, t_2, \ldots, t_{k-1})$
✦ Backward: $p(t_1, t_2, \ldots, t_N) = \prod_{k=1}^{N} p(t_k \mid t_{k+1}, t_{k+2}, \ldots, t_N)$
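To make these two factorisations concrete, below is a minimal toy sketch of the biLM training objective; the `forward_prob` and `backward_prob` scoring functions are hypothetical stand-ins, not the paper's model.

```python
import math

def forward_log_likelihood(tokens, forward_prob):
    # forward_prob(history, token) ~ p(t_k | t_1 .. t_{k-1})  (hypothetical callable)
    return sum(math.log(forward_prob(tokens[:k], tokens[k]))
               for k in range(len(tokens)))

def backward_log_likelihood(tokens, backward_prob):
    # backward_prob(future, token) ~ p(t_k | t_{k+1} .. t_N)  (hypothetical callable)
    return sum(math.log(backward_prob(tokens[k + 1:], tokens[k]))
               for k in range(len(tokens)))

def bilm_objective(tokens, forward_prob, backward_prob):
    # The biLM jointly maximises the log-likelihood of both directions
    # (in the paper the two LMs share the token embeddings and the softmax
    # layer, while keeping separate LSTM parameters).
    return (forward_log_likelihood(tokens, forward_prob)
            + backward_log_likelihood(tokens, backward_prob))
```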
5. Method
The biLMs are built with long short-term memory (LSTM) networks that predict the next word in each direction (a minimal PyTorch sketch of the forward LM follows the figure note below)
[Figure: the forward LM architecture, consisting of an embedding layer ($x_k$), stacked LSTM hidden layers ($h^{LM}_{k,1}$, $h^{LM}_{k,2}$) and an output layer ($o_k$), expanded in the forward direction at position $k$; given "have a nice one …" as input, the model predicts the next word at each step ("a nice one …")]
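A minimal PyTorch sketch of such a forward LM (layer sizes are hypothetical; the paper's biLM additionally uses character convolutions, projection layers, and residual connections):

```python
import torch.nn as nn

class ForwardLM(nn.Module):
    """Minimal forward language model: embedding -> stacked LSTMs -> softmax logits.
    A simplified sketch of the layout in the figure, not the paper's exact model."""

    def __init__(self, vocab_size, embed_dim=128, hidden_dim=256, num_layers=2):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, embed_dim)    # embedding layer (x_k)
        self.lstm = nn.LSTM(embed_dim, hidden_dim,               # hidden layers (h_k1, h_k2)
                            num_layers=num_layers, batch_first=True)
        self.output = nn.Linear(hidden_dim, vocab_size)          # output layer (o_k)

    def forward(self, token_ids):
        # token_ids: (batch, seq_len); at each position the logits score the *next*
        # token, e.g. the model reads "have a nice" and is trained to predict "a nice one".
        x = self.embedding(token_ids)
        hidden_states, _ = self.lstm(x)
        return self.output(hidden_states)
```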
6. Method
ELMo represents a word as a linear combination of the corresponding hidden layers (incl. its embedding)
[Figure: for each token $t_k$, the forward and backward LMs produce hidden states at every layer; the token embedding and the hidden states of each layer are concatenated and combined with the task-specific weights $s^{task}_0, s^{task}_1, s^{task}_2$]

$\mathrm{ELMo}_k^{task} = \gamma^{task} \sum_{j=0}^{L} s_j^{task} \, \mathbf{h}_{k,j}^{LM}$

where $\mathbf{h}_{k,0}^{LM} = [\mathbf{x}_k ; \mathbf{x}_k]$ is the token embedding and, for $j \ge 1$, $\mathbf{h}_{k,j}^{LM} = [\overrightarrow{\mathbf{h}}_{k,j}^{LM} ; \overleftarrow{\mathbf{h}}_{k,j}^{LM}]$ concatenates the forward and backward hidden states of layer $j$.
• Unlike usual word embeddings, an ELMo representation is assigned to every token instead of every type
• ELMo is a task-specific representation: the downstream task learns the weighting parameters $s^{task}$ and $\gamma^{task}$ (a code sketch of this combination follows)
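A minimal sketch of this task-specific combination (hypothetical module and names; AllenNLP ships a comparable ScalarMix component):

```python
import torch
import torch.nn as nn

class ElmoCombination(nn.Module):
    """Softmax-normalised weights s^task over the L+1 biLM layers, scaled by gamma^task."""

    def __init__(self, num_layers):
        super().__init__()
        self.scalar_weights = nn.Parameter(torch.zeros(num_layers))  # s^task (before softmax)
        self.gamma = nn.Parameter(torch.ones(1))                     # gamma^task

    def forward(self, layer_reps):
        # layer_reps: list of (batch, seq_len, 2 * dim) tensors, one per biLM layer,
        # each already the concatenation [forward_state ; backward_state]
        # (layer 0 being the duplicated token embedding [x_k ; x_k]).
        s = torch.softmax(self.scalar_weights, dim=0)
        mixed = sum(weight * rep for weight, rep in zip(s, layer_reps))
        return self.gamma * mixed   # ELMo^task_k for every token k
```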
7. Method
• ELMo can be integrated into almost all neural NLP tasks by simply concatenating it to the embedding layer (see the sketch after the figure note below)
[Figure: the biLMs are pre-trained on a large corpus; for the downstream task, the usual inputs ("have a nice …") are enhanced by concatenating the corresponding ELMo vectors to the input embeddings before training]
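A minimal sketch of that integration (hypothetical helper; the downstream model itself is unchanged apart from a wider input dimension):

```python
import torch

def enhance_inputs(word_embeddings, elmo_reps):
    """Concatenate the usual context-independent embeddings with the ELMo vectors,
    i.e. x_k -> [x_k ; ELMo_k]. Both tensors are (batch, seq_len, dim).
    The downstream model (e.g. a BiLSTM tagger) then consumes the enhanced
    inputs exactly as it would its usual embedding layer."""
    return torch.cat([word_embeddings, elmo_reps], dim=-1)
```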
8. Evaluation
• Many linguistic tasks are improved by using ELMo:
‣ Q&A
‣ Textual entailment
‣ Sentiment analysis
‣ Semantic role labelling
‣ Coreference resolution
‣ Named entity recognition
9. Analysis
The higher layers seemed to learn semantics while the lower layers probably captured syntactic features
[Figures: using the biLM layer representations for word sense disambiguation and for PoS tagging]
10. Analysis
The higher layers seemed to learn semantics while the lower layers probably captured syntactic features???
Most models preferred the "syntactic (probably)" features, even in sentiment analysis
11. Analysis
ELMo-enhanced models can make use of small datasets more efficiently
[Figures: performance vs. training-set size for textual entailment and for semantic role labelling]
12. Comments
• Pre-trained ELMo models are available at https://allennlp.org/elmo (a minimal usage sketch follows below)
‣ AllenNLP is a deep NLP library on top of PyTorch
‣ AllenNLP is a product of AI2 (the Allen Institute for Artificial Intelligence), which also works on other interesting projects like Semantic Scholar
• ELMo processes character-level inputs
‣ Japanese (Chinese, Korean, …) ELMo models are therefore likely to be possible
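A minimal usage sketch based on the AllenNLP interface of the time; the option/weight file paths below are placeholders and should be replaced with the files linked from the page above:

```python
from allennlp.modules.elmo import Elmo, batch_to_ids

# Placeholder paths: download the actual files from https://allennlp.org/elmo
options_file = "elmo_options.json"
weight_file = "elmo_weights.hdf5"

# num_output_representations=1 gives one scalar-mixed ELMo vector per token
elmo = Elmo(options_file, weight_file, num_output_representations=1, dropout=0.0)

sentences = [["have", "a", "nice", "one"]]
character_ids = batch_to_ids(sentences)              # character-level inputs
output = elmo(character_ids)
elmo_embeddings = output["elmo_representations"][0]  # (batch, seq_len, dim)
```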