These are the official slides for the NAACL 2021 paper: MelBERT: Metaphor Detection via Contextualized Late Interaction using Metaphorical Identification Theories.
1. MelBERT: Metaphor Detection via
Contextualized Late Interaction using
Metaphorical Identification Theories
Minjin Choi1, Sunkyung Lee1, Eunseong Choi1, Heesoo Park2,
Junhyuk Lee1, Dongwon Lee3, and Jongwuk Lee1
Sungkyunkwan University (SKKU), Republic of Korea1, Bering Lab, Republic of Korea2,
The Pennsylvania State University, United States3
NAACL 2021
3. Metaphor Detection
➢ A metaphor expresses a concept other than the literal meaning of its words.
• Metaphors are pervasive and essential, yet subtle.
• Metaphor detection can benefit various NLP tasks, e.g., machine translation, sentiment analysis, and dialogue systems.
The debate has been sharpened.
4. Limitation of Existing Methods
➢ Feature-based approaches are intuitive and straightforward but have difficulty handling rare usages of metaphors.
Annotated adjective-noun pairs:
Literal         Metaphorical
Black dress     Black humor
Ripe banana     Ripe age
Stormy sea      Stormy applause
Sharp pencil    Sharp debate

Example: "The debate has been sharpened." → Metaphorical!
Luana Bulat, Stephen Clark, Ekaterina Shutova, "Modelling metaphor with attribute-based semantics," EACL 2017.
5. Limitation of Existing Methods
➢ RNN-based models can consider word sequences but struggle to understand the meaning of words in context.
Ge Gao, Eunsol Choi, Yejin Choi, Luke Zettlemoyer, "Neural Metaphor Detection in Context," EMNLP 2018.
Rui Mao, Chenghua Lin, Frank Guerin, "End-to-End Sequential Metaphor Identification Inspired by Linguistic Theories," ACL 2019.
[Figure: a stacked BiLSTM labels each token of "The debate has been sharpened"; only "sharpened" is marked as metaphorical.]
6. Our Key Contributions
➢ We utilize a contextualized model using two metaphor identification theories.
• The model has a powerful representation capacity for sentences.
[Figure: a Transformer encoder combined with the two metaphor identification theories, SPV and MIP.]
7. Our Key Contributions
➢ Selectional Preference Violation (SPV)
• Exploit SPV to detect a metaphor from the contradiction between a target word and its context.
Example: "The debate has been sharpened." ("sharpened" is unusual in the context of a "debate", rather than a "pencil".)
8. Our Key Contributions
➢ Metaphor Identification Procedure (MIP)
• Exploit MIP to detect a metaphor from the difference between the literal meaning and the contextual meaning of a word.
Example: "The debate has been sharpened."
Literal meaning: to make something sharp.
Contextual meaning: to become more intense.
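The MIP idea above can be sketched numerically: if the contextual embedding of a target word drifts far from its literal (isolated) embedding, the word is likely metaphorical. This is an illustrative toy, not the paper's implementation; the vectors below are random stand-ins for real encoder outputs.

```python
import numpy as np

def cosine(u, v):
    """Cosine similarity between two vectors."""
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

rng = np.random.default_rng(1)

# Stand-in embeddings (NOT real encoder outputs):
# v_t is the isolated (literal) embedding of the target word.
v_t = rng.normal(size=16)
# Literal usage: the contextual embedding stays close to the literal one.
v_context_literal = v_t + 0.1 * rng.normal(size=16)
# Metaphorical usage: the contextual embedding drifts away.
v_context_metaphor = rng.normal(size=16)

# MIP intuition: a large literal-vs-contextual gap signals a metaphor.
gap_literal = 1.0 - cosine(v_t, v_context_literal)
gap_metaphor = 1.0 - cosine(v_t, v_context_metaphor)
print(gap_metaphor > gap_literal)
```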
9. Our Key Contributions
➢ We leverage a late-interaction architecture over pre-trained
contextualized models.
• It prevents unnecessary interactions while effectively distinguishing the contextualized and linguistic meanings of a word.
9
∙∙
∙
∙∙
∙
∙∙
∙
∙∙
∙
Late interaction
Sentence Target word
∙∙
∙
∙∙
∙
∙∙
∙
∙∙
∙
All-to-all interaction
Sentence Target word
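To make the distinction concrete, here is a minimal numpy sketch (a toy with random pseudo-embeddings, not MelBERT's actual encoders): under all-to-all interaction the sentence and target word are encoded as one joint sequence, while under late interaction each side is encoded independently and only the summary vectors meet at the top.

```python
import numpy as np

D = 4  # toy embedding dimension

def encode(tokens):
    """Stand-in for a Transformer encoder: deterministic
    pseudo-embeddings seeded by the token sequence (NOT a real model)."""
    seed = sum(ord(c) for c in " ".join(tokens))
    rng = np.random.default_rng(seed)
    return rng.normal(size=(len(tokens), D))

sentence = ["The", "debate", "has", "been", "sharpened"]
target = ["sharpened"]

# All-to-all interaction: one joint pass, so every token can
# interact with every other token, including the appended target.
joint = encode(sentence + target)            # shape (6, D)

# Late interaction: each side is encoded on its own; the two
# summary vectors interact only in the layers above the encoders.
v_sentence = encode(sentence).mean(axis=0)   # shape (D,)
v_target = encode(target).mean(axis=0)       # shape (D,)
top_input = np.concatenate([v_sentence, v_target])  # shape (2*D,)
print(joint.shape, top_input.shape)
```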
11. Overview of MelBERT
➢ MelBERT consists of two main components.
• SPV layer compares the target word and its context.
• MIP layer compares the contextualized and linguistic meanings of the target word.
[Figure: two Transformer encoders, one over the sentence "[CLS] The debate has been sharpened [SEP]" and one over the target word "[CLS] sharpened [SEP]", feed the SPV and MIP layers, followed by a linear + softmax classifier.]
12. MelBERT using SPV
➢ SPV layer identifies the contradiction between a target word
and its context.
• We only utilize the sentence encoder for SPV.
[Figure: the sentence encoder over "[CLS] The debate has been sharpened [SEP]" yields v_S, the interaction across all pairwise words in the sentence, and v_S,t, the interaction between the target word and the other words in the sentence; the SPV layer compares the two.]
13. MelBERT using MIP
➢ MIP layer identifies the semantic gap between the target word in context and in isolation.
• We utilize both the sentence encoder and the target-word encoder for MIP.
[Figure: v_S,t is the contextualized embedding vector for the target word (from the sentence encoder over "[CLS] The debate has been sharpened [SEP]") and v_t is the isolated embedding vector (from the target-word encoder over "[CLS] sharpened [SEP]"); the MIP layer compares the two.]
14. Late Interaction over MelBERT
➢ The hidden vectors are combined to compute a prediction score.
• MelBERT predicts whether a target word is metaphorical or not.
[Figure: the outputs of the SPV and MIP layers are combined by a linear + softmax layer, producing a prediction such as 0.8 (metaphor) vs. 0.2 (not metaphor).]
15. MelBERT in Details
➢ The loss function for MelBERT
ŷ = σ(Wᵀ [h_MIP ; h_SPV] + b),
L = −Σ_{i=1}^{N} ( y_i log ŷ_i + (1 − y_i) log(1 − ŷ_i) ),
where h_MIP and h_SPV are the hidden vectors from the MIP and SPV layers, respectively.
➢ Linguistic features
• We utilize linguistic features such as POS tags and local contexts for the segment embedding.
• The local context is the clause containing the target tokens.
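The prediction head and loss above can be sketched in a few lines of numpy; the hidden size and the vectors standing in for h_MIP and h_SPV are arbitrary placeholders, not values from the paper.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

rng = np.random.default_rng(0)
d = 8  # hypothetical hidden size

# Stand-ins for the hidden vectors from the MIP and SPV layers.
h_mip = rng.normal(size=d)
h_spv = rng.normal(size=d)

# y_hat = sigmoid(W^T [h_MIP ; h_SPV] + b)
W = rng.normal(size=2 * d)
b = 0.0
y_hat = sigmoid(W @ np.concatenate([h_mip, h_spv]) + b)

def bce_loss(y, y_hat):
    """Binary cross-entropy summed over N labeled target words."""
    y, y_hat = np.asarray(y, float), np.asarray(y_hat, float)
    return float(-np.sum(y * np.log(y_hat) + (1 - y) * np.log(1 - y_hat)))

print(0.0 < y_hat < 1.0)                 # the prediction is a probability
print(round(bce_loss([1.0], [0.5]), 4))  # -log(0.5) ≈ 0.6931
```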
17. Experimental Setup: Dataset
➢ We evaluate MelBERT over well-known public datasets.
• VUA-18 and VUA-20
Dataset       #tokens   %M     #sent    Sent len
VUA-18_tr     116,622   11.2    6,323   18.4
VUA-18_dev     38,628   11.6    1,550   24.9
VUA-18_te      50,175   12.4    2,694   18.6
VUA-20_tr     160,154   12.0   12,109   15.0
VUA-20_te      22,196   17.9    3,698   15.5
VUA-Verb_te     5,873   30.0    2,694   18.6
#tokens: the number of tokens
%M: the percentage of metaphorical words
#sent: the number of sentences
Sent len: the average length of sentences
18. Competitive Models
➢ Four RNN-based models
• RNN_ELMo: a BiLSTM-based model using ELMo as an input
• RNN_BERT: a BiLSTM-based model using BERT embeddings as an input
• RNN_HG: a variant of the BiLSTM-based model using linguistic theories
• RNN_MHCA: a variant of the BiLSTM-based model with a multi-head attention mechanism, using linguistic theories
➢ Three contextualization-based models
• RoBERTa_BASE: a simple adoption of RoBERTa for classification
• RoBERTa_SEQ: a simple adoption of RoBERTa for sequence labeling
• DeepMet: a winning model in the VUA 2020 shared task, using linguistic
features and RoBERTa as a backbone model
21. Effects of Two Different Linguistic Theories
➢ MelBERT using both metaphor identification theories consistently
shows the best performance.
• MelBERT without SPV outperforms MelBERT without MIP, proving the
effectiveness of the late interaction mechanism.
Model     VUA-18 (Prec / Rec / F1)   VUA-20 (Prec / Rec / F1)
MelBERT   80.1 / 76.9 / 78.5         76.4 / 68.6 / 72.3
(-) MIP   77.8 / 75.8 / 76.7         74.9 / 67.8 / 71.1
(-) SPV   79.5 / 76.3 / 77.9         74.9 / 68.4 / 71.4
[Figure: the ablated architectures, (-) MIP and (-) SPV, each keep only one of the two layers over the Transformer encoders.]
22. Qualitative Analysis of MelBERT
➢ MelBERT detects metaphors that other models do not notice.
• MelBERT often fails to identify metaphorical words for implicit metaphors,
e.g., “Way of the World” is poetic.
Example sentences (metaphorical words are in red italics in the original slide; check marks there indicate which of (-) MIP, (-) SPV, and MelBERT predicted correctly):
• Manchester is not alone.
• That's an old trick.
• So who's covering tomorrow?
• The day thrift turned into a nightmare.
• Way of the World: Farming notes
24. Conclusion
➢ We propose a novel metaphor detection model with metaphor
identification theories.
• MelBERT: metaphor-aware late interaction over BERT
➢ MelBERT has an excellent theoretical foundation in linguistics.
• Selectional Preference Violation (SPV) & Metaphor Identification Procedure (MIP)
➢ MelBERT achieves competitive or state-of-the-art performance
on various datasets.