These are the official slides for the NAACL 2021 paper: MelBERT: Metaphor Detection via Contextualized Late Interaction using Metaphorical Identification Theories.
1. MelBERT: Metaphor Detection via
Contextualized Late Interaction using
Metaphorical Identification Theories
Minjin Choi1, Sunkyung Lee1, Eunseong Choi1, Heesoo Park2,
Junhyuk Lee1, Dongwon Lee3, and Jongwuk Lee1
Sungkyunkwan University (SKKU), Republic of Korea1, Bering Lab, Republic of Korea2,
The Pennsylvania State University, United States3
NAACL 2021
3. Metaphor Detection
➢ A metaphor expresses a concept other than the literal meaning of its words.
• Metaphors are pervasive and essential, yet subtle.
• Metaphor detection can benefit various NLP tasks, e.g., machine translation, sentiment analysis, and dialogue systems.
The debate has been sharpened.
4. Limitation of Existing Methods
➢ Feature-based approaches are intuitive and straightforward but have difficulty handling rare usages of metaphors.
Annotated adjective-noun pairs:
Literal         Metaphorical
Black dress     Black humor
Ripe banana     Ripe age
Stormy sea      Stormy applause
Sharp pencil    Sharp debate

Example: "The debate has been sharpened." → Metaphorical!
Luana Bulat, Stephen Clark, Ekaterina Shutova, "Modelling metaphor with attribute-based semantics," EACL 2017.
5. Limitation of Existing Methods
➢ RNN-based models can consider word sequences but struggle to understand the meaning of words in context.
Ge Gao, Eunsol Choi, Yejin Choi, Luke Zettlemoyer, "Neural Metaphor Detection in Context," EMNLP 2018.
Rui Mao, Chenghua Lin, Frank Guerin, "End-to-End Sequential Metaphor Identification Inspired by Linguistic Theories," ACL 2019.
[Figure: a stacked BiLSTM labels each token of "The debate has been sharpened"; only "sharpened" is marked as metaphorical.]
6. Our Key Contributions
➢ We utilize a contextualized model using two metaphor identification theories.
• The model has a powerful representation capacity for sentences.
[Figure: a Transformer encoder combined with the two metaphor identification theories, SPV and MIP.]
7. Our Key Contributions
➢ Selectional Preference Violation (SPV)
• Exploit SPV to detect a metaphor from the contradiction between a target word and its context.
Example: "The debate has been sharpened." ("sharpened" is unusual in the context of a "debate", rather than a "pencil".)
8. Our Key Contributions
➢ Metaphor Identification Procedure (MIP)
• Exploit MIP to detect a metaphor from the difference between the literal meaning and the contextual meaning of a word.
Example: "The debate has been sharpened."
Literal meaning: to make something sharp.
Contextual meaning: to become more intense.
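The MIP idea above can be sketched numerically: if the contextual embedding of a target word drifts far from its literal (isolated) embedding, the word is likely metaphorical. This is an illustrative toy, not the paper's implementation; the vectors below are random stand-ins for real encoder outputs.

```python
import numpy as np

def cosine(u, v):
    """Cosine similarity between two vectors."""
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

rng = np.random.default_rng(1)

# Stand-in embeddings (NOT real encoder outputs):
# v_t is the isolated (literal) embedding of the target word.
v_t = rng.normal(size=16)
# Literal usage: the contextual embedding stays close to the literal one.
v_context_literal = v_t + 0.1 * rng.normal(size=16)
# Metaphorical usage: the contextual embedding drifts away.
v_context_metaphor = rng.normal(size=16)

# MIP intuition: a large literal-vs-contextual gap signals a metaphor.
gap_literal = 1.0 - cosine(v_t, v_context_literal)
gap_metaphor = 1.0 - cosine(v_t, v_context_metaphor)
print(gap_metaphor > gap_literal)
```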
9. Our Key Contributions
➢ We leverage a late-interaction architecture over pre-trained
contextualized models.
• It prevents unnecessary interactions while effectively distinguishing the contextualized and linguistic meanings of a word.
9
∙∙
∙
∙∙
∙
∙∙
∙
∙∙
∙
Late interaction
Sentence Target word
∙∙
∙
∙∙
∙
∙∙
∙
∙∙
∙
All-to-all interaction
Sentence Target word
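To make the distinction concrete, here is a minimal numpy sketch (a toy with random pseudo-embeddings, not MelBERT's actual encoders): under all-to-all interaction the sentence and target word are encoded as one joint sequence, while under late interaction each side is encoded independently and only the summary vectors meet at the top.

```python
import numpy as np

D = 4  # toy embedding dimension

def encode(tokens):
    """Stand-in for a Transformer encoder: deterministic
    pseudo-embeddings seeded by the token sequence (NOT a real model)."""
    seed = sum(ord(c) for c in " ".join(tokens))
    rng = np.random.default_rng(seed)
    return rng.normal(size=(len(tokens), D))

sentence = ["The", "debate", "has", "been", "sharpened"]
target = ["sharpened"]

# All-to-all interaction: one joint pass, so every token can
# interact with every other token, including the appended target.
joint = encode(sentence + target)            # shape (6, D)

# Late interaction: each side is encoded on its own; the two
# summary vectors interact only in the layers above the encoders.
v_sentence = encode(sentence).mean(axis=0)   # shape (D,)
v_target = encode(target).mean(axis=0)       # shape (D,)
top_input = np.concatenate([v_sentence, v_target])  # shape (2*D,)
print(joint.shape, top_input.shape)
```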
11. Overview of MelBERT
➢ MelBERT consists of two main components.
• SPV layer compares the target word and its context.
• MIP layer compares the contextualized and linguistic meanings of the target word.
[Figure: two Transformer encoders, one over the sentence "[CLS] The debate has been sharpened [SEP]" and one over the target word "[CLS] sharpened [SEP]", feed the SPV and MIP layers, followed by a linear + softmax classifier.]
12. MelBERT using SPV
➢ SPV layer identifies the contradiction between a target word
and its context.
• We only utilize the sentence encoder for SPV.
[Figure: the sentence encoder over "[CLS] The debate has been sharpened [SEP]" yields v_S, the interaction across all pairwise words in the sentence, and v_S,t, the interaction between the target word and the other words in the sentence; the SPV layer compares the two.]
13. MelBERT using MIP
➢ MIP layer identifies the semantic gap between the target word in context and in isolation.
• We utilize both the sentence encoder and the target-word encoder for MIP.
[Figure: v_S,t is the contextualized embedding vector for the target word (from the sentence encoder over "[CLS] The debate has been sharpened [SEP]") and v_t is the isolated embedding vector (from the target-word encoder over "[CLS] sharpened [SEP]"); the MIP layer compares the two.]
14. Late Interaction over MelBERT
➢ The hidden vectors are combined to compute a prediction score.
• MelBERT predicts whether a target word is metaphorical or not.
[Figure: the outputs of the SPV and MIP layers are combined by a linear + softmax layer, producing a prediction such as 0.8 (metaphor) vs. 0.2 (not metaphor).]
15. MelBERT in Details
➢ The loss function for MelBERT
ŷ = σ(Wᵀ [h_MIP ; h_SPV] + b),
L = −Σ_{i=1}^{N} ( y_i log ŷ_i + (1 − y_i) log(1 − ŷ_i) ),
where h_MIP and h_SPV are the hidden vectors from the MIP and SPV layers, respectively.
➢ Linguistic features
• We utilize linguistic features such as POS tags and local contexts for the segment embedding.
• The local context is the clause containing the target tokens.
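The prediction head and loss above can be sketched in a few lines of numpy; the hidden size and the vectors standing in for h_MIP and h_SPV are arbitrary placeholders, not values from the paper.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

rng = np.random.default_rng(0)
d = 8  # hypothetical hidden size

# Stand-ins for the hidden vectors from the MIP and SPV layers.
h_mip = rng.normal(size=d)
h_spv = rng.normal(size=d)

# y_hat = sigmoid(W^T [h_MIP ; h_SPV] + b)
W = rng.normal(size=2 * d)
b = 0.0
y_hat = sigmoid(W @ np.concatenate([h_mip, h_spv]) + b)

def bce_loss(y, y_hat):
    """Binary cross-entropy summed over N labeled target words."""
    y, y_hat = np.asarray(y, float), np.asarray(y_hat, float)
    return float(-np.sum(y * np.log(y_hat) + (1 - y) * np.log(1 - y_hat)))

print(0.0 < y_hat < 1.0)                 # the prediction is a probability
print(round(bce_loss([1.0], [0.5]), 4))  # -log(0.5) ≈ 0.6931
```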
17. Experimental Setup: Dataset
➢ We evaluate MelBERT over well-known public datasets.
• VUA-18 and VUA-20
Dataset       #tokens   %M     #sent    Sent len
VUA-18_tr     116,622   11.2    6,323   18.4
VUA-18_dev     38,628   11.6    1,550   24.9
VUA-18_te      50,175   12.4    2,694   18.6
VUA-20_tr     160,154   12.0   12,109   15.0
VUA-20_te      22,196   17.9    3,698   15.5
VUA-Verb_te     5,873   30.0    2,694   18.6
#tokens: the number of tokens
%M: the percentage of metaphorical words
#sent: the number of sentences
Sent len: the average length of sentences
18. Competitive Models
➢ Four RNN-based models
• RNN_ELMo: a BiLSTM-based model using ELMo as an input
• RNN_BERT: a BiLSTM-based model using BERT embeddings as an input
• RNN_HG: a variant of the BiLSTM-based model using linguistic theories
• RNN_MHCA: a variant of the BiLSTM-based model with a multi-head attention mechanism, using linguistic theories
➢ Three contextualization-based models
• RoBERTa_BASE: a simple adoption of RoBERTa for classification
• RoBERTa_SEQ: a simple adoption of RoBERTa for sequence labeling
• DeepMet: a winning model in the VUA 2020 shared task, using linguistic
features and RoBERTa as a backbone model
21. Effects of Two Different Linguistic Theories
➢ MelBERT using both metaphor identification theories consistently
shows the best performance.
• MelBERT without SPV outperforms MelBERT without MIP, proving the
effectiveness of the late interaction mechanism.
Model     VUA-18 (Prec / Rec / F1)   VUA-20 (Prec / Rec / F1)
MelBERT   80.1 / 76.9 / 78.5         76.4 / 68.6 / 72.3
(-) MIP   77.8 / 75.8 / 76.7         74.9 / 67.8 / 71.1
(-) SPV   79.5 / 76.3 / 77.9         74.9 / 68.4 / 71.4
[Figure: the ablated architectures, (-) MIP and (-) SPV, each keep only one of the two layers over the Transformer encoders.]
22. Qualitative Analysis of MelBERT
➢ MelBERT detects metaphors that other models do not notice.
• MelBERT often fails to identify metaphorical words for implicit metaphors,
e.g., “Way of the World” is poetic.
Example sentences (metaphorical words are in red italics in the original slide; check marks there indicate which of (-) MIP, (-) SPV, and MelBERT predicted correctly):
• Manchester is not alone.
• That's an old trick.
• So who's covering tomorrow?
• The day thrift turned into a nightmare.
• Way of the World: Farming notes
24. Conclusion
➢ We propose a novel metaphor detection model with metaphor
identification theories.
• MelBERT: metaphor-aware late interaction over BERT
➢ MelBERT has an excellent theoretical foundation in linguistics.
• Selectional Preference Violation (SPV) & Metaphor Identification Procedure (MIP)
➢ MelBERT achieves competitive or state-of-the-art performance
on various datasets.