MelBERT: Metaphor Detection via
Contextualized Late Interaction using
Metaphorical Identification Theories
Minjin Choi1, Sunkyung Lee1, Eunseong Choi1, Heesoo Park2,
Junhyuk Lee1, Dongwon Lee3, and Jongwuk Lee1
Sungkyunkwan University (SKKU), Republic of Korea1, Bering Lab, Republic of Korea2,
The Pennsylvania State University, United States3
NAACL 2021
Motivation
Metaphor Detection
➢ A metaphor expresses a concept other than the literal meaning of a word.
• Metaphors are pervasive and essential, yet subtle.
• Metaphor detection can be helpful for various NLP tasks, e.g., machine translation, sentiment analysis, and dialogue systems.

The debate has been sharpened.
Limitation of Existing Methods
➢ Feature-based approaches are intuitive and straightforward, but they have difficulty handling rare metaphorical usages.

Annotated adjective-noun pairs:

Literal        Metaphorical
Black dress    Black humor
Ripe banana    Ripe age
Stormy sea     Stormy applause
Sharp pencil   Sharp debate

The debate has been sharpened. → Metaphorical!
Luana Bulat, Stephen Clark, Ekaterina Shutova, “Modelling metaphor with attribute-based semantics.”, EACL 2017.
Limitation of Existing Methods
➢ RNN-based models can handle sequences, but they struggle to capture the meaning of words in context.
Ge Gao, Eunsol Choi, Yejin Choi, Luke Zettlemoyer, “Neural Metaphor Detection in Context.”, EMNLP 2018.
Rui Mao, Chenghua Lin, Frank Guerin, “End-to-End Sequential Metaphor Identification Inspired by Linguistic Theories”, ACL 2019.
[Figure: a BiLSTM tags each token of "The debate has been sharpened"; only "sharpened" is predicted as metaphorical, the other tokens as literal.]
Our Key Contributions
➢ We utilize a contextualized model (a Transformer encoder) with two metaphor identification theories, SPV and MIP.
• The model has a powerful representation capacity for sentences.
Our Key Contributions
➢ Selectional Preference Violation (SPV)
• We exploit SPV to detect a metaphor from the contradiction between a target word and its context.

"Sharpened" is unusual in the context of a "debate," rather than a "pencil."

The debate has been sharpened.
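To make the SPV intuition concrete, here is a toy sketch with entirely made-up 3-d word embeddings (they are illustrative assumptions, not real embeddings): the target word sits close to its literal collocate ("pencil") but far from its actual context ("debate"), and that mismatch signals a possible metaphor.

```python
import math

def cosine(u, v):
    """Cosine similarity between two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

# Entirely made-up 3-d embeddings, chosen only to illustrate the idea.
emb = {
    "sharpened": [0.9, 0.1, 0.2],
    "pencil":    [0.8, 0.2, 0.1],  # a literal collocate of "sharpened"
    "debate":    [0.1, 0.9, 0.3],  # the actual context word
}

# SPV intuition: "sharpened" fits "pencil" far better than "debate",
# so its co-occurrence with "debate" violates selectional preference.
literal_fit = cosine(emb["sharpened"], emb["pencil"])
context_fit = cosine(emb["sharpened"], emb["debate"])
```

With these toy vectors, `literal_fit` is much larger than `context_fit`; MelBERT learns this kind of contrast from contextualized representations rather than from hand-picked vectors.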
Our Key Contributions
➢ Metaphor Identification Procedure (MIP)
• We exploit MIP to detect a metaphor from the difference between the literal and contextual meanings of a word.

Literal meaning: to make something sharp
Contextual meaning: to become more intense

The debate has been sharpened.
Our Key Contributions
➢ We leverage a late-interaction architecture over pre-trained contextualized models.
• It prevents unnecessary interactions while effectively distinguishing the contextualized and literal meanings of a word.

[Figure: late interaction encodes the sentence and the target word separately and combines them only at the top, whereas all-to-all interaction mixes the sentence and the target word from the start.]
Proposed Model

Overview of MelBERT
➢ MelBERT consists of two main components.
• The SPV layer compares the target word and its context.
• The MIP layer compares the contextualized and literal meanings of the target word.

[Figure: one Transformer encoder processes "[CLS] The debate has been sharpened [SEP]" and another processes "[CLS] sharpened [SEP]"; their outputs feed the SPV and MIP layers, followed by Linear + Softmax.]
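The two components above can be sketched in a few lines of numpy. Random vectors stand in for the encoder outputs (in MelBERT they come from a pre-trained RoBERTa encoder), and the one-layer tanh MLPs and dimensions below are illustrative assumptions, not the paper's exact configuration.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 768  # encoder hidden size (typical for RoBERTa-base)

# Stand-ins for the two encoders' outputs:
v_S  = rng.standard_normal(d)  # [CLS] vector of the whole sentence
v_St = rng.standard_normal(d)  # contextualized vector of the target word
v_t  = rng.standard_normal(d)  # isolated vector of the target word

def layer(x, W, b):
    # One hidden layer; the tanh activation is an assumption.
    return np.tanh(W @ x + b)

W_spv, b_spv = 0.01 * rng.standard_normal((d, 2 * d)), np.zeros(d)
W_mip, b_mip = 0.01 * rng.standard_normal((d, 2 * d)), np.zeros(d)

# SPV layer: sentence vs. target-word-in-context.
h_spv = layer(np.concatenate([v_S, v_St]), W_spv, b_spv)
# MIP layer: contextualized vs. isolated (literal) target word.
h_mip = layer(np.concatenate([v_St, v_t]), W_mip, b_mip)

# Final prediction: linear layer + sigmoid over the concatenated hidden vectors.
w, b = 0.01 * rng.standard_normal(2 * d), 0.0
y_hat = 1.0 / (1.0 + np.exp(-(w @ np.concatenate([h_mip, h_spv]) + b)))
```

The key design choice sketched here is that `v_t` never attends to the sentence tokens: the two encoders run independently, and interaction happens only in the small layers on top.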
MelBERT using SPV
➢ The SPV layer identifies the contradiction between a target word and its context.
• We only utilize the sentence encoder for SPV.

[Figure: the sentence encoder over "[CLS] The debate has been sharpened [SEP]" yields V_S, capturing the interaction across all pairwise words in the sentence, and V_S,t, capturing the interaction between the target word and the other words; the SPV layer compares them.]

MelBERT using MIP
➢ The MIP layer identifies the semantic gap between the target word in context and in isolation.
• We utilize both the sentence encoder and the target-word encoder for MIP.

[Figure: the MIP layer compares V_S,t, the contextualized embedding of the target word, with V_t, its isolated embedding from the target-word encoder over "[CLS] sharpened [SEP]".]
Late Interaction over MelBERT
➢ The hidden vectors are combined to compute a prediction score.
• MelBERT predicts whether a target word is metaphorical or not.

[Figure: V_S, V_S,t, and V_t from the two Transformer encoders feed the SPV and MIP layers; Linear + Softmax then outputs, e.g., 0.8 for "metaphor" and 0.2 for "not metaphor".]
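The final Linear + Softmax step turns two logits into the class probabilities shown in the figure; a self-contained sketch with made-up logits (chosen so the probabilities land at 0.8 / 0.2):

```python
import math

def softmax(logits):
    """Standard softmax with max-subtraction for numerical stability."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical logits for ["metaphor", "not metaphor"].
probs = softmax([math.log(4.0), 0.0])  # → [0.8, 0.2]
```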
MelBERT in Details
➢ The loss function for MelBERT:

ŷ = σ(Wᵀ [h_MIP ; h_SPV] + b),

L = − Σᵢ₌₁ᴺ ( yᵢ log ŷᵢ + (1 − yᵢ) log(1 − ŷᵢ) ),

where h_MIP and h_SPV are the hidden vectors from the MIP and SPV layers, respectively.

➢ Linguistic features
• We utilize linguistic features such as POS tags and local context for the segment embedding.
• The local context is the clause containing the target tokens.
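The loss above is standard binary cross-entropy over the N annotated target words; a minimal plain-Python sketch with illustrative predictions:

```python
import math

def melbert_loss(y_true, y_pred, eps=1e-12):
    """Binary cross-entropy over N target words.

    y_true: gold labels (1 = metaphorical, 0 = literal)
    y_pred: predicted probabilities ŷ_i from the sigmoid
    eps: small constant to avoid log(0)
    """
    return -sum(y * math.log(p + eps) + (1 - y) * math.log(1 - p + eps)
                for y, p in zip(y_true, y_pred))

# Illustrative predictions: both fairly confident and correct.
loss = melbert_loss([1, 0], [0.8, 0.2])
```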
Experiments

Experimental Setup: Dataset
➢ We evaluate MelBERT over well-known public datasets: VUA-18 and VUA-20.

Dataset       #tokens   %M    #sent   Sent len
VUA-18_tr     116,622   11.2   6,323  18.4
VUA-18_dev     38,628   11.6   1,550  24.9
VUA-18_te      50,175   12.4   2,694  18.6
VUA-20_tr     160,154   12.0  12,109  15
VUA-20_te      22,196   17.9   3,698  15.5
VUA-Verb_te     5,873   30     2,694  18.6

#tokens: the number of tokens; %M: the percentage of metaphorical words; #sent: the number of sentences; Sent len: the average length of sentences.
Competitive Models
➢ Four RNN-based models
• RNN_ELMo: a BiLSTM-based model using ELMo embeddings as input
• RNN_BERT: a BiLSTM-based model using BERT embeddings as input
• RNN_HG: a variant of the BiLSTM-based model using linguistic theories
• RNN_MHCA: a variant of the BiLSTM-based model with a multi-head attention mechanism, using linguistic theories
➢ Three contextualization-based models
• RoBERTa_BASE: a simple adaptation of RoBERTa for classification
• RoBERTa_SEQ: a simple adaptation of RoBERTa for sequence labeling
• DeepMet: the winning model of the VUA 2020 shared task, using linguistic features with RoBERTa as the backbone
MelBERT vs. Competing Models
➢ In terms of F1-score, MelBERT consistently outperforms the competing models on both benchmark datasets.

                        VUA-18             VUA-Verb
Model              Prec  Rec   F1     Prec  Rec   F1
RNN_ELMo           71.6  73.6  72.6   68.2  71.3  69.7
RNN_BERT           71.5  71.9  71.7   66.7  71.5  69.0
RNN_HG             71.8  76.3  74.0   69.3  72.3  70.8
RNN_MHCA           73.0  75.7  74.3   66.3  75.2  70.5
RoBERTa_BASE       79.4  75.0  77.2   76.9  72.8  74.8
RoBERTa_SEQ        80.4  74.9  77.5   79.2  69.8  74.2
DeepMet            82.0  71.3  76.3   79.5  70.8  74.9
MelBERT (ours)     80.1  76.9  78.5   78.7  72.9  75.7

(The first four models are RNN-based; the next three are contextualization-based.)
POS tags: MelBERT vs. Competing Models
➢ MelBERT shows the best performance in terms of F1-score.
• MelBERT achieves consistent improvements regardless of the POS tag of the target word.

                    Verb              Adjective         Adverb            Noun
Model           Prec Rec  F1      Prec Rec  F1      Prec Rec  F1      Prec Rec  F1
RNN_ELMo        68.1 71.9 69.9    56.1 60.6 58.3    67.2 53.7 59.7    59.9 60.8 60.4
RNN_BERT        67.1 72.1 69.5    58.1 51.6 54.7    64.8 61.1 62.9    63.3 56.8 59.9
RNN_HG          66.4 75.5 70.7    59.2 65.6 62.2    61.0 66.8 63.8    60.3 66.8 63.4
RNN_MHCA        66.0 76.0 70.7    61.4 61.7 61.6    66.1 60.7 63.2    69.1 58.2 63.2
RoBERTa_BASE    77.0 72.1 74.5    71.7 59.0 64.7    78.2 69.3 73.5    77.5 60.4 67.9
RoBERTa_SEQ     74.4 75.1 74.8    72.0 57.1 63.7    77.6 63.9 70.1    76.5 59.0 66.6
DeepMet         78.8 68.5 73.3    79.0 52.9 63.3    79.4 66.4 72.3    76.5 57.1 65.4
MelBERT (ours)  74.2 75.9 75.1    69.4 60.1 64.4    80.2 69.7 74.6    75.4 66.5 70.7
Effects of Two Different Linguistic Theories
➢ MelBERT using both metaphor identification theories consistently shows the best performance.
• MelBERT without SPV outperforms MelBERT without MIP, demonstrating the effectiveness of the late-interaction mechanism.

           VUA-18              VUA-20
Model      Prec  Rec   F1      Prec  Rec   F1
MelBERT    80.1  76.9  78.5    76.4  68.6  72.3
(-) MIP    77.8  75.8  76.7    74.9  67.8  71.1
(-) SPV    79.5  76.3  77.9    74.9  68.4  71.4

[Figure: the two ablated architectures — (-) MIP keeps only the SPV layer over the sentence encoder, while (-) SPV keeps only the MIP layer over both the sentence and target-word encoders, each followed by Linear + Softmax.]
Qualitative Analysis of MelBERT
➢ MelBERT detects metaphors that the other models do not notice.
• MelBERT often fails to identify implicitly metaphorical words, e.g., the poetic "Way of the World."

Example sentences (metaphorical words are shown in red italics on the original slide, with marks for each model's correct predictions):
• Manchester is not alone.
• That's an old trick.
• So who's covering tomorrow?
• The day thrift turned into a nightmare.
• Way of the World: Farming notes
Conclusion

➢ We propose a novel metaphor detection model based on metaphor identification theories.
• MelBERT: metaphor-aware late interaction over BERT
➢ MelBERT has a solid theoretical foundation in linguistics.
• Selectional Preference Violation (SPV) & Metaphor Identification Procedure (MIP)
➢ MelBERT achieves competitive or state-of-the-art performance on various datasets.

Q&A
Code: https://github.com/jin530/MelBERT
Email: sk1027@skku.edu

More Related Content

What's hot

JDLA主催「CVPR2023技術報告会」発表資料
JDLA主催「CVPR2023技術報告会」発表資料JDLA主催「CVPR2023技術報告会」発表資料
JDLA主催「CVPR2023技術報告会」発表資料Morpho, Inc.
 
脳とAIの接点から何を学びうるのか@第5回WBAシンポジウム: 銅谷賢治
脳とAIの接点から何を学びうるのか@第5回WBAシンポジウム: 銅谷賢治脳とAIの接点から何を学びうるのか@第5回WBAシンポジウム: 銅谷賢治
脳とAIの接点から何を学びうるのか@第5回WBAシンポジウム: 銅谷賢治The Whole Brain Architecture Initiative
 
深層学習 勉強会第1回 ディープラーニングの歴史とFFNNの設計
深層学習 勉強会第1回 ディープラーニングの歴史とFFNNの設計深層学習 勉強会第1回 ディープラーニングの歴史とFFNNの設計
深層学習 勉強会第1回 ディープラーニングの歴史とFFNNの設計Yuta Sugii
 
[DL輪読会]Peeking into the Future: Predicting Future Person Activities and Locat...
[DL輪読会]Peeking into the Future: Predicting Future Person Activities and Locat...[DL輪読会]Peeking into the Future: Predicting Future Person Activities and Locat...
[DL輪読会]Peeking into the Future: Predicting Future Person Activities and Locat...Deep Learning JP
 
ICLR2020の異常検知論文の紹介 (2019/11/23)
ICLR2020の異常検知論文の紹介 (2019/11/23)ICLR2020の異常検知論文の紹介 (2019/11/23)
ICLR2020の異常検知論文の紹介 (2019/11/23)ぱんいち すみもと
 
Deep Learning による視覚×言語融合の最前線
Deep Learning による視覚×言語融合の最前線Deep Learning による視覚×言語融合の最前線
Deep Learning による視覚×言語融合の最前線Yoshitaka Ushiku
 
Detecting attended visual targets in video の勉強会用資料
Detecting attended visual targets in video の勉強会用資料Detecting attended visual targets in video の勉強会用資料
Detecting attended visual targets in video の勉強会用資料Yasunori Ozaki
 
論文紹介 wav2vec: Unsupervised Pre-training for Speech Recognition
論文紹介  wav2vec: Unsupervised Pre-training for Speech Recognition論文紹介  wav2vec: Unsupervised Pre-training for Speech Recognition
論文紹介 wav2vec: Unsupervised Pre-training for Speech RecognitionYosukeKashiwagi1
 
Data-Centric AI開発における データ生成の取り組み
Data-Centric AI開発における データ生成の取り組みData-Centric AI開発における データ生成の取り組み
Data-Centric AI開発における データ生成の取り組みTakeshi Suzuki
 
12. Diffusion Model の数学的基礎.pdf
12. Diffusion Model の数学的基礎.pdf12. Diffusion Model の数学的基礎.pdf
12. Diffusion Model の数学的基礎.pdf幸太朗 岩澤
 
SSII2021 [OS2-03] 自己教師あり学習における対照学習の基礎と応用
SSII2021 [OS2-03] 自己教師あり学習における対照学習の基礎と応用SSII2021 [OS2-03] 自己教師あり学習における対照学習の基礎と応用
SSII2021 [OS2-03] 自己教師あり学習における対照学習の基礎と応用SSII
 
[DL輪読会]VNect: Real-time 3D Human Pose Estimation with a Single RGB Camera
[DL輪読会]VNect: Real-time 3D Human Pose Estimation with a Single RGB Camera[DL輪読会]VNect: Real-time 3D Human Pose Estimation with a Single RGB Camera
[DL輪読会]VNect: Real-time 3D Human Pose Estimation with a Single RGB CameraDeep Learning JP
 
確率モデルを使ったグラフクラスタリング
確率モデルを使ったグラフクラスタリング確率モデルを使ったグラフクラスタリング
確率モデルを使ったグラフクラスタリング正志 坪坂
 
転移学習ランキング・ドメイン適応
転移学習ランキング・ドメイン適応転移学習ランキング・ドメイン適応
転移学習ランキング・ドメイン適応Elpo González Valbuena
 
Adversarial examples in deep learning (Gregory Chatel)
Adversarial examples in deep learning (Gregory Chatel)Adversarial examples in deep learning (Gregory Chatel)
Adversarial examples in deep learning (Gregory Chatel)MeetupDataScienceRoma
 
【DL輪読会】Learning Physics Constrained Dynamics Using Autoencoders
【DL輪読会】Learning Physics Constrained Dynamics Using Autoencoders【DL輪読会】Learning Physics Constrained Dynamics Using Autoencoders
【DL輪読会】Learning Physics Constrained Dynamics Using AutoencodersDeep Learning JP
 
【DL輪読会】 "Learning to render novel views from wide-baseline stereo pairs." CVP...
【DL輪読会】 "Learning to render novel views from wide-baseline stereo pairs." CVP...【DL輪読会】 "Learning to render novel views from wide-baseline stereo pairs." CVP...
【DL輪読会】 "Learning to render novel views from wide-baseline stereo pairs." CVP...Deep Learning JP
 

What's hot (20)

JDLA主催「CVPR2023技術報告会」発表資料
JDLA主催「CVPR2023技術報告会」発表資料JDLA主催「CVPR2023技術報告会」発表資料
JDLA主催「CVPR2023技術報告会」発表資料
 
脳とAIの接点から何を学びうるのか@第5回WBAシンポジウム: 銅谷賢治
脳とAIの接点から何を学びうるのか@第5回WBAシンポジウム: 銅谷賢治脳とAIの接点から何を学びうるのか@第5回WBAシンポジウム: 銅谷賢治
脳とAIの接点から何を学びうるのか@第5回WBAシンポジウム: 銅谷賢治
 
深層学習 勉強会第1回 ディープラーニングの歴史とFFNNの設計
深層学習 勉強会第1回 ディープラーニングの歴史とFFNNの設計深層学習 勉強会第1回 ディープラーニングの歴史とFFNNの設計
深層学習 勉強会第1回 ディープラーニングの歴史とFFNNの設計
 
[DL輪読会]Peeking into the Future: Predicting Future Person Activities and Locat...
[DL輪読会]Peeking into the Future: Predicting Future Person Activities and Locat...[DL輪読会]Peeking into the Future: Predicting Future Person Activities and Locat...
[DL輪読会]Peeking into the Future: Predicting Future Person Activities and Locat...
 
ICLR2020の異常検知論文の紹介 (2019/11/23)
ICLR2020の異常検知論文の紹介 (2019/11/23)ICLR2020の異常検知論文の紹介 (2019/11/23)
ICLR2020の異常検知論文の紹介 (2019/11/23)
 
BlackBox モデルの説明性・解釈性技術の実装
BlackBox モデルの説明性・解釈性技術の実装BlackBox モデルの説明性・解釈性技術の実装
BlackBox モデルの説明性・解釈性技術の実装
 
Deep Learning による視覚×言語融合の最前線
Deep Learning による視覚×言語融合の最前線Deep Learning による視覚×言語融合の最前線
Deep Learning による視覚×言語融合の最前線
 
Detecting attended visual targets in video の勉強会用資料
Detecting attended visual targets in video の勉強会用資料Detecting attended visual targets in video の勉強会用資料
Detecting attended visual targets in video の勉強会用資料
 
論文紹介 wav2vec: Unsupervised Pre-training for Speech Recognition
論文紹介  wav2vec: Unsupervised Pre-training for Speech Recognition論文紹介  wav2vec: Unsupervised Pre-training for Speech Recognition
論文紹介 wav2vec: Unsupervised Pre-training for Speech Recognition
 
Data-Centric AI開発における データ生成の取り組み
Data-Centric AI開発における データ生成の取り組みData-Centric AI開発における データ生成の取り組み
Data-Centric AI開発における データ生成の取り組み
 
12. Diffusion Model の数学的基礎.pdf
12. Diffusion Model の数学的基礎.pdf12. Diffusion Model の数学的基礎.pdf
12. Diffusion Model の数学的基礎.pdf
 
SSII2021 [OS2-03] 自己教師あり学習における対照学習の基礎と応用
SSII2021 [OS2-03] 自己教師あり学習における対照学習の基礎と応用SSII2021 [OS2-03] 自己教師あり学習における対照学習の基礎と応用
SSII2021 [OS2-03] 自己教師あり学習における対照学習の基礎と応用
 
[DL輪読会]VNect: Real-time 3D Human Pose Estimation with a Single RGB Camera
[DL輪読会]VNect: Real-time 3D Human Pose Estimation with a Single RGB Camera[DL輪読会]VNect: Real-time 3D Human Pose Estimation with a Single RGB Camera
[DL輪読会]VNect: Real-time 3D Human Pose Estimation with a Single RGB Camera
 
確率モデルを使ったグラフクラスタリング
確率モデルを使ったグラフクラスタリング確率モデルを使ったグラフクラスタリング
確率モデルを使ったグラフクラスタリング
 
転移学習ランキング・ドメイン適応
転移学習ランキング・ドメイン適応転移学習ランキング・ドメイン適応
転移学習ランキング・ドメイン適応
 
Adversarial examples in deep learning (Gregory Chatel)
Adversarial examples in deep learning (Gregory Chatel)Adversarial examples in deep learning (Gregory Chatel)
Adversarial examples in deep learning (Gregory Chatel)
 
Skip gram shirakawa_20141121
Skip gram shirakawa_20141121Skip gram shirakawa_20141121
Skip gram shirakawa_20141121
 
【DL輪読会】Learning Physics Constrained Dynamics Using Autoencoders
【DL輪読会】Learning Physics Constrained Dynamics Using Autoencoders【DL輪読会】Learning Physics Constrained Dynamics Using Autoencoders
【DL輪読会】Learning Physics Constrained Dynamics Using Autoencoders
 
【DL輪読会】 "Learning to render novel views from wide-baseline stereo pairs." CVP...
【DL輪読会】 "Learning to render novel views from wide-baseline stereo pairs." CVP...【DL輪読会】 "Learning to render novel views from wide-baseline stereo pairs." CVP...
【DL輪読会】 "Learning to render novel views from wide-baseline stereo pairs." CVP...
 
VAEs for multimodal disentanglement
VAEs for multimodal disentanglementVAEs for multimodal disentanglement
VAEs for multimodal disentanglement
 

Similar to MelBERT: Metaphor Detection via Contextualized Late Interaction using Metaphorical Identification Theories (NAACL 2021)

BERT- Pre-training of Deep Bidirectional Transformers for Language Understand...
BERT- Pre-training of Deep Bidirectional Transformers for Language Understand...BERT- Pre-training of Deep Bidirectional Transformers for Language Understand...
BERT- Pre-training of Deep Bidirectional Transformers for Language Understand...Kyuri Kim
 
An Introduction to Pre-training General Language Representations
An Introduction to Pre-training General Language RepresentationsAn Introduction to Pre-training General Language Representations
An Introduction to Pre-training General Language Representationszperjaccico
 
NLP using Deep learning
NLP using Deep learningNLP using Deep learning
NLP using Deep learningBabu Priyavrat
 
Deep Learning in NLP (BERT, ERNIE and REFORMER)
Deep Learning in NLP (BERT, ERNIE and REFORMER)Deep Learning in NLP (BERT, ERNIE and REFORMER)
Deep Learning in NLP (BERT, ERNIE and REFORMER)Biswajit Biswas
 
Natural Language Generation / Stanford cs224n 2019w lecture 15 Review
Natural Language Generation / Stanford cs224n 2019w lecture 15 ReviewNatural Language Generation / Stanford cs224n 2019w lecture 15 Review
Natural Language Generation / Stanford cs224n 2019w lecture 15 Reviewchangedaeoh
 
Convolutional Neural Networks for Natural Language Processing / Stanford cs22...
Convolutional Neural Networks for Natural Language Processing / Stanford cs22...Convolutional Neural Networks for Natural Language Processing / Stanford cs22...
Convolutional Neural Networks for Natural Language Processing / Stanford cs22...changedaeoh
 
Nlp and transformer (v3s)
Nlp and transformer (v3s)Nlp and transformer (v3s)
Nlp and transformer (v3s)H K Yoon
 
Natural Language Processing Advancements By Deep Learning: A Survey
Natural Language Processing Advancements By Deep Learning: A SurveyNatural Language Processing Advancements By Deep Learning: A Survey
Natural Language Processing Advancements By Deep Learning: A SurveyRimzim Thube
 
Engineering Intelligent NLP Applications Using Deep Learning – Part 2
Engineering Intelligent NLP Applications Using Deep Learning – Part 2 Engineering Intelligent NLP Applications Using Deep Learning – Part 2
Engineering Intelligent NLP Applications Using Deep Learning – Part 2 Saurabh Kaushik
 
An introduction to the Transformers architecture and BERT
An introduction to the Transformers architecture and BERTAn introduction to the Transformers architecture and BERT
An introduction to the Transformers architecture and BERTSuman Debnath
 
Nlp research presentation
Nlp research presentationNlp research presentation
Nlp research presentationSurya Sg
 
Transformers and BERT with SageMaker
Transformers and BERT with SageMakerTransformers and BERT with SageMaker
Transformers and BERT with SageMakerSuman Debnath
 
Demystifying NLP Transformers: Understanding the Power and Architecture behin...
Demystifying NLP Transformers: Understanding the Power and Architecture behin...Demystifying NLP Transformers: Understanding the Power and Architecture behin...
Demystifying NLP Transformers: Understanding the Power and Architecture behin...NILESH VERMA
 
Talk from NVidia Developer Connect
Talk from NVidia Developer ConnectTalk from NVidia Developer Connect
Talk from NVidia Developer ConnectAnuj Gupta
 
Zhongyuan Zhu - 2015 - Evaluating Neural Machine Translation in English-Japan...
Zhongyuan Zhu - 2015 - Evaluating Neural Machine Translation in English-Japan...Zhongyuan Zhu - 2015 - Evaluating Neural Machine Translation in English-Japan...
Zhongyuan Zhu - 2015 - Evaluating Neural Machine Translation in English-Japan...Association for Computational Linguistics
 
Deep Learning for Chatbot (3/4)
Deep Learning for Chatbot (3/4)Deep Learning for Chatbot (3/4)
Deep Learning for Chatbot (3/4)Jaemin Cho
 

Similar to MelBERT: Metaphor Detection via Contextualized Late Interaction using Metaphorical Identification Theories (NAACL 2021) (20)

BERT- Pre-training of Deep Bidirectional Transformers for Language Understand...
BERT- Pre-training of Deep Bidirectional Transformers for Language Understand...BERT- Pre-training of Deep Bidirectional Transformers for Language Understand...
BERT- Pre-training of Deep Bidirectional Transformers for Language Understand...
 
An Introduction to Pre-training General Language Representations
An Introduction to Pre-training General Language RepresentationsAn Introduction to Pre-training General Language Representations
An Introduction to Pre-training General Language Representations
 
NLP using Deep learning
NLP using Deep learningNLP using Deep learning
NLP using Deep learning
 
Deep Learning in NLP (BERT, ERNIE and REFORMER)
Deep Learning in NLP (BERT, ERNIE and REFORMER)Deep Learning in NLP (BERT, ERNIE and REFORMER)
Deep Learning in NLP (BERT, ERNIE and REFORMER)
 
Natural Language Generation / Stanford cs224n 2019w lecture 15 Review
Natural Language Generation / Stanford cs224n 2019w lecture 15 ReviewNatural Language Generation / Stanford cs224n 2019w lecture 15 Review
Natural Language Generation / Stanford cs224n 2019w lecture 15 Review
 
Transformers AI PPT.pptx
Transformers AI PPT.pptxTransformers AI PPT.pptx
Transformers AI PPT.pptx
 
Convolutional Neural Networks for Natural Language Processing / Stanford cs22...
Convolutional Neural Networks for Natural Language Processing / Stanford cs22...Convolutional Neural Networks for Natural Language Processing / Stanford cs22...
Convolutional Neural Networks for Natural Language Processing / Stanford cs22...
 
Nlp and transformer (v3s)
Nlp and transformer (v3s)Nlp and transformer (v3s)
Nlp and transformer (v3s)
 
Natural Language Processing Advancements By Deep Learning: A Survey
Natural Language Processing Advancements By Deep Learning: A SurveyNatural Language Processing Advancements By Deep Learning: A Survey
Natural Language Processing Advancements By Deep Learning: A Survey
 
Engineering Intelligent NLP Applications Using Deep Learning – Part 2
Engineering Intelligent NLP Applications Using Deep Learning – Part 2 Engineering Intelligent NLP Applications Using Deep Learning – Part 2
Engineering Intelligent NLP Applications Using Deep Learning – Part 2
 
Deep Learning for Machine Translation
Deep Learning for Machine TranslationDeep Learning for Machine Translation
Deep Learning for Machine Translation
 
An introduction to the Transformers architecture and BERT
An introduction to the Transformers architecture and BERTAn introduction to the Transformers architecture and BERT
An introduction to the Transformers architecture and BERT
 
Nlp research presentation
Nlp research presentationNlp research presentation
Nlp research presentation
 
Transformers and BERT with SageMaker
Transformers and BERT with SageMakerTransformers and BERT with SageMaker
Transformers and BERT with SageMaker
 
Bert.pptx
Bert.pptxBert.pptx
Bert.pptx
 
Demystifying NLP Transformers: Understanding the Power and Architecture behin...
Demystifying NLP Transformers: Understanding the Power and Architecture behin...Demystifying NLP Transformers: Understanding the Power and Architecture behin...
Demystifying NLP Transformers: Understanding the Power and Architecture behin...
 
Talk from NVidia Developer Connect
Talk from NVidia Developer ConnectTalk from NVidia Developer Connect
Talk from NVidia Developer Connect
 
Zhongyuan Zhu - 2015 - Evaluating Neural Machine Translation in English-Japan...
Zhongyuan Zhu - 2015 - Evaluating Neural Machine Translation in English-Japan...Zhongyuan Zhu - 2015 - Evaluating Neural Machine Translation in English-Japan...
Zhongyuan Zhu - 2015 - Evaluating Neural Machine Translation in English-Japan...
 
Towards OpenLogos Hybrid Machine Translation - Anabela Barreiro
Towards OpenLogos Hybrid Machine Translation - Anabela BarreiroTowards OpenLogos Hybrid Machine Translation - Anabela Barreiro
Towards OpenLogos Hybrid Machine Translation - Anabela Barreiro
 
Deep Learning for Chatbot (3/4)
Deep Learning for Chatbot (3/4)Deep Learning for Chatbot (3/4)
Deep Learning for Chatbot (3/4)
 

Recently uploaded

Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityPrincipled Technologies
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)Gabriella Davis
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonAnna Loughnan Colquhoun
 
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure serviceWhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure servicePooja Nehwal
 
CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Servicegiselly40
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘RTylerCroy
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationMichael W. Hawkins
 
Google AI Hackathon: LLM based Evaluator for RAG
Google AI Hackathon: LLM based Evaluator for RAGGoogle AI Hackathon: LLM based Evaluator for RAG
Google AI Hackathon: LLM based Evaluator for RAGSujit Pal
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountPuma Security, LLC
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdfhans926745
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024The Digital Insurer
 
Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Allon Mureinik
 
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfThe Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfEnterprise Knowledge
 
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...HostedbyConfluent
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking MenDelhi Call girls
 
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...shyamraj55
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonetsnaman860154
 
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 3652toLead Limited
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking MenDelhi Call girls
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationRadu Cotescu
 

Recently uploaded (20)

Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivity
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt Robison
 
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure serviceWhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
 
CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Service
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation

MelBERT: Metaphor Detection via Contextualized Late Interaction using Metaphorical Identification Theories (NAACL 2021)

  • 1. MelBERT: Metaphor Detection via Contextualized Late Interaction using Metaphorical Identification Theories
    Minjin Choi1, Sunkyung Lee1, Eunseong Choi1, Heesoo Park2, Junhyuk Lee1, Dongwon Lee3, and Jongwuk Lee1
    Sungkyunkwan University (SKKU), Republic of Korea1; Bering Lab, Republic of Korea2; The Pennsylvania State University, United States3
    NAACL 2021
  • 3. Metaphor Detection
    ➢ A metaphor expresses a concept other than its literal meaning.
      • Metaphor is pervasive and essential, yet subtle.
      • Metaphor detection can help various NLP tasks, e.g., machine translation, sentiment analysis, and dialogue systems.
    Example: "The debate has been sharpened."
  • 4. Limitation of Existing Methods
    ➢ Feature-based approaches are intuitive and straightforward, but they struggle to handle rare usages of metaphors.
    Annotated adjective-noun pairs:
      Literal      | Metaphorical
      Black dress  | Black humor
      Ripe banana  | Ripe age
      Stormy sea   | Stormy applause
      Sharp pencil | Sharp debate
    "The debate has been sharpened." → Metaphorical!
    [Luana Bulat, Stephen Clark, Ekaterina Shutova, "Modelling metaphor with attribute-based semantics", EACL 2017]
  • 5. Limitation of Existing Methods
    ➢ RNN-based models can consider word sequences, but they find it challenging to understand the meaning of words in context.
    [Figure: a BiLSTM tags each word of "The debate has been sharpened"; only "sharpened" is predicted as metaphorical.]
    [Ge Gao, Eunsol Choi, Yejin Choi, Luke Zettlemoyer, "Neural Metaphor Detection in Context", EMNLP 2018]
    [Rui Mao, Chenghua Lin, Frank Guerin, "End-to-End Sequential Metaphor Identification Inspired by Linguistic Theories", ACL 2019]
  • 6. Our Key Contributions
    ➢ We utilize a contextualized model (a Transformer encoder) guided by two metaphor identification theories, SPV and MIP.
      • The model has a powerful representation capacity for sentences.
  • 7. Our Key Contributions
    ➢ Selectional Preference Violation (SPV)
      • Exploit SPV to detect a metaphor from the contradiction between a target word and its context.
    Example: in "The debate has been sharpened.", "sharpened" is unusual in the context of "debate", unlike "pencil".
  • 8. Our Key Contributions
    ➢ Metaphor Identification Procedure (MIP)
      • Exploit MIP to detect a metaphor from the difference between the literal and contextual meanings of a word.
    Example: "The debate has been sharpened."
      Literal meaning: to make something sharp
      Contextual meaning: to become more intense
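The MIP intuition above boils down to a vector comparison: if a word's embedding in isolation and its embedding in context point in very different directions, metaphorical usage is likely. A toy sketch with made-up 3-d vectors (not real model outputs; the dimensions and values are purely illustrative):

```python
import numpy as np

def cosine(u, v):
    """Cosine similarity between two vectors."""
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

# Hypothetical embeddings for illustration only.
v_literal = np.array([0.9, 0.1, 0.0])   # "sharpened" in isolation
v_context = np.array([0.1, 0.2, 0.95])  # "sharpened" inside the sentence

# MIP signal: a large gap between the literal and contextual meanings
# suggests metaphorical usage.
mip_gap = 1.0 - cosine(v_literal, v_context)
print(mip_gap > 0.5)  # a large gap here, consistent with a metaphor
```

In MelBERT itself this comparison is learned by a trainable layer rather than a fixed cosine distance, but the underlying signal is the same semantic gap.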
  • 9. Our Key Contributions
    ➢ We leverage a late-interaction architecture over pre-trained contextualized models.
      • It prevents unnecessary interactions while effectively distinguishing the contextualized and linguistic meanings of a word.
    [Figure: all-to-all interaction encodes the sentence and target word jointly; late interaction encodes each separately and combines them afterward.]
  • 11. Overview of MelBERT
    ➢ MelBERT consists of two main components.
      • The SPV layer compares the target word and its context.
      • The MIP layer compares the contextualized and linguistic meanings of the target word.
    [Figure: a sentence encoder over "[CLS] The debate has been sharpened [SEP]" and a target-word encoder over "[CLS] sharpened [SEP]" feed the SPV and MIP layers, followed by a linear + softmax classifier.]
  • 12. MelBERT using SPV
    ➢ The SPV layer identifies the contradiction between a target word and its context.
      • We only utilize the sentence encoder for SPV.
    [Figure: the sentence encoder yields V_S, the interaction across all pairwise words in the sentence, and V_{S,t}, the interaction between the target word and the other words; the SPV layer compares the two.]
  • 13. MelBERT using MIP
    ➢ The MIP layer identifies the semantic gap for the target word in context and in isolation.
      • We utilize both the sentence encoder and the target-word encoder for MIP.
    [Figure: the MIP layer compares V_{S,t}, the contextualized embedding of the target word from the sentence encoder, with V_t, its isolated embedding from the target-word encoder over "[CLS] sharpened [SEP]".]
  • 14. Late Interaction over MelBERT
    ➢ The hidden vectors are combined to compute a prediction score.
      • MelBERT predicts whether a target word is metaphorical or not.
    [Figure: V_S, V_{S,t}, and V_t flow through the SPV and MIP layers into a linear + softmax classifier, e.g., "is metaphor" = 0.8 vs. "not metaphor" = 0.2.]
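The flow described above can be sketched end to end in numpy. This is a minimal sketch, not the paper's implementation: the encoder outputs are replaced by random vectors, the hidden size is a toy d = 8 instead of the encoder's real size (e.g., 768), and the SPV/MIP layers are stand-in single ReLU layers with randomly initialized weights:

```python
import numpy as np

rng = np.random.default_rng(0)
d = 8  # toy hidden size; a real encoder would use e.g. 768

# Hypothetical encoder outputs (stand-ins for the two Transformer encoders):
v_S = rng.normal(size=d)    # sentence embedding ([CLS] of the sentence encoder)
v_St = rng.normal(size=d)   # contextualized target-word embedding
v_t = rng.normal(size=d)    # isolated target-word embedding (target encoder)

def layer(x, W, b):
    """A single ReLU layer standing in for the SPV/MIP layers."""
    return np.maximum(0.0, W @ x + b)

# SPV layer: compares the sentence with the target word in context.
W_spv, b_spv = rng.normal(size=(d, 2 * d)), np.zeros(d)
h_SPV = layer(np.concatenate([v_S, v_St]), W_spv, b_spv)

# MIP layer: compares the isolated and contextualized target meanings.
W_mip, b_mip = rng.normal(size=(d, 2 * d)), np.zeros(d)
h_MIP = layer(np.concatenate([v_t, v_St]), W_mip, b_mip)

# Final prediction: sigmoid over a linear layer on [h_MIP; h_SPV]
# (weights scaled down to keep the toy logit in a reasonable range).
W, b = rng.normal(size=2 * d) * 0.1, 0.0
y_hat = 1.0 / (1.0 + np.exp(-(W @ np.concatenate([h_MIP, h_SPV]) + b)))
print(0.0 <= y_hat <= 1.0)  # a probability that the target word is metaphorical
```

The late-interaction point is visible in the structure: the two encoders never attend to each other; their outputs only meet in the cheap SPV/MIP comparison layers at the end.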
  • 15. MelBERT in Details
    ➢ The loss function for MelBERT:
      ŷ = σ(Wᵀ[h_MIP; h_SPV] + b)
      L = −Σ_{i=1..N} [ y_i log ŷ_i + (1 − y_i) log(1 − ŷ_i) ],
      where h_MIP and h_SPV are the hidden vectors from the MIP and SPV layers.
    ➢ Linguistic features
      • We utilize linguistic features such as POS tags and local contexts for segment embedding.
      • The local context indicates a clause including the target tokens.
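The loss above is standard binary cross-entropy over the N training examples. A small self-contained check with made-up labels and probabilities (the `eps` guard against log(0) is an implementation convenience, not part of the slide's formula):

```python
import math

def bce_loss(y_true, y_pred, eps=1e-12):
    """Binary cross-entropy summed over examples, matching the slide's loss."""
    return -sum(y * math.log(p + eps) + (1 - y) * math.log(1 - p + eps)
                for y, p in zip(y_true, y_pred))

# Toy labels (1 = metaphorical) and predicted probabilities ŷ.
labels = [1, 0, 1]
probs = [0.8, 0.2, 0.9]
loss = bce_loss(labels, probs)
print(round(loss, 3))  # → 0.552
```

Note that every term rewards confident correct predictions: each example contributes −log(0.8) or −log(0.9) here, and the loss would grow without bound if a confident prediction were wrong.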
  • 17. Experimental Setup: Dataset
    ➢ We evaluate MelBERT on well-known public datasets: VUA-18 and VUA-20.
      Dataset       | #tokens | %M   | #sent  | Sent len
      VUA-18 (tr)   | 116,622 | 11.2 |  6,323 | 18.4
      VUA-18 (dev)  |  38,628 | 11.6 |  1,550 | 24.9
      VUA-18 (te)   |  50,175 | 12.4 |  2,694 | 18.6
      VUA-20 (tr)   | 160,154 | 12.0 | 12,109 | 15.0
      VUA-20 (te)   |  22,196 | 17.9 |  3,698 | 15.5
      VUA-Verb (te) |   5,873 | 30.0 |  2,694 | 18.6
    #tokens: the number of tokens; %M: the percentage of metaphorical words; #sent: the number of sentences; Sent len: the average sentence length.
  • 18. Competitive Models
    ➢ Four RNN-based models
      • RNN_ELMo: a BiLSTM-based model using ELMo as an input
      • RNN_BERT: a BiLSTM-based model using BERT embeddings as an input
      • RNN_HG: a variant of the BiLSTM-based model using linguistic theories
      • RNN_MHCA: a variant of the BiLSTM-based model with a multi-head attention mechanism, using linguistic theories
    ➢ Three contextualization-based models
      • RoBERTa_BASE: a simple adoption of RoBERTa for classification
      • RoBERTa_SEQ: a simple adoption of RoBERTa for sequence labeling
      • DeepMet: the winning model in the VUA 2020 shared task, using linguistic features and RoBERTa as a backbone model
  • 19. MelBERT vs. Competing Models
    ➢ In terms of F1-score, MelBERT consistently outperforms the competing models on both benchmark datasets.
                     |      VUA-18      |     VUA-Verb
      Model          | Prec  Rec   F1   | Prec  Rec   F1
      RNN_ELMo       | 71.6  73.6  72.6 | 68.2  71.3  69.7
      RNN_BERT       | 71.5  71.9  71.7 | 66.7  71.5  69.0
      RNN_HG         | 71.8  76.3  74.0 | 69.3  72.3  70.8
      RNN_MHCA       | 73.0  75.7  74.3 | 66.3  75.2  70.5
      RoBERTa_BASE   | 79.4  75.0  77.2 | 76.9  72.8  74.8
      RoBERTa_SEQ    | 80.4  74.9  77.5 | 79.2  69.8  74.2
      DeepMet        | 82.0  71.3  76.3 | 79.5  70.8  74.9
      MelBERT (ours) | 80.1  76.9  78.5 | 78.7  72.9  75.7
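As a sanity check on the table, the F1 column is the harmonic mean of precision and recall, so any row can be reproduced from its first two numbers:

```python
def f1(prec, rec):
    """F1-score: the harmonic mean of precision and recall."""
    return 2 * prec * rec / (prec + rec)

# Reproduce MelBERT's and DeepMet's VUA-18 F1 from the reported Prec/Rec.
print(round(f1(80.1, 76.9), 1))  # → 78.5, matching MelBERT's entry
print(round(f1(82.0, 71.3), 1))  # → 76.3, matching DeepMet's entry
```

This also shows why DeepMet's higher precision (82.0) does not win overall: its lower recall drags the harmonic mean below MelBERT's more balanced 80.1/76.9.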
  • 20. POS Tags: MelBERT vs. Competing Models
    ➢ MelBERT shows the best performance in terms of F1-score.
      • MelBERT achieves consistent improvements regardless of the POS tag of the target word.
      Model          | Verb (P/R/F1)  | Adjective (P/R/F1) | Adverb (P/R/F1) | Noun (P/R/F1)
      RNN_ELMo       | 68.1/71.9/69.9 | 56.1/60.6/58.3     | 67.2/53.7/59.7  | 59.9/60.8/60.4
      RNN_BERT       | 67.1/72.1/69.5 | 58.1/51.6/54.7     | 64.8/61.1/62.9  | 63.3/56.8/59.9
      RNN_HG         | 66.4/75.5/70.7 | 59.2/65.6/62.2     | 61.0/66.8/63.8  | 60.3/66.8/63.4
      RNN_MHCA       | 66.0/76.0/70.7 | 61.4/61.7/61.6     | 66.1/60.7/63.2  | 69.1/58.2/63.2
      RoBERTa_BASE   | 77.0/72.1/74.5 | 71.7/59.0/64.7     | 78.2/69.3/73.5  | 77.5/60.4/67.9
      RoBERTa_SEQ    | 74.4/75.1/74.8 | 72.0/57.1/63.7     | 77.6/63.9/70.1  | 76.5/59.0/66.6
      DeepMet        | 78.8/68.5/73.3 | 79.0/52.9/63.3     | 79.4/66.4/72.3  | 76.5/57.1/65.4
      MelBERT (ours) | 74.2/75.9/75.1 | 69.4/60.1/64.4     | 80.2/69.7/74.6  | 75.4/66.5/70.7
  • 21. Effects of the Two Linguistic Theories
    ➢ MelBERT using both metaphor identification theories consistently shows the best performance.
      • MelBERT without SPV outperforms MelBERT without MIP, supporting the effectiveness of the late-interaction mechanism.
                |      VUA-18      |      VUA-20
      Model     | Prec  Rec   F1   | Prec  Rec   F1
      MelBERT   | 80.1  76.9  78.5 | 76.4  68.6  72.3
      (-) MIP   | 77.8  75.8  76.7 | 74.9  67.8  71.1
      (-) SPV   | 79.5  76.3  77.9 | 74.9  68.4  71.4
    [Figure: ablated architectures of MelBERT without the MIP layer and without the SPV layer.]
  • 22. Qualitative Analysis of MelBERT
    ➢ MelBERT detects metaphors that the ablated models miss.
      • MelBERT often fails to identify metaphorical words in implicit metaphors, e.g., "Way of the World" is poetic.
    Example sentences (metaphorical words are shown in red italics on the slide; a check mark denotes a correct model prediction):
      Manchester is not alone.
      That's an old trick.
      So who's covering tomorrow?
      The day thrift turned into a nightmare.
      Way of the World: Farming notes
  • 24. Conclusion
    ➢ We propose a novel metaphor detection model grounded in metaphor identification theories.
      • MelBERT: metaphor-aware late interaction over BERT
    ➢ MelBERT has a solid theoretical foundation in linguistics.
      • Selectional Preference Violation (SPV) & Metaphor Identification Procedure (MIP)
    ➢ MelBERT achieves competitive or state-of-the-art performance on various datasets.