AAN: Adversarial Attention Modeling for Multi-dimensional Emotion Regression
ACL, 2019
Suyang Zhu, Shoushan Li, Guodong Zhou
Speaker: Po-Chuan Chen
May 30, 2024
Table of contents
1 Abstract
2 Introduction
3 Adversarial Attention Network
4 Experimentation
5 Conclusion
Abstract
To improve Reader's and Writer's multi-dimensional emotion regression on EMOBANK, this paper proposes an Adversarial Attention Network (AAN):
Using an attention layer to determine which words are valuable
for a particular emotion dimension
Learning better word weights by using adversarial training
Incorporating a shared attention layer to learn public word
weights between two emotion dimensions
Introduction
Emotion analysis can be divided into:
Emotion classification [1]
Emotion regression [8]
Due to the inherent difficulty of the regression task and the lack of high-quality, large-scale emotion regression corpora, research on emotion regression started later than research on emotion classification.
Introduction (Cont.)
Nowadays, there are many emotion regression corpora [4].
These corpora adopt the widely accepted Valence-Arousal (VA) model or the Valence-Arousal-Dominance (VAD) model.
Most of the existing studies in emotion regression focus on a single
emotion dimension by training multiple independent models for
different emotion dimensions.
Introduction (Cont.)
In this paper, they solve multi-dimensional emotion regression via a joint approach.
They model the multidimensional learning task as multi-task learning
through adversarial learning [5, 6].
Via adversarial learning, the model learns better representations than it would by directly learning continuous attention weights.
Contribution
They propose an Adversarial Attention Network (AAN). It conducts
adversarial learning between two attention layers to learn two sets of
word weight parameters for two emotion dimensions.
Below are two features of AAN:
1 An algorithm to learn two sets of better word weights.
2 Leveraging shared information between emotion dimensions.
Adversarial Attention Network
The network has five components:
Attention Modeling
Feature Extraction
Dimensional Emotion Regression
Emotion Dimension Discrimination
Adversarial Training
Figure 1: The framework of the Valence-Arousal Adversarial Attention Network, which conducts adversarial learning between a pair of emotion dimensions.
Attention Modeling
AAN takes as input a sequence of word vectors X = [x1 x2 . . . xi . . . xk] of a text containing k words, where xi denotes the word vector of the ith word.
The attention layer learns a normalized weight vector A = [a1 a2 . . . ai . . . ak] from X with a one-layer LSTM to decide the value of each word vector, and outputs the weighted sequence:
X′ = Att(X) = diag(A)X
where A = softmax(LSTM(X)).
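As a rough illustration, the weighting step can be sketched in NumPy; here a fixed score vector stands in for the per-word scores that the real model produces with its one-layer LSTM:

```python
import numpy as np

def softmax(z):
    # Numerically stable softmax over the k words.
    e = np.exp(z - z.max())
    return e / e.sum()

def attention(X, scores):
    # X: (k, d) word vectors; scores: (k,) raw per-word scores.
    A = softmax(scores)       # normalized weight vector A
    return np.diag(A) @ X     # X' = diag(A) X

k, d = 4, 3
X = np.arange(k * d, dtype=float).reshape(k, d)
scores = np.array([0.1, 2.0, -1.0, 0.5])  # stand-in for LSTM(X)
X_w = attention(X, scores)
assert X_w.shape == (k, d)                     # weighted sequence keeps shape
assert np.isclose(softmax(scores).sum(), 1.0)  # weights are normalized
```

Each row of X′ is the original word vector scaled by its attention weight, so high-weight words dominate the downstream feature extraction.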
Attention Modeling (Cont.)
AAN has three attention layers, denoted as AttV, AttA, and AttS, where
AttV and AttA are used for rating the Valence score and Arousal score.
AttS is a shared attention layer to indicate which words contribute to
the rating scores of both emotion dimensions:
X′V = AttV(X)
X′A = AttA(X)
X′S = AttS(X)
Feature Extraction
The feature extractor is trained to extract the feature vector from a
weighted sequence returned by an attention layer. They use a
single-layered bidirectional LSTM (BiLSTM):
H = BiLSTM(X′) = [h1 h2 . . . hi . . . hk]
where hi is the hidden state at the ith time step.
Feature Extraction (Cont.)
The output feature vector Feat is defined as:
Feat = Ext(X′) = tanh(hk ⊕ h̄)
where ⊕ denotes the concatenation operator and h̄ = (1/k) Σ_{i=1}^{k} hi is the average of the hidden states.
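A minimal NumPy sketch of this extractor, with a small hand-written matrix standing in for the BiLSTM hidden states:

```python
import numpy as np

def extract_feature(H):
    # H: (k, h) hidden states over the weighted sequence.
    h_last = H[-1]                # h_k, the last time step
    h_mean = H.mean(axis=0)       # h-bar, the average hidden state
    return np.tanh(np.concatenate([h_last, h_mean]))  # Feat = tanh(h_k ⊕ h-bar)

H = np.array([[0.5, -1.0],
              [1.5,  0.0],
              [2.0,  1.0]])       # stand-in for BiLSTM output
feat = extract_feature(H)
assert feat.shape == (4,)          # concatenation doubles the hidden size
assert np.all(np.abs(feat) < 1.0)  # tanh bounds every entry in (-1, 1)
```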
Dimensional Emotion Regression
They implement the regressor as a single-layer fully connected neural network, so that gradients can propagate through the network:
S = R(Feat) = relu(W · Feat + b)
where S is the regression score, R is the regressor, W denotes the weights of the fully connected layer, and b denotes the bias term.
Dimensional Emotion Regression (Cont.)
AAN’s two regressors are defined as:
SV = RV(FeatV ⊕ FeatS)
SA = RA(FeatA ⊕ FeatS)
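A sketch of the two regressors, assuming random stand-in feature vectors; the dimensions and weights here are illustrative, not taken from the paper:

```python
import numpy as np

def regressor(W, b, feat):
    # Single fully connected layer with ReLU: S = relu(W · feat + b).
    return np.maximum(0.0, W @ feat + b)

rng = np.random.default_rng(0)
h = 4                                   # illustrative feature size
feat_V, feat_A, feat_S = rng.normal(size=(3, h))
W_V, W_A = rng.normal(size=(2, 2 * h))  # each regressor sees private ⊕ shared
b_V = b_A = 0.1

S_V = regressor(W_V, b_V, np.concatenate([feat_V, feat_S]))
S_A = regressor(W_A, b_A, np.concatenate([feat_A, feat_S]))
assert S_V >= 0.0 and S_A >= 0.0        # ReLU keeps scores non-negative
```

Concatenating the shared feature FeatS into both regressors is what lets the two emotion dimensions exchange information.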
Emotion Dimension Discrimination
The discriminator D judges which emotion dimension an input feature
vector contributes to.
They design the discriminator as a single-layer fully connected neural network, with a loss function based on the Wasserstein distance between the two feature distributions:
P = D(Feat) = tanh(W · Feat + b)
where W denotes the weights of the fully connected layer.
Emotion Dimension Discrimination (Cont.)
The discriminating result P ∈ (−1, 1). In AAN, the closer the value of P is to 1, the more likely Feat contributes to Valence.
The discriminator outputs the results of FeatV and FeatA:
PV = D(FeatV)
PA = D(FeatA)
Adversarial Training
First, they train AttV, AttA, AttS, RV, RA, and Ext by minimizing the following regression losses:
min (1/n) Σ_{i=1}^{n} (SVi − TVi)²
min (1/n) Σ_{i=1}^{n} (SAi − TAi)²
where SVi and SAi denote the regression scores, TVi and TAi denote the annotated true values of the two emotion dimensions, and n denotes the total number of input samples.
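Each loss is an ordinary mean squared error; a short sketch with made-up scores:

```python
import numpy as np

def regression_loss(S, T):
    # (1/n) Σ (S_i − T_i)²: mean squared error over n samples.
    S, T = np.asarray(S, float), np.asarray(T, float)
    return np.mean((S - T) ** 2)

S_V = [3.0, 2.5, 3.2]   # predicted Valence scores (illustrative)
T_V = [3.0, 3.0, 3.0]   # annotated true values
loss_V = regression_loss(S_V, T_V)
assert np.isclose(loss_V, (0.0**2 + 0.5**2 + 0.2**2) / 3)
```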
Adversarial Training (Cont.)
Then, they update the parameters of D by maximizing the Wasserstein distance between the two feature distributions:
max (1/n) Σ_{i=1}^{n} (PVi − PAi)
During training, they clip the parameters of D to a fixed absolute value at each training epoch [2], to satisfy the Lipschitz continuity required for a fully connected layer to approximate the Wasserstein distance.
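A sketch of the critic objective and the weight clipping; the threshold c = 0.01 follows the WGAN paper's default and is an assumption here, as the slides do not state it:

```python
import numpy as np

def critic_out(W, b, feats):
    return np.tanh(W @ feats + b)    # P = tanh(W · feat + b)

def clip_params(W, c=0.01):
    # WGAN-style clipping keeps every parameter in [-c, c] (Lipschitz constraint).
    return np.clip(W, -c, c)

rng = np.random.default_rng(1)
W = clip_params(rng.normal(size=6))
feats_V = rng.normal(size=(5, 6))    # stand-in Valence features
feats_A = rng.normal(size=(5, 6))    # stand-in Arousal features
# Wasserstein estimate: (1/n) Σ (P_Vi − P_Ai), maximized w.r.t. W
w_dist = np.mean(critic_out(W, 0.0, feats_V.T)) - np.mean(critic_out(W, 0.0, feats_A.T))
assert np.all(np.abs(W) <= 0.01)
assert -2.0 < w_dist < 2.0           # tanh outputs bound the estimate
```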
Adversarial Training (Cont.)
Finally, they update the parameters of AttV and AttA by adversarially fooling D:
min (1/n) Σ_{i=1}^{n} (PVi − PAi)
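Putting the three updates together, training alternates the phases each epoch; in this sketch the actual gradient updates are stubbed out with recording callables, so only the alternation schedule is shown (all names are illustrative):

```python
def train_aan(n_epochs, steps):
    # steps maps each phase name to a callable performing that update.
    for _ in range(n_epochs):
        steps["min_regression_losses"]()  # 1) update Att*, R*, Ext on MSE
        steps["max_wasserstein"]()        # 2) update D, then clip its weights
        steps["fool_discriminator"]()     # 3) update AttV, AttA to shrink P_V − P_A

calls = []
phases = ("min_regression_losses", "max_wasserstein", "fool_discriminator")
steps = {name: (lambda n=name: calls.append(n)) for name in phases}
train_aan(2, steps)
assert calls == list(phases) * 2          # phases repeat in order every epoch
```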
Experiment Settings
They apply AAN to Reader's and Writer's multi-dimensional emotion regression on EMOBANK and compare it with several baselines. Five-fold cross-validation is used in all experiments.
Table 1: The distribution of annotated texts in each domain of the
EMOBANK corpus.
Experiment Settings (Cont.)
Table 2: List of hyperparameters used during AAN training.
They use the widely adopted Pearson's correlation coefficient r as the evaluation metric in all experiments.
Baselines
Deep CNN [3]
Regional CNN-LSTM [8]
Context LSTM-CNN [7]
Attention Network
Joint Learning
Table 3: The r-values of all the evaluated approaches.
Conclusion
In this paper, they propose an Adversarial Attention Network (AAN)
for multi-dimensional emotion regression.
It combines adversarial learning with the attention mechanism, conducting adversarial learning between two attention layers to learn better word weights for a given text.
References
[1] Muhammad Abdul-Mageed and Lyle Ungar. “EmoNet:
Fine-Grained Emotion Detection with Gated Recurrent Neural
Networks”. In: Proceedings of the 55th Annual Meeting of the
Association for Computational Linguistics (Volume 1: Long
Papers). 2017, pp. 718–728.
[2] Martin Arjovsky, Soumith Chintala, and Léon Bottou.
Wasserstein GAN. 2017. arXiv: 1701.07875 [stat.ML].
[3] Zsolt Bitvai and Trevor Cohn. “Non-Linear Text Regression with
a Deep Convolutional Neural Network”. In: Proceedings of the
53rd Annual Meeting of the Association for Computational
Linguistics and the 7th International Joint Conference on
Natural Language Processing (Volume 2: Short Papers). 2015,
pp. 180–185.
[4] Sven Buechel and Udo Hahn. “EmoBank: Studying the Impact
of Annotation Perspective and Representation Format on
Dimensional Emotion Analysis”. In: Proceedings of the 15th
Conference of the European Chapter of the Association for
Computational Linguistics: Volume 2, Short Papers. 2017,
pp. 578–585.
[5] Pengfei Liu, Xipeng Qiu, et al. “Adversarial Multi-task Learning
for Text Classification”. In: Proceedings of the 55th Annual
Meeting of the Association for Computational Linguistics
(Volume 1: Long Papers). 2017, pp. 1–10.
[6] Ryo Masumura, Yusuke Shinohara, et al. “Adversarial Training
for Multi-task and Multi-lingual Joint Modeling of Utterance
Intent Classification”. In: Proceedings of the 2018 Conference
on Empirical Methods in Natural Language Processing. 2018,
pp. 633–639.
[7] Xingyi Song, Johann Petrak, and Angus Roberts. “A Deep
Neural Network Sentence Level Classification Method with
Context Information”. In: Proceedings of the 2018 Conference
on Empirical Methods in Natural Language Processing. 2018,
pp. 900–904.
[8] Jin Wang, Liang-Chih Yu, et al. “Dimensional Sentiment
Analysis Using a Regional CNN-LSTM Model”. In: Proceedings
of the 54th Annual Meeting of the Association for Computational
Linguistics (Volume 2: Short Papers). 2016, pp. 225–230.