AAN: Adversarial Attention Modeling for Multi-dimensional Emotion Regression
ACL, 2019
Suyang Zhu, Shoushan Li, Guodong Zhou
Speaker: Po-Chuan Chen
May 30, 2024
Table of contents
1 Abstract
2 Introduction
3 Adversarial Attention Network
4 Experimentation
5 Conclusion
Abstract
To improve Reader's and Writer's multi-dimensional emotion regression on EMOBANK, this paper proposes an Adversarial Attention Network (AAN):
Using an attention layer to determine which words are valuable
for a particular emotion dimension
Learning better word weights by using adversarial training
Incorporating a shared attention layer to learn public word
weights between two emotion dimensions
Introduction
Emotion analysis can be divided into:
Emotion classification [1]
Emotion regression [8]
Due to the inherent difficulty of the regression task and the lack of high-quality, large-scale emotion regression corpora, research on emotion regression started later than research on emotion classification.
Introduction (Cont.)
Nowadays, there are many emotion regression corpora [4].
These corpora adopt the widely accepted Valence-Arousal (VA) model or the Valence-Arousal-Dominance (VAD) model.
Most of the existing studies in emotion regression focus on a single
emotion dimension by training multiple independent models for
different emotion dimensions.
Introduction (Cont.)
In this paper, they solve multi-dimensional emotion regression via a joint approach.
They model the multidimensional learning task as multi-task learning
through adversarial learning [5, 6].
Via adversarial learning, the model learns better representations than it would by directly learning continuous attention weights.
Contribution
They propose an Adversarial Attention Network (AAN). It conducts
adversarial learning between two attention layers to learn two sets of
word weight parameters for two emotion dimensions.
Below are two features of AAN:
1 An algorithm to learn two sets of better word weights.
2 Leveraging shared information between emotion dimensions.
Adversarial Attention Network
The network has five components:
Attention Modeling
Feature Extraction
Dimensional Emotion Regression
Emotion Dimension Discrimination
Adversarial Training
Figure 1: The framework of the Valence-Arousal Adversarial Attention Network, which conducts adversarial learning between a pair of emotion dimensions.
Attention Modeling
AAN takes as input a sequence of word vectors X = [x1 x2 . . . xi . . . xk] of a text containing k words, where xi denotes the word vector of the ith word.
The attention layer learns a normalized weight vector A = [a1 a2 . . . ai . . . ak] from X with a one-layer LSTM to decide the value of each word vector, and outputs the weighted sequence:
X′ = Att(X) = diag(A)X
where A = softmax(LSTM(X)).
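As a rough illustration, the weighting step can be sketched in NumPy; here a fixed score vector stands in for the per-word scores that the real model produces with its one-layer LSTM:

```python
import numpy as np

def softmax(z):
    # Numerically stable softmax over the k words.
    e = np.exp(z - z.max())
    return e / e.sum()

def attention(X, scores):
    # X: (k, d) word vectors; scores: (k,) raw per-word scores.
    A = softmax(scores)       # normalized weight vector A
    return np.diag(A) @ X     # X' = diag(A) X

k, d = 4, 3
X = np.arange(k * d, dtype=float).reshape(k, d)
scores = np.array([0.1, 2.0, -1.0, 0.5])  # stand-in for LSTM(X)
X_w = attention(X, scores)
assert X_w.shape == (k, d)                     # weighted sequence keeps shape
assert np.isclose(softmax(scores).sum(), 1.0)  # weights are normalized
```

Each row of X′ is the original word vector scaled by its attention weight, so high-weight words dominate the downstream feature extraction.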
Attention Modeling (Cont.)
AAN has three attention layers, denoted as AttV, AttA, and AttS, where
AttV and AttA are used for rating the Valence score and Arousal score.
AttS is a shared attention layer to indicate which words contribute to
the rating scores of both emotion dimensions:
X′V = AttV(X)
X′A = AttA(X)
X′S = AttS(X)
Feature Extraction
The feature extractor is trained to extract the feature vector from a
weighted sequence returned by an attention layer. They use a
single-layered bidirectional LSTM (BiLSTM):
H = BiLSTM(X′) = [h1 h2 . . . hi . . . hk]
where hi is the hidden state at the ith time step.
Feature Extraction (Cont.)
The output feature vector Feat is defined as:
Feat = Ext(X′) = tanh(hk ⊕ h̄)
where ⊕ denotes the concatenation operator and h̄ = (1/k) Σ_{i=1}^{k} hi is the average of the hidden states.
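A minimal NumPy sketch of this extractor, with a small hand-written matrix standing in for the BiLSTM hidden states:

```python
import numpy as np

def extract_feature(H):
    # H: (k, h) hidden states over the weighted sequence.
    h_last = H[-1]                # h_k, the last time step
    h_mean = H.mean(axis=0)       # h-bar, the average hidden state
    return np.tanh(np.concatenate([h_last, h_mean]))  # Feat = tanh(h_k ⊕ h-bar)

H = np.array([[0.5, -1.0],
              [1.5,  0.0],
              [2.0,  1.0]])       # stand-in for BiLSTM output
feat = extract_feature(H)
assert feat.shape == (4,)          # concatenation doubles the hidden size
assert np.all(np.abs(feat) < 1.0)  # tanh bounds every entry in (-1, 1)
```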
Dimensional Emotion Regression
They implement the regressor as a single-layer fully connected neural network, so that gradients can propagate through the network:
S = R(Feat) = relu(W · Feat + b)
where S is the regression score, R is the regressor, W denotes the weights of the fully connected layer, and b denotes the bias term.
Dimensional Emotion Regression (Cont.)
AAN’s two regressors are defined as:
SV = RV(FeatV ⊕ FeatS)
SA = RA(FeatA ⊕ FeatS)
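A sketch of the two regressors, assuming random stand-in feature vectors; the dimensions and weights here are illustrative, not taken from the paper:

```python
import numpy as np

def regressor(W, b, feat):
    # Single fully connected layer with ReLU: S = relu(W · feat + b).
    return np.maximum(0.0, W @ feat + b)

rng = np.random.default_rng(0)
h = 4                                   # illustrative feature size
feat_V, feat_A, feat_S = rng.normal(size=(3, h))
W_V, W_A = rng.normal(size=(2, 2 * h))  # each regressor sees private ⊕ shared
b_V = b_A = 0.1

S_V = regressor(W_V, b_V, np.concatenate([feat_V, feat_S]))
S_A = regressor(W_A, b_A, np.concatenate([feat_A, feat_S]))
assert S_V >= 0.0 and S_A >= 0.0        # ReLU keeps scores non-negative
```

Concatenating the shared feature FeatS into both regressors is what lets the two emotion dimensions exchange information.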
Emotion Dimension Discrimination
The discriminator D judges which emotion dimension an input feature
vector contributes to.
They design the discriminator as a single-layer fully connected neural network, with a loss function based on the Wasserstein distance between the two feature distributions:
P = D(Feat) = tanh(W · Feat + b)
where W denotes the weights of the fully connected layer.
Emotion Dimension Discrimination (Cont.)
The discriminating result P ∈ (−1, 1). In AAN, the closer the value of P is to 1, the more likely Feat contributes to Valence.
The discriminator outputs the results of FeatV and FeatA:
PV = D(FeatV)
PA = D(FeatA)
Adversarial Training
First, they train AttV, AttA, AttS, RV, RA, and Ext by minimizing the following regression losses:
min (1/n) Σ_{i=1}^{n} (SVi − TVi)²
min (1/n) Σ_{i=1}^{n} (SAi − TAi)²
where SVi and SAi denote the regression scores, TVi and TAi denote the annotated true values of the two emotion dimensions, and n denotes the total number of input samples.
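Each loss is an ordinary mean squared error; a short sketch with made-up scores:

```python
import numpy as np

def regression_loss(S, T):
    # (1/n) Σ (S_i − T_i)²: mean squared error over n samples.
    S, T = np.asarray(S, float), np.asarray(T, float)
    return np.mean((S - T) ** 2)

S_V = [3.0, 2.5, 3.2]   # predicted Valence scores (illustrative)
T_V = [3.0, 3.0, 3.0]   # annotated true values
loss_V = regression_loss(S_V, T_V)
assert np.isclose(loss_V, (0.0**2 + 0.5**2 + 0.2**2) / 3)
```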
Adversarial Training (Cont.)
Then, they update the parameters of D by maximizing the Wasserstein distance between the two feature distributions:
max (1/n) Σ_{i=1}^{n} (PVi − PAi)
During training, they clip the parameters of D to a fixed absolute value at each training epoch [2], to satisfy the Lipschitz continuity required for a fully connected layer to approximate the Wasserstein distance.
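A sketch of the critic objective and the weight clipping; the threshold c = 0.01 follows the WGAN paper's default and is an assumption here, as the slides do not state it:

```python
import numpy as np

def critic_out(W, b, feats):
    return np.tanh(W @ feats + b)    # P = tanh(W · feat + b)

def clip_params(W, c=0.01):
    # WGAN-style clipping keeps every parameter in [-c, c] (Lipschitz constraint).
    return np.clip(W, -c, c)

rng = np.random.default_rng(1)
W = clip_params(rng.normal(size=6))
feats_V = rng.normal(size=(5, 6))    # stand-in Valence features
feats_A = rng.normal(size=(5, 6))    # stand-in Arousal features
# Wasserstein estimate: (1/n) Σ (P_Vi − P_Ai), maximized w.r.t. W
w_dist = np.mean(critic_out(W, 0.0, feats_V.T)) - np.mean(critic_out(W, 0.0, feats_A.T))
assert np.all(np.abs(W) <= 0.01)
assert -2.0 < w_dist < 2.0           # tanh outputs bound the estimate
```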
Adversarial Training (Cont.)
Finally, they update the parameters of AttV and AttA by adversarially fooling D:
min (1/n) Σ_{i=1}^{n} (PVi − PAi)
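Putting the three updates together, training alternates the phases each epoch; in this sketch the actual gradient updates are stubbed out with recording callables, so only the alternation schedule is shown (all names are illustrative):

```python
def train_aan(n_epochs, steps):
    # steps maps each phase name to a callable performing that update.
    for _ in range(n_epochs):
        steps["min_regression_losses"]()  # 1) update Att*, R*, Ext on MSE
        steps["max_wasserstein"]()        # 2) update D, then clip its weights
        steps["fool_discriminator"]()     # 3) update AttV, AttA to shrink P_V − P_A

calls = []
phases = ("min_regression_losses", "max_wasserstein", "fool_discriminator")
steps = {name: (lambda n=name: calls.append(n)) for name in phases}
train_aan(2, steps)
assert calls == list(phases) * 2          # phases repeat in order every epoch
```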
Experiment Settings
They apply AAN to Reader's and Writer's multi-dimensional emotion regression on EMOBANK and compare it with several baselines. Five-fold cross-validation is used in all experiments.
Table 1: The distribution of annotated texts in each domain of the
EMOBANK corpus.
Experiment Settings (Cont.)
Table 2: List of hyperparameters used during AAN training.
They use the widely adopted Pearson's correlation coefficient r as the evaluation metric in all experiments.
Baselines
Deep CNN [3]
Regional CNN-LSTM [8]
Context LSTM-CNN [7]
Attention Network
Joint Learning
Table 3: The r-values of all the evaluated approaches.
Conclusion
In this paper, they propose an Adversarial Attention Network (AAN)
for multi-dimensional emotion regression.
It combines adversarial learning with the attention mechanism, conducting adversarial learning between two attention layers to learn better word weights for a given text.
References
[1] Muhammad Abdul-Mageed and Lyle Ungar. “EmoNet:
Fine-Grained Emotion Detection with Gated Recurrent Neural
Networks”. In: Proceedings of the 55th Annual Meeting of the
Association for Computational Linguistics (Volume 1: Long
Papers). 2017, pp. 718–728.
[2] Martin Arjovsky, Soumith Chintala, and Léon Bottou.
Wasserstein GAN. 2017. arXiv: 1701.07875 [stat.ML].
[3] Zsolt Bitvai and Trevor Cohn. “Non-Linear Text Regression with
a Deep Convolutional Neural Network”. In: Proceedings of the
53rd Annual Meeting of the Association for Computational
Linguistics and the 7th International Joint Conference on
Natural Language Processing (Volume 2: Short Papers). 2015,
pp. 180–185.
[4] Sven Buechel and Udo Hahn. “EmoBank: Studying the Impact
of Annotation Perspective and Representation Format on
Dimensional Emotion Analysis”. In: Proceedings of the 15th
Conference of the European Chapter of the Association for
Computational Linguistics: Volume 2, Short Papers. 2017,
pp. 578–585.
[5] Pengfei Liu, Xipeng Qiu, et al. “Adversarial Multi-task Learning
for Text Classification”. In: Proceedings of the 55th Annual
Meeting of the Association for Computational Linguistics
(Volume 1: Long Papers). 2017, pp. 1–10.
[6] Ryo Masumura, Yusuke Shinohara, et al. “Adversarial Training
for Multi-task and Multi-lingual Joint Modeling of Utterance
Intent Classification”. In: Proceedings of the 2018 Conference
on Empirical Methods in Natural Language Processing. 2018,
pp. 633–639.
[7] Xingyi Song, Johann Petrak, and Angus Roberts. “A Deep
Neural Network Sentence Level Classification Method with
Context Information”. In: Proceedings of the 2018 Conference
on Empirical Methods in Natural Language Processing. 2018,
pp. 900–904.
[8] Jin Wang, Liang-Chih Yu, et al. “Dimensional Sentiment
Analysis Using a Regional CNN-LSTM Model”. In: Proceedings
of the 54th Annual Meeting of the Association for Computational
Linguistics (Volume 2: Short Papers). 2016, pp. 225–230.