Data Augmentation for Improving Emotion Recognition in Software Engineering Communication
Mia Mohammad Imran, Yashasvi Jain, Preetha Chatterjee, Kostadin Damevski
ASE 2022 - Research Paper
1
Motivation
● Developers often show emotions (joy, anger, etc.) in their communications.

Appreciation 🙏
“@[USER] Thank you, Stephen. I hope in the future Angular will become even better and easier to understand. However, first of all, I am grateful to Angular for making me grow as a developer.”

Toxic 🤬
“Soooooooooooo you’re setting Angular on fire and saying bold shit in bold like the Angular team don’t care about you cause you found relative pathing has an issue is an odd area”
2
Motivation
● General-purpose emotion classification tools are not effective on Software Engineering corpora.
● Researchers developed SE-specific tools to recognize emotions.
○ These tools do not perform very well [1]. On a Stack Overflow dataset:
■ Joy: F1-score ranges from 0.37 to 0.47.
■ Fear: F1-score ranges from 0.22 to 0.40.
● Most likely problem: lack of large, high-quality datasets of software developers’ emotions in communication channels.
[1] Chen et al. “Emoji-powered sentiment and emotion detection from software developers' communication data.” TOSEM, 2021
3
Data Collection
● Selected 4 popular OSS repositories, each with over 50k GitHub stars.
● Total of 2000 comments (1000 positive & 1000 negative).
[1] Biswas et al., “Achieving reliable sentiment analysis in the software engineering domain using BERT.” ICSME, 2020
4
Emotion Categorization
● There are a number of models of human emotion.
● The most popular in SE is Shaver’s emotion categorization.
○ 6 primary categories:
■ Anger 😡
■ Love ❤
■ Fear 😨
■ Joy 😊
■ Sadness 😥
■ Surprise 😲
○ 25 secondary categories and over 100 tertiary categories.
5
Emotion Categorization: Shaver’s Categories
● 6 primary categories:
○ Anger 😡
○ Love ❤
○ Fear 😨
○ Joy 😊
○ Sadness 😥
○ Surprise 😲
● 25 secondary categories and over
100 tertiary categories.
6
Shaver’s Categories Are Not a Perfect Match
● “I’m curious about this - can you give more context on what exactly goes wrong? Perhaps if that causes bugs this should be prohibited instead?”
○ Expresses Curiosity 🤔
● “And, I am a little confused, if there is not any special folder, according to the
module resolution [URL] How could file find the correct modules? Did I miss
something?”
○ Expresses Confusion 😕
7
Shaver’s Categories Are Not a Perfect Match
● To mitigate the problem, we incorporate GoEmotions (Google, 2020), a recent text-based emotion classification taxonomy with 27 categories.
○ We provide a mapping between GoEmotions categories and Shaver’s primary emotions (see the sketch below):
■ 👍 Approval to 😊 Joy
■ 👎 Disapproval to 😡 Anger
■ 🤔 Curiosity to 😲 Surprise
8
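To make the mapping concrete, here is a minimal sketch as a Python dict. Only the three pairs shown above come from the slide; the remaining categories are elided rather than guessed.

```python
# Sketch of the GoEmotions -> Shaver primary-emotion mapping.
# Only the three pairs from the slide are filled in; the full
# 27-category mapping is part of the paper's released materials.
GOEMOTIONS_TO_SHAVER = {
    "approval": "joy",        # 👍 Approval -> 😊 Joy
    "disapproval": "anger",   # 👎 Disapproval -> 😡 Anger
    "curiosity": "surprise",  # 🤔 Curiosity -> 😲 Surprise
    # ... remaining GoEmotions categories omitted here
}

def to_primary(goemotions_label: str) -> str:
    """Map a GoEmotions label to a Shaver primary emotion."""
    return GOEMOTIONS_TO_SHAVER[goemotions_label]
```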
Studied Tools for Emotion Classification in SE
● ESEM-E [1]: SVM with unigram and bigram features.
● EMTk [2]: SVM with unigram, bigram, emotion lexicon, polarity, and mood features.
● SEntiMoji [3]: transfer learning from the DeepMoji representation model.
[1] Murgia et al., “An exploratory qualitative and quantitative analysis of emotions in issue report comments of open source systems.” ESEM, 2018
[2] Calefato et al., “EMTk: the Emotion Mining Toolkit.” SEmotion, 2019
[3] Chen et al. “Emoji-powered sentiment and emotion detection from software developers' communication data.” TOSEM, 2021
9
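As a rough illustration of the feature setup the first two tools share, here is a minimal scikit-learn sketch; it is an assumption for illustration, not either tool's actual pipeline.

```python
# Minimal sketch of an SVM over unigram+bigram features, the setup
# ESEM-E and EMTk share per the list above. scikit-learn is an
# illustrative stand-in for the tools' actual implementations.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

# Toy training data; the real tools train per-emotion binary
# classifiers on thousands of annotated utterances.
texts = [
    "that's awesome, I've been needing this for a while",
    "this build keeps failing and nobody seems to care",
]
labels = [1, 0]  # 1 = expresses Joy, 0 = does not

clf = make_pipeline(CountVectorizer(ngram_range=(1, 2)), LinearSVC())
clf.fit(texts, labels)
print(clf.predict(["awesome work, thanks for the fix!"]))
```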
How Do the Tools Perform?
● F1-scores are similar across all three tools.
● Overall precision is significantly higher than recall.
○ The tools predict conservatively:
■ they more often predict that an utterance lacks a given emotion (worked arithmetic below).
10
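A toy worked example of why conservative prediction produces this pattern; the counts are illustrative, not taken from the paper.

```python
# Illustrative confusion-matrix arithmetic: a conservative classifier
# that rarely predicts an emotion gets few false positives (high
# precision) but many false negatives (low recall).
tp, fp, fn = 20, 5, 60  # hypothetical counts, not from the paper

precision = tp / (tp + fp)                          # 0.80
recall = tp / (tp + fn)                             # 0.25
f1 = 2 * precision * recall / (precision + recall)  # ~0.38
print(precision, recall, round(f1, 2))
```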
How Do the Tools Perform?
● The false positive instances are broadly spread across the tools.
● A majority (58%) of the false negative instances are shared among the tools.
11
Error Analysis of FNs
● Analyzed 176 false negative (FN) instances using Novielli et al.’s categorization [1].
[1] Novielli et al., “A benchmark study on sentiment analysis for software engineering research.” MSR, 2018
12
Error Analysis of FNs
● General Error: the inability to recognize lexical cues that occur in the text.
○ “that’s awesome, I’ve been needing this for a while”
● Implicit Sentiment Polarity: humans use common knowledge to recognize
emotions that the tools miss.
○ “This was actually causing this test-case not to be executed!”
13
Data Augmentation
● Hypothesis: more training data should reduce errors in some of these categories.
● Data Augmentation: a technique for creating new training instances by
targeted modification.
● The new instance is:
○ different from the source instance.
○ label invariant.
Example:
Source: “awesome! I'm glad you know about this trick.”
Augmented: “awesome! I'm happy you know about this trick.”
14
Data Augmentation: Unconstrained Strategy
● Four operators: insert, substitute, delete, and shuffle.
● Used the BART [1] generative model for the insert and substitute operations (see the sketch below).
15
[1] Lewis et al., “BART: Denoising Sequence-to-Sequence Pre-training for Natural Language Generation, Translation, and
Comprehension.” ACL, 2020
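A minimal sketch of what BART-based insert and substitute might look like, assuming Hugging Face's fill-mask pipeline; this illustrates the idea and is not the authors' released implementation.

```python
# Hypothetical sketch of the insert/substitute operators using BART
# mask filling via Hugging Face transformers (not the paper's code).
import random
from transformers import pipeline

fill_mask = pipeline("fill-mask", model="facebook/bart-base")
MASK = fill_mask.tokenizer.mask_token  # "<mask>"

def substitute(utterance: str) -> str:
    """Replace one random word with a BART-predicted alternative."""
    tokens = utterance.split()
    i = random.randrange(len(tokens))
    original, tokens[i] = tokens[i], MASK
    for cand in fill_mask(" ".join(tokens)):
        # Skip the trivial case where BART restores the original word.
        if cand["token_str"].strip().lower() != original.lower():
            return cand["sequence"]
    return utterance

def insert(utterance: str) -> str:
    """Insert a BART-predicted word at a random position."""
    tokens = utterance.split()
    tokens.insert(random.randrange(len(tokens) + 1), MASK)
    return fill_mask(" ".join(tokens))[0]["sequence"]

print(substitute("awesome! I'm glad you know about this trick."))
```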
Data Augmentation: Unconstrained Strategy
● The Unconstrained Strategy sometimes introduces noise.
○ Source:
“This looks good, thanks for clarifying the docs.”
○ Augmented:
“This looks worse, thanks for clarifying the docs.”
16
Data Augmentation: Lexicon-based Strategy
● Insert or substitute a word using an SE-specific emotion lexicon (see the sketch below).
○ The emotion of the chosen word matches the utterance’s annotated emotion.
● The SE-specific emotion lexicon comes from Mäntylä et al. [1].
[1] Mäntylä et al., “Bootstrapping a lexicon for emotional arousal in software engineering.” MSR, 2017
Source: “This looks good, thanks for clarifying the docs.”
Augmented: “This looks wonderful, thanks for clarifying the docs.” (word from the ‘Joy’ lexicon)
17
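A minimal sketch of lexicon-based substitution; JOY_LEXICON below is a hypothetical stand-in for the Mäntylä et al. 'Joy' entries, not the actual lexicon.

```python
# Sketch of the lexicon-based substitute operator. The lexicon below
# is a hypothetical stand-in for the Mäntylä et al. 'Joy' entries.
import random

JOY_LEXICON = {"good", "great", "wonderful", "awesome", "glad", "happy"}

def lexicon_substitute(utterance: str, lexicon: set) -> str:
    """Swap one lexicon word for another of the same emotion category,
    so the utterance's emotion label stays invariant."""
    tokens = utterance.split()
    hits = [i for i, t in enumerate(tokens)
            if t.strip(".,!?").lower() in lexicon]
    if not hits:
        return utterance  # no emotion-bearing word to replace
    i = random.choice(hits)
    word = tokens[i].strip(".,!?")
    replacement = random.choice(sorted(lexicon - {word.lower()}))
    tokens[i] = tokens[i].replace(word, replacement)
    return " ".join(tokens)

print(lexicon_substitute("This looks good, thanks for clarifying the docs.",
                         JOY_LEXICON))
```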
Data Augmentation: Polarity-based Strategy
● Same four operators as the Unconstrained Strategy (polarity check sketched below).
○ Delete a word only if it has neutral polarity.
○ Positive emotions (Love, Joy): increase or preserve positive polarity.
○ Negative emotions (Anger, Fear, Sadness): increase or preserve negative polarity.
○ Ambiguous emotions (Surprise): no change in polarity.
18
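A sketch of the polarity constraint, using VADER as an illustrative stand-in polarity scorer; the paper's actual scorer and thresholds may differ.

```python
# Sketch of the polarity constraint: keep an augmented utterance only
# if it preserves or increases the polarity its emotion label requires.
# VADER is an illustrative stand-in; the paper's scorer may differ.
from vaderSentiment.vaderSentiment import SentimentIntensityAnalyzer

analyzer = SentimentIntensityAnalyzer()
POSITIVE = {"love", "joy"}
NEGATIVE = {"anger", "fear", "sadness"}
# Anything else (Surprise) is ambiguous: polarity must stay unchanged.

def polarity(text: str) -> float:
    return analyzer.polarity_scores(text)["compound"]

def accept(source: str, augmented: str, emotion: str,
           tol: float = 0.05) -> bool:
    before, after = polarity(source), polarity(augmented)
    if emotion in POSITIVE:
        return after >= before   # increase or preserve positive polarity
    if emotion in NEGATIVE:
        return after <= before   # increase or preserve negative polarity
    return abs(after - before) <= tol  # ambiguous: no polarity change

print(accept("This looks good, thanks for clarifying the docs.",
             "This looks worse, thanks for clarifying the docs.", "joy"))
# -> False: the unconstrained substitution flipped the polarity
```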
Data Augmentation: Results
● Overall, the Polarity-based strategy performed best.
19
Data Augmentation: Takeaway
● Helps the tools recognize lexical cues they previously missed (General Error).
○ “that’s awesome, I’ve been needing this for a while”
● Data augmentation does not seem to help in identifying implicit emotions.
○ “This was actually causing this test-case not to be executed!”
● The Polarity strategy worked best, likely because it strikes a balance between completely unconstrained and highly constrained augmentation.
20
Summary of Contributions
● Manually annotated 2000 GitHub utterances.
● Extension of Shaver’s emotion taxonomy with GoEmotions categories.
● Qualitative error analysis of three existing SE emotion classification tools.
● Demonstration and evaluation of three data augmentation approaches.
● Annotation instructions, the annotated dataset, and source code for data augmentation are publicly available.
Questions/Thoughts/Collaboration Ideas to:
Mia Mohammad Imran, imranm3@vcu.edu
21
