Data Augmentation for Improving Emotion Recognition in Software Engineering Communication
Mia Mohammad Imran, Yashasvi Jain, Preetha Chatterjee, Kostadin Damevski
ASE 2022 - Research Paper
Motivation
● Developers often show emotions (joy, anger, etc.) in their communications.
○ Appreciation 🙏: “@[USER] Thank you, Stephen. I hope in the future Angular will become even better and easier to understand. However, first of all, I am grateful to Angular for making me grow as a developer.”
○ Toxic 🤬: “Soooooooooooo you’re setting Angular on fire and saying bold shit in bold like the Angular team don’t care about you cause you found relative pathing has an issue is an odd area”
Motivation
● General-purpose emotion classification tools are not effective on Software
Engineering corpora.
● Researchers developed SE-specific tools to recognize emotions.
○ These tools do not perform very well [1]. On a StackOverflow dataset:
■ Joy: F1-score ranges from 0.37 to 0.47.
■ Fear: F1-score ranges from 0.22 to 0.40.
● Most likely problem: lack of large, high-quality datasets of software developers'
emotions in communication channels.
[1] Chen et al., “Emoji-powered sentiment and emotion detection from software developers' communication data.” TOSEM, 2021
Data Collection
● Selected 4 popular OSS repositories with over 50k GitHub stars.
● 2000 comments in total (1000 positive and 1000 negative).
[1] Biswas et al., “Achieving reliable sentiment analysis in the software engineering domain using BERT.” ICSME, 2020.
Emotion Categorization
● There are several models of emotion.
● The most popular in SE is Shaver’s emotion categorization.
○ 6 primary categories:
■ Anger 😡
■ Love ❤
■ Fear 😨
■ Joy 😊
■ Sadness 😥
■ Surprise 😲
○ 25 secondary categories and over 100 tertiary categories.
Shaver’s Categories Are Not a Perfect Match
● “I’m curious about this - can you give more context on what exactly goes
wrong? Perhaps if that causes bugs this should be prohibited instead?”
○ Expresses Curiosity 🤔
● “And, I am a little confused, if there is not any special folder, according to the
module resolution [URL] How could file find the correct modules? Did I miss
something?”
○ Expresses Confusion 😕
Shaver’s Categories Are Not a Perfect Match
● To mitigate the problem, we combine Shaver’s categories with GoEmotions (2020),
a recent text-based emotion classification tool by Google that has 27 categories.
○ We provide a mapping between the GoEmotions categories and the primary emotions:
■ 👍 Approval to 😊 Joy
■ 👎 Disapproval to 😡 Anger
■ 🤔 Curiosity to 😲 Surprise
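A minimal sketch of how such a mapping could be represented in code. Only the three pairs listed above are included; the dictionary name and the `to_primary` helper are hypothetical, and categories whose mapping is not shown here simply return `None`.

```python
# Hypothetical sketch: only the three mappings shown on the slide are included.
GOEMOTIONS_TO_PRIMARY = {
    "approval": "joy",
    "disapproval": "anger",
    "curiosity": "surprise",
}


def to_primary(category):
    """Return the primary emotion for a GoEmotions category, or None if unmapped here."""
    return GOEMOTIONS_TO_PRIMARY.get(category.lower())


print(to_primary("Curiosity"))
```

The lookup is case-insensitive so that labels such as "Approval" and "approval" resolve to the same primary emotion.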
Studied Tools for Emotion Classification in SE
● ESEM-E [1]: SVM; features: unigram, bigram.
● EMTk [2]: SVM; features: unigram, bigram, emotion lexicon, polarity, mood.
● SEntiMoji [3]: transfer learning; DeepMoji representation model.
[1] Murgia et al., “An exploratory qualitative and quantitative analysis of emotions in issue report comments of open source systems.” ESEM, 2018
[2] Calefato et al., “EMTk - the emotion mining toolkit.” SEmotion, 2019
[3] Chen et al., “Emoji-powered sentiment and emotion detection from software developers' communication data.” TOSEM, 2021
How Do the Tools Perform
● F1-scores are similar across all three tools.
● Overall, precision is significantly higher than recall.
○ The tools predict conservatively: they more often label an utterance as
lacking a given emotion.
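To make the precision/recall observation concrete, here is a small worked example. The counts are made up for illustration and do not come from the study; the function name is hypothetical. A classifier that rarely predicts an emotion makes few false positives (high precision) but misses many true instances (low recall).

```python
def precision_recall_f1(tp, fp, fn):
    """Compute precision, recall, and F1 from true positive, false positive, and false negative counts."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Made-up counts for a conservative classifier: few positive predictions, many misses.
p, r, f1 = precision_recall_f1(tp=30, fp=10, fn=70)
print(f"precision={p:.2f} recall={r:.2f} f1={f1:.2f}")
```

With these illustrative counts, precision is 0.75 while recall is only 0.30, which pulls the F1-score down to about 0.43.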
How Do the Tools Perform
● The false positive instances are broadly spread.
● The majority (58%) of the false negative instances are shared among the tools.
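One way the overlap of false negatives could be computed is sketched below. This is an assumption about the procedure, not the authors' script: the share is taken here as the instances missed by every tool divided by the instances missed by at least one tool, and the instance IDs are invented.

```python
def shared_false_negatives(fn_by_tool):
    """Return the FN instance IDs missed by every tool and their share of all FN instances."""
    sets = [set(ids) for ids in fn_by_tool.values()]
    if not sets:
        return set(), 0.0
    shared = set.intersection(*sets)
    union = set.union(*sets)
    share = len(shared) / len(union) if union else 0.0
    return shared, share


# Invented instance IDs for three tools.
fn_by_tool = {
    "ESEM-E": [1, 2, 3, 4],
    "EMTk": [2, 3, 4, 5],
    "SEntiMoji": [2, 3, 4, 6],
}
shared, share = shared_false_negatives(fn_by_tool)
print(sorted(shared), share)
```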
Error Analysis of FNs
● Analyzed 176 false negative (FN) instances using Novielli et al.’s categorization [1].
[1] Novielli et al., “A benchmark study on sentiment analysis for software engineering research.” MSR, 2018
Error Analysis of FNs
● General Error: the inability to recognize lexical cues that occur in the text.
○ “that’s awesome, I’ve been needing this for a while”
● Implicit Sentiment Polarity: humans use common knowledge to recognize
emotions that the tools miss.
○ “This was actually causing this test-case not to be executed!”
Data Augmentation
● Hypothesis: More training data should reduce errors in some categories.
● Data Augmentation: a technique for creating new training instances by
targeted modification.
● The new instance is:
○ different from the source instance.
○ label invariant.
● Example:
○ Source: “awesome! I'm glad you know about this trick.”
○ Augmented: “awesome! I'm happy you know about this trick.”
Data Augmentation: Unconstrained Strategy
● Four operators: insert, substitute, delete, and shuffle.
● Used the BART [1] generative model for the insert and substitute operations.
[1] Lewis et al., “BART: Denoising Sequence-to-Sequence Pre-training for Natural Language Generation, Translation, and Comprehension.” ACL, 2020
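The four operators can be sketched at the token level as follows. This is an illustrative sketch, not the authors' implementation: `propose_word` is a hypothetical stand-in that picks from a tiny made-up vocabulary where the study used BART, and all function names are assumptions.

```python
import random


def propose_word(rng):
    """Stand-in for a generative model such as BART: picks from a tiny, made-up vocabulary."""
    return rng.choice(["really", "great", "nice", "quite"])


def insert(tokens, rng):
    """Insert one proposed word at a random position."""
    i = rng.randrange(len(tokens) + 1)
    return tokens[:i] + [propose_word(rng)] + tokens[i:]


def substitute(tokens, rng):
    """Replace one randomly chosen token with a proposed word."""
    if not tokens:
        return []
    i = rng.randrange(len(tokens))
    return tokens[:i] + [propose_word(rng)] + tokens[i + 1:]


def delete(tokens, rng):
    """Remove one randomly chosen token (never empties the utterance)."""
    if len(tokens) < 2:
        return list(tokens)
    i = rng.randrange(len(tokens))
    return tokens[:i] + tokens[i + 1:]


def shuffle(tokens, rng):
    """Return the tokens in a random order."""
    shuffled = list(tokens)
    rng.shuffle(shuffled)
    return shuffled


OPERATORS = {"insert": insert, "substitute": substitute, "delete": delete, "shuffle": shuffle}


def augment(text, operator, seed=0):
    """Apply one named operator to a whitespace-tokenized utterance."""
    rng = random.Random(seed)
    return " ".join(OPERATORS[operator](text.split(), rng))


source = "This looks good, thanks for clarifying the docs."
for name in OPERATORS:
    print(name, "->", augment(source, name))
```

Because nothing constrains which word is proposed or removed, an operator can change the emotion of the utterance; a real pipeline would also need to check that the result differs from the source.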
Data Augmentation: Unconstrained Strategy
● The unconstrained strategy sometimes introduces noise.
○ Source:
“This looks good, thanks for clarifying the docs.”
○ Augmented:
“This looks worse, thanks for clarifying the docs.”
Data Augmentation: Lexicon-based Strategy
● Insert or substitute a word using an SE-specific emotion lexicon.
○ The emotion of the word is the same as the annotation of the utterance.
● The SE-specific emotion lexicon comes from Mäntylä et al. [1].
● Example (substituted word taken from the ‘Joy’ lexicon):
○ Source: “This looks good, thanks for clarifying the docs.”
○ Augmented: “This looks wonderful, thanks for clarifying the docs.”
[1] Mäntylä et al., “Bootstrapping a lexicon for emotional arousal in software engineering.” MSR, 2017
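A minimal sketch of lexicon-based substitution, under stated assumptions: the tiny `EMOTION_LEXICON` below is made up for illustration and is not the Mäntylä et al. lexicon, and the `lexicon_substitute` function and its `pick` parameter are hypothetical. The idea is to replace a word that already belongs to the annotated emotion with another word from the same emotion's list.

```python
import re

# Made-up mini-lexicon for illustration only; not the lexicon used in the study.
EMOTION_LEXICON = {
    "joy": ["good", "wonderful", "great", "awesome"],
    "anger": ["terrible", "awful", "annoying"],
}


def lexicon_substitute(text, emotion, pick=0):
    """Replace the first word found in the emotion's lexicon with another word of that emotion."""
    words = EMOTION_LEXICON[emotion]
    for match in re.finditer(r"[A-Za-z']+", text):
        word = match.group().lower()
        if word in words:
            candidates = [w for w in words if w != word]
            if not candidates:
                return text
            replacement = candidates[pick % len(candidates)]
            # Splice the replacement in place so surrounding punctuation is preserved.
            return text[:match.start()] + replacement + text[match.end():]
    return text  # No lexicon word found: leave the utterance unchanged.


print(lexicon_substitute("This looks good, thanks for clarifying the docs.", "joy"))
```

Drawing the replacement from the lexicon of the annotated emotion is what keeps the label invariant in this sketch.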
Data Augmentation: Polarity-based Strategy
● Same four operators as the Unconstrained Strategy.
○ Delete a word only if it has neutral polarity.
● Polarity constraint by emotion group:
○ Positive emotions (Love, Joy): increase or preserve positive polarity.
○ Negative emotions (Anger, Fear, Sadness): increase or preserve negative polarity.
○ Ambiguous emotions (Surprise): no changes in polarity.
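The polarity constraint can be read as a filter on augmented candidates. The sketch below assumes a toy word-score table (`WORD_POLARITY`, made up here) in place of whatever polarity scorer the study used; the function names are hypothetical.

```python
# Emotion groups as listed on the slide.
POLARITY_GROUP = {
    "love": "positive", "joy": "positive",
    "anger": "negative", "fear": "negative", "sadness": "negative",
    "surprise": "ambiguous",
}

# Made-up word scores for illustration; a real pipeline would use a polarity tool.
WORD_POLARITY = {"good": 1, "wonderful": 2, "worse": -1, "bad": -1, "awful": -2}


def polarity(text):
    """Sum the toy polarity scores of the words in the text (unknown words count as 0)."""
    return sum(WORD_POLARITY.get(tok.strip(".,!?").lower(), 0) for tok in text.split())


def can_delete(word):
    """A word may be deleted only if its polarity is neutral."""
    return WORD_POLARITY.get(word.strip(".,!?").lower(), 0) == 0


def keeps_polarity(source, augmented, emotion):
    """Check the augmented text against the polarity rule for the emotion's group."""
    before, after = polarity(source), polarity(augmented)
    group = POLARITY_GROUP[emotion]
    if group == "positive":
        return after >= before  # increase or preserve positive polarity
    if group == "negative":
        return after <= before  # increase or preserve negative polarity
    return after == before      # ambiguous: no change in polarity


source = "This looks good, thanks for clarifying the docs."
print(keeps_polarity(source, "This looks wonderful, thanks for clarifying the docs.", "joy"))
print(keeps_polarity(source, "This looks worse, thanks for clarifying the docs.", "joy"))
```

Under this check, the noisy "good" to "worse" substitution from the unconstrained example would be rejected for a Joy utterance, while "good" to "wonderful" would be kept.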
Data Augmentation: Results
● Overall, the Polarity-based strategy performed best.
Data Augmentation: Takeaway
● Data augmentation helps in cases where the tools missed lexical cues.
○ “that’s awesome, I’ve been needing this for a while”
● Data augmentation does not seem to help in identifying implicit emotions.
○ “This was actually causing this test-case not to be executed!”
● The Polarity-based strategy worked best, likely because it strikes a balance between
completely unconstrained and highly constrained augmentation.
Summary of Contributions
● Manually annotated 2000 GitHub utterances.
● Extension of emotion taxonomy.
● Qualitative error analysis of three existing SE emotion classification tools.
● Demonstration and evaluation of three data augmentation approaches.
● Annotation instructions, annotated dataset, and source code for data
augmentation are publicly available.
Questions/Thoughts/Collaboration Ideas to:
Mia Mohammad Imran, imranm3@vcu.edu

More Related Content

Similar to Data Augmentation for Improving Emotion Recognition in Software Engineering Communication

Perceptual Data_04182016
Perceptual Data_04182016Perceptual Data_04182016
Perceptual Data_04182016Kunal Dash
 
Emotion Classification In Software Engineering Texts: A Comparative Analysis ...
Emotion Classification In Software Engineering Texts: A Comparative Analysis ...Emotion Classification In Software Engineering Texts: A Comparative Analysis ...
Emotion Classification In Software Engineering Texts: A Comparative Analysis ...
Mia Mohammad Imran
 
Sentimental Analysis - Naive Bayes Algorithm
Sentimental Analysis - Naive Bayes AlgorithmSentimental Analysis - Naive Bayes Algorithm
Sentimental Analysis - Naive Bayes Algorithm
Khushboo Gupta
 
Part 1Motivation and EmpowermentLeaders must often motivate .docx
Part 1Motivation and EmpowermentLeaders must often motivate .docxPart 1Motivation and EmpowermentLeaders must often motivate .docx
Part 1Motivation and EmpowermentLeaders must often motivate .docx
danhaley45372
 
Beyond Sentiment Analysis: Creating Engaging Conversational Experiences throu...
Beyond Sentiment Analysis: Creating Engaging Conversational Experiences throu...Beyond Sentiment Analysis: Creating Engaging Conversational Experiences throu...
Beyond Sentiment Analysis: Creating Engaging Conversational Experiences throu...
Rasa Technologies
 
Emotion detection from text using data mining and text mining
Emotion detection from text using data mining and text miningEmotion detection from text using data mining and text mining
Emotion detection from text using data mining and text mining
Sakthi Dasans
 
Uncovering the Causes of Emotions in Software Developer Communication Using Z...
Uncovering the Causes of Emotions in Software Developer Communication Using Z...Uncovering the Causes of Emotions in Software Developer Communication Using Z...
Uncovering the Causes of Emotions in Software Developer Communication Using Z...
Mia Mohammad Imran
 
NYAI #18: Designing for AI (Rob Strati & Jesse Schifano of ECHO)
NYAI #18: Designing for AI (Rob Strati & Jesse Schifano of ECHO)NYAI #18: Designing for AI (Rob Strati & Jesse Schifano of ECHO)
NYAI #18: Designing for AI (Rob Strati & Jesse Schifano of ECHO)
Maryam Farooq
 
Emotional Intelligence with Suzette Reyes
Emotional Intelligence with Suzette ReyesEmotional Intelligence with Suzette Reyes
Emotional Intelligence with Suzette Reyes
Jodi Rudick
 
Crucial confrontations
Crucial confrontationsCrucial confrontations
Crucial confrontations
Yves Hanoulle
 
Motivation lesson
Motivation lessonMotivation lesson
Motivation lesson
pstall
 
The Challenges of Affect Detection in the Social Programmer Ecosystem
The Challenges of Affect Detection in the Social Programmer EcosystemThe Challenges of Affect Detection in the Social Programmer Ecosystem
The Challenges of Affect Detection in the Social Programmer Ecosystem
Nicole Novielli
 
Venkatesh Duppada - 2017 - SeerNet at EmoInt-2017: Tweet Emotion Intensity Es...
Venkatesh Duppada - 2017 - SeerNet at EmoInt-2017: Tweet Emotion Intensity Es...Venkatesh Duppada - 2017 - SeerNet at EmoInt-2017: Tweet Emotion Intensity Es...
Venkatesh Duppada - 2017 - SeerNet at EmoInt-2017: Tweet Emotion Intensity Es...
Association for Computational Linguistics
 
Venkatesh Duppada - 2017 - SeerNet at EmoInt-2017: Tweet Emotion Intensity Es...
Venkatesh Duppada - 2017 - SeerNet at EmoInt-2017: Tweet Emotion Intensity Es...Venkatesh Duppada - 2017 - SeerNet at EmoInt-2017: Tweet Emotion Intensity Es...
Venkatesh Duppada - 2017 - SeerNet at EmoInt-2017: Tweet Emotion Intensity Es...
Association for Computational Linguistics
 
Interpersonal Communication 1 - Emotional Intelligence
Interpersonal Communication 1 - Emotional IntelligenceInterpersonal Communication 1 - Emotional Intelligence
Interpersonal Communication 1 - Emotional IntelligenceGeorge Diamandis
 
Emotion Detection
Emotion DetectionEmotion Detection
Emotion Detection
MD. ABUL KALAM AZAD
 
Lean in - Questions...move you toward what you want!
Lean in  - Questions...move you toward what you want!Lean in  - Questions...move you toward what you want!
Lean in - Questions...move you toward what you want!
Denise Reed
 
Social Media Sentiments Analysis
Social Media Sentiments AnalysisSocial Media Sentiments Analysis
Social Media Sentiments Analysis
PratisthaSingh5
 
Grant writing slide show
Grant writing slide showGrant writing slide show
Grant writing slide show
South Carolina Resources
 

Similar to Data Augmentation for Improving Emotion Recognition in Software Engineering Communication (20)

Sentiment Analysis.pptx
Sentiment Analysis.pptxSentiment Analysis.pptx
Sentiment Analysis.pptx
 
Perceptual Data_04182016
Perceptual Data_04182016Perceptual Data_04182016
Perceptual Data_04182016
 
Emotion Classification In Software Engineering Texts: A Comparative Analysis ...
Emotion Classification In Software Engineering Texts: A Comparative Analysis ...Emotion Classification In Software Engineering Texts: A Comparative Analysis ...
Emotion Classification In Software Engineering Texts: A Comparative Analysis ...
 
Sentimental Analysis - Naive Bayes Algorithm
Sentimental Analysis - Naive Bayes AlgorithmSentimental Analysis - Naive Bayes Algorithm
Sentimental Analysis - Naive Bayes Algorithm
 
Part 1Motivation and EmpowermentLeaders must often motivate .docx
Part 1Motivation and EmpowermentLeaders must often motivate .docxPart 1Motivation and EmpowermentLeaders must often motivate .docx
Part 1Motivation and EmpowermentLeaders must often motivate .docx
 
Beyond Sentiment Analysis: Creating Engaging Conversational Experiences throu...
Beyond Sentiment Analysis: Creating Engaging Conversational Experiences throu...Beyond Sentiment Analysis: Creating Engaging Conversational Experiences throu...
Beyond Sentiment Analysis: Creating Engaging Conversational Experiences throu...
 
Emotion detection from text using data mining and text mining
Emotion detection from text using data mining and text miningEmotion detection from text using data mining and text mining
Emotion detection from text using data mining and text mining
 
Uncovering the Causes of Emotions in Software Developer Communication Using Z...
Uncovering the Causes of Emotions in Software Developer Communication Using Z...Uncovering the Causes of Emotions in Software Developer Communication Using Z...
Uncovering the Causes of Emotions in Software Developer Communication Using Z...
 
NYAI #18: Designing for AI (Rob Strati & Jesse Schifano of ECHO)
NYAI #18: Designing for AI (Rob Strati & Jesse Schifano of ECHO)NYAI #18: Designing for AI (Rob Strati & Jesse Schifano of ECHO)
NYAI #18: Designing for AI (Rob Strati & Jesse Schifano of ECHO)
 
Emotional Intelligence with Suzette Reyes
Emotional Intelligence with Suzette ReyesEmotional Intelligence with Suzette Reyes
Emotional Intelligence with Suzette Reyes
 
Crucial confrontations
Crucial confrontationsCrucial confrontations
Crucial confrontations
 
Motivation lesson
Motivation lessonMotivation lesson
Motivation lesson
 
The Challenges of Affect Detection in the Social Programmer Ecosystem
The Challenges of Affect Detection in the Social Programmer EcosystemThe Challenges of Affect Detection in the Social Programmer Ecosystem
The Challenges of Affect Detection in the Social Programmer Ecosystem
 
Venkatesh Duppada - 2017 - SeerNet at EmoInt-2017: Tweet Emotion Intensity Es...
Venkatesh Duppada - 2017 - SeerNet at EmoInt-2017: Tweet Emotion Intensity Es...Venkatesh Duppada - 2017 - SeerNet at EmoInt-2017: Tweet Emotion Intensity Es...
Venkatesh Duppada - 2017 - SeerNet at EmoInt-2017: Tweet Emotion Intensity Es...
 
Venkatesh Duppada - 2017 - SeerNet at EmoInt-2017: Tweet Emotion Intensity Es...
Venkatesh Duppada - 2017 - SeerNet at EmoInt-2017: Tweet Emotion Intensity Es...Venkatesh Duppada - 2017 - SeerNet at EmoInt-2017: Tweet Emotion Intensity Es...
Venkatesh Duppada - 2017 - SeerNet at EmoInt-2017: Tweet Emotion Intensity Es...
 
Interpersonal Communication 1 - Emotional Intelligence
Interpersonal Communication 1 - Emotional IntelligenceInterpersonal Communication 1 - Emotional Intelligence
Interpersonal Communication 1 - Emotional Intelligence
 
Emotion Detection
Emotion DetectionEmotion Detection
Emotion Detection
 
Lean in - Questions...move you toward what you want!
Lean in  - Questions...move you toward what you want!Lean in  - Questions...move you toward what you want!
Lean in - Questions...move you toward what you want!
 
Social Media Sentiments Analysis
Social Media Sentiments AnalysisSocial Media Sentiments Analysis
Social Media Sentiments Analysis
 
Grant writing slide show
Grant writing slide showGrant writing slide show
Grant writing slide show
 

More from Preetha Chatterjee

Incivility in Open Source Projects: A Comprehensive Annotated Dataset of Lock...
Incivility in Open Source Projects: A Comprehensive Annotated Dataset of Lock...Incivility in Open Source Projects: A Comprehensive Annotated Dataset of Lock...
Incivility in Open Source Projects: A Comprehensive Annotated Dataset of Lock...
Preetha Chatterjee
 
Exploring ChatGPT for Toxicity Detection in GitHub
Exploring ChatGPT for Toxicity Detection in GitHubExploring ChatGPT for Toxicity Detection in GitHub
Exploring ChatGPT for Toxicity Detection in GitHub
Preetha Chatterjee
 
Interpersonal Trust in OSS: Exploring Dimensions of Trust in GitHub Pull Requ...
Interpersonal Trust in OSS: Exploring Dimensions of Trust in GitHub Pull Requ...Interpersonal Trust in OSS: Exploring Dimensions of Trust in GitHub Pull Requ...
Interpersonal Trust in OSS: Exploring Dimensions of Trust in GitHub Pull Requ...
Preetha Chatterjee
 
Automatic Identification of Informative Code in Stack Overflow Posts
Automatic Identification of Informative Code in Stack Overflow PostsAutomatic Identification of Informative Code in Stack Overflow Posts
Automatic Identification of Informative Code in Stack Overflow Posts
Preetha Chatterjee
 
Automatically Identifying the Quality of Developer Chats for Post Hoc Use
Automatically Identifying the Quality of Developer Chats for Post Hoc UseAutomatically Identifying the Quality of Developer Chats for Post Hoc Use
Automatically Identifying the Quality of Developer Chats for Post Hoc Use
Preetha Chatterjee
 
Finding Help with Programming Errors: An Exploratory Study of Novice Software...
Finding Help with Programming Errors: An Exploratory Study of Novice Software...Finding Help with Programming Errors: An Exploratory Study of Novice Software...
Finding Help with Programming Errors: An Exploratory Study of Novice Software...
Preetha Chatterjee
 
Extracting Archival-Quality Information from Software-Related Chats
Extracting Archival-Quality Information from Software-Related ChatsExtracting Archival-Quality Information from Software-Related Chats
Extracting Archival-Quality Information from Software-Related Chats
Preetha Chatterjee
 
Mining Code Examples with Descriptive Text from Software Artifacts
Mining Code Examples with Descriptive Text from Software ArtifactsMining Code Examples with Descriptive Text from Software Artifacts
Mining Code Examples with Descriptive Text from Software Artifacts
Preetha Chatterjee
 
Exploratory Study of Slack Q&A Chats as a Mining Source for Software Engineer...
Exploratory Study of Slack Q&A Chats as a Mining Source for Software Engineer...Exploratory Study of Slack Q&A Chats as a Mining Source for Software Engineer...
Exploratory Study of Slack Q&A Chats as a Mining Source for Software Engineer...
Preetha Chatterjee
 
Extracting Code Segments and Their Descriptions from Research Articles
Extracting Code Segments and Their Descriptions from Research ArticlesExtracting Code Segments and Their Descriptions from Research Articles
Extracting Code Segments and Their Descriptions from Research Articles
Preetha Chatterjee
 

More from Preetha Chatterjee (10)

Incivility in Open Source Projects: A Comprehensive Annotated Dataset of Lock...
Incivility in Open Source Projects: A Comprehensive Annotated Dataset of Lock...Incivility in Open Source Projects: A Comprehensive Annotated Dataset of Lock...
Incivility in Open Source Projects: A Comprehensive Annotated Dataset of Lock...
 
Exploring ChatGPT for Toxicity Detection in GitHub
Exploring ChatGPT for Toxicity Detection in GitHubExploring ChatGPT for Toxicity Detection in GitHub
Exploring ChatGPT for Toxicity Detection in GitHub
 
Interpersonal Trust in OSS: Exploring Dimensions of Trust in GitHub Pull Requ...
Interpersonal Trust in OSS: Exploring Dimensions of Trust in GitHub Pull Requ...Interpersonal Trust in OSS: Exploring Dimensions of Trust in GitHub Pull Requ...
Interpersonal Trust in OSS: Exploring Dimensions of Trust in GitHub Pull Requ...
 
Automatic Identification of Informative Code in Stack Overflow Posts
Automatic Identification of Informative Code in Stack Overflow PostsAutomatic Identification of Informative Code in Stack Overflow Posts
Automatic Identification of Informative Code in Stack Overflow Posts
 
Automatically Identifying the Quality of Developer Chats for Post Hoc Use
Automatically Identifying the Quality of Developer Chats for Post Hoc UseAutomatically Identifying the Quality of Developer Chats for Post Hoc Use
Automatically Identifying the Quality of Developer Chats for Post Hoc Use
 
Finding Help with Programming Errors: An Exploratory Study of Novice Software...
Finding Help with Programming Errors: An Exploratory Study of Novice Software...Finding Help with Programming Errors: An Exploratory Study of Novice Software...
Finding Help with Programming Errors: An Exploratory Study of Novice Software...
 
Extracting Archival-Quality Information from Software-Related Chats
Extracting Archival-Quality Information from Software-Related ChatsExtracting Archival-Quality Information from Software-Related Chats
Extracting Archival-Quality Information from Software-Related Chats
 
Mining Code Examples with Descriptive Text from Software Artifacts
Mining Code Examples with Descriptive Text from Software ArtifactsMining Code Examples with Descriptive Text from Software Artifacts
Mining Code Examples with Descriptive Text from Software Artifacts
 
Exploratory Study of Slack Q&A Chats as a Mining Source for Software Engineer...
Exploratory Study of Slack Q&A Chats as a Mining Source for Software Engineer...Exploratory Study of Slack Q&A Chats as a Mining Source for Software Engineer...
Exploratory Study of Slack Q&A Chats as a Mining Source for Software Engineer...
 
Extracting Code Segments and Their Descriptions from Research Articles
Extracting Code Segments and Their Descriptions from Research ArticlesExtracting Code Segments and Their Descriptions from Research Articles
Extracting Code Segments and Their Descriptions from Research Articles
 

Recently uploaded

GraphSummit Paris - The art of the possible with Graph Technology
GraphSummit Paris - The art of the possible with Graph TechnologyGraphSummit Paris - The art of the possible with Graph Technology
GraphSummit Paris - The art of the possible with Graph Technology
Neo4j
 
Enterprise Resource Planning System in Telangana
Enterprise Resource Planning System in TelanganaEnterprise Resource Planning System in Telangana
Enterprise Resource Planning System in Telangana
NYGGS Automation Suite
 
APIs for Browser Automation (MoT Meetup 2024)
APIs for Browser Automation (MoT Meetup 2024)APIs for Browser Automation (MoT Meetup 2024)
APIs for Browser Automation (MoT Meetup 2024)
Boni García
 
Launch Your Streaming Platforms in Minutes
Launch Your Streaming Platforms in MinutesLaunch Your Streaming Platforms in Minutes
Launch Your Streaming Platforms in Minutes
Roshan Dwivedi
 
Custom Healthcare Software for Managing Chronic Conditions and Remote Patient...
Custom Healthcare Software for Managing Chronic Conditions and Remote Patient...Custom Healthcare Software for Managing Chronic Conditions and Remote Patient...
Custom Healthcare Software for Managing Chronic Conditions and Remote Patient...
Mind IT Systems
 
Need for Speed: Removing speed bumps from your Symfony projects ⚡️
Need for Speed: Removing speed bumps from your Symfony projects ⚡️Need for Speed: Removing speed bumps from your Symfony projects ⚡️
Need for Speed: Removing speed bumps from your Symfony projects ⚡️
Łukasz Chruściel
 
E-commerce Application Development Company.pdf
E-commerce Application Development Company.pdfE-commerce Application Development Company.pdf
E-commerce Application Development Company.pdf
Hornet Dynamics
 
May Marketo Masterclass, London MUG May 22 2024.pdf
May Marketo Masterclass, London MUG May 22 2024.pdfMay Marketo Masterclass, London MUG May 22 2024.pdf
May Marketo Masterclass, London MUG May 22 2024.pdf
Adele Miller
 
A Sighting of filterA in Typelevel Rite of Passage
A Sighting of filterA in Typelevel Rite of PassageA Sighting of filterA in Typelevel Rite of Passage
A Sighting of filterA in Typelevel Rite of Passage
Philip Schwarz
 
Transform Your Communication with Cloud-Based IVR Solutions
Transform Your Communication with Cloud-Based IVR SolutionsTransform Your Communication with Cloud-Based IVR Solutions
Transform Your Communication with Cloud-Based IVR Solutions
TheSMSPoint
 
Mobile App Development Company In Noida | Drona Infotech
Mobile App Development Company In Noida | Drona InfotechMobile App Development Company In Noida | Drona Infotech
Mobile App Development Company In Noida | Drona Infotech
Drona Infotech
 
AI Fusion Buddy Review: Brand New, Groundbreaking Gemini-Powered AI App
AI Fusion Buddy Review: Brand New, Groundbreaking Gemini-Powered AI AppAI Fusion Buddy Review: Brand New, Groundbreaking Gemini-Powered AI App
AI Fusion Buddy Review: Brand New, Groundbreaking Gemini-Powered AI App
Google
 
Empowering Growth with Best Software Development Company in Noida - Deuglo
Empowering Growth with Best Software  Development Company in Noida - DeugloEmpowering Growth with Best Software  Development Company in Noida - Deuglo
Empowering Growth with Best Software Development Company in Noida - Deuglo
Deuglo Infosystem Pvt Ltd
 
Artificia Intellicence and XPath Extension Functions
Artificia Intellicence and XPath Extension FunctionsArtificia Intellicence and XPath Extension Functions
Artificia Intellicence and XPath Extension Functions
Octavian Nadolu
 
LORRAINE ANDREI_LEQUIGAN_HOW TO USE ZOOM
LORRAINE ANDREI_LEQUIGAN_HOW TO USE ZOOMLORRAINE ANDREI_LEQUIGAN_HOW TO USE ZOOM
LORRAINE ANDREI_LEQUIGAN_HOW TO USE ZOOM
lorraineandreiamcidl
 
Atelier - Innover avec l’IA Générative et les graphes de connaissances
Atelier - Innover avec l’IA Générative et les graphes de connaissancesAtelier - Innover avec l’IA Générative et les graphes de connaissances
Atelier - Innover avec l’IA Générative et les graphes de connaissances
Neo4j
 
Top Features to Include in Your Winzo Clone App for Business Growth (4).pptx
Top Features to Include in Your Winzo Clone App for Business Growth (4).pptxTop Features to Include in Your Winzo Clone App for Business Growth (4).pptx
Top Features to Include in Your Winzo Clone App for Business Growth (4).pptx
rickgrimesss22
 
Orion Context Broker introduction 20240604
Orion Context Broker introduction 20240604Orion Context Broker introduction 20240604
Orion Context Broker introduction 20240604
Fermin Galan
 
Essentials of Automations: The Art of Triggers and Actions in FME
Essentials of Automations: The Art of Triggers and Actions in FMEEssentials of Automations: The Art of Triggers and Actions in FME
Essentials of Automations: The Art of Triggers and Actions in FME
Safe Software
 
Utilocate provides Smarter, Better, Faster, Safer Locate Ticket Management
Utilocate provides Smarter, Better, Faster, Safer Locate Ticket ManagementUtilocate provides Smarter, Better, Faster, Safer Locate Ticket Management
Utilocate provides Smarter, Better, Faster, Safer Locate Ticket Management
Utilocate
 

Recently uploaded (20)

GraphSummit Paris - The art of the possible with Graph Technology
GraphSummit Paris - The art of the possible with Graph TechnologyGraphSummit Paris - The art of the possible with Graph Technology
GraphSummit Paris - The art of the possible with Graph Technology
 
Enterprise Resource Planning System in Telangana
Enterprise Resource Planning System in TelanganaEnterprise Resource Planning System in Telangana
Enterprise Resource Planning System in Telangana
 
APIs for Browser Automation (MoT Meetup 2024)
APIs for Browser Automation (MoT Meetup 2024)APIs for Browser Automation (MoT Meetup 2024)
APIs for Browser Automation (MoT Meetup 2024)
 
Launch Your Streaming Platforms in Minutes
Launch Your Streaming Platforms in MinutesLaunch Your Streaming Platforms in Minutes
Launch Your Streaming Platforms in Minutes
 
Custom Healthcare Software for Managing Chronic Conditions and Remote Patient...
Custom Healthcare Software for Managing Chronic Conditions and Remote Patient...Custom Healthcare Software for Managing Chronic Conditions and Remote Patient...
Custom Healthcare Software for Managing Chronic Conditions and Remote Patient...
 
Need for Speed: Removing speed bumps from your Symfony projects ⚡️
Need for Speed: Removing speed bumps from your Symfony projects ⚡️Need for Speed: Removing speed bumps from your Symfony projects ⚡️
Need for Speed: Removing speed bumps from your Symfony projects ⚡️
 
E-commerce Application Development Company.pdf
E-commerce Application Development Company.pdfE-commerce Application Development Company.pdf
E-commerce Application Development Company.pdf
 
May Marketo Masterclass, London MUG May 22 2024.pdf
May Marketo Masterclass, London MUG May 22 2024.pdfMay Marketo Masterclass, London MUG May 22 2024.pdf
May Marketo Masterclass, London MUG May 22 2024.pdf
 
A Sighting of filterA in Typelevel Rite of Passage
A Sighting of filterA in Typelevel Rite of PassageA Sighting of filterA in Typelevel Rite of Passage
A Sighting of filterA in Typelevel Rite of Passage
 
Transform Your Communication with Cloud-Based IVR Solutions
Transform Your Communication with Cloud-Based IVR SolutionsTransform Your Communication with Cloud-Based IVR Solutions
Transform Your Communication with Cloud-Based IVR Solutions
 
Mobile App Development Company In Noida | Drona Infotech
Mobile App Development Company In Noida | Drona InfotechMobile App Development Company In Noida | Drona Infotech
Mobile App Development Company In Noida | Drona Infotech
 
AI Fusion Buddy Review: Brand New, Groundbreaking Gemini-Powered AI App
AI Fusion Buddy Review: Brand New, Groundbreaking Gemini-Powered AI AppAI Fusion Buddy Review: Brand New, Groundbreaking Gemini-Powered AI App
AI Fusion Buddy Review: Brand New, Groundbreaking Gemini-Powered AI App
 
Empowering Growth with Best Software Development Company in Noida - Deuglo
Empowering Growth with Best Software  Development Company in Noida - DeugloEmpowering Growth with Best Software  Development Company in Noida - Deuglo
Empowering Growth with Best Software Development Company in Noida - Deuglo
 
Artificia Intellicence and XPath Extension Functions
Artificia Intellicence and XPath Extension FunctionsArtificia Intellicence and XPath Extension Functions
Artificia Intellicence and XPath Extension Functions
 
LORRAINE ANDREI_LEQUIGAN_HOW TO USE ZOOM
LORRAINE ANDREI_LEQUIGAN_HOW TO USE ZOOMLORRAINE ANDREI_LEQUIGAN_HOW TO USE ZOOM
LORRAINE ANDREI_LEQUIGAN_HOW TO USE ZOOM
 
Atelier - Innover avec l’IA Générative et les graphes de connaissances
Atelier - Innover avec l’IA Générative et les graphes de connaissancesAtelier - Innover avec l’IA Générative et les graphes de connaissances
Atelier - Innover avec l’IA Générative et les graphes de connaissances
 
Top Features to Include in Your Winzo Clone App for Business Growth (4).pptx
Top Features to Include in Your Winzo Clone App for Business Growth (4).pptxTop Features to Include in Your Winzo Clone App for Business Growth (4).pptx
Top Features to Include in Your Winzo Clone App for Business Growth (4).pptx
 
Orion Context Broker introduction 20240604
Orion Context Broker introduction 20240604Orion Context Broker introduction 20240604
Orion Context Broker introduction 20240604
 
Essentials of Automations: The Art of Triggers and Actions in FME
Essentials of Automations: The Art of Triggers and Actions in FMEEssentials of Automations: The Art of Triggers and Actions in FME
Essentials of Automations: The Art of Triggers and Actions in FME
 
Utilocate provides Smarter, Better, Faster, Safer Locate Ticket Management
Utilocate provides Smarter, Better, Faster, Safer Locate Ticket ManagementUtilocate provides Smarter, Better, Faster, Safer Locate Ticket Management
Utilocate provides Smarter, Better, Faster, Safer Locate Ticket Management
 

Data Augmentation for Improving Emotion Recognition in Software Engineering Communication

  • 1. Data Augmentation for Improving Emotion Recognition in Software Engineering Communication Mia Mohammad Imran ASE 2022 - Research Paper Yashasvi Jain Preetha Chatterjee Kostadin Damevski 1
  • 2. ● Developers often show emotions (joy, anger, etc) in their communications. Motivation Toxic 🤬 Appreciation 🙏 2 “@[USER] Thank you, Stephen. I hope in the future Angular will become even better and easier to understand. However, first of all, I am grateful to Angular for making me grow as a developer.” Soooooooooooo you’re setting Angular on fire and saying bold shit in bold like the Angular team don’t care about you cause you found relative pathing has an issue is an odd area
  • 3. Motivation ● General-purpose emotion classification tools are not effective on Software Engineering corpora. ● Researchers developed SE-specific tools to recognize emotions. ○ These tools do not perform very well [1]. On a StackOverflow dataset: ■ Joy: F1-score ranges from 0.37 to 0.47. ■ Fear: F1-score ranges from 0.22 to 0.40. ● Most likely problem: lack of large, high-quality datasets on software developers' emotions in communication channels. [1] Chen et al., “Emoji-powered sentiment and emotion detection from software developers' communication data.” TOSEM, 2021.
  • 4. Data Collection ● Selected 4 popular OSS repositories with over 50k GitHub stars. ● Total 2000 comments (1000 positive & 1000 negative). [1] Biswas et al., “Achieving reliable sentiment analysis in the software engineering domain using bert.” ICSME, 2020.
  • 5. Emotion Categorization ● There are a number of models of emotions. ● The most popular in SE is Shaver’s emotion categorization. ○ 6 primary categories: ■ Anger 😡 ■ Love ❤ ■ Fear 😨 ■ Joy 😊 ■ Sadness 😥 ■ Surprise 😲 ○ 25 secondary categories and over 100 tertiary categories.
  • 6. Emotion Categorization: Shaver’s Categories ● 6 primary categories: ○ Anger 😡 ○ Love ❤ ○ Fear 😨 ○ Joy 😊 ○ Sadness 😥 ○ Surprise 😲 ● 25 secondary categories and over 100 tertiary categories.
  • 7. Shaver’s Categories Are Not a Perfect Match ● “I’m curious about this - can you give more context on what exactly goes wrong? Perhaps if that causes bugs this should be prohibited instead?” ○ Expresses Curiosity 🤔 ● “And, I am a little confused, if there is not any special folder, according to the module resolution [URL] How could file find the correct modules? Did I miss something?” ○ Expresses Confusion 😕
  • 8. Shaver’s Categories Are Not a Perfect Match ● To mitigate the problem, we combine Shaver’s categories with those of GoEmotions (2020), a recent text-based emotion classification dataset and taxonomy by Google with 27 categories. ○ Provided a mapping between GoEmotions categories and Shaver’s primary emotions: ■ 👍 Approval to 😊 Joy ■ 👎 Disapproval to 😡 Anger ■ 🤔 Curiosity to 😲 Surprise
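
The category mapping on this slide can be sketched as a simple lookup table. This is a minimal illustration, assuming lowercase label strings; it contains only the three pairs listed on the slide, and the names `GOEMOTIONS_TO_SHAVER` and `to_primary_emotion` are hypothetical, not taken from the authors' code.

```python
from typing import Optional

# Only the three pairs shown on the slide; the full mapping is not given here.
GOEMOTIONS_TO_SHAVER = {
    "approval": "joy",
    "disapproval": "anger",
    "curiosity": "surprise",
}


def to_primary_emotion(label: str) -> Optional[str]:
    """Return the Shaver primary emotion for a GoEmotions label, or None if unmapped."""
    return GOEMOTIONS_TO_SHAVER.get(label.strip().lower())


# Example lookup: a mapped label and a label this sketch does not cover.
mapped = to_primary_emotion("Curiosity")
unmapped = to_primary_emotion("gratitude")
```

Returning `None` for unmapped labels keeps the sketch honest about the pairs it does not know, rather than guessing a primary emotion.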
  • 9. Studied Tools for Emotion Classification in SE ● ESEM-E [1]: SVM; unigram, bigram. ● EMTk [2]: SVM; unigram, bigram, emotion lexicon, polarity, mood. ● SEntiMoji [3]: transfer learning; DeepMoji representation model. [1] Murgia et al., “An exploratory qualitative and quantitative analysis of emotions in issue report comments of open source systems.” ESEM, 2018. [2] Calefato et al., “Emtk-the emotion mining toolkit.” SEmotion, 2019. [3] Chen et al., “Emoji-powered sentiment and emotion detection from software developers' communication data.” TOSEM, 2021.
  • 10. How Do the Tools Perform ● F1-score is similar across all three tools. ● Overall, precision is significantly higher than recall. ○ The tools predicted conservatively: they more often labeled an utterance as lacking a given emotion.
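
To make the precision/recall observation concrete, here is a small sketch of the standard definitions. The counts are hypothetical and chosen only to show what a conservative classifier looks like (few false positives, many false negatives); they are not results from the paper.

```python
from typing import Tuple


def precision_recall_f1(tp: int, fp: int, fn: int) -> Tuple[float, float, float]:
    """Compute precision, recall and F1 from true/false positive and false negative counts."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Hypothetical counts for one emotion: the tool rarely predicts the emotion,
# so it makes few false positives but misses many true instances.
p, r, f1 = precision_recall_f1(tp=30, fp=10, fn=70)
```

With these made-up counts precision is well above recall, which is the pattern the slide describes.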
  • 11. How Do the Tools Perform ● The false positive instances are broadly spread across the tools. ● A majority (58%) of the false negative instances are shared among the tools.
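
One way to measure how many false negatives are shared is with set operations over instance IDs. This is a sketch under stated assumptions: "shared" is taken to mean missed by every tool, the denominator is the union of all false negatives, and the IDs below are invented. The slides do not say how the 58% figure was computed.

```python
from typing import Dict, Set


def shared_fn_fraction(fn_by_tool: Dict[str, Set[int]]) -> float:
    """Fraction of all false negatives (union) that every tool missed (intersection)."""
    if not fn_by_tool:
        return 0.0
    sets = list(fn_by_tool.values())
    union = set().union(*sets)
    shared = set.intersection(*sets)
    return len(shared) / len(union) if union else 0.0


# Invented instance IDs, for illustration only.
example = {
    "ESEM-E": {1, 2, 3, 4},
    "EMTk": {2, 3, 4, 5},
    "SEntiMoji": {2, 3, 6},
}
fraction = shared_fn_fraction(example)
```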
  • 12. Error Analysis of FNs ● Analyzed 176 FN instances using Novielli et al.’s categorization [1]. [1] Novielli et al., “A benchmark study on sentiment analysis for software engineering research.” MSR, 2018.
  • 13. Error Analysis of FNs ● General Error: the inability to recognize lexical cues that occur in the text. ○ “that’s awesome, I’ve been needing this for a while” ● Implicit Sentiment Polarity: humans use common knowledge to recognize emotions that the tools miss. ○ “This was actually causing this test-case not to be executed!”
  • 14. Data Augmentation ● Hypothesis: More training data should improve some error categories. ● Data Augmentation: a technique for creating new training instances by targeted modification. ● The new instance is: ○ different from the source instance. ○ label invariant. ● Example: ○ Source: “awesome! I'm glad you know about this trick.” ○ Augmented: “awesome! I'm happy you know about this trick.”
  • 15. Data Augmentation: Unconstrained Strategy ● Four operators: insert, substitute, delete and shuffle. ● Used the BART [1] generative model for insert and substitute operations. [1] Lewis et al., “BART: Denoising Sequence-to-Sequence Pre-training for Natural Language Generation, Translation, and Comprehension.” ACL, 2020.
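
Two of the four operators, delete and shuffle, need no language model and can be sketched directly. This is a minimal illustration assuming whitespace tokenization and uniformly random positions; the slides do not specify how positions are chosen, and the insert and substitute operators (which the slide says use BART) are deliberately left out.

```python
import random
from typing import List


def delete_word(tokens: List[str], rng: random.Random) -> List[str]:
    """Remove one randomly chosen token; very short inputs are returned unchanged."""
    if len(tokens) < 2:
        return list(tokens)
    i = rng.randrange(len(tokens))
    return tokens[:i] + tokens[i + 1:]


def shuffle_words(tokens: List[str], rng: random.Random) -> List[str]:
    """Return a randomly reordered copy of the tokens."""
    out = list(tokens)
    rng.shuffle(out)
    return out


rng = random.Random(0)  # fixed seed so the sketch is reproducible
source = "This looks good, thanks for clarifying the docs.".split()
deleted = delete_word(source, rng)
shuffled = shuffle_words(source, rng)
```

Both functions return new lists, so the source utterance stays available for producing several augmented variants.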
  • 16. Data Augmentation: Unconstrained Strategy ● The unconstrained strategy sometimes introduces noise. ○ Source: “This looks good, thanks for clarifying the docs.” ○ Augmented: “This looks worse, thanks for clarifying the docs.”
  • 17. Data Augmentation: Lexicon-based Strategy ● Insert or substitute a word using an SE-specific emotion lexicon. ○ The emotion of the word is the same as the annotation of the utterance. ● The SE-specific emotion lexicon comes from Mäntylä et al. [1]. ● Example (substituted word from the ‘Joy’ lexicon): ○ Source: “This looks good, thanks for clarifying the docs.” ○ Augmented: “This looks wonderful, thanks for clarifying the docs.” [1] Mäntylä et al., “Bootstrapping a lexicon for emotional arousal in software engineering.” MSR, 2017.
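
The substitution step can be sketched as follows. This is an illustration under several assumptions: `TOY_LEXICON` is a made-up three-word stand-in, not the lexicon of Mäntylä et al.; only words that already appear in the lexicon for the utterance's emotion are eligible for replacement; and the function name is hypothetical.

```python
import random
import re
from typing import Dict, List

# Toy stand-in lexicon for illustration; NOT the real SE-specific lexicon.
TOY_LEXICON: Dict[str, List[str]] = {
    "joy": ["good", "wonderful", "awesome"],
}


def lexicon_substitute(text: str, emotion: str, rng: random.Random) -> str:
    """Swap one lexicon word in the text for another word of the same emotion."""
    words = TOY_LEXICON.get(emotion, [])
    # Split into word and non-word chunks so punctuation and spacing are preserved.
    chunks = re.findall(r"\w+|\W+", text)
    positions = [i for i, chunk in enumerate(chunks) if chunk.lower() in words]
    if not positions:
        return text  # nothing to substitute
    i = rng.choice(positions)
    alternatives = [w for w in words if w != chunks[i].lower()]
    if not alternatives:
        return text
    chunks[i] = rng.choice(alternatives)
    return "".join(chunks)


source_text = "This looks good, thanks for clarifying the docs."
augmented_text = lexicon_substitute(source_text, "joy", random.Random(0))
```

Because the replacement is drawn from the same emotion's word list, the augmented utterance keeps the original label, which is the constraint the slide describes.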
  • 18. Data Augmentation: Polarity-based Strategy ● Same four operators as the Unconstrained Strategy. ○ Delete a word only if it has neutral polarity. ● Positive emotions (Love, Joy): increase or preserve positive polarity. ● Negative emotions (Anger, Fear, Sadness): increase or preserve negative polarity. ● Ambiguous emotions (Surprise): no change in polarity.
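
The polarity rules above amount to an acceptance check on each augmented candidate. The sketch below assumes a numeric polarity score where higher means more positive; how that score is obtained is not shown on the slides, so the scores are passed in directly. The function name and the tolerance parameter are hypothetical.

```python
POSITIVE = {"love", "joy"}
NEGATIVE = {"anger", "fear", "sadness"}
AMBIGUOUS = {"surprise"}


def satisfies_polarity_constraint(
    emotion: str,
    source_polarity: float,
    augmented_polarity: float,
    tol: float = 1e-6,
) -> bool:
    """Check whether an augmented utterance respects the polarity rule for its emotion."""
    emotion = emotion.lower()
    if emotion in POSITIVE:
        # Positive emotions: polarity may stay the same or become more positive.
        return augmented_polarity >= source_polarity - tol
    if emotion in NEGATIVE:
        # Negative emotions: polarity may stay the same or become more negative.
        return augmented_polarity <= source_polarity + tol
    if emotion in AMBIGUOUS:
        # Ambiguous emotions: polarity must not change.
        return abs(augmented_polarity - source_polarity) <= tol
    raise ValueError(f"unknown emotion: {emotion}")


# Hypothetical scores: a Joy utterance whose augmented version turns negative is rejected.
accepted = satisfies_polarity_constraint("joy", 0.4, 0.7)
rejected = satisfies_polarity_constraint("joy", 0.4, -0.5)
```

A filter like this would discard a Joy candidate in which "good" was replaced by "worse", since the polarity drops instead of being preserved or increased.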
  • 19. Data Augmentation: Results ● Overall, the polarity-based strategy performed best.
  • 20. Data Augmentation: Takeaway ● Data augmentation helps the tools recognize lexical cues they previously missed. ○ “that’s awesome, I’ve been needing this for a while” ● Data augmentation does not seem to help in identifying implicit emotions. ○ “This was actually causing this test-case not to be executed!” ● The polarity-based strategy worked best, likely because it strikes a balance between completely unconstrained augmentation and highly constrained augmentation.
  • 21. Summary of Contributions ● Manually annotated 2000 GitHub utterances. ● Extension of the emotion taxonomy. ● Qualitative error analysis of three existing SE emotion classification tools. ● Demonstration and evaluation of three data augmentation approaches. ● Annotation instructions, the annotated dataset, and source code for data augmentation are publicly available. Questions/Thoughts/Collaboration Ideas to: Mia Mohammad Imran, imranm3@vcu.edu