A Benchmark Study on Sentiment Analysis for Software Engineering Research

Paper: https://arxiv.org/abs/1803.06525

A recent research trend aims to identify developers’ emotions by applying sentiment analysis to the communication traces left in collaborative development environments. To overcome the limitations of off-the-shelf sentiment analysis tools, researchers have recently started to develop their own tools for the software engineering domain. In this paper, we report a benchmark study that assesses the performance and reliability of three sentiment analysis tools specifically customized for software engineering. We also reflect on the open challenges that emerge from a qualitative analysis of misclassified texts.

Nicole Novielli (@NicoleNovielli)
Filippo Lanubile (@lanubile)
Daniela Girardi (@DanielaGirard91)

Sentiment analysis for software engineering
Actionable insights for:
Collaborative software development
– Security concerns detection (Pletea et al., MSR ’14)
– Impact on productivity (Ortu et al., MSR ’15)
– Early burnout discovery (Mantyla et al., MSR ’15)
– Anger detection (Gachechiladze et al., ICSE-NIER ’17)
Collaborative knowledge sharing
– Empirically-driven guidelines for question writing (Calefato et al., IST 2018)
Requirements engineering
– User feedback (Guzman and Maalej, RE ’14)
– App improvement (Panichella et al., ICSME ’14)

Off-the-shelf tools for sentiment analysis

NLTK (http://text-processing.com/)
– Approach: supervised learning, bag-of-words
– Output: probabilities p(positive), p(negative), p(neutral)
– Validated on: movie reviews, tweets

Stanford NLP (http://nlp.stanford.edu/sentiment/)
– Approach: supervised learning
– Output: sentiment score in [0, 4], where 0 = very negative, 2 = neutral, 4 = very positive
– Validated on: movie reviews

SentiStrength (http://sentistrength.wlv.ac.uk/)
– Approach: lexicon-based, dictionaries with a priori polarity scores in [-5, 5]
– Output: sentiment scores, negative in [-5, -1] and positive in [1, 5]; neutral = (-1, 1)
– Validated on: social media (YouTube, Twitter, MySpace, …)
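
The three output formats listed above (class probabilities, a score in [0, 4], and a pair of negative and positive scores) have to be mapped onto one set of polarity labels before tools can be compared. The following is a minimal sketch of such a mapping, not the procedure used by the authors. The function names, the dictionary keys, and the decision rules (most probable class, comparison with the neutral score 2, and the sign of the sum of the two scores) are assumptions made for illustration.

```python
# Hypothetical helpers: names and decision rules are illustrative assumptions,
# not taken from the slides or from any of the tools' APIs.

def label_from_probabilities(probs):
    """Map class probabilities, e.g. {"positive": 0.2, ...}, to the most likely class."""
    if not probs:
        raise ValueError("probs must not be empty")
    return max(probs, key=probs.get)


def label_from_score_0_to_4(score):
    """Map a score in [0, 4] (0 = very negative, 2 = neutral, 4 = very positive)."""
    if not 0 <= score <= 4:
        raise ValueError("score must be in [0, 4]")
    if score < 2:
        return "negative"
    if score > 2:
        return "positive"
    return "neutral"


def label_from_score_pair(positive, negative):
    """Map a (positive in [1, 5], negative in [-5, -1]) pair to a label.

    Assumed rule: the sign of the sum decides, so (1, -1) is neutral.
    """
    if not (1 <= positive <= 5 and -5 <= negative <= -1):
        raise ValueError("expected positive in [1, 5] and negative in [-5, -1]")
    total = positive + negative
    if total > 0:
        return "positive"
    if total < 0:
        return "negative"
    return "neutral"


if __name__ == "__main__":
    print(label_from_probabilities({"positive": 0.2, "negative": 0.7, "neutral": 0.1}))
    print(label_from_score_0_to_4(3))
    print(label_from_score_pair(1, -1))
```

Other rules are possible, for example treating a text with both strong positive and strong negative scores as mixed rather than neutral. The choice of rule affects how often the tools appear to agree.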

Are off-the-shelf sentiment analysis tools reliable for software engineering research?
RQ1: Do different sentiment analysis tools agree with emotions of software developers?
RQ2: Do sentiment analysis tools agree with each other?
RQ3: Do different sentiment analysis tools lead to contradictory results in a software engineering study?
RQ4: How does the choice of a sentiment analysis tool affect conclusion validity?
Findings:
– Poor performance on technical texts
– The tools disagree with each other
– Disagreement can lead to diverging conclusions
Need for software engineering (SE)-specific tools for sentiment analysis

SE-specific sentiment analysis tools
Supervised:
• Senti4SD (Calefato et al., EMSE 2017)
• SentiCR (Ahmed et al., ASE ’17)
Lexicon-based:
• SentiStrength-SE (Islam and Zibran, MSR ’17)
F. Calefato, F. Lanubile, F. Maiorano, N. Novielli. Sentiment Polarity Detection for Software Development. EMSE, 2017.
T. Ahmed, A. Bosu, A. Iqbal, and S. Rahimi. SentiCR: a customized sentiment analysis tool for code review interactions. ASE 2017.
M.D.R. Islam and M.F. Zibran. Leveraging automated sentiment analysis in software engineering. MSR 2017.

Our replication
Research questions

Off-the-shelf:
RQ1: Do different sentiment analysis tools agree with emotions of software developers?
RQ2: Do sentiment analysis tools agree with each other?
• NLTK
• Stanford NLP
• Alchemy API
• SentiStrength

SE-specific:
RQ1: Do SE-specific sentiment analysis tools agree with emotions of software developers?
RQ2: Do SE-specific sentiment analysis tools agree with each other?
• Senti4SD (Calefato et al., EMSE 2017)
• SentiCR (Ahmed et al., ASE ’17)
• SentiStrength-SE (Islam and Zibran, MSR ’17)
• SentiStrength (baseline)
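
Both research questions come down to measuring agreement between two label sequences: a tool against manual annotation (RQ1), or one tool against another (RQ2). The slides up to this point do not name an agreement measure. The sketch below assumes unweighted Cohen's kappa, a common chance-corrected choice. The function name and the sample labels are hypothetical.

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Unweighted Cohen's kappa between two equally long label sequences."""
    if len(labels_a) != len(labels_b) or len(labels_a) == 0:
        raise ValueError("label sequences must be non-empty and of equal length")
    n = len(labels_a)
    # Observed agreement: share of items given the same label by both raters.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Expected agreement by chance, from each rater's label frequencies.
    count_a = Counter(labels_a)
    count_b = Counter(labels_b)
    expected = sum(count_a[label] * count_b[label] for label in count_a) / (n * n)
    if expected == 1.0:
        # Both raters used a single identical label for every item.
        return 1.0
    return (observed - expected) / (1.0 - expected)


if __name__ == "__main__":
    # Hypothetical labels for six texts: manual annotation versus one tool.
    manual = ["positive", "neutral", "negative", "neutral", "positive", "neutral"]
    tool = ["positive", "neutral", "neutral", "neutral", "negative", "neutral"]
    print(f"kappa = {cohen_kappa(manual, tool):.2f}")
```

A value of 1 means perfect agreement and 0 means agreement no better than chance. Because polarity labels are ordered, a weighted variant that penalizes positive/negative confusions more than positive/neutral ones may be a better fit.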
A Benchmark Study on Sentiment Analysis for Software Engineering Research

  • 1. A Benchmark Study on Sentiment Analysis for Software Engineering Research Nicole Novielli @NicoleNovielli Filippo Lanubile @lanubile Daniela Girardi @DanielaGirard91
  • 2. Sentiment analysis for software engineering Collaborative software development – Security concerns detection (Pletea et al., MSR’14) – Impact on productivity (Ortu et al., MSR‘15) – Early burnout discovery (Mantyla et al. MSR’15) – Anger detection (Gachechiladze et al., ICSE-NIER‘17) Collaborative knowledge sharing – Empirically-driven guidelines for question writing (Calefato et al., IST 2018) Requirements engineering – User feedback (Guzman and Maalej, RE‘14) – App improvement (Panichella et al., ICSME ‘14) Actionable insights for
  • 3. Off-the-shelf tools for sentiment analysis (Approach / Output / Validated on)
      – Supervised learning, bag-of-words / Probabilities: p(positive), p(negative), p(neutral) / Movie reviews, tweets
      – Supervised learning / Sentiment score in [0,4]: 0 = very negative, 2 = neutral, 4 = very positive / Movie reviews
      – Lexicon-based, dictionaries with a priori polarity scores in [-5, 5] / Sentiment scores: negative in [-5, -1], positive in [1, 5], neutral in (-1, 1) / Social media: YouTube, Twitter, MySpace, …
      http://sentistrength.wlv.ac.uk/ http://text-processing.com/ http://nlp.stanford.edu/sentiment/
      Are off-the-shelf sentiment analysis tools reliable for software engineering research?
  • 4. RQ1: Do different sentiment analysis tools agree with emotions of software developers? The tools disagree with each other Poor performance on technical texts Disagreement can lead to diverging conclusions RQ2: Do sentiment analysis tools agree with each other? RQ3: Do different sentiment analysis tools lead to contradictory results in software engineering study? RQ4: How does the choice of a sentiment analysis tool affect conclusion validity? Need for Software engineering (SE) specific tools for sentiment analysis
  • 5. SE-specific sentiment analysis tools • Senti4SD (Calefato et al., EMSE 2017), supervised • SentiCR (Ahmed et al., ASE ’17), supervised • SentiStrength-SE (Islam and Zibran, MSR ’17), lexicon-based F. Calefato, F. Lanubile, F. Maiorano, N. Novielli. Sentiment Polarity Detection for Software Development. EMSE, 2017. T. Ahmed, A. Bosu, A. Iqbal, and S. Rahimi. SentiCR: a customized sentiment analysis tool for code review interactions. ASE 2017. M.D.R. Islam and M.F. Zibran. Leveraging automated sentiment analysis in software engineering. MSR 2017.
  • 6. Our replication Research questions RQ1: Do different sentiment analysis tools agree with emotions of software developers? RQ2: Do sentiment analysis tools agree with each other? RQ2: Do SE-specific sentiment analysis tools agree with each other? RQ1: Do SE-specific sentiment analysis tools agree with emotions of software developers? • Senti4SD (Calefato et al. EMSE 2017) • SentiCR (Ahmed et al., ASE ‘17) • SentiStrength-SE (Islam and Zibran, MSR’17) • SentiStrength (baseline) • NLTK • Stanford NLP • Alchemy API • SentiStrength SE-specific Off-the-shelf
  • 7. Our replication Gold standard datasets 392 comments (Murgia et al., MSR’14) 5869 comments (Murgia et al., MSR’16) 4423 Qs, As, Cs (Calefato et al., EMSE 2017) Model-driven annotation
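  • 8. Model-driven annotation of emotions (Shaver et al., 1987) Mapping emotions to polarity
      – Love: Positive (original study), Positive (our replication)
      – Joy: Positive (original study), Positive (our replication)
      – Surprise: Positive (original study), Ambiguous (our replication)
      – Anger: Negative (original study), Negative (our replication)
      – Sadness: Negative (original study), Negative (our replication)
      – Fear: Negative (original study), Negative (our replication)
      – No emotion: Neutral (original study), Neutral (our replication)
      Example: ‘I'm happy with the approach and the code looks good’: Joy -> Positive polarity (Joy, Happiness, Satisfaction)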
  • 9. Our replication Gold standard datasets 392 comments (Murgia et al., MSR’14) 5869 comments (Murgia et al., MSR’16) 4423 Qs, As, Cs (Calefato et al., EMSE 2017) Model-driven annotation 1500 sentences QA on Java libraries (Lin et al., ICSE’18) 1600 comments from code review (Ahmed et al., ASE’17) Ad-hoc annotation
  • 10. Model-driven vs. ad-hoc annotation
      – Theoretical models: model-driven yes, ad-hoc no
      – Training of raters: model-driven yes, ad-hoc no
      – Guidelines for annotation: model-driven based on taxonomy, ad-hoc based on subjective perception
  • 11. Research questions RQ3: To what extent the labeling approach has an impact on the performance of SE-specific sentiment analysis tools? Our replication RQ1: Do different sentiment analysis tools agree with emotions of software developers? RQ2: Do sentiment analysis tools agree with each other? RQ2: Do SE-specific sentiment analysis tools agree with each other? RQ1: Do SE-specific sentiment analysis tools agree with emotions of software developers?
  • 12. Metrics Our replication • Weighted Cohen’s Kappa (1968) • Text categorization metrics (Sebastiani, 2002) – Precision – Recall – F-measure J. Cohen. 1968. Weighted kappa: Nominal scale agreement provision for scaled disagreement or partial credit. Psychological Bulletin, 70, 4, 213-220. F. Sebastiani. 2002. Machine learning in automated text categorization. ACM Computing Surveys, 34, 1, 1-47.
  • 13. Weighted Cohen’s Kappa Disagreement: strong vs. mild
      Disagreement weights:
                    negative  neutral  positive
        negative       0         1        2
        neutral        1         0        1
        positive       2         1        0
      Interpretation (Viera and Garrett, 2005) • less than chance if κ ≤ 0 • slight if 0.01 ≤ κ ≤ 0.20 • fair if 0.21 ≤ κ ≤ 0.40 • moderate if 0.41 ≤ κ ≤ 0.60 • substantial if 0.61 ≤ κ ≤ 0.80 • almost perfect if 0.81 ≤ κ ≤ 1
      A.J. Viera, J.M. Garrett. 2005. Understanding interobserver agreement: the kappa statistic. Family Medicine, 37, 5, 360–363.
  • 14. Experimental setting Gold standard datasets, stratified sampling: Train 70% (training of supervised tools: Senti4SD updated model, SentiCR updated model); Test 30% (assessment of performance, with SentiStrength-SE and SentiStrength)
  • 15. Our replication: SE-specific tools vs. manual annotation Original study: off-the-shelf tools RQ1: Do SE-specific sentiment analysis tools agree with emotions of software developers? Fair agreement Substantial agreement
  • 16. Original study: off-the-shelf tools Our replication: SE-specific tools vs. manual annotation RQ1: Do SE-specific sentiment analysis tools agree with emotions of software developers? Opportunistic sampling using SentiStrength (Calefato et al., EMSE 2017)
  • 17. Our replication vs. manual annotation RQ1: Do SE-specific sentiment analysis tools agree with emotions of software developers? • SE-specific optimization improves the classification accuracy • Retraining supervised tools produces better performance
  • 18. Our replication vs. manual annotation RQ1: Do SE-specific sentiment analysis tools agree with emotions of software developers? • SE-specific optimization improves the classification accuracy • Retraining supervised tools produces better performance • Comparable performance for SentiStrength-SE (lexicon-based)
  • 19. RQ2: Do SE-specific sentiment analysis tools agree with each other? Our replication Original study: off-the-shelf tools From substantial to perfect agreement From less than chance to fair agreement
  • 20. RQ3: To what extent the labeling approach has an impact on the performance of SE-specific sentiment analysis tools? Model-driven annotation Ad-hoc annotation
  • 21. RQ3: To what extent the labeling approach has an impact on the performance of SE-specific sentiment analysis tools? Model-driven annotation Ad-hoc annotation • From substantial to perfect agreement also between supervised and lexicon-based tools • From fair to moderate agreement • Better agreement for supervised approaches
  • 22. Error analysis Manual inspection of texts misclassified by all tools
  • 24. Error analysis Polar facts but neutral sentiment ‘I tried the following and it returns nothing’ --- ‘This creates an unnecessary garbage list. Sets.newHashSet should accept an Iterable.’
  • 25. Error analysis General error Broken syntax as in ‘wontbe so bad’ --- Idiomatic expression ‘Are you out of mind?’
  • 26. Error analysis Politeness Context-dependent interpretation of politeness by raters ‘Thank you’ vs. ‘Thank you!’
  • 27. Lessons learned ✓ Reliable sentiment analysis in software engineering is possible
  • 28. Lessons learned ✓ Reliable sentiment analysis in software engineering is possible ✓ Tuning of tools for software engineering improves classification accuracy ✓ SE-specific tools agree with manual annotation ✓ SE-specific tools agree with each other
  • 29. Lessons learned ✓ Reliable sentiment analysis in software engineering is possible ✓ Tuning of tools for software engineering improves classification accuracy ✓ SE-specific tools agree with manual annotation ✓ SE-specific tools agree with each other ✓ Grounding research on theoretical models of affect is recommended ✓ The choice depends on the research goals: polarity vs. fine-grained emotions, emotions vs. attitudes, etc.
  • 30. Lessons learned ✓ Reliable sentiment analysis in software engineering is possible ✓ Tuning of tools for software engineering improves classification accuracy ✓ SE-specific tools agree with manual annotation ✓ SE-specific tools agree with each other ✓ Grounding research on theoretical models of affect is recommended ✓ The choice depends on the research goals: polarity vs. fine-grained emotions, emotions vs. attitudes, etc. ✓ Preliminary sanity check is always recommended
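
The evaluation setup described on slides 12 to 14 (a stratified 70/30 split, per-class precision, recall and F-measure, and weighted Cohen's kappa with the 0/1/2 disagreement weights) can be sketched roughly as follows. This is a minimal standard-library illustration, not the authors' scripts: the function names, the `LABELS` ordering, the fixed seed, and the use of rounding to pick the split point are all assumptions made for the example.

```python
import random
from collections import Counter, defaultdict

# Ordered so that |i - j| reproduces the slide's weights:
# 0 = agreement, 1 = mild disagreement, 2 = strong disagreement.
LABELS = ["negative", "neutral", "positive"]


def stratified_split(items, labels, train_fraction=0.7, seed=0):
    """Split (item, label) pairs so each label keeps roughly the same share."""
    by_label = defaultdict(list)
    for item, label in zip(items, labels):
        by_label[label].append((item, label))
    rng = random.Random(seed)
    train, test = [], []
    for label in sorted(by_label):
        group = by_label[label]
        rng.shuffle(group)
        cut = round(len(group) * train_fraction)
        train.extend(group[:cut])
        test.extend(group[cut:])
    return train, test


def precision_recall_f1(gold, predicted, label):
    """Per-class precision, recall and F-measure (harmonic mean of the two)."""
    pairs = list(zip(gold, predicted))
    tp = sum(1 for g, p in pairs if g == label and p == label)
    fp = sum(1 for g, p in pairs if g != label and p == label)
    fn = sum(1 for g, p in pairs if g == label and p != label)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


def weighted_kappa(rater_a, rater_b, labels=LABELS):
    """Weighted Cohen's kappa: 1 - observed / expected weighted disagreement."""
    index = {label: i for i, label in enumerate(labels)}
    n = len(rater_a)
    observed = Counter(zip(rater_a, rater_b))
    count_a = Counter(rater_a)
    count_b = Counter(rater_b)
    observed_disagreement = 0.0
    expected_disagreement = 0.0
    for a in labels:
        for b in labels:
            weight = abs(index[a] - index[b])
            observed_disagreement += weight * observed[(a, b)] / n
            expected_disagreement += weight * count_a[a] * count_b[b] / (n * n)
    if expected_disagreement == 0:
        # Kappa is undefined when both raters use one identical label;
        # this sketch treats that case as perfect agreement.
        return 1.0
    return 1.0 - observed_disagreement / expected_disagreement
```

In use, `rater_a` would be the gold-standard polarity labels of the 30% test partition and `rater_b` the labels produced by a tool (or by a second tool, for tool-to-tool agreement); the resulting κ can then be read against the Viera and Garrett bands listed on slide 13.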

Editor's Notes

  1. Better performance for supervised approaches
  2. Better performance for supervised approaches
  3. Mainly observed in ad hoc annotation datasets