Towards Discovering the Role of Emotions in Stack Overflow

Nicole Novielli
N. Novielli, F. Calefato, F. Lanubile 
University of Bari, Italy 
{nicole.novielli, fabio.calefato, filippo.lanubile}@uniba.it
A new way to access knowledge 
SSE@FSE 2014 2
How Do Programmers Ask and Answer Questions?
• Which questions are answered well and which ones remain unanswered? (Treude et al., ICSE’11), (Asaduzzaman et al., MSR’13)
• Can we predict how long a question will remain unanswered? (Asaduzzaman et al., MSR’13)
• What are the main discussion topics? (Barua et al., ’12), (Bajaj et al., MSR’14)
• What are the main factors affecting reputation? (Bosu et al., MSR’13)
Emotions in Social Computing and SSE
• Sentiment Analysis on Yahoo! Answers (Kucuktunc et al., WSDM’12)
  - Answers perceived as good have a more neutral sentiment than others
• Do developers feel emotions? (Murgia et al., MSR’14)
  - Apache Software Foundation issue tracker
• Sentiment Analysis of Commit comments in GitHub (Guzman et al., MSR’13)
  - Correlation with day and time, programming language, team distribution
Research Question
Getting emotional while asking or answering questions in Stack Overflow: good or bad?
• Impact on success of questions
• Impact on perceived quality of answers
• Correlation with reputation
• Correlation with topics
• …
Preliminary study
• RQ1: To what degree does the emotional style of a question affect the probability of success?
  - A successful question has an accepted answer
Dataset distribution
• Successful: 4,196,125 questions
  - Accepted answers (58%)
• Unsuccessful: 3,013,677 questions
  - No accepted answers (31%)
  - No answers (11%)
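As a quick arithmetic check on the split above, the following sketch recomputes the two shares from the question counts reported on the slide. The variable and function names are illustrative only.

```python
# Question counts as reported on the slide.
successful = 4_196_125    # questions with an accepted answer
unsuccessful = 3_013_677  # questions with no answers or no accepted answer

total = successful + unsuccessful


def share(count: int, total: int) -> int:
    """Return count as a whole-number percentage of total."""
    return round(100 * count / total)


successful_share = share(successful, total)
unsuccessful_share = share(unsuccessful, total)
print(total, successful_share, unsuccessful_share)
```

The unsuccessful share matches the sum of the "No accepted answers" and "No answers" slices.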
Building the Model
Post Properties
• Title Length
• Post Length
• Code Blocks
• Day
• Time
• Topic
• # Comments
Social Factors
• Question Score
• Answer Score
• # Accepted answers provided
• # Answers accepted
• # Badges
Affective Factors
• Sentiment Polarity
  - Polarity of Question/Answer
  - Polarity of Comments
• Lexical Cues of Affective States
  - Positive emotions lexicon
  - Negative emotions lexicon
  - Gratitude
  - Politeness
  - Attitude of doubt
  - …
Control Model
The Model
• Independent variables (logistic regression model): Post Properties, Social Factors, Affective Factors
• Control Model
• Dependent variable: success of a question (Y/N)
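To make the model form concrete, here is a minimal sketch of how a logistic regression turns independent variables into a probability of success. The feature names, coefficient values, and zero intercept are hypothetical, chosen only for illustration; they are not the fitted model from the study.

```python
import math


def success_probability(intercept, coefficients, features):
    """Probability of success under a logistic regression model.

    log-odds = intercept + sum(coefficient * feature value)
    probability = 1 / (1 + exp(-log-odds))
    """
    log_odds = intercept + sum(
        coefficients[name] * value for name, value in features.items()
    )
    return 1.0 / (1.0 + math.exp(-log_odds))


# Hypothetical values, for illustration only (not the study's fitted model).
coefficients = {"has_code_block": 0.25, "positive_sentiment": -0.02}
question = {"has_code_block": 1, "positive_sentiment": 2}

p = success_probability(0.0, coefficients, question)
print(f"estimated probability of an accepted answer: {p:.2f}")
```

With no features and a zero intercept the function returns 0.5, which is the expected neutral baseline for a logistic model.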
Post Properties - Metrics
• Title and Post Length: # words
  - Althoff et al., @ICWSM’14; Asaduzzaman et al., @MSR’13
  - Used by SO moderators for automatic filtering
• Code Blocks: yes/no
  - Treude et al., @ICSE’11
• Day: in {weekday, weekend}
  - Bosu et al., @MSR2013
• Time: in {morning, afternoon, evening, night}
  - Bosu et al., @MSR2013
• Topic: categorical, using LDA
  - Asaduzzaman et al., @MSR’13; Bosu et al., @MSR’13
  - Harper et al., @CHI’08
  - Barua et al., Empirical Software Engineering 2014
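The metrics above could be extracted roughly as follows. This is a sketch under stated assumptions: the hour boundaries for the four time buckets are not given in the slides and are assumed here, the post body is assumed to be HTML in which code appears inside `<code>` tags, and lengths are counted as whitespace-separated tokens. The function name `post_properties` is hypothetical, and the LDA topic is left out.

```python
from datetime import datetime


def post_properties(title: str, body_html: str, posted_at: datetime) -> dict:
    """Compute simple post-property metrics for one question."""
    hour = posted_at.hour
    # Assumed bucket boundaries; the slides do not define them.
    if 6 <= hour < 12:
        time_of_day = "morning"
    elif 12 <= hour < 18:
        time_of_day = "afternoon"
    elif 18 <= hour < 24:
        time_of_day = "evening"
    else:
        time_of_day = "night"
    return {
        "title_length": len(title.split()),      # number of words
        "post_length": len(body_html.split()),   # whitespace tokens, markup included
        "code_blocks": "<code>" in body_html,    # yes/no
        # datetime.weekday(): Monday is 0, Saturday is 5, Sunday is 6
        "day": "weekend" if posted_at.weekday() >= 5 else "weekday",
        "time": time_of_day,
    }


example = post_properties(
    "How to parse JSON in Java?",
    "<p>I tried this:</p> <pre><code>parse(x)</code></pre>",
    datetime(2014, 11, 16, 22, 30),
)
print(example)
```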
Social Factors - Metrics
• Assessing the reputation of the author of the question at the time it is posted
  - High status correlated with success in Reddit.com (Althoff et al., ICWSM’14)
  - Novices’ questions are more likely to be answered on Stack Overflow (Treude et al., ICSE’11)
• Metrics to approximate the author’s reputation
  - Question Score: upvotes - downvotes on questions
  - Answer Score: upvotes - downvotes on answers
  - # Accepted answers provided
  - # Answers accepted
  - # Badges: total badges owned
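Because reputation is assessed at the time the question is posted, a score has to ignore votes cast afterwards. The sketch below illustrates that idea for the Question Score and Answer Score metrics; the `(timestamp, delta)` vote representation and the function name `score_at` are assumptions made for illustration, not the study's data format.

```python
from datetime import datetime


def score_at(votes, when):
    """Upvotes minus downvotes cast strictly before `when`.

    `votes` is an iterable of (timestamp, delta) pairs, where delta is
    +1 for an upvote and -1 for a downvote.
    """
    return sum(delta for cast_at, delta in votes if cast_at < when)


# Hypothetical votes on an author's earlier questions.
votes_on_questions = [
    (datetime(2014, 1, 5), +1),
    (datetime(2014, 2, 1), +1),
    (datetime(2014, 3, 9), -1),
    (datetime(2014, 6, 1), +1),  # cast after the new question, so ignored
]

question_score = score_at(votes_on_questions, datetime(2014, 4, 1))
print(question_score)
```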
Affective Factors
• Sentiment Polarity
  - Questions/Answers
  - Polarity of Comments
Sentiment Analysis vs. Emotion Detection
• Sentiment Analysis
  - Goal: Subjective vs. Objective, Negative vs. Positive
  - Resources: SentiStrength (Thelwall et al., 2012), SentiWordNet (Esuli and Sebastiani, 2006), MPQA Lexicon (Wilson et al., EMNLP’05), …
  - Example: ‘I can't solve this problem, it’s very frustrating’ is Subjective, Negative
• Emotion Detection
  - Goal: Classification using Discrete Emotion Labels
  - Resources: LIWC (Tausczik and Pennebaker, 2010), WordNet Affect (Strapparava and Valitutti, 2004), Depeche Mood (Staiano and Guerini, ACL’14), …
  - Example: ‘I can't solve this problem, it’s very frustrating’ is Sad, Frustrated
Affective Factors
• Sentiment Polarity (SentiStrength: http://sentistrength.wlv.ac.uk/)
  - Question
  - Polarity of Comments
• Lexical Cues of Affective States (future work)
  - Positive emotions lexicon
  - Negative emotions lexicon
  - Gratitude
  - Politeness
  - Attitude of doubt
  - …
SentiStrength
• Estimates the strength of both positive and negative sentiment in questions and comments
• Robust also for informal language
• Used in previous research
  - Sentiment Analysis of commit comments in GitHub (Guzman et al., MSR’13)
  - Sentiment Analysis on Yahoo! Answers (Kucuktunc et al., WSDM’12)
Preliminary results - Post Properties
                     Coeff     Odds Ratio
Code Blocks           0.2549   1.29
# of comments        -0.3659   0.69
Day (Weekend)         0.0131   1.01
Time
  Afternoon           0.1418   1.15
  Evening             0.2093   1.23
  Night               0.1085   1.12
Post Length
  Body Length        -0.0004   0.99
  Title Length       -0.0039   0.99
All significant, with α = 0.05
• Review questions are more concrete and get more answers (Treude et al., ICSE’11) and vague questions remain unanswered (Asaduzzaman et al., MSR’13)
• SO off-peak hours (night): longer answer interval and fewer questions posted (Barua et al., MSR’13)
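In a logistic regression, the odds ratio of a predictor is the exponential of its coefficient. The sketch below recomputes a few of the odds ratios from the coefficients reported in the table; the dictionary layout is illustrative only.

```python
import math

# Coefficients as reported in the Post Properties table.
coefficients = {
    "Code Blocks": 0.2549,
    "# of comments": -0.3659,
    "Day (Weekend)": 0.0131,
    "Afternoon": 0.1418,
    "Evening": 0.2093,
}

# Odds ratio = exp(coefficient), rounded to two decimals as on the slide.
odds_ratios = {
    name: round(math.exp(coef), 2) for name, coef in coefficients.items()
}

for name, ratio in odds_ratios.items():
    print(f"{name}: {ratio}")
```

An odds ratio above 1 means higher odds of success and one below 1 means lower odds. For instance, 1.29 for Code Blocks corresponds to odds of an accepted answer that are about 29% higher when the question contains a code block, other variables held fixed.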
Post properties: Topic
                             Coeff     Odds Ratio
DATABASES/PERFORMANCE        0.4062    1.50
WEB PROGRAMMING              0.2725    1.31
GRAPHICS                     0.2415    1.27
WEB PROGRAMMING/HTTP         0.1441    1.16
JAVA                         0.0029    1.00
OOP                          0.8599    2.36
MOBILE DEVELOPMENT/iOS       0.2664    1.30
SOURCE CODE MANAGEMENT       0.2805    1.32
DATA STRUCTURE/ALGORITHMS    0.7340    2.08
.NET FRAMEWORK/ASP           0.3442    1.41
SCRIPTING                    0.3649    1.44
DATABASES/SQL                0.4488    1.57
WEB APP DEVELOPMENT          0.3330    1.40
MOBILE DEV/ANDROID           0.1111    1.12
All significant, with α = 0.05
Success rate per topic
Topic                        ID   Success rate   Number of questions   Post rate
OOP                           6   70.81%         630,258                8.84%
DATA STRUCTURE/ALGORITHMS     9   67.73%         798,713               11.20%
DATABASES/SQL                12   61.12%         582,130                8.16%
.NET FRAMEWORK/ASP           10   58.73%         518,834                7.28%
SCRIPTING                    11   58.54%         497,763                6.98%
WEB APP DEVELOPMENT          13   58.47%         492,173                6.90%
DATABASES/PERFORMANCE         0   57.72%         415,825                5.83%
WEB PROGRAMMING               1   56.59%         536,255                7.52%
SOURCE CODE MANAGEMENT        8   55.37%         373,397                5.24%
GRAPHICS                      2   54.37%         383,376                5.38%
MOBILE DEVELOPMENT/iOS        7   53.91%         376,517                5.28%
WEB PROGRAMMING/HTTP          3   52.22%         375,510                5.27%
MOBILE DEV/ANDROID           14   51.50%         432,095                6.06%
JAVA                          5   49.35%         235,489                3.30%
WEB AUTHENTICATION/API        4   49.00%         482,992                6.77%
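The "Post rate" column appears to be each topic's share of all topic-labeled questions. That reading is an inference from the figures rather than something the slide states; the sketch below recomputes the column under that assumption, using the question counts from the table.

```python
# Number of questions per topic, as reported in the table.
questions_per_topic = {
    "OOP": 630258,
    "DATA STRUCTURE/ALGORITHMS": 798713,
    "DATABASES/SQL": 582130,
    ".NET FRAMEWORK/ASP": 518834,
    "SCRIPTING": 497763,
    "WEB APP DEVELOPMENT": 492173,
    "DATABASES/PERFORMANCE": 415825,
    "WEB PROGRAMMING": 536255,
    "SOURCE CODE MANAGEMENT": 373397,
    "GRAPHICS": 383376,
    "MOBILE DEVELOPMENT/iOS": 376517,
    "WEB PROGRAMMING/HTTP": 375510,
    "MOBILE DEV/ANDROID": 432095,
    "JAVA": 235489,
    "WEB AUTHENTICATION/API": 482992,
}

total = sum(questions_per_topic.values())

# Assumed definition: post rate = topic questions / all topic-labeled questions.
post_rate = {
    topic: round(100 * count / total, 2)
    for topic, count in questions_per_topic.items()
}

print(post_rate["OOP"], post_rate["JAVA"])
```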
Preliminary Results – Social Factors
                             Coeff     Odds Ratio
User Question Score*        -0.0017    0.99
User Answer Score*          -0.0002    0.99
User Answers Accepted*       0.0047    1.00
User Questions Accepted*     0.0078    1.00
Number Of Badges             0.0001    1.0001103
* significant with α = 0.05
Preliminary Results – Affective Factors
                                      Coeff     Odds Ratio
Sentiment of the question
  Question Positive Score            -0.0248    0.98
  Question Negative Score            -0.0083    0.99
Sentiment of the author’s comments
  Comment Positive Score             -0.1813    0.83
  Comment Negative Score             -0.1080    0.90
All significant, with α = 0.05
Impact of Positive Sentiment on Success
[Figure panels: Positive polarity of QUESTION; Positive polarity of COMMENTS]
Impact of Negative Sentiment on Success
[Figure panels: Negative polarity of QUESTION; Negative polarity of COMMENTS]
Problems in detecting sentiment
• The ‘problem’ lexicon is too domain-specific to be considered a pure expression of negative emotions
• Actually describing emotions
  - ‘I have very simple and stupid trouble […] I'm pretty confused, explain please, what is wrong?’ (neg=-2)
  - ‘Sorry for troubling you guys’ (neg=-2)
• Simply describing problem
  - ‘What is the best way to kill a critical process?’ (neg=-2)
  - ‘What is wrong?’ (neg=-2)
• Mixed
  - ‘I’m missing a parenthesis. But where? :(’ (neg=-3)
Preliminary qualitative analysis using LIWC
- ‘Thanks!’
- Positive score = 3
Next steps
• Separate positive emotions from gratitude expressions
  - Qualitative analysis of the first 1000 questions with highest positive sentiment score
    - Gratitude and politeness are the most frequent cases
  - ‘Cheers’, ‘Thanks (in advance)’, ‘Thank you’, …
    - Gratitude is positively associated with success of requests (Althoff et al., 2014)
Next steps
• Further lexical analysis
  - Assessing the suitability of state-of-the-art tools for sentiment analysis
  - Modeling the ‘success lexicon’
• Classification study: is success predictable?
  - Preliminary results: 0.67 accuracy
• Investigate other research questions
  - Emotions and perceived quality of answers
  - Emotions and reputation
  - Emotions and topics
Towards Discovering the Role of Emotions in Stack Overflow
Thank you 
N. Novielli, F. Calefato, F. Lanubile 
University of Bari, Italy 
{nicole.novielli, fabio.calefato, filippo.lanubile}@uniba.it
Towards Discovering the Role of Emotions in Stack Overflow

  • 1. Towards Discovering the Role of Emotions in Stack Overflow N. Novielli, F. Calefato, F. Lanubile University of Bari, Italy {nicole.novielli, fabio.calefato, filippo.lanubile}@uniba.it
  • 2. A new way to access knowledge SSE@FSE 2014 2
  • 3. How Do Programmers Ask and Answers Questions?  Which questions are answered well and which ones remain unanswered? (Treude et al., ICSE’11), (Asudazzaman et al., MSR’13)  Can we predict how long a question will remain unanswered? (Asudazzaman et al., MSR’13)  What are the main discussion topics? (Barua et al., ’12), (Bajaji et al., MSR’14)  What are the main factors affecting reputation? (Bosu et al., MSR’13)
  • 4. Emotions in Social Computing and SSE  Sentiment Analysis on Yahoo! Answers (Kucuktunc et al., WSDM’12)  Answers perceived as good have a more neutral sentiment than others  Do developers feel emotions? (Murgia, et al., MSR’14)  Apache Software Foundation issue tracker  Sentiment Analysis of Commit comments in GitHub (Guzman et al., MSR’13)  Correlation with day and time, programming language, team distribution SSE@FSE 2014 4
  • 5. Research Question: Getting emotional while asking or answering questions in Stack Overflow: good or bad? • Impact on success of questions • Impact on perceived quality of answers • Correlation with reputation • Correlation with topics • …
  • 6. Preliminary study • RQ1: To what degree does the emotional style of a question affect the probability of success? • A successful question has an accepted answer
  • 8. Dataset distribution • Successful: 4,196,125 questions with accepted answers (58%) • Unsuccessful: 3,013,677 questions, with either no accepted answers (31%) or no answers (11%)
  • 9. Building the Model • Post Properties: Title Length, Post Length, Code Blocks, Day, Time, Topic, # Comments • Social Factors: Question Score, Answer Score, # Accepted answers provided, # Answers accepted, # Badges • Affective Factors: Sentiment Polarity (Polarity of Question/Answer, Polarity of Comments); Lexical Cues of Affective States (Positive emotions lexicon, Negative emotions lexicon, Gratitude, Politeness, Attitude of doubt, …) • Control Model
  • 10. The Model • Independent variables: Post Properties, Social Factors, Affective Factors • Control Model • Logistic regression model • Dependent variable: success of a question (Y/N)
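A logistic regression model of this kind turns a weighted sum of the independent variables into a probability of success. The following is a minimal sketch, not the authors' implementation: the function name and feature names are hypothetical, the two coefficients are the ones slide 17 reports for Code Blocks and # of comments, and the intercept is a placeholder because the slides do not report one.

```python
import math


def success_probability(features, coefficients, intercept=0.0):
    """Logistic model: P(success) = 1 / (1 + exp(-(intercept + sum(b_i * x_i))))."""
    z = intercept + sum(coefficients[name] * value for name, value in features.items())
    return 1.0 / (1.0 + math.exp(-z))


# Coefficients taken from slide 17. The intercept is NOT given in the slides,
# so the default of 0.0 is an assumption made only for illustration.
COEFFICIENTS = {"code_blocks": 0.2549, "num_comments": -0.3659}

with_code = success_probability({"code_blocks": 1, "num_comments": 0}, COEFFICIENTS)
without_code = success_probability({"code_blocks": 0, "num_comments": 0}, COEFFICIENTS)

print(f"with code block:    {with_code:.3f}")
print(f"without code block: {without_code:.3f}")
```

Because the intercept is assumed, only the direction and relative size of the difference between the two probabilities is meaningful here, not their absolute values.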
  • 11. Post Properties - Metrics • Title and Post Length: # words • Althoff et al., @ICWSM’14; Asaduzzaman et al., @MSR’13 • Used by SO moderators for automatic filtering • Code Blocks: yes/no • Treude et al., @ICSE’11 • Day: in {weekday, weekend} • Bosu et al., @MSR’13 • Time: in {morning, afternoon, evening, night} • Bosu et al., @MSR’13 • Topic: categorical, using LDA • Asaduzzaman et al., @MSR’13; Bosu et al., @MSR’13 • Harper et al., @CHI’08 • Barua et al., Empirical Software Engineering 2014
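The post-property metrics above could be extracted along the following lines. This is a rough sketch under stated assumptions: the function names are hypothetical, the time-of-day cut-offs are invented because the slides do not give the bucket boundaries, code blocks are detected by looking for a `<code>` tag in an HTML body, and length is a plain whitespace word count that does not strip markup. Topic assignment via LDA is left out.

```python
from datetime import datetime


def time_of_day(hour):
    # Hypothetical cut-offs; the slides do not state the bucket boundaries.
    if 6 <= hour < 12:
        return "morning"
    if 12 <= hour < 18:
        return "afternoon"
    if 18 <= hour < 24:
        return "evening"
    return "night"


def post_properties(title, body, created_at):
    """Compute the simple post-property metrics for one question."""
    return {
        "title_length": len(title.split()),   # number of words in the title
        "post_length": len(body.split()),     # crude word count, markup included
        "code_blocks": "<code>" in body,      # assumes the body is stored as HTML
        "day": "weekend" if created_at.weekday() >= 5 else "weekday",
        "time": time_of_day(created_at.hour),
    }


example = post_properties(
    "How do I reverse a list?",
    "<p>I tried this:</p><pre><code>xs.reverse()</code></pre>",
    datetime(2014, 11, 16, 21, 30),  # a Sunday
)
print(example)
```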
  • 12. Social Factors - Metrics • Assessing the reputation of the author of the question at the time it is posted • High status correlated with success in Reddit.com (Althoff et al., ICWSM’14) • Novices’ questions are more likely to be answered on Stack Overflow (Treude et al., ICSE’11) • Metrics to approximate the author’s reputation • Question Score: upvotes - downvotes on questions • Answer Score: upvotes - downvotes on answers • # Accepted answers provided • # Answers accepted • # Badges: total badges owned
  • 13. Affective Factors • Sentiment Polarity • Questions/Answers • Polarity of Comments
  • 14. Sentiment Analysis and Emotion Detection
      Sentiment Analysis
        • Goal: Subjective vs. Objective, Negative vs. Positive
        • Resources: SentiStrength (Thelwall et al., 2012), SentiWordNet (Esuli and Sebastiani, 2006), MPQA Lexicon (Wilson et al., EMNLP’05), …
        • Example: ‘I can't solve this problem, it’s very frustrating’ is Subjective, Negative
      Emotion Detection
        • Goal: Classification using Discrete Emotion Labels
        • Resources: LIWC (Tausczik and Pennebaker, 2010), WordNet Affect (Strapparava and Valitutti, 2004), Depeche Mood (Staiano and Guerini, ACL’14), …
        • Example: ‘I can't solve this problem, it’s very frustrating’ is Sad, Frustrated
  • 15. Affective Factors • Sentiment Polarity • Question • Polarity of Comments • Lexical Cues of Affective States • Positive emotions lexicon • Negative emotions lexicon • Gratitude • Politeness • Attitude of doubt • … Future work - SentiStrength: http://sentistrength.wlv.ac.uk/
  • 16. SentiStrength • Estimates the strength of both positive and negative sentiment in questions and comments • Robust also for informal language • Used in previous research • Sentiment Analysis of commit comments in GitHub (Guzman et al., MSR’13) • Sentiment Analysis on Yahoo! Answers (Kucuktunc et al., WSDM’12)
  • 17. Preliminary results - Post Properties
      Variable            Coeff     Odds Ratio
      Code Blocks          0.2549   1.29
      # of comments       -0.3659   0.69
      Day (Weekend)        0.0131   1.01
      Time: Afternoon      0.1418   1.15
      Time: Evening        0.2093   1.23
      Time: Night          0.1085   1.12
      Body Length         -0.0004   0.99
      Title Length        -0.0039   0.99
      All significant, with α = 0.05
      • Review questions are more concrete and get more answers (Treude et al., ICSE’11) and vague questions remain unanswered (Asaduzzaman et al., MSR’13)
      • SO off-peak hours (night): longer answer interval and fewer questions posted (Barua et al., MSR’13)
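In a logistic regression, the odds ratio of a variable is the exponential of its coefficient, so the two columns on slide 17 carry the same information. A short check in Python for three of the reported rows (the dictionary and variable names are illustrative only):

```python
import math

# Coefficients copied from slide 17.
coefficients = {
    "Code Blocks": 0.2549,
    "# of comments": -0.3659,
    "Evening": 0.2093,
}

# Odds ratio = exp(coefficient), rounded to two decimals as on the slide.
odds_ratios = {name: round(math.exp(coef), 2) for name, coef in coefficients.items()}

for name, ratio in odds_ratios.items():
    print(f"{name}: {ratio}")
# Code Blocks: 1.29
# # of comments: 0.69
# Evening: 1.23
```

Read this way, an odds ratio of 1.29 means the odds of getting an accepted answer are about 29% higher for questions with a code block, and 0.69 means each additional comment is associated with about 31% lower odds, other variables held fixed.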
  • 18. Post properties: Topic
      Topic                        Coeff    Odds Ratio
      DATABASES/PERFORMANCE        0.4062   1.50
      WEB PROGRAMMING              0.2725   1.31
      GRAPHICS                     0.2415   1.27
      WEB PROGRAMMING/HTTP         0.1441   1.16
      JAVA                         0.0029   1.00
      OOP                          0.8599   2.36
      MOBILE DEVELOPMENT/iOS       0.2664   1.30
      SOURCE CODE MANAGEMENT       0.2805   1.32
      DATA STRUCTURE/ALGORITHMS    0.7340   2.08
      .NET FRAMEWORK/ASP           0.3442   1.41
      SCRIPTING                    0.3649   1.44
      DATABASES/SQL                0.4488   1.57
      WEB APP DEVELOPMENT          0.3330   1.40
      MOBILE DEV/ANDROID           0.1111   1.12
      All significant, with α = 0.05
  • 19. Success rate per topic
      Topic                        ID   Success rate   Number of questions   Post rate
      OOP                           6   70.81%         630258                 8.84%
      DATA STRUCTURE/ALGORITHMS     9   67.73%         798713                11.20%
      DATABASES/SQL                12   61.12%         582130                 8.16%
      .NET FRAMEWORK/ASP           10   58.73%         518834                 7.28%
      SCRIPTING                    11   58.54%         497763                 6.98%
      WEB APP DEVELOPMENT          13   58.47%         492173                 6.90%
      DATABASES/PERFORMANCE         0   57.72%         415825                 5.83%
      WEB PROGRAMMING               1   56.59%         536255                 7.52%
      SOURCE CODE MANAGEMENT        8   55.37%         373397                 5.24%
      GRAPHICS                      2   54.37%         383376                 5.38%
      MOBILE DEVELOPMENT/iOS        7   53.91%         376517                 5.28%
      WEB PROGRAMMING/HTTP          3   52.22%         375510                 5.27%
      MOBILE DEV/ANDROID           14   51.50%         432095                 6.06%
      JAVA                          5   49.35%         235489                 3.30%
      WEB AUTHENTICATION/API        4   49.00%         482992                 6.77%
  • 20. Preliminary Results – Social Factors
      Variable                     Coeff     Odds Ratio
      User Question Score*         -0.0017   0.99
      User Answer Score*           -0.0002   0.99
      User Answers Accepted*        0.0047   1.00
      User Questions Accepted*      0.0078   1.00
      Number Of Badges              0.0001   1.0001103
      *significant with α = 0.05
  • 21. Preliminary Results – Affective Factors
      Variable                     Coeff     Odds Ratio
      Sentiment of the question
        Question Positive Score    -0.0248   0.98
        Question Negative Score    -0.0083   0.99
      Sentiment of the author’s comments
        Comment Positive Score     -0.1813   0.83
        Comment Negative Score     -0.1080   0.90
      All significant, with α = 0.05
  • 22. Impact of Positive Sentiment on Success [plots: positive polarity of QUESTION; positive polarity of COMMENTS]
  • 23. Impact of Negative Sentiment on Success [plots: negative polarity of QUESTION; negative polarity of COMMENTS]
  • 24. Problems in detecting sentiment
      • The ‘problem’ lexicon is too peculiar to the domain to be considered a pure expression of negative emotions
      • Actually describing emotions: ‘I have very simple and stupid trouble […] I'm pretty confused, explain please, what is wrong?’ (neg=-2); ‘Sorry for troubling you guys’ (neg=-2)
      • Simply describing a problem: ‘What is the best way to kill a critical process?’ (neg=-2); ‘What is wrong?’ (neg=-2)
      • Mixed: ‘I’m missing a parenthesis. But where? :(’ (neg=-3)
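The effect described on this slide can be reproduced with any word-list scorer. The sketch below is a toy illustration only: the lexicon and its weights are made up and are not SentiStrength's word list or algorithm, and the function name is hypothetical. It shows how a purely technical question picks up a negative score from a word such as "kill".

```python
import re

# Toy lexicon with made-up weights; NOT SentiStrength's actual word list.
TOY_NEGATIVE_LEXICON = {"kill": -2, "wrong": -2, "frustrating": -3}


def toy_negative_score(text):
    """Return the strongest (most negative) word weight, or -1 if no word matches."""
    words = re.findall(r"[a-z']+", text.lower())
    return min([TOY_NEGATIVE_LEXICON.get(word, -1) for word in words], default=-1)


technical = toy_negative_score("What is the best way to kill a critical process?")
emotional = toy_negative_score("I can't solve this problem, it's very frustrating")

print(technical)  # -2, although the question expresses no emotion
print(emotional)  # -3
```

A scorer of this kind cannot tell the domain sense of "kill" from an emotional one, which is why the slide treats the ‘problem’ lexicon as unreliable evidence of negative emotion.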
  • 25. Preliminary qualitative analysis using LIWC • Example: ‘Thanks!’ (positive score = 3)
  • 26. Next steps • Separate positive emotions from gratitude expressions • Qualitative analysis of the first 1000 questions with the highest positive sentiment score • Gratitude and politeness are the most frequent cases • ‘Cheers’, ‘Thanks (in advance)’, ‘Thank you’, … • Gratitude is positively associated with the success of requests (Althoff et al., 2014)
  • 27. Next steps • Further lexical analysis • Assessing the suitability of state-of-the-art tools for sentiment analysis • Modeling the ‘success lexicon’ • Classification study: is success predictable? • Preliminary results: 0.67 accuracy • Investigate other research questions • Emotions and perceived quality of answers • Emotions and reputation • Emotions and topics
  • 29. Thank you N. Novielli, F. Calefato, F. Lanubile University of Bari, Italy {nicole.novielli, fabio.calefato, filippo.lanubile}@uniba.it

Editor's Notes

  1. How does this relate to previous research in this domain? How does this relate to reputation and expert distribution in the Stack Overflow community?