Towards Discovering the Role of Emotions in Stack Overflow

N. Novielli, F. Calefato, F. Lanubile. “Towards Discovering the Role of Emotions in Stack Overflow”. In Proceedings of the 6th International Workshop on Social Software Engineering, pp. 33-36, ACM, 2014.
Today, people increasingly try to solve domain-specific problems through interaction on online Question and Answer (Q&A) sites, such as Stack Overflow. The growing success of the Stack Overflow community largely depends on the willingness of its members to answer others’ questions. Recent research has shown that the factors that push members of online communities encompass both social and technical aspects. We argue that the emotional style of a technical question also influences the probability of promptly obtaining a satisfying answer. In this presentation, we describe the design of an empirical study aimed at investigating the role of the affective lexicon in the questions posted on Stack Overflow.

Published in: Data & Analytics

  1. 1. Towards Discovering the Role of Emotions in Stack Overflow N. Novielli, F. Calefato, F. Lanubile University of Bari, Italy {nicole.novielli, fabio.calefato, filippo.lanubile}@uniba.it
  2. 2. A new way to access knowledge SSE@FSE 2014 2
  3. 3. How Do Programmers Ask and Answer Questions? • Which questions are answered well and which ones remain unanswered? (Treude et al., ICSE’11), (Asaduzzaman et al., MSR’13) • Can we predict how long a question will remain unanswered? (Asaduzzaman et al., MSR’13) • What are the main discussion topics? (Barua et al., ’12), (Bajaj et al., MSR’14) • What are the main factors affecting reputation? (Bosu et al., MSR’13)
  4. 4. Emotions in Social Computing and SSE • Sentiment Analysis on Yahoo! Answers (Kucuktunc et al., WSDM’12): answers perceived as good have a more neutral sentiment than others • Do developers feel emotions? (Murgia et al., MSR’14): Apache Software Foundation issue tracker • Sentiment Analysis of commit comments in GitHub (Guzman et al., MSR’13): correlation with day and time, programming language, team distribution
  5. 5. Research Question: Getting emotional while asking or answering questions in Stack Overflow: good or bad? • Impact on success of questions • Impact on perceived quality of answers • Correlation with reputation • Correlation with topics • …
  6. 6. Preliminary study • RQ1: To what degree does the emotional style of a question affect the probability of success? • A successful question has an accepted answer
  7. 7. SSE@FSE 2014 7
  8. 8. Dataset distribution • Successful: 4,196,125 questions with an accepted answer (58%) • Unsuccessful: 3,013,677 questions, either answered with no accepted answer (31%) or with no answers (11%)
  9. 9. Building the Model • Post Properties: Title Length, Post Length, Code Blocks, Day, Time, Topic, # Comments • Social Factors: Question Score, Answer Score, # Accepted answers provided, # Answers accepted, # Badges • Affective Factors: Sentiment Polarity (Polarity of Question/Answer, Polarity of Comments); Lexical Cues of Affective States (Positive emotions lexicon, Negative emotions lexicon, Gratitude, Politeness, Attitude of doubt, …) • Control Model
  10. 10. The Model • Independent variables: Post Properties, Social Factors, Affective Factors • Control Model • Logistic regression model • Dependent variable: success of a question (Y/N)
  11. 11. Post Properties - Metrics • Title and Post Length: # words (Althoff et al., @ICWSM’14; Asaduzzaman et al., @MSR’13); used by SO moderators for automatic filtering • Code Blocks: yes/no (Treude et al., @ICSE’11) • Day: in {weekday, weekend} (Bosu et al., @MSR2013) • Time: in {morning, afternoon, evening, night} (Bosu et al., @MSR2013) • Topic: categorical, using LDA (Asaduzzaman et al., @MSR’13; Bosu et al., @MSR’13; Harper et al., @CHI’08; Barua et al., Empirical Software Engineering 2014)
  12. 12. Social Factors - Metrics • Assessing the reputation of the author of the question at the time it is posted • High status is correlated with success on Reddit.com (Althoff et al., ICWSM’14) • Novices’ questions are more likely to be answered on Stack Overflow (Treude et al., ICSE’11) • Metrics to approximate the author’s reputation • Question Score: upvotes - downvotes on questions • Answer Score: upvotes - downvotes on answers • # Accepted answers provided • # Answers accepted • # Badges: total badges owned
  13. 13. Affective Factors • Sentiment Polarity • Questions/Answers • Polarity of Comments
  14. 14. Sentiment Analysis / Emotion Detection • Sentiment Analysis. Goal: Subjective vs. Objective, Negative vs. Positive. Example: ‘I can't solve this problem, it’s very frustrating’ is Subjective, Negative. Resources: SentiStrength (Thelwall et al., 2012), SentiWordNet (Esuli and Sebastiani, 2006), MPQA Lexicon (Wilson et al., EMNLP’05), … • Emotion Detection. Goal: Classification using Discrete Emotion Labels. Example: ‘I can't solve this problem, it’s very frustrating’ is Sad, Frustrated. Resources: LIWC (Tausczik and Pennebaker, 2010), WordNet Affect (Strapparava and Valitutti, 2004), Depeche Mood (Staiano and Guerini, ACL’14), …
  15. 15. Affective Factors • Sentiment Polarity (SentiStrength: http://sentistrength.wlv.ac.uk/): Question, Polarity of Comments • Lexical Cues of Affective States (future work): Positive emotions lexicon, Negative emotions lexicon, Gratitude, Politeness, Attitude of doubt, …
  16. 16. SentiStrength • Estimates the strength of both positive and negative sentiment in questions and comments • Robust also for informal language • Used in previous research: Sentiment Analysis of commit comments in GitHub (Guzman et al., MSR’13); Sentiment Analysis on Yahoo! Answers (Kucuktunc et al., WSDM’12)
  17. 17. Preliminary results - Post Properties (Coeff, Odds Ratio) • Code Blocks: 0.2549, 1.29 • # of comments: -0.3659, 0.69 • Day (Weekend): 0.0131, 1.01 • Time, Afternoon: 0.1418, 1.15 • Time, Evening: 0.2093, 1.23 • Time, Night: 0.1085, 1.12 • Post Length, Body Length: -0.0004, 0.99 • Post Length, Title Length: -0.0039, 0.99 • All significant, with α = 0.05 • Review questions are more concrete and get more answers (Treude et al., ICSE’11), while vague questions remain unanswered (Asaduzzaman et al., MSR’13) • SO off-peak hours (night): longer answer interval and fewer questions posted (Barua et al., MSR’13)
  18. 18. Post properties: Topic (Coeff, Odds Ratio) • DATABASES/PERFORMANCE: 0.4062, 1.50 • WEB PROGRAMMING: 0.2725, 1.31 • GRAPHICS: 0.2415, 1.27 • WEB PROGRAMMING/HTTP: 0.1441, 1.16 • JAVA: 0.0029, 1.00 • OOP: 0.8599, 2.36 • MOBILE DEVELOPMENT/iOS: 0.2664, 1.30 • SOURCE CODE MANAGEMENT: 0.2805, 1.32 • DATA STRUCTURE/ALGORITHMS: 0.7340, 2.08 • .NET FRAMEWORK/ASP: 0.3442, 1.41 • SCRIPTING: 0.3649, 1.44 • DATABASES/SQL: 0.4488, 1.57 • WEB APP DEVELOPMENT: 0.3330, 1.40 • MOBILE DEV/ANDROID: 0.1111, 1.12 • All significant, with α = 0.05
  19. 19. Success rate per topic (Topic, topic number: Success rate, Number of questions, Post rate) • OOP, 6: 70.81%, 630258, 8.84% • DATA STRUCTURE/ALGORITHMS, 9: 67.73%, 798713, 11.20% • DATABASES/SQL, 12: 61.12%, 582130, 8.16% • .NET FRAMEWORK/ASP, 10: 58.73%, 518834, 7.28% • SCRIPTING, 11: 58.54%, 497763, 6.98% • WEB APP DEVELOPMENT, 13: 58.47%, 492173, 6.90% • DATABASES/PERFORMANCE, 0: 57.72%, 415825, 5.83% • WEB PROGRAMMING, 1: 56.59%, 536255, 7.52% • SOURCE CODE MANAGEMENT, 8: 55.37%, 373397, 5.24% • GRAPHICS, 2: 54.37%, 383376, 5.38% • MOBILE DEVELOPMENT/iOS, 7: 53.91%, 376517, 5.28% • WEB PROGRAMMING/HTTP, 3: 52.22%, 375510, 5.27% • MOBILE DEV/ANDROID, 14: 51.50%, 432095, 6.06% • JAVA, 5: 49.35%, 235489, 3.30% • WEB AUTHENTICATION/API, 4: 49.00%, 482992, 6.77%
  20. 20. Preliminary Results - Social Factors (Coeff, Odds Ratio) • User Question Score*: -0.0017, 0.99 • User Answer Score*: -0.0002, 0.99 • User Answers Accepted*: 0.0047, 1.00 • User Questions Accepted*: 0.0078, 1.00 • Number Of Badges: 0.0001, 1.0001103 • *significant with α = 0.05
  21. 21. Preliminary Results - Affective Factors (Coeff, Odds Ratio) • Sentiment of the QUESTION, Question Positive Score: -0.0248, 0.98 • Sentiment of the QUESTION, Question Negative Score: -0.0083, 0.99 • Sentiment of the author’s COMMENTS, Comment Positive Score: -0.1813, 0.83 • Sentiment of the author’s COMMENTS, Comment Negative Score: -0.1080, 0.90 • All significant, with α = 0.05
  22. 22. Impact of Positive Sentiment on Success (figure panels: Positive polarity of QUESTION; Positive polarity of COMMENTS)
  23. 23. Impact of Negative Sentiment on Success (figure panels: Negative polarity of QUESTION; Negative polarity of COMMENTS)
  24. 24. Problems in detecting sentiment • The ‘problem’ lexicon is too peculiar to the domain to be considered a pure expression of negative emotions • Actually describing emotions: ‘I have very simple and stupid trouble […] I'm pretty confused, explain please, what is wrong?’ (neg=-2); ‘Sorry for troubling you guys’ (neg=-2) • Simply describing the problem: ‘What is the best way to kill a critical process?’ (neg=-2); ‘What is wrong?’ (neg=-2) • Mixed: ‘I’m missing a parenthesis. But where? :(’ (neg=-3)
  25. 25. Preliminary qualitative analysis using LIWC • ‘Thanks!’: positive score = 3
  26. 26. Next steps • Separate positive emotions from gratitude expressions • Qualitative analysis of the first 1000 questions with the highest positive sentiment score • Gratitude and politeness are the most frequent cases: ‘Cheers’, ‘Thanks (in advance)’, ‘Thank you’, … • Gratitude is positively associated with the success of requests (Althoff et al., 2014)
  27. 27. Next steps • Further lexical analysis: assessing the suitability of state-of-the-art tools for sentiment analysis; modeling the ‘success lexicon’ • Classification study: is success predictable? Preliminary results: 0.67 accuracy • Investigate other research questions: emotions and perceived quality of answers; emotions and reputation; emotions and topics
  28. 28. Thank you N. Novielli, F. Calefato, F. Lanubile University of Bari, Italy {nicole.novielli, fabio.calefato, filippo.lanubile}@uniba.it
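
The post-property metrics listed on slide 11 can be illustrated with a short Python sketch. This is not the authors' extraction code. The function names, the hour boundaries for the four time-of-day categories, and the use of a `<code>` tag to detect code blocks are all assumptions made for illustration; the slides only name the categories.

```python
import re
from datetime import datetime


def time_of_day(hour: int) -> str:
    """Map an hour (0-23) to one of the four categories named on slide 11.

    The boundaries below are hypothetical; the slides do not specify them.
    """
    if 6 <= hour < 12:
        return "morning"
    if 12 <= hour < 18:
        return "afternoon"
    if 18 <= hour < 24:
        return "evening"
    return "night"


def post_properties(title: str, body_html: str, created: datetime) -> dict:
    """Compute illustrative post-property features for one question."""
    # Assumption: the body is HTML and code blocks are marked with <code>.
    has_code = "<code>" in body_html
    # Strip tags before counting words so markup is not counted as text.
    body_text = re.sub(r"<[^>]+>", " ", body_html)
    return {
        "title_length": len(title.split()),   # number of words in the title
        "post_length": len(body_text.split()),  # number of words in the body
        "code_blocks": has_code,               # yes/no
        "day": "weekend" if created.weekday() >= 5 else "weekday",
        "time": time_of_day(created.hour),
    }


example = post_properties(
    "How to kill a process?",
    "<p>Text</p><pre><code>x = 1</code></pre>",
    datetime(2014, 11, 16, 21, 30),  # a Sunday evening
)
print(example)
```

Counting words after stripping tags is one possible reading of "# words"; the source does not say whether code inside the post counts toward post length.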

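
The odds ratios on slides 17 to 21 follow from the logistic regression coefficients: for a one-unit increase in a predictor, the odds of success are multiplied by exp(coefficient). The sketch below recomputes a few of the reported odds ratios from coefficients copied from the slides. The function name is an illustrative choice, and the script does not refit the model.

```python
import math

# Coefficients copied from slides 17 and 21.
coefficients = {
    "Code Blocks": 0.2549,
    "# of comments": -0.3659,
    "Time, Evening": 0.2093,
    "Comment Positive Score": -0.1813,
    "Comment Negative Score": -0.1080,
}


def odds_ratio(coefficient: float) -> float:
    """Odds ratio for a one-unit increase in a logistic regression predictor."""
    return math.exp(coefficient)


for name, coef in coefficients.items():
    # Values above 1 raise the odds of an accepted answer, values below 1 lower them.
    print(f"{name}: coeff={coef:+.4f}, odds ratio={odds_ratio(coef):.2f}")
```

Read this way, a question containing a code block has about 1.29 times the odds of success, while each additional point of positive sentiment in the author's comments multiplies the odds by about 0.83.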
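
Slide 26 proposes separating gratitude expressions from other positive emotions. One simple way to flag the gratitude phrases quoted there is a regular expression, sketched below. The pattern covers only the three phrases named on the slide, and the function name is hypothetical. It is a toy illustration, not the authors' planned lexicon or the behavior of SentiStrength or LIWC.

```python
import re

# Phrases taken from slide 26: 'Cheers', 'Thanks (in advance)', 'Thank you'.
GRATITUDE = re.compile(
    r"\b(thanks(?: in advance)?|thank you|cheers)\b",
    re.IGNORECASE,
)


def gratitude_cues(text: str) -> list[str]:
    """Return the gratitude phrases found in the text, lowercased, in order."""
    return [match.group(0).lower() for match in GRATITUDE.finditer(text)]


print(gratitude_cues("Thanks in advance! Cheers"))
print(gratitude_cues("What is wrong?"))
```

A question whose positive score comes only from such cues could then be treated separately from one that expresses a positive emotion about the problem itself.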