Opinion Mining for Software Engineering
Alexander Serebrenik
#WOCinTech Chat
Bin Lin, Gabriele Bavota, Michele Lanza, Nathan Cassee, Nicole Novielli
sentiment polarity and positivity degree identification
subjectivity detection and opinion identification
joint topic-sentiment analysis
viewpoints and perspectives
other non-factual information
Bo Pang, Lillian Lee (2008). Opinion Mining and Sentiment Analysis. Foundations and Trends® in Information Retrieval, 2(1–2), 1–135
https://miro.medium.com/max/2040/1*YFroPGj9dpPx7nqf045AUQ.png
https://www.elephango.com/images/RCLG/check-13622.jpg
https://www.monocubed.com/wp-content/uploads/2020/11/Vue-vs-React-Comparison-of-Best-JavaScript-Frameworks.jpg
xkcd
https://pxhere.com/nl/photo/1620778
https://upload.wikimedia.org/wikipedia/commons/5/5a/Books_HD_%288314929977%29.jpg
[Slide diagram: paper selection in six steps (① to ⑥), screening by abstract and by full paper]
("opinion mining" OR "sentiment analysis" OR "emotion")
AND ("software") AND ("developer" OR "development")
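The search string above is a conjunction of three OR-groups. A minimal sketch of how such a query could be applied to a title or abstract is shown below. It assumes case-insensitive substring matching (so "emotion" also matches "emotions"), which may differ from how a given digital library interprets the query; the names `QUERY_GROUPS` and `matches_query` are hypothetical.

```python
import re

# Term groups mirror the query on the slide:
# terms inside a group are OR-ed, the groups themselves are AND-ed.
QUERY_GROUPS = [
    ["opinion mining", "sentiment analysis", "emotion"],
    ["software"],
    ["developer", "development"],
]


def matches_query(text, groups=QUERY_GROUPS):
    """Return True if every group has at least one term occurring in the text.

    Assumption: plain case-insensitive substring matching after collapsing
    whitespace; real search engines may stem or tokenize differently.
    """
    normalized = re.sub(r"\s+", " ", text.lower())
    return all(any(term in normalized for term in group) for group in groups)


if __name__ == "__main__":
    print(matches_query("Sentiment analysis of developer comments in software projects"))
    print(matches_query("Emotion recognition in speech"))
```

Keeping the groups as data rather than hard-coding the condition makes it easy to try variations of the query on the same set of candidate papers.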
Paper counts across the selection steps: 795, 127, 71 and 1,056, 268, 114
71 + 114 = 185
2010-2019
RQ1 In which software engineering activities has opinion mining been applied?
RQ2 What publicly available opinion mining tools have been adopted/developed to support these activities?
RQ3 How often do researchers evaluate the reliability of opinion mining tools when they adopt the tools out of the box?
RQ4 Which opinion mining techniques have been compared in terms of performance and in what contexts?
RQ5 Which datasets are available for performance evaluation of opinion mining techniques in software-related contexts and how are they curated?
RQ6 What are the concerns raised or the limitations encountered by researchers when using opinion mining techniques?
RQ1: In which software engineering activities has opinion mining been applied?
ISO/IEC/IEEE 12207
When should we plan end-of-life of a system or a component?
Which architecture alternative looks more promising?
How to adapt the planning if developers experience a setback?
assessing technique/API
discovering rationale
API usage
general user satisfaction
user-reported issues/requests
identifying requirements
detecting emotion/sentiment
relating emotion/sentiment to performance
evaluating trust
Lin et al. ICSE 2019
Di Sorbo et al. ASE 2015
Operationalisations?
Tools?
Datasets?
Analysis techniques?
Tools: general purpose vs. SE-specific
sentiment
  general purpose: SentiStrength, NLTK, Stanford CoreNLP, Watson Natural Language Understanding*, Microsoft Azure Text Analytics*, TextBlob, Affin, USent, Syuzhet, Pattern, Rosette*, Aylien*, Narayanan et al., 2013
  SE-specific: SentiStrength-SE, Senti4SD, SEntiMoji, SentiSW, SentiCR, SentiSE
emotion
  general purpose: LIWC*, TensiStrength, NTUA-SLP
  SE-specific: Deva, MarValous, EmoTxT
politeness: politeness tool
trust: Trust-Framework
opinion
  general purpose: LDA, TwitterLDA
  SE-specific: ARdoc, Ticket-Tagger, SURF, MARC 3.0, RE-SWOT, DeepTip, POME
Jongeling et al.: General-purpose sentiment analysis tools are not reliable when applied to software engineering texts (EMSE 2017)
Many SE-specific tools are not reused
Many SE-specific tools are not used
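One intuition for why general-purpose tools can misfire on software engineering texts is that everyday negative words often carry a neutral technical meaning. The toy sketch below illustrates this with a lexicon-based scorer. The lexicon, its weights, and the list of neutral technical terms are invented for illustration and are not taken from any tool listed above, nor do they describe how those tools are implemented.

```python
import re

# Hypothetical mini-lexicon in the style of a general-purpose sentiment tool.
GENERAL_LEXICON = {
    "kill": -3, "dead": -2, "error": -2, "abort": -2,
    "great": 3, "thanks": 2,
}

# Hypothetical domain adaptation: technical terms treated as neutral in SE texts.
SE_NEUTRAL_TERMS = frozenset({"kill", "dead", "error", "abort"})


def polarity(text, lexicon, neutral_terms=frozenset()):
    """Sum word weights from the lexicon, skipping terms declared neutral."""
    words = re.findall(r"[a-z']+", text.lower())
    score = sum(lexicon.get(w, 0) for w in words if w not in neutral_terms)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


if __name__ == "__main__":
    comment = "Kill the process and remove the dead code"
    print(polarity(comment, GENERAL_LEXICON))                    # general-purpose view
    print(polarity(comment, GENERAL_LEXICON, SE_NEUTRAL_TERMS))  # domain-adapted view
```

A purely technical instruction is scored as negative by the general-purpose lexicon and as neutral once the technical terms are excluded, while a genuinely positive comment such as "Thanks, great patch" is unaffected.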
Novielli et al.: Different SE applications require tool adaptation (MSR 2020)
[Figure: tools plotted over the years 2012-2019: SentiStrength, politeness, LDA, NLTK, LIWC*, Senti4SD, Stanford CoreNLP, SentiStrength-SE, Watson Natural Language*, Rosette*, TwitterLDA, SentiSE, SentiCR, Pattern*, Aylien*, Syuzhet, EmoTxT]
Reliability of the tools is rarely evaluated, which threatens the conclusions of the studies.
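A lightweight way to check reliability before trusting a tool out of the box is to label a sample of the target texts manually and compare the tool's output against it. The sketch below computes per-class precision, recall and F1 under that assumption. The function name and the label values are hypothetical, and the choice of metrics is an illustration rather than a procedure prescribed by the reviewed studies.

```python
from collections import Counter


def per_class_scores(gold, predicted):
    """Return {label: (precision, recall, f1)} for tool output vs. manual labels.

    gold      -- manually assigned labels for a sample of texts
    predicted -- labels produced by the tool for the same texts, same order
    """
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must have the same length")

    tp, fp, fn = Counter(), Counter(), Counter()
    for g, p in zip(gold, predicted):
        if g == p:
            tp[g] += 1
        else:
            fp[p] += 1  # tool claimed label p wrongly
            fn[g] += 1  # tool missed the true label g

    scores = {}
    for label in sorted(set(gold) | set(predicted)):
        predicted_count = tp[label] + fp[label]
        actual_count = tp[label] + fn[label]
        precision = tp[label] / predicted_count if predicted_count else 0.0
        recall = tp[label] / actual_count if actual_count else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        scores[label] = (precision, recall, f1)
    return scores


if __name__ == "__main__":
    manual = ["pos", "neg", "neu", "neu"]
    tool = ["pos", "neu", "neu", "neu"]
    for label, (p, r, f1) in per_class_scores(manual, tool).items():
        print(f"{label}: precision={p:.2f} recall={r:.2f} f1={f1:.2f}")
```

Reporting scores per class matters because sentiment datasets are often dominated by neutral texts, so overall accuracy can look acceptable while the tool misses most negative items.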
https://live.staticflickr.com/8732/16884639690_c206a818bf_b.jpg
Dataset sizes
sentiment and emotion: 341 - 4000 sentences, 500 - 4800 texts
content: 712 - 12000 sentences, 100 - 7100 texts
Pre-train model (general purpose data) → Model 1
Fine-tune model (SE data) → Model 2
Repurpose model → Classifier
Robbes and Janes. ICSE NIER 2019
https://cdn.pixabay.com/photo/2013/02/23/20/33/seagull-85512_1280.jpg
Imtiaz et al. SEmotion 2018: Even human ratings of sentiment and politeness had low consistency on GitHub comments.
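Consistency between human raters is commonly quantified with a chance-corrected agreement measure. The sketch below computes Cohen's kappa for two raters. It is offered as one possible measure under that assumption and is not necessarily the statistic used by Imtiaz et al.; the function name and the example labels are hypothetical.

```python
from collections import Counter


def cohen_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters labelling the same items in the same order."""
    n = len(rater_a)
    if n == 0 or n != len(rater_b):
        raise ValueError("both raters must label the same non-empty set of items")

    # Observed agreement: fraction of items with identical labels.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n

    # Expected agreement by chance, from each rater's label distribution.
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    expected = sum(counts_a[label] * counts_b[label] for label in counts_a) / (n * n)

    if expected == 1.0:
        # Both raters used a single identical label; treated as perfect agreement here.
        return 1.0
    return (observed - expected) / (1 - expected)


if __name__ == "__main__":
    first = ["pos", "pos", "neg", "neg"]
    second = ["pos", "neg", "neg", "neg"]
    print(round(cohen_kappa(first, second), 2))
```

Low agreement among humans puts an upper bound on what can reasonably be expected from an automated tool evaluated against such labels.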
@wzblin
@NathanCassee
@gbavota
@NicoleNovielli
@aserebrenik
@LanzaMichele
