Sentiment analysis tools for software engineering research cannot be used out of the box

On sentiment analysis tools for software engineering research
Robbert Jongeling, Eindhoven U of Technology (NL), @jongeling_r
Subhajit Datta, Singapore U of Technology and Design (SG), @datta_subhajit
Alexander Serebrenik, Eindhoven U of Technology (NL), @aserebrenik

E. Guzman, D. Azócar, and Y. Li, “Sentiment analysis of commit comments in GitHub: An empirical study,” MSR 2014
A.-I. Rousinopoulos, G. Robles, and J. M. González-Barahona, “Sentiment analysis of Free/Open Source developers: preliminary findings from a case study,” Revista Eletrônica de Sistemas de Informação, 2014
E. Guzman and B. Bruegge, “Towards emotional awareness in software development teams,” Joint Meeting on Foundations of Software Engineering, 2013
D. Pletea, B. Vasilescu, and A. Serebrenik, “Security and emotion: Sentiment analysis of security discussions on GitHub,” MSR 2014
M. Ortu, B. Adams, G. Destefanis, P. Tourani, M. Marchesi, and R. Tonelli, “Are bullies more productive? Empirical study of affectiveness vs. issue fixing time,” MSR 2015
D. Garcia, M. S. Zanetti, and F. Schweitzer, “The role of emotions in contributors activity: A case study on the Gentoo community,” International Conference on Cloud and Green Computing, 2013

Tools used in these studies: NLTK, SentiStrength
Trained on movie/product reviews.
Threat: might misidentify (or fail to identify) a
sentiment in a software engineering artefact

• RQ1: To what extent do different sentiment analysis
tools agree with emotions of software developers?
• RQ2: To what extent do different sentiment analysis
tools agree with each other?
• RQ3: Do different sentiment analysis tools lead to
contradictory results in a software engineering
study?

Murgia et al., MSR 2014: 392 comments × 4 evaluators
Emotions labeled: joy, love, surprise, anger, fear, sadness
Grouped into: positive, negative

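Murgia et al., MSR 2014 (emotions grouped into positive and negative)
Consistent labels:
positive: 3 positive, none negative (54 comments)
negative: 3 negative, none positive (24 comments)
neutral: ≥3 without emotion indication (217 comments)
Tools: Alchemy, Stanford NLP, NLTK, SentiStrength
RQ1: compare each tool with the manual labels (3×3 table: tool neg/neu/pos vs. manual neg/neu/pos)
RQ2: compare each pair of tools (3×3 table: Tool A neg/neu/pos vs. Tool B neg/neu/pos)
Agreement measure: Adjusted Rand Index (ARI) [Santos, Embrechts, ICANN 2009]; ARI is about 0 for chance-level agreement and 1 for identical labelings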
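The consistency rule above can be sketched in a few lines. This is only an illustration under stated assumptions: each evaluator's judgement is collapsed to a single vote ('pos', 'neg', or 'none' for no emotion indication), "3 positive" is read as "at least 3", and the function name `consistent_label` is hypothetical rather than taken from the study.

```python
from collections import Counter

def consistent_label(votes):
    """Return 'pos', 'neg', 'neu', or None (inconsistent) for one comment.

    votes: one entry per evaluator, each 'pos', 'neg', or 'none'.
    Assumption: "3 positive" / "3 negative" means "at least 3".
    """
    counts = Counter(votes)
    if counts["pos"] >= 3 and counts["neg"] == 0:
        return "pos"
    if counts["neg"] >= 3 and counts["pos"] == 0:
        return "neg"
    if counts["none"] >= 3:
        return "neu"
    return None  # evaluators disagree; the comment is left out


# Four evaluators per comment, as in the data set described above.
print(consistent_label(["pos", "pos", "pos", "none"]))
print(consistent_label(["pos", "pos", "neg", "none"]))
```

Comments for which the function returns None would simply be excluded from the comparison with the tools.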

RQ1: To what extent do different sentiment analysis tools agree with emotions of software developers?
NLTK (rows) vs. manual labels (columns):
        neg   neu   pos
neg      19    51    11
neu       0   138     7
pos       5    28    36
Tool            ARI
NLTK            0.239
SentiStrength   0.113
Stanford NLP    0.108
Alchemy         0.079
Tools do not agree with manual evaluation
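As a worked example, the ARI for NLTK can be recomputed from the confusion matrix above. The sketch below uses the standard contingency-table (Hubert and Arabie) form of the Adjusted Rand Index; that this is exactly the variant used in the study is an assumption, although the result matches the reported value. The function name `adjusted_rand_index` is illustrative.

```python
from math import comb

def adjusted_rand_index(table):
    """Adjusted Rand Index from a contingency table (list of rows)."""
    n = sum(sum(row) for row in table)
    # Pairs of comments placed together by both labelings.
    sum_cells = sum(comb(c, 2) for row in table for c in row)
    # Pairs placed together by each labeling on its own.
    sum_rows = sum(comb(sum(row), 2) for row in table)
    sum_cols = sum(comb(sum(col), 2) for col in zip(*table))
    expected = sum_rows * sum_cols / comb(n, 2)  # agreement expected by chance
    max_index = (sum_rows + sum_cols) / 2
    return (sum_cells - expected) / (max_index - expected)


# Rows: NLTK neg/neu/pos; columns: manual neg/neu/pos (from the table above).
nltk_vs_manual = [
    [19, 51, 11],
    [0, 138, 7],
    [5, 28, 36],
]
ari = adjusted_rand_index(nltk_vs_manual)
print(round(ari, 3))  # 0.239
```

Although 193 of the 295 comments lie on the diagonal, most of that agreement comes from the large neutral class, which is why the chance-corrected value stays low.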

RQ2: To what extent do different sentiment analysis tools agree with each other?
NLTK (rows) vs. SentiStrength (columns):
        neg   neu   pos
neg      17    39    25
neu      15    96    34
pos       6    20    43
Tool A   Tool B          ARI
NLTK     Alchemy         0.104
NLTK     SentiStrength   0.090
Tools do not agree with each other
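To make the disagreement concrete, the share of comments on which NLTK and SentiStrength assign the same label can be read off the diagonal of the matrix above. This raw agreement share is not a figure reported on the slides; it is a simple complementary calculation, and unlike ARI it is not corrected for chance.

```python
# Rows: NLTK neg/neu/pos; columns: SentiStrength neg/neu/pos (from the table above).
nltk_vs_sentistrength = [
    [17, 39, 25],
    [15, 96, 34],
    [6, 20, 43],
]

total = sum(sum(row) for row in nltk_vs_sentistrength)
# Diagonal cells are the comments that both tools label identically.
agree = sum(nltk_vs_sentistrength[i][i] for i in range(len(nltk_vs_sentistrength)))
share = agree / total

print(f"{agree}/{total} = {share:.1%}")  # 156/295 = 52.9%
```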

RQ3 study design: text from an issue tracker or a Q&A site → sentiment analysis tool → compare response times for neg, neu, pos issues/questions
Tool variants: NLTK, SentiStrength, NLTK ∩ SentiStrength
Are the results the same?
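The study design above can be sketched as a small pipeline. Everything here is illustrative: the toy labels and response times are invented, the helper names are hypothetical, NLTK ∩ SentiStrength is assumed to mean "keep only the texts on which both tools agree", and medians stand in for whatever statistical comparison the study actually applied.

```python
from statistics import median

def times_by_label(labels, times):
    """Group response times by the sentiment label of the text."""
    groups = {"neg": [], "neu": [], "pos": []}
    for label, t in zip(labels, times):
        groups[label].append(t)
    return groups

def agreed_only(labels_a, labels_b, times):
    """Keep only items on which both tools agree (assumed meaning of the intersection)."""
    kept = [(a, t) for a, b, t in zip(labels_a, labels_b, times) if a == b]
    return [a for a, _ in kept], [t for _, t in kept]

# Hypothetical toy data: one label per issue from each tool, response time in hours.
nltk_labels = ["neg", "neu", "pos", "pos", "neu", "neg"]
senti_labels = ["neg", "neu", "neu", "pos", "neu", "pos"]
hours = [30.0, 5.0, 12.0, 20.0, 7.0, 40.0]

variants = {
    "NLTK": (nltk_labels, hours),
    "SentiStrength": (senti_labels, hours),
    "NLTK ∩ SentiStrength": agreed_only(nltk_labels, senti_labels, hours),
}
for name, (labels, times) in variants.items():
    groups = times_by_label(labels, times)
    medians = {k: median(v) for k, v in groups.items() if v}
    print(name, medians)
```

Running the same comparison once per variant is what makes it possible to check whether the conclusion depends on the tool.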

RQ3: Do different sentiment analysis tools lead to contradictory results in a software engineering study?
Columns: NLTK, SentiStrength, NLTK ∩ SentiStrength
ASF descr
neg > neu*** neg > neu***
pos > neu*** pos > neu*** pos > neu***
pos > neg*** pos > neg***
ASF title
neg > neu**
pos > neu*** pos > neu**
pos > neg* pos > neg**
GNOME descr
neg > neu*** neg > neu*** neg > neu***
pos > neu*** pos > neu*** pos > neu***
pos > neg***
neg > pos***
SO descr
ø neg > pos* ø
Choice of the sentiment analysis tool affects results of the software engineering study

Summary
Sentiment analysis tools are trained on movie/product reviews.
Threat: might misidentify (or fail to identify) a sentiment in a software engineering artefact
• Tools do not agree with manual evaluation
• Tools do not agree with each other
• Choice of the sentiment analysis tool affects results of the software engineering study

Next steps?
• Train sentiment analysis tools on software
engineering data
• Data of Murgia et al.: first step
• More and better-suited data is needed
