Finding Help with Programming Errors: An Exploratory Study of Novice Software...
Preetha Chatterjee
Each month, 50 million users visit Stack Overflow, a popular Q&A forum for software developers, to share knowledge and seek help with coding problems. Although Q&A forums are a good resource for seeking help from developers beyond the local team, the abundance of information can cause developers, especially novice software engineers, to spend considerable time identifying relevant answers and suitable suggested fixes.
This exploratory study aims to understand how novice software engineers direct their efforts and what kinds of information they focus on within a post selected from the results returned in response to a search query on Stack Overflow. The results can be leveraged to improve the Q&A forum interface, guide tools for mining forums, and potentially improve granularity of traceability mappings involving forum posts. We qualitatively analyze the novice software engineers’ perceptions from a survey as well as their annotations of a set of Stack Overflow posts. Our results indicate that novice software engineers pay attention to only 27% of code and 15-21% of text in a Stack Overflow post to understand and determine how to apply the relevant information to their context. Our results also discern the kinds of information prominent in that focus.
Extracting Archival-Quality Information from Software-Related Chats
Preetha Chatterjee
Software developers are increasingly having conversations about software development via online chat services. Many of those chat communications contain valuable information, such as code descriptions, good programming practices, and causes of common errors/exceptions. However, the nature of chat community content is transient, as opposed to the archival nature of other developer communications such as email, bug reports and Q&A forums. As a result, important information and advice are lost over time.
The focus of this dissertation is Extracting Archival Information from Software-Related Chats, specifically to (1) automatically identify conversations that contain archival-quality information, (2) accurately reduce the granularity of the information reported as archival, and (3) conduct a case study to investigate how archival-quality information extracted from chats compares to related posts in Q&A forums. Knowledge archived from developer chats could potentially be used in several applications, such as creating a new archival mechanism for a given chat community, augmenting Q&A forums, or facilitating the mining of specific information to improve software maintenance tools.
Proactive Empirical Assessment of New Language Feature Adoption via Automated...
Raffi Khatchadourian
Programming languages and platforms improve over time, sometimes resulting in new language features that offer many benefits. However, despite these benefits, developers may not always be willing to adopt them in their projects for various reasons. In this paper, we describe an empirical study where we assess the adoption of a particular new language feature. Studying how developers use (or do not use) new language features is important in programming language research and engineering because it gives designers insight into the usability of the language for creating meaningful programs in that language. This knowledge, in turn, can drive future innovations in the area. Here, we explore Java 8 default methods, which allow interfaces to contain (instance) method implementations.
Default methods can ease interface evolution, make certain ubiquitous design patterns redundant, and improve both modularity and maintainability. A focus of this work is to discover, through a scientific approach and a novel technique, situations where developers found these constructs useful and where they did not, and the reasons for each. Although several studies center around assessing new language features, to the best of our knowledge, this kind of construct has not been previously considered.
Despite their benefits, we found that developers did not adopt default methods in all situations. Our study consisted of submitting pull requests introducing the language feature to 19 real-world, open source Java projects without altering original program semantics. This novel assessment technique is proactive in that the adoption was driven by an automatic refactoring approach rather than waiting for developers to discover and integrate the feature themselves. In this way, we set forth best practices and patterns of using the language feature effectively earlier rather than later and are able to possibly guide (near) future language evolution. We foresee this technique to be useful in assessing other new language features, design patterns, and other programming idioms.
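To make the construct under study concrete, here is a minimal sketch of a Java 8 default method; the `Greeter` interface and its implementer are invented for illustration. Adding `greetLoudly` as a default method evolves the interface without breaking existing implementers, which is the interface-evolution benefit the abstract describes.

```java
// A hypothetical interface evolved with a Java 8 default method.
interface Greeter {
    String greet(String name);

    // Default method: existing implementers such as EnglishGreeter
    // inherit this behavior without any change to their source.
    default String greetLoudly(String name) {
        return greet(name).toUpperCase();
    }
}

class EnglishGreeter implements Greeter {
    @Override
    public String greet(String name) {
        return "Hello, " + name;
    }
}

public class DefaultMethodDemo {
    public static void main(String[] args) {
        Greeter g = new EnglishGreeter();
        System.out.println(g.greet("Ada"));       // Hello, Ada
        System.out.println(g.greetLoudly("Ada")); // HELLO, ADA
    }
}
```

Before default methods, adding `greetLoudly` to `Greeter` would have forced every implementing class to supply a body, which is why such additions often went instead into ubiquitous "skeletal implementation" classes.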
GPT-2: Language Models are Unsupervised Multitask Learners
Young Seok Kim
Review of the paper "Language Models are Unsupervised Multitask Learners" (GPT-2) by Alec Radford et al.
Paper link: https://d4mucfpksywv.cloudfront.net/better-language-models/language_models_are_unsupervised_multitask_learners.pdf
YouTube presentation: https://youtu.be/f5zULULWUwM
(Slides are in English, but the presentation is in Korean)
This presentation talks about Natural Language Processing using Java. At Museaic, a music intelligence platform, we spent time figuring out how to extract central themes from song lyrics. In this talk, I will cover some of the tasks involved in natural language processing, such as named entity recognition, word sense disambiguation, and concept/theme extraction. I will also cover libraries available in Java such as stanford-nlp and dbpedia-spotlight, and graph approaches using WordNet and semantic databases. This talk will help people understand text processing beyond simple keyword approaches and provide them with some of the best techniques/libraries for it in the Java world.
Building a Dynamic Bidding system for a location based Display advertising Pl...
Ekta Grover
Experimentation to Productization: Building a Dynamic Bidding System for a location-aware ecosystem. Slides from my Fifth Elephant talk, Bangalore, 2014.
Illustrated Code: Building Software in a Literate Way
Andreas Zeller, CISPA Helmholtz Center for Information Security
Notebooks – rich, interactive documents that join together code, documentation, and outputs – are all the rage with data scientists. But can they be used for actual software development? In this talk, I share experiences from authoring two interactive textbooks – fuzzingbook.org and debuggingbook.org – and show how notebooks not only serve for exploring and explaining code and data, but also how they can be used as software modules, integrating self-checking documentation, tests, and tutorials all in one place. The resulting software focuses on the essential, is well-documented, highly maintainable, easily extensible, and has a much higher shelf life than the "duct tape and wire" prototypes frequently found in research and beyond.
Trend detection and analysis on Twitter
Lukas Masuch
By Henning Muszynski, Benjamin Räthlein & Lukas Masuch
The popularity of social media services has increased exponentially in the last few years. The combination of big social data and powerful analytical technologies makes it possible to gain highly valuable insights that otherwise might not be accessible. The Twitter Analyzer comprises several components to collect, analyze, and visualize Twitter data, and we explored various related technologies to implement this tool. We collected about 38 million English tweets on various topics and analyzed those data with machine learning techniques to compute the respective sentiment and detect common topics. Furthermore, we visualized the results using varying visualization techniques to emphasize different aspects, such as a word cloud, several chart types, and geospatial visualizations. Technologies used: MongoDB, Python, Twython, Python NLTK, wordcloud2.js, wordfreq, amCharts, Google BigQuery, Google Cloud Storage, CartoDB, EtcML.
Can Deep Learning solve the Sentiment Analysis Problem?
Mark Cieliebak
Sentiment analysis appears to be one of the easier tasks in the realm of text analytics: given a text like a tweet or product review, decide whether it contains positive or negative opinion. This task is almost trivial for humans, but it turns out to be a true challenge for automated systems. In fact, state-of-the-art sentiment analysis tools are wrong on approx. 4 out of 10 documents.
Current sentiment analysis tools are rule-based, feature-based, or combinations of both. However, recent research uses deep learning on very large sets of documents.
In this talk, we will explain the intrinsic difficulties of automated sentiment analysis; present existing solution approaches and their performance; describe an architecture for a deep learning system; and explore whether deep learning can improve sentiment analysis accuracy.
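To make the rule-based baseline mentioned above concrete, here is a minimal lexicon-based scorer; the word lists are toy examples invented for illustration, not a real sentiment lexicon, and real tools add negation handling, intensifiers, and many more rules.

```java
import java.util.*;

// Minimal lexicon-based sentiment scorer: count positive and negative
// lexicon hits in the text and report the sign of the difference.
public class LexiconSentiment {
    private static final Set<String> POSITIVE =
        new HashSet<>(Arrays.asList("good", "great", "love", "excellent"));
    private static final Set<String> NEGATIVE =
        new HashSet<>(Arrays.asList("bad", "terrible", "hate", "poor"));

    // Returns +1 (positive), -1 (negative), or 0 (neutral/mixed).
    public static int score(String text) {
        int score = 0;
        for (String token : text.toLowerCase().split("\\W+")) {
            if (POSITIVE.contains(token)) score++;
            if (NEGATIVE.contains(token)) score--;
        }
        return Integer.signum(score);
    }

    public static void main(String[] args) {
        System.out.println(score("I love this excellent product")); // 1
        System.out.println(score("Terrible battery, I hate it"));   // -1
    }
}
```

A scorer like this fails on exactly the cases the talk highlights, e.g. "not good at all" still counts one positive hit, which is one intuition for why roughly 4 out of 10 documents are misclassified by simple approaches.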
The task of keyword extraction is to automatically identify a set of terms that best describe the document. Automatic keyword extraction establishes a foundation for various natural language processing applications: information retrieval, the automatic indexing and classification of documents, automatic summarization and high-level semantic description, etc. Although the keyword extraction applications usually work on single documents (document-oriented task), keyword extraction is also applicable to a more demanding task, i.e. the keyword extraction from a whole collection of documents or from an entire web site, or from tweets from Twitter. In the era of big-data, obtaining an effective and efficient method for automatic keyword extraction from huge amounts of multi-topic textual sources is of high importance.
We proposed a novel Selectivity-Based Keyword Extraction (SBKE) method, which extracts keywords from source text represented as a network. The node selectivity value is calculated from a weighted network as the average weight distributed on the links of a single node, and is used in the procedure of keyword candidate ranking and extraction. Selectivity slightly outperforms extraction based on standard centrality measures; therefore, selectivity and its modification, generalized selectivity, are included in the SBKE method as node centrality measures. Selectivity-based extraction does not require linguistic knowledge, as it is derived purely from statistical and structural information of the network, so it can be easily ported to new languages and used in a multilingual scenario. The true potential of the proposed SBKE method lies in its generality, portability, and low computation costs, which position it as a strong candidate for preparing collections that lack human annotations for keyword extraction. The portability of SBKE was tested on Croatian, Serbian, and English texts: more precisely, it was developed on Croatian news and ported for extraction from parallel abstracts of scientific publications in the Serbian and English languages.
The constructed parallel corpus of scientific abstracts with annotated keywords allows a better comparison of the method's performance across languages, since we have a controlled experimental environment and data. The achieved keyword extraction results, measured with an F1 score, are 49.57% for English and 46.73% for Serbian if we disregard keywords that are not present in the abstracts. If we evaluate against the whole keyword set, the F1 scores are 40.08% and 45.71%, respectively. This work shows that SBKE can be easily ported to a new language, domain, and type of text structure. Still, there are drawbacks: the method can only extract words that appear in the text.
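The selectivity measure at the core of SBKE can be sketched as follows. This is an illustrative reconstruction, not the authors' implementation: it builds a weighted co-occurrence network over adjacent words and scores each node by its selectivity, i.e. the average weight on its links (node strength divided by degree), as defined in the abstract.

```java
import java.util.*;

// Sketch of selectivity-based keyword candidate ranking: build a weighted
// co-occurrence network from adjacent words, then score each node by
// selectivity = strength / degree (average weight per link).
public class SelectivityKeywords {
    public static Map<String, Double> selectivity(String text) {
        String[] tokens = text.toLowerCase().split("\\W+");
        // Edge weights: co-occurrence counts of adjacent word pairs.
        Map<String, Map<String, Integer>> graph = new HashMap<>();
        for (int i = 0; i + 1 < tokens.length; i++) {
            addEdge(graph, tokens[i], tokens[i + 1]);
            addEdge(graph, tokens[i + 1], tokens[i]);
        }
        Map<String, Double> result = new HashMap<>();
        for (Map.Entry<String, Map<String, Integer>> e : graph.entrySet()) {
            int strength = e.getValue().values().stream()
                            .mapToInt(Integer::intValue).sum();
            result.put(e.getKey(), (double) strength / e.getValue().size());
        }
        return result;
    }

    private static void addEdge(Map<String, Map<String, Integer>> g,
                                String a, String b) {
        g.computeIfAbsent(a, k -> new HashMap<>()).merge(b, 1, Integer::sum);
    }

    public static void main(String[] args) {
        // "stack" repeatedly co-occurs with "overflow", so it gets a
        // selectivity above 1.0, while one-off neighbours score 1.0.
        Map<String, Double> s =
            selectivity("stack overflow users ask stack overflow questions");
        s.entrySet().stream()
         .sorted(Map.Entry.<String, Double>comparingByValue().reversed())
         .forEach(e -> System.out.println(e.getKey() + " " + e.getValue()));
    }
}
```

Note how the score requires no linguistic knowledge, which is the property the authors credit for the method's portability across Croatian, Serbian, and English.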
Research seminar slides at URJC, June 6. Briefly: social analysis; in more detail: static analysis and co-evolution (joint work with Landman, Vinju, Muske; Businge).
In recent times, research activities in the areas of opinion and sentiment analysis in natural language texts and other media have been gaining ground under the umbrella of subjectivity analysis. The reason may be the huge amount of text data available on the Social Web in the form of news, reviews, blogs, chats, and even Twitter. Though sentiment analysis of natural language text is a multifaceted and multidisciplinary problem, in general, the term "sentiment" is used in reference to the automatic analysis of evaluative text.
Presented at: The 29th Annual International Conference on Computer Science and Software Engineering (CASCON 2019)
Date of Conference: November 4, 2019 - November 6, 2019
Conference Location: Markham, Ontario, Canada
DOI: https://dl.acm.org/doi/abs/10.5555/3370272.3370293
International Journal of Engineering Research and Development (IJERD)
IJERD Editor
How to Ask for Technical Help? Evidence-based Guidelines for Writing Question...
Fabio Calefato
Slides presenting results from our IST paper (https://arxiv.org/abs/1710.04692) / IEEE Software blog post (http://blog.ieeesoftware.org/2017/11/can-we-trust-stack-overflow-netiquette.html) investigating whether we can trust Stack Overflow netiquette for writing better questions.
Automatic Identification of Informative Code in Stack Overflow Posts
Preetha Chatterjee
Despite Stack Overflow’s popularity as a resource for solving coding problems, identifying relevant information from an individual post remains a challenge. The overload of information in a post can make it difficult for developers to identify specific and targeted code fixes. In this paper, we aim to help users identify informative code segments, once they have narrowed down their search to a post relevant to their task. Specifically, we explore natural language-based approaches to extract problematic and suggested code pairs from a post. The goal of the study is to investigate the potential of designing a browser extension to draw the readers’ attention to relevant code segments, and thus improve the experience of software engineers seeking help on Stack Overflow.
Automatically Identifying the Quality of Developer Chats for Post Hoc Use
Preetha Chatterjee
Software engineers are crowdsourcing answers to their everyday challenges on Q&A forums (e.g., Stack Overflow) and, more recently, in public chat communities such as Slack, IRC, and Gitter. Many software-related chat conversations contain valuable expert knowledge that is useful both for mining to improve programming support tools and for readers who did not participate in the original chat conversations. However, most chat platforms and communities do not contain built-in quality indicators (e.g., accepted answers, vote counts). Therefore, it is difficult to identify conversations that contain useful information for mining or reading, i.e., conversations of post hoc quality. In this paper, we investigate automatically detecting developer conversations of post hoc quality from public chat channels. We first describe an analysis of 400 developer conversations that indicates potential characteristics of post hoc quality, followed by a machine learning-based approach for automatically identifying conversations of post hoc quality. Our evaluation of 2,000 annotated Slack conversations in four programming communities (python, clojure, elm, and racket) indicates that our approach can achieve precision of 0.82, recall of 0.90, F-measure of 0.86, and MCC of 0.57. To our knowledge, this is the first automated technique for detecting developer conversations of post hoc quality.
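For readers unfamiliar with the reported evaluation metrics, here is a sketch of how precision, recall, F-measure, and MCC follow from a binary confusion matrix. The counts below are made-up illustrative numbers, not the paper's data.

```java
// Standard binary-classification metrics from confusion-matrix counts:
// tp/fp/fn/tn = true positives, false positives, false negatives, true negatives.
public class ChatQualityMetrics {
    public static double precision(int tp, int fp) { return (double) tp / (tp + fp); }
    public static double recall(int tp, int fn)    { return (double) tp / (tp + fn); }
    public static double f1(double p, double r)    { return 2 * p * r / (p + r); }

    // Matthews correlation coefficient: stays informative even when the
    // two classes are imbalanced, unlike accuracy.
    public static double mcc(int tp, int tn, int fp, int fn) {
        double num = (double) tp * tn - (double) fp * fn;
        double den = Math.sqrt((double) (tp + fp) * (tp + fn)
                             * (tn + fp) * (tn + fn));
        return den == 0 ? 0 : num / den;
    }

    public static void main(String[] args) {
        int tp = 90, fp = 20, fn = 10, tn = 40; // hypothetical counts
        double p = precision(tp, fp), r = recall(tp, fn);
        System.out.printf("precision=%.2f recall=%.2f f1=%.2f mcc=%.2f%n",
                p, r, f1(p, r), mcc(tp, tn, fp, fn));
    }
}
```

With these hypothetical counts, precision is about 0.82 and recall 0.90, while MCC lands noticeably lower, which illustrates why the paper reports MCC alongside F-measure: it penalizes errors on the minority class that F-measure largely ignores.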
Analysis of Stack Overflow posts and user data: trend analysis and predicting time-to-answer (classification) using Weka. CSCI 599 final project on social media data analytics.
Analyzing Big Data's Weakest Link (hint: it might be you)
HPCC Systems
Tim Menzies, NC State University, presents at the 2015 HPCC Systems Engineering Summit Community Day.
For Big Data applications, there is a lack of any gold standard for "good analysis" or of methods to assess our certification programs. Hence, we are still in the dark about whether or not our human analysts are making the best use possible of the tools of Big Data. While much progress has been made in the systems aspects of Big Data, certain critical human-centered aspects remain an open issue. Regardless of the sophistication of the analysis tools and environment, all that architecture can still be used incorrectly by users. If this issue were confined to a small number of inexperienced users, then it could be addressed via process improvements such as better training. But is it? What do we know about our analysts? Where are the studies that mine the people doing the data mining?
This presentation offers some preliminary results on tools that combine ECL with other methods that recognize the code generated by experienced or inexperienced developers. While the results are preliminary, they do raise the possibility that we can better characterize what it means to be experienced (or inexperienced) at Big Data applications.
All in AI: LLM Landscape & RAG in 2024 with Mark Ryan (Google) & Jerry Liu (L...Daniel Zivkovic
Serverless Toronto's 6th-anniversary event helps IT pros understand and prepare for the #GenAI tsunami ahead. You'll gain situational awareness of the LLM Landscape, receive condensed insights, and actionable advice about RAG in 2024 from Google AI Lead Mark Ryan and LlamaIndex creator Jerry Liu. We chose #RAG (Retrieval-Augmented Generation) because it is the predominant paradigm for building #LLM (Large Language Model) applications in enterprises today - and that's where the jobs will be shifting. Here is the recording: https://youtu.be/P5xd1ZjD-Os?si=iq8xibj5pJsJ62oW
LLMs in Production: Tooling, Process, and Team StructureAggregage
Join Dr. Greg Loughnane and Chris Alexiuk in this exciting webinar to learn all about the tooling, processes, and team structure you need to build and operate performant, reliable, and scalable production-grade LLM applications!
Microsoft and Revolution Analytics -- what's the add-value? 20150629Mark Tabladillo
Microsoft has been a leader in the enterprise analytics space for years. In 2014, Microsoft had already created R language functionality within Azure Machine Learning. On April 6, 2015, Microsoft and closed on a deal to acquire Revolution Analytics, a company focusing on scalable processing solutions initiated by the well-known R language. Many data science projects and initial demos do not need high-volume solutions: however, having a high-volume answer for the R language allows for planning or working toward the largest data science solutions.
This presentation describes the add-value for the Revolution Analytics acquisition. The talk covers 1) an overview of current data science technologies from Microsoft; 2) a description of the R language; 3) a brief review of the add-value for R with Azure Machine Learning, and 4) a description of the performance architecture and demo of the language constructs developed by Revolution Analytics. Most of the presentation will be focused on sections two and four. It is anticipated that these technologies will be partially if not fully integrated into SQL Server 2016.
The success of developer forums like Stack Overflow (SO) depends on the participation of users and the quality of shared knowledge. SO allows its users to suggest edits to improve the quality of the posts (e.g., questions and answers). Such posts can be rolled back to an earlier version when the current version of the post with the suggested edit does not satisfy the user. However, subjectivity bias in deciding either an edit is satisfactory or not could introduce inconsistencies in the rollback edits. For example, while a user may accept the formatting of a method name (e.g., getActivity()) as a code term, another user may reject it. Such bias in rollback edits could be detrimental and demotivating to the users whose suggested edits were rolled back. This problem is compounded due to the absence of specific guidelines and tools to support consistency across users on their rollback actions. To mitigate this problem, we investigate the inconsistencies in the rollback editing process of SO and make three contributions. First, we identify eight inconsistency types in rollback edits through a qualitative analysis of 777 rollback edits in 382 questions and 395 answers. Second, we determine the impact of the eight rollback inconsistencies by surveying 44 software developers. More than 80% of the study participants find our produced catalogue of rollback inconsistencies to be detrimental to the post quality. Third, we develop a suite of algorithms to detect the eight rollback inconsistencies. The algorithms offer more than 95% accuracy and thus can be used to automatically but reliably inform users in SO of the prevalence of inconsistencies in their suggested edits and rollback actions.
Presentation given by the Proffer team during their hackathon launch ceremony at IIT Delhi on November 10.
In partnership with NITI Aayog, Microsoft, IBM, Accel Partners, AWS, and Coinbase/Toshi. $17K+ in prizes for your Ethereum/Hyperledger projects.
Word Cloud Plus with Will and Ray PoynterRay Poynter
Ray and Will Poynter have created Word Cloud Plus, a web app for producing word clouds.
The key benefit of Word Cloud Plus is that it leverages the human brain to find and display better word clouds. Using Word Cloud Plus allows you to choose the right algorithm, choose the appropriate parameters, and configure the output.
Check out these three examples below – or try it yourself by visiting the Word Cloud Plus website: www.wordcloudplus.com
- "What is a word cloud, what are they are good for, and what they not good for?" – a discussion of what Word Clouds can and can’t do.
Visit the website here: https://wordcloudplus.com/blog/what-is-a-word-cloud-what-are-they-are-good-for-and-what-they-not-good-for
- "A letter to Santa from five countries analyzed using World Cloud Plus – a guest blog from Fastuna", using Word Cloud Plus to analyse some seasonal data. Visit the site here: https://wordcloudplus.com/blog/a-letter-to-santa-from-five-countries-analyzed-using-world-cloud-plus
- "What makes a great presenter?", Analysis via Word Cloud Plus – an example of how to use Word Cloud Plus. Read more here: https://wordcloudplus.com/blog/what-makes-a-great-presenter-analysis-via-word-cloud-plus
Exploratory Study of Slack Q&A Chats as a Mining Source for Software Engineering Tools
1. Exploratory Study of Slack Q&A Chats
as a Mining Source for
Software Engineering Tools
Preetha Chatterjee Kostadin Damevski Lori Pollock Vinay Augustine Nicholas A. Kraft
3. 8 million daily active users
Given Slack’s increased use, are Slack Q&A chats a good mining source for Software Engineering tools?
https://www.statista.com/statistics/652779/worldwide-slack-users-total-vs-paid/
[Bar chart: number of Slack users in thousands, rising from 16 to 10,000 over the years]
4. Research Questions
RQ1. How prevalent in developer Q&A chats such as Slack are the kinds of information that have been successfully mined from the Stack Overflow Q&A forum to support software engineering tools?

RQ2. Do Slack Q&A chats have characteristics that might inhibit automatic mining of information to support software engineering tools?
5. Data Sets
Community (Slack Channels)   #Conversations             Community (SO Tags)   #Posts
                             Slackauto    Slackmanual                         SOauto    SOmanual
clojurians#clojure           5,013        80             clojure              13,920    80
elmlang#beginners            7,627        80             elm                  1,019     160
elmlang#general              5,906        80             -                    -         -
pythondev#help               3,768        80             python               806,763   80
racket#general               1,579        80             racket               3,592     80
Total                        23,893       400            Total                825,294   400
Data Preparation:
• Chat Disentanglement [Elsner and Charniak 2008]
• LDA topic model
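Elsner and Charniak's disentanglement uses a trained classifier over time, speaker, and lexical features. As a minimal illustrative stand-in (not the study's actual pipeline), a time-gap heuristic can split an interleaved channel log into conversations:

```python
from datetime import datetime, timedelta

def disentangle(messages, max_gap_minutes=10):
    """Group (timestamp, author, text) messages into conversations,
    starting a new one whenever the gap since the previous message
    exceeds max_gap_minutes. A real disentangler also uses speaker
    and lexical cues; this gap heuristic is only a rough sketch."""
    conversations, current, prev_ts = [], [], None
    for ts, author, text in messages:
        if prev_ts is not None and ts - prev_ts > timedelta(minutes=max_gap_minutes):
            conversations.append(current)
            current = []
        current.append((ts, author, text))
        prev_ts = ts
    if current:
        conversations.append(current)
    return conversations
```

Topic filtering (the LDA step) would then run over the text of each recovered conversation.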
6. Research Questions
RQ1. How prevalent in developer Q&A chats such as Slack are the kinds of information that have been successfully mined from the Stack Overflow Q&A forum to support software engineering tools?

RQ2. Do Slack Q&A chats have characteristics that might inhibit automatic mining of information to support software engineering tools?
7. How has Stack Overflow been used as a mining resource?
Code:
• IDE code recommendation [DeSouza‘14, Rahman‘14, Cordeiro’12, Ponzanelli‘14,
Bacchelli‘12, Amintaber‘15]
• Automatic generation of comments [Wong’13, Rahman‘15]
API:
• Learning and recommendation of APIs [Chen’16, Rahman’16, Wang’13]
• Augmenting API documentation [Treude‘16, Subramanian ‘14, Chen’14]
Other:
• Building thesaurus of software-specific terms [Tian’14, Chen’17]
• Gender bias and emotions [Novielli’14, Morgan ’17, Ford’16]
RQ1: Prevalence of information
8. Study Measures
Measure
Document length
Code snippet count
Code snippet length
Bad code snippets
Gist links
Stack Overflow links
API mentions in code snippets
API mentions in text
RQ1: Prevalence of information
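Several of the measures above can be computed mechanically once conversations are disentangled. A hedged sketch, assuming Slack-style triple-backtick code fences and classifying links by hostname substring (API-mention detection is harder and omitted here):

```python
import re

CODE_BLOCK = re.compile(r"```(.*?)```", re.DOTALL)
URL = re.compile(r"https?://\S+")

def rq1_measures(conversation_text):
    """Compute a few of the RQ1 measures for one conversation."""
    snippets = CODE_BLOCK.findall(conversation_text)
    links = URL.findall(conversation_text)
    return {
        "document_length": len(conversation_text.split()),  # in words
        "code_snippet_count": len(snippets),
        "code_snippet_length": [len(s.split("\n")) for s in snippets],  # in lines
        "gist_links": sum("gist.github.com" in u for u in links),
        "stack_overflow_links": sum("stackoverflow.com" in u for u in links),
    }
```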
10.
Much of the information mined from Stack Overflow is also available on Slack Q&A channels.
API mentions are available in larger quantities on Slack Q&A channels.
Links are rarely available on both Slack and Stack Overflow Q&A.
Study Results
RQ1: Prevalence of information
11. Research Questions
RQ1. How prevalent in developer Q&A chats such as Slack are the kinds of information that have been successfully mined from the Stack Overflow Q&A forum to support software engineering tools?

RQ2. Do Slack Q&A chats have characteristics that might inhibit automatic mining of information to support software engineering tools?
12.
Measure
Participant count
Questions with no answer
Answer count
Indicators of accepted answers
Questions with no accepted answer
NL text context per code snippet
Incomplete sentences
Noise in document
Knowledge construction process *
* A. Zagalsky, D. M. German, M.-A. Storey, C. G. Teshima, and G. Poo-Caamaño, “How the R community creates and
curates knowledge: An extended study of Stack Overflow and mailing lists,” Empirical Software Engineering, 2017.
RQ2: Challenges of Mining Slack
Study Measures
13.
Words/Phrases: good find; Thanks for your help; cool; this works; that’s it, thanks
a bunch for the swift and adequate pointers; Ah, ya that works; thx for the info;
alright, thx; awesome; that would work; your suggestion is what I landed on; will
have a look thank you; checking it out now thanks; that what i thought; Ok; okay;
kk; maybe this is what i am searching for; handy trick; I see, I’ll give it a whirl;
thanks for the insight!; thanks for the quick response @user, that was extremely
helpful!; That’s a good idea! ; gotcha; oh, I see; Ah fair; that really helps; ah, I
think this is falling into place; that seems reasonable; Thanks for taking the time to
elaborate; Yeah, that did it; why didn’t I try that?
Emojis:
Accepted Answer Indicators
RQ2: Challenges of Mining Slack
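A mining tool could seed an acceptance detector with this lexicon. A minimal keyword pass (the patterns below cover only a subset of the phrases above; a real detector would add sentiment analysis and emoji reactions, as the study notes):

```python
import re

# A subset of the acceptance phrases observed in the study.
ACCEPT_PATTERNS = [
    r"\bthanks?\b", r"\bthank you\b", r"\bthat work(s|ed)\b",
    r"\bgot ?it\b", r"\bawesome\b", r"\bgotcha\b", r"\bthx\b",
]
ACCEPT_RE = re.compile("|".join(ACCEPT_PATTERNS), re.IGNORECASE)

def is_acceptance(utterance):
    """Flag an utterance that likely marks an accepted answer."""
    return ACCEPT_RE.search(utterance) is not None
```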
14.
Measure                              Results
Participant frequency                1 < 2 < 34
Questions with no answer             15.75%
Answer frequency                     0 < 1 < 5
Questions with no accepted answer    52.25%
NL text context per code snippet     0 < 2 < 13
Incomplete sentences                 12.63%
Noise in document                    10.5%
Knowledge construction               61.5% crowd; 38.5% participatory
RQ2: Challenges of Mining Slack
Study Results
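Count-valued measures in this table are reported as minimum < median < maximum, while percentages are reported directly. A small helper, hypothetical and purely for illustration, that renders per-conversation counts in that format:

```python
from statistics import median

def min_median_max(counts):
    """Format a list of per-conversation counts as 'min < median < max'."""
    return f"{min(counts)} < {median(counts)} < {max(counts)}"

print(min_median_max([1, 2, 2, 3, 34]))  # → 1 < 2 < 34
```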
15. Study Results
Accepted answers are available in chat conversations, but require more effort to discern.
Participatory conversations provide additional value but require deeper analysis of conversational context.
Percentages of incomplete sentences and noise are low.
RQ2: Challenges of Mining Slack
Measure                              Results
Participant frequency                1 < 2 < 34
Questions with no answer             15.75%
Answer frequency                     0 < 1 < 5
Questions with no accepted answer    52.25%
NL text context per code snippet     0 < 2 < 13
Incomplete sentences                 12.63%
Noise in document                    10.5%
Knowledge construction               61.5% crowd; 38.5% participatory
16.
P. Chatterjee, M. A. Nishi, K. Damevski, V. Augustine, L. Pollock and N. A. Kraft, "What information about code
snippets is available in different software-related documents? An exploratory study," 2017 IEEE 24th International
Conference on Software Analysis, Evolution and Reengineering (SANER), Klagenfurt, 2017, pp. 382-386.
The largest proportion of Slack Q&A conversations discuss software design.
Analyzing Types of Information in Chats
17. Related Work on Analyzing Chats
• Learn developer behaviors [Elliot’03, Shihab’09, Yu’11, Lin’16]
• Filter out off-topic discussion [Chowdhury and Hindle’15]
• Extraction of rationale [Alkadhi’17, ‘18]
• Chatbots [Lebeuf’17, Paikari’18]
18. Conclusions
Q&A chats provide, in lesser quantities, the same information as can be found in Q&A posts on Stack Overflow.
Adapting techniques and training sets can achieve high accuracy in disentangling Slack conversations.
It is feasible to apply automated mining approaches to chat conversations from Slack. However, identifying an accepted answer is non-trivial.
Future Work
Investigate linking between public Slack channels to Stack Overflow.
Mine conversations for software development insights.
Mine opinion statements available in public Slack channels.
19.
preethac@udel.edu
@PreethaChatterj
Exploratory Study of Slack Q&A Chats as a Mining Source for
Software Engineering Tools
Q&A chats provide, in lesser quantities, the same information as can be found in Q&A posts on Stack Overflow.
Adapting techniques and training sets can achieve high accuracy in disentangling Slack conversations.
It is feasible to apply automated mining approaches to chat conversations from Slack. However, identifying an accepted answer is non-trivial.
Investigate linking between public Slack channels to Stack Overflow.
Mine conversations for software development insights.
Mine opinion statements available in public Slack channels.
Conclusions
Future Work
Supported by :
• NSF grant nos. 1812968 and 1813253
• DARPA MUSE program, Air Force Research Lab contract no. FA8750-16-2-0288.
Preprint:
https://tinyurl.com/yxmown4x
Editor's Notes
Thank you. I’m Preetha Chatterjee, a PhD student at University of Delaware. Today, I will describe our work on “Exploratory Study of Slack Q&A Chats as a Mining Source for Software Engineering Tools.”
My coauthors are: Kostadin Damevski, Lori Pollock, Vinay Augustine and Nicholas Kraft.
With increased online sharing, developers are having conversations about software via online chat services. (click)
Developers use these communities to ask and answer specific development questions, with the aim of improving their own skills and helping others. Slack is currently the most popular platform which hosts many active public channels focused on software development technologies.
Over 8 million active users participate daily on Slack, and this graph shows how the number of users increased on Slack over the past few years.
Through this study we investigate given Slack’s increased use, are Slack Q&A chats a good mining source for Software Engineering tools?
For RQ1, we compare the content in Q&A-focused public chat communities (e.g., Slack) with Q&A-based discussion forums (e.g., Stack Overflow).
We explore the availability and prevalence of information in Slack that are mined from SO, which provides us with the first insight into the prospect of chat communities as a source of mining.
As a part of RQ2, we investigate the feasibility of applying automatic information extraction techniques on chat messages.
We curated a comparison data set on Slack and SO by using LDA and a modified chat disentanglement technique which was initially proposed by Elsner and Charniak. We gathered around 24k Slack conversations and 800k SO posts. Since all the measures for this study could not be computed automatically with high accuracy, we created smaller subsets of data each containing 400 conversations and posts for manual analysis.
I will first present the methodology and results of RQ1.
This slide shows a pair of examples, a conversation on Slack and a Stack Overflow post on a similar topic, to highlight their differences in form and structure. Chat conversations are transient, and as a result important information and advice are lost over time. SO is an archival resource, so developers can easily refer back to the information later. Chat is an informal communication platform where developers exchange a lot of information in a short time, while SO has more in-depth questions with well-thought-out answers. As opposed to SO, chat conversations lack a formal structure and are often interleaved.
I DON’T THINK WE HAVE TIME TO SHOW THIS SLIDE
Literature shows that code and NL text from SO have been mined by researchers for several software engineering tasks such as IDE recommendation, augmenting API documentation, and building thesauri of software-specific terms. Collectively, these prior works suggest that specific types of information embedded in software-related documents could be used in building or improving software engineering tools.
To answer RQ1, we focused on similar information that has been commonly mined in SO. Specifically, we analyzed code snippets, links to external resources, and API mentions.
We display the results primarily as box plots. Read takeaways and add:
However, most of this information is available in larger quantities on Stack Overflow.
Specifically for API mentions in text, both sources had a fairly low median occurrence, but Slack had a higher value and more variance.
Before the study, we anticipated that developers on Slack would often use links to answer questions, saving time by pointing askers to an existing information source, such as Stack Overflow. Alternatively, we expected askers to use Gist to post code prior to asking questions, in order to benefit from the clean formatting that enables the display of a larger block of code. While both of these behaviors did occur, they were fairly infrequent.
Next I will discuss the methodology and results of RQ2.
To answer RQ2, we focused on measures that could provide some insights into the form of Slack Q&A conversations (participant count, questions with no answer, answer count) and measures that could indicate challenges in automation (how participants indicate accepted answers, questions with no accepted answer, natural language text describing code snippets, incomplete sentences, noise within a document, and knowledge construction process) that suggest a need to filter. Since RQ2 investigates challenges in mining information in developer chat communications to support software engineering tools, we only computed the measures on Slack.
We observed the common words/phrases that indicate answer acceptance in Slack conversations. The most prevalent indicator is “Thanks/thank you”, followed by phrases acknowledging the participant’s help such as “okay”, ”got it”, and other positive sentiment indicators such as “this worked”, “cool”, and “great”.
Accepted answers were also commonly indicated using emojis as listed in the table.
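To make the idea concrete, the kind of lexical matching a mining tool might use to flag these acceptance cues can be sketched as follows. This is a minimal illustration, not our actual tool; the phrase and emoji lists are assumptions based on the indicators observed in the study.

```python
import re

# Hypothetical acceptance indicators; the phrases follow the study's
# observations, while the emoji codes are illustrative assumptions.
ACCEPTANCE_PHRASES = [
    "thanks", "thank you", "okay", "got it",
    "this worked", "cool", "great",
]
ACCEPTANCE_EMOJIS = [":thumbsup:", ":+1:", ":tada:"]

def is_acceptance(utterance: str) -> bool:
    """Return True if the utterance contains an acceptance indicator."""
    text = utterance.lower()
    if any(emoji in text for emoji in ACCEPTANCE_EMOJIS):
        return True
    # Word boundaries avoid matching phrases inside larger words.
    return any(re.search(r"\b" + re.escape(phrase) + r"\b", text)
               for phrase in ACCEPTANCE_PHRASES)

print(is_acceptance("Thanks, this worked!"))    # True
print(is_acceptance("Still seeing the error"))  # False
```

In practice a real classifier would need more than keyword matching (e.g., handling sarcasm or negation), which is why we point to NLP and sentiment analysis techniques next.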
Results represented as percentages are reported directly, while other results, computed as simple counts, are reported as minimum < median < maximum.
The results indicate that the proportion of incomplete sentences describing code is low (13%), and similarly that the noise within a conversation is limited (at most 11%).
2) There is a significant proportion of accepted answers available in Slack. However, an automatic mining tool needs to automatically identify the sentence in a conversation that is an answer to a question and which question it is answering. This implies that NLP techniques and sentiment analysis will most likely be needed to automatically identify and match answers with questions.
3) Nearly 40% of conversations on Slack Q&A channels were participatory, with multiple individuals working together to produce an answer to the initial question. These conversations present an additional mining challenge, since utterances form a complex dependence graph as answers are contributed and debated concurrently.
To gain insight into the semantic information, we analyzed the kinds of information provided in the conversations. Using the labels defined in our previous work, we observed that the most prevalent type of information on Slack is “Design”, which includes information on the programming language, framework, and time/space complexity of a code snippet. This aligns with the fact that the main purpose of developer Q&A chats is to ask and answer questions about alternatives for a particular task, specific to a particular language or technology.
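A first-cut way to tag utterances with such information types is a keyword heuristic, sketched below. The label names follow our coding scheme, but the keyword lists and the helper function are illustrative assumptions, not the annotation procedure actually used.

```python
# Hypothetical keyword lists per information-type label; a real scheme
# would rely on trained classifiers or manual coding rather than keywords.
LABEL_KEYWORDS = {
    "Design": ["framework", "complexity", "architecture", "pattern"],
    "API usage": ["api", "method", "parameter"],
    "Errors": ["exception", "error", "traceback"],
}

def label_utterance(text: str) -> list:
    """Return all labels whose keywords appear in the utterance."""
    text = text.lower()
    return [label for label, keywords in LABEL_KEYWORDS.items()
            if any(kw in text for kw in keywords)]

print(label_utterance("Which framework has lower time complexity here?"))
# prints: ['Design']
```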
Often the focal point of conversations are APIs, where a developer is asking experts on the channel for suggestions on API or proper idioms for API usage.
Other researchers have conducted studies analyzing chats; however, they have focused on learning developer behaviors. Chowdhury and Hindle proposed an approach to automatically filter out off-topic IRC discussions by exploiting Stack Overflow programming discussions and YouTube video comments. Alkadhi et al. examined the frequency and completeness of available rationale in chat messages, the contribution of rationale by developers, and the potential of automatic techniques for rationale extraction. Researchers have also investigated the role of chatbots in software development activities.
In summary, Q&A chats provide information similar to what can be found on Q&A forums such as Stack Overflow. Adapting existing techniques and training sets can achieve high accuracy in disentangling Slack conversations. Finally, the low percentages of noise and incomplete sentences show the feasibility of applying automatic mining approaches to extract information from Slack chats.
1) While there were few explicit links to Stack Overflow and GitHub Gists in our dataset, we believe that information is often duplicated on these platforms, and that answers on one platform can be used to complement the other. Future work includes further investigating this linking between public Slack channels and Stack Overflow.
2) Participatory Q&A conversations are available on Slack in large quantities. These conversations often provide interesting insights about various technologies and their use, incorporating various design choices. As future work, we intend to investigate mining such conversations for software development insights.
3) We also observed that developers use Slack to share opinions on best practices, APIs, or tools (e.g., API X has better design or usability than API Y). Stack Overflow explicitly forbids opinion-based content on its site. Opinions are valuable to software developers, and they could also lead to new mining opportunities for software tools. Hence, we plan to investigate the mining of opinion statements available in public Slack channels.
This concludes my talk. I will be happy to answer questions now.