Exploratory Study of Slack Q&A Chats as a Mining Source for Software Engineering Tools
1. Exploratory Study of Slack Q&A Chats as a Mining Source for Software Engineering Tools
Preetha Chatterjee, Kostadin Damevski, Lori Pollock, Vinay Augustine, Nicholas A. Kraft
3. 8 million daily active users
Given Slack’s increased use, are Slack Q&A chats a good mining source for
Software Engineering tools?
[Figure: number of Slack users over time; y-axis: number of users in thousands]
Source: https://www.statista.com/statistics/652779/worldwide-slack-users-total-vs-paid/
4. Research Questions
RQ1. How prevalent are the kinds of information that have
been successfully mined from the Stack Overflow Q&A
forum to support software engineering tools in developer
Q&A chats such as Slack?
RQ2. Do Slack Q&A chats have characteristics that might
inhibit automatic mining of information to support
software engineering tools?
5. Data Sets
Community (Slack channels)   #Conversations               Community (SO tags)   #Posts
                             Slack_auto   Slack_manual                          SO_auto   SO_manual
clojurians#clojure           5,013        80              clojure               13,920    80
elmlang#beginners            7,627        80              elm                   1,019     160
elmlang#general              5,906        80              -                     -         -
pythondev#help               3,768        80              python                806,763   80
racket#general               1,579        80              racket                3,592     80
Total                        23,893       400             Total                 825,294   400
Data Preparation:
• Chat Disentanglement [Elsner and Charniak 2008]
• LDA topic model
6. Research Questions
RQ1. How prevalent are the kinds of information that have
been successfully mined from the Stack Overflow Q&A
forum to support software engineering tools in developer
Q&A chats such as Slack?
RQ2. Do Slack Q&A chats have characteristics that might
inhibit automatic mining of information to support
software engineering tools?
7. How has Stack Overflow been used as a mining resource?
Code:
• IDE code recommendation [DeSouza’14, Rahman’14, Cordeiro’12, Ponzanelli’14, Bacchelli’12, Amintaber’15]
• Automatic generation of comments [Wong’13, Rahman’15]
API:
• Learning and recommendation of APIs [Chen’16, Rahman’16, Wang’13]
• Augmenting API documentation [Treude’16, Subramanian’14, Chen’14]
Other:
• Building a thesaurus of software-specific terms [Tian’14, Chen’17]
• Gender bias and emotions [Novielli’14, Morgan’17, Ford’16]
RQ1: Prevalence of information
8. Study Measures
Measure
Document length
Code snippet count
Code snippet length
Bad code snippets
Gist links
Stack Overflow links
API mentions in code snippets
API mentions in text
RQ1: Prevalence of information
10. Study Results
Much of the information mined from Stack Overflow is also available on Slack Q&A channels.
API mentions are available in larger quantities on Slack Q&A channels.
Links are rare on both Slack and Stack Overflow Q&A.
RQ1: Prevalence of information
11. Research Questions
RQ1. How prevalent are the kinds of information that have
been successfully mined from the Stack Overflow Q&A
forum to support software engineering tools in developer
Q&A chats such as Slack?
RQ2. Do Slack Q&A chats have characteristics that might
inhibit automatic mining of information to support
software engineering tools?
12. Study Measures
Measure
Participant count
Questions with no answer
Answer count
Indicators of accepted answers
Questions with no accepted answer
NL text context per code snippet
Incomplete sentences
Noise in document
Knowledge construction process *
* A. Zagalsky, D. M. German, M.-A. Storey, C. G. Teshima, and G. Poo-Caamaño, “How the R community creates and curates knowledge: An extended study of Stack Overflow and mailing lists,” Empirical Software Engineering, 2017.
RQ2: Challenges of Mining Slack
13. Accepted Answer Indicators
Words/Phrases: good find; Thanks for your help; cool; this works; that’s it, thanks
a bunch for the swift and adequate pointers; Ah, ya that works; thx for the info;
alright, thx; awesome; that would work; your suggestion is what I landed on; will
have a look thank you; checking it out now thanks; that what i thought; Ok; okay;
kk; maybe this is what i am searching for; handy trick; I see, I’ll give it a whirl;
thanks for the insight!; thanks for the quick response @user, that was extremely
helpful!; That’s a good idea!; gotcha; oh, I see; Ah fair; that really helps; ah, I
think this is falling into place; that seems reasonable; Thanks for taking the time to
elaborate; Yeah, that did it; why didn’t I try that?
Emojis:
RQ2: Challenges of Mining Slack
14. Study Results
Measure                             Results
Participant frequency               1 < 2 < 34
Questions with no answer            15.75%
Answer frequency                    0 < 1 < 5
Questions with no accepted answer   52.25%
NL text context per code snippet    0 < 2 < 13
Incomplete sentences                12.63%
Noise in document                   10.5%
Knowledge construction              61.5% crowd; 38.5% participatory
RQ2: Challenges of Mining Slack
15. Study Results
Accepted answers are available in chat conversations, but require more effort
to discern.
Participatory conversations provide additional value but require deeper analysis
of conversational context.
Percentages of incomplete sentences and noise are low.
RQ2: Challenges of Mining Slack
Measure                             Results
Participant frequency               1 < 2 < 34
Questions with no answer            15.75%
Answer frequency                    0 < 1 < 5
Questions with no accepted answer   52.25%
NL text context per code snippet    0 < 2 < 13
Incomplete sentences                12.63%
Noise in document                   10.5%
Knowledge construction              61.5% crowd; 38.5% participatory
16. Analyzing Types of Information in Chats
The largest proportion of Slack Q&A conversations discusses software design.
P. Chatterjee, M. A. Nishi, K. Damevski, V. Augustine, L. Pollock and N. A. Kraft, "What information about code snippets is available in different software-related documents? An exploratory study," 2017 IEEE 24th International Conference on Software Analysis, Evolution and Reengineering (SANER), Klagenfurt, 2017, pp. 382-386.
17. Related Work on Analyzing Chats
• Learn developer behaviors [Elliot’03, Shihab’09, Yu’11, Lin’16]
• Filter out off-topic discussion [Chowdhury and Hindle’15]
• Extraction of rationale [Alkadhi’17, ’18]
• Chatbots [Lebeuf’17, Paikari’18]
18. Conclusions
Q&A chats provide, in lesser quantities, the same information as can be
found in Q&A posts on Stack Overflow.
Adapting techniques and training sets can achieve high accuracy in
disentangling the Slack conversations.
It is feasible to apply automated mining approaches to chat conversations
from Slack. However, identifying an accepted answer is non-trivial.
Future Work
Investigate linking between public Slack channels and Stack Overflow.
Mine conversations for software development insights.
Mine opinion statements available in public Slack channels.
19. Exploratory Study of Slack Q&A Chats as a Mining Source for Software Engineering Tools
preethac@udel.edu
@PreethaChatterj
Conclusions
Q&A chats provide, in lesser quantities, the same information as can be found in
Q&A posts on Stack Overflow.
Adapting techniques and training sets can achieve high accuracy in disentangling
the Slack conversations.
It is feasible to apply automated mining approaches to chat conversations from
Slack. However, identifying an accepted answer is non-trivial.
Future Work
Investigate linking between public Slack channels and Stack Overflow.
Mine conversations for software development insights.
Mine opinion statements available in public Slack channels.
Supported by:
• NSF grant nos. 1812968 and 1813253
• DARPA MUSE program, Air Force Research Lab contract no. FA8750-16-2-0288
Preprint: https://tinyurl.com/yxmown4x
Editor's Notes
Thank you. I’m Preetha Chatterjee, a PhD student at the University of Delaware. Today, I will describe our work on “Exploratory Study of Slack Q&A Chats as a Mining Source for Software Engineering Tools.”
My coauthors are: Kostadin Damevski, Lori Pollock, Vinay Augustine and Nicholas Kraft.
With increased online sharing, developers are having conversations about software via online chat services. (click)
Developers use these communities to ask and answer specific development questions, with the aim of improving their own skills and helping others. Slack is currently the most popular of these platforms, and it hosts many active public channels focused on software development technologies.
Over 8 million active users participate daily on Slack, and this graph shows how the number of users increased on Slack over the past few years.
Through this study, we investigate whether, given Slack’s increased use, Slack Q&A chats are a good mining source for software engineering tools.
For RQ1, we compare the content in Q&A-focused public chat communities (e.g., Slack) with Q&A-based discussion forums (e.g., Stack Overflow).
We explore the availability and prevalence in Slack of the kinds of information that are mined from SO, which gives us a first insight into the prospects of chat communities as a mining source.
As a part of RQ2, we investigate the feasibility of applying automatic information extraction techniques on chat messages.
We curated a comparison data set from Slack and SO using LDA and a modified version of the chat disentanglement technique originally proposed by Elsner and Charniak. We gathered around 24k Slack conversations and over 800k SO posts. Since not all of the measures for this study could be computed automatically with high accuracy, we created smaller subsets of the data, containing 400 conversations and 400 posts, for manual analysis.
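As a rough illustration of how a manual-analysis subset of this shape could be drawn, the sketch below samples a fixed number of conversations per channel. It assumes uniform random sampling with a fixed seed, which the talk does not state; `sample_manual_subset` and its parameters are hypothetical names, and the 80-per-channel default simply mirrors the Data Sets slide.

```python
import random


def sample_manual_subset(conversations_by_channel, per_channel=80, seed=0):
    """Draw a reproducible random subset of conversations for each channel.

    Assumption: uniform random sampling; the study's actual selection
    procedure is not described in these notes.
    """
    rng = random.Random(seed)  # fixed seed so the subset can be re-created
    subset = {}
    # Sort channel names so the draw order does not depend on dict ordering.
    for channel in sorted(conversations_by_channel):
        items = conversations_by_channel[channel]
        k = min(per_channel, len(items))  # never ask for more than exists
        subset[channel] = rng.sample(items, k)
    return subset
```

With five channels and 80 conversations drawn from each, the subset contains 400 conversations, matching the manual data set size reported on the slide.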
I will first present the methodology and results of RQ1.
This slide shows an example Slack conversation and a Stack Overflow post on a similar topic, to highlight their differences in form and structure. Chat conversations are transient, and as a result important information and advice are lost over time. SO is an archival resource, and developers can easily refer back to the information later. Chats are an informal communication platform where developers exchange a lot of information in a short time, while SO has more in-depth questions with well-thought-out answers. Unlike SO posts, chat conversations lack a formal structure and are often interleaved.
I DON’T THINK WE HAVE TIME TO SHOW THIS SLIDE
The literature shows that code and NL text from SO have been mined by researchers for several software engineering tasks, such as IDE recommendation, augmenting API documentation, and building a thesaurus of software-specific terms. Collectively, these prior works suggest that specific types of information embedded in software-related documents could be used in building or improving software engineering tools.
To answer RQ1, we focused on similar information that has been commonly mined in SO. Specifically, we analyzed code snippets, links to external resources, and API mentions.
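A minimal sketch of how some of these measures might be computed automatically for one conversation or post is shown below. It rests on assumptions not stated in the talk: code snippets are taken to be text wrapped in triple backticks, document length is counted as words outside code, and links are recognized by simple URL patterns. `measure_document` and the regular expressions are illustrative, not the study's tooling.

```python
import re

# Assumption: multi-line code in a message is delimited by triple backticks.
CODE_BLOCK = re.compile(r"```(.*?)```", re.DOTALL)
# Assumption: links are recognized purely by their host name.
GIST_LINK = re.compile(r"https?://gist\.github\.com/\S+")
SO_LINK = re.compile(r"https?://(?:www\.)?stackoverflow\.com/\S+")


def measure_document(text):
    """Return a few RQ1-style measures for one conversation or post."""
    snippets = [s.strip() for s in CODE_BLOCK.findall(text)]
    prose = CODE_BLOCK.sub(" ", text)  # natural-language text without code
    return {
        "document_length": len(prose.split()),  # words outside code (assumed definition)
        "code_snippet_count": len(snippets),
        "code_snippet_lengths": [len(s.splitlines()) for s in snippets],  # lines per snippet
        "gist_links": len(GIST_LINK.findall(text)),
        "stack_overflow_links": len(SO_LINK.findall(text)),
    }
```

API mentions and bad code snippets are left out of the sketch, since detecting them needs language-specific analysis rather than pattern matching.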
We display the results primarily as box plots. Read the takeaways and add:
However, most of this information is available in larger quantities on Stack Overflow.
Specifically for API mentions in text, both sources had a fairly low median occurrence, but Slack had a higher value and more variance.
Before the study, we anticipated that developers on Slack would often use links to answer questions, saving time by pointing askers to an existing information source, such as Stack Overflow. Alternatively, we expected askers to use Gist to post code prior to asking questions, in order to benefit from the clean formatting that enables the display of a larger block of code. While both of these behaviors did occur, they were fairly infrequent.
Next I will discuss the methodology and results of RQ2.
To answer RQ2, we focused on two groups of measures. The first group provides insight into the form of Slack Q&A conversations: participant count, questions with no answer, and answer count. The second group indicates challenges in automation that suggest a need to filter: how participants indicate accepted answers, questions with no accepted answer, natural language text describing code snippets, incomplete sentences, noise within a document, and the knowledge construction process. Since RQ2 investigates challenges in mining information from developer chat communications to support software engineering tools, we computed these measures only on Slack.
We observed the common words and phrases that indicate answer acceptance in Slack conversations. The most prevalent indicator is “Thanks/thank you”, followed by phrases acknowledging the participant’s help such as “okay” and “got it”, and other positive sentiment indicators such as “this worked”, “cool”, and “great”.
Accepted answers were also commonly indicated using emojis as listed in the table.
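To make the idea of acceptance indicators concrete, here is a small sketch of a phrase-based detector. It is an assumption-laden simplification rather than the study's procedure: the phrase list is a hand-picked subset of the indicators mentioned in the talk, emoji indicators are left out, and the rule "the accepted answer is the nearest earlier message from someone other than the asker" is a hypothetical heuristic. The function names are illustrative.

```python
import re

# A small, hand-picked subset of the acceptance phrases listed in the talk.
ACCEPTANCE_PHRASES = [
    "thanks", "thank you", "thx", "this works", "that works", "that did it",
    "got it", "gotcha", "awesome", "cool", "okay", "ok", "kk",
]

# Word boundaries keep short indicators such as "ok" from matching inside "book".
_PATTERN = re.compile(
    r"\b(?:" + "|".join(re.escape(p) for p in ACCEPTANCE_PHRASES) + r")\b",
    re.IGNORECASE,
)


def find_acceptance_indicators(message):
    """Return the acceptance phrases found in one message, lowercased."""
    return [m.group(0).lower() for m in _PATTERN.finditer(message)]


def accepted_answer_index(conversation, asker):
    """Guess which message was accepted as the answer.

    conversation: list of (author, text) tuples in chronological order,
    where the first message is the question.
    Heuristic (an assumption): take the asker's first later message that
    contains an indicator, then pick the nearest earlier message written
    by someone else. Returns None when no such message exists.
    """
    for i, (author, text) in enumerate(conversation):
        if i == 0 or author != asker:
            continue  # skip the question itself and other participants
        if find_acceptance_indicators(text):
            for j in range(i - 1, -1, -1):
                if conversation[j][0] != asker:
                    return j
            return None
    return None
```

A keyword match like this cannot tell a polite "thanks" from a genuine acceptance, which is one reason identifying accepted answers in chats takes more effort than reading a flag on a forum post.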
Results represented as percentages are reported directly, while other results, computed as simple counts, are reported as minimum < median < maximum.
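The minimum < median < maximum notation can be produced with a few lines of Python. This is only a sketch of the reporting format; `summarize_counts` is a hypothetical helper, and any counts passed to it in an example are made up rather than taken from the study data.

```python
from statistics import median


def summarize_counts(counts):
    """Format a list of counts as 'minimum < median < maximum'."""
    if not counts:
        raise ValueError("counts must not be empty")
    med = median(counts)
    # Print whole-number medians without a trailing '.0'.
    med_text = str(int(med)) if float(med).is_integer() else str(med)
    return f"{min(counts)} < {med_text} < {max(counts)}"
```

For example, a hypothetical list of participant counts such as `[1, 2, 2, 3, 34]` is summarized as `1 < 2 < 34`.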
The results indicate that the proportion of incomplete sentences describing code is low, about 13%, and the noise in a conversation is similarly low, about 11%.
2) There is a significant proportion of accepted answers available in Slack. However, an automatic mining tool needs to automatically identify the sentence in a conversation that is an answer to a question and which question it is answering. This implies that NLP techniques and sentiment analysis will most likely be needed to automatically identify and match answers with questions.
3) Nearly 40% of conversations on Slack Q&A channels were participatory, with multiple individuals working together to produce an answer to the initial question. These conversations present an additional mining challenge, because utterances form a complex dependence graph when answers are contributed and debated concurrently.
To gain insight into the semantic information, we analyzed the kinds of information provided in the conversations. Using the labels defined in one of our previous works, we observed that the most prevalent type of information on Slack is “Design”, which includes information on the programming language, framework, and time/space complexity of the code snippet. This aligns with the fact that the main purpose of developer Q&A chats is to ask and answer questions about alternatives for a particular task, specific to a particular language or technology.
Often the focal point of a conversation is an API, where a developer asks experts on the channel for API suggestions or proper idioms for API usage.
Other researchers have conducted studies on analyzing chats. However, they have focused primarily on learning developer behaviors. Chowdhury and Hindle proposed an approach to automatically filter out off-topic IRC discussions by exploiting Stack Overflow programming discussions and YouTube video comments. Alkadhi et al. examined the frequency and completeness of available rationale in chat messages, the contribution of rationale by developers, and the potential of automatic techniques for rationale extraction. Researchers have also investigated the role of chatbots in software development activities.
In summary, Q&A chats provide information similar to what can be found on Q&A forums such as Stack Overflow. Adapting existing techniques and training sets can achieve high accuracy in disentangling the Slack conversations. And finally, the low percentages of noise and incomplete sentences show that it is feasible to apply automatic mining approaches to extract information from Slack chats.
1) While there were few explicit links to Stack Overflow and GitHub Gists in our dataset, we believe that information is often duplicated on these platforms, and that answers on one platform can be used to complement the other. Future work includes further investigating this linking between public Slack channels and Stack Overflow.
2) Participatory Q&A conversations are available on Slack in large quantities. These conversations often provide interesting insights about various technologies and their use, incorporating various design choices. As future work, we intend to investigate mining such conversations for software development insights.
3) We also observed that developers use Slack to share opinions on best practices, APIs, or tools (e.g., API X has better design or usability than API Y). Stack Overflow explicitly forbids the use of opinions on its site. Opinions are valuable to software developers, and they could also lead to new mining opportunities for software tools. Hence, we plan to investigate the mining of opinion statements available in public Slack channels.
This concludes my talk. I will be happy to answer questions now.