Automatic Identification of Informative Code in Stack Overflow Posts
Preetha Chatterjee
1st International Workshop on Natural Language-based Software Engineering (NLBSE)
Preprint: https://preethac.github.io/files/NLBSE_22.pdf
@PreethaChatterj
preethac@drexel.edu https://preethac.github.io
Stack Overflow Usage: Challenges
Information Overload
o The top 5 SO questions for the query “Differences Hashmap Hashtable Java” consist of 51 answers with a total of 6,771 words [Xu et al. 2017]
o Software engineers focus on only 27% of the code in a post to understand and apply the relevant information to their context [Chatterjee et al. 2020]
We aim to:
Extract informative code (problematic and suggested code pairs) within a Stack Overflow post
Prior techniques for helping information seekers on Stack Overflow:
 Reformulation of queries [Li et al. ‘16, Nie et al. ‘16]
 High-quality post detection [Ponzanelli et al. ‘14, Yao et al. ‘15]
 Automatic tag recommendation [Beyer and Pinzger ‘15, Saha et al. ‘13]
 Identification of cue sentences for navigation [Nadi and Treude ‘20]
Problematic and Suggested Code Pairs
Problematic and Suggested Code Pairs from Answer
Textual Patterns
change/modify <prob code> to <sugg code>
replace <prob code> with <sugg code>
instead of <prob code> add/try/use <sugg code>
switch <prob code> to <sugg code>
<sugg code> instead of <prob code>
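As a rough sketch, patterns like these could be matched with regular expressions over answer sentences. Everything below is illustrative, not the paper's implementation: the pattern set simply mirrors the five patterns above, and the `<code>` markup (how Stack Overflow's HTML wraps inline code) is an assumption about the input format:

```python
import re

# Inline code on Stack Overflow is wrapped in <code>...</code> tags.
CODE = r"<code>(.+?)</code>"

# (regex, roles of the two captured groups) -- order matters:
# more specific patterns are tried before the generic "X instead of Y".
PATTERNS = [
    (rf"(?:change|modify)\s+{CODE}\s+to\s+{CODE}", ("prob", "sugg")),
    (rf"replace\s+{CODE}\s+with\s+{CODE}", ("prob", "sugg")),
    (rf"instead of\s+{CODE}\s*,?\s*(?:add|try|use)\s+{CODE}", ("prob", "sugg")),
    (rf"switch\s+{CODE}\s+to\s+{CODE}", ("prob", "sugg")),
    (rf"{CODE}\s+instead of\s+{CODE}", ("sugg", "prob")),
]

def extract_code_pair(sentence):
    """Return (problematic_code, suggested_code) or None if no pattern matches."""
    for pattern, roles in PATTERNS:
        m = re.search(pattern, sentence, flags=re.IGNORECASE)
        if m:
            pair = dict(zip(roles, m.groups()))
            return pair["prob"], pair["sugg"]
    return None
```

For example, `extract_code_pair("change <code>map.get(k)</code> to <code>map.getOrDefault(k, 0)</code>")` yields the pair in problematic/suggested order.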
Problematic Code from Question and Suggested Code from Answer
Identification of sugg code from answer
- DECA [Di Sorbo et al. ‘15]
- Identify short imperative sentences that suggest a solution (e.g., “Perform an interactive rebase”).
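The imperative-sentence heuristic could be sketched roughly as follows. The verb list and length cutoff here are illustrative assumptions, not the ones used in the actual approach (which builds on DECA and part-of-speech information):

```python
# Hypothetical sketch: flag short sentences that begin with a base-form
# verb (imperative mood), as in "Perform an interactive rebase".
SUGGESTION_VERBS = {
    "use", "try", "add", "change", "replace", "remove", "perform",
    "call", "set", "switch", "modify", "update", "wrap",
}

def is_imperative_suggestion(sentence, max_words=10):
    """True if the sentence is short, starts with a suggestion verb,
    and has at least one following word acting as the verb's object."""
    words = sentence.strip().rstrip(".!").split()
    return (1 < len(words) <= max_words
            and words[0].lower() in SUGGESTION_VERBS)
```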
Identification of prob code from question
- Extract code elements from the sugg code.
- Determine the approximate location of the prob code.
- Measure textual similarity between the prob code in the question and the sugg code in the answer.
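These three steps might be sketched as below. The identifier-based tokenizer, the `difflib` similarity measure, and the threshold value are illustrative assumptions, not the paper's actual implementation:

```python
import re
from difflib import SequenceMatcher

def code_tokens(line):
    """Split a line of code into identifier-like tokens (code elements)."""
    return set(re.findall(r"[A-Za-z_]\w*", line))

def find_problematic_lines(question_code, suggested_line, threshold=0.5):
    """Return question-code lines that (1) share code elements with the
    suggested code and (2) exceed a textual-similarity threshold."""
    sugg_tokens = code_tokens(suggested_line)
    hits = []
    for line in question_code.splitlines():
        if not (code_tokens(line) & sugg_tokens):
            continue  # no co-occurring code elements: skip this line
        sim = SequenceMatcher(None, line.strip(), suggested_line.strip()).ratio()
        if sim >= threshold:
            hits.append(line)
    return hits
```

For instance, given question code containing `Map m = new Hashtable();` and the suggested line `Map m = new HashMap();`, the shared tokens and high character-level similarity would flag the Hashtable line as problematic code.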
Evaluation: Preliminary Survey
Q1. Focus - Which parts of the code do software engineers focus on to reduce their time in locating information?
 Lines of code in the question causing the error or exception
 Targeted fix from the suggested code snippets in the answers
Q2. Feedback - Is highlighted code helpful for users to understand the problem and choose a potential solution?
 14/25 (56%) indicated ‘Very useful’
 8/25 (32%) indicated ‘Useful’
 3/25 (12%) indicated ‘Somewhat useful’
Summary and Overview
NL-based approaches show promise to extract prob and sugg code from a post
Significance:
 improve the Q&A forum interface
 guide tools for mining forums
 improve granularity of traceability mappings involving forum posts

Editor's Notes

  • #3 SO posts often contain long threads of discussion. A previous study suggests that reading the answers of a popular SO post can take about 30 minutes! Because of this information overload, finding help on SO is difficult for software engineers. In our previous work, we found … Several techniques and tools have been developed to help information seekers on Stack Overflow; however, to our knowledge, no one has investigated how to help information seekers focus their attention on identifying relevant targeted code changes. In this paper, we aim to automatically identify informative code segments in a Stack Overflow post. Specifically, we focus on identifying the problematic and suggested code pairs from questions and answers in a post.
  • #4 We propose a set of natural language-based approaches to automatically extract problematic and suggested code pairs from Stack Overflow posts in two scenarios: (1) presence of problematic and suggested code in the same answer (e.g., Figure 1a), and (2) presence of problematic code in the question and suggested code in the answer (e.g., Figure 1b).
  • #5 When developers suggest short targeted fixes in answers, they tend to use recurrent textual patterns. These patterns consist of short phrases or words accompanied by code segments. For instance, in Figure 1a, the code suggestion (highlighted in blue) follows the pattern "change <problematic code> to <suggested code>". We observed 5 common patterns in our data set for identifying the problematic and suggested code pairs in Stack Overflow answers.
  • #6 We decompose the problem of code-pair extraction into two sub-problems. First, we identify suggested code from answers using linguistic patterns. Next, we extract the problematic code from the question using textual heuristics. We start by applying an existing tool, DECA, which was able to identify some of the instances in our data set. To improve performance, we also identify short imperative sentences that suggest a solution, i.e., sentences that start with a verb followed by its object (e.g., "Perform an interactive rebase"). Next, to identify the problematic code, we first extract a list of code elements from the previously identified suggested code in the answer. If one or more code elements found in the suggested code co-occur in the question code, we extract that line of code in the question as potential problematic code. Finally, we measure the textual similarity between each line of approximate problematic code in the question and each line of suggested code in the answer. If the similarity crosses a certain threshold, we consider that line of code to be problematic code.
  • #7 To obtain early feedback on the potential of our idea to be beneficial for software developers, we conducted a pilot survey to understand "How effective is our approach in helping developers find a relevant solution to a bug-related issue from a Stack Overflow post?". … Responses to question (1) in the survey suggest … Next, we manually annotated a set of 100 posts selected randomly from our data set. We asked …
  • #8 In summary, NL-based approaches … This technique can be leveraged to guide tools for mining forums and to improve the granularity of traceability mappings involving forum posts.