CodeInsight-SCAM2015

RECOMMENDING INSIGHTFUL COMMENTS FOR SOURCE CODE USING CROWDSOURCED KNOWLEDGE
Mohammad Masudur Rahman, Chanchal K. Roy and +Iman Keivanloo
Department of Computer Science
University of Saskatchewan, +Queen’s University, Canada
15th International Working Conference on Source Code Analysis and Manipulation (SCAM 2015)
Presented by: Jeffrey Svajlenko

CODE COMMENTS
• Programmer-readable annotations in the code
• Explain what the code does and how
• Make the code easier to understand
• Part of good coding practice
• Do not discuss issues or concerns in the code
• Do not contain experts' observations
• Such insights are required for reuse or maintenance

Android: show soft keyboard automatically when focus is on an EditText

OUTLINE OF THIS TALK
Exploratory study
CodeInsight
Empirical evaluation (for comment ranking)
User study (for comment quality)
Conclusion

EXPLORATORY STUDY
• Exp-RQ1: Do the follow-up discussions from SO contain any useful information that is likely to aid software maintenance activities (e.g., bug fixing, code quality improvement)?
• Exp-RQ2: Which API classes and methods are used in SO code examples that might encourage those insightful discussions?

EXPLORATORY DATASET
Items                 Java  Android   C#  Total
Questions               98       81  103    282
Accepted answers        98       81  103    282
Code segments          101       83  108    292
Discussion comments    276      161  269    706
• The question should be widely viewed (i.e., at least 500 times), and the answer should contain one or more code segments.
• The answer should have at least 10 discussion comments.
• The comments should be up-voted by at least 5 users.
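
The selection criteria above can be read as a simple filter. The following is a minimal sketch, not the authors' tooling: the `Question`, `Answer` and `Comment` types are hypothetical, and it assumes the thresholds are lower bounds, with the up-vote threshold applied per comment.

```python
from dataclasses import dataclass, field
from typing import List

# Thresholds taken from the slide; treating them as lower bounds is an assumption.
MIN_VIEWS = 500
MIN_COMMENTS = 10
MIN_UPVOTES = 5


@dataclass
class Comment:
    text: str
    upvotes: int


@dataclass
class Answer:
    code_segments: List[str]
    comments: List[Comment] = field(default_factory=list)


@dataclass
class Question:
    views: int
    accepted_answer: Answer


def is_candidate(question: Question) -> bool:
    """Question is widely viewed; answer has code and enough discussion."""
    answer = question.accepted_answer
    return (
        question.views >= MIN_VIEWS
        and len(answer.code_segments) >= 1
        and len(answer.comments) >= MIN_COMMENTS
    )


def eligible_comments(question: Question) -> List[Comment]:
    """Keep only the comments that reached the up-vote threshold."""
    return [c for c in question.accepted_answer.comments
            if c.upvotes >= MIN_UPVOTES]
```

Under this reading, each selected answer contributes only its well-voted comments, which would be consistent with 706 comments coming from 282 answers.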

STUDY DESIGN
• Two separate analyses, one for each RQ.
• Exp-RQ1: careful manual analysis of comment texts
  • Discover the intent behind each comment.
  • Identified 7 types of comments in the discussions.
• Exp-RQ2: topic modeling and cross-domain analysis.

STUDY FINDINGS
• Seven types of discussion comments found.
• Particularly interested in two types: Tips and Bugs & warnings.
• On average, about 22% of the comments from each domain fall into those two types, a sizable share.

CODEINSIGHT: OVERVIEW OF THE PROPOSED TECHNIQUE
• Mines insightful comments for a given code segment by exploiting crowd knowledge.
• Extracts discussion comments based on five heuristics.
• Recommends tips, bugs or concerns in the code identified by the crowd.

HEURISTICS: CAPTURING INSIGHT
• Popularity (P)
  • Insight makes a comment popular
  • Important observations are rewarded with up-votes
• Relevance (R)
  • References to relevant API methods or classes in the code
  • Based on the cosine similarity measure
• Comment Rank (CR)
  • User references (e.g., @Thomas) among the comments make them important
  • PageRank for the relative importance of each comment
• Sentiment (S)
  • Polarity-based sentiment analysis of comment texts
  • Issues and concerns are mostly associated with negativity
• Word Count (WC)
  • Very short comments rarely contain insight
  • Very long comments tend to be noisy
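
Two of these heuristics, Relevance and Comment Rank, can be sketched in a few lines. This is an illustration under stated assumptions, not the CodeInsight implementation: the tokenizer, the direction of the mention edges (a comment that writes @Thomas links to every comment authored by Thomas), the damping factor of 0.85 and the iteration count are all assumptions, and `cosine_relevance` and `comment_rank` are hypothetical names.

```python
import math
import re
from collections import Counter


def tokens(text):
    """Lowercased identifier-like tokens, e.g. API class and method names."""
    return re.findall(r"[A-Za-z_][A-Za-z0-9_]*", text.lower())


def cosine_relevance(comment, code):
    """Cosine similarity between term-frequency vectors of comment and code."""
    a, b = Counter(tokens(comment)), Counter(tokens(code))
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0


def comment_rank(comments, damping=0.85, iterations=50):
    """PageRank-style scores over the @mention graph.

    comments: list of (author, text) pairs.
    Assumption: comment i links to every other comment whose author
    is mentioned in i with '@name'.
    """
    n = len(comments)
    if n == 0:
        return []
    links = []
    for i, (_, text) in enumerate(comments):
        mentioned = {m.lower() for m in re.findall(r"@(\w+)", text)}
        links.append([j for j, (author, _) in enumerate(comments)
                      if j != i and author.lower() in mentioned])
    rank = [1.0 / n] * n
    for _ in range(iterations):
        new = [(1.0 - damping) / n] * n
        for i, outs in enumerate(links):
            # A comment with no mentions spreads its score evenly.
            targets = outs if outs else range(n)
            share = damping * rank[i] / len(targets)
            for j in targets:
                new[j] += share
        rank = new
    return rank
```

For example, with the comments `[("Thomas", "This leaks the cursor"), ("Ann", "@Thomas good catch"), ("Bob", "@Thomas agreed")]`, the first comment receives the highest score because both replies refer to its author.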

CODEINSIGHT: MINING INSIGHTFUL CODE COMMENTS FOR SOURCE CODE

EVALUATION OF CODEINSIGHT
• Two-fold evaluation: (1) comment ranking technique and (2) comment quality
• RQ1: How effective is the technique in retrieving the comments that discuss bugs, concerns and tips for improvement in the code?
• RQ2: Are the recommended comments accurate, precise and concise in describing the potential issues or troubleshooting tips?
• RQ3: Are the recommended comments useful for static analysis involving maintenance of the target code?

EMPIRICAL DATASET
• Exploratory dataset reused for the experiments
• Manually labeled (classified) comments used as gold comments
• Recommended comments matched against the gold comments
• Top 3 comments recommended from the ranking
• Two performance metrics: Recall (R) and Mean Reciprocal Rank (MRR)
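
The two metrics can be computed as below. This is a sketch using the textbook definitions, which are assumed (not confirmed by the slides) to match the ones used in the evaluation: recall is the fraction of gold comments found in the top 3, and the reciprocal rank is 1 divided by the position of the first gold comment in the top 3, or 0 if none appears. The comment IDs in the usage note are made up.

```python
def recall_at_k(recommended, gold, k=3):
    """Fraction of gold comments that appear among the top-k recommendations."""
    gold = set(gold)
    if not gold:
        return 0.0
    return len(set(recommended[:k]) & gold) / len(gold)


def reciprocal_rank(recommended, gold, k=3):
    """1 / position of the first gold comment in the top k, else 0."""
    gold = set(gold)
    for position, comment_id in enumerate(recommended[:k], start=1):
        if comment_id in gold:
            return 1.0 / position
    return 0.0


def mean_reciprocal_rank(cases, k=3):
    """Average reciprocal rank over (recommended, gold) pairs, one per code segment."""
    if not cases:
        return 0.0
    return sum(reciprocal_rank(r, g, k) for r, g in cases) / len(cases)
```

With two code segments, `[(["c2", "c7", "c1"], ["c7"]), (["c4", "c5", "c6"], ["c9"])]`, the first gold comment is at position 2 and the second is missed, so the MRR is (0.5 + 0) / 2 = 0.25.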

EMPIRICAL EVALUATION: COMMENT RANKING TECHNIQUE

Heuristics          Metric   Java             Android          C#               Average
                             C3      C4       C3      C4       C3      C4       C3      C4
{P}                 Recall   60.00%  66.67%   95.65%  87.50%   89.47%  69.44%   81.71%  74.53%
                    MRR      0.44    0.50     0.59    0.57     0.47    0.56     0.50    0.54
{P, R, CR}          Recall   66.67%  70.83%   95.65%  84.38%   84.21%  77.78%   82.18%  77.66%
                    MRR      0.60    0.29     0.73    0.56     0.44    0.50     0.59    0.45
{P, R, CR, WC, S}   Recall   60.00%  79.16%   95.65%  96.88%   94.74%  86.11%   83.46%  87.38%
                    MRR      0.44    0.32     0.55    0.52     0.33    0.45     0.44    0.43

C3 = Tips, C4 = Bugs & warnings
P = Popularity, R = Relevance, CR = Comment Rank, WC = Word Count, S = Sentiment

USER STUDY: COMMENT QUALITY
• Dataset for the study
  • Case study with 82 open-source (OS) projects from GitHub
  • 85 code segments similar to SO segments
  • Found using GitHub code search
• Study participants
  • Four professional developers from two companies
  • Specialized in mobile and web technologies
  • Professional experience of 1-2.5 years in Java, 1.5-5 years in Android and 1.5-3.5 years in C#

USER STUDY DESIGN
[Figure: user study design. Diagram labels: OS code segments (85), random selector, segments (20), CodeInsight, insightful comments, code segments, participant pool. Questions asked of participants: Accurate? Precise? Concise? Useful?]

USER STUDY: COMMENT QUALITY EVALUATION

Responses (average)            Accurate  Precise  Concise  Useful
Strongly agree / Agree           82.50%   80.83%   78.33%  79.17%
Neutral                           6.67%    7.50%   12.50%  10.00%
Disagree / Strongly disagree     10.83%   10.00%    9.17%  10.83%

THREATS TO VALIDITY
• Limited dataset: The dataset is quite limited for empirical evaluation.
• Limited participant pool: Only four professional developers were involved.
• Closed-source code: SO generally hosts code segments from open-source code. Thus, the technique might not be effective for closed-source code.

TAKE-HOME MESSAGE
• Traditional code comments do not provide insight on quality or issues in the code.
• About 22% of the discussion comments for Java, Android and C# code segments from SO discuss issues, concerns or tips.
• Heuristics such as Popularity, Relevance, Comment Rank and Sentiment were found quite successful in capturing such insight in comments.
• Professional developers confirmed that about 80% of the recommended comments are accurate and useful.

PROVOCATIVE STATEMENT
• Descriptive identifiers vs. inline code comments. Which one will you choose? Why?
