Understanding the Rationale for Updating a Function's Comment

Haroon Malik, Istehad Chowdhury, Hsiao-Ming Tsou,
Zhen Ming Jiang, Ahmed E. Hassan
School of Computing, Queen’s University, Canada

Documentation is vital for the successful evolution of a software system

Why understand the rationale for updating a comment?

Because…
• Reduce the effort needed to understand code
• Reduce maintenance cost
• Prevent bugs
• Increase reliability

Likelihood of updating a comment
Function 1.
function incrementValue($val)
{
    return $val + 1;
}
Function 2.
function processInput($val)
{
    // loop 11 times.
    for ($i = 0; $i < 10; $i++) {
        // loop executes for the upper bound of J
        for ($j = 0; $j < 10; $j++) {
            $val = ($val | $i) << 2;
            $val = $val & $j << 2;
        }
    }
    return $val;
}


Study Dimensions
• Modified function characteristics (8 attributes)
– Long vs. short functions
– Long vs. short function names
– Well-documented functions
– Complex vs. simple functions (# of control statements)
• Change characteristics (8 attributes)
– Complex vs. simple change
– Large vs. small change
• Time and code ownership characteristics (9 attributes)
– Do habits change over time? Weekdays vs. weekends
– Same developer that changed it last time

Modeled as a classification problem
Comment update? Yes or no?

Measuring Performance
                  Classified As
True Class        YES     NO
YES               a       b
NO                c       d
We measure the overall misclassification rate = (b + c) / (a + b + c + d)
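The misclassification rate above can be computed directly from the four cells of the confusion matrix. The following is a minimal sketch; the function name and the example counts are hypothetical and are not taken from the study.

```python
def misclassification_rate(a, b, c, d):
    """Overall misclassification rate from a 2x2 confusion matrix.

    a and d count the cases where the predicted class matches the true class;
    b and c count the two kinds of disagreement.
    """
    total = a + b + c + d
    if total == 0:
        raise ValueError("confusion matrix is empty")
    return (b + c) / total


# Hypothetical counts, chosen only to illustrate the arithmetic.
rate = misclassification_rate(a=45, b=5, c=10, d=40)
print(rate)  # 0.15
```

A lower value means fewer changes were placed in the wrong class, whichever direction the error went.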

Need
• Explainable model
• Resistant to noise
• Handles correlated attributes
• Minimum configuration

Random Forests
[Diagram: project comment update history → data set → random samples → random trees (each tree votes Yes or No) → vote → prediction]
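The sampling and voting steps of the Random Forests diagram can be sketched in a few lines. This is an illustration under simplifying assumptions, not the tooling used in the study: the helper names are hypothetical, and the trees themselves are left out so that only the random sampling and the vote are shown.

```python
import random
from collections import Counter


def bootstrap_sample(rows, rng):
    """Draw len(rows) rows at random, with replacement."""
    return [rng.choice(rows) for _ in rows]


def majority_vote(votes):
    """Return the label chosen by the most trees.

    On a tie, Counter.most_common keeps the label that was seen first.
    """
    return Counter(votes).most_common(1)[0][0]


# Each tree would be trained on its own random sample of the data set.
rng = random.Random(0)
sample = bootstrap_sample(list(range(5)), rng)

# Example votes from three trees: one says Yes, two say No.
tree_votes = ["Yes", "No", "No"]
prediction = majority_vote(tree_votes)
print(prediction)  # No
```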

Finding Top Attributes
• Sensitivity analysis for a particular attribute
• Randomly change the value in all samples
• Re-classify and compare performance
– The drop in performance is relative to the importance of the attribute
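The sensitivity analysis described on this slide can be sketched as follows. This is a simplified illustration: it shuffles one attribute across all rows and measures the increase in error rate, whereas a real random forest implementation typically does this on held-out samples per tree. The toy data, attribute meanings, and `predict` function are hypothetical.

```python
import random


def error_rate(predict, rows, labels):
    """Fraction of rows whose predicted label differs from the true label."""
    wrong = sum(1 for row, label in zip(rows, labels) if predict(row) != label)
    return wrong / len(rows)


def attribute_importance(predict, rows, labels, index, rng):
    """Increase in error rate after shuffling attribute `index` across rows."""
    baseline = error_rate(predict, rows, labels)
    column = [row[index] for row in rows]
    rng.shuffle(column)
    shuffled = [row[:index] + [value] + row[index + 1:]
                for row, value in zip(rows, column)]
    return error_rate(predict, shuffled, labels) - baseline


# Hypothetical toy data: [changed_dependencies, is_weekend]
rows = [[0, 0], [1, 1], [5, 0], [7, 1], [2, 0], [9, 1], [0, 1], [6, 0]]
labels = ["No", "No", "Yes", "Yes", "No", "Yes", "No", "Yes"]


def predict(row):
    """Stand-in classifier that looks only at the first attribute."""
    return "Yes" if row[0] >= 5 else "No"


for index in (0, 1):
    drop = attribute_importance(predict, rows, labels, index, random.Random(1))
    print(index, drop)
```

Because the stand-in classifier ignores the second attribute, shuffling it cannot change any prediction, so its measured importance is zero.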

Case Study
• Used 4 open source projects with over 39 years of development:
– PostgreSQL, FreeBSD, Gcluster and GCC
• Conducted 5 experiments
– 1 for each of the 3 dimensions
– 1 for all attributes of each project
– 1 for the combined attributes of all projects

Exp. #1 Characteristics of changed function
• Intuition
– Modifications to complex functions are trickier and more likely to introduce integration bugs
• Findings
– Likelihood of comment update is higher in functions
• With a large number of comments
• That are complex

Exp. #2 Characteristics of the change
• Intuition
– More extensive and complex changes will increase the probability that a comment will get updated
• Findings
– Likelihood of comment update is higher for changes
• That are bug fixes
• With a large number of changed dependencies
• Which increase the complexity of a function (control statements)

Exp. #3 Change time and code ownership
• Intuition
– To see if time has any impact on a developer's tendency to update a comment
– To highlight the relationship between a function and its developer
• Findings
– Likelihood of comment update
• Depends on the weekday: developers are reluctant to update comments on certain weekdays
• Does not depend on the developer: a non-creator of the function will update comments too

Exp. #4 All attributes
• Intuition
– To find the general trend across all attributes instead of a specific trend per dimension
• Findings
– The top attributes are consistent across projects
– The top attributes are from the changed function and change characteristics dimensions
• Number of changed dependencies
• Percentage of changed dependencies
• Total number of comments

Exp. #5 All Projects
• Intuition
– Determine the most influential attributes across all projects
• Added an extra attribute “Project Name”
• Findings
– Project name did not bubble up as an important attribute

Numbers Speak
• Performance of the classifier improves when data from all projects is combined. Overall misclassification rate ~20%

Random Forests
[Diagram: the training set is sampled into n random cases; a classification algorithm is trained on each sample, giving n classifiers; each classifier labels the test set (L1, L2, L3, …, Ln); the n labels are combined by vote into the final label L]
