4. Atomic vs. Composite Code Change
Atomic code change:
• "Fixed bug #123"
Composite code change:
• "Fixed bugs #12, #34, #56; removed duplicate code; added a feature; updated Javadoc"
• Difficult to review
• Likely to be rejected
5. Research Questions
• RQ1: Are composite code changes prevalent?
• RQ2: Can we propose an approach to improve the semantic atomicity of composite code changes?
• RQ3: Can our approach help developers better review composite code changes?
6. RQ1: Occurrence of composite code changes
• Data source
• 4 open-source Java projects
• Revisions that changed >= 2 lines of code
• Commit logs and source code were manually inspected
Project        Time period               Revisions  Avg. cLOC  Avg. files
Ant            2010/04/27 - 2012/03/05   137        26.1       2.0
Commons Math   2011/11/28 - 2012/04/12   107        84.7       3.5
Xerces         2008/11/03 - 2012/03/13   116        63.6       3.0
JFreeChart     2008/07/02 - 2010/03/30    93        144.9      4.1
Total revisions: 453
10. Approach: Partition of the Patch (Formatting)
A composite code change is modeled as a set of changed statements, which is partitioned into change-slices using three heuristics: Formatting, Dependency, and Similarity.
Heuristic 1: Formatting
• Compute both Unix diff (text differencing) and ChangeDistiller* (AST differencing) results for the change
• Changed lines identified by the textual diff but not by the AST diff are treated as formatting changes and placed in their own change-slice
*http://www.ifi.uzh.ch/seal/research/tools/changeDistiller.html
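A minimal sketch of this heuristic in Python, for illustration only (not the authors' implementation): the textual diff comes from difflib, while the set of semantically changed lines from an AST differencer such as ChangeDistiller is stubbed out.

# A minimal sketch of the formatting heuristic (illustrative only).
# `ast_changed_lines` stands in for the output of an AST differencer.
import difflib

def textual_changed_lines(before: str, after: str) -> set:
    """Line numbers (in the new version) that a textual diff reports as changed."""
    changed = set()
    matcher = difflib.SequenceMatcher(None, before.splitlines(), after.splitlines())
    for tag, _i1, _i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":
            changed.update(range(j1 + 1, j2 + 1))  # 1-based line numbers
    return changed

before = "int x = compute();\nif(x>0){ use(x); }\n"
after = "int x = compute();\nif (x > 0) {\n    use(x);\n}\n"

# Assumption: an AST diff reports no semantic change for pure reformatting.
ast_changed_lines = set()

# Lines changed textually but not semantically form the formatting change-slice.
formatting_slice = textual_changed_lines(before, after) - ast_changed_lines
print(formatting_slice)  # {2, 3, 4}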
11. Approach: Partition of the Patch (Dependency)
Heuristic 2: Dependency
• Changed statements that have a static dependency on each other belong in the same change-slice
• Inter-procedural backward static slicing, using the IBM T.J. Watson Libraries for Analysis (WALA)*, determines whether two changed statements are dependent
*http://wala.sourceforge.net/wiki/index.php/Main_Page
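A minimal sketch (in Python, for illustration; the talk's implementation is WALA-based) of how statements connected by dependency edges can be grouped into change-slices with union-find. The edges here are hypothetical stand-ins for slicing output.

# Union-find grouping: statements connected by a dependency edge end up in
# the same change-slice. Edges are hypothetical stand-ins for the output of
# inter-procedural backward static slicing (e.g., via WALA).
def partition_by_dependency(statements, dep_edges):
    parent = {s: s for s in statements}

    def find(s):
        while parent[s] != s:
            parent[s] = parent[parent[s]]  # path halving
            s = parent[s]
        return s

    for a, b in dep_edges:
        parent[find(a)] = find(b)  # union the two groups

    slices = {}
    for s in statements:
        slices.setdefault(find(s), []).append(s)
    return list(slices.values())

changed = ["stmt1", "stmt2", "stmt3", "stmt4"]
edges = [("stmt1", "stmt2"), ("stmt3", "stmt4")]  # hypothetical slicing output
print(partition_by_dependency(changed, edges))
# [['stmt1', 'stmt2'], ['stmt3', 'stmt4']]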
12. Approach: Partition of the Patch (Similarity)
Example commit: “protect array entries against corruption by returning a clone”
Heuristic 3: Similarity
• Two changed statements are related if they have the same change type (i.e., the same AST node type)
• and if the similarity of their string deltas is above a threshold (0.8, the Python library default)
[Figure: the patch before (P) and after (P’) the change]
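A minimal sketch of this heuristic using Python's difflib; the 0.8 cutoff follows the talk's note that the threshold is the Python library default, and the change-type labels and deltas below are invented for illustration, not taken from the actual patch.

# Two changes are related if they share a change type (AST node type) and
# their string deltas are similar above a threshold.
import difflib

THRESHOLD = 0.8  # per the talk: 0.8, the Python library default

def similar(change_a, change_b):
    type_a, delta_a = change_a
    type_b, delta_b = change_b
    if type_a != type_b:  # must be the same kind of AST edit
        return False
    return difflib.SequenceMatcher(None, delta_a, delta_b).ratio() >= THRESHOLD

c1 = ("RETURN_STATEMENT", "return (double[]) data.clone();")
c2 = ("RETURN_STATEMENT", "return (double[]) weights.clone();")
print(similar(c1, c2))  # True: same change type, near-identical deltas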
13. Example: Partitioning a Patch
[Figure: an example patch with added (+), deleted (-), and modified (^) code, partitioned into 3 change-slices: a formatting slice, a dependency-related slice, and two added methods with similar names]
14. Evaluation
• 78 composite code changes from the previously inspected data
• 3 human evaluators establish manual partitions for these changes
• Automatic partition results are compared to manual partitions
• Considered acceptable if it exactly matched the manual partition
Revisions by number of logical issues addressed (from RQ1):
Project        1 issue   2 issues   > 2 issues
Xerces         82%       13%        5%
Ant            92%       7%         1%
Commons Math   82%       12%        6%
JFreeChart     71%       18%        11%
16. Evaluation: Results
Automatic partitions acceptable to the human evaluators, per project:
Project        Acceptable # / Total #
Ant            8 / 11
Commons Math   10 / 19
Xerces         16 / 21
JFreeChart     20 / 27
Total          54 / 78 (69%)
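For reference, the totals above can be recomputed directly (a sanity check on the arithmetic, not part of the authors' tooling):

# Per-project (acceptable, total) counts from the table above.
results = {"Ant": (8, 11), "Commons Math": (10, 19),
           "Xerces": (16, 21), "JFreeChart": (20, 27)}
acceptable = sum(a for a, _ in results.values())
total = sum(t for _, t in results.values())
print(f"{acceptable} / {total} = {acceptable / total:.0%}")  # 54 / 78 = 69%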
17. Ant revision 943068 (24 changed LOC)
“Wrong assignment after I renamed the parameter. Unfortunately there doesn’t seem to be a testcase that catches the error.”
Later fixed in revision 943070
[Figure: one of the two change-slices after partitioning]
18. Preliminary User Study
• RQ3
• Can our automatic partition help developers better review composite changes?
• Participants
• 18 CS graduate students
• Task
• Participants review 12 composite code changes
• Answer a series of code review questions [1], e.g.,
• “What is the consequence of removing the schemaType field?”
• “What do changes in these files have in common?”
[1] Sillito et al., “Questions programmers ask during software evolution tasks,” FSE 2006
23. Related Work
• “Helping developers help themselves: Automatic decomposition of code review changesets.” Barnett et al., ICSE 2015
• The industrial perspective of composite code changes
• “The impact of tangled code changes.” Herzig and Zeller, MSR 2013
• “Filtering Noise in Mixed-Purpose Fixing Commits to Improve Defect Prediction and Localization.” Nguyen et al., ISSRE 2013
• How composite code changes affect defect prediction
24. Acknowledgment
• We thank all the students who participated in our user study.
• We especially thank Tao He and Hai Wan for their kind help in arranging the user study.
• We also thank Ananya Kanjilal for her comments on the paper draft.
Editor's Notes
The focus of this session, and of course this work, is code review, which in modern software development is typically practiced on lightweight code changes.
Characteristics of changes directly affect how code review is practiced.
Considering two types of code changes/commits/revisions.
The second change fixed 3 bugs, did some refactoring….
Which one is more acceptable from a code reviewer’s perspective?
From our previous study, we found evidence that CCC are … compared to atomic changes.
And this is the particular problem we try to tackle in this work.
Specifically, we want to answer these 3 questions.
We first investigate whether the problem of CCC is really prominent by taking a look at the revision history of these 4 open-source projects.
We inspected revisions that changed at least 2 lines of code within this time period, which results in 453 revisions in total.
We mainly inspect the commit log to see how many logical issues are addressed.
Filter criteria: multiple lines of source code changes, compilable.
Here is what we observed. Blue means the revision is atomic, that is, it addresses one single issue. And orange and grey mean that the revision addresses 2 or more than 2 issues.
Among the revisions that changed multiple source code lines, from 8% to 29%, on average 17% of patches are composite, that is, they address 2 or more logical issues.
non-trivial
So next, we propose an approach to partition a composite code change, so that in the end each subset is more semantically cohesive and hopefully easier to review.
First, a code change is modeled as a set of changed statements.
Basically, we identify relations between statements to derive the change partition. Statements in the same partition (change-slice) are related, while statements in different subsets are not.
Now the core of our approach is this black box: how do we …
We use 3 heuristics here
First, we believe that formatting changes should be on their own instead of mingled with other types of changes, such as bug fixes.
So we compute textual and AST differencing results of the change.
Changed lines that are not identified by AST differencing but appear in the textual differencing are considered formatting changes.
Second, we believe that changed statements that have static dependency should belong to the same subset.
We use WALA to perform … on changed lines so that we know whether two changes have a static dependency.
Third, we believe that changes with similar patterns are also related.
For example, these two functions both change the return of a variable in the same way. Although they don’t have a static dependency, it’s obvious that they are serving the same purpose…
So, as the 3rd heuristic, we check whether two changes have the same change type (e.g., if they both update the return statement, which we can tell from the AST node type).
If so, we check whether the similarity of their string deltas is above a threshold (0.8, the Python library default).
Here is an example of how our approach works. Before the change and after the change. Some code is added with the plus sign, some is deleted with the minus sign, others are modified with the caret sign
Based on our heuristics, this change is partitioned into 3 change-slices, marked with different colors.
First, the orange part is formatting changes
Another change-slice is the green part of the change, …
Finally, these two added methods also form a partition because their method names are similar.
Remember in RQ1, we inspected over 400 revisions and some of them address multiple logical issues?
We used these 78 revisions, marked in yellow and grey, in our evaluation.
We compare the automatic partitions produced by our approach to the manual partitions. An automatic partition for a change is considered acceptable if …
We are not claiming that the manual partition by human evaluators is the only way of partitioning a change, because code semantics are subject to …. In other words, the solution space for partitioning is potentially infinite.
However, our goal is to facilitate code review. So if our automatic partition results match a common way of manual partition, then we say it’s acceptable.
Among the 78 CCC, 54 are automatically partitioned the same way as human evaluators did.
Yet, this brings up our third research question: does this automatic partition really help code review?
We’ve observed several cases that indicate the benefits of our approach. Here is one of them.
Our approach partitions an Ant revision, which changes 24 LOC, into 2 subsets. One subset pinpoints a buggy area, which was later fixed by developers.
Specifically, developers said “no testcase…”
We think that our patch partition may help make this bug more obvious and identifiable from a big, composite code change.
We also conduct a user study to investigate…?
We recruit … to review 12 composite … from our evaluation data and answer
There are 2 treatments.
The control condition is that participants review … by file, as they usually do
The experimental condition is that .. by partition
We analyze the correctness of their answers and the time they spent coming up with the answers.
We found that the group using partitions to review patches performs significantly better than the control group in answering questions correctly, although they spend a slightly, but not significantly, longer time.
This result is also promising in showing that our approach potentially helps the patch review process.
One-tailed Paired-samples t-test
Time: no significant difference. Participants even spent a slightly longer time under the by-slice treatment. We conjecture that such a time increase mainly resulted from participants’ additional learning cost for using the unfamiliar “partitions” to understand patches.
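For illustration, the one-tailed paired-samples t-test named above could be run in Python with SciPy as follows; the correctness scores below are made-up placeholders, since the study's raw data are not in the slides.

# One-tailed paired-samples t-test on per-participant correctness scores.
# The numbers are hypothetical, not the study's data.
from scipy import stats

by_file = [0.55, 0.60, 0.50, 0.65, 0.58]   # control: review by file
by_slice = [0.70, 0.72, 0.61, 0.80, 0.69]  # treatment: review by change-slice

t, p_two_tailed = stats.ttest_rel(by_slice, by_file)
# ttest_rel is two-tailed; halve p when the effect is in the predicted direction.
p_one_tailed = p_two_tailed / 2 if t > 0 else 1 - p_two_tailed / 2
print(f"t = {t:.2f}, one-tailed p = {p_one_tailed:.3f}")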
69% composite changes are automatically partitioned the same way as human evaluators did
Our user study shows that participants’ code review effectiveness is significantly improved.
Finally, I would like to bring up these issues that are highly relevant and important, yet we haven’t been able to address them in this work.
For example ….
I think these would be interesting to discuss, and I’ll leave them for the discussion slot.
Cost: learning cost, execution cost
Hypothesis: size and complexity