Mining Source Code Improvement Patterns from Similar Code Review Works

Yuki Ueda (1), Takashi Ishio (1), Akinori Ihara (2), Kenichi Matsumoto (1)
(1) Nara Institute of Science and Technology
(2) Wakayama University
13th International Workshop on Software Clones (IWSC’19)

Background Approach Result Summary
Contents
• Goal: Reduce code review cost
• Approach: Detect code improvement patterns that appeared in reviews
• Evaluation: Measure the patterns' frequency and accuracy

Code review process:
Reviewers suggest code fixes
(1) Submit: the author submits a patch to the project
(2) Review, fix suggestion: the reviewer replies "You should fix"
(3) Integrate: the reviewed patch is integrated into the project
Pre-Review Patch (Initial Patch):
- i=key
+ i=dic["key"]
Reviewed Patch (Integrated Patch):
- i=key
+ i_=_dic["KEY"]

Problem:
Reviewers need to check a patch several times
(1) Submit
(2) to (n) Review, fix suggestion: the reviewer comments "String should be lower", "Waste space"
(n) Integrate
Pre-Review Patch (Initial Patch):
- i=key
+ i=dic["key"]
Reviewed Patch (Integrated Patch):
- i=key
+ i_=_dic["KEY"]

Goal
Automatically reduce similar reviews
(1) Submit: the author submits a patch to the Auto Review System
(2) Review, fix suggestion: the system replies "A similar patch was fixed in the past like.."
(3) Review request: the patch is passed on to the reviewer

Approach:
Detect Patterns from Reviewed Patch Diffs
Dataset:
Pre-Review Patch: i=dic["key"]
Reviewed Patch: i=dic["KEY"]
Detect:
Pattern: If a patch has "key", it will be "KEY"
Use:
The patch author submits print("key"); the Auto Review System suggests print("KEY")
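The "Use" step can be sketched as follows, assuming a pattern is stored as a simple pair of an old token and a new token. The tokenizer `TOKEN_RE`, the `PATTERNS` table, and `suggest_fix` are hypothetical names chosen for illustration; the slides do not describe how the Auto Review System stores or applies its patterns.

```python
import re

# Hypothetical pattern store: token seen in a pre-review patch -> token in the reviewed patch.
PATTERNS = {'"key"': '"KEY"'}

# Split a line into string literals, words, whitespace runs, and single punctuation characters.
TOKEN_RE = re.compile(r'"[^"]*"|\w+|\s+|[^\w\s]')

def suggest_fix(line, patterns=PATTERNS):
    """Return the suggested replacement line, or None when no pattern applies."""
    tokens = TOKEN_RE.findall(line)
    if not any(tok in patterns for tok in tokens):
        return None
    # Rebuild the line, swapping every token that a pattern covers.
    return "".join(patterns.get(tok, tok) for tok in tokens)

print(suggest_fix('print("key")'))  # print("KEY")
print(suggest_fix('print(value)'))  # None
```

Matching on whole tokens rather than raw substrings keeps an identifier such as `keys` from being rewritten by accident.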

Detect Code Improvement Patterns (1/2):
Divide the Patch Diff into Chunks
Patch diff:
- if i␣==␣0:
+ if i==0:
  break
- i=dic["key"]
+ i=dic.get("key")
Chunk:
- if i␣==␣0:
+ if i==0:
Chunk:
- i=dic["key"]
+ i=dic.get("key")
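One way to read the chunking step: each contiguous run of removed (-) and added (+) lines forms a chunk, and an unchanged line closes it. The sketch below follows that reading. `split_into_chunks` is a hypothetical helper, and it assumes diff headers such as `---` and `+++` have already been stripped.

```python
def split_into_chunks(diff_lines):
    """Group contiguous '-'/'+' lines; an unchanged line ends the current chunk."""
    chunks, current = [], []
    for line in diff_lines:
        if line.startswith(("-", "+")):
            current.append(line)
        elif current:
            chunks.append(current)
            current = []
    if current:  # flush a chunk that runs to the end of the diff
        chunks.append(current)
    return chunks

diff = [
    "- if i == 0:",
    "+ if i==0:",
    "      break",
    '- i=dic["key"]',
    '+ i=dic.get("key")',
]
for chunk in split_into_chunks(diff):
    print(chunk)
```

For the diff on the slide this yields two chunks, one for the `if` line and one for the `dic` line.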

Get Patterns by Sequential Pattern Mining
[Figure: the chunk "- i=dic["key"]" is split into a sequence of tokens (i=dic, - [, + .get(, "key", - ], + )), and candidate patterns of increasing length are listed, e.g. (i=dic), (i=dic, - [), (i=dic, - [, + .get().]
Keep frequently appearing and longer patterns
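A naive sketch of the idea, not the mining algorithm used in the study (the slides do not name one): enumerate ordered token subsequences, count how many chunks contain each, keep those that reach a minimum support, and drop any pattern that a longer pattern with the same support already contains. The names `mine_patterns`, `is_subsequence`, `min_support`, and `max_len` are assumptions, and the token notation (`-[`, `+.get(`) is a simplification of the slide's.

```python
from collections import Counter
from itertools import combinations

def is_subsequence(short, long):
    """True if `short` occurs in `long` in order (gaps allowed)."""
    it = iter(long)
    return all(tok in it for tok in short)

def mine_patterns(sequences, min_support, max_len=4):
    """Naive frequent-subsequence mining; exponential, for small inputs only."""
    counts = Counter()
    for seq in sequences:
        candidates = set()
        for n in range(1, min(max_len, len(seq)) + 1):
            for idx in combinations(range(len(seq)), n):
                candidates.add(tuple(seq[i] for i in idx))
        counts.update(candidates)  # each chunk supports a pattern at most once
    frequent = {p: c for p, c in counts.items() if c >= min_support}
    # Prefer longer patterns: drop one when a longer pattern with equal support contains it.
    return {
        p: c for p, c in frequent.items()
        if not any(len(q) > len(p) and frequent[q] == c and is_subsequence(p, q)
                   for q in frequent)
    }

chunks = [
    ["i=dic", "-[", "+.get(", '"key"', "-]", "+)"],
    ["x=cfg", "-[", "+.get(", '"name"', "-]", "+)"],
]
print(mine_patterns(chunks, min_support=2))
```

With these two example chunks, only the four-token pattern shared by both survives; its shorter sub-patterns are dropped because they add no information.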

Pattern Evaluation
e.g. Pattern: i=dic  + .get(  - ]  + )
Appeared Time: number of patch pairs whose Pre-Reviewed Patch has the pattern's unchanged and removed tokens (i=dic, ])
e.g. Pre-Reviewed Patch:
i=dic["key"]: Count
i=dic["KEY"]: Count
i=dic.get("key"): NOT Count
Accuracy: ratio of those patch pairs whose Reviewed Patch has the pattern's added tokens (.get(, ))
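The two measures can be sketched as below, under one reading of the slide: a pair counts toward appeared time when its pre-review tokens contain the pattern's pre-review tokens in order, and it counts as a hit when the reviewed tokens also contain the pattern's added tokens. `evaluate_pattern`, `has_tokens`, and the token lists are hypothetical.

```python
def has_tokens(tokens, required):
    """True if `required` occurs in `tokens` in order (gaps allowed)."""
    it = iter(tokens)
    return all(tok in it for tok in required)

def evaluate_pattern(pre_tokens, post_tokens, patch_pairs):
    """Return (appeared_time, accuracy) for one pattern over (pre, reviewed) token pairs."""
    appeared = 0
    fixed = 0
    for pre, post in patch_pairs:
        if has_tokens(pre, pre_tokens):
            appeared += 1
            if has_tokens(post, post_tokens):
                fixed += 1
    accuracy = fixed / appeared if appeared else 0.0
    return appeared, accuracy

pairs = [
    (["i=dic", "[", '"key"', "]"], ["i=dic", ".get(", '"key"', ")"]),      # counted, fixed as predicted
    (["i=dic", "[", '"KEY"', "]"], ["i=dic", "[", '"KEY"', "]"]),          # counted, not fixed
    (["i=dic", ".get(", '"key"', ")"], ["i=dic", ".get(", '"key"', ")"]),  # not counted
]
print(evaluate_pattern(["i=dic", "]"], [".get(", ")"], pairs))  # (2, 0.5)
```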

Target
Project: OpenStack
Language: Python3
Time Period: 2011-2016
# Patches: 173,749
# Chunks for Detecting Patterns: 555,050
# Chunks for Evaluating Patterns: 61,673

8 Frequently Appeared Patterns
- self.stub_out() + self.stubs.Set()
Why?: Support for OpenStack's library dependency changes
- assertEquals() + assertEqual()
- xrange() + range()
Why?: Support for Python 2 to 3 changes
- assertTrue(x in array)
+ assertIn(x, array)
Why?: Improve readability
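The readability argument for `assertIn` shows up most clearly in failure messages. The sketch below uses only the standard `unittest` module; `failure_message` is a hypothetical helper for capturing the message.

```python
import unittest

def failure_message(assertion, *args):
    """Run a unittest assertion and return its failure message (None if it passes)."""
    try:
        assertion(*args)
    except AssertionError as exc:
        return str(exc)
    return None

case = unittest.TestCase()
array = [1, 2, 3]

print(failure_message(case.assertTrue, 4 in array))  # False is not true
print(failure_message(case.assertIn, 4, array))      # 4 not found in [1, 2, 3]
```

The `assertTrue` form only reports that the expression was false, while `assertIn` names both the missing element and the container.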

Thresholds:
Appeared time > 300
Accuracy > 10%
Total 8 patterns
Cover: 32.3% (19,940 / 61,673) similar patches
Accuracy: 45.9%
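The threshold step and the coverage figure can be checked with a few lines. The per-pattern statistics below are made-up placeholders rather than results from the study, and `select_patterns` is a hypothetical helper; only the thresholds and the 19,940 / 61,673 ratio come from the slide.

```python
def select_patterns(stats, min_appeared=300, min_accuracy=0.10):
    """Keep patterns whose appeared time and accuracy both exceed the thresholds."""
    return [name for name, (appeared, accuracy) in stats.items()
            if appeared > min_appeared and accuracy > min_accuracy]

# Placeholder statistics: name -> (appeared time, accuracy). Not data from the study.
stats = {
    "pattern_a": (1200, 0.52),
    "pattern_b": (350, 0.05),   # frequent but rarely the actual fix
    "pattern_c": (120, 0.80),   # accurate but too rare
}
print(select_patterns(stats))  # ['pattern_a']

# Coverage reported on the slide: covered chunks / chunks used for evaluation.
coverage = 19_940 / 61_673
print(f"{coverage:.1%}")  # 32.3%
```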

Patterns are discussed on StackOverflow
- assertEquals() + assertEqual()
- xrange() + range()
Why?: Support for Python 2 to 3 changes
- assertTrue(x in array)
+ assertIn(x, array)
- self.stub_out() + self.stubs.Set()

For Automatic Code Review:
Work as a GitHub Bot
Bot to patch author: "I fixed"
Reviewer: "OK"
Sample URL: https://github.com/Ikuyadeu/ExtentionTest/pull/9

vs Other Tool (1 / 2)
Static Analysis Tool (pylint)
FOO=0 foo_=_0
Warnings: "Bad name", "Waste space"
Fix based on Language
Other tools: ESLint, PMD, Checkstyle
This research:
Project-specific changes
- self.stub_out() + self.stubs.Set() (Old library dependency)
- xrange() + range() (Language definition)

IntelliCode:
Choose best rule set from large rule set
• Invalid-name
• Bad-continuation
• Wrong-import-order
Chosen rule set:
• Invalid-name
This Study:
Find NEW pattern set from history
• disk2disk_api
• stubs.Set2stub_out
• assert-equals2equal
Support project-specific problems
Support changes of environment

Future Work
• Which pattern should the bot choose?
  ✓ Most frequently appearing pattern, highest-accuracy pattern
• Compare with patterns from other projects and languages
• Evaluate by submitting pull requests and measuring the ratio of accepted to submitted pull requests
