Yuki Ueda, Akinori Ihara, Takashi Ishio, and Kenichi Matsumoto, "Impact of Coding Style Checker on Code Review -A case study on the OpenStack projects-", In Proc. The 9th International Workshop on Empirical Software Engineering in Practice (IWESEP’18), 2018
Impact of Coding Style Checker on Code Review -A case study on the OpenStack projects-
1. Impact of Coding Style Checker on Code Review -A case study on the OpenStack projects-
Yuki Ueda (1), Akinori Ihara (2), Takashi Ishio (1), Kenichi Matsumoto (1)
(1) Nara Institute of Science and Technology
(2) Wakayama University
The 9th International Workshop on Empirical Software Engineering in Practice
2. Background Approach Results Future Work
Background
Code review process: reviewers detect potential issues
[Figure: the patch author submits an initial patch to the project; the reviewer reviews it and sends a fix request: "It can be better".]
Initial patch:
- i=key
+ i=dic["key"]
4. Background: Code improvement is important [1]
[1] Alberto Bacchelli and Christian Bird. Expectations, outcomes, and challenges of modern code review. In Proc. ICSE’13, pp. 712–721
6. Background: Coding style checkers are used to reduce the review cost
[Figure: the patch passes from the patch author to the checker and then to the reviewer; each of them requests fixes.]
ADI (Automatically Detected Issues): detected by the checker
MDI (Manually Detected Issues): detected by the reviewer
7. Background: Coding style checkers detect style issues automatically
ADI (Automatically Detected Issues): example of pylint on Python
Left (with issues):
FOO=0
print("var = " + var)
Right (fixed):
foo = 0
print("var =", var)
Detected issues: invalid name, format for string, space deficiency
Other tools: ESLint, PMD, Checkstyle
8. Background: Goal
Understand the impact of adopting checkers on patch authors' style
[Figure: the patch author submits i=dic["key"]; after the checker's fix request it becomes i = dic["key"]; after the reviewer's fix request it becomes i = dic.get("key").]
9. Approach: Hypothesis
Patch authors avoid ADIs/MDIs before submitting to the review
[Figure: the patch author submits i=dic["key"]; the checker and the reviewer each request a fix.]
ADI (Automatically Detected Issues): i = dic["key"]
MDI (Manually Detected Issues): i = dic.get("key")
[Filtered to only fixes with small changes]
10. Approach: Classify MDIs by changed tokens
MDI (Manually Detected Issues)
[Figure: the tokens of print("String") and if (i == 0){ are labeled by type: String, Alphabet, Number.]
11. Approach: How often do patch authors repeatedly introduce ADIs/MDIs in future patch submissions?
[Figure: the patch author submits i=dic["key"]; the checker's fix (ADI) gives i = dic["key"] and the reviewer's fix (MDI) gives i = dic.get("key"). Count target: the same issue, i=dic["key"], appearing again in later submissions.]
12. Approach: Three steps
Step 1: Diff of initial and integrated patches
Step 2: Detected ADI/MDI
Step 3: ADI/MDI count by each patch author
[Figure: the patch author submits the initial patch; after review and fix requests, the integrated patch is merged into the project.]
Initial patch:
- i=key
+ i=dic["key"]
Integrated patch:
- i=key
+ i=dic.get("key")
Diff of initial and integrated patches:
- i=dic["key"]
+ i=dic.get("key")
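The first step, taking the diff between the initial and the integrated patch, can be sketched as follows. This is a minimal illustration using Python's standard difflib module, not the tooling used in the study; the function name review_diff and the sample lines are assumptions made for this example.

```python
import difflib


def review_diff(initial_lines, integrated_lines):
    """Return (removed, added) lines that changed between two patch versions.

    Hypothetical helper: lines only in the initial patch are "removed",
    lines only in the integrated patch are "added".
    """
    removed, added = [], []
    for line in difflib.ndiff(initial_lines, integrated_lines):
        if line.startswith("- "):
            removed.append(line[2:])
        elif line.startswith("+ "):
            added.append(line[2:])
        # Lines starting with "  " (unchanged) or "? " (hints) are skipped.
    return removed, added


# Sample content of the two patch versions (illustrative only).
initial = ["x = 1", 'i=dic["key"]']
integrated = ["x = 1", 'i=dic.get("key")']
removed, added = review_diff(initial, integrated)
```

Only the line that changed through review is returned, which is the input for the detection step.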
13. Approach: Detected ADI (run checker, then count)
Diff of initial and integrated patches:
- FOO = 0
+ foo = 0
………
- bar = 0
+ bar = 1
Change reported by the checker:
- FOO = 0
+ foo = 0
ADI Count
invalid-name 1
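The ADI step (run the checker on the changed lines, then count) could look roughly like the sketch below. The study uses pylint; to keep the example self-contained, a toy regular-expression check stands in for pylint's invalid-name message. The names toy_invalid_name and count_adis, and the rule that an ADI is counted when the old line is flagged and the new line is not, are assumptions for illustration.

```python
import re
from collections import Counter

# Toy stand-in for a style checker rule: assigned names must be snake_case.
SNAKE_CASE = re.compile(r"^[a-z_][a-z0-9_]*$")
ASSIGNMENT = re.compile(r"^\s*([A-Za-z_][A-Za-z0-9_]*)\s*=[^=]")


def toy_invalid_name(line):
    """Return True if the line assigns to a name that is not snake_case."""
    m = ASSIGNMENT.match(line)
    return bool(m) and not SNAKE_CASE.match(m.group(1))


def count_adis(changed_pairs):
    """Count issues flagged before the review fix and no longer flagged after it."""
    counts = Counter()
    for before, after in changed_pairs:
        if toy_invalid_name(before) and not toy_invalid_name(after):
            counts["invalid-name"] += 1
    return counts


# (before, after) line pairs taken from the slide's example diff.
pairs = [("FOO = 0", "foo = 0"), ("bar = 0", "bar = 1")]
adi_counts = count_adis(pairs)
```

With the slide's example, only the FOO to foo change is counted, giving an invalid-name count of 1.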
14. Approach: Detected MDI (identify token type, then count)
Diff of initial and integrated patches:
- FOO = 0
+ foo = 0
………
- bar = 0
+ bar = 1
Changed tokens and their types:
- FOO
+ foo
Alphabet
- 0
+ 1
Number
MDI Count
Alphabet 1
Number 1
ADI Count
invalid-name 1
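The MDI step (extract the changed tokens, identify their type, count) can be sketched as below. This is an assumption-laden illustration, not the study's implementation: the tokenizer regex, the function names, and the "symbol" category are hypothetical; the slides and notes name only alphabet, string, number, and space among the six categories.

```python
import difflib
import re
from collections import Counter

# Simplified tokenizer: string literals, identifiers, numbers, whitespace, other.
TOKEN = re.compile(r'"[^"]*"|[A-Za-z_]\w*|\d+|\s+|.')


def token_type(token):
    """Map a token to a coarse category (category set is illustrative)."""
    if token.startswith('"'):
        return "string"
    if token[0].isalpha() or token[0] == "_":
        return "alphabet"
    if token.isdigit():
        return "number"
    if token.isspace():
        return "space"
    return "symbol"


def count_mdis(changed_pairs):
    """Count changed tokens by type across (before, after) line pairs."""
    counts = Counter()
    for before, after in changed_pairs:
        a, b = TOKEN.findall(before), TOKEN.findall(after)
        matcher = difflib.SequenceMatcher(a=a, b=b, autojunk=False)
        for tag, i1, i2, j1, j2 in matcher.get_opcodes():
            if tag == "equal":
                continue
            # Classify the new tokens; for pure deletions, the removed ones.
            changed = a[i1:i2] if tag == "delete" else b[j1:j2]
            counts.update(token_type(t) for t in changed)
    return counts


pairs = [("FOO = 0", "foo = 0"), ("bar = 0", "bar = 1")]
mdi_counts = count_mdis(pairs)
```

For the slide's example this yields one alphabet change (FOO to foo) and one number change (0 to 1).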
15. Approach: ADI/MDI count by each patch author
Author A   MDI Count: Alphabet 5, Number 1
Author B   MDI Count: Alphabet 2, String 2
Author B   MDI Count: Alphabet 3, Space 4
[Chart: Alphabet count for authors A, B, and C]
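The final step, counting detected issues per patch author, amounts to a grouped tally. A minimal sketch, assuming each detection is recorded as an (author, issue type) pair; the function name and the sample records are hypothetical and only mirror the numbers shown for authors A and B.

```python
from collections import Counter, defaultdict


def count_by_author(detections):
    """Tally detected issue types per patch author."""
    per_author = defaultdict(Counter)
    for author, issue in detections:
        per_author[author][issue] += 1
    return per_author


# Illustrative records: one entry per detected issue.
detections = (
    [("A", "alphabet")] * 5
    + [("A", "number")]
    + [("B", "alphabet")] * 2
    + [("B", "string")] * 2
)
per_author = count_by_author(detections)
```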
17. Results: Frequently fixed ADIs
invalid-name:
- FOO = 0
- camelCase = 1
+ foo = 0
+ snake_case = 1
bad-continuation:
  if foo > 0 and
- foo < 1:
+    foo < 1:
unused-argument:
  def foo(x, y):
-     return x
+     return x + y
18. Results: Coding style checkers are effective for improving patch authors’ coding style
The ratio (%) of patch authors who introduced each of ADIs/MDIs n times

n (times introduced    ADI (%)                           MDI (%)
by each author)        invalid-name   bad-continuation   alphabet   strings
1                      36.6           44.9               8.7        9.5
2                      21.0           22.2               9.3        10.1
3                      11.9           11.4               7.3        7.7
4                      7.2            5.7                5.9        6.3
5                      5.0            4.4                5.1        5.4
>5                     18.3           11.5               63.7       61.0
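A table of this kind can be derived from per-author counts by bucketing authors on how many times they introduced a given issue. The sketch below is one way to do it, under the assumption that the input maps each author to a count for a single issue type; the function name, the cap parameter, and the sample data are hypothetical.

```python
from collections import Counter


def introduction_ratio(times_by_author, cap=5):
    """Return the percentage of authors who introduced an issue n times.

    Counts above `cap` are merged into one ">cap" bucket, as in the table.
    """
    buckets = Counter()
    for n in times_by_author.values():
        if n <= 0:
            continue  # Authors who never introduced the issue are excluded.
        buckets[str(n) if n <= cap else f">{cap}"] += 1
    total = sum(buckets.values())
    if total == 0:
        return {}
    return {k: round(100.0 * v / total, 1) for k, v in buckets.items()}


# Illustrative data: how often each author introduced one issue type.
times_by_author = {"a": 1, "b": 1, "c": 2, "d": 7}
ratio = introduction_ratio(times_by_author)
```

In this toy data set, half of the authors introduced the issue once, a quarter twice, and a quarter more than five times.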
19. Results: Coding style checkers are effective for improving patch authors’ coding style
(Same table as slide 18: the ratio (%) of patch authors who introduced each of ADIs/MDIs n times)
Most patch authors will not introduce the same ADI more than 3 times
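The claim can be checked against the table by summing the rows for n = 1, 2, and 3 in each column. The short calculation below uses only the percentages shown in the table; the variable names are chosen for this example.

```python
# Percentages for n = 1, 2, 3 from the table, per issue type.
table_first_three_rows = {
    "invalid-name": [36.6, 21.0, 11.9],
    "bad-continuation": [44.9, 22.2, 11.4],
    "alphabet": [8.7, 9.3, 7.3],
    "strings": [9.5, 10.1, 7.7],
}

# Share of authors who introduced each issue at most 3 times.
at_most_three = {
    issue: round(sum(rows), 1) for issue, rows in table_first_three_rows.items()
}
```

About 69.5% (invalid-name) and 78.5% (bad-continuation) of authors introduced the ADI at most 3 times, compared with about 25.3% (alphabet) and 27.3% (strings) for the MDIs.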
20. Results: Coding style checkers are effective for improving patch authors’ coding style
(Same table as slide 18: the ratio (%) of patch authors who introduced each of ADIs/MDIs n times)
Code reviewers should carefully verify issues that cannot be detected automatically
21. Results: Impact
Automation of MDIs will reduce the review cost
[Figure: the patch passes from the patch author to the checker and then to the reviewer; the reviewer repeatedly sends "Review, Fix Coding Style" requests.]
22. Future Work: Patterns of MDIs
Some MDIs can be automatically fixed based on patterns
- i=dic["key"]
+ i=dic.get("key")
Why fixed?: To avoid "KeyError" and get a "None" value
- assertEqual(x,None)
+ assertIsNone(x)
Why fixed?: To make it work with Python custom classes
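As a rough illustration of what pattern-based fixing of these two MDIs might look like, the sketch below rewrites lines with regular expressions. It is a hypothetical example, not a tool from the study: the patterns and the function name fix_line are assumptions, and a real fixer would need to work on the syntax tree rather than on text.

```python
import re

# Hypothetical patterns for the two MDI examples on the slide.
SUBSCRIPT_READ = re.compile(r'(\w+)\[("[^"]*")\]')
ASSERT_EQ_NONE = re.compile(r"assertEqual\(\s*([^,()]+?)\s*,\s*None\s*\)")


def fix_line(line):
    """Apply the two example rewrites to a single source line."""
    line = ASSERT_EQ_NONE.sub(r"assertIsNone(\1)", line)
    # Rewrite dictionary reads only on the right-hand side of an assignment,
    # so that assignment targets such as dic["key"] = 1 are left untouched.
    target, sep, value = line.partition("=")
    if sep and not value.startswith("="):
        line = target + sep + SUBSCRIPT_READ.sub(r"\1.get(\2)", value)
    return line


fixed_read = fix_line('i=dic["key"]')
fixed_assert = fix_line("self.assertEqual(x,None)")
unchanged_write = fix_line('dic["key"] = 1')

# The behavioral difference that motivates the first fix:
missing = {}.get("key")  # None instead of raising KeyError
```

Note that the first rewrite changes behavior on purpose: a missing key yields None rather than a KeyError, so it is only appropriate where None is an acceptable value.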
Hello, I’m Yuki Ueda, a master’s course student at Nara Institute of Science and Technology in Japan.
Today, I’d like to talk about “Impact of Coding Style Checker on Code Review -A case study on the OpenStack projects-”.
In software development, many patch authors submit source code change requests, which are called patches.
We define the first submitted patch as the initial patch.
However, code changes aren’t always merged into the repository,
because a code reviewer may point out issues with the code, such as bugs and poor readability.
Then, the reviewer sends change requests to the patch author, and the author fixes them.
After the patch has been fixed several times, it may finally be merged into the repository.
We define this final merged patch as the integrated patch.
According to research on the purposes of code review,
39% of developers said code improvement is important.
We focus on whether concrete source code is improved or not.
A problem with this review process is that, when reviewers need to check patches several times, each reviewer spends 6 hours per week on review.
This is because the patch author and the reviewer cannot detect all problems at once.
To illustrate the problem, I show an example of a change made in review.
In this example, the source code on the left side is not so bad.
However, to add the missing spaces and avoid a small error, the reviewer has to request fixes.
To reduce the code review cost, some projects use coding style checkers, which can find potential issues automatically.
For a code review process with a coding style checker, we define two kinds of potential issues.
The first is ADI, short for automatically detected issue, which is found by the coding style checker.
The second is MDI, short for manually detected issue, which is found by the reviewer.
What is about ADI/MDI rough
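After repeated verification and submission,
the safe source code that the reviewer finally judges to be good is integrated into the project.
The time cost of this loop of proposal and resubmission is the highest in development work; each reviewer spends as much as 6 hours per week on it.
The cause is that developers do not know in advance how they should fix the source code.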
This is an example of ADIs.
The Python source on the left side has three potential issues.
We found 100 other ADIs from pylint.
In this study, to reduce the code review cost, we suggest automatic code changes.
Our hypothesis is that patch authors avoid ADIs/MDIs before submitting to the review.
This means that checkers and reviewers can improve patch authors’ coding style.
In this research, we classify MDIs into 6 categories,
for example alphabet, string, and number.
To find out how often patch authors repeatedly introduce ADIs/MDIs in future patch submissions, we take three steps.
First, we target the diff between the initial and integrated patches, that is, the code that changed through review.
Second, we detect ADIs and MDIs.
I explain ADIs first,
and then MDIs.
For MDIs, we first extract the changed characters and identify their type.
Finally, we count the ADIs/MDIs for each patch author.
Our target project is OpenStack.
This project
We found seven frequently fixed ADIs.
Today, I will introduce three of them.
The most frequently fixed ADIs are invalid-name, bad-continuation, and unused-argument.
As the causing
An unused-argument issue can be solved in two ways: the first is simply removing the argument y, and the second is using the value of y.
This shows the main result.
This table shows the ratio (%) of patch authors who introduced each of the ADIs/MDIs n times.
First, about the ADI results: most patch authors will not introduce the same ADI more than 3 times.
For a careful patch author, one time may not be enough, but two times is.
Coding style checkers are effective for improving patch authors’ coding style.
However, code reviewers should carefully verify issues that cannot be detected automatically, no matter how many reviews the patch authors undergo.
On the right side of the table, most patch authors introduce the same MDIs more than 5 times.
As for the impact of our research, automating the detection of MDIs will reduce the review cost.
MDIs are discussed on StackOverflow.
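These are examples of fixes that existing tools cannot detect.
For example, the upper one aims to prevent errors when accessing a dictionary.
The second one aims to improve readability.
Rather than writing that the argument is equal to None, it uses a function that states the value is None, which makes the code easier to understand.
These fixes require the experience of developers who know the language well, and they are explained in places such as StackOverflow.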
Finally, I summarize our research.
To measure the impact of adopting coding style checkers on patch authors’ style,
we counted how often patch authors repeatedly introduce the same type of issue.
As a result, most patch authors do not introduce the same automatically detected issue more than 3 times.
I would appreciate it if you could ask your questions in slow English.