Automatic Patch Generation Learned from Human-Written Patches
Dongsun Kim, Jaechang Nam, Jaewoo Song, and Sunghun Kim
The Hong Kong University of Science and Technology, China
24 May 2013
the 35th International Conference on Software Engineering (ICSE 2013)
GenProg
State-of-the-art
C. Le Goues, M. Dewey-Vogt, S. Forrest, and W. Weimer, “A systematic study of automated program repair: Fixing 55 out of 105 bugs for $8 each,” in ICSE ’12.
Fault Locations
TP TF
J. A. Jones, M. J. Harrold, and J. Stasko, “Visualization of test information to assist fault localization,” in Proceedings of the 24th International Conference on Software Engineering, New York, NY, USA, 2002, pp. 467–477.
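The fault-localization step can be illustrated with a Tarantula-style suspiciousness score, in the spirit of the cited work by Jones et al. This is only a minimal sketch: the slide does not state which formula PAR uses, and the class name, the method name, and the reading of TP and TF as the passing and failing test sets are assumptions made here.

```java
/** Hypothetical helper: Tarantula-style suspiciousness of a single statement. */
final class SuspiciousnessSketch {
    /**
     * @param passedCovering passing tests that execute the statement
     * @param failedCovering failing tests that execute the statement
     * @param totalPassed    all passing tests (assumed to be "TP" on the slide)
     * @param totalFailed    all failing tests (assumed to be "TF" on the slide)
     * @return score in [0, 1]; higher means a more likely fault location
     */
    static double suspiciousness(int passedCovering, int failedCovering,
                                 int totalPassed, int totalFailed) {
        double passRatio = totalPassed == 0 ? 0.0 : (double) passedCovering / totalPassed;
        double failRatio = totalFailed == 0 ? 0.0 : (double) failedCovering / totalFailed;
        if (passRatio + failRatio == 0.0) {
            return 0.0; // statement never executed: no evidence either way
        }
        return failRatio / (passRatio + failRatio);
    }

    public static void main(String[] args) {
        // Statement executed only by the failing tests.
        System.out.println(suspiciousness(0, 2, 10, 2));
        // Statement executed by every test, passing or failing.
        System.out.println(suspiciousness(10, 2, 10, 2));
    }
}
```

Statements would then be visited in descending score order, so the ones executed mostly by failing tests are tried first.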
Using a Fix Template: An Example

Buggy code:
    num = state.parenCount;
    int kidMatch = matchRENodes(state, (RENode)ren.kid,
                                stop, index);
    if (kidMatch != -1) return kidMatch;
    for (int i = num; i < state.parenCount; i++)
        state.parens[i].length = 0;
    state.parenCount = num;

Fix template: Null Pointer Checker
    obj ref.: state, parens[i], ...
    Check obj ref.: PASS
    Edit: Insert
        ...
      + if (state != null && state.parens[i] != null) {
            state.parens[i].length = 0;
      + }
        ...

Patch candidate that deletes the statement:
    num = state.parenCount;
    int kidMatch = matchRENodes(state, (RENode)ren.kid,
                                stop, index);
    if (kidMatch != -1) return kidMatch;
    for (int i = num; i < state.parenCount; i++)
    {
        // deleted.
    }
    state.parenCount = num;

Patch candidate produced with the Null Pointer Checker template:
    num = state.parenCount;
    int kidMatch = matchRENodes(state, (RENode)ren.kid,
                                stop, index);
    if (kidMatch != -1) return kidMatch;
    for (int i = num; i < state.parenCount; i++)
    {
        if (state != null && state.parens[i] != null)
            state.parens[i].length = 0;
    }
    state.parenCount = num;
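The template steps on this slide (collect object references, check that there are any, insert a guard) can be mimicked with a small text-level sketch. It is an illustration under assumptions, not PAR's implementation: it rewrites strings rather than a syntax tree, and the class and method names are hypothetical.

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical text-level model of a "Null Pointer Checker" fix template. */
final class NullCheckTemplateSketch {
    /** Wraps a statement in a guard that checks every object reference for null. */
    static String apply(String statement, List<String> objectRefs) {
        if (objectRefs.isEmpty()) {
            return statement; // nothing to check: the template does not apply
        }
        StringBuilder condition = new StringBuilder();
        for (String ref : objectRefs) {
            if (condition.length() > 0) {
                condition.append(" && ");
            }
            condition.append(ref).append(" != null");
        }
        return "if (" + condition + ") {\n    " + statement + "\n}";
    }

    public static void main(String[] args) {
        // Object references taken from the faulty statement on the slide.
        List<String> refs = Arrays.asList("state", "state.parens[i]");
        System.out.println(apply("state.parens[i].length = 0;", refs));
    }
}
```

Run on the statement from the slide, the sketch produces the same guard condition as the template instance shown above, `state != null && state.parens[i] != null`.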
Evaluation: Research Questions
RQ1 (Fixability): How many bugs are fixed successfully?
RQ2 (Acceptability): Which approach can generate more acceptable bug patches?
Experiment Subjects
Subject       # bugs       LOC   # test cases
Rhino             17    51,001          5,578
AspectJ           18   180,394          1,602
log4j             15    27,855            705
Math              29   121,168          3,538
Lang              20    54,537          2,051
Collections       20    48,049         11,577
Total            119   351,406         25,051
User Study #1: Ranking Patches
Each participant is shown:
• Bug Description
• Buggy Code
• Anonymized Patch #1
• Anonymized Patch #2
• Anonymized Patch #3
Task: rank the patches (example on the slide: 2, 1, 3).
Participants: 17 students, 68 developers
User Study #1: Results
[Figure: average ranking (the lower the better) of PAR, GenProg, and human-written patches. Student group bar values: 1.72, 1.57, 2.67. Developer group bar values: 1.81, 1.82, 2.36. Both charts carry a "Significantly Different" marker.]
PAR generates better-ranked patches than GenProg
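The "avg. ranking" on these charts is presumably the mean of the ranks that participants assigned to each kind of patch. A minimal sketch of that arithmetic follows; the class name is hypothetical and the ranks in `main` are made-up illustration values, not data from the study.

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical helper for the "avg. ranking" axis of the charts. */
final class AverageRankSketch {
    /** Mean of the ranks (1 = best) that participants gave to one kind of patch. */
    static double averageRank(List<Integer> ranks) {
        if (ranks.isEmpty()) {
            throw new IllegalArgumentException("no responses");
        }
        double sum = 0.0;
        for (int rank : ranks) {
            sum += rank;
        }
        return sum / ranks.size();
    }

    public static void main(String[] args) {
        // Made-up ranks from four participants for one patch; not study data.
        System.out.println(averageRank(Arrays.asList(2, 1, 1, 2)));
    }
}
```

A lower mean indicates that participants tended to place that kind of patch nearer the top.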
RQ2: Acceptability
User Study #1: Ranking between PAR and GenProg
User Study #2: Pair-wise comparison between human-written patches vs. PAR or GenProg
User Study #2: Results
[Figure: responses (%) per choice. PAR chart (choices: PAR, Human, Both, Not Sure): bar values 21, 28, 37, 14; highlighted total 49%. GenProg chart (choices: GenProg, Human, Both, Not Sure): bar values 20, 12, 51, 17; highlighted total 32%.]
PAR generates more acceptable patches than GenProg
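The highlighted 49% and 32% match the sum of two bars in each chart (21 + 28 and 20 + 12). Assuming those two bars are the responses that preferred the generated patch or judged both patches acceptable, the totals can be checked as below. The class name and that interpretation are assumptions; the slides do not spell out the mapping.

```java
/** Hypothetical check of the highlighted totals on the User Study #2 charts. */
final class AcceptabilitySketch {
    /**
     * Adds the two bars assumed to count as accepting the generated patch
     * (generated patch preferred, or both patches acceptable).
     * The four bars of one chart must add up to 100%.
     */
    static int acceptedPercent(int acceptBarA, int acceptBarB, int otherBarA, int otherBarB) {
        int total = acceptBarA + acceptBarB + otherBarA + otherBarB;
        if (total != 100) {
            throw new IllegalArgumentException("bars sum to " + total + ", expected 100");
        }
        return acceptBarA + acceptBarB;
    }

    public static void main(String[] args) {
        System.out.println(acceptedPercent(21, 28, 37, 14)); // PAR chart: 49
        System.out.println(acceptedPercent(20, 12, 51, 17)); // GenProg chart: 32
    }
}
```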
Limitations
• Fix templates are written manually.
• But this is a one-time cost, and the templates are highly reusable.
• We re-implemented GenProg entirely in Java.
• All subjects are collected from open source projects.
• Some participants may not be thoroughly qualified.
Summary
Can fix more bugs with more acceptability
Observed common patches
[Figure: # patches per pattern]
    if(lhs == DBL_MRK) lhs = ...;
    if(lhs == undefined) {
        lhs = strings[pc + 1];
    }
    Scriptable calleeScope = ...;
Fix Templates and PAR
(a) Fault Localization: Buggy Program → Fault Location
(b) Template-based Patch Candidate Generation: Fix Template → Patch Candidate
(c) Patch Evaluation: Fail or Pass; a passing candidate yields the Repaired Program
[Figure thumbnails: bar chart with values 16 and 27; User Study #1 and User Study #2 result charts with highlighted totals 49% and 32%]
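The three stages on the summary slide (fault localization, template-based candidate generation, patch evaluation) can be sketched as a simple generate-and-validate loop. This is a deliberately simplified model: the program is a list of statement strings, templates are string rewrites, the test suite is a predicate, and candidates are tried exhaustively in order. All names are hypothetical, and PAR's actual search strategy is not modeled.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Optional;
import java.util.function.Predicate;
import java.util.function.UnaryOperator;

/** Hypothetical generate-and-validate loop mirroring stages (a) to (c). */
final class RepairLoopSketch {
    static Optional<List<String>> repair(List<String> program,
                                         List<Integer> faultLocations,
                                         List<UnaryOperator<String>> templates,
                                         Predicate<List<String>> passesAllTests) {
        for (int location : faultLocations) {                  // (a) most suspicious first
            for (UnaryOperator<String> template : templates) {
                List<String> candidate = new ArrayList<>(program);
                // (b) apply one fix template at the fault location
                candidate.set(location, template.apply(program.get(location)));
                if (passesAllTests.test(candidate)) {           // (c) patch evaluation
                    return Optional.of(candidate);              // Pass: repaired program
                }
                // Fail: discard the candidate and keep searching
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        List<String> program = Arrays.asList("a = b.c;", "return a;");
        UnaryOperator<String> nullCheck = s -> "if (b != null) " + s;
        // Stand-in for a real test suite: accept any program that guards b.
        Predicate<List<String>> tests = p -> p.get(0).startsWith("if (b != null)");
        System.out.println(repair(program, Arrays.asList(0), Arrays.asList(nullCheck), tests));
    }
}
```

The loop copies the program before each edit, so a failed candidate never leaks changes into the next attempt.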
Future Work
Automatic Fix Template Identification
• More templates can fix more bugs.
More Test Cases
• More test cases may lead us to better patches.