A Closer Look at Real-world Patches
Kui Liu, Dongsun Kim, Anil Koyuncu, Tegawendé F. Bissyandé, and Yves Le Traon
Interdisciplinary Centre for Security, Reliability and Trust (SnT), University of Luxembourg, Luxembourg
Li Li
Monash Software Force (MSF), Monash University, Melbourne, Australia
@ Madrid, Spain, 34th ICSME 2018, September 27, 2018
1
> Basic Process of Automated Program Repair (APR)
Figure: APR pipeline. Passing and failing tests drive fault localization, which reports suspicious buggy code; APR tools generate a patch candidate for that code, and the candidate is run against the tests (pass/fail).
Where is the code to be fixed? How to generate patches? Is the patch correct?
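The three questions on this slide map onto a generate-and-validate loop. A minimal sketch in Java, assuming generic stand-ins: the ranked location list, the `generate` function, and the `passesAllTests` predicate are hypothetical parameters, not the interface of any tool named in this deck.

```java
import java.util.List;
import java.util.Optional;
import java.util.function.Function;
import java.util.function.Predicate;

// Sketch of a generate-and-validate repair loop; L = code location, P = patch candidate.
// The three parameters are hypothetical stand-ins for real tool components.
class GenerateAndValidate {

    static <L, P> Optional<P> repair(List<L> suspicious,            // ranked by fault localization
                                     Function<L, List<P>> generate,  // patch generation per location
                                     Predicate<P> passesAllTests) {  // validation against the tests
        for (L location : suspicious) {                    // Where is the code to be fixed?
            for (P candidate : generate.apply(location)) { // How to generate patches?
                if (passesAllTests.test(candidate)) {      // Is the patch correct?
                    // Passing the tests only makes the patch plausible; correctness
                    // still has to be judged separately (e.g., by manual inspection).
                    return Optional.of(candidate);
                }
            }
        }
        return Optional.empty(); // no test-passing patch was found
    }
}
```

Because the loop stops at the first test-passing candidate, the patch it returns is only plausible, which is why the next slide distinguishes fixed bugs from correctly fixed ones.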
2
> How many bugs are fixed by existing APR tools?
Benchmark Defects4J [42] (395 bugs).
APR tool     # Fixed bugs   # Correctly fixed bugs
jGenProg     29             5
jKali        22             1
jMutRepair   17             3
Nopol        35             5
HDRepair     23             6
ACS          23             18
ssFix        60             20
ELIXIR       41             26
JAID         26             9
CapGen       25             21
SketchFix    26             19
SimFix       56             34
Why are both the number of bugs that APR tools can fix and the quality of the patches they generate so low?
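To make the gap in the table concrete, the sketch below uses the figures copied from the table (class and method names are illustrative) to compute, per tool, the share of fixed bugs that are correctly fixed and the share of all 395 Defects4J bugs that are correctly fixed.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Figures copied from the Defects4J table above (395 bugs in total).
class AprResults {
    static final int DEFECTS4J_BUGS = 395;
    // tool -> {# fixed bugs, # correctly fixed bugs}
    static final Map<String, int[]> RESULTS = new LinkedHashMap<>();

    static {
        RESULTS.put("jGenProg", new int[] {29, 5});
        RESULTS.put("jKali", new int[] {22, 1});
        RESULTS.put("jMutRepair", new int[] {17, 3});
        RESULTS.put("Nopol", new int[] {35, 5});
        RESULTS.put("HDRepair", new int[] {23, 6});
        RESULTS.put("ACS", new int[] {23, 18});
        RESULTS.put("ssFix", new int[] {60, 20});
        RESULTS.put("ELIXIR", new int[] {41, 26});
        RESULTS.put("JAID", new int[] {26, 9});
        RESULTS.put("CapGen", new int[] {25, 21});
        RESULTS.put("SketchFix", new int[] {26, 19});
        RESULTS.put("SimFix", new int[] {56, 34});
    }

    // Share of a tool's fixed bugs whose patch is correct (patch quality).
    static double correctRatio(String tool) {
        int[] r = RESULTS.get(tool);
        return (double) r[1] / r[0];
    }

    // Share of all Defects4J bugs the tool fixes correctly (quantity).
    static double coverage(String tool) {
        return (double) RESULTS.get(tool)[1] / DEFECTS4J_BUGS;
    }

    public static void main(String[] args) {
        for (String tool : RESULTS.keySet()) {
            System.out.printf("%-10s correct/fixed = %.2f   correct/395 = %.3f%n",
                    tool, correctRatio(tool), coverage(tool));
        }
    }
}
```

For example, SimFix correctly fixes 34 of 395 bugs (about 8.6%), and ssFix, which fixes the most bugs (60), is correct for only a third of them.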
3
> Scope Limitation of APR Tools
Existing APR tools fix bugs at the statement level.
Example: bug Chart_1 in Defects4J, fixed by jMutRepair, ELIXIR, ssFix, JAID, SketchFix, CapGen, and SimFix.
4
> Are Non-Statement Code Entities Bug-free?
Bug located in Type Declaration (Math-12 in Defects4J).
Bug located in Method Declaration (Lang-29 in Defects4J).
Bug located in Field Declaration (Lang-56 in Defects4J).
These bugs are located in non-statement code entities.
None of the existing APR tools can fix them.
5
> Statement Level VS. Finer Granularity Level
Statement level: UPD ReturnStatement.
A repair action at this level is hard to reuse for fixing similar bugs.
Expression level: dim / 2 → 0.5 * dim.
Project: Commons-math.
Bug Report ID: MATH-929, “Fix truncated value.”
Commit cedf0d27f9e9341a9e9fa8a192735a0c2e11be40
--- a/src/main/java/org/apache/commons/math3/distribution/MultivariateNormalDistribution.java
+++ b/src/main/java/org/apache/commons/math3/distribution/MultivariateNormalDistribution.java
@@ -895,1 +895,1 @@
- return FastMath.pow(2 * FastMath.PI, -dim / 2) *
+ return FastMath.pow(2 * FastMath.PI, -0.5 * dim) *
FastMath.pow(covarianceMatrixDeterminant, -0.5) * getExponentTerm(vals);
At the expression level, the fix pattern could be reused to fix similar bugs.
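Why `-dim / 2` is a bug: in Java, dividing two ints truncates toward zero, so the exponent loses its `.5` whenever `dim` is odd. A minimal sketch, assuming `dim` is an `int` (which is what makes the original expression truncate); the class and method names are illustrative.

```java
// Sketch of the truncation fixed by the Commons-math patch above.
// Assumption: dim is an int (the number of dimensions).
class TruncationDemo {

    // Buggy exponent: -dim / 2 is evaluated in int arithmetic and truncates toward zero.
    static double buggyExponent(int dim) {
        return -dim / 2;
    }

    // Fixed exponent: the double literal 0.5 promotes the product to double arithmetic.
    static double fixedExponent(int dim) {
        return -0.5 * dim;
    }

    public static void main(String[] args) {
        for (int dim = 1; dim <= 4; dim++) {
            // Odd dimensions lose the .5; even dimensions give identical results.
            System.out.println(dim + ": " + buggyExponent(dim) + " vs " + fixedExponent(dim));
        }
    }
}
```

The same expression-level pattern (replace `x / 2` on ints with `0.5 * x`) applies wherever the buggy form recurs, which a statement-level "UPD ReturnStatement" action cannot express.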
6
> Objective
Deepen knowledge of repair ingredients in real-world patches,
at a fine-grained level, to support
automated program repair.
7
STUDY DESIGN
8
> Research Questions.
RQ1. Do patches impact some specific statement types?
RQ2. Are there code elements in statements that are prone to be faulty?
RQ3. Which expression types are most impacted by patches?
RQ4. Which parts of buggy expressions are prone to be buggy?
In APR, fault localization techniques
(e.g., Tarantula [31], Ochiai [32], Ochiai2 [33], Zoltar [34], and
DStar [35]) identify bug positions at the code-line level.
Figure: a buggy line broken down into its code elements (data type, variable name, operator, expression being assigned).
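Spectrum-based techniques such as Tarantula and Ochiai rank each line by how often it is executed by failing versus passing tests. A minimal sketch of the textbook formulas, assuming per-line coverage counts are already available; the class name and the example spectrum are hypothetical.

```java
// Textbook spectrum-based suspiciousness formulas for one code line.
// ef / ep = failing / passing tests that execute the line;
// totalFailed / totalPassed = number of failing / passing tests in the suite.
class Suspiciousness {

    // Ochiai: ef / sqrt((ef + nf) * (ef + ep)), where ef + nf = totalFailed.
    static double ochiai(int ef, int ep, int totalFailed) {
        double denom = Math.sqrt((double) totalFailed * (ef + ep));
        return denom == 0 ? 0.0 : ef / denom;
    }

    // Tarantula: failure ratio divided by (failure ratio + pass ratio).
    static double tarantula(int ef, int ep, int totalFailed, int totalPassed) {
        double failRatio = totalFailed == 0 ? 0.0 : (double) ef / totalFailed;
        double passRatio = totalPassed == 0 ? 0.0 : (double) ep / totalPassed;
        double sum = failRatio + passRatio;
        return sum == 0 ? 0.0 : failRatio / sum;
    }

    public static void main(String[] args) {
        // Hypothetical spectrum: 2 failing and 8 passing tests.
        // Line A runs in both failing tests and 1 passing test; line B in 1 failing and 8 passing.
        System.out.printf("A: ochiai=%.3f tarantula=%.3f%n", ochiai(2, 1, 2), tarantula(2, 1, 2, 8));
        System.out.printf("B: ochiai=%.3f tarantula=%.3f%n", ochiai(1, 8, 2), tarantula(1, 8, 2, 8));
    }
}
```

Both scores only rank lines; they say nothing about which element inside a line (type, name, operator, expression) is faulty, which is the gap RQ2 to RQ4 examine.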
9
> Bug-fixing Patches Collection
1) Keyword matching: bug, error, fault, fix, patch, or repair.
2) Bug linking: bug IDs (e.g., MATH-929) in the issue
tracking system, where
(1) Issue Type is ‘bug’ and
(2) Resolution is ‘fixed’.
Project        # Commits identified   # Commits selected
Commons-io     222                    191
Commons-lang   643                    522
Mahout         751                    717
Commons-math   1,021                  909
Derby          3,788                  3,356
Lucene-solr    11,408                 10,755
Total          18,013                 16,450
Figure: hunk size distribution of buggy hunks (Buggy_Hunk) and fixed hunks (Fixed_Hunk), axis range 0 to 10.
Commit logs.
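The two collection steps can be sketched as a commit filter. The keywords and the ‘bug’/‘fixed’ criteria come from the slide; the bug-ID regular expression, the `Issue` record, the in-memory `tracker` map standing in for a real issue-tracker query, and the choice to require both steps are assumptions. Records need Java 16 or later.

```java
import java.util.Map;
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Sketch of the two-step commit filter (hypothetical helper, not the study's tooling).
class PatchCollector {
    // Step 1: keywords from the slide, matched case-insensitively at word starts
    // (so "fixed" and "fixes" also match).
    private static final Pattern KEYWORDS =
            Pattern.compile("\\b(bug|error|fault|fix|patch|repair)", Pattern.CASE_INSENSITIVE);

    // Step 2 (assumed ID format): project key, dash, number, e.g. MATH-929.
    private static final Pattern BUG_ID = Pattern.compile("\\b([A-Z][A-Z0-9]+-\\d+)\\b");

    // Hypothetical stand-in for the issue-tracker fields the slide checks.
    record Issue(String type, String resolution) {}

    static boolean matchesKeyword(String message) {
        return KEYWORDS.matcher(message).find();
    }

    static Optional<String> bugId(String message) {
        Matcher m = BUG_ID.matcher(message);
        return m.find() ? Optional.of(m.group(1)) : Optional.empty();
    }

    // Keep a commit only if it mentions a keyword AND links to an issue that is a fixed bug.
    static boolean isBugFix(String message, Map<String, Issue> tracker) {
        return matchesKeyword(message)
                && bugId(message)
                        .map(tracker::get)
                        .filter(i -> "bug".equalsIgnoreCase(i.type())
                                && "fixed".equalsIgnoreCase(i.resolution()))
                        .isPresent();
    }
}
```

A commit such as "MATH-929: Fix truncated value." passes only if the tracker lists MATH-929 as a fixed bug; a keyword-only message without a linked bug ID is rejected.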
10
> Patch Differencing at AST Node Level
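Figure: the buggy and fixed versions of a patch are differenced at the AST node level with GumTree [25], and the resulting edit actions are regrouped into a hierarchical construct of code change actions.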
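GumTree reports a flat list of edit actions on AST nodes; regrouping nests them so that, for example, an update on an expression appears under the update on its enclosing statement. The sketch below is a simplified, hypothetical model (not GumTree's API or PatchParser's code): it assumes one action per node and that each action records its parent node's id, and it only nests an action under a direct parent that was itself edited. Records need Java 16 or later.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Simplified regrouping of flat edit actions into a tree.
class ActionRegrouper {

    // A flat edit action such as "UPD InfixExpression"; parentId is null at the top level.
    record Action(String label, int nodeId, Integer parentId) {}

    // One level of the hierarchical construct: an action plus the actions nested under it.
    record Group(Action action, List<Group> children) {}

    static List<Group> regroup(List<Action> actions) {
        Map<Integer, Group> byNode = new LinkedHashMap<>();
        for (Action a : actions) {
            byNode.put(a.nodeId(), new Group(a, new ArrayList<>()));
        }
        List<Group> roots = new ArrayList<>();
        for (Group g : byNode.values()) {
            Integer parentId = g.action().parentId();
            Group parent = parentId == null ? null : byNode.get(parentId);
            if (parent != null) {
                parent.children().add(g); // nest under the edited parent node
            } else {
                roots.add(g);             // parent not edited (or absent): a root action
            }
        }
        return roots;
    }
}
```

A real implementation would walk up the full AST so that actions separated by unchanged intermediate nodes still end up under the nearest edited ancestor.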
11
> Hierarchical Construct of Code Change Actions of a Patch
“Fixed truncated value.”
12
RESULTS
13
> RQ1: Root AST Nodes Impacted by Patches
• Statements are the main buggy code entities.
• Declaration entities (~27%) could also be buggy.
None of the existing APR tools can fix declaration-related bugs in Defects4J.
Distribution of root AST node types impacted by patches: Statement 73.29%, MethodDeclaration 15.95%, FieldDeclaration 9.32%, TypeDeclaration 1.41%, EnumDeclaration 0.03%.
14
> RQ1: Statements Recurrently Impacted by Patches.
5 of the 22 statement types account for 88% of buggy code statements.
APR tools could focus on fixing these specific statement types.
15
> RQ1: Adoption of Update
Supports the investigation of repair
ingredients in a fine-grained way.
“Update” accounts for half of all repair actions.
1. double d = FastMath.pow(2 * FastMath.PI, -dim / 2);
2. double d = FastMath.pow(2 * FastMath.PI, -dim / 3);
Update:
- a = a + b;
+ a = a * b;
Delete:
int a = 0;
- a = a + b;
Move:
- a = a + b;
sum(a,b);
+ a = a + b;
Insert:
int a = 0;
+ a = a + b;
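The four action types can be illustrated on a toy representation where a method body is a list of statement strings, reusing the examples above. This representation is an assumption for illustration only; GumTree and similar tools apply these actions to AST nodes, and the class name is hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model of the four edit actions on a method body given as statement strings.
class EditActions {

    static List<String> update(List<String> body, int index, String newStatement) {
        List<String> result = new ArrayList<>(body);
        result.set(index, newStatement);            // replace a statement in place
        return result;
    }

    static List<String> delete(List<String> body, int index) {
        List<String> result = new ArrayList<>(body);
        result.remove(index);                       // drop the statement
        return result;
    }

    static List<String> insert(List<String> body, int index, String statement) {
        List<String> result = new ArrayList<>(body);
        result.add(index, statement);               // add a statement at position index
        return result;
    }

    static List<String> move(List<String> body, int from, int to) {
        List<String> result = new ArrayList<>(body);
        String statement = result.remove(from);     // take the statement out...
        result.add(to, statement);                  // ...and re-insert it (index counted after removal)
        return result;
    }
}
```

Only Update keeps the statement's position and most of its content, which is why a dominant share of Update actions points toward fine-grained, inside-the-statement repair ingredients.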
16
> Search Space at Statement Level VS. Expression Level
Expression-level granularity could reduce the search space.
Number of buggy ExpressionStatements: ~40,000.
Commit log: added protection against infinite loops by setting a maximal number of valuations.
Number of buggy PrefixExpressions: 1,362.
17
> RQ2: Buggy Modifier.
Three kinds of repair actions for modifier-related bugs:
1) Add a missing modifier.
2) Delete an inappropriate modifier.
3) Replace an inappropriate modifier.
None of the existing APR tools can fix modifier-related bugs in Defects4J.
Distribution of inner-statement elements impacted by patches: Expression 82.40%, Type 8.70%, Identifier 5.50%, Modifier 3.30%.
Commit log: LANG-334: To avoid exposing a mutating map.
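A hypothetical illustration of the third repair action (replacing an inappropriate modifier), in the spirit of the LANG-334 commit log; the `EntityTable` class and its contents are invented and do not reproduce the actual patch.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

// Hypothetical example of a modifier-related fix (not the actual LANG-334 patch).
class EntityTable {
    // Buggy version (kept as a comment): the public field exposes a mutable map, so any
    // caller could write EntityTable.ENTITIES.put(...) and corrupt shared state.
    //     public static final Map<String, Integer> ENTITIES = new HashMap<>();

    // Fixed version: the modifier is replaced with "private" ...
    private static final Map<String, Integer> ENTITIES = new HashMap<>();

    static {
        ENTITIES.put("amp", (int) '&');
        ENTITIES.put("lt", (int) '<');
    }

    // ... and callers only get a read-only view.
    static Map<String, Integer> entities() {
        return Collections.unmodifiableMap(ENTITIES);
    }
}
```

Statement-level repair never touches this line's `public` keyword, so an APR tool would need modifier-level mutation operators to produce such a fix.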
18
> RQ2: Buggy Type Usage.
Buggy Types:
1. Buggy primitive types.
2. Buggy non-primitive types.
Fixing bugs related to non-primitive types is a new challenge for APR tools.
Commit log: Fix integer overflow.
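A primitive-type bug such as an integer overflow is often fixed by widening the type, e.g. computing in `long` instead of `int`. The sketch below is a hypothetical illustration with invented method names; it does not reproduce the actual commit behind this log message.

```java
// Hypothetical illustration of a primitive-type fix for integer overflow.
class OverflowDemo {

    // Buggy: int * int is computed in 32 bits and wraps around before being widened to long.
    static long areaInt(int width, int height) {
        return width * height;
    }

    // Fixed: casting one operand to long makes the whole multiplication 64-bit.
    static long areaLong(int width, int height) {
        return (long) width * height;
    }

    public static void main(String[] args) {
        System.out.println(areaInt(100_000, 100_000));  // 1410065408 (wrapped)
        System.out.println(areaLong(100_000, 100_000)); // 10000000000
    }
}
```

Declaring the return type as `long` is not enough on its own: the widening has to happen before the multiplication, which is exactly the kind of type-level detail a statement-level search rarely targets.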
19
> RQ2: Buggy Identifiers.
APR tools do not fix buggy identifiers.
Developers also label changes to inconsistent identifiers as bug fixes.
Debugging buggy names [58, 59, 60, 61, 62].
20
> RQ3: Expressions Recurrently Impacted by Patches
5 of the 34 expression types account for 80% of buggy expressions.
APR tools could focus on fixing these specific expression types.
Distribution of repair actions at the expression level.
21
> RQ3: Buggy Literal Expressions.
Buggy literal expressions raise a new challenge for APR tools.
Commit log: SOLR-6959, fix incorrect base url for PDFs.
22
> RQ4: Fault-prone Parts in Expressions.
The non-buggy parts of expressions could provide context for fix
pattern mining at the expression level.
Distribution of whole vs. sub-element changes in some buggy expressions.
Expression type         Whole expression changed   Sub-elements changed
Assignment              18.1%                      Left_Hand_Exp (13.3%), Operator (0.8%), Right_Hand_Exp (73.5%)
CastExpression          45.8%                      Type (11.9%), Exp (42.9%)
ClassInstanceCreation   15.5%                      Pre_Exp (9.2%), ClassType (19.7%), Arguments (63%)
ConditionalExpression   22.9%                      Condition_Exp (24.1%), Then_Exp (33%), Else_Exp (49.5%)
InfixExpression         27.3%                      Left_Hand_Exp (35%), Operator (5.6%), Right_Hand_Exp (68.7%)
MethodInvocation        14.7%                      MethodName (22.1%), Arguments (79.8%)
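One way to tell a whole-expression change from a sub-element change is to compare the sub-elements of the buggy and fixed expressions pair by pair. A minimal sketch for InfixExpression, assuming operands are compared as source strings and that a change counts as "whole" when every sub-element differs; both the `Infix` record and this criterion are illustrative assumptions, not the study's exact definition. Records need Java 16 or later.

```java
import java.util.EnumSet;
import java.util.Set;

// Classifies a change to an infix expression by the sub-elements it touches.
class InfixChange {

    enum Part { LEFT, OPERATOR, RIGHT }

    // Simplified infix expression: operands kept as source strings.
    record Infix(String left, String op, String right) {}

    // Returns which sub-elements differ between the buggy and fixed versions.
    static Set<Part> changedParts(Infix buggy, Infix fixed) {
        Set<Part> parts = EnumSet.noneOf(Part.class);
        if (!buggy.left().equals(fixed.left())) parts.add(Part.LEFT);
        if (!buggy.op().equals(fixed.op())) parts.add(Part.OPERATOR);
        if (!buggy.right().equals(fixed.right())) parts.add(Part.RIGHT);
        return parts;
    }

    // Assumed criterion: the whole expression changed when every sub-element changed.
    static boolean wholeChanged(Infix buggy, Infix fixed) {
        return changedParts(buggy, fixed).size() == Part.values().length;
    }
}
```

When only some parts change (e.g. `a + b` to `a - b` touches only the operator), the unchanged parts are the context a mined fix pattern can match against.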
23
> Fix Pattern Mining at Expression Level
Commit 44854912194177d67cdfa1dc765ba684eb013a4c
--- a/src/main/java/org/apache/commons/lang3/time/FastDateParser.java
+++ b/src/main/java/org/apache/commons/lang3/time/FastDateParser.java
@@ -895,1 +895,1 @@
- final TimeZone tz = TimeZone.getTimeZone(value.toUpperCase());
+ final TimeZone tz = TimeZone.getTimeZone(value.toUpperCase(Locale.ROOT));
Fix pattern:
- value.toUpperCase()
+ value.toUpperCase(Locale.ROOT)
Commit log: use toUpperCase(Locale) internally to avoid i18n issues.
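The commit log refers to i18n issues: without an explicit locale, `toUpperCase()` uses the JVM's default locale, and under a Turkish locale the letter `i` upper-cases to `İ` (U+0130), which can make string comparisons or lookups fail. The sketch shows that behavior with the standard `String.toUpperCase(Locale)` API and applies the mined pattern naively as a text replacement; the `applyPattern` helper is hypothetical, and a real repair tool would match the method invocation in the AST instead.

```java
import java.util.Locale;

class LocaleFixPattern {

    // Naive, text-level application of the mined fix pattern:
    //   value.toUpperCase()  ->  value.toUpperCase(Locale.ROOT)
    static String applyPattern(String sourceLine) {
        return sourceLine.replace(".toUpperCase()", ".toUpperCase(Locale.ROOT)");
    }

    public static void main(String[] args) {
        Locale turkish = Locale.forLanguageTag("tr");
        // Locale-sensitive: in Turkish, 'i' upper-cases to dotted capital I (U+0130).
        System.out.println("i".toUpperCase(turkish).equals("\u0130")); // true
        // Locale-neutral: Locale.ROOT gives the plain ASCII 'I'.
        System.out.println("i".toUpperCase(Locale.ROOT).equals("I"));  // true
        System.out.println(applyPattern(
                "final TimeZone tz = TimeZone.getTimeZone(value.toUpperCase());"));
    }
}
```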
24
> Take-away
RQ1:
1. APR scope should be extended to declaration entities.
2. APR tools can prioritize changes to a few specific statement types.
3. APR tools can ignore the Move action.
4. Real-world patches support further fine-grained investigation.
RQ2:
1. APR scope should be extended to modifiers.
2. Buggy non-primitive types could be a new direction for APR.
RQ3:
1. APR tools can prioritize changes to a few specific expression types.
2. Buggy literal expressions raise a new challenge for APR.
RQ4:
The non-buggy parts of expressions could provide context for fix pattern mining at the expression level.
25
> Summary
https://github.com/AutoProRepair/PatchParser

Editor's Notes

  • #20: Chart_17 and Lang_4: none of the APR tools can fix these bugs related to non-primitive types.
  • #23: Some bugs are also related to literal expressions.