SlideShare a Scribd company logo
1 of 29
Download to read offline
You Cannot Fix What You Cannot Find!
An Investigation of Fault Localization Bias in
Benchmarking Automated Program Repair Systems
Kui Liu, Anil Koyuncu, Tegawendé F. Bissyandé, Dongsun Kim,
Jacques Klein and Yves Le Traon
SnT, University of Luxembourg, Luxembourg
Xi’an, China, 12th ICST 2019April 24, 2019
> CONTEXT
1
2
> Automated Program Repair (APR) Basic Process
Test
Pass
Fail
Patch Validation
Plausible
Patch
Patch
Candidate
APR
Approach
Patch Generation
Fault
Localization
Tools
A Ranked List
of Suspicious
Code
Locations
Fault Localization (FL)
Buggy
Program
Passing
tests
Failing
tests
3
> Repair Performance Assessment of APR Tools
GenProg, ACS,
CapGen, SimFix, etc.
4
> Repair Performance Assessment of APR Tools
Benchmark
Defects4J
SimFix jGP jKali Nopol ACS HDR ssFix ELIXIR JAID
Chart 4 0 0 1 2 -(2) 3 4 2(4)
Closure 6 0 0 0 0 -(7) 2 0 5(9)
Math 14 5 1 1 12 -(7) 10 12 1(7)
Lang 9 0 0 3 3 -(6) 5 8 1(5)
Time 1 0 0 0 1 -(1) 0 2 0(0)
Total 34 5 1 5 18 13(23) 20 26 9(25)
TABLE I. TABLE EXCERPTED FROM [33] WITH THE CAPTION
“Correct patches generated by different techniques”.
Is this kind of comparison fair enough?
*The numbers in parenthesis(#) denote the number of bugs fixed by APR tools but ignoring the patch ranking.
> PROBLEM
5
6
> Automated Program Repair (APR) Basic Process
Test
Pass
Fail
Patch Validation
Plausible
Patch
Patch
Candidate
APR
Approach
Patch Generation
Fault
Localization
Tools
A Ranked List
of Suspicious
Code
Locations
Fault Localization (FL)
Buggy
Program
Passing
tests
Failing
tests
Same SameContributionNo Discussion
There is no any disclosed discussion on the impact of fault localization for
APR tool repair performance.
7
> Repair Performance Assessment of APR Tools
Benchmark
Defects4J
SimFix jGP jKali Nopol ACS HDR ssFix ELIXIR JAID
Chart 4 0 0 1 2 -(2) 3 4 2(4)
Closure 6 0 0 0 0 -(7) 2 0 5(9)
Math 14 5 1 1 12 -(7) 10 12 1(7)
Lang 9 0 0 3 3 -(6) 5 8 1(5)
Time 1 0 0 0 1 -(1) 0 2 0(0)
Total 34 5 1 5 18 13(23) 20 26 9(25)
TABLE I. TABLE EXCERPTED FROM [33] WITH THE CAPTION
“Correct patches generated by different techniques”.
BIASED
If the fault localization of APR tools are different from each
other, this kind of comparison could be biased or misleading.
> FAULT LOCALIZATION
8
9
>
Spectrum-based fault localization, e.g.,
𝑆 𝑂𝑐ℎ𝑖𝑎𝑖 𝑠 =
𝑓𝑎𝑖𝑙𝑒𝑑(𝑠)
𝑓𝑎𝑖𝑙𝑒𝑑 𝑠 + 𝑝𝑎𝑠𝑠𝑒𝑑 𝑠 ∗ (𝑓𝑎𝑖𝑙𝑒𝑑 𝑠 + 𝑓𝑎𝑖𝑙𝑒𝑑(¬𝑠))
𝑆 𝑇𝑎𝑟𝑎𝑛𝑡𝑢𝑙𝑎 𝑠 =
𝑆 𝐵ar𝑖𝑛𝑒𝑙 𝑠 = 1 −
𝑝𝑎𝑠𝑠𝑒𝑑(𝑠)
𝑓𝑎𝑖𝑙𝑒𝑑 𝑠 + 𝑝𝑎𝑠𝑠𝑒𝑑(𝑠)
𝑓𝑎𝑖𝑙𝑒𝑑(𝑠)
𝑓𝑎𝑖𝑙𝑒𝑑 𝑠 + 𝑓𝑎𝑖𝑙𝑒𝑑(¬𝑠)
𝑓𝑎𝑖𝑙𝑒𝑑(𝑠)
𝑓𝑎𝑖𝑙𝑒𝑑 𝑠 + 𝑓𝑎𝑖𝑙𝑒𝑑(¬𝑠)
+
𝑝𝑎𝑠𝑠𝑒𝑑(𝑠)
𝑝𝑎𝑠𝑠𝑒𝑑 𝑠 + 𝑝𝑎𝑠𝑠𝑒𝑑(¬𝑠)
Fault Localization used in Automated Program Repair for Java
10
>
<Suspicious Class Name> <Line> <Suspicious Value>
org.jfree.chart.plot.CategoryPlot 1613 0.316227766
org.jfree.chart.plot.CategoryPlot 1684 0.25
org.jfree.chart.plot.CategoryPlot 1682 0.25
org.jfree.chart.plot.CategoryPlot 1665 0.25
org.jfree.chart.plot.CategoryPlot 1340 0.1825741858
org.jfree.chart.plot.CategoryPlot 1339 0.1825741858
org.jfree.chart.plot.CategoryPlot 1358 0.1507556723
org.jfree.chart.plot.CategoryPlot 1367 0.1474419562
org.jfree.chart.plot.CategoryPlot 1045 0.1443375673
org.jfree.chart.renderer.category.AbstractCategoryItemRenderer 1798 0.115
org.jfree.chart.renderer.category.AbstractCategoryItemRenderer 1797 0.115
org.jfree.chart.renderer.category.AbstractCategoryItemRenderer 1796 0.115
org.jfree.data.category.DefaultCategoryDataset 259 0.0733235575
Fault Localization used in Automated Program Repair for Java
11
> Fault Localization Bias in APR
Test
Pass
Fail
Patch Validation
Plausible
Patch
Patch
Candidate
APR
Approach
Patch Generation
Fault
Localization
Tools
A Ranked List
of Suspicious
Code
Locations
Fault Localization (FL)
Buggy
Program
Passing
tests
Failing
tests
>
12
> Research Questions
• RQ1: How do APR systems leverage fault localization techniques?
• RQ2: How many bugs from a common APR benchmark are actually localizable?
• RQ3: To what extent APR performance can vary by adapting different fault
localization configurations?
13
RQ1: Integration of Fault Localization Tools
in APR Pipelines
14
> FL Techniques used in APR tools
HDRepair,
ssFix,
ACS
SimFix.
APR Tools FL framework Framework version FL ranking metric Supplementary information
jGenProg GZoltar 0.1.1 Ochiai ∅
jKali GZoltar 0.1.1 Ochiai ∅
jMutRepair GZoltar 0.1.1 Ochiai ∅
HDRepair ? ? ? Faulty method is known.
Nopol GZoltar 0.0.10 Ochiai ∅
ACS GZoltar 0.1.1 Ochiai Predicate switching [53].
ELIXIR ? ? Ochiai ?
JAID ? ? ? ?
ssFix GZoltar 0.1.1 ? Statements in crashed stack trace.
CapGen GZoltar 0.1.1 Ochiai ?
SketchFix ? ? Ochiai ?
FixMiner GZoltar 0.1.1 Ochiai ∅
LSRepair GZoltar 0.1.1 Ochiai ∅
SimFix GZoltar 1.6.0 Ochiai Test case purification [54].
15
1. State-of-the-art APR systems in the literature add some
adaptations to the usual FL process to improve its accuracy.
Unfortunately, researchers have eluded so far the contribution of
this improvement in the overall repair performance, leading to
biased comparisons.
2. APR systems do not fully disclose their fault localization tuning
parameters, thus preventing reliable replication and comparisons.
> Take-away Message
16
RQ2: Localizability of Defects4J Bugs
17
>
How many bugs in the Defects4J benchmark can actually be localized by
current automated fault localization tools?
Localizability: --- a/src/org/jfree/data/time/Week.java
+++ b/src/org/jfree/data/time/Week.java
@@ −173, 1 +173, 1 @@
public Week(Date time, TimeZone zone) {
// defer argument checking...
- this(time, RegularTimePeriod.DEFAULT_TIME_ZONE,
Locale.getDefault());
+ this(time, zone, Locale.getDefault);
}
File Level
Method Level
Line Level
18
>
How many bugs in the Defects4J benchmark can actually be localized by
current automated fault localization tools?
Project # Bugs
File Method Line
GZ-0.1.1 GZ-1.6.0 GZ-0.1.1 GZ-1.6.0 GZ-0.1.1 GZ-1.6.0
Chart 26 25 25 22 24 22 24
Closure 133 113 128 78 96 78 95
Lang 65 54 64 32 59 29 57
Math 106 101 105 92 100 91 100
Mockito 38 25 26 22 24 21 23
Time 27 26 26 22 22 22 22
Total 395 344 374 268 325 263 321
Number of Defects4J bugs localized with GZoltar + Ochiai.
One third of bugs in the Defects4J dataset cannot be localized at line
level by the commonly used automated fault localization tool.
19
>
How many bugs in the Defects4J benchmark can actually be localized by
current automated fault localization tools?
Ranking
Metric
GZoltar-0.1.1 GZoltar-1.6.0
File Method Line File Method Line
Top-1 Position
Tarantula 171 101 45 169 106 35
Ochiai 173 102 45 172 111 38
DStar 173 102 45 175 114 40
Barinel 171 101 45 169 107 36
Opt2 175 97 39 179 115 39
Muse 170 98 40 178 118 41
Jaccard 173 102 45 171 112 39
Top-10 Position
Tarantula 240 180 135 242 189 144
Ochiai 244 184 140 242 191 145
DStar 245 184 139 242 190 142
Barinel 240 180 135 242 190 145
Opt2 237 168 128 239 184 135
Muse 234 169 129 239 186 140
Jaccard 245 184 139 241 188 142
A small part of bugs
can be localized with
high positions in the
ranking list of
suspicious positions.
20
> Impact of Effective Ranking in Fault Localization
APR tools are prone to correctly fix the subset of
Defects4J bugs that can be accurately localized.
21
1. One third of bugs in the Defects4J dataset cannot be localized by the
commonly used automated fault localization tool.
2. APR tools are prone to correctly fix the subset of Defects4J bugs that
can be accurately localized.
> Take-away Message
22
RQ3: Evaluation kPAR with Specific
Fault Localization Configurations
23
> kPAR with four different fault localization configurations
Normal FL:
File Assumption:
Method
Assumption:
Line Assumption:
It relies on the ranked list of suspicious code locations reported
by a given FL tool.
It assumes that the faulty code files are known.
It assumes that the faulty methods are known.
It assumes that the faulty code lines are known. No
fault localization is then used.
kPAR: Java implementation of PAR (Kim et al. ICSE 2013)
+ Gzoltar-0.1.1 + Ochiai.
24
> Repair Performance of kPAR with Four FL Configurations
FL Conf. Chart (C) Closure (Cl) Lang (L) Math (M) Mockito (Moc) Time (T) Total
Normal FL 3/10 5/9 1/8 7/18 1/2 1/2 18/49
File Assumption 4/7 6/13 1/8 7/15 2/2 2/3 22/48
Method Assumption 4/6 7/16 1/7 7/15 2/2 2/3 23/49
Line Assumption 7/8 11/16 4/9 9/16 2/2 3/4 36/55
Number of Defects4J bugs fixed by kPAR with four FL configurations.
C-1, 4, 7, L-59.
Cl-2, 38, 62, 63, 73.
M-15, 33, 58, 70, 75, 85, 89.
Moc-38, T-7.
File_AssumptionNormal_FL
C-26, Cl-4,
Moc-29, T-19.
Cl-10
Method_Assumption
C-8,14,19.
Cl-31,38,40,70.
M-4,82, T-26.
L-6,22,24.
Line_Assumption
With better fault localization results, kPAR can correctly fix more bugs.
25
> Take-away Message
Accuracy of fault localization has a
direct and substantial impact on the
performance of APR repair pipelines.
26
Discussion
27
> Discussions
1.Full disclosure of FL parameters.
2.Qualification of APR performance.
3.Patch generation step vs Repair pipeline.
4.Sensitivity of search space.
28
> Summary
7
> Fault Localization Bias in APR
Test
Pass
Fail
Patch Validation
Plausible
Patch
Patch
Candidate
APR
Approach
Patch Generation
Fault
Localization
Tools
A Ranked List
of Suspicious
Code
Locations
Fault Localization (FL)
Buggy
Program
Passing
tests
Failing
tests

More Related Content

What's hot

Learning to Spot and Refactor Inconsistent Method Names
Learning to Spot and Refactor Inconsistent Method NamesLearning to Spot and Refactor Inconsistent Method Names
Learning to Spot and Refactor Inconsistent Method NamesDongsun Kim
 
Automated Program Repair Keynote talk
Automated Program Repair Keynote talkAutomated Program Repair Keynote talk
Automated Program Repair Keynote talkAbhik Roychoudhury
 
Test final jav_aaa
Test final jav_aaaTest final jav_aaa
Test final jav_aaaBagusBudi11
 
Code Analysis-run time error prediction
Code Analysis-run time error predictionCode Analysis-run time error prediction
Code Analysis-run time error predictionNIKHIL NAWATHE
 
SherLog: Error Diagnosis by Connecting Clues from Run-time Logs
SherLog: Error Diagnosis by Connecting Clues from Run-time LogsSherLog: Error Diagnosis by Connecting Clues from Run-time Logs
SherLog: Error Diagnosis by Connecting Clues from Run-time LogsDacong (Tony) Yan
 
Automated Repair - ISSTA Summer School
Automated Repair - ISSTA Summer SchoolAutomated Repair - ISSTA Summer School
Automated Repair - ISSTA Summer SchoolAbhik Roychoudhury
 
Effective Fault-Localization Techniques for Concurrent Software
Effective Fault-Localization Techniques for Concurrent SoftwareEffective Fault-Localization Techniques for Concurrent Software
Effective Fault-Localization Techniques for Concurrent SoftwareSangmin Park
 
Griffin: Grouping Suspicious Memory-Access Patterns to Improve Understanding...
Griffin: Grouping Suspicious Memory-Access Patterns to Improve Understanding...Griffin: Grouping Suspicious Memory-Access Patterns to Improve Understanding...
Griffin: Grouping Suspicious Memory-Access Patterns to Improve Understanding...Sangmin Park
 
Opal Hermes - towards representative benchmarks
Opal  Hermes - towards representative benchmarksOpal  Hermes - towards representative benchmarks
Opal Hermes - towards representative benchmarksMichaelEichberg1
 
Eclipse Con 2015: Codan - a C/C++ Code Analysis Framework for CDT
Eclipse Con 2015: Codan - a C/C++ Code Analysis Framework for CDTEclipse Con 2015: Codan - a C/C++ Code Analysis Framework for CDT
Eclipse Con 2015: Codan - a C/C++ Code Analysis Framework for CDTElena Laskavaia
 
Dissertation Defense
Dissertation DefenseDissertation Defense
Dissertation DefenseSung Kim
 
Deep API Learning (FSE 2016)
Deep API Learning (FSE 2016)Deep API Learning (FSE 2016)
Deep API Learning (FSE 2016)Sung Kim
 
Known XML Vulnerabilities Are Still a Threat to Popular Parsers ! & Open Sour...
Known XML Vulnerabilities Are Still a Threat to Popular Parsers ! & Open Sour...Known XML Vulnerabilities Are Still a Threat to Popular Parsers ! & Open Sour...
Known XML Vulnerabilities Are Still a Threat to Popular Parsers ! & Open Sour...Lionel Briand
 
Code Analysis and Refactoring with CDT
Code Analysis and Refactoring with CDTCode Analysis and Refactoring with CDT
Code Analysis and Refactoring with CDTdschaefer
 

What's hot (20)

Learning to Spot and Refactor Inconsistent Method Names
Learning to Spot and Refactor Inconsistent Method NamesLearning to Spot and Refactor Inconsistent Method Names
Learning to Spot and Refactor Inconsistent Method Names
 
Automated Program Repair Keynote talk
Automated Program Repair Keynote talkAutomated Program Repair Keynote talk
Automated Program Repair Keynote talk
 
Isorc18 keynote
Isorc18 keynoteIsorc18 keynote
Isorc18 keynote
 
Repair dagstuhl jan2017
Repair dagstuhl jan2017Repair dagstuhl jan2017
Repair dagstuhl jan2017
 
Test final jav_aaa
Test final jav_aaaTest final jav_aaa
Test final jav_aaa
 
Code Analysis-run time error prediction
Code Analysis-run time error predictionCode Analysis-run time error prediction
Code Analysis-run time error prediction
 
Mobilesoft 2017 Keynote
Mobilesoft 2017 KeynoteMobilesoft 2017 Keynote
Mobilesoft 2017 Keynote
 
SherLog: Error Diagnosis by Connecting Clues from Run-time Logs
SherLog: Error Diagnosis by Connecting Clues from Run-time LogsSherLog: Error Diagnosis by Connecting Clues from Run-time Logs
SherLog: Error Diagnosis by Connecting Clues from Run-time Logs
 
Automated Repair - ISSTA Summer School
Automated Repair - ISSTA Summer SchoolAutomated Repair - ISSTA Summer School
Automated Repair - ISSTA Summer School
 
Effective Fault-Localization Techniques for Concurrent Software
Effective Fault-Localization Techniques for Concurrent SoftwareEffective Fault-Localization Techniques for Concurrent Software
Effective Fault-Localization Techniques for Concurrent Software
 
Griffin: Grouping Suspicious Memory-Access Patterns to Improve Understanding...
Griffin: Grouping Suspicious Memory-Access Patterns to Improve Understanding...Griffin: Grouping Suspicious Memory-Access Patterns to Improve Understanding...
Griffin: Grouping Suspicious Memory-Access Patterns to Improve Understanding...
 
Abhik-Satish-dagstuhl
Abhik-Satish-dagstuhlAbhik-Satish-dagstuhl
Abhik-Satish-dagstuhl
 
Opal Hermes - towards representative benchmarks
Opal  Hermes - towards representative benchmarksOpal  Hermes - towards representative benchmarks
Opal Hermes - towards representative benchmarks
 
Eclipse Con 2015: Codan - a C/C++ Code Analysis Framework for CDT
Eclipse Con 2015: Codan - a C/C++ Code Analysis Framework for CDTEclipse Con 2015: Codan - a C/C++ Code Analysis Framework for CDT
Eclipse Con 2015: Codan - a C/C++ Code Analysis Framework for CDT
 
APSEC2020 Keynote
APSEC2020 KeynoteAPSEC2020 Keynote
APSEC2020 Keynote
 
Dissertation Defense
Dissertation DefenseDissertation Defense
Dissertation Defense
 
Deep API Learning (FSE 2016)
Deep API Learning (FSE 2016)Deep API Learning (FSE 2016)
Deep API Learning (FSE 2016)
 
Known XML Vulnerabilities Are Still a Threat to Popular Parsers ! & Open Sour...
Known XML Vulnerabilities Are Still a Threat to Popular Parsers ! & Open Sour...Known XML Vulnerabilities Are Still a Threat to Popular Parsers ! & Open Sour...
Known XML Vulnerabilities Are Still a Threat to Popular Parsers ! & Open Sour...
 
Code Analysis and Refactoring with CDT
Code Analysis and Refactoring with CDTCode Analysis and Refactoring with CDT
Code Analysis and Refactoring with CDT
 
c++ lab manual
c++ lab manualc++ lab manual
c++ lab manual
 

Similar to You Cannot Fix What You Cannot Find! --- An Investigation of Fault Localization Bias in Benchmarking Automated Program Repair Systems

Combining Cluster Sampling and ACE analysis to improve fault-injection based ...
Combining Cluster Sampling and ACE analysis to improve fault-injection based ...Combining Cluster Sampling and ACE analysis to improve fault-injection based ...
Combining Cluster Sampling and ACE analysis to improve fault-injection based ...Stefano Di Carlo
 
Heterogeneous Defect Prediction (

ESEC/FSE 2015)
Heterogeneous Defect Prediction (

ESEC/FSE 2015)Heterogeneous Defect Prediction (

ESEC/FSE 2015)
Heterogeneous Defect Prediction (

ESEC/FSE 2015)Sung Kim
 
Virtual hybrid simualtion test - Modelling experimental errors
Virtual hybrid simualtion test - Modelling experimental errorsVirtual hybrid simualtion test - Modelling experimental errors
Virtual hybrid simualtion test - Modelling experimental errorsopenseesdays
 
COMPARATIVE ANALYSIS OF CONVENTIONAL PID CONTROLLER AND FUZZY CONTROLLER WIT...
COMPARATIVE  ANALYSIS OF CONVENTIONAL PID CONTROLLER AND FUZZY CONTROLLER WIT...COMPARATIVE  ANALYSIS OF CONVENTIONAL PID CONTROLLER AND FUZZY CONTROLLER WIT...
COMPARATIVE ANALYSIS OF CONVENTIONAL PID CONTROLLER AND FUZZY CONTROLLER WIT...IJITCA Journal
 
Software Testing: Test Design and the Project Life Cycle
Software Testing: Test Design and the Project Life CycleSoftware Testing: Test Design and the Project Life Cycle
Software Testing: Test Design and the Project Life CycleDerek Callaway
 
Boetticher Presentation Promise 2008v2
Boetticher Presentation Promise 2008v2Boetticher Presentation Promise 2008v2
Boetticher Presentation Promise 2008v2gregoryg
 
Automatic Fine-Grained Issue Report Reclassification
Automatic Fine-Grained Issue Report ReclassificationAutomatic Fine-Grained Issue Report Reclassification
Automatic Fine-Grained Issue Report ReclassificationPavneet Singh Kochhar
 
Madaari : Ordering For The Monkeys
Madaari : Ordering For The MonkeysMadaari : Ordering For The Monkeys
Madaari : Ordering For The MonkeysJ On The Beach
 
A Graph-based Feature Location Approach using Set Theory [SPLC2019]
A Graph-based Feature Location Approach using Set Theory [SPLC2019]A Graph-based Feature Location Approach using Set Theory [SPLC2019]
A Graph-based Feature Location Approach using Set Theory [SPLC2019]Richard Müller
 
Advancing VLSI Design Reliability: A Comprehensive Examination of Embedded De...
Advancing VLSI Design Reliability: A Comprehensive Examination of Embedded De...Advancing VLSI Design Reliability: A Comprehensive Examination of Embedded De...
Advancing VLSI Design Reliability: A Comprehensive Examination of Embedded De...IRJET Journal
 
Medical Image Segmentation Using Hidden Markov Random Field A Distributed Ap...
Medical Image Segmentation Using Hidden Markov Random Field  A Distributed Ap...Medical Image Segmentation Using Hidden Markov Random Field  A Distributed Ap...
Medical Image Segmentation Using Hidden Markov Random Field A Distributed Ap...EL-Hachemi Guerrout
 
A Highly Parallel Semi-Dataflow FPGA Architecture for Large-Scale N-Body Simu...
A Highly Parallel Semi-Dataflow FPGA Architecture for Large-Scale N-Body Simu...A Highly Parallel Semi-Dataflow FPGA Architecture for Large-Scale N-Body Simu...
A Highly Parallel Semi-Dataflow FPGA Architecture for Large-Scale N-Body Simu...NECST Lab @ Politecnico di Milano
 
Using Machine Learning to Debug Oracle RAC Issues
Using Machine Learning to Debug Oracle RAC IssuesUsing Machine Learning to Debug Oracle RAC Issues
Using Machine Learning to Debug Oracle RAC IssuesAnil Nair
 
Hailey_Database_Performance_Made_Easy_through_Graphics.pdf
Hailey_Database_Performance_Made_Easy_through_Graphics.pdfHailey_Database_Performance_Made_Easy_through_Graphics.pdf
Hailey_Database_Performance_Made_Easy_through_Graphics.pdfcookie1969
 
CrashLocator: Locating Crashing Faults Based on Crash Stacks (ISSTA 2014)
CrashLocator: Locating Crashing Faults Based on Crash Stacks (ISSTA 2014)CrashLocator: Locating Crashing Faults Based on Crash Stacks (ISSTA 2014)
CrashLocator: Locating Crashing Faults Based on Crash Stacks (ISSTA 2014)Sung Kim
 
NEURAL NETWORKS WITH DECISION TREES FOR DIAGNOSIS ISSUES
NEURAL NETWORKS WITH DECISION TREES FOR DIAGNOSIS ISSUESNEURAL NETWORKS WITH DECISION TREES FOR DIAGNOSIS ISSUES
NEURAL NETWORKS WITH DECISION TREES FOR DIAGNOSIS ISSUEScsitconf
 
NEURAL NETWORKS WITH DECISION TREES FOR DIAGNOSIS ISSUES
NEURAL NETWORKS WITH DECISION TREES FOR DIAGNOSIS ISSUESNEURAL NETWORKS WITH DECISION TREES FOR DIAGNOSIS ISSUES
NEURAL NETWORKS WITH DECISION TREES FOR DIAGNOSIS ISSUEScscpconf
 
The International Journal of Engineering and Science (The IJES)
The International Journal of Engineering and Science (The IJES)The International Journal of Engineering and Science (The IJES)
The International Journal of Engineering and Science (The IJES)theijes
 

Similar to You Cannot Fix What You Cannot Find! --- An Investigation of Fault Localization Bias in Benchmarking Automated Program Repair Systems (20)

Combining Cluster Sampling and ACE analysis to improve fault-injection based ...
Combining Cluster Sampling and ACE analysis to improve fault-injection based ...Combining Cluster Sampling and ACE analysis to improve fault-injection based ...
Combining Cluster Sampling and ACE analysis to improve fault-injection based ...
 
Heterogeneous Defect Prediction (

ESEC/FSE 2015)
Heterogeneous Defect Prediction (

ESEC/FSE 2015)Heterogeneous Defect Prediction (

ESEC/FSE 2015)
Heterogeneous Defect Prediction (

ESEC/FSE 2015)
 
Virtual hybrid simualtion test - Modelling experimental errors
Virtual hybrid simualtion test - Modelling experimental errorsVirtual hybrid simualtion test - Modelling experimental errors
Virtual hybrid simualtion test - Modelling experimental errors
 
COMPARATIVE ANALYSIS OF CONVENTIONAL PID CONTROLLER AND FUZZY CONTROLLER WIT...
COMPARATIVE  ANALYSIS OF CONVENTIONAL PID CONTROLLER AND FUZZY CONTROLLER WIT...COMPARATIVE  ANALYSIS OF CONVENTIONAL PID CONTROLLER AND FUZZY CONTROLLER WIT...
COMPARATIVE ANALYSIS OF CONVENTIONAL PID CONTROLLER AND FUZZY CONTROLLER WIT...
 
SBCCI08
SBCCI08SBCCI08
SBCCI08
 
Software Testing: Test Design and the Project Life Cycle
Software Testing: Test Design and the Project Life CycleSoftware Testing: Test Design and the Project Life Cycle
Software Testing: Test Design and the Project Life Cycle
 
CodeChecker summary 21062021
CodeChecker summary 21062021CodeChecker summary 21062021
CodeChecker summary 21062021
 
Boetticher Presentation Promise 2008v2
Boetticher Presentation Promise 2008v2Boetticher Presentation Promise 2008v2
Boetticher Presentation Promise 2008v2
 
Automatic Fine-Grained Issue Report Reclassification
Automatic Fine-Grained Issue Report ReclassificationAutomatic Fine-Grained Issue Report Reclassification
Automatic Fine-Grained Issue Report Reclassification
 
Madaari : Ordering For The Monkeys
Madaari : Ordering For The MonkeysMadaari : Ordering For The Monkeys
Madaari : Ordering For The Monkeys
 
A Graph-based Feature Location Approach using Set Theory [SPLC2019]
A Graph-based Feature Location Approach using Set Theory [SPLC2019]A Graph-based Feature Location Approach using Set Theory [SPLC2019]
A Graph-based Feature Location Approach using Set Theory [SPLC2019]
 
Advancing VLSI Design Reliability: A Comprehensive Examination of Embedded De...
Advancing VLSI Design Reliability: A Comprehensive Examination of Embedded De...Advancing VLSI Design Reliability: A Comprehensive Examination of Embedded De...
Advancing VLSI Design Reliability: A Comprehensive Examination of Embedded De...
 
Medical Image Segmentation Using Hidden Markov Random Field A Distributed Ap...
Medical Image Segmentation Using Hidden Markov Random Field  A Distributed Ap...Medical Image Segmentation Using Hidden Markov Random Field  A Distributed Ap...
Medical Image Segmentation Using Hidden Markov Random Field A Distributed Ap...
 
A Highly Parallel Semi-Dataflow FPGA Architecture for Large-Scale N-Body Simu...
A Highly Parallel Semi-Dataflow FPGA Architecture for Large-Scale N-Body Simu...A Highly Parallel Semi-Dataflow FPGA Architecture for Large-Scale N-Body Simu...
A Highly Parallel Semi-Dataflow FPGA Architecture for Large-Scale N-Body Simu...
 
Using Machine Learning to Debug Oracle RAC Issues
Using Machine Learning to Debug Oracle RAC IssuesUsing Machine Learning to Debug Oracle RAC Issues
Using Machine Learning to Debug Oracle RAC Issues
 
Hailey_Database_Performance_Made_Easy_through_Graphics.pdf
Hailey_Database_Performance_Made_Easy_through_Graphics.pdfHailey_Database_Performance_Made_Easy_through_Graphics.pdf
Hailey_Database_Performance_Made_Easy_through_Graphics.pdf
 
CrashLocator: Locating Crashing Faults Based on Crash Stacks (ISSTA 2014)
CrashLocator: Locating Crashing Faults Based on Crash Stacks (ISSTA 2014)CrashLocator: Locating Crashing Faults Based on Crash Stacks (ISSTA 2014)
CrashLocator: Locating Crashing Faults Based on Crash Stacks (ISSTA 2014)
 
NEURAL NETWORKS WITH DECISION TREES FOR DIAGNOSIS ISSUES
NEURAL NETWORKS WITH DECISION TREES FOR DIAGNOSIS ISSUESNEURAL NETWORKS WITH DECISION TREES FOR DIAGNOSIS ISSUES
NEURAL NETWORKS WITH DECISION TREES FOR DIAGNOSIS ISSUES
 
NEURAL NETWORKS WITH DECISION TREES FOR DIAGNOSIS ISSUES
NEURAL NETWORKS WITH DECISION TREES FOR DIAGNOSIS ISSUESNEURAL NETWORKS WITH DECISION TREES FOR DIAGNOSIS ISSUES
NEURAL NETWORKS WITH DECISION TREES FOR DIAGNOSIS ISSUES
 
The International Journal of Engineering and Science (The IJES)
The International Journal of Engineering and Science (The IJES)The International Journal of Engineering and Science (The IJES)
The International Journal of Engineering and Science (The IJES)
 

Recently uploaded

Deliver Latency Free Customer Experience
Deliver Latency Free Customer ExperienceDeliver Latency Free Customer Experience
Deliver Latency Free Customer ExperienceOpsTree solutions
 
Transcript: New from BookNet Canada for 2024: BNC SalesData and LibraryData -...
Transcript: New from BookNet Canada for 2024: BNC SalesData and LibraryData -...Transcript: New from BookNet Canada for 2024: BNC SalesData and LibraryData -...
Transcript: New from BookNet Canada for 2024: BNC SalesData and LibraryData -...BookNet Canada
 
Bitdefender-CSG-Report-creat7534-interactive
Bitdefender-CSG-Report-creat7534-interactiveBitdefender-CSG-Report-creat7534-interactive
Bitdefender-CSG-Report-creat7534-interactivestartupro
 
Long journey of Ruby standard library at RubyConf AU 2024
Long journey of Ruby standard library at RubyConf AU 2024Long journey of Ruby standard library at RubyConf AU 2024
Long journey of Ruby standard library at RubyConf AU 2024Hiroshi SHIBATA
 
The work to make the piecework work: An ethnographic study of food delivery w...
The work to make the piecework work: An ethnographic study of food delivery w...The work to make the piecework work: An ethnographic study of food delivery w...
The work to make the piecework work: An ethnographic study of food delivery w...stockholm university
 
Top 10 Hubspot Development Companies in 2024
Top 10 Hubspot Development Companies in 2024Top 10 Hubspot Development Companies in 2024
Top 10 Hubspot Development Companies in 2024TopCSSGallery
 
Arti Languages Pre Seed Pitchdeck 2024.pdf
Arti Languages Pre Seed Pitchdeck 2024.pdfArti Languages Pre Seed Pitchdeck 2024.pdf
Arti Languages Pre Seed Pitchdeck 2024.pdfwill854175
 
The Critical Role of Spatial Data in Today's Data Ecosystem
The Critical Role of Spatial Data in Today's Data EcosystemThe Critical Role of Spatial Data in Today's Data Ecosystem
The Critical Role of Spatial Data in Today's Data EcosystemSafe Software
 
QCon London: Mastering long-running processes in modern architectures
QCon London: Mastering long-running processes in modern architecturesQCon London: Mastering long-running processes in modern architectures
QCon London: Mastering long-running processes in modern architecturesBernd Ruecker
 
React JS; all concepts. Contains React Features, JSX, functional & Class comp...
React JS; all concepts. Contains React Features, JSX, functional & Class comp...React JS; all concepts. Contains React Features, JSX, functional & Class comp...
React JS; all concepts. Contains React Features, JSX, functional & Class comp...Karmanjay Verma
 
Generative Artificial Intelligence: How generative AI works.pdf
Generative Artificial Intelligence: How generative AI works.pdfGenerative Artificial Intelligence: How generative AI works.pdf
Generative Artificial Intelligence: How generative AI works.pdfIngrid Airi González
 
Green paths: Learning from publishers’ sustainability journeys - Tech Forum 2024
Green paths: Learning from publishers’ sustainability journeys - Tech Forum 2024Green paths: Learning from publishers’ sustainability journeys - Tech Forum 2024
Green paths: Learning from publishers’ sustainability journeys - Tech Forum 2024BookNet Canada
 
Tampa BSides - The No BS SOC (slides from April 6, 2024 talk)
Tampa BSides - The No BS SOC (slides from April 6, 2024 talk)Tampa BSides - The No BS SOC (slides from April 6, 2024 talk)
Tampa BSides - The No BS SOC (slides from April 6, 2024 talk)Mark Simos
 
Design pattern talk by Kaya Weers - 2024 (v2)
Design pattern talk by Kaya Weers - 2024 (v2)Design pattern talk by Kaya Weers - 2024 (v2)
Design pattern talk by Kaya Weers - 2024 (v2)Kaya Weers
 
A Glance At The Java Performance Toolbox
A Glance At The Java Performance ToolboxA Glance At The Java Performance Toolbox
A Glance At The Java Performance ToolboxAna-Maria Mihalceanu
 
WomenInAutomation2024: AI and Automation for eveyone
WomenInAutomation2024: AI and Automation for eveyoneWomenInAutomation2024: AI and Automation for eveyone
WomenInAutomation2024: AI and Automation for eveyoneUiPathCommunity
 
Software Security in the Real World w/Kelsey Hightower
Software Security in the Real World w/Kelsey HightowerSoftware Security in the Real World w/Kelsey Hightower
Software Security in the Real World w/Kelsey HightowerAnchore
 
Why Agile? - A handbook behind Agile Evolution
Why Agile? - A handbook behind Agile EvolutionWhy Agile? - A handbook behind Agile Evolution
Why Agile? - A handbook behind Agile EvolutionDEEPRAJ PATHAK
 
Introduction-to-Wazuh-and-its-integration.pptx
Introduction-to-Wazuh-and-its-integration.pptxIntroduction-to-Wazuh-and-its-integration.pptx
Introduction-to-Wazuh-and-its-integration.pptxmprakaash5
 
Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24
Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24
Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24Mark Goldstein
 

Recently uploaded (20)

Deliver Latency Free Customer Experience
Deliver Latency Free Customer ExperienceDeliver Latency Free Customer Experience
Deliver Latency Free Customer Experience
 
Transcript: New from BookNet Canada for 2024: BNC SalesData and LibraryData -...
Transcript: New from BookNet Canada for 2024: BNC SalesData and LibraryData -...Transcript: New from BookNet Canada for 2024: BNC SalesData and LibraryData -...
Transcript: New from BookNet Canada for 2024: BNC SalesData and LibraryData -...
 
Bitdefender-CSG-Report-creat7534-interactive
Bitdefender-CSG-Report-creat7534-interactiveBitdefender-CSG-Report-creat7534-interactive
Bitdefender-CSG-Report-creat7534-interactive
 
Long journey of Ruby standard library at RubyConf AU 2024
Long journey of Ruby standard library at RubyConf AU 2024Long journey of Ruby standard library at RubyConf AU 2024
Long journey of Ruby standard library at RubyConf AU 2024
 
The work to make the piecework work: An ethnographic study of food delivery w...
The work to make the piecework work: An ethnographic study of food delivery w...The work to make the piecework work: An ethnographic study of food delivery w...
The work to make the piecework work: An ethnographic study of food delivery w...
 
Top 10 Hubspot Development Companies in 2024
Top 10 Hubspot Development Companies in 2024Top 10 Hubspot Development Companies in 2024
Top 10 Hubspot Development Companies in 2024
 
Arti Languages Pre Seed Pitchdeck 2024.pdf
Arti Languages Pre Seed Pitchdeck 2024.pdfArti Languages Pre Seed Pitchdeck 2024.pdf
Arti Languages Pre Seed Pitchdeck 2024.pdf
 
The Critical Role of Spatial Data in Today's Data Ecosystem
The Critical Role of Spatial Data in Today's Data EcosystemThe Critical Role of Spatial Data in Today's Data Ecosystem
The Critical Role of Spatial Data in Today's Data Ecosystem
 
QCon London: Mastering long-running processes in modern architectures
QCon London: Mastering long-running processes in modern architecturesQCon London: Mastering long-running processes in modern architectures
QCon London: Mastering long-running processes in modern architectures
 
React JS; all concepts. Contains React Features, JSX, functional & Class comp...
React JS; all concepts. Contains React Features, JSX, functional & Class comp...React JS; all concepts. Contains React Features, JSX, functional & Class comp...
React JS; all concepts. Contains React Features, JSX, functional & Class comp...
 
Generative Artificial Intelligence: How generative AI works.pdf
Generative Artificial Intelligence: How generative AI works.pdfGenerative Artificial Intelligence: How generative AI works.pdf
Generative Artificial Intelligence: How generative AI works.pdf
 
Green paths: Learning from publishers’ sustainability journeys - Tech Forum 2024
Green paths: Learning from publishers’ sustainability journeys - Tech Forum 2024Green paths: Learning from publishers’ sustainability journeys - Tech Forum 2024
Green paths: Learning from publishers’ sustainability journeys - Tech Forum 2024
 
Tampa BSides - The No BS SOC (slides from April 6, 2024 talk)
Tampa BSides - The No BS SOC (slides from April 6, 2024 talk)Tampa BSides - The No BS SOC (slides from April 6, 2024 talk)
Tampa BSides - The No BS SOC (slides from April 6, 2024 talk)
 
Design pattern talk by Kaya Weers - 2024 (v2)
Design pattern talk by Kaya Weers - 2024 (v2)Design pattern talk by Kaya Weers - 2024 (v2)
Design pattern talk by Kaya Weers - 2024 (v2)
 
A Glance At The Java Performance Toolbox
A Glance At The Java Performance ToolboxA Glance At The Java Performance Toolbox
A Glance At The Java Performance Toolbox
 
WomenInAutomation2024: AI and Automation for eveyone
WomenInAutomation2024: AI and Automation for eveyoneWomenInAutomation2024: AI and Automation for eveyone
WomenInAutomation2024: AI and Automation for eveyone
 
Software Security in the Real World w/Kelsey Hightower
Software Security in the Real World w/Kelsey HightowerSoftware Security in the Real World w/Kelsey Hightower
Software Security in the Real World w/Kelsey Hightower
 
Why Agile? - A handbook behind Agile Evolution
Why Agile? - A handbook behind Agile EvolutionWhy Agile? - A handbook behind Agile Evolution
Why Agile? - A handbook behind Agile Evolution
 
Introduction-to-Wazuh-and-its-integration.pptx
Introduction-to-Wazuh-and-its-integration.pptxIntroduction-to-Wazuh-and-its-integration.pptx
Introduction-to-Wazuh-and-its-integration.pptx
 
Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24
Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24
Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24
 

You Cannot Fix What You Cannot Find! --- An Investigation of Fault Localization Bias in Benchmarking Automated Program Repair Systems

  • 1. You Cannot Fix What You Cannot Find! An Investigation of Fault Localization Bias in Benchmarking Automated Program Repair Systems Kui Liu, Anil Koyuncu, Tegawendé F. Bissyandé, Dongsun Kim, Jacques Klein and Yves Le Traon SnT, University of Luxembourg, Luxembourg Xi’an, China, 12th ICST 2019April 24, 2019
  • 3. 2 > Automated Program Repair (APR) Basic Process Test Pass Fail Patch Validation Plausible Patch Patch Candidate APR Approach Patch Generation Fault Localization Tools A Ranked List of Suspicious Code Locations Fault Localization (FL) Buggy Program Passing tests Failing tests
  • 4. 3 > Repair Performance Assessment of APR Tools GenProg, ACS, CapGen, SimFix, etc.
  • 5. 4 > Repair Performance Assessment of APR Tools Benchmark Defects4J SimFix jGP jKali Nopol ACS HDR ssFix ELIXIR JAID Chart 4 0 0 1 2 -(2) 3 4 2(4) Closure 6 0 0 0 0 -(7) 2 0 5(9) Math 14 5 1 1 12 -(7) 10 12 1(7) Lang 9 0 0 3 3 -(6) 5 8 1(5) Time 1 0 0 0 1 -(1) 0 2 0(0) Total 34 5 1 5 18 13(23) 20 26 9(25) TABLE I. TABLE EXCERPTED FROM [33] WITH THE CAPTION “Correct patches generated by different techniques”. Is this kind of comparison fair enough? *The numbers in parenthesis(#) denote the number of bugs fixed by APR tools but ignoring the patch ranking.
  • 7. 6 > Automated Program Repair (APR) Basic Process Test Pass Fail Patch Validation Plausible Patch Patch Candidate APR Approach Patch Generation Fault Localization Tools A Ranked List of Suspicious Code Locations Fault Localization (FL) Buggy Program Passing tests Failing tests Same SameContributionNo Discussion There is no any disclosed discussion on the impact of fault localization for APR tool repair performance.
  • 8. 7 > Repair Performance Assessment of APR Tools Benchmark Defects4J SimFix jGP jKali Nopol ACS HDR ssFix ELIXIR JAID Chart 4 0 0 1 2 -(2) 3 4 2(4) Closure 6 0 0 0 0 -(7) 2 0 5(9) Math 14 5 1 1 12 -(7) 10 12 1(7) Lang 9 0 0 3 3 -(6) 5 8 1(5) Time 1 0 0 0 1 -(1) 0 2 0(0) Total 34 5 1 5 18 13(23) 20 26 9(25) TABLE I. TABLE EXCERPTED FROM [33] WITH THE CAPTION “Correct patches generated by different techniques”. BIASED If the fault localization of APR tools are different from each other, this kind of comparison could be biased or misleading.
  • 10. 9 > Spectrum-based fault localization, e.g., 𝑆 𝑂𝑐ℎ𝑖𝑎𝑖 𝑠 = 𝑓𝑎𝑖𝑙𝑒𝑑(𝑠) 𝑓𝑎𝑖𝑙𝑒𝑑 𝑠 + 𝑝𝑎𝑠𝑠𝑒𝑑 𝑠 ∗ (𝑓𝑎𝑖𝑙𝑒𝑑 𝑠 + 𝑓𝑎𝑖𝑙𝑒𝑑(¬𝑠)) 𝑆 𝑇𝑎𝑟𝑎𝑛𝑡𝑢𝑙𝑎 𝑠 = 𝑆 𝐵ar𝑖𝑛𝑒𝑙 𝑠 = 1 − 𝑝𝑎𝑠𝑠𝑒𝑑(𝑠) 𝑓𝑎𝑖𝑙𝑒𝑑 𝑠 + 𝑝𝑎𝑠𝑠𝑒𝑑(𝑠) 𝑓𝑎𝑖𝑙𝑒𝑑(𝑠) 𝑓𝑎𝑖𝑙𝑒𝑑 𝑠 + 𝑓𝑎𝑖𝑙𝑒𝑑(¬𝑠) 𝑓𝑎𝑖𝑙𝑒𝑑(𝑠) 𝑓𝑎𝑖𝑙𝑒𝑑 𝑠 + 𝑓𝑎𝑖𝑙𝑒𝑑(¬𝑠) + 𝑝𝑎𝑠𝑠𝑒𝑑(𝑠) 𝑝𝑎𝑠𝑠𝑒𝑑 𝑠 + 𝑝𝑎𝑠𝑠𝑒𝑑(¬𝑠) Fault Localization used in Automated Program Repair for Java
  • 11. 10 > <Suspicious Class Name> <Line> <Suspicious Value> org.jfree.chart.plot.CategoryPlot 1613 0.316227766 org.jfree.chart.plot.CategoryPlot 1684 0.25 org.jfree.chart.plot.CategoryPlot 1682 0.25 org.jfree.chart.plot.CategoryPlot 1665 0.25 org.jfree.chart.plot.CategoryPlot 1340 0.1825741858 org.jfree.chart.plot.CategoryPlot 1339 0.1825741858 org.jfree.chart.plot.CategoryPlot 1358 0.1507556723 org.jfree.chart.plot.CategoryPlot 1367 0.1474419562 org.jfree.chart.plot.CategoryPlot 1045 0.1443375673 org.jfree.chart.renderer.category.AbstractCategoryItemRenderer 1798 0.115 org.jfree.chart.renderer.category.AbstractCategoryItemRenderer 1797 0.115 org.jfree.chart.renderer.category.AbstractCategoryItemRenderer 1796 0.115 org.jfree.data.category.DefaultCategoryDataset 259 0.0733235575 Fault Localization used in Automated Program Repair for Java
  • 12. 11 > Fault Localization Bias in APR Test Pass Fail Patch Validation Plausible Patch Patch Candidate APR Approach Patch Generation Fault Localization Tools A Ranked List of Suspicious Code Locations Fault Localization (FL) Buggy Program Passing tests Failing tests
  • 13. > 12 > Research Questions • RQ1: How do APR systems leverage fault localization techniques? • RQ2: How many bugs from a common APR benchmark are actually localizable? • RQ3: To what extent APR performance can vary by adapting different fault localization configurations?
  • 14. 13 RQ1: Integration of Fault Localization Tools in APR Pipelines
  • 15. 14 > FL Techniques used in APR tools HDRepair, ssFix, ACS SimFix. APR Tools FL framework Framework version FL ranking metric Supplementary information jGenProg GZoltar 0.1.1 Ochiai ∅ jKali GZoltar 0.1.1 Ochiai ∅ jMutRepair GZoltar 0.1.1 Ochiai ∅ HDRepair ? ? ? Faulty method is known. Nopol GZoltar 0.0.10 Ochiai ∅ ACS GZoltar 0.1.1 Ochiai Predicate switching [53]. ELIXIR ? ? Ochiai ? JAID ? ? ? ? ssFix GZoltar 0.1.1 ? Statements in crashed stack trace. CapGen GZoltar 0.1.1 Ochiai ? SketchFix ? ? Ochiai ? FixMiner GZoltar 0.1.1 Ochiai ∅ LSRepair GZoltar 0.1.1 Ochiai ∅ SimFix GZoltar 1.6.0 Ochiai Test case purification [54].
  • 16. 15 1. State-of-the-art APR systems in the literature add some adaptations to the usual FL process to improve its accuracy. Unfortunately, researchers have eluded so far the contribution of this improvement in the overall repair performance, leading to biased comparisons. 2. APR systems do not fully disclose their fault localization tuning parameters, thus preventing reliable replication and comparisons. > Take-away Message
  • 17. 16 RQ2: Localizability of Defects4J Bugs
  • 18. 17 > How many bugs in the Defects4J benchmark can actually be localized by current automated fault localization tools? Localizability: --- a/src/org/jfree/data/time/Week.java +++ b/src/org/jfree/data/time/Week.java @@ −173, 1 +173, 1 @@ public Week(Date time, TimeZone zone) { // defer argument checking... - this(time, RegularTimePeriod.DEFAULT_TIME_ZONE, Locale.getDefault()); + this(time, zone, Locale.getDefault); } File Level Method Level Line Level
  • 19. 18 > How many bugs in the Defects4J benchmark can actually be localized by current automated fault localization tools? Project # Bugs File Method Line GZ-0.1.1 GZ-1.6.0 GZ-0.1.1 GZ-1.6.0 GZ-0.1.1 GZ-1.6.0 Chart 26 25 25 22 24 22 24 Closure 133 113 128 78 96 78 95 Lang 65 54 64 32 59 29 57 Math 106 101 105 92 100 91 100 Mockito 38 25 26 22 24 21 23 Time 27 26 26 22 22 22 22 Total 395 344 374 268 325 263 321 Number of Defects4J bugs localized with GZoltar + Ochiai. One third of bugs in the Defects4J dataset cannot be localized at line level by the commonly used automated fault localization tool.
  • 20. 19 > How many bugs in the Defects4J benchmark can actually be localized by current automated fault localization tools? Ranking Metric GZoltar-0.1.1 GZoltar-1.6.0 File Method Line File Method Line Top-1 Position Tarantula 171 101 45 169 106 35 Ochiai 173 102 45 172 111 38 DStar 173 102 45 175 114 40 Barinel 171 101 45 169 107 36 Opt2 175 97 39 179 115 39 Muse 170 98 40 178 118 41 Jaccard 173 102 45 171 112 39 Top-10 Position Tarantula 240 180 135 242 189 144 Ochiai 244 184 140 242 191 145 DStar 245 184 139 242 190 142 Barinel 240 180 135 242 190 145 Opt2 237 168 128 239 184 135 Muse 234 169 129 239 186 140 Jaccard 245 184 139 241 188 142 A small part of bugs can be localized with high positions in the ranking list of suspicious positions.
  • 21. 20 > Impact of Effective Ranking in Fault Localization APR tools are prone to correctly fix the subset of Defects4J bugs that can be accurately localized.
  • 22. 21 1. One third of bugs in the Defects4J dataset cannot be localized by the commonly used automated fault localization tool. 2. APR tools are prone to correctly fix the subset of Defects4J bugs that can be accurately localized. > Take-away Message
  • 23. 22 RQ3: Evaluation kPAR with Specific Fault Localization Configurations
  • 24. 23 > kPAR with four different fault localization configurations Normal FL: File Assumption: Method Assumption: Line Assumption: It relies on the ranked list of suspicious code locations reported by a given FL tool. It assumes that the faulty code files are known. It assumes that the faulty methods are known. It assumes that the faulty code lines are known. No fault localization is then used. kPAR: Java implementation of PAR (Kim et al. ICSE 2013) + Gzoltar-0.1.1 + Ochiai.
  • 25. 24 > Repair Performance of kPAR with Four FL Configurations FL Conf. Chart (C) Closure (Cl) Lang (L) Math (M) Mockito (Moc) Time (T) Total Normal FL 3/10 5/9 1/8 7/18 1/2 1/2 18/49 File Assumption 4/7 6/13 1/8 7/15 2/2 2/3 22/48 Method Assumption 4/6 7/16 1/7 7/15 2/2 2/3 23/49 Line Assumption 7/8 11/16 4/9 9/16 2/2 3/4 36/55 Number of Defects4J bugs fixed by kPAR with four FL configurations. C-1, 4, 7, L-59. Cl-2, 38, 62, 63, 73. M-15, 33, 58, 70, 75, 85, 89. Moc-38, T-7. File_AssumptionNormal_FL C-26, Cl-4, Moc-29, T-19. Cl-10 Method_Assumption C-8,14,19. Cl-31,38,40,70. M-4,82, T-26. L-6,22,24. Line_Assumption With better fault localization results, kPAR can correctly fix more bugs.
  • 26. 25 > Take-away Message Accuracy of fault localization has a direct and substantial impact on the performance of APR repair pipelines.
  • 28. 27 > Discussions 1.Full disclosure of FL parameters. 2.Qualification of APR performance. 3.Patch generation step vs Repair pipeline. 4.Sensitivity of search space.
  • 29. 28 > Summary 7 > Fault Localization Bias in APR Test Pass Fail Patch Validation Plausible Patch Patch Candidate APR Approach Patch Generation Fault Localization Tools A Ranked List of Suspicious Code Locations Fault Localization (FL) Buggy Program Passing tests Failing tests

Editor's Notes

  1. Good afternoon, every one, my name is Jacques Klein, I’m prof. At the University of Luxembourg. Today, I am going to present our recent work entitled The main outcome of this work is the demonstration of BIAS in Fault localization in APR pipeline Before to start let me introduce the authors of this paper
  2. Now, let’s take a look at the basic process of automated program repair. The first process is Fault localization that identifies the code positions that are likely to be buggy. The second one is Patch generation, mutating the buggy code to generate patch candidates with proposed approach. The last process is to validate whether the generated patch can fix the bug or not.
  3. Then, how to evaluate the repair performance of APR tools? To assess the repair performance of automated program repair tools, Researchers proposed to, in a dataset with limited number of bugs, compare APR tools with the number of bugs fixed by plausible patches and correct patches, respectively.
  4. For example, this table is excerpted from SimFix Paper, which shows the number of bugs in Defects4J correctly fixed by different APR tools. OK, here is a question: is this kind of comparison fair enough.
  5. OK, let’s go back to the APR basic process. The input of APR tools are the same, which are from the same benchmark defects4j. The last process is patch validation which is conducted by regression test, so it is also the same for APR tools. The third process is patch generation that is the key contribution of each APR tool which shows how to generate patches for bugs. The third process is highly dependent on the fault localization process which identify which code should be modified by the APR tool to generated patches. However, there is no any disclosed discussion on the impact of fault localization for ARP tool repair performance.
  6. If the fault localization of APR tools are different from each other, this kind of comparison could be biased or misleading.
  7. Why the proposed comparison could be biased, if APR tools use different fault localization tools. Now, let’s take a look at the fault localization techniques used in automated program repair for Java programs. In the state-of-the-art APR tools for Java programs, the widely used fault localization technique is spectrum-based fault localization, such as the GZoltar framework. Spectrum-based fault localization relies on the ranking metric to calculate the suspiciousness value of each code statement. Here, we list three ranking metrics: Ochiai, Tarantula, and Barinel, to calculate the suspiciousness value of each statement in buggy program. In these formulas, lower case letter s denotes one statement, failed(s) denotes the number of failed test cases that execute this statement, passed(s) denotes the number of passed test cases that execute this statement. failed(¬s) denotes the number of failed test cases that do not execute this statement. passed(¬s) denotes the number of passed test cases that do not execute this statement. Ok, with this kind of spectrum-based fault localization, suspicious code positions of buggy programs are reported in this way. The first column lists the suspicious class name, the second column lists the suspicious line, and the last column lists the suspiciousness value. APR tools try to mutate each suspicious line one by one until the first plausible patch is generated. So, this suspicious code position list can affect the repair performance of APR tools.
  8. Why the proposed comparison could be biased, if APR tools use different fault localization tools. Now, let’s take a look at the fault localization techniques used in automated program repair for Java programs. In the state-of-the-art APR tools for Java programs, the widely used fault localization technique is spectrum-based fault localization, such as the GZoltar framework. Spectrum-based fault localization relies on the ranking metric to calculate the suspiciousness value of each code statement. Here, we list three ranking metrics: Ochiai, Tarantula, and Barinel, to calculate the suspiciousness value of each statement in buggy program. In these formulas, lower case letter s denotes one statement, failed(s) denotes the number of failed test cases that execute this statement, passed(s) denotes the number of passed test cases that execute this statement. failed(¬s) denotes the number of failed test cases that do not execute this statement. passed(¬s) denotes the number of passed test cases that do not execute this statement. Ok, with this kind of spectrum-based fault localization, suspicious code positions of buggy programs are reported in this way. The first column lists the suspicious class name, the second column lists the suspicious line, and the last column lists the suspiciousness value. APR tools try to mutate each suspicious line one by one until the first plausible patch is generated. So, this suspicious code position list can affect the repair performance of APR tools.
  9. So, if the fault localization techniques are different, APR tools can have different repair performance even they use the same patch generation technique.
  10. In this work, we investigate the fault localization bias in automated program repair systems with these three research questions.
  11. To answer RQ1: We first investigate FL techniques used in APR systems in the literature. This reveals which FL tool and formula are integrated for each APR system.
  12. This table illustrates the fault localization techniques used by APR tools. Most APR tools use the Gzoltar framework, 0.1.1 version and Ochiai ranking metric to identify bug positions. The unspecified/unconfirmed information of an APR tools is marked with ‘?’. If an APR tool does not use any supplementary information for FL, the corresponding table cell is marked with ‘∅’. Furthermore, HDRepair assumes that the faulty method is known to improve fault localization accuracy. ACS leverages predicate switching, ssFix uses statements in crashed stack trace, and SimFix leverages test case purification to improve the fault localization accuracy.
  13. Now, let’s take a look at the second research question.
  14. To answer the second research question, we define the localizability with three levels: file, method, and line level. File level: we consider that the faulty code is accurately localized if an FL tool reports at least one line from the buggy code file as suspicious. Method Level: we consider that the faulty code is accurately localized if at least one code line in the buggy method is reported by an FL tool as suspicious. Line level: we consider that the faulty code is accurately localized if suspicious code lines reported by an FL tool contain any of the buggy code lines.
  15. This table shows the number of bugs that can be identified in the three granularities with two versions of Gzoltar. We note that some bugs cannot be located by Gzoltar.
  16. APR tools repair bugs by mutating the suspicious statements one by one from the high suspicious value to low one. If we only consider the suspicious statements with the highest suspicious value at line granularity, less than 50 bugs can be localized. If we enlarge the search space of suspicious statements to top-10 at line granularity, only one third of bugs cane be localized.
  17. To understand the fault localization effects on automated program, we further investigate the localization ranking positions of fixed and unfixed bugs. If the value is closer to 1.0, the exact buggy statements have higher suspiciousness value in the suspicious statements reported by fault localization techniques. This figure shows that, most bugs are fixed since they can be localized accurate than the unfixed bugs. where bugpos refers to the position of the actual bug location1 in the ranked list of suspicious locations reported by the FL step. If the bug location can be found in the higher position of the ranked list, the value of Reciprocal is closer to 1.
  18. To what extent APR performance can vary by adapting different fault localization configurations?
  19. To answer this question, we implementd kPAR, the java implementation of PAR, with four different fault localization configurations. File Assumption: it means that only the suspicious statements in the buggy code file are considered as the input of APR tools to generate patch candidates. Method Assumption: it means that only the suspicious statements in the buggy methods are considered as the input of APR tools.
  20. In this table, each cell has two numbers, the left one means the number of bugs correctly fixed by kPAR. The right one means the number of bugs plausibly fixed by kPAR. According to the results shown in this table, we note that, with better fault localization results, kPAR can correctly fix more bugs. Additionally, the bugs correctly fixed with lower accurate fault localization results can also be correctly fixed with higher accurate fault localization results.
  21. Given that many APR systems do not release their source code, it is important that the experimental reports clearly indicate the protocol used for fault localization. (Preferably, authors should strive to assess their performance under a standard and replicable configuration of fault localization.) 2. To ensure that novel approaches to APR are indeed improving over the state-of-the-art, authors must qualify the performance gain brought by the different ingredients of their approaches. 3. There are two distinct directions of repair benchmarking that APR researchers should consider. In the first, a novel contribution to the patch generation problem must be assessed directly by assuming a perfect fault localization. In the second, for ensuring realistic assessment w.r.t. industry adoption, the full pipeline should be tested with no assumptions on fault localization accuracy. 4. Given that fault localization produces a ranked list of suspicious locations, it is essential to characterize whether exact locations are strictly necessary for the APR approach to generate the correct patches. For example, an APR system may not focus only on a suspected location but on the context around this location. APR approaches may also use heuristics to curate the FL results.
  22. Voilà To conclude, I repeat the key message, to not propose biased assement in APR pipeline, take care of the FL step