This keynote presentation discusses defect prediction along three axes: identifying new metrics, developing new algorithms, and exploring different levels of granularity. It gives examples of studies that introduced new metrics such as complexity metrics, historical metrics, and network measures. It also surveys the machine learning approaches used, including classification, regression, and combinations of the two. The presentation shows that defect prediction can be done at different levels of granularity, including module/binary/package, file, method, and change levels. It summarizes research on prediction performance and discusses challenges such as noise in training data and applying defect prediction to new customers and projects. Finally, it discusses moving toward finer-grained prediction at the line and word level.
21. Slide by Mik Kersten. “Mylyn – The task-focused interface” (December 2007, http://live.eclipse.org)
22. With Mylyn
Tasks are integrated
See only what you are working on
Slide by Mik Kersten. “Mylyn – The task-focused interface” (December 2007, http://live.eclipse.org)
28. Change Entropy
[Figure: two change periods — low entropy, where the week's changes are concentrated in a few files (e.g., F1–F5), vs. high entropy, where changes are scattered across many files (F1–F10)]
Change entropy is computed from the number of changes in a period (e.g., a week) per file.
Hassan, “Predicting Faults Using the Complexity of Code Changes,” ICSE 2009
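The idea can be sketched as normalized Shannon entropy over per-file change counts. This is only a sketch of the core measure; Hassan's paper also defines decayed and weighted variants not shown here.

```python
import math

def change_entropy(changes_per_file):
    """Normalized Shannon entropy of changes across files in one period.

    Returns 0.0 when all changes hit a single file (low entropy) and
    1.0 when changes are spread evenly across all files (high entropy).
    """
    total = sum(changes_per_file)
    n = len(changes_per_file)
    if total == 0 or n < 2:
        return 0.0
    probs = [c / total for c in changes_per_file if c > 0]
    h = -sum(p * math.log(p, 2) for p in probs)
    return h / math.log(n, 2)  # normalize by the maximum entropy log2(n)

print(change_entropy([3, 3, 3, 3]))  # evenly spread → 1.0
print(change_entropy([9, 1, 1, 1]))  # concentrated in one file → lower
```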
30. Previous Fixes
Hassan et al., ICSM 2005, Kim et al., ICSE 2007
37. Classification
[Diagram: training instances (metrics + labels: complexity metrics, historical metrics, …) feed a classification learner, which predicts the label of a new, unlabeled instance (“?”)]
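The classification setup can be sketched with a toy 1-nearest-neighbor learner. All data, metric choices, and labels below are hypothetical; the surveyed studies use real mined metrics and a variety of learners.

```python
# Minimal sketch of metric-based defect classification using a
# 1-nearest-neighbor learner (toy data; real studies mine complexity
# and historical metrics from the project under study).

def predict_label(train, new_instance):
    """Return the label of the training instance closest to new_instance."""
    def dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    metrics, label = min(train, key=lambda t: dist(t[0], new_instance))
    return label

# training instances: ([cyclomatic complexity, #past changes, LOC], label)
train = [([2, 1, 50], "clean"), ([15, 9, 400], "buggy"),
         ([3, 2, 80], "clean"), ([20, 12, 650], "buggy")]

print(predict_label(train, [17, 8, 520]))  # the "?" instance → buggy
```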
38. Regression
[Diagram: training instances (metrics + values: complexity metrics, historical metrics, …) feed a regression learner, which predicts a numeric value (e.g., a defect count) for a new instance (“?”)]
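The regression variant predicts a defect count rather than a buggy/clean label. A one-variable least-squares line stands in here for the richer learners in the surveyed papers; the data is invented for illustration.

```python
# Minimal sketch of regression-based defect prediction: fit a line
# from a single metric to a defect count (toy data and toy learner).

def fit_line(xs, ys):
    """Ordinary least squares for y = a*x + b."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    a = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / \
        sum((x - mx) ** 2 for x in xs)
    return a, my - a * mx

# hypothetical training data: file complexity -> defects found later
complexity = [2, 5, 8, 12, 15]
defects    = [0, 1, 2, 4, 5]

a, b = fit_line(complexity, defects)
print(a * 10 + b)  # predicted defect count for a file with complexity 10
```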
39. Active Learning
[Figure 4: Active refinement process — (1) an anomaly detection system produces sorted bug reports, (2) the first few reports are shown to the user, (3) user feedback on them enters a refinement loop, (4–5) and the refinement engine re-sorts the remaining reports]
Lo et al., “Active Refinement of Clone Anomaly Reports,” ICSE 2012; Lu et al., PROMISE 2012
40. Bug Cache
[Figure: out of all files, a cache holds the ~10% most bug-prone; entries are loaded on misses and replaced over time, with nearby co-changed files pre-fetched]
Kim et al., “Predicting Faults from Cached History,” ICSE 2007
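The cache idea can be sketched as an LRU structure over file names. This is a simplification under stated assumptions: the published algorithm also pre-fetches new and recently changed files and tunes the eviction policy, none of which is modeled here.

```python
# Sketch of the bug-cache idea: keep a small cache (~10% of files);
# when a fault is fixed, load the faulty file plus files that changed
# together with it, evicting the least-recently-used entries.
from collections import OrderedDict

class BugCache:
    def __init__(self, capacity):
        self.capacity = capacity
        self.cache = OrderedDict()  # file -> None, kept in LRU order

    def _touch(self, f):
        self.cache.pop(f, None)
        self.cache[f] = None                 # move/insert as most recent
        while len(self.cache) > self.capacity:
            self.cache.popitem(last=False)   # evict least recently used

    def on_fault_fix(self, faulty_file, co_changed):
        self._touch(faulty_file)   # temporal locality: the faulty file
        for f in co_changed:       # spatial locality: co-changed files
            self._touch(f)

    def predict_fault_prone(self):
        return set(self.cache)     # files currently predicted fault-prone

cache = BugCache(capacity=3)
cache.on_fault_fix("a.c", ["b.c"])
cache.on_fault_fix("c.c", ["d.c"])  # "a.c" is evicted (capacity 3)
print(cache.predict_fault_prone())
```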
41. Algorithms
[Bar chart: # of publications (recent 7 years) — Classification 21, Regression 18, Both 4, Etc. 4]
47. Method Level
void foo () {
...
}
Hata et al., “Bug Prediction Based on Fine-Grained Module Histories,” ICSE 2012
49. Change Level
[Figure: the development history of a file across Rev 1–Rev 4, with a change between each pair of revisions — “Did I just introduce a bug?”]
Kim et al., “Classifying Software Changes: Clean or Buggy?” TSE 2009
51. More Granularities
[Bar chart: # of publications (recent 7 years) — Project/Release/Subsystem 3, Component/Module 8, Package 3, File 19, Class 8, Function/Method 2, Change/Hunk 1]
53. Performance
[Figure 2: systems studied — Apache, ArgoUML, Eclipse, an embedded system, a healthcare system, Microsoft, Mozilla]
Hall et al., “A Systematic Review of Fault Prediction Performance in Software Engineering,” TSE 2011 (Figure 2)
54. Performance
[Figure 6: the granularity of the results — Class, File, Module, Other* (*for example plug-ins, binaries)]
Hall et al., “A Systematic Review of Fault Prediction Performance in Software Engineering,” TSE 2011 (Figure 6)
65. Performance of Bug Detection Tools
[Bar chart: precision (%) of each tool's priority-1 warnings — FindBugs, jLint, PMD]
Kim and Ernst, “Which Warnings Should I Fix First?” FSE 2007
66. RQ1: How Many False Negatives?
• Defects can be missed, partially captured, or fully captured
• Warnings from a tool should also correctly explain in detail why a flagged line may be faulty
• How many one-line defects are captured and explained reasonably well (so-called “strictly captured”)?
Very high miss rates!
Thung et al., “To What Extent Could We Detect Field Defects?” ASE 2012
70. Bug Fix Memories
[Figure: bug-fix changes in revisions 1..n−1 — patterns extracted from the bug-fix change history populate a Memory]
Kim et al., “Memories of Bug Fixes,” FSE 2006
71. Bug Fix Memories
[Figure, continued: given new code to examine, search for matching patterns in the Memory built from the bug-fix changes of revisions 1..n−1]
Kim et al., “Memories of Bug Fixes,” FSE 2006
72. Fix Wizard
[Figure 1: Recurring bug fixes at v5088–v5089 in ZK — near-identical fixes applied to setColspan(int) and setRowspan(int), each guarding against non-positive values, checking Executions.getCurrent()/isExplorer(), and calling invalidate() and smartUpdate()]
Nguyen et al., “Recurring Bug Fixes in Object-Oriented Programs,” ICSE 2010
73. Fix Wizard
[Figure 2: Graph-based object usages for the code in Figure 1 — setColspan and setRowspan yield matching usage graphs over WrongValueException.<init>, Executions.getCurrent, Execution.isExplorer, Auxheader.invalidate, and Auxheader.smartUpdate, which is how the recurring fix is recognized]
Nguyen et al., “Recurring Bug Fixes in Object-Oriented Programs,” ICSE 2010
77–83. Source Repository & Bug Database
[Figure, built up across slides 77–83: the source repository holds all commits C, of which some are bug fixes Cf; the bug database holds all bugs B, of which some are fixed bugs Bf. Fixes and bugs linked via log messages form the linked fixes Cfl and linked fixed bugs Bfl; fixes and bugs that are related but not linked are missed by this heuristic — noise!]
Bird et al., “Fair and Balanced? Bias in Bug-Fix Datasets,” FSE 2009
84. How resistant is a defect prediction model to noise?
[Figure (c): buggy F-measure vs. training-set false negative (FN) & false positive (FP) rate (0–0.6), for SWT, Debug, Columba, Eclipse, Scarab]
Kim et al., “Dealing with Noise in Defect Prediction,” ICSE 2011
86. How resistant is a defect prediction model to noise?
[Figure (c), annotated at 20%: the F-measure curves stay largely stable up to roughly a 20% FN/FP rate in the training set]
Kim et al., “Dealing with Noise in Defect Prediction,” ICSE 2011
87. Closest List Noise Identification (CLNI)
[Figure 9: the pseudo-code of the CLNI algorithm]
Kim et al., “Dealing with Noise in Defect Prediction,” ICSE 2011
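Since the pseudo-code did not survive extraction, here is a hedged sketch of the underlying idea only: flag an instance as likely mislabeled when most of its nearest neighbors carry the opposite label. The published CLNI additionally iterates until the flagged set stabilizes and uses a similarity threshold, which this sketch omits.

```python
# Sketch of closest-list noise identification: an instance is flagged
# as probably mislabeled when most of its k nearest neighbors disagree
# with its label. (Simplified relative to the published algorithm.)

def flag_noisy(instances, labels, k=3, threshold=0.7):
    def dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    noisy = []
    for i, (inst, lab) in enumerate(zip(instances, labels)):
        neighbors = sorted((j for j in range(len(instances)) if j != i),
                           key=lambda j: dist(inst, instances[j]))[:k]
        disagree = sum(labels[j] != lab for j in neighbors) / k
        if disagree >= threshold:
            noisy.append(i)
    return noisy

# two tight clusters; index 4 sits in the "clean" cluster but is
# labeled buggy, so it should be flagged
X = [[0, 0], [0, 1], [1, 0], [10, 10], [1, 1], [10, 11], [11, 10]]
y = ["clean", "clean", "clean", "buggy", "buggy", "buggy", "buggy"]
print(flag_noisy(X, y))  # → [4]
```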
88. Noise Detection Performance (noise level = 20%)
        Precision  Recall  F-measure
Debug   0.681      0.871   0.764
SWT     0.624      0.830   0.712
Kim et al., “Dealing with Noise in Defect Prediction,” ICSE 2011
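As a quick sanity check, F-measure is the harmonic mean of precision and recall, and the table's rows are internally consistent with the standard formula:

```python
# F = 2 * P * R / (P + R), the harmonic mean of precision and recall.

def f_measure(p, r):
    return 2 * p * r / (p + r)

print(round(f_measure(0.681, 0.871), 3))  # Debug row → 0.764
print(round(f_measure(0.624, 0.830), 3))  # SWT row   → 0.712
```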
89. Bug prediction using cleaned data
[Chart: SWT F-measure (0–100) vs. noise level (0%–45%), noisy training data]
91. Bug prediction using cleaned data
[Chart: SWT F-measure (0–100) vs. noise level (0%–45%), noisy vs. cleaned training data — with cleaned data, the F-measure remains 76% even at 45% noise]
92. ReLink
[Figure: traditional heuristics (a link miner) recover some links between the source code repository and the bug database but leave unknown links; ReLink recovers additional links using features, then combines both sets of links]
Wu et al., “ReLink: Recovering Links between Bugs and Changes,” FSE 2011
94. ReLink Performance
[Bar chart: F-measure (0–100) of Traditional vs. ReLink on ZXing, OpenIntents, Apache]
Wu et al., “ReLink: Recovering Links between Bugs and Changes,” FSE 2011
95. Label Historical Changes
[Figure: the development history of a file, Rev 1 … Rev 102; Rev 102's change message “fix for bug 28434” marks Rev 101 as the buggy revision (with BUG) that the fix repaired, and Rev 102 as clean (no BUG)]
Fischer et al., “Populating a Release History Database from Version Control and Bug Tracking Systems,” ICSM 2003
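The labeling step can be sketched as follows, under simplifying assumptions: a commit whose message matches a fix pattern marks the prior revision of the touched file as buggy. Real tools (SZZ and descendants) additionally trace the fix hunks back to the exact bug-introducing change, which this sketch does not attempt.

```python
# Sketch of labeling historical changes from commit messages
# (simplified; the fix pattern and revision history are illustrative).
import re

FIX_PATTERN = re.compile(r"\b(fix(es|ed)?|bug)\b", re.IGNORECASE)

def label_revisions(history):
    """history: list of (rev, message). Returns {rev: 'BUG' or 'no BUG'}."""
    labels = {rev: "no BUG" for rev, _ in history}
    for i, (rev, message) in enumerate(history):
        if FIX_PATTERN.search(message) and i > 0:
            prev_rev = history[i - 1][0]
            labels[prev_rev] = "BUG"   # the revision the fix repaired
    return labels

history = [(100, "refactor parser"),
           (101, "add tab handling"),
           (102, "fix for bug 28434")]
print(label_revisions(history))  # → {100: 'no BUG', 101: 'BUG', 102: 'no BUG'}
```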
96. Atomic Change
[Figure: Rev 101 (with BUG) calls setText(“t”); Rev 102 (no BUG) replaces it with insertTab() — the atomic change fixed by “fix for bug 28434”]
Fischer et al., “Populating a Release History Database from Version Control and Bug Tracking Systems,” ICSM 2003
97. Composite Change
[Figure 5: JFreeChart revision 1083 — a single commit containing four hunks across three methods: hunk 1 qualifies an addOrUpdate(...) call with `this.` in addOrUpdate(RegularTimePeriod, double); hunk 2 strengthens a bounds check in createCopy(...) from `if (endIndex < 0)` to `if ((endIndex < 0) || (endIndex < startIndex))`; hunks 3–4 reformat the ObjectUtilities.equal(...) checks on getDomainDescription()/getRangeDescription() in equals(Object)]
Tao et al., “How Do Software Engineers Understand Code Changes?” FSE 2012
110. Warning Prioritization
[Chart: precision (%) vs. warning instances by priority (0–100), comparing history-based prioritization (“History”) against the tools' own ordering (“Tool”)]
Kim and Ernst, “Which Warnings Should I Fix First?” FSE 2007
111. Other Topics
• Explanation
- Why has an instance been predicted as defect-prone?
• Cross-project prediction
• Cost-effectiveness measures
• Active learning/refinement
118. Some slides/data are borrowed
with thanks from
• Tom Zimmermann, Chris Bird
• Andreas Zeller
• Ahmed Hassan
• David Lo
• Jaechang Nam, Yida Tao
• Tien Nguyen
• Steve Counsell, David Bowes, Tracy Hall and David Gray
• Wen Zhang