This keynote presentation discusses defect prediction along three axes: identifying new metrics, developing new algorithms, and exploring different levels of granularity. It gives examples of studies that introduced new metrics such as complexity metrics, historical metrics, and network measures. It also surveys the machine learning approaches used, including classification, regression, and combinations of the two. The presentation shows that defect prediction can be done at different levels of granularity, including module/binary/package, file, method, and change levels. It summarizes research on prediction performance and discusses challenges such as noise in training data and applying defect prediction to new customers and projects. Finally, it discusses moving toward finer-grained prediction at the line and word level.
21. Slide by Mik Kersten. “Mylyn – The task-focused interface” (December 2007, http://live.eclipse.org)
22. With Mylyn
Tasks are integrated
See only what you are working on
Slide by Mik Kersten. “Mylyn – The task-focused interface” (December 2007, http://live.eclipse.org)
28. Change Entropy
[Figure: two change periods — low entropy, where the week's changes are concentrated in a few files (e.g., F1–F5), vs. high entropy, where changes are scattered across many files (F1–F10)]
Change entropy is computed from the number of changes in a period (e.g., a week) per file.
Hassan, “Predicting Faults Using the Complexity of Code Changes,” ICSE 2009
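The idea can be sketched as normalized Shannon entropy over per-file change counts. This is only a sketch of the core measure; Hassan's paper also defines decayed and weighted variants not shown here.

```python
import math

def change_entropy(changes_per_file):
    """Normalized Shannon entropy of changes across files in one period.

    Returns 0.0 when all changes hit a single file (low entropy) and
    1.0 when changes are spread evenly across all files (high entropy).
    """
    total = sum(changes_per_file)
    n = len(changes_per_file)
    if total == 0 or n < 2:
        return 0.0
    probs = [c / total for c in changes_per_file if c > 0]
    h = -sum(p * math.log(p, 2) for p in probs)
    return h / math.log(n, 2)  # normalize by the maximum entropy log2(n)

print(change_entropy([3, 3, 3, 3]))  # evenly spread → 1.0
print(change_entropy([9, 1, 1, 1]))  # concentrated in one file → lower
```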
30. Previous Fixes
Hassan et al., ICSM 2005, Kim et al., ICSE 2007
37. Classification
[Diagram: training instances (metrics + labels: complexity metrics, historical metrics, …) feed a classification learner, which predicts the label of a new, unlabeled instance (“?”)]
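The classification setup can be sketched with a toy 1-nearest-neighbor learner. All data, metric choices, and labels below are hypothetical; the surveyed studies use real mined metrics and a variety of learners.

```python
# Minimal sketch of metric-based defect classification using a
# 1-nearest-neighbor learner (toy data; real studies mine complexity
# and historical metrics from the project under study).

def predict_label(train, new_instance):
    """Return the label of the training instance closest to new_instance."""
    def dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    metrics, label = min(train, key=lambda t: dist(t[0], new_instance))
    return label

# training instances: ([cyclomatic complexity, #past changes, LOC], label)
train = [([2, 1, 50], "clean"), ([15, 9, 400], "buggy"),
         ([3, 2, 80], "clean"), ([20, 12, 650], "buggy")]

print(predict_label(train, [17, 8, 520]))  # the "?" instance → buggy
```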
38. Regression
[Diagram: training instances (metrics + values: complexity metrics, historical metrics, …) feed a regression learner, which predicts a numeric value (e.g., a defect count) for a new instance (“?”)]
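The regression variant predicts a defect count rather than a buggy/clean label. A one-variable least-squares line stands in here for the richer learners in the surveyed papers; the data is invented for illustration.

```python
# Minimal sketch of regression-based defect prediction: fit a line
# from a single metric to a defect count (toy data and toy learner).

def fit_line(xs, ys):
    """Ordinary least squares for y = a*x + b."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    a = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / \
        sum((x - mx) ** 2 for x in xs)
    return a, my - a * mx

# hypothetical training data: file complexity -> defects found later
complexity = [2, 5, 8, 12, 15]
defects    = [0, 1, 2, 4, 5]

a, b = fit_line(complexity, defects)
print(a * 10 + b)  # predicted defect count for a file with complexity 10
```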
39. Active Learning
[Figure 4: Active refinement process — (1) an anomaly detection system produces sorted bug reports, (2) the first few reports are shown to the user, (3) user feedback on them enters a refinement loop, (4–5) and the refinement engine re-sorts the remaining reports]
Lo et al., “Active Refinement of Clone Anomaly Reports,” ICSE 2012; Lu et al., PROMISE 2012
40. Bug Cache
[Figure: out of all files, a cache holds the ~10% most bug-prone; entries are loaded on misses and replaced over time, with nearby co-changed files pre-fetched]
Kim et al., “Predicting Faults from Cached History,” ICSE 2007
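The cache idea can be sketched as an LRU structure over file names. This is a simplification under stated assumptions: the published algorithm also pre-fetches new and recently changed files and tunes the eviction policy, none of which is modeled here.

```python
# Sketch of the bug-cache idea: keep a small cache (~10% of files);
# when a fault is fixed, load the faulty file plus files that changed
# together with it, evicting the least-recently-used entries.
from collections import OrderedDict

class BugCache:
    def __init__(self, capacity):
        self.capacity = capacity
        self.cache = OrderedDict()  # file -> None, kept in LRU order

    def _touch(self, f):
        self.cache.pop(f, None)
        self.cache[f] = None                 # move/insert as most recent
        while len(self.cache) > self.capacity:
            self.cache.popitem(last=False)   # evict least recently used

    def on_fault_fix(self, faulty_file, co_changed):
        self._touch(faulty_file)   # temporal locality: the faulty file
        for f in co_changed:       # spatial locality: co-changed files
            self._touch(f)

    def predict_fault_prone(self):
        return set(self.cache)     # files currently predicted fault-prone

cache = BugCache(capacity=3)
cache.on_fault_fix("a.c", ["b.c"])
cache.on_fault_fix("c.c", ["d.c"])  # "a.c" is evicted (capacity 3)
print(cache.predict_fault_prone())
```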
41. Algorithms
[Bar chart: # of publications (recent 7 years) — Classification 21, Regression 18, Both 4, Etc. 4]
47. Method Level
void foo () {
...
}
Hata et al., “Bug Prediction Based on Fine-Grained Module Histories,” ICSE 2012
49. Change Level
[Figure: the development history of a file across Rev 1–Rev 4, with a change between each pair of revisions — “Did I just introduce a bug?”]
Kim et al., “Classifying Software Changes: Clean or Buggy?” TSE 2009
51. More Granularities
[Bar chart: # of publications (recent 7 years) — Project/Release/Subsystem 3, Component/Module 8, Package 3, File 19, Class 8, Function/Method 2, Change/Hunk 1]
53. Performance
[Figure 2: systems studied — Apache, ArgoUML, Eclipse, an embedded system, a healthcare system, Microsoft, Mozilla]
Hall et al., “A Systematic Review of Fault Prediction Performance in Software Engineering,” TSE 2011 (Figure 2)
54. Performance
[Figure 6: the granularity of the results — Class, File, Module, Other* (*for example plug-ins, binaries)]
Hall et al., “A Systematic Review of Fault Prediction Performance in Software Engineering,” TSE 2011 (Figure 6)
65. Performance of Bug Detection Tools
[Bar chart: precision (%) of each tool's priority-1 warnings — FindBugs, jLint, PMD]
Kim and Ernst, “Which Warnings Should I Fix First?” FSE 2007
66. RQ1: How Many False Negatives?
• Defects can be missed, partially captured, or fully captured
• Warnings from a tool should also correctly explain in detail why a flagged line may be faulty
• How many one-line defects are captured and explained reasonably well (so-called “strictly captured”)?
Very high miss rates!
Thung et al., “To What Extent Could We Detect Field Defects?” ASE 2012
70. Bug Fix Memories
[Figure: bug-fix changes in revisions 1..n−1 — patterns extracted from the bug-fix change history populate a Memory]
Kim et al., “Memories of Bug Fixes,” FSE 2006
71. Bug Fix Memories
[Figure, continued: given new code to examine, search for matching patterns in the Memory built from the bug-fix changes of revisions 1..n−1]
Kim et al., “Memories of Bug Fixes,” FSE 2006
72. Fix Wizard
[Figure 1: Recurring bug fixes at v5088–v5089 in ZK — near-identical fixes applied to setColspan(int) and setRowspan(int), each guarding against non-positive values, checking Executions.getCurrent()/isExplorer(), and calling invalidate() and smartUpdate()]
Nguyen et al., “Recurring Bug Fixes in Object-Oriented Programs,” ICSE 2010
73. Fix Wizard
[Figure 2: Graph-based object usages for the code in Figure 1 — setColspan and setRowspan yield matching usage graphs over WrongValueException.<init>, Executions.getCurrent, Execution.isExplorer, Auxheader.invalidate, and Auxheader.smartUpdate, which is how the recurring fix is recognized]
Nguyen et al., “Recurring Bug Fixes in Object-Oriented Programs,” ICSE 2010
77–83. Source Repository & Bug Database
[Figure, built up across slides 77–83: the source repository holds all commits C, of which some are bug fixes Cf; the bug database holds all bugs B, of which some are fixed bugs Bf. Fixes and bugs linked via log messages form the linked fixes Cfl and linked fixed bugs Bfl; fixes and bugs that are related but not linked are missed by this heuristic — noise!]
Bird et al., “Fair and Balanced? Bias in Bug-Fix Datasets,” FSE 2009
84. How resistant is a defect prediction model to noise?
[Figure (c): buggy F-measure vs. training-set false negative (FN) & false positive (FP) rate (0–0.6), for SWT, Debug, Columba, Eclipse, Scarab]
Kim et al., “Dealing with Noise in Defect Prediction,” ICSE 2011
86. How resistant is a defect prediction model to noise?
[Figure (c), annotated at 20%: the F-measure curves stay largely stable up to roughly a 20% FN/FP rate in the training set]
Kim et al., “Dealing with Noise in Defect Prediction,” ICSE 2011
87. Closest List Noise Identification (CLNI)
[Figure 9: the pseudo-code of the CLNI algorithm]
Kim et al., “Dealing with Noise in Defect Prediction,” ICSE 2011
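Since the pseudo-code did not survive extraction, here is a hedged sketch of the underlying idea only: flag an instance as likely mislabeled when most of its nearest neighbors carry the opposite label. The published CLNI additionally iterates until the flagged set stabilizes and uses a similarity threshold, which this sketch omits.

```python
# Sketch of closest-list noise identification: an instance is flagged
# as probably mislabeled when most of its k nearest neighbors disagree
# with its label. (Simplified relative to the published algorithm.)

def flag_noisy(instances, labels, k=3, threshold=0.7):
    def dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    noisy = []
    for i, (inst, lab) in enumerate(zip(instances, labels)):
        neighbors = sorted((j for j in range(len(instances)) if j != i),
                           key=lambda j: dist(inst, instances[j]))[:k]
        disagree = sum(labels[j] != lab for j in neighbors) / k
        if disagree >= threshold:
            noisy.append(i)
    return noisy

# two tight clusters; index 4 sits in the "clean" cluster but is
# labeled buggy, so it should be flagged
X = [[0, 0], [0, 1], [1, 0], [10, 10], [1, 1], [10, 11], [11, 10]]
y = ["clean", "clean", "clean", "buggy", "buggy", "buggy", "buggy"]
print(flag_noisy(X, y))  # → [4]
```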
88. Noise Detection Performance (noise level = 20%)
        Precision  Recall  F-measure
Debug   0.681      0.871   0.764
SWT     0.624      0.830   0.712
Kim et al., “Dealing with Noise in Defect Prediction,” ICSE 2011
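As a quick sanity check, F-measure is the harmonic mean of precision and recall, and the table's rows are internally consistent with the standard formula:

```python
# F = 2 * P * R / (P + R), the harmonic mean of precision and recall.

def f_measure(p, r):
    return 2 * p * r / (p + r)

print(round(f_measure(0.681, 0.871), 3))  # Debug row → 0.764
print(round(f_measure(0.624, 0.830), 3))  # SWT row   → 0.712
```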
89. Bug prediction using cleaned data
[Chart: SWT F-measure (0–100) vs. noise level (0%–45%), noisy training data]
91. Bug prediction using cleaned data
[Chart: SWT F-measure (0–100) vs. noise level (0%–45%), noisy vs. cleaned training data — with cleaned data, the F-measure remains 76% even at 45% noise]
92. ReLink
[Figure: traditional heuristics (a link miner) recover some links between the source code repository and the bug database but leave unknown links; ReLink recovers additional links using features, then combines both sets of links]
Wu et al., “ReLink: Recovering Links between Bugs and Changes,” FSE 2011
94. ReLink Performance
[Bar chart: F-measure (0–100) of Traditional vs. ReLink on ZXing, OpenIntents, Apache]
Wu et al., “ReLink: Recovering Links between Bugs and Changes,” FSE 2011
95. Label Historical Changes
[Figure: the development history of a file, Rev 1 … Rev 102; Rev 102's change message “fix for bug 28434” marks Rev 101 as the buggy revision (with BUG) that the fix repaired, and Rev 102 as clean (no BUG)]
Fischer et al., “Populating a Release History Database from Version Control and Bug Tracking Systems,” ICSM 2003
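The labeling step can be sketched as follows, under simplifying assumptions: a commit whose message matches a fix pattern marks the prior revision of the touched file as buggy. Real tools (SZZ and descendants) additionally trace the fix hunks back to the exact bug-introducing change, which this sketch does not attempt.

```python
# Sketch of labeling historical changes from commit messages
# (simplified; the fix pattern and revision history are illustrative).
import re

FIX_PATTERN = re.compile(r"\b(fix(es|ed)?|bug)\b", re.IGNORECASE)

def label_revisions(history):
    """history: list of (rev, message). Returns {rev: 'BUG' or 'no BUG'}."""
    labels = {rev: "no BUG" for rev, _ in history}
    for i, (rev, message) in enumerate(history):
        if FIX_PATTERN.search(message) and i > 0:
            prev_rev = history[i - 1][0]
            labels[prev_rev] = "BUG"   # the revision the fix repaired
    return labels

history = [(100, "refactor parser"),
           (101, "add tab handling"),
           (102, "fix for bug 28434")]
print(label_revisions(history))  # → {100: 'no BUG', 101: 'BUG', 102: 'no BUG'}
```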
96. Atomic Change
[Figure: Rev 101 (with BUG) calls setText(“t”); Rev 102 (no BUG) replaces it with insertTab() — the atomic change fixed by “fix for bug 28434”]
Fischer et al., “Populating a Release History Database from Version Control and Bug Tracking Systems,” ICSM 2003
97. Composite Change
[Figure 5: JFreeChart revision 1083 — a single commit containing four hunks across three methods: hunk 1 qualifies an addOrUpdate(...) call with `this.` in addOrUpdate(RegularTimePeriod, double); hunk 2 strengthens a bounds check in createCopy(...) from `if (endIndex < 0)` to `if ((endIndex < 0) || (endIndex < startIndex))`; hunks 3–4 reformat the ObjectUtilities.equal(...) checks on getDomainDescription()/getRangeDescription() in equals(Object)]
Tao et al., “How Do Software Engineers Understand Code Changes?” FSE 2012
110. Warning Prioritization
[Chart: precision (%) vs. warning instances by priority (0–100), comparing history-based prioritization (“History”) against the tools' own ordering (“Tool”)]
Kim and Ernst, “Which Warnings Should I Fix First?” FSE 2007
111. Other Topics
• Explanation
- Why has an instance been predicted as defect-prone?
• Cross-project prediction
• Cost-effectiveness measures
• Active learning/refinement
118. Some slides/data are borrowed
with thanks from
• Tom Zimmermann, Chris Bird
• Andreas Zeller
• Ahmed Hassan
• David Lo
• Jaechang Nam, Yida Tao
• Tien Nguyen
• Steve Counsell, David Bowes, Tracy Hall and David Gray
• Wen Zhang