An Exploration of Challenges Limiting Pragmatic Software Defect Prediction

An Exploration of Challenges
Limiting Pragmatic Software
Defect Prediction
Emad Shihab
Queen’s University

Software Quality is Important!
Cost of Software Defects:
$59.5 Billion
2

100+ papers on defect prediction in
past 10 years 3
SDP can save verification efforts
by 29%

Limited Industrial Adoption
Interesting Makes Sense!
4
Why?

Prior Approaches are Not Adding Value!
5
Impact of defects in not considered
No guidance on what to do is provided
Prediction is too late and too defect-centric
We need pragmatic solutions!

Overview of Thesis
6
Pragmatic SDP
Considering Impact Providing Guidance
Proactive &
Encompassing SDP
Surprises &
Breakages
Re-opened
defects
Simplifying
SDP models
Unit test
creation
Risky Changes

Considering Impact
7
Surprises &
Breakages
Re-opened
defects

Surprise Defects
Low pre-, high post-release defects
Catch developers off-guard
Lead to schedule interruptions
Occur in unexpected locations
8

Factors Used to Model Surprise Defects
Size
Pre-release defects
Number, churn, size, pre-release
changes, pre-release defects
Latest change
Age
Traditional
Co-changed files
Time 9

High-Impact Defects
Precision
Recall
74.0%
5.9%
2.0%
Random Predictor
Prediction Results
2-3X precision, high recall
10

Traditional Co-change Time
Most Important Factors
0
5
10
15
20
25
30
35
40
R2.1 R3 R4 R4.1
DevianceExplained(%)
Co-change
Time
11

Considering Impact
12
Surprises &
Breakages
Re-opened
defects

Motivation
13
Degrade Quality Increase
Maintenance Costs
Unnecessary
Re-work

Factors Used to Model Re-opened
Defects
14
Work habit Bug report
Bug fix People

Prediction Results
Eclipse
49.9
72.6
Re-opened
Precisionandrecall(%)
Precision
Recall
15
+ ++
~3X precision and 72.6 recall

Eclipse
+ ++
Level Frequency Attributes
Level 1 10 Comment text
Level 2 20 Description text
16
Bug report information, especially
comments, are most important

Providing Guidance
17
Simplifying
SDP models
Unit test
creation

Motivation
18
Complexity metrics,
Program dependencies,
socio-technical networks
Size is a good
indicator of
buggy files
Use dependency
and complexity
metrics
Process and
code metrics
University of Lugano
Change coupling,
popularity and
design flaws
Change complexity
and social structures
Which metrics should I
use? How do they impact
my code quality?
Structure and
historical changes

Case Study
1. Build a model with initial set of 34 factors
2. Iteratively remove statistically insignificant and
highly correlated metrics
19
Replicate the study by Zimmermann et al. [310]

Main Findings
20
Narrowed down 34 code and process metrics to
only 3 or 4
0
10
20
30
40
50
60
70
80
90
100
Precision (%) Recall (%) Accuracy (%)
Performancemeasure(%)
Simple
All metrics

Providing Guidance
21
Simplifying
SDP models
Unit test
creation

Prioritizing Unit Test Creation
Use the rich history of the software system to
prioritize the creation of unit tests

0
10
20
30
40
50
60
70
80
90
100
Usefulness(%)
Time (Days)
Most Frequently Modified (MFM)
Most Frequently Fixed (MFF)
Largest Fixed (LF)
Change Risk (CR)
Random
Usefulness
Was writing the unit test useful?
2-3X improvement in usefulness

Encompassing and Proactive SDP
24
Risky Changes

Research Context
25
Defect
Prediction
Risky changes

An Example Change
26
Change 12345 by author@adesk on 2000/03/23 12:47:15
Purpose: Bug fix
Modifies API: Yes
Related Changes: 1234, 3421
…
Change description: Changed files A and B to implement new feature and
fix bug 123 ...
Files affected:
//root/comp1/subcomp1/A.java (+10, -1, e10)
//root/comp1/subcomp1/B.cpp (+1, -2, e5)
Risky?

Factors Used to Model Risky Changes
Lines and chunks add, deleted, modified,
total churn
No. of changes, No. of fixes, defectiveness,
No. developers
Developer experience, Bug fix?, No. linked
bugs
Changed files
Experience &
Defects
Code
Size
Modify Java, CPP, other, API

Developer Team
Precision
Recall
67.9%67.6%
+37%+87%
Prediction Results
37-87% improvement in precision,
67% recall
28

Developer Team
7 X Lines Added 10 X Chunks Added
7 X File defectiveness 6 X File defectiveness
None 3 X Modifies C++
2 X No. linked bugs
1 X Developer experience
4 X No. linked bugs
Code added, file defectiveness, No. linked
defects and developer experience
29

Test Writing Factors
Most Recently Modified (MRM)
Most Recently Fixed (MRF)
Largest Fixed (LF)
Largest Modified (LM)
Change Risk (CR)
Size Risk (SR)
Random
Modification
Fix
Size
Risk
Random

Prediction Performance
Actually Defective
Predicted Defective
TP
FP
FN Precision
𝑇𝑃
𝑇𝑃 + 𝐹𝑃
𝑇𝑃
𝑇𝑃 + 𝐹𝑁
Recall
33

High-Impact Defects: A Study of
Breakage and Surprise Defects
Emad Shihab, Audris Mockus, Yasutaka
Kamei, Bram Adams, Ahmed E. Hassan

We know that….
Software ^ has defects
How can we spend the limited
resources to maximize quality?
Q:
Projects ^ have limited resources
35

Defect Prediction
0.8
0.1
Prediction
Model
Size
Pre-release defects
.
.
Complexity
Input: Metrics
Churn
Output: Risk [0..1]
.
.
Key Predictors: Size and pre-release
defects
36

Existing Approaches Aren’t Adding Value
• Obvious to practitioners
• Require a large amount of effort
• Not all defects are equally important
So….what can we do?
FOCUS ON HIGH-IMPACT DEFECTS !
37

Impact Is In The Eye of The Beholder!
Customers: Breakages
Break existing functionality
Affect established customers
Hurt company image
Low pre-, high post-release defects
Catch developers off-guard
Lead to schedule interruptions
Developers: Surprises
Occur in unexpected locations
38

Case Study
Commercial telecom project
30+ years of development
7+ MLOC
Mainly in C/C++
39

Part 1 Part 2
Part 3 Part 4
Exploratory Study of
Breakages and Surprises
Prediction of Breakages
and Surprises
Understanding
Prediction Models of
Value of Focusing on
Study Overview
40

Exploratory Study of Breakages and
Surprises
All files
Breakages Surprises
Post-release
10%
2% 2%
Rare (2% of files)
6% overlap  Should study them separately
 Very difficult to model
41

Part 1 Part 2
Part 3 Part 4
Prediction of
Understanding Prediction
Models of Breakages and
Surprises
Predicting Breakages and Surprises
42

Prediction Using Logistic Regression
Outcome = Const + β1 factor 1
+ β2 factor2
+ β3 factor 3
.
.
+ βn factor n
Breakage?
Surprises?
Factors From 3 Dimensions
43

Factors Used to Model Breakages and
Surprises
Size
Pre-release defects
Number, churn, size, pre-release
changes, pre-release defects
Latest change
Age
Traditional
Co-changed files
Time 44

Breakages Surprises
Precision
Recall
74.1%71.2%
6.7%
2.0%
4.7%
2.0%
Random Predictor
Prediction Results
2-3X precision, high recall
45

Part 1 Part 2
Part 3 Part 4
and Surprises
Understanding
Prediction Models of
Understanding Breakages and
Surprises Models
46

Determining Important Factors
15.6%
Quality of fit  Deviance Explained
+1.5% +0.4%
Example: Breakages R1.1 47

Important Factors for High-Impact Defects
0
5
10
15
20
25
30
35
40
R1.1 R2.1 R3 R4 R4.1
0
5
10
15
20
25
30
35
40
R1.1 R2.1 R3 R4 R4.1
Breakages Surprises
DevianceExplained(%)
Traditional
Co-change
Time
48

Part 1 Part 2
Part 3 Part 4
and Surprises
Understanding Prediction
Models of Breakages and
Surprises
Value of Focusing on Breakages and
Surprises
49

Building Specialized Models
Test
Post-release
Defects
Train
Breakages
Test
Breakages
Train
Breakages
Compare
False
Positives
General model
Specialized model
50

Effort Savings Using Specialized Models
41 42
55
50
0
10
20
30
40
50
60
70
80
90
100
File LOC
EffortSavings(%)
Breakages
Surprises40-50% Effort Savings Using Specialized
Models
51

Take Home Messages
1. Breakages and Surprises are different. Occur in 2%
of files, hard to predict
2. Achieve 2-3X improvement in precision, high recall
Co-change and Time metrics
4. Building specialized models saves 40-50% effort
 Traditional metrics3. Breakages
Surprises 
52
http://research.cs.queensu.ca/home/emads/data/FSE2011/hid_artifact.html

Predicting Re-opened Bugs
A Case Study on the Eclipse Project
Emad Shihab, A. Ihara, Y. Kamei, W. Ibrahim,
M. Ohira, B. Adams, A. E. Hassan and K. Matsumoto
emads@cs.queensu.ca
SAIL, Queen’s University, Canada
NAIST, Japan
53

When you discover a bug …
Report bug Fix bug Verify fix Close bug
Re-opened
54

Increase maintenance costs …
56

Research questions …
1. Which attributes indicate re-opened bugs?
2. Can we accurately predict if a bug will be re-
opened using the extracted attributes?
58

Determine
best
attributes
Mine code
and bug
repositories
Approach overview
Extract
attributes
Predict re-
opened bugs
59

Our dimensions …
60
Work habit Bug report
Bug fix People

Work habit attributes
1. Time (Hour of day)
2. Weekday
3. Day of month
4. Month
61

Bug report attributes
1. Component
2. Platform
3. Severity
4. Priority
5. CC list
6. Priority changed
7. Description size
8. Description text
9. Number of comments
10. Comment size
11. Comment text
62
Metadata
Textual
data

Bug fix attributes
1. Time to resolve (in days)
2. Last status
3. Number of edited files
63

People attributes
1. Reporter Name
2. Reporter experience
3. Fixer name
4. Fixer experience
64

Research question 1
Which attributes indicate re-opened bugs?
65
Comment text, description text and fix location
(component) are the best indicators

Top node analysis setup
1. Build 10 decision trees for each attribute set
3. Repeat using all attributes
2. Record the frequency and level of each attribute
66

Decision tree prediction model
67
No. files
>= 5 < 5
Dev exp
>= 3 < 3
Re-openedMonth
Time
>= 12 < 12
Time to resolve
>= 6 < 6 >= 24 < 24
Re-opened Not Re-opened Re-opened.
.
.
.
.
.
Level 1
Level 2
Level 3

Top node analysis example with 3
trees
Comment
Time No. comments
Comment
Time No. files
No. files
Time Description size
Level 1 2
1
Comment
No. files
Level 2 3
1
1
1
Time
No. comments
No. files
Description size
.
.
.
.
.
.
68

Which attributes best indicate re-
opened bugs?
69
Work habit attributes
9 X Month
1 X Time (Hour of day)
Weekday
Day of month

opened bugs?
70
Bug report attributes
Component
Platform
Severity
Priority
CC list
Priority changed
Description size
Description text
Number of comments
Comment size
10 X Comment text
Metadata
Textual
data

opened bugs?
7 X Time to resolve
3 X Last status
Number of files in fix
71
Bug fix attributes

opened bugs?
5 X Reporter name
5 X Fixer name
Reporter experience
Fixer experience
72
People attributes

Combining all attributes
+ ++
Level 1 10 Comment text
Level 2 19
1
Description text
Component
73

Research question 2
Can we accurately predict if a bug will be
re-opened using the extracted attributes?
74
Our models can correctly predict re-opened bugs with
63% precision and 85% recall

Decision tree prediction model
75
No. files
>= 5 < 5
Dev exp
>= 3 < 3
Re-openedMonth
Time
>= 12 < 12
Time to resolve
>= 6 < 6 >= 24 < 24
Re-opened Not Re-opened Re-opened.
.
.
.
.
.
Level 1
Level 2
Level 3

Performance measures
Re-opened precision:
Re-opened Recall:
Re-opened Not re-opened
Re-opened TP FP
Not re-opened FN TN
Predicted
Actual
𝑇𝑃
𝑇𝑃 + 𝐹𝑃
𝑇𝑃
𝑇𝑃 + 𝐹𝑁
Not re-opened precision:
Not re-opened recall:
𝑇𝑁
𝑇𝑁 + 𝐹𝑁
𝑇𝑁
𝑇𝑁 + 𝐹𝑃
76

33
63
21
27
74
83 83
67
Work habits Bug report Bug fix People
Precision
Recall
Predicting re-opened bugs
77

93
97
93 91
71
91
39
66
Work habits Bug report Bug fix People
Precision
Recall
Predicting NOT re-opened bugs
78

Combining all attributes
63
97
85
90
re-opened NOT re-opened
Precision
Recall
79
+ ++

Bug comments are important …
Bug report is most important set
What words are important?
Comment text most important bug report attribute
80

Important words
Re-opened Not Re-opened
control
background
debugging
breakpoint
blocked
platforms
verified
duplicate
screenshot
important
testing
warning
81

Understanding the Impact of Code and Process
Metrics on Post-release Defects: A Case Study on
the Eclipse Project
Emad Shihab, Zhen Ming Jiang, Walid Ibrahim,
Bram Adams and Ahmed E. Hassan
Software Analysis and Intelligence Lab (SAIL)
Queen’s University 83

Motivation
Software has^ bugs and
managers have ^ limited resources
84
How to allocate quality assurance
resources?
Q:
A: Defect prediction!

Motivation
85
Complexity metrics,
Program dependencies,
socio-technical networks
Size is a good
indicator of
buggy files
Use dependency
and complexity
metrics
Use number of
imports and
code metrics
Use process
and code
metrics
University of Lugano
Change coupling,
popularity and
design flaws
Change complexity
and social structures
Which metrics should I
use? How do they impact
my code quality?

The challenge we face …
1. more work to mine
2. difficult to understand impact
3. less adoption in practice
86
more metrics, means ….

Our goal ….
Use a statistical approach based on work by Cataldo et al. :
1. Narrow down large set of metrics to much smaller set
2. Study the impact on post-release defects
87

Our findings ….
Narrowed down 34 code and process metrics to only 3 or 4
Simple models achieve comparable predictive power
Explanative power of simple model outperform 95% PCA
88
Some metrics ALWAYS matter: Size and pre-bugs
Let me show you how ….

34 Code and Process Metrics
Metric Description
POST Number of post-release defects in a file in the 6 months after the release.
PRE Number of pre-release defects in a file in the 6 months before the release
TPC Total number of changes to a file in the 6 months before the release
BFC Number of bug fixing changes in a file in the 6 months before the release.
TLOC Total number lines of code of a file
ACD Number of anonymous type declarations in a file
FOUT (3) Number of method calls of a file
MLOC (3) Number of method lines of code
NBD (3) Nested block depth of the methods in a file
NOF (3) Number of fields of the classes in a file
NOI Number of interfaces in a file
NOM (3) Number of methods of the classes in a file
NOT Number of classes in a file
NSF (3) Number of static fields of the classes in a file
NSM (3) Number of static methods of the classes in a file
PAR (3) Number of parameters of the methods in a file
VG (3) McCabe cyclomatic complexity of the methods in a file
Process
Metrics
Code
Metrics
89

Approach overview
P < 0.1 VIF < 2.5
1. Build Logistic Regression model using all metrics
2. Remove statistically insignificant metrics
3. Remove highly co-linear metrics
4. Narrow down to a much smaller set of metrics 90
Initial model
w/ all metrics
Statistical
significance
check
Co-linearity
check
Simple
model
The std error of metric coefficient is
~1.6 times as large if metrics were
uncorrelated
5.2

Case study
Perform case study on Eclipse 2.0, 2.1 and 3.0
RQ1: Which metrics impact post-release defects?
Do these metrics change for different releases of Eclipse?
RQ2: How much do metrics impact the post-release defects?
Does the level of impact change across different releases?
91

Metric Eclipse 2.0 Eclipse 2.1 Eclipse 3.0
P-value VIF P-value VIF P-value VIF
Anonymous Type Declarations * 1.2
No. of Static Methods *** 1.1
No. of Parameters *** 1.2
No. Pre-release Defects *** 1.1 *** 1.1 *** 1.2
Total Prior Changes *** 1.1 ** 1.1
Total lines of Code *** 1.3 *** 1.4 *** 1.3
RQ1: Which metrics impact? Do
they change?
92
Important and stable for all releases
Code metrics specific for release
(p<0.001 ***; p<0.001 **, p<0.05*)

RQ2: How much do metrics
explain?
93
Metric Eclipse 2.0 Eclipse 2.1 Eclipse 3.0
Total lines of Code
Total Prior Changes
No. Pre-release defects
No. of Parameters
No. of static methods
Anonymous Type Declarations
Deviance explained 25.2% 17.7% 21.2%
0.1%
4.9%
17.6%
2.2%
11.2%
6.3%
0.2%
14.5%
5.9%
0.5%
0.7%
Size and process metrics are most important
How well the model fits, explains
the observed phenomena

RQ2: Impact of the metrics?
Eclipse 3.0
94
Metric Odds-ratios
(M 1)
Odds-ratios
(M2)
Odds-ratios
(M 3)
Odds-ratios
(M 4)
Lines of Code 2.57 2.40 2.11 1.88
Prior Changes 1.87 1.62 1.62
Pre-release defects 1.87 1.90
Max parameters of
methods
1.73
1 unit increase, increases the chance
of post-release defect by 90%
Odds ratios are used to quantify impact on post-release
defects

But …What about predictive power?
Eclipse 3.0
95
Simple models achieve comparable results to
more complex models
0
10
20
30
40
50
60
70
80
90
100
Precision (%) Recall (%) Accuracy (%)
Performancemeasure(%)
Simple
All metrics

Comparing to PCA
Eclipse 3.0
96
Simple 95% PCA 99% PCA 100% PCA
Deviance
explained
21.2% 16.3% 21.7% 22.0%
No. of
metrics
4 33 33 33
No. of PCs - 8 15 33
Can outperform 95% PCA, using much simpler models

Comparing to PCA
97
0 10 20 30
Eclipse 2.0
Eclipse 2.1
Eclipse 3.0
Deviance explained (%)
100% PCA
95% PCA
Simple
Outperform 95% PCA,
slightly below 100% PCA
Use at most 4 metrics
Vs. 34 metrics used in
PCA

Prioritizing Unit Test Creation for
Test-Driven Maintenance of Legacy
Systems
Emad Shihab, Zhen Ming Jiang, Bram Adams,
Ahmed E. Hassan and Robert Bowerman
Queen’s University and Research In Motion
Canada

Test Driven Development (TDD)
Write unit test
before
writing new code
What about already written code

Test Drive Maintenance (TDM)
Adopting
Test Driven Development (TDD)
for Legacy Applications
But time and resources are limited!

Prioritizing Unit Test Creation
Use the rich history of the legacy system to
prioritize the writing of unit tests

Avoid the most bugs effectively!
Write unit tests for functions with best
Return on Investment (ROI)
How can we avoid the most
bugs given limited resources?

Testing Writing Prioritization
Heuristics
Largest Fixed (LF)
Change Risk (CR)
Size Risk (SR)
Random
Modification
Fix
Size
Risk
Random

Usefulness
Time to write unit test
A
B
C
6 bug fixes
2 bug fixes
0 bug fixes
Usefulness = 2/3 = 66.67%

POP: Percentage of Optimal Performance
How close are we to the optimal performance?
Time to write unit test
A
B
C
6 bug fixes
2 bug fixes
0 bug fixes
POP = 8/13 = 61.5%
D
E
4 bug fixes
3 bug fixes

Case Study
Simulation Parameters
Calibration Period: 6 months
Simulation time: 5 years
List Size: 10 functions
Effort: 100 lines per day

Study Setup
Extracting Historical Data
1. Search modification record comments for keywords
and bug identifiers
2. Extract source code of modified file(s) and compare
to previous version to identify changed functions
3. Combine data from 1 and 2 to identify
changed/fixed functions

main() {
int a;
/*call
help*/
helpInfo();
}
helpInfo() {
errorString!
}
main() {
int a;
/*call
help*/
helpInfo();
}
helpInfo(){
int b;
}
main() {
int a;
/*call
help*/
helpInfo();
}
V1:
Undefined func.
(Link Error)
V2:
Syntax error
V3:
Valid code
Mapping Historical Changes to Functions

Study Setup
Measuring the Performance of a Heuristic
Based on a heuristic, generate list of X
functions to write unit tests for
Use size of function to measure effort
required to write unit test

Test Writing Heuristics
Largest Fixed (LF)
Change Risk (CR)
Size Risk (SR)
Random
Modification
Fix
Size
Risk
Random

Best Test Writing Heuristics
Largest Fixed (LF)
Change Risk (CR)
Size Risk (SR)
Random
Modification
Fix
Size
Risk
Random

0
10
20
30
40
50
60
70
80
90
100
Usefulness(%)
Time (Days)
Largest Fixed (LF)
Change Risk (CR)
Random
Usefulness

POP: Percentage of Optimal Performance
How close are we to the optimal performance?
0
10
20
30
40
50
60
70
80
90
100
Percentageofoptimalperformance(%)
Time (days)
Largest Fixed (LF)
Change Risk (CR)
Random

87
84.7
83.8
80
56.9
55
48.8
43.1
27.7
32.4
32.2
22.2
21.8
7
5.5
4.3
4.9
1.7
0 10 20 30 40 50 60 70 80 90 100
Largest Fixed (LF)
Change Risk (CR)
Size Risk (SR)
Random
Overall Performance of Heuristics
POP
Usefulness

0
5
10
15
20
25
30
35
40
5 10 20
List size
Effect of Varying Parameters
Varying List Size
Largest Fixed (LF)
Change Risk (CR)
Random

0
5
10
15
20
25
30
35
40
45
50
50 100 200
Effort (lines per day)
Effect of Varying Parameters
Varying Writing Effort
Largest Fixed (LF)
Change Risk (CR)
Random

Overview of Change Integration Process
Local
Repository
Risky?
Yes
Closer
review
No
Main
Repository
Change
120

Case Study
Commercial mobile system
Dec 2009 – Dec 2010
450+ developers
60+ teams
7000+ changes
Mainly in Java and C/C++
121

Part 1 Part 2 Part 3
Prediction of
Risky Changes
Understanding
Risky Changes
Misclassification
of Risky Changes
Study Overview
122

An Example Change
123
Change 12345 by author@adesk on 2000/03/23 12:47:15
Purpose: Bug fix
Modifies API: Yes
Related Changes: 1234, 3421
…
Change description: Changed files A and B to implement new feature and
fix bug 123 ...
Files affected:
//root/comp1/subcomp1/A.java (+10, -1, e10)
//root/comp1/subcomp1/B.cpp (+1, -2, e5)
Risky?

Factors Used to Model Risky Changes
Lines and chunks add, deleted, modified,
total churn
No. of changes, No. of fixes, bugginess, No.
developers
Developer experience, Bug fix?, No. linked
bugs
Changed files
Experience &
Defects
Code
Size
Modify Java, CPP, other, API

Developer Team
Precision
Recall
67.9%67.6%
+37%+87%
Prediction Results
37-87% improvement in precision,
67% recall
125

Prediction of
Risky Changes
Understanding
Risky Changes
Misclassification
of Risky Changes
Study Overview
126

Developer Team
7 X Lines Added 10 X Chunks Added
7 X File bugginess 6 X File bugginess
None 3 X Modifies C++
2 X No. linked bugs
4 X No. linked bugs
Code added, file bugginess, No. linked
defects and developer experience
127

Prediction of
Risky Changes
Understanding
Risky Changes
Misclassification
of Risky Changes
Study Overview
128

When were Developers Wrong?
Compare percentage of correctly and wrongly classified
changes:
• Cause: Unclear requirements, inadequate testing, coding
errors, design flaw
• Related changes?
• Modifies API code?
Changes that have related changes are 10
times more likely to be wrongly classified!
129

Reality Check!
130
Too much to review Impact is not
considered
Models are not
explainable

Success Story!
A tool based on this work is being used by RIM’s
Handheld Integration Team
Tools team is working on building a tool to be
deployed company wide
131

Overview of Change Integration Process
Local
Repository
Risky?
Yes
Closer
review
No
Main
Repository
Change
132

When were Developers Wrong?
Compare percentage of correctly and wrongly classified
changes:
• Cause: Unclear requirements, inadequate testing, coding
errors, design flaw
• Related changes?
• Modifies API code?
Changes that have related changes are 10
times more likely to be wrongly classified!
133

Evaluation of Prediction Models
Training
2/3 Testing
1/3
Data
Build Model
Input
Actually
defective
139
X 10
Pr(Defecti) = α + β1 * metric iPr(Defecti) = 0.1 + 0.5 * metric i
Pr(Defecti)

Evaluation of Prediction Models
140
Actually Defective
Predicted Defective
TP
FP
FN
Precision: “How small is FP”
Recall: “How small is FN”

Metrics
32 product metrics
1 process metric
142

Logistic Regression Model
Pr(Defect) = α + β1 metric 1
+ β2 metric 2
+ β3 metric 3
.
.
+ βn metric n
32 Product & 1 Process Metrics
Post-release
Defect
Cutoff = 0.5 143

Prediction Performance
Actually Defective
Predicted Defective
TP
FP
FN Precision
𝑇𝑃
𝑇𝑃 + 𝐹𝑃
𝑇𝑃
𝑇𝑃 + 𝐹𝑁
Recall
= 0.68
= 0.38
144

Defect Prediction
0.8
0.1
Prediction
Model
Size
Pre-release defects
Complexity
Input: Metrics
Churn
Output: Risk [0..1]
145

The challenge we face …
1. more work to mine
2. difficult to understand impact
3. less adoption in practice
146
more metrics, means ….

An Exploration of Challenges Limiting Pragmatic Software Defect Prediction

Recommended

Recommended

More Related Content

What's hot

What's hot (20)

Similar to An Exploration of Challenges Limiting Pragmatic Software Defect Prediction

Similar to An Exploration of Challenges Limiting Pragmatic Software Defect Prediction (20)

More from SAIL_QU

More from SAIL_QU (20)

Recently uploaded

Recently uploaded (20)

An Exploration of Challenges Limiting Pragmatic Software Defect Prediction

Editor's Notes