Emad fse2011 final

High-Impact Defects: A Study of
Breakage and Surprise Defects
Emad Shihab, Audris Mockus, Yasutaka
Kamei, Bram Adams, Ahmed E. Hassan

We know that…
Software has defects
Projects have limited resources
Q: How can we spend the limited resources to maximize quality?
2

Defect Prediction
Input: Metrics (Size, Pre-release defects, Complexity, Churn, ...)
Prediction Model
Output: Risk [0..1] (e.g., 0.8, 0.1)
Key Predictors: Size and pre-release defects
3

Existing Approaches Aren’t Adding Value
• Obvious to practitioners
• Require a large amount of effort
• Not all defects are equally important
So….what can we do?
FOCUS ON HIGH-IMPACT DEFECTS !
4

Impact Is In The Eye of The Beholder!
Customers: Breakages
• Break existing functionality
• Affect established customers
• Hurt company image
Developers: Surprises
• Occur in unexpected locations
• Low pre-, high post-release defects
• Catch developers off-guard
• Lead to schedule interruptions
5

Case Study
Commercial telecom project
30+ years of development
7+ MLOC
Mainly in C/C++
6

Study Overview
Part 1: Exploratory Study of Breakages and Surprises
Part 2: Prediction of Breakages and Surprises
Part 3: Understanding Prediction Models of Breakages and Surprises
Part 4: Value of Focusing on Breakages and Surprises
7

Exploratory Study of Breakages and Surprises
Of all files: 10% have post-release defects, 2% have breakages, 2% have surprises
Rare (2% of files) → Very difficult to model
6% overlap → Should study them separately
8

Part 2: Predicting Breakages and Surprises
9

Prediction Using Logistic Regression
Outcome = Const + β1 factor 1 + β2 factor 2 + β3 factor 3 + … + βn factor n
Outcome: Breakage? Surprise?
Factors From 3 Dimensions
10
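The model form on this slide can be sketched in a few lines. This is a minimal illustration, not the study's implementation: the coefficient values and factor names are made up, and the logistic (sigmoid) link that turns the linear sum into a risk between 0 and 1 is the standard one for logistic regression.

```python
import math

def predict_risk(const, betas, factors):
    """Logistic regression: squash Const + sum(beta_i * factor_i) into (0, 1)."""
    linear = const + sum(betas[name] * value for name, value in factors.items())
    return 1.0 / (1.0 + math.exp(-linear))

# Hypothetical coefficients and factor values, for illustration only.
betas = {"size": 0.4, "pre_release_defects": 0.9, "age": -0.2}
file_factors = {"size": 1.5, "pre_release_defects": 2.0, "age": 3.0}

risk = predict_risk(const=-4.0, betas=betas, factors=file_factors)
print(f"predicted risk: {risk:.3f}")
```

One model of this form would be fitted per outcome, i.e. one for breakages and one for surprises.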

Factors Used to Model Breakages and Surprises
Traditional: Size, Pre-release defects
Co-changed files: Number, churn, size, pre-release changes, pre-release defects
Time: Latest change, Age
11

Prediction Results
Breakages and Surprises
Recall: 74.1% and 71.2%
Precision: 6.7% and 4.7%
Random Predictor precision: 2.0% for both
2-3X precision, high recall
12
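The "2-3X precision" claim is a ratio against the random predictor, whose expected precision equals the share of files that actually have the defect (2.0% here). A small sketch of that arithmetic, using the figures reported on the slide; the function name is illustrative.

```python
def precision_lift(model_precision, base_rate):
    """How many times more precise the model is than random guessing."""
    return model_precision / base_rate

base_rate = 0.020  # random predictor precision reported on the slide

for precision in (0.067, 0.047):  # model precisions reported on the slide
    lift = precision_lift(precision, base_rate)
    print(f"{precision:.1%} vs {base_rate:.1%}: {lift:.2f}x")
```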

Part 3: Understanding Breakages and Surprises Models
13

Determining Important Factors
Quality of fit → Deviance Explained
Example: Breakages R1.1
Traditional: 15.6%
Co-change: +1.5%
Time: +0.4%
14
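Deviance explained compares a model's deviance with that of an intercept-only (null) model. The sketch below shows one common way to compute it for 0/1 outcomes. The outcome labels and probabilities are hypothetical, and the study's exact procedure may differ. If the slide's increments are cumulative, adding Co-change and Time factors to the Traditional ones raises the figure from 15.6% to 17.5% for Breakages in R1.1.

```python
import math

def deviance(y_true, p_pred):
    """Binomial deviance: -2 * log-likelihood of the observed 0/1 outcomes."""
    return -2.0 * sum(
        math.log(p) if y == 1 else math.log(1.0 - p)
        for y, p in zip(y_true, p_pred)
    )

def deviance_explained(y_true, p_pred):
    """Share of the null deviance removed by the model, in percent."""
    base_rate = sum(y_true) / len(y_true)
    null_dev = deviance(y_true, [base_rate] * len(y_true))  # intercept-only model
    return 100.0 * (1.0 - deviance(y_true, p_pred) / null_dev)

# Hypothetical outcomes and model probabilities, for illustration only.
y = [0, 0, 0, 1, 0, 1, 0, 0]
p_traditional = [0.1, 0.2, 0.1, 0.6, 0.2, 0.5, 0.2, 0.1]
p_all_factors = [0.1, 0.1, 0.1, 0.7, 0.2, 0.6, 0.1, 0.1]

print(f"traditional only: {deviance_explained(y, p_traditional):.1f}%")
print(f"all dimensions:   {deviance_explained(y, p_all_factors):.1f}%")
```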

Important Factors for High-Impact Defects
[Figure: Deviance Explained (%) by Traditional, Co-change and Time factors for releases R1.1, R2.1, R3, R4 and R4.1; panels: Breakages, Surprises]
15

Part 4: Value of Focusing on Breakages and Surprises
16

Building Specialized Models
General model: Train on Post-release Defects, Test on Breakages
Specialized model: Train on Breakages, Test on Breakages
Compare False Positives
17

Effort Savings Using Specialized Models
[Figure: Effort Savings (%) measured by File and LOC for Breakages and Surprises; bar values 41, 42, 55, 50]
40-50% Effort Savings Using Specialized Models
18

Take Home Messages
1. Breakages and Surprises are different. Occur in 2% of files, hard to predict
2. Achieve 2-3X improvement in precision, high recall
3. Breakages → Traditional metrics
   Surprises → Co-change and Time metrics
4. Building specialized models saves 40-50% effort
19
http://research.cs.queensu.ca/home/emads/data/FSE2011/hid_artifact.html

Quantifying Effort Savings
Set recall to be the same

General model
                 Actual Yes   Actual No
Predicted Yes        26          538
Predicted No          7          875

Specialized model
                 Actual Yes   Actual No
Predicted Yes        26          320
Predicted No          7         1093

Effort Savings ~41%!
21
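The ~41% figure can be reproduced from the two confusion matrices if effort is measured by the number of false positives, which is what the "Compare False Positives" step suggests. The sketch below makes two assumptions: that effort is measured this way, and that the matrix with 538 false positives belongs to the general model.

```python
def effort_savings(fp_general, fp_specialized):
    """Percent reduction in false positives at equal recall."""
    return 100.0 * (fp_general - fp_specialized) / fp_general

def recall(tp, fn):
    return tp / (tp + fn)

# Counts from the two confusion matrices on the slide. Treating the matrix
# with 538 false positives as the general model is an assumption.
general = {"tp": 26, "fp": 538, "fn": 7, "tn": 875}
specialized = {"tp": 26, "fp": 320, "fn": 7, "tn": 1093}

# Recall is set to be the same for both models before comparing.
assert recall(general["tp"], general["fn"]) == recall(specialized["tp"], specialized["fn"])

savings = effort_savings(general["fp"], specialized["fp"])
print(f"effort savings: {savings:.1f}%")
```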

Remaining Challenges
• “We tend to test features not files”
– Can we predict defects for features?
• “Without knowing more about the nature of the defect or recommendations for how to fix it, I am not sure how we can use it”
– Predict the nature of defects
– Can we provide specific remediation strategies for predicted defects?
• e.g., surprises mostly relate to incorrectly implemented requirements
22

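Quantifying Effect…An Example…
All factors at their medians (Median Size, Median Pre-defects, ..., Median age) → Prediction Model → 0.1
One factor doubled (2 x Median Size, others at their medians) → Prediction Model → 0.2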
23
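The example can be read as a relative change in predicted risk: going from 0.1 to 0.2 is a +100% effect. Below is a minimal sketch of that calculation with a hypothetical logistic model. The coefficients, median values and function names are illustrative assumptions, and expressing the effect as a percent change is itself an assumption about how the slide's numbers were derived.

```python
import math

def risk(const, betas, factors):
    """Predicted probability from a logistic model."""
    linear = const + sum(betas[k] * v for k, v in factors.items())
    return 1.0 / (1.0 + math.exp(-linear))

def effect_of_doubling(const, betas, medians, factor):
    """Percent change in risk when one factor goes from its median to 2x median."""
    baseline = risk(const, betas, medians)
    doubled = dict(medians, **{factor: 2 * medians[factor]})
    return 100.0 * (risk(const, betas, doubled) - baseline) / baseline

# Hypothetical model and medians, for illustration only.
betas = {"size": 0.5, "pre_defects": 0.8, "age": -0.3}
medians = {"size": 1.0, "pre_defects": 1.0, "age": 2.0}

for name in medians:
    print(f"{name}: {effect_of_doubling(-3.0, betas, medians, name):+.0f}%")
```

A positive value means doubling the factor raises the predicted risk, and a negative value means it lowers it.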

Effect of Factors on Breakages and Surprises
[Figure: effect of each factor; values in slide order: Pre-release defects 154, Size 39, No. co-changed files -85, Churn of co-changed files -19, Latest change -92; series: Breakages, Surprises]
24

High Impact Defects: Summary
Can we identify them? Yes, 2-3X precision, ~70% recall
What factors best predict them? Breakages: Traditional. Surprises: Co-change and release schedule
What is the value of focusing on them? 40-50% effort savings
25

Current approaches predict the obvious
Focus on high-impact, i.e. surprises and
breakages
Pre-defects and size predict Breakages
Number and churn of co-changed files
and late changes predict surprises
Using specialized models reduces effort by 40-50%
26

Study Overview
Extract Metrics: 1. Traditional 2. Co-change 3. Time
Build Statistical Models: Logistic Regression
Analyze Effect on Quality: 1. Predictive & explanative power 2. Quantify Effect
27

Breakage Defects
Defects that break
existing functionality
Affect an established
customer base
Hurt quality image
28

Surprise Defects
Flag files with defects in
unexpected locations
Catch practitioners
off guard
Interrupt schedules
High ratio of post-
to-pre defects
29

Predicting Breakages and Surprises
Explanative Power
Breakages: 17.8%
Surprises: 13.1%
State of Art (post-release): 17.7 – 27.9%
30

Stability of Important Factors
[Figure: stability of each factor (No. co-changed files, Late changes, Pre-defects, Size, Churn co-changed files) for Breakages across releases R1.1, R2.1, R3.1, R3, R4.1; legend: Highly stable, Mainly stable, Not stable]
31

Stability of Important Factors
[Figure: stability of factors across releases R1.1, R2.1, R3.1, R3, R4.1; panels: Breakages, Surprises]
32

Defect Prediction Helps Focus Quality Assurance Efforts
Training: Extract Metrics (Size, Complexity, ..., Post-release defects) → Model (e.g. Logistic Regression)
D(f) = C + 0.1*size(f) + 0.2*complexity(f) + …
Prediction: Extract Metrics (Size, Complexity) → D(f) = C + 0.1*size(f) + 0.2*complexity(f) + …
D(f) = 0.8
D(f) = 0.6
35

Factors Used to Model High Impact Defects
Traditional: Size, Pre-release defects
Co-changed files: Number, churn, size, pre-release changes, pre-release defects
Release schedule: Age, Latest changes
36

Prediction Factors
Size, Pre-release defects
Co-Changed Files: # of files, Churn, Size, Pre-release defects, Pre-release changes
Latest Change, Age
37

Evaluation of Prediction Model
Data: Training 2/3 (Build Model), Testing 1/3 (Input → Outcome)

                 Actual Yes   Actual No
Predicted Yes        TP           FP
Predicted No         FN           TN

Precision = TP / (TP + FP)
Recall = TP / (TP + FN)
38
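The evaluation setup on this slide can be sketched as a 2/3 vs 1/3 split followed by precision and recall computed from the confusion matrix. The random shuffle, the seed and the function names are assumptions made for illustration; the slide only states the split proportions.

```python
import random

def split_two_thirds(rows, seed=0):
    """Shuffle, then split into 2/3 training and 1/3 testing."""
    rows = list(rows)
    random.Random(seed).shuffle(rows)
    cut = (2 * len(rows)) // 3
    return rows[:cut], rows[cut:]

def precision_recall(actual, predicted):
    """Precision = TP / (TP + FP), Recall = TP / (TP + FN)."""
    tp = sum(1 for a, p in zip(actual, predicted) if a and p)
    fp = sum(1 for a, p in zip(actual, predicted) if not a and p)
    fn = sum(1 for a, p in zip(actual, predicted) if a and not p)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

# Hypothetical file ids and labels, for illustration only.
train, test = split_two_thirds(range(9))
print(len(train), len(test))
print(precision_recall(actual=[1, 1, 0, 0], predicted=[1, 0, 1, 0]))
```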
