An Exploration of Challenges
Limiting Pragmatic Software
Defect Prediction
Emad Shihab
Queen’s University
Software Quality is Important!
Cost of Software Defects:
$59.5 Billion
2
100+ papers on defect prediction in the past 10 years
SDP can save verification effort by 29%
Limited Industrial Adoption
Interesting! Makes sense!
4
Why?
Prior Approaches are Not Adding Value!
5
Impact of defects is not considered
No guidance on what to do is provided
Prediction is too late and too defect-centric
We need pragmatic solutions!
Overview of Thesis
6
Pragmatic SDP
Considering Impact: Surprises & Breakages; Re-opened defects
Providing Guidance: Simplifying SDP models; Unit test creation
Proactive & Encompassing SDP: Risky Changes
Considering Impact
7
Surprises &
Breakages
Re-opened
defects
Surprise Defects
Low pre-, high post-release defects
Catch developers off-guard
Lead to schedule interruptions
Occur in unexpected locations
8
Factors Used to Model Surprise Defects
Traditional: Size, Pre-release defects
Co-changed files: Number, churn, size, pre-release changes, pre-release defects of co-changed files
Time: Latest change, Age
9
High-Impact Defects
Prediction Results
Precision: 5.9% (random predictor: 2.0%)
Recall: 74.0%
2-3X precision, high recall
10
Most Important Factors
[Chart: deviance explained (%) by Traditional, Co-change, and Time factors across releases R2.1, R3, R4, R4.1]
11
Considering Impact
12
Surprises &
Breakages
Re-opened
defects
Motivation
13
Degrade quality; increase maintenance costs; unnecessary re-work
Factors Used to Model Re-opened Defects
14
Dimensions: Work habit, Bug report, Bug fix, People
Prediction Results (Eclipse)
Re-opened bugs: precision 49.9%, recall 72.6%
~3X precision and 72.6% recall
15
Most Important Factors (Eclipse)
Level 1: frequency 10 — Comment text
Level 2: frequency 20 — Description text
Bug report information, especially comments, is most important
16
Providing Guidance
17
Simplifying
SDP models
Unit test
creation
Motivation
18
Prior work proposes many metrics: complexity metrics, program dependencies, socio-technical networks; size as a good indicator of buggy files; dependency and complexity metrics; process and code metrics; change coupling, popularity and design flaws (University of Lugano); change complexity and social structures; structure and historical changes.
Which metrics should I use? How do they impact my code quality?
Case Study
1. Build a model with initial set of 34 factors
2. Iteratively remove statistically insignificant and
highly correlated metrics
19
Replicate the study by Zimmermann et al. [310]
Main Findings
20
Narrowed down 34 code and process metrics to
only 3 or 4
[Chart: precision (%), recall (%), and accuracy (%) of the simple model vs. the model with all metrics]
Providing Guidance
21
Simplifying
SDP models
Unit test
creation
Prioritizing Unit Test Creation
Use the rich history of the software system to
prioritize the creation of unit tests
[Chart: usefulness (%) over time (days) for Most Frequently Modified (MFM), Most Frequently Fixed (MFF), Largest Fixed (LF), Change Risk (CR), and Random]
Usefulness
Was writing the unit test useful?
2-3X improvement in usefulness
Encompassing and Proactive SDP
24
Risky Changes
Research Context
25
Defect
Prediction
Risky changes
An Example Change
26
Change 12345 by author@adesk on 2000/03/23 12:47:15
Purpose: Bug fix
Modifies API: Yes
Related Changes: 1234, 3421
…
Change description: Changed files A and B to implement new feature and
fix bug 123 ...
Files affected:
//root/comp1/subcomp1/A.java (+10, -1, e10)
//root/comp1/subcomp1/B.cpp (+1, -2, e5)
Risky?
Factors Used to Model Risky Changes
Lines and chunks add, deleted, modified,
total churn
No. of changes, No. of fixes, defectiveness,
No. developers
Developer experience, Bug fix?, No. linked
bugs
Changed files
Experience &
Defects
Code
Size
Modify Java, CPP, other, API
Developer Team
Prediction Results
Recall: 67.9% and 67.6%; precision improvement: +37% and +87%
37-87% improvement in precision, ~67% recall
28
Most Important Factors
Developer models: 7 × Lines added; 7 × File defectiveness; 2 × No. linked bugs; 1 × Developer experience; Modifies C++: none
Team models: 10 × Chunks added; 6 × File defectiveness; 3 × Modifies C++; 4 × No. linked bugs; 4 × Developer experience
Code added, file defectiveness, no. of linked defects and developer experience matter most
29
Test Writing Factors
Modification: Most Frequently Modified (MFM), Most Recently Modified (MRM)
Fix: Most Frequently Fixed (MFF), Most Recently Fixed (MRF)
Size: Largest Fixed (LF), Largest Modified (LM)
Risk: Change Risk (CR), Size Risk (SR)
Random
Prediction Performance
Actually Defective vs. Predicted Defective: TP, FP, FN
Precision = TP / (TP + FP)
Recall = TP / (TP + FN)
33
High-Impact Defects: A Study of
Breakage and Surprise Defects
Emad Shihab, Audris Mockus, Yasutaka
Kamei, Bram Adams, Ahmed E. Hassan
We know that software has defects and projects have limited resources.
Q: How can we spend the limited resources to maximize quality?
35
Defect Prediction
Input: metrics (size, pre-release defects, complexity, churn, ...)
Prediction Model
Output: risk [0..1] (e.g., 0.8 for a risky file, 0.1 for a safe one)
Key predictors: size and pre-release defects
36
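A minimal sketch of this pipeline, with hypothetical coefficients and metric values (not the thesis' fitted model): metrics go in, a logistic model turns them into a risk score in [0..1].

import numpy as np

# Hypothetical coefficients for illustration only; a real model is fit on history.
INTERCEPT = -3.0
COEFS = {"size_kloc": 0.3, "pre_release_defects": 0.7, "complexity": 0.02, "churn": 0.001}

def risk(metrics):
    """Map a file's metrics to a defect risk score in [0..1] via a logistic model."""
    z = INTERCEPT + sum(COEFS[name] * value for name, value in metrics.items())
    return 1.0 / (1.0 + np.exp(-z))

file_a = {"size_kloc": 4, "pre_release_defects": 3, "complexity": 30, "churn": 500}
file_b = {"size_kloc": 1, "pre_release_defects": 0, "complexity": 10, "churn": 50}
print(round(risk(file_a), 2))  # ~0.80, flagged as risky
print(round(risk(file_b), 2))  # ~0.08, low risk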
Existing Approaches Aren’t Adding Value
• Obvious to practitioners
• Require a large amount of effort
• Not all defects are equally important
So… what can we do?
FOCUS ON HIGH-IMPACT DEFECTS!
37
Impact Is In The Eye of The Beholder!
Customers: Breakages
• Break existing functionality
• Affect established customers
• Hurt company image
Developers: Surprises
• Low pre-, high post-release defects
• Catch developers off-guard
• Occur in unexpected locations
• Lead to schedule interruptions
38
Case Study
Commercial telecom project
30+ years of development
7+ MLOC
Mainly in C/C++
39
Study Overview
Part 1: Exploratory Study of Breakages and Surprises
Part 2: Prediction of Breakages and Surprises
Part 3: Understanding Prediction Models of Breakages and Surprises
Part 4: Value of Focusing on Breakages and Surprises
40
Exploratory Study of Breakages and
Surprises
All files: post-release defects occur in 10% of files; breakages in 2%; surprises in 2%
Rare (2% of files) → very difficult to model
Only 6% overlap between breakages and surprises → should study them separately
41
Predicting Breakages and Surprises
42
Prediction Using Logistic Regression
Outcome (Breakage? Surprise?) = Const + β1·factor1 + β2·factor2 + β3·factor3 + ... + βn·factorn
Factors from 3 dimensions
43
Factors Used to Model Breakages and Surprises
Traditional: Size, Pre-release defects
Co-changed files: Number, churn, size, pre-release changes, pre-release defects of co-changed files
Time: Latest change, Age
44
Prediction Results
Breakages: precision 6.7% (random predictor: 2.0%), recall 74.1%
Surprises: precision 4.7% (random predictor: 2.0%), recall 71.2%
2-3X precision, high recall
45
Understanding Breakages and
Surprises Models
46
Determining Important Factors
Quality of fit → deviance explained
Example (Breakages, R1.1): Traditional factors explain 15.6% of the deviance; adding Co-change factors contributes +1.5%; adding Time factors contributes +0.4%
47
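A sketch of how deviance explained and the incremental contribution of each factor dimension can be computed. Synthetic data and scikit-learn stand in for the thesis' models; the factor groups and coefficients below are placeholders.

import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Synthetic stand-in data: 3 "traditional", 2 "co-change", 1 "time" factor per file.
n = 2000
X = rng.normal(size=(n, 6))
logit = 1.2 * X[:, 0] + 0.8 * X[:, 3] + 0.3 * X[:, 5] - 2.0
y = rng.binomial(1, 1 / (1 + np.exp(-logit)))

def deviance(cols):
    """Residual deviance (-2 * log-likelihood) of a logistic model using `cols`."""
    if not cols:                                   # null model: intercept only
        p = np.full(n, y.mean())
    else:
        m = LogisticRegression(C=1e6, max_iter=1000).fit(X[:, cols], y)
        p = m.predict_proba(X[:, cols])[:, 1]
    return -2 * np.sum(y * np.log(p) + (1 - y) * np.log(1 - p))

null = deviance([])
groups = {"traditional": [0, 1, 2], "co-change": [3, 4], "time": [5]}

used, prev = [], 0.0
for name, cols in groups.items():                  # add one factor group at a time
    used += cols
    explained = 1 - deviance(used) / null          # deviance explained so far
    print(f"+{name}: {100 * (explained - prev):.1f}% additional deviance explained")
    prev = explained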
Important Factors for High-Impact Defects
[Charts: deviance explained (%) by Traditional, Co-change, and Time factors for Breakages and for Surprises across releases R1.1, R2.1, R3, R4, R4.1]
48
Value of Focusing on Breakages and
Surprises
49
Building Specialized Models
General model: train on post-release defects, test on breakages
Specialized model: train on breakages, test on breakages
Compare false positives
50
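A sketch of this comparison, assuming synthetic per-file factors and labels in place of the commercial project's data: the same features are used, only the training labels differ, and both models are scored against the breakage labels.

import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(1)

# Synthetic stand-in: per-file factors, post-release defect labels, and a rarer
# breakage label that only partially overlaps with ordinary defects.
n = 5000
X = rng.normal(size=(n, 5))
post_release = rng.binomial(1, 1 / (1 + np.exp(-(X[:, 0] + 0.5 * X[:, 1] - 1))))
breakage = rng.binomial(1, 1 / (1 + np.exp(-(1.5 * X[:, 2] + 0.5 * X[:, 0] - 2.5))))

idx_train, idx_test = train_test_split(np.arange(n), test_size=0.33, random_state=1)

def false_positives(train_labels):
    """Train on the given labels, test against breakages, count false positives."""
    model = LogisticRegression(max_iter=1000).fit(X[idx_train], train_labels[idx_train])
    pred = model.predict(X[idx_test])
    return int(np.sum((pred == 1) & (breakage[idx_test] == 0)))

print("general model FPs:    ", false_positives(post_release))  # trained on post-release defects
print("specialized model FPs:", false_positives(breakage))      # trained on breakages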
Effort Savings Using Specialized Models
[Chart: effort savings (%) for breakages and surprises, measured in files and in LOC]
40-50% effort savings using specialized models
51
Take Home Messages
1. Breakages and surprises are different: they occur in only 2% of files and are hard to predict
2. Our models achieve a 2-3X improvement in precision, with high recall
3. Breakages → traditional metrics; Surprises → co-change and time metrics
4. Building specialized models saves 40-50% of effort
52
http://research.cs.queensu.ca/home/emads/data/FSE2011/hid_artifact.html
Predicting Re-opened Bugs
A Case Study on the Eclipse Project
Emad Shihab, A. Ihara, Y. Kamei, W. Ibrahim,
M. Ohira, B. Adams, A. E. Hassan and K. Matsumoto
emads@cs.queensu.ca
SAIL, Queen’s University, Canada
NAIST, Japan
53
When you discover a bug …
Report bug → Fix bug → Verify fix → Close bug (→ Re-opened)
54
Degrade quality …
55
Increase maintenance costs …
56
Unnecessary re-work…
57
Research questions …
1. Which attributes indicate re-opened bugs?
2. Can we accurately predict if a bug will be re-
opened using the extracted attributes?
58
Approach overview
Mine code and bug repositories → Extract attributes → Determine best attributes → Predict re-opened bugs
59
Our dimensions …
60
Work habit, Bug report, Bug fix, People
Work habit attributes
1. Time (Hour of day)
2. Weekday
3. Day of month
4. Month
61
Bug report attributes
1. Component
2. Platform
3. Severity
4. Priority
5. CC list
6. Priority changed
7. Description size
8. Description text
9. Number of comments
10. Comment size
11. Comment text
62
(Description text and Comment text are textual data; the remaining attributes are metadata)
Bug fix attributes
1. Time to resolve (in days)
2. Last status
3. Number of edited files
63
People attributes
1. Reporter Name
2. Reporter experience
3. Fixer name
4. Fixer experience
64
Research question 1
Which attributes indicate re-opened bugs?
65
Comment text, description text and fix location
(component) are the best indicators
Top node analysis setup
1. Build 10 decision trees for each attribute set
2. Record the frequency and level of each attribute
3. Repeat using all attributes
66
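A sketch of this setup using scikit-learn. The attribute names and data are synthetic placeholders; the tree depth, bootstrap sampling, and attribute effects are illustrative assumptions, not the thesis' exact configuration.

import numpy as np
from sklearn.tree import DecisionTreeClassifier
from collections import Counter

rng = np.random.default_rng(2)

# Synthetic stand-in for the bug-report data: columns play the role of attributes.
attributes = ["comment_text_score", "description_size", "time_to_resolve", "no_files"]
X = rng.normal(size=(3000, len(attributes)))
y = rng.binomial(1, 1 / (1 + np.exp(-(2 * X[:, 0] + 0.5 * X[:, 2] - 1))))

level1, level2 = Counter(), Counter()
for seed in range(10):                               # 10 trees, as in step 1
    sample = rng.integers(0, len(X), len(X))         # bootstrap sample
    tree = DecisionTreeClassifier(max_depth=3, random_state=seed).fit(X[sample], y[sample]).tree_
    root = 0
    level1[attributes[tree.feature[root]]] += 1      # attribute used at the top node
    for child in (tree.children_left[root], tree.children_right[root]):
        if child != -1 and tree.feature[child] >= 0: # skip leaves
            level2[attributes[tree.feature[child]]] += 1

print("Level 1:", level1)                            # frequency of each top-node attribute
print("Level 2:", level2)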
Decision tree prediction model
[Example tree: level 1 splits on No. files; level 2 on Dev exp and Time; level 3 on Month and Time to resolve; leaves predict Re-opened / Not re-opened]
67
Top node analysis example with 3 trees
Tree roots (level 1): Comment, Comment, No. files
Level 2 nodes: Time, No. comments; Time, No. files; Time, Description size
Level 1: 2 × Comment, 1 × No. files
Level 2: 3 × Time, 1 × No. comments, 1 × No. files, 1 × Description size
68
Which attributes best indicate re-
opened bugs?
69
Work habit attributes
9 X Month
1 X Time (Hour of day)
Weekday
Day of month
Which attributes best indicate re-
opened bugs?
70
Bug report attributes
Component
Platform
Severity
Priority
CC list
Priority changed
Description size
Description text
Number of comments
Comment size
10 X Comment text
Metadata
Textual
data
Which attributes best indicate re-
opened bugs?
7 X Time to resolve
3 X Last status
Number of files in fix
71
Bug fix attributes
Which attributes best indicate re-
opened bugs?
5 X Reporter name
5 X Fixer name
Reporter experience
Fixer experience
72
People attributes
Combining all attributes
Level 1: 10 × Comment text
Level 2: 19 × Description text, 1 × Component
73
Research question 2
Can we accurately predict if a bug will be
re-opened using the extracted attributes?
74
Our models can correctly predict re-opened bugs with
63% precision and 85% recall
Performance measures
Confusion matrix (rows: predicted, columns: actual): TP = re-opened predicted as re-opened; FP = not re-opened predicted as re-opened; FN = re-opened predicted as not re-opened; TN = not re-opened predicted as not re-opened
Re-opened precision = TP / (TP + FP)
Re-opened recall = TP / (TP + FN)
Not re-opened precision = TN / (TN + FN)
Not re-opened recall = TN / (TN + FP)
76
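The four measures follow directly from the confusion matrix; a minimal sketch, with hypothetical counts chosen only so that the output roughly matches the precision/recall figures reported on the following slides.

def reopened_measures(tp, fp, fn, tn):
    """Precision and recall for both classes from the confusion matrix above."""
    return {
        "re-opened precision":     tp / (tp + fp),
        "re-opened recall":        tp / (tp + fn),
        "not re-opened precision": tn / (tn + fn),
        "not re-opened recall":    tn / (tn + fp),
    }

# Hypothetical counts, for illustration only (not the Eclipse numbers).
print(reopened_measures(tp=120, fp=70, fn=20, tn=790))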
Predicting re-opened bugs
[Bar chart: precision and recall (%) of predicting re-opened bugs using each attribute dimension (Work habits, Bug report, Bug fix, People)]
77
Predicting NOT re-opened bugs
[Bar chart: precision and recall (%) of predicting not re-opened bugs using each attribute dimension (Work habits, Bug report, Bug fix, People)]
78
Combining all attributes
Re-opened: precision 63%, recall 85%
NOT re-opened: precision 97%, recall 90%
79
Bug comments are important …
Bug report is the most important attribute set
Comment text is the most important bug report attribute
What words are important?
80
Important words
Re-opened: control, background, debugging, breakpoint, blocked, platforms
Not re-opened: verified, duplicate, screenshot, important, testing, warning
81
82
Understanding the Impact of Code and Process
Metrics on Post-release Defects: A Case Study on
the Eclipse Project
Emad Shihab, Zhen Ming Jiang, Walid Ibrahim,
Bram Adams and Ahmed E. Hassan
Software Analysis and Intelligence Lab (SAIL)
Queen’s University 83
Motivation
Software has bugs and managers have limited resources
84
Q: How to allocate quality assurance resources?
A: Defect prediction!
Motivation
85
Prior work proposes many metrics: complexity metrics, program dependencies, socio-technical networks; size as a good indicator of buggy files; dependency and complexity metrics; number of imports and code metrics; process and code metrics; change coupling, popularity and design flaws (University of Lugano); change complexity and social structures.
Which metrics should I use? How do they impact my code quality?
The challenge we face …
More metrics means:
1. more work to mine
2. difficult to understand impact
3. less adoption in practice
86
Our goal ….
Use a statistical approach based on work by Cataldo et al.:
1. Narrow down large set of metrics to much smaller set
2. Study the impact on post-release defects
87
Our findings ….
Narrowed down 34 code and process metrics to only 3 or 4
Simple models achieve comparable predictive power
The explanatory power of the simple models outperforms 95% PCA
Some metrics ALWAYS matter: size and pre-release defects
Let me show you how ….
88
34 Code and Process Metrics
Metric Description
POST Number of post-release defects in a file in the 6 months after the release.
PRE Number of pre-release defects in a file in the 6 months before the release
TPC Total number of changes to a file in the 6 months before the release
BFC Number of bug fixing changes in a file in the 6 months before the release.
TLOC Total number of lines of code of a file
ACD Number of anonymous type declarations in a file
FOUT (3) Number of method calls of a file
MLOC (3) Number of method lines of code
NBD (3) Nested block depth of the methods in a file
NOF (3) Number of fields of the classes in a file
NOI Number of interfaces in a file
NOM (3) Number of methods of the classes in a file
NOT Number of classes in a file
NSF (3) Number of static fields of the classes in a file
NSM (3) Number of static methods of the classes in a file
PAR (3) Number of parameters of the methods in a file
VG (3) McCabe cyclomatic complexity of the methods in a file
POST, PRE, TPC and BFC are process metrics; the remaining metrics are code metrics. A metric marked (3) is collected in three variants per file.
89
Approach overview
1. Build a logistic regression model using all metrics (initial model)
2. Remove statistically insignificant metrics (p < 0.1)
3. Remove highly collinear metrics (VIF < 2.5; at this threshold the standard error of a metric's coefficient is at most ~1.6 times as large as it would be if the metrics were uncorrelated)
4. Narrow down to a much smaller set of metrics (simple model)
90
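A sketch of steps 2-3 using statsmodels. The thresholds come from the slide; the synthetic data, column names, and the simplify() helper are illustrative assumptions, not the thesis' exact procedure.

import numpy as np
import pandas as pd
import statsmodels.api as sm
from statsmodels.stats.outliers_influence import variance_inflation_factor

def simplify(X, y, p_cut=0.1, vif_cut=2.5):
    """Iteratively drop insignificant (p >= p_cut) and collinear (VIF >= vif_cut) metrics."""
    cols = list(X.columns)
    while True:
        res = sm.Logit(y, sm.add_constant(X[cols])).fit(disp=0)
        pvals = res.pvalues.drop("const")
        if pvals.max() >= p_cut:                    # drop the least significant metric
            cols.remove(pvals.idxmax())
            continue
        vals = sm.add_constant(X[cols]).values
        vifs = {c: variance_inflation_factor(vals, i + 1) for i, c in enumerate(cols)}
        worst = max(vifs, key=vifs.get)
        if vifs[worst] >= vif_cut:                  # drop the most collinear metric
            cols.remove(worst)
            continue
        return cols, res

rng = np.random.default_rng(3)
X = pd.DataFrame(rng.normal(size=(1000, 6)), columns=[f"m{i}" for i in range(6)])
X["m5"] = X["m0"] * 0.95 + rng.normal(scale=0.1, size=1000)   # deliberately collinear
y = (rng.random(1000) < 1 / (1 + np.exp(-(X["m0"] + 0.5 * X["m1"] - 1)))).astype(int)
print(simplify(X, y)[0])                                      # surviving "simple model" metrics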
Case study
Perform case study on Eclipse 2.0, 2.1 and 3.0
RQ1: Which metrics impact post-release defects?
Do these metrics change for different releases of Eclipse?
RQ2: How much do metrics impact the post-release defects?
Does the level of impact change across different releases?
91
RQ1: Which metrics impact? Do they change?
(Significance: *** p<0.001; ** p<0.01; * p<0.05; VIF in parentheses)
No. Pre-release Defects: significant in all releases — Eclipse 2.0 *** (1.1), 2.1 *** (1.1), 3.0 *** (1.2)
Total Lines of Code: significant in all releases — Eclipse 2.0 *** (1.3), 2.1 *** (1.4), 3.0 *** (1.3)
Total Prior Changes: significant in two releases — *** (1.1) and ** (1.1)
Anonymous Type Declarations * (1.2), No. of Static Methods *** (1.1), No. of Parameters *** (1.2): each significant in one release only
Pre-release defects and total lines of code are important and stable for all releases; code metrics are specific to a release
92
RQ2: How much do metrics explain?
Deviance explained measures how well the model fits, i.e., explains the observed phenomena
Total deviance explained: Eclipse 2.0 25.2%, Eclipse 2.1 17.7%, Eclipse 3.0 21.2%
Per-metric contributions (total lines of code, total prior changes, no. pre-release defects, no. of parameters, no. of static methods, anonymous type declarations) show that size and process metrics are most important
93
RQ2: Impact of the metrics? (Eclipse 3.0)
Odds ratios are used to quantify impact on post-release defects (models M1-M4):
Lines of Code: 2.57 (M1), 2.40 (M2), 2.11 (M3), 1.88 (M4)
Prior Changes: 1.87 (M2), 1.62 (M3), 1.62 (M4)
Pre-release defects: 1.87 (M3), 1.90 (M4)
Max parameters of methods: 1.73 (M4)
A 1-unit increase in pre-release defects increases the chance of a post-release defect by 90% (odds ratio 1.90)
94
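An odds ratio is exp(β) of the corresponding logistic regression coefficient; a tiny illustration, where the β value is hypothetical and chosen only so that exp(β) ≈ 1.90 as in the table above.

import numpy as np

beta = 0.642                       # hypothetical logistic regression coefficient
odds_ratio = np.exp(beta)          # ≈ 1.90
increase = (odds_ratio - 1) * 100  # a 1-unit increase raises the odds of a defect by ≈ 90%
print(f"odds ratio {odds_ratio:.2f} -> {increase:.0f}% increase in the odds of a post-release defect")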
But… what about predictive power? (Eclipse 3.0)
Simple models achieve comparable results to more complex models
[Chart: precision (%), recall (%), and accuracy (%) of the simple model vs. the model with all metrics]
95
Comparing to PCA (Eclipse 3.0)
Deviance explained: Simple 21.2%, 95% PCA 16.3%, 99% PCA 21.7%, 100% PCA 22.0%
No. of metrics: Simple 4, each PCA model 33
No. of PCs: 95% PCA 8, 99% PCA 15, 100% PCA 33
Can outperform 95% PCA using much simpler models
96
Comparing to PCA
97
[Chart: deviance explained (%) of the Simple, 95% PCA, and 100% PCA models for Eclipse 2.0, 2.1, and 3.0]
Outperform 95% PCA, slightly below 100% PCA
Use at most 4 metrics vs. 34 metrics used in PCA
Conclusion
98
Prioritizing Unit Test Creation for
Test-Driven Maintenance of Legacy
Systems
Emad Shihab, Zhen Ming Jiang, Bram Adams,
Ahmed E. Hassan and Robert Bowerman
Queen’s University and Research In Motion
Canada
Test Driven Development (TDD)
Write unit test
before
writing new code
What about already written code?
Test-Driven Maintenance (TDM)
Adopting
Test Driven Development (TDD)
for Legacy Applications
But time and resources are limited!
Prioritizing Unit Test Creation
Use the rich history of the legacy system to
prioritize the writing of unit tests
Avoid the most bugs effectively!
Write unit tests for functions with best
Return on Investment (ROI)
How can we avoid the most
bugs given limited resources?
Test Writing Prioritization Heuristics
Modification: Most Frequently Modified (MFM), Most Recently Modified (MRM)
Fix: Most Frequently Fixed (MFF), Most Recently Fixed (MRF)
Size: Largest Fixed (LF), Largest Modified (LM)
Risk: Change Risk (CR), Size Risk (SR)
Random
Usefulness
Was writing the unit test useful?
Example: unit tests are written for functions A, B, and C; over time, A receives 6 bug fixes, B receives 2, and C receives 0. Writing tests for A and B was useful; writing the test for C was not.
Usefulness = 2/3 = 66.67%
POP: Percentage of Optimal Performance
How close are we to the optimal performance?
Example: the tested functions A, B, and C cover 6 + 2 + 0 = 8 bug fixes; the optimal choice of three functions (A, D, and E, with 6, 4, and 3 bug fixes) covers 13.
POP = 8/13 = 61.5%
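Both measures can be written down directly; a small sketch that reproduces the worked example above (the function names and fix counts are the ones from the example).

def usefulness(tested, future_fixes):
    """Fraction of tested functions that later receive at least one bug fix."""
    return sum(future_fixes[f] > 0 for f in tested) / len(tested)

def pop(tested, future_fixes):
    """Bug fixes covered by the tested functions vs. the best possible choice of the same size."""
    covered = sum(future_fixes[f] for f in tested)
    optimal = sum(sorted(future_fixes.values(), reverse=True)[:len(tested)])
    return covered / optimal

fixes = {"A": 6, "B": 2, "C": 0, "D": 4, "E": 3}   # bug fixes after the unit tests were written
print(usefulness(["A", "B", "C"], fixes))           # 2/3  = 0.667
print(pop(["A", "B", "C"], fixes))                  # 8/13 = 0.615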
Case Study
Simulation Parameters
Calibration Period: 6 months
Simulation time: 5 years
List Size: 10 functions
Effort: 100 lines per day
Study Setup
Extracting Historical Data
1. Search modification record comments for keywords
and bug identifiers
2. Extract source code of modified file(s) and compare
to previous version to identify changed functions
3. Combine data from 1 and 2 to identify
changed/fixed functions
Mapping Historical Changes to Functions
V1 (undefined function — link error):
main() {
  int a;
  /*call help*/
  helpInfo();
}
V2 (syntax error):
main() {
  int a;
  /*call help*/
  helpInfo();
}
helpInfo() {
  errorString!
}
V3 (valid code):
main() {
  int a;
  /*call help*/
  helpInfo();
}
helpInfo() {
  int b;
}
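A minimal sketch of step 1 of the extraction above: flagging fix-related modification records and pulling out bug identifiers. The keyword list and regular expressions are illustrative assumptions, not the project's actual conventions.

import re

# Illustrative keywords/patterns; the project-specific conventions would differ.
BUG_KEYWORDS = re.compile(r"\b(bug|fix(ed|es)?|defect|fault)\b", re.IGNORECASE)
BUG_ID = re.compile(r"\b(?:bug|issue)\s*#?\s*(\d+)\b", re.IGNORECASE)

def classify_modification_record(comment):
    """Flag a modification record as a fix and pull out referenced bug identifiers."""
    return {
        "is_fix": bool(BUG_KEYWORDS.search(comment)),
        "bug_ids": BUG_ID.findall(comment),
    }

print(classify_modification_record("Changed files A and B to implement new feature and fix bug 123"))
# {'is_fix': True, 'bug_ids': ['123']}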
Study Setup
Measuring the Performance of a Heuristic
Based on a heuristic, generate list of X
functions to write unit tests for
Use size of function to measure effort
required to write unit test
Best Test Writing Heuristics
Modification: Most Frequently Modified (MFM), Most Recently Modified (MRM)
Fix: Most Frequently Fixed (MFF), Most Recently Fixed (MRF)
Size: Largest Fixed (LF), Largest Modified (LM)
Risk: Change Risk (CR), Size Risk (SR)
Random
Usefulness
Was writing the unit test useful?
[Chart: usefulness (%) over time (days) for Most Frequently Modified (MFM), Most Frequently Fixed (MFF), Largest Fixed (LF), Change Risk (CR), and Random]
POP: Percentage of Optimal Performance
How close are we to the optimal performance?
[Chart: percentage of optimal performance (%) over time (days) for MFM, MFF, LF, CR, and Random]
Overall Performance of Heuristics (usefulness %, POP %):
Largest Fixed (LF): 87, 32.4
Largest Modified (LM): 84.7, 32.2
Most Frequently Fixed (MFF): 83.8, 22.2
Most Frequently Modified (MFM): 80, 21.8
Most Recently Fixed (MRF): 56.9, 7
Change Risk (CR): 55, 5.5
Size Risk (SR): 48.8, 4.3
Most Recently Modified (MRM): 43.1, 4.9
Random: 27.7, 1.7
Effect of Varying Parameters: Varying List Size
[Chart: percentage of optimal performance (%) for list sizes 5, 10, and 20, for MFM, MFF, LF, CR, and Random]
Effect of Varying Parameters: Varying Writing Effort
[Chart: percentage of optimal performance (%) for writing effort of 50, 100, and 200 lines per day, for MFM, MFF, LF, CR, and Random]
Conclusion
Risky Changes
119
Overview of Change Integration Process
A change from the local repository is assessed: Risky? If yes → closer review; if no → integrate into the main repository
120
Case Study
Commercial mobile system
Dec 2009 – Dec 2010
450+ developers
60+ teams
7000+ changes
Mainly in Java and C/C++
121
Study Overview
Part 1: Prediction of Risky Changes
Part 2: Understanding Risky Changes
Part 3: Misclassification of Risky Changes
122
An Example Change
123
Change 12345 by author@adesk on 2000/03/23 12:47:15
Purpose: Bug fix
Modifies API: Yes
Related Changes: 1234, 3421
…
Change description: Changed files A and B to implement new feature and
fix bug 123 ...
Files affected:
//root/comp1/subcomp1/A.java (+10, -1, e10)
//root/comp1/subcomp1/B.cpp (+1, -2, e5)
Risky?
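Several of the factors described next can be derived directly from such a change record; a minimal parsing sketch, which assumes that "(+a, -d, eN)" means a lines added, d deleted, and N edited (the record format and that interpretation are assumptions, not documented conventions).

import re

FILE_LINE = re.compile(r"^//\S+?([^/\s]+\.(\w+)) \(\+(\d+), -(\d+), e(\d+)\)$")

def change_factors(change_text):
    """Derive simple size/code factors from a change record like the example above."""
    added = deleted = edited = 0
    extensions = set()
    for line in change_text.splitlines():
        m = FILE_LINE.match(line.strip())
        if m:
            extensions.add(m.group(2).lower())
            added += int(m.group(3))
            deleted += int(m.group(4))
            edited += int(m.group(5))
    return {
        "lines_added": added, "lines_deleted": deleted, "lines_edited": edited,
        "total_churn": added + deleted + edited,
        "modifies_java": "java" in extensions, "modifies_cpp": "cpp" in extensions,
        "is_bug_fix": "Purpose: Bug fix" in change_text,
    }

example = """Purpose: Bug fix
//root/comp1/subcomp1/A.java (+10, -1, e10)
//root/comp1/subcomp1/B.cpp (+1, -2, e5)"""
print(change_factors(example))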
Factors Used to Model Risky Changes
Size: lines and chunks added, deleted, modified; total churn
Changed files: no. of changes, no. of fixes, bugginess, no. of developers
Experience & Defects: developer experience, bug fix?, no. of linked bugs
Code: modifies Java, C++, other, API
Models: Developer-level and Team-level
Prediction Results
Recall: 67.9% and 67.6%; precision improvement: +37% and +87%
37-87% improvement in precision, ~67% recall
125
Most Important Factors
Developer models: 7 × Lines added; 7 × File bugginess; 2 × No. linked bugs; 1 × Developer experience; Modifies C++: none
Team models: 10 × Chunks added; 6 × File bugginess; 3 × Modifies C++; 4 × No. linked bugs; 4 × Developer experience
Code added, file bugginess, no. of linked defects and developer experience matter most
127
When were Developers Wrong?
Compare percentage of correctly and wrongly classified
changes:
• Cause: Unclear requirements, inadequate testing, coding
errors, design flaw
• Related changes?
• Modifies API code?
Changes that have related changes are 10
times more likely to be wrongly classified!
129
Reality Check!
130
Too much to review; impact is not considered; models are not explainable
Success Story!
A tool based on this work is being used by RIM’s
Handheld Integration Team
Tools team is working on building a tool to be
deployed company wide
131
Evaluation of Prediction Models
Split the data: 2/3 training, 1/3 testing; build the model on the training data; repeat 10 times (X 10)
Fitted model: Pr(Defecti) = α + β1 · metrici, e.g., Pr(Defecti) = 0.1 + 0.5 · metrici
Compare the predicted Pr(Defecti) against the files that are actually defective
139
Evaluation of Prediction Models
140
Actually Defective vs. Predicted Defective: TP (overlap), FP, FN
Precision: "How small is FP"
Recall: "How small is FN"
Putting It All Together
141
Metrics
32 product metrics
1 process metric
142
Logistic Regression Model
Pr(Defect) = α + β1·metric1 + β2·metric2 + β3·metric3 + ... + βn·metricn
Input: 32 product & 1 process metrics; output: post-release defect
Classify a file as defect-prone when Pr(Defect) > cutoff = 0.5
143
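A minimal end-to-end sketch of this setup using scikit-learn. Synthetic data stands in for the 33 Eclipse metrics; the 2/3-1/3 split and the 0.5 cutoff are the ones from the slides.

import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import precision_score, recall_score

rng = np.random.default_rng(4)

# Synthetic stand-in for the 33 metrics and post-release defect labels.
X = rng.normal(size=(2000, 33))
y = rng.binomial(1, 1 / (1 + np.exp(-(1.5 * X[:, 0] + X[:, 1] - 2))))

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=1/3, random_state=4)  # 2/3 train, 1/3 test

model = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
pred = (model.predict_proba(X_te)[:, 1] > 0.5).astype(int)   # cutoff = 0.5

print("precision:", round(precision_score(y_te, pred), 2))
print("recall:   ", round(recall_score(y_te, pred), 2))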
Prediction Performance
Precision = TP / (TP + FP) = 0.68
Recall = TP / (TP + FN) = 0.38
144
Enterprise Resource Planning System in Telangana
 
Webinar On-Demand: Using Flutter for Embedded
Webinar On-Demand: Using Flutter for EmbeddedWebinar On-Demand: Using Flutter for Embedded
Webinar On-Demand: Using Flutter for Embedded
 
AI Fusion Buddy Review: Brand New, Groundbreaking Gemini-Powered AI App
AI Fusion Buddy Review: Brand New, Groundbreaking Gemini-Powered AI AppAI Fusion Buddy Review: Brand New, Groundbreaking Gemini-Powered AI App
AI Fusion Buddy Review: Brand New, Groundbreaking Gemini-Powered AI App
 
Orion Context Broker introduction 20240604
Orion Context Broker introduction 20240604Orion Context Broker introduction 20240604
Orion Context Broker introduction 20240604
 
Using Xen Hypervisor for Functional Safety
Using Xen Hypervisor for Functional SafetyUsing Xen Hypervisor for Functional Safety
Using Xen Hypervisor for Functional Safety
 
Why Mobile App Regression Testing is Critical for Sustained Success_ A Detail...
Why Mobile App Regression Testing is Critical for Sustained Success_ A Detail...Why Mobile App Regression Testing is Critical for Sustained Success_ A Detail...
Why Mobile App Regression Testing is Critical for Sustained Success_ A Detail...
 
Neo4j - Product Vision and Knowledge Graphs - GraphSummit Paris
Neo4j - Product Vision and Knowledge Graphs - GraphSummit ParisNeo4j - Product Vision and Knowledge Graphs - GraphSummit Paris
Neo4j - Product Vision and Knowledge Graphs - GraphSummit Paris
 
OpenMetadata Community Meeting - 5th June 2024
OpenMetadata Community Meeting - 5th June 2024OpenMetadata Community Meeting - 5th June 2024
OpenMetadata Community Meeting - 5th June 2024
 

An Exploration of Challenges Limiting Pragmatic Software Defect Prediction

  • 1. An Exploration of Challenges Limiting Pragmatic Software Defect Prediction Emad Shihab Queen’s University
  • 2. Software Quality is Important! Cost of Software Defects: $59.5 Billion 2
  • 3. 100+ papers on defect prediction in past 10 years 3 SDP can save verification efforts by 29%
  • 5. Prior Approaches are Not Adding Value! 5 Impact of defects is not considered No guidance on what to do is provided Prediction is too late and too defect-centric We need pragmatic solutions!
  • 6. Overview of Thesis 6 Pragmatic SDP Considering Impact Providing Guidance Proactive & Encompassing SDP Surprises & Breakages Re-opened defects Simplifying SDP models Unit test creation Risky Changes
  • 8. Surprise Defects Low pre-, high post-release defects Catch developers off-guard Lead to schedule interruptions Occur in unexpected locations 8
  • 9. Factors Used to Model Surprise Defects Size Pre-release defects Number, churn, size, pre-release changes, pre-release defects Latest change Age Traditional Co-changed files Time 9
  • 11. Most Important Factors: chart of Deviance Explained (%) per release (R2.1, R3, R4, R4.1), broken down into Traditional, Co-change and Time factors 11
  • 14. Factors Used to Model Re-opened Defects 14 Work habit Bug report Bug fix People
  • 16. Most Important Factors (Eclipse), top-node analysis: Level 1, frequency 10: Comment text; Level 2, frequency 20: Description text. Bug report information, especially comments, is most important 16
  • 18. Motivation 18 Complexity metrics, Program dependencies, socio-technical networks Size is a good indicator of buggy files Use dependency and complexity metrics Process and code metrics University of Lugano Change coupling, popularity and design flaws Change complexity and social structures Which metrics should I use? How do they impact my code quality? Structure and historical changes
  • 19. Case Study 1. Build a model with initial set of 34 factors 2. Iteratively remove statistically insignificant and highly correlated metrics 19 Replicate the study by Zimmermann et al. [310]
  • 20. Main Findings: Narrowed down 34 code and process metrics to only 3 or 4. Chart of performance measure (%): Precision, Recall and Accuracy for the Simple model vs. All metrics 20
  • 22. Prioritizing Unit Test Creation Use the rich history of the software system to prioritize the creation of unit tests
  • 23. Usefulness: Was writing the unit test useful? Chart of Usefulness (%) over Time (Days) for Most Frequently Modified (MFM), Most Frequently Fixed (MFF), Largest Fixed (LF), Change Risk (CR) and Random; 2-3X improvement in usefulness
  • 24. Encompassing and Proactive SDP 24 Risky Changes
  • 26. An Example Change 26 Change 12345 by author@adesk on 2000/03/23 12:47:15 Purpose: Bug fix Modifies API: Yes Related Changes: 1234, 3421 … Change description: Changed files A and B to implement new feature and fix bug 123 ... Files affected: //root/comp1/subcomp1/A.java (+10, -1, e10) //root/comp1/subcomp1/B.cpp (+1, -2, e5) Risky?
  • 27. Factors Used to Model Risky Changes. Code Size: lines and chunks added, deleted, modified, total churn; Changed files: No. of changes, No. of fixes, defectiveness, No. of developers; Experience & Defects: developer experience, bug fix?, No. of linked bugs; Modify: Java, CPP, other, API
  • 29. Most Important Factors Developer Team 7 X Lines Added 10 X Chunks Added 7 X File defectiveness 6 X File defectiveness None 3 X Modifies C++ 2 X No. linked bugs 1 X Developer experience 4 X No. linked bugs 4 X Developer experience Code added, file defectiveness, No. linked defects and developer experience 29
  • 30. 30
  • 31. 31
  • 32. Test Writing Factors Most Frequently Modified (MFM) Most Recently Modified (MRM) Most Frequently Fixed (MFF) Most Recently Fixed (MRF) Largest Fixed (LF) Largest Modified (LM) Change Risk (CR) Size Risk (SR) Random Modification Fix Size Risk Random
  • 33. Prediction Performance: Actually Defective vs. Predicted Defective (TP, FP, FN). Precision = TP / (TP + FP), Recall = TP / (TP + FN) 33
  • 34. High-Impact Defects: A Study of Breakage and Surprise Defects Emad Shihab, Audris Mockus, Yasutaka Kamei, Bram Adams, Ahmed E. Hassan
  • 35. We know that… software has defects and projects have limited resources. Q: How can we spend the limited resources to maximize quality? 35
  • 36. Defect Prediction: Input: metrics (size, pre-release defects, complexity, churn, …) fed into a Prediction Model; Output: risk in [0..1] (e.g., 0.8, 0.1). Key predictors: size and pre-release defects 36
  • 37. Existing Approaches Aren’t Adding Value • Obvious to practitioners • Require a large amount of effort • Not all defects are equally important So….what can we do? FOCUS ON HIGH-IMPACT DEFECTS ! 37
  • 38. Impact Is In The Eye of The Beholder! Customers: Breakages Break existing functionality Affect established customers Hurt company image Low pre-, high post-release defects Catch developers off-guard Lead to schedule interruptions Developers: Surprises Occur in unexpected locations 38
  • 39. Case Study Commercial telecom project 30+ years of development 7+ MLOC Mainly in C/C++ 39
  • 40. Part 1 Part 2 Part 3 Part 4 Exploratory Study of Breakages and Surprises Prediction of Breakages and Surprises Understanding Prediction Models of Breakages and Surprises Value of Focusing on Breakages and Surprises Study Overview 40
  • 41. Exploratory Study of Breakages and Surprises: of all files, 10% have post-release defects, 2% have breakages and 2% have surprises. They are rare (2% of files) with only 6% overlap, so they should be studied separately and are very difficult to model 41
  • 42. Part 1 Part 2 Part 3 Part 4 Exploratory Study of Breakages and Surprises Prediction of Breakages and Surprises Understanding Prediction Models of Breakages and Surprises Value of Focusing on Breakages and Surprises Predicting Breakages and Surprises 42
  • 43. Prediction Using Logistic Regression Outcome = Const + β1 factor 1 + β2 factor2 + β3 factor 3 . . + βn factor n Breakage? Surprises? Factors From 3 Dimensions 43
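A minimal sketch of this kind of logistic regression risk model, assuming scikit-learn and made-up file-level factor values (the factor names, data and library choice are illustrative, not the study's actual setup):

    import numpy as np
    from sklearn.linear_model import LogisticRegression

    # One row per file; columns are hypothetical factors (size, pre-release
    # defects, number of co-changed files, age of latest change).
    factors = np.array([
        [120, 3, 2, 10],
        [300, 0, 1, 48],
        [50,  5, 4,  2],
        [800, 7, 6,  1],
    ])
    had_breakage = np.array([0, 0, 1, 1])  # outcome per file

    model = LogisticRegression().fit(factors, had_breakage)
    # Predicted probability (risk in [0..1]) that a new file has a breakage.
    risk = model.predict_proba([[200, 4, 3, 5]])[0, 1]
    print(round(risk, 2))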
  • 44. Factors Used to Model Breakages and Surprises Size Pre-release defects Number, churn, size, pre-release changes, pre-release defects Latest change Age Traditional Co-changed files Time 44
  • 46. Part 1 Part 2 Part 3 Part 4 Exploratory Study of Breakages and Surprises Prediction of Breakages and Surprises Understanding Prediction Models of Breakages and Surprises Value of Focusing on Breakages and Surprises Understanding Breakages and Surprises Models 46
  • 47. Determining Important Factors: quality of fit measured as Deviance Explained. Example (Breakages, R1.1): Traditional factors 15.6%, Co-change +1.5%, Time +0.4% 47
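For reference, the deviance explained reported on these slides is the usual goodness-of-fit measure for logistic regression models: deviance explained = 1 - (residual deviance / null deviance). In the breakage example for R1.1 above, the traditional factors alone explain 15.6% of the deviance, adding the co-change factors contributes a further 1.5 percentage points, and adding the time factors a further 0.4.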
  • 48. Important Factors for High-Impact Defects: charts of Deviance Explained (%) per release (R1.1 to R4.1) for Breakages and for Surprises, each broken down into Traditional, Co-change and Time factors 48
  • 49. Part 1 Part 2 Part 3 Part 4 Exploratory Study of Breakages and Surprises Prediction of Breakages and Surprises Understanding Prediction Models of Breakages and Surprises Value of Focusing on Breakages and Surprises Value of Focusing on Breakages and Surprises 49
  • 51. Effort Savings Using Specialized Models: chart of Effort Savings (%) for Breakages and Surprises, measured per File and per LOC (values 41, 42, 55, 50); 40-50% effort savings using specialized models 51
  • 52. Take Home Messages 1. Breakages and Surprises are different: they occur in 2% of files and are hard to predict 2. Achieve 2-3X improvement in precision, high recall 3. Breakages: Traditional metrics; Surprises: Co-change and Time metrics 4. Building specialized models saves 40-50% effort 52 http://research.cs.queensu.ca/home/emads/data/FSE2011/hid_artifact.html
  • 53. Predicting Re-opened Bugs A Case Study on the Eclipse Project Emad Shihab, A. Ihara, Y. Kamei, W. Ibrahim, M. Ohira, B. Adams, A. E. Hassan and K. Matsumoto emads@cs.queensu.ca SAIL, Queen’s University, Canada NAIST, Japan 53
  • 54. When you discover a bug … Report bug Fix bug Verify fix Close bug Re-opened 54
  • 58. Research questions … 1. Which attributes indicate re-opened bugs? 2. Can we accurately predict if a bug will be re-opened using the extracted attributes? 58
  • 59. Approach overview: Mine code and bug repositories, Extract attributes, Determine best attributes, Predict re-opened bugs 59
  • 60. Our dimensions … 60 Work habit Bug report Bug fix People
  • 61. Work habit attributes 1. Time (Hour of day) 2. Weekday 3. Day of month 4. Month 61
  • 62. Bug report attributes 1. Component 2. Platform 3. Severity 4. Priority 5. CC list 6. Priority changed 7. Description size 8. Description text 9. Number of comments 10. Comment size 11. Comment text 62 Metadata Textual data
  • 63. Bug fix attributes 1. Time to resolve (in days) 2. Last status 3. Number of edited files 63
  • 64. People attributes 1. Reporter Name 2. Reporter experience 3. Fixer name 4. Fixer experience 64
  • 65. Research question 1 Which attributes indicate re-opened bugs? 65 Comment text, description text and fix location (component) are the best indicators
  • 66. Top node analysis setup 1. Build 10 decision trees for each attribute set 2. Record the frequency and level of each attribute 3. Repeat using all attributes 66
  • 67. Decision tree prediction model: example tree splitting on attributes such as No. files, Dev exp, Month, Time and Time to resolve at Levels 1-3, with leaves labelled Re-opened / Not re-opened 67
  • 68. Top node analysis example with 3 trees, resulting counts: Level 1: Comment (2), No. files (1); Level 2: Time (3), No. comments (1), No. files (1), Description size (1) 68
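A rough sketch of top-node analysis as described on slides 66-68, assuming scikit-learn decision trees on made-up bug-report data (the thesis's exact tooling and attributes are not shown here):

    from collections import Counter
    import numpy as np
    from sklearn.tree import DecisionTreeClassifier
    from sklearn.utils import resample

    rng = np.random.RandomState(0)
    attributes = ["comment_text", "description_size", "no_files", "time_to_resolve"]
    X = rng.rand(200, len(attributes))                     # hypothetical attribute values
    y = (X[:, 0] + 0.2 * rng.rand(200) > 0.8).astype(int)  # hypothetical re-opened labels

    top_nodes = Counter()
    for i in range(10):                                    # 10 trees, as on slide 66
        Xb, yb = resample(X, y, random_state=i)
        tree = DecisionTreeClassifier(max_depth=3, random_state=i).fit(Xb, yb)
        top_nodes[attributes[tree.tree_.feature[0]]] += 1  # attribute at level 1 (root)

    print(top_nodes)  # how often each attribute sits at the top of a tree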
  • 69. Which attributes best indicate re-opened bugs? Work habit attributes: 9 X Month, 1 X Time (Hour of day), Weekday, Day of month 69
  • 70. Which attributes best indicate re-opened bugs? Bug report attributes (metadata and textual data): Component, Platform, Severity, Priority, CC list, Priority changed, Description size, Description text, Number of comments, Comment size, 10 X Comment text 70
  • 71. Which attributes best indicate re-opened bugs? Bug fix attributes: 7 X Time to resolve, 3 X Last status, Number of files in fix 71
  • 72. Which attributes best indicate re-opened bugs? People attributes: 5 X Reporter name, 5 X Fixer name, Reporter experience, Fixer experience 72
  • 73. Combining all attributes, top-node analysis: Level 1, frequency 10: Comment text; Level 2, frequency 19: Description text, frequency 1: Component 73
  • 74. Research question 2 Can we accurately predict if a bug will be re-opened using the extracted attributes? 74 Our models can correctly predict re-opened bugs with 63% precision and 85% recall
  • 75. Decision tree prediction model: example tree splitting on attributes such as No. files, Dev exp, Month, Time and Time to resolve at Levels 1-3, with leaves labelled Re-opened / Not re-opened 75
  • 76. Performance measures: confusion matrix of predicted vs. actual (TP, FP, FN, TN). Re-opened precision = TP / (TP + FP), Re-opened recall = TP / (TP + FN); Not re-opened precision = TN / (TN + FN), Not re-opened recall = TN / (TN + FP) 76
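As a small worked example of these measures (the counts below are made up):

    TP, FP, FN, TN = 63, 37, 11, 389          # hypothetical confusion-matrix counts

    reopened_precision = TP / (TP + FP)       # 0.63
    reopened_recall = TP / (TP + FN)          # 0.85
    not_reopened_precision = TN / (TN + FN)   # 0.97
    not_reopened_recall = TN / (TN + FP)      # 0.91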
  • 77. Predicting re-opened bugs: precision and recall (%) per dimension: Work habits 33/74, Bug report 63/83, Bug fix 21/83, People 27/67 77
  • 78. Predicting NOT re-opened bugs: precision and recall (%) per dimension: Work habits 93/71, Bug report 97/91, Bug fix 93/39, People 91/66 78
  • 79. Combining all attributes: re-opened: precision 63%, recall 85%; NOT re-opened: precision 97%, recall 90% 79
  • 80. Bug comments are important … Bug report is most important set What words are important? Comment text most important bug report attribute 80
  • 81. Important words Re-opened Not Re-opened control background debugging breakpoint blocked platforms verified duplicate screenshot important testing warning 81
  • 82. 82
  • 83. Understanding the Impact of Code and Process Metrics on Post-release Defects: A Case Study on the Eclipse Project Emad Shihab, Zhen Ming Jiang, Walid Ibrahim, Bram Adams and Ahmed E. Hassan Software Analysis and Intelligence Lab (SAIL) Queen’s University 83
  • 84. Motivation: Software has bugs and managers have limited resources. Q: How to allocate quality assurance resources? A: Defect prediction! 84
  • 85. Motivation 85 Complexity metrics, Program dependencies, socio-technical networks Size is a good indicator of buggy files Use dependency and complexity metrics Use number of imports and code metrics Use process and code metrics University of Lugano Change coupling, popularity and design flaws Change complexity and social structures Which metrics should I use? How do they impact my code quality?
  • 86. The challenge we face … 1. more work to mine 2. difficult to understand impact 3. less adoption in practice 86 more metrics, means ….
  • 87. Our goal …. Use a statistical approach based on work by Cataldo et al. : 1. Narrow down large set of metrics to much smaller set 2. Study the impact on post-release defects 87
  • 88. Our findings … Narrowed down 34 code and process metrics to only 3 or 4. Simple models achieve comparable predictive power. The explanatory power of the simple models outperforms 95% PCA. Some metrics ALWAYS matter: size and pre-release defects. Let me show you how … 88
  • 89. 34 Code and Process Metrics Metric Description POST Number of post-release defects in a file in the 6 months after the release. PRE Number of pre-release defects in a file in the 6 months before the release TPC Total number of changes to a file in the 6 months before the release BFC Number of bug fixing changes in a file in the 6 months before the release. TLOC Total number lines of code of a file ACD Number of anonymous type declarations in a file FOUT (3) Number of method calls of a file MLOC (3) Number of method lines of code NBD (3) Nested block depth of the methods in a file NOF (3) Number of fields of the classes in a file NOI Number of interfaces in a file NOM (3) Number of methods of the classes in a file NOT Number of classes in a file NSF (3) Number of static fields of the classes in a file NSM (3) Number of static methods of the classes in a file PAR (3) Number of parameters of the methods in a file VG (3) McCabe cyclomatic complexity of the methods in a file Process Metrics Code Metrics 89
  • 90. Approach overview 1. Build a logistic regression model using all metrics (initial model) 2. Remove statistically insignificant metrics (statistical significance check, P < 0.1) 3. Remove highly collinear metrics (collinearity check, VIF < 2.5) 4. Narrow down to a much smaller set of metrics (simple model). A VIF of 2.5 means the standard error of a metric's coefficient is ~1.6 times as large as it would be if the metrics were uncorrelated 90
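A rough sketch of this pruning loop, assuming pandas/statsmodels and the thresholds shown on the slide (p < 0.1, VIF < 2.5); the column names and data handling are placeholders, not the study's actual code:

    import pandas as pd
    import statsmodels.api as sm
    from statsmodels.stats.outliers_influence import variance_inflation_factor

    def simplify(metrics: pd.DataFrame, has_post_defect: pd.Series) -> list:
        """Iteratively drop insignificant (p >= 0.1) and collinear (VIF >= 2.5) metrics."""
        cols = list(metrics.columns)
        while True:
            model = sm.Logit(has_post_defect, sm.add_constant(metrics[cols])).fit(disp=0)
            pvals = model.pvalues.drop("const")
            if pvals.max() >= 0.1:                  # statistical significance check
                cols.remove(pvals.idxmax())
                continue
            vifs = {c: variance_inflation_factor(metrics[cols].values, i)
                    for i, c in enumerate(cols)}
            worst = max(vifs, key=vifs.get)
            if vifs[worst] >= 2.5:                  # collinearity check
                cols.remove(worst)
                continue
            return cols                             # metrics kept in the "simple" model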
  • 91. Case study Perform case study on Eclipse 2.0, 2.1 and 3.0 RQ1: Which metrics impact post-release defects? Do these metrics change for different releases of Eclipse? RQ2: How much do metrics impact the post-release defects? Does the level of impact change across different releases? 91
  • 92. Metric Eclipse 2.0 Eclipse 2.1 Eclipse 3.0 P-value VIF P-value VIF P-value VIF Anonymous Type Declarations * 1.2 No. of Static Methods *** 1.1 No. of Parameters *** 1.2 No. Pre-release Defects *** 1.1 *** 1.1 *** 1.2 Total Prior Changes *** 1.1 ** 1.1 Total lines of Code *** 1.3 *** 1.4 *** 1.3 RQ1: Which metrics impact? Do they change? 92 Important and stable for all releases Code metrics specific for release (p<0.001 ***; p<0.001 **, p<0.05*)
  • 93. RQ2: How much do metrics explain? 93 Metric Eclipse 2.0 Eclipse 2.1 Eclipse 3.0 Total lines of Code Total Prior Changes No. Pre-release defects No. of Parameters No. of static methods Anonymous Type Declarations Deviance explained 25.2% 17.7% 21.2% 0.1% 4.9% 17.6% 2.2% 11.2% 6.3% 0.2% 14.5% 5.9% 0.5% 0.7% Size and process metrics are most important How well the model fits, explains the observed phenomena
  • 94. RQ2: Impact of the metrics? Eclipse 3.0 94 Metric Odds-ratios (M 1) Odds-ratios (M2) Odds-ratios (M 3) Odds-ratios (M 4) Lines of Code 2.57 2.40 2.11 1.88 Prior Changes 1.87 1.62 1.62 Pre-release defects 1.87 1.90 Max parameters of methods 1.73 1 unit increase, increases the chance of post-release defect by 90% Odds ratios are used to quantify impact on post-release defects
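The odds ratios in this table are the exponentiated logistic regression coefficients, i.e. OR = exp(β). For example, the odds ratio of 1.90 for pre-release defects means that a one-unit increase in that metric multiplies the odds of a post-release defect by about 1.9, holding the other metrics fixed; this is the 90% increase called out on the slide.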
  • 95. But … what about predictive power? (Eclipse 3.0) Chart of performance measure (%): Precision, Recall and Accuracy for the Simple model vs. All metrics. Simple models achieve comparable results to more complex models 95
  • 96. Comparing to PCA Eclipse 3.0 96 Simple 95% PCA 99% PCA 100% PCA Deviance explained 21.2% 16.3% 21.7% 22.0% No. of metrics 4 33 33 33 No. of PCs - 8 15 33 Can outperform 95% PCA, using much simpler models
  • 97. Comparing to PCA: chart of deviance explained (%) for Eclipse 2.0, 2.1 and 3.0 with 100% PCA, 95% PCA and the Simple model. The simple models outperform 95% PCA and are slightly below 100% PCA, using at most 4 metrics vs. the 34 metrics used in PCA 97
  • 99. Prioritizing Unit Test Creation for Test-Driven Maintenance of Legacy Systems Emad Shihab, Zhen Ming Jiang, Bram Adams, Ahmed E. Hassan and Robert Bowerman Queen’s University and Research In Motion Canada
  • 100. Test Driven Development (TDD): Write unit tests before writing new code. What about already written code?
  • 101. Test Driven Maintenance (TDM): Adopting Test Driven Development (TDD) for legacy applications. But time and resources are limited!
  • 102. Prioritizing Unit Test Creation Use the rich history of the legacy system to prioritize the writing of unit tests
  • 103. Avoid the most bugs effectively! Write unit tests for functions with best Return on Investment (ROI) How can we avoid the most bugs given limited resources?
  • 104. Test Writing Prioritization Heuristics Most Frequently Modified (MFM) Most Recently Modified (MRM) Most Frequently Fixed (MFF) Most Recently Fixed (MRF) Largest Fixed (LF) Largest Modified (LM) Change Risk (CR) Size Risk (SR) Random Modification Fix Size Risk Random
  • 105. Usefulness Was writing the unit test useful? Time to write unit test A B C 6 bug fixes 2 bug fixes 0 bug fixes Usefulness = 2/3 = 66.67%
  • 106. POP: Percentage of Optimal Performance How close are we to the optimal performance? Time to write unit test A B C 6 bug fixes 2 bug fixes 0 bug fixes POP = 8/13 = 61.5% D E 4 bug fixes 3 bug fixes
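Reading the two examples on slides 105 and 106 together: tests were written for functions A, B and C, and two of them (A with 6 bug fixes, B with 2) later proved useful, so Usefulness = 2/3 ≈ 66.7%. The three tested functions cover 6 + 2 + 0 = 8 bug fixes, while the best possible selection under the same effort (A, D and E) would have covered 6 + 4 + 3 = 13, so POP = 8/13 ≈ 61.5%.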
  • 107. Case Study Simulation Parameters Calibration Period: 6 months Simulation time: 5 years List Size: 10 functions Effort: 100 lines per day
  • 108. Study Setup Extracting Historical Data 1. Search modification record comments for keywords and bug identifiers 2. Extract source code of modified file(s) and compare to previous version to identify changed functions 3. Combine data from 1 and 2 to identify changed/fixed functions
  • 109. Mapping Historical Changes to Functions: the same file across three versions. V1: undefined function (link error): main() calls helpInfo(), which is not defined; V2: syntax error: helpInfo() { errorString! }; V3: valid code: helpInfo() { int b; }
  • 110. Study Setup Measuring the Performance of a Heuristic Based on a heuristic, generate list of X functions to write unit tests for Use size of function to measure effort required to write unit test
  • 111. Test Writing Heuristics Most Frequently Modified (MFM) Most Recently Modified (MRM) Most Frequently Fixed (MFF) Most Recently Fixed (MRF) Largest Fixed (LF) Largest Modified (LM) Change Risk (CR) Size Risk (SR) Random Modification Fix Size Risk Random
  • 112. Best Test Writing Heuristics Most Frequently Modified (MFM) Most Recently Modified (MRM) Most Frequently Fixed (MFF) Most Recently Fixed (MRF) Largest Fixed (LF) Largest Modified (LM) Change Risk (CR) Size Risk (SR) Random Modification Fix Size Risk Random
  • 113. Usefulness: Was writing the unit test useful? Chart of Usefulness (%) over Time (Days) for Most Frequently Modified (MFM), Most Frequently Fixed (MFF), Largest Fixed (LF), Change Risk (CR) and Random
  • 114. POP: Percentage of Optimal Performance: How close are we to the optimal performance? Chart of POP (%) over Time (days) for Most Frequently Modified (MFM), Most Frequently Fixed (MFF), Largest Fixed (LF), Change Risk (CR) and Random
  • 115. Overall Performance of Heuristics, Usefulness / POP (%): Largest Fixed (LF) 87 / 32.4, Largest Modified (LM) 84.7 / 32.2, Most Frequently Fixed (MFF) 83.8 / 22.2, Most Frequently Modified (MFM) 80 / 21.8, Most Recently Fixed (MRF) 56.9 / 7, Change Risk (CR) 55 / 5.5, Size Risk (SR) 48.8 / 4.3, Most Recently Modified (MRM) 43.1 / 4.9, Random 27.7 / 1.7
  • 116. Effect of Varying Parameters, Varying List Size: chart of POP (%) for list sizes of 5, 10 and 20 functions, for Most Frequently Modified (MFM), Most Frequently Fixed (MFF), Largest Fixed (LF), Change Risk (CR) and Random
  • 117. Effect of Varying Parameters, Varying Writing Effort: chart of POP (%) for effort of 50, 100 and 200 lines per day, for Most Frequently Modified (MFM), Most Frequently Fixed (MFF), Largest Fixed (LF), Change Risk (CR) and Random
  • 120. Overview of Change Integration Process Local Repository Risky? Yes Closer review No Main Repository Change 120
  • 121. Case Study Commercial mobile system Dec 2009 – Dec 2010 450+ developers 60+ teams 7000+ changes Mainly in Java and C/C++ 121
  • 122. Part 1 Part 2 Part 3 Prediction of Risky Changes Understanding Risky Changes Misclassification of Risky Changes Study Overview 122
  • 123. An Example Change 123 Change 12345 by author@adesk on 2000/03/23 12:47:15 Purpose: Bug fix Modifies API: Yes Related Changes: 1234, 3421 … Change description: Changed files A and B to implement new feature and fix bug 123 ... Files affected: //root/comp1/subcomp1/A.java (+10, -1, e10) //root/comp1/subcomp1/B.cpp (+1, -2, e5) Risky?
  • 124. Factors Used to Model Risky Changes. Code Size: lines and chunks added, deleted, modified, total churn; Changed files: No. of changes, No. of fixes, bugginess, No. of developers; Experience & Defects: developer experience, bug fix?, No. of linked bugs; Modify: Java, CPP, other, API
  • 126. Part 1 Part 2 Part 3 Prediction of Risky Changes Understanding Risky Changes Misclassification of Risky Changes Study Overview 126
  • 127. Most Important Factors Developer Team 7 X Lines Added 10 X Chunks Added 7 X File bugginess 6 X File bugginess None 3 X Modifies C++ 2 X No. linked bugs 1 X Developer experience 4 X No. linked bugs 4 X Developer experience Code added, file bugginess, No. linked defects and developer experience 127
  • 128. Part 1 Part 2 Part 3 Prediction of Risky Changes Understanding Risky Changes Misclassification of Risky Changes Study Overview 128
  • 129. When were Developers Wrong? Compare percentage of correctly and wrongly classified changes: • Cause: Unclear requirements, inadequate testing, coding errors, design flaw • Related changes? • Modifies API code? Changes that have related changes are 10 times more likely to be wrongly classified! 129
  • 130. Reality Check! 130 Too much to review Impact is not considered Models are not explainable
  • 131. Success Story! A tool based on this work is being used by RIM’s Handheld Integration Team Tools team is working on building a tool to be deployed company wide 131
  • 132. Overview of Change Integration Process Local Repository Risky? Yes Closer review No Main Repository Change 132
  • 133. When were Developers Wrong? Compare percentage of correctly and wrongly classified changes: • Cause: Unclear requirements, inadequate testing, coding errors, design flaw • Related changes? • Modifies API code? Changes that have related changes are 10 times more likely to be wrongly classified! 133
  • 134. Evaluation of Prediction Models: split the data into training (2/3) and testing (1/3), repeated 10 times; build the model Pr(Defect_i) = α + β1 * metric_i (e.g., Pr(Defect_i) = 0.1 + 0.5 * metric_i), feed in the testing data and compare the predicted Pr(Defect_i) against the actually defective files 139
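A sketch of this evaluation loop, assuming random 2/3-1/3 splits repeated 10 times with scikit-learn logistic regression and the 0.5 cutoff shown later on slide 138 (the data and metric count are placeholders):

    import numpy as np
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import train_test_split
    from sklearn.metrics import precision_score, recall_score

    rng = np.random.RandomState(1)
    X = rng.rand(500, 33)             # hypothetical 32 product + 1 process metrics per file
    y = (X[:, 0] > 0.8).astype(int)   # hypothetical post-release defect labels

    precisions, recalls = [], []
    for i in range(10):               # repeat the 2/3 training, 1/3 testing split 10 times
        X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=1/3, random_state=i)
        pred = LogisticRegression(max_iter=1000).fit(X_tr, y_tr).predict(X_te)  # 0.5 cutoff
        precisions.append(precision_score(y_te, pred, zero_division=0))
        recalls.append(recall_score(y_te, pred, zero_division=0))

    print(np.mean(precisions), np.mean(recalls))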
  • 135. Evaluation of Prediction Models 140 Actually Defective Predicted Defective TP FP FN Precision: “How small is FP” Recall: “How small is FN”
  • 136. Putting It All Together 141
  • 137. Metrics 32 product metrics 1 process metric 142
  • 138. Logistic Regression Model Pr(Defect) = α + β1 metric 1 + β2 metric 2 + β3 metric 3 . . + βn metric n 32 Product & 1 Process Metrics Post-release Defect Cutoff = 0.5 143
  • 139. Prediction Performance: Actually Defective vs. Predicted Defective (TP, FP, FN). Precision = TP / (TP + FP) = 0.68, Recall = TP / (TP + FN) = 0.38 144
  • 141. The challenge we face … 1. more work to mine 2. difficult to understand impact 3. less adoption in practice 146 more metrics, means ….

Editor's Notes

  1. Numbers are from 2002
  2. How is this slide related to previous one?
  3. Way too many terms that are not defined: predictive power, relative impact, effort saving. Just remove all the green material for now – you need to sell your work here, not the exact technique; the exact technique should be presented and detailed later on. Avoid green text, it is very hard on the eyes. Also, you never get back to these questions: they need to be answered later in the presentation (so the presentation should be structured around them and the conclusion should highlight the answers too). The black-magic picture suggests that the methodology is black magic. Predictors are a way to study this; the work is not about predictors, it is about studying what makes things happen – prediction models are used as a tool for the study. What are the best predictors?
  4. Factors… maybe say Causes? What is this graph? How is it measured? What is the Y-axis? A slide is needed before this one to explain how the graph is generated and what the intuition behind it is.
  5. Way too many terms that are not defined: predictive power, relative impact, effort saving. Just remove all the green material for now – you need to sell your work here, not the exact technique; the exact technique should be presented and detailed later on. Avoid green text, it is very hard on the eyes. Also, you never get back to these questions: they need to be answered later in the presentation (so the presentation should be structured around them and the conclusion should highlight the answers too). The black-magic picture suggests that the methodology is black magic. Predictors are a way to study this; the work is not about predictors, it is about studying what makes things happen – prediction models are used as a tool for the study. What are the best predictors?
  6. I do not get how you measured effort savings. What do you mean by File or LOC? A slide is needed before this one to explain what you are doing. In the last slide you said you are comparing false positives, but I do not see that – I just see File and LOC.