High-Impact Defects: A Study of Breakage and Surprise Defects
Emad Shihab, Audris Mockus, Yasutaka Kamei, Bram Adams, Ahmed E. Hassan
We know that...
Software has defects
Projects have limited resources
Q: How can we spend the limited resources to maximize quality?
2
Defect Prediction
Input: Metrics (Size, Complexity, Churn, Pre-release defects, ...)
Prediction Model
Output: Risk [0..1] (e.g. 0.8, 0.1)
Key Predictors: Size and pre-release defects
3
Existing Approaches Aren’t Adding Value
• Obvious to practitioners
• Require a large amount of effort
• Not all defects are equally important
So... what can we do?
FOCUS ON HIGH-IMPACT DEFECTS!
4
Impact Is In The Eye of The Beholder!
Customers: Breakages
• Break existing functionality
• Affect established customers
• Hurt company image
Developers: Surprises
• Occur in unexpected locations
• Low pre-, high post-release defects
• Catch developers off-guard
• Lead to schedule interruptions
5
Case Study
Commercial telecom project
30+ years of development
7+ MLOC
Mainly in C/C++
6
Study Overview
Part 1: Exploratory Study of Breakages and Surprises
Part 2: Prediction of Breakages and Surprises
Part 3: Understanding Prediction Models of Breakages and Surprises
Part 4: Value of Focusing on Breakages and Surprises
7
Exploratory Study of Breakages and Surprises
Share of all files: Post-release defects 10%, Breakages 2%, Surprises 2%
Rare (2% of files) → Very difficult to model
6% overlap → Should study them separately
8
Predicting Breakages and Surprises
9
Prediction Using Logistic Regression
Outcome = Const + β1*factor1 + β2*factor2 + β3*factor3 + ... + βn*factorn
Outcome: Breakage? Surprise?
Factors From 3 Dimensions
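The logistic regression form above can be sketched in a few lines. This is a minimal illustration, assuming the usual logistic link that maps the linear combination to a risk in [0, 1]; the function name `predict_risk` and every coefficient and factor value below are hypothetical and do not come from the study.

```python
import math

def predict_risk(const, coefficients, factors):
    """Return a risk in [0, 1]: logistic(Const + sum(beta_i * factor_i))."""
    linear = const + sum(b * x for b, x in zip(coefficients, factors))
    return 1.0 / (1.0 + math.exp(-linear))

# Hypothetical coefficients and factor values, for illustration only.
risk = predict_risk(const=-4.0, coefficients=[0.8, 0.5, 0.3], factors=[2.0, 1.0, 0.0])
print(f"predicted risk: {risk:.3f}")
```

A file would then be flagged as a likely breakage or surprise when its risk exceeds a chosen cutoff; the slides do not state which cutoff was used.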
10
Factors Used to Model Breakages and Surprises
Traditional: Size, Pre-release defects
Co-changed files: Number, churn, size, pre-release changes, pre-release defects
Time: Latest change, Age
11
Prediction Results
[Chart: precision and recall for Breakages and Surprises against a random predictor. Recall: 74.1%, 71.2%. Precision: 6.7%, 4.7%. Random predictor: 2.0%, 2.0%.]
2-3X precision, high recall
12
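A random predictor flags files without using any information, so its expected precision equals the share of files that really are positive (2.0% here). The sketch below computes how many times better the reported precisions are than that baseline; `precision_lift` is a hypothetical helper name, and the inputs are the percentages shown on the slide.

```python
def precision_lift(model_precision, base_rate):
    """How many times more precise the model is than a random predictor."""
    if base_rate <= 0:
        raise ValueError("base_rate must be positive")
    return model_precision / base_rate

base_rate = 0.02  # 2.0% of files, the random predictor's expected precision
lifts = [precision_lift(p, base_rate) for p in (0.047, 0.067)]
for lift in lifts:
    print(f"{lift:.2f}x the random predictor")
```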
Understanding Breakages and
Surprises Models
13
Determining Important Factors
Quality of fit → Deviance Explained
Example: Breakages R1.1
Traditional: 15.6%
Co-change: +1.5%
Time: +0.4%
14
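The slide names deviance explained as the quality-of-fit measure but does not spell out the formula. A minimal sketch, assuming the common definition 1 - (model deviance / null deviance), where the null model predicts the overall base rate for every file; the function names and the toy data are hypothetical.

```python
import math

def deviance(labels, probabilities):
    """Binomial deviance: -2 times the log-likelihood of the 0/1 labels."""
    eps = 1e-12  # keep log() away from 0 and 1
    total = 0.0
    for y, p in zip(labels, probabilities):
        p = min(max(p, eps), 1.0 - eps)
        total += y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return -2.0 * total

def deviance_explained(labels, probabilities):
    """Fraction of the null deviance removed by the model's predictions.

    Assumes the labels contain both classes; otherwise the null deviance
    is (near) zero and the ratio is not meaningful.
    """
    base_rate = sum(labels) / len(labels)
    null_dev = deviance(labels, [base_rate] * len(labels))
    return 1.0 - deviance(labels, probabilities) / null_dev

# Toy data: one defective file out of four, with made-up predicted risks.
labels = [1, 0, 0, 0]
explained = deviance_explained(labels, [0.7, 0.1, 0.1, 0.1])
print(f"deviance explained: {100 * explained:.1f}%")
```

Adding one group of factors at a time and recording the increase in this value gives increments of the kind shown on the slide (a first group, then a gain for each further group).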
Important Factors for High-Impact Defects
[Figure: Deviance Explained (%), 0 to 40, per release (R1.1, R2.1, R3, R4, R4.1) in two panels, Breakages and Surprises; series: Traditional, Co-change, Time]
15
Value of Focusing on Breakages and
Surprises
16
Building Specialized Models
General model: Train on Post-release Defects, Test on Breakages
Specialized model: Train on Breakages, Test on Breakages
Compare False Positives
17
Effort Savings Using Specialized Models
[Figure: Effort Savings (%), 0 to 100, by File and LOC for Breakages and Surprises; bar labels: 41, 42, 55, 50]
40-50% Effort Savings Using Specialized Models
18
Take Home Messages
1. Breakages and Surprises are different. Occur in 2% of files, hard to predict
2. Achieve 2-3X improvement in precision, high recall
3. Breakages → Traditional metrics; Surprises → Co-change and Time metrics
4. Building specialized models saves 40-50% effort
19
http://research.cs.queensu.ca/home/emads/data/FSE2011/hid_artifact.html
20
Quantifying Effort Savings
Set recall to be the same

General model
                 Actual Yes   Actual No
Predicted Yes        26          538
Predicted No          7          875

Specialized model
                 Actual Yes   Actual No
Predicted Yes        26          320
Predicted No          7         1093

Effort Savings ~41%!
21
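The ~41% figure can be reproduced from the two confusion matrices on this slide. The sketch below rests on two assumptions: effort savings is read as the relative reduction in false positives (files flagged needlessly) at equal recall, and the matrix with 538 false positives is taken to be the general model, since that reading is the one that yields a saving. The dictionary layout and function names are hypothetical.

```python
def recall(tp, fn):
    """Share of truly defective files that the model flags."""
    return tp / (tp + fn)

def effort_savings(fp_general, fp_specialized):
    """Relative reduction in false positives when using the specialized model."""
    return (fp_general - fp_specialized) / fp_general

# Counts from the slide; the model labels are an assumption (see above).
general = {"tp": 26, "fp": 538, "fn": 7, "tn": 875}
specialized = {"tp": 26, "fp": 320, "fn": 7, "tn": 1093}

savings = effort_savings(general["fp"], specialized["fp"])
print(f"recall (both models): {recall(general['tp'], general['fn']):.3f}")
print(f"effort savings: {100 * savings:.1f}%")
```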
Remaining Challenges
• “We tend to test features not files”
– Can we predict defects for features?
• “Without knowing more about the nature of the defect or recommendations for how to fix it, I am not sure how we can use it”
– Predict the nature of defects
– Can we provide specific remediation strategies for predicted defects?
• e.g., surprises mostly relate to incorrectly implemented requirements
22
Quantifying Effect... An Example...
Inputs at their medians (Median Size, Median Pre-defects, ..., Median age) → Prediction Model → 0.1
Same inputs, but with 2 x Median Size → Prediction Model → 0.2
23
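The example on this slide can be sketched as follows: hold every factor at its median, double one factor, and compare the two predicted risks. Expressing the effect as a percent change in risk is an assumption here, as are the logistic link, the function names, and all coefficient and median values, which are made up for illustration.

```python
import math

def predict_risk(const, coefficients, factors):
    """Logistic-regression risk in [0, 1] for one set of factor values."""
    linear = const + sum(b * x for b, x in zip(coefficients, factors))
    return 1.0 / (1.0 + math.exp(-linear))

def percent_change(before, after):
    """Relative change from 'before' to 'after', in percent."""
    return 100.0 * (after - before) / before

def effect_of_doubling(const, coefficients, medians, index):
    """Percent change in risk when factor 'index' goes from its median to 2x median."""
    baseline = predict_risk(const, coefficients, medians)
    changed = list(medians)
    changed[index] *= 2
    return percent_change(baseline, predict_risk(const, coefficients, changed))

# Hypothetical model: factor 0 is size, factor 1 is pre-release defects.
effect = effect_of_doubling(const=-3.0, coefficients=[0.004, 0.5],
                            medians=[200.0, 1.0], index=0)
print(f"doubling size changes the risk by {effect:.0f}%")
```

With the slide's own numbers, a risk moving from 0.1 to 0.2 corresponds to `percent_change(0.1, 0.2)`, that is, a 100% increase.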
Effect of Factors on Breakages and Surprises
[Figure: effect of each factor for Breakages and Surprises, axis from -150 to 200; factors: Pre-release defects, Size, No. co-changed files, Churn of co-changed files, Latest change; bar labels: 154, 39, -85, -19, -92]
24
High Impact Defects: Summary
Can we identify them? Yes, 2-3X precision, ~70% recall
What factors best predict them? Breakages: Traditional. Surprises: Co-change and release schedule
What is the value of focusing on them? 40-50% effort savings
25
Current approaches predict the obvious
Focus on high-impact, i.e. surprises and
breakages
Pre-defects and size predict Breakages
Number and churn of co-changed files
and late changes predict surprises
Using specialized models reduces effort by 40-50%
26
Study Overview
Extract Metrics: 1. Traditional 2. Co-change 3. Time
Build Statistical Models: Logistic Regression
Analyze Effect on Quality: 1. Predictive & explanative power 2. Quantify Effect
27
Breakage Defects
Defects that break
existing functionality
Affect an established
customer base
Hurt quality image
28
Surprise Defects
Flag files with defects in
unexpected locations
Catch practitioners
off guard
Interrupt schedules
High ratio of post-release to pre-release defects
29
Predicting Breakages and Surprises
Explanative Power
[Chart: explanative power for Breakages and Surprises; bar labels: 17.8%, 13.1%]
State of the art (post-release): 17.7 – 27.9%
30
Stability of Important Factors
[Figure: stability of the important factors for Breakages across releases R1.1 to R4.1; factors: Pre-defects, Size, No. co-changed files, Churn co-changed files, Late changes; legend: Highly stable, Mainly stable, Not stable]
31
Stability of Important Factors
[Figure: stability of the important factors across releases, in two panels: Breakages and Surprises]
32
Defect Prediction Helps Focus Quality
Assurance Efforts
Extract Metrics (Size, Complexity, ..., Post-release defects) → Model (e.g. Logistic Regression) → D(f) = C + 0.1*size(f) + 0.2*complexity(f) + …
Extract Metrics (Size, Complexity) → D(f) = C + 0.1*size(f) + 0.2*complexity(f) + … → D(f) = 0.8, D(f) = 0.6
35
Factors Used to Model High Impact Defects
Traditional: Size, Pre-release defects
Co-changed files: Number, churn, size, pre-release changes, pre-release defects
Release schedule: Latest changes, Age
36
Prediction Factors
Size
Pre-release defects
Co-Changed Files: # of files, Churn, Size, Pre-release defects, Pre-release changes
Latest Change
Age
37
Evaluation of Prediction Model
Data: Training 2/3, Testing 1/3
Build Model: Input → Outcome

                 Actual Yes   Actual No
Predicted Yes        TP          FP
Predicted No         FN          TN

Precision = TP / (TP + FP)
Recall = TP / (TP + FN)
38
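The evaluation setup on this slide can be sketched as below: split the data into 2/3 for training and 1/3 for testing, then score the predictions on the test part with precision and recall. The slide does not say how the split was drawn, so the seeded random shuffle is an assumption, and the function names and toy inputs are hypothetical.

```python
import random

def split_two_thirds(rows, seed=0):
    """Shuffle the rows and return (training 2/3, testing 1/3)."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    cut = (2 * len(shuffled)) // 3
    return shuffled[:cut], shuffled[cut:]

def precision_recall(predicted, actual):
    """Precision = TP / (TP + FP); Recall = TP / (TP + FN)."""
    tp = sum(1 for p, a in zip(predicted, actual) if p and a)
    fp = sum(1 for p, a in zip(predicted, actual) if p and not a)
    fn = sum(1 for p, a in zip(predicted, actual) if not p and a)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

train, test = split_two_thirds(range(9))
precision, recall = precision_recall(predicted=[1, 1, 0, 0], actual=[1, 0, 1, 0])
print(len(train), len(test), precision, recall)
```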

 
A Large-Scale Study of the Impact of Feature Selection Techniques on Defect C...
A Large-Scale Study of the Impact of Feature Selection Techniques on Defect C...A Large-Scale Study of the Impact of Feature Selection Techniques on Defect C...
A Large-Scale Study of the Impact of Feature Selection Techniques on Defect C...
 
Studying the Dialogue Between Users and Developers of Free Apps in the Google...
Studying the Dialogue Between Users and Developers of Free Apps in the Google...Studying the Dialogue Between Users and Developers of Free Apps in the Google...
Studying the Dialogue Between Users and Developers of Free Apps in the Google...
 
What Do Programmers Know about Software Energy Consumption?
What Do Programmers Know about Software Energy Consumption?What Do Programmers Know about Software Energy Consumption?
What Do Programmers Know about Software Energy Consumption?
 
Threshold for Size and Complexity Metrics: A Case Study from the Perspective ...
Threshold for Size and Complexity Metrics: A Case Study from the Perspective ...Threshold for Size and Complexity Metrics: A Case Study from the Perspective ...
Threshold for Size and Complexity Metrics: A Case Study from the Perspective ...
 
Revisiting the Experimental Design Choices for Approaches for the Automated R...
Revisiting the Experimental Design Choices for Approaches for the Automated R...Revisiting the Experimental Design Choices for Approaches for the Automated R...
Revisiting the Experimental Design Choices for Approaches for the Automated R...
 
Measuring Program Comprehension: A Large-Scale Field Study with Professionals
Measuring Program Comprehension: A Large-Scale Field Study with ProfessionalsMeasuring Program Comprehension: A Large-Scale Field Study with Professionals
Measuring Program Comprehension: A Large-Scale Field Study with Professionals
 

Emad fse2011 final

  • 1. High-Impact Defects: A Study of Breakage and Surprise Defects Emad Shihab, Audris Mockus, Yasutaka Kamei, Bram Adams, Ahmed E. Hassan
  • 2. We know that…. Software ^ has defects How can we spend the limited resources to maximize quality? Q: Projects ^ have limited resources 2
  • 3. Defect Prediction. Input: metrics (size, pre-release defects, complexity, churn, ...) → Prediction Model → Output: risk [0..1] (e.g. 0.8, 0.1). Key predictors: size and pre-release defects 3
  • 4. Existing Approaches Aren’t Adding Value • Obvious to practitioners • Require a large amount of effort • Not all defects are equally important So….what can we do? FOCUS ON HIGH-IMPACT DEFECTS ! 4
  • 5. Impact Is In The Eye of The Beholder! Customers: Breakages. Break existing functionality; affect established customers; hurt company image. Developers: Surprises. Occur in unexpected locations; low pre-, high post-release defects; catch developers off-guard; lead to schedule interruptions 5
  • 6. Case Study Commercial telecom project 30+ years of development 7+ MLOC Mainly in C/C++ 6
  • 7. Study Overview. Part 1: Exploratory Study of Breakages and Surprises. Part 2: Prediction of Breakages and Surprises. Part 3: Understanding Prediction Models of Breakages and Surprises. Part 4: Value of Focusing on Breakages and Surprises 7
  • 8. Exploratory Study of Breakages and Surprises. Of all files: post-release defects 10%, breakages 2%, surprises 2%. Rare (2% of files) → very difficult to model. 6% overlap → should study them separately 8
  • 9. Predicting Breakages and Surprises. Part 1: Exploratory Study of Breakages and Surprises. Part 2: Prediction of Breakages and Surprises. Part 3: Understanding Prediction Models of Breakages and Surprises. Part 4: Value of Focusing on Breakages and Surprises 9
  • 10. Prediction Using Logistic Regression. Outcome (Breakage? Surprise?): $\text{Outcome} = \text{Const} + \beta_1\,\text{factor}_1 + \beta_2\,\text{factor}_2 + \beta_3\,\text{factor}_3 + \dots + \beta_n\,\text{factor}_n$. Factors from 3 dimensions 10
  • 11. Factors Used to Model Breakages and Surprises. Traditional: size, pre-release defects. Co-changed files: number, churn, size, pre-release changes, pre-release defects. Time: latest change, age 11
  • 13. Understanding Breakages and Surprises Models. Part 1: Exploratory Study of Breakages and Surprises. Part 2: Prediction of Breakages and Surprises. Part 3: Understanding Prediction Models of Breakages and Surprises. Part 4: Value of Focusing on Breakages and Surprises 13
  • 14. Determining Important Factors. Quality of fit → deviance explained. Example: Breakages R1.1. Traditional: 15.6%; Co-change: +1.5%; Time: +0.4% 14
  • 15. Important Factors for High-Impact Defects. [Figure: deviance explained (%) by Traditional, Co-change, and Time factors for releases R1.1, R2.1, R3, R4, R4.1; two panels, Breakages and Surprises; y-axis 0 to 40] 15
  • 16. Value of Focusing on Breakages and Surprises. Part 1: Exploratory Study of Breakages and Surprises. Part 2: Prediction of Breakages and Surprises. Part 3: Understanding Prediction Models of Breakages and Surprises. Part 4: Value of Focusing on Breakages and Surprises 16
  • 18. Effort Savings Using Specialized Models. [Figure: bar chart of effort savings (%) for Breakages and Surprises, measured by file and by LOC; bar labels 41, 42, 55, 50; y-axis 0 to 100]. 40-50% effort savings using specialized models 18
  • 19. Take Home Messages. 1. Breakages and surprises are different: they occur in 2% of files and are hard to predict. 2. Achieve 2-3X improvement in precision, with high recall. 3. Breakages → traditional metrics; surprises → co-change and time metrics. 4. Building specialized models saves 40-50% effort. 19 http://research.cs.queensu.ca/home/emads/data/FSE2011/hid_artifact.html
  • 20. 20
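  • 21. Quantifying Effort Savings. Two confusion matrices (rows: predicted Yes/No; columns: actual Yes/No), one for the general model and one for the specialized model, with recall set to be the same. First matrix: predicted Yes: 26, 320; predicted No: 7, 1093. Second matrix: predicted Yes: 26, 538; predicted No: 7, 875. Effort savings ~41%! 21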
  • 22. Remaining Challenges • “We tend to test features not files” – Can we predict defects for features? • “Without knowing more about the nature of the defect or recommendations for how to fix it, I am not sure how we can use it” – Predict the nature of defects – Can we provide specific remediation strategies for predicted defects? • e.g., surprises mostly relate to incorrectly implemented requirements 22
  • 23. Quantifying Effect: An Example. Inputs at their medians (median size, median pre-defects, ..., median age) → Prediction Model → 0.1. With 2 x median size → 0.2 23
  • 24. Effect of Factors on Breakages and Surprises. [Figure: bar chart of the effect of each factor (pre-release defects, size, no. of co-changed files, churn of co-changed files, latest change) for Breakages and Surprises; bar labels 154, 39, -85, -19, -92; axis -150 to 200] 24
  • 25. High Impact Defects: Summary. Can we identify them? Yes, 2-3X precision, ~70% recall. What factors best predict them? Breakages: traditional. Surprises: co-change and release schedule. What is the value of focusing on them? 40-50% effort savings 25
  • 26. Current approaches predict the obvious. Focus on high-impact defects, i.e. surprises and breakages. Pre-defects and size predict breakages. Number and churn of co-changed files and late changes predict surprises. Using specialized models reduces effort by 40-50% 26
  • 27. Study Overview. Extract metrics (1. Traditional, 2. Co-change, 3. Time) → Build statistical models (logistic regression) → Analyze effect on quality (1. Predictive & explanative power, 2. Quantify effect) 27
  • 28. Breakage Defects Defects that break existing functionality Affect an established customer base Hurt quality image 28
  • 29. Surprise Defects. Flag files with defects in unexpected locations. Catch practitioners off guard. Interrupt schedules. High ratio of post-to-pre defects 29
  • 30. Predicting Breakages and Surprises. Explanative power: Breakages 17.8%, Surprises 13.1%. State of the art (post-release): 17.7 – 27.9% 30
  • 31. Stability of Important Factors. [Figure: stability of factors (pre-defects, size, no. of co-changed files, churn of co-changed files, late changes) across releases R1.1, R2.1, R3, R3.1, R4.1 for Breakages; legend: highly stable, mainly stable, not stable] 31
  • 32. Stability of Important Factors. [Figure: stability of factors across releases R1.1, R2.1, R3, R3.1, R4.1; two panels, Breakages and Surprises] 32
  • 35. Defect Prediction Helps Focus Quality Assurance Efforts. Extract metrics (size, complexity, ..., post-release defects) → Model (e.g. logistic regression): D(f) = C + 0.1*size(f) + 0.2*complexity(f) + … Then extract metrics (size, complexity) for a file and apply D(f), e.g. D(f) = 0.8, D(f) = 0.6 35
  • 36. Factors Used to Model High Impact Defects. Traditional: size, pre-release defects. Co-changed files: number, churn, size, pre-release changes, pre-release defects. Release schedule: age, latest changes 36
  • 37. Prediction Factors. Size, pre-release defects. Co-changed files: # of files, churn, size, pre-release defects, pre-release changes. Latest change, age 37
  • 38. Evaluation of Prediction Model. Data split: training 2/3 (build model), testing 1/3 (input → outcome). Confusion matrix (rows: predicted; columns: actual): predicted Yes: TP, FP; predicted No: FN, TN. Precision = $\frac{TP}{TP + FP}$; Recall = $\frac{TP}{TP + FN}$ 38
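
The evaluation on slides 21 and 38 can be sketched in a few lines of Python. This is a minimal sketch, not the authors' tooling: the `ConfusionMatrix` class and the `false_positive_reduction` helper are hypothetical names, and the counts are copied from the two matrices on slide 21. It assumes that effort savings is measured as the relative reduction in false positives at equal recall (which reproduces the ~41% on the slide), and that the matrix with fewer false positives belongs to the specialized model.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionMatrix:
    """Counts laid out as on slide 38: rows are predictions, columns are actuals."""
    tp: int  # predicted Yes, actual Yes
    fp: int  # predicted Yes, actual No
    fn: int  # predicted No, actual Yes
    tn: int  # predicted No, actual No

    @property
    def precision(self) -> float:
        # Fraction of flagged files that really are defective.
        return self.tp / (self.tp + self.fp)

    @property
    def recall(self) -> float:
        # Fraction of defective files that were flagged.
        return self.tp / (self.tp + self.fn)


def false_positive_reduction(baseline: ConfusionMatrix,
                             candidate: ConfusionMatrix) -> float:
    """Relative drop in false positives when moving from baseline to candidate."""
    return 1 - candidate.fp / baseline.fp


# Counts from slide 21. Which matrix is "general" and which is "specialized"
# is an assumption: the one with fewer false positives is taken as specialized.
specialized = ConfusionMatrix(tp=26, fp=320, fn=7, tn=1093)
general = ConfusionMatrix(tp=26, fp=538, fn=7, tn=875)

savings = false_positive_reduction(general, specialized)

print(f"recall (both models): {general.recall:.2f}")
print(f"precision: general {general.precision:.3f}, "
      f"specialized {specialized.precision:.3f}")
print(f"effort savings: {savings:.1%}")
```

Both matrices cover the same 1446 files and have the same recall, so the comparison isolates how many files would be inspected needlessly under each model.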

Editor's Notes

  1. How is this slide related to previous one?
  2. Way too many terms that are not defined: predictive power, relative impact, effort saving. Just remove all the green stuff for now. You need to sell your work for now, not the exact technique; the exact technique needs to be presented and detailed later on. Avoid green text; it is very hard on the eyes. Also, you never get back to these questions. These questions need to be answered later in your presentation (so the presentation should be structured around them, and your conclusion should highlight these answers too). The black magic picture means that your methodology is black magic. Predictors are a way to study this thing. Your paper is not about predictors; it is about studying what makes things happen. You are using prediction models as a tool for your study. What are the best predictors → What
  3. Factors… maybe say Causes? What is this graph? How is it measured? What is your Y-axis? You need a slide before this one to explain how this graph is generated and what the intuition behind it is.
  4. Way too many terms that are not defined: predictive power, relative impact, effort saving. Just remove all the green stuff for now. You need to sell your work for now, not the exact technique; the exact technique needs to be presented and detailed later on. Avoid green text; it is very hard on the eyes. Also, you never get back to these questions. These questions need to be answered later in your presentation (so the presentation should be structured around them, and your conclusion should highlight these answers too). The black magic picture means that your methodology is black magic. Predictors are a way to study this thing. Your paper is not about predictors; it is about studying what makes things happen. You are using prediction models as a tool for your study. What are the best predictors → What
  5. I do not get how you measured effort savings. What do you mean by File or LOC? You need a slide before this to explain what you are doing. In the last slide you said you are comparing false positives. I do not see that; I just see File and LOC.
  6. Have one model box. Then have angled inputs so both inputs and outputs are visible, i.e. have two lines in and two lines out.
  7. I would use the running overview slide here and put the take-home message below each point.
  8. Put in a basic model of how defect prediction works and how people use it, so attendees understand what defect prediction is about. Not everyone knows this stuff.
  9. How is this slide related to previous one?
  10. Precision is the fraction of retrieved instances that are relevant, while recall is the fraction of relevant instances that are retrieved.
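
Notes 3 and 8 ask how the prediction model works and how the effect of a factor is measured. A rough Python sketch of the idea on slides 10 and 23: a logistic model turns a weighted sum of factors into a risk in [0, 1], and the effect of one factor is read here as the percentage change in predicted risk when that factor is doubled while the others stay at their medians. That reading of "effect" is an assumption. The intercept, the median values, and the function names are hypothetical; the 0.1 and 0.2 weights only echo the illustrative formula on slide 35 and are not fitted values from the study.

```python
import math


def predicted_risk(intercept: float,
                   coefficients: dict[str, float],
                   factors: dict[str, float]) -> float:
    """Logistic regression: map Const + sum(beta_i * factor_i) to a risk in [0, 1]."""
    linear = intercept + sum(coefficients[name] * value
                             for name, value in factors.items())
    return 1.0 / (1.0 + math.exp(-linear))


def relative_effect(intercept: float,
                    coefficients: dict[str, float],
                    medians: dict[str, float],
                    factor: str,
                    multiplier: float = 2.0) -> float:
    """Percentage change in risk when one factor is scaled and the rest stay at their medians."""
    baseline = predicted_risk(intercept, coefficients, medians)
    shifted_factors = {**medians, factor: medians[factor] * multiplier}
    shifted = predicted_risk(intercept, coefficients, shifted_factors)
    return (shifted - baseline) / baseline * 100.0


# Hypothetical model: weights echo slide 35, everything else is made up.
INTERCEPT = -4.0
COEFFICIENTS = {"size": 0.1, "complexity": 0.2}
MEDIANS = {"size": 10.0, "complexity": 4.0}

baseline_risk = predicted_risk(INTERCEPT, COEFFICIENTS, MEDIANS)
size_effect = relative_effect(INTERCEPT, COEFFICIENTS, MEDIANS, "size")

print(f"risk at medians: {baseline_risk:.2f}")
print(f"effect of doubling size: {size_effect:+.0f}%")
```

A factor with a negative coefficient yields a negative effect under the same procedure, which is how a chart such as the one on slide 24 can show bars on both sides of zero.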