PREDICTING POST-RELEASE DEFECTS USING PRE-RELEASE FIELD TESTING RESULTS

Foutse Khomh, Brian Chan, Ying Zou
Anand Sinha, Dave Dietz
FIELD TESTING CYCLE
Field testing is important to improve the quality of an application before release.
MEAN TIME BETWEEN FAILURES
Mean Time Between Failures (MTBF) is frequently used to gauge the reliability of an application.

Applications with a low MTBF are undesirable, since a low MTBF suggests a higher number of defects.
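As a rough illustration, MTBF is commonly computed as total operating time divided by the number of observed failures. The sketch below assumes per-user usage hours and failure counts collected during a field test; the function name and the sample numbers are hypothetical and are not taken from the study.

```python
def mean_time_between_failures(usage_hours, failure_counts):
    """Total usage time divided by total failures (a common MTBF definition).

    usage_hours and failure_counts are hypothetical per-user records.
    Returns None when no failure was observed, since MTBF is undefined then.
    """
    total_failures = sum(failure_counts)
    if total_failures == 0:
        return None
    return sum(usage_hours) / total_failures


# Made-up data: three testers, 120 hours of use, 4 failures in total.
hours = [40.0, 50.0, 30.0]
failures = [1, 3, 0]
print(mean_time_between_failures(hours, failures))  # 30.0
```

Under this definition, a version whose testers hit failures more often for the same amount of usage gets a lower MTBF.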
AVERAGE USAGE TIME
AVT is the average time that a user actively uses the application.

The AVT can be longer than the period of field testing.

A longer AVT indicates that an application is reliable and that a user tends to use the application longer.
PROBLEM STATEMENT
MTBF and AVT cannot capture the whole pattern of failure occurrences in the field testing of an application.

The reliability of A and B is very different.
METRICS
We propose three metrics that capture additional patterns of failure occurrences:

  TTFF: the average length of usage time before the occurrence of the first failure,

  FAR: the failure accumulation rating to gauge the spread of failures to the majority of users, and

  OFR: the overall failure ratio that captures daily rates of failures.
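One possible reading of the TTFF definition above is the mean day of first failure, weighted by the share of users who report their first failure on that day. The sketch below follows that reading only; the function name, the input layout, and the sample shares are assumptions for illustration, and the study's exact formula may differ.

```python
def average_time_to_first_failure(first_failure_share_by_day):
    """Weighted mean of the day on which users report their first failure.

    first_failure_share_by_day maps a day number to the fraction of users
    whose first failure occurred on that day (hypothetical input layout).
    """
    total_share = sum(first_failure_share_by_day.values())
    if total_share == 0:
        raise ValueError("no first failures were reported")
    weighted_days = sum(day * share
                        for day, share in first_failure_share_by_day.items())
    return weighted_days / total_share


# Made-up shares: one version fails early, the other fails late.
early = {1: 0.5, 2: 0.25, 3: 0.25}
late = {8: 0.25, 10: 0.5, 12: 0.25}
print(average_time_to_first_failure(early))  # 1.75
print(average_time_to_first_failure(late))   # 10.0
```

With this reading, a version whose users mostly meet their first failure late in the test period receives the higher value.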
AVERAGE TIME TO FIRST FAILURE (TTFF)
[Figure: % of users reporting failures (y-axis, 0 to 0.45) per day (x-axis, days 1 to 14) for VersionA and VersionB]

TTFF produces high scores for applications where the majority of users experience the first failure late.

TTFF_A = 6.11, TTFF_B = 3.56
FAILURE ACCUMULATION RATING (FAR)
[Figure: % of users reporting (y-axis, 0 to 1) by number of unique failures (x-axis, 1 to 14) for VersionA and VersionB]

The FAR metric produces high scores for applications where the majority of users report a very low number of failures.

FAR_A = 6.97, FAR_B = 4.97
OVERALL FAILURE RATING (OFR)
[Figure: % of users reporting failures (y-axis, 0 to 0.35) per day (x-axis, days 1 to 14) for VersionA and VersionB]

The OFR metric produces high scores for applications with fewer users reporting failures overall.

OFR_A = 0.93, OFR_B = 0.78
CASE STUDY
We analyze 18 versions of an enterprise software application.

Overall, 2,546 users were involved in the field testing.

The testing period lasted 30 days.
SPEARMAN CORRELATION OF THE METRICS

          TTFF    FAR     OFR     AVT     MTBF
  TTFF    1       0.09    -0.08   -0.31   -0.08
  FAR     0.09    1       0.07    0.33    -0.24
  OFR     -0.08   0.07    1       0.39    -0.54
  AVT     -0.31   0.33    0.39    1       -0.3
  MTBF    -0.08   -0.24   -0.54   -0.3    1
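For readers unfamiliar with the measure, Spearman correlation is the Pearson correlation of the ranks of two series. The sketch below is a minimal standard-library version with hypothetical function names and made-up inputs; it does not use or reproduce the study's data.

```python
def average_ranks(values):
    """Rank values from 1 to n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the value at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # sorted positions i..j share this rank
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(xs, ys):
    """Pearson correlation; assumes neither series is constant."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def spearman(xs, ys):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(xs), average_ranks(ys))


# Made-up metric values for five hypothetical versions.
metric_a = [1.0, 2.0, 3.0, 4.0, 5.0]
metric_b = [10.0, 20.0, 30.0, 40.0, 50.0]
print(spearman(metric_a, metric_b))  # 1.0 (identical ordering)
```

Values near 0, like most entries among TTFF, FAR, and OFR in the table, indicate that two metrics order the versions in largely unrelated ways.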
INDEPENDENCE AMONG PROPOSED METRICS
[Figure: values from -1 to 1 (y-axis) for TTFF, FAR, OFR, and MTBF across PC1 to PC4 (x-axis)]
PREDICTIVE POWER FOR POST-RELEASE DEFECTS
[Figure: marginal R-square (y-axis, 0 to 0.14) for the metrics TTFF, FAR, OFR, AVT, and MTBF (x-axis); series: 6 months, 1 year, 2 years]
PRECISION OF PREDICTIONS WITH ALL FIVE METRICS
[Figure: precision (%) (y-axis, 0 to 100) by number of testing days (x-axis, 5 to 30); series: 6 months, 1 year, 2 years]
CONCLUSION
TTFF, FAR, and OFR complement the traditional MTBF and AVT in predicting the number of post-release defects.

They provide faster predictions of the number of post-release defects, with good precision within just 5 days of a pre-release testing period.

It takes MTBF up to 25 days to predict the number of post-release defects.
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationSafe Software
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonetsnaman860154
 
Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Scott Keck-Warren
 
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks..."LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...Fwdays
 
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticsKotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticscarlostorres15106
 
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Patryk Bandurski
 

Recently uploaded (20)

Understanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitectureUnderstanding the Laravel MVC Architecture
Understanding the Laravel MVC Architecture
 
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
 
SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024
 
Connect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationConnect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck Presentation
 
New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024
New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024
New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024
 
Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path Mount
 
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
 
Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...
 
Benefits Of Flutter Compared To Other Frameworks
Benefits Of Flutter Compared To Other FrameworksBenefits Of Flutter Compared To Other Frameworks
Benefits Of Flutter Compared To Other Frameworks
 
Unlocking the Potential of the Cloud for IBM Power Systems
Unlocking the Potential of the Cloud for IBM Power SystemsUnlocking the Potential of the Cloud for IBM Power Systems
Unlocking the Potential of the Cloud for IBM Power Systems
 
Streamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupStreamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project Setup
 
Pigging Solutions Piggable Sweeping Elbows
Pigging Solutions Piggable Sweeping ElbowsPigging Solutions Piggable Sweeping Elbows
Pigging Solutions Piggable Sweeping Elbows
 
Vulnerability_Management_GRC_by Sohang Sengupta.pptx
Vulnerability_Management_GRC_by Sohang Sengupta.pptxVulnerability_Management_GRC_by Sohang Sengupta.pptx
Vulnerability_Management_GRC_by Sohang Sengupta.pptx
 
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationBeyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonets
 
Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024
 
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks..."LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
 
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticsKotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
 
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
 

Reliability and Quality - Predicting post-release defects using pre-release field testing results

  • 1. PREDICTING POST-RELEASE DEFECTS USING PRE-RELEASE FIELD TESTING RESULTS Foutse Khomh, Brian Chan, Ying Zou, Anand Sinha, Dave Dietz
  • 2. FIELD TESTING CYCLE Field testing is important for improving the quality of an application before release.
  • 3. MEAN TIME BETWEEN FAILURES Mean Time Between Failures (MTBF) is frequently used to gauge the reliability of an application. Applications with a low MTBF are undesirable, since they are likely to have a higher number of defects.
  • 4. AVERAGE USAGE TIME AVT is the average time that a user actively uses the application. The AVT can be longer than the period of field testing. A longer AVT indicates that an application is reliable and that users tend to use it longer.
  • 5. PROBLEM STATEMENT MTBF and AVT cannot capture the whole pattern of failure occurrences in the field testing of an application. [Figure: failure patterns of two versions, A and B] The reliability of A and B is very different.
  • 6. METRICS We propose three metrics that capture additional patterns of failure occurrences: TTFF, the average length of usage time before the occurrence of the first failure; FAR, the failure accumulation rating, which gauges the spread of failures to the majority of users; and OFR, the overall failure ratio, which captures daily rates of failures.
  • 7. AVERAGE TIME TO FIRST FAILURE (TTFF) [Figure: % of users reporting failures (0 to 0.45) per day (days 1 to 14), Version A]
  • 8. AVERAGE TIME TO FIRST FAILURE (TTFF) [Figure: % of users reporting failures (0 to 0.45) per day (days 1 to 14), Version A and Version B]
  • 9. AVERAGE TIME TO FIRST FAILURE (TTFF) [Figure: % of users reporting failures per day (days 1 to 14), Version A and Version B] TTFF produces high scores for applications where the majority of users experience the first failure late.
  • 10. AVERAGE TIME TO FIRST FAILURE (TTFF) [Figure: % of users reporting failures per day (days 1 to 14), Version A and Version B] TTFF_A = 6.11, TTFF_B = 3.56
  • 11. FAILURE ACCUMULATION RATING (FAR) [Figure: % of users reporting (0 to 1) versus number of unique failures (1 to 14), Version A]
  • 12. FAILURE ACCUMULATION RATING (FAR) [Figure: % of users reporting (0 to 1) versus number of unique failures (1 to 14), Version A and Version B]
  • 13. FAILURE ACCUMULATION RATING (FAR) [Figure: % of users reporting (0 to 1) versus number of unique failures (1 to 14)] The FAR metric produces high scores for applications where the majority of users report a very low number of failures.
  • 14. FAILURE ACCUMULATION RATING (FAR) [Figure: % of users reporting (0 to 1) versus number of unique failures (1 to 14), Version A and Version B] FAR_A = 6.97, FAR_B = 4.97
  • 15. OVERALL FAILURE RATING (OFR) [Figure: % of users reporting failures (0 to 0.35) per day (days 1 to 14), Version A]
  • 16. OVERALL FAILURE RATING (OFR) [Figure: % of users reporting failures (0 to 0.35) per day (days 1 to 14), Version A and Version B]
  • 17. OVERALL FAILURE RATING (OFR) [Figure: % of users reporting failures per day (days 1 to 14), Version A and Version B] The OFR metric produces high scores for applications with fewer users reporting failures overall.
  • 18. OVERALL FAILURE RATING (OFR) [Figure: % of users reporting failures (0 to 0.35) per day (days 1 to 14), Version A and Version B] OFR_A = 0.93, OFR_B = 0.78
  • 19. CASE STUDY We analyze 18 versions of an enterprise software application. Overall, 2,546 users were involved in the field testing. The testing period lasted 30 days.
  • 20. SPEARMAN CORRELATION OF THE METRICS

             TTFF    FAR     OFR     AVT     MTBF
      TTFF    1       0.09   -0.08   -0.31   -0.08
      FAR     0.09    1       0.07    0.33   -0.24
      OFR    -0.08    0.07    1       0.39   -0.54
      AVT    -0.31    0.33    0.39    1      -0.3
      MTBF   -0.08   -0.24   -0.54   -0.3     1

  • 21. INDEPENDENCY AMONG PROPOSED METRICS [Figure: values of TTFF, FAR, OFR, and MTBF (-1 to 1) across components PC1 to PC4]
  • 22. PREDICTIVE POWER FOR POST-RELEASE DEFECTS [Figure: marginal R-square (0 to 0.14) of each metric (TTFF, FAR, OFR, AVT, MTBF) at 6 months, 1 year, and 2 years]
  • 23. PRECISION OF PREDICTIONS WITH ALL FIVE METRICS [Figure: precision (%) versus number of testing days (5 to 30) at 6 months, 1 year, and 2 years]
  • 24. CONCLUSION TTFF, FAR, and OFR complement the traditional MTBF and AVT in predicting the number of post-release defects. They provide faster predictions of the number of post-release defects, with good precision within just 5 days of a pre-release testing period. MTBF takes up to 25 days to predict the number of post-release defects.
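
The metric definitions on slides 3 and 6 can be made concrete with a minimal Python sketch. It assumes the textbook reading of MTBF (total usage time divided by the number of failures) and a literal reading of the TTFF definition (the average day on which users report their first failure). The slides do not give the authors' exact formulas, so the function names and the input layout here are hypothetical, and the results are not expected to reproduce the scores shown on slide 10.

```python
from typing import Sequence


def mtbf(total_usage_hours: float, failure_count: int) -> float:
    """Mean time between failures: total usage time divided by failure count."""
    if failure_count <= 0:
        raise ValueError("MTBF is undefined without at least one failure")
    return total_usage_hours / failure_count


def mean_day_of_first_failure(first_failure_share: Sequence[float]) -> float:
    """Average day of first failure, weighted by the share of users per day.

    first_failure_share[i] is the fraction of users whose first failure
    was reported on day i + 1 (an assumed input layout, not the authors').
    """
    total = sum(first_failure_share)
    if total == 0:
        raise ValueError("no user reported a failure")
    weighted = sum(
        day * share for day, share in enumerate(first_failure_share, start=1)
    )
    return weighted / total


if __name__ == "__main__":
    # Hypothetical data: one version fails early, the other fails late.
    early = [0.4, 0.1, 0.0, 0.0]
    late = [0.0, 0.0, 0.1, 0.4]
    print(mtbf(100.0, 4))
    print(mean_day_of_first_failure(early))
    print(mean_day_of_first_failure(late))
```

Under this reading, a version whose users mostly hit their first failure late gets the higher value, which matches the direction described on slide 9.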
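
Slide 20 reports Spearman correlations between the five metrics. As a rough illustration of how one such coefficient is obtained, the sketch below ranks two series (averaging ranks for ties) and takes the Pearson correlation of the ranks. The sample values are hypothetical and are not taken from the case study.

```python
import math
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """Rank values from 1 to n; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values equal to the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # positions i..j are 0-based
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length series."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / math.sqrt(var_x * var_y)


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman rank correlation: Pearson correlation of the ranks."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two series of equal length, at least 2 points")
    return pearson(average_ranks(x), average_ranks(y))


if __name__ == "__main__":
    # Hypothetical metric values for five versions of an application.
    metric_a = [0.91, 0.78, 0.85, 0.60, 0.95]
    metric_b = [120.0, 300.0, 150.0, 410.0, 90.0]
    print(spearman(metric_a, metric_b))
```

Because only ranks enter the calculation, the coefficient reflects monotonic association rather than linear fit, which suits metrics measured on different scales such as those in the table.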