Paper : Predicting Post-release Defects Using Pre-release Field Testing Results
Authors : Foutse Khomh, Brian Chan, Ying Zou, Anand Sinha and Dave Dietz
Session: Research Track Session 9: Reliability and Quality
2. FIELD TESTING CYCLE
Field testing is important for improving the quality of an application before release.
3. MEAN TIME BETWEEN FAILURES
Mean Time Between Failures (MTBF) is frequently used to gauge the reliability of an application.
Applications with a low MTBF are undesirable, since they tend to have a higher number of defects.
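As a minimal sketch of how MTBF is commonly computed (total observed usage time divided by the number of failures), the snippet below uses a hypothetical function name and made-up numbers; the slides do not show the authors' exact calculation.

```python
def mean_time_between_failures(total_usage_hours: float, failure_count: int) -> float:
    """Textbook MTBF: total observed usage time divided by the failure count.

    Assumption: usage is measured in hours; the slides do not state a unit.
    """
    if failure_count == 0:
        # No failure observed, so the interval between failures is unbounded.
        return float("inf")
    return total_usage_hours / failure_count


# Illustrative numbers only, not taken from the case study.
print(mean_time_between_failures(1200.0, 8))  # 150.0
```

A lower value means failures arrive more often, which matches the slide's point that a low MTBF is undesirable.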
4. AVERAGE USAGE TIME
Average Usage Time (AVT) is the average time that a user actively uses the application.
The AVT can be longer than the period of field testing.
A longer AVT indicates that an application is reliable and that users tend to use it longer.
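A minimal sketch of the AVT definition above, assuming each tester's active usage time is already available in hours. The function name, the unit, and the sample values are illustrative assumptions, not details from the slides.

```python
from statistics import mean
from typing import Mapping


def average_usage_time(active_hours_by_user: Mapping[str, float]) -> float:
    """Average active usage time per user (hypothetical helper)."""
    if not active_hours_by_user:
        raise ValueError("no users in the field test")
    return mean(active_hours_by_user.values())


# Illustrative numbers only.
print(average_usage_time({"u1": 12.0, "u2": 30.0, "u3": 18.0}))  # 20.0
```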
5. PROBLEM STATEMENT
MTBF and AVT cannot capture the whole pattern of failure occurrences in the field testing of an application.
The reliability of versions A and B is very different.
6. METRICS
We propose three metrics that capture additional patterns of failure occurrences:
TTFF: the average length of usage time before the occurrence of the first failure;
FAR: the failure accumulation rating, which gauges the spread of failures to the majority of users;
OFR: the overall failure ratio, which captures daily rates of failures.
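The slides do not give a formula for OFR. One plausible reading, shown here purely as an assumption, is to compute a failure rate for each testing day (failures reported divided by active users) and average those daily rates. All names and numbers below are hypothetical.

```python
from typing import List, Sequence


def daily_failure_rates(failures_per_day: Sequence[int],
                        active_users_per_day: Sequence[int]) -> List[float]:
    """Failures reported per active user, one value per testing day."""
    if len(failures_per_day) != len(active_users_per_day):
        raise ValueError("both sequences must cover the same days")
    # A day with no active users contributes a rate of 0.0 (assumption).
    return [f / u if u else 0.0
            for f, u in zip(failures_per_day, active_users_per_day)]


def overall_failure_ratio(failures_per_day: Sequence[int],
                          active_users_per_day: Sequence[int]) -> float:
    """Mean of the daily failure rates (an assumed definition, not the paper's)."""
    rates = daily_failure_rates(failures_per_day, active_users_per_day)
    if not rates:
        raise ValueError("no testing days given")
    return sum(rates) / len(rates)


# Illustrative numbers only: three testing days.
ofr = overall_failure_ratio([4, 2, 0], [20, 20, 10])
print(round(ofr, 3))
```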
7. AVERAGE TIME TO FIRST FAILURE (TTFF)
[Figure: percentage of users reporting failures on each day of testing (days 1 to 14), Version A.]
8. AVERAGE TIME TO FIRST FAILURE (TTFF)
[Figure: percentage of users reporting failures on each day of testing (days 1 to 14), Version A and Version B.]
9. AVERAGE TIME TO FIRST FAILURE (TTFF)
[Figure: percentage of users reporting failures on each day of testing (days 1 to 14), Version A and Version B.]
TTFF produces high scores for applications where the majority of users experience the first failure late.
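To illustrate why a late first failure yields a high score, the sketch below treats TTFF as the average day of first failure, weighted by the share of users whose first failure falls on that day. This weighting is an assumption; the slides do not spell out the formula, and the sample distributions are invented.

```python
from typing import Sequence


def average_time_to_first_failure(first_failure_share_by_day: Sequence[float]) -> float:
    """Weighted mean day of first failure.

    first_failure_share_by_day[i] is the share of users whose first failure
    occurred on day i + 1 (assumed input format).
    """
    total = sum(first_failure_share_by_day)
    if total <= 0:
        raise ValueError("no first failures recorded")
    weighted = sum(day * share
                   for day, share in enumerate(first_failure_share_by_day, start=1))
    return weighted / total


# Illustrative distributions over five days, not data from the study.
fails_early = [0.5, 0.3, 0.1, 0.1, 0.0]
fails_late = [0.0, 0.1, 0.1, 0.3, 0.5]
print(average_time_to_first_failure(fails_early) < average_time_to_first_failure(fails_late))
```

Under this reading, the version whose users mostly hit their first failure late receives the higher score.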
10. AVERAGE TIME TO FIRST FAILURE (TTFF)
[Figure: percentage of users reporting failures on each day of testing (days 1 to 14), Version A and Version B.]
TTFF of Version A = 6.11
TTFF of Version B = 3.56
13. FAILURE ACCUMULATION RATING (FAR)
[Figure: percentage of users reporting failures versus number of unique failures (1 to 13).]
The FAR metric produces high scores for applications where the majority of users report a very low number of failures.
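The slides do not define how the FAR score itself is calculated, so the sketch below only derives the kind of distribution the FAR figure appears to plot: the share of users who reported a given number of unique failures. The input format and all names are assumptions.

```python
from collections import Counter
from typing import Dict, Mapping, Sequence


def users_by_unique_failure_count(
        failures_by_user: Mapping[str, Sequence[str]]) -> Dict[int, float]:
    """Map k to the share of users who reported exactly k unique failures."""
    if not failures_by_user:
        raise ValueError("no users in the field test")
    # Repeated reports of the same failure ID count once per user.
    counts = Counter(len(set(ids)) for ids in failures_by_user.values())
    n_users = len(failures_by_user)
    return {k: counts[k] / n_users for k in sorted(counts)}


# Illustrative reports only: failure IDs are hypothetical.
reports = {
    "u1": ["F1"],
    "u2": ["F1", "F1"],
    "u3": ["F1", "F2", "F3"],
    "u4": ["F2"],
}
print(users_by_unique_failure_count(reports))  # {1: 0.75, 3: 0.25}
```

A distribution concentrated at small counts corresponds to the case the slide describes, where most users report very few failures.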
19. CASE STUDY
We analyze 18 versions of an enterprise software application.
Overall, 2,546 users were involved in the field testing.
The testing period lasted 30 days.
22. PREDICTIVE POWER FOR POST-RELEASE DEFECTS
[Figure: marginal R-square (0 to 0.14) of each metric (TTFF, FAR, OFR, AVT, MTBF) for post-release defects at 6 months, 1 year, and 2 years.]
23. PRECISION OF PREDICTIONS WITH ALL FIVE METRICS
[Figure: precision (%) versus number of testing days (5 to 30) for post-release defects at 6 months, 1 year, and 2 years.]
24. CONCLUSION
TTFF, FAR, and OFR complement the traditional MTBF and AVT in predicting the number of post-release defects.
They provide faster predictions of the number of post-release defects, with good precision within just 5 days of a pre-release testing period.
It takes MTBF up to 25 days to predict the number of post-release defects.