An Empirical Study of Reliability Growth of Open versus Closed Source Software through Software Reliability Growth Models
Najeeb Ullah, Maurizio Morisio
POLITECNICO DI TORINO
3. Motivation
• Open Source Software (OSS) has attracted
significant attention in recent years
• Use of OSS is increasing, and many commercial
products use OSS in various fields
• Empirical research by Forrester highlighted that many
European software companies have a clear strategy for
OSS adoption, but fears and questions about adoption remain
• The quality, and hence the reliability, of OSS is under focus
4. Background
• In the literature, software reliability growth models
(SRGMs) have been widely used for reliability
characterization of closed source software (CSS)
• Many studies have examined the applicability of SRGMs
to reliability characterization of OSS, with unclear
results
• This work studies the reliability growth of OSS versus
CSS through SRGMs
5. Software Reliability Growth Models (SRGMs)
• Reliability:
– Probability of failure-free operation of software for a specified
period of time in a specified environment (IEEE Glossary)
• Reliability Model:
– Mathematical expression that specifies the general form of
the software failure process as a function of factors such as
fault introduction, fault removal, and the operational
environment. (IEEE Glossary)
– Divided into many types
• SRGMs are classified as concave or S-shaped
[Figure: example curves of a concave-shaped model and an S-shaped model]
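The difference between the two classes can be sketched numerically. The snippet below is only an illustration: it uses the commonly cited Goel-Okumoto form as the concave example and the delayed S-shaped form as the S-shaped example, with hypothetical parameter values (a = 100, b = 0.3) that do not come from the study. A concave model expects the most failures in the first interval, while an S-shaped model peaks later.

```python
import math

def goel_okumoto(t, a, b):
    """Concave example: expected cumulative failures a * (1 - exp(-b*t))."""
    return a * (1.0 - math.exp(-b * t))

def delayed_s_shaped(t, a, b):
    """S-shaped example: expected cumulative failures a * (1 - (1 + b*t) * exp(-b*t))."""
    return a * (1.0 - (1.0 + b * t) * math.exp(-b * t))

def increments(model, a, b, horizon):
    """Failures expected in each unit interval [t, t+1) for t = 0 .. horizon-1."""
    return [model(t + 1, a, b) - model(t, a, b) for t in range(horizon)]

# Hypothetical parameters, chosen for illustration only.
A, B, HORIZON = 100.0, 0.3, 20

concave_inc = increments(goel_okumoto, A, B, HORIZON)
s_shaped_inc = increments(delayed_s_shaped, A, B, HORIZON)

# Index of the interval with the largest expected number of failures.
concave_peak = concave_inc.index(max(concave_inc))
s_shaped_peak = s_shaped_inc.index(max(s_shaped_inc))

print("concave: busiest interval starts at t =", concave_peak)
print("S-shaped: busiest interval starts at t =", s_shaped_peak)
```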
6. Goals And Research Questions
• Study and compare reliability growth of OSS with
CSS projects
• Determine a good model in terms of fitting and
prediction for OSS and CSS projects
– RQ1: Do SRGMs fit the failure occurrence
pattern of OSS and CSS projects well?
– RQ2: Which SRGMs are good predictors for OSS
and CSS projects?
7. Methodology and Model Evaluation Metrics (RQ1)
• Nonlinear regression is used for model fitting.
• Goodness of fit (i.e. R²):
– Indicates how successful the fit is.
– R² takes a value between 0 and 1, inclusive.
– The closer the R² value is to one, the better the fit.
– Models are ranked for each data set by R² value.
– Selected threshold (i.e. categorization): R² > 0.90
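The fitting and goodness-of-fit step can be sketched as follows. This is a minimal illustration, not the authors' tooling: it fits only the Goel-Okumoto form, uses a simple golden-section search on b (assuming the error is unimodal in b over the search range) with a solved in closed form, and runs on synthetic cumulative counts invented for the example. The function names and search bounds are hypothetical.

```python
import math

def goel_okumoto(t, a, b):
    """Expected cumulative failures at time t."""
    return a * (1.0 - math.exp(-b * t))

def best_a_for_b(times, counts, b):
    # For a fixed b the model is linear in a, so least squares has a closed form.
    g = [1.0 - math.exp(-b * t) for t in times]
    return sum(y * gi for y, gi in zip(counts, g)) / sum(gi * gi for gi in g)

def sse(times, counts, a, b):
    """Sum of squared errors between observed and fitted counts."""
    return sum((y - goel_okumoto(t, a, b)) ** 2 for t, y in zip(times, counts))

def fit_goel_okumoto(times, counts, b_lo=1e-4, b_hi=5.0, iters=200):
    """Golden-section search on b (assumed unimodal); a is profiled out."""
    def cost(b):
        return sse(times, counts, best_a_for_b(times, counts, b), b)
    phi = (math.sqrt(5) - 1) / 2
    lo, hi = b_lo, b_hi
    for _ in range(iters):
        x1 = hi - phi * (hi - lo)
        x2 = lo + phi * (hi - lo)
        if cost(x1) < cost(x2):
            hi = x2   # minimum lies in [lo, x2]
        else:
            lo = x1   # minimum lies in [x1, hi]
    b = (lo + hi) / 2
    return best_a_for_b(times, counts, b), b

def r_squared(actual, fitted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(actual) / len(actual)
    ss_res = sum((y - f) ** 2 for y, f in zip(actual, fitted))
    ss_tot = sum((y - mean) ** 2 for y in actual)
    return 1.0 - ss_res / ss_tot

# Synthetic cumulative failure counts (illustration only, not study data).
times = list(range(1, 13))
counts = [11, 20, 26, 32, 36, 39, 41, 43, 45, 46, 47, 48]

a_hat, b_hat = fit_goel_okumoto(times, counts)
r2 = r_squared(counts, [goel_okumoto(t, a_hat, b_hat) for t in times])
print("a =", round(a_hat, 2), "b =", round(b_hat, 3), "R2 =", round(r2, 4))
print("passes the R2 > 0.90 threshold:", r2 > 0.90)
```

In practice a general-purpose nonlinear least-squares routine would be used so that all eight models can be fitted the same way; the profiled search here just keeps the sketch dependency-free.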
8. Methodology and Model Evaluation Metrics (RQ2)
• Models are fitted to two-thirds of the data points of each data
set and used to predict the remaining third.
• Theil’s Statistic (i.e. TS: accuracy)
– Average deviation percentage over all data points.
– The closer TS is to zero, the better the prediction accuracy of the
model.
– Models are ranked for each data set by TS value.
– Selected threshold (i.e. categorization): TS < 10%
• Predictive Relative Error (i.e. PRE: correctness)
– Ratio between the prediction error (actual versus predicted number
of defects) and the predicted number of defects
– Models are ranked for each data set by PRE.
– Selected threshold (i.e. categorization): within 10% of the total
number of actual defects
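The two prediction metrics can be sketched as below. The slides do not give the formulas, so the definitions here are assumptions taken from common usage in the SRGM literature: TS as the root of summed squared deviations over summed squared actual values, in percent, and PRE as (predicted minus actual) over predicted for the final defect count. The split takes the first two-thirds of the series in chronological order, which is also an assumption, and the predicted values are hypothetical.

```python
import math

def split_two_thirds(series):
    """Split a series into a fitting part (first 2/3) and a holdout part (last 1/3)."""
    cut = (2 * len(series)) // 3
    return series[:cut], series[cut:]

def theil_statistic(actual, predicted):
    """Assumed form: TS (%) = 100 * sqrt(sum((pred - actual)^2) / sum(actual^2))."""
    num = sum((p - a) ** 2 for a, p in zip(actual, predicted))
    den = sum(a ** 2 for a in actual)
    return 100.0 * math.sqrt(num / den)

def predictive_relative_error(actual_total, predicted_total):
    """Assumed form: PRE = (predicted - actual) / predicted."""
    return (predicted_total - actual_total) / predicted_total

# Synthetic cumulative defect counts (illustration only, not study data).
counts = [11, 20, 26, 32, 36, 39, 41, 43, 45, 46, 47, 48]
fit_part, holdout = split_two_thirds(counts)

# Hypothetical model predictions for the holdout points.
predicted = [44.5, 45.8, 46.9, 47.7]

ts = theil_statistic(holdout, predicted)
pre = predictive_relative_error(holdout[-1], predicted[-1])

print("TS (%):", round(ts, 2), "-> accurate:", ts < 10.0)
print("PRE:", round(pre, 4), "-> correct:", abs(pre) < 0.10)
```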
9. Selected SRGM Models
• Eight models were selected for this study because of
their prevalence among software reliability models.

Model                   Type
Musa Okumoto            Concave
Inflection S-Shaped     S-Shaped
Goel Okumoto            Concave
Delayed S-Shaped        S-Shaped
Logistic                S-Shaped
Yamada Exponential      Concave
Gompertz                S-Shaped
Generalized Goel        Concave
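For reference, the mean value functions m(t) (expected cumulative failures by time t) usually associated with these model names are listed below. The slides do not state which parameterization was used, so these commonly cited forms and their parameter symbols (a, b, c, k, r, alpha, beta) should be read as assumptions rather than the study's exact definitions.

```latex
\begin{align*}
\text{Musa Okumoto:}        \quad & m(t) = a \ln(1 + b t) \\
\text{Inflection S-Shaped:} \quad & m(t) = \frac{a\,(1 - e^{-b t})}{1 + \beta e^{-b t}} \\
\text{Goel Okumoto:}        \quad & m(t) = a\,(1 - e^{-b t}) \\
\text{Delayed S-Shaped:}    \quad & m(t) = a\,\bigl(1 - (1 + b t)\, e^{-b t}\bigr) \\
\text{Logistic:}            \quad & m(t) = \frac{a}{1 + k e^{-b t}} \\
\text{Yamada Exponential:}  \quad & m(t) = a\,\bigl(1 - e^{-r \alpha (1 - e^{-\beta t})}\bigr) \\
\text{Gompertz:}            \quad & m(t) = a\, k^{b^{t}}, \quad 0 < k < 1,\ 0 < b < 1 \\
\text{Generalized Goel:}    \quad & m(t) = a\,\bigl(1 - e^{-b t^{c}}\bigr)
\end{align*}
```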
10. Data Collection (Industrial datasets)
• Papers were searched on IEEE Xplore, the ACM Digital
Library, and in four journals: Information and Software
Technology, the Journal of Systems and Software,
IEEE Software, and IEEE Transactions on Reliability.
• Search strings used:
– Software failure rate, Software failure intensity, Software
failure data sets, Software failure rate and
reliability, Software failure intensity and reliability.
• Of 2000 papers, 14 were relevant; these contained failure
data sets on 28 projects (22 CSS, 6 OSS).
11. Data Collection (OSS datasets)
Project                 Number of releases
C++ Standard Library    8
JUDDI                   4
GNOME                   3
APACHE                  3
• Defect data were downloaded from the Apache repositories
using JIRA
13. Model Ranking Results (RQ1) on R²

               CSS data sets (DS)                OSS data sets (DS)
Model          Fitted DS   DS with highest R²    Fitted DS   DS with highest R²
Musa           22/22       5/22                  18/18       0/18
Inflection     22/22       4/22                  18/18       5/18
Goel           21/22       0/22                  18/18       0/18
Delayed        20/22       2/22                  16/18       1/18
Logistic       20/22       3/22                  15/18       1/18
Yamada         18/22       0/22                  18/18       0/18
Gompertz       14/22       4/22                  17/18       8/18
Generalized    10/22       4/22                  16/18       3/18
14. Model Prediction Results (RQ2) on TS and PRE
[Figure: box plots of prediction correctness (PRE) values]
[Figure: box plots of prediction accuracy (TS) values]
15. Model Ranking Results (RQ2)

               CSS data sets (DS)                        OSS data sets (DS)
Model          DS with lowest TS   DS with lowest PRE    DS with lowest TS   DS with lowest PRE
Musa           10/22               10/22                 4/18                2/18
Inflection     2/22                4/22                  3/18                3/18
Goel           3/22                3/22                  1/18                3/18
Delayed        2/22                2/22                  4/18                3/18
Logistic       4/22                2/22                  5/18                3/18
Yamada         2/22                3/22                  1/18                3/18
Gompertz       3/22                2/22                  6/18                6/18
Generalized    1/22                1/22                  3/18                2/18
16. Threats to Validity
• No hypothesis test is used in the methodology for
answering the RQs.
• The choice of thresholds is not grounded in the
literature.
• A ranking of the models is provided to identify the best
model for each type of data set and metric.
17. Conclusions
• Results
– All selected SRGMs fit the defect data of the OSS projects in
the same manner as those of the CSS projects. OSS reliability
grows similarly to that of CSS.
– Musa Okumoto and Inflection S-Shaped perform well for
CSS, while for OSS Inflection S-Shaped and Gompertz are
good performers.
– SRGMs can be used for reliability characterization of OSS
projects.
• Observations
– The best models for OSS differ from the best models for
industrial (CSS) data sets.
18. Questions?
Najeeb Ullah, Maurizio Morisio
name.surname@polito.it