Revised IEEE 1633 Recommended Practices for Software Reliability
Advantages of IEEE 1633 Recommended Practices for Software Reliability
Chair: Ann Marie Neufelder, SoftRel, LLC
Vice Chair: Martha Wetherholt, NASA
Secretary: Debra Haehn, Philips
IEEE Standards Association Chair: Louis Gullo, Raytheon Missile Systems
Software reliability timeline
Figure: timeline spanning the 1960s, 1970s, 1980s and 1990s. Recoverable entries:
• Many software reliability growth estimation models developed. Limitation: they cannot be used early.
• First predictive model developed by USAF Rome Air Development Center with SAIC and Research Triangle Park. Limitations: the model was only useful for aircraft and was not updated afterward.
• … on RL model. Can be used on any system.
• Larry Putnam/QSM quantifies the bell curve used for both scheduling and staffing.
Introduction and motivation
• Software reliability engineering
• Has existed for over 50 years.
• Fundamental prerequisite for virtually all modern systems
• Rich body of software reliability research generated over last
several decades, but…
• Practical guidance on how to apply these models has lagged
• Diverse set of stakeholders requires pragmatic guidance and tools
to apply software reliability models to assess real software or
firmware projects during each stage of software development
• Reliability engineers may lack software development experience
• Software engineers may be unfamiliar with methods to predict software
reliability
• Both may have challenges acquiring data needed for the analyses
• Newly revised IEEE 1633 Recommended Practice for Software
Reliability provides actionable step-by-step procedures for employing
software reliability models and analyses
• During any phase of software or firmware development
• With any software lifecycle model for any industry or application
• Easy to use models for predicting software reliability early in
development and during test and operation.
• Methods to analyze software failure modes and include software in
a system fault tree analysis.
• Ability to assess the reliability of COTS, FOSS, and contractor or
subcontractor delivered software.
• This presentation will cover the key features of the IEEE 1633
Recommended Practices for software reliability.
• Current status of this document - Approved by IEEE Standards
Association Ballot of May 24, 2016
Acknowledgement of IEEE 1633 Working Group
• Lance Fiondella
• Peter Lakey
• Robert Binder
• Michael Siok
• Ming Li
• Ying Shi
• Nematollah Bidokhti
• Thierry Wandji
• Michael Grottke
• Andy Long
• George Stark
• Allen Nikora
• Bakul Banerjee
• Debra Greenhalgh Lubas
• Mark Sims
• Rajesh Murthy
• Willie Fitzpatrick
• Mark Ofori-kyei
• Sonya Davis
• Burdette Joyner
• Marty Shooman
• Andrew Mack
• Loren Garroway
• Kevin Mattos
• Kevin Frye
• Claire Jones
• Robert Raygan
• Mary Ann DeCicco
• Shane Smith
• Franklin Marotta
• David Bernreuther
• Martin Wayne
• Nathan Herbert
• Richard E Gibbs III
• Harry White
• Jacob Axman
• Ahlia T. Kitwana
• Yuan Wei
• Darwin Heiser
• Brian McQuillan
• Kishor Trivedi
IEEE 1633 Working Group
• Defense/aerospace contractors – 11 members
• Commercial engineering – 9 members
• US Army – 6 members
• US Navy – 5 members
• Academia – 4 members
• DoD – 3 members
• NASA – 3 members
• Medical equipment – 2 members
• Software Engineering Institute – 1 member
• Nuclear Regulatory Commission – 1 member
Table of contents
1, 2, 3 Overview, definitions and acronyms
4 Tailoring guidance
5 “Actionable” Procedures with Checklists and Examples
5.1 Planning for software reliability
5.2 Develop a failure modes model
5.3 Apply SRE during development
5.4 Apply SRE during testing
5.5 Support Release decision
5.6 Apply SRE in operation
Annex A Supporting information on the software FMEA
Annex B Detailed procedures on predicting size and supporting information for the predictive models
Annex C Supporting information for the software reliability growth models
Annex D Estimated cost of SRE
Annex E SRE tools
Annex F Examples
Section 4 SRE Tailoring
• The document is geared towards 4 different roles, any
industry and any type of software.
• Hence, section 4 provides guidance for tailoring the document:
• By role – recommended sections if you are a reliability engineer,
software QA, software manager or acquisitions.
• By life cycle - How to apply the document if you have an
incremental or agile life cycle model.
• By criticality – Some SR tasks are essential, some are typical and
some are project specific.
Section 5.1 Planning for software reliability
• An often overlooked but essential step in SRE
What are the Line Replaceable Units? (Applications,
executables, DLLs, COTS, FOSS, firmware, glueware)
Which are applicable for SRE?
What is the operational profile?
Define failures and
No single definition fits all. Failures need to be
defined relative to the system under development.
Perform a reliability
Determine a simple Red/Yellow/Green SRE risk. Use
that to determine the degree of SRE.
Assess the data
The available data and SRE tools will determine which
tasks are feasible
Review the available
Finalize the SRE plan
The Software Reliability Program Plan can be part of
the Software Development Plan or the Reliability Plan,
or a standalone document.
Section 5.2 Develop Failure Modes Analysis
• This section focuses on the 3 analyses that identify potential
failure modes.
• Understanding the failure modes is essential for development,
testing, and decision making. Real examples are included.
• Perform Defect Root Cause Analysis (RCA)
• Perform Software Failure Modes Effects Analysis (SFMEA)
• Prepare the SFMEA
• Analyze Failure Modes and Root Causes
• Identify consequences
• Generate a Critical Items List (CIL)
• Understand the differences between a hardware FMEA
and a software FMEA
• Include Software in the System Fault Tree Analysis
SFMEA and SFTA Viewpoints
These are complementary methods
Section 5.3 Apply SRE during development
1. Determine/obtain system
reliability objectives in terms of
reliability, availability, MTBF
Today’s systems are software intensive, which makes
it difficult to establish a reasonable system
objective. This document provides 3 approaches.
2. Perform software reliability
assessment and prediction
See upcoming slides
3. Sanity check the early predictions
One reason why SRE prediction models haven’t been
used is that reliability engineers are unsure of the
results. The document has typical reliability values
based on the size of the software.
4. Merge the predictions into
the overall system prediction
Once the predictions are done, the reliability
engineer will want to integrate them into the overall
system RBD or fault tree. The document has
several methods for doing so.
5. Determine the total software
reliability needed to reach the
objective
Since software engineering is often managed
centrally, the software manager will want to know
what the software components as an aggregate
need to achieve.
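Steps 4 and 5 can be illustrated with the simplest possible case: a series reliability block diagram in which hardware and software both have constant failure rates, so the rates add. This is a minimal sketch under that assumption and is not necessarily one of the methods in the document; the function names and all numbers are hypothetical.

```python
def series_failure_rate(rates_per_hour):
    """Failure rate of a series system with constant block failure rates."""
    return sum(rates_per_hour)


def required_software_rate(system_objective_rate, hardware_rate):
    """Failure rate left over for all software once hardware is accounted for."""
    remaining = system_objective_rate - hardware_rate
    if remaining <= 0:
        raise ValueError("hardware alone already exceeds the system objective")
    return remaining


# Hypothetical inputs, not taken from the document.
hardware_rate = 1.0 / 2000.0   # hardware MTBF of 2000 hours
software_rate = 1.0 / 500.0    # predicted software MTBF of 500 hours

# Step 4: merge the software prediction into the system prediction.
system_rate = series_failure_rate([hardware_rate, software_rate])
system_mtbf = 1.0 / system_rate            # about 400 hours

# Step 5: work backward from a system objective to a software objective.
objective_rate = 1.0 / 400.0               # system MTBF objective of 400 hours
software_objective_mtbf = 1.0 / required_software_rate(objective_rate, hardware_rate)
```

With redundancy or non-constant failure rates the arithmetic is different, so the series form should be read only as a starting point.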
Section 5.3 Apply SRE during development
6. Plan the reliability
growth needed to reach the
objective
Once the software objective is established, plans
can and should be made to ensure that there is
sufficient reliability growth in the schedule.
Reliability growth can only happen if the software
is operated in a real environment with no new
7. Perform a sensitivity
analysis
Quite often there isn’t sufficient schedule for
extended reliability growth so a sensitivity
analysis is needed to determine how to cut the
defects to reach the objective.
8. Allocate the required
objective to each software
component
If the software components are managed by
different organizations or vendors, the software
level objective will need to be further allocated.
9. Employ software
metrics
There are other metrics that can support decision
making, testing and delivery that also support
more reliable software.
Section 5.3.2 Perform software reliability assessment and prediction
• Since the 1970s, most software reliability models have been usable only
during test or operation, when it’s too late to do planning or tradeoffs.
• The models presented in this section can be used before the code is even
written. The predictions are then merged with the hardware reliability
predictions and compared to the system reliability objective.
If you can predict this fault profile you can
predict all of the other reliability figures of merit.
The predictive models predict the fault profile first, and then the
failure rate, MTBF, reliability and availability are predicted from that
profile.
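To make the chain from fault profile to figures of merit concrete, here is a minimal sketch for one period of a predicted profile. It assumes that each predicted fault shows up as one failure and that the failure rate is constant within the period (exponential model). These assumptions, the function name and the sample numbers are illustrative and are not taken from the document.

```python
import math


def figures_of_merit(faults_in_period, operating_hours, mission_hours, mttr_hours):
    """Derive failure rate, MTBF, reliability and availability for one period.

    Assumes one failure per predicted fault and a constant failure rate
    within the period; both are simplifications for illustration.
    """
    if faults_in_period <= 0 or operating_hours <= 0:
        raise ValueError("need a positive fault count and positive operating hours")
    failure_rate = faults_in_period / operating_hours        # failures per hour
    mtbf = 1.0 / failure_rate                                # hours
    reliability = math.exp(-failure_rate * mission_hours)    # P(no failure in mission)
    availability = mtbf / (mtbf + mttr_hours)                # steady-state availability
    return {
        "failure_rate_per_hour": failure_rate,
        "mtbf_hours": mtbf,
        "reliability": reliability,
        "availability": availability,
    }


# Hypothetical profile: 12 faults predicted to surface in the first 720 operating hours,
# evaluated for an 8-hour mission with a 1-hour mean time to restore.
first_month = figures_of_merit(12, 720.0, mission_hours=8.0, mttr_hours=1.0)
```

Repeating the call for each period of the profile shows how the figures of merit improve as the predicted fault count per period declines.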
Section 5.4 Apply SRE during testing
1. Develop a reliability test
Software reliability growth models are useless
unless the software is being exercised. The first
step is to make sure that it is.
2. Measure test coverage
The models can’t measure what they don’t know.
The higher the test coverage, the higher the
confidence in the models.
3. Increase test effectiveness
via fault insertion
Many software reliability issues are due to the
software performing an unexpected function as
opposed to it failing to perform a required function.
This increases the confidence in the reliability.
4. Collect failure and defect
data
All of the models require the testing/operational
hours and either the time of each failure
observation or the total number of failures in a day.
5. Select and use reliability
growth models
Before you use any model, you need to plot the
failure data and see which models are applicable.
The document provides complete guidance on how
to do this.
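Step 4 describes two interchangeable input forms: the time of each failure, or failure counts per period. The sketch below converts the first form into the second. It assumes failure times are recorded in cumulative usage hours and uses 24 usage hours as the period; the function name and sample data are hypothetical.

```python
import math


def counts_per_interval(failure_hours, total_hours, interval_hours=24.0):
    """Group failure times (cumulative usage hours) into counts per interval."""
    n_intervals = math.ceil(total_hours / interval_hours)
    counts = [0] * n_intervals
    for t in failure_hours:
        if not 0 <= t <= total_hours:
            raise ValueError(f"failure time {t} is outside the observation window")
        # A failure exactly at the end of the window belongs to the last interval.
        index = min(int(t // interval_hours), n_intervals - 1)
        counts[index] += 1
    return counts


# Hypothetical test log: six failures observed over 72 usage hours.
failure_hours = [3.0, 10.5, 20.0, 30.0, 47.9, 70.0]
daily_counts = counts_per_interval(failure_hours, total_hours=72.0)
```

Plotting `daily_counts` against the interval index is one way to produce the failure data plot that step 5 asks for.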
Section 5.4 Apply SRE during testing
6. Apply SRE metrics
Certain metrics provide information about the
maturity of the software, which is essential for
decision making and planning of resources.
7. Determine accuracy of the
models
The failure trend can change at any time during
testing. Hence, the best model can change with it.
The best way to measure accuracy is to compare
the estimations to the next time to failure.
8. Support release decision
The release decision should not be made solely
based on the SRG models. The decision is based
on the test coverage and approach, the degree of fault
insertion, and other SRE metrics that can indicate
troubled releases, as well as the SRG models.
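Step 7 recommends comparing each estimate to the next observed time to failure. One simple way to score that comparison is relative error, averaged over the most recent failures. The sketch below is an illustration only: the error measure, the function names, the placeholder model names and the numbers are all assumptions rather than the document's procedure.

```python
def relative_error(estimated_hours, actual_hours):
    """Signed relative error of one time-to-next-failure estimate."""
    return (estimated_hours - actual_hours) / actual_hours


def mean_abs_relative_error(pairs):
    """Average absolute relative error over (estimated, actual) pairs."""
    return sum(abs(relative_error(e, a)) for e, a in pairs) / len(pairs)


def most_accurate(history_by_model):
    """Name of the model with the lowest mean absolute relative error."""
    return min(
        history_by_model,
        key=lambda name: mean_abs_relative_error(history_by_model[name]),
    )


# Hypothetical (estimated, actual) hours to next failure for two placeholder models.
history = {
    "model_a": [(10.0, 8.0), (12.0, 15.0), (20.0, 20.0)],
    "model_b": [(6.0, 8.0), (9.0, 15.0), (14.0, 20.0)],
}
best = most_accurate(history)
```

Because the failure trend can change during testing, the comparison is worth rerunning as new failures arrive instead of fixing the choice of model once.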
Section 5.4 Apply SRE during testing
• Software reliability growth models have existed since the 1970s
• Many of them provide nearly identical results
• SWRG models have been difficult to implement and understand
due to poor guidance from the academic community
• Several models assume data that is not feasible to collect on
large, non-academic software systems
This document provides
• Models that are feasible for real software systems
• Consolidation of models that provide similar results
• Step by step instructions for how to select the best model(s)
• The observed defect discovery trend (see next slide)
• Inherent Defect Content
• Effort required to use the model(s)
• Availability of data required for the model(s)
• How to apply them when you have an incremental life cycle
• Test coverage methods which affect the accuracy of all SWRG models
Selecting the best SWRG model
• The most important criterion is the current defect discovery trend.
• A few models can be used when the discovery rate is increasing or peaking.
Most can be used when decreasing or stabilizing.
• If the defect discovery is plotted first, the user will know which models can be
used.
Figure: defect discovery trend plotted against normalized usage period.
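Besides plotting, the direction of the defect discovery trend can be checked numerically. The sketch below uses the Laplace trend test, a standard statistic that the slides do not mention; it is swapped in here purely as an illustration. It assumes failure times are recorded in usage hours over a fixed observation window, and the 1.96 threshold (two-sided 5% level) is an assumed choice. It cannot distinguish "peaking" from "stabilizing", so a plot is still needed.

```python
import math


def laplace_factor(failure_times, observation_end):
    """Laplace trend statistic for failure times observed on (0, observation_end]."""
    n = len(failure_times)
    if n == 0:
        raise ValueError("at least one failure time is required")
    mean_time = sum(failure_times) / n
    return (mean_time - observation_end / 2.0) / (
        observation_end * math.sqrt(1.0 / (12.0 * n))
    )


def classify_trend(u, threshold=1.96):
    """Map the Laplace factor to a coarse trend label (threshold is an assumption)."""
    if u > threshold:
        return "increasing"      # failures cluster late: discovery rate is rising
    if u < -threshold:
        return "decreasing"      # failures cluster early: discovery rate is falling
    return "no clear trend"


# Hypothetical failure times in usage hours over a 400-hour window.
early_heavy = [5, 12, 20, 31, 45, 62, 90, 140]
late_heavy = [260, 310, 340, 360, 375, 385, 392, 398]
early_trend = classify_trend(laplace_factor(early_heavy, 400.0))
late_trend = classify_trend(laplace_factor(late_heavy, 400.0))
```

A "decreasing" result corresponds to the case where most models apply, while an "increasing" result narrows the choice to the few models that tolerate a rising discovery rate.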
Section 5.5 Support Release Decision
Once development and testing are complete, the SRE analyses,
models and metrics can be used to determine whether the release
should be accepted
• Decision is based on
• Requirements and Operational Profile coverage
• Stress test coverage
• Code coverage
• Adequate defect removal
• Confidence in reliability estimates
• SRE Tasks performed prior to acceptance
• Determine Release Stability – do the reliability estimates meet the objective?
• Forecast additional test duration – If the objective hasn’t been met, how many more
test hours are required?
• Forecast remaining defects and effort required to correct them – Will the forecasted
defects pile up? Impact the next release?
• Perform a Reliability Demonstration Test – Determine statistically whether the software
meets the objective
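For the reliability demonstration test, the simplest textbook case is a zero-failure test under an exponential time-to-failure assumption: if the software runs failure free for T hours, the objective MTBF is demonstrated at confidence C when T = -MTBF × ln(1 - C). The sketch below computes that duration. It is an illustration under those stated assumptions and is not necessarily the procedure the document prescribes; the function name and numbers are hypothetical.

```python
import math


def zero_failure_test_hours(objective_mtbf_hours, confidence):
    """Failure-free test hours needed to demonstrate an MTBF objective.

    Assumes exponentially distributed times between failures and a test
    plan that allows zero failures.
    """
    if objective_mtbf_hours <= 0:
        raise ValueError("objective MTBF must be positive")
    if not 0.0 < confidence < 1.0:
        raise ValueError("confidence must be strictly between 0 and 1")
    return -objective_mtbf_hours * math.log(1.0 - confidence)


# Hypothetical objective: 500-hour MTBF demonstrated at 90% confidence.
required_hours = zero_failure_test_hours(500.0, 0.90)   # roughly 1151 hours
```

Plans that tolerate one or more failures need longer test durations, which this zero-failure form does not cover.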
Section 5.6 Apply SRE in Operations
Once the software is deployed, the reliability should be monitored to assess any
changes needed to previous analyses, predictions and estimations
Section 5.6 Apply SRE in Operations
1. Employ SRE metrics to
monitor software reliability
The best way to improve the accuracy of the
predictions and SWRG models is to measure the
actual software reliability once in operation.
2. Compare operational and
3. Assess changes to previous
characterizations or analyses
The operational failure modes may be different from
those visible in testing. If so, the software failure
modes analyses will need to focus on the
operational failure modes to improve the reliability
of the next release.
4. Archive operational data
Operational data is valuable for future predictions,
sanity checking, etc.
• IEEE P1633 2016 puts forth recommended practices to
apply qualitative software failure modes analyses and
• Improve product and ensure software or firmware delivered with
• IEEE P1633 2016 includes improved guidance
• Offers increased value and is more accessible to a broader audience
• Reliability engineers
• Software quality engineers
• Software managers