Successfully reported this slideshow.
We use your LinkedIn profile and activity data to personalize ads and to show you more relevant ads. You can change your ad preferences anytime.

Revised IEEE 1633 Recommended Practices for Software Reliability


Published on

The 2008 document has been revised and was approved by the IEEE on 9/19/16.

Published in: Engineering
  • On September 19, 2016 the revision of the IEEE 1633 Recommended Practices for Software Reliability, was approved by the IEEE Standards Association. The document is expected to be available by the end of this year and provides common sense step by step practical guidance for reliability engineers, software quality engineers and software managers. Dozens of working group members from industry participated in development and review of this much needed guidance.
    Are you sure you want to  Yes  No
    Your message goes here

Revised IEEE 1633 Recommended Practices for Software Reliability

  1. 1. Advantages of IEEE 1633 Recommend Practices for Software Reliability Chair: Ann Marie Neufelder, SoftRel, LLC Vice Chair: Martha Wetherholt, NASA Secretary: Debra Haehn, Philips IEEE Standards Association Chair: Louis Gullo, Raytheon Missile Systems Division 1
  2. 2. Software reliability timeline 1960’s 1970’s 1980’s 1990’s 1962 First recorded system failure due to software Many software reliability growth estimation models developed. Limitation– can’t be used until late in testing. 1968 The term “software reliability” is invented. First predictive model developed by USAF Rome Air Development Center with SAIC and Research Triangle Park – Limitations– model only useful for aircraft and never updated after 1992. SoftRel, LLC models based on RL model. Can be used on any system. Updated every 4 years. 2000’s 2 Martin Trachtenberg notices the “bell curve” Larry Putnam/QSM quantifies the bell curve used for both scheduling and staffing
  3. 3. Introduction and motivation • Software reliability engineering • Has existed for over 50 years. • Fundamental prerequisite for virtually all modern systems • Rich body of software reliability research generated over last several decades, but… • Practical guidance on how to apply these models has lagged significantly • Diverse set of stakeholders requires pragmatic guidance and tools to apply software reliability models to assess real software or firmware projects during each stage of the software development lifecycle • Reliability engineers may lack software development experience • Software engineers may be unfamiliar with methods to predict software reliability • Both may have challenges acquiring data needed for the analyses 3
  4. 4. Abstract • Newly revised IEEE 1633 Recommended Practice for Software Reliability provides actionable step by step procedures for employing software reliability models and analyses • During any phase of software or firmware development • With any software lifecycle model for any industry or application type. • Includes • Easy to use models for predicting software reliability early in development and during test and operation. • Methods to analyze software failure modes and include software in a system fault tree analysis. • Ability to assess the reliability of COTS, FOSS, and contractor or subcontractor delivered software. • This presentation will cover the key features of the IEEE 1633 Recommended Practices for software reliability. • Current status of this document - Approved by IEEE Standards Association Ballot of May 24, 2016 4
  5. 5. Acknowledgement of IEEE 1633 Working Group members • Lance Fiondella • Peter Lakey • Robert Binder • Michael Siok • Ming Li • Ying Shi • Nematollah Bidokhti • Thierry Wandji • Michael Grottke • Andy Long • George Stark • Allen Nikora • Bakul Banerjee • Debra Greenhalgh Lubas • Mark Sims • Rajesh Murthy • Willie Fitzpatrick • Mark Ofori-kyei 5 • Sonya Davis • Burdette Joyner • Marty Shooman • Andrew Mack • Loren Garroway • Kevin Mattos • Kevin Frye • Claire Jones • Robert Raygan • Mary Ann DeCicco • Shane Smith • Franklin Marotta • David Bernreuther • Martin Wayne • Nathan Herbert • Richard E Gibbs III • Harry White • Jacob Axman • Ahlia T. Kitwana • Yuan Wei • Darwin Heiser • Brian McQuillan • Kishor Trivedi Chair: Ann Marie Neufelder, SoftRel, LLC Vice Chair: Martha Wetherholt, NASA Secretary: Debra Haehn, Philips IEEE Standards Association Chair: Louis Gullo, Raytheon Missile Systems Division
  6. 6. IEEE 1633 Working Group • Defense/aerospace contractors – 11 members • Commercial engineering – 9 members • US Army – 6 members • US Navy – 5 members • Academia – 4 members • DoD – 3 members • NASA – 3 members • Medical equipment – 2 members • Software Engineering Institute – 1 member • Nuclear Regulatory Commission – 1 member 6
  7. 7. Table of contents Section Contents 1,2,3 Overview, definitions and acronyms 4 Tailoring guidance 5 “Actionable” Procedures with Checklists and Examples 5.1 Planning for software reliability. 5.2 Develop a failure modes mode 5.3 Apply SRE during development 5.4 Apply SRE during testing 5.5 Support Release decision 5.6 Apply SRE in operation Annex A Supporting information on the software FMEA Annex B Detailed procedures on predicting size and supporting information for the predictive models Annex C Supporting information for the software reliability growth models Annex D Estimated cost of SRE Annex E SRE tools Annex F Examples 7
  8. 8. Section 4 SRE Tailoring 8 • The document is geared towards 4 different roles, any industry and any type of software. • Hence, section 4 provides guidance for tailoring the document. • By role – recommended sections if you are a reliability engineer, software QA, software manager or acquisitions. • By life cycle - How to apply the document if you have an incremental or agile life cycle model. • By criticality – Some SR tasks are essential, some are typical and some are project specific.
  9. 9. Typical tasks by role 9
  10. 10. Section 5.1 Planning for software reliability 10
  11. 11. Planning • An often overlooked but essential step in SRE 11 Topic Description Characterize the software system What are the Line Replaceable Units? (Applications, executables, DLLs, COTS, FOSS, firmware, glueware) Which are applicable for SRE? What is the operational profile? Define failures and criticality There is no one definition fits all. Failures need to be defined relative to the system under development. Perform a reliability risk assessment Determine a simple Red/Yellow/Green SRE risk. Use that to determine the degree of SRE. Assess the data collection system The available data and SRE tools will determine which tasks are feasible Review the available SRE tools Finalize the SRE plan The Software Reliability Program Plan can be part of the Software Development Plan or the Reliability Plan or a standalone document
  12. 12. Section 5.2 Develop Failure ModesAnalysis 12
  13. 13. Section 5.2 Develop Failure ModesAnalysis 13 • This section focuses on the 3 analyses that identify potential failure modes. • Understanding the failure modes is essential for development, testing, and decision making. Real examples are included in the document. • Perform Defect Root Cause Analysis (RCA) • Perform Software Failure Modes Effects Analysis (SFMEA) • Prepare the SFMEA • Analyze Failure Modes and Root Causes • Identify consequences • Mitigate • Generate a Critical Items List (CIL) • Understand the differences between a hardware FMEA and a software FMEA • Include Software in the System Fault Tree Analysis
  14. 14. SFMEA and SFTA Viewpoints These are complementary methods 14
  15. 15. Software defect root cause analysis • The RCA ensures that any SRE improvement efforts address the right types of defects. • Example, if most of the defects are introduced in the design phase, you don’t want to put all of your SRE effort into improving coding practices. • Software reliability assessment identified certain gaps and strengths which can lead to a certain “volume” of defects • But, a root cause analysis can confirm the “types” of defects • Faulty requirements? • Faulty design? • Faulty implementation? • Faulty interfaces? • Faulty changes or corrective actions? • Faulty source and version control? • These can and will be unique for each configuration item even if they have the same development processes Copyright © SoftRel, LLC 2011. This presentation may not be copied in part or in whole without written permission from Ann Marie Neufelder.
  16. 16. Example of a root cause analysis Defects are introduced because of either bad requirements, bad design, bad coding practices or bad change control. • Requirements defect – The “whats” are incorrect, ambiguous or incomplete. • Design defect – The “whats” are correct but the “hows” are not. Logic, state, timing, exception handling are all design related. • Coding defect- The “whats” and “hows” are correct but the software engineer did not implement one or more lines of code properly. 16Copyright © SoftRel, LLC 2011. This presentation may not be copied in part or in whole without written permission from Ann Marie Neufelder.
  17. 17. Section 5.3 Apply SRE during development 17
  18. 18. Section 5.3 Apply SRE during development Tasks Description 1. Determine/obtain system reliability objectives in terms of reliability, availability, MTBF Today’s system are software intensive. This makes it difficult to establish a reasonable system objective. This document provides 3 approaches for this. 2. Perform software reliability assessment and prediction See upcoming slides 3. Sanity check the early prediction One reason why SRE prediction models haven’t be used is that reliability engineers are unsure of the results. The document has typical reliability values based on the size of the software. 4. Merge the predictions into the over system prediction Once the predictions are done, the reliability engineer will want to integrate them into the overall system RBD or fault tree. The document has several methods for doing so. 5. Determine the total software reliability needed to reach the objective Since software engineering is often managed centrally, the software manager will want to know what the software components as an aggregate need to achieve. 18
  19. 19. Section 5.3 Apply SRE during development 6. Plan the reliability growth needed to reach the system objective Once the software objective is established, plans can and should be made to ensure that there is sufficient reliability growth in the schedule. Reliability growth can only happen if the software is operated in a real environment with no new feature drops. 7. Perform a sensitivity analysis Quite often there isn’t sufficient schedule for extended reliability growth so a sensitivity analysis is needed to determine how to cut the defects to reach the objective. 8. Allocate the required objective to each software LRU If the software components are managed by different organizations or vendors, the software level objective will need to be further allocated. 9. Employ software reliability metrics There are other metrics that can support decision making, testing and delivery that also support more reliable software. 19
  20. 20. Section 5.3.2 Perform software reliability assessment and prediction 20 • Since the 1970s most of the software reliability models are usable only during test or operation when it’s too late to do planning, tradeoffs, improvements. • The models presented in this section can be used for the code is even written. The predictions are then merged with the hardware reliability predictions and compared to the system reliability objective.
  21. 21. If you can predict this fault profile you can predict all of the other reliability figures of merit The predictive models predict the fault profile first and then then failure rate, MTBF, reliability and availability is predicted from that 21
  22. 22. Section 5.4 Apply SRE during testing 22
  23. 23. Section 5.3 Apply SRE during development Tasks Description 1. Develop a reliability test suite Software reliability growth models are useless unless the software is being exercised. The first step is to make sure that it is. 2. Measure test coverage The models can’t measure what they don’t know. The higher the test coverage, the higher the confidence in the models. 3. Increase test effectiveness via fault insertion Many software reliability issues are due to the software performing an unexpected function as opposed to it failing to perform a required function. This increases the confidence in the reliability. 4. Collect failure and defect data All of the models require the testing/operational hours and either the time of each failure observation or the total number of failures in a day. 5. Select and use reliability growth models Before you use any model, you need to plot the failure data and see which models are applicable. The document provides complete guidance on how to do this. 23
  24. 24. Section 5.3 Apply SRE during development Tasks Description 6. Apply SRE metrics Certain metrics provide information about the maturity of the software which are essential for decision making and planning of resources. 7. Determine accuracy of the models The failure trend can change at any time during testing. Hence, the best model can change with it. The best way to measure accuracy is to compare the estimations to the next time to failure. 8. Support release decision The release decision should not be made solely based on the SRG models. The decision is based on the test coverage and approach, degree of fault insertion, other SRE metrics which can indicate troubled releases as well as the SRG models. 24
  25. 25. Section 5.4Apply SRE during testing 25 • Software reliability growth models have existed since the 1970s • Many of them provide nearly identical results • SWRG models have been difficult to implement and understand due to poor guidance from academic community • Several models assume data which is not feasible to collect on non-academic large software systems This document provides • Models that are feasible for real software systems • Consolidation of models that provide similar results • Step by step instructions for how to select the best model(s) based on • The observed defect discovery trend (see next slide) • Inherent Defect Content • Effort required to use the model(s) • Availability of data required for the model(s) • How to apply them when you have an incremental life cycle • Test coverage methods which affect the accuracy of all SWRG models
  26. 26. Selecting the best SWRG model • Most important criteria is the current defect discovery trend. • A few models can be used when the discovery rate is increasing or peaking. Most can be used when decreasing or stabilizing. • If the defect discovery is plotted first, the user will know which models can be used 26 0 2 4 6 8 10 12 NonCumulativedefects discovered Normalized usage period Increasing Peaking Decreasing Stabilizing
  27. 27. Section 5.5 Support Release Decision 27
  28. 28. Section 5.5 Support Release Decision 28 Once the development and testing is complete the SRE analyses, models and metrics can be used to determine whether a decision should be accepted • Decision is based on • Requirements and Operational Profile coverage • Stress test coverage • Code coverage • Adequate defect removal • Confidence in reliability estimates • SRE Tasks performed prior to acceptance • Determine Release Stability – do the reliability estimates meet the objective? • Forecast additional test duration – If the objective hasn’t been met how many more test hours are required? • Forecast remaining defects and effort required to correct them – Will the forecasted defects pile up? Impact the next release? • Perform a Reliability Demonstration Test – Determine statistically whether the software meets the objective
  29. 29. Section 5.6Apply SRE in Operations 29 Once the software is deployed the reliability should be monitored to assess any changes needed to previous analyses, predictions and estimations
  30. 30. Section 5.6 Apply SRE in Operations Tasks Description 1. Employ SRE metrics to monitor software reliability The best way to improve the accuracy of the predictions and SWRG models is to measure the actual software reliability once in operation.2. Compare operational and predicted reliability 3. Assess changes to previous characterizations or analyses The operational failure modes may be different than what’s visible in testing. If so, the software failure modes analyses will need to focus on the operational failure modes to improve the reliability of the next release. 4. Archive operational data Operational data is valuable for future predictions, sanity checking, etc. 30
  31. 31. Summary • IEEE P1633 2016 puts forth recommended practices to apply qualitative software failure modes analyses and qualitative models • Improve product and ensure software or firmware delivered with required reliability • IEEE P1633 2016 includes improved guidance • Offers increased value more accessible to a broader audience • Reliability engineers • Software quality engineers • Software managers • Acquisitions 31