System and Software Reliability



  1. System and Software Reliability
     SAS '03 / GSFC / SATC-NSWC-DD
     Dolores R. Wallace, SRS Technologies, Software Assurance Technology Center
     Dr. William H. Farr and Dr. John R. Crigler, Naval Surface Warfare Center, Dahlgren Division
     NASA OSMA SAS '03
  2. Overview of the Problem
     - Reliability measurement is a critical objective for NASA systems
     - Systems are assessed from the software, hardware, and systems perspectives
     - Methodologies for hardware reliability assessment have been developed and used over the past several decades
     - Methodologies for software reliability assessment have been developed since the 1970s and applied over the last twenty years
     - Methodologies for system reliability assessment have been addressed only over the last ten years, with little application experience
     - A tool is needed that integrates all aspects of reliability data (software, hardware, and system perspectives)
  3. Project Objectives
     - Enhance NASA's capability to assess software reliability by identifying recent models and incorporating them into the tool Statistical Modeling and Estimation of Reliability Functions for Systems (SMERFS^3)
       - First-year initiative
       - Perform a detailed literature search (1990 and beyond)
     - Enhance NASA's capability to assess system reliability by updating SMERFS^3
       - Second-year initiative
       - Identify system models for incorporation
     - Apply the identified methodologies to project data sets within the NASA/DoD environments
  4. FY03 Research Plan
     - Literature search
     - Selection of new models
     - Build new models into SMERFS^3
     - Test new models with Goddard project data
     - Make the latest version of SMERFS^3 available
  5. Literature Search
     - Articles from 1990 forward
     - Journals (sample):
       - IEEE TSE
       - IEEE Reliability
       - Software Testing, Verification, and Reliability
       - IEEE Software
       - IEEE Computer
     - Conferences:
       - ISSRE
       - ICSE
       - Reliability & Maintainability
       - High-Assurance Systems Engineering
       - Various others
     - Model selection criteria:
       - Model assumptions
       - Fit within current SMERFS^3
       - Type of system
       - Data availability
     - Domain experts
  6. Characteristics of the Software-Based Systems
     - Software:
       - Real-time
       - Large-scale
       - Time-critical
       - Embedded
       - Possibly heavy COTS use
       - Distributed
     - System:
       - Safety-critical components
       - Heterogeneous
       - Fault tolerant
       - Costly to develop
       - Long lifetime, evolutionary
  7. SMERFS^3
     - Current version features:
       - 6 software reliability models
       - 2D and 3D plots of the input data and of its fit to each model
       - Various reliability estimates
       - User queries for predictions
     - Constraints on the updates:
       - Employ data from the integration, system test, or operational phase
       - Use the existing graphics of SMERFS^3
       - Integrate with existing user interfaces, goodness-of-fit tests, and prediction capabilities
  8. Available Data
     - Large GSFC project, but confidentiality required
     - A GSFC contact was invaluable in explaining the system and the data
     - Several subsystems
     - Data arrived as flat files; considerable effort went into building a spreadsheet/database
     - Operational failures only
     - Specific faults were removed and the others sorted
     - Applied IntervalCounter
     - Bottom line: organizing the data required substantial effort, which would have been minimized if a project person had prepared the data
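The slide mentions applying IntervalCounter to the failure data. As an illustration of that kind of preprocessing (a minimal sketch, not the actual tool), timestamped failures can be binned into equal-length periods to produce the per-interval fault counts that count-based models consume:

```python
# Hypothetical sketch of interval counting: bin timestamped failure
# events into equal-length periods to obtain a fault count per interval.
# This illustrates the preprocessing step the slide describes; it is
# not the actual IntervalCounter tool.
import math

def count_per_interval(failure_times, interval_length):
    """Return fault counts per interval, covering (0, last failure]."""
    if not failure_times:
        return []
    n_intervals = math.ceil(max(failure_times) / interval_length)
    counts = [0] * n_intervals
    for t in failure_times:
        # Clamp so the final observation falls in the last interval.
        idx = min(n_intervals - 1, int(t / interval_length))
        counts[idx] += 1
    return counts

# Example: failure times in days, counted per 30-day period.
times = [3.0, 12.5, 28.0, 31.0, 55.5, 61.0, 89.9]
print(count_per_interval(times, 30.0))  # [3, 2, 2]
```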
  9. Identified Models
     - Hypergeometric
     - Schneidewind (enhancements)
     - Log-logistic
     - Extended Execution Time (EET)
     - The first two models require error-count failure data; the last two require time-between-failure data
     - Only error-count data has been captured in the GSFC project database available for analysis
     - Hence, the software reliability additions to SMERFS^3 in this task will be limited to the hypergeometric model and the metrics enhancements to the Schneidewind model
 10. Hypergeometric Model Assumptions
     - Test instance t(i): a collection of input test data
     - N: total number of initial faults in the software
     - Faults detected by a test instance are removed before the next test instance is exercised
     - No new fault is inserted into the software during removal of a detected fault
     - A test instance t(i) senses w(i) initial faults; w(i) may vary with the conditions of the test instances over i, and is sometimes referred to in the authors' papers as a "sensitivity" factor
     - The initial faults actually sensed by t(i) depend on t(i) itself; the w(i) initial faults are taken randomly from the N initial faults
 11. Hypergeometric Model
     - Meets many of our selection criteria:
       - Data type
       - Fits within the framework of the SMERFS^3 software
       - Research shows that it appears to perform well against other models
       - Allows for a testing-intensity factor (for example: number of test cases, number of testing personnel, debug time)
     - Scheduled for implementation in the last quarter of FY03
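The assumptions above imply a simple mean-behavior recursion: if C(i-1) faults have already been detected and test instance i senses w(i) faults drawn at random from the N initial faults, the expected number of *new* detections is w(i)(N - C(i-1))/N. A sketch of this expected growth curve (parameter values are illustrative, not fitted to the GSFC data):

```python
# Sketch of the hypergeometric growth model's mean behavior under the
# slide's assumptions. A fault sensed by test instance i is a new
# detection only if it is among the N - C(i-1) still-undetected faults,
# so the expected number of new detections is w(i) * (N - C(i-1)) / N.
# N and the sensitivities below are illustrative values.

def expected_cumulative_detections(N, sensitivities):
    """Expected cumulative fault detections after each test instance."""
    c = 0.0
    history = []
    for w in sensitivities:
        c += w * (N - c) / N   # expected newly detected faults this instance
        history.append(c)
    return history

# Illustrative run: 100 initial faults, constant sensitivity w(i) = 20.
curve = expected_cumulative_detections(100, [20] * 5)
print([round(c, 1) for c in curve])  # [20.0, 36.0, 48.8, 59.0, 67.2]
```

With constant w, the recursion collapses to the closed form N(1 - (1 - w/N)^i), which is the familiar saturating detection curve.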
 12. Schneidewind Model
     - There are three versions:
       - Model 1: All of the fault counts for each testing period are treated the same.
       - Model 2: Ignore the first s-1 testing periods and their associated fault counts; use only the data from periods s to n.
       - Model 3: Combine the fault counts of intervals 1 to s-1 into the first data point, giving n-s+2 data points.
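The three data treatments above can be made concrete with a short sketch showing exactly which data points each variant fits (the fault counts below are illustrative, not project data):

```python
# Sketch of the three Schneidewind data treatments described on the
# slide, applied to a series of per-period fault counts.

def schneidewind_treatment(counts, s, model):
    """Return the data points each model variant actually fits.

    counts -- fault counts for periods 1..n
    s      -- first period considered representative (1 <= s <= n)
    model  -- 1, 2, or 3
    """
    if model == 1:                      # use all periods as-is
        return list(counts)
    if model == 2:                      # discard periods 1..s-1
        return list(counts[s - 1:])
    if model == 3:                      # pool periods 1..s-1 into one point
        return [sum(counts[:s - 1])] + list(counts[s - 1:])
    raise ValueError("model must be 1, 2, or 3")

faults = [12, 9, 11, 7, 5, 4, 2]        # periods 1..7
print(schneidewind_treatment(faults, 4, 1))  # [12, 9, 11, 7, 5, 4, 2]
print(schneidewind_treatment(faults, 4, 2))  # [7, 5, 4, 2]
print(schneidewind_treatment(faults, 4, 3))  # [32, 7, 5, 4, 2]
```

Models 2 and 3 exist because early test periods are often unrepresentative of the maturing software; s lets the analyst discount or pool them.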
 13. Schneidewind Assumptions
     - The numbers of faults detected in the respective intervals are independent.
     - The fault correction rate is proportional to the number of faults to be corrected.
     - The intervals over which the software is tested are all taken to be of the same length.
     - The cumulative number of faults by time t, M(t), follows a Poisson process with mean value function μ(t). The mean value function is such that the expected number of fault occurrences in any time period is proportional to the expected number of undetected faults at that time.
     - The failure intensity function, λ(t), is assumed to be an exponentially decreasing function of time; that is, λ(t) = α exp(-βt) for some α, β > 0.
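Integrating the failure intensity λ(t) = α exp(-βt) gives the mean value function μ(t) = (α/β)(1 - exp(-βt)), so the expected fault count in unit interval i is μ(i) - μ(i-1). A small sketch of the interval counts this implies (α and β below are illustrative, not fitted values):

```python
# Expected per-interval fault counts implied by the slide's assumptions:
# integrating lambda(t) = alpha * exp(-beta * t) yields the mean value
# function mu(t) = (alpha / beta) * (1 - exp(-beta * t)), and the
# expected count in unit interval i is mu(i) - mu(i - 1).
# alpha and beta are illustrative, not fitted values.
import math

def mu(t, alpha, beta):
    """Expected cumulative faults by time t."""
    return (alpha / beta) * (1.0 - math.exp(-beta * t))

def expected_interval_counts(n, alpha, beta):
    """Expected fault counts for unit intervals 1..n."""
    return [mu(i, alpha, beta) - mu(i - 1, alpha, beta)
            for i in range(1, n + 1)]

counts = expected_interval_counts(5, alpha=10.0, beta=0.5)
print([round(c, 2) for c in counts])  # geometrically decreasing counts
```

Note the counts decay by the constant factor exp(-β) each period, which is the exponential decrease the assumption states.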
 14. Schneidewind Model Enhancements
     - Meets many of our selection criteria:
       - Data type
       - The basic model is already in the SMERFS^3 software
       - It has been shown to perform well against other models
       - Allows for a learning-curve effect
     - Updates are being implemented this quarter:
       - Risk measures:
         - Operational quality at time t
         - Risk criterion metric for the remaining faults at time t
         - Risk criterion metric for the time to next failure at time t
       - Confidence intervals
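As a hedged illustration of what a remaining-faults risk criterion metric can look like (a sketch in the spirit of Schneidewind's published risk metrics; the exact formulation implemented in SMERFS^3 may differ, and α, β, and the critical value r_c below are illustrative assumptions): under λ(t) = α exp(-βt) the expected number of faults remaining at time t is r(t) = (α/β) exp(-βt), and a relative criterion can be formed as (r(t) - r_c)/r_c, positive while the criterion is not yet met.

```python
# Hedged sketch of a remaining-faults risk criterion metric. Under
# lambda(t) = alpha * exp(-beta * t), the expected faults remaining at
# time t are r(t) = (alpha / beta) * exp(-beta * t); comparing r(t)
# against a specified critical value r_c gives a relative risk metric.
# This is an illustration, not the exact SMERFS^3 formulation.
import math

def remaining_faults(t, alpha, beta):
    return (alpha / beta) * math.exp(-beta * t)

def risk_criterion_metric(t, alpha, beta, r_c):
    """Positive while more than r_c faults are expected to remain."""
    return (remaining_faults(t, alpha, beta) - r_c) / r_c

# Illustrative parameters: alpha/beta = 20 total expected faults,
# critical value of 1 remaining fault.
for t in (0, 2, 4, 6, 8):
    print(t, round(risk_criterion_metric(t, 10.0, 0.5, r_c=1.0), 2))
```

The metric crosses zero at the test time when the expected remaining faults drop to the critical value, which is the kind of stopping-rule question these enhancements support.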
 15. Data Analysis of NASA Three-Month Fault Counts
     (chart slide; the plot is not reproduced in this transcript)
 16. Proposed Next Steps
     - FY03: focused on software
       - Complete implementation and testing
       - Prepare a paper describing the research, model selection, implementation, and conclusions
         - Apply the enhancements to the Goddard data set
       - Prepare SMERFS^3 for distribution
     - FY04:
       - Conduct a similar research effort for system reliability
         - The University of Connecticut will participate
       - Enhance and validate system models