  1. 1. Software Reliability Engineering: A Roadmap. Michael R. Lyu, Dept. of Computer Science & Engineering, The Chinese University of Hong Kong. Future of Software Engineering, ICSE’2007, Minneapolis, Minnesota, May 24, 2007
  2. 2. Introduction <ul><li>Software reliability is the probability of failure-free operation with respect to execution time and environment. </li></ul><ul><li>Software reliability engineering (SRE) is the quantitative study of the operational behavior of software-based systems with respect to user requirements concerning reliability. </li></ul><ul><li>SRE has been adopted by more than 50 companies as a standard or best current practice. </li></ul><ul><li>Credible software reliability techniques are still urgently needed. </li></ul>
  3. 3. Historical SRE Techniques: Fault Lifecycle <ul><li>Fault prevention: to avoid, by construction, fault occurrences. </li></ul><ul><li>Fault removal: to detect, by verification and validation, the existence of faults and eliminate them. </li></ul><ul><li>Fault tolerance: to provide, by redundancy and diversity, service complying with the specification in spite of manifested faults. </li></ul><ul><li>Fault/failure forecasting: to estimate, by statistical modeling, the presence of faults and occurrence of failures. </li></ul>
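As a minimal sketch of fault tolerance by redundancy and diversity, the snippet below runs several independently written versions of the same function and takes a majority vote on their outputs. The name `majority_vote` and the toy versions are hypothetical illustrations; the slides do not prescribe a particular voting scheme.

```python
from collections import Counter
from typing import Callable, Hashable, Sequence


def majority_vote(versions: Sequence[Callable[[int], Hashable]], x: int) -> Hashable:
    """Run every version on x and return the output a strict majority agrees on."""
    results = []
    for version in versions:
        try:
            results.append(version(x))
        except Exception:
            # A version that crashes simply casts no vote.
            continue
    if not results:
        raise RuntimeError("all versions failed")
    value, count = Counter(results).most_common(1)[0]
    # Require agreement from more than half of ALL versions, not just survivors.
    if count * 2 <= len(versions):
        raise RuntimeError("no majority among versions")
    return value


if __name__ == "__main__":
    # Three hypothetical "diverse" implementations of squaring; the last is faulty.
    versions = [lambda v: v * v, lambda v: v ** 2, lambda v: v + v]
    print(majority_vote(versions, 3))
```

The vote masks the faulty third version as long as the other two agree, which is the sense in which diversity lets the service keep complying with its specification.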
  4. 4. Fault Lifecycle Technique [Figure labels: Fault Manifestation and Modeling Process; Reliability; Fault Prevention; Fault Removal; Fault Tolerance; Fault/Failure Forecasting]
  5. 5. Fault Lifecycle Technique [Figure labels: Fault Manifestation and Modeling Process; Reliability, Availability, Safety, Security; Fault Prevention; Fault Removal; Fault Tolerance; Fault/Failure Forecasting]
  6. 6. Software Reliability Modeling $R = e^{-\lambda t}$ [Figure: axis label Testing Time]
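Assuming the slide's formula is the standard exponential reliability function R(t) = exp(-λt) with a constant failure rate λ, the following sketch shows how reliability decays with execution time. The function names and the failure rate value are hypothetical choices made for illustration.

```python
import math


def reliability(failure_rate: float, t: float) -> float:
    """Probability of failure-free operation over [0, t] for a constant failure rate."""
    if failure_rate < 0 or t < 0:
        raise ValueError("failure_rate and t must be non-negative")
    return math.exp(-failure_rate * t)


def mean_time_to_failure(failure_rate: float) -> float:
    """For the exponential model, MTTF is the reciprocal of the failure rate."""
    if failure_rate <= 0:
        raise ValueError("failure_rate must be positive")
    return 1.0 / failure_rate


if __name__ == "__main__":
    lam = 0.002  # hypothetical value: failures per hour of execution
    for hours in (0, 100, 500, 1000):
        print(f"t={hours:5d} h  R(t)={reliability(lam, hours):.4f}")
    print(f"MTTF = {mean_time_to_failure(lam):.1f} h")
```

At t equal to the MTTF the reliability is exp(-1), so a system is more likely than not to have failed by its mean time to failure.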
  7. 7. Current SRE Process Overview
  8. 8. Current Trends and Problems <ul><li>The theoretical foundation of software reliability comes from hardware reliability techniques. </li></ul><ul><li>Software failures do not happen independently. </li></ul><ul><li>Software failures seldom repeat in exactly the same or a predictable pattern. </li></ul><ul><li>Failure mode and effect analysis (FMEA) for software is still controversial and incomplete. </li></ul><ul><li>There is currently a need for a credible end-to-end software reliability paradigm that can be directly linked to reliability prediction from the very beginning. </li></ul>
  9. 9. Future Direction 1: Reliability-Centric Software Architectures <ul><li>The product view – achieve a failure-resilient software architecture </li></ul><ul><ul><li>Fault prevention </li></ul></ul><ul><ul><li>Fault tolerance </li></ul></ul><ul><li>The process view – explore component-based software engineering </li></ul><ul><ul><li>Component identification, construction, protection, integration and interaction </li></ul></ul><ul><ul><li>Reliability modeling based on software structure </li></ul></ul>
  10. 10. Future Direction 2: Design for Reliability Achievement <ul><li>Fault confinement </li></ul><ul><li>Fault detection </li></ul><ul><li>Diagnosis </li></ul><ul><li>Reconfiguration </li></ul><ul><li>Recovery </li></ul><ul><li>Restart </li></ul><ul><li>Repair </li></ul><ul><li>Reintegration </li></ul>
  11. 11. [Figure labels: Fault Confinement, Fault Detection, Failover, Diagnosis, Online, Offline, Reconfiguration, Recovery, Restart, Repair, Reintegration]
  12. 12. Future Direction 3: Testing for Reliability Assessment <ul><li>Establish the link between software testing and reliability </li></ul><ul><li>Study the effect of code coverage on fault coverage </li></ul><ul><li>Evaluate the impact of various testing metrics on reliability </li></ul><ul><li>Assess competing testing schemes quantitatively </li></ul>
  13. 13. Positive vs. negative evidence for coverage-based software testing <ul><li>Positive findings </li></ul><ul><ul><li>Horgan (1994), Frankl (1988), Weyuker (1988): High code coverage brings high software reliability and a low failure rate. </li></ul></ul><ul><ul><li>Chen (1992): A correlation between code coverage and software reliability is observed. </li></ul></ul><ul><ul><li>Wong (1994): The correlation between test effectiveness and block coverage is higher than that between test effectiveness and the size of the test set. </li></ul></ul><ul><ul><li>Frate (1995): An increase in reliability comes with an increase in at least one code coverage measure. </li></ul></ul><ul><ul><li>Cai (2005): Code coverage contributes to a noticeable amount of fault coverage. </li></ul></ul><ul><li>Negative findings </li></ul><ul><ul><li>Briand (2000): The testing result on published data did not support a causal dependency between code coverage and defect coverage. </li></ul></ul>
  14. 14. RSDIMU test cases description [Figure: test case regions labeled I, II, III, IV, V, VI]
  15. 15. The correlation: various test regions <ul><li>Linear modeling fitness in various test case regions </li></ul><ul><li>Linear regression relationship between block coverage and fault coverage in the whole test set </li></ul> [Figure: axis label Fault Coverage]
  16. 16. The correlation: normal operational testing vs. exceptional testing <ul><li>Normal operational testing </li></ul><ul><ul><li>very weak correlation </li></ul></ul><ul><li>Exceptional testing </li></ul><ul><ul><li>strong correlation </li></ul></ul><ul><li>R-square by testing profile (size) </li></ul><ul><ul><li>Whole test case (1200): 0.781 </li></ul></ul><ul><ul><li>Normal testing (827): 0.045 </li></ul></ul><ul><ul><li>Exceptional testing (373): 0.944 </li></ul></ul>
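The R-square values on this slide measure how much of the variation in fault coverage a straight-line fit on block coverage explains. Below is a minimal sketch of that computation using ordinary least squares; the data points are made up for illustration and are not the RSDIMU measurements, and the function names are hypothetical.

```python
from typing import Sequence, Tuple


def fit_line(xs: Sequence[float], ys: Sequence[float]) -> Tuple[float, float]:
    """Ordinary least-squares fit y = slope * x + intercept."""
    n = len(xs)
    if n < 2 or n != len(ys):
        raise ValueError("need two or more paired observations")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("x values must not all be equal")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def r_squared(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Fraction of the variance in ys explained by the fitted line."""
    slope, intercept = fit_line(xs, ys)
    mean_y = sum(ys) / len(ys)
    ss_res = sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    if ss_tot == 0:
        raise ValueError("y values must not all be equal")
    return 1.0 - ss_res / ss_tot


if __name__ == "__main__":
    # Hypothetical (block coverage, fault coverage) pairs, both as fractions.
    block = [0.30, 0.45, 0.50, 0.62, 0.75, 0.88]
    fault = [0.20, 0.41, 0.44, 0.60, 0.71, 0.90]
    print(f"R-square = {r_squared(block, fault):.3f}")
```

An R-square near 1 (as reported for exceptional testing) means block coverage is a good linear predictor of fault coverage; a value near 0 (as for normal testing) means it explains almost none of the variation.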
  17. 17. The correlation: normal operational testing vs. exceptional testing <ul><li>Normal testing: small coverage range (48%-52%) </li></ul><ul><li>Exceptional testing: two main clusters </li></ul> [Figure: two plots, each with axis label Fault Coverage]
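  18. 18. The Spectrum in Software Testing and Reliability <ul><li>Time-based models (software reliability growth models) </li></ul><ul><ul><li>user oriented </li></ul></ul><ul><ul><li>more physical meaning </li></ul></ul><ul><ul><li>abundant models </li></ul></ul><ul><ul><li>easy data collection </li></ul></ul><ul><ul><li>less relevance to testing </li></ul></ul><ul><li>Coverage-based testing (coverage-based analysis) </li></ul><ul><ul><li>tester oriented </li></ul></ul><ul><ul><li>less physical meaning </li></ul></ul><ul><ul><li>lack of models </li></ul></ul><ul><ul><li>hard data collection </li></ul></ul><ul><ul><li>more relevance to testing </li></ul></ul><ul><li>A new model is needed to combine execution time and testing coverage </li></ul>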
  19. 19. A New Coverage-Based Reliability Model <ul><ul><li>$\lambda(t, c)$: joint failure intensity function </li></ul></ul><ul><ul><li>$\lambda_1(t)$: failure intensity function with respect to time </li></ul></ul><ul><ul><li>$\lambda_2(c)$: failure intensity function with respect to coverage </li></ul></ul><ul><ul><li>$\alpha_1, \gamma_1, \alpha_2, \gamma_2$: parameters with the constraint $\alpha_1 + \alpha_2 = 1$ </li></ul></ul> [Equation figure; callouts: joint failure intensity function, failure intensity function with time, failure intensity function with coverage, dependency factors]
  20. 20. Estimation Accuracy
  21. 21. Future Direction 4: Metrics for Reliability Prediction <ul><li>New models (e.g., BBN) to explore rich software metrics </li></ul><ul><li>Data mining approaches </li></ul><ul><li>Machine learning techniques </li></ul><ul><li>Bridging the gap of the one-way function: feedback to building reliable software </li></ul><ul><li>Continuous industrial data collection efforts – demonstration of cost-effectiveness </li></ul>
  22. 22. Future Direction 5: Reliability for Emerging Software Applications <ul><li>“The Internet changes everything” </li></ul><ul><li>On-demand customizable software </li></ul><ul><li>Service oriented architecture, composition, integration </li></ul><ul><li>Customization by middleware – from metadata to metacode </li></ul><ul><li>A common infrastructure delivers reliability to all customers </li></ul>
  23. 23. A Paradigm for Reliable Web Service [Figure labels: Replication Manager, Web service selection algorithm, WatchDog, UDDI Registry, WSDL, three Web Service replicas (each with IIS, Application, Database), Client, Port, Application, Database] <ul><li>1. Create Web services </li></ul><ul><li>2. Select primary Web service (PWS) </li></ul><ul><li>3. Register </li></ul><ul><li>4. Look up </li></ul><ul><li>5. Get WSDL </li></ul><ul><li>6. Invoke Web service </li></ul><ul><li>7. Keep checking the availability of the PWS </li></ul><ul><li>8. If the PWS fails, reselect the PWS </li></ul><ul><li>9. Update the WSDL </li></ul>
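The watchdog and reselection steps of this paradigm can be sketched as follows. This is a simplified illustration under stated assumptions: the class `ReplicationManager`, its methods, and the "first available replica" selection rule are hypothetical stand-ins for the slide's selection algorithm, and registry and WSDL updates are omitted.

```python
from typing import Callable, Dict, Optional


class ReplicationManager:
    """Tracks replicated services and keeps one available replica as primary."""

    def __init__(self, replicas: Dict[str, Callable[[], bool]]) -> None:
        # Maps a replica name to a health probe that returns True when it is up.
        if not replicas:
            raise ValueError("at least one replica is required")
        self._replicas = dict(replicas)
        self.primary: Optional[str] = None
        self.select_primary()

    def select_primary(self) -> Optional[str]:
        """Pick the first available replica (placeholder selection algorithm)."""
        self.primary = next(
            (name for name, probe in self._replicas.items() if probe()), None
        )
        return self.primary

    def watchdog_tick(self) -> Optional[str]:
        """One watchdog round: keep a healthy primary, otherwise reselect."""
        if self.primary is None or not self._replicas[self.primary]():
            # A real deployment would also publish the new endpoint here.
            self.select_primary()
        return self.primary


if __name__ == "__main__":
    # Simulated availability flags for three hypothetical replicas.
    up = {"ws1": True, "ws2": True, "ws3": True}
    manager = ReplicationManager({n: (lambda n=n: up[n]) for n in up})
    print("primary:", manager.primary)
    up["ws1"] = False  # simulate failure of the primary
    print("after failover:", manager.watchdog_tick())
```

Clients that always resolve the service through the manager see a single logical endpoint, so a replica failure surfaces only as a change of primary rather than as a service outage.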
  24. 24. Conclusions <ul><li>Software reliability is receiving increasing attention as it becomes an important economic consideration for businesses. </li></ul><ul><li>New SRE paradigms need to consider software architectures, testing techniques, data analyses, and credible reliability modeling procedures. </li></ul><ul><li>Domain-specific approaches to emerging software applications are worthy of investigation. </li></ul><ul><li>Still a long way to go, but the directions are clear. </li></ul>