Notes for slides:
  • Fault avoidance: to avoid or prevent the introduction of faults by engaging various design methodologies, techniques, and technologies, including ... Fault removal: to detect and eliminate software faults by techniques such as reviews, inspection, testing, verification, and validation ... Fault tolerance: to provide a service complying with the specification in spite of faults, typically by means of ... Fault forecast: to estimate the existence of faults and the occurrences and consequences of failures ...

  • Rationale: to deal with residual design faults.

  • Block: a maximal code fragment executed without branching, i.e., containing no internal control-flow change. Decision: a code fragment associated with a branch predicate. C-use: a pair of a definition and a computational use of a variable. P-use: a pair of a definition and a predicate use of a variable.

  • In this figure, the failure rate of a software system generally decreases over time as software faults are discovered and removed. At any particular time, say the point marked "present time", we can observe the history of the software's failure rate. Software reliability modeling fits a curve to this failure-rate history using statistical evidence. The purpose of this measure is twofold: 1) to predict the extra time needed to test the software to achieve a specified objective; 2) to predict the expected reliability of the software when testing is finished. Failure rate: the probability that a failure occurs in a certain time period.
  • Pronunciation reminders: necessity, accuracy.

  • Pronunciation reminders: avionics, semi-octahedron, accelerometer, acceleration. At least five sensors measure inertia, so there is enough redundancy to tolerate sensor failures.
  • Quick review
  • Emphasize the importance of the number 11.39. This is not the first time mutation testing has been performed, but in previous studies mutants were generated artificially by automatic programs and affected only one or two lines of code; millions of mutants can be created from hypothetical faults. In our case, we identify real faults recorded during the development phase and inject them back into the final program versions. From our results, on average one fault affects about 11 lines, and some faults affect more than 51 lines. This is very important for our mutation testing.
  • Quick review
  • Highlight: the reliability is higher in our experiment than in theirs; lower failure numbers, fault density, and coincident failures in both projects; high reliability improvement in both projects. Reliable program versions with low failure probability; a small number of faults and low fault density; remarkable reliability improvement for NVP, 10^2 to 10^4 times enhancement; low probability of fault correlation.

  • To answer the question of whether testing coverage is an effective means of fault detection, we executed the 426 mutants over all test cases and observed whether additional code coverage was achieved when a mutant was killed by a new test case. The effectiveness of testing coverage in revealing faults is shown in this table. The second to fifth columns give the number of faults in relation to coverage changes. For example, "6/8" for version ID 1 under the "Blocks" column means that during the test, six out of eight faults in program version 1 were detected by test cases that increased block coverage; in other words, two faults of program version 1 were detected without increasing block coverage. The last column, "Any", counts the mutants whose coverage increased in any of the four coverage measures when they were killed. The result clearly shows that coverage increase is closely related to fault detection: of the 217 mutants under analysis, 155 showed some kind of coverage increase when they were killed, an impressive ratio of 71.4%. The range, however, is very wide (from 33.3% to 90.9%) among versions, indicating that the programmers' individual capabilities accounted for a large degree of variation in the faults created and in their detectability. Why 217? 426 − 174 − 35 = 217 (see slide 35).
  • Generally speaking, both block coverage and fault coverage grow over time.
  • A significant correlation exists in exceptional test cases, while no correlation appears in normal operational test cases. Higher correlation is revealed in functional testing than in random testing, but the difference is insignificant. The difference between the effects of different coverage metrics is not obvious with our experimental data.

  • The behavior of three pairs of mutants shows three different features of fault coincidence in design diversity. The bounds of the PS model are not tight in the following situations: when one version is always worse than the other (the upper bound is not tighter than the minimum); when negative covariance exists (the lower bound is not tighter than the independence assumption).

  • Reliability: DRB > NVP > NSCP

  • Pronunciation reminders: integral, derivative. Since λ(t) is the failure intensity function with respect to time, any distribution from well-known reliability models can be used, e.g., the NHPP, Weibull, S-shaped, or logarithmic Poisson models.
  • NHPP model
  • Gray lines: redundant test cases. Black lines: non-redundant test cases. 1200 → 698.

    1. Presented by: CAI Xia. Supervisor: Prof. Michael R. Lyu. August 24, 2006. Ph.D. Thesis Defense: Coverage-Based Testing Strategies and Reliability Modeling for Fault-Tolerant Software Systems
    2. Outline: Background and related work; Research methodology; Experimental setup; Evaluations on design diversity; Coverage-based testing strategies; Reliability modeling; Conclusion and future work
    3. Background
       • Four technical methods to achieve reliable software systems, grouped by the four means above: fault avoidance (structural programming, formal methods, software reuse); fault removal (software testing, formal inspection); fault tolerance (checkpointing and recovery, exception handling, data diversity, design diversity); fault forecast (software reliability modeling)
    4. Fault-tolerant software
       • Single-version techniques: checkpointing and recovery; exception handling
       • Multi-version techniques (design diversity): recovery block (RB); N-version programming (NVP); N self-checking programming (NSCP)
       • (Figure: NVP model)
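       (The NVP figure on this slide does not survive in the transcript. As a minimal illustration of the majority-voting idea behind NVP, here is a hypothetical Python sketch, not from the thesis, that adjudicates the outputs of N independently developed versions:)

        from collections import Counter

        def nvp_vote(outputs):
            # Majority voter for N-version programming (illustrative only).
            # 'outputs' holds the results computed by N independently
            # developed versions for the same input.
            value, count = Counter(outputs).most_common(1)[0]
            if count > len(outputs) // 2:
                return value  # a strict majority agrees on this result
            raise RuntimeError("no majority; the voter cannot adjudicate")

        # Example: one faulty version out of three is masked by the vote.
        assert nvp_vote([42, 42, 41]) == 42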
    5. Design diversity
       • Requirements: same specification; multiple versions developed independently by different teams; no communication allowed between teams
       • Expectation: programs built differently should fail differently
       • Challenges: high cost; correlated faults?
    6. Experiments and evaluations
       • Empirical and theoretical investigations have been conducted based on experiments, modeling, and evaluations: Knight and Leveson (1986), Kelly et al. (1988), Eckhardt et al. (1991), Lyu and He (1993); Eckhardt and Lee (1985), Littlewood and Miller (1989), Popov et al. (2003); Belli and Jedrzejowicz (1990), Littlewood et al. (2001), Teng and Pham (2002)
       • No conclusive estimation can be made, owing to the size, population, complexity, and comparability of these experiments
    7. Software testing strategies
       • Key issue: test case selection and evaluation
       • Classifications: functional testing (black-box), i.e., specification-based testing; structural testing (white-box), including branch testing and data-flow coverage testing; mutation testing; random testing
       • Comparison of different testing strategies: simulations; formal analysis
       • Code coverage: a measurement of testing completeness?
       • Subdomain-based testing
    8. Code coverage
       • Definition: the fraction of program code executed at least once during the test
       • Classification: block coverage (the portion of basic blocks executed); decision coverage (the portion of decisions executed); C-use coverage (the portion of computational uses of variables covered); P-use coverage (the portion of predicate uses of variables covered)
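       (To make the four metrics concrete, here is a small hypothetical Python function, not from the experiment, annotated with its blocks, its decision, and the c-use/p-use pairs of one variable:)

        def classify(x, y):
            z = x + 1           # block 1; definition of z
            if z > y:           # decision; p-use of z and y
                return z * 2    # block 2; c-use of z
            return 0            # block 3

        # A single test, classify(0, 5), covers blocks 1 and 3 and the
        # decision's false branch, and exercises the p-use pair of z
        # (definition in block 1, predicate use in 'z > y'); it misses
        # block 2, the true branch, and the c-use pair of z in 'z * 2'.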
    9. Code coverage: an indicator of testing effectiveness?
       • Positive evidence: high code coverage brings high software reliability and a low fault rate; both code coverage and faults detected grow over time as testing progresses
       • Negative evidence: can the correlation really be attributed to a causal dependency between code coverage and defect coverage?
       • Controversial, not conclusive
    10. Software reliability growth modeling (SRGM)
       • Model past failure data to predict future behavior
    11. SRGM: some examples
       • Nonhomogeneous Poisson process (NHPP) model
       • S-shaped reliability growth model
       • Musa-Okumoto logarithmic Poisson model
       • μ(t) is the mean value of the cumulative number of failures by time t
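       (The three formulas are images in the original slides. For reference, the standard mean value functions μ(t) of these models, with a, b, λ0, and θ the usual model parameters, are:)

        \mu(t) = a\,(1 - e^{-bt})                                % NHPP (Goel-Okumoto)
        \mu(t) = a\,(1 - (1 + bt)\,e^{-bt})                      % delayed S-shaped
        \mu(t) = \frac{1}{\theta}\,\ln(\lambda_0 \theta t + 1)   % Musa-Okumoto logarithmic Poisson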
    12. Reliability models for design diversity
       • Eckhardt and Lee (1985): variation of difficulty over the demand space; positive correlations between version failures
       • Littlewood and Miller (1989): forced design diversity; possibility of negative correlations
       • Dugan and Lyu (1995): Markov reward model
       • Tomek and Trivedi (1995): stochastic reward net
       • Popov, Strigini et al. (2003): subdomains on the demand space; upper bounds and "likely" lower bounds for reliability
    13. Our contributions
       • For fault tolerance: assess the effectiveness of design diversity
       • For fault removal: establish the relationship between fault coverage and code coverage under various testing strategies
       • For fault forecast: propose a new reliability model that incorporates code coverage and testing time together
    14. Outline: Background and related work; Research methodology; Experimental setup; Evaluations on design diversity; Coverage-based testing strategies; Reliability modeling; Conclusion and future work
    15. Motivation
       • Fault-tolerant software: a necessity, yet controversial
       • Lack of: conclusive assessment; a credible reliability model; an effective testing strategy; real-world project data on testing and fault tolerance techniques together
    16. Research procedure and methodology
       • A comprehensive and systematic approach: modeling, experimentation, evaluation, economics
       • Modeling: formulate the relationship between testing and reliability achievement; propose our own reliability models with the key attributes
    17. Research procedure and methodology
       • Experimentation: obtain new real-world fault-tolerant empirical data with coverage testing and mutation testing
       • Evaluation: collect statistical data on the effectiveness of design diversity; evaluate existing reliability models for design diversity; investigate the effect of code coverage
       • Economics: perform a tradeoff study on testing and fault tolerance
    18. Outline: Background and related work; Research methodology; Experimental setup; Evaluations on design diversity; Coverage-based testing strategies; Reliability modeling; Conclusion and future work
    19. Project features
       • Complicated, real-world application
       • Large population of program versions
       • Controlled development process
       • Mutation testing with real fault injection
       • Well-defined acceptance test set
    20. Experimental setup
       • Time: spring 2002
       • Population: 34 teams of four members each
       • Application: a critical avionics application
       • Duration: a 12-week project
       • Developers: senior-level undergraduate computer science majors
       • Place: CUHK
    21. Experimental project description
       • Redundant Strapped-Down Inertial Measurement Unit (RSDIMU)
       • Geometry; data flow diagram (figures)
    22. Software development procedure
       • Initial design document (3 weeks)
       • Final design document (3 weeks)
       • Initial code (1.5 weeks)
       • Code passing unit test (2 weeks)
       • Code passing integration test (1 week)
       • Code passing acceptance test (1.5 weeks)
    23. Mutant creation
       • Revision control applied and code changes analyzed
       • Mutants created by injecting real faults identified during each development stage
       • Each mutant contains one design or programming fault
       • 426 mutants created for 21 program versions
    24. Program metrics

        Id    Lines   Modules  Functions  Blocks  Decisions  C-Use   P-Use   Mutants
        01    1628    9        70         1327    606        1012    1384    25
        02    2361    11       37         1592    809        2022    1714    21
        03    2331    8        51         1081    548        899     1070    17
        04    1749    7        39         1183    647        646     1339    24
        05    2623    7        40         2460    960        2434    1853    26
        07    2918    11       35         2686    917        2815    1792    19
        08    2154    9        57         1429    585        1470    1293    17
        09    2161    9        56         1663    666        2022    1979    20
        12    2559    8        46         1308    551        1204    1201    31
        15    1849    8        47         1736    732        1645    1448    29
        17    1768    9        58         1310    655        1014    1328    17
        18    2177    6        69         1635    686        1138    1251    10
        20    1807    9        60         1531    782        1512    1735    18
        22    3253    7        68         2403    1076       2907    2335    23
        24    2131    8        90         1890    706        1586    1805    9
        26    4512    20       45         2144    1238       2404    4461    22
        27    1455    9        21         1327    622        1114    1364    15
        29    1627    8        43         1710    506        1539    833     24
        31    1914    12       24         1601    827        1075    1617    23
        32    1919    8        41         1807    974        1649    2132    20
        33    2022    7        27         1880    1009       2574    2887    16
        Avg.  2234.2  9.0      48.8       1700.1  766.8      1651.5  1753.4  426 (total)
    25. Setup of evaluation test
       • ATAC tool employed to analyze and compare testing coverage
       • 1200 test cases exercised as the acceptance test
       • All failures analyzed, code coverage measured, and cross-mutant failure results compared
       • 60 Sun machines running Solaris; one cycle took 30 hours, and a total of 1.6 million files (around 20 GB) were generated
       • 1M test cases in the operational test
    26. Outline: Background and related work; Research methodology; Experimental setup; Evaluations on design diversity; Coverage-based testing strategies; Reliability modeling; Conclusion and future work
    27. Static analysis result (1)

        Fault type distribution:
        Fault type             Number  Percentage
        Assign/Init            136     31%
        Function/Class/Object  144     33%
        Algorithm/Method       81      19%
        Checking               60      14%
        Interface/OO Messages  5       1%

        Qualifier distribution:
        Qualifier   Number  Percentage
        Incorrect   267     63%
        Missing     141     33%
        Extraneous  18      4%
    28. Static analysis result (2)

        Development stage distribution:
        Stage             Number  Percentage
        Init Code         237     55.6%
        Unit Test         120     28.2%
        Integration Test  31      7.3%
        Acceptance Test   38      8.9%

        Fault effect in code lines (average: 11.39 lines per fault):
        Lines        Number  Percentage
        1 line       116     27.23%
        2-5 lines    130     30.52%
        6-10 lines   61      14.32%
        11-20 lines  43      10.09%
        21-50 lines  53      12.44%
        >51 lines    23      5.40%
    29. Mutants relationship
       • Related mutants: same 1200-bit success/failure binary string
       • Similar mutants: same binary string, with the same erroneous output variables
       • Exact mutants: same binary string, with the same values of the erroneous output variables
       • Total pairs: 90525 (i.e., C(426, 2))
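       (A sketch of how this pair classification could be computed. Only the three definitions come from the slide; the field names and data layout below are hypothetical:)

        from collections import Counter
        from itertools import combinations

        def classify_pair(m1, m2):
            # Each mutant record is assumed to carry its 1200-bit pass/fail
            # signature, its erroneous output variables, and their values.
            if m1["signature"] != m2["signature"]:
                return "unrelated"
            if m1["err_vars"] != m2["err_vars"]:
                return "related"   # same failing test cases only
            if m1["err_vals"] != m2["err_vals"]:
                return "similar"   # same wrong variables, different values
            return "exact"         # identical erroneous values as well

        def tally(mutants):
            # 426 mutants yield C(426, 2) = 90525 pairs, matching the slide.
            return Counter(classify_pair(a, b)
                           for a, b in combinations(mutants, 2))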
    30. Cross project comparison
    31. Cross project comparison
       • NASA 4-university project: 7 out of 20 versions passed the operational testing
       • Coincident failures were found among 2 to 8 versions
       • 5 of the 7 related faults were not observed in our project
    32. Major contributions or findings on fault tolerance
       • Real-world mutation data for design diversity
       • A major empirical study in this field with substantial coverage and fault data
       • Supportive evidence for design diversity: remarkable reliability improvement (10^2 to 10^4 times); low probability of fault correlation
    33. Outline: Background and related work; Research methodology; Experimental setup; Evaluations on design diversity; Coverage-based testing strategies; Reliability modeling; Conclusion and future work
    34. Research questions
       • Is code coverage a positive indicator of fault detection capability?
       • Does such an effect vary under different testing strategies and profiles?
       • Does such an effect vary with different code coverage metrics?
    35. Fault detection related to changes of test coverage
       • Of the 426 mutants, 426 − 174 − 35 = 217 were under analysis
       • Coverage increase => more faults detected!

        Version ID  Blocks   Decisions  C-Use    P-Use    Any
        1           6/8      6/8        6/8      7/8      7/8 (87.5%)
        2           9/14     9/14       9/14     10/14    10/14 (71.4%)
        3           4/7      4/7        3/7      4/7      4/7 (57.1%)
        4           7/11     8/11       8/11     8/11     8/11 (72.7%)
        5           7/10     7/10       5/10     7/10     7/10 (70%)
        7           5/10     5/10       5/10     5/10     5/10 (50%)
        8           1/5      2/5        2/5      2/5      2/5 (40%)
        9           7/9      7/9        7/9      7/9      7/9 (77.8%)
        12          10/20    17/20      11/20    17/20    18/20 (90%)
        15          6/11     6/11       6/11     6/11     6/11 (54.5%)
        17          5/7      5/7        5/7      5/7      5/7 (71.4%)
        18          5/6      5/6        5/6      5/6      5/6 (83.3%)
        20          9/11     10/11      8/11     10/11    10/11 (90.9%)
        22          12/13    12/13      12/13    12/13    12/13 (92.3%)
        24          5/7      5/7        5/7      5/7      5/7 (71.4%)
        26          2/12     4/12       4/12     4/12     4/12 (33.3%)
        27          4/7      5/7        4/7      5/7      5/7 (71.4%)
        29          10/18    10/18      11/18    10/18    12/18 (66.7%)
        31          7/11     7/11       7/11     7/11     8/11 (72.7%)
        32          3/7      4/7        5/7      5/7      5/7 (71.4%)
        33          7/13     7/13       9/13     10/13    10/13 (76.9%)
        Overall     131/217  145/217    137/217  152/217  155/217
                    (60.4%)  (66.8%)    (63.1%)  (70%)    (71.4%)
    36. Cumulated defect/block coverage
    37. Cumulated defect coverage versus block coverage: R^2 = 0.945
    38. Test cases description (regions I-VI)
    39. Block coverage vs. fault coverage
       • Test case contribution to block coverage (regions I-VI)
       • Test case contribution to fault coverage (regions I-VI)
    40. Correlation between block coverage and fault coverage
       • Linear modeling fitness in various test case regions
       • Linear regression relationship between block coverage and defect coverage in the whole test set
    41. The correlation at various test regions
       • Linear regression relationship between block coverage and defect coverage in Region VI
       • Linear regression relationship between block coverage and defect coverage in Region IV
    42. Under various testing strategies
       • Functional test: test cases 1-800
       • Random test: test cases 801-1200
       • Normal test: the system is operational according to the spec
       • Exceptional test: the system is under severe stress conditions
    43. With different coverage metrics
       • The correlations under decision, C-use, and P-use coverage are similar to that under block coverage
    44. Answers to the research questions
       • Is code coverage a positive indicator of fault detection capability? Yes.
       • Does such an effect vary under different testing strategies and profiles? Yes: the effect is highest with exceptional test cases and lowest with normal test cases.
       • Does such an effect vary with different code coverage metrics? Not obviously, with our experimental data.
    45. Major contributions or findings on software testing
       • High correlation between fault coverage and code coverage in exceptional test cases; this gives guidelines for the design of exceptional test cases
       • This is the first time such a correlation has been investigated under various testing strategies
    46. Outline: Background and related work; Research methodology; Experimental setup; Evaluations on design diversity; Coverage-based testing strategies; Reliability modeling; Conclusion and future work
    47. Work on reliability modeling
       • Evaluate current probabilistic reliability models for design diversity with our experimental data
       • Propose a new reliability model that incorporates test coverage measurement into traditional software reliability growth models
    48. Results of the PS model with our project data
       • Popov, Strigini et al. (2003)
    49. Results of the PS model with our project data
    50. Results of the DL model with our project data
       • Dugan and Lyu (1995)
       • Predicted reliability by different configurations
       • The result is consistent with the previous study
    51. Introducing coverage into software reliability modeling
       • Most traditional software reliability models are based on the time domain
       • However, time may not be the only factor that affects the failure behavior of software
       • Test completeness may be another indicator of software reliability
    52. A new reliability model
       • Assumptions:
         • The number of failures revealed in testing is related not only to the execution time but also to the code coverage achieved
         • The failure rate with respect to time and test coverage together is a parameterized summation of the failure rates with respect to time and to coverage alone
         • The probabilities of failure with respect to time and coverage are not independent; they affect each other at an exponential rate
    53. Model form
       • λ(t,c): joint failure intensity function
       • λ1(t): failure intensity function with respect to time
       • λ2(c): failure intensity function with respect to coverage
       • α1, γ1, α2, γ2: parameters (dependency factors), with the constraint α1 + α2 = 1
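       (The equation itself is an image and is lost in this transcript. One form consistent with the listed ingredients, offered strictly as a reconstruction sketch and not as the thesis's exact formula, couples the two intensities through the exponential dependency factors:)

        \lambda(t, c) = \alpha_1\,\lambda_1(t)\,e^{\gamma_1 c} + \alpha_2\,\lambda_2(c)\,e^{\gamma_2 t},
        \qquad \alpha_1 + \alpha_2 = 1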
    54. Estimation methods
       • Method A: select a model for λ1(t) and λ2(c); estimate the parameters in λ1(t) and λ2(c) independently; optimize the other four parameters afterwards
       • Method B: select a model for λ1(t) and λ2(c); optimize all parameters together (see the sketch below)
       • Least-squares estimation (LSE) employed
       • Candidate models for λ1(t) and λ2(c): existing reliability models such as NHPP, S-shaped, logarithmic, Weibull, ...
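       (A minimal sketch of Method B in Python. The toy data, the exponential choices for λ1 and λ2, and the coupling form from the sketch above are all illustrative assumptions, not the thesis's data or final model:)

        import numpy as np
        from scipy.optimize import least_squares

        # Hypothetical observations: failure intensity at checkpoints
        # (time t_i, coverage c_i); illustrative numbers only.
        t = np.linspace(1.0, 30.0, 30)
        c = 1.0 - np.exp(-0.1 * t)            # coverage growing toward 1
        lam_obs = 5.0 * np.exp(-0.1 * t) + 0.3

        def lam(params):
            # Method B: all seven parameters are optimized together.
            a1, b1, a2, b2, alpha1, g1, g2 = params
            alpha2 = 1.0 - alpha1             # enforces alpha1 + alpha2 = 1
            lam1 = a1 * b1 * np.exp(-b1 * t)  # NHPP-style intensity in time
            lam2 = a2 * b2 * np.exp(-b2 * c)  # NHPP-style intensity in coverage
            return alpha1 * lam1 * np.exp(g1 * c) + alpha2 * lam2 * np.exp(g2 * t)

        def residuals(params):                # least-squares estimation (LSE)
            return lam(params) - lam_obs

        fit = least_squares(residuals, x0=[5.0, 0.1, 5.0, 1.0, 0.5, 0.0, 0.0])
        print(fit.x)                          # fitted parameter estimates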
    55. λ(c): modeling defect coverage and code coverage
       • A hyper-exponential model: F_c is the cumulative number of failures when coverage c is achieved; K is the number of classes of testing strategies; N_i is the expected number of faults detected eventually in each class
       • A Beta model: N_1 is the expected number of faults detected eventually; N_2 is the ultimate test coverage
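       (These two model formulas are also images in the original. Using the slide's notation, a hyper-exponential fault-count curve over coverage is commonly written in the form below; the per-class rates β_i are added notation here, and the Beta model's exact parameterization is not recoverable from this transcript:)

        F_c = \sum_{i=1}^{K} N_i \left(1 - e^{-\beta_i c}\right)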
    56. λ(c): experimental evaluation
    57. λ(c): parameter estimation results
       • Hyper-exponential model
       • Beta model
       • SSE = 38365
    58. Parameter estimation (1)
       • λ1(t), λ2(c): exponential (NHPP)
       • NHPP model: original SRGM
    59. Prediction accuracy (1)
    60. Parameter estimation (2)
       • λ1(t): NHPP
       • λ2(c): Beta model
    61. Estimation accuracy (2)
    62. Major contributions or findings on software reliability modeling
       • The first reliability model to combine the effects of testing time and code coverage
       • The new reliability model outperforms the traditional NHPP model in terms of estimation accuracy
    63. Outline: Background and related work; Research methodology; Experimental setup; Evaluations on design diversity; Coverage-based testing strategies; Reliability modeling; Conclusion and future work
    64. Conclusion
       • Proposed a new software reliability model: it incorporates code coverage into traditional software reliability growth models and achieves better accuracy than the traditional NHPP model
       • The first reliability model combining the effects of testing time and code coverage
    65. Conclusion
       • Assessed multi-version fault-tolerant software, with supportive evidence from a large-scale experiment: high reliability improvement; low fault correlation; stable performance
       • A major empirical study in this field with substantial fault and coverage data
    66. Conclusion
       • Evaluated the effectiveness of coverage-based testing strategies: code coverage is a reasonably positive indicator of fault detection capability; the effect is remarkable under the exceptional testing profile
       • The first evaluation looking into different categories of testing strategies
    67. Future work
       • Further evaluate the current reliability model through comparisons with existing reliability models other than NHPP
       • Consider other formulations of the relationship between fault coverage and test coverage
       • Further study the economic tradeoff between software testing and fault tolerance
    68. Publication list
       • Journal papers and book chapters
         • Xia Cai, Michael R. Lyu, and Kam-Fai Wong, "A Generic Environment for COTS Testing and Quality Prediction," in Testing Commercial-off-the-shelf Components and Systems, Sami Beydeda and Volker Gruhn (eds.), Springer-Verlag, Berlin, 2005, pp. 315-347.
         • Michael R. Lyu and Xia Cai, "Fault-tolerant Software," to appear in Encyclopedia on Computer Science and Engineering, Benjamin Wah (ed.), Wiley.
         • Xia Cai and Michael R. Lyu, "An Experimental Evaluation of the Effect of Code Coverage on Fault Detection," submitted to IEEE Transactions on Software Engineering, June 2006.
         • Xia Cai, Michael R. Lyu, and Mladen A. Vouk, "Reliability Features for Design Diversity: Cross Project Evaluations and Comparisons," in preparation.
         • Xia Cai and Michael R. Lyu, "Predicting Software Reliability with Test Coverage," in preparation.
    69. Publication list
       • Conference papers
         • Michael R. Lyu, Zubin Huang, Sam K. S. Sze, and Xia Cai, "An Empirical Study on Testing and Fault Tolerance for Software Reliability Engineering," Proceedings of the 14th IEEE International Symposium on Software Reliability Engineering (ISSRE 2003), Denver, Colorado, Nov. 2003, pp. 119-130. (ISSRE 2003 Best Paper Award.)
         • Xia Cai and Michael R. Lyu, "An Empirical Study on Reliability and Fault Correlation Models for Diverse Software Systems," ISSRE 2004, Saint-Malo, France, Nov. 2004, pp. 125-136.
         • Xia Cai and Michael R. Lyu, "The Effect of Code Coverage on Fault Detection under Different Testing Profiles," ICSE 2005 Workshop on Advances in Model-Based Software Testing (A-MOST), St. Louis, Missouri, May 2005.
         • Xia Cai, Michael R. Lyu, and Mladen A. Vouk, "An Experimental Evaluation on Reliability Features of N-Version Programming," ISSRE 2005, Chicago, Illinois, Nov. 8-11, 2005, pp. 161-170.
         • Xia Cai and Michael R. Lyu, "Predicting Software Reliability with Testing Coverage Information," in preparation for the International Conference on Software Engineering (ICSE 2007).
    70. Q & A. Thanks!
    71. Previous work on modeling reliability with coverage information
       • Vouk (1992): Rayleigh model
       • Malaiya et al. (2002): logarithmic-exponential model
       • Chen et al. (2001): using code coverage as a factor to reduce the execution time in reliability models
    72. Comparisons with previous estimations
    73. The number of mutants failing in different testing
    74. Non-redundant set of test cases
    75. Test set reduction with normal testing
    76. Test set reduction with exceptional testing