Speaker notes:
  • Here we have abstracted four factors: a00, a10, a01, a11. But how do we model these factors mathematically for fault localization? a00 is the number of passed runs in which part S is not involved; a01 is the number of failed runs in which part S is not involved; a10 is the number of passed runs in which part S is involved; a11 is the number of failed runs in which part S is involved.
  • The Jaccard index, also known as the Jaccard similarity coefficient (originally coined coefficient de communauté by Paul Jaccard), is a statistic used for comparing the similarity and diversity of sample sets. Given two objects, A and B, each with n binary attributes, the Jaccard coefficient is a useful measure of the overlap that A and B share with their attributes. Each attribute of A and B can either be 0 or 1.
  • The simplified forms are obtained by dividing the numerator and denominator.
  • What impact do these three situations have on suspiciousness? They arise from an insufficient test suite and from the program's inherent features.
  • How much coincidental correctness must occur before correct statements are pushed above the faulty one? The fault is executed, but the execution does not fail.
  • The suspiciousness of these two statements decreased, while the suspiciousness of other statements surpassed them.
  • Even in the extreme case, the faulty statement's suspiciousness will not be ranked behind the correct statements.
  • Lines 1-4 populate CCE with program elements that are totally correlated with failing runs and correlated with a given ratio of passing runs (determined by θ). Lines 5-8 populate CCT with tests that execute one or more cce's. Technique - I exhibits a low rate of false negatives.

    1. Cleansing Coincidental Correctness to Enhance Fault Localization
       Tao He (elfinhe@gmail.com)
       Software Engineering Laboratory, Department of Computer Science, Sun Yat-Sen University
       The 2nd Joint Winter Workshop on Software Engineering, December 2010, Sun Yat-Sen University, Guangzhou, China
    2. Outline
       Coverage-Based Fault Localization: Introduction, Methodology, Evaluation, Discussion
       Cleansing Coincidental Correctness: Methodology, Evaluation
       Conclusion and Future Work
    3. Introduction
       Software debugging is an arduous task [1] that requires time, effort, and a good understanding of the source code.
       There are three steps to debugging [2]: fault detection, fault localization, and fault correction.
       We focus on automatic fault localization.
       [1] I. Vessey. Expertise in debugging computer programs: A process analysis. International Journal of Man-Machine Studies, 23(5):459–494, November 1985.
       [2] D. Wieland. Model-Based Debugging of Java Programs Using Dependencies. PhD thesis, Technische Universität Wien, 2001.
    4. Input of Fault Localization
       Source code and test cases.

       //Find the maximum among a, b and c
       int max (int a, int b, int c){
       1   int temp = a;
       2   if (b > temp ){
       3     temp = b+1; //bug
       4   }
       5   if (c > temp ){
       6     temp = c;
       7   }
       8   return temp;
       }

       Test cases (input a, b, c -> expected oracle output):
       3, 2, 1 -> 3
       2, 1, 3 -> 3
       1, 2, 3 -> 3
       1, 2, 4 -> 4
       1, 2, 3 -> 3
       1, 3, 2 -> 3
    5. Output of Fault Localization
       Suspiciousness of each statement, based on its likelihood of containing faults. Statements with higher suspiciousness should be examined before statements with lower suspiciousness.

       Suspiciousness results for the Jaccard coefficient:
       S1    S2    S3    S4    S5    S6    S7    S8
       0.33  0.33  0.50  0.33  0.33  0.25  0.33  0.33
       The most suspicious statement is S3 (temp = b+1; //bug).
    6. Coverage-Based Fault Localization (CBFL)
       Based on which executable statements each run hits (coverage).
       Input of CBFL: coverage plus the execution result (passed or failed).

       a, b, c   S1 S2 S3 S4 S5 S6 S7 S8   r
       3, 2, 1    1  1  0  1  1  0  1  1   p
       2, 1, 3    1  1  0  1  1  1  1  1   p
       1, 2, 3    1  1  1  1  1  0  1  1   p
       1, 2, 4    1  1  1  1  1  1  1  1   p
       1, 2, 3    1  1  1  1  1  1  1  1   f
       1, 3, 2    1  1  1  1  1  0  1  1   f
    7. Input of CBFL
       For brevity, only S3, S6, and the remaining statements ("Others") are shown:

       a, b, c   S3  S6  Others   r
       3, 2, 1    0   0    1      p
       2, 1, 3    0   1    1      p
       1, 2, 3    1   0    1      p
       1, 2, 4    1   1    1      p
       1, 2, 3    1   1    1      f
       1, 3, 2    1   0    1      f
    8. Methodology
       Intuitively, for each statement S there are four factors that contribute to its suspiciousness:

       a00(S): |tests that do not cover S and passed|  (suspiciousness rises as a00 rises)
       a10(S): |tests that cover S and passed|         (suspiciousness falls as a10 rises)
       a01(S): |tests that do not cover S and failed|  (suspiciousness falls as a01 rises)
       a11(S): |tests that cover S and failed|         (suspiciousness rises as a11 rises)

       For the example above:
                 S3  S6  Others
       a00(S)     2   2    0
       a10(S)     2   2    4
       a01(S)     0   1    0
       a11(S)     2   1    2
    9. Jaccard [3]
       Measures the similarity of asymmetric binary attributes.

       SJ(s) = a11(s) / (a11(s) + a01(s) + a10(s))
             = failed(s) / (totalfailed + passed(s))

       SJ:  S3 = 0.5,  S6 = 0.25,  Others = 0.33

       [3] M. Y. Chen, E. Kiciman, E. Fratkin, A. Fox, and E. A. Brewer. Pinpoint: Problem determination in large, dynamic internet services. In Proceedings of 2002 International Conference on Dependable Systems and Networks (DSN 2002), pages 595–604, Bethesda, MD, USA, 23-26 June 2002. IEEE Computer Society.
    10. Tarantula [4]
        Used in the Tarantula fault localization tool.

        ST(s) = (a11(s) / (a11(s) + a01(s))) / (a11(s) / (a11(s) + a01(s)) + a10(s) / (a10(s) + a00(s)))
              = (failed(s) / totalfailed) / (failed(s) / totalfailed + passed(s) / totalpassed)

        ST:  S3 = 0.66,  S6 = 0.5,  Others = 0.5

        [4] J. A. Jones and M. J. Harrold. Empirical evaluation of the Tarantula automatic fault localization technique. In D. F. Redmiles, T. Ellman, and A. Zisman, editors, 20th IEEE/ACM International Conference on Automated Software Engineering (ASE 2005), pages 273–282, Long Beach, CA, USA, November 7-11 2005. ACM.
    11. Ochiai [5]
        Used in the molecular biology domain to measure genetic similarity.

        SO(s) = a11(s) / sqrt((a11(s) + a01(s)) × (a11(s) + a10(s)))
              = failed(s) / sqrt(totalfailed × (passed(s) + failed(s)))

        SO:  S3 = 0.7,  S6 = 0.41,  Others = 0.57

        [5] R. Abreu, P. Zoeteweij, and A. J. van Gemund. On the accuracy of spectrum-based fault localization. In P. McMinn, editor, Proceedings of the Testing: Academia and Industry Conference - Practice And Research Techniques (TAIC PART'07), pages 89–98, Windsor, United Kingdom, September 2007. IEEE Computer Society.
    12. Evaluation
        Assign a score to every faulty version of each subject program.
        Score [6]: the percentage of the program that need not be examined before the first bug-containing statement is reached.
        Assumption: perfect bug detection, i.e., programmers can always correctly classify faulty code as faulty and non-faulty code as non-faulty.
        [6] J. A. Jones and M. J. Harrold. Empirical evaluation of the Tarantula automatic fault localization technique. In D. F. Redmiles, T. Ellman, and A. Zisman, editors, 20th IEEE/ACM International Conference on Automated Software Engineering (ASE 2005), pages 273–282, Long Beach, CA, USA, November 7-11 2005. ACM.
    13. Evaluation (cont') - An example
        Step 1: sort the statements by suspiciousness.

        Sorted suspiciousness results:
        S6   S3   S1    S2    S4    S5    S7    S8
        0.7  0.5  0.33  0.33  0.33  0.33  0.33  0.33

        Examine S6 first: not the bug.
    14. Evaluation (cont') - An example
        Step 2: examine the next statement in the ranking.

        Sorted suspiciousness results:
        S6   S3   S1    S2    S4    S5    S7    S8
        0.7  0.5  0.33  0.33  0.33  0.33  0.33  0.33

        Examine S3 (temp = b+1; //bug) next: the fault is found.
    15. Evaluation (cont') - An example
        2 statements have been examined, out of 8 statements in the program.
        The score of this program is 1 - (2 ÷ 8) = 0.75,
        i.e., the percentage of statements that need not be examined.
    16. Evaluation (cont')
        Assign a score to every faulty version of the Siemens suite.
        The effectiveness of existing techniques has been limited…
    17. Discussion
        Rewrite the coefficients as below [7]:

        SJ = a11 / (a11 + a01 + a10)
        ST = (a11 / (a11 + a01)) / (a11 / (a11 + a01) + a10 / (a10 + a00))
        SO = a11 / sqrt((a11 + a01) × (a11 + a10))

        Let CT = (a11 + a01) / (a10 + a00) and CJ = a11 + a01. Dividing ST's numerator and denominator by its first term, substituting CJ into SJ, and squaring SO then dividing by a11 gives, for brevity:

        ST' = 1 / (1 + CT × (a10 / a11))
        SJ' = a11 / (CJ + a10)
        SO' = 1 / (CJ × (1 + a10 / a11))

        Both CT and CJ are constant for all statements, so they do not influence the suspiciousness ranking. Hence the rankings from all three coefficients depend only on a11 and a10.

        [7] R. Abreu, P. Zoeteweij, R. Golsteijn, and A. J. C. van Gemund. A practical evaluation of spectrum-based fault localization. Journal of Systems and Software, 82(11):1780–1792, 2009.
    18. The impact of a11 and a10
        The suspiciousness calculated by these coefficients has a positive correlation with a11 and a negative correlation with a10.
        Assume that
        - whenever the fault is executed, the execution fails (increasing a11),
        - whenever the fault is not executed, the execution passes (increasing a10),
        - the test suite is adequate.
        Then the faulty statement will always rank at the top.
        So why is CBFL ineffective in practice? Are there interferences?
    19. Interferences
        Factors that impair CBFL (interferences):
        - Coincidental Correctness [8]: the fault is executed, but the execution does not fail.
        - Multiple Faults: the fault is not executed, but the execution fails anyway.
        - Coverage Equivalence: some statements always have identical coverage.
        [8] W. Masri, R. Abou-Assi, M. El-Ghali, and N. Al-Fatairi. An empirical study of the factors that reduce the effectiveness of coverage-based fault localization. In B. Liblit, N. Nagappan, and T. Zimmermann, editors, Proceedings of the 2nd International Workshop on Defects in Large Software Systems: Held in conjunction with ISSTA 2009, pages 1–5, Chicago, Illinois, July 19-19 2009. ACM.
    20. Coincidental Correctness
        Not all conditions for failure are met. The RIP (reachability-infection-propagation) model [9]:
        - Condition 1: the fault is executed.
        - Condition 2: the program has transitioned into an infectious state.
        - Condition 3: the infection has propagated to the output.

        In the running example (bug at statement 3, temp = b+1):
        condition          a, b, c   S3  S6  Others   r
        a < b, b + 1 = c   1, 2, 3    1   0    1      p
        a < b, b + 1 < c   1, 2, 4    1   1    1      p

        [9] P. Ammann and J. Offutt. Introduction to Software Testing. Cambridge University Press, 2008.
    21. Multiple Faults
        The fault is not executed, but the execution fails anyway, because another fault is executed.

        //Find the maximum among a, b and c
        int max (int a, int b, int c){
        1   int temp = a;
        2   if (b > temp ){
        3     temp = b+1; //bug
        4   }
        5   if (c > temp ){
        6     temp = c+1; //bug
        7   }
        8   return temp;
        }

        condition           a, b, c   S3  S6   r
        a < b, b + 1 ≥ c    1, 2, 4    1   0   f
        a ≥ b, a < c        3, 2, 4    0   1   f
    22. Coverage Equivalence
        The coverage of some statements is always the same, due to
        - inadequacy of the test suite,
        - the inherent properties of the program.

        //Find the maximum among a, b and c
        int max (int a, int b, int c){
        1   int temp = a+1; //bug
        2   if (b > temp ){
        3     temp = b;
        4   }
        5   if (c > temp ){
        6     temp = c;
        7   }
        8   return temp;
        }

        condition        a, b, c   S1  S8   r
        a < b or a < c   1, 2, 3    1   1   p
        otherwise        7, 2, 4    1   1   f
    23. Empirical Study
        Coincidental correctness occurred in 72.1% of the studied cases [8]:
        - Strong coincidental correctness (15.7%): meets Conditions 1 and 2 of the RIP (reachability-infection-propagation) model.
        - Weak coincidental correctness (56.4%): meets only Condition 1 of the RIP model.
        It is a safety reducing factor: it causes the faulty statement to have a lower score than other statements.
        [8] W. Masri, R. Abou-Assi, M. El-Ghali, and N. Al-Fatairi. An empirical study of the factors that reduce the effectiveness of coverage-based fault localization. In B. Liblit, N. Nagappan, and T. Zimmermann, editors, Proceedings of the 2nd International Workshop on Defects in Large Software Systems: Held in conjunction with ISSTA 2009, pages 1–5, Chicago, Illinois, July 19-19 2009. ACM.
    24. Cleansing Coincidental Correctness [10]
        Input: a test suite and the coverage matrix.
        Output: the subset of passing tests that are likely to be coincidentally correct.
        Assumption: a good candidate for a cce is a program element that occurs in all failing runs and in a non-zero but not excessively large percentage of passing runs.
        [10] W. Masri and R. Abou Assi. Cleansing test suites from coincidental correctness to enhance fault-localization. In 2010 Third International Conference on Software Testing, Verification and Validation, pages 165–174. IEEE, 2010.
    25. Technique - I
        Populate CCE with program elements that are totally correlated with failures.
        Assumption: fT(cce) = 1.0 and 0 < pT(cce) ≤ θ,
        where fT(cce) is the percentage of TF executing cce, pT(cce) is the percentage of TP executing cce, and θ < 1.0.

        Notation:
        T: a test suite; TF: failing tests; TP: passing tests; TCC: coincidentally correct tests.
        CCE: the set of program elements that are likely to be correlated with coincidentally correct tests; cce: an element of CCE.
        cct: a test that induces a cce; CCT: the estimate of TCC.
    26. Technique - I (cont')
        Under the same assumption (fT(cce) = 1.0 and 0 < pT(cce) ≤ θ), populate CCT with the tests that execute one or more cce's.
    27. Technique - I - An example
        S3 is identified as a cce: it is covered by both failing runs and by 2 of the 4 passing runs.

        a, b, c   S3  S6  Others   r
        1, 2, 3    1   0    1      p
        1, 2, 4    1   1    1      p
        3, 2, 1    0   0    1      p
        2, 1, 3    0   1    1      p
        1, 2, 3    1   1    1      f
        1, 3, 2    1   0    1      f
        (cce: S3)
    28. Technique - I - An example (cont')
        The passing tests (1, 2, 3) and (1, 2, 4) execute the cce S3, so they are flagged as cct's: the likely coincidentally correct tests.

        a, b, c   S3  S6  Others   r
        1, 2, 3    1   0    1      p   (cct)
        1, 2, 4    1   1    1      p   (cct)
        3, 2, 1    0   0    1      p
        2, 1, 3    0   1    1      p
        1, 2, 3    1   1    1      f
        1, 3, 2    1   0    1      f
        (cce: S3)
    29. Technique - II
        A passing test with a high average weight is more likely to be a coincidentally correct test.
        Weight (correlates with suspiciousness):
        (average weight of the covered cce's) + (percentage of cce's covered).
        The lower-ranked cct's are discarded.
    30. Technique - III
        Partitions the cct's into two clusters based on the similarity of the suspicious cce's they cover.
        Assumptions:
        - Typically, some cce's are relevant to the fault and others are not.
        - The coincidentally correct tests exercise these fault-relevant cce's, whereas the correct tests do not.
    31. Evaluation
        Metrics (glossed informally here, following [10]):
        - false negatives: coincidentally correct tests that the technique fails to identify
        - false positives: truly correct passing tests that the technique wrongly flags
        - safety change: effect of cleansing on how safely the faulty statement is ranked
        - precision change: effect of cleansing on the amount of code that must be examined
        - coverage reduction: loss of coverage caused by removing the flagged tests
    32. Evaluation (cont')
    33. Evaluation (cont')
        Comparative results summaries.
    34. Conclusion
        Without interferences, CBFL techniques are effective and efficient at automating fault localization.
        Well-designed coefficients can tolerate some interferences, but not all of them.
        Three variations of a technique are presented to identify coincidental correctness, a safety reducing factor for CBFL.
    35. Future Work
        Develop more algorithms to identify coincidental correctness, e.g., cluster analysis and failure classification.
        Evaluate whether different program elements can further reduce the rate of false positives, e.g., predicates, function calls, program paths.
        Assess the impact of cleansing coincidental correctness on other fault localization approaches.
    36. Q&A
    37. Thank you! Contact me via elfinhe@gmail.com
