
- 1. Data Mining: Concepts and Techniques — Chapter 11: Software Bug Mining <ul><li>Jiawei Han and Micheline Kamber </li></ul><ul><li>Department of Computer Science </li></ul><ul><li>University of Illinois at Urbana-Champaign </li></ul><ul><li>www.cs.uiuc.edu/~hanj </li></ul><ul><li>©2006 Jiawei Han and Micheline Kamber. All rights reserved. </li></ul><ul><li>Acknowledgement: Chao Liu </li></ul>
- 3. Outline <ul><li>Automated Debugging and Failure Triage </li></ul><ul><li>SOBER: Statistical Model-Based Fault Localization </li></ul><ul><li>Fault Localization-Based Failure Triage </li></ul><ul><li>Copy and Paste Bug Mining </li></ul><ul><li>Conclusions & Future Research </li></ul>
- 4. Software Bugs Are Costly <ul><li>Software is “full of bugs” </li></ul><ul><ul><li>Windows 2000: 35 million lines of code </li></ul></ul><ul><ul><ul><li>63,000 known bugs at the time of release, about 2 per 1000 lines </li></ul></ul></ul><ul><li>Software failure costs </li></ul><ul><ul><li>Ariane 5 explosion due to “errors in the software of the inertial reference system” (Ariane 5 flight 501 inquiry board report: http://ravel.esrin.esa.it/docs/esa-x-1819eng.pdf) </li></ul></ul><ul><ul><li>A study by the National Institute of Standards and Technology found that software errors cost the U.S. economy about $59.5 billion annually: http://www.nist.gov/director/prog-ofc/report02-3.pdf </li></ul></ul><ul><li>Testing and debugging are laborious and expensive </li></ul><ul><ul><li>“50% of my company employees are testers, and the rest spend 50% of their time testing!” —Bill Gates, in 1995 </li></ul></ul>
- 5. Automated Failure Reporting <ul><li>End-users as beta testers </li></ul><ul><ul><li>Valuable information about failure occurrences in reality </li></ul></ul><ul><ul><li>24.5 million reports/day in Redmond (if all users send) – John Dvorak, PC Magazine </li></ul></ul><ul><li>Widely adopted because of its usefulness </li></ul><ul><ul><li>Microsoft Windows, Linux Gentoo, Mozilla applications … </li></ul></ul><ul><ul><li>Any application can implement this functionality </li></ul></ul>
- 6. After Failures Are Collected: Failure Triage <ul><li>Failure triage </li></ul><ul><ul><li>Failure prioritization: </li></ul></ul><ul><ul><ul><li>Which bugs are the most severe? </li></ul></ul></ul><ul><ul><li>Failure assignment: </li></ul></ul><ul><ul><ul><li>Which developers should debug a given set of failures? </li></ul></ul></ul><ul><li>Automated debugging </li></ul><ul><ul><li>Where is the likely bug location? </li></ul></ul>
- 7. A Glimpse of Software Bugs <ul><li>Crashing bugs </li></ul><ul><ul><li>Symptoms: segmentation faults </li></ul></ul><ul><ul><li>Reasons: memory access violations </li></ul></ul><ul><ul><li>Tools: Valgrind, CCured </li></ul></ul><ul><li>Noncrashing bugs </li></ul><ul><ul><li>Symptoms: unexpected outputs </li></ul></ul><ul><ul><li>Reasons: logic or semantic errors </li></ul></ul><ul><ul><ul><li>if ((m >= 0)) vs. if ((m >= 0) && (m != lastm)) </li></ul></ul></ul><ul><ul><ul><li>< vs. <=, > vs. >=, etc. </li></ul></ul></ul><ul><ul><ul><li>j = i vs. j = i+1 </li></ul></ul></ul><ul><ul><li>Tools: no sound tools exist </li></ul></ul>
- 8. Semantic Bugs Dominate <ul><li>Bug Distribution [Li et al., ICSE’07] </li></ul><ul><ul><li>264 bugs in Mozilla and 98 bugs in Apache manually checked </li></ul></ul><ul><ul><li>29,000 bugs in Bugzilla automatically checked </li></ul></ul><ul><li>Semantic bugs: application specific; only a few are detectable; most require annotations or specifications </li></ul><ul><li>Memory-related bugs: many are detectable </li></ul><ul><li>Other categories: concurrency bugs, etc. </li></ul>Courtesy of Zhenmin Li
- 9. Hacking Semantic Bugs is HARD <ul><li>Major challenge: No crashes! </li></ul><ul><ul><li>No failure signatures </li></ul></ul><ul><ul><li>No debugging hints </li></ul></ul><ul><li>Major Methods </li></ul><ul><ul><li>Statistical debugging of semantic bugs [Liu et al., FSE’05, TSE’06] </li></ul></ul><ul><ul><li>Triage noncrashing failures through statistical debugging [Liu et al., FSE’06] </li></ul></ul>
- 10. Outline <ul><li>Automated Debugging and Failure Triage </li></ul><ul><li>SOBER: Statistical Model-Based Fault Localization </li></ul><ul><li>Fault Localization-Based Failure Triage </li></ul><ul><li>Copy and Paste Bug Mining </li></ul><ul><li>Conclusions & Future Research </li></ul>
- 11. A Running Example <ul><li>Buggy version (the (lastm != m) subclause is missing): </li></ul>
void subline(char *lin, char *pat, char *sub) {
  int i, lastm, m;
  lastm = -1;
  i = 0;
  while ((lin[i] != ENDSTR)) {
    m = amatch(lin, i, pat, 0);
    if (m >= 0) {
      lastm = m;
    }
    if ((m == -1) || (m == i)) {
      i = i + 1;
    } else
      i = m;
  }
}
<ul><li>Correct version: </li></ul>
void subline(char *lin, char *pat, char *sub) {
  int i, lastm, m;
  lastm = -1;
  i = 0;
  while ((lin[i] != ENDSTR)) {
    m = amatch(lin, i, pat, 0);
    if ((m >= 0) && (lastm != m)) {
      lastm = m;
    }
    if ((m == -1) || (m == i)) {
      i = i + 1;
    } else
      i = m;
  }
}
<ul><li>130 of 5542 test cases fail, no crashes </li></ul><ul><li>Predicate evaluation as tossing a coin; counts from one execution: </li></ul>
Predicate | # of true | # of false
(lin[i] != ENDSTR)==true | 5 | 1
Ret_amatch > 0 | 5 | 1
Ret_amatch == 0 | 1 | 5
Ret_amatch < 0 | 1 | 5
(m >= 0) == true | 4 | 2
(m == i) == true | 2 | 4
(m >= -1) == true | 1 | 5
- 12. Profile Executions as Vectors <ul><li>Extreme case </li></ul><ul><ul><li>Always false in passing executions and always true in failing executions </li></ul></ul><ul><li>Generalized case </li></ul><ul><ul><li>Different true probability in passing and failing executions </li></ul></ul>[Figure: per-predicate true/false evaluation-count vectors for two passing executions and one failing execution]
- 13. Estimated Head Probability <ul><li>Evaluation bias </li></ul><ul><ul><li>Estimated head probability from every execution </li></ul></ul><ul><ul><li>Specifically, π(P, E) = n_t / (n_t + n_f), where n_t and n_f are the number of true and false evaluations of predicate P in execution E </li></ul></ul><ul><ul><li>Defined for each predicate and each execution </li></ul></ul>
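The evaluation bias defined above can be sketched in a few lines; the counts in the example call are illustrative, not taken from a specific run:

```python
def evaluation_bias(n_true, n_false):
    """Estimated head probability of one predicate in one execution:
    the fraction of its evaluations that came out true."""
    total = n_true + n_false
    if total == 0:
        return None  # the predicate was never evaluated in this run
    return n_true / total

# Illustrative counts: a predicate evaluated true 4 times, false 2 times
print(evaluation_bias(4, 2))  # 0.666... (= 2/3)
```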
- 14. Divergence in Head Probability <ul><li>Multiple evaluation biases from multiple executions </li></ul><ul><li>Evaluation bias as generated from models </li></ul>[Figure: probability distributions of head probability for passing vs. failing executions]
- 15. Major Challenges <ul><li>No closed form of either model </li></ul><ul><li>No sufficient number of failing executions to estimate the failing model </li></ul>[Figure: the passing and failing distributions of head probability to be compared]
- 17. SOBER in Summary [Diagram: SOBER takes the source code and a test suite as input and outputs a ranked list of suspicious predicates]
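The ranking step can be sketched as follows. This is a simplified stand-in statistic (a z-like score of the failing runs' mean evaluation bias against the passing runs' distribution), not the exact statistic from the FSE'05 paper; the bias values in the example are made up for illustration:

```python
import math

def sober_score(passing_biases, failing_biases):
    """How far the failing runs' evaluation biases deviate from the
    passing runs' distribution; larger = more suspicious predicate."""
    mu = sum(passing_biases) / len(passing_biases)
    var = sum((b - mu) ** 2 for b in passing_biases) / len(passing_biases)
    m = len(failing_biases)
    mu_f = sum(failing_biases) / m
    # z-like statistic of the failing-run mean under the passing model
    return abs(mu_f - mu) / math.sqrt(var / m + 1e-12)

# A predicate mostly false when passing but mostly true when failing
# scores higher than one behaving the same way in both run classes
print(sober_score([0.1, 0.2, 0.15], [0.8, 0.9]) >
      sober_score([0.5, 0.5, 0.4], [0.45, 0.5]))  # True
```

Ranking every instrumented predicate by this score yields the ordered predicate list in the diagram above.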
- 18. Previous State of the Art [Liblit et al., 2005] <ul><li>Correlation analysis </li></ul><ul><ul><li>Context( P ) = Prob(fail | P ever evaluated) </li></ul></ul><ul><ul><li>Failure( P ) = Prob(fail | P ever evaluated as true ) </li></ul></ul><ul><ul><li>Increase( P ) = Failure( P ) – Context( P ) </li></ul></ul><ul><li>Measures how much more likely the program is to fail when a predicate is ever evaluated as true </li></ul>
- 19. Liblit05 in Illustration [Figure: 10 runs that evaluate P, marked passing (O) or failing (+)]
Context(P) = Prob(fail | P ever evaluated) = 4/10 = 2/5
Failure(P) = Prob(fail | P ever evaluated as true) = 3/7
Increase(P) = Failure(P) – Context(P) = 3/7 – 2/5 = 1/35
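The three quantities in the illustration can be reproduced directly. The run list encodes the slide's scenario: 10 runs evaluate P (4 fail), 7 evaluate it as true (3 of those fail):

```python
from fractions import Fraction

def liblit_metrics(runs):
    """Each run is (ever_evaluated, ever_true, failed) for predicate P.
    Returns (Context, Failure, Increase) as exact fractions."""
    evaluated = [r for r in runs if r[0]]
    true_runs = [r for r in runs if r[1]]
    context = Fraction(sum(r[2] for r in evaluated), len(evaluated))
    failure = Fraction(sum(r[2] for r in true_runs), len(true_runs))
    return context, failure, failure - context

runs = ([(True, True, True)] * 3 +    # P true, run fails
        [(True, True, False)] * 4 +   # P true, run passes
        [(True, False, True)] * 1 +   # P only false, run fails
        [(True, False, False)] * 2)   # P only false, run passes
print(liblit_metrics(runs))  # (Fraction(2, 5), Fraction(3, 7), Fraction(1, 35))
```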
- 20. SOBER in Illustration [Figure: the same runs, with the distribution of evaluation bias plotted separately for passing and failing executions]
- 21. Difference between SOBER and Liblit05 <ul><li>Methodology: </li></ul><ul><ul><li>Liblit05: correlation analysis </li></ul></ul><ul><ul><li>SOBER: model-based approach </li></ul></ul>
void subline(char *lin, char *pat, char *sub)
{
1    int i, lastm, m;
2    lastm = -1;
3    i = 0;
4    while ((lin[i] != ENDSTR)) {
5      m = amatch(lin, i, pat, 0);
6      if (m >= 0) {
7        putsub(lin, i, m, sub);
8        lastm = m;
       }
     }
11 }
<ul><li>Utilized information </li></ul><ul><ul><li>Liblit05: Is the predicate ever true? </li></ul></ul><ul><ul><li>SOBER: What percentage of its evaluations is true? </li></ul></ul><ul><li>Liblit05: Line 6 is ever true in most passing and failing executions </li></ul><ul><li>SOBER: Line 6 is prone to be true in failing executions and prone to be false in passing executions </li></ul>
- 22. T-Score: Metric of Debugging Quality <ul><li>How close is the blamed location to the real bug location? </li></ul>
void subline(char *lin, char *pat, char *sub) {
  int i, lastm, m;
  lastm = -1;
  i = 0;
  while ((lin[i] != ENDSTR)) {
    m = amatch(lin, i, pat, 0);
    if ((m >= 0)) {
      lastm = m;
    }
    if ((m == -1) || (m == i)) {
      i = i + 1;
    } else
      i = m;
  }
}
T-score = 70%
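A heavily simplified sketch of the idea behind T-score: the fraction of the program a developer examines, walking down the report, before hitting the real fault. The original metric expands a breadth-first search on the program dependence graph; this linear version, with hypothetical statement ids, only conveys the intuition:

```python
def t_score(ranked_statements, faulty_statement, total_loc):
    """Fraction of the program examined before the real fault is
    reached, following the tool's ranked report top-down."""
    examined = ranked_statements.index(faulty_statement) + 1
    return examined / total_loc

ranking = ["s7", "s3", "s9", "s1"]   # hypothetical statement ids
print(t_score(ranking, "s9", 10))    # 0.3 -> 30% of code examined
```

A lower T-score is better: the slide's 70% means most of the program is inspected before the bug is found, while the next slide's 40% result reaches it sooner.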
- 23. A Better Debugging Result
void subline(char *lin, char *pat, char *sub) {
  int i, lastm, m;
  lastm = -1;
  i = 0;
  while ((lin[i] != ENDSTR)) {
    m = amatch(lin, i, pat, 0);
    if ((m >= 0)) {
      lastm = m;
    }
    if ((m == -1) || (m == i)) {
      i = i + 1;
    } else
      i = m;
  }
}
T-score = 40%
- 24. Evaluation 1: Siemens Program Suite <ul><li>T-Score <= 20% is considered meaningful </li></ul><ul><li>Siemens program suite </li></ul><ul><ul><li>130 buggy versions of 7 small (<700 LOC) programs </li></ul></ul><ul><li>Measured: what percentage of bugs can be located with no more than a given percentage of code examination </li></ul>
- 25. Evaluation 2: Reasonably Large Programs <ul><li>Software-artifact Infrastructure Repository (SIR): http://sir.unl.edu </li></ul>
Program (LOC) | Bug | Bug Type | Failure Number | T-Score
Flex 2.4.7 (8,834) | Bug 1 | Misuse of >= for > | 163/525 | 0.5%
Flex 2.4.7 | Bug 2 | Misuse of = for == | 356/525 | 1.6%
Flex 2.4.7 | Bug 3 | Mis-assign value true for false | 69/525 | 7.6%
Flex 2.4.7 | Bug 4 | Mis-parenthesize ((a||b)&&c) as (a || (b && c)) | 22/525 | 15.4%
Flex 2.4.7 | Bug 5 | Off-by-one | 92/525 | 45.6%
Grep 2.2 (11,826) | Bug 1 | Off-by-one | 48/470 | 0.6%
Grep 2.2 | Bug 2 | Subclause-missing | 88/470 | 0.2%
Gzip 1.2 (6,184) | Bug 1 | Subclause-missing | 65/217 | 0.5%
Gzip 1.2 | Bug 2 | Subclause-missing | 17/217 | 2.9%
- 26. A Glimpse of Bugs in Flex-2.4.7
- 27. Evaluation 2: Reasonably Large Programs <ul><li>Software-artifact Infrastructure Repository (SIR): http://sir.unl.edu </li></ul>
Program (LOC) | Bug | Bug Type | Failure Number | T-Score
Flex 2.4.7 (8,834) | Bug 1 | Misuse of >= for > | 163/525 | 0.5%
Flex 2.4.7 | Bug 2 | Misuse of = for == | 356/525 | 1.6%
Flex 2.4.7 | Bug 3 | Mis-assign value true for false | 69/525 | 7.6%
Flex 2.4.7 | Bug 4 | Mis-parenthesize ((a||b)&&c) as (a || (b && c)) | 22/525 | 15.4%
Flex 2.4.7 | Bug 5 | Off-by-one | 92/525 | 45.6%
Grep 2.2 (11,826) | Bug 1 | Off-by-one | 48/470 | 0.6%
Grep 2.2 | Bug 2 | Subclause-missing | 88/470 | 0.2%
Gzip 1.2 (6,184) | Bug 1 | Subclause-missing | 65/217 | 0.5%
Gzip 1.2 | Bug 2 | Subclause-missing | 17/217 | 2.9%
- 28. A Close Look: Grep-2.2: Bug 1 <ul><li>11,826 lines of C code </li></ul><ul><li>3,136 predicates instrumented </li></ul><ul><li>48 out of 470 cases fail </li></ul>
- 29. Grep-2.2: Bug 2 <ul><li>11,826 lines of C code </li></ul><ul><li>3,136 predicates instrumented </li></ul><ul><li>88 out of 470 cases fail </li></ul>
- 30. No Silver Bullet: Flex Bug 5 No wrong value in chk[offset -1] chk[offset] is not used here but later <ul><li>8,834 lines of C code </li></ul><ul><li>2,699 predicates instrumented </li></ul>
- 31. Experiment Result in Summary <ul><li>Effective for bugs demonstrating abnormal control flows </li></ul>
Program (LOC) | Bug | Bug Type | Failure Number | T-Score
Flex 2.4.7 (8,834) | Bug 1 | Misuse of >= for > | 163/525 | 0.5%
Flex 2.4.7 | Bug 2 | Misuse of = for == | 356/525 | 1.6%
Flex 2.4.7 | Bug 3 | Mis-assign value true for false | 69/525 | 7.6%
Flex 2.4.7 | Bug 4 | Mis-parenthesize ((a||b)&&c) as (a || (b && c)) | 22/525 | 15.4%
Flex 2.4.7 | Bug 5 | Off-by-one | 92/525 | 45.6%
Grep 2.2 (11,826) | Bug 1 | Off-by-one | 48/470 | 0.6%
Grep 2.2 | Bug 2 | Subclause-missing | 88/470 | 0.2%
Gzip 1.2 (6,184) | Bug 1 | Subclause-missing | 65/217 | 0.5%
Gzip 1.2 | Bug 2 | Subclause-missing | 17/217 | 2.9%
- 32. SOBER Handles Memory Bugs As Well <ul><li>bc 1.06: </li></ul><ul><li>Two memory bugs found with SOBER </li></ul><ul><li>One of them is unreported </li></ul><ul><li>The blamed location is NOT the crashing site </li></ul>
- 33. Outline <ul><li>Automated Debugging and Failure Triage </li></ul><ul><li>SOBER: Statistical Model-Based Fault Localization </li></ul><ul><li>Fault Localization-Based Failure Triage </li></ul><ul><li>Copy and Paste Bug Mining </li></ul><ul><li>Conclusions & Future Research </li></ul>
- 34. Major Problems in Failure Triage <ul><li>Failure Prioritization </li></ul><ul><ul><li>Which failures are likely due to the same bug? </li></ul></ul><ul><ul><li>Which bugs are the most severe? </li></ul></ul><ul><ul><li>The worst 1% of bugs account for 50% of failures </li></ul></ul><ul><li>Failure Assignment </li></ul><ul><ul><li>Which developer should debug which set of failures? </li></ul></ul>Courtesy of Microsoft Corporation
- 35. A Solution: Failure Clustering <ul><li>Failure indexing </li></ul><ul><ul><li>Identify failures likely due to the same bug </li></ul></ul>[Figure: failure reports clustered into groups ordered from most severe to least severe, labeled with guessed causes such as “Fault in core.io?” and “Fault in function initialize()?”]
- 36. The Central Question: A Distance Measure between Failures <ul><li>Different measures render different clusterings </li></ul>[Figure: the same failure points cluster differently under a distance defined on the X-axis than under one defined on the Y-axis]
- 37. How to Define a Distance <ul><li>Previous work [Podgurski et al., 2003] </li></ul><ul><ul><li>T-Proximity: distance defined on literal trace similarity </li></ul></ul><ul><li>Our approach [Liu et al., 2006] </li></ul><ul><ul><li>R-Proximity: distance defined on likely bug locations (here instantiated with SOBER) </li></ul></ul>
- 38. Why Our Approach is Reasonable <ul><li>Optimal proximity: defined on root causes ( RC ) </li></ul><ul><li>Our approach: defined on likely causes ( LC ), as reported by automated fault localization </li></ul>[Figure: failing runs mapped to their likely causes by the fault localizer]
- 39. R-Proximity: An Instantiation with SOBER <ul><li>Likely causes (LCs) are predicate rankings </li></ul><ul><li>A distance between rankings is needed </li></ul>[Figure: SOBER maps each failing run to a predicate ranking such as (Pred2, Pred6, Pred1, Pred3)]
- 40. Distance between Rankings <ul><li>Traditional Kendall’s tau distance </li></ul><ul><ul><li>Number of preference disagreements between two rankings </li></ul></ul><ul><ul><li>E.g., the rankings (Pred2, Pred6, Pred1, Pred3) and (Pred6, Pred2, Pred1, Pred3) disagree only on the pair (Pred2, Pred6), so their distance is 1 </li></ul></ul><ul><li>Not all predicates need to be considered </li></ul><ul><ul><li>Predicates are uniformly instrumented </li></ul></ul><ul><ul><li>Only fault-relevant predicates count </li></ul></ul>
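Kendall's tau distance counts pairs of predicates on whose relative order two rankings disagree; a minimal sketch, using the deck's predicate names as labels:

```python
from itertools import combinations

def kendall_tau_distance(rank_a, rank_b):
    """Number of predicate pairs ordered differently by the two
    rankings (both rankings must contain the same predicates)."""
    pos_a = {p: i for i, p in enumerate(rank_a)}
    pos_b = {p: i for i, p in enumerate(rank_b)}
    return sum(1 for p, q in combinations(rank_a, 2)
               if (pos_a[p] - pos_a[q]) * (pos_b[p] - pos_b[q]) < 0)

# Identical rankings -> 0; swapping the top two predicates -> 1
print(kendall_tau_distance(["P2", "P6", "P1", "P3"],
                           ["P2", "P6", "P1", "P3"]))  # 0
print(kendall_tau_distance(["P2", "P6", "P1", "P3"],
                           ["P6", "P2", "P1", "P3"]))  # 1
```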
- 41. Predicate Weighting in a Nutshell <ul><li>Fault-relevant predicates receive higher weights </li></ul><ul><li>Fault-relevance is implied by the rankings: mostly favored (top-ranked) predicates receive higher weights </li></ul>[Figure: several failing runs’ rankings, all of which favor Pred2]
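One way to realize the weighting idea is sketched below: a predicate that the failing runs' rankings mostly favor (place near the top) gets a higher weight. The decay function 1 / (1 + average rank position) is my own illustrative choice, not the paper's exact scheme:

```python
def predicate_weights(rankings):
    """Weight each predicate by how highly the failing runs' rankings
    place it on average; top-ranked predicates weigh more."""
    preds = rankings[0]
    avg_pos = {p: sum(r.index(p) for r in rankings) / len(rankings)
               for p in preds}
    return {p: 1.0 / (1.0 + pos) for p, pos in avg_pos.items()}

w = predicate_weights([["P2", "P6", "P1", "P3"],
                       ["P2", "P1", "P3", "P6"]])
print(w["P2"] > w["P3"])  # True: both rankings favor P2
```

These weights can then scale each pairwise disagreement in the Kendall-style distance, so conflicts over fault-relevant predicates matter more than conflicts over irrelevant ones.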
- 42. Automated Failure Assignment <ul><li>Most-favored predicates indicate the agreed bug location for a group of failures </li></ul><ul><li>Predicate spectrum graph: for each failing run, plot which predicate indices its ranking favors </li></ul>[Figure: predicate spectrum graph for a group of failing runs]
- 43. Case Study 1: Grep-2.2 <ul><li>470 test cases in total </li></ul><ul><li>136 cases fail due to both faults, no crashes </li></ul><ul><li>48 fail due to Fault 1, 88 fail due to Fault 2 </li></ul>
- 44. Failure Proximity Graphs <ul><li>Red crosses are failures due to Fault 1 </li></ul><ul><li>Blue circles are failures due to Fault 2 </li></ul><ul><li>Divergent behaviors due to the same fault </li></ul><ul><li>Better clustering result under R-Proximity </li></ul>[Figure: failure proximity graphs under T-Proximity vs. R-Proximity]
- 45. Guided Failure Assignment <ul><li>What predicates are favored in each group? </li></ul>
- 46. Assign Failures to Appropriate Developers <ul><li>The 21 failing cases in Cluster 1 are assigned to developers responsible for the function grep </li></ul><ul><li>The 112 failing cases in Cluster 2 are assigned to developers responsible for the function comsub </li></ul>
- 47. Case Study 2: Gzip-1.2.3 <ul><li>217 test cases in total </li></ul><ul><li>82 cases fail due to both faults, no crashes </li></ul><ul><li>65 fail due to Fault 1, 17 fail due to Fault 2 </li></ul>
- 48. Failure Proximity Graphs <ul><li>Red crosses are failures due to Fault 1 </li></ul><ul><li>Blue circles are failures due to Fault 2 </li></ul><ul><li>Nearly perfect clustering under R-Proximity </li></ul><ul><li>Accurate failure assignment </li></ul>[Figure: failure proximity graphs under T-Proximity vs. R-Proximity]
- 49. Outline <ul><li>Automated Debugging and Failure Triage </li></ul><ul><li>SOBER: Statistical Model-Based Fault Localization </li></ul><ul><li>Fault Localization-Based Failure Triage </li></ul><ul><li>Copy and Paste Bug Mining </li></ul><ul><li>Conclusions & Future Research </li></ul>
- 50. Mining Copy-Paste Bugs <ul><li>Copy-pasting is common </li></ul><ul><ul><li>12% in Linux file system [Kasper2003] </li></ul></ul><ul><ul><li>19% in X Window system [Baker1995] </li></ul></ul><ul><li>Copy-pasted code is error prone </li></ul><ul><ul><li>Among 35 errors in Linux drivers/i2o, 34 are caused by copy-paste [Chou2001] </li></ul></ul>
(Simplified example from linux-2.6.6/arch/sparc/prom/memory.c)
void __init prom_meminit(void)
{
  ......
  for (i=0; i<n; i++) {
    total[i].adr = list[i].addr;
    total[i].bytes = list[i].size;
    total[i].more = &total[i+1];
  }
  ......
  for (i=0; i<n; i++) {
    taken[i].adr = list[i].addr;
    taken[i].bytes = list[i].size;
    taken[i].more = &total[i+1];   /* Forget to change! */
  }
}
- 51. An Overview of Copy-Paste Bug Detection Parse source code & build a sequence database Mine for basic copy-pasted segments Compose larger copy-pasted segments Prune false positives
- 52. Parsing Source Code <ul><li>Purpose: building a sequence database </li></ul><ul><li>Idea: map each statement to a number </li></ul><ul><ul><li>Tokenize each component </li></ul></ul><ul><ul><li>Different operators/constants/keywords map to different tokens </li></ul></ul><ul><li>Handle identifier renaming: </li></ul><ul><ul><li>the same type of identifiers maps to the same token </li></ul></ul><ul><li>Example: “old = 3;” tokenizes to (5, 61, 20) and hashes to 16; “new = 3;” tokenizes to the same (5, 61, 20) and also hashes to 16 </li></ul>
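The tokenize-then-hash step can be sketched as below. The token ids and the demo hash are assumptions for illustration (the slide's ids 5/61/20 and hash 16 come from the actual tool); what matters is that renamed copies tokenize, and therefore hash, identically:

```python
import re

# Illustrative token ids, not CP-Miner's real values
OPERATOR_TOKENS = {"=": 61, ";": 20, "+": 62}
IDENT_TOKEN = 5      # every identifier collapses to this token
CONST_TOKEN = 30     # every numeric constant collapses to this token

def tokenize_statement(stmt):
    """Map a C statement to a token sequence; identifiers of the same
    kind become one token, so renamed copies tokenize identically."""
    tokens = []
    for tok in re.findall(r"[A-Za-z_]\w*|\d+|[^\s\w]", stmt):
        if tok in OPERATOR_TOKENS:
            tokens.append(OPERATOR_TOKENS[tok])
        elif tok.isdigit():
            tokens.append(CONST_TOKEN)
        else:
            tokens.append(IDENT_TOKEN)
    return tuple(tokens)

def hash_statement(stmt):
    return hash(tokenize_statement(stmt)) % 97   # small demo hash

# "old = 3;" and "new = 3;" differ only in identifier names
print(tokenize_statement("old = 3;") == tokenize_statement("new = 3;"))  # True
print(hash_statement("old = 3;") == hash_statement("new = 3;"))          # True
```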
- 53. Building Sequence Database <ul><li>A program is one long sequence </li></ul><ul><ul><li>We need a sequence database </li></ul></ul><ul><li>Cut the long sequence </li></ul><ul><ul><li>Naïve method: fixed length </li></ul></ul><ul><ul><li>Our method: one sequence per basic block </li></ul></ul>
for (i=0; i<n; i++) {            // hash 65
  total[i].adr = list[i].addr;   // hash 16
  total[i].bytes = list[i].size; // hash 16
  total[i].more = &total[i+1];   // hash 71
}
......
for (i=0; i<n; i++) {            // hash 65
  taken[i].adr = list[i].addr;   // hash 16
  taken[i].bytes = list[i].size; // hash 16
  taken[i].more = &total[i+1];   // hash 71
}
Final sequence DB: (65) (16, 16, 71) … (65) (16, 16, 71)
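Cutting at basic-block boundaries then amounts to emitting one hash sequence per block; a sketch with toy statement-to-hash values taken from the slide:

```python
# Toy hash table, using the hash values shown on the slide
toy_hash = {
    "for (i=0; i<n; i++) {": 65,
    "total[i].adr = list[i].addr;": 16,
    "total[i].bytes = list[i].size;": 16,
    "total[i].more = &total[i+1];": 71,
}

def sequences_from_blocks(blocks, h):
    """One database entry (a tuple of statement hashes) per basic block."""
    return [tuple(h[s] for s in block) for block in blocks]

blocks = [["for (i=0; i<n; i++) {"],
          ["total[i].adr = list[i].addr;",
           "total[i].bytes = list[i].size;",
           "total[i].more = &total[i+1];"]]
print(sequences_from_blocks(blocks, toy_hash))  # [(65,), (16, 16, 71)]
```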
- 54. Mining for Basic Copy-pasted Segments <ul><li>Apply a frequent sequence mining algorithm to the sequence database </li></ul><ul><li>Modification </li></ul><ul><ul><li>Constrain the maximum gap, so small insertions are tolerated </li></ul></ul><ul><li>Example: (16, 16, 71) and (16, 16, 10, 71) share the frequent subsequence (16, 16, 71); the second copy has one inserted statement (gap = 1) </li></ul>
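The gap constraint is what lets the miner tolerate a small insertion between copied statements. A sketch of the occurrence check (a full frequent-sequence miner would run this over all candidate patterns in the database):

```python
def occurs_with_gap(pattern, sequence, max_gap=1):
    """Does `pattern` occur in `sequence` as a subsequence with at
    most `max_gap` skipped elements between consecutive matches?"""
    def match(pi, si):
        if pi == len(pattern):
            return True
        # next pattern element must appear within max_gap positions
        for j in range(si, min(si + max_gap + 1, len(sequence))):
            if sequence[j] == pattern[pi] and match(pi + 1, j + 1):
                return True
        return False
    return any(sequence[i] == pattern[0] and match(1, i + 1)
               for i in range(len(sequence)))

# (16, 16, 71) still matches after one inserted statement (hash 10)
print(occurs_with_gap((16, 16, 71), (16, 16, 10, 71)))      # True
print(occurs_with_gap((16, 16, 71), (16, 16, 10, 10, 71)))  # False: gap of 2
```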
- 55. Composing Larger Copy-Pasted Segments <ul><li>Combine neighboring copy-pasted segments repeatedly </li></ul><ul><li>Example: the basic segments (65) and (16, 16, 71) of the two loops are neighbors in both copies, so they combine into the larger copy-pasted segment (65, 16, 16, 71) covering each whole loop </li></ul>
- 56. Pruning False Positives <ul><li>Unmappable segments </li></ul><ul><ul><li>Identifier names cannot be mapped to corresponding ones </li></ul></ul><ul><ul><li>E.g., in “f(a1); f(a2); f(a3);” vs. “f1(b1); f1(b2); f2(b3);”, the identifier f maps to both f1 and f2: a conflict </li></ul></ul><ul><li>Tiny segments </li></ul><ul><li>For more detail, see </li></ul><ul><ul><li>Zhenmin Li, Shan Lu, Suvda Myagmar, Yuanyuan Zhou. CP-Miner: A Tool for Finding Copy-paste and Related Bugs in Operating System Code, in Proc. 6th Symp. Operating Systems Design and Implementation, 2004 </li></ul></ul>
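The unmappable-segment check reduces to testing whether identifiers rename consistently; a one-directional sketch (the tool's actual check may be stricter):

```python
def identifiers_map_consistently(seg_a, seg_b):
    """Prune copy-paste pairs whose identifiers cannot be renamed
    consistently: each name in one segment must always correspond
    to the same name in the other."""
    mapping = {}
    for a, b in zip(seg_a, seg_b):
        if mapping.setdefault(a, b) != b:
            return False   # conflict: one name maps to two different names
    return True

# The slide's example: f maps to both f1 and f2 -> conflict, pruned
print(identifiers_map_consistently(["f", "a1", "f", "a2", "f", "a3"],
                                   ["f1", "b1", "f1", "b2", "f2", "b3"]))  # False
# A clean rename (f -> g) is kept
print(identifiers_map_consistently(["f", "a1", "f", "a2"],
                                   ["g", "b1", "g", "b2"]))  # True
```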
- 57. Some Test Results of C-P Bug Detection
Software | LOC | Time | Space (MB) | Verified Bugs | Potential Bugs (careless programming)
Linux | 4.4 M | 20 mins | 527 | 28 | 21
FreeBSD | 3.3 M | 20 mins | 459 | 23 | 8
Apache | 224 K | 15 secs | 30 | 5 | 0
PostgreSQL | 458 K | 38 secs | 57 | 2 | 0
- 58. Outline <ul><li>Automated Debugging and Failure Triage </li></ul><ul><li>SOBER: Statistical Model-Based Fault Localization </li></ul><ul><li>Fault Localization-Based Failure Triage </li></ul><ul><li>Copy and Paste Bug Mining </li></ul><ul><li>Conclusions & Future Research </li></ul>
- 59. Conclusions <ul><li>Data mining applies to software and computer systems </li></ul><ul><li>Identify incorrect executions from program runtime behaviors </li></ul><ul><li>Classification dynamics can give away a “backtrace” for noncrashing bugs without any semantic inputs </li></ul><ul><li>A hypothesis testing-like approach is developed to localize logic bugs in software </li></ul><ul><li>No prior knowledge about the program semantics is assumed </li></ul><ul><li>Many other software bug mining methods remain to be developed and explored </li></ul>
- 60. Future Research: Mining into Computer Systems <ul><li>Huge volume of data from computer systems </li></ul><ul><ul><li>Persistent state interactions, event logs, network logs, CPU usage, … </li></ul></ul><ul><li>Mining system data for … </li></ul><ul><ul><li>Reliability </li></ul></ul><ul><ul><li>Performance </li></ul></ul><ul><ul><li>Manageability </li></ul></ul><ul><ul><li>… </li></ul></ul><ul><li>Challenges in data mining </li></ul><ul><ul><li>Statistical modeling of computer systems </li></ul></ul><ul><ul><li>Online, scalability, interpretability … </li></ul></ul>
- 61. References <ul><li>[DRL+98] David L. Detlefs, K. Rustan M. Leino, Greg Nelson and James B. Saxe. Extended static checking, 1998 </li></ul><ul><li>[EGH+94] David Evans, John Guttag, James Horning, and Yang Meng Tan. LCLint: A tool for using specifications to check code. In Proceedings of the ACM SIGSOFT '94 Symposium on the Foundations of Software Engineering, pages 87-96, 1994. </li></ul><ul><li>[DLS02] Manuvir Das, Sorin Lerner, and Mark Seigle. ESP: Path-sensitive program verification in polynomial time. In Conference on Programming Language Design and Implementation, 2002. </li></ul><ul><li>[ECC00] D. R. Engler, B. Chelf, A. Chou, and S. Hallem. Checking system rules using system-specific, programmer-written compiler extensions. In Proc. 4th Symp. Operating Systems Design and Implementation, October 2000. </li></ul><ul><li>[M93] Ken McMillan. Symbolic Model Checking. Kluwer Academic Publishers, 1993 </li></ul><ul><li>[H97] Gerard J. Holzmann. The model checker SPIN. Software Engineering, 23(5):279-295, 1997. </li></ul><ul><li>[DDH+92] David L. Dill, Andreas J. Drexler, Alan J. Hu, and C. Han Yang. Protocol verification as a hardware design aid. In IEEE Int. Conf. Computer Design: VLSI in Computers and Processors, pages 522-525, 1992. </li></ul><ul><li>[MPC+02] M. Musuvathi, D. Y. W. Park, A. Chou, D. R. Engler and D. L. Dill. CMC: A Pragmatic Approach to Model Checking Real Code. In Proc. 5th Symp. Operating Systems Design and Implementation, 2002. </li></ul>
- 62. References (cont’d) <ul><li>[G97] P. Godefroid. Model Checking for Programming Languages using VeriSoft. In Proc. 24th ACM Symp. Principles of Programming Languages, 1997 </li></ul><ul><li>[BHP+00] G. Brat, K. Havelund, S. Park, and W. Visser. Model checking programs. In IEEE Int'l Conf. Automated Software Engineering (ASE), 2000. </li></ul><ul><li>[HJ92] R. Hastings and B. Joyce. Purify: Fast Detection of Memory Leaks and Access Errors. In Proc. Winter 1992 USENIX Conference, pp. 125-138, San Francisco, California </li></ul><ul><li>Chao Liu, Xifeng Yan, and Jiawei Han, “Mining Control Flow Abnormality for Logic Error Isolation,” in Proc. 2006 SIAM Int. Conf. on Data Mining (SDM'06), Bethesda, MD, April 2006. </li></ul><ul><li>C. Liu, X. Yan, L. Fei, J. Han, and S. Midkiff, “SOBER: Statistical Model-based Bug Localization,” in Proc. 2005 ACM SIGSOFT Symp. Foundations of Software Engineering (FSE 2005), Lisbon, Portugal, Sept. 2005. </li></ul><ul><li>C. Liu, X. Yan, H. Yu, J. Han, and P. S. Yu, “Mining Behavior Graphs for Backtrace of Noncrashing Bugs,” in Proc. 2005 SIAM Int. Conf. on Data Mining (SDM'05), Newport Beach, CA, April 2005. </li></ul><ul><li>[SN00] Julian Seward and Nick Nethercote. Valgrind, an open-source memory debugger for x86-GNU/Linux. http://valgrind.org/ </li></ul><ul><li>[LLM+04] Zhenmin Li, Shan Lu, Suvda Myagmar, Yuanyuan Zhou. CP-Miner: A Tool for Finding Copy-paste and Related Bugs in Operating System Code, in Proc. 6th Symp. Operating Systems Design and Implementation, 2004 </li></ul><ul><li>[LCS+04] Zhenmin Li, Zhifeng Chen, Sudarshan M. Srinivasan, Yuanyuan Zhou. C-Miner: Mining Block Correlations in Storage Systems. In Proc. 3rd USENIX Conf. on File and Storage Technologies, 2004 </li></ul>
- 64. Surplus Slides <ul><li>The remaining slides are leftovers </li></ul>
- 65. Representative Publications <ul><li>Chao Liu, Long Fei, Xifeng Yan, Jiawei Han and Samuel Midkiff, “Statistical Debugging: A Hypothesis Testing-Based Approach,” IEEE Transactions on Software Engineering, Vol. 32, No. 10, pp. 831-848, Oct. 2006. </li></ul><ul><li>Chao Liu and Jiawei Han, “R-Proximity: Failure Proximity Defined via Statistical Debugging,” IEEE Transactions on Software Engineering, Sept. 2006. (under review) </li></ul><ul><li>Chao Liu, Zeng Lian and Jiawei Han, "How Bayesians Debug", the 6th IEEE International Conference on Data Mining, pp. 382-393, Hong Kong, China, Dec. 2006. </li></ul><ul><li>Chao Liu and Jiawei Han, "Failure Proximity: A Fault Localization-Based Approach", the 14th ACM SIGSOFT Symposium on the Foundations of Software Engineering, pp. 286-295, Portland, USA, Nov. 2006. </li></ul><ul><li>Chao Liu, "Fault-aware Fingerprinting: Towards Mutualism between Failure Investigation and Statistical Debugging", the 14th ACM SIGSOFT Symposium on the Foundations of Software Engineering, Portland, USA, Nov. 2006. </li></ul><ul><li>Chao Liu, Chen Chen, Jiawei Han and Philip S. Yu, "GPLAG: Detection of Software Plagiarism by Program Dependence Graph Analysis", the 12th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 872-881, Philadelphia, USA, Aug. 2006. </li></ul><ul><li>Qiaozhu Mei, Chao Liu, Hang Su and Chengxiang Zhai, "A Probabilistic Approach to Spatiotemporal Theme Pattern Mining on Weblogs", the 15th International Conference on World Wide Web, pp. 533-542, Edinburgh, Scotland, May 2006. </li></ul><ul><li>Chao Liu, Xifeng Yan and Jiawei Han, "Mining Control Flow Abnormality for Logic Error Isolation", 2006 SIAM International Conference on Data Mining, pp. 106-117, Bethesda, US, April 2006.
</li></ul><ul><li>Chao Liu , Xifeng Yan, Long Fei, Jiawei Han and Samuel Midkiff, "SOBER: Statistical Model-Based Bug Localization", the 5th joint meeting of the European Software Engineering Conference and ACM SIGSOFT Symposium on the Foundations of Software Engineering, pp. 286-295, Lisbon, Portugal, Sept. 2005. </li></ul><ul><li>William Yurcik and Chao Liu . "A First Step Toward Detecting SSH Identity Theft on HPC Clusters: Discriminating Cluster Masqueraders Based on Command Behavior" the 5th International Symposium on Cluster Computing and the Grid, pp. 111-120, Cardiff, UK, May 2005. </li></ul><ul><li>Chao Liu , Xifeng Yan, Hwanjo Yu, Jiawei Han and Philip S. Yu, "Mining Behavior Graphs for "Backtrace" of Noncrashing Bugs", In Proc. 2005 SIAM Int. Conf. on Data Mining, pp. 286-297, Newport Beach, US, April, 2005. </li></ul>
- 66. Example of Noncrashing Bugs
Buggy version 1 (m > 0 instead of m >= 0):
void subline(char *lin, char *pat, char *sub) {
  int i, lastm, m;
  lastm = -1;
  i = 0;
  while ((lin[i] != ENDSTR)) {
    m = amatch(lin, i, pat, 0);
    if (m > 0) {
      putsub(lin, i, m, sub);
      lastm = m;
    }
    if ((m == -1) || (m == i)) {
      fputc(lin[i], stdout);
      i = i + 1;
    } else
      i = m;
  }
}
Buggy version 2 (missing the (lastm != m) subclause):
void subline(char *lin, char *pat, char *sub) {
  int i, lastm, m;
  lastm = -1;
  i = 0;
  while ((lin[i] != ENDSTR)) {
    m = amatch(lin, i, pat, 0);
    if (m >= 0) {
      putsub(lin, i, m, sub);
      lastm = m;
    }
    if ((m == -1) || (m == i)) {
      fputc(lin[i], stdout);
      i = i + 1;
    } else
      i = m;
  }
}
Correct version:
void subline(char *lin, char *pat, char *sub) {
  int i, lastm, m;
  lastm = -1;
  i = 0;
  while ((lin[i] != ENDSTR)) {
    m = amatch(lin, i, pat, 0);
    if ((m >= 0) && (lastm != m)) {
      putsub(lin, i, m, sub);
      lastm = m;
    }
    if ((m == -1) || (m == i)) {
      fputc(lin[i], stdout);
      i = i + 1;
    } else
      i = m;
  }
}
- 67. Debugging Crashes Crashing Bugs
- 68. Bug Localization via Backtrace <ul><li>Can we identify the backtrace for noncrashing bugs? </li></ul><ul><li>Major challenges </li></ul><ul><ul><li>We do not know where the abnormality happens </li></ul></ul><ul><li>Observations </li></ul><ul><ul><li>Classifications depend on discriminative features, which can be regarded as a kind of abnormality </li></ul></ul><ul><ul><li>Can we extract a backtrace from classification results? </li></ul></ul>
- 69. Outline <ul><li>Motivation </li></ul><ul><li>Related Work </li></ul><ul><li>Classification of Program Executions </li></ul><ul><li>Extract “Backtrace” from Classification Dynamics </li></ul><ul><li>Mining Control Flow Abnormality for Logic Error Isolation </li></ul><ul><li>CP-Miner: Mining Copy-Paste Bugs </li></ul><ul><li>Conclusions </li></ul>
- 70. Related Work <ul><li>Crashing bugs </li></ul><ul><ul><li>Memory access monitoring </li></ul></ul><ul><ul><ul><li>Purify [HJ92], Valgrind [SN00] … </li></ul></ul></ul><ul><li>Noncrashing bugs </li></ul><ul><ul><li>Static program analysis </li></ul></ul><ul><ul><li>Traditional model checking </li></ul></ul><ul><ul><li>Model checking source code </li></ul></ul>
- 71. Static Program Analysis <ul><li>Methodology </li></ul><ul><ul><li>Examine source code directly </li></ul></ul><ul><ul><li>Enumerate all the possible execution paths without running the program </li></ul></ul><ul><ul><li>Check user-specified properties, e.g. </li></ul></ul><ul><ul><ul><li>free(p) …… (*p) </li></ul></ul></ul><ul><ul><ul><li>lock(res) …… unlock(res) </li></ul></ul></ul><ul><ul><ul><li>receive_ack() …… send_data() </li></ul></ul></ul><ul><li>Strengths </li></ul><ul><ul><li>Check all possible execution paths </li></ul></ul><ul><li>Problems </li></ul><ul><ul><li>Shallow semantics: only properties that map directly to source code structure can be checked </li></ul></ul><ul><li>Tools </li></ul><ul><ul><li>ESC [DRL+98], LCLint [EGH+94], ESP [DLS02], MC Checker [ECC00] … </li></ul></ul>
- 72. Traditional Model Checking <ul><li>Methodology </li></ul><ul><ul><li>Formally model the system under check in a particular description language </li></ul></ul><ul><ul><li>Exhaustively explore the reachable states to check desired or undesired properties </li></ul></ul><ul><li>Strengths </li></ul><ul><ul><li>Models deep semantics </li></ul></ul><ul><ul><li>Naturally fits event-driven systems, such as protocols </li></ul></ul><ul><li>Problems </li></ul><ul><ul><li>Significant manual effort in modeling </li></ul></ul><ul><ul><li>State space explosion </li></ul></ul><ul><li>Tools </li></ul><ul><ul><li>SMV [M93], SPIN [H97], Murphi [DDH+92] … </li></ul></ul>
- 73. Model Checking Source Code <ul><li>Methodology </li></ul><ul><ul><li>Run the real program in a sandbox </li></ul></ul><ul><ul><li>Manipulate event happenings, e.g., </li></ul></ul><ul><ul><ul><li>Incoming messages </li></ul></ul></ul><ul><ul><ul><li>Outcomes of memory allocation </li></ul></ul></ul><ul><li>Strengths </li></ul><ul><ul><li>Less manual specification effort </li></ul></ul><ul><li>Problems </li></ul><ul><ul><li>Application restrictions, e.g., </li></ul></ul><ul><ul><ul><li>Event-driven programs (still) </li></ul></ul></ul><ul><ul><ul><li>Requires a clear mapping between source code and logical events </li></ul></ul></ul><ul><li>Tools </li></ul><ul><ul><li>CMC [MPC+02], Verisoft [G97], Java PathFinder [BHP+-00] … </li></ul></ul>
- 74. Summary of Related Work <ul><li>Common to all these approaches: </li></ul><ul><ul><li>Semantic inputs are necessary </li></ul></ul><ul><ul><ul><li>Program model </li></ul></ul></ul><ul><ul><ul><li>Properties to check </li></ul></ul></ul><ul><ul><li>Restricted application scenarios </li></ul></ul><ul><ul><ul><li>Shallow semantics </li></ul></ul></ul><ul><ul><ul><li>Event-driven systems </li></ul></ul></ul>
- 75. Outline <ul><li>Motivation </li></ul><ul><li>Related Work </li></ul><ul><li>Classification of Program Executions </li></ul><ul><li>Extract “Backtrace” from Classification Dynamics </li></ul><ul><li>Mining Control Flow Abnormality for Logic Error Isolation </li></ul><ul><li>CP-Miner: Mining Copy-Paste Bugs </li></ul><ul><li>Conclusions </li></ul>
- 76. Example Revisited: the buggy subline and the fix

  void subline(char *lin, char *pat, char *sub)
  {
    int i, lastm, m;
    lastm = -1;
    i = 0;
    while ((lin[i] != ENDSTR)) {
      m = amatch(lin, i, pat, 0);
      if (m >= 0) {                    /* bug: missing lastm check */
        putsub(lin, i, m, sub);
        lastm = m;
      }
      if ((m == -1) || (m == i)) {
        fputc(lin[i], stdout);
        i = i + 1;
      }
      else
        i = m;
    }
  }

  The correct version guards the substitution with the last match:

    if ((m >= 0) && (lastm != m)) {

<ul><li>No memory violations </li></ul><ul><li>Not an event-driven program </li></ul><ul><li>No explicit error properties </li></ul>
- 77. Identification of Incorrect Executions <ul><li>A two-class classification problem </li></ul><ul><ul><li>How to abstract program executions </li></ul></ul><ul><ul><ul><li>Program behavior graph </li></ul></ul></ul><ul><ul><li>Feature selection </li></ul></ul><ul><ul><ul><li>Edges + Closed frequent subgraphs </li></ul></ul></ul><ul><li>Program behavior graphs </li></ul><ul><ul><li>Function-level abstraction of program behaviors </li></ul></ul>int main(){ ... A(); ... B(); } int A(){ ... } int B(){ ... C() ... } int C(){ ... }
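The function-level abstraction above can be sketched minimally: each execution yields a behavior graph whose edges (caller, callee pairs here; a simplifying assumption, since the deck also uses closed frequent subgraphs) serve as classification features.

```python
def behavior_edges(call_pairs):
    """Function-level behavior graph of one execution, abstracted as
    the set of observed (caller, callee) call edges.  Edge presence
    (or frequency) then serves as a feature for classifying the run
    as correct or incorrect."""
    return set(call_pairs)

# The slide's example: main calls A and B; B calls C.
edges = behavior_edges([("main", "A"), ("main", "B"), ("B", "C")])
```

Two runs that exercise different call edges thus produce different feature vectors, which is what makes the two-class classification possible.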
- 78. Values of Classification <ul><li>A graph classification problem </li></ul><ul><ul><li>Every execution gives one behavior graph </li></ul></ul><ul><ul><li>Two sets of instances: correct and incorrect </li></ul></ul><ul><li>Values of classification </li></ul><ul><ul><li>Classification itself does not readily work for bug localization </li></ul></ul><ul><ul><ul><li>The classifier only labels each run as correct or incorrect as a whole </li></ul></ul></ul><ul><ul><ul><li>It does not tell when the abnormality happens </li></ul></ul></ul><ul><ul><li>Successful classification relies on discriminative features </li></ul></ul><ul><ul><ul><li>Can discriminative features be treated as a kind of abnormality? </li></ul></ul></ul><ul><ul><li>When does the abnormality happen? </li></ul></ul><ul><ul><ul><li>Incremental classification? </li></ul></ul></ul>
- 79. Outline <ul><li>Motivation </li></ul><ul><li>Related Work </li></ul><ul><li>Classification of Program Executions </li></ul><ul><li>Extract “Backtrace” from Classification Dynamics </li></ul><ul><li>Mining Control Flow Abnormality for Logic Error Isolation </li></ul><ul><li>CP-Miner: Mining Copy-Paste Bugs </li></ul><ul><li>Conclusions </li></ul>
- 80. Incremental Classification <ul><li>Classification works only when instances of the two classes differ, so classification accuracy can serve as a measure of difference </li></ul><ul><li>Relate classification dynamics to bug-relevant functions </li></ul>
- 81. Illustration: Precision Boost (figure: behavior graphs of one correct and one incorrect execution, over functions main and A through H)
- 82. Bug Relevance <ul><li>Precision boost </li></ul><ul><ul><li>For each function F : </li></ul></ul><ul><ul><ul><li>Precision boost = exit precision - entrance precision </li></ul></ul></ul><ul><ul><li>Intuition </li></ul></ul><ul><ul><ul><li>Differences take place within the execution of F </li></ul></ul></ul><ul><ul><ul><li>Abnormalities happen while F is on the stack </li></ul></ul></ul><ul><ul><ul><li>The larger the precision boost, the more likely F is part of the backtrace </li></ul></ul></ul><ul><li>Bug-relevant function </li></ul>
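The precision-boost ranking above amounts to a small computation: given the incremental classifier's precision at each function's entrance and exit (the function names and numbers below are hypothetical illustrations, not results from the deck), the boost is exit minus entrance precision.

```python
def precision_boost(snapshots):
    """Rank functions by precision boost = exit - entrance precision.

    snapshots: dict mapping function name -> (entrance_precision,
    exit_precision), the incremental classification precision
    observed when the function is entered and exited.
    """
    boosts = {f: exit_p - ent_p for f, (ent_p, exit_p) in snapshots.items()}
    # Larger boost -> more likely the function is on the "backtrace".
    return sorted(boosts.items(), key=lambda kv: kv[1], reverse=True)

# Hypothetical precisions for three functions:
ranking = precision_boost({
    "main":    (0.50, 0.92),   # spans the whole run, so always boosts
    "subline": (0.55, 0.90),   # large boost -> likely bug-relevant
    "getline": (0.88, 0.90),   # small boost -> likely irrelevant
})
```

Note that main always receives a large boost, matching the later observation that the main function is always ranked bug-relevant.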
- 83. Outline <ul><li>Related Work </li></ul><ul><li>Classification of Program Executions </li></ul><ul><li>Extract “Backtrace” from Classification Dynamics </li></ul><ul><li>Case Study </li></ul><ul><li>Conclusions </li></ul>
- 84. Case Study <ul><li>Subject program </li></ul><ul><ul><li>replace: performs regular expression matching and substitution </li></ul></ul><ul><ul><li>563 lines of C code </li></ul></ul><ul><ul><li>17 functions involved </li></ul></ul><ul><li>Execution behaviors </li></ul><ul><ul><li>130 out of 5542 test cases fail to give correct outputs </li></ul></ul><ul><ul><li>No incorrect execution incurs a segmentation fault </li></ul></ul><ul><li>Logic bug </li></ul><ul><ul><li>Can we extract the backtrace for this bug? </li></ul></ul>

  The buggy subline differs from the correct one in a single condition:

    buggy:   if (m >= 0) {
    correct: if ((m >= 0) && (lastm != m)) {
- 85. Precision Pairs
- 86. Precision Boost Analysis <ul><li>Objective judgment of bug-relevant functions </li></ul><ul><li>The main function is always bug-relevant </li></ul><ul><li>Stepwise precision boost </li></ul><ul><li>Line-up property </li></ul>
- 87. Backtrace for Noncrashing Bugs
- 88. Method Summary <ul><li>Identify incorrect executions from program runtime behaviors </li></ul><ul><li>Classification dynamics can reveal a “backtrace” for noncrashing bugs without any semantic input </li></ul><ul><li>Data mining can contribute to software engineering and systems research in general </li></ul>
- 89. Outline <ul><li>Motivation </li></ul><ul><li>Related Work </li></ul><ul><li>Classification of Program Executions </li></ul><ul><li>Extract “Backtrace” from Classification Dynamics </li></ul><ul><li>Mining Control Flow Abnormality for Logic Error Isolation </li></ul><ul><li>CP-Miner: Mining Copy-Paste Bugs </li></ul><ul><li>Conclusions </li></ul>
- 90. An Example <ul><li>Replace program: 563 lines of C code, 20 functions </li></ul><ul><li>Symptom: 30 out of 5542 test cases fail to give correct outputs, and no crashes </li></ul><ul><li>Goal: localize the bug and prioritize manual examination </li></ul>

  void dodash(char delim, char *src, int *i, char *dest, int *j, int maxset)
  {
    while (…) {
      …
      if (isalnum(isalnum(src[*i+1])) && src[*i-1] <= src[*i+1]) {  /* buggy condition */
        for (k = src[*i-1]+1; k <= src[*i+1]; k++)
          junk = addst(k, dest, j, maxset);
        *i = *i + 1;
      }
      *i = *i + 1;
    }
  }
- 91. Difficulty &amp; Expectation <ul><li>Difficulty </li></ul><ul><ul><li>Statically, even small programs are complex due to dependencies </li></ul></ul><ul><ul><li>Dynamically, execution paths can vary significantly across all possible inputs </li></ul></ul><ul><ul><li>Logic errors have no apparent symptoms </li></ul></ul><ul><li>Expectations </li></ul><ul><ul><li>Unrealistic to fully relieve developers of debugging </li></ul></ul><ul><ul><li>Localize the buggy region </li></ul></ul><ul><ul><li>Prioritize manual examination </li></ul></ul>
- 92. Execution Profiling <ul><li>Full execution trace </li></ul><ul><ul><li>Control flow + value tags </li></ul></ul><ul><ul><li>Too expensive to record at runtime </li></ul></ul><ul><ul><li>Unwieldy to process </li></ul></ul><ul><li>Summarized control flow for conditionals (if, while, for) </li></ul><ul><ul><li>Branch evaluation counts </li></ul></ul><ul><ul><li>Lightweight to collect at runtime </li></ul></ul><ul><ul><li>Easy to process and effective </li></ul></ul>
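The summarized profile above can be sketched as a per-run counter of how often each conditional evaluates true and false (the counter structure and branch identifiers below are illustrative assumptions, not the deck's instrumentation):

```python
from collections import defaultdict

class BranchProfile:
    """Lightweight per-run profile: true/false counts per conditional."""
    def __init__(self):
        # branch_id -> [n_true, n_false]
        self.counts = defaultdict(lambda: [0, 0])

    def record(self, branch_id, outcome):
        """Called at every evaluation of an instrumented conditional."""
        self.counts[branch_id][0 if outcome else 1] += 1

# An instrumented conditional such as "if (m >= 0)" records
# its outcome on every evaluation during a run:
profile = BranchProfile()
for m in [3, -1, 0, 5, -1]:
    profile.record("subline:if(m>=0)", m >= 0)

n_true, n_false = profile.counts["subline:if(m>=0)"]
```

Compared with a full trace, only two integers per conditional survive each run, which is what makes the profiling lightweight.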
- 93. Analysis of the Example <ul><ul><li>A = isalnum(isalnum(src[*i+1])) </li></ul></ul><ul><ul><li>B = src[*i-1] &lt;= src[*i+1] </li></ul></ul><ul><li>An execution is logically correct until (A ∧ ¬B) evaluates to true when the evaluation reaches this condition </li></ul><ul><li>If we monitor program conditionals like A, their evaluations shed light on the hidden error and can be exploited for error isolation </li></ul>

  if (isalnum(isalnum(src[*i+1])) && src[*i-1] <= src[*i+1]) {
    for (k = src[*i-1]+1; k <= src[*i+1]; k++)
      junk = addst(k, dest, j, maxset);
    *i = *i + 1;
  }
- 94. Analysis of Branching Actions <ul><li>Correct vs. incorrect runs in program P </li></ul><ul><li>Across the 5542 test cases, the average true-evaluation probability of (A ∧ ¬B) is 0.727 in a correct execution and 0.896 in an incorrect one </li></ul><ul><li>The error location does exhibit detectable abnormal behavior in incorrect executions </li></ul>

  Contingency counts per run (rows: A, ¬A; columns: B, ¬B):

    Correct runs:                 Incorrect runs:
          B       ¬B                    B       ¬B
    A     n_AB    n_A¬B = 0       A     n_AB    n_A¬B ≥ 1
    ¬A    n_¬AB   n_¬A¬B          ¬A    n_¬AB   n_¬A¬B
- 95. Conditional Tests Work for Nonbranching Errors <ul><li>An off-by-one error can still be detected using conditional tests </li></ul>

  int makepat(char *arg, int start, char delim, char *pat)
  {
    …
    if (!junk)
      result = 0;
    else
      result = i + 1;  /* off-by-one error; should be: result = i */
    return result;
  }
- 96. Ranking Based on Boolean Bias <ul><li>Let input d_i have a desired output o_i . We execute P ; P passes the test iff o_i ’ is identical to o_i </li></ul><ul><ul><li>T_p = {t_i | o_i ’ = P(d_i) matches o_i } </li></ul></ul><ul><ul><li>T_f = {t_i | o_i ’ = P(d_i) does not match o_i } </li></ul></ul><ul><li>Boolean bias </li></ul><ul><ul><li>n_t : # times a boolean feature B evaluates true; n_f : # times it evaluates false </li></ul></ul><ul><ul><li>Boolean bias: π(B) = (n_t - n_f ) / (n_t + n_f ) </li></ul></ul><ul><ul><li>It encodes the distribution of B’s value: 1 if B always evaluates true, -1 if always false, in between for all other mixtures </li></ul></ul>
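The boolean bias formula above is directly computable from the two counts; a minimal sketch:

```python
def boolean_bias(n_t, n_f):
    """pi(B) = (n_t - n_f) / (n_t + n_f).

    Returns +1 if B always evaluates true, -1 if always false,
    and a value in between for any mixture.
    """
    assert n_t + n_f > 0, "B must be evaluated at least once"
    return (n_t - n_f) / (n_t + n_f)

bias_always_true  = boolean_bias(10, 0)
bias_always_false = boolean_bias(0, 10)
bias_mixed        = boolean_bias(6, 2)
```

Computed per run, this bias is exactly the per-execution observation that the sequences S_p and S_f on the next slides collect.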
- 97. Evaluation Abnormality <ul><li>Boolean bias for branch P </li></ul><ul><ul><li>the probability of being evaluated as true within one execution </li></ul></ul><ul><li>Suppose we have n correct and m incorrect executions, for any predicate P , we end up with </li></ul><ul><ul><li>An observation sequence for correct runs </li></ul></ul><ul><ul><ul><li>S_p = (X’_1, X’_2, …, X’_n) </li></ul></ul></ul><ul><ul><li>An observation sequence for incorrect runs </li></ul></ul><ul><ul><ul><li>S_f = (X_1, X_2, …, X_m) </li></ul></ul></ul><ul><li>Can we infer whether P is suspicious based on S_p and S_f ? </li></ul>
- 98. Underlying Populations <ul><li>Suppose the underlying distributions of boolean bias for correct and incorrect executions are f(X|θ_p ) and f(X|θ_f ) </li></ul><ul><li>S_p and S_f can be viewed as random samples from these underlying populations </li></ul><ul><li>Major heuristic: the larger the divergence between f(X|θ_p ) and f(X|θ_f ), the more relevant the branch P is to the bug </li></ul> (figure: hypothetical densities of evaluation bias for correct and incorrect runs)
- 99. Major Challenges <ul><li>No knowledge of the closed forms of either distribution </li></ul><ul><li>Usually, we do not have enough incorrect executions to estimate f(X|θ_f ) reliably </li></ul>
- 100. Our Approach: Hypothesis Testing
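The deck does not spell out the test statistic on this slide. As one plausible sketch (an assumption for illustration, not necessarily SOBER's exact formulation), the evaluation biases of a predicate P in correct and incorrect runs can be compared with a two-sample z-like score, where a larger score means the samples look more different and hence P looks more bug-relevant:

```python
from statistics import mean, pvariance

def abnormality_score(s_p, s_f):
    """Two-sample z-like statistic comparing the evaluation biases of
    a predicate P in correct runs (s_p) and incorrect runs (s_f).

    Larger score -> the two samples differ more -> P is more
    bug-relevant under the divergence heuristic.
    """
    n, m = len(s_p), len(s_f)
    var = pvariance(s_p) / n + pvariance(s_f) / m
    if var == 0:
        return 0.0
    return abs(mean(s_f) - mean(s_p)) / var ** 0.5

# Hypothetical evaluation biases of one predicate:
score = abnormality_score(
    s_p=[0.70, 0.72, 0.75, 0.71],   # correct runs
    s_f=[0.88, 0.91, 0.90],         # incorrect runs
)
```

A hypothesis-testing view then asks whether the two samples plausibly come from the same distribution; predicates where that hypothesis is strongly rejected are ranked as suspicious.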
- 101. Faulty Functions <ul><li>Motivation </li></ul><ul><ul><li>Bugs are not necessarily on branches </li></ul></ul><ul><ul><li>Higher confidence in function rankings than branch rankings </li></ul></ul><ul><li>Abnormality score for functions </li></ul><ul><ul><li>Calculate the abnormality score for each branch within each function </li></ul></ul><ul><ul><li>Aggregate them </li></ul></ul>
- 102. Two Evaluation Measures <ul><li>CombineRank </li></ul><ul><ul><li>Combine the branch scores by summation </li></ul></ul><ul><ul><li>Intuition: when a function contains many abnormal branches, it is likely bug-relevant </li></ul></ul><ul><li>UpperRank </li></ul><ul><ul><li>Take the largest branch score as the representative </li></ul></ul><ul><ul><li>Intuition: when a function has one extremely abnormal branch, it is likely bug-relevant </li></ul></ul>
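The two measures differ only in how per-branch abnormality scores roll up to a function. A sketch with hypothetical scores (the numbers are invented to show how the two measures can disagree, as in the dodash vs. omatch comparison):

```python
def combine_rank(branch_scores):
    # Sum: many mildly abnormal branches make a function suspicious.
    return {f: sum(s) for f, s in branch_scores.items()}

def upper_rank(branch_scores):
    # Max: one extremely abnormal branch makes a function suspicious.
    return {f: max(s) for f, s in branch_scores.items()}

# Hypothetical per-branch abnormality scores for two functions:
scores = {
    "dodash": [0.9, 0.8, 0.7],   # several moderately abnormal branches
    "omatch": [2.0, 0.1],        # one extreme branch
}
cr = combine_rank(scores)
ur = upper_rank(scores)
by_combine = max(cr, key=cr.get)   # top function under CombineRank
by_upper   = max(ur, key=ur.get)   # top function under UpperRank
```

With these scores the two measures pick different functions, which is why the deck evaluates both against the known bug locations.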
- 103. dodash vs. omatch: Which Function Is Likely Buggy, and Which Measure Is More Effective?
- 104. Bug Benchmark <ul><li>Bug benchmark </li></ul><ul><ul><li>Siemens Program Suite </li></ul></ul><ul><ul><ul><li>89 variants of 6 subject programs, each of 200-600 LOC </li></ul></ul></ul><ul><ul><ul><li>89 known bugs in total </li></ul></ul></ul><ul><ul><ul><li>Mainly logic (or semantic) bugs </li></ul></ul></ul><ul><ul><li>Widely used in software engineering research </li></ul></ul>
- 105. Results on Program “replace”
- 106. Comparison between CombineRank and UpperRank <ul><li>Buggy function ranked within top-k </li></ul>
- 107. Results on Other Programs
- 108. More Questions to Be Answered <ul><li>How should multiple errors in one program be handled? </li></ul><ul><li>How can bugs be detected when only very few failing test cases are available? </li></ul><ul><li>Is the method really more effective with more execution traces? </li></ul><ul><li>How can program semantics be integrated into this statistics-based testing algorithm? </li></ul><ul><li>How can program semantic analysis be combined with statistics-based analysis? </li></ul>
