Transcript

  • 1. Debug Me: Statistical-based Fault Localization Toolbox. Presented by: Noha Elprince, noha.elprince@uwaterloo.ca
  • 2. Outline
    - Categories of Bug Localization Techniques
    - Software Bug Categories
    - Related Work
    - SOBER Approach
    - Debug Me
    - Future Work
  • 3. Categories of Bug Localization Techniques
    - Static Analysis:
      - Locates bugs by checking the source code.
      - Con: restricted in scope to a particular language.
    - Dynamic Analysis:
      - Locates bugs by labeling each execution as either correct or incorrect, then contrasting the runtime behavior of correct and incorrect executions.
      - Con: passing and failing runs covering the relevant control flows may not be available in real-world scenarios.
  • 4. Software Bug Categories
    - Memory bugs
      - Improper memory accesses and usage
      - Well studied; effective detection tools exist
    - Concurrency bugs
      - Wrong synchronization in concurrent execution
      - Increasingly important with the trend toward concurrent programs
      - Hard to detect
    - Semantic bugs
      - Violations of design requirements or programmer intentions
      - The biggest share (~80%*) of software bugs
      - No silver bullet
    * Have Things Changed Now? An Empirical Study of Bug Characteristics in Modern Open Source Software [ASID'06]
  • 5. Trend of bugs: Concurrency Bugs (chart)
    * Have Things Changed Now? An Empirical Study of Bug Characteristics in Modern Open Source Software [ASID'06]
  • 6. Trend of bugs: Memory vs. Semantic
    - Memory-related bugs have decreased because several effective detection tools have become available recently.
    - Semantic bugs are the dominant root cause, as they are application-specific and difficult to fix. => More effort should be put into detecting and fixing them.
    * Have Things Changed Now? An Empirical Study of Bug Characteristics in Modern Open Source Software [ASID'06]
  • 7. Subcategories of Semantic Bugs
    - Missing features
    - Missing cases
    - Wrong control flow
    - Exception handling
    - Processing (e.g., incorrect evaluation of expressions and equations)
    - Typos
    - Other wrong functionality implementations
  • 8. Related Work: Dynamic Bug Localization Techniques
    - Slicing-based: static slicing, dynamic slicing
    - Statistical-based: path profiles, model checking, predicate values (e.g., Liblit'05, SOBER'05)
    - Graph-based: probabilistic data dependence graph, data flow graph, extended data flow graph
  • 9. Tarantula (ASE 2005, ISSTA 2007)
    - Intuition: statements that are primarily executed by failed test cases are more likely to be faulty than those that are primarily executed by passed test cases (its suspiciousness formula is sketched below).
    - Con: performs poorly when there are multiple bugs.
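    For reference, the Tarantula suspiciousness metric (as defined in the Tarantula papers) scores each statement s by the relative fraction of failing versus passing test cases that execute it:

    \[ \mathrm{susp}(s) = \frac{\frac{\mathrm{failed}(s)}{\mathrm{totalfailed}}}{\frac{\mathrm{passed}(s)}{\mathrm{totalpassed}} + \frac{\mathrm{failed}(s)}{\mathrm{totalfailed}}} \]

    where failed(s) and passed(s) count the failing and passing test cases that execute s; statements executed mostly by failing tests get scores close to 1.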
  • 10. Scalable Remote Bug Isolation (PLDI 2004, 2005)
    - Looks at predicates:
      - Branches (true/false)
      - Function returns (<0, <=0, >0, >=0, ==0, !=0)
      - Scalar pairs: for each assignment x = ..., find all variables y_i and constants c_j, and track each comparison of x (==, <, <=, ...) against y_i / c_j
    - Investigates how the probability of a predicate being true relates to the presence of a bug. (A small illustration of these predicate classes follows below.)
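    As an illustration of these predicate classes, here is a hypothetical C function (illustration only, not output of the actual instrumentation tool), annotated with the predicates such instrumentation would track:

    /* Hypothetical example of Liblit-style instrumentation predicates
       (illustration only, not real tool output). */
    int clamp_diff(int x, int y) {
        int z = x + 1;     /* scalar pairs: z < y, z <= y, z == y, ... and the
                              same comparisons of z against the constant 1   */
        if (z > y) {       /* branch predicates: (z > y) true / false        */
            z = y;
        }
        return z - y;      /* return predicates: <0, <=0, ==0, !=0, >=0, >0  */
    }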
  • 11. Bug Isolation
    - F(P) = number of failing runs in which P is observed to be true
    - S(P) = number of successful runs in which P is observed to be true
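    From these counts, [LN+05] defines for each predicate the observed probability that a run fails given that P was true at least once:

    \[ \mathrm{Failure}(P) = \frac{F(P)}{S(P) + F(P)} \]

    The ranking in [LN+05] then uses Increase(P) = Failure(P) - Context(P), where Context(P) is the same ratio computed over the runs in which P was merely observed (reached), whether true or false.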
  • 12. Problem in Bug Isolation technique
    - Consider the subline function (shown on the next slide): its predicates are evaluated to both true and false within a single execution, so recording only whether a predicate is ever observed true in a run is not enough.
  • 13. The subline function:
    void subline(char *lin, char *pat, char *sub) {
        int i, lastm, m;
        lastm = -1;
        i = 0;
        while ((lin[i] != ENDSTR)) {
            m = amatch(lin, i, pat, 0);
            if ((m >= 0) && (lastm != m)) {
                putsub(lin, i, m, sub);
                lastm = m;
            }
            if ((m == -1) || (m == i)) {
                fputc(lin[i], stdout);
                i = i + 1;
            } else
                i = m;
        }
    }
  • 14. SOBER Technique
    - A predicate can be evaluated multiple times during one execution.
    - Every evaluation gives either true or false.
    - Therefore, a predicate is simply a Boolean random variable, which encodes program executions from a particular aspect.
  • 15. SOBER Technique [LY05, LY06]
    - Evaluation bias of predicate P
      - Definition: the probability of P being evaluated as true within one execution.
      - Maximum likelihood estimate: the number of true evaluations over the total number of evaluations in one run (see the formula below).
      - Each run gives one observation of the evaluation bias of P.
    - Suppose we have n correct and m incorrect executions; for any predicate P we end up with:
      - an observation sequence for correct runs: S_p = (X'_1, X'_2, ..., X'_n)
      - an observation sequence for incorrect runs: S_f = (X_1, X_2, ..., X_m)
    - Can we infer whether P is suspicious based on S_p and S_f?
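    Concretely, following [LY05], if predicate P is evaluated n_t times to true and n_f times to false within one run, its observed evaluation bias in that run is:

    \[ \pi(P) = \frac{n_t}{n_t + n_f} \]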
  • 16. Underlying Populations
    - Imagine the underlying distributions of evaluation bias for correct executions and for incorrect executions.
    - S_p and S_f can be viewed as random samples from these two underlying populations, respectively.
    - One major heuristic: the larger the divergence between the two distributions, the more relevant the predicate P is to the bug.
    (Figure: probability densities of evaluation bias for correct and for incorrect runs.)
  • 17. Major Challenges
    (Figure: probability densities of evaluation bias for correct and for incorrect runs.)
    - No knowledge of the closed forms of either distribution.
    - Usually, we do not have enough incorrect executions to estimate the incorrect-run distribution reliably.
  • 18. SOBER's Approach [LY05, LY06]
  • 19. Algorithm Outputs
    - A list of program predicates ranked by the bug-relevance score s(P).
    - Higher-ranked predicates are regarded as more relevant to the bug.
    - What's the use? Top-ranked predicates suggest the possible buggy regions. (A small end-to-end sketch of such a ranking pipeline follows below.)
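    As a rough end-to-end sketch of this pipeline, the following minimal C program (with made-up bias data) ranks predicates by a simple two-sample statistic over passing vs. failing evaluation biases; this is a stand-in illustration, not SOBER's actual hypothesis-testing-based score:

    /* Minimal sketch of statistical predicate ranking (illustration only).
     * Assumes per-run evaluation biases have already been collected; ranks
     * predicates by a two-sample t-like statistic over passing vs. failing
     * runs. SOBER's real score is hypothesis-testing based, not this. */
    #include <math.h>
    #include <stdio.h>

    #define NUM_PREDS 3
    #define NUM_PASS  4
    #define NUM_FAIL  3

    /* Evaluation bias of each predicate in each run: n_t / (n_t + n_f). */
    static const double pass_bias[NUM_PREDS][NUM_PASS] = {
        {0.10, 0.12, 0.08, 0.11},   /* P0 */
        {0.50, 0.48, 0.52, 0.49},   /* P1 */
        {0.90, 0.88, 0.91, 0.89},   /* P2 */
    };
    static const double fail_bias[NUM_PREDS][NUM_FAIL] = {
        {0.80, 0.85, 0.78},         /* P0: bias shifts sharply in failing runs */
        {0.51, 0.47, 0.50},         /* P1: essentially unchanged               */
        {0.92, 0.87, 0.90},         /* P2: essentially unchanged               */
    };

    static double mean(const double *x, int n) {
        double s = 0.0;
        for (int i = 0; i < n; i++) s += x[i];
        return s / n;
    }

    static double variance(const double *x, int n, double m) {
        double s = 0.0;
        for (int i = 0; i < n; i++) s += (x[i] - m) * (x[i] - m);
        return (n > 1) ? s / (n - 1) : 0.0;
    }

    /* Stand-in bug-relevance score: absolute two-sample t-like statistic. */
    static double score(const double *p, int np, const double *f, int nf) {
        double mp = mean(p, np), mf = mean(f, nf);
        double se = sqrt(variance(p, np, mp) / np + variance(f, nf, mf) / nf);
        return fabs(mf - mp) / (se + 1e-9);   /* epsilon avoids division by zero */
    }

    int main(void) {
        double s[NUM_PREDS];
        for (int i = 0; i < NUM_PREDS; i++)
            s[i] = score(pass_bias[i], NUM_PASS, fail_bias[i], NUM_FAIL);

        /* Print predicates in descending order of score. */
        int done[NUM_PREDS] = {0};
        for (int r = 0; r < NUM_PREDS; r++) {
            int best = -1;
            for (int i = 0; i < NUM_PREDS; i++)
                if (!done[i] && (best < 0 || s[i] > s[best])) best = i;
            done[best] = 1;
            printf("rank %d: P%d  score = %.2f\n", r + 1, best, s[best]);
        }
        return 0;
    }

    In this toy data, predicate P0's evaluation bias shifts sharply between passing and failing runs, so it is ranked first; a real implementation would collect the biases via instrumentation and compute SOBER's divergence-based score instead of the t-like stand-in.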
  • 20. SOBER's Experiment Results
    - Localization quality metric
    - Software bug benchmark
    - Quantitative metric
    - Related works: Cause Transition (CT) [CZ05]; Statistical Debugging [LN+05]
    - Performance comparisons
  • 21. SOBER's Bug Benchmark
    - The dream benchmark: a large number of known bugs in large-scale programs with adequate test suites.
    - Siemens Program Suite:
      - 130 variants of 7 subject programs, each of 100-600 LOC
      - 130 known bugs in total, mainly logic (or semantic) bugs
    - Advantages:
      - Known bugs, so judgments are objective.
      - A large number of bugs, so a comparative study is statistically significant.
    - Disadvantages:
      - Small-scale subject programs.
    - State-of-the-art performance claimed in the literature so far: the cause-transition approach [CZ05].
  • 22. Localization Quality Metric (T-score): H. Cleve and A. Zeller, "Locating Causes of Program Failures" [ICSE'05]
  • 23. 1st Example (figure: graph with nodes 1-10): T-score = 70%
  • 24. 2nd Example (figure: graph with nodes 1-10): T-score = 20%
  • 25. Problems in Related Works
    - Cause Transition (CT) approach [CZ05]
      - A variant of delta debugging [Z02]
      - Previous state-of-the-art performance holder on the Siemens suite
      - Published in ICSE'05, May 15, 2005
      - Con: it relies on memory abnormality, hence its performance is restricted.
    - Statistical Debugging (Liblit05) [LN+05]
      - Predicate ranking based on discriminant analysis
      - Published in PLDI'05, June 12, 2005
      - Con: ignores the evaluation patterns of predicates within each execution.
  • 26. Localized bugs w.r.t. Examined Code
  • 27. Cumulative Effects w.r.t. Code Examination
  • 28. Top-k Selection
    - Regardless of the specific choice of k, both Liblit05 and SOBER perform better than CT, the previous state-of-the-art holder.
    - From k = 2 to 10, SOBER consistently outperforms Liblit05.
  • 29. SOBER Limitations
    - Evaluation bias for return predicates is not explained.
    - Predicates within loops are not explained.
    - Does not handle scalar predicates.
    - Handles crashing and non-crashing bugs identically.
    - The test suite contains only pass/fail cases; skipped cases are not handled.
    - Bugs are ranked by sorting the top k predicates' bug-relevance scores in descending order, without considering how k is determined.
  • 30. DebugMe: Architecture (diagram: Buggy Program + Test Suite -> DebugMe -> Bug Report)
  • 31. Test Suites' specifications
  • 32. Experimental Evaluation
    (Charts: expected error vs. test suite (1-3) for the buggy code version and the correct code version, each compared against the optimal expected error.)
    * Expected error is equivalent to the bug-relevance score of the top-1 suspicious predicate.
  • 33. Experience
    - Tested DebugMe on 3 prepared programs along with their test suites.
    - No instrumentation was performed, as all predicate evaluation was done inside DebugMe.
    - For predicates within loops, the quality of the test suite greatly affects the result: whenever the test suite contains all the cases that may be encountered in a single loop, the bug localization accuracy improves.
  • 34. Future Work
    - Enhance DebugMe by overcoming SOBER's limitations.
    - Generalize DebugMe to support debugging several languages, chosen via an interactive GUI.
    - Study how test suites affect bug localization accuracy.
    - Compare statistical methods with graph-based methods.
  • 35. Testing Problems
    - The test oracle problem:
      - A test oracle is responsible for deciding whether a test case passes or not.
      - In dynamic analysis, expected results must be provided so that they can be compared with the actual results. => More complex oracles are needed.
  • 36. References
    - [LY05] C. Liu, X. Yan, L. Fei, J. Han, and S. Midkiff. SOBER: Statistical model-based bug localization. In FSE: Foundations of Software Engineering, 2005.
    - [LY06] C. Liu, L. Fei, X. Yan, J. Han, and S. Midkiff. Statistical debugging: A hypothesis testing-based approach. IEEE Transactions on Software Engineering, 32(10), 2006.
    - [CZ05] H. Cleve and A. Zeller. Locating causes of program failures. In Proc. 27th Int. Conf. on Software Engineering (ICSE'05), 2005.
    - [LN+05] B. Liblit, M. Naik, A. Zheng, A. Aiken, and M. Jordan. Scalable statistical bug isolation. In Proc. ACM SIGPLAN Conf. on Programming Language Design and Implementation (PLDI'05), 2005.
    - [Z02] A. Zeller. Isolating cause-effect chains from computer programs. In Proc. 10th ACM Int. Symp. on the Foundations of Software Engineering (FSE'02), 2002.
