
Debug me


  1. Debug Me: Statistical-Based Fault Localization Toolbox
     Presented by: Noha Elprince (noha.elprince@uwaterloo.ca)
  2. Outline
     - Categories of Bug Localization Techniques
     - Software Bug Categories
     - Related Work
     - SOBER Approach
     - Debug Me
     - Future Work
  3. Categories of Bug Localization Techniques
     - Static analysis: locates bugs by inspecting the source code without running it.
       Con: tools are typically restricted to a particular language.
     - Dynamic analysis: locates bugs by labeling each execution as correct or incorrect
       and contrasting the runtime behavior of correct and incorrect executions.
       Con: a sufficient set of passing and failing executions may not be available in
       real-world scenarios.
  4. Software Bug Categories
     - Memory bugs: improper memory accesses and usage; extensively studied, with many
       effective detection tools available.
     - Concurrency bugs: wrong synchronization in concurrent execution; increasingly
       important as concurrent programming becomes pervasive; hard to detect.
     - Semantic bugs: violations of the design requirements or the programmer's
       intentions; the largest share (~80%*) of software bugs; no silver bullet.
     * Have Things Changed Now? An Empirical Study of Bug Characteristics in Modern
       Open Source Software [ASID'06]
  5. Trend of Bugs: Concurrency Bugs
     [Chart from: Have Things Changed Now? An Empirical Study of Bug Characteristics
     in Modern Open Source Software, ASID'06]
  6. Trend of Bugs: Memory vs. Semantic
     - Memory-related bugs have decreased because several effective detection tools
       have recently become available.
     - Semantic bugs are the dominant root cause; they are application-specific and
       difficult to fix, so more effort should go into detecting and fixing them.
     [Data from: Have Things Changed Now? An Empirical Study of Bug Characteristics
     in Modern Open Source Software, ASID'06]
  7. Subcategories of Semantic Bugs
     - Missing features
     - Missing cases
     - Wrong control flow
     - Exception handling
     - Processing (e.g., incorrect evaluation of expressions and equations)
     - Typos
     - Other wrong functionality implementations
  8. Related Work: Dynamic Bug Localization Techniques
     - Slicing-based: static slicing, dynamic slicing
     - Statistics-based: path profiles, model checking, predicate values
       (e.g., Liblit [LN+05], SOBER [LY05])
     - Graph-based: probabilistic data dependence graph, data flow graph,
       extended data flow graph
  9. Tarantula (ASE 2005, ISSTA 2007)
     - Intuition: statements that are primarily executed by failed test cases are more
       likely to be faulty than statements primarily executed by passed test cases.
     - Con: performs poorly when the program contains multiple bugs.
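     As a small illustration of slide 9, the sketch below computes the standard
     Tarantula suspiciousness value for one statement, suspiciousness(s) =
     (failed(s)/totalFailed) / (passed(s)/totalPassed + failed(s)/totalFailed).
     The counts in main are made-up numbers, not data from the papers.

     /* Sketch: Tarantula suspiciousness for one statement.
      * failed_s / passed_s: failing / passing tests that execute the statement;
      * total_failed / total_passed: totals over the whole test suite. */
     #include <stdio.h>

     double suspiciousness(int failed_s, int total_failed,
                           int passed_s, int total_passed)
     {
         double fail_ratio = total_failed ? (double)failed_s / total_failed : 0.0;
         double pass_ratio = total_passed ? (double)passed_s / total_passed : 0.0;
         if (fail_ratio + pass_ratio == 0.0)
             return 0.0;                  /* statement never executed by any test */
         return fail_ratio / (fail_ratio + pass_ratio);
     }

     int main(void)
     {
         /* Hypothetical statement run by 3 of 4 failing and 1 of 10 passing tests. */
         printf("suspiciousness = %.2f\n", suspiciousness(3, 4, 1, 10));
         return 0;
     }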
  10. Scalable Remote Bug Isolation (PLDI 2004, 2005)
     - Instruments the program with predicates:
       - Branches (true/false)
       - Function returns (<0, <=0, >0, >=0, ==0, !=0)
       - Scalar pairs: for each assignment x = ..., compare x against every in-scope
         variable y_i and constant c_j (x ==, <, <=, ... y_i or c_j)
     - Investigates how the probability of a predicate being true relates to the
       presence of a bug.
  11. Bug Isolation
     F(P) = number of failing runs in which P is observed to be true
     S(P) = number of successful runs in which P is observed to be true
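     As a hedged sketch of how these counts are used in [LN+05]: Failure(P) =
     F(P)/(F(P)+S(P)), Context(P) is the same ratio over runs in which P is merely
     observed (reached), and Increase(P) = Failure(P) - Context(P) ranks predicates.
     The variable names and the counts in main are illustrative, not from the paper.

     /* Sketch of the predicate scores in scalable statistical bug isolation [LN+05].
      * f_true, s_true: failing / successful runs in which P is observed to be true.
      * f_obs,  s_obs : failing / successful runs in which P is observed at all. */
     #include <stdio.h>

     static double ratio(int f, int s)
     {
         return (f + s) ? (double)f / (f + s) : 0.0;
     }

     /* Increase(P): how much more likely the program is to fail when P is true than
      * when P is merely reached; large positive values flag likely bug predictors. */
     double increase(int f_true, int s_true, int f_obs, int s_obs)
     {
         return ratio(f_true, s_true) - ratio(f_obs, s_obs);
     }

     int main(void)
     {
         /* Invented counts: P true in 40 failing / 5 passing runs,
          * reached in 50 failing / 100 passing runs. */
         printf("Increase(P) = %.2f\n", increase(40, 5, 50, 100));
         return 0;
     }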
  12. Problem in the Bug Isolation Technique
     The faulty version of subline drops the (lastm != m) part of the condition:

     void subline(char *lin, char *pat, char *sub)
     {
         int i, lastm, m;
         lastm = -1;
         i = 0;
         while (lin[i] != ENDSTR) {
             m = amatch(lin, i, pat, 0);
             if (m >= 0) {                /* correct code: (m >= 0) && (lastm != m) */
                 putsub(lin, i, m, sub);
                 lastm = m;
             }
             if ((m == -1) || (m == i)) {
                 fputc(lin[i], stdout);
                 i = i + 1;
             } else
                 i = m;
         }
     }

     - The predicate is evaluated to both true and false within one execution, so
       recording only whether it was ever observed true in a run is not enough.
  13. subline with the full condition (m >= 0) && (lastm != m):

     void subline(char *lin, char *pat, char *sub)
     {
         int i, lastm, m;
         lastm = -1;
         i = 0;
         while (lin[i] != ENDSTR) {
             m = amatch(lin, i, pat, 0);
             if ((m >= 0) && (lastm != m)) {
                 putsub(lin, i, m, sub);
                 lastm = m;
             }
             if ((m == -1) || (m == i)) {
                 fputc(lin[i], stdout);
                 i = i + 1;
             } else
                 i = m;
         }
     }
  14. SOBER Technique
     - A predicate can be evaluated multiple times during one execution.
     - Every evaluation yields either true or false.
     - A predicate is therefore a Boolean random variable that encodes program
       executions from a particular aspect.
  15. SOBER Technique [LY05, LY06]
     - Evaluation bias of a predicate P: the probability that P is evaluated as true
       within one execution.
     - Maximum-likelihood estimate: the number of true evaluations divided by the
       total number of evaluations in one run.
     - Each run gives one observation of the evaluation bias of P.
     - Given n correct and m incorrect executions, for any predicate P we obtain:
       - an observation sequence for correct runs, S_p = (X'_1, X'_2, ..., X'_n)
       - an observation sequence for incorrect runs, S_f = (X_1, X_2, ..., X_m)
     - Can we infer whether P is suspicious based on S_p and S_f?
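     A minimal sketch of the maximum-likelihood estimate described above; the counts
     in main are illustrative.

     /* Sketch: evaluation bias of predicate P in one run, estimated as
      * (# true evaluations) / (# total evaluations). Runs in which P is never
      * evaluated give no observation; -1 marks that case for the caller to skip. */
     #include <stdio.h>

     double evaluation_bias(long true_evals, long false_evals)
     {
         long total = true_evals + false_evals;
         return total ? (double)true_evals / total : -1.0;
     }

     int main(void)
     {
         /* Illustrative run: the predicate evaluated true 7 times and false once. */
         printf("bias = %.3f\n", evaluation_bias(7, 1));
         return 0;
     }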
  16. Underlying Populations
     - Imagine that the evaluation bias has one underlying distribution for correct
       executions and another for incorrect executions.
     - S_p and S_f can then be viewed as random samples from these two populations.
     - Main heuristic: the larger the divergence between the two distributions, the
       more relevant the predicate P is to the bug.
     [Figure: probability density of the evaluation bias (0 to 1) for correct vs.
     incorrect executions]
  17. Major Challenges
     - We have no knowledge of the closed forms of either distribution.
     - We usually do not have enough incorrect executions to estimate the failing
       distribution reliably.
     [Figure: the two evaluation-bias distributions, repeated]
  18. SOBER's Approach [LY05, LY06]
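     The formulas on this slide were not captured here, so the following is only a
     simplified sketch in the spirit of [LY05, LY06]: the evaluation biases observed
     in passing runs define a null model (sample mean and variance), and a predicate's
     bug-relevance score grows with how far the mean bias of the failing runs deviates
     from that model. It is an illustrative approximation with made-up numbers, not
     the exact statistic of the papers.

     /* Simplified SOBER-style bug-relevance score (illustrative approximation).
      * pass_bias / fail_bias: evaluation-bias observations from passing / failing runs. */
     #include <stdio.h>
     #include <math.h>

     static void mean_var(const double *x, int n, double *mean, double *var)
     {
         double s = 0.0, ss = 0.0;
         for (int i = 0; i < n; i++) { s += x[i]; ss += x[i] * x[i]; }
         *mean = s / n;
         *var  = ss / n - (*mean) * (*mean);
     }

     double bug_relevance(const double *pass_bias, int n,
                          const double *fail_bias, int m)
     {
         double mu_p, var_p, mu_f = 0.0;
         mean_var(pass_bias, n, &mu_p, &var_p);
         for (int i = 0; i < m; i++) mu_f += fail_bias[i];
         mu_f /= m;
         /* z-like statistic: deviation of the failing-run mean from the passing model */
         return fabs(mu_f - mu_p) / (sqrt(var_p / m) + 1e-12);
     }

     int main(void)
     {
         double pass[] = {0.10, 0.12, 0.08, 0.11, 0.09};  /* made-up passing-run biases */
         double fail[] = {0.45, 0.50, 0.40};              /* made-up failing-run biases */
         printf("s(P) = %.2f\n", bug_relevance(pass, 5, fail, 3));
         return 0;
     }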
  19. Algorithm Outputs
     - A list of program predicates ranked by the bug-relevance score s(P).
     - Higher-ranked predicates are regarded as more relevant to the bug.
     - What is the use? Top-ranked predicates suggest the likely buggy regions.
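     A small sketch of producing such a ranked list: sort predicates by descending
     s(P) and report the top of the list as candidate buggy regions. The predicate
     names and scores below are invented for illustration.

     /* Sketch: rank predicates by descending bug-relevance score s(P). */
     #include <stdio.h>
     #include <stdlib.h>

     struct pred { const char *name; double score; };

     static int by_score_desc(const void *a, const void *b)
     {
         double diff = ((const struct pred *)b)->score - ((const struct pred *)a)->score;
         return (diff > 0) - (diff < 0);
     }

     int main(void)
     {
         struct pred preds[] = {
             { "lin[i] != ENDSTR", 0.8 },
             { "m >= 0",           3.1 },
             { "m == i",           1.4 },
         };
         int n = sizeof preds / sizeof preds[0];

         qsort(preds, n, sizeof preds[0], by_score_desc);
         for (int i = 0; i < n; i++)          /* top-ranked predicates come first */
             printf("%d. s(P)=%.1f  %s\n", i + 1, preds[i].score, preds[i].name);
         return 0;
     }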
  20. SOBER's Experiment Results
     - Localization quality metric
     - Software bug benchmark
     - Quantitative metric
     - Related works: Cause Transition (CT) [CZ05], Statistical Debugging [LN+05]
     - Performance comparisons
  21. SOBER's Bug Benchmark
     - The ideal benchmark: a large number of known bugs in large-scale programs with
       an adequate test suite.
     - Siemens Program Suite:
       - 130 variants of 7 subject programs, each 100-600 LOC
       - 130 known bugs in total, mainly logic (semantic) bugs
     - Advantages: the bugs are known, so judgments are objective; the large number of
       bugs makes comparative studies statistically significant.
     - Disadvantage: small-scale subject programs.
     - State-of-the-art performance claimed in the literature so far: the
       cause-transition approach [CZ05].
  22. Localization Quality Metric: T-score
     - Starting from the statements reported by the tool, the program dependence graph
       is searched breadth-first until the faulty statement is reached; the T-score
       reports how much of the code is examined in the process (a sketch follows the
       two example slides below).
     - H. Cleve and A. Zeller, "Locating Causes of Program Failures" [ICSE'05]
  23. 1st Example
     [Figure: a 10-node program dependence graph; T-score = 70%]
  24. 2nd Example
     [Figure: a 10-node program dependence graph; T-score = 20%]
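     Since slides 23 and 24 show only figures, here is a hedged sketch of how a
     T-score of this kind can be computed, assuming the breadth-first-search
     formulation over the program dependence graph used by [CZ05]: starting from the
     node blamed by the tool, neighbors are explored until the faulty node is reached,
     and the score reflects how much of the graph had to be examined (or its
     complement, depending on the reporting convention). The 10-node chain graph in
     main is made up for illustration.

     /* Sketch: fraction of a (made-up) program dependence graph examined by a
      * breadth-first search from the blamed node until the faulty node is found. */
     #include <stdio.h>

     #define N 10

     double examined_fraction(int adj[N][N], int blamed, int fault)
     {
         int visited[N] = {0}, queue[N], head = 0, tail = 0, examined = 0;
         visited[blamed] = 1;
         queue[tail++] = blamed;
         while (head < tail) {
             int u = queue[head++];
             examined++;
             if (u == fault)
                 break;                       /* fault located, stop searching */
             for (int v = 0; v < N; v++)
                 if (adj[u][v] && !visited[v]) {
                     visited[v] = 1;
                     queue[tail++] = v;
                 }
         }
         return (double)examined / N;
     }

     int main(void)
     {
         int adj[N][N] = {0};
         for (int i = 0; i + 1 < N; i++)      /* a simple chain 0-1-2-...-9 */
             adj[i][i + 1] = adj[i + 1][i] = 1;
         /* The tool blames node 2; the fault is at node 4. */
         printf("examined = %.0f%%\n", 100.0 * examined_fraction(adj, 2, 4));
         return 0;
     }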
  25. Problems in Related Works
     - Cause Transition (CT) approach [CZ05]:
       - a variant of delta debugging [Z02]
       - previous state-of-the-art performance holder on the Siemens suite
       - published in ICSE'05 (May 15, 2005)
       - Con: relies on memory abnormality, so its performance is restricted.
     - Statistical Debugging (Liblit05) [LN+05]:
       - predicate ranking based on discriminant analysis
       - published in PLDI'05 (June 12, 2005)
       - Con: ignores evaluation patterns of predicates within each execution.
  26. Localized Bugs w.r.t. Examined Code
  27. Cumulative Effects w.r.t. Code Examination
  28. Top-k Selection
     - Regardless of the specific choice of k, both Liblit05 and SOBER outperform CT,
       the previous state-of-the-art holder.
     - From k = 2 to 10, SOBER consistently outperforms Liblit05.
  29. SOBER Limitations
     - Evaluation bias for return predicates is not explained.
     - Predicates within loops are not explained.
     - Does not handle scalar predicates.
     - Handles crashing and non-crashing bugs identically.
     - The test suite contains only pass/fail outcomes; skipped cases are not handled.
     - Bugs are ranked by sorting the top-k bug-relevance scores in descending order,
       without considering how k is determined.
  30. DebugMe: Architecture
     [Diagram: a buggy program and its test suite are fed into DebugMe, which produces
     a bug report]
  31. Test Suites' Specifications
  32. Experimental Evaluation
     [Charts: expected error vs. test suite (1-3) for the buggy code version and the
     correct code version, each compared against the optimal expected error]
     * The expected error is equivalent to the bug-relevance score of the top-1
       suspicious predicate.
  33. Experience
     - DebugMe was tested on three prepared programs along with their test suites.
     - No instrumentation was performed: all predicate evaluation is done inside
       DebugMe.
     - For predicates within loops, the quality of the test suite greatly affects the
       result: when the test suite covers all the cases a single loop may encounter,
       bug-localization accuracy improves.
  34. Future Work
     - Enhance DebugMe by overcoming SOBER's limitations.
     - Generalize DebugMe to debug several languages, chosen via an interactive GUI.
     - Study how test suites affect bug-localization accuracy.
     - Compare statistical methods with graph-based methods.
  35. Testing Problems
     - The test oracle problem: a test oracle is responsible for deciding whether a
       test case passes or not.
     - In dynamic analysis, expected results must be provided so they can be compared
       with the actual results, which calls for more complex oracles.
  36. References
     - [LY05] C. Liu, X. Yan, L. Fei, J. Han, and S. Midkiff. SOBER: Statistical
       model-based bug localization. In FSE: Foundations of Software Engineering, 2005.
     - [LY06] C. Liu, L. Fei, X. Yan, J. Han, and S. Midkiff. Statistical debugging: A
       hypothesis testing-based approach. IEEE Transactions on Software Engineering,
       32(10), 2006.
     - [CZ05] H. Cleve and A. Zeller. Locating causes of program failures. In Proc.
       27th Int. Conf. on Software Engineering (ICSE'05), 2005.
     - [LN+05] B. Liblit, M. Naik, A. Zheng, A. Aiken, and M. Jordan. Scalable
       statistical bug isolation. In Proc. ACM SIGPLAN Conf. on Programming Language
       Design and Implementation (PLDI'05), 2005.
     - [Z02] A. Zeller. Isolating cause-effect chains from computer programs. In Proc.
       ACM 10th Int. Symp. on Foundations of Software Engineering (FSE'02), 2002.
