HIMPS_slides

Slides for the research work: efficient bug signature mining via hierarchical instrumentation

  1. 1. Efficient Bug Signature Mining via Hierarchical Instrumentation. Zhiqiang Zuo (National University of Singapore), Siau-Cheng Khoo (National University of Singapore), Chengnian Sun (University of California, Davis)
  2. 2. Introduction
      • Bugs are prevalent
      • Statistical debugging
        • Analyze failing and passing executions
        • Find discriminative elements highly correlated with failure
        • Such elements are executed frequently in failing runs, but rarely in passing ones
      [Figure: buggy program → failing profiles and passing profiles → statistical debugging → discriminative elements]
  3. 3. Introduction
      • Assumption: every program element is potentially relevant to the failure
      • Consequence: the entire program is instrumented, and full-scale instrumentation incurs a hefty cost
      • Observation: only small portions of the code are relevant to a failure; 98%-99% of the monitored predicates are irrelevant
      • Reference: B. Liblit, M. Naik, A.X. Zheng, A. Aiken, and M.I. Jordan, “Scalable Statistical Bug Isolation”, in PLDI, 2005
  4. 4. Introduction
      [Figure: program P drawn as functions m1, m2, m3, m4, …, mk, each containing program elements e]
  5. 5. Introduction
      • Full instrumentation
      • Statistical debugging
      • Top-k suspicious elements
      [Figure: program P with functions m1, m2, m3, m4, …, mk and their elements e]
  6. 6. Introduction
      • Selective instrumentation
      • Statistical debugging
      • Same top-k suspicious elements
      [Figure: program P with functions m1, m2, m3, m4, …, mk and their elements e]
  7. 7. Introduction
      • Goal
        • Enhance the efficiency of statistical debugging
        • Uphold its effectiveness
      • Selective instrumentation
        • Instrument only highly bug-relevant elements
        • Prune away unnecessary instrumentation
      • Challenge: how to perform a safe and effective selection?
  8. 8. Hierarchical Instrumentation
      • Insight: information collected and measured for composite constructs (e.g., functions) can be used to guide the selection of program elements (e.g., predicates) for subsequent instrumentation
      • Hierarchical Instrumentation (HI)
        • Coarse-grained instrumentation and analysis, to obtain information about composite constructs (e.g., functions)
        • Fine-grained instrumentation and analysis, guided by the coarse-grained execution information
  9. 9. Hierarchical Instrumentation
      • Predicated bug signature mining (MPS)
        • Mine a discriminative predicate itemset as the signature
        • Return the top-k suspicious bug signatures
      • Efficient bug signature mining via HI (HIMPS)
        • Improve efficiency by conducting selective instrumentation
        • Uphold the original effectiveness
        • Produce the same top-k suspicious bug signatures
      • Reference: C. Sun and S.C. Khoo, “Mining Succinct Predicated Bug Signatures”, in FSE, 2013
  10. 10. Background
      • Mining Predicated Bug Signatures (MPS)
      • Predicated bug signature
        • Predicate kinds: branches, returns, scalar-pairs, float-points
        • Each profile is the set of predicates evaluated to true
        • Each profile is labeled as passing or failing
      • Suspiciousness measure
        • Discriminative significance DS(p, n)
        • p and n are the positive and negative supports of a given predicate
      • Preprocessing and bug signature mining
        • Convert the profiles to a mining dataset
        • Mine the itemset with a high DS value as the signature
  11. 11. Predicates Pruning
      • How to prune away unnecessary predicates?
      • Necessary condition
        • Given a signature P = {e1, e2, …, ek}, where function mi contains predicate ei, and n0 = arg min_n {IG(0, n) ≥ θ}:
          DS(p(P), n(P)) ≥ θ ⇒ ∀i ∈ [1, k], n(mi) ≥ n0
      • Safe pruning
        • Given the coarse-grained information n(mi)
        • Prune away unnecessary predicates whose DS value must be less than some threshold
  12. 12. Approach
      [Figure: program P with functions m1, m2, m3, m4, …, mk and their elements e]
  13. 13. Approach
      • Coarse-grained instrumentation: function entries
      • Coarse-grained information (n, p): n(m) is the negative support and p(m) the positive support of function m
      [Figure: program P with its functions annotated with (n, p) pairs: (8, 1), (4, 5), (3, 2), (7, 2), (1, 4)]
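The coarse-grained supports can be derived directly from function-level profiles. The following is a minimal sketch, assuming each coarse-grained profile is available as a set of executed function names plus a pass/fail label; the function name `function_supports` and the input layout are illustrative, not taken from the slides.

```python
from collections import Counter


def function_supports(profiles):
    """Compute (n, p) per function from coarse-grained profiles.

    profiles: iterable of (executed_functions, failed) pairs, where
    executed_functions is a collection of function names and failed is a bool.
    Returns {function: (negative_support, positive_support)}.
    """
    neg, pos = Counter(), Counter()
    for executed, failed in profiles:
        # A function counts once per run, however often it was entered.
        (neg if failed else pos).update(set(executed))
    return {m: (neg[m], pos[m]) for m in set(neg) | set(pos)}


# Hypothetical profiles: two failing runs and one passing run.
example = [({"m1", "m2"}, True), ({"m1"}, True), ({"m2"}, False)]
supports = function_supports(example)
```

Here `supports["m1"]` is `(2, 0)` and `supports["m2"]` is `(1, 1)`, matching the (n, p) notation used on the slide.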
  14. 14. Approach
      • Safe pruning
      • θ = 0.8, n0 = 6
      [Figure: program P with its functions annotated with (n, p) pairs: (8, 1), (4, 5), (3, 2), (7, 2), (1, 4)]
  15. 15. Approach
      • Safe pruning
      • θ = 0.8, n0 = 6
      • n(m1) ≥ n0 and n(m4) ≥ n0
      [Figure: program P with its functions annotated with (n, p) pairs: (8, 1), (4, 5), (3, 2), (7, 2), (1, 4); numeric labels recoverable from the figure: 0.5, 0.8, 0.7, 0.8, 0.4]
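The pruning step in this example can be sketched as follows. The helper names are hypothetical, and `min_negative_support` assumes the measure is nondecreasing in n when p = 0, so a linear scan finds the arg min. The slide fixes only m1 = (8, 1) and m4 = (7, 2); the assignment of the remaining pairs to m2, m3, and mk is an assumption made for illustration.

```python
def min_negative_support(measure, theta, max_n):
    """Smallest n in [0, max_n] with measure(0, n) >= theta, or None."""
    for n in range(max_n + 1):
        if measure(0, n) >= theta:
            return n
    return None


def safe_prune(supports, n0):
    """Keep only functions whose negative support reaches n0.

    supports: {function: (n, p)}. Predicates inside the dropped
    functions need not be instrumented in the fine-grained pass.
    """
    return [m for m, (n, _p) in supports.items() if n >= n0]


# (n, p) pairs from the slide; labels for m2, m3, mk are assumed.
supports = {"m1": (8, 1), "m2": (4, 5), "m3": (3, 2), "m4": (7, 2), "mk": (1, 4)}
kept = safe_prune(supports, n0=6)
```

With n0 = 6 the surviving functions are m1 and m4, which agrees with the conditions n(m1) ≥ n0 and n(m4) ≥ n0 shown on the slide.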
  16. 16. Threshold Boosting
      • How to get the threshold θ?
      • Issues with the threshold
        • Safeness: θ must be no greater than the top-kth DS value, so that the same top-k signatures are produced
        • Effectiveness: θ should be as high as possible, so that more predicates are pruned away
      • Solution: a boosting pass that performs MPS with only a small subset of highly suspicious predicates instrumented
  17. 17. Safeness of Threshold Boosting
      • Lower bound: the boosted threshold θ is a lower bound of the actual top-kth DS value
      • Let θ be the top-kth DS value of the signatures mined with only a few predicates instrumented, and let dsk be the actual top-kth DS value with all predicates instrumented. Then θ is a lower bound of dsk; formally, θ ≤ dsk.
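One way to see why the bound is plausible, under an assumption that is an interpretation rather than something stated on the slide: every signature found with only the boosting predicates instrumented would also be found, with the same DS value, under full instrumentation. The k-th largest value of a subset can never exceed the k-th largest value of the whole set. The DS values below are made up for illustration.

```python
def kth_largest(values, k):
    """Return the k-th largest value (k is 1-based)."""
    return sorted(values, reverse=True)[k - 1]


# Hypothetical DS values of all signatures under full instrumentation.
full_ds = [0.9, 0.8, 0.7, 0.4, 0.2]
# Hypothetical DS values reachable with only the boosting predicates;
# by assumption this is a subset of full_ds.
boost_ds = [0.8, 0.4, 0.2]

k = 2
theta = kth_largest(boost_ds, k)  # boosted threshold
ds_k = kth_largest(full_ds, k)    # actual top-kth DS value
```

In this toy instance theta is 0.4 and ds_k is 0.8, so pruning with theta cannot discard any of the true top-k signatures.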
  18. 18. Predicate Selection for Boosting
      • Correlation coefficient (CC) per subject:
          Subject   replace   space   grep   sed    gzip   Overall
          CC        0.69      0.71    0.63   0.46   0.71   0.64
      • If the DS value of a function is high, then the DS values of the predicates within this function are quite likely to be high as well
  19. 19. Approach
      • Rank functions in descending order of DS value
      [Figure: program P with its functions annotated with (n, p) pairs: (8, 1), (4, 5), (3, 2), (7, 2), (1, 4)]
  20. 20. Approach
      • Rank functions in descending order of DS value
      • Select the predicates within the top functions until the number of selected predicates reaches a percentage γ of all predicates
      • θ = 0.8
      [Figure: program P with its functions annotated with (n, p) pairs: (8, 1), (4, 5), (3, 2), (7, 2), (1, 4); numeric labels recoverable from the figure: 0.5, 0.8, 0.7]
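The selection rule above can be sketched as a greedy walk over the ranked functions. This is an illustrative sketch: the function name and inputs are hypothetical, the DS values of functions are taken as given, and taking whole functions (so the budget may be slightly overshot) is an assumption about how the cut-off is applied.

```python
def select_predicates_for_boosting(function_ds, predicates_by_function, gamma):
    """Pick predicates from the highest-DS functions until the count
    reaches a fraction gamma of all predicates.

    function_ds: {function: DS value at the coarse granularity}
    predicates_by_function: {function: list of predicate ids}
    gamma: fraction in (0, 1], e.g. 0.05 for 5%
    """
    total = sum(len(ps) for ps in predicates_by_function.values())
    budget = gamma * total
    selected = []
    for m in sorted(function_ds, key=function_ds.get, reverse=True):
        if len(selected) >= budget:
            break
        # Assumption: a function's predicates are taken as a whole.
        selected.extend(predicates_by_function.get(m, []))
    return selected


# Hypothetical program with three functions and ten predicates.
ds = {"a": 0.9, "b": 0.5, "c": 0.7}
preds = {"a": ["a1", "a2"], "b": ["b1", "b2", "b3", "b4"], "c": ["c1", "c2", "c3", "c4"]}
boost = select_predicates_for_boosting(ds, preds, gamma=0.3)
```

With γ = 0.3 the budget is three predicates, so the sketch takes function a (two predicates) and then function c, and never reaches the lowest-ranked function b.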
  21. 21. Algorithm: Predicated Bug Signature Mining via Hierarchical Instrumentation
        // first phase: coarse-grained instrumentation and analysis
        Instrument all function entries in the entire program
        Run all failing and partial passing test cases to collect coarse-grained profiles CP
        list ← AnalyzeCoarseGrainedProfiles(CP)
        // second phase: fine-grained instrumentation and analysis
        // first pass: threshold boosting
        boost ← SelectPredicatesForBoosting(list, γ)
        Instrument all predicates in boost
        Run all failing and passing tests to collect all fine-grained profiles BP
        BD ← Preprocess(BP)
        BS ← MineBugSignatures(BD, k)
        θ ← the top-kth DS value of signatures
        // second pass: safe pruning
        prospect ← PrunePredicates(list, θ)
        Instrument all predicates in prospect - boost
        Run all failing and passing tests to collect all fine-grained profiles PP
        PD ← Preprocess(PP)
        PS ← MineBugSignatures(PD, k)
        Return PS
  22. 22. Empirical Evaluation
      • Subjects
          Subject   Versions   LoC      Functions   Predicates   Tests
          replace   31         564      21          22,412       5,542
          space     34         6,199    125         461,566      13,585
          grep      12         10,068   121         1,418,835    809
          sed       16         14,427   163         2,377,612    363
          gzip      9          5,680    91          3,741,611    213
      • Setup: k = 1, γ = 5%
      • Software-artifact Infrastructure Repository: http://sir.unl.edu/content/sir.php
  23. 23. Profile Collection
      Execution time (in seconds) during profile collection; Ratio (%) is HIMPS total / MPS original
          Subject   MPS original   HIMPS coarse   HIMPS boost   HIMPS prune   HIMPS total   Ratio (%)
          replace   12,332         841            6,829         10,108        17,777        144.16
          space     293,760        11,765         29,219        124,670       165,655       56.39
          grep      148,803        934            8,638         18,557        28,129        18.90
          sed       68,474         549            3,690         38,164        42,403        61.93
          gzip      663,789        1,945          112,151       57,980        172,076       25.92
          Overall   237,431        3,207          32,105        49,896        85,208        61.46
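The Ratio column can be re-derived from the other columns. The short check below uses only the numbers in the table; the variable names are illustrative. It also suggests that the Overall ratio (61.46) is the mean of the per-subject ratios rather than the ratio of the Overall totals, which would be 85,208 / 237,431, or about 35.9%.

```python
# (MPS original, HIMPS coarse, HIMPS boost, HIMPS prune), in seconds.
rows = {
    "replace": (12332, 841, 6829, 10108),
    "space": (293760, 11765, 29219, 124670),
    "grep": (148803, 934, 8638, 18557),
    "sed": (68474, 549, 3690, 38164),
    "gzip": (663789, 1945, 112151, 57980),
}


def ratio_percent(original, coarse, boost, prune):
    """HIMPS total time as a percentage of the MPS time."""
    return 100.0 * (coarse + boost + prune) / original


ratios = {subject: ratio_percent(*r) for subject, r in rows.items()}
# Mean of the per-subject ratios, as the Overall row appears to report.
overall = sum(ratios.values()) / len(ratios)

for subject, value in ratios.items():
    print(f"{subject:8s} {value:7.2f}")
print(f"{'Overall':8s} {overall:7.2f}")
```

Only replace exceeds 100%, which means the three HIMPS passes together cost more than a single fully instrumented run on that small subject.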
  24. 24. Profile Collection
      Disk storage space used (in KB) for profiles; Ratio (%) is HIMPS total / MPS original
          Subject   MPS original   HIMPS coarse   HIMPS boost   HIMPS prune   HIMPS total   Ratio (%)
          replace   125,883        116            13,686        89,132        102,935       81.77
          space     6,242,337      1,950          401,591       2,174,017     2,577,558     41.29
          grep      1,145,575      170            116,008       391,439       507,617       44.31
          sed       864,367        125            49,515        288,398       338,038       39.11
          gzip      821,421        29             50,932        146,060       197,020       23.99
          Overall   1,839,916      478            126,346       617,809       744,634       46.09
  25. 25. Preprocessing & Mining
      Time (in seconds) and peak memory consumption (in KB) for preprocessing and mining together; Ratio (%) is HIMPS total / MPS original
          Subject   MPS original         HIMPS coarse       HIMPS boost & prune   Ratio (%)
                    Time       Memory    Time    Memory     Time      Memory      Time    Memory
          replace   45.89      240,974   0.03    3,161      41.45     231,617     90.41   96.12
          space     1,976.51   4,642,485 0.40    49,463     977.47    2,294,894   49.47   49.43
          grep      428.78     895,591   0.73    47,906     219.69    466,192     51.40   52.05
          sed       110.37     733,413   1.21    75,317     55.35     383,566     51.25   52.30
          gzip      144.03     786,913   1.85    114,622    58.10     327,594     41.63   41.63
  26. 26. Related Work
      • Statistical bug isolation
        • Entity and suspiciousness measure
        • Tarantula [ASE’05], Ochiai [ASE’09], CBI [PLDI’05], SOBER [FSE’05]
      • Bug signature identification
        • Contextual information, bug signature
        • RAPID [ASE’08], Cheng et al. [ISSTA’09], Sun and Khoo [FSE’13]
        • Jiang and Su [ASE’07]
      • Other automated debugging: program slicing, delta debugging
  27. 27. Conclusion
      • Hierarchical Instrumentation: selective instrumentation guided by coarse-grained information through safe pruning
      • Efficient predicated bug signature mining (HIMPS): one coarse-grained pass followed by two fine-grained passes
      • Performance improvement: 40% to 60% savings in time cost, peak memory, and disk space usage
  28. 28. Thank you! Questions, comments, advice?
  29. 29. [Figure: HIMPS workflow from start to end. The coarse-grained instrumenter turns the buggy program into a coarse-grained instrumented program, which is run on the failing tests and partial passing tests to produce coarse-grained profiles. Coarse-grained analysis yields a function list. Predicate selection for boosting feeds the fine-grained instrumenter; the fine-grained instrumented program is run on the failing and passing tests, and the fine-grained profiles go through preprocessing and bug signature mining to give top-k bug signatures and the threshold θ. Predicate pruning then yields the prospective predicates, which go through the fine-grained instrumenter, a run, preprocessing, and bug signature mining to give the final top-k bug signatures.]
  30. 30. Instrumentation
      • Coarse-grained
        • Function entries are instrumented
        • Each coarse-grained profile records the set of functions executed
      • Fine-grained
        • Predicates are tracked
        • Each fine-grained profile is the set of predicates evaluated to true during execution
  31. 31. Predicate Selection for Boosting
      • Goal: achieve a sufficiently high DS threshold for signatures by mining signatures with only a small set of predicates instrumented
      • Method
        • Use coarse-grained information, i.e., the DS values of functions
        • Rank functions in descending order of DS value, then select the predicates within the top functions until the number of selected predicates reaches a percentage γ of the total number of predicates
  32. 32. Coarse-grained Pruning Measure
      • Given F: N² → R defined over the intervals ([0, X], [0, Y]), Cp under the same domain should satisfy:
        • Cp is an upper bound of F, i.e., Cp(x, y) ≥ F(x, y)
        • Cp is nondecreasing, i.e., Cp(x, y) ≥ Cp(x-1, y) and Cp(x, y) ≥ Cp(x, y-1)
        • Cp is as close to F as possible, i.e., Σ{Cp(x, y) - F(x, y)} is minimal
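One construction that satisfies all three conditions is the running maximum of F over the lower-left rectangle, Cp(x, y) = max{F(x', y') : x' ≤ x, y' ≤ y}. Any nondecreasing upper bound of F must be at least this large at every point, so the sum of differences is minimal. The sketch below computes it by dynamic programming on a small grid; whether the authors derive Cp this way is not stated on the slide, and the function name and the sample grid are made up.

```python
def tightest_monotone_upper_bound(F):
    """Return Cp with Cp[x][y] = max of F over all x' <= x, y' <= y.

    F is a non-empty rectangular list of lists indexed as F[x][y].
    """
    X, Y = len(F), len(F[0])
    Cp = [[0.0] * Y for _ in range(X)]
    for x in range(X):
        for y in range(Y):
            best = F[x][y]
            if x > 0:
                best = max(best, Cp[x - 1][y])  # nondecreasing in x
            if y > 0:
                best = max(best, Cp[x][y - 1])  # nondecreasing in y
            Cp[x][y] = best
    return Cp


# Hypothetical values of F on a 3 x 3 grid.
F = [[0, 3, 1],
     [2, 1, 5],
     [4, 0, 2]]
Cp = tightest_monotone_upper_bound(F)
```

For this grid the result is `[[0, 3, 3], [2, 3, 5], [4, 4, 5]]`: every entry dominates the corresponding entry of F, and values never decrease along either axis.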
  33. 33. Necessary Condition
      • Let e be a fine-grained element and m be the composite construct containing e. Let p(e) (or p(m)) and n(e) (or n(m)) denote the number of passing and failing runs in which e (or m) is executed. Given a threshold θ:
          F(p(e), n(e)) ≥ θ ⇒ Cp(p(m), n(m)) ≥ θ
  34. 34. Coarse-grained Ranking Measure
      • The Cr value of a coarse-grained element should be highly correlated with the F values of the fine-grained elements it encloses
      • A high θ can be acquired by instrumenting the fine-grained elements in coarse-grained elements with a high Cr value
      • Derivation
        • F as Cr at the coarse granularity
        • Cp as Cr
