Slideshow transcript
Slide 1: Learning near optimum inspection policies. Tim@Menzies.us (WVU), Zach Milton (WVU). Feb 5 2008
Slide 2: The Briand Threshold
[Figure: % defective modules detected (y-axis) vs. % LOC read (x-axis), both running to (100,100), with a line labeled "threshold".]
Goal: over threshold
Slide 3: “Manual Up”: the Koru Hypothesis
[Figure: % defective modules detected vs. % LOC read, up to (100,100); curves: Manual, threshold.]
Smaller modules have disproportionately more defects. If so, then we'll find more bugs sooner if we read “manualUp” (i.e. read the smallest modules first).
Slide 4: Optimum Detector
[Figure: % defective modules detected vs. % LOC read, up to (100,100); curves: optimal, Manual, threshold; X% marked on the x-axis.]
X% of the code is in defective modules. Some perfect oracle finds all the defective modules; when we inspect those modules manualUp, we find all the defects.
Slide 5: Sub-optimum, useful automatic detector
[Figure: % defective modules detected vs. % LOC read, up to (100,100); curves: optimal, Manual, useful, threshold; X% and Y% marked on the x-axis.]
Triggers on Y% of the code, not all of which is defective. Useful if above manual and threshold.
Slide 6: Comparing two detectors
[Figure: % defective modules detected vs. % LOC read; curves: optimal, detector 1, detector 2.]
Report detector performance as area = AUC(detector)/AUC(optimal)
• 0 <= area <= 1 (larger is better)
• For 10 data sets, 10 randomizations of ordering, 3-way hold-outs (66% train, 33% test):
• 300 numbers for each detector (10 x 10 x 3);
• compare with Mann-Whitney (99% confidence)
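The area measure on Slide 6 can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the slides do not say how AUC is computed, so the trapezoid rule is assumed here, and the function names and the two example curves are hypothetical.

```python
def auc(points):
    """Trapezoidal area under a curve given as (x, y) points sorted by x.

    Assumption: x = % LOC read, y = % defective modules detected.
    """
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area


def normalized_area(detector_curve, optimal_curve):
    """area = AUC(detector) / AUC(optimal), as defined on Slide 6."""
    return auc(detector_curve) / auc(optimal_curve)


# Hypothetical curves, for illustration only.
optimal = [(0, 0), (20, 100), (100, 100)]   # oracle: all defects after 20% of the LOC
detector = [(0, 0), (40, 80), (100, 80)]    # detector: 80% of defects after 40% of the LOC

print(normalized_area(detector, optimal))   # 6400 / 9000
```

The detector curve above is held flat after its last point, which is one way to read the pessimistic assumption described on Slide 7.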
Slide 7: Technical details
[Figure: % defective modules detected vs. % LOC read; the detector curve is drawn up to Y% LOC read, and the stretch from Y% to 100% is unknown.]
We don’t know the trajectory of % defective modules detected from Y% read to 100% read. We’ll make the most pessimistic assumption (so our actual results are better than what we report below).
Other assumptions:
• All bugs are treated equally (no concept of defect severity)
• Inspections are % effective at recognizing defective modules (and since we report the ratio of two AUC curves, this cancels out)
• So these results are independent of inspection effectiveness
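The cancellation claimed on Slide 7 can be written out as a short derivation. The symbols are assumptions introduced for illustration: $e$ stands for a constant inspection effectiveness (the slide's value is missing from the transcript), and $d(x)$ and $o(x)$ stand for the detector and optimal curves as functions of % LOC read.

```latex
\[
\text{area}
= \frac{\int_0^{100} e \, d(x) \, dx}{\int_0^{100} e \, o(x) \, dx}
= \frac{e \cdot \mathrm{AUC}(\text{detector})}{e \cdot \mathrm{AUC}(\text{optimal})}
= \frac{\mathrm{AUC}(\text{detector})}{\mathrm{AUC}(\text{optimal})},
\qquad e > 0 .
\]
```

The argument only holds if the same effectiveness applies to both curves and does not vary with module size.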
Slide 8: Three classes of detectors
• Manual methods
  – Manual up (inspect smallest modules first)
  – Manual down (inspect largest first)
• Traditional learners
  – J48, NaiveBayes, RIPPER
• A new learner
  – Different versions of WHICH
  – E.g. WHICH2loc discretizes the log of numbers into two bins and favors rules that select the least LOC
  – E.g. WHICH8 discretizes the log of numbers into 8 bins
• For each learner
  – Take the modules selected via learning
  – Sort them by LOC size
  – Inspect them smallest to largest
  – Track when we stumble over a module with defects
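The "for each learner" procedure on Slide 8 can be sketched as follows. This is an illustration, not the authors' code: the tuple layout (name, LOC, defective flag), the function name, and the example modules are all assumptions, and the test set is assumed to contain at least one defective module.

```python
def inspection_curve(modules, selected):
    """Build the (% LOC read, % defective modules detected) curve.

    modules:  list of (name, loc, defective) tuples for the whole test set
    selected: set of module names flagged by a detector
    """
    total_loc = sum(loc for _, loc, _ in modules)
    total_defective = sum(1 for _, _, bad in modules if bad)

    # Take the selected modules and sort them by LOC, smallest first.
    chosen = sorted((m for m in modules if m[0] in selected), key=lambda m: m[1])

    points = [(0.0, 0.0)]
    loc_read = 0
    found = 0
    for _, loc, bad in chosen:
        loc_read += loc              # effort spent reading this module
        if bad:
            found += 1               # we stumbled over a defective module
        points.append((100.0 * loc_read / total_loc,
                       100.0 * found / total_defective))
    return points


# Hypothetical test set: (name, LOC, defective?)
modules = [("a", 10, True), ("b", 40, False), ("c", 50, True), ("d", 100, True)]
curve = inspection_curve(modules, selected={"a", "b", "c"})
for x, y in curve:
    print(f"{x:5.1f}% LOC read -> {y:5.1f}% defective modules detected")
```

Manual up corresponds to selecting every module; manual down would need the sort order reversed.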
Slide 9: What is WHICH?
• WHICH = our new idea
  – Technically, it is a stochastic best-first search, or SBFS.
  – This type of search is implemented not with a tree, but with a stack.
• Motto of WHICH:
  – Start as you mean to go on
  – If the learned theory is to be assessed via criteria “P”
  – Use “P” at every step of growing and pruning the theory
• Note: standard learners
  – Grow/prune via criteria “Q”, then assess the learned theory via criteria “P”
Slide 10: The logic of WHICH
[Figure: a search tree with one path highlighted in red and another in blue.]
• If the red path in the above tree is a current rule that is scoring very well (via “P”), and the blue path is another rule that is also scoring well, why not skip adding one conjunction at a time?
• Instead, combine the two paths so far and see if that works out better.
• This would essentially skip the growing a bit and move right to a potentially more optimal solution
Slide 11: WHICH Implementation
[Figure: a stack of single-range rules (outlook=overcast, humidity=high, rain=true, humidity=low, rain=false, ...); two of them are combined into outlook=overcast AND rain=true.]
• Items in a stack are scored and sorted via criteria “P”
• Once the stack is built, two rules are selected randomly based on their scores and combined.
• The new rule is then scored and placed back in the stack.
• It is placed in sorted order.
Slide 12: WHICH Implementation Continued
[Figure: the stack now also holds combined rules such as outlook=overcast AND rain=true and humidity=high AND outlook=overcast AND rain=true, alongside the single-range rules.]
• New rules that score high have a better chance of being combined.
• This leads to bigger rules over time.
• This process is repeated several times until either
  – a total number of picks is reached
  – or a criterion is met (an early stopping condition)
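The loop described on Slides 11 and 12 can be sketched in Python as below. This is a simplified illustration, not the WHICH implementation: rules are assumed to be frozensets of (attribute, value) pairs that are AND-ed together, the OR-merging of same-attribute ranges is left out, the stopping condition is only a fixed number of picks, and the function name, toy rows, and precision-style scorer are hypothetical stand-ins for criteria “P”.

```python
import bisect
import random


def which_search(seed_rules, score, picks=100, rng=None):
    """Stochastic best-first search over a sorted 'stack' of rules.

    seed_rules: iterable of frozensets of (attribute, value) pairs
    score:      function rule -> non-negative number (stands in for "P")
    Returns the stack as a list of (score, rule), best first.
    """
    rng = rng or random.Random(1)
    seeds = list(dict.fromkeys(seed_rules))            # de-duplicate, keep order
    stack = sorted(((score(r), r) for r in seeds), key=lambda sr: -sr[0])
    seen = set(seeds)
    for _ in range(picks):
        # Higher-scoring rules are more likely to be picked; the tiny
        # offset keeps the weights valid when every score is zero.
        weights = [s + 1e-9 for s, _ in stack]
        (_, a), (_, b) = rng.choices(stack, weights=weights, k=2)
        new_rule = a | b                               # combine the two rules
        if new_rule in seen:
            continue
        seen.add(new_rule)
        s = score(new_rule)
        keys = [-old for old, _ in stack]              # ascending keys for bisect
        stack.insert(bisect.bisect_right(keys, -s), (s, new_rule))
    return stack


# Toy data (hypothetical): (row, is_target_class)
rows = [
    ({"outlook": "overcast", "rain": "true"}, True),
    ({"outlook": "overcast", "rain": "false"}, False),
    ({"outlook": "sunny", "rain": "true"}, False),
    ({"outlook": "sunny", "rain": "false"}, False),
]


def matches(rule, row):
    return all(row.get(attr) == value for attr, value in rule)


def score(rule):
    """Precision of the rule on the toy rows (0.0 if nothing matches)."""
    hits = [label for row, label in rows if matches(rule, row)]
    return sum(hits) / len(hits) if hits else 0.0


seeds = sorted({frozenset([(a, v)]) for row, _ in rows for a, v in row.items()},
               key=sorted)
best_score, best_rule = which_search(seeds, score, picks=50)[0]
print(best_score, sorted(best_rule))
```

Keeping the stack sorted on every insert means the best rule found so far is always at the front, so the search can be stopped at any pick.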
Slide 13: WHICH Summary
• WHICH initially creates a sorted stack of all attribute ranges in isolation.
• It then, based on score, randomly selects two rules from the stack, combines them, and places the new rule in the stack in sorted order.
• It continues to do this until a stopping criterion is met.
• WHICH supports both conjunctions and disjunctions.
• If the two rules selected both contain different ranges from the same attribute, they are OR'd together instead of AND'd.
[Figure: combining "outlook=sunny AND rain=true" with "outlook=overcast" gives "outlook = [ sunny OR overcast ] AND rain = true".]
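The combination rule on Slide 13 (OR within an attribute, AND across attributes) can be sketched like this. The representation is an assumption made for illustration: a rule is a dict mapping each attribute to the set of values it allows, and the function names are hypothetical.

```python
def combine(rule_a, rule_b):
    """Merge two rules without modifying either input.

    Same attribute in both rules: the value sets are unioned (OR).
    Different attributes: both constraints are kept (AND).
    """
    merged = {attr: set(values) for attr, values in rule_a.items()}
    for attr, values in rule_b.items():
        merged.setdefault(attr, set()).update(values)
    return merged


def matches(rule, row):
    """A row matches if, for every attribute, its value is one of those allowed."""
    return all(row.get(attr) in values for attr, values in rule.items())


a = {"outlook": {"sunny"}, "rain": {"true"}}   # outlook=sunny AND rain=true
b = {"outlook": {"overcast"}}                  # outlook=overcast
c = combine(a, b)                              # outlook=[sunny OR overcast] AND rain=true

print(c)
print(matches(c, {"outlook": "overcast", "rain": "true"}))   # True
print(matches(c, {"outlook": "overcast", "rain": "false"}))  # False
```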
Slide 14: Sample results
[Figure: sample result curves labeled WHICH2, manual up, manual down, and others.]
But how representative are these results?
Slide 15: Results type #1: 5/8 examples. WHICH > manual > traditional
Slide 16: “areas” in cm1
1. Distributions of results
#key, min, q1, median, q3, max
which2, 0.0, 57.4, 68.1, 71.5, 81.5
manualUp, 48.3, 57.4, 59.8, 65.3, 71.5
nBayes, 36.2, 46.0, 52.1, 59.1, 69.2
manualDown, 33.6, 40.3, 47.6, 49.3, 60.2
which8loc, 0.0, 0.0, 0.0, 0.0, 16.1
which8, 0.0, 0.0, 11.4, 26.2, 35.6
which4loc, 0.0, 0.0, 0.0, 0.0, 10.4
which4, 0.0, 0.0, 0.0, 41.2, 69.0
which2loc, 0.0, 0.0, 0.0, 0.0, 40.7
jRip, 0.0, 0.0, 5.8, 11.5, 24.1
j48, 0.0, 0.0, 0.1, 12.9, 33.3
2. Statistical results comparing the distributions (which has the largest median ranked values?)
#key, ties, win, loss, win-loss @ 99%
which2, 1, 9, 0, 9
manualUp, 1, 9, 0, 9
nBayes, 0, 8, 2, 6
manualDown, 0, 7, 3, 4
which8, 3, 3, 4, -1
which4, 3, 3, 4, -1
jRip, 3, 3, 4, -1
j48, 3, 3, 4, -1
which8loc, 2, 0, 8, -8
which4loc, 2, 0, 8, -8
which2loc, 2, 0, 8, -8
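The ties/win/loss columns can be reproduced in spirit with a pairwise Mann-Whitney comparison. The sketch below is an assumption-laden illustration, not the authors' procedure: it uses the normal approximation to the U statistic with average ranks for ties (without the tie correction to the variance), decides a win or loss by the sign of z, and all function names and example numbers are hypothetical.

```python
from statistics import NormalDist


def rank_sum_z(xs, ys):
    """z score of the Mann-Whitney U statistic for xs against ys."""
    pooled = sorted([(v, 0) for v in xs] + [(v, 1) for v in ys])
    rank_sum_x = 0.0
    i = 0
    while i < len(pooled):
        j = i
        while j < len(pooled) and pooled[j][0] == pooled[i][0]:
            j += 1                                  # pooled[i:j] share one value
        avg_rank = (i + 1 + j) / 2.0                # mean of ranks i+1 .. j
        rank_sum_x += avg_rank * sum(1 for k in range(i, j) if pooled[k][1] == 0)
        i = j
    n, m = len(xs), len(ys)
    u = rank_sum_x - n * (n + 1) / 2.0
    mean = n * m / 2.0
    sd = (n * m * (n + m + 1) / 12.0) ** 0.5
    return (u - mean) / sd


def tally(results, confidence=0.99):
    """results: dict name -> list of 'area' numbers. Returns ties/win/loss per name."""
    z_crit = NormalDist().inv_cdf(1 - (1 - confidence) / 2)   # two-sided
    table = {name: {"ties": 0, "win": 0, "loss": 0} for name in results}
    for a in results:
        for b in results:
            if a == b:
                continue
            z = rank_sum_z(results[a], results[b])
            if abs(z) < z_crit:
                table[a]["ties"] += 1
            elif z > 0:
                table[a]["win"] += 1
            else:
                table[a]["loss"] += 1
    return table


# Hypothetical numbers, for illustration only.
high = [80 + 0.1 * i for i in range(30)]
mid = [50 + 0.1 * i for i in range(30)]
for name, row in tally({"a": high, "b": mid, "c": list(mid)}).items():
    print(name, row["ties"], row["win"], row["loss"], row["win"] - row["loss"])
```

With 11 detectors, each one is compared against 10 others, which is why ties + win + loss sums to 10 in every row of the slide tables.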
Slide 17: “areas” in KC1
#key, min, q1, median, q3, max
which2, 71.4, 73.8, 76.0, 78.0, 81.8
manualUp, 64.5, 65.8, 67.6, 68.9, 70.0
nBayes, 54.9, 60.2, 61.9, 63.0, 67.7
which4, 0.0, 49.8, 52.9, 55.2, 60.5
manualDown, 39.7, 42.2, 43.3, 45.2, 47.7
j48, 11.6, 20.5, 27.8, 31.7, 40.1
jRip, 10.2, 17.3, 21.3, 25.2, 32.4
which8loc, 0.0, 0.0, 0.0, 1.0, 2.2
which8, 0.0, 0.0, 0.0, 2.0, 33.9
which4loc, 0.0, 0.0, 0.0, 0.0, 1.1
which2loc, 0.0, 0.0, 0.0, 0.0, 2.1
#key, ties, win, loss, win-loss @ 99%
which2, 0, 10, 0, 10
manualUp, 0, 9, 1, 8
nBayes, 0, 8, 2, 6
which4, 0, 7, 3, 4
manualDown, 0, 6, 4, 2
j48, 0, 5, 5, 0
jRip, 0, 4, 6, -2
which8loc, 1, 2, 7, -5
which8, 3, 0, 7, -7
which4loc, 2, 0, 8, -8
which2loc, 2, 0, 8, -8
Slide 18: “areas” in KC2
#key, min, q1, median, q3, max
which2, 65.6, 76.0, 81.6, 84.6, 88.5
manualUp, 57.9, 65.4, 69.3, 71.0, 76.6
nBayes, 47.0, 54.8, 58.7, 61.0, 69.4
which4, 43.1, 52.5, 59.4, 66.8, 79.6
manualDown, 37.9, 43.1, 46.1, 52.3, 62.4
which8, 26.3, 36.5, 41.2, 47.6, 56.5
j48, 26.0, 36.1, 41.2, 45.9, 59.8
jRip, 22.2, 36.0, 42.2, 49.5, 65.2
which8loc, 0.0, 0.0, 0.0, 0.0, 5.9
which4loc, 0.0, 0.0, 0.0, 0.0, 2.9
which2loc, 0.0, 0.0, 0.0, 0.0, 3.1
#key, ties, win, loss, win-loss @ 99%
which2, 0, 10, 0, 10
manualUp, 0, 9, 1, 8
which4, 1, 7, 2, 5
nBayes, 1, 7, 2, 5
manualDown, 1, 5, 4, 1
jRip, 3, 3, 4, -1
which8, 2, 3, 5, -2
j48, 2, 3, 5, -2
which8loc, 2, 0, 8, -8
which4loc, 2, 0, 8, -8
which2loc, 2, 0, 8, -8
Slide 19: “areas” in MW1_mod
#key, min, q1, median, q3, max
which2, 35.8, 57.4, 62.4, 70.8, 83.3
manualDown, 42.8, 52.1, 60.2, 63.7, 71.8
manualUp, 37.1, 44.0, 47.8, 51.9, 62.5
which8, 0.1, 35.6, 39.3, 47.6, 60.4
nBayes, 19.5, 33.1, 41.7, 47.7, 62.1
which4, 0.0, 25.8, 42.7, 49.8, 60.6
j48, 0.0, 10.0, 20.0, 24.3, 42.9
jRip, 0.0, 7.9, 15.8, 31.2, 49.4
which8loc, 0.0, 0.0, 0.0, 0.0, 10.5
which4loc, 0.0, 0.0, 0.0, 0.0, 10.4
which2loc, 0.0, 0.0, 0.0, 0.0, 25.6
#key, ties, win, loss, win-loss @ 99%
which2, 1, 9, 0, 9
manualDown, 1, 9, 0, 9
manualUp, 2, 6, 2, 4
which4, 3, 5, 2, 3
nBayes, 3, 5, 2, 3
which8, 2, 5, 3, 2
jRip, 1, 3, 6, -3
j48, 1, 3, 6, -3
which8loc, 2, 0, 8, -8
which4loc, 2, 0, 8, -8
which2loc, 2, 0, 8, -8
manual down wins?
Slide 20: “areas” in PC1
#key, min, q1, median, q3, max
which2, 0.0, 0.0, 65.0, 71.1, 81.8
manualUp, 52.1, 58.4, 60.6, 63.4, 71.6
nBayes, 36.4, 46.1, 51.5, 53.4, 60.9
manualDown, 32.3, 41.9, 44.6, 46.2, 55.3
j48, 3.1, 12.5, 19.2, 24.6, 41.5
jRip, 0.0, 11.0, 15.1, 23.2, 30.8
which8, 0.0, 9.1, 22.6, 30.7, 47.7
which8loc, 0.0, 0.0, 7.4, 12.7, 22.1
which4loc, 0.0, 0.0, 3.8, 14.8, 30.3
which4, 0.0, 0.0, 0.0, 50.3, 59.0
which2loc, 0.0, 0.0, 0.0, 9.7, 26.3
#key, ties, win, loss, win-loss @ 99%
manualUp, 1, 9, 0, 9
which2, 2, 8, 0, 8
nBayes, 1, 8, 1, 7
manualDown, 1, 6, 3, 3
which8, 3, 3, 4, -1
jRip, 3, 3, 4, -1
j48, 3, 3, 4, -1
which4, 7, 0, 3, -3
which8loc, 3, 0, 7, -7
which4loc, 3, 0, 7, -7
which2loc, 3, 0, 7, -7
Slide 21: Results type #2: 2/8 examples. Manual worse than (WHICH or traditional data miners)
Slide 22: “areas” in KC3_mod
#key, min, q1, median, q3, max
which2, 73.3, 82.4, 87.3, 90.5, 95.4
nBayes, 45.5, 59.2, 64.2, 69.6, 75.4
manualUp, 50.7, 57.5, 64.2, 68.1, 77.4
which4, 0.0, 40.5, 47.8, 58.6, 67.2
manualDown, 31.3, 39.5, 47.6, 55.6, 66.8
which8, 0.0, 36.2, 46.7, 52.7, 62.1
j48, 0.0, 13.6, 23.1, 28.9, 42.6
jRip, 0.0, 13.1, 17.7, 23.9, 54.2
which8loc, 0.0, 0.0, 0.0, 0.0, 43.0
which4loc, 0.0, 0.0, 0.0, 8.3, 19.7
which2loc, 0.0, 0.0, 6.6, 18.9, 39.9
#key, ties, win, loss, win-loss @ 99%
which2, 0, 10, 0, 10
nBayes, 1, 8, 1, 7
manualUp, 1, 8, 1, 7
which8, 2, 5, 3, 2
which4, 2, 5, 3, 2
manualDown, 2, 5, 3, 2
j48, 1, 3, 6, -3
jRip, 2, 2, 6, -4
which2loc, 2, 1, 7, -6
which4loc, 2, 0, 8, -8
which8loc, 1, 0, 9, -9
Slide 23: “areas” in PC3_mod
#key, min, q1, median, q3, max
which2, 70.6, 76.0, 79.3, 82.7, 88.4
nBayes, 58.8, 63.0, 67.4, 69.0, 75.4
which4, 56.2, 62.2, 65.3, 68.3, 77.5
manualDown, 48.9, 55.3, 57.5, 60.1, 65.2
manualUp, 43.1, 47.7, 49.9, 52.4, 59.0
j48, 0.0, 17.4, 22.7, 26.3, 36.5
which8, 0.0, 13.6, 31.9, 36.7, 43.7
jRip, 0.0, 6.3, 12.5, 19.4, 34.4
which4loc, 0.0, 2.1, 5.6, 9.8, 16.4
which8loc, 0.0, 0.0, 0.0, 4.1, 16.1
which2loc, 0.0, 0.0, 1.9, 6.6, 21.5
#key, ties, win, loss, win-loss @ 99%
which2, 0, 10, 0, 10
which4, 1, 8, 1, 7
nBayes, 1, 8, 1, 7
manualDown, 0, 7, 3, 4
manualUp, 0, 6, 4, 2
which8, 1, 4, 5, -1
j48, 1, 4, 5, -1
jRip, 0, 3, 7, -4
which4loc, 1, 1, 8, -7
which2loc, 2, 0, 8, -8
which8loc, 1, 0, 9, -9
manual down wins?
Slide 24: Once, manual beats (WHICH or traditional data miners)
Slide 25: “areas” in MC2_mod
#key, min, q1, median, q3, max
manualUp, 63.3, 70.9, 74.3, 78.3, 80.4
nBayes, 21.4, 46.6, 55.9, 59.1, 79.1
manualDown, 29.7, 38.1, 42.8, 47.2, 57.4
j48, 21.9, 29.3, 43.7, 55.4, 69.7
jRip, 12.7, 17.0, 28.5, 35.2, 56.4
which8, 0.0, 11.2, 21.9, 27.4, 42.4
which8loc, 0.0, 0.0, 0.0, 0.0, 29.8
which4loc, 0.0, 0.0, 0.0, 5.6, 14.9
which4, 0.0, 0.0, 5.6, 25.3, 47.9
which2loc, 0.0, 0.0, 0.0, 0.0, 21.0
which2, 0.0, 0.0, 0.0, 40.8, 99.7
#key, ties, win, loss, win-loss @ 99%
manualUp, 0, 10, 0, 10
nBayes, 0, 9, 1, 8
manualDown, 1, 7, 2, 5
j48, 1, 7, 2, 5
jRip, 1, 5, 4, 1
which8, 3, 3, 4, -1
which4, 4, 1, 5, -4
which2, 5, 0, 5, -5
which4loc, 4, 0, 6, -6
which2loc, 4, 0, 6, -6
which8loc, 3, 0, 7, -7
Slide 26: Overall WHICH2 > manual > traditional
Slide 27: Across all data sets
#key, min, q1, median, q3, max
which2, 0.0, 66.8, 77.6, 85.6, 99.7
manualUp, 37.1, 56.5, 63.7, 70.2, 80.4
nBayes, 19.5, 52.9, 61.2, 69.6, 82.4
manualDown, 29.7, 42.3, 46.4, 53.4, 71.8
which4, 0.0, 35.6, 53.7, 63.9, 96.7
which8, 0.0, 18.6, 35.5, 47.0, 92.5
j48, 0.0, 18.3, 27.9, 42.9, 72.0
jRip, 0.0, 13.3, 23.9, 39.7, 65.2
which8loc, 0.0, 0.0, 0.0, 6.7, 92.5
which4loc, 0.0, 0.0, 0.0, 9.8, 96.7
which2loc, 0.0, 0.0, 0.0, 11.2, 97.0
#key, ties, win, loss, win-loss @ 99%
which2, 0, 10, 0, 10
nBayes, 1, 8, 1, 7
manualUp, 1, 8, 1, 7
which4, 0, 7, 3, 4
manualDown, 0, 6, 4, 2
which8, 1, 4, 5, -1
j48, 1, 4, 5, -1
jRip, 0, 3, 7, -4
which8loc, 2, 0, 8, -8
which4loc, 2, 0, 8, -8
which2loc, 2, 0, 8, -8
Slide 28: Conclusions
Slide 29: Overall
• Don’t assess learners without a usage context.
  – Here: context = “read less, find more”
• Some support for the Koru hypothesis
• Value of manual (up or down) questionable
  – Only outstandingly better in one data set
  – And worse than other methods in 4/10 data sets
• WHICH2
  – The general winner
  – Near optimum
    • Min: 0%
    • Lower quartile: 67%
    • Median: 78%
    • 3rd quartile: 86%
    • Max: 99%
  – Still room for improvement
Slide 30: Early stopping rules (useful, a little interesting)
[Figure: % defective modules detected vs. % LOC read; curves: optimal, detector 1.]
Watch inspection rules to learn when enough is enough
Slide 31: Learning the actual number of defects (very useful, very interesting)
[Figure: % defective modules detected vs. % LOC read; curve1 = optimal - real defects, curve2 = inspections.]
Q: Can we learn curve1 from watching the growth of curve2?
A: Maybe. WHICH2’s (50%, 75%) percentile = (79%, 86%) (i.e. getting pretty close to curve2)


