Local vs. Global Models for Effort Estimation and Defect Prediction


Talk at IEEE ASE 2011

Tim Menzies, Andrew Butcher (WVU)
Andrian Marcus (Wayne State)
Thomas Zimmermann (Microsoft)
David Cok (GrammaTech)


  1. LOCAL VS. GLOBAL MODELS FOR EFFORT ESTIMATION AND DEFECT PREDICTION
      Tim Menzies, Andrew Butcher (WVU); Andrian Marcus (Wayne State); Thomas Zimmermann (Microsoft); David Cok (GrammaTech)
  2. PREMISE
      Something is very wrong with data mining research in software engineering
      •  Need less “algorithm mining” and more “data mining”
      •  Handle “conclusion instability”
      Need to do a different kind of data mining
      •  Cluster, then learn
      •  Learning via “envy”
  3. Less “algorithm mining”; more “data mining”
  4. TOO MUCH MINING?
      Porter & Selby, 1990:
      •  Evaluating Techniques for Generating Metric-Based Classification Trees, JSS
      •  Empirically Guided Software Development Using Metric-Based Classification Trees, IEEE Software
      •  Learning from Examples: Generation and Evaluation of Decision Trees for Software Resource Analysis, IEEE TSE
      In 2011, Hall et al. (TSE, pre-print):
      •  reported 100s of similar studies
      •  L learners on D data sets in an M*N cross-val
      What is your next paper?
      •  Hopefully not D*L*M*N
  5. THE FIELD IS CALLED “DATA MINING”, NOT “ALGORITHM MINING”
      To understand data mining, look at the data, not the algorithms
      Our results should be insights about data,
      •  not trivia about (say) decision tree algorithms
      Besides, the thing that most predicts for performance is the data, not the algorithm:
      •  Domingos & Pazzani: Optimality of the Simple Bayesian Classifier under Zero-One Loss, Machine Learning, Volume 29, pages 103-130, 1997
  6. Handle “conclusion instability”
  7. CONCLUSION INSTABILITY: WHAT WORKS THERE DOES NOT WORK HERE
  8. Conclusion Instability: what works there does not work here
      Posnett et al. [2011]
      Zimmermann [2009]: learned defect predictors from 622 pairs of projects ⟨project1, project2⟩
      •  In only 4% of pairs did project1’s predictors work for project2
      Kitchenham [2007]: studies comparing effort models learned from local or imported data
      •  1/3 better, 1/3 same, 1/3 worse
      Jørgensen [2004]: 15 studies comparing model-based to expert-based estimation
      •  1/3 better, 1/3 same, 1/3 worse
      Mair [2005]: studies comparing regression to analogy methods for effort estimation
      •  7/20 better, 4/20 same, 9/20 worse
  9. ROOT CAUSE OF CONCLUSION INSTABILITY?
      HYPOTHESIS #1: Any one of…
      •  Over-generalization across different kinds of projects? Solve with “delphi localization”
      •  Noisy data?
      •  Too little data?
      •  Poor statistical technique?
      •  Stochastic choice within data miner (e.g. random forests)
      •  Insert idea here
      HYPOTHESIS #2: SE is an inherently varied activity
      •  So conclusion instability can’t be fixed; it must be managed
      •  Needs different kinds of data miners: cluster, then learn; learning via “envy”
  10. SOLVE CONCLUSION INSTABILITY WITH “DELPHI LOCALIZATIONS”?
      Restrict data mining to just related projects
      Ask an expert to find the right local context
      •  Are we sure they’re right?
      •  Posnett et al. 2011: What is the right level for learning? Files or packages? Methods or classes? Changes from study to study
      And even if they are “right”:
      •  Should we use those contexts?
      •  What if there is not enough info in our own delphi localization?
  11. DELPHI LOCALIZATIONS
      Q: What to do about rare zones?
      A: Select the nearest ones from the rest. But how?
  12. Cluster, then learn
  13. KOCAGUNELI [2011]: CLUSTERING TO FIND “LOCAL”
      TEAK: estimates from “k” nearest neighbors
      •  “k” auto-selected per test case
      •  Pre-processor to cluster data, remove worrisome regions
      •  IEEE TSE, Jan’11
      ESEM’11:
      •  Train within one delphi localization
      •  Or train on all and see what it picks
      •  Result #1: usually, cross is as good as within
      •  Result #2: given a choice of both, TEAK picks “within” as much as “cross”
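A minimal Python sketch of the analogy-based estimation idea behind this slide. This is not TEAK itself (TEAK auto-selects “k” per test case and prunes worrisome, high-variance regions of the case base); it only shows the underlying nearest-neighbor estimate, with a Euclidean distance and k=3 as assumed defaults.

```python
import math

def euclidean(p, q):
    """Distance between two numeric feature vectors (an assumed metric)."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

def knn_effort(target, cases, k=3, dist=euclidean):
    """Estimate effort for 'target' as the median effort of its k nearest cases.
    'cases' is a list of (feature_vector, effort) pairs from past projects."""
    nearest = sorted(cases, key=lambda c: dist(target, c[0]))[:k]
    efforts = sorted(effort for _, effort in nearest)
    return efforts[len(efforts) // 2]

# Example: three past projects with two features each, plus known efforts.
past = [([2.0, 100.0], 12), ([2.5, 120.0], 14), ([9.0, 900.0], 80)]
print(knn_effort([2.2, 110.0], past, k=2))   # -> 14 (upper median of the 2 nearest efforts)
```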
  14. LESSON: DATA MAY NOT DIVIDE NEATLY ON RAW DIMENSIONS
      The best description for SE projects may be synthesized dimensions extracted from the raw dimensions
  15. SYNTHESIZED DIMENSIONS
      PCA: e.g. Nagappan [2006]
      •  Finds orthogonal “components”
      •  Transforms N correlated variables to fewer uncorrelated “components”
      •  Component[i] accounts for as much variability as possible
      •  Component[j > i] accounts for remaining variability
      •  O(N²) to generate
      Fastmap: Faloutsos [1995]
      •  O(2N) generation of an axis of large variability
      •  Pick any point W; find X furthest from W; find Y furthest from X
      •  c = dist(X, Y); all points have distances a, b to (X, Y)
      •  x = (a² + c² − b²) / 2c
      •  y = sqrt(a² − x²)
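A short Python sketch of the Fastmap-style projection on this slide: pick any W, take X furthest from W, take Y furthest from X, then place every point with the cosine rule x = (a² + c² − b²)/2c and y = sqrt(a² − x²). The Euclidean distance is an assumed stand-in for whatever project distance is used.

```python
import math

def euclidean(p, q):
    """Assumed distance over numeric project features."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

def furthest(point, points, dist):
    """The item in 'points' furthest from 'point' (one linear pass)."""
    return max(points, key=lambda q: dist(point, q))

def fastmap_xy(points, dist=euclidean):
    """Project each point relative to the axis between two distant pivot points."""
    w = points[0]                              # pick any point W
    x_pivot = furthest(w, points, dist)        # X = furthest from W
    y_pivot = furthest(x_pivot, points, dist)  # Y = furthest from X
    c = dist(x_pivot, y_pivot)
    if c == 0:                                 # all points identical: nothing to project
        return [(0.0, 0.0) for _ in points]
    projected = []
    for p in points:
        a, b = dist(p, x_pivot), dist(p, y_pivot)
        x = (a**2 + c**2 - b**2) / (2 * c)     # position along the pivot axis
        y = math.sqrt(max(a**2 - x**2, 0.0))   # distance off that axis
        projected.append((x, y))
    return projected
```

Only two linear passes over the data are needed to find the pivots, which is the O(2N) cost the slides refer to.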
  16. HIERARCHICAL PARTITIONING
      Grow:
      •  Find two orthogonal dimensions
      •  Find median(x), median(y)
      •  Recurse on four quadrants
      Prune:
      •  Combine quadtree leaves with similar densities
      •  Score each cluster by median score of class variable
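A sketch of the “grow” step, assuming the (x, y) coordinates come from a Fastmap-like projection such as the one above. The `min_size` stopping threshold is an assumption, and the “prune” step (merging leaves of similar density, then scoring each cluster) is omitted.

```python
def grow(rows, xy, min_size=4):
    """Recursively split rows into quadtree leaves ("grow" phase):
    split at median(x), median(y), then recurse on the four quadrants.
    'xy' holds one (x, y) projection per row; min_size is an assumed cutoff."""
    if len(rows) <= min_size:
        return [rows]
    xs = sorted(x for x, _ in xy)
    ys = sorted(y for _, y in xy)
    mx, my = xs[len(xs) // 2], ys[len(ys) // 2]      # median(x), median(y)
    quadrants = {}
    for row, (x, y) in zip(rows, xy):
        quadrants.setdefault((x >= mx, y >= my), []).append((row, (x, y)))
    if len(quadrants) == 1:                          # cannot split any further
        return [rows]
    leaves = []
    for bucket in quadrants.values():
        sub_rows = [r for r, _ in bucket]
        sub_xy = [p for _, p in bucket]
        leaves.extend(grow(sub_rows, sub_xy, min_size))
    return leaves
```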
  17. Q: WHY CLUSTER VIA FASTMAP?
      A1: Circular methods (e.g. k-means) assume round clusters
      •  But density-based clustering allows clusters to be any shape
      A2: No need to pre-set the number of clusters
      A3: The O(2N) heuristic is very fast
      •  Unoptimized Python:
  18. Learning via “envy”
  19. Q: WHY TRAIN ON NEIGHBORING CLUSTERS WITH BETTER SCORES?
      A1: Why learn from your own mistakes?
      •  When there exists a smarter neighbor?
      •  The “grass is greener” principle
  20. HIERARCHICAL PARTITIONING
      Grow:
      •  Find two orthogonal dimensions
      •  Find median(x), median(y)
      •  Recurse on four quadrants
      Prune:
      •  Combine quadtree leaves with similar densities
      •  Score each cluster by median score of class variable
  21. HIERARCHICAL PARTITIONING
      Grow:
      •  Find two orthogonal dimensions
      •  Find median(x), median(y)
      •  Recurse on four quadrants
      Prune:
      •  Combine quadtree leaves with similar densities
      •  Score each cluster by median score of class variable
      Where is the grass greenest?
      •  C1 envies the neighbor C2 with max abs(score(C2) − score(C1))
      •  Train on C2, test on C1
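A small sketch of the “envy” step on this slide: each cluster is scored by the median of its class variable, and C1 picks the neighbor C2 whose score differs most from its own; rules are then learned on C2 and tested on C1. Representing a cluster as a list of (features, class_value) rows is an assumption.

```python
def median(values):
    """Median of a list of numbers (upper median for even lengths)."""
    ordered = sorted(values)
    return ordered[len(ordered) // 2]

def cluster_score(cluster):
    """Score a cluster by the median of its class variable
    (e.g. effort or defect counts); rows are (features, class_value)."""
    return median([cls for _, cls in cluster])

def envied_neighbor(c1, neighbors):
    """C1 "envies" the neighboring cluster C2 with the largest
    abs(score(C2) - score(C1)); train rules on C2, test them on C1."""
    return max(neighbors, key=lambda c2: abs(cluster_score(c2) - cluster_score(c1)))
```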
  22. Q: HOW TO LEARN RULES FROM NEIGHBORING CLUSTERS?
      A: It doesn’t really matter
      •  But when comparing global and intra-cluster rules, use the same rule learner
      This study uses WHICH (Menzies [2010])
      •  Customizable scoring operator
      •  Faster termination
      •  Generates very small rules (good for explanation)
  23. DATA FROM HTTP://PROMISEDATA.ORG/DATA
      Effort reduction = { NasaCoc, China }: COCOMO or function points
      Defect reduction = { lucene, xalan, jedit, synapse, etc. }: CK metrics (OO)
      Distributions have percentiles (0th, 25th, 50th, 75th, 100th)
      Clusters have an untreated class distribution
      Rules select a subset of the examples:
      •  generate a treated class distribution
      [Chart: percentile plots of the class variable for three treatments: untreated; global = treated with rules learned from all data; local = treated with rules learned from a neighboring cluster]
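This slide summarizes each class distribution by its 0th, 25th, 50th, 75th, and 100th percentiles, which is also how the next slide compares the untreated, global, and local treatments. A minimal Python sketch of that summary, using a simple nearest-rank rule (an assumption about how the percentiles were computed):

```python
def percentile_summary(values, points=(0, 25, 50, 75, 100)):
    """Summarize a class distribution by selected percentiles,
    using a simple nearest-rank rule over the sorted values."""
    ordered = sorted(values)
    last = len(ordered) - 1
    return {p: ordered[round(p / 100 * last)] for p in points}

# Example: a toy "defects per cluster" distribution.
print(percentile_summary([2, 9, 4, 7, 1]))
# -> {0: 1, 25: 2, 50: 4, 75: 7, 100: 9}
```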
  24. BY ANY MEASURE, PER-CLUSTER LEARNING IS BEST
      Lower median efforts/defects (50th percentile)
      Greater stability (75th − 25th percentile)
      Decreased worst case (100th percentile)
  25. CLUSTERS GENERATE DIFFERENT RULES
      What works “here” does not work “there”
      •  Misguided to try and tame conclusion instability
      •  Inherent in the data
      Don’t tame it, use it: build lots of local models
  26. Related work
  27. RELATED WORK
      Defect & effort prediction: 1,000 papers
      •  All about making predictions
      •  This work: learning controllers to change prediction
      Outlier removal: Yin [2011], Yoon [2010], Kocaguneli [2011]
      •  Subsumed by this work
      Clustering & case-based reasoning: Kocaguneli [2011], Turhan [2009], Cuadrado [2007]
      •  Nothing generated, nothing to reflect about
      •  Needs indexing (runtime speed)
      Structured literature reviews: Kitchenham [2007] + many more besides
      •  May be over-generalizing across cluster boundaries
      Design of experiments
      •  Don’t learn from immediate data, learn from better neighbors
      •  Here: train once per cluster (a small subset of the whole data)
      •  Orders of magnitude faster than N*M cross-val
      Localizations
      •  Expert-based: Petersen [2009]; how to know it is correct?
      •  Source code-based (ecological inference): Posnett [2011]
      •  This work: auto-learning of contexts; beneficial
  28. Conclusion
  29. THIS TALK
      Something is fundamentally wrong with data mining research in software engineering
      •  Needs more “data mining”, less “algorithm mining”
      •  Handle “conclusion instability”
      Need to do a different kind of data mining
      •  Cluster, then learn
      •  Learning via “envy”
  30. NOT “ONE RING TO RULE THEM ALL”
      Trite global statements about multiple SE projects are… trite
      Need effective ways to learn local lessons
      •  Automatic clustering tools
      •  Rule learning (per cluster, using envy)
  31. THE WISDOM OF THE CROWDS
  32. THE WISDOM OF THE CROWDS
  33. THE WISDOM OF THE CROWDS
  34. THE WISDOM OF THE COWS
  35. THE WISDOM OF THE COWS
      •  Seek the fence where the grass is greener on the other side
         •  Learn from there
         •  Test on here
      •  Don’t rely on trite definitions of “there” and “here”
         •  Cluster to find “here” and “there”