Idea

Transcript

  • 1. Learning to Change Projects. Raymond Borges, Tim Menzies (tim@menzies.us), Lane Department of Computer Science & Electrical Engineering, West Virginia University. PROMISE’12: Lund, Sweden, Sept 21, 2012.
  • 2. Sound bites. Less prediction, more decision. Data has shape: “data mining” = “carving” out that shape. To reveal the shape, remove irrelevancies (cut the cr*p) using reduction operators: dimension, column, row, rule. Show, don’t code: once you can see the shape, inference is superfluous. Implications for other research.
  • 3-5. Decisions, Decisions... Tom Zimmermann: “We forget that the original motivation for predictive modeling was making decisions about software projects.” ICSE 2012 Panel on Software Analytics: “Prediction is all well and good, but what about decision making?” Predictive models are useful: they focus an inquiry onto particular issues. But predictions are sub-routines of decision processes.
  • 6-14. Q: How to Build Decision Systems? 1996: T. Menzies, “Applications of abduction: knowledge-level modeling”, International Journal of Human-Computer Studies. Score contexts (e.g. Hate, Love) and count the frequencies of attribute ranges in each:
    Diagnosis = what went wrong: δ = Hate(now) − Love(past)
    Monitor = what not to do: δ = Hate(next) − Love(now)
    Planning = what to do next: δ = Love(next) − Hate(now)
    δ = X − Y = contrast set = things frequent in X but rare in Y. TAR3 (2003), WHICH (2010), etc. But for PROMISE effort estimation data, contrast sets are obvious... once you find the underlying shape of the data.
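The δ notation above is just a contrast-set query over two scored contexts: find the attribute ranges that are frequent in the set you want and rare in the set you do not. The sketch below is a minimal illustration of that idea, not TAR3 or WHICH; the Love/Hate rows and range names are made up.

```python
from collections import Counter

def contrast(x_rows, y_rows):
    """delta = X - Y: attribute ranges frequent in X but rare in Y."""
    x_counts, y_counts = Counter(), Counter()
    for row in x_rows:
        x_counts.update(row.items())
    for row in y_rows:
        y_counts.update(row.items())
    delta = {}
    for rng, n in x_counts.items():
        freq_x = n / len(x_rows)
        freq_y = y_counts.get(rng, 0) / len(y_rows)
        if freq_x > freq_y:                      # frequent in X, rare in Y
            delta[rng] = freq_x - freq_y
    return sorted(delta.items(), key=lambda kv: -kv[1])

# Planning = what to do next: delta = Love(next) - Hate(now)
love = [{"acap": "high", "pcap": "high"}, {"acap": "high", "pcap": "low"}]
hate = [{"acap": "low",  "pcap": "high"}, {"acap": "low",  "pcap": "low"}]
print(contrast(love, hate))     # -> [(('acap', 'high'), 1.0)]
```

Diagnosis and monitoring swap which set plays X and which plays Y, as in the deltas on the slide.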
  • 15. Q: How to find the underlying shape of the data? Data mining = data carving. To find the signal in the noise, use Timm’s algorithm: 1 find some cr*p; 2 throw it away; 3 go to 1.
  • 16-18. IDEA = Iterative Dichotomization on Every Attribute. Timm’s algorithm (1 find some cr*p; 2 throw it away; 3 go to 1), applied four ways: 1 dimensionality reduction; 2 column reduction; 3 row reduction; 4 rule reduction. And in the reduced data, inference is obvious.
  • 19-23. IDEA step 1: dimensionality reduction (recursive fast PCA), using Fastmap (Faloutsos’94):
    W = anything; X = furthest from W; Y = furthest from X. Takes time O(2N).
    Let c = dist(X,Y). If Z has distances a, b to X, Y, then Z projects to x = (a² + c² − b²) / (2c).
    Platt’05: Fastmap = Nyström algorithm = fast & approximate PCA.
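The Fastmap step is short enough to sketch directly. A minimal version, assuming rows are numeric vectors and Euclidean distance; it illustrates Faloutsos’94’s heuristic as described on the slide, not the authors’ own code.

```python
import math
import random

def dist(p, q):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

def fastmap_axis(rows):
    """Pick two distant pivots (W -> X -> Y), then project every row Z
    onto the X-Y line via x = (a^2 + c^2 - b^2) / (2c). Two linear passes."""
    w = random.choice(rows)
    x = max(rows, key=lambda r: dist(r, w))      # furthest from W
    y = max(rows, key=lambda r: dist(r, x))      # furthest from X
    c = dist(x, y) or 1e-32                      # guard: identical pivots
    return [(dist(z, x) ** 2 + c ** 2 - dist(z, y) ** 2) / (2 * c) for z in rows]

rows = [[random.random() for _ in range(20)] for _ in range(50)]
axis = fastmap_axis(rows)
median = sorted(axis)[len(axis) // 2]
left = [r for r, v in zip(rows, axis) if v < median]
right = [r for r, v in zip(rows, axis) if v >= median]
# Recursing on 'left' and 'right' until the clusters get small yields the
# divisive tree whose leaf clusters feed the later reduction steps.
```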
  • 24-29. IDEA step 2: column reduction (info gain). Sort columns by their diversity; keep the columns that select for the fewest clusters. Worked example, nine rows in two clusters:
    cluster c1 has acap = 2,3,3,3,3 and pcap = 3,3,4,5,5; cluster c2 has acap = 2,2,2,3 and pcap = 3,4,4,5
    p(acap=2) = 0.44, p(acap=3) = 0.55; p(pcap=3) = p(pcap=4) = p(pcap=5) = 0.33
    p(acap=2|c1) = 0.25, p(acap=2|c2) = 0.75; p(acap=3|c1) = 0.8, p(acap=3|c2) = 0.2
    p(pcap=3|c1) = 0.67, p(pcap=3|c2) = 0.33; p(pcap=4|c1) = 0.33, p(pcap=4|c2) = 0.67; p(pcap=5|c1) = 0.67, p(pcap=5|c2) = 0.33
    I(col) = Σ_x p(x) · ( Σ_c −p(x|c) · log p(x|c) )
    I(acap) = 0.239 ← keep; I(pcap) = 0.273 ← prune
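Reading the slide’s probabilities as the distribution of clusters among the rows holding each value, the column score is the expected entropy of the cluster label after splitting on that column: lower means the column does more to separate the clusters. A small sketch of that calculation (base-10 logs here; the slide’s exact 0.239 and 0.273 depend on the log base and rounding used).

```python
import math
from collections import Counter

def column_score(values, clusters, base=10):
    """Expected entropy of the cluster label given a column's value.
    Lower = the column does more to separate the clusters."""
    n, score = len(values), 0.0
    for v, n_v in Counter(values).items():
        # how the rows with value v are spread across the clusters
        spread = Counter(c for x, c in zip(values, clusters) if x == v)
        entropy = -sum((k / n_v) * math.log(k / n_v, base) for k in spread.values())
        score += (n_v / n) * entropy
    return score

# The nine rows from the slide: five in cluster c1, four in c2.
clusters = ["c1"] * 5 + ["c2"] * 4
acap = [2, 3, 3, 3, 3] + [2, 2, 2, 3]
pcap = [3, 3, 4, 5, 5] + [3, 4, 4, 5]
print(column_score(acap, clusters))   # ~0.23 -> keep acap
print(column_score(pcap, clusters))   # ~0.28 -> prune pcap
```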
  • 30-32. IDEA step 3: row reduction (replace clusters with their mean). Replace all leaf-cluster instances with their centroid, described using only the columns within 50% of the minimum diversity. e.g. Nasa93 reduces to 12 columns and 13 centroids.
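Row reduction then stands in one synthetic row for each leaf cluster. A minimal sketch, assuming numeric columns and using the per-column mean as the centroid; the rows below are made up for illustration, and the paper’s “columns within 50% of minimum diversity” filter is what would supply keep_cols.

```python
def centroid(cluster_rows, keep_cols):
    """Replace a leaf cluster's rows with one row: the mean of each kept column."""
    return {col: sum(row[col] for row in cluster_rows) / len(cluster_rows)
            for col in keep_cols}

leaf = [{"acap": 2, "pcap": 3, "effort": 117},   # toy rows, not real Nasa93 data
        {"acap": 3, "pcap": 3, "effort": 72},
        {"acap": 3, "pcap": 4, "effort": 62}]
print(centroid(leaf, ["acap", "effort"]))        # -> acap ~2.67, effort ~83.7
```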
  • 33. Nasa93 reduces to 12 columns and 13 centroids (the slide shows the resulting centroid table).
  • 34-35. IDEA step 4: rule reduction (contrast home vs. neighbors). Surprise: after steps 1, 2, and 3, further computation is superfluous; visuals are sufficient for contrast-set generation.
  • 36. Manual construction of contrast sets. Table 5 = your “home” cluster; Table 6 = projects of similar size; Table 7 = a nearby project with fearsome effort. Contrast set = the delta on the last line.
  • 37. Why cluster120? Is it valid that cluster120 costs so much? Yes, if building core services whose cost is amortized over N future apps. No, if racing to get products to a competitive market. We do not know, but at least we are now focused on that issue.
  • 38-39. Reductions on PROMISE data sets: sizes of the reduced data sets (the slide also plots columns against rows, with rows on a log scale from 1 to 100).
    data set          rows   columns
    Albrecht             4         4
    China               66        15
    Cocomo81             8        18
    Cocomo81e            4        16
    Cocomo81o            4        16
    Cocomo81s            2        16
    Desharnais           8        19
    Desharnais L1        6        10
    Desharnais L2        4        10
    Desharnais L3        2        10
    Finnish              6         2
    Kemerer              2         7
    Miyazaki’94          6         3
    Nasa93              13        12
    Nasa93 center 5      7        16
    Nasa93 center 1      2        15
    Nasa93 center 2      5        16
    SDR                  4        21
    Telcom1              2         1
    Q: throwing away too much?
  • 40-42. Q: Throwing away too much? Estimates = the class variable of the nearest centroid in the reduced space. Compare against the 90 pre-processor × learner combinations from Kocagueneli et al., “On the Value of Ensemble Learning in Effort Estimation”, TSE 2011. Performance measure = MRE = |predicted − actual| / actual.
    9 pre-processors:
    1 norm: normalize numerics to 0..1, min..max
    2 log: replace numerics of the non-class columns with their logarithms
    3 PCA: replace non-class columns with principal components
    4 SWReg: cull uninformative columns with stepwise regression
    5 Width3bin: divide numerics into 3 bins with boundaries (max−min)/3
    6 Width5bin: divide numerics into 5 bins with boundaries (max−min)/5
    7 Freq3bins: split numerics into 3 equal-size percentiles
    8 Freq5bins: split numerics into 5 equal-size percentiles
    9 None: no pre-processor
    10 learners:
    1 1NN: simple one-nearest-neighbor
    2 ABE0-1nn: analogy-based estimation using the nearest neighbor
    3 ABE0-5nn: analogy-based estimation using the median of the five nearest neighbors
    4 CART(yes): regression trees, with sub-tree post-pruning
    5 CART(no): regression trees, no post-pruning
    6 NNet: two-layered neural net
    7 LReg: linear regression
    8 PLSR: partial least squares regression
    9 PCR: principal components regression
    10 SWReg: stepwise regression
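MRE and the simpler pre-processors take only a few lines each. A hedged sketch of MRE and the Width3bin discretizer, written straight from the definitions above rather than from the code used in the Kocagueneli et al. study.

```python
def mre(predicted, actual):
    """Magnitude of relative error: |predicted - actual| / actual."""
    return abs(predicted - actual) / actual

def width3bin(xs):
    """Width3bin: split a numeric column into 3 equal-width bins."""
    lo, hi = min(xs), max(xs)
    width = (hi - lo) / 3 or 1                  # guard: constant column
    return [min(int((x - lo) / width), 2) for x in xs]

print(mre(120, 100))             # 0.2
print(width3bin([1, 2, 5, 9]))   # [0, 0, 1, 2]
```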
  • 43-44. Results. A perennial problem with assessing different effort estimation tools: MRE is not normally distributed (a low valley with high hills, which injects much variance). IDEA’s predictions are neither better nor worse than the others’, but they avoid all the hills.
  • 45. Related work. Clustering can be based on (a) centrality (e.g. k-means), (b) connectedness (e.g. DBSCAN), or (c) separation (e.g. IDEA).
  • 46. Related work, continued:
    Who                          case-based   clustering                   feature selection   task
    Shepperd (1997)              √                                                             predict
    Boley (1998)                              recursive PCA                                    predict
    Bettenburg et al. (MSR’12)                recursive regression                             predict
    Posnett et al. (ASE’11)                   on file/package divisions                        predict
    Menzies et al. (ASE’11)                   FastMap                      √                   contrast
    IDEA                         √            √                            √                   contrast
  • 47. Back to the sound bites. Less prediction, more decision. Data has shape: “data mining” = “carving” out that shape. To reveal the shape, remove irrelevancies (cut the cr*p). IDEA = reduction operators: dimension, column, row, rule. Show, don’t code: once you can see the shape, inference is superfluous. Implications for other research.
  • 48. Questions? Comments?
