Idea Engineering


  1. Idea Engineering (tim@menzies.us, PROMISE'13, Oct'13)
     0. algorithm mining (yesterday)
     1. landscape mining (today)
     2. decision mining (tomorrow)
     3. discussion mining (future)
  2. The Premises of PROMISE (2005)
     – Wanted: predictions
       • Nope. Users want decisions, or engagement
  3. The Premises of PROMISE (2005)
     – Wanted: predictions
       • Nope. Users want decisions, or engagement
     – Data mining will reveal "the truth" about SE
       • [Dejaeger: TSE'11], [Hall: TSE'12], [Shepperd: COW'13]
       • Not(better learners = better conclusions)
  4. The Premises of PROMISE (2005)
     – Wanted: predictions
       • Nope. Users want decisions, or engagement
     – Data mining will reveal "the truth" about SE
       • [Dejaeger: TSE'11], [Hall: TSE'12], [Shepperd: COW'13]
       • Not(better learners = better conclusions)
     – Sooner or later: enough data for general conclusions
       • Found more differences than generalities
       • Special issues: [IST'13], [ESEj'13]
       • Best papers: ASE'11, MSR'12
       • Menzies, Zimmermann et al. [TSE'13]
       • Lots of local models
  5. Landscape mining: look before you leap
     • Report what is true about the data
       – Not trivia on how algorithms walk that data
     • Map the landscape
       – Reason about each part of the map
     • E.g. landscape mining
       – Unsupervised iterative dichotomization
       – Cluster, prune
       – Then generate rules
  6. Landscape mining: look before you leap
     • Report what is true about the data
       – Not trivia on how algorithms walk that data
     • Map the landscape
       – Reason about each part of the map
     • E.g. landscape mining
       – Unsupervised iterative dichotomization
       – Cluster, prune
       – Then generate rules
     • Different from "leap before you look"
       – i.e. skew learning by the class variable, then study the results
       – E.g. C4.5, CART, Fayyad-Irani, etc.: supervised iterative dichotomization
       – E.g. 61% of 300+ effort estimation papers: algorithm tinkering, without end
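The "unsupervised iterative dichotomization: cluster, prune, then generate rules" recipe above can be sketched as a recursive split that never consults a class variable. This is only an illustrative sketch; the function name `dichotomize`, the `tiny` stopping size, and the pivot-based split rule are assumptions, not the exact algorithm from the talk:

```python
def dichotomize(rows, dist, tiny=4):
    """Recursively split rows in two, unsupervised, yielding small clusters.

    `rows` is a list of numeric vectors; `dist` is any distance function.
    """
    # Stop splitting once a cluster is small enough to report.
    if len(rows) <= tiny:
        yield rows
        return
    # Find two roughly-opposite pivots with two linear scans.
    east = max(rows, key=lambda r: dist(rows[0], r))
    west = max(rows, key=lambda r: dist(east, r))
    left = [r for r in rows if dist(r, east) <= dist(r, west)]
    right = [r for r in rows if dist(r, east) > dist(r, west)]
    if not left or not right:  # degenerate split (e.g. duplicate rows)
        yield rows
        return
    yield from dichotomize(left, dist, tiny)
    yield from dichotomize(right, dist, tiny)
```

Note that, unlike C4.5-style supervised dichotomization, nothing here looks at a class variable; the class is only consulted afterwards, when scoring and contrasting the resulting clusters.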
  7. IDEA Engineering = <landscape, decisions, discussion>
     • Find landscape = cluster the data, assign "heights"
     • Find decisions = report the deltas from highs to lows
     • Monitor discussions = watch, and help, communities explore the deltas
  8. Spectral Landscape Mining
     • Spectrum = a condition that is not limited to a specific set of values but varies over a continuum
     • Groups together a broad range of conditions or behaviors under one single title
     • In mathematics, the spectrum of a (finite-dimensional) matrix is the set of its eigenvalues
     • Nyström algorithms: approximations to eigenvalues
       – FASTMAP: linear time
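FASTMAP's linear-time trick is to pick two distant pivot rows, then place every row on the line between them via the cosine rule, instead of computing eigenvectors exactly. A minimal sketch, assuming Euclidean distance and simplified pivot selection (the name `fastmap_axis` is illustrative):

```python
import math
import random

def dist(x, y):
    """Euclidean distance between two numeric vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

def fastmap_axis(rows):
    """One FASTMAP pass: find two distant pivots with two linear scans,
    then project every row onto the line between them (cosine rule)."""
    anyone = random.choice(rows)
    east = max(rows, key=lambda r: dist(anyone, r))  # far from a random row
    west = max(rows, key=lambda r: dist(east, r))    # far from east
    c = dist(east, west)
    return [(dist(r, east) ** 2 + c ** 2 - dist(r, west) ** 2) / (2 * c)
            for r in rows]
```

Each call costs O(n) distance computations, which is what makes this a practical stand-in for an exact spectral decomposition on large data.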
  9. Project the data onto the first two PCA components; grid that data
     • E.g. nasa93dem
     • 1) Project the 23 dimensions down to 2
     • 2a) Cluster
     • 2b) Replace clusters with centroids
     • MOEA: score = effort + defects + months
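The project-then-grid recipe on this slide could look roughly as follows; the SVD-based projection, the equal-width grid, the default bin count, and the `landscape` name are assumptions for illustration, not the talk's exact pipeline:

```python
import numpy as np

def landscape(data, bins=4):
    """Project rows onto the first two principal components, grid the
    2-D plane, and replace each occupied grid cell with its centroid."""
    x = data - data.mean(axis=0)
    # Principal components come from the SVD of the centered data.
    _, _, vt = np.linalg.svd(x, full_matrices=False)
    xy = x @ vt[:2].T                     # n rows -> n (x, y) points
    # Equal-width grid: map each point to a (row, col) cell index.
    span = np.ptp(xy, axis=0) + 1e-12
    ij = np.minimum(((xy - xy.min(axis=0)) / span * bins).astype(int),
                    bins - 1)
    cells = {}
    for row, cell in zip(data, map(tuple, ij)):
        cells.setdefault(cell, []).append(row)
    # One centroid (in the original feature space) per occupied cell.
    return {cell: np.mean(rs, axis=0) for cell, rs in cells.items()}
```

On a table like nasa93dem this compresses hundreds of rows into at most `bins * bins` representative centroids, which the MOEA score can then rank.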
  10. Sanity check: what is the information loss?
      • E.g. POI-3
        – 400+ examples
        – 20 centroids
      • Prediction via extrapolation between the two nearest centroids
      • Works as well as:
        – Random forest, Naïve Bayes, for defect prediction (10 data sets)
        – Linear regression, M5', for effort estimation (10 data sets)
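"Extrapolation between the two nearest centroids" might be sketched as a distance-weighted blend of their class values; the `predict` name and the exact weighting scheme are illustrative assumptions:

```python
import math

def predict(x, centroids):
    """Estimate x's numeric class from its two nearest centroids.

    `centroids` is a list of (vector, class_value) pairs; the closer
    centroid's class gets proportionally more weight.
    """
    d = lambda a, b: math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))
    (c1, y1), (c2, y2) = sorted(centroids, key=lambda cy: d(x, cy[0]))[:2]
    d1, d2 = d(x, c1), d(x, c2)
    if d1 + d2 == 0:
        return y1
    w = d2 / (d1 + d2)  # closer centroid gets more weight
    return w * y1 + (1 - w) * y2
```

The sanity check on the slide is then straightforward: if 20 centroids plus this interpolation predict as well as random forests or M5', little useful information was lost in the compression.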
  11. Planning = inter-cluster contrast sets
      • Find the delta between neighboring clusters that takes worse to better
      • Very small rules, found in log-linear time
      • Menzies et al. [TSE'13]
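One simple reading of "the delta between neighbors" is the handful of attribute changes that most separate a worse cluster's centroid from a better neighbor's. This tiny sketch (the `contrast` name and the ranking heuristic are assumptions, not the TSE'13 algorithm) shows why such rules stay very small:

```python
def contrast(worse, better, names, k=2):
    """Report the k attributes whose change most separates a worse
    cluster's centroid from a better neighbor's, as a tiny plan.

    Returns {attribute: (value_in_worse, value_in_better)}.
    """
    deltas = sorted(zip(names, worse, better),
                    key=lambda nwb: abs(nwb[2] - nwb[1]), reverse=True)
    return {name: (w, b) for name, w, b in deltas[:k]}
```

Because the rule only mentions the largest deltas, it reads as an actionable plan ("move attribute b from 5 toward 1") rather than a full model of the data.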
  12. Applications
      • Prediction
      • Planning
      • Monitoring
      • Multi-objective optimization
        – Cluster first on the N objectives
      • Anomaly detection
      • Incremental theory revision
      • Compression
      • Privacy
      • etc.
  13. Idea Engineering
      0. algorithm mining (yesterday)
      1. landscape mining (today)
      2. decision mining (tomorrow)
      3. discussion mining (future)
      • "Beyond Data Mining", T. Menzies, IEEE Software, 2013, to appear
      • Q: Why call it mining?
        – A1: Because all the primitives for the above are already in the data mining literature, so we know how to get from here to there
        – A2: Because data mining scales
