1. TRANSFER LEARNING AND SE
   TIM@MENZIES.US
   WVU, JULY 2013
2. SOUND BITES
   •  Ye olde worlde SE
   •  "The" model of SE (defects, effort, etc.)
   •  21st century SE
   •  Models (plural)
   •  No generality in models
   •  But perhaps generality in how we find those models
   •  Transfer learning
4. WHAT IS TRANSFER LEARNING?
   •  Source = old = Domain1 = <Eg1, P1>
   •  Target = new = Domain2 = <Eg2, P2>
   •  If we move from Domain1 to Domain2, do we have to start afresh?
   •  Or can we learn faster in the "new" ...
   •  ... using lessons learned from the "old"?
   •  NSF funding (2013..2017):
   •  Transfer learning in Software Engineering
   •  Menzies, Layman, Shull, Diep
5. WHO CARES? (WHAT'S AT STAKE?)
   •  "Transfer" is a core scientific issue
   •  Lack of transfer is the scandal of SE
   •  Replication in Empirical SE is rare
   •  Conclusion instability
   •  "It all depends."
   •  The full stop syndrome
   •  The result? A funding crisis
6. MANUAL TRANSFER (WAR STORIES)
   •  Brazil, SEL, 2002: needed domain knowledge (but now gone)
   •  NSF, SEL, 2006: needed better automatic support
   •  Kitchenham, Mendes et al., TSE 2007: for = against
   •  Zimmermann et al., FSE 2009: cross-project prediction works in 4/600 attempts
7. WAR STORIES (EFFORT ESTIMATION)
   Effort = a · LOC^x · y
   •  Learned using Boehm's methods
   •  20 * 66% samples of NASA93
   •  COCOMO attributes
   •  Linear regression (log pre-processor)
   •  Sort the coefficients found for each member of x, y
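The slide's COCOMO-style effort model can be sketched as below. The constants A and X and the multiplier values are illustrative placeholders, not the coefficients the talk learned from NASA93; y is the product of the effort multipliers.

```python
# Minimal sketch of Effort = a * LOC^x * y, with assumed constants.
A, X = 2.94, 1.10                  # assumed scale constants, not learned values

def effort(kloc, multipliers):
    """Estimated effort in person-months; y = product of effort multipliers."""
    y = 1.0
    for m in multipliers:
        y *= m
    return A * kloc ** X * y

# e.g. a 25.9 KLOC project with two (made-up) multipliers
print(effort(25.9, [1.15, 0.88]))
```

Sorting such coefficients across repeated 66% samples, as on the slide, then shows how stable (or unstable) the learned model is.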
8. WAR STORIES (DEFECT ESTIMATION)
   (chart-only slide)
9. BUT THERE IS HOPE
   •  Maybe we've been looking in the wrong direction
   •  SE project data = surface features of an underlying effect
   •  Go beneath the surface
10. BUT THERE IS HOPE
   We focused too much on what we can see at first glance, and did not check the nuances of the hidden structure beneath.
11. BUT THERE IS HOPE
   With new data mining technologies, the true picture emerges and we can see what is going on.
12. ESEM, 2011: HOW TO FIND RELEVANT DATA FOR EFFORT ESTIMATION
   TIM MENZIES, EKREM KOCAGUNELI
13. THERE IS HOPE
   •  Maybe we've been looking in the wrong direction
   •  SE project data = surface features of an underlying effect
   •  Go beneath the surface
14. USD DOD MILITARY PROJECTS (LAST DECADE)
   You must segment to find relevant data.
15. DOMAIN SEGMENTATIONS
   Q: What to do about rare zones?
   A: Select the nearest ones from the rest. But how?
16. IN THE LITERATURE: WITHIN VS. CROSS = ?? (BEFORE THIS WORK)
   Kitchenham et al., TSE 2007:
   •  Within-company learning (just use local data)
   •  Cross-company learning (just use data from other companies)
   Results mixed:
   •  No clear win for cross or within
   Cross vs. within are not rigid boundaries:
   •  They are soft borders
   •  And we can move a few examples across the border
   •  And after making those moves, "cross" is the same as "local"
17. SOME DATA DOES NOT DIVIDE NEATLY ON EXISTING DIMENSIONS
18. THE LOCALITY(1) ASSUMPTION
   Data divides best on one attribute:
   1.  development centers of developers;
   2.  project type, e.g. embedded, etc.;
   3.  development language;
   4.  application type (MIS, GNC, etc.);
   5.  targeted hardware platform;
   6.  in-house vs. outsourced projects;
   7.  etc.
   If Locality(1) holds, it is hard to use data across these boundaries:
   •  Then it is harder to build effort models
   •  Need to collect local data (slow)
19. THE LOCALITY(N) ASSUMPTION
   Data divides best on a combination of attributes.
   If Locality(N):
   •  Easier to use data across these boundaries
   •  Relevant data spread all around
   •  Little diamonds floating in the dust
20. HOW TO FIND RELEVANT TRAINING DATA?

                   independent attributes
                    w    x    y    z   class
     similar 1      0    1    1    1      2
     similar 2      0    1    1    1      3
     different 1    7    7    6    2      5
     different 2    1    9    1    8      8
     different 3    5    4    2    6     10
     alien 1       74   15   73   56     20
     alien 2       77   45   13    6     40
     alien 3       35   99   31   21     60
     alien 4       49   55   37    4     80

   Use similar? Use more variant? Use aliens?
21. VARIANCE PRUNING
   (Same table as slide 20, with clusters marked PRUNE or KEEP.)
   1)  Sort the clusters by "variance"
   2)  Prune the high-variance ones
   3)  Estimate on the rest
   "Easy path": cull the examples that hurt the learner.
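The "easy path" above can be sketched in a few lines: group the examples, rank groups by the variance of their class values, and keep only the calmest groups for estimation. The group names and class values mirror the slide's table; the 50% keep-rate is an assumption.

```python
import statistics

# Toy groups with the class values from the slide's table.
groups = {
    "similar":   [2, 3],
    "different": [5, 8, 10],
    "alien":     [20, 40, 60, 80],
}

def variance_prune(groups, keep=0.5):
    """Rank groups by class-value variance; return the low-variance
    groups (an assumed fraction `keep` of them) to estimate from."""
    ranked = sorted(groups, key=lambda g: statistics.pvariance(groups[g]))
    n = max(1, int(len(ranked) * keep))
    return ranked[:n]

print(variance_prune(groups))
```

Here the "alien" rows, with the wildest class values, are the first to be pruned.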
22. TEAK: CLUSTERING + VARIANCE PRUNING (TSE, JAN 2011)
   •  TEAK is a variance-based instance selector
   •  It is built via GAC trees
   •  TEAK is a two-pass system
   •  First pass selects low-variance relevant projects
   •  Second pass retrieves projects to estimate from
23. ESSENTIAL POINT
   TEAK finds local regions important to the estimation of particular cases.
   TEAK finds those regions via Locality(N)
   •  Not Locality(1)
24. WITHIN AND CROSS DATASETS
   Note: all Locality(1) divisions
25. EXPERIMENT 1: PERFORMANCE COMPARISON OF WITHIN AND CROSS-SOURCE DATA
   •  TEAK run on within & cross data for each dataset group (lines separate groups)
   •  LOOCV used for runs
   •  20 runs performed for each treatment
   •  Results evaluated w.r.t. MAR, MMRE, MdMRE and Pred(30); but see http://goo.gl/6q0tw
   •  If within data outperforms cross, the dataset is highlighted in gray
   •  Only 2 datasets are highlighted
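Two of the measures named above can be sketched as follows (MdMRE would replace the mean with a median). The sample actual/predicted efforts are invented for illustration.

```python
def mmre(actuals, predictions):
    """Mean Magnitude of Relative Error: mean of |actual - predicted| / actual."""
    return sum(abs(a - p) / a for a, p in zip(actuals, predictions)) / len(actuals)

def pred(actuals, predictions, level=0.30):
    """Pred(30): fraction of estimates within `level` (30%) of the actual."""
    hits = sum(1 for a, p in zip(actuals, predictions) if abs(a - p) / a <= level)
    return hits / len(actuals)

actual    = [100, 50, 20, 10]   # made-up efforts
predicted = [110, 40, 30, 10]
print(mmre(actual, predicted), pred(actual, predicted))
```

Lower MMRE and higher Pred(30) are better; the goo.gl link above cautions that these relative-error measures can be biased, which is why MAR is also reported.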
26. EXPERIMENT 2: RETRIEVAL TENDENCY OF TEAK FROM WITHIN AND CROSS-SOURCE DATA
27. EXPERIMENT 2: RETRIEVAL TENDENCY OF TEAK FROM WITHIN AND CROSS-SOURCE DATA
   •  Diagonal (WC) vs. off-diagonal (CC) selection percentages, sorted
   •  Percentiles of diagonals and off-diagonals
28. HIGHLIGHTS
   1.  Don't listen to everyone
      •  When listening to a crowd, first filter the noise
   2.  Once the noise clears: bits of me are similar to bits of you
      •  The probability of selecting cross or within instances is the same
   3.  Cross-vs-within is not a useful distinction
      •  Locality(1) is not informative
      •  This enables "cross-company" learning
29. SO, THERE IS HOPE
   •  Maybe we've been looking in the wrong direction
   •  SE project data = surface features of an underlying effect
   •  Go beneath the surface
   •  Assuming Locality(N), not Locality(1):
      •  No cross-, no within-
      •  It's all data we can learn from
30. TSE, 2013: LOCAL VS. GLOBAL MODELS FOR EFFORT ESTIMATION AND DEFECT PREDICTION
   TIM MENZIES, ANDREW BUTCHER (WVU)
   ANDRIAN MARCUS (WAYNE STATE)
   THOMAS ZIMMERMANN (MICROSOFT)
   DAVID COK (GRAMMATECH)
31. THERE IS HOPE
   Do not focus on what we can see at first glance; check the nuances of the hidden structure beneath.
32. Cluster then learn (using envy)
33. ENVY = THE WISDOM OF THE COWS
   •  Seek the fence where the grass is greener on the other side
   •  Learn from "there"
   •  Test on "here"
   •  Cluster to find "here" and "there"
34. DATA = MULTI-DIMENSIONAL VECTORS
   @attribute recordnumber real
   @attribute projectname {de,erb,gal,X,hst,slp,spl,Y}
   @attribute cat2 {Avionics, application_ground, avionicsmonitoring, … }
   @attribute center {1,2,3,4,5,6}
   @attribute year real
   @attribute mode {embedded,organic,semidetached}
   @attribute rely {vl,l,n,h,vh,xh}
   @attribute data {vl,l,n,h,vh,xh}
   …
   @attribute equivphyskloc real
   @attribute act_effort real
   @data
   1,de,avionicsmonitoring,g,2,1979,semidetached,h,l,h,n,n,l,l,n,n,n,n,h,h,n,l,25.9,117.6
   2,de,avionicsmonitoring,g,2,1979,semidetached,h,l,h,n,n,l,l,n,n,n,n,h,h,n,l,24.6,117.6
   3,de,avionicsmonitoring,g,2,1979,semidetached,h,l,h,n,n,l,l,n,n,n,n,h,h,n,l,7.7,31.2
   4,de,avionicsmonitoring,g,2,1979,semidetached,h,l,h,n,n,l,l,n,n,n,n,h,h,n,l,8.2,36
   5,de,avionicsmonitoring,g,2,1979,semidetached,h,l,h,n,n,l,l,n,n,n,n,h,h,n,l,9.7,25.2
   6,de,avionicsmonitoring,g,2,1979,semidetached,h,l,h,n,n,l,l,n,n,n,n,h,h,n,l,2.2,8.4
   …
35. CAUTION: DATA MAY NOT DIVIDE NEATLY ON RAW DIMENSIONS
   The best description for SE projects may be synthesized dimensions extracted from the raw dimensions.
36. FASTMAP
   Fastmap (Faloutsos, 1995): O(2N) generation of an axis of large variability
   •  Pick any point W
   •  Find X, the point furthest from W
   •  Find Y, the point furthest from X
   •  Let c = dist(X, Y)
   Each point has distances a, b to (X, Y):
   •  x = (a² + c² − b²) / (2c)
   •  y = sqrt(a² − x²)
   Find median(x), median(y); recurse on the four quadrants.
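One Fastmap pass, as described above, can be sketched as follows. Plain Euclidean distance and a handful of 2-D points are assumptions for illustration; the real data would be the multi-dimensional project vectors of slide 34.

```python
import math

def dist(p, q):
    """Euclidean distance between two equal-length tuples."""
    return math.sqrt(sum((pi - qi) ** 2 for pi, qi in zip(p, q)))

def fastmap_axis(points):
    """One O(2N) Fastmap pass: pick two distant pivots, then project
    every point onto the pivot axis via the cosine rule."""
    w = points[0]                               # pick any point W
    x = max(points, key=lambda p: dist(w, p))   # X = furthest from W
    y = max(points, key=lambda p: dist(x, p))   # Y = furthest from X
    c = dist(x, y)
    out = []
    for p in points:
        a, b = dist(p, x), dist(p, y)
        out.append((a**2 + c**2 - b**2) / (2 * c))   # x = (a² + c² − b²)/(2c)
    return out

pts = [(0, 0), (1, 1), (10, 10), (11, 10), (0, 1)]
print(fastmap_axis(pts))
```

Pivot X projects to 0 and pivot Y to c, so the returned values are positions along the axis of largest variability; a second pass on the residuals gives the orthogonal y dimension.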
37. HIERARCHICAL PARTITIONING
   Grow:
   •  Find two orthogonal dimensions
   •  Find median(x), median(y)
   •  Recurse on four quadrants
   Prune:
   •  Combine quadtree leaves with similar densities
   •  Score each cluster by the median score of the class variable
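The grow phase above can be sketched as a recursive quadtree split on the two synthesized dimensions. The minimum leaf size and the toy grid of points are assumptions, and the density-based pruning step is omitted.

```python
import statistics

def grow(points, min_size=4):
    """Recursively split 2-D points at (median(x), median(y)) into four
    quadrants; stop when a region holds at most `min_size` points."""
    if len(points) <= min_size:
        return [points]                           # a quadtree leaf
    mx = statistics.median(p[0] for p in points)
    my = statistics.median(p[1] for p in points)
    quads = [[], [], [], []]
    for p in points:
        quads[(p[0] > mx) + 2 * (p[1] > my)].append(p)
    leaves = []
    for q in quads:
        if q and len(q) < len(points):            # guard against no-progress splits
            leaves.extend(grow(q, min_size))
        elif q:
            leaves.append(q)
    return leaves

pts = [(i, j) for i in range(4) for j in range(4)]   # a toy 4x4 grid
print(len(grow(pts)))
```

Pruning would then merge adjacent leaves with similar densities and tag each surviving cluster with the median of its class variable.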
38. Learning via "envy"
39. ENVY = THE WISDOM OF THE COWS
   •  Seek the fence where the grass is greener on the other side
   •  Learn from "there"
   •  Test on "here"
   •  Cluster to find "here" and "there"
40. HIERARCHICAL PARTITIONING
   Grow:
   •  Find two orthogonal dimensions
   •  Find median(x), median(y)
   •  Recurse on four quadrants
   Prune:
   •  Combine quadtree leaves with similar densities
   •  Score each cluster by the median score of the class variable
41. HIERARCHICAL PARTITIONING
   Grow:
   •  Find two orthogonal dimensions
   •  Find median(x), median(y)
   •  Recurse on four quadrants
   Prune:
   •  Combine quadtree leaves with similar densities
   •  Score each cluster by the median score of the class variable
   Where is the grass greenest? A cluster envies the neighbor with a better score and maximal abs(score(this) − score(neighbor)).
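The envy step can be sketched as follows, assuming clusters already carry median class scores and known neighbor lists. The names c1..c3 and the lower-is-better scoring are illustrative assumptions.

```python
def enviable_neighbor(cluster, neighbors, score):
    """Among adjacent clusters with a better (lower) median class score,
    return the one with the largest score gap, or None if none is better."""
    better = [n for n in neighbors if score[n] < score[cluster]]
    if not better:
        return None
    return max(better, key=lambda n: abs(score[cluster] - score[n]))

# Hypothetical clusters scored by median defects (lower is better).
score = {"c1": 40.0, "c2": 10.0, "c3": 25.0}
print(enviable_neighbor("c1", ["c2", "c3"], score))
```

Rules are then learned from the envied "there" cluster and tested back on "here".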
42. Q: HOW TO LEARN RULES FROM NEIGHBORING CLUSTERS?
   A: It doesn't really matter
   •  Many competent rule learners exist
   But to evaluate global vs. local rules:
   •  Use the same rule learner for local and global rule learning
   This study uses WHICH (Menzies, 2010):
   •  Customizable scoring operator
   •  Faster termination
   •  Generates very small rules (good for explanation)
43. DATA FROM HTTP://PROMISEDATA.ORG/DATA
   Effort reduction = {NasaCoc, China}: COCOMO or function points
   Defect reduction = {lucene, xalan, jedit, synapse, etc.}: CK metrics (OO)
   Clusters have an untreated class distribution. Rules select a subset of the examples:
   •  generate a treated class distribution
   Distributions have percentiles (25th, 50th, 75th, 100th), plotted for:
   •  untreated
   •  global: treated with rules learned from all data
   •  local: treated with rules learned from the neighboring cluster
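The percentile summaries behind these plots (median, stability, worst case) can be computed with a simple nearest-rank rule; the untreated and treated sample distributions below are made up for illustration.

```python
import math

def pct(xs, q):
    """Nearest-rank q-th percentile of a sample."""
    xs = sorted(xs)
    k = max(0, math.ceil(q / 100 * len(xs)) - 1)
    return xs[k]

untreated = [5, 9, 14, 22, 40, 61]   # made-up defect counts
local     = [3, 4, 6, 8, 11, 15]     # made-up locally-treated distribution

print(pct(untreated, 50), pct(local, 50))   # medians (50th percentile)
print(pct(local, 75) - pct(local, 25))      # stability (75th - 25th)
print(pct(local, 100))                      # worst case (100th percentile)
```

Comparing these three numbers for the untreated, global, and local distributions is exactly the evaluation on the next slide.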
44. BY ANY MEASURE, LOCAL BETTER THAN GLOBAL
   •  Lower median efforts/defects (50th percentile)
   •  Greater stability (75th – 25th percentile)
   •  Decreased worst case (100th percentile)
45. RULES LEARNED IN EACH CLUSTER
   What works best "here" does not work "there"
   •  Misguided to try and tame conclusion instability
   •  It is inherent in the data
   Can't tame conclusion instability.
   •  Instead, you can exploit it
   •  Learn local lessons that do better than overly generalized global theories
46. RULES LEARNED IN EACH CLUSTER
   (Same text as slide 45.)
47. SO THERE IS HOPE
   Do not focus on what we can see at first glance; check the nuances of the structures within our data.
   •  Cluster, then envy
48. Conclusion
49. LACK OF TRANSFER = THE GREAT SCANDAL OF SE
   •  Replication in Empirical SE is rare
   •  Conclusion instability
   •  "It all depends." is not good enough
   •  A funding crisis
50. BUT THERE IS HOPE
   •  Maybe we've been looking in the wrong direction
   •  SE project data = surface features of an underlying effect
   •  Go beneath the surface
   •  Assuming Locality(N), not Locality(1):
      •  No cross-, no within-
      •  It's all data we can learn from
51. BUT THERE IS HOPE
   Do not focus on what we can see at first glance; check the nuances of the structures within our data.
   •  Cluster, then envy
52. BUT THERE IS HOPE
   With new data mining technologies, the true picture emerges and we can see what is going on.