Understanding the Value of Software Engineering Technologies
Talk to ASE’09. Phillip Green, Tim Menzies, Steven Williams, Oussama El-Rawas. WVU, USA.
Transcript

  • 1. Understanding the Value of Software Engineering Technologies. Phillip Green, Tim Menzies, Steven Williams, Oussama El-Rawas. WVU, USA. ASE’09, November, Auckland, New Zealand.
  • 2. Many technologies are good •  Model checking is a good thing •  Runtime verification is a good thing •  Test generation is a good thing •  Mining OO patterns is a good thing. •  X is a good thing •  Y is a good thing •  Z is a … •  etc Automated SE is a “good thing”
  • 3. But which are better? •  Is “technology1” more cost-effective than “technology2”? •  What is a “technology”? –  Paper-based; e.g. orthogonal defect classification. –  Tool-based; e.g. functional programming languages, execution and testing tools, automated formal analysis. –  Process-based; e.g. changing an organization’s hiring practices, or an agile continual renegotiation of the requirements. Better than others, in the context of a particular project?
  • 4. This talk •  When making relative assessments of the costs/benefits of SE technologies: –  Context changes everything. –  Claims of relative cost benefits of tools are context-dependent. •  The “local lessons effect”: –  Ideas that are useful in general –  may not be the most useful in particular. •  Tools to find context-dependent local lessons: –  “W”: a case-based reasoner (goal: reduce effort). –  “NOVA”: a more intricate tool (goal: reduce effort and time and defects). •  Take-home lessons: –  Beware blanket claims of “this is better”. –  Always define the context where you tested your tools. –  Compare ASE tools to non-ASE tools. The real world is a special case.
  • 5. Roadmap •  Round 1: – “W”: a very simple “local lessons” finder •  Discussion – What’s wrong with “W”? •  Round 2: – “NOVA” – Results from NOVA •  Conclusions
  • 7. How to survive the Titanic
  • 8. “W”: simpler (Bayesian) contrast-set learning, in linear time [Mozina: KDD’04] •  “best” = target class; “rest” = other classes •  x = any range (e.g. sex = female) •  f(x|c) = frequency of x in class c •  b = f(x|best) / F(best) •  r = f(x|rest) / F(rest) •  LOR = log(odds ratio) = log(b/r), normalized so that 0..max maps to s = 1..100 •  p = F(best) / (F(best) + F(rest)) •  P(best) = 1 / (1 + e^(-ln(p/(1-p)) - s)), with e = 2.7183… “W”: (1) discretize data and outcomes; (2) count frequencies of ranges in classes; (3) sort ranges by LOR; (4) greedy search on the top-ranked ranges.
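A minimal sketch of the contrast-set scoring step in Python (the data layout and the 0.5 smoothing constant are assumptions, not from the talk):

```python
import math
from collections import Counter

def lor_scores(best_rows, rest_rows):
    """Rank (feature, range) pairs by log-odds ratio of appearing in
    'best' rows versus 'rest' rows. Each row is a dict mapping a
    feature name to its discretized range."""
    F_best, F_rest = len(best_rows), len(rest_rows)
    f_best = Counter(x for row in best_rows for x in row.items())
    f_rest = Counter(x for row in rest_rows for x in row.items())
    scores = {}
    for x, n in f_best.items():
        b = n / F_best                    # b = f(x|best) / F(best)
        r = f_rest.get(x, 0.5) / F_rest   # r = f(x|rest) / F(rest); 0.5 avoids log(0)
        if b > r:                         # keep only ranges that favor "best"
            scores[x] = math.log(b / r)   # LOR = log(b / r)
    return sorted(scores.items(), key=lambda kv: -kv[1])
```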
  • 9. Preliminaries “W” + CBR •  “Query” –  What kind of project you want to analyze; e.g. •  Analysts not so clever, •  High reliability system •  Small KLOC •  “Cases” –  Historical records, with their development effort •  Output: –  A recommendation on how to change our projects in order to reduce development effort
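For concreteness, the query and a case from slide 9 might be represented as follows (feature names and values are hypothetical, loosely COCOMO-flavored):

```python
# A query: constraints describing the kind of project to analyze.
query = {"acap": "low",     # analysts not so clever
         "rely": "high",    # high-reliability system
         "kloc": "small"}   # small code base

# A case: one historical project, its features plus its utility (effort).
case = {"features": {"acap": "low", "rely": "high",
                     "kloc": "small", "pmat": "nominal"},
        "effort_months": 42}
```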
  • 10. Cases: split into train and test sets. Cases map features F to a utility; F = controllables + others.
  • 11. (adds) k-NN: select the train cases relevant to the query (query ⊆ ranges).
  • 12. (adds) Sort the relevant cases by utility into “best” and “rest”; for any range x, b = F(x|best) / F(best) and r = F(x|rest) / F(rest).
  • 13. (adds) Score each range: if controllable(x) && b > r && b > min then score(x) = log(b/r) else score(x) = 0 fi; S = all x sorted descending by score.
  • 14. (adds) Treatments: query_i* = query + ∪_{j≤i} S_j; run k-NN again with each treated query.
  • 15. (adds) For each i, record the median and spread of the resulting utilities.
  • 16. (complete build) Compare q_0* (“as is”) against q_i* (“to be”).
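Slides 10-16 build up one procedure; the sketch below reconstructs it in Python under stated assumptions (the relevance measure, the quartile split, and the utility field name are guesses, and lor_scores is the helper sketched after slide 8):

```python
from statistics import median, quantiles

def knn(query, cases, k):
    """Relevant cases: the k cases matching the most query constraints."""
    def matches(c):
        return sum(c["features"].get(f) == v for f, v in query.items())
    return sorted(cases, key=matches, reverse=True)[:k]

def W(query, cases, k=20, max_treatments=5):
    relevant = sorted(knn(query, cases, k),
                      key=lambda c: c["effort_months"])
    best = [c["features"] for c in relevant[:k // 4]]   # lowest-effort quartile = "best"
    rest = [c["features"] for c in relevant[k // 4:]]
    S = lor_scores(best, rest)                          # ranked controllable ranges
    out = []
    for i in range(max_treatments + 1):                 # i = 0 is the "as is" baseline
        treated = dict(query, **dict(x for x, _ in S[:i]))  # query_i* = query + top-i ranges
        efforts = [c["effort_months"] for c in knn(treated, cases, k)]
        q25, _, q75 = quantiles(efforts, n=4)
        out.append((i, median(efforts), q75 - q25))     # (i, median, spread)
    return out
```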
  • 17. Results: distribution of development efforts in query_i*, using cases from http://promisedata.org •  Median = 50th percentile •  Spread = 75th - 25th percentile •  Improvement = (X - Y) / X, where X = “as is” and Y = “to be”; more is better. [Scatter plot: median improvement vs. spread improvement.] Usually: spread improvement ≥ 75%; median improvement ≥ 60%.
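The three statistics on slide 17, as one self-contained snippet (the effort numbers are made up for illustration):

```python
from statistics import median, quantiles

efforts = [26, 31, 40, 49, 58, 77, 103]   # "to be" development efforts (illustrative)
q25, q50, q75 = quantiles(efforts, n=4)
spread = q75 - q25                        # spread = 75th - 25th percentile

def improvement(x, y):
    """Improvement = (X - Y) / X; X = 'as is', Y = 'to be'; more is better."""
    return (x - y) / x
```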
  • 18. Treatments suggested by “W”: all different.
  • 19. Roadmap •  Round 1: – “W”: a very simple “local lessons” finder •  Discussion – What’s wrong with “W”? •  Round 2: – “NOVA” – Results from NOVA •  Conclusions
  • 22. (as slide 16, plus:) A greedy linear-time search? We need much better search algorithms: simulated annealing, beam, A*, ISSAMP, MaxWalkSat, and SEESAW (home-brew).
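As one example from that list, a generic simulated-annealing loop (a textbook sketch, not the talk’s implementation; mutate and score are assumed, and lower scores are taken to be better):

```python
import math, random

def simulated_annealing(init, mutate, score, steps=1000):
    """Accept worse neighbors with a probability that shrinks as the
    'temperature' cools, to escape local minima a greedy search hits."""
    current = best = init
    for t in range(1, steps + 1):
        temperature = steps / t                  # simple cooling schedule (an assumption)
        candidate = mutate(current)
        delta = score(candidate) - score(current)
        if delta < 0 or random.random() < math.exp(-delta / temperature):
            current = candidate                  # downhill, or lucky uphill, move
        if score(current) < score(best):
            best = current
    return best
```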
  • 25. (as slide 16, plus:) Just trying to reduce effort? What about development time? What about the number of defects? What about different business contexts, e.g. “racing to market” vs. mission-critical apps?
  • 28. (as slide 16, plus:) Is nearest neighbor causing conclusion instability? Q: how to smooth the bumps between the samples? A: don’t apply the constraints to the data; apply them as model inputs instead.
  • 31. (as slide 16, plus:) Just one test? What about looking for stability in “N” repeats?
  • 32-35. (builds) So, four fixes: (1) more tests, (2) more models, (3) more goals, (4) more search.
  • 36. Roadmap •  Round 1: – “W”: a very simple “local lessons” finder •  Discussion – What’s wrong with “W”? •  Round 2: – “NOVA” – Results from NOVA •  Conclusions
  • 37. More models? The USC COCOMO suite (Boehm 1981, 2000): COCOMO estimates the time to build it (calendar months) and the effort to build it (total staff months); COQUALMO estimates defects per 1000 lines of code. Estimate = model(p, t), where p = project options and t = tuning options. Normal practice: adjust t using local data. NOVA: stagger randomly over all tunings ever seen before.
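In outline, Estimate = model(p, t) with NOVA’s randomly staggered tunings might look like this (a toy model in the COCOMO shape; the coefficient ranges are illustrative, not the calibrated USC values):

```python
import random

def random_tunings():
    """Tuning options t, sampled from ranges rather than fitted to
    local data (NOVA's 'stagger randomly' move). Ranges are made up."""
    return {"a": random.uniform(2.0, 10.0),
            "b": random.uniform(0.9, 1.1)}

def estimate(p, t):
    """Estimate = model(p, t): effort = a * KLOC^b * effort multiplier."""
    return t["a"] * p["kloc"] ** t["b"] * p.get("em", 1.0)

# Hold project options p fixed; study the spread over many tunings.
p = {"kloc": 75, "em": 1.1}
estimates = sorted(estimate(p, random_tunings()) for _ in range(1000))
```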
  • 38. More goals. Goal #1, BFC (“better, faster, cheaper”): try to minimize development time and development effort and the number of defects. Goal #2, XPOS (minimize risk exposure): rushing to beat the competition, get to market as soon as you can, without too many defects.
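Written as objective functions, the two goals might look as follows (the weights are hypothetical stand-ins, not the paper’s risk-exposure formula):

```python
def bfc(months, staff_months, defects):
    """Goal #1, better/faster/cheaper: minimize time, effort and
    defects together (unweighted sum, for illustration only)."""
    return months + staff_months + defects

def xpos(months, staff_months, defects):
    """Goal #2, rushing to market: weight calendar time heavily,
    while still penalizing defects (hypothetical weights)."""
    return 3 * months + defects
```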
  • 39. More search engines, not greedy search: simulated annealing, ISSAMP, A*, beam, MaxWalkSat, and SEESAW (MaxWalkSat + boundary mutation). SEESAW is the local favorite: it does best at reducing defects or effort or time.
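A sketch of one MaxWalkSat-style step, the family SEESAW belongs to (this is the generic scheme, not the authors’ SEESAW; options maps each decision to its legal values, and SEESAW would further restrict mutation to a range’s boundary values):

```python
import random

def maxwalksat_step(solution, options, score, p=0.5):
    """With probability p make a random move on one decision,
    otherwise make the locally best move on that decision."""
    key = random.choice(list(options))
    if random.random() < p:
        value = random.choice(options[key])       # random walk
    else:
        value = min(options[key],                 # greedy local move
                    key=lambda v: score({**solution, key: v}))
    return {**solution, key: value}
```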
  • 40. More tests: four data sets, repeated N = 20 times. •  OSP = orbital space plane GNC •  OSP2 = second-generation GNC •  Flight = JPL flight systems •  Ground = JPL ground systems. For each data set, search N = 20 times (with SEESAW) and record how often each decision is found.
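Recording decision frequencies over the N = 20 repeats is simple bookkeeping; a sketch (search is assumed to return the final solution as a dict of decisions):

```python
from collections import Counter

def decision_frequencies(search, data, n=20):
    """Run the search n times; report, per decision, the percentage
    of runs whose final solution contains it."""
    counts = Counter()
    for _ in range(n):
        counts.update(search(data).items())
    return {decision: 100 * c / n for decision, c in counts.items()}
```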
  • 41. Frequency (%) of each range across the 20 repeats, comparing BFC (better, faster, cheaper) against XPOS (minimize risk exposure; rushing to market), ignoring all ranges found < 50% of the time. If the frequency is high, the range appears more often in BFC; if low, usually in XPOS; if 50%, equally in both. Mostly, a range selected by one goal is rejected by the other: “value” (business context) changes everything.
  • 42. And what of defect-removal techniques? Comparing BFC (better, faster, cheaper) and XPOS (minimize risk exposure; rushing to market): aa = automated analysis, etat = execution testing and tools, pr = peer review. Stopping defect introduction is better than defect removal.
  • 43. Roadmap •  Round 1: – “W”: a very simple “local lessons” finder •  Discussion – What’s wrong with “W”? •  Round 2: – “NOVA” – Results from NOVA •  Conclusions
  • 44. Conclusion •  When making relative assessments of the costs/benefits of SE technologies: –  Context changes everything. –  Claims of relative cost benefits of tools are context-dependent. •  The “local lessons effect”: –  Ideas that are useful in general –  may not be the most useful in particular. •  Tools to find context-dependent local lessons: –  “W”: a case-based reasoner (goal: reduce effort). –  “NOVA”: a more intricate tool (goal: reduce effort and time and defects). •  Take-home lessons: –  Beware blanket claims of “this is better”. –  Always define the context where you tested your tools. –  Compare ASE tools to non-ASE tools. The real world is a special case.
