On the value of Sampling and Pruning for SBSE

Oral Prelim Exam slides (for publication).
Thesis statement: for the optimization of SE planning and re-planning tasks, given appropriate separation operators, oversampling and pruning is better than mutation-based evolutionary approaches.

Published in: Engineering
On the value of Sampling and Pruning for SBSE

  1. 1. On the Value of Sampling and Pruning for Search-Based Software Engineering. Jianfeng Chen (jchen37@ncsu.edu), April 20, 2018
  2. 2. How to better support SE planning + re-planning? Plan (what to do) vs. re-plan (how to react to new circumstances): ● Which features to include in the project? / Which features to include in v_{i+1}? ● How to assign software to a cloud environment? / How to adjust to cloud environment changes? ● What to test first? / What to test next?
  3. 3. Problem: planning & re-planning can be very slow (long running time). [Zhang’17] Yuanyuan Zhang, Mark Harman, and A. Mansouri. The SBSE repository: A repository and analysis of authors and research articles on search based software engineering. CREST Centre, UCL.
  4. 4. Thesis Statement: For the optimization of SE planning and re-planning tasks, ● given appropriate separation operators [1], ● OverSampling and Pruning [1] (OSAP) is better ● than the mutation-based EVOLutionary [1] (EVOL) approach ● (where “better” is measured in terms of runtime, number of evaluations, and value of the final result). [1] To be defined later in this talk.
  5. 5. Roadmap. Outline: Introduction, EVOL, GALE, OSAP (├─ TopDown Bi-clustering ├─ Encoding Knowledge └─ Random Anchors). ● What is Search-based SE ● EVOL: Evolutionary algorithms ○ GALE: A geometric learner ● OSAP: Oversampling-and-pruning via Separation Operators
  6. 6. Publications & tools in this PhD program (timeline markers: THIS TALK, FINAL THESIS). Publications and tools: ● [CLOUD18 Chen et al.] (accept rate: 15%) RIOT: workflow scheduling tool ● [TSE18 Chen et al.] Sampling as a baseline for SBSE ● [IST17 Chen et al.] Beyond EA for SBSE ● [SSBSE16 Nair et al.] Accidental exploration for SBSE
  7. 7. Roadmap ● What is Search-based SE ● EVOL: Evolutionary algorithms ○ GALE: A geometric learner ● OSAP: Oversampling-and-pruning via Separation Operators
  8. 8. SE = making choices among multiple (rival) objectives ● Deployments (improving QoS vs. reducing deployment cost) ○ CLOUD: cloud configuration optimization ● Testing (test cost vs. defects detected) ○ Fuzzy testing: fewer test cases to cover more paths ● SE planning (trading off functionality vs. cost) ○ NRP: next release requirements planning ○ SPL: software product lines, product selection
  9. 9. Search-based Software Engineering (SBSE) converts a software engineering problem into a computational search problem, and solves that. [Figure: configurations a and b mapped to objective values f(a) and f(b).]
  10. 10. Dominance: p dominates q if and only if, on every objective, p performs no worse than q AND, on at least one objective, p performs strictly better than q. The non-dominated configurations form the Pareto frontier. [Figure: configuration space mapped to objective space by f(x), showing f(p), f(q), and the Pareto frontier.]
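The dominance rule on slide 10 can be written down directly. The following is a minimal sketch, assuming every objective is to be minimized and that objective values are plain tuples of floats; the function names `dominates` and `pareto_frontier` and the sample data are illustrative, not taken from the talk.

```python
from typing import List, Sequence


def dominates(p: Sequence[float], q: Sequence[float]) -> bool:
    """True if objective vector p dominates q (assumption: all objectives minimized)."""
    no_worse = all(a <= b for a, b in zip(p, q))        # no worse on every objective
    strictly_better = any(a < b for a, b in zip(p, q))  # better on at least one
    return no_worse and strictly_better


def pareto_frontier(points: List[Sequence[float]]) -> List[Sequence[float]]:
    """Keep the points that no other point dominates (quadratic scan, fine for small sets)."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


# Hypothetical objective vectors, e.g. (cost, response time).
objs = [(1.0, 5.0), (2.0, 2.0), (3.0, 3.0), (5.0, 1.0)]
frontier = pareto_frontier(objs)
```

Here `(3.0, 3.0)` is dropped because `(2.0, 2.0)` is better on both objectives, while the other three points trade one objective against the other and all stay on the frontier.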
  11. 11. Characteristics of SBSE problems ● More than one objective ● Configuration space is huge ● Constrained configurations ● Complex (configurations are not easy to assess). In the SBSE community: the evolutionary algorithm. [Figure: configuration space mapped to objective space by f(x), showing f(p) and f(q).]
  12. 12. Roadmap ● What is Search-based SE ● EVOL: Evolutionary algorithms ○ GALE: A geometric learner ● OSAP: Oversampling-and-pruning via Separation Operators
  13. 13. Evolutionary algorithm (EVOL): from initial configurations (population) to best configurations. ● Treats the problem as a black box ● Easy to deploy to a new problem ● SLOW: airspace operation model verification, 7 days [Krall’15]; test suite generation, weeks [Yoo’12]; software clone evaluation on a PC, 15 years [Wang’13]. References: Krall, Joseph, Tim Menzies, and Misty Davies. "Learning the task management space of an aircraft approach model." (2014). Yoo, Shin, and Mark Harman. "Regression testing minimization, selection and prioritization: a survey." Software Testing, Verification and Reliability 22.2 (2012): 67-120. Wang, Tiantian, et al. "Searching for better configurations: a rigorous approach to clone evaluation." Proceedings of the 2013 9th Joint Meeting on Foundations of Software Engineering. ACM, 2013.
  14. 14. Research directions in SBSE: ● 1 Better configuration encoding. E.g. [Chang’08] Time-line based model for software project scheduling with genetic algorithms. Needs expert knowledge and carefully designed recombination/mutation. ● 2 Combining EAs. E.g. [Tsai’14] A hyper-heuristic scheduling algorithm for cloud: GA+SA+ACO+PSO. Slow^2. ● 3 Re-designing objective functions. E.g. [Arcuri’17] Many Independent Objective algorithm for test suite generation. Much more complex model; longer time to evaluate. References: Chang, C. K., Jiang, H. Y., Di, Y., Zhu, D., & Ge, Y. (2008). Time-line based model for software project scheduling with genetic algorithms. Information and Software Technology, 50(11). Tsai, Chun-Wei, et al. "A hyper-heuristic scheduling algorithm for cloud." IEEE Transactions on Cloud Computing 2.2 (2014): 236-250. Arcuri, Andrea. "Many Independent Objective (MIO) Algorithm for Test Suite Generation." International Symposium on Search Based Software Engineering. Springer, Cham, 2017.
  15. 15. Current SBSE solutions are too slow! Why do we need faster optimizers? To save $$$ and to respond faster to model changes.
  16. 16. Roadmap ● What is Search-based SE ● EVOL: Evolutionary algorithms ○ GALE: A geometric learner ● OSAP: Oversampling-and-pruning via Separation Operators
  17. 17. GALE = Geometric active learner [Krall’15]. [Figure: initial configurations (population) and best configurations, in configuration space and objective space.] [Krall’15] Krall, Joseph, Tim Menzies, and Misty Davies. "Gale: Geometric active learning for search-based software engineering." TSE.
  18. 18. GALE = Geometric active learner [Krall’15]. [Figure: best configurations and initial configurations (population), in configuration space and objective space.] [Krall’15] Krall, Joseph, Tim Menzies, and Misty Davies. "Gale: Geometric active learning for search-based software engineering." TSE.
  19. 19. GALE = Geometric active learner [Krall’15]. EVOL vs. GALE: ● Population: N = 100 (EVOL); N = 100 (GALE) ● Recombination: ✓ (EVOL); ✓ (GALE) ● Mutation: ✓ (EVOL); ✓ (GALE) ● Evaluations: gen# * N (EVOL); gen# * 2*log(N) (GALE), i.e. O(G·N) -> O(G·logN). [Krall’15] Krall, Joseph, Tim Menzies, and Misty Davies. "Gale: Geometric active learning for search-based software engineering." TSE.
  20. 20. GALE = Geometric active learner [Krall’15]. [Krall’15] Krall, Joseph, Tim Menzies, and Misty Davies. "Gale: Geometric active learning for search-based software engineering." TSE.
  21. 21. The selected configuration region did not shift much, so it is not necessary to explore more generations. Instead, increase the population size [100 -> 10,000]: over-sampling. [Figure: configuration space and objective space.] [Krall’15] Krall, Joseph, Tim Menzies, and Misty Davies. "Gale: Geometric active learning for search-based software engineering." TSE.
  22. 22. OSAP: Oversampling and pruning. EVOL vs. GALE vs. over-sampling: ● Population: N = 100 (EVOL); N = 100 (GALE); N = 10,000 (over-sampling) ● Recombination: ✓ (EVOL); ✓ (GALE); ✘ (over-sampling) ● Mutation: ✓ (EVOL); ✓ (GALE); ✘ (over-sampling) ● Evaluations: gen# * N (EVOL); gen# * 2log(N) (GALE); 2log(N) (over-sampling), i.e. O(G·N) -> O(G·logN) -> O(logN). Over-sampling: the population is much larger.
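To make the evaluation counts in the comparison on slide 22 concrete, here is a small arithmetic sketch. It assumes a base-2 logarithm (each split halves the population) rounded up, and a hypothetical budget of G = 50 generations; neither value is stated on the slides.

```python
import math


def evals_evol(generations: int, n: int) -> int:
    """EVOL evaluates every member of the population in every generation: gen# * N."""
    return generations * n


def evals_gale(generations: int, n: int) -> int:
    """GALE evaluates two poles per split level: gen# * 2*log(N) (assumed base 2)."""
    return generations * 2 * math.ceil(math.log2(n))


def evals_oversampling(n: int) -> int:
    """Over-sampling runs a single pass over a large population: 2*log(N) (assumed base 2)."""
    return 2 * math.ceil(math.log2(n))


G = 50  # hypothetical number of generations, for illustration only
counts = {
    "EVOL (N=100)": evals_evol(G, 100),
    "GALE (N=100)": evals_gale(G, 100),
    "Over-sampling (N=10,000)": evals_oversampling(10_000),
}
```

Under these assumptions the three counts are 5000, 700, and 28 model evaluations, which is the O(G·N) -> O(G·logN) -> O(logN) progression the slide describes, even though the over-sampled population is 100 times larger.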
  23. 23. Roadmap ● What is Search-based SE ● EVOL: Evolutionary algorithms ○ GALE: A geometric learner ● OSAP: Oversampling-and-pruning via Separation Operators. Separation operator 1: Top-down bi-clustering (columns still to fill: Algorithm, Configuration Space, Study Cases).
  24. 24. SWAY = Top-down bi-clustering. In the configuration space: (R) a random initial configuration; (W) the configuration furthest from (R); (E) the configuration furthest from (W). W and E mark the “diameter” of the configuration space.
  25. 25. SWAY = Top-down bi-clustering. In the configuration space: (W) the configuration furthest from (R); (E) the configuration furthest from (W). W and E mark the “diameter” of the configuration space.
  26. 26. Separation operator 1: Top-down bi-clustering. Algorithm: SWAY. Configuration space: continuous. Study cases: XOMO, POM3. Chen, Jianfeng, et al. "“Sampling” as a Baseline Optimizer for Search-based Software Engineering." IEEE Transactions on Software Engineering (2018).
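The R / W / E construction on the SWAY slides can be sketched as a recursive split. This is a simplified illustration under stated assumptions, not the authors' implementation: configurations are tuples of floats compared by Euclidean distance, all objectives are minimized, the half whose pole is dominated gets pruned, both halves are kept when neither pole dominates, and recursion stops at a hypothetical `enough` threshold. The toy model and every name below are made up for the example.

```python
import math
import random
from typing import Callable, List, Optional, Sequence, Tuple

Config = Tuple[float, ...]
Objectives = Sequence[float]


def dist(a: Config, b: Config) -> float:
    """Euclidean distance in configuration space (assumed metric)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def dominates(p: Objectives, q: Objectives) -> bool:
    """p dominates q, assuming every objective is minimized."""
    return all(a <= b for a, b in zip(p, q)) and any(a < b for a, b in zip(p, q))


def sway(items: List[Config],
         evaluate: Callable[[Config], Objectives],
         enough: int = 8,
         rng: Optional[random.Random] = None) -> List[Config]:
    """Recursively halve the candidates, evaluating only the two poles per split."""
    rng = rng or random.Random(0)
    if len(items) <= max(enough, 1):
        return list(items)
    r = rng.choice(items)                             # (R) random configuration
    west = max(items, key=lambda c: dist(c, r))       # (W) furthest from R
    east = max(items, key=lambda c: dist(c, west))    # (E) furthest from W
    span = dist(west, east)                           # approximate "diameter"
    if span == 0.0:                                   # all candidates identical
        return list(items)

    def position(c: Config) -> float:
        # Projection of c onto the W-E line via the cosine rule.
        a, b = dist(c, west), dist(c, east)
        return (a * a + span * span - b * b) / (2 * span)

    ordered = sorted(items, key=position)
    mid = len(ordered) // 2
    west_half, east_half = ordered[:mid], ordered[mid:]
    f_west, f_east = evaluate(west), evaluate(east)   # the only evaluations in this split
    if dominates(f_west, f_east):
        return sway(west_half, evaluate, enough, rng)     # prune the east half
    if dominates(f_east, f_west):
        return sway(east_half, evaluate, enough, rng)     # prune the west half
    return (sway(west_half, evaluate, enough, rng)
            + sway(east_half, evaluate, enough, rng))


calls = {"n": 0}


def toy_model(c: Config) -> Objectives:
    """Hypothetical two-objective model that also counts how often it is called."""
    calls["n"] += 1
    x, y = c
    return (x + y, abs(x - 0.5) + y)


gen = random.Random(1)
population = [(gen.random(), gen.random()) for _ in range(1000)]  # over-sampled candidates
survivors = sway(population, toy_model)
```

The point of the sketch is the evaluation budget: only the two poles of each split are passed to the model, so `calls["n"]` stays far below the population size even when neither pole dominates and both halves are explored.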
  27. 27. Assumption: a small region of the configuration space leads to the frontier. What if it does not? [Figure: configuration space and objective space.]
  28. 28. Perform the top-down bi-clustering separately. [Figure: configuration space and objective space.]
  29. 29. SWAY2: separate via encoding knowledge. Encoding: represent the model configuration as vectors, combinations, etc. How is the model encoded? How can we gather similar configurations? [Figure: configuration space and objective space.]
  30. 30. Software Product Line optimization. Objectives: select features to develop such that there are ● more features ● fewer defects ● lower total cost ● more familiar features
  31. 31. Software Product Line optimization. Configuration (feature model). [Figure: feature model with optional and mandatory features and cross-tree constraints.]
  32. 32. Software Product Line optimization. CNF (conjunctive normal form): solvable by SAT solvers. Initialization via SAT solver.
  33. 33. Software Product Line optimization. CNF (conjunctive normal form): solvable by SAT solvers. Initialization via SAT solver. HIGH DIMENSIONAL, HIGHLY CONSTRAINED.
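To illustrate what a feature model in CNF looks like, the sketch below encodes a made-up four-feature model as clauses of signed integers (the DIMACS convention: `3` means feature 3 is selected, `-3` means it is not) and lists the valid configurations by brute force. The toy model and all names are assumptions for illustration; a real product line such as the ones in this talk is far too large to enumerate and needs an actual SAT solver for initialization.

```python
from itertools import product
from typing import List, Tuple

# Hypothetical feature model: 1 = root, 2 = mandatory child, 3 and 4 = optional children,
# plus one cross-tree constraint "feature 3 requires feature 4".
CLAUSES: List[List[int]] = [
    [1],        # the root is always selected
    [-1, 2],    # root -> 2 (mandatory child)
    [-2, 1],    # 2 -> root
    [-3, 1],    # 3 -> root (optional child)
    [-4, 1],    # 4 -> root (optional child)
    [-3, 4],    # cross-tree constraint: 3 -> 4
]
N_FEATURES = 4


def satisfies(config: Tuple[bool, ...], clauses: List[List[int]]) -> bool:
    """A configuration is valid if every clause has at least one true literal."""
    return all(
        any((lit > 0) == config[abs(lit) - 1] for lit in clause)
        for clause in clauses
    )


def valid_configurations(n: int, clauses: List[List[int]]) -> List[Tuple[bool, ...]]:
    """Brute-force enumeration over all 2^n binary vectors (toy sizes only)."""
    return [c for c in product([False, True], repeat=n) if satisfies(c, clauses)]


valid = valid_configurations(N_FEATURES, CLAUSES)
```

Only 3 of the 16 possible binary vectors are valid here, which gives a small-scale picture of "highly constrained": random binary vectors rarely satisfy the constraints, so the initial population has to come from a solver.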
  34. 34. Software Product Line optimization: related work (EVOL). Timeline: White’08, Wu’11, Sayyad’13, Henard’15. Approaches: single objective, aggregated objectives, IBEA. References: White, Jules, Brian Dougherty, and Douglas C. Schmidt. "Filtered Cartesian Flattening: An Approximation Technique for Optimally Selecting Features while Adhering to Resource Constraints." SPLC (2). 2008. Wu, Zhiqiao, et al. "An optimization model for reuse scenario selection considering reliability and cost in software product line development." International Journal of Information Technology & Decision Making 10.05 (2011): 811-841. Sayyad, Abdel Salam, Tim Menzies, and Hany Ammar. "On the value of user preferences in search-based software engineering: a case study in software product lines." ICSE’13. Sayyad, Abdel Salam, et al. "Scalable product line configuration: A straw to break the camel's back." Automated Software Engineering (ASE), 2013. Henard, Christopher, et al. "Combining multi-objective search and constraint solving for configuring large software product lines." Software Engineering (ICSE), 2015.
  35. 35. How is the model encoded? How can we gather similar configurations? As scale increases (the figure shows scale = 4). [Figure: configuration space and objective space.]
  36. 36. As scale increases: radius ∝ scale. Inner circle: smaller area, less diversity, for simple configurations. Outer circle: larger area, more diversity, for complex configurations.
  37. 37. Radius ∝ scale. Smaller area: fewer configurations. Larger area: more configurations. [Figure: configuration space and objective space.]
  38. 38. SWAY2 is (orders of magnitude) faster than state-of-the-art EVOL. This is important when models become complex. [Figure: state-of-the-art EVOL vs. SWAY2 against the number of constraints, i.e. the complexity of the model.]
  39. 39. Quality indicators for obtained frontiers: Pareto front size (PFS), the # of obtained frontiers; hyper-volume (HV); spread (GS). Models: Webportal 81, Eshop 506, Fiasco 5228, Freebsd 62138, Linux 343944. Wang, Shuai, et al. "A practical guide to select quality indicators for assessing pareto-based search algorithms in search-based software engineering." Software Engineering (ICSE), 2016 IEEE/ACM 38th International Conference on. IEEE, 2016.
  40. 40. SWAY(*) vs. state of the art. Legend: ⬤ statistically no different from SATIBEA; ⬤ significantly better than SATIBEA; ⬤ significantly worse than SATIBEA. Two results count as not the same when A12 >= 0.6 (Arcuri and Briand at ICSE’11). Arcuri, Andrea, and Lionel Briand. "A practical guide for using statistical tests to assess randomized algorithms in software engineering." Software Engineering (ICSE), 2011 33rd International Conference on. IEEE, 2011.
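The A12 threshold on slide 40 refers to the Vargha-Delaney effect size, which estimates the probability that a value drawn from one sample is larger than a value drawn from the other. A minimal sketch follows; the function names and the sample data are illustrative, and the 0.6 cut-off is the one quoted on the slide.

```python
from typing import Sequence


def a12(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Vargha-Delaney A12: P(x > y) + 0.5 * P(x == y) over all pairs (x in xs, y in ys)."""
    more = sum(1 for x in xs for y in ys if x > y)
    same = sum(1 for x in xs for y in ys if x == y)
    return (more + 0.5 * same) / (len(xs) * len(ys))


def not_the_same(xs: Sequence[float], ys: Sequence[float], threshold: float = 0.6) -> bool:
    """Slide rule: treat xs as different from (larger than) ys when A12 >= threshold."""
    return a12(xs, ys) >= threshold


# Hypothetical hyper-volume scores from repeated runs of two optimizers.
runs_a = [0.71, 0.74, 0.69, 0.73]
runs_b = [0.60, 0.62, 0.70, 0.58]
effect = a12(runs_a, runs_b)
```

An A12 of 0.5 means the two samples are indistinguishable, and 1.0 means every value in the first sample exceeds every value in the second; in the example, 15 of the 16 pairs favor `runs_a`, so the effect size is well above the 0.6 threshold.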
  41. 41. SWAY(*) vs. state of the art, on GS, PFS, and HV for Webportal 81, Eshop 506, Fiasco 5228, Freebsd 62138, and Linux 343944, without and with encoding knowledge. Legend: ⬤ statistically no different from SATIBEA; ⬤ significantly better than SATIBEA; ⬤ significantly worse than SATIBEA. [Table: one colored ⬤ per model and indicator, for each of the two settings.] Arcuri, Andrea, and Lionel Briand. "A practical guide for using statistical tests to assess randomized algorithms in software engineering." Software Engineering (ICSE), 2011 33rd International Conference on. IEEE, 2011.
  42. 42. SWAY(*) vs. state of the art, on GS, PFS, and HV for Webportal 81, Eshop 506, Fiasco 5228, Freebsd 62138, and Linux 343944, without and with encoding knowledge. [Table: one colored ⬤ per model and indicator, for each of the two settings.] Across all measures, in the majority of cases, SWAY2 is better than SATIBEA (EVOL). Arcuri, Andrea, and Lionel Briand. "A practical guide for using statistical tests to assess randomized algorithms in software engineering." Software Engineering (ICSE), 2011 33rd International Conference on. IEEE, 2011.
  43. 43. Separation operators so far: ● 1 Top-down bi-clustering: algorithm SWAY; configuration space continuous; study cases XOMO, POM3 ● 2 Encoding knowledge: algorithm SWAY2; configuration space binary vector, highly constrained; study case SPL
  44. 44. Q: How to find the complete frontier? A: Increase the “resolution” of the separation. However, we can’t evaluate too many configurations! [Figure: configuration space and objective space.]
  45. 45. Select and evaluate a few “representative” configurations: anchors. # anchors << # init configurations. Choices of anchors: ★ 1 = the diagonal ★ 2 = random ★ 3 = 1 + 2
  46. 46. Select and evaluate a few “representative” configurations: anchors. Then use the evaluated anchors to guess the objectives of the other configurations. Surrogate model: replace the original complex model with a very simple model/formula. Similar configurations -> similar objectives. [Figure: for a configuration c to guess, its nearest anchor N and furthest anchor F, with lengths p and Q in configuration space and lengths x and Y between f(c), f(N), and f(F) on objective O1, such that x:Y = p:Q.]
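The proportion x:Y = p:Q on slide 46 can be read as a linear interpolation between two evaluated anchors. The sketch below is one possible reading, not the method as implemented in RIOT: it assumes p is the distance from c to its nearest anchor N, Q is the distance from N to the furthest anchor F, x is the gap between f(c) and f(N), and Y is the gap between f(F) and f(N), all for a single objective. The names and data are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Config = Tuple[float, ...]


def dist(a: Config, b: Config) -> float:
    """Euclidean distance in configuration space (assumed metric)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def guess_objective(c: Config, anchors: List[Config], anchor_objs: Sequence[float]) -> float:
    """Guess one objective of c from its nearest (N) and furthest (F) evaluated anchors."""
    order = range(len(anchors))
    near = min(order, key=lambda i: dist(c, anchors[i]))   # nearest anchor N
    far = max(order, key=lambda i: dist(c, anchors[i]))    # furthest anchor F
    p = dist(c, anchors[near])                             # assumed meaning of p
    q = dist(anchors[near], anchors[far])                  # assumed meaning of Q
    if q == 0.0:                                           # only one distinct anchor
        return anchor_objs[near]
    ratio = min(p / q, 1.0)                                # x:Y = p:Q, clipped to [0, 1]
    return anchor_objs[near] + ratio * (anchor_objs[far] - anchor_objs[near])


# Two hypothetical anchors that were really evaluated on objective O1.
anchors = [(0.0, 0.0), (10.0, 0.0)]
anchor_objs = [0.0, 100.0]
estimate = guess_objective((2.0, 0.0), anchors, anchor_objs)
```

A configuration one fifth of the way from N to F is guessed to have an objective one fifth of the way from f(N) to f(F), so the complex model is only ever run on the anchors.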
  47. 47. RIOT: Randomized instance types. Workflow deployments. Example: MONTAGE, a NASA workflow for generating custom images of the sky. Objectives: select proper virtual machines to execute each task so that ● the workflow ends earlier ● the cloud service rental cost is lower. [Figure: task workflow and configuration space.]
  48. 48. State-of-the-art method [Zhu’16]: EVOL based. [Figure: finish time if we deploy the model to AWS using the median $$$.] Zhu, Zhaomeng, et al. "Evolutionary multi-objective workflow scheduling in cloud." IEEE Transactions on Parallel and Distributed Systems 27.5 (2016): 1344-1357.
  49. 49. [Figure: panels for Montage, Epigenomics, Inspiral, Cybershake, and Sipht; x = number of tasks (increasing), y = speedup, EVOL/RIOT.] Zhu, Zhaomeng, et al. "Evolutionary multi-objective workflow scheduling in cloud." IEEE Transactions on Parallel and Distributed Systems 27.5 (2016): 1344-1357.
  50. 50. RIOT is much faster than the state of the art (EVOL). [Figure: panels for Montage, Epigenomics, Inspiral, Cybershake, and Sipht; x = number of tasks (increasing), y = speedup, EVOL/RIOT.] Zhu, Zhaomeng, et al. "Evolutionary multi-objective workflow scheduling in cloud." IEEE Transactions on Parallel and Distributed Systems 27.5 (2016): 1344-1357.
  51. 51. Quality of obtained frontiers: hyper-volume (HV) and spread (GS). Bold blue values: RIOT performed as well as or better than state-of-the-art EVOL. Across all measures, in the majority of cases, RIOT is statistically better than EVOL. Zhu, Zhaomeng, et al. "Evolutionary multi-objective workflow scheduling in cloud." IEEE Transactions on Parallel and Distributed Systems 27.5 (2016): 1344-1357.
  52. 52. Recap ● EVOL: Evolutionary algorithms ● OSAP: Oversampling-and-pruning via Separation Operators. Separation operators: ● 1 Top-down bi-clustering: algorithm SWAY; configuration space continuous; study cases XOMO, POM3 ● 2 Encoding knowledge: algorithm SWAY2; configuration space binary vector, highly constrained; study case SPL ● 3 Random anchors: algorithm RIOT; configuration space enumerated; study case workflow config
  53. 53. Conclusion: For the optimization of SE planning and re-planning tasks, ● given appropriate separation operators, ● over-sampling + pruning (OSAP) is better ● than the standard mutation + evolutionary (EVOL) approach.
