On Specifying and Sharing Scientific Workflow Optimization Results Using Research Objects


  1. On Specifying and Sharing Scientific Workflow Optimization Results Using Research Objects
     8th Workshop On Workflows in Support of Large-Scale Science, 17 November 2013
     Member of the Helmholtz Association
     Sonja Holl*, Daniel Garijo+, Khalid Belhajjame$, Olav Zimmermann*, Renato De Giovanni#, Matthias Obst~, Carole Goble$
     * Jülich Supercomputing Centre (JSC), Forschungszentrum Juelich, Germany
     + Ontology Engineering Group, Facultad de Informática, Universidad Politécnica de Madrid, Spain
     $ School of Computer Science, University of Manchester, UK
     # Reference Center on Environmental Information, Campinas SP, Brazil
     ~ Department of Biological and Environmental Sciences, University of Gothenburg, Sweden
  2. Scientific Workflows
     • Popular choice to design, manage, and execute in silico experiments
     • Sharing and reuse via workflow repositories
  3. Ecological Niche Modeling
     [Figure with elements numbered 1 to 5]
     Perform species adaptation to environmental changes (BioVeL Project)
  4. Ecological Niche Modeling Workflow
     [Workflow diagram. Inputs: Parameter, Occurrence Data, Environmental Layer, Geographic Mask. Steps: createModel, testModel, calcAUC. Output: AUC]
  5. [Life-cycle diagram of an in silico experiment. Labels: Designing workflow (from scratch), Reusing workflow, Planning, Execution, Sharing & Analysis, REFINE]
  6. Ecological Niche Modeling Workflow
     [Workflow diagram. Parameters: Gamma, Cost, NumberOfPseudoAbsences. Inputs: Occurrence Data, Environmental Layer, Geographic Mask. Steps: createModel (SVM, Maxent, GARP), testModel, calcAUC. Output: AUC]
  7. Ecological Niche Modeling Workflow
     [Workflow diagram. Parameters: Gamma, Cost, NumberOfPseudoAbsences. Inputs: Occurrence Data, Environmental Layer, Geographic Mask. Steps: createModel (SVM, Maxent, GARP), testModel, calcAUC. Output: AUC. Annotated with "Select Algorithms" and "Select Parameters" and surrounded by scattered example settings such as BLAST, gaussian, -bt, and numeric values]
  8. Common strategies to handle this challenge
     • Default parameters & applications
     • Trial and error
     • Parameter sweeps
     But:
     • Increasing complexity of scientific workflows
     • Rising number of parameters
     • Time-consuming and compute-intensive
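A rough count shows why parameter sweeps get expensive as the number of parameters rises. The sketch below assumes a hypothetical grid over the three parameters named in the workflow; the grid resolutions are illustrative and do not come from the talk.

```python
from math import prod

# Hypothetical number of grid points per parameter (illustrative values only).
grid_points = {
    "Gamma": 21,
    "Cost": 10,
    "NumberOfPseudoAbsences": 50,
}


def sweep_size(points_per_parameter):
    """Number of workflow executions a full grid sweep would need."""
    return prod(points_per_parameter.values())


total_runs = sweep_size(grid_points)  # 21 * 10 * 50 executions
```

Each extra parameter multiplies the count, so even a coarse grid quickly reaches thousands of workflow executions.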
  9. [Life-cycle diagram of an in silico experiment, now with an added Optimization phase. Labels: Designing workflow (from scratch), Reusing workflow, Planning, Execution, Optimization, Sharing & Analysis, REFINE]
  10. Intelligent automated optimization techniques
      Goal:
      • Automated way to find workflow settings that optimize the output
      • Define workflow output(s) as fitness value
      • Use fitness value for evaluation (e.g. AUC or correlation coefficient)
      • Use heuristic search algorithm to find the best settings
  11. How does it work?
      • Development of an optimization framework that extends the Taverna workflow management system
      • Abstracts the optimization process (e.g. parallel execution, security)
      • Developer API allows rapid adaptation of new optimization methods
      • Optimization plugins can be added independently
      [Architecture diagram. WMS: Taverna. Framework: Optimization Layer, API. Plugins: Parameter Optimization, Component Optimization]
  12. Taverna Optimization Framework & Plugin
      (1) Define sub-workflow
      (2) Specify input parameters (constraints)
      (3) Select fitness output parameters (e.g. AUC)
      (4) Define optimization method parameters (population size, termination criteria)
      Display the optimization result
      [Figure: Genetic Algorithm Parameter Optimization Plugin. Best fitness values shown: 0.34, 0.42, 0.48, ..., 0.49 (steps labeled 1, 2, ..., x)]
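The loop behind such a plugin can be sketched as a small genetic algorithm. This is a minimal illustration under stated assumptions, not the Taverna plugin's code. `fitness` is a hypothetical stand-in for executing the sub-workflow and reading its fitness output (e.g. AUC). Truncation selection, uniform crossover, and resampling mutation are choices made here for brevity. The default rates echo the Algorithm slide later in the deck, and the default population size and generation count are illustrative.

```python
import random


def genetic_optimize(fitness, bounds, population_size=6, generations=3,
                     mutation_rate=0.1, crossover_rate=0.7, seed=0):
    """Maximize `fitness` over the parameters in `bounds` ({name: (low, high)}).

    Requires population_size >= 2. Returns the best setting, its fitness,
    and the best fitness seen so far after each generation.
    """
    rng = random.Random(seed)
    names = list(bounds)
    population = [{n: rng.uniform(*bounds[n]) for n in names}
                  for _ in range(population_size)]
    best, best_fit, history = None, float("-inf"), []

    for _ in range(generations):
        # One fitness evaluation per individual (one workflow run each).
        scored = sorted(((fitness(ind), ind) for ind in population),
                        key=lambda pair: pair[0], reverse=True)
        if scored[0][0] > best_fit:
            best_fit, best = scored[0][0], dict(scored[0][1])
        history.append(best_fit)

        # Truncation selection: the better half become parents.
        parents = [ind for _, ind in scored[:max(2, population_size // 2)]]
        children = []
        while len(children) < population_size:
            a, b = rng.sample(parents, 2)
            child = dict(a)
            if rng.random() < crossover_rate:
                # Uniform crossover: each parameter comes from either parent.
                child = {n: rng.choice((a[n], b[n])) for n in names}
            for n in names:
                if rng.random() < mutation_rate:
                    # Mutation: resample the parameter within its bounds.
                    child[n] = rng.uniform(*bounds[n])
            children.append(child)
        population = children

    return best, best_fit, history
```

A call such as `genetic_optimize(run_workflow, {"Gamma": (0.0, 10.0)})`, with `run_workflow` supplied by the caller, returns a history that never decreases, matching the rising "Best Fitness" values on the slide.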
  13. Status quo
      • Workflow optimization starts from scratch each time
      • Optimization meta-data are lost
      Idea: Capture optimization meta-data next to traditional provenance data
      ⇒ learn from/extend prior optimization runs
      ⇒ improve and accelerate optimization process
  14. Research Objects
      • Aligned with W3C standards
      • Aggregates various resources
      • Describes scientific processes in machine readable format
      • Specified by several ontologies
      [Figure: a Research Object linked to its resources via ore:aggregates]
  15. Taverna Optimization Framework & Plugin
      (1) Define sub-workflow
      (2) Specify input parameters (constraints)
      (3) Select fitness output parameters (e.g. AUC)
      (4) Define optimization parameters (population size, termination criteria)
      Display the optimization result
      [Figure: Genetic Algorithm Parameter Optimization Plugin. Best fitness values shown: 0.34, 0.42, 0.48, ..., 0.49 (steps labeled 1, 2, ..., x)]
  16. Optimization Research Object Ontology
      [Diagram: ro:Research Object and opt:Optimization Research Object; ore:aggregates links to the classes below. Legend: rdfs:subClassOf, rdf:Property]
      • opt:Algorithm: Describes the optimization algorithm and its parameters
      • opt:Fitness: Describes the fitness functions
      • opt:Generation: Defines the population size and generation number for an Optimization Run
      • opt:Optimization Run: Represents one result set: sub-workflow, parameters and obtained fitness values
      • opt:Search Space: Describes the dependencies and parameter constraints
      • opt:Termination Condition: Describes the termination condition defined by the user
      • opt:Workflow: The workflow that was optimized
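To make the aggregation concrete, the classes can be mirrored as plain Python dataclasses. This is only an illustrative data model: the ontology itself is expressed in RDF, and every field name below is an assumption made for this sketch rather than a term from the ontology. The example values are the ones shown elsewhere in the deck; the workflow identifier and `origin` string are placeholders.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class Algorithm:
    name: str
    parameters: Dict[str, float]


@dataclass
class SearchSpace:
    ranges: Dict[str, Tuple[float, float]]
    constraints: List[str] = field(default_factory=list)


@dataclass
class OptimizationRun:
    origin: str                   # where the result came from
    parameters: Dict[str, float]  # parameter setting that was evaluated
    fitness: float                # obtained fitness value


@dataclass
class OptimizationResearchObject:
    workflow: str                 # placeholder identifier of the optimized workflow
    algorithm: Algorithm
    search_space: SearchSpace
    termination_condition: Optional[str] = None
    runs: List[OptimizationRun] = field(default_factory=list)

    def best_run(self) -> OptimizationRun:
        """Return the aggregated run with the highest fitness."""
        return max(self.runs, key=lambda run: run.fitness)


example = OptimizationResearchObject(
    workflow="ecological-niche-modeling",  # hypothetical identifier
    algorithm=Algorithm("Genetic Algorithm",
                        {"mutation_rate": 0.1, "crossover_rate": 0.7}),
    search_space=SearchSpace(ranges={"Gamma": (0.0, 10.0)}),
    runs=[OptimizationRun(
        origin="example result",           # hypothetical label
        parameters={"Gamma": 2.36, "Cost": 8, "NumberOfPseudoAbsences": 363},
        fitness=0.9207,
    )],
)
```

Keeping the runs next to the algorithm and search-space descriptions is what lets a later optimization inspect what was tried and with which settings.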
  17. Algorithm
      • Genetic Algorithm
      • Mutation rate: 0.1
      • Crossover rate: 0.7
  18. Search Space
      Gamma:
      • Double
      • 0 - 10
      • Cost/2 < Gamma (fictional)
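The three entries for Gamma (type, range, dependency) can be read as a membership test on candidate settings. The sketch below is a hedged illustration: the function name is hypothetical, treating the range bounds as inclusive is an assumption, and the dependency is the constraint the slide itself marks as fictional.

```python
def in_search_space(gamma, cost):
    """Check a candidate against the Gamma entry of the search space."""
    is_double = isinstance(gamma, float)      # type: Double
    in_range = 0.0 <= gamma <= 10.0           # range: 0 - 10 (inclusive bounds assumed)
    satisfies_dependency = cost / 2 < gamma   # fictional dependency: Cost/2 < Gamma
    return is_double and in_range and satisfies_dependency
```

A candidate with Gamma 5.0 and Cost 8.0 passes because 4.0 < 5.0, while Gamma 11.0 fails the range check regardless of Cost.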
  19. Optimization Run
      • Origin of result
      • Parameter setting
      • Fitness value
  20. Taverna Optimization Framework & Plugin
      (1) Define sub-workflow
      (2) Specify input parameters (constraints)
      (3) Select fitness output parameters (e.g. AUC)
      (4) Define optimization parameters (population size, termination criteria)
      Display the optimization result
      [Figure: Genetic Algorithm Parameter Optimization Plugin. Best fitness values shown: 0.34, 0.42, 0.48, ..., 0.49 (steps labeled 1, 2, ..., x). Overlay: Generation 1 Iteration 1, Fitness: 0.05]
  21. Taverna Optimization Framework & Plugin
      (1) Define sub-workflow
      (2) Specify input parameters (constraints)
      (3) Select fitness output parameters (e.g. AUC)
      (4) Define optimization parameters (population size, termination criteria)
      Display the optimization result
      [Figure: Genetic Algorithm Parameter Optimization Plugin. Best fitness values shown: 0.34, 0.42, 0.48, ..., 0.49 (steps labeled 1, 2, ..., x). Overlay: Generation 1, Iterations 1 to 6, with fitness values 0.05, 0.22, 0.27, 0.19, 0.31, 0.34]
  22. Taverna Optimization Framework & Plugin
      (1) Define sub-workflow
      (2) Specify input parameters (constraints)
      (3) Select fitness output parameters (e.g. AUC)
      (4) Define optimization parameters (population size, termination criteria)
      Display the optimization result
      [Figure: Genetic Algorithm Parameter Optimization Plugin. Best fitness values shown: 0.34, 0.42, 0.48, ..., 0.49 (steps labeled 1, 2, ..., x). Overlay: per-iteration fitness values for Generations 1, 2 and 3, six iterations each]
  23. Example Result
      Name                     Value
      Gamma                    2.36
      Cost                     8
      NumberOfPseudoAbsences   363
      Fitness                  0.9207
  24. Benefits of sharing and exploiting Optimization Research Objects
      • What is the optimal setting? - Reuse optimized settings
      • What ranges have been explored? - Adopt used parameter ranges
      • What algorithm settings were used? - Reuse algorithm settings
      • Are there similar optimizations? - Reuse existing results
      • Resume the optimization
      • Embed optimization provenance into workflow infrastructures to be reused by other scientists
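One way to picture "reuse existing results" and "resume the optimization" is to seed a new run's initial population with the best earlier settings. The sketch below is an assumption about how that could be done, not a description of the framework. `prior_runs` is a hypothetical list of (parameters, fitness) pairs taken from a shared result set, and the function name and bounds format are made up for this example.

```python
import random


def seed_population(prior_runs, bounds, population_size, seed=0):
    """Build an initial population from earlier results, topped up randomly.

    prior_runs: list of (parameters, fitness) pairs from earlier optimizations.
    bounds: {name: (low, high)} describing the new search space.
    """
    rng = random.Random(seed)

    def within_bounds(params):
        return all(name in params and low <= params[name] <= high
                   for name, (low, high) in bounds.items())

    # Keep only earlier settings valid in the new search space, best first.
    reusable = sorted((run for run in prior_runs if within_bounds(run[0])),
                      key=lambda run: run[1], reverse=True)
    population = [{name: params[name] for name in bounds}
                  for params, _ in reusable[:population_size]]

    # Fill the remaining slots with random settings inside the bounds.
    while len(population) < population_size:
        population.append({name: rng.uniform(low, high)
                           for name, (low, high) in bounds.items()})
    return population
```

Starting from known good settings instead of a purely random population is one plausible way the shared meta-data could shorten a later optimization.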
  25. Conclusion
      • Scientific workflows are hard to configure
      • Optimization can help but meta-data get lost
      • Extend Research Objects
      • Build new Optimization Research Object Ontology
      • Reuse of optimization meta-data to speed up optimization
      • Shareable with the community in workflow infrastructures
      • Outlook: How to learn from similar workflows?
  26. Links
      http://purl.org/net/ro-optimization
      http://purl.org/net/svm-opt-research-object
  27. Questions? Thank you!
