On Specifying and Sharing Scientific Workflow Optimization Results Using Research Objects

Member of the Helmholtz Association

8th Workshop On Workflows in Support of Large-Scale Science
17 November 2013

Sonja Holl*, Daniel Garijo+, Khalid Belhajjame$, Olav Zimmermann*, Renato De Giovanni#, Matthias Obst~, Carole Goble$

*Jülich Supercomputing Centre (JSC), Forschungszentrum Jülich, Germany
+Ontology Engineering Group, Facultad de Informática, Universidad Politécnica de Madrid, Spain
$School of Computer Science, University of Manchester, UK
#Reference Center on Environmental Information, Campinas, SP, Brazil
~Department of Biological and Environmental Sciences, University of Gothenburg, Sweden

Scientific Workflows
• Popular choice to design, manage, and execute in silico experiments
• Sharing and reuse via workflow repositories

Ecological Niche Modeling
• Used to model species adaptation to environmental changes (BioVeL Project)
[Figure: diagram with five numbered steps, 1 to 5]

Ecological Niche Modeling Workflow
[Figure: workflow diagram]
• Inputs: Parameter, Occurrence Data, Environmental Layer, Geographic Mask
• Steps: createModel, testModel, calcAUC
• Output: AUC

[Figure: life cycle of an in silico experiment with the phases Designing workflow (from scratch), Reusing workflow, Planning, Execution, Sharing & Analysis, and a REFINE loop]

Ecological Niche Modeling Workflow
[Figure: workflow diagram with algorithm and parameter choices]
• Parameters: Gamma, Cost, NumberOfPseudoAbsences
• Inputs: Occurrence Data, Environmental Layer, Geographic Mask
• Algorithms: SVM, Maxent, GARP
• Steps: createModel, testModel, calcAUC
• Output: AUC

Ecological Niche Modeling Workflow
[Figure: the same workflow annotated with "Select Algorithms" (SVM, Maxent, GARP) and "Select Parameters" (Gamma, Cost, NumberOfPseudoAbsences), surrounded by a large number of candidate parameter values and options]

Common strategies to handle this challenge
• Default parameters & applications
• Trial and error
• Parameter sweeps

But:
• Increasing complexity of scientific workflows
• Rising number of parameters
• Time-consuming and compute-intensive

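To make the cost of parameter sweeps concrete, the following minimal sketch counts the workflow executions a full grid sweep would need. The parameter names come from the niche modeling workflow; the grid resolution (10 candidate values per parameter, 3 algorithms) is an assumption chosen only for illustration.

```python
from itertools import product


def sweep_size(values_per_parameter):
    """Return the number of workflow executions a full grid sweep needs."""
    size = 1
    for count in values_per_parameter.values():
        size *= count
    return size


# Assumed resolution: 3 algorithms and 10 candidate values per parameter.
grid = {"Algorithm": 3, "Gamma": 10, "Cost": 10, "NumberOfPseudoAbsences": 10}
total_runs = sweep_size(grid)

# Enumerating a tiny grid explicitly gives the same count as the product.
tiny = {"Gamma": [0.5, 1.0], "Cost": [1, 8, 64]}
combinations = list(product(*tiny.values()))

print(total_runs)         # 3000
print(len(combinations))  # 6
```

Every additional parameter multiplies the number of executions, which is why sweeps become compute-intensive as workflows grow.
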
[Figure: the life cycle of an in silico experiment (Designing workflow (from scratch), Reusing workflow, Planning, Execution, Sharing & Analysis, REFINE loop), now extended with an Optimization phase]

Intelligent automated optimization techniques
Goal:
• Automated way to find workflow settings that optimize the output
• Define workflow output(s) as fitness value
• Use fitness value for evaluation (e.g. AUC or correlation coefficient)
• Use a heuristic search algorithm to find the best settings

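AUC is named above as an example fitness value. As a minimal sketch of how such a value could be computed from model scores, the function below uses the pairwise definition of AUC (the fraction of positive/negative pairs ranked correctly, with ties counted as half). The function name and the sample scores are illustrative assumptions, not taken from the talk or from the calcAUC step.

```python
def auc(scores_pos, scores_neg):
    """Pairwise AUC: share of (positive, negative) pairs ranked correctly."""
    wins = 0.0
    for p in scores_pos:
        for n in scores_neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half a correct ranking
    return wins / (len(scores_pos) * len(scores_neg))


# Hypothetical model scores for presence and absence points.
presence = [0.9, 0.8, 0.4]
absence = [0.5, 0.3, 0.2]
fitness = auc(presence, absence)  # 8 of 9 pairs are ranked correctly
```

A value of 1.0 means every presence point scores above every absence point; 0.5 corresponds to random ranking.
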
How does it work?
• Development of an optimization framework that extends the Taverna workflow management system
• Abstracts optimization process (e.g. parallel execution, security)
• Developer API allows rapid adaptation of new optimization methods
• Optimization plugins can be added independently
[Figure: architecture with Taverna (WMS), Optimization Layer (Framework), API, and Plugins (Parameter Optimization, Component Optimization)]

Taverna Optimization Framework & Plugin
(1) Define sub-workflow
(2) Specify input parameters (constraints)
(3) Select fitness output parameters (e.g. AUC)
(4) Define optimization method parameters (population size, termination criteria)

Genetic Algorithm Parameter Optimization Plugin
[Figure: optimization progress over generations 1, 2, ..., x with Best Fitness: 0.34, 0.42, 0.48, ..., 0.49]

Display the optimization result

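The four steps above can be sketched as a small genetic algorithm. This is not the plugin's implementation: `toy_fitness` is a hypothetical stand-in for executing the sub-workflow and reading its fitness output, the parameter bounds are assumed, and the population size, fixed generation count, tournament selection, and elitism are illustrative choices.

```python
import random

# Assumed input parameters and their allowed ranges (step 2).
BOUNDS = {"Gamma": (0.0, 10.0), "Cost": (0.0, 10.0)}


def toy_fitness(ind):
    # Hypothetical stand-in for running the sub-workflow (step 3);
    # it peaks at Gamma=2, Cost=8 with value 1.0.
    return 1.0 / (1.0 + (ind["Gamma"] - 2.0) ** 2 + (ind["Cost"] - 8.0) ** 2)


def random_individual(rng):
    return {k: rng.uniform(lo, hi) for k, (lo, hi) in BOUNDS.items()}


def tournament(pop, rng, k=2):
    # Pick k random individuals and keep the fitter one.
    return max(rng.sample(pop, k), key=toy_fitness)


def crossover(a, b, rng, rate):
    # With probability `rate`, mix the parents gene by gene.
    if rng.random() >= rate:
        return dict(a)
    return {k: (a[k] if rng.random() < 0.5 else b[k]) for k in a}


def mutate(ind, rng, rate):
    # Perturb each gene with probability `rate`, clamped to its range.
    out = dict(ind)
    for k, (lo, hi) in BOUNDS.items():
        if rng.random() < rate:
            out[k] = min(hi, max(lo, out[k] + rng.gauss(0.0, 1.0)))
    return out


def optimize(population_size=6, generations=10,
             mutation_rate=0.1, crossover_rate=0.7, seed=0):
    # Step 4: population size and a fixed generation count as termination.
    rng = random.Random(seed)
    pop = [random_individual(rng) for _ in range(population_size)]
    history = []  # best fitness of each evaluated generation
    for _ in range(generations):
        best = max(pop, key=toy_fitness)
        history.append(toy_fitness(best))
        next_pop = [best]  # elitism: the best individual always survives
        while len(next_pop) < population_size:
            child = crossover(tournament(pop, rng), tournament(pop, rng),
                              rng, crossover_rate)
            next_pop.append(mutate(child, rng, mutation_rate))
        pop = next_pop
    return best, history


best, history = optimize()
```

Because the best individual is carried over unchanged, the per-generation best fitness in `history` never decreases, which mirrors the rising "Best Fitness" values shown per generation.
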
Status quo
• Workflow optimization starts from scratch each time
• Optimization meta-data are lost

Idea: Capture optimization meta-data next to traditional provenance data
⇒ learn from/extend prior optimization runs
⇒ improve and accelerate optimization process

Research Objects
• Aligned with W3C standards
• Aggregates various resources
• Describes scientific processes in machine readable format
• Specified by several ontologies
[Figure: a Research Object linked to its resources via ore:aggregates]

Optimization Research Object Ontology
• opt:OptimizationResearchObject: subclass (rdfs:subClassOf) of ro:ResearchObject; aggregates (ore:aggregates) the resources below
• opt:Algorithm: Describes the optimization algorithm and its parameters
• opt:Fitness: Describes the fitness functions
• opt:Generation: Defines the population size and generation number for an Optimization Run
• opt:OptimizationRun: Represents one result set: sub-workflow, parameters and obtained fitness values
• opt:SearchSpace: Describes the dependencies and parameter constraints
• opt:TerminationCondition: Describes the termination condition defined by the user
• opt:Workflow: The workflow that was optimized
[Figure legend: rdfs:subClassOf, rdf:Property]

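As a rough illustration of the aggregation described above, the sketch below builds subject/predicate/object triples in plain Python and prints them in a Turtle-like form. The class names and `ore:aggregates` come from the slide; the `ex:` resource names are hypothetical, and no claim is made that the published ontology uses exactly this shape.

```python
RDF_TYPE = "rdf:type"
AGGREGATES = "ore:aggregates"

# Hypothetical resource names (ex:...) mapped to the ontology classes.
aggregated = {
    "ex:algorithm": "opt:Algorithm",
    "ex:fitness": "opt:Fitness",
    "ex:generation1": "opt:Generation",
    "ex:run1": "opt:OptimizationRun",
    "ex:searchSpace": "opt:SearchSpace",
    "ex:termination": "opt:TerminationCondition",
    "ex:workflow": "opt:Workflow",
}

# The research object itself, then one aggregation and one type per resource.
triples = [("ex:ro", RDF_TYPE, "opt:OptimizationResearchObject")]
for resource, cls in aggregated.items():
    triples.append(("ex:ro", AGGREGATES, resource))
    triples.append((resource, RDF_TYPE, cls))


def to_statements(items):
    """Render triples as simple 'subject predicate object .' lines."""
    return [f"{s} {p} {o} ." for s, p, o in items]


for line in to_statements(triples):
    print(line)
```

Keeping each aggregated resource as its own typed node is what would let a later optimization look up, for example, only the search space or only the algorithm settings.
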
Algorithm
• Genetic Algorithm
• Mutation rate: 0.1
• Crossover rate: 0.7

Search Space
Gamma:
• Double
• 0 - 10
• Cost/2 < Gamma (fictional)

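The Gamma entry above combines a type, a range, and a dependency on another parameter. A minimal sketch of checking a candidate setting against such a search space could look as follows; the function name is hypothetical, treating the range bounds as inclusive is an assumption, and the dependency is the fictional one from the slide.

```python
def in_search_space(gamma, cost):
    """Check a candidate (Gamma, Cost) pair against the example search space."""
    if not isinstance(gamma, float):   # type: Double
        return False
    if not 0.0 <= gamma <= 10.0:       # range: 0 - 10 (bounds assumed inclusive)
        return False
    return cost / 2 < gamma            # dependency: Cost/2 < Gamma (fictional)


print(in_search_space(2.5, 4.0))   # True: 2.0 < 2.5
print(in_search_space(2.5, 8.0))   # False: 4.0 is not below 2.5
print(in_search_space(11.0, 1.0))  # False: outside 0 - 10
```

An optimizer could apply such a check before executing the workflow, so that no compute time is spent on settings outside the declared space.
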
Optimization Run
• Origin of result
• Parameter setting
• Fitness value

Taverna Optimization Framework & Plugin
[Figure: the Genetic Algorithm Parameter Optimization Plugin view with steps (1) to (4), annotated with the first recorded result]
• Generation 1 Iteration 1: Fitness: 0.05

Taverna Optimization Framework & Plugin
[Figure: the Genetic Algorithm Parameter Optimization Plugin view, annotated with the results recorded for Generation 1]
• Generation 1 Iteration 1: Fitness: 0.05
• Generation 1 Iteration 2: Fitness: 0.22
• Generation 1 Iteration 3: Fitness: 0.27
• Generation 1 Iteration 4: Fitness: 0.19
• Generation 1 Iteration 5: Fitness: 0.31
• Generation 1 Iteration 6: Fitness: 0.34

Taverna Optimization Framework & Plugin
[Figure: the Genetic Algorithm Parameter Optimization Plugin view, annotated with per-iteration fitness values for Generations 1 to 3, six iterations each, ending with Generation 3 Iteration 6: Fitness: 0.46; final Best Fitness: 0.49]

Example Result

Name                     Value
Gamma                    2.36
Cost                     8
NumberOfPseudoAbsences   363
Fitness                  0.9207

Benefits of sharing and exploiting Optimization Research Objects
• What is the optimal setting? - Reuse optimized settings
• What ranges have been explored? - Adopt used parameter ranges
• What algorithm settings were used? - Reuse algorithm settings
• Are there similar optimizations? - Reuse existing results
• Resume the optimization
• Embed optimization provenance into workflow infrastructures to be reused by other scientists

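The first two questions above can be answered directly from stored optimization runs. The sketch below assumes runs are available as plain records of parameter setting plus fitness; the first record repeats the Example Result slide, while the other two records and all function names are hypothetical.

```python
PARAMETERS = ("Gamma", "Cost", "NumberOfPseudoAbsences")

prior_runs = [
    # Values from the Example Result slide.
    {"Gamma": 2.36, "Cost": 8, "NumberOfPseudoAbsences": 363, "fitness": 0.9207},
    # Hypothetical records, added only to make the example non-trivial.
    {"Gamma": 0.50, "Cost": 2, "NumberOfPseudoAbsences": 100, "fitness": 0.71},
    {"Gamma": 6.70, "Cost": 12, "NumberOfPseudoAbsences": 250, "fitness": 0.64},
]


def best_setting(runs):
    """What is the optimal setting? Return the parameters of the fittest run."""
    best = max(runs, key=lambda run: run["fitness"])
    return {name: best[name] for name in PARAMETERS}


def explored_ranges(runs):
    """What ranges have been explored? Return (min, max) per parameter."""
    return {
        name: (min(run[name] for run in runs), max(run[name] for run in runs))
        for name in PARAMETERS
    }


print(best_setting(prior_runs))
print(explored_ranges(prior_runs))
```

A new optimization could start from the returned setting or restrict its search to the explored ranges instead of starting from scratch.
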
Conclusion
• Scientific workflows are hard to configure
• Optimization can help but meta-data get lost
• Extend Research Objects
• Build new Optimization Research Object Ontology
• Reuse of optimization meta-data to speed up optimization
• Shareable with the community in workflow infrastructures
• Outlook: How to learn from similar workflows?

Links
http://purl.org/net/ro-optimization
http://purl.org/net/svm-opt-research-object

Questions?
Thank you!
On Specifying and Sharing Scientific Workflow Optimization Results Using Research Objects

  • 1. On Specifying and Sharing Scientific Workflow Optimization Results Using Research Objects Mitglied der Helmholtz-Gemeinschaft 8th Workshop On Workflows in Support of Large-Scale Science 17. November 2013 | Sonja Holl*, Daniel Garijo+, Khalid Belhajjame$, Olav Zimmermann*, Renato De Giovanni#, Matthias Obst~, Carole Goble$ *Jülich Supercomputing Centre (JSC),Forschungszentrum Juelich, Germany +Ontology Engineering Group,  Facultad de Informática Universidad Politécnica de Madrid, Spain $School of Computer Science University of Manchester, UK #Reference Center on Environmental Information Campinas SP, Brazil ~Department of Biological and Environmental Sciences University of Gothenburg, Sweden
  • 2. Scientific Workflows • Mitglied der Helmholtz-Gemeinschaft • Popular choice to design, manage, and execute in silico experiments Sharing and reuse via workflow repositories Sunday Nov. 17, 2013 8th Workshop On Workflows in Support of Large-Scale Science 2
  • 3. Ecological Niche Modeling 1 4 5 3 Mitglied der Helmholtz-Gemeinschaft 2 Perform species adaptation to environmental changes (BioVeL Project) Sunday Nov. 17, 2013 8th Workshop On Workflows in Support of Large-Scale Science 3
  • 4. Ecological Niche Modeling Workflow Parameter Occurrence  Data Environmental  Layer Geographic  Mask createModel Mitglied der Helmholtz-Gemeinschaft testModel calcAUC AUC Sunday Nov. 17, 2013 8th Workshop On Workflows in Support of Large-Scale Science 4
  • 5. Designing workflow  (from scratch) in silico experiment Reusing workflow REFINE Sharing & Analysis Mitglied der Helmholtz-Gemeinschaft Planning Execution Sunday Nov. 17, 2013 8th Workshop On Workflows in Support of Large-Scale Science 5
  • 6. Ecological Niche Modeling Workflow Gamma Cost NumberOfPseu doAbsences Occurrence  Data createModel Environmental  Layer Geographic  Mask SVM Maxent GARP Mitglied der Helmholtz-Gemeinschaft testModel calcAUC AUC Sunday Nov. 17, 2013 8th Workshop On Workflows in Support of Large-Scale Science 6
  • 7. ‐3.2 1 11 2.3 1.5 a 4.55 ‐3 Ecological Niche Modeling Workflow 84 BLAST 10 6.788 Gamma 0.5 Cost NumberOfPseu doAbsences Occurrence  Data Environmental  Layer Select Algorithms 0 createModel Geographic  Mask 12 SVM Maxent GARP Select Parameters 100 testModel Mitglied der Helmholtz-Gemeinschaft ‐2.9 ‐bt 1.3 calcAUC 1 AUC 1 Sunday Nov. 17, 2013 / gaussian 8th Workshop On Workflows in Support of Large-Scale Science 1.9425 6.7 7 13
  • 8. Common strategies to handle this challenge • • • Default parameters & applications Trial and error Parameter sweeps But: Mitglied der Helmholtz-Gemeinschaft • • • Increasing complexity of scientific workflows Raising number parameters Work time & compute intensive Sunday Nov. 17, 2013 8th Workshop On Workflows in Support of Large-Scale Science 8
  • 9. Designing workflow  (from scratch) in silico experiment REFINE Reusing workflow Planning Mitglied der Helmholtz-Gemeinschaft Sharing & Analysis Execution Sunday Nov. 17, 2013 Optimization 8th Workshop On Workflows in Support of Large-Scale Science 9
  • 10. Intelligent automated optimization techniques Goal: • Automated way to find workflow settings that optimizes the output • Mitglied der Helmholtz-Gemeinschaft • • Define workflow output(s) as fitness value Use fitness value for evaluation (e.g. AUC or correlation coefficient) Use heuristic search algorithm to find best Sunday Nov. 17, 2013 8th Workshop On Workflows in Support of Large-Scale Science 10
  • 11. How does it work? Development of an optimization framework that extends the Taverna workflow management system; it abstracts the optimization process (e.g. parallel execution, security); a developer API allows rapid adaptation of new optimization methods; optimization plugins can be added independently. [Diagram labels: WMS, Taverna, Framework, Optimization, Layer, API, Plugins, Parameter Optimization, Component Optimization.]
  • 12. Taverna Optimization Framework & Plugin: (1) Define sub-workflow; (2) Specify input parameters (constraints); (3) Select fitness output parameters (e.g. AUC); (4) Define optimization method parameters (population size, termination criteria); then display the optimization result. [Diagram: Genetic Algorithm Parameter Optimization Plugin, with Best Fitness readouts of 0.34, 0.42, 0.48 and finally 0.49 across steps 1, 2, ..., x.]
  • 13. Status quo: workflow optimization starts from scratch each time; optimization meta-data are lost. Idea: capture optimization meta-data next to traditional provenance data ⇒ learn from/extend prior optimization runs ⇒ improve and accelerate the optimization process.
  • 14. Research Objects: aligned with W3C standards; aggregate various resources; describe scientific processes in a machine-readable format; specified by several ontologies. [Diagram label: ore:aggregates.]
  • 15. Taverna Optimization Framework & Plugin: (1) Define sub-workflow; (2) Specify input parameters (constraints); (3) Select fitness output parameters (e.g. AUC); (4) Define optimization parameters (population size, termination criteria); then display the optimization result. [Diagram: Genetic Algorithm Parameter Optimization Plugin, with Best Fitness readouts of 0.34, 0.42, 0.48 and finally 0.49 across steps 1, 2, ..., x.]
  • 16. Optimization Research Object Ontology. opt:Optimization Research Object is related to ro:Research Object and aggregates (ore:aggregates) the following: opt:Algorithm (describes the optimization algorithm and its parameters); opt:Fitness (describes the fitness functions); opt:Generation (defines the population size and generation number for an Optimization Run); opt:Optimization Run (represents one result set: sub-workflow, parameters and obtained fitness values); opt:Search Space (describes the dependencies and parameter constraints); opt:Termination Condition (describes the termination condition defined by the user); opt:Workflow (the workflow that was optimized). [Diagram legend: rdfs:subClassOf, rdf:Property.]
  • 17. Algorithm: Genetic Algorithm; mutation rate: 0.1; crossover rate: 0.7.
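A minimal sketch of how a genetic algorithm with the two rates from the Algorithm slide could search a parameter space. Only the mutation and crossover rates come from the slides; the toy fitness function, the parameter bounds, the population size, the selection scheme and all function names are assumptions made for illustration. This is not the code of the Taverna plugin, where the fitness would come from executing the sub-workflow (e.g. its AUC output).

```python
import random

MUTATION_RATE = 0.1   # from the slide
CROSSOVER_RATE = 0.7  # from the slide
# Hypothetical parameter bounds, chosen only for this example.
BOUNDS = {"gamma": (0.0, 10.0), "cost": (0.0, 10.0)}


def toy_fitness(ind):
    # Hypothetical stand-in for a workflow execution that returns a fitness value.
    return 1.0 / (1.0 + (ind["gamma"] - 2.0) ** 2 + (ind["cost"] - 8.0) ** 2)


def random_individual(rng):
    return {k: rng.uniform(lo, hi) for k, (lo, hi) in BOUNDS.items()}


def crossover(a, b, rng):
    # Uniform crossover: each parameter is taken from one of the two parents.
    return {k: (a[k] if rng.random() < 0.5 else b[k]) for k in BOUNDS}


def mutate(ind, rng):
    # Resample each parameter within its bounds with probability MUTATION_RATE.
    return {k: (rng.uniform(*BOUNDS[k]) if rng.random() < MUTATION_RATE else v)
            for k, v in ind.items()}


def tournament(pop, rng):
    # Binary tournament selection (an assumed selection scheme).
    a, b = rng.sample(pop, 2)
    return a if toy_fitness(a) >= toy_fitness(b) else b


def optimize(generations=20, pop_size=6, seed=0):
    rng = random.Random(seed)
    pop = [random_individual(rng) for _ in range(pop_size)]
    best_per_generation = []
    for _ in range(generations):
        elite = max(pop, key=toy_fitness)
        best_per_generation.append(toy_fitness(elite))
        children = [elite]  # elitism: the best individual survives unchanged
        while len(children) < pop_size:
            p1, p2 = tournament(pop, rng), tournament(pop, rng)
            if rng.random() < CROSSOVER_RATE:
                child = crossover(p1, p2, rng)
            else:
                child = dict(p1)
            children.append(mutate(child, rng))
        pop = children
    return max(pop, key=toy_fitness), best_per_generation


if __name__ == "__main__":
    best, history = optimize()
    print(best, history[-1])
```

Because the best individual of each generation is carried over unchanged, the recorded best fitness never decreases, which mirrors the rising "Best Fitness" readouts on the plugin slides.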
  • 18. Search Space: Gamma: type Double; range 0 - 10; constraint Cost/2 < Gamma (fictional).
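The search space entry for Gamma can be sketched as a small data structure plus a validity check. The type, the range 0 to 10 and the (fictional) dependency Cost/2 < Gamma are taken from the slide; the class name, the function name and the range assumed for Cost are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Parameter:
    name: str
    kind: type
    lower: float
    upper: float

    def in_range(self, value):
        # A value is admissible if it has the declared type and lies in [lower, upper].
        return isinstance(value, self.kind) and self.lower <= value <= self.upper


GAMMA = Parameter("Gamma", float, 0.0, 10.0)  # type and range from the slide
COST = Parameter("Cost", float, 0.0, 10.0)    # hypothetical range, not from the slide


def is_valid(setting):
    """Check the ranges and the fictional dependency Cost/2 < Gamma."""
    gamma, cost = setting["Gamma"], setting["Cost"]
    return GAMMA.in_range(gamma) and COST.in_range(cost) and cost / 2 < gamma


if __name__ == "__main__":
    print(is_valid({"Gamma": 5.0, "Cost": 8.0}))
```

Keeping the dependency check separate from the per-parameter range check lets an optimizer reject invalid candidates before spending a workflow execution on them.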
  • 19. Optimization Run: origin of result; parameter setting; fitness value.
  • 20. Taverna Optimization Framework & Plugin: (1) Define sub-workflow; (2) Specify input parameters (constraints); (3) Select fitness output parameters (e.g. AUC); (4) Define optimization parameters (population size, termination criteria); then display the optimization result. [Diagram: Genetic Algorithm Parameter Optimization Plugin, with Best Fitness readouts of 0.34, 0.42, 0.48 and finally 0.49 across steps 1, 2, ..., x; first recorded run: Generation 1, Iteration 1, Fitness 0.05.]
  • 21. Taverna Optimization Framework & Plugin: (1) Define sub-workflow; (2) Specify input parameters (constraints); (3) Select fitness output parameters (e.g. AUC); (4) Define optimization parameters (population size, termination criteria); then display the optimization result. [Diagram: Genetic Algorithm Parameter Optimization Plugin, with Best Fitness readouts of 0.34, 0.42, 0.48 and finally 0.49 across steps 1, 2, ..., x; recorded runs for Generation 1, Iterations 1 to 6, with Fitness 0.05, 0.22, 0.27, 0.19, 0.31, 0.34.]
  • 22. Taverna Optimization Framework & Plugin: (1) Define sub-workflow; (2) Specify input parameters (constraints); (3) Select fitness output parameters (e.g. AUC); (4) Define optimization parameters (population size, termination criteria); then display the optimization result. [Diagram: Genetic Algorithm Parameter Optimization Plugin, with Best Fitness readouts of 0.34, 0.42, 0.48 and finally 0.49 across steps 1, 2, ..., x; recorded runs, Iterations 1 to 6 per generation: Generation 1 with Fitness 0.05, 0.22, 0.27, 0.19, 0.31, 0.34; Generation 2 with Fitness 0.05, 0.22, 0.34, 0.19, 0.31, 0.33; Generation 3 with Fitness 0.05, 0.22, 0.34, 0.19, 0.31, 0.46.]
  • 23. Example Result (Name: Value): Gamma: 2.36; Cost: 8; NumberOfPseudoAbsences: 363; Fitness: 0.9207.
  • 24. Benefits of sharing and exploiting Optimization Research Objects: What is the optimal setting? Reuse optimized settings. What ranges have been explored? Adopt used parameter ranges. What algorithm settings were used? Reuse algorithm settings. Are there similar optimizations? Reuse existing results. Resume the optimization. Embed optimization provenance into workflow infrastructures to be reused by other scientists.
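Two of the questions above ("What is the optimal setting?" and "What ranges have been explored?") can be answered directly from stored optimization runs. The sketch below assumes a run is recorded with the three items named on the Optimization Run slide (origin, parameter setting, fitness value). The record layout, the function names and the sample data are hypothetical, apart from the last record, which reuses the values of the Example Result slide.

```python
from dataclasses import dataclass


@dataclass
class OptimizationRun:
    origin: str        # where the result came from, e.g. generation/iteration
    parameters: dict   # parameter setting used for this run
    fitness: float     # obtained fitness value


def best_setting(runs):
    """Return the run with the highest fitness."""
    return max(runs, key=lambda r: r.fitness)


def explored_ranges(runs):
    """Return {parameter name: (min, max)} over all recorded runs."""
    ranges = {}
    for run in runs:
        for name, value in run.parameters.items():
            lo, hi = ranges.get(name, (value, value))
            ranges[name] = (min(lo, value), max(hi, value))
    return ranges


# Hypothetical prior runs; only the last record uses values shown on the slides.
RUNS = [
    OptimizationRun("generation 1, iteration 1", {"Gamma": 1.0, "Cost": 2}, 0.05),
    OptimizationRun("generation 1, iteration 2", {"Gamma": 4.5, "Cost": 5}, 0.22),
    OptimizationRun("final result", {"Gamma": 2.36, "Cost": 8}, 0.9207),
]

if __name__ == "__main__":
    print(best_setting(RUNS).parameters)
    print(explored_ranges(RUNS))
```

A later optimization could seed its initial population with `best_setting(RUNS).parameters` or narrow its search space to `explored_ranges(RUNS)` instead of starting from scratch.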
  • 25. Conclusion: Scientific workflows are hard to configure; optimization can help, but meta-data get lost; extend Research Objects; build a new Optimization Research Object Ontology; reuse optimization meta-data to speed up optimization; shareable with the community in workflow infrastructures. Outlook: How to learn from similar workflows?