SlideShare a Scribd company logo
On Specifying and Sharing Scientific Workflow
Optimization Results Using Research Objects

Mitglied der Helmholtz-Gemeinschaft

8th Workshop On Workflows in Support of Large-Scale Science

17. November 2013 | Sonja Holl*, Daniel Garijo+, Khalid Belhajjame$, Olav Zimmermann*,
Renato De Giovanni#, Matthias Obst~, Carole Goble$
*Jülich Supercomputing Centre (JSC),Forschungszentrum Juelich, Germany
+Ontology Engineering Group,  Facultad

de Informática Universidad Politécnica de Madrid, Spain

$School of Computer Science University of Manchester, UK
#Reference Center

on Environmental Information Campinas SP, Brazil

~Department of Biological and Environmental Sciences University of Gothenburg, Sweden
Scientific Workflows
•

Mitglied der Helmholtz-Gemeinschaft

•

Popular choice to design,
manage, and execute in silico
experiments
Sharing and reuse via workflow
repositories

Sunday Nov. 17, 2013

8th Workshop On Workflows in Support of Large-Scale Science

2
Ecological Niche Modeling
1

4

5

3

Mitglied der Helmholtz-Gemeinschaft

2

Perform species adaptation to environmental
changes (BioVeL Project)
Sunday Nov. 17, 2013

8th Workshop On Workflows in Support of Large-Scale Science

3
Ecological Niche Modeling Workflow
Parameter

Occurrence 
Data

Environmental 
Layer

Geographic 
Mask

createModel

Mitglied der Helmholtz-Gemeinschaft

testModel

calcAUC

AUC
Sunday Nov. 17, 2013

8th Workshop On Workflows in Support of Large-Scale Science

4
Designing workflow 
(from scratch)

in silico experiment

Reusing workflow

REFINE

Sharing & Analysis
Mitglied der Helmholtz-Gemeinschaft

Planning

Execution
Sunday Nov. 17, 2013

8th Workshop On Workflows in Support of Large-Scale Science

5
Ecological Niche Modeling Workflow
Gamma

Cost

NumberOfPseu
doAbsences

Occurrence 
Data

createModel

Environmental 
Layer

Geographic 
Mask

SVM
Maxent
GARP

Mitglied der Helmholtz-Gemeinschaft

testModel

calcAUC

AUC
Sunday Nov. 17, 2013

8th Workshop On Workflows in Support of Large-Scale Science

6
‐3.2

1
11

2.3

1.5
a

4.55

‐3

Ecological Niche Modeling Workflow
84

BLAST

10
6.788
Gamma

0.5

Cost

NumberOfPseu
doAbsences

Occurrence 
Data

Environmental 
Layer

Select Algorithms
0

createModel

Geographic 
Mask

12

SVM
Maxent
GARP

Select Parameters

100

testModel
Mitglied der Helmholtz-Gemeinschaft

‐2.9

‐bt

1.3

calcAUC

1
AUC

1

Sunday Nov. 17, 2013

/

gaussian
8th Workshop On Workflows in Support of Large-Scale Science
1.9425
6.7

7

13
Common strategies to handle this challenge

•
•
•

Default parameters & applications
Trial and error
Parameter sweeps

But:
Mitglied der Helmholtz-Gemeinschaft

•
•
•

Increasing complexity of scientific workflows
Raising number parameters
Work time & compute intensive

Sunday Nov. 17, 2013

8th Workshop On Workflows in Support of Large-Scale Science

8
Designing workflow 
(from scratch)

in silico experiment

REFINE

Reusing workflow

Planning

Mitglied der Helmholtz-Gemeinschaft

Sharing & Analysis

Execution
Sunday Nov. 17, 2013

Optimization

8th Workshop On Workflows in Support of Large-Scale Science

9
Intelligent automated optimization techniques
Goal:
• Automated way to find workflow settings that optimizes
the output
•

Mitglied der Helmholtz-Gemeinschaft

•
•

Define workflow output(s) as fitness value
Use fitness value for evaluation (e.g. AUC or correlation
coefficient)
Use heuristic search algorithm to find best

Sunday Nov. 17, 2013

8th Workshop On Workflows in Support of Large-Scale Science

10
How does it work?
•
•
•

Mitglied der Helmholtz-Gemeinschaft

•

Development of optimization framework that extends
Taverna workflow management system
Abstracts optimization process (e.g. parallel execution,
security)
Developer API allows rapid adaption of new optimization
methods
Optimization plugins can be added independently
WMS
Taverna 

Sunday Nov. 17, 2013

Framework
Optimization     
Layer     

Plugins
A
P
I

Parameter Optimization
Component Optimization

8th Workshop On Workflows in Support of Large-Scale Science

11
Taverna

Optimization Framework & Plugin

(1) Define sub-workflow
(2) Specify input
parameters (constraints)
(3) Select fitness output
parameters (e.g. AUC)
(4) Define optimization
method parameters
(population size,
termination criteria)

Best Fitness:
0.34

1

Best Fitness:
0.42

2

Best Fitness:
0.48

Mitglied der Helmholtz-Gemeinschaft

.
.
.

Display the
optimization
result

x

Best Fitness: 0.49
Genetic Algorithm Parameter 
Optimization Plugin 

Sunday Nov. 17, 2013

8th Workshop On Workflows in Support of Large-Scale Science

12
Status quo
•
•

Workflow optimization starts from scratch each time
Optimization meta-data are lost

Mitglied der Helmholtz-Gemeinschaft

Idea: Capture optimization meta-data next to traditional
provenance data

⇒
⇒

learn from/extend prior optimization runs
improve and accelerate optimization process

Sunday Nov. 17, 2013

8th Workshop On Workflows in Support of Large-Scale Science

13
Research Objects
•
•
•
•

Aligned with W3C standards
Aggregates various resources
Describes scientific processes in machine readable
format
Specified by several ontologies

Mitglied der Helmholtz-Gemeinschaft

…
ore:aggregates

Sunday Nov. 17, 2013

8th Workshop On Workflows in Support of Large-Scale Science

14
Taverna

Optimization Framework & Plugin

Mitglied der Helmholtz-Gemeinschaft

(1) Define sub-workflow
(2) Specify input
parameters (constraints)
(3) Select fitness output
parameters (e.g. AUC)
(4) Define optimization
parameters (population
size, termination criteria)

Display the
optimization
result

Best
Fitness:
0.34
Best
Fitness:
0.42
Best
Fitness:
0.48

1

2

.
.
.

x

Best Fitness: 0.49
Genetic Algorithm Parameter 
Optimization Plugin 

Sunday Nov. 17, 2013

8th Workshop On Workflows in Support of Large-Scale Science

15
Optimization Research Object Ontology
ro:Research
Object

opt:Optimization
Research
Object

ore:aggregates

Mitglied der Helmholtz-Gemeinschaft

opt:Algorithm

Describes the 
optimization 
algorithm and 
its parameters

opt:Fitness

opt:Generation

opt:Optimization
Run

opt:Search
Space

opt:Termination
Condition

opt:Workflow

Describes the 
fitness 
functions

Defines the 
population size 
and generation 
number for an 
Optimization 
Run

Represents one 
result set: sub‐
workflow, 
parameters and 
obtained fitness 
values

Describes the 
dependencies 
and parameter 
constraints

Describes the 
termination 
condition 
defined by the 
user

The workflow 
that was 
optimized

rdfs:subClassOf
Sunday Nov. 17, 2013

rdf:Property
8th Workshop On Workflows in Support of Large-Scale Science

16
Algorithm

Mitglied der Helmholtz-Gemeinschaft

• Genetic Algorihm
• Mutation rate: 0.1
• Crossover rate 0.7

Sunday Nov. 17, 2013

8th Workshop On Workflows in Support of Large-Scale Science

17
Search Space

Gamma:
• Double
• 0 - 10
Mitglied der Helmholtz-Gemeinschaft

• Cost/2 < Gamma
(fictional)

Sunday Nov. 17, 2013

8th Workshop On Workflows in Support of Large-Scale Science

18
Optimization Run

Mitglied der Helmholtz-Gemeinschaft

• Origin of result
• Parameter setting
• Fitness value

Sunday Nov. 17, 2013

8th Workshop On Workflows in Support of Large-Scale Science

19
Taverna

Optimization Framework & Plugin

(1) Define sub-workflow
(2) Specify input
parameters (constraints)
(3) Select fitness output
parameters (e.g. AUC)
(4) Define optimization
parameters (population
size, termination criteria)

Generation 1 Iteration 1

Best Fitness:
Fitness: 0.05
0.34
Fitness: 0.05

1

Best Fitness:
0.42

2

Best Fitness:
0.48

Mitglied der Helmholtz-Gemeinschaft

.
.
.

Display the
optimization
result

x

Best Fitness: 0.49
Genetic Algorithm Parameter 
Optimization Plugin 

Sunday Nov. 17, 2013

8th Workshop On Workflows in Support of Large-Scale Science

20
Taverna

Optimization Framework & Plugin

(1) Define sub-workflow
(2) Specify input
parameters (constraints)
(3) Select fitness output
parameters (e.g. AUC)
(4) Define optimization
parameters (population
size, termination criteria)

Generation 1 Iteration 1

Best Fitness:
Fitness: 0.05
0.34
Generation 1 Iteration 2
Fitness: 0.05

1

Fitness: 0.22
Generation 1 Iteration 3
Best Fitness:

0.42
Fitness: 0.27
Generation 1 Iteration 4

2

Fitness: 0.19

Best Fitness:
Generation 1 Iteration 5
0.48
Fitness: 0.31

.
Generation 1 Iteration 6
.
Fitness: 0.34

x

Mitglied der Helmholtz-Gemeinschaft

.

Display the
optimization
result

Best Fitness: 0.49
Genetic Algorithm Parameter 
Optimization Plugin 

Sunday Nov. 17, 2013

8th Workshop On Workflows in Support of Large-Scale Science

21
Taverna

Optimization Framework & Plugin

Mitglied der Helmholtz-Gemeinschaft

(1) Define sub-workflow
(2) Specify input
parameters (constraints)
(3) Select fitness output
parameters (e.g. AUC)
(4) Define optimization
parameters (population
size, termination criteria)

Display the
optimization
result

Sunday Nov. 17, 2013

Generation 1 Iteration 1

Best Fitness:
Fitness: 0.05
0.34
Generation 1 Iteration 2
Fitness: 0.05
Generation 2 Iteration 1

1

Fitness: 0.22
Fitness: 0.05
Generation 3 Iteration 1
Generation 1 Iteration 3
Generation 2 Iteration 2
Best Fitness:

0.42
Fitness: 0.27
Fitness: 0.05
Fitness: 0.22
Generation 1 Iteration 4
2
Generation 3 Iteration 2
Generation 2 Iteration 3
Fitness: 0.19
Fitness: 0.22
Fitness: 0.34
Best Fitness:
Generation 1 Iteration 5
Generation 3 Iteration 3
Generation 2 Iteration 4
0.48
Fitness: 0.31
Fitness: 0.34
Fitness: 0.19
.
Generation 1 Iteration 6
x
Generation 3 Iteration 4 .
Generation 2 Iteration 5
Fitness: 0.34
.
Fitness: 0.19
Fitness: 0.31
Generation 3 Iteration 5
Generation 2 Iteration 6
Fitness: 0.31
Best Fitness: 0.49
Fitness: 0.33
Generation 3 Iteration  6
Fitness: 0.46
Genetic Algorithm Parameter 
Optimization Plugin 

8th Workshop On Workflows in Support of Large-Scale Science

22
Example
Result

Name

Value

Gamma

2.36

Cost

8

Mitglied der Helmholtz-Gemeinschaft

NumberOfPseudo 363
Absences
Fitness

Sunday Nov. 17, 2013

0.9207

8th Workshop On Workflows in Support of Large-Scale Science

23
Benefits of sharing and exploiting Optimization
Research Objects
•
•
•

Mitglied der Helmholtz-Gemeinschaft

•
•

•

What is the optimal setting? - Reuse optimized settings
What ranges have been explored? - Adopt used parameter
ranges
What algorithm settings were used? - Reuse algorithm
settings
Are there similar optimizations? - Reuse existing results
Resume the optimization
Embed optimization provenance into workflow
infrastructures to be reused by other scientists

Sunday Nov. 17, 2013

8th Workshop On Workflows in Support of Large-Scale Science

24
Conclusion

•

Scientific workflows are hard to configure
Optimization can help but meta-data get lost
Extend Research Objects
Build new Optimization Research Object Ontology
Reuse of optimization meta-data to speed up
optimization
Shareable with the community in workflow infrastructures

•

Outlook: How to learn from similar workflows?

•
•
•
•

Mitglied der Helmholtz-Gemeinschaft

•

Sunday Nov. 17, 2013

8th Workshop On Workflows in Support of Large-Scale Science

25
Links

Mitglied der Helmholtz-Gemeinschaft

http://purl.org/net/ro-optimization
http://purl.org/net/svm-opt-research-object

Sunday Nov. 17, 2013

8th Workshop On Workflows in Support of Large-Scale Science

26
Mitglied der Helmholtz-Gemeinschaft

Questions?
Thank you!

More Related Content

Similar to On Specifying and Sharing Scientific Workflow Optimization Results Using Research Objects

UCIAD overview
UCIAD overviewUCIAD overview
UCIAD overview
Mathieu d'Aquin
 
HPC I/O for Computational Scientists
HPC I/O for Computational ScientistsHPC I/O for Computational Scientists
HPC I/O for Computational Scientists
inside-BigData.com
 
Software Sustainability: Better Software Better Science
Software Sustainability: Better Software Better ScienceSoftware Sustainability: Better Software Better Science
Software Sustainability: Better Software Better Science
Carole Goble
 
From Scientific Workflows to Research Objects: Publication and Abstraction of...
From Scientific Workflows to Research Objects: Publication and Abstraction of...From Scientific Workflows to Research Objects: Publication and Abstraction of...
From Scientific Workflows to Research Objects: Publication and Abstraction of...
dgarijo
 
Using SigOpt to Tune Deep Learning Models with Nervana Cloud
Using SigOpt to Tune Deep Learning Models with Nervana CloudUsing SigOpt to Tune Deep Learning Models with Nervana Cloud
Using SigOpt to Tune Deep Learning Models with Nervana Cloud
SigOpt
 
Eclipse Meets Systems Biology
Eclipse Meets Systems BiologyEclipse Meets Systems Biology
Eclipse Meets Systems Biology
Richard Adams
 
Opquast desktop : quick analysis of an Opendata Dataset
Opquast desktop : quick analysis of an Opendata DatasetOpquast desktop : quick analysis of an Opendata Dataset
Opquast desktop : quick analysis of an Opendata Dataset
Temesis
 
From Scientific Workflows to Research Objects: Publication and Abstraction of...
From Scientific Workflows to Research Objects: Publication and Abstraction of...From Scientific Workflows to Research Objects: Publication and Abstraction of...
From Scientific Workflows to Research Objects: Publication and Abstraction of...
dgarijo
 
The Planets Testbed
The Planets TestbedThe Planets Testbed
The Planets Testbed
Max Kaiser
 
Stacked Ensembles in H2O
Stacked Ensembles in H2OStacked Ensembles in H2O
Stacked Ensembles in H2O
Sri Ambati
 
Multimodal graph-based analysis over the DBLP repository: critical discoverie...
Multimodal graph-based analysis over the DBLP repository: critical discoverie...Multimodal graph-based analysis over the DBLP repository: critical discoverie...
Multimodal graph-based analysis over the DBLP repository: critical discoverie...
Universidade de São Paulo
 
XGBoost @ Fyber
XGBoost @ FyberXGBoost @ Fyber
XGBoost @ Fyber
Daniel Hen
 
Learning Content and Usage Factors Simultaneously
Learning Content and Usage Factors SimultaneouslyLearning Content and Usage Factors Simultaneously
Learning Content and Usage Factors Simultaneously
Arnab Bhadury
 
Scientific Workflows: what do we have, what do we miss?
Scientific Workflows: what do we have, what do we miss?Scientific Workflows: what do we have, what do we miss?
Scientific Workflows: what do we have, what do we miss?
Paolo Romano
 
Msr2021 tutorial-di penta
Msr2021 tutorial-di pentaMsr2021 tutorial-di penta
Msr2021 tutorial-di penta
Massimiliano Di Penta
 
An approach for knowledge-driven product, process and resource mappings for a...
An approach for knowledge-driven product, process and resource mappings for a...An approach for knowledge-driven product, process and resource mappings for a...
An approach for knowledge-driven product, process and resource mappings for a...
FAST-Lab. Factory Automation Systems and Technologies Laboratory, Tampere University of Technology
 
Creating abstractions from scientific workflows: PhD symposium 2015
Creating abstractions from scientific workflows: PhD symposium 2015Creating abstractions from scientific workflows: PhD symposium 2015
Creating abstractions from scientific workflows: PhD symposium 2015
dgarijo
 
Efficient evaluation of flatness error from Coordinate Measurement Data using...
Efficient evaluation of flatness error from Coordinate Measurement Data using...Efficient evaluation of flatness error from Coordinate Measurement Data using...
Efficient evaluation of flatness error from Coordinate Measurement Data using...
Ali Shahed
 
PGConf.ASIA 2019 Bali - Performance Analysis at Full Power - Julien Rouhaud
PGConf.ASIA 2019 Bali - Performance Analysis at Full Power - Julien RouhaudPGConf.ASIA 2019 Bali - Performance Analysis at Full Power - Julien Rouhaud
PGConf.ASIA 2019 Bali - Performance Analysis at Full Power - Julien Rouhaud
Equnix Business Solutions
 
Software tools to facilitate materials science research
Software tools to facilitate materials science researchSoftware tools to facilitate materials science research
Software tools to facilitate materials science research
Anubhav Jain
 

Similar to On Specifying and Sharing Scientific Workflow Optimization Results Using Research Objects (20)

UCIAD overview
UCIAD overviewUCIAD overview
UCIAD overview
 
HPC I/O for Computational Scientists
HPC I/O for Computational ScientistsHPC I/O for Computational Scientists
HPC I/O for Computational Scientists
 
Software Sustainability: Better Software Better Science
Software Sustainability: Better Software Better ScienceSoftware Sustainability: Better Software Better Science
Software Sustainability: Better Software Better Science
 
From Scientific Workflows to Research Objects: Publication and Abstraction of...
From Scientific Workflows to Research Objects: Publication and Abstraction of...From Scientific Workflows to Research Objects: Publication and Abstraction of...
From Scientific Workflows to Research Objects: Publication and Abstraction of...
 
Using SigOpt to Tune Deep Learning Models with Nervana Cloud
Using SigOpt to Tune Deep Learning Models with Nervana CloudUsing SigOpt to Tune Deep Learning Models with Nervana Cloud
Using SigOpt to Tune Deep Learning Models with Nervana Cloud
 
Eclipse Meets Systems Biology
Eclipse Meets Systems BiologyEclipse Meets Systems Biology
Eclipse Meets Systems Biology
 
Opquast desktop : quick analysis of an Opendata Dataset
Opquast desktop : quick analysis of an Opendata DatasetOpquast desktop : quick analysis of an Opendata Dataset
Opquast desktop : quick analysis of an Opendata Dataset
 
From Scientific Workflows to Research Objects: Publication and Abstraction of...
From Scientific Workflows to Research Objects: Publication and Abstraction of...From Scientific Workflows to Research Objects: Publication and Abstraction of...
From Scientific Workflows to Research Objects: Publication and Abstraction of...
 
The Planets Testbed
The Planets TestbedThe Planets Testbed
The Planets Testbed
 
Stacked Ensembles in H2O
Stacked Ensembles in H2OStacked Ensembles in H2O
Stacked Ensembles in H2O
 
Multimodal graph-based analysis over the DBLP repository: critical discoverie...
Multimodal graph-based analysis over the DBLP repository: critical discoverie...Multimodal graph-based analysis over the DBLP repository: critical discoverie...
Multimodal graph-based analysis over the DBLP repository: critical discoverie...
 
XGBoost @ Fyber
XGBoost @ FyberXGBoost @ Fyber
XGBoost @ Fyber
 
Learning Content and Usage Factors Simultaneously
Learning Content and Usage Factors SimultaneouslyLearning Content and Usage Factors Simultaneously
Learning Content and Usage Factors Simultaneously
 
Scientific Workflows: what do we have, what do we miss?
Scientific Workflows: what do we have, what do we miss?Scientific Workflows: what do we have, what do we miss?
Scientific Workflows: what do we have, what do we miss?
 
Msr2021 tutorial-di penta
Msr2021 tutorial-di pentaMsr2021 tutorial-di penta
Msr2021 tutorial-di penta
 
An approach for knowledge-driven product, process and resource mappings for a...
An approach for knowledge-driven product, process and resource mappings for a...An approach for knowledge-driven product, process and resource mappings for a...
An approach for knowledge-driven product, process and resource mappings for a...
 
Creating abstractions from scientific workflows: PhD symposium 2015
Creating abstractions from scientific workflows: PhD symposium 2015Creating abstractions from scientific workflows: PhD symposium 2015
Creating abstractions from scientific workflows: PhD symposium 2015
 
Efficient evaluation of flatness error from Coordinate Measurement Data using...
Efficient evaluation of flatness error from Coordinate Measurement Data using...Efficient evaluation of flatness error from Coordinate Measurement Data using...
Efficient evaluation of flatness error from Coordinate Measurement Data using...
 
PGConf.ASIA 2019 Bali - Performance Analysis at Full Power - Julien Rouhaud
PGConf.ASIA 2019 Bali - Performance Analysis at Full Power - Julien RouhaudPGConf.ASIA 2019 Bali - Performance Analysis at Full Power - Julien Rouhaud
PGConf.ASIA 2019 Bali - Performance Analysis at Full Power - Julien Rouhaud
 
Software tools to facilitate materials science research
Software tools to facilitate materials science researchSoftware tools to facilitate materials science research
Software tools to facilitate materials science research
 

More from dgarijo

FOOPS!: An Ontology Pitfall Scanner for the FAIR principles
FOOPS!: An Ontology Pitfall Scanner for the FAIR principlesFOOPS!: An Ontology Pitfall Scanner for the FAIR principles
FOOPS!: An Ontology Pitfall Scanner for the FAIR principles
dgarijo
 
FAIR Workflows: A step closer to the Scientific Paper of the Future
FAIR Workflows: A step closer to the Scientific Paper of the FutureFAIR Workflows: A step closer to the Scientific Paper of the Future
FAIR Workflows: A step closer to the Scientific Paper of the Future
dgarijo
 
Towards Reusable Research Software
Towards Reusable Research SoftwareTowards Reusable Research Software
Towards Reusable Research Software
dgarijo
 
SOMEF: a metadata extraction framework from software documentation
SOMEF: a metadata extraction framework from software documentationSOMEF: a metadata extraction framework from software documentation
SOMEF: a metadata extraction framework from software documentation
dgarijo
 
A Template-Based Approach for Annotating Long-Tailed Datasets
A Template-Based Approach for Annotating Long-Tailed DatasetsA Template-Based Approach for Annotating Long-Tailed Datasets
A Template-Based Approach for Annotating Long-Tailed Datasets
dgarijo
 
OBA: An Ontology-Based Framework for Creating REST APIs for Knowledge Graphs
OBA: An Ontology-Based Framework for Creating REST APIs for Knowledge GraphsOBA: An Ontology-Based Framework for Creating REST APIs for Knowledge Graphs
OBA: An Ontology-Based Framework for Creating REST APIs for Knowledge Graphs
dgarijo
 
Towards Knowledge Graphs of Reusable Research Software Metadata
Towards Knowledge Graphs of Reusable Research Software MetadataTowards Knowledge Graphs of Reusable Research Software Metadata
Towards Knowledge Graphs of Reusable Research Software Metadata
dgarijo
 
Scientific Software Registry Collaboration Workshop: From Software Metadata r...
Scientific Software Registry Collaboration Workshop: From Software Metadata r...Scientific Software Registry Collaboration Workshop: From Software Metadata r...
Scientific Software Registry Collaboration Workshop: From Software Metadata r...
dgarijo
 
WDPlus: Leveraging Wikidata to Link and Extend Tabular Data
WDPlus: Leveraging Wikidata to Link and Extend Tabular DataWDPlus: Leveraging Wikidata to Link and Extend Tabular Data
WDPlus: Leveraging Wikidata to Link and Extend Tabular Data
dgarijo
 
OKG-Soft: An Open Knowledge Graph With Mathine Readable Scientific Software M...
OKG-Soft: An Open Knowledge Graph With Mathine Readable Scientific Software M...OKG-Soft: An Open Knowledge Graph With Mathine Readable Scientific Software M...
OKG-Soft: An Open Knowledge Graph With Mathine Readable Scientific Software M...
dgarijo
 
Towards Human-Guided Machine Learning - IUI 2019
Towards Human-Guided Machine Learning - IUI 2019Towards Human-Guided Machine Learning - IUI 2019
Towards Human-Guided Machine Learning - IUI 2019
dgarijo
 
Capturing Context in Scientific Experiments: Towards Computer-Driven Science
Capturing Context in Scientific Experiments: Towards Computer-Driven ScienceCapturing Context in Scientific Experiments: Towards Computer-Driven Science
Capturing Context in Scientific Experiments: Towards Computer-Driven Science
dgarijo
 
A Controlled Crowdsourcing Approach for Practical Ontology Extensions and Met...
A Controlled Crowdsourcing Approach for Practical Ontology Extensions and Met...A Controlled Crowdsourcing Approach for Practical Ontology Extensions and Met...
A Controlled Crowdsourcing Approach for Practical Ontology Extensions and Met...
dgarijo
 
WIDOCO: A Wizard for Documenting Ontologies
WIDOCO: A Wizard for Documenting OntologiesWIDOCO: A Wizard for Documenting Ontologies
WIDOCO: A Wizard for Documenting Ontologies
dgarijo
 
Towards Automating Data Narratives
Towards Automating Data NarrativesTowards Automating Data Narratives
Towards Automating Data Narratives
dgarijo
 
Automated Hypothesis Testing with Large Scale Scientific Workflows
Automated Hypothesis Testing with Large Scale Scientific WorkflowsAutomated Hypothesis Testing with Large Scale Scientific Workflows
Automated Hypothesis Testing with Large Scale Scientific Workflows
dgarijo
 
OntoSoft: A Distributed Semantic Registry for Scientific Software
OntoSoft: A Distributed Semantic Registry for Scientific SoftwareOntoSoft: A Distributed Semantic Registry for Scientific Software
OntoSoft: A Distributed Semantic Registry for Scientific Software
dgarijo
 
OEG tools for supporting Ontology Engineering
OEG tools for supporting Ontology EngineeringOEG tools for supporting Ontology Engineering
OEG tools for supporting Ontology Engineering
dgarijo
 
Software Metadata: Describing "dark software" in GeoSciences
Software Metadata: Describing "dark software" in GeoSciencesSoftware Metadata: Describing "dark software" in GeoSciences
Software Metadata: Describing "dark software" in GeoSciences
dgarijo
 
Reproducibility Using Semantics: An Overview
Reproducibility Using Semantics: An OverviewReproducibility Using Semantics: An Overview
Reproducibility Using Semantics: An Overview
dgarijo
 

More from dgarijo (20)

FOOPS!: An Ontology Pitfall Scanner for the FAIR principles
FOOPS!: An Ontology Pitfall Scanner for the FAIR principlesFOOPS!: An Ontology Pitfall Scanner for the FAIR principles
FOOPS!: An Ontology Pitfall Scanner for the FAIR principles
 
FAIR Workflows: A step closer to the Scientific Paper of the Future
FAIR Workflows: A step closer to the Scientific Paper of the FutureFAIR Workflows: A step closer to the Scientific Paper of the Future
FAIR Workflows: A step closer to the Scientific Paper of the Future
 
Towards Reusable Research Software
Towards Reusable Research SoftwareTowards Reusable Research Software
Towards Reusable Research Software
 
SOMEF: a metadata extraction framework from software documentation
SOMEF: a metadata extraction framework from software documentationSOMEF: a metadata extraction framework from software documentation
SOMEF: a metadata extraction framework from software documentation
 
A Template-Based Approach for Annotating Long-Tailed Datasets
A Template-Based Approach for Annotating Long-Tailed DatasetsA Template-Based Approach for Annotating Long-Tailed Datasets
A Template-Based Approach for Annotating Long-Tailed Datasets
 
OBA: An Ontology-Based Framework for Creating REST APIs for Knowledge Graphs
OBA: An Ontology-Based Framework for Creating REST APIs for Knowledge GraphsOBA: An Ontology-Based Framework for Creating REST APIs for Knowledge Graphs
OBA: An Ontology-Based Framework for Creating REST APIs for Knowledge Graphs
 
Towards Knowledge Graphs of Reusable Research Software Metadata
Towards Knowledge Graphs of Reusable Research Software MetadataTowards Knowledge Graphs of Reusable Research Software Metadata
Towards Knowledge Graphs of Reusable Research Software Metadata
 
Scientific Software Registry Collaboration Workshop: From Software Metadata r...
Scientific Software Registry Collaboration Workshop: From Software Metadata r...Scientific Software Registry Collaboration Workshop: From Software Metadata r...
Scientific Software Registry Collaboration Workshop: From Software Metadata r...
 
WDPlus: Leveraging Wikidata to Link and Extend Tabular Data
WDPlus: Leveraging Wikidata to Link and Extend Tabular DataWDPlus: Leveraging Wikidata to Link and Extend Tabular Data
WDPlus: Leveraging Wikidata to Link and Extend Tabular Data
 
OKG-Soft: An Open Knowledge Graph With Mathine Readable Scientific Software M...
OKG-Soft: An Open Knowledge Graph With Mathine Readable Scientific Software M...OKG-Soft: An Open Knowledge Graph With Mathine Readable Scientific Software M...
OKG-Soft: An Open Knowledge Graph With Mathine Readable Scientific Software M...
 
Towards Human-Guided Machine Learning - IUI 2019
Towards Human-Guided Machine Learning - IUI 2019Towards Human-Guided Machine Learning - IUI 2019
Towards Human-Guided Machine Learning - IUI 2019
 
Capturing Context in Scientific Experiments: Towards Computer-Driven Science
Capturing Context in Scientific Experiments: Towards Computer-Driven ScienceCapturing Context in Scientific Experiments: Towards Computer-Driven Science
Capturing Context in Scientific Experiments: Towards Computer-Driven Science
 
A Controlled Crowdsourcing Approach for Practical Ontology Extensions and Met...
A Controlled Crowdsourcing Approach for Practical Ontology Extensions and Met...A Controlled Crowdsourcing Approach for Practical Ontology Extensions and Met...
A Controlled Crowdsourcing Approach for Practical Ontology Extensions and Met...
 
WIDOCO: A Wizard for Documenting Ontologies
WIDOCO: A Wizard for Documenting OntologiesWIDOCO: A Wizard for Documenting Ontologies
WIDOCO: A Wizard for Documenting Ontologies
 
Towards Automating Data Narratives
Towards Automating Data NarrativesTowards Automating Data Narratives
Towards Automating Data Narratives
 
Automated Hypothesis Testing with Large Scale Scientific Workflows
Automated Hypothesis Testing with Large Scale Scientific WorkflowsAutomated Hypothesis Testing with Large Scale Scientific Workflows
Automated Hypothesis Testing with Large Scale Scientific Workflows
 
OntoSoft: A Distributed Semantic Registry for Scientific Software
OntoSoft: A Distributed Semantic Registry for Scientific SoftwareOntoSoft: A Distributed Semantic Registry for Scientific Software
OntoSoft: A Distributed Semantic Registry for Scientific Software
 
OEG tools for supporting Ontology Engineering
OEG tools for supporting Ontology EngineeringOEG tools for supporting Ontology Engineering
OEG tools for supporting Ontology Engineering
 
Software Metadata: Describing "dark software" in GeoSciences
Software Metadata: Describing "dark software" in GeoSciencesSoftware Metadata: Describing "dark software" in GeoSciences
Software Metadata: Describing "dark software" in GeoSciences
 
Reproducibility Using Semantics: An Overview
Reproducibility Using Semantics: An OverviewReproducibility Using Semantics: An Overview
Reproducibility Using Semantics: An Overview
 

Recently uploaded

AI 101: An Introduction to the Basics and Impact of Artificial Intelligence
AI 101: An Introduction to the Basics and Impact of Artificial IntelligenceAI 101: An Introduction to the Basics and Impact of Artificial Intelligence
AI 101: An Introduction to the Basics and Impact of Artificial Intelligence
IndexBug
 
Driving Business Innovation: Latest Generative AI Advancements & Success Story
Driving Business Innovation: Latest Generative AI Advancements & Success StoryDriving Business Innovation: Latest Generative AI Advancements & Success Story
Driving Business Innovation: Latest Generative AI Advancements & Success Story
Safe Software
 
Webinar: Designing a schema for a Data Warehouse
Webinar: Designing a schema for a Data WarehouseWebinar: Designing a schema for a Data Warehouse
Webinar: Designing a schema for a Data Warehouse
Federico Razzoli
 
Choosing The Best AWS Service For Your Website + API.pptx
Choosing The Best AWS Service For Your Website + API.pptxChoosing The Best AWS Service For Your Website + API.pptx
Choosing The Best AWS Service For Your Website + API.pptx
Brandon Minnick, MBA
 
Monitoring and Managing Anomaly Detection on OpenShift.pdf
Monitoring and Managing Anomaly Detection on OpenShift.pdfMonitoring and Managing Anomaly Detection on OpenShift.pdf
Monitoring and Managing Anomaly Detection on OpenShift.pdf
Tosin Akinosho
 
Energy Efficient Video Encoding for Cloud and Edge Computing Instances
Energy Efficient Video Encoding for Cloud and Edge Computing InstancesEnergy Efficient Video Encoding for Cloud and Edge Computing Instances
Energy Efficient Video Encoding for Cloud and Edge Computing Instances
Alpen-Adria-Universität
 
Presentation of the OECD Artificial Intelligence Review of Germany
Presentation of the OECD Artificial Intelligence Review of GermanyPresentation of the OECD Artificial Intelligence Review of Germany
Presentation of the OECD Artificial Intelligence Review of Germany
innovationoecd
 
Main news related to the CCS TSI 2023 (2023/1695)
Main news related to the CCS TSI 2023 (2023/1695)Main news related to the CCS TSI 2023 (2023/1695)
Main news related to the CCS TSI 2023 (2023/1695)
Jakub Marek
 
Nordic Marketo Engage User Group_June 13_ 2024.pptx
Nordic Marketo Engage User Group_June 13_ 2024.pptxNordic Marketo Engage User Group_June 13_ 2024.pptx
Nordic Marketo Engage User Group_June 13_ 2024.pptx
MichaelKnudsen27
 
Let's Integrate MuleSoft RPA, COMPOSER, APM with AWS IDP along with Slack
Let's Integrate MuleSoft RPA, COMPOSER, APM with AWS IDP along with SlackLet's Integrate MuleSoft RPA, COMPOSER, APM with AWS IDP along with Slack
Let's Integrate MuleSoft RPA, COMPOSER, APM with AWS IDP along with Slack
shyamraj55
 
Building Production Ready Search Pipelines with Spark and Milvus
Building Production Ready Search Pipelines with Spark and MilvusBuilding Production Ready Search Pipelines with Spark and Milvus
Building Production Ready Search Pipelines with Spark and Milvus
Zilliz
 
HCL Notes and Domino License Cost Reduction in the World of DLAU
HCL Notes and Domino License Cost Reduction in the World of DLAUHCL Notes and Domino License Cost Reduction in the World of DLAU
HCL Notes and Domino License Cost Reduction in the World of DLAU
panagenda
 
UiPath Test Automation using UiPath Test Suite series, part 6
UiPath Test Automation using UiPath Test Suite series, part 6UiPath Test Automation using UiPath Test Suite series, part 6
UiPath Test Automation using UiPath Test Suite series, part 6
DianaGray10
 
UI5 Controls simplified - UI5con2024 presentation
UI5 Controls simplified - UI5con2024 presentationUI5 Controls simplified - UI5con2024 presentation
UI5 Controls simplified - UI5con2024 presentation
Wouter Lemaire
 
National Security Agency - NSA mobile device best practices
National Security Agency - NSA mobile device best practicesNational Security Agency - NSA mobile device best practices
National Security Agency - NSA mobile device best practices
Quotidiano Piemontese
 
How to Get CNIC Information System with Paksim Ga.pptx
How to Get CNIC Information System with Paksim Ga.pptxHow to Get CNIC Information System with Paksim Ga.pptx
How to Get CNIC Information System with Paksim Ga.pptx
danishmna97
 
Generating privacy-protected synthetic data using Secludy and Milvus
Generating privacy-protected synthetic data using Secludy and MilvusGenerating privacy-protected synthetic data using Secludy and Milvus
Generating privacy-protected synthetic data using Secludy and Milvus
Zilliz
 
How to Interpret Trends in the Kalyan Rajdhani Mix Chart.pdf
How to Interpret Trends in the Kalyan Rajdhani Mix Chart.pdfHow to Interpret Trends in the Kalyan Rajdhani Mix Chart.pdf
How to Interpret Trends in the Kalyan Rajdhani Mix Chart.pdf
Chart Kalyan
 
Cosa hanno in comune un mattoncino Lego e la backdoor XZ?
Cosa hanno in comune un mattoncino Lego e la backdoor XZ?Cosa hanno in comune un mattoncino Lego e la backdoor XZ?
Cosa hanno in comune un mattoncino Lego e la backdoor XZ?
Speck&Tech
 
How to use Firebase Data Connect For Flutter
How to use Firebase Data Connect For FlutterHow to use Firebase Data Connect For Flutter
How to use Firebase Data Connect For Flutter
Daiki Mogmet Ito
 

Recently uploaded (20)

AI 101: An Introduction to the Basics and Impact of Artificial Intelligence
AI 101: An Introduction to the Basics and Impact of Artificial IntelligenceAI 101: An Introduction to the Basics and Impact of Artificial Intelligence
AI 101: An Introduction to the Basics and Impact of Artificial Intelligence
 
Driving Business Innovation: Latest Generative AI Advancements & Success Story
Driving Business Innovation: Latest Generative AI Advancements & Success StoryDriving Business Innovation: Latest Generative AI Advancements & Success Story
Driving Business Innovation: Latest Generative AI Advancements & Success Story
 
Webinar: Designing a schema for a Data Warehouse
Webinar: Designing a schema for a Data WarehouseWebinar: Designing a schema for a Data Warehouse
Webinar: Designing a schema for a Data Warehouse
 
Choosing The Best AWS Service For Your Website + API.pptx
Choosing The Best AWS Service For Your Website + API.pptxChoosing The Best AWS Service For Your Website + API.pptx
Choosing The Best AWS Service For Your Website + API.pptx
 
Monitoring and Managing Anomaly Detection on OpenShift.pdf
Monitoring and Managing Anomaly Detection on OpenShift.pdfMonitoring and Managing Anomaly Detection on OpenShift.pdf
Monitoring and Managing Anomaly Detection on OpenShift.pdf
 
Energy Efficient Video Encoding for Cloud and Edge Computing Instances
Energy Efficient Video Encoding for Cloud and Edge Computing InstancesEnergy Efficient Video Encoding for Cloud and Edge Computing Instances
Energy Efficient Video Encoding for Cloud and Edge Computing Instances
 
Presentation of the OECD Artificial Intelligence Review of Germany
Presentation of the OECD Artificial Intelligence Review of GermanyPresentation of the OECD Artificial Intelligence Review of Germany
Presentation of the OECD Artificial Intelligence Review of Germany
 
Main news related to the CCS TSI 2023 (2023/1695)
Main news related to the CCS TSI 2023 (2023/1695)Main news related to the CCS TSI 2023 (2023/1695)
Main news related to the CCS TSI 2023 (2023/1695)
 
Nordic Marketo Engage User Group_June 13_ 2024.pptx
Nordic Marketo Engage User Group_June 13_ 2024.pptxNordic Marketo Engage User Group_June 13_ 2024.pptx
Nordic Marketo Engage User Group_June 13_ 2024.pptx
 
Let's Integrate MuleSoft RPA, COMPOSER, APM with AWS IDP along with Slack
Let's Integrate MuleSoft RPA, COMPOSER, APM with AWS IDP along with SlackLet's Integrate MuleSoft RPA, COMPOSER, APM with AWS IDP along with Slack
Let's Integrate MuleSoft RPA, COMPOSER, APM with AWS IDP along with Slack
 
Building Production Ready Search Pipelines with Spark and Milvus
Building Production Ready Search Pipelines with Spark and MilvusBuilding Production Ready Search Pipelines with Spark and Milvus
Building Production Ready Search Pipelines with Spark and Milvus
 
HCL Notes and Domino License Cost Reduction in the World of DLAU
HCL Notes and Domino License Cost Reduction in the World of DLAUHCL Notes and Domino License Cost Reduction in the World of DLAU
HCL Notes and Domino License Cost Reduction in the World of DLAU
 
UiPath Test Automation using UiPath Test Suite series, part 6
UiPath Test Automation using UiPath Test Suite series, part 6UiPath Test Automation using UiPath Test Suite series, part 6
UiPath Test Automation using UiPath Test Suite series, part 6
 
UI5 Controls simplified - UI5con2024 presentation
UI5 Controls simplified - UI5con2024 presentationUI5 Controls simplified - UI5con2024 presentation
UI5 Controls simplified - UI5con2024 presentation
 
National Security Agency - NSA mobile device best practices
National Security Agency - NSA mobile device best practicesNational Security Agency - NSA mobile device best practices
National Security Agency - NSA mobile device best practices
 
How to Get CNIC Information System with Paksim Ga.pptx
How to Get CNIC Information System with Paksim Ga.pptxHow to Get CNIC Information System with Paksim Ga.pptx
How to Get CNIC Information System with Paksim Ga.pptx
 
Generating privacy-protected synthetic data using Secludy and Milvus
Generating privacy-protected synthetic data using Secludy and MilvusGenerating privacy-protected synthetic data using Secludy and Milvus
Generating privacy-protected synthetic data using Secludy and Milvus
 
How to Interpret Trends in the Kalyan Rajdhani Mix Chart.pdf
How to Interpret Trends in the Kalyan Rajdhani Mix Chart.pdfHow to Interpret Trends in the Kalyan Rajdhani Mix Chart.pdf
How to Interpret Trends in the Kalyan Rajdhani Mix Chart.pdf
 
Cosa hanno in comune un mattoncino Lego e la backdoor XZ?
Cosa hanno in comune un mattoncino Lego e la backdoor XZ?Cosa hanno in comune un mattoncino Lego e la backdoor XZ?
Cosa hanno in comune un mattoncino Lego e la backdoor XZ?
 
How to use Firebase Data Connect For Flutter
How to use Firebase Data Connect For FlutterHow to use Firebase Data Connect For Flutter
How to use Firebase Data Connect For Flutter
 

On Specifying and Sharing Scientific Workflow Optimization Results Using Research Objects

  • 1. On Specifying and Sharing Scientific Workflow Optimization Results Using Research Objects Mitglied der Helmholtz-Gemeinschaft 8th Workshop On Workflows in Support of Large-Scale Science 17. November 2013 | Sonja Holl*, Daniel Garijo+, Khalid Belhajjame$, Olav Zimmermann*, Renato De Giovanni#, Matthias Obst~, Carole Goble$ *Jülich Supercomputing Centre (JSC),Forschungszentrum Juelich, Germany +Ontology Engineering Group,  Facultad de Informática Universidad Politécnica de Madrid, Spain $School of Computer Science University of Manchester, UK #Reference Center on Environmental Information Campinas SP, Brazil ~Department of Biological and Environmental Sciences University of Gothenburg, Sweden
  • 2. Scientific Workflows • Mitglied der Helmholtz-Gemeinschaft • Popular choice to design, manage, and execute in silico experiments Sharing and reuse via workflow repositories Sunday Nov. 17, 2013 8th Workshop On Workflows in Support of Large-Scale Science 2
  • 3. Ecological Niche Modeling 1 4 5 3 Mitglied der Helmholtz-Gemeinschaft 2 Perform species adaptation to environmental changes (BioVeL Project) Sunday Nov. 17, 2013 8th Workshop On Workflows in Support of Large-Scale Science 3
  • 4. Ecological Niche Modeling Workflow Parameter Occurrence  Data Environmental  Layer Geographic  Mask createModel Mitglied der Helmholtz-Gemeinschaft testModel calcAUC AUC Sunday Nov. 17, 2013 8th Workshop On Workflows in Support of Large-Scale Science 4
  • 5. Designing workflow  (from scratch) in silico experiment Reusing workflow REFINE Sharing & Analysis Mitglied der Helmholtz-Gemeinschaft Planning Execution Sunday Nov. 17, 2013 8th Workshop On Workflows in Support of Large-Scale Science 5
  • 6. Ecological Niche Modeling Workflow Gamma Cost NumberOfPseu doAbsences Occurrence  Data createModel Environmental  Layer Geographic  Mask SVM Maxent GARP Mitglied der Helmholtz-Gemeinschaft testModel calcAUC AUC Sunday Nov. 17, 2013 8th Workshop On Workflows in Support of Large-Scale Science 6
  • 7. ‐3.2 1 11 2.3 1.5 a 4.55 ‐3 Ecological Niche Modeling Workflow 84 BLAST 10 6.788 Gamma 0.5 Cost NumberOfPseu doAbsences Occurrence  Data Environmental  Layer Select Algorithms 0 createModel Geographic  Mask 12 SVM Maxent GARP Select Parameters 100 testModel Mitglied der Helmholtz-Gemeinschaft ‐2.9 ‐bt 1.3 calcAUC 1 AUC 1 Sunday Nov. 17, 2013 / gaussian 8th Workshop On Workflows in Support of Large-Scale Science 1.9425 6.7 7 13
  • 8. Common strategies to handle this challenge • • • Default parameters & applications Trial and error Parameter sweeps But: Mitglied der Helmholtz-Gemeinschaft • • • Increasing complexity of scientific workflows Raising number parameters Work time & compute intensive Sunday Nov. 17, 2013 8th Workshop On Workflows in Support of Large-Scale Science 8
  • 9. Designing workflow  (from scratch) in silico experiment REFINE Reusing workflow Planning Mitglied der Helmholtz-Gemeinschaft Sharing & Analysis Execution Sunday Nov. 17, 2013 Optimization 8th Workshop On Workflows in Support of Large-Scale Science 9
  • 10. Intelligent automated optimization techniques Goal: • Automated way to find workflow settings that optimizes the output • Mitglied der Helmholtz-Gemeinschaft • • Define workflow output(s) as fitness value Use fitness value for evaluation (e.g. AUC or correlation coefficient) Use heuristic search algorithm to find best Sunday Nov. 17, 2013 8th Workshop On Workflows in Support of Large-Scale Science 10
  • 11. How does it work? • • • Mitglied der Helmholtz-Gemeinschaft • Development of optimization framework that extends Taverna workflow management system Abstracts optimization process (e.g. parallel execution, security) Developer API allows rapid adaption of new optimization methods Optimization plugins can be added independently WMS Taverna  Sunday Nov. 17, 2013 Framework Optimization      Layer      Plugins A P I Parameter Optimization Component Optimization 8th Workshop On Workflows in Support of Large-Scale Science 11
  • 12. Taverna Optimization Framework & Plugin (1) Define sub-workflow (2) Specify input parameters (constraints) (3) Select fitness output parameters (e.g. AUC) (4) Define optimization method parameters (population size, termination criteria) Best Fitness: 0.34 1 Best Fitness: 0.42 2 Best Fitness: 0.48 Mitglied der Helmholtz-Gemeinschaft . . . Display the optimization result x Best Fitness: 0.49 Genetic Algorithm Parameter  Optimization Plugin  Sunday Nov. 17, 2013 8th Workshop On Workflows in Support of Large-Scale Science 12
  • 13. Status quo • • Workflow optimization starts from scratch each time Optimization meta-data are lost Mitglied der Helmholtz-Gemeinschaft Idea: Capture optimization meta-data next to traditional provenance data ⇒ ⇒ learn from/extend prior optimization runs improve and accelerate optimization process Sunday Nov. 17, 2013 8th Workshop On Workflows in Support of Large-Scale Science 13
  • 14. Research Objects • • • • Aligned with W3C standards Aggregates various resources Describes scientific processes in machine readable format Specified by several ontologies Mitglied der Helmholtz-Gemeinschaft … ore:aggregates Sunday Nov. 17, 2013 8th Workshop On Workflows in Support of Large-Scale Science 14
  • 15. Taverna Optimization Framework & Plugin Mitglied der Helmholtz-Gemeinschaft (1) Define sub-workflow (2) Specify input parameters (constraints) (3) Select fitness output parameters (e.g. AUC) (4) Define optimization parameters (population size, termination criteria) Display the optimization result Best Fitness: 0.34 Best Fitness: 0.42 Best Fitness: 0.48 1 2 . . . x Best Fitness: 0.49 Genetic Algorithm Parameter  Optimization Plugin  Sunday Nov. 17, 2013 8th Workshop On Workflows in Support of Large-Scale Science 15
  • 16. Optimization Research Object Ontology ro:Research Object opt:Optimization Research Object ore:aggregates Mitglied der Helmholtz-Gemeinschaft opt:Algorithm Describes the  optimization  algorithm and  its parameters opt:Fitness opt:Generation opt:Optimization Run opt:Search Space opt:Termination Condition opt:Workflow Describes the  fitness  functions Defines the  population size  and generation  number for an  Optimization  Run Represents one  result set: sub‐ workflow,  parameters and  obtained fitness  values Describes the  dependencies  and parameter  constraints Describes the  termination  condition  defined by the  user The workflow  that was  optimized rdfs:subClassOf Sunday Nov. 17, 2013 rdf:Property 8th Workshop On Workflows in Support of Large-Scale Science 16
  • 17. Algorithm Mitglied der Helmholtz-Gemeinschaft • Genetic Algorihm • Mutation rate: 0.1 • Crossover rate 0.7 Sunday Nov. 17, 2013 8th Workshop On Workflows in Support of Large-Scale Science 17
  • 18. Search Space Gamma: • Double • 0 - 10 Mitglied der Helmholtz-Gemeinschaft • Cost/2 < Gamma (fictional) Sunday Nov. 17, 2013 8th Workshop On Workflows in Support of Large-Scale Science 18
  • 19. Optimization Run Mitglied der Helmholtz-Gemeinschaft • Origin of result • Parameter setting • Fitness value Sunday Nov. 17, 2013 8th Workshop On Workflows in Support of Large-Scale Science 19
  • 20. Taverna Optimization Framework & Plugin (1) Define sub-workflow (2) Specify input parameters (constraints) (3) Select fitness output parameters (e.g. AUC) (4) Define optimization parameters (population size, termination criteria) Generation 1 Iteration 1 Best Fitness: Fitness: 0.05 0.34 Fitness: 0.05 1 Best Fitness: 0.42 2 Best Fitness: 0.48 Mitglied der Helmholtz-Gemeinschaft . . . Display the optimization result x Best Fitness: 0.49 Genetic Algorithm Parameter  Optimization Plugin  Sunday Nov. 17, 2013 8th Workshop On Workflows in Support of Large-Scale Science 20
  • 21. Taverna Optimization Framework & Plugin (1) Define sub-workflow (2) Specify input parameters (constraints) (3) Select fitness output parameters (e.g. AUC) (4) Define optimization parameters (population size, termination criteria) Generation 1 Iteration 1 Best Fitness: Fitness: 0.05 0.34 Generation 1 Iteration 2 Fitness: 0.05 1 Fitness: 0.22 Generation 1 Iteration 3 Best Fitness: 0.42 Fitness: 0.27 Generation 1 Iteration 4 2 Fitness: 0.19 Best Fitness: Generation 1 Iteration 5 0.48 Fitness: 0.31 . Generation 1 Iteration 6 . Fitness: 0.34 x Mitglied der Helmholtz-Gemeinschaft . Display the optimization result Best Fitness: 0.49 Genetic Algorithm Parameter  Optimization Plugin  Sunday Nov. 17, 2013 8th Workshop On Workflows in Support of Large-Scale Science 21
  • 22. Taverna Optimization Framework & Plugin Mitglied der Helmholtz-Gemeinschaft (1) Define sub-workflow (2) Specify input parameters (constraints) (3) Select fitness output parameters (e.g. AUC) (4) Define optimization parameters (population size, termination criteria) Display the optimization result Sunday Nov. 17, 2013 Generation 1 Iteration 1 Best Fitness: Fitness: 0.05 0.34 Generation 1 Iteration 2 Fitness: 0.05 Generation 2 Iteration 1 1 Fitness: 0.22 Fitness: 0.05 Generation 3 Iteration 1 Generation 1 Iteration 3 Generation 2 Iteration 2 Best Fitness: 0.42 Fitness: 0.27 Fitness: 0.05 Fitness: 0.22 Generation 1 Iteration 4 2 Generation 3 Iteration 2 Generation 2 Iteration 3 Fitness: 0.19 Fitness: 0.22 Fitness: 0.34 Best Fitness: Generation 1 Iteration 5 Generation 3 Iteration 3 Generation 2 Iteration 4 0.48 Fitness: 0.31 Fitness: 0.34 Fitness: 0.19 . Generation 1 Iteration 6 x Generation 3 Iteration 4 . Generation 2 Iteration 5 Fitness: 0.34 . Fitness: 0.19 Fitness: 0.31 Generation 3 Iteration 5 Generation 2 Iteration 6 Fitness: 0.31 Best Fitness: 0.49 Fitness: 0.33 Generation 3 Iteration  6 Fitness: 0.46 Genetic Algorithm Parameter  Optimization Plugin  8th Workshop On Workflows in Support of Large-Scale Science 22
  • 23. Example Result Name Value Gamma 2.36 Cost 8 Mitglied der Helmholtz-Gemeinschaft NumberOfPseudo 363 Absences Fitness Sunday Nov. 17, 2013 0.9207 8th Workshop On Workflows in Support of Large-Scale Science 23
  • 24. Benefits of sharing and exploiting Optimization Research Objects • • • Mitglied der Helmholtz-Gemeinschaft • • • What is the optimal setting? - Reuse optimized settings What ranges have been explored? - Adopt used parameter ranges What algorithm settings were used? - Reuse algorithm settings Are there similar optimizations? - Reuse existing results Resume the optimization Embed optimization provenance into workflow infrastructures to be reused by other scientists Sunday Nov. 17, 2013 8th Workshop On Workflows in Support of Large-Scale Science 24
  • 25. Conclusion • Scientific workflows are hard to configure Optimization can help but meta-data get lost Extend Research Objects Build new Optimization Research Object Ontology Reuse of optimization meta-data to speed up optimization Shareable with the community in workflow infrastructures • Outlook: How to learn from similar workflows? • • • • Mitglied der Helmholtz-Gemeinschaft • Sunday Nov. 17, 2013 8th Workshop On Workflows in Support of Large-Scale Science 25