On Specifying and Sharing Scientific Workflow
Optimization Results Using Research Objects

Member of the Helmholtz Association (Mitglied der Helmholtz-Gemeinschaft)

8th Workshop On Workflows in Support of Large-Scale Science

17 November 2013 | Sonja Holl*, Daniel Garijo+, Khalid Belhajjame$, Olav Zimmermann*, Renato De Giovanni#, Matthias Obst~, Carole Goble$
* Jülich Supercomputing Centre (JSC), Forschungszentrum Juelich, Germany
+ Ontology Engineering Group, Facultad de Informática, Universidad Politécnica de Madrid, Spain
$ School of Computer Science, University of Manchester, UK
# Reference Center on Environmental Information, Campinas SP, Brazil
~ Department of Biological and Environmental Sciences, University of Gothenburg, Sweden

Scientific Workflows
• Popular choice to design, manage, and execute in silico experiments
• Sharing and reuse via workflow repositories

Ecological Niche Modeling
[Figure: five numbered panels, 1 to 5]
Model species adaptation to environmental changes (BioVeL Project)

Ecological Niche Modeling Workflow
[Workflow diagram]
Inputs: Parameter, Occurrence Data, Environmental Layer, Geographic Mask
Steps: createModel, testModel, calcAUC
Output: AUC

[Figure: in silico experiment life cycle. Labels: Designing workflow (from scratch), Reusing workflow, Planning, Execution, Sharing & Analysis, REFINE]

Ecological Niche Modeling Workflow
[Workflow diagram]
Parameters: Gamma, Cost, NumberOfPseudoAbsences
Inputs: Occurrence Data, Environmental Layer, Geographic Mask
Algorithms: SVM, Maxent, GARP
Steps: createModel, testModel, calcAUC
Output: AUC

Ecological Niche Modeling Workflow
[Workflow diagram with two annotations. "Select Algorithms": SVM, Maxent, GARP. "Select Parameters": Gamma, Cost, NumberOfPseudoAbsences. Inputs: Occurrence Data, Environmental Layer, Geographic Mask. Steps: createModel, testModel, calcAUC. Output: AUC. Scattered around the diagram are example values such as -3.2, 0.5, 100, 6.788, -bt, gaussian and BLAST]

Common strategies to handle this challenge
• Default parameters & applications
• Trial and error
• Parameter sweeps
But:
• Increasing complexity of scientific workflows
• Rising number of parameters
• Time-consuming and compute-intensive

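To make the cost of parameter sweeps concrete, here is a minimal Python sketch that enumerates a full grid over the three parameters of the niche modeling workflow. The candidate values are assumptions chosen for illustration; the talk does not specify a sweep grid.

```python
from itertools import product

# Hypothetical candidate values; the slides do not define a sweep grid.
GRID = {
    "Gamma": [0.1, 1.0, 10.0],
    "Cost": [1, 8, 64],
    "NumberOfPseudoAbsences": [100, 500],
}


def sweep_size(grid):
    """Number of workflow executions a full sweep over `grid` requires."""
    size = 1
    for values in grid.values():
        size *= len(values)
    return size


def parameter_sweep(grid):
    """Yield every combination of parameter values as a dict."""
    names = list(grid)
    for combo in product(*(grid[name] for name in names)):
        yield dict(zip(names, combo))


settings = list(parameter_sweep(GRID))
print(sweep_size(GRID), "workflow runs for a 3 x 3 x 2 grid")
```

Each additional parameter multiplies the run count, which is one way to read the "compute-intensive" concern above.
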
[Figure: in silico experiment life cycle with an Optimization phase. Labels: Designing workflow (from scratch), Reusing workflow, Planning, Optimization, Execution, Sharing & Analysis, REFINE]

Intelligent automated optimization techniques
Goal:
• Automated way to find workflow settings that optimize the output
• Define workflow output(s) as fitness value
• Use fitness value for evaluation (e.g. AUC or correlation coefficient)
• Use heuristic search algorithm to find the best settings

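As an illustration of using a workflow output as the fitness value, the sketch below computes AUC from model scores and presence/absence labels by pairwise comparison. It is a generic textbook formulation, not necessarily how the calcAUC step in the talk works; the `fitness` adapter and the shape of `workflow_output` are assumptions.

```python
def auc(scores, labels):
    """Area under the ROC curve via pairwise comparison.

    Equals the probability that a randomly chosen positive sample scores
    higher than a randomly chosen negative one (ties count as 0.5).
    """
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative sample")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


def fitness(workflow_output):
    """Hypothetical adapter: map one workflow result to a scalar fitness."""
    return auc(workflow_output["scores"], workflow_output["labels"])


example = {"scores": [0.9, 0.8, 0.3, 0.1], "labels": [1, 0, 1, 0]}
print(fitness(example))  # 3 of the 4 positive/negative pairs are ordered correctly
```

A search algorithm only needs this scalar, so any other output (for example a correlation coefficient) could be plugged in the same way.
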
How does it work?
• Development of optimization framework that extends Taverna workflow management system
• Abstracts optimization process (e.g. parallel execution, security)
• Developer API allows rapid adaption of new optimization methods
• Optimization plugins can be added independently
[Architecture diagram. WMS: Taverna. Framework: Optimization Layer, exposed through an API. Plugins: Parameter Optimization, Component Optimization]

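The split between framework and plugins can be sketched as follows. This is a hypothetical interface written in Python for illustration only; it is not the actual Taverna API, and all class and method names are assumptions. The framework owns workflow execution (here, a thread pool standing in for parallel execution), while a plugin only proposes settings.

```python
from abc import ABC, abstractmethod
from concurrent.futures import ThreadPoolExecutor


class OptimizationPlugin(ABC):
    """Hypothetical plugin interface; not the real Taverna API."""

    @abstractmethod
    def propose(self, history):
        """Return the next settings to evaluate, or [] to stop."""


class OptimizationFramework:
    """Runs a plugin's proposals and hides parallel execution from it."""

    def __init__(self, run_workflow, max_workers=4):
        self.run_workflow = run_workflow  # callable: setting -> fitness
        self.max_workers = max_workers

    def optimize(self, plugin):
        history = []  # list of (setting, fitness) pairs
        while True:
            batch = plugin.propose(history)
            if not batch:
                break
            with ThreadPoolExecutor(max_workers=self.max_workers) as pool:
                fitnesses = list(pool.map(self.run_workflow, batch))
            history.extend(zip(batch, fitnesses))
        if not history:
            raise ValueError("plugin proposed no settings")
        return max(history, key=lambda pair: pair[1])


class FixedCandidatesPlugin(OptimizationPlugin):
    """Toy plugin that proposes a fixed list of settings once."""

    def __init__(self, candidates):
        self.candidates = candidates

    def propose(self, history):
        return [] if history else self.candidates


# Toy fitness standing in for a real workflow run.
framework = OptimizationFramework(run_workflow=lambda s: -(s["Gamma"] - 2.0) ** 2)
best_setting, best_fitness = framework.optimize(
    FixedCandidatesPlugin([{"Gamma": 1.0}, {"Gamma": 2.5}, {"Gamma": 4.0}])
)
print(best_setting, best_fitness)
```

Because the plugin never calls the workflow itself, a new search method can be added without touching execution or security concerns.
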
Taverna Optimization Framework & Plugin
(1) Define sub-workflow
(2) Specify input parameters (constraints)
(3) Select fitness output parameters (e.g. AUC)
(4) Define optimization method parameters (population size, termination criteria)
[Figure: Genetic Algorithm Parameter Optimization Plugin. Best Fitness improving over generations 1, 2, ..., x: 0.34, 0.42, 0.48, ..., 0.49. Final step: display the optimization result]

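The four steps can be pictured with a small genetic algorithm written in plain Python. This is a minimal sketch under stated assumptions, not the plugin's implementation: the function name, the elitism and uniform-crossover choices, the toy fitness function and the Cost bounds are all invented here. The default mutation and crossover rates (0.1 and 0.7) are the values shown on the later Algorithm slide.

```python
import random


def genetic_search(fitness, bounds, population_size=6, generations=20,
                   mutation_rate=0.1, crossover_rate=0.7, seed=0):
    """Maximize `fitness` over real-valued parameters within `bounds`.

    bounds maps a parameter name to (low, high).
    Returns (best_setting, best_fitness_per_generation).
    """
    rng = random.Random(seed)
    names = list(bounds)
    population = [
        {n: rng.uniform(*bounds[n]) for n in names} for _ in range(population_size)
    ]
    best_per_generation = []
    for _ in range(generations):  # termination criterion: fixed generation count
        ranked = sorted(population, key=fitness, reverse=True)
        best_per_generation.append(fitness(ranked[0]))
        next_population = [dict(ranked[0])]  # elitism: keep the best unchanged
        parents = ranked[: max(2, population_size // 2)]
        while len(next_population) < population_size:
            mother, father = rng.sample(parents, 2)
            child = dict(mother)
            if rng.random() < crossover_rate:
                # Uniform crossover: take each parameter from either parent.
                child = {n: rng.choice((mother[n], father[n])) for n in names}
            for n in names:
                if rng.random() < mutation_rate:
                    child[n] = rng.uniform(*bounds[n])  # re-sample within bounds
            next_population.append(child)
        population = next_population
    return max(population, key=fitness), best_per_generation


def toy_fitness(setting):
    # Stand-in for executing the sub-workflow and reading its AUC output.
    return 1.0 - ((setting["Gamma"] - 2.0) ** 2 + (setting["Cost"] - 8.0) ** 2) / 200.0


best, history = genetic_search(toy_fitness, {"Gamma": (0.0, 10.0), "Cost": (0.0, 16.0)})
print(best, history[-1])
```

With elitism the best fitness per generation never decreases, which matches the monotone "Best Fitness" sequence shown in the figure.
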
Status quo
• Workflow optimization starts from scratch each time
• Optimization meta-data are lost
Idea: Capture optimization meta-data next to traditional provenance data
⇒ learn from/extend prior optimization runs
⇒ improve and accelerate optimization process

Research Objects
• Aligned with W3C standards
• Aggregates various resources
• Describes scientific processes in machine readable format
• Specified by several ontologies
[Figure: a Research Object connected to its resources by ore:aggregates]

Optimization Research Object Ontology
opt:OptimizationResearchObject (rdfs:subClassOf ro:ResearchObject) ore:aggregates:
• opt:Algorithm: Describes the optimization algorithm and its parameters
• opt:Fitness: Describes the fitness functions
• opt:Generation: Defines the population size and generation number for an Optimization Run
• opt:OptimizationRun: Represents one result set: sub-workflow, parameters and obtained fitness values
• opt:SearchSpace: Describes the dependencies and parameter constraints
• opt:TerminationCondition: Describes the termination condition defined by the user
• opt:Workflow: The workflow that was optimized
[Legend: rdfs:subClassOf, rdf:Property]

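One way to picture the aggregation structure is as a handful of subject/predicate/object triples. The sketch below uses plain Python tuples rather than an RDF library; the class names and the predicates ore:aggregates, rdf:type and rdfs:subClassOf come from the slide or standard vocabularies, while every `ex:` resource name is hypothetical.

```python
# The ex: resources are invented for illustration only.
TRIPLES = [
    ("opt:OptimizationResearchObject", "rdfs:subClassOf", "ro:ResearchObject"),
    ("ex:svm-optimization", "rdf:type", "opt:OptimizationResearchObject"),
    ("ex:svm-optimization", "ore:aggregates", "ex:algorithm"),
    ("ex:svm-optimization", "ore:aggregates", "ex:search-space"),
    ("ex:svm-optimization", "ore:aggregates", "ex:run-1"),
    ("ex:algorithm", "rdf:type", "opt:Algorithm"),
    ("ex:search-space", "rdf:type", "opt:SearchSpace"),
    ("ex:run-1", "rdf:type", "opt:OptimizationRun"),
]


def objects(triples, subject, predicate):
    """All objects o such that (subject, predicate, o) is in `triples`."""
    return [o for s, p, o in triples if s == subject and p == predicate]


def aggregated_of_type(triples, research_object, rdf_class):
    """Resources aggregated by `research_object` that are instances of `rdf_class`."""
    return [
        resource
        for resource in objects(triples, research_object, "ore:aggregates")
        if rdf_class in objects(triples, resource, "rdf:type")
    ]


print(aggregated_of_type(TRIPLES, "ex:svm-optimization", "opt:OptimizationRun"))
```

A real Research Object would use full URIs and an RDF serialization; the tuple form only shows how a consumer could locate, say, all optimization runs inside one object.
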
Algorithm
• Genetic Algorithm
• Mutation rate: 0.1
• Crossover rate: 0.7

Search Space
Gamma:
• Double
• 0 - 10
• Cost/2 < Gamma (fictional)

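A search space of this kind combines per-parameter ranges with dependencies between parameters. The following sketch checks a candidate setting against both. The Gamma range and the Cost/2 < Gamma dependency are taken from the slide (where the dependency is marked as fictional); the Cost range, the inclusive bounds and all names are assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Range:
    low: float
    high: float

    def contains(self, value):
        # Bounds are treated as inclusive; the slide does not say.
        return self.low <= value <= self.high


# Gamma range from the slide; the Cost range is an assumption.
RANGES = {"Gamma": Range(0.0, 10.0), "Cost": Range(0.0, 16.0)}

# Dependency from the slide (marked fictional there): Cost/2 < Gamma.
DEPENDENCIES = [lambda setting: setting["Cost"] / 2 < setting["Gamma"]]


def is_valid(setting):
    """True if every parameter is in range and every dependency holds."""
    in_range = all(RANGES[name].contains(setting[name]) for name in RANGES)
    return in_range and all(check(setting) for check in DEPENDENCIES)


print(is_valid({"Gamma": 5.0, "Cost": 4.0}))   # in range, 2 < 5
print(is_valid({"Gamma": 1.0, "Cost": 4.0}))   # dependency violated, 2 is not < 1
print(is_valid({"Gamma": 12.0, "Cost": 4.0}))  # Gamma outside 0 - 10
```

An optimizer can call such a check before executing the workflow, so that invalid settings never consume compute time.
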
Optimization Run
• Origin of result
• Parameter setting
• Fitness value

Taverna Optimization Framework & Plugin
[Figure: Genetic Algorithm Parameter Optimization Plugin with Generation 1 expanded into its iterations. Generation 1, Iterations 1 to 6, Fitness: 0.05, 0.22, 0.27, 0.19, 0.31, 0.34]

Taverna Optimization Framework & Plugin
[Figure: Genetic Algorithm Parameter Optimization Plugin with Generations 1, 2 and 3 each expanded into Iterations 1 to 6, one Fitness value per iteration (for example Generation 1 Iteration 1: 0.05; Generation 3 Iteration 6: 0.46). Overall Best Fitness: 0.49]

Example
Result
Name                     Value
Gamma                    2.36
Cost                     8
NumberOfPseudoAbsences   363
Fitness                  0.9207

Benefits of sharing and exploiting Optimization Research Objects
• What is the optimal setting? - Reuse optimized settings
• What ranges have been explored? - Adopt used parameter ranges
• What algorithm settings were used? - Reuse algorithm settings
• Are there similar optimizations? - Reuse existing results
• Resume the optimization
• Embed optimization provenance into workflow infrastructures to be reused by other scientists

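The first questions in this list map onto simple queries over stored optimization runs. The sketch below assumes the runs have already been read from a shared Optimization Research Object into plain dictionaries; the record layout, the function names and all values are hypothetical.

```python
# Hypothetical records; the values are illustrative, not results from the talk.
PRIOR_RUNS = [
    {"Gamma": 0.5, "Cost": 2.0, "fitness": 0.71},
    {"Gamma": 2.4, "Cost": 8.0, "fitness": 0.92},
    {"Gamma": 3.1, "Cost": 6.0, "fitness": 0.88},
    {"Gamma": 9.0, "Cost": 1.0, "fitness": 0.40},
]


def parameters_of(run):
    """Strip the fitness value, leaving only the parameter setting."""
    return {name: value for name, value in run.items() if name != "fitness"}


def best_setting(runs):
    """Reuse optimized settings: the setting of the fittest prior run."""
    return parameters_of(max(runs, key=lambda run: run["fitness"]))


def explored_ranges(runs):
    """Adopt used parameter ranges: (min, max) seen per parameter."""
    names = list(parameters_of(runs[0]))
    return {n: (min(r[n] for r in runs), max(r[n] for r in runs)) for n in names}


def seed_population(runs, size):
    """Resume the optimization: start from the `size` fittest prior settings."""
    ranked = sorted(runs, key=lambda run: run["fitness"], reverse=True)
    return [parameters_of(run) for run in ranked[:size]]


print(best_setting(PRIOR_RUNS))
print(explored_ranges(PRIOR_RUNS))
print(seed_population(PRIOR_RUNS, 2))
```

Seeding a new search with prior settings is one plausible reading of "resume the optimization"; the talk does not spell out the mechanism.
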
Conclusion
• Scientific workflows are hard to configure
• Optimization can help but meta-data get lost
• Extend Research Objects
• Build new Optimization Research Object Ontology
• Reuse of optimization meta-data to speed up optimization
• Shareable with the community in workflow infrastructures
• Outlook: How to learn from similar workflows?

Links
http://purl.org/net/ro-optimization
http://purl.org/net/svm-opt-research-object

Questions?
Thank you!

On Specifying and Sharing Scientific Workflow Optimization Results Using Research Objects

  • 1. On Specifying and Sharing Scientific Workflow Optimization Results Using Research Objects Mitglied der Helmholtz-Gemeinschaft 8th Workshop On Workflows in Support of Large-Scale Science 17. November 2013 | Sonja Holl*, Daniel Garijo+, Khalid Belhajjame$, Olav Zimmermann*, Renato De Giovanni#, Matthias Obst~, Carole Goble$ *Jülich Supercomputing Centre (JSC),Forschungszentrum Juelich, Germany +Ontology Engineering Group,  Facultad de Informática Universidad Politécnica de Madrid, Spain $School of Computer Science University of Manchester, UK #Reference Center on Environmental Information Campinas SP, Brazil ~Department of Biological and Environmental Sciences University of Gothenburg, Sweden
  • 2. Scientific Workflows • Mitglied der Helmholtz-Gemeinschaft • Popular choice to design, manage, and execute in silico experiments Sharing and reuse via workflow repositories Sunday Nov. 17, 2013 8th Workshop On Workflows in Support of Large-Scale Science 2
  • 3. Ecological Niche Modeling 1 4 5 3 Mitglied der Helmholtz-Gemeinschaft 2 Perform species adaptation to environmental changes (BioVeL Project) Sunday Nov. 17, 2013 8th Workshop On Workflows in Support of Large-Scale Science 3
  • 4. Ecological Niche Modeling Workflow Parameter Occurrence  Data Environmental  Layer Geographic  Mask createModel Mitglied der Helmholtz-Gemeinschaft testModel calcAUC AUC Sunday Nov. 17, 2013 8th Workshop On Workflows in Support of Large-Scale Science 4
  • 5. Designing workflow  (from scratch) in silico experiment Reusing workflow REFINE Sharing & Analysis Mitglied der Helmholtz-Gemeinschaft Planning Execution Sunday Nov. 17, 2013 8th Workshop On Workflows in Support of Large-Scale Science 5
  • 6. Ecological Niche Modeling Workflow Gamma Cost NumberOfPseu doAbsences Occurrence  Data createModel Environmental  Layer Geographic  Mask SVM Maxent GARP Mitglied der Helmholtz-Gemeinschaft testModel calcAUC AUC Sunday Nov. 17, 2013 8th Workshop On Workflows in Support of Large-Scale Science 6
  • 7. ‐3.2 1 11 2.3 1.5 a 4.55 ‐3 Ecological Niche Modeling Workflow 84 BLAST 10 6.788 Gamma 0.5 Cost NumberOfPseu doAbsences Occurrence  Data Environmental  Layer Select Algorithms 0 createModel Geographic  Mask 12 SVM Maxent GARP Select Parameters 100 testModel Mitglied der Helmholtz-Gemeinschaft ‐2.9 ‐bt 1.3 calcAUC 1 AUC 1 Sunday Nov. 17, 2013 / gaussian 8th Workshop On Workflows in Support of Large-Scale Science 1.9425 6.7 7 13
  • 8. Common strategies to handle this challenge • • • Default parameters & applications Trial and error Parameter sweeps But: Mitglied der Helmholtz-Gemeinschaft • • • Increasing complexity of scientific workflows Raising number parameters Work time & compute intensive Sunday Nov. 17, 2013 8th Workshop On Workflows in Support of Large-Scale Science 8
  • 9. Designing workflow  (from scratch) in silico experiment REFINE Reusing workflow Planning Mitglied der Helmholtz-Gemeinschaft Sharing & Analysis Execution Sunday Nov. 17, 2013 Optimization 8th Workshop On Workflows in Support of Large-Scale Science 9
  • 10. Intelligent automated optimization techniques Goal: • Automated way to find workflow settings that optimizes the output • Mitglied der Helmholtz-Gemeinschaft • • Define workflow output(s) as fitness value Use fitness value for evaluation (e.g. AUC or correlation coefficient) Use heuristic search algorithm to find best Sunday Nov. 17, 2013 8th Workshop On Workflows in Support of Large-Scale Science 10
  • 11. How does it work? • • • Mitglied der Helmholtz-Gemeinschaft • Development of optimization framework that extends Taverna workflow management system Abstracts optimization process (e.g. parallel execution, security) Developer API allows rapid adaption of new optimization methods Optimization plugins can be added independently WMS Taverna  Sunday Nov. 17, 2013 Framework Optimization      Layer      Plugins A P I Parameter Optimization Component Optimization 8th Workshop On Workflows in Support of Large-Scale Science 11
  • 12. Taverna Optimization Framework & Plugin (Genetic Algorithm Parameter Optimization Plugin): (1) Define sub-workflow; (2) Specify input parameters (constraints); (3) Select fitness output parameters (e.g. AUC); (4) Define optimization method parameters (population size, termination criteria); then display the optimization result. [Figure: successive Best Fitness values 0.34, 0.42, 0.48, ..., 0.49, labelled 1, 2, ..., x]
  • 13. Status quo: workflow optimization starts from scratch each time; optimization meta-data are lost. Idea: capture optimization meta-data next to traditional provenance data ⇒ learn from/extend prior optimization runs ⇒ improve and accelerate the optimization process.
  • 14. Research Objects: aligned with W3C standards; aggregate various resources (ore:aggregates); describe scientific processes in a machine-readable format; specified by several ontologies.
  • 15. Taverna Optimization Framework & Plugin (Genetic Algorithm Parameter Optimization Plugin): (1) Define sub-workflow; (2) Specify input parameters (constraints); (3) Select fitness output parameters (e.g. AUC); (4) Define optimization parameters (population size, termination criteria); then display the optimization result. [Figure: successive Best Fitness values 0.34, 0.42, 0.48, ..., 0.49, labelled 1, 2, ..., x]
  • 16. Optimization Research Object Ontology [figure: ro:Research Object and opt:Optimization Research Object, with ore:aggregates links to the following classes; legend: rdfs:subClassOf, rdf:Property]. opt:Algorithm: describes the optimization algorithm and its parameters. opt:Fitness: describes the fitness functions. opt:Generation: defines the population size and generation number for an Optimization Run. opt:Optimization Run: represents one result set (sub-workflow, parameters, and obtained fitness values). opt:Search Space: describes the dependencies and parameter constraints. opt:Termination Condition: describes the termination condition defined by the user. opt:Workflow: the workflow that was optimized.
  • 17. Algorithm: Genetic Algorithm; mutation rate: 0.1; crossover rate: 0.7.
  • 18. Search Space: Gamma: type Double; range 0 - 10; constraint Cost/2 < Gamma (fictional).
  • 19. Optimization Run: origin of result; parameter setting; fitness value.
  • 20. Taverna Optimization Framework & Plugin (Genetic Algorithm Parameter Optimization Plugin): (1) Define sub-workflow; (2) Specify input parameters (constraints); (3) Select fitness output parameters (e.g. AUC); (4) Define optimization parameters (population size, termination criteria); then display the optimization result. [Figure: as before, with the first recorded run added: Generation 1, Iteration 1, Fitness: 0.05]
  • 21. Taverna Optimization Framework & Plugin (Genetic Algorithm Parameter Optimization Plugin): steps (1) to (4) as before. [Figure: recorded runs of Generation 1: Iteration 1, Fitness: 0.05; Iteration 2, Fitness: 0.22; Iteration 3, Fitness: 0.27; Iteration 4, Fitness: 0.19; Iteration 5, Fitness: 0.31; Iteration 6, Fitness: 0.34. Best Fitness values shown: 0.34, 0.42, 0.48, ..., 0.49]
  • 22. Taverna Optimization Framework & Plugin (Genetic Algorithm Parameter Optimization Plugin): steps (1) to (4) as before. [Figure: recorded runs for three generations, Iterations 1 to 6 each, as far as the overlapping boxes can be read. Generation 1: 0.05, 0.22, 0.27, 0.19, 0.31, 0.34. Generation 2: 0.05, 0.22, 0.34, 0.19, 0.31, 0.33. Generation 3: 0.05, 0.22, 0.34, 0.19, 0.31, 0.46. Best Fitness values shown: 0.34, 0.42, 0.48, ..., 0.49]
  • 23. Example Result (Name: Value): Gamma: 2.36; Cost: 8; NumberOfPseudoAbsences: 363; Fitness: 0.9207.
  • 24. Benefits of sharing and exploiting Optimization Research Objects: What is the optimal setting? Reuse optimized settings. What ranges have been explored? Adopt used parameter ranges. What algorithm settings were used? Reuse algorithm settings. Are there similar optimizations? Reuse existing results. Resume the optimization. Embed optimization provenance into workflow infrastructures to be reused by other scientists.
  • 25. Conclusion: scientific workflows are hard to configure; optimization can help, but meta-data get lost; extend Research Objects; build a new Optimization Research Object Ontology; reuse optimization meta-data to speed up optimization; shareable with the community in workflow infrastructures. Outlook: how to learn from similar workflows?
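
The optimization loop outlined on slides 10 to 22 can be illustrated with a minimal Python sketch. It is not the Taverna plugin's implementation: `toy_fitness` is a hypothetical stand-in for running the sub-workflow and reading a fitness output such as AUC, the Cost range is assumed, and the selection scheme (keep the better half, refill by crossover and mutation) is one simple choice among many. The Gamma range, the fictional constraint Cost/2 < Gamma, the mutation and crossover rates, and the three generations of six iterations are taken from the slides. Every evaluation is stored as an `OptimizationRun` record, which is the kind of meta-data the slides propose to keep.

```python
import random
from dataclasses import dataclass

# Hypothetical search space. Gamma's range (0 to 10) and the constraint
# Cost/2 < Gamma come from the slides (the constraint is marked fictional
# there); the Cost range is an assumption made for this sketch.
BOUNDS = {"gamma": (0.0, 10.0), "cost": (0.0, 10.0)}


def is_valid(params):
    """Check the range bounds and the dependency Cost/2 < Gamma."""
    in_range = all(lo <= params[k] <= hi for k, (lo, hi) in BOUNDS.items())
    return in_range and params["cost"] / 2 < params["gamma"]


def toy_fitness(params):
    """Stand-in for a workflow run that returns a fitness such as AUC."""
    return 1.0 / (1.0 + (params["gamma"] - 2.0) ** 2 + (params["cost"] - 3.0) ** 2)


@dataclass
class OptimizationRun:
    """One result set: origin of the result, parameter setting, fitness."""
    generation: int
    iteration: int
    params: dict
    fitness: float


def sample(rng):
    """Draw random parameter sets until one satisfies the constraints."""
    while True:
        p = {k: rng.uniform(lo, hi) for k, (lo, hi) in BOUNDS.items()}
        if is_valid(p):
            return p


def make_child(a, b, rng, mutation_rate, crossover_rate):
    """Uniform crossover plus Gaussian mutation; invalid children are resampled."""
    child = dict(a)
    if rng.random() < crossover_rate:
        child = {k: rng.choice((a[k], b[k])) for k in BOUNDS}
    for k in BOUNDS:
        if rng.random() < mutation_rate:
            child[k] += rng.gauss(0.0, 1.0)
    return child if is_valid(child) else sample(rng)


def optimize(generations=3, population_size=6, mutation_rate=0.1,
             crossover_rate=0.7, seed=0):
    rng = random.Random(seed)
    population = [sample(rng) for _ in range(population_size)]
    history = []
    for gen in range(1, generations + 1):
        runs = [OptimizationRun(gen, i, p, toy_fitness(p))
                for i, p in enumerate(population, start=1)]
        history.extend(runs)
        # Keep the better half as parents, then refill the population.
        runs.sort(key=lambda r: r.fitness, reverse=True)
        parents = [r.params for r in runs[:population_size // 2]]
        children = [make_child(rng.choice(parents), rng.choice(parents), rng,
                               mutation_rate, crossover_rate)
                    for _ in range(population_size - len(parents))]
        population = parents + children
    return history


history = optimize()
best = max(history, key=lambda r: r.fitness)
print(f"best fitness {best.fitness:.4f} in generation {best.generation}")
```

Because `history` keeps every run with its generation and iteration, a later optimization could in principle seed its initial population from these records instead of starting from scratch, which is the reuse scenario described on slide 24.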