https://workflowhub.org
Community Framework for Enabling
Scientific Workflow Research and Development
Rafael Ferreira da Silva (1)
Loic Pottier (1)
Tainã Coleman (1)
Ewa Deelman (1)
Henri Casanova (2)
(1) University of Southern California
(2) University of Hawai’i at Mānoa
Motivation
As workflows continue to be adopted by scientific projects and user communities, they are becoming more complex and require more sophisticated workflow management capabilities.
Workflows are being designed that can analyze terabyte-scale datasets, be composed of millions of individual tasks that execute for milliseconds up to several hours, process data streams, and process static data in object stores.
Catering to these workflow features and demands requires workflow management system (WMS) research and development at several levels, from algorithms and systems all the way to user interfaces.
Image credits: Thomas McCauley © 2018 CERN; © 2019 NASA
State-of-the-Art
A traditional approach for testing, evaluating, and
evolving WMS is to use full-fledged software stacks
to execute applications on distributed platforms and
testbeds
An alternative is to use simulation,
i.e., implement and use a software
artifact that models the functional
and performance behaviors of
software and hardware stacks of
interest
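To make the idea of simulation concrete, here is a minimal sketch in Python of a software artifact that models workflow execution: a list scheduler that replays a task graph on a fixed number of identical cores and reports the simulated makespan. The task names, runtimes, and scheduling policy are assumptions made for illustration; this is not how any particular simulator or WMS works.

```python
import heapq

def simulate_makespan(tasks, parents, num_cores):
    """Simulate a workflow on num_cores identical cores; return the makespan.

    tasks:   dict mapping task name -> runtime in seconds
    parents: dict mapping task name -> list of parent task names
    """
    if num_cores < 1:
        raise ValueError("need at least one core")
    remaining = {t: len(parents.get(t, [])) for t in tasks}
    children = {t: [] for t in tasks}
    for task, deps in parents.items():
        for dep in deps:
            children[dep].append(task)

    ready = sorted(t for t, n in remaining.items() if n == 0)
    running = []  # min-heap of (finish_time, task)
    clock = 0.0
    free_cores = num_cores
    completed = 0

    while ready or running:
        # Start as many ready tasks as there are free cores.
        while ready and free_cores > 0:
            task = ready.pop(0)
            heapq.heappush(running, (clock + tasks[task], task))
            free_cores -= 1
        # Advance the simulated clock to the next task completion.
        clock, done = heapq.heappop(running)
        free_cores += 1
        completed += 1
        for child in children[done]:
            remaining[child] -= 1
            if remaining[child] == 0:
                ready.append(child)

    if completed != len(tasks):
        raise ValueError("workflow has a dependency cycle")
    return clock

# Hypothetical fork-join workflow: a -> (b, c) -> d.
tasks = {"a": 10, "b": 20, "c": 30, "d": 5}
parents = {"b": ["a"], "c": ["a"], "d": ["b", "c"]}
makespan_two_cores = simulate_makespan(tasks, parents, num_cores=2)
makespan_one_core = simulate_makespan(tasks, parents, num_cores=1)
```

Changing `num_cores` or the runtimes and rerunning takes no real compute time, which is the appeal of simulation over execution on a testbed.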
Concept
WorkflowHub is a community framework that provides a collection of tools for analyzing workflow execution traces, producing realistic synthetic workflow traces, and simulating workflow executions
Common Format
Open source common JSON format for representing collected workflow traces and generated synthetic workflow traces
Users are encouraged to contribute additional workflow traces for any scientific domain, as long as they conform to WorkflowHub’s common format
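As a rough illustration of what a JSON trace and a conformance check could look like, the Python sketch below builds a small trace, round-trips it through JSON, and runs basic sanity checks. The field names (`workflow`, `jobs`, `runtime`, `parents`) and the `check_trace` helper are hypothetical and chosen for illustration; they are not the actual WorkflowHub schema.

```python
import json

# Hypothetical trace: the field names below are illustrative assumptions,
# not the actual WorkflowHub schema.
trace = {
    "name": "example-workflow",
    "workflow": {
        "jobs": [
            {"name": "split", "runtime": 12.5, "parents": []},
            {"name": "align_1", "runtime": 40.0, "parents": ["split"]},
            {"name": "align_2", "runtime": 38.0, "parents": ["split"]},
            {"name": "merge", "runtime": 7.5, "parents": ["align_1", "align_2"]},
        ]
    },
}

def check_trace(doc):
    """Return a list of problems found in a trace document (empty if none)."""
    problems = []
    jobs = doc.get("workflow", {}).get("jobs")
    if not isinstance(jobs, list) or not jobs:
        return ["trace has no jobs"]
    names = {job.get("name") for job in jobs}
    for job in jobs:
        runtime = job.get("runtime")
        if not isinstance(runtime, (int, float)) or runtime < 0:
            problems.append(f"{job.get('name')}: missing or negative runtime")
        for parent in job.get("parents", []):
            if parent not in names:
                problems.append(f"{job.get('name')}: unknown parent {parent}")
    return problems

# Round-trip through JSON text, as a trace file on disk would be.
loaded = json.loads(json.dumps(trace))
problems = check_trace(loaded)
total_runtime = sum(job["runtime"] for job in loaded["workflow"]["jobs"])
```

A shared format pays off exactly here: one checker and one loader serve every contributed trace, whatever system produced it.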
Traces
Collection of open access workflow traces from production workflow systems
This collection of workflow traces forms a representative set of small- and large-scale workflow configurations:
• They consume/produce large volumes of data processed by thousands of compute tasks
• Their structures are sufficiently complex and heterogeneous to encompass current and emerging large-scale workflow execution models
https://pegasus.isi.edu
Python Package
Open source Python package to analyze traces and generate representative synthetic traces in the same common format
Analyses can be performed to produce statistical summaries of workflow performance characteristics
Trace Analysis
Example of probability distribution fitting of runtime (in seconds) for workflow tasks
WorkflowHub’s Python package attempts to fit data with 23 probability distributions provided as part of SciPy’s statistics submodule
Example of an analysis summary showing the best fit probability distribution for runtime of the individual tasks (1000Genome workflow)
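As a rough illustration of the fitting step, the Python sketch below fits a few candidate distributions to task runtimes and keeps the one with the smallest Kolmogorov-Smirnov distance. It is a stand-in built on assumptions: it uses only the Python standard library, three candidates with closed-form parameter estimates instead of SciPy's distributions, and a selection criterion chosen for simplicity. The package's actual candidates and criterion may differ, and the runtimes are generated, not taken from a real trace.

```python
import math
import random
import statistics

def ks_distance(sorted_data, cdf):
    """Largest gap between the empirical CDF of the data and a model CDF."""
    n = len(sorted_data)
    gap = 0.0
    for i, x in enumerate(sorted_data):
        model = cdf(x)
        gap = max(gap, abs(model - i / n), abs((i + 1) / n - model))
    return gap

def fit_candidates(runtimes):
    """Fit each candidate by closed-form estimates; return {name: (cdf, params)}."""
    mean = statistics.fmean(runtimes)
    stdev = statistics.stdev(runtimes)
    low, high = min(runtimes), max(runtimes)
    normal = statistics.NormalDist(mean, stdev)
    return {
        "normal": (normal.cdf, (mean, stdev)),
        "exponential": (
            lambda x: 1.0 - math.exp(-x / mean) if x > 0 else 0.0,
            (mean,),
        ),
        "uniform": (
            lambda x: min(1.0, max(0.0, (x - low) / (high - low))),
            (low, high),
        ),
    }

def best_fit(runtimes):
    """Return (name, params, distance) of the candidate closest to the data."""
    data = sorted(runtimes)
    scored = [
        (ks_distance(data, cdf), name, params)
        for name, (cdf, params) in fit_candidates(data).items()
    ]
    distance, name, params = min(scored)
    return name, params, distance

# Hypothetical runtimes (seconds), drawn from an exponential with mean 30 s.
rng = random.Random(42)
runtimes = [rng.expovariate(1 / 30) for _ in range(2000)]
name, params, distance = best_fit(runtimes)
```

Extending the candidate set only means adding entries to the dictionary returned by `fit_candidates`; the scoring loop stays the same.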
Trace Generator
The WorkflowHub package provides a number of workflow recipes for generating realistic synthetic workflow traces
Currently available workflow recipes for high-throughput applications:
• 1000Genome: A data-intensive bioinformatics workflow
• Cycles: A compute-intensive scientific workflow for agroecosystems modeling
• Epigenomics: A data-intensive bioinformatics workflow
• Montage: A compute-intensive astronomy workflow
• Seismology: A data-intensive seismology workflow
• SoyKB: A data-intensive bioinformatics workflow
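To show what a recipe might do in principle, the Python sketch below generates a synthetic workflow of a requested size with a fixed structure and sampled runtimes. The `fork_join_recipe` function, the fork-join structure, the output field names, and the exponential runtime model are all hypothetical; this is not the package's API, and the recipes listed above encode the structure of their own applications.

```python
import random

def fork_join_recipe(num_tasks, mean_runtime, seed=0):
    """Generate a synthetic fork-join workflow with num_tasks tasks in total.

    Hypothetical recipe: one "split" task, num_tasks - 2 parallel "work"
    tasks, and one "merge" task. Runtimes are drawn from an exponential
    distribution, standing in for a distribution fitted to real traces.
    """
    if num_tasks < 3:
        raise ValueError("a fork-join workflow needs at least 3 tasks")
    rng = random.Random(seed)

    def sample_runtime():
        return round(rng.expovariate(1 / mean_runtime), 3)

    jobs = [{"name": "split", "runtime": sample_runtime(), "parents": []}]
    workers = [f"work_{i}" for i in range(num_tasks - 2)]
    for name in workers:
        jobs.append(
            {"name": name, "runtime": sample_runtime(), "parents": ["split"]}
        )
    jobs.append(
        {"name": "merge", "runtime": sample_runtime(), "parents": workers}
    )
    return {"name": f"fork-join-{num_tasks}", "workflow": {"jobs": jobs}}

# The same recipe yields workflows of different scales.
small = fork_join_recipe(num_tasks=10, mean_runtime=30.0, seed=1)
large = fork_join_recipe(num_tasks=1000, mean_runtime=30.0, seed=1)
```

Fixing the seed makes a generated trace reproducible, which matters when synthetic workflows are shared between research groups.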
Simulator
We do not develop simulators as part of the WorkflowHub project. Instead, we catalog open source workflow system simulators
WRENCH Simulation Framework
https://wrench-project.org
Objective: Make it easy to develop simulators of complex cyberinfrastructure application executions
• Provides high-level, reusable simulation abstractions
• Produces accurate and scalable simulations
CASE STUDY:
EVALUATING SYNTHETIC TRACES
WITH A SIMULATOR OF A PRODUCTION WMS
Scenarios
Our previous work has enabled 30 research articles, but its synthetic traces used only 2 probability distributions to fit runtime and I/O operations
We use the WRENCH-Pegasus simulator to evaluate the accuracy and scalability of WorkflowHub traces
Accuracy
[Figure: two groups of plots, each with panel A showing F(SubmittedTasks) and panel B showing F(CompletedTasks) against Workflow Makespan (s), with series "real", "synthetic", and "previous". First group: makespan axis 0 to 2000 s; traces ilmn-125, ilmn-263, ilmn-405, ilmn-559, ilmn-713, ilmn-803. Second group: makespan axis 0 to 20000 s; traces 2mass-0.5, 2mass-1.0, 2mass-1.5, 2mass-2.0.]
Empirical cumulative distribution function of task submit times (top) and task completion times (bottom) for sample real-world (“real”) and synthetic (“synthetic” and “previous”) workflow trace executions using the WRENCH-Pegasus simulator
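For readers unfamiliar with the plotted quantity, the Python sketch below computes an empirical cumulative distribution function: F(t) is the fraction of tasks submitted (or completed) at or before time t. The event times are made-up values for illustration, not data from the traces shown.

```python
from bisect import bisect_right

def ecdf(event_times):
    """Return a function F where F(t) is the fraction of events at or before t."""
    ordered = sorted(event_times)
    n = len(ordered)
    if n == 0:
        raise ValueError("need at least one event time")

    def F(t):
        # bisect_right counts how many event times are <= t.
        return bisect_right(ordered, t) / n

    return F

# Hypothetical task completion times (seconds since workflow start).
completed_at = [120, 300, 300, 450, 900, 1500, 1800, 2000]
F_completed = ecdf(completed_at)
halfway = F_completed(1000)  # fraction of tasks completed by t = 1000 s
```

Two executions behave alike when their curves overlap; a synthetic trace whose curve tracks the "real" curve reproduces the timing of the real execution.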
Scaling
[Figure: six panels (1000Genome, Cycles, Epigenomics, Montage, Seismology, Soy-KB) showing F(SubmittedTasks) against Normalized Workflow Makespan, both axes from 0 to 1, with series real, synthetic-1K, synthetic-5K, synthetic-10K, synthetic-25K, synthetic-50K, synthetic-100K.]
Empirical cumulative distribution function of task submit times for sample real-world (“real”) and synthetic (“synthetic”) workflow trace executions using the WRENCH-Pegasus simulator.
Root mean square errors (RMSEs) for large-scale synthetic workflows. (RMSE values are computed from normalized workflow makespan.)
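One plausible way to compute such an error, sketched in Python below, is to normalize each execution's event times by its makespan, sample both empirical distribution curves on a common grid over [0, 1], and take the root mean square of the differences. The grid size, the sample data, and the exact procedure are assumptions; the slides do not specify how the reported RMSE values were obtained.

```python
import math
from bisect import bisect_right

def normalized_ecdf(event_times, makespan, grid_points=101):
    """Sample the ECDF of event_times on an even grid over normalized makespan."""
    ordered = sorted(t / makespan for t in event_times)
    n = len(ordered)
    grid = [i / (grid_points - 1) for i in range(grid_points)]
    return [bisect_right(ordered, x) / n for x in grid]

def rmse(curve_a, curve_b):
    """Root mean square error between two curves sampled on the same grid."""
    if len(curve_a) != len(curve_b):
        raise ValueError("curves must be sampled on the same grid")
    squared = [(a - b) ** 2 for a, b in zip(curve_a, curve_b)]
    return math.sqrt(sum(squared) / len(squared))

# Hypothetical submit times (seconds) for a real run and a larger synthetic run.
real_submits = [0, 10, 20, 30, 40]  # makespan: 50 s
synthetic_submits = [0, 100, 200, 300, 400, 500, 600, 700, 800, 900]  # 1000 s

real_curve = normalized_ecdf(real_submits, makespan=50)
synthetic_curve = normalized_ecdf(synthetic_submits, makespan=1000)
error = rmse(real_curve, synthetic_curve)
```

Normalizing by makespan is what lets a 1K-task and a 100K-task workflow be compared on the same axis even though their absolute durations differ widely.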
Takeaways
Thank you!
Questions?
This work is funded by NSF contracts #1923539 and #1923621, and DOE contract number #DE-SC0012636; and partly funded by NSF contracts #1664162, #2016610, and #2016619. We also thank the NSF Chameleon Cloud for providing time grants to access their resources.

More Related Content

Similar to WorkflowHub: Community Framework for Enabling Scientific Workflow Research and Development

Data cleaning with the Kurator toolkit: Bridging the gap between conventional...
Data cleaning with the Kurator toolkit: Bridging the gap between conventional...Data cleaning with the Kurator toolkit: Bridging the gap between conventional...
Data cleaning with the Kurator toolkit: Bridging the gap between conventional...Timothy McPhillips
 
Metadata and Provenance for ML Pipelines with Hopsworks
Metadata and Provenance for ML Pipelines with Hopsworks Metadata and Provenance for ML Pipelines with Hopsworks
Metadata and Provenance for ML Pipelines with Hopsworks Jim Dowling
 
ExtremeEarth: Hopsworks, a data-intensive AI platform for Deep Learning with ...
ExtremeEarth: Hopsworks, a data-intensive AI platform for Deep Learning with ...ExtremeEarth: Hopsworks, a data-intensive AI platform for Deep Learning with ...
ExtremeEarth: Hopsworks, a data-intensive AI platform for Deep Learning with ...Big Data Value Association
 
XSEDE14 SciGaP-Apache Airavata Tutorial
XSEDE14 SciGaP-Apache Airavata TutorialXSEDE14 SciGaP-Apache Airavata Tutorial
XSEDE14 SciGaP-Apache Airavata Tutorialmarpierc
 
Swift Parallel Scripting for High-Performance Workflow
Swift Parallel Scripting for High-Performance WorkflowSwift Parallel Scripting for High-Performance Workflow
Swift Parallel Scripting for High-Performance WorkflowDaniel S. Katz
 
Sysml 2019 demo_paper
Sysml 2019 demo_paperSysml 2019 demo_paper
Sysml 2019 demo_paperstrange_loop
 
An Overview of VIEW
An Overview of VIEWAn Overview of VIEW
An Overview of VIEWShiyong Lu
 
Apache Eagle at Hadoop Summit 2016 San Jose
Apache Eagle at Hadoop Summit 2016 San JoseApache Eagle at Hadoop Summit 2016 San Jose
Apache Eagle at Hadoop Summit 2016 San JoseHao Chen
 
Testistanbul 2016 - Keynote: "Performance Testing of Big Data" by Roland Leusden
Testistanbul 2016 - Keynote: "Performance Testing of Big Data" by Roland LeusdenTestistanbul 2016 - Keynote: "Performance Testing of Big Data" by Roland Leusden
Testistanbul 2016 - Keynote: "Performance Testing of Big Data" by Roland LeusdenTurkish Testing Board
 
Simulagora (Euroscipy2014 - Logilab)
Simulagora (Euroscipy2014 - Logilab)Simulagora (Euroscipy2014 - Logilab)
Simulagora (Euroscipy2014 - Logilab)Logilab
 
Machine Learning At Speed: Operationalizing ML For Real-Time Data Streams
Machine Learning At Speed: Operationalizing ML For Real-Time Data StreamsMachine Learning At Speed: Operationalizing ML For Real-Time Data Streams
Machine Learning At Speed: Operationalizing ML For Real-Time Data StreamsLightbend
 
2014 Taverna tutorial introduction to Taverna workflows
2014 Taverna tutorial introduction to Taverna workflows2014 Taverna tutorial introduction to Taverna workflows
2014 Taverna tutorial introduction to Taverna workflowsmyGrid team
 
Advances in Scientific Workflow Environments
Advances in Scientific Workflow EnvironmentsAdvances in Scientific Workflow Environments
Advances in Scientific Workflow EnvironmentsCarole Goble
 
Ultra Fast Deep Learning in Hybrid Cloud Using Intel Analytics Zoo & Alluxio
Ultra Fast Deep Learning in Hybrid Cloud Using Intel Analytics Zoo & AlluxioUltra Fast Deep Learning in Hybrid Cloud Using Intel Analytics Zoo & Alluxio
Ultra Fast Deep Learning in Hybrid Cloud Using Intel Analytics Zoo & AlluxioAlluxio, Inc.
 
Automating Real-time Seismic Analysis Through Streaming and High Throughput W...
Automating Real-time Seismic Analysis Through Streaming and High Throughput W...Automating Real-time Seismic Analysis Through Streaming and High Throughput W...
Automating Real-time Seismic Analysis Through Streaming and High Throughput W...Rafael Ferreira da Silva
 
Design_Support_Cloud_Application_Redistribution
Design_Support_Cloud_Application_RedistributionDesign_Support_Cloud_Application_Redistribution
Design_Support_Cloud_Application_RedistributionSantiago Gómez Sáez
 
2016 Federal User Group Conference - DevOps Product Strategy
2016 Federal User Group Conference - DevOps Product Strategy2016 Federal User Group Conference - DevOps Product Strategy
2016 Federal User Group Conference - DevOps Product StrategyCollabNet
 

Similar to WorkflowHub: Community Framework for Enabling Scientific Workflow Research and Development (20)

Data cleaning with the Kurator toolkit: Bridging the gap between conventional...
Data cleaning with the Kurator toolkit: Bridging the gap between conventional...Data cleaning with the Kurator toolkit: Bridging the gap between conventional...
Data cleaning with the Kurator toolkit: Bridging the gap between conventional...
 
Metadata and Provenance for ML Pipelines with Hopsworks
Metadata and Provenance for ML Pipelines with Hopsworks Metadata and Provenance for ML Pipelines with Hopsworks
Metadata and Provenance for ML Pipelines with Hopsworks
 
ExtremeEarth: Hopsworks, a data-intensive AI platform for Deep Learning with ...
ExtremeEarth: Hopsworks, a data-intensive AI platform for Deep Learning with ...ExtremeEarth: Hopsworks, a data-intensive AI platform for Deep Learning with ...
ExtremeEarth: Hopsworks, a data-intensive AI platform for Deep Learning with ...
 
XSEDE14 SciGaP-Apache Airavata Tutorial
XSEDE14 SciGaP-Apache Airavata TutorialXSEDE14 SciGaP-Apache Airavata Tutorial
XSEDE14 SciGaP-Apache Airavata Tutorial
 
Swift Parallel Scripting for High-Performance Workflow
Swift Parallel Scripting for High-Performance WorkflowSwift Parallel Scripting for High-Performance Workflow
Swift Parallel Scripting for High-Performance Workflow
 
Sysml 2019 demo_paper
Sysml 2019 demo_paperSysml 2019 demo_paper
Sysml 2019 demo_paper
 
An Overview of VIEW
An Overview of VIEWAn Overview of VIEW
An Overview of VIEW
 
Apache Eagle: Secure Hadoop in Real Time
Apache Eagle: Secure Hadoop in Real TimeApache Eagle: Secure Hadoop in Real Time
Apache Eagle: Secure Hadoop in Real Time
 
Apache Eagle at Hadoop Summit 2016 San Jose
Apache Eagle at Hadoop Summit 2016 San JoseApache Eagle at Hadoop Summit 2016 San Jose
Apache Eagle at Hadoop Summit 2016 San Jose
 
Testistanbul 2016 - Keynote: "Performance Testing of Big Data" by Roland Leusden
Testistanbul 2016 - Keynote: "Performance Testing of Big Data" by Roland LeusdenTestistanbul 2016 - Keynote: "Performance Testing of Big Data" by Roland Leusden
Testistanbul 2016 - Keynote: "Performance Testing of Big Data" by Roland Leusden
 
Simulagora (Euroscipy2014 - Logilab)
Simulagora (Euroscipy2014 - Logilab)Simulagora (Euroscipy2014 - Logilab)
Simulagora (Euroscipy2014 - Logilab)
 
Machine Learning At Speed: Operationalizing ML For Real-Time Data Streams
Machine Learning At Speed: Operationalizing ML For Real-Time Data StreamsMachine Learning At Speed: Operationalizing ML For Real-Time Data Streams
Machine Learning At Speed: Operationalizing ML For Real-Time Data Streams
 
Ikc 2015
Ikc 2015Ikc 2015
Ikc 2015
 
p850-ries
p850-riesp850-ries
p850-ries
 
2014 Taverna tutorial introduction to Taverna workflows
2014 Taverna tutorial introduction to Taverna workflows2014 Taverna tutorial introduction to Taverna workflows
2014 Taverna tutorial introduction to Taverna workflows
 
Advances in Scientific Workflow Environments
Advances in Scientific Workflow EnvironmentsAdvances in Scientific Workflow Environments
Advances in Scientific Workflow Environments
 
Ultra Fast Deep Learning in Hybrid Cloud Using Intel Analytics Zoo & Alluxio
Ultra Fast Deep Learning in Hybrid Cloud Using Intel Analytics Zoo & AlluxioUltra Fast Deep Learning in Hybrid Cloud Using Intel Analytics Zoo & Alluxio
Ultra Fast Deep Learning in Hybrid Cloud Using Intel Analytics Zoo & Alluxio
 
Automating Real-time Seismic Analysis Through Streaming and High Throughput W...
Automating Real-time Seismic Analysis Through Streaming and High Throughput W...Automating Real-time Seismic Analysis Through Streaming and High Throughput W...
Automating Real-time Seismic Analysis Through Streaming and High Throughput W...
 
Design_Support_Cloud_Application_Redistribution
Design_Support_Cloud_Application_RedistributionDesign_Support_Cloud_Application_Redistribution
Design_Support_Cloud_Application_Redistribution
 
2016 Federal User Group Conference - DevOps Product Strategy
2016 Federal User Group Conference - DevOps Product Strategy2016 Federal User Group Conference - DevOps Product Strategy
2016 Federal User Group Conference - DevOps Product Strategy
 

More from Rafael Ferreira da Silva

Accurately Simulating Energy Consumption of I/O-intensive Scientific Workflows
Accurately Simulating Energy Consumption of I/O-intensive Scientific WorkflowsAccurately Simulating Energy Consumption of I/O-intensive Scientific Workflows
Accurately Simulating Energy Consumption of I/O-intensive Scientific WorkflowsRafael Ferreira da Silva
 
Running Accurate, Scalable, and Reproducible Simulations of Distributed Syste...
Running Accurate, Scalable, and Reproducible Simulations of Distributed Syste...Running Accurate, Scalable, and Reproducible Simulations of Distributed Syste...
Running Accurate, Scalable, and Reproducible Simulations of Distributed Syste...Rafael Ferreira da Silva
 
The Interplay of Workflow Execution and Resource Provisioning
The Interplay of Workflow Execution and Resource ProvisioningThe Interplay of Workflow Execution and Resource Provisioning
The Interplay of Workflow Execution and Resource ProvisioningRafael Ferreira da Silva
 
On the Use of Burst Buffers for Accelerating Data-Intensive Scientific Workflows
On the Use of Burst Buffers for Accelerating Data-Intensive Scientific WorkflowsOn the Use of Burst Buffers for Accelerating Data-Intensive Scientific Workflows
On the Use of Burst Buffers for Accelerating Data-Intensive Scientific WorkflowsRafael Ferreira da Silva
 
Using Simple PID Controllers to Prevent and Mitigate Faults in Scientific Wor...
Using Simple PID Controllers to Prevent and Mitigate Faults in Scientific Wor...Using Simple PID Controllers to Prevent and Mitigate Faults in Scientific Wor...
Using Simple PID Controllers to Prevent and Mitigate Faults in Scientific Wor...Rafael Ferreira da Silva
 
Automating Environmental Computing Applications with Scientific Workflows
Automating Environmental Computing Applications with Scientific WorkflowsAutomating Environmental Computing Applications with Scientific Workflows
Automating Environmental Computing Applications with Scientific WorkflowsRafael Ferreira da Silva
 
Analysis of User Submission Behavior on HPC and HTC
Analysis of User Submission Behavior on HPC and HTCAnalysis of User Submission Behavior on HPC and HTC
Analysis of User Submission Behavior on HPC and HTCRafael Ferreira da Silva
 
Performance Analysis of an I/O-Intensive Workflow executing on Google Cloud a...
Performance Analysis of an I/O-Intensive Workflow executing on Google Cloud a...Performance Analysis of an I/O-Intensive Workflow executing on Google Cloud a...
Performance Analysis of an I/O-Intensive Workflow executing on Google Cloud a...Rafael Ferreira da Silva
 
Pegasus - automate, recover, and debug scientific computations
Pegasus - automate, recover, and debug scientific computationsPegasus - automate, recover, and debug scientific computations
Pegasus - automate, recover, and debug scientific computationsRafael Ferreira da Silva
 
Task Resource Consumption Prediction for Scientific Applications and Workflows
Task Resource Consumption Prediction for Scientific Applications and WorkflowsTask Resource Consumption Prediction for Scientific Applications and Workflows
Task Resource Consumption Prediction for Scientific Applications and WorkflowsRafael Ferreira da Silva
 
Characterizing a High Throughput Computing Workload: The Compact Muon Solenoi...
Characterizing a High Throughput Computing Workload: The Compact Muon Solenoi...Characterizing a High Throughput Computing Workload: The Compact Muon Solenoi...
Characterizing a High Throughput Computing Workload: The Compact Muon Solenoi...Rafael Ferreira da Silva
 
Experiments with Complex Scientific Applications on Hybrid Cloud Infrastructures
Experiments with Complex Scientific Applications on Hybrid Cloud InfrastructuresExperiments with Complex Scientific Applications on Hybrid Cloud Infrastructures
Experiments with Complex Scientific Applications on Hybrid Cloud InfrastructuresRafael Ferreira da Silva
 
A Unified Approach for Modeling and Optimization of Energy, Makespan and Reli...
A Unified Approach for Modeling and Optimization of Energy, Makespan and Reli...A Unified Approach for Modeling and Optimization of Energy, Makespan and Reli...
A Unified Approach for Modeling and Optimization of Energy, Makespan and Reli...Rafael Ferreira da Silva
 
Leveraging Semantics to Improve Reproducibility in Scientific Workflows
Leveraging Semantics to Improve Reproducibility in Scientific WorkflowsLeveraging Semantics to Improve Reproducibility in Scientific Workflows
Leveraging Semantics to Improve Reproducibility in Scientific WorkflowsRafael Ferreira da Silva
 
A science-gateway for workflow executions: online and non-clairvoyant self-h...
A science-gateway for workflow executions: online and non-clairvoyant self-h...A science-gateway for workflow executions: online and non-clairvoyant self-h...
A science-gateway for workflow executions: online and non-clairvoyant self-h...Rafael Ferreira da Silva
 
Toward Fine-Grained Online Task Characteristics Estimation in Scientific Work...
Toward Fine-Grained Online Task Characteristics Estimation in Scientific Work...Toward Fine-Grained Online Task Characteristics Estimation in Scientific Work...
Toward Fine-Grained Online Task Characteristics Estimation in Scientific Work...Rafael Ferreira da Silva
 
On-line, non-clairvoyant optimization of workflow activity granularity task o...
On-line, non-clairvoyant optimization of workflow activity granularity task o...On-line, non-clairvoyant optimization of workflow activity granularity task o...
On-line, non-clairvoyant optimization of workflow activity granularity task o...Rafael Ferreira da Silva
 
Workflow fairness control on online and non-clairvoyant distributed computing...
Workflow fairness control on online and non-clairvoyant distributed computing...Workflow fairness control on online and non-clairvoyant distributed computing...
Workflow fairness control on online and non-clairvoyant distributed computing...Rafael Ferreira da Silva
 
VIP: design and implementation of the portal and execution service
VIP: design and implementation of the portal and execution serviceVIP: design and implementation of the portal and execution service
VIP: design and implementation of the portal and execution serviceRafael Ferreira da Silva
 
A science-gateway workload archive application to the self-healing of workflo...
A science-gateway workload archive application to the self-healing of workflo...A science-gateway workload archive application to the self-healing of workflo...
A science-gateway workload archive application to the self-healing of workflo...Rafael Ferreira da Silva
 

More from Rafael Ferreira da Silva (20)

Accurately Simulating Energy Consumption of I/O-intensive Scientific Workflows
Accurately Simulating Energy Consumption of I/O-intensive Scientific WorkflowsAccurately Simulating Energy Consumption of I/O-intensive Scientific Workflows
Accurately Simulating Energy Consumption of I/O-intensive Scientific Workflows
 
Running Accurate, Scalable, and Reproducible Simulations of Distributed Syste...
Running Accurate, Scalable, and Reproducible Simulations of Distributed Syste...Running Accurate, Scalable, and Reproducible Simulations of Distributed Syste...
Running Accurate, Scalable, and Reproducible Simulations of Distributed Syste...
 
The Interplay of Workflow Execution and Resource Provisioning
The Interplay of Workflow Execution and Resource ProvisioningThe Interplay of Workflow Execution and Resource Provisioning
The Interplay of Workflow Execution and Resource Provisioning
 
On the Use of Burst Buffers for Accelerating Data-Intensive Scientific Workflows
On the Use of Burst Buffers for Accelerating Data-Intensive Scientific WorkflowsOn the Use of Burst Buffers for Accelerating Data-Intensive Scientific Workflows
On the Use of Burst Buffers for Accelerating Data-Intensive Scientific Workflows
 
Using Simple PID Controllers to Prevent and Mitigate Faults in Scientific Wor...
Using Simple PID Controllers to Prevent and Mitigate Faults in Scientific Wor...Using Simple PID Controllers to Prevent and Mitigate Faults in Scientific Wor...
Using Simple PID Controllers to Prevent and Mitigate Faults in Scientific Wor...
 
Automating Environmental Computing Applications with Scientific Workflows
Automating Environmental Computing Applications with Scientific WorkflowsAutomating Environmental Computing Applications with Scientific Workflows
Automating Environmental Computing Applications with Scientific Workflows
 
Analysis of User Submission Behavior on HPC and HTC
Analysis of User Submission Behavior on HPC and HTCAnalysis of User Submission Behavior on HPC and HTC
Analysis of User Submission Behavior on HPC and HTC
 
Performance Analysis of an I/O-Intensive Workflow executing on Google Cloud a...
Performance Analysis of an I/O-Intensive Workflow executing on Google Cloud a...Performance Analysis of an I/O-Intensive Workflow executing on Google Cloud a...
Performance Analysis of an I/O-Intensive Workflow executing on Google Cloud a...
 
Pegasus - automate, recover, and debug scientific computations
Pegasus - automate, recover, and debug scientific computationsPegasus - automate, recover, and debug scientific computations
Pegasus - automate, recover, and debug scientific computations
 
Task Resource Consumption Prediction for Scientific Applications and Workflows
Task Resource Consumption Prediction for Scientific Applications and WorkflowsTask Resource Consumption Prediction for Scientific Applications and Workflows
Task Resource Consumption Prediction for Scientific Applications and Workflows
 
Characterizing a High Throughput Computing Workload: The Compact Muon Solenoi...
Characterizing a High Throughput Computing Workload: The Compact Muon Solenoi...Characterizing a High Throughput Computing Workload: The Compact Muon Solenoi...
Characterizing a High Throughput Computing Workload: The Compact Muon Solenoi...
 
Experiments with Complex Scientific Applications on Hybrid Cloud Infrastructures
Experiments with Complex Scientific Applications on Hybrid Cloud InfrastructuresExperiments with Complex Scientific Applications on Hybrid Cloud Infrastructures
Experiments with Complex Scientific Applications on Hybrid Cloud Infrastructures
 
A Unified Approach for Modeling and Optimization of Energy, Makespan and Reli...
A Unified Approach for Modeling and Optimization of Energy, Makespan and Reli...A Unified Approach for Modeling and Optimization of Energy, Makespan and Reli...
A Unified Approach for Modeling and Optimization of Energy, Makespan and Reli...
 
Leveraging Semantics to Improve Reproducibility in Scientific Workflows
Leveraging Semantics to Improve Reproducibility in Scientific WorkflowsLeveraging Semantics to Improve Reproducibility in Scientific Workflows
Leveraging Semantics to Improve Reproducibility in Scientific Workflows
 
A science-gateway for workflow executions: online and non-clairvoyant self-h...
A science-gateway for workflow executions: online and non-clairvoyant self-h...A science-gateway for workflow executions: online and non-clairvoyant self-h...
A science-gateway for workflow executions: online and non-clairvoyant self-h...
 
Toward Fine-Grained Online Task Characteristics Estimation in Scientific Work...
Toward Fine-Grained Online Task Characteristics Estimation in Scientific Work...Toward Fine-Grained Online Task Characteristics Estimation in Scientific Work...
Toward Fine-Grained Online Task Characteristics Estimation in Scientific Work...
 
On-line, non-clairvoyant optimization of workflow activity granularity task o...
On-line, non-clairvoyant optimization of workflow activity granularity task o...On-line, non-clairvoyant optimization of workflow activity granularity task o...
On-line, non-clairvoyant optimization of workflow activity granularity task o...
 
Workflow fairness control on online and non-clairvoyant distributed computing...
Workflow fairness control on online and non-clairvoyant distributed computing...Workflow fairness control on online and non-clairvoyant distributed computing...
Workflow fairness control on online and non-clairvoyant distributed computing...
 
VIP: design and implementation of the portal and execution service
VIP: design and implementation of the portal and execution serviceVIP: design and implementation of the portal and execution service
VIP: design and implementation of the portal and execution service
 
A science-gateway workload archive application to the self-healing of workflo...
A science-gateway workload archive application to the self-healing of workflo...A science-gateway workload archive application to the self-healing of workflo...
A science-gateway workload archive application to the self-healing of workflo...
 

Recently uploaded

Incoming and Outgoing Shipments in 1 STEP Using Odoo 17
Incoming and Outgoing Shipments in 1 STEP Using Odoo 17Incoming and Outgoing Shipments in 1 STEP Using Odoo 17
Incoming and Outgoing Shipments in 1 STEP Using Odoo 17Celine George
 
Interactive Powerpoint_How to Master effective communication
Interactive Powerpoint_How to Master effective communicationInteractive Powerpoint_How to Master effective communication
Interactive Powerpoint_How to Master effective communicationnomboosow
 
Software Engineering Methodologies (overview)
Software Engineering Methodologies (overview)Software Engineering Methodologies (overview)
Software Engineering Methodologies (overview)eniolaolutunde
 
ECONOMIC CONTEXT - LONG FORM TV DRAMA - PPT
ECONOMIC CONTEXT - LONG FORM TV DRAMA - PPTECONOMIC CONTEXT - LONG FORM TV DRAMA - PPT
ECONOMIC CONTEXT - LONG FORM TV DRAMA - PPTiammrhaywood
 
Introduction to AI in Higher Education_draft.pptx
Introduction to AI in Higher Education_draft.pptxIntroduction to AI in Higher Education_draft.pptx
Introduction to AI in Higher Education_draft.pptxpboyjonauth
 
Enzyme, Pharmaceutical Aids, Miscellaneous Last Part of Chapter no 5th.pdf
Enzyme, Pharmaceutical Aids, Miscellaneous Last Part of Chapter no 5th.pdfEnzyme, Pharmaceutical Aids, Miscellaneous Last Part of Chapter no 5th.pdf
Enzyme, Pharmaceutical Aids, Miscellaneous Last Part of Chapter no 5th.pdfSumit Tiwari
 
SOCIAL AND HISTORICAL CONTEXT - LFTVD.pptx
SOCIAL AND HISTORICAL CONTEXT - LFTVD.pptxSOCIAL AND HISTORICAL CONTEXT - LFTVD.pptx
SOCIAL AND HISTORICAL CONTEXT - LFTVD.pptxiammrhaywood
 
_Math 4-Q4 Week 5.pptx Steps in Collecting Data
_Math 4-Q4 Week 5.pptx Steps in Collecting Data_Math 4-Q4 Week 5.pptx Steps in Collecting Data
_Math 4-Q4 Week 5.pptx Steps in Collecting DataJhengPantaleon
 
Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...
Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...
Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...Krashi Coaching
 
A Critique of the Proposed National Education Policy Reform
A Critique of the Proposed National Education Policy ReformA Critique of the Proposed National Education Policy Reform
A Critique of the Proposed National Education Policy ReformChameera Dedduwage
 
Sanyam Choudhary Chemistry practical.pdf
Sanyam Choudhary Chemistry practical.pdfSanyam Choudhary Chemistry practical.pdf
Sanyam Choudhary Chemistry practical.pdfsanyamsingh5019
 
Paris 2024 Olympic Geographies - an activity
Paris 2024 Olympic Geographies - an activityParis 2024 Olympic Geographies - an activity
Paris 2024 Olympic Geographies - an activityGeoBlogs
 
The basics of sentences session 2pptx copy.pptx
The basics of sentences session 2pptx copy.pptxThe basics of sentences session 2pptx copy.pptx
The basics of sentences session 2pptx copy.pptxheathfieldcps1
 
How to Make a Pirate ship Primary Education.pptx
How to Make a Pirate ship Primary Education.pptxHow to Make a Pirate ship Primary Education.pptx
How to Make a Pirate ship Primary Education.pptxmanuelaromero2013
 
“Oh GOSH! Reflecting on Hackteria's Collaborative Practices in a Global Do-It...
“Oh GOSH! Reflecting on Hackteria's Collaborative Practices in a Global Do-It...“Oh GOSH! Reflecting on Hackteria's Collaborative Practices in a Global Do-It...
“Oh GOSH! Reflecting on Hackteria's Collaborative Practices in a Global Do-It...Marc Dusseiller Dusjagr
 
The Most Excellent Way | 1 Corinthians 13
The Most Excellent Way | 1 Corinthians 13The Most Excellent Way | 1 Corinthians 13
The Most Excellent Way | 1 Corinthians 13Steve Thomason
 
How to Configure Email Server in Odoo 17
How to Configure Email Server in Odoo 17How to Configure Email Server in Odoo 17
How to Configure Email Server in Odoo 17Celine George
 
Concept of Vouching. B.Com(Hons) /B.Compdf
Concept of Vouching. B.Com(Hons) /B.CompdfConcept of Vouching. B.Com(Hons) /B.Compdf
Concept of Vouching. B.Com(Hons) /B.CompdfUmakantAnnand
 
Solving Puzzles Benefits Everyone (English).pptx
Solving Puzzles Benefits Everyone (English).pptxSolving Puzzles Benefits Everyone (English).pptx
Solving Puzzles Benefits Everyone (English).pptxOH TEIK BIN
 


WorkflowHub: Community Framework for Enabling Scientific Workflow Research and Development

  • 1. Community Framework for Enabling Scientific Workflow Research and Development (https://workflowhub.org). Rafael Ferreira da Silva¹, Loic Pottier¹, Tainã Coleman¹, Ewa Deelman¹, Henri Casanova². ¹University of Southern California; ²University of Hawai’i at Mānoa
  • 2. Motivation. As workflows continue to be adopted by scientific projects and user communities, they are becoming more complex and require more sophisticated workflow management capabilities. Workflows are being designed that can analyze terabyte-scale datasets, be composed of millions of individual tasks that execute for milliseconds up to several hours, process data streams, and process static data in object stores. Catering to these workflow features and demands requires workflow management system (WMS) research and development at several levels, from algorithms and systems all the way to user interfaces. (Image credits: Thomas McCauley © 2018 CERN; © 2019 NASA)
  • 3. State-of-the-Art. A traditional approach for testing, evaluating, and evolving WMS is to use full-fledged software stacks to execute applications on distributed platforms and testbeds. An alternative is to use simulation, i.e., implement and use a software artifact that models the functional and performance behaviors of software and hardware stacks of interest.
  • 4. Concept. WorkflowHub is a community framework that provides a collection of tools for analyzing workflow execution traces, producing realistic synthetic workflow traces, and simulating workflow executions.
  • 5. Common Format. Open source common JSON format for representing collected workflow traces and generated synthetic workflow traces. Users are encouraged to contribute additional workflow traces for any scientific domain, as long as they conform to WorkflowHub’s common format.
  • 6. Traces. Collection of open access workflow traces from production workflow systems (https://pegasus.isi.edu). This collection of workflow traces forms a representative set of small- and large-scale workflow configurations:
      • They consume/produce large volumes of data processed by thousands of compute tasks
      • Their structures are sufficiently complex and heterogeneous to encompass current and emerging large-scale workflow execution models
  • 7. Python Package. Open source Python package to analyze traces and generate representative synthetic traces in that same format. Analyses can be performed to produce statistical summaries of workflow performance characteristics.
  • 8. Trace Analysis. WorkflowHub’s Python package attempts to fit data with 23 probability distributions provided as part of SciPy’s statistics submodule. [Figure: example of probability distribution fitting of runtime (in seconds) for workflow tasks.] [Figure: example of an analysis summary showing the best-fit probability distribution for the runtime of individual tasks (1000Genome workflow).]
  • 9. Trace Generator. The WorkflowHub package provides a number of workflow recipes for generating realistic synthetic workflow traces. Currently available workflow recipes for high-throughput applications:
      • 1000Genome: A data-intensive bioinformatics workflow
      • Cycles: A compute-intensive scientific workflow for agroecosystems modeling
      • Epigenomics: A data-intensive bioinformatics workflow
      • Montage: A compute-intensive astronomy workflow
      • Seismology: A data-intensive seismology workflow
      • SoyKB: A data-intensive bioinformatics workflow
  • 10. Simulator. We do not develop simulators as part of the WorkflowHub project. Instead, we catalog open source workflow systems simulators. WRENCH Simulation Framework (https://wrench-project.org). Objective: make it easy to develop simulators of complex cyberinfrastructure application executions:
      • Provides high-level, reusable simulation abstractions
      • Produces accurate and scalable simulations
  • 11. Case Study: Evaluating Synthetic Traces with a Simulator of a Production WMS
  • 12. Scenarios. Our previous work has enabled 30 research articles, but its synthetic traces used only 2 probability distributions to fit runtime and I/O operations. We use the WRENCH-Pegasus simulator to evaluate the accuracy and scalability of WorkflowHub traces.
  • 13. Accuracy. [Figure: panels A, F(SubmittedTasks), and B, F(CompletedTasks), plotted against Workflow Makespan (s); series: real, synthetic, previous; instances ilmn-125, ilmn-263, ilmn-405, ilmn-559, ilmn-713, ilmn-803 and 2mass-0.5, 2mass-1.0, 2mass-1.5, 2mass-2.0.] Empirical cumulative distribution function of task submit times (top) and task completion times (bottom) for sample real-world (“real”) and synthetic (“synthetic” and “previous”) workflow trace executions using the WRENCH-Pegasus simulator.
  • 14. Scaling. [Figure: F(SubmittedTasks) plotted against Normalized Workflow Makespan, one panel each for Seismology, Soy-KB, Epigenomics, Montage, 1000Genome, and Cycles; series: real, synthetic-1K, synthetic-5K, synthetic-10K, synthetic-25K, synthetic-50K, synthetic-100K.] Empirical cumulative distribution function of task submit times for sample real-world (“real”) and synthetic (“synthetic”) workflow trace executions using the WRENCH-Pegasus simulator. [Table: root mean square errors (RMSEs) for large-scale synthetic workflows; RMSE values are computed from normalized workflow makespan.]
  • 16. Thank you! Questions? This work is funded by NSF contracts #1923539 and #1923621, and DOE contract number #DE-SC0012636; and partly funded by NSF contracts #1664162, #2016610, and #2016619. We also thank the NSF Chameleon Cloud for providing time grants to access their resources.
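
The Trace Analysis slide says the package fits task data against probability distributions from SciPy's statistics submodule and reports the best fit. The sketch below illustrates that idea only and is not WorkflowHub's code: the function name `best_fit`, the five candidate distributions, the synthetic runtimes, and the use of the Kolmogorov-Smirnov statistic as the selection criterion are all assumptions made for this example.

```python
import numpy as np
from scipy import stats

# Hypothetical candidate list; the slides mention 23 distributions but do not name them.
CANDIDATES = ("norm", "expon", "gamma", "lognorm", "weibull_min")

def best_fit(samples, candidates=CANDIDATES):
    """Return (name, params, ks_statistic) for the candidate with the smallest KS statistic."""
    samples = np.asarray(samples, dtype=float)
    best = None
    for name in candidates:
        dist = getattr(stats, name)
        try:
            params = dist.fit(samples)  # maximum-likelihood estimate of the parameters
        except Exception:
            continue  # skip candidates whose fit does not converge
        # Distance between the empirical CDF and the fitted CDF (assumed criterion).
        ks_stat = stats.kstest(samples, name, args=params).statistic
        if best is None or ks_stat < best[2]:
            best = (name, params, ks_stat)
    return best

# Made-up task runtimes in seconds, drawn from a gamma distribution for illustration.
rng = np.random.default_rng(0)
runtimes = rng.gamma(shape=2.0, scale=30.0, size=500)
name, params, ks_stat = best_fit(runtimes)
print(f"best fit: {name}, KS statistic = {ks_stat:.3f}")
```

A smaller KS statistic means the fitted distribution tracks the observed runtimes more closely; other criteria, such as a sum of squared errors over a histogram, would slot into the same loop.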
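
The Accuracy and Scaling slides compare empirical cumulative distribution functions (ECDFs) of task submit times and report RMSE values computed from normalized workflow makespan. The slides do not spell out the calculation, so the following is a minimal sketch under stated assumptions: times are divided by each execution's makespan, both ECDFs are sampled on a shared evenly spaced grid, and the RMSE is taken over that grid. The function names and the sample numbers are hypothetical.

```python
import numpy as np

def ecdf_on_grid(times, makespan, grid):
    """Fraction of tasks whose makespan-normalized event time is <= each grid point."""
    normalized = np.sort(np.asarray(times, dtype=float) / makespan)
    return np.searchsorted(normalized, grid, side="right") / normalized.size

def ecdf_rmse(real_times, real_makespan, synth_times, synth_makespan, points=101):
    """RMSE between two ECDFs sampled on a shared grid over [0, 1] (assumed method)."""
    grid = np.linspace(0.0, 1.0, points)
    real = ecdf_on_grid(real_times, real_makespan, grid)
    synth = ecdf_on_grid(synth_times, synth_makespan, grid)
    return float(np.sqrt(np.mean((real - synth) ** 2)))

# Made-up submit times: the synthetic run is ten times longer but has the same shape.
same_shape = ecdf_rmse([0, 10, 20, 40], 100.0, [0, 100, 200, 400], 1000.0)
# Made-up extreme case: everything submitted at the start versus at the very end.
opposite = ecdf_rmse([0, 0], 10.0, [10, 10], 10.0)
print(same_shape, opposite)
```

Normalizing by makespan is what lets a 1K-task trace be compared with a 100K-task trace: only the shape of the submission curve contributes to the error, not the absolute duration.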