Scientific workflows are a cornerstone of modern scientific computing. They are used to describe complex computational applications that require efficient and robust management of large volumes of data, which are typically stored/processed on heterogeneous, distributed resources. The workflow research and development community has employed a number of methods for the quantitative evaluation of existing and novel workflow algorithms and systems. In particular, a common approach is to simulate workflow executions. In previous work, we have presented a collection of tools that have been used for aiding research and development activities in the Pegasus project, and that have been adopted by others for conducting workflow research. Despite their popularity, there are several shortcomings that prevent easy adoption, maintenance, and consistency with the evolving structures and computational requirements of production workflows. In this work, we present WorkflowHub, a community framework that provides a collection of tools for analyzing workflow execution traces, producing realistic synthetic workflow traces, and simulating workflow executions. We demonstrate the realism of the generated synthetic traces by comparing simulated executions of these traces with actual workflow executions. We also contrast these results with those obtained when using the previously available collection of tools. We find that our framework not only can be used to generate representative synthetic workflow traces (i.e., with workflow structures and task characteristics distributions that resemble those in traces obtained from real-world workflow executions), but can also generate representative workflow traces at larger scales than that of available workflow traces.
WorkflowHub: Community Framework for Enabling Scientific Workflow Research and Development
1. https://workflowhub.org
Community Framework for Enabling
Scientific Workflow Research and Development
Rafael Ferreira da Silva1
Loic Pottier1
Tainã Coleman1
Ewa Deelman1
Henri Casanova2
1University of Southern California
2University of Hawai‘i at Mānoa
3. https://workflowhub.org
State-of-the-Art
A traditional approach for testing, evaluating, and evolving workflow management systems (WMS) is to use full-fledged software stacks to execute applications on distributed platforms and testbeds.
An alternative is to use simulation, i.e., to implement and use a software artifact that models the functional and performance behaviors of the software and hardware stacks of interest.
4. https://workflowhub.org 4
WorkflowHub is a community framework that provides a
collection of tools for analyzing workflow execution traces,
producing realistic synthetic workflow traces, and
simulating workflow executions
Concept
5. https://workflowhub.org 5
Common Format
Open source common JSON format for representing collected workflow traces and generated synthetic workflow traces
Users are encouraged to contribute additional workflow traces for any scientific domain, as long as they conform to WorkflowHub’s common format
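To make the idea of a common JSON trace format concrete, here is a minimal sketch of loading a trace and computing a few summary numbers. The field names used below ("name", "tasks", "runtime", "parents") are illustrative assumptions and are not taken from the actual WorkflowHub schema; the trace itself is made up.

```python
import json

# Hypothetical trace: the field names are illustrative assumptions,
# not the actual WorkflowHub schema, and the values are made up.
trace_text = """
{
  "name": "example-workflow",
  "tasks": [
    {"name": "split",   "runtime": 12.5, "parents": []},
    {"name": "align_1", "runtime": 40.0, "parents": ["split"]},
    {"name": "align_2", "runtime": 38.2, "parents": ["split"]},
    {"name": "merge",   "runtime": 7.3,  "parents": ["align_1", "align_2"]}
  ]
}
"""


def summarize(trace):
    """Count tasks and dependency edges, and sum the task runtimes."""
    tasks = trace["tasks"]
    return {
        "num_tasks": len(tasks),
        "num_edges": sum(len(task["parents"]) for task in tasks),
        "total_runtime": sum(task["runtime"] for task in tasks),
    }


trace = json.loads(trace_text)
summary = summarize(trace)
print(summary)
```

Because every trace shares one structure, the same summary function would apply to collected and synthetic traces alike, which is the practical benefit of a common format.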
6. https://workflowhub.org 6
Collection of open access
workflow traces from
production workflow systems
This collection of workflow traces forms a representative set of
small- and large-scale workflow configurations:
• They consume/produce large volumes of data processed by
thousands of compute tasks
• Their structures are sufficiently complex and heterogeneous to
encompass current and emerging large-scale workflow
execution models
Traces
https://pegasus.isi.edu
7. https://workflowhub.org 7
Open source Python package to analyze traces and generate representative synthetic traces in that same format
Analyses can be performed to produce statistical summaries of workflow performance characteristics
Python Package
8. https://workflowhub.org 8
Example of probability distribution fitting
of runtime (in seconds) for workflow tasks
WorkflowHub’s Python package
attempts to fit data with 23 probability
distributions provided as part of
SciPy’s statistics submodule
Trace Analysis
Example of an analysis summary showing the best fit probability distribution for
runtime of the individual tasks (1000Genome workflow)
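The distribution fitting described above can be sketched with SciPy directly. This is a minimal sketch, not the package's actual implementation: it tries only four candidate distributions rather than 23, the runtimes are made-up sample data, and ranking the fits by the Kolmogorov-Smirnov statistic is an assumption (the slides do not say which goodness-of-fit criterion is used). The function name best_fit is hypothetical.

```python
import numpy as np
from scipy import stats


def best_fit(samples, candidates=("norm", "expon", "uniform", "gamma")):
    """Fit each candidate distribution and return (name, params, ks_statistic)
    for the one with the smallest Kolmogorov-Smirnov statistic.

    Ranking by the KS statistic is an assumption made for this sketch.
    """
    best = None
    for name in candidates:
        dist = getattr(stats, name)
        params = dist.fit(samples)  # maximum-likelihood parameter estimates
        ks_stat, _ = stats.kstest(samples, name, args=params)
        if best is None or ks_stat < best[2]:
            best = (name, params, ks_stat)
    return best


# Made-up task runtimes (seconds), for illustration only.
rng = np.random.default_rng(0)
runtimes = rng.uniform(10.0, 20.0, size=500)

name, params, ks = best_fit(runtimes)
print(f"best fit: {name}, KS statistic: {ks:.4f}")
```

A smaller KS statistic means the fitted cumulative distribution stays closer to the empirical one, so the winning name and its parameters can serve as a compact statistical summary of a task type's runtime.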
9. https://workflowhub.org 9
The WorkflowHub package provides a number of
workflow recipes for generating realistic synthetic
workflow traces
Trace Generator
Currently available workflow recipes for high-throughput applications:
• 1000Genome: A data-intensive bioinformatics workflow
• Cycles: A compute-intensive scientific workflow for agroecosystems modeling
• Epigenomics: A data-intensive bioinformatics workflow
• Montage: A compute-intensive astronomy workflow
• Seismology: A data-intensive seismology workflow
• SoyKB: A data-intensive bioinformatics workflow
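As a rough illustration of what generating a synthetic trace at a requested scale involves, the sketch below draws task runtimes from a fitted distribution. It does not use the WorkflowHub package: the function name, the task fields, and the choice of a normal distribution with the parameters shown are all hypothetical, and a real recipe would also reproduce the workflow's structure (task types and dependencies), which is omitted here.

```python
import random


def generate_synthetic_tasks(num_tasks, mean_runtime, stddev_runtime, seed=None):
    """Return num_tasks hypothetical task records with sampled runtimes.

    The normal distribution and its parameters stand in for whatever
    distribution was fitted to real traces; they are assumptions.
    """
    rng = random.Random(seed)
    tasks = []
    for index in range(num_tasks):
        # Clamp the sample so that no task gets a non-positive runtime.
        runtime = max(0.1, rng.gauss(mean_runtime, stddev_runtime))
        tasks.append({"name": f"task_{index:05d}", "runtime": runtime})
    return tasks


# Generate traces at two scales from the same (made-up) parameters.
small = generate_synthetic_tasks(1_000, mean_runtime=35.0, stddev_runtime=8.0, seed=1)
large = generate_synthetic_tasks(10_000, mean_runtime=35.0, stddev_runtime=8.0, seed=1)
print(len(small), len(large))
```

Because the runtimes are sampled rather than copied from a collected trace, the same parameters can produce traces larger than any trace that was actually observed.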
10. https://workflowhub.org 10
Simulator
We do not develop simulators as part of the WorkflowHub project. Instead, we catalog open source workflow system simulators
https://wrench-project.org
Objective: Make it easy to
develop simulators of complex
Cyberinfrastructure application
executions
• Provides high-level, reusable
simulation abstractions
• Produces accurate and
scalable simulations
WRENCH Simulation Framework
12. https://workflowhub.org 12
Scenarios
Our previous work has enabled about 30 research articles, but its synthetic traces used only 2 probability distributions to fit runtime and I/O operations
We use the WRENCH-Pegasus
simulator for evaluating the
accuracy and scalability of
WorkflowHub traces
13. https://workflowhub.org 13
Accuracy
[Figure: two sets of plots, each with panel A, F(SubmittedTasks) vs. Workflow Makespan (s), and panel B, F(CompletedTasks) vs. Workflow Makespan (s). Series: real, synthetic, previous. First set (makespan axis 0 to 2000 s): ilmn-125, ilmn-263, ilmn-405, ilmn-559, ilmn-713, ilmn-803. Second set (makespan axis 0 to 20000 s): 2mass-0.5, 2mass-1.0, 2mass-1.5, 2mass-2.0.]
Empirical cumulative distribution function of task submit times (top) and task completion times (bottom) for sample real-world
(“real”) and synthetic (“synthetic” and “previous”) workflow trace executions using the WRENCH-Pegasus simulator
14. https://workflowhub.org 14
Scaling
[Figure: F(SubmittedTasks) vs. Normalized Workflow Makespan, one panel per workflow: 1000Genome, Cycles, Epigenomics, Montage, Seismology, SoyKB. Series: real, synthetic-1K, synthetic-5K, synthetic-10K, synthetic-25K, synthetic-50K, synthetic-100K.]
Empirical cumulative distribution function of task submit times for sample real-world (“real”)
and synthetic (“synthetic”) workflow trace executions using the WRENCH-Pegasus simulator.
Root mean square errors (RMSEs) for large scale synthetic workflows.
(RMSE values are computed from normalized workflow makespan.)
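The comparison behind these plots can be sketched as follows: build the empirical cumulative distribution function (ECDF) of task submit times for a real and a synthetic execution, normalize time so that traces of different scales share the same axis, and compute the RMSE between the two curves. This is a minimal sketch under stated assumptions: the function names are hypothetical, the last event time stands in for the workflow makespan, and the ECDFs are compared on a uniform grid of 101 points, none of which is specified in the slides.

```python
import math
from bisect import bisect_right


def ecdf(times):
    """Return F, where F(t) is the fraction of events at or before time t."""
    ordered = sorted(times)
    count = len(ordered)
    return lambda t: bisect_right(ordered, t) / count


def normalize(times):
    """Scale times into [0, 1]; the last event time stands in for the makespan."""
    last = max(times)
    return [t / last for t in times]


def ecdf_rmse(real_times, synthetic_times, points=101):
    """RMSE between two ECDFs evaluated on a uniform grid over [0, 1]."""
    f_real = ecdf(normalize(real_times))
    f_synthetic = ecdf(normalize(synthetic_times))
    grid = [i / (points - 1) for i in range(points)]
    squared = [(f_real(x) - f_synthetic(x)) ** 2 for x in grid]
    return math.sqrt(sum(squared) / len(squared))


# Made-up submit times (seconds), for illustration only.
real = [1.0, 2.0, 3.0, 4.0]
stretched = [2.0 * t for t in real]    # same shape, twice the duration
bursty = [4.0, 4.0, 4.0, 4.0]          # every task submitted at the very end

print(ecdf_rmse(real, stretched))      # identical after normalization
print(ecdf_rmse(real, bursty))         # clearly different shape
```

Normalizing first is what allows a synthetic trace with 100K tasks to be compared against a much smaller real trace: only the shape of the submission curve matters, not its absolute duration.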
16. https://workflowhub.org
Thank you!
Questions?
This work is funded by NSF contracts #1923539 and #1923621, and DOE contract number #DE-SC0012636; and partly funded by NSF contracts #1664162, #2016610, and #2016619. We also thank the NSF Chameleon Cloud for providing time grants to access their resources.