Ilkay ALTINTAS - September, 2013
Ilkay ALTINTAS, Ph.D.
San Diego Supercomputer Center, UCSD
Roles and Challenges for Scientific Workflows and Provenance in the Age of Open Science, Cloud Computing and Web 2.0
Workflows are a Part of Cyberinfrastructure
[Diagram: BUILD, SHARE, RUN, LEARN - support for the end-to-end computational scientific process]
What motivated this?
Facilitating and Accelerating XXX-Info or Comp-XXX Research using Scientific Workflows
• Important attributes
– Assemble complex processing easily
– Access diverse resources transparently
– Incorporate multiple software tools
– Assure reproducibility
– Build around a community development model
In addition, workflows today are…
• Encapsulations of scientific knowledge
• Easy-to-share units of scientific process
– e.g., as research objects
• Mostly portable
• Facilitators of reproducible science
– Track provenance at each step of science…
• A key integrator for (big and small) data science
• A means to standardize scientific data
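The provenance idea above can be sketched in a few lines (an illustrative toy, not Kepler's actual provenance recorder; `run_step` and the two pipeline steps are hypothetical): each step records hashes of its inputs and outputs plus timing, so a run can later be audited or re-checked.

```python
import hashlib
import json
import time

def run_step(name, func, inputs, log):
    """Run one workflow step and append a provenance record
    (step name, input/output hashes, timing) to `log`."""
    start = time.time()
    outputs = func(inputs)
    log.append({
        "step": name,
        "input_hash": hashlib.sha256(
            json.dumps(inputs, sort_keys=True).encode()).hexdigest(),
        "output_hash": hashlib.sha256(
            json.dumps(outputs, sort_keys=True).encode()).hexdigest(),
        "started": start,
        "duration_s": time.time() - start,
    })
    return outputs

# A toy two-step pipeline: normalize, then aggregate.
provenance = []
data = run_step("normalize", lambda xs: [x / 10 for x in xs], [10, 20, 30], provenance)
total = run_step("aggregate", lambda xs: sum(xs), data, provenance)
print(total)                              # 6.0
print([r["step"] for r in provenance])   # ['normalize', 'aggregate']
```

Because every step goes through the same wrapper, the resulting log is a step-by-step trace of the whole run rather than something each script author must remember to emit.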
Pushing the boundaries of scientific computing created new bottlenecks
The ‘bioinformatics’ bottleneck
• Resources needed for sequence analysis far exceed the costs of sequence generation
– Cloud computing is an attractive on-demand solution
– Need new scheduling capabilities
• on-demand access to a shared pool of configurable resources
• networks, servers, storage, applications, and services
– Need the ability to easily combine a user's environment and community tools with workflows
– Various tools with different profiles
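Combining community tools with a workflow often comes down to wrapping each tool invocation as a step. A minimal, generic sketch (not any specific Kepler actor; `tool_step` is a hypothetical helper, and the Python interpreter stands in for a real community tool):

```python
import subprocess
import sys

def tool_step(cmd, stdin_text=""):
    """Wrap an external command-line tool as a workflow step:
    feed it text on stdin, capture stdout, fail loudly on error."""
    result = subprocess.run(cmd, input=stdin_text,
                            capture_output=True, text=True, check=True)
    return result.stdout

# Demo: the Python interpreter plays the role of a community tool
# that uppercases a (toy) sequence read from stdin.
out = tool_step([sys.executable, "-c",
                 "import sys; print(sys.stdin.read().upper())"],
                stdin_text="acgt")
print(out.strip())   # ACGT
```

Because the wrapper only assumes stdin/stdout and an exit code, the same pattern covers tools with very different runtime profiles; scheduling them onto on-demand cloud resources is then a matter of where the wrapper runs, not how the tool is invoked.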
The ‘sensor data’ bottleneck
• Data streaming in at various rates
• “Big Data” by definition in its volume, variety, velocity and viscosity
– Workflows can improve veracity and add value by providing provenance- and standards-aware on-the-fly archival capabilities
– Workflows can QA/QC and automate (real-time) analysis of streaming data before it is even archived
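The on-the-fly QA/QC idea can be sketched as a simple range check (a hypothetical plausibility rule, not any particular observatory's pipeline): every reading is archived with a quality flag, while only clean values flow on to real-time analysis.

```python
def qa_qc(readings, low, high):
    """Flag each sensor reading as it streams in: values outside
    the [low, high] plausibility range are archived with a bad
    flag but dropped from downstream analysis."""
    archive, clean = [], []
    for value in readings:
        ok = low <= value <= high
        archive.append({"value": value,
                        "flag": "ok" if ok else "out_of_range"})
        if ok:
            clean.append(value)
    return archive, clean

# Toy temperature stream: a -999 sentinel and a 130.2 spike slip in.
stream = [12.1, 11.8, -999.0, 12.4, 130.2]
archive, clean = qa_qc(stream, low=-40.0, high=60.0)
print(clean)   # [12.1, 11.8, 12.4]
```

Running this as a workflow step means the flagging logic (and its provenance) travels with the pipeline instead of living in ad-hoc scripts at each site.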
The ‘HPC’ bottleneck
• Scaling for exascale is not happening quickly
– Different memory architectures
– Analysis codes being redeveloped
– Just scheduling through the batch schedulers is not enough
– HPC workflows are becoming more interactive
– In-situ data analysis to deal with volumes of data
As users see the value, they say:
• Increase reuse
– best development practices by the scientific community
– other bio packages
• Increase programmability by end users
– users with various skill levels
– to formulate actual domain-specific workflows
• Increase resource utilization
– optimize execution across available computing resources
– in an efficient, transparent and intuitive manner
• Make workflows a part of the end-to-end scientific model, from data generation to publication
What are some next steps?
• Specialize workflow systems with domain-specific tools, data models and formats, and user interfaces
• Workflow publications and data repositories
– Treat workflows the same as data
– Strong virtualization capability
• Standards for provenance needed
– For data and for process
• Build upon prior knowledge by detecting best-practice programming patterns and motifs
• Cater to different hardware architectures
Thanks & Questions…
How to download Kepler?
Please start with the short Getting Started Guide: