Invited Talk for EUDAT Workshop in Barcelona


EUDAT Workflow Workshop - September 25th, 2013

Published in: Design, Technology, Education

  1. Roles and Challenges for Scientific Workflows and Provenance in the Age of Open Science, Cloud Computing and Web 2.0. Ilkay ALTINTAS, Ph.D., San Diego Supercomputer Center, UCSD. September 2013.
  2. Workflows are a Part of Cyberinfrastructure
     [Diagram: Workflow Design; Workflow Scheduling and Execution Planning; Workflow Execution; Workflow Monitoring; Reporting; Provenance Analysis; Run; Review; Deploy and Publish]
     • BUILD: Accelerate Workflow Design and Reuse via a Drag-and-Drop Visual Interface
     • SHARE: Facilitate Sharing
     • RUN: Schedule, Run and Monitor Workflow Execution
     • LEARN: Promote Learning
     Support for end-to-end computational scientific process
  3. What motivated this?
  4. Facilitating and Accelerating XXX-Info or Comp-XXX Research using Scientific Workflows
     • Important Attributes
       – Assemble complex processing easily
       – Access diverse resources transparently
       – Incorporate multiple software tools
       – Assure reproducibility
       – Build around a community development model
  5. In addition, workflows today are…
     • Encapsulations of scientific knowledge
     • Easy-to-share bits of scientific process
       – e.g., as research objects
     • Mostly portable
     • A way to facilitate and encourage reproducible science
       – Track provenance at each step of science…
     • A key integrator for (big and small) data science
     • A means to standardize scientific data products
  6. Pushing the boundaries of scientific computing created new requirements.
  7. The ‘bioinformatics’ bottleneck
     • Resources needed for sequence analysis far exceed the costs of sequence generation
       – Cloud computing is an attractive on-demand decentralized model
       – Need new scheduling capabilities
         • on-demand access to shared configurable resources
         • networks, servers, storage, applications, and services
       – Need the ability to easily combine the user's environment and community tools with the workflow
       – Various tools with different profiles
  8. The ‘sensor data’ bottleneck
     • Data streaming in at various rates
     • “Big Data” by definition in its volume, variety, velocity and viscosity
       – Workflows can improve veracity and add value by providing provenance- and standards-aware on-the-fly archival capabilities
       – Workflows can QA/QC and automate (real-time) analysis of streaming data before it is even archived.
  9. The ‘HPC’ bottleneck
     • Scaling to exascale is not happening naturally
       – Different memory architectures
       – Analysis codes are being redeveloped
       – Scheduling through batch schedulers alone is not enough
       – HPC workflows are becoming more interactive
       – In-situ data analysis to deal with volumes of data
  10. As users see the value, they say:
     • Increase reuse
       – best development practices by the scientific community
       – other bio packages
     • Increase programmability by end users
       – users with various skill levels
       – to formulate actual domain-specific workflows
     • Increase resource utilization
       – optimize execution across available computing resources
       – in an efficient, transparent and intuitive manner
     • Make workflows a part of the end-to-end scientific model from data generation to publication
  11. What are some next steps?
     • Specialize workflow systems with domain-specific
       – Tools; Data models and formats; User interfaces; Deployment
     • Workflow publications and data repositories
       – Treat workflows the same as data
       – Strong virtualization capability
     • Standards for provenance needed
       – For data and for process
     • Build upon prior knowledge by detecting best-practice programming patterns and motifs
     • Cater to different hardware architectures
  12. Thanks! & Questions… Ilkay Altintas: @ilkayaltintas, @bioKepler, @KeplerWorkflow, @WIFIREProject. How to download Kepler? Please start with the short Getting Started Guide: