2017-11-03 Scientific Workflow systems

Presented 2017-11-03 to CESAB workshop on Reproducible Workflows, Aix-en-Provence

Published in: Science

  1. Scientific Workflow Systems. Stian Soiland-Reyes, eScience Lab, The University of Manchester (orcid.org/0000-0001-9842-9718, @soilandreyes). 2017-11-03, Aix-en-Provence, CESAB workshop: Reproducible Workflows. Partners and funding: bioexcel.eu. This work is licensed under a Creative Commons Attribution 4.0 International License.
  2. What is a Workflow?
     – Orchestrating computational tasks
     – Managing the control and data flow
     – Homogeneous or heterogeneous tasks: local / remote; own / third party; white, grey or black boxes; reliable / fragile; reserved / dynamic; various underpinning infrastructure; various access controls
     – BioExcel: Biomolecular recognition
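The idea of orchestrating tasks along their data dependencies can be sketched in a few lines of Python. This is only an illustration, not how any system in this deck is implemented: the task names (`fetch`, `align`, `report`) and the `run_workflow` helper are invented for the example, and `graphlib` needs Python 3.9 or later.

```python
from graphlib import TopologicalSorter

# Hypothetical three-step workflow: each task maps to the tasks it depends on.
dependencies = {
    "fetch": set(),
    "align": {"fetch"},
    "report": {"align"},
}

# Hypothetical task bodies; a real workflow would call tools or services here.
tasks = {
    "fetch": lambda: "ACGT",
    "align": lambda seq: seq.lower(),
    "report": lambda aligned: f"aligned={aligned}",
}


def run_workflow(tasks, dependencies):
    """Run each task after its dependencies, passing upstream results along."""
    results = {}
    for name in TopologicalSorter(dependencies).static_order():
        inputs = [results[dep] for dep in sorted(dependencies[name])]
        results[name] = tasks[name](*inputs)
    return results


results = run_workflow(tasks, dependencies)
print(results["report"])
```

The dependency graph carries the data flow, while the topological order supplies the control flow; a workflow system adds scheduling, remote execution and provenance on top of this core.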
  3. Not on the agenda: Business workflows. Control flow of who has responsibility for what (BPM). Business workflows + computational workflows (IBISBA)
  4. Why use workflows?
     – Automation: automate computational aspects; repetitive pipelines, sweep campaigns
     – Scaling (compute cycles): make use of computational infrastructure and handle large data
     – Abstraction (people cycles): shield complexity and incompatibilities; report, re-use, evolve, share, compare; repeat, tweak, repeat; first-class commodities
     – Provenance (reporting): capture, report and utilize log and data lineage; auto-documentation; traceable evolution, audit, transparency; compare
     – Findable, Accessible, Interoperable, Reusable (Reproducible)
     Adapted from Bertram Ludäscher at WORKS 2015: https://www.slideshare.net/ludaesch/works-2015provenancemileage
  5. The humble Makefile. https://github.com/vak/makefile2dot
  6. Laser Interferometer Gravitational-Wave Observatory: first detection of gravitational waves from colliding black holes. https://pegasus.isi.edu/2016/02/11/pegasus-powers-ligo-gravitational-waves-detection-analysis/ https://pegasus.isi.edu/
  7. Workflow Environment Ecosystem
  8. https://s.apache.org/existing-workflow-systems
  9. https://taverna.incubator.apache.org/
  10. https://www.knime.org/ https://www.openphacts.org/ Pharmacological queries: target, compound and pathway data. https://doi.org/10.1371/journal.pone.0115460 http://www.myexperiment.org/workflows/4292
  11. https://usegalaxy.org/
  12. Stop Press! GUIs not essential!
      – GUI: canvas, drag-and-drop blocks, arrows, run button, data visualization
      – Script: textual, command line, view data externally; a script is easily run from other apps
      – Scripts can be workflows! Workflow systems ⇆ Scripts
      – Scripts on the ASAP meter: Automation ★★★★★, Scaling ★★, Abstraction ★, Provenance ★★
  13. https://www.nextflow.io/ Script-like, define flow as channels; Streaming; Automatic Parallelism; Checkpoints; Virtualization and packaging; Portable Reproducibility
  14. Snakemake (https://snakemake.readthedocs.io/): Makefile + Python ⇝ Snakemake; filename patterns; shell commands; inline Python, R; scalable to grid/cloud
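The "filename patterns" idea, where a requested output file name selects a rule and fixes its inputs, can be sketched in plain Python. This is not Snakemake syntax or its API; the `RULES` table, the directory names and the `plan` helper are assumptions made up for illustration, and the sketch only prints the shell commands rather than running them.

```python
import re

# Hypothetical rules: output pattern -> (input pattern, shell command template).
RULES = {
    "sorted/{sample}.txt": ("raw/{sample}.txt", "sort {input} > {output}"),
    "counts/{sample}.tsv": ("sorted/{sample}.txt", "uniq -c {input} > {output}"),
}


def pattern_to_regex(pattern):
    """Turn 'sorted/{sample}.txt' into a regex with one named group per wildcard."""
    parts = re.split(r"\{(\w+)\}", pattern)
    # re.split with a capture group alternates literal text and wildcard names.
    pieces = [
        f"(?P<{part}>[^/]+)" if index % 2 else re.escape(part)
        for index, part in enumerate(parts)
    ]
    return re.compile("".join(pieces))


def plan(target):
    """Return the commands needed to build target, upstream steps first."""
    for out_pattern, (in_pattern, command) in RULES.items():
        match = pattern_to_regex(out_pattern).fullmatch(target)
        if match:
            source = in_pattern.format(**match.groupdict())
            return plan(source) + [command.format(input=source, output=target)]
    return []  # no rule produces this file, so treat it as an existing input


commands = plan("counts/a.tsv")
for line in commands:
    print(line)
```

As with Make, the user asks for a target file and the dependency chain is derived backwards from it; the wildcard lets one rule serve every sample.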
  15. YesWorkflow (http://yesworkflow.org/): declare workflow steps as #annotations in existing scripts; graphical visualization of workflow
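A small sketch of what an annotated script might look like, using the `@begin`, `@end`, `@in` and `@out` comment keywords as I understand them from the YesWorkflow documentation. The step and variable names are invented for the example. The script runs unchanged as ordinary Python, since the annotations live entirely in comments.

```python
# @begin clean_and_summarise
# @in raw_values
# @out mean_value

raw_values = [1.0, None, 3.0]

# @begin drop_missing
# @in raw_values
# @out clean_values
clean_values = [value for value in raw_values if value is not None]
# @end drop_missing

# @begin compute_mean
# @in clean_values
# @out mean_value
mean_value = sum(clean_values) / len(clean_values)
# @end compute_mean

# @end clean_and_summarise

print(mean_value)
```

The point of this style is that an existing script gains a declared workflow structure without being rewritten for a workflow engine.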
  16. https://github.com/chapmanb/bcbio-nextgen Distributed workflows for Next-Gen Sequencing analysis; domain-specific language; focus on parameters, algorithms; workflow fixed, no command lines! https://bcbio-nextgen.readthedocs.org
  17. http://commonwl.org/
      – Workflow interoperability
      – Common workflow format
      – Community-based standards effort
      – Designed for clusters & clouds
      – Use containers (e.g. Docker)
      – Textual YAML files (GUIs available)
      – Workflow: steps with data dependencies
      – Step: command line or inline scripts
      – Scatter/gather on steps
      – Rich annotations
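Scatter/gather means running one step once per element of an input list, then collecting the per-element results. The sketch below shows the pattern in plain Python with a thread pool. It is not CWL and does not use any CWL tooling; `count_bases` and `scatter_gather` are hypothetical names chosen for the example.

```python
from concurrent.futures import ThreadPoolExecutor


def count_bases(sequence):
    """Hypothetical per-item step: count G and C bases in one sequence."""
    return sum(sequence.count(base) for base in "GC")


def scatter_gather(step, items, max_workers=4):
    """Scatter: run step once per item. Gather: collect results in input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(step, items))


sequences = ["GATTACA", "CCGG", "ATAT"]
gc_counts = scatter_gather(count_bases, sequences)
total_gc = sum(gc_counts)
print(gc_counts, total_gc)
```

Because the step is declared once and the engine decides how to fan it out, the same workflow description can run on a laptop, a cluster or a cloud.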
  18. http://www.commonwl.org/
  19. Containers
      – Linux container technology: a light-weight "virtual" virtual machine
      – A container is started from an image
      – Images downloaded from Docker Hub
      – Dockerfile: layer-based recipe
      – Philosophy: one service, one image → microservices
      – Cloud's best friend: scalable, reproducible, customizable
  20. Publish your own container images: Dockerfile, https://hub.docker.com/r/openphacts/
  21. Find and Share: http://www.myexperiment.org
  22. https://view.commonwl.org/ http://doi.org/10.7490/f1000research.1114375.1
  23. Running workflows, tracking provenance
  24. Provenance
      – W3C standard: PROV (http://www.w3.org/TR/prov-overview/)
      – But multiple formats, multiple styles, multiple extensions
      – Best practice for workflow provenance?
      – wfprov (Research Object, Taverna): https://w3id.org/ro/2016-01-28/wfprov/
      – OPMW/P-Plan (WINGS): http://www.opmw.org
      – ProvONE (DataOne): http://vcvcomputing.com/provone/provone.html
      Copyright © 2013 W3C® (MIT, ERCIM, Keio, Beihang), All Rights Reserved.
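To make the core PROV vocabulary concrete, here is a sketch that records one workflow step as entities, an activity and the `used`, `wasGeneratedBy` and `wasDerivedFrom` relations. The dictionary is laid out along the lines of the PROV-JSON serialization as I recall it, so treat the exact key layout as an assumption to check against the specification. The `ex:` identifiers, the file names and the `record_run` helper are invented for the example, and no workflow-specific extension (wfprov, OPMW, ProvONE) is used.

```python
import json
from datetime import datetime, timezone


def record_run(step_name, input_id, output_id, started, ended):
    """Describe one workflow step run using core W3C PROV relations."""
    activity_id = f"ex:run-{step_name}"
    return {
        "prefix": {"ex": "http://example.org/"},
        "entity": {input_id: {}, output_id: {}},
        "activity": {
            activity_id: {
                "prov:startTime": started.isoformat(),
                "prov:endTime": ended.isoformat(),
            }
        },
        # The step read the input ...
        "used": {"_:u1": {"prov:activity": activity_id, "prov:entity": input_id}},
        # ... and produced the output ...
        "wasGeneratedBy": {
            "_:g1": {"prov:entity": output_id, "prov:activity": activity_id}
        },
        # ... so the output is derived from the input.
        "wasDerivedFrom": {
            "_:d1": {"prov:generatedEntity": output_id, "prov:usedEntity": input_id}
        },
    }


started = datetime(2017, 11, 3, 9, 0, tzinfo=timezone.utc)
ended = datetime(2017, 11, 3, 9, 5, tzinfo=timezone.utc)
doc = record_run("align", "ex:reads.fastq", "ex:aligned.bam", started, ended)
print(json.dumps(doc, indent=2))
```

A workflow engine can emit such records automatically for every step, which is what makes data lineage queries possible afterwards.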
  25. https://twitter.com/ianholmes/status/288689712636493824
  26. Research Object Bundle (application/vnd.wf4ever.robundle+zip): https://doi.org/10.1016/j.websem.2015.01.003 http://www.researchobject.org/
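A Research Object Bundle is a ZIP archive identified by the media type above. The sketch below packs a workflow and its output into such an archive with the Python standard library. Several details are assumptions based on my recollection of the bundle specification and should be checked against it: that `mimetype` is the first, uncompressed entry, that the manifest lives at `.ro/manifest.json`, and that it lists resources under `aggregates`. The manifest here is deliberately simplified, and the file names and `make_bundle` helper are hypothetical.

```python
import io
import json
import zipfile

MIMETYPE = "application/vnd.wf4ever.robundle+zip"  # media type from the slide


def make_bundle(files):
    """Pack {path: bytes} into an in-memory ZIP laid out like an RO bundle."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", compression=zipfile.ZIP_DEFLATED) as bundle:
        # Assumption: 'mimetype' is the first entry and is stored uncompressed.
        bundle.writestr("mimetype", MIMETYPE, compress_type=zipfile.ZIP_STORED)
        # Assumption: simplified manifest listing the aggregated resources.
        manifest = {"aggregates": [{"uri": "/" + path} for path in sorted(files)]}
        bundle.writestr(".ro/manifest.json", json.dumps(manifest, indent=2))
        for path in sorted(files):
            bundle.writestr(path, files[path])
    return buffer.getvalue()


bundle_bytes = make_bundle({
    "workflow.cwl": b"cwlVersion: v1.0\n",
    "outputs/result.txt": b"42\n",
})

with zipfile.ZipFile(io.BytesIO(bundle_bytes)) as bundle:
    print(bundle.namelist())
```

Keeping the workflow, its data and a manifest in one file is what lets a run be shared and archived as a single citable object.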
  27. Acknowledgements: Carole Goble, Michael R. Crusoe, Apache Taverna, BioExcel, Common Workflow Language, Research Object
