
FAIR Computational Workflows

Nov. 15, 2021

  1. FAIR Computational Workflows Professor Carole Goble The University of Manchester UK EU Research Infrastructures ELIXIR, IBISBA, EOSC-Life BioExcel Centre of Excellence Software Sustainability Institute UK FAIRDOM Consortium carole.goble@manchester.ac.uk 16th Workshop on Workflows in Support of Large-Scale Science November 15, 2021
  2. 20 years+ Computational workflows decades in the making… finally coming of age….? doi: 10.1093/gigascience/giaa140 Nature 573, 149-150 (2019) https://doi.org/10.1038/d41586-019-02619-z https://doi.org/10.1038/s41592-021-01254-9
  3. An open collaborative space for digital biology in Europe https://lifescience-ri.eu/ https://www.eosc-life.eu/
  4. Computational Workflows for Data intensive Bioscience CryoEM Image Analysis Metagenomic Pipelines [Rob Finn] [Carlos Oscar Sorzano Sanchez] Nature 573, 149-150 (2019) https://doi.org/10.1038/d41586-019-02619-z Data pipelines, simulation sweeps, workflow ensembles. Mixture of workflow systems, notebooks and scripts. Chaining different codes. Genome Annotation [Romain Dallet] High Throughput Sequencing [Fabrice Allain] Interactive & exploratory analysis Production, automated, repetitive & workflow-integrated software
  5. Workflow System Landscape Inter-twingled, mix and matching Scripting environments Interactive Electronic Research Notebooks Repositories Registries Workflow Management Systems & execution platforms *https://s.apache.org/existing-workflow-systems 300+ Systems* General and Specialised General Repositories
  6. https://snakemake.github.io/ Workflows are rules: Graph of jobs for automatic parallelisation, DIY package & containerisation installation, auto-documentation From frameworks to web based analysis platforms Communities cluster round a few systems. Take up of a WfMS typically depends on the “plugged-in” support of data types & specific codes, skills level of the workflow developers, its popularity & sustainability. Online portals: users build and reuse workflows around publicly available or user-uploaded data and pre-wrapped, pre-installed tools.
  7. A FAIR data and workflow commons sharing and running workflows Workflows are: an entry point to tools and datasets, democratising resources functions for FAIR data processing and secure data processing FAIR digital objects Honour legacy & diversity of WfMS -> Buy-in & on-boarding of WfMS
  8. A FAIR data and workflow commons Workflows are: an entry point to tools and datasets, democratising resources functions for FAIR data processing and secure data processing FAIR digital objects Honour legacy & diversity of WfMS -> Buy-in & on-boarding of WfMS
  9. FAIR Guiding Principles for Research Data Findable, Accessible, Interoperable, Reusable A set of guiding principles to enhance the value of all digital resources and their reuse by people and by machines A community journey to common guidelines The glue to federate data and services, to apply to all objects Benefit both consumers and producers.
  10. The FAIR Research Data Principles RDA FAIR Data Maturity Model. Specification and Guidelines https://zenodo.org/record/3909563#.YORYkUzTX19 https://www.go-fair.org/fair-principles/ Wilkinson, M., Dumontier, M., Aalbersberg, I. et al. The FAIR Guiding Principles for scientific data management and stewardship. Sci Data 3, 160018 (2016). https://doi.org/10.1038/sdata.2016.18
  11. tl;dr FAIR Research Data Principles https://www.go-fair.org/fair-principles/ Persistent human readable & machine-actionable metadata • Linked • Community standards Persistent identifiers Clear licensing and access rules Protocols for machine accessibility & AAI Registration Searching & Indexing Enabling automation
  12. FAIR Research Data Principles update in a nutshell Policy Rallying point I’m FAIR! What is it? Definition Spectrum Contextual Methodology FAIRification FAIR by Design Assessment Compliance Certification FREE Infrastructure Services Adoption Incentives Stewardship Services
  13. FAIR Research Software Principles Software is a digital object but research software is not (just) data https://www.rd-alliance.org/groups/fair-4-researchsoftware-fair4rs-wg FAIR for Research Software (FAIR4RS) Working Group FAIR4RS First Draft of FAIR4RS principles Katz, et al PATTERNS 2, 2021 Lamprecht et al., 2020
  14. FAIR Research Software Principles Software is a digital object but research software is not (just) data Findable Accessible I1. Software should read, write or exchange data in a way that meets domain-relevant community standards I2. Software includes qualified references to other objects. Reusable Interoperable R1. Software is richly described with a plurality of accurate & relevant attributes R1.1. Software is made available with a clear & accessible software usage license R1.2. Software is associated with detailed provenance R1.3. Software meets domain-relevant community standards R2. Software includes qualified references to other software (Katz et al, 2021 PATTERNS, https://doi.org/10.1016/j.patter.2021.100222) R. The software is usable (it can be executed) & reusable (it can be understood, modified, built upon, or incorporated into other software).
  15. Enabling FAIR? FAIRification. Assessment. Services. Governance. Incentives. FAIR takes a Village* *Borgman, C. L., & Bourne, P. E. (2021). Why it takes a village to manage and share data. Harvard Data Science Review (under Review). https://arxiv.org/abs/2109.01694 FAIR Computational Workflow Principles?
  16. FAIR Principles for Workflows Abstraction 1: Hybrid Processual Digital Objects
  17. Image credit: BioExcel Centre of Excellence different components, codes, languages, third parties FAIR Principles for Workflows Abstraction 2: Compositional Objects Interoperability and Reusability FAIR Unit Test
  18. FAIR Principles for Workflows Method “Data” Objects Workflows as FAIR Software FAIR+R and FAIR++ Quality, maturity, maintainability The principles revised Workflows as FAIR Digital Objects Data-like method objects Associated objects The principles adapted Workflows as FAIR Data Instruments FAIRification of the dataflow The data principles supported C. Goble, S. Cohen-Boulakia, S. Soiland-Reyes, D. Garijo, Y. Gil, M.R. Crusoe, K. Peters & D. Schober. FAIR computational workflows. Data Intelligence 2(2020), 108–121. doi: 10.1162/dint_a_000 Workflow Objects Software Objects Data FAIRification FAIR enabling services Services
  19. Findable & Accessible WORKS 2007 WORKS 2021 https://workflowhub.eu https://workflowhub.org https://myexperiment.org
  20. Findable & Accessible register workflows with assigned PID + metadata in a searchable resource. https://workflowhub.eu Publishing Services Journals Digital Objects of Scholarship published, cited, exchanged, reviewed, validated & reused • Versioning, DOI/PID assignment • Collections, workflow libraries scripts Repos Containers Deploys Tools WfMS Agnostic degrees of onboarding, support & access • Native repositories • Metadata standards framework, handle associated objects and links between objects. • Execution API https://dockstore.org/
  21. Link up providers and users Building visibility & reputation Close the “Find – Get– Use – Credit” loop Credit, Attribution, Citation Knowledge Graphs linking out to OpenAIRE, DataCite Associate workflows Associate sister objects myExperiment influence Social aspects Teams, People
  22. licensing authors & credit analytics access search versions & status other workflows Smoothing onboarding • GitHub integration • WfMS metadata support • Accessible by API scripts
  23. Tool Registry Service API Accessible “metadata & workflow retrievable by PID using a standardized communication protocol” GitHub page: https://github.com/ga4gh/tool-registry-service-schemas
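As a minimal sketch of what "retrievable by PID using a standardized communication protocol" can look like in practice, the snippet below builds the GA4GH TRS v2 URL for the primary descriptor of one workflow version. The registry host `registry.example.org` and the identifiers are hypothetical placeholders; the path layout follows the TRS v2 schema linked above, and no network request is made.

```python
from urllib.parse import quote

# Base path defined by the GA4GH TRS v2 schema.
TRS_PREFIX = "/ga4gh/trs/v2"


def trs_descriptor_url(base_url, tool_id, version_id, descriptor_type="CWL"):
    """Build the TRS v2 URL for the primary descriptor of one workflow version."""
    # Identifiers may contain '/' or '#', so each is percent-encoded
    # as a single path segment.
    tool = quote(str(tool_id), safe="")
    version = quote(str(version_id), safe="")
    return (
        f"{base_url.rstrip('/')}{TRS_PREFIX}/tools/{tool}"
        f"/versions/{version}/{descriptor_type}/descriptor"
    )


# Hypothetical registry host and identifiers, for illustration only.
print(trs_descriptor_url("https://registry.example.org", "42", "1"))
```

A client would issue an HTTP GET against the resulting URL; which descriptor types a given registry serves is up to that registry.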
  24. Accessible an implementation of GA4GH WES https://github.com/sapporo-wes/sapporo top layer over the tools, the workflow languages, and the workflow runners GA4GH TRS
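To make the "top layer over the workflow runners" idea concrete, here is a hedged sketch of how a client might assemble a GA4GH WES "run workflow" request. The form field names and the `/ga4gh/wes/v1/runs` path follow the WES v1 specification; the workflow URL and parameters are invented for illustration, and the helper only prepares the request rather than sending it.

```python
import json


def wes_run_request(workflow_url, workflow_type, workflow_type_version, params):
    """Return (path, form_fields) for a WES 'run workflow' POST request."""
    form = {
        "workflow_url": workflow_url,
        "workflow_type": workflow_type,
        "workflow_type_version": workflow_type_version,
        # WES expects the workflow parameters as a JSON-encoded string.
        "workflow_params": json.dumps(params, sort_keys=True),
    }
    return "/ga4gh/wes/v1/runs", form


# Hypothetical workflow location and inputs.
path, form = wes_run_request(
    "https://registry.example.org/workflows/echo.cwl",
    "CWL",
    "v1.2",
    {"message": "hello"},
)
print(path)
print(form)
```

Because the same fields are accepted regardless of the engine behind the endpoint, the client does not need to know which workflow runner executes the job.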
  25. FAIR Workflows are FAIR Software lifecycle support for living objects Git Coupling Publishing Status Testing Benchmarking
  26. Extensible Metadata Framework catering for those processual FAIR criteria Common metadata about the workflow, tools & parameters Canonical workflow description of the steps of the workflow Type the input and outputs of the steps Run Provenance / Histories / Tests WfMS native history logs Format for packaging a workflow, its metadata and companion objects (links to containers, data etc) for exchange, archiving, reporting, citing. WorkflowHub and Services create and consume Crates FAIR Digital Object Adopting Open Community efforts
  27. FAIR Metadata for Machines & Humans https://www.commonwl.org WfMS neutral canonical description Linked to containerised tools • Portable, reusable workflows • Standardise expression of workflow • Standardise compatible I/O for steps • Reduce vendor / project lock-in • Workflow comparisons • Collaboration & knowledge transfer https://openwdl.org/
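For readers unfamiliar with what a WfMS-neutral description with typed I/O looks like, the following sketch emits a very small CWL v1.2 `CommandLineTool` as JSON (CWL documents are usually written in YAML, of which JSON is a subset). The tool itself, an `echo` wrapper, and the function name `echo_tool` are illustrative assumptions, not taken from the talk.

```python
import json


def echo_tool():
    """Return a minimal CWL v1.2 CommandLineTool description as a dict."""
    return {
        "cwlVersion": "v1.2",
        "class": "CommandLineTool",
        "baseCommand": "echo",
        # One typed input, passed as the first command-line argument.
        "inputs": {
            "message": {"type": "string", "inputBinding": {"position": 1}},
        },
        # Capture standard output as the tool's single output file.
        "stdout": "message.txt",
        "outputs": {"out": {"type": "stdout"}},
    }


print(json.dumps(echo_tool(), indent=2))
```

The typed `inputs` and `outputs` sections are what allow steps from different authors to be compared and wired together without reference to a particular execution platform.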
  28. Computational workflow profile Formal parameter profile https://bioschemas.org Opinionated use of schema.org, the web resource mark-up used by search engines, knowledge graphs and scientific resources. Computational tool profile FAIR Metadata for Machines & Humans data and software objects
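A rough sketch of the kind of JSON-LD markup these profiles describe: a `ComputationalWorkflow` whose inputs and outputs are `FormalParameter` entries. The choice of properties and the plain `https://schema.org` context are assumptions made for brevity; the Bioschemas profile pages define which properties are actually required or recommended, and the workflow shown is invented.

```python
import json


def workflow_markup(name, url, language, inputs, outputs):
    """Return JSON-LD style markup for a workflow and its formal parameters."""

    def param(param_name):
        return {"@type": "FormalParameter", "name": param_name}

    return {
        "@context": "https://schema.org",
        "@type": "ComputationalWorkflow",
        "name": name,
        "url": url,
        "programmingLanguage": language,
        "input": [param(p) for p in inputs],
        "output": [param(p) for p in outputs],
    }


# Hypothetical workflow, for illustration only.
markup = workflow_markup(
    "Example read mapping",
    "https://registry.example.org/workflows/42",
    "Common Workflow Language",
    inputs=["reads", "reference"],
    outputs=["alignments"],
)
print(json.dumps(markup, indent=2))
```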
  29. RO-Crate Digital Objects Packaging everything together regardless where or what it is https://www.researchobject.org/ro-crate/ Self describing format for packaging up scattered resources integrated view + context metadata and PIDs reference digital and real things - datasets, workflows, services, software & people, places etc. Web-native, COTS machine and human readable search engine & developer friendly. Infrastructure independent & self-describing Avoid repository silos Extensible and open-ended profiles duck-typing, cope with diversity and legacy
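To illustrate the "self-describing" packaging idea, the sketch below assembles a minimal `ro-crate-metadata.json` for a crate holding one workflow file. The overall shape (metadata descriptor, root dataset, `mainEntity` typed as a workflow) follows the RO-Crate 1.1 specification and the Workflow RO-Crate profile as I understand them; the file name, title and licence are hypothetical, and a real crate would carry considerably more metadata.

```python
import json


def workflow_crate(workflow_file, name, license_url):
    """Return a minimal RO-Crate metadata document for a single workflow."""
    return {
        "@context": "https://w3id.org/ro/crate/1.1/context",
        "@graph": [
            {   # Metadata descriptor: says which entity is the crate root.
                "@id": "ro-crate-metadata.json",
                "@type": "CreativeWork",
                "conformsTo": {"@id": "https://w3id.org/ro/crate/1.1"},
                "about": {"@id": "./"},
            },
            {   # Root dataset: the crate itself.
                "@id": "./",
                "@type": "Dataset",
                "name": name,
                "license": {"@id": license_url},
                "hasPart": [{"@id": workflow_file}],
                "mainEntity": {"@id": workflow_file},
            },
            {   # The workflow, typed so that consumers can recognise it.
                "@id": workflow_file,
                "@type": ["File", "SoftwareSourceCode", "ComputationalWorkflow"],
                "name": name,
            },
        ],
    }


# Hypothetical file name, title and licence.
crate = workflow_crate(
    "workflow.cwl",
    "Example workflow",
    "https://spdx.org/licenses/Apache-2.0",
)
print(json.dumps(crate, indent=2))
```

Because everything is plain JSON-LD in a single file next to the payload, the crate can be read with ordinary tooling and does not depend on any one repository.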
  30. RO-Crate Profile https://www.researchobject.org/ro-crate/profiles WfMS produce and consume Workflow-RO-Crates
  31. Provenance & Preservation Transparency & Reuse matter more than Reproducibility? Traceability more important? When is it FAIR enough? WfMS heavy lifting needed … R1.2: (Meta)data, software and workflows are associated with detailed provenance – data lineage, workflow lineage & workflow logs
  32. ProvenanceWeek 2021, T7 Workshop on Provenance for Transparent Research, July 2021 https://iitdbgroup.github.io/ProvenanceWeek2021/t7.html
  33. A2. metadata are accessible, even when the workflow is no longer available Read-reproducible as a method description if no longer runs, Metadata preserved beyond any one service republished in a long-term archive R. The workflow is usable (it can be executed) and reusable (it can be understood, modified, built upon, or incorporated into other workflows). FAIR Services Law of decline All workflows decay over time. Complexity of Dependencies Description persists -> Review, Repair, Remake
  34. Reusable and Usable i.e. can be executed once accessed Quality, maturity, maintainability -> FAIR++ Multiple wf/test backends: Galaxy Planemo, CWL, Jenkins … Check workflow performance, provenance on containers, memory usage … Testing and monitoring -> metadata into WorkflowHub Portability High-level workflow execution service backend, sensitive data analysis & running on private clouds “Interoperable” Execution Is a workflow reusable if it’s resource greedy or too slow or needs special resources or unavailable data or cannot be ported or run by anyone other than the developers? Like Google ML…
  35. Interoperable and Reusable Workflows… a portability viewpoint All good WORKS stuff which I am not going to talk about…. exascale computing
  36. Composability -> Interoperability and Reusability Community driven Reusability first I1: Software interoperates through APIs and metadata standards. FAIR Unit tested & validated canonical workflows & blocks. Well documented, well maintained CWL Canonical descriptions • Recycle descriptions and sub-workflows • Platform independent exchange and comparison • Standardised I/O formats Thanks: Rob Finn
  37. Composability -> Interoperability and Reusability Community driven Reusability first I1: Software interoperates through APIs and metadata standards. FAIR Unit tested & validated canonical workflows & blocks. Canonical Workflow Frameworks for Research (CWFR) https://www.rd-alliance.org/canonical-workflow-frameworks-research-cwfr https://fairdo.org/wg/fdo-cwfr/ Thanks: Stian Soiland-Reyes
  38. Workflow Data FAIRification & FAIR Data by Design Assisted by WfMS Challenge of diverse API & AAI landscape, formats and packaging Reviewing Curation Certification Governance Best Practice Golden Examples Canonical workflows Design for FAIR Data and Reuse Metadata generated for data products
  39. FAIR Reusable Workflow Design is Hard and Hard Work Nearly always post-hoc Third party dependencies  Technology Debt and Refactoring Software Engineering In the Sweatshop of Science who has the Time? Inclination? Skills? Resources?
  40. FAIR Reusable Workflow Design is Hard and Hard Work Nearly always post-hoc Workflow developers Tool and data set providers Workflow readiness FAIR Unit Testing Brack, et al (2021). 10 Simple Rules for making a software tool workflow-ready. https://doi.org/10.5281/zenodo.5636487 What’s the reward? What’s a FAIR Unit? How will we assess? How to refactor? WfMS platforms Programmatic access to workflow metadata Common metadata, PID & API standards FAIR Software. Service that is FAIR enabling* Ramezani et al. (2021). D2.7 Framework for assessing FAIR Services (V1.0_DRAFT). https://doi.org/10.5281/zenodo.5336234
  41. Can we FAIR assist? automate? Abstraction framework for granularity assessment & (semi)-automated refactoring 2021 IEEE International Conference on Cluster Computing DOI: 10.1109/Cluster48925.2021.00053
  42. Professionalisation Community activism Service activism Can we FAIR assist? Best practice, stewardship. Training https://society-rse.org/
  43. WORKFLOW APPLICATION USER FAIR takes a Village Shared responsibility, shared benefits, shared curation TOOL DEVELOPER WORKFLOW USER WFMS DEVOP WORKFLOW DEVELOPER & CUSTODIAN COMPUTATIONAL USER Platform Service Workflow Labour Use Reach Software
  44. What can a lab do to be FAIR? As developer and user of workflows, datasets, tools? Get Help Skill the Team with Best Practice Register/Publish Cite & credit makers Document for Strangers https://fair-software.nl/ Professionalisation Pre and post hoc Corpas M et al (2018) A FAIR guide for data providers to maximise sharing of human genomic data, PLOS Comp Bio Boeckhout M et al (2018) The FAIR guiding principles for data stewardship: fair enough?, E J of Human Genetics Use WfMSs and tools that are FAIR enabling Checklists A Management Plan Use Standards Use IDs
  45. What can a lab do to be FAIR? As developer and user of workflows, datasets, tools? Get Help Document for Strangers https://fair-software.nl/ Professionalisation Pre and post hoc Corpas M et al (2018) A FAIR guide for data providers to maximise sharing of human genomic data, PLOS Comp Bio Boeckhout M et al (2018) The FAIR guiding principles for data stewardship: fair enough?, E J of Human Genetics Use WfMSs and tools that are FAIR enabling Checklists A Management Plan Use Standards Use IDs Register/Publish Cite & credit makers Skill the Team with Best Practice
  46. What can the WfMS Community do? Collective action by a few WfMS and services nails 80% workflow use. Ferreira da Silva et al, A Community Roadmap for Scientific Workflows Research and Development, arXiv:2110 Best Practice Support a FAIR metadata framework
  47. TL;DL FAIR Computational Workflows FAIR Principles laid the foundation for sharing digital assets Computational workflows are Hybrid Digital Objects of scholarship Should support the creation of FAIR data and themselves adhere to FAIR Principles Metadata matters FAIR takes a Village. Life Sciences has begun work.
  48. Acknowledgements The WorkflowHub Club, Bioschemas Community, RO-Crate Community, CWL Community, Galaxy Europe, EOSC-Life and ELIXIR Tools Platform. https://about.workflowhub.eu/community/ Special Thanks Rafael Ferreira da Silva (Oak Ridge) Stian Soiland-Reyes (U Manchester / U Amsterdam) Paul Brack, Stuart Owen, Finn Bacall, Alan Williams, Doug Lowe (U Manchester) Björn Grüning (U Freiburg) Frederik Coppens (VIB) Sarah Jones (GEANT) Herve Menager (Pasteur Institute) Sarah Cohen-Boulakia (U Paris-Saclay) Dan Katz (U Illinois Urbana-Champaign) Simone Leo (CRS4) Laura Rodriguez-Navas (BSC) José Mª Fernández (BSC) Denis Yuen (Ontario Institute for Cancer Research) Tristan Glatard (Concordia University) Chris Erdmann (AGU) WorkflowHub https://workflowhub.eu/ and https://workflowhub.org EOSC-Life https://www.eosc-life.eu/ ELIXIR http://elixir-europe.org RO-Crate https://www.researchobject.org/ro-crate/ Galaxy Europe https://galaxyproject.eu/ Bioschemas https://bioschemas.org Common Workflow Language https://www.commonwl.org/ WorkflowsRI https://workflowsri.org/ Dockstore https://dockstore.org/ RDMkit https://rdmkit.elixir-europe.org
  49. Whither Workflow Interoperability? FAR not FAIR? (Question by Rafael Ferreira da Silva) What is Workflow Interoperability? • CWL/WDL - WfMS independence rather than interoperability? • Execution of sub-workflows – (re)usability rather than interoperability? • Multiple WfMS execution – are WfMS really executed in mixed workflows or is this front/backends that can run multiple WfMS (e.g. TES/WES)? • Composability of workflow units - Data I/O compatibility I1. Software should read, write or exchange data in a way that meets domain-relevant community standards
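One narrow reading of "composability of workflow units" is that the data format a step produces must match the format the next step expects. The sketch below checks exactly that and nothing more; the `Step` class, the port names and the format labels are all hypothetical, and real systems would compare ontology terms or schemas rather than plain strings.

```python
from dataclasses import dataclass, field


@dataclass
class Step:
    """A workflow unit with declared data formats per port (hypothetical model)."""
    name: str
    inputs: dict = field(default_factory=dict)    # port name -> format label
    outputs: dict = field(default_factory=dict)   # port name -> format label


def incompatible_links(steps, links):
    """Return the links whose producer and consumer formats do not match.

    Each link is a tuple (src_step, src_port, dst_step, dst_port).
    A port that is not declared on its step also counts as incompatible.
    """
    by_name = {step.name: step for step in steps}
    problems = []
    for src, out_port, dst, in_port in links:
        produced = by_name[src].outputs.get(out_port)
        expected = by_name[dst].inputs.get(in_port)
        if produced is None or expected is None or produced != expected:
            problems.append((src, out_port, dst, in_port, produced, expected))
    return problems


# Illustrative three-step pipeline with one deliberate mismatch.
steps = [
    Step("align", inputs={"reads": "fastq"}, outputs={"alignments": "bam"}),
    Step("call", inputs={"alignments": "bam"}, outputs={"variants": "vcf"}),
    Step("annotate", inputs={"features": "gff"}, outputs={"report": "tsv"}),
]
links = [
    ("align", "alignments", "call", "alignments"),
    ("call", "variants", "annotate", "features"),
]
for problem in incompatible_links(steps, links):
    print("incompatible link:", problem)
```

A check of this kind only becomes possible when steps declare their I/O against shared community standards, which is the point of principle I1.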