FAIR Computational Workflows
Professor Carole Goble
The University of Manchester UK
EU Research Infrastructures ELIXIR, IBISBA, EOSC-Life
BioExcel Centre of Excellence
Software Sustainability Institute UK
FAIRDOM Consortium
carole.goble@manchester.ac.uk
FAIReScience, IEEE eScience, 20th September 2021
Computational Workflows for Data-intensive Bioscience
prepare, analyze, and share increasing volumes of complex data
CryoEM Image Analysis
Metagenomic Pipelines
Protein Ligand Simulation
[Adam Hospital]
[Rob Finn]
[Carlos Oscar Sorzano Sanchez]
Nature 573, 149-150 (2019)
https://doi.org/10.1038/d41586-019-02619-z
Multi-step processes to coordinate and execute multiple codes and handle data and processing dependencies
Typically Data flows
Benefit from FAIR data with
machine processable metadata
A precise description
A special kind of software
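The idea of a multi-step process with data and processing dependencies can be sketched as a small dependency graph. This is a minimal illustration only, not how any particular WfMS works: the step names are borrowed from "prepare, analyze, and share" above, and `run_step` and `execute` are hypothetical helpers.

```python
from graphlib import TopologicalSorter

# Hypothetical three-step dataflow: each step maps to the steps it depends on.
dependencies = {
    "prepare": set(),
    "analyze": {"prepare"},
    "share": {"analyze"},
}

def run_step(name, inputs):
    """Stand-in for invoking a real code; returns a label describing its output."""
    return f"{name}({', '.join(inputs)})"

def execute(dependencies):
    """Run every step only after the steps it depends on have produced results."""
    results = {}
    for step in TopologicalSorter(dependencies).static_order():
        inputs = [results[dep] for dep in sorted(dependencies[step])]
        results[step] = run_step(step, inputs)
    return results

results = execute(dependencies)
print(results["share"])
```

The dictionary is the workflow specification and `execute` is a toy engine, so the description of the steps stays separate from how they are run.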
Workflow Management Systems FAIR bits
Abstraction: Separation of the workflow specification from its execution & tools
FAIR stratification, FAIR all the way down
FAIR Software
FAIR Data
FAIR Data FAIR Services
Image credit: BioExcel Centre of Excellence
Composition & Portability: different components, codes, languages, third parties
Workflow Management Systems FAIR bits
Composition: modularisation, FAIR parts & dependencies, propagation of FAIR properties
FAIR all the way down, versions, parts recycled, repurposed, remixed, citable credit
Workflow System Landscape
Inter-twingled, mixing and matching
Scripting environments
Interactive Electronic Research Notebooks
Repositories, Registries
Workflow Management Systems & execution platforms
https://s.apache.org/existing-workflow-systems
298 Systems
General and Specialised
General Repositories
Identifiable
Community
FAIR Principles for Workflows
Hybrid Processual Digital Objects
Method “Data” Objects
Workflows as FAIR Software
FAIR+R and FAIR++
Quality, maturity, maintainability
The principles revised
Workflows as FAIR Digital Objects
Data-like method objects
Associated objects
The principles adapted
Workflows as FAIR Data Instruments
FAIRification of the dataflow
The data principles supported
C. Goble, S. Cohen-Boulakia, S. Soiland-Reyes, D. Garijo, Y. Gil, M.R. Crusoe, K. Peters & D. Schober. FAIR computational workflows. Data Intelligence 2 (2020), 108–121. doi: 10.1162/dint_a_000
Workflow Objects
Software Objects
Data FAIRification
Efforts: Workflow Findability and Accessibility
Registries: lifecycle support for living workflows and associated objects
Identifiers: DOIs, ORCID, ROR etc
Licensing, credit, attribution
Support versions, reuse & remix
Workflow libraries
Access workflows at source, GitHub support
Auto / manual harvested metadata
Registry – execution integration
Execution monitoring services
Onboard WfMS platforms
Metadata standards framework
Metadata by stealth
https://workflowhub.eu
Publishing Services
Journals
scripts
Repos
Containers Deploys
Tools
https://dockstore.org/
Registries
Efforts: Workflow Metadata Frameworks
Metadata for machines & people, for WfMS, Registries & Services
Common metadata about the workflow, tools & parameters
Canonical workflow description of the steps of the workflow
Type the inputs and outputs of the steps
Run Provenance / Histories / Tests
RO-Crate format for packaging a workflow, its metadata and companion objects (links to containers, data etc.) for exchange, archiving, reporting, citing.
FAIR Digital Object
Open Communities
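Packaging a workflow with its metadata as an RO-Crate can be sketched as follows. This is a hand-built approximation based on the RO-Crate 1.1 and Workflow RO-Crate conventions as I understand them, so check it against the specification before relying on it. The function name `build_crate_metadata`, the file name `workflow.cwl`, the workflow name and the licence URL are all illustrative assumptions.

```python
import json

def build_crate_metadata(workflow_file, name, license_url):
    """Return a minimal ro-crate-metadata.json structure for one workflow file."""
    return {
        "@context": "https://w3id.org/ro/crate/1.1/context",
        "@graph": [
            {   # Metadata descriptor: points at the root dataset of the crate.
                "@id": "ro-crate-metadata.json",
                "@type": "CreativeWork",
                "conformsTo": {"@id": "https://w3id.org/ro/crate/1.1"},
                "about": {"@id": "./"},
            },
            {   # Root dataset: the package as a whole, with the workflow as main entity.
                "@id": "./",
                "@type": "Dataset",
                "name": name,
                "license": {"@id": license_url},
                "hasPart": [{"@id": workflow_file}],
                "mainEntity": {"@id": workflow_file},
            },
            {   # The workflow itself, described as a file, as code and as a workflow.
                "@id": workflow_file,
                "@type": ["File", "SoftwareSourceCode", "ComputationalWorkflow"],
                "name": name,
            },
        ],
    }

metadata = build_crate_metadata(
    "workflow.cwl",                      # hypothetical file name
    "Example workflow",                  # hypothetical title
    "https://spdx.org/licenses/MIT",     # hypothetical licence choice
)
print(json.dumps(metadata, indent=2))
```

Companion objects such as containers or test data would be added as further entries in `@graph` and listed under `hasPart`.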
Efforts: Workflow Interoperability
1. Workflow spec & WfMS interoperability: describe workflows independently of WfMS. Platform independent pipeline exchange and comparison.
2. Workflow Composability: Software interoperates through APIs and metadata standards (FAIR4RS*). Workflow-ready tools. Recycle tested & validated canonical workflow blocks.
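Composability through typed steps can be sketched as a compatibility check between blocks. This is a toy model under stated assumptions: the `Step` class, the `can_connect` helper, the step names and the type labels are hypothetical and do not come from CWL, WDL or any WfMS.

```python
from dataclasses import dataclass

@dataclass
class Step:
    """A workflow building block with typed ports (port name -> type label)."""
    name: str
    inputs: dict
    outputs: dict

def can_connect(upstream, out_port, downstream, in_port):
    """True when the upstream output type matches the downstream input type."""
    produced = upstream.outputs.get(out_port)
    expected = downstream.inputs.get(in_port)
    return produced is not None and produced == expected

# Hypothetical blocks; the type labels are illustrative strings only.
align = Step("align", {"reads": "FASTQ"}, {"alignment": "BAM"})
call_variants = Step("call_variants", {"alignment": "BAM"}, {"variants": "VCF"})

print(can_connect(align, "alignment", call_variants, "alignment"))
print(can_connect(call_variants, "variants", align, "reads"))
```

Declaring port types in metadata is what lets a registry or WfMS decide, without running anything, whether two recycled blocks fit together.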
https://openwdl.org/
https://www.commonwl.org
Design for FAIR Data & FAIR Workflow Reuse
Review
Curation
Certification
Governance
Licence combinations
Access permissions
Local -> Global identifiers
Best Practice
* FAIR4RS: first draft of the FAIR4RS principles
Efforts: Workflow Reusability and Usability
FAIR+R, FAIR++, FAIR4RS
Reusable – “can be understood, modified, built upon or incorporated into other software workflows”
Composability + Associated Objects + Metadata
Usable – “can be executed”
Containers & Packaging, Testing & monitoring, Execution standards, APIs
Tool Registry Service API
checker workflows
test data
A2. metadata are accessible, even when the workflow is no longer available
Enough metadata that a workflow is read-reproducible as a method description if it no longer runs
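Principle A2 can be sketched as a registry record whose metadata outlives the workflow itself. This is a minimal illustration, not the behaviour of any real registry: `WorkflowRecord`, `withdraw`, `resolve`, the identifier and the URL are all hypothetical.

```python
from dataclasses import dataclass, replace
from typing import Optional

@dataclass(frozen=True)
class WorkflowRecord:
    identifier: str
    title: str
    method_description: str      # enough detail to read the workflow as a method
    content_url: Optional[str]   # where the runnable workflow lives, if anywhere
    available: bool = True

def withdraw(record):
    """Remove access to the workflow content but keep every metadata field."""
    return replace(record, content_url=None, available=False)

def resolve(record):
    """Resolve an identifier to a landing page; metadata is always returned."""
    page = {
        "identifier": record.identifier,
        "title": record.title,
        "method_description": record.method_description,
        "available": record.available,
    }
    if record.available:
        page["content_url"] = record.content_url
    return page

record = WorkflowRecord(
    identifier="example:workflow/1",                   # hypothetical identifier
    title="Example workflow",
    method_description="Align reads, then call variants.",
    content_url="https://example.org/workflow.cwl",    # hypothetical location
)
tombstone = resolve(withdraw(record))
print(tombstone)
```

After withdrawal the identifier still resolves, and the method description remains readable even though nothing can be executed.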
Efforts: Workflows as functions for FAIR Data
Data FAIRification of Workflows, assisted by WfMS & reporting
Challenge of diverse API & AAI landscape, formats and packaging
Review
Curation
Certification
Governance
Best Practice
Golden Examples
Canonical workflows
Manage AAI, format, packaging choices
Design for FAIR Data and Reuse
FAIR Computational Workflows
Hybrid Processual Digital Objects
Data + Software FAIR Principles
Data FAIRification methods
WfMS support
FAIR takes a village
Community of projects, WfMS, platforms & environments, stakeholders.
Long tail pattern. Collective action by a few WfMS and services nails 80:20.
FAIR by stealth.
Borgman, C. L., & Bourne, P. E. (2021). Why it takes a village to manage and share data. Harvard Data Science Review (under Review), arXiv:2109.01694v1.
EOSC-Life https://www.eosc-life.eu/
RO-Crate https://www.researchobject.org/ro-crate/
WorkflowHub https://workflowhub.eu/
Galaxy Europe https://galaxyproject.eu/
Bioschemas https://bioschemas.org/
Common Workflow Language https://www.commonwl.org/
Dockstore https://dockstore.org/
WorkflowsRI https://workflowsri.org/
Acknowledgements
Sarah Cohen-Boulakia, Stian Soiland-Reyes, Daniel Garijo, Yolanda Gil, Michael Crusoe, Kristian Peters, Daniel Schober