EOSC-Life
Workflow
Collaboratory
IEEE e-Science Conference 2021, 22nd Sept 2021
Carole Goble
The University of Manchester
Joint Head of Node ELIXIR-UK
EOSC-Life Cluster
carole.goble@manchester.ac.uk
The European Open Science Cloud
• seamless access to data, tools,
compute and services
• FAIR management
• reliable reuse of all research
digital objects produced along the
research life cycle
• Web of FAIR Data and services for
science & value-added services.
A federated and open multi-disciplinary
environment where researchers can
publish, find and re-use
data, tools and services for
research, innovation and
educational purposes.
The European Open Science Cloud
Figure: EPOS
13 Research Infrastructures
350+ institutions
45+ partners in project
https://lifescience-ri.eu/
https://www.eosc-life.eu/
An open collaborative
space for digital
biology in Europe
Since 2019
Computational Workflows for Data intensive Bioscience
prepare, analyze, and share increasing volumes of complex data
CryoEM Image Analysis
Metagenomic Pipelines
Protein Ligand
Simulation
[Adam Hospital]
[Rob Finn]
[Carlos Oscar Sorzano Sanchez]
Nature 573, 149-150 (2019)
https://doi.org/10.1038/d41586-019-02619-z
Computational Workflows
Multi-step processes to
coordinate and execute
multiple codes and handle
data and processing
dependencies
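As a minimal sketch of "multi-step processes" with "data and processing dependencies": the step names below are hypothetical and not taken from any workflow in this talk. A workflow management system does far more than this (scheduling, data staging, retries), but the core idea of ordering steps by their dependencies can be shown with the Python standard library.

```python
from graphlib import TopologicalSorter

# Hypothetical workflow: each step maps to the set of steps it depends on.
steps = {
    "quality_control": set(),
    "alignment": {"quality_control"},
    "variant_calling": {"alignment"},
    "report": {"quality_control", "variant_calling"},
}


def execution_order(dependencies):
    """Return one valid order in which every step runs after its dependencies."""
    return list(TopologicalSorter(dependencies).static_order())


order = execution_order(steps)
print(order)
```

A real WfMS derives this graph from the workflow definition and runs independent steps in parallel; the sketch only shows the ordering constraint.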
https://covid19.galaxyproject.org
https://covid19beacon.crg.eu
https://bit.ly/cog-uk-monitoring
SARS-CoV-2 pre-processing, monitoring, analysis
Automated monitoring of structured
data from the EU COVID-19 Data Portal
Managed central service and deployable
infrastructure
Improved data quality, uniformly
analysed data, submission to public
archives
Basis for new National French COVID-19
surveillance platform
Accelerating knowledge exchange
through workflow and data product
exchange
Take EOSC to the users’ tools
Workflows are an entry point to
the tools and datasets
functions for production quality
FAIR data processing
access to secure data processing
democratising resources
Figure Credit: Romain Dallet
A data and method commons / collaboratory
A portable environment of interoperable tools
RIs publish data, methods & services for management,
storage and reuse
Collaboratory stakeholders
WORKFLOW APPLICATION USER
TOOL DEVELOPER
WORKFLOW USER
SYS ADMIN
WORKFLOW DEVELOPER & CUSTODIAN
COMPUTATIONAL USER
Workflow System as a Platform
Workflow System as a Service
Labour
Reach
need
infrastructure
& services
need tools to be
wrapped &
maintained
need workflows to be
developed, tested,
run & maintained
need to find and understand
workflows, with explanations to
use properly and safely.
Principles of the Collaboratory: Honour legacy & diversity
Workflow management system agnostic
• WfMS
• Jupyter Notebooks
• Scripts
• Common Workflow Language
Different degrees of support
Buy-in & On-boarding of WfMS:
• Popular WfMS: Galaxy, Nextflow,
Snakemake, CWL
• Specialised WfMS: SCIPION, NMRPipeline
Workflow lifecycle support
Workflows as FAIR Digital Objects
Principles of the Collaboratory: Honour legacy & diversity
https://fairdo.org
https://fairdo.org/wg/fdo-cwfr/
EOSC interoperability framework 2021
Workflows as FAIR Digital Objects
Encourage workflow communities to
make workflows data-FAIR
Support what we already have and
communities actually use
Towards adoption and sustainability
Open federated ecosystem of services
Open ended standards and metadata
exchange for glue
Open communities
The EOSC-Life Workflow Collaboratory Infra Roadmap
People, workflows, services and standards for FAIR Workflows.
CONTAINERS
WF REPOS
REGISTRIES
<my script>
Dedicated Workflow
Testing and Monitoring
services
Workflow Registry
Existing EOSC, community,
commercial computational
infrastructure
Existing Workflow Mgt & Execution Systems
Community Wf Repos
FAIR Workflow Metadata & Standards Framework
Describe: self-describing workflows with PIDs and metadata.
Flow: move workflows between services and platforms. Conduits, not silos.
Parts: package (scattered) objects linked together by context (metadata files + their objects)
RO-Crate https://www.researchobject.org/ro-crate/
Bioschemas https://bioschemas.org/
Common Workflow Language https://www.commonwl.org
GA4GH TRS https://ga4gh.github.io/tool-registry-service-schemas/
Practical, lightweight approach
Machine and human readable, search engine friendly
and developer familiar
FAIR Object Underware
Standard Web Native PIDs + JSON-LD +
Schema.org, off the shelf archiving formats
Self-describing, duck-typed by profiles +
add more schema.org and domain
ontologies
Extensible, with descriptive and content
open-endedness, honouring legacy, diversity,
and known and unknown unknowns - one size
does not fit all
A Graph inside the RO-Crate
PIDs connect the Graph to the
outside world
http://www.researchobject.org/ro-crate/
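To illustrate "PIDs + JSON-LD + Schema.org" and "a Graph inside the RO-Crate", here is a minimal sketch of an `ro-crate-metadata.json` for a single workflow file, following the general shape of the RO-Crate 1.1 specification. The file name, workflow name and licence passed in are hypothetical, and a crate accepted by a real registry would normally carry more metadata (authors, programming language, and so on).

```python
import json


def minimal_workflow_crate(workflow_path, name, license_url):
    """Build a minimal RO-Crate metadata graph describing one workflow file."""
    return {
        "@context": "https://w3id.org/ro/crate/1.1/context",
        "@graph": [
            {
                # Metadata descriptor: says which entity is the crate root.
                "@id": "ro-crate-metadata.json",
                "@type": "CreativeWork",
                "conformsTo": {"@id": "https://w3id.org/ro/crate/1.1"},
                "about": {"@id": "./"},
            },
            {
                # Root dataset: the package itself.
                "@id": "./",
                "@type": "Dataset",
                "name": name,
                "license": {"@id": license_url},
                "hasPart": [{"@id": workflow_path}],
                "mainEntity": {"@id": workflow_path},
            },
            {
                # The workflow file, typed with schema.org / Bioschemas terms.
                "@id": workflow_path,
                "@type": ["File", "SoftwareSourceCode", "ComputationalWorkflow"],
                "name": name,
            },
        ],
    }


# Hypothetical values, for illustration only.
crate = minimal_workflow_crate(
    "workflow.cwl",
    "Example workflow",
    "https://spdx.org/licenses/Apache-2.0",
)
print(json.dumps(crate, indent=2))
```

The entities form a graph through their `@id` references; identifiers that are full URLs (such as the licence) are what connect the graph to the outside world.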
https://workflowhub.eu
https://workflowhub.org
Workflows
• May remain in their host
repositories
• Organised by teams, collections
& properties
• Linked with data, docs …
WorkflowHub
• GA4GH TRS API
• RO-Crate for import/export
• Bioschemas for metadata
• CWL for canonical workflow
description
• Full GitHub integration Fall 2021
Mixed depth of support for WfMS
• Lifting metadata from systems
• RO-Crate / TRS support
• Coupling to execution platforms
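As a sketch of what the GA4GH TRS API offers a client, the helpers below build request URLs following the path layout of the TRS v2 specification. The assumption that the registry serves TRS under `/ga4gh/trs/v2` on its base URL, and the workflow ID and version used in the example, are illustrative only; nothing is fetched here.

```python
from urllib.parse import quote

# Path prefix defined by the GA4GH TRS v2 specification.
TRS_PREFIX = "/ga4gh/trs/v2"


def tools_url(base_url):
    """URL listing all tools (workflows) known to a TRS-compliant registry."""
    return f"{base_url.rstrip('/')}{TRS_PREFIX}/tools"


def descriptor_url(base_url, tool_id, version_id, descriptor_type):
    """URL of the primary descriptor (e.g. the CWL file) for one version."""
    return (
        f"{tools_url(base_url)}/{quote(str(tool_id), safe='')}"
        f"/versions/{quote(str(version_id), safe='')}"
        f"/{descriptor_type}/descriptor"
    )


# Hypothetical workflow ID and version, for illustration only.
print(tools_url("https://workflowhub.eu"))
print(descriptor_url("https://workflowhub.eu", "123", "1", "CWL"))
```

Because the paths are standardised, the same client code can in principle target any registry that implements TRS, which is what allows coupling to execution platforms.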
Linking up providers and users
Building visibility & reputation
Reciprocity to close the
“Find – Get – Use – Credit” loop
Canonical workflows, workflow
blocks and libraries
DOIs, Citation
Companion objects
Versioning
Knowledge Graphs linking out to
OpenAIRE, DataCite etc
Deposit workflows in Zenodo
Workflow Collaboratory
and Collections
Workflow Services: Testing and monitoring
Uses RO-Crates to exchange. Enriches RO-Crates & their metadata
Integrated with WorkflowHub
Central aggregation point for workflow test statuses and outputs from various testing
services (e.g., Travis CI, GitHub Actions, Jenkins, etc.).
Facilitate the periodic automated execution of workflow tests.
Benchmarking and Technical monitoring of bioinformatics tools.
Check workflow performance, provenance on containers, memory
usage …
https://openebench.bsc.es/dashboard
https://lifemonitor.eu/
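The idea of a "central aggregation point for workflow test statuses" can be sketched as follows. This is not the LifeMonitor API: the record layout, the status strings and the function name are all hypothetical, chosen only to show how results from several testing services might be reduced to one workflow-level status.

```python
from collections import Counter


def aggregate_status(results):
    """Reduce per-service test results to a single workflow-level status.

    `results` is a list of dicts with hypothetical keys 'service' and
    'status', where status is either 'passed' or 'failed'.
    """
    if not results:
        return "unknown"
    counts = Counter(r["status"] for r in results)
    if counts["failed"] == 0:
        return "all_passing"
    if counts["passed"] == 0:
        return "all_failing"
    return "some_passing"


# Hypothetical latest results reported by three testing services.
latest = [
    {"service": "GitHub Actions", "status": "passed"},
    {"service": "Jenkins", "status": "failed"},
    {"service": "Travis CI", "status": "passed"},
]
print(aggregate_status(latest))
```

Running such a reduction periodically, rather than only when the workflow changes, is what lets a monitor detect breakage caused by changes in tools, containers or reference data.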
Enable new services
https://github.com/inab/WfExS-backend
Workflow execution service for handling
sensitive human data & analysis
Consumes and creates RO-Crates
UI to start computational tasks based
on containerised software
[Jose Maria Fernandez, Laura Rodriguez-Navas, Salvador Capella, BSC]
Beyond Biology, Beyond Our Infrastructures
Specimen Data Refinery
Natural History Collection
Digitalisation Pipelines
EOSC-Life Workflow Collaboratory
Bringing EOSC to users through workflows
Making a picture out of a jigsaw through
metadata and APIs
Essential to get WfMS on board and adopt
active community efforts
Mainly built through virtual hackathons
and open development
Being rolled out into big EU projects on
infectious diseases and cancer
Consultancy and
training are essential in
infrastructures and
habitually under- or
mis-resourced
Workflow best practice
Delegate to WfMS
communities
PEOPLE
ARE
INFRASTRUCTURE
WorkflowHub Club: an open community effort
Join us on
https://about.workflowhub.eu/community/
EOSC-Life https://www.eosc-life.eu/
RO-Crate https://www.researchobject.org/ro-crate/
WorkflowHub https://workflowhub.eu/
Galaxy Europe https://galaxyproject.eu/
Bioschemas https://bioschemas.org/
Common Workflow Language https://www.commonwl.org/
