FAIR Computational Workflows
Professor Carole Goble
The University of Manchester UK
EU Research Infrastructures: ELIXIR, IBISBA, EOSC-Life
Centre of Excellence: BioExcel
carole.goble@manchester.ac.uk
JOBIM 2021, 8th July 2021
https://tinyurl.com/jobim-goble
Computational Workflows for Data intensive Bioscience
prepare, analyze, and share increasing volumes of complex data
CryoEM Image Analysis
Metagenomic Pipelines
Drug Discovery
Protein Ligand MD
Simulation
Genome Annotation
High Throughput Sequencing
Fabrice Allain
Romain Dallet
20 years+
Computational workflows
decades in the making…finally coming of age….
doi: 10.1093/gigascience/giaa140
Nature 573, 149-150 (2019)
https://doi.org/10.1038/d41586-019-02619-z
What are Data intensive Computational Workflows?
Systematically linking together multiple tools and software packages
inputs
outputs
tools, CLI,
containers,
workflows
Scale up
Access to computational infrastructure
and datasets, tool interoperability,
processing portability and
optimisation, data wrangling.
Specification
description
Software
Execution
WfMS
Engine
Workflow
Scale out
Flexible workflow composition to
construct & run executable control
and data flows using
heterogeneous software packages,
codes, tools, other workflows made
by other people.
SARS-CoV-2 allelic-variant surveillance
Automated monitoring of structured data
from the European COVID-19 Data Portal and
national SARS-CoV-2 sequencing datasets,
notably COG-UK.
Scalable via access to a global distributed
compute network
• Improved data quality
• Uniformly analysed data for downstream
analysis & visualisation
• Submission of data to public archives
• All workflows, data and documentation
available https://covid19.galaxyproject.org
https://elixir-europe.org/news/covid-19-variants-galaxy
https://doi.org/10.1101/2021.03.25.437046
Suite of
workflows
Distributed analysis, Pulsar network
Managed online hosted Workflow as a Service Platform
Designed for direct use by end users - 32K users
Experts build workflows that others can use with their own data
Researchers build and reuse workflows that are shared
End users also use it to access and interact with a tool
Workflow and Tool histories and reporting [Björn Grüning]
Those workflows in the WorkflowHub Registry
Find, publish and cite workflows and
collections. Reuse, recycle, repurpose.
Sharing Accelerates Science
Jacques van Helden
A digital space for
EMERGEN, the French
plan for SARS-CoV-2
genomic surveillance and
research
Adapting and Reusing the ELIXIR
Galaxy Workflows
Tried and tested transparent
methods.
Inter-twingled Workflow System Landscape
Scripting
environments
Interactive Electronic
Research Notebooks
Workflow
Management
Systems & execution
platforms
Repositories Registries
Inter-twingling
Mix and Matching
Interactive &
exploratory
analysis
Production, automated,
workflow-integrated
software
https://s.apache.org/existing-workflow-systems
298 Systems
10 Handy Properties of Computational Workflows
Composition & Abstraction
Using the best codes written by 3rd parties
Handle heterogeneity
Shield complexity & incompatibility
Sharable, reusable, re-mixable methods
Automation
Repetitive reproducible pipelines
Simulation sweeps
Manage data and control flow
Optimised monitoring & recovery
Automated deployment
Scalability & Infrastructure Access
Accessing infrastructures, datasets and tools
Optimised computation and data handling
Parallelisation
Secure sensitive data access & management
Interoperating datasets & permission handling
Reporting & Accreditation
Portability
Sharing & Adaptability
Provenance logging & data lineage
Auto-documentation
Result comparison
Dependency handling
Containerisation & packaging
Moving between on premise & cloud
Shared method, publishable know-how
BYOD / parameters
Different implementations
Changes in execution infrastructure
https://snakemake.github.io/
Workflows are rules:
Graph of jobs for automatic parallelisation,
DIY package & containerisation
installation, auto-documentation
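
To make "workflows are rules" concrete, here is a minimal Python sketch of the idea. It is not Snakemake's own syntax or API. Each rule declares the files it reads and writes, the job graph is inferred by matching one rule's inputs to another rule's outputs, and rules whose dependencies are already satisfied are grouped into levels that could run in parallel. The `Rule` class, the rule names and the file names are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    """A hypothetical rule: a named step declaring the files it reads and writes."""
    name: str
    inputs: tuple
    outputs: tuple


def parallel_levels(rules):
    """Group rule names into levels; rules in one level do not depend on each other."""
    # Map every output file to the rule that produces it.
    producer = {out: r.name for r in rules for out in r.outputs}
    # A rule depends on whichever rules produce its inputs (raw inputs have no producer).
    deps = {
        r.name: {producer[i] for i in r.inputs if i in producer}
        for r in rules
    }
    levels, done = [], set()
    while len(done) < len(deps):
        ready = sorted(n for n, d in deps.items() if n not in done and d <= done)
        if not ready:
            raise ValueError("cyclic dependency between rules")
        levels.append(ready)
        done.update(ready)
    return levels


# Hypothetical two-sample pipeline: trim, align, then call variants jointly.
RULES = [
    Rule("trim_a", ("a.fastq",), ("a.trimmed.fastq",)),
    Rule("trim_b", ("b.fastq",), ("b.trimmed.fastq",)),
    Rule("align_a", ("a.trimmed.fastq",), ("a.bam",)),
    Rule("align_b", ("b.trimmed.fastq",), ("b.bam",)),
    Rule("call_variants", ("a.bam", "b.bam"), ("variants.vcf",)),
]

if __name__ == "__main__":
    for depth, names in enumerate(parallel_levels(RULES)):
        print(depth, names)
```

The developer only states what each step consumes and produces; ordering and the opportunities for parallel execution fall out of the declared files.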
from frameworks to web based analysis platforms, hybrid cloud deployment
Communities tend to cluster round a few systems.
Take-up of a WfMS typically depends on the “plugged-in” availability of data-type
specific codes, the skill level of the workflow developers, and popularity.
Online portals: users build and reuse
workflows around publicly available or
user-uploaded data and pre-wrapped,
pre-installed tools.
Vive la France!
https://galaxy-synbiocad.org/
https://www.biorxiv.org/content/10.1101/2020.06.14.145730v1.full.pdf
[Jean-Loup Faulon]
WORKFLOW
APPLICATION USER
Yes it’s work, Labour saving -> Labour shifting know-how
Production platforms & pipelines
TOOL
DEVELOPER
WORKFLOW
USER
SYS ADMIN WORKFLOW
DEVELOPER
& CUSTODIAN
COMPUTATIONAL
USER
Workflow System as a Platform Workflow System as a Service
Labour
Reach
need
infrastructure
& services
need tools to be
wrapped &
maintained
need workflows to be
developed, tested,
run & maintained
need to find and understand
workflows, with explanations to
use properly and safely.
from compounds &
genomics to tissue banks,
from plants to marine to
humans…
https://lifescience-ri.eu/
An open collaborative
space for digital
biology in Europe
A Workflow and Tools Collaboratory
A data and method commons
Workflows are an entry point to the tools and
datasets of EOSC-Life
functions for production quality FAIR data
processing and access to secure data processing
With thanks: Romain Dallet
Galaxy Genome Annotation (GGA) environment in the cloud
The EOSC-Life Workflow Collaboratory
People -> workflows, services and standards for FAIR Workflows.
Computational Workflow Framings
Reproducibility
Replication
Regulation
Labour saving
Productivity
Reliability
Knowledge sharing
Adaptation
Scholarly Objects
Democratisation
of computational
analysis & methods
Computational Workflow Framing: FAIR Principles
The EOSC-Life FAIR Workflow Collaboratory
A set of guiding principles to enhance the
value of all digital resources and their reuse
by people and by machines
aligning a community around a journey to
common data guidelines
To help accelerate science so folks can find
and reuse and interlink data – and tools
and workflows too!
Consumers and producers all benefit.
Computational Workflow Framing: FAIR Principles
The EOSC-Life FAIR Workflow Collaboratory
FAIR is the EOSC glue to federate
data and services,
to apply to all objects
How the FAIR Principles look
RDA FAIR Data Maturity Model. Specification and Guidelines
https://zenodo.org/record/3909563#.YORYkUzTX19
https://www.go-fair.org/fair-principles/
Wilkinson, M., Dumontier, M., Aalbersberg, I. et al. The FAIR Guiding Principles for scientific data
management and stewardship. Sci Data 3, 160018 (2016). https://doi.org/10.1038/sdata.2016.18
FAIR Principles for Data
tl;dr
https://www.go-fair.org/fair-principles/
Persistent human readable and machine-actionable metadata
Linked metadata and community standards
Persistent identifiers
Clear licensing and access rules
Protocols for machine accessibility
Register / Index
FAIR for Software
Software is a digital object but research software is not (just) data
https://www.rd-alliance.org/groups/fair-4-researchsoftware-fair4rs-wg
FAIR for Research Software (FAIR4RS) working group
Katz et al., 2016; Lamprecht et al., 2019
FAIR4RS First Draft of FAIR4RS principles
CodeMeta
https://github.com/codemeta/codemeta/
https://www.softwareheritage.org/
https://www.cascad.tech/
puts software on a par with publications and data and announces a
number of measures designed to open research software and
better recognize software development in research.
https://cache.media.enseignementsup-recherche.gouv.fr/file/science_ouverte/20/9/MEN_brochure_PNSO_web_1415209.pdf
Data and software are first class objects and
there will be sharing.
Primary responsibility aimed at creators and
providers for benefit of consumers
but consumers need to shoulder responsibility
too.
Operating in an (open) ecosystem.
Adoption at scale in legacy settings.
Not a green-field site.
EOSC-Life FAIR
Workflow Collaboratory
FAIR Implicit Assumptions in the Principles
FAIR Principles for Workflows
Hybrid Processual Digital Objects
Method “Data” Objects
Workflows as
FAIR Software
FAIR+R and FAIR++
The principles can be
revised
Workflows as
FAIR Digital Objects
Data-like method objects
The principles can be
adapted
Workflows as
FAIR Data Instruments
FAIRification of the dataflow
The data principles can be
supported
C. Goble, S. Cohen-Boulakia, S.
Soiland-Reyes, D. Garijo, Y. Gil, M.R.
Crusoe, K. Peters & D. Schober. FAIR
computational workflows. Data
Intelligence 2(2020), 108–121.
doi: 10.1162/dint_a_000
Workflow Objects
Software Objects
Composable
Usable
Reusable
FAIR Data
Abstraction & Reporting
Separation of the workflow specification from its execution & tools
Specification
description
Software
Execution
Precise description of a procedure
composed of multiple steps
coordinated by input/output data
relationships.
Execution of computational and
composite processes with data
consumed & produced by each step.
WfMS
Engine
Workflow
Sub
Workflows
Tools and
codes
Parameters
Inputs
Outputs
Infrastructure
Guidance
Associated
Objects
Data
Logs /
Histories /
Provenance
Services,
e.g. Test engines
+
Related workflows
Checker workflows
Contextual Entities
Metadata Graphs
Sample input
parameters, test data
Software
Management
https://bioexcel.eu/speed-up-your-biomolecular-simulations-with-workflows-using-the-bioexcel-building-blocks-biobb/
Image credit: Bioexcel Centre of excellence
Composition & Portability
Analysis components - different codes/languages/third parties/compute
FAIR Principles for Workflows
coping with Hybrid Processual Digital Objects
Composition & agency
Usable not just reusable
Abstraction forms
Living & reusable parts & whole
versioned, forked, cloned
parts recycled, repurposed, remixed
limited lifespans
citable credit
executability
reproducibility, portability
testing, maturity
quality, maintainability
specification
implementation
instantiation
run result
FAIR+R
FAIR++
modularisation
FAIR parts & dependencies
propagation of FAIR properties
Findable & Accessible
register workflows with assigned PID + metadata in a searchable resource.
https://workflowhub.eu
Publishing Services
Journals
Digital Objects of Scholarship
published, cited, exchanged, reviewed, validated & reused in
new and different ways
• Versioned identifiers
• DOI assignment (https://doi.org/10.48546/workflowhub.workflow.29.2)
• Collections, Canonical workflow libraries
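
The DOI above appears to end in a workflow number followed by a version number. The sketch below assumes that the suffix layout `workflowhub.workflow.<id>.<version>` holds in general. That is inferred from this single example and is not confirmed by the slides. Under that assumption it splits a versioned identifier into its parts and rebuilds the resolver URL; `WorkflowDoi`, `parse_workflow_doi` and `resolver_url` are hypothetical names.

```python
import re
from typing import NamedTuple

# Assumed suffix layout, inferred from the single DOI shown on the slide.
_SUFFIX = re.compile(r"^(10\.\d+)/workflowhub\.workflow\.(\d+)\.(\d+)$")


class WorkflowDoi(NamedTuple):
    prefix: str        # DOI registrant prefix, e.g. "10.48546"
    workflow_id: int   # assumed to be the registry's workflow number
    version: int       # assumed to be the version of that workflow


def parse_workflow_doi(doi: str) -> WorkflowDoi:
    """Split a versioned workflow DOI (bare or as a resolver URL) into its parts."""
    bare = doi.strip()
    for resolver in ("https://doi.org/", "http://doi.org/", "doi:"):
        if bare.lower().startswith(resolver):
            bare = bare[len(resolver):]
            break
    match = _SUFFIX.match(bare)
    if match is None:
        raise ValueError(f"not a versioned workflow DOI: {doi!r}")
    prefix, workflow_id, version = match.groups()
    return WorkflowDoi(prefix, int(workflow_id), int(version))


def resolver_url(parsed: WorkflowDoi) -> str:
    """Rebuild the https://doi.org/ URL for a parsed identifier."""
    return (f"https://doi.org/{parsed.prefix}/"
            f"workflowhub.workflow.{parsed.workflow_id}.{parsed.version}")
```

Citing the versioned identifier, not just the workflow, is what lets a reader retrieve exactly the revision that produced a result.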
scripts
Repos
Containers Deploys
Tools
Agnostic and generous with the many
WfMSs (with different degrees of support)
• Workflows can be in native places
• Metadata standards framework that
all services can adopt on a spectrum
and handles associated objects and
links between objects.
• Perpetual development by an open
community
licensing
authors
& credit
analytics
access
search
versions & status
other
workflows
Biggest challenge?
Metadata of course!
Work with WfMS to auto-extract
metadata + provide metadata services
More than just a list
Spaces, Teams, People
Linking up providers and users
Building visibility & reputation
Reciprocity to close the
“Find – Get – Use – Credit” loop
Research objects to be cited
Build Knowledge Graphs linking
out to OpenAIRE, DataCite and
other tools
FAIR Workflows are FAIR Software
lifecycle support for living objects
Indicators of Status
Workflow
monitoring
Register versions
Version PIDs
Support GitHub Actions
Track authors and contributions
Incremental metadata and
supplementary materials
Track & lift out sub-
workflows
R1.2: (Meta)data and software are associated
with detailed provenance
Tool Registry Service API
Accessible
metadata and workflows are retrievable by their PID using a standardized
communication protocol
GitHub page: https://github.com/ga4gh/tool-registry-service-schemas
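
As a rough illustration of "retrievable by their PID using a standardized communication protocol", the sketch below builds request URLs following the path layout of the GA4GH Tool Registry Service v2 schema (`/tools/{id}/versions/{version_id}/{type}/descriptor`). The base URL is an assumption about where a registry mounts the API, and the tool id, version id and descriptor type in the usage note are hypothetical. No network call is made.

```python
from urllib.parse import quote

# Assumption: the registry exposes TRS v2 under this base path.
TRS_BASE = "https://workflowhub.eu/ga4gh/trs/v2"


def tool_url(tool_id, base=TRS_BASE):
    """URL of one registered tool or workflow, with the id percent-encoded."""
    return f"{base}/tools/{quote(str(tool_id), safe='')}"


def versions_url(tool_id, base=TRS_BASE):
    """URL listing every version of one tool or workflow."""
    return f"{tool_url(tool_id, base)}/versions"


def descriptor_url(tool_id, version_id, descriptor_type, base=TRS_BASE):
    """URL of the primary descriptor (e.g. the CWL or Galaxy file) for one version."""
    return (
        f"{tool_url(tool_id, base)}/versions/"
        f"{quote(str(version_id), safe='')}/{descriptor_type}/descriptor"
    )
```

For example, `descriptor_url("29", "2", "GALAXY")` yields a single address that any TRS-aware engine or platform could fetch, whichever WfMS the workflow was written for.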
FAIR Metadata for Machines
Machine and human readable canonical descriptions of the workflow
that are WfMS neutral
https://www.commonwl.org
Canonical description of the workflow
Linked to containerised tools
Aid collaboration & knowledge transfer
Standardise expression of workflow
Describe engine neutral portable, reusable workflows
Reduce vendor / project lock-in
Enable workflow comparisons
“Abstract” CWL
Design by canonical, modularised workflow blocks
Build a library of tested and validated CWL blocks
CWL:
• Canonical descriptions
• Recycle descriptions and sub-workflows
• Platform independent pipeline exchange and comparison
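
A canonical description can be very small. The sketch below writes a two-step CWL v1.2 workflow as a Python dictionary (CWL documents are YAML, and JSON is valid YAML, so the serialised form is a CWL document). The step names and the referenced `trim.cwl` and `assemble.cwl` tool descriptions are hypothetical. The helper `dangling_sources` is only a toy wiring check and not a substitute for a real CWL validator.

```python
import json

# Hypothetical two-step workflow: trim reads, then assemble them.
WORKFLOW = {
    "cwlVersion": "v1.2",
    "class": "Workflow",
    "inputs": {"reads": "File"},
    "outputs": {
        "contigs": {"type": "File", "outputSource": "assemble/contigs"},
    },
    "steps": {
        "trim": {
            "run": "trim.cwl",            # hypothetical tool description
            "in": {"reads": "reads"},
            "out": ["trimmed"],
        },
        "assemble": {
            "run": "assemble.cwl",        # hypothetical tool description
            "in": {"reads": "trim/trimmed"},
            "out": ["contigs"],
        },
    },
}


def dangling_sources(wf):
    """Return every source that is neither a workflow input nor a step output."""
    known = set(wf["inputs"])
    for step_name, step in wf["steps"].items():
        known.update(f"{step_name}/{out}" for out in step["out"])
    used = [src for step in wf["steps"].values() for src in step["in"].values()]
    used += [out["outputSource"] for out in wf["outputs"].values()]
    return sorted(src for src in used if src not in known)


if __name__ == "__main__":
    print(json.dumps(WORKFLOW, indent=2))
    print("dangling:", dangling_sources(WORKFLOW))
```

Because each step names only its inputs, outputs and a tool description, a block such as `trim` can be lifted out and reused in another workflow, or the whole description compared with one exported from a different platform.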
Rob Finn
Folker Meyer
AWE
MEGAHIT
Assembly
pipeline
[with thanks to Rob Finn]
Extensible Metadata Framework
that caters for all those processual FAIR criteria
Common metadata
about the workflow,
tools & parameters
Canonical workflow
description of the
steps of the workflow
Type the input and outputs
of the steps
Run Provenance / Histories / Tests
Format for packaging a
workflow, its metadata and
companion objects (links to
containers, data etc) for
exchange, archiving,
reporting, citing.
FAIR Digital Object
All Open Communities
Bioschemas lightweight metadata
Extensible and Linked metadata in service of the Life Science Community
Open community reusing industry de facto standard
Computation workflow profile
Formal
parameter
profile
https://bioschemas.org
Opinionated use of schema.org, the web
resource mark-up used by search engines,
knowledge graphs and increasingly science
as a whole.
Computational tool
Herve
Menager
Pasteur
Alban
Gaignard
Nantes
Workflow Digital Objects
Lightweight way of packaging everything together regardless where or what it is
https://www.researchobject.org/ro-crate/
Format for packaging up scattered resources and self
describing the package and its parts to get an integrated
view + context, using metadata and PIDs to reference
digital and real things - data, workflows & people, places.
Web-native, off the
shelf - machine and
human readable,
search engine &
developer friendly.
Infrastructure
independent &
self-describing
PIDs, JSON-LD,
Schema.org,
archive formats
Extensible and open-
ended to cope with
diversity and legacy
“Duck typing”
using profiles +
added schema.org
and domain
ontologies
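
To show what "self-describing" means in practice, the sketch below assembles a minimal `ro-crate-metadata.json` for a crate holding one workflow, following the general shape of RO-Crate 1.1: a JSON-LD `@graph` with a metadata descriptor, a root dataset, and the workflow file typed as a `ComputationalWorkflow`. The function name, file name, workflow name and the local `#cwl` language identifier are illustrative assumptions; a real crate would normally carry more metadata (authors, dates, tests).

```python
import json


def workflow_crate(workflow_path, name, license_url, language_name, language_url):
    """Build the metadata document for a crate whose main entity is one workflow."""
    language_id = "#" + language_name.lower().replace(" ", "-")
    return {
        "@context": "https://w3id.org/ro/crate/1.1/context",
        "@graph": [
            {   # Descriptor: says which entity this metadata file describes.
                "@id": "ro-crate-metadata.json",
                "@type": "CreativeWork",
                "conformsTo": {"@id": "https://w3id.org/ro/crate/1.1"},
                "about": {"@id": "./"},
            },
            {   # Root dataset: the package as a whole.
                "@id": "./",
                "@type": "Dataset",
                "name": name,
                "license": {"@id": license_url},
                "hasPart": [{"@id": workflow_path}],
                "mainEntity": {"@id": workflow_path},
            },
            {   # The workflow file itself.
                "@id": workflow_path,
                "@type": ["File", "SoftwareSourceCode", "ComputationalWorkflow"],
                "name": name,
                "programmingLanguage": {"@id": language_id},
            },
            {   # Contextual entity describing the workflow language.
                "@id": language_id,
                "@type": "ComputerLanguage",
                "name": language_name,
                "url": {"@id": language_url},
            },
        ],
    }


def entity(crate, entity_id):
    """Look up one entity in the crate's graph by its @id."""
    return next(e for e in crate["@graph"] if e["@id"] == entity_id)


if __name__ == "__main__":
    crate = workflow_crate(
        "assembly.cwl",                       # hypothetical workflow file
        "Example assembly workflow",          # hypothetical name
        "https://spdx.org/licenses/MIT",
        "CWL",
        "https://www.commonwl.org/",
    )
    print(json.dumps(crate, indent=2))
```

Everything is referenced by identifier, so the same pattern can point at containers, datasets or people that live elsewhere, without copying them into the package.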
RO-Crate Profile Variants
Galaxy-
Workflow-
RO-Crate
Workflow-RO-Crate
Workflow-
Testing-
RO-Crate
Workflow-
Run-
RO-Crate
BioComputeObject
-RO-Crate
IEEE P2791-2020
https://www.researchobject.org/ro-crate/profiles
BioComputeObject - Regulation
why and how to use a workflow IEEE P2791-2020
robust, safe exchange & reuse of
HTS computational analytical
workflows
http://biocomputeobject.org
Alterovitz, Dean II, Goble, Crusoe, Soiland-Reyes et al “Enabling Precision Medicine via standard
communication of NGS provenance, analysis, and results” PLOS Biology 2018,
https://doi.org/10.1371/journal.pbio.3000099
https://biocompute-objects.github.io/bco-ro-crate/
“Sidecar” third party metadata files
inside the RO-Crate
FAIR has to operate in a
legacy ecosystem
format
FAIR Digital Objects
RO-Crate a step towards FAIR Digital Object Middleware
“To be FAIR each digital object
type has its own metadata
requirements, and may have its
own repositories and registries”
FAIR Digital Objects for Science: From Data Pieces to Actionable
Knowledge Units: https://doi.org/10.3390/publications8020021
https://fairdo.org
https://fairdo.org/wg/fdo-cwfr/
Lightweight Semantic Workflow Underware is ready!
A2. metadata are accessible, even when the workflow is no
longer available
Metadata preservation...beyond any one service.
RO-Crate archive preserves metadata and workflow,
republished in a long-term archive
Archiving
General
Executing
Testing & Monitoring
WfMS
R1. workflows are richly described with a plurality of
accurate and relevant attributes
Automating metadata as much as possible, which
means on-boarding WfMS and FAIR services
Enough metadata that a workflow is read-
reproducible as a method description
FAIR Software - not just Reusable but Usable
i.e. can be executed once accessed
Multiple wf/test
backends: Galaxy
Planemo, CWL,
Jenkins …
Check workflow
performance,
provenance on
containers,
memory usage …
Testing and monitoring
Containers & Packaging
FAIR+R
FAIR++
Tool Registry Service API
UI to start
computational tasks
based on
containerised
software
https://github.com/inab/WfExS-backend
High-level workflow
execution service
backend, sensitive
data analysis &
running on private
clouds, produces &
consumes RO-Crate
Reproducibility – Repeatability
Provenance & Preservation
Workflow-Run-RO-Crate
Some heavy lifting … when is FAIR enough?
https://iitdbgroup.github.io/ProvenanceWeek2021/
July 22nd 2021
It’s free!!
R1.2: (Meta)data and software are associated with
detailed provenance - not just the workflow but the
run record associated with the data it produced ….
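
A run record can be sketched as a small structure that ties a workflow version and its parameters to checksums of the data consumed and produced. The layout below is a hypothetical illustration of that idea and is not the Workflow-Run-RO-Crate format; the field names and the example identifier are assumptions.

```python
import hashlib
import json
from datetime import datetime, timezone


def sha256_of(data: bytes) -> str:
    """Checksum that lets a later reader confirm they hold the same bytes."""
    return hashlib.sha256(data).hexdigest()


def run_record(workflow_id, parameters, inputs, outputs, started, ended):
    """Describe one execution; `inputs` and `outputs` map a name to its bytes."""
    return {
        "workflow": workflow_id,
        "parameters": dict(parameters),
        "started": started.isoformat(),
        "ended": ended.isoformat(),
        "inputs": {name: sha256_of(data) for name, data in inputs.items()},
        "outputs": {name: sha256_of(data) for name, data in outputs.items()},
    }


if __name__ == "__main__":
    now = datetime.now(timezone.utc)
    record = run_record(
        "https://doi.org/10.48546/workflowhub.workflow.29.2",
        {"min_quality": 20},                  # hypothetical parameter
        {"reads.fastq": b"@r1\nACGT\n+\nIIII\n"},
        {"report.txt": b"1 read processed\n"},
        started=now,
        ended=now,
    )
    print(json.dumps(record, indent=2))
```

Stored alongside the outputs, such a record answers which workflow version, with which settings, produced a given file.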
FAIR Interoperability and Reusability = Composability
*Reusable (can be understood, modified, built
upon or incorporated into other software)
Software interoperates with other software through community
standard APIs and community standard (meta)data
Software includes qualified references to other objects
Richly described
Well documented
Licensed
Sample input parameters and test data
Checker workflows
Track versions
Programmatic access to (meta)data
Libraries of canonical workflow blocks
Make tools workflow-ready
Wrap tools
*FAIR4RS Proposed Principles for FAIR Software
Design for FAIR Data
Design for Reuse
Community Review
Community Curation
Certification
Best Practice
Licence combinations
Access permissions
Local -> Global identifiers
FAIR takes a village
it's a JOINT responsibility and opportunity!
In order for data to be
FAIR, you need services
that enable FAIR
Be a good plug-in tool and data citizen
enable programmatic access to datasets
make clean tool interface
avoid usage restrictions
use open community data standards and formats
simplify installation
code for portability, parallelisation & reproducibility
manage versions
register! document!
Be a good workflow maker......and user
use and make FAIR identifiers for data
license data outputs
use open community data standards and formats
validate parameters
use a WfMS that tracks data provenance
consider secure data processing
manage versions
design tests and test data
credit tool and sub-workflow makers
choose FAIR data services
register! document! build libraries!
use well documented FAIR
enabling and FAIR workflows
credit the makers!
FAIR takes a village
it's a JOINT responsibility and FAIR ≠ FREE
Advocate standards & practice
Sustain and manage infrastructure
Credit and incentives
Maturity models & metrics
Certification and canonical libraries
In order for data to be
FAIR, you need services
that enable FAIR
Training,
Stewardship &
Sustainability
Workflows are an entry point to
the tools and datasets of EOSC-Life
and functions for FAIR data.
FAIR Computational Workflows: TL;DL
Modern bioinformatics increasingly leans on computational workflows as
production workhorses and for transparent, reproducible processing.
Workflows democratise access to data and infrastructure and sharing of
complex processing.
Workflows are hybrid Digital Objects of scholarship that should be FAIR
which means defining FAIR, and the necessary standards, services and
processes.
FAIR is an opportunity and necessity to get wider uptake of workflows
FAIR data, workflows and their infrastructure and everything else takes a
village where everyone shoulders responsibility for the benefit of all.
Acknowledgements
The WorkflowHub Club, Bioschemas Community, RO-Crate
Community, CWL Community, Galaxy Europe, EOSC-Life
and ELIXIR Tools Platform.
Special Thanks
Stian Soiland-Reyes (U Manchester / U Amsterdam)
Paul Brack, Stuart Owen, Finn Bacall, Alan Williams (U Manchester)
Björn Grüning (U Freiburg)
Frederik Coppens (VIB)
Sarah Jones (GEANT)
Herve Menager (Pasteur Institute)
Sarah Cohen-Boulakia (U Paris Saclay)
Dan Katz (U Illinois Urbana-Champaign)
Simone Leo (CRS4)
Laura Rodriguez-Navas (BSC)
José Mª Fernández (BSC)
EOSC-Life https://www.eosc-life.eu/
ELIXIR http://elixir-europe.org
RO-Crate https://www.researchobject.org/ro-crate/
WorkflowHub https://workflowhub.eu/ and workflowhub.org
Galaxy Europe https://galaxyproject.eu/
Bioschemas https://bioschemas.org
Common Workflow Language https://www.commonwl.org/

FAIRy stories: tales from building the FAIR Research Commons
 
Reproducible Research: how could Research Objects help
Reproducible Research: how could Research Objects helpReproducible Research: how could Research Objects help
Reproducible Research: how could Research Objects help
 
Reflections on a (slightly unusual) multi-disciplinary academic career
Reflections on a (slightly unusual) multi-disciplinary academic careerReflections on a (slightly unusual) multi-disciplinary academic career
Reflections on a (slightly unusual) multi-disciplinary academic career
 
Better Software, Better Research
Better Software, Better ResearchBetter Software, Better Research
Better Software, Better Research
 
Reproducibility (and the R*) of Science: motivations, challenges and trends
Reproducibility (and the R*) of Science: motivations, challenges and trendsReproducibility (and the R*) of Science: motivations, challenges and trends
Reproducibility (and the R*) of Science: motivations, challenges and trends
 
Research Object Community Update
Research Object Community UpdateResearch Object Community Update
Research Object Community Update
 
Introduction to FAIRDOM
Introduction to FAIRDOMIntroduction to FAIRDOM
Introduction to FAIRDOM
 
Being FAIR: Enabling Reproducible Data Science
Being FAIR: Enabling Reproducible Data ScienceBeing FAIR: Enabling Reproducible Data Science
Being FAIR: Enabling Reproducible Data Science
 

Recently uploaded

Warming the earth and the atmosphere.pptx
Warming the earth and the atmosphere.pptxWarming the earth and the atmosphere.pptx
Warming the earth and the atmosphere.pptx
GlendelCaroz
 
Nanoparticles for the Treatment of Alzheimer’s Disease_102718.pptx
Nanoparticles for the Treatment of Alzheimer’s Disease_102718.pptxNanoparticles for the Treatment of Alzheimer’s Disease_102718.pptx
Nanoparticles for the Treatment of Alzheimer’s Disease_102718.pptx
ssusera4ec7b
 
Heat Units in plant physiology and the importance of Growing Degree days
Heat Units in plant physiology and the importance of Growing Degree daysHeat Units in plant physiology and the importance of Growing Degree days
Heat Units in plant physiology and the importance of Growing Degree days
Brahmesh Reddy B R
 
HIV AND INFULENZA VIRUS PPT HIV PPT INFULENZA VIRUS PPT
HIV AND INFULENZA VIRUS PPT HIV PPT  INFULENZA VIRUS PPTHIV AND INFULENZA VIRUS PPT HIV PPT  INFULENZA VIRUS PPT

Recently uploaded (20)

Fun for mover student's book- English book for teaching.pdf
Fun for mover student's book- English book for teaching.pdfFun for mover student's book- English book for teaching.pdf
Fun for mover student's book- English book for teaching.pdf
 
Introduction and significance of Symbiotic algae
Introduction and significance of  Symbiotic algaeIntroduction and significance of  Symbiotic algae
Introduction and significance of Symbiotic algae
 
Technical english Technical english.pptx
Technical english Technical english.pptxTechnical english Technical english.pptx
Technical english Technical english.pptx
 
GBSN - Biochemistry (Unit 3) Metabolism
GBSN - Biochemistry (Unit 3) MetabolismGBSN - Biochemistry (Unit 3) Metabolism
GBSN - Biochemistry (Unit 3) Metabolism
 
GBSN - Microbiology (Unit 5) Concept of isolation
GBSN - Microbiology (Unit 5) Concept of isolationGBSN - Microbiology (Unit 5) Concept of isolation
GBSN - Microbiology (Unit 5) Concept of isolation
 
Heads-Up Multitasker: CHI 2024 Presentation.pdf
Heads-Up Multitasker: CHI 2024 Presentation.pdfHeads-Up Multitasker: CHI 2024 Presentation.pdf
Heads-Up Multitasker: CHI 2024 Presentation.pdf
 
Warming the earth and the atmosphere.pptx
Warming the earth and the atmosphere.pptxWarming the earth and the atmosphere.pptx
Warming the earth and the atmosphere.pptx
 
Nanoparticles for the Treatment of Alzheimer’s Disease_102718.pptx
Nanoparticles for the Treatment of Alzheimer’s Disease_102718.pptxNanoparticles for the Treatment of Alzheimer’s Disease_102718.pptx
Nanoparticles for the Treatment of Alzheimer’s Disease_102718.pptx
 
Molecular and Cellular Mechanism of Action of Hormones such as Growth Hormone...
Molecular and Cellular Mechanism of Action of Hormones such as Growth Hormone...Molecular and Cellular Mechanism of Action of Hormones such as Growth Hormone...
Molecular and Cellular Mechanism of Action of Hormones such as Growth Hormone...
 
Heat Units in plant physiology and the importance of Growing Degree days
Heat Units in plant physiology and the importance of Growing Degree daysHeat Units in plant physiology and the importance of Growing Degree days
Heat Units in plant physiology and the importance of Growing Degree days
 
Soil and Water Conservation Engineering (SWCE) is a specialized field of stud...
Soil and Water Conservation Engineering (SWCE) is a specialized field of stud...Soil and Water Conservation Engineering (SWCE) is a specialized field of stud...
Soil and Water Conservation Engineering (SWCE) is a specialized field of stud...
 
GBSN - Microbiology (Unit 4) Concept of Asepsis
GBSN - Microbiology (Unit 4) Concept of AsepsisGBSN - Microbiology (Unit 4) Concept of Asepsis
GBSN - Microbiology (Unit 4) Concept of Asepsis
 
An Overview of Active and Passive Targeting Strategies to Improve the Nano-Ca...
An Overview of Active and Passive Targeting Strategies to Improve the Nano-Ca...An Overview of Active and Passive Targeting Strategies to Improve the Nano-Ca...
An Overview of Active and Passive Targeting Strategies to Improve the Nano-Ca...
 
THE FUNDAMENTAL UNIT OF LIFE CLASS IX.ppt
THE FUNDAMENTAL UNIT OF LIFE CLASS IX.pptTHE FUNDAMENTAL UNIT OF LIFE CLASS IX.ppt
THE FUNDAMENTAL UNIT OF LIFE CLASS IX.ppt
 
X-rays from a Central “Exhaust Vent” of the Galactic Center Chimney
X-rays from a Central “Exhaust Vent” of the Galactic Center ChimneyX-rays from a Central “Exhaust Vent” of the Galactic Center Chimney
X-rays from a Central “Exhaust Vent” of the Galactic Center Chimney
 
HIV AND INFULENZA VIRUS PPT HIV PPT INFULENZA VIRUS PPT
HIV AND INFULENZA VIRUS PPT HIV PPT  INFULENZA VIRUS PPTHIV AND INFULENZA VIRUS PPT HIV PPT  INFULENZA VIRUS PPT
HIV AND INFULENZA VIRUS PPT HIV PPT INFULENZA VIRUS PPT
 
Harry Coumnas Thinks That Human Teleportation is Possible in Quantum Mechanic...
Harry Coumnas Thinks That Human Teleportation is Possible in Quantum Mechanic...Harry Coumnas Thinks That Human Teleportation is Possible in Quantum Mechanic...
Harry Coumnas Thinks That Human Teleportation is Possible in Quantum Mechanic...
 
NuGOweek 2024 programme final FLYER short.pdf
NuGOweek 2024 programme final FLYER short.pdfNuGOweek 2024 programme final FLYER short.pdf
NuGOweek 2024 programme final FLYER short.pdf
 
Factor Causing low production and physiology of mamary Gland
Factor Causing low production and physiology of mamary GlandFactor Causing low production and physiology of mamary Gland
Factor Causing low production and physiology of mamary Gland
 
PARENTAL CARE IN FISHES.pptx for 5th sem
PARENTAL CARE IN FISHES.pptx for 5th semPARENTAL CARE IN FISHES.pptx for 5th sem
PARENTAL CARE IN FISHES.pptx for 5th sem
 

FAIR Computational Workflows

  • 1. FAIR Computational Workflows Professor Carole Goble The University of Manchester UK EU Research Infrastructures: ELIXIR, IBISBA, EOSC-Life Centre of Excellence: BioExcel carole.goble@manchester.ac.uk JOBIM 2021, 8th July 2021 https://tinyurl.com/jobim-goble
  • 2. Computational Workflows for Data intensive Bioscience prepare, analyze, and share increasing volumes of complex data CryoEM Image Analysis Metagenomic Pipelines Drug Discovery Protein Ligand MD Simulation Genome Annotation High Throughput Sequencing Fabrice Allain Romain Dallet
  • 3. 20 years+ Computational workflows decades in the making…finally coming of age…. doi: 10.1093/gigascience/giaa140 Nature 573, 149-150 (2019) https://doi.org/10.1038/d41586-019-02619-z
  • 4. What are Data intensive Computational Workflows? Systematic linking together multiple tools and software packages inputs outputs tools, CLI, containers, workflows Scale up Access to computational infrastructure and datasets, tool interoperability, processing portability and optimisation, data wrangling. Specification description Software Execution WfMS Engine Workflow Scale out Flexible workflow composition to construct & run executable control and data flows using heterogeneous software packages, codes, tools, other workflows made by other people.
  • 5. SARS-CoV-2 allelic-variant surveillance Automated monitoring of structured data from the European COVID-19 Data Portal and national SARS-CoV-2 sequencing datasets, notably COG-UK. Scalable via access to a global distributed compute network • Improved data quality • Uniformly analysed data for downstream analysis & visualisation • Submission of data to public archives • All workflows, data and documentation available https://covid19.galaxyproject.org https://elixir-europe.org/news/covid-19-variants-galaxy https://doi.org/10.1101/2021.03.25.437046 Suite of workflows
  • 6. Distributed analysis, Pulsar network Managed online hosted Workflow as a Service Platform Designed for direct use by end users - 32K users Experts build workflows that others can use with their own data Researchers build and reuse workflows that are shared End users also use it to access and interact with a tool Workflow and Tool histories and reporting [Björn Grüning]
  • 7. Those workflows in the WorkflowHub Registry Find, publish and cite workflows and collections. Reuse, recycle, repurpose.
  • 8. Sharing Accelerates Science Jacques van Helden A digital space for EMERGEN, the French plan for SARS-CoV-2 genomic surveillance and research Adapting and Reusing the ELIXIR Galaxy Workflows Tried and tested transparent methods.
  • 9. Inter-twingled Workflow System Landscape Scripting environments Interactive Electronic Research Notebooks Workflow Management Systems & execution platforms Repositories Registries Inter-twingling Mix and Matching Interactive & exploratory analysis Production, automated, workflow-integrated software https://s.apache.org/existing-workflow-systems 298 Systems
  • 10. 10 Handy Properties of Computational Workflows Composition & Abstraction Using the best codes written by 3rd parties Handle heterogeneity Shield complexity & incompatibility Sharable reusable, re-mixable methods Automation Repetitive reproducible pipelines Simulation sweeps Manage data and control flow Optimised monitoring & recovery Automated deployment Scalability & Infrastructure Access Accessing infrastructures, datasets and tools Optimised computation and data handling Parallelisation Secure sensitive data access & management Interoperating datasets & permission handling Reporting & Accreditation Portability Sharing & Adaptability Provenance logging & data lineage Auto-documentation Result comparison Dependency handling Containerisation & packaging Moving between on premise & cloud Shared method, publishable know-how BYOD / parameters Different implementations Changes in execution infrastructure
  • 11. https://snakemake.github.io/ Workflows are rules: Graph of jobs for automatic parallelisation, DIY package & containerisation installation, auto-documentation from frameworks to web based analysis platforms, hybrid cloud deployment Communities tend to cluster round a few systems. Take up of a WfMS typically depends on the “plugged-in” availability of data type specific codes, skills level of the workflow developers, and popularity. Online portals users build and reuse workflows around publicly available or user-uploaded data and pre-wrapped, pre-installed tools.
  • 13. WORKFLOW APPLICATION USER Yes it’s work, Labour saving -> Labour shifting know-how Production platforms & pipelines TOOL DEVELOPER WORKFLOW USER SYS ADMIN WORKFLOW DEVELOPER & CUSTODIAN COMPUTATIONAL USER Workflow System as a Platform Workflow System as a Service Labour Reach need infrastructure & services need tools to be wrapped & maintained need workflows to be developed, tested, run & maintained need to find and understand workflows, with explanations to use properly and safely.
  • 14. from compounds & genomics to tissue banks, from plants to marine to humans… https://lifescience-ri.eu/ An open collaborative space for digital biology in Europe
  • 15. A Workflow and Tools Collaboratory A data and method commons Workflows are an entry point to the tools and datasets of EOSC-Life functions for production quality FAIR data processing and access to secure data processing With thanks: Romain Dallet Galaxy Genome Annotation (GGA) environment in the cloud
  • 16. The EOSC-Life Workflow Collaboratory People -> workflows, services and standards for FAIR Workflows.
  • 17. Computational Workflow Framings Reproducibility Replication Regulation Labour saving Productivity Reliability Knowledge sharing Adaption Scholarly Objects Democratisation of computational analysis & methods
  • 18. Computational Workflow Framing: FAIR Principles The EOSC-Life FAIR Workflow Collaboratory A set of guiding principles to enhance the value of all digital resources and their reuse by people and by machines aligning a community around a journey to common data guidelines To help accelerate science so folks can find and reuse and interlink data – and tools and workflows too! Consumers and producers all benefit.
  • 19. Computational Workflow Framing: FAIR Principles The EOSC-Life FAIR Workflow Collaboratory FAIR is the EOSC glue to federate data and services, to apply to all objects
  • 20. How the FAIR Principles look RDA FAIR Data Maturity Model. Specification and Guidelines https://zenodo.org/record/3909563#.YORYkUzTX19 https://www.go-fair.org/fair-principles/ Wilkinson, M., Dumontier, M., Aalbersberg, I. et al. The FAIR Guiding Principles for scientific data management and stewardship. Sci Data 3, 160018 (2016). https://doi.org/10.1038/sdata.2016.18
  • 21. FAIR Principles for Data tl;dr https://www.go-fair.org/fair-principles/ Persistent human readable and machine-actionable metadata Linked metadata and community standards Persistent identifiers Clear licensing and access rules Protocols for machine accessibility Register / Index
  • 22. FAIR for Software Software is a digital object but research software is not (just) data https://www.rd-alliance.org/groups/fair-4-researchsoftware-fair4rs-wg FAIR for Research Software (FAIR4RS) working group Katz et al., 2016; Lamprecht et al., 2019 FAIR4RS First Draft of FAIR4RS principles CodeMeta https://github.com/codemeta/codemeta/
  • 23. https://www.softwareheritage.org/ https://www.cascad.tech/ puts software on a par with publications and data and announces a number of measures designed to open research software and better recognize software development in research. https://cache.media.enseignementsup-recherche.gouv.fr/file/science_ouverte/20/9/MEN_brochure_PNSO_web_1415209.pdf
  • 24. Data and software are first class objects and there will be sharing. Primary responsibility aimed at creators and providers for benefit of consumers but consumers need to shoulder responsibility too. Operating in an (open) ecosystem. Adoption at scale in legacy settings. Not a green-field site. EOSC-Life FAIR Workflow Collaboratory FAIR Implicit Assumptions in the Principles
  • 25. FAIR Principles for Workflows Hybrid Processual Digital Objects Method “Data” Objects Workflows as FAIR Software FAIR+R and FAIR++ The principles can be revised Workflows as FAIR Digital Objects Data-like method objects The principles can be adapted Workflows as FAIR Data Instruments FAIRification of the dataflow The data principles can be supported C. Goble, S. Cohen-Boulakia, S. Soiland-Reyes, D. Garijo, Y. Gil, M.R. Crusoe, K. Peters & D. Schober. FAIR computational workflows. Data Intelligence 2(2020), 108–121. doi: 10.1162/dint_a_000 Workflow Objects Software Objects Composable Usable Reusable FAIR Data
  • 26. Abstraction & Reporting Separation of the workflow specification from its execution & tools Specification description Software Execution Precise description of a procedure composed of multiple steps coordinated by input/output data relationships. Execution of computational and composted processes with data consumed & produced by each step. WfMS Engine Workflow Sub Workflows Tools and codes Parameters Inputs Outputs Infrastructure Guidance Associated Objects Data Logs / Histories / Provenance Services, e.g. Test engines + Related workflows Checker workflows Contextual Entities Metadata Graphs Sample input parameters, test data Software Management
  • 27. https://bioexcel.eu/speed-up-your-biomolecular-simulations-with-workflows-using-the-bioexcel-building-blocks-biobb/ Image credit: Bioexcel Centre of excellence Composition & Portability Analysis components - different codes/languages/third parties/compute
  • 28. FAIR Principles for Workflows coping with Hybrid Processual Digital Objects Composition & agency Usable not just reusable Abstraction forms Living & reusable parts & whole versioned, forked, cloned parts recycled, repurposed, remixed limited lifespans citable credit executability reproducibility, portability testing, maturity quality, maintainability specification implementation instantiation run result FAIR+R FAIR++ modularisation FAIR parts & dependencies propagation of FAIR properties
  • 29. Findable & Accessible register workflows with assigned PID + metadata in a searchable resource. https://workflowhub.eu Publishing Services Journals Digital Objects of Scholarship published, cited, exchanged, reviewed, validated & reused in new and different ways • Versioned identifiers • DOI assignment (https://doi.org/10.48546/workflowhub.workflow.29.2) • Collections, Canonical workflow libraries scripts Repos Containers Deploys Tools Agnostic and generous with the many WfMSs (with different degrees of support) • Workflows can be in native places • Metadata standards framework that all services can adopt on a spectrum and handles associated objects and links between objects. • Perpetual development by an open community
  • 30. licensing authors & credit analytics access search versions & status other workflows Biggest challenge? Metadata of course! Work with WfMS to auto-extract metadata + provide metadata services
  • 31. More than just a list 3 Spaces, Teams, People Linking up providers and users Building visibility & reputation Reciprocity to close the “Find – Get– Use – Credit” loop Research objects to be cited Build Knowledge Graphs linking out to OpenAIRE, DataCite and other tools
  • 32. FAIR Workflows are FAIR Software lifecycle support for living objects Indicators of Status Workflow monitoring Register versions Version PIDs Support GitHub Actions Track authors and contributions Incremental metadata and supplementary materials Track & lift out sub-workflows R1.2: (Meta)data and software are associated with detailed provenance
  • 33. Tool Registry Service API Accessible metadata and workflows are retrievable by their PID using a standardized communication protocol GitHub page: https://github.com/ga4gh/tool-registry-service-schemas
  • 34. FAIR Metadata for Machines Machine and human readable canonical descriptions of the workflow that are WfMS neutral https://www.commonwl.org Canonical description of the workflow Linked to containerised tools Aid collaboration & knowledge transfer Standardise expression of workflow Describe engine neutral portable, reusable workflows Reduce vendor / project lock-in Enable workflow comparisons “Abstract” CWL
  • 35. Design by canonical, modularised workflow blocks Build a library of tested and validated CWL blocks CWL: • Canonical descriptions • Recycle descriptions and sub-workflows • Platform independent pipeline exchange and comparison Rob Finn Folker Meyer AWE MEGAHIT Assembly pipeline [with thanks to Rob Finn]
  • 36. Extensible Metadata Framework that caters for all those processual FAIR criteria Common metadata about the workflow, tools & parameters Canonical workflow description of the steps of the workflow Type the input and outputs of the steps Run Provenance / Histories / Tests Format for packaging a workflow, its metadata and companion objects (links to containers, data etc) for exchange, archiving, reporting, citing. FAIR Digital Object All Open Communities
  • 37. Bioschemas lightweight metadata Extensible and Linked metadata in service of the Life Science Community Open community reusing industry de facto standard Computation workflow profile Formal parameter profile https://bioschemas.org Opinionated use of schema.org, the web resource mark-up used by search engines, knowledge graphs and increasingly science as a whole. Computational tool Herve Menager Pasteur Alban Gaignard Nantes
  • 38. Workflow Digital Objects Lightweight way of packaging everything together regardless where or what it is https://www.researchobject.org/ro-crate/ Format for packaging up scattered resources and self describing the package and its parts to get an integrated view + context, using metadata and PIDs to reference digital and real things - data, workflows & people, places. Web-native, off the shelf - machine and human readable, search engine & developer friendly. Infrastructure independent & self-describing PIDs, JSON-LD, Schema.org, archive formats Extensible and open-ended to cope with diversity and legacy “Duck typing” using profiles + added schema.org and domain ontologies
  • 40. BioComputeObject - Regulation why and how to use a workflow IEEE P2791-2020 robust, safe exchange & reuse of HTS computational analytical workflows http://biocomputeobject.org Alterovitz, Dean II, Goble, Crusoe, Soiland-Reyes et al “Enabling Precision Medicine via standard communication of NGS provenance, analysis, and results” PLOS Biology 2018, https://doi.org/10.1371/journal.pbio.3000099 https://biocompute-objects.github.io/bco-ro-crate/ “Sidecar” third party metadata files inside the RO-Crate FAIR has to operate in a legacy ecosystem format
  • 41. FAIR Digital Objects RO-Crate a step towards FAIR Digital Object Middleware “To be FAIR each digital object type has its own metadata requirements, and may have its own repositories and registries” FAIR Digital Objects for Science: From Data Pieces to Actionable Knowledge Units: https://doi.org/10.3390/publications8020021 https://fairdo.org https://fairdo.org/wg/fdo-cwfr/
  • 42. Lightweight Semantic Workflow Underware is ready! A2. metadata are accessible, even when the workflow is no longer available Metadata preservation...beyond any one service. RO-Crate archive preserves metadata and workflow, republished in a long-term archive Archiving General Executing Testing & Monitoring WfMS R1. workflows are richly described with a plurality of accurate and relevant attributes Automating metadata as much as possible, which means on-boarding WfMS and FAIR services Enough metadata that a workflow is read-reproducible as a method description
  • 43. FAIR Software - not just Reusable but Usable i.e. can be executed once accessed Multiple wf/test backends: Galaxy Planemo, CWL, Jenkins … Check workflow performance, provenance on containers, memory usage … Testing and monitoring Containers & Packaging FAIR+R FAIR++ Tool Registry Service API UI to start computational tasks based on containerised software https://github.com/inab/WfExS-backend High-level workflow execution service backend, sensitive data analysis & running on private clouds, produces & consumes RO-Crate
  • 44. Reproducibility – Repeatability Provenance & Preservation Workflow-Run-RO-Crate Some heavy lifting … when is FAIR enough? https://iitdbgroup.github.io/ProvenanceWeek2021/ July 22nd 2021 It’s free!! R1.2: (Meta)data and software are associated with detailed provenance - not just the workflow but the run record associated with the data it produced ….
  • 45. FAIR Interoperability and Reusability = Composability *Reusable (can be understood, modified, built upon or incorporated into other software) Software interoperates with other software through community standard APIs and community standard (meta)data Software includes qualified references to other objects Richly described Well documented Licensed Sample input parameters and test data Checker workflows Track versions Programmatic access to (meta)data Libraries of canonical workflow blocks Make tools workflow-ready Wrap tools *FAIR4RS Proposed Principles for FAIR Software Design for FAIR Data Design for Reuse Community Review Community Curation Certification Best Practice Licence combinations Access permissions Local -> Global identifiers
  • 46. FAIR takes a village it's a JOINT responsibility and opportunity! In order for data to be FAIR, you need services that enable FAIR Be a good plug-in tool and data citizen enable programmatic access to datasets make clean tool interface avoid usage restrictions use open community data standards and formats simplify installation code for portability, parallelisation & reproducibility manage versions register! document! Be a good workflow maker......and user use and make FAIR identifiers for data license data outputs use open community data standards and formats validate parameters use a WfMS that tracks data provenance consider secure data processing manage versions design tests and test data credit tool and sub-workflow makers choose FAIR data services register! document! build libraries! use well documented FAIR enabling and FAIR workflows credit the makers!
  • 47. FAIR takes a village it's a JOINT responsibility and FAIR ≠ FREE Advocate standards & practice Sustain and manage infrastructure Credit and incentives Maturity models & metrics Certification and canonical libraries In order for data to be FAIR, you need services that enable FAIR Training, Stewardship & Sustainability Workflows are an entry point to the tools and datasets of EOSC-Life and functions for FAIR data.
  • 48. FAIR Computational Workflows: TL;DL Modern bioinformatics increasingly leans on computational workflows as production workhorses and transparent, reproducible processing. Workflows democratise access to data and infrastructure and sharing of complex processing. Workflows are hybrid Digital Objects of scholarship that should be FAIR which means defining FAIR, and the necessary standards, services and processes. FAIR is an opportunity and necessity to get wider uptake of workflows FAIR data, workflows and their infrastructure and everything else takes a village where everyone shoulders responsibility for the benefit of all.
  • 49. Acknowledgements The WorkflowHub Club, Bioschemas Community, RO-Crate Community, CWL Community, Galaxy Europe, EOSC-Life and ELIXIR Tools Platform. Special Thanks Stian Soiland-Reyes (U Manchester / U Amsterdam) Paul Brack, Stuart Owen, Finn Bacall, Alan Williams (U Manchester) Björn Grüning (U Freiburg) Frederik Coppens (VIB) Sarah Jones (GEANT) Herve Menager (Pasteur Institute) Sarah Cohen-Boulakia (U Paris Saclay) Dan Katz (U Illinois Urbana-Champaign) Simone Leo (CRS4) Laura Rodriguez-Navas (BSC) José Mª Fernández (BSC) EOSC-Life https://www.eosc-life.eu/ ELIXIR http://elixir-europe.org RO-Crate https://www.researchobject.org/ro-crate/ WorkflowHub https://workflowhub.eu/ and workflowhub.org Galaxy Europe https://galaxyproject.eu/ Bioschemas https://bioschemas.org Common Workflow Language https://www.commonwl.org/
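
Slide 38 describes RO-Crate as a self-describing package built on JSON-LD, Schema.org and PIDs. As a rough illustration only, the sketch below assembles the kind of `ro-crate-metadata.json` document that such a package might carry for a single workflow file. The function name `build_crate_metadata`, the file name `workflow.cwl`, the workflow name and the licence URL are all hypothetical placeholders, and the layout follows the RO-Crate 1.1 conventions as best understood here rather than anything shown on the slides.

```python
import json


def build_crate_metadata(workflow_path, name, license_url):
    """Return a minimal RO-Crate-style JSON-LD document for one workflow file.

    Illustrative sketch: the arguments are placeholders supplied by the caller.
    """
    # The workflow file, typed so that consumers can recognise it as a workflow.
    workflow = {
        "@id": workflow_path,
        "@type": ["File", "SoftwareSourceCode", "ComputationalWorkflow"],
        "name": name,
    }
    # The root dataset describes the package as a whole and lists its parts.
    root = {
        "@id": "./",
        "@type": "Dataset",
        "name": name,
        "license": {"@id": license_url},
        "hasPart": [{"@id": workflow_path}],
        "mainEntity": {"@id": workflow_path},
    }
    # The metadata descriptor is what makes the package self-describing.
    descriptor = {
        "@id": "ro-crate-metadata.json",
        "@type": "CreativeWork",
        "conformsTo": {"@id": "https://w3id.org/ro/crate/1.1"},
        "about": {"@id": "./"},
    }
    return {
        "@context": "https://w3id.org/ro/crate/1.1/context",
        "@graph": [descriptor, root, workflow],
    }


if __name__ == "__main__":
    # Hypothetical example values; a real crate would use its own file and licence.
    crate = build_crate_metadata(
        "workflow.cwl",
        "Example workflow",
        "https://spdx.org/licenses/Apache-2.0",
    )
    print(json.dumps(crate, indent=2))
```

Because the description is plain JSON-LD that refers to files by identifier, the same metadata can accompany a workflow wherever it is stored, which is the "infrastructure independent" property the slide emphasises.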