This project has received funding from the European Union’s Horizon 2020 research and
Innovation programme under grant agreement No. 825775
Practically FAIR
Presenter: Andrew Stubbs (Erasmus Medical Center)
Host: Marta Lloret Llinares (EMBL-EBI)
This webinar is being recorded
Audience Q&A Session
Please write your questions in the questions window of the GoToWebinar application
The challenges:
Stay informed
@CinecaProject
www.cineca-project.eu
Common Infrastructure for National Cohorts in Europe, Canada and Africa
Accelerating disease research and improving health by facilitating transcontinental human data exchange
The vision:
This project has received funding from the Canadian Institutes of Health Research under grant agreement #404896
Context for the webinar
• CINECA “How FAIR are you?” webinar series and hackathon:
• https://www.cineca-project.eu/news-events-all/how-fair-are-you-webinar-series-and-hackathon
• Webinar series Jan-April
• Introduction to FAIR principles - Open science through FAIR health data networks: dream or reality?
• Making cohort data FAIR
• FAIR software tools
• Practically FAIR
• How to make training FAIR
• Ethics/ELSI considerations
• Hackathon: 28-29 April, 4 hours per day
• 3 streams: cohort data, software, training materials
Today’s presenter
Dr Stubbs is an Associate Professor in the Department of Pathology and Clinical Bioinformatics. The Stubbs group focuses on Artificial Intelligence (AI) in healthcare, translational bioinformatics and FAIR data management in translational and clinical research. His group has applied machine learning to discriminate bacterial from viral infections to reduce the use of antibiotic treatment (Tailored treatment H2020: cordis.europa.eu/project/id/602860) and deep learning to deliver predictive models from multi-omics experiments to improve patient stratification in pancreatic cancer patients (Eurostars iKnowIT grant: www.erasmusmc-rdo.nl/project/iknowit-integrated-knowledge-discovery-it/; Hanarth Fonds: www.erasmusmc-rdo.nl/project/5702/).
His team is the Dutch lead for the European Galaxy project and has implemented Galaxy servers and services supporting cancer research (including neo-antigen prediction), metagenomics (NanoGalaxy) and immune repertoire analysis (ARGalaxy). FAIR data management is a prerequisite for reproducible science and is required by all H2020 projects. To address this requirement, the group has developed a cloud-based FAIR data management and analysis platform (myFAIR) for use in the Canada-European Big Data Federated analysis (CINECA: www.cineca-project.eu/) H2020 project.
Practically FAIR?
CINECA “How FAIR are you” webinar series #1:
Introduction to FAIR Principles
Andrew Stubbs
a.stubbs@erasmusmc.nl
Outline
1. Genomics in Healthcare
2. FAIR Genomics in Healthcare
3. FAIR(-R)
4. Practically FAIR-R.
Personalized Medicine
Precision Medicine Informatics: Principles, Prospects, and Challenges. Afzal et al. 2019. arXiv:1911.01014
Percentage of patients for which a certain drug is ineffective, by therapy class
Genomics in Healthcare
Requirements
• Clinical data: anonymous patient data
• Genomics NGS platforms both internal and external to Hospital
• Genomics Analytical Workflows both internal and external to Hospital
Genomics Healthcare Data Lifecycle
(Gen)-omics Translational Research
Experimental Design Multi-omics Data
FAIR Genomics in Healthcare
What is FAIR Genomics?
Genomic Data Management
https://github.com/fairgenomes
What is FAIR Genomes?
• Aim to develop a national guideline to promote optimal (re)use of NGS data in research and healthcare.
• 61 people from 14 institutes (NL).
• Interacting with EJP-RD CDE, Solve-RD RD3, 1+MG, GA4GH, Phenopackets, X-omics, and others.
Genomic Data Management
https://figshare.com/articles/poster/FAIR_Genomes_Standardizing_a_meta-data_schema_for_FAIRifying_personal_genome_data_workflows/11694693
Genomic Data sharing
• Beacon Protocol for Genomic Data Sharing
• Beacons provide discovery services for genomic data using the Beacon API developed as a key driver project of the Global Alliance for Genomics and Health (GA4GH).
• The Beacon protocol itself defines an open standard for genomics data discovery. It provides a framework for public web services responding to queries against genomic data collections, for instance from population-based or disease-specific genome repositories.
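As a rough illustration of such a discovery query, the sketch below builds an allele request in the style of the Beacon v1 API (a `/query` endpoint that answers with an `exists` flag) and reads the yes/no answer from a response. The host `beacon.example.org`, the helper names and the sample response are assumptions made for this example and do not come from the slides. No network request is sent.

```python
import json
from urllib.parse import urlencode


def build_beacon_query(base_url, reference_name, start, reference_bases,
                       alternate_bases, assembly_id):
    """Return the URL of a Beacon v1-style allele query."""
    params = {
        "referenceName": reference_name,
        "start": start,  # 0-based position in Beacon v1
        "referenceBases": reference_bases,
        "alternateBases": alternate_bases,
        "assemblyId": assembly_id,
    }
    return f"{base_url.rstrip('/')}/query?{urlencode(params)}"


def variant_exists(response_text):
    """Read the yes/no answer from a Beacon allele response body."""
    return bool(json.loads(response_text).get("exists"))


# Hypothetical beacon host and a hand-written sample response.
url = build_beacon_query("https://beacon.example.org", "1", 1000, "A", "T", "GRCh37")
sample_response = '{"beaconId": "org.example.beacon", "exists": true}'
print(url)
print(variant_exists(sample_response))
```

The response reveals only whether the allele has been observed, which is what makes the protocol suitable for discovery ahead of any controlled-access request.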
Genomic Data sharing
Nature Biotechnology, Vol. 37, March 2019, 215–226
Genomic Data sharing
Beacon Data Access Control (DAC)
Data Access Phases for Controlled Access
Beaconize Hospitals
Benefit: identify if the pathogenic or non-pathogenic variant was sequenced at another hospital!
Benefit: the aim is secure, standardized data sharing without compromising patient confidentiality.
FAIR-R
Why create FAIR-R analysis?
Beyond FAIR is FAIR-R
We would like to be Practically …
and Reproducible, Replicable, ….
Reproducible, Replicable, …
• Data Analysis platform
• Web-based
• Easy to use
• Free and Open Source
• Many tools (~8000)
• Popular (>10,000 publications)
• Extensive Training Materials (training.galaxyproject.org)
Homepage: galaxyproject.org
Galaxy as a solution
Multitude of Galaxies
• All tools and workflows are publicly accessible via GitHub
• 6 different types of analysis
• 5 different Galaxy servers, including Galaxy Australia
• Workflows and tools available on all servers
• Workload shared amongst global clouds and compute resources
• Reproducible across multiple servers
https://covid19.galaxyproject.org
Galaxy for COVID-research
Adapted from “Viral Beacon and Galaxy variant workflows” , Singh, Grüning & Maier ELIXIR
Galaxy integrated with COVID Beacon
Adapted from “Viral Beacon and Galaxy variant workflows” , Singh, Grüning & Maier ELIXIR
FAIR-R Containerized Tools
Practically FAIR – Solution 101
How to stream data from BEACONS
https://htsget.readthedocs.io/en/latest/
AAI DAC access and htsget in Galaxy
1. Request DAC
2. Login and stream data with htsget Galaxy
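The streaming step could look roughly like the sketch below, which follows the general htsget pattern: request a JSON "ticket" for a genomic region, then fetch each listed URL in order and concatenate the blocks. The host `htsget.example.org`, the read identifier, the helper names and the sample ticket are assumptions for illustration, not the Galaxy integration shown in the slides. The demo uses only inline `data:` URIs, so it runs without a server.

```python
import base64
import json
import urllib.request
from urllib.parse import unquote_to_bytes, urlencode


def build_ticket_url(base_url, read_id, reference_name, start, end, fmt="BAM"):
    """Return the htsget ticket URL for one region of a reads dataset."""
    params = {"format": fmt, "referenceName": reference_name,
              "start": start, "end": end}
    return f"{base_url.rstrip('/')}/reads/{read_id}?{urlencode(params)}"


def fetch_block(url, headers=None):
    """Fetch one data block named in a ticket; data: URIs are decoded locally."""
    if url.startswith("data:"):
        meta, _, payload = url.partition(",")
        if meta.endswith(";base64"):
            return base64.b64decode(payload)
        return unquote_to_bytes(payload)
    # Send the headers the ticket asks for (for example an access token).
    request = urllib.request.Request(url, headers=headers or {})
    with urllib.request.urlopen(request) as response:
        return response.read()


def assemble(ticket_text):
    """Concatenate all blocks of a ticket in the order the server lists them."""
    ticket = json.loads(ticket_text)["htsget"]
    return b"".join(fetch_block(item["url"], item.get("headers"))
                    for item in ticket["urls"])


# Hypothetical server and a hand-written ticket with two inline blocks.
ticket_url = build_ticket_url("https://htsget.example.org", "sample1", "chr1", 100, 200)
sample_ticket = json.dumps({"htsget": {"format": "BAM", "urls": [
    {"url": "data:application/octet-stream;base64,QUJD"},
    {"url": "data:application/octet-stream;base64,REVG"},
]}})
print(ticket_url)
print(assemble(sample_ticket))
```

In a controlled-access setting, the token obtained after DAC approval would typically be sent with the ticket request and with any block URLs that require it.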
Reproducible Cancer Workflows
myFAIR: prototype FAIR-R application
In Summary
• The barriers (e.g. privacy, security) to the use of FAIR genomes across hospitals can be addressed using the tools presented today.
• The FAIR-R concept and associated applications are essential for valid clinical and translational research.
• The combination of BEACON, FAIR genomes and Galaxy provides a simple framework to deliver FAIR-R services to medical and translational researchers.
• Better practical FAIR-R applications need to be developed to improve uptake by medical researchers.
Acknowledgements
Erasmus MC
• Saskia Hiltemann
• Helena Rasche
• Willem de Koning
• Jie Ju
• Teodora Trandafir
• David van Zessen
• Yunlei Li
Our fabulous colleagues from CINECA
• Also Marta and Daniel for today's meeting.
Questions?
Title: Practically FAIR
Presenter: Andrew Stubbs
Please write your questions in the questions window of the GoToWebinar application
Next CINECA webinars
Title: How to make training FAIR
Presenter: Anna Swan and Sarah Morgan
Date: Thurs 18th March 2021
Time: 3:00 PM GMT / 4:00 PM CET
Registration and details: https://www.cineca-project.eu/news-events-all/how-to-make-training-fair
Title: Ethics/ELSI considerations - From FAIR to fair data sharing
Presenter: Melanie Goisauf
Date: Thurs 15th April 2021
Time: 4:00 PM BST / 5:00 PM CEST
Registration and details: https://www.cineca-project.eu/news-events-all/ethics/elsi-considerations
