This project has received funding from the European Union’s Horizon 2020 research and
innovation programme under grant agreement No. 825775
Introduction to FAIR principles - Open science through
FAIR health data networks: dream or reality?
Presenter: Kees van Bochove (The Hyve)
Host: Marta Lloret Llinares (EMBL-EBI)
This webinar is being recorded
Audience Q&A Session
Please write your questions in the questions window of the GoToWebinar application
The challenges:
Stay informed
@CinecaProject
www.cineca-project.eu
Common Infrastructure for National Cohorts
in Europe, Canada and Africa
The vision:
Accelerating disease research and improving health by facilitating
transcontinental human data exchange
This project has received funding from the Canadian Institutes of Health
Research under grant agreement #404896
Context for the webinar
• CINECA “How FAIR are you?” webinar series and hackathon:
  https://www.cineca-project.eu/news-events-all/how-fair-are-you-webinar-series-and-hackathon
• Webinar series, January to April:
  - Making cohort data FAIR
  - FAIR software tools
  - Practically FAIR
  - How to make training FAIR
  - Ethics/ELSI considerations
• Hackathon, 28-29 April, 4 hours per day
  - 3 streams: cohort data, software, training materials
Today’s presenter
Kees van Bochove is the founder of The Hyve, a company dedicated to supporting and facilitating open source, open standards and open data in biomedical informatics. He studied Computer Science at University of Utrecht and Bioinformatics at VU University Amsterdam, for which he did his research project on lipoprotein metabolism at TNO Quality of Life in Zeist and the Jean Mayer USDA Human Nutrition Research Center on Aging at Tufts University in Boston.
Kees has been involved with the FAIR movement since the initial Lorentz workshop in 2014, and was one of the initiators of the FAIR implementation working group in the Pistoia Alliance, hosting a formative Pistoia workshop on the implementation of the FAIR principles in pharma R&D at The Hyve in 2018. He actively contributes to the international dialogue about the application of biomedical data standards such as OMOP, FHIR, openEHR and GA4GH.
Today, Kees works mainly as Principal Consultant, advising pharma companies, academic hospitals, and patient and health data networks on their FAIR data strategy, and advising and leading implementation projects with teams from The Hyve.
Open science through FAIR health data
networks: dream or reality?
CINECA “How FAIR are you” webinar series #1:
Introduction to FAIR Principles, Jan 21, 2021
Kees van Bochove, Founder, The Hyve
@keesvanbochove
We enable open science by
developing and implementing
open source solutions and FAIRifying
data in life sciences
Open innovation ecosystem at
National Health Data Networks
Open Source Software
Precompetitive Health Data Projects
Partner Communities
March 2020: frantic search for medical evidence
Outline
1. The case for FAIR and open science
2. Medical evidence generation is changing through open science
3. Case study: COVID-19 studyathon
Science is broken, but it can be fixed
Statement #1
@keesvanbochove @TheHyveNL
What’s wrong with the academic career path?
https://danco.substack.com/p/can-twitter-save-science
Poor usage of statistics in biomedical science
Data that is not FAIR and behind paywalls
The FAIR Principles for (meta)data
http://www.nature.com/articles/sdata201618
https://doi.org/10.1038/sdata.2016.18
Findable:
F1. globally unique persistent identifier
F2. rich metadata
F3. metadata - data link
F4. registered or indexed
Accessible:
A1. retrievable using standardized protocol
A1.1. open, free and universally implementable
A1.2. authentication and authorization
A2. metadata stay accessible
Interoperable:
I1. use a formal language for knowledge representation
I2. use vocabularies that follow FAIR principles
I3. include qualified references to other (meta)data
Reusable:
R1. described by relevant attributes
R1.1. clear and accessible data usage license
R1.2. detailed provenance
R1.3. meet domain-relevant community standards
https://www.go-fair.org/fair-principles/
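As a rough illustration of how the checklist above could be applied to a single metadata record, here is a minimal Python sketch. The `MetadataRecord` class, its field names and the mapping from fields to sub-principles are assumptions made for this example; they are not part of the FAIR specification and cover only some of the principles.

```python
from dataclasses import dataclass, field


@dataclass
class MetadataRecord:
    """Hypothetical metadata record; all field names are illustrative."""
    identifier: str = ""        # F1: globally unique persistent identifier
    description: dict = field(default_factory=dict)  # F2: rich metadata
    data_identifier: str = ""   # F3: link from the metadata to the data
    catalogue: str = ""         # F4: where the record is registered or indexed
    access_protocol: str = ""   # A1: standardized retrieval protocol
    license: str = ""           # R1.1: data usage license
    provenance: str = ""        # R1.2: how the data came about


def missing_fair_fields(record):
    """Return the sub-principles whose field is empty in the record."""
    checks = {
        "F1": record.identifier,
        "F2": record.description,
        "F3": record.data_identifier,
        "F4": record.catalogue,
        "A1": record.access_protocol,
        "R1.1": record.license,
        "R1.2": record.provenance,
    }
    return [principle for principle, value in checks.items() if not value]


# A record with only an identifier and a license (placeholder values).
partial = MetadataRecord(
    identifier="https://doi.org/10.1234/example",
    license="CC-BY-4.0",
)
print(missing_fair_fields(partial))
```

A check like this only tests that fields are present. Whether an identifier really is persistent, or a vocabulary really follows the FAIR principles, still needs human or community judgment.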
FAIR Digital Objects
Digital objects as building blocks in open science
• data, software, protocols or other digital resources
• the identifiers assigned to and used within the objects
• the data standards and code used to represent the information in them
• the metadata that provides contextual information about the object
Choose appropriate building blocks
• Registration of data & domain models
• Usage of open standards
• Minimum metadata fields
• Conventions for domain specific fields
• API documentation and design
• Generation of digital object identifiers
• Registration of identifier namespace
• Registration of objects in data catalogue
• Assign explicit data object classes
http://doi.org/10.2777/1524
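Several of these building blocks (an identifier namespace, generated object identifiers, minimum metadata fields, explicit object classes, registration in a catalogue) can be sketched together in a few lines of Python. Everything named here is hypothetical: the `Catalogue` class, the `REQUIRED_METADATA` set and the example namespace are assumptions for illustration, not an existing tool or a recommendation from the report.

```python
import uuid
from dataclasses import dataclass

# Assumed minimum metadata fields; a real catalogue would define its own.
REQUIRED_METADATA = {"title", "license"}


@dataclass(frozen=True)
class DigitalObject:
    identifier: str    # generated within the catalogue's namespace
    object_class: str  # explicit class, e.g. "dataset" or "software"
    standard: str      # the open standard used to represent the content
    metadata: dict     # contextual information about the object


class Catalogue:
    """Hypothetical in-memory data catalogue."""

    def __init__(self, namespace):
        self.namespace = namespace.rstrip("/")
        self._objects = {}

    def register(self, object_class, standard, metadata):
        # Enforce the minimum metadata fields before minting an identifier.
        missing = REQUIRED_METADATA - set(metadata)
        if missing:
            raise ValueError(f"missing metadata fields: {sorted(missing)}")
        identifier = f"{self.namespace}/{uuid.uuid4()}"
        obj = DigitalObject(identifier, object_class, standard, dict(metadata))
        self._objects[identifier] = obj
        return obj

    def resolve(self, identifier):
        # Look up a registered object by its identifier.
        return self._objects[identifier]


catalogue = Catalogue("https://example.org/id/")
obj = catalogue.register(
    "dataset", "OMOP CDM", {"title": "Demo cohort", "license": "CC-BY-4.0"}
)
print(obj.identifier)
```

A random UUID is unique but not persistent on its own. Persistence depends on the organisation keeping the namespace resolvable over time.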
Open Science means re-inventing the process
From ivory tower to societal service
https://www.uu.nl/en/research/open-science
Science is broken, but it can be fixed
Statement #1
● The classical way of doing science now has too many perverse incentives
● Digital collaboration can change that for the better
● Open science and FAIR data stewardship offer a way forward for society
In this webinar, we will dive into the basics of FAIR health data, but also take stock of the current situation in health data networks: after a year of frantic research and collaborations and many open datasets and hackathons on COVID-19, has the situation actually improved? Are we sharing health data on a global scale to improve medical practice, or is quality medical data still only accessible to researchers with the right credentials and deep pockets?
Medical evidence generation is changing through open science
Statement #2
Stakeholders in Health Data Networks
• Researcher
• Healthcare Professional
• Patient / Citizen
• Health Data
https://youtu.be/C95pl11zdAs - webinar Health Data Networks
The CINECA Network
The many faces of health data
Clinical view: the disease phenomenon
• Hospital systems: EMR, LIMS, PACS etc.
• Clinical guidelines: data for decision making
Molecular view: drug & disease mechanisms
• DNA, RNA, proteome, metabolome, microbiome etc.
• Molecular pathways, cell models
• Macromodelling & simulation, bioactivity data
• Drug discovery, PK/PD etc.
Financial view: the patient as customer
• Medical claims datasets: reimbursed drugs & procedures
• Value-based healthcare: outcomes-based reimbursement
• Health economics
Patient view: the experience of the patient
• Outcomes measurement through PROMs, PREMs etc.
• eHealth apps, self-monitoring etc.
• Social media and forums

• Health datasets typically reflect only a partial, incomplete and biased portion of one of these views.
• Establishing causality is complex:
  - Observational studies try to infer from existing data
  - Interventional studies generate new data
• Data is often not FAIR
The “Odyssey” approach
Analytical method
Link to data
The data…
What will it require?
• Data interoperability
• Standardised analytics
• Data network
• Strong community
The OMOP common data model
• Standardized clinical data: Person, Observation period, Specimen, Death, Visit occurrence, Procedure occurrence, Drug exposure, Device exposure, Condition occurrence, Measurement, Observation, Note, Note NLP, Fact relationship
• Standardized health system data: Location, Care site, Provider
• Standardized health economics: Cost, Payer plan period
• Standardized derived elements: Cohort, Cohort attribute, Condition era, Drug era, Dose era
• Standardized meta-data: CDM source
• Standardized vocabularies: Concept, Vocabulary, Domain, Concept class, Concept relationship, Relationship, Concept synonym, Concept ancestor, Source-to-concept map, Drug strength, Cohort definition, Attribute definition
Design characteristics:
• Patient-centric
• Tabular
• Extendable
• Built for analytics
• Relational design
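To make the relational, patient-centric design a little more concrete, the following Python sketch builds three of the tables listed above in an in-memory SQLite database and counts distinct persons per condition. It is a heavily simplified assumption-laden sketch: only a handful of columns are included, the rows are invented, and the concept IDs and the `ExampleVocab` vocabulary are placeholders rather than real OMOP vocabulary entries.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE concept (
    concept_id     INTEGER PRIMARY KEY,
    concept_name   TEXT,
    domain_id      TEXT,
    vocabulary_id  TEXT
);
CREATE TABLE person (
    person_id      INTEGER PRIMARY KEY,
    year_of_birth  INTEGER
);
CREATE TABLE condition_occurrence (
    condition_occurrence_id INTEGER PRIMARY KEY,
    person_id               INTEGER REFERENCES person(person_id),
    condition_concept_id    INTEGER REFERENCES concept(concept_id),
    condition_start_date    TEXT
);
""")

# Invented example rows; the concept IDs are placeholders.
conn.executemany("INSERT INTO concept VALUES (?, ?, ?, ?)", [
    (2000000001, "Example condition A", "Condition", "ExampleVocab"),
    (2000000002, "Example condition B", "Condition", "ExampleVocab"),
])
conn.executemany("INSERT INTO person VALUES (?, ?)",
                 [(1, 1950), (2, 1975), (3, 1980)])
conn.executemany("INSERT INTO condition_occurrence VALUES (?, ?, ?, ?)", [
    (1, 1, 2000000001, "2020-03-01"),
    (2, 2, 2000000001, "2020-03-15"),
    (3, 2, 2000000002, "2020-04-02"),
    (4, 1, 2000000001, "2020-05-01"),  # second record for person 1
])


def persons_per_condition(connection):
    """Count distinct persons per condition via the vocabulary table."""
    rows = connection.execute("""
        SELECT c.concept_name, COUNT(DISTINCT co.person_id)
        FROM condition_occurrence AS co
        JOIN concept AS c ON c.concept_id = co.condition_concept_id
        GROUP BY c.concept_name
        ORDER BY c.concept_name
    """).fetchall()
    return dict(rows)


print(persons_per_condition(conn))
```

Because every site in a network uses the same table and column names, the same query text can in principle be run unchanged against each database, which is what makes standardized analytics possible.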
Standardized vocabularies
https://www.ehden.eu/datapartners/
Standardized analytics
EHDEN Network
https://www.ehden.eu/datapartners/
LANCET PAPER FROM OHDSI-LEGEND COLLABORATION
LEGEND Principles
https://doi.org/10.1093/jamia/ocaa103
1. LEGEND will generate evidence at a large scale.
2. Dissemination of the evidence will not depend on the estimated effects.
3. LEGEND will generate evidence using a prespecified analysis design.
4. LEGEND will generate evidence by consistently applying a systematic process across all research questions.
5. LEGEND will generate evidence using best practices.
6. LEGEND will include empirical evaluation through the use of control questions.
7. LEGEND will generate evidence using open-source software that is freely available to all.
8. LEGEND will not be used to evaluate new methods.
9. LEGEND will generate evidence across a network of multiple databases.
10. LEGEND will maintain data confidentiality; patient-level data will not be shared between sites in the network.
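Principles 9 and 10 together imply that each site analyses its own data and shares only aggregate results. The Python sketch below illustrates that pattern. The slides do not describe LEGEND's statistical pipeline, so the log risk ratio and the fixed-effect inverse-variance pooling used here are a swapped-in textbook technique, and the function names and counts are invented for the example.

```python
import math


def site_estimate(exposed_events, exposed_total, control_events, control_total):
    """Run locally at one site; returns only aggregates.

    Output is the log risk ratio and its standard error. No patient-level
    rows leave the site.
    """
    risk_exposed = exposed_events / exposed_total
    risk_control = control_events / control_total
    log_rr = math.log(risk_exposed / risk_control)
    se = math.sqrt(
        1 / exposed_events - 1 / exposed_total
        + 1 / control_events - 1 / control_total
    )
    return log_rr, se


def pool_fixed_effect(estimates):
    """Combine per-site (estimate, standard error) pairs.

    Uses fixed-effect inverse-variance weighting: more precise sites
    receive more weight.
    """
    weights = [1 / se ** 2 for _, se in estimates]
    pooled = sum(w * est for w, (est, _) in zip(weights, estimates)) / sum(weights)
    pooled_se = math.sqrt(1 / sum(weights))
    return pooled, pooled_se


# Invented counts for three hypothetical sites.
sites = [
    site_estimate(20, 100, 10, 100),
    site_estimate(45, 300, 30, 300),
    site_estimate(12, 80, 9, 90),
]
pooled_log_rr, pooled_se = pool_fixed_effect(sites)
print(f"pooled risk ratio: {math.exp(pooled_log_rr):.2f}")
```

Only the two numbers returned by `site_estimate` cross the site boundary. This is the sense in which a network study can combine many databases while patient-level data stays where it is.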
Medical evidence generation is changing through open science
Statement #2
● The scale is changing: “a paradigm shift from single study and single estimate medical research to large-scale systematic evidence generation”
https://ohdsi.github.io/TheBookOfOhdsi/OpenScience.html
The COVID-19 pandemic is an accelerator for open science
Statement #3
COVID-19 hackathons are abundant
Bioinformatics
Citizen science
Medical science?
Fast observational research is feasible
March 26-29, 2020
● Virtual event
● >300 collaborators from 30 countries
● Four time zones
● 37 healthcare databases
● Twelve concurrent network studies
Interoperable data network
Three focus areas, Twelve Questions
“Safety of hydroxychloroquine, alone and in combination with azithromycin, in light of rapid wide-spread use for COVID-19: a multinational, network cohort and self-controlled case series study”
https://github.com/ohdsi-studies/Covid19EstimationHydroxychloroquine
Immediate dissemination of results
https://data.ohdsi.org/Covid19EstimationHydroxychloroquine/
Within weeks: Paper in pre-print
https://www.medrxiv.org/content/10.1101/2020.04.08.20054551v1
https://www.sciencemag.org/news/2020/04/antimalarials-widely-used-against-covid-19-heighten-risk-cardiac-arrest-how-can-doctors
One meeting produced a library!
https://www.ohdsi.org/covid-19-updates/
Months later: the peer-reviewed versions
More studyathons are on the way!
https://prostate-pioneer.eu - https://bit.ly/3c3ywpR
The COVID-19 pandemic is an accelerator for open science
Statement #3
● Multiple scientific communities have come together to distribute and share data and analytics
● It has created urgency for data sharing and standardization in the medical world… which will hopefully materialize in the coming years
Questions?
Title: Introduction to FAIR principles - Open science through FAIR health data networks: dream or reality?
Presenter: Kees van Bochove
Please write your questions in the questions window of the GoToWebinar application
Next CINECA webinars
Title: Making Cohort data FAIR
Presenter: William Hsiao
Date: Wed 10th February 2021
Time: 4:00 PM GMT / 5:00 PM CET
Registration and details: https://www.cineca-project.eu/news-events-all/making-cohort-data-fair
Title: FAIR Software tools
Presenter: Carlos Martinez
Date: Wed 24th February 2021
Time: 3:00 PM GMT / 4:00 PM CET
Registration and details: https://www.cineca-project.eu/news-events-all/fair-software-tools
