Lessons from the UK:
Data access, patient trust & real-world impact
with health data science
Varsha Khodiyar, PhD
HDR UK, Data Access Project Manager
Beilstein Open Science Symposium, 5 - 7 October 2021
Agenda
• About Health Data Research UK
• Facilitating data access through The Innovation Gateway
• Importance of transparency for patients and the lay public
• HDR UK’s impact on the coronavirus pandemic
Health Data Research UK
Where this all started…
The need for a robust health data research infrastructure
in the UK
Industrial Strategy for Life Sciences (November 2017):
Hubs for health data research (digital innovation hubs)
Health Data Research UK asked to lead delivery of this programme
on behalf of UK Research and Innovation in September 2018
Four-year programme launched in September 2018
HDR UK aims to realise the potential of the UK’s health data
resources for patient and public benefit
HDR UK funded by:
British Heart Foundation, Chief Scientist Office (Scotland),
Health and Care Research Wales, Health & Social Care R&D N. Ireland,
Engineering and Physical Sciences Research Council,
Economic and Social Research Council, Medical Research Council,
National Institute for Health Research, Wellcome,
UK Research and Innovation
HDR UK’s mission is to unite the UK’s health data to
enable discoveries that improve people’s lives
Our 20-year vision is for
large scale data and advanced analytics to benefit every
patient interaction, clinical trial, biomedical discovery
and enhance public health.
HDR UK is a UK-wide virtual institute, working across all four nations
HEALTH DATA
RESEARCH HUBS
BREATHE
DATA-CAN
Discover-NOW
Gut Reaction
INSIGHT
PIONEER
NHS DigiTrials
BHF Data Science Centre
TRAINING LOCATIONS
(Masters and PhD)
Belfast
Birmingham
Bristol
Cambridge
Edinburgh
Exeter
Leeds
London
Manchester
Oxford
CENTRAL TEAM OFFICES
Wellcome Trust
Great Ormond Street
DRIVE Unit
RESEARCH LOCATIONS
HDR UK Cambridge
HDR UK London
HDR UK Midlands
HDR UK North
HDR UK Oxford
HDR UK Scotland
HDR UK South-West
HDR UK Wales and
Northern Ireland
Bringing together a complex data custodian landscape
Focused on national research priority areas
The infrastructure to deliver this strategy
USING HEALTH DATA
Data access via ‘The Innovation Gateway’
The Innovation Gateway for health data discovery and access
www.healthdatagateway.org
“Really impressed with this resource. I think as a gateway to search by data type and indication, it’s a really powerful tool.”
David Leather, GSK
Several options for discovering different types of health data
Gain insight into data reusability prior to requesting access
Technical details & metadata wheel Data utility
Cohort Discovery on the Innovation Gateway
[Diagram, layout for illustration purposes only. Researchers send a cohort query (inclusion criteria, exclusion criteria, co-variates) to the Health Data Gateway query engine. Custodian-controlled cohort query agents process the query. Query results are returned per dataset (Dataset 1: 20K, Dataset 2: 3K, Dataset 3: 1K, …), with total patients: 23K.]
Statistical Disclosure Control Policies:
• User validation (e.g. Bona-fide Researcher)
• Low number suppression (e.g. >50)
• Query Count binning (e.g. 50, 60, 70, -)
• Query Rate limiting
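As a rough illustration of two of the disclosure control policies listed above, the sketch below applies low number suppression and count binning to a raw query count. The threshold of 50 and the bin width of 10 are assumptions read off the slide's examples, and the function names are hypothetical; this is not the Gateway's actual implementation, and rounding down is only one possible binning rule.

```python
from typing import Optional

# Both values are assumptions taken from the slide's examples, not confirmed policy.
SUPPRESSION_THRESHOLD = 50  # counts below this are withheld
BIN_WIDTH = 10              # disclosed counts are rounded down to a multiple of this


def disclose_count(raw_count: int) -> Optional[int]:
    """Return a count that is safe to display, or None if it must be suppressed."""
    if raw_count < 0:
        raise ValueError("counts cannot be negative")
    if raw_count < SUPPRESSION_THRESHOLD:
        return None  # low number suppression
    return (raw_count // BIN_WIDTH) * BIN_WIDTH  # count binning


def format_count(raw_count: int) -> str:
    """Render a disclosed count, using '-' for suppressed values."""
    disclosed = disclose_count(raw_count)
    return "-" if disclosed is None else str(disclosed)
```

With these assumed settings, a raw count of 67 would be reported as 60, and a raw count of 12 would be shown only as "-". User validation and query rate limiting are not covered by this sketch.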
• Researchers can reuse the cohort query to define their research protocol when submitting
their data access request
• Researchers will be able to reuse and compare cohort definitions between similar protocols
• Cohort definitions will be able to reuse phenotype definitions (asthma, diabetes) without
resorting to using ICD-10, Read, SNOMED-CT codes
Solution: Cohort Discovery enables researchers to discover, assess and request
access to potential datasets that exactly match the research project cohort definition
using standardized inclusion & exclusion criteria and co-variates
Cohort Definition (example):
Datasets with female patients between 18-35 who have asthma and diabetes
and who are not smokers and not pregnant
Demo: https://www.youtube.com/watch?v=L50nqIR6k98
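The example cohort on this slide (female, aged 18-35, asthma and diabetes, not smokers, not pregnant) can be expressed as a set of inclusion and exclusion criteria. The following is a minimal sketch under stated assumptions: the record fields (`sex`, `age`, `conditions`, `smoker`, `pregnant`), the `CohortDefinition` class and the synthetic records are all hypothetical and do not reflect the Gateway's query format or any real dataset.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List

Patient = Dict[str, Any]               # hypothetical flat patient record
Criterion = Callable[[Patient], bool]  # a predicate over one record


@dataclass
class CohortDefinition:
    """Reusable cohort definition built from inclusion and exclusion criteria."""
    inclusion: List[Criterion] = field(default_factory=list)
    exclusion: List[Criterion] = field(default_factory=list)

    def matches(self, patient: Patient) -> bool:
        # A patient must satisfy every inclusion criterion and no exclusion criterion.
        included = all(check(patient) for check in self.inclusion)
        excluded = any(check(patient) for check in self.exclusion)
        return included and not excluded


def has_condition(name: str) -> Criterion:
    """Phenotype-style criterion by condition name rather than by clinical code."""
    return lambda patient: name in patient["conditions"]


# The cohort described on the slide.
slide_cohort = CohortDefinition(
    inclusion=[
        lambda p: p["sex"] == "female",
        lambda p: 18 <= p["age"] <= 35,
        has_condition("asthma"),
        has_condition("diabetes"),
    ],
    exclusion=[
        lambda p: p["smoker"],
        lambda p: p["pregnant"],
    ],
)


def count_matches(definition: CohortDefinition, patients: List[Patient]) -> int:
    """Count how many records in a dataset match the cohort definition."""
    return sum(1 for patient in patients if definition.matches(patient))


# Synthetic records for illustration only.
synthetic_patients: List[Patient] = [
    {"sex": "female", "age": 27, "conditions": {"asthma", "diabetes"}, "smoker": False, "pregnant": False},
    {"sex": "female", "age": 30, "conditions": {"asthma", "diabetes"}, "smoker": True, "pregnant": False},
    {"sex": "male", "age": 25, "conditions": {"asthma", "diabetes"}, "smoker": False, "pregnant": False},
    {"sex": "female", "age": 41, "conditions": {"asthma", "diabetes"}, "smoker": False, "pregnant": False},
    {"sex": "female", "age": 22, "conditions": {"asthma"}, "smoker": False, "pregnant": False},
]
```

Because the definition is an ordinary object, the same `slide_cohort` could in principle be run against several datasets and the counts compared, which mirrors the reuse of cohort definitions described above.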
Data Access Request workflow
• HDR UK works with data custodians to
incorporate their due diligence
processes into the Five Safes form
• Five Safes form provides a standardized
way for researchers to request access to
data held by multiple data custodians
• Data custodians maintain decision
making responsibility on whether or not
to grant data access for each request
received
Data access & management based on the Five Safes framework
https://blog.ons.gov.uk/2017/01/27/the-five-safes-data-privacy-at-ons/
1. Safe People
e.g. Approved researchers scheme
2. Safe Projects
Project proposal reviewed by governance
board convened by the data custodian
3. Safe Settings
Data access provided within a Trusted
Research Environment
4. Safe Data
Data supplied as deidentified to minimize
identification of individuals
5. Safe Output
Analysis outputs checked by data custodians
prior to release
What is a TRE?
A TRE is a Trusted Research Environment. Also known as ‘Data Safe Havens’, TREs
are highly secure computing environments that provide remote access to health data
for approved researchers to use in research that can save and improve lives.
Why are they important?
• Custodian makes deidentified data available within the TRE after data access
request approval.
• Researcher is able to specify the tools and software they need within the TRE
to enable their study.
• Data analysis is carried out within the TRE and outputs are checked for
sensitivity by the data custodian.
http://www.hdruk.ac.uk/trustedresearchenvironments
TRE green paper: doi.org/10.5281/zenodo.4594703 (2020)
Providing researchers with tools, training and standards
www.hdruk.ac.uk/help-with-your-data/
Data Utility Matrix
International links
Data Quality Tool Evaluation
Data Officers Groups / Hubs
Position/consultation papers
Projects (e.g. COVID-19)
Metadata completeness
SIGs (FHIR, synthetic data, etc.)
We are seeing people discovering UK health data from
across the world
25,609 visits from around the world (1st Jan 2021 to 3rd Oct 2021)
Research transparency via ‘Patient and Public
Involvement and Engagement (PPIE)’
An exemplar of working with public and patients
“I am glad you're involving me from what seems to be the beginning so that you can
actually take my concerns and address them whilst helping the greater good.”
Patient / Public Voice Rep
“It is essential that the public is included in this ground-breaking work.”
Margaret Rogers, Member of the HDR UK Public Advisory Board
• 92 people consulted to inform decisions on methods and process for clinical trial
recruitment
• Consultation on COVID work across 7 UK-wide patient and public networks with
168 responses
• 16,500 contacts with patient & public contributors across the institute in 2020
• Strong Public Advisory Board providing strategic guidance on all our work
Monthly Open Door meetings aimed at the lay public
Gateway Data Use Registers – closing the loop
A data use register offers a public record of how data is being used, by whom and, most importantly, for
what purpose. Outputs are linked to data.
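To make the idea of a data use register concrete, here is a minimal sketch of what one register entry might hold and how outputs could be linked back to the data they came from. The `DataUseEntry` fields and the `outputs_by_dataset` helper are hypothetical names chosen for illustration; they are not the Gateway's schema or any published data use register standard.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class DataUseEntry:
    """One public record of a data use: what data, by whom, for what purpose."""
    dataset: str                  # which dataset was accessed
    organisation: str             # who is using the data
    purpose: str                  # what the data is being used for
    outputs: List[str] = field(default_factory=list)  # e.g. publication identifiers


def outputs_by_dataset(register: List[DataUseEntry]) -> Dict[str, List[str]]:
    """Link research outputs back to the datasets they were derived from."""
    linked: Dict[str, List[str]] = {}
    for entry in register:
        linked.setdefault(entry.dataset, []).extend(entry.outputs)
    return linked
```

Grouping by dataset in this way is one possible reading of "closing the loop": anyone browsing a dataset could see every recorded use and the outputs that followed from it.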
• Search >600 datasets, tools, papers,
training
• Data visualisations
• Query data via cohort discovery (depth
& breadth)
• API Driven
• Data Access Requests
• Metadata Catalogues
• 5 Safes framework
• GA4GH Passports & Visas
• OpenID Connect integration
• LinkedIn & Google integration
• OpenAthens integration
• TRE integrations with Cohort
Discovery
• Publications from multiple TREs
• TRE Green Paper
• TRE white paper & reference
architecture
• TRE federation & interoperability
[Chart: % of total features built, by stage. Discovery: 60%, Access: 40%, Analysis: 20%, Output sharing for re-use: 10%]
• Publication outputs from data used
• Standards for Data use registers
• Datasets DOIs
• Automatic output linking to data
• Data Access Request/Data use
register integration
Impact of HDR UK on the coronavirus pandemic
Working in partnership to answer priority questions
- The National Core Studies
• Surveillance & Epidemiology (Professor Ian Diamond, ONS): Collecting and analysing data to understand incidence and prevalence broadly and in different settings in order to inform response measures.
• Longitudinal Health and Wellbeing (Professor Nishi Chaturvedi, UCL): Understanding the impact of Covid-19 on long term health (including long covid) to inform the design of mitigating policies.
• Clinical Trials Infrastructure & Support (Vaccines, Therapeutics) (Divya Chadha Manek, VTF/NIHR; Professor Patrick Chinnery, MRC): Establishing infrastructure to run large scale trials for Covid-19 drugs and vaccines without disrupting trials for other diseases.
• Transmission & Environment (Professor Andrew Curran, HSE): Taking samples to aid understanding of transmission of the disease in workplace, transport and public places.
• Immunity (Professor Paul Moss, University of Birmingham): Understanding serology as a useable predictor of immunity against Covid.
• Data and Connectivity (Professor Andrew Morris, HDR UK working with ONS): Making UK-wide health and administrative data available for linkage and accessible to catalyse Covid-19 research.
Our three key priorities at the onset of the COVID-19 pandemic
1. Co-ordinate and connect national data science driven research efforts
related to COVID-19
2. Accelerate access to UK-wide priority data relevant to COVID-19 for research
3. Leverage the best of the UK’s health data science capability to address the
wider impact of the COVID-19 pandemic, supporting vulnerable groups that will
be hardest hit
Support the UK government response through regular reporting to SAGE
Patient input shaping use of data for PRINCIPLE clinical trial
Enabled daily access to positive C-19 test results within
24 hours to increase recruitment.
Support also included:
• Daily problem solving and management scrums
• In-depth engagement with the ICO
• A public survey with over 90 responses
• Engagement with multiple data custodians
• Review by NHS Digital’s IGARD
• NHS Digital’s Trial team are now addressing the
remaining actions to support data flow
Provides confidence in concept for future trials that
require incident COVID cases
“I'm so pleased you're looking into this as this is
something that can actually make a difference.”
Patient / Public Voice Rep
“The creative energy that has gone into making the
progress you outline could well be saving lives soon.
You are making history.”
Chris Butler, PRINCIPLE Trial Lead
Important discoveries by researchers using HDR UK
infrastructure
Better Care
Understanding Causes of Disease
Clinical Trials
Public Health
• High-resolution mapping of COVID cases directly informed public health policy
in Wales
• Understanding the impact of COVID-19 on vulnerable people with health
conditions, including those with cancer and heart disease
• RECOVERY trial used data to establish that dexamethasone reduces death by up
to one third in hospitalised patients with severe respiratory complications of
COVID-19
• Genetic sequencing of COVID-19 to track which strains have resulted in
different outbreaks
International COVID-19 Data Alliance (ICODA):
advancing open science practice globally
Vision
To unite international health research data to enable discoveries that benefit
everyone, everywhere, by reducing the harm of COVID-19; and enable an efficient
data response to future pandemics and other health challenges
Mission
To build an open international partnership that demonstrates trustworthiness to
support a rapid response to COVID-19 and a long-term alliance for making data
accessible to researchers and scientists around the world.
• Launched on 6 July 2020
• Convened by Health Data Research UK
• Focus on Lower and Middle-Income Countries
• Supported by the COVID-19 Therapeutics Accelerator
• Driver Projects awarded to test the infrastructure and deliver COVID-19 research
Summary
• HDR UK is uniting access to the UK’s health data for researchers.
• The Innovation Gateway facilitates discovery and access to health data by researchers.
• Increasing the FAIRness of data, by increasing the F, A, I of metadata, and indicating the
R of data via metadata assessment.
• Demonstration of the importance of patient and lay public involvement in all aspects of
health data research.
• HDR UK has had an important role to play in the UK’s response to the pandemic.
• HDR UK is laying the foundations for an international alliance to enable a data-centric
response to COVID-19 and other health challenges
Thank you
Find out more:
www.hdruk.ac.uk
@HDR_UK
Visit the Gateway: healthdatagateway.org
Find out about the UK Alliance: healthdata.org
