Invited talk for ConTech Pharma on 1st March 2022
Abstract
Health Data Research UK is the UK’s national institute for health data science, with a mission to unite the UK’s health data to enable discoveries that improve people’s lives. In this talk, Dr Varsha Khodiyar will outline how HDR UK is bringing together disparate health data from all four countries of the United Kingdom, creating the infrastructure to enable discovery of and access to health data, and convening standards-making bodies to improve data linkage and data reuse. Varsha will also discuss how HDR UK is moving beyond the traditional confines of FAIR data to ensure that data sharing and data use are also transparent and ‘fair’ for the patients and lay public who are the subjects of these datasets.
Digital transformation enables FAIR approach for health data science
1. Digital transformation to enable a
FAIR approach for health data science
ConTech Pharma 2022
Varsha Khodiyar, PhD
1st March 2022
2. What I’ll cover
• Introduction to Health Data Research UK and our work
• Infrastructure for data discovery (Findability)
• Facilitating access to data (Accessibility)
• Improving data linkage (Interoperability)
• Demonstrating data reuse (Reusability)
• Public patient partnerships (considerations beyond FAIR)
3. HDR UK is the national institute for health data science
Our mission is to unite the UK’s health data to enable
discoveries that improve people’s lives.
Our 20-year vision is for large scale data and advanced
analytics to benefit every patient interaction, clinical trial,
and biomedical discovery.
4. Working in partnership with patients, the national health service (NHS),
universities, business & charities to create a world-leading and robust
health data infrastructure
Gearing up the UK for quality health
care, research and innovation
Enabling data science and innovation
as a catalyst for change
Facilitating standardization of data
access and information governance
5. Our strategy focuses on uniting, improving and using health data
• The Alliance: a network of experts committed to defining best practice in the use of data
• Hubs and DSCs: data science centres and research hubs that provide data curation and access services
• Research: data-driven research through our research sites, promoting open science
• Training: training for researchers through our HDR UK Futures platform
• The Gateway: a platform for discovery of and access to UK data and samples available for research
7. The Innovation Gateway for health data discovery
“Really impressed with this resource. I think as a gateway to search by data type and indication, it’s a really powerful tool.”
David Leather, GSK
8. The Gateway for Researchers
✓ Search datasets by keyword, publisher or themed collection.
✓ For some datasets, a Cohort Discovery Tool enables searching population data by the characteristics required.
✓ Find summary and technical metadata for datasets.
✓ Transparency on requirements for accessing data.
✓ Request access directly via the Gateway’s user-friendly, end-to-end data access request management system.
✓ Upload your papers and tools and create your own thematic collection.
www.healthdatagateway.org/about/innovators-and-researchers
9. Searching for data in the Gateway
• A wide range of health datasets, from
cardiovascular, maternal health,
primary care and mental health to
diabetes and, of course, COVID-19.
• Biobank physical samples and
tissues also listed.
• Range of disciplines: e.g. emergency
care, cardiovascular, maternal
health, cancer
• Types: primary care, secondary care,
acute care, palliative care, research
cohorts
• Purpose of initial data collection:
clinical care (40%), clinical trials
(38%), administration (34%)
Covering all four nations of the UK and discoverable via a shared ‘front door’
[Pie chart: geographic coverage. Legend: England, Wales, Scotland, NI, International. Values shown: 49%, 19%, 17%, 9%, 7%.]
10. Cohort Discovery on the Innovation Gateway
[Diagram labels: Co-vars; Inc. Crit; Exc. Crit; custodian-controlled Cohort Query Agents]
Statistical Disclosure Control Policies:
• User validation (e.g. Bona-fide
Researcher)
• Low number suppression (e.g. >50)
• Query Count binning (e.g. 50, 60, 70, -)
• Query Rate limiting
• Researchers can reuse the cohort query to define their research protocol when
submitting their data access request
• Researchers will be able to reuse and compare cohort definitions between similar
protocols
• Cohort definitions will be able to reuse phenotype definitions (asthma, diabetes)
without the need for ICD-10, Read, SNOMED-CT codes
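Two of the statistical disclosure control policies listed above, low number suppression and query count binning, can be illustrated with a minimal Python sketch. This is not the Gateway's implementation: the function name `disclose_count`, the threshold of 50, the bin width of 10 and the choice to round down are all assumptions read from the slide's "e.g." values.

```python
from typing import Optional

# Assumed values, taken from the slide's examples ("e.g. >50" and "e.g. 50, 60, 70").
SUPPRESSION_THRESHOLD = 50
BIN_WIDTH = 10


def disclose_count(raw_count: int,
                   threshold: int = SUPPRESSION_THRESHOLD,
                   bin_width: int = BIN_WIDTH) -> Optional[int]:
    """Return a count that is safe to show, or None if it must be suppressed.

    Hypothetical policy: counts at or below the threshold are withheld,
    and larger counts are rounded down to a multiple of the bin width.
    """
    if raw_count <= threshold:
        return None  # low number suppression
    return (raw_count // bin_width) * bin_width  # count binning


# Example: a small cohort is hidden, a larger one is reported as a binned value.
small = disclose_count(12)
larger = disclose_count(68)
```

Rounding down is only one possible binning rule; a custodian could equally round to the nearest bin or report a range.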
Cohort Discovery enables researchers to discover, assess and request access to
potential datasets that exactly match the research project cohort definition
using standardized inclusion & exclusion criteria and co-variates
Example query: datasets with female patients between 18-35 who have asthma and diabetes and who are not smokers and not pregnant
Innovation Gateway Query Engine:
• Dataset 1: 20K
• Dataset 2: 3K
• Dataset 3: 1K
• …
Total patients: 23K
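The example query on this slide combines inclusion and exclusion criteria. A minimal Python sketch of that idea follows. The class `CohortDefinition`, the record field names (`sex`, `age`, `conditions`, `smoker`, `pregnant`) and the synthetic patients are all hypothetical and are not taken from the Gateway.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, Iterable, List

Patient = Dict[str, Any]
Criterion = Callable[[Patient], bool]


@dataclass
class CohortDefinition:
    """Hypothetical cohort definition built from inclusion and exclusion criteria."""
    inclusion: List[Criterion] = field(default_factory=list)
    exclusion: List[Criterion] = field(default_factory=list)

    def matches(self, patient: Patient) -> bool:
        # A patient matches if every inclusion criterion holds
        # and no exclusion criterion holds.
        return (all(c(patient) for c in self.inclusion)
                and not any(c(patient) for c in self.exclusion))

    def count(self, patients: Iterable[Patient]) -> int:
        return sum(1 for p in patients if self.matches(p))


# The slide's example: female, aged 18-35, asthma and diabetes,
# not a smoker and not pregnant.
example_cohort = CohortDefinition(
    inclusion=[
        lambda p: p["sex"] == "female",
        lambda p: 18 <= p["age"] <= 35,
        lambda p: {"asthma", "diabetes"} <= p["conditions"],
    ],
    exclusion=[
        lambda p: p["smoker"],
        lambda p: p["pregnant"],
    ],
)

# Synthetic records for illustration only.
synthetic_patients = [
    {"sex": "female", "age": 25, "conditions": {"asthma", "diabetes"}, "smoker": False, "pregnant": False},
    {"sex": "female", "age": 30, "conditions": {"asthma", "diabetes"}, "smoker": True, "pregnant": False},
    {"sex": "male", "age": 25, "conditions": {"asthma", "diabetes"}, "smoker": False, "pregnant": False},
    {"sex": "female", "age": 40, "conditions": {"asthma", "diabetes"}, "smoker": False, "pregnant": False},
    {"sex": "female", "age": 19, "conditions": {"asthma"}, "smoker": False, "pregnant": False},
]

matching = example_cohort.count(synthetic_patients)
```

Keeping criteria as separate, named lists is what would allow a definition to be reused across protocols, as the slide describes.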
Demo: https://www.youtube.com/watch?v=L50nqIR6k98
12. The Gateway for Data custodians
✓ Facilitate discoverability and accessibility of your data resources.
✓ Implement best practice for health data governance and sharing.
✓ Cohort Discovery tool to improve discovery of your consented
patient cohorts.
✓ Create collections of datasets and samples to increase findability of
your resources.
✓ Provide data access in a trustworthy manner, using the Five Safes
framework.
✓ Fully digital system to manage your data access requests.
13. Gain insight into data reusability prior to requesting access
[Screenshots: technical details & metadata wheel; data utility]
14. Enabling a common, transparent approach to data access management
based around the Five Safes framework
1. Safe People
e.g. Approved researchers scheme
2. Safe Projects
Project proposal reviewed by governance board
convened by the data custodian
3. Safe Settings
Data access provided within a Trusted Research
Environment
4. Safe Data
Data supplied in de-identified form to minimise the risk of
identifying individuals
5. Safe Output
Analysis outputs checked by data custodians prior to
release
15. Aligning approaches to evaluation of public benefit in data access
request applications
The National Data Guardian guidance on public benefit will be published later in the year
➢ How will you demonstrate that use of the data requested will deliver public benefit?
➢ Do you anticipate any risks to individuals, and if so, what steps have you taken in your proposal to mitigate the risks?
17. Towards an interoperable infrastructure: Federation – Metadata Catalogues,
Cohort Discovery & Trusted Research Environments
[Diagram labels: Query Engine; Common API; Federated Metadata Catalogues; Federated Data Custodian Environments (No Direct Access); Cohort Discovery; Link Agents; Federated Analysis; Federated Trusted Research Environments]
A TRE is a highly secure environment used for saving, accessing and analysing health and care data. Within any
TRE there are multiple layers of security and safeguards in place, to minimise the risk of data being misused.
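The federated pattern on this slide, in which a query engine reaches custodian environments without direct access to the data, might be sketched as below. This is an illustrative assumption about how such a design could work, not a description of HDR UK's system: `CustodianAgent`, `federated_count` and the sample records are hypothetical.

```python
from typing import Any, Callable, Dict, Iterable, List

Record = Dict[str, Any]
Query = Callable[[Record], bool]


class CustodianAgent:
    """Hypothetical agent that runs inside a data custodian's environment."""

    def __init__(self, name: str, records: List[Record]) -> None:
        self.name = name
        self._records = records  # row-level data stays with the custodian

    def run_count(self, query: Query) -> int:
        # Only an aggregate count is returned to the caller.
        return sum(1 for record in self._records if query(record))


def federated_count(agents: Iterable[CustodianAgent], query: Query) -> Dict[str, int]:
    """Send the same query to every agent and collect per-custodian counts."""
    return {agent.name: agent.run_count(query) for agent in agents}


# Synthetic records for illustration only.
agents = [
    CustodianAgent("custodian_a", [{"age": 30}, {"age": 70}, {"age": 45}]),
    CustodianAgent("custodian_b", [{"age": 22}, {"age": 81}]),
]

per_custodian = federated_count(agents, lambda r: r["age"] >= 65)
total = sum(per_custodian.values())
```

In this sketch the central function never sees individual records, only the counts each agent chooses to return.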
18. Developing and implementing community-agreed standards to maximise interoperability
• Data Utility Matrix
• International links
• Data Quality Tool Evaluation
• Data Officers Groups / Hubs
• Position/consultation papers
• Projects (e.g. COVID-19)
• Metadata completeness
• SIGs (FHIR, synthetic data, etc.)
Gateway development and streamlined access management are underpinned by harmonisation of
principles and best practice as driven by the Alliance
19. Enabling FAIR data
• Improving data utility: produce guidance and policy regarding data quality, standards, dataset publishing, data provenance, and ontology / terminology services and use. Output: data utility framework
• Promoting participation and improving access: ensure consistency and harmonise principles, definitions and process steps to develop best practice for assessing requests to access UK health data. Output: harmonised Five Safes DAR
• Aligning approaches to trusted research environments: secure analytics environments. Output: recommendation for a TRE standard
• Improving transparency in data use: understand interests and values of people in health data. Output: recommendations for a data use register standard
21. Data Use Registers
• A data use register is a record showing when and why a dataset has been shared with another organisation.
• Data uses are not always made public. There is also a lack of standardisation for data use registers (content,
functionality and purpose).
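As a rough illustration of the definition on this slide (a record of when and why a dataset was shared with another organisation), a register entry could be modelled as below. The class `DataUseEntry`, its field names and the sample entries are hypothetical; they are not the fields recommended in the HDR UK white paper.

```python
from dataclasses import dataclass
from datetime import date
from typing import List


@dataclass(frozen=True)
class DataUseEntry:
    """Hypothetical register entry: which dataset, shared with whom, why and when."""
    dataset: str
    recipient_organisation: str
    purpose: str        # the "why"
    date_shared: date   # the "when"


def uses_of(register: List[DataUseEntry], dataset: str) -> List[DataUseEntry]:
    """Return every recorded use of the named dataset."""
    return [entry for entry in register if entry.dataset == dataset]


# Made-up entries for illustration only.
register = [
    DataUseEntry("Example dataset A", "Example University", "Asthma outcomes study", date(2021, 5, 1)),
    DataUseEntry("Example dataset B", "Example Company", "Service evaluation", date(2021, 7, 15)),
]

dataset_a_uses = uses_of(register, "Example dataset A")
```

A shared set of fields is what would let registers from different custodians be published and compared consistently.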
22. By establishing a core set of standards for Data Use
Registers, we hope to…
• develop a culture of openness amongst health data custodians
• increase transparency in the use of health data for research and innovation
  • HDR UK Alliance members agree to transparency of governance and operations
• build public trust by demonstrating the value and benefit of using health data
  • National Data Guardian and Understanding Patient Data highlight that ‘transparency cannot be separated from public benefit’
• generate insights into how health data are used and accessed
“Being clear how data is used, is a vital step to ensure the whole process is meaningful and trusted, in terms of outcomes, cost effectiveness and public trust.”
Public contributor
23. Developing the Data Use Register standard
Community involvement
• Collaborative approach with input from public and lay representatives,
data custodians, researchers, policy makers and funders (more than 100
people and 50 orgs contributed to the standard)
Analysis of 48 health data custodian entities - May 2021 green paper
• Nearly 50% of data custodians do not publish information about data use
Public consultation on green paper - July 2021
• Recommendations supported by 93% of respondents
Recommendations for Data Use Registers - Jan 2022 white paper
• Recommendations presenting a minimum standard for data use registers
Development of a Gateway data use register - ongoing
• Implementation of the Data Use Register recommendations on HDR UK’s
Innovation Gateway
Data use register standards white paper: doi.org/10.5281/zenodo.5902743
24. Inclusion of patient and lay public groups
for fairer health data
Moving beyond ‘FAIR’
25. HDR UK exemplifies working with public and patients on all aspects of health data science
“I am glad you're involving me from what seems to be the beginning so that you can actually take my concerns and address them whilst helping the greater good”
Patient / Public Voice Rep
“It is essential that the public is included in this ground-breaking work.”
Margaret Rogers, Member of the HDR UK Public Advisory Board
• 92 people consulted to inform decisions on methods and process for clinical trial recruitment
• Consultation on COVID work across 7 UK-wide patient and public networks with 168 responses
• 16,500 contacts with patient & public contributors across the institute in 2020
• Strong Public Advisory Board providing strategic guidance on all our work
26. Webinars for the lay public on health data science
www.hdruk.ac.uk/events/data-access-discovery
27. Summary
• HDR UK is uniting the UK’s health data and streamlining access for researchers.
• The Innovation Gateway facilitates discovery of and access to health data.
• Increasing the FAIRness of data, by increasing the F, A, I of (meta)data, and indicating the
R of data.
• Demonstration of the importance of patient and lay public involvement in all aspects of
health data research.
• HDR UK is laying the foundations for an international alliance to enable a data-centric
response to COVID-19 and other health challenges
28. Thank you
Varsha Khodiyar, PhD
@varsha_khodiyar
Varsha.Khodiyar@hdruk.ac.uk
Health Data Research UK
www.hdruk.ac.uk
@HDR_UK
Visit the Gateway
www.healthdatagateway.org
29. Links and resources
Health Data Research UK: https://www.hdruk.ac.uk/
UK Health Data Research Alliance: https://ukhealthdata.org/
The Health Data Research Innovation Gateway: https://www.healthdatagateway.org/
Data use register standards green paper: https://zenodo.org/record/5084761#.YZq_t9DP02w
HDR UK Futures: https://www.hdruk.ac.uk/careers-in-health-data-science/continued-professional-development/power-up-your-health-data-science-knowledge/
Working in a Trusted Research Environment: https://www.youtube.com/watch?v=8CBy7mGFcEY
The importance of public and patient engagement and involvement: https://www.youtube.com/watch?v=tustdIQu4ZQ
An introduction to the Innovation Gateway: https://www.youtube.com/watch?v=U7mZ7cNE_KQ