Professor Paolo Missier
School of Computing
Newcastle University
October 2023
Realising the potential of Health Data Science:
opportunities and challenges to practical adoption
The promise of data-driven medicine and healthcare
Predictive, Preventative, Personalised, Participatory: a systems biology perspective on the future of medicine and health care
Hood L, Heath JR, Phelps ME, Lin B. Systems biology and new technologies enable predictive and preventative medicine. Science. 2004;306(5696):640–643.
Hood L, Balling R, Auffray C. Revolutionizing medicine in the 21st century through systems approaches. Biotechnol J. 2012;7(8):992–1001. (Provides an overview of the science and technological foundations of predictive, preventive, personalized and participatory healthcare.)
Flores M, Glusman G, Brogaard K, Price ND, Hood L. P4 medicine: how systems medicine will transform the healthcare sector and society. Per Med. 2013;10(6):565–576. https://doi.org/10.2217/pme.13.57. PMID: 25342952; PMCID: PMC4204402.
Schmidt C. Leroy Hood looks forward to P4 medicine: predictive, personalized, preventive, and participatory. J Natl Cancer Inst. 2014;106(12):dju416. https://doi.org/10.1093/jnci/dju416.
[1] Sagner M, McNeil A, Puska P, Arena R. The P4 health spectrum – a predictive, preventive, personalized and participatory continuum for promoting healthspan. Prog Cardiovasc Dis. 2017;59(5):506–521. https://doi.org/10.1016/j.pcad.2016.08.002.
A new approach in medicine that is predictive, preventive, personalized and participatory, which we label here as “P4” holds great promise to reduce the burden of chronic diseases by harnessing technology and an increasingly better understanding of environment-biology interactions, evidence-based interventions and the underlying mechanisms of chronic diseases. [1]
Data about us
Sagner M, McNeil A, Puska P, Arena R. The P4 health spectrum – a predictive, preventive, personalized and participatory continuum for promoting healthspan. Prog Cardiovasc Dis. 2017;59(5):506–521. https://doi.org/10.1016/j.pcad.2016.08.002.
Outline
• AI for HealthCare: a convergence of needs and opportunities
• A complex multifaceted landscape
• Challenges, opportunities, state of the art through two first-hand case studies
• Costs and Challenges throughout the data value chain
Understanding the facets of Health data
Data Science and Engineering → Benefits to patients, framed by five questions:
• Which data types? Clinical; lifestyle, social
• Where do datasets come from? Prospective vs retrospective
• How much do they cost? Acquisition; curation, annotation
• How large? Small vs Big Health Data
• Who can use it and how? Governance; protection
Which data? Capturing individuals’ complexity
Primary care records:
- Clinical tests / GP notes, diagnoses / Prescriptions
Secondary care records:
- hospital admission / diagnoses / operations / prescriptions
Multi-omics data:
- genotypes, exomes, genomes.
- Transcriptomics, proteomics
Digital Health:
- Data streams from wearable and environment sensors,
self-monitoring
Socio-demographics:
- Area of residence, family, social deprivation
Example: UK Biobank (participants identified by eid; up to 20 years of records)
- Baseline assessment: N = 500,000
- GP events and prescriptions: N = 240,000 (57,698,505 and 123,644,445 records)
- Hospital events (HESIN diagnoses, operations): used to determine admission/re-admission patterns
CPRD
Data access fee for research: ~£60K (non-commercial licence)
Population makeup:
over 2,000 primary care practices
60 million patients (18 million currently registered, active patients)
at least 20 years of follow-up for 25% of the patients
Core dataset:
Demographics
Diagnoses and symptoms
Drug exposures
Vaccination history
Laboratory tests
Referrals to hospital and specialist care
Data linkages:
Hospital care (A&E; Inpatient; Outpatient; Imaging)
Death registry
Cancer registry and treatment
Mental health services
Socio-economic measures
A convergence of needs and opportunities
- P4 data-driven healthcare
- Personal self-monitoring devices: medical grade → consumer grade
- (Big) Health Data: bigger == more useful?
- Health Data Science and Engineering:
  - Operations research
  - ML, AI methods
  - Scalable computing
- Governance, consent, secure data access:
  - Privacy (e.g. GDPR)
  - Opt-in vs opt-out
  - Trusted Research Environments
The data-to-actions loop
Monitoring, clinical testing → Data Engineering → Predictive Analytics / AI → Personalised predictions → Prevention, interventions → back to monitoring
A complex health data science landscape for translational research
Data and methods: data ingestion → data preparation / engineering → descriptive analytics → pattern discovery → predictions
Tasks and methods:
- Data ingestion: protocol design (prospective); dataset search and selection (retrospective); data integration
- Data preparation / engineering: data cleaning; data standardisation; data augmentation (annotation amplification, synthetic data)
- Descriptive analytics: population characterisation; subgroup identification (“group by”)
- Pattern discovery: patient subtyping; disease subtyping (clustering, Latent Class Analysis)
- Predictions: risk prediction; next disease prediction; {bio, digital} markers discovery; other outcomes. Methods: process modelling (HMM); established ML; deep NN; generative AI (e.g. BEHRT)
Challenges:
- Data: cross-source integration across types (clinical/EHR/omics/sensors); understanding data semantics; data and annotation scarcity; managing the quality/quantity/cost envelope; bias control; data noise
- Advancing the methods: “Better data science for better science”
- Architectures: data governance, computational scalability → Safe Data Environments; end-to-end explainability → provenance engineering, demonstrating the benefits; Reproducible Analytics Pipelines (RAP)
II. Prospective vs retrospective datasets
Prospective: defined for research purposes
✓ Stable and predictable
✓ Follow a protocol
✓ Research-ready
✓ Potentially well-curated
✓ Bias known a priori
✗ Expensive
✗ Not very reusable
✗ Scarce
Example: UK Biobank
- 500,000 volunteer participants
- General health information
- Genotypes and whole genomes
- Selected internal organ imaging study (100K)
- Bias: participants aged 40+, geographic / social bias
Retrospective: typically operational data
- Potentially more reusable
- Natural bias (reflects the locality of the natural cohort)
✗ Generally not research-ready
✗ Require data engineering
Example: Clinical Practice Research Datalink (CPRD)
- Data collected from UK GP practices
- 60+ million patients
- (also prospective)
Cost of health data
Retrospective: integration/harmonisation, curation, cleaning
Prospective: cost of cohort recruitment, data collection, data processing
Acquisition + processing cost by data type, from low to high:
- Routinely collected clinical variables (GP tests)
- Tests requiring specialist labs; proteomics; genotyping (a few genes)
- Whole exome sequencing
- Whole genome sequencing
Case study: LITMUS
• EU IMI2 project
• Non-Alcoholic Fatty Liver Disease (NAFLD / steatosis) and NASH (fibrosis, cirrhosis)
https://litmus-project.eu/litmus-partners/
Retrospective data collected from hospital datasets (N ≅ 10K)
Prospective data from active recruitment (N ≅ 2K):
- Routine clinical tests
- Omics (genotypes, transcriptomes, proteomes)
- Biopsies → provide label annotations
From multivariate linear regression to non-linear combinations of markers
Main contributor: Matt McTeer, PhD student
Data scarcity / sparsity issues
N = 9,449 participants, of whom with each data type:
- Clinical: 8,745
- GWAS: 2,216
- miRNA: 183
- RNASeq: 461
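To make the sparsity concrete, the counts above can be turned into per-modality coverage. A minimal Python sketch; its only inputs are the slide's counts, and it assumes each count is the number of participants who have that modality (modalities overlap, so the fractions do not sum to 1):

```python
# Participants with each data modality, as reported on the slide (N = 9,449 overall).
TOTAL = 9_449
MODALITY_COUNTS = {"Clinical": 8_745, "GWAS": 2_216, "miRNA": 183, "RNASeq": 461}


def coverage(counts: dict[str, int], total: int) -> dict[str, float]:
    """Fraction of the cohort that has each modality."""
    return {name: n / total for name, n in counts.items()}


if __name__ == "__main__":
    for name, frac in coverage(MODALITY_COUNTS, TOTAL).items():
        print(f"{name:>8}: {frac:6.1%}")
```

This gives roughly 92.5% for clinical variables, 23.5% for GWAS, 4.9% for RNASeq and 1.9% for miRNA: any model that requires an omics modality is restricted to a small fraction of the cohort.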
Exploring the cost/quality/importance envelope
Stratified feature set: Core → Extended → Specialist (85% missing)
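The trade-off between adding feature tiers and losing complete records (“long and thin” vs “short and broad” training sets) can be sketched as follows. The records, feature names and tier assignments are hypothetical toy values, not the LITMUS schema, and complete-case counting is only one way to measure the effect (imputation being the usual alternative):

```python
from typing import Optional

# Hypothetical feature tiers: illustrative names only, not the LITMUS variables.
TIERS = {
    "core": ["age", "bmi", "alt"],
    "extended": ["hba1c", "platelets"],
    "specialist": ["protein_marker"],
}

# Hypothetical toy records; None marks a missing measurement.
RECORDS: list[dict[str, Optional[float]]] = [
    {"age": 54, "bmi": 31.2, "alt": 48, "hba1c": 41, "platelets": 210, "protein_marker": 1.7},
    {"age": 61, "bmi": 29.0, "alt": 35, "hba1c": 39, "platelets": 190, "protein_marker": None},
    {"age": 47, "bmi": 33.5, "alt": 62, "hba1c": None, "platelets": 240, "protein_marker": None},
    {"age": 58, "bmi": 27.8, "alt": 29, "hba1c": 44, "platelets": 175, "protein_marker": None},
    {"age": 66, "bmi": None, "alt": 51, "hba1c": 47, "platelets": 205, "protein_marker": None},
]


def complete_cases(records, features):
    """Number of records with a value for every feature in `features`."""
    return sum(all(r.get(f) is not None for f in features) for r in records)


def missing_rate(records, feature):
    """Fraction of records where `feature` is missing."""
    return sum(r.get(feature) is None for r in records) / len(records)


def cumulative_tiers(tiers):
    """Yield (tier name, features so far): core, core+extended, core+extended+specialist."""
    features = []
    for name, tier_features in tiers.items():
        features = features + tier_features
        yield name, features


if __name__ == "__main__":
    for name, features in cumulative_tiers(TIERS):
        print(f"up to {name:<10} {len(features)} features, "
              f"{complete_cases(RECORDS, features)} complete records")
```

Each added tier makes the training set broader but shorter: on this toy data the complete records drop from 4 to 3 to 1 as the extended and specialist tiers are added.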
Are core variables enough?
Outcome: “at-risk NASH”
LITMUS in the health data science landscape
The landscape slide above is repeated here; LITMUS adds the following to it:
- Tasks and methods: statistical modelling (multivariate regression)
- LITMUS-specific issues:
  - “Long and thin” vs “short and broad” training sets
  - Feature completeness vs importance, imputation
  - Binary classifiers across multiple feature sets
Issues requiring Data Engineering
Recurring data issues
Data-driven, AI-based clinical practice: experiences, challenges, and research directions
DATA SPARSITY AND SCARCITY
• EHR: irregular collections of time series
• Imputation is not always possible
DATA IMBALANCE
• Predicting rare events can be a priority
• No downsampling option
DATA INCONSISTENCY AND INSTABILITY
• Retrospective data are often a source of inconsistency, and their schemas are unstable
NOT ALL ERRORS ARE EQUALLY WRONG
• In high-stakes domains, a bias towards one type of error is sometimes preferable
HUMAN-IN-THE-LOOP
• Explanations engender trust in the models
• Trust should include not only the clinician but also the patient
Sparsity/scarcity, imbalance
Classifiers are not resilient to class imbalance:
- Models will be biased towards predicting the majority class regardless of the input features
- They will struggle to generalise correctly on the minority class
- In clinical datasets, data scarcity/sparsity often conspires with data imbalance
- Imbalance is very common in medical datasets
Typical mitigation:
- Downsample the majority class → lose training examples
- Upsample the minority class, e.g. with SMOTE (Synthetic Minority Oversampling Technique)
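The oversampling idea behind SMOTE can be sketched in a few lines: each synthetic sample is placed at a random point on the segment between a minority-class sample and one of its k nearest minority-class neighbours. This is a simplified illustration in plain Python (random choice of the seed sample, Euclidean distance), not a replacement for a maintained implementation such as `imblearn.over_sampling.SMOTE` in the imbalanced-learn library:

```python
import math
import random


def smote(minority: list[list[float]], n_new: int, k: int = 5, seed: int = 0) -> list[list[float]]:
    """Generate n_new synthetic minority samples, SMOTE-style.

    For each synthetic point: pick a random minority sample x, pick one of its
    k nearest minority neighbours x_nn, and return x + u * (x_nn - x), u ~ U(0, 1).
    """
    if len(minority) < 2:
        raise ValueError("SMOTE needs at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        x = minority[i]
        # k nearest neighbours of x among the other minority samples
        others = [m for j, m in enumerate(minority) if j != i]
        neighbours = sorted(others, key=lambda m: math.dist(x, m))[:k]
        x_nn = rng.choice(neighbours)
        u = rng.random()
        synthetic.append([a + u * (b - a) for a, b in zip(x, x_nn)])
    return synthetic


if __name__ == "__main__":
    minority = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
    for point in smote(minority, n_new=3, k=2):
        print([round(v, 3) for v in point])
```

The new points are interpolated feature vectors, which is only meaningful when rows are independent samples.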
When modelling processes, these mitigations do not work, since each patient contributes a sequence of states rather than independent samples.
We used Hidden Markov Models (HMMs) to predict oxygen-therapy state transitions.
However, intubation is an infrequent state (and so is “death”).
This made it difficult to learn the probability distributions accurately.
[1] proposes a novel, generic ensemble technique to mitigate the imbalance problem in HMMs.
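Why rare states are hard to learn can be seen from the maximum-likelihood estimate of a transition matrix: each row is estimated only from the transitions observed out of that state. The sketch below assumes fully observed daily states (i.e. the Markov chain part of the model, not Baum-Welch training of hidden states), and the state names and sequences are hypothetical:

```python
from collections import Counter, defaultdict

# Hypothetical daily oxygen-therapy states for four patients.
SEQUENCES = [
    ["ward", "ward", "oxygen", "oxygen", "ward", "discharged"],
    ["ward", "oxygen", "oxygen", "oxygen", "ward", "discharged"],
    ["ward", "oxygen", "intubated", "intubated", "oxygen", "ward"],
    ["ward", "ward", "ward", "discharged"],
]


def transition_mle(sequences):
    """Maximum-likelihood transition probabilities from fully observed state sequences.

    Returns (probabilities, support), where support[state] is the number of
    observed transitions out of `state`: the sample size behind that row.
    """
    pair_counts = defaultdict(Counter)
    for seq in sequences:
        for src, dst in zip(seq, seq[1:]):
            pair_counts[src][dst] += 1
    probabilities, support = {}, {}
    for src, counts in pair_counts.items():
        total = sum(counts.values())
        support[src] = total
        probabilities[src] = {dst: c / total for dst, c in counts.items()}
    return probabilities, support


if __name__ == "__main__":
    probabilities, support = transition_mle(SEQUENCES)
    for state, n in support.items():
        print(f"{state:<10} estimated from {n} transitions: {probabilities[state]}")
```

Here the "intubated" row rests on only two transitions, so its 0.5/0.5 split is little more than noise, and absorbing states such as discharge (or death) contribute no outgoing transitions at all.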
Instability
Retrospective studies are often unstable:
Data acquisition and management practices may change over time, following changes in
- Clinical practices
- Public policy
- Hospital resources
- Data collection technologies
- In our COVID dataset, clinical tests varied daily depending on the patient’s condition
- Scientific evidence for the need for certain tests changed rapidly
- Example: new biomarkers such as interleukin-6 were introduced “mid-flight”
- Thus datasets from earlier in the study miss this variable entirely
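A simple check for variables introduced “mid-flight” is to record the first date on which each variable appears and flag those that start after a cutoff. A minimal sketch; the (date, variable) observation format, the variable names and the dates are hypothetical:

```python
from datetime import date

# Hypothetical (date, variable) observations from a retrospective dataset.
OBSERVATIONS = [
    (date(2020, 3, 12), "crp"),
    (date(2020, 3, 12), "d_dimer"),
    (date(2020, 4, 2), "crp"),
    (date(2020, 5, 20), "il6"),
    (date(2020, 6, 1), "il6"),
]


def first_seen(observations):
    """Earliest date on which each variable was recorded."""
    seen = {}
    for day, variable in observations:
        if variable not in seen or day < seen[variable]:
            seen[variable] = day
    return seen


def introduced_after(observations, cutoff):
    """Variables whose first recording is later than `cutoff`, i.e. added mid-study."""
    return sorted(v for v, day in first_seen(observations).items() if day > cutoff)


if __name__ == "__main__":
    print(introduced_after(OBSERVATIONS, date(2020, 3, 31)))
```

Records dated before such a variable's first appearance lack it because it was not yet measured, not at random, which matters when choosing between imputation and restricting the analysis period.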
Translational challenge: Not all errors are equally wrong
- In high-stakes domains, prediction errors are not symmetric:
- Typically, underestimating risk is less desirable than overestimating it
- Standard model performance metrics (e.g. AUC, F1) fail to capture this distinction
Cost-sensitive learning (see e.g. [1,2,3])
- Introduce an explicit penalty for misclassifying samples
- Note that cost-sensitive methods can sometimes deal with imbalanced datasets without altering the original data distribution [4]
[1] Lomax, S., and Vadera, S. (2013). A survey of cost-sensitive decision tree induction algorithms. ACM Comput. Surveys 45, 1–35. doi: 10.1145/2431211.2431215
[2] Wang, H., Cui, Z., Chen, Y., Avidan, M., Abdallah, A. B., and Kronzer, A. (2018). Predicting hospital readmission via cost-sensitive deep learning. ACM Trans. Comput. Biol. Bioinformatics 15, 1968–1978. doi: 10.1109/TCBB.2018.2827029
[3] Freitas, A., Costa-Pereira, A., and Brazdil, P. (2007). “Cost-sensitive decision trees applied to medical data,” in Data Warehousing and Knowledge Discovery (Regensburg), 303–312. doi: 10.1007/978-3-540-74553-2_28
[4] Mienye, I. D., and Sun, Y. (2021). Performance analysis of cost-sensitive learning methods with application to imbalanced medical data. Inform. Med. Unlock. 25:100690. doi: 10.1016/j.imu.2021.100690
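One simple way to make the asymmetry explicit, assuming a probabilistic classifier and zero cost for correct predictions, is to move the decision threshold: predicting “high risk” when p * C_FN > (1 - p) * C_FP gives the threshold t = C_FP / (C_FP + C_FN). This is a post-hoc, cost-sensitive decision rule rather than the cost-sensitive training methods surveyed in [1-4]; the cost values and predictions below are hypothetical:

```python
def cost_sensitive_threshold(cost_fp: float, cost_fn: float) -> float:
    """Probability threshold minimising expected cost (correct predictions cost 0).

    Predict positive when p * cost_fn > (1 - p) * cost_fp, i.e. p > cost_fp / (cost_fp + cost_fn).
    """
    return cost_fp / (cost_fp + cost_fn)


def expected_cost(y_true, p_pred, threshold, cost_fp, cost_fn):
    """Average misclassification cost of thresholded probability predictions."""
    total = 0.0
    for y, p in zip(y_true, p_pred):
        y_hat = 1 if p > threshold else 0
        if y_hat == 1 and y == 0:
            total += cost_fp  # false positive: risk overestimated
        elif y_hat == 0 and y == 1:
            total += cost_fn  # false negative: risk underestimated
    return total / len(y_true)


if __name__ == "__main__":
    # Hypothetical costs: missing a high-risk patient is 5x worse than a false alarm.
    C_FP, C_FN = 1.0, 5.0
    y_true = [1, 1, 0, 0, 0]
    p_pred = [0.3, 0.8, 0.2, 0.1, 0.6]
    t = cost_sensitive_threshold(C_FP, C_FN)
    print(f"threshold = {t:.3f}")
    print("cost at 0.5:", expected_cost(y_true, p_pred, 0.5, C_FP, C_FN))
    print("cost at t:  ", expected_cost(y_true, p_pred, t, C_FP, C_FN))
```

With a false negative five times as costly as a false positive, the threshold drops from the default 0.5 to 1/6, and on this toy data the average cost falls from 1.2 to 0.4.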
Translational challenge: human-in-the-loop AI
• Essential in medical AI
• Evidence of performance is not enough
• Black-box AI is not acceptable in clinical practice
From technical explanations:
• Non-linear [1] and deep learning [2] models
• Shapley values [3]
• Interpretable ML [4,5]
“Explanation gap”
To expert involvement in the learning process:
- By accepting/rejecting predictions
- By expressing a preference for a given error type
Causal Machine Learning (CML) [6,7]:
- Visualisation and reasoning over complex clinical scenarios
- Counterfactuals, what-if scenarios
Also importantly: Patient and Public Involvement (PPI) is essential in publicly funded clinical research
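Shapley values [3] attribute a prediction to individual features by averaging each feature's marginal contribution over all coalitions of the other features. The exact computation, feasible only for a handful of features, can be sketched as below; the risk model, the feature names and the choice of “replace absent features with a baseline value” as the coalition value are hypothetical illustrations (libraries such as `shap` approximate this for real models):

```python
from itertools import combinations
from math import factorial

# Hypothetical standardised feature values for one patient and a reference baseline.
PATIENT = {"age": 1.0, "ltc_count": 1.0, "polypharmacy": 1.0}
BASELINE = {"age": 0.0, "ltc_count": 0.0, "polypharmacy": 0.0}


def risk_score(x):
    """Hypothetical risk model with an LTC-count x polypharmacy interaction."""
    return 0.2 * x["age"] + 0.3 * x["ltc_count"] + 0.4 * x["ltc_count"] * x["polypharmacy"]


def coalition_value(coalition):
    """Model output when features in `coalition` take the patient's values, others the baseline."""
    x = {f: PATIENT[f] if f in coalition else BASELINE[f] for f in PATIENT}
    return risk_score(x)


def shapley_values(value, players):
    """Exact Shapley values: weighted average of marginal contributions over all coalitions."""
    n = len(players)
    phi = {}
    for i in players:
        others = [p for p in players if p != i]
        total = 0.0
        for size in range(n):
            # Weight |S|! (n - |S| - 1)! / n! for each coalition S of this size without i
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = frozenset(subset)
                total += weight * (value(s | {i}) - value(s))
        phi[i] = total
    return phi


if __name__ == "__main__":
    phi = shapley_values(coalition_value, list(PATIENT))
    print({f: round(v, 3) for f, v in phi.items()})
```

The interaction term (0.4) is split equally between LTC count and polypharmacy, giving 0.2 (age), 0.5 (LTC count) and 0.2 (polypharmacy), which sum to the difference between the patient's score (0.9) and the baseline score (0).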
IEEE BigData 2022
Multimorbidities and disease prediction
Multiple Long-Term Conditions (MLTC), defined as [1,2]:
• Two (or, in some definitions, four) or more long-term (chronic) conditions
A Long Term Condition (LTC) is a condition that cannot, at present, be cured but is controlled by medication and/or other treatment/therapies (*)
(*) NHS and UK Dept. of Health, Long Term Conditions Compendium of Information Third Edition,
https://assets.publishing.service.gov.uk/government/uploads/system/uploads/attachment_data/file/216528/dh_134486.pdf
[1] M. C. Johnston, M. Crilly, C. Black, G. J. Prescott, and S. W. Mercer, “Defining and measuring multimorbidity: a systematic review of systematic reviews,” European Journal of Public Health, vol. 29, no. 1, pp. 182–189, 2019.
[2] B. P. Nunes, T. R. Flores, G. I. Mielke, E. Thumé, and L. A. Facchini, “Multimorbidity and mortality in older adults: a systematic review and meta-analysis,” Archives of Gerontology and Geriatrics, vol. 67, pp. 130–138, Dec. 2016.
Significant research investment by NIHR, the core translational medicine funder in the UK
The number of people with multiple LTCs in the UK is set to rise from 1.9 million in 2008 to 2.9 million in 2018.
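Operationally, the definition reduces to counting distinct long-term conditions per patient against a threshold that varies between studies (two or four). A minimal sketch; the condition set below is a hypothetical placeholder, not the NHS compendium's list, and it leaves out the hard part of mapping real diagnosis codes onto conditions:

```python
# Hypothetical set of long-term conditions (placeholder names, not an official list).
LTC_CONDITIONS = {"diabetes", "copd", "heart_failure", "depression", "ckd", "hypertension"}


def count_ltcs(diagnoses, ltc_conditions=LTC_CONDITIONS):
    """Number of distinct long-term conditions among a patient's diagnoses."""
    return len(set(diagnoses) & ltc_conditions)


def has_mltc(diagnoses, min_conditions=2, ltc_conditions=LTC_CONDITIONS):
    """True if the patient meets the MLTC threshold (2+ or 4+ depending on the definition)."""
    return count_ltcs(diagnoses, ltc_conditions) >= min_conditions


if __name__ == "__main__":
    patient = ["diabetes", "fracture", "copd", "diabetes", "hypertension"]
    print(count_ltcs(patient))  # repeated and acute diagnoses are not counted
    print(has_mltc(patient), has_mltc(patient, min_conditions=4))
```

The same patient counts as MLTC under the "two or more" definition but not under "four or more", which is one reason results from different studies are hard to compare.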
Multiple Long Term Conditions: research at Newcastle
Characterising the inter-relationships between multiple long-term conditions (MLTC) and polypharmacy
Funding: NIHR, 2022-2024 (Co-I)
• Disease clustering based on co-occurrence in patients’ medical timelines
• Patient clustering based on timeline similarity
• Predicting outcomes using diagnoses + prescriptions / deep learning
  - Disease embeddings → supervised learning for outcome prediction
• Characterise patient pathways through hospital → process modelling
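Disease clustering based on co-occurrence typically starts from counts of how many patients share each pair of conditions. A minimal sketch of that first step; the clustering algorithm applied afterwards is not specified here, and the patient identifiers and condition names are hypothetical:

```python
from collections import Counter
from itertools import combinations

# Hypothetical patient timelines: patient id -> diagnoses in chronological order.
TIMELINES = {
    "p1": ["t2d", "hypertension", "ckd"],
    "p2": ["hypertension", "t2d", "t2d"],
    "p3": ["copd", "hypertension"],
}


def cooccurrence(timelines):
    """Number of patients in which each unordered pair of diagnoses co-occurs.

    Order, timing and repeats are discarded: each timeline becomes a set.
    """
    pairs = Counter()
    for diagnoses in timelines.values():
        for a, b in combinations(sorted(set(diagnoses)), 2):
            pairs[(a, b)] += 1
    return pairs


if __name__ == "__main__":
    for pair, n in cooccurrence(TIMELINES).most_common():
        print(pair, n)
```

Because timelines are reduced to sets, this sketch ignores the temporal order that the timelines carry; order-aware variants would count ordered pairs within a time window instead.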
Datasets (UK Biobank, CPRD, GNCR, ELPR):
- National discovery datasets: discover
- Replication datasets: replicate
- Local health intelligence datasets: test
AI methods:
- Event spatial & temporal order
- Event characterisation
- Event prediction
Outputs & impacts:
- Shared standards; portable pipelines
- Identification of high-risk situations & tipping points
- Trial emulation
- Local and national policies (high-risk groups)
- Training & capacity building
- Clinical dashboard; clinical support tools
- Communication of results; explainable research & explainable AI
- Improve patient care; reduce health inequalities
Also shown: NIHR AIM CISC; Connected Bradford; NIHR AIM OPTIMAL
Predicting MLTC-PP outcomes: hospital readmission
Model inputs:
- LTC (within 5 years): embedding 200x100
- Diagnosis: embedding 251x50
- Historical prescriptions: embedding 512x50
- Pre-admission prescriptions: embedding 512x50
- Post-admission prescriptions: embedding 512x50
- Demographics vector: sex, ethnicity, Townsend deprivation index, etc.
→ Feature vector (size 6048) → Gradient boosting (100 estimators) → Output: {0, 1}
• Neural network + XGBoost combination
• Our readmission cohort is more general than the domain-specific cohorts in the literature
• Our model performs better than those in the current literature, despite addressing a more complex problem
Adding explanations: which predictors are most relevant to explaining unplanned readmission?
- LTCs and how they accumulate
- Prescriptions given between discharge and readmission?
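The architecture above concatenates per-patient representations from several embedding tables with a demographics vector before gradient boosting. How codes are pooled into a fixed-size vector is not stated; the sketch assumes mean pooling, uses tiny hypothetical tables in place of the trained 200x100, 251x50 and 512x50 embeddings, and stops before the classifier:

```python
# Hypothetical, tiny embedding tables: code -> vector. Real tables are learned.
EMBEDDINGS = {
    "ltc": {"t2d": [1.0, 0.0], "copd": [0.0, 1.0]},
    "prescriptions": {"metformin": [0.5, 0.5, 0.0], "salbutamol": [0.0, 0.2, 0.8]},
}
DIMS = {"ltc": 2, "prescriptions": 3}
DEMOGRAPHIC_KEYS = ["sex", "townsend"]


def mean_pool(codes, table, dim):
    """Average the vectors of a patient's known codes; a zero vector if none are known."""
    vectors = [table[c] for c in codes if c in table]
    if not vectors:
        return [0.0] * dim
    return [sum(column) / len(vectors) for column in zip(*vectors)]


def build_feature_vector(patient):
    """Concatenate pooled embeddings (in a fixed table order) and demographics."""
    features = []
    for name, table in EMBEDDINGS.items():
        features.extend(mean_pool(patient[name], table, DIMS[name]))
    features.extend(float(patient["demographics"][k]) for k in DEMOGRAPHIC_KEYS)
    return features


if __name__ == "__main__":
    patient = {
        "ltc": ["t2d", "copd"],
        "prescriptions": [],
        "demographics": {"sex": 1, "townsend": -2.1},
    }
    print(build_feature_vector(patient))
```

Rows built this way would then be passed to a gradient-boosted classifier, for example `xgboost.XGBClassifier(n_estimators=100)`, matching the 100 estimators on the slide. Keeping a fixed table order matters: every patient's vector must line up column by column.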
AI-MULTIPLY in the health data science landscape
The landscape slide above is repeated here, annotated with the challenges met in AI-MULTIPLY:
- Challenges in drug coding in UK Biobank
- Defining and predicting hospital readmission
- Defining and coding MLTC
- Reproducing DNN results across sites
- Disease clustering and cluster prediction
Challenges and opportunities
Data:
• Multi-site research presents opportunities for cross-validation of results, but also challenges:
  • Newcastle → UK Biobank
  • QMUL → CPRD
• Projects like these tend to “piggyback” on existing data licences, which may be restrictive
Modelling:
• LLMs and generative AI have shown potential to “sidestep” some of the more traditional prediction techniques
  → Next disease prediction becomes a case of “sentence completion”
Engineering / reproducibility:
• At this stage, prototyping and experimenting are distributed across sites, and each piece is owned by one researcher
• Reproducibility and reusability both seem like distant goals…
Patient and Public Involvement and Engagement:
• Establishing a productive and sustained relationship between PPIE members and researchers is a priority
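Framing next disease prediction as “sentence completion” means turning each patient's timeline into a token sequence, much as language models treat words. A minimal sketch in the spirit of BEHRT-style models; the [CLS]/[SEP] token names, the visit-level next-step targets and the example ICD-10 codes are illustrative assumptions, not the project's pipeline:

```python
# Illustrative ICD-10 codes: I10 hypertension, E11 type 2 diabetes, N18 chronic kidney disease.
VISITS = [["I10"], ["E11", "I10"], ["N18"]]


def timeline_to_tokens(visits, cls="[CLS]", sep="[SEP]"):
    """Flatten visits (lists of diagnosis codes) into one 'sentence', closing each visit with `sep`."""
    tokens = [cls]
    for visit in visits:
        tokens.extend(visit)
        tokens.append(sep)
    return tokens


def next_visit_examples(visits):
    """(history tokens, next-visit codes) pairs: complete the sentence with the next visit."""
    return [
        (timeline_to_tokens(visits[:i]), sorted(set(visits[i])))
        for i in range(1, len(visits))
    ]


if __name__ == "__main__":
    print(timeline_to_tokens(VISITS))
    for history, target in next_visit_examples(VISITS):
        print(history, "->", target)
```

A sequence model trained on such pairs predicts the codes of the next visit from the history, which is the “sentence completion” view of next disease prediction.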
Role of PPIE in Health Data Science / AI projects
PPIE involvement “built into” every NIHR-funded project: it’s an asset and opportunity
BUT: need to make it work!
What kind of involvement? Consultation vs research co-design
• Periodic, scheduled “themed” sessions at designated project checkpoints
• Key research questions defined upfront, but opportunities to revise / refine mid-flight
• The academic perspective and the lived experiences are very different
• Need to find a common language
• But also to find a way to ensure mutual benefit and a two-way learning experience
PPIE: some elements for reflection
Engendering trust in AI and in secure data management practices
• Where is your data held? How do TREs work?
• What are the boundaries of legitimate use of your data for research? How is the law changing?
• Transparency and explainability: how can we communicate effectively what an algorithm is doing?
What outcomes are most relevant? Are those aligned with the data we work with?
• E.g. ensuring good quality of life for LTC patients is very important, but such data are hardly available
Medication / prescriptions:
• Meeting expectations such as “predict the best combination of medicines” presents hard challenges
Data limitations: “you don’t know half my story”
Data governance issues: the emerging UK landscape
https://www.goldacrereview.org/
Build a small number of Trusted Research Environments, avoiding duplication
Promote culture of reuse of code (curation pipelines, analytics)
- “Reproducible Analytical Pipelines” (RAP), a set of best practices
- Promote high-quality, shared, reviewable, reusable, well-documented code for standardised data curation and analysis
- Promote transparency, avoid black-box analysis
Adopt single governance rules for integrated data access
- Rationalise approvals: create one map of all approval processes
Build appropriate capabilities:
- Train academic researchers and NHS analysts in computational data science techniques
Cluster analysis workflows
Inputs: patients’ medical histories (EHR); cross-sectional data {[…]}
- Patients x diagnoses binary matrix (discard time) → Latent Class Analysis → patient / cluster associations → cluster phenotyping [1,2]
- Disease clustering, e.g. topic modelling [1]
- Patient clustering [2]
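The first step of the LCA branch, turning time-stamped histories into a patients x diagnoses binary matrix, can be sketched as follows. The event format and codes are hypothetical, and the LCA model fitted on the matrix is not shown:

```python
# Hypothetical EHR histories: patient id -> list of (date, diagnosis code) events.
HISTORIES = {
    "p2": [("2015-03-01", "E11"), ("2019-07-12", "E11"), ("2020-01-05", "I10")],
    "p1": [("2012-05-20", "J44")],
}


def binary_matrix(histories):
    """Patients x diagnoses 0/1 matrix; dates are dropped and repeated diagnoses count once.

    Returns (patient ids, diagnosis codes, rows), with rows aligned to both lists.
    """
    patients = sorted(histories)
    codes = sorted({code for events in histories.values() for _, code in events})
    column = {code: j for j, code in enumerate(codes)}
    rows = []
    for pid in patients:
        row = [0] * len(codes)
        for _date, code in histories[pid]:
            row[column[code]] = 1
        rows.append(row)
    return patients, codes, rows


if __name__ == "__main__":
    patients, codes, rows = binary_matrix(HISTORIES)
    print("patient", *codes)
    for pid, row in zip(patients, rows):
        print(pid, *row)
```

Sorting patients and codes makes the matrix layout deterministic, so cluster assignments can be mapped back to patient identifiers and diagnoses when phenotyping clusters.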
LCA example
[2]
Cluster phenotyping example
[2]
Workflow - example
[1]
Summary
Enablers:
Data availability
Scalable data processing technology
Inexpensive, accurate self-monitoring
Mature data science and engineering methods
Rapidly advancing AI
A unique convergence of opportunities and challenges to achieve a “P4” vision of data-driven medicine and healthcare management
Blockers:
Data access and governance, data integration
Data Quality control, device tolerance, intrusiveness
Data engineering expensive and ad hoc
Still very experimental. Trustworthy, Ethical, Responsible AI
Hard “management” questions:
- how do you calculate the “total cost of operation” for data-driven medicine?
- at which point does it become cost-effective for the health service?
- what are the real benefits to patients?
- …

Efficient Re-computation of Big Data Analytics Processes in the Presence of C...
Paolo Missier
 
Decentralized, Trust-less Marketplace for Brokered IoT Data Trading using Blo...
Decentralized, Trust-less Marketplacefor Brokered IoT Data Tradingusing Blo...Decentralized, Trust-less Marketplacefor Brokered IoT Data Tradingusing Blo...
Decentralized, Trust-less Marketplace for Brokered IoT Data Trading using Blo...
Paolo Missier
 
Efficient Re-computation of Big Data Analytics Processes in the Presence of C...
Efficient Re-computation of Big Data Analytics Processes in the Presence of C...Efficient Re-computation of Big Data Analytics Processes in the Presence of C...
Efficient Re-computation of Big Data Analytics Processes in the Presence of C...
Paolo Missier
 

More from Paolo Missier (20)

(Explainable) Data-Centric AI: what are you explaininhg, and to whom?
(Explainable) Data-Centric AI: what are you explaininhg, and to whom?(Explainable) Data-Centric AI: what are you explaininhg, and to whom?
(Explainable) Data-Centric AI: what are you explaininhg, and to whom?
 
Design and Development of a Provenance Capture Platform for Data Science
Design and Development of a Provenance Capture Platform for Data ScienceDesign and Development of a Provenance Capture Platform for Data Science
Design and Development of a Provenance Capture Platform for Data Science
 
Towards explanations for Data-Centric AI using provenance records
Towards explanations for Data-Centric AI using provenance recordsTowards explanations for Data-Centric AI using provenance records
Towards explanations for Data-Centric AI using provenance records
 
Interpretable and robust hospital readmission predictions from Electronic Hea...
Interpretable and robust hospital readmission predictions from Electronic Hea...Interpretable and robust hospital readmission predictions from Electronic Hea...
Interpretable and robust hospital readmission predictions from Electronic Hea...
 
Data-centric AI and the convergence of data and model engineering: opportunit...
Data-centric AI and the convergence of data and model engineering:opportunit...Data-centric AI and the convergence of data and model engineering:opportunit...
Data-centric AI and the convergence of data and model engineering: opportunit...
 
Provenance Week 2023 talk on DP4DS (Data Provenance for Data Science)
Provenance Week 2023 talk on DP4DS (Data Provenance for Data Science)Provenance Week 2023 talk on DP4DS (Data Provenance for Data Science)
Provenance Week 2023 talk on DP4DS (Data Provenance for Data Science)
 
Capturing and querying fine-grained provenance of preprocessing pipelines in ...
Capturing and querying fine-grained provenance of preprocessing pipelines in ...Capturing and querying fine-grained provenance of preprocessing pipelines in ...
Capturing and querying fine-grained provenance of preprocessing pipelines in ...
 
Tracking trajectories of multiple long-term conditions using dynamic patient...
Tracking trajectories of  multiple long-term conditions using dynamic patient...Tracking trajectories of  multiple long-term conditions using dynamic patient...
Tracking trajectories of multiple long-term conditions using dynamic patient...
 
Digital biomarkers for preventive personalised healthcare
Digital biomarkers for preventive personalised healthcareDigital biomarkers for preventive personalised healthcare
Digital biomarkers for preventive personalised healthcare
 
Digital biomarkers for preventive personalised healthcare
Digital biomarkers for preventive personalised healthcareDigital biomarkers for preventive personalised healthcare
Digital biomarkers for preventive personalised healthcare
 
Data Provenance for Data Science
Data Provenance for Data ScienceData Provenance for Data Science
Data Provenance for Data Science
 
Capturing and querying fine-grained provenance of preprocessing pipelines in ...
Capturing and querying fine-grained provenance of preprocessing pipelines in ...Capturing and querying fine-grained provenance of preprocessing pipelines in ...
Capturing and querying fine-grained provenance of preprocessing pipelines in ...
 
Quo vadis, provenancer?  Cui prodest?  our own trajectory: provenance of data...
Quo vadis, provenancer? Cui prodest? our own trajectory: provenance of data...Quo vadis, provenancer? Cui prodest? our own trajectory: provenance of data...
Quo vadis, provenancer?  Cui prodest?  our own trajectory: provenance of data...
 
Data Science for (Health) Science: tales from a challenging front line, and h...
Data Science for (Health) Science:tales from a challenging front line, and h...Data Science for (Health) Science:tales from a challenging front line, and h...
Data Science for (Health) Science: tales from a challenging front line, and h...
 
Analytics of analytics pipelines: from optimising re-execution to general Dat...
Analytics of analytics pipelines:from optimising re-execution to general Dat...Analytics of analytics pipelines:from optimising re-execution to general Dat...
Analytics of analytics pipelines: from optimising re-execution to general Dat...
 
ReComp: optimising the re-execution of analytics pipelines in response to cha...
ReComp: optimising the re-execution of analytics pipelines in response to cha...ReComp: optimising the re-execution of analytics pipelines in response to cha...
ReComp: optimising the re-execution of analytics pipelines in response to cha...
 
ReComp, the complete story: an invited talk at Cardiff University
ReComp, the complete story:  an invited talk at Cardiff UniversityReComp, the complete story:  an invited talk at Cardiff University
ReComp, the complete story: an invited talk at Cardiff University
 
Efficient Re-computation of Big Data Analytics Processes in the Presence of C...
Efficient Re-computation of Big Data Analytics Processes in the Presence of C...Efficient Re-computation of Big Data Analytics Processes in the Presence of C...
Efficient Re-computation of Big Data Analytics Processes in the Presence of C...
 
Decentralized, Trust-less Marketplace for Brokered IoT Data Trading using Blo...
Decentralized, Trust-less Marketplacefor Brokered IoT Data Tradingusing Blo...Decentralized, Trust-less Marketplacefor Brokered IoT Data Tradingusing Blo...
Decentralized, Trust-less Marketplace for Brokered IoT Data Trading using Blo...
 
Efficient Re-computation of Big Data Analytics Processes in the Presence of C...
Efficient Re-computation of Big Data Analytics Processes in the Presence of C...Efficient Re-computation of Big Data Analytics Processes in the Presence of C...
Efficient Re-computation of Big Data Analytics Processes in the Presence of C...
 

Recently uploaded

Artificial Intelligence to Optimize Cardiovascular Therapy
Artificial Intelligence to Optimize Cardiovascular TherapyArtificial Intelligence to Optimize Cardiovascular Therapy
Artificial Intelligence to Optimize Cardiovascular Therapy
Iris Thiele Isip-Tan
 
CHAPTER 1 SEMESTER V - ROLE OF PEADIATRIC NURSE.pdf
CHAPTER 1 SEMESTER V - ROLE OF PEADIATRIC NURSE.pdfCHAPTER 1 SEMESTER V - ROLE OF PEADIATRIC NURSE.pdf
CHAPTER 1 SEMESTER V - ROLE OF PEADIATRIC NURSE.pdf
Sachin Sharma
 
Surgery-Mini-OSCE-All-Past-Years-Questions-Modified.
Surgery-Mini-OSCE-All-Past-Years-Questions-Modified.Surgery-Mini-OSCE-All-Past-Years-Questions-Modified.
Surgery-Mini-OSCE-All-Past-Years-Questions-Modified.
preciousstephanie75
 
VVIP Dehradun Girls 9719300533 Heat-bake { Dehradun } Genteel ℂall Serviℂe By...
VVIP Dehradun Girls 9719300533 Heat-bake { Dehradun } Genteel ℂall Serviℂe By...VVIP Dehradun Girls 9719300533 Heat-bake { Dehradun } Genteel ℂall Serviℂe By...
VVIP Dehradun Girls 9719300533 Heat-bake { Dehradun } Genteel ℂall Serviℂe By...
rajkumar669520
 
GLOBAL WARMING BY PRIYA BHOJWANI @..pptx
GLOBAL WARMING BY PRIYA BHOJWANI @..pptxGLOBAL WARMING BY PRIYA BHOJWANI @..pptx
GLOBAL WARMING BY PRIYA BHOJWANI @..pptx
priyabhojwani1200
 
Antibiotic Stewardship by Anushri Srivastava.pptx
Antibiotic Stewardship by Anushri Srivastava.pptxAntibiotic Stewardship by Anushri Srivastava.pptx
Antibiotic Stewardship by Anushri Srivastava.pptx
AnushriSrivastav
 
How many patients does case series should have In comparison to case reports.pdf
How many patients does case series should have In comparison to case reports.pdfHow many patients does case series should have In comparison to case reports.pdf
How many patients does case series should have In comparison to case reports.pdf
pubrica101
 
CONSTRUCTION OF TEST IN MANAGEMENT .docx
CONSTRUCTION OF TEST IN MANAGEMENT .docxCONSTRUCTION OF TEST IN MANAGEMENT .docx
CONSTRUCTION OF TEST IN MANAGEMENT .docx
PGIMS Rohtak
 
一比一原版纽约大学毕业证(NYU毕业证)成绩单留信认证
一比一原版纽约大学毕业证(NYU毕业证)成绩单留信认证一比一原版纽约大学毕业证(NYU毕业证)成绩单留信认证
一比一原版纽约大学毕业证(NYU毕业证)成绩单留信认证
o6ov5dqmf
 
Contact ME {89011**83002} Haridwar ℂall Girls By Full Service Call Girl In Ha...
Contact ME {89011**83002} Haridwar ℂall Girls By Full Service Call Girl In Ha...Contact ME {89011**83002} Haridwar ℂall Girls By Full Service Call Girl In Ha...
Contact ME {89011**83002} Haridwar ℂall Girls By Full Service Call Girl In Ha...
ranishasharma67
 
CHAPTER 1 SEMESTER V PREVENTIVE-PEDIATRICS.pdf
CHAPTER 1 SEMESTER V PREVENTIVE-PEDIATRICS.pdfCHAPTER 1 SEMESTER V PREVENTIVE-PEDIATRICS.pdf
CHAPTER 1 SEMESTER V PREVENTIVE-PEDIATRICS.pdf
Sachin Sharma
 
💘Ludhiana ℂall Girls 📞]][89011★83002][[ 📱 ❤ESCORTS service in Ludhiana💃💦Ludhi...
💘Ludhiana ℂall Girls 📞]][89011★83002][[ 📱 ❤ESCORTS service in Ludhiana💃💦Ludhi...💘Ludhiana ℂall Girls 📞]][89011★83002][[ 📱 ❤ESCORTS service in Ludhiana💃💦Ludhi...
💘Ludhiana ℂall Girls 📞]][89011★83002][[ 📱 ❤ESCORTS service in Ludhiana💃💦Ludhi...
ranishasharma67
 
The Importance of Community Nursing Care.pdf
The Importance of Community Nursing Care.pdfThe Importance of Community Nursing Care.pdf
The Importance of Community Nursing Care.pdf
AD Healthcare
 
ventilator, child on ventilator, newborn
ventilator, child on ventilator, newbornventilator, child on ventilator, newborn
ventilator, child on ventilator, newborn
Pooja Rani
 
HEAT WAVE presented by priya bhojwani..pptx
HEAT WAVE presented by priya bhojwani..pptxHEAT WAVE presented by priya bhojwani..pptx
HEAT WAVE presented by priya bhojwani..pptx
priyabhojwani1200
 
Deep Leg Vein Thrombosis (DVT): Meaning, Causes, Symptoms, Treatment, and Mor...
Deep Leg Vein Thrombosis (DVT): Meaning, Causes, Symptoms, Treatment, and Mor...Deep Leg Vein Thrombosis (DVT): Meaning, Causes, Symptoms, Treatment, and Mor...
Deep Leg Vein Thrombosis (DVT): Meaning, Causes, Symptoms, Treatment, and Mor...
The Lifesciences Magazine
 
Medical Technology Tackles New Health Care Demand - Research Report - March 2...
Medical Technology Tackles New Health Care Demand - Research Report - March 2...Medical Technology Tackles New Health Care Demand - Research Report - March 2...
Medical Technology Tackles New Health Care Demand - Research Report - March 2...
pchutichetpong
 
Health Education on prevention of hypertension
Health Education on prevention of hypertensionHealth Education on prevention of hypertension
Health Education on prevention of hypertension
Radhika kulvi
 
Neuro Saphirex Cranial Brochure
Neuro Saphirex Cranial BrochureNeuro Saphirex Cranial Brochure
Neuro Saphirex Cranial Brochure
RXOOM Healthcare Pvt. Ltd. ​
 
BOWEL ELIMINATION BY ANUSHRI SRIVASTAVA.pptx
BOWEL ELIMINATION BY ANUSHRI SRIVASTAVA.pptxBOWEL ELIMINATION BY ANUSHRI SRIVASTAVA.pptx
BOWEL ELIMINATION BY ANUSHRI SRIVASTAVA.pptx
AnushriSrivastav
 

Recently uploaded (20)

Artificial Intelligence to Optimize Cardiovascular Therapy
Artificial Intelligence to Optimize Cardiovascular TherapyArtificial Intelligence to Optimize Cardiovascular Therapy
Artificial Intelligence to Optimize Cardiovascular Therapy
 
CHAPTER 1 SEMESTER V - ROLE OF PEADIATRIC NURSE.pdf
CHAPTER 1 SEMESTER V - ROLE OF PEADIATRIC NURSE.pdfCHAPTER 1 SEMESTER V - ROLE OF PEADIATRIC NURSE.pdf
CHAPTER 1 SEMESTER V - ROLE OF PEADIATRIC NURSE.pdf
 
Surgery-Mini-OSCE-All-Past-Years-Questions-Modified.
Surgery-Mini-OSCE-All-Past-Years-Questions-Modified.Surgery-Mini-OSCE-All-Past-Years-Questions-Modified.
Surgery-Mini-OSCE-All-Past-Years-Questions-Modified.
 
VVIP Dehradun Girls 9719300533 Heat-bake { Dehradun } Genteel ℂall Serviℂe By...
VVIP Dehradun Girls 9719300533 Heat-bake { Dehradun } Genteel ℂall Serviℂe By...VVIP Dehradun Girls 9719300533 Heat-bake { Dehradun } Genteel ℂall Serviℂe By...
VVIP Dehradun Girls 9719300533 Heat-bake { Dehradun } Genteel ℂall Serviℂe By...
 
GLOBAL WARMING BY PRIYA BHOJWANI @..pptx
GLOBAL WARMING BY PRIYA BHOJWANI @..pptxGLOBAL WARMING BY PRIYA BHOJWANI @..pptx
GLOBAL WARMING BY PRIYA BHOJWANI @..pptx
 
Antibiotic Stewardship by Anushri Srivastava.pptx
Antibiotic Stewardship by Anushri Srivastava.pptxAntibiotic Stewardship by Anushri Srivastava.pptx
Antibiotic Stewardship by Anushri Srivastava.pptx
 
How many patients does case series should have In comparison to case reports.pdf
How many patients does case series should have In comparison to case reports.pdfHow many patients does case series should have In comparison to case reports.pdf
How many patients does case series should have In comparison to case reports.pdf
 
CONSTRUCTION OF TEST IN MANAGEMENT .docx
CONSTRUCTION OF TEST IN MANAGEMENT .docxCONSTRUCTION OF TEST IN MANAGEMENT .docx
CONSTRUCTION OF TEST IN MANAGEMENT .docx
 
一比一原版纽约大学毕业证(NYU毕业证)成绩单留信认证
一比一原版纽约大学毕业证(NYU毕业证)成绩单留信认证一比一原版纽约大学毕业证(NYU毕业证)成绩单留信认证
一比一原版纽约大学毕业证(NYU毕业证)成绩单留信认证
 
Contact ME {89011**83002} Haridwar ℂall Girls By Full Service Call Girl In Ha...
Contact ME {89011**83002} Haridwar ℂall Girls By Full Service Call Girl In Ha...Contact ME {89011**83002} Haridwar ℂall Girls By Full Service Call Girl In Ha...
Contact ME {89011**83002} Haridwar ℂall Girls By Full Service Call Girl In Ha...
 
CHAPTER 1 SEMESTER V PREVENTIVE-PEDIATRICS.pdf
CHAPTER 1 SEMESTER V PREVENTIVE-PEDIATRICS.pdfCHAPTER 1 SEMESTER V PREVENTIVE-PEDIATRICS.pdf
CHAPTER 1 SEMESTER V PREVENTIVE-PEDIATRICS.pdf
 
💘Ludhiana ℂall Girls 📞]][89011★83002][[ 📱 ❤ESCORTS service in Ludhiana💃💦Ludhi...
💘Ludhiana ℂall Girls 📞]][89011★83002][[ 📱 ❤ESCORTS service in Ludhiana💃💦Ludhi...💘Ludhiana ℂall Girls 📞]][89011★83002][[ 📱 ❤ESCORTS service in Ludhiana💃💦Ludhi...
💘Ludhiana ℂall Girls 📞]][89011★83002][[ 📱 ❤ESCORTS service in Ludhiana💃💦Ludhi...
 
The Importance of Community Nursing Care.pdf
The Importance of Community Nursing Care.pdfThe Importance of Community Nursing Care.pdf
The Importance of Community Nursing Care.pdf
 
ventilator, child on ventilator, newborn
ventilator, child on ventilator, newbornventilator, child on ventilator, newborn
ventilator, child on ventilator, newborn
 
HEAT WAVE presented by priya bhojwani..pptx
HEAT WAVE presented by priya bhojwani..pptxHEAT WAVE presented by priya bhojwani..pptx
HEAT WAVE presented by priya bhojwani..pptx
 
Deep Leg Vein Thrombosis (DVT): Meaning, Causes, Symptoms, Treatment, and Mor...
Deep Leg Vein Thrombosis (DVT): Meaning, Causes, Symptoms, Treatment, and Mor...Deep Leg Vein Thrombosis (DVT): Meaning, Causes, Symptoms, Treatment, and Mor...
Deep Leg Vein Thrombosis (DVT): Meaning, Causes, Symptoms, Treatment, and Mor...
 
Medical Technology Tackles New Health Care Demand - Research Report - March 2...
Medical Technology Tackles New Health Care Demand - Research Report - March 2...Medical Technology Tackles New Health Care Demand - Research Report - March 2...
Medical Technology Tackles New Health Care Demand - Research Report - March 2...
 
Health Education on prevention of hypertension
Health Education on prevention of hypertensionHealth Education on prevention of hypertension
Health Education on prevention of hypertension
 
Neuro Saphirex Cranial Brochure
Neuro Saphirex Cranial BrochureNeuro Saphirex Cranial Brochure
Neuro Saphirex Cranial Brochure
 
BOWEL ELIMINATION BY ANUSHRI SRIVASTAVA.pptx
BOWEL ELIMINATION BY ANUSHRI SRIVASTAVA.pptxBOWEL ELIMINATION BY ANUSHRI SRIVASTAVA.pptx
BOWEL ELIMINATION BY ANUSHRI SRIVASTAVA.pptx
 

Realising the potential of Health Data Science: opportunities and challenges to practical adoption

  • 1. Realising the potential of Health Data Science: opportunities and challenges to practical adoption
    Professor Paolo Missier, School of Computing, Newcastle University, October 2023
  • 2. The promise of data-driven medicine and healthcare
    Predictive, Preventative, Personalised, Participatory: a systems biology perspective on the future of medicine and health care.
    "A new approach in medicine that is predictive, preventive, personalized and participatory, which we label here as "P4", holds great promise to reduce the burden of chronic diseases by harnessing technology and an increasingly better understanding of environment-biology interactions, evidence-based interventions and the underlying mechanisms of chronic diseases." [1]
    References:
    - Hood L, Heath JR, Phelps ME, Lin B. Systems biology and new technologies enable predictive and preventative medicine. Science. 2004;306(5696):640–643.
    - Hood L, Balling R, Auffray C. Revolutionizing medicine in the 21st century through systems approaches. Biotechnol J. 2012;7(8):992–1001. (Provides an overview of the scientific and technological foundations of predictive, preventive, personalized and participatory healthcare.)
    - Flores M, Glusman G, Brogaard K, Price ND, Hood L. P4 medicine: how systems medicine will transform the healthcare sector and society. Per Med. 2013;10(6):565–576. doi:10.2217/pme.13.57. PMID: 25342952; PMCID: PMC4204402.
    - Schmidt C. Leroy Hood looks forward to P4 medicine: predictive, personalized, preventive, and participatory. J Natl Cancer Inst. 2014;106(12):dju416. doi:10.1093/jnci/dju416.
    - [1] Sagner M, McNeil A, Puska P, Arena R. The P4 Health Spectrum – a predictive, preventive, personalized and participatory continuum for promoting healthspan. Prog Cardiovasc Dis. 2017;59(5):506–521. doi:10.1016/j.pcad.2016.08.002.
  • 3. Data about us
    Source: Sagner M, McNeil A, Puska P, Arena R. The P4 Health Spectrum – a predictive, preventive, personalized and participatory continuum for promoting healthspan. Prog Cardiovasc Dis. 2017;59(5):506–521. doi:10.1016/j.pcad.2016.08.002.
  • 4. Outline
    - AI for healthcare: a convergence of needs and opportunities
    - A complex, multifaceted landscape
    - Challenges, opportunities and state of the art through two first-hand case studies
    - Costs and challenges throughout the data value chain
  • 5. Understanding the facets of health data
    - Which data types? Clinical; lifestyle, social
    - Where do datasets come from? Prospective vs retrospective
    - How much do they cost? Acquisition; curation, annotation
    - How large? Small vs Big Health Data
    - Who can use it and how? Governance; protection
    Data Science and Engineering → benefits to patients
  • 6. Which data? Capturing individuals' complexity
    - Primary care records: clinical tests / GP notes, diagnoses / prescriptions
    - Secondary care records: hospital admissions / diagnoses / operations / prescriptions
    - Multi-omics data: genotypes, exomes, genomes; transcriptomics, proteomics
    - Digital health: data streams from wearable and environment sensors, self-monitoring
    - Socio-demographics: area of residence, family, social deprivation
  • 7. Example: UK Biobank (records linked by participant ID, eid; up to 20 years of records)
    - Baseline assessment: N = 500,000
    - GP events and prescriptions: N = 240,000
    - Hospital events (HESIN): diagnoses, operations; used to determine admission/re-admission patterns
    - Record counts shown on the slide: 57,698,505 and 123,644,445
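Hospital events are used to derive admission/re-admission patterns. A minimal sketch of that step, assuming each hospital stay is available as a (patient ID, admission date, discharge date) tuple and that "readmission" means a new admission within 30 days of discharge; the 30-day window, the function name and the sample data are all hypothetical, not taken from the UK Biobank schema.

```python
from datetime import date

def label_readmissions(admissions, window_days=30):
    """Label each stay 1 if the same patient is admitted again within
    `window_days` of its discharge date, else 0.

    admissions: iterable of (patient_id, admit_date, discharge_date).
    Returns {(patient_id, admit_date): 0 or 1}.
    """
    by_patient = {}
    for pid, admit, discharge in admissions:
        by_patient.setdefault(pid, []).append((admit, discharge))

    labels = {}
    for pid, stays in by_patient.items():
        stays.sort()  # chronological order by admission date
        for i, (admit, discharge) in enumerate(stays):
            readmitted = False
            if i + 1 < len(stays):
                gap = (stays[i + 1][0] - discharge).days
                readmitted = 0 <= gap <= window_days
            labels[(pid, admit)] = int(readmitted)
    return labels

# Hypothetical example stays
stays = [
    ("p1", date(2020, 1, 1), date(2020, 1, 5)),
    ("p1", date(2020, 1, 20), date(2020, 1, 22)),   # 15 days after discharge
    ("p1", date(2020, 6, 1), date(2020, 6, 3)),
    ("p2", date(2020, 3, 1), date(2020, 3, 2)),
]
labels = label_readmissions(stays)
```

Only the first stay of `p1` is labelled as followed by a readmission. Real definitions (planned vs unplanned admissions, transfers between hospitals) are considerably more involved.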
  • 8. CPRD (Clinical Practice Research Datalink)
    - Data access fee for research: ~£60K (non-commercial licence)
    - Population makeup: over 2,000 primary care practices; 60 million patients (18m registered active patients); at least 20 years of follow-up for 25% of the patients
    - Core dataset: demographics; diagnoses and symptoms; drug exposures; vaccination history; laboratory tests; referrals to hospital and specialist care
    - Data linkages: hospital care (A&E, inpatient, outpatient, imaging); death registry; cancer registry and treatment; mental health services; socio-economic measures
  • 9. A convergence of needs and opportunities: P4 data-driven healthcare
    - (Big) Health Data: from operations to research
    - Personal self-monitoring devices: from medical grade to consumer grade
    - Health Data Science and Engineering: ML, AI methods; scalable computing
    - Governance, consent, secure data access: privacy (e.g. GDPR); opt-in vs opt-out; Trusted Research Environments
    Bigger == more useful?
  • 10. The data-to-actions loop
    Monitoring, clinical testing → data engineering → predictive analytics / AI → personalised predictions → prevention, interventions → (back to monitoring)
  • 11. A complex health data science landscape for translational research
    Tasks and methods, by stage:
    - Data ingestion: prospective: protocol design; retrospective: dataset search and selection; data integration
    - Data preparation / engineering: data cleaning; data standardisation; data augmentation (annotation amplification, synthetic data)
    - Descriptive analytics: population characterisation; subgroup identification ("group by")
    - Pattern discovery: patient subtyping; disease subtyping; clustering; Latent Class Analysis
    - Predictions: risk prediction; next disease prediction; {bio, digital} marker discovery; other outcomes. Methods: process modelling (HMM); established ML; deep NN; generative AI (e.g. BEHRT)
    Challenges, data and methods:
    - Cross-source integration across types: clinical/EHR/omics/sensors
    - Understanding data semantics
    - Data and annotation scarcity
    - Managing the quality/quantity/cost envelope
    - Bias control; data noise
    - Advancing the methods: "better data science for better science"
    Challenges, architectures:
    - Data governance, computational scalability → Safe Data Environments
    - End-to-end explainability → provenance engineering, demonstrating the benefits
    - Reproducible Analytics Pipelines (RAP)
  • 12. Prospective vs retrospective datasets
    Prospective: defined for research purposes
    - ✓ Stable and predictable; follow a protocol; research ready; potentially well curated; bias known a priori
    - ✗ Expensive; not very reusable; scarce
    - Example: UK Biobank: 500,000 volunteer participants; general health information; genotypes and whole genomes; selected internal organ imaging study (100K). Bias: participants aged 40+, geographic/social bias
    Retrospective: typically operational data
    - ✓ Potentially more reusable
    - Natural bias (reflects natural cohort locality)
    - ✗ Generally not research ready; require data engineering
    - Example: Clinical Practice Research Datalink (CPRD): data collected from UK GP practices; 60+ million patients (also prospective)
  • 13. Cost of health data
    - Retrospective: integration/harmonisation, curation, cleaning
    - Prospective: cost of cohort recruitment, data collection, data processing
    - Acquisition + processing cost by data type, from low to high: routinely collected clinical variables (GP tests); tests requiring specialist labs; proteomics; genotyping (a few genes); whole exome sequencing; whole genome sequencing
  • 14. Case study: LITMUS
    - EU IMI2 project (https://litmus-project.eu/litmus-partners/)
    - Non-Alcoholic Fatty Liver Disease (NAFLD / steatosis) and NASH (fibrosis, cirrhosis)
    - Retrospective data collected from hospital datasets (N ≅ 10K)
    - Prospective data from active recruitment (N ≅ 2K): routine clinical tests; omics (genotypes, transcriptomes, proteomes); biopsies → provide label annotations
    - From multivariate linear regression to non-linear combinations of markers
    - Main contributor: Matt McTeer, PhD student
  • 15. Data scarcity / sparsity issues
    Total N = 9,449 patients; patients with data available per modality: clinical 8,745; GWAS 2,216; miRNA 183; RNASeq 461
  • 16. Exploring the cost/quality/importance envelope
    Stratified feature set: Core → Extended → Specialist (85% missing)
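Each step from core to specialist features adds information but shrinks the set of patients with complete data ("long and thin" vs "short and broad" training sets). A minimal sketch of how that trade-off could be quantified, assuming the data sit in a patients × features array with NaN for missing values and that tiers are nested; the tier names, column indices and values below are hypothetical, not the LITMUS feature sets.

```python
import numpy as np

def complete_cases_per_tier(X, tiers):
    """Count patients with no missing values for each cumulative feature tier.

    X: 2-D float array (patients x features), NaN marks a missing value.
    tiers: ordered list of (tier_name, column_indices); tiers are nested,
           so each tier adds its columns to those of the previous tiers.
    Returns a list of (tier_name, n_features, n_complete_patients).
    """
    result = []
    cols = []
    for name, idx in tiers:
        cols = cols + list(idx)
        complete = ~np.isnan(X[:, cols]).any(axis=1)
        result.append((name, len(cols), int(complete.sum())))
    return result

# Hypothetical toy data: 4 patients, 4 features
X = np.array([
    [1.0, 2.0, 0.5, np.nan],
    [1.1, np.nan, 0.7, np.nan],
    [0.9, 2.1, 0.4, 3.3],
    [1.2, 2.2, np.nan, np.nan],
])
tiers = [("core", [0, 1]), ("extended", [2]), ("specialist", [3])]
summary = complete_cases_per_tier(X, tiers)
print(summary)  # [('core', 2, 3), ('extended', 3, 2), ('specialist', 4, 1)]
```

Imputation can recover some of the lost rows, but with 85% missingness in the specialist tier it is unlikely to be reliable.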
  • 17. Might core variables be enough? Outcome: "at-risk NASH"
  • 18. LITMUS: the landscape of slide 11 (tasks and methods additionally include statistical modelling, i.e. multivariate regression), annotated with the challenges specific to this case study:
    - "Long and thin" vs "short and broad" training sets
    - Feature completeness vs importance, imputation
    - Binary classifiers across multiple feature sets
  • 19. Issues requiring data engineering: recurring data issues
    (from "Data-driven, AI-based clinical practice: experiences, challenges, and research directions")
    - Data sparsity and scarcity: EHRs are irregular collections of time series; imputation is not always possible
    - Data imbalance: predicting rare events can be a priority; no downsampling option
    - Data inconsistency and instability: retrospective data are often a source of inconsistency, and their schemas are unstable
    - Not all errors are equally wrong: in high-stakes domains a bias towards one type of error is sometimes preferable
    - Human-in-the-loop: explanations engender trust in the models; trust should include not only the clinician but also the patient
  • 20. Sparsity/scarcity, imbalance
    Classifiers are not resilient to class imbalance:
    - Models are biased towards predicting the majority class, regardless of the input features
    - They struggle to generalise correctly on the minority class
    - Imbalance is very common in medical datasets, and in clinical datasets data scarcity/sparsity often compounds it
    Typical mitigations:
    - Downsample the majority class → lose training examples
    - Upsample the minority class → e.g. SMOTE (Synthetic Minority Oversampling Technique)
    When modelling processes, these mitigations do not work. We used Hidden Markov Models (HMMs) to predict oxygen-therapy state transitions; however, intubation is an infrequent state (and so is "death"), which made it difficult to learn the probability distributions accurately. [1] proposes a novel, generic ensemble technique to mitigate the imbalance problem in HMMs.
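To make the upsampling option concrete, here is a simplified SMOTE-style oversampler: each synthetic point lies on the segment between a randomly chosen minority sample and one of its k nearest minority neighbours. This is a sketch for illustration only (it picks base samples at random rather than iterating over every minority sample); the function name and toy data are hypothetical, and the `imbalanced-learn` package provides a maintained implementation (`imblearn.over_sampling.SMOTE`).

```python
import numpy as np

def smote(X_min, n_new, k=5, seed=0):
    """Create n_new synthetic minority samples by interpolating between a
    random minority sample and one of its k nearest minority neighbours."""
    rng = np.random.default_rng(seed)
    X_min = np.asarray(X_min, dtype=float)
    n = len(X_min)
    if n < 2:
        raise ValueError("need at least two minority samples")
    k = min(k, n - 1)
    # Pairwise Euclidean distances among minority samples only
    d = np.linalg.norm(X_min[:, None, :] - X_min[None, :, :], axis=-1)
    np.fill_diagonal(d, np.inf)                 # a sample is not its own neighbour
    neighbours = np.argsort(d, axis=1)[:, :k]   # k nearest neighbours per sample
    base = rng.integers(0, n, size=n_new)       # which sample to start from
    pick = rng.integers(0, k, size=n_new)       # which of its neighbours to use
    nb = neighbours[base, pick]
    gap = rng.random((n_new, 1))                # interpolation factor in [0, 1)
    return X_min[base] + gap * (X_min[nb] - X_min[base])

# Hypothetical minority class: corners of the unit square
X_minority = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
synthetic = smote(X_minority, n_new=6, k=2)
print(synthetic.shape)  # (6, 2)
```

Because each new point is a convex combination of two existing minority samples, this only makes sense for tabular feature vectors; it has no natural counterpart for the state sequences an HMM is trained on, which is the limitation noted above.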
  • 21. Instability
    Retrospective studies are often unstable: data acquisition and management practices may change over time, following changes in:
    - Clinical practices
    - Public policy
    - Hospital resources
    - Data collection technologies
    In our COVID dataset, clinical tests varied daily depending on the patient's condition, and the scientific evidence for the need for certain tests changed rapidly. For example, new biomarkers such as interleukin-6 were introduced "mid flight", so earlier study datasets miss this variable completely.
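One simple check for this kind of instability is to record when each variable is first observed and flag those that only appear well after the study start, as with interleukin-6 above. A minimal sketch, assuming observations arrive as (date, variable name) pairs; the function names, the 30-day grace period and the records are hypothetical.

```python
from datetime import date

def first_observed(records):
    """Return {variable: earliest date on which it was recorded}."""
    first = {}
    for obs_date, var in records:
        if var not in first or obs_date < first[var]:
            first[var] = obs_date
    return first

def late_variables(records, study_start, grace_days=30):
    """Variables first recorded more than grace_days after study_start,
    i.e. candidates for having been introduced 'mid flight'."""
    return sorted(
        var for var, d in first_observed(records).items()
        if (d - study_start).days > grace_days
    )

# Hypothetical observation log
records = [
    (date(2020, 3, 1), "crp"),
    (date(2020, 3, 2), "ferritin"),
    (date(2020, 5, 10), "il6"),
    (date(2020, 3, 5), "crp"),
]
print(late_variables(records, date(2020, 3, 1)))  # ['il6']
```

Flagged variables are structurally missing for early patients, so they typically need to be excluded, modelled per period, or imputed with great care.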
  • 22. Translational challenge: not all errors are equally wrong
    - In high-stakes domains, prediction errors are not symmetric: typically, underestimating risk is less desirable than overestimating it
    - Standard model performance metrics (e.g. AUC, F1) fail to capture this distinction
    Cost-sensitive learning (cf. e.g. [1,2,3]):
    - Introduce an explicit penalty for misclassifying samples
    - Note that cost-sensitive methods can sometimes deal with imbalanced datasets without altering the original data distribution [4]
    [1] Lomax, S., and Vadera, S. (2013). A survey of cost-sensitive decision tree induction algorithms. ACM Comput. Surveys 45, 1–35. doi: 10.1145/2431211.2431215
    [2] Wang, H., Cui, Z., Chen, Y., Avidan, M., Abdallah, A. B., and Kronzer, A. (2018). Predicting hospital readmission via cost-sensitive deep learning. ACM Trans. Comput. Biol. Bioinformatics 15, 1968–1978. doi: 10.1109/TCBB.2018.2827029
    [3] Freitas, A., Costa-Pereira, A., and Brazdil, P. (2007). "Cost-sensitive decision trees applied to medical data," in Data Warehousing and Knowledge Discovery (Regensburg), 303–312. doi: 10.1007/978-3-540-74553-2_28
    [4] Mienye, I. D., and Sun, Y. (2021). Performance analysis of cost-sensitive learning methods with application to imbalanced medical data. Inform. Med. Unlock. 25:100690. doi: 10.1016/j.imu.2021.100690
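The simplest form of cost sensitivity is applied after training, by moving the decision threshold. If a false negative (underestimated risk) costs C_FN, a false positive costs C_FP and correct decisions cost nothing, then predicting "positive" has lower expected cost whenever p · C_FN > (1 − p) · C_FP, i.e. whenever p > C_FP / (C_FP + C_FN). The sketch below is an illustration under those assumptions, with hypothetical costs and probabilities; the methods cited above build the costs into training rather than only into the threshold.

```python
import numpy as np

def cost_threshold(c_fp, c_fn):
    """Probability threshold minimising expected cost when correct
    decisions are free: predict positive iff p * c_fn > (1 - p) * c_fp."""
    return c_fp / (c_fp + c_fn)

def total_cost(y_true, y_pred, c_fp, c_fn):
    """Total misclassification cost of hard 0/1 predictions."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    fp = np.sum((y_pred == 1) & (y_true == 0))
    fn = np.sum((y_pred == 0) & (y_true == 1))
    return c_fp * fp + c_fn * fn

# Hypothetical: missing a high-risk patient is 4x worse than a false alarm
c_fp, c_fn = 1.0, 4.0
probs = np.array([0.1, 0.3, 0.6, 0.15])   # model risk estimates
y = np.array([0, 1, 1, 1])                # true outcomes

t = cost_threshold(c_fp, c_fn)            # 0.2
cost_default = total_cost(y, (probs > 0.5).astype(int), c_fp, c_fn)  # 8.0
cost_aware = total_cost(y, (probs > t).astype(int), c_fp, c_fn)      # 4.0
```

Lowering the threshold from 0.5 to 0.2 catches one more at-risk patient and halves the total cost, even though a metric such as AUC is unchanged because the ranking of the scores is the same.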
  • 23. Translational challenge: human-in-the-loop AI
    - Essential in medical AI
    - Evidence of performance is not enough
    - Black-box AI is not acceptable in clinical practice
    Closing the "explanation gap":
    - From technical explanations: non-linear [1] and Deep Learning [2] models; Shapley values [3]; interpretable ML [4,5]
    - To expert involvement in the learning process: by accepting/rejecting predictions; by expressing a preference for a given error type
    - Causal Machine Learning (CML) [6,7]: visualisation and reasoning over complex clinical scenarios; counterfactuals, what-if scenarios
    Also important: Patient and Public Involvement (PPI) is essential in publicly funded clinical research.
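Shapley values attribute a single prediction to its input features by averaging each feature's marginal contribution over all coalitions of the other features. The sketch below computes them exactly by enumeration, which is only feasible for a handful of features (practical tools such as the SHAP library approximate this). The toy risk score, the patient values and the convention that an "absent" feature takes a baseline value of 0 are all hypothetical choices made for illustration.

```python
from itertools import combinations
from math import factorial

def shapley_values(value_fn, n_features):
    """Exact Shapley values by enumerating all coalitions (O(2^n)).
    value_fn(S) returns the model output when only features in S are present."""
    n = n_features
    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):  # coalition sizes 0 .. n-1
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = frozenset(subset)
                phi[i] += weight * (value_fn(s | {i}) - value_fn(s))
    return phi

x = [2.0, 1.0, 3.0]          # hypothetical patient feature values
baseline = [0.0, 0.0, 0.0]   # hypothetical values used for "absent" features

def toy_risk(z):
    # Linear terms plus one interaction between features 0 and 2
    return 0.5 * z[0] + 1.0 * z[1] + 0.2 * z[0] * z[2]

def value_fn(present):
    z = [x[j] if j in present else baseline[j] for j in range(len(x))]
    return toy_risk(z)

phi = shapley_values(value_fn, 3)
print([round(p, 6) for p in phi])  # [1.6, 1.0, 0.6]
```

The interaction term (0.2 · 2 · 3 = 1.2) is split equally between features 0 and 2, and the attributions sum to the difference between the patient's score and the baseline score (3.2).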
  • 24. Multimorbidities and disease prediction (IEEE BigData 2022)
    Multiple Long-Term Conditions (MLTC), defined as two (or, in some definitions, four) or more long-term (chronic) conditions [1,2].
    A Long-Term Condition (LTC) is a condition that cannot, at present, be cured but is controlled by medication and/or other treatments/therapies (*).
    - Significant research investment by NIHR, the core translational medicine funder in the UK
    - The number of people with multiple LTCs in the UK was projected to rise from 1.9 million in 2008 to 2.9 million in 2018
    (*) NHS and UK Dept. of Health, Long Term Conditions Compendium of Information, Third Edition, https://assets.publishing.service.gov.uk/government/uploads/system/uploads/attachment_data/file/216528/dh_134486.pdf
    [1] M. C. Johnston, M. Crilly, C. Black, G. J. Prescott, and S. W. Mercer, "Defining and measuring multimorbidity: a systematic review of systematic reviews," European Journal of Public Health, vol. 29, no. 1, pp. 182–189, 2019.
    [2] B. P. Nunes, T. R. Flores, G. I. Mielke, E. Thumé, and L. A. Facchini, "Multimorbidity and mortality in older adults: a systematic review and meta-analysis," Archives of Gerontology and Geriatrics, vol. 67, pp. 130–138, Dec. 2016.
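Applied to patient records, the definition above reduces to counting distinct long-term conditions per patient and comparing against a threshold of two (or four). A minimal sketch, assuming diagnoses have already been mapped to condition labels; the condition labels, patients and function name are hypothetical, and in practice defining and coding the LTC list is itself a substantial task.

```python
def mltc_status(patient_conditions, ltc_codes, threshold=2):
    """Count distinct long-term conditions per patient and flag MLTC.

    patient_conditions: {patient_id: iterable of condition labels}
    ltc_codes: set of labels treated as long-term conditions
    threshold: minimum number of LTCs (2 or 4, depending on the definition)
    Returns {patient_id: (n_ltcs, is_mltc)}.
    """
    status = {}
    for pid, conditions in patient_conditions.items():
        n = len(set(conditions) & ltc_codes)  # duplicates counted once
        status[pid] = (n, n >= threshold)
    return status

# Hypothetical condition list and patients
ltc = {"diabetes", "hypertension", "copd", "ckd"}
patients = {
    "p1": ["diabetes", "hypertension", "fracture"],  # fracture is not an LTC
    "p2": ["copd", "copd"],
    "p3": [],
}
print(mltc_status(patients, ltc))
# {'p1': (2, True), 'p2': (1, False), 'p3': (0, False)}
```

Switching `threshold` to 4 gives the stricter definition; under it, none of these toy patients qualifies.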
  • 25. 25 Multiple Long Term Conditions: research at Newcastle Characterising the inter-relationships between multiple long-term conditions (MLTC) and polypharmacy Funding: NIHR, 2022-2024 (Co-I) • Disease clustering based on co-occurrence in patients' medical timelines • Patient clustering based on timeline similarity • Predicting outcomes using diagnoses + prescriptions / deep learning • Disease embeddings → supervised learning for outcome prediction • Characterise patient pathways through hospital → process modelling
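Disease clustering based on co-occurrence first needs a measure of how often diseases appear together across patients. A minimal sketch, assuming each timeline is a list of diagnosis labels (toy labels here), counts per-disease and pairwise patient frequencies and derives a Jaccard similarity. Jaccard is only one possible choice and is not necessarily the measure used in the project; a similarity matrix like this could then feed a clustering algorithm.

```python
from collections import Counter
from itertools import combinations


def cooccurrence(timelines):
    """Count patients per disease and per unordered pair of diseases.

    `timelines` maps a patient id to a list of diagnosis labels. Order and
    repeats are ignored, so only co-occurrence (not sequence) is captured.
    """
    disease_counts = Counter()
    pair_counts = Counter()
    for codes in timelines.values():
        distinct = sorted(set(codes))
        disease_counts.update(distinct)
        pair_counts.update(combinations(distinct, 2))  # pairs stored in sorted order
    return disease_counts, pair_counts


def jaccard(disease_counts, pair_counts, a, b):
    """Jaccard similarity between the sets of patients having disease a and b."""
    a, b = sorted((a, b))
    both = pair_counts[(a, b)]
    either = disease_counts[a] + disease_counts[b] - both
    return both / either if either else 0.0


if __name__ == "__main__":
    timelines = {
        "p1": ["hypertension", "ckd", "hypertension"],
        "p2": ["hypertension", "ckd", "diabetes"],
        "p3": ["diabetes", "depression"],
    }
    dc, pc = cooccurrence(timelines)
    print(jaccard(dc, pc, "hypertension", "ckd"), jaccard(dc, pc, "diabetes", "hypertension"))
```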
  • 26. AI methods, datasets, outputs and impacts. Datasets: National Discovery Datasets (UK-Biobank, CPRD) → Discover; Local Health Intelligence Datasets (GNCR, ELPR) → Test; Replication Datasets → Replicate. Also shown: NIHR AIM CISC, Connected Bradford, NIHR AIM OPTIMAL. AI methods: event spatial & temporal order; event characterisation; event prediction. Outputs & impacts (within 5 years): shared standards; portable pipelines; identification of high-risk situations & tipping points; trial emulation; local and national policies (high-risk groups); training & capacity building; clinical dashboard; clinical support tools; communication of results; explainable research & explainable AI; improve patient care; reduce health inequalities.
  • 27. Predicting MLTC-PP outcomes: hospital readmission. Model inputs: LTC embedding (200×100); diagnosis embedding (251×50); historical prescriptions embedding (512×50); pre-admission prescriptions embedding (512×50); post-admission prescriptions embedding (512×50); demographics vector (sex, ethnicity, Townsend deprivation index, etc.). Feature vector size: 6048. Gradient boosting: 100 estimators. Output: {0, 1}. • Neural network + XGBoost combination • Our readmission cohort is more general than the domain-specific cohorts in the literature • Our model outperforms those reported in the literature, despite addressing a more complex problem Adding explanations: which predictors are most relevant to explain unplanned readmission? - LTCs and how they accumulate - Prescriptions given between discharge and readmission?
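The readmission model concatenates several learned embeddings and a demographics vector into one feature vector for a gradient-boosted classifier. The slide does not say how a patient's codes are aggregated, so the sketch below assumes mean pooling of code embeddings within each input group. Group names, dimensions and values are hypothetical toy choices, not the model's actual 6048-dimensional layout.

```python
def mean_pool(codes, table, dim):
    """Average the embedding vectors of a patient's codes; zeros if none are known."""
    vectors = [table[c] for c in codes if c in table]
    if not vectors:
        return [0.0] * dim
    return [sum(v[k] for v in vectors) / len(vectors) for k in range(dim)]


def build_feature_vector(patient, tables, dims, demographics_keys):
    """Concatenate pooled embeddings for each code group, then demographic features.

    Groups are visited in sorted order so every patient's vector has the same layout.
    """
    features = []
    for group in sorted(tables):
        features.extend(mean_pool(patient.get(group, []), tables[group], dims[group]))
    features.extend(float(patient["demographics"][k]) for k in demographics_keys)
    return features


if __name__ == "__main__":
    # Hypothetical toy embedding tables (code -> vector) for two input groups.
    tables = {
        "ltc": {"copd": [1.0, 0.0], "ckd": [0.0, 1.0]},
        "prescriptions_post": {"statin": [1.0, 2.0, 3.0]},
    }
    dims = {"ltc": 2, "prescriptions_post": 3}
    patient = {
        "ltc": ["copd", "ckd"],
        "prescriptions_post": [],
        "demographics": {"sex": 1, "townsend": -2.5},
    }
    print(build_feature_vector(patient, tables, dims, ["sex", "townsend"]))
```

In the setting described on the slide, vectors like these would then be passed to a gradient-boosted classifier, for example `xgboost.XGBClassifier(n_estimators=100)`, to predict readmission as a 0/1 outcome.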
  • 28. 28 AI-MULTIPLY Challenges. Data: protocol design (prospective); dataset search and selection (retrospective); data integration; data cleaning; data standardisation; data augmentation (annotation amplification, synthetic data). Tasks and methods: population characterisation; subgroup identification (patient subtyping, disease subtyping, "group by", clustering, Latent Class Analysis); risk prediction (next disease prediction, {bio, digital} marker discovery, other outcomes); process modelling, HMM; established ML; deep NN; generative AI (e.g. BEHRT). Advancing the methods, "Better data science for better science": cross-source integration across types (clinical/EHR/omics/sensors); understanding data semantics; data and annotation scarcity; managing the quality/quantity/cost envelope; bias control; data noise. Architectural: data governance, computational scalability → Safe Data Environment; end-to-end explainability → provenance engineering, demonstrating the benefits; Reproducible Analytics Pipelines (RAP). Data challenges in: drug coding in UKBB; defining and predicting hospital readmission; defining and coding MLTC; reproducing DNN results across sites; disease clustering and cluster prediction.
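Process modelling with Hidden Markov Models (HMMs) treats observed pathway events as emissions of an unobserved patient state. As a minimal sketch, the forward algorithm below computes the likelihood of an event sequence under a discrete HMM. The states ("stable", "deteriorating"), the events and all probabilities are hypothetical.

```python
def forward_likelihood(observations, start, trans, emit):
    """Probability of an observation sequence under a discrete HMM (forward algorithm).

    start[s]    : P(first hidden state = s)
    trans[s][t] : P(next state = t | current state = s)
    emit[s][o]  : P(observation o | state s)
    """
    states = list(start)
    # alpha[s] = P(observations so far, current state = s)
    alpha = {s: start[s] * emit[s][observations[0]] for s in states}
    for obs in observations[1:]:
        alpha = {
            t: sum(alpha[s] * trans[s][t] for s in states) * emit[t][obs]
            for t in states
        }
    return sum(alpha.values())


if __name__ == "__main__":
    start = {"stable": 0.8, "deteriorating": 0.2}
    trans = {
        "stable": {"stable": 0.9, "deteriorating": 0.1},
        "deteriorating": {"stable": 0.3, "deteriorating": 0.7},
    }
    emit = {
        "stable": {"gp_visit": 0.9, "admission": 0.1},
        "deteriorating": {"gp_visit": 0.4, "admission": 0.6},
    }
    print(forward_likelihood(["gp_visit", "admission", "admission"], start, trans, emit))
```

For long patient timelines these products underflow, so practical implementations work in log space or rescale `alpha` at each step.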
  • 29. 29 <event name> Challenges and opportunities Data: • Multi-site research presents opportunities for cross-validation of results, but also challenges • Newcastle → UK Biobank • QMUL → CPRD • Projects like these tend to "piggyback" on existing data licenses, which may be restrictive Modelling: • LLMs and genAI have shown potential to "sidestep" some of the more traditional prediction techniques → next disease prediction becomes a case of "sentence completion" Engineering / reproducibility: • at this stage, prototyping and experimentation are distributed across sites, and each piece is owned by one researcher • reproducibility and reusability both seem like distant goals… Patient and Public Involvement and Engagement: • Establishing a productive and sustained relationship between PPIE members and researchers is a priority
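The "sentence completion" framing can be made concrete with a deliberately simple stand-in for an LLM: treat each patient timeline as a sentence of diagnosis codes and predict the most frequent next code. The bigram model below is a hypothetical illustration of the framing only; generative models such as BEHRT learn far richer, context-dependent representations. The codes are toy labels.

```python
from collections import Counter, defaultdict


def fit_bigrams(timelines):
    """Count transitions between consecutive diagnosis codes in each timeline."""
    counts = defaultdict(Counter)
    for codes in timelines:
        for prev, nxt in zip(codes, codes[1:]):
            counts[prev][nxt] += 1
    return counts


def predict_next(counts, history, k=1):
    """Return up to k most frequent successors of the last code in `history`."""
    if not history or history[-1] not in counts:
        return []
    return [code for code, _ in counts[history[-1]].most_common(k)]


if __name__ == "__main__":
    timelines = [
        ["hypertension", "ckd", "heart_failure"],
        ["hypertension", "ckd"],
        ["hypertension", "type2_diabetes"],
    ]
    counts = fit_bigrams(timelines)
    print(predict_next(counts, ["obesity", "hypertension"], k=2))
```

A bigram model conditions only on the last code, whereas transformer-based models condition on the whole history; the example shows the shape of the task, not the performance of such models.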
  • 30. 30 <event name> Role of PPIE in Health Data Science / AI projects PPIE involvement “built into” every NIHR-funded project: it’s an asset and opportunity BUT: need to make it work! What kind of involvement? Consultation vs research co-design • Periodic, scheduled “themed” sessions at designated project checkpoints • Key research questions defined upfront, but opportunities to revise / refine mid-flight • The academic perspective and the lived experiences are very different • Need to find a common language • But also to find a way to ensure mutual benefit and a two-way learning experience
  • 31. 31 <event name> PPIE: some elements for reflection Engendering trust in AI and in secure data management practices • Where is your data held? How do TREs (Trusted Research Environments) work? • What are the boundaries of legitimate use of your data for research? How is the law changing? • Transparency and explainability: how can we communicate effectively what an algorithm is doing? What outcomes are most relevant? Are those aligned with the data we work with? • Ex.: ensuring good Quality of Life for LTC patients is very important, but the data are hardly ever available Medication / prescriptions: • meeting expectations like "predict the best combination of medicines" presents hard challenges Data limitations: "you don't know half my story"
  • 32. 32 <event name> Data governance issues: the emerging UK landscape https://www.goldacrereview.org/ Build a small number of Trusted Research Environments, avoiding duplication Promote a culture of reuse of code (curation pipelines, analytics) - "Reproducible Analytical Pipelines", a set of best practices - Promote high-quality, shared, reviewable, re-usable, well-documented code for standardised data curation and analysis - Promote transparency, avoid black-box analysis Adopt single governance rules for integrated data access - Rationalise approvals: create one map of all approval processes Build appropriate capabilities: - Train academic researchers and NHS analysts in computational data science techniques
  • 33. 33 <event name> Cluster analysis workflows. Input: patients' medical histories (EHR), reduced to cross-sectional data (discarding time) as a patients × diagnoses binary matrix. Disease clustering, e.g. Topic Modelling [1]. Patient clustering, e.g. Latent Class Analysis [2], yielding patient/cluster associations [1,2]. Cluster phenotyping [1,2].
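Latent Class Analysis on a patients × diagnoses binary matrix can be sketched as an EM fit of a mixture of independent Bernoulli distributions, the standard latent class model for binary indicators. This is a minimal pure-Python illustration on toy data, assuming conditional independence of diagnoses given the class; real analyses would use an established package, work in log space to avoid underflow with many diagnoses, and compare numbers of classes with information criteria such as BIC.

```python
import random


def latent_class_analysis(X, n_classes, n_iter=200, seed=0):
    """Fit a latent class model to a binary patients x diagnoses matrix via EM.

    Class c has prior weight pi[c] and, for each diagnosis j, probability
    theta[c][j] that a patient in that class has it. Diagnoses are assumed
    conditionally independent given the class.
    Returns (pi, theta, resp), where resp[i][c] = P(class c | patient i).
    """
    rng = random.Random(seed)
    n, m = len(X), len(X[0])
    pi = [1.0 / n_classes] * n_classes
    theta = [[rng.uniform(0.25, 0.75) for _ in range(m)] for _ in range(n_classes)]
    eps = 1e-6  # keeps item probabilities away from exactly 0 and 1
    resp = []
    for _ in range(n_iter):
        # E-step: posterior class membership for each patient.
        resp = []
        for row in X:
            weights = []
            for c in range(n_classes):
                w = pi[c]
                for j, x in enumerate(row):
                    w *= theta[c][j] if x else 1.0 - theta[c][j]
                weights.append(w)
            total = sum(weights)
            resp.append([w / total for w in weights])
        # M-step: re-estimate class weights and per-diagnosis probabilities.
        for c in range(n_classes):
            nc = sum(r[c] for r in resp)
            pi[c] = nc / n
            theta[c] = [
                min(max(sum(r[c] * row[j] for r, row in zip(resp, X)) / nc, eps), 1 - eps)
                for j in range(m)
            ]
    return pi, theta, resp


if __name__ == "__main__":
    # Toy matrix: two obvious groups of patients sharing different diagnoses.
    X = [[1, 1, 0, 0]] * 4 + [[0, 0, 1, 1]] * 4
    pi, theta, resp = latent_class_analysis(X, n_classes=2)
    print([max(range(2), key=lambda c: r[c]) for r in resp])
```

The most probable class per patient gives the patient/cluster associations, and each row of `theta` describes a class phenotype as the prevalence of each diagnosis within it.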
  • 38. 38 <event name> Summary Enablers: data availability; scalable data processing technology; inexpensive, accurate self-monitoring; mature data science and engineering methods; rapidly advancing AI. A unique convergence of opportunities and challenges to achieve a "P4" vision of data-driven medicine and healthcare management. Blockers: data access and governance, data integration; data quality control, device tolerance, intrusiveness; data engineering is expensive and ad hoc; still very experimental: trustworthy, ethical, responsible AI. Hard "management" questions: - how do you calculate the "total cost of operation" for data-driven medicine? - at which point does it become cost-effective for the health service? - what are the real benefits to patients? - …

Editor's Notes

  1. Mention "reusable analysis pipelines" (RAP). NHS data in the UK are a prime example of retrospective data: in principle they are accessible for research, but there are governance issues, and they require coding and integration.
  2. This slide illustrates the range of national and local datasets together with our planned outputs and impacts. We will use UK Biobank and CPRD for discovery, our local "health-intelligence" datasets in the North East and East London for testing, and then replicate by sharing code with colleagues in Edinburgh, Bradford and Birmingham. Applying a range of AI methods, our outputs and impacts will be wide-ranging. Specifically, we will identify high-risk situations and tipping points to inform local and national policies. Together these outputs will improve patient care and reduce health inequalities.