SlideShare a Scribd company logo
1 of 70
Predictive Analytics in
Healthcare:
Building Successful
Pipelines
Danielle Belgrave
Researcher
Healthcare Machine Learning
@DaniCMBelg
Latent Variable Modelling
My Research
Missing DataLongitudinal Data Analysis
0 2 4 6 8 10 12 14 16 18 20 22 24
Time (hours)
Patient 1
Patient 2
Patient 3
Patient 4
MultidisciplinaryCausality
X Y
Z
Patient-Centric Approach
Roadmap: Predictive Analytics for Healthcare
Part 1
• Drivers ofpredictive analyticsin healthcare
Part 2
• Evaluationof data-driven healthcare
Part3
• Important concepts in study design
Part 4
• Top 10 DataScience Skills
Part 1: Drivers of Predictive Analytics in healthcare
PopulationPatient/Person Pharmaceuticals Providers
TheKey Stakeholders inHealth
ThePerson at the Centre of Healthcare
Patient/Person
MachineLearninghas the capacityto
transformhealthcare
- Understandingphysiological changes
overtime
- Forecastingofprogression or onsetof
disease
- Personalisingtreatmentstrategies
Population Data-driven Healthcare
Population
Elucidatesaverage effectsand deviations
fromaverage effects
Policy recommendations
Healtheducation
Outreach
Researchfor diseasedetectionand injury
prevention
Reducehealthcareinequalities
Whatwe as asociety do collectively
toassurethe conditionsin which
people can be healthy
The Pharmaceutical Perspective:
Drug Discoveryand Therapeutics
Idea
Basic
Research
Clinical
Trails
Phase I Phase II Phase III Regulatory
Approval
Patient
Care
Data Protection and
Connected Care: The Provider
and Regulator Perspective
Providers
Predictive Analytics forHealthcare
Part 1
• Drivers of predictive analyticsin healthcare
Part 2
• Evaluation of data-driven healthcare
Part3
• Important concepts in study design
Part 4
• Top 10 DataScience Skills
DATA EXPERTISE
METHODS &
MODELS
Vast data volume,
velocity, variety
TSUNAMI
Supra-linear growth in
papers & tools
BLIZZARD
Human expertise to make
sense of the growing data
DROUGHT
Myth 1: ‘Big data’ are a solution
Myth 2: We have the models needed to turn healthcare data into decision support algorithms
Myth 3: Clinicians will continue to be the main source of data
Current Stateof Healthcare Data
Machine Learning has the Potential to
Disrupt and Impact Healthcare…
…But it’s Important to Remember the Beginnings
The study of the distribution and determinantsof healthrelated statesor events in
specific populations & theapplications of this study to thecontrol of healthproblems
Data visualisation: death toll of
the Crimean War
Army data: 16,000/18,000
deaths notdue to battle
wounds, but to preventable
diseases, spread by poor
sanitation
The Beginnings of Data-Driven Health
Florence Nightingale(1820– 1910)
Anchor Data Science in
The Power ofObservation
Contextualphenomena:cholera
incidence
Ecologicaldesign:compare
cholera rates by region
Cohort design:comparecholera
ratesin exposedand non-
exposedindividuals
R.A. Fisher and the
Principles of Experimental Design
1.Randomisation:Unbiasedallocationof
treatmentsto
differentexperimentalplot
2.Replication:repetitionofthe treatment
tomorethan
oneexperimentalplot
3.Error control: Measurefor reducing the
error ofvariance
Why do these 2
plants differ in growth?
Predictive Analytics forHealthcare
Part 1
• Drivers of predictive analyticsin healthcare
Part 2
• Evaluationof data-driven healthcare
Part3
• Important conceptsin study design
Part 4
• Top 10 DataScience Skills
Principles of Study Design
Needtosetup a studytoanswer aresearch question
Designmost importantaspect of a studyand perhaps themostneglected
Thestudydesign should matchresearch question
So that we don’t endup collecting useless data orthe principle outcome ends up not being recorded
Nomatter howgoodanalgorithmis,if the studydesign is inadequate
(garbage in) for answeringthe research question,we’ll get garbage out
Good Predictive Analyticsfor Healthcare
starts with Good Study Design
Threequestions:
1. Whatis thequestionor hypothesis I’mtrying toinvestigate
2. Is thedataI have adequateinorder to address thisquestion
3. How can I design astudythataddresses thesequestions
Types of Study Design
Non-Experimental
Observational
Studies
Descriptive
Case Reports
Case Series
Cross-Sectional or
Prevalence Study
Analytical
Case-control
Cohort Study
Experimental
Intervention
Studies
Randomised
Clinical Trial
Non-randomized/
Field/ Community
Trial
CaseReport
Profileofa singlepatientis reportedin detailby oneor more clinician
CaseSeries
An individualcase report thathas been expandedtoincludea number ofpatientswitha
givendisease
Cross-sectional study
• Allinformationcollectedatsametimepoint
• Snapshots of healthstate
• Designedtoobtaininformationfromsamplesregarding prevalence, distributionsand
interrelations
• Eg:survey
• Thereare no typical formats:they aredesignedor modifiedtomeettheneeds ofthe
researcher or fitthetopicofresearch
Cross-sectional Study
Source
Population
Exposed
&
Bad outcome
Exposed
&
Good outcome
Unexposed
&
Bad outcome
Unexposed
&
Good outcome
Ineligible Eligible
Participation
No
Paricipation
Types of Study Design
Non-Experimental
Observational
Studies
Descriptive
Case Reports
Case Series
Cross-Sectional or
Prevalence Study
Analytical
Case-control
Cohort Study
Experimental
Intervention
Studies
Randomised
Clinical Trial
Non-randomized/
Field/ Community
Trial
Observational Studies
• Analytical
• Mainobjective is to test hypothesis of relationship between exposure to risk factor and
disease or other healthoutcome
• A measure of associationis estimated
• The magnitude, precision and statisticalsignificanceif the associationis determined
Cohort Study
• Identifya group ofsubjects
• Followthemover time
• Compare eventrate in:
• Subjects exposed to risk factors and
• Those not exposedto risk factor
• Can be both retrospectiveand prospective
Cohort Study
Source
Population
Ineligible Eligible
Exposed
No
Participation
Participation
Bad
Outcome
Good
Outcome
Unexposed
No
Participation
Participation
Bad
Outcome
Good
Outcome
Types of Study Design
Non-Experimental
Observational
Studies
Descriptive
Case Reports
Case Series
Cross-Sectional or
Prevalence Study
Analytical
Case-control
Cohort Study
Experimental
Intervention
Studies
Randomised
Clinical Trial
Non-randomized/
Field/ Community
Trial
Important Concept: Randomisation
Definition:Theprocess by which allocationofsubjectstotreatmentgroups
is doneby chance,withouttheabilitytopredict who is in whatgroup
Aims:-
-To prevent statisticalbias in allocating subjects to treatment groups
- To achieve comparability between the groups
- To ensure samples representative of the general population
SimpleRandomSampling PermutedBlockRandomisation StratifiedRandomSampling
Methods ofRandomisation
1 2 3 4
5 6 7 8
9 10 11 12
2 35
3
8 10
Population
Sample
AABBAABB
BBBBAAA
AABAAABB
Populations Strata Sample
Predictive Analytics forHealthcare
Part 1
• Drivers of predictive analyticsin healthcare
Part 2
• Evaluationof data-driven healthcare
Part3
• Important concepts in study design
Part 4
• Top10 Data Science Skills
Skill #1: ROC Curves
No disease Disease
No disease
(D = 0)

Specificity
X
Type I error (False
+) 
Disease
(D = 1)
X
Type II error (False -)


Power 1 - ;
Sensitivity
UsefulTool forDiagnosticTestEvaluation
Specific Example:How welldoesIgE“score” for Ara h 2peanutcomponent
predict/ diagnosepeanutallergy
IgE response to Ara h2
Ptswith
peanut allergy
Ptswithout the
peanut allergy
IgE response to Ara h2
Call these patients “non-peanut allergic” Call these patients “peanut allergic”
Threshold
IgE= 3.15
Call these patients “non-peanut allergic” Call these patients “peanut allergic”
withoutpeanut allergy
with peanutallergy
TruePositives
Some definitions ...
IgE response to Ara h2
IgE= 3.15
withoutpeanut allergy
with peanutallergy
False Positives
IgE response to Ara h2
IgE= 3.15
Call these patients “non-peanut allergic” Call these patients “peanut allergic”
True negatives
Call these patients “non-peanut allergic” Call these patients “peanut allergic”
withoutpeanut allergy
with peanutallergy
IgE response to Ara h2
IgE= 3.15
False negatives
Call these patients “non-peanut allergic” Call these patients “peanut allergic”
withoutpeanut allergy
with peanutallergy
IgE response to Ara h2
IgE= 3.15
TruePositiveRate(sensitivity)
0%
100%
FalsePositiveRate(1-
specificity)
0% 100%
ROC curve
IgEResponse to Arah 2
without peanut allergy
with peanut allergy
Skill #2: Power and Sample Size Calculation
https://towardsdatascience.com/5-quick-and-easy-data-visualizations-in-python-with-code-a2284bae952f
𝑛 =
𝑟 + 1
𝑟
𝜎2
𝑍 𝛽 + 𝑍 𝛼
2
2
𝑑𝑖𝑓𝑓𝑒𝑟𝑒𝑛𝑐𝑒2
r=ratio ofcontrols tocases
Samplesize in case group
Standard deviation ofthe outcome
variable
Representsthe desired power
(typically0.84 for 80% power)
Effect Size
(the difference in means)
Representsthelevelof statistical
significance
(typically1.96)
Skill #3: Data Visualisation
https://towardsdatascience.com/5-quick-and-easy-data-visualizations-in-python-with-code-a2284bae952f
Skill #4: A/B Testing
Patient Group
Population is splitinto 2
groups byrandom allocation
Intervention
Control
Outcomes for both
groups are measured
= Cured = Still Diseased
Skill #5 Linear Regression
• Linear regression allowsus tolookat thelinear relationshipbetweenone normally
distributedinterval predictorand onenormallydistributedinterval outcomevariable
• Example:Wewish tolookatthe relationshipbetweenFEV1and IgEatage5
• In otherwords, predictingFEV1from IgE
Linear Regression: Example
• WeseethattherelationshipbetweenFEVand TotalIgEat age 5 isnegative(-0.11)
• Based on thep-value(<0.05), wewouldconclude thisrelationship is statistically
significant.
• Hence, wewould say thereisa statisticallysignificantnegativelinearrelationship
betweenreading andwriting.
Coefficientsa
1.146 .033 35.014 .000
-.011 .005 -.221 -2.242 .027
(Constant)
Total serum IgE, age 5
Model
1
B Std. Error
Unstandardized
Coefficients
Beta
Standardized
Coefficients
t Sig.
Dependent Variable: FEV1, age 5a.
Skill #6: Logistic Regression
Asthm a/wheeze sym ptom between 0 to 8 years * Ever antibiotic in first 12 m onths
165 257 422
39.1% 60.9% 100.0%
98 380 478
20.5% 79.5% 100.0%
263 637 900
29.2% 70.8% 100.0%
Control
Case
Total
No Yes
Antibiotic in first 12 months
Total
Odds of developing asthma inantibiotic group are:
(380/637)/(257/637)= 380/257= 1.48
Odds of developing asthma innon-antibiotic group are:
(98/263)/(165/263)= 98/165= 0.59
Theratio of odds for antibiotic users to the odds of non-antibiotic
users is:
(380/257)/(98/165)= (380*165)/(257*98)= 2.49
Theodds for antibiotic users is 1.49times the odds for non-users
What are the Odds?
Variables in the Equation
.912 .151 36.506 1 .000 2.489 1.852 3.347
-.521 .128 16.688 1 .000 .594
evrab12m
Constant
Step
1
a
B S.E. Wald df Sig. Exp(B) Lower Upper
95.0%C.I.for EXP(B)
Variable(s) entered on step 1: evrab12m.a.
Theinterceptof-0.52is thelogoddsfor non-antibioticusers since
this isthereference group.log(-0.52)=0.59
Thecoefficientofevrab12mis thelogoftheodds ratiobetweenthe
antibioticusers andnon-antibioticusers
Skill #7: CausalReasoning
Thequestions that motivate most studies in the health, social and behavioral sciences
arenotassociational butcausal in nature.
Before an association is assessed for the possibility that it is causal, other explanations
such as chance,bias and confounding have to beexcluded
Require some knowledge of the data-generating process - cannot be computed from
the data alone, nor from distributions governing data
Aim: to infer dynamics of beliefs under changing conditions, for example, changes
induced by treatments orexternal interventions.
Pearl, Judea."Causal inferencein statistics: An overview." Statistics surveys3 (2009): 96-146.
Bradford-Hill Principles of Causality
Plausibility
Does causation make sense
Consistency
Cause associated with disease in
different population and studies
Temporality
Cause precedes disease
Strength
Cause strongly associated with disease
Specificity
Does the cause lead toa specific effect
Dose-Response
Greater exposure to cause,
higher the risk of disease
Prognosticbiomarker (risk
factor)
Treatment
Genetic Marker
Tumor Size
Outcome
(Survival)
U
Example: Personalisation of Cancer Treatment
Skill #8: Missing Data
MissingCompletelyAtRandom(MCAR)
The probability of data being missing does not depend on the observed or unobserved data
e.g. logit(pit) = θ0
MissingAtRandom(MAR)
The probability of data being missing does not depend on the unobserved data, conditional on the observed data
e.g. Children with missing wheeze data have better lung function
e.g. logit(pit) = θ0 + θ1ti or logit(pit) = θ0 + θ2y0
MissingNotAt Random(MNAR)
The probability of data being missing does depend on the unobserved data, conditional on the observed data.
e.g. Children with missing lung function have better lung function
e.g. logit(pit) = θ0 + θ3yit
Alexina Mason. “Bayesian methods for modelling non-randommissing data mechanisms in longitudinal studies” PhD Thesis (2009)
The Challenge of Missing Data
Missing data is a common problem in healthcaredata and can produce biased
parameter estimates
Reasons for missingness may be informativefor estimatingmodel parameters
Bayesian models: coherent approach to incorporating uncertaintyby assigning
prior distributions
Mason, Alexina, NickyBest, SylviaRichardson, and IANPLEWIS. "Strategy for modelling non-randommissing data mechanisms in observational studies using
Bayesian methods." Journalof Official Statistics (2010)
Skill #9: LatentVariable Modelling using Probabilistic
Programming
To identifysubgroups ofcomplexdiseaserisk
Treatmentoutcomeexplainedby distinctiveunderlyingmechanism
FoundationofStratifiedMedicine
Seekingbetter-targetedinterventions
Identifying DiseaseEndotypes
Parsimonious descriptionofthe
datainferredfromwhatis
observed
Poor
Lung Function
Wheeze
Allergy
AsthmaMedication
Asthma Symptoms
Exacerbations
Grow out of
Asthma
Asthma Late in
Childhood
Respond to
treatment
Don’t Respond to
treatment
Severity
EndotypesDiscovery: Different DiseasesWith Different Causes
Phenotypes: Observable Manifestationsof Disease
Defining Asthma
Asthma
Probabilistic Programming:
Finding Patterns in Data
Inference
Algorithm
Probabilistic
model
Probabilistic reasoning system
The probabilistic
model expresses
general knowledge
about a situation
The inference
algorithm uses the
model to answer
queries given evidence
The answers to queries
are framed as
probabilities of
different outcomes
The basic components of a probabilistic reasoning system
Answer
Queries
Evidence
The evidence contains
specific information
about a situation
The queries express
the things that will
help you make a
decisionAdapted from Pfeffer,Avi. "Practical probabilistic programming." International ConferenceonInductive Logic Programming. Springer Berlin Heidelberg, 2010.
1. Team Science:Discoveries about healthcare, not hypothesised a priori, have been made by experts
explaining structure learned from data by algorithms tuned by those experts
2. Heuristic blend of biostatistics and machine-learningfor principled problem-led healthcare research
3. An ML approach to extracting knowledge from information in healthcare requires persistent integration of
Data
Methods
Expertise
Skill #10: Interdisciplinary Research
Problem-led vsData-driven Health
Thinkdeeplyabout theclinical context.Find
solutions which are specifictothe problem.
Goodscienceis aboutmergingdifferentschoolsof
thoughtfordevelopingthebiggerpicture.
Datadriven approach + DomainKnowledge=
Problem-ledapproach withthe patientatthecentre
DanielleBelgrave, JohnHenderson,AngelaSimpson,IainBuchan,ChristopherBishop,andAdnanCustovic.
"Disaggregatingasthma:Big investigationversusbig data." Journalof Allergy andClinicalImmunology139, no. 2 (2017):400-407..
Take Home Message
Problem-Led Patient-Centred Research
Healthcare ML at MSRCambridge
Javier Alvarez-Valle Danielle Belgrave Chris Bishop Laurence Bourn David Carter Richard Lowe
Hannah Murfet Jay Nanavati Kenton O’Hara Konstantina Palla Anton Schwaighofer
Kenji Takeda Ivan Tarapov Stefan Wijnen
Aditya NoriPratik Ghosh
Anja Thieme
Tim Regan
Isabel Chien
Jan Steumer
Antonio Criminisi
Sebastian Tschiatschek
Thank You
@DaniCMBelg
Cohort Study: Example
• A long-term follow-up study of doctors began in 1950 (Doll and Hill)
• Questionnaire sent to all men and women on British Medical Register and
Resident in the UK
• Asked about age and smoking habits
• Replies obtained from 34,440 men
• Subsequent follow-up focused on men
• Further questionnaires sent at intervals and asked about changes in smoking
habit
• During 1951-1971 10,074 had died
Missing Completely At Random
𝑚𝑖
𝑝𝑖σ²
β
µ𝑖
θ
𝑦𝑖
Individual i
Model of Interest
Model of
Missingness
𝑥𝑖
logit(pit) = θ0
Missing At Random
𝑚𝑖
𝑝𝑖σ²
β
µ𝑖
θ
𝑦𝑖
Individual i
Model of Interest
Model of Missingness
𝑥𝑖
logit(pit) = θ0 + θ1xi
Missing NotAt Random
𝑚𝑖
𝑝𝑖σ²
β
µ𝑖
θ
𝑦𝑖
Individual i
Model of Interest
Model of Missingness
𝑥𝑖
logit(pit) = θ0 + θ3yit
Prognostic Biomarker (Risk Factor)
A biological measurementmade before treatment toindicatelong-term
outcome for patientseither untreated or receiving standardoutcome
Prognosticbiomarker (risk
factor)
Random Allocation
(Treatment) Outcomes
Dunn,Graham, RichardEmsley, Hanhua Liu, and Sabine Landau. "Integratingbiomarkerinformation within trials to evaluatetreatment mechanisms andefficacy for
personalised medicine." ClinicalTrials10, no. 5 (2013): 709-719.
Predictive Biomarker (Moderator)
A variablethat changesthe impact of treatment on the outcome. A
biologicalmeasurement made before treatmentto identifypatientslikely
or unlikelyto benefit from a particulartreatment
Predictive biomarker
(Moderator)
Random Allocation
(Treatment) Outcomes
Mediator
A mechanism by which one variableaffectsanother variable.Omitted
common causes (hidden confounding)shouldalwaysbe considered as a
possibleexplanationfor associationsthat might be interpreted as causal
Random Allocation
(Treatment)
Mediator
Outcomes
U
Efficacyand mechanism evaluation: Causal
Framework forinvestigating who
medications workfor
Prognosticbiomarker (risk
factor)
Random Allocation
Predictive biomarker
(moderator)
Mediator
Outcomes
U
‘‘-’’ ‘‘+’’
Moving the Threshold: right
withoutpeanut allergy
with peanutallergy
IgE response to Ara h2
3.15 7.155.15 9.15
‘‘-’’ ‘‘+’’
Moving the Threshold: left
IgE response to Ara h2
withoutpeanut allergy
with peanutallergy
3.152.151.150.15

More Related Content

What's hot

Neural Networks: Least Mean Square (LSM) Algorithm
Neural Networks: Least Mean Square (LSM) AlgorithmNeural Networks: Least Mean Square (LSM) Algorithm
Neural Networks: Least Mean Square (LSM) AlgorithmMostafa G. M. Mostafa
 
Data Mining: Mining ,associations, and correlations
Data Mining: Mining ,associations, and correlationsData Mining: Mining ,associations, and correlations
Data Mining: Mining ,associations, and correlationsDatamining Tools
 
Machine Learning Deep Learning AI and Data Science
Machine Learning Deep Learning AI and Data Science Machine Learning Deep Learning AI and Data Science
Machine Learning Deep Learning AI and Data Science Venkata Reddy Konasani
 
Data Mining in Health Care
Data Mining in Health CareData Mining in Health Care
Data Mining in Health CareShahDhruv21
 
5.1 mining data streams
5.1 mining data streams5.1 mining data streams
5.1 mining data streamsKrish_ver2
 
Data Mining in Healthcare: How Health Systems Can Improve Quality and Reduce...
Data Mining in Healthcare:  How Health Systems Can Improve Quality and Reduce...Data Mining in Healthcare:  How Health Systems Can Improve Quality and Reduce...
Data Mining in Healthcare: How Health Systems Can Improve Quality and Reduce...Health Catalyst
 
Predictive Analytics - An Overview
Predictive Analytics - An OverviewPredictive Analytics - An Overview
Predictive Analytics - An OverviewMachinePulse
 
Machine Learning Strategies for Time Series Prediction
Machine Learning Strategies for Time Series PredictionMachine Learning Strategies for Time Series Prediction
Machine Learning Strategies for Time Series PredictionGianluca Bontempi
 
Introduction to Data Science.pptx
Introduction to Data Science.pptxIntroduction to Data Science.pptx
Introduction to Data Science.pptxVrishit Saraswat
 
Remote patient monitoring system
Remote patient monitoring system Remote patient monitoring system
Remote patient monitoring system Rahul Singh
 
Big-Data in HealthCare _ Overview
Big-Data in HealthCare _ OverviewBig-Data in HealthCare _ Overview
Big-Data in HealthCare _ OverviewHamdaoui Younes
 
Crop predction ppt using ANN
Crop predction ppt using ANNCrop predction ppt using ANN
Crop predction ppt using ANNAstha Jain
 
Big Data Science - hype?
Big Data Science - hype?Big Data Science - hype?
Big Data Science - hype?BalaBit
 
Ppt for Application of big data
Ppt for Application of big dataPpt for Application of big data
Ppt for Application of big dataPrashant Sharma
 

What's hot (20)

Neural Networks: Least Mean Square (LSM) Algorithm
Neural Networks: Least Mean Square (LSM) AlgorithmNeural Networks: Least Mean Square (LSM) Algorithm
Neural Networks: Least Mean Square (LSM) Algorithm
 
Data Mining: Mining ,associations, and correlations
Data Mining: Mining ,associations, and correlationsData Mining: Mining ,associations, and correlations
Data Mining: Mining ,associations, and correlations
 
Machine Learning Deep Learning AI and Data Science
Machine Learning Deep Learning AI and Data Science Machine Learning Deep Learning AI and Data Science
Machine Learning Deep Learning AI and Data Science
 
AI in the Covid-19 pandemic
AI in the Covid-19 pandemicAI in the Covid-19 pandemic
AI in the Covid-19 pandemic
 
Data Mining in Health Care
Data Mining in Health CareData Mining in Health Care
Data Mining in Health Care
 
5.1 mining data streams
5.1 mining data streams5.1 mining data streams
5.1 mining data streams
 
Data Mining in Healthcare: How Health Systems Can Improve Quality and Reduce...
Data Mining in Healthcare:  How Health Systems Can Improve Quality and Reduce...Data Mining in Healthcare:  How Health Systems Can Improve Quality and Reduce...
Data Mining in Healthcare: How Health Systems Can Improve Quality and Reduce...
 
Statistics and data science
Statistics and data scienceStatistics and data science
Statistics and data science
 
Predictive Analytics - An Overview
Predictive Analytics - An OverviewPredictive Analytics - An Overview
Predictive Analytics - An Overview
 
Machine Learning Strategies for Time Series Prediction
Machine Learning Strategies for Time Series PredictionMachine Learning Strategies for Time Series Prediction
Machine Learning Strategies for Time Series Prediction
 
Introduction to Data Science.pptx
Introduction to Data Science.pptxIntroduction to Data Science.pptx
Introduction to Data Science.pptx
 
Remote patient monitoring system
Remote patient monitoring system Remote patient monitoring system
Remote patient monitoring system
 
Medical Data Analysis
Medical Data AnalysisMedical Data Analysis
Medical Data Analysis
 
Big-Data in HealthCare _ Overview
Big-Data in HealthCare _ OverviewBig-Data in HealthCare _ Overview
Big-Data in HealthCare _ Overview
 
Crop predction ppt using ANN
Crop predction ppt using ANNCrop predction ppt using ANN
Crop predction ppt using ANN
 
Deep learning and Healthcare
Deep learning and HealthcareDeep learning and Healthcare
Deep learning and Healthcare
 
Big Data Science - hype?
Big Data Science - hype?Big Data Science - hype?
Big Data Science - hype?
 
Federated Learning
Federated LearningFederated Learning
Federated Learning
 
Ppt for Application of big data
Ppt for Application of big dataPpt for Application of big data
Ppt for Application of big data
 
01 Data Mining: Concepts and Techniques, 2nd ed.
01 Data Mining: Concepts and Techniques, 2nd ed.01 Data Mining: Concepts and Techniques, 2nd ed.
01 Data Mining: Concepts and Techniques, 2nd ed.
 

Similar to Day 1 (Lecture 3): Predictive Analytics in Healthcare

Dr. RM Pandey -Importance of Biostatistics in Biomedical Research.pptx
Dr. RM Pandey -Importance of Biostatistics in Biomedical Research.pptxDr. RM Pandey -Importance of Biostatistics in Biomedical Research.pptx
Dr. RM Pandey -Importance of Biostatistics in Biomedical Research.pptxPriyankaSharma89719
 
EPIDEMIOLOGY OF PERIODONTAL DISEASE DR SINDHURA.ppt
EPIDEMIOLOGY OF PERIODONTAL DISEASE DR SINDHURA.pptEPIDEMIOLOGY OF PERIODONTAL DISEASE DR SINDHURA.ppt
EPIDEMIOLOGY OF PERIODONTAL DISEASE DR SINDHURA.pptDentalYoutube
 
Research methodology
Research methodologyResearch methodology
Research methodologymonaaboserea
 
Case control study
Case control studyCase control study
Case control studyAbhijit Das
 
1_Intro to Research.pdf
1_Intro to Research.pdf1_Intro to Research.pdf
1_Intro to Research.pdfMharCastro
 
Epidemiology for Ophthalmologist.pptx
Epidemiology for Ophthalmologist.pptxEpidemiology for Ophthalmologist.pptx
Epidemiology for Ophthalmologist.pptxFruArnold
 
P-values the gold measure of statistical validity are not as reliable as many...
P-values the gold measure of statistical validity are not as reliable as many...P-values the gold measure of statistical validity are not as reliable as many...
P-values the gold measure of statistical validity are not as reliable as many...David Pratap
 
Epidemiological Studies
Epidemiological StudiesEpidemiological Studies
Epidemiological StudiesINAAMUL HAQ
 
Designing Causal Inference Studies Using Real-World Data
Designing Causal Inference Studies Using Real-World DataDesigning Causal Inference Studies Using Real-World Data
Designing Causal Inference Studies Using Real-World DataInsideScientific
 
Critical Appraisal - Quantitative SS.pptx
Critical Appraisal - Quantitative SS.pptxCritical Appraisal - Quantitative SS.pptx
Critical Appraisal - Quantitative SS.pptxMrs S Sen
 
Using real-world evidence to investigate clinical research questions
Using real-world evidence to investigate clinical research questionsUsing real-world evidence to investigate clinical research questions
Using real-world evidence to investigate clinical research questionsKarin Verspoor
 
Research methodology 101
Research methodology 101Research methodology 101
Research methodology 101Hesham Gaber
 
RCT to causal inference.pptx
RCT to causal inference.pptxRCT to causal inference.pptx
RCT to causal inference.pptxFrancois MAIGNEN
 

Similar to Day 1 (Lecture 3): Predictive Analytics in Healthcare (20)

Dr. RM Pandey -Importance of Biostatistics in Biomedical Research.pptx
Dr. RM Pandey -Importance of Biostatistics in Biomedical Research.pptxDr. RM Pandey -Importance of Biostatistics in Biomedical Research.pptx
Dr. RM Pandey -Importance of Biostatistics in Biomedical Research.pptx
 
EPIDEMIOLOGY OF PERIODONTAL DISEASE DR SINDHURA.ppt
EPIDEMIOLOGY OF PERIODONTAL DISEASE DR SINDHURA.pptEPIDEMIOLOGY OF PERIODONTAL DISEASE DR SINDHURA.ppt
EPIDEMIOLOGY OF PERIODONTAL DISEASE DR SINDHURA.ppt
 
Types of studies 2016
Types of studies 2016Types of studies 2016
Types of studies 2016
 
Research methodology part 1
Research methodology part 1Research methodology part 1
Research methodology part 1
 
5. Case control
5. Case control5. Case control
5. Case control
 
Research methodology
Research methodologyResearch methodology
Research methodology
 
Cohort study
Cohort studyCohort study
Cohort study
 
clinical trial designs
clinical trial designsclinical trial designs
clinical trial designs
 
Case control study
Case control studyCase control study
Case control study
 
Etiology Research.pptx
Etiology Research.pptxEtiology Research.pptx
Etiology Research.pptx
 
1_Intro to Research.pdf
1_Intro to Research.pdf1_Intro to Research.pdf
1_Intro to Research.pdf
 
Epidemiology for Ophthalmologist.pptx
Epidemiology for Ophthalmologist.pptxEpidemiology for Ophthalmologist.pptx
Epidemiology for Ophthalmologist.pptx
 
P-values the gold measure of statistical validity are not as reliable as many...
P-values the gold measure of statistical validity are not as reliable as many...P-values the gold measure of statistical validity are not as reliable as many...
P-values the gold measure of statistical validity are not as reliable as many...
 
Epidemiological Studies
Epidemiological StudiesEpidemiological Studies
Epidemiological Studies
 
Designing Causal Inference Studies Using Real-World Data
Designing Causal Inference Studies Using Real-World DataDesigning Causal Inference Studies Using Real-World Data
Designing Causal Inference Studies Using Real-World Data
 
Critical Appraisal - Quantitative SS.pptx
Critical Appraisal - Quantitative SS.pptxCritical Appraisal - Quantitative SS.pptx
Critical Appraisal - Quantitative SS.pptx
 
Using real-world evidence to investigate clinical research questions
Using real-world evidence to investigate clinical research questionsUsing real-world evidence to investigate clinical research questions
Using real-world evidence to investigate clinical research questions
 
Research methodology 101
Research methodology 101Research methodology 101
Research methodology 101
 
Evidence-Based Medicine Glossary
Evidence-Based Medicine GlossaryEvidence-Based Medicine Glossary
Evidence-Based Medicine Glossary
 
RCT to causal inference.pptx
RCT to causal inference.pptxRCT to causal inference.pptx
RCT to causal inference.pptx
 

More from Aseda Owusua Addai-Deseh

Day 2 (Lecture 1): Introduction to Statistical Machine Learning and Applications
Day 2 (Lecture 1): Introduction to Statistical Machine Learning and ApplicationsDay 2 (Lecture 1): Introduction to Statistical Machine Learning and Applications
Day 2 (Lecture 1): Introduction to Statistical Machine Learning and ApplicationsAseda Owusua Addai-Deseh
 
Day 1 (Lecture 4): Data Science in the Retail Marketing and Financial Services
Day 1 (Lecture 4): Data Science in the Retail Marketing and Financial ServicesDay 1 (Lecture 4): Data Science in the Retail Marketing and Financial Services
Day 1 (Lecture 4): Data Science in the Retail Marketing and Financial ServicesAseda Owusua Addai-Deseh
 
Day 2 (Lecture 5): A Practitioner's Perspective on Building Machine Product i...
Day 2 (Lecture 5): A Practitioner's Perspective on Building Machine Product i...Day 2 (Lecture 5): A Practitioner's Perspective on Building Machine Product i...
Day 2 (Lecture 5): A Practitioner's Perspective on Building Machine Product i...Aseda Owusua Addai-Deseh
 
Day 2 (Lecture 3): Deep Learning Fundamentals - Architecture and Applications
Day 2 (Lecture 3): Deep Learning Fundamentals - Architecture and ApplicationsDay 2 (Lecture 3): Deep Learning Fundamentals - Architecture and Applications
Day 2 (Lecture 3): Deep Learning Fundamentals - Architecture and ApplicationsAseda Owusua Addai-Deseh
 
Day 1 Keynote Address-GDSS 2019 (IndabaX Ghana)
Day 1 Keynote Address-GDSS 2019 (IndabaX Ghana)Day 1 Keynote Address-GDSS 2019 (IndabaX Ghana)
Day 1 Keynote Address-GDSS 2019 (IndabaX Ghana)Aseda Owusua Addai-Deseh
 
Day 1 (Lecture 1): Data Management- The Foundation of all Analytics
Day 1 (Lecture 1): Data Management- The Foundation of all AnalyticsDay 1 (Lecture 1): Data Management- The Foundation of all Analytics
Day 1 (Lecture 1): Data Management- The Foundation of all AnalyticsAseda Owusua Addai-Deseh
 
Day 2 (Lecture 4): Machine Learning Applications in Health Care
Day 2 (Lecture 4): Machine Learning Applications in Health CareDay 2 (Lecture 4): Machine Learning Applications in Health Care
Day 2 (Lecture 4): Machine Learning Applications in Health CareAseda Owusua Addai-Deseh
 

More from Aseda Owusua Addai-Deseh (9)

Day 2 (Lecture 1): Introduction to Statistical Machine Learning and Applications
Day 2 (Lecture 1): Introduction to Statistical Machine Learning and ApplicationsDay 2 (Lecture 1): Introduction to Statistical Machine Learning and Applications
Day 2 (Lecture 1): Introduction to Statistical Machine Learning and Applications
 
Day 1 (Lecture 4): Data Science in the Retail Marketing and Financial Services
Day 1 (Lecture 4): Data Science in the Retail Marketing and Financial ServicesDay 1 (Lecture 4): Data Science in the Retail Marketing and Financial Services
Day 1 (Lecture 4): Data Science in the Retail Marketing and Financial Services
 
Day 2 (Lecture 5): A Practitioner's Perspective on Building Machine Product i...
Day 2 (Lecture 5): A Practitioner's Perspective on Building Machine Product i...Day 2 (Lecture 5): A Practitioner's Perspective on Building Machine Product i...
Day 2 (Lecture 5): A Practitioner's Perspective on Building Machine Product i...
 
Day 2 (Lecture 3): Deep Learning Fundamentals - Architecture and Applications
Day 2 (Lecture 3): Deep Learning Fundamentals - Architecture and ApplicationsDay 2 (Lecture 3): Deep Learning Fundamentals - Architecture and Applications
Day 2 (Lecture 3): Deep Learning Fundamentals - Architecture and Applications
 
Day 1 Keynote Address-GDSS 2019 (IndabaX Ghana)
Day 1 Keynote Address-GDSS 2019 (IndabaX Ghana)Day 1 Keynote Address-GDSS 2019 (IndabaX Ghana)
Day 1 Keynote Address-GDSS 2019 (IndabaX Ghana)
 
Day 1 (Lecture 1): Data Management- The Foundation of all Analytics
Day 1 (Lecture 1): Data Management- The Foundation of all AnalyticsDay 1 (Lecture 1): Data Management- The Foundation of all Analytics
Day 1 (Lecture 1): Data Management- The Foundation of all Analytics
 
Day 1 (Lecture 2): Business Analytics
Day 1 (Lecture 2): Business AnalyticsDay 1 (Lecture 2): Business Analytics
Day 1 (Lecture 2): Business Analytics
 
Day 2 (Lecture 4): Machine Learning Applications in Health Care
Day 2 (Lecture 4): Machine Learning Applications in Health CareDay 2 (Lecture 4): Machine Learning Applications in Health Care
Day 2 (Lecture 4): Machine Learning Applications in Health Care
 
Welcome Address-GDSS 2019 (IndabaX Ghana)
Welcome Address-GDSS 2019 (IndabaX Ghana)Welcome Address-GDSS 2019 (IndabaX Ghana)
Welcome Address-GDSS 2019 (IndabaX Ghana)
 

Recently uploaded

Call me @ 9892124323 Cheap Rate Call Girls in Vashi with Real Photo 100% Secure
Call me @ 9892124323  Cheap Rate Call Girls in Vashi with Real Photo 100% SecureCall me @ 9892124323  Cheap Rate Call Girls in Vashi with Real Photo 100% Secure
Call me @ 9892124323 Cheap Rate Call Girls in Vashi with Real Photo 100% SecurePooja Nehwal
 
BabyOno dropshipping via API with DroFx.pptx
BabyOno dropshipping via API with DroFx.pptxBabyOno dropshipping via API with DroFx.pptx
BabyOno dropshipping via API with DroFx.pptxolyaivanovalion
 
Log Analysis using OSSEC sasoasasasas.pptx
Log Analysis using OSSEC sasoasasasas.pptxLog Analysis using OSSEC sasoasasasas.pptx
Log Analysis using OSSEC sasoasasasas.pptxJohnnyPlasten
 
Smarteg dropshipping via API with DroFx.pptx
Smarteg dropshipping via API with DroFx.pptxSmarteg dropshipping via API with DroFx.pptx
Smarteg dropshipping via API with DroFx.pptxolyaivanovalion
 
Midocean dropshipping via API with DroFx
Midocean dropshipping via API with DroFxMidocean dropshipping via API with DroFx
Midocean dropshipping via API with DroFxolyaivanovalion
 
Schema on read is obsolete. Welcome metaprogramming..pdf
Schema on read is obsolete. Welcome metaprogramming..pdfSchema on read is obsolete. Welcome metaprogramming..pdf
Schema on read is obsolete. Welcome metaprogramming..pdfLars Albertsson
 
CebaBaby dropshipping via API with DroFX.pptx
CebaBaby dropshipping via API with DroFX.pptxCebaBaby dropshipping via API with DroFX.pptx
CebaBaby dropshipping via API with DroFX.pptxolyaivanovalion
 
Week-01-2.ppt BBB human Computer interaction
Week-01-2.ppt BBB human Computer interactionWeek-01-2.ppt BBB human Computer interaction
Week-01-2.ppt BBB human Computer interactionfulawalesam
 
Zuja dropshipping via API with DroFx.pptx
Zuja dropshipping via API with DroFx.pptxZuja dropshipping via API with DroFx.pptx
Zuja dropshipping via API with DroFx.pptxolyaivanovalion
 
Halmar dropshipping via API with DroFx
Halmar  dropshipping  via API with DroFxHalmar  dropshipping  via API with DroFx
Halmar dropshipping via API with DroFxolyaivanovalion
 
Determinants of health, dimensions of health, positive health and spectrum of...
Determinants of health, dimensions of health, positive health and spectrum of...Determinants of health, dimensions of health, positive health and spectrum of...
Determinants of health, dimensions of health, positive health and spectrum of...shambhavirathore45
 
Ravak dropshipping via API with DroFx.pptx
Ravak dropshipping via API with DroFx.pptxRavak dropshipping via API with DroFx.pptx
Ravak dropshipping via API with DroFx.pptxolyaivanovalion
 
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...amitlee9823
 
Cheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 night
Cheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 nightCheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 night
Cheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 nightDelhi Call girls
 
ALSO dropshipping via API with DroFx.pptx
ALSO dropshipping via API with DroFx.pptxALSO dropshipping via API with DroFx.pptx
ALSO dropshipping via API with DroFx.pptxolyaivanovalion
 
April 2024 - Crypto Market Report's Analysis
April 2024 - Crypto Market Report's AnalysisApril 2024 - Crypto Market Report's Analysis
April 2024 - Crypto Market Report's Analysismanisha194592
 
Capstone Project on IBM Data Analytics Program
Capstone Project on IBM Data Analytics ProgramCapstone Project on IBM Data Analytics Program
Capstone Project on IBM Data Analytics ProgramMoniSankarHazra
 
BigBuy dropshipping via API with DroFx.pptx
BigBuy dropshipping via API with DroFx.pptxBigBuy dropshipping via API with DroFx.pptx
BigBuy dropshipping via API with DroFx.pptxolyaivanovalion
 

Recently uploaded (20)

Call me @ 9892124323 Cheap Rate Call Girls in Vashi with Real Photo 100% Secure
Call me @ 9892124323  Cheap Rate Call Girls in Vashi with Real Photo 100% SecureCall me @ 9892124323  Cheap Rate Call Girls in Vashi with Real Photo 100% Secure
Call me @ 9892124323 Cheap Rate Call Girls in Vashi with Real Photo 100% Secure
 
BabyOno dropshipping via API with DroFx.pptx
BabyOno dropshipping via API with DroFx.pptxBabyOno dropshipping via API with DroFx.pptx
BabyOno dropshipping via API with DroFx.pptx
 
Log Analysis using OSSEC sasoasasasas.pptx
Log Analysis using OSSEC sasoasasasas.pptxLog Analysis using OSSEC sasoasasasas.pptx
Log Analysis using OSSEC sasoasasasas.pptx
 
Abortion pills in Doha Qatar (+966572737505 ! Get Cytotec
Abortion pills in Doha Qatar (+966572737505 ! Get CytotecAbortion pills in Doha Qatar (+966572737505 ! Get Cytotec
Abortion pills in Doha Qatar (+966572737505 ! Get Cytotec
 
Smarteg dropshipping via API with DroFx.pptx
Smarteg dropshipping via API with DroFx.pptxSmarteg dropshipping via API with DroFx.pptx
Smarteg dropshipping via API with DroFx.pptx
 
Midocean dropshipping via API with DroFx
Midocean dropshipping via API with DroFxMidocean dropshipping via API with DroFx
Midocean dropshipping via API with DroFx
 
Schema on read is obsolete. Welcome metaprogramming..pdf
Schema on read is obsolete. Welcome metaprogramming..pdfSchema on read is obsolete. Welcome metaprogramming..pdf
Schema on read is obsolete. Welcome metaprogramming..pdf
 
CebaBaby dropshipping via API with DroFX.pptx
CebaBaby dropshipping via API with DroFX.pptxCebaBaby dropshipping via API with DroFX.pptx
CebaBaby dropshipping via API with DroFX.pptx
 
Week-01-2.ppt BBB human Computer interaction
Week-01-2.ppt BBB human Computer interactionWeek-01-2.ppt BBB human Computer interaction
Week-01-2.ppt BBB human Computer interaction
 
Zuja dropshipping via API with DroFx.pptx
Zuja dropshipping via API with DroFx.pptxZuja dropshipping via API with DroFx.pptx
Zuja dropshipping via API with DroFx.pptx
 
Halmar dropshipping via API with DroFx
Halmar  dropshipping  via API with DroFxHalmar  dropshipping  via API with DroFx
Halmar dropshipping via API with DroFx
 
Delhi 99530 vip 56974 Genuine Escort Service Call Girls in Kishangarh
Delhi 99530 vip 56974 Genuine Escort Service Call Girls in  KishangarhDelhi 99530 vip 56974 Genuine Escort Service Call Girls in  Kishangarh
Delhi 99530 vip 56974 Genuine Escort Service Call Girls in Kishangarh
 
Determinants of health, dimensions of health, positive health and spectrum of...
Determinants of health, dimensions of health, positive health and spectrum of...Determinants of health, dimensions of health, positive health and spectrum of...
Determinants of health, dimensions of health, positive health and spectrum of...
 
Ravak dropshipping via API with DroFx.pptx
Ravak dropshipping via API with DroFx.pptxRavak dropshipping via API with DroFx.pptx
Ravak dropshipping via API with DroFx.pptx
 
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...
 
Cheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 night
Cheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 nightCheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 night
Cheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 night
 
ALSO dropshipping via API with DroFx.pptx
ALSO dropshipping via API with DroFx.pptxALSO dropshipping via API with DroFx.pptx
ALSO dropshipping via API with DroFx.pptx
 
April 2024 - Crypto Market Report's Analysis
April 2024 - Crypto Market Report's AnalysisApril 2024 - Crypto Market Report's Analysis
April 2024 - Crypto Market Report's Analysis
 
Capstone Project on IBM Data Analytics Program
Capstone Project on IBM Data Analytics ProgramCapstone Project on IBM Data Analytics Program
Capstone Project on IBM Data Analytics Program
 
BigBuy dropshipping via API with DroFx.pptx
BigBuy dropshipping via API with DroFx.pptxBigBuy dropshipping via API with DroFx.pptx
BigBuy dropshipping via API with DroFx.pptx
 

Day 1 (Lecture 3): Predictive Analytics in Healthcare

  • 1. Predictive Analytics in Healthcare: Building Successful Pipelines Danielle Belgrave Researcher Healthcare Machine Learning @DaniCMBelg
  • 2. Latent Variable Modelling My Research Missing DataLongitudinal Data Analysis 0 2 4 6 8 10 12 14 16 18 20 22 24 Time (hours) Patient 1 Patient 2 Patient 3 Patient 4 MultidisciplinaryCausality X Y Z Patient-Centric Approach
  • 3. Roadmap: Predictive Analytics for Healthcare Part 1 • Drivers ofpredictive analyticsin healthcare Part 2 • Evaluationof data-driven healthcare Part3 • Important concepts in study design Part 4 • Top 10 DataScience Skills
  • 4. Part 1: Drivers of Predictive Analytics in healthcare PopulationPatient/Person Pharmaceuticals Providers TheKey Stakeholders inHealth
  • 5. ThePerson at the Centre of Healthcare Patient/Person MachineLearninghas the capacityto transformhealthcare - Understandingphysiological changes overtime - Forecastingofprogression or onsetof disease - Personalisingtreatmentstrategies
  • 6. Population Data-driven Healthcare Population Elucidatesaverage effectsand deviations fromaverage effects Policy recommendations Healtheducation Outreach Researchfor diseasedetectionand injury prevention Reducehealthcareinequalities Whatwe as asociety do collectively toassurethe conditionsin which people can be healthy
  • 7. The Pharmaceutical Perspective: Drug Discoveryand Therapeutics Idea Basic Research Clinical Trails Phase I Phase II Phase III Regulatory Approval Patient Care
  • 8. Data Protection and Connected Care: The Provider and Regulator Perspective Providers
  • 9. Predictive Analytics forHealthcare Part 1 • Drivers of predictive analyticsin healthcare Part 2 • Evaluation of data-driven healthcare Part3 • Important concepts in study design Part 4 • Top 10 DataScience Skills
  • 10. DATA EXPERTISE METHODS & MODELS Vast data volume, velocity, variety TSUNAMI Supra-linear growth in papers & tools BLIZZARD Human expertise to make sense of the growing data DROUGHT Myth 1: ‘Big data’ are a solution Myth 2: We have the models needed to turn healthcare data into decision support algorithms Myth 3: Clinicians will continue to be the main source of data Current Stateof Healthcare Data
  • 11. Machine Learning has the Potential to Disrupt and Impact Healthcare…
  • 12. …But it’s Important to Remember the Beginnings The study of the distribution and determinantsof healthrelated statesor events in specific populations & theapplications of this study to thecontrol of healthproblems
  • 13. Data visualisation: death toll of the Crimean War Army data: 16,000/18,000 deaths notdue to battle wounds, but to preventable diseases, spread by poor sanitation The Beginnings of Data-Driven Health Florence Nightingale(1820– 1910)
  • 14. Anchor Data Science in The Power ofObservation Contextualphenomena:cholera incidence Ecologicaldesign:compare cholera rates by region Cohort design:comparecholera ratesin exposedand non- exposedindividuals
  • 15. R.A. Fisher and the Principles of Experimental Design 1.Randomisation:Unbiasedallocationof treatmentsto differentexperimentalplot 2.Replication:repetitionofthe treatment tomorethan oneexperimentalplot 3.Error control: Measurefor reducing the error ofvariance Why do these 2 plants differ in growth?
  • 16. Predictive Analytics forHealthcare Part 1 • Drivers of predictive analyticsin healthcare Part 2 • Evaluationof data-driven healthcare Part3 • Important conceptsin study design Part 4 • Top 10 DataScience Skills
  • 17. Principles of Study Design Needtosetup a studytoanswer aresearch question Designmost importantaspect of a studyand perhaps themostneglected Thestudydesign should matchresearch question So that we don’t endup collecting useless data orthe principle outcome ends up not being recorded Nomatter howgoodanalgorithmis,if the studydesign is inadequate (garbage in) for answeringthe research question,we’ll get garbage out
  • 18. Good Predictive Analyticsfor Healthcare starts with Good Study Design Threequestions: 1. Whatis thequestionor hypothesis I’mtrying toinvestigate 2. Is thedataI have adequateinorder to address thisquestion 3. How can I design astudythataddresses thesequestions
  • 19. Types of Study Design Non-Experimental Observational Studies Descriptive Case Reports Case Series Cross-Sectional or Prevalence Study Analytical Case-control Cohort Study Experimental Intervention Studies Randomised Clinical Trial Non-randomized/ Field/ Community Trial
  • 20. CaseReport Profileofa singlepatientis reportedin detailby oneor more clinician
  • 21. CaseSeries An individualcase report thathas been expandedtoincludea number ofpatientswitha givendisease
  • 22. Cross-sectional study • Allinformationcollectedatsametimepoint • Snapshots of healthstate • Designedtoobtaininformationfromsamplesregarding prevalence, distributionsand interrelations • Eg:survey • Thereare no typical formats:they aredesignedor modifiedtomeettheneeds ofthe researcher or fitthetopicofresearch
  • 23. Cross-sectional Study Source Population Exposed & Bad outcome Exposed & Good outcome Unexposed & Bad outcome Unexposed & Good outcome Ineligible Eligible Participation No Paricipation
  • 24. Types of Study Design Non-Experimental Observational Studies Descriptive Case Reports Case Series Cross-Sectional or Prevalence Study Analytical Case-control Cohort Study Experimental Intervention Studies Randomised Clinical Trial Non-randomized/ Field/ Community Trial
  • 25. Observational Studies • Analytical • Mainobjective is to test hypothesis of relationship between exposure to risk factor and disease or other healthoutcome • A measure of associationis estimated • The magnitude, precision and statisticalsignificanceif the associationis determined
  • 26. Cohort Study • Identifya group ofsubjects • Followthemover time • Compare eventrate in: • Subjects exposed to risk factors and • Those not exposedto risk factor • Can be both retrospectiveand prospective
  • 28. Types of Study Design Non-Experimental Observational Studies Descriptive Case Reports Case Series Cross-Sectional or Prevalence Study Analytical Case-control Cohort Study Experimental Intervention Studies Randomised Clinical Trial Non-randomized/ Field/ Community Trial
  • 29. Important Concept: Randomisation Definition:Theprocess by which allocationofsubjectstotreatmentgroups is doneby chance,withouttheabilitytopredict who is in whatgroup Aims:- -To prevent statisticalbias in allocating subjects to treatment groups - To achieve comparability between the groups - To ensure samples representative of the general population
  • 30. SimpleRandomSampling PermutedBlockRandomisation StratifiedRandomSampling Methods ofRandomisation 1 2 3 4 5 6 7 8 9 10 11 12 2 35 3 8 10 Population Sample AABBAABB BBBBAAA AABAAABB Populations Strata Sample
  • 31. Predictive Analytics forHealthcare Part 1 • Drivers of predictive analyticsin healthcare Part 2 • Evaluationof data-driven healthcare Part3 • Important concepts in study design Part 4 • Top10 Data Science Skills
  • 32. Skill #1: ROC Curves No disease Disease No disease (D = 0)  Specificity X Type I error (False +)  Disease (D = 1) X Type II error (False -)   Power 1 - ; Sensitivity UsefulTool forDiagnosticTestEvaluation
  • 33. Specific Example:How welldoesIgE“score” for Ara h 2peanutcomponent predict/ diagnosepeanutallergy IgE response to Ara h2 Ptswith peanut allergy Ptswithout the peanut allergy
  • 34. IgE response to Ara h2 Call these patients “non-peanut allergic” Call these patients “peanut allergic” Threshold IgE= 3.15
  • 35. Call these patients “non-peanut allergic” Call these patients “peanut allergic” withoutpeanut allergy with peanutallergy TruePositives Some definitions ... IgE response to Ara h2 IgE= 3.15
  • 36. withoutpeanut allergy with peanutallergy False Positives IgE response to Ara h2 IgE= 3.15 Call these patients “non-peanut allergic” Call these patients “peanut allergic”
  • 37. True negatives Call these patients “non-peanut allergic” Call these patients “peanut allergic” withoutpeanut allergy with peanutallergy IgE response to Ara h2 IgE= 3.15
  • 38. False negatives Call these patients “non-peanut allergic” Call these patients “peanut allergic” withoutpeanut allergy with peanutallergy IgE response to Ara h2 IgE= 3.15
  • 40. Skill #2: Power and Sample Size Calculation https://towardsdatascience.com/5-quick-and-easy-data-visualizations-in-python-with-code-a2284bae952f 𝑛 = 𝑟 + 1 𝑟 𝜎2 𝑍 𝛽 + 𝑍 𝛼 2 2 𝑑𝑖𝑓𝑓𝑒𝑟𝑒𝑛𝑐𝑒2 r=ratio ofcontrols tocases Samplesize in case group Standard deviation ofthe outcome variable Representsthe desired power (typically0.84 for 80% power) Effect Size (the difference in means) Representsthelevelof statistical significance (typically1.96)
  • 41. Skill #3: Data Visualisation https://towardsdatascience.com/5-quick-and-easy-data-visualizations-in-python-with-code-a2284bae952f
  • 42. Skill #4: A/B Testing Patient Group Population is splitinto 2 groups byrandom allocation Intervention Control Outcomes for both groups are measured = Cured = Still Diseased
  • 43. Skill #5 Linear Regression • Linear regression allowsus tolookat thelinear relationshipbetweenone normally distributedinterval predictorand onenormallydistributedinterval outcomevariable • Example:Wewish tolookatthe relationshipbetweenFEV1and IgEatage5 • In otherwords, predictingFEV1from IgE
  • 44. Linear Regression: Example • WeseethattherelationshipbetweenFEVand TotalIgEat age 5 isnegative(-0.11) • Based on thep-value(<0.05), wewouldconclude thisrelationship is statistically significant. • Hence, wewould say thereisa statisticallysignificantnegativelinearrelationship betweenreading andwriting. Coefficientsa 1.146 .033 35.014 .000 -.011 .005 -.221 -2.242 .027 (Constant) Total serum IgE, age 5 Model 1 B Std. Error Unstandardized Coefficients Beta Standardized Coefficients t Sig. Dependent Variable: FEV1, age 5a.
  • 45. Skill #6: Logistic Regression Asthm a/wheeze sym ptom between 0 to 8 years * Ever antibiotic in first 12 m onths 165 257 422 39.1% 60.9% 100.0% 98 380 478 20.5% 79.5% 100.0% 263 637 900 29.2% 70.8% 100.0% Control Case Total No Yes Antibiotic in first 12 months Total Odds of developing asthma inantibiotic group are: (380/637)/(257/637)= 380/257= 1.48 Odds of developing asthma innon-antibiotic group are: (98/263)/(165/263)= 98/165= 0.59 Theratio of odds for antibiotic users to the odds of non-antibiotic users is: (380/257)/(98/165)= (380*165)/(257*98)= 2.49 Theodds for antibiotic users is 1.49times the odds for non-users
  • 46. What are the Odds? Variables in the Equation .912 .151 36.506 1 .000 2.489 1.852 3.347 -.521 .128 16.688 1 .000 .594 evrab12m Constant Step 1 a B S.E. Wald df Sig. Exp(B) Lower Upper 95.0%C.I.for EXP(B) Variable(s) entered on step 1: evrab12m.a. Theinterceptof-0.52is thelogoddsfor non-antibioticusers since this isthereference group.log(-0.52)=0.59 Thecoefficientofevrab12mis thelogoftheodds ratiobetweenthe antibioticusers andnon-antibioticusers
  • 47. Skill #7: CausalReasoning Thequestions that motivate most studies in the health, social and behavioral sciences arenotassociational butcausal in nature. Before an association is assessed for the possibility that it is causal, other explanations such as chance,bias and confounding have to beexcluded Require some knowledge of the data-generating process - cannot be computed from the data alone, nor from distributions governing data Aim: to infer dynamics of beliefs under changing conditions, for example, changes induced by treatments orexternal interventions. Pearl, Judea."Causal inferencein statistics: An overview." Statistics surveys3 (2009): 96-146.
  • 48. Bradford-Hill Principles of Causality Plausibility Does causation make sense Consistency Cause associated with disease in different population and studies Temporality Cause precedes disease Strength Cause strongly associated with disease Specificity Does the cause lead toa specific effect Dose-Response Greater exposure to cause, higher the risk of disease
  • 49. Prognosticbiomarker (risk factor) Treatment Genetic Marker Tumor Size Outcome (Survival) U Example: Personalisation of Cancer Treatment
  • 50. Skill #8: Missing Data MissingCompletelyAtRandom(MCAR) The probability of data being missing does not depend on the observed or unobserved data e.g. logit(pit) = θ0 MissingAtRandom(MAR) The probability of data being missing does not depend on the unobserved data, conditional on the observed data e.g. Children with missing wheeze data have better lung function e.g. logit(pit) = θ0 + θ1ti or logit(pit) = θ0 + θ2y0 MissingNotAt Random(MNAR) The probability of data being missing does depend on the unobserved data, conditional on the observed data. e.g. Children with missing lung function have better lung function e.g. logit(pit) = θ0 + θ3yit Alexina Mason. “Bayesian methods for modelling non-randommissing data mechanisms in longitudinal studies” PhD Thesis (2009)
  • 51. The Challenge of Missing Data Missing data is a common problem in healthcaredata and can produce biased parameter estimates Reasons for missingness may be informativefor estimatingmodel parameters Bayesian models: coherent approach to incorporating uncertaintyby assigning prior distributions Mason, Alexina, NickyBest, SylviaRichardson, and IANPLEWIS. "Strategy for modelling non-randommissing data mechanisms in observational studies using Bayesian methods." Journalof Official Statistics (2010)
  • 52. Skill #9: LatentVariable Modelling using Probabilistic Programming To identifysubgroups ofcomplexdiseaserisk Treatmentoutcomeexplainedby distinctiveunderlyingmechanism FoundationofStratifiedMedicine Seekingbetter-targetedinterventions
  • 54. Poor Lung Function Wheeze Allergy AsthmaMedication Asthma Symptoms Exacerbations Grow out of Asthma Asthma Late in Childhood Respond to treatment Don’t Respond to treatment Severity EndotypesDiscovery: Different DiseasesWith Different Causes Phenotypes: Observable Manifestationsof Disease Defining Asthma Asthma
  • 55. Probabilistic Programming: Finding Patterns in Data Inference Algorithm Probabilistic model Probabilistic reasoning system The probabilistic model expresses general knowledge about a situation The inference algorithm uses the model to answer queries given evidence The answers to queries are framed as probabilities of different outcomes The basic components of a probabilistic reasoning system Answer Queries Evidence The evidence contains specific information about a situation The queries express the things that will help you make a decisionAdapted from Pfeffer,Avi. "Practical probabilistic programming." International ConferenceonInductive Logic Programming. Springer Berlin Heidelberg, 2010.
  • 56. 1. Team Science:Discoveries about healthcare, not hypothesised a priori, have been made by experts explaining structure learned from data by algorithms tuned by those experts 2. Heuristic blend of biostatistics and machine-learningfor principled problem-led healthcare research 3. An ML approach to extracting knowledge from information in healthcare requires persistent integration of Data Methods Expertise Skill #10: Interdisciplinary Research
  • 57. Problem-led vsData-driven Health Thinkdeeplyabout theclinical context.Find solutions which are specifictothe problem. Goodscienceis aboutmergingdifferentschoolsof thoughtfordevelopingthebiggerpicture. Datadriven approach + DomainKnowledge= Problem-ledapproach withthe patientatthecentre DanielleBelgrave, JohnHenderson,AngelaSimpson,IainBuchan,ChristopherBishop,andAdnanCustovic. "Disaggregatingasthma:Big investigationversusbig data." Journalof Allergy andClinicalImmunology139, no. 2 (2017):400-407..
  • 58. Take Home Message Problem-Led Patient-Centred Research
  • 59. Healthcare ML at MSRCambridge Javier Alvarez-Valle Danielle Belgrave Chris Bishop Laurence Bourn David Carter Richard Lowe Hannah Murfet Jay Nanavati Kenton O’Hara Konstantina Palla Anton Schwaighofer Kenji Takeda Ivan Tarapov Stefan Wijnen Aditya NoriPratik Ghosh Anja Thieme Tim Regan Isabel Chien Jan Steumer Antonio Criminisi Sebastian Tschiatschek
  • 61. Cohort Study: Example • A long-term follow-up study of doctors began in 1950 (Doll and Hill) • Questionnaire sent to all men and women on British Medical Register and Resident in the UK • Asked about age and smoking habits • Replies obtained from 34,440 men • Subsequent follow-up focused on men • Further questionnaires sent at intervals and asked about changes in smoking habit • During 1951-1971 10,074 had died
  • 62. Missing Completely At Random 𝑚𝑖 𝑝𝑖σ² β µ𝑖 θ 𝑦𝑖 Individual i Model of Interest Model of Missingness 𝑥𝑖 logit(pit) = θ0
  • 63. Missing At Random 𝑚𝑖 𝑝𝑖σ² β µ𝑖 θ 𝑦𝑖 Individual i Model of Interest Model of Missingness 𝑥𝑖 logit(pit) = θ0 + θ1xi
  • 64. Missing NotAt Random 𝑚𝑖 𝑝𝑖σ² β µ𝑖 θ 𝑦𝑖 Individual i Model of Interest Model of Missingness 𝑥𝑖 logit(pit) = θ0 + θ3yit
  • 65. Prognostic Biomarker (Risk Factor) A biological measurementmade before treatment toindicatelong-term outcome for patientseither untreated or receiving standardoutcome Prognosticbiomarker (risk factor) Random Allocation (Treatment) Outcomes Dunn,Graham, RichardEmsley, Hanhua Liu, and Sabine Landau. "Integratingbiomarkerinformation within trials to evaluatetreatment mechanisms andefficacy for personalised medicine." ClinicalTrials10, no. 5 (2013): 709-719.
  • 66. Predictive Biomarker (Moderator) A variablethat changesthe impact of treatment on the outcome. A biologicalmeasurement made before treatmentto identifypatientslikely or unlikelyto benefit from a particulartreatment Predictive biomarker (Moderator) Random Allocation (Treatment) Outcomes
  • 67. Mediator A mechanism by which one variableaffectsanother variable.Omitted common causes (hidden confounding)shouldalwaysbe considered as a possibleexplanationfor associationsthat might be interpreted as causal Random Allocation (Treatment) Mediator Outcomes U
  • 68. Efficacyand mechanism evaluation: Causal Framework forinvestigating who medications workfor Prognosticbiomarker (risk factor) Random Allocation Predictive biomarker (moderator) Mediator Outcomes U
  • 69. ‘‘-’’ ‘‘+’’ Moving the Threshold: right withoutpeanut allergy with peanutallergy IgE response to Ara h2 3.15 7.155.15 9.15
  • 70. ‘‘-’’ ‘‘+’’ Moving the Threshold: left IgE response to Ara h2 withoutpeanut allergy with peanutallergy 3.152.151.150.15