SlideShare a Scribd company logo
1 of 67
Machine Learning for
Healthcare Data
Katherine A. Heller
Duke University
Outline
Electronic Health Records
Gaussian Process-based Models for:
Chronic Kidney Disease
Sepsis
National Surgery Quality Improvement Program
The Basics of Predicting Surgical Complications
Clustering Procedural Codes
Transfer Learning Models
Mobile apps
Graph-coupled HMMs for Predicting the Spread of Influenza
MS Mosaic
Chronic Kidney Disease
PCP Visit
ED Visit
Hospital Admission
Acute MI
Death
Nephrology Visit
1st Nephrology Visit
eGFR
Kidney
Function
(eGFR)
Age
PCP Visit
ED Visit
Hospital Admission
Acute MI
Death
Nephrology Visit
1st Nephrology Visit
eGFR
Age 47
Untreated diabetes & high
blood pressure.
Normal kidney function, but
with evidence of kidney
damage.
No regular medical care.
PCP Visit
ED Visit
Hospital Admission
Acute MI
Death
Nephrology Visit
1st Nephrology Visit
eGFR
Age 49
Kidney function now 50%
PCP Visit
ED Visit
Hospital Admission
Acute MI
Death
Nephrology Visit
1st Nephrology Visit
eGFR
Age 51
Referred to
kidney specialist
PCP Visit
ED Visit
Hospital Admission
Acute MI
Death
Nephrology Visit
1st Nephrology Visit
eGFR
Three months later presents to
ER with kidney failure
symptoms and
“crash starts” dialysis
Dialysis
Begins
PCP Visit
ED Visit
Hospital Admission
Acute MI
Death
Nephrology Visit
1st Nephrology Visit
eGFR
Dialysis
Begins
PCP Visit
ED Visit
Hospital Admission
Acute MI
Death
Nephrology Visit
1st Nephrology Visit
eGFR
Missed Opportunities:
To prevent or delay
kidney failure
To prepare for kidney
failure
42%
starting dialysis have
no prior nephrology care
1
1
<10%with moderate CKD
<50%with severe CKD
even aware of illness!
12
Model for a single trajectory
1
Individual transient
deviations (GP)
Latent subpopulation
curve
Individual long-term
deviations
Curves
per
subtype
Schulam and Saria, NIPS 2015
Inducing Dependence
1
Dependence between mean functions for the P labs in 2 ways:
Long-term deviations are correlated
via multivariate normal:
Subtypes/clusters per lab are correlated
via mixture of multinomials:
…
…
Conditional likelihood factorizes across P labs:
Experimental Setup
6 variables of interest: eGFR, 5 other labs relevant to
CKD
Cohort of 44,000 patients at Duke with at least moderate
stage CKD (Stage 3+) and 5+ measurements for eGFR
For each test patient: use data before t to predict future
labs
Evaluation for each lab:
average MAE across test patients, in future time windows
Baseline: [Schulam & Saria, 2015] trained independently
Quantitative Results
1
7
Chronic Kidney Disease (CKD)
Heart disease Diabetes
Proposed Joint Model
Goal: Jointly model risk of future loss of kidney function
and cardiac events.
Hierarchical latent variable model captures dependencies
between disease trajectory and event risk
Conditional independence in joint likelihood:
where y are eGFR values at times t, u are event times, and x
are covariates
Point Process Submodel
Poisson process model with conditional likelihood on
events:
Rate function -- hazard funtion in Cox proportional
hazards model:
piecewise constant
baseline rate
coefficient
vector
baseline
covariates
association between
event risk and
expected mean/slope
of eGFR
random effect
(frailty term):
Data
23,450 patients with moderate stage CKD and 10+ eGFR
readings
CKD definition: 2 eGFR readings < 60mL/min, separated by 90+ days
Preprocessing: mean in monthly bins
eGFR only valid estimate of kidney function at steady state
22.9 readings on average (std. dev. 13.6, median 19.0)
Alignment: set t=0 to be first eGFR reading < 60mL/min
Adverse events: AMI, CVA identified using ICD9 codes. Max 1
event / month
13.4% had 1+ CVA code (mean w/ 1+: 4.1, std dev: 7.1, median: 2.0)
17.4% had 1+ AMI code (mean w/ 1+: 6.4, std dev: 13.3, median: 3.0)
Baseline covariates: baseline age, race, gender; hypertension,
diabetes
Joint Model Results
Sepsis
Previous Methods
Early Warning
Scores in
widespread
use
Duke uses
NEWS
Overly
simplistic
(Henry, Hager, Pronovost, Saria; Science Translational Medicine 2015) use
Cox regression to predict time to septic shock, using 54 potential features
Classify time series with sparse and irregular sampling, observed on
common time interval [0,T]
Observation times t, Observation values v, Reference times x
Represent a time series by its marginal posterior at x:
Use a GP regression with zero-mean, parameters (kernel, noise)
z is random, so use expected loss w.r.t GP posterior:
Learn GP parameters and classifier parameters end-to-end
Reparameterization trick to get gradients of the expectation
Approximate intractable expectations with a few Monte Carlo samples
Conjugate gradient to compute GP means and covariances
Lanczos method to efficiently draw samples from a (large) normal from the
GP (can backprop through this)
Gaussian Processes for
Irregularly Sampled Time
Series
(Cheng-Xian Li, Marlin NIPS 2016)
Similar setup, but now multivariate (M=31) time series, with
highly variable lengths (many < 12 hours, some > 10 days)
Goal: use clinical time series, baseline covariates, times of
medications to predict sepsis
New mean which depends on medications p
GP kernel now defines covariances amongst m:
RNN (LSTM with several layers) classifier
Sepsis with a Multitask GP RNN
Data
52,000 inpatient encounters for training (all data from 18
months). 14,000 for testing (6 months of future data)
31 longitudinal variables (6 vitals, 25 labs)
36 baseline covariates (29 comorbidity indicators; 6
demographic / admission info)
8 medication classes
Mean length of stay 121 hours (sd: 108)
Results
Surgery
Hierarchical Infinite Factor Model
Speeding up:
Stochastic
Gradient
Nose-Hoover
Thermostats
Sampling
Phone
Screen
EPIC
Clarity
Machine
Learning
RISK
PREDICTIO
N
(MODEL 1)
LOW
Data pulls
every 24
hours
Optimization
Interventions
Standard
Care
PAT*
INTERMEDI
ATE
Machine
Learning
RISK
PREDICTION
(MODEL 2)
POSH
***
POET
**
HIGH
Surgery
Scheduled
Preoperative
Assessment:
Phone/PAT/POET/
POSH
Preoperativ
e Care
* PAT: Pre-operative Anesthesia Testing
** POET: Peri-Operative Enhancement Team
*** POSH: Perioperative Optimization of Senior Health
Pre-operative Framework
Infectious Disease
Infection in a Social Network
Goal: To model dynamical interactions between agents in
a social network and apply to inferring the spread of
infection.
Many traditional epidemics models work on a population
level, treating each person the same way.
Contemporary data collection techniques allow us to
model the spread of infection on an individual level.
Being able to make infection predictions on an individual
level is enormously beneficial because it allows people to
receive more personalized and relevant health advice.
Social Evolution
Experiment
Data collected in the social evolution experiment allows us
for the first time to closely track proximities and contagion
in an entire community over a substantial period of time.
Tracked “common cold” symptoms in an MIT residence
hall from January to April 2009.
Monitored over 80% of residents through their cell phones
from October 2008 to May 2009, taking daily surveys and
tracking their location, proximities and phone calls.
Monthly surveys on social, health, and political issues
taken. Locations taken by having cell phones scan nearby
wifi access points and bluetooth devices.
Student Hall Network
Health Surveys
In the Social Evolution experiment students were paid $1 a day
to answer surveys about contracting infection.
The surveys asked about symptoms:
Runny nose, nasal congestion, sneezing
Nausea, vomiting, diarrhea
Stress
Sadness and depression
Fever
64 of the 85 residents answered the surveys
Symptoms dependent on the social network. A student with a
symptom had a 3-10x higher odds of having a friend with the
same symptom.
Hidden Markov Models
We aim to leverage the social evolution data to predict the
spread of infection to individuals in our social network
using a model related to HMMs.
Graph-coupled HMMs
Associate each person in the dynamic interaction network
with an HMM chain. Let interaction network structure
determine the HMM couplings:
GCHMM Inference
Inference in the coupled HMM is very hard.
Typically ML estimation is done on few chains with few states
or another approximation is made.
In the worst case GCHMM inference is as difficult as
CHMM inference.
When the graph is fully connected.
Fortunately, since we’re dealing with social networks we
can leverage a couple of properties:
Social networks are usually sparse
For many applications the influence of interactions can be
modeled in a fairly simple way via a small number of
parameters.
GCHMMs for Modeling Infection
In the case of the social evolution data the influence of
other HMMs can be summarized by counts of interactions
in the infectious state.
The GCHMM can provide an individual level version of the
susceptible-infectious-susceptible (SIS) epidemiology
model:
GCHMM for infection:
Experimental Results
Dong et al, UAI 2012
Aiello Group Data
eX-FLU study at University of Michigan
590 students from 6 dorms
Chain referral scheme
A 103 student subset participated in iEpi
Smartphone based study where location is tracked and surveys
taken
Unlike MIT study confirmation of interaction was recorded on phones
and flu testing was done on students who reported being ill.
Also an isolation intervention was tested.
Hierarchical GCHMMs
Add a hierarchical level to where beta distributed infection
parameters are learned:
Inference Gibbs-EM algorithm but follow up papers e.g. stochastic
VB
Results
Fan et al, KDD 2016
Multiple Sclerosis
Initial Focus
Chaotic symptoms
Twenty-two non-pathognomonic symptoms
Onset, duration, and severity do not clarify etiology
How can we monitor this continuously to start making
sense of it?
MS Mosaic App
Embedded Consent
Disease History Survey
Daily Symptom & Medication
Survey (customized)
Exportable Relapse Survey
Five Activities
MS Mosaic App
Passively collects HealthKit data
82 Different Data Types (>10 thought to
affect MS)
Tagged with date, time, and provenance
Single notification
Each day’s survey
will take no more
than 1 minute to
complete
Weekly tasks no
more than 5 minutes
Initial Analyses
Develop a sparse logistic regression model for
predicting the likelihood of each symptom experience
Incorporate a hierarchical layer based on Gaussian
processes for modeling time series data (e.g. sleep)
Discover hidden subpopulations within symptoms (using
clustering methodology, such as Dirichlet Process
mixture models)
Evaluate the efficacy of symptom interventions using
longitudinal models and clinical trials
Planned Dataset Evolution
MRI Sub-study
(Duke Pilot)
Symptom sub-study
v1.0 - 4.0
Road-mapped
“Omics” Sub-study
(Duke Pilot)
Thanks!
Joe Futoma
Sanjay Hariharan
Cara O’Brien
Blake Cameron
Mark Sendak
Armando Bedhoya
Liz Lorenzi
Erich Huang
Jeff Sun
Sandhya Lagoo
Mitch Heflin
Madhav Swaminathan
Ouwen Huang
Kai Fan
Allison Aiello
Sandy Pentland
Wen Dong
Sepsis
CKD
Surgery Influenza
Multiple Sclerosis: Lee Hartsell

More Related Content

Similar to HEART DISEASES PREDICTION USING MACHINE LEARNING ALGORITHM

Predicting Chronic Kidney Disease using Data Mining Techniques
Predicting Chronic Kidney Disease using Data Mining TechniquesPredicting Chronic Kidney Disease using Data Mining Techniques
Predicting Chronic Kidney Disease using Data Mining Techniquesijtsrd
 
Kshivets O. Cancer, Computer Sciences and Alive Supersystems
Kshivets O. Cancer, Computer Sciences and Alive SupersystemsKshivets O. Cancer, Computer Sciences and Alive Supersystems
Kshivets O. Cancer, Computer Sciences and Alive SupersystemsOleg Kshivets
 
Moving from Big Data to Better Models of Disease and Drug Response - Joel Dudley
Moving from Big Data to Better Models of Disease and Drug Response - Joel DudleyMoving from Big Data to Better Models of Disease and Drug Response - Joel Dudley
Moving from Big Data to Better Models of Disease and Drug Response - Joel DudleyCityAge
 
Predicting Diabetic Readmission Rates: Moving Beyond HbA1c
Predicting Diabetic Readmission Rates: Moving Beyond HbA1cPredicting Diabetic Readmission Rates: Moving Beyond HbA1c
Predicting Diabetic Readmission Rates: Moving Beyond HbA1cDamian R. Mingle, MBA
 
study of hematological paremeter in sepsis patients and its prognostic implic...
study of hematological paremeter in sepsis patients and its prognostic implic...study of hematological paremeter in sepsis patients and its prognostic implic...
study of hematological paremeter in sepsis patients and its prognostic implic...RahulGupta1687
 
The emerging picture of host genetic control of susceptibility and outcome in...
The emerging picture of host genetic control of susceptibility and outcome in...The emerging picture of host genetic control of susceptibility and outcome in...
The emerging picture of host genetic control of susceptibility and outcome in...Meningitis Research Foundation
 
Amia tb-review-13
Amia tb-review-13Amia tb-review-13
Amia tb-review-13Russ Altman
 
BMJ Open-2016-Conway Morris--2
BMJ Open-2016-Conway Morris--2BMJ Open-2016-Conway Morris--2
BMJ Open-2016-Conway Morris--2Tracey Mare
 
PREDICTIVE MODEL FOR LIKELIHOOD OF DETECTING CHRONIC KIDNEY FAILURE AND DISEA...
PREDICTIVE MODEL FOR LIKELIHOOD OF DETECTING CHRONIC KIDNEY FAILURE AND DISEA...PREDICTIVE MODEL FOR LIKELIHOOD OF DETECTING CHRONIC KIDNEY FAILURE AND DISEA...
PREDICTIVE MODEL FOR LIKELIHOOD OF DETECTING CHRONIC KIDNEY FAILURE AND DISEA...ijcsitcejournal
 
Repurposing large datasets for exposomic discovery in disease
Repurposing large datasets for exposomic discovery in diseaseRepurposing large datasets for exposomic discovery in disease
Repurposing large datasets for exposomic discovery in diseaseChirag Patel
 
Bertrand de Meulder-El impacto de las ciencias ómicas en la medicina, la nutr...
Bertrand de Meulder-El impacto de las ciencias ómicas en la medicina, la nutr...Bertrand de Meulder-El impacto de las ciencias ómicas en la medicina, la nutr...
Bertrand de Meulder-El impacto de las ciencias ómicas en la medicina, la nutr...Fundación Ramón Areces
 
Recent advances and challenges of digital mental healthcare
Recent advances and challenges of digital mental healthcareRecent advances and challenges of digital mental healthcare
Recent advances and challenges of digital mental healthcareYoon Sup Choi
 
Next generation sequencing
Next generation sequencingNext generation sequencing
Next generation sequencingIncedo
 
Impact Of Computer Software Appplication On Medication Therapy Adherence
Impact Of Computer Software Appplication On Medication Therapy AdherenceImpact Of Computer Software Appplication On Medication Therapy Adherence
Impact Of Computer Software Appplication On Medication Therapy AdherenceYayra Nyoagbe
 
ICBO 2014, October 8, 2014
ICBO 2014, October 8, 2014ICBO 2014, October 8, 2014
ICBO 2014, October 8, 2014Warren Kibbe
 
ppt1221[1][1].pptx
ppt1221[1][1].pptxppt1221[1][1].pptx
ppt1221[1][1].pptxAbebe334138
 
GENE IN FIBROMYALGIA SYNDROME
GENE IN FIBROMYALGIA SYNDROMEGENE IN FIBROMYALGIA SYNDROME
GENE IN FIBROMYALGIA SYNDROMEGhizal Fatima
 

Similar to HEART DISEASES PREDICTION USING MACHINE LEARNING ALGORITHM (20)

Predicting Chronic Kidney Disease using Data Mining Techniques
Predicting Chronic Kidney Disease using Data Mining TechniquesPredicting Chronic Kidney Disease using Data Mining Techniques
Predicting Chronic Kidney Disease using Data Mining Techniques
 
Kshivets O. Cancer, Computer Sciences and Alive Supersystems
Kshivets O. Cancer, Computer Sciences and Alive SupersystemsKshivets O. Cancer, Computer Sciences and Alive Supersystems
Kshivets O. Cancer, Computer Sciences and Alive Supersystems
 
Moving from Big Data to Better Models of Disease and Drug Response - Joel Dudley
Moving from Big Data to Better Models of Disease and Drug Response - Joel DudleyMoving from Big Data to Better Models of Disease and Drug Response - Joel Dudley
Moving from Big Data to Better Models of Disease and Drug Response - Joel Dudley
 
Predicting Diabetic Readmission Rates: Moving Beyond HbA1c
Predicting Diabetic Readmission Rates: Moving Beyond HbA1cPredicting Diabetic Readmission Rates: Moving Beyond HbA1c
Predicting Diabetic Readmission Rates: Moving Beyond HbA1c
 
study of hematological paremeter in sepsis patients and its prognostic implic...
study of hematological paremeter in sepsis patients and its prognostic implic...study of hematological paremeter in sepsis patients and its prognostic implic...
study of hematological paremeter in sepsis patients and its prognostic implic...
 
The emerging picture of host genetic control of susceptibility and outcome in...
The emerging picture of host genetic control of susceptibility and outcome in...The emerging picture of host genetic control of susceptibility and outcome in...
The emerging picture of host genetic control of susceptibility and outcome in...
 
Amia tb-review-13
Amia tb-review-13Amia tb-review-13
Amia tb-review-13
 
BMJ Open-2016-Conway Morris--2
BMJ Open-2016-Conway Morris--2BMJ Open-2016-Conway Morris--2
BMJ Open-2016-Conway Morris--2
 
PREDICTIVE MODEL FOR LIKELIHOOD OF DETECTING CHRONIC KIDNEY FAILURE AND DISEA...
PREDICTIVE MODEL FOR LIKELIHOOD OF DETECTING CHRONIC KIDNEY FAILURE AND DISEA...PREDICTIVE MODEL FOR LIKELIHOOD OF DETECTING CHRONIC KIDNEY FAILURE AND DISEA...
PREDICTIVE MODEL FOR LIKELIHOOD OF DETECTING CHRONIC KIDNEY FAILURE AND DISEA...
 
Repurposing large datasets for exposomic discovery in disease
Repurposing large datasets for exposomic discovery in diseaseRepurposing large datasets for exposomic discovery in disease
Repurposing large datasets for exposomic discovery in disease
 
Bertrand de Meulder-El impacto de las ciencias ómicas en la medicina, la nutr...
Bertrand de Meulder-El impacto de las ciencias ómicas en la medicina, la nutr...Bertrand de Meulder-El impacto de las ciencias ómicas en la medicina, la nutr...
Bertrand de Meulder-El impacto de las ciencias ómicas en la medicina, la nutr...
 
PMED Opening Workshop - Regrowth Rates of Tumors after Radiation Vary Dependi...
PMED Opening Workshop - Regrowth Rates of Tumors after Radiation Vary Dependi...PMED Opening Workshop - Regrowth Rates of Tumors after Radiation Vary Dependi...
PMED Opening Workshop - Regrowth Rates of Tumors after Radiation Vary Dependi...
 
Recent advances and challenges of digital mental healthcare
Recent advances and challenges of digital mental healthcareRecent advances and challenges of digital mental healthcare
Recent advances and challenges of digital mental healthcare
 
Dalton
DaltonDalton
Dalton
 
Dalton presentation
Dalton presentationDalton presentation
Dalton presentation
 
Next generation sequencing
Next generation sequencingNext generation sequencing
Next generation sequencing
 
Impact Of Computer Software Appplication On Medication Therapy Adherence
Impact Of Computer Software Appplication On Medication Therapy AdherenceImpact Of Computer Software Appplication On Medication Therapy Adherence
Impact Of Computer Software Appplication On Medication Therapy Adherence
 
ICBO 2014, October 8, 2014
ICBO 2014, October 8, 2014ICBO 2014, October 8, 2014
ICBO 2014, October 8, 2014
 
ppt1221[1][1].pptx
ppt1221[1][1].pptxppt1221[1][1].pptx
ppt1221[1][1].pptx
 
GENE IN FIBROMYALGIA SYNDROME
GENE IN FIBROMYALGIA SYNDROMEGENE IN FIBROMYALGIA SYNDROME
GENE IN FIBROMYALGIA SYNDROME
 

Recently uploaded

Bios of leading Astrologers & Researchers
Bios of leading Astrologers & ResearchersBios of leading Astrologers & Researchers
Bios of leading Astrologers & Researchersdarmandersingh4580
 
Seven tools of quality control.slideshare
Seven tools of quality control.slideshareSeven tools of quality control.slideshare
Seven tools of quality control.slideshareraiaryan448
 
obat aborsi Tarakan wa 081336238223 jual obat aborsi cytotec asli di Tarakan9...
obat aborsi Tarakan wa 081336238223 jual obat aborsi cytotec asli di Tarakan9...obat aborsi Tarakan wa 081336238223 jual obat aborsi cytotec asli di Tarakan9...
obat aborsi Tarakan wa 081336238223 jual obat aborsi cytotec asli di Tarakan9...yulianti213969
 
Formulas dax para power bI de microsoft.pdf
Formulas dax para power bI de microsoft.pdfFormulas dax para power bI de microsoft.pdf
Formulas dax para power bI de microsoft.pdfRobertoOcampo24
 
如何办理(UCLA毕业证书)加州大学洛杉矶分校毕业证成绩单学位证留信学历认证原件一样
如何办理(UCLA毕业证书)加州大学洛杉矶分校毕业证成绩单学位证留信学历认证原件一样如何办理(UCLA毕业证书)加州大学洛杉矶分校毕业证成绩单学位证留信学历认证原件一样
如何办理(UCLA毕业证书)加州大学洛杉矶分校毕业证成绩单学位证留信学历认证原件一样jk0tkvfv
 
Sensing the Future: Anomaly Detection and Event Prediction in Sensor Networks
Sensing the Future: Anomaly Detection and Event Prediction in Sensor NetworksSensing the Future: Anomaly Detection and Event Prediction in Sensor Networks
Sensing the Future: Anomaly Detection and Event Prediction in Sensor NetworksBoston Institute of Analytics
 
Data Analytics for Digital Marketing Lecture for Advanced Digital & Social Me...
Data Analytics for Digital Marketing Lecture for Advanced Digital & Social Me...Data Analytics for Digital Marketing Lecture for Advanced Digital & Social Me...
Data Analytics for Digital Marketing Lecture for Advanced Digital & Social Me...Valters Lauzums
 
Digital Marketing Demystified: Expert Tips from Samantha Rae Coolbeth
Digital Marketing Demystified: Expert Tips from Samantha Rae CoolbethDigital Marketing Demystified: Expert Tips from Samantha Rae Coolbeth
Digital Marketing Demystified: Expert Tips from Samantha Rae CoolbethSamantha Rae Coolbeth
 
Credit Card Fraud Detection: Safeguarding Transactions in the Digital Age
Credit Card Fraud Detection: Safeguarding Transactions in the Digital AgeCredit Card Fraud Detection: Safeguarding Transactions in the Digital Age
Credit Card Fraud Detection: Safeguarding Transactions in the Digital AgeBoston Institute of Analytics
 
Northern New England Tableau User Group (TUG) May 2024
Northern New England Tableau User Group (TUG) May 2024Northern New England Tableau User Group (TUG) May 2024
Northern New England Tableau User Group (TUG) May 2024patrickdtherriault
 
Statistics Informed Decisions Using Data 5th edition by Michael Sullivan solu...
Statistics Informed Decisions Using Data 5th edition by Michael Sullivan solu...Statistics Informed Decisions Using Data 5th edition by Michael Sullivan solu...
Statistics Informed Decisions Using Data 5th edition by Michael Sullivan solu...ssuserf63bd7
 
What is Insertion Sort. Its basic information
What is Insertion Sort. Its basic informationWhat is Insertion Sort. Its basic information
What is Insertion Sort. Its basic informationmuqadasqasim10
 
The Significance of Transliteration Enhancing
The Significance of Transliteration EnhancingThe Significance of Transliteration Enhancing
The Significance of Transliteration Enhancingmohamed Elzalabany
 
SCI8-Q4-MOD11.pdfwrwujrrjfaajerjrajrrarj
SCI8-Q4-MOD11.pdfwrwujrrjfaajerjrajrrarjSCI8-Q4-MOD11.pdfwrwujrrjfaajerjrajrrarj
SCI8-Q4-MOD11.pdfwrwujrrjfaajerjrajrrarjadimosmejiaslendon
 
Data Visualization Exploring and Explaining with Data 1st Edition by Camm sol...
Data Visualization Exploring and Explaining with Data 1st Edition by Camm sol...Data Visualization Exploring and Explaining with Data 1st Edition by Camm sol...
Data Visualization Exploring and Explaining with Data 1st Edition by Camm sol...ssuserf63bd7
 
原件一样伦敦国王学院毕业证成绩单留信学历认证
原件一样伦敦国王学院毕业证成绩单留信学历认证原件一样伦敦国王学院毕业证成绩单留信学历认证
原件一样伦敦国王学院毕业证成绩单留信学历认证pwgnohujw
 
如何办理哥伦比亚大学毕业证(Columbia毕业证)成绩单原版一比一
如何办理哥伦比亚大学毕业证(Columbia毕业证)成绩单原版一比一如何办理哥伦比亚大学毕业证(Columbia毕业证)成绩单原版一比一
如何办理哥伦比亚大学毕业证(Columbia毕业证)成绩单原版一比一fztigerwe
 
如何办理英国卡迪夫大学毕业证(Cardiff毕业证书)成绩单留信学历认证
如何办理英国卡迪夫大学毕业证(Cardiff毕业证书)成绩单留信学历认证如何办理英国卡迪夫大学毕业证(Cardiff毕业证书)成绩单留信学历认证
如何办理英国卡迪夫大学毕业证(Cardiff毕业证书)成绩单留信学历认证ju0dztxtn
 
社内勉強会資料_Object Recognition as Next Token Prediction
社内勉強会資料_Object Recognition as Next Token Prediction社内勉強会資料_Object Recognition as Next Token Prediction
社内勉強会資料_Object Recognition as Next Token PredictionNABLAS株式会社
 
如何办理(WashU毕业证书)圣路易斯华盛顿大学毕业证成绩单本科硕士学位证留信学历认证
如何办理(WashU毕业证书)圣路易斯华盛顿大学毕业证成绩单本科硕士学位证留信学历认证如何办理(WashU毕业证书)圣路易斯华盛顿大学毕业证成绩单本科硕士学位证留信学历认证
如何办理(WashU毕业证书)圣路易斯华盛顿大学毕业证成绩单本科硕士学位证留信学历认证acoha1
 

Recently uploaded (20)

Bios of leading Astrologers & Researchers
Bios of leading Astrologers & ResearchersBios of leading Astrologers & Researchers
Bios of leading Astrologers & Researchers
 
Seven tools of quality control.slideshare
Seven tools of quality control.slideshareSeven tools of quality control.slideshare
Seven tools of quality control.slideshare
 
obat aborsi Tarakan wa 081336238223 jual obat aborsi cytotec asli di Tarakan9...
obat aborsi Tarakan wa 081336238223 jual obat aborsi cytotec asli di Tarakan9...obat aborsi Tarakan wa 081336238223 jual obat aborsi cytotec asli di Tarakan9...
obat aborsi Tarakan wa 081336238223 jual obat aborsi cytotec asli di Tarakan9...
 
Formulas dax para power bI de microsoft.pdf
Formulas dax para power bI de microsoft.pdfFormulas dax para power bI de microsoft.pdf
Formulas dax para power bI de microsoft.pdf
 
如何办理(UCLA毕业证书)加州大学洛杉矶分校毕业证成绩单学位证留信学历认证原件一样
如何办理(UCLA毕业证书)加州大学洛杉矶分校毕业证成绩单学位证留信学历认证原件一样如何办理(UCLA毕业证书)加州大学洛杉矶分校毕业证成绩单学位证留信学历认证原件一样
如何办理(UCLA毕业证书)加州大学洛杉矶分校毕业证成绩单学位证留信学历认证原件一样
 
Sensing the Future: Anomaly Detection and Event Prediction in Sensor Networks
Sensing the Future: Anomaly Detection and Event Prediction in Sensor NetworksSensing the Future: Anomaly Detection and Event Prediction in Sensor Networks
Sensing the Future: Anomaly Detection and Event Prediction in Sensor Networks
 
Data Analytics for Digital Marketing Lecture for Advanced Digital & Social Me...
Data Analytics for Digital Marketing Lecture for Advanced Digital & Social Me...Data Analytics for Digital Marketing Lecture for Advanced Digital & Social Me...
Data Analytics for Digital Marketing Lecture for Advanced Digital & Social Me...
 
Digital Marketing Demystified: Expert Tips from Samantha Rae Coolbeth
Digital Marketing Demystified: Expert Tips from Samantha Rae CoolbethDigital Marketing Demystified: Expert Tips from Samantha Rae Coolbeth
Digital Marketing Demystified: Expert Tips from Samantha Rae Coolbeth
 
Credit Card Fraud Detection: Safeguarding Transactions in the Digital Age
Credit Card Fraud Detection: Safeguarding Transactions in the Digital AgeCredit Card Fraud Detection: Safeguarding Transactions in the Digital Age
Credit Card Fraud Detection: Safeguarding Transactions in the Digital Age
 
Northern New England Tableau User Group (TUG) May 2024
Northern New England Tableau User Group (TUG) May 2024Northern New England Tableau User Group (TUG) May 2024
Northern New England Tableau User Group (TUG) May 2024
 
Statistics Informed Decisions Using Data 5th edition by Michael Sullivan solu...
Statistics Informed Decisions Using Data 5th edition by Michael Sullivan solu...Statistics Informed Decisions Using Data 5th edition by Michael Sullivan solu...
Statistics Informed Decisions Using Data 5th edition by Michael Sullivan solu...
 
What is Insertion Sort. Its basic information
What is Insertion Sort. Its basic informationWhat is Insertion Sort. Its basic information
What is Insertion Sort. Its basic information
 
The Significance of Transliteration Enhancing
The Significance of Transliteration EnhancingThe Significance of Transliteration Enhancing
The Significance of Transliteration Enhancing
 
SCI8-Q4-MOD11.pdfwrwujrrjfaajerjrajrrarj
SCI8-Q4-MOD11.pdfwrwujrrjfaajerjrajrrarjSCI8-Q4-MOD11.pdfwrwujrrjfaajerjrajrrarj
SCI8-Q4-MOD11.pdfwrwujrrjfaajerjrajrrarj
 
Data Visualization Exploring and Explaining with Data 1st Edition by Camm sol...
Data Visualization Exploring and Explaining with Data 1st Edition by Camm sol...Data Visualization Exploring and Explaining with Data 1st Edition by Camm sol...
Data Visualization Exploring and Explaining with Data 1st Edition by Camm sol...
 
原件一样伦敦国王学院毕业证成绩单留信学历认证
原件一样伦敦国王学院毕业证成绩单留信学历认证原件一样伦敦国王学院毕业证成绩单留信学历认证
原件一样伦敦国王学院毕业证成绩单留信学历认证
 
如何办理哥伦比亚大学毕业证(Columbia毕业证)成绩单原版一比一
如何办理哥伦比亚大学毕业证(Columbia毕业证)成绩单原版一比一如何办理哥伦比亚大学毕业证(Columbia毕业证)成绩单原版一比一
如何办理哥伦比亚大学毕业证(Columbia毕业证)成绩单原版一比一
 
如何办理英国卡迪夫大学毕业证(Cardiff毕业证书)成绩单留信学历认证
如何办理英国卡迪夫大学毕业证(Cardiff毕业证书)成绩单留信学历认证如何办理英国卡迪夫大学毕业证(Cardiff毕业证书)成绩单留信学历认证
如何办理英国卡迪夫大学毕业证(Cardiff毕业证书)成绩单留信学历认证
 
社内勉強会資料_Object Recognition as Next Token Prediction
社内勉強会資料_Object Recognition as Next Token Prediction社内勉強会資料_Object Recognition as Next Token Prediction
社内勉強会資料_Object Recognition as Next Token Prediction
 
如何办理(WashU毕业证书)圣路易斯华盛顿大学毕业证成绩单本科硕士学位证留信学历认证
如何办理(WashU毕业证书)圣路易斯华盛顿大学毕业证成绩单本科硕士学位证留信学历认证如何办理(WashU毕业证书)圣路易斯华盛顿大学毕业证成绩单本科硕士学位证留信学历认证
如何办理(WashU毕业证书)圣路易斯华盛顿大学毕业证成绩单本科硕士学位证留信学历认证
 

HEART DISEASES PREDICTION USING MACHINE LEARNING ALGORITHM

  • 1. Machine Learning for Healthcare Data Katherine A. Heller Duke University
  • 2. Outline Electronic Health Records Gaussian Process-based Models for: Chronic Kidney Disease Sepsis National Surgery Quality Improvement Program The Basics of Predicting Surgical Complications Clustering Procedural Codes Transfer Learning Models Mobile apps Graph-coupled HMMs for Predicting the Spread of Influenza MS Mosaic
  • 4. PCP Visit ED Visit Hospital Admission Acute MI Death Nephrology Visit 1st Nephrology Visit eGFR Kidney Function (eGFR) Age
  • 5. PCP Visit ED Visit Hospital Admission Acute MI Death Nephrology Visit 1st Nephrology Visit eGFR Age 47 Untreated diabetes & high blood pressure. Normal kidney function, but with evidence of kidney damage. No regular medical care.
  • 6. PCP Visit ED Visit Hospital Admission Acute MI Death Nephrology Visit 1st Nephrology Visit eGFR Age 49 Kidney function now 50%
  • 7. PCP Visit ED Visit Hospital Admission Acute MI Death Nephrology Visit 1st Nephrology Visit eGFR Age 51 Referred to kidney specialist
  • 8. PCP Visit ED Visit Hospital Admission Acute MI Death Nephrology Visit 1st Nephrology Visit eGFR Three months later presents to ER with kidney failure symptoms and “crash starts” dialysis Dialysis Begins
  • 9. PCP Visit ED Visit Hospital Admission Acute MI Death Nephrology Visit 1st Nephrology Visit eGFR Dialysis Begins
  • 10. PCP Visit ED Visit Hospital Admission Acute MI Death Nephrology Visit 1st Nephrology Visit eGFR Missed Opportunities: To prevent or delay kidney failure To prepare for kidney failure
  • 11. 42% starting dialysis have no prior nephrology care 1 1
  • 12. <10%with moderate CKD <50%with severe CKD even aware of illness! 12
  • 13. Model for a single trajectory 1 Individual transient deviations (GP) Latent subpopulation curve Individual long-term deviations Curves per subtype Schulam and Saria, NIPS 2015
  • 14. Inducing Dependence 1 Dependence between mean functions for the P labs in 2 ways: Long-term deviations are correlated via multivariate normal: Subtypes/clusters per lab are correlated via mixture of multinomials: … … Conditional likelihood factorizes across P labs:
  • 15. Experimental Setup 6 variables of interest: eGFR, 5 other labs relevant to CKD Cohort of 44,000 patients at Duke with at least moderate stage CKD (Stage 3+) and 5+ measurements for eGFR For each test patient: use data before t to predict future labs Evaluation for each lab: average MAE across test patients, in future time windows Baseline: [Schulam & Saria, 2015] trained independently
  • 17. 1 7 Chronic Kidney Disease (CKD) Heart disease Diabetes
  • 18. Proposed Joint Model Goal: Jointly model risk of future loss of kidney function and cardiac events. Hierarchical latent variable model captures dependencies between disease trajectory and event risk Conditional independence in joint likelihood: where y are eGFR values at times t, u are event times, and x are covariates
  • 19. Point Process Submodel Poisson process model with conditional likelihood on events: Rate function -- hazard funtion in Cox proportional hazards model: piecewise constant baseline rate coefficient vector baseline covariates association between event risk and expected mean/slope of eGFR random effect (frailty term):
  • 20. Data 23,450 patients with moderate stage CKD and 10+ eGFR readings CKD definition: 2 eGFR readings < 60mL/min, separated by 90+ days Preprocessing: mean in monthly bins eGFR only valid estimate of kidney function at steady state 22.9 readings on average (std. dev. 13.6, median 19.0) Alignment: set t=0 to be first eGFR reading < 60mL/min Adverse events: AMI, CVA identified using ICD9 codes. Max 1 event / month 13.4% had 1+ CVA code (mean w/ 1+: 4.1, std dev: 7.1, median: 2.0) 17.4% had 1+ AMI code (mean w/ 1+: 6.4, std dev: 13.3, median: 3.0) Baseline covariates: baseline age, race, gender; hypertension, diabetes
  • 23. Previous Methods Early Warning Scores in widespread use Duke uses NEWS Overly simplistic (Henry, Hager, Pronovost, Saria; Science Translational Medicine 2015) use Cox regression to predict time to septic shock, using 54 potential features
  • 24. Classify time series with sparse and irregular sampling, observed on common time interval [0,T] Observation times t, Observation values v, Reference times x Represent a time series by its marginal posterior at x: Use a GP regression with zero-mean, parameters (kernel, noise) z is random, so use expected loss w.r.t GP posterior: Learn GP parameters and classifier parameters end-to-end Reparameterization trick to get gradients of the expectation Approximate intractable expectations with a few Monte Carlo samples Conjugate gradient to compute GP means and covariances Lanczos method to efficiently draw samples from a (large) normal from the GP (can backprop through this) Gaussian Processes for Irregularly Sampled Time Series (Cheng-Xian Li, Marlin NIPS 2016)
  • 25. Similar setup, but now multivariate (M=31) time series, with highly variable lengths (many < 12 hours, some > 10 days) Goal: use clinical time series, baseline covariates, times of medications to predict sepsis New mean which depends on medications p GP kernel now defines covariances amongst m: RNN (LSTM with several layers) classifier Sepsis with a Multitask GP RNN
  • 26.
  • 27.
  • 28. Data 52,000 inpatient encounters for training (all data from 18 months). 14,000 for testing (6 months of future data) 31 longitudinal variables (6 vitals, 25 labs) 36 baseline covariates (29 comorbidity indicators; 6 demographic / admission info) 8 medication classes Mean length of stay 121 hours (sd: 108)
  • 30.
  • 32.
  • 33.
  • 34.
  • 35.
  • 36.
  • 37.
  • 38.
  • 39.
  • 40.
  • 41.
  • 42.
  • 43. Hierarchical Infinite Factor Model Speeding up: Stochastic Gradient Nose-Hoover Thermostats Sampling
  • 44. Phone Screen EPIC Clarity Machine Learning RISK PREDICTIO N (MODEL 1) LOW Data pulls every 24 hours Optimization Interventions Standard Care PAT* INTERMEDI ATE Machine Learning RISK PREDICTION (MODEL 2) POSH *** POET ** HIGH Surgery Scheduled Preoperative Assessment: Phone/PAT/POET/ POSH Preoperativ e Care * PAT: Pre-operative Anesthesia Testing ** POET: Peri-Operative Enhancement Team *** POSH: Perioperative Optimization of Senior Health Pre-operative Framework
  • 46. Infection in a Social Network Goal: To model dynamical interactions between agents in a social network and apply to inferring the spread of infection. Many traditional epidemics models work on a population level, treating each person the same way. Contemporary data collection techniques allow us to model the spread of infection on an individual level. Being able to make infection predictions on an individual level is enormously beneficial because it allows people to receive more personalized and relevant health advice.
  • 47. Social Evolution Experiment Data collected in the social evolution experiment allows us for the first time to closely track proximities and contagion in an entire community over a substantial period of time. Tracked “common cold” symptoms in an MIT residence hall from January to April 2009. Monitored over 80% of residents through their cell phones from October 2008 to May 2009, taking daily surveys and tracking their location, proximities and phone calls. Monthly surveys on social, health, and political issues taken. Locations taken by having cell phones scan nearby wifi access points and bluetooth devices.
  • 49. Health Surveys In the Social Evolution experiment students were paid $1 a day to answer surveys about contracting infection. The surveys asked about symptoms: Runny nose, nasal congestion, sneezing Nausea, vomiting, diarrhea Stress Sadness and depression Fever 64 of the 85 residents answered the surveys Symptoms dependent on the social network. A student with a symptom had a 3-10x higher odds of having a friend with the same symptom.
  • 50. Hidden Markov Models We aim to leverage the social evolution data to predict the spread of infection to individuals in our social network using a model related to HMMs.
  • 51. Graph-coupled HMMs Associate each person in the dynamic interaction network with an HMM chain. Let interaction network structure determine the HMM couplings:
  • 52. GCHMM Inference Inference in the coupled HMM is very hard. Typically ML estimation is done on few chains with few states or another approximation is made. In the worst case GCHMM inference is as difficult as CHMM inference. When the graph is fully connected. Fortunately, since we’re dealing with social networks we can leverage a couple of properties: Social networks are usually sparse For many applications the influence of interactions can be modeled in a fairly simple way via a small number of parameters.
  • 53. GCHMMs for Modeling Infection In the case of the social evolution data the influence of other HMMs can be summarized by counts of interactions in the infectious state. The GCHMM can provide an individual level version of the susceptible-infectious-susceptible (SIS) epidemiology model: GCHMM for infection:
  • 55. Aiello Group Data eX-FLU study at University of Michigan 590 students from 6 dorms Chain referral scheme A 103 student subset participated in iEpi Smartphone based study where location is tracked and surveys taken Unlike MIT study confirmation of interaction was recorded on phones and flu testing was done on students who reported being ill. Also an isolation intervention was tested.
  • 56. Hierarchical GCHMMs Add a hierarchical level to where beta distributed infection parameters are learned: Inference Gibbs-EM algorithm but follow up papers e.g. stochastic VB
  • 57. Results Fan et al, KDD 2016
  • 59. Initial Focus Chaotic symptoms Twenty-two non-pathognomonic symptoms Onset, duration, and severity do not clarify etiology How can we monitor this continuously to start making sense of it?
  • 60. MS Mosaic App Embedded Consent Disease History Survey Daily Symptom & Medication Survey (customized) Exportable Relapse Survey Five Activities
  • 61. MS Mosaic App Passively collects HealthKit data 82 Different Data Types (>10 thought to affect MS) Tagged with date, time, and provenance
  • 62. Single notification Each day’s survey will take no more than 1 minute to complete Weekly tasks no more than 5 minutes
  • 63.
  • 64.
  • 65. Initial Analyses Develop a sparse logistic regression model for predicting the likelihood of each symptom experience Incorporate a hierarchical layer based on Gaussian processes for modeling time series data (e.g. sleep) Discover hidden subpopulations within symptoms (using clustering methodology, such as Dirichlet Process mixture models) Evaluate the efficacy of symptom interventions using longitudinal models and clinical trials
  • 66. Planned Dataset Evolution MRI Sub-study (Duke Pilot) Symptom sub-study v1.0 - 4.0 Road-mapped “Omics” Sub-study (Duke Pilot)
  • 67. Thanks! Joe Futoma Sanjay Hariharan Cara O’Brien Blake Cameron Mark Sendak Armando Bedhoya Liz Lorenzi Erich Huang Jeff Sun Sandhya Lagoo Mitch Heflin Madhav Swaminathan Ouwen Huang Kai Fan Allison Aiello Sandy Pentland Wen Dong Sepsis CKD Surgery Influenza Multiple Sclerosis: Lee Hartsell

Editor's Notes

  1. I’l start by telling you a story about Bob. Not his real name of course. But Bob is a real person. Bob’s percent kidney function is on the y axis 0-100. Bob’s age is on the x-axis starting at age 47 The legend over shows the different types of events and medical care Bob receives
  2. Bob first meets Duke health system about 15 years ago at age 47. Bob shows up at the ED about a half-dozen times, gets a battery of tests. We collect enough data on Bob to know that he has Uncontrolled diabetes and HTN He has normal kidney function but evidence of kidney damage
  3. After a couple of years and visits the ER his kidney function has fallen by half.
  4. He finally sees a kidney specialist after 5 consistent years of decline
  5. Three months after that visit Bob starts hemodialysis
  6. He spent 10 years on dialysis and died last year at the age of 63.
  7. Bob missed out. He didn’t get treatment that could have slowed his progression and prolonged his life. He didn’t get any warning that his life would be turned upside-down. Nothing to smooth his transition. But this isn’t just about Bob. This was a missed opportunity for every taxpayer now footing the bill for all this expensive medical care.
  8. So we end up in a situation where 42% of incident dialysis patients have no prior nephrology care. Some of those patients had no prior medical care at all.
  9. Where the majority of people with CKD don’t even know they have the disease. Of course, seeing a kidney specialist is not a guarantee of a smooth transition to dialysis. Even specialists struggle to predict how fast patients will progress.
  10. So we’re going to assume that given latent variable for person i, the conditional likelihood factorizes over the variables. So we first need to specify the model for a single trajectory. So we’re going to use a GP model, with a highly structured mean function. The first grey term is a population term that uses baseline covariates. The second red term is a clustering term, assuming that the trajectory for this person’s lab follows a common curve like those shown at bottom left. The green term allows for an individual to adjust this mean function, and the blue term is the explicit GP short-term variations. The figures at right show the effect of these terms to the predicted mean function
  11. So now, we’re going to specify priors that will allow us to learn correlations and induce dependence between the different labs. So we’ll assume that those long-term individual deviations b are correlated via MVN, and the cluster assignments z are correlated via a mixture of multinomials. The c_i is a global cluster assignment that tells which multinomial distribution to draw the corresponding z from
  12. To evaluate our approach we use a dataset 6 variables - kidney function + 5 labs relevant to CKD, for a cohort of 44k CKD patients at Duke. After training the model using stochastic variational inference, for each test patient we use all their labs up until a cutoff time to predict held-out labs in the future. We evaluate by taking the mean absolute error for each test patient in future time bins and averaging over test patients. As a baseline we use the model for a single trajectory from 2 slides ago, trained independently to each variable.
  13. This table shows the results. Each row shows error for each lab, for our method and the baseline; bold or lower is better. Columns show errors in future time windows. The t=1, for instance, indicates the results for using data up until t=1 for test patients to then make predictions for future labs between t=1 to 2, 2 to 4, 4 to 8, etc. Overall, our method performs better for those labs that tend to have higher degrees of missingness, because our method leverages correlations between labs while the comparator can only use baseline covariates.
  14. Our initial focus will be one of the most distressing issues to MSers: seemingly chaotic symptoms. Twenty-two non-specific non-pathognomonic symptoms are seen with MS, ranging from generalized fatigue and cognitive changes to focal sensory changes and walking instability. Each symptom or collection of symptoms can be affected by active relapse or recrudescence from a silent relapse, medication/toxic effects, temperature change, stress, sleep deprivation, or biorhythm. So we had to ask… How can we monitor this continuously to find the patterns in these fragmented symptoms?
  15. So we developed an app. The app walks them through the consent process and then obtains some preliminary information on disease history — Year and age at diagnosis, MS type, the diagnostic studies used to make, state of birth, # of relapses, family history, co-morbidities. The daily symptom and medication survey then prompts the user to report — Monitor all 22 common symptoms in multiple sclerosis — Monitor all current medications (pre-populated with 80 medications commonly used in MS and ability to add ones not pre-populated) — Relapse data — Five prompted tasks useful for MS monitoring (evaluate spatial memory; auditory processing, cognitive flexibility, and mental calculation; walking stability; rapid alternating movements and fatiguability; coordination)
  16. Data types that are of interest in multiple sclerosis (at least 13): BMI, Weight, Height, electrodermal activity, UV index, steps, workouts, dietary energy, biotin, vitamin D, body temperature, heart rate, sleep analysis
  17. Features immediately useful to participants
  18. Feedback through journal review (ability to supplement each day’s entry with free text), weekly check-in on their symptoms, and a provider report Provider Reports Summarize events between clinic visits Facilitates communication during clinical encounters Exportable PDF for EMR
  19. Lastly, we have the app’s learn section, where users have direct access to more than 16,000 pages of curated content
  20. — Develop a sparse logistic regression model for predicting whether patients are likely to experience a particular symptom in the future. — Incorporate a hierarchical layer, based on Gaussian Processes for modeling time series data, such as data from sleep or activity trackers. — Use clustering methodology, such as Dirichlet Process mixture models, to discover hidden subpopulations of MS patients. For example, a patient might experience fatigue because they’re not getting enough sleep, they’re depressed, or the underlying cause is their MS. — Evaluate the efficacy of symptom interventions and disease modifying drugs using longitudinal models and clinical trials.