Healthcare Predictive Analytics for Risk Profiling in Chronic Care: A Bayesian Multitask Learning Approach
1. Healthcare Predictive Analytics for Risk Profiling in Chronic Care: A Bayesian Multitask Learning Approach
Yu-Kai Lin (Florida State University)
Hsinchun Chen (University of Arizona)
Randall A. Brown (Southern Arizona VA Health Care System)
Shu-Hsing Li (National Taiwan University)
Hung-Jen Yang (Stanford University)
5/27/2017 Healthcare Predictive Analytics for Risk Profiling in Chronic Care
2. Background
3. How to improve chronic care?
Chronic Care Model (Bodenheimer et al. 2002)
• Community Resources and Policies
• Health Care Organization
• Self-management Support
• Delivery System Design
• Decision Support
• Clinical Information Systems
Chronic Disease Control (Brownson and Bright 2004)
• Data and science-driven decision-making
“Technovigilance” (Dixon-Woods et al. 2013)
• “If one consistent message has emerged from the literature on improving quality and safety in health care, it is that high-quality intelligence is indispensable.”
4. Healthcare analytics for clinical intelligence
Healthcare Analytics ⇔ Business Analytics in Healthcare
• Marketers vs. Clinicians (Fichman et al. 2011)
− Marketers:
◦ Consumer profiling for targeted marketing
◦ How likely a particular consumer is to click an ad link, download an app, respond to a coupon, …
− Clinicians:
◦ Patient profiling for personalized care
◦ How likely a particular patient is to develop a complication, experience an adverse medical event, respond to a treatment, …
5. Health analytics using EHR data
• Healthcare predictive analytics using electronic health records (EHRs) is a promising IS research direction
• Fichman, Kohli & Krishnan (ISR 2011: Healthcare IS)
“Using digital technology to enable new kinds of mathematical healthcare modeling … and how they should be integrated with electronic health records warrants future research attention.”
• Chen, Chiang & Storey (MISQ 2012: BI & Analytics)
“Over the past decade, electronic health records have been widely adopted in hospitals and clinics worldwide. Significant clinical knowledge and a deeper understanding of patient disease patterns can be gleaned from such collections.”
6. Research motivation
• Chronic Diseases (CDs) in the US
− Half of all adults (117 million people) had one or more CDs
− 86% of the nation’s health care costs are for treating CDs
− Seven of the top 10 causes of death in 2010 were CDs
− Maps on the prevalence of diagnosed diabetes in 1994, 2000, and 2014
[Figure: US maps of diagnosed diabetes prevalence; legend: <4.5%, 4.5%–5.9%, 6.0%–7.4%, 7.5%–8.9%, >9.0%]
Data source: https://www.cdc.gov/chronicdisease/
7. Research motivation (cont.)
• Patients with chronic diseases are often at risk for multiple complications
− Diabetes → stroke, heart attack, kidney failure, eye problems, and so on
• Surprisingly, almost all existing clinical risk models focus on only a single outcome.
− Diabetes → stroke
− Diabetes → heart attack
− Diabetes → kidney diseases
8. Single-Task Learning (STL)
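• In the usual approach, each event is modeled independently:
Predict Event 1: $\operatorname{logit}(p_1) = \alpha_1 + \mathbf{x}^\top \boldsymbol{\beta}^{(1)}$
Predict Event 2: $\operatorname{logit}(p_2) = \alpha_2 + \mathbf{x}^\top \boldsymbol{\beta}^{(2)}$
…
Predict Event K: $\operatorname{logit}(p_K) = \alpha_K + \mathbf{x}^\top \boldsymbol{\beta}^{(K)}$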
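The per-event models above can be sketched in Python as follows. This is a minimal illustration, not the authors' code: `fit_logit`, `fit_stl`, the plain gradient-descent fitting, and the synthetic data are hypothetical choices made for brevity. The point is structural: each column of `Y` (one event) gets its own intercept and coefficients, and nothing is shared across the K fits.

```python
import numpy as np

def fit_logit(X, y, lr=0.1, n_iter=2000):
    """Fit logit(p) = alpha + X @ beta for one event by gradient descent on the mean log-loss."""
    n, n_features = X.shape
    alpha, beta = 0.0, np.zeros(n_features)
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-(alpha + X @ beta)))  # predicted event probability
        err = p - y                                     # d(log-loss)/d(linear predictor)
        alpha -= lr * err.mean()
        beta -= lr * (X.T @ err) / n
    return alpha, beta

def fit_stl(X, Y):
    """Single-task learning: one independent model per event (one column of Y per event)."""
    return [fit_logit(X, Y[:, k]) for k in range(Y.shape[1])]

# Hypothetical synthetic data: 2 predictors, K = 3 events
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 2))
B_true = np.array([[2.0, 1.0, -1.5],
                   [0.0, 0.5, 1.0]])      # column k holds the true coefficients of event k
Y = (rng.random((500, 3)) < 1.0 / (1.0 + np.exp(-(X @ B_true)))).astype(float)
models = fit_stl(X, Y)                    # three fits, no information shared between them
```

Each entry of `models` is an `(alpha, beta)` pair for one event; a multitask method would instead tie the three `beta` vectors together during training.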
9. STL is fine, but…
• If the outcomes are related, the models are perhaps
related (that is, coefficients/parameters are related).
• If the models are related, a model can perhaps
“borrow” information from the other models.
Spillover effect in model training
10. Multitask Learning (MTL)
[Figure: Single-Task Learning vs. Multitask Learning. In single-task learning, the data of each task (Task 1, Task 2, …, Task K) go through a separate training step that produces a separate trained model. In multitask learning, the data of all K tasks feed one joint training step that produces the K trained models.]
11. Research questions & relevance
• We are interested in studying:
− What are a patient’s risks of an array of events?
− How can we model multiple risks simultaneously?
− Does simultaneous learning of multiple event risks improve the predictive performance for each event?
• Relevance to Information Systems (IS) research
− Healthcare IS (Fichman et al. 2011; Bardhan et al. 2014)
− Predictive analytics (Shmueli and Koppius 2011)
− Design science (Hevner et al. 2004; Gregor and Hevner 2013)
12. Model Development
13. Bayesian Multitask Learning (BMTL) intuition
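$\Omega_1, \ldots, \Omega_J$ are $K \times K$ correlation matrices.
[Figure: for each predictor j = 1, …, J, the task-specific coefficients $\beta_j^{(1)}$ (Task 1), $\beta_j^{(2)}$ (Task 2), …, $\beta_j^{(K)}$ (Task K) are linked through the correlation matrix $\Omega_j$.]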
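A minimal sketch of the "borrowing" idea, assuming a single Gaussian prior with a fixed, hypothetical correlation matrix rather than the full hierarchical BMTL posterior: if predictor j's coefficients for two tasks are jointly normal with correlation ρ, knowing the Task 1 coefficient shifts and narrows the distribution of the Task 2 coefficient. `conditional_normal` is an illustrative helper that applies the standard conditional-MVN formulas.

```python
import numpy as np

def conditional_normal(mu, cov, idx_obs, x_obs):
    """Mean and covariance of the unobserved components of an MVN given the observed ones."""
    idx_obs = np.asarray(idx_obs)
    idx_hid = np.setdiff1d(np.arange(len(mu)), idx_obs)
    S_oo = cov[np.ix_(idx_obs, idx_obs)]
    S_ho = cov[np.ix_(idx_hid, idx_obs)]
    S_hh = cov[np.ix_(idx_hid, idx_hid)]
    gain = np.linalg.solve(S_oo, S_ho.T).T          # S_ho @ inv(S_oo), S_oo is symmetric
    mean = mu[idx_hid] + gain @ (x_obs - mu[idx_obs])
    covariance = S_hh - gain @ S_ho.T
    return mean, covariance

# Hypothetical 2-task example: correlation 0.6 between beta_j^(1) and beta_j^(2)
omega = np.array([[1.0, 0.6],
                  [0.6, 1.0]])
sigma = np.array([1.0, 1.0])
cov = np.diag(sigma) @ omega @ np.diag(sigma)
mean2, cov2 = conditional_normal(np.zeros(2), cov, [0], np.array([1.0]))
```

With ρ = 0.6 and unit scales, observing $\beta_j^{(1)} = 1$ moves the expected $\beta_j^{(2)}$ from 0 to 0.6 and shrinks its variance from 1 to about 0.64; with ρ = 0 nothing would be borrowed.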
14. Bayesian Analysis
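Bayes Rule:
$p(\theta \mid D) = \dfrac{p(\theta, D)}{p(D)} = \dfrac{p(D \mid \theta)\, p(\theta)}{p(D)}, \qquad p(\theta \mid D) \propto p(D \mid \theta)\, p(\theta)$
• $\theta$ is a random variable of interest; $D$ is observed data
• We have a prior subjective belief about $\theta$, $p(\theta)$
[as part of the model specification]
• We update our prior belief with the data to form posterior beliefs about $\theta$, $p(\theta \mid D)$
[as a result of model fitting]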
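Bayes rule can be checked numerically on a grid. The sketch assumes a single event probability θ, a flat prior, and hypothetical data D of 3 events among 20 patients; none of these come from the study. Normalising the product of likelihood and prior plays the role of dividing by p(D), and the result matches the analytic Beta(4, 18) posterior, whose mean is 4/22 ≈ 0.182.

```python
import numpy as np

def grid_posterior(prior, likelihood):
    """Bayes rule on a grid: p(theta | D) is proportional to p(D | theta) * p(theta)."""
    unnormalised = likelihood * prior
    return unnormalised / unnormalised.sum()    # dividing by the sum stands in for p(D)

theta = np.linspace(0.0, 1.0, 10001)            # grid of candidate event probabilities
prior = np.ones_like(theta)                     # flat prior p(theta)
events, n = 3, 20                               # hypothetical observed data D
likelihood = theta**events * (1.0 - theta)**(n - events)
posterior = grid_posterior(prior, likelihood)
posterior_mean = float((theta * posterior).sum())
```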
15. Model Spec.
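Parameter  Distribution/Function Form and Supporting Reference
$\alpha_k$  Cauchy distribution (Gelman et al. 2008):
$\alpha_k \sim \mathrm{Cauchy}(0, 10)$
$\boldsymbol{\beta}_j$  Multivariate normal (MVN) distribution with the horseshoe prior (Carvalho et al. 2010):
$\boldsymbol{\beta}_j = (\beta_j^{(1)}, \beta_j^{(2)}, \ldots, \beta_j^{(K)})^\top$, $\boldsymbol{\beta}_j \sim \mathrm{MVN}(\mathbf{0}, \tau_j \Sigma)$
$\tau_j$  Horseshoe prior (Carvalho et al. 2010):
$\tau_j = r_j \psi$; $r_j, \psi \sim \text{Half-Cauchy}(0, 1)$
$\Sigma$  Covariance matrix (Barnard et al. 2000):
$\Sigma = \mathrm{diag}(\boldsymbol{\sigma})\, \Omega\, \mathrm{diag}(\boldsymbol{\sigma})$, $\boldsymbol{\sigma} = (\sigma_1, \sigma_2, \ldots, \sigma_K)^\top$
$\sigma_k$  Half-Cauchy distribution (Gelman et al. 2008):
$\sigma_k \sim \text{Half-Cauchy}(0, 2.5)$
$\Omega$  Lewandowski, Kurowicka and Joe (LKJ, 2009):
$\Omega \sim \mathrm{LKJ}(2, 1)$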
Note for the table: The index j ranges from 1 to J
(the total number of predictors), and the index k
ranges from 1 to K (the total number of tasks).
[Figure: plate diagram of the BMTL model with nodes σ, Ω, Σ, β, τ, r, ψ, θ, and α, and plates indexed by J, K, and N]
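To make the specification concrete, the sketch below draws one set of parameters from the prior with NumPy. Assumptions: Ω is fixed to a hypothetical 3 × 3 correlation matrix instead of being drawn from LKJ(2, 1); τ_j scales the covariance as in $\mathrm{MVN}(\mathbf{0}, \tau_j \Sigma)$; and θ_k is taken to be the logistic transform of $\alpha_k + \mathbf{x}^\top \boldsymbol{\beta}^{(k)}$. Function names are illustrative, and the study fitted the model with NUTS rather than sampling the prior like this.

```python
import numpy as np

def half_cauchy(rng, scale, size=None):
    """Half-Cauchy(0, scale) draws: absolute values of Cauchy(0, scale) draws."""
    return np.abs(scale * rng.standard_cauchy(size))

def sample_prior(rng, J, K, omega):
    """Draw (alpha, beta) from the BMTL prior, holding Omega fixed (assumption)."""
    alpha = 10.0 * rng.standard_cauchy(K)              # alpha_k ~ Cauchy(0, 10)
    sigma = half_cauchy(rng, 2.5, K)                   # sigma_k ~ Half-Cauchy(0, 2.5)
    L = np.diag(sigma) @ np.linalg.cholesky(omega)     # L @ L.T = diag(sigma) Omega diag(sigma) = Sigma
    psi = half_cauchy(rng, 1.0)                        # global scale psi
    r = half_cauchy(rng, 1.0, J)                       # local scales r_j
    tau = r * psi                                      # tau_j = r_j * psi
    z = rng.standard_normal((J, K))
    beta = np.sqrt(tau)[:, None] * (z @ L.T)           # row j ~ MVN(0, tau_j * Sigma)
    return alpha, beta                                 # beta[j, k] plays the role of beta_j^(k)

def task_probabilities(x, alpha, beta):
    """theta_k = logistic(alpha_k + x^T beta^(k)), computed in a numerically stable way."""
    eta = alpha + x @ beta
    return np.exp(-np.logaddexp(0.0, -eta))

rng = np.random.default_rng(42)
omega = np.array([[1.0, 0.3, 0.2],                     # hypothetical task correlations
                  [0.3, 1.0, 0.4],
                  [0.2, 0.4, 1.0]])
alpha, beta = sample_prior(rng, J=4, K=3, omega=omega)
theta = task_probabilities(rng.normal(size=4), alpha, beta)
```

Because the Cauchy and half-Cauchy priors are heavy-tailed, individual prior draws can give probabilities very close to 0 or 1; that is expected for weakly informative priors and says nothing about the fitted model.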
16. Model fitting
• No-U-Turn Sampler (NUTS; Hoffman and Gelman 2014)
− A variant of Hamiltonian Monte Carlo (HMC)
− Adaptively sets the algorithmic parameters in HMC
• 2 Markov chains, 1000 warm-up draws, 1000 sampling draws
• For each parameter, convergence was assessed using Gelman and Rubin’s (1992) diagnostic, a.k.a. the $\hat{R}$ statistic, with a threshold of $\hat{R} < 1.2$.
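The convergence check can be sketched with the classic between-chain/within-chain form of Gelman and Rubin's statistic. This is an illustrative implementation; current Stan releases report a split, rank-normalised variant, so values can differ slightly from what a fitting tool prints, and the slides do not say which variant was used.

```python
import numpy as np

def gelman_rubin_rhat(chains):
    """Classic potential scale reduction factor (Gelman and Rubin 1992).

    chains: array of shape (m, n), i.e. n post-warm-up draws from each of m chains.
    """
    chains = np.asarray(chains, dtype=float)
    _, n = chains.shape
    B = n * chains.mean(axis=1).var(ddof=1)    # between-chain variance
    W = chains.var(axis=1, ddof=1).mean()      # average within-chain variance
    var_hat = (n - 1) / n * W + B / n          # pooled estimate of the posterior variance
    return float(np.sqrt(var_hat / W))

rng = np.random.default_rng(1)
good = rng.normal(size=(2, 1000))              # 2 chains x 1000 draws, as in the slides
bad = good + np.array([[0.0], [5.0]])          # second chain stuck in a different region
```

Two well-mixed chains give a value close to 1, while shifting one chain by 5 standard deviations pushes $\hat{R}$ far above the 1.2 threshold.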
17. Evaluations
18. Illustration of experiment design
[Figure: a patient timeline of visits v1 to v8; legend: sampled visit (denoted by v0i), event occurrence]
Step 1: Randomly sample a visit from the first half of the patient’s medical history; the sampled visit is denoted by v0i.
Step 2: Use information available at and before the sampled visit for learning or prediction.
Step 3: Learn and predict whether an event will occur within the next w years.
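The three steps can be sketched as small Python helpers. Assumptions not stated in the slides: visit and event times are given in years, "first half" means the first half of the visits by count, and an event counts toward window w if it occurs strictly after v0i and no later than v0i + w years. All function names and the example patient are hypothetical.

```python
import random

def sample_index_visit(visit_times, rng):
    """Step 1: pick v0i at random from the first half of the patient's visits (by count)."""
    first_half = visit_times[: max(1, len(visit_times) // 2)]
    return rng.choice(first_half)

def history_up_to(visit_times, v0):
    """Step 2: keep only visits at or before the sampled visit (no look-ahead)."""
    return [t for t in visit_times if t <= v0]

def label_within_window(v0, event_time, w):
    """Step 3: 1 if the event occurs after v0 and no later than v0 + w years, else 0."""
    return int(event_time is not None and v0 < event_time <= v0 + w)

# Hypothetical patient with 8 visits (times in years) and an event at year 3.0
visits = [0.0, 0.5, 1.2, 2.0, 3.1, 4.0, 4.5, 6.0]
rng = random.Random(0)
v0 = sample_index_visit(visits, rng)
features_from = history_up_to(visits, v0)
labels = {w: label_within_window(v0, 3.0, w) for w in range(1, 6)}
```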
19. Summary of data
• 14,752 adult patients with type 2 diabetes
• Events / Complications
− stroke (henceforth denoted by STK), acute myocardial infarction (AMI; aka heart attack), and acute renal failure (ARF)
− Patients who had all three events before v0i are excluded
          Before   During v0i and v0i + w years              After
Event     v0i      w = 1   w = 2   w = 3   w = 4   w = 5    v0i + 5 years
STK       1507     354     560     685     793     828      47
AMI       485      75      146     178     210     225      20
ARF       410      217     399     488     536     571      37
20. Variables in our analysis
• 179 variables in total
− Missing values are imputed using the column mean
• Examples
Category              Example Variables
Patient information   Age, body weight, male, smoking
Diagnoses             Three-digit ICD-9 codes, e.g., 401 for essential hypertension and 427 for cardiac dysrhythmias
Treatments            Aspirin, clopidogrel, insulin, isoket, metformin
Labs and exams        CT scan, low-density lipoprotein cholesterol, serum creatinine, systolic blood pressure
Note: ICD-9=International Classification of Diseases, Ninth Revision
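Column-mean imputation, as described above, replaces each missing entry with the mean of the observed values in its column. A minimal NumPy sketch (the function name is hypothetical):

```python
import numpy as np

def impute_column_mean(X):
    """Replace each NaN with the mean of the observed values in its column."""
    X = np.array(X, dtype=float)             # work on a copy of the input
    col_means = np.nanmean(X, axis=0)        # column means ignoring missing values
    rows, cols = np.nonzero(np.isnan(X))
    X[rows, cols] = col_means[cols]
    return X

X_raw = [[1.0, np.nan],
         [3.0, 4.0],
         [np.nan, 8.0]]
X_filled = impute_column_mean(X_raw)         # missing cells become 2.0 and 6.0
```

In a cross-validation setting, a common precaution (not stated in the slides) is to compute the column means on the training folds only and reuse them on the test fold.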
21. Three sets of evaluations
1. BMTL vs. single task learning approaches
2. BMTL vs. other multitask learning approaches
3. Counterfactual analysis of practical use
22. Evaluations 1 and 2
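• 10-fold cross validation
• Area under the ROC curve (AUC)
− 0.5 indicates a worthless (random-guessing) model and 1 a perfect model
− The DeLong test for comparing AUCs (DeLong et al. 1988)
[Figures: 10-fold cross-validation, in which the original data are split into Fold 1, Fold 2, Fold 3, …, Fold 10, each serving in turn as testing data while the others serve as training data; an ROC curve plotting true positive rate against false positive rate]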
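A minimal sketch of the metric and the fold split, assuming NumPy only: `auc` computes AUC as the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case (ties count one half), which equals the area under the ROC curve; `kfold_indices` shuffles the cases and splits them into 10 test folds. Both names are hypothetical, and the DeLong test itself is not implemented here.

```python
import numpy as np

def auc(y_true, scores):
    """AUC via pairwise comparison of positive and negative scores (Mann-Whitney form)."""
    y_true = np.asarray(y_true, dtype=bool)
    scores = np.asarray(scores, dtype=float)
    pos, neg = scores[y_true], scores[~y_true]
    greater = (pos[:, None] > neg[None, :]).sum()   # positive ranked above negative
    ties = (pos[:, None] == neg[None, :]).sum()     # tied pairs count one half
    return (greater + 0.5 * ties) / (len(pos) * len(neg))

def kfold_indices(n, k=10, seed=0):
    """Shuffle the indices 0..n-1 and split them into k roughly equal test folds."""
    idx = np.random.default_rng(seed).permutation(n)
    return np.array_split(idx, k)

example_auc = auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])   # 3 of 4 pairs ordered correctly
folds = kfold_indices(25)
```

For each fold, the model is trained on the other nine folds and scored on the held-out fold; the AUC is then computed from the held-out predictions.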
23. Evaluation 1 (AUC; 10-fold CV)
BMTL vs. STL approaches
Window (w)  Task (k)  BMTL-Logit  B-Logit  Logit  Logit-lasso
1 STK 0.747 0.725*** 0.723*** 0.735***
1 AMI 0.778 0.744*** 0.729*** 0.758**
1 ARF 0.863 0.855* 0.847** 0.849***
3 STK 0.742 0.724*** 0.722*** 0.728***
3 AMI 0.736 0.703*** 0.699*** 0.704***
3 ARF 0.833 0.823*** 0.819*** 0.823***
5 STK 0.739 0.724*** 0.723*** 0.727***
5 AMI 0.727 0.699*** 0.698*** 0.704***
5 ARF 0.820 0.812*** 0.809*** 0.814***
Note. BMTL-Logit has the best AUC result in every row.
*** The AUC result is statistically significantly different from BMTL-Logit at α = 0.01.
** The AUC result is statistically significantly different from BMTL-Logit at α = 0.05.
* The AUC result is statistically significantly different from BMTL-Logit at α = 0.1.
24. Evaluation 2 (AUC; 10-fold CV)
BMTL vs. other MTL approaches
Window (w)  Task (k)  BMTL-Logit (this study)  MTL-Logit (Huang et al. 2012)  MTL-Tree (Simm et al. 2014)  MTL-ANN (Caruana 1997)
1 STK 0.747 0.746 0.717** 0.660***
1 AMI 0.778 0.767* 0.737** 0.686**
1 ARF 0.863 0.849* 0.831*** 0.650***
3 STK 0.742 0.730** 0.702*** 0.677***
3 AMI 0.736 0.693*** 0.727* 0.680***
3 ARF 0.833 0.816*** 0.787*** 0.763***
5 STK 0.739 0.719*** 0.686*** 0.670***
5 AMI 0.727 0.705** 0.692** 0.653***
5 ARF 0.820 0.809*** 0.770*** 0.703***
Note. BMTL-Logit has the best AUC result in every row.
*** The AUC result is statistically significantly different from BMTL-Logit at α = 0.01.
** The AUC result is statistically significantly different from BMTL-Logit at α = 0.05.
* The AUC result is statistically significantly different from BMTL-Logit at α = 0.1.
25. Evaluation 3
Counterfactual analysis
• Prediction of risk is not enough—we need evidence that
prediction can lead to actions that reduce risk beyond what
would occur without the prediction rule. (Grady and Berkowitz 2011)
− How can we assess the practical value of a predictive model without actually deploying it?
− Assumption for our counterfactual analysis:
Physicians will always provide guideline-recommended
preventive interventions if they believe a patient has a high
risk of STK/AMI/ARF.
− Among the positive cases (patients with an STK/AMI/ARF event between v0i and v0i + 5 years), we compare what actually happened to them with what could have happened given a prediction rule.
26. Evaluation 3
Counterfactual analysis
• Guideline-recommended preventive treatments
− Source: “Diabetes Comprehensive Care Plan Guidelines” from
the American Association of Clinical Endocrinologists
Comorbidity Preventive Treatment
STK • Antihypertensive agents
• Antiplatelet therapy
AMI • Antihypertensive agents
• Antiplatelet therapy
• Lipid lowering therapy
ARF • Antihypertensive agents
• Angiotensin receptor blockers
• Angiotensin-converting enzyme inhibitors
27. Evaluation 3
Counterfactual analysis
• Among the positive cases, we are interested in the proportions who
− actually received preventive interventions at or before v0i
− could potentially have received preventive interventions at v0i, given model predictions
• Practically useful models: small c and large d
                                                 Predicted risk (from some model)
                                                 Low      High
Preventive treatment prescribed at/before v0i
  Yes                                            a        b
  No                                             c        d
High/low risk cutoff level: 2% per year (10% over 5 years) (Dhamoon and Elkind 2010)
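The table can be computed for a set of positive cases with a short helper. Assumptions: `treated` marks whether a preventive treatment was prescribed at or before v0i, `predicted_risk` is a model's 5-year risk at v0i, and a case counts as high risk when its predicted risk is at or above the cutoff. The slides give 10% over 5 years; compounding 2% per year gives 1 − 0.98^5 ≈ 0.096, so the two readings are close. The function name and example cases are hypothetical.

```python
import numpy as np

def counterfactual_table(treated, predicted_risk, cutoff):
    """Return the (a, b, c, d) cell counts of the treatment-by-predicted-risk table."""
    treated = np.asarray(treated, dtype=bool)
    high = np.asarray(predicted_risk, dtype=float) >= cutoff
    a = int(np.sum(treated & ~high))     # treated, predicted low risk
    b = int(np.sum(treated & high))      # treated, predicted high risk
    c = int(np.sum(~treated & ~high))    # untreated, predicted low risk
    d = int(np.sum(~treated & high))     # untreated, predicted high risk
    return a, b, c, d

annual = 0.02
compounded = 1.0 - (1.0 - annual) ** 5   # about 0.096
cutoff = 0.10                            # the slides' 10% over 5 years

# Hypothetical positive cases
treated = [True, True, False, False, False]
risk = [0.05, 0.20, 0.03, 0.15, 0.50]
a, b, c, d = counterfactual_table(treated, risk, cutoff)
```

Here c counts untreated cases that the model also rated low risk (missed opportunities), and d counts untreated cases the model flags as high risk, where a prediction could have prompted guideline-recommended treatment.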
28. Evaluation 3
Summary of results
A lower c (making fewer mistakes) is better
A higher d (supporting physicians) is better
29. Conclusions
30. Conclusions
• The proposed BMTL approach outperforms the alternative models in risk profiling and could help physicians identify high-risk patients.
• Multitask learning improves overall learning
performance by sharing information across models
− Evidence for the spillover effect in model training
• Beyond healthcare
31. Practical implications
Risk profiling in chronic care
• Hospitals: new healthcare delivery models
− Accountable care organizations; bundled payments
• Physicians: decision support at the point of care
− To err is human
• Patients: healthcare spending and # of conditions
− Medical Expenditure Panel Survey
32. To err is human
• Building a better health system with IT and analytics
33. Thank you
Please send comments to Yu-Kai Lin
ylin@business.fsu.edu