Introduction to Healthcare Data Analytics with Extreme Tree Models
Yubin Park, PhD
Chief Technology Officer
Who am I
• Co-founder and Chief Technology Officer of Accordion Health, Inc.
• PhD from the University of Texas at Austin
  • Advisor: Professor Joydeep Ghosh
  • Studied Machine Learning and Data Mining, with a special focus on healthcare data
• Involved in various industry data mining projects
  • USAA: Lifetime modeling of customers
  • SK Telecom: Smartphone purchase prediction, usage pattern analysis
  • LinkedIn Corp.: Related search keyword recommendation
  • Whole Foods Market: Price elasticity modeling
  • …
Accordion Health
• Healthcare Data Analytics Company
• Founded in 2014 by
  • Sriram Vishwanath, PhD
  • Yubin Park, PhD
  • Joyce Ho, PhD
• A team of data scientists and medical professionals
• Helps healthcare organizations lower costs and improve quality
From Health Datapalooza 2014
Types of Problems We Solve
• Which patient is likely to be readmitted?
• Which patient is likely to develop type 2 diabetes?
• Which patient is likely to adhere to their medication?
• How much will this patient cost this year?
• How many inpatient admissions will this patient have this year?
• Which physician is likely to follow our care guideline?
• What star rating will our organization receive this year?
• …
Healthcare Data is Messy
• Data structure
  • Unstructured data such as EHR
  • Structured data such as claims
• Location
  • Doctors' offices, insurance companies, governments, etc.
• Data definition
  • Different definitions for different communities
• Data format
  • Various industry formats
• Data complexity
  • Patients going in and out of systems
  • Incomplete data
  • Regulations & requirements
• Source: Health Catalyst
My Usual Workflow
Summary Statistics → Visual Inspection → Data Cleansing & Feature Engineering (1) → Baseline Models → Extreme Tree Models → Data Cleansing & Feature Engineering (2) → Custom Extreme Tree Models → Data Cleansing & Feature Engineering (3) → Fully Customized Models
I start my data project by checking summary statistics, distributions, and data errors, and by applying simple models.
Extreme Tree Models* serve as a checkpoint before further developing customized models.
*Extreme Tree Models refer to a class of models that use a tree as a base classifier.
Why Tree-based Models
"Of all the well-known methods, decision trees come closest to meeting the requirements for serving as an off-the-shelf procedure for data mining."
• T. Hastie, R. Tibshirani, and J. H. Friedman, The Elements of Statistical Learning
How to Grow a Tree
1. Start with a dataset
2. Pick a splitting feature
3. Pick a splitting cut-point
4. Split the dataset into two sets based on the splitting feature and cut-point
5. Repeat from Step 2 with the partitioned datasets
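A minimal sketch of these five steps for regression, assuming NumPy arrays and variance reduction as the split score (the criterion CART uses, per the next slide); grow_tree and best_split are illustrative names, not the presenter's code:

import numpy as np

def best_split(X, y):
    # Steps 2-3: scan every (feature, cut-point) pair and keep the one
    # that most reduces the variance of y across the two child nodes.
    best, best_score = None, np.var(y) * len(y)
    for j in range(X.shape[1]):
        for cut in np.unique(X[:, j])[:-1]:
            left = X[:, j] <= cut
            score = (np.var(y[left]) * left.sum()
                     + np.var(y[~left]) * (~left).sum())
            if score < best_score:
                best, best_score = (j, cut), score
    return best

def grow_tree(X, y, min_samples=5):
    # Step 5 stops when a node is too small or already pure.
    if len(y) < min_samples or np.var(y) == 0:
        return {"leaf": float(np.mean(y))}
    split = best_split(X, y)
    if split is None:
        return {"leaf": float(np.mean(y))}
    j, cut = split
    left = X[:, j] <= cut  # Step 4: partition the dataset in two
    return {"feature": j, "cut": cut,
            "left": grow_tree(X[left], y[left], min_samples),
            "right": grow_tree(X[~left], y[~left], min_samples)}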
Various Kinds of Trees – C4.5, CART
1. Start with a dataset
2. Pick a splitting feature
3. Pick a splitting cut-point
4. Split the dataset into two sets based on the splitting feature and cut-point
5. Repeat from Step 2 with the partitioned datasets
Information Gain → C4.5
Gini Impurity, Variance Reduction → CART
- Quinlan, J. R. (1993). C4.5: Programs for Machine Learning. Morgan Kaufmann Publishers.
- Breiman, L., Friedman, J. H., Olshen, R. A., and Stone, C. J. (1984). Classification and Regression Trees. Monterey, CA: Wadsworth & Brooks/Cole Advanced Books & Software.
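To make the two splitting criteria concrete, a small sketch of entropy-based information gain (C4.5) and Gini impurity (CART) for classification labels; the function names are illustrative:

import numpy as np
from collections import Counter

def entropy(labels):
    # H(S) = -sum_k p_k log2(p_k), the basis of C4.5's information gain
    p = np.array(list(Counter(labels).values()), dtype=float)
    p /= p.sum()
    return -np.sum(p * np.log2(p))

def gini(labels):
    # G(S) = 1 - sum_k p_k^2, CART's impurity for classification
    p = np.array(list(Counter(labels).values()), dtype=float)
    p /= p.sum()
    return 1.0 - np.sum(p ** 2)

def information_gain(parent, left, right):
    # Gain = H(parent) minus the size-weighted entropy of the children
    n = len(parent)
    return (entropy(parent)
            - len(left) / n * entropy(left)
            - len(right) / n * entropy(right))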
Tree → Forest
• Randomization Methods
  • Random data sampling
  • Random feature sampling
  • Random cut-point sampling
Various Kinds of Forests – Bagged Trees
1. Start with a dataset
2. Pick a splitting feature
3. Pick a splitting cut-point
4. Split the dataset into two sets based on the splitting feature and cut-point
5. Repeat from Step 2 with the partitioned datasets
Sample with replacement, and many trees → Bagged Trees
- Breiman, L. (1996). Bagging predictors. Machine Learning, 24(2), 123–140.
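A hedged sketch of bagging, reusing scikit-learn pieces that appear later in these slides; each tree is fit on a bootstrap sample and the forest averages their predictions (the function names are illustrative):

import numpy as np
from sklearn.tree import DecisionTreeRegressor
from sklearn.utils import resample

def fit_bagged_trees(X, y, n_trees=100):
    trees = []
    for _ in range(n_trees):
        # Step 1 is what changes: each tree starts from a sample
        # of the dataset drawn with replacement.
        Xb, yb = resample(X, y)
        trees.append(DecisionTreeRegressor().fit(Xb, yb))
    return trees

def predict_bagged(trees, X):
    # Averaging over many trees is what reduces variance.
    return np.mean([t.predict(X) for t in trees], axis=0)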
Various Kinds of Forests – Random Subspace
1. Start with a dataset
2. Pick a splitting feature
3. Pick a splitting cut-point
4. Split the dataset into two sets based on the splitting feature and cut-point
5. Repeat from Step 2 with the partitioned datasets
Select a random subset of features
Then find the best feature/cut-point
- Ho, T. K. (1998). The random subspace method for constructing decision forests. IEEE Transactions on Pattern Analysis and Machine Intelligence, 20(8), 832–844.
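A minimal sketch of the random subspace idea, with each tree trained on a random subset of columns; the subset size k is an assumption for illustration, not a value from the paper:

import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)

def fit_random_subspace(X, y, n_trees=100, k=3):
    models = []
    for _ in range(n_trees):
        # Each tree only ever sees k randomly chosen features.
        cols = rng.choice(X.shape[1], size=k, replace=False)
        models.append((cols, DecisionTreeRegressor().fit(X[:, cols], y)))
    return models

def predict_random_subspace(models, X):
    return np.mean([m.predict(X[:, cols]) for cols, m in models], axis=0)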
Various Kinds of Forests – Random Forests
1. Start with a dataset
2. Pick a splitting feature
3. Pick a splitting cut-point
4. Split the dataset into two sets based on the splitting feature and cut-point
5. Repeat from Step 2 with the partitioned datasets
Sample with replacement
Select a random subset of features
Then find the best feature/cut-point
- Breiman, L. (2001). Random forests. Machine Learning, 45, 5–32.
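In scikit-learn these two ingredients map onto two parameters; a brief usage sketch (the hyperparameter values are illustrative):

from sklearn.ensemble import RandomForestRegressor

rf = RandomForestRegressor(
    n_estimators=100,
    bootstrap=True,        # sample the data with replacement per tree
    max_features="sqrt",   # random feature subset examined at each split
)
# rf.fit(Xtrain, Ytrain); rf.predict(Xtest)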
Various Kinds of Trees – ExtraTrees
1. Start with a dataset
2. Pick a splitting feature
3. Pick a splitting cut-point
4. Split the dataset into two sets based on the splitting feature and cut-point
5. Repeat from Step 2 with the partitioned datasets
Select a random subset of (feature, cut-point) pairs
Then find the best (feature, cut-point) pair
- Geurts, P., Ernst, D., and Wehenkel, L. (2006). Extremely randomized trees. Machine Learning, 63(1), 3–42.
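A sketch of the extra randomization at a single node: instead of searching all pairs, draw k random (feature, cut-point) candidates and keep the best-scoring one (k and the function name are illustrative):

import numpy as np

rng = np.random.default_rng(0)

def random_split_candidates(X, k=10):
    # Draw k (feature, cut-point) pairs; each cut-point is uniform
    # between the chosen feature's min and max within this node.
    candidates = []
    for _ in range(k):
        j = int(rng.integers(X.shape[1]))
        lo, hi = X[:, j].min(), X[:, j].max()
        candidates.append((j, float(rng.uniform(lo, hi))))
    return candidates

# Each candidate would then be scored with, e.g., the variance-reduction
# criterion from the earlier grow_tree sketch, and the best one kept.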
Again, Bias vs. Variance
• Bias: error from the model
• Variance: error from the data
• Recursive partitioning → fewer samples as the tree grows
  • Split features/cut-points are sensitive to the training samples
• Randomization decreases variance
• Image Source: Scott Fortmann-Roe
Evolution of Bias vs. Variance
(Figure from the reference below.)
- Geurts, P., Ernst, D., and Wehenkel, L. (2006). Extremely randomized trees. Machine Learning, 63(1), 3–42.
Bias-Variance Trade-off
Image Source: Scott Fortmann-Roe
• Randomization methods reduce variance
• However, for some problems, reducing the bias of a model may be more critical for improving its accuracy
  • e.g., a very complex dataset with many variables and samples
Are Tree Models High-Variance Models?
• It depends on…
  • Number of data samples
  • Number of features
  • Data complexity
• Randomization methods
  • Decrease variance
  • But increase bias
There is another way of decreasing the expected error, which:
- Decreases bias
- May increase variance
Boosting: Learn from Errors
Y ≈ f0(X), where E1 = |Y − f0(X)|^2
E1 ≈ f1(X), where E2 = |E1 − f1(X)|^2
E2 ≈ f2(X), where E3 = |E2 − f2(X)|^2
and so on...
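A minimal sketch of this error-fitting loop with shallow regression trees; for squared loss each stage fits the current residual, a simplified stand-in for the gradient step formalized on the next slides (names and hyperparameters are illustrative):

import numpy as np
from sklearn.tree import DecisionTreeRegressor

def fit_boosted(X, y, n_stages=100, lr=0.1):
    pred = np.zeros(len(y))
    stages = []
    for _ in range(n_stages):
        residual = y - pred                    # the current error E_k
        t = DecisionTreeRegressor(max_depth=3).fit(X, residual)
        pred += lr * t.predict(X)              # shrink each correction
        stages.append(t)
    return stages

def predict_boosted(stages, X, lr=0.1):
    return lr * np.sum([t.predict(X) for t in stages], axis=0)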
Additive Model Framework
• The Additive Model Framework generalizes boosting, stacking, and other variants
• Source: T. Hastie, R. Tibshirani, and J. H. Friedman, The Elements of Statistical Learning (ESL)
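For reference, the basis-expansion form from ESL, where the base learner b (here a tree) has its own parameters γ_m:

f(x) = \sum_{m=1}^{M} \beta_m \, b(x; \gamma_m)

Boosting corresponds to fitting these terms sequentially, one m at a time.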
Gradient Boosting Machine
• Additive Models can be numerically optimized via Gradient Descent
• Source: Wikipedia and ESL
- Friedman, J. H. (2001). Greedy function approximation: A gradient boosting machine. Annals of Statistics, 29(5), 1189–1232.
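In symbols, each stage takes a step along the negative functional gradient of the loss; a sketch of the update, loosely following ESL's notation:

f_m(x) = f_{m-1}(x) + \nu \, h_m(x), \quad h_m(x) \approx -\left[ \frac{\partial L(y, f(x))}{\partial f(x)} \right]_{f = f_{m-1}}

Here h_m is a regression tree fit to the negative gradient (for squared loss, the ordinary residual) and ν is the learning rate.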
Extreme Gradient Boosting (XGBoost)
Various data mining competitions on Kaggle
One thing they have in common:
- They all used XGBoost
What’s	so	Special	about	XGBoost
• XGBoost implements	the	basic	idea	of	GBM	with	some	tweaks,	such	
as:
• Regularization	of	base	trees
• Approximate	split	finding
• Weighted	quantile sketch
• Sparsity-aware	split	finding
• Cache-aware	block	structure	for	out-of-core	computation
• “XGBoost scales	beyond	billions	of	examples	using	far	fewer	resources	
than	existing	systems.”	– T.	Chen	and	C.	Guestrin
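A hedged usage sketch with the xgboost Python package (the hyperparameter values are illustrative, not from the slides); reg_lambda exposes the base-tree regularization mentioned above, and tree_method="approx" selects approximate split finding:

import xgboost as xgb

model = xgb.XGBRegressor(
    n_estimators=500,
    learning_rate=0.25,
    max_depth=8,
    reg_lambda=1.0,        # L2 regularization on leaf weights
    tree_method="approx",  # approximate, quantile-based split finding
)
# model.fit(Xtrain, Ytrain); model.predict(Xtest)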
Going Further Extreme
• XGBoost of XGBoost
• Bagging of XGBoost
• Bagging of XGBoost of XGBoost of …
• Stacking, Bagging, Sampling, etc.
• Source: Kaggle
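For instance, "Bagging of XGBoost" can be sketched in a few lines (a hedged illustration, not a recipe from the slides):

import numpy as np
import xgboost as xgb
from sklearn.utils import resample

def fit_bagged_xgb(X, y, n_bags=10):
    # Each booster is trained on its own bootstrap sample.
    models = []
    for _ in range(n_bags):
        Xb, yb = resample(X, y)
        models.append(xgb.XGBRegressor(n_estimators=200).fit(Xb, yb))
    return models

def predict_bagged_xgb(models, X):
    return np.mean([m.predict(X) for m in models], axis=0)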
Real-world Example: Predict MedAdh Scores
• The Centers for Medicare and Medicaid Services (CMS) measures the performance of Medicare Advantage (MA) Plans via the Star Rating System
• Medication Adherence (MedAdh) is one of the most important quality measures in the Star Rating System
• MA Plans want to know how much their MedAdh scores will change in the next two years
Predict MedAdh Scores
• Where can I find the data?
  • Download from the CMS Part C and D Performance Data webpage
• Constructing the datasets
  • MedAdh data from 2012, 2013 → Training Features, Xtrain
  • MedAdh data from 2015 → Training Label, Ytrain
  • MedAdh data from 2013, 2014 → Test Features, Xtest
  • MedAdh data from 2016 → Test Label, Ytest
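A hedged sketch of this construction with pandas, assuming one CSV per measurement year keyed by an MA contract ID (the file names and the contract_id column are hypothetical):

import pandas as pd

years = {y: pd.read_csv("medadh_%d.csv" % y, index_col="contract_id")
         for y in [2012, 2013, 2014, 2015, 2016]}

# Features: two consecutive years of scores; label: the score two years out.
Xtrain = years[2012].join(years[2013], lsuffix="_2012", rsuffix="_2013")
Ytrain = years[2015]
Xtest = years[2013].join(years[2014], lsuffix="_2013", rsuffix="_2014")
Ytest = years[2016]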
Lots of Missing Data
• Not all MA plans are measured in a given year → Mean Imputation
X1,X2,X3,X4,X5,X6,X7,X8,X9,Y
...
71.2,72.7,69.9,75.2,75.9,71.0,1.8
-999,-999,-999,75.8,72.5,68.8,-4.8
61.8,59.4,57.7,57.3,59.3,58.3,16.7
...
-999,-999,-999,82.8,80.0,69.8,-11.8
73.8,73.2,71.8,74.5,76.1,72.9,4.5
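A minimal sketch of the mean imputation, assuming -999 is the missing-value sentinel as in the sample above (and that every column has at least one observed value):

import numpy as np

def mean_impute(X, sentinel=-999):
    # Replace the sentinel in each column with that column's mean
    # computed over the observed (non-sentinel) values.
    X = X.astype(float).copy()
    for j in range(X.shape[1]):
        observed = X[:, j] != sentinel
        X[~observed, j] = X[observed, j].mean()
    return X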
Try Various Models
• From simple models like Linear Regression and Decision Tree to extreme tree models such as ExtraTrees and Gradient Boosting
from sklearn import linear_model                         # LinearRegression baseline
from sklearn import tree                                 # DecisionTreeRegressor baseline
from sklearn.utils import resample                       # bootstrap sampling
from sklearn.metrics import mean_squared_error           # RMSE evaluation
from sklearn.ensemble import ExtraTreesRegressor         # extreme tree model
from sklearn.ensemble import GradientBoostingRegressor   # extreme tree model
Try Various Models – code snippet
• From simple models like Linear Regression and Decision Tree to extreme tree models such as ExtraTrees and Gradient Boosting
lm = linear_model.LinearRegression()
dt = tree.DecisionTreeRegressor()
etr = ExtraTreesRegressor(n_estimators=100, max_depth=10)
gbr = GradientBoostingRegressor(n_estimators=500,
                                learning_rate=0.25,
                                max_depth=8)
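The slides do not show the fit/evaluate step; a minimal sketch, assuming Xtrain, Ytrain, Xtest, and Ytest are the arrays built earlier, producing the RMSE figures on the next slide:

import numpy as np

for name, model in [("lm", lm), ("dt", dt), ("etr", etr), ("gbr", gbr)]:
    model.fit(Xtrain, Ytrain)
    rmse = np.sqrt(mean_squared_error(Ytest, model.predict(Xtest)))
    print("%s: %s" % (name, rmse))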
Try Various Models – results
$ python test.py
…
RMSE Results
lm:  2.7125536923
dt:  3.10460672029
etr: 2.18597303421
gbr: 2.02698129388
Try Various Models – results
Extreme Tree Models exhibit significant improvements in accuracy compared to simple models.
One can build more sophisticated models based on the error characteristics of these models.
Contact
• yubin [at] accordionhealth [dot] com