PMED Transition Workshop - Dynamic Treatment Regimes via Reward Ignorant Modeling - Michael Wallace, May 20, 2019

Dynamic Treatment Regimes via Reward Ignorant Modelling
Michael P. Wallace (University of Waterloo)
Joint work with Erica E. M. Moodie and David A. Stephens (McGill University)
May 20, 2019

Treating the patient, not the diagnosis
Heterogeneity between patients:
[Figure: two timelines from 0 to 3 months for patients Alice and Bob. In the first, both receive Drug A; in the second, one receives Drug A and the other Drug B.]

Dynamic treatment regimes
Dynamic treatment regimes (DTRs) ‘formalize’ the process of
personalized medicine:
“If patient over 65 prescribe Drug A, otherwise Drug B.”
[Diagram: Patient information → DTR → Treatment recommendation]
DTRs can lead to improved results over standard ‘one size fits all’ approaches.
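As a minimal sketch, the example rule quoted above can be written as a function from patient information to a treatment recommendation. The function name `dtr` and the patients' ages are hypothetical and used only for illustration.

```python
def dtr(age: int) -> str:
    """Map patient information (here, age only) to a treatment recommendation."""
    return "Drug A" if age > 65 else "Drug B"

# Hypothetical ages; the slides do not give any patient data.
patients = {"Alice": 72, "Bob": 58}
recommendations = {name: dtr(age) for name, age in patients.items()}
```

Estimating a DTR then amounts to choosing the inputs and cut-offs of such a function from data rather than fixing them by hand.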

Notation
X: pre-treatment covariates (e.g., age)
A: treatment received (Drug A or Drug B?)
Y: outcome (a 'healthiness metric')
Goal: find the treatment $A^{\mathrm{opt}}$ maximizing $E[Y \mid x]$.

Notation
More generally, we work in stages:
[Diagram: Stage 1 (X1, A1) → Stage 2 (X2, A2) → Stage 3 (X3, A3) → Y]
Goal: find the treatment sequence $A_1^{\mathrm{opt}}, A_2^{\mathrm{opt}}, A_3^{\mathrm{opt}}$ maximizing $E[Y \mid \cdot]$.

Identifying the best treatment regime: multi-stage
First problem: how to make this more manageable?
[Diagram: X1, A1, X2, A2, X3, A3, Y]

The multi-stage case is more complicated.
[Diagram: H3, A3, Y]
Writing $H_3 = (X_1, A_1, X_2, A_2, X_3)$ reduces finding $A_3^{\mathrm{opt}}$ to a single-stage problem.

We’ve now ‘solved’ the problem for stage 3 and can look at stage 2
[Diagram: X1, A1, X2, A2, X3, A3, Y]

Pseudo-outcome: $Y_2^{\mathrm{opt}} = Y$ if $A_3 = A_3^{\mathrm{opt}}$ according to the analysis.
[Diagram: X1, A1, X2, A2, $Y_2^{\mathrm{opt}}$]
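The stage-by-stage recursion above can be sketched in code. This is an illustration under stated assumptions, not the analysis used in the talk: it uses two stages rather than three, binary covariates, randomized treatments, and a tabular Q-learning-style pseudo-outcome (the estimated mean outcome under the best final-stage treatment). All names and the data-generating model are hypothetical.

```python
import random
from collections import defaultdict

def cell_means(rows, key, value):
    """Average value(row) within each cell defined by key(row)."""
    sums, counts = defaultdict(float), defaultdict(int)
    for r in rows:
        sums[key(r)] += value(r)
        counts[key(r)] += 1
    return {k: sums[k] / counts[k] for k in sums}

def simulate(n, seed=0):
    """Hypothetical two-stage data: binary covariates, randomized treatments."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        x1, a1 = rng.randint(0, 1), rng.randint(0, 1)
        x2, a2 = rng.randint(0, 1), rng.randint(0, 1)
        # Outcome is highest when each treatment matches its stage's covariate.
        y = -abs(a1 - x1) - abs(a2 - x2) + rng.gauss(0, 0.5)
        rows.append({"x1": x1, "a1": a1, "x2": x2, "a2": a2, "y": y})
    return rows

rows = simulate(4000)

# Final stage: treat the history H2 = (x1, a1, x2) as a single covariate.
q2 = cell_means(rows, lambda r: (r["x1"], r["a1"], r["x2"], r["a2"]),
                lambda r: r["y"])

def best_a2(h):
    return max((0, 1), key=lambda a: q2[h + (a,)])

# Pseudo-outcome: estimated outcome had the final treatment been optimal.
for r in rows:
    h = (r["x1"], r["a1"], r["x2"])
    r["y_opt"] = q2[h + (best_a2(h),)]

# Earlier stage: the same single-stage analysis, with the pseudo-outcome as response.
q1 = cell_means(rows, lambda r: (r["x1"], r["a1"]), lambda r: r["y_opt"])

def best_a1(x1):
    return max((0, 1), key=lambda a: q1[(x1, a)])
```

In this simulated setting the recovered rules are "match treatment to covariate" at both stages, which is the rule built into `simulate`.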

Lots of methods available:
Q-learning
G-estimation
IPTW
A-learning
dWOLS
MSMs
OWL
etc...

Important principle: if patients are treated badly, we can learn
something from their observed outcome.
[Figure: timeline from 0 to 3 months for patients Alice and Bob, with Drug A and Drug B.]
...but what if most patients are treated well?

Reward Ignorant Modelling
Intuition: a model relating the observed treatment to covariates
should elicit a viable treatment strategy if patients are treated
correctly (‘optimal dose assumption’).
e.g., linear or logistic regression of treatment on covariates.
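A minimal sketch of this intuition, assuming a single covariate and a simulated expert who picks the optimal treatment 80% of the time. The data-generating model, the threshold at 0, and all function names are assumptions made for illustration. The logistic regression is fitted by plain gradient ascent so that the block needs only the standard library.

```python
import math
import random

def sigmoid(z):
    # Clamp the argument to avoid overflow in math.exp for extreme inputs.
    return 1.0 / (1.0 + math.exp(-max(-30.0, min(30.0, z))))

def fit_logistic(xs, treatments, steps=300, lr=0.5):
    """Fit P(A=1 | x) = sigmoid(b0 + b1*x) by gradient ascent on the mean log-likelihood."""
    b0 = b1 = 0.0
    n = len(xs)
    for _ in range(steps):
        g0 = g1 = 0.0
        for x, a in zip(xs, treatments):
            resid = a - sigmoid(b0 + b1 * x)
            g0 += resid
            g1 += resid * x
        b0 += lr * g0 / n
        b1 += lr * g1 / n
    return b0, b1

# Hypothetical data: the optimal rule is "treat if x > 0", and the simulated
# expert chooses the optimal treatment with probability 0.8.
rng = random.Random(1)
xs = [rng.gauss(0.0, 1.0) for _ in range(1000)]
treatments = []
for x in xs:
    optimal = 1 if x > 0 else 0
    treatments.append(optimal if rng.random() < 0.8 else 1 - optimal)

# Reward ignorant step: regress treatment on covariate; the outcome is never used.
b0, b1 = fit_logistic(xs, treatments)

def rim_rule(x):
    """Recommend the treatment the fitted model says the expert most likely gives."""
    return 1 if b0 + b1 * x > 0 else 0

estimated_threshold = -b0 / b1
```

Here `estimated_threshold` is the covariate value at which the fitted treatment probability crosses one half, which serves as the proposed cut-off.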

Standard analysis: use pre-treatment covariates, treatment, and outcome to inform the dynamic treatment regime.
[Diagram: X, A, Y]
Reward ignorant modelling: simply model the relationship between X and A.
[Diagram: X, A, Y]

Exploring the idea
Binary treatment decision A based on X crossing some threshold.
[Diagram: X, A, Y]
RIM: model relationship between X and A, ignoring Y .
Alternative: incorporate Y in analysis.

Exploring the idea
Two evaluation metrics:
1. Optimal treatment rate: for what proportion of patients
does the method identify the correct treatment?
2. Optimal outcome: if we used the treatment rules our
methods propose, what is the average outcome for our
patients?
Aside: which metric are we/practitioners more interested in?
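Both metrics can be computed directly when the truth is known, as in a simulation. The sketch below assumes a true rule "treat if x > 0" and a hypothetical harm of |x| for a wrong treatment, so that the best achievable average outcome is 0. These choices, and the candidate cut-off of 0.75, are illustrative only.

```python
def a_opt(x, threshold=0.0):
    """Assumed true optimal rule: treat when x exceeds the threshold."""
    return 1 if x > threshold else 0

def harm(a, x):
    """Hypothetical harm: zero under optimal treatment, |x| otherwise."""
    return 0.0 if a == a_opt(x) else abs(x)

def evaluate(rule, xs):
    """Return (optimal treatment rate, average outcome) for a proposed rule."""
    rate = sum(rule(x) == a_opt(x) for x in xs) / len(xs)
    value = sum(-harm(rule(x), x) for x in xs) / len(xs)
    return rate, value

xs = [-2.0, -1.0, -0.5, 0.5, 1.0, 2.0]
# A candidate rule with a slightly misplaced cut-off.
rate, value = evaluate(lambda x: 1 if x > 0.75 else 0, xs)
```

The candidate rule mistreats only the patient with x = 0.5, so the rate is 5/6 while the average outcome falls from 0 to -0.5/6. The two metrics can rank methods differently because the second one weights each mistake by how much it costs.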

Exploring the idea
Can simulate an ‘expert’ of increasing accuracy:
[Plot: optimal treatment (%) identified by each method against observed optimal treatment (%). Methods: RIM, dWOLS, IPTW, AIPTW.]

Exploring the idea
Logistic regression of A on X:
[Plot: optimal treatment (%) for RIM.]

Exploring the idea
dWOLS: weighted least squares that takes outcome into account:
[Plot: optimal treatment (%) for dWOLS.]
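A rough single-stage sketch of the dWOLS idea, assuming one covariate, a linear outcome model, and the weights |a - P(A=1 | x)|, which are one choice of weights described in Wallace and Moodie (2015). The simulated data, coefficient values, and function names are hypothetical, and the linear algebra is written out by hand to keep the block self-contained.

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-max(-30.0, min(30.0, z))))

def fit_logistic(xs, ys, steps=300, lr=0.5):
    """Gradient-ascent fit of P(y=1 | x) = sigmoid(b0 + b1*x)."""
    b0 = b1 = 0.0
    for _ in range(steps):
        g0 = g1 = 0.0
        for x, y in zip(xs, ys):
            resid = y - sigmoid(b0 + b1 * x)
            g0 += resid
            g1 += resid * x
        b0 += lr * g0 / len(xs)
        b1 += lr * g1 / len(xs)
    return b0, b1

def solve(m, v):
    """Solve the linear system m @ beta = v by Gauss-Jordan elimination."""
    k = len(v)
    aug = [row[:] + [v[i]] for i, row in enumerate(m)]
    for c in range(k):
        pivot = max(range(c, k), key=lambda r: abs(aug[r][c]))
        aug[c], aug[pivot] = aug[pivot], aug[c]
        for r in range(k):
            if r != c:
                factor = aug[r][c] / aug[c][c]
                aug[r] = [u - factor * w for u, w in zip(aug[r], aug[c])]
    return [aug[i][k] / aug[i][i] for i in range(k)]

def weighted_least_squares(design, ys, weights):
    """Solve the weighted normal equations (X'WX) beta = X'Wy."""
    k = len(design[0])
    xtwx = [[sum(w * d[i] * d[j] for d, w in zip(design, weights))
             for j in range(k)] for i in range(k)]
    xtwy = [sum(w * d[i] * y for d, y, w in zip(design, ys, weights))
            for i in range(k)]
    return solve(xtwx, xtwy)

# Hypothetical data: treatment depends on x, and treating helps when 0.5 + x > 0.
rng = random.Random(2)
xs = [rng.gauss(0.0, 1.0) for _ in range(2000)]
treatments = [1 if rng.random() < sigmoid(x) else 0 for x in xs]
ys = [x + a * (0.5 + x) + rng.gauss(0.0, 0.5) for x, a in zip(xs, treatments)]

# Step 1: treatment model P(A=1 | x).
b0, b1 = fit_logistic(xs, treatments)
# Step 2: weights |a - P(A=1 | x)|.
weights = [abs(a - sigmoid(b0 + b1 * x)) for x, a in zip(xs, treatments)]
# Step 3: weighted regression of y on (1, x, a, a*x); the a-terms estimate the treatment effect.
design = [[1.0, x, a, a * x] for x, a in zip(xs, treatments)]
beta0, beta1, psi0, psi1 = weighted_least_squares(design, ys, weights)

def dwols_rule(x):
    """Treat when the estimated treatment effect psi0 + psi1*x is positive."""
    return 1 if psi0 + psi1 * x > 0 else 0
```

Unlike the reward ignorant approach, the rule here comes from the outcome regression; the treatment model enters only through the weights.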

Exploring the idea
(Augmented) inverse probability of treatment weighting:
[Plot: optimal treatment (%) for IPTW and AIPTW.]

Exploring the idea
Can simulate expected outcome if patients treated according to
each method:
[Plot: optimal outcome for each method.]

Exploring the idea
Key points:
RIM can out-perform more complex methods (at the 75-80%
accuracy mark in these simulations).
dWOLS most competitive when treatment is near-optimal.
But: choice of method may depend on treatment rate?

Now consider 2 stages of treatment:
[Diagram: X1, A1, X2, A2, Y]
Optimal treatment at each stage based on whether X1, X2 cross
some threshold.
Idea: treatment uninformed at stage 1, but expert improves by
stage 2.

[Diagram: X1, A1, X2, A2, Y]
We now consider a multi-method approach:
Method 1: use dWOLS at both stages.
Method 2: use RIM at stage 2 and dWOLS at stage 1.
Note: if stage 2 treatment is near-optimal, then $Y \approx Y_2^{\mathrm{opt}}$.

Optimal outcome if each method’s decision rule is used:
[Plot: optimal outcome against observed optimal treatment (stage 2, %). Methods (stage 1, stage 2): dWOLS, RIM; dWOLS, dWOLS.]

[Plot: optimal outcome against observed optimal treatment (stage 2, %). Methods (stage 1, stage 2): dWOLS, RIM; dWOLS, dWOLS; RIM, RIM; RIM, dWOLS.]

Another potential mis-specification structure:
[Diagram: X, A, W, Y]
Optimal treatment depends on X and W.
Idea: expert uses X to inform A, but not W.
By varying the importance of W in the optimal treatment rule we affect the expert’s success rate.

Extension to 2 stages:
[Diagram: X1, A1, X2, W2, A2, Y]
Optimal treatment at stage 1 depends on X1, at stage 2 depends
on X2 and W2.
Stage 1 treatment uninformed; at stage 2, X2 (but not W2) used.

Warfarin example
Illustration: data from the International Warfarin
Pharmacogenetics Consortium.
Goal: identify dose of warfarin to optimize the international
normalized ratio (INR), typically recommended to lie between 2
and 3.
89% of 1,732 patients had an INR between 2 and 3 ⇒ patients being treated well?

Warfarin example
Dataset split into training/testing pairs.
Chen et al. (2016) applied an outcome weighted learning approach.
Wallace et al. (2017) applied dynamic weighted ordinary least
squares (dichotomized treatment).
Dose        Metric       Non-RIM        RIM
Continuous  Correlation  0.60 (±0.08)   0.68 (±0.012)
Binary      Agreement    54% (±8.27%)   78% (±0.94%)

Identifying optimal treatment rates
Observed outcome = Outcome under optimal treatment − ‘Harm’ caused by non-optimal treatment
or
$Y(a) = Y(a^{\mathrm{opt}}) - \mu(a, x) = f(x) - \mu(a, x)$
For given $x$, we expect $Y(a^{\mathrm{opt}}) \geq Y(a)$.
Idea: compare outcome among those optimally treated according to various methods.

Summing up
What we have so far:
The quality of treatment can/should inform analysis method.
More complex methods may be outperformed by simpler ones.
Multi-method approaches an interesting direction for
stage-by-stage analysis.
But:
How do we know the quality of treatment? What are experts
really thinking?
Need to develop actionable ideas/rules for analysis decisions.
Do other (more sophisticated) methods perform better?

References/Acknowledgments
dWOLS: Wallace M. P. and Moodie E. E. M. (2015). Doubly-robust dynamic treatment regimen estimation via weighted least squares. Biometrics 71(3), 636-644.
Reward Ignorant Modelling: Wallace M. P., Moodie E. E. M. and Stephens D. A. (2018). Reward Ignorant Modeling of Dynamic Treatment Regimes. Biometrical Journal 60, 991-1002.
michael.wallace@uwaterloo.ca mpwallace.github.io

Optimal treatment % for n = 200:
[Plot: optimal treatment % against second covariate coefficient (0.0 to 2.0) for RIM and dWOLS, with the optimal stage 2 treatment % also shown.]

[Plot: optimal response for RIM and dWOLS (x-axis from 0.0 to 2.0), with the true optimal outcome.]

[Diagram: X1, A1, X2, W2, A2, Y]
We again consider a multi-method approach:
Method 1: use dWOLS at both stages.
Method 2: use RIM at stage 2 and dWOLS at stage 1.

Optimal treatment % for n = 200: a familiar pattern.
[Plot: % optimally treated at both stages (x-axis from 0.0 to 2.0), by stage 2 method (RIM, dWOLS), with the optimal stage 2 treatment % also shown.]

[Plot: optimal outcome against cut-off used (0.0 to 2.0) for RIM and dWOLS, with the true optimal outcome.]
