PMED Transition Workshop - Dynamic Treatment Regimes via Reward Ignorant Modeling - Michael Wallace, May 20, 2019
1. Dynamic Treatment Regimes via Reward Ignorant Modelling
Michael P. Wallace (University of Waterloo)
Joint work with Erica E. M. Moodie and David A. Stephens (McGill University)
May 20, 2019
3. Treating the patient, not the diagnosis
Heterogeneity between patients:
[Figure: two treatment timelines from 0 to 3 months for Alice and Bob. In the first, both receive Drug A; in the second, Drug A and Drug B are both used.]
4. Dynamic treatment regimes
Dynamic treatment regimes (DTRs) ‘formalize’ the process of
personalized medicine:
“If patient over 65 prescribe Drug A, otherwise Drug B.”
[Diagram: Patient information → DTR → Treatment recommendation]
DTRs can lead to improved results over standard ‘one size fits
all’ approaches.
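A decision rule like the one quoted above is simply a function from patient information to a treatment. A minimal sketch follows, assuming a hypothetical `Patient` record; the ages given to Alice and Bob are made up for illustration.

```python
from dataclasses import dataclass


@dataclass
class Patient:
    """Hypothetical patient record; only age is used by the rule."""
    name: str
    age: int


def age_rule(patient: Patient) -> str:
    """Single-stage rule from the slide: Drug A if over 65, otherwise Drug B."""
    return "Drug A" if patient.age > 65 else "Drug B"


# Illustrative ages (assumptions, not taken from the talk).
for p in (Patient("Alice", 72), Patient("Bob", 48)):
    print(p.name, "->", age_rule(p))
```

A multi-stage regime would be a sequence of such functions, one per stage, each taking the patient history available at that stage.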
6. Notation
More generally, we work in stages:
[Diagram: Stage 1 (X1, A1), Stage 2 (X2, A2), Stage 3 (X3, A3), followed by outcome Y]
Goal: find the treatment sequence $A_1^{opt}, A_2^{opt}, A_3^{opt}$ maximizing $E[Y \mid \cdot]$.
7. Identifying the best treatment regime: multi-stage
First problem: how to make this more manageable?
[Diagram: Stage 1 (X1, A1), Stage 2 (X2, A2), Stage 3 (X3, A3), followed by outcome Y]
8. Identifying the best treatment regime: multi-stage
The multi-stage case is more complicated.
[Diagram: H3, A3, Y]
Writing $H_3 = (X_1, A_1, X_2, A_2, X_3)$ reduces finding $A_3^{opt}$ to a single-stage problem.
9. Identifying the best treatment regime: multi-stage
We’ve now ‘solved’ the problem for stage 3 and can look at stage 2
[Diagram: Stage 1 (X1, A1), Stage 2 (X2, A2), Stage 3 (X3, A3), followed by outcome Y]
10. Identifying the best treatment regime: multi-stage
Pseudo-outcome: $Y_2^{opt} = Y$ if $A_3 = A_3^{opt}$ according to the analysis.
[Diagram: X1, A1, X2, A2, followed by pseudo-outcome $Y_2^{opt}$]
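The slide only states the pseudo-outcome for patients whose stage 3 treatment was optimal. One way to extend it to everyone else, sketched here as an assumption, is to add back the estimated 'harm' of the treatment received, following the decomposition $Y(a) = Y(a^{opt}) - \mu(a, x)$ used later in the talk. The symbol $\hat{\mu}_3$, an estimate of the stage 3 harm term, is notation introduced for this sketch.

```latex
\[
Y_2^{opt} =
\begin{cases}
Y, & A_3 = A_3^{opt}, \\
Y + \hat{\mu}_3(A_3, H_3), & A_3 \neq A_3^{opt},
\end{cases}
\qquad \text{with } \hat{\mu}_3(A_3^{opt}, H_3) = 0 .
\]
```

The stage 2 analysis then treats $Y_2^{opt}$ as its outcome, and the same step is repeated to move from stage 2 back to stage 1.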
12. Important principle: if patients are treated badly, we can learn
something from their observed outcome.
[Figure: treatment timeline from 0 to 3 months; Alice and Bob both receive Drug A.]
...but what if most patients are treated well?
13. Important principle: if patients are treated badly, we can learn
something from their observed outcome.
[Figure: treatment timeline from 0 to 3 months for Alice and Bob; Drug A and Drug B are both used.]
...but what if most patients are treated well?
14. Reward Ignorant Modelling
Intuition: a model relating the observed treatment to covariates
should elicit a viable treatment strategy if patients are treated
correctly (‘optimal dose assumption’).
e.g., linear or logistic regression of treatment on covariates.
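The logistic-regression version of this idea can be sketched in a few lines. This is a toy under stated assumptions, not the authors' implementation: the simulation design (one covariate, treatment 1 optimal when x > 0, an 'expert' who is right 80% of the time) and the names `fit_logistic` and `rim_rule` are hypothetical.

```python
import math
import random


def fit_logistic(xs, treatments, lr=0.5, n_iter=500):
    """Fit P(A = 1 | x) = sigmoid(b0 + b1 * x) by gradient ascent on the mean log-likelihood."""
    b0, b1 = 0.0, 0.0
    n = len(xs)
    for _ in range(n_iter):
        g0 = g1 = 0.0
        for x, a in zip(xs, treatments):
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * x)))
            g0 += a - p
            g1 += (a - p) * x
        b0 += lr * g0 / n
        b1 += lr * g1 / n
    return b0, b1


def rim_rule(b0, b1):
    """Recommend treatment 1 whenever the fitted P(A = 1 | x) exceeds 0.5."""
    return lambda x: 1 if b0 + b1 * x > 0 else 0


# Hypothetical simulation: treatment 1 is optimal when x > 0, and a simulated
# 'expert' prescribes the optimal treatment 80% of the time.
rng = random.Random(0)
xs = [rng.gauss(0.0, 1.0) for _ in range(500)]
observed = [
    (1 if x > 0 else 0) if rng.random() < 0.8 else (0 if x > 0 else 1)
    for x in xs
]

b0, b1 = fit_logistic(xs, observed)  # the outcome Y is never used
rule = rim_rule(b0, b1)
agreement = sum(rule(x) == (1 if x > 0 else 0) for x in xs) / len(xs)
print(f"fitted threshold: {-b0 / b1:.3f}, optimal treatment rate: {agreement:.1%}")
```

The point of the sketch is that the fitted rule can recover the threshold more reliably than the noisy expert applies it, even though no outcome data enter the fit.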
16. Reward Ignorant Modelling
Standard analysis: use pre-treatment covariates, treatment, and outcome to inform the dynamic treatment regime.
[Diagram: X, A, Y]
Reward ignorant modelling: simply model the relationship between X and A.
[Diagram: X, A, Y]
17. Exploring the idea
Binary treatment decision A based on X crossing some threshold.
[Diagram: X, A, Y]
RIM: model relationship between X and A, ignoring Y.
Alternative: incorporate Y in analysis.
18. Exploring the idea
Two evaluation metrics:
1. Optimal treatment rate: for what proportion of patients
does the method identify the correct treatment?
2. Optimal outcome: if we used the treatment rules our
methods propose, what is the average outcome for our
patients?
Aside: which metric are we/practitioners more interested in?
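The two metrics above can be sketched as follows. The 'truth' used here is a hypothetical toy: treatment 1 is optimal when x > 0, and a wrong treatment costs |x|, so the best achievable average outcome is 0. The function names are illustrative.

```python
def optimal_treatment_rate(rule, xs, optimal_rule):
    """Metric 1: proportion of patients for whom the rule picks the optimal treatment."""
    return sum(rule(x) == optimal_rule(x) for x in xs) / len(xs)


def mean_outcome_under_rule(rule, xs, expected_outcome):
    """Metric 2: average expected outcome if every patient were treated by the rule."""
    return sum(expected_outcome(x, rule(x)) for x in xs) / len(xs)


# Hypothetical truth (an assumption for illustration only).
def optimal_rule(x):
    return 1 if x > 0 else 0


def expected_outcome(x, a):
    # No loss under the optimal treatment; a loss of |x| otherwise.
    return 0.0 if a == optimal_rule(x) else -abs(x)


# A candidate rule whose threshold is slightly off.
def candidate_rule(x):
    return 1 if x > 0.3 else 0


xs = [-1.0, -0.5, 0.2, 0.5, 1.0]
print(optimal_treatment_rate(candidate_rule, xs, optimal_rule))
print(mean_outcome_under_rule(candidate_rule, xs, expected_outcome))
```

In this toy the candidate rule mistreats one patient in five, giving an optimal treatment rate of 80%, yet the average outcome is only about -0.04 because the mistreated patient sits close to the threshold. That gap is one reason the two metrics can rank methods differently.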
19. Exploring the idea
Binary treatment decision A based on X crossing some threshold.
Can simulate an ‘expert’ of increasing accuracy:
[Figure: optimal treatment (%) against observed optimal treatment (%); legend: RIM, dWOLS, IPTW, AIPTW.]
20. Exploring the idea
Binary treatment decision A based on X crossing some threshold.
Logistic regression of A on X:
[Figure: optimal treatment (%) against observed optimal treatment (%); legend: RIM, dWOLS, IPTW, AIPTW.]
21. Exploring the idea
Binary treatment decision A based on X crossing some threshold.
dWOLS: weighted least squares that takes outcome into account:
[Figure: optimal treatment (%) against observed optimal treatment (%); legend: RIM, dWOLS, IPTW, AIPTW.]
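For comparison with RIM, a single-stage dWOLS fit can be sketched as a weighted least squares regression of Y on (1, x, a, a·x) with weights |a − P(A = 1 | x)|, one of the weight choices described in the dWOLS reference. Everything specific below is an assumption for illustration: the linear outcome model, the noise-free toy data, the constant propensities, and the helper names `solve` and `dwols`.

```python
def solve(A, b):
    """Solve the linear system A z = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    z = [0.0] * n
    for i in reversed(range(n)):
        z[i] = (M[i][n] - sum(M[i][j] * z[j] for j in range(i + 1, n))) / M[i][i]
    return z


def dwols(xs, treatments, ys, propensities):
    """Weighted least squares of Y on (1, x, a, a*x) with weights |a - P(A=1|x)|.

    Returns (beta0, beta1, psi0, psi1); the estimated rule treats when
    psi0 + psi1 * x > 0.
    """
    XtWX = [[0.0] * 4 for _ in range(4)]
    XtWy = [0.0] * 4
    for x, a, y, p in zip(xs, treatments, ys, propensities):
        w = abs(a - p)
        row = [1.0, x, float(a), a * x]
        for i in range(4):
            XtWy[i] += w * row[i] * y
            for j in range(4):
                XtWX[i][j] += w * row[i] * row[j]
    return solve(XtWX, XtWy)


# Noise-free toy data (assumption): Y = 1 + 0.5 x + a (-0.2 + x),
# so treatment 1 is optimal when x > 0.2.
xs = [-2.0, -1.5, -1.0, -0.5, 0.5, 1.0, 1.5, 2.0]
treatments = [0, 1, 0, 1, 1, 1, 0, 0]
ys = [1.0 + 0.5 * x + a * (-0.2 + 1.0 * x) for x, a in zip(xs, treatments)]
propensities = [0.5] * len(xs)  # would normally be estimated, e.g. by logistic regression

beta0, beta1, psi0, psi1 = dwols(xs, treatments, ys, propensities)
print("estimated treatment threshold:", -psi0 / psi1)
```

With a correctly specified, noise-free outcome model the weights do not change the estimates; they are there to protect the estimate of the treatment effect parameters when the treatment-free part of the outcome model is misspecified.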
22. Exploring the idea
Binary treatment decision A based on X crossing some threshold.
(Augmented) inverse probability of treatment weighting:
[Figure: optimal treatment (%) against observed optimal treatment (%); legend: RIM, dWOLS, IPTW, AIPTW.]
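Inverse probability of treatment weighting can be used to estimate the average outcome a candidate rule would achieve, and a rule can then be chosen to maximise that estimate. The sketch below uses a normalised estimator and a small grid of thresholds; the normalised form, the toy data and the name `iptw_value` are assumptions made for illustration, and the augmented (AIPTW) version is not shown.

```python
def iptw_value(rule, xs, treatments, ys, propensities):
    """Normalised IPTW estimate of the mean outcome if everyone followed `rule`.

    Patients whose observed treatment agrees with the rule are weighted by the
    inverse probability of the treatment they actually received.
    """
    num = den = 0.0
    for x, a, y, p in zip(xs, treatments, ys, propensities):
        if a == rule(x):
            w = 1.0 / (p if a == 1 else 1.0 - p)
            num += w * y
            den += w
    return num / den


# Toy data (assumption): treatment 1 is optimal when x > 0, a wrong treatment
# costs |x|, and each treatment is given with probability 0.5.
xs = [-1.0, -0.5, 0.5, 1.0, -1.0, -0.5, 0.5, 1.0]
treatments = [0, 0, 0, 0, 1, 1, 1, 1]
ys = [0.0 if a == (1 if x > 0 else 0) else -abs(x) for x, a in zip(xs, treatments)]
propensities = [0.5] * len(xs)


def threshold_rule(c):
    """Rule that gives treatment 1 when x exceeds the cut-off c."""
    return lambda x: 1 if x > c else 0


candidates = [-0.75, 0.0, 0.75]
best = max(
    candidates,
    key=lambda c: iptw_value(threshold_rule(c), xs, treatments, ys, propensities),
)
print("selected threshold:", best)
```

Unlike RIM, this estimator depends on the outcome and on how well the treatment probabilities are known or modelled.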
23. Exploring the idea
Can simulate expected outcome if patients treated according to
each method:
[Figure: optimal outcome against observed optimal treatment (%); legend: RIM, dWOLS, IPTW, AIPTW.]
25. Exploring the idea
Key points:
RIM can out-perform more complex methods (at the 75-80%
accuracy mark in these simulations).
dWOLS most competitive when treatment is near-optimal.
But: choice of method may depend on treatment rate?
26. Now consider 2 stages of treatment:
[Diagram: X1, A1, X2, A2, followed by outcome Y]
Optimal treatment at each stage based on whether X1, X2 cross
some threshold.
Idea: treatment uninformed at stage 1, but expert improves by
stage 2.
27. [Diagram: X1, A1, X2, A2, followed by outcome Y]
We now consider a multi-method approach:
Method 1: use dWOLS at both stages.
Method 2: use RIM at stage 2 and dWOLS at stage 1.
Note: if stage 2 treatment is near-optimal, then $Y \approx Y_2^{opt}$.
30. Optimal outcome if each method’s decision rule is used:
[Figure: optimal outcome against observed optimal treatment at stage 2 (%). Legend (stage 1 method, stage 2 method): (dWOLS, RIM); (dWOLS, dWOLS); (RIM, RIM); (RIM, dWOLS).]
33. Another potential mis-specification structure:
[Diagram: X, W, A, Y]
Optimal treatment depends on X and W.
Idea: expert uses X to inform A, but not W.
By varying the importance of W in the optimal treatment rule we
affect the expert’s success rate.
34. Extension to 2 stages:
[Diagram: X1, A1, X2, W2, A2, followed by outcome Y]
Optimal treatment at stage 1 depends on X1, at stage 2 depends
on X2 and W2.
Stage 1 treatment uninformed; at stage 2, X2 (but not W2) used.
35. Warfarin example
Illustration: data from the International Warfarin
Pharmacogenetics Consortium.
Goal: identify dose of warfarin to optimize the international
normalized ratio (INR), typically recommended to lie between 2
and 3.
89% of 1,732 patients had an INR between 2 and 3 ⇒ patients being treated well?
36. Warfarin example
Dataset split into training/testing pairs.
Chen et al. (2016) applied an outcome weighted learning approach.
Wallace et al. (2017) applied dynamic weighted ordinary least
squares (dichotomized treatment).
Dose        Metric       Non-RIM         RIM
Continuous  Correlation  0.60 (±0.08)    0.68 (±0.012)
Binary      Agreement    54% (±8.27%)    78% (±0.94%)
40. Identifying optimal treatment rates
Observed outcome = Outcome under optimal treatment −
‘Harm’ caused by non-optimal treatment
or
$Y(a) = Y(a^{opt}) - \mu(a, x) = f(x) - \mu(a, x)$
For given $x$, we expect $Y(a^{opt}) \geq Y(a)$.
Idea: compare outcome among those optimally treated according
to various methods.
43. Summing up
What we have so far:
The quality of treatment can/should inform analysis method.
More complex methods may be outperformed by simpler ones.
Multi-method approaches an interesting direction for
stage-by-stage analysis.
But:
How do we know the quality of treatment? What are experts
really thinking?
Need to develop actionable ideas/rules for analysis decisions.
Do other (more sophisticated) methods perform better?
44. References/Acknowledgments
dWOLS: Wallace M. P. and Moodie E. E. M. (2015). Doubly-robust dynamic treatment regimen estimation via weighted least squares. Biometrics 71(3), 636-644.
Reward Ignorant Modelling: Wallace M. P., Moodie E. E. M. and Stephens D. A. (2018). Reward Ignorant Modeling of Dynamic Treatment Regimes. Biometrical Journal 60, 991-1002.
michael.wallace@uwaterloo.ca mpwallace.github.io
46. Optimal treatment % for n = 200:
[Figure: optimal treatment (%) against second covariate coefficient; legend: RIM, dWOLS, Optimal stage 2 treatment %.]
48. Optimal outcome if each method’s decision rule is used:
[Figure: optimal response against second covariate coefficient; legend: RIM, dWOLS, True optimal outcome.]
49. [Diagram: X1, A1, X2, W2, A2, followed by outcome Y]
We again consider a multi-method approach:
Method 1: use dWOLS at both stages.
Method 2: use RIM at stage 2 and dWOLS at stage 1.
50. Optimal treatment % for n = 200: a familiar pattern.
[Figure: % optimally treated at both stages against second covariate coefficient; legend (stage 2 method): RIM, dWOLS; also shown: Optimal stage 2 treatment %.]
51. Optimal outcome if each method’s decision rule is used:
[Figure: optimal outcome against cut-off used; legend: RIM, dWOLS, True optimal outcome.]