Individualized treatment rules from observational studies
with high-dimensional covariates
Yingqi Zhao, PhD
Fred Hutchinson Cancer Research Center
August 14, 2018, SAMSI PMED Opening Workshop
Contents
Introduction
Background
Proposed method & main results
Simulation & data analysis
Concluding remarks
Introduction
Precision medicine
“Delivering the right treatments, at the right time, to the right person”
- Remarks by Obama on Precision Medicine, 2015
Knowledge-driven: uses scientific understanding of genes, proteins, pathways, and mechanisms.
Data-driven: uses empirical, statistical, and computational methods and lets the data talk.
Observational data
Promises: relatively cheap; aids cost-effective RCTs; the only source when an RCT is unethical.
Challenges: missingness; measurement errors; security & privacy; sampling bias; distributed data; high dimensionality.
There is a huge gap in our understanding of observational data.
Motivating dataset
Elderly patients with Type-II Diabetes with comorbidity
30.3 million Americans have diabetes.
Linked claims and EHR data for Medicare beneficiaries in the University of
Wisconsin Medical Foundation system.
9101 patients, 136 pretreatment covariates
Treatment (Medication)
Hypoglycemic agent (Insulin, Metformin, . . .)
None
Outcome of interest: Glucose level (A1c)
Research questions
1. Can we develop a data-driven rule that better controls glucose levels if implemented for future patients?
2. Can we use the learned data-driven rule to help expand our knowledge of diabetes care?
3. Can we provide uncertainty measurements for the learned rule?
Statistical goal and challenges
Goal
Develop a principled, robust, data-driven method for discovering individualized treatment rules, along with inferential procedures, that maximizes future patients' benefit.
Challenges from the data
Treatments are not randomized.
Covariates (predictors) are high-dimensional.
Background
Settings
$\{(X_i, A_i, Y_i)\}_{i=1}^n \overset{\text{iid}}{\sim} P$: covariate, treatment, and outcome
$X \in \mathbb{R}^p$, $A = \pm 1$, and $Y \in \mathbb{R}$ (higher value preferred)
$p$ possibly large
$P(A|X)$ (propensity) unknown
Assume (1) $P(A|X) > c > 0$ and (2) no unmeasured confounders.
We want to construct an individualized treatment rule (ITR)
$$d(x): \mathcal{X} \to \{-1, +1\},$$
e.g., $d(x) \equiv 1$ or $d(x) = \mathrm{sign}(x^\top \mathbf{1})$.
Objective: find $d$ maximizing the expected outcome if implemented in the future.
Review: Existing approaches
Useful representation:
$$Y = \underbrace{\mu(X)}_{\text{“main effect”}} + A \cdot \underbrace{c(X)}_{\text{“treatment effect”}} + \epsilon, \qquad d^*(X) = \mathrm{sign}\{c(X)\}.$$
Regression-based approach
1. Parametrize and fit $Y \sim \mu(X) + A\, c(X)$.
2. $\hat d(X) = \mathrm{sign}\{\hat c(X)\}$.
Potential issues
The decision is estimated indirectly, through the predicted outcome.
If the model is incorrect, the quality of the estimated ITR may be poor.
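To make the two-step recipe concrete, here is a minimal sketch assuming linear working models $\mu(x) = x^\top \beta_\mu$ and $c(x) = x^\top \beta_c$ fit by ordinary least squares; the function name and setup are illustrative, not part of the talk.

```python
# Minimal sketch of the regression-based approach with linear working models
# (an intercept can be included as a column of ones in X); names illustrative.
import numpy as np

def fit_regression_itr(X, A, Y):
    # Design matrix [X, A*X]: main effects plus treatment interactions.
    Z = np.hstack([X, A[:, None] * X])
    coef, *_ = np.linalg.lstsq(Z, Y, rcond=None)
    beta_c = coef[X.shape[1]:]               # interaction block estimates c(X)
    return lambda x: np.sign(x @ beta_c)     # d_hat(x) = sign{c_hat(x)}
```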
Review: Outcome weighted learning (O-learning)
Another characterization of the optimal ITR: for $(X, A, Y) \sim P$, the Value function of a rule $d$ is
$$V(d) = E^d(Y) = \int Y \, dP^d = \int Y \,\frac{dP^d}{dP}\, dP = E\left[ Y \cdot \frac{I\{A = d(X)\}}{P(A|X)} \right]$$
(Qian and Murphy, 2011, AoS). The optimal ITR satisfies
$$d^* = \operatorname*{argmax}_d V(d).$$
Review: Outcome weighted learning (O-learning)
How about directly optimizing $V(d)$?
$$\text{Value maximization } \max_d E\left[\frac{Y}{P(A|X)}\, I\{A = d(X)\}\right] \iff \text{risk minimization } \min_d E\left[\frac{Y}{P(A|X)}\, I\{A \neq d(X)\}\right]$$
Empirical risk:
$$\frac{1}{n}\sum_{i=1}^n \frac{Y_i}{P(A_i|X_i)}\, I\{A_i \neq d(X_i)\}$$
Problem: minimization over $d$ is computationally challenging.
Observation: it looks like a weighted zero-one loss.
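A minimal numeric sketch of these two quantities, assuming the propensities $P(A_i|X_i)$ are known and supplied as a vector; the helper names are ours:

```python
# IPW estimates of the Value V(d) and the weighted 0-1 risk of a fixed rule;
# d_x holds d(X_i) in {-1, +1}, propensity holds P(A_i | X_i).
import numpy as np

def value_ipw(d_x, A, Y, propensity):
    return np.mean(Y * (d_x == A) / propensity)

def risk_ipw(d_x, A, Y, propensity):
    return np.mean(Y * (d_x != A) / propensity)
```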
Review: Outcome weighted learning (O-learning)
Key idea: take a surrogate loss (Zhao et al., 2012, JASA).
For any $d$, let $d(\cdot) = \mathrm{sign}\{f(\cdot)\}$ for measurable $f$.
Note: $I\{A \neq d(X)\} = I(Af(X) < 0)$.
Consider
$$f^o = \operatorname*{argmin}_{f:\ \text{measurable}}\; E\left[\frac{Y}{P(A|X)} \cdot \phi(Af(X))\right],$$
where $\phi(t)$ is a convex surrogate loss of $I(t < 0)$, e.g., $\frac{1}{\log 2}\log(1 + e^{-t})$ (logistic loss) or $(1-t)_+$ (hinge loss).
Fisher consistency: $\mathrm{sign}(f^o) = d^*$ if $Y > 0$.
Summary: Outcome weighted learning
Avoids outcome regression (likelihood); optimizes the Value directly (0-1 loss).
Potential for flexibility and robustness.
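A minimal sketch of this surrogate minimization for a linear rule $f(x) = x^\top \beta$ with the logistic loss, assuming $Y > 0$ and known propensities; the optimizer choice and names are ours:

```python
# O-learning with the logistic surrogate phi(t) = log(1 + e^{-t}) / log 2;
# minimizes the weighted empirical surrogate risk over a linear rule.
import numpy as np
from scipy.optimize import minimize

def fit_olearning(X, A, Y, propensity):
    w = Y / propensity                           # outcome weights, assumed > 0

    def surrogate_risk(beta):
        margin = A * (X @ beta)                  # A_i f(X_i)
        return np.mean(w * np.log1p(np.exp(-margin))) / np.log(2)

    beta = minimize(surrogate_risk, np.zeros(X.shape[1]), method="BFGS").x
    return lambda x: np.sign(x @ beta)           # d_hat = sign{f_hat}
```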
Review: Outcome weighted learning (O-learning)
Sparse O-learning (Xu et al., 2015; Song et al., 2015)
$d^*(x) = \mathrm{sign}(x^\top \beta^*)$ with $\beta^*$ sparse; estimate $\hat d(x) = \mathrm{sign}(x^\top \hat\beta)$ with
$$\hat\beta = \operatorname*{argmin}_{\beta \in \mathbb{R}^p}\; \frac{1}{n}\sum_{i=1}^n \frac{Y_i}{P(A_i|X_i)}\, \phi(A_i X_i^\top \beta) + \lambda \|\beta\|_1.$$
Existing results: convergence rate and model selection consistency of $\hat\beta$ when $P(A_i|X_i)$ is known.
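With the logistic surrogate, the sparse O-learning objective is exactly an $\ell_1$-penalized logistic regression with labels $A_i$ and sample weights $Y_i/P(A_i|X_i)$, so a minimal sketch can reuse scikit-learn (here C plays the role of $1/(n\lambda)$; assumes $Y > 0$):

```python
# Sparse O-learning via weighted l1-penalized logistic regression.
import numpy as np
from sklearn.linear_model import LogisticRegression

def fit_sparse_olearning(X, A, Y, propensity, C=1.0):
    clf = LogisticRegression(penalty="l1", solver="liblinear",
                             fit_intercept=False, C=C)
    clf.fit(X, A, sample_weight=Y / propensity)   # labels A_i in {-1, +1}
    beta = clf.coef_.ravel()
    return beta, (lambda x: np.sign(x @ beta))
```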
Proposed method
Task: rigorous inference on $d^*(X)$
Framework: outcome weighted learning
Main contribution: an asymptotically valid test when $X$ is high-dimensional and $P(A|X)$ is unknown
Inference: Prelude
Hypothesis: partition $\beta = (\beta_1, \beta_2^\top)^\top$ and test
$$H_0: \beta^*_1 = 0 \quad \text{vs.} \quad H_1: \beta^*_1 \neq 0 \qquad (\text{recall } d^*(X) = \mathrm{sign}(X^\top \beta^*)).$$
Easily extends to testing a vector $\beta^*_1$.
Helps discover important biomarker(s) for the treatment effect.
Why inference? To quantify the uncertainty of discovery (p-value).
Challenge: high-dimensional inference is much more difficult than estimation.
Sparse estimators are not regular (the distribution of $\hat\beta$ is not well understood).
Inference: How?
Key idea: $\hat\beta$ is obtained from a penalized M-estimation,
$$\hat\beta = \operatorname*{argmin}_{\beta \in \mathbb{R}^p}\; \underbrace{\frac{1}{n}\sum_{i=1}^n \frac{Y_i}{P_{\hat\alpha}(A_i|X_i)}\, \phi(A_i X_i^\top \beta)}_{l_\beta(\beta;\, \hat\alpha)} + \lambda \|\beta\|_1.$$
Use the decorrelated score test (Ning and Liu, 2017, AoS), which applies to general penalized M-estimation with a convex differentiable loss.
Challenge again: the nuisance $\alpha$ is estimated from a propensity score model, e.g., penalized logistic regression with $A$ as the response,
$$\hat\alpha = \operatorname*{argmin}_{\alpha \in \mathbb{R}^p}\; \underbrace{\frac{1}{n}\sum_{i=1}^n \log\{1 + \exp(-A_i X_i^\top \alpha)\}}_{l_\alpha(\alpha)} + \lambda \|\alpha\|_1.$$
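A minimal sketch of this propensity step, again leaning on scikit-learn's $\ell_1$-penalized logistic regression; returning the fitted probability of the treatment actually received gives the weights $P_{\hat\alpha}(A_i|X_i)$ directly:

```python
# l1-penalized logistic regression of A on X, then fitted propensities
# P_alpha_hat(A_i | X_i) for the observed treatments.
import numpy as np
from sklearn.linear_model import LogisticRegression

def fit_propensity(X, A, C=1.0):
    clf = LogisticRegression(penalty="l1", solver="liblinear",
                             fit_intercept=False, C=C)
    clf.fit(X, A)                                     # labels in {-1, +1}
    p_plus = clf.predict_proba(X)[:, list(clf.classes_).index(1)]
    return np.where(A == 1, p_plus, 1.0 - p_plus)     # prob. of observed A_i
```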
Intuition on the decorrelated score test (w/o nuisance α)
Forget everything so far!
$l(\beta)$: a convex differentiable loss; $\beta^* := \operatorname*{argmin}_\beta E[l(\beta)]$; $H_0: \beta^*_1 = 0$.
Information matrix:
$$I^* := E\left[\frac{\partial^2}{\partial\beta\,\partial\beta^\top}\, l(\beta^*)\right] = \begin{pmatrix} I^*_{11} & I^*_{12} \\ I^*_{21} & I^*_{22} \end{pmatrix}.$$
Classical Rao's score is based on the profile score function $\frac{\partial}{\partial\beta_1} l(0, \hat\beta_2(0))$, where $\hat\beta_2(\beta_1) = \operatorname*{argmin}_{\beta_2} l(\beta_1, \beta_2)$:
$$\sqrt{n}\,\frac{\partial}{\partial\beta_1}\, l(0, \hat\beta_2(0)) = \sqrt{n}\left[\frac{\partial}{\partial\beta_1}\, l(0, \beta^*_2) - I^*_{12}\,(\hat\beta_2(0) - \beta^*_2)\right] + \mathrm{Rem} = \sqrt{n}\left[\frac{\partial}{\partial\beta_1}\, l(0, \beta^*_2) - \underbrace{I^*_{12}(I^*_{22})^{-1}}_{(w^*)^\top}\,\frac{\partial}{\partial\beta_2}\, l(0, \beta^*_2)\right] + \mathrm{Rem}.$$
This is problematic in high dimensions: Rem is not negligible if $p \to \infty$, or the limiting distribution is not tractable.
The decorrelated score function for $\beta_1$,
$$S(\beta_1, \beta_2) = \frac{\partial}{\partial\beta_1}\, l(\beta_1, \beta_2) - w^\top \frac{\partial}{\partial\beta_2}\, l(\beta_1, \beta_2), \qquad w^\top = I_{12} I_{22}^{-1},$$
is uncorrelated with the nuisance score functions. If $\beta^*$ and $w^*$ are sparse, then
$$\sqrt{n}\, S(0, \hat\beta_2(0), \hat w) = \sqrt{n}\, S(\beta^*, w^*) + o_P(1).$$
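A toy numeric check of the decorrelation idea, using a quadratic loss so that the cancellation is exact: perturbing only the nuisance coordinates leaves the decorrelated score unchanged. The data and dimensions here are illustrative.

```python
# Decorrelated score for beta_1 under the quadratic loss
# l(beta) = ||y - Z beta||^2 / (2n), whose information matrix is Z^T Z / n.
import numpy as np

rng = np.random.default_rng(0)
n, p = 500, 5
Z = rng.normal(size=(n, p))
beta_true = np.array([0.0, 1.0, -1.0, 0.5, 0.0])
y = Z @ beta_true + rng.normal(size=n)

I_hat = Z.T @ Z / n
w = np.linalg.solve(I_hat[1:, 1:], I_hat[1:, 0])    # w = I22^{-1} I21

def decorrelated_score(beta):
    grad = -Z.T @ (y - Z @ beta) / n                # full score vector
    return grad[0] - w @ grad[1:]                   # decorrelate beta_1

perturbed = beta_true + np.r_[0.0, 0.05 * rng.normal(size=p - 1)]
# Equal up to floating-point error: the nuisance direction cancels exactly.
print(decorrelated_score(beta_true), decorrelated_score(perturbed))
```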
How to deal with the nuisance α?
Decorrelated score test statistic with the additional nuisance parameter:
$$S(\beta, w, \alpha, \nu) := \frac{\partial}{\partial\beta_1}\, l_\beta(\beta; \alpha) - \underbrace{w^\top \frac{\partial}{\partial\beta_2}\, l_\beta(\beta; \alpha)}_{\text{accounting for nuisance } \beta_2} - \underbrace{\nu^\top \frac{\partial}{\partial\alpha}\, l_\alpha(\alpha)}_{\text{accounting for nuisance } \alpha}$$
$$w^* = E\left[\frac{\partial^2 l_\beta}{\partial\beta_2\,\partial\beta_2^\top}\right]^{-1} E\left[\frac{\partial}{\partial\beta_2}\frac{\partial}{\partial\beta_1}\, l_\beta\right], \qquad \nu^* = E\left[\frac{\partial^2 l_\alpha}{\partial\alpha\,\partial\alpha^\top}\right]^{-1} E\left[\frac{\partial}{\partial\alpha}\left(\frac{\partial}{\partial\beta_1}\, l_\beta - w^{*\top}\frac{\partial}{\partial\beta_2}\, l_\beta\right)\right]$$
Estimating $w^*$ and $\nu^*$: if $z = A^{-1}b$ and $z$ is sparse,
$$\hat z = \operatorname*{argmin}_z\; z^\top \hat A z - 2\hat b^\top z + \lambda \|z\|_1.$$
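A minimal ISTA (proximal gradient) sketch for this $\ell_1$-penalized quadratic program; the step size and iteration count are illustrative choices:

```python
# Solve min_z z^T A z - 2 b^T z + lam * ||z||_1 for symmetric PSD A;
# the smooth part has gradient 2(Az - b) with Lipschitz constant 2||A||_2.
import numpy as np

def soft_threshold(v, t):
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def sparse_quadratic(A, b, lam, n_iter=500):
    step = 1.0 / (2.0 * np.linalg.norm(A, 2))
    z = np.zeros_like(b)
    for _ in range(n_iter):
        z = soft_threshold(z - step * 2.0 * (A @ z - b), step * lam)
    return z
```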
The proposed test
Inference on the optimal ITR: $H_0: \beta^*_1 = 0$, with the null-restricted estimate
$$\hat\beta^{\text{null}} = \operatorname*{argmin}_{\beta_1 = 0}\; \frac{1}{n}\sum_{i=1}^n \frac{Y_i}{P_{\hat\alpha}(A_i|X_i)}\, \phi(A_i X_i^\top \beta) + \lambda \|\beta\|_1.$$
Theorem. With additional sparsity assumptions on $w^*$ and $\nu^*$, under $H_0: \beta^*_1 = 0$,
$$\sqrt{n}\, \hat S(\hat\beta^{\text{null}}, \hat w, \hat\alpha, \hat\nu) \overset{d}{\to} N(0, \sigma^2).$$
We applied the decorrelated score test to deal with high-dimensional $X$.
The estimated propensity $P_{\hat\alpha}(A|X)$ requires a modification of the test statistic.
Applicable to other penalized M-estimators with data-driven weights, e.g., IPW.
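Given the statistic and a consistent variance estimate from the theorem, the test itself is one line; a minimal sketch (names ours):

```python
# Two-sided p-value for H0: beta*_1 = 0 from the decorrelated score,
# using sqrt(n) * S_hat / sigma_hat ~ N(0, 1) under the null.
import numpy as np
from scipy.stats import norm

def decorrelated_score_pvalue(S_hat, sigma2_hat, n):
    T = np.sqrt(n) * S_hat / np.sqrt(sigma2_hat)
    return 2.0 * norm.sf(abs(T))
```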
Simulation & data analysis
Simulation design
Comparison
Proposed (O-learning + $\ell_1$ penalty + decorrelated score test)
Q-learning (Qian and Murphy, 2011) + $\ell_1$ penalty + decorrelated score test
O-learning + no penalty + classical Rao's score test
Generative model summary: $n = 250$, $p = 100$, $E(Y|X, A) = \mu(X) + A\,c(X)$, with active covariates (O) as follows:

          X1   X2   X3   X4   X5   X6   X7   X8-X100
c(X)      O    O    O    O
µ(X)      O    O              O    O
P(A|X)    O         O         O         O

* Recall $d^*(X) = \mathrm{sign}\{c(X)\}$.
Performance metrics
Empirical power: $H_{01}: \beta^*_1 = 0$ through $H_{04}: \beta^*_4 = 0$
Empirical type-I error: $H_{05}: \beta^*_5 = 0$ through $H_{08}: \beta^*_8 = 0$
Value function, $V(\hat d)$
Simulation results
$Y$ continuous: $Y_i = E(Y_i|X_i, A_i) + \epsilon_i$ with $\epsilon_i \sim N(0, 1/4)$

                    Value   Power                       Type-I errors
                    V(d)    H01    H02    H03    H04    H05    H06    H07    H08
True optimal        0.62
Proposed            0.60    .963   .948   .952   .948   .055   .056   .038   .045
Q-learn & Decor     0.59    .905   .914   .918   .906   .056   .049   .045   .051
O-learn & Rao       0.52    .006   .008   .010   .009   .001   .002   .001   .001
True anti-optimal   0.38

$Y$ binary: $Y_i \sim$ Bernoulli with $p_i = E(Y_i|X_i, A_i)$

                    Value   Power                       Type-I errors
                    V(d)    H01    H02    H03    H04    H05    H06    H07    H08
True optimal        0.62
Proposed            0.57    .834   .739   .678   .762   .042   .060   .053   .056
Q-learn & Decor     0.56    .668   .653   .676   .666   .045   .048   .057   .043
O-learn & Rao       0.51    .015   .003   .000   .007   .001   .001   .001   .002
True anti-optimal   0.38
Application to the Diabetes study
Elderly patients with Type-II diabetes with comorbidity
n = 9101 patients and p = 136 covariates
Covariates: Sociodemographics, disease history, lab measures at the baseline
Treatment: Hypoglycemic agent (+1) / None (−1)
Outcome: A1c < 8% one year later (1) / otherwise (0)
Application to the Diabetes study
The fitted ITR: $\hat d(X) = \mathrm{sign}(X^\top \hat\beta)$
Fitted coefficients (zeros omitted):

Covariate (X)                          Coef    p-value
(Intercept)                            0.119
Glucose control success at baseline   -0.161   0.01
Chronic kidney disease                 0.127   <0.01
Other chronic complication             0.142   0.06
Female                                 0.043   0.13
Eye disease                            0.021   0.35

Empirical Value on the testing set (100 random splits), measured as the frequency of A1c < 8% after one year:

As given in the dataset   All +1   Q-learn   O-learn + no penalty   Proposed
85.6%                     84.2%    86.7%     86.4%                  87.2%

* SD ≈ 1.0%
Concluding remarks & future works
Where did we contribute?
Challenges
Treatments are not randomized.
Covariates (predictors) are high-dimensional.
Goal
Develop a principled, robust, data-driven method for discovering ITRs.
Contributions
Method: hypothesis testing on ITRs under a model-robust framework
Data-driven rules → knowledge
Theory: an inferential technique for penalized M-estimators with nuisance parameters
Work in the near future
Asymptotic tests and confidence regions for penalized M-estimators involving nuisance estimation
Augmented estimators
Convex non-differentiable losses
Inference on dynamic treatment regimes (DTRs)
A DTR optimizes a sequence of decision rules.
Important features may differ across decisions.
Idea: the optimal DTR minimizes a multivariate surrogate loss function (Zhao et al., 2015b, 2018).
Acknowledgement
Joint work with Young-Geun Choi, Yang Ning, and Maureen Smith
Organizers
NIH grant support, R01DK108073
Contact: yqzhao@fredhutch.org
THANK YOU!
Main result, full version
Adjusted decorrelated score test statistic:
$$S(\beta, w, \alpha, \nu) := \frac{\partial}{\partial\beta_1}\, l_\beta(\beta; \alpha) - w^\top \frac{\partial}{\partial\beta_2}\, l_\beta(\beta; \alpha) - \nu^\top \frac{\partial}{\partial\alpha}\, l_\alpha(\alpha)$$
Theorem. Assume that the outcome $Y = \mu(X) + A\,c(X) + \epsilon$ satisfies:
(1) $c(X) = h(X^\top \beta^*)$, where $h$ is continuous with $h(\cdot) > 0$ on $(0, \infty)$ and $h(\cdot) < 0$ on $(-\infty, 0)$; $\mathrm{logit}\{P(A = 1|X)\} = X^\top \alpha^*$, i.e., the propensity model is correctly specified;
(2) $\beta^*$, $w^*$, $\alpha^*$, and $\nu^*$ are sparse;
(3) $(\text{maximum sparsity}) \cdot (\log p)/\sqrt{n} \to 0$, with tuning parameters of order $\sqrt{\log p / n}$;
(4) the loss function satisfies restricted eigenvalue conditions;
(5) $E(X^\top b \mid X^\top \beta^*)$ is linear in $X^\top \beta^*$ and $E(X) = 0$; and
(6) $\epsilon$ is sub-exponential, $\epsilon \perp (X, A)$, and $0 < P(A|X) < 1$.
Then, under the null hypothesis $H_0: \beta^*_1 = 0$,
(a) $\sqrt{n}\, \hat S(\hat\beta^{\text{null}}, \hat w, \hat\alpha, \hat\nu) \overset{d}{\to} N(0, \sigma^2)$;
(b) $\sigma^2 = \mathrm{Var}\{S(\beta^*, w^*, \alpha^*, \nu^*)\}$ is consistently estimated.
How was the theorem derived?
$$\sqrt{n}\,\frac{\partial}{\partial\beta_1}\, l_\beta(\hat\beta^{\text{null}}; \hat\alpha) \approx \sqrt{n}\left[\frac{\partial}{\partial\beta_1}\, l_\beta(\beta^*; \alpha^*) - (w^*)^\top \frac{\partial}{\partial\beta_2}\, l_\beta(\beta^*; \alpha^*) - (\nu^*)^\top \frac{\partial}{\partial\alpha}\, l_\alpha(\alpha^*)\right],$$
where
$$w^* = E\left[\frac{\partial^2 l_\beta}{\partial\beta_2\,\partial\beta_2^\top}\right]^{-1} E\left[\frac{\partial}{\partial\beta_2}\frac{\partial}{\partial\beta_1}\, l_\beta\right], \qquad \nu^* = E\left[\frac{\partial^2 l_\alpha}{\partial\alpha\,\partial\alpha^\top}\right]^{-1} E\left[\frac{\partial}{\partial\alpha}\left(\frac{\partial}{\partial\beta_1}\, l_\beta - w^{*\top}\frac{\partial}{\partial\beta_2}\, l_\beta\right)\right],$$
with $l_\beta = l_\beta(\beta^*; \alpha^*)$ and $l_\alpha = l_\alpha(\alpha^*)$.
Estimating $w^*$ and $\nu^*$: if $z = A^{-1}b$ and $z$ is sparse,
$$\hat z = \operatorname*{argmin}_z\; z^\top \hat A z - 2\hat b^\top z + \lambda \|z\|_1.$$
Complete procedure
$$\hat\alpha = \operatorname*{argmin}_\alpha\; l_\alpha(\alpha) + P(\alpha)$$
$$\hat\beta^{\text{null}} = \operatorname*{argmin}_{\beta_1 = 0}\; l_\beta(\beta; \hat\alpha) + P(\beta)$$
$$\hat w = \operatorname*{argmin}_w\; w^\top \hat E\left[\frac{\partial^2 l_\beta}{\partial\beta_2\,\partial\beta_2^\top}\right] w - 2\,\hat E\left[\frac{\partial}{\partial\beta_2}\frac{\partial}{\partial\beta_1}\, l_\beta\right]^\top w + P(w)$$
$$\hat\nu = \operatorname*{argmin}_\nu\; \nu^\top \hat E\left[\frac{\partial^2 l_\alpha}{\partial\alpha\,\partial\alpha^\top}\right] \nu - 2\,\hat E\left[\frac{\partial}{\partial\alpha}\left(\frac{\partial}{\partial\beta_1}\, l_\beta - \hat w^\top \frac{\partial}{\partial\beta_2}\, l_\beta\right)\right]^\top \nu + P(\nu)$$
Tuning parameter selection
5-fold CV.
$\hat\alpha$, $\hat w$, $\hat\nu$: chosen by minimizing predictive loss functions.
$\hat\beta^{\text{null}}$: chosen by maximizing the predictive Value function,
$$V(d) = E\left[Y \cdot \frac{I\{A = d(X)\}}{P(A|X)}\right], \qquad \hat V(d) = \frac{\frac{1}{n}\sum_{i=1}^n Y_i\, I\{A_i = d(X_i)\}/\hat P(A_i|X_i)}{\frac{1}{n}\sum_{i=1}^n I\{A_i = d(X_i)\}/\hat P(A_i|X_i)},$$
with $d$ evaluated on the training set, and the data and $\hat P$ taken from the testing set.
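A minimal sketch of this normalized IPW Value estimate on a held-out fold, with p_hat the fitted propensity of the treatment actually received:

```python
# Normalized IPW estimate of V(d) on a test fold; d_x holds d(X_i).
import numpy as np

def value_hat(d_x, A, Y, p_hat):
    agree = (d_x == A) / p_hat
    return np.sum(Y * agree) / np.sum(agree)
```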
More details on the simulation
Q-learning (Qian and Murphy, 2011)
Fit
$$\operatorname*{argmin}_{\gamma, \beta}\; \frac{1}{n}\sum_{i=1}^n \left(Y_i - \Phi(X_i)^\top \gamma - A_i X_i^\top \beta\right)^2 + P(\gamma, \beta).$$
Estimated decision: $\hat d(x) = \mathrm{sign}(x^\top \hat\beta)$.
Significance testing: Qian and Murphy (2011) did not develop a test, but the decorrelated score test (Ning and Liu, 2017) can be used, since that paper provides the test for penalized least squares.
More details on the simulation
Generative model: $n = 250$, $p = 100$.
Signal structure (recall $E(Y|X, A) = \mu(X) + A\,c(X)$):
$X_{ij} \sim \mathrm{Bernoulli}(0.3)$
$c(X) = 0.7 \cdot (X_1 + X_2 - X_3 - X_4)/25 + 1/2$
$\mu(X) = (X_1 + X_2 - X_5 - X_6)/25 + 1/2$
$\mathrm{logit}\{P(A = 1|X)\} = 0.4 \cdot (X_1 + X_3 - X_5 - X_7)$
Note: $0 < E(Y|X, A) < 1$.
If $Y$ is continuous, $Y_i = E(Y_i|X_i, A_i) + \epsilon_i$ with $\epsilon_i \sim N(0, 1/9)$.
If $Y$ is binary, sample $Y_i \sim \mathrm{Bernoulli}$ with $p_i = E(Y_i|X_i, A_i)$.
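A minimal sketch generating one replicate of the continuous-outcome design exactly as stated above (variance 1/9 means standard deviation 1/3):

```python
# One simulated dataset from the stated generative model.
import numpy as np

rng = np.random.default_rng(0)
n, p = 250, 100
X = rng.binomial(1, 0.3, size=(n, p)).astype(float)  # X_ij ~ Bernoulli(0.3)

c = 0.7 * (X[:, 0] + X[:, 1] - X[:, 2] - X[:, 3]) / 25 + 0.5
mu = (X[:, 0] + X[:, 1] - X[:, 4] - X[:, 5]) / 25 + 0.5
prob_trt = 1.0 / (1.0 + np.exp(-0.4 * (X[:, 0] + X[:, 2] - X[:, 4] - X[:, 6])))

A = np.where(rng.uniform(size=n) < prob_trt, 1, -1)
Y = mu + A * c + rng.normal(0.0, 1.0 / 3.0, size=n)  # eps_i ~ N(0, 1/9)
```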
More simulation
Propensity misspecified:
$$\mathrm{logit}\{P(A = 1|X)\} = 0.685\exp(0.2X_1^2 + 0.3X_3^2) + 0.03(0.5 - X_5^2 - X_7^2)(X_5^2 + X_7^2 - 0.3) - 0.73$$

                    Value   Power                       Size
                    V(d)    H01    H02    H03    H04    H05    H06    H07    H08
True optimal        0.64
Proposed            0.50    .959   .959   .952   .957   .062   .046   .060   .039
Q-learn & Decor     0.44    .915   .908   .910   .912   .032   .048   .046   .051

Main effect misspecified:
$$\mu(X) = 0.685\exp(0.2X_1^2 + 0.3X_2^2) + 0.03(0.5 - X_5^2 - X_6^2)(X_5^2 + X_6^2 - 0.3) - 0.73$$

True optimal        0.64
Proposed            0.47    .928   .885   .925   .917   .085   .041   .049   .036
Q-learn & Decor     0.40    .881   .814   .822   .847   .090   .046   .054   .045

Treatment effect misspecified:
$$c(X) = 0.685\exp(0.2X_1^2 + 0.3X_2^2) + 0.03(0.5 - X_3^2 - X_4^2)(X_3^2 + X_4^2 - 0.3) - 0.73$$

True optimal        0.40
Proposed            0.18    .040   .997   .055   .035   .053   .050   .041   .051
Q-learn & Decor     0.13    .039   .984   .043   .042   .055   .053   .051   .053

** 1000 replications. 1.96·se(Value) < 0.007; 1.96·se(power) and 1.96·se(size) ≈ 0.014.
More details on the data analysis
Data selection criteria
Patients' EHRs were linked to claims and enrollment files from Medicare.
Patients were included if they
met a validated claims-based algorithm for identifying patients with diabetes, and
were medically homed, per an established plurality-provider algorithm, at the participating large, Midwestern, multi-specialty provider group.
Patients were included for each 90-day quarter from 2003-2011 in which they were alive at the start of the quarter, had continuous Medicare Part A & B fee-for-service coverage, and met the medical home criteria above.
Overview of O-learning
Zhao et al. (2012):
$$\hat d(\cdot) := \mathrm{sign}\{\hat f(\cdot)\}, \qquad \hat f = \operatorname*{argmin}_{f \in \mathcal{F}}\; \frac{1}{n}\sum_{i=1}^n \frac{Y_i}{P(A_i|X_i)}\, \phi(A_i f(X_i)) + \lambda \|f\|^2$$
Literature: changing the outcome weight to reduce the variability of the empirical risk.

Paper                 Outcome weight    Choice of g(X)
Zhou et al. (2017)    Y − g(X)          µ(X)
Liu et al. (2016)     |Y − g(X)|        E(Y|X)

Other advances: multi-stage decision rules (Zhao et al., 2015a), censored outcomes (Zhao et al., 2015b), tree-based decisions (Laber and Zhao, 2015), penalized linear decisions (Song et al., 2015; Xu et al., 2015), . . .
Fisher consistency
$$\hat\beta = \operatorname*{argmin}_{\beta \in \mathbb{R}^p}\; \underbrace{l_\beta(\beta)}_{\text{empirical weighted surrogate convex loss}} + P(\beta), \qquad \beta^o = \operatorname*{argmin}_{\beta \in \mathbb{R}^p}\; E\,[\text{weighted surrogate convex loss}], \qquad \beta^* = \operatorname*{argmin}_{\|\beta\|_2 = 1}\; E\,[\text{weighted 0-1 loss}].$$
Question: $\hat\beta \to \beta^o$ (under certain conditions), but we are interested in $\beta^*_1$. How do we connect $\hat\beta$ and $\beta^*$?
Fisher consistency implies $\beta^o = k\beta^*$ for some $k > 0$; thus $\beta^*_1 = 0 \iff \beta^o_1 = 0$.
Reduced problem: test $\beta^o_1 = 0$ with the convex loss function $l_\beta(\beta)$.
Now we can work as if we had a negative log-likelihood with true parameter $\beta^o$.
Sparse models on main effect & propensity
Main effect
Fact: $\mu(X) = E\left[\frac{Y}{2P(A|X)} \,\middle|\, X\right]$.
Assume $\mu(X) = X^\top \gamma^*$ with $\gamma^*$ sparse:
$$\hat\gamma = \operatorname*{argmin}_{\gamma \in \mathbb{R}^p}\; \frac{1}{n}\sum_{i=1}^n \left(\frac{Y_i}{2P(A_i|X_i)} - X_i^\top \gamma\right)^2 + P(\gamma)$$
Propensity
Assume $\mathrm{logit}\{P(A = 1|X)\} = X^\top \alpha^*$ with $\alpha^*$ sparse; fit by (penalized) logistic likelihood, balancing equations, etc.:
$$\hat\alpha = \operatorname*{argmin}_{\alpha \in \mathbb{R}^p}\; \frac{1}{n}\sum_{i=1}^n \log\{1 + \exp(-A_i X_i^\top \alpha)\} + P(\alpha)$$
Modified estimation
$$\hat\alpha = \operatorname*{argmin}_{\alpha \in \mathbb{R}^p}\; \underbrace{\frac{1}{n}\sum_{i=1}^n \log\{1 + \exp(-A_i X_i^\top \alpha)\}}_{=:\, l_\alpha(\alpha)} + P(\alpha)$$
$$\hat\gamma = \operatorname*{argmin}_{\gamma \in \mathbb{R}^p}\; \underbrace{\frac{1}{n}\sum_{i=1}^n \left(\frac{Y_i}{2P_{\hat\alpha}(A_i|X_i)} - X_i^\top \gamma\right)^2}_{=:\, l_\gamma(\gamma;\, \hat\alpha)} + P(\gamma)$$
$$\hat\beta^{\text{null}} = \operatorname*{argmin}_{\beta_1 = 0}\; \underbrace{\frac{1}{n}\sum_{i=1}^n \frac{|Y_i - X_i^\top \hat\gamma|}{P_{\hat\alpha}(A_i|X_i)}\, \phi\!\left(A_i\, \mathrm{sign}(Y_i - X_i^\top \hat\gamma)\, X_i^\top \beta\right)}_{=:\, l_\beta(\beta;\, \hat\gamma,\, \hat\alpha)} + P(\beta)$$
Challenge: $\hat\alpha$ and $\hat\gamma$ induce further variability in $l_\beta$.
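Putting the three steps together, a minimal pipeline sketch, reusing the fit_propensity helper sketched earlier and scikit-learn for the Lasso and weighted classification steps; dropping the first column of X is one way to impose the null restriction $\beta_1 = 0$, and all names and tuning constants are illustrative:

```python
# Modified estimation pipeline: (1) propensity, (2) main effect by a Lasso
# of Y / (2 P_hat) on X, (3) residual-weighted sparse O-learning with
# labels A_i * sign(Y_i - X_i^T gamma_hat) and X_1 dropped (beta_1 = 0).
import numpy as np
from sklearn.linear_model import Lasso, LogisticRegression

def modified_estimation(X, A, Y, C=1.0, lasso_alpha=0.01):
    p_hat = fit_propensity(X, A, C=C)                 # step 1 (sketched earlier)
    lasso = Lasso(alpha=lasso_alpha, fit_intercept=False)
    lasso.fit(X, Y / (2.0 * p_hat))                   # step 2: mu(X) = X^T gamma
    resid = Y - X @ lasso.coef_
    clf = LogisticRegression(penalty="l1", solver="liblinear",
                             fit_intercept=False, C=C)
    clf.fit(X[:, 1:], (A * np.sign(resid)).astype(int),
            sample_weight=np.abs(resid) / p_hat)      # step 3
    return lasso.coef_, clf.coef_.ravel()
```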
References I
Laber, E. B. and Zhao, Y. Q. (2015). Tree-based methods for individualized treatment regimes.
Biometrika, 102(3):501–514.
Liu, Y., Wang, Y., Kosorok, M. R., Zhao, Y., and Zeng, D. (2016). Robust hybrid learning for estimating personalized dynamic treatment regimens, pages 1–42.
Ning, Y. and Liu, H. (2017). A general theory of hypothesis tests and confidence regions for
sparse high dimensional models. The Annals of Statistics, 45(1):158–195.
Qian, M. and Murphy, S. A. (2011). Performance guarantees for individualized treatment rules.
The Annals of Statistics, 39(2):1180–1210.
Song, R., Kosorok, M., Zeng, D., Zhao, Y., Laber, E., and Yuan, M. (2015). On sparse
representation for optimal individualized treatment selection with penalized outcome weighted
learning. Stat, 4(1):59–68.
Xu, Y., Yu, M., Zhao, Y. Q., Li, Q., Wang, S., and Shao, J. (2015). Regularized outcome
weighted subgroup identification for differential treatment effects. Biometrics, 71(3):645–653.
Zhao, Y., Zeng, D., Laber, E. B., and Kosorok, M. R. (2015a). New statistical learning methods
for estimating optimal dynamic treatment regimes. Journal of the American Statistical
Association, 110(510):583–598.
Zhao, Y., Zeng, D., Rush, A. J., and Kosorok, M. R. (2012). Estimating individualized treatment rules using outcome weighted learning. Journal of the American Statistical Association, 107(499):1106–1118.
References II
Zhao, Y. Q., Zeng, D., Laber, E. B., Song, R., Yuan, M., and Kosorok, M. R. (2015b). Doubly
robust learning for estimating individualized treatment with censored data. Biometrika,
102(1):151–168.
Zhao, Y.-Q., Zhu, R., Chen, G., and Zheng, Y. (2018). Constructing stabilized dynamic
treatment regimes. arXiv preprint arXiv:1808.01332.
Zhou, X., Mayer-Hamblett, N., Khan, U., and Kosorok, M. R. (2017). Residual weighted learning for estimating individualized treatment rules. Journal of the American Statistical Association, 112(517):169–187.

More Related Content

What's hot

Generative models : VAE and GAN
Generative models : VAE and GANGenerative models : VAE and GAN
Generative models : VAE and GANSEMINARGROOT
 
A Gentle Introduction to Bayesian Nonparametrics
A Gentle Introduction to Bayesian NonparametricsA Gentle Introduction to Bayesian Nonparametrics
A Gentle Introduction to Bayesian NonparametricsJulyan Arbel
 
A Gentle Introduction to Bayesian Nonparametrics
A Gentle Introduction to Bayesian NonparametricsA Gentle Introduction to Bayesian Nonparametrics
A Gentle Introduction to Bayesian NonparametricsJulyan Arbel
 
Non-parametric analysis of models and data
Non-parametric analysis of models and dataNon-parametric analysis of models and data
Non-parametric analysis of models and datahaharrington
 
Uncoupled Regression from Pairwise Comparison Data
Uncoupled Regression from Pairwise Comparison DataUncoupled Regression from Pairwise Comparison Data
Uncoupled Regression from Pairwise Comparison DataLiyuan Xu
 
Chapter2: Likelihood-based approach
Chapter2: Likelihood-based approach Chapter2: Likelihood-based approach
Chapter2: Likelihood-based approach Jae-kwang Kim
 
Dependent processes in Bayesian Nonparametrics
Dependent processes in Bayesian NonparametricsDependent processes in Bayesian Nonparametrics
Dependent processes in Bayesian NonparametricsJulyan Arbel
 
Bayesian Nonparametrics, Applications to biology, ecology, and marketing
Bayesian Nonparametrics, Applications to biology, ecology, and marketingBayesian Nonparametrics, Applications to biology, ecology, and marketing
Bayesian Nonparametrics, Applications to biology, ecology, and marketingJulyan Arbel
 
Bayesian Nonparametrics: Models Based on the Dirichlet Process
Bayesian Nonparametrics: Models Based on the Dirichlet ProcessBayesian Nonparametrics: Models Based on the Dirichlet Process
Bayesian Nonparametrics: Models Based on the Dirichlet ProcessAlessandro Panella
 
Formulation of model likelihood functions
Formulation of model likelihood functionsFormulation of model likelihood functions
Formulation of model likelihood functionsAndreas Scheidegger
 
効率的反実仮想学習
効率的反実仮想学習効率的反実仮想学習
効率的反実仮想学習Masa Kato
 
Introduction to Supervised ML Concepts and Algorithms
Introduction to Supervised ML Concepts and AlgorithmsIntroduction to Supervised ML Concepts and Algorithms
Introduction to Supervised ML Concepts and AlgorithmsNBER
 
Big Data Analysis
Big Data AnalysisBig Data Analysis
Big Data AnalysisNBER
 

What's hot (20)

Generative models : VAE and GAN
Generative models : VAE and GANGenerative models : VAE and GAN
Generative models : VAE and GAN
 
A Gentle Introduction to Bayesian Nonparametrics
A Gentle Introduction to Bayesian NonparametricsA Gentle Introduction to Bayesian Nonparametrics
A Gentle Introduction to Bayesian Nonparametrics
 
2019 PMED Spring Course - Single Decision Treatment Regimes: Additional Metho...
2019 PMED Spring Course - Single Decision Treatment Regimes: Additional Metho...2019 PMED Spring Course - Single Decision Treatment Regimes: Additional Metho...
2019 PMED Spring Course - Single Decision Treatment Regimes: Additional Metho...
 
A Gentle Introduction to Bayesian Nonparametrics
A Gentle Introduction to Bayesian NonparametricsA Gentle Introduction to Bayesian Nonparametrics
A Gentle Introduction to Bayesian Nonparametrics
 
Non-parametric analysis of models and data
Non-parametric analysis of models and dataNon-parametric analysis of models and data
Non-parametric analysis of models and data
 
Uncoupled Regression from Pairwise Comparison Data
Uncoupled Regression from Pairwise Comparison DataUncoupled Regression from Pairwise Comparison Data
Uncoupled Regression from Pairwise Comparison Data
 
Propensity albert
Propensity albertPropensity albert
Propensity albert
 
Chapter2: Likelihood-based approach
Chapter2: Likelihood-based approach Chapter2: Likelihood-based approach
Chapter2: Likelihood-based approach
 
Dependent processes in Bayesian Nonparametrics
Dependent processes in Bayesian NonparametricsDependent processes in Bayesian Nonparametrics
Dependent processes in Bayesian Nonparametrics
 
ISBA 2016: Foundations
ISBA 2016: FoundationsISBA 2016: Foundations
ISBA 2016: Foundations
 
Bayesian Nonparametrics, Applications to biology, ecology, and marketing
Bayesian Nonparametrics, Applications to biology, ecology, and marketingBayesian Nonparametrics, Applications to biology, ecology, and marketing
Bayesian Nonparametrics, Applications to biology, ecology, and marketing
 
Causal Inference Opening Workshop - Some Applications of Reinforcement Learni...
Causal Inference Opening Workshop - Some Applications of Reinforcement Learni...Causal Inference Opening Workshop - Some Applications of Reinforcement Learni...
Causal Inference Opening Workshop - Some Applications of Reinforcement Learni...
 
Bayesian Nonparametrics: Models Based on the Dirichlet Process
Bayesian Nonparametrics: Models Based on the Dirichlet ProcessBayesian Nonparametrics: Models Based on the Dirichlet Process
Bayesian Nonparametrics: Models Based on the Dirichlet Process
 
Formulation of model likelihood functions
Formulation of model likelihood functionsFormulation of model likelihood functions
Formulation of model likelihood functions
 
効率的反実仮想学習
効率的反実仮想学習効率的反実仮想学習
効率的反実仮想学習
 
Introduction to Supervised ML Concepts and Algorithms
Introduction to Supervised ML Concepts and AlgorithmsIntroduction to Supervised ML Concepts and Algorithms
Introduction to Supervised ML Concepts and Algorithms
 
Causal Inference Opening Workshop - New Statistical Learning Methods for Esti...
Causal Inference Opening Workshop - New Statistical Learning Methods for Esti...Causal Inference Opening Workshop - New Statistical Learning Methods for Esti...
Causal Inference Opening Workshop - New Statistical Learning Methods for Esti...
 
Big Data Analysis
Big Data AnalysisBig Data Analysis
Big Data Analysis
 
Side 2019 #9
Side 2019 #9Side 2019 #9
Side 2019 #9
 
PMED Transition Workshop - Estimation & Optimization of Composite Outcomes - ...
PMED Transition Workshop - Estimation & Optimization of Composite Outcomes - ...PMED Transition Workshop - Estimation & Optimization of Composite Outcomes - ...
PMED Transition Workshop - Estimation & Optimization of Composite Outcomes - ...
 

Similar to PMED Opening Workshop - Inference on Individualized Treatment Rules from Observational Studies with High-Dimensional Covariates - Yingi Zhao, August 14, 2018

Slides: Hypothesis testing, information divergence and computational geometry
Slides: Hypothesis testing, information divergence and computational geometrySlides: Hypothesis testing, information divergence and computational geometry
Slides: Hypothesis testing, information divergence and computational geometryFrank Nielsen
 
Maximum likelihood estimation of regularisation parameters in inverse problem...
Maximum likelihood estimation of regularisation parameters in inverse problem...Maximum likelihood estimation of regularisation parameters in inverse problem...
Maximum likelihood estimation of regularisation parameters in inverse problem...Valentin De Bortoli
 
Learning to discover monte carlo algorithm on spin ice manifold
Learning to discover monte carlo algorithm on spin ice manifoldLearning to discover monte carlo algorithm on spin ice manifold
Learning to discover monte carlo algorithm on spin ice manifoldKai-Wen Zhao
 
Understanding variable importances in forests of randomized trees
Understanding variable importances in forests of randomized treesUnderstanding variable importances in forests of randomized trees
Understanding variable importances in forests of randomized treesGilles Louppe
 
Reject Inference in Credit Scoring
Reject Inference in Credit ScoringReject Inference in Credit Scoring
Reject Inference in Credit ScoringAdrien Ehrhardt
 
Statistical Decision Theory
Statistical Decision TheoryStatistical Decision Theory
Statistical Decision TheorySangwoo Mo
 
Module - 2 Discrete Mathematics and Graph Theory
Module - 2 Discrete Mathematics and Graph TheoryModule - 2 Discrete Mathematics and Graph Theory
Module - 2 Discrete Mathematics and Graph TheoryAdhiyaman Manickam
 
RuleML2015: Input-Output STIT Logic for Normative Systems
RuleML2015: Input-Output STIT Logic for Normative SystemsRuleML2015: Input-Output STIT Logic for Normative Systems
RuleML2015: Input-Output STIT Logic for Normative SystemsRuleML
 
slides_online_optimization_david_mateos
slides_online_optimization_david_mateosslides_online_optimization_david_mateos
slides_online_optimization_david_mateosDavid Mateos
 
Introduction to FDA and linear models
 Introduction to FDA and linear models Introduction to FDA and linear models
Introduction to FDA and linear modelstuxette
 

Similar to PMED Opening Workshop - Inference on Individualized Treatment Rules from Observational Studies with High-Dimensional Covariates - Yingi Zhao, August 14, 2018 (20)

Slides ACTINFO 2016
Slides ACTINFO 2016Slides ACTINFO 2016
Slides ACTINFO 2016
 
Proba stats-r1-2017
Proba stats-r1-2017Proba stats-r1-2017
Proba stats-r1-2017
 
Slides: Hypothesis testing, information divergence and computational geometry
Slides: Hypothesis testing, information divergence and computational geometrySlides: Hypothesis testing, information divergence and computational geometry
Slides: Hypothesis testing, information divergence and computational geometry
 
Maximum likelihood estimation of regularisation parameters in inverse problem...
Maximum likelihood estimation of regularisation parameters in inverse problem...Maximum likelihood estimation of regularisation parameters in inverse problem...
Maximum likelihood estimation of regularisation parameters in inverse problem...
 
Econometrics 2017-graduate-3
Econometrics 2017-graduate-3Econometrics 2017-graduate-3
Econometrics 2017-graduate-3
 
Learning to discover monte carlo algorithm on spin ice manifold
Learning to discover monte carlo algorithm on spin ice manifoldLearning to discover monte carlo algorithm on spin ice manifold
Learning to discover monte carlo algorithm on spin ice manifold
 
lec2_CS540_handouts.pdf
lec2_CS540_handouts.pdflec2_CS540_handouts.pdf
lec2_CS540_handouts.pdf
 
Understanding variable importances in forests of randomized trees
Understanding variable importances in forests of randomized treesUnderstanding variable importances in forests of randomized trees
Understanding variable importances in forests of randomized trees
 
Side 2019, part 1
Side 2019, part 1Side 2019, part 1
Side 2019, part 1
 
Classification
ClassificationClassification
Classification
 
Reject Inference in Credit Scoring
Reject Inference in Credit ScoringReject Inference in Credit Scoring
Reject Inference in Credit Scoring
 
Statistical Decision Theory
Statistical Decision TheoryStatistical Decision Theory
Statistical Decision Theory
 
Madrid easy
Madrid easyMadrid easy
Madrid easy
 
ppt0320defenseday
ppt0320defensedayppt0320defenseday
ppt0320defenseday
 
Module - 2 Discrete Mathematics and Graph Theory
Module - 2 Discrete Mathematics and Graph TheoryModule - 2 Discrete Mathematics and Graph Theory
Module - 2 Discrete Mathematics and Graph Theory
 
QMC: Transition Workshop - Importance Sampling the Union of Rare Events with ...
QMC: Transition Workshop - Importance Sampling the Union of Rare Events with ...QMC: Transition Workshop - Importance Sampling the Union of Rare Events with ...
QMC: Transition Workshop - Importance Sampling the Union of Rare Events with ...
 
QMC Program: Trends and Advances in Monte Carlo Sampling Algorithms Workshop,...
QMC Program: Trends and Advances in Monte Carlo Sampling Algorithms Workshop,...QMC Program: Trends and Advances in Monte Carlo Sampling Algorithms Workshop,...
QMC Program: Trends and Advances in Monte Carlo Sampling Algorithms Workshop,...
 
RuleML2015: Input-Output STIT Logic for Normative Systems
RuleML2015: Input-Output STIT Logic for Normative SystemsRuleML2015: Input-Output STIT Logic for Normative Systems
RuleML2015: Input-Output STIT Logic for Normative Systems
 
slides_online_optimization_david_mateos
slides_online_optimization_david_mateosslides_online_optimization_david_mateos
slides_online_optimization_david_mateos
 
Introduction to FDA and linear models
 Introduction to FDA and linear models Introduction to FDA and linear models
Introduction to FDA and linear models
 

More from The Statistical and Applied Mathematical Sciences Institute

More from The Statistical and Applied Mathematical Sciences Institute (20)

Causal Inference Opening Workshop - Latent Variable Models, Causal Inference,...
Causal Inference Opening Workshop - Latent Variable Models, Causal Inference,...Causal Inference Opening Workshop - Latent Variable Models, Causal Inference,...
Causal Inference Opening Workshop - Latent Variable Models, Causal Inference,...
 
2019 Fall Series: Special Guest Lecture - 0-1 Phase Transitions in High Dimen...
2019 Fall Series: Special Guest Lecture - 0-1 Phase Transitions in High Dimen...2019 Fall Series: Special Guest Lecture - 0-1 Phase Transitions in High Dimen...
2019 Fall Series: Special Guest Lecture - 0-1 Phase Transitions in High Dimen...
 
Causal Inference Opening Workshop - Causal Discovery in Neuroimaging Data - F...
Causal Inference Opening Workshop - Causal Discovery in Neuroimaging Data - F...Causal Inference Opening Workshop - Causal Discovery in Neuroimaging Data - F...
Causal Inference Opening Workshop - Causal Discovery in Neuroimaging Data - F...
 
Causal Inference Opening Workshop - Smooth Extensions to BART for Heterogeneo...
Causal Inference Opening Workshop - Smooth Extensions to BART for Heterogeneo...Causal Inference Opening Workshop - Smooth Extensions to BART for Heterogeneo...
Causal Inference Opening Workshop - Smooth Extensions to BART for Heterogeneo...
 
Causal Inference Opening Workshop - A Bracketing Relationship between Differe...
Causal Inference Opening Workshop - A Bracketing Relationship between Differe...Causal Inference Opening Workshop - A Bracketing Relationship between Differe...
Causal Inference Opening Workshop - A Bracketing Relationship between Differe...
 
Causal Inference Opening Workshop - Testing Weak Nulls in Matched Observation...
Causal Inference Opening Workshop - Testing Weak Nulls in Matched Observation...Causal Inference Opening Workshop - Testing Weak Nulls in Matched Observation...
Causal Inference Opening Workshop - Testing Weak Nulls in Matched Observation...
 
Causal Inference Opening Workshop - Difference-in-differences: more than meet...
Causal Inference Opening Workshop - Difference-in-differences: more than meet...Causal Inference Opening Workshop - Difference-in-differences: more than meet...
Causal Inference Opening Workshop - Difference-in-differences: more than meet...
 
Causal Inference Opening Workshop - Bipartite Causal Inference with Interfere...
Causal Inference Opening Workshop - Bipartite Causal Inference with Interfere...Causal Inference Opening Workshop - Bipartite Causal Inference with Interfere...
Causal Inference Opening Workshop - Bipartite Causal Inference with Interfere...
 
Causal Inference Opening Workshop - Bridging the Gap Between Causal Literatur...
Causal Inference Opening Workshop - Bridging the Gap Between Causal Literatur...Causal Inference Opening Workshop - Bridging the Gap Between Causal Literatur...
Causal Inference Opening Workshop - Bridging the Gap Between Causal Literatur...
 
Causal Inference Opening Workshop - Bracketing Bounds for Differences-in-Diff...
Causal Inference Opening Workshop - Bracketing Bounds for Differences-in-Diff...Causal Inference Opening Workshop - Bracketing Bounds for Differences-in-Diff...
Causal Inference Opening Workshop - Bracketing Bounds for Differences-in-Diff...
 
Causal Inference Opening Workshop - Assisting the Impact of State Polcies: Br...
Causal Inference Opening Workshop - Assisting the Impact of State Polcies: Br...Causal Inference Opening Workshop - Assisting the Impact of State Polcies: Br...
Causal Inference Opening Workshop - Assisting the Impact of State Polcies: Br...
 
Causal Inference Opening Workshop - Experimenting in Equilibrium - Stefan Wag...
Causal Inference Opening Workshop - Experimenting in Equilibrium - Stefan Wag...Causal Inference Opening Workshop - Experimenting in Equilibrium - Stefan Wag...
Causal Inference Opening Workshop - Experimenting in Equilibrium - Stefan Wag...
 
Causal Inference Opening Workshop - Targeted Learning for Causal Inference Ba...
Causal Inference Opening Workshop - Targeted Learning for Causal Inference Ba...Causal Inference Opening Workshop - Targeted Learning for Causal Inference Ba...
Causal Inference Opening Workshop - Targeted Learning for Causal Inference Ba...
 
Causal Inference Opening Workshop - Bayesian Nonparametric Models for Treatme...
Causal Inference Opening Workshop - Bayesian Nonparametric Models for Treatme...Causal Inference Opening Workshop - Bayesian Nonparametric Models for Treatme...
Causal Inference Opening Workshop - Bayesian Nonparametric Models for Treatme...
 
2019 Fall Series: Special Guest Lecture - Adversarial Risk Analysis of the Ge...
2019 Fall Series: Special Guest Lecture - Adversarial Risk Analysis of the Ge...2019 Fall Series: Special Guest Lecture - Adversarial Risk Analysis of the Ge...
2019 Fall Series: Special Guest Lecture - Adversarial Risk Analysis of the Ge...
 
2019 Fall Series: Professional Development, Writing Academic Papers…What Work...
2019 Fall Series: Professional Development, Writing Academic Papers…What Work...2019 Fall Series: Professional Development, Writing Academic Papers…What Work...
2019 Fall Series: Professional Development, Writing Academic Papers…What Work...
 
2019 GDRR: Blockchain Data Analytics - Machine Learning in/for Blockchain: Fu...
2019 GDRR: Blockchain Data Analytics - Machine Learning in/for Blockchain: Fu...2019 GDRR: Blockchain Data Analytics - Machine Learning in/for Blockchain: Fu...
2019 GDRR: Blockchain Data Analytics - Machine Learning in/for Blockchain: Fu...
 
2019 GDRR: Blockchain Data Analytics - QuTrack: Model Life Cycle Management f...
2019 GDRR: Blockchain Data Analytics - QuTrack: Model Life Cycle Management f...2019 GDRR: Blockchain Data Analytics - QuTrack: Model Life Cycle Management f...
2019 GDRR: Blockchain Data Analytics - QuTrack: Model Life Cycle Management f...
 
2019 GDRR: Blockchain Data Analytics - Modeling Cryptocurrency Markets with T...
2019 GDRR: Blockchain Data Analytics - Modeling Cryptocurrency Markets with T...2019 GDRR: Blockchain Data Analytics - Modeling Cryptocurrency Markets with T...
2019 GDRR: Blockchain Data Analytics - Modeling Cryptocurrency Markets with T...
 
2019 GDRR: Blockchain Data Analytics - Tracking Criminals by Following the Mo...
2019 GDRR: Blockchain Data Analytics - Tracking Criminals by Following the Mo...2019 GDRR: Blockchain Data Analytics - Tracking Criminals by Following the Mo...
2019 GDRR: Blockchain Data Analytics - Tracking Criminals by Following the Mo...
 

Recently uploaded

SOCIAL AND HISTORICAL CONTEXT - LFTVD.pptx
SOCIAL AND HISTORICAL CONTEXT - LFTVD.pptxSOCIAL AND HISTORICAL CONTEXT - LFTVD.pptx
SOCIAL AND HISTORICAL CONTEXT - LFTVD.pptxiammrhaywood
 
The Most Excellent Way | 1 Corinthians 13
The Most Excellent Way | 1 Corinthians 13The Most Excellent Way | 1 Corinthians 13
The Most Excellent Way | 1 Corinthians 13Steve Thomason
 
Alper Gobel In Media Res Media Component
Alper Gobel In Media Res Media ComponentAlper Gobel In Media Res Media Component
Alper Gobel In Media Res Media ComponentInMediaRes1
 
Hybridoma Technology ( Production , Purification , and Application )
Hybridoma Technology  ( Production , Purification , and Application  ) Hybridoma Technology  ( Production , Purification , and Application  )
Hybridoma Technology ( Production , Purification , and Application ) Sakshi Ghasle
 
POINT- BIOCHEMISTRY SEM 2 ENZYMES UNIT 5.pptx
POINT- BIOCHEMISTRY SEM 2 ENZYMES UNIT 5.pptxPOINT- BIOCHEMISTRY SEM 2 ENZYMES UNIT 5.pptx
POINT- BIOCHEMISTRY SEM 2 ENZYMES UNIT 5.pptxSayali Powar
 
Crayon Activity Handout For the Crayon A
Crayon Activity Handout For the Crayon ACrayon Activity Handout For the Crayon A
Crayon Activity Handout For the Crayon AUnboundStockton
 
CARE OF CHILD IN INCUBATOR..........pptx
CARE OF CHILD IN INCUBATOR..........pptxCARE OF CHILD IN INCUBATOR..........pptx
CARE OF CHILD IN INCUBATOR..........pptxGaneshChakor2
 
Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...
Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...
Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...EduSkills OECD
 
18-04-UA_REPORT_MEDIALITERAСY_INDEX-DM_23-1-final-eng.pdf
18-04-UA_REPORT_MEDIALITERAСY_INDEX-DM_23-1-final-eng.pdf18-04-UA_REPORT_MEDIALITERAСY_INDEX-DM_23-1-final-eng.pdf
18-04-UA_REPORT_MEDIALITERAСY_INDEX-DM_23-1-final-eng.pdfssuser54595a
 
Class 11 Legal Studies Ch-1 Concept of State .pdf
Class 11 Legal Studies Ch-1 Concept of State .pdfClass 11 Legal Studies Ch-1 Concept of State .pdf
Class 11 Legal Studies Ch-1 Concept of State .pdfakmcokerachita
 
The basics of sentences session 2pptx copy.pptx
The basics of sentences session 2pptx copy.pptxThe basics of sentences session 2pptx copy.pptx
The basics of sentences session 2pptx copy.pptxheathfieldcps1
 
call girls in Kamla Market (DELHI) 🔝 >༒9953330565🔝 genuine Escort Service 🔝✔️✔️
call girls in Kamla Market (DELHI) 🔝 >༒9953330565🔝 genuine Escort Service 🔝✔️✔️call girls in Kamla Market (DELHI) 🔝 >༒9953330565🔝 genuine Escort Service 🔝✔️✔️
call girls in Kamla Market (DELHI) 🔝 >༒9953330565🔝 genuine Escort Service 🔝✔️✔️9953056974 Low Rate Call Girls In Saket, Delhi NCR
 
_Math 4-Q4 Week 5.pptx Steps in Collecting Data
_Math 4-Q4 Week 5.pptx Steps in Collecting Data_Math 4-Q4 Week 5.pptx Steps in Collecting Data
_Math 4-Q4 Week 5.pptx Steps in Collecting DataJhengPantaleon
 
Mastering the Unannounced Regulatory Inspection
Mastering the Unannounced Regulatory InspectionMastering the Unannounced Regulatory Inspection
Mastering the Unannounced Regulatory InspectionSafetyChain Software
 
Interactive Powerpoint_How to Master effective communication
Interactive Powerpoint_How to Master effective communicationInteractive Powerpoint_How to Master effective communication
Interactive Powerpoint_How to Master effective communicationnomboosow
 
Contemporary philippine arts from the regions_PPT_Module_12 [Autosaved] (1).pptx
Contemporary philippine arts from the regions_PPT_Module_12 [Autosaved] (1).pptxContemporary philippine arts from the regions_PPT_Module_12 [Autosaved] (1).pptx
Contemporary philippine arts from the regions_PPT_Module_12 [Autosaved] (1).pptxRoyAbrique
 
Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...
Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...
Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...Krashi Coaching
 
“Oh GOSH! Reflecting on Hackteria's Collaborative Practices in a Global Do-It...
“Oh GOSH! Reflecting on Hackteria's Collaborative Practices in a Global Do-It...“Oh GOSH! Reflecting on Hackteria's Collaborative Practices in a Global Do-It...
“Oh GOSH! Reflecting on Hackteria's Collaborative Practices in a Global Do-It...Marc Dusseiller Dusjagr
 

Recently uploaded (20)

SOCIAL AND HISTORICAL CONTEXT - LFTVD.pptx
SOCIAL AND HISTORICAL CONTEXT - LFTVD.pptxSOCIAL AND HISTORICAL CONTEXT - LFTVD.pptx
SOCIAL AND HISTORICAL CONTEXT - LFTVD.pptx
 
The Most Excellent Way | 1 Corinthians 13
The Most Excellent Way | 1 Corinthians 13The Most Excellent Way | 1 Corinthians 13
The Most Excellent Way | 1 Corinthians 13
 
Alper Gobel In Media Res Media Component
Alper Gobel In Media Res Media ComponentAlper Gobel In Media Res Media Component
Alper Gobel In Media Res Media Component
 
Hybridoma Technology ( Production , Purification , and Application )
Hybridoma Technology  ( Production , Purification , and Application  ) Hybridoma Technology  ( Production , Purification , and Application  )
Hybridoma Technology ( Production , Purification , and Application )
 
POINT- BIOCHEMISTRY SEM 2 ENZYMES UNIT 5.pptx
POINT- BIOCHEMISTRY SEM 2 ENZYMES UNIT 5.pptxPOINT- BIOCHEMISTRY SEM 2 ENZYMES UNIT 5.pptx
POINT- BIOCHEMISTRY SEM 2 ENZYMES UNIT 5.pptx
 
Crayon Activity Handout For the Crayon A
Crayon Activity Handout For the Crayon ACrayon Activity Handout For the Crayon A
Crayon Activity Handout For the Crayon A
 
CARE OF CHILD IN INCUBATOR..........pptx
CARE OF CHILD IN INCUBATOR..........pptxCARE OF CHILD IN INCUBATOR..........pptx
CARE OF CHILD IN INCUBATOR..........pptx
 
Model Call Girl in Bikash Puri Delhi reach out to us at 🔝9953056974🔝
Model Call Girl in Bikash Puri  Delhi reach out to us at 🔝9953056974🔝Model Call Girl in Bikash Puri  Delhi reach out to us at 🔝9953056974🔝
Model Call Girl in Bikash Puri Delhi reach out to us at 🔝9953056974🔝
 
Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...
Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...
Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...
 
18-04-UA_REPORT_MEDIALITERAСY_INDEX-DM_23-1-final-eng.pdf
18-04-UA_REPORT_MEDIALITERAСY_INDEX-DM_23-1-final-eng.pdf18-04-UA_REPORT_MEDIALITERAСY_INDEX-DM_23-1-final-eng.pdf
18-04-UA_REPORT_MEDIALITERAСY_INDEX-DM_23-1-final-eng.pdf
 
Class 11 Legal Studies Ch-1 Concept of State .pdf
Class 11 Legal Studies Ch-1 Concept of State .pdfClass 11 Legal Studies Ch-1 Concept of State .pdf
Class 11 Legal Studies Ch-1 Concept of State .pdf
 
The basics of sentences session 2pptx copy.pptx
The basics of sentences session 2pptx copy.pptxThe basics of sentences session 2pptx copy.pptx
The basics of sentences session 2pptx copy.pptx
 
call girls in Kamla Market (DELHI) 🔝 >༒9953330565🔝 genuine Escort Service 🔝✔️✔️
call girls in Kamla Market (DELHI) 🔝 >༒9953330565🔝 genuine Escort Service 🔝✔️✔️call girls in Kamla Market (DELHI) 🔝 >༒9953330565🔝 genuine Escort Service 🔝✔️✔️
call girls in Kamla Market (DELHI) 🔝 >༒9953330565🔝 genuine Escort Service 🔝✔️✔️
 
_Math 4-Q4 Week 5.pptx Steps in Collecting Data
_Math 4-Q4 Week 5.pptx Steps in Collecting Data_Math 4-Q4 Week 5.pptx Steps in Collecting Data
_Math 4-Q4 Week 5.pptx Steps in Collecting Data
 
Mastering the Unannounced Regulatory Inspection
Mastering the Unannounced Regulatory InspectionMastering the Unannounced Regulatory Inspection
Mastering the Unannounced Regulatory Inspection
 
Interactive Powerpoint_How to Master effective communication
Interactive Powerpoint_How to Master effective communicationInteractive Powerpoint_How to Master effective communication
Interactive Powerpoint_How to Master effective communication
 
Contemporary philippine arts from the regions_PPT_Module_12 [Autosaved] (1).pptx
Contemporary philippine arts from the regions_PPT_Module_12 [Autosaved] (1).pptxContemporary philippine arts from the regions_PPT_Module_12 [Autosaved] (1).pptx
Contemporary philippine arts from the regions_PPT_Module_12 [Autosaved] (1).pptx
 
Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...
Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...
Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...
 
“Oh GOSH! Reflecting on Hackteria's Collaborative Practices in a Global Do-It...
“Oh GOSH! Reflecting on Hackteria's Collaborative Practices in a Global Do-It...“Oh GOSH! Reflecting on Hackteria's Collaborative Practices in a Global Do-It...
“Oh GOSH! Reflecting on Hackteria's Collaborative Practices in a Global Do-It...
 
Model Call Girl in Tilak Nagar Delhi reach out to us at 🔝9953056974🔝
Model Call Girl in Tilak Nagar Delhi reach out to us at 🔝9953056974🔝Model Call Girl in Tilak Nagar Delhi reach out to us at 🔝9953056974🔝
Model Call Girl in Tilak Nagar Delhi reach out to us at 🔝9953056974🔝
 

PMED Opening Workshop - Inference on Individualized Treatment Rules from Observational Studies with High-Dimensional Covariates - Yingi Zhao, August 14, 2018

• 11. Review: Outcome weighted learning (O-learning)
Another characterization of the optimal ITR. For $(X, A, Y) \sim P$, the value function is
$$V(d) = E^d(Y) = \int Y \, dP^d = \int Y \, \frac{dP^d}{dP}\, dP = E\left[\frac{Y \, I\{A = d(X)\}}{P(A|X)}\right]$$
(Qian and Murphy, 2011, AoS). The optimal ITR satisfies $d^* = \operatorname{argmax}_d V(d)$. (A sketch of this plug-in value estimate follows.)
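A minimal sketch (not the speaker's code) of the inverse-probability-weighted value estimate above. The data, the rule `d`, and the known propensity of 1/2 are illustrative assumptions.

```python
import numpy as np

def ipw_value(X, A, Y, d, propensity):
    """Estimate V(d) = E[ Y * I{A = d(X)} / P(A|X) ] by its empirical mean.

    d: function mapping covariates to {-1, +1}
    propensity: function returning P(A_i | X_i) for the observed A_i
    """
    match = (A == d(X)).astype(float)               # I{A = d(X)}
    return np.mean(Y * match / propensity(X, A))

# Toy usage: randomized treatment with P(A|X) = 1/2, rule d(x) = sign(x_1)
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 5))
A = rng.choice([-1, 1], size=500)
Y = 1.0 + A * X[:, 0] + rng.normal(size=500)        # treatment effect in x_1
d = lambda X: np.sign(X[:, 0])
print(ipw_value(X, A, Y, d, lambda X, A: np.full(len(A), 0.5)))
```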
• 12. Review: Outcome weighted learning (O-learning)
How about directly optimizing $V(d)$? Value maximization is equivalent to risk minimization:
$$\max_d\; E\left[\frac{Y}{P(A|X)}\, I\{A = d(X)\}\right] \iff \min_d\; E\left[\frac{Y}{P(A|X)}\, I\{A \neq d(X)\}\right],$$
with empirical risk
$$\frac{1}{n}\sum_{i=1}^n \frac{Y_i}{P(A_i|X_i)}\, I\{A_i \neq d(X_i)\}.$$
Problem: minimization over d is computationally challenging.
Observation: it looks like a weighted zero-one loss.
• 13–15. Review: Outcome weighted learning (O-learning)
Key idea: take a surrogate loss (Zhao et al., 2012, JASA). For any d, let $d(\cdot) = \operatorname{sign}\{f(\cdot)\}$ for measurable f. Note: $I\{A \neq d(X)\} = I(Af(X) < 0)$. Consider
$$f^o = \operatorname*{argmin}_{f:\ \text{measurable}}\; E\left[\frac{Y}{P(A|X)}\, \varphi(Af(X))\right],$$
where $\varphi(t)$ is a convex surrogate of $I(t < 0)$, e.g. $\frac{1}{\log 2}\log(1 + e^{-t})$ (logistic loss) or $(1-t)_+$ (hinge loss).
Fisher consistency: $\operatorname{sign}(f^o) = d^*$ if $Y > 0$.
Summary of outcome weighted learning: avoid outcome regression (likelihood) and optimize the value directly (0-1 loss); potential for flexibility and robustness. (A small numerical illustration follows.)
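A small numerical check, assuming positive outcomes and a known propensity of 1/2 (both illustrative), that the weighted 0-1 risk is dominated by the weighted logistic-surrogate risk, using $I\{A \neq \operatorname{sign} f(X)\} = I(Af(X) < 0)$:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 1000
X = rng.normal(size=(n, 3))
A = rng.choice([-1, 1], size=n)
Y = np.abs(1 + X[:, 0] + rng.normal(size=n))    # positive outcomes (Y > 0)
w = Y / 0.5                                     # weights Y / P(A|X), P = 1/2 here

f = X @ np.array([1.0, 0.0, 0.0])               # a candidate linear decision score
margin = A * f
zero_one = np.mean(w * (margin < 0))            # weighted 0-1 risk
logistic = np.mean(w * np.log1p(np.exp(-margin)) / np.log(2))
print(zero_one, logistic)                       # surrogate risk dominates 0-1 risk
```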
• 16. Review: Outcome weighted learning (O-learning)
Sparse O-learning (Xu et al., 2015; Song et al., 2015): assume $d^*(x) = \operatorname{sign}(x^T\beta^*)$ with $\beta^*$ sparse, and set $\hat d(x) = \operatorname{sign}(x^T\hat\beta)$ with
$$\hat\beta = \operatorname*{argmin}_{\beta \in \mathbb{R}^p}\; \frac{1}{n}\sum_{i=1}^n \frac{Y_i}{P(A_i|X_i)}\, \varphi(A_i X_i^T\beta) + \lambda\|\beta\|_1.$$
Existing results: convergence rate and model-selection consistency of $\hat\beta$ when $P(A_i|X_i)$ is known. (A sketch of this fit follows.)
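A sketch of sparse O-learning with the logistic surrogate, using the fact that minimizing $\sum_i w_i \log(1 + e^{-A_i X_i^T\beta})$ is an l1-penalized logistic regression with labels $A_i$ and sample weights $w_i = Y_i / P(A_i|X_i)$. The data, the known propensity of 1/2, and the regularization level `C` (scikit-learn's inverse of $n\lambda$, which would be tuned in practice) are all assumptions for illustration; this requires $Y > 0$.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(2)
n, p = 400, 50
X = rng.normal(size=(n, p))
A = rng.choice([-1, 1], size=n)
Y = np.abs(1 + A * (X[:, 0] - X[:, 1]) + rng.normal(size=n))   # Y > 0

w = Y / 0.5                                    # outcome weights Y / P(A|X)
fit = LogisticRegression(penalty="l1", solver="liblinear", C=0.1,
                         fit_intercept=False)
fit.fit(X, A, sample_weight=w)                 # weighted logistic surrogate + l1
beta = fit.coef_.ravel()                       # sparse; d(x) = sign(x @ beta)
print(np.nonzero(beta)[0])                     # selected covariates
```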
• 17. Proposed method
Task: rigorous inference on $d^*(X)$. Framework: outcome weighted learning.
Main contribution: an asymptotically valid test when X is high-dimensional and P(A|X) is unknown.
• 18. Inference: Prelude
Hypothesis: writing $\beta = (\beta_1, \beta_2^T)^T$, test $H_0: \beta_1^* = 0$ vs. $H_1: \beta_1^* \neq 0$ (recall $d^*(X) = \operatorname{sign}(X^T\beta^*)$).
Easily extends to testing a vector $\beta_1^*$; helps discover important biomarker(s) for the treatment effect.
Why inference? To quantify the uncertainty of discovery (p-value).
Challenge: high-dimensional inference is much more difficult than estimation; sparse estimators are not regular (the distribution of $\hat\beta$ is not well understood).
• 19. Inference: How?
Key idea: $\hat\beta$ is obtained from a penalized M-estimation,
$$\hat\beta = \operatorname*{argmin}_{\beta \in \mathbb{R}^p}\; \underbrace{\frac{1}{n}\sum_{i=1}^n \frac{Y_i}{P_{\hat\alpha}(A_i|X_i)}\, \varphi(A_i X_i^T\beta)}_{l_\beta(\beta;\,\hat\alpha)} + \lambda\|\beta\|_1.$$
Use the decorrelated score test (Ning and Liu, 2017, AoS), applicable to general penalized M-estimation with a convex differentiable loss.
Challenge again: the nuisance $\hat\alpha$ is estimated from a propensity score model, e.g. penalized logistic regression with A as the response,
$$\hat\alpha = \operatorname*{argmin}_{\alpha \in \mathbb{R}^p}\; \underbrace{\frac{1}{n}\sum_{i=1}^n \log\{1 + \exp(-A_i X_i^T\alpha)\}}_{l_\alpha(\alpha)} + \lambda\|\alpha\|_1.$$
(A propensity-fit sketch follows.)
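A sketch of the nuisance propensity fit: l1-penalized logistic regression of A on X, yielding $P_{\hat\alpha}(A|X)$. The simulated data and the regularization level `C` are illustrative assumptions; `C` would be tuned by cross-validation.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(3)
n, p = 400, 50
X = rng.normal(size=(n, p))
A = rng.binomial(1, 1 / (1 + np.exp(-(X[:, 0] - X[:, 2]))), size=n) * 2 - 1

prop_fit = LogisticRegression(penalty="l1", solver="liblinear", C=0.5,
                              fit_intercept=False)
prop_fit.fit(X, A)
# P_alpha_hat(A_i | X_i): probability of the treatment actually received
p_hat = prop_fit.predict_proba(X)[np.arange(n), (A == 1).astype(int)]
print(p_hat[:5])
```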
• 20–22. Intuition on the decorrelated score test (w/o nuisance α)
Forget everything so far! Let $l(\beta)$ be a convex differentiable loss, $\beta^* := \operatorname{argmin}_\beta E[l(\beta)]$, and $H_0: \beta_1^* = 0$. Information matrix:
$$I^* := E\left[\frac{\partial^2}{\partial\beta\,\partial\beta^T}\, l(\beta^*)\right] = \begin{pmatrix} I_{11}^* & I_{12}^* \\ I_{21}^* & I_{22}^* \end{pmatrix}.$$
Classical Rao's score is based on the profile score function $\frac{\partial}{\partial\beta_1} l(0, \hat\beta_2(0))$, where $\hat\beta_2(\beta_1) = \operatorname{argmin}_{\beta_2} l(\beta_1, \beta_2)$:
$$\sqrt{n}\,\frac{\partial}{\partial\beta_1} l(0, \hat\beta_2(0)) = \sqrt{n}\left\{\frac{\partial}{\partial\beta_1} l(0, \beta_2^*) - I_{12}^*\big(\hat\beta_2(0) - \beta_2^*\big)\right\} + \mathrm{Rem}.$$
This is problematic in high dimension, because Rem is not negligible if $p \to \infty$, or the limiting distribution is not tractable.
• 23–25. Intuition on the decorrelated score test (w/o nuisance α)
Rewriting the classical Rao score,
$$\sqrt{n}\,\frac{\partial}{\partial\beta_1} l(0, \hat\beta_2(0)) = \sqrt{n}\Big\{\frac{\partial}{\partial\beta_1} l(0, \beta_2^*) - \underbrace{I_{12}^*(I_{22}^*)^{-1}}_{(w^*)^T}\frac{\partial}{\partial\beta_2} l(0, \beta_2^*)\Big\} + \mathrm{Rem}.$$
The decorrelated score function for $\beta_1$ is
$$S(\beta_1, \beta_2) = \frac{\partial}{\partial\beta_1} l(\beta_1, \beta_2) - w^T \frac{\partial}{\partial\beta_2} l(\beta_1, \beta_2), \qquad w^T = I_{12} I_{22}^{-1},$$
which is uncorrelated with the nuisance score functions. If $\beta^*$ and $w^*$ are sparse, then
$$\sqrt{n}\, S(0, \hat\beta_2(0), \hat w) = \sqrt{n}\, S(\beta^*, w^*) + o_P(1).$$
(A numerical illustration follows.)
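A minimal numerical illustration of the decorrelated score, using a squared-error loss $l(\beta) = \frac{1}{2n}\|Y - X\beta\|^2$ so that the information matrix is the Gram matrix. It is deliberately low-dimensional and unpenalized, purely to show the algebra; in the high-dimensional case each step would carry an l1 penalty, and the decorrelation term matters precisely because the penalized nuisance score is not zero.

```python
import numpy as np

rng = np.random.default_rng(4)
n, p = 500, 5
X = rng.normal(size=(n, p))
beta_star = np.array([0.0, 1.0, -1.0, 0.5, 0.0])    # beta*_1 = 0 (H0 true)
Y = X @ beta_star + rng.normal(size=n)

I = X.T @ X / n                                     # information matrix
w = np.linalg.solve(I[1:, 1:], I[1:, 0])            # w = I22^{-1} I21

# Restricted fit: beta_1 fixed at 0, nuisance beta_2 estimated
beta2 = np.linalg.lstsq(X[:, 1:], Y, rcond=None)[0]
resid = Y - X[:, 1:] @ beta2
score = -X.T @ resid / n                            # gradient of the loss
S = score[0] - w @ score[1:]                        # decorrelated score
print(np.sqrt(n) * S)                               # approx N(0, sigma^2) under H0
```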
• 26–29. How to deal with the nuisance α?
Decorrelated score test statistic with the additional nuisance parameter:
$$S(\beta, w, \alpha, \nu) := \frac{\partial}{\partial\beta_1} l_\beta(\beta;\alpha) \;-\; \underbrace{w^T \frac{\partial}{\partial\beta_2} l_\beta(\beta;\alpha)}_{\text{accounting for nuisance } \beta_2} \;-\; \underbrace{\nu^T \frac{\partial}{\partial\alpha} l_\alpha(\alpha)}_{\text{accounting for nuisance } \alpha},$$
where
$$w^* = E\left[\frac{\partial^2}{\partial\beta_2\,\partial\beta_2^T}\, l_\beta\right]^{-1} E\left[\frac{\partial}{\partial\beta_2}\frac{\partial}{\partial\beta_1}\, l_\beta\right], \qquad \nu^* = E\left[\frac{\partial^2}{\partial\alpha\,\partial\alpha^T}\, l_\alpha\right]^{-1} E\left[\frac{\partial}{\partial\alpha}\Big(\frac{\partial}{\partial\beta_1} l_\beta - w^{*T}\frac{\partial}{\partial\beta_2} l_\beta\Big)\right].$$
Estimating $w^*$ and $\nu^*$: if $z = A^{-1}b$ and z is sparse,
$$\hat z = \operatorname*{argmin}_z\; z^T \hat A z - 2 \hat b^T z + \lambda\|z\|_1.$$
(A solver sketch follows.)
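A sketch of the penalized quadratic program above, solved here by proximal gradient (ISTA), which is one standard choice and not necessarily the speaker's. The matrix `A` must be symmetric positive semidefinite (e.g. an estimated information matrix); the `A`, `b`, and `lam` below are illustrative.

```python
import numpy as np

def soft_threshold(v, t):
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def penalized_quadratic(A, b, lam, n_iter=500):
    """Minimize z'Az - 2b'z + lam*||z||_1 via ISTA (gradient is 2(Az - b))."""
    L = 2 * np.linalg.eigvalsh(A).max()     # Lipschitz constant of the gradient
    t = 1.0 / L                             # step size
    z = np.zeros(len(b))
    for _ in range(n_iter):
        z = soft_threshold(z - t * 2 * (A @ z - b), t * lam)
    return z

rng = np.random.default_rng(5)
M = rng.normal(size=(40, 20))
A = M.T @ M / 40 + 0.1 * np.eye(20)         # a PSD stand-in "information" matrix
b = A @ np.array([1.0, -1.0] + [0.0] * 18)  # so the sparse solution is known
print(penalized_quadratic(A, b, lam=0.05).round(2))
```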
• 30. The proposed test
Inference on the optimal ITR, $H_0: \beta_1^* = 0$:
$$\hat\beta^{\text{null}} = \operatorname*{argmin}_{\beta_1 = 0}\; \frac{1}{n}\sum_{i=1}^n \frac{Y_i}{P_{\hat\alpha}(A_i|X_i)}\, \varphi(A_i X_i^T\beta) + \lambda\|\beta\|_1.$$
Theorem. With additional sparsity assumptions on $w^*$ and $\nu^*$, under $H_0: \beta_1^* = 0$,
$$\sqrt{n}\, S(\hat\beta^{\text{null}}, \hat w, \hat\alpha, \hat\nu) \stackrel{d}{\to} N(0, \sigma^2).$$
Applied the decorrelated score test to deal with high-dimensional X. The estimated propensity $P_{\hat\alpha}(A|X)$ requires a modification of the test statistic. Applicable to other penalized M-estimators with data-driven weights, e.g., IPW. (A p-value sketch follows.)
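A sketch of turning the limit above into a test: under $H_0$, $\sqrt{n}\,S/\hat\sigma$ is approximately standard normal, so a two-sided p-value follows directly. The numeric inputs are placeholders for the quantities computed in the previous steps.

```python
import numpy as np
from scipy.stats import norm

def decorrelated_score_pvalue(S_hat, sigma_hat, n):
    """Two-sided p-value for H0 from the studentized decorrelated score."""
    z = np.sqrt(n) * S_hat / sigma_hat
    return 2 * norm.sf(abs(z))

print(decorrelated_score_pvalue(S_hat=0.08, sigma_hat=0.9, n=500))
```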
• 31. Simulation & data analysis
• 32–33. Simulation design
Comparison:
- Proposed (O-learning + l1 penalty + decorrelated score test)
- Q-learning (Qian and Murphy, 2011) + l1 penalty + decorrelated score test
- O-learning + no penalty + classical Rao's score test
Generative model summary: n = 250, p = 100, $E(Y|X,A) = \mu(X) + A\,c(X)$. Which covariates enter each component (O = appears):

                X1   X2   X3   X4   X5   X6   X7   X8–X100
  c(X)          O    O    O    O
  µ(X)          O    O              O    O
  P(A|X)        O         O         O         O

* Recall $d^*(X) = \operatorname{sign}\{c(X)\}$.
Performance metrics: empirical powers ($H_{01}: \beta_1^* = 0$ through $H_{04}: \beta_4^* = 0$), empirical type-I errors ($H_{05}: \beta_5^* = 0$ through $H_{08}: \beta_8^* = 0$), and the value function $V(\hat d)$.
• 34. Simulation results
Y continuous: $Y_i = E(Y_i|X_i, A_i) + \epsilon_i$ with $\epsilon_i \sim N(0, 1/4)$

                      Value    Power                      Type-I errors
                      V(d̂)     H01   H02   H03   H04     H05   H06   H07   H08
  True optimal        0.62
  Proposed            0.60     .963  .948  .952  .948    .055  .056  .038  .045
  Q-learn & Decor     0.59     .905  .914  .918  .906    .056  .049  .045  .051
  O-learn & Rao       0.52     .006  .008  .010  .009    .001  .002  .001  .001
  True anti-optimal   0.38

Y binary: $Y_i \sim$ Bernoulli with $p_i = E(Y_i|X_i, A_i)$

                      Value    Power                      Type-I errors
                      V(d̂)     H01   H02   H03   H04     H05   H06   H07   H08
  True optimal        0.62
  Proposed            0.57     .834  .739  .678  .762    .042  .060  .053  .056
  Q-learn & Decor     0.56     .668  .653  .676  .666    .045  .048  .057  .043
  O-learn & Rao       0.51     .015  .003  .000  .007    .001  .001  .001  .002
  True anti-optimal   0.38
• 35. Application to the Diabetes study
Elderly patients with Type-II diabetes with comorbidity: n = 9101 patients and p = 136 covariates.
Covariates: sociodemographics, disease history, lab measures at baseline.
Treatment: hypoglycemic agent (+1) / none (−1).
Outcome: A1c < 8% one year later (1) / otherwise (0).
• 36. Application to the Diabetes study
The fitted ITR: $\hat d(X) = \operatorname{sign}(X^T\hat\beta)$. Fitted coefficients (zeros omitted):

  Covariate (X)                           Coef (β̂)   p-value
  (Intercept)                              0.119
  Glucose control success at baseline     -0.161      0.01
  Chronic kidney disease                   0.127      <0.01
  Other chronic complication               0.142      0.06
  Female                                   0.043      0.13
  Eye disease                              0.021      0.35

Empirical value on the testing set (100 random splits):

  Rule                           As given   All +1   Q-learn   O-learn (no penalty)   Proposed
  Freq. of (A1c < 8%) at 1 yr    85.6%      84.2%    86.7%     86.4%                  87.2%

* SD ≈ 1.0%
• 37. Concluding remarks & future work
• 38. Where we contributed
Challenges: treatments are not randomized; covariates (predictors) are high-dimensional.
Goal: develop a principled, robust, data-driven method for discovering ITRs.
Contributions:
- Method: hypothesis testing on ITRs under a model-robust framework; data-driven rules → knowledge.
- Theory: an inferential technique for penalized M-estimators with nuisance parameters.
• 39. Work in the near future
- Asymptotic tests and confidence regions for penalized M-estimators involving nuisance estimation: augmented estimators; convex non-differentiable losses.
- Inference on dynamic treatment regimes (DTRs): a DTR optimizes a sequence of decision rules, and important features may differ across decisions. Idea: the optimal DTR minimizes a multivariate surrogate loss function (Zhao et al., 2015b, 2018).
• 40. Acknowledgement
Joint work with Young-Geun Choi, Yang Ning, and Maureen Smith. Organizers. NIH grant support, R01DK108073.
Contact: yqzhao@fredhutch.org
THANK YOU!
• 41. Main result, full version
Adjusted decorrelated score test statistic:
$$S(\beta, w, \alpha, \nu) := \frac{\partial}{\partial\beta_1} l_\beta(\beta;\alpha) - w^T \frac{\partial}{\partial\beta_2} l_\beta(\beta;\alpha) - \nu^T \frac{\partial}{\partial\alpha} l_\alpha(\alpha).$$
Theorem. Assume that the outcome $Y = \mu(X) + A\,c(X) + \epsilon$ satisfies:
(1) $c(X) = h(X^T\beta^*)$ where h is continuous with $h(\cdot) > 0$ on $(0, \infty)$ and $h(\cdot) < 0$ on $(-\infty, 0)$; $\operatorname{logit}\{P(A = 1|X)\} = X^T\alpha^*$, and the propensity model is correctly specified;
(2) $\beta^*$, $w^*$, $\alpha^*$, and $\nu^*$ are sparse;
(3) (maximum sparsity) $\cdot (\log p)/\sqrt{n} \to 0$, with tuning parameters $\asymp \sqrt{\log p / n}$;
(4) the loss function satisfies restricted eigenvalue conditions;
(5) $E(X^T b \mid X^T\beta)$ is linear in $X^T\beta$ and $E(X) = 0$;
(6) $\epsilon$ is sub-exponential, $\epsilon \perp (X, A)$, and $0 < P(A|X) < 1$.
Then, under the null hypothesis $H_0: \beta_1^* = 0$,
(a) $\sqrt{n}\, S(\hat\beta^{\text{null}}, \hat w, \hat\alpha, \hat\nu) \stackrel{d}{\to} N(0, \sigma^2)$;
(b) $\sigma^2 = \operatorname{Var}\{S(\beta^*, w^*, \alpha^*, \nu^*)\}$ is consistently estimated.
• 42–44. How the theorem was derived
$$\sqrt{n}\,\frac{\partial}{\partial\beta_1} l_\beta(\hat\beta^{\text{null}}; \hat\alpha) \approx \sqrt{n}\left\{\frac{\partial}{\partial\beta_1} l_\beta^* - (w^*)^T \frac{\partial}{\partial\beta_2} l_\beta^* - (\nu^*)^T \frac{\partial}{\partial\alpha} l_\alpha^*\right\},$$
where $l_\beta^* = l_\beta(\beta^*; \alpha^*)$, $l_\alpha^* = l_\alpha(\alpha^*)$, and
$$w^* = E\left[\frac{\partial^2}{\partial\beta_2\,\partial\beta_2^T}\, l_\beta\right]^{-1} E\left[\frac{\partial}{\partial\beta_2}\frac{\partial}{\partial\beta_1}\, l_\beta\right], \qquad \nu^* = E\left[\frac{\partial^2}{\partial\alpha\,\partial\alpha^T}\, l_\alpha\right]^{-1} E\left[\frac{\partial}{\partial\alpha}\Big(\frac{\partial}{\partial\beta_1} l_\beta - w^{*T}\frac{\partial}{\partial\beta_2} l_\beta\Big)\right].$$
Estimating $w^*$ and $\nu^*$: if $z = A^{-1}b$ and z is sparse,
$$\hat z = \operatorname*{argmin}_z\; z^T \hat A z - 2\hat b^T z + \lambda\|z\|_1.$$
• 45. Complete procedure
$$\hat\alpha = \operatorname*{argmin}_\alpha\; l_\alpha(\alpha) + P(\alpha)$$
$$\hat\beta^{\text{null}} = \operatorname*{argmin}_{\beta_1 = 0}\; l_\beta(\beta; \hat\alpha) + P(\beta)$$
$$\hat w = \operatorname*{argmin}_w\; w^T \hat E\left[\frac{\partial^2}{\partial\beta_2\,\partial\beta_2^T}\, l_\beta\right] w - 2\,\hat E\left[\frac{\partial}{\partial\beta_2}\frac{\partial}{\partial\beta_1}\, l_\beta\right]^T w + P(w)$$
$$\hat\nu = \operatorname*{argmin}_\nu\; \nu^T \hat E\left[\frac{\partial^2}{\partial\alpha\,\partial\alpha^T}\, l_\alpha\right] \nu - 2\,\hat E\left[\frac{\partial}{\partial\alpha}\Big(\frac{\partial}{\partial\beta_1} l_\beta - \hat w^T \frac{\partial}{\partial\beta_2} l_\beta\Big)\right]^T \nu + P(\nu)$$
• 46. Tuning parameter selection
5-fold CV. For $\hat\alpha$, $\hat w$, $\hat\nu$: minimize the predictive loss functions. For $\hat\beta^{\text{null}}$: maximize the predictive value function
$$V(d) = E\left[\frac{Y\, I\{A = d(X)\}}{P(A|X)}\right], \qquad \hat V(\hat d) = \frac{\sum_{i=1}^n Y_i\, I\{A_i = \hat d(X_i)\}/\hat P(A_i|X_i)}{\sum_{i=1}^n I\{A_i = \hat d(X_i)\}/\hat P(A_i|X_i)},$$
with $\hat d$ evaluated on the training set, and the data and $\hat P$ taken from the testing set. (A sketch follows.)
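A sketch of the normalized value estimate used above for cross-validated tuning: the rule `d` comes from a training fold while `Y`, `A`, `X`, and the fitted propensity come from the held-out fold. The names in the commented usage line (`lambda_grid`, `rules`, the `_te` arrays) are hypothetical placeholders.

```python
import numpy as np

def cv_value(Y, A, X, d, p_hat):
    """V_hat(d) = sum Y_i I{A_i=d(X_i)}/p_i  /  sum I{A_i=d(X_i)}/p_i."""
    match = (A == d(X)).astype(float)
    w = match / p_hat
    return np.sum(Y * w) / np.sum(w)

# Pick the lambda whose trained rule maximizes the held-out value, e.g.
# lam_best = max(lambda_grid,
#                key=lambda lam: cv_value(Y_te, A_te, X_te, rules[lam], p_hat_te))
```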
• 47. More details on the simulation
Q-learning (Qian and Murphy, 2011): fit
$$\operatorname*{argmin}_{\gamma, \beta}\; \frac{1}{n}\sum_{i=1}^n \big(Y_i - \Phi(X_i)^T\gamma - A_i X_i^T\beta\big)^2 + P(\gamma, \beta);$$
estimated decision: $\hat d(x) = \operatorname{sign}(x^T\hat\beta)$.
Significance testing: Qian and Murphy (2011) did not develop a test, but the decorrelated score test (Ning and Liu, 2017) can be used, since that paper provides the test for penalized least squares. (A sketch follows.)
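A sketch of the l1-penalized Q-learning comparator: lasso regression of Y on main-effect features $\Phi(X)$ (taken to be X itself here, an assumption) and interactions $A \cdot X$; the rule is the sign of the fitted interaction part. Data and `alpha` are illustrative.

```python
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(6)
n, p = 400, 50
X = rng.normal(size=(n, p))
A = rng.choice([-1, 1], size=n)
Y = X[:, 0] + A * (X[:, 1] - X[:, 2]) + rng.normal(size=n)

design = np.hstack([X, A[:, None] * X])        # [Phi(X), A*X]
fit = Lasso(alpha=0.05).fit(design, Y)
beta = fit.coef_[p:]                           # interaction coefficients
d = lambda x: np.sign(x @ beta)                # estimated rule d(x) = sign(x'beta)
print(np.nonzero(beta)[0])                     # selected treatment-effect covariates
```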
• 48. More details on the simulation
Generative model: n = 250, p = 100. Signal structure (recall $E(Y|X,A) = \mu(X) + A\,c(X)$):
$X_{ij} \sim$ Bernoulli(0.3)
$c(X) = 0.7\,(X_1 + X_2 - X_3 - X_4)/25 + 1/2$
$\mu(X) = (X_1 + X_2 - X_5 - X_6)/25 + 1/2$
$\operatorname{logit}\{P(A = 1|X)\} = 0.4\,(X_1 + X_3 - X_5 - X_7)$
Note: $0 < E(Y|X,A) < 1$.
If Y continuous: $Y_i = E(Y_i|X_i, A_i) + \epsilon_i$ with $\epsilon_i \sim N(0, 1/9)$. If Y binary: sample $Y_i \sim$ Bernoulli with $p_i = E(Y_i|X_i, A_i)$. (A data-generation sketch follows.)
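A sketch of this generative model with the constants copied verbatim from the slide; the parenthesization of the "/25 + 1/2" terms is as extracted, and the defensive clipping of the Bernoulli mean is my addition (the slide states $0 < E(Y|X,A) < 1$).

```python
import numpy as np

rng = np.random.default_rng(7)
n, p = 250, 100
X = rng.binomial(1, 0.3, size=(n, p)).astype(float)      # X_ij ~ Bernoulli(0.3)

c = 0.7 * (X[:, 0] + X[:, 1] - X[:, 2] - X[:, 3]) / 25 + 0.5
mu = (X[:, 0] + X[:, 1] - X[:, 4] - X[:, 5]) / 25 + 0.5
logit = 0.4 * (X[:, 0] + X[:, 2] - X[:, 4] - X[:, 6])
A = 2 * rng.binomial(1, 1 / (1 + np.exp(-logit))) - 1    # A in {-1, +1}

EY = mu + A * c                                          # E(Y | X, A)
Y_cont = EY + rng.normal(scale=1/3, size=n)              # continuous: Var 1/9
Y_bin = rng.binomial(1, np.clip(EY, 0, 1))               # binary outcome
print(Y_cont[:3], Y_bin[:3])
```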
• 49. More simulation
Propensity misspecified: $\operatorname{logit}\{P(A=1|X)\} = 0.685\exp(0.2X_1^2 + 0.3X_3^2) + 0.03(0.5 - X_5^2 - X_7^2)(X_5^2 + X_7^2 - 0.3) - 0.73$

                      Val     Power                      Size
                              H01   H02   H03   H04     H05   H06   H07   H08
  True optimal        0.64
  Proposed            0.50    .959  .959  .952  .957    .062  .046  .060  .039
  Q-learn & Decor     0.44    .915  .908  .910  .912    .032  .048  .046  .051

Main effect misspecified: $\mu(X) = 0.685\exp(0.2X_1^2 + 0.3X_2^2) + 0.03(0.5 - X_5^2 - X_6^2)(X_5^2 + X_6^2 - 0.3) - 0.73$

  True optimal        0.64
  Proposed            0.47    .928  .885  .925  .917    .085  .041  .049  .036
  Q-learn & Decor     0.40    .881  .814  .822  .847    .090  .046  .054  .045

Treatment effect misspecified: $c(X) = 0.685\exp(0.2X_1^2 + 0.3X_2^2) + 0.03(0.5 - X_3^2 - X_4^2)(X_3^2 + X_4^2 - 0.3) - 0.73$

  True optimal        0.40
  Proposed            0.18    .040  .997  .055  .035    .053  .050  .041  .051
  Q-learn & Decor     0.13    .039  .984  .043  .042    .055  .053  .051  .053

** 1000 replications. 1.96·se(Value) < 0.007; 1.96·se(power) and 1.96·se(size) ≈ 0.014.
• 50. More detail on the data analysis
Data selection criteria: patients' EHR linked to claims and enrollment files from Medicare. Patient inclusion criteria:
- met a validated algorithm for identifying patients with diabetes via claims;
- were medically homed, via an established plurality-provider algorithm, at the participating large, Midwestern, multi-specialty provider group.
Patients were included for each 90-day quarter from 2003–2011 in which they were alive at the start of the quarter, had continuous Medicare Part A & B fee-for-service coverage, and met the medical-home criteria above.
• 51. Overview of O-learning
Zhao et al. (2012): $\hat d(\cdot) := \operatorname{sign}\{\hat f(\cdot)\}$,
$$\hat f = \operatorname*{argmin}_{f \in \mathcal{F}}\; \frac{1}{n}\sum_{i=1}^n \frac{Y_i}{P(A_i|X_i)}\, \varphi(A_i f(X_i)) + \lambda\|f\|^2.$$
Literature on changing the outcome weight to reduce the variability of the empirical risk:

  Paper                 Outcome weight    Choice of g(X)
  Zhou et al. (2017)    Y − g(X)          µ(X)
  Liu et al. (2016)     |Y − g(X)|        E(Y|X)

Other advances: multi-staged decision rules (Zhao et al., 2015a), censored outcomes (?), tree-based decisions (Laber and Zhao, 2015), penalized linear decisions (Song et al., 2015; Xu et al., 2015), ...
• 52–54. Fisher consistency
$$\hat\beta = \operatorname*{argmin}_{\beta \in \mathbb{R}^p}\; \underbrace{\frac{1}{n}\sum_{i=1}^n \text{(weighted surrogate convex loss)}}_{l_\beta(\beta)} + P(\beta)$$
$$\beta^o = \operatorname*{argmin}_{\beta \in \mathbb{R}^p}\; E[\text{weighted surrogate convex loss}]$$
$$\beta^* = \operatorname*{argmin}_{\|\beta\|_2 = 1}\; E[\text{weighted 0-1 loss}]$$
Question: $\hat\beta \to \beta^o$ (under certain conditions), but we are interested in $\beta_1^*$. How to connect $\hat\beta$ and $\beta^*$?
Answer: Fisher consistency implies $\beta^o = k\beta^*$ for some $k > 0$; thus $\beta_1^* = 0 \iff \beta_1^o = 0$.
Reduced problem: test $\beta_1^o = 0$ with the convex loss function $l_\beta(\beta)$. Now we can work as if we had a negative log-likelihood with true parameter $\beta^o$.
• 55–56. Sparse models on main effect & propensity
Main effect. Fact: $\mu(X) = E\left[\frac{Y}{2P(A|X)} \,\middle|\, X\right]$. Assume $\mu(X) = X^T\gamma^*$ ($\gamma^*$ sparse):
$$\hat\gamma = \operatorname*{argmin}_{\gamma \in \mathbb{R}^p}\; \frac{1}{n}\sum_{i=1}^n \left(\frac{Y_i}{2P(A_i|X_i)} - X_i^T\gamma\right)^2 + P(\gamma).$$
Propensity. Assume $\operatorname{logit}\{P(A = 1|X)\} = X^T\alpha^*$ ($\alpha^*$ sparse); fit via (penalized) logistic likelihood, balancing equations, etc.:
$$\hat\alpha = \operatorname*{argmin}_{\alpha \in \mathbb{R}^p}\; \frac{1}{n}\sum_{i=1}^n \log\big\{1 + \exp(-A_i X_i^T\alpha)\big\} + P(\alpha).$$
(A sketch of the main-effect fit follows.)
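A sketch of the main-effect fit above: a lasso regression of the transformed outcome $Y / (2\hat P(A|X))$ on X. The fitted propensity would come from the logistic-lasso step; here a constant 1/2 stands in, and the data and `alpha` are illustrative.

```python
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(8)
n, p = 400, 50
X = rng.normal(size=(n, p))
A = rng.choice([-1, 1], size=n)
Y = 1 + X[:, 0] + A * X[:, 1] + rng.normal(size=n)
p_hat = np.full(n, 0.5)                        # placeholder fitted propensity

gamma = Lasso(alpha=0.05).fit(X, Y / (2 * p_hat)).coef_
print(np.nonzero(gamma)[0])                    # support of the main-effect fit
```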
• 57. Modified estimation
$$\hat\alpha = \operatorname*{argmin}_{\alpha \in \mathbb{R}^p}\; \underbrace{\frac{1}{n}\sum_{i=1}^n \log\big\{1 + \exp(-A_i X_i^T\alpha)\big\}}_{=:\ l_\alpha(\alpha)} + P(\alpha)$$
$$\hat\gamma = \operatorname*{argmin}_{\gamma \in \mathbb{R}^p}\; \underbrace{\frac{1}{n}\sum_{i=1}^n \left(\frac{Y_i}{2P_{\hat\alpha}(A_i|X_i)} - X_i^T\gamma\right)^2}_{=:\ l_\gamma(\gamma;\,\hat\alpha)} + P(\gamma)$$
$$\hat\beta^{\text{null}} = \operatorname*{argmin}_{\beta_1 = 0}\; \underbrace{\frac{1}{n}\sum_{i=1}^n \frac{|Y_i - X_i^T\hat\gamma|}{P_{\hat\alpha}(A_i|X_i)}\, \varphi\big(A_i \operatorname{sign}(Y_i - X_i^T\hat\gamma)\, X_i^T\beta\big)}_{=:\ l_\beta(\beta;\,\hat\gamma,\,\hat\alpha)} + P(\beta)$$
Challenge: $\hat\alpha$ and $\hat\gamma$ induce further variability in $l_\beta$. (A sketch follows.)
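A sketch of the residual-weighted step above with the logistic surrogate: l1-penalized logistic regression with working labels $A_i \operatorname{sign}(Y_i - X_i^T\hat\gamma)$ and sample weights $|Y_i - X_i^T\hat\gamma| / \hat P(A_i|X_i)$. The data, placeholder propensity, and stand-in $\hat\gamma$ are assumptions; the null restriction $\beta_1 = 0$ is imposed by dropping the first column.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(9)
n, p = 400, 50
X = rng.normal(size=(n, p))
A = rng.choice([-1, 1], size=n)
Y = 1 + X[:, 0] + A * X[:, 1] + rng.normal(size=n)
p_hat = np.full(n, 0.5)                        # placeholder fitted propensity
gamma = np.zeros(p); gamma[0] = 1.0            # stand-in for the fitted gamma_hat

resid = Y - X @ gamma
labels = (A * np.sign(resid)).astype(int)      # A_i * sign(residual_i)
weights = np.abs(resid) / p_hat                # |residual| / P_hat(A|X)
fit = LogisticRegression(penalty="l1", solver="liblinear", C=0.1,
                         fit_intercept=False)
fit.fit(X[:, 1:], labels, sample_weight=weights)   # beta_1 = 0: drop column 0
print(np.nonzero(fit.coef_.ravel())[0] + 1)        # support (indices shifted by 1)
```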
• 58. References I
Laber, E. B. and Zhao, Y. Q. (2015). Tree-based methods for individualized treatment regimes. Biometrika, 102(3):501–514.
Liu, Y., Wang, Y., Kosorok, M. R., Zhao, Y., and Zeng, D. (2016). Robust hybrid learning for estimating personalized dynamic treatment regimens. pages 1–42.
Ning, Y. and Liu, H. (2017). A general theory of hypothesis tests and confidence regions for sparse high dimensional models. The Annals of Statistics, 45(1):158–195.
Qian, M. and Murphy, S. A. (2011). Performance guarantees for individualized treatment rules. The Annals of Statistics, 39(2):1180–1210.
Song, R., Kosorok, M., Zeng, D., Zhao, Y., Laber, E., and Yuan, M. (2015). On sparse representation for optimal individualized treatment selection with penalized outcome weighted learning. Stat, 4(1):59–68.
Xu, Y., Yu, M., Zhao, Y. Q., Li, Q., Wang, S., and Shao, J. (2015). Regularized outcome weighted subgroup identification for differential treatment effects. Biometrics, 71(3):645–653.
Zhao, Y., Zeng, D., Laber, E. B., and Kosorok, M. R. (2015a). New statistical learning methods for estimating optimal dynamic treatment regimes. Journal of the American Statistical Association, 110(510):583–598.
Zhao, Y., Zeng, D., Rush, A. J., and Kosorok, M. R. (2012). Estimating individualized treatment rules using outcome weighted learning. Journal of the American Statistical Association, 107(499):1106–1118.
• 59. References II
Zhao, Y. Q., Zeng, D., Laber, E. B., Song, R., Yuan, M., and Kosorok, M. R. (2015b). Doubly robust learning for estimating individualized treatment with censored data. Biometrika, 102(1):151–168.
Zhao, Y.-Q., Zhu, R., Chen, G., and Zheng, Y. (2018). Constructing stabilized dynamic treatment regimes. arXiv preprint arXiv:1808.01332.
Zhou, X., Mayer-Hamblett, N., Khan, U., and Kosorok, M. R. (2017). Residual weighted learning for estimating individualized treatment rules. Journal of the American Statistical Association, 112(517):169–187.