Individualized treatment rules from observational studies
with high-dimensional covariates
Yingqi Zhao, PhD
Fred Hutchinson Cancer Research Center
August 14, 2018, SAMSI PMED Opening Workshop
Contents
Introduction
Background
Proposed method & main results
Simulation & data analysis
Concluding remarks
Introduction
Precision medicine
“Delivering the right treatments, at the right time, to the right person”
- Remarks by Obama on Precision Medicine, 2015
Knowledge-driven: uses scientific understanding of genes, proteins, pathways, and mechanisms.
Data-driven: uses empirical, statistical, and computational methods and lets the data talk.
Observational data
Promises: relatively cheap; aids cost-effective RCTs; the only source when an RCT is unethical.
Challenges: missingness; measurement errors; security & privacy; sampling bias; distributed data; high dimensionality.
There is a huge gap in our understanding of observational data.
Motivating dataset
Elderly patients with Type-II Diabetes with comorbidity
30.3 million Americans have diabetes.
Linked claims and EHR data for Medicare beneficiaries in the University of
Wisconsin Medical Foundation system.
9101 patients, 136 pretreatment covariates
Treatment (Medication)
Hypoglycemic agent (Insulin, Metformin, . . .)
None
Outcome of interest: Glucose level (A1c)
Research questions
1. Can we develop a data-driven rule that better controls glucose levels if implemented for future patients?
2. Can we use the learned data-driven rule to help expand our knowledge of diabetes care?
3. Can we provide uncertainty measurements for the learned rule?
Statistical goal and challenges
Goal
Develop a principled, robust, data-driven method for discovering individualized treatment rules, along with inferential procedures, that maximizes future patients' benefit.
Challenges from the data
Treatments are not randomized.
Covariates (predictors) are high-dimensional.
Background
Settings
$\{(X_i, A_i, Y_i)\}_{i=1}^n \overset{\text{iid}}{\sim} P$: covariate, treatment, and outcome
$X \in \mathbb{R}^p$, $A = \pm 1$, and $Y \in \mathbb{R}$ (higher value preferred)
$p$ possibly large
$P(A|X)$ (propensity) unknown
Assume (1) $P(A|X) > c > 0$ and (2) no unmeasured confounders.
We want to construct an individualized treatment rule (ITR)
$$d(x): \mathcal{X} \to \{-1, +1\},$$
e.g., $d(x) \equiv 1$ or $d(x) = \mathrm{sign}(x^\top \mathbf{1})$.
Objective: find $d$ maximizing the expected outcome if implemented in the future.
Review: Existing approaches
Useful representation:
$$Y = \underbrace{\mu(X)}_{\text{“main effect”}} + A \cdot \underbrace{c(X)}_{\text{“treatment effect”}} + \epsilon, \qquad d^*(X) = \mathrm{sign}\{c(X)\}.$$
Regression-based approach
1. Parametrize and fit $Y \sim \mu(X) + A\, c(X)$.
2. $\hat d(X) = \mathrm{sign}\{\hat c(X)\}$.
Potential issues
The decision is estimated indirectly, through the predicted outcome.
If the model is incorrect, the quality of the estimated ITR may be poor.
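To make the two-step recipe concrete, here is a minimal sketch assuming linear working models $\mu(x) = x^\top \beta_\mu$ and $c(x) = x^\top \beta_c$ fit by ordinary least squares; the function name and setup are illustrative, not part of the talk.

```python
# Minimal sketch of the regression-based approach with linear working models
# (an intercept can be included as a column of ones in X); names illustrative.
import numpy as np

def fit_regression_itr(X, A, Y):
    # Design matrix [X, A*X]: main effects plus treatment interactions.
    Z = np.hstack([X, A[:, None] * X])
    coef, *_ = np.linalg.lstsq(Z, Y, rcond=None)
    beta_c = coef[X.shape[1]:]               # interaction block estimates c(X)
    return lambda x: np.sign(x @ beta_c)     # d_hat(x) = sign{c_hat(x)}
```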
Review: Outcome weighted learning (O-learning)
Another characterization of the optimal ITR: for $(X, A, Y) \sim P$, the Value function of a rule $d$ is
$$V(d) = E^d(Y) = \int Y \, dP^d = \int Y \,\frac{dP^d}{dP}\, dP = E\left[ Y \cdot \frac{I\{A = d(X)\}}{P(A|X)} \right]$$
(Qian and Murphy, 2011, AoS). The optimal ITR satisfies
$$d^* = \operatorname*{argmax}_d V(d).$$
Review: Outcome weighted learning (O-learning)
How about directly optimizing $V(d)$?
$$\text{Value maximization } \max_d E\left[\frac{Y}{P(A|X)}\, I\{A = d(X)\}\right] \iff \text{risk minimization } \min_d E\left[\frac{Y}{P(A|X)}\, I\{A \neq d(X)\}\right]$$
Empirical risk:
$$\frac{1}{n}\sum_{i=1}^n \frac{Y_i}{P(A_i|X_i)}\, I\{A_i \neq d(X_i)\}$$
Problem: minimization over $d$ is computationally challenging.
Observation: it looks like a weighted zero-one loss.
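A minimal numeric sketch of these two quantities, assuming the propensities $P(A_i|X_i)$ are known and supplied as a vector; the helper names are ours:

```python
# IPW estimates of the Value V(d) and the weighted 0-1 risk of a fixed rule;
# d_x holds d(X_i) in {-1, +1}, propensity holds P(A_i | X_i).
import numpy as np

def value_ipw(d_x, A, Y, propensity):
    return np.mean(Y * (d_x == A) / propensity)

def risk_ipw(d_x, A, Y, propensity):
    return np.mean(Y * (d_x != A) / propensity)
```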
Review: Outcome weighted learning (O-learning)
Key idea: take a surrogate loss (Zhao et al., 2012, JASA).
For any $d$, let $d(\cdot) = \mathrm{sign}\{f(\cdot)\}$ for measurable $f$.
Note: $I\{A \neq d(X)\} = I(Af(X) < 0)$.
Consider
$$f^o = \operatorname*{argmin}_{f:\ \text{measurable}}\; E\left[\frac{Y}{P(A|X)} \cdot \phi(Af(X))\right],$$
where $\phi(t)$ is a convex surrogate loss of $I(t < 0)$, e.g., $\frac{1}{\log 2}\log(1 + e^{-t})$ (logistic loss) or $(1-t)_+$ (hinge loss).
Fisher consistency: $\mathrm{sign}(f^o) = d^*$ if $Y > 0$.
Summary: Outcome weighted learning
Avoids outcome regression (likelihood); optimizes the Value directly (0-1 loss).
Potential for flexibility and robustness.
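A minimal sketch of this surrogate minimization for a linear rule $f(x) = x^\top \beta$ with the logistic loss, assuming $Y > 0$ and known propensities; the optimizer choice and names are ours:

```python
# O-learning with the logistic surrogate phi(t) = log(1 + e^{-t}) / log 2;
# minimizes the weighted empirical surrogate risk over a linear rule.
import numpy as np
from scipy.optimize import minimize

def fit_olearning(X, A, Y, propensity):
    w = Y / propensity                           # outcome weights, assumed > 0

    def surrogate_risk(beta):
        margin = A * (X @ beta)                  # A_i f(X_i)
        return np.mean(w * np.log1p(np.exp(-margin))) / np.log(2)

    beta = minimize(surrogate_risk, np.zeros(X.shape[1]), method="BFGS").x
    return lambda x: np.sign(x @ beta)           # d_hat = sign{f_hat}
```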
Review: Outcome weighted learning (O-learning)
Sparse O-learning (Xu et al., 2015; Song et al., 2015)
$d^*(x) = \mathrm{sign}(x^\top \beta^*)$ with $\beta^*$ sparse; estimate $\hat d(x) = \mathrm{sign}(x^\top \hat\beta)$ with
$$\hat\beta = \operatorname*{argmin}_{\beta \in \mathbb{R}^p}\; \frac{1}{n}\sum_{i=1}^n \frac{Y_i}{P(A_i|X_i)}\, \phi(A_i X_i^\top \beta) + \lambda \|\beta\|_1.$$
Existing results: convergence rate and model selection consistency of $\hat\beta$ when $P(A_i|X_i)$ is known.
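With the logistic surrogate, the sparse O-learning objective is exactly an $\ell_1$-penalized logistic regression with labels $A_i$ and sample weights $Y_i/P(A_i|X_i)$, so a minimal sketch can reuse scikit-learn (here C plays the role of $1/(n\lambda)$; assumes $Y > 0$):

```python
# Sparse O-learning via weighted l1-penalized logistic regression.
import numpy as np
from sklearn.linear_model import LogisticRegression

def fit_sparse_olearning(X, A, Y, propensity, C=1.0):
    clf = LogisticRegression(penalty="l1", solver="liblinear",
                             fit_intercept=False, C=C)
    clf.fit(X, A, sample_weight=Y / propensity)   # labels A_i in {-1, +1}
    beta = clf.coef_.ravel()
    return beta, (lambda x: np.sign(x @ beta))
```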
Proposed method
Task: rigorous inference on $d^*(X)$
Framework: outcome weighted learning
Main contribution: an asymptotically valid test when $X$ is high-dimensional and $P(A|X)$ is unknown
Inference: Prelude
Hypothesis: partition $\beta = (\beta_1, \beta_2^\top)^\top$ and test
$$H_0: \beta^*_1 = 0 \quad \text{vs.} \quad H_1: \beta^*_1 \neq 0 \qquad (\text{recall } d^*(X) = \mathrm{sign}(X^\top \beta^*)).$$
Easily extends to testing a vector $\beta^*_1$.
Helps discover important biomarker(s) for the treatment effect.
Why inference? To quantify the uncertainty of discovery (p-value).
Challenge: high-dimensional inference is much more difficult than estimation.
Sparse estimators are not regular (the distribution of $\hat\beta$ is not well understood).
Inference: How?
Key idea: $\hat\beta$ is obtained from a penalized M-estimation,
$$\hat\beta = \operatorname*{argmin}_{\beta \in \mathbb{R}^p}\; \underbrace{\frac{1}{n}\sum_{i=1}^n \frac{Y_i}{P_{\hat\alpha}(A_i|X_i)}\, \phi(A_i X_i^\top \beta)}_{l_\beta(\beta;\, \hat\alpha)} + \lambda \|\beta\|_1.$$
Use the decorrelated score test (Ning and Liu, 2017, AoS), which applies to general penalized M-estimation with a convex differentiable loss.
Challenge again: the nuisance $\alpha$ is estimated from a propensity score model, e.g., penalized logistic regression with $A$ as the response,
$$\hat\alpha = \operatorname*{argmin}_{\alpha \in \mathbb{R}^p}\; \underbrace{\frac{1}{n}\sum_{i=1}^n \log\{1 + \exp(-A_i X_i^\top \alpha)\}}_{l_\alpha(\alpha)} + \lambda \|\alpha\|_1.$$
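A minimal sketch of this propensity step, again leaning on scikit-learn's $\ell_1$-penalized logistic regression; returning the fitted probability of the treatment actually received gives the weights $P_{\hat\alpha}(A_i|X_i)$ directly:

```python
# l1-penalized logistic regression of A on X, then fitted propensities
# P_alpha_hat(A_i | X_i) for the observed treatments.
import numpy as np
from sklearn.linear_model import LogisticRegression

def fit_propensity(X, A, C=1.0):
    clf = LogisticRegression(penalty="l1", solver="liblinear",
                             fit_intercept=False, C=C)
    clf.fit(X, A)                                     # labels in {-1, +1}
    p_plus = clf.predict_proba(X)[:, list(clf.classes_).index(1)]
    return np.where(A == 1, p_plus, 1.0 - p_plus)     # prob. of observed A_i
```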
Intuition on the decorrelated score test (w/o nuisance α)
Forget everything so far!
$l(\beta)$: a convex differentiable loss; $\beta^* := \operatorname*{argmin}_\beta E[l(\beta)]$; $H_0: \beta^*_1 = 0$.
Information matrix:
$$I^* := E\left[\frac{\partial^2}{\partial\beta\,\partial\beta^\top}\, l(\beta^*)\right] = \begin{pmatrix} I^*_{11} & I^*_{12} \\ I^*_{21} & I^*_{22} \end{pmatrix}.$$
Classical Rao's score is based on the profile score function $\frac{\partial}{\partial\beta_1} l(0, \hat\beta_2(0))$, where $\hat\beta_2(\beta_1) = \operatorname*{argmin}_{\beta_2} l(\beta_1, \beta_2)$:
$$\sqrt{n}\,\frac{\partial}{\partial\beta_1}\, l(0, \hat\beta_2(0)) = \sqrt{n}\left[\frac{\partial}{\partial\beta_1}\, l(0, \beta^*_2) - I^*_{12}\,(\hat\beta_2(0) - \beta^*_2)\right] + \mathrm{Rem} = \sqrt{n}\left[\frac{\partial}{\partial\beta_1}\, l(0, \beta^*_2) - \underbrace{I^*_{12}(I^*_{22})^{-1}}_{(w^*)^\top}\,\frac{\partial}{\partial\beta_2}\, l(0, \beta^*_2)\right] + \mathrm{Rem}.$$
This is problematic in high dimensions: Rem is not negligible if $p \to \infty$, or the limiting distribution is not tractable.
The decorrelated score function for $\beta_1$,
$$S(\beta_1, \beta_2) = \frac{\partial}{\partial\beta_1}\, l(\beta_1, \beta_2) - w^\top \frac{\partial}{\partial\beta_2}\, l(\beta_1, \beta_2), \qquad w^\top = I_{12} I_{22}^{-1},$$
is uncorrelated with the nuisance score functions. If $\beta^*$ and $w^*$ are sparse, then
$$\sqrt{n}\, S(0, \hat\beta_2(0), \hat w) = \sqrt{n}\, S(\beta^*, w^*) + o_P(1).$$
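A toy numeric check of the decorrelation idea, using a quadratic loss so that the cancellation is exact: perturbing only the nuisance coordinates leaves the decorrelated score unchanged. The data and dimensions here are illustrative.

```python
# Decorrelated score for beta_1 under the quadratic loss
# l(beta) = ||y - Z beta||^2 / (2n), whose information matrix is Z^T Z / n.
import numpy as np

rng = np.random.default_rng(0)
n, p = 500, 5
Z = rng.normal(size=(n, p))
beta_true = np.array([0.0, 1.0, -1.0, 0.5, 0.0])
y = Z @ beta_true + rng.normal(size=n)

I_hat = Z.T @ Z / n
w = np.linalg.solve(I_hat[1:, 1:], I_hat[1:, 0])    # w = I22^{-1} I21

def decorrelated_score(beta):
    grad = -Z.T @ (y - Z @ beta) / n                # full score vector
    return grad[0] - w @ grad[1:]                   # decorrelate beta_1

perturbed = beta_true + np.r_[0.0, 0.05 * rng.normal(size=p - 1)]
# Equal up to floating-point error: the nuisance direction cancels exactly.
print(decorrelated_score(beta_true), decorrelated_score(perturbed))
```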
How to deal with the nuisance α?
Decorrelated score test statistic with the additional nuisance parameter:
$$S(\beta, w, \alpha, \nu) := \frac{\partial}{\partial\beta_1}\, l_\beta(\beta; \alpha) - \underbrace{w^\top \frac{\partial}{\partial\beta_2}\, l_\beta(\beta; \alpha)}_{\text{accounting for nuisance } \beta_2} - \underbrace{\nu^\top \frac{\partial}{\partial\alpha}\, l_\alpha(\alpha)}_{\text{accounting for nuisance } \alpha}$$
$$w^* = E\left[\frac{\partial^2 l_\beta}{\partial\beta_2\,\partial\beta_2^\top}\right]^{-1} E\left[\frac{\partial}{\partial\beta_2}\frac{\partial}{\partial\beta_1}\, l_\beta\right], \qquad \nu^* = E\left[\frac{\partial^2 l_\alpha}{\partial\alpha\,\partial\alpha^\top}\right]^{-1} E\left[\frac{\partial}{\partial\alpha}\left(\frac{\partial}{\partial\beta_1}\, l_\beta - w^{*\top}\frac{\partial}{\partial\beta_2}\, l_\beta\right)\right]$$
Estimating $w^*$ and $\nu^*$: if $z = A^{-1}b$ and $z$ is sparse,
$$\hat z = \operatorname*{argmin}_z\; z^\top \hat A z - 2\hat b^\top z + \lambda \|z\|_1.$$
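A minimal ISTA (proximal gradient) sketch for this $\ell_1$-penalized quadratic program; the step size and iteration count are illustrative choices:

```python
# Solve min_z z^T A z - 2 b^T z + lam * ||z||_1 for symmetric PSD A;
# the smooth part has gradient 2(Az - b) with Lipschitz constant 2||A||_2.
import numpy as np

def soft_threshold(v, t):
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def sparse_quadratic(A, b, lam, n_iter=500):
    step = 1.0 / (2.0 * np.linalg.norm(A, 2))
    z = np.zeros_like(b)
    for _ in range(n_iter):
        z = soft_threshold(z - step * 2.0 * (A @ z - b), step * lam)
    return z
```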
The proposed test
Inference on the optimal ITR: $H_0: \beta^*_1 = 0$, with the null-restricted estimate
$$\hat\beta^{\text{null}} = \operatorname*{argmin}_{\beta_1 = 0}\; \frac{1}{n}\sum_{i=1}^n \frac{Y_i}{P_{\hat\alpha}(A_i|X_i)}\, \phi(A_i X_i^\top \beta) + \lambda \|\beta\|_1.$$
Theorem. With additional sparsity assumptions on $w^*$ and $\nu^*$, under $H_0: \beta^*_1 = 0$,
$$\sqrt{n}\, \hat S(\hat\beta^{\text{null}}, \hat w, \hat\alpha, \hat\nu) \overset{d}{\to} N(0, \sigma^2).$$
We applied the decorrelated score test to deal with high-dimensional $X$.
The estimated propensity $P_{\hat\alpha}(A|X)$ requires a modification of the test statistic.
Applicable to other penalized M-estimators with data-driven weights, e.g., IPW.
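Given the statistic and a consistent variance estimate from the theorem, the test itself is one line; a minimal sketch (names ours):

```python
# Two-sided p-value for H0: beta*_1 = 0 from the decorrelated score,
# using sqrt(n) * S_hat / sigma_hat ~ N(0, 1) under the null.
import numpy as np
from scipy.stats import norm

def decorrelated_score_pvalue(S_hat, sigma2_hat, n):
    T = np.sqrt(n) * S_hat / np.sqrt(sigma2_hat)
    return 2.0 * norm.sf(abs(T))
```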
Simulation & data analysis
Simulation design
Comparison
Proposed (O-learning + $\ell_1$ penalty + decorrelated score test)
Q-learning (Qian and Murphy, 2011) + $\ell_1$ penalty + decorrelated score test
O-learning + no penalty + classical Rao's score test
Generative model summary: $n = 250$, $p = 100$, $E(Y|X, A) = \mu(X) + A\,c(X)$, with active covariates (O) as follows:

          X1   X2   X3   X4   X5   X6   X7   X8-X100
c(X)      O    O    O    O
µ(X)      O    O              O    O
P(A|X)    O         O         O         O

* Recall $d^*(X) = \mathrm{sign}\{c(X)\}$.
Performance metrics
Empirical power: $H_{01}: \beta^*_1 = 0$ through $H_{04}: \beta^*_4 = 0$
Empirical type-I error: $H_{05}: \beta^*_5 = 0$ through $H_{08}: \beta^*_8 = 0$
Value function, $V(\hat d)$
Simulation results
$Y$ continuous: $Y_i = E(Y_i|X_i, A_i) + \epsilon_i$ with $\epsilon_i \sim N(0, 1/4)$

                    Value   Power                       Type-I errors
                    V(d)    H01    H02    H03    H04    H05    H06    H07    H08
True optimal        0.62
Proposed            0.60    .963   .948   .952   .948   .055   .056   .038   .045
Q-learn & Decor     0.59    .905   .914   .918   .906   .056   .049   .045   .051
O-learn & Rao       0.52    .006   .008   .010   .009   .001   .002   .001   .001
True anti-optimal   0.38

$Y$ binary: $Y_i \sim$ Bernoulli with $p_i = E(Y_i|X_i, A_i)$

                    Value   Power                       Type-I errors
                    V(d)    H01    H02    H03    H04    H05    H06    H07    H08
True optimal        0.62
Proposed            0.57    .834   .739   .678   .762   .042   .060   .053   .056
Q-learn & Decor     0.56    .668   .653   .676   .666   .045   .048   .057   .043
O-learn & Rao       0.51    .015   .003   .000   .007   .001   .001   .001   .002
True anti-optimal   0.38
Application to the Diabetes study
Elderly patients with Type-II diabetes with comorbidity
n = 9101 patients and p = 136 covariates
Covariates: Sociodemographics, disease history, lab measures at the baseline
Treatment: Hypoglycemic agent (+1) / None (−1)
Outcome: A1c < 8% one year later (1) / otherwise (0)
Application to the Diabetes study
The fitted ITR: $\hat d(X) = \mathrm{sign}(X^\top \hat\beta)$
Fitted coefficients (zeros omitted):

Covariate (X)                          Coef    p-value
(Intercept)                            0.119
Glucose control success at baseline   -0.161   0.01
Chronic kidney disease                 0.127   <0.01
Other chronic complication             0.142   0.06
Female                                 0.043   0.13
Eye disease                            0.021   0.35

Empirical Value on the testing set (100 random splits), measured as the frequency of A1c < 8% after one year:

As given in the dataset   All +1   Q-learn   O-learn + no penalty   Proposed
85.6%                     84.2%    86.7%     86.4%                  87.2%

* SD ≈ 1.0%
Concluding remarks & future works
Where did we contribute?
Challenges
Treatments are not randomized.
Covariates (predictors) are high-dimensional.
Goal
Develop a principled, robust, data-driven method for discovering ITRs.
Contributions
Method: hypothesis testing on ITRs under a model-robust framework
Data-driven rules → knowledge
Theory: an inferential technique for penalized M-estimators with nuisance parameters
Work in the near future
Asymptotic tests and confidence regions for penalized M-estimators involving nuisance estimation
Augmented estimators
Convex non-differentiable losses
Inference on dynamic treatment regimes (DTRs)
A DTR optimizes a sequence of decision rules.
Important features may differ across decisions.
Idea: the optimal DTR minimizes a multivariate surrogate loss function (Zhao et al., 2015b, 2018).
Acknowledgement
Joint work with Young-Geun Choi, Yang Ning, and Maureen Smith
Organizers
NIH grant support, R01DK108073
Contact: yqzhao@fredhutch.org
THANK YOU!
Main result, full version
Adjusted decorrelated score test statistic:
$$S(\beta, w, \alpha, \nu) := \frac{\partial}{\partial\beta_1}\, l_\beta(\beta; \alpha) - w^\top \frac{\partial}{\partial\beta_2}\, l_\beta(\beta; \alpha) - \nu^\top \frac{\partial}{\partial\alpha}\, l_\alpha(\alpha)$$
Theorem. Assume that the outcome $Y = \mu(X) + A\,c(X) + \epsilon$ satisfies:
(1) $c(X) = h(X^\top \beta^*)$, where $h$ is continuous with $h(\cdot) > 0$ on $(0, \infty)$ and $h(\cdot) < 0$ on $(-\infty, 0)$; $\mathrm{logit}\{P(A = 1|X)\} = X^\top \alpha^*$, i.e., the propensity model is correctly specified;
(2) $\beta^*$, $w^*$, $\alpha^*$, and $\nu^*$ are sparse;
(3) $(\text{maximum sparsity}) \cdot (\log p)/\sqrt{n} \to 0$, with tuning parameters of order $\sqrt{\log p / n}$;
(4) the loss function satisfies restricted eigenvalue conditions;
(5) $E(X^\top b \mid X^\top \beta^*)$ is linear in $X^\top \beta^*$ and $E(X) = 0$; and
(6) $\epsilon$ is sub-exponential, $\epsilon \perp (X, A)$, and $0 < P(A|X) < 1$.
Then, under the null hypothesis $H_0: \beta^*_1 = 0$,
(a) $\sqrt{n}\, \hat S(\hat\beta^{\text{null}}, \hat w, \hat\alpha, \hat\nu) \overset{d}{\to} N(0, \sigma^2)$;
(b) $\sigma^2 = \mathrm{Var}\{S(\beta^*, w^*, \alpha^*, \nu^*)\}$ is consistently estimated.
How was the theorem derived?
$$\sqrt{n}\,\frac{\partial}{\partial\beta_1}\, l_\beta(\hat\beta^{\text{null}}; \hat\alpha) \approx \sqrt{n}\left[\frac{\partial}{\partial\beta_1}\, l_\beta(\beta^*; \alpha^*) - (w^*)^\top \frac{\partial}{\partial\beta_2}\, l_\beta(\beta^*; \alpha^*) - (\nu^*)^\top \frac{\partial}{\partial\alpha}\, l_\alpha(\alpha^*)\right],$$
where
$$w^* = E\left[\frac{\partial^2 l_\beta}{\partial\beta_2\,\partial\beta_2^\top}\right]^{-1} E\left[\frac{\partial}{\partial\beta_2}\frac{\partial}{\partial\beta_1}\, l_\beta\right], \qquad \nu^* = E\left[\frac{\partial^2 l_\alpha}{\partial\alpha\,\partial\alpha^\top}\right]^{-1} E\left[\frac{\partial}{\partial\alpha}\left(\frac{\partial}{\partial\beta_1}\, l_\beta - w^{*\top}\frac{\partial}{\partial\beta_2}\, l_\beta\right)\right],$$
with $l_\beta = l_\beta(\beta^*; \alpha^*)$ and $l_\alpha = l_\alpha(\alpha^*)$.
Estimating $w^*$ and $\nu^*$: if $z = A^{-1}b$ and $z$ is sparse,
$$\hat z = \operatorname*{argmin}_z\; z^\top \hat A z - 2\hat b^\top z + \lambda \|z\|_1.$$
Complete procedure
$$\hat\alpha = \operatorname*{argmin}_\alpha\; l_\alpha(\alpha) + P(\alpha)$$
$$\hat\beta^{\text{null}} = \operatorname*{argmin}_{\beta_1 = 0}\; l_\beta(\beta; \hat\alpha) + P(\beta)$$
$$\hat w = \operatorname*{argmin}_w\; w^\top \hat E\left[\frac{\partial^2 l_\beta}{\partial\beta_2\,\partial\beta_2^\top}\right] w - 2\,\hat E\left[\frac{\partial}{\partial\beta_2}\frac{\partial}{\partial\beta_1}\, l_\beta\right]^\top w + P(w)$$
$$\hat\nu = \operatorname*{argmin}_\nu\; \nu^\top \hat E\left[\frac{\partial^2 l_\alpha}{\partial\alpha\,\partial\alpha^\top}\right] \nu - 2\,\hat E\left[\frac{\partial}{\partial\alpha}\left(\frac{\partial}{\partial\beta_1}\, l_\beta - \hat w^\top \frac{\partial}{\partial\beta_2}\, l_\beta\right)\right]^\top \nu + P(\nu)$$
Tuning parameter selection
5-fold CV.
$\hat\alpha$, $\hat w$, $\hat\nu$: chosen by minimizing predictive loss functions.
$\hat\beta^{\text{null}}$: chosen by maximizing the predictive Value function,
$$V(d) = E\left[Y \cdot \frac{I\{A = d(X)\}}{P(A|X)}\right], \qquad \hat V(d) = \frac{\frac{1}{n}\sum_{i=1}^n Y_i\, I\{A_i = d(X_i)\}/\hat P(A_i|X_i)}{\frac{1}{n}\sum_{i=1}^n I\{A_i = d(X_i)\}/\hat P(A_i|X_i)},$$
with $d$ evaluated on the training set, and the data and $\hat P$ taken from the testing set.
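A minimal sketch of this normalized IPW Value estimate on a held-out fold, with p_hat the fitted propensity of the treatment actually received:

```python
# Normalized IPW estimate of V(d) on a test fold; d_x holds d(X_i).
import numpy as np

def value_hat(d_x, A, Y, p_hat):
    agree = (d_x == A) / p_hat
    return np.sum(Y * agree) / np.sum(agree)
```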
More details on the simulation
Q-learning (Qian and Murphy, 2011)
Fit
$$\operatorname*{argmin}_{\gamma, \beta}\; \frac{1}{n}\sum_{i=1}^n \left(Y_i - \Phi(X_i)^\top \gamma - A_i X_i^\top \beta\right)^2 + P(\gamma, \beta).$$
Estimated decision: $\hat d(x) = \mathrm{sign}(x^\top \hat\beta)$.
Significance testing: Qian and Murphy (2011) did not develop a test, but the decorrelated score test (Ning and Liu, 2017) can be used, since that paper provides the test for penalized least squares.
More details on the simulation
Generative model: $n = 250$, $p = 100$.
Signal structure (recall $E(Y|X, A) = \mu(X) + A\,c(X)$):
$X_{ij} \sim \mathrm{Bernoulli}(0.3)$
$c(X) = 0.7 \cdot (X_1 + X_2 - X_3 - X_4)/25 + 1/2$
$\mu(X) = (X_1 + X_2 - X_5 - X_6)/25 + 1/2$
$\mathrm{logit}\{P(A = 1|X)\} = 0.4 \cdot (X_1 + X_3 - X_5 - X_7)$
Note: $0 < E(Y|X, A) < 1$.
If $Y$ is continuous, $Y_i = E(Y_i|X_i, A_i) + \epsilon_i$ with $\epsilon_i \sim N(0, 1/9)$.
If $Y$ is binary, sample $Y_i \sim \mathrm{Bernoulli}$ with $p_i = E(Y_i|X_i, A_i)$.
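A minimal sketch generating one replicate of the continuous-outcome design exactly as stated above (variance 1/9 means standard deviation 1/3):

```python
# One simulated dataset from the stated generative model.
import numpy as np

rng = np.random.default_rng(0)
n, p = 250, 100
X = rng.binomial(1, 0.3, size=(n, p)).astype(float)  # X_ij ~ Bernoulli(0.3)

c = 0.7 * (X[:, 0] + X[:, 1] - X[:, 2] - X[:, 3]) / 25 + 0.5
mu = (X[:, 0] + X[:, 1] - X[:, 4] - X[:, 5]) / 25 + 0.5
prob_trt = 1.0 / (1.0 + np.exp(-0.4 * (X[:, 0] + X[:, 2] - X[:, 4] - X[:, 6])))

A = np.where(rng.uniform(size=n) < prob_trt, 1, -1)
Y = mu + A * c + rng.normal(0.0, 1.0 / 3.0, size=n)  # eps_i ~ N(0, 1/9)
```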
More simulation
Propensity misspecified:
$$\mathrm{logit}\{P(A = 1|X)\} = 0.685\exp(0.2X_1^2 + 0.3X_3^2) + 0.03(0.5 - X_5^2 - X_7^2)(X_5^2 + X_7^2 - 0.3) - 0.73$$

                    Value   Power                       Size
                    V(d)    H01    H02    H03    H04    H05    H06    H07    H08
True optimal        0.64
Proposed            0.50    .959   .959   .952   .957   .062   .046   .060   .039
Q-learn & Decor     0.44    .915   .908   .910   .912   .032   .048   .046   .051

Main effect misspecified:
$$\mu(X) = 0.685\exp(0.2X_1^2 + 0.3X_2^2) + 0.03(0.5 - X_5^2 - X_6^2)(X_5^2 + X_6^2 - 0.3) - 0.73$$

True optimal        0.64
Proposed            0.47    .928   .885   .925   .917   .085   .041   .049   .036
Q-learn & Decor     0.40    .881   .814   .822   .847   .090   .046   .054   .045

Treatment effect misspecified:
$$c(X) = 0.685\exp(0.2X_1^2 + 0.3X_2^2) + 0.03(0.5 - X_3^2 - X_4^2)(X_3^2 + X_4^2 - 0.3) - 0.73$$

True optimal        0.40
Proposed            0.18    .040   .997   .055   .035   .053   .050   .041   .051
Q-learn & Decor     0.13    .039   .984   .043   .042   .055   .053   .051   .053

** 1000 replications. 1.96·se(Value) < 0.007; 1.96·se(power) and 1.96·se(size) ≈ 0.014.
More details on the data analysis
Data selection criteria
Patients' EHRs were linked to claims and enrollment files from Medicare.
Patients were included if they
met a validated claims-based algorithm for identifying patients with diabetes, and
were medically homed, per an established plurality-provider algorithm, at the participating large, Midwestern, multi-specialty provider group.
Patients were included for each 90-day quarter from 2003-2011 in which they were alive at the start of the quarter, had continuous Medicare Part A & B fee-for-service coverage, and met the medical home criteria above.
Overview of O-learning
Zhao et al. (2012):
$$\hat d(\cdot) := \mathrm{sign}\{\hat f(\cdot)\}, \qquad \hat f = \operatorname*{argmin}_{f \in \mathcal{F}}\; \frac{1}{n}\sum_{i=1}^n \frac{Y_i}{P(A_i|X_i)}\, \phi(A_i f(X_i)) + \lambda \|f\|^2$$
Literature: changing the outcome weight to reduce the variability of the empirical risk.

Paper                 Outcome weight    Choice of g(X)
Zhou et al. (2017)    Y − g(X)          µ(X)
Liu et al. (2016)     |Y − g(X)|        E(Y|X)

Other advances: multi-stage decision rules (Zhao et al., 2015a), censored outcomes (Zhao et al., 2015b), tree-based decisions (Laber and Zhao, 2015), penalized linear decisions (Song et al., 2015; Xu et al., 2015), . . .
Fisher consistency
$$\hat\beta = \operatorname*{argmin}_{\beta \in \mathbb{R}^p}\; \underbrace{l_\beta(\beta)}_{\text{empirical weighted surrogate convex loss}} + P(\beta), \qquad \beta^o = \operatorname*{argmin}_{\beta \in \mathbb{R}^p}\; E\,[\text{weighted surrogate convex loss}], \qquad \beta^* = \operatorname*{argmin}_{\|\beta\|_2 = 1}\; E\,[\text{weighted 0-1 loss}].$$
Question: $\hat\beta \to \beta^o$ (under certain conditions), but we are interested in $\beta^*_1$. How do we connect $\hat\beta$ and $\beta^*$?
Fisher consistency implies $\beta^o = k\beta^*$ for some $k > 0$; thus $\beta^*_1 = 0 \iff \beta^o_1 = 0$.
Reduced problem: test $\beta^o_1 = 0$ with the convex loss function $l_\beta(\beta)$.
Now we can work as if we had a negative log-likelihood with true parameter $\beta^o$.
Sparse models on main effect & propensity
Main effect
Fact: $\mu(X) = E\left[\frac{Y}{2P(A|X)} \,\middle|\, X\right]$.
Assume $\mu(X) = X^\top \gamma^*$ with $\gamma^*$ sparse:
$$\hat\gamma = \operatorname*{argmin}_{\gamma \in \mathbb{R}^p}\; \frac{1}{n}\sum_{i=1}^n \left(\frac{Y_i}{2P(A_i|X_i)} - X_i^\top \gamma\right)^2 + P(\gamma)$$
Propensity
Assume $\mathrm{logit}\{P(A = 1|X)\} = X^\top \alpha^*$ with $\alpha^*$ sparse; fit by (penalized) logistic likelihood, balancing equations, etc.:
$$\hat\alpha = \operatorname*{argmin}_{\alpha \in \mathbb{R}^p}\; \frac{1}{n}\sum_{i=1}^n \log\{1 + \exp(-A_i X_i^\top \alpha)\} + P(\alpha)$$
Modified estimation
$$\hat\alpha = \operatorname*{argmin}_{\alpha \in \mathbb{R}^p}\; \underbrace{\frac{1}{n}\sum_{i=1}^n \log\{1 + \exp(-A_i X_i^\top \alpha)\}}_{=:\, l_\alpha(\alpha)} + P(\alpha)$$
$$\hat\gamma = \operatorname*{argmin}_{\gamma \in \mathbb{R}^p}\; \underbrace{\frac{1}{n}\sum_{i=1}^n \left(\frac{Y_i}{2P_{\hat\alpha}(A_i|X_i)} - X_i^\top \gamma\right)^2}_{=:\, l_\gamma(\gamma;\, \hat\alpha)} + P(\gamma)$$
$$\hat\beta^{\text{null}} = \operatorname*{argmin}_{\beta_1 = 0}\; \underbrace{\frac{1}{n}\sum_{i=1}^n \frac{|Y_i - X_i^\top \hat\gamma|}{P_{\hat\alpha}(A_i|X_i)}\, \phi\!\left(A_i\, \mathrm{sign}(Y_i - X_i^\top \hat\gamma)\, X_i^\top \beta\right)}_{=:\, l_\beta(\beta;\, \hat\gamma,\, \hat\alpha)} + P(\beta)$$
Challenge: $\hat\alpha$ and $\hat\gamma$ induce further variability in $l_\beta$.
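Putting the three steps together, a minimal pipeline sketch, reusing the fit_propensity helper sketched earlier and scikit-learn for the Lasso and weighted classification steps; dropping the first column of X is one way to impose the null restriction $\beta_1 = 0$, and all names and tuning constants are illustrative:

```python
# Modified estimation pipeline: (1) propensity, (2) main effect by a Lasso
# of Y / (2 P_hat) on X, (3) residual-weighted sparse O-learning with
# labels A_i * sign(Y_i - X_i^T gamma_hat) and X_1 dropped (beta_1 = 0).
import numpy as np
from sklearn.linear_model import Lasso, LogisticRegression

def modified_estimation(X, A, Y, C=1.0, lasso_alpha=0.01):
    p_hat = fit_propensity(X, A, C=C)                 # step 1 (sketched earlier)
    lasso = Lasso(alpha=lasso_alpha, fit_intercept=False)
    lasso.fit(X, Y / (2.0 * p_hat))                   # step 2: mu(X) = X^T gamma
    resid = Y - X @ lasso.coef_
    clf = LogisticRegression(penalty="l1", solver="liblinear",
                             fit_intercept=False, C=C)
    clf.fit(X[:, 1:], (A * np.sign(resid)).astype(int),
            sample_weight=np.abs(resid) / p_hat)      # step 3
    return lasso.coef_, clf.coef_.ravel()
```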
References I
Laber, E. B. and Zhao, Y. Q. (2015). Tree-based methods for individualized treatment regimes.
Biometrika, 102(3):501–514.
Liu, Y., Wang, Y., Kosorok, M. R., Zhao, Y., and Zeng, D. (2016). Robust hybrid learning for estimating personalized dynamic treatment regimens, pages 1–42.
Ning, Y. and Liu, H. (2017). A general theory of hypothesis tests and confidence regions for
sparse high dimensional models. The Annals of Statistics, 45(1):158–195.
Qian, M. and Murphy, S. A. (2011). Performance guarantees for individualized treatment rules.
The Annals of Statistics, 39(2):1180–1210.
Song, R., Kosorok, M., Zeng, D., Zhao, Y., Laber, E., and Yuan, M. (2015). On sparse
representation for optimal individualized treatment selection with penalized outcome weighted
learning. Stat, 4(1):59–68.
Xu, Y., Yu, M., Zhao, Y. Q., Li, Q., Wang, S., and Shao, J. (2015). Regularized outcome
weighted subgroup identification for differential treatment effects. Biometrics, 71(3):645–653.
Zhao, Y., Zeng, D., Laber, E. B., and Kosorok, M. R. (2015a). New statistical learning methods
for estimating optimal dynamic treatment regimes. Journal of the American Statistical
Association, 110(510):583–598.
Zhao, Y., Zeng, D., Rush, A. J., and Kosorok, M. R. (2012). Estimating individualized treatment rules using outcome weighted learning. Journal of the American Statistical Association, 107(499):1106–1118.
References II
Zhao, Y. Q., Zeng, D., Laber, E. B., Song, R., Yuan, M., and Kosorok, M. R. (2015b). Doubly
robust learning for estimating individualized treatment with censored data. Biometrika,
102(1):151–168.
Zhao, Y.-Q., Zhu, R., Chen, G., and Zheng, Y. (2018). Constructing stabilized dynamic
treatment regimes. arXiv preprint arXiv:1808.01332.
Zhou, X., Mayer-Hamblett, N., Khan, U., and Kosorok, M. R. (2017). Residual weighted learning for estimating individualized treatment rules. Journal of the American Statistical Association, 112(517):169–187.

More Related Content

What's hot

Generative models : VAE and GAN
Generative models : VAE and GANGenerative models : VAE and GAN
Generative models : VAE and GANSEMINARGROOT
 
A Gentle Introduction to Bayesian Nonparametrics
A Gentle Introduction to Bayesian NonparametricsA Gentle Introduction to Bayesian Nonparametrics
A Gentle Introduction to Bayesian NonparametricsJulyan Arbel
 
A Gentle Introduction to Bayesian Nonparametrics
A Gentle Introduction to Bayesian NonparametricsA Gentle Introduction to Bayesian Nonparametrics
A Gentle Introduction to Bayesian NonparametricsJulyan Arbel
 
Non-parametric analysis of models and data
Non-parametric analysis of models and dataNon-parametric analysis of models and data
Non-parametric analysis of models and datahaharrington
 
Uncoupled Regression from Pairwise Comparison Data
Uncoupled Regression from Pairwise Comparison DataUncoupled Regression from Pairwise Comparison Data
Uncoupled Regression from Pairwise Comparison DataLiyuan Xu
 
Chapter2: Likelihood-based approach
Chapter2: Likelihood-based approach Chapter2: Likelihood-based approach
Chapter2: Likelihood-based approach Jae-kwang Kim
 
Dependent processes in Bayesian Nonparametrics
Dependent processes in Bayesian NonparametricsDependent processes in Bayesian Nonparametrics
Dependent processes in Bayesian NonparametricsJulyan Arbel
 
Bayesian Nonparametrics, Applications to biology, ecology, and marketing
Bayesian Nonparametrics, Applications to biology, ecology, and marketingBayesian Nonparametrics, Applications to biology, ecology, and marketing
Bayesian Nonparametrics, Applications to biology, ecology, and marketingJulyan Arbel
 
Bayesian Nonparametrics: Models Based on the Dirichlet Process
Bayesian Nonparametrics: Models Based on the Dirichlet ProcessBayesian Nonparametrics: Models Based on the Dirichlet Process
Bayesian Nonparametrics: Models Based on the Dirichlet ProcessAlessandro Panella
 
Formulation of model likelihood functions
Formulation of model likelihood functionsFormulation of model likelihood functions
Formulation of model likelihood functionsAndreas Scheidegger
 
効率的反実仮想学習
効率的反実仮想学習効率的反実仮想学習
効率的反実仮想学習Masa Kato
 
Introduction to Supervised ML Concepts and Algorithms
Introduction to Supervised ML Concepts and AlgorithmsIntroduction to Supervised ML Concepts and Algorithms
Introduction to Supervised ML Concepts and AlgorithmsNBER
 
Big Data Analysis
Big Data AnalysisBig Data Analysis
Big Data AnalysisNBER
 

What's hot (20)

Generative models : VAE and GAN
Generative models : VAE and GANGenerative models : VAE and GAN
Generative models : VAE and GAN
 
A Gentle Introduction to Bayesian Nonparametrics
A Gentle Introduction to Bayesian NonparametricsA Gentle Introduction to Bayesian Nonparametrics
A Gentle Introduction to Bayesian Nonparametrics
 
2019 PMED Spring Course - Single Decision Treatment Regimes: Additional Metho...
2019 PMED Spring Course - Single Decision Treatment Regimes: Additional Metho...2019 PMED Spring Course - Single Decision Treatment Regimes: Additional Metho...
2019 PMED Spring Course - Single Decision Treatment Regimes: Additional Metho...
 
A Gentle Introduction to Bayesian Nonparametrics
A Gentle Introduction to Bayesian NonparametricsA Gentle Introduction to Bayesian Nonparametrics
A Gentle Introduction to Bayesian Nonparametrics
 
Non-parametric analysis of models and data
Non-parametric analysis of models and dataNon-parametric analysis of models and data
Non-parametric analysis of models and data
 
Uncoupled Regression from Pairwise Comparison Data
Uncoupled Regression from Pairwise Comparison DataUncoupled Regression from Pairwise Comparison Data
Uncoupled Regression from Pairwise Comparison Data
 
Propensity albert
Propensity albertPropensity albert
Propensity albert
 
Chapter2: Likelihood-based approach
Chapter2: Likelihood-based approach Chapter2: Likelihood-based approach
Chapter2: Likelihood-based approach
 
Dependent processes in Bayesian Nonparametrics
Dependent processes in Bayesian NonparametricsDependent processes in Bayesian Nonparametrics
Dependent processes in Bayesian Nonparametrics
 
ISBA 2016: Foundations
ISBA 2016: FoundationsISBA 2016: Foundations
ISBA 2016: Foundations
 
Bayesian Nonparametrics, Applications to biology, ecology, and marketing
Bayesian Nonparametrics, Applications to biology, ecology, and marketingBayesian Nonparametrics, Applications to biology, ecology, and marketing
Bayesian Nonparametrics, Applications to biology, ecology, and marketing
 
Causal Inference Opening Workshop - Some Applications of Reinforcement Learni...
Causal Inference Opening Workshop - Some Applications of Reinforcement Learni...Causal Inference Opening Workshop - Some Applications of Reinforcement Learni...
Causal Inference Opening Workshop - Some Applications of Reinforcement Learni...
 
Bayesian Nonparametrics: Models Based on the Dirichlet Process
Bayesian Nonparametrics: Models Based on the Dirichlet ProcessBayesian Nonparametrics: Models Based on the Dirichlet Process
Bayesian Nonparametrics: Models Based on the Dirichlet Process
 
Formulation of model likelihood functions
Formulation of model likelihood functionsFormulation of model likelihood functions
Formulation of model likelihood functions
 
効率的反実仮想学習
効率的反実仮想学習効率的反実仮想学習
効率的反実仮想学習
 
Introduction to Supervised ML Concepts and Algorithms
Introduction to Supervised ML Concepts and AlgorithmsIntroduction to Supervised ML Concepts and Algorithms
Introduction to Supervised ML Concepts and Algorithms
 
Causal Inference Opening Workshop - New Statistical Learning Methods for Esti...
Causal Inference Opening Workshop - New Statistical Learning Methods for Esti...Causal Inference Opening Workshop - New Statistical Learning Methods for Esti...
Causal Inference Opening Workshop - New Statistical Learning Methods for Esti...
 
Big Data Analysis
Big Data AnalysisBig Data Analysis
Big Data Analysis
 
Side 2019 #9
Side 2019 #9Side 2019 #9
Side 2019 #9
 
PMED Transition Workshop - Estimation & Optimization of Composite Outcomes - ...
PMED Transition Workshop - Estimation & Optimization of Composite Outcomes - ...PMED Transition Workshop - Estimation & Optimization of Composite Outcomes - ...
PMED Transition Workshop - Estimation & Optimization of Composite Outcomes - ...
 

Similar to PMED Opening Workshop - Inference on Individualized Treatment Rules from Observational Studies with High-Dimensional Covariates - Yingi Zhao, August 14, 2018

Slides: Hypothesis testing, information divergence and computational geometry
Slides: Hypothesis testing, information divergence and computational geometrySlides: Hypothesis testing, information divergence and computational geometry
Slides: Hypothesis testing, information divergence and computational geometryFrank Nielsen
 
Maximum likelihood estimation of regularisation parameters in inverse problem...
Maximum likelihood estimation of regularisation parameters in inverse problem...Maximum likelihood estimation of regularisation parameters in inverse problem...
Maximum likelihood estimation of regularisation parameters in inverse problem...Valentin De Bortoli
 
Learning to discover monte carlo algorithm on spin ice manifold
Learning to discover monte carlo algorithm on spin ice manifoldLearning to discover monte carlo algorithm on spin ice manifold
Learning to discover monte carlo algorithm on spin ice manifoldKai-Wen Zhao
 
Understanding variable importances in forests of randomized trees
Understanding variable importances in forests of randomized treesUnderstanding variable importances in forests of randomized trees
Understanding variable importances in forests of randomized treesGilles Louppe
 
Reject Inference in Credit Scoring
Reject Inference in Credit ScoringReject Inference in Credit Scoring
Reject Inference in Credit ScoringAdrien Ehrhardt
 
Statistical Decision Theory
Statistical Decision TheoryStatistical Decision Theory
Statistical Decision TheorySangwoo Mo
 
Module - 2 Discrete Mathematics and Graph Theory
Module - 2 Discrete Mathematics and Graph TheoryModule - 2 Discrete Mathematics and Graph Theory
Module - 2 Discrete Mathematics and Graph TheoryAdhiyaman Manickam
 
RuleML2015: Input-Output STIT Logic for Normative Systems
RuleML2015: Input-Output STIT Logic for Normative SystemsRuleML2015: Input-Output STIT Logic for Normative Systems
RuleML2015: Input-Output STIT Logic for Normative SystemsRuleML
 
slides_online_optimization_david_mateos
slides_online_optimization_david_mateosslides_online_optimization_david_mateos
slides_online_optimization_david_mateosDavid Mateos
 
Introduction to FDA and linear models
 Introduction to FDA and linear models Introduction to FDA and linear models
Introduction to FDA and linear modelstuxette
 

Similar to PMED Opening Workshop - Inference on Individualized Treatment Rules from Observational Studies with High-Dimensional Covariates - Yingi Zhao, August 14, 2018 (20)

Slides ACTINFO 2016
Slides ACTINFO 2016Slides ACTINFO 2016
Slides ACTINFO 2016
 
Proba stats-r1-2017
Proba stats-r1-2017Proba stats-r1-2017
Proba stats-r1-2017
 
Slides: Hypothesis testing, information divergence and computational geometry
Slides: Hypothesis testing, information divergence and computational geometrySlides: Hypothesis testing, information divergence and computational geometry
Slides: Hypothesis testing, information divergence and computational geometry
 
Maximum likelihood estimation of regularisation parameters in inverse problem...
Maximum likelihood estimation of regularisation parameters in inverse problem...Maximum likelihood estimation of regularisation parameters in inverse problem...
Maximum likelihood estimation of regularisation parameters in inverse problem...
 
Econometrics 2017-graduate-3
Econometrics 2017-graduate-3Econometrics 2017-graduate-3
Econometrics 2017-graduate-3
 
Learning to discover monte carlo algorithm on spin ice manifold
Learning to discover monte carlo algorithm on spin ice manifoldLearning to discover monte carlo algorithm on spin ice manifold
Learning to discover monte carlo algorithm on spin ice manifold
 
lec2_CS540_handouts.pdf
lec2_CS540_handouts.pdflec2_CS540_handouts.pdf
lec2_CS540_handouts.pdf
 
Understanding variable importances in forests of randomized trees
Understanding variable importances in forests of randomized treesUnderstanding variable importances in forests of randomized trees
Understanding variable importances in forests of randomized trees
 
Side 2019, part 1
Side 2019, part 1Side 2019, part 1
Side 2019, part 1
 
Classification
ClassificationClassification
Classification
 
Reject Inference in Credit Scoring
Reject Inference in Credit ScoringReject Inference in Credit Scoring
Reject Inference in Credit Scoring
 
Statistical Decision Theory
Statistical Decision TheoryStatistical Decision Theory
Statistical Decision Theory
 
Madrid easy
Madrid easyMadrid easy
Madrid easy
 
ppt0320defenseday
ppt0320defensedayppt0320defenseday
ppt0320defenseday
 
Module - 2 Discrete Mathematics and Graph Theory
Module - 2 Discrete Mathematics and Graph TheoryModule - 2 Discrete Mathematics and Graph Theory
Module - 2 Discrete Mathematics and Graph Theory
 
QMC: Transition Workshop - Importance Sampling the Union of Rare Events with ...
QMC: Transition Workshop - Importance Sampling the Union of Rare Events with ...QMC: Transition Workshop - Importance Sampling the Union of Rare Events with ...
QMC: Transition Workshop - Importance Sampling the Union of Rare Events with ...
 
QMC Program: Trends and Advances in Monte Carlo Sampling Algorithms Workshop,...
QMC Program: Trends and Advances in Monte Carlo Sampling Algorithms Workshop,...QMC Program: Trends and Advances in Monte Carlo Sampling Algorithms Workshop,...
QMC Program: Trends and Advances in Monte Carlo Sampling Algorithms Workshop,...
 
RuleML2015: Input-Output STIT Logic for Normative Systems
RuleML2015: Input-Output STIT Logic for Normative SystemsRuleML2015: Input-Output STIT Logic for Normative Systems
RuleML2015: Input-Output STIT Logic for Normative Systems
 
slides_online_optimization_david_mateos
slides_online_optimization_david_mateosslides_online_optimization_david_mateos
slides_online_optimization_david_mateos
 
Introduction to FDA and linear models
 Introduction to FDA and linear models Introduction to FDA and linear models
Introduction to FDA and linear models
 

More from The Statistical and Applied Mathematical Sciences Institute

More from The Statistical and Applied Mathematical Sciences Institute (20)

Causal Inference Opening Workshop - Latent Variable Models, Causal Inference,...
Causal Inference Opening Workshop - Latent Variable Models, Causal Inference,...Causal Inference Opening Workshop - Latent Variable Models, Causal Inference,...
Causal Inference Opening Workshop - Latent Variable Models, Causal Inference,...
 
2019 Fall Series: Special Guest Lecture - 0-1 Phase Transitions in High Dimen...
2019 Fall Series: Special Guest Lecture - 0-1 Phase Transitions in High Dimen...2019 Fall Series: Special Guest Lecture - 0-1 Phase Transitions in High Dimen...
2019 Fall Series: Special Guest Lecture - 0-1 Phase Transitions in High Dimen...
 
Causal Inference Opening Workshop - Causal Discovery in Neuroimaging Data - F...
Causal Inference Opening Workshop - Causal Discovery in Neuroimaging Data - F...Causal Inference Opening Workshop - Causal Discovery in Neuroimaging Data - F...
Causal Inference Opening Workshop - Causal Discovery in Neuroimaging Data - F...
 
Causal Inference Opening Workshop - Smooth Extensions to BART for Heterogeneo...
Causal Inference Opening Workshop - Smooth Extensions to BART for Heterogeneo...Causal Inference Opening Workshop - Smooth Extensions to BART for Heterogeneo...
Causal Inference Opening Workshop - Smooth Extensions to BART for Heterogeneo...
 
Causal Inference Opening Workshop - A Bracketing Relationship between Differe...
Causal Inference Opening Workshop - A Bracketing Relationship between Differe...Causal Inference Opening Workshop - A Bracketing Relationship between Differe...
Causal Inference Opening Workshop - A Bracketing Relationship between Differe...
 
Causal Inference Opening Workshop - Testing Weak Nulls in Matched Observation...
Causal Inference Opening Workshop - Testing Weak Nulls in Matched Observation...Causal Inference Opening Workshop - Testing Weak Nulls in Matched Observation...
Causal Inference Opening Workshop - Testing Weak Nulls in Matched Observation...
 
Causal Inference Opening Workshop - Difference-in-differences: more than meet...
Causal Inference Opening Workshop - Difference-in-differences: more than meet...Causal Inference Opening Workshop - Difference-in-differences: more than meet...
Causal Inference Opening Workshop - Difference-in-differences: more than meet...
 
Causal Inference Opening Workshop - Bipartite Causal Inference with Interfere...
Causal Inference Opening Workshop - Bipartite Causal Inference with Interfere...Causal Inference Opening Workshop - Bipartite Causal Inference with Interfere...
Causal Inference Opening Workshop - Bipartite Causal Inference with Interfere...
 
Causal Inference Opening Workshop - Bridging the Gap Between Causal Literatur...
Causal Inference Opening Workshop - Bridging the Gap Between Causal Literatur...Causal Inference Opening Workshop - Bridging the Gap Between Causal Literatur...
Causal Inference Opening Workshop - Bridging the Gap Between Causal Literatur...
 
Causal Inference Opening Workshop - Bracketing Bounds for Differences-in-Diff...
Causal Inference Opening Workshop - Bracketing Bounds for Differences-in-Diff...Causal Inference Opening Workshop - Bracketing Bounds for Differences-in-Diff...
Causal Inference Opening Workshop - Bracketing Bounds for Differences-in-Diff...
 
Causal Inference Opening Workshop - Assisting the Impact of State Polcies: Br...
Causal Inference Opening Workshop - Assisting the Impact of State Polcies: Br...Causal Inference Opening Workshop - Assisting the Impact of State Polcies: Br...
Causal Inference Opening Workshop - Assisting the Impact of State Polcies: Br...
 
Causal Inference Opening Workshop - Experimenting in Equilibrium - Stefan Wag...
Causal Inference Opening Workshop - Experimenting in Equilibrium - Stefan Wag...Causal Inference Opening Workshop - Experimenting in Equilibrium - Stefan Wag...
Causal Inference Opening Workshop - Experimenting in Equilibrium - Stefan Wag...
 
Causal Inference Opening Workshop - Targeted Learning for Causal Inference Ba...
Causal Inference Opening Workshop - Targeted Learning for Causal Inference Ba...Causal Inference Opening Workshop - Targeted Learning for Causal Inference Ba...
Causal Inference Opening Workshop - Targeted Learning for Causal Inference Ba...
 
Causal Inference Opening Workshop - Bayesian Nonparametric Models for Treatme...
Causal Inference Opening Workshop - Bayesian Nonparametric Models for Treatme...Causal Inference Opening Workshop - Bayesian Nonparametric Models for Treatme...
Causal Inference Opening Workshop - Bayesian Nonparametric Models for Treatme...
 
2019 Fall Series: Special Guest Lecture - Adversarial Risk Analysis of the Ge...
2019 Fall Series: Special Guest Lecture - Adversarial Risk Analysis of the Ge...2019 Fall Series: Special Guest Lecture - Adversarial Risk Analysis of the Ge...
2019 Fall Series: Special Guest Lecture - Adversarial Risk Analysis of the Ge...
 
2019 Fall Series: Professional Development, Writing Academic Papers…What Work...
2019 Fall Series: Professional Development, Writing Academic Papers…What Work...2019 Fall Series: Professional Development, Writing Academic Papers…What Work...
2019 Fall Series: Professional Development, Writing Academic Papers…What Work...
 
2019 GDRR: Blockchain Data Analytics - Machine Learning in/for Blockchain: Fu...
2019 GDRR: Blockchain Data Analytics - Machine Learning in/for Blockchain: Fu...2019 GDRR: Blockchain Data Analytics - Machine Learning in/for Blockchain: Fu...
2019 GDRR: Blockchain Data Analytics - Machine Learning in/for Blockchain: Fu...
 
2019 GDRR: Blockchain Data Analytics - QuTrack: Model Life Cycle Management f...
2019 GDRR: Blockchain Data Analytics - QuTrack: Model Life Cycle Management f...2019 GDRR: Blockchain Data Analytics - QuTrack: Model Life Cycle Management f...
2019 GDRR: Blockchain Data Analytics - QuTrack: Model Life Cycle Management f...
 
2019 GDRR: Blockchain Data Analytics - Modeling Cryptocurrency Markets with T...
2019 GDRR: Blockchain Data Analytics - Modeling Cryptocurrency Markets with T...2019 GDRR: Blockchain Data Analytics - Modeling Cryptocurrency Markets with T...
2019 GDRR: Blockchain Data Analytics - Modeling Cryptocurrency Markets with T...
 
2019 GDRR: Blockchain Data Analytics - Tracking Criminals by Following the Mo...
2019 GDRR: Blockchain Data Analytics - Tracking Criminals by Following the Mo...2019 GDRR: Blockchain Data Analytics - Tracking Criminals by Following the Mo...
2019 GDRR: Blockchain Data Analytics - Tracking Criminals by Following the Mo...
 

Recently uploaded

SOCIAL AND HISTORICAL CONTEXT - LFTVD.pptx
SOCIAL AND HISTORICAL CONTEXT - LFTVD.pptxSOCIAL AND HISTORICAL CONTEXT - LFTVD.pptx
SOCIAL AND HISTORICAL CONTEXT - LFTVD.pptxiammrhaywood
 
The Most Excellent Way | 1 Corinthians 13
The Most Excellent Way | 1 Corinthians 13The Most Excellent Way | 1 Corinthians 13
The Most Excellent Way | 1 Corinthians 13Steve Thomason
 
Alper Gobel In Media Res Media Component
Alper Gobel In Media Res Media ComponentAlper Gobel In Media Res Media Component
Alper Gobel In Media Res Media ComponentInMediaRes1
 
Hybridoma Technology ( Production , Purification , and Application )
Hybridoma Technology  ( Production , Purification , and Application  ) Hybridoma Technology  ( Production , Purification , and Application  )
Hybridoma Technology ( Production , Purification , and Application ) Sakshi Ghasle
 
POINT- BIOCHEMISTRY SEM 2 ENZYMES UNIT 5.pptx
POINT- BIOCHEMISTRY SEM 2 ENZYMES UNIT 5.pptxPOINT- BIOCHEMISTRY SEM 2 ENZYMES UNIT 5.pptx
POINT- BIOCHEMISTRY SEM 2 ENZYMES UNIT 5.pptxSayali Powar
 
Crayon Activity Handout For the Crayon A
Crayon Activity Handout For the Crayon ACrayon Activity Handout For the Crayon A
Crayon Activity Handout For the Crayon AUnboundStockton
 
CARE OF CHILD IN INCUBATOR..........pptx
CARE OF CHILD IN INCUBATOR..........pptxCARE OF CHILD IN INCUBATOR..........pptx
CARE OF CHILD IN INCUBATOR..........pptxGaneshChakor2
 
Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...
Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...
Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...EduSkills OECD
 
18-04-UA_REPORT_MEDIALITERAСY_INDEX-DM_23-1-final-eng.pdf
18-04-UA_REPORT_MEDIALITERAСY_INDEX-DM_23-1-final-eng.pdf18-04-UA_REPORT_MEDIALITERAСY_INDEX-DM_23-1-final-eng.pdf
18-04-UA_REPORT_MEDIALITERAСY_INDEX-DM_23-1-final-eng.pdfssuser54595a
 
Class 11 Legal Studies Ch-1 Concept of State .pdf
Class 11 Legal Studies Ch-1 Concept of State .pdfClass 11 Legal Studies Ch-1 Concept of State .pdf
Class 11 Legal Studies Ch-1 Concept of State .pdfakmcokerachita
 
The basics of sentences session 2pptx copy.pptx
The basics of sentences session 2pptx copy.pptxThe basics of sentences session 2pptx copy.pptx
The basics of sentences session 2pptx copy.pptxheathfieldcps1
 
call girls in Kamla Market (DELHI) 🔝 >༒9953330565🔝 genuine Escort Service 🔝✔️✔️
call girls in Kamla Market (DELHI) 🔝 >༒9953330565🔝 genuine Escort Service 🔝✔️✔️call girls in Kamla Market (DELHI) 🔝 >༒9953330565🔝 genuine Escort Service 🔝✔️✔️
call girls in Kamla Market (DELHI) 🔝 >༒9953330565🔝 genuine Escort Service 🔝✔️✔️9953056974 Low Rate Call Girls In Saket, Delhi NCR
 
_Math 4-Q4 Week 5.pptx Steps in Collecting Data
_Math 4-Q4 Week 5.pptx Steps in Collecting Data_Math 4-Q4 Week 5.pptx Steps in Collecting Data
_Math 4-Q4 Week 5.pptx Steps in Collecting DataJhengPantaleon
 
Mastering the Unannounced Regulatory Inspection
Mastering the Unannounced Regulatory InspectionMastering the Unannounced Regulatory Inspection
Mastering the Unannounced Regulatory InspectionSafetyChain Software
 
Interactive Powerpoint_How to Master effective communication
Interactive Powerpoint_How to Master effective communicationInteractive Powerpoint_How to Master effective communication
Interactive Powerpoint_How to Master effective communicationnomboosow
 
Contemporary philippine arts from the regions_PPT_Module_12 [Autosaved] (1).pptx
Contemporary philippine arts from the regions_PPT_Module_12 [Autosaved] (1).pptxContemporary philippine arts from the regions_PPT_Module_12 [Autosaved] (1).pptx
Contemporary philippine arts from the regions_PPT_Module_12 [Autosaved] (1).pptxRoyAbrique
 
Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...
Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...
Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...Krashi Coaching
 
“Oh GOSH! Reflecting on Hackteria's Collaborative Practices in a Global Do-It...
“Oh GOSH! Reflecting on Hackteria's Collaborative Practices in a Global Do-It...“Oh GOSH! Reflecting on Hackteria's Collaborative Practices in a Global Do-It...
“Oh GOSH! Reflecting on Hackteria's Collaborative Practices in a Global Do-It...Marc Dusseiller Dusjagr
 

Recently uploaded (20)

SOCIAL AND HISTORICAL CONTEXT - LFTVD.pptx
SOCIAL AND HISTORICAL CONTEXT - LFTVD.pptxSOCIAL AND HISTORICAL CONTEXT - LFTVD.pptx
SOCIAL AND HISTORICAL CONTEXT - LFTVD.pptx
 
The Most Excellent Way | 1 Corinthians 13
The Most Excellent Way | 1 Corinthians 13The Most Excellent Way | 1 Corinthians 13
The Most Excellent Way | 1 Corinthians 13
 
Alper Gobel In Media Res Media Component
Alper Gobel In Media Res Media ComponentAlper Gobel In Media Res Media Component
Alper Gobel In Media Res Media Component
 
Hybridoma Technology ( Production , Purification , and Application )
Hybridoma Technology  ( Production , Purification , and Application  ) Hybridoma Technology  ( Production , Purification , and Application  )
Hybridoma Technology ( Production , Purification , and Application )
 
POINT- BIOCHEMISTRY SEM 2 ENZYMES UNIT 5.pptx
POINT- BIOCHEMISTRY SEM 2 ENZYMES UNIT 5.pptxPOINT- BIOCHEMISTRY SEM 2 ENZYMES UNIT 5.pptx
POINT- BIOCHEMISTRY SEM 2 ENZYMES UNIT 5.pptx
 
Crayon Activity Handout For the Crayon A
Crayon Activity Handout For the Crayon ACrayon Activity Handout For the Crayon A
Crayon Activity Handout For the Crayon A
 
CARE OF CHILD IN INCUBATOR..........pptx
CARE OF CHILD IN INCUBATOR..........pptxCARE OF CHILD IN INCUBATOR..........pptx
CARE OF CHILD IN INCUBATOR..........pptx
 
Model Call Girl in Bikash Puri Delhi reach out to us at 🔝9953056974🔝
Model Call Girl in Bikash Puri  Delhi reach out to us at 🔝9953056974🔝Model Call Girl in Bikash Puri  Delhi reach out to us at 🔝9953056974🔝
Model Call Girl in Bikash Puri Delhi reach out to us at 🔝9953056974🔝
 
Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...
Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...
Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...
 
18-04-UA_REPORT_MEDIALITERAСY_INDEX-DM_23-1-final-eng.pdf
18-04-UA_REPORT_MEDIALITERAСY_INDEX-DM_23-1-final-eng.pdf18-04-UA_REPORT_MEDIALITERAСY_INDEX-DM_23-1-final-eng.pdf
18-04-UA_REPORT_MEDIALITERAСY_INDEX-DM_23-1-final-eng.pdf
 
Class 11 Legal Studies Ch-1 Concept of State .pdf
Class 11 Legal Studies Ch-1 Concept of State .pdfClass 11 Legal Studies Ch-1 Concept of State .pdf
Class 11 Legal Studies Ch-1 Concept of State .pdf
 
The basics of sentences session 2pptx copy.pptx
The basics of sentences session 2pptx copy.pptxThe basics of sentences session 2pptx copy.pptx
The basics of sentences session 2pptx copy.pptx
 
call girls in Kamla Market (DELHI) 🔝 >༒9953330565🔝 genuine Escort Service 🔝✔️✔️
call girls in Kamla Market (DELHI) 🔝 >༒9953330565🔝 genuine Escort Service 🔝✔️✔️call girls in Kamla Market (DELHI) 🔝 >༒9953330565🔝 genuine Escort Service 🔝✔️✔️
call girls in Kamla Market (DELHI) 🔝 >༒9953330565🔝 genuine Escort Service 🔝✔️✔️
 
_Math 4-Q4 Week 5.pptx Steps in Collecting Data
_Math 4-Q4 Week 5.pptx Steps in Collecting Data_Math 4-Q4 Week 5.pptx Steps in Collecting Data
_Math 4-Q4 Week 5.pptx Steps in Collecting Data
 
Mastering the Unannounced Regulatory Inspection
Mastering the Unannounced Regulatory InspectionMastering the Unannounced Regulatory Inspection
Mastering the Unannounced Regulatory Inspection
 
Interactive Powerpoint_How to Master effective communication
Interactive Powerpoint_How to Master effective communicationInteractive Powerpoint_How to Master effective communication
Interactive Powerpoint_How to Master effective communication
 
Contemporary philippine arts from the regions_PPT_Module_12 [Autosaved] (1).pptx
Contemporary philippine arts from the regions_PPT_Module_12 [Autosaved] (1).pptxContemporary philippine arts from the regions_PPT_Module_12 [Autosaved] (1).pptx
Contemporary philippine arts from the regions_PPT_Module_12 [Autosaved] (1).pptx
 
Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...
Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...
Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...
 
“Oh GOSH! Reflecting on Hackteria's Collaborative Practices in a Global Do-It...
“Oh GOSH! Reflecting on Hackteria's Collaborative Practices in a Global Do-It...“Oh GOSH! Reflecting on Hackteria's Collaborative Practices in a Global Do-It...
“Oh GOSH! Reflecting on Hackteria's Collaborative Practices in a Global Do-It...
 
Model Call Girl in Tilak Nagar Delhi reach out to us at 🔝9953056974🔝
Model Call Girl in Tilak Nagar Delhi reach out to us at 🔝9953056974🔝Model Call Girl in Tilak Nagar Delhi reach out to us at 🔝9953056974🔝
Model Call Girl in Tilak Nagar Delhi reach out to us at 🔝9953056974🔝
 

PMED Opening Workshop - Inference on Individualized Treatment Rules from Observational Studies with High-Dimensional Covariates - Yingi Zhao, August 14, 2018

• 11. Review: Outcome weighted learning (O-learning)
Another characterization of the optimal ITR. For $(X, A, Y) \sim P$, the value function is
$$V(d) = E^d(Y) = \int Y \, dP^d = \int Y \, \frac{dP^d}{dP}\, dP = E\left[\frac{Y \, I\{A = d(X)\}}{P(A|X)}\right]$$
(Qian and Murphy, 2011, AoS). The optimal ITR satisfies $d^* = \operatorname{argmax}_d V(d)$. (A sketch of this plug-in value estimate follows.)
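A minimal sketch (not the speaker's code) of the inverse-probability-weighted value estimate above. The data, the rule `d`, and the known propensity of 1/2 are illustrative assumptions.

```python
import numpy as np

def ipw_value(X, A, Y, d, propensity):
    """Estimate V(d) = E[ Y * I{A = d(X)} / P(A|X) ] by its empirical mean.

    d: function mapping covariates to {-1, +1}
    propensity: function returning P(A_i | X_i) for the observed A_i
    """
    match = (A == d(X)).astype(float)               # I{A = d(X)}
    return np.mean(Y * match / propensity(X, A))

# Toy usage: randomized treatment with P(A|X) = 1/2, rule d(x) = sign(x_1)
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 5))
A = rng.choice([-1, 1], size=500)
Y = 1.0 + A * X[:, 0] + rng.normal(size=500)        # treatment effect in x_1
d = lambda X: np.sign(X[:, 0])
print(ipw_value(X, A, Y, d, lambda X, A: np.full(len(A), 0.5)))
```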
• 12. Review: Outcome weighted learning (O-learning)
How about directly optimizing $V(d)$? Value maximization is equivalent to risk minimization:
$$\max_d\; E\left[\frac{Y}{P(A|X)}\, I\{A = d(X)\}\right] \iff \min_d\; E\left[\frac{Y}{P(A|X)}\, I\{A \neq d(X)\}\right],$$
with empirical risk
$$\frac{1}{n}\sum_{i=1}^n \frac{Y_i}{P(A_i|X_i)}\, I\{A_i \neq d(X_i)\}.$$
Problem: minimization over d is computationally challenging.
Observation: it looks like a weighted zero-one loss.
• 13–15. Review: Outcome weighted learning (O-learning)
Key idea: take a surrogate loss (Zhao et al., 2012, JASA). For any d, let $d(\cdot) = \operatorname{sign}\{f(\cdot)\}$ for measurable f. Note: $I\{A \neq d(X)\} = I(Af(X) < 0)$. Consider
$$f^o = \operatorname*{argmin}_{f:\ \text{measurable}}\; E\left[\frac{Y}{P(A|X)}\, \varphi(Af(X))\right],$$
where $\varphi(t)$ is a convex surrogate of $I(t < 0)$, e.g. $\frac{1}{\log 2}\log(1 + e^{-t})$ (logistic loss) or $(1-t)_+$ (hinge loss).
Fisher consistency: $\operatorname{sign}(f^o) = d^*$ if $Y > 0$.
Summary of outcome weighted learning: avoid outcome regression (likelihood) and optimize the value directly (0-1 loss); potential for flexibility and robustness. (A small numerical illustration follows.)
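A small numerical check, assuming positive outcomes and a known propensity of 1/2 (both illustrative), that the weighted 0-1 risk is dominated by the weighted logistic-surrogate risk, using $I\{A \neq \operatorname{sign} f(X)\} = I(Af(X) < 0)$:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 1000
X = rng.normal(size=(n, 3))
A = rng.choice([-1, 1], size=n)
Y = np.abs(1 + X[:, 0] + rng.normal(size=n))    # positive outcomes (Y > 0)
w = Y / 0.5                                     # weights Y / P(A|X), P = 1/2 here

f = X @ np.array([1.0, 0.0, 0.0])               # a candidate linear decision score
margin = A * f
zero_one = np.mean(w * (margin < 0))            # weighted 0-1 risk
logistic = np.mean(w * np.log1p(np.exp(-margin)) / np.log(2))
print(zero_one, logistic)                       # surrogate risk dominates 0-1 risk
```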
• 16. Review: Outcome weighted learning (O-learning)
Sparse O-learning (Xu et al., 2015; Song et al., 2015): assume $d^*(x) = \operatorname{sign}(x^T\beta^*)$ with $\beta^*$ sparse, and set $\hat d(x) = \operatorname{sign}(x^T\hat\beta)$ with
$$\hat\beta = \operatorname*{argmin}_{\beta \in \mathbb{R}^p}\; \frac{1}{n}\sum_{i=1}^n \frac{Y_i}{P(A_i|X_i)}\, \varphi(A_i X_i^T\beta) + \lambda\|\beta\|_1.$$
Existing results: convergence rate and model-selection consistency of $\hat\beta$ when $P(A_i|X_i)$ is known. (A sketch of this fit follows.)
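A sketch of sparse O-learning with the logistic surrogate, using the fact that minimizing $\sum_i w_i \log(1 + e^{-A_i X_i^T\beta})$ is an l1-penalized logistic regression with labels $A_i$ and sample weights $w_i = Y_i / P(A_i|X_i)$. The data, the known propensity of 1/2, and the regularization level `C` (scikit-learn's inverse of $n\lambda$, which would be tuned in practice) are all assumptions for illustration; this requires $Y > 0$.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(2)
n, p = 400, 50
X = rng.normal(size=(n, p))
A = rng.choice([-1, 1], size=n)
Y = np.abs(1 + A * (X[:, 0] - X[:, 1]) + rng.normal(size=n))   # Y > 0

w = Y / 0.5                                    # outcome weights Y / P(A|X)
fit = LogisticRegression(penalty="l1", solver="liblinear", C=0.1,
                         fit_intercept=False)
fit.fit(X, A, sample_weight=w)                 # weighted logistic surrogate + l1
beta = fit.coef_.ravel()                       # sparse; d(x) = sign(x @ beta)
print(np.nonzero(beta)[0])                     # selected covariates
```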
• 17. Proposed method
Task: rigorous inference on $d^*(X)$. Framework: outcome weighted learning.
Main contribution: an asymptotically valid test when X is high-dimensional and P(A|X) is unknown.
• 18. Inference: Prelude
Hypothesis: writing $\beta = (\beta_1, \beta_2^T)^T$, test $H_0: \beta_1^* = 0$ vs. $H_1: \beta_1^* \neq 0$ (recall $d^*(X) = \operatorname{sign}(X^T\beta^*)$).
Easily extends to testing a vector $\beta_1^*$; helps discover important biomarker(s) for the treatment effect.
Why inference? To quantify the uncertainty of discovery (p-value).
Challenge: high-dimensional inference is much more difficult than estimation; sparse estimators are not regular (the distribution of $\hat\beta$ is not well understood).
• 19. Inference: How?
Key idea: $\hat\beta$ is obtained from a penalized M-estimation,
$$\hat\beta = \operatorname*{argmin}_{\beta \in \mathbb{R}^p}\; \underbrace{\frac{1}{n}\sum_{i=1}^n \frac{Y_i}{P_{\hat\alpha}(A_i|X_i)}\, \varphi(A_i X_i^T\beta)}_{l_\beta(\beta;\,\hat\alpha)} + \lambda\|\beta\|_1.$$
Use the decorrelated score test (Ning and Liu, 2017, AoS), applicable to general penalized M-estimation with a convex differentiable loss.
Challenge again: the nuisance $\hat\alpha$ is estimated from a propensity score model, e.g. penalized logistic regression with A as the response,
$$\hat\alpha = \operatorname*{argmin}_{\alpha \in \mathbb{R}^p}\; \underbrace{\frac{1}{n}\sum_{i=1}^n \log\{1 + \exp(-A_i X_i^T\alpha)\}}_{l_\alpha(\alpha)} + \lambda\|\alpha\|_1.$$
(A propensity-fit sketch follows.)
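A sketch of the nuisance propensity fit: l1-penalized logistic regression of A on X, yielding $P_{\hat\alpha}(A|X)$. The simulated data and the regularization level `C` are illustrative assumptions; `C` would be tuned by cross-validation.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(3)
n, p = 400, 50
X = rng.normal(size=(n, p))
A = rng.binomial(1, 1 / (1 + np.exp(-(X[:, 0] - X[:, 2]))), size=n) * 2 - 1

prop_fit = LogisticRegression(penalty="l1", solver="liblinear", C=0.5,
                              fit_intercept=False)
prop_fit.fit(X, A)
# P_alpha_hat(A_i | X_i): probability of the treatment actually received
p_hat = prop_fit.predict_proba(X)[np.arange(n), (A == 1).astype(int)]
print(p_hat[:5])
```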
• 20–22. Intuition on the decorrelated score test (w/o nuisance α)
Forget everything so far! Let $l(\beta)$ be a convex differentiable loss, $\beta^* := \operatorname{argmin}_\beta E[l(\beta)]$, and $H_0: \beta_1^* = 0$. Information matrix:
$$I^* := E\left[\frac{\partial^2}{\partial\beta\,\partial\beta^T}\, l(\beta^*)\right] = \begin{pmatrix} I_{11}^* & I_{12}^* \\ I_{21}^* & I_{22}^* \end{pmatrix}.$$
Classical Rao's score is based on the profile score function $\frac{\partial}{\partial\beta_1} l(0, \hat\beta_2(0))$, where $\hat\beta_2(\beta_1) = \operatorname{argmin}_{\beta_2} l(\beta_1, \beta_2)$:
$$\sqrt{n}\,\frac{\partial}{\partial\beta_1} l(0, \hat\beta_2(0)) = \sqrt{n}\left\{\frac{\partial}{\partial\beta_1} l(0, \beta_2^*) - I_{12}^*\big(\hat\beta_2(0) - \beta_2^*\big)\right\} + \mathrm{Rem}.$$
This is problematic in high dimension, because Rem is not negligible if $p \to \infty$, or the limiting distribution is not tractable.
• 23–25. Intuition on the decorrelated score test (w/o nuisance α)
Rewriting the classical Rao score,
$$\sqrt{n}\,\frac{\partial}{\partial\beta_1} l(0, \hat\beta_2(0)) = \sqrt{n}\Big\{\frac{\partial}{\partial\beta_1} l(0, \beta_2^*) - \underbrace{I_{12}^*(I_{22}^*)^{-1}}_{(w^*)^T}\frac{\partial}{\partial\beta_2} l(0, \beta_2^*)\Big\} + \mathrm{Rem}.$$
The decorrelated score function for $\beta_1$ is
$$S(\beta_1, \beta_2) = \frac{\partial}{\partial\beta_1} l(\beta_1, \beta_2) - w^T \frac{\partial}{\partial\beta_2} l(\beta_1, \beta_2), \qquad w^T = I_{12} I_{22}^{-1},$$
which is uncorrelated with the nuisance score functions. If $\beta^*$ and $w^*$ are sparse, then
$$\sqrt{n}\, S(0, \hat\beta_2(0), \hat w) = \sqrt{n}\, S(\beta^*, w^*) + o_P(1).$$
(A numerical illustration follows.)
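A minimal numerical illustration of the decorrelated score, using a squared-error loss $l(\beta) = \frac{1}{2n}\|Y - X\beta\|^2$ so that the information matrix is the Gram matrix. It is deliberately low-dimensional and unpenalized, purely to show the algebra; in the high-dimensional case each step would carry an l1 penalty, and the decorrelation term matters precisely because the penalized nuisance score is not zero.

```python
import numpy as np

rng = np.random.default_rng(4)
n, p = 500, 5
X = rng.normal(size=(n, p))
beta_star = np.array([0.0, 1.0, -1.0, 0.5, 0.0])    # beta*_1 = 0 (H0 true)
Y = X @ beta_star + rng.normal(size=n)

I = X.T @ X / n                                     # information matrix
w = np.linalg.solve(I[1:, 1:], I[1:, 0])            # w = I22^{-1} I21

# Restricted fit: beta_1 fixed at 0, nuisance beta_2 estimated
beta2 = np.linalg.lstsq(X[:, 1:], Y, rcond=None)[0]
resid = Y - X[:, 1:] @ beta2
score = -X.T @ resid / n                            # gradient of the loss
S = score[0] - w @ score[1:]                        # decorrelated score
print(np.sqrt(n) * S)                               # approx N(0, sigma^2) under H0
```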
• 26–29. How to deal with the nuisance α?
Decorrelated score test statistic with the additional nuisance parameter:
$$S(\beta, w, \alpha, \nu) := \frac{\partial}{\partial\beta_1} l_\beta(\beta;\alpha) \;-\; \underbrace{w^T \frac{\partial}{\partial\beta_2} l_\beta(\beta;\alpha)}_{\text{accounting for nuisance } \beta_2} \;-\; \underbrace{\nu^T \frac{\partial}{\partial\alpha} l_\alpha(\alpha)}_{\text{accounting for nuisance } \alpha},$$
where
$$w^* = E\left[\frac{\partial^2}{\partial\beta_2\,\partial\beta_2^T}\, l_\beta\right]^{-1} E\left[\frac{\partial}{\partial\beta_2}\frac{\partial}{\partial\beta_1}\, l_\beta\right], \qquad \nu^* = E\left[\frac{\partial^2}{\partial\alpha\,\partial\alpha^T}\, l_\alpha\right]^{-1} E\left[\frac{\partial}{\partial\alpha}\Big(\frac{\partial}{\partial\beta_1} l_\beta - w^{*T}\frac{\partial}{\partial\beta_2} l_\beta\Big)\right].$$
Estimating $w^*$ and $\nu^*$: if $z = A^{-1}b$ and z is sparse,
$$\hat z = \operatorname*{argmin}_z\; z^T \hat A z - 2 \hat b^T z + \lambda\|z\|_1.$$
(A solver sketch follows.)
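A sketch of the penalized quadratic program above, solved here by proximal gradient (ISTA), which is one standard choice and not necessarily the speaker's. The matrix `A` must be symmetric positive semidefinite (e.g. an estimated information matrix); the `A`, `b`, and `lam` below are illustrative.

```python
import numpy as np

def soft_threshold(v, t):
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def penalized_quadratic(A, b, lam, n_iter=500):
    """Minimize z'Az - 2b'z + lam*||z||_1 via ISTA (gradient is 2(Az - b))."""
    L = 2 * np.linalg.eigvalsh(A).max()     # Lipschitz constant of the gradient
    t = 1.0 / L                             # step size
    z = np.zeros(len(b))
    for _ in range(n_iter):
        z = soft_threshold(z - t * 2 * (A @ z - b), t * lam)
    return z

rng = np.random.default_rng(5)
M = rng.normal(size=(40, 20))
A = M.T @ M / 40 + 0.1 * np.eye(20)         # a PSD stand-in "information" matrix
b = A @ np.array([1.0, -1.0] + [0.0] * 18)  # so the sparse solution is known
print(penalized_quadratic(A, b, lam=0.05).round(2))
```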
• 30. The proposed test
Inference on the optimal ITR, $H_0: \beta_1^* = 0$:
$$\hat\beta^{\text{null}} = \operatorname*{argmin}_{\beta_1 = 0}\; \frac{1}{n}\sum_{i=1}^n \frac{Y_i}{P_{\hat\alpha}(A_i|X_i)}\, \varphi(A_i X_i^T\beta) + \lambda\|\beta\|_1.$$
Theorem. With additional sparsity assumptions on $w^*$ and $\nu^*$, under $H_0: \beta_1^* = 0$,
$$\sqrt{n}\, S(\hat\beta^{\text{null}}, \hat w, \hat\alpha, \hat\nu) \stackrel{d}{\to} N(0, \sigma^2).$$
Applied the decorrelated score test to deal with high-dimensional X. The estimated propensity $P_{\hat\alpha}(A|X)$ requires a modification of the test statistic. Applicable to other penalized M-estimators with data-driven weights, e.g., IPW. (A p-value sketch follows.)
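A sketch of turning the limit above into a test: under $H_0$, $\sqrt{n}\,S/\hat\sigma$ is approximately standard normal, so a two-sided p-value follows directly. The numeric inputs are placeholders for the quantities computed in the previous steps.

```python
import numpy as np
from scipy.stats import norm

def decorrelated_score_pvalue(S_hat, sigma_hat, n):
    """Two-sided p-value for H0 from the studentized decorrelated score."""
    z = np.sqrt(n) * S_hat / sigma_hat
    return 2 * norm.sf(abs(z))

print(decorrelated_score_pvalue(S_hat=0.08, sigma_hat=0.9, n=500))
```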
• 31. Simulation & data analysis
• 32–33. Simulation design
Comparison:
- Proposed (O-learning + l1 penalty + decorrelated score test)
- Q-learning (Qian and Murphy, 2011) + l1 penalty + decorrelated score test
- O-learning + no penalty + classical Rao's score test
Generative model summary: n = 250, p = 100, $E(Y|X,A) = \mu(X) + A\,c(X)$. Which covariates enter each component (O = appears):

                X1   X2   X3   X4   X5   X6   X7   X8–X100
  c(X)          O    O    O    O
  µ(X)          O    O              O    O
  P(A|X)        O         O         O         O

* Recall $d^*(X) = \operatorname{sign}\{c(X)\}$.
Performance metrics: empirical powers ($H_{01}: \beta_1^* = 0$ through $H_{04}: \beta_4^* = 0$), empirical type-I errors ($H_{05}: \beta_5^* = 0$ through $H_{08}: \beta_8^* = 0$), and the value function $V(\hat d)$.
• 34. Simulation results
Y continuous: $Y_i = E(Y_i|X_i, A_i) + \epsilon_i$ with $\epsilon_i \sim N(0, 1/4)$

                      Value    Power                      Type-I errors
                      V(d̂)     H01   H02   H03   H04     H05   H06   H07   H08
  True optimal        0.62
  Proposed            0.60     .963  .948  .952  .948    .055  .056  .038  .045
  Q-learn & Decor     0.59     .905  .914  .918  .906    .056  .049  .045  .051
  O-learn & Rao       0.52     .006  .008  .010  .009    .001  .002  .001  .001
  True anti-optimal   0.38

Y binary: $Y_i \sim$ Bernoulli with $p_i = E(Y_i|X_i, A_i)$

                      Value    Power                      Type-I errors
                      V(d̂)     H01   H02   H03   H04     H05   H06   H07   H08
  True optimal        0.62
  Proposed            0.57     .834  .739  .678  .762    .042  .060  .053  .056
  Q-learn & Decor     0.56     .668  .653  .676  .666    .045  .048  .057  .043
  O-learn & Rao       0.51     .015  .003  .000  .007    .001  .001  .001  .002
  True anti-optimal   0.38
• 35. Application to the Diabetes study
Elderly patients with Type-II diabetes with comorbidity: n = 9101 patients and p = 136 covariates.
Covariates: sociodemographics, disease history, lab measures at baseline.
Treatment: hypoglycemic agent (+1) / none (−1).
Outcome: A1c < 8% one year later (1) / otherwise (0).
• 36. Application to the Diabetes study
The fitted ITR: $\hat d(X) = \operatorname{sign}(X^T\hat\beta)$. Fitted coefficients (zeros omitted):

  Covariate (X)                           Coef (β̂)   p-value
  (Intercept)                              0.119
  Glucose control success at baseline     -0.161      0.01
  Chronic kidney disease                   0.127      <0.01
  Other chronic complication               0.142      0.06
  Female                                   0.043      0.13
  Eye disease                              0.021      0.35

Empirical value on the testing set (100 random splits):

  Rule                           As given   All +1   Q-learn   O-learn (no penalty)   Proposed
  Freq. of (A1c < 8%) at 1 yr    85.6%      84.2%    86.7%     86.4%                  87.2%

* SD ≈ 1.0%
• 37. Concluding remarks & future work
• 38. Where we contributed
Challenges: treatments are not randomized; covariates (predictors) are high-dimensional.
Goal: develop a principled, robust, data-driven method for discovering ITRs.
Contributions:
- Method: hypothesis testing on ITRs under a model-robust framework; data-driven rules → knowledge.
- Theory: an inferential technique for penalized M-estimators with nuisance parameters.
• 39. Work in the near future
- Asymptotic tests and confidence regions for penalized M-estimators involving nuisance estimation: augmented estimators; convex non-differentiable losses.
- Inference on dynamic treatment regimes (DTRs): a DTR optimizes a sequence of decision rules, and important features may differ across decisions. Idea: the optimal DTR minimizes a multivariate surrogate loss function (Zhao et al., 2015b, 2018).
• 40. Acknowledgement
Joint work with Young-Geun Choi, Yang Ning, and Maureen Smith. Organizers. NIH grant support, R01DK108073.
Contact: yqzhao@fredhutch.org
THANK YOU!
• 41. Main result, full version
Adjusted decorrelated score test statistic:
$$S(\beta, w, \alpha, \nu) := \frac{\partial}{\partial\beta_1} l_\beta(\beta;\alpha) - w^T \frac{\partial}{\partial\beta_2} l_\beta(\beta;\alpha) - \nu^T \frac{\partial}{\partial\alpha} l_\alpha(\alpha).$$
Theorem. Assume that the outcome $Y = \mu(X) + A\,c(X) + \epsilon$ satisfies:
(1) $c(X) = h(X^T\beta^*)$ where h is continuous with $h(\cdot) > 0$ on $(0, \infty)$ and $h(\cdot) < 0$ on $(-\infty, 0)$; $\operatorname{logit}\{P(A = 1|X)\} = X^T\alpha^*$, and the propensity model is correctly specified;
(2) $\beta^*$, $w^*$, $\alpha^*$, and $\nu^*$ are sparse;
(3) (maximum sparsity) $\cdot (\log p)/\sqrt{n} \to 0$, with tuning parameters $\asymp \sqrt{\log p / n}$;
(4) the loss function satisfies restricted eigenvalue conditions;
(5) $E(X^T b \mid X^T\beta)$ is linear in $X^T\beta$ and $E(X) = 0$;
(6) $\epsilon$ is sub-exponential, $\epsilon \perp (X, A)$, and $0 < P(A|X) < 1$.
Then, under the null hypothesis $H_0: \beta_1^* = 0$,
(a) $\sqrt{n}\, S(\hat\beta^{\text{null}}, \hat w, \hat\alpha, \hat\nu) \stackrel{d}{\to} N(0, \sigma^2)$;
(b) $\sigma^2 = \operatorname{Var}\{S(\beta^*, w^*, \alpha^*, \nu^*)\}$ is consistently estimated.
• 42–44. How the theorem was derived
$$\sqrt{n}\,\frac{\partial}{\partial\beta_1} l_\beta(\hat\beta^{\text{null}}; \hat\alpha) \approx \sqrt{n}\left\{\frac{\partial}{\partial\beta_1} l_\beta^* - (w^*)^T \frac{\partial}{\partial\beta_2} l_\beta^* - (\nu^*)^T \frac{\partial}{\partial\alpha} l_\alpha^*\right\},$$
where $l_\beta^* = l_\beta(\beta^*; \alpha^*)$, $l_\alpha^* = l_\alpha(\alpha^*)$, and
$$w^* = E\left[\frac{\partial^2}{\partial\beta_2\,\partial\beta_2^T}\, l_\beta\right]^{-1} E\left[\frac{\partial}{\partial\beta_2}\frac{\partial}{\partial\beta_1}\, l_\beta\right], \qquad \nu^* = E\left[\frac{\partial^2}{\partial\alpha\,\partial\alpha^T}\, l_\alpha\right]^{-1} E\left[\frac{\partial}{\partial\alpha}\Big(\frac{\partial}{\partial\beta_1} l_\beta - w^{*T}\frac{\partial}{\partial\beta_2} l_\beta\Big)\right].$$
Estimating $w^*$ and $\nu^*$: if $z = A^{-1}b$ and z is sparse,
$$\hat z = \operatorname*{argmin}_z\; z^T \hat A z - 2\hat b^T z + \lambda\|z\|_1.$$
• 45. Complete procedure
$$\hat\alpha = \operatorname*{argmin}_\alpha\; l_\alpha(\alpha) + P(\alpha)$$
$$\hat\beta^{\text{null}} = \operatorname*{argmin}_{\beta_1 = 0}\; l_\beta(\beta; \hat\alpha) + P(\beta)$$
$$\hat w = \operatorname*{argmin}_w\; w^T \hat E\left[\frac{\partial^2}{\partial\beta_2\,\partial\beta_2^T}\, l_\beta\right] w - 2\,\hat E\left[\frac{\partial}{\partial\beta_2}\frac{\partial}{\partial\beta_1}\, l_\beta\right]^T w + P(w)$$
$$\hat\nu = \operatorname*{argmin}_\nu\; \nu^T \hat E\left[\frac{\partial^2}{\partial\alpha\,\partial\alpha^T}\, l_\alpha\right] \nu - 2\,\hat E\left[\frac{\partial}{\partial\alpha}\Big(\frac{\partial}{\partial\beta_1} l_\beta - \hat w^T \frac{\partial}{\partial\beta_2} l_\beta\Big)\right]^T \nu + P(\nu)$$
• 46. Tuning parameter selection
5-fold CV. For $\hat\alpha$, $\hat w$, $\hat\nu$: minimize the predictive loss functions. For $\hat\beta^{\text{null}}$: maximize the predictive value function
$$V(d) = E\left[\frac{Y\, I\{A = d(X)\}}{P(A|X)}\right], \qquad \hat V(\hat d) = \frac{\sum_{i=1}^n Y_i\, I\{A_i = \hat d(X_i)\}/\hat P(A_i|X_i)}{\sum_{i=1}^n I\{A_i = \hat d(X_i)\}/\hat P(A_i|X_i)},$$
with $\hat d$ evaluated on the training set, and the data and $\hat P$ taken from the testing set. (A sketch follows.)
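A sketch of the normalized value estimate used above for cross-validated tuning: the rule `d` comes from a training fold while `Y`, `A`, `X`, and the fitted propensity come from the held-out fold. The names in the commented usage line (`lambda_grid`, `rules`, the `_te` arrays) are hypothetical placeholders.

```python
import numpy as np

def cv_value(Y, A, X, d, p_hat):
    """V_hat(d) = sum Y_i I{A_i=d(X_i)}/p_i  /  sum I{A_i=d(X_i)}/p_i."""
    match = (A == d(X)).astype(float)
    w = match / p_hat
    return np.sum(Y * w) / np.sum(w)

# Pick the lambda whose trained rule maximizes the held-out value, e.g.
# lam_best = max(lambda_grid,
#                key=lambda lam: cv_value(Y_te, A_te, X_te, rules[lam], p_hat_te))
```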
• 47. More details on the simulation
Q-learning (Qian and Murphy, 2011): fit
$$\operatorname*{argmin}_{\gamma, \beta}\; \frac{1}{n}\sum_{i=1}^n \big(Y_i - \Phi(X_i)^T\gamma - A_i X_i^T\beta\big)^2 + P(\gamma, \beta);$$
estimated decision: $\hat d(x) = \operatorname{sign}(x^T\hat\beta)$.
Significance testing: Qian and Murphy (2011) did not develop a test, but the decorrelated score test (Ning and Liu, 2017) can be used, since that paper provides the test for penalized least squares. (A sketch follows.)
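A sketch of the l1-penalized Q-learning comparator: lasso regression of Y on main-effect features $\Phi(X)$ (taken to be X itself here, an assumption) and interactions $A \cdot X$; the rule is the sign of the fitted interaction part. Data and `alpha` are illustrative.

```python
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(6)
n, p = 400, 50
X = rng.normal(size=(n, p))
A = rng.choice([-1, 1], size=n)
Y = X[:, 0] + A * (X[:, 1] - X[:, 2]) + rng.normal(size=n)

design = np.hstack([X, A[:, None] * X])        # [Phi(X), A*X]
fit = Lasso(alpha=0.05).fit(design, Y)
beta = fit.coef_[p:]                           # interaction coefficients
d = lambda x: np.sign(x @ beta)                # estimated rule d(x) = sign(x'beta)
print(np.nonzero(beta)[0])                     # selected treatment-effect covariates
```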
• 48. More details on the simulation
Generative model: n = 250, p = 100. Signal structure (recall $E(Y|X,A) = \mu(X) + A\,c(X)$):
$X_{ij} \sim$ Bernoulli(0.3)
$c(X) = 0.7\,(X_1 + X_2 - X_3 - X_4)/25 + 1/2$
$\mu(X) = (X_1 + X_2 - X_5 - X_6)/25 + 1/2$
$\operatorname{logit}\{P(A = 1|X)\} = 0.4\,(X_1 + X_3 - X_5 - X_7)$
Note: $0 < E(Y|X,A) < 1$.
If Y continuous: $Y_i = E(Y_i|X_i, A_i) + \epsilon_i$ with $\epsilon_i \sim N(0, 1/9)$. If Y binary: sample $Y_i \sim$ Bernoulli with $p_i = E(Y_i|X_i, A_i)$. (A data-generation sketch follows.)
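A sketch of this generative model with the constants copied verbatim from the slide; the parenthesization of the "/25 + 1/2" terms is as extracted, and the defensive clipping of the Bernoulli mean is my addition (the slide states $0 < E(Y|X,A) < 1$).

```python
import numpy as np

rng = np.random.default_rng(7)
n, p = 250, 100
X = rng.binomial(1, 0.3, size=(n, p)).astype(float)      # X_ij ~ Bernoulli(0.3)

c = 0.7 * (X[:, 0] + X[:, 1] - X[:, 2] - X[:, 3]) / 25 + 0.5
mu = (X[:, 0] + X[:, 1] - X[:, 4] - X[:, 5]) / 25 + 0.5
logit = 0.4 * (X[:, 0] + X[:, 2] - X[:, 4] - X[:, 6])
A = 2 * rng.binomial(1, 1 / (1 + np.exp(-logit))) - 1    # A in {-1, +1}

EY = mu + A * c                                          # E(Y | X, A)
Y_cont = EY + rng.normal(scale=1/3, size=n)              # continuous: Var 1/9
Y_bin = rng.binomial(1, np.clip(EY, 0, 1))               # binary outcome
print(Y_cont[:3], Y_bin[:3])
```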
• 49. More simulation
Propensity misspecified: $\operatorname{logit}\{P(A=1|X)\} = 0.685\exp(0.2X_1^2 + 0.3X_3^2) + 0.03(0.5 - X_5^2 - X_7^2)(X_5^2 + X_7^2 - 0.3) - 0.73$

                      Val     Power                      Size
                              H01   H02   H03   H04     H05   H06   H07   H08
  True optimal        0.64
  Proposed            0.50    .959  .959  .952  .957    .062  .046  .060  .039
  Q-learn & Decor     0.44    .915  .908  .910  .912    .032  .048  .046  .051

Main effect misspecified: $\mu(X) = 0.685\exp(0.2X_1^2 + 0.3X_2^2) + 0.03(0.5 - X_5^2 - X_6^2)(X_5^2 + X_6^2 - 0.3) - 0.73$

  True optimal        0.64
  Proposed            0.47    .928  .885  .925  .917    .085  .041  .049  .036
  Q-learn & Decor     0.40    .881  .814  .822  .847    .090  .046  .054  .045

Treatment effect misspecified: $c(X) = 0.685\exp(0.2X_1^2 + 0.3X_2^2) + 0.03(0.5 - X_3^2 - X_4^2)(X_3^2 + X_4^2 - 0.3) - 0.73$

  True optimal        0.40
  Proposed            0.18    .040  .997  .055  .035    .053  .050  .041  .051
  Q-learn & Decor     0.13    .039  .984  .043  .042    .055  .053  .051  .053

** 1000 replications. 1.96·se(Value) < 0.007; 1.96·se(power) and 1.96·se(size) ≈ 0.014.
• 50. More detail on the data analysis
Data selection criteria: patients' EHR linked to claims and enrollment files from Medicare. Patient inclusion criteria:
- met a validated algorithm for identifying patients with diabetes via claims;
- were medically homed, via an established plurality-provider algorithm, at the participating large, Midwestern, multi-specialty provider group.
Patients were included for each 90-day quarter from 2003–2011 in which they were alive at the start of the quarter, had continuous Medicare Part A & B fee-for-service coverage, and met the medical-home criteria above.
• 51. Overview of O-learning
Zhao et al. (2012): $\hat d(\cdot) := \operatorname{sign}\{\hat f(\cdot)\}$,
$$\hat f = \operatorname*{argmin}_{f \in \mathcal{F}}\; \frac{1}{n}\sum_{i=1}^n \frac{Y_i}{P(A_i|X_i)}\, \varphi(A_i f(X_i)) + \lambda\|f\|^2.$$
Literature on changing the outcome weight to reduce the variability of the empirical risk:

  Paper                 Outcome weight    Choice of g(X)
  Zhou et al. (2017)    Y − g(X)          µ(X)
  Liu et al. (2016)     |Y − g(X)|        E(Y|X)

Other advances: multi-staged decision rules (Zhao et al., 2015a), censored outcomes (?), tree-based decisions (Laber and Zhao, 2015), penalized linear decisions (Song et al., 2015; Xu et al., 2015), ...
• 52–54. Fisher consistency
$$\hat\beta = \operatorname*{argmin}_{\beta \in \mathbb{R}^p}\; \underbrace{\frac{1}{n}\sum_{i=1}^n \text{(weighted surrogate convex loss)}}_{l_\beta(\beta)} + P(\beta)$$
$$\beta^o = \operatorname*{argmin}_{\beta \in \mathbb{R}^p}\; E[\text{weighted surrogate convex loss}]$$
$$\beta^* = \operatorname*{argmin}_{\|\beta\|_2 = 1}\; E[\text{weighted 0-1 loss}]$$
Question: $\hat\beta \to \beta^o$ (under certain conditions), but we are interested in $\beta_1^*$. How to connect $\hat\beta$ and $\beta^*$?
Answer: Fisher consistency implies $\beta^o = k\beta^*$ for some $k > 0$; thus $\beta_1^* = 0 \iff \beta_1^o = 0$.
Reduced problem: test $\beta_1^o = 0$ with the convex loss function $l_\beta(\beta)$. Now we can work as if we had a negative log-likelihood with true parameter $\beta^o$.
• 55–56. Sparse models on main effect & propensity
Main effect. Fact: $\mu(X) = E\left[\frac{Y}{2P(A|X)} \,\middle|\, X\right]$. Assume $\mu(X) = X^T\gamma^*$ ($\gamma^*$ sparse):
$$\hat\gamma = \operatorname*{argmin}_{\gamma \in \mathbb{R}^p}\; \frac{1}{n}\sum_{i=1}^n \left(\frac{Y_i}{2P(A_i|X_i)} - X_i^T\gamma\right)^2 + P(\gamma).$$
Propensity. Assume $\operatorname{logit}\{P(A = 1|X)\} = X^T\alpha^*$ ($\alpha^*$ sparse); fit via (penalized) logistic likelihood, balancing equations, etc.:
$$\hat\alpha = \operatorname*{argmin}_{\alpha \in \mathbb{R}^p}\; \frac{1}{n}\sum_{i=1}^n \log\big\{1 + \exp(-A_i X_i^T\alpha)\big\} + P(\alpha).$$
(A sketch of the main-effect fit follows.)
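A sketch of the main-effect fit above: a lasso regression of the transformed outcome $Y / (2\hat P(A|X))$ on X. The fitted propensity would come from the logistic-lasso step; here a constant 1/2 stands in, and the data and `alpha` are illustrative.

```python
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(8)
n, p = 400, 50
X = rng.normal(size=(n, p))
A = rng.choice([-1, 1], size=n)
Y = 1 + X[:, 0] + A * X[:, 1] + rng.normal(size=n)
p_hat = np.full(n, 0.5)                        # placeholder fitted propensity

gamma = Lasso(alpha=0.05).fit(X, Y / (2 * p_hat)).coef_
print(np.nonzero(gamma)[0])                    # support of the main-effect fit
```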
• 57. Modified estimation
$$\hat\alpha = \operatorname*{argmin}_{\alpha \in \mathbb{R}^p}\; \underbrace{\frac{1}{n}\sum_{i=1}^n \log\big\{1 + \exp(-A_i X_i^T\alpha)\big\}}_{=:\ l_\alpha(\alpha)} + P(\alpha)$$
$$\hat\gamma = \operatorname*{argmin}_{\gamma \in \mathbb{R}^p}\; \underbrace{\frac{1}{n}\sum_{i=1}^n \left(\frac{Y_i}{2P_{\hat\alpha}(A_i|X_i)} - X_i^T\gamma\right)^2}_{=:\ l_\gamma(\gamma;\,\hat\alpha)} + P(\gamma)$$
$$\hat\beta^{\text{null}} = \operatorname*{argmin}_{\beta_1 = 0}\; \underbrace{\frac{1}{n}\sum_{i=1}^n \frac{|Y_i - X_i^T\hat\gamma|}{P_{\hat\alpha}(A_i|X_i)}\, \varphi\big(A_i \operatorname{sign}(Y_i - X_i^T\hat\gamma)\, X_i^T\beta\big)}_{=:\ l_\beta(\beta;\,\hat\gamma,\,\hat\alpha)} + P(\beta)$$
Challenge: $\hat\alpha$ and $\hat\gamma$ induce further variability in $l_\beta$. (A sketch follows.)
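A sketch of the residual-weighted step above with the logistic surrogate: l1-penalized logistic regression with working labels $A_i \operatorname{sign}(Y_i - X_i^T\hat\gamma)$ and sample weights $|Y_i - X_i^T\hat\gamma| / \hat P(A_i|X_i)$. The data, placeholder propensity, and stand-in $\hat\gamma$ are assumptions; the null restriction $\beta_1 = 0$ is imposed by dropping the first column.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(9)
n, p = 400, 50
X = rng.normal(size=(n, p))
A = rng.choice([-1, 1], size=n)
Y = 1 + X[:, 0] + A * X[:, 1] + rng.normal(size=n)
p_hat = np.full(n, 0.5)                        # placeholder fitted propensity
gamma = np.zeros(p); gamma[0] = 1.0            # stand-in for the fitted gamma_hat

resid = Y - X @ gamma
labels = (A * np.sign(resid)).astype(int)      # A_i * sign(residual_i)
weights = np.abs(resid) / p_hat                # |residual| / P_hat(A|X)
fit = LogisticRegression(penalty="l1", solver="liblinear", C=0.1,
                         fit_intercept=False)
fit.fit(X[:, 1:], labels, sample_weight=weights)   # beta_1 = 0: drop column 0
print(np.nonzero(fit.coef_.ravel())[0] + 1)        # support (indices shifted by 1)
```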
• 58. References I
Laber, E. B. and Zhao, Y. Q. (2015). Tree-based methods for individualized treatment regimes. Biometrika, 102(3):501–514.
Liu, Y., Wang, Y., Kosorok, M. R., Zhao, Y., and Zeng, D. (2016). Robust hybrid learning for estimating personalized dynamic treatment regimens. pages 1–42.
Ning, Y. and Liu, H. (2017). A general theory of hypothesis tests and confidence regions for sparse high dimensional models. The Annals of Statistics, 45(1):158–195.
Qian, M. and Murphy, S. A. (2011). Performance guarantees for individualized treatment rules. The Annals of Statistics, 39(2):1180–1210.
Song, R., Kosorok, M., Zeng, D., Zhao, Y., Laber, E., and Yuan, M. (2015). On sparse representation for optimal individualized treatment selection with penalized outcome weighted learning. Stat, 4(1):59–68.
Xu, Y., Yu, M., Zhao, Y. Q., Li, Q., Wang, S., and Shao, J. (2015). Regularized outcome weighted subgroup identification for differential treatment effects. Biometrics, 71(3):645–653.
Zhao, Y., Zeng, D., Laber, E. B., and Kosorok, M. R. (2015a). New statistical learning methods for estimating optimal dynamic treatment regimes. Journal of the American Statistical Association, 110(510):583–598.
Zhao, Y., Zeng, D., Rush, A. J., and Kosorok, M. R. (2012). Estimating individualized treatment rules using outcome weighted learning. Journal of the American Statistical Association, 107(499):1106–1118.
• 59. References II
Zhao, Y. Q., Zeng, D., Laber, E. B., Song, R., Yuan, M., and Kosorok, M. R. (2015b). Doubly robust learning for estimating individualized treatment with censored data. Biometrika, 102(1):151–168.
Zhao, Y.-Q., Zhu, R., Chen, G., and Zheng, Y. (2018). Constructing stabilized dynamic treatment regimes. arXiv preprint arXiv:1808.01332.
Zhou, X., Mayer-Hamblett, N., Khan, U., and Kosorok, M. R. (2017). Residual weighted learning for estimating individualized treatment rules. Journal of the American Statistical Association, 112(517):169–187.