Basic Concepts in Principal Stratification

Kojin Oshiba & Wenshuo Wang
Harvard University
March 28, 2018
STAT 286

Overview
1 Review of the papers
Principal Stratification in Causal Inference (Frangakis & Rubin, 2002)
Estimation of Causal Effects via Principal Stratification When Some Outcomes Are Truncated by "Death" (Zhang & Rubin, 2003)
A Refreshing Account of Principal Stratification (Mealli & Mattei, 2012)
2 Discussion

Review of the papers

Principal Stratification in Causal Inference
(Frangakis & Rubin, 2002)

Summary
Scholars have defined a net treatment effect by conditioning on post-treatment variables, but this is generally not a causal effect.
Principal stratification lets us define principal effects, which are causal effects within each stratum.
One application of principal stratification is to surrogate endpoints, which are useful when the outcome is too expensive to measure.

Definition of a Causal Effect
Units $i = 1, 2, \ldots, n$, with $i \in A$
Control ($z = 0$) or treatment ($z = 1$)
$Y_i(z)$: value of Y if unit i is assigned treatment z
The causal effect of assignment on the outcome Y is a comparison of
$$ \{Y_i(0) : i \in A\} \quad\text{and}\quad \{Y_i(1) : i \in A\}. \tag{1} $$

Post-treatment Variables
Post-treatment variable $S_i^{obs}$: a variable observed after treatment assignment, in addition to the main outcome Y.
Assume $S_i^{obs}$ is binary for simplicity.

Net Treatment Effect
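The Net Treatment Effect (NTE) is the comparison of
$$ Y_i^{obs} \mid S_i^{obs} = s, z_i = 0 \quad\text{and}\quad Y_i^{obs} \mid S_i^{obs} = s, z_i = 1, \tag{2} $$
which, under complete randomization, reduces to the comparison of
$$ Y_i(0) \mid S_i(0) = s \quad\text{and}\quad Y_i(1) \mid S_i(1) = s. \tag{3} $$
The NTE is not a causal effect if treatment affects the post-treatment variable, because the two groups in (3) then consist of different units (post-treatment selection bias).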
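The selection bias can be seen in a small made-up population. The sketch below is only an illustration: the `units` table and the function names are hypothetical, and all potential values are assumed known, which never happens with real data. Treatment changes S for some units but never changes Y, yet the comparison in (3) at s = 1 is nonzero.

```python
from statistics import mean

# Hypothetical population with every potential value known.
# Each unit is (S(0), S(1), Y(0), Y(1)); treatment never changes Y here.
units = (
    [(1, 1, 10, 10)] * 2   # S unaffected by treatment, high outcomes
    + [(0, 1, 4, 4)] * 2   # S moved from 0 to 1 by treatment
    + [(0, 0, 1, 1)] * 2   # S unaffected by treatment, low outcomes
)


def net_treatment_effect(units, s):
    """Mean of Y(1) among S(1) = s minus mean of Y(0) among S(0) = s, as in (3)."""
    treated = [y1 for _, s1, _, y1 in units if s1 == s]
    control = [y0 for s0, _, y0, _ in units if s0 == s]
    return mean(treated) - mean(control)


def largest_individual_effect(units):
    """Largest |Y(1) - Y(0)| over all units."""
    return max(abs(y1 - y0) for _, _, y0, y1 in units)


print(net_treatment_effect(units, 1))    # -3
print(largest_individual_effect(units))  # 0
```

The group with S(1) = 1 mixes two kinds of units, while the group with S(0) = 1 contains only one kind, so the difference of -3 reflects who is being compared rather than any effect on Y.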

Principal Stratification
Basic principal stratification $P_0$: the partition of units such that, within any set of $P_0$, all units i have the same vector $(S_i(0), S_i(1))$.
Principal stratification $P$: a partition whose sets are unions of sets in $P_0$.
Example:
$$ P = \{\{i : S_i(0) = S_i(1)\}, \{i : S_i(0) \neq S_i(1)\}\} \tag{4} $$

Principal Effect
$S_i^P$: the stratum of $P$ to which unit i belongs.
Principal effect: a comparison of potential outcomes under control vs. treatment within a principal stratum $\theta$ in $P$,
$$ \{Y_i(0) : S_i^P = \theta\} \quad\text{and}\quad \{Y_i(1) : S_i^P = \theta\}. \tag{5} $$
The stratum $S_i^P$ is unaffected by treatment for any principal stratification $P$.
Therefore, any principal effect is a causal effect.
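A minimal sketch of these definitions, assuming a hypothetical table in which both potential values of S and Y are known for every unit. It groups units into the basic principal stratification and uses a difference in means as one possible "comparison"; the data and function names are invented for illustration.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical units: (S(0), S(1), Y(0), Y(1)).
units = [
    (0, 0, 1.0, 1.0),
    (0, 0, 3.0, 3.0),
    (0, 1, 2.0, 5.0),
    (0, 1, 4.0, 7.0),
    (1, 1, 6.0, 8.0),
    (1, 1, 8.0, 8.0),
]


def basic_principal_strata(units):
    """Group units by the pair (S(0), S(1)), i.e. the partition P0."""
    strata = defaultdict(list)
    for s0, s1, y0, y1 in units:
        strata[(s0, s1)].append((y0, y1))
    return dict(strata)


def mean_principal_effects(units):
    """Difference in mean potential outcomes within each basic stratum."""
    effects = {}
    for stratum, pairs in basic_principal_strata(units).items():
        y0s = [y0 for y0, _ in pairs]
        y1s = [y1 for _, y1 in pairs]
        effects[stratum] = mean(y1s) - mean(y0s)
    return effects


print(mean_principal_effects(units))
```

Because stratum membership depends only on (S(0), S(1)), the same units appear on both sides of each comparison, which is what makes each principal effect causal.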

Missing Data in Principal Strata
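Usually, for each unit, one of the two potential values of the post-treatment variable and one of the two potential outcomes are missing:
$$ S^{mis} = \{S_i(z) : \text{all } i;\ z \neq Z_i\}, \qquad Y^{mis} = \{Y_i(z) : \text{all } i;\ z \neq Z_i\}. \tag{6} $$
Estimate using the observed data $H^{obs} = (Y^{obs}, S^{obs}, z)$ through the likelihood
$$ L(H^{obs}; \theta^S, \theta^Y). \tag{7} $$
Additional assumptions or restrictions are needed for a unique MLE of $(\theta^S, \theta^Y)$.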

Surrogate Endpoints
The primary outcome Y may be too expensive or infeasible to obtain in a practical time span.
Surrogate variable: a post-treatment variable used as a "surrogate" for the treatment effect on Y. It should satisfy:
(Causal Necessity) A treatment effect on Y can occur only if there is a treatment effect on S.
(Statistical Generalizability) $S^{obs}$ should predict $Y^{obs}$ well in an application study.

Statistical Surrogate
S is a statistical surrogate if, for all fixed s,
$$ Y_i^{obs} \mid S_i^{obs} = s, z_i = 0 \ \sim\ Y_i^{obs} \mid S_i^{obs} = s, z_i = 1. \tag{8} $$
Statistical surrogacy does not satisfy causal necessity.

Principal Surrogate
S is a principal surrogate if, for all fixed s,
$$ Y_i(0) \mid S_i(0) = S_i(1) = s \ \sim\ Y_i(1) \mid S_i(0) = S_i(1) = s \tag{9} $$
or, under randomization,
$$ Y_i^{obs} \mid S_i(0) = S_i(1) = s, Z_i = 0 \ \sim\ Y_i^{obs} \mid S_i(0) = S_i(1) = s, Z_i = 1. \tag{10} $$
Principal surrogacy satisfies causal necessity.
S being a statistical surrogate does not imply that it is a principal surrogate, and vice versa.

Associative and Dissociative Effects
A dissociative effect is a comparison between
$$ \{Y_i(0) : S_i(0) = S_i(1)\} \quad\text{and}\quad \{Y_i(1) : S_i(0) = S_i(1)\}. \tag{11} $$
An associative effect is a comparison between
$$ \{Y_i(0) : S_i(0) \neq S_i(1)\} \quad\text{and}\quad \{Y_i(1) : S_i(0) \neq S_i(1)\}. \tag{12} $$
Comparing (11) with (12) measures how strongly effects on the surrogate endpoint are associated with effects on the outcome. If the association is high, the surrogate is a good target.

Estimation of Causal Effects via Principal Stratification When Some Outcomes Are Truncated by "Death"
(Zhang & Rubin, 2003)

Summary
Truncation by death is different from censoring by death and should be handled using principal stratification.
Using principal stratification, we can estimate a causal effect for the stratum in which no outcome is truncated by death.
We can find upper and lower bounds for this causal effect.

Truncation by Death
"Missing", "Censored" ≠ "Truncated"
Under "censoring by death", the outcome exists but is unobserved, so the causal effect is defined on $\mathbb{R}$.
Under "truncation by death", the causal effect is defined on $\{\mathbb{R}, *\}$, where $*$ marks an outcome that is not defined.
Previous approaches have treated "truncation" as "censoring":
Ignore truncated values.
Impute truncated outcomes in $\mathbb{R}$.
Model a missing-data mechanism due to "censoring".
Principal stratification addresses this issue.

Example: Educational Program Assessment
Two educational programs: Treatment (T) and Control (C)
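Graduation indicators: $S_i(T), S_i(C) \in \{G, D\}$
Principal stratification by the graduation indicators (first letter: status under T; second letter: status under C):

              S_i(C) = G    S_i(C) = D
S_i(T) = G    GG            GD
S_i(T) = D    DG            DD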

Truncation and Causal Effect
The causal effect on Y is not defined in GD and DG (nor in DD), because at least one potential outcome is truncated.
$\bar{Y}^{obs}(T) - \bar{Y}^{obs}(C)$ compares mixtures of strata, which is misleading if either GD or DG exists. We should adjust for the pair of indicators $(S_i(T), S_i(C))$ instead.
What we want to know: $\bar{Y}_{GG}(T) - \bar{Y}_{GG}(C)$.
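To see why the observed difference among graduates can mislead, here is a toy sketch with a hypothetical population in which stratum labels and potential outcomes are all known (they never are in practice). `None` stands in for a truncated outcome, and the scores are invented.

```python
from statistics import mean

# Hypothetical units: (principal stratum, Y(T), Y(C)).
# None marks an outcome truncated by "death" (no graduation, no score).
units = (
    [("GG", 80, 80)] * 5
    + [("GD", 60, None)] * 3
    + [("DD", None, None)] * 2
)


def naive_difference(units):
    """Mean score of graduates under T minus mean score of graduates under C."""
    obs_t = [y_t for _, y_t, _ in units if y_t is not None]
    obs_c = [y_c for _, _, y_c in units if y_c is not None]
    return mean(obs_t) - mean(obs_c)


def gg_effect(units):
    """Mean of Y(T) minus mean of Y(C) within the GG stratum only."""
    gg = [(y_t, y_c) for stratum, y_t, y_c in units if stratum == "GG"]
    return mean([y_t for y_t, _ in gg]) - mean([y_c for _, y_c in gg])


print(naive_difference(units))  # -7.5
print(gg_effect(units))         # 0
```

In this made-up example the program has no effect on the scores of students who would graduate either way, but the naive comparison is negative because the treatment arm's graduates include the lower-scoring GD students.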

Large-Sample Bounds
Unfortunately, we don't directly observe the principal strata. What we do observe are
$OBS(T, G) = \{i : Z_i = T, S_i^{obs} = G\}$
$OBS(T, D) = \{i : Z_i = T, S_i^{obs} = D\}$
$OBS(C, G) = \{i : Z_i = C, S_i^{obs} = G\}$
$OBS(C, D) = \{i : Z_i = C, S_i^{obs} = D\}$
Large-sample bounds for the average causal effect on Y in the GG principal stratum can be derived.
These bounds can be sharpened with additional assumptions:
Assumption 1. (Monotonicity) No DG group.
Assumption 2. (Ranked average score) When assigned treatment, GG performs better on average than GD; when assigned control, GG performs better on average than DG.

Large-Sample Bounds - Calculation
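What we want to know: $\bar{Y}_{GG}(T) - \bar{Y}_{GG}(C)$.
OBS(T, G) is a mixture of the GG and GD strata with proportions $\frac{\pi_{GG}}{\pi_{GG} + \pi_{GD}}$ and $\frac{\pi_{GD}}{\pi_{GG} + \pi_{GD}}$.
The upper (lower) bound of $\bar{Y}_{GG}(T)$ can be found by averaging over the largest (smallest) $\frac{\pi_{GG}}{\pi_{GG} + \pi_{GD}}$ fraction of the outcomes in OBS(T, G).
The bounds of $\bar{Y}_{GG}(C)$ can be found analogously from OBS(C, G), which is a mixture of GG and DG.
Together, these bound $\bar{Y}_{GG}(T) - \bar{Y}_{GG}(C)$.
Additional assumptions can further tighten the bounds on $\bar{Y}_{GG}(T) - \bar{Y}_{GG}(C)$.
Monotonicity: $\pi_{DG} = 0$.
Ranked average score: $\bar{Y}_{GG}(T)$ achieves its minimum when it equals $\bar{Y}_{GD}(T)$; $\bar{Y}_{GG}(C)$ achieves its minimum when it equals $\bar{Y}_{DG}(C)$.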
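The trimming idea can be sketched as a plug-in calculation. This is an illustration under stated assumptions, not the paper's procedure: it assumes monotonicity (so OBS(C, G) contains only GG units and the GG share of OBS(T, G) is P(G | C) / P(G | T)), replaces large-sample quantities by sample proportions, and uses a hypothetical function name and made-up scores.

```python
from statistics import mean


def gg_bounds_under_monotonicity(y_tg, y_cg, n_t, n_c):
    """Plug-in bounds on the GG effect, assuming no DG units.

    y_tg: scores observed in OBS(T, G); y_cg: scores observed in OBS(C, G)
    n_t, n_c: total number of units assigned to T and to C
    """
    # Number of GG units expected in OBS(T, G):
    # len(y_tg) * [P(G | C) / P(G | T)] = len(y_cg) * n_t / n_c.
    k = round(len(y_cg) * n_t / n_c)
    k = min(len(y_tg), max(1, k))

    ordered = sorted(y_tg)
    lower_t = mean(ordered[:k])    # GG units are the k lowest scorers
    upper_t = mean(ordered[-k:])   # GG units are the k highest scorers
    y_c = mean(y_cg)               # all of OBS(C, G) is GG under monotonicity
    return lower_t - y_c, upper_t - y_c


# Made-up scores: 8 of 10 treated units graduate, 5 of 10 controls graduate.
y_tg = [60, 60, 60, 80, 80, 80, 80, 80]
y_cg = [80, 80, 80, 80, 80]
print(gg_bounds_under_monotonicity(y_tg, y_cg, n_t=10, n_c=10))  # (-12, 0)
```

Without monotonicity the GG share is not pinned down by the observed graduation rates, so the bounds would additionally need a search over the possible values of $\pi_{DG}$, which this sketch does not attempt.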

Large-Sample Bounds - Summary

A Refreshing Account of Principal Stratification
(Mealli & Mattei, 2012)

Summary
The paper formalizes the framework of principal stratification analysis.
Causal mediation analysis is used when post-treatment variables can be intervened on.
In causal mediation analysis, we can potentially mix information across principal strata to infer values of missing data.

Advantages of Principal Strata
Parsimony is achieved by classifying units by principal strata instead of baseline features (Pearl 2011).
Principal strata are the coarsest choice of subpopulations that maintains ignorability of the treatment $Z_i$:
$$ Y_i(0), Y_i(1) \perp\!\!\!\perp Z_i \mid S_i(0), S_i(1), X_i \tag{13} $$
Considering $Y_i(z)$ rather than $Y_i(z, s)$ simplifies the estimation:
Assume that S is not manipulable.
Disregard 'a priori counterfactuals' $Y_i(z, S_i(1 - z))$.

Reformalization of Effects on Principal Strata
From now on, S takes values in its support $\mathcal{S}$ (not necessarily binary).
Principal Causal Effect (PCE):
$$ \mathrm{PCE}(s_0, s_1) = E[Y_i(1) - Y_i(0) \mid S_i(0) = s_0, S_i(1) = s_1]. \tag{14} $$
Principal Strata Direct Effect (PSDE) of Z on Y at level $s \in \mathcal{S}$:
$$ \mathrm{PSDE}(s) = E[Y_i(1) - Y_i(0) \mid S_i(0) = S_i(1) = s]. \tag{15} $$
The PSDE is also known as the dissociative effect.

Reformalization of Effects on Principal Strata (cont’d)
Average Natural Direct Effect:
$$ \mathrm{NDE}(z) = E[Y_i(1, S_i(z)) - Y_i(0, S_i(z))] \tag{16} $$
Average Natural Indirect Effect:
$$ \mathrm{NIE}(z) = E[Y_i(z, S_i(1)) - Y_i(z, S_i(0))] \tag{17} $$
Average Natural Direct Effect within a subpopulation $P$:
$$
\begin{aligned}
\mathrm{NDE}^P(z) = {} & \sum_{s_0 = s_1 = s} \mathrm{PSDE}(s)\,\pi^P_{s,s} \\
& + \sum_{s_0 \neq s_1} E[Y_i(1, S_i(z)) - Y_i(0, S_i(z)) \mid S_i(0) = s_0, S_i(1) = s_1]\,\pi^P_{s_0,s_1},
\end{aligned}
\tag{18}
$$
where $\pi^P_{s_0,s_1}$ is the proportion of subjects with $S_i(0) = s_0$ and $S_i(1) = s_1$ in $P$.
$\mathrm{PSDE}^P(s) = 0$ does not imply $\mathrm{NDE}^P(z) = 0$.

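Reformalization of Effects on Principal Strata (cont'd)
Average Total Causal Effect:
$$ \mathrm{ACE} = \mathrm{NDE}(z) + \mathrm{NIE}(1 - z) \tag{19} $$
$$
\begin{aligned}
\mathrm{ACE} &= E[Y_i(1) - Y_i(0)] = \sum_{(s_0, s_1)} \mathrm{PCE}(s_0, s_1)\,\pi_{s_0,s_1} \\
&= \sum_{s_0 = s_1 = s} \mathrm{PSDE}(s)\,\pi_{s,s} + \sum_{s_0 \neq s_1} \mathrm{PCE}(s_0, s_1)\,\pi_{s_0,s_1}.
\end{aligned}
\tag{20}
$$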
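The decomposition of the ACE into a natural direct and a natural indirect effect can be checked by adding and subtracting a cross-world term. The short derivation below assumes, as the two-argument notation suggests, that $Y_i(z) = Y_i(z, S_i(z))$, and works through the case z = 0.

```latex
\begin{aligned}
\mathrm{ACE} &= E[Y_i(1, S_i(1)) - Y_i(0, S_i(0))] \\
  &= E[Y_i(1, S_i(0)) - Y_i(0, S_i(0))] + E[Y_i(1, S_i(1)) - Y_i(1, S_i(0))] \\
  &= \mathrm{NDE}(0) + \mathrm{NIE}(1).
\end{aligned}
```

The case z = 1 follows in the same way by adding and subtracting $Y_i(0, S_i(1))$, which gives NDE(1) + NIE(0).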

CACE vs ACE
If $\mathcal{S} = \{0, 1\}$, the Complier Average Causal Effect (CACE) is
$$ \mathrm{CACE} = E[Y_i(1) - Y_i(0) \mid S_i(0) = 0, S_i(1) = 1]. \tag{21} $$
To identify the ACE, we also need to extrapolate the CACE to non-compliers with additional assumptions.
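The relationship between the CACE and the ACE can be checked numerically on a toy table. The sketch below assumes a hypothetical population with all potential values known and binary S; it computes each PCE and stratum proportion, then confirms that the proportion-weighted sum of PCEs equals the ACE while the CACE alone does not. Exact fractions are used to avoid rounding.

```python
from collections import defaultdict
from fractions import Fraction

# Hypothetical units: (S(0), S(1), Y(0), Y(1)).
units = [
    (0, 0, 1, 1), (0, 0, 2, 2), (0, 0, 3, 3), (0, 0, 2, 2),   # never-takers
    (0, 1, 2, 6), (0, 1, 4, 8), (0, 1, 3, 5), (0, 1, 3, 9),   # compliers
    (1, 1, 5, 6), (1, 1, 7, 8),                               # always-takers
]


def pce_and_proportions(units):
    """Return ({stratum: PCE}, {stratum: proportion}) over basic strata."""
    diffs = defaultdict(list)
    for s0, s1, y0, y1 in units:
        diffs[(s0, s1)].append(y1 - y0)
    n = len(units)
    pce = {k: Fraction(sum(d), len(d)) for k, d in diffs.items()}
    pi = {k: Fraction(len(d), n) for k, d in diffs.items()}
    return pce, pi


pce, pi = pce_and_proportions(units)
ace_direct = Fraction(sum(y1 - y0 for _, _, y0, y1 in units), len(units))
ace_from_strata = sum(pce[k] * pi[k] for k in pce)
cace = pce[(0, 1)]

print(ace_direct, ace_from_strata, cace)  # 9/5 9/5 4
```

Here the CACE (4) is more than twice the ACE (9/5), so carrying the complier effect over to the other strata would require assumptions about how those strata respond.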

Causal Mediation Analysis
Instead of $Y_i(z)$, investigate the potential outcomes $Y_i(z, s)$.
S is regarded as an additional treatment that can be intervened on.
A priori counterfactuals: $Y_i(0, S_i(1))$, $Y_i(1, S_i(0))$
The goal is to estimate the effect of intervening on S (a.k.a. the indirect effect) from data with no such intervention.

Sequential Ignorability Assumptions
Sequential ignorability assumptions (Imai 2010):
$$ \{Y_i(z', s), S_i(z)\} \perp\!\!\!\perp Z_i \mid X_i = x \tag{22} $$
$$ Y_i(z', s) \perp\!\!\!\perp S_i(z) \mid Z_i = z, X_i = x \tag{23} $$
Under S.I.A., we can extrapolate the information on $Y_i(z, S_i(z))$ to $Y_i(z, S_i(1 - z))$.
Extrapolation across principal strata (even for units with little data) is possible.
But S (e.g., the treatment actually received) is not randomized and may be confounded, so S.I.A. may not be credible!
Instead, start from a preliminary principal stratification analysis and mix information across strata if reasonable.

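Mixing Information Across Principal Strata: $S_i(0) = S_i(1)$
Simple case: When $S_i(0) = S_i(1)$ for the subpopulation of units $P$, $Y_i(z, S_i(1 - z))$ can be observed. For this subpopulation,
$$
\begin{aligned}
\mathrm{NDE}^P(z) = {} & \sum_{s_0 = s_1 = s} \mathrm{PSDE}(s)\,\pi^P_{s,s} \\
& + \sum_{s_0 \neq s_1} E[Y_i(1, S_i(z)) - Y_i(0, S_i(z)) \mid S_i(0) = s_0, S_i(1) = s_1]\,\pi^P_{s_0,s_1} \\
= {} & \sum_{s_0 = s_1 = s} \mathrm{PSDE}(s)\,\pi^P_{s,s},
\end{aligned}
\tag{24}
$$
because $\pi^P_{s_0,s_1} = 0$ whenever $s_0 \neq s_1$ in this subpopulation. The result is a weighted average of the principal strata direct effects $\mathrm{PSDE}(s)$.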

Mixing Information Across Principal Strata: $S_i(0) \neq S_i(1)$
Harder case: When $S_i(0) \neq S_i(1)$ for the subpopulation of units $P$, $Y_i(z, S_i(1 - z))$ cannot be observed.
Let's think about NDE(0) first.
If we find that different principal strata have similar covariate distributions and/or similar outcome levels under one of the treatment levels, the problem can be simplified. For example, if we let $uv = \{i : S_i(0) = u, S_i(1) = v\}$ for $u, v \in \{0, 1\}$ and find evidence that
$$ E[Y_i(0) \mid i \in 01] = E[Y_i(0) \mid i \in 00], \tag{25} $$
then we might assume
$$ E[Y_i(1, S_i(0)) \mid i \in 01] = E[Y_i(1, S_i(0)) \mid i \in 00], \tag{26} $$
where the right-hand side can be estimated because $Y_i(1, S_i(0)) = Y_i(1)$ in stratum 00. Similarly,
$$ E[Y_i(1, S_i(0)) \mid i \in 10] = E[Y_i(1, S_i(0)) \mid i \in 11]. \tag{27} $$

Mixing Information Across Principal Strata
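Under the assumptions on the previous slide, $\mathrm{NDE}^{01}(0) = \mathrm{NDE}^{00}(0)$ and $\mathrm{NDE}^{10}(0) = \mathrm{NDE}^{11}(0)$. Thus,
$$
\begin{aligned}
\mathrm{NDE}(0) &= \mathrm{NDE}^{00}(0)\,\pi_{00} + \mathrm{NDE}^{11}(0)\,\pi_{11} + \mathrm{NDE}^{10}(0)\,\pi_{10} + \mathrm{NDE}^{01}(0)\,\pi_{01} \\
&= \mathrm{NDE}^{00}(0)\,(\pi_{00} + \pi_{01}) + \mathrm{NDE}^{11}(0)\,(\pi_{11} + \pi_{10}) \\
&= \mathrm{PSDE}(0)\,(\pi_{00} + \pi_{01}) + \mathrm{PSDE}(1)\,(\pi_{11} + \pi_{10}).
\end{aligned}
\tag{28}
$$
$$ \mathrm{NIE}(1) = \mathrm{ACE} - \mathrm{NDE}(0) \tag{29} $$
$\mathrm{NIE}(0)$ can be estimated analogously from $\mathrm{NDE}(1)$.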
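As a quick arithmetic illustration of the last line of (28) together with (29), the sketch below plugs in made-up values for the PSDEs, the stratum proportions, and the ACE. The function name and the numbers are hypothetical; in practice each input would itself be an estimate, and the result is only as credible as the homogeneity assumptions behind (26) and (27).

```python
from fractions import Fraction


def nde0_and_nie1(psde, pi, ace):
    """Apply (28) and (29) for binary S.

    psde: {0: PSDE(0), 1: PSDE(1)}
    pi:   {(u, v): proportion of units with S(0) = u and S(1) = v}
    ace:  average total causal effect
    """
    nde0 = (psde[0] * (pi[(0, 0)] + pi[(0, 1)])
            + psde[1] * (pi[(1, 1)] + pi[(1, 0)]))
    nie1 = ace - nde0
    return nde0, nie1


# Made-up inputs, written as exact fractions.
psde = {0: Fraction(0), 1: Fraction(1)}
pi = {
    (0, 0): Fraction(2, 5),
    (0, 1): Fraction(2, 5),
    (1, 0): Fraction(0),
    (1, 1): Fraction(1, 5),
}
ace = Fraction(9, 5)

nde0, nie1 = nde0_and_nie1(psde, pi, ace)
print(nde0, nie1)  # 1/5 8/5
```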

Surrogate Endpoints Revisited
We did not quite understand...
What is full principal stratification?
What is an example of a 'surrogate paradox'?

Discussion

Discussion
Comparison of distributions versus comparison of means.
Rubin carefully defines "effect" using a generic term, a "comparison" between two distributions.
Page 8 of Mealli & Mattei: if PSDE(s) = 0 for each $s \in \mathcal{S}$, then there is no evidence on the direct effect of the treatment after controlling for the mediator.
Isn't this an overstatement, because the PSDE is defined as an expected value of a difference rather than as a comparison of two distributions?
If the post-treatment variable S is continuous, then what?
When we stratify, do we split S into bins?
If so, what is the optimal splitting? How many bins?

Discussion
How do we assess whether interventions on S are conceivable?
If we don't have a "large sample", how can we get a finite-sample bound?
Will we estimate the π's using a validation experiment and then estimate bounds on an application experiment?

The End
