Introduction to Principal Stratification.
Presented Papers:
Principal Stratification in Causal Inference (Frangakis & Rubin, 2002)
Estimation of Causal Effects via Principal Stratification When Some Outcomes Are Truncated by "Death" (Zhang & Rubin, 2003)
A Refreshing Account of Principal Stratification (Mealli & Mattei, 2012)
Presentation given to students in Harvard University STAT 286: Causal Inference
1. Basic Concepts in Principal Stratification
Kojin Oshiba & Wenshuo Wang
Harvard University
March 28, 2018
Kojin Oshiba & Wenshuo Wang (Harvard) STAT 286 March 28, 2018 1 / 43
2. Overview
1 Review of the papers
Principal Stratification in Causal Inference (Frangakis & Rubin, 2002)
Estimation of Causal Effects via Principal Stratification When Some Outcomes Are Truncated by "Death" (Zhang & Rubin, 2003)
A Refreshing Account of Principal Stratification (Mealli & Mattei, 2012)
2 Discussion
3. Review of the papers
4. Principal Stratification in Causal Inference
(Frangakis & Rubin, 2002)
5. Summary
Scholars have defined the net treatment effect by conditioning on observed post-treatment variables, but it is not, in general, a causal effect.
Principal stratification lets us define principal effects, each of which is a causal effect within a principal stratum.
One application of principal stratification is surrogate endpoints, which are useful when the primary outcome is too expensive to measure.
6. Definition of a Causal Effect
Units $i \in A$, $i = 1, 2, \ldots, n$
Control ($z = 0$) or treatment ($z = 1$)
$Y_i(z)$: value of $Y$ if unit $i$ is assigned treatment $z$
The causal effect of assignment on the outcome $Y$ is a comparison of:
$\{Y_i(0) : i \in A\}$ and $\{Y_i(1) : i \in A\}$. (1)
7. Post-treatment Variables
Post-treatment variable $S_i^{obs}$: a variable observed after treatment assignment, in addition to the main outcome $Y$.
Assume $S_i^{obs}$ is binary for simplicity.
8. Net Treatment Effect
Net Treatment Effect (NTE) is the comparison of:
$Y_i^{obs} \mid S_i^{obs} = s, z_i = 0$ and $Y_i^{obs} \mid S_i^{obs} = s, z_i = 1$, (2)
which, under complete randomization, reduces to
$Y_i(0) \mid S_i(0) = s$ and $Y_i(1) \mid S_i(1) = s$. (3)
NTE is not a causal effect if the treatment affects the post-treatment variable, because the two groups being compared then contain different units (post-treatment selection bias).
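A minimal sketch of this selection bias, assuming a hypothetical six-unit table of potential values (the numbers and the helper `net_treatment_effect` are illustrative, not taken from the papers). The treatment changes no unit's outcome, yet the NTE is nonzero at both levels of S because the treatment moves two units from S = 0 to S = 1.

```python
from statistics import mean

# Hypothetical science table: (S(0), S(1), Y(0), Y(1)) for six units.
# The treatment never changes Y, but it moves S from 0 to 1 for two units.
units = [
    (0, 0, 1, 1), (0, 0, 1, 1),
    (0, 1, 5, 5), (0, 1, 5, 5),
    (1, 1, 9, 9), (1, 1, 9, 9),
]

def net_treatment_effect(units, s):
    """Mean of Y(1) among units with S(1) = s minus mean of Y(0) among units with S(0) = s."""
    treated = [y1 for s0, s1, y0, y1 in units if s1 == s]
    control = [y0 for s0, s1, y0, y1 in units if s0 == s]
    return mean(treated) - mean(control)

# Every unit-level effect Y(1) - Y(0) is zero ...
print(all(y0 == y1 for _, _, y0, y1 in units))
# ... but the NTE is not, because the two conditioning sets contain different units.
for s in (0, 1):
    print(s, net_treatment_effect(units, s))
```

The full table of potential values is never observed in practice; it is used here only to show what the comparison in (3) converges to under complete randomization.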
9. Principal Stratification
Basic principal stratification $P_0$: the partition of units such that all units $i$ within any one set of $P_0$ have the same vector $(S_i(0), S_i(1))$.
Principal stratification $P$: a partition whose sets are unions of sets in $P_0$.
Example: $P = \{\{i : S_i(0) = S_i(1)\}, \{i : S_i(0) \neq S_i(1)\}\}$ (4)
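The two definitions can be sketched in a few lines, assuming we could see both potential values of S for every unit (we never can in practice). The unit ids, the values in `s_pairs` and the function names are hypothetical.

```python
from collections import defaultdict

# Hypothetical potential values (S_i(0), S_i(1)) of a binary post-treatment variable.
s_pairs = {1: (0, 0), 2: (0, 1), 3: (1, 1), 4: (0, 1), 5: (1, 0), 6: (0, 0)}

def basic_principal_stratification(s_pairs):
    """Group unit ids by their full vector (S_i(0), S_i(1)); this is P0."""
    strata = defaultdict(set)
    for unit, pair in s_pairs.items():
        strata[pair].add(unit)
    return dict(strata)

def coarsen(p0):
    """Union the sets of P0 into the two sets 'S unaffected' and 'S affected'."""
    same = set().union(*(ids for (s0, s1), ids in p0.items() if s0 == s1))
    diff = set().union(*(ids for (s0, s1), ids in p0.items() if s0 != s1))
    return {"S(0) = S(1)": same, "S(0) != S(1)": diff}

p0 = basic_principal_stratification(s_pairs)
p = coarsen(p0)
print(p0)
print(p)
```

Because membership depends only on the pair of potential values, not on the assignment actually received, the resulting sets behave like pre-treatment covariates.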
10. Principal Effect
$S_i^P$: the stratum of $P$ to which unit $i$ belongs.
Principal Effect: a comparison of potential outcomes under control vs. treatment within a principal stratum $\theta$ in $P$:
$\{Y_i(0) : S_i^P = \theta\}$ and $\{Y_i(1) : S_i^P = \theta\}$. (5)
The stratum $S_i^P$ is unaffected by treatment for any principal stratification $P$.
Therefore, any principal effect is a causal effect.
11. Missing Data in Principal Strata
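Usually, for each unit, one of the two potential values of the post-treatment variable and one of the two potential outcomes are missing:
$S^{mis} = \{S_i(z) : \text{all } i;\ z \neq Z_i\}$, $Y^{mis} = \{Y_i(z) : \text{all } i;\ z \neq Z_i\}$ (6)
Estimate using $H^{obs} = (Y^{obs}, S^{obs}, z)$ through the likelihood
$L(H^{obs}; \theta^S, \theta^Y)$ (7)
Additional assumptions/restrictions are needed for a unique MLE of $(\theta^S, \theta^Y)$.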
12. Surrogate Endpoints
The primary outcome $Y$ may be too expensive or infeasible to obtain in a practical time span.
Surrogate variable: a post-treatment variable used as a "surrogate" for the treatment effects on $Y$. It should satisfy:
(Causal Necessity) A treatment effect on $Y$ can occur only if there is a treatment effect on $S$.
(Statistical Generalizability) $S^{obs}$ should predict $Y^{obs}$ well in an application study.
13. Statistical Surrogate
S is a statistical surrogate if, for all fixed s,
$Y_i^{obs} \mid S_i^{obs} = s, z_i = 0 \;\sim\; Y_i^{obs} \mid S_i^{obs} = s, z_i = 1$ (8)
Statistical surrogacy does not satisfy causal necessity.
14. Principal Surrogate
S is a principal surrogate if, for all fixed s,
$Y_i(0) \mid S_i(0) = S_i(1) = s \;\sim\; Y_i(1) \mid S_i(0) = S_i(1) = s$ (9)
or, under randomization,
$Y_i^{obs} \mid S_i(0) = S_i(1) = s, Z_i = 0 \;\sim\; Y_i^{obs} \mid S_i(0) = S_i(1) = s, Z_i = 1$ (10)
Principal surrogacy satisfies causal necessity.
$S$ being a statistical surrogate does not imply that it is a principal surrogate, nor vice versa.
15. Associative and Dissociative Effects
The dissociative effect is a comparison between
$\{Y_i(0) : S_i(0) = S_i(1)\}$ and $\{Y_i(1) : S_i(0) = S_i(1)\}$. (11)
The associative effect is a comparison between
$\{Y_i(0) : S_i(0) \neq S_i(1)\}$ and $\{Y_i(1) : S_i(0) \neq S_i(1)\}$. (12)
Comparing (11) and (12) measures how strongly treatment effects on the outcome are associated with treatment effects on the surrogate. If the associative effect is large relative to the dissociative effect, the surrogate is a good target.
16. Summary
Scholars have defined the net treatment effect by conditioning on observed post-treatment variables, but it is not, in general, a causal effect.
Principal stratification lets us define principal effects, each of which is a causal effect within a principal stratum.
One application of principal stratification is surrogate endpoints, which are useful when the primary outcome is too expensive to measure.
17. Estimation of Causal Effects via Principal Stratification
When Some Outcomes Are Truncated by "Death"
(Zhang & Rubin, 2003)
18. Summary
Truncation by death is different from censoring by death, and it should be handled using principal stratification.
Using principal stratification, we can estimate a causal effect for the stratum whose outcomes are never truncated by death.
We can find upper and lower bounds for this causal effect.
19. Truncation by Death
"Missing" and "Censored" ≠ "Truncated"
Under "censoring by death", the outcome is defined on $\mathbb{R}$.
Under "truncation by death", the outcome is defined on $\{\mathbb{R}, *\}$, where $*$ marks an outcome that is undefined.
Previous approaches have treated "truncation" as "censoring":
Ignore truncated values.
Impute truncated outcomes in $\mathbb{R}$.
Model a missing-data mechanism due to "censoring".
Principal stratification addresses this issue.
20. Example: Educational Program Assessment
Two educational programs: Treatment (T) and Control (C)
Graduation indicators: $S_i(T), S_i(C) \in \{G, D\}$
Principal stratification by the graduation indicator (rows: value under T, columns: value under C):

             S_i(C) = G   S_i(C) = D
S_i(T) = G   GG           GD
S_i(T) = D   DG           DD
21. Truncation and Causal Effect
The causal effect on $Y$ is not defined in GD and DG, because one of the two potential outcomes is truncated.
$\bar{Y}^{obs}(T) - \bar{Y}^{obs}(C)$ compares mixtures of different strata, which is misleading if either GD or DG is non-empty. We should adjust for the pair of indicators $(S_i(T), S_i(C))$ instead.
What we want to know: $\bar{Y}_{GG}(T) - \bar{Y}_{GG}(C)$.
22. Large-Sample Bounds
Unfortunately, we don’t directly observe the principal strata. What we
do observe are
$OBS(T, G) = \{i : Z_i = T, S_i^{obs} = G\}$
$OBS(T, D) = \{i : Z_i = T, S_i^{obs} = D\}$
$OBS(C, G) = \{i : Z_i = C, S_i^{obs} = G\}$
$OBS(C, D) = \{i : Z_i = C, S_i^{obs} = D\}$
Large-sample bounds for the average causal effect on Y in the GG
principal stratum can be derived.
These bounds can be sharpened with additional assumptions:
Assumption 1. (Monotonicity) No DG group.
Assumption 2. (Ranked average score) When assigned treatment, GG performs better on average than GD; when assigned control, GG performs better on average than DG.
23. Large-Sample Bounds - Calculation
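What we want to know: $\bar{Y}_{GG}(T) - \bar{Y}_{GG}(C)$.
$OBS(T, G)$ is a mixture of the GG and GD strata, with mixing proportions $\frac{\pi_{GG}}{\pi_{GG} + \pi_{GD}}$ and $\frac{\pi_{GD}}{\pi_{GG} + \pi_{GD}}$.
The upper (lower) bound of $\bar{Y}_{GG}(T)$ can be found by averaging over the largest (smallest) $\frac{\pi_{GG}}{\pi_{GG} + \pi_{GD}}$ fraction of the outcomes in $OBS(T, G)$.
The bounds of $\bar{Y}_{GG}(C)$ can be found analogously on $OBS(C, G)$.
Together, these bound $\bar{Y}_{GG}(T) - \bar{Y}_{GG}(C)$.
Additional assumptions can further tighten the bounds on $\bar{Y}_{GG}(T) - \bar{Y}_{GG}(C)$.
Monotonicity: $\pi_{DG} = 0$.
Ranked average score: $\bar{Y}_{GG}(T)$ achieves its minimum when it equals $\bar{Y}_{GD}(T)$; $\bar{Y}_{GG}(C)$ achieves its minimum when it equals $\bar{Y}_{DG}(C)$.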
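A plug-in sketch of the trimming calculation under monotonicity only. It assumes that, with no DG group, $OBS(C, G)$ contains only GG units, so $\pi_{GG}$ is estimated by the graduation rate under C and $\pi_{GG} + \pi_{GD}$ by the graduation rate under T. The function name, the sample data and the rounding of the trimmed sample size are illustrative choices, and sampling variability is ignored.

```python
from statistics import mean

def gg_effect_bounds(y_tg, n_t, y_cg, n_c):
    """Plug-in bounds for Ybar_GG(T) - Ybar_GG(C), assuming monotonicity (pi_DG = 0).

    y_tg: outcomes of OBS(T, G); n_t: number of units assigned T.
    y_cg: outcomes of OBS(C, G); n_c: number of units assigned C.
    """
    p_tg = len(y_tg) / n_t          # estimates pi_GG + pi_GD
    p_cg = len(y_cg) / n_c          # estimates pi_GG when there is no DG group
    frac = p_cg / p_tg              # share of GG units inside OBS(T, G)
    if not 0 < frac <= 1:
        raise ValueError("observed graduation rates are incompatible with this sketch")
    k = max(1, round(frac * len(y_tg)))   # size of the trimmed subsample
    ranked = sorted(y_tg)
    lower_t = mean(ranked[:k])      # smallest-k average: lower bound on Ybar_GG(T)
    upper_t = mean(ranked[-k:])     # largest-k average: upper bound on Ybar_GG(T)
    y_gg_c = mean(y_cg)             # point estimate of Ybar_GG(C) under monotonicity
    return lower_t - y_gg_c, upper_t - y_gg_c

# Hypothetical data: 8 units per arm; 6 graduate under T and 4 under C.
lo, hi = gg_effect_bounds(
    y_tg=[50, 60, 70, 80, 90, 100], n_t=8,
    y_cg=[55, 65, 75, 85], n_c=8,
)
print(lo, hi)
```

Adding the ranked average score assumption would raise the lower bound further; that step is not implemented here.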
24. Large-Sample Bounds - Summary
25. Summary
Truncation by death is different from censoring by death, and it should be handled using principal stratification.
Using principal stratification, we can estimate a causal effect for the stratum whose outcomes are never truncated by death.
We can find upper and lower bounds for this causal effect.
26. A Refreshing Account of Principal Stratification
(Mealli & Mattei, 2012)
27. Summary
The paper formalizes the framework of principal stratification analysis.
Causal mediation analysis is used when it is conceivable to intervene on the post-treatment variable.
In causal mediation analysis, we can potentially mix information
across principal strata to infer values of missing data.
28. Advantages of Principal Strata
Parsimony achieved by classifying units by principal strata instead of
baseline features (Pearl 2011).
Principal strata are the coarsest choice of subpopulations that maintains ignorability of the treatment $Z_i$:
$Y_i(0), Y_i(1) \perp\!\!\!\perp Z_i \mid S_i(0), S_i(1), X_i$ (13)
Considering $Y_i(z)$ rather than $Y_i(z, s)$ simplifies the estimation:
Assume that $S$ is not manipulable.
Disregard "a priori counterfactuals" $Y_i(z, S_i(1 - z))$.
29. Reformalization of Effects on Principal Strata
From now on, $S$ takes values in its support $\mathcal{S}$ (not necessarily binary).
Principal Causal Effect (PCE)
$PCE(s_0, s_1) = E[Y_i(1) - Y_i(0) \mid S_i(0) = s_0, S_i(1) = s_1]$. (14)
Principal Strata Direct Effect (PSDE) of $Z$ on $Y$ at level $s \in \mathcal{S}$
$PSDE(s) = E[Y_i(1) - Y_i(0) \mid S_i(0) = S_i(1) = s]$. (15)
PSDE is also known as the dissociative effect.
30. Reformalization of Effects on Principal Strata (cont’d)
Average Natural Direct Effect:
$NDE(z) = E[Y_i(1, S_i(z)) - Y_i(0, S_i(z))]$ (16)
Average Natural Indirect Effect:
$NIE(z) = E[Y_i(z, S_i(1)) - Y_i(z, S_i(0))]$ (17)
Average Natural Direct Effect within a subpopulation $P$:
$NDE^P(z) = \sum_{s_0 = s_1 = s} PSDE(s)\, \pi^P_{s,s} + \sum_{s_0 \neq s_1} E[Y_i(1, S_i(z)) - Y_i(0, S_i(z)) \mid S_i(0) = s_0, S_i(1) = s_1]\, \pi^P_{s_0,s_1}$, (18)
where $\pi^P_{s_0,s_1}$ is the proportion of subjects with $S_i(0) = s_0$ and $S_i(1) = s_1$ in $P$.
$PSDE^P(s) = 0$ does not imply $NDE^P(z) = 0$, because the second sum in (18) can be nonzero.
31. Reformalization of Effects on Principal Strata (cont’d)
Average Total Causal Effect:
$ACE = NDE(z) + NIE(1 - z)$ (19)
$ACE = E[Y_i(1) - Y_i(0)] = \sum_{(s_0, s_1)} PCE(s_0, s_1)\, \pi_{s_0,s_1} = \sum_{s_0 = s_1 = s} PSDE(s)\, \pi_{s,s} + \sum_{s_0 \neq s_1} PCE(s_0, s_1)\, \pi_{s_0,s_1}$. (20)
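The decomposition in (20) is a proportion-weighted sum, which a short sketch makes concrete. The strata proportions and PCE values below are hypothetical, as is the function name `ace_decomposition`.

```python
import math

# Hypothetical strata: (s0, s1) -> (proportion pi_{s0,s1}, PCE(s0, s1)).
strata = {
    (0, 0): (0.3, 1.0),
    (0, 1): (0.4, 3.0),
    (1, 1): (0.3, 2.0),
}

def ace_decomposition(strata):
    """Return (ACE, PSDE part, remaining part), following the two sums in (20)."""
    psde_part = sum(pi * pce for (s0, s1), (pi, pce) in strata.items() if s0 == s1)
    other_part = sum(pi * pce for (s0, s1), (pi, pce) in strata.items() if s0 != s1)
    return psde_part + other_part, psde_part, other_part

# The stratum proportions must sum to one.
assert math.isclose(sum(pi for pi, _ in strata.values()), 1.0)

ace, psde_part, other_part = ace_decomposition(strata)
print(ace, psde_part, other_part)
```

The first part collects the strata where S is unaffected by treatment (so PCE equals PSDE); the second collects the strata where treatment changes S.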
32. CACE vs ACE
If $\mathcal{S} = \{0, 1\}$, the Complier Average Causal Effect (CACE) is
$CACE = E[Y_i(1) - Y_i(0) \mid S_i(0) = 0, S_i(1) = 1]$ (21)
To identify the ACE, we also need to extrapolate the CACE to non-compliers, which requires additional assumptions.
33. Causal Mediation Analysis
Instead of $Y_i(z)$, investigate the potential outcomes $Y_i(z, s)$.
$S$ is regarded as an additional treatment that can be intervened on.
A priori counterfactuals: $Y_i(0, S_i(1))$, $Y_i(1, S_i(0))$
The goal is to estimate the effect of an intervention on $S$ (a.k.a. the indirect effect) from data with no such intervention.
34. Sequential Ignorability Assumptions
Sequential ignorability assumptions (Imai 2010):
$\{Y_i(z', s), S_i(z)\} \perp\!\!\!\perp Z_i \mid X_i = x$ (22)
$Y_i(z', s) \perp\!\!\!\perp S_i(z) \mid Z_i = z, X_i = x$ (23)
Under S.I.A., we can extrapolate the information on $Y_i(z, S_i(z))$ to $Y_i(z, S_i(1 - z))$.
Extrapolation across principal strata (even for units with little data) is possible.
But $S$ (e.g., the treatment actually received) is likely to be confounded, so S.I.A. may not be credible!
Instead, start from a preliminary principal stratification analysis and mix information across strata if reasonable.
35. Mixing Information Across Principal Strata: Si(0) = Si(1)
Simple case: When $S_i(0) = S_i(1)$ for all units in the subpopulation $P$, $Y_i(z, S_i(1 - z)) = Y_i(z, S_i(z))$ can be observed. For this subpopulation, no stratum has $s_0 \neq s_1$, so the second sum vanishes:
$NDE^P(z) = \sum_{s_0 = s_1 = s} PSDE(s)\, \pi^P_{s,s} + \sum_{s_0 \neq s_1} E[Y_i(1, S_i(z)) - Y_i(0, S_i(z)) \mid S_i(0) = s_0, S_i(1) = s_1]\, \pi^P_{s_0,s_1} = \sum_{s_0 = s_1 = s} PSDE(s)\, \pi^P_{s,s}$, (24)
which is a weighted average of the principal strata direct effects $PSDE(s)$.
36. Mixing Information Across Principal Strata: Si(0) ≠ Si(1)
Harder case: When $S_i(0) \neq S_i(1)$ for the subpopulation of units $P$, $Y_i(z, S_i(1 - z))$ cannot be observed.
Let's think about $NDE(0)$ first.
If we find that different principal strata have similar covariate distributions and/or similar outcome levels under one of the treatment levels, the problem can be simplified. For example, if we let $uv = \{i : S_i(0) = u, S_i(1) = v\}$ for $u, v \in \{0, 1\}$ and find evidence that
$E[Y_i(0) \mid i \in 01] = E[Y_i(0) \mid i \in 00]$, (25)
then we might assume
$E[Y_i(1, S_i(0)) \mid i \in 01] = E[Y_i(1, S_i(0)) \mid i \in 00]$, (26)
where the right-hand side can be estimated. Similarly,
$E[Y_i(1, S_i(0)) \mid i \in 10] = E[Y_i(1, S_i(0)) \mid i \in 11]$. (27)
37. Mixing Information Across Principal Strata
Under the assumptions on the previous slide, $NDE^{01}(0) = NDE^{00}(0)$ and $NDE^{10}(0) = NDE^{11}(0)$. Thus,
$NDE(0) = NDE^{00}(0)\pi_{00} + NDE^{11}(0)\pi_{11} + NDE^{10}(0)\pi_{10} + NDE^{01}(0)\pi_{01}$
$= NDE^{00}(0)(\pi_{00} + \pi_{01}) + NDE^{11}(0)(\pi_{11} + \pi_{10})$
$= PSDE(0)(\pi_{00} + \pi_{01}) + PSDE(1)(\pi_{11} + \pi_{10})$. (28)
$NIE(1) = ACE - NDE(0)$ (29)
$NIE(0)$ can be estimated analogously from $NDE(1)$.
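Once the PSDEs, the stratum proportions and the ACE are available, (28) and (29) are simple arithmetic. A small sketch with hypothetical inputs (the function name `nde0_and_nie1` and all numbers are assumptions for illustration):

```python
import math

def nde0_and_nie1(psde, pi, ace):
    """NDE(0) from (28) and NIE(1) from (29) for a binary S.

    psde: {0: PSDE(0), 1: PSDE(1)}; pi: {(u, v): pi_uv}; ace: average causal effect.
    """
    # The four stratum proportions must sum to one.
    assert math.isclose(sum(pi.values()), 1.0)
    nde0 = psde[0] * (pi[0, 0] + pi[0, 1]) + psde[1] * (pi[1, 1] + pi[1, 0])
    return nde0, ace - nde0

# Hypothetical inputs, as if estimated from a principal stratification analysis.
nde0, nie1 = nde0_and_nie1(
    psde={0: 1.0, 1: 2.0},
    pi={(0, 0): 0.3, (0, 1): 0.4, (1, 1): 0.2, (1, 0): 0.1},
    ace=2.1,
)
print(nde0, nie1)
```

The result is only as credible as the cross-strata equalities (25) to (27) that justify replacing $NDE^{01}(0)$ and $NDE^{10}(0)$ by PSDEs.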
38. Surrogate Endpoints Revisited
We did not quite understand...
What is full principal stratification?
What is an example of a ’surrogate paradox’?
39. Summary
The paper formalizes the framework of principal stratification analysis.
Causal mediation analysis is used when it is conceivable to intervene on the post-treatment variable.
In causal mediation analysis, we can potentially mix information
across principal strata to infer values of missing data.
41. Discussion
Comparison of distributions versus comparison of means.
Rubin carefully defines "effect" using a generic term, "comparison" between two distributions.
Page 8 of Mealli & Mattei: If PSDE(s) = 0 for each s ∈ S, then there is no evidence on the direct effect of the treatment after controlling for the mediator.
Isn't this an overstatement, because PSDE is defined as an expected difference rather than as a comparison of two distributions?
What if the post-treatment variable S is continuous?
When we stratify, do we split S into bins?
If so, what is the optimal splitting? How many bins?
42. Discussion
How do we assess whether interventions on S are conceivable?
If we don't have a large sample, how can we get finite-sample bounds?
Will we estimate π’s using a validation experiment and then estimate
bounds on an application experiment?