2. Contents
1
2
3
4
OVERVIEW. G-FORMULA AND INVERSE PROBABILITY WEIGHTING
- MIGUEL HERNAN
MARGINAL STRUCTURAL MODELS AND G-ESTIMATION OF
STRUCTURAL NESTED MODELS
- JUDITH LOK
SINGLE WORLD INTERVENTION GRAPHS AND OTHER RECENT
DEVELOPMENTS IN CAUSAL INFERENCE
- JAMES ROBINS
CAUSAL MEDIATION ANALYSIS
- TYLER VANDERWEELE
2
5
CAUSAL METHODS TO TAME UNMEASURED CONFOUDING
- ERIC TCHETGEN TCHETGEN
4. Terminology
4
โข Upper-case letters (e.g. A, Y): random variables
โข Lower-case letters (e.g. a, y): possible values of
random variables (i.e. fixed to individual)
โข Potential outcomes = counterfactual outcomes:
the situation would have been observed under A=a
โข Consistency: If ๐ด๐ = ๐, then ๐๐
๐
= ๐๐
๐ด
= ๐๐ (same as Rubinโs
โstable-unit-treatment-value assumption(SUTVA)โ)
โข ๐ท๐[๐๐=๐ = ๐] : proportion of indivi. Y=1, had everybody
been treated <- unconditional probability
โข ๐ท๐ ๐ = ๐ ๐จ = ๐] : proportion of indivi. Y=1 among who
received treatment <- conditional probability
5. Association and Causation
5
โข Fundamental problem of causal inference
โ Individual causal effects cannot be determined (= missing data
problem)
โข Average causal effects of A on Y in the population exist if
๐ท๐ ๐๐=๐ = ๐ โ ๐ท๐ ๐๐=๐ = ๐
โข Association between A and Y in the population exist if
๐ท๐ ๐ = ๐ | ๐จ = ๐ โ ๐ท๐ ๐ = ๐ | ๐จ = ๐
โ If there is no association, they are independent ๐ดโ๐
โข If outcome is nondichotomous, ๐ ๐๐=๐ (population mean or
expectation) can be substituted
โข There is confounding when ๐ท๐ ๐๐
= ๐ โ ๐ท๐ ๐ = ๐ | ๐จ = ๐
7. Conditions for causal inference
7
โข Ideal randomized experiment
โ No loss to follow-up
โ Full compliance with assigned treatment
โ One version of treatment (well-defined treatment)
โ Double blind assigment (neither subjects nor investigators know)
1. Exchangeability <- marginal randomization
โ ๐ท๐ ๐๐
= ๐ ๐จ = ๐] = ๐ท๐ ๐๐
= ๐ ๐จ = ๐] โบ ๐จโ๐๐
๐๐๐ ๐๐๐ ๐ (โ ๐จโ๐)
โ Lack of confounding
โ If not holds, we need counfouding adjustment
2. Positivity (explained later)
โ No treatment that no one take
3. Consistency (explained later)
โ SUTVA
9. Exchangeabilityโฆ
9
โข Holds in mariginally randomized experiments => no need
of confounding adjustment : association is causation
โข Nonparametric estimation with saturated linear model
๐ธ ๐ ๐ด = ๐0 + ๐1๐ด
โ If not treated, i.e. A=0, ๐ธ ๐ ๐ด = 0 = ๐0 + ๐1 ร 0 = ๐0
โ If treated, i.e. A=1, ๐ธ ๐ ๐ด = 1 = ๐0 + ๐1 ร 1 = ๐0 + ๐1
โ Then, average effects estimate is ๐0 + ๐1 โ ๐0 = ๐1 (achievable
nonparametrically)
โข Not hold in conditionally randomized experiments,
observational study => need of confounding adjustment
โข What if randomized conditioning on L?
โ Need to estimate ๐ธ[๐๐=1
] and ๐ธ[๐๐=0
], i.e. The standardized mean in
the treated and in the untreated
10. Standardization
10
๐ธ ๐๐=1
= E ๐๐=1
๐ฟ = 1 ร Pr ๐ฟ = 1 + E ๐๐=1
๐ฟ = 0 ร Pr ๐ฟ = 0
= E ๐ ๐ฟ = 1, ๐ด = 1 ร Pr ๐ฟ = 1 + E ๐ ๐ฟ = 0, ๐ด = 1 ร Pr ๐ฟ = 0
= เท
๐=1,0
๐ธ[๐|๐ฟ = ๐, ๐ด = 1] ร Pr[๐ฟ = ๐]
๐ธ ๐๐=0
= เท
๐=1,0
๐ธ[๐|๐ฟ = ๐, ๐ด = 0] ร Pr[๐ฟ = ๐]
Exchangeability
โข Then, we can estimate the average causal effect
โข Without model (refer above)
โข With model (bias-variance trade-off)
โ Nonparametric estimation = saturated linear model ๐ธ ๐ ๐ด, ๐ฟ = ๐0 +
๐1๐ด + ๐2๐ฟ + ๐1๐ดL
: allow difference in treatment effect between L
: the curse of dimensionality
โ Parametric estimation = nonsaturated linear model ๐ธ ๐ ๐ด, ๐ฟ = ๐0 +
๐1๐ด + ๐2๐ฟ
: restriction that treatment effect is the same
: variance become larger
11. Inverse probability(IP)
weighting
11
โข IP weighting is anther methods to gain the average causal
effects under conditional exchangeability, i.e. adjustment
The goal is to make the pseudo-population with weighting
12. The way of IP weighting and
interpretation
12
โข IP weights: ๐๐ด =
1
๐(๐ด|๐ฟ)
โ ๐(๐) is the probability density function (pdf) of the random
variable A evaluated at the value a
โข Stabilized IP weights: ๐๐๐ด =
๐(๐ด)
๐(๐ด|๐ฟ)
โ get the same size of pseudo-population as the original one
โ Lead to smaller variance (by avoiding weighting too much on
small number of people)
โข Generalized IP weights: ๐บ๐๐ด =
๐(๐ด)
๐(๐ด|๐ฟ)
13. Violations of positivity
13
โข Structural
โ No causal inferences for subsets w/ structural non-positivity
โ Causal inference by restricting the study population
โข Random
โ Causal inference with parametric models to smooth over the
subsets w/ non-positivity
โข โฆSo,
โข Standardization vs IP weighting ?
โ In nonparametric estimate, same outcome is acquired
โ In parametric estimate, the outcome is different
โ Recommend doubly-robust methods
14. Suppose we want to estimate
the causal effect of A on Yโฆ
14
โข If everybody had been treated: ๐ธ[๐๐=1]
โข If everybody had been untreated: ๐ธ[๐๐=0
]
โข Then, average causal effect: ๐ธ[๐๐=1
] - ๐ธ[๐๐=0
]
โข Weighted regression model ๐ธ ๐ ๐ด = ๐0 + ๐1๐ด
โ Associational model
โ The difference, i.e. ๐1 would have a causal interpretation as
๐ธ[๐๐=1
] - ๐ธ[๐๐=0
] when all confounders are included in
calculation (e.g. Estimate ini the pseudo-population)
โข Marginal Structural Model (MSM) ๐ธ[๐๐] = ๐ฝ0 + ๐ฝ1๐
โ Causal model
โ ๐1 = ๐ฝ1 if ๐1 would have a causal interpretation
15. Marginal structural model
(MSM)
15
โข Structural?
โ Structural = causal
โ The outcome variable is counterfactual
โ i.e. parameters for treatment in MSM have direct causal interpretation
โข Marginal?
โ Marginal = unconditional
โ No need to include the confounders as covariates in the model
โ i.e. effect may be estimated in the entire population
โ If include covariates, allow to estimate conditional effect
โข Again, recommend doubly-robust estimators
16. Advantage of MSM with IP
weighting or standardization
16
โข IP weighting and standardization are able to control
time-varying treatment and confounders
โ Can handle treatment-confounder feedback
โข Outcome regression and propensity score methods
introduce bias if treatments and confounders are time-
variant
โ Can not handle treatment-confounder feedback
18. Case study 2
18
What is the causal effect of AZT on mortality?
โ AZT (Zidovudine, Retrovir) : anti-HIV drug
โ PCP : Pneumocystis pneumonia
โ Prophylaxis : anti-HIV drug
19. Taking account of the
treatment regime
19
โข Treatment Regime is
โ AZT was followed by โno prophylaxisโ if no PCP
โ AZT was followed by โprophylaxisโ if PCP
โข Intention-to-treat assessment?
โข Conditioning on PCP (might lead to selection bias)?
โข Also, positivity violation?
โข Conditioning on prophylaxis?
-> important to talk with subject-matter experts
20. MSM conditioning on past
treatment
20
โข Under the assumption that
โ Independent, identically distributed full data (i.i.d.)
โ No unmeasured confounding
โ Missing At Random(MAR) : censoring depend on past observed
characteristics but not on forther prognosis
โ Positivity
โข MSM investigate the effect of โstaticโ treatment regimes
โ Meaning treatment would not be patient-specific or be affected by
previous outcomes
โ IP was the probability of receiving actual treatment for each patient,
i.e. ๐(๐ด๐ = ๐๐|๐ฟ๐, ๐ด๐โ1)
21. How to model multistate?
21
โข Problem is
โ Computationally involved
โ Necessity to model each transition given the past
โ No specific parameter to indicate whether treatment affects the
outcome of interest -> no standard test for treatment effect
๏ Structural Nested Mean Model and Structural Failure Time
Models
โ To estimate the effect of treatment on the final outcome
22. Structural Nested Mean Model
(SNMM)
22
โข Assumptions
โ No unmeasured confounding
โ Consistency
โข Treatment effects, or difference bw observed outcomes and
counterfactual is defined as,
โ ๐พ๐
เดฅ
๐๐, ๐๐ = ๐ธ[๐ ๐๐,เดฅ
0
โ ๐ ๐๐โ1,เดฅ
0
|๐ฟ๐ = เดฅ
๐๐, ๐ด๐ = ๐๐]
โ The outcome had treatment stopped at k+1 versus at k
24. Directed Acyclic Graphs
(DAGs)
24
โข Whose nodes (vertices) are random variables with directed
edges (arrows) and no directed cycles
โข Parents of variables (e.g. ๐๐ด1 = ๐0)
โข A path is closed if it contains a collider, otherwise path is
open
โข Complete DAG
โ There is an arrow between every pair of nodes
โ ๐ ๐ฃ = ๐(๐ฃ3|๐ฃ1, ๐ฃ2)๐(๐ฃ2|๐ฃ0)๐(๐ฃ1|๐ฃ0)๐(๐ฃ0)
โ Nonparametric (saturated) model
โข Incomplete DAG
โ ๐ ๐ฃ = ๐(๐ฃ3|๐ฃ0, ๐ฃ1, ๐ฃ2)๐(๐ฃ3|๐ฃ0, ๐ฃ1) ๐(๐ฃ1|๐ฃ0)๐(๐ฃ0)
25. d-separation and d-connected
25
โข D-separation
โ When no open path between two variables along which
probability can flow, we call two variables are d-separated
โ Otherwise, we call they are d-connected
โ We also think d-separation and d-connected with
condition
โข If two sets of nodes are d-separated, they will be
independent in every distribution in DAG (soundness)
โข If two sets of nodes are not d-separated, there will be some
distribution that they are not independent in DAG
(completeness)
26. Causal DAGs
26
1. Lack of arrows = the absence of direct causal effects
2. Any variables are causes of all its descendants (vise versa)
3. All common causes must be on the graph even if they are
not measured
4. Causal Markov Assumption (CMA) is hold: the causal DAG
= a statistical DAG = distribution of factors
5. CMA = conditional on its direct causes, a variable is
independent of any variable it does not cause
โ d-separation implies statistical independence
โ d-connection does not imply statistical dependence (but generally we
assume dependence)
27. Single-World Intervention
Graphs (SWIGs)
27
โข The way to represent counterfactuals on the graphs
โข SWIG G(0) represents Pr ๐ด, ๐๐=0
โข SWIG G(1) represents Pr ๐ด, ๐๐=1
โข Since we cannot show ๐๐=0
and ๐๐=1
on the same SWIG,
the name Single-World Intervention Graphs is appropriate
28. SWIGs for dynamic regimes
28
โข The treatment at time t is determined by ๐๐ก
โข Under the regime ๐ = (๐0, ๐ฟ1)
โ ๐ด0
+๐
= ๐0
โ ๐ด1
+๐
= ๐1 ๐ฟ1
๐0
= ๐ฟ1
๐0
โข For any regime g, static or dynamic, the g-
formula will identify the counterfactual
outcome if :
โ ๐๐
โ๐ด0
โ ๐๐
โ๐ด1|๐ฟ1, ๐ด0 = ๐0
โข And, the g-formula will have a causal
interpretation
โ ๐ธ ๐๐ ฯ๐1
๐ธ[๐|๐ด1 = ๐1, ๐ฟ1 = ๐1, ๐ด0 = ๐0]Pr[๐ฟ1 = ๐1|๐ด0 = ๐0]
We do not consider the treatment A is d-co
30. Standard approach to
investigate mediation
30
1. Difference method
โ Using the difference between the two coefficient
โ ๐ธ ๐ ๐ด = ๐, ๐ถ = ๐ = ๐0 + ๐1๐ + ๐2๐
โ ๐ธ ๐ ๐ด = ๐, ๐ = ๐, ๐ถ = ๐ = ๐0 + ๐1๐ + ๐2๐ + ๐4๐
โ Indirect effect = ๐1 โ ๐1
โ Direct effect = ๐1
2. Product method
โ ๐ธ ๐ ๐ด = ๐, ๐ถ = ๐ = ๐ฝ0 + ๐ฝ1๐ + ๐ฝ2๐
โ ๐ธ ๐ ๐ด = ๐, ๐ = ๐, ๐ถ = ๐ = ๐0 + ๐1๐ + ๐2๐ + ๐4๐
โ Indirect effect = ๐ฝ1๐1
โ Direct effect = ๐1
โข Product method and difference method
โ coincide for continuous outcomes
โ Will not coincide for binary outcomes
31. Limitation1: Mediator-
outcome confounding
31
โข Just as unmeasured exposure-outcome confounders can
generate confounding bias of estimates of overall effects,
mediator-outcome confounders can generate bias of
estimates of direct and indirect effects
โข Meaning, we might get paradoxical result!
โข Approach 1) pay attention to mediator-outcome
confounding variables even during study design stage
โข Approach 2) conduct sensitivity analysis
32. Limitation2:exposure-
mediator interactions
32
โข Even if we include an interaction term, often analysis goes:
โ ๐ธ ๐ ๐ด = ๐, ๐ถ = ๐ = ๐0 + ๐1๐ + ๐2๐
โ ๐ธ ๐ ๐ด = ๐, ๐ = ๐, ๐ถ = ๐ = ๐0 + ๐1๐ + ๐2๐ + ๐3๐๐ + ๐4๐
โ Indirect effect = ๐1 โ ๐1
โ Direct effect = ๐1
โข Approach 1) consider the causal definitions f direct and
indirect effects for mediation analysis and required
unmeasured confounding assumptions
โข Approach 2) describe the regression methods which can
be used in accord with the above definition
โข Approach 3) provide sensitivity analysis techniques to
assess the possible violations to unmeasured confounding
assumptions
34. Approach 1) no unmeasured
confounder assumption
34
1. No unmeasured exposure-outcome confounders given C
2. No unmeasured mediator-outcome confounders given
(C,A)
3. No unmeasured exposure-mediator confounders given C
4. No unmeasured mediator-outcome confounder affected by
exposure
35. Approach 2) Regression model
for causal mediation analysis
35
โข Similar concepts apply to treatment levels A=a to A=a*,
then get the expression of regression
โ ๐ธ ๐ ๐ด = ๐, ๐ = ๐, ๐ถ = ๐ = ๐0 + ๐1๐ + ๐2๐ + ๐3๐๐ + ๐4๐
โ ๐ธ ๐ ๐ด = ๐, ๐ถ = ๐ = ๐ฝ0 + ๐ฝ1๐ + ๐ฝ2๐
โ ๐ถ๐ท๐ธ = ๐1 + ๐3๐ ๐ โ ๐ โ
โ ๐๐ท๐ธ = ๐1 + ๐3 ๐ฝ0 + ๐ฝ1๐ + ๐ฝ2๐ธ ๐ถ ๐ โ ๐ โ
โ ๐๐ผ๐ธ = ๐2๐ฝ1 + ๐3๐ฝ1๐ ๐ โ ๐ โ
โข SE can be obtained using the delta
โข Proportion mediated is the indirect effect divided by the total
effect (SAS, STATA and SPSS can do automatically for continuous,
binary, count, and time-to-event outcomes
36. Approach 2) cautions for
binary outcomes
36
โข Difference method for a dichotomous outcomes and logistic
regression will give valid estimates, provided
โ Model without the interaction is correctly specified
โ No unmeasured confounding assumptions are satisfied
โ Outcome is rare (can be relaxed by using log-linear)
โข With common outcome, the difference method fails with
logistic regression due to non-collapsibility
โข Monte Carlo approach, a simulation-based approach give
more flexibility
37. Approach 3) sensitivity
analysis
37
โข In order to examine the extent to which the unmeasured
confounder would have to affect both the mediator and the
outcome to invalidate conclusions about NDE and NIE
โข With an observed NDE or NIE of RR, we have unmeasured
confounder if ๐ ๐ ๐๐ and ๐ ๐ ๐ด๐|๐ are greater than:
โ ๐ธ โ ๐ฃ๐๐๐ข๐ = ๐ ๐ + ๐ ๐๐๐ก[๐ ๐ ร ๐ ๐ โ 1 ]
โ ๐ ๐ ๐๐|๐ด = 1, ๐ = max
max Pr(๐=1|๐ด=1,๐)
min Pr(๐=1|๐ด=1,๐)
: the max effect among exposed of U
on Y, not through M
โ ๐ ๐ ๐ด๐|๐ = ๐๐๐ฅ
Pr(๐ข|๐ด=1,๐)
Pr(๐ข|๐ด=0,๐)
: smaller in magnitude on the RR scale than
magnitude of max effect U on M across strata of A
โ We can apply this in a routine manner to both the estimate and the
confidence interval limit closest to the null
38. Surrogate paradox
38
โข โsurrogate paradoxโ is manifest if
โ The surrogate and outcome are strongly positively correlated
โ The treatment has a positive effect on the surrogate
โ The treatment has a negative effect on the outcome
โข Might happen if
โ E[Y|a,s,u] is NOT non-decreasing in a, i.e. a negative direct effect of A
on Y, OR
โ E[Y|a,su] is NOT non-decreasing in s, i.e. the positive correlation of S
and Y is not because of the actual effect but because of confounding,
OR
โ P(S>s|a,u) is NOT non-decreasing in a, i.e. transitivity fails; A affects
S for different people than S affects Y
39. Unification of Mediation and
Interaction
39
โข Assess mediation in the presence of interaction to get direct
and indirect effects
โข Under the composition assumption that ๐๐ = ๐๐๐๐, total
effects can be decomposed into four components
โ ๐1 โ ๐0 = ๐10 โ ๐00 + ๐11 โ ๐10 โ ๐01 + ๐00 ๐0 + ๐11 โ ๐10 โ ๐01 + ๐00 (
)
๐1 โ
๐0 + ๐10 โ ๐00 ๐1 โ ๐0
โ CDE : effect of A in the absence of M
โ INTref : interaction that operates only if the mediator is present in the
absence of exposure
โ INTmed : interaction that operates only if the exposure changes the
mediator
โ PIE : effect of the mediator in the absence of the exposure times the
effect of the exposure on the mediator
40. CAUSAL METHODS TO TAME UNMEASURED CONFOUDING
- ERIC TCHETGEN TCHETGEN
41. Instrumental variable (IV)
41
โข IV approach refers to a particular set of methods that allow
one to recover a causal effect of an exposure in the
presence of unmeasured confounding
โข Key assumption : one has observed a pre-exposure
unconfounded IV, which affects the outcome only through
its effects on the exposure
๐โ๐|๐, ๐ด
โข Mendelian randomization is also kind of instrumental
variable approach
42. Formal definition of IV
42
โข Assumption 1) Z and A are associated, Z has a causal
effect on A, or Z and A share common causes
โข Assumption 2) Z affects the outcome Y only through A, i.e.
no direct effect of Z on Y (exclusion restriction)
โข Assumption 3) Z does not share common causes with the
outcome Y, i.e. no confounding for the effect of Z on Y
โข Assumption 4) Monotonicity : there are no defiers, that is,
there is only never takers (๐ด0 = 0, ๐ด1 = 0), always takers
(๐ด0 = 1, ๐ด1 = 1), compliers (๐ด0 = 0, ๐ด1 = 1), but defiers (๐ด0 =
1, ๐ด1 = 0),
43. Complier average causal
effect(CACE)
43
โข The causal effect for individuals who would adhere to their
assignment
โข OR, the effect for individuals for whom treatment is
manipulable
โข Instrumental variable estimand ๐ฝ๐ด =
๐ฝ๐
๐ผ๐
=
๐๐๐ข๐ ๐๐ ๐๐๐๐๐๐ก ๐๐ ๐ ๐๐ ๐
๐๐๐ข๐ ๐๐ ๐๐๐๐๐๐ก ๐๐ ๐ ๐๐ ๐ด
โข Wald estimand
โ
๐๐๐๐๐๐ก ๐๐ ๐๐๐๐๐๐๐๐ง๐๐ก๐๐๐ ๐๐ ๐=๐ผ๐๐ ๐๐๐๐๐๐ก
๐๐๐๐๐๐ก ๐๐ ๐๐๐๐๐๐๐๐ง๐๐ก๐๐๐ ๐๐ ๐๐๐๐๐๐๐๐๐๐
=
๐ธ ๐ ๐ = 1 โ๐ธ(๐|๐=0)
Pr ๐ด = 1 ๐ = 1 โPr(๐ด=1|๐=0)
โข CACE
โ ๐ถ๐ด๐ถ๐ธ = ๐ธ ๐1 โ ๐0 ๐ด1 > ๐ด0 =
๐ธ ๐ ๐ = 1 โ๐ธ(๐|๐=0)
Pr ๐ด = 1 ๐ = 1 โPr(๐ด=1|๐=0)
โ However, both counterfactuals are never observed for a person, thus
compliers are not identified
44. Effect of treatment on the
treated (ETT)
44
โข Alternative assumption 4) no current treatment value
interaction
โข The advantage is that it does not require the monotonicity
assumption, BUT requires ruling out the possibility of effect
heterogeneity of the effect of A in the treated
โข ๐ธ๐๐ =
๐ธ ๐ ๐ = 1 โ๐ธ(๐|๐=0)
Pr ๐ด = 1 ๐ = 1 โPr(๐ด=1|๐=0)
= ๐ธ ๐1 โ ๐0 ๐ด = 1
46. Covariates
46
โข In the IV approach, we must account for covariates to
โ Account for confounding of the effects of Z on Y, and preventing a
violation of the exclusion restriction by C
โ Partially account for confounding of the effects of A on Y
โ Explain variation in the outcome Y to improve efficiency
47. Two stage least square (2SLS)
47
โข Another methods to estimate the causal effect instead of IV
estimand or wald estimand
โข Stage 1: fit a linear regression of A on Z and C, and
compute the predicted value
แ
๐ด = เท
๐ธ ๐ด ๐, ๐ถ = เท
๐ผ0 + เท
๐ผ1๐ + เท
๐ผ2๐ถ
โข Stage 2:fit a linear regression of Y on predicted A and C
๐ธ ๐ แ
๐ด, ๐ถ = ๐0 + ๐1
แ
๐ด + ๐2๐ถ
48. Reference
1
2
Hernรกn MA, Robins JM (2019). Causal Inference. Boca Raton:
Chapman & Hall/CRC, forthcoming.
https://www.hsph.harvard.edu/miguel-hernan/causal-inference-
book/
VanderWeele, T. (2015). Explanation in causal inference: methods
for mediation and interaction. Oxford University Press.
https://www.amazon.com/Explanation-Causal-Inference-
Mediation-Interaction/dp/0199325871
48