2. New(‐ish) tools to aid causal inference
• To aid identification of bias sources and sets
of adjustment covariates: DAGs.
• For adjustment of measured confounders:
algorithmic treatment modeling (PS, IPTW,
or OTW) combined with outcome modeling
to achieve “double robustness”.
• To account for uncertainty about
unmeasured confounders and other
uncontrolled bias sources: bias analysis.
5 June 2013 Greenland, Modern methods 2
4. Background readings:
• Greenland S (2010). Overthrowing the tyranny
of null hypotheses hidden in causal diagrams.
Ch. 22 in: Dechter, R., Geffner, H., and Halpern,
J.Y. (eds.). Heuristics, Probabilities, and
Causality: A Tribute to Judea Pearl. London:
College Publications, 365‐382.
• Greenland S (2012). Causal inference as a
prediction problem: Assumptions, identification,
and evidence synthesis. Ch. 5 in Berzuini C,
Dawid AP, Bernardinelli L, eds. Causal Inference:
Statistical Perspectives and Applications. Wiley,
Chichester, 43‐58.
5. “Mathematics is one necessary tool [but] any
statistician who actually practices his art
must possess many additional resources…the
mathematical tail has been allowed to wag
the statistical dog for far too long… I think
that the built‐in mathematical bias of many
statistics departments and of much that we
are presently teaching is not innocuous; it is
in fact antiscientific.”
– George Box, Statistical Science 1990
5 June 2013 Greenland: Is causal inference more 5
6. Cautions (conclusions, 2011)
• Current formal “causal inference”
approaches are mostly about modeling
effects in single studies, and projection to
conditionally exchangeable populations.
• As technically sophisticated as current
causal‐inference methods may seem, they
are far too simple to encompass the
diversity of evidence that has to be
synthesized in most real health and medical
decision problems.
9. • Intuition is notoriously faulty, full of biases,
some innate, some value driven, and is
horrific at probability logic.
• Cognitive psychology and behavioral
economics provide books full of dramatic
examples – which can be used to recognize
biases (e.g., double counting, confirmation
bias, overconfidence, and wish bias):
“My colleagues, they study artificial
intelligence; me, I study natural stupidity.”
‐Amos Tversky
11. • Even in situations with clear risks, those
with statistical sophistication have not
outperformed those without (Susser, AJE
1977 gives classic health‐science examples).
• Similar lessons are seen in econometrics,
where pseudo‐Nobel laureates with
impressive mathematical skills have lost
fortunes for investors (e.g., the 1998 LTCM
fund disaster in which Merton and Scholes
lost nearly $5 billion; the 2009 Trinsum
bankruptcy; etc.): do(X=x), X= buy, sell
12. Subjective elements and values play a
decisive role in all statistical analyses
• There is an illusory sense of objectivity
induced when there is great overconfidence,
as when individuals feel infallible or there is
strong social agreement.
• Feelings of objectivity in turn feed back to
create overconfidence. This is well illustrated
historically by scientists, statisticians, and
often entire fields being certain of
hypotheses later refuted.
5 June 2013 Greenland ‐ Bayes Workshop 12
13. Classic statistician examples:
• Fisher against smoking causing lung cancer
• Jeffreys against continental drift
Classic clinician‐researcher examples:
• Feinstein & Horwitz against estrogen
therapy causing much endometrial cancer
• Indiscriminate promotion of trans‐fat
margarine and low‐fat diets in the 1970s and
1980s for weight loss and CHD prevention,
along with dismissal of the sugar relation.
14. Some facts of statistical life:
• Data alone do not convey information; they
are interpreted via models for their
generation.
• Models are sets of assumptions about the
data‐generation process (DGP).
• Models are analogous to language
grammars: No model, no meaning.
• Unfortunately, unlike bad grammar, bad stat
modeling may not produce gibberish, even
if the models and outputs are very wrong.
15. The classical tensions
• Bias vs. precision: Assumptions introduce
bias to the extent they are incorrect, but
increase precision to the extent that they
exclude or down‐weight most alternatives.
• Procedures are “optimal” only under
meta‐assumptions, some untestable.
• Models for causal inference always include
untestable terminal randomization (no
residual confounding, “ignorability”).
17. Neyman’s (1923) potential‐outcome
(“counterfactual”) causal meta‐model:
• Say X and Y are the treatment and outcome
variables of interest. Then Y is replaced by
a list (vector) of the outcomes that would
follow under different treatments. So if X =
1 or 0, Y is replaced by the potential‐outcome
vector (Y_1, Y_0), where
Y_1 = outcome if X is 1, Y_0 = outcome if X is 0.
• Y_x can be replaced by a parameter θ_x, e.g.,
the outcome probability (risk).
18. Causal inference under the potential‐outcome
model becomes a prediction problem:
• Causal‐inference (CI) problems are
isomorphic to missing‐data problems:
At most only one potential outcome is
observed; the rest are missing (Rubin, Ann
Stat 1978).
• Thus the vast predictive (imputational)
machinery of statistics can be used for
inference about causal parameters.
20. From consistency, we get a precise definition
of sufficiency for confounding control:
A set of covariates Z is sufficient for control
of confounding if the outcomes we observe
when X=x follow the distribution of Y_x given Z:
p(y_obs | x, z) ≡ p(y_x | x, z) = p(y_x | z)
which is independence of X and Y_x given Z:
For all x and z, X ╨ Y_x | z
(“no residual confounding”, “no unmeasured
confounding”, “weak ignorability”); Z is also
minimal sufficient if no subset of Z is sufficient.
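
A small simulation can illustrate the missing-data view and the sufficiency condition together. The sketch below is only an illustration under assumed inputs: the data-generating process, the probabilities, and all variable names are hypothetical and not taken from the slides. Both potential outcomes are generated, only one is kept as observed, and the crude contrast is compared with a contrast standardized over a covariate Z that is sufficient by construction.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

# Hypothetical data-generating process: Z affects both treatment X and outcome.
z = rng.binomial(1, 0.5, size=n)
x = rng.binomial(1, np.where(z == 1, 0.8, 0.2))
y0 = rng.binomial(1, 0.1 + 0.3 * z)   # potential outcome under X=0
y1 = rng.binomial(1, 0.2 + 0.3 * z)   # potential outcome under X=1

# Only the potential outcome for the treatment actually received is observed;
# the other one is missing data.
y_obs = np.where(x == 1, y1, y0)
y1_seen = np.where(x == 1, y1, np.nan)
y0_seen = np.where(x == 0, y0, np.nan)

# True marginal risk difference (knowable only because this is a simulation).
true_rd = y1.mean() - y0.mean()

# Crude contrast ignores Z and is confounded.
crude_rd = y_obs[x == 1].mean() - y_obs[x == 0].mean()

# Standardizing over Z, which is sufficient here by construction.
std_rd = sum(
    (z == zv).mean()
    * (y_obs[(x == 1) & (z == zv)].mean() - y_obs[(x == 0) & (z == zv)].mean())
    for zv in (0, 1)
)

print(f"true {true_rd:.3f}  crude {crude_rd:.3f}  standardized {std_rd:.3f}")
```

In this setup the crude contrast is far from the true risk difference, while the Z-standardized contrast is close to it; with an insufficient Z the standardized contrast would carry no such guarantee.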
21. Further insights from potential outcomes:
• By the 1960s, methodologists were
developing methods for summarizing
confounder sets using discriminant or
regression scores. The performance of the
various proposals was not clear, however.
• Rosenbaum & Rubin (1983) showed that,
given a sufficient set Z, the conditional
treatment distribution p(x|z) is itself
sufficient to control confounding of marginal
(total‐population) X effects by covariates in Z.
22. • For binary X, p(1|z) is usually called the
“propensity score” (PS); control of this score
will remove confounding when Z is sufficient.
• For other X, Robins, Mark & Newey (1992)
showed that, when Z is sufficient, control of
the regression score E(X|z) is sufficient for
control of confounding of additive effects of
X on Y. (note: PS = E(X|z) when X is binary)
Nonetheless, the missing‐data viewpoint leads
to other, more general ways to adjust for
confounding using treatment probabilities.
23. • Inverse probability of treatment weighting
(IPTW) was adapted from survey weighting
ideas (Robins, Hernán, Brumback 2000).
• It can also be derived from classical direct
standardization (Sato and Matsuyama 2003):
p(y_x) = ∑_z p(y|x,z) p(z) = ∑_z p(y,x,z) p(z)/p(x,z)
= ∑_z p(y,x,z)/p(x|z) = ∑_z w_z p(y,x,z),
where w_z = 1/p(x|z) and the first equality requires Z to be sufficient.
• Thus, if Z is sufficient, then IPTW removes
marginal confounding by averaging using the
same weights for all x (standardization).
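
The identity between direct standardization and inverse-probability weighting can be checked numerically on a small discrete distribution. The sketch below assumes binary Y, X, and Z with made-up probabilities; the array names and the index order [y, x, z] are illustrative choices, not part of the slides.

```python
import numpy as np

# Hypothetical probabilities for binary Y, X, Z.
p_z = np.array([0.6, 0.4])                   # p(z)
p_x1_given_z = np.array([0.3, 0.7])          # p(X=1 | z)
p_y1_given_xz = np.array([[0.10, 0.30],      # p(Y=1 | x=0, z)
                          [0.20, 0.50]])     # p(Y=1 | x=1, z)

# Build full conditional tables and the joint p(y, x, z), indexed [y, x, z].
p_x_given_z = np.stack([1 - p_x1_given_z, p_x1_given_z])      # [x, z]
p_y_given_xz = np.stack([1 - p_y1_given_xz, p_y1_given_xz])   # [y, x, z]
p_yxz = p_y_given_xz * p_x_given_z[None, :, :] * p_z[None, None, :]

# Direct standardization: sum over z of p(y | x, z) p(z).
standardized = (p_y_given_xz * p_z).sum(axis=2)               # [y, x]

# IPTW: sum over z of w * p(y, x, z), with w = 1 / p(x | z).
weights = 1.0 / p_x_given_z
iptw = (p_yxz * weights[None, :, :]).sum(axis=2)              # [y, x]

print(standardized)
print(iptw)
```

The two tables agree entry by entry, which is the algebraic identity on the slide; whether either equals the distribution of the potential outcomes still depends on Z being sufficient.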
24. Despite PO/PS/IPTW theory providing
landmark insights, it is far from complete
for most health/med analyses:
• It does not say how to model treatment,
but mismodeling can render the estimated
PS insufficient and bias the effect estimate;
• It does not address sampling variation or
how to balance bias vs. variance, e.g., in an
RCT, the randomization indicator predicts
treatment perfectly so controlling it yields
infinite variance yet adjusts for no bias;
25. • It focuses on marginal (population‐averaged)
effects (ACE, LATE, CACE). It does
not guide accurate estimation of effect
heterogeneity (modification) or conditional
effects (e.g., effects in men vs. women),
which are essential for clinical practice;
• It defines but does not operationalize how
to find a sufficient or minimal sufficient Z.
These deficiencies are largely traceable to
omitting the outcome from modeling
(which Rubin AAS 2008 strongly advises).
26. A simple solution: Treatment modeling
followed by outcome modeling
Classical modeling for causal inference
regresses the outcome Y on X and Z, for if Z
is sufficient, E(Y_obs | x, z) ≡ E(Y_x | x, z) = E(Y_x | z).
• The model for potential means E(Y_x | z) is
called a structural model or structural
equation.
• This approach estimates conditional effects
as well as marginal effects (by averaging
over Z). As with PS, however, it will be
biased by mismodeling.
27. By combining treatment modeling with
outcome modeling, we can create estimates
that are at least approximately doubly
robust (DR): If Z is sufficient, the estimated
effect of X on Y will be unconfounded if
either of the models is correct.
The simplest DR approaches either
• regress Y on X, Z, and PS as a covariate,
• regress Y on X, Z in a PS‐matched sample, or
• regress Y on X, Z using IPT or OT weights.
Each of these approaches has pros and cons.
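
One side of double robustness can be illustrated with the third approach in the list above, an IPT-weighted outcome regression. The sketch below is a simulation under assumed inputs: the data-generating process, the helper `weighted_ls`, and all variable names are hypothetical. The treatment model is correct (saturated in a binary Z), while the outcome model is deliberately wrong because it omits the X-by-Z product term.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200_000

# Hypothetical data: binary confounder Z, binary treatment X, continuous Y.
z = rng.binomial(1, 0.5, size=n)
x = rng.binomial(1, np.where(z == 1, 0.9, 0.5))
# Effect of X is 1 when Z=0 and 3 when Z=1, so the marginal effect is 2.
y = 1.0 + x * (1.0 + 2.0 * z) + z + rng.normal(size=n)

def weighted_ls(design, outcome, weights):
    """Weighted least squares by scaling rows with sqrt(weights)."""
    root_w = np.sqrt(weights)
    coef, *_ = np.linalg.lstsq(
        design * root_w[:, None], outcome * root_w, rcond=None
    )
    return coef

# Treatment model: stratum proportions estimate p(X=1 | z) without mismodeling.
ps = np.where(z == 1, x[z == 1].mean(), x[z == 0].mean())
iptw = np.where(x == 1, 1.0 / ps, 1.0 / (1.0 - ps))

# Outcome model: misspecified on purpose (no X*Z product term).
design = np.column_stack([np.ones(n), x, z])
effect_unweighted = weighted_ls(design, y, np.ones(n))[1]
effect_iptw = weighted_ls(design, y, iptw)[1]

print(f"unweighted {effect_unweighted:.3f}  IPT-weighted {effect_iptw:.3f}")
```

With these settings the unweighted coefficient lands near 1.53, a precision-weighted average of the stratum effects, while the IPT-weighted coefficient is close to the marginal effect of 2. The reverse case (correct outcome model, wrong weights) is not shown here.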
28. Treating PS as a covariate:
• The relation of the PS to risk can be highly
nonlinear and can be discontinuous when
covariates are discrete. Thus entering the PS
as a few terms may not retain sufficiency.
Highly flexible formulations may be needed
(e.g., many category indicators for the PS, or
machine‐learning procedures).
• The PS is a composite of Z; it thus can be
highly collinear with Z terms in the outcome
model, leading to imprecision.
30. Weighted outcome regression:
• Ordinary fitting methods for estimating
treatment probabilities tend to produce very
small values for some subjects, resulting in
huge, highly unstable weights. There are
several approaches to weight stabilization:
1. Restore the X margin: Robins and crew
use w_z = p(x)/p(x|z), but this weight may still
be too unstable, leading to crude fixes like
weight trimming to obtain sensible results.
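
The difference between plain, stabilized, and trimmed weights can be seen in a short simulation. The sketch below assumes a single continuous covariate and uses the true treatment probabilities in place of fitted ones; the logistic form, the 99th-percentile trimming cap, and all names are hypothetical choices for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 50_000

# Hypothetical covariate and true p(X=1 | z); a real analysis would estimate this.
z = rng.normal(size=n)
ps = 1.0 / (1.0 + np.exp(-2.0 * z))
x = rng.binomial(1, ps)

# Probability of the treatment actually received, conditional and marginal.
p_x_given_z = np.where(x == 1, ps, 1.0 - ps)
p_x = np.where(x == 1, x.mean(), 1.0 - x.mean())

w_plain = 1.0 / p_x_given_z      # w = 1 / p(x|z)
w_stab = p_x / p_x_given_z       # w = p(x) / p(x|z), restores the X margin

# Crude fix: cap stabilized weights at an arbitrary percentile.
cap = np.quantile(w_stab, 0.99)
w_trim = np.minimum(w_stab, cap)

for name, w in [("plain", w_plain), ("stabilized", w_stab), ("trimmed", w_trim)]:
    print(f"{name:10s} mean {w.mean():7.3f}  max {w.max():10.2f}")
```

Stabilized weights average about 1 and are always smaller than the plain weights, but their maximum can still be large; trimming bounds them at the cost of some bias.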
31. 2. Ridgeway & McCaffrey (2004, 2007)
weight by the odds of X=1 vs. X=X_obs:
w_z = 1 if X=1, w_z = p(1|z)/p(0|z) if X=0.
• This odds‐of‐treatment weighting (OTW)
standardizes to the treated (X=1), as in PS
matching to the exposed.
• They fit these odds with a machine‐learning
algorithm (boosted lasso).
Their approach eliminates stability problems.
Similar results have been reported using
related algorithms to fit probabilities for IPTW.
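
That OTW standardizes to the treated can be checked by comparing covariate means. The sketch below assumes a single continuous covariate and plugs in the true treatment probabilities rather than fitting them with a boosting algorithm, so it illustrates only the weighting step; the data-generating process and all names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 100_000

# Hypothetical covariate and true p(X=1 | z), assumed known for this sketch.
z = rng.normal(size=n)
ps = 1.0 / (1.0 + np.exp(-z))
x = rng.binomial(1, ps)

# Odds-of-treatment weights: 1 for the treated, p(1|z)/p(0|z) for the untreated.
w = np.where(x == 1, 1.0, ps / (1.0 - ps))

mean_z_treated = z[x == 1].mean()
mean_z_untreated_raw = z[x == 0].mean()
mean_z_untreated_otw = np.average(z[x == 0], weights=w[x == 0])

print(f"treated {mean_z_treated:.3f}")
print(f"untreated, unweighted {mean_z_untreated_raw:.3f}")
print(f"untreated, OT-weighted {mean_z_untreated_otw:.3f}")
```

After weighting, the untreated group's covariate mean approximately matches that of the treated group, which is the sense in which the weighted comparison targets the effect in the treated.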
35. Directed acyclic graphs and causal diagrams
• A DAG shows the factors in the problem as
nodes linked by arrows only, with no
feedback loops.
• A graph is a causal diagram if the arrows
are interpreted as links in causal chains
(formalization is a bit controversial; R&R).
• Causal effects of one variable on another
are transmitted by causal sequences, which
are directed (head‐tail) paths: X→Y→Z
means X can affect Z.
2 Feb 2012 Greenland 35
37. Colliders vs. noncolliders on a path
• Paths are closed (blocked) at colliders:
Associations cannot be transmitted across
a collider (→C←) on a path unless we
stratify (condition) on it or something it
affects (such as F in C→F).
• Paths are open (unblocked) at noncolliders:
Associations can be transmitted across a
noncollider (a mediator →C→ or a fork
←C→) on a path unless we stratify on it
completely.
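
The collider and noncollider rules can be illustrated in a simulated linear system. In the sketch below, the variables, coefficients, and the helper `partial_corr` are hypothetical, and linear adjustment (partial correlation) stands in for stratification, which is reasonable only because everything here is linear and Gaussian.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 200_000

def partial_corr(u, v, given):
    """Correlation of u and v after linear adjustment for `given`."""
    design = np.column_stack([np.ones_like(given), given])
    res_u = u - design @ np.linalg.lstsq(design, u, rcond=None)[0]
    res_v = v - design @ np.linalg.lstsq(design, v, rcond=None)[0]
    return np.corrcoef(res_u, res_v)[0, 1]

# Collider: A -> C <- B, with F a descendant of C (C -> F).
a = rng.normal(size=n)
b = rng.normal(size=n)
c = a + b + rng.normal(size=n)
f = c + rng.normal(size=n)

r_ab = np.corrcoef(a, b)[0, 1]          # path closed at the collider
r_ab_given_c = partial_corr(a, b, c)    # opened by conditioning on C
r_ab_given_f = partial_corr(a, b, f)    # partly opened via the descendant F

# Fork: G <- K -> H, a noncollider.
k = rng.normal(size=n)
g = k + rng.normal(size=n)
h = k + rng.normal(size=n)

r_gh = np.corrcoef(g, h)[0, 1]          # path open at the fork
r_gh_given_k = partial_corr(g, h, k)    # closed by conditioning on K

print(f"collider: {r_ab:.3f}, given C {r_ab_given_c:.3f}, given F {r_ab_given_f:.3f}")
print(f"fork:     {r_gh:.3f}, given K {r_gh_given_k:.3f}")
```

In this setup A and B are uncorrelated until C or its descendant F is adjusted for, and G and H are correlated until their common cause K is adjusted for.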
38. Think of associations as signals flowing
through the graph
• A variable can transmit associations along
some open (unblocked) directions but not
along closed (blocked) directions.
• The open and closed directions are
switched around by conditioning
(stratifying) on the variable, and are
partially switched by partially or indirectly
conditioning.
40. “Control” of bias in causal modeling
• Target path: A path that transmits some of
the effect we want to estimate; it is a
directed path from cause to effect.
• Biasing path: Any other open path between
the cause and effect variables.
• By judicious conditioning, we must close all
biasing paths without closing target paths
or opening new biasing paths. (This isn’t
always possible with available data.)
41. Graphical sufficiency
• If conditioning on Z closes all biasing paths
while leaving all target paths open, Z is
sufficient for control of bias.
• If Z is sufficient (for control of bias) but no
subset is sufficient, Z is minimal sufficient.
Like almost all graphical concepts and results,
these are qualitative (topological); they do
not address extent of bias. But they can aid
initial covariate screening and more.
42. Example: inadequacy of statistical criteria
Among traditional statistical criteria for
defining or detecting confounders are:
• C is associated with E and with D given E
• Adjustment for C changes the E‐D
association (noncollapsibility).
These are equivalent in linear systems.
(Often added: C must precede E and D.)
Graphs illustrate how both criteria can fail,
leading to adjustment that increases bias.
44. Instrumental variables in a linear system:
A and F assoc. with E and D|E, yet worse bias if you
adjust conventionally for A or F
[Diagram with nodes A, (B), F, E, D; arrows not recoverable from the extracted text.]
A may be intent‐to‐treat
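
The claim that conventional adjustment for an instrument can worsen bias can be reproduced in a simulated linear system. Because the arrows of the slide's diagram are not available, the structure below is an assumption: A affects only E, an unmeasured U affects both E and D, and all coefficients are set to 1. The helper `ols_coef` and all variable names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(4)
n = 200_000

# Assumed structure: A -> E -> D, with unmeasured U -> E and U -> D.
a = rng.normal(size=n)            # instrument (e.g., intent-to-treat)
u = rng.normal(size=n)            # unmeasured confounder
e = a + u + rng.normal(size=n)    # exposure
true_effect = 1.0
d = true_effect * e + u + rng.normal(size=n)

def ols_coef(outcome, *covariates):
    """Ordinary least squares coefficients, intercept first."""
    design = np.column_stack([np.ones(len(outcome)), *covariates])
    return np.linalg.lstsq(design, outcome, rcond=None)[0]

coef_unadjusted = ols_coef(d, e)
coef_adjusted = ols_coef(d, e, a)

b_unadj = coef_unadjusted[1]      # E coefficient, no adjustment
b_adj = coef_adjusted[1]          # E coefficient, adjusting for A
a_given_e = coef_adjusted[2]      # A coefficient given E (nonzero)

print(f"true {true_effect:.2f}  unadjusted {b_unadj:.3f}  adjusted for A {b_adj:.3f}")
print(f"A coefficient given E: {a_given_e:.3f}")
```

Here A is associated with E and with D given E, so it passes the traditional statistical criteria, yet adjusting for it moves the E coefficient from about 1.33 to about 1.5, farther from the true value of 1.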
49. Confounding paths from E to D: None!
[Diagram with nodes A, [B], [C], F, E, D; arrows not recoverable from the extracted text.]
50. What if essential variables are not
measured? (no sufficient Z available)
We then have to turn to sensitivity analysis of
bias (bias analysis; see Ch. 19 of ME3) to
get an idea of how much bias is left after
adjustment for measured covariates, and
how much uncertainty is appropriate.
• Ordinary statistics ignore uncertainty about
unmeasured or mismeasured variables, and
so are grossly overconfident (intervals much
too narrow, P‐values much too small).
51. All the usual validity problems
can be viewed as bias due to missing data
• Confounding: nonrandomly missing
potential outcomes
• Selection bias: nonrandomly missing
subjects
• Measurement error: missing actual
variables of interest, so we use proxies in
their place (which may produce bias even if
the error is random)
52. This view enables use of imputation methods
for bias analysis (Greenland, 2009):
Completed data = observed + imputed data
• To make any inference beyond what we see
(the observed), we must have a model that
projects from the observed data to the
missing data (or to aspects of the data, like
means) to get the completed data.) g p
• In bias analysis, however, key parameters
are not identified by the observations.
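
A generic Monte Carlo sketch can show how an unidentified bias parameter widens the final interval. This is not the specific method of Greenland (2009); the conventional estimate, its standard error, and the prior for the bias factor are all made-up inputs, and the multiplicative bias model is an assumption chosen for simplicity.

```python
import numpy as np

rng = np.random.default_rng(5)
draws = 100_000

# Hypothetical conventional result: rate ratio 1.5 with SE 0.15 on the log scale.
log_rr_hat = np.log(1.5)
se = 0.15
conventional = np.exp(log_rr_hat + np.array([-1.96, 1.96]) * se)

# Hypothetical prior for the log bias factor from uncontrolled confounding.
# The data carry no information about it, so it must come from outside.
log_bias = rng.normal(loc=np.log(1.2), scale=0.2, size=draws)

# Random error is added back so the interval reflects both sources.
random_error = rng.normal(loc=0.0, scale=se, size=draws)

adjusted_log_rr = log_rr_hat - log_bias + random_error
bias_adjusted = np.exp(np.quantile(adjusted_log_rr, [0.025, 0.5, 0.975]))

print("conventional 95% limits:", np.round(conventional, 2))
print("bias-adjusted 2.5%, 50%, 97.5%:", np.round(bias_adjusted, 2))
```

Under these inputs the bias-adjusted median shifts toward the null and the interval is noticeably wider than the conventional one; different priors for the bias factor would give different answers, which is the point of the exercise.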
53. As a result, bias analysis can have far more
impact on results than other methods. Yet it
has seen the least adoption. Possible reasons:
• It requires far more investigator effort to
specify the model and inputs (one group is
trying to formulate guidelines to ease this),
• Once specified, it is nowhere near as easy to
run with commercial software as other
methods,
• It can completely ruin any hint of
decisiveness or “significance” of results.
54. III. Conclusion:
Some modern tools you should know
• For identification of bias sources and
sufficient adjustment sets: DAGs.
• For adjustment of measured confounders:
algorithmic treatment modeling (PS, IPTW,
or OTW) combined with outcome modeling
to achieve double robustness.
• To account for uncertainty about
uncontrolled bias: bias analysis.