1. Causal Inference in Epidemiology
Part 2
PH 250B Fall 2016
Jade Benjamin-Chung, PhD MPH
Colford-Hubbard Research Group
Adapted from Professor Jen Ahern's 250B slides
2. Outline
1. What does causal inference entail?
2. Using directed acyclic graphs
a. DAG basics
b. Identifying confounding
c. Understanding selection bias
3. Causal perspective on effect modification
a. Brief recap of effect modification (EM)
b. Linking EM in our studies to reality
c. Types of interaction
d. Causal interaction / EM
1. Sufficient cause model (“causal pies”)
2. Potential outcomes model (“causal types”)
e. Choosing which measure of interaction to estimate and report
4. Integrating causal concepts into your research
3. Reality of study design
• We often don’t have ideal data on our population
of interest
• The data we collect are incomplete
• Statistics can help us understand correlations or
associations between exposures and outcomes
• Typically, what we really want to know is whether the
exposure causes the outcome
4. What does causal inference entail?
• Careful definition of our estimation goals
• A set of assumptions that allow us to link our observed
data to ideal data that would be used to reach our
goals
• Causal inference techniques help us
– Express assumptions about our data in a transparent,
mathematical form
– Provide us with mathematical steps to translate
assumptions into quantities that can be estimated with
observed data
Pearl, Glymour & Jewell, 2016
5. Causal inference in your research
1. Define research hypothesis
– Your hypothesis can include possible effect modification
– Determine to what extent you aim to make causal inferences using your data
2. Determine study design (trial, cohort, etc.)
3. Draw a DAG
a. Identify potential confounders
b. Choose which variables to measure
4. Analyze your data
a. Control for confounders identified in step 3
b. Assess effect modification on the additive or multiplicative scale
c. Make statistical inferences
5. Make scientific inferences about your hypothesis
6. Outline
1. What does causal inference entail?
2. Using directed acyclic graphs
a. DAG basics
b. Identifying confounding
c. Understanding selection bias
3. Causal perspective on effect modification
a. Brief recap of effect modification (EM)
b. Linking EM in our studies to reality
c. Types of interaction
d. Causal interaction / EM
1. Sufficient cause model (“causal pies”)
2. Potential outcomes model (“causal types”)
e. Choosing which measure of interaction to estimate and report
4. Integrating causal concepts into your research
7. Causal diagrams as mathematical language
“Graphical methods now
provide a powerful
symbolic machinery for
deriving the consequences
of causal assumptions
when such assumptions
are combined with
statistical data.”
Pearl J, 2009, Causality
8. Directed Acyclic Graphs (DAGs)
• Visually depict assumptions about causal relationships between exposures, outcomes, and other variables
• DAGs depict our knowledge (or beliefs) about the “data generating process”
• DAGs are informed by subject matter knowledge, prior research, and a priori hypotheses
• Learning curve on terminology and approach – practice helps! Can be a very useful tool once you are comfortable with it
9. How can we use DAGs?
Generally
• Document assumptions about cause-effect
relationships
• Explore implications of those assumptions
• Assess how to make causal inferences from both
one’s data and one’s assumptions
Today
• To understand selection bias
• To identify confounding
Pearl J, 2009, Causality
10. DAG construction
• Direct causal relationships between variables are represented by arrows
– Directed
– All causal relationships have a direction
– For a given pair of variables, one cannot be simultaneously a cause and an effect of the other
SES → Prenatal Care
11. DAG construction
• There are no feedback loops
– Acyclic
– Causes always precede their effects
– To avoid feedback loops, extend the graph over time
[Diagram: Malnutrition (M) and Infection (I), each at t=0 and t=1]
12. DAG terminology
[Diagram: SES, Difficulty conceiving, Maternal genetics, Vitamins, Prenatal Care, Birth Defects]
• Parent & Child:
– Directly connected by an arrow
– Prenatal care is a “parent” of birth defects
– Birth defects is a “child” of prenatal care
13. DAG terminology
[Diagram: SES, Difficulty conceiving, Maternal genetics, Vitamins, Prenatal Care, Birth Defects]
• Ancestor & Descendant:
– Connected by a directed path (a series of arrows)
– SES is an “ancestor” of Birth Defects
– Birth Defects is a “descendant” of SES
14. DAG assumptions
[Diagrams: Smoking → Cancer; Smoking → Tar → Mutations → Cancer]
• Absence of a directed path from X to Y implies X has no effect on Y
– Directed paths not in the graph are as important as those in the graph
• Note: Not all intermediate steps between two variables need to be represented
– Depends on the level of detail of the model
15. DAG assumptions
• DAGs assume that all common causes of exposure and disease are included
– Common causes that are not observed should still be included
– These are often denoted with a “U” to indicate they are unmeasured
[Diagram: U (religious beliefs, culture, lifestyle, etc.), Alcohol Use, Smoking, Heart Disease]
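The idea that an unmeasured common cause U makes its two effects statistically associated can be checked with a small simulation. This is a minimal sketch: all the probabilities below are made up for illustration, not real data.

```python
# Sketch: an unmeasured common cause U (e.g., a lifestyle factor) induces
# an association between alcohol use and smoking even though neither
# causes the other. All probabilities here are invented for illustration.
import random

random.seed(1)

n = 100_000
n_alcohol = n_smoking = n_both = 0
for _ in range(n):
    u = random.random() < 0.5                        # unmeasured factor U
    alcohol = random.random() < (0.7 if u else 0.2)  # U -> Alcohol Use
    smoking = random.random() < (0.6 if u else 0.1)  # U -> Smoking
    n_alcohol += alcohol
    n_smoking += smoking
    n_both += alcohol and smoking

p_a, p_s, p_both = n_alcohol / n, n_smoking / n, n_both / n
# Under independence we would expect p_both ~= p_a * p_s (~0.16);
# the shared cause U pushes the joint probability up to ~0.22.
print(round(p_both, 3), round(p_a * p_s, 3))
```

If U were measured and conditioned on, the association within each stratum of U would disappear, which is exactly why common causes must appear in the DAG.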
20. Example: Bicycle Fall
[Diagram variables: Speed, Rider Skill/Experience, Bicycle Characteristics, Bicycle Traffic, Road/Lane/Path Surface, Road Grade, Car Traffic, Populace Bicycle Awareness, Bicycle Lane/Path, Bicycle Fall]
What assumptions are we making?
• Bicycle lane/path only has an effect on bicycle falls through its effect on bicycle traffic
• Road surface does not affect bicycle traffic
• All common causes of speed and bicycle fall are included (even those unmeasured)
21. Statistical underpinnings of DAGs
• Multiple possible causal models for this DAG:
[Diagram: X → Y → Z with errors UX, UY, UZ]
Model 1:
X = School funding
Y = SAT Scores
Z = College Acceptance
X = UX
Y = (x/3) + UY
Z = (y/16) + UZ
Model 2:
X = Number of hours worked per week
Y = Number of training hours per week
Z = Race completion time
X = UX
Y = 84 – x + UY
Z = (100/y) + UZ
Pearl, Glymour & Jewell, 2016
22. Statistical underpinnings of DAGs
• Both models share the same statistical relationships:
– Z and Y are dependent
– Y and X are dependent
– Z and X are likely dependent
– Z and X are independent, conditional on the value of Y
[Diagram: X → Y → Z with errors UX, UY, UZ]
Model 1:
X = School funding
Y = SAT Scores
Z = College Acceptance
X = UX
Y = (x/3) + UY
Z = (y/16) + UZ
Pearl, Glymour & Jewell, 2016
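These (in)dependencies can be verified by simulating Model 1 directly. This is a sketch under assumptions the slide does not specify: I take the errors to be Gaussian, and give UZ a small scale so the weak y/16 link is visible in a finite sample.

```python
# Sketch: simulate Model 1 (Y = x/3 + U_Y, Z = y/16 + U_Z) and check that
# X and Z are dependent marginally but (nearly) independent given Y.
# Error distributions are my assumption; the slide does not give them.
import random

random.seed(2)

n = 200_000
data = []
for _ in range(n):
    x = random.gauss(0, 1)             # X = U_X
    y = x / 3 + random.gauss(0, 1)     # Y = (x/3) + U_Y
    z = y / 16 + random.gauss(0, 0.1)  # Z = (y/16) + U_Z (small noise scale)
    data.append((x, y, z))

def corr(pairs):
    m = len(pairs)
    ma = sum(a for a, _ in pairs) / m
    mb = sum(b for _, b in pairs) / m
    cov = sum((a - ma) * (b - mb) for a, b in pairs) / m
    va = sum((a - ma) ** 2 for a, _ in pairs) / m
    vb = sum((b - mb) ** 2 for _, b in pairs) / m
    return cov / (va * vb) ** 0.5

r_xz = corr([(x, z) for x, y, z in data])
# "Condition on Y" by restricting to a narrow band of y values:
band = [(x, z) for x, y, z in data if abs(y) < 0.05]
r_xz_given_y = corr(band)
print(round(r_xz, 2), round(r_xz_given_y, 2))  # dependent vs. ~independent
```

The same check run on Model 2 would show the same dependence pattern, which is the slide's point: the DAG, not the specific equations, determines which variables are (conditionally) independent.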
23. Statistical underpinnings of DAGs
• Both models share the same statistical relationships. For specific values of these variables (lower case x, y, z):
– Z and Y are dependent
– Y and X are dependent
– Z and X are likely dependent
– Z and X are independent, conditional on the value y of Y
[Diagram: X → Y → Z with errors UX, UY, UZ]
Pearl, Glymour & Jewell, 2016
24. Conditioning on a variable in a DAG
• “Conditioning” on a variable means filtering the data into groups based on the value of that variable
• A box is often drawn around a variable to denote that it is being conditioned on (e.g., in this DAG we condition on Y)
• This is equivalent to stratifying the data or controlling for a variable in a statistical model
[Diagram: X → [Y] → Z with errors UX, UY, UZ; the box around Y denotes conditioning]
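Computationally, conditioning is nothing more than grouping records by the conditioning variable before examining the X–Z relationship. A minimal sketch, using hypothetical records:

```python
# Sketch: "conditioning on Y" = splitting the data into strata defined by
# Y's value and analyzing each stratum separately. Records are hypothetical.
records = [
    {"X": 0, "Y": "low",  "Z": 1},
    {"X": 1, "Y": "low",  "Z": 0},
    {"X": 0, "Y": "high", "Z": 1},
    {"X": 1, "Y": "high", "Z": 1},
]

strata = {}
for r in records:
    strata.setdefault(r["Y"], []).append(r)

for level, rows in strata.items():
    # The X-Z relationship is then examined within each stratum of Y
    print(level, "->", [(r["X"], r["Z"]) for r in rows])
```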
25. DAG configurations
• Chain: X → Y → Z
• Fork: X ← Y → Z
• Collider: X → Y ← Z * Has special considerations and challenges
Pearl, Glymour & Jewell, 2016
27. Colliders
[Diagram: Diet → BMI ← Cancer]
• Conditioning on BMI induces an association between diet and cancer
• Among those who have had a BMI decrease, there will be larger numbers of dieters and larger numbers of people with cancer than in the total population
28. Colliders
[Diagram: X → Z ← Y]
Why does conditioning on a collider induce an association between its parents?
Example: Z = X + Y
Do not condition on Z:
• X = 3 → we know nothing about Y
Pearl, Glymour & Jewell, 2016
29. Colliders
[Diagram: X → Z ← Y]
Why does conditioning on a collider induce an association between its parents?
Example: Z = X + Y
Do not condition on Z:
• X = 3 → we know nothing about Y
Condition on Z:
• Z = 10, X = 3 → we know Y = 7
Thus, X and Y are dependent given that (“conditional on”) Z = 10
Pearl, Glymour & Jewell, 2016
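The slide's Z = X + Y example can be checked numerically. A sketch, assuming X and Y are independent uniform draws on 0–10 (that distribution is my choice for illustration):

```python
# Sketch: X and Y are independent causes; Z = X + Y is their collider.
# Marginally, corr(X, Y) ~ 0; conditional on Z = 10, Y = 10 - X exactly,
# so within that stratum corr(X, Y) = -1.
import random

random.seed(3)

pairs = [(random.randint(0, 10), random.randint(0, 10)) for _ in range(20_000)]

def corr(ps):
    m = len(ps)
    ma = sum(a for a, _ in ps) / m
    mb = sum(b for _, b in ps) / m
    cov = sum((a - ma) * (b - mb) for a, b in ps) / m
    va = sum((a - ma) ** 2 for a, _ in ps) / m
    vb = sum((b - mb) ** 2 for _, b in ps) / m
    return cov / (va * vb) ** 0.5

r_marginal = corr(pairs)
conditioned = [(x, y) for x, y in pairs if x + y == 10]  # condition on Z = 10
r_conditional = corr(conditioned)
print(round(r_marginal, 2), round(r_conditional, 2))  # ~0.0 and -1.0
```

Within the Z = 10 stratum the association is perfectly negative even though X and Y share no causal connection, which is exactly the collider-conditioning distortion the slides describe.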
30. Strengths of DAGs
• Can determine which variables depend on each other in our causal model without knowing the specific functions (e.g., Z = X + Y in the previous slide) connecting them (Pearl, Glymour & Jewell, 2016)
• Allow us to link our causal model to the statistical relationships in our data
• DAGs can incorporate measurement error as well (Hernan & Cole, 2009)
31. Limitations of DAGs
• Cannot display effect modification easily (example of road surface)
• Arrows in graphs do not provide specific definitions of effects (contrast with counterfactuals)
• Can become extremely complicated when representing real data structures
• Are not designed to capture effects of infectious disease interventions that may impact not only intervention recipients but also non-recipients (e.g., herd effects of vaccines)
33. DAGs
• The DAG itself is not used to analyze data from the study you've conducted
– Informs how the study is designed/data are collected
– Informs how data are analyzed
– Helps identify which research questions are answerable in a given data set
• Utility of DAGs depends on the accuracy/correctness of the associations we represent in the diagram
34. Non-parametric structural equation models
• Non-parametric structural equation models (NPSEMs) provide a link between DAGs and counterfactuals and are a way to analyze data
• They encode relationships between variables that can include many possible equations and functional forms
• Non-parametric estimation is used to avoid assumptions of typical SEMs (e.g., linearity)
• Learn more about this topic in PH252D
35. Example of NPSEM
[Diagram: X → Y → Z with errors UX, UY, UZ]
Previous example:
X = School funding
Y = SAT Scores
Z = College Acceptance
X = UX
Y = (x/3) + UY
Z = (y/16) + UZ
NPSEM:
X = School funding
Y = SAT Scores
Z = College Acceptance
X = fX(UX)
Y = fY(X, UY)
Z = fZ(Y, UZ)
Pearl, Glymour & Jewell, 2016
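The distinction can be sketched in code: the parametric model commits to specific formulas, while the NPSEM commits only to which variables enter each structural function. The particular functions below are arbitrary stand-ins of my own choosing, not part of the slides.

```python
# Sketch: an NPSEM fixes only the inputs of each structural function
# (X = fX(UX), Y = fY(X, UY), Z = fZ(Y, UZ)); the functional forms are
# left unspecified. Both choices of fY below respect the same DAG.
import random

random.seed(4)

def fY_linear(x, uy):
    return x / 3 + uy  # the slide's parametric choice for Y

def fY_threshold(x, uy):
    return (1.0 if x > 0 else 0.0) + uy  # an arbitrary nonlinear alternative

def draw(fY):
    ux, uy, uz = (random.gauss(0, 1) for _ in range(3))
    x = ux           # X = fX(UX)
    y = fY(x, uy)    # Y = fY(X, UY)
    z = y / 16 + uz  # Z = fZ(Y, UZ)
    return x, y, z

# Swapping fY changes the data distribution but not the DAG structure:
sample = [draw(fY_threshold) for _ in range(3)]
print(sample)
```

Because the DAG's independence implications hold for any such functions, conclusions drawn from the graph (e.g., which adjustment sets block confounding) do not depend on guessing the functional forms correctly.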