- Bayesian adjustment for confounding (BAC) in Bayesian propensity score estimation accounts for uncertainty in propensity score modeling and model selection.
- A prognostic score model is used to inform a prior on propensity score model selection, favoring inclusion of true confounders and exclusion of instruments.
- Simulation results found the informative prior was not able to adequately shape model selection; a penalty term was proposed to make the prior more influential.
- With the penalty term, the informative prior influenced inclusion of instruments in propensity score models without distorting inclusion of other variables.
Linear Discriminant Analysis (LDA) Under f-Divergence Measures - Anmol Dwivedi
For more details, please have a look at:
1. https://www.mdpi.com/1099-4300/24/2/188
2. https://ieeexplore.ieee.org/document/9518004
Abstract:
In statistical inference, the information-theoretic performance limits can often be expressed in terms of a notion of divergence between the underlying statistical models (e.g., in binary hypothesis testing, the total error probability is equal to the total variation between the models). As the data dimension grows, computing the statistics involved in decision-making and the attendant performance limits (divergence measures) face complexity and stability challenges. Dimensionality reduction addresses these challenges at the expense of compromising the performance (divergence reduces due to the data processing inequality for divergence). This paper considers linear dimensionality reduction such that the divergence between the models is \emph{maximally} preserved. Specifically, the paper focuses on the Gaussian models and characterizes an optimal projection of the data onto a lower-dimensional subspace with respect to four $f$-divergence measures (Kullback-Leibler, $\chi^2$, Hellinger, and total variation). There are two key observations. First, projections are not necessarily along the dominant modes of the covariance matrix of the data, and even in some situations, they can be along the least dominant modes. Secondly, under specific regimes, the optimal design of subspace projection is identical under all the $f$-divergence measures considered, rendering a degree of universality to the design independent of the inference problem of interest.
Non-sampling functional approximation of linear and non-linear Bayesian Update - Alexander Litvinenko
We offer a non-sampling functional approximation of a non-linear surrogate to the classical Bayesian update formula. We start with a prior Polynomial Chaos Expansion (PCE), express the log-likelihood in a PCE basis, and obtain a new posterior PCE.
The main idea is to update not the probability density but the basis coefficients.
Subgradient Methods for Huge-Scale Optimization Problems - Юрий Нестеров, Cat...Yandex
We consider a new class of huge-scale problems, the problems with sparse subgradients. The most important functions of this type are piecewise linear. For optimization problems with uniform sparsity of the corresponding linear operators, we suggest a very efficient implementation of subgradient iterations, whose total cost depends logarithmically on the dimension. This technique is based on a recursive update of the results of matrix/vector products and the values of symmetric functions. It works well, for example, for matrices with few nonzero diagonals and for max-type functions.
We show that the updating technique can be efficiently coupled with the simplest subgradient methods. Similar results can be obtained for a new non-smooth random variant of a coordinate descent scheme. We also present promising results of preliminary computational experiments.
How to find a cheap surrogate to approximate Bayesian Update Formula and to a... - Alexander Litvinenko
We suggest a new vision of the classical Bayesian update formula. We expand all ingredients in a Polynomial Chaos Expansion (PCE) and write out a new formula for the Bayesian update of the PCE coefficients. The formula is derived from minimum mean square estimation: one starts with a prior PCE, takes measurements into account, and obtains the posterior PCE coefficients, without any MCMC sampling.
Probabilistic Control of Switched Linear Systems with Chance Constraints - Leo Asselborn
This presentation proposes an approach to algorithmically synthesize control strategies for set-to-set transitions of uncertain discrete-time switched linear systems, based on a combination of tree search and reachable-set computations in a stochastic setting. The initial state and disturbances are assumed to be Gaussian distributed, and a time-variant hybrid control law stabilizes the system towards a goal set. The algorithmic solution computes sequences of discrete states via tree search, and the continuous controls are obtained by solving embedded semi-definite programs (SDPs). These programs take polytopic input constraints as well as time-varying probabilistic state constraints into account. An example demonstrating the principles of the solution procedure, with focus on handling the chance constraints, is included.
• Treatment regimes for a single decision point (potential outcomes, value)
• Estimation of the value of a fixed regime (identifiability assumptions, outcome regression estimator, IPW/AIPW estimators)
• Characterization of an optimal regime (in terms of potential outcomes, observed data)
• Estimation of an optimal regime (regression, A-learning, direct search IPW/AIPW, nonregularity)
Solving inverse problems via non-linear Bayesian update of PCE coefficients - Alexander Litvinenko
We derive a non-linear approximation of the Bayesian update for PCE coefficients, avoiding Markov chain Monte Carlo sampling to compute the posterior.
Minimum mean square error estimation and approximation of the Bayesian update - Alexander Litvinenko
We develop a Bayesian update surrogate. In contrast to the classical Bayesian approach, our formula updates the polynomial chaos (PCE) coefficients directly. We show that the classical Kalman filter is a particular case of our update.
Tutorial on Belief Propagation in Bayesian Networks - Anmol Dwivedi
The goal of this mini-project is to implement belief propagation algorithms for posterior probability inference and most probable explanation (MPE) inference in a Bayesian network with binary variables, in which the conditional probability table for each random variable/node is given.
Connection between inverse problems and uncertainty quantification problems
GonzalezGinestetResearchDay2016
1. Bayesian Adjustment for Confounding (BAC) in Bayesian Propensity Score Estimation
Pablo Gonzalez Ginestet
McGill University, CNODES, Lady Davis Institute
pablo.gonzalezginestet@mail.mcgill.ca
12th Annual EBOH Research Day, Montreal, April 2016
2. Outline
1 Traditional PS Estimation
2 Bayesian PS Estimation
Uncertainty regarding the PS
Uncertainty regarding the PS + model uncertainty
3 BAC in Bayesian PS & Results
First Stage
Second Stage
4 Conclusion
3-5. Traditional PS Estimation
It is a sequential process:
1 "PS stage": the propensity score $PS = P(X_i = 1 \mid C_i)$ is estimated:
$$\mathrm{logit}(PS) = \sum_{k=1}^{p} \gamma_k C_{k,i}$$
2 "Outcome stage": the causal effect is estimated adjusting for $\widehat{PS}(\hat{\gamma})$:
$$\mathrm{logit}(E[Y_i \mid X_i, C_i]) = \beta_0 + \beta_X X_i + \xi h(\widehat{PS})$$
Remark 1. The outcome stage treats $\widehat{PS}$ as fixed and known ⇒ it ignores the uncertainty regarding the PS.
Remark 2. It ignores model uncertainty regarding the selection of confounders for the PS.
The set of covariates is fixed ⇒ $\mathcal{M}_{PS} = \{\text{one model}\}$
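The sequential procedure above can be sketched in a few lines. This is an illustrative toy example, not the presenter's code: the data-generating values, the use of scikit-learn's `LogisticRegression`, and the choice $h(PS) = PS$ are assumptions made here for concreteness.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Toy data: C1 is a confounder, C2 an instrument (illustrative values).
N = 2000
C = rng.normal(size=(N, 2))
X = rng.binomial(1, 1 / (1 + np.exp(-(0.6 * C[:, 0] + 0.6 * C[:, 1]))))
Y = rng.binomial(1, 1 / (1 + np.exp(-(0.1 * X + 0.6 * C[:, 0]))))

# "PS stage": logit(PS) = sum_k gamma_k C_{k,i} (no intercept, as on the slide).
ps_model = LogisticRegression(fit_intercept=False).fit(C, X)
ps_hat = ps_model.predict_proba(C)[:, 1]

# "Outcome stage": adjust for the *estimated* PS, here with h(PS) = PS.
design = np.column_stack([X, ps_hat])
out_model = LogisticRegression().fit(design, Y)
beta_X = out_model.coef_[0][0]
```

Note that the second stage treats `ps_hat` as fixed and known, which is exactly the criticism in Remark 1.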
6. Uncertainty regarding the PS
Bayesian PS Estimation (McCandless, Gustafson and Austin 2009, and Zigler et al. 2013)
Bayesian PS estimates the PS stage and outcome stage simultaneously:
"PS stage":
$$\mathrm{logit}(PS) = \sum_{k=1}^{p} \gamma_k C_{k,i}$$
"Outcome stage":
$$\mathrm{logit}(E[Y_i \mid X_i, C_i]) = \beta_0 + \beta_X X_i + \xi h(PS) + \sum_{k=1}^{p} \delta_k C_{k,i}$$
The set of covariates is fixed ⇒ $\mathcal{M}_{PS} = \{\text{one model}\}$
7-8. Uncertainty regarding the PS + model uncertainty
Bayesian PS Estimation (Zigler and Dominici 2014)
Bayesian PS estimates the PS stage and outcome stage simultaneously:
"PS stage":
$$\mathrm{logit}(PS) = \sum_{k=1}^{p} \alpha_k^{x|c} \gamma_k C_{k,i}$$
"Outcome stage":
$$\mathrm{logit}(E[Y_i \mid X_i, C_i]) = \beta_0 + \beta_X X_i + \xi h(PS) + \sum_{k=1}^{p} \alpha_k^{x|c} \delta_k C_{k,i}$$
The set of covariates is NOT fixed ⇒ $\mathcal{M}_{PS} = \{\text{all possible models}\}$
Posterior distribution of the ACE:
$$p(\mathrm{ACE} \mid \text{data}) \approx \sum_{\alpha^{x|c} \in \mathcal{M}_{PS}} p(\mathrm{ACE}_{\alpha^{x|c}} \mid \alpha^{x|c}, \text{data}) \, p(\alpha^{x|c} \mid \text{data})$$
9-11. Remarks, Motivation & Goal
Remark 1. Uninformative prior: each model has equal prior probability, $p(\alpha^{x|c}) = \frac{1}{|\mathcal{M}_{PS}|}$.
Remark 2. Most of the time, instrumental variables (IVs) are included in the PS model.
Goal: to limit the selection of IVs.
Strategy: an informative prior on the PS model indicator $\alpha^{x|c}$.
12. Illustrative Example
We simulate 250 replicated data sets with $N = 1000$ and $p = 7$ covariates:
$\{C_1, C_2, C_3\}$ true confounders; $\{C_4\}$ a risk factor for the outcome; $\{C_5, C_6\}$ IVs; and $\{C_7\}$ a noise variable.
$\mathrm{ACE} = P(Y = 1 \mid X = 1) - P(Y = 1 \mid X = 0) = 0.06$
13-14. First Stage
BAC in Bayesian PS
Prognostic score model (based on a single treatment group; Hansen 2008):
$$\mathrm{logit}(E[Y_{i,0} \mid C_i]) = \sum_{k=1}^{p} \alpha_k^{y|c} \eta_k C_{k,i}$$
Prior distribution on $\alpha^{x|c} \mid \alpha^{y|c}$ following Wang et al. (2012):
$$\frac{p(\alpha_k^{x|c} = 1 \mid \alpha_k^{y|c} = 0)}{p(\alpha_k^{x|c} = 0 \mid \alpha_k^{y|c} = 0)} = \frac{1}{\omega}, \qquad \frac{p(\alpha_k^{x|c} = 1 \mid \alpha_k^{y|c} = 1)}{p(\alpha_k^{x|c} = 0 \mid \alpha_k^{y|c} = 1)} = 1$$
15. The above constraints imply the following:
$$P(\alpha_k^{x|c} = 0 \mid \alpha_k^{y|c} = 0) = \frac{\omega}{1 + \omega}, \qquad P(\alpha_k^{x|c} = 1 \mid \alpha_k^{y|c} = 0) = \frac{1}{1 + \omega}$$
$$P(\alpha_k^{x|c} = 0 \mid \alpha_k^{y|c} = 1) = P(\alpha_k^{x|c} = 1 \mid \alpha_k^{y|c} = 1) = \frac{1}{2}$$
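These four probabilities are easy to encode and sanity-check against the two odds constraints. A minimal sketch (the function name and dictionary layout are ours, not from the slides):

```python
def ps_inclusion_prior(omega):
    """p(alpha_k^{x|c} = x | alpha_k^{y|c} = y) implied by the two
    odds constraints on the slides, keyed by (x, y)."""
    return {
        (0, 0): omega / (1 + omega),  # exclude from PS model | not in prognostic model
        (1, 0): 1 / (1 + omega),      # include in PS model   | not in prognostic model
        (0, 1): 0.5,                  # exclude               | in prognostic model
        (1, 1): 0.5,                  # include               | in prognostic model
    }

probs = ps_inclusion_prior(omega=20)
```

Larger $\omega$ pushes the probability of including a covariate that does not predict the outcome toward zero, which is exactly what discourages the selection of instruments.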
16. First Stage: [Figure: the informative prior $p(\alpha^{x|c} \mid Y)$ over all models across $\omega = 1, 5, 20, 50$ and $100$.]
Our objective is $p(\alpha^{x|c} \mid Y)$, which is
$$p(\alpha^{x|c} \mid Y) = \sum_{\alpha^{y|c} \in \mathcal{M}_{y|c}} p(\alpha^{x|c} \mid \alpha^{y|c}) \, p(\alpha^{y|c} \mid Y)$$
17. Second Stage
BAC in Bayesian PS
It is a Bayesian PS stage with an informative prior $p(\alpha^{x|c} \mid Y)$.
PS model:
$$\mathrm{logit}(P[X_i \mid C_i]) = \sum_{k=1}^{p} \alpha_k^{x|c} \gamma_k C_{k,i}$$
Outcome model:
$$\mathrm{logit}(E[Y_i \mid X_i, C_i]) = \beta_0 + \beta_X X_i + \xi h(PS) + \sum_{k=1}^{p} \alpha_k^{x|c} \delta_k C_{k,i}$$
$\mathcal{M}_{PS} = \{128 \text{ models}\}$
18-19. The expected role of the informative prior
BAC in Bayesian PS
In the MCMC, we propose to move from model $\alpha^{x|c}$ (current) to $\alpha'^{x|c}$ (proposed):
⇒ adding one covariate ($\alpha_j^{x|c} = 0 \to \alpha_j'^{x|c} = 1$), or
⇒ deleting one covariate ($\alpha_j^{x|c} = 1 \to \alpha_j'^{x|c} = 0$).
Accept the proposed move with probability (for the adding case)
$$\min\left\{ \frac{L(\text{data} \mid \theta_{\alpha'^{x|c}}, \alpha'^{x|c}) \, p(\theta_{\alpha'^{x|c}} \mid \alpha'^{x|c})}{L(\text{data} \mid \theta_{\alpha^{x|c}}, \alpha^{x|c}) \, p(\theta_{\alpha^{x|c}} \mid \alpha^{x|c}) \, \varphi(u)} \cdot \frac{p(\alpha'^{x|c} \mid Y)}{p(\alpha^{x|c} \mid Y)}, \; 1 \right\}$$
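On the log scale this acceptance probability is straightforward to implement. A sketch (function and argument names are ours; in the actual sampler the likelihood and prior terms would come from the PS and outcome models):

```python
import math

def accept_prob_add(loglik_prop, loglik_curr,
                    logprior_theta_prop, logprior_theta_curr,
                    log_phi_u, logprior_model_prop, logprior_model_curr):
    """Acceptance probability for an 'add one covariate' move: the ratio
    inside min{., 1}, computed on the log scale for numerical stability.
    phi(u) is the proposal density of the newly generated parameter u."""
    log_ratio = ((loglik_prop + logprior_theta_prop + logprior_model_prop)
                 - (loglik_curr + logprior_theta_curr + logprior_model_curr
                    + log_phi_u))
    return math.exp(min(log_ratio, 0.0))  # equals min{ratio, 1}

# Illustrative (made-up) values: the proposed model fits noticeably better.
a = accept_prob_add(-120.0, -123.0, -2.1, -1.8, -0.9,
                    math.log(0.4), math.log(0.6))
```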
21-22. We propose to penalize the log-likelihood term of the model that contains one covariate more.
If the proposed model contains one covariate more ($\alpha_j^{x|c} = 0 \to \alpha_j'^{x|c} = 1$):
$$\Psi_{\alpha'^{x|c}} = -2 \times \log(N) \times \frac{p(\alpha^{x|c} \mid Y)}{p(\alpha'^{x|c} \mid Y)}$$
If the proposed model has one covariate less than the current model ($\alpha_j^{x|c} = 1 \to \alpha_j'^{x|c} = 0$):
$$\Psi_{\alpha^{x|c}} = -2 \times \log(N) \times \frac{p(\alpha'^{x|c} \mid Y)}{p(\alpha^{x|c} \mid Y)}$$
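In both cases the penalty attaches to whichever model (current or proposed) contains one covariate more, so one helper covers both expressions. A sketch (names are ours; $N$ is the sample size and the two prior arguments are $p(\alpha \mid Y)$ for the smaller and larger model):

```python
import math

def penalty_larger_model(N, prior_smaller, prior_larger):
    """Psi = -2 * log(N) * p(alpha_smaller | Y) / p(alpha_larger | Y),
    added to the log-likelihood term of whichever model (current or
    proposed) contains one covariate more."""
    return -2.0 * math.log(N) * prior_smaller / prior_larger

# Example: the informative prior favors the smaller model 2:1.
psi = penalty_larger_model(N=1000, prior_smaller=0.5, prior_larger=0.25)
```

The penalty is always negative and grows in magnitude when the informative prior disfavors the larger model (small $p(\alpha_{\text{larger}} \mid Y)$), which is how the prior gains enough influence to screen out instruments.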
23. [Figure: $p(\alpha_k^{x|c} \mid Y, X)$ when the acceptance probability includes the penalty term.]
24. [Figure: bias and MSE of estimates of the ACE across $\omega$'s, with and without the penalty, vs. the kitchen-sink and best-subset models.]
25-28. Conclusions
Novel approach: i) joining two methodologies, BAC and Bayesian PS; ii) an informative prior; and iii) applying RJMCMC.
The simulation study found that:
the informative prior alone was not able to shape the profiles of the models selected;
we have proposed to solve this by adding a penalty term so that the informative prior kicks in;
with the penalty term, the informative prior was able to influence the posterior inclusion probability (PIP) of the IVs without distorting the PIPs of all other variables.
Thanks to: Robert Platt (McGill U.); Francesca Dominici (Harvard U.); Matt Cefalu (RAND); Geneviève Lefebvre (UdeM); Jay Kaufman (McGill U.); Sahir Bhatnagar (McGill U.); Maxime Turgeon (McGill U.); CNODES and the Jewish General Hospital in Montreal.
29. Appendix. Simulation Exercise
$p = 7$ covariates ⇒ $128$ ($= 2^7$) models. Ignoring the model with no covariates, $|\mathcal{M}_{y|c}| = |\mathcal{M}_{PS}| = 2^7 - 1 = 127$.
Two scenarios: i) $N = 300$, and ii) $N = 1000$.
For all $i = 1, 2, \ldots, N$:
we simulate the $p$ covariates $C_i \sim \mathrm{MVN}(0, I)$;
the exposure variable $X_i$ is simulated from a Bernoulli distribution with probability given by
$$P(X_i \mid C_i) = \frac{\exp\left(\sum_{k=1}^{p} \gamma_k C_{k,i}\right)}{1 + \exp\left(\sum_{k=1}^{p} \gamma_k C_{k,i}\right)} \quad (1)$$
where we set $\gamma = (\gamma_1, \gamma_2, \ldots, \gamma_7) = (0.6, -0.6, 0.1, 0, 0.6, 0.1, 0)$;
the outcome variable $Y_i$ is generated similarly from a Bernoulli distribution with probability given by
$$P(Y_i \mid X_i, C_i) = \frac{\exp\left(\beta_0 + \beta X_i + \sum_{k=1}^{p} \phi_k C_{k,i}\right)}{1 + \exp\left(\beta_0 + \beta X_i + \sum_{k=1}^{p} \phi_k C_{k,i}\right)} \quad (2)$$
where we set $\phi = (\phi_1, \phi_2, \ldots, \phi_7) = (0.6, 0.1, -0.6, 0.6, 0, 0, 0)$, $\beta_0 = 0$ and $\beta = 0.1$.
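The data-generating mechanism in equations (1) and (2) is easy to reproduce. A sketch using NumPy (the seed is arbitrary; everything else follows the parameter values on this slide):

```python
import numpy as np

rng = np.random.default_rng(2016)
N, p = 1000, 7
gamma = np.array([0.6, -0.6, 0.1, 0.0, 0.6, 0.1, 0.0])  # exposure model
phi = np.array([0.6, 0.1, -0.6, 0.6, 0.0, 0.0, 0.0])    # outcome model
beta0, beta = 0.0, 0.1

def expit(z):
    return 1 / (1 + np.exp(-z))

C = rng.multivariate_normal(np.zeros(p), np.eye(p), size=N)  # C_i ~ MVN(0, I)
X = rng.binomial(1, expit(C @ gamma))                        # eq. (1)
Y = rng.binomial(1, expit(beta0 + beta * X + C @ phi))       # eq. (2)

# C1-C3 affect both X and Y (confounders), C4 only Y (risk factor),
# C5-C6 only X (instruments), C7 neither (noise) -- matching the slides.
```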
30. Thus, $\alpha^{x|c} = (1, 1, 1, 0, 0, 0, 0)$ is the model that contains the confounders necessary to satisfy the assumption of no unmeasured confounders. This model is called the minimal model, and we denote it $\alpha^{*x|c}$.
Lastly, this setting implies a true ACE equal to $0.06$ (calculated from a much larger sample using the true parameter values to compute $P(Y = 1 \mid X = 1) - P(Y = 1 \mid X = 0)$).
31. Appendix
Joint Bayesian PS estimation
The likelihood of the PS stage is given by:
$$L(X \mid \gamma, \bar\alpha^m, C) = \prod_{i=1}^{N} \left[ g_x^{-1}\!\left( \sum_{k=1}^{p} \bar\alpha_k^m \gamma_k C_{k,i} \right) \right]^{X_i} \left[ 1 - g_x^{-1}\!\left( \sum_{k=1}^{p} \bar\alpha_k^m \gamma_k C_{k,i} \right) \right]^{1 - X_i}$$
and the likelihood of the outcome stage is given by:
$$L(Y \mid \beta, \gamma, \delta, \xi, \bar\alpha^m, X, C) = \prod_{i=1}^{N} \left[ g_y^{-1}\!\left( \beta_0 + \beta_X X_i + \xi^T h(PS) + \sum_{k=1}^{p} \bar\alpha_k^m \delta_k C_{k,i} \right) \right]^{Y_i} \times \left[ 1 - g_y^{-1}\!\left( \beta_0 + \beta_X X_i + \xi^T h(PS) + \sum_{k=1}^{p} \bar\alpha_k^m \delta_k C_{k,i} \right) \right]^{1 - Y_i} \quad (3)$$
32. Joint Bayesian PS estimation with $\alpha$ unknown
Another consequence of adding $\alpha$ is that the ACE turns into a weighted average over different PS and outcome models, with weights corresponding to the posterior probability of each model.
Formally, let $\mathcal{M} = \{\alpha : \alpha \in \{0,1\}^p\}$ denote the set of all models being considered; its cardinality is $|\mathcal{M}| = 2^p$.
For instance, an element of $\mathcal{M}$ is the $m$-th model: $\alpha^m = (\alpha_1^m, \ldots, \alpha_p^m)$.
Let $p(\alpha^m)$ be the prior probability of the $m$-th model.
Then, the posterior probability of the $m$-th model is
$$p(\alpha^m \mid \text{data}) = \frac{p(\alpha^m) \, p(\text{data} \mid \alpha^m)}{\sum_{\alpha^i \in \mathcal{M}} p(\alpha^i) \, p(\text{data} \mid \alpha^i)}$$
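Normalizing prior times marginal likelihood over a model set is a one-liner best done in log space. An illustrative sketch (the marginal log-likelihood values below are made up):

```python
import numpy as np

def model_posteriors(log_marglik, prior):
    """p(alpha^m | data): normalize p(alpha^m) * p(data | alpha^m) over all
    models, working in log space to avoid underflow of tiny likelihoods."""
    log_w = np.log(prior) + log_marglik
    log_w -= log_w.max()          # stabilize before exponentiating
    w = np.exp(log_w)
    return w / w.sum()

# Three hypothetical models with equal prior probability:
post = model_posteriors(np.array([-500.0, -502.0, -510.0]), np.full(3, 1 / 3))
```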
33. Joint Bayesian PS estimation with $\alpha$ unknown
Hence, the posterior distribution of the ACE will be a weighted average of the estimates of the ACE under each model in $\mathcal{M}$:
$$p(\mathrm{ACE} \mid \text{data}) \approx \sum_{\alpha^m \in \mathcal{M}} p(\mathrm{ACE}^m \mid \alpha^m, \text{data}) \, p(\alpha^m \mid \text{data})$$
where $\mathrm{ACE}^m = E_{C_{\alpha^m}}\{E[Y \mid X = 1, C_{\alpha^m}] - E[Y \mid X = 0, C_{\alpha^m}]\}$ and $C_{\alpha^m}$ denotes the subset of $C$ included in model $\alpha^m$.
Remark 5. $\mathrm{ACE}^m$ is an estimate of the causal effect if and only if $\alpha^m$ contains the confounders necessary to satisfy the assumption of no unmeasured confounders.
Remark 6. It assumes that each model has equal prior probability, that is, for all possible $\alpha$: $p(\alpha) = \frac{1}{|\mathcal{M}|}$.
34. First Stage. Posterior Distributions
Our objective is $p(\alpha^{x|c} \mid Y)$.
First, we need to compute $p(\alpha^{y|c} \mid Y)$:
$$p(\alpha^{y|c} \mid Y) \propto L(Y \mid \alpha^{y|c}) \, p(\alpha^{y|c}) \quad (4)$$
$L(Y \mid \alpha^{y|c})$ is the marginal likelihood under model $\alpha^{y|c}$ and is equal to:
$$L(Y \mid \alpha^{y|c}) = \int L(Y \mid \alpha^{y|c}, \eta) \, p(\eta \mid \alpha^{y|c}) \, d\eta \quad (5)$$
where $\eta$ is the vector of logistic-regression parameters in the prognostic score model under model $\alpha^{y|c}$, with dimension $\sum_{k=1}^{p} \alpha_k^{y|c}$;
$p(\eta \mid \alpha^{y|c})$ is the prior distribution of $\eta$ under model $\alpha^{y|c}$;
$L(Y \mid \alpha^{y|c}, \eta)$ is the likelihood for model $\alpha^{y|c}$, which involves only the prognostic score model.
35. First Stage. Posterior Distributions
$L(Y \mid \alpha^{y|c})$ is not analytically tractable, and thus we cannot apply MC3.
We sample from the joint posterior $p(\alpha^{y|c}, \eta \mid Y)$ by applying the RJMCMC algorithm.
Then we compute the informative prior as follows:
$$p(\alpha^{x|c} \mid Y) = \sum_{\alpha^{y|c} \in \mathcal{M}_{y|c}} p(\alpha^{x|c} \mid \alpha^{y|c}, Y) \, p(\alpha^{y|c} \mid Y) = \sum_{\alpha^{y|c} \in \mathcal{M}_{y|c}} p(\alpha^{x|c} \mid \alpha^{y|c}) \, p(\alpha^{y|c} \mid Y) \quad (6)$$
where the last equality assumes that $\alpha^{x|c} \perp Y \mid \alpha^{y|c}$.
36. First Stage. Posterior Distributions
$\omega = 1$ corresponds to $p(\alpha^{x|c} \mid Y)$ being an uninformative prior. Why?
$$\omega = 1 \Rightarrow p(\alpha_k^{x|c} = 0 \mid \alpha_k^{y|c} = 0) = p(\alpha_k^{x|c} = 1 \mid \alpha_k^{y|c} = 0) = p(\alpha_k^{x|c} = 0 \mid \alpha_k^{y|c} = 1) = p(\alpha_k^{x|c} = 1 \mid \alpha_k^{y|c} = 1) = \tfrac{1}{2} \quad \forall k$$
So we have that
$$p(\alpha^{x|c} \mid \alpha^{y|c}) = \prod_{k=1}^{p} p(\alpha_k^{x|c} \mid \alpha_k^{y|c}) = \frac{1}{2^p}$$
where $p$ is the number of covariates.
Hence,
$$p(\alpha^{x|c} \mid Y) = \frac{1}{2^p} \sum_{\alpha^{y|c} \in \mathcal{M}_{y|c}} p(\alpha^{y|c} \mid Y) = \frac{1}{2^p} \quad (7)$$
since $\sum_{\alpha^{y|c} \in \mathcal{M}_{y|c}} p(\alpha^{y|c} \mid Y) = 1$.
Thus, $p(\alpha^{x|c} \mid Y)$ carries no outcome information into the second stage.
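The mixture formula (6) combined with the slide-15 conditionals can be checked numerically: with $\omega = 1$ the informative prior collapses to the uniform $1/2^p$ regardless of the prognostic-score posterior. A small sketch with an arbitrary (made-up) posterior $p(\alpha^{y|c} \mid Y)$:

```python
import itertools
import numpy as np

p = 3  # few covariates, so all 2^p models can be enumerated

def cond_prob(x_ind, y_ind, omega):
    """p(alpha_k^{x|c} = x_ind | alpha_k^{y|c} = y_ind), from slide 15."""
    if y_ind == 1:
        return 0.5
    return omega / (1 + omega) if x_ind == 0 else 1 / (1 + omega)

def informative_prior(alpha_x, post_y, omega):
    """Equation (6): sum over alpha^{y|c} of p(alpha^{x|c} | alpha^{y|c})
    times p(alpha^{y|c} | Y), with the product form over k."""
    return sum(w * np.prod([cond_prob(alpha_x[k], alpha_y[k], omega)
                            for k in range(p)])
               for alpha_y, w in post_y.items())

models = list(itertools.product([0, 1], repeat=p))
weights = np.arange(1, len(models) + 1, dtype=float)
post_y = dict(zip(models, weights / weights.sum()))  # arbitrary posterior

uniform = [informative_prior(ax, post_y, omega=1) for ax in models]
```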
37. Second Stage.
It is a Bayesian PS estimation stage, based only on the PS and outcome models.
We incorporate an informative prior for the model indicator of the PS model, $p(\alpha^{x|c} \mid Y)$, which is inherited from the first stage.
The setting of Zigler and Dominici 2014 is a particular case of ours, with $\omega = 1 \Rightarrow p(\alpha^{x|c} \mid Y) = p(\alpha^{x|c}) = \frac{1}{2^p}$.
The main goal of this stage is to estimate the average causal effect (ACE) of treatment, $X = 1$ vs. $X = 0$.
The posterior distribution of the ACE:
$$p(\mathrm{ACE} \mid \text{data}) \approx \sum_{\alpha^{x|c} \in \mathcal{M}_{PS}} p(\mathrm{ACE}_{\alpha^{x|c}} \mid \alpha^{x|c}, \text{data}) \, p(\alpha^{x|c} \mid \text{data}) \quad (8)$$
where, as before, $\mathrm{ACE}_{\alpha^{x|c}} = E_{C_{\alpha^{x|c}}}\{E[Y \mid X = 1, C_{\alpha^{x|c}}] - E[Y \mid X = 0, C_{\alpha^{x|c}}]\}$ and $C_{\alpha^{x|c}}$ denotes the subset of $C$ included in model $\alpha^{x|c}$.
38. Second Stage.
Prior Distributions
Similarly to Zigler and Dominici 2014, we use a flat prior distribution on $(\beta_0, \beta_X, \xi, \delta_{\alpha^{x|c}}, \gamma_{\alpha^{x|c}})$.
In contrast to the previous approach, here $p(\alpha^{x|c} \mid Y)$ is an informative prior.
Posterior Distributions
We sample from the joint posterior $p(\alpha^{x|c}, \beta_0, \beta_X, \xi, \delta_{\alpha^{x|c}}, \gamma_{\alpha^{x|c}} \mid \text{data})$ by applying RJMCMC. This joint posterior is given by:
$$p(\alpha^{x|c}, \beta_0, \beta_X, \xi, \delta_{\alpha^{x|c}}, \gamma_{\alpha^{x|c}} \mid \text{data}) \propto L(Y, X \mid \alpha^{x|c}, \theta_{\alpha^{x|c}}, C) \, p(\theta_{\alpha^{x|c}} \mid \alpha^{x|c}) \, p(\alpha^{x|c} \mid Y)$$
where $\theta_{\alpha^{x|c}} = (\beta_0, \beta_X, \xi, \delta_{\alpha^{x|c}}, \gamma_{\alpha^{x|c}})$, $L(Y, X \mid \alpha^{x|c}, \theta_{\alpha^{x|c}}, C)$ is the joint likelihood of the PS and outcome models, and $p(\theta_{\alpha^{x|c}} \mid \alpha^{x|c})$ and $p(\alpha^{x|c} \mid Y)$ are the prior distributions.
39. The marginal likelihood under model $\alpha^{x|c}$,
$$L(Y, X \mid \alpha^{x|c}, C) = \int L(Y, X \mid \alpha^{x|c}, \theta_{\alpha^{x|c}}, C) \, p(\theta_{\alpha^{x|c}} \mid \alpha^{x|c}) \, d\theta_{\alpha^{x|c}},$$
will not have an analytically tractable expression that could be used to compute $p(\alpha^{x|c} \mid Y, X)$, the quantity needed to apply MC3.
40. RJMCMC
RJMCMC was proposed by Green 1995 as an extension of the Metropolis-Hastings algorithm. It creates a reversible Markov chain that can "jump" between models with parameter spaces of different dimensions (trans-dimensional Markov chains), retaining the detailed balance condition that guarantees the correct limiting distribution.
The standard Metropolis-Hastings within Gibbs sampling algorithm cannot be applied: when we condition on one model, say $\alpha^{x|c}$, then $(\beta_0, \beta_X, \xi, \delta_{\alpha^{x|c}}, \gamma_{\alpha^{x|c}}) \in \Theta_{\alpha^{x|c}}$; but when we condition on $(\beta_0, \beta_X, \xi, \delta_{\alpha^{x|c}}, \gamma_{\alpha^{x|c}})$, then $\alpha^{x|c}$ cannot move, and we cannot move between models.
We need to complete the spaces, or supplement each of them with an artificial space, in order to make them compatible. In other words, we need to create a bijection between them.
41. Outline of RJMCMC
Step 1) Update the parameters that are in the current model, for example using the Metropolis-Hastings algorithm.
Step 2.a) Generate a proposed variable $j \in \{1, 2, \ldots, p\}$ to add to or delete from the model, with probability $1/p$. Thus, we propose to change $\alpha$ to $\alpha'$ where $\alpha_j' = 1 - \alpha_j$.
Step 2.b) If $\alpha_j = 0 \to \alpha_j' = 1$ (include covariate $j$ in the model):
i) Generate the additional parameter $u$ corresponding to variable $j$ from a proposal density $u \sim \varphi(u)$.
ii) Set $\theta_{\alpha'} = (\theta_{\alpha,(-j)}, u_{\alpha,(j)})$.
iii) Accept the proposed move with probability
$$\Delta\{(\alpha, \theta_\alpha) \to (\alpha', \theta_{\alpha'})\} = \min\left\{ \frac{L(\text{data} \mid \theta_{\alpha'}, \alpha') \, p(\theta_{\alpha'} \mid \alpha') \, p(\alpha')}{L(\text{data} \mid \theta_\alpha, \alpha) \, p(\theta_\alpha \mid \alpha) \, p(\alpha) \, \varphi(u)}, \; 1 \right\}$$
42. iv) If the proposed move is accepted, update $\alpha$ and $\theta_\alpha$ to $\alpha'$ and $\theta_{\alpha'}$. Otherwise, leave $\alpha$ and $\theta_\alpha$ unchanged.
Step 2.c) If $\alpha_j = 1 \to \alpha_j' = 0$ (exclude covariate $j$ from the model):
i) Set $\theta_{\alpha'} = \theta_{\alpha,(-j)}$, i.e., drop the component of $\theta_\alpha$ corresponding to $j$.
ii) Accept the proposed move with probability
$$\Delta\{(\alpha, \theta_\alpha) \to (\alpha', \theta_{\alpha'})\} = \min\left\{ \frac{L(\text{data} \mid \theta_{\alpha'}, \alpha') \, p(\theta_{\alpha'} \mid \alpha') \, p(\alpha') \, \varphi(u)}{L(\text{data} \mid \theta_\alpha, \alpha) \, p(\theta_\alpha \mid \alpha) \, p(\alpha)}, \; 1 \right\}$$
iii) If the proposed move is accepted, update $\alpha$ and $\theta_\alpha$ to $\alpha'$ and $\theta_{\alpha'}$. Otherwise, leave $\alpha$ and $\theta_\alpha$ unchanged.
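To make the outline concrete, here is a self-contained toy RJMCMC for variable selection in a Gaussian linear model. This is not the talk's PS/outcome sampler: the likelihood, the flat model prior, the N(0,1) priors on included coefficients, and $\varphi(u) = N(0,1)$ are all choices made here to keep the sketch short. Step 1 (the within-model parameter refresh) is omitted; included coefficients only change via delete-then-add moves, which affects mixing but not validity.

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy data: only the first two of four covariates matter.
N, p = 200, 4
C = rng.normal(size=(N, p))
y = C @ np.array([1.0, -1.0, 0.0, 0.0]) + rng.normal(size=N)

def norm_logpdf(x, mu=0.0):          # standard-normal log density
    return -0.5 * np.log(2 * np.pi) - 0.5 * (np.asarray(x) - mu) ** 2

def loglik(theta, alpha):            # Gaussian likelihood with unit noise
    return norm_logpdf(y, C[:, alpha == 1] @ theta).sum()

alpha = np.zeros(p, dtype=int)       # start from the empty model
theta = np.zeros(0)
visits = np.zeros(p)
iters = 2000

for _ in range(iters):
    j = rng.integers(p)              # Step 2.a: covariate to flip
    alpha_p = alpha.copy()
    alpha_p[j] = 1 - alpha[j]
    if alpha[j] == 0:                # add j: draw u ~ phi(u) = N(0,1)
        u = rng.normal()
        theta_p = np.insert(theta, int(np.sum(alpha_p[:j])), u)
        log_r = (loglik(theta_p, alpha_p) + norm_logpdf(theta_p).sum()
                 - loglik(theta, alpha) - norm_logpdf(theta).sum()
                 - norm_logpdf(u))   # phi(u) in the denominator
    else:                            # delete j: phi(u) enters the numerator
        pos = int(np.sum(alpha[:j]))
        u = theta[pos]
        theta_p = np.delete(theta, pos)
        log_r = (loglik(theta_p, alpha_p) + norm_logpdf(theta_p).sum()
                 + norm_logpdf(u)
                 - loglik(theta, alpha) - norm_logpdf(theta).sum())
    if np.log(rng.uniform()) < log_r:
        alpha, theta = alpha_p, theta_p
    visits += alpha                  # running inclusion counts

pip = visits / iters                 # posterior inclusion probabilities
```

Because the model prior is flat here, the ratio $p(\alpha')/p(\alpha)$ cancels out of `log_r`, matching the remark on slide 43; with an informative prior $p(\alpha^{x|c} \mid Y)$ it would survive as an extra term.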
43. The difference between the first and second stages, in terms of the RJMCMC algorithm, lies mainly in:
The likelihood $L(\text{data} \mid \theta_\alpha, \alpha)$. The likelihood of the first stage, $L(\text{data} \mid \theta_{\alpha^{y|c}}, \alpha^{y|c})$, is based only on the prognostic score model; the likelihood in the second stage, $L(\text{data} \mid \theta_{\alpha^{x|c}}, \alpha^{x|c})$, is based jointly on the PS and outcome models.
The ratio $\frac{p(\alpha')}{p(\alpha)}$ in the acceptance probability of the proposed move. In the first stage, and in the second stage with $\omega = 1$, this ratio cancels out since each model has equal prior probability. In the second stage with $\omega > 1$, however, $p(\alpha'^{x|c} \mid Y)$ is an informative prior, so the ratio $\frac{p(\alpha'^{x|c} \mid Y)}{p(\alpha^{x|c} \mid Y)}$ does not cancel.