Bayesian Adjustment for Confounding
(BAC) in Bayesian Propensity Score
Estimation
Pablo Gonzalez Ginestet
McGill University, CNODES, Lady Davis Institute
pablo.gonzalezginestet@mail.mcgill.ca
12th Annual EBOH Research Day
Montreal, April 2016
Outline
1 Traditional PS Estimation
2 Bayesian PS Estimation
Uncertainty regarding the PS
Uncertainty regarding the PS + Model uncertainty
3 BAC in Bayesian PS & Results
First Stage
Second Stage
4 Conclusion
Traditional PS Estimation
It is a sequential process:
1 “PS stage”: the PS = P(Xi = 1|Ci) is estimated:
logit(PS_i) = Σ_{k=1}^p γ_k C_{k,i}
2 “Outcome stage”: the causal effect is estimated adjusting for the estimated ˆPS(ˆγ):
logit(E[Y_i | X_i, C_i]) = β_0 + β_X X_i + ξ h(ˆPS)
Remark 1. The outcome stage treats ˆPS as fixed and known ⇒ it ignores the uncertainty regarding the PS.
Remark 2. It ignores model uncertainty regarding the selection of confounders for the PS.
The set of covariates is fixed ⇒ M_PS = { one model }
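To make the two-stage recipe concrete, here is a minimal Python sketch of the sequential procedure on simulated toy data; the toy data, the use of scikit-learn, and the choice h(PS) = PS are illustrative assumptions, not part of the original slides.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
N, p = 1000, 7
C = rng.normal(size=(N, p))                                  # covariates C_1, ..., C_p
X = rng.binomial(1, 1 / (1 + np.exp(-C[:, 0])))              # toy exposure
Y = rng.binomial(1, 1 / (1 + np.exp(-(0.1 * X + C[:, 0]))))  # toy outcome

# 1) "PS stage": logit(PS_i) = sum_k gamma_k C_{k,i}
ps_model = LogisticRegression(C=1e6).fit(C, X)               # large C ~ no regularization
ps_hat = ps_model.predict_proba(C)[:, 1]                     # hat{PS}(hat{gamma}), then treated as fixed

# 2) "Outcome stage": logit(E[Y | X, C]) = beta_0 + beta_X X + xi * h(hat{PS})
design = np.column_stack([X, ps_hat])                        # h(PS) = PS here (an assumption)
out_model = LogisticRegression(C=1e6).fit(design, Y)
print("beta_X:", out_model.coef_[0][0])                      # uncertainty in hat{PS} is ignored
```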
Uncertainty regarding the PS
Bayesian PS Estimation
(McCandless, Gustafson and Austin 2009, and Zigler et al. 2013)
Bayesian PS estimates the PS stage and the outcome stage simultaneously:
“PS stage”:
logit(PS_i) = Σ_{k=1}^p γ_k C_{k,i}
“Outcome stage”:
logit(E[Y_i | X_i, C_i]) = β_0 + β_X X_i + ξ h(PS) + Σ_{k=1}^p δ_k C_{k,i}
The set of covariates is fixed ⇒ M_PS = { one model }
Uncertainty regarding the PS + Model uncertainty
Bayesian PS Estimation
(Zigler and Dominici 2014)
Bayesian PS estimates the PS stage and the outcome stage simultaneously:
“PS stage”:
logit(PS_i) = Σ_{k=1}^p α^{x|c}_k γ_k C_{k,i}
“Outcome stage”:
logit(E[Y_i | X_i, C_i]) = β_0 + β_X X_i + ξ h(PS) + Σ_{k=1}^p α^{x|c}_k δ_k C_{k,i}
The set of covariates is NOT fixed ⇒ M_PS = { all possible models }
Posterior distribution of the ACE:
p(ACE | data) ≈ Σ_{α^{x|c} ∈ M_PS} p(ACE_{α^{x|c}} | α^{x|c}, data) p(α^{x|c} | data)
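As an illustration of how this model-averaged posterior is summarized, the sketch below mixes per-model ACE draws with posterior model probabilities; the model labels and the numbers are hypothetical.

```python
import numpy as np

# Hypothetical MCMC output: posterior probability of each PS model alpha^{x|c}
# and draws of the ACE conditional on that model.
post_model_prob = {"110": 0.55, "111": 0.30, "101": 0.15}      # p(alpha^{x|c} | data)
ace_draws = {"110": np.array([0.05, 0.07, 0.06]),
             "111": np.array([0.06, 0.06, 0.05]),
             "101": np.array([0.08, 0.07, 0.09])}              # draws from p(ACE_alpha | alpha, data)

# Posterior mean of the ACE = sum_alpha E[ACE | alpha, data] * p(alpha | data)
ace_mean = sum(post_model_prob[m] * ace_draws[m].mean() for m in post_model_prob)
print(round(ace_mean, 4))
```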
Remarks, Motivation & Goal
Remark 1. Uninformative prior: each model has equal prior probability: p(α^{x|c}) = 1/|M_PS|
Remark 2. Most of the time, instrumental variables (IVs) end up included in the PS model
Goal: to limit the selection of IVs
Strategy: an informative prior on the PS model indicator α^{x|c}
Illustrative Example
We simulate 250 replicated data sets with N = 1000 and p = 7 covariates
{C1, C2, C3} true confounders; {C4} risk factor of the outcome; {C5, C6} IVs; and {C7} a noise variable
ACE = P(Y = 1|X = 1) − P(Y = 1|X = 0) = 0.06
First Stage
BAC in Bayesian PS
Prognostic score model (based on a single treatment group, Hansen 2008):
logit(E[Y_{i,0} | C_i]) = Σ_{k=1}^p α^{y|c}_k η_k C_{k,i}
Prior distribution on α^{x|c} | α^{y|c} following Wang et al. (2012):
p(α^{x|c}_k = 1 | α^{y|c}_k = 0) / p(α^{x|c}_k = 0 | α^{y|c}_k = 0) = 1/ω
p(α^{x|c}_k = 1 | α^{y|c}_k = 1) / p(α^{x|c}_k = 0 | α^{y|c}_k = 1) = 1
The above constraints imply the following:
P(α^{x|c}_k = 0 | α^{y|c}_k = 0) = ω / (1 + ω)
P(α^{x|c}_k = 1 | α^{y|c}_k = 0) = 1 / (1 + ω)
P(α^{x|c}_k = 0 | α^{y|c}_k = 1) = P(α^{x|c}_k = 1 | α^{y|c}_k = 1) = 1/2
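A small Python helper that tabulates these implied conditional prior probabilities as a function of ω (the ω values are the ones shown on the next slide); it is only a restatement of the formulas above.

```python
def prior_alpha_x_given_y(omega):
    """Conditional prior p(alpha^{x|c}_k | alpha^{y|c}_k) implied by the two constraints."""
    return {
        (0, 0): omega / (1 + omega),   # P(alpha^{x|c}_k = 0 | alpha^{y|c}_k = 0)
        (1, 0): 1 / (1 + omega),       # P(alpha^{x|c}_k = 1 | alpha^{y|c}_k = 0)
        (0, 1): 0.5,                   # P(alpha^{x|c}_k = 0 | alpha^{y|c}_k = 1)
        (1, 1): 0.5,                   # P(alpha^{x|c}_k = 1 | alpha^{y|c}_k = 1)
    }

for omega in (1, 5, 20, 50, 100):
    # probability of including a covariate that does NOT predict the outcome
    print(omega, prior_alpha_x_given_y(omega)[(1, 0)])   # shrinks toward 0 as omega grows
```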
First Stage: [Figure] Informative prior p(α^{x|c} | Y) over all models across ω = 1, 5, 20, 50 and 100.
Our objective is p(α^{x|c} | Y), which is
p(α^{x|c} | Y) = Σ_{α^{y|c} ∈ M_{y|c}} p(α^{x|c} | α^{y|c}) p(α^{y|c} | Y)
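A sketch of this mixture computation, assuming the conditional prior factorizes over covariates as in the constraints above and that a first-stage posterior p(α^{y|c} | Y) is available; the p = 3 example posterior is hypothetical.

```python
from math import prod

def cond_prob(ax_k, ay_k, omega):
    """p(alpha^{x|c}_k | alpha^{y|c}_k) implied by the constraints above."""
    if ay_k == 1:
        return 0.5
    return 1 / (1 + omega) if ax_k == 1 else omega / (1 + omega)

def informative_prior(alpha_x, post_y, omega):
    """p(alpha^{x|c} | Y) = sum over alpha^{y|c} of prod_k p(alpha^x_k | alpha^y_k) * p(alpha^{y|c} | Y)."""
    return sum(w * prod(cond_prob(ax, ay, omega) for ax, ay in zip(alpha_x, alpha_y))
               for alpha_y, w in post_y.items())

# Hypothetical first-stage posterior over prognostic-score models (p = 3 covariates)
post_y = {(1, 1, 0): 0.7, (1, 0, 0): 0.3}
print(informative_prior((1, 1, 0), post_y, omega=20))
print(informative_prior((1, 1, 1), post_y, omega=20))   # down-weighted: C_3 never predicts Y
```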
Second Stage
BAC in Bayesian PS
It is a Bayesian PS stage with an informative prior p(α^{x|c} | Y)
PS model:
logit(P[X_i | C_i]) = Σ_{k=1}^p α^{x|c}_k γ_k C_{k,i}
Outcome model:
logit(E[Y_i | X_i, C_i]) = β_0 + β_X X_i + ξ h(PS) + Σ_{k=1}^p α^{x|c}_k δ_k C_{k,i}
M_PS = {128 models}
The expected role of the informative prior
BAC in Bayesian PS
In the MCMC, we propose to move from model α^{x|c} (current) to α′^{x|c} (proposed) by:
⇒ adding one covariate (α^{x|c}_j = 0 → α′^{x|c}_j = 1), or
⇒ deleting one covariate (α^{x|c}_j = 1 → α′^{x|c}_j = 0)
Accept the proposed move with probability (for the adding case)
min{ [L(data | θ_{α′^{x|c}}, α′^{x|c}) p(θ_{α′^{x|c}} | α′^{x|c})] / [L(data | θ_{α^{x|c}}, α^{x|c}) p(θ_{α^{x|c}} | α^{x|c}) ϕ(u)] × p(α′^{x|c} | Y) / p(α^{x|c} | Y), 1 }
[Figure] Posterior probability that C_k is included in the model, p(α^{x|c}_k | Y, X), across ω's, alongside Best subset.
We propose to penalize the log-likelihood term of the model that contains one covariate more.
If the proposed model contains one covariate more (α^{x|c}_j = 0 → α′^{x|c}_j = 1):
Ψ_{α′^{x|c}} = −2 × log(N) × p(α^{x|c} | Y) / p(α′^{x|c} | Y)
If the proposed model has one covariate less than the current model (α^{x|c}_j = 1 → α′^{x|c}_j = 0):
Ψ_{α^{x|c}} = −2 × log(N) × p(α′^{x|c} | Y) / p(α^{x|c} | Y)
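A minimal sketch of the penalty computation; the prior probabilities plugged in below are hypothetical, and exactly how Ψ enters the acceptance ratio is described only qualitatively on the slides, so this block restates just the Ψ formula itself.

```python
import math

def psi_penalty(N, p_smaller_given_Y, p_larger_given_Y):
    """Penalty added to the log-likelihood of whichever model has the extra covariate:
    Psi = -2 * log(N) * p(smaller model | Y) / p(larger model | Y)."""
    return -2.0 * math.log(N) * p_smaller_given_Y / p_larger_given_Y

# Add move (alpha -> alpha' with one more covariate): the proposed, larger model is penalized
psi_add = psi_penalty(N=1000, p_smaller_given_Y=0.02, p_larger_given_Y=0.005)
# Delete move (alpha -> alpha' with one fewer covariate): the current, larger model is penalized
psi_del = psi_penalty(N=1000, p_smaller_given_Y=0.005, p_larger_given_Y=0.02)
print(psi_add, psi_del)
```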
[Figure] p(α^{x|c}_k | Y, X) when the acceptance probability includes the penalty term.
[Figure] Bias and MSE of estimates of the ACE across ω's, with and without the penalty, vs Kitchen sink and Best subset.
Conclusions
Novel approach: i) joining two methodologies, BAC and Bayesian PS; ii) an informative prior; and iii) applying RJMCMC.
The simulation study found that:
the informative prior alone was not able to shape the profiles of the models selected
we have proposed to solve this by adding a penalty term so that the informative prior kicks in
with the penalty term, the informative prior was able to influence the posterior inclusion probability (PIP) of the IVs without distorting the PIPs of all the other variables.
Thanks to: Robert Platt (McGill U.); Francesca Dominici (Harvard U.); Matt Cefalu (RAND); Geneviève Lefebvre (UdeM); Jay Kaufman (McGill U.); Sahir Bhatnagar (McGill U.); Maxime Turgeon (McGill U.); CNODES and the Jewish General Hospital in Montreal.
Appendix. Simulation Exercise
p = 7 covariates ⇒ 128 (= 2^7) models. Ignoring the model with no covariates, |M_{y|c}| = |M_PS| = 2^7 − 1 = 127.
Two scenarios: i) N = 300, and ii) N = 1000.
For all i = 1, 2, ..., N:
we simulate the p covariates C_i ∼ MVN(0, I)
the exposure variable X_i is simulated from a Bernoulli distribution with probability given by:
P(X_i = 1 | C_i) = exp(Σ_{k=1}^p γ_k C_{k,i}) / (1 + exp(Σ_{k=1}^p γ_k C_{k,i}))   (1)
where we set γ = (γ1, γ2, ..., γ7) = (0.6, −0.6, 0.1, 0, 0.6, 0.1, 0).
the outcome variable Y_i is generated similarly from a Bernoulli distribution with probability given by:
P(Y_i = 1 | X_i, C_i) = exp(β_0 + β X_i + Σ_{k=1}^p φ_k C_{k,i}) / (1 + exp(β_0 + β X_i + Σ_{k=1}^p φ_k C_{k,i}))   (2)
where we set φ = (φ1, φ2, ..., φ7) = (0.6, 0.1, −0.6, 0.6, 0, 0, 0), β_0 = 0 and β = 0.1.
Thus, α^{x|c} = (1, 1, 1, 0, 0, 0, 0) is the model that contains the confounders necessary to satisfy the assumption of no unmeasured confounders. This model is called the minimal model and we denote it as α*^{x|c}.
Lastly, this setting implies a true ACE equal to 0.06 (calculated from a much larger sample using the true parameter values in order to compute P(Y = 1|X = 1) − P(Y = 1|X = 0)).
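A Python sketch of this data-generating mechanism using the parameter values above; the seed and the use of numpy are the only assumptions.

```python
import numpy as np

def simulate(N=1000, seed=0):
    """Generate one replicated data set following equations (1)-(2)."""
    rng = np.random.default_rng(seed)
    gamma = np.array([0.6, -0.6, 0.1, 0.0, 0.6, 0.1, 0.0])   # exposure coefficients
    phi   = np.array([0.6, 0.1, -0.6, 0.6, 0.0, 0.0, 0.0])   # outcome coefficients
    beta0, beta = 0.0, 0.1

    C = rng.multivariate_normal(np.zeros(7), np.eye(7), size=N)   # C_i ~ MVN(0, I)
    expit = lambda z: 1.0 / (1.0 + np.exp(-z))
    X = rng.binomial(1, expit(C @ gamma))                          # equation (1)
    Y = rng.binomial(1, expit(beta0 + beta * X + C @ phi))         # equation (2)
    return C, X, Y

C, X, Y = simulate()
```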
Appendix
Joint Bayesian PS estimation
The likelihood of the PS stage is given by:
L(X | γ, ᾱ^m, C) = Π_{i=1}^N [g_x^{-1}(Σ_{k=1}^p ᾱ^m_k γ_k C_{k,i})]^{X_i} [1 − g_x^{-1}(Σ_{k=1}^p ᾱ^m_k γ_k C_{k,i})]^{1−X_i}
and the likelihood of the outcome stage is given by:
L(Y | β, γ, δ, ξ, ᾱ^m, X, C) = Π_{i=1}^N [g_y^{-1}(β_0 + β_X X_i + ξ^T h(PS) + Σ_{k=1}^p ᾱ^m_k δ_k C_{k,i})]^{Y_i} × [1 − g_y^{-1}(β_0 + β_X X_i + ξ^T h(PS) + Σ_{k=1}^p ᾱ^m_k δ_k C_{k,i})]^{1−Y_i}   (3)
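A sketch of the joint log-likelihood of the PS and outcome stages, assuming logit links for g_x and g_y, a scalar ξ, and h(PS) = PS; these modelling choices are assumptions made for illustration.

```python
import numpy as np

def expit(z):
    return 1.0 / (1.0 + np.exp(-z))

def joint_log_likelihood(Y, X, C, alpha, gamma, delta, beta0, betaX, xi):
    """Log of L(X | gamma, alpha, C) * L(Y | beta, gamma, delta, xi, alpha, X, C),
    with logit links and h(PS) = PS (an assumption)."""
    a = np.asarray(alpha, dtype=float)
    ps = expit(C @ (a * gamma))                                # PS under model alpha
    ll_ps = np.sum(X * np.log(ps) + (1 - X) * np.log1p(-ps))   # PS-stage log-likelihood
    mu = expit(beta0 + betaX * X + xi * ps + C @ (a * delta))  # outcome stage, eq. (3)
    ll_out = np.sum(Y * np.log(mu) + (1 - Y) * np.log1p(-mu))
    return ll_ps + ll_out
```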
Joint Bayesian PS estimation with α unknown
Another consequence of adding α is that the ACE turns into a weighted average over the different PS and outcome models, with weights corresponding to the posterior probability of each model.
Formally, let M = {α : α ∈ {0, 1}^p} denote the set of all models being considered, whose cardinality is |M| = 2^p.
For instance, an element of M is the m-th model: α^m = (α^m_1, ..., α^m_p).
Let p(α^m) be the prior probability of the m-th model.
Then, the posterior probability of the m-th model is
p(α^m | data) = p(α^m) p(data | α^m) / Σ_{α^i ∈ M} p(α^i) p(data | α^i)
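A small, numerically stable sketch of this normalization, taking log marginal likelihoods and prior model probabilities as given; the three models and their values are hypothetical.

```python
import numpy as np

def posterior_model_probs(log_marglik, prior):
    """p(alpha^m | data) ∝ p(alpha^m) * p(data | alpha^m), normalized over all models in M.
    Both arguments are dicts keyed by the model indicator alpha^m."""
    models = list(log_marglik)
    logs = np.array([np.log(prior[m]) + log_marglik[m] for m in models])
    logs -= logs.max()                    # stabilize before exponentiating
    w = np.exp(logs)
    w /= w.sum()
    return dict(zip(models, w))

# Hypothetical values for three models with equal prior probability
print(posterior_model_probs({"110": -520.3, "111": -521.0, "101": -530.8},
                            {"110": 1 / 3, "111": 1 / 3, "101": 1 / 3}))
```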
Joint Bayesian PS estimation with α unknown
Hence, the posterior distribution of the ACE will be a weighted average of the estimates of the ACE under each model in M:
p(ACE | data) ≈ Σ_{α^m ∈ M} p(ACE_m | α^m, data) p(α^m | data)
where ACE_m = E_{C_{α^m}} {E[Y | X = 1, C_{α^m}] − E[Y | X = 0, C_{α^m}]} and C_{α^m} denotes the subset of C which is included in model α^m.
Remark 5. ACE_m is an estimate of the causal effect if and only if α^m contains the confounders necessary to satisfy the assumption of no unmeasured confounders.
Remark 6. It assumes that each model has equal prior probability, that is, for all possible α: p(α) = 1/|M|
First Stage. Posterior Distributions
Our objective is p(α^{x|c} | Y).
First, we need to compute p(α^{y|c} | Y):
p(α^{y|c} | Y) ∝ L(Y | α^{y|c}) p(α^{y|c})   (4)
L(Y | α^{y|c}) is the marginal likelihood under model α^{y|c} and is equal to:
L(Y | α^{y|c}) = ∫ L(Y | α^{y|c}, η) p(η | α^{y|c}) dη   (5)
where η is the vector of logistic regression parameters of the prognostic score model under model α^{y|c}, with dimension Σ_{k=1}^p α^{y|c}_k
p(η | α^{y|c}) is the prior distribution of the parameter η under model α^{y|c}
L(Y | α^{y|c}, η) is the likelihood for model α^{y|c}, which involves only the prognostic score model
First Stage. Posterior Distributions
L(Y | α^{y|c}) is not analytically tractable and thus we cannot apply MC^3.
We sample from the joint posterior p(α^{y|c}, η | Y) applying the RJMCMC algorithm.
Then we compute the informative prior as follows:
p(α^{x|c} | Y) = Σ_{α^{y|c} ∈ M_{y|c}} p(α^{x|c} | α^{y|c}, Y) p(α^{y|c} | Y)
= Σ_{α^{y|c} ∈ M_{y|c}} p(α^{x|c} | α^{y|c}) p(α^{y|c} | Y)   (6)
where the last equality assumes that α^{x|c} ⊥ Y | α^{y|c}.
First Stage. Posterior Distributions
ω = 1 corresponds to p(α^{x|c} | Y) being an uninformative prior. Why?
ω = 1 ⇒ p(α^{x|c}_k = 0 | α^{y|c}_k = 0) = p(α^{x|c}_k = 1 | α^{y|c}_k = 0) = p(α^{x|c}_k = 0 | α^{y|c}_k = 1) = p(α^{x|c}_k = 1 | α^{y|c}_k = 1) = 1/2 for all k.
So we have that
p(α^{x|c} | α^{y|c}) = Π_{k=1}^p p(α^{x|c}_k | α^{y|c}_k) = 1/2^p
where p is the number of covariates.
Hence,
p(α^{x|c} | Y) = (1/2^p) Σ_{α^{y|c} ∈ M_{y|c}} p(α^{y|c} | Y) = 1/2^p   (7)
since Σ_{α^{y|c} ∈ M_{y|c}} p(α^{y|c} | Y) = 1.
Thus, p(α^{x|c} | Y) carries no outcome information to the second stage.
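A quick numeric check of equation (7): with ω = 1 the mixture collapses to 1/2^p regardless of the first-stage posterior. The p = 4 setting and the random posterior weights below are arbitrary.

```python
from itertools import product
from math import prod
from random import random

p, omega = 4, 1

def cond_prob(ax_k, ay_k):
    if ay_k == 1:
        return 0.5
    return 1 / (1 + omega) if ax_k == 1 else omega / (1 + omega)

# An arbitrary (normalized) first-stage posterior over prognostic-score models
models = list(product([0, 1], repeat=p))
w = [random() for _ in models]
post_y = {m: wi / sum(w) for m, wi in zip(models, w)}

alpha_x = (1, 0, 1, 0)
prior_x = sum(post_y[ay] * prod(cond_prob(ax_k, ay_k) for ax_k, ay_k in zip(alpha_x, ay))
              for ay in models)
print(prior_x, 1 / 2 ** p)   # equal: the prior carries no outcome information when omega = 1
```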
Second Stage.
It is a Bayesian PS estimation stage, based only on the PS and outcome models.
We incorporate an informative prior for the model indicator of the PS model, p(α^{x|c} | Y), which is inherited from the first stage.
The setting of Zigler and Dominici 2014 is a particular case of our setting when ω = 1 ⇒ p(α^{x|c} | Y) = p(α^{x|c}) = 1/2^p.
The main goal of this stage is to estimate the Average Causal Effect (ACE) of treatment, X = 1 vs X = 0.
The posterior distribution of the ACE:
p(ACE | data) ≈ Σ_{α^{x|c} ∈ M_PS} p(ACE_{α^{x|c}} | α^{x|c}, data) p(α^{x|c} | data)   (8)
where, as before, ACE_{α^{x|c}} = E_{C_{α^{x|c}}} {E[Y | X = 1, C_{α^{x|c}}] − E[Y | X = 0, C_{α^{x|c}}]} and C_{α^{x|c}} denotes the subset of C which is included in model α^{x|c}.
Second Stage.
Prior Distributions
Similarly to Zigler and Dominici 2014, we use a flat prior distribution on (β_0, β_X, ξ, δ_{α^{x|c}}, γ_{α^{x|c}}).
In contrast to the previous approach, here p(α^{x|c} | Y) is an informative prior.
Posterior Distributions
We sample from the joint posterior p(α^{x|c}, β_0, β_X, ξ, δ_{α^{x|c}}, γ_{α^{x|c}} | data) applying RJMCMC. This joint posterior is given by:
p(α^{x|c}, β_0, β_X, ξ, δ_{α^{x|c}}, γ_{α^{x|c}} | data) ∝ L(Y, X | α^{x|c}, θ_{α^{x|c}}, C) p(θ_{α^{x|c}} | α^{x|c}) p(α^{x|c} | Y)
where θ_{α^{x|c}} = (β_0, β_X, ξ, δ_{α^{x|c}}, γ_{α^{x|c}}), L(Y, X | α^{x|c}, θ_{α^{x|c}}, C) is the joint likelihood of the PS and outcome models, and p(θ_{α^{x|c}} | α^{x|c}) and p(α^{x|c} | Y) are the prior distributions.
The marginal likelihood under model α^{x|c},
L(Y, X | α^{x|c}, C) = ∫ L(Y, X | α^{x|c}, θ_{α^{x|c}}, C) p(θ_{α^{x|c}} | α^{x|c}) p(α^{x|c} | Y) dθ_{α^{x|c}},
will not have an analytically tractable expression with which to compute p(α^{x|c} | Y, X), the quantity needed to apply MC^3.
RJMCMC
RJMCMC was proposed by Green 1995 as an extension of the Metropolis-Hastings algorithm that allows one to construct a reversible Markov chain that can “jump” between models with parameter spaces of different dimensions (a trans-dimensional Markov chain), retaining the detailed balance condition which guarantees the correct limiting distribution.
The standard Metropolis-Hastings within Gibbs sampling algorithm cannot be applied because when we condition on one model, say α^{x|c}, then (β_0, β_X, ξ, δ_{α^{x|c}}, γ_{α^{x|c}}) ∈ Θ_{α^{x|c}}, but when we condition on (β_0, β_X, ξ, δ_{α^{x|c}}, γ_{α^{x|c}}), then α^{x|c} cannot change and we cannot move between models.
We need to complete the spaces, or supplement each of them with an artificial space, in order to make them compatible. In other words, we need to create a bijection between them.
Outline RJMCMC
Step 1) Update the parameters that are in the current model, for example using the Metropolis-Hastings algorithm.
Step 2.a) Generate a proposed variable j ∈ {1, 2, ..., p} to add to or delete from the model, with probability 1/p. Thus, we propose to change α to α′ where α′_j = 1 − α_j.
Step 2.b) If α_j = 0 → α′_j = 1 (include covariate j in the model):
i) Generate the additional parameter u corresponding to variable j from a proposal density u ∼ ϕ(u)
ii) Set θ_{α′} = (θ_{α′,(−j)}, u_{α′,(j)}), i.e. the current parameters with u in position j
iii) Accept the proposed move with probability
∆{(α, θ_α) → (α′, θ_{α′})} = min{ [L(data | θ_{α′}, α′) p(θ_{α′} | α′) p(α′)] / [L(data | θ_α, α) p(θ_α | α) p(α) ϕ(u)], 1 }
iv) If the proposed move is accepted, update α and θ_α to α′ and θ_{α′}. Otherwise, leave α and θ_α unchanged.
Step 2.c) If α_j = 1 → α′_j = 0 (exclude covariate j from the model):
i) Set θ_{α′} = θ_{α,(−j)}, i.e. the current parameter vector with the j-th component removed
ii) Accept the proposed move with probability
∆{(α, θ_α) → (α′, θ_{α′})} = min{ [L(data | θ_{α′}, α′) p(θ_{α′} | α′) p(α′) ϕ(u)] / [L(data | θ_α, α) p(θ_α | α) p(α)], 1 }
iii) If the proposed move is accepted, update α and θ_α to α′ and θ_{α′}. Otherwise, leave α and θ_α unchanged.
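A self-contained sketch of these two steps for the first-stage prognostic-score model: random-walk Metropolis updates of the active coefficients (Step 1) and an add/delete move with an N(0, 1) proposal ϕ(u) (Step 2). The flat coefficient prior, the proposal choices, and the toy data are assumptions for illustration, not the authors' implementation.

```python
import numpy as np

rng = np.random.default_rng(1)
expit = lambda z: 1.0 / (1.0 + np.exp(-z))

def loglik(Y, C, alpha, eta):
    """Logistic log-likelihood of the prognostic-score model under indicator alpha."""
    mu = expit(C @ (alpha * eta))
    return np.sum(Y * np.log(mu) + (1 - Y) * np.log1p(-mu))

def rjmcmc(Y, C, n_iter=2000, prop_sd=0.2, model_prior=None):
    """Minimal RJMCMC over (alpha^{y|c}, eta) with flat priors on eta and
    an independent N(0, 1) proposal phi(u) for a newly added coefficient."""
    N, p = C.shape
    if model_prior is None:                          # equal prior probability for every model
        model_prior = lambda a: 1.0
    alpha, eta = np.ones(p, dtype=int), np.zeros(p)
    ll, visits = loglik(Y, C, alpha, eta), {}
    for _ in range(n_iter):
        # Step 1: random-walk Metropolis update of the coefficients in the current model
        for j in np.flatnonzero(alpha):
            eta_new = eta.copy()
            eta_new[j] += prop_sd * rng.normal()
            ll_new = loglik(Y, C, alpha, eta_new)
            if np.log(rng.uniform()) < ll_new - ll:
                eta, ll = eta_new, ll_new
        # Step 2: pick j with probability 1/p and propose to flip alpha_j
        j = rng.integers(p)
        alpha_new, eta_new = alpha.copy(), eta.copy()
        alpha_new[j] = 1 - alpha[j]
        if alpha_new[j] == 1:                        # add move: draw u ~ phi = N(0, 1)
            u = rng.normal()
            eta_new[j] = u
            log_phi = -0.5 * u * u - 0.5 * np.log(2 * np.pi)
            log_ratio = (loglik(Y, C, alpha_new, eta_new) - ll
                         + np.log(model_prior(alpha_new)) - np.log(model_prior(alpha))
                         - log_phi)                  # phi(u) in the denominator
        else:                                        # delete move: the dropped coefficient plays the role of u
            u = eta[j]
            eta_new[j] = 0.0
            log_phi = -0.5 * u * u - 0.5 * np.log(2 * np.pi)
            log_ratio = (loglik(Y, C, alpha_new, eta_new) - ll
                         + np.log(model_prior(alpha_new)) - np.log(model_prior(alpha))
                         + log_phi)                  # phi(u) in the numerator
        if np.log(rng.uniform()) < log_ratio:
            alpha, eta = alpha_new, eta_new
            ll = loglik(Y, C, alpha, eta)
        key = tuple(alpha)
        visits[key] = visits.get(key, 0) + 1
    return visits                                    # approximate p(alpha^{y|c} | Y)

# Toy usage: only C_1 and C_2 predict Y
C = rng.normal(size=(500, 3))
Y = rng.binomial(1, expit(1.0 * C[:, 0] - 1.0 * C[:, 1]))
post = rjmcmc(Y, C)
print(sorted(post.items(), key=lambda kv: -kv[1])[:3])
```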
The difference between the first and the second stage in terms of the RJMCMC algorithm lies mainly in:
the likelihood L(data | θ_α, α).
The likelihood of the first stage, L(data | θ_{α^{y|c}}, α^{y|c}), is based only on the prognostic score model.
On the other hand, the likelihood in the second stage, L(data | θ_{α^{x|c}}, α^{x|c}), is based jointly on the PS and outcome models.
the ratio p(α′)/p(α) in the acceptance probability of the proposed move.
In the first stage, and in the second stage for ω = 1, this cancels out since each model has equal prior probability.
On the other hand, this ratio does appear in the acceptance probability in the second stage for ω > 1. This is because p(α′^{x|c} | Y) is an informative prior, and thus the ratio p(α′^{x|c} | Y) / p(α^{x|c} | Y) does not cancel out.
