Monte Carlo methods for some
not-quite-but-almost Bayesian problems
Pierre E. Jacob
Department of Statistics, Harvard University
joint work with
Ruobin Gong, Paul T. Edlefsen, Arthur P. Dempster
John O’Leary, Yves F. Atchadé, Niloy Biswas, Paul Vanetti
and others
November 21, 2019
Department of Statistical Science, University of Toronto
Introduction
A lot of questions in statistics give rise to non-trivial
computational problems.
Among these, some are numerical integration problems ⇔
problems of sampling from probability distributions.
Besag, Markov chain Monte Carlo for statistical inference, 2001.
Computational challenges arise in deviations from standard
Bayesian inference, motivated by three questions,
quantifying ignorance / Dempster–Shafer analysis,
model misspecification / modular Bayesian inference,
robustness to some perturbation of the data / BayesBag.
Outline
1 Dempster–Shafer analysis of count data
2 Unbiased MCMC and diagnostics of convergence
3 Modular Bayesian inference
4 Bagging posterior distributions
Inference with count data
Notation: [N] := {1, . . . , N}. ∆ denotes the probability simplex {θ ∈ R^K : θk ≥ 0, ∑k θk = 1}.
Observations : xn ∈ [K] := {1, . . . , K}, x = (x1, . . . , xN ).
Index sets : Ik = {n ∈ [N] : xn = k}.
Counts : Nk = |Ik|.
Model: xn iid∼ Categorical(θ) with θ = (θk)k∈[K] ∈ ∆,
i.e. P(xn = k) = θk for all n, k.
Goal: estimate θ, predict, etc.
Maximum likelihood estimator: θ̂k = Nk/N.
Bayesian inference combines likelihood with prior on θ into a
posterior distribution, assigning a probability ∈ [0, 1] to any
measurable subset Σ of the simplex ∆.
Arthur Dempster’s approach to inference
Observations x = (xn)n∈[N] are fixed.
We will specify a sampling mechanism, on top of the likelihood,
e.g. xn = m(un, θ) for some function m and random variable un.
We will seek u = (un)n∈[N] that could have generated x for
some θ. For arbitrary u, such a θ might not exist.
If a set of feasible θ exists, denote it by F(u). Dempster’s
approach defines lower/upper probabilities for subsets Σ of
interest, as expectations with respect to non-empty F(u).
Arthur P. Dempster. New methods for reasoning towards posterior
distributions based on sample data. Annals of Mathematical Statistics, 1966.
Arthur P. Dempster. Statistical inference from a Dempster–Shafer
perspective. Past, Present, and Future of Statistical Science, 2014.
Sampling from a Categorical distribution
[Figure: simplex with corners 1, 2, 3, partitioned into subsimplices ∆1(θ), ∆2(θ), ∆3(θ) around an interior point θ.]
Subsimplex ∆k(θ), for θ ∈ ∆:
{z ∈ ∆ : ∀ℓ ∈ [K], zℓ/zk ≥ θℓ/θk}.
Sampling mechanism, for θ ∈ ∆:
- draw un uniform on ∆,
- define xn such that un ∈ ∆xn(θ).
Then P(xn = k) = θk,
because Vol(∆k(θ)) = θk.
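As a quick numerical check of this mechanism, here is a minimal Python sketch (ours, not from the talk's own code). It uses the characterization un ∈ ∆k(θ) ⇔ k = argminℓ un,ℓ/θℓ, which follows from the definition of ∆k(θ), and verifies empirically that P(xn = k) = θk.

```python
# Minimal sanity check of the mechanism (ours, not from the talk): u lies in
# Delta_k(theta) iff u_l/u_k >= theta_l/theta_k for all l, i.e. iff
# k = argmin_l u_l / theta_l; empirical frequencies should approach theta.
import numpy as np

rng = np.random.default_rng(1)
theta = np.array([0.5, 0.3, 0.2])
K = len(theta)

u = rng.dirichlet(np.ones(K), size=100_000)  # uniform draws on the simplex
x = np.argmin(u / theta, axis=1)             # x_n = m(u_n, theta)

print(np.bincount(x, minlength=K) / len(x))  # close to theta
```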
Draws in the simplex
Counts: (2, 3, 1). Let’s draw N = 6 uniform samples on ∆.
Draws in the simplex
Each un is associated with an observed xn ∈ {1, 2, 3}.
Draws in the simplex
If there exists a feasible θ, it cannot be just anywhere.
Draws in the simplex
The uns of each category add constraints on θ.
Draws in the simplex
Overall the constraints define a polytope for θ, or an empty set.
Draws in the simplex
Here, there is a polytope of θ such that ∀n ∈ [N] un ∈ ∆xn(θ).
Draws in the simplex
Any θ in the polytope separates the uns appropriately.
Draws in the simplex
Let’s try again with fresh uniform samples on ∆.
Draws in the simplex
Here there is no θ ∈ ∆ such that ∀n ∈ [N] un ∈ ∆xn(θ).
Lower and upper probabilities
Consider the set
Rx = {(u1, . . . , uN) ∈ ∆N : ∃θ ∈ ∆, ∀n ∈ [N], un ∈ ∆xn(θ)},
and denote by νx the uniform distribution on Rx.
For u ∈ Rx, there is a set F(u) = {θ ∈ ∆ : ∀n, un ∈ ∆xn(θ)}.
For a set Σ ⊂ ∆ of interest, define
(lower probability) P(Σ) = ∫ 1(F(u) ⊂ Σ) νx(du),
(upper probability) P̄(Σ) = ∫ 1(F(u) ∩ Σ ≠ ∅) νx(du).
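Given draws u(1), . . . , u(M) from νx, the indicators inside these integrals can be evaluated by linear programming, since F(u) is a polytope in θ. The sketch below is our illustration: the data, the single feasible u (built by fixing a θ and sampling inside subsimplices, so it is not νx-distributed), and the choice Σ = {θ : θ1 ≥ c} are all assumptions made for runnability.

```python
# Sketch: for Sigma = {theta : theta_1 >= c}, 1(F(u) in Sigma) and
# 1(F(u) meets Sigma) reduce to the min and max of theta_1 over the
# polytope F(u), two linear programs.
import numpy as np
from scipy.optimize import linprog

rng = np.random.default_rng(2)
K, c = 3, 0.25
theta0 = np.array([0.4, 0.4, 0.2])
x = np.array([0]*2 + [1]*3 + [2]*1)          # counts (2, 3, 1), 0-indexed

def draw_in_subsimplex(k):
    # vertices of Delta_k(theta0): theta0 and the corners e_l, l != k
    V = np.vstack([theta0] + [np.eye(K)[l] for l in range(K) if l != k])
    return rng.dirichlet(np.ones(K)) @ V     # uniform on the subsimplex

u = np.array([draw_in_subsimplex(k) for k in x])   # one point of R_x

# linear constraints of F(u): u_{n,k} theta_l - u_{n,l} theta_k <= 0, k = x_n
A_ub = []
for n, k in enumerate(x):
    for l in range(K):
        if l == k:
            continue
        row = np.zeros(K); row[l] = u[n, k]; row[k] = -u[n, l]
        A_ub.append(row)
A_ub, b_ub = np.array(A_ub), np.zeros(len(A_ub))
A_eq, b_eq = np.ones((1, K)), np.array([1.0])

lo = linprog(np.eye(K)[0], A_ub, b_ub, A_eq, b_eq, bounds=(0, 1)).fun
hi = -linprog(-np.eye(K)[0], A_ub, b_ub, A_eq, b_eq, bounds=(0, 1)).fun
print(f"theta_1 over F(u): [{lo:.3f}, {hi:.3f}];",
      f"counts toward lower prob: {lo >= c}; toward upper prob: {hi >= c}")
```

Averaging these indicators over νx-draws of u gives Monte Carlo estimates of P(Σ) and P̄(Σ).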
Summary and Monte Carlo problem
Arthur Dempster’s approach, later called Dempster–Shafer
theory of belief functions, is based on a distribution of
feasible sets,
F(u) = {θ ∈ ∆ : ∀n ∈ [N], un ∈ ∆xn(θ)},
where u ∼ νx, the uniform distribution on Rx.
How do we obtain samples from this distribution?
Rejection sampling? The rejection rate is about 99% for data (2, 3, 1).
Hit-and-run algorithm?
Our proposed strategy is a Gibbs sampler. Starting from
some u ∈ Rx, we will iteratively refresh some components
un of u given others.
Gibbs sampler: initialization
We can obtain some u in Rx as follows.
Choose an arbitrary θ ∈ ∆.
For all n ∈ [N] sample un uniformly in ∆k(θ) where xn = k.
[Figure: simplex partitioned at an arbitrary θ, with each un drawn inside the subsimplex ∆xn(θ).]
To sample components un given others, we will express
Rx = {u : ∃θ ∈ ∆, ∀n ∈ [N], un ∈ ∆xn(θ)}
in terms of relations that the components un must satisfy
with respect to one another.
Equivalent representation
For any θ ∈ ∆,
∀k ∈ [K] ∀n ∈ Ik : un ∈ ∆k(θ)
⇔ ∀k ∈ [K] ∀n ∈ Ik ∀ℓ ∈ [K] : un,ℓ/un,k ≥ θℓ/θk.
This is equivalent to
∀k, ℓ ∈ [K] : min_{n∈Ik} un,ℓ/un,k ≥ θℓ/θk.
Linear constraints
Counts: (9, 8, 3), u in Rx.
The values ηk→ℓ = min_{n∈Ik} un,ℓ/un,k define linear constraints on θ.
[Figure: simplex with the 20 draws un; the lines θ3/θ1 = η1→3 and θ2/θ1 = η1→2 are linear constraints on θ.]
Some inequalities
Next, assume u ∈ Rx, write ηk→ℓ = min_{n∈Ik} un,ℓ/un,k, and
consider some implications.
There exists θ ∈ ∆ such that θℓ/θk ≤ ηk→ℓ for all k, ℓ ∈ [K].
Then, for all k, ℓ,
θℓ/θk ≤ ηk→ℓ and θk/θℓ ≤ ηℓ→k, thus ηk→ℓ ηℓ→k ≥ 1.
More inequalities
We can continue, if K ≥ 3: for all k, ℓ, j,
ηℓ→k^{-1} ≤ θℓ/θk = (θℓ/θj)(θj/θk) ≤ ηj→ℓ ηk→j,
thus ηk→j ηj→ℓ ηℓ→k ≥ 1.
And if K ≥ 4, for all k, ℓ, j, m,
ηk→j ηj→ℓ ηℓ→m ηm→k ≥ 1.
Generally,
∀L ∈ [K] ∀j1, . . . , jL ∈ [K] : ηj1→j2 ηj2→j3 · · · ηjL→j1 ≥ 1.
Main result
So far: if ∃θ ∈ ∆ such that θℓ/θk ≤ ηk→ℓ for all k, ℓ ∈ [K], then
∀L ∈ [K] ∀j1, . . . , jL ∈ [K] : ηj1→j2 ηj2→j3 · · · ηjL→j1 ≥ 1.
The reverse implication holds too.
This means
Rx = {u : ∃θ ∈ ∆, ∀k, ℓ ∈ [K], θℓ/θk ≤ ηk→ℓ}
= {u : ∀L ∈ [K] ∀j1, . . . , jL ∈ [K], ηj1→j2 ηj2→j3 · · · ηjL→j1 ≥ 1},
i.e. Rx is characterized by relations between the components (un).
This helps computing conditional distributions under νx,
leading to a Gibbs sampler.
Some remarks on these inequalities
∀L ∈ [K] ∀j1, . . . , jL ∈ [K] : ηj1→j2 ηj2→j3 · · · ηjL→j1 ≥ 1.
We can consider only distinct indices j1, . . . , jL,
since the other cases can be deduced from those.
Example: η1→2η2→4η4→3η3→2η2→1 ≥ 1,
follows from η1→2η2→1 ≥ 1 and η2→4η4→3η3→2 ≥ 1.
The indices j1 → j2 → · · · → jL → j1 form a cycle.
Graphs
Fully connected graph with weight log ηk→ℓ on edge (k, ℓ).
[Figure: complete directed graph on vertices 1, 2, 3, with e.g. log(η1→2) and log(η2→1) on the two edges between 1 and 2.]
Value of a path = sum of the weights along the path.
Negative cycle = path from a vertex to itself with negative value.
Graphs
∀L ∀j1, . . . , jL : ηj1→j2 · · · ηjL→j1 ≥ 1
⇔ ∀L ∀j1, . . . , jL : log(ηj1→j2) + · · · + log(ηjL→j1) ≥ 0
⇔ there are no negative cycles in the graph.
Proof
Proof of claim: “inequalities” ⇒ “∃θ : θℓ/θk ≤ ηk→ℓ ∀k, ℓ”.
Let min(k → ℓ) := minimum value of a path from k to ℓ in the graph.
It is finite for all k, ℓ because the graph has no negative cycles.
Define θ via θk ∝ exp(min(K → k)).
Then θ ∈ ∆. Furthermore, for all k, ℓ,
min(K → ℓ) ≤ min(K → k) + log(ηk→ℓ),
therefore θℓ/θk ≤ ηk→ℓ.
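The proof is constructive and suggests the computational step used later: run Bellman–Ford on the weighted graph and, if no negative cycle is detected, read off a feasible θ. A minimal sketch, with a hypothetical η matrix of our own choosing:

```python
# Sketch of the constructive step: Bellman-Ford shortest paths on the complete
# graph with weights log(eta[k, l]) on edge (k, l); absent negative cycles,
# theta_k proportional to exp(min(K -> k)) satisfies theta_l/theta_k <= eta_{k->l}.
import numpy as np

def shortest_paths_from(w, source):
    """Bellman-Ford from `source` on dense weights w (w[i, j] on edge i -> j).
    Returns distances, or None if a negative cycle is detected."""
    K = w.shape[0]
    dist = np.full(K, np.inf); dist[source] = 0.0
    for _ in range(K - 1):                        # relax all edges K-1 times
        dist = np.minimum(dist, (dist[:, None] + w).min(axis=0))
    if np.any((dist[:, None] + w).min(axis=0) < dist - 1e-12):
        return None                               # still improvable: negative cycle
    return dist

eta = np.array([[1.0, 2.0, 1.5],
                [0.8, 1.0, 0.9],
                [1.0, 1.2, 1.0]])                 # hypothetical eta[k, l] = eta_{k->l}
w = np.log(eta)

dist = shortest_paths_from(w, source=w.shape[0] - 1)  # min(K -> k) for all k
if dist is not None:
    theta = np.exp(dist); theta /= theta.sum()
    print("feasible theta:", theta)
```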
So far. . .
We want to sample uniformly on the set Rx,
Rx = {u : ∃θ ∈ ∆, ∀k, ℓ ∈ [K], θℓ/θk ≤ ηk→ℓ}.
We have proved that this set can also be written
{u : ∀L ∈ [K] ∀j1, . . . , jL ∈ [K], ηj1→j2 ηj2→j3 · · · ηjL→j1 ≥ 1}.
The inequalities hold if and only if the graph with weight
log ηk→ℓ on edge (k, ℓ) does not contain negative cycles.
Conditional distributions
We can obtain the conditional distributions of un for n ∈ Ik given
(un)n∉Ik with respect to νx:
un given (un)n∉Ik are i.i.d. uniform in ∆k(θ⋆),
where θ⋆ℓ ∝ exp(−min(ℓ → k)) for all ℓ,
with min(ℓ → k) := minimum value of a path from ℓ to k.
Shortest paths can be computed in polynomial time.
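The conditional refresh then only requires uniform draws on a subsimplex. A small sketch, using the facts that ∆k(θ⋆) is the simplex with vertices θ⋆ and the corners eℓ, ℓ ≠ k (which follows from the definition of ∆k), and that the uniform distribution on a simplex is the image of Dirichlet(1, . . . , 1) weights on its vertices:

```python
# Sketch: uniform draw on the subsimplex Delta_k(theta*), via Dirichlet
# weights on its vertices {theta*} and {e_l : l != k}.
import numpy as np

rng = np.random.default_rng(3)

def runif_subsimplex(theta_star, k, rng):
    K = len(theta_star)
    vertices = np.vstack([theta_star] +
                         [np.eye(K)[l] for l in range(K) if l != k])
    return rng.dirichlet(np.ones(K)) @ vertices

theta_star = np.array([0.5, 0.3, 0.2])
u = runif_subsimplex(theta_star, k=0, rng=rng)
# check: u is in Delta_0(theta*), i.e. argmin_l u_l / theta*_l == 0
print(u, np.argmin(u / theta_star) == 0)
```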
Conditional distributions
Counts: (9, 8, 3). What is the conditional distribution of
(un)n∈Ik given (un)n∉Ik under νx?
[Figure: the 20 draws un in the simplex; the components of one category are refreshed given the others.]
Gibbs sampler
Initial u(0) ∈ Rx.
At each iteration t ≥ 1, for each category k ∈ [K],
1 compute θ⋆ such that, for n ∈ Ik,
un given the other components is uniform on ∆k(θ⋆),
2 draw u(t)n uniformly on ∆k(θ⋆) for n ∈ Ik,
3 update η(t)k→ℓ for ℓ ∈ [K].
In step 1, θ⋆ is obtained by computing shortest paths in the graph
with weights log η(t)k→ℓ on edges (k, ℓ).
Computed e.g. with the Bellman–Ford algorithm, implemented in
Csárdi & Nepusz, igraph package, 2006.
Alternatively, we can compute θ⋆ by solving a linear program,
Berkelaar, Eikland & Notebaert, lpsolve package, 2004.
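Putting the pieces together, here is a from-scratch sketch of the sweep (our illustration; the authors' reference implementation is the dempsterpolytope R package linked on the next slides):

```python
# Sketch of full Gibbs sweeps for Dempster's analysis of count data.
import numpy as np

rng = np.random.default_rng(4)
counts = (2, 3, 1)
K = len(counts)
x = np.repeat(np.arange(K), counts)          # observations, 0-indexed

def runif_subsimplex(theta, k):
    """Uniform draw on Delta_k(theta) via Dirichlet weights on its vertices."""
    V = np.vstack([theta] + [np.eye(K)[l] for l in range(K) if l != k])
    return rng.dirichlet(np.ones(K)) @ V

def eta_matrix(u):
    """eta[k, l] = min over n in I_k of u[n, l] / u[n, k]."""
    eta = np.ones((K, K))
    for k in range(K):
        rows = u[x == k]
        eta[k] = (rows / rows[:, [k]]).min(axis=0)
    return eta

def shortest_to(w, k):
    """Bellman-Ford distances min(l -> k), dense weights w[i, j] on edge i -> j."""
    dist = w[:, k].copy(); dist[k] = 0.0
    for _ in range(K - 1):
        dist = np.minimum(dist, (w + dist[None, :]).min(axis=1))
    return dist

# initialization: arbitrary theta, then u(0) in R_x by construction
u = np.array([runif_subsimplex(np.full(K, 1.0 / K), k) for k in x])

eta_trace = []
for t in range(100):                         # 100 full sweeps
    for k in range(K):
        w = np.log(eta_matrix(u))
        theta_star = np.exp(-shortest_to(w, k))   # theta*_l prop. to exp(-min(l -> k))
        theta_star /= theta_star.sum()
        for n in np.flatnonzero(x == k):     # refresh category k's components
            u[n] = runif_subsimplex(theta_star, k)
    eta_trace.append(eta_matrix(u))          # constraints defining F(u(t))

print("final eta matrix:\n", eta_trace[-1])
```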
Gibbs sampler
Counts: (9, 8, 3), 100 polytopes generated by the sampler.
Cost per iteration
Cost in seconds for 100 full sweeps.
[Figure: elapsed seconds (up to about 1) against K ∈ {4, 8, 12, 16}, one curve per N ∈ {256, 512, 1024, 2048}.]
https://github.com/pierrejacob/dempsterpolytope
Cost per iteration
Cost in seconds for 100 full sweeps.
[Figure: elapsed seconds against N ∈ {256, 512, 1024, 2048}, one curve per K ∈ {4, 8, 12, 16}.]
https://github.com/pierrejacob/dempsterpolytope
How many iterations for convergence?
Let ν(t) be the distribution of u(t) after t iterations.
TV(ν(t), νx) = supA |ν(t)(A) − νx(A)|.
[Figure: TV upper bounds against iteration (0 to 100), one curve per K ∈ {5, 10, 20}.]
How many iterations for convergence?
Let ν(t) be the distribution of u(t) after t iterations.
TV(ν(t), νx) = supA |ν(t)(A) − νx(A)|.
[Figure: TV upper bounds against iteration (0 to 200), one curve per N ∈ {50, 100, 150, 200}.]
Summary
A Gibbs sampler can be used to approximate lower and upper
probabilities in the Dempster–Shafer framework.
Is perfect sampling possible here?
Extensions for hierarchical counts, hidden Markov models?
Jacob, Gong, Edlefsen & Dempster, A Gibbs sampler for a class of
random convex polytopes. On arXiv and researchers.one.
https://github.com/pierrejacob/dempsterpolytope
Outline
1 Dempster–Shafer analysis of count data
2 Unbiased MCMC and diagnostics of convergence
3 Modular Bayesian inference
4 Bagging posterior distributions
Coupled chains
Glynn & Rhee, Exact estimation for Markov chain equilibrium expectations, 2014.
Generate two chains (Xt) and (Yt), going to π, as follows:
sample X0 and Y0 from π0 (independently, or not),
sample Xt|Xt−1 ∼ P(Xt−1, ·) for t = 1, . . . , L,
for t ≥ L + 1, sample
(Xt, Yt−L)|(Xt−1, Yt−L−1) ∼ P̄((Xt−1, Yt−L−1), ·).
P̄ must be such that
Xt+1|Xt ∼ P(Xt, ·) and Yt|Yt−1 ∼ P(Yt−1, ·)
(thus Xt and Yt have the same distribution for all t ≥ 0),
there exists a random time τ such that Xt = Yt−L for t ≥ τ
(the chains meet and remain “faithful”).
Coupled chains
[Figure: traces of the two coupled chains against iteration, meeting within 200 iterations.]
π = N(0, 1), RWMH with Normal proposal std = 0.5, π0 = N(10, 3²)
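Below is a self-contained sketch of one way to build such a P̄ for this example (π = N(0, 1), Normal random-walk proposals with std 0.5, π0 = N(10, 3²)): propose from a maximal coupling of the two Normal proposals and share the accept-reject uniform. The coupling details are our illustration, not necessarily those used for the figure.

```python
# Sketch: lag-L coupling of two random-walk MH chains targeting N(0, 1).
import numpy as np

rng = np.random.default_rng(5)
SIGMA = 0.5
log_pi = lambda z: -0.5 * z**2                     # N(0, 1), up to constants

def norm_logpdf(z, mu):
    return -0.5 * ((z - mu) / SIGMA)**2            # shared constants cancel

def coupled_proposals(mu1, mu2):
    """Maximal coupling of N(mu1, SIGMA^2) and N(mu2, SIGMA^2)."""
    xp = rng.normal(mu1, SIGMA)
    if np.log(rng.uniform()) + norm_logpdf(xp, mu1) <= norm_logpdf(xp, mu2):
        return xp, xp                              # proposals coincide
    while True:
        yp = rng.normal(mu2, SIGMA)
        if np.log(rng.uniform()) + norm_logpdf(yp, mu2) > norm_logpdf(yp, mu1):
            return xp, yp

def coupled_rwmh(L=1, max_iter=10_000):
    X = [rng.normal(10.0, 3.0)]; Y = [rng.normal(10.0, 3.0)]  # pi_0 draws
    for _ in range(L):                             # advance X alone, L steps
        xp = rng.normal(X[-1], SIGMA)
        X.append(xp if np.log(rng.uniform()) < log_pi(xp) - log_pi(X[-1]) else X[-1])
    for t in range(L + 1, max_iter):
        xp, yp = coupled_proposals(X[-1], Y[-1])
        logu = np.log(rng.uniform())               # shared accept variable
        X.append(xp if logu < log_pi(xp) - log_pi(X[-1]) else X[-1])
        Y.append(yp if logu < log_pi(yp) - log_pi(Y[-1]) else Y[-1])
        if X[-1] == Y[-1]:                         # met; chains remain faithful
            return np.array(X), np.array(Y), t     # t = meeting time tau
    raise RuntimeError("no meeting before max_iter")

X, Y, tau = coupled_rwmh(L=1)
print("meeting time:", tau)
```

Once the chains have proposed and accepted the same value, the maximal coupling keeps them together, so Xt = Yt−L for all t ≥ τ.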
Unbiased estimators
Under some conditions, the estimator
(m − k + 1)⁻¹ ∑_{t=k}^{m} h(Xt)
+ (m − k + 1)⁻¹ ∑_{t=k+L}^{τ−1} min(m − k + 1, ⌊(t − k)/L⌋) (h(Xt) − h(Yt−L)),
has expectation ∫ h(x)π(dx), finite cost and finite variance.
“MCMC estimator + bias correction terms”
Its efficiency can be close to that of MCMC estimators,
if k, m (and L) are chosen appropriately.
Jacob, O’Leary & Atchadé, Unbiased MCMC with couplings, 2019.
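In code, given coupled trajectories and the meeting time from a coupling like the one above (with X kept for at least max(m, τ − 1) + 1 steps), the estimator is a few lines; the arrays at the end are toy placeholders just to make the snippet run.

```python
# Sketch: the unbiased estimator H_{k:m} from coupled trajectories X, Y,
# lag L, and meeting time tau (X_t = Y_{t-L} for t >= tau).
import numpy as np

def H_km(h, X, Y, k, m, L, tau):
    mcmc_avg = np.mean([h(X[t]) for t in range(k, m + 1)])
    correction = 0.0
    for t in range(k + L, tau):              # bias correction terms
        weight = min(m - k + 1, (t - k) // L) / (m - k + 1)
        correction += weight * (h(X[t]) - h(Y[t - L]))
    return mcmc_avg + correction

X, Y, tau = np.zeros(201), np.zeros(200), 60   # toy: chains already met
print(H_km(lambda z: z**2, X, Y, k=50, m=200, L=1, tau=tau))
```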
Finite-time bias of MCMC
Total variation distance between Xt ∼ πt and π = limt→∞ πt:
‖πt − π‖TV ≤ E[max(0, ⌈(τ − L − t)/L⌉)].
[Figure: estimated distribution of τ − lag (lag = 1), and the resulting TV upper bounds against iteration, log scale.]
Biswas, Jacob & Vanetti, Estimating Convergence of Markov chains
with L-Lag Couplings, 2019.
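Estimating the right-hand side is straightforward given independent meeting times from repeated L-lag couplings; a sketch (the meeting times below are made up):

```python
# Sketch: empirical version of the TV bound, averaging over i.i.d. meeting
# times tau from repeated L-lag couplings.
import numpy as np

def tv_upper_bound(taus, L, t):
    taus = np.asarray(taus, dtype=float)
    return np.mean(np.maximum(0.0, np.ceil((taus - L - t) / L)))

taus = [60, 72, 55, 90, 64]                  # hypothetical meeting times
print([tv_upper_bound(taus, L=1, t=t) for t in (0, 25, 50, 75, 100)])
```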
Finite-time bias of MCMC
Total variation distance between Xt ∼ πt and π = limt→∞ πt:
‖πt − π‖TV ≤ E[max(0, ⌈(τ − L − t)/L⌉)].
[Figure: same display with lag = 50.]
Biswas, Jacob & Vanetti, Estimating Convergence of Markov chains
with L-Lag Couplings, 2019.
Finite-time bias of MCMC
Total variation distance between Xt ∼ πt and π = limt→∞ πt:
‖πt − π‖TV ≤ E[max(0, ⌈(τ − L − t)/L⌉)].
[Figure: same display with lag = 100.]
Biswas, Jacob & Vanetti, Estimating Convergence of Markov chains
with L-Lag Couplings, 2019.
Finite-time bias of MCMC
Upper bounds can also be obtained for e.g. 1-Wasserstein.
And perhaps lower bounds?
Applicable in e.g. high-dimensional and/or discrete spaces.
Biswas, Jacob & Vanetti, Estimating Convergence of Markov chains
with L-Lag Couplings, 2019.
Finite-time bias of MCMC
Example: Gibbs sampler for Dempster’s analysis of counts.
[Figure: TV upper bounds against iteration, one curve per N ∈ {50, 100, 150, 200}.]
This quantifies bias of MCMC estimators, not variance.
Outline
1 Dempster–Shafer analysis of count data
2 Unbiased MCMC and diagnostics of convergence
3 Modular Bayesian inference
4 Bagging posterior distributions
Models made of modules
First module:
parameter θ1, data Y1
prior: p1(θ1)
likelihood: p1(Y1|θ1)
Second module:
parameter θ2, data Y2
prior: p2(θ2|θ1)
likelihood: p2(Y2|θ1, θ2)
We are interested in the estimation of θ1, θ2 or both.
Joint model approach
Parameter (θ1, θ2), with prior
p(θ1, θ2) = p1(θ1)p2(θ2|θ1).
Data (Y1, Y2), likelihood
p(Y1, Y2|θ1, θ2) = p1(Y1|θ1)p2(Y2|θ1, θ2).
Posterior distribution
π(θ1, θ2|Y1, Y2) ∝ p1(θ1) p1(Y1|θ1) p2(θ2|θ1) p2(Y2|θ1, θ2).
Joint model approach
In the joint model approach, all data are used to
simultaneously infer all parameters. . .
. . . so that uncertainty about θ1 is propagated to the
estimation of θ2. . .
. . . but misspecification of the 2nd module can damage the
estimation of θ1.
What about allowing uncertainty propagation, but
preventing feedback of some modules on others?
Cut distribution
One might want to propagate uncertainty without allowing
“feedback” of second module on first module.
Cut distribution:
πcut(θ1, θ2; Y1, Y2) = p1(θ1|Y1) p2(θ2|θ1, Y2).
Different from the posterior distribution under joint model,
under which the first marginal is π(θ1|Y1, Y2).
Example: epidemiological study
Model of virus prevalence
∀i = 1, . . . , I Zi ∼ Binomial(Ni, ϕi),
Zi is number of women infected with high-risk HPV in a
sample of size Ni in country i.
Beta(1,1) prior on each ϕi, independently.
Impact of prevalence onto cervical cancer occurrence
∀i = 1, . . . , I Yi ∼ Poisson(λiTi), log(λi) = θ2,1 + θ2,2ϕi,
Yi is number of cancer cases arising from Ti woman-years of
follow-up in country i.
N(0, 10³) on θ2,1, θ2,2, independently.
Plummer, Cuts in Bayesian graphical models, 2014.
Jacob, Holmes, Murray, Robert & Nicholson, Better together?
Statistical learning in models made of modules.
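A sketch of how the cut distribution can be approximated in this example. The first module is conjugate, so θ1 = ϕ is drawn exactly; each θ2 given ϕ is then obtained by a short random-walk MH run, which only approximates the cut distribution for finite run lengths. The data arrays below are hypothetical placeholders, not Plummer's actual data.

```python
# Sketch of cut-distribution sampling for the HPV / cancer model.
import numpy as np

rng = np.random.default_rng(6)
Z = np.array([7, 6, 10, 14])                 # infected women (hypothetical)
Nn = np.array([111, 71, 162, 188])           # sample sizes (hypothetical)
Y = np.array([16, 215, 362, 97])             # cancer cases (hypothetical)
T = np.array([26_983, 250_930, 829_348, 157_775])  # woman-years (hypothetical)

def log_p2(th, phi):
    """log p2(Y | phi, theta2) + log prior, up to constants."""
    log_lam = th[0] + th[1] * phi
    loglik = np.sum(Y * (log_lam + np.log(T)) - np.exp(log_lam) * T)
    return loglik - np.sum(th**2) / (2 * 1e3)      # N(0, 10^3) priors

def cut_draw(n_mh=500, step=0.1):
    phi = rng.beta(1 + Z, 1 + Nn - Z)        # exact draw from p1(phi | Z)
    th = np.array([-2.0, 10.0])              # arbitrary starting point
    lp = log_p2(th, phi)
    for _ in range(n_mh):                    # MH targeting p2(theta2 | phi, Y)
        prop = th + step * rng.standard_normal(2)
        lpp = log_p2(prop, phi)
        if np.log(rng.uniform()) < lpp - lp:
            th, lp = prop, lpp
    return th

samples = np.array([cut_draw() for _ in range(100)])
print("cut means of (theta_{2,1}, theta_{2,2}):", samples.mean(axis=0))
```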
Monte Carlo with joint model approach
Joint model posterior has density
π(θ1, θ2|Y1, Y2) ∝ p1(θ1) p1(Y1|θ1) p2(θ2|θ1) p2(Y2|θ1, θ2).
The computational complexity typically grows
super-linearly with the number of modules.
Difficulties stack up. . .
intractability, multimodality, ridges, etc.
Monte Carlo with cut distribution
The cut distribution is defined as
πcut(θ1, θ2; Y1, Y2) = p1(θ1|Y1) p2(θ2|θ1, Y2) ∝ π(θ1, θ2|Y1, Y2) / p2(Y2|θ1).
The denominator is the feedback of the 2nd module on θ1:
p2(Y2|θ1) = ∫ p2(Y2|θ1, θ2) p2(dθ2|θ1).
The feedback term is typically intractable.
Monte Carlo with cut distribution
WinBUGS’ approach via the cut function: alternate between
sampling θ1 from K1(θ1 → dθ1), targeting p1(dθ1|Y1);
sampling θ2 from K2,θ1(θ2 → dθ2), targeting p2(dθ2|θ1, Y2).
This does not leave the cut distribution invariant!
Iterating the kernel K2,θ1 enough times mitigates the issue.
Plummer, Cuts in Bayesian graphical models, 2014.
Monte Carlo with cut distribution
In a perfect world, we could sample i.i.d.
θ1^(i) from p1(θ1|Y1),
θ2^(i) given θ1^(i) from p2(θ2|θ1^(i), Y2),
then (θ1^(i), θ2^(i)) would be i.i.d. from the cut distribution.
Monte Carlo with cut distribution
In an MCMC world, we can sample
θ1^(i) approximately from p1(θ1|Y1) using MCMC,
θ2^(i) given θ1^(i) approximately from p2(θ2|θ1^(i), Y2) using MCMC,
then the resulting samples approximate the cut distribution,
in the limit of the numbers of iterations at both stages.
Monte Carlo with cut distribution
In an unbiased MCMC world, we can approximate expectations
∫ h(x)π(dx) without bias, in finite compute time.
We can obtain an unbiased approximation of p1(θ1|Y1), and for
each θ1, an unbiased approximation of p2(θ2|θ1, Y2).
Thus, by the tower property, we can unbiasedly estimate
∫∫ h(θ1, θ2) p2(dθ2|θ1, Y2) p1(dθ1|Y1).
Jacob, O’Leary & Atchadé, Unbiased MCMC with couplings, 2019.
Example: epidemiological study
[Figure: marginal densities of θ2,1 (range about −2.5 to −1.5) and θ2,2 (range about 10 to 25).]
Approximation of the marginals of the cut distribution of
(θ2,1, θ2,2), the parameters of the Poisson regression module in
the epidemiological model of Plummer (2014).
Jacob, Holmes, Murray, Robert & Nicholson, Better together?
Statistical learning in models made of modules.
Outline
1 Dempster–Shafer analysis of count data
2 Unbiased MCMC and diagnostics of convergence
3 Modular Bayesian inference
4 Bagging posterior distributions
Bagging posterior distributions
We can stabilize the posterior distribution by using a
bootstrap and aggregation scheme, in the spirit of bagging
(Breiman, 1996b). In a nutshell, denote by D′ a bootstrap
or subsample of the data D. The posterior of the random
parameters θ given the data D has c.d.f. F(·|D), and we
can stabilize this using
FBayesBag(·|D) = E[F(·|D′)],
where E is with respect to the bootstrap- or subsampling
scheme. We call it the BayesBag estimator. It can be
approximated by averaging over B posterior computations
for bootstrap- or subsamples, which might be a rather
demanding task (although say B = 10 would already stabilize
to a certain extent).
Bühlmann, Discussion of Big Bayes Stories and BayesBag, 2014.
Bagging posterior distributions
For b = 1, . . . , B
Sample data set D(b) by bootstrapping from D.
Obtain MCMC approximation π̂(b) of the posterior given D(b).
Finally obtain B^{-1} ∑_{b=1}^{B} π̂(b).
Converges to “BayesBag” distribution as both B and number of
MCMC samples go to infinity.
If we can obtain an unbiased approximation of the posterior given
any D, the resulting approximation of “BayesBag” would be
consistent as B → ∞ alone.
Exactly the same reasoning as for the cut distribution.
Example at https://statisfaction.wordpress.com/2019/10/02/bayesbag-and-how-to-approximate-it/
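A sketch of the procedure on a toy model where the posterior given any data set is available in closed form (Normal location model with a flat prior, an assumption of ours for runnability); in general each bootstrap replicate requires its own MCMC run.

```python
# Sketch of BayesBag: y_i ~ N(theta, 1) with a flat prior, so the posterior
# given a data set d is N(mean(d), 1/len(d)); bootstrap B data sets, draw
# from each posterior, and pool the draws.
import numpy as np

rng = np.random.default_rng(7)
y = rng.normal(1.0, 1.0, size=50)            # observed data (simulated here)
B, draws_per_fit = 100, 500

pooled = []
for _ in range(B):
    d = rng.choice(y, size=len(y), replace=True)       # bootstrap data set
    # exact posterior sampling here; in general this is an MCMC run per d
    pooled.append(rng.normal(d.mean(), 1/np.sqrt(len(d)), size=draws_per_fit))
pooled = np.concatenate(pooled)              # approximates the BayesBag c.d.f.

print("posterior mean:", y.mean(), "BayesBag mean:", pooled.mean())
```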
Discussion
Some existing alternatives to standard Bayesian inference
are well motivated, but raise computational questions.
There are on-going efforts toward scalable Monte Carlo
methods, e.g. using coupled Markov chains or regeneration
techniques, in addition to sustained search for new MCMC
algorithms.
Quantification of variance is commonly done; quantification
of bias is also possible.
What makes a computational method convenient? It does
not seem to be entirely about asymptotic efficiency when the
method is optimally tuned.
Thank you for listening!
Funding provided by the National Science Foundation,
grants DMS-1712872 and DMS-1844695.
References
Practical couplings in the literature. . .
Propp & Wilson, Exact sampling with coupled Markov chains
and applications to statistical mechanics, Random Structures &
Algorithms, 1996.
Johnson, Studying convergence of Markov chain Monte Carlo
algorithms using coupled sample paths, JASA, 1996.
Neal, Circularly-coupled Markov chain sampling, UoT tech
report, 1999.
Glynn & Rhee, Exact estimation for Markov chain equilibrium
expectations, Journal of Applied Probability, 2014.
Agapiou, Roberts & Vollmer, Unbiased Monte Carlo: posterior
estimation for intractable/infinite-dimensional models, Bernoulli,
2018.
References
Finite-time bias of MCMC. . .
Brooks & Roberts, Assessing convergence of Markov chain
Monte Carlo algorithms, STCO, 1998.
Cowles & Rosenthal, A simulation approach to convergence rates
for Markov chain Monte Carlo algorithms, STCO, 1998.
Johnson, Studying convergence of Markov chain Monte Carlo
algorithms using coupled sample paths, JASA, 1996.
Gorham, Duncan, Vollmer & Mackey, Measuring Sample Quality
with Diffusions, AAP, 2019.
References
Own work. . .
with John O’Leary, Yves F. Atchadé
Unbiased Markov chain Monte Carlo with couplings, 2019.
with Fredrik Lindsten, Thomas Schön
Smoothing with Couplings of Conditional Particle Filters, 2019.
with Jeremy Heng
Unbiased Hamiltonian Monte Carlo with couplings, 2019.
with Lawrence Middleton, George Deligiannidis, Arnaud
Doucet
Unbiased Markov chain Monte Carlo for intractable target
distributions, 2019.
Unbiased Smoothing using Particle Independent
Metropolis-Hastings, 2019.
References
with Maxime Rischard, Natesh Pillai
Unbiased estimation of log normalizing constants with
applications to Bayesian cross-validation.
with Niloy Biswas, Paul Vanetti
Estimating Convergence of Markov chains with L-Lag Couplings,
2019.
with Chris Holmes, Lawrence Murray, Christian Robert,
George Nicholson
Better together? Statistical learning in models made of modules.

More Related Content

What's hot

Big model, big data
Big model, big dataBig model, big data
Big model, big data
Christian Robert
 
Bayesian inversion of deterministic dynamic causal models
Bayesian inversion of deterministic dynamic causal modelsBayesian inversion of deterministic dynamic causal models
Bayesian inversion of deterministic dynamic causal modelskhbrodersen
 
ABC in Venezia
ABC in VeneziaABC in Venezia
ABC in Venezia
Christian Robert
 
Rao-Blackwellisation schemes for accelerating Metropolis-Hastings algorithms
Rao-Blackwellisation schemes for accelerating Metropolis-Hastings algorithmsRao-Blackwellisation schemes for accelerating Metropolis-Hastings algorithms
Rao-Blackwellisation schemes for accelerating Metropolis-Hastings algorithms
Christian Robert
 
Chris Sherlock's slides
Chris Sherlock's slidesChris Sherlock's slides
Chris Sherlock's slides
Christian Robert
 
Complexity of exact solutions of many body systems: nonequilibrium steady sta...
Complexity of exact solutions of many body systems: nonequilibrium steady sta...Complexity of exact solutions of many body systems: nonequilibrium steady sta...
Complexity of exact solutions of many body systems: nonequilibrium steady sta...
Lake Como School of Advanced Studies
 
QMC Program: Trends and Advances in Monte Carlo Sampling Algorithms Workshop,...
QMC Program: Trends and Advances in Monte Carlo Sampling Algorithms Workshop,...QMC Program: Trends and Advances in Monte Carlo Sampling Algorithms Workshop,...
QMC Program: Trends and Advances in Monte Carlo Sampling Algorithms Workshop,...
The Statistical and Applied Mathematical Sciences Institute
 
QMC Program: Trends and Advances in Monte Carlo Sampling Algorithms Workshop,...
QMC Program: Trends and Advances in Monte Carlo Sampling Algorithms Workshop,...QMC Program: Trends and Advances in Monte Carlo Sampling Algorithms Workshop,...
QMC Program: Trends and Advances in Monte Carlo Sampling Algorithms Workshop,...
The Statistical and Applied Mathematical Sciences Institute
 
Unbiased Bayes for Big Data
Unbiased Bayes for Big DataUnbiased Bayes for Big Data
Unbiased Bayes for Big Data
Christian Robert
 
QMC Program: Trends and Advances in Monte Carlo Sampling Algorithms Workshop,...
QMC Program: Trends and Advances in Monte Carlo Sampling Algorithms Workshop,...QMC Program: Trends and Advances in Monte Carlo Sampling Algorithms Workshop,...
QMC Program: Trends and Advances in Monte Carlo Sampling Algorithms Workshop,...
The Statistical and Applied Mathematical Sciences Institute
 
QMC Program: Trends and Advances in Monte Carlo Sampling Algorithms Workshop,...
QMC Program: Trends and Advances in Monte Carlo Sampling Algorithms Workshop,...QMC Program: Trends and Advances in Monte Carlo Sampling Algorithms Workshop,...
QMC Program: Trends and Advances in Monte Carlo Sampling Algorithms Workshop,...
The Statistical and Applied Mathematical Sciences Institute
 
Can we estimate a constant?
Can we estimate a constant?Can we estimate a constant?
Can we estimate a constant?
Christian Robert
 
Omiros' talk on the Bernoulli factory problem
Omiros' talk on the  Bernoulli factory problemOmiros' talk on the  Bernoulli factory problem
Omiros' talk on the Bernoulli factory problem
BigMC
 
QMC Program: Trends and Advances in Monte Carlo Sampling Algorithms Workshop,...
QMC Program: Trends and Advances in Monte Carlo Sampling Algorithms Workshop,...QMC Program: Trends and Advances in Monte Carlo Sampling Algorithms Workshop,...
QMC Program: Trends and Advances in Monte Carlo Sampling Algorithms Workshop,...
The Statistical and Applied Mathematical Sciences Institute
 
QMC Program: Trends and Advances in Monte Carlo Sampling Algorithms Workshop,...
QMC Program: Trends and Advances in Monte Carlo Sampling Algorithms Workshop,...QMC Program: Trends and Advances in Monte Carlo Sampling Algorithms Workshop,...
QMC Program: Trends and Advances in Monte Carlo Sampling Algorithms Workshop,...
The Statistical and Applied Mathematical Sciences Institute
 
Introduction to Diffusion Monte Carlo
Introduction to Diffusion Monte CarloIntroduction to Diffusion Monte Carlo
Introduction to Diffusion Monte Carlo
Claudio Attaccalite
 
Program on Quasi-Monte Carlo and High-Dimensional Sampling Methods for Applie...
Program on Quasi-Monte Carlo and High-Dimensional Sampling Methods for Applie...Program on Quasi-Monte Carlo and High-Dimensional Sampling Methods for Applie...
Program on Quasi-Monte Carlo and High-Dimensional Sampling Methods for Applie...
The Statistical and Applied Mathematical Sciences Institute
 
Program on Quasi-Monte Carlo and High-Dimensional Sampling Methods for Applie...
Program on Quasi-Monte Carlo and High-Dimensional Sampling Methods for Applie...Program on Quasi-Monte Carlo and High-Dimensional Sampling Methods for Applie...
Program on Quasi-Monte Carlo and High-Dimensional Sampling Methods for Applie...
The Statistical and Applied Mathematical Sciences Institute
 
Introduction to MCMC methods
Introduction to MCMC methodsIntroduction to MCMC methods
Introduction to MCMC methods
Christian Robert
 
Macrocanonical models for texture synthesis
Macrocanonical models for texture synthesisMacrocanonical models for texture synthesis
Macrocanonical models for texture synthesis
Valentin De Bortoli
 

What's hot (20)

Big model, big data
Big model, big dataBig model, big data
Big model, big data
 
Bayesian inversion of deterministic dynamic causal models
Bayesian inversion of deterministic dynamic causal modelsBayesian inversion of deterministic dynamic causal models
Bayesian inversion of deterministic dynamic causal models
 
ABC in Venezia
ABC in VeneziaABC in Venezia
ABC in Venezia
 
Rao-Blackwellisation schemes for accelerating Metropolis-Hastings algorithms
Rao-Blackwellisation schemes for accelerating Metropolis-Hastings algorithmsRao-Blackwellisation schemes for accelerating Metropolis-Hastings algorithms
Rao-Blackwellisation schemes for accelerating Metropolis-Hastings algorithms
 
Chris Sherlock's slides
Chris Sherlock's slidesChris Sherlock's slides
Chris Sherlock's slides
 
Complexity of exact solutions of many body systems: nonequilibrium steady sta...
Complexity of exact solutions of many body systems: nonequilibrium steady sta...Complexity of exact solutions of many body systems: nonequilibrium steady sta...
Complexity of exact solutions of many body systems: nonequilibrium steady sta...
 
QMC Program: Trends and Advances in Monte Carlo Sampling Algorithms Workshop,...
QMC Program: Trends and Advances in Monte Carlo Sampling Algorithms Workshop,...QMC Program: Trends and Advances in Monte Carlo Sampling Algorithms Workshop,...
QMC Program: Trends and Advances in Monte Carlo Sampling Algorithms Workshop,...
 
QMC Program: Trends and Advances in Monte Carlo Sampling Algorithms Workshop,...
QMC Program: Trends and Advances in Monte Carlo Sampling Algorithms Workshop,...QMC Program: Trends and Advances in Monte Carlo Sampling Algorithms Workshop,...
QMC Program: Trends and Advances in Monte Carlo Sampling Algorithms Workshop,...
 
Unbiased Bayes for Big Data
Unbiased Bayes for Big DataUnbiased Bayes for Big Data
Unbiased Bayes for Big Data
 
QMC Program: Trends and Advances in Monte Carlo Sampling Algorithms Workshop,...
QMC Program: Trends and Advances in Monte Carlo Sampling Algorithms Workshop,...QMC Program: Trends and Advances in Monte Carlo Sampling Algorithms Workshop,...
QMC Program: Trends and Advances in Monte Carlo Sampling Algorithms Workshop,...
 
QMC Program: Trends and Advances in Monte Carlo Sampling Algorithms Workshop,...
QMC Program: Trends and Advances in Monte Carlo Sampling Algorithms Workshop,...QMC Program: Trends and Advances in Monte Carlo Sampling Algorithms Workshop,...
QMC Program: Trends and Advances in Monte Carlo Sampling Algorithms Workshop,...
 
Can we estimate a constant?
Can we estimate a constant?Can we estimate a constant?
Can we estimate a constant?
 
Omiros' talk on the Bernoulli factory problem
Omiros' talk on the  Bernoulli factory problemOmiros' talk on the  Bernoulli factory problem
Omiros' talk on the Bernoulli factory problem
 
QMC Program: Trends and Advances in Monte Carlo Sampling Algorithms Workshop,...
QMC Program: Trends and Advances in Monte Carlo Sampling Algorithms Workshop,...QMC Program: Trends and Advances in Monte Carlo Sampling Algorithms Workshop,...
QMC Program: Trends and Advances in Monte Carlo Sampling Algorithms Workshop,...
 
QMC Program: Trends and Advances in Monte Carlo Sampling Algorithms Workshop,...
QMC Program: Trends and Advances in Monte Carlo Sampling Algorithms Workshop,...QMC Program: Trends and Advances in Monte Carlo Sampling Algorithms Workshop,...
QMC Program: Trends and Advances in Monte Carlo Sampling Algorithms Workshop,...
 
Introduction to Diffusion Monte Carlo
Introduction to Diffusion Monte CarloIntroduction to Diffusion Monte Carlo
Introduction to Diffusion Monte Carlo
 
Program on Quasi-Monte Carlo and High-Dimensional Sampling Methods for Applie...
Program on Quasi-Monte Carlo and High-Dimensional Sampling Methods for Applie...Program on Quasi-Monte Carlo and High-Dimensional Sampling Methods for Applie...
Program on Quasi-Monte Carlo and High-Dimensional Sampling Methods for Applie...
 
Program on Quasi-Monte Carlo and High-Dimensional Sampling Methods for Applie...
Program on Quasi-Monte Carlo and High-Dimensional Sampling Methods for Applie...Program on Quasi-Monte Carlo and High-Dimensional Sampling Methods for Applie...
Program on Quasi-Monte Carlo and High-Dimensional Sampling Methods for Applie...
 
Introduction to MCMC methods
Introduction to MCMC methodsIntroduction to MCMC methods
Introduction to MCMC methods
 
Macrocanonical models for texture synthesis
Macrocanonical models for texture synthesisMacrocanonical models for texture synthesis
Macrocanonical models for texture synthesis
 

Similar to Monte Carlo methods for some not-quite-but-almost Bayesian problems

Monte Carlo methods for some not-quite-but-almost Bayesian problems
Monte Carlo methods for some not-quite-but-almost Bayesian problemsMonte Carlo methods for some not-quite-but-almost Bayesian problems
Monte Carlo methods for some not-quite-but-almost Bayesian problems
Pierre Jacob
 
The Gaussian Hardy-Littlewood Maximal Function
The Gaussian Hardy-Littlewood Maximal FunctionThe Gaussian Hardy-Littlewood Maximal Function
The Gaussian Hardy-Littlewood Maximal Function
Radboud University Medical Center
 
Estimation of the score vector and observed information matrix in intractable...
Estimation of the score vector and observed information matrix in intractable...Estimation of the score vector and observed information matrix in intractable...
Estimation of the score vector and observed information matrix in intractable...
Pierre Jacob
 
Approximate Bayesian Computation with Quasi-Likelihoods
Approximate Bayesian Computation with Quasi-LikelihoodsApproximate Bayesian Computation with Quasi-Likelihoods
Approximate Bayesian Computation with Quasi-Likelihoods
Stefano Cabras
 
P, NP and NP-Complete, Theory of NP-Completeness V2
P, NP and NP-Complete, Theory of NP-Completeness V2P, NP and NP-Complete, Theory of NP-Completeness V2
P, NP and NP-Complete, Theory of NP-Completeness V2
S.Shayan Daneshvar
 
ABC-Gibbs
ABC-GibbsABC-Gibbs
ABC-Gibbs
Christian Robert
 
ABC-Gibbs
ABC-GibbsABC-Gibbs
ABC-Gibbs
Christian Robert
 
Bayesian inference on mixtures
Bayesian inference on mixturesBayesian inference on mixtures
Bayesian inference on mixtures
Christian Robert
 
Nonparametric testing for exogeneity with discrete regressors and instruments
Nonparametric testing for exogeneity with discrete regressors and instrumentsNonparametric testing for exogeneity with discrete regressors and instruments
Nonparametric testing for exogeneity with discrete regressors and instruments
GRAPE
 
Variational Bayes: A Gentle Introduction
Variational Bayes: A Gentle IntroductionVariational Bayes: A Gentle Introduction
Variational Bayes: A Gentle Introduction
Flavio Morelli
 
Testing for mixtures by seeking components
Testing for mixtures by seeking componentsTesting for mixtures by seeking components
Testing for mixtures by seeking components
Christian Robert
 
Talk at CIRM on Poisson equation and debiasing techniques
Talk at CIRM on Poisson equation and debiasing techniquesTalk at CIRM on Poisson equation and debiasing techniques
Talk at CIRM on Poisson equation and debiasing techniques
Pierre Jacob
 
A nonlinear approximation of the Bayesian Update formula
A nonlinear approximation of the Bayesian Update formulaA nonlinear approximation of the Bayesian Update formula
A nonlinear approximation of the Bayesian Update formula
Alexander Litvinenko
 
A Probabilistic Algorithm for Computation of Polynomial Greatest Common with ...
A Probabilistic Algorithm for Computation of Polynomial Greatest Common with ...A Probabilistic Algorithm for Computation of Polynomial Greatest Common with ...
A Probabilistic Algorithm for Computation of Polynomial Greatest Common with ...
mathsjournal
 
Cs229 notes8
Cs229 notes8Cs229 notes8
Cs229 notes8
VuTran231
 
Introduction to Evidential Neural Networks
Introduction to Evidential Neural NetworksIntroduction to Evidential Neural Networks
Introduction to Evidential Neural Networks
Federico Cerutti
 
Talk given at Kobayashi-Maskawa Institute, Nagoya University, Japan.
Talk given at Kobayashi-Maskawa Institute, Nagoya University, Japan.Talk given at Kobayashi-Maskawa Institute, Nagoya University, Japan.
Talk given at Kobayashi-Maskawa Institute, Nagoya University, Japan.
Peter Coles
 
Meta-learning and the ELBO
Meta-learning and the ELBOMeta-learning and the ELBO
Meta-learning and the ELBO
Yoonho Lee
 
SMB_2012_HR_VAN_ST-last version
SMB_2012_HR_VAN_ST-last versionSMB_2012_HR_VAN_ST-last version
SMB_2012_HR_VAN_ST-last versionLilyana Vankova
 
Bath_IMI_Summer_Project
Bath_IMI_Summer_ProjectBath_IMI_Summer_Project
Bath_IMI_Summer_ProjectJosh Young
 

Similar to Monte Carlo methods for some not-quite-but-almost Bayesian problems (20)

Monte Carlo methods for some not-quite-but-almost Bayesian problems
Monte Carlo methods for some not-quite-but-almost Bayesian problemsMonte Carlo methods for some not-quite-but-almost Bayesian problems
Monte Carlo methods for some not-quite-but-almost Bayesian problems
 
The Gaussian Hardy-Littlewood Maximal Function
The Gaussian Hardy-Littlewood Maximal FunctionThe Gaussian Hardy-Littlewood Maximal Function
The Gaussian Hardy-Littlewood Maximal Function
 
Estimation of the score vector and observed information matrix in intractable...
Estimation of the score vector and observed information matrix in intractable...Estimation of the score vector and observed information matrix in intractable...
Estimation of the score vector and observed information matrix in intractable...
 
Approximate Bayesian Computation with Quasi-Likelihoods
Approximate Bayesian Computation with Quasi-LikelihoodsApproximate Bayesian Computation with Quasi-Likelihoods
Approximate Bayesian Computation with Quasi-Likelihoods
 
P, NP and NP-Complete, Theory of NP-Completeness V2
P, NP and NP-Complete, Theory of NP-Completeness V2P, NP and NP-Complete, Theory of NP-Completeness V2
P, NP and NP-Complete, Theory of NP-Completeness V2
 
ABC-Gibbs
ABC-GibbsABC-Gibbs
ABC-Gibbs
 
ABC-Gibbs
ABC-GibbsABC-Gibbs
ABC-Gibbs
 
Bayesian inference on mixtures
Bayesian inference on mixturesBayesian inference on mixtures
Bayesian inference on mixtures
 
Nonparametric testing for exogeneity with discrete regressors and instruments
Nonparametric testing for exogeneity with discrete regressors and instrumentsNonparametric testing for exogeneity with discrete regressors and instruments
Nonparametric testing for exogeneity with discrete regressors and instruments
 
Variational Bayes: A Gentle Introduction
Variational Bayes: A Gentle IntroductionVariational Bayes: A Gentle Introduction
Variational Bayes: A Gentle Introduction
 
Testing for mixtures by seeking components
Testing for mixtures by seeking componentsTesting for mixtures by seeking components
Testing for mixtures by seeking components
 
Talk at CIRM on Poisson equation and debiasing techniques
Talk at CIRM on Poisson equation and debiasing techniquesTalk at CIRM on Poisson equation and debiasing techniques
Talk at CIRM on Poisson equation and debiasing techniques
 
A nonlinear approximation of the Bayesian Update formula
A nonlinear approximation of the Bayesian Update formulaA nonlinear approximation of the Bayesian Update formula
A nonlinear approximation of the Bayesian Update formula
 
A Probabilistic Algorithm for Computation of Polynomial Greatest Common with ...
A Probabilistic Algorithm for Computation of Polynomial Greatest Common with ...A Probabilistic Algorithm for Computation of Polynomial Greatest Common with ...
A Probabilistic Algorithm for Computation of Polynomial Greatest Common with ...
 
Cs229 notes8
Cs229 notes8Cs229 notes8
Cs229 notes8
 
Introduction to Evidential Neural Networks
Introduction to Evidential Neural NetworksIntroduction to Evidential Neural Networks
Introduction to Evidential Neural Networks
 
Talk given at Kobayashi-Maskawa Institute, Nagoya University, Japan.
Talk given at Kobayashi-Maskawa Institute, Nagoya University, Japan.Talk given at Kobayashi-Maskawa Institute, Nagoya University, Japan.
Talk given at Kobayashi-Maskawa Institute, Nagoya University, Japan.
 
Meta-learning and the ELBO
Meta-learning and the ELBOMeta-learning and the ELBO
Meta-learning and the ELBO
 
SMB_2012_HR_VAN_ST-last version
SMB_2012_HR_VAN_ST-last versionSMB_2012_HR_VAN_ST-last version
SMB_2012_HR_VAN_ST-last version
 
Bath_IMI_Summer_Project
Bath_IMI_Summer_ProjectBath_IMI_Summer_Project
Bath_IMI_Summer_Project
 

More from Pierre Jacob

ISBA 2022 Susie Bayarri lecture
ISBA 2022 Susie Bayarri lectureISBA 2022 Susie Bayarri lecture
ISBA 2022 Susie Bayarri lecture
Pierre Jacob
 
Estimation of the score vector and observed information matrix in intractable...
Estimation of the score vector and observed information matrix in intractable...Estimation of the score vector and observed information matrix in intractable...
Estimation of the score vector and observed information matrix in intractable...
Pierre Jacob
 
Estimation of the score vector and observed information matrix in intractable...
Estimation of the score vector and observed information matrix in intractable...Estimation of the score vector and observed information matrix in intractable...
Estimation of the score vector and observed information matrix in intractable...
Pierre Jacob
 
Estimation of the score vector and observed information matrix in intractable...
Estimation of the score vector and observed information matrix in intractable...Estimation of the score vector and observed information matrix in intractable...
Estimation of the score vector and observed information matrix in intractable...
Pierre Jacob
 
Current limitations of sequential inference in general hidden Markov models
Current limitations of sequential inference in general hidden Markov modelsCurrent limitations of sequential inference in general hidden Markov models
Current limitations of sequential inference in general hidden Markov models
Pierre Jacob
 
On non-negative unbiased estimators
On non-negative unbiased estimatorsOn non-negative unbiased estimators
On non-negative unbiased estimators
Pierre Jacob
 
Path storage in the particle filter
Path storage in the particle filterPath storage in the particle filter
Path storage in the particle filter
Pierre Jacob
 
SMC^2: an algorithm for sequential analysis of state-space models
SMC^2: an algorithm for sequential analysis of state-space modelsSMC^2: an algorithm for sequential analysis of state-space models
SMC^2: an algorithm for sequential analysis of state-space models
Pierre Jacob
 
PAWL - GPU meeting @ Warwick
PAWL - GPU meeting @ WarwickPAWL - GPU meeting @ Warwick
PAWL - GPU meeting @ Warwick
Pierre Jacob
 
Presentation of SMC^2 at BISP7
Presentation of SMC^2 at BISP7Presentation of SMC^2 at BISP7
Presentation of SMC^2 at BISP7
Pierre Jacob
 
Presentation MCB seminar 09032011
Presentation MCB seminar 09032011Presentation MCB seminar 09032011
Presentation MCB seminar 09032011
Pierre Jacob
 

More from Pierre Jacob (11)

ISBA 2022 Susie Bayarri lecture
ISBA 2022 Susie Bayarri lectureISBA 2022 Susie Bayarri lecture
ISBA 2022 Susie Bayarri lecture
 
Estimation of the score vector and observed information matrix in intractable...
Estimation of the score vector and observed information matrix in intractable...Estimation of the score vector and observed information matrix in intractable...
Estimation of the score vector and observed information matrix in intractable...
 
Estimation of the score vector and observed information matrix in intractable...
Estimation of the score vector and observed information matrix in intractable...Estimation of the score vector and observed information matrix in intractable...
Estimation of the score vector and observed information matrix in intractable...
 
Estimation of the score vector and observed information matrix in intractable...
Estimation of the score vector and observed information matrix in intractable...Estimation of the score vector and observed information matrix in intractable...
Estimation of the score vector and observed information matrix in intractable...
 
Current limitations of sequential inference in general hidden Markov models
Current limitations of sequential inference in general hidden Markov modelsCurrent limitations of sequential inference in general hidden Markov models
Current limitations of sequential inference in general hidden Markov models
 
On non-negative unbiased estimators
On non-negative unbiased estimatorsOn non-negative unbiased estimators
On non-negative unbiased estimators
 
Path storage in the particle filter
Path storage in the particle filterPath storage in the particle filter
Path storage in the particle filter
 
SMC^2: an algorithm for sequential analysis of state-space models
SMC^2: an algorithm for sequential analysis of state-space modelsSMC^2: an algorithm for sequential analysis of state-space models
SMC^2: an algorithm for sequential analysis of state-space models
 
PAWL - GPU meeting @ Warwick
PAWL - GPU meeting @ WarwickPAWL - GPU meeting @ Warwick
PAWL - GPU meeting @ Warwick
 
Presentation of SMC^2 at BISP7
Presentation of SMC^2 at BISP7Presentation of SMC^2 at BISP7
Presentation of SMC^2 at BISP7
 
Presentation MCB seminar 09032011
Presentation MCB seminar 09032011Presentation MCB seminar 09032011
Presentation MCB seminar 09032011
 

Recently uploaded

Nutraceutical market, scope and growth: Herbal drug technology
Nutraceutical market, scope and growth: Herbal drug technologyNutraceutical market, scope and growth: Herbal drug technology
Nutraceutical market, scope and growth: Herbal drug technology
Lokesh Patil
 
Earliest Galaxies in the JADES Origins Field: Luminosity Function and Cosmic ...
Earliest Galaxies in the JADES Origins Field: Luminosity Function and Cosmic ...Earliest Galaxies in the JADES Origins Field: Luminosity Function and Cosmic ...
Earliest Galaxies in the JADES Origins Field: Luminosity Function and Cosmic ...
Sérgio Sacani
 
GBSN- Microbiology (Lab 3) Gram Staining
GBSN- Microbiology (Lab 3) Gram StainingGBSN- Microbiology (Lab 3) Gram Staining
GBSN- Microbiology (Lab 3) Gram Staining
Areesha Ahmad
 
Structures and textures of metamorphic rocks
Structures and textures of metamorphic rocksStructures and textures of metamorphic rocks
Structures and textures of metamorphic rocks
kumarmathi863
 
Richard's entangled aventures in wonderland
Richard's entangled aventures in wonderlandRichard's entangled aventures in wonderland
Richard's entangled aventures in wonderland
Richard Gill
 
platelets_clotting_biogenesis.clot retractionpptx
platelets_clotting_biogenesis.clot retractionpptxplatelets_clotting_biogenesis.clot retractionpptx
platelets_clotting_biogenesis.clot retractionpptx
muralinath2
 
erythropoiesis-I_mechanism& clinical significance.pptx
erythropoiesis-I_mechanism& clinical significance.pptxerythropoiesis-I_mechanism& clinical significance.pptx
erythropoiesis-I_mechanism& clinical significance.pptx
muralinath2
 
Structural Classification Of Protein (SCOP)
Structural Classification Of Protein  (SCOP)Structural Classification Of Protein  (SCOP)
Structural Classification Of Protein (SCOP)
aishnasrivastava
 
Orion Air Quality Monitoring Systems - CWS
Orion Air Quality Monitoring Systems - CWSOrion Air Quality Monitoring Systems - CWS
Orion Air Quality Monitoring Systems - CWS
Columbia Weather Systems
 
insect taxonomy importance systematics and classification
insect taxonomy importance systematics and classificationinsect taxonomy importance systematics and classification
insect taxonomy importance systematics and classification
anitaento25
 
EY - Supply Chain Services 2018_template.pptx
EY - Supply Chain Services 2018_template.pptxEY - Supply Chain Services 2018_template.pptx
EY - Supply Chain Services 2018_template.pptx
AlguinaldoKong
 
NuGOweek 2024 Ghent - programme - final version
NuGOweek 2024 Ghent - programme - final versionNuGOweek 2024 Ghent - programme - final version
NuGOweek 2024 Ghent - programme - final version
pablovgd
 
Observation of Io’s Resurfacing via Plume Deposition Using Ground-based Adapt...
Observation of Io’s Resurfacing via Plume Deposition Using Ground-based Adapt...Observation of Io’s Resurfacing via Plume Deposition Using Ground-based Adapt...
Observation of Io’s Resurfacing via Plume Deposition Using Ground-based Adapt...
Sérgio Sacani
 
Body fluids_tonicity_dehydration_hypovolemia_hypervolemia.pptx
Body fluids_tonicity_dehydration_hypovolemia_hypervolemia.pptxBody fluids_tonicity_dehydration_hypovolemia_hypervolemia.pptx
Body fluids_tonicity_dehydration_hypovolemia_hypervolemia.pptx
muralinath2
 
Citrus Greening Disease and its Management
Citrus Greening Disease and its ManagementCitrus Greening Disease and its Management
Citrus Greening Disease and its Management
subedisuryaofficial
 
Multi-source connectivity as the driver of solar wind variability in the heli...
Multi-source connectivity as the driver of solar wind variability in the heli...Multi-source connectivity as the driver of solar wind variability in the heli...
Multi-source connectivity as the driver of solar wind variability in the heli...
Sérgio Sacani
 
Penicillin...........................pptx
Penicillin...........................pptxPenicillin...........................pptx
Penicillin...........................pptx
Cherry
 
SCHIZOPHRENIA Disorder/ Brain Disorder.pdf
SCHIZOPHRENIA Disorder/ Brain Disorder.pdfSCHIZOPHRENIA Disorder/ Brain Disorder.pdf
SCHIZOPHRENIA Disorder/ Brain Disorder.pdf
SELF-EXPLANATORY
 
platelets- lifespan -Clot retraction-disorders.pptx
platelets- lifespan -Clot retraction-disorders.pptxplatelets- lifespan -Clot retraction-disorders.pptx
platelets- lifespan -Clot retraction-disorders.pptx
muralinath2
 
filosofia boliviana introducción jsjdjd.pptx
filosofia boliviana introducción jsjdjd.pptxfilosofia boliviana introducción jsjdjd.pptx
filosofia boliviana introducción jsjdjd.pptx
IvanMallco1
 

Recently uploaded (20)

Nutraceutical market, scope and growth: Herbal drug technology
Nutraceutical market, scope and growth: Herbal drug technologyNutraceutical market, scope and growth: Herbal drug technology
Nutraceutical market, scope and growth: Herbal drug technology
 
Earliest Galaxies in the JADES Origins Field: Luminosity Function and Cosmic ...
Earliest Galaxies in the JADES Origins Field: Luminosity Function and Cosmic ...Earliest Galaxies in the JADES Origins Field: Luminosity Function and Cosmic ...
Earliest Galaxies in the JADES Origins Field: Luminosity Function and Cosmic ...
 
GBSN- Microbiology (Lab 3) Gram Staining
GBSN- Microbiology (Lab 3) Gram StainingGBSN- Microbiology (Lab 3) Gram Staining
GBSN- Microbiology (Lab 3) Gram Staining
 
Structures and textures of metamorphic rocks
Structures and textures of metamorphic rocksStructures and textures of metamorphic rocks
Structures and textures of metamorphic rocks
 
Richard's entangled aventures in wonderland
Richard's entangled aventures in wonderlandRichard's entangled aventures in wonderland
Richard's entangled aventures in wonderland
 
platelets_clotting_biogenesis.clot retractionpptx
platelets_clotting_biogenesis.clot retractionpptxplatelets_clotting_biogenesis.clot retractionpptx
platelets_clotting_biogenesis.clot retractionpptx
 
erythropoiesis-I_mechanism& clinical significance.pptx
erythropoiesis-I_mechanism& clinical significance.pptxerythropoiesis-I_mechanism& clinical significance.pptx
erythropoiesis-I_mechanism& clinical significance.pptx
 
Structural Classification Of Protein (SCOP)
Structural Classification Of Protein  (SCOP)Structural Classification Of Protein  (SCOP)
Structural Classification Of Protein (SCOP)
 
Orion Air Quality Monitoring Systems - CWS
Orion Air Quality Monitoring Systems - CWSOrion Air Quality Monitoring Systems - CWS
Orion Air Quality Monitoring Systems - CWS
 
insect taxonomy importance systematics and classification
insect taxonomy importance systematics and classificationinsect taxonomy importance systematics and classification
insect taxonomy importance systematics and classification
 
EY - Supply Chain Services 2018_template.pptx
EY - Supply Chain Services 2018_template.pptxEY - Supply Chain Services 2018_template.pptx
EY - Supply Chain Services 2018_template.pptx
 
NuGOweek 2024 Ghent - programme - final version
NuGOweek 2024 Ghent - programme - final versionNuGOweek 2024 Ghent - programme - final version
NuGOweek 2024 Ghent - programme - final version
 
Observation of Io’s Resurfacing via Plume Deposition Using Ground-based Adapt...
Observation of Io’s Resurfacing via Plume Deposition Using Ground-based Adapt...Observation of Io’s Resurfacing via Plume Deposition Using Ground-based Adapt...
Observation of Io’s Resurfacing via Plume Deposition Using Ground-based Adapt...
 
Body fluids_tonicity_dehydration_hypovolemia_hypervolemia.pptx
Body fluids_tonicity_dehydration_hypovolemia_hypervolemia.pptxBody fluids_tonicity_dehydration_hypovolemia_hypervolemia.pptx
Body fluids_tonicity_dehydration_hypovolemia_hypervolemia.pptx
 
Citrus Greening Disease and its Management
Citrus Greening Disease and its ManagementCitrus Greening Disease and its Management
Citrus Greening Disease and its Management
 
Multi-source connectivity as the driver of solar wind variability in the heli...
Multi-source connectivity as the driver of solar wind variability in the heli...Multi-source connectivity as the driver of solar wind variability in the heli...
Multi-source connectivity as the driver of solar wind variability in the heli...
 
Penicillin...........................pptx
Penicillin...........................pptxPenicillin...........................pptx
Penicillin...........................pptx
 
SCHIZOPHRENIA Disorder/ Brain Disorder.pdf
SCHIZOPHRENIA Disorder/ Brain Disorder.pdfSCHIZOPHRENIA Disorder/ Brain Disorder.pdf
SCHIZOPHRENIA Disorder/ Brain Disorder.pdf
 
platelets- lifespan -Clot retraction-disorders.pptx
platelets- lifespan -Clot retraction-disorders.pptxplatelets- lifespan -Clot retraction-disorders.pptx
platelets- lifespan -Clot retraction-disorders.pptx
 
filosofia boliviana introducción jsjdjd.pptx
filosofia boliviana introducción jsjdjd.pptxfilosofia boliviana introducción jsjdjd.pptx
filosofia boliviana introducción jsjdjd.pptx
 

Monte Carlo methods for some not-quite-but-almost Bayesian problems

  • 11. Draws in the simplex
The un's of each category add constraints on θ.
[Figure: the six sampled points in the simplex with vertices 1, 2, 3.]

  • 12. Draws in the simplex
Overall the constraints define a polytope for θ, or an empty set.
[Figure: the six points and the resulting constraint region.]

  • 13. Draws in the simplex
Here, there is a polytope of θ's such that ∀n ∈ [N] un ∈ ∆xn(θ).
[Figure: the polytope of feasible θ in the simplex.]

  • 14. Draws in the simplex
Any θ in the polytope separates the un's appropriately.
[Figure: one feasible θ and the induced subsimplices.]

  • 15. Draws in the simplex
Let's try again with fresh uniform samples on ∆.
[Figure: six new points in the simplex.]

  • 16. Draws in the simplex
Here there is no θ ∈ ∆ such that ∀n ∈ [N] un ∈ ∆xn(θ).
[Figure: the new points, for which the constraints are incompatible.]
  • 17. Lower and upper probabilities
Consider the set
Rx = {(u1, . . . , uN) ∈ ∆^N : ∃θ ∈ ∆ ∀n ∈ [N] un ∈ ∆xn(θ)},
and denote by νx the uniform distribution on Rx.
For u ∈ Rx, there is a set F(u) = {θ ∈ ∆ : ∀n un ∈ ∆xn(θ)}.
For a set Σ ⊂ ∆ of interest, define
(lower probability) P(Σ) = ∫ 1(F(u) ⊂ Σ) νx(du),
(upper probability) P̄(Σ) = ∫ 1(F(u) ∩ Σ ≠ ∅) νx(du).
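Given draws u(1), . . . , u(B) from νx, these integrals are estimated by the fraction of feasible sets contained in Σ and the fraction intersecting Σ. A minimal R sketch, where is_subset and intersects are hypothetical user-supplied predicates testing a sampled polytope F(u) against Σ (not part of the slides):

```r
# Monte Carlo estimates of lower/upper probabilities from polytope draws.
# 'polytopes' is a list of sampled feasible sets F(u), in whatever
# representation the two predicates understand (e.g. linear constraints).
lower_upper <- function(polytopes, is_subset, intersects) {
  contained    <- vapply(polytopes, is_subset,  logical(1))  # F(u) inside Sigma
  intersecting <- vapply(polytopes, intersects, logical(1))  # F(u) meets Sigma
  c(lower = mean(contained), upper = mean(intersecting))
}
```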
  • 18. Summary and Monte Carlo problem
Arthur Dempster's approach, later called the Dempster–Shafer theory of belief functions, is based on a distribution of feasible sets,
F(u) = {θ ∈ ∆ : ∀n ∈ [N] un ∈ ∆xn(θ)}, where u ∼ νx,
the uniform distribution on Rx.
How do we obtain samples from this distribution?
Rejection rate: 99% for the data (2, 3, 1). Hit-and-run algorithm?
Our proposed strategy is a Gibbs sampler. Starting from some u ∈ Rx, we will iteratively refresh some components un of u given the others.
  • 19. Gibbs sampler: initialization
We can obtain some u in Rx as follows.
Choose an arbitrary θ ∈ ∆. For all n ∈ [N], sample un uniformly in ∆k(θ) where k = xn.
[Figure: θ in the simplex, the subsimplices ∆1(θ), ∆2(θ), ∆3(θ), and six sampled points.]
To sample components un given the others, we will express Rx,
{u : ∃θ ∀n un ∈ ∆xn(θ)},
in terms of relations that the components un must satisfy with respect to one another.
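The initialization needs uniform draws in a subsimplex ∆k(θ). A base-R sketch, using two facts that are my own reading of the construction rather than stated on the slides: ∆k(θ) is the simplex with vertices θ and eℓ for ℓ ≠ k, and a uniform point on the standard simplex maps to a uniform point on any subsimplex under the corresponding linear map:

```r
# Uniform draw on the standard simplex: normalized exponentials.
runif_simplex <- function(K) { e <- rexp(K); e / sum(e) }

# Uniform draw in Delta_k(theta), assumed to be the subsimplex with
# vertices theta and e_l for l != k: send the vertex e_k to theta.
runif_subsimplex <- function(k, theta) {
  K <- length(theta)
  w <- runif_simplex(K)
  u <- w[k] * theta
  for (l in setdiff(1:K, k)) u[l] <- u[l] + w[l]
  u
}

# Initialization of slide 19: arbitrary theta, then u_n uniform in
# Delta_{x_n}(theta); returns an N x K matrix with rows u_n.
init_u <- function(x, K, theta = rep(1 / K, K)) {
  t(sapply(x, function(xn) runif_subsimplex(xn, theta)))
}
```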
  • 20. Equivalent representation
For any θ ∈ ∆,
∀k ∈ [K] ∀n ∈ Ik: un ∈ ∆k(θ)
⇔ ∀k ∈ [K] ∀n ∈ Ik ∀ℓ ∈ [K]: un,ℓ/un,k ≥ θℓ/θk.
This is equivalent to
∀k ∈ [K] ∀ℓ ∈ [K]: min_{n∈Ik} un,ℓ/un,k ≥ θℓ/θk.
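These minimal ratios are all the sampler needs to retain about each category. A small sketch, assuming u is stored as an N × K matrix as in init_u above:

```r
# eta[k, l] = min over n in I_k of u[n, l] / u[n, k]; categories with
# no observations contribute no constraint (entries stay +Inf).
compute_eta <- function(u, x, K) {
  eta <- matrix(Inf, K, K)
  for (k in 1:K) {
    rows <- which(x == k)
    if (length(rows) > 0) {
      for (l in 1:K) eta[k, l] <- min(u[rows, l] / u[rows, k])
    }
  }
  eta
}
```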
  • 21. Linear constraints
Counts: (9, 8, 3), u in Rx. The values ηk→ℓ = min_{n∈Ik} un,ℓ/un,k define linear constraints on θ.
[Figure: the sampled points and the constraint lines θ3/θ1 = η1→3 and θ2/θ1 = η1→2 in the simplex.]

  • 22. Some inequalities
Next, assume u ∈ Rx, write ηk→ℓ = min_{n∈Ik} un,ℓ/un,k, and consider some implications.
There exists θ ∈ ∆ such that θℓ/θk ≤ ηk→ℓ for all k, ℓ ∈ [K].
Then, for all k, ℓ,
θℓ/θk ≤ ηk→ℓ and θk/θℓ ≤ ηℓ→k, thus ηk→ℓ ηℓ→k ≥ 1.

  • 23. More inequalities
We can continue. If K ≥ 3: for all k, ℓ, j,
ηℓ→k⁻¹ ≤ θℓ/θk = (θℓ/θj)(θj/θk) ≤ ηj→ℓ ηk→j,
thus ηk→j ηj→ℓ ηℓ→k ≥ 1.
And if K ≥ 4, for all k, ℓ, j, m,
ηk→j ηj→ℓ ηℓ→m ηm→k ≥ 1.
Generally,
∀L ∈ [K] ∀j1, . . . , jL ∈ [K]: ηj1→j2 ηj2→j3 · · · ηjL→j1 ≥ 1.
  • 24. Main result
So far: if ∃θ ∈ ∆ such that θℓ/θk ≤ ηk→ℓ for k, ℓ ∈ [K], then
∀L ∈ [K] ∀j1, . . . , jL ∈ [K]: ηj1→j2 ηj2→j3 · · · ηjL→j1 ≥ 1.
The reverse implication holds too. This would mean
Rx = {u : ∃θ ∀k, ℓ ∈ [K] θℓ/θk ≤ ηk→ℓ}
   = {u : ∀L ∈ [K] ∀j1, . . . , jL ∈ [K] ηj1→j2 ηj2→j3 · · · ηjL→j1 ≥ 1},
i.e. Rx is represented by relations between the components (un).
This helps computing conditional distributions under νx, leading to a Gibbs sampler.
  • 25. Some remarks on these inequalities
∀L ∈ [K] ∀j1, . . . , jL ∈ [K]: ηj1→j2 ηj2→j3 · · · ηjL→j1 ≥ 1.
We can consider only unique indices in j1, . . . , jL, since the other cases can be deduced from those.
Example: η1→2 η2→4 η4→3 η3→2 η2→1 ≥ 1 follows from η1→2 η2→1 ≥ 1 and η2→4 η4→3 η3→2 ≥ 1.
The indices j1 → j2 → · · · → jL → j1 form a cycle.

  • 26. Graphs
Fully connected graph with weight log ηk→ℓ on edge (k, ℓ).
[Figure: three vertices 1, 2, 3, with edges labelled log(η1→2), log(η2→1), etc.]
Value of a path = sum of the weights along the path.
Negative cycle = path from a vertex to itself with negative value.
  • 27. Graphs
∀L ∀j1, . . . , jL: ηj1→j2 · · · ηjL→j1 ≥ 1
⇔ ∀L ∀j1, . . . , jL: log(ηj1→j2) + · · · + log(ηjL→j1) ≥ 0
⇔ there are no negative cycles in the graph.
[Figure: the same three-vertex graph with weights log(η1→2), log(η2→1), etc.]
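Checking u ∈ Rx thus reduces to negative-cycle detection, for which Bellman–Ford is the textbook tool. The slides rely on the igraph package; here is a dependency-free base-R sketch for illustration:

```r
# Bellman-Ford on a dense weight matrix w (w[k, l] = weight of edge
# k -> l, +Inf for absent edges). Returns shortest-path distances from
# 'src', or NULL if a negative cycle is detected -- i.e., with
# w = log(eta), if u is not in R_x.
bellman_ford <- function(w, src) {
  K <- nrow(w)
  d <- rep(Inf, K); d[src] <- 0
  for (iter in 1:(K - 1)) {        # K - 1 rounds of edge relaxation
    for (k in 1:K) for (l in 1:K) {
      if (d[k] + w[k, l] < d[l]) d[l] <- d[k] + w[k, l]
    }
  }
  for (k in 1:K) for (l in 1:K) {  # any further improvement: negative cycle
    if (d[k] + w[k, l] < d[l] - 1e-12) return(NULL)
  }
  d
}
```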
  • 28. Proof
Proof of the claim: "inequalities" ⇒ "∃θ : θℓ/θk ≤ ηk→ℓ ∀k, ℓ".
min(k → ℓ) := minimum value of a path from k to ℓ in the graph.
Finite for all k, ℓ because of the absence of negative cycles in the graph.
Define θ via θk ∝ exp(min(K → k)). Then θ ∈ ∆.
Furthermore, for all k, ℓ,
min(K → ℓ) ≤ min(K → k) + log(ηk→ℓ),
therefore θℓ/θk ≤ ηk→ℓ.
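The proof is constructive: shortest-path distances from vertex K immediately yield a feasible θ. A sketch building on bellman_ford above, assuming all counts are positive so that all ηk→ℓ are finite:

```r
# A feasible theta in the polytope defined by eta, via the proof of
# slide 28: theta_k proportional to exp(min(K -> k)). Returns NULL if
# the polytope is empty (negative cycle).
theta_from_eta <- function(eta) {
  d <- bellman_ford(log(eta), src = nrow(eta))
  if (is.null(d)) return(NULL)
  th <- exp(d - max(d))   # subtract max(d) for numerical stability
  th / sum(th)
}
```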
  • 29. So far. . .
We want to sample uniformly on the set Rx,
Rx = {u : ∃θ ∀k, ℓ ∈ [K] θℓ/θk ≤ ηk→ℓ}.
We have proved that this set can also be written
{u : ∀L ∈ [K] ∀j1, . . . , jL ∈ [K] ηj1→j2 ηj2→j3 · · · ηjL→j1 ≥ 1}.
The inequalities hold if and only if the graph with weight log ηk→ℓ on edge (k, ℓ) does not contain negative cycles.

  • 30. Conditional distributions
We can obtain conditional distributions of un for n ∈ Ik given (un)n∉Ik with respect to νx:
un given (un)n∉Ik are i.i.d. uniform in ∆k(θ′), where θ′ℓ ∝ exp(−min(ℓ → k)) for all ℓ,
with min(ℓ → k) := minimum value of a path from ℓ to k.
Shortest paths can be computed in polynomial time.
  • 31–34. Conditional distributions
Counts: (9, 8, 3). What is the conditional distribution of (un)n∈Ik given (un)n∉Ik under νx?
[Figure sequence: the sampled points in the simplex, illustrating the conditional update of one category's points given the others.]
  • 35. Gibbs sampler
Initial u(0) ∈ Rx. At each iteration t ≥ 1, for each category k ∈ [K]:
1 compute θ′ such that, for n ∈ Ik, un given the other components is uniform on ∆k(θ′);
2 draw un(t) uniformly on ∆k(θ′) for n ∈ Ik;
3 update ηk→ℓ(t) for ℓ ∈ [K].
In step 1, θ′ is obtained by computing shortest paths in the graph with weights log ηk→ℓ(t) on edges (k, ℓ).
Computed e.g. with the Bellman–Ford algorithm, implemented in Csárdi & Nepusz, igraph package, 2006.
Alternatively, we can compute θ′ by solving a linear program, Berkelaar, Eikland & Notebaert, lpsolve package, 2004.
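Putting the pieces together, a hedged sketch of one full sweep, reusing compute_eta, bellman_ford and runif_subsimplex from above. Two details are my own reading of slides 30 and 35 rather than spelled out there: category k's constraints are dropped while it is being refreshed, and distances *to* k are obtained by running Bellman–Ford from k on the transposed weight matrix. Every category is assumed to have at least one observation.

```r
# One full sweep of the Gibbs sampler (slides 30 and 35); u is the
# N x K matrix of current points, assumed to lie in R_x.
gibbs_sweep <- function(u, x, K) {
  for (k in 1:K) {
    eta <- compute_eta(u, x, K)
    eta[k, ] <- Inf                               # category k is being refreshed
    d_to_k <- bellman_ford(t(log(eta)), src = k)  # min(l -> k) for all l
    stopifnot(!is.null(d_to_k))                   # no negative cycle if u in R_x
    thp <- exp(-(d_to_k - min(d_to_k)))           # theta'_l prop. exp(-min(l -> k))
    thp <- thp / sum(thp)
    for (n in which(x == k)) u[n, ] <- runif_subsimplex(k, thp)
  }
  u
}
```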
  • 36. Gibbs sampler
Counts: (9, 8, 3), 100 polytopes generated by the sampler.
[Figure: 100 overlaid polytopes in the simplex.]

  • 37. Cost per iteration
Cost in seconds for 100 full sweeps.
[Figure: elapsed time against K ∈ {4, 8, 12, 16}, one curve per N ∈ {256, 512, 1024, 2048}.]
https://github.com/pierrejacob/dempsterpolytope

  • 38. Cost per iteration
Cost in seconds for 100 full sweeps.
[Figure: elapsed time against N ∈ {256, 512, 1024, 2048}, one curve per K ∈ {4, 8, 12, 16}.]
https://github.com/pierrejacob/dempsterpolytope
  • 39. How many iterations for convergence?
Let ν(t) be the distribution of u(t) after t iterations.
TV(ν(t), νx) = sup_A |ν(t)(A) − νx(A)|.
[Figure: TV upper bounds against iteration, one curve per K ∈ {5, 10, 20}.]

  • 40. How many iterations for convergence?
Let ν(t) be the distribution of u(t) after t iterations.
TV(ν(t), νx) = sup_A |ν(t)(A) − νx(A)|.
[Figure: TV upper bounds against iteration, one curve per N ∈ {50, 100, 150, 200}.]
  • 41. Summary
A Gibbs sampler can be used to approximate lower and upper probabilities in the Dempster–Shafer framework.
Is perfect sampling possible here?
Extensions for hierarchical counts, hidden Markov models?
Jacob, Gong, Edlefsen & Dempster, A Gibbs sampler for a class of random convex polytopes, on arXiv and researchers.one.
https://github.com/pierrejacob/dempsterpolytope

  • 42. Outline
1 Dempster–Shafer analysis of count data
2 Unbiased MCMC and diagnostics of convergence
3 Modular Bayesian inference
4 Bagging posterior distributions
  • 43. Coupled chains
Glynn & Rhee, Exact estimation for MC equilibrium expectations, 2014.
Generate two chains (Xt) and (Yt), going to π, as follows:
sample X0 and Y0 from π0 (independently, or not),
sample Xt|Xt−1 ∼ P(Xt−1, ·) for t = 1, . . . , L,
for t ≥ L + 1, sample (Xt, Yt−L)|(Xt−1, Yt−L−1) ∼ P̄((Xt−1, Yt−L−1), ·).
P̄ must be such that
Xt+1|Xt ∼ P(Xt, ·) and Yt|Yt−1 ∼ P(Yt−1, ·) (thus Xt and Yt have the same distribution for all t ≥ 0),
there exists a random time τ such that Xt = Yt−L for all t ≥ τ (the chains meet and remain "faithful").
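As an illustration, here is a hedged base-R sketch of such a coupling for the random-walk MH example of the next slide: proposals are drawn from a maximal coupling of the two Normal proposal distributions, and a common uniform couples the accept/reject steps, so the chains remain equal once they meet. The tuning values mirror the figure (π = N(0, 1), proposal std 0.5, π0 = N(10, 3²)); the construction follows the coupling literature cited in the references, not any specific listing by the authors.

```r
logpi <- function(z) dnorm(z, log = TRUE)           # target pi = N(0, 1)

mh_step <- function(x, s = 0.5) {                   # marginal RWMH kernel P
  prop <- rnorm(1, x, s)
  if (log(runif(1)) < logpi(prop) - logpi(x)) prop else x
}

# Maximal coupling of N(mu1, s^2) and N(mu2, s^2): returns a pair that
# is equal with the largest possible probability.
max_coupling_normal <- function(mu1, mu2, s) {
  x <- rnorm(1, mu1, s)
  if (dnorm(x, mu1, s) * runif(1) <= dnorm(x, mu2, s)) return(c(x, x))
  repeat {
    y <- rnorm(1, mu2, s)
    if (dnorm(y, mu2, s) * runif(1) > dnorm(y, mu1, s)) return(c(x, y))
  }
}

coupled_mh_step <- function(x, y, s = 0.5) {        # coupled kernel P-bar
  prop <- max_coupling_normal(x, y, s)
  logu <- log(runif(1))                             # common accept uniform
  c(if (logu < logpi(prop[1]) - logpi(x)) prop[1] else x,
    if (logu < logpi(prop[2]) - logpi(y)) prop[2] else y)
}

# Lagged coupled chains: X advanced L steps alone, then coupled moves
# until X_t = Y_{t-L} (time tau) and t >= m.
run_coupled_chains <- function(m, L = 1, s = 0.5, max_iter = 1e5) {
  X <- rnorm(1, 10, 3); Y <- rnorm(1, 10, 3)        # X_0, Y_0 ~ pi_0
  xs <- X; ys <- Y
  for (t in 1:L) { X <- mh_step(X, s); xs <- c(xs, X) }
  t <- L; tau <- Inf
  while ((t < m || is.infinite(tau)) && t < max_iter) {
    t <- t + 1
    xy <- coupled_mh_step(X, Y, s); X <- xy[1]; Y <- xy[2]
    xs <- c(xs, X); ys <- c(ys, Y)
    if (is.infinite(tau) && X == Y) tau <- t
  }
  list(xs = xs, ys = ys, tau = tau, L = L)          # xs[t + 1] stores X_t
}
```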
  • 44. Coupled chains
[Figure: trajectories of the two coupled chains against iteration.]
π = N(0, 1), RWMH with Normal proposal std = 0.5, π0 = N(10, 3²).
  • 45. Unbiased estimators
Under some conditions, the estimator
(m − k + 1)⁻¹ Σ_{t=k}^{m} h(Xt)
+ (m − k + 1)⁻¹ Σ_{t=k+L}^{τ−1} min(m − k + 1, ⌈(t − k)/L⌉) (h(Xt) − h(Yt−L))
has expectation ∫ h(x)π(dx), finite cost and finite variance.
"MCMC estimator + bias correction terms"
Its efficiency can be close to that of MCMC estimators, if k, m are chosen appropriately (and L also).
Jacob, O'Leary & Atchadé, Unbiased MCMC with couplings, 2019.
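A sketch of this estimator in R, consuming the output of run_coupled_chains above; it assumes the chains were run until t ≥ max(m, τ) with τ finite, and the ⌈(t − k)/L⌉ weight is my reading of the formula:

```r
# Unbiased estimator H_{k:m} from lag-L coupled chains (slide 45).
unbiased_estimator <- function(out, h, k, m) {
  xs <- out$xs; ys <- out$ys; L <- out$L; tau <- out$tau
  est <- mean(sapply(xs[(k + 1):(m + 1)], h))       # standard MCMC average
  if (tau - 1 >= k + L) {
    for (t in (k + L):(tau - 1)) {                  # bias correction terms
      wt <- min(m - k + 1, ceiling((t - k) / L))
      est <- est + wt / (m - k + 1) * (h(xs[t + 1]) - h(ys[t - L + 1]))
    }
  }
  est
}

# Example: estimating E[X] = 0 under pi = N(0, 1).
out <- run_coupled_chains(m = 200, L = 1)
unbiased_estimator(out, h = identity, k = 50, m = 200)
```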
  • 46–48. Finite-time bias of MCMC
Total variation distance between Xt ∼ πt and π = limt→∞ πt:
∥πt − π∥TV ≤ E[max(0, ⌈(τ − L − t)/L⌉)].
[Figures: histograms of τ − lag and the resulting TV upper bounds against iteration, for lag ∈ {1, 50, 100}.]
Biswas, Jacob & Vanetti, Estimating Convergence of Markov chains with L-Lag Couplings, 2019.
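The bound is straightforward to estimate from independent meeting times. A sketch reusing run_coupled_chains:

```r
# Monte Carlo estimate of the TV upper bound of slides 46-48, from
# independent meeting times of L-lag coupled chains.
tv_upper_bound <- function(taus, L, tmax) {
  sapply(0:tmax, function(t) mean(pmax(0, ceiling((taus - L - t) / L))))
}

taus   <- replicate(100, run_coupled_chains(m = 0, L = 50)$tau)
bounds <- tv_upper_bound(taus, L = 50, tmax = 200)  # one value per iteration t
```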
  • 49. Finite-time bias of MCMC
Upper bounds can also be obtained for e.g. the 1-Wasserstein distance. And perhaps lower bounds?
Applicable in e.g. high-dimensional and/or discrete spaces.
Biswas, Jacob & Vanetti, Estimating Convergence of Markov chains with L-Lag Couplings, 2019.

  • 50. Finite-time bias of MCMC
Example: the Gibbs sampler for Dempster's analysis of counts.
[Figure: TV upper bounds against iteration, one curve per N ∈ {50, 100, 150, 200}.]
This quantifies the bias of MCMC estimators, not the variance.
  • 51. Outline
1 Dempster–Shafer analysis of count data
2 Unbiased MCMC and diagnostics of convergence
3 Modular Bayesian inference
4 Bagging posterior distributions

  • 52. Models made of modules
First module: parameter θ1, data Y1,
prior: p1(θ1), likelihood: p1(Y1|θ1).
Second module: parameter θ2, data Y2,
prior: p2(θ2|θ1), likelihood: p2(Y2|θ1, θ2).
We are interested in the estimation of θ1, θ2 or both.
  • 53. Joint model approach
Parameter (θ1, θ2), with prior p(θ1, θ2) = p1(θ1) p2(θ2|θ1).
Data (Y1, Y2), likelihood p(Y1, Y2|θ1, θ2) = p1(Y1|θ1) p2(Y2|θ1, θ2).
Posterior distribution
π(θ1, θ2|Y1, Y2) ∝ p1(θ1) p1(Y1|θ1) p2(θ2|θ1) p2(Y2|θ1, θ2).

  • 54. Joint model approach
In the joint model approach, all data are used to simultaneously infer all parameters. . .
. . . so that uncertainty about θ1 is propagated to the estimation of θ2. . .
. . . but misspecification of the 2nd module can damage the estimation of θ1.
What about allowing uncertainty propagation, but preventing feedback of some modules on others?

  • 55. Cut distribution
One might want to propagate uncertainty without allowing "feedback" of the second module on the first module.
Cut distribution:
πcut(θ1, θ2; Y1, Y2) = p1(θ1|Y1) p2(θ2|θ1, Y2).
Different from the posterior distribution under the joint model, under which the first marginal is π(θ1|Y1, Y2).
  • 56. Example: epidemiological study
Model of virus prevalence:
∀i = 1, . . . , I: Zi ∼ Binomial(Ni, ϕi),
where Zi is the number of women infected with high-risk HPV in a sample of size Ni in country i. Beta(1, 1) prior on each ϕi, independently.
Impact of prevalence on cervical cancer occurrence:
∀i = 1, . . . , I: Yi ∼ Poisson(λi Ti), log(λi) = θ2,1 + θ2,2 ϕi,
where Yi is the number of cancer cases arising from Ti woman-years of follow-up in country i. N(0, 10³) priors on θ2,1, θ2,2, independently.
Plummer, Cuts in Bayesian graphical models, 2014.
Jacob, Holmes, Murray, Robert & Nicholson, Better together? Statistical learning in models made of modules.
  • 57. Monte Carlo with joint model approach
The joint model posterior has density
π(θ1, θ2|Y1, Y2) ∝ p1(θ1) p1(Y1|θ1) p2(θ2|θ1) p2(Y2|θ1, θ2).
The computational complexity typically grows super-linearly with the number of modules.
Difficulties stack up. . . intractability, multimodality, ridges, etc.

  • 58. Monte Carlo with cut distribution
The cut distribution is defined as
πcut(θ1, θ2; Y1, Y2) = p1(θ1|Y1) p2(θ2|θ1, Y2) ∝ π(θ1, θ2|Y1, Y2) / p2(Y2|θ1).
The denominator is the feedback of the 2nd module on θ1:
p2(Y2|θ1) = ∫ p2(Y2|θ1, θ2) p2(dθ2|θ1).
The feedback term is typically intractable.

  • 59. Monte Carlo with cut distribution
WinBUGS' approach via the cut function: alternate between
sampling θ1 from K1(θ1 → dθ1), targeting p1(dθ1|Y1);
sampling θ2 from K2,θ1(θ2 → dθ2), targeting p2(dθ2|θ1, Y2).
This does not leave the cut distribution invariant!
Iterating the kernel K2,θ1 enough times mitigates the issue.
Plummer, Cuts in Bayesian graphical models, 2014.
  • 60. Monte Carlo with cut distribution
In a perfect world, we could sample i.i.d.
θ1(i) from p1(θ1|Y1),
θ2(i) given θ1(i) from p2(θ2|θ1(i), Y2),
then (θ1(i), θ2(i)) would be i.i.d. from the cut distribution.
  • 61. Monte Carlo with cut distribution
In an MCMC world, we can sample
θ1(i) approximately from p1(θ1|Y1) using MCMC,
θ2(i) given θ1(i) approximately from p2(θ2|θ1(i), Y2) using MCMC,
then the resulting samples approximate the cut distribution, in the limit of the numbers of iterations at both stages.
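A hedged sketch of this two-stage scheme, where mcmc_module1 (returning draws of θ1 given Y1) and mcmc_module2 (returning draws of θ2 given θ1 and Y2) are hypothetical user-supplied samplers:

```r
# Two-stage MCMC approximation of the cut distribution (slide 61).
# 'mcmc_module1(n)' returns a list of n draws of theta1;
# 'mcmc_module2(n, theta1)' returns a list of n draws of theta2.
# Longer inner chains (larger n2) reduce the bias of stage two.
cut_sample <- function(n1, n2, mcmc_module1, mcmc_module2) {
  theta1s <- mcmc_module1(n1)               # approx. p1(theta1 | Y1)
  lapply(theta1s, function(theta1) {
    theta2s <- mcmc_module2(n2, theta1)     # approx. p2(theta2 | theta1, Y2)
    list(theta1 = theta1, theta2 = theta2s[[n2]])  # keep the final state
  })
}
```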
  • 62. Monte Carlo with cut distribution
In an unbiased MCMC world, we can approximate expectations ∫ h(x)π(dx) without bias, in finite compute time.
We can obtain an unbiased approximation of p1(θ1|Y1), and for each θ1, an unbiased approximation of p2(θ2|θ1, Y2).
Thus, by the tower property, we can unbiasedly estimate
∫∫ h(θ1, θ2) p2(dθ2|θ1, Y2) p1(dθ1|Y1).
Jacob, O'Leary & Atchadé, Unbiased MCMC with couplings, 2019.

  • 63. Example: epidemiological study
[Figure: approximations of the marginal densities of θ2,1 and θ2,2.]
Approximation of the marginals of the cut distribution of (θ2,1, θ2,2), the parameters of the Poisson regression module in the epidemiological model of Plummer (2014).
Jacob, Holmes, Murray, Robert & Nicholson, Better together? Statistical learning in models made of modules.
  • 64. Outline
1 Dempster–Shafer analysis of count data
2 Unbiased MCMC and diagnostics of convergence
3 Modular Bayesian inference
4 Bagging posterior distributions

  • 65. Bagging posterior distributions
"We can stabilize the posterior distribution by using a bootstrap and aggregation scheme, in the spirit of bagging (Breiman, 1996b). In a nutshell, denote by D′ a bootstrap or subsample of the data D. The posterior of the random parameters θ given the data D has c.d.f. F(·|D), and we can stabilize this using FBayesBag(·|D) = E′[F(·|D′)], where E′ is with respect to the bootstrap- or subsampling scheme. We call it the BayesBag estimator. It can be approximated by averaging over B posterior computations for bootstrap- or subsamples, which might be a rather demanding task (although say B = 10 would already stabilize to a certain extent)."
Bühlmann, Discussion of Big Bayes Stories and BayesBag, 2014.
  • 66. Bagging posterior distributions
For b = 1, . . . , B:
sample a data set D(b) by bootstrapping from D;
obtain an MCMC approximation π̂(b) of the posterior given D(b).
Finally obtain B⁻¹ Σ_{b=1}^{B} π̂(b).
This converges to the "BayesBag" distribution as both B and the number of MCMC samples go to infinity.
If we can obtain an unbiased approximation of the posterior given any D, the resulting approximation of "BayesBag" would be consistent as B → ∞ only.
Exactly the same reasoning as for the cut distribution.
Example at https://statisfaction.wordpress.com/2019/10/02/bayesbag-and-how-to-approximate-it/
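A minimal sketch of this recipe, where posterior_sampler is a hypothetical function returning a matrix of MCMC draws for a given data set:

```r
# BayesBag (slide 66): pool posterior draws across B bootstrap data
# sets. 'posterior_sampler(D, n)' is a hypothetical sampler returning
# an n-row matrix of draws (one row per draw) given data D.
bayesbag <- function(D, B, posterior_sampler, n_mcmc) {
  draws <- lapply(1:B, function(b) {
    Db <- D[sample(nrow(D), nrow(D), replace = TRUE), , drop = FALSE]
    posterior_sampler(Db, n_mcmc)
  })
  do.call(rbind, draws)   # equally-weighted mixture of the B approximations
}
```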
  • 67. Discussion
Some existing alternatives to standard Bayesian inference are well motivated, but raise computational questions.
There are ongoing efforts toward scalable Monte Carlo methods, e.g. using coupled Markov chains or regeneration techniques, in addition to the sustained search for new MCMC algorithms.
Quantification of variance is commonly done; quantification of bias is also possible.
What makes a computational method convenient? It does not seem to be entirely about asymptotic efficiency when the method is optimally tuned.
Thank you for listening!
Funding provided by the National Science Foundation, grants DMS-1712872 and DMS-1844695.
  • 68. References
Practical couplings in the literature. . .
Propp & Wilson, Exact sampling with coupled Markov chains and applications to statistical mechanics, Random Structures & Algorithms, 1996.
Johnson, Studying convergence of Markov chain Monte Carlo algorithms using coupled sample paths, JASA, 1996.
Neal, Circularly-coupled Markov chain sampling, UoT tech report, 1999.
Glynn & Rhee, Exact estimation for Markov chain equilibrium expectations, Journal of Applied Probability, 2014.
Agapiou, Roberts & Vollmer, Unbiased Monte Carlo: posterior estimation for intractable/infinite-dimensional models, Bernoulli, 2018.

  • 69. References
Finite-time bias of MCMC. . .
Brooks & Roberts, Assessing convergence of Markov chain Monte Carlo algorithms, STCO, 1998.
Cowles & Rosenthal, A simulation approach to convergence rates for Markov chain Monte Carlo algorithms, STCO, 1998.
Johnson, Studying convergence of Markov chain Monte Carlo algorithms using coupled sample paths, JASA, 1996.
Gorham, Duncan, Vollmer & Mackey, Measuring sample quality with diffusions, AAP, 2019.

  • 70. References
Own work. . .
with John O'Leary, Yves F. Atchadé: Unbiased Markov chain Monte Carlo with couplings, 2019.
with Fredrik Lindsten, Thomas Schön: Smoothing with couplings of conditional particle filters, 2019.
with Jeremy Heng: Unbiased Hamiltonian Monte Carlo with couplings, 2019.
with Lawrence Middleton, George Deligiannidis, Arnaud Doucet: Unbiased Markov chain Monte Carlo for intractable target distributions, 2019; Unbiased smoothing using particle independent Metropolis–Hastings, 2019.

  • 71. References
with Maxime Rischard, Natesh Pillai: Unbiased estimation of log normalizing constants with applications to Bayesian cross-validation.
with Niloy Biswas, Paul Vanetti: Estimating convergence of Markov chains with L-lag couplings, 2019.
with Chris Holmes, Lawrence Murray, Christian Robert, George Nicholson: Better together? Statistical learning in models made of modules.