This document outlines an approach to inference when exact Bayesian methods are not applicable. Specifically, it discusses Dempster–Shafer theory, which defines lower and upper probabilities for hypotheses based on feasible parameter sets. It proposes a Gibbs sampler for the distribution of these feasible sets induced by count data, representing each feasible set through relations between data points so that conditional distributions can be derived. This yields a Gibbs sampling algorithm for approximating Dempster–Shafer inferences in problems where exact computation is difficult. The talk also covers unbiased MCMC with couplings as a convergence diagnostic, modular (cut) Bayesian inference, and bagged posteriors (BayesBag).
Monte Carlo methods for some not-quite-but-almost Bayesian problems
1. Monte Carlo methods for some not-quite-but-almost Bayesian problems
Pierre E. Jacob
Department of Statistics, Harvard University
joint work with Ruobin Gong, Paul T. Edlefsen, Arthur P. Dempster, John O'Leary, Yves F. Atchadé, Niloy Biswas, Paul Vanetti, and others
November 21, 2019
Department of Statistical Science, University of Toronto
2. Introduction
A lot of questions in statistics give rise to non-trivial computational problems. Among these, some are numerical integration problems ⇔ problems of sampling from probability distributions.
Besag, Markov chain Monte Carlo for statistical inference, 2001.
Computational challenges arise in deviations from standard Bayesian inference, motivated by three questions:
- quantifying ignorance / Dempster–Shafer analysis,
- model misspecification / modular Bayesian inference,
- robustness to some perturbation of the data / BayesBag.
3. Outline
1 Dempster–Shafer analysis of count data
2 Unbiased MCMC and diagnostics of convergence
3 Modular Bayesian inference
4 Bagging posterior distributions
4. Outline (next: 1 Dempster–Shafer analysis of count data)
5. Inference with count data
Notation: [N] := {1, . . . , N}. Simplex ∆.
Observations: xn ∈ [K] := {1, . . . , K}, x = (x1, . . . , xN).
Index sets: Ik = {n ∈ [N] : xn = k}.
Counts: Nk = |Ik|.
Model: xn ∼ Categorical(θ) i.i.d., with θ = (θk)k∈[K] ∈ ∆, i.e. P(xn = k) = θk for all n, k.
Goal: estimate θ, predict, etc.
Maximum likelihood estimator: θ̂k = Nk/N.
Bayesian inference combines the likelihood with a prior on θ into a posterior distribution, assigning a probability in [0, 1] to any measurable subset Σ of the simplex ∆.
6. Arthur Dempster's approach to inference
Observations x = (xn)n∈[N] are fixed.
We will specify a sampling mechanism, on top of the likelihood, e.g. xn = m(un, θ) for some function m and random variable un.
We will seek u = (un)n∈[N] that could have generated x for some θ. For arbitrary u, such a θ might not exist.
If a set of feasible θ exists, denote it by F(u). Dempster's approach defines lower/upper probabilities for subsets Σ of interest, as expectations with respect to non-empty F(u).
Arthur P. Dempster. New methods for reasoning towards posterior distributions based on sample data. Annals of Mathematical Statistics, 1966.
Arthur P. Dempster. Statistical inference from a Dempster–Shafer perspective. Past, Present, and Future of Statistical Science, 2014.
7. Sampling from a Categorical distribution
[Figure: the simplex ∆ with vertices 1, 2, 3, partitioned into subsimplices ∆1(θ), ∆2(θ), ∆3(θ) meeting at a point θ]
Subsimplex ∆k(θ), for θ ∈ ∆: {z ∈ ∆ : ∀ℓ ∈ [K], zℓ/zk ≥ θℓ/θk}.
Sampling mechanism, for θ ∈ ∆:
- draw un uniform on ∆,
- define xn such that un ∈ ∆xn(θ).
Then P(xn = k) = θk, because Vol(∆k(θ)) = θk.
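This mechanism is easy to simulate. Below is a minimal Python sketch (ours, not from the slides; names are illustrative): it uses the fact that un ∈ ∆k(θ) is equivalent to k = argminℓ un,ℓ/θℓ, and checks empirically that P(xn = k) = θk.

```python
# Minimal sketch of the sampling mechanism m(u, theta), using the
# argmin characterization of subsimplex membership noted above.
import numpy as np

rng = np.random.default_rng(1)

def runif_simplex(rng, K):
    # normalized i.i.d. exponentials are uniformly distributed on the simplex
    e = rng.exponential(size=K)
    return e / e.sum()

def category(u, theta):
    # u lies in Delta_k(theta) iff u_k/theta_k <= u_l/theta_l for all l
    return int(np.argmin(u / theta))

theta = np.array([0.5, 0.3, 0.2])
draws = [category(runif_simplex(rng, 3), theta) for _ in range(100_000)]
print(np.bincount(draws) / len(draws))  # approximately (0.5, 0.3, 0.2)
```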
8. Draws in the simplex
Counts: (2, 3, 1). Let's draw N = 6 uniform samples on ∆.
[Figure: six points drawn uniformly on the simplex]
9. Draws in the simplex
Each un is associated to an observed xn ∈ {1, 2, 3}.
[Figure: the six points, labeled by category]
10. Draws in the simplex
If there exists a feasible θ, it cannot be just anywhere.
[Figure: the six labeled points]
11. Draws in the simplex
The uns of each category add constraints on θ.
[Figure: the six labeled points and the constraints they induce]
12. Draws in the simplex
Overall the constraints define a polytope for θ, or an empty set.
[Figure: the constraints intersecting in a polytope]
13. Draws in the simplex
Here, there is a polytope of θ such that ∀n ∈ [N], un ∈ ∆xn(θ).
[Figure: the polytope of feasible θ]
14. Draws in the simplex
Any θ in the polytope separates the uns appropriately.
[Figure: a θ in the polytope with its subsimplices ∆1(θ), ∆2(θ), ∆3(θ)]
15. Draws in the simplex
Let's try again with fresh uniform samples on ∆.
[Figure: six new points drawn uniformly on the simplex]
16. Draws in the simplex
Here there is no θ ∈ ∆ such that ∀n ∈ [N], un ∈ ∆xn(θ).
[Figure: the new points, whose constraints are incompatible]
17. Lower and upper probabilities
Consider the set
Rx = {(u1, . . . , uN) ∈ ∆^N : ∃θ ∈ ∆, ∀n ∈ [N], un ∈ ∆xn(θ)},
and denote by νx the uniform distribution on Rx.
For u ∈ Rx, there is a set F(u) = {θ ∈ ∆ : ∀n, un ∈ ∆xn(θ)}.
For a set Σ ⊂ ∆ of interest, define
(lower probability) P(Σ) = ∫ 1(F(u) ⊂ Σ) νx(du),
(upper probability) P̄(Σ) = ∫ 1(F(u) ∩ Σ ≠ ∅) νx(du).
18. Summary and Monte Carlo problem
Arthur Dempster's approach, later called the Dempster–Shafer theory of belief functions, is based on a distribution of feasible sets,
F(u) = {θ ∈ ∆ : ∀n ∈ [N], un ∈ ∆xn(θ)},
where u ∼ νx, the uniform distribution on Rx.
How do we obtain samples from this distribution?
Rejection sampling? Rejection rate 99% for the data (2, 3, 1).
Hit-and-run algorithm?
Our proposed strategy is a Gibbs sampler. Starting from some u ∈ Rx, we will iteratively refresh some components un of u given the others.
19. Gibbs sampler: initialization
We can obtain some u in Rx as follows.
Choose an arbitrary θ ∈ ∆.
For all n ∈ [N], sample un uniformly in ∆k(θ) where xn = k.
[Figure: the subsimplices ∆1(θ), ∆2(θ), ∆3(θ) around θ, with points drawn inside them]
To sample components un given the others, we will express Rx,
{u : ∃θ, ∀n, un ∈ ∆xn(θ)},
in terms of relations that the components un must satisfy with respect to one another.
20. Equivalent representation
For any θ ∈ ∆,
∀k ∈ [K], ∀n ∈ Ik : un ∈ ∆k(θ)
⇔ ∀k ∈ [K], ∀n ∈ Ik, ∀ℓ ∈ [K] : un,ℓ/un,k ≥ θℓ/θk.
This is equivalent to
∀k ∈ [K], ∀ℓ ∈ [K] : min_{n∈Ik} un,ℓ/un,k ≥ θℓ/θk.
21. Linear constraints
Counts: (9, 8, 3), u in Rx.
The values ηk→ℓ = min_{n∈Ik} un,ℓ/un,k define linear constraints on θ.
[Figure: the points on the simplex and two constraint lines, θ3/θ1 = η1→3 and θ2/θ1 = η1→2]
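As a companion to this slide, here is a short sketch (our code, with illustrative names) of how the ηk→ℓ can be computed from the points u and the labels x; categories are 0-indexed, and an empty category imposes no constraint (η = ∞).

```python
# eta[k, l] = min over n in I_k of u[n, l] / u[n, k], as on the slide.
import numpy as np

def compute_eta(u, x, K):
    # u: (N, K) points in the simplex; x: (N,) labels in {0, ..., K-1}
    eta = np.full((K, K), np.inf)    # no points in category k: no constraint
    for k in range(K):
        uk = u[x == k]
        if len(uk) > 0:
            eta[k] = (uk / uk[:, [k]]).min(axis=0)  # ratios u_{n,l}/u_{n,k}
    np.fill_diagonal(eta, 1.0)
    return eta
```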
22. Some inequalities
Next, assume u ∈ Rx, write ηk→ℓ = min_{n∈Ik} un,ℓ/un,k, and consider some implications.
There exists θ ∈ ∆ such that θℓ/θk ≤ ηk→ℓ for all k, ℓ ∈ [K].
Then, for all k, ℓ:
θℓ/θk ≤ ηk→ℓ and θk/θℓ ≤ ηℓ→k, thus ηk→ℓ ηℓ→k ≥ 1.
23. More inequalities
We can continue. If K ≥ 3, for all k, ℓ, j:
ηℓ→k⁻¹ ≤ θℓ/θk = (θℓ/θj)(θj/θk) ≤ ηj→ℓ ηk→j,
thus ηk→j ηj→ℓ ηℓ→k ≥ 1.
And if K ≥ 4, for all k, ℓ, j, m:
ηk→j ηj→ℓ ηℓ→m ηm→k ≥ 1.
Generally,
∀L ∈ [K], ∀j1, . . . , jL ∈ [K] : ηj1→j2 ηj2→j3 · · · ηjL→j1 ≥ 1.
24. Main result
So far: if ∃θ ∈ ∆ such that θℓ/θk ≤ ηk→ℓ for k, ℓ ∈ [K], then
∀L ∈ [K], ∀j1, . . . , jL ∈ [K] : ηj1→j2 ηj2→j3 · · · ηjL→j1 ≥ 1.
The reverse implication holds too. This would mean
Rx = {u : ∃θ, ∀k, ℓ ∈ [K], θℓ/θk ≤ ηk→ℓ}
= {u : ∀L ∈ [K], ∀j1, . . . , jL ∈ [K], ηj1→j2 ηj2→j3 · · · ηjL→j1 ≥ 1},
i.e. Rx is represented by relations between the components (un).
This helps in computing conditional distributions under νx, leading to a Gibbs sampler.
25. Some remarks on these inequalities
∀L ∈ [K], ∀j1, . . . , jL ∈ [K] : ηj1→j2 ηj2→j3 · · · ηjL→j1 ≥ 1.
We can consider only unique indices in j1, . . . , jL, since the other cases can be deduced from those.
Example: η1→2 η2→4 η4→3 η3→2 η2→1 ≥ 1 follows from η1→2 η2→1 ≥ 1 and η2→4 η4→3 η3→2 ≥ 1.
The indices j1 → j2 → · · · → jL → j1 form a cycle.
26. Graphs
Fully connected graph with weight log ηk→ℓ on edge (k, ℓ).
[Figure: three vertices 1, 2, 3 with directed edges, e.g. weights log(η1→2) and log(η2→1) between vertices 1 and 2]
Value of a path = sum of the weights along the path.
Negative cycle = path from a vertex to itself with negative value.
27. Graphs
∀L, ∀j1, . . . , jL : ηj1→j2 · · · ηjL→j1 ≥ 1
⇔ ∀L, ∀j1, . . . , jL : log(ηj1→j2) + · · · + log(ηjL→j1) ≥ 0
⇔ there are no negative cycles in the graph.
[Figure: the same three-vertex graph]
28. Proof
Proof of claim: "inequalities" ⇒ "∃θ : θℓ/θk ≤ ηk→ℓ ∀k, ℓ".
min(k → ℓ) := minimum value of a path from k to ℓ in the graph.
This is finite for all k, ℓ because of the absence of negative cycles in the graph.
Define θ via θk ∝ exp(min(K → k)).
Then θ ∈ ∆. Furthermore, for all k, ℓ,
min(K → ℓ) ≤ min(K → k) + log(ηk→ℓ),
therefore θℓ/θk ≤ ηk→ℓ.
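The proof is constructive, and easy to mirror in code. The sketch below is ours, not the authors' implementation; it swaps Bellman–Ford for an all-pairs Floyd–Warshall recursion, which is equivalent here and shorter to write. It returns a feasible θ, or None when the graph has a negative cycle, i.e. when F(u) is empty.

```python
# Sketch of slide 28's construction: weights log eta[k, l] on edge (k, l),
# shortest path values by Floyd-Warshall, theta_k propto exp(min(K -> k)).
import numpy as np

def feasible_theta(eta):
    K = eta.shape[0]
    d = np.log(eta)                  # d[k, l]: min path value from k to l
    np.fill_diagonal(d, 0.0)
    for j in range(K):               # allow paths passing through vertex j
        d = np.minimum(d, d[:, [j]] + d[[j], :])
    if np.any(np.diag(d) < -1e-12):  # negative cycle: no feasible theta
        return None
    theta = np.exp(d[K - 1])         # min(K -> k), taking vertex K last
    return theta / theta.sum()
```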
29. So far. . .
We want to sample uniformly on the set Rx,
Rx = {u : ∃θ, ∀k, ℓ ∈ [K], θℓ/θk ≤ ηk→ℓ}.
We have proved that this set can also be written
{u : ∀L ∈ [K], ∀j1, . . . , jL ∈ [K], ηj1→j2 ηj2→j3 · · · ηjL→j1 ≥ 1}.
The inequalities hold if and only if the graph with weight log ηk→ℓ on edge (k, ℓ) does not contain negative cycles.
30. Conditional distributions
We can obtain the conditional distribution of un for n ∈ Ik given (un)n∉Ik with respect to νx:
the un given (un)n∉Ik are i.i.d. uniform in ∆k(θ′),
where θ′ℓ ∝ exp(−min(ℓ → k)) for all ℓ,
with min(ℓ → k) := minimum value of a path from ℓ to k.
Shortest paths can be computed in polynomial time.
31–34. Conditional distributions
Counts: (9, 8, 3). What is the conditional distribution of (un)n∈Ik given (un)n∉Ik under νx?
[Figures over four slides: points on the simplex for counts (9, 8, 3), illustrating the conditional distribution of one category's points given the others]
35. Gibbs sampler
Initial u(0) ∈ Rx.
At each iteration t ≥ 1, for each category k ∈ [K]:
1. compute θ′ such that, for n ∈ Ik, un given the other components is uniform on ∆k(θ′);
2. draw u(t)_n uniformly on ∆k(θ′) for n ∈ Ik;
3. update η(t)k→ℓ for ℓ ∈ [K].
In step 1, θ′ is obtained by computing shortest paths in the graph with weight log η(t)k→ℓ on edge (k, ℓ), e.g. with the Bellman–Ford algorithm, implemented in Csárdi & Nepusz, igraph package, 2006.
Alternatively, we can compute θ′ by solving a linear program, Berkelaar, Eikland & Notebaert, lpsolve package, 2004.
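The following Python sketch implements one full sweep under our reading of slides 30 and 35: when refreshing category k, the graph uses only the constraints from the other categories (since ηk→ℓ depends on the very points being refreshed), and uniform sampling on ∆k(θ′) exploits the fact that ∆k(θ′) is the simplex with vertices θ′ and eℓ for ℓ ≠ k. This is only an illustration; the authors' R implementation is in the repository linked on the next slides.

```python
# One sweep of the Gibbs sampler, as we read slides 30 and 35.
import numpy as np

def sample_in_subsimplex(rng, theta, k):
    # Delta_k(theta) has vertices theta and e_l, l != k (volume theta_k);
    # Dirichlet(1,...,1) weights over those vertices give a uniform draw.
    K = len(theta)
    verts = np.eye(K)
    verts[k] = theta
    return rng.dirichlet(np.ones(K)) @ verts

def gibbs_sweep(rng, u, x, K):
    for k in range(K):
        # constraints from the other categories: w[j, l] = log eta_{j -> l}
        w = np.full((K, K), np.inf)
        for j in range(K):
            if j != k and np.any(x == j):
                uj = u[x == j]
                w[j] = np.log((uj / uj[:, [j]]).min(axis=0))
        d = w.copy()
        np.fill_diagonal(d, 0.0)
        for j in range(K):                   # Floyd-Warshall shortest paths
            d = np.minimum(d, d[:, [j]] + d[[j], :])
        theta = np.exp(-d[:, k])             # theta'_l propto exp(-min(l -> k))
        theta /= theta.sum()
        for n in np.where(x == k)[0]:        # refresh category k, i.i.d.
            u[n] = sample_in_subsimplex(rng, theta, k)
    return u
```

An initial u ∈ Rx can be produced as on slide 19: pick any θ ∈ ∆ and draw each un with sample_in_subsimplex(rng, θ, xn).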
36. Gibbs sampler
Counts: (9, 8, 3), 100 polytopes generated by the sampler.
[Figure: 100 overlaid polytopes on the simplex]
37. Cost per iteration
Cost in seconds for 100 full sweeps.
[Figure: elapsed time against K ∈ {4, 8, 12, 16}, for N ∈ {256, 512, 1024, 2048}]
https://github.com/pierrejacob/dempsterpolytope
38. Cost per iteration
Cost in seconds for 100 full sweeps.
[Figure: elapsed time against N ∈ {256, 512, 1024, 2048}, for K ∈ {4, 8, 12, 16}]
https://github.com/pierrejacob/dempsterpolytope
39. How many iterations for convergence?
Let ν(t) be the distribution of u(t) after t iterations.
TV(ν(t), νx) = sup_A |ν(t)(A) − νx(A)|.
[Figure: TV upper bounds against iteration, for K ∈ {5, 10, 20}]
40. How many iterations for convergence?
Let ν(t) be the distribution of u(t) after t iterations.
TV(ν(t), νx) = sup_A |ν(t)(A) − νx(A)|.
[Figure: TV upper bounds against iteration, for N ∈ {50, 100, 150, 200}]
41. Summary
A Gibbs sampler can be used to approximate lower and upper probabilities in the Dempster–Shafer framework.
Is perfect sampling possible here?
Extensions to hierarchical counts, hidden Markov models?
Jacob, Gong, Edlefsen & Dempster, A Gibbs sampler for a class of random convex polytopes. On arXiv and researchers.one.
https://github.com/pierrejacob/dempsterpolytope
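To make the first point concrete, here is one way (our sketch, with illustrative names) to turn sampled constraint matrices η, e.g. from compute_eta above, into lower/upper probability estimates for an assertion such as Σ = {θ ∈ ∆ : θ1 ≥ c}. Each F(u) is the polytope {θ : θℓ ≤ ηk→ℓ θk for all k, ℓ}, so the range of θ1 over F(u) comes from two linear programs, and the events F(u) ⊂ Σ and F(u) ∩ Σ ≠ ∅ reduce to comparisons of that range with c. The sketch assumes each η comes from some u ∈ Rx, so that the programs are feasible.

```python
# Estimating the lower/upper probabilities of slide 17 from Gibbs output.
import numpy as np
from scipy.optimize import linprog

def theta1_range(eta):
    K = eta.shape[0]
    A_ub, b_ub = [], []
    for k in range(K):
        for l in range(K):
            if k != l and np.isfinite(eta[k, l]):
                row = np.zeros(K)
                row[l], row[k] = 1.0, -eta[k, l]  # theta_l <= eta[k,l] theta_k
                A_ub.append(row)
                b_ub.append(0.0)
    A_eq, b_eq = [np.ones(K)], [1.0]              # theta sums to one
    obj = np.zeros(K)
    obj[0] = 1.0
    lo = linprog(obj, A_ub, b_ub, A_eq, b_eq, bounds=[(0, 1)] * K).fun
    hi = -linprog(-obj, A_ub, b_ub, A_eq, b_eq, bounds=[(0, 1)] * K).fun
    return lo, hi

def lower_upper(eta_samples, c=0.5):
    ranges = [theta1_range(eta) for eta in eta_samples]
    lower = np.mean([lo >= c for lo, _ in ranges])  # F(u) inside Sigma
    upper = np.mean([hi >= c for _, hi in ranges])  # F(u) meets Sigma
    return lower, upper
```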
42. Outline (next: 2 Unbiased MCMC and diagnostics of convergence)
43. Coupled chains
Glynn & Rhee, Exact estimation for Markov chain equilibrium expectations, 2014.
Generate two chains (Xt) and (Yt), both converging to π, as follows:
- sample X0 and Y0 from π0 (independently, or not),
- sample Xt | Xt−1 ∼ P(Xt−1, ·) for t = 1, . . . , L,
- for t ≥ L + 1, sample (Xt, Yt−L) | (Xt−1, Yt−L−1) ∼ P̄((Xt−1, Yt−L−1), ·).
P̄ must be such that:
- Xt+1 | Xt ∼ P(Xt, ·) and Yt | Yt−1 ∼ P(Yt−1, ·) (thus Xt and Yt have the same distribution for all t ≥ 0),
- there exists a random time τ such that Xt = Yt−L for t ≥ τ (the chains meet and remain "faithful").
44. Coupled chains
[Figure: traces of two coupled chains against iteration]
π = N(0, 1), RWMH with Normal proposal std = 0.5, π0 = N(10, 3²).
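A self-contained Python sketch of this example (one standard construction, not necessarily the one behind the figure): the two proposals are drawn from a maximal coupling and the accept-reject uniform is shared, so that once the chains meet they remain equal.

```python
# L-lag coupled random-walk MH for pi = N(0,1), proposal std 0.5,
# initial distribution N(10, 3^2), as in the figure's caption.
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(7)
sigma = 0.5
log_pi = lambda z: -0.5 * z ** 2

def mh_step(z, prop, u):
    return prop if np.log(u) < log_pi(prop) - log_pi(z) else z

def max_coupled_proposals(x, y):
    # maximal coupling of N(x, sigma^2) and N(y, sigma^2)
    p = rng.normal(x, sigma)
    if rng.uniform() * norm.pdf(p, x, sigma) <= norm.pdf(p, y, sigma):
        return p, p
    while True:
        q = rng.normal(y, sigma)
        if rng.uniform() * norm.pdf(q, y, sigma) > norm.pdf(q, x, sigma):
            return p, q

def meeting_time(L=1, max_iter=100_000):
    x, y = rng.normal(10, 3), rng.normal(10, 3)
    for _ in range(L):                      # advance X alone by the lag
        x = mh_step(x, rng.normal(x, sigma), rng.uniform())
    for t in range(L + 1, max_iter):
        px, py = max_coupled_proposals(x, y)
        u = rng.uniform()                   # shared accept-reject uniform
        x, y = mh_step(x, px, u), mh_step(y, py, u)
        if x == y:
            return t                        # tau: X_t = Y_{t-L} from here on
    return max_iter

print(np.mean([meeting_time() for _ in range(20)]))
```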
45. Unbiased estimators
Under some conditions, the estimator
(1/(m − k + 1)) Σ_{t=k}^{m} h(Xt)
+ (1/(m − k + 1)) Σ_{t=k+L}^{τ−1} min(m − k + 1, ⌈(t − k)/L⌉) (h(Xt) − h(Yt−L))
has expectation ∫ h(x) π(dx), finite cost and finite variance.
"MCMC estimator + bias correction terms"
Its efficiency can be close to that of MCMC estimators, if k, m are chosen appropriately (and L also).
Jacob, O'Leary & Atchadé, Unbiased MCMC with couplings, 2019.
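In code, and in our own notation, the estimator reads as follows; xs and ys are assumed to store X0, X1, . . . and Y0, Y1, . . . from a coupling such as the previous sketch, recorded at least up to max(m, τ − 1) and τ − 1 − L respectively.

```python
# Unbiased estimator H_{k:m}: MCMC average plus bias correction terms.
import numpy as np

def unbiased_estimator(h, xs, ys, tau, k, m, L=1):
    # standard MCMC average over iterations k to m
    mcmc_avg = np.mean([h(xs[t]) for t in range(k, m + 1)])
    # bias correction from the coupled pairs until meeting
    correction = 0.0
    for t in range(k + L, tau):
        weight = min(m - k + 1, int(np.ceil((t - k) / L)))
        correction += weight * (h(xs[t]) - h(ys[t - L]))
    return mcmc_avg + correction / (m - k + 1)
```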
46–48. Finite-time bias of MCMC
Total variation distance between Xt ∼ πt and π = lim_{t→∞} πt:
‖πt − π‖TV ≤ E[max(0, ⌈(τ − L − t)/L⌉)].
[Figures over three slides, for lag = 1, 50 and 100: histograms of τ − lag, and the resulting TV upper bounds against iteration]
Biswas, Jacob & Vanetti, Estimating Convergence of Markov chains with L-Lag Couplings, 2019.
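Given meeting times from independent coupled runs, the bound is a one-liner; the sketch below (ours) pairs with the earlier meeting_time function.

```python
# Empirical TV upper bound at iteration t from i.i.d. L-lag meeting times.
import numpy as np

def tv_upper_bound(taus, t, L=1):
    taus = np.asarray(taus, dtype=float)
    return np.mean(np.maximum(0.0, np.ceil((taus - L - t) / L)))

# e.g. taus = [meeting_time(L=50) for _ in range(200)], then plot
# tv_upper_bound(taus, t, L=50) against t.
```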
49. Finite-time bias of MCMC
Upper bounds can also be obtained for, e.g., the 1-Wasserstein distance. And perhaps lower bounds?
Applicable in, e.g., high-dimensional and/or discrete spaces.
Biswas, Jacob & Vanetti, Estimating Convergence of Markov chains with L-Lag Couplings, 2019.
50. Finite-time bias of MCMC
Example: Gibbs sampler for Dempster's analysis of counts.
[Figure: TV upper bounds against iteration, for N ∈ {50, 100, 150, 200}]
This quantifies the bias of MCMC estimators, not the variance.
51. Outline (next: 3 Modular Bayesian inference)
52. Models made of modules
First module: parameter θ1, data Y1; prior p1(θ1); likelihood p1(Y1|θ1).
Second module: parameter θ2, data Y2; prior p2(θ2|θ1); likelihood p2(Y2|θ1, θ2).
We are interested in the estimation of θ1, θ2 or both.
53. Joint model approach
Parameter (θ1, θ2), with prior
p(θ1, θ2) = p1(θ1) p2(θ2|θ1).
Data (Y1, Y2), likelihood
p(Y1, Y2|θ1, θ2) = p1(Y1|θ1) p2(Y2|θ1, θ2).
Posterior distribution
π(θ1, θ2|Y1, Y2) ∝ p1(θ1) p1(Y1|θ1) p2(θ2|θ1) p2(Y2|θ1, θ2).
54. Joint model approach
In the joint model approach, all data are used to simultaneously infer all parameters, so that uncertainty about θ1 is propagated to the estimation of θ2. But misspecification of the second module can damage the estimation of θ1.
What about allowing uncertainty propagation, but preventing feedback of some modules on others?
55. Cut distribution
One might want to propagate uncertainty without allowing "feedback" of the second module on the first module.
Cut distribution:
πcut(θ1, θ2; Y1, Y2) = p1(θ1|Y1) p2(θ2|θ1, Y2).
This differs from the posterior distribution under the joint model, under which the first marginal is π(θ1|Y1, Y2).
56. Example: epidemiological study
Model of virus prevalence:
∀i = 1, . . . , I, Zi ∼ Binomial(Ni, ϕi),
where Zi is the number of women infected with high-risk HPV in a sample of size Ni in country i. Beta(1, 1) prior on each ϕi, independently.
Impact of prevalence on cervical cancer occurrence:
∀i = 1, . . . , I, Yi ∼ Poisson(λi Ti), log(λi) = θ2,1 + θ2,2 ϕi,
where Yi is the number of cancer cases arising from Ti woman-years of follow-up in country i. N(0, 10³) priors on θ2,1, θ2,2, independently.
Plummer, Cuts in Bayesian graphical models, 2014.
Jacob, Holmes, Murray, Robert & Nicholson, Better together? Statistical learning in models made of modules.
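For this example the first module is conjugate, so cut sampling is particularly simple: each ϕi can be drawn exactly from its Beta posterior, and θ2 is then sampled given each ϕ draw by a short MCMC run on the Poisson module. The Python sketch below is ours, reading N(0, 10³) as variance 10³; Z, Nsamp, Y, T stand for the data of Plummer (2014) and are not reproduced here, and step sizes and run lengths are arbitrary.

```python
# Cut-distribution sampling for the HPV example: exact first stage,
# random-walk MH for the second stage given each phi draw.
import numpy as np

rng = np.random.default_rng(3)

def log_post_theta2(theta2, phi, Y, T):
    lam = np.exp(theta2[0] + theta2[1] * phi)
    loglik = np.sum(Y * np.log(lam * T) - lam * T)  # Poisson, up to a constant
    logprior = -0.5 * np.sum(theta2 ** 2) / 1000.0  # N(0, 10^3) priors
    return loglik + logprior

def cut_sample(Z, Nsamp, Y, T, n_outer=1000, n_inner=200, step=0.1):
    samples = []
    for _ in range(n_outer):
        phi = rng.beta(1 + Z, 1 + Nsamp - Z)        # exact draw, module 1
        theta2 = np.zeros(2)
        lp = log_post_theta2(theta2, phi, Y, T)
        for _ in range(n_inner):                    # MH run, module 2
            prop = theta2 + step * rng.normal(size=2)
            lp_prop = log_post_theta2(prop, phi, Y, T)
            if np.log(rng.uniform()) < lp_prop - lp:
                theta2, lp = prop, lp_prop
        samples.append(theta2)
    return np.array(samples)
```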
57. Monte Carlo with joint model approach
The joint model posterior has density
π(θ1, θ2|Y1, Y2) ∝ p1(θ1) p1(Y1|θ1) p2(θ2|θ1) p2(Y2|θ1, θ2).
The computational complexity typically grows super-linearly with the number of modules.
Difficulties stack up: intractability, multimodality, ridges, etc.
58. Monte Carlo with cut distribution
The cut distribution is defined as
πcut(θ1, θ2; Y1, Y2) = p1(θ1|Y1) p2(θ2|θ1, Y2) ∝ π(θ1, θ2|Y1, Y2) / p2(Y2|θ1).
The denominator is the feedback of the second module on θ1:
p2(Y2|θ1) = ∫ p2(Y2|θ1, θ2) p2(dθ2|θ1).
The feedback term is typically intractable.
59. Monte Carlo with cut distribution
WinBUGS' approach via the cut function: alternate between
- sampling θ1 from K1(θ1 → dθ1), targeting p1(dθ1|Y1);
- sampling θ2 from K2,θ1(θ2 → dθ2), targeting p2(dθ2|θ1, Y2).
This does not leave the cut distribution invariant!
Iterating the kernel K2,θ1 enough times mitigates the issue.
Plummer, Cuts in Bayesian graphical models, 2014.
60. Monte Carlo with cut distribution
In a perfect world, we could sample i.i.d.
- θ1^i from p1(θ1|Y1),
- θ2^i given θ1^i from p2(θ2|θ1^i, Y2);
then (θ1^i, θ2^i) would be i.i.d. from the cut distribution.
61. Monte Carlo with cut distribution
In an MCMC world, we can sample
- θ1^i approximately from p1(θ1|Y1) using MCMC,
- θ2^i given θ1^i approximately from p2(θ2|θ1^i, Y2) using MCMC;
the resulting samples then approximate the cut distribution, in the limit of the numbers of iterations at both stages.
62. Monte Carlo with cut distribution
In an unbiased MCMC world, we can approximate expectations ∫ h(x) π(dx) without bias, in finite compute time.
We can obtain an unbiased approximation of p1(θ1|Y1), and for each θ1, an unbiased approximation of p2(θ2|θ1, Y2).
Thus, by the tower property, we can unbiasedly estimate
∫∫ h(θ1, θ2) p2(dθ2|θ1, Y2) p1(dθ1|Y1).
Jacob, O'Leary & Atchadé, Unbiased MCMC with couplings, 2019.
63. Example: epidemiological study
[Figure: density estimates for θ2,1 and θ2,2]
Approximation of the marginals of the cut distribution of (θ2,1, θ2,2), the parameters of the Poisson regression module in the epidemiological model of Plummer (2014).
Jacob, Holmes, Murray, Robert & Nicholson, Better together? Statistical learning in models made of modules.
64. Outline (next: 4 Bagging posterior distributions)
65. Bagging posterior distributions
"We can stabilize the posterior distribution by using a bootstrap and aggregation scheme, in the spirit of bagging (Breiman, 1996b). In a nutshell, denote by D′ a bootstrap or subsample of the data D. The posterior of the random parameters θ given the data D has c.d.f. F(·|D), and we can stabilize this using
F_BayesBag(·|D) = E[F(·|D′)],
where E is with respect to the bootstrap- or subsampling scheme. We call it the BayesBag estimator. It can be approximated by averaging over B posterior computations for bootstrap- or subsamples, which might be a rather demanding task (although say B = 10 would already stabilize to a certain extent)."
Bühlmann, Discussion of Big Bayes Stories and BayesBag, 2014.
66. Bagging posterior distributions
For b = 1, . . . , B:
- sample data set D(b) by bootstrapping from D;
- obtain an MCMC approximation π̂(b) of the posterior given D(b).
Finally obtain B⁻¹ Σ_{b=1}^{B} π̂(b).
This converges to the "BayesBag" distribution as both B and the number of MCMC samples go to infinity.
If we can obtain an unbiased approximation of the posterior given any D, the resulting approximation of "BayesBag" is consistent as B → ∞ only.
Exactly the same reasoning as for the cut distribution.
Example at https://statisfaction.wordpress.com/2019/10/02/bayesbag-and-how-to-approximate-it/
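A minimal sketch of the recipe (ours; posterior_sampler stands for any MCMC routine returning draws given a data set, and B = 10 echoes Bühlmann's remark):

```python
# BayesBag approximation: pool posterior draws over bootstrap data sets.
import numpy as np

def bayesbag(data, posterior_sampler, B=10, seed=0):
    # data: numpy array of observations; posterior_sampler(data, rng) -> draws
    rng = np.random.default_rng(seed)
    n = len(data)
    draws = []
    for _ in range(B):
        idx = rng.integers(0, n, size=n)   # bootstrap resample D(b)
        draws.append(posterior_sampler(data[idx], rng))
    return np.concatenate(draws)           # mixture of the B posteriors
```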
67. Discussion
Some existing alternatives to standard Bayesian inference are well motivated, but raise computational questions.
There are ongoing efforts toward scalable Monte Carlo methods, e.g. using coupled Markov chains or regeneration techniques, in addition to the sustained search for new MCMC algorithms.
Quantification of variance is commonly done; quantification of bias is also possible.
What makes a computational method convenient? It does not seem to be entirely about asymptotic efficiency when the method is optimally tuned.
Thank you for listening!
Funding provided by the National Science Foundation, grants DMS-1712872 and DMS-1844695.
68. References
Practical couplings in the literature:
Propp & Wilson, Exact sampling with coupled Markov chains and applications to statistical mechanics, Random Structures & Algorithms, 1996.
Johnson, Studying convergence of Markov chain Monte Carlo algorithms using coupled sample paths, JASA, 1996.
Neal, Circularly-coupled Markov chain sampling, University of Toronto technical report, 1999.
Glynn & Rhee, Exact estimation for Markov chain equilibrium expectations, Journal of Applied Probability, 2014.
Agapiou, Roberts & Vollmer, Unbiased Monte Carlo: posterior estimation for intractable/infinite-dimensional models, Bernoulli, 2018.
69. References
Finite-time bias of MCMC:
Brooks & Roberts, Assessing convergence of Markov chain Monte Carlo algorithms, STCO, 1998.
Cowles & Rosenthal, A simulation approach to convergence rates for Markov chain Monte Carlo algorithms, STCO, 1998.
Johnson, Studying convergence of Markov chain Monte Carlo algorithms using coupled sample paths, JASA, 1996.
Gorham, Duncan, Vollmer & Mackey, Measuring Sample Quality with Diffusions, AAP, 2019.
70. References
Own work:
with John O'Leary, Yves F. Atchadé: Unbiased Markov chain Monte Carlo with couplings, 2019.
with Fredrik Lindsten, Thomas Schön: Smoothing with Couplings of Conditional Particle Filters, 2019.
with Jeremy Heng: Unbiased Hamiltonian Monte Carlo with couplings, 2019.
with Lawrence Middleton, George Deligiannidis, Arnaud Doucet: Unbiased Markov chain Monte Carlo for intractable target distributions, 2019; Unbiased Smoothing using Particle Independent Metropolis-Hastings, 2019.
71. References
with Maxime Rischard, Natesh Pillai: Unbiased estimation of log normalizing constants with applications to Bayesian cross-validation.
with Niloy Biswas, Paul Vanetti: Estimating Convergence of Markov chains with L-Lag Couplings, 2019.
with Chris Holmes, Lawrence Murray, Christian Robert, George Nicholson: Better together? Statistical learning in models made of modules.