Division of Engineering
& Applied Science
Robust MCMC Sampling with
Non-Gaussian and Hierarchical Priors
Trends and Advances in Monte Carlo Sampling Algorithms,
SAMSI, USA, December 15, 2017
Matt Dunlop
Victor Chen (Caltech)
Omiros Papaspiliopoulos (ICREA, UPF)
Andrew Stuart (Caltech)
Outline
1. Introduction
2. Series-Based Priors
3. Hierarchical Priors
4. Level Set Priors
5. Numerical Illustrations
6. Conclusions
The Inverse Problem
Problem Statement
Find u from y, where G : X → Y is the forward map, η is noise and
y = G(u) + η.
• Problem can contain many degrees of uncertainty related to η and G. The solution to the problem should hence also contain uncertainty.
• Quantifying prior beliefs about the state u by a probability measure, we can use Bayes’ theorem to update this distribution given the data y, producing the posterior distribution.
• In the Bayesian approach, the solution to the problem is the
posterior distribution. 1/56
The Posterior Distribution
• Assume, for simplicity, Y = RJ and the observational noise η ∼ N(0, Γ) is Gaussian. The likelihood of y given u is
P(y|u) ∝ exp( −(1/2)‖y − G(u)‖²_Γ ) =: exp(−Φ(u; y)).
• Quantify prior beliefs by a prior distribution μ0 = P(u) on X.
• Posterior distribution μ = P(u|y) on X is given by Bayes’ theorem:
μ(du) = (1/Z) exp(−Φ(u; y)) μ0(du).
See e.g. (Stuart, 2010; Sullivan, 2017).
• We first assume μ0 is Gaussian, then move to more general
priors. 2/56
Robust Sampling
• Expectations under μ can be estimated numerically using samples
from μ. These may be generated though MCMC methods.
• Samples from MCMC chains are correlated, and strong correlations
lead to poor expectation estimates. With straightforward MCMC
methods, these correlations will become stronger as the dimension of
state space N is increased.
• Preferable to have MCMC chains with convergence rate bounded
independently of dimension, e.g. if μN is an approximation to μ,
d(P_N^k(u0, ·), μN) ≤ C(1 − ρN)^k with ρN ≥ ρ∗ > 0 for all N.
• General idea: if the chain is ergodic in infinite dimensions, the same
should hold for finite-dimensional approximations.
3/56
Robust Sampling Gaussian Prior
• If the prior μ0 = N(0, C) is Gaussian, posterior can be sampled with
MCMC in a dimension robust manner using, for example, the pCN
algorithm (Beskos et al., 2008; Hairer et al., 2014):
pCN Algorithm
1. Set n = 0 and choose β ∈ (0, 1]. Initial state u(0) ∈ X.
2. Propose new state û(n) = (1 − β²)^{1/2} u(n) + βζ(n), ζ(n) ∼ N(0, C).
3. Set u(n+1) = û(n) with probability
α(u(n) → û(n)) = min{ 1, exp( Φ(u(n); y) − Φ(û(n); y) ) },
or else set u(n+1) = u(n).
4. Set n → n + 1 and go to 2.
• Dimension robust geometric methods, such as ∞-MALA, ∞-HMC, are
also available, which take into account derivative information. 4/56
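The accept/reject structure above translates directly into code. Below is a minimal Python sketch of the pCN loop, assuming a user-supplied sample_prior() that draws from N(0, C) on the discretised space and a neg_log_likelihood(u) that evaluates Φ(u; y); both names are placeholders, not from the slides.

```python
import numpy as np

def pcn(neg_log_likelihood, sample_prior, u0, beta=0.2, n_iters=10_000, rng=None):
    """Minimal pCN sketch: Gaussian prior N(0, C), potential Phi(u; y)."""
    rng = np.random.default_rng() if rng is None else rng
    u = np.asarray(u0, dtype=float)
    phi = neg_log_likelihood(u)
    samples = []
    for _ in range(n_iters):
        zeta = sample_prior()                                # zeta ~ N(0, C)
        u_hat = np.sqrt(1.0 - beta**2) * u + beta * zeta     # pCN proposal
        phi_hat = neg_log_likelihood(u_hat)
        # accept with probability min{1, exp(Phi(u) - Phi(u_hat))}
        if np.log(rng.uniform()) < phi - phi_hat:
            u, phi = u_hat, phi_hat
        samples.append(u.copy())
    return np.array(samples)
```

Note that the acceptance probability involves only the potential Φ and no prior density ratio, which is what allows the acceptance rate to remain bounded away from zero as the discretisation is refined.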
Robust Sampling Non-Gaussian Priors
• Non-Gaussian priors may be preferred over Gaussians to allow
for, for example,
– Piecewise constant fields/sparsity,
– Heavier tails,
– Different scaling properties,
– Complex non-stationary behavior.
• For non-Gaussian priors, dimension robust properties can be
obtained by choosing a prior-reversible proposal kernel such
that the resulting MCMC kernel has a uniformly positive
spectral gap (Vollmer, 2015).
• We consider non-Gaussian random variables that may be
written as non-linear transformations of Gaussians, allowing us
to take advantage of known dimension robust methods for
Gaussian priors. 5/56
Robust Sampling General Idea
• Suppose that we may write μ0 = T♯ν0 for some probability measure ν0 on a space Ξ and transformation map T : Ξ → X. Then ξ ∼ ν0 implies that T(ξ) ∼ μ0.
• Consider the two distributions
μ(du) = (1/Z) exp(−Φ(u; y)) μ0(du),
ν(dξ) = (1/Z) exp(−Φ(T(ξ); y)) ν0(dξ).
Then μ = T♯ν.
• If we can sample ν in a dimension robust manner, we can
therefore sample μ in a dimension robust manner.
6/56
Robust Sampling Non-Gaussian Priors
• Assuming μ0 = T♯N(0, I), we can sample the posterior in a dimension robust manner with the whitened pCN algorithm:
wpCN Algorithm
1. Set n = 0 and choose β ∈ (0, 1]. Initial state ξ(0) ∈ Ξ; set u(0) = T(ξ(0)).
2. Propose new state ξ̂(n) = (1 − β²)^{1/2} ξ(n) + βζ(n), ζ(n) ∼ N(0, I).
3. Set ξ(n+1) = ξ̂(n) with probability
α(ξ(n) → ξ̂(n)) = min{ 1, exp( Φ(T(ξ(n)); y) − Φ(T(ξ̂(n)); y) ) },
or else set ξ(n+1) = ξ(n).
4. Set u(n+1) = T(ξ(n+1)).
5. Set n → n + 1 and go to 2.
7/56
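The whitened variant changes only where the prior enters: proposals are made on the white-noise variable ξ and the map T is applied inside the potential. A minimal sketch along the lines of the pCN code above, assuming T maps white noise to the prior sample space and neg_log_likelihood evaluates Φ(·; y) (placeholder names again):

```python
import numpy as np

def wpcn(neg_log_likelihood, T, xi0, beta=0.2, n_iters=10_000, rng=None):
    """Minimal wpCN sketch: reference measure N(0, I) on xi, prior sample u = T(xi)."""
    rng = np.random.default_rng() if rng is None else rng
    xi = np.asarray(xi0, dtype=float)
    phi = neg_log_likelihood(T(xi))
    us = []
    for _ in range(n_iters):
        zeta = rng.standard_normal(xi.shape)                  # zeta ~ N(0, I)
        xi_hat = np.sqrt(1.0 - beta**2) * xi + beta * zeta    # whitened pCN proposal
        phi_hat = neg_log_likelihood(T(xi_hat))
        # accept with probability min{1, exp(Phi(T(xi)) - Phi(T(xi_hat)))}
        if np.log(rng.uniform()) < phi - phi_hat:
            xi, phi = xi_hat, phi_hat
        us.append(T(xi))
    return us
```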
Outline
1. Introduction
2. Series-Based Priors
3. Hierarchical Priors
4. Level Set Priors
5. Numerical Illustrations
6. Conclusions
7/56
Series-Based Priors
Let
• {φj}j≥1 be a deterministic X-valued sequence,
• {ρj}j≥1 be a deterministic R-valued sequence,
• {ζj}j≥1 be an independent random R-valued sequence,
• m ∈ X,
and define μ0 to be the law of the random variable
u = m + Σ_{j≥1} ρj ζj φj.
Such priors have been considered in Bayesian inversion in cases where, for
example, {ζj} is uniform (Schwab, Stuart, 2012), Besov (Lassas et al., 2009)
and stable (Sullivan, 2017).
8/56
Series-Based Priors
We can find a Gaussian measure ν0 and map T such that μ0 = T♯ν0 as follows:
1. Write down a method for sampling ζj.
2. Using an inverse CDF type method, rewrite this method in terms of a (possibly multivariate) Gaussian random variable ξj, so that ζj = Λj(ξj) in distribution.
3. Define
T(ξ) = m + Σ_{j≥1} ρj Λj(ξj) φj
and ν0 the joint distribution of {ξj}j≥1.
9/56
Series-Based Priors Examples
Example (Uniform)
• We want ζj ∼ U(−1, 1).
• Define ν0 = N(0, 1)∞ and Λj(z) = 2φ(z) − 1, where φ is the
standard normal CDF.
Example (Besov)
• We want ζj ∼ πq, where πq(z) ∝ exp( −(1/2)|z|^q ).
• Define ν0 = N(0, 1)∞ and
Λj(z) = 2^{1/q} sgn(z) ( γ⁻¹_{1/q}( 2φ(|z|) − 1 ) )^{1/q},
where γ_{1/q} is the normalized lower incomplete gamma function and φ is the standard normal CDF.
10/56
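Both transformations are a few lines each. The sketch below (assuming scipy is available) maps a standard normal draw z to a U(−1, 1) draw and to a draw from πq; the argument 2φ(|z|) − 1 is the uniform variable obtained from |z| in the inverse-CDF construction above, and the function names are illustrative.

```python
import numpy as np
from scipy.special import gammaincinv   # inverse of the regularised lower incomplete gamma
from scipy.stats import norm

def lambda_uniform(z):
    """Map z ~ N(0, 1) to a U(-1, 1) random variable."""
    return 2.0 * norm.cdf(z) - 1.0

def lambda_besov(z, q):
    """Map z ~ N(0, 1) to a draw from pi_q(x) proportional to exp(-|x|^q / 2)."""
    u = 2.0 * norm.cdf(np.abs(z)) - 1.0     # uniform on [0, 1)
    w = gammaincinv(1.0 / q, u)             # Gamma(1/q, 1) draw via its inverse CDF
    return 2.0 ** (1.0 / q) * np.sign(z) * w ** (1.0 / q)
```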
Numerical Example I Dimensional Robustness
• We consider a 2D regression problem with a Besov prior.
• We compare the average acceptance rate of the whitened pCN
algorithm with those of three different Random Walk
Metropolis algorithms, as the dimension of the state space is
increased.
RWM Algorithm
1. Set n = 0 and choose β > 0. Initial state u(0) ∈ X.
2. Propose new state û(n) = u(n) + βζ(n), ζ(n) ∼ Q(ζ).
3. Set u(n+1) = û(n) with probability
α(u(n) → û(n)) = min{ 1, exp( Φ(u(n); y) − Φ(û(n); y) ) · μ0(û(n)) / μ0(u(n)) },
or else set u(n+1) = u(n).
4. Set n → n + 1 and go to 2.
11/56
Numerical Example I Dimensional Robustness
12/56
Figure: Mean acceptance rates for different MCMC algorithms (panels: Whitened pCN; RWM with white Gaussian jumps; RWM with colored Gaussian jumps; RWM with Besov jumps). Mean acceptance probability is plotted against the jump size β; curves are shown for state space dimensions N = 64, 256, 1024, 4096, 16384, 65536.
Numerical Example I Dimensional Robustness
13/56
Figure: Mean acceptance rates for different MCMC algorithms. Mean acceptance probability is plotted against the jump size β; curves are shown for state space dimensions N = 256, 1024, 4096, 16384, 65536.
Numerical Example II Existing Methods
• Vollmer (2015) provides methodology for robust sampling with
a uniform prior, by reflecting random walk proposals at the
boundary of [−1, 1]N.
• We compare these methods, using both uniform and Gaussian
reflected random walk proposals, with the whitened pCN
method described above.
• As ν has a Gaussian prior, we have additional methodology
available: we compare also with a dimension robust version of
MALA, which makes likelihood informed proposals (Beskos et al,
2017).
14/56
Numerical Example II Existing Methods
15/56
Figure: Autocorrelations for different sampling methods (RURWM, RSRWM, wPCN, ∞-wMALA) for a linear deblurring problem, with 8/32 observations (left/right).
Outline
1. Introduction
2. Series-Based Priors
3. Hierarchical Priors
4. Level Set Priors
5. Numerical Illustrations
6. Conclusions
15/56
Hierarchical Priors
• A family of Gaussian distributions will often have a number of
parameters associated with it controlling sample properties.
• For example, Whittle-Matérn distributions with parameters
θ = (ν, τ, σ), which have covariance function
cθ(x, y) = σ² (1 / (2^{ν−1}Γ(ν))) (τ|x − y|)^ν Kν(τ|x − y|).
– ν controls smoothness – draws from Gaussian fields with this
covariance have ν fractional Sobolev and Hölder derivatives.
– τ is an inverse length-scale.
– σ is an amplitude scale.
• These parameters may not be known a priori, and may be
treated hierarchically as hyperparameters in the problem.
16/56
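For reference, the Whittle-Matérn covariance function above can be evaluated directly with scipy's modified Bessel function; a small sketch (the limiting value σ² is used at coincident points):

```python
import numpy as np
from scipy.special import gamma, kv   # kv: modified Bessel function of the second kind

def matern_covariance(r, nu, tau, sigma):
    """Whittle-Matern covariance c_theta(x, y) as a function of r = |x - y|."""
    r = np.atleast_1d(np.asarray(r, dtype=float))
    c = np.full(r.shape, sigma**2)        # c_theta -> sigma^2 as r -> 0
    pos = r > 0
    s = tau * r[pos]
    c[pos] = sigma**2 * (2.0 ** (1.0 - nu) / gamma(nu)) * s**nu * kv(nu, s)
    return c
```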
Hierarchical Priors
• Placing a hyperprior π0 on the hyperparameters θ, the posterior
μ(du, dθ) may be sampled using a Metropolis-within-Gibbs
method, alternating the following two steps:
Metropolis-within-Gibbs
1. Update u(n) → u(n+1) using pCN (etc) method for the
conditional distribution u|(θ(n), y).
2. Update θ(n) → θ(n+1) using an MCMC method that samples
θ|(u(n+1), y) in stationarity.
• Problem: For different choices of θ, the Gaussian distributions of u|θ are typically mutually singular. Moves in the hyperparameters are hence never accepted in the infinite-dimensional limit. 17/56
Hierarchical Priors
• We use the same methodology for robust sampling as for the
series-based priors, finding an appropriate measure ν0 and
map T.
• Suppose u|θ ∼ N(0, C(θ)). If ξ ∼ N(0, I) is white noise, then C(θ)^{1/2}ξ ∼ N(0, C(θ)).
We hence define ν0 = N(0, I) ⊗ π0 and T(ξ, θ) = C(θ)^{1/2}ξ.
• ξ and θ are independent under ν0, and so singularity issues
from Metropolis-within-Gibbs disappear when sampling ν.
• Referred to as non-centered parameterization in literature
(Papaspiliopoulos et al., 2007)
18/56
Hierarchical Priors Centered vs Non-Centered
• Current state θ, proposed state θ′ = θ + ϵη.
• Centered acceptance rate for θ|u, y:
1 ∧ √( det C(θ) / det C(θ′) ) · exp( (1/2)‖C(θ)^{−1/2}u‖² − (1/2)‖C(θ′)^{−1/2}u‖² ) · π0(θ′)/π0(θ).
• Non-centered acceptance rate for θ|ξ, y:
1 ∧ exp( Φ(T(ξ, θ); y) − Φ(T(ξ, θ′); y) ) · π0(θ′)/π0(θ).
18/56
Hierarchical Priors
Example (Whittle-Matérn Distributions)
• If u is a Gaussian random field on Rd with covariance function cθ(·, ·) above, then it is equal in law to the solution of the SPDE
(τ²I − Δ)^{ν/2 + d/4} u = β(ν)στ^ν ξ,
where ξ ∼ N(0, I). (Lindgren et al., 2011)
• The unknown field in the inverse problem can thus either be treated as u or ξ, related via the mapping
u = T(ξ, θ) = β(ν)στ^ν (τ²I − Δ)^{−ν/2 − d/4} ξ.
• We set ν0 = N(0, I) ⊗ π0 and define T as above.
19/56
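As a concrete illustration of the non-centered map, (τ²I − Δ)^{−ν/2−d/4} is diagonal in the Fourier basis on a periodic domain, so T(ξ, θ) can be applied with an FFT. The sketch below works on a periodic 1D grid (d = 1); the normalising constant β(ν) and the discrete white-noise scaling are left as inputs and discretisation details are glossed over, so this is a sketch under those assumptions rather than the implementation behind the slides.

```python
import numpy as np

def matern_spde_transform(xi, tau, nu, sigma, beta_nu=1.0):
    """Apply T(xi, theta) = beta(nu) sigma tau^nu (tau^2 I - Laplacian)^(-nu/2 - 1/4) xi
    on a periodic 1D grid (d = 1)."""
    n = xi.size
    freq = np.fft.fftfreq(n, d=1.0 / n)                   # integer Fourier frequencies on [0, 1)
    lam = (2.0 * np.pi * freq) ** 2                       # eigenvalues of -Laplacian
    symbol = (tau**2 + lam) ** (-(nu / 2.0 + 0.25))       # fractional inverse, mode by mode
    u_hat = symbol * np.fft.fft(xi)
    return beta_nu * sigma * tau**nu * np.real(np.fft.ifft(u_hat))
```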
Outline
1. Introduction
2. Series-Based Priors
3. Hierarchical Priors
4. Level Set Priors
5. Numerical Illustrations
6. Conclusions
19/56
Level Set Method
• It is often of interest to recover a piecewise constant function,
such as in inverse interface or classification problems.
• We can construct a piecewise constant function by thresholding
a continuous function at a number of levels.
• Define F : C0(D) → L∞(D) by
F(ξ)(x) = κ1 if ξ(x) ∈ (−∞, −1),  κ2 if ξ(x) ∈ [−1, 1),  κ3 if ξ(x) ∈ [1, ∞).
Then if ξ ∼ N(0, C) is a continuous Gaussian random field, u = F(ξ) is a piecewise constant random field.
• We have a natural method for sampling when a prior μ0 = F♯N(0, C) is used. 20/56
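The map F is elementwise, so on a discretised field it is a few lines of array code; a minimal sketch, with illustrative default values for κ1, κ2, κ3:

```python
import numpy as np

def level_set_threshold(xi, kappa=(1.0, 2.0, 3.0)):
    """F(xi)(x): threshold a continuous field into a piecewise constant field with 3 levels."""
    xi = np.asarray(xi)
    u = np.full(xi.shape, kappa[1])    # kappa_2 where xi(x) in [-1, 1)
    u[xi < -1.0] = kappa[0]            # kappa_1 where xi(x) in (-inf, -1)
    u[xi >= 1.0] = kappa[2]            # kappa_3 where xi(x) in [1, inf)
    return u
```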
Level Set Method
21/56
Figure: Examples of continuous fields ξ ∼ N(0, C) (top) and
the thresholded fields u = F(ξ) (bottom).
Level Set Method
22/56
Figure: Examples of continuous fields ξ ∼ N(0, C(τ)) for vari-
ous τ (top) and the thresholded fields u = F(ξ) (bottom).
• The thresholding function F : C0(D; R) → L∞(D; R) has the
disadvantage that an ordering of layers/classes is asserted. This
can be overcome by instead using a vector level set method.
• Let k denote the number of layers/classes. Define S : C0(D; Rk) → L∞(D; Rk) by
S(ξ)(x) = e_{r(x;ξ)},   r(x; ξ) = argmax_{j=1,...,k} ξj(x),   (1)
where {er}, r = 1, . . . , k, denotes the standard basis for Rk.
• Samples from μ0 = S♯N(0, C)^k then take values in the set {er}, r = 1, . . . , k, for each x ∈ D.
23/56
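The vector level set map S in (1) is an argmax followed by a one-hot encoding; a minimal sketch, storing the k fields ξ1, . . . , ξk as an array of shape (k, n_points):

```python
import numpy as np

def vector_level_set(xi):
    """S(xi)(x) = e_{argmax_j xi_j(x)}: one-hot encode the pointwise argmax."""
    xi = np.asarray(xi)                  # shape (k, n_points)
    r = np.argmax(xi, axis=0)            # winning class at each point
    return np.eye(xi.shape[0])[:, r]     # columns are the basis vectors e_r, shape (k, n_points)
```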
Outline
1. Introduction
2. Series-Based Priors
3. Hierarchical Priors
4. Level Set Priors
5. Numerical Illustrations
6. Conclusions
23/56
Numerical Example III Groundwater Flow
• We consider the case where the forward map represents the
mapping from permeability to pressure in the steady state
Darcy flow model.
• Let D = (0, 1)². Given u ∈ L∞(D), denote by p(x) = p(x; u) the solution to
−∇ · (u∇p) = f in D,   p = 0 on ∂D.
• Define G : L∞(D) → RJ by G(u) = (p(x1; u), . . . , p(xJ; u)).
• We look at the Bayesian inverse problem of recovering u from a noisy measurement of G(u), using non-hierarchical and hierarchical level set priors.
24/56
Numerical Example III Groundwater Flow
• We consider the case where the forward map represents the
mapping from permeability to pressure in the steady state
Darcy flow model, observed at 36 points.
25/56
Figure: (Left) True permeability u†, (Right) True pressure and
observed data y.
Numerical Example III Groundwater Flow
• We consider the case where the forward map represents the
mapping from permeability to pressure in the steady state
Darcy flow model, observed at 36 points.
26/56
Figure: (Left) True permeability u†, (Right) Posterior mean
F(E(ξ)).
Numerical Example III Groundwater Flow
• We consider the case where the forward map represents the
mapping from permeability to pressure in the steady state
Darcy flow model, observed at 36 points.
27/56
Figure: (Left) True permeability u†, (Right) Posterior mean F(T(E(ξ), E(α), E(τ))).
Numerical Example IV Deep Gaussian Process I
• Consider the infinite hierarchy {un}n∈N of the form
un+1 | un ∼ N(0, C(un))
• We can alternatively write this in the form
un+1 = L(un)ξn, ξn ∼ N(0, I) i.i.d
where L(u)L(u)∗ = C(u).
• Classical setup of (Damianou, Lawrence, 2013): L(u)ξ = ξ ◦ u.
• We choose L(u) such that C(u) is a Whittle-Matérn-type covariance with spatially varying inverse length-scale τ(u).
• Ergodicity of this process means that it is sufficient for inference
to terminate the hierarchy after a small number of levels. 28/56
Numerical Example IV Deep Gaussian Process I
• We can consider the unknowns in the problem to be the variables
– u = (u0, . . . , uN−1), which are correlated under the prior; or
– ξ = (ξ0, . . . , ξN−1), which are independent under the prior.
• These variables are related via u = TN(ξ), where the deterministic maps Tn : Ξ → X are defined iteratively by
T1(ξ0, . . . , ξN−1) = ξ0,
Tn+1(ξ0, . . . , ξN−1) = L(Tn(ξ0, . . . , ξN−1)) ξn,   n = 1, . . . , N − 1.
• We thus set T = TN and ν0 = N(0, I)^N.
29/56
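The iterated map TN is a short loop once a routine applying L(u) to a white-noise draw is available. A minimal sketch, assuming a user-supplied apply_L(u, xi) returning L(u)ξ (for instance a Whittle-Matérn square root with inverse length-scale τ(u), or the composition ξ ∘ u of the classical setup); the function name is a placeholder.

```python
import numpy as np

def dgp_transform(xis, apply_L):
    """T_N(xi_0, ..., xi_{N-1}): u_1 = xi_0, then u_{n+1} = L(u_n) xi_n for n = 1, ..., N-1."""
    u = np.asarray(xis[0])        # T_1 = xi_0
    for xi_n in xis[1:]:
        u = apply_L(u, xi_n)      # u_{n+1} = L(u_n) xi_n
    return u
```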
Numerical Example IV 1D Regression
We consider regression using a Whittle-Matern DGP prior, with the
aim of recovering the following field from noisy point observations:
30/56
Figure: The true field u†. It is observed on a grid of J points.
Numerical Example IV 1D Regression
31/56
Figure: Sample means of layers under the posterior, using 4, 3, 2, 1 layers for the DGP prior. Here J = 25.
Numerical Example IV 1D Regression
32/56
Figure: Sample means of layers under the posterior, using 4, 3, 2, 1 layers for the DGP prior. Here J = 50.
Numerical Example IV 1D Regression
33/56
Figure: Sample means of layers under the posterior, using 4, 3, 2, 1 layers for the DGP prior. Here J = 100.
Numerical Example IV 1D Regression
34/56
Figure: Sample means of layers under the posterior, using 4, 3, 2, 1 layers for the DGP prior. Here J = 10⁶.
Numerical Example IV Error Values
Table: The L¹-errors ‖u† − E(uN)‖_{L¹} between the true field and the sample means for the one-dimensional simulations.
J      1 layer    2 layers   3 layers   4 layers
25     0.0746     0.0658     0.0667     0.0670
50     0.0568     0.0339     0.0339     0.0337
100    0.0485     0.0200     0.0198     0.0196
10⁶    0.0131     0.000145   0.000133   0.000133
35/56
Numerical Example IV 2D Regression
36/56
Figure: The true field u†, which possesses multiple length-scales. It is observed on a grid of J points.
Numerical Example IV 2D Regression
37/56
Figure: (Left) E(u3), (Middle, Right) The length-scale fields E(F(u2)), E(F(u1)). Here J = 2¹⁰.
Numerical Example IV 2D Regression
38/56
Figure: (Left) E(u3), (Middle, Right) The length-scale fields E(F(u2)), E(F(u1)). Here J = 2⁸.
Numerical Example IV Error Values
Table: The L²-errors ‖u† − E(uN)‖_{L²} between the true field and the sample means for the two-dimensional simulations.
J      1 layer    2 layers   3 layers
2¹⁰    0.0856     0.0813     0.0681
2⁸     0.1310     0.1260     0.1279
39/56
Numerical Example V Deep Gaussian Process II
• We now consider the case where the forward map is the mapping from conductivity to boundary voltage measurements in the complete electrode model for EIT (Somersalo et al., 1992).
• This is similar to the groundwater flow problem, except
boundary measurements are made instead of interior
measurements, for a collection of source terms.
• The prior is taken to be a 2 layer hierarchy of the form
considered above, except each continuous field is thresholded
to be piecewise constant.
40/56
Numerical Example V Deep Gaussian Process II
41/56
Figure: (Left) True conductivity field (Middle) True length-scale
field (Right) Vector of observed boundary voltages.
Numerical Example V Deep Gaussian Process II
42/56
Figure: (Left) Pushforward of posterior mean conductivity
(Right) pushforward of posterior mean length scale.
Numerical Example V Deep Gaussian Process II
43/56
Figure: Examples of samples of the conductivity field under the
posterior.
Numerical Example VI Graph-Based Learning
• Assume we have a collection of data points {xj}j∈Z ⊂ Rd
that we
wish to divide into classes.
• We have class labels {yj}j∈Z′ for a subset of points {xj}j∈Z′, Z′ ⊂ Z,
and wish to propagate these to all points.
• We view the data points as vertices of a weighted graph G, with
weights Wij = η(xi, xj) describing the similarity between data points.
• A priori clustering information can be obtained from the spectral properties of the graph Laplacian LN ∈ RN×N associated with G:
LN = D − W,   Dii = Σ_{j=1}^{N} Wij.
• If there are K clusters and η can distinguish between them perfectly,
LN has K zero eigenvalues, whose eigenvectors determine these
clusters perfectly.
44/56
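A small sketch of the graph construction, with a Gaussian similarity kernel standing in for η (the kernel and its bandwidth are illustrative choices, not specified on the slides):

```python
import numpy as np

def graph_laplacian(X, bandwidth=1.0):
    """Unnormalised graph Laplacian L_N = D - W with weights W_ij = eta(x_i, x_j)."""
    sq_dists = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    W = np.exp(-sq_dists / (2.0 * bandwidth**2))   # Gaussian similarity weights
    np.fill_diagonal(W, 0.0)                       # no self-loops
    D = np.diag(W.sum(axis=1))
    return D - W
```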
Numerical Example VI Graph-Based Learning
• We define the prior ν0 = N(0, C(θ))^K with
C(θ) = PM (LN + τ²)^{−α} P∗M,   θ = (α, τ, M).
• A model for the labels is given by
yj = S(u(xj)) + ηj,   j ∈ Z′,
where S : RK → RK is a multi-class thresholding function, and u : G → RK is a function on the entire graph.
• This defines the likelihood, and hence the posterior distribution:
ν(du) ∝ exp( −(1/(2γ²)) Σ_{j∈Z′} |S(u(xj)) − yj|² − (1/2)⟨u, C(θ)^{−1}u⟩ ) du.
45/56
Numerical Example VI Graph-Based Learning
• This problem is high- but finite-dimensional: it is not directly an
approximation of a problem on an infinite-dimensional Hilbert
space. However, with appropriate assumptions and scaling, it
does have a continuum limit as N → ∞. (Garcia Trillos, Slepčev,
2014; Garcia-Trillos et al, 2017)
46/56
Numerical Example VI Graph-Based Learning
• We consider the case where the data points are elements of the
MNIST dataset: 28 × 28 pixel images of handwritten digits
0, 1, . . . , 9.
• We are hierarchical about subsets of the parameters (α, τ, M),
and see how classification accuracy is affected by this choice.
• A sample u(k) ∼ N(0, C(θ)) can be expressed in the eigenbasis {λj, qj} of LN as
u(k) = Σ_{j=1}^{M} (λj + τ²)^{−α/2} ξj qj,   ξj ∼ N(0, 1) i.i.d.
• We restrict to the digits 3, 4, 5, 9 here to speed up computations, and take ≈ 2000 of each digit so that |Z| ≈ 8000. We provide labels for a random subset of |Z′| = 80 digits. 47/56
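Given the graph Laplacian, a prior draw in the truncated eigenbasis is a direct transcription of the expansion above; a minimal sketch using a dense eigendecomposition (only suitable for moderate N):

```python
import numpy as np

def sample_graph_prior(L, alpha, tau, M, rng=None):
    """Draw u ~ N(0, C(theta)) with C(theta) = P_M (L_N + tau^2)^(-alpha) P_M^*."""
    rng = np.random.default_rng() if rng is None else rng
    lam, Q = np.linalg.eigh(L)                        # eigenpairs {lambda_j, q_j} of L_N (ascending)
    xi = rng.standard_normal(M)
    coeffs = (lam[:M] + tau**2) ** (-alpha / 2.0) * xi
    return Q[:, :M] @ coeffs                          # sum_j (lambda_j + tau^2)^(-alpha/2) xi_j q_j
```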
Numerical Example VI Graph-Based Learning
48/56
Figure: Mean classification accuracy of 40 trials against sample
number, for MNIST digits 3,4,5,9.
• We now focus on a test set of 10000 digits, with approximately
1000 of each digit 0, 1, . . . , 9, being hierarchical about (α, M).
• From samples we can look at uncertainty in addition to
accuracy.
• We introduce the measure of uncertainty associated with a data point xj as
U(xj) = √(k/(k − 1)) · min_{r=1,...,k} ‖E(Su)(xj) − er‖₂.
Remark: (Su)(x) lies in the set {er}, r = 1, . . . , k, for each fixed u, x, but the mean E(Su)(x) in general will not.
49/56
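The uncertainty score is a one-liner given the posterior mean of Su at a point; a minimal sketch for a length-k mean vector:

```python
import numpy as np

def uncertainty(mean_Su):
    """U(x_j) = sqrt(k/(k-1)) * min_r ||E(Su)(x_j) - e_r||_2."""
    k = mean_Su.shape[0]
    dists = np.linalg.norm(mean_Su[None, :] - np.eye(k), axis=1)   # distance to each basis vector e_r
    return np.sqrt(k / (k - 1.0)) * dists.min()
```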
Numerical Example VI Graph-Based Learning
50/56
Figure: 100 most uncertain digits (top) and a selection of 100
most certain digits (bottom), 200 labels.
Mean uncertainty: 14.0%.
Numerical Example VI Graph-Based Learning
51/56
Figure: 100 most uncertain digits, 200+100 uncertain labels.
Mean uncertainty: 10.3%.
Numerical Example VI Graph-Based Learning
51/56
Figure: 100 most uncertain digits, 200+200 uncertain labels.
Mean uncertainty: 8.1%.
Numerical Example VI Graph-Based Learning
51/56
Figure: 100 most uncertain digits, 200+300 uncertain labels.
Mean uncertainty: 7.1%.
Numerical Example VI Graph-Based Learning
52/56
Figure: Histogram of uncertainty, 200 labels.
Numerical Example VI Graph-Based Learning
52/56
Figure: Histogram of uncertainty, 200+100 uncertain labels.
Numerical Example VI Graph-Based Learning
52/56
Figure: Histogram of uncertainty, 200+200 uncertain labels.
Numerical Example VI Graph-Based Learning
52/56
Figure: Histogram of uncertainty, 200+300 uncertain labels.
Numerical Example VI Graph-Based Learning
53/56
Figure: Histogram of uncertainty, 200 labels.
Numerical Example VI Graph-Based Learning
53/56
Figure: Histogram of uncertainty, 200+100 certain labels.
Numerical Example VI Graph-Based Learning
53/56
Figure: Histogram of uncertainty, 200+200 certain labels.
Numerical Example VI Graph-Based Learning
53/56
Figure: Histogram of uncertainty, 200+300 certain labels.
Outline
1. Introduction
2. Series-Based Priors
3. Hierarchical Priors
4. Level Set Priors
5. Numerical Illustrations
6. Conclusions
53/56
Conclusions
• Non-Gaussian priors can be more appropriate to use than
Gaussians in certain contexts, and can provide extra flexibility
via use of hyperparameters.
• By writing a non-Gaussian prior as the pushforward of a
Gaussian under some non-linear map, existing dimension robust
MCMC sampling methods for Gaussian priors can be used.
• For hierarchical priors, the transformation can also be chosen such that the priors on the field and hyperparameters are independent (non-centering), avoiding issues associated with measure singularity in the Gibbs sampler.
54/56
Thank you
55/56
References
[1] A. M. Stuart. Inverse problems: a Bayesian perspective. Acta
Numerica, 2010.
[2] M. M. Dunlop, M. Girolami, A. M. Stuart and A. L. Teckentrup.
How deep are deep Gaussian processes? Submitted, 2017.
[3] V. Chen, M. M. Dunlop, O. Papaspiliopoulos and A. M. Stuart.
Robust MCMC Sampling in High Dimensions with Non-Gaussian
and Hierarchical Priors. In Preparation.
[4] M. M. Dunlop, M. A. Iglesias and A. M. Stuart. Hierarchical
Bayesian level set inversion. Statistics and Computing, 2016.
[5] Z. Wang, J. M. Bardsley, A. Solonen, T. Cui and Y. M. Marzouk.
Bayesian inverse problems with ℓ1 priors: a
randomize-then-optimize approach. SIAM Journal on Scientific
Computing, 2017. 56/56

More Related Content

What's hot

Rao-Blackwellisation schemes for accelerating Metropolis-Hastings algorithms
Rao-Blackwellisation schemes for accelerating Metropolis-Hastings algorithmsRao-Blackwellisation schemes for accelerating Metropolis-Hastings algorithms
Rao-Blackwellisation schemes for accelerating Metropolis-Hastings algorithmsChristian Robert
 
Unbiased Bayes for Big Data
Unbiased Bayes for Big DataUnbiased Bayes for Big Data
Unbiased Bayes for Big DataChristian Robert
 
Can we estimate a constant?
Can we estimate a constant?Can we estimate a constant?
Can we estimate a constant?Christian Robert
 
Maximum likelihood estimation of regularisation parameters in inverse problem...
Maximum likelihood estimation of regularisation parameters in inverse problem...Maximum likelihood estimation of regularisation parameters in inverse problem...
Maximum likelihood estimation of regularisation parameters in inverse problem...Valentin De Bortoli
 
Bayesian hybrid variable selection under generalized linear models
Bayesian hybrid variable selection under generalized linear modelsBayesian hybrid variable selection under generalized linear models
Bayesian hybrid variable selection under generalized linear modelsCaleb (Shiqiang) Jin
 
Approximate Bayesian Computation with Quasi-Likelihoods
Approximate Bayesian Computation with Quasi-LikelihoodsApproximate Bayesian Computation with Quasi-Likelihoods
Approximate Bayesian Computation with Quasi-LikelihoodsStefano Cabras
 
Delayed acceptance for Metropolis-Hastings algorithms
Delayed acceptance for Metropolis-Hastings algorithmsDelayed acceptance for Metropolis-Hastings algorithms
Delayed acceptance for Metropolis-Hastings algorithmsChristian Robert
 
MCMC and likelihood-free methods
MCMC and likelihood-free methodsMCMC and likelihood-free methods
MCMC and likelihood-free methodsChristian Robert
 
accurate ABC Oliver Ratmann
accurate ABC Oliver Ratmannaccurate ABC Oliver Ratmann
accurate ABC Oliver Ratmannolli0601
 

What's hot (20)

Nested sampling
Nested samplingNested sampling
Nested sampling
 
Program on Quasi-Monte Carlo and High-Dimensional Sampling Methods for Applie...
Program on Quasi-Monte Carlo and High-Dimensional Sampling Methods for Applie...Program on Quasi-Monte Carlo and High-Dimensional Sampling Methods for Applie...
Program on Quasi-Monte Carlo and High-Dimensional Sampling Methods for Applie...
 
ABC in Venezia
ABC in VeneziaABC in Venezia
ABC in Venezia
 
Rao-Blackwellisation schemes for accelerating Metropolis-Hastings algorithms
Rao-Blackwellisation schemes for accelerating Metropolis-Hastings algorithmsRao-Blackwellisation schemes for accelerating Metropolis-Hastings algorithms
Rao-Blackwellisation schemes for accelerating Metropolis-Hastings algorithms
 
Unbiased Bayes for Big Data
Unbiased Bayes for Big DataUnbiased Bayes for Big Data
Unbiased Bayes for Big Data
 
2018 MUMS Fall Course - Statistical Representation of Model Input (EDITED) - ...
2018 MUMS Fall Course - Statistical Representation of Model Input (EDITED) - ...2018 MUMS Fall Course - Statistical Representation of Model Input (EDITED) - ...
2018 MUMS Fall Course - Statistical Representation of Model Input (EDITED) - ...
 
2018 MUMS Fall Course - Bayesian inference for model calibration in UQ - Ralp...
2018 MUMS Fall Course - Bayesian inference for model calibration in UQ - Ralp...2018 MUMS Fall Course - Bayesian inference for model calibration in UQ - Ralp...
2018 MUMS Fall Course - Bayesian inference for model calibration in UQ - Ralp...
 
Program on Quasi-Monte Carlo and High-Dimensional Sampling Methods for Applie...
Program on Quasi-Monte Carlo and High-Dimensional Sampling Methods for Applie...Program on Quasi-Monte Carlo and High-Dimensional Sampling Methods for Applie...
Program on Quasi-Monte Carlo and High-Dimensional Sampling Methods for Applie...
 
Can we estimate a constant?
Can we estimate a constant?Can we estimate a constant?
Can we estimate a constant?
 
Program on Quasi-Monte Carlo and High-Dimensional Sampling Methods for Applie...
Program on Quasi-Monte Carlo and High-Dimensional Sampling Methods for Applie...Program on Quasi-Monte Carlo and High-Dimensional Sampling Methods for Applie...
Program on Quasi-Monte Carlo and High-Dimensional Sampling Methods for Applie...
 
Maximum likelihood estimation of regularisation parameters in inverse problem...
Maximum likelihood estimation of regularisation parameters in inverse problem...Maximum likelihood estimation of regularisation parameters in inverse problem...
Maximum likelihood estimation of regularisation parameters in inverse problem...
 
Bayesian hybrid variable selection under generalized linear models
Bayesian hybrid variable selection under generalized linear modelsBayesian hybrid variable selection under generalized linear models
Bayesian hybrid variable selection under generalized linear models
 
Approximate Bayesian Computation with Quasi-Likelihoods
Approximate Bayesian Computation with Quasi-LikelihoodsApproximate Bayesian Computation with Quasi-Likelihoods
Approximate Bayesian Computation with Quasi-Likelihoods
 
Program on Quasi-Monte Carlo and High-Dimensional Sampling Methods for Applie...
Program on Quasi-Monte Carlo and High-Dimensional Sampling Methods for Applie...Program on Quasi-Monte Carlo and High-Dimensional Sampling Methods for Applie...
Program on Quasi-Monte Carlo and High-Dimensional Sampling Methods for Applie...
 
Delayed acceptance for Metropolis-Hastings algorithms
Delayed acceptance for Metropolis-Hastings algorithmsDelayed acceptance for Metropolis-Hastings algorithms
Delayed acceptance for Metropolis-Hastings algorithms
 
MCMC and likelihood-free methods
MCMC and likelihood-free methodsMCMC and likelihood-free methods
MCMC and likelihood-free methods
 
Program on Quasi-Monte Carlo and High-Dimensional Sampling Methods for Applie...
Program on Quasi-Monte Carlo and High-Dimensional Sampling Methods for Applie...Program on Quasi-Monte Carlo and High-Dimensional Sampling Methods for Applie...
Program on Quasi-Monte Carlo and High-Dimensional Sampling Methods for Applie...
 
accurate ABC Oliver Ratmann
accurate ABC Oliver Ratmannaccurate ABC Oliver Ratmann
accurate ABC Oliver Ratmann
 
Program on Quasi-Monte Carlo and High-Dimensional Sampling Methods for Applie...
Program on Quasi-Monte Carlo and High-Dimensional Sampling Methods for Applie...Program on Quasi-Monte Carlo and High-Dimensional Sampling Methods for Applie...
Program on Quasi-Monte Carlo and High-Dimensional Sampling Methods for Applie...
 
QMC Opening Workshop, High Accuracy Algorithms for Interpolating and Integrat...
QMC Opening Workshop, High Accuracy Algorithms for Interpolating and Integrat...QMC Opening Workshop, High Accuracy Algorithms for Interpolating and Integrat...
QMC Opening Workshop, High Accuracy Algorithms for Interpolating and Integrat...
 

Similar to Robust MCMC Sampling with Non-Gaussian and Hierarchical Priors

Markov chain Monte Carlo methods and some attempts at parallelizing them
Markov chain Monte Carlo methods and some attempts at parallelizing themMarkov chain Monte Carlo methods and some attempts at parallelizing them
Markov chain Monte Carlo methods and some attempts at parallelizing themPierre Jacob
 
Application of the Monte-Carlo Method to Nonlinear Stochastic Optimization wi...
Application of the Monte-Carlo Method to Nonlinear Stochastic Optimization wi...Application of the Monte-Carlo Method to Nonlinear Stochastic Optimization wi...
Application of the Monte-Carlo Method to Nonlinear Stochastic Optimization wi...SSA KPI
 
Cs229 notes9
Cs229 notes9Cs229 notes9
Cs229 notes9VuTran231
 
(DL輪読)Variational Dropout Sparsifies Deep Neural Networks
(DL輪読)Variational Dropout Sparsifies Deep Neural Networks(DL輪読)Variational Dropout Sparsifies Deep Neural Networks
(DL輪読)Variational Dropout Sparsifies Deep Neural NetworksMasahiro Suzuki
 
Minimum mean square error estimation and approximation of the Bayesian update
Minimum mean square error estimation and approximation of the Bayesian updateMinimum mean square error estimation and approximation of the Bayesian update
Minimum mean square error estimation and approximation of the Bayesian updateAlexander Litvinenko
 
Machine learning (10)
Machine learning (10)Machine learning (10)
Machine learning (10)NYversity
 
Monte Carlo Berkeley.pptx
Monte Carlo Berkeley.pptxMonte Carlo Berkeley.pptx
Monte Carlo Berkeley.pptxHaibinSu2
 
Unbiased MCMC with couplings
Unbiased MCMC with couplingsUnbiased MCMC with couplings
Unbiased MCMC with couplingsPierre Jacob
 
fb69b412-97cb-4e8d-8a28-574c09557d35-160618025920
fb69b412-97cb-4e8d-8a28-574c09557d35-160618025920fb69b412-97cb-4e8d-8a28-574c09557d35-160618025920
fb69b412-97cb-4e8d-8a28-574c09557d35-160618025920Karl Rudeen
 
Spillover Dynamics for Systemic Risk Measurement Using Spatial Financial Time...
Spillover Dynamics for Systemic Risk Measurement Using Spatial Financial Time...Spillover Dynamics for Systemic Risk Measurement Using Spatial Financial Time...
Spillover Dynamics for Systemic Risk Measurement Using Spatial Financial Time...SYRTO Project
 
Introduction to Bootstrap and elements of Markov Chains
Introduction to Bootstrap and elements of Markov ChainsIntroduction to Bootstrap and elements of Markov Chains
Introduction to Bootstrap and elements of Markov ChainsUniversity of Salerno
 
Monte Carlo Statistical Methods
Monte Carlo Statistical MethodsMonte Carlo Statistical Methods
Monte Carlo Statistical MethodsChristian Robert
 
A nonlinear approximation of the Bayesian Update formula
A nonlinear approximation of the Bayesian Update formulaA nonlinear approximation of the Bayesian Update formula
A nonlinear approximation of the Bayesian Update formulaAlexander Litvinenko
 

Similar to Robust MCMC Sampling with Non-Gaussian and Hierarchical Priors (20)

Markov chain Monte Carlo methods and some attempts at parallelizing them
Markov chain Monte Carlo methods and some attempts at parallelizing themMarkov chain Monte Carlo methods and some attempts at parallelizing them
Markov chain Monte Carlo methods and some attempts at parallelizing them
 
Application of the Monte-Carlo Method to Nonlinear Stochastic Optimization wi...
Application of the Monte-Carlo Method to Nonlinear Stochastic Optimization wi...Application of the Monte-Carlo Method to Nonlinear Stochastic Optimization wi...
Application of the Monte-Carlo Method to Nonlinear Stochastic Optimization wi...
 
Cs229 notes9
Cs229 notes9Cs229 notes9
Cs229 notes9
 
QMC: Transition Workshop - Density Estimation by Randomized Quasi-Monte Carlo...
QMC: Transition Workshop - Density Estimation by Randomized Quasi-Monte Carlo...QMC: Transition Workshop - Density Estimation by Randomized Quasi-Monte Carlo...
QMC: Transition Workshop - Density Estimation by Randomized Quasi-Monte Carlo...
 
(DL輪読)Variational Dropout Sparsifies Deep Neural Networks
(DL輪読)Variational Dropout Sparsifies Deep Neural Networks(DL輪読)Variational Dropout Sparsifies Deep Neural Networks
(DL輪読)Variational Dropout Sparsifies Deep Neural Networks
 
mcmc
mcmcmcmc
mcmc
 
Minimum mean square error estimation and approximation of the Bayesian update
Minimum mean square error estimation and approximation of the Bayesian updateMinimum mean square error estimation and approximation of the Bayesian update
Minimum mean square error estimation and approximation of the Bayesian update
 
Deconvolution
DeconvolutionDeconvolution
Deconvolution
 
Machine learning (10)
Machine learning (10)Machine learning (10)
Machine learning (10)
 
Monte Carlo Berkeley.pptx
Monte Carlo Berkeley.pptxMonte Carlo Berkeley.pptx
Monte Carlo Berkeley.pptx
 
Unbiased MCMC with couplings
Unbiased MCMC with couplingsUnbiased MCMC with couplings
Unbiased MCMC with couplings
 
fb69b412-97cb-4e8d-8a28-574c09557d35-160618025920
fb69b412-97cb-4e8d-8a28-574c09557d35-160618025920fb69b412-97cb-4e8d-8a28-574c09557d35-160618025920
fb69b412-97cb-4e8d-8a28-574c09557d35-160618025920
 
Project Paper
Project PaperProject Paper
Project Paper
 
Spillover Dynamics for Systemic Risk Measurement Using Spatial Financial Time...
Spillover Dynamics for Systemic Risk Measurement Using Spatial Financial Time...Spillover Dynamics for Systemic Risk Measurement Using Spatial Financial Time...
Spillover Dynamics for Systemic Risk Measurement Using Spatial Financial Time...
 
Introduction to Bootstrap and elements of Markov Chains
Introduction to Bootstrap and elements of Markov ChainsIntroduction to Bootstrap and elements of Markov Chains
Introduction to Bootstrap and elements of Markov Chains
 
Input analysis
Input analysisInput analysis
Input analysis
 
Implicit schemes for wave models
Implicit schemes for wave modelsImplicit schemes for wave models
Implicit schemes for wave models
 
Monte Carlo Statistical Methods
Monte Carlo Statistical MethodsMonte Carlo Statistical Methods
Monte Carlo Statistical Methods
 
A nonlinear approximation of the Bayesian Update formula
A nonlinear approximation of the Bayesian Update formulaA nonlinear approximation of the Bayesian Update formula
A nonlinear approximation of the Bayesian Update formula
 
Jere Koskela slides
Jere Koskela slidesJere Koskela slides
Jere Koskela slides
 

More from The Statistical and Applied Mathematical Sciences Institute

More from The Statistical and Applied Mathematical Sciences Institute (20)

Causal Inference Opening Workshop - Latent Variable Models, Causal Inference,...
Causal Inference Opening Workshop - Latent Variable Models, Causal Inference,...Causal Inference Opening Workshop - Latent Variable Models, Causal Inference,...
Causal Inference Opening Workshop - Latent Variable Models, Causal Inference,...
 
2019 Fall Series: Special Guest Lecture - 0-1 Phase Transitions in High Dimen...
2019 Fall Series: Special Guest Lecture - 0-1 Phase Transitions in High Dimen...2019 Fall Series: Special Guest Lecture - 0-1 Phase Transitions in High Dimen...
2019 Fall Series: Special Guest Lecture - 0-1 Phase Transitions in High Dimen...
 
Causal Inference Opening Workshop - Causal Discovery in Neuroimaging Data - F...
Causal Inference Opening Workshop - Causal Discovery in Neuroimaging Data - F...Causal Inference Opening Workshop - Causal Discovery in Neuroimaging Data - F...
Causal Inference Opening Workshop - Causal Discovery in Neuroimaging Data - F...
 
Causal Inference Opening Workshop - Smooth Extensions to BART for Heterogeneo...
Causal Inference Opening Workshop - Smooth Extensions to BART for Heterogeneo...Causal Inference Opening Workshop - Smooth Extensions to BART for Heterogeneo...
Causal Inference Opening Workshop - Smooth Extensions to BART for Heterogeneo...
 
Causal Inference Opening Workshop - A Bracketing Relationship between Differe...
Causal Inference Opening Workshop - A Bracketing Relationship between Differe...Causal Inference Opening Workshop - A Bracketing Relationship between Differe...
Causal Inference Opening Workshop - A Bracketing Relationship between Differe...
 
Causal Inference Opening Workshop - Testing Weak Nulls in Matched Observation...
Causal Inference Opening Workshop - Testing Weak Nulls in Matched Observation...Causal Inference Opening Workshop - Testing Weak Nulls in Matched Observation...
Causal Inference Opening Workshop - Testing Weak Nulls in Matched Observation...
 
Causal Inference Opening Workshop - Difference-in-differences: more than meet...
Causal Inference Opening Workshop - Difference-in-differences: more than meet...Causal Inference Opening Workshop - Difference-in-differences: more than meet...
Causal Inference Opening Workshop - Difference-in-differences: more than meet...
 
Causal Inference Opening Workshop - New Statistical Learning Methods for Esti...
Causal Inference Opening Workshop - New Statistical Learning Methods for Esti...Causal Inference Opening Workshop - New Statistical Learning Methods for Esti...
Causal Inference Opening Workshop - New Statistical Learning Methods for Esti...
 
Causal Inference Opening Workshop - Bipartite Causal Inference with Interfere...
Causal Inference Opening Workshop - Bipartite Causal Inference with Interfere...Causal Inference Opening Workshop - Bipartite Causal Inference with Interfere...
Causal Inference Opening Workshop - Bipartite Causal Inference with Interfere...
 
Causal Inference Opening Workshop - Bridging the Gap Between Causal Literatur...
Causal Inference Opening Workshop - Bridging the Gap Between Causal Literatur...Causal Inference Opening Workshop - Bridging the Gap Between Causal Literatur...
Causal Inference Opening Workshop - Bridging the Gap Between Causal Literatur...
 
Causal Inference Opening Workshop - Some Applications of Reinforcement Learni...
Causal Inference Opening Workshop - Some Applications of Reinforcement Learni...Causal Inference Opening Workshop - Some Applications of Reinforcement Learni...
Causal Inference Opening Workshop - Some Applications of Reinforcement Learni...
 
Causal Inference Opening Workshop - Bracketing Bounds for Differences-in-Diff...
Causal Inference Opening Workshop - Bracketing Bounds for Differences-in-Diff...Causal Inference Opening Workshop - Bracketing Bounds for Differences-in-Diff...
Causal Inference Opening Workshop - Bracketing Bounds for Differences-in-Diff...
 
Causal Inference Opening Workshop - Assisting the Impact of State Polcies: Br...
Causal Inference Opening Workshop - Assisting the Impact of State Polcies: Br...Causal Inference Opening Workshop - Assisting the Impact of State Polcies: Br...
Causal Inference Opening Workshop - Assisting the Impact of State Polcies: Br...
 
Causal Inference Opening Workshop - Experimenting in Equilibrium - Stefan Wag...
Causal Inference Opening Workshop - Experimenting in Equilibrium - Stefan Wag...Causal Inference Opening Workshop - Experimenting in Equilibrium - Stefan Wag...
Causal Inference Opening Workshop - Experimenting in Equilibrium - Stefan Wag...
 
Causal Inference Opening Workshop - Targeted Learning for Causal Inference Ba...
Causal Inference Opening Workshop - Targeted Learning for Causal Inference Ba...Causal Inference Opening Workshop - Targeted Learning for Causal Inference Ba...
Causal Inference Opening Workshop - Targeted Learning for Causal Inference Ba...
 
Causal Inference Opening Workshop - Bayesian Nonparametric Models for Treatme...
Causal Inference Opening Workshop - Bayesian Nonparametric Models for Treatme...Causal Inference Opening Workshop - Bayesian Nonparametric Models for Treatme...
Causal Inference Opening Workshop - Bayesian Nonparametric Models for Treatme...
 
2019 Fall Series: Special Guest Lecture - Adversarial Risk Analysis of the Ge...
2019 Fall Series: Special Guest Lecture - Adversarial Risk Analysis of the Ge...2019 Fall Series: Special Guest Lecture - Adversarial Risk Analysis of the Ge...
2019 Fall Series: Special Guest Lecture - Adversarial Risk Analysis of the Ge...
 
2019 Fall Series: Professional Development, Writing Academic Papers…What Work...
2019 Fall Series: Professional Development, Writing Academic Papers…What Work...2019 Fall Series: Professional Development, Writing Academic Papers…What Work...
2019 Fall Series: Professional Development, Writing Academic Papers…What Work...
 
2019 GDRR: Blockchain Data Analytics - Machine Learning in/for Blockchain: Fu...
2019 GDRR: Blockchain Data Analytics - Machine Learning in/for Blockchain: Fu...2019 GDRR: Blockchain Data Analytics - Machine Learning in/for Blockchain: Fu...
2019 GDRR: Blockchain Data Analytics - Machine Learning in/for Blockchain: Fu...
 
2019 GDRR: Blockchain Data Analytics - QuTrack: Model Life Cycle Management f...
2019 GDRR: Blockchain Data Analytics - QuTrack: Model Life Cycle Management f...2019 GDRR: Blockchain Data Analytics - QuTrack: Model Life Cycle Management f...
2019 GDRR: Blockchain Data Analytics - QuTrack: Model Life Cycle Management f...
 

Recently uploaded

Advanced Views - Calendar View in Odoo 17
Advanced Views - Calendar View in Odoo 17Advanced Views - Calendar View in Odoo 17
Advanced Views - Calendar View in Odoo 17Celine George
 
Call Girls in Dwarka Mor Delhi Contact Us 9654467111
Call Girls in Dwarka Mor Delhi Contact Us 9654467111Call Girls in Dwarka Mor Delhi Contact Us 9654467111
Call Girls in Dwarka Mor Delhi Contact Us 9654467111Sapana Sha
 
BAG TECHNIQUE Bag technique-a tool making use of public health bag through wh...
BAG TECHNIQUE Bag technique-a tool making use of public health bag through wh...BAG TECHNIQUE Bag technique-a tool making use of public health bag through wh...
BAG TECHNIQUE Bag technique-a tool making use of public health bag through wh...Sapna Thakur
 
General AI for Medical Educators April 2024
General AI for Medical Educators April 2024General AI for Medical Educators April 2024
General AI for Medical Educators April 2024Janet Corral
 
APM Welcome, APM North West Network Conference, Synergies Across Sectors
APM Welcome, APM North West Network Conference, Synergies Across SectorsAPM Welcome, APM North West Network Conference, Synergies Across Sectors
APM Welcome, APM North West Network Conference, Synergies Across SectorsAssociation for Project Management
 
Paris 2024 Olympic Geographies - an activity
Paris 2024 Olympic Geographies - an activityParis 2024 Olympic Geographies - an activity
Paris 2024 Olympic Geographies - an activityGeoBlogs
 
Grant Readiness 101 TechSoup and Remy Consulting
Grant Readiness 101 TechSoup and Remy ConsultingGrant Readiness 101 TechSoup and Remy Consulting
Grant Readiness 101 TechSoup and Remy ConsultingTechSoup
 
Sports & Fitness Value Added Course FY..
Sports & Fitness Value Added Course FY..Sports & Fitness Value Added Course FY..
Sports & Fitness Value Added Course FY..Disha Kariya
 
Explore beautiful and ugly buildings. Mathematics helps us create beautiful d...
Explore beautiful and ugly buildings. Mathematics helps us create beautiful d...Explore beautiful and ugly buildings. Mathematics helps us create beautiful d...
Explore beautiful and ugly buildings. Mathematics helps us create beautiful d...christianmathematics
 
Activity 01 - Artificial Culture (1).pdf
Activity 01 - Artificial Culture (1).pdfActivity 01 - Artificial Culture (1).pdf
Activity 01 - Artificial Culture (1).pdfciinovamais
 
Measures of Dispersion and Variability: Range, QD, AD and SD
Measures of Dispersion and Variability: Range, QD, AD and SDMeasures of Dispersion and Variability: Range, QD, AD and SD
Measures of Dispersion and Variability: Range, QD, AD and SDThiyagu K
 
Software Engineering Methodologies (overview)
Software Engineering Methodologies (overview)Software Engineering Methodologies (overview)
Software Engineering Methodologies (overview)eniolaolutunde
 
Nutritional Needs Presentation - HLTH 104
Nutritional Needs Presentation - HLTH 104Nutritional Needs Presentation - HLTH 104
Nutritional Needs Presentation - HLTH 104misteraugie
 
Sanyam Choudhary Chemistry practical.pdf
Sanyam Choudhary Chemistry practical.pdfSanyam Choudhary Chemistry practical.pdf
Sanyam Choudhary Chemistry practical.pdfsanyamsingh5019
 
Ecosystem Interactions Class Discussion Presentation in Blue Green Lined Styl...
Ecosystem Interactions Class Discussion Presentation in Blue Green Lined Styl...Ecosystem Interactions Class Discussion Presentation in Blue Green Lined Styl...
Ecosystem Interactions Class Discussion Presentation in Blue Green Lined Styl...fonyou31
 
Accessible design: Minimum effort, maximum impact
Accessible design: Minimum effort, maximum impactAccessible design: Minimum effort, maximum impact
Accessible design: Minimum effort, maximum impactdawncurless
 
Measures of Central Tendency: Mean, Median and Mode
Measures of Central Tendency: Mean, Median and ModeMeasures of Central Tendency: Mean, Median and Mode
Measures of Central Tendency: Mean, Median and ModeThiyagu K
 

Recently uploaded (20)

Advanced Views - Calendar View in Odoo 17
Advanced Views - Calendar View in Odoo 17Advanced Views - Calendar View in Odoo 17
Advanced Views - Calendar View in Odoo 17
 
Call Girls in Dwarka Mor Delhi Contact Us 9654467111
Call Girls in Dwarka Mor Delhi Contact Us 9654467111Call Girls in Dwarka Mor Delhi Contact Us 9654467111
Call Girls in Dwarka Mor Delhi Contact Us 9654467111
 
BAG TECHNIQUE Bag technique-a tool making use of public health bag through wh...
BAG TECHNIQUE Bag technique-a tool making use of public health bag through wh...BAG TECHNIQUE Bag technique-a tool making use of public health bag through wh...
BAG TECHNIQUE Bag technique-a tool making use of public health bag through wh...
 
General AI for Medical Educators April 2024
General AI for Medical Educators April 2024General AI for Medical Educators April 2024
General AI for Medical Educators April 2024
 
APM Welcome, APM North West Network Conference, Synergies Across Sectors
APM Welcome, APM North West Network Conference, Synergies Across SectorsAPM Welcome, APM North West Network Conference, Synergies Across Sectors
APM Welcome, APM North West Network Conference, Synergies Across Sectors
 
Paris 2024 Olympic Geographies - an activity
Paris 2024 Olympic Geographies - an activityParis 2024 Olympic Geographies - an activity
Paris 2024 Olympic Geographies - an activity
 
Código Creativo y Arte de Software | Unidad 1
Código Creativo y Arte de Software | Unidad 1Código Creativo y Arte de Software | Unidad 1
Código Creativo y Arte de Software | Unidad 1
 
Grant Readiness 101 TechSoup and Remy Consulting
Grant Readiness 101 TechSoup and Remy ConsultingGrant Readiness 101 TechSoup and Remy Consulting
Grant Readiness 101 TechSoup and Remy Consulting
 
Sports & Fitness Value Added Course FY..
Sports & Fitness Value Added Course FY..Sports & Fitness Value Added Course FY..
Sports & Fitness Value Added Course FY..
 
INDIA QUIZ 2024 RLAC DELHI UNIVERSITY.pptx
INDIA QUIZ 2024 RLAC DELHI UNIVERSITY.pptxINDIA QUIZ 2024 RLAC DELHI UNIVERSITY.pptx
INDIA QUIZ 2024 RLAC DELHI UNIVERSITY.pptx
 
Explore beautiful and ugly buildings. Mathematics helps us create beautiful d...
Explore beautiful and ugly buildings. Mathematics helps us create beautiful d...Explore beautiful and ugly buildings. Mathematics helps us create beautiful d...
Explore beautiful and ugly buildings. Mathematics helps us create beautiful d...
 
Activity 01 - Artificial Culture (1).pdf
Activity 01 - Artificial Culture (1).pdfActivity 01 - Artificial Culture (1).pdf
Activity 01 - Artificial Culture (1).pdf
 
Measures of Dispersion and Variability: Range, QD, AD and SD
Measures of Dispersion and Variability: Range, QD, AD and SDMeasures of Dispersion and Variability: Range, QD, AD and SD
Measures of Dispersion and Variability: Range, QD, AD and SD
 
Software Engineering Methodologies (overview)
Software Engineering Methodologies (overview)Software Engineering Methodologies (overview)
Software Engineering Methodologies (overview)
 
Advance Mobile Application Development class 07
Advance Mobile Application Development class 07Advance Mobile Application Development class 07
Advance Mobile Application Development class 07
 
Nutritional Needs Presentation - HLTH 104
Nutritional Needs Presentation - HLTH 104Nutritional Needs Presentation - HLTH 104
Nutritional Needs Presentation - HLTH 104
 
Sanyam Choudhary Chemistry practical.pdf
Sanyam Choudhary Chemistry practical.pdfSanyam Choudhary Chemistry practical.pdf
Sanyam Choudhary Chemistry practical.pdf
 
Ecosystem Interactions Class Discussion Presentation in Blue Green Lined Styl...
Ecosystem Interactions Class Discussion Presentation in Blue Green Lined Styl...Ecosystem Interactions Class Discussion Presentation in Blue Green Lined Styl...
Ecosystem Interactions Class Discussion Presentation in Blue Green Lined Styl...
 
Accessible design: Minimum effort, maximum impact
Accessible design: Minimum effort, maximum impactAccessible design: Minimum effort, maximum impact
Accessible design: Minimum effort, maximum impact
 
Measures of Central Tendency: Mean, Median and Mode
Measures of Central Tendency: Mean, Median and ModeMeasures of Central Tendency: Mean, Median and Mode
Measures of Central Tendency: Mean, Median and Mode
 

Robust MCMC Sampling with Non-Gaussian and Hierarchical Priors

  • 1. Division of Engineering & Applied Science Robust MCMC Sampling with Non-Gaussian and Hierarchical Priors Trends and Advances in Monte Carlo Sampling Algorithms, SAMSI, USA, December 15, 2017 Matt Dunlop Victor Chen (Caltech) Omiros Papaspiliopoulos (ICREA, UPF) Andrew Stuart (Caltech)
  • 2. Outline 1. Introduction 2. Series-Based Priors 3. Hierarchical Priors 4. Level Set Priors 5. Numerical Illustrations 6. Conclusions
  • 3. The Inverse Problem Problem Statement Find u from y where : X → Y, η is noise and y = (u) + η. • Problem can contain many degrees of uncertainty related to η and . Solution to problem should hence also contain uncertainty. • Quantifying prior beliefs about the state u by a probability measure, Bayes’ theorem tells us how to update this distribution given the data y, producing the posterior distribution. • In the Bayesian approach, the solution to the problem is the posterior distribution. 1/56
  • 4. The Posterior Distribution • Assume, for simplicity, Y = RJ and the observational noise η ∼ N(0, ) is Gaussian. The likelihood of y given u is P(y|u) ∝ exp (︂ − 1 2 y − (u) 2  )︂ =: exp (−(u; y)) . • Quantify prior beliefs by a prior distribution μ0 = P(u) on X. • Posterior distribution μ = P(u|y) on X is given by Bayes’ theorem: μ(du) = 1 Z exp (−(u; y)) μ0(du). See e.g. (Stuart, 2010; Sullivan, 2017). • We first assume μ0 is Gaussian, then move to more general priors. 2/56
  • 5. Robust Sampling • Expectations under μ can be estimated numerically using samples from μ. These may be generated though MCMC methods. • Samples from MCMC chains are correlated, and strong correlations lead to poor expectation estimates. With straightforward MCMC methods, these correlations will become stronger as the dimension of state space N is increased. • Preferable to have MCMC chains with convergence rate bounded independently of dimension, e.g. if μN is an approximation to μ, d(Pk N (u0, ·), μN) ≤ C(1 − ρN)k with ρN ≥ ρ∗ > 0 for all N. • General idea: if the chain is ergodic in infinite dimensions, the same should hold for finite-dimensional approximations. 3/56
  • 6. Robust Sampling Gaussian Prior • If the prior μ0 = N(0, C) is Gaussian, posterior can be sampled with MCMC in a dimension robust manner using, for example, the pCN algorithm (Beskos et al., 2008; Hairer et al., 2014): pCN Algorithm 1. Set n = 0 and choose β ∈ (0, 1]. Initial State u(0) ∈ X. 2. Propose new state ˆu(n) = (1 − β2 ) 1 2 u(n) + βζ(n) , ζ(n) ∼ N(0, C) 3. Set u(n+1) = ˆu(n) with probability α(u(n) → ˆu(n) ) = min {︀ 1, exp (︀ (u(n) ; y) − (ˆu(n) ; y) )︀}︀ or else set u(n+1) = u(n) . 4. Set n → n + 1 and go to 2. • Dimension robust geometric methods, such as ∞-MALA, ∞-HMC, are also available, which take into account derivative information. 4/56
  • 7. Robust Sampling Non-Gaussian Priors • Non-Gaussian priors may be preferred over Gaussians to allow for, for example, – Piecewise constant fields/sparsity, – Heavier tails, – Different scaling properties, – Complex non-stationary behavior. • For non-Gaussian priors, dimension robust properties can be obtained by choosing a prior-reversible proposal kernel such that the resulting MCMC kernel has a uniformly positive spectral gap (Vollmer, 2015). • We consider non-Gaussian random variables that may be written as non-linear transformations of Gaussians, allowing us to take advantage of known dimension robust methods for Gaussian priors. 5/56
  • 8. Robust Sampling General Idea • Suppose that we may write μ0 = T♯ ν0 for some probability measure ν0 and transformation map T :  → X. Then ξ ∼ ν0 implies that T(ξ) ∼ μ0. • Consider the two distributions μ(du) = 1 Z exp (︀ − (u; y) )︀ μ0(du), ν(du) = 1 Z exp (︀ − (T(ξ); y) )︀ ν0(dξ). Then μ = T♯ ν. • If we can sample ν in a dimension robust manner, we can therefore sample μ in a dimension robust manner. 6/56
  • 9. Robust Sampling Non-Gaussian Priors • Assuming μ0 = T♯ N(0, I), we can sample the posterior in a dimension robust manner with the whitened pCN algorithm: wpCN Algorithm 1. Set n = 0 and choose β ∈ (0, 1]. Initial State ξ(0); set u(0) = T(ξ(0)). 2. Propose new state ˆξ(n) = (1 − β2)^(1/2) ξ(n) + βζ(n), ζ(n) ∼ N(0, I). 3. Set ξ(n+1) = ˆξ(n) with probability α(ξ(n) → ˆξ(n)) = min{1, exp(Φ(T(ξ(n)); y) − Φ(T(ˆξ(n)); y))}, or else set ξ(n+1) = ξ(n). 4. Set u(n+1) = T(ξ(n+1)). 5. Set n → n + 1 and go to 2. 7/56
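For concreteness, the following is a minimal Python sketch of the wpCN update above, assuming the user supplies the negative log-likelihood Φ(·; y) and the transformation T as plain functions (the names `neg_log_likelihood` and `T` are placeholders, not from the slides); the latent variable is a finite-dimensional white-noise vector standing in for the discretised N(0, I) reference measure.

```python
import numpy as np

def wpcn(neg_log_likelihood, T, xi0, beta=0.2, n_steps=1000, rng=None):
    """Whitened pCN: pCN on the white-noise variable xi, targeting the posterior of u = T(xi)."""
    rng = np.random.default_rng() if rng is None else rng
    xi = np.asarray(xi0, dtype=float)
    phi = neg_log_likelihood(T(xi))
    samples, accepted = [], 0
    for _ in range(n_steps):
        zeta = rng.standard_normal(xi.shape)                  # zeta ~ N(0, I)
        xi_hat = np.sqrt(1.0 - beta**2) * xi + beta * zeta    # pCN proposal in whitened coordinates
        phi_hat = neg_log_likelihood(T(xi_hat))
        if np.log(rng.uniform()) < phi - phi_hat:             # accept with prob. min{1, exp(phi - phi_hat)}
            xi, phi = xi_hat, phi_hat
            accepted += 1
        samples.append(T(xi))
    return np.array(samples), accepted / n_steps
```

Note that the acceptance probability involves only the likelihood: the proposal is reversible with respect to N(0, I), which is what gives the dimension robustness.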
  • 10. Outline 1. Introduction 2. Series-Based Priors 3. Hierarchical Priors 4. Level Set Priors 5. Numerical Illustrations 6. Conclusions 7/56
  • 11. Series-Based Priors Let • {φj}j≥1 be a deterministic X-valued sequence, • {ρj}j≥1 be a deterministic R-valued sequence, • {ζj}j≥1 be an independent random R-valued sequence, • m ∈ X, and define μ0 to be the law of the random variable u = m + Σj≥1 ρjζjφj. Such priors have been considered in Bayesian inversion in cases where, for example, {ζj} is uniform (Schwab, Stuart, 2012), Besov (Lassas et al., 2009) and stable (Sullivan, 2017). 8/56
  • 12. Series-Based Priors We can find a Gaussian measure ν0 and map T such that μ0 = T♯ ν0 as follows: 1. Write down a method for sampling ζj. 2. Using an inverse CDF type method, rewrite this method in terms of a (possibly multivariate) Gaussian random variable ξj, so that ζj is equal in distribution to Λj(ξj). 3. Define T(ξ) = m + Σj≥1 ρjΛj(ξj)φj and ν0 the joint distribution of {ξj}j≥1. 9/56
  • 13. Series-Based Priors Examples Example (Uniform) • We want ζj ∼ U(−1, 1). • Define ν0 = N(0, 1)∞ and Λj(z) = 2Φ(z) − 1, where Φ is the standard normal CDF. Example (Besov) • We want ζj ∼ πq, where πq(z) ∝ exp(−|z|^q / 2). • Define ν0 = N(0, 1)∞ and Λj(z) = 2^(1/q) sgn(z) ( γ^(−1)_{1/q}( 2Φ(|z|) − 1 ) )^(1/q), where γ_{1/q} is the normalized lower incomplete gamma function. 10/56
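A hedged sketch of these two transforms, assuming the Besov formula is read with the symmetrised CDF 2Φ(|z|) − 1 (so that the inverse of the normalized lower incomplete gamma function receives an argument in [0, 1)); scipy's `gammaincinv` plays the role of γ^(−1)_{1/q} and `ndtr` is the standard normal CDF.

```python
import numpy as np
from scipy.special import ndtr, gammaincinv

def lam_uniform(z):
    """Map a standard normal draw z to U(-1, 1) via the inverse-CDF construction."""
    return 2.0 * ndtr(z) - 1.0

def lam_besov(z, q):
    """Map a standard normal draw z to pi_q, the density proportional to exp(-|x|^q / 2)."""
    u = 2.0 * ndtr(np.abs(z)) - 1.0                               # CDF of |zeta_j| at the target value
    return np.sign(z) * (2.0 * gammaincinv(1.0 / q, u)) ** (1.0 / q)
```

Applying such a Λj coordinate-wise to a white-noise vector and summing ρjΛj(ξj)φj over a truncated basis gives draws from a finite-dimensional approximation of μ0.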
  • 14. Numerical Example I Dimensional Robustness • We consider a 2D regression problem with a Besov prior. • We compare the average acceptance rate of the whitened pCN algorithm with those of three different Random Walk Metropolis algorithms, as the dimension of the state space is increased. RWM Algorithm 1. Set n = 0 and choose β > 0. Initial State u(0) ∈ X. 2. Propose new state ˆu(n) = u(n) + βζ(n), ζ(n) ∼ Q(ζ). 3. Set u(n+1) = ˆu(n) with probability α(u(n) → ˆu(n)) = min{1, exp(Φ(u(n); y) − Φ(ˆu(n); y)) μ0(ˆu(n))/μ0(u(n))}, or else set u(n+1) = u(n). 4. Set n → n + 1 and go to 2. 11/56
  • 15. Numerical Example I Dimensional Robustness 12/56 [Figure: Mean acceptance probability against jump size β for whitened pCN, RWM with white Gaussian jumps, RWM with colored Gaussian jumps and RWM with Besov jumps. Curves are shown for state space dimensions N = 64, 256, 1024, 4096, 16384, 65536.]
  • 16. Numerical Example I Dimensional Robustness 13/56 [Figure: Mean acceptance rates for different MCMC algorithms, including RWM with colored Gaussian jumps and RWM with Besov jumps. Curves are shown for state space dimensions N = 256, 1024, 4096, 16384, 65536.]
  • 17. Numerical Example II Existing Methods • Vollmer (2015) provides methodology for robust sampling with a uniform prior, by reflecting random walk proposals at the boundary of [−1, 1]N. • We compare these methods, using both uniform and Gaussian reflected random walk proposals, with the whitened pCN method described above. • Since the whitened posterior ν has a Gaussian reference measure, we have additional methodology available: we compare also with a dimension robust version of MALA, which makes likelihood-informed proposals (Beskos et al., 2017). 14/56
  • 18. Numerical Example II Existing Methods 15/56 [Figure: Autocorrelations for the RURWM, RSRWM, wPCN and ∞-wMALA sampling methods for a linear deblurring problem, with 8/32 observations (left/right).]
  • 19. Outline 1. Introduction 2. Series-Based Priors 3. Hierarchical Priors 4. Level Set Priors 5. Numerical Illustrations 6. Conclusions 15/56
  • 20. Hierarchical Priors • A family of Gaussian distributions will often have a number of parameters associated with it controlling sample properties. • For example, Whittle-Matérn distributions with parameters θ = (ν, τ, σ), which have covariance function cθ(x, y) = σ2 (2^(1−ν)/Γ(ν)) (τ|x − y|)^ν Kν(τ|x − y|). – ν controls smoothness – draws from Gaussian fields with this covariance have ν fractional Sobolev and Hölder derivatives. – τ is an inverse length-scale. – σ is an amplitude scale. • These parameters may not be known a priori, and may be treated hierarchically as hyperparameters in the problem. 16/56
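The following is a small Python helper, a sketch rather than the authors' code, evaluating this Whittle-Matérn covariance on a finite set of points via scipy's modified Bessel function Kν; the diagonal is set to σ² directly, since the closed-form expression has a removable singularity at zero distance.

```python
import numpy as np
from scipy.special import kv, gamma

def whittle_matern_covariance(X, nu=1.5, tau=10.0, sigma=1.0):
    """Covariance matrix c_theta(x_i, x_j) = sigma^2 * 2^(1-nu)/Gamma(nu) * (tau r)^nu * K_nu(tau r)
    for the rows of X (an n-by-d array of point locations)."""
    r = tau * np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    r = np.where(r == 0.0, 1e-12, r)          # dodge the 0 * inf limit at coincident points
    C = sigma**2 * (2.0 ** (1.0 - nu) / gamma(nu)) * r**nu * kv(nu, r)
    np.fill_diagonal(C, sigma**2)             # exact limiting value at zero distance
    return C
```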
  • 21. Hierarchical Priors • Placing a hyperprior π0 on the hyperparameters θ, the posterior μ(du, dθ) may be sampled using a Metropolis-within-Gibbs method, alternating the following two steps: Metropolis-within-Gibbs 1. Update u(n) → u(n+1) using a pCN (or similar) method for the conditional distribution u|(θ(n), y). 2. Update θ(n) → θ(n+1) using an MCMC method that samples θ|(u(n+1), y) in stationarity. • Problem: for different choices of θ, the Gaussian distributions u|θ are typically mutually singular. Moves in the hyperparameters are hence never accepted in the infinite-dimensional limit. 17/56
  • 22. Hierarchical Priors • We use the same methodology for robust sampling as for the series-based priors, finding an appropriate measure ν0 and map T. • Suppose u|θ ∼ N(0, C(θ)). If ξ ∼ N(0, I) is white noise, then C(θ)^(1/2) ξ ∼ N(0, C(θ)). We hence define ν0 = N(0, I) ⊗ π0 and T(ξ, θ) = C(θ)^(1/2) ξ. • ξ and θ are independent under ν0, and so singularity issues from Metropolis-within-Gibbs disappear when sampling ν. • Referred to as the non-centered parameterization in the literature (Papaspiliopoulos et al., 2007). 18/56
  • 23. Hierarchical Priors Centered vs Non-Centered • Current state θ, proposed state θ′ = θ + ϵη. • Centered acceptance rate for θ|u, y: 1 ∧ (det C(θ)/det C(θ′))^(1/2) exp( ½‖C(θ)^(−1/2) u‖2 − ½‖C(θ′)^(−1/2) u‖2 ) π0(θ′)/π0(θ). • Non-Centered acceptance rate for θ|ξ, y: 1 ∧ exp( Φ(T(ξ, θ); y) − Φ(T(ξ, θ′); y) ) π0(θ′)/π0(θ). 18/56
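A minimal sketch of the non-centered hyperparameter move, assuming generic user-supplied functions `neg_log_likelihood` (Φ(·; y)), `T(xi, theta)` and a log hyperprior `log_pi0`; with ξ held fixed, the acceptance ratio involves only the likelihood and the hyperprior, matching the non-centered expression above.

```python
import numpy as np

def noncentered_theta_step(theta, xi, neg_log_likelihood, T, log_pi0, eps=0.1, rng=None):
    """One random-walk Metropolis update of theta | (xi, y) in the non-centered parameterization."""
    rng = np.random.default_rng() if rng is None else rng
    theta_prop = theta + eps * rng.standard_normal(np.shape(theta))   # theta' = theta + eps * eta
    log_alpha = (neg_log_likelihood(T(xi, theta))                     # Phi(T(xi, theta); y)
                 - neg_log_likelihood(T(xi, theta_prop))              # - Phi(T(xi, theta'); y)
                 + log_pi0(theta_prop) - log_pi0(theta))              # + log of the hyperprior ratio
    if np.log(rng.uniform()) < log_alpha:
        return theta_prop, True
    return theta, False
```

Alternating this step with a wpCN update of ξ | (θ, y) gives the non-centered Metropolis-within-Gibbs sampler.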
  • 24. Hierarchical Priors Example (Whittle-Matérn Distributions) • If u is a Gaussian random field on Rd with covariance function cθ(·, ·) above, then it is equal in law to the solution of the SPDE (τ2I − Δ)^(ν/2+d/4) u = β(ν)στ^ν ξ where ξ ∼ N(0, I). (Lindgren et al., 2011) • The unknown field in the inverse problem can thus either be treated as u or ξ, related via the mapping u = T(ξ, θ) = β(ν)στ^ν (τ2I − Δ)^(−ν/2−d/4) ξ. • We set ν0 = N(0, I) ⊗ π0 and define T as above. 19/56
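On a periodic domain this map can be applied spectrally; below is a hedged 1D sketch (d = 1) in which the operator (τ2I − Δ)^(−ν/2−d/4) acts by multiplying Fourier coefficients by (τ2 + k2)^(−ν/2−1/4). The constant β(ν) and the discretisation scaling of the white noise are omitted, so this is illustrative only.

```python
import numpy as np

def whittle_matern_transform_1d(xi, nu=1.5, tau=10.0, sigma=1.0, length=1.0):
    """Apply u = sigma * tau^nu * (tau^2 I - Laplacian)^(-nu/2 - 1/4) xi on a periodic 1D grid."""
    n = xi.size
    k = 2.0 * np.pi * np.fft.fftfreq(n, d=length / n)        # Fourier wavenumbers of the grid
    symbol = (tau**2 + k**2) ** (-(nu / 2.0 + 0.25))          # symbol of the inverse fractional operator
    return sigma * tau**nu * np.real(np.fft.ifft(symbol * np.fft.fft(xi)))
```

Hierarchical sampling then treats (ν, τ, σ), or a subset, as arguments of this map, updated with the non-centered step described earlier.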
  • 25. Outline 1. Introduction 2. Series-Based Priors 3. Hierarchical Priors 4. Level Set Priors 5. Numerical Illustrations 6. Conclusions 19/56
  • 26. Level Set Method • It is often of interest to recover a piecewise constant function, such as in inverse interface or classification problems. • We can construct a piecewise constant function by thresholding a continuous function at a number of levels. • Define F : C0(D) → L∞(D) by F(ξ)(x) = κ1 if ξ(x) ∈ (−∞, −1), κ2 if ξ(x) ∈ [−1, 1), κ3 if ξ(x) ∈ [1, ∞). Then if ξ ∼ N(0, C) is a continuous Gaussian random field, u = F(ξ) is a piecewise constant random field. • We have a natural method for sampling when a prior μ0 = F♯ N(0, C) is used. 20/56
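A minimal numpy rendering of this thresholding map, with the levels ±1 and placeholder values κ1, κ2, κ3 passed in by the user:

```python
import numpy as np

def level_set_map(xi, kappas=(-1.0, 0.0, 1.0), levels=(-1.0, 1.0)):
    """F(xi): kappas[0] where xi < levels[0], kappas[1] where levels[0] <= xi < levels[1],
    kappas[2] where xi >= levels[1]. Works elementwise on an array of field values."""
    idx = np.digitize(xi, bins=np.asarray(levels), right=False)    # bin index 0, 1 or 2
    return np.asarray(kappas)[idx]
```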
  • 27. Level Set Method 21/56 Figure: Examples of continuous fields ξ ∼ N(0, C) (top) and the thresholded fields u = F(ξ) (bottom).
  • 28. Level Set Method 22/56 Figure: Examples of continuous fields ξ ∼ N(0, C(τ)) for various τ (top) and the thresholded fields u = F(ξ) (bottom).
  • 29. • The thresholding function F : C0(D; R) → L∞(D; R) has the disadvantage that an ordering of layers/classes is asserted. This can be overcome by instead using a vector level set method. • Let k denote the number of layers/classes. Define S : C0(D; Rk) → L∞(D; Rk) by S(ξ)(x) = e_{r(x;ξ)}, r(x; ξ) = argmax_{j=1,...,k} ξj(x), where {er}_{r=1}^k denotes the standard basis for Rk. • Samples from μ0 = S♯ N(0, C)k then take values in the set {er}_{r=1}^k for each x ∈ D. 23/56
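The vector level set map is just an arg-max followed by a one-hot encoding; a short sketch, with the field ξ stored as a (k, number-of-points) array:

```python
import numpy as np

def vector_level_set(xi):
    """S(xi): at each point return the standard basis vector e_r with r = argmax_j xi_j(x)."""
    r = np.argmax(xi, axis=0)              # winning component at each spatial point
    return np.eye(xi.shape[0])[:, r]       # columns are the one-hot vectors e_r
```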
  • 30. Outline 1. Introduction 2. Series-Based Priors 3. Hierarchical Priors 4. Level Set Priors 5. Numerical Illustrations 6. Conclusions 23/56
  • 31. Numerical Example III Groundwater Flow • We consider the case where the forward map represents the mapping from permeability to pressure in the steady state Darcy flow model. • Let D = (0, 1)2. Given u ∈ L∞(D), denote by p(x) = p(x; u) the solution to −∇ · (u∇p) = f for x ∈ D, with p = 0 on ∂D. • Define G : L∞(D) → RJ by G(u) = (p(x1; u), . . . , p(xJ; u)). • We look at the Bayesian inverse problem of recovering u from a noisy measurement of G(u), using non-hierarchical and hierarchical level set priors. 24/56
  • 32. Numerical Example III Groundwater Flow • We consider the case where the forward map represents the mapping from permeability to pressure in the steady state Darcy flow model, observed at 36 points. 25/56 Figure: (Left) True permeability u†, (Right) True pressure and observed data y.
  • 33. Numerical Example III Groundwater Flow • We consider the case where the forward map represents the mapping from permeability to pressure in the steady state Darcy flow model, observed at 36 points. 26/56 Figure: (Left) True permeability u†, (Right) Posterior mean F(E(ξ)).
  • 34. Numerical Example III Groundwater Flow • We consider the case where the forward map represents the mapping from permeability to pressure in the steady state Darcy flow model, observed at 36 points. 27/56 Figure: (Left) True permeability u†, (Right) Posterior mean F(T(E(ξ), E(α), E(τ))).
  • 35. Numerical Example IV Deep Gaussian Process I • Consider the infinite hierarchy {un}n∈N of the form un+1 | un ∼ N(0, C(un)). • We can alternatively write this in the form un+1 = L(un)ξn, ξn ∼ N(0, I) i.i.d., where L(u)L(u)∗ = C(u). • Classical setup of (Damianou, Lawrence, 2013): L(u)ξ = ξ ◦ u. • We choose L(u) such that C(u) is a Whittle-Matérn-type covariance with spatially varying inverse length-scale τ(u). • Ergodicity of this process means that it is sufficient for inference to terminate the hierarchy after a small number of levels. 28/56
  • 36. Numerical Example IV Deep Gaussian Process I • We can consider the unknowns in the problem to be the variables – u = (u0, . . . , uN−1), which are correlated under the prior; or – ξ = (ξ0, . . . , ξN−1), which are independent under the prior. • These variables are related via u = TN(ξ), where the deterministic maps Tn, mapping the latent variables into X, are defined iteratively by T1(ξ0, . . . , ξN−1) = ξ0, Tn+1(ξ0, . . . , ξN−1) = L(Tn(ξ0, . . . , ξN−1))ξn, n = 1, . . . , N − 1. • We thus set T = TN and ν0 = N(0, I)N. 29/56
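The composed map TN is a simple recursion; a sketch, assuming `L(u)` returns a matrix square root of C(u) (for instance a Cholesky factor of a discretised Whittle-Matérn covariance with length-scale field τ(u)):

```python
import numpy as np

def dgp_transform(xis, L):
    """T_N: start from u = xi_0 and repeatedly apply u -> L(u) @ xi_n for n = 1, ..., N-1."""
    u = np.asarray(xis[0], dtype=float)
    for xi_n in xis[1:]:
        u = L(u) @ np.asarray(xi_n, dtype=float)
    return u
```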
  • 37. Numerical Example IV 1D Regression We consider regression using a Whittle-Matérn DGP prior, with the aim of recovering the following field from noisy point observations: 30/56 [Figure: The true field u†. It is observed on a grid of J points.]
  • 38. Numerical Example IV 1D Regression 31/56 [Figure: Sample means of layers under the posterior, using 4, 3, 2, 1 layers for the DGP prior. Here J = 25.]
  • 39. Numerical Example IV 1D Regression 32/56 [Figure: Sample means of layers under the posterior, using 4, 3, 2, 1 layers for the DGP prior. Here J = 50.]
  • 40. Numerical Example IV 1D Regression 33/56 [Figure: Sample means of layers under the posterior, using 4, 3, 2, 1 layers for the DGP prior. Here J = 100.]
  • 41. Numerical Example IV 1D Regression 34/56 [Figure: Sample means of layers under the posterior, using 4, 3, 2, 1 layers for the DGP prior. Here J = 10^6.]
  • 42. Numerical Example IV Error Values Table: The L1-errors ‖u† − E(uN)‖_{L1} between the true field and sample means for the one-dimensional simulations. 35/56
    J     | 1 layer | 2 layers | 3 layers | 4 layers
    25    | 0.0746  | 0.0658   | 0.0667   | 0.0670
    50    | 0.0568  | 0.0339   | 0.0339   | 0.0337
    100   | 0.0485  | 0.0200   | 0.0198   | 0.0196
    10^6  | 0.0131  | 0.000145 | 0.000133 | 0.000133
  • 43. Numerical Example IV 2D Regression 36/56 [Figure: The true field u†, which possesses multiple length-scales. It is observed on a grid of J points.]
  • 44. Numerical Example IV 2D Regression 37/56 [Figure: (Left) E(u3), (Middle, Right) the length-scale fields E(F(u2)), E(F(u1)). Here J = 2^10.]
  • 45. Numerical Example IV 2D Regression 38/56 [Figure: (Left) E(u3), (Middle, Right) the length-scale fields E(F(u2)), E(F(u1)). Here J = 2^8.]
  • 46. Numerical Example IV Error Values Table: The L2-errors ‖u† − E(uN)‖_{L2} between the true field and sample means for the two-dimensional simulations. 39/56
    J     | 1 layer | 2 layers | 3 layers
    2^10  | 0.0856  | 0.0813   | 0.0681
    2^8   | 0.1310  | 0.1260   | 0.1279
  • 47. Numerical Example V Deep Gaussian Process II • We now consider the case where the forward map is the mapping from conductivity to boundary voltage measurements in the complete electrode model for EIT (Somersalo et al., 1992). • This is similar to the groundwater flow problem, except boundary measurements are made instead of interior measurements, for a collection of source terms. • The prior is taken to be a 2 layer hierarchy of the form considered above, except each continuous field is thresholded to be piecewise constant. 40/56
  • 48. Numerical Example V Deep Gaussian Process II 41/56 Figure: (Left) True conductivity field (Middle) True length-scale field (Right) Vector of observed boundary voltages.
  • 49. Numerical Example V Deep Gaussian Process II 42/56 Figure: (Left) Pushforward of posterior mean conductivity (Right) pushforward of posterior mean length scale.
  • 50. Numerical Example V Deep Gaussian Process II 43/56 Figure: Examples of samples of the conductivity field under the posterior.
  • 51. Numerical Example VI Graph-Based Learning • Assume we have a collection of data points {xj}j∈Z ⊂ Rd that we wish to divide into classes. • We have class labels {yj}j∈Z′ for a subset of points {xj}j∈Z′, Z′ ⊂ Z, and wish to propagate these to all points. • We view the data points as vertices of a weighted graph G, with weights Wij = η(xi, xj) describing the similarity between data points. • A priori clustering information can be obtained from the spectral properties of the graph Laplacian LN ∈ RN×N associated with G: LN = D − W, Dii = Σ_{j=1}^N Wij. • If there are K clusters and η can distinguish between them perfectly, LN has K zero eigenvalues, whose eigenvectors determine these clusters perfectly. 44/56
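A direct construction of the unnormalized graph Laplacian from a similarity function η, as a small sketch (quadratic cost in the number of points, so only suitable for modest N):

```python
import numpy as np

def graph_laplacian(X, eta):
    """L_N = D - W with W_ij = eta(x_i, x_j) and D_ii = sum_j W_ij, for rows x_i of X."""
    n = X.shape[0]
    W = np.array([[eta(X[i], X[j]) for j in range(n)] for i in range(n)])
    np.fill_diagonal(W, 0.0)               # self-weights cancel in D - W; zero them for clarity
    D = np.diag(W.sum(axis=1))
    return D - W
```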
  • 52. Numerical Example VI Graph-Based Learning • We define the prior ν0 = N(0, C(θ))K with C(θ) = PM(LN + τ2)^(−α) P∗M, θ = (α, τ, M). • A model for the labels is given, yj = S(u(xj)) + ηj, j ∈ Z′, where S : RK → RK is a multi-class thresholding function, and u : G → RK is a function on the entire graph. • This defines the likelihood, and hence the posterior distribution: ν(du) ∝ exp( −(1/2γ2) Σ_{j∈Z′} |S(u(xj)) − yj|2 − ½⟨u, C(θ)^(−1)u⟩ ) du. 45/56
  • 53. Numerical Example VI Graph-Based Learning • This problem is high- but finite-dimensional: it is not directly an approximation of a problem on an infinite-dimensional Hilbert space. However, with appropriate assumptions and scaling, it does have a continuum limit as N → ∞. (Garcia Trillos, Slepčev, 2014; Garcia Trillos et al., 2017) 46/56
  • 54. Numerical Example VI Graph-Based Learning • We consider the case where the data points are elements of the MNIST dataset: 28 × 28 pixel images of handwritten digits 0, 1, . . . , 9. • We are hierarchical about subsets of the parameters (α, τ, M), and see how classification accuracy is affected by this choice. • A sample u(k) ∼ N(0, C(θ)) can be expressed in the eigenbasis {λj, qj} of LN as u(k) = Σ_{j=1}^M (λj + τ2)^(−α/2) ξjqj, ξj ∼ N(0, 1) i.i.d. • We restrict to the digits 3, 4, 5, 9 here to speed up computations, and take ≈ 2000 of each digit so that |Z| ≈ 8000. We provide labels for a random subset of |Z′| = 80 digits. 47/56
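A sketch of drawing from this prior via the truncated eigen-expansion, using a dense eigendecomposition of LN for illustration (a Krylov or randomized method would be used at MNIST scale):

```python
import numpy as np

def sample_graph_prior(LN, alpha=1.0, tau=1.0, M=50, rng=None):
    """Draw u = sum_{j=1}^M (lambda_j + tau^2)^(-alpha/2) xi_j q_j with xi_j ~ N(0, 1) i.i.d."""
    rng = np.random.default_rng() if rng is None else rng
    lam, Q = np.linalg.eigh(LN)                            # eigenpairs of the graph Laplacian
    xi = rng.standard_normal(M)
    coeffs = (lam[:M] + tau**2) ** (-alpha / 2.0) * xi
    return Q[:, :M] @ coeffs
```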
  • 55. Numerical Example VI Graph-Based Learning 48/56 [Figure: Mean classification accuracy of 40 trials against sample number, for MNIST digits 3, 4, 5, 9.]
  • 56. • We now focus on a test set of 10000 digits, with approximately 1000 of each digit 0, 1, . . . , 9, being hierarchical about (α, M). • From samples we can look at uncertainty in addition to accuracy. • We introduce the measure of uncertainty associated with a data point xj as U(xj) = (k/(k − 1))^(1/2) min_{r=1,...,k} ‖E(Su)(xj) − er‖_2. Remark: (Su)(x) lies in the set {er}_{r=1}^k for each fixed u, x, but the mean E(Su)(x) in general will not. 49/56
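Computed from posterior samples, this uncertainty score is a one-liner; a sketch, with the posterior mean of the one-hot field stored as a (k, number-of-points) array:

```python
import numpy as np

def uncertainty(mean_Su):
    """U(x_j) = sqrt(k/(k-1)) * min_r || E(Su)(x_j) - e_r ||_2, which lies in [0, 1]:
    0 when the mean is exactly a basis vector, 1 when it is uniform over the k classes."""
    k = mean_Su.shape[0]
    dists = np.linalg.norm(mean_Su[:, None, :] - np.eye(k)[:, :, None], axis=0)  # distances to each e_r
    return np.sqrt(k / (k - 1.0)) * dists.min(axis=0)
```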
  • 57. Numerical Example VI Graph-Based Learning 50/56 Figure: 100 most uncertain digits (top) and a selection of 100 most certain digits (bottom), 200 labels. Mean uncertainty: 14.0%.
  • 58. Numerical Example VI Graph-Based Learning 51/56 Figure: 100 most uncertain digits, 200+100 uncertain labels. Mean uncertainty: 10.3%.
  • 59. Numerical Example VI Graph-Based Learning 51/56 Figure: 100 most uncertain digits, 200+200 uncertain labels. Mean uncertainty: 8.1%.
  • 60. Numerical Example VI Graph-Based Learning 51/56 Figure: 100 most uncertain digits, 200+300 uncertain labels. Mean uncertainty: 7.1%.
  • 61. Numerical Example VI Graph-Based Learning 52/56 Figure: Histogram of uncertainty, 200 labels.
  • 62. Numerical Example VI Graph-Based Learning 52/56 Figure: Histogram of uncertainty, 200+100 uncertain labels.
  • 63. Numerical Example VI Graph-Based Learning 52/56 Figure: Histogram of uncertainty, 200+200 uncertain labels.
  • 64. Numerical Example VI Graph-Based Learning 52/56 Figure: Histogram of uncertainty, 200+300 uncertain labels.
  • 65. Numerical Example VI Graph-Based Learning 53/56 Figure: Histogram of uncertainty, 200 labels.
  • 66. Numerical Example VI Graph-Based Learning 53/56 Figure: Histogram of uncertainty, 200+100 certain labels.
  • 67. Numerical Example VI Graph-Based Learning 53/56 Figure: Histogram of uncertainty, 200+200 certain labels.
  • 68. Numerical Example VI Graph-Based Learning 53/56 Figure: Histogram of uncertainty, 200+300 certain labels.
  • 69. Outline 1. Introduction 2. Series-Based Priors 3. Hierarchical Priors 4. Level Set Priors 5. Numerical Illustrations 6. Conclusions 53/56
  • 70. Conclusions • Non-Gaussian priors can be more appropriate to use than Gaussians in certain contexts, and can provide extra flexibility via use of hyperparameters. • By writing a non-Gaussian prior as the pushforward of a Gaussian under some non-linear map, existing dimension robust MCMC sampling methods for Gaussian priors can be used. • For hierarchical priors, the transformation can also be chosen such that the priors on the field and hyperparameters are independent (non-centering), avoiding issues associated with measure singularity in the Gibbs sampler. 54/56
  • 72. References [1] A. M. Stuart. Inverse problems: a Bayesian perspective. Acta Numerica, 2010. [2] M. M. Dunlop, M. Girolami, A. M. Stuart and A. L. Teckentrup. How deep are deep Gaussian processes? Submitted, 2017. [3] V. Chen, M. M. Dunlop, O. Papaspiliopoulos and A. M. Stuart. Robust MCMC Sampling in High Dimensions with Non-Gaussian and Hierarchical Priors. In preparation. [4] M. M. Dunlop, M. A. Iglesias and A. M. Stuart. Hierarchical Bayesian level set inversion. Statistics and Computing, 2016. [5] Z. Wang, J. M. Bardsley, A. Solonen, T. Cui and Y. M. Marzouk. Bayesian inverse problems with ℓ1 priors: a randomize-then-optimize approach. SIAM Journal on Scientific Computing, 2017. 56/56