The document is a slide transcript of a talk given by Simon Cotter on accelerating Bayesian inverse problems. The first part discusses how computationally expensive the forward model can be for problems such as heat transfer, so that a standard MCMC run would take years on a single CPU. A surrogate forward model is constructed offline using a stochastic Galerkin (polynomial chaos) expansion with Legendre polynomials, which can then be rapidly evaluated online in MCMC to sample the posterior. This reduces computational cost by several orders of magnitude compared to running the PDE solver at each MCMC step, and results are presented for an industrial copper sample problem. The second part addresses poorly mixing chains: posteriors concentrated near lower-dimensional manifolds are tackled with ensemble transport adaptive importance sampling (ETAIS) and transport maps.
MUMS: Transition & SPUQ Workshop - Complexities in Bayesian Inverse Problems: Models and Distributions - Simon Cotter, May 17, 2019
1. Complexities in Bayesian Inverse Problems: Models and Distributions
Simon Cotter
University of Manchester
17th May 2019
2. The Graveyard Shift
Last day
After lunch
My promises to you:
2 talks in 1 to keep you on your toes
(or if you go to sleep in the 1st you have a second chance!)
Almost no details!
As many pictures of gorillas as is scientifically justifiable
9. Collaborators
Left: Catherine Powell (University of Manchester, UK); Center: James Rynn (University of Manchester, UK); Right: Louise Wright (National Physical Laboratory, UK).
10. Collaborators
Left: Colin Cotter (Imperial College, UK); Center: Yannis Kevrekidis (Johns Hopkins, US); Right: Paul Russell (formerly University of Manchester, UK).
SLC is grateful to EPSRC for First Grant award EP/L023393/1
12. Bayesian Inverse Problems
Find the unknown $\theta$ given $n_z$ observations $z$, satisfying
$$z = \mathcal{G}(\theta) + \eta, \qquad \eta \sim N(0, \Sigma),$$
where
$z \in \mathbb{R}^{n_z}$ is a given vector of observations,
$\mathcal{G} : \Theta \to \mathbb{R}^{n_z}$ is the observation operator,
$\theta \in \Theta$ is the unknown parameter,
$\eta \in \mathbb{R}^{n_z}$ is a vector of observational noise.
Goal: Efficiently estimate the posterior density $\pi(\theta \mid z)$ for the unknowns $\theta$ given the data $z$.
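To make the noise model concrete, here is a minimal sketch (not code from the talk) of the resulting unnormalised log-likelihood and log-posterior under the Gaussian noise assumption; the callables `G` and `log_prior` are placeholders for the forward operator and the prior density.

```python
import numpy as np

def log_likelihood(theta, z, G, Sigma):
    """Gaussian log-likelihood log L(z | theta), up to an additive constant.

    theta : parameter vector; z : observations (n_z,);
    G : callable observation operator; Sigma : (n_z, n_z) noise covariance.
    """
    r = z - G(theta)                          # residual z - G(theta)
    # ||r||_Sigma^2 = r^T Sigma^{-1} r, via a solve rather than an explicit inverse
    return -0.5 * r @ np.linalg.solve(Sigma, r)

def log_posterior(theta, z, G, Sigma, log_prior):
    """Unnormalised log-posterior: log L(z | theta) + log pi_0(theta)."""
    return log_likelihood(theta, z, G, Sigma) + log_prior(theta)
```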
18. Markov Chain Monte Carlo (MCMC) Methods
$$\pi(\theta \mid z) \propto L(z \mid \theta)\,\pi_0(\theta) \propto \exp\left(-\tfrac{1}{2}\,\|z - \mathcal{G}(\theta)\|_\Sigma^2\right)\pi_0(\theta).$$
Markov chain Monte Carlo estimates
$$\mathbb{E}_\pi[\varphi] = \int_\Theta \varphi(\theta)\,\pi(\theta \mid z)\,d\theta \approx \frac{1}{M}\sum_{i=1}^M \varphi(\theta_i).$$
Extremely costly for some problem classes:
Expensive likelihood evaluations (talk #1)
Poor mixing requiring high values of M (talk #2)
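A minimal random-walk Metropolis sampler targeting such a posterior might look as follows. This is an illustrative sketch, not the talk's implementation; `log_post` stands for any unnormalised log-posterior such as the one above.

```python
import numpy as np

def metropolis_hastings(log_post, theta0, M, step=0.1, rng=None):
    """Random-walk Metropolis sampler targeting exp(log_post).

    Each iteration costs one log-posterior evaluation; when that
    evaluation hides a PDE solve, the chain length M dominates the cost.
    """
    rng = np.random.default_rng(rng)
    theta = np.atleast_1d(np.asarray(theta0, dtype=float))
    lp = log_post(theta)
    chain = np.empty((M, theta.size))
    for m in range(M):
        prop = theta + step * rng.standard_normal(theta.size)  # symmetric proposal
        lp_prop = log_post(prop)
        if np.log(rng.uniform()) < lp_prop - lp:               # accept w.p. alpha
            theta, lp = prop, lp_prop
        chain[m] = theta
    return chain

# The slide's estimator E_pi[phi] is then the chain average,
# e.g. np.mean(phi(chain), axis=0).
```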
26. Industrial Example
Possible unknowns:
λ — thermal conductivity,
I — laser intensity,
κ — heat transfer coefficient (affects boundary conditions),
σ — standard deviation of measurement noise.
. . .
27. Forward Model: Heat Equation
Model the physical relationship between $\lambda$, $I$ and the temperature $T$ of the material using the heat equation:
$$c_p \frac{\partial T}{\partial t} - \nabla \cdot (\lambda \nabla T) = Q(I).$$
Given a sample of $\lambda$ and $I$, we can use numerical methods (FEMs) to approximate $T$ at the measurement times.
One evaluation of the FEM code takes ≈ 30 seconds, for a "reasonable level of accuracy".
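For illustration only, here is a drastically simplified 1D stand-in for this forward map (backward-Euler finite differences rather than the talk's FEM solve of the real geometry; the point-source treatment of the laser and all default parameters are assumptions):

```python
import numpy as np

def forward_model(lam, I, nx=50, nt=200, L=1.0, t_end=1.0, cp=1.0):
    """Toy 1D stand-in for the forward map (lam, I) -> temperature field.

    Backward-Euler finite differences for c_p dT/dt = lam d2T/dx2 + Q(I)
    on (0, L) with T = 0 at both ends; the laser is modelled as a point
    source of strength I. The talk's real model is a FEM solve in 3D.
    """
    dx, dt = L / (nx + 1), t_end / nt
    # interior Laplacian as a tridiagonal matrix
    A = (np.diag(-2.0 * np.ones(nx)) + np.diag(np.ones(nx - 1), 1)
         + np.diag(np.ones(nx - 1), -1)) / dx**2
    B = np.eye(nx) - (dt * lam / cp) * A      # backward-Euler system matrix
    Q = np.zeros(nx)
    Q[nx // 2] = I                            # localised heat source
    T = np.zeros(nx)
    for _ in range(nt):                       # one linear solve per time step
        T = np.linalg.solve(B, T + dt * Q / cp)
    return T                                  # temperature at the final time
```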
29. Metropolis Hastings Algorithm (FEM)
Algorithm 1: Metropolis-Hastings Algorithm
set initial state X(0) = θ0
for m = 1, 2, . . . , M do
draw proposal
evaluate likelihood by FEM approximation of G (expensive!)
compute acceptance probability α
accept proposal with probability α
end for
output chain X = (θ0, θ1, . . . , θM)
30 seconds per (time-dependent) PDE solve =⇒ $10^7$ samples take $3 \times 10^8$ seconds ≈ 9.5 years (single CPU)
30. Surrogate Model
The temperature $T$ is a function of $\theta = (\lambda, I)$:
$$c_p \frac{\partial T(\theta)}{\partial t} - \nabla \cdot (\lambda \nabla T(\theta)) = Q(I).$$
Instead of solving for individual samples of $\lambda$ and $I$:
Offline: build an approximation of the form
$$T_{\mathrm{approx}}(\theta) = \sum_{i=1}^{n_k} T_i \Psi_i(\theta), \qquad \text{(Surrogate)}$$
where the $\Psi_i$ are orthogonal (Legendre) polynomials in $\theta$.
This can be cheaply evaluated in MCMC routines (Online).
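The talk constructs the coefficients $T_i$ by a stochastic Galerkin FEM solve; purely as an illustration of the offline/online split, the sketch below instead fits a tensor Legendre expansion by least squares to a modest number of forward solves. All function names and parameters here are assumptions.

```python
import numpy as np
from numpy.polynomial import legendre
from itertools import product

def build_surrogate(forward, bounds, deg=4, n_train=200, rng=0):
    """Fit T_approx(theta) = sum_i T_i Psi_i(theta) with tensor Legendre Psi_i.

    Offline: n_train expensive forward solves plus a least-squares fit
    (the talk uses SGFEM instead). Online: cheap polynomial evaluation.
    `forward` maps theta = (lam, I) to an output vector.
    """
    rng = np.random.default_rng(rng)
    lo, hi = np.asarray(bounds, dtype=float).T
    thetas = rng.uniform(lo, hi, size=(n_train, 2))
    X = 2 * (thetas - lo) / (hi - lo) - 1            # rescale to [-1, 1]^2
    # tensor-product Legendre design matrix, one column per multi-index (i, j)
    cols = [legendre.legval(X[:, 0], np.eye(deg + 1)[i])
            * legendre.legval(X[:, 1], np.eye(deg + 1)[j])
            for i, j in product(range(deg + 1), repeat=2)]
    V = np.column_stack(cols)
    Y = np.array([forward(t) for t in thetas])       # offline: expensive solves
    C, *_ = np.linalg.lstsq(V, Y, rcond=None)        # coefficients T_i

    def surrogate(theta):                            # online: cheap evaluation
        x = 2 * (np.asarray(theta, dtype=float) - lo) / (hi - lo) - 1
        psi = np.array([legendre.legval(x[0], np.eye(deg + 1)[i])
                        * legendre.legval(x[1], np.eye(deg + 1)[j])
                        for i, j in product(range(deg + 1), repeat=2)])
        return psi @ C
    return surrogate
```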
32. Metropolis Hastings Algorithm (SGFEM)
Algorithm 2: Metropolis-Hastings Algorithm with SGFEM Surrogate
compute SGFEM solution (cost dependent on FEM discretisation parameters + maximum polynomial degree k)
set initial state X(0) = θ0
for m = 1, 2, . . . , M do
draw proposal
evaluate likelihood by evaluating SGFEM approximation of G (cheap!)
compute acceptance probability α
accept proposal with probability α
end for
output chain X = (θ0, θ1, . . . , θM)
33. Results: Copper Sample
Offline: Compute surrogate solution: ≈ 16 mins (solved 600K equations × $n_t = 800$ time steps)
Online: Generate $M = 10^7$ samples of $\theta$ from the approximate posterior $\pi_{\mathrm{approx}}(\theta \mid d)$ using a standard MCMC method: ≈ 26 mins
Computational Costs:
              Offline                              Online
Standard      —                                    $M \times (n_t \times C_{PDE})$
Surrogate     $n_t \times O(n_k \times C_{PDE})$   $M \times C_{eval}$
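A back-of-envelope check of the speed-up implied by the numbers on this slide (single CPU):

```python
# Standard MCMC: one 30 s PDE solve per sample; surrogate: 16 min offline + 26 min online.
M, C_pde = 10**7, 30.0             # samples; seconds per FEM solve
standard = M * C_pde               # ~3e8 s, i.e. ~9.5 years
surrogate = 16 * 60 + 26 * 60      # offline + online, in seconds
print(f"standard: {standard / 3.156e7:.1f} years, "
      f"surrogate: {surrogate / 60:.0f} min, "
      f"speed-up: {standard / surrogate:.0f}x")    # roughly 10^5
```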
37. References
V. Hoang, C. Schwab, and A. Stuart, "Complexity Analysis of Accelerated MCMC Methods for Bayesian Inversion", Inverse Problems, 29 (2013), p. 085010.
Y. Marzouk, H. Najm, and L. Rahn, "Stochastic Spectral Methods for Efficient Bayesian Solution of Inverse Problems", Journal of Computational Physics, 224 (2007), pp. 560–586.
F. Nobile and R. Tempone, "Analysis and Implementation Issues for the Numerical Approximation of Parabolic Equations with Random Coefficients", International Journal for Numerical Methods in Engineering, 80 (2009), pp. 979–1006.
J. A. Rynn, S. L. Cotter, C. E. Powell, and L. Wright, "Surrogate Accelerated Bayesian Inversion for the Determination of the Thermal Diffusivity of a Material", Metrologia, 56 (2019), p. 015018.
43. Constrained approximation: Simple Example
[Figure: corner plot of posterior marginals over the rate constants k1, k2, k3, k4.]
Figure: CMA approximation of the posterior arising from observations of the slow variable S = X1 + X2, concentrated around a manifold $\frac{k_1(k_2+k_3+k_4)}{k_2 k_4} = C$, i.e. more challenging than this plot suggests. (Any visualisation suggestions?)
44. Importance Sampling
Sample $x_i \sim \nu$
Compute weights $w_i = \pi(x_i)/\nu(x_i)$
Monte Carlo estimates:
$$\mathbb{E}_\pi(f) \approx \frac{1}{\sum_j w_j} \sum_{i=1}^N f(x_i)\,w_i$$
Variance of the weights is an indicator of efficiency; it is small when $\pi \approx \nu$.
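A minimal self-normalised importance sampler implementing the estimator above, with the weight-variance diagnostic expressed as an effective sample size (a standard reformulation); the callables are placeholders:

```python
import numpy as np

def importance_estimate(f, log_pi, log_nu, sample_nu, N, rng=0):
    """Self-normalised importance sampling estimate of E_pi[f].

    Draws x_i ~ nu, forms w_i = pi(x_i)/nu(x_i) in log space, and
    reports ESS = (sum w)^2 / sum w^2: small ESS <=> high weight variance.
    """
    rng = np.random.default_rng(rng)
    x = sample_nu(rng, N)                         # x_i ~ nu
    logw = log_pi(x) - log_nu(x)                  # log w_i, unnormalised
    w = np.exp(logw - logw.max())                 # numerically stabilised weights
    ess = w.sum() ** 2 / (w ** 2).sum()           # effective sample size
    est = np.sum(w * f(x)) / w.sum()              # (sum_i w_i f(x_i)) / sum_j w_j
    return est, ess
```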
61. Ensemble Transport Adaptive Importance Sampling
Proposal distribution in the kth iteration informed by M ensemble members with states $\theta_i$:
$$\chi^{(k)} = \frac{1}{M}\sum_{i=1}^M q\big(\cdot\,; \theta_i^{(k)}, \beta\big)$$
$q(\cdot\,;\cdot,\beta)$ a transition kernel, e.g. Gaussian, MALA proposal, etc., with scaling parameter $\beta$
Resampling step; ensemble transform method
For large M, a greedy approximation is used ("multinomial approximation")
C. Cotter, SLC, P. Russell, "Ensemble transport adaptive importance sampling", SIAM JUQ 2019.
S. Reich, "A non-parametric ensemble transform method for Bayesian inference", SISC 2013.
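A simplified sketch of one such iteration, with plain multinomial resampling standing in for the paper's ensemble transform step; the Gaussian kernel choice and all names below are assumptions:

```python
import numpy as np

def etais_step(theta, log_pi, beta, rng):
    """One simplified ETAIS-style iteration for an ensemble theta of shape (M, d).

    Proposes from the equal-weight mixture (1/M) sum_i N(theta_i, beta^2 I),
    importance-weights against the target pi, and resamples. The paper uses
    an ensemble transform resampling step; multinomial resampling is used
    here only to keep the sketch short.
    """
    M, d = theta.shape
    prop = theta + beta * rng.standard_normal((M, d))   # one draw per mixture component
    # mixture density at each proposal: chi(prop_i) = (1/M) sum_j N(prop_i; theta_j, beta^2 I)
    d2 = ((prop[:, None, :] - theta[None, :, :]) ** 2).sum(-1)
    log_kern = -0.5 * d2 / beta**2 - d * np.log(beta)   # Gaussian kernel, up to constants
    log_chi = np.logaddexp.reduce(log_kern, axis=1) - np.log(M)
    logw = np.array([log_pi(p) for p in prop]) - log_chi
    w = np.exp(logw - logw.max())                       # stabilised importance weights
    idx = rng.choice(M, size=M, p=w / w.sum())          # resample (multinomial)
    return prop[idx], w
```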
65. Ensemble Transport Adaptive Importance Sampling: Prior and Posterior
[Figure: one-dimensional prior and posterior densities.]
66. Ensemble Transport Adaptive Importance Sampling: Current State Xi
[Figure: density of the current ensemble states Xi.]
67. Ensemble Transport Adaptive Importance Sampling: MALA Proposals
[Figure: mixture of MALA proposal densities built from the ensemble.]
74. ETAIS - pros and cons
PROS:
Possible big speed-ups with parallelisation
Well-informed proposals
Reduces variance of importance weights
Adaptive to global differences in scales of parameters
CONS:
Posterior concentrated on lower dimensional manifold:
Stability issues
Slow convergence
Requires large ensemble size (expensive)
Particle transition kernel q needs to “know” about the manifold
84. Transport maps
Find a homeomorphism $T : \mathbb{R}^d \to \mathbb{R}^d$ which maps the target measure $\pi$ to an easily explored reference measure $\pi_r$:
$$\pi(T^{-1}(A)) = \pi_r(A)$$
Simple proposal densities on $\pi_r$ map to complex, informed densities on $\pi$ via $T^{-1}$:
$$v \sim T^{-1}(q(\cdot, u; \beta))$$
A low-dimensional approximation can be computed from a posterior sample.
M. Parno, Y. Marzouk, "Transport Map Accelerated Markov Chain Monte Carlo", SIAM/ASA Journal on Uncertainty Quantification, 2018.
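As a sketch of the idea on the Rosenbrock target of the next slide: the banana can be straightened exactly by a hand-coded triangular map, so random-walk proposals made in the reference space become well-adapted moves on the target. In the paper the map $\tilde{T}$ is built from posterior samples; the exact map, the density parameters and all names below are assumptions for illustration.

```python
import numpy as np

def log_rosenbrock(t, a=1.0, b=5.0):
    """Rosenbrock-type banana density, log pi up to a constant (parameters assumed)."""
    return -(t[0] - a) ** 2 - b * (t[1] - t[0] ** 2) ** 2

# Hand-coded transport map for this target: T straightens the banana.
T = lambda t: np.array([t[0], t[1] - t[0] ** 2])        # target -> reference
Tinv = lambda r: np.array([r[0], r[1] + r[0] ** 2])     # reference -> target

def tmap_rwmh(M, beta=0.8, rng=0):
    """Random-walk Metropolis with proposals made in the reference space.

    This T is triangular with unit Jacobian determinant, so the Jacobian
    factors cancel and the usual ratio pi(prop)/pi(theta) applies.
    """
    rng = np.random.default_rng(rng)
    theta = np.zeros(2)
    chain = np.empty((M, 2))
    for m in range(M):
        r_prop = T(theta) + beta * rng.standard_normal(2)  # simple move on pi_r
        prop = Tinv(r_prop)                                # pull back via T^{-1}
        if np.log(rng.uniform()) < log_rosenbrock(prop) - log_rosenbrock(theta):
            theta = prop
        chain[m] = theta
    return chain
```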
87. Transport map simplification of Rosenbrock
[Figure, two panels: (a) original sample θ from the MH-RW algorithm in the target parameter space; (b) push-forward of θ onto the reference space.]
Figure: The effect of the approximate transport map $\tilde{T}$ on a sample from the Rosenbrock target density.
88. Rosenbrock density
[Figure: relative $L^2$ error against number of samples ($10^2$–$10^6$) for RWMH, TRWMH, ETAIS-RW and ETAIS-TRW.]
89. Constrained approximation: Simple Example (recap)
[Figure repeated from slide 43: CMA approximation of the posterior concentrated around the manifold $k_1(k_2+k_3+k_4)/(k_2 k_4) = C$.]
90. Multiscale stochastic reaction network example
[Figure: relative error against number of samples ($10^2$–$10^6$) for MH-logRW, MH-logTRW, ETAIS-logRW and ETAIS-logTRW, with an $O(1/\sqrt{N})$ reference line.]
Figure: Sampling algorithms with a log preconditioner for $\tilde{T}$.
92. References
SLC, I. Kevrekidis, P. Russell, "Transport map accelerated adaptive importance sampling, and application to inverse problems arising from multiscale stochastic reaction networks", arXiv preprint arXiv:1901.11269, 2019; submitted to SIAM/ASA JUQ.
M. Parno, Y. Marzouk, "Transport Map Accelerated Markov Chain Monte Carlo", SIAM/ASA Journal on Uncertainty Quantification, 2018.
C. Cotter, SLC, P. Russell, "Ensemble transport adaptive importance sampling", SIAM/ASA Journal on Uncertainty Quantification, 2019.
S. Reich, "A non-parametric ensemble transform method for Bayesian inference", SISC, 2013.
SLC, "Constrained approximation of effective generators for multiscale stochastic reaction networks and application to conditioned path sampling", Journal of Computational Physics, 2016.
94. Conclusions
Multiple possible reasons for extortionate costs in Bayesian inference:
High cost of likelihood evaluations due to numerical approximation of PDEs
Surrogates can sample efficiently from an approximation of the posterior
Does $\pi_{\mathrm{approx}}(\theta \mid d)$ converge to $\pi(\theta \mid d)$? In what sense?
How is the error $\|\pi(\theta \mid d) - \pi_{\mathrm{approx}}(\theta \mid d)\|$ affected by the error between the solution to the forward model and the chosen surrogate, $\|T(\theta) - T_{\mathrm{approx}}(\theta)\|$?
Large number of MCMC samples required due to complex posterior structure:
Many posteriors are concentrated on lower-dimensional manifolds
Optimal transport maps can simplify target distributions
Ensemble adaptive importance sampling schemes can be stabilised and accelerated for lower ensemble sizes