AcceleratingBayesianInference

Accelerating high-dimensional posterior
exploration using a two-stage MCMC
algorithm
Luca Carniato

Outline
• Bayesian inference
• Markov Chain Monte Carlo (MCMC) algorithms
• Model approximations: polynomial chaos expansion (PCE)
• 2-stage MCMC algorithm
• Testing the 2-stage MCMC algorithm:
1. Analytic function
2. Conservative transport
3. Reactive transport
• Conclusions

Bayesian inference
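Bayes' theorem (the basis of all inverse problems):
$$p(\mathbf{z} \mid \mathbf{d}) = \frac{p(\mathbf{d} \mid \mathbf{z})\, p(\mathbf{z})}{p(\mathbf{d})} \propto L(\mathbf{z} \mid \mathbf{d})\, p(\mathbf{z})$$
• $\mathbf{d}$ are the observations and $\mathbf{z}$ the parameters of the model $F(\mathbf{z})$
• $L(\mathbf{z} \mid \mathbf{d})$ is the likelihood function (likelihood of $\mathbf{z}$ given $\mathbf{d}$)
• $p(\mathbf{z})$ is the prior density of the model parameters
• $p(\mathbf{z} \mid \mathbf{d})$ is the posterior density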
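The aim of Bayesian inference is to estimate the multidimensional posterior distribution $p(\mathbf{z} \mid \mathbf{d})$.
The residuals are expressed as:
$$\boldsymbol{\varepsilon}_{\mathbf{z}} = \mathbf{d} - F(\mathbf{z})$$
If the residuals are multivariate Gaussian, $\boldsymbol{\varepsilon}_{\mathbf{z}} \sim N(\mathbf{0}, \boldsymbol{\Sigma}_e)$, the likelihood is also Gaussian and, for $m$ observations, $p(\mathbf{z} \mid \mathbf{d})$ can be expressed up to a constant as:
$$p(\mathbf{z} \mid \mathbf{d}) \propto (2\pi)^{-m/2} \det(\boldsymbol{\Sigma}_e)^{-1/2} \exp\left(-\tfrac{1}{2}\, \boldsymbol{\varepsilon}_{\mathbf{z}}^{T} \boldsymbol{\Sigma}_e^{-1} \boldsymbol{\varepsilon}_{\mathbf{z}}\right) p(\mathbf{z})$$
$p(\mathbf{d})$ is a normalization constant and can be disregarded.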
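The Gaussian posterior above is usually evaluated in log form to avoid numerical underflow. The following is a minimal sketch of that evaluation, not the code used for the slides; the function name `gaussian_log_posterior` and its arguments (`forward_model`, `cov_e`, `log_prior`) are hypothetical names introduced for illustration.

```python
import numpy as np

def gaussian_log_posterior(z, d, forward_model, cov_e, log_prior):
    """Unnormalized log posterior for Gaussian residuals (p(d) is dropped)."""
    resid = d - forward_model(z)                      # residuals: d - F(z)
    m = resid.size                                    # number of observations
    _, logdet = np.linalg.slogdet(cov_e)              # log det(Sigma_e), stable
    quad = resid @ np.linalg.solve(cov_e, resid)      # resid^T Sigma_e^{-1} resid
    log_like = -0.5 * (m * np.log(2.0 * np.pi) + logdet + quad)
    return log_like + log_prior(z)
```

Working with `slogdet` and `solve` avoids forming the determinant and the inverse of the covariance matrix explicitly, which matters when there are hundreds of observations.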

MCMC algorithms
The posterior $p(\mathbf{z} \mid \mathbf{d})$ is not available in closed form in most cases. It can be characterized by drawing a large number of random samples:
• Monte Carlo methods: not efficient in high-dimensional spaces.
• Markov Chain Monte Carlo (MCMC): more efficient. It constructs a Markov chain that has the posterior $p(\mathbf{z} \mid \mathbf{d})$ as its stationary distribution.
Basic steps of MCMC methods:
1) Draw a proposal $\mathbf{z}'$, given the current state $\mathbf{z}$, from the proposal distribution $q(\mathbf{z}' \mid \mathbf{z})$.
2) Calculate the value of the posterior $p(\mathbf{z}' \mid \mathbf{d})$ (a run of the model $F(\mathbf{z}')$ is required).
3) Calculate the Metropolis-Hastings (MH) acceptance probability (assuming a symmetric proposal distribution):
$$\alpha(\mathbf{z}, \mathbf{z}') = \min\left\{1, \frac{p(\mathbf{z}' \mid \mathbf{d})}{p(\mathbf{z} \mid \mathbf{d})}\right\}$$
4) Draw a random number $u$ from a uniform distribution $U(0,1)$. If $u < \alpha(\mathbf{z}, \mathbf{z}')$, $\mathbf{z}'$ is accepted and becomes the current state of the chain; otherwise the chain stays at $\mathbf{z}$.
Steps 1 to 4 are repeated until the chain converges to its stationary distribution. Chain convergence can be monitored using the Gelman and Rubin (1992) diagnostic.
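Steps 1 to 4 can be sketched as a random-walk Metropolis sampler. This is an illustrative sketch under stated assumptions, not the DREAM implementation used later in the slides: the proposal is assumed to be an isotropic Gaussian with a hypothetical step size `step`, and `log_post` is assumed to return the unnormalized log posterior.

```python
import numpy as np

def metropolis_hastings(log_post, z0, n_iter, step, rng):
    """Random-walk Metropolis sampler with a symmetric Gaussian proposal."""
    z = np.atleast_1d(np.asarray(z0, dtype=float))
    lp = log_post(z)
    chain = np.empty((n_iter, z.size))
    n_accept = 0
    for i in range(n_iter):
        z_new = z + step * rng.standard_normal(z.size)   # step 1: proposal
        lp_new = log_post(z_new)                         # step 2: one model run
        # Steps 3-4: accept when u < alpha, compared in log space.
        if np.log(rng.uniform()) < lp_new - lp:
            z, lp = z_new, lp_new
            n_accept += 1
        chain[i] = z                                     # rejected: repeat z
    return chain, n_accept / n_iter
```

Every iteration costs one evaluation of `log_post`, and therefore one run of the forward model, which is the cost the rest of the presentation tries to reduce.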

MCMC algorithms
Advantages:
• Precise quantification of parameter uncertainty, parameter correlation and prediction uncertainty, even for nonlinear problems.
• Precise assessment of the Maximum a Posteriori value (MAP, i.e. the value that produces the best fit).
• Prior knowledge of the model parameters $p(\mathbf{z})$ can be included as a density function.
Disadvantages:
• It normally requires thousands of model evaluations for each model parameter before the target distribution is reached.
• The number of samples scales exponentially with the dimension of the parameter space (the "curse of dimensionality").
Solution:
• Substitute the full model $F(\mathbf{z})$ with an approximation that runs very fast.

Model approximation: polynomial chaos expansion (PCE)
• Approximate the full model $F(\mathbf{z})$ with a polynomial approximation of degree $P$:
$$F_P(\mathbf{z}) = \sum_{j=0}^{U-1} a_j \Psi_j(\mathbf{z})$$
• $\Psi_j(\mathbf{z})$ are orthogonal polynomials of degree not exceeding $P$, obtained as products of one-dimensional polynomials: $\Psi_j(\mathbf{z}) = \psi_{j,1}(z_1) \times \cdots \times \psi_{j,n}(z_n)$.
• The type of one-dimensional polynomial is associated with the prior distribution of the single parameter $z_i$:
• Hermite polynomials for Gaussian distributions
• Legendre polynomials for uniform distributions
• The number of expansion coefficients $a_j$ is $U = \frac{(P+n)!}{P!\,n!}$ (e.g. $P = 2$, $n = 59$ gives $U = 1830$).
• Example of a 2-dimensional PCE of degree 2 using Hermite polynomials:
$$F_P(\mathbf{z}) = a_0 + a_1 z_1 + a_2 z_2 + a_3 (z_1^2 - 1) + a_4 (z_2^2 - 1) + a_5 z_1 z_2$$
Each basis function is a product of one-dimensional polynomials, e.g. $z_1 z_2$.
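The growth of the number of expansion coefficients with dimension and degree can be checked directly from the binomial formula. A small sketch follows; the helper name `pce_num_terms` is hypothetical, and the parameter counts (59 and 68) and degrees are the ones quoted for the test cases in these slides.

```python
from math import comb

def pce_num_terms(n_params, degree):
    """Number of PCE coefficients U = (P + n)! / (P! n!) for total degree P."""
    return comb(n_params + degree, degree)

print(pce_num_terms(59, 1))   # 60
print(pce_num_terms(59, 2))   # 1830
print(pce_num_terms(68, 2))   # 2415
print(pce_num_terms(68, 3))   # 57155
```

Since the least-squares fit described later uses $2 \times U$ full model runs, moving from degree 2 to degree 3 in the 68-parameter case would raise the training cost from 4830 to 114310 runs.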

Model approximations: polynomial chaos expansion (PCE)
• The problem reduces to calculating the expansion coefficients $a_j$.
• Here the least-squares approach is used:
• $2 \times U$ samples of $\mathbf{z}$ are drawn from the prior distribution using a Latin Hypercube Sampling experimental design.
• The full model responses $F(\mathbf{z}_i)$ are calculated for each sample.
• An overdetermined linear system is solved in the least-squares sense:
$$\begin{bmatrix} \Psi_0(\mathbf{z}_1) & \cdots & \Psi_{U-1}(\mathbf{z}_1) \\ \vdots & \ddots & \vdots \\ \Psi_0(\mathbf{z}_{2U}) & \cdots & \Psi_{U-1}(\mathbf{z}_{2U}) \end{bmatrix} \begin{bmatrix} a_0 \\ \vdots \\ a_{U-1} \end{bmatrix} = \begin{bmatrix} F(\mathbf{z}_1) \\ \vdots \\ F(\mathbf{z}_{2U}) \end{bmatrix}$$
Limitations:
• The number of orthogonal polynomials grows prohibitively fast with the parameter dimensionality and the degree $P$.
• Non-smooth problems are difficult to approximate.
• $F_P(\mathbf{z})$ contains approximation errors that can lead to incorrect estimation of the posterior $p(\mathbf{z} \mid \mathbf{d})$.
Solution:
• Account for the approximation errors when selecting the valid proposals $\mathbf{z}'$ (2-stage MCMC).
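The least-squares fit of the expansion coefficients can be sketched for the 2-dimensional, degree-2 Hermite basis shown earlier. This is a sketch under stated assumptions rather than the UQTk workflow used for the slides: the training points are plain random draws from the standard normal prior instead of a Latin Hypercube design, and `toy_model` is a hypothetical stand-in for the full model, chosen so that the basis represents it exactly.

```python
import numpy as np

def hermite_basis_2d(z):
    """Degree-2 Hermite basis in two dimensions, one row per sample."""
    z1, z2 = z[:, 0], z[:, 1]
    return np.column_stack([
        np.ones_like(z1), z1, z2, z1**2 - 1.0, z2**2 - 1.0, z1 * z2,
    ])

def fit_pce(z_samples, responses):
    """Solve the overdetermined system Psi a = F(z) in the least-squares sense."""
    psi = hermite_basis_2d(z_samples)
    coeffs, *_ = np.linalg.lstsq(psi, responses, rcond=None)
    return coeffs

def toy_model(z):
    """Hypothetical full model (not from the slides)."""
    return 1.0 + 2.0 * z[:, 0] - z[:, 1] + 0.5 * z[:, 0] * z[:, 1]

rng = np.random.default_rng(0)
U = 6                                         # number of basis functions
z_train = rng.standard_normal((2 * U, 2))     # 2U samples (random, not LHS)
a = fit_pce(z_train, toy_model(z_train))      # coefficients a_0 ... a_5
```

Because `toy_model` lies in the span of the basis, the fitted coefficients reproduce it up to round-off; a real forward model would leave a residual, which is the approximation error the 2-stage algorithm accounts for.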

2-stage MCMC algorithm
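Basic steps of the 2-stage MCMC algorithm (Cui et al., 2011):
1) Draw a proposal $\mathbf{z}'$, given the current state of the chain $\mathbf{z}$, from the proposal distribution $q(\mathbf{z}' \mid \mathbf{z})$.
2) Estimate the model reduction error $\mathbf{r}_n = F(\mathbf{z}) - F_P(\mathbf{z})$ and its covariance matrix $\boldsymbol{\Sigma}_{r,n} = \frac{1}{n}\left[(n-1)\,\boldsymbol{\Sigma}_{r,n-1} + \mathrm{cov}(\mathbf{r}_n)\right]$.
3) Correct the model approximation with the model reduction error: $F_{P,n}(\mathbf{z}') = F_P(\mathbf{z}') + \mathbf{r}_n$. Compute the residuals with the corrected model: $\boldsymbol{\varepsilon}_{n,\mathbf{z}'} = \mathbf{d} - F_{P,n}(\mathbf{z}')$.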
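4) Calculate the values of the approximate posterior $p_n(\mathbf{z}' \mid \mathbf{d})$ and $p_n(\mathbf{z} \mid \mathbf{d})$, where $\boldsymbol{\varepsilon}_{n,\mathbf{z}} = \mathbf{d} - F_{P,n}(\mathbf{z})$ is the corrected residual at the current state:
$$p_n(\mathbf{z}' \mid \mathbf{d}) \propto (2\pi)^{-m/2} \det(\boldsymbol{\Sigma}_e + \boldsymbol{\Sigma}_{r,n})^{-1/2} \exp\left(-\tfrac{1}{2}\, \boldsymbol{\varepsilon}_{n,\mathbf{z}'}^{T} (\boldsymbol{\Sigma}_e + \boldsymbol{\Sigma}_{r,n})^{-1} \boldsymbol{\varepsilon}_{n,\mathbf{z}'}\right) p(\mathbf{z}')$$
$$p_n(\mathbf{z} \mid \mathbf{d}) \propto (2\pi)^{-m/2} \det(\boldsymbol{\Sigma}_e + \boldsymbol{\Sigma}_{r,n})^{-1/2} \exp\left(-\tfrac{1}{2}\, \boldsymbol{\varepsilon}_{n,\mathbf{z}}^{T} (\boldsymbol{\Sigma}_e + \boldsymbol{\Sigma}_{r,n})^{-1} \boldsymbol{\varepsilon}_{n,\mathbf{z}}\right) p(\mathbf{z})$$
5) Calculate the forward and reverse MH acceptance probabilities:
$$\alpha(\mathbf{z}, \mathbf{z}') = \min\left\{1, \frac{p_n(\mathbf{z}' \mid \mathbf{d})}{p_n(\mathbf{z} \mid \mathbf{d})}\right\}$$
$$\alpha(\mathbf{z}', \mathbf{z}) = \min\left\{1, \frac{p_n(\mathbf{z} \mid \mathbf{d})}{p_n(\mathbf{z}' \mid \mathbf{d})}\right\}$$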

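6) With probability $\alpha(\mathbf{z}, \mathbf{z}')$, accept the proposal $\mathbf{z}'$ for use in the second stage of the algorithm.
7) If the proposal $\mathbf{z}'$ is accepted, run the full model $F(\mathbf{z}')$ and compute the value of the posterior $p(\mathbf{z}' \mid \mathbf{d})$. Otherwise the chain stays at $\mathbf{z}$; go back to step 1.
8) Calculate the MH acceptance probability of the second stage:
$$\beta(\mathbf{z}, \mathbf{z}') = \min\left\{1, \frac{p(\mathbf{z}' \mid \mathbf{d})\, \alpha(\mathbf{z}', \mathbf{z})}{p(\mathbf{z} \mid \mathbf{d})\, \alpha(\mathbf{z}, \mathbf{z}')}\right\}$$
9) With probability $\beta(\mathbf{z}, \mathbf{z}')$, accept $\mathbf{z}'$ as the new state of the chain; otherwise the chain stays at $\mathbf{z}$.
Advantages:
• The reduced model $F_P(\mathbf{z})$ is used in the first stage to filter out unacceptable proposals.
• It accounts for the errors in the approximation $F_P(\mathbf{z})$.
Disadvantages:
• It still requires thousands of evaluations of the full model $F(\mathbf{z}')$.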
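The two acceptance stages can be sketched as follows. This is a simplified illustration, not the implementation of Cui et al. (2011) or of the modified DREAM code: the adaptive error-model update of steps 2 and 3 is omitted, so `log_post_reduced` is assumed to already return the (corrected) approximate log posterior, and the proposal is assumed to be a symmetric Gaussian random walk with a hypothetical step size `step`.

```python
import numpy as np

def two_stage_mh(log_post_full, log_post_reduced, z0, n_iter, step, rng):
    """Two-stage Metropolis-Hastings with a symmetric Gaussian proposal."""
    z = np.atleast_1d(np.asarray(z0, dtype=float))
    lp_full, lp_red = log_post_full(z), log_post_reduced(z)
    chain = np.empty((n_iter, z.size))
    n_full_runs = 1                                   # initial state
    for i in range(n_iter):
        z_new = z + step * rng.standard_normal(z.size)
        lp_red_new = log_post_reduced(z_new)          # cheap evaluation
        # Stage 1: forward and reverse acceptance from the reduced posterior.
        log_alpha_fwd = min(0.0, lp_red_new - lp_red)     # log alpha(z, z')
        log_alpha_rev = min(0.0, lp_red - lp_red_new)     # log alpha(z', z)
        if np.log(rng.uniform()) < log_alpha_fwd:
            # Stage 2: the full model runs only for proposals that passed.
            lp_full_new = log_post_full(z_new)
            n_full_runs += 1
            log_beta = min(0.0, (lp_full_new - lp_full)
                           + (log_alpha_rev - log_alpha_fwd))
            if np.log(rng.uniform()) < log_beta:
                z, lp_full, lp_red = z_new, lp_full_new, lp_red_new
        chain[i] = z                                  # rejected: repeat z
    return chain, n_full_runs
```

The second-stage ratio corrects for the screening done in the first stage, so the chain still targets the full posterior even when the reduced model is biased; a poor reduced model only lowers the acceptance rate.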

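Flowchart of the 2-stage MCMC algorithm:
1) Draw $\mathbf{z}'$ from the proposal $q(\mathbf{z}' \mid \mathbf{z})$.
2) Reduced model (fast, 0.01 sec, 80% rejection): compute $F_P(\mathbf{z}')$, $\mathbf{r}_n$, $\boldsymbol{\Sigma}_{r,n}$, the reduced posteriors $p_n(\mathbf{z}' \mid \mathbf{d})$ and $p_n(\mathbf{z} \mid \mathbf{d})$, and the MH ratios $\alpha(\mathbf{z}, \mathbf{z}')$ and $\alpha(\mathbf{z}', \mathbf{z})$. If $\alpha(\mathbf{z}, \mathbf{z}') > u$, continue with the full model; otherwise $\mathbf{z}'$ is rejected.
3) Full model (slow, 1-10 sec, 20% rejection): compute $F(\mathbf{z}')$, the full posterior $p(\mathbf{z}' \mid \mathbf{d})$, and the MH ratio $\beta(\mathbf{z}, \mathbf{z}')$. If $\beta(\mathbf{z}, \mathbf{z}') > u$, $\mathbf{z}'$ is accepted; otherwise $\mathbf{z}'$ is rejected.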

Test cases
• Three cases were tested:
1. Analytic case: one-dimensional problem with a quadratic $F(\mathbf{z})$.
2. Conservative transport: 59-dimensional problem.
3. Reactive transport: 68-dimensional problem.
• The 2-stage MCMC algorithm has been included in DREAM (Differential Evolution Adaptive Metropolis, Vrugt et al. 2009):
• Runs multiple chains simultaneously (at least 3; 5 in the test cases).
• Jumps in each chain are generated from the difference of two randomly chosen chains $c_1$ and $c_2$:
$$\mathbf{z}' = \mathbf{z} + \gamma\,(\mathbf{z}_{c_1} - \mathbf{z}_{c_2}) + \mathbf{e}, \qquad \mathbf{e} \sim N(0, \sigma)$$
• Subspace sampling (improved efficiency due to additional freedom of moves in the parameter space).
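The differential-evolution jump can be sketched in a few lines. This is only an illustration of the proposal formula above, not the DREAM code: subspace sampling and the adaptive choice of $\gamma$ are left out, `de_proposal` is a hypothetical name, and `sigma` is assumed here to be the standard deviation of the small Gaussian perturbation $\mathbf{e}$.

```python
import numpy as np

def de_proposal(chains, i, gamma, sigma, rng):
    """Propose a jump for chain i from the difference of two other chains."""
    n_chains, n_dim = chains.shape
    others = [c for c in range(n_chains) if c != i]
    c1, c2 = rng.choice(others, size=2, replace=False)   # two distinct chains
    noise = sigma * rng.standard_normal(n_dim)           # perturbation e
    return chains[i] + gamma * (chains[c1] - chains[c2]) + noise

rng = np.random.default_rng(0)
chains = rng.standard_normal((5, 3))      # 5 chains, 3 parameters (toy values)
z_new = de_proposal(chains, i=0, gamma=0.7, sigma=1e-6, rng=rng)
```

Because the jump is built from the current spread of the chains, its scale and orientation adapt to the posterior without a hand-tuned proposal covariance, which is why at least three chains are needed.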

1. Analytic case
• Full model $F(z)$:
• 10 model responses represented by 10 quadratic functions ($\mathbf{t} = 0.1, 0.2, \ldots, 1$; $b = 10$):
$$F(z) = \mathbf{t} + \mathbf{t} z + b\,\mathbf{t} z^2$$
• True $z$ value equal to 10.
• Measurements $\mathbf{d}$ generated by adding Gaussian noise with variance $\sigma_e = 100$ to $F(z)$.
• Prior $p(z) \sim U(5, 15)$
• Reduced model $F_P(z)$:
• PCE of degree one, Legendre polynomial ($F_P(z) = a_0 + a_1 z$)
• 4 samples and full model runs used to estimate the expansion coefficients $a_0$ and $a_1$.
• The reduced model contains an approximation error because it tries to approximate a quadratic function with a linear function.
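The analytic test case is small enough to reproduce in a few lines. The sketch below follows the description above but is not the original script: the random seed, the simple one-dimensional stratified sampling used in place of the Latin Hypercube design, and all variable names are assumptions made for illustration.

```python
import numpy as np

t = np.linspace(0.1, 1.0, 10)      # t = 0.1, 0.2, ..., 1
b = 10.0

def full_model(z):
    """Ten quadratic responses F(z) = t + t z + b t z^2."""
    return t + t * z + b * t * z**2

rng = np.random.default_rng(0)
# Four stratified draws from the prior U(5, 15), one per quarter of the range.
z_train = 5.0 + 10.0 * (np.arange(4) + rng.random(4)) / 4.0
design = np.column_stack([np.ones(4), z_train])            # basis [1, z]
responses = np.array([full_model(z) for z in z_train])     # shape (4, 10)
coeffs, *_ = np.linalg.lstsq(design, responses, rcond=None)  # rows a_0, a_1

def reduced_model(z):
    """Degree-one surrogate F_P(z) = a_0 + a_1 z for each response."""
    return coeffs[0] + coeffs[1] * z

z_true = 10.0
# Synthetic measurements: Gaussian noise with variance 100 (std 10).
d = full_model(z_true) + rng.normal(0.0, np.sqrt(100.0), size=t.size)
approx_error = full_model(z_true) - reduced_model(z_true)
```

Comparing `approx_error` with the noise standard deviation shows how much of the misfit comes from the linear surrogate rather than from the measurements.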

1. Analytic case: results
• MAP: 9.71 (9.42 - 9.52); log likelihood: -154.49 (-35.68); full model runs: 4
• MAP: 10 (9.99 - 10.09); full model runs: 420
• MAP: 10 (10 - 10.07); full model runs: 212; reduced model runs: 133

2. Conservative transport
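$$\frac{\partial}{\partial x_i}\left(K_i \frac{\partial h}{\partial x_i}\right) + W = 0, \qquad q_i = -K_i \frac{\partial h}{\partial x_i} \qquad \text{(flow)}$$
$$\frac{\partial C_k}{\partial t} = \frac{\partial}{\partial x_i}\left(D_{ij} \frac{\partial C_k}{\partial x_j}\right) - \frac{\partial}{\partial x_i}\left(\frac{q_i}{\phi} C_k\right) + \frac{W}{\phi} C_{wk} \qquad \text{(transport)}$$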
• True $\mathbf{z}$: 59 randomly generated Karhunen–Loève modes of the horizontal hydraulic conductivity (average permeability 10 m d⁻¹).
• Measurements $\mathbf{d}$ generated by adding Gaussian noise with variance $\sigma_e = 0.01$ to 480 log-transformed model responses.
• Priors of the modes $p(\mathbf{z}) \sim N(0, 1)$
• Reduced model $F_P(\mathbf{z})$: PCE of degree 1 (60 expansion coefficients).
Figure labels: source; piezometers (20 measurements at each piezometer).

2. Conservative transport: results
• The approximation error is comparable to the measurement noise.
• Linear polynomials approximate the full model well.

• The 2-stage MCMC test lasted 6 hours; the MCMC test with the full model lasted 24 hours.
• The 2-stage MCMC did not fully converge (after 1100000 iterations); the MCMC test with the full model converged after 85000 full model runs.
• The number of iterations is larger in the 2-stage MCMC test.
• About 87% of the proposals were rejected in the first stage.
• About 25% of the proposals were rejected in the second stage.

• The 2-stage MCMC approaches the same likelihood value as the standard MCMC with the full model.
• Fewer than half as many full model runs are required in the 2-stage MCMC algorithm.
• Different permeability fields give similar likelihood values (the Karhunen–Loève modes are correlated).

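3. Reactive transport
$$\frac{\partial}{\partial x_i}\left(K_i \frac{\partial h}{\partial x_i}\right) + W = 0, \qquad q_i = -K_i \frac{\partial h}{\partial x_i} \qquad \text{(flow)}$$
$$\frac{\partial C_k}{\partial t} = \frac{\partial}{\partial x_i}\left(D_{ij} \frac{\partial C_k}{\partial x_j}\right) - \frac{\partial}{\partial x_i}\left(\frac{q_i}{\phi} C_k\right) + \frac{W}{\phi} C_{wk} - k_k C_k + \sum_{p \neq k}^{i} \alpha_k k_p C_p \qquad \text{(transport)}$$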
• Six reacting species (𝑘 = 6).
• True $\mathbf{z}$: 59 Karhunen–Loève modes of the horizontal hydraulic conductivity, 9 first-order degradation rates $k$ and 3 branching coefficients $\alpha_k$.
• Measurements $\mathbf{d}$ generated by adding Gaussian noise with variance $\sigma_e = 0.01$ to 2880 log-transformed model responses.
• Priors of the 59 modes $p(\mathbf{z}) \sim N(0, 1)$; priors of the degradation rates and branching coefficients $p(\mathbf{z}) \sim U$
• Reduced model $F_P(\mathbf{z})$: PCE of degree 2 (2415 expansion coefficients).

3. Reactive transport: reaction sequence
• $k$ are the degradation coefficients, $\alpha$ the branching ratios.
• The degradation sequence is simulated up to vinyl chloride (VC).
• VC is the most toxic by-product (legal limit in groundwater 5 μg L⁻¹).

• The approximation error is larger than the measurement noise.
• A higher expansion degree might be required, but it is unfeasible (degree three would require 57155 expansion coefficients).
Large error at e-70

• Some chains did not converge in either test within the maximum number of full model runs (65000) and the maximum number of iterations (69000).
• It was not possible to run the algorithms until full convergence.
• At the end of the 2-stage MCMC test (690000 iterations) the chains are closer to convergence.

• None of the tests approaches the true log-likelihood value.
• The MCMC test with the full model provides the best result.
• The approximation error of the reduced model is quite significant. MCMC with the reduced model provides the worst result.
• The 2-stage MCMC algorithm accounts for the approximation error and provides better results than the MCMC test with the reduced model only.

Conclusions
• The 2-stage MCMC algorithm accelerates the posterior exploration by a factor of 2 in the conservative transport case.
• The 2-stage MCMC algorithm provides better estimates of the MAP than using only the reduced model (it accounts for the model reduction error).
• Unfortunately the 2-stage algorithm still requires thousands of model runs for each parameter and a long waiting time (gradient-based optimization still wins in high-dimensional problems).
• A possible strategy to build higher-degree PCE approximations is to use a sparse polynomial basis (e.g. dimension-adaptive PCE as implemented in UQLab, ETH Zurich).

1. Software used
• UQTk (UQ Toolkit, MATLAB/C++), Sandia National Laboratories: polynomial chaos expansion.
• DREAM (MATLAB): Markov Chain Monte Carlo acceleration by Differential Evolution (an Octave version able to exploit parallel computing is available on request from the main author, Jasper Vrugt).
• MODFLOW, USGS (Fortran): simulates groundwater flow in porous media using finite differences.
• RT3D (Fortran), Pacific Northwest National Laboratory: simulates groundwater reactive transport using finite differences, with the solution obtained by operator splitting. Reaction networks can be coded in separate Fortran modules.
• Groundwater data utilities (part of the PEST software, Fortran): data extraction from MODFLOW and RT3D output.
• Cygwin: gfortran compiler.

Questions?
