This is joint work with colleagues from TU Braunschweig. Prof. H. G. Matthies had the idea to develop a Bayesian surrogate formula that updates not probability densities (as in the classical Bayesian formula) but the PCE coefficients of the given random variable. Bojana Rosić implemented the linear case. I implemented the non-linear case with help from Elmar Zander, who later significantly simplified the algorithm.
Linear Bayesian update surrogate for updating PCE coefficients
1. Bayesian Update in low-rank tensor format
A. Litvinenko, B. V. Rosić, E. Zander, O. Pajonk, H. G. Matthies,
Institute for Scientific Computing, TU Braunschweig, Germany
July 13, 2011
Bayesian Update in low-rank tensor format — July 13, 2011 1/40
2. Outline
1 Introduction
2 Direct General Bayesian Approach
3 Discretisation
4 Numerical Examples
5 Conclusion
3. Introduction
Inverse problem: find the parameter q given measurement data z.
[Diagram: forward map u = S(q, f) with observation y = Y(q, u); the inverse problem goes from the data z back to q.]
Ill-posed problem: issues of existence, uniqueness and stability.
4. Bayesian Regularization
- Additional information beyond the data z: q_f (a priori information, forecast)
What is q_f?
- classical Bayesian approach: q_f := π_f, the a priori pdf
π_a(q|z) = const · π_f(q) π(z|q) = const · π_f(q) L(q)
  Markov chain Monte Carlo methods (MCMC) [Gamerman 2006]
  spectral stochastic FEM + MCMC [Kučerová et al. 2010, Marzouk 2009]
  collocation methods [Christen & Fox 2005]
- drawback: requires a complete statistical description of the problem
5. Direct General Bayesian Approach
- Probability space (Ω, B, P)
- the space of RVs with finite variance S := L²(Ω) (stochastic space)
- the Hilbert space Q (deterministic space)
- Q-valued RVs form the tensor space Q := Q ⊗ S
True measurement
- the linear measurement y̌ = Y(q, u) ∈ Y is polluted by noise ε:
z = y̌ + ε,  ε ∼ N(0, C_ε)  ⇒  z ∈ Y₀ ⊆ Y := Y ⊗ S
A priori information
q_f : Ω → Q,  q_f ∈ Q_f ⊂ Q
6. Direct General Bayesian Approach
- already defined: z ∈ Y₀, q_f ∈ Q_f
- given the linear mapping H : Q → Y, predict the observation
y = H q_f,  with Q₀ := H*(Y₀)
Theorem
In the setting just described, the random variable q_a ∈ Q — "a" stands for "assimilated" or "analysis" — is the orthogonal (minimum-variance) projection of q onto the subspace Q_f + Q₀:
q_a(ω) = q_f(ω) + K(z(ω) − y(ω)),  K := C_{q_f y} (C_y + C_ε)^{−1},
with q_f the orthogonal projection onto Q_f and K the "Kalman gain" operator [Luenberger 1969, Rosić et al. 2011, Pajonk et al. 2011].
- does not assume Gaussian statistics; in the linear Gaussian case it reduces to the Kalman filter [Evensen 2009]
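As a sanity check of the update formula in the theorem, here is a minimal scalar Gaussian sketch (all numbers are illustrative choices, not values from the talk): for a direct observation (H = I), the sample-based update q_a = q_f + K(z − y) reproduces the conjugate Gaussian posterior.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

q_true = 2.5                         # hypothetical "virtual truth"
sigma_f, sigma_eps = 0.5, 0.1        # prior and noise std (illustrative)

q_f = rng.normal(2.0, sigma_f, n)    # prior / forecast samples q_f(omega)
eps = rng.normal(0.0, sigma_eps, n)  # noise samples eps(omega)
z = q_true + eps                     # perturbed measurement z(omega)
y = q_f                              # predicted measurement, H = I

C_qy = np.cov(q_f, y)[0, 1]          # Cov(q_f, y)
C_y = np.var(y)                      # Cov(y, y)
K = C_qy / (C_y + sigma_eps**2)      # scalar Kalman gain

q_a = q_f + K * (z - y)              # the RV-level update q_a = q_f + K(z - y)

# mean/variance of q_a match the conjugate Gaussian posterior:
# mean ≈ 2 + K * 0.5 ≈ 2.48, variance ≈ sigma_f^2 sigma_eps^2 / (sigma_f^2 + sigma_eps^2)
```

Note that the whole ensemble of samples is updated at once, not only its mean and variance.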
8. Example
- Darcy law:
−div(κ(x, ω) ∇u(x, ω)) = f(x, ω) in G,
u(x, ω) = 0 on ∂G.
- The conductivity is for simplicity assumed to be a scalar field with a priori distribution (via the maximum entropy principle)
κ_f(x) := exp(q_f(x)),  q_f(x) ∼ N(μ_{q_f}, σ²_{q_f})
- Covariance function:
Cov_{q_f}(x, y) = σ²_{q_f} exp(−|x − y|/l_c)
- The following conditions hold:
κ_f(x, ω) > 0,  ‖κ_f‖_{L∞(G×Ω)} < ∞,  ‖1/κ_f‖_{L∞(G×Ω)} < ∞.
9. Variational Formulation
- The solution space:
U := U ⊗ S,  U := H¹₀(G) = {u ∈ H¹(G) | u = 0 on ∂G}
- Equilibrium equation:
a(v, u) := E(a(ω)(v(·, ω), u(·, ω))) = E(⟨ℓ(ω), v(·, ω)⟩) =: ⟨⟨ℓ, v⟩⟩,
a(ω)(v, u) := ∫_G ∇v(x) · (κ_f(x, ω) ∇u(x)) dx,
⟨ℓ(ω), v⟩ := ∫_G v(x) f(x, ω) dx,  ∀v ∈ U.
- Well-posedness follows via the Lax–Milgram theorem.
10. Discretisation
- Finite element discretisation: u(x, ω) = Σ_{n=1}^N u_n(ω) φ_n(x),
A(ω)[u(ω)] = f(ω),
(A(ω))_{m,n} := a(ω)(φ_m, φ_n) with the bilinear form a(ω),
(f(ω))_m := ⟨ℓ(ω), φ_m⟩,
u(ω) = [u_1(ω), ..., u_N(ω)]^T.
11. PCE and KLE
- Wiener's polynomial chaos expansion:
u_n(θ) = Σ_{α∈J} u_n^α H_α(θ(ω)),  α = (α_1, ..., α_j, ...) ∈ N₀^(N),  (1)
- Galerkin conditions:
∀β ∈ J :  E([f(θ) − A(θ)u(θ)] H_β(θ)) = 0,  (2)
with f^β := E(f(θ)H_β(θ)) and A_{β,α} := E(H_β(θ)A(θ)H_α(θ)):
∀β ∈ J :  Σ_{α∈J} A_{β,α} u^α = f^β,  (3)
which is a linear, symmetric and positive definite system of N · R equations.
12. - The Karhunen–Loève expansion (KLE) of the stiffness and the rhs yields
A u := (Σ_{j=0}^∞ A_j ⊗ Δ_j)(Σ_{α∈J} u^α ⊗ e_α) = Σ_{α∈J} f^α ⊗ e_α =: f,
where (Δ_j)_{α,β} = E(H_α ξ_j H_β),  κ_f = Σ_{j=1}^M κ_f^j ξ_j  and |J| = R.
- The sparse tensor Galerkin methods [Zander, Matthies 2010]
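The block structure A = Σ_j A_j ⊗ Δ_j can be made concrete with a small numpy sketch (sizes and matrices are hypothetical toy data; a real solver would exploit the tensor structure instead of forming the Kronecker products explicitly):

```python
import numpy as np

rng = np.random.default_rng(1)
N, R, M = 5, 4, 2                  # hypothetical: spatial dofs, PCE terms, KLE modes

# deterministic KLE stiffness contributions A_j (A_0 dominant and SPD)
A = [2.0 * np.eye(N)]
for _ in range(M):
    S = 0.05 * rng.standard_normal((N, N))
    A.append(S + S.T)              # small symmetric fluctuation matrices

# stochastic Galerkin matrices Delta_j = E(H_a xi_j H_b); Delta_0 = identity
Delta = [np.eye(R)]
for _ in range(M):
    D = np.diag(rng.random(R - 1), 1)
    Delta.append(D + D.T)          # tridiagonal-like coupling of the PCE terms

# assemble the Kronecker-sum operator and solve A u = f
Abig = sum(np.kron(Dj, Aj) for Aj, Dj in zip(A, Delta))
f = rng.standard_normal(N * R)
u = np.linalg.solve(Abig, f)       # vector of all PCE coefficients of u
```

The dense assembly is only for illustration; the sparse tensor Galerkin methods cited above apply the sum of Kronecker products matrix-free.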
13. Simulation of Measurements
- Measure some functional of the solution u in finitely many patches L:
Ĝ := {x_1, ..., x_L} ⊂ G,  L := |Ĝ|.
- The average hydraulic head:
y(u, ω) := [..., y(x_j), ...] ∈ R^L,  y(x_j) = ∫_{G_j} u(x, ω) dx,
y̌ = [y(x_1, ω̌), ..., y(x_L, ω̌)]^T
- Observation:
z := y̌ + ε,  ε ∼ N(0, C_ε)
14. Inverse Problem
- κ_f lies in a cone in the vector space of RVs (not a subspace)
- project: κ_f = Σ_{α∈J} κ_f^(α) H_α(θ(ω)) (similarly for z and y)
q_f(x, ω) = log κ_f = Σ_{α∈J} q_f^(α)(x) H_α(θ(ω)) = Q_f H,  Q_f ∈ R^{N×R},  H ∈ R^R
Let Q_a = [..., q_a^β, ...], Z = [..., z^β, ...] and Y = [..., y^β, ...]; then the
- matrix form of the update formula is
Q_a = Q_f + K(Z − Y),  K ∈ R^{N×L};  Z, Y ∈ R^{L×R}
- map back:
κ_a = exp(q_a(x, ω))
15. Bayesian update procedure
Input: a priori information q_f(ω) and measurements z.
1. approximate q_f(ω) and the input z(ω) by PCE
2. set Q_f = [..., q_f^β, ...], Z = [..., z^β, ...]
3. solve u(ω) = S(q_f(ω); f(ω))
4. forecast the measurement:
   y(ω) = Y(q_f(ω); u(ω)) = Y(q_f(ω); S(q_f(ω); f(ω)))
5. PCE representation of y(ω): Y = [..., y^β, ...]
6. compute the covariance C_d = C_y + C_ε = Ỹ Δ₀ Ỹ^T + C_ε
7. compute G = C_d^{−1}(Z − Y)
8. compute the covariance C_{q_f y} = Q̃_f Δ₀ Ỹ^T
9. compute Q_a = Q_f + C_{q_f y} G
Output: assimilated data Q_a = [..., q_a^β, ...].
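Steps 6–9 of the procedure can be sketched as a small numpy routine (a sketch under the assumption of an orthogonal PCE basis with diagonal Gram matrix Δ₀; all names and sizes are hypothetical):

```python
import numpy as np

def linear_bayes_update(Qf, Y, Z, Ceps, gram):
    """Qa = Qf + C_{qf,y} (C_y + C_eps)^{-1} (Z - Y) on PCE coefficient matrices.

    Qf : (N, R) prior coefficients; Y, Z : (L, R) predicted / measured
    coefficients; Ceps : (L, L) noise covariance; gram : (R,) diagonal of
    Delta_0 = E(H_b^2), with column 0 holding the mean term.
    """
    Qt, Yt = Qf[:, 1:], Y[:, 1:]            # drop the mean (beta = 0) column
    Dt = np.diag(gram[1:])
    C_qy = Qt @ Dt @ Yt.T                   # step 8: C_{qf,y}
    C_y = Yt @ Dt @ Yt.T                    # step 6 (up to the noise term)
    G = np.linalg.solve(C_y + Ceps, Z - Y)  # step 7
    return Qf + C_qy @ G                    # step 9

# hypothetical sizes: N spatial dofs, R PCE terms, L measurement patches
rng = np.random.default_rng(0)
N, R, L = 20, 6, 4
Qf = rng.standard_normal((N, R))
Hobs = rng.standard_normal((L, N)) / N      # made-up linear observation map
Y = Hobs @ Qf                               # predicted measurement coefficients
Z = Y.copy()                                # measurement identical to prediction
Qa = linear_bayes_update(Qf, Y, Z, 0.01 * np.eye(L), np.ones(R))
# with Z == Y the update leaves the prior unchanged
```

The consistency check at the end reflects the structure of the formula: if the forecast already explains the data, the gain term vanishes.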
16. Kalman Filter
- the variance:
C_{q_a} = E(q̃_a(·) ⊗ q̃_a(·)) = Σ_{γ,β>0} q_a^γ ⊗ q_a^β E(H_γ H_β) = Σ_{γ>0} q_a^γ ⊗ q_a^γ γ!,
C_{q_a} = Q̃_a Δ₀ Q̃_a^T,  where Q̃_a is Q_a without the mean (γ = 0) column
- Kalman formula:
C_{q_a} = C_{q_f} + C_{q_f y}(C_y + C_ε)^{−1} C_{q_f y}^T − 2 C_{q_f y}(C_y + C_ε)^{−1} C_{q_f y}^T
        = C_{q_f} − C_{q_f y}(C_y + C_ε)^{−1} C_{q_f y}^T
17. Low-rank data format
Aim: compute the update equation in a low-rank tensor format:
q_a(ω) = q_f(ω) + K(z(ω) − y(ω)),  (4)
with
K = C_{q_f y}(C_y + C_ε)^{−1},  (5)
where C_{q_f y} = Cov(q_f, y) = E((q_f − E(q_f))(y − E(y))^T), C_y = Cov(y, y), C_ε = Cov(ε, ε). These covariances can be approximated in the H-matrix or in low-rank tensor formats [Litvinenko et al. 2008].
18. Compression of PCE coefficients
Let the RF q(x, θ), θ = (θ_1, ..., θ_M, ...), be approximated by
q(x, θ) = Σ_{β∈J} H_β(θ) q^β(x),  (6)
q^β(x) = (1/β!) ∫_Θ H_β(θ) q(x, θ) P(dθ) ≈ (1/β!) Σ_{i=1}^{n_q} H_β(θ_i) q(x, θ_i) w_i,  (7)
where n_q is the number of quadrature points. Using a low-rank format, obtain
q^β(x) = (1/β!) [q(x, θ_1), ..., q(x, θ_{n_q})] · [H_β(θ_1)w_1, ..., H_β(θ_{n_q})w_{n_q}]^T  (8)
19. Denote
c_β := (1/β!) [H_β(θ_1)w_1, ..., H_β(θ_{n_q})w_{n_q}] ∈ R^{n_q}  (9)
and approximate the set of realisations in a low-rank format:
[q(x, θ_1), ..., q(x, θ_{n_q})] ≈ A B^T.
The matrix of all PCE coefficients is then
[... q^β(x) ...] ≈ A B^T [... c_β^T ...] ∈ R^{N×|J|},  β ∈ J.  (10)
A later compression via H_β(θ) = Π_{j=1}^M h_{β_j}(θ_j), where the h_{β_j}(θ_j) are 1D Hermite polynomials, is possible.
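Formulas (7)–(10) can be sketched with numpy for a single stochastic dimension (the toy field, grid sizes and the rank are all hypothetical choices):

```python
import math
import numpy as np
from numpy.polynomial.hermite_e import hermegauss, hermeval

Nx, nq, p = 50, 9, 3                       # grid size, quadrature points, PCE order
nodes, w = hermegauss(nq)                  # probabilists' Gauss-Hermite rule
w = w / w.sum()                            # weights of the standard Gaussian measure

x = np.linspace(0.0, 1.0, Nx)
snap = np.exp(0.3 * np.outer(np.sin(np.pi * x), nodes))   # snapshots q(x, theta_i)

# rows are c_beta = (1/beta!) [H_beta(theta_1) w_1, ..., H_beta(theta_nq) w_nq]
C = np.stack([hermeval(nodes, np.eye(p + 1)[b]) * w / math.factorial(b)
              for b in range(p + 1)])

# low-rank compression of the realisations: snap ≈ A @ B.T via truncated SVD
U, s, Vt = np.linalg.svd(snap, full_matrices=False)
r = 3
A, B = U[:, :r] * s[:r], Vt[:r].T

Q = (A @ B.T) @ C.T                        # PCE coefficient matrix, (Nx, p+1), Eq. (10)
```

Here the truncated SVD stands in for whatever low-rank factorisation A B^T is used; only the factors and the small matrix of c_β vectors need to be stored.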
20. Response surface in low-rank format
Putting it all together, obtain a low-rank representation of the RS:
q(x, θ) = Σ_{β∈J} H_β(θ) q^β(x) = H q(x)^T,  (11)
where H = (..., H_β(θ), ...) and q(x) = (..., q^β(x), ...). Using Eq. (10), obtain
q(x, θ) = H q(x)^T = H A B^T [... c_β^T ...],  (12)
where the vector c_β is defined in Eq. (9).
The matrices A, B^T and [... c_β^T ...] are given. Fixing the random parameter θ = θ*, compute the vector H and then a realisation q(x, θ*) of the RF.
21. Application of the response surface
Now, having the RS
q(x, θ) = H A B^T [... c_β^T ...],  (13)
we generate the RV θ, compute the vector H, multiply by A, multiply the resulting vector by B^T and then by the matrix [... c_β^T ...]. We repeat this, e.g., 10⁶ times and then use the obtained sample to compute (in each point x)
- error bars (command errorbar in Matlab),
- quantiles (command quantile in Matlab),
- the probability density function (command ksdensity in Matlab).
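In Python the same sampling procedure looks like this (the low-rank factors are random placeholders, not the factors from the talk; hermevander evaluates all H_β at once, so the repetition vectorises):

```python
import numpy as np
from numpy.polynomial.hermite_e import hermevander

rng = np.random.default_rng(1)
Nx, r, p, n = 40, 3, 3, 100_000            # hypothetical sizes and sample count

# placeholder low-rank factors of the PCE coefficient matrix Q ≈ A @ B.T, (Nx, p+1)
A = rng.standard_normal((Nx, r))
B = rng.standard_normal((p + 1, r)) * 0.2

theta = rng.standard_normal(n)             # generate the RV theta
H = hermevander(theta, p)                  # (n, p+1): H_0(theta), ..., H_p(theta)

samples = (H @ B) @ A.T                    # q(x, theta) for all samples, (n, Nx)

q05, q50, q95 = np.quantile(samples, [0.05, 0.5, 0.95], axis=0)
mean, std = samples.mean(axis=0), samples.std(axis=0)   # error bars per point x
```

Multiplying through the factors left to right keeps the cost linear in the rank; the full coefficient matrix is never formed.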
22. Relative errors and memory of the rank-k approximation
rank k | press. | density | tke    | ev     | xv     | memory, MB
10     | 1.9e-2 | 1.9e-2  | 4.0e-3 | 1.4e-3 | 1.1e-2 | 21
20     | 1.4e-2 | 1.3e-2  | 5.9e-3 | 4.1e-4 | 9.7e-3 | 42
50     | 5.3e-3 | 5.1e-3  | 1.5e-4 | 7.7e-5 | 3.4e-3 | 104
Table: Matrices ∈ R^{260000×600}. The dense matrix format costs 1.25 GB.
23. Numerical examples of tensor approximations
The Gaussian kernel exp(−h²) has Kronecker rank 1.
The exponential kernel exp(−h) can be approximated by a tensor with low Kronecker rank r.
Approximation of C ∈ R^{N×N}, N = 41² = 1681, in the Kronecker tensor (KT) format:
r                   | 1    | 2    | 3   | 4    | 5     | 6     | 10
‖C − C_r‖_∞ / ‖C‖_∞ | 11.5 | 1.7  | 0.4 | 0.14 | 0.035 | 0.007 | 2.8e-8
‖C − C_r‖_2 / ‖C‖_2 | 6.7  | 0.52 | 0.1 | 0.03 | 0.008 | 0.001 | 5.3e-9
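A similar rapid error decay can be observed in 1-D with an ordinary truncated SVD of the exponential kernel (a simple proxy for the Kronecker-format experiment above, not the same computation; the grid is a hypothetical choice):

```python
import numpy as np

n = 200
x = np.linspace(0.0, 1.0, n)
C = np.exp(-np.abs(x[:, None] - x[None, :]))   # exponential kernel exp(-h)

U, s, Vt = np.linalg.svd(C)
errs = {}
for r in (1, 2, 5):
    Cr = (U[:, :r] * s[:r]) @ Vt[:r]           # best rank-r approximation
    errs[r] = np.linalg.norm(C - Cr, 2) / np.linalg.norm(C, 2)
# the relative spectral-norm error drops quickly with the rank r
```

The SVD gives the optimal low-rank approximation in the spectral norm, so the decay of `errs` mirrors the decay of the kernel's eigenvalues.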
25. Measurement points
[Figure: position of the measurement points (FEM nodes) used in the experiments, on the domain [−1, 1]²: a) 447, b) 239, c) 120, d) 10 measurement patches]
26. Given Data
- Right-hand side: f = f₀ sin((2π/λ) x^T d + ϕ),
d = [cos α, sin α],  α ∈ [−π/2, π/2],  ϕ ∈ [0, 2π]
- The 'virtual truth' is taken as
a) κ = 2
b) κ = 2 + 0.3 · (x + y)
c) κ = 2.2 − 0.1 · (x² + y²)
- A priori information:
E(κ) = 2.4,  σ_κ = 0.4,
order of PCE p = 3 and number of KLE modes M ≤ 50
28. Relative Error
[Figure: "Linear truth", experiment 1 (L = 447): convergence behaviour of the relative error ε_a (log scale, 10⁻² to 10⁰) with respect to the number of sequential updates (0–4), for 447, 239, 120, 60 and 10 measurement points]
29. Relative Error
[Figure: "Constant truth", experiment 1 (L = 447) after the 4th update: a) relative error ε̄_a [%] (the mean of the posterior compared to the mean of the truth), b) relative error ε_a [%] (the posterior compared to the truth), c) improvement I [%] (the posterior compared to the prior)]
30. PDF
[Figure: "Constant truth", experiment 3 (L = 120): posterior probability density function of κ_a compared to the prior κ_f at a single point of the domain]
31. Update
Figure: "Linear truth", experiment 1 (L = 447) after the 1st update: a) mean of the prior κ̄_f, b) truth κ, c) mean of the posterior κ̄_a
32. Update
[Figure: "Quadratic truth", experiment 1 (L = 447) after the 4th update: a) mean of the prior κ̄_f, b) true κ, c) mean of the posterior κ̄_a]
33. Example: The Lorenz-84 Model
Described by the system
dx/dt = −ax − y² − z² + aF₁,
dy/dt = −y + xy − bxz + F₂,  (14)
dz/dt = −z − xz + bxy,
where F₁ and F₂ represent known thermal forcings, and a and b are fixed constants.
34. The Lorenz-84 model shows chaotic behaviour and is very sensitive to the initial conditions. For this reason we model them as independent Gaussian RVs:
x₀(ω) ∼ N(x̄₀, σ₁),
y₀(ω) ∼ N(ȳ₀, σ₂),  (15)
z₀(ω) ∼ N(z̄₀, σ₃).
Due to the appearance of RVs, the deterministic model turns into a system of SDEs.
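A plain Monte Carlo sketch of this setup (the parameter values a = 0.25, b = 4, the forcings and the initial-condition statistics are assumed, commonly used choices, not values from the talk):

```python
import numpy as np

# Lorenz-84 right-hand side; a, b, F1, F2 are assumed standard values
a, b, F1, F2 = 0.25, 4.0, 8.0, 1.0

def rhs(s):
    x, y, z = s
    return np.array([-a * x - y**2 - z**2 + a * F1,
                     -y + x * y - b * x * z + F2,
                     -z - x * z + b * x * y])

def rk4(s, dt, steps):
    """Classical 4th-order Runge-Kutta time stepping."""
    for _ in range(steps):
        k1 = rhs(s); k2 = rhs(s + dt / 2 * k1)
        k3 = rhs(s + dt / 2 * k2); k4 = rhs(s + dt * k3)
        s = s + dt / 6 * (k1 + 2 * k2 + 2 * k3 + k4)
    return s

# Gaussian uncertain initial conditions -> Monte Carlo ensemble
rng = np.random.default_rng(4)
n = 200
ens0 = rng.normal([1.0, 0.0, 0.0], [0.1, 0.1, 0.1], size=(n, 3))
ens_T = np.array([rk4(s, 0.01, 500) for s in ens0])   # ensemble states at t = 5
```

The spread of `ens_T` illustrates the sensitivity to the initial conditions; in the talk this ensemble role is played by the PCE representation rather than Monte Carlo.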
35. Figure: Bi-modal identification experiment after 1 update. Shown are the results for different numbers of measurements used to determine the PCE coefficients: 10 measurements (a), 100 (b) and 1000 (c). Each plot contains the truth, the prior and the posterior, as well as the last used measurement as an example.
36. Figure: Bi-modal identification experiment after 10 updates
37. Figure: Bi-modal identification experiment after 100 updates
38. Conclusion
- The ill-posed problem is regularised by the introduction of a priori information.
- The update of the prior is a projection of the minimum-variance estimator from linear Bayesian updating onto the polynomial chaos basis.
- For the mean and the variance the estimation is of Kalman type. The estimation is purely deterministic, without the need for any kind of sampling procedure.
- The presented linear Bayesian update does not require any linearity in the forward model, and it can readily update non-Gaussian uncertainties.
39. Any Questions?
Thank you for your attention!
LiBerty — LInear BayEsian diRecT polYnomial chaos update
40. References
1. Gamerman, D. and Lopes, H. F., Markov Chain Monte Carlo: Stochastic Simulation for Bayesian Inference, Chapman and Hall, 2006
2. Kučerová, A. and Matthies, H. G., Uncertainty Updating in the Description of Heterogeneous Materials, Technische Mechanik, Vol. 30, pp. 211–225, 2010
3. Marzouk, Y. M. and Najm, H. N., Dimensionality reduction and polynomial chaos acceleration of Bayesian inference in inverse problems, J. Comput. Phys., Vol. 228, 2009
4. Christen, J. A. and Fox, C., MCMC using an approximation, J. Comput. Graph. Stat., Vol. 14, pp. 795–810, 2005
5. Luenberger, D. G., Optimization by Vector Space Methods, John Wiley and Sons, Inc., New York, 1969
6. Rosić, B. V., Litvinenko, A., Pajonk, O. and Matthies, H. G., Direct Bayesian update of polynomial chaos representations, J. Comput. Phys., 2011, submitted
7. Pajonk, O., Rosić, B. V., Litvinenko, A. and Matthies, H. G., A Deterministic Filter for non-Gaussian Bayesian Estimation, Physica D: Nonlinear Phenomena, 2011, submitted