1. Outline
Nested Sampling
Posterior Simulation
Nested Sampling Termination and Size of N
Numerical Examples
Conclusion
Nested Sampling for General Bayesian Computation
Presented by WU Changye
12 February 2015
Introduction
In the Bayesian paradigm, the parameter $\theta$ follows the prior distribution $\pi$, and the observations $y$ follow the distribution $L(y\mid\theta)$ given $\theta$. The posterior distribution $f(\theta\mid y)$ of $\theta$ given the observations $y$ then has the form

$$f(\theta\mid y) = \frac{L(y\mid\theta)\,\pi(\theta)}{\int_\Theta L(y\mid\theta)\,\pi(\theta)\,d\theta}.$$

The objective of nested sampling is to compute the "evidence":

$$Z = \int_\Theta L(y\mid\theta)\,\pi(\theta)\,d\theta.$$
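Since $\theta$ is a random variable with distribution $\pi$, the evidence is a prior expectation:

$$Z = E_\pi(L(\theta)).$$

For simplicity, let $L(\theta)$ denote the likelihood $L(y\mid\theta)$. The cumulative distribution function of $L(\theta)$ is

$$F(\lambda) = \int_{L(\theta)<\lambda} \pi(\theta)\,d\theta.$$

Define the measure $\mu$ induced on $\mathbb{R}$ by the likelihood function and the prior as follows:

$$\mu(A) = P_\pi(L(\theta)\in A).$$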
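Lemma 1: $E_\pi(L(\theta)) = E_\mu(X)$.

Proof: First let $g = I_A$ be the indicator function of a measurable set $A \subset \mathbb{R}$. Then

$$E_\pi(g(L(\theta))) = E_\pi(I_A(L(\theta))) = \int_{L(\theta)\in A} \pi(\theta)\,d\theta.$$

Moreover, $\mu(dx) = \int_\Theta \delta_{L(\theta)}(dx)\,\pi(\theta)\,d\theta$, so

$$E_\mu(g(X)) = \int_{\mathbb{R}} I_A(x)\,\mu(dx) = \int_\Theta \left(\int_{\mathbb{R}} I_A(x)\,\delta_{L(\theta)}(dx)\right) \pi(\theta)\,d\theta = \int_\Theta I_A(L(\theta))\,\pi(\theta)\,d\theta.$$

Therefore,

$$E_\mu(g(X)) = E_\pi(I_A(L(\theta))) = E_\pi(g(L(\theta))).$$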
In the general case, let $\{g_n\}$ be an increasing sequence of nonnegative step functions converging to the identity function $\mathrm{Id}$ on $[0,\infty)$. Then $\{g_n \circ L\}$ is an increasing sequence of step functions converging to $L$, and the desired conclusion follows by linearity and the monotone convergence theorem.
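Lemma 2: If $X$ is a positive random variable with p.d.f. $f$ and c.d.f. $F$, then

$$\int_0^\infty (1 - F(x))\,dx = \int_0^\infty x f(x)\,dx = E(X).$$

Proof:

$$\begin{aligned}
\int_0^\infty (1 - F(x))\,dx &= \int_0^\infty (1 - P(X < x))\,dx \\
&= \int_0^\infty P(X \ge x)\,dx \\
&= \int_0^\infty \int_x^\infty f(y)\,dy\,dx \\
&= \int_0^\infty f(y) \int_0^y dx\,dy \\
&= \int_0^\infty f(y)\,y\,dy = E(X).
\end{aligned}$$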
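By Lemmas 1 and 2,

$$Z = E_\mu(X) = \int_0^\infty x\,dF(x) = \int_0^\infty (1 - F(x))\,dx.$$

Let $\varphi^{-1}(x) = 1 - F(x) = P_\pi(L(\theta) > x)$. Since $\varphi^{-1}$ decreases from 1 to 0, the area under its graph equals the area under the graph of its inverse $\varphi$ on $[0,1]$:

$$Z = \int_0^\infty \varphi^{-1}(x)\,dx = \int_0^1 \varphi(x)\,dx.$$

The evidence is therefore expressed as a one-dimensional integral.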
To compute the integral

$$J = \int_0^1 \varphi(x)\,dx,$$

there are three methods based on sampling.
1) Importance sampling (here with a uniform proposal, i.e. plain Monte Carlo): for $i = 1, \dots, n$, $U_i \sim U[0,1]$,

$$\hat J_1 = \frac{1}{n} \sum_{i=1}^{n} \varphi(U_i).$$

2) Riemann approximation: for $i = 1, \dots, n$, $U_i \sim U[0,1]$, with order statistics $U_{(1)} \le \dots \le U_{(n)}$ of $(U_1, \dots, U_n)$,

$$\hat J_2 = \sum_{i=1}^{n-1} \varphi(U_{(i)})\,(U_{(i+1)} - U_{(i)}).$$

3) A more elaborate method: set $x_0 = 1$.
Step 1: for $i = 1, \dots, N$, $U^1_i \sim U[0,1]$, and $x_1 = \max\{U^1_1, \dots, U^1_N\}$.
Step 2: for $i = 1, \dots, N$, $U^2_i \sim U[0,x_1]$, and $x_2 = \max\{U^2_1, \dots, U^2_N\}$.
...
Step n: for $i = 1, \dots, N$, $U^n_i \sim U[0,x_{n-1}]$, and $x_n = \max\{U^n_1, \dots, U^n_N\}$.

$$\hat J_3 = \sum_{i=1}^{n} \varphi(x_i)\,(x_{i-1} - x_i).$$
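The three estimators can be compared on a toy integrand. The sketch below is only an illustration: the function `phi` (a rapidly decreasing exponential with a known integral), the sample sizes and the seed are assumptions chosen for the example, not values from the talk.

```python
import math
import random


def phi(x):
    # Hypothetical rapidly decreasing integrand on [0, 1];
    # its exact integral is (1 - exp(-50)) / 50.
    return math.exp(-50.0 * x)


def j1(n, rng):
    # Method 1: average of phi over n uniform draws.
    return sum(phi(rng.random()) for _ in range(n)) / n


def j2(n, rng):
    # Method 2: Riemann sum over the order statistics of n uniform draws.
    u = sorted(rng.random() for _ in range(n))
    return sum(phi(u[i]) * (u[i + 1] - u[i]) for i in range(n - 1))


def j3(n_steps, n_points, rng):
    # Method 3: at each step draw n_points uniforms on [0, x_prev]
    # and keep their maximum as the next node x.
    x_prev, total = 1.0, 0.0
    for _ in range(n_steps):
        x = max(rng.uniform(0.0, x_prev) for _ in range(n_points))
        total += phi(x) * (x_prev - x)
        x_prev = x
    return total


rng = random.Random(0)
exact = (1.0 - math.exp(-50.0)) / 50.0
est1 = j1(10_000, rng)
est2 = j2(10_000, rng)
est3 = j3(500, 50, rng)
print(exact, est1, est2, est3)
```

With these settings the third method uses 500 nodes that shrink geometrically towards 0, which is where this particular `phi` carries almost all of its mass.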
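Nested sampling uses the third method. The reason is that $\varphi$ is a decreasing function that in many cases decreases rapidly, so most of the integral comes from values of $x$ close to 0, which the geometrically shrinking points $x_i$ reach quickly.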
Figure: Graph of ϕ(x) and the trace of (xi , ϕ(xi ))
First, we consider the distributions of $x_1, \dots, x_n$. For $u \in [0, 1]$,

$$P(x_1 < u) = P(U^1_1 < u, \dots, U^1_N < u) = \prod_{i=1}^{N} P(U^1_i < u) = u^N.$$

As a result, the density function of $x_1$ is

$$f(x_1) = N x_1^{N-1}.$$

By the same argument,

$$f(x_k \mid x_{k-1}) = \frac{N}{x_{k-1}} \left(\frac{x_k}{x_{k-1}}\right)^{N-1}, \qquad 0 \le x_k \le x_{k-1}.$$
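Let $t_k = x_k / x_{k-1}$. For $t \in [0, 1]$,

$$\begin{aligned}
P(t_k \le t) &= \int_0^1 P(x_k \le t x \mid x_{k-1} = x)\, f_{x_{k-1}}(x)\,dx \\
&= \int_0^1 \int_0^{tx} f_{x_k \mid x_{k-1}}(y \mid x)\,dy\; f_{x_{k-1}}(x)\,dx \\
&= \int_0^1 \int_0^{tx} \frac{N}{x} \left(\frac{y}{x}\right)^{N-1} dy\; f_{x_{k-1}}(x)\,dx \\
&= \int_0^1 t^N f_{x_{k-1}}(x)\,dx = t^N.
\end{aligned}$$

Besides,

$$P(t_k \le t \mid x_{k-1} = x) = P(x_k \le t x \mid x_{k-1} = x) = t^N.$$

Since this conditional distribution does not depend on $x$, we have $t_k \perp x_{k-1}$.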
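Moreover, a point estimate for $x_k$ can be written entirely in terms of point estimates for the $t_k$:

$$x_k = \frac{x_k}{x_{k-1}} \times \frac{x_{k-1}}{x_{k-2}} \times \cdots \times \frac{x_1}{x_0} \times x_0 = t_k\, t_{k-1} \cdots t_1\, x_0 = x_0 \prod_{i=1}^{k} t_i.$$

Because $x_k$ ranges over many orders of magnitude in typical problems, it is more convenient to work with $\log x_k$:

$$\log x_k = \log\left(x_0 \prod_{i=1}^{k} t_i\right) = \sum_{i=1}^{k} \log t_i + \log x_0.$$

Each shrinkage ratio $t$ has density $f(t) = N t^{N-1} = N e^{(N-1)\log t}$ on $[0,1]$, so that $-\log t$ is exponentially distributed with rate $N$. The logarithmic shrinkage therefore has mean and variance

$$E(\log t) = -\frac{1}{N}, \qquad V(\log t) = \frac{1}{N^2}.$$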
Taking the mean as the point estimate for each $\log t_i$ finally gives

$$\log \frac{x_k}{x_0} = -\frac{k}{N} \pm \frac{\sqrt{k}}{N}.$$

Parameterizing $x_k$ in terms of the shrinkage is advantageous: because the $\log t_i$ are independent, the errors in the point estimates partly cancel, and the relative error of $\log x_k$, of order $1/\sqrt{k}$, decreases with $k$. With $x_0 = 1$, the point estimate is

$$x_k = \exp\left(-\frac{k}{N}\right).$$
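The mean and variance of the logarithmic shrinkage, and the point estimate $\log x_k \approx -k/N$, can be checked by simulation. The following is a minimal sketch; the values `N = 20`, `k = 200`, the number of replications and the seed are assumptions made for the illustration.

```python
import math
import random
import statistics

N = 20                      # points per step (assumed value)
rng = random.Random(1)


def draw_log_t():
    # t = x_k / x_{k-1} is distributed as the largest of N uniforms on [0, 1].
    return math.log(max(rng.random() for _ in range(N)))


log_t = [draw_log_t() for _ in range(20_000)]
mean_log_t = statistics.fmean(log_t)       # theory: -1/N
var_log_t = statistics.pvariance(log_t)    # theory: 1/N^2

k = 200
log_x_k = sum(log_t[:k])                   # log x_k with x_0 = 1; theory: -k/N +/- sqrt(k)/N

print(mean_log_t, var_log_t, log_x_k)
```

A single realisation of `log_x_k` should fall within a few multiples of $\sqrt{k}/N \approx 0.71$ of $-k/N = -10$.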
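Next, we consider the distribution of $\varphi(X)$, where $X \sim U[0, 1]$. Consider the random variable $X = \varphi^{-1}(L(\theta))$, where $\theta \sim \pi$. Notice that

$$\varphi^{-1} : [0, L_{\max}] \to [0, 1], \qquad \lambda \mapsto P(L(\theta) > \lambda)$$

is decreasing. Hence, for $u \in [0, 1]$,

$$P(X < u) = P(\varphi^{-1}(L(\theta)) < u) = P(L(\theta) > \varphi(u)) = \varphi^{-1}(\varphi(u)) = u.$$

This means that $\varphi^{-1}(L(\theta))$ follows $U[0, 1]$, and therefore $\varphi(X)$ has the same distribution as $L(\theta)$.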
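This uniformity can be checked numerically in a case where $\varphi^{-1}$ has a closed form. The sketch below assumes a one-dimensional model with prior $N(0,1)$ and likelihood $N(y; \theta, 1)$, for which $L(\theta') > L(\theta)$ is equivalent to $|\theta' - y| < |\theta - y|$. The value `y = 1.0`, the sample size and the helper name `prior_mass_above` are choices made for the illustration.

```python
import random
from statistics import NormalDist, fmean, pvariance

y = 1.0                               # assumed observation
std_normal = NormalDist(0.0, 1.0)     # prior N(0, 1)
rng = random.Random(2)


def prior_mass_above(theta):
    # phi^{-1}(L(theta)) = P(L(theta') > L(theta)) for theta' ~ prior,
    # i.e. the prior mass of the interval (y - r, y + r) with r = |theta - y|.
    r = abs(theta - y)
    return std_normal.cdf(y + r) - std_normal.cdf(y - r)


xs = [prior_mass_above(rng.gauss(0.0, 1.0)) for _ in range(20_000)]
mean_x = fmean(xs)        # U[0, 1] has mean 1/2
var_x = pvariance(xs)     # U[0, 1] has variance 1/12
print(mean_x, var_x)
```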
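Now consider the prior truncated to the region where the likelihood exceeds a level $L_0$:

$$\tilde\pi(\theta) \propto \begin{cases} \pi(\theta) & \text{if } L(\theta) > L_0, \\ 0 & \text{otherwise.} \end{cases}$$

Let $X_0 = \varphi^{-1}(L_0)$ and $X = \varphi^{-1}(L(\theta))$, where $\theta \sim \tilde\pi$. For $u \in [0, X_0]$,

$$P(X < u) = P_\pi(\varphi^{-1}(L(\theta)) < u \mid L(\theta) > L_0) = \frac{P_\pi(L(\theta) > \varphi(u))}{P_\pi(L(\theta) > L_0)} = \frac{\varphi^{-1}(\varphi(u))}{X_0} = \frac{u}{X_0},$$

so $X \sim U[0, X_0]$. As a result, $\varphi(X) \sim L(\theta)$, where $X \sim U[0, X_0]$ and $\theta \sim \tilde\pi$.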
Algorithm
The algorithm based on the method discussed in the previous section is described below:
– Iteration 1: sample $N$ points $\theta_{1,i}$ independently from the prior $\pi(\theta)$, determine $\theta_1 = \arg\min_{1 \le i \le N} L(\theta_{1,i})$ and set $\varphi_1 = L(\theta_1)$.
– Iteration 2: obtain the $N$ current values $\theta_{2,i}$ by keeping the $\theta_{1,i}$'s, except for $\theta_1$, which is replaced by a draw from the prior distribution $\pi$ conditional upon $L(\theta) \ge \varphi_1$; then select $\theta_2 = \arg\min_{1 \le i \le N} L(\theta_{2,i})$ and set $\varphi_2 = L(\theta_2)$.
– Iterate the above step until a given stopping rule is satisfied, for instance when observing very small changes in the approximation $\hat Z$, or when reaching the maximal value of $L(\theta)$ when it is known.
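The evidence is then approximated by

$$\hat Z = \sum_{i=1}^{J} \varphi_i\,(x_{i-1} - x_i),$$

where $J$ is the number of iterations and $x_i = \exp(-i/N)$, with $x_0 = 1$, is the point estimate of the prior mass obtained earlier.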
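The iterations and the estimate $\hat Z$ can be sketched for a one-dimensional Gaussian model (prior $N(0,1)$, likelihood $N(y;\theta,1)$). This is an illustration under stated assumptions rather than the implementation used in the talk: in this toy case the constraint $L(\theta) > \varphi_i$ is the interval $|\theta - y| < r$, so the constrained prior draw is done exactly by inverting the normal c.d.f. instead of by MCMC. The function name, `y = 3.0`, `n_live = 100`, `n_iter = 1200` and the seed are all hypothetical choices.

```python
import math
import random
from statistics import NormalDist


def nested_sampling_evidence(y=3.0, n_live=100, n_iter=1200, seed=0):
    rng = random.Random(seed)
    prior = NormalDist(0.0, 1.0)

    def lik(theta):
        # Likelihood N(y; theta, 1).
        return math.exp(-0.5 * (y - theta) ** 2) / math.sqrt(2.0 * math.pi)

    # Iteration 1 starts from N independent prior draws.
    live = [rng.gauss(0.0, 1.0) for _ in range(n_live)]
    z_hat, x_prev = 0.0, 1.0
    for i in range(1, n_iter + 1):
        # Point with the smallest likelihood among the current N points.
        worst = min(range(n_live), key=lambda j: lik(live[j]))
        phi_i = lik(live[worst])
        x_i = math.exp(-i / n_live)          # point estimate of the prior mass
        z_hat += phi_i * (x_prev - x_i)
        x_prev = x_i
        # Replace it by a prior draw constrained to L(theta) > phi_i,
        # i.e. to the interval (y - r, y + r); exact inverse-c.d.f. sampling.
        r = abs(live[worst] - y)
        lo, hi = prior.cdf(y - r), prior.cdf(y + r)
        live[worst] = prior.inv_cdf(lo + (hi - lo) * rng.random())
    return z_hat


y_obs = 3.0
z_hat = nested_sampling_evidence(y=y_obs)
z_true = math.exp(-y_obs ** 2 / 4.0) / (2.0 * math.sqrt(math.pi))
print(z_hat, z_true)
```

The sketch stops after a fixed number of iterations and ignores the contribution of the points still alive at the end, which is negligible here because $x_J = e^{-12}$.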
By-product of Nested Sampling
Skilling indicates that nested sampling provides simulations from the posterior distribution at no extra cost: "the existing sequence of points θ1, θ2, θ3, . . . already gives a set of posterior representatives, provided the i'th is assigned the appropriate importance ωi Li". In the notation used here, $\omega_i$ is the prior-mass width $x_{i-1} - x_i$ and $L_i = \varphi_i$. The posterior expectation of a function $f$ is

$$E(f(\theta) \mid y) = \frac{\int_\Theta \pi(\theta)\,L(\theta)\,f(\theta)\,d\theta}{\int_\Theta \pi(\theta)\,L(\theta)\,d\theta}.$$

A single run of nested sampling gives estimators of both the numerator and the denominator, the latter being the evidence $Z$. The estimator of the numerator is

$$\sum_{i=1}^{J} (x_{i-1} - x_i)\,\varphi_i\, f(\theta_i). \tag{1}$$
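As an illustration of estimator (1), the sketch below estimates a posterior mean as the ratio of the two sums obtained from one run. It assumes the same one-dimensional Gaussian toy model (prior $N(0,1)$, likelihood $N(y;\theta,1)$), for which the posterior mean is $y/2$, and draws the constrained points exactly by inverting the normal c.d.f. The function name and all numerical settings are hypothetical.

```python
import math
import random
from statistics import NormalDist


def posterior_mean_by_nested_sampling(f, y=3.0, n_live=100, n_iter=1200, seed=0):
    rng = random.Random(seed)
    prior = NormalDist(0.0, 1.0)

    def lik(theta):
        return math.exp(-0.5 * (y - theta) ** 2) / math.sqrt(2.0 * math.pi)

    live = [rng.gauss(0.0, 1.0) for _ in range(n_live)]
    numerator, z_hat, x_prev = 0.0, 0.0, 1.0
    for i in range(1, n_iter + 1):
        worst = min(range(n_live), key=lambda j: lik(live[j]))
        theta_i = live[worst]
        x_i = math.exp(-i / n_live)
        weight = (x_prev - x_i) * lik(theta_i)   # (x_{i-1} - x_i) * phi_i
        numerator += weight * f(theta_i)         # estimator (1)
        z_hat += weight                          # evidence estimate
        x_prev = x_i
        # Exact constrained prior draw on (y - r, y + r), valid for this toy model only.
        r = abs(theta_i - y)
        lo, hi = prior.cdf(y - r), prior.cdf(y + r)
        live[worst] = prior.inv_cdf(lo + (hi - lo) * rng.random())
    return numerator / z_hat


post_mean = posterior_mean_by_nested_sampling(lambda theta: theta, y=3.0)
print(post_mean)   # the exact posterior mean is y / 2 = 1.5
```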
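Lemma 3 (N. Chopin & C. P. Robert): Let $\tilde f(l) = E_\pi\{f(\theta) \mid L(\theta) = l\}$ for $l > 0$. Then, if $\tilde f$ is absolutely continuous,

$$\int_0^1 \varphi(x)\,\tilde f(\varphi(x))\,dx = \int \pi(\theta)\,L(\theta)\,f(\theta)\,d\theta.$$

Proof: Let $\psi : l \mapsto l\,\tilde f(l)$. Then

$$\begin{aligned}
\int \pi(\theta)\,L(\theta)\,f(\theta)\,d\theta &= E_\pi[\psi\{L(\theta)\}] \\
&= \int_0^{+\infty} P_\pi(\psi\{L(\theta)\} > l)\,dl \\
&= \int_0^{+\infty} \varphi^{-1}(\psi^{-1}(l))\,dl = \int_0^1 \psi(\varphi(x))\,dx.
\end{aligned}$$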
Termination
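The author suggests terminating as soon as

$$\max(L_1, \dots, L_N)\, X_j < f\, Z_j,$$

where $f$ is some small fraction. The left-hand side estimates the largest contribution the remaining prior mass $X_j$ could still add, so the run stops once that contribution is below a fraction $f$ of the current estimate $Z_j$.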
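The stopping rule amounts to a one-line check inside the main loop. The helper below is a minimal sketch; its name, its arguments and the default fraction `1e-3` are assumptions made for the example.

```python
def should_terminate(live_likelihoods, x_j, z_j, fraction=1e-3):
    # max(L_1, ..., L_N) * X_j estimates the largest remaining contribution;
    # stop once it falls below `fraction` times the current evidence estimate.
    return max(live_likelihoods) * x_j < fraction * z_j


# Early in a run the remaining prior mass is large, so the run continues.
early = should_terminate([0.10, 0.30, 0.25], x_j=0.5, z_j=0.03)
# Late in a run the remaining prior mass is tiny, so the run stops.
late = should_terminate([0.38, 0.39, 0.39], x_j=1e-6, z_j=0.03)
print(early, late)
```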
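Size of N
The larger N is, the smaller the variability of the approximation. The price is computation: since $x_k \approx \exp(-k/N)$, the number of iterations needed to reach a given prior mass grows proportionally to N.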
How to sample N points from the constrained parameter space
Use an MCMC method, that is, construct a Markov chain whose invariant distribution is the truncated prior.
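One simple chain with this property is random-walk Metropolis on the prior in which any proposal violating the likelihood constraint is rejected. The sketch below assumes a standard normal prior and a unit-variance Gaussian likelihood; the function names, the step size and the number of steps are hypothetical, and the talk does not specify which MCMC kernel was used.

```python
import math
import random


def log_prior(theta):
    # Standard normal prior, up to an additive constant.
    return -0.5 * sum(t * t for t in theta)


def log_lik(theta, y):
    # Gaussian likelihood with unit variance, up to an additive constant.
    return -0.5 * sum((yk - t) ** 2 for yk, t in zip(y, theta))


def constrained_move(theta, y, log_l0, n_steps=50, step=0.5, rng=random):
    # Random-walk Metropolis targeting the prior truncated to log_lik > log_l0.
    # `theta` must already satisfy the constraint.
    theta = list(theta)
    for _ in range(n_steps):
        proposal = [t + step * rng.gauss(0.0, 1.0) for t in theta]
        if log_lik(proposal, y) <= log_l0:
            continue                      # outside the constrained region: reject
        log_ratio = log_prior(proposal) - log_prior(theta)
        if rng.random() < math.exp(min(0.0, log_ratio)):
            theta = proposal              # usual Metropolis acceptance on the prior
    return theta


rng = random.Random(3)
y = [3.0, 3.0]
log_l0 = log_lik([1.0, 1.0], y)           # current likelihood threshold
start = [2.0, 2.0]                        # a point already above the threshold
new_point = constrained_move(start, y, log_l0, rng=rng)
print(new_point, log_lik(new_point, y) > log_l0)
```

In a nested sampling run a natural starting point is one of the current points other than the one being replaced, since they all satisfy the constraint. The resulting draw is then only approximately independent of the others.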
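A decentred Gaussian example
The prior is

$$\pi(\theta) = \prod_{k=1}^{d} \frac{1}{\sqrt{2\pi}} \exp\left(-\frac{1}{2} (\theta^{(k)})^2\right)$$

and the likelihood is

$$L(y \mid \theta) = \prod_{k=1}^{d} \frac{1}{\sqrt{2\pi}} \exp\left(-\frac{1}{2} (y_k - \theta^{(k)})^2\right).$$

In this example, the evidence can be computed analytically:

$$Z = \int_{\mathbb{R}^d} L(\theta)\,\pi(\theta)\,d\theta = \frac{\exp\left(-\sum_{k=1}^{d} y_k^2 / 4\right)}{2^d\, \pi^{d/2}}.$$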
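The closed form follows from completing the square in each coordinate. The short derivation below writes $\theta$ and $y$ for one coordinate $\theta^{(k)}$ and $y_k$; the full result is the product over $k = 1, \dots, d$.

```latex
% One coordinate: (y - \theta)^2 + \theta^2 = 2(\theta - y/2)^2 + y^2/2
\begin{aligned}
\int_{\mathbb{R}} \frac{1}{2\pi}
  \exp\!\left(-\tfrac{1}{2}(y-\theta)^2 - \tfrac{1}{2}\theta^2\right) d\theta
&= \frac{e^{-y^2/4}}{2\pi}
   \int_{\mathbb{R}} \exp\!\left(-(\theta - y/2)^2\right) d\theta \\
&= \frac{e^{-y^2/4}}{2\pi}\,\sqrt{\pi}
 = \frac{e^{-y^2/4}}{2\sqrt{\pi}} .
\end{aligned}
```

Multiplying the $d$ one-dimensional factors gives $\exp(-\sum_k y_k^2/4) / (2^d \pi^{d/2})$.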
Figure: Graph of ϕ(x) and the trace of (xi , ϕ(xi )) with d = 1 and y = 10.
Figure: The prior distribution and the likelihood with d = 1 and y = 10.
Figure: the box-plot of log ˆZ − log Z with d = 1 and y = 10 for Nested
sampling and Monte Carlo.
Figure: the box-plot of log ˆZ − log Z with d = 5 and y = (3, 3, 3, 3, 3).
A Probit Model
We consider the arsenic dataset and a probit model studied in Chapter 5 of Gelman & Hill (2006). The observations are independent Bernoulli variables $y_i$ such that $P(y_i = 1 \mid x_i) = \Phi(x_i^T \theta)$, where $x_i$ is a vector of $d$ covariates, $\theta$ is a vector parameter of size $d$, and $\Phi$ denotes the standard normal distribution function. In this particular example, $d = 7$.
The prior is

$$\theta \sim N(0, 10^2 I_d)$$

and the likelihood is

$$L(\theta) = \prod_{i=1}^{n} \Phi(x_i^T \theta)^{y_i} \left(1 - \Phi(x_i^T \theta)\right)^{1 - y_i}.$$
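The log-prior and log-likelihood of this model can be coded directly from the two formulas. The sketch below uses a tiny made-up design matrix instead of the arsenic data, which is not reproduced here. The function names are hypothetical, and $\Phi$ is computed through `math.erfc`.

```python
import math


def log_norm_cdf(z):
    # log Phi(z), using Phi(z) = erfc(-z / sqrt(2)) / 2.
    # Adequate for moderate z; for z below about -37 erfc underflows to 0.
    return math.log(0.5 * math.erfc(-z / math.sqrt(2.0)))


def probit_log_lik(theta, X, y):
    # Sum over observations of y_i log Phi(x_i' theta) + (1 - y_i) log(1 - Phi(x_i' theta)),
    # using 1 - Phi(z) = Phi(-z).
    total = 0.0
    for x_i, y_i in zip(X, y):
        eta = sum(a * b for a, b in zip(x_i, theta))
        total += log_norm_cdf(eta) if y_i == 1 else log_norm_cdf(-eta)
    return total


def log_prior(theta, scale=10.0):
    # Log density of N(0, scale^2 I_d).
    d = len(theta)
    quad = sum(t * t for t in theta) / scale ** 2
    return -0.5 * quad - d * math.log(scale) - 0.5 * d * math.log(2.0 * math.pi)


# Hypothetical toy data (intercept plus one covariate), not the arsenic dataset.
X = [[1.0, 0.5], [1.0, -1.2], [1.0, 2.0]]
y = [1, 0, 1]
ll_zero = probit_log_lik([0.0, 0.0], X, y)   # equals 3 * log(1/2) at theta = 0
lp_zero = log_prior([0.0, 0.0])
print(ll_zero, lp_zero)
```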
Figure: Box plot of $\log \hat Z$ with N = 20 for HMC and random walk MCMC. The blue line marks the value of $\log Z$ obtained with Chib's method, taken as the true value.
Posterior Samples
We use the Gaussian example to illustrate this result. Let $f(\theta) = \exp(-3\theta + 9d/2)$.
Figure: The box-plot of the log-relative error of log ˆZ − log Z and
log ˆE(f ) − log E(f )
Conclusion
– Nested sampling reverses the accepted approach to Bayesian
computation by putting the evidence first.
– Nested sampling samples more sparsely from the prior in regions
where the likelihood is low and more densely where the likelihood
is high, resulting in greater efficiency than a sampler that draws
directly from the prior.
– The procedure runs with an evolving collection of N points,
where N can be chosen small for speed or large for accuracy.
– Nested sampling always reduces a multidimensional integral to
the integral of a one-dimensional monotonic function, no matter
how many dimensions θ occupies, and no matter how strange the
shape of the likelihood function L(θ) is.
Problems
– Generating N independent points in the constrained parameter space is an important problem. Techniques to do so effectively and efficiently may vary from problem to problem.
– Choosing a termination rule is another problem in practice.