Nested sampling

Outline
Nested Sampling
Posterior Simulation
Nested Sampling Termination and Size of N
Numerical Examples
Conclusion
Nested Sampling for General Bayesian Computation
Presented by WU Changye
12 February 2015

Introduction
In the Bayesian paradigm, the parameter θ follows the prior distribution π and, given θ, the observations y follow the distribution L(y|θ). The posterior distribution f(θ|y), i.e. the distribution of θ given the observations y, then has the following form:
$$ f(\theta\mid y) = \frac{L(y\mid\theta)\,\pi(\theta)}{\int_\Theta L(y\mid\theta)\,\pi(\theta)\,d\theta} $$
The objective of nested sampling is to compute the 'evidence':
$$ Z = \int_\Theta L(y\mid\theta)\,\pi(\theta)\,d\theta $$

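Since θ is a random variable with distribution π, the evidence is an expectation under the prior:
$$ Z = \mathbb{E}_\pi\big(L(y\mid\theta)\big) $$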
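Because Z is a prior expectation, the most direct estimator simply averages the likelihood over draws from the prior. The sketch below is a minimal illustration, assuming the one-dimensional version of the decentred Gaussian example from the numerical examples (prior N(0, 1), likelihood N(y; θ, 1), so that Z = exp(−y²/4)/(2√π)); the function names are hypothetical.

```python
import math
import random

def likelihood(theta, y):
    # L(y | theta) = N(y; theta, 1)
    return math.exp(-0.5 * (y - theta) ** 2) / math.sqrt(2 * math.pi)

def naive_evidence(y, n=100_000, seed=0):
    # Z_hat = (1/n) * sum_i L(theta_i), with theta_i ~ pi = N(0, 1)
    rng = random.Random(seed)
    return sum(likelihood(rng.gauss(0.0, 1.0), y) for _ in range(n)) / n

def exact_evidence(y):
    # Z = exp(-y^2 / 4) / (2 sqrt(pi)) for this toy model
    return math.exp(-y * y / 4) / (2 * math.sqrt(math.pi))

for y in (1.0, 10.0):
    print(y, naive_evidence(y), exact_evidence(y))
```

With y = 1 the estimate and the exact value agree closely. With y = 10 the likelihood mass sits far in the tail of the prior, almost every prior draw contributes essentially nothing, and the estimate becomes unreliable; this is the situation nested sampling is designed for.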
For simplicity, let L(θ) denote the likelihood L(y|θ). The cumulative distribution function of L(θ), for θ ∼ π, is
$$ F(\lambda) = \int_{L(\theta)<\lambda} \pi(\theta)\,d\theta $$
Define the measure µ on ℝ induced by the likelihood function and the prior as follows:
$$ \mu(A) = P_\pi\big(L(\theta) \in A\big) $$

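Lemma 1: $\mathbb{E}_\pi(L(\theta)) = \mathbb{E}_\mu(X)$, where X denotes a random variable with distribution µ.
Proof: Let $g = \mathbb{I}_A$ be the indicator function of a measurable set A ⊂ ℝ. Then
$$ \mathbb{E}_\pi\big(g(L(\theta))\big) = \mathbb{E}_\pi\big(\mathbb{I}_A(L(\theta))\big) = \int_{L(\theta)\in A} \pi(\theta)\,d\theta $$
On the other hand, $\mu(dx) = \int_\Theta \delta_{L(\theta)}(dx)\,\pi(\theta)\,d\theta$, so
$$ \mathbb{E}_\mu\big(g(X)\big) = \int_{\mathbb{R}} \mathbb{I}_A(x)\,\mu(dx) = \int_\Theta \Big(\int_{\mathbb{R}} \mathbb{I}_A(x)\,\delta_{L(\theta)}(dx)\Big)\,\pi(\theta)\,d\theta $$
and the inner integral equals $\mathbb{I}_A(L(\theta))$. Therefore,
$$ \mathbb{E}_\mu\big(g(X)\big) = \mathbb{E}_\pi\big(\mathbb{I}_A(L(\theta))\big) = \mathbb{E}_\pi\big(g(L(\theta))\big) $$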

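By linearity, the identity extends to nonnegative step functions. In the general case, let $\{g_n\}$ be an increasing sequence of nonnegative step functions converging pointwise to the identity function Id on [0, ∞); then $\{g_n \circ L\}$ is an increasing sequence of step functions converging to L, and the desired conclusion follows by taking limits (monotone convergence theorem).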

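Lemma 2: If X is a positive-valued random variable with p.d.f. f and c.d.f. F, then
$$ \int_0^\infty \big(1 - F(x)\big)\,dx = \int_0^\infty x f(x)\,dx = \mathbb{E}(X). $$
Proof:
$$ \begin{aligned} \int_0^\infty \big(1 - F(x)\big)\,dx &= \int_0^\infty \big(1 - P(X < x)\big)\,dx = \int_0^\infty P(X \ge x)\,dx \\ &= \int_0^\infty \int_x^\infty f(y)\,dy\,dx = \int_0^\infty f(y) \int_0^y dx\,dy \\ &= \int_0^\infty f(y)\,y\,dy = \mathbb{E}(X), \end{aligned} $$
where the order of integration can be exchanged by Tonelli's theorem because the integrand is nonnegative.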
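Lemma 2 also holds for the empirical distribution of a sample, where it becomes an exact identity: the area under the empirical survival function equals the sample mean. The sketch below checks this numerically; the exponential sample and the function name are arbitrary choices made for illustration.

```python
import random

def area_under_survival(samples):
    """Integral over [0, inf) of 1 - F_n(x), F_n the empirical c.d.f. of positive samples."""
    xs = sorted(samples)
    n = len(xs)
    area, prev = 0.0, 0.0
    for i, x in enumerate(xs):
        # for t in (prev, x], exactly n - i samples are >= t, so 1 - F_n(t) = (n - i) / n
        area += (x - prev) * (n - i) / n
        prev = x
    return area

rng = random.Random(1)
samples = [rng.expovariate(2.0) for _ in range(10_000)]  # positive, mean 1/2
print(area_under_survival(samples), sum(samples) / len(samples))
```

The two printed numbers coincide up to rounding, because the sum telescopes to the sample mean.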

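According to Lemmas 1 and 2,
$$ Z = \mathbb{E}_\mu(X) = \int_0^\infty x\,dF(x) = \int_0^\infty \big(1 - F(x)\big)\,dx. $$
Let $\varphi^{-1}(x) = 1 - F(x) = P_\pi\{\theta : L(\theta) > x\}$ (strictly, $1 - F(x) = P_\pi\{L(\theta) \ge x\}$; the two differ only at atoms of the distribution of L(θ), which are countably many and do not change the integral). Since ϕ^{-1} is nonincreasing with values in [0, 1], the area under its graph can also be computed along the vertical axis, with ϕ as the inverse function:
$$ Z = \int_0^\infty \varphi^{-1}(x)\,dx = \int_0^1 \varphi(x)\,dx. $$
Therefore, the evidence is represented by a one-dimensional integral.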
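The one-dimensional representation can be checked numerically. The sketch assumes the hypothetical toy model with prior N(0, 1) and likelihood N(y; θ, 1): there {L(θ) > λ} is an interval centred at y, so ϕ^{-1} is available through the normal c.d.f., and ϕ is obtained by inverting it with bisection. A midpoint rule for the integral of ϕ over [0, 1] should then reproduce the closed-form Z.

```python
import math

def Phi(z):
    # standard normal c.d.f.
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def prior_mass_above(r, y):
    # phi^{-1} at the likelihood level N(r; 0, 1): P_pi(|theta - y| < r)
    return Phi(y + r) - Phi(y - r)

def phi(x, y):
    # invert x = prior_mass_above(r, y) by bisection on r (increasing in r),
    # then return the corresponding likelihood level
    lo, hi = 0.0, 50.0
    for _ in range(60):
        mid = 0.5 * (lo + hi)
        if prior_mass_above(mid, y) < x:
            lo = mid
        else:
            hi = mid
    r = 0.5 * (lo + hi)
    return math.exp(-0.5 * r * r) / math.sqrt(2.0 * math.pi)

def evidence_1d(y, n=2000):
    # midpoint rule for Z = int_0^1 phi(x) dx
    return sum(phi((i + 0.5) / n, y) for i in range(n)) / n

y = 1.0
print(evidence_1d(y), math.exp(-y * y / 4) / (2 * math.sqrt(math.pi)))
```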

To compute the integral
$$ J = \int_0^1 \varphi(x)\,dx $$
there are three methods based on sampling.

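1) Importance sampling (with the uniform proposal, i.e. plain Monte Carlo): for $i = 1, \dots, n$, draw $U_i \sim U[0,1]$ and set
$$ \hat J_1 = \frac{1}{n}\sum_{i=1}^{n} \varphi(U_i) $$
2) Riemann approximation: for $i = 1, \dots, n$, draw $U_i \sim U[0,1]$ and let $U_{(1)} \le \dots \le U_{(n)}$ be the order statistics of $(U_1, \dots, U_n)$; then
$$ \hat J_2 = \sum_{i=1}^{n-1} \varphi(U_{(i)})\,\big(U_{(i+1)} - U_{(i)}\big) $$
3) A more elaborate, nested method: set $x_0 = 1$ and
step 1: for $i = 1, \dots, N$, draw $U^1_i \sim U[0,1]$ and set $x_1 = \max\{U^1_1, \dots, U^1_N\}$;
step 2: for $i = 1, \dots, N$, draw $U^2_i \sim U[0,x_1]$ and set $x_2 = \max\{U^2_1, \dots, U^2_N\}$;
⋯
step n: for $i = 1, \dots, N$, draw $U^n_i \sim U[0,x_{n-1}]$ and set $x_n = \max\{U^n_1, \dots, U^n_N\}$;
then
$$ \hat J_3 = \sum_{i=1}^{n} \varphi(x_i)\,(x_{i-1} - x_i) $$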
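The three estimators can be compared on a test curve where J is known. The sketch below assumes the hypothetical decreasing function ϕ(x) = exp(−x/S) with S = 0.05, for which J = S(1 − e^{−1/S}); the sample sizes are arbitrary.

```python
import math
import random

S = 0.05  # hypothetical scale of the test curve phi(x) = exp(-x / S)

def phi(x):
    return math.exp(-x / S)

J_EXACT = S * (1.0 - math.exp(-1.0 / S))  # int_0^1 exp(-x / S) dx

def j1(n, rng):
    # 1) average of phi over n uniform draws
    return sum(phi(rng.random()) for _ in range(n)) / n

def j2(n, rng):
    # 2) Riemann sum over the order statistics U_(1) <= ... <= U_(n)
    u = sorted(rng.random() for _ in range(n))
    return sum(phi(u[i]) * (u[i + 1] - u[i]) for i in range(n - 1))

def j3(N, n_steps, rng):
    # 3) nested ladder: x_k = max of N uniforms on [0, x_{k-1}], with x_0 = 1
    x_prev, total = 1.0, 0.0
    for _ in range(n_steps):
        x = max(rng.uniform(0.0, x_prev) for _ in range(N))
        total += phi(x) * (x_prev - x)
        x_prev = x
    return total

rng = random.Random(1)
print(J_EXACT, j1(2000, rng), j2(2000, rng), j3(20, 400, rng))
```

Method 3 places its evaluation points geometrically closer to x = 0, where this ϕ varies most, using only N draws per step.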

Nested sampling uses the third method, because ϕ is a decreasing function that, in many cases, decreases rapidly: the points $x_k$ shrink geometrically towards 0, so most evaluations fall where ϕ changes fastest.
Figure: Graph of ϕ(x) and the trace of $(x_i, \varphi(x_i))$

First, we consider the distributions of $x_1, \dots, x_n$. For u ∈ [0, 1],
$$ P(x_1 < u) = P(U^1_1 < u, \dots, U^1_N < u) = \prod_{i=1}^{N} P(U^1_i < u) = u^N. $$
As a result, the density function of $x_1$ is
$$ f(x_1) = N x_1^{N-1}. $$
By the same method, we have:
$$ f(x_k \mid x_{k-1}) = \frac{N}{x_{k-1}}\Big(\frac{x_k}{x_{k-1}}\Big)^{N-1}, \qquad 0 \le x_k \le x_{k-1}. $$
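A quick simulation can confirm the distribution of $x_1$: the empirical frequency of $\{x_1 < u\}$ should be close to $u^N$, and the sample mean close to $\mathbb{E}(x_1) = N/(N+1)$, which follows from the density $N x^{N-1}$. The values N = 10 and u = 0.9 are arbitrary.

```python
import random

def sample_x1(N, rng):
    # x_1 = max{U_1, ..., U_N} with U_i ~ U[0, 1]
    return max(rng.random() for _ in range(N))

N, reps, u = 10, 50_000, 0.9
rng = random.Random(0)
draws = [sample_x1(N, rng) for _ in range(reps)]
freq = sum(d < u for d in draws) / reps
mean = sum(draws) / reps
print(freq, u ** N)        # P(x_1 < u) versus u^N
print(mean, N / (N + 1))   # E(x_1) versus N / (N + 1)
```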

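Let $t_k = x_k / x_{k-1}$. Then
$$ \begin{aligned} P(t_k \le t) &= \int_0^1 P(x_k \le t x \mid x_{k-1} = x)\, f_{x_{k-1}}(x)\,dx = \int_0^1 \int_0^{tx} f_{x_k \mid x_{k-1}}(y \mid x)\,dy\; f_{x_{k-1}}(x)\,dx \\ &= \int_0^1 \int_0^{tx} \frac{N}{x}\Big(\frac{y}{x}\Big)^{N-1} dy\; f_{x_{k-1}}(x)\,dx = \int_0^1 t^N f_{x_{k-1}}(x)\,dx = t^N. \end{aligned} $$
Besides,
$$ P(t_k \le t \mid x_{k-1} = x) = P(x_k \le t x \mid x_{k-1} = x) = t^N, $$
which does not depend on x. As a result, $t_k \perp x_{k-1}$, and each $t_k$ has the same distribution as $x_1$.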
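The independence of $t_k$ and $x_{k-1}$ can be sanity-checked by simulation (a check, not a proof): split the draws according to whether $x_1$ is below or above its median and compare the frequency of $\{t_2 \le 0.8\}$ in both halves with $0.8^N$. The values N = 5 and t = 0.8 are arbitrary.

```python
import random

def two_steps(N, rng):
    # x_1 = max of N uniforms on [0, 1]; x_2 = max of N uniforms on [0, x_1]
    x1 = max(rng.random() for _ in range(N))
    x2 = max(rng.uniform(0.0, x1) for _ in range(N))
    return x1, x2 / x1   # (x_{k-1}, t_k) with k = 2

N, t = 5, 0.8
rng = random.Random(3)
pairs = [two_steps(N, rng) for _ in range(20_000)]
median = sorted(x1 for x1, _ in pairs)[len(pairs) // 2]
low = [t2 for x1, t2 in pairs if x1 < median]
high = [t2 for x1, t2 in pairs if x1 >= median]
p_low = sum(t2 <= t for t2 in low) / len(low)
p_high = sum(t2 <= t for t2 in high) / len(high)
print(p_low, p_high, t ** N)   # both frequencies should be close to t^N
```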

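Moreover, a point estimate for $x_k$ can be written entirely in terms of point estimates for the $t_k$:
$$ x_k = \frac{x_k}{x_{k-1}} \times \frac{x_{k-1}}{x_{k-2}} \times \cdots \times \frac{x_1}{x_0} \times x_0 = t_k\, t_{k-1} \cdots t_1\, x_0 = \Big(\prod_{i=1}^{k} t_i\Big)\, x_0. $$
The logarithmic scale is more appropriate to the large dynamic range common to many problems:
$$ \log x_k = \log\Big(\prod_{i=1}^{k} t_i \cdot x_0\Big) = \sum_{i=1}^{k} \log t_i + \log x_0, $$
where the logarithmic shrinkage is distributed as
$$ f(\log t) = N e^{N \log t}, \qquad \log t \le 0, $$
i.e. $-\log t$ is exponential with rate N, with mean and variance
$$ \mathbb{E}(\log t) = -\frac{1}{N}, \qquad \mathbb{V}(\log t) = \frac{1}{N^2}. $$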
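These moments can be checked by simulation: since t is the maximum of N uniforms, $-\log t$ should behave like an exponential variable with rate N. A minimal sketch, with N = 8 chosen arbitrarily:

```python
import math
import random

N, reps = 8, 50_000
rng = random.Random(4)
# t = max of N uniforms, so log t should have mean -1/N and variance 1/N^2
logs = [math.log(max(rng.random() for _ in range(N))) for _ in range(reps)]
mean = sum(logs) / reps
var = sum((v - mean) ** 2 for v in logs) / reps
print(mean, -1 / N)
print(var, 1 / N ** 2)
```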

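Taking the mean as the point estimate for each $\log t_i$ finally gives (± one standard deviation)
$$ \log\frac{x_k}{x_0} = -\frac{k}{N} \pm \frac{\sqrt{k}}{N}. $$
Parameterizing $x_k$ in terms of the shrinkage proves immediately advantageous: because the $\log t_i$ are independent, the errors in the individual point estimates tend to cancel, so the standard deviation of $\log x_k$ grows only like $\sqrt{k}$ while its mean grows like k, and the relative accuracy of the estimate of $\log x_k$ improves with k. With $x_0 = 1$, this gives the deterministic approximation
$$ x_k = \exp\Big(-\frac{k}{N}\Big). $$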
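The claim about accuracy can be illustrated by simulating many independent ladders and comparing, for several k, the empirical mean and standard deviation of $\log x_k$ with $-k/N$ and $\sqrt{k}/N$. A sketch with an arbitrary N = 10:

```python
import math
import random

def log_x_path(N, k_max, rng):
    # log x_k = sum_{i <= k} log t_i, with t_i = max of N uniforms and x_0 = 1
    path, s = [], 0.0
    for _ in range(k_max):
        s += math.log(max(rng.random() for _ in range(N)))
        path.append(s)
    return path

N, k_max, reps = 10, 400, 300
rng = random.Random(5)
paths = [log_x_path(N, k_max, rng) for _ in range(reps)]
for k in (10, 100, 400):
    vals = [p[k - 1] for p in paths]
    m = sum(vals) / reps
    sd = (sum((v - m) ** 2 for v in vals) / reps) ** 0.5
    # expected: m near -k/N and sd near sqrt(k)/N, so sd / |m| shrinks like 1/sqrt(k)
    print(k, round(m, 2), -k / N, round(sd, 2), round(math.sqrt(k) / N, 2))
```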

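Next, we consider the distribution of ϕ(X), where X ∼ U[0, 1]. Consider the random variable $X = \varphi^{-1}(L(\theta))$, where θ ∼ π, and recall that
$$ \varphi^{-1} : [0, L_{\max}] \to [0, 1], \qquad \lambda \mapsto P_\pi(L(\theta) > \lambda) $$
is decreasing (assumed continuous and strictly decreasing, so that ϕ is its inverse). For u ∈ [0, 1],
$$ P(X < u) = P\big(\varphi^{-1}(L(\theta)) < u\big) = P\big(L(\theta) > \varphi(u)\big) = \varphi^{-1}(\varphi(u)) = u. $$
This means that $\varphi^{-1}(L(\theta))$ follows U[0, 1] and, conversely, ϕ(X) has the same distribution as L(θ) when X ∼ U[0, 1].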
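In the toy model with prior N(0, 1) and likelihood N(y; θ, 1) (hypothetical y = 1), $\varphi^{-1}(L(\theta))$ has a closed form, so the uniformity result can be checked directly on prior draws:

```python
import random
from statistics import NormalDist

STD = NormalDist()   # prior pi = N(0, 1)
y = 1.0              # hypothetical observation for the toy likelihood N(y; theta, 1)

def phi_inv_of_likelihood(theta):
    # L(theta') > L(theta)  <=>  |theta' - y| < |theta - y| =: r,
    # so phi^{-1}(L(theta)) = P_pi(|theta' - y| < r) = Phi(y + r) - Phi(y - r)
    r = abs(theta - y)
    return STD.cdf(y + r) - STD.cdf(y - r)

rng = random.Random(6)
xs = [phi_inv_of_likelihood(rng.gauss(0.0, 1.0)) for _ in range(50_000)]
print(sum(xs) / len(xs))                     # about 1/2 for a U[0, 1] variable
print(sum(x < 0.25 for x in xs) / len(xs))   # about 1/4
```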

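Consider now the truncated prior
$$ \tilde\pi(\theta) \propto \begin{cases} \pi(\theta) & \text{if } L(\theta) > L_0, \\ 0 & \text{otherwise.} \end{cases} $$
Let $X_0 = \varphi^{-1}(L_0)$ and $X = \varphi^{-1}(L(\theta))$, where $\theta \sim \tilde\pi$. For $u \in [0, X_0]$ we have $\varphi(u) \ge L_0$, so $\{L(\theta) > \varphi(u)\} \subset \{L(\theta) > L_0\}$ and
$$ P(X < u) = P\big(\varphi^{-1}(L(\theta)) < u \mid L(\theta) > L_0\big) = \frac{P(L(\theta) > \varphi(u))}{P(L(\theta) > L_0)} = \frac{\varphi^{-1}(\varphi(u))}{X_0} = \frac{u}{X_0}. $$
Hence $X \sim U[0, X_0]$. As a result, ϕ(X) has the same distribution as L(θ), where $X \sim U[0, X_0]$ and $\theta \sim \tilde\pi$.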
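The truncated case can be checked the same way, drawing from the truncated prior by rejection (keep prior draws with L(θ) > L_0). The sketch assumes the toy model with y = 1 and a hypothetical threshold expressed as a radius r_0 = 0.8 around y; the values of X should fill $[0, X_0]$ uniformly, with mean close to $X_0/2$.

```python
import random
from statistics import NormalDist

STD = NormalDist()   # prior pi = N(0, 1)
y = 1.0              # hypothetical toy observation, likelihood N(y; theta, 1)
r0 = 0.8             # hypothetical threshold: L(theta) > L0  <=>  |theta - y| < r0

def prior_mass_inside(r):
    # phi^{-1} at the likelihood level of radius r: P_pi(|theta - y| < r)
    return STD.cdf(y + r) - STD.cdf(y - r)

X0 = prior_mass_inside(r0)
rng = random.Random(7)
xs = []
while len(xs) < 20_000:
    theta = rng.gauss(0.0, 1.0)
    if abs(theta - y) < r0:   # rejection step: keeps draws from the truncated prior
        xs.append(prior_mass_inside(abs(theta - y)))
print(X0, max(xs), sum(xs) / len(xs), X0 / 2)
```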

Algorithm
The algorithm based on the method of the previous section is as follows:
– Iteration 1: sample N points $\theta_{1,i}$ independently from the prior π(θ), determine $\theta_1 = \arg\min_{1\le i\le N} L(\theta_{1,i})$ and set $\varphi_1 = L(\theta_1)$.
– Iteration 2: obtain the N current values $\theta_{2,i}$ by reproducing the $\theta_{1,i}$'s, except for $\theta_1$, which is replaced by a draw from the prior distribution π conditional upon $L(\theta) \ge \varphi_1$; then select $\theta_2 = \arg\min_{1\le i\le N} L(\theta_{2,i})$ and set $\varphi_2 = L(\theta_2)$.
– Iterate the above step until a given stopping rule is satisfied, for instance when the approximation $\hat Z$ changes very little, or when the maximal value of L(θ) is reached, if it is known.
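The algorithm can be sketched end to end on the one-dimensional Gaussian toy model (prior N(0, 1), likelihood N(y; θ, 1)). Two simplifications are assumptions of this sketch, not part of the method as stated: $x_i$ is replaced by its deterministic estimate exp(−i/N), and the run stops after a fixed number of iterations. The constrained draw uses exact inverse-c.d.f. sampling, which is only possible because here {L(θ) > ϕ_i} is an interval; general problems need a different sampler, such as MCMC. All names are hypothetical.

```python
import math
import random
from statistics import NormalDist

STD = NormalDist()  # prior pi = N(0, 1)

def likelihood(theta, y):
    # L(theta) = N(y; theta, 1)
    return math.exp(-0.5 * (y - theta) ** 2) / math.sqrt(2 * math.pi)

def draw_constrained(y, r, rng):
    # prior draw conditional on |theta - y| < r, i.e. on L(theta) > L(y + r)
    lo, hi = STD.cdf(y - r), STD.cdf(y + r)
    return STD.inv_cdf(lo + rng.random() * (hi - lo))

def nested_sampling(y, N=100, n_iter=1500, seed=0):
    rng = random.Random(seed)
    live = [rng.gauss(0.0, 1.0) for _ in range(N)]   # iteration 1: N prior draws
    z_hat, x_prev = 0.0, 1.0
    for i in range(1, n_iter + 1):
        # lowest likelihood among the live points = largest |theta - y|
        worst = max(range(N), key=lambda k: abs(live[k] - y))
        phi_i = likelihood(live[worst], y)
        x_i = math.exp(-i / N)                        # deterministic estimate of x_i
        z_hat += phi_i * (x_prev - x_i)               # Z_hat = sum phi_i (x_{i-1} - x_i)
        x_prev = x_i
        live[worst] = draw_constrained(y, abs(live[worst] - y), rng)
    return z_hat

y = 1.0
z_true = math.exp(-y * y / 4) / (2 * math.sqrt(math.pi))
print(math.log(nested_sampling(y)) - math.log(z_true))  # should be close to 0
```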

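$$ \hat Z = \sum_{i=1}^{j} \varphi_i\,(x_{i-1} - x_i), $$
where j is the number of iterations performed, $x_0 = 1$, and $x_i$ is replaced by its estimate $\exp(-i/N)$.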
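Because $\varphi_i$ and $x_i$ can span hundreds of orders of magnitude, $\hat Z$ is best accumulated on the log scale. The following sketch (hypothetical function name) takes the $\log\varphi_i$, uses $x_i = \exp(-i/N)$, and combines the terms with a log-sum-exp:

```python
import math

def log_evidence(log_phis, N):
    """log Z_hat for Z_hat = sum_i phi_i (x_{i-1} - x_i), with x_i = exp(-i / N)."""
    # log(x_{i-1} - x_i) = -(i - 1) / N + log(1 - exp(-1 / N))
    log_width = math.log(-math.expm1(-1.0 / N))
    terms = [lp - (i - 1) / N + log_width for i, lp in enumerate(log_phis, start=1)]
    m = max(terms)
    return m + math.log(sum(math.exp(t - m) for t in terms))

N, j = 50, 500
print(log_evidence([-1000.0] * j, N), -1000.0 + math.log1p(-math.exp(-j / N)))
```

With a constant $\varphi_i = e^{-1000}$, which underflows to 0 as a double, the sum telescopes to $\log\hat Z = -1000 + \log(1 - e^{-j/N})$, and the log-scale computation recovers it.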

By-product of Nested Sampling
Skilling indicates that nested sampling provides simulations from the posterior distribution at no extra cost: "the existing sequence of points θ_1, θ_2, θ_3, … already gives a set of posterior representatives, provided the i'th is assigned the appropriate importance ω_i L_i". Indeed, posterior expectations have the form
$$ \mathbb{E}\big(f(\theta) \mid y\big) = \frac{\int_\Theta \pi(\theta)L(\theta)f(\theta)\,d\theta}{\int_\Theta \pi(\theta)L(\theta)\,d\theta}. $$
We can use a single run of nested sampling to obtain estimators of both the numerator and the denominator, the latter being the evidence Z. The estimator of the numerator is
$$ \sum_{i=1}^{j} (x_{i-1} - x_i)\,\varphi_i\,f(\theta_i) \tag{1} $$
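The by-product can be illustrated on the one-dimensional Gaussian toy model, whose posterior is N(y/2, 1/2). The sketch below (hypothetical names; exact constrained sampling that only works because the constrained region is an interval; $x_i$ replaced by exp(−i/N)) records each discarded point $\theta_i$ with weight $(x_{i-1} - x_i)\varphi_i$, and estimates a posterior expectation as estimator (1) divided by $\hat Z$, the sum of the weights.

```python
import math
import random
from statistics import NormalDist

STD = NormalDist()  # prior pi = N(0, 1); toy likelihood N(y; theta, 1)

def run_nested(y, N=200, n_iter=3000, seed=0):
    """Return the discarded points theta_i and their weights (x_{i-1} - x_i) * phi_i."""
    rng = random.Random(seed)
    live = [rng.gauss(0.0, 1.0) for _ in range(N)]
    thetas, weights, x_prev = [], [], 1.0
    for i in range(1, n_iter + 1):
        worst = max(range(N), key=lambda k: abs(live[k] - y))
        r = abs(live[worst] - y)
        phi_i = math.exp(-0.5 * r * r) / math.sqrt(2 * math.pi)
        x_i = math.exp(-i / N)
        thetas.append(live[worst])
        weights.append((x_prev - x_i) * phi_i)
        x_prev = x_i
        lo, hi = STD.cdf(y - r), STD.cdf(y + r)   # exact draw from pi restricted to |theta - y| < r
        live[worst] = STD.inv_cdf(lo + rng.random() * (hi - lo))
    return thetas, weights

def posterior_expectation(f, thetas, weights):
    # estimator (1) divided by Z_hat = sum of the weights
    return sum(w * f(t) for t, w in zip(thetas, weights)) / sum(weights)

y = 1.0
thetas, weights = run_nested(y)
print(posterior_expectation(lambda t: t, thetas, weights), y / 2)  # posterior mean is y / 2
```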

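Lemma 3 (N. Chopin & C. P. Robert): Let $\tilde f(l) = \mathbb{E}_\pi\{f(\theta) \mid L(\theta) = l\}$ for l > 0. Then, if $\tilde f$ is absolutely continuous,
$$ \int_0^1 \varphi(x)\,\tilde f(\varphi(x))\,dx = \int \pi(\theta)L(\theta)f(\theta)\,d\theta. $$
Proof: Let $\psi : x \mapsto x\tilde f(x)$, assumed here to be positive and strictly increasing, so that $\{\psi(L(\theta)) > l\} = \{L(\theta) > \psi^{-1}(l)\}$. Conditioning on L(θ) and applying Lemma 2,
$$ \begin{aligned} \int \pi(\theta)L(\theta)f(\theta)\,d\theta &= \mathbb{E}_\pi[\psi\{L(\theta)\}] = \int_0^{+\infty} P_\pi\big(\psi\{L(\theta)\} > l\big)\,dl \\ &= \int_0^{+\infty} \varphi^{-1}\big(\psi^{-1}(l)\big)\,dl = \int_0^1 \psi(\varphi(x))\,dx, \end{aligned} $$
where the last equality computes the same area along the other axis, and $\psi(\varphi(x)) = \varphi(x)\,\tilde f(\varphi(x))$.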

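Termination
Skilling suggests stopping as soon as
$$ \max(L_1, \dots, L_N)\,X_j < f\,\hat Z_j, $$
where $L_1, \dots, L_N$ are the likelihood values of the N current points, $X_j$ is the current prior volume (estimated by $x_j$), $\hat Z_j$ is the current evidence estimate and f is some small fraction. Since ϕ is decreasing, $\max(L_1, \dots, L_N)\,X_j$ serves as a rough bound on the evidence still missing, $\int_0^{X_j} \varphi(x)\,dx$.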
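The stopping rule is a one-line test. In the sketch below, the function and argument names are hypothetical and the default fraction f = 10⁻³ is an arbitrary choice:

```python
import math

def should_terminate(live_likelihoods, x_j, z_j, frac=1e-3):
    """Stopping rule max(L_1, ..., L_N) * X_j < f * Z_j.

    live_likelihoods: likelihoods of the N current points;
    x_j: current prior-volume estimate, e.g. exp(-j / N);
    z_j: current evidence estimate; frac: the fraction f.
    """
    return max(live_likelihoods) * x_j < frac * z_j

N = 100
live = [0.39] * N   # hypothetical live-point likelihoods
print(should_terminate(live, math.exp(-1200 / N), 0.22))
print(should_terminate(live, math.exp(-200 / N), 0.22))
```

With these values the rule fires after j = 1200 iterations ($x_j \approx 6 \times 10^{-6}$) but not after j = 200 ($x_j \approx 0.14$).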

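Size of N
The larger N is, the smaller the variability of the approximation. Since the error in $\log x_k$ is about $\sqrt{k}/N$ and reaching a prior volume x takes about $k = N\log(1/x)$ iterations, this error is about $\sqrt{\log(1/x)/N}$: the spread of $\log\hat Z$ shrinks roughly like $1/\sqrt{N}$, while the number of iterations grows linearly with N.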
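The effect of N can be seen by simulating the nested ladder alone: the true $x_i$ shrink by factors $t_i$ (drawn as $U^{1/N}$, which has the same distribution as the maximum of N uniforms), while the weights use the estimate $\exp(-i/N)$. The sketch assumes a hypothetical test curve ϕ(x) = exp(−x/S); the spread of $\log\hat J$ should drop by roughly $\sqrt{10}$ from N = 10 to N = 100.

```python
import math
import random
import statistics

S = 0.01  # hypothetical test curve phi(x) = exp(-x / S)

def nested_log_estimate(N, rng, x_stop=1e-6):
    """log of sum_i phi(x_i) (exp(-(i-1)/N) - exp(-i/N)) along one random nested ladder."""
    x, total, i = 1.0, 0.0, 0
    while x > x_stop:
        i += 1
        x *= rng.random() ** (1.0 / N)                  # t_i = U^(1/N): law of the max of N uniforms
        w = math.exp(-(i - 1) / N) - math.exp(-i / N)  # weight from the estimate x_i = exp(-i/N)
        total += math.exp(-x / S) * w
    return math.log(total)

true_log = math.log(S * (1 - math.exp(-1 / S)))
rng = random.Random(8)
for N in (10, 100):
    est = [nested_log_estimate(N, rng) for _ in range(200)]
    print(N, statistics.mean(est) - true_log, statistics.stdev(est))
```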

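How to sample N points from the constrained parameter space
Use an MCMC method, i.e. construct a Markov chain whose invariant distribution is the truncated prior (π restricted to $\{L(\theta) > \varphi_i\}$). A convenient starting point is a copy of one of the surviving points, which already satisfies the constraint.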
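A minimal MCMC sketch, assuming the one-dimensional toy model: random-walk Metropolis targeting the prior N(0, 1) restricted to |θ − y| < r (the set {L(θ) > L_0} in that model). Proposals outside the constraint are rejected, which leaves the truncated prior invariant. The step size, starting point and chain length are hypothetical; the chain average is compared with the exact mean of the truncated normal.

```python
import math
import random
from statistics import NormalDist

def constrained_rw_metropolis(theta0, y, r, n_steps, step=0.3, seed=0):
    """Random-walk Metropolis targeting N(0, 1) restricted to |theta - y| < r.

    theta0 must already satisfy the constraint (e.g. a copy of a surviving live point).
    """
    rng = random.Random(seed)
    theta, chain = theta0, []
    for _ in range(n_steps):
        prop = theta + step * rng.gauss(0.0, 1.0)
        if abs(prop - y) < r:                                   # outside the constraint: reject
            ratio = math.exp(-0.5 * (prop ** 2 - theta ** 2))   # pi(prop) / pi(theta)
            if rng.random() < ratio:
                theta = prop
        chain.append(theta)
    return chain

y, r = 1.0, 0.5
chain = constrained_rw_metropolis(theta0=1.0, y=y, r=r, n_steps=20_000)
a, b = y - r, y + r
std = NormalDist()
exact_mean = (std.pdf(a) - std.pdf(b)) / (std.cdf(b) - std.cdf(a))  # mean of N(0, 1) truncated to (a, b)
print(sum(chain) / len(chain), exact_mean)
```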

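A decentred Gaussian example
The prior is
$$ \pi(\theta) = \prod_{k=1}^{d} \frac{1}{\sqrt{2\pi}} \exp\Big(-\frac{1}{2}\big(\theta^{(k)}\big)^2\Big) $$
and the likelihood is
$$ L(y\mid\theta) = \prod_{k=1}^{d} \frac{1}{\sqrt{2\pi}} \exp\Big(-\frac{1}{2}\big(y_k - \theta^{(k)}\big)^2\Big). $$
In this example, we can calculate the evidence analytically:
$$ Z = \int_{\mathbb{R}^d} L(\theta)\pi(\theta)\,d\theta = \frac{\exp\big(-\sum_{k=1}^{d} y_k^2 / 4\big)}{2^d\,\pi^{d/2}}. $$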
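The closed-form evidence is convenient for checking estimators on the log scale. The sketch below implements the formula above and, for d = 1, compares it with a brute-force midpoint quadrature of the integral (the integration range and grid size are arbitrary; the function names are hypothetical).

```python
import math

def log_evidence_exact(y):
    # log Z = -sum_k y_k^2 / 4 - d log 2 - (d / 2) log(pi)
    d = len(y)
    return -sum(v * v for v in y) / 4 - d * math.log(2) - 0.5 * d * math.log(math.pi)

def evidence_quadrature_1d(y, lo=-20.0, hi=20.0, n=100_000):
    # midpoint rule for int N(theta; 0, 1) N(y; theta, 1) dtheta  (d = 1)
    h = (hi - lo) / n
    total = 0.0
    for i in range(n):
        t = lo + (i + 0.5) * h
        total += math.exp(-0.5 * t * t - 0.5 * (y - t) ** 2)
    return total * h / (2 * math.pi)

print(log_evidence_exact([10.0]), math.log(evidence_quadrature_1d(10.0)))
print(log_evidence_exact([3.0] * 5))   # the d = 5, y = (3, 3, 3, 3, 3) setting
```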

Figure: Graph of ϕ(x) and the trace of $(x_i, \varphi(x_i))$ with d = 1 and y = 10.

Figure: The prior distribution and the likelihood with d = 1 and y = 10.

Figure: The box-plot of $\log\hat Z - \log Z$ with d = 1 and y = 10 for nested sampling and Monte Carlo.

Figure: The box-plot of $\log\hat Z - \log Z$ with d = 5 and y = (3, 3, 3, 3, 3).

A Probit Model
We consider the arsenic dataset and a probit model studied in Chapter 5 of Gelman & Hill (2006). The observations are independent Bernoulli variables $y_i$ such that $P(y_i = 1 \mid x_i) = \Phi(x_i^T\theta)$, where $x_i$ is a vector of d covariates, θ is a vector parameter of size d, and Φ denotes the standard normal distribution function. In this particular example, d = 7.

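The prior is
$$ \theta \sim \mathcal{N}(0, 10^2 I_d) $$
and the likelihood is
$$ L(\theta) = \prod_{i=1}^{n} \Phi(x_i^T\theta)^{y_i}\,\big(1 - \Phi(x_i^T\theta)\big)^{1-y_i}. $$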
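For this model the log-likelihood is what a nested sampler (or any MCMC step) evaluates repeatedly. A sketch of the log-prior and log-likelihood follows; the arsenic data are not reproduced here, so the covariates and responses below are hypothetical synthetic values, and the first covariate is taken as an intercept purely for illustration.

```python
import math
import random

def log_Phi(z):
    # log of the standard normal c.d.f.; erfc keeps precision in the lower tail
    # (it still underflows for z below about -38)
    return math.log(0.5 * math.erfc(-z / math.sqrt(2.0)))

def probit_loglik(theta, X, y):
    # sum_i y_i log Phi(x_i^T theta) + (1 - y_i) log(1 - Phi(x_i^T theta)), with 1 - Phi(z) = Phi(-z)
    total = 0.0
    for xi, yi in zip(X, y):
        z = sum(a * b for a, b in zip(xi, theta))
        total += log_Phi(z) if yi == 1 else log_Phi(-z)
    return total

def log_prior(theta, sd=10.0):
    # theta ~ N(0, sd^2 I_d)
    d = len(theta)
    return (-0.5 * sum(t * t for t in theta) / sd ** 2
            - d * math.log(sd) - 0.5 * d * math.log(2 * math.pi))

# hypothetical synthetic data with d = 7 covariates
rng = random.Random(10)
d, n = 7, 200
X = [[1.0] + [rng.gauss(0.0, 1.0) for _ in range(d - 1)] for _ in range(n)]
y = [rng.randint(0, 1) for _ in range(n)]
print(probit_loglik([0.0] * d, X, y), n * math.log(0.5))  # at theta = 0 every Phi(0) = 1/2
```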

Figure: The box-plot of $\log\hat Z$ with N = 20 for HMC and random-walk MCMC. The blue line marks the value of log Z obtained with Chib's method.

Posterior Samples
We use the Gaussian example to illustrate this result. Let
$$ f(\theta) = \exp\Big(-3\theta + \frac{9d}{2}\Big). $$
Figure: The box-plots of the log-relative errors $\log\hat Z - \log Z$ and $\log\hat{\mathbb{E}}(f) - \log\mathbb{E}(f)$.

Conclusion
– Nested sampling reverses the accepted approach to Bayesian
computation by putting the evidence first.
– Nested sampling samples more sparsely from the prior in regions
where the likelihood is low and more densely where the likelihood
is high, resulting in greater efficiency than a sampler that draws
directly from the prior.
– The procedure runs with an evolving collection of N points,
where N can be chosen small for speed or large for accuracy.
– Nested sampling always reduces a multidimensional integral to
the integral of a one-dimensional monotonic function, no matter
how many dimensions θ occupies, and no matter how strange the
shape of the likelihood function L(θ) is.

Problems
– How to generate N independent points in the constrained
parameter space is an important problem. Techniques to do so
effectively and efficiently may vary from problem to problem.
– Termination is another practical problem.

Thank you!
