1. The document discusses Rao–Blackwellisation, a technique that can make MCMC sampling more efficient.
2. Rao–Blackwellisation reduces the variance of MCMC estimates by analytically integrating out part of the randomness of the simulation, here the number of times each accepted value is repeated.
3. It can be applied to any Metropolis–Hastings algorithm and is asymptotically more efficient than standard MCMC, at a controlled additional computational cost.
1. Vanilla Rao–Blackwellisation of Metropolis–Hastings algorithms
Christian P. Robert
Université Paris-Dauphine and CREST, Paris, France
Joint works with Randal Douc, Pierre Jacob and Murray Smith
xian@ceremade.dauphine.fr
August 9, 2010
2. Main themes
1 Rao–Blackwellisation of MCMC estimators.
2 Can be performed in any Metropolis–Hastings algorithm.
3 Asymptotically more efficient than the usual MCMC estimator, at a controlled additional computing cost.
4 Can take advantage of parallel capacities at a very basic level.
8. Metropolis–Hastings revisited
Metropolis–Hastings algorithm
1 We wish to approximate
    I = \frac{\int h(x)\,\pi(x)\,dx}{\int \pi(x)\,dx} = \int h(x)\,\bar\pi(x)\,dx
2 x ↦ π(x) is known but not ∫ π(x) dx.
3 Approximate I with
    \delta = \frac{1}{n}\sum_{t=1}^{n} h(x^{(t)}),
  where (x^{(t)}) is a Markov chain with limiting distribution π̄.
4 Convergence obtained from the Law of Large Numbers or the CLT for Markov chains.
12. Metropolis–Hastings revisited
Metropolis–Hastings algorithm
Suppose that x^{(t)} is drawn.
1 Simulate y_t ∼ q(·|x^{(t)}).
2 Set x^{(t+1)} = y_t with probability
    \alpha(x^{(t)}, y_t) = \min\left(1, \frac{\pi(y_t)\,q(x^{(t)}\mid y_t)}{\pi(x^{(t)})\,q(y_t\mid x^{(t)})}\right)
  Otherwise, set x^{(t+1)} = x^{(t)}.
3 α is such that the detailed balance equation is satisfied:
    \pi(x)\,q(y\mid x)\,\alpha(x, y) = \pi(y)\,q(x\mid y)\,\alpha(y, x),
  ⊲ π̄ is the stationary distribution of (x^{(t)}).
◮ The accepted candidates are simulated with the rejection algorithm.
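The transition above can be sketched in a few lines. The following is a minimal illustration, assuming a standard-normal target (the slides leave π generic) and a symmetric Gaussian random-walk proposal, so the ratio q(x|y)/q(y|x) cancels in α:

```python
import math
import random

random.seed(1)

def log_pi(x):
    # Unnormalised target: a standard normal, up to the unknown constant
    # (a stand-in example; the slides leave pi generic).
    return -0.5 * x * x

def metropolis_hastings(n, tau, x0=0.0):
    """Random-walk Metropolis-Hastings with symmetric proposal
    q(.|x) = N(x, tau^2); the ratio q(x|y)/q(y|x) cancels in alpha."""
    chain = [x0]
    for _ in range(n - 1):
        x = chain[-1]
        y = random.gauss(x, tau)
        if random.random() < min(1.0, math.exp(log_pi(y) - log_pi(x))):
            chain.append(y)   # accept the candidate
        else:
            chain.append(x)   # reject: repeat the current value
    return chain

chain = metropolis_hastings(20000, tau=2.0)
delta = sum(chain) / len(chain)  # ergodic average estimating E[X] = 0
```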
16. Metropolis–Hastings revisited
Some properties of the HM algorithm
1 An alternative representation of the estimator δ is
    \delta = \frac{1}{n}\sum_{t=1}^{n} h(x^{(t)}) = \frac{1}{N}\sum_{i=1}^{M_N} n_i\,h(z_i),
  where
  the z_i's are the accepted y_j's,
  M_N is the number of accepted y_j's till time N,
  n_i is the number of times z_i appears in the sequence (x^{(t)})_t.
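The identity between the two representations can be checked numerically, under the same illustrative standard-normal target; for simplicity the starting value is counted as the first z_i:

```python
import math
import random

random.seed(2)

def log_pi(x):
    return -0.5 * x * x  # stand-in unnormalised target (standard normal)

# Random-walk MH, recording each accepted value z_i together with the
# number of times n_i it is repeated in the chain (the starting value
# is counted as the first z for simplicity).
N, tau = 10000, 2.0
x = 0.0
chain = [x]
zs, ns = [x], [1]
for _ in range(N - 1):
    y = random.gauss(x, tau)
    if random.random() < min(1.0, math.exp(log_pi(y) - log_pi(x))):
        x = y
        zs.append(x)   # new accepted value
        ns.append(1)
    else:
        ns[-1] += 1    # current value repeated one more time
    chain.append(x)

h = lambda v: v * v
delta_chain = sum(h(v) for v in chain) / N        # (1/n) sum over the chain
delta_z = sum(n * h(z) for n, z in zip(ns, zs)) / N  # weighted sum over z_i
# The two representations coincide: sum(ns) == N by construction.
```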
17. Metropolis–Hastings revisited
The transition of the accepted values satisfies
    \tilde q(\cdot\mid z_i) = \frac{\alpha(z_i,\cdot)\,q(\cdot\mid z_i)}{p(z_i)} \le \frac{q(\cdot\mid z_i)}{p(z_i)},
where p(z_i) = ∫ α(z_i, y) q(y|z_i) dy. To simulate according to q̃(·|z_i):
1 Propose a candidate y ∼ q(·|z_i).
2 Accept with probability
    \tilde q(y\mid z_i)\Big/\frac{q(y\mid z_i)}{p(z_i)} = \alpha(z_i, y).
  Otherwise, reject it and start again.
3 ◮ this is the transition of the HM algorithm.
The transition kernel q̃ admits π̃ as a stationary distribution:
    \tilde\pi(x)\,\tilde q(y\mid x)
      = \frac{\pi(x)\,p(x)}{\int \pi(u)\,p(u)\,du}\,\frac{\alpha(x,y)\,q(y\mid x)}{p(x)}
      = \frac{\pi(x)\,\alpha(x,y)\,q(y\mid x)}{\int \pi(u)\,p(u)\,du}
      = \frac{\pi(y)\,\alpha(y,x)\,q(x\mid y)}{\int \pi(u)\,p(u)\,du}
      = \tilde\pi(y)\,\tilde q(x\mid y).
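The rejection mechanism above can be sketched as follows, again with an illustrative standard-normal target and symmetric Gaussian proposal (`sample_qtilde` is a hypothetical helper name):

```python
import math
import random

random.seed(3)

def log_pi(x):
    return -0.5 * x * x  # stand-in unnormalised target

def alpha(z, y):
    # symmetric Gaussian proposal, so alpha reduces to min(1, pi(y)/pi(z))
    return min(1.0, math.exp(log_pi(y) - log_pi(z)))

def sample_qtilde(z, tau=2.0):
    """Draw from q~(.|z), proportional to alpha(z,.) q(.|z), by rejection:
    propose y ~ q(.|z) and accept with probability alpha(z, y) -- exactly
    the accept step of the MH transition.  The number of trials needed
    is a Geometric(p(z)) variable."""
    trials = 0
    while True:
        trials += 1
        y = random.gauss(z, tau)
        if random.random() < alpha(z, y):
            return y, trials

draws = [sample_qtilde(0.0)[0] for _ in range(2000)]
```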
24. Metropolis–Hastings revisited
Lemma
The sequence (z_i, n_i) satisfies
1 (z_i, n_i)_i is a Markov chain;
2 z_{i+1} and n_i are independent given z_i;
3 n_i is distributed as a geometric random variable with probability parameter
    p(z_i) := \int \alpha(z_i, y)\,q(y\mid z_i)\,dy;   (1)
4 (z_i)_i is a Markov chain with transition kernel Q(z, dy) = q̃(y|z) dy and stationary distribution π̃ such that
    \tilde q(\cdot\mid z) \propto \alpha(z, \cdot)\,q(\cdot\mid z) \quad\text{and}\quad \tilde\pi(\cdot) \propto \pi(\cdot)\,p(\cdot).
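Point 3 of the Lemma can be checked empirically: the mean of the simulated n_i should match 1/p(z_i). A sketch, assuming a standard-normal target and Gaussian random-walk proposal (`p_hat` and `draw_n` are illustrative names):

```python
import math
import random

random.seed(4)

def log_pi(x):
    return -0.5 * x * x  # stand-in unnormalised target

def alpha(z, y):
    return min(1.0, math.exp(log_pi(y) - log_pi(z)))

tau, z = 2.0, 0.5

# Monte Carlo approximation of p(z) = integral of alpha(z, y) q(y|z) dy
p_hat = sum(alpha(z, random.gauss(z, tau)) for _ in range(200000)) / 200000

def draw_n(z):
    """Number of times z is repeated: 1 + number of rejected proposals."""
    n = 1
    while random.random() >= alpha(z, random.gauss(z, tau)):
        n += 1
    return n

ns = [draw_n(z) for _ in range(20000)]
mean_n = sum(ns) / len(ns)
# Geometric(p(z)) has mean 1/p(z), so mean_n should be close to 1/p_hat.
```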
29. Metropolis–Hastings revisited
Old bottle, new wine [or vice-versa]
[Dependence diagram: the accepted values form a Markov chain z_{i−1} → z_i → z_{i+1}, while each occupation count n_{i−1}, n_i hangs off its own z and is independent of the rest of the sequence given that z.]
    \delta = \frac{1}{n}\sum_{t=1}^{n} h(x^{(t)}) = \frac{1}{N}\sum_{i=1}^{M_N} n_i\,h(z_i).
36. Formal importance sampling
Importance sampling perspective
1 A natural idea:
    \delta^* \simeq \frac{\sum_{i=1}^{M_N} h(z_i)/p(z_i)}{\sum_{i=1}^{M_N} 1/p(z_i)} = \frac{\sum_{i=1}^{M_N} h(z_i)\,\pi(z_i)/\tilde\pi(z_i)}{\sum_{i=1}^{M_N} \pi(z_i)/\tilde\pi(z_i)}.
2 But p is not available in closed form.
3 The geometric n_i is the obvious replacement, the one already used in the original Metropolis–Hastings estimate, since E[n_i] = 1/p(z_i).
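On a toy two-point target where p is available in closed form, the idealised importance-sampling estimator δ* can be computed exactly. This is only an illustration of the weights 1/p(z_i), not of the general continuous setting:

```python
import random

random.seed(5)

# Toy two-point target with unnormalised weights pi(0)=7, pi(1)=3,
# and a deterministic "flip" proposal q(1-x | x) = 1, so that
# p(x) = alpha(x, 1-x) = min(1, pi(1-x)/pi(x)) is known in closed form.
pi_w = {0: 7.0, 1: 3.0}

def p(x):
    return min(1.0, pi_w[1 - x] / pi_w[x])

# Run Metropolis-Hastings, keeping only the accepted values z_i
x, zs = 0, []
for _ in range(20000):
    if random.random() < p(x):
        x = 1 - x
        zs.append(x)

h = lambda v: v  # estimate E[h(X)] = P(X = 1) = 3/10 = 0.3
num = sum(h(z) / p(z) for z in zs)
den = sum(1.0 / p(z) for z in zs)
delta_star = num / den  # importance-sampling correction of the z_i sample
```

The accepted values are distributed according to π̃ ∝ π(·)p(·), and the exact weights 1/p(z_i) correct back to π, so delta_star approaches 0.3.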
39. Formal importance sampling
The crude estimate of 1/p(z_i),
    n_i = 1 + \sum_{j=1}^{\infty} \prod_{\ell \le j} \mathbb{I}\{u_\ell \ge \alpha(z_i, y_\ell)\},
can be improved:
Lemma
If (y_j)_j is an iid sequence with distribution q(y|z_i), the quantity
    \hat\xi_i = 1 + \sum_{j=1}^{\infty} \prod_{\ell \le j} \{1 - \alpha(z_i, y_\ell)\}
is an unbiased estimator of 1/p(z_i) whose variance, conditional on z_i, is lower than the conditional variance of n_i, {1 − p(z_i)}/p²(z_i).
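ξ̂_i replaces each indicator in n_i by its conditional expectation 1 − α(z_i, y_ℓ), hence the Rao–Blackwell flavour. A sketch comparing the two estimators of 1/p(z), assuming a standard-normal target; the infinite sum is truncated once the running product is numerically negligible:

```python
import math
import random

random.seed(6)

def log_pi(x):
    return -0.5 * x * x  # stand-in unnormalised target

def alpha(z, y):
    return min(1.0, math.exp(log_pi(y) - log_pi(z)))

tau, z0 = 2.0, 0.0

def n_crude(z):
    """Crude geometric estimate of 1/p(z): proposals up to acceptance."""
    n = 1
    while random.random() >= alpha(z, random.gauss(z, tau)):
        n += 1
    return n

def xi_hat(z, tol=1e-12):
    """Rao-Blackwellised estimate of 1/p(z): each indicator is replaced
    by its expectation 1 - alpha(z, y_l); the (in principle infinite)
    sum is truncated once the running product is negligible."""
    total, prod = 1.0, 1.0
    while prod > tol:
        prod *= 1.0 - alpha(z, random.gauss(z, tau))
        total += prod
    return total

R = 20000
ns = [n_crude(z0) for _ in range(R)]
xs = [xi_hat(z0) for _ in range(R)]

def var(v):
    m = sum(v) / len(v)
    return sum((a - m) ** 2 for a in v) / len(v)
# Same mean 1/p(z0), but xi_hat has the smaller empirical variance.
```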
40. Formal importance sampling
Rao–Blackwellised, for sure?
    \hat\xi_i = 1 + \sum_{j=1}^{\infty} \prod_{\ell \le j} \{1 - \alpha(z_i, y_\ell)\}
1 The sum is infinite, but it has finitely many nonzero terms with positive probability: the product vanishes as soon as some α(z_i, y_ℓ) = 1, where
    \alpha(x^{(t)}, y_t) = \min\left(1, \frac{\pi(y_t)\,q(x^{(t)}\mid y_t)}{\pi(x^{(t)})\,q(y_t\mid x^{(t)})}\right).
  For example: take a symmetric random walk as a proposal.
2 What if we wish to be sure that the sum is finite?
41. Variance reduction
Variance improvement
Proposition
If (y_j)_j is an iid sequence with distribution q(y|z_i) and (u_j)_j is an iid uniform sequence, for any k ≥ 0, the quantity
    \hat\xi_i^k = 1 + \sum_{j=1}^{\infty} \prod_{1 \le \ell \le k \wedge j} \{1 - \alpha(z_i, y_\ell)\} \prod_{k+1 \le \ell \le j} \mathbb{I}\{u_\ell \ge \alpha(z_i, y_\ell)\}
is an unbiased estimator of 1/p(z_i) with an almost surely finite number of terms. Moreover, for k ≥ 1,
    \mathbb{V}\left[\hat\xi_i^k \mid z_i\right] = \frac{1 - p(z_i)}{p^2(z_i)} - \frac{1 - (1 - 2p(z_i) + r(z_i))^k}{2p(z_i) - r(z_i)}\,\frac{(p(z_i) - r(z_i))(2 - p(z_i))}{p^2(z_i)},
where p(z_i) := ∫ α(z_i, y) q(y|z_i) dy and r(z_i) := ∫ α²(z_i, y) q(y|z_i) dy. Therefore, we have
    \mathbb{V}\left[\hat\xi_i \mid z_i\right] \le \mathbb{V}\left[\hat\xi_i^k \mid z_i\right] \le \mathbb{V}\left[\hat\xi_i^0 \mid z_i\right] = \mathbb{V}[n_i \mid z_i].
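A sketch of ξ̂_i^k under the same illustrative standard-normal target: factors of rank at most k use the Rao–Blackwellised weight 1 − α, the later ones keep the rejection indicator, so the sum stops almost surely at the first acceptance after rank k; k = 0 recovers the crude n_i:

```python
import math
import random

random.seed(7)

def log_pi(x):
    return -0.5 * x * x  # stand-in unnormalised target

def alpha(z, y):
    return min(1.0, math.exp(log_pi(y) - log_pi(z)))

tau = 2.0

def xi_hat_k(z, k):
    """Unbiased estimator of 1/p(z) with an a.s. finite number of terms:
    factors of rank <= k use the weight 1 - alpha, later ranks keep the
    rejection indicator, so the running product vanishes at the first
    acceptance after rank k."""
    total, prod, j = 1.0, 1.0, 0
    while prod > 0.0:
        j += 1
        a = alpha(z, random.gauss(z, tau))
        if j <= k:
            prod *= 1.0 - a
        elif random.random() < a:
            prod = 0.0  # acceptance: this term and all later ones vanish
        total += prod
    return total

R = 20000
est0 = [xi_hat_k(0.0, 0) for _ in range(R)]  # k = 0 recovers the crude n_i
est5 = [xi_hat_k(0.0, 5) for _ in range(R)]

def var(v):
    m = sum(v) / len(v)
    return sum((a - m) ** 2 for a in v) / len(v)
# Same mean 1/p(0); the variance decreases as k grows.
```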
45. Variance reduction
[Dependence diagram: as before, the accepted values form a Markov chain z_{i−1} → z_i → z_{i+1}, but unlike the n_i's, the weights ξ̂^k_{i−1}, ξ̂^k_i are no longer independent of the neighbouring states given their z.]
    \hat\xi_i^k = 1 + \sum_{j=1}^{\infty} \prod_{1 \le \ell \le k \wedge j} \{1 - \alpha(z_i, y_\ell)\} \prod_{k+1 \le \ell \le j} \mathbb{I}\{u_\ell \ge \alpha(z_i, y_\ell)\}
The resulting self-normalised estimator is
    \delta_M^k = \frac{\sum_{i=1}^{M} \hat\xi_i^k\, h(z_i)}{\sum_{i=1}^{M} \hat\xi_i^k}.
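Putting the pieces together: a sketch of the Rao–Blackwellised estimator δ_M^k, assuming a standard-normal target and Gaussian random-walk proposal. Each call advances the chain of accepted values while computing ξ̂_i^k from the same proposals (plus the extra ones needed to finish the sum when acceptance occurs within the first k proposals); `next_z_and_xi` is an illustrative helper name:

```python
import math
import random

random.seed(8)

def log_pi(x):
    return -0.5 * x * x  # stand-in unnormalised target (standard normal)

def alpha(z, y):
    return min(1.0, math.exp(log_pi(y) - log_pi(z)))

tau, k, M = 2.0, 2, 5000

def next_z_and_xi(z):
    """One transition of the accepted-value chain together with xi_i^k:
    terms of rank <= k use the weight 1 - alpha, later ones the rejection
    indicator; the first accepted proposal becomes the next state."""
    total, prod, j = 1.0, 1.0, 0
    accepted = None
    while prod > 0.0:
        j += 1
        y = random.gauss(z, tau)
        a = alpha(z, y)
        u = random.random()
        if accepted is None and u < a:
            accepted = y            # first acceptance: next state z_{i+1}
        if j <= k:
            prod *= 1.0 - a
        elif u < a:
            prod = 0.0              # acceptance after rank k ends the sum
        total += prod
    return total, accepted

h = lambda v: v * v                 # estimate E[X^2] = 1 under the target
z, num, den = 0.0, 0.0, 0.0
for _ in range(M):
    xi, z_next = next_z_and_xi(z)
    num += xi * h(z)
    den += xi
    z = z_next
delta_k = num / den                 # the estimator delta_M^k
```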
49. Asymptotic results
Let
    \delta_M^k = \frac{\sum_{i=1}^{M} \hat\xi_i^k\, h(z_i)}{\sum_{i=1}^{M} \hat\xi_i^k}.
For any positive function ϕ, we denote C_ϕ = {h; |h/ϕ|_∞ < ∞}. Assume that there exists a positive function ϕ ≥ 1 such that
    \forall h \in C_\varphi, \quad \frac{\sum_{i=1}^{M} h(z_i)/p(z_i)}{\sum_{i=1}^{M} 1/p(z_i)} \xrightarrow{P} \pi(h)
and a positive function ψ such that
    \forall h \in C_\psi, \quad \sqrt{M}\left(\frac{\sum_{i=1}^{M} h(z_i)/p(z_i)}{\sum_{i=1}^{M} 1/p(z_i)} - \pi(h)\right) \xrightarrow{L} \mathcal{N}(0, \Gamma(h)).
Theorem
Under the assumption that π(p) > 0, the following convergence properties hold:
i) If h ∈ C_ϕ, then
    \delta_M^k \xrightarrow{P} \pi(h) \quad (M \to \infty) \qquad (◮ Consistency)
ii) If, in addition, h²/p ∈ C_ϕ and h ∈ C_ψ, then
    \sqrt{M}\,(\delta_M^k - \pi(h)) \xrightarrow{L} \mathcal{N}(0, V_k[h - \pi(h)]) \quad (M \to \infty), \qquad (◮ CLT)
where
    V_k(h) := \pi(p) \int \pi(dz)\, \mathbb{V}\left[\hat\xi_i^k \mid z\right] h^2(z)\, p(z) + \Gamma(h).
52. Asymptotic results
We will need some additional assumptions. Assume a maximal inequality for the Markov chain (z_i)_i: there exists a measurable function ζ such that for any starting point x,
    \forall h \in C_\zeta, \quad P_x\left(\sup_{0 \le i \le N} \left|\sum_{j=0}^{i} [h(z_j) - \tilde\pi(h)]\right| > \epsilon\right) \le \frac{N\,C_h(x)}{\epsilon^2}.
Moreover, assume that there exists φ ≥ 1 such that for any starting point x,
    \forall h \in C_\phi, \quad \tilde Q^n(x, h) \xrightarrow{P} \tilde\pi(h) = \pi(ph)/\pi(p).
Theorem
Assume that h is such that h/p ∈ C_ζ and {C_{h/p}, h²/p²} ⊂ C_φ. Assume moreover that
    \sqrt{M}\left(\delta_M^0 - \pi(h)\right) \xrightarrow{L} \mathcal{N}(0, V_0[h - \pi(h)]).
Then, for any starting point x,
    \sqrt{M_N}\left(\frac{\sum_{t=1}^{N} h(x^{(t)})}{N} - \pi(h)\right) \xrightarrow{L} \mathcal{N}(0, V_0[h - \pi(h)]) \quad (N \to \infty),
where M_N is defined by
    \sum_{i=1}^{M_N} \hat\xi_i^0 \le N < \sum_{i=1}^{M_N+1} \hat\xi_i^0.
58. Illustrations
Variance gain (1)

         h(x) = x   h(x) = x²   h(x) = I_{X>0}   h(x) = p(x)
τ = .1   0.971      0.953       0.957            0.207
τ = 2    0.965      0.942       0.875            0.861
τ = 5    0.913      0.982       0.785            0.826
τ = 7    0.899      0.982       0.768            0.820

Ratios of the empirical variances of δ^∞ and δ estimating E[h(X)]: 100 MCMC iterations over 10³ replications of a random walk Gaussian proposal with scale τ.
59. Illustrations
Illustration (1)
Figure: Overlay of the variations of 250 iid realisations of the estimates δ (gold) and δ^∞ (grey) of E[X] = 0 for 1000 iterations, along with the 90% interquantile range for the estimates δ (brown) and δ^∞ (pink), in the setting of a random walk Gaussian proposal with scale τ = 10.
60. Illustrations
Extra computational effort

          median   mean   q.8   q.9   time
τ = .25   0.0      8.85   4.9   13    4.2
τ = .50   0.0      6.76   4     11    2.25
τ = 1.0   0.25     6.15   4     10    2.5
τ = 2.0   0.20     5.90   3.5   8.5   4.5

Additional computing effort: median and mean numbers of additional iterations, 80% and 90% quantiles of the additional iterations, and ratio of the average R computing times, obtained over 10⁵ simulations.
61. Illustrations
Illustration (2)
Figure: Overlay of the variations of 500 iid realisations of the estimates δ (deep grey), δ^∞ (medium grey) and of the importance sampling version (light grey) of E[X] = 10 when X ∼ Exp(.1) for 100 iterations, along with the 90% interquantile ranges (same colour code), in the setting of an independent exponential proposal with scale µ = 0.02.