1. The document discusses Rao–Blackwellisation, a technique that can make MCMC sampling more efficient.
2. Rao–Blackwellisation reduces the variance of MCMC estimates by analytically integrating out part of the randomness of the simulation, here the number of times each accepted value is repeated.
3. It can be applied to any Metropolis–Hastings algorithm and is asymptotically more efficient than standard MCMC, at a controlled additional computational cost.
1. Vanilla Rao–Blackwellisation of Metropolis–Hastings algorithms
Christian P. Robert
Université Paris-Dauphine and CREST, Paris, France
Joint works with Randal Douc, Pierre Jacob and Murray Smith
xian@ceremade.dauphine.fr
August 9, 2010
2. Main themes
1 Rao–Blackwellisation of MCMC estimators.
2 Can be performed in any Metropolis–Hastings algorithm.
3 Asymptotically more efficient than the usual MCMC estimator, at a controlled additional computing cost.
4 Can take advantage of parallel capacities at a very basic level.
8. Metropolis–Hastings revisited
Metropolis–Hastings algorithm
1 We wish to approximate
    I = \frac{\int h(x)\,\pi(x)\,dx}{\int \pi(x)\,dx} = \int h(x)\,\bar\pi(x)\,dx
2 x ↦ π(x) is known but not ∫ π(x) dx.
3 Approximate I with
    \delta = \frac{1}{n}\sum_{t=1}^{n} h(x^{(t)}),
  where (x^{(t)}) is a Markov chain with limiting distribution π̄.
4 Convergence obtained from the Law of Large Numbers or the CLT for Markov chains.
12. Metropolis–Hastings revisited
Metropolis–Hastings algorithm
Suppose that x^{(t)} is drawn.
1 Simulate y_t ∼ q(·|x^{(t)}).
2 Set x^{(t+1)} = y_t with probability
    \alpha(x^{(t)}, y_t) = \min\left(1, \frac{\pi(y_t)\,q(x^{(t)}\mid y_t)}{\pi(x^{(t)})\,q(y_t\mid x^{(t)})}\right)
  Otherwise, set x^{(t+1)} = x^{(t)}.
3 α is such that the detailed balance equation is satisfied:
    \pi(x)\,q(y\mid x)\,\alpha(x, y) = \pi(y)\,q(x\mid y)\,\alpha(y, x),
  ⊲ π̄ is the stationary distribution of (x^{(t)}).
◮ The accepted candidates are simulated with the rejection algorithm.
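The transition above can be sketched in a few lines. The following is a minimal illustration, assuming a standard-normal target (the slides leave π generic) and a symmetric Gaussian random-walk proposal, so the ratio q(x|y)/q(y|x) cancels in α:

```python
import math
import random

random.seed(1)

def log_pi(x):
    # Unnormalised target: a standard normal, up to the unknown constant
    # (a stand-in example; the slides leave pi generic).
    return -0.5 * x * x

def metropolis_hastings(n, tau, x0=0.0):
    """Random-walk Metropolis-Hastings with symmetric proposal
    q(.|x) = N(x, tau^2); the ratio q(x|y)/q(y|x) cancels in alpha."""
    chain = [x0]
    for _ in range(n - 1):
        x = chain[-1]
        y = random.gauss(x, tau)
        if random.random() < min(1.0, math.exp(log_pi(y) - log_pi(x))):
            chain.append(y)   # accept the candidate
        else:
            chain.append(x)   # reject: repeat the current value
    return chain

chain = metropolis_hastings(20000, tau=2.0)
delta = sum(chain) / len(chain)  # ergodic average estimating E[X] = 0
```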
16. Metropolis–Hastings revisited
Some properties of the HM algorithm
1 An alternative representation of the estimator δ is
    \delta = \frac{1}{n}\sum_{t=1}^{n} h(x^{(t)}) = \frac{1}{N}\sum_{i=1}^{M_N} n_i\,h(z_i),
  where
  the z_i's are the accepted y_j's,
  M_N is the number of accepted y_j's till time N,
  n_i is the number of times z_i appears in the sequence (x^{(t)})_t.
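The identity between the two representations can be checked numerically, under the same illustrative standard-normal target; for simplicity the starting value is counted as the first z_i:

```python
import math
import random

random.seed(2)

def log_pi(x):
    return -0.5 * x * x  # stand-in unnormalised target (standard normal)

# Random-walk MH, recording each accepted value z_i together with the
# number of times n_i it is repeated in the chain (the starting value
# is counted as the first z for simplicity).
N, tau = 10000, 2.0
x = 0.0
chain = [x]
zs, ns = [x], [1]
for _ in range(N - 1):
    y = random.gauss(x, tau)
    if random.random() < min(1.0, math.exp(log_pi(y) - log_pi(x))):
        x = y
        zs.append(x)   # new accepted value
        ns.append(1)
    else:
        ns[-1] += 1    # current value repeated one more time
    chain.append(x)

h = lambda v: v * v
delta_chain = sum(h(v) for v in chain) / N        # (1/n) sum over the chain
delta_z = sum(n * h(z) for n, z in zip(ns, zs)) / N  # weighted sum over z_i
# The two representations coincide: sum(ns) == N by construction.
```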
17. Metropolis–Hastings revisited
The transition of the accepted values satisfies
    \tilde q(\cdot\mid z_i) = \frac{\alpha(z_i,\cdot)\,q(\cdot\mid z_i)}{p(z_i)} \le \frac{q(\cdot\mid z_i)}{p(z_i)},
where p(z_i) = ∫ α(z_i, y) q(y|z_i) dy. To simulate according to q̃(·|z_i):
1 Propose a candidate y ∼ q(·|z_i).
2 Accept with probability
    \tilde q(y\mid z_i)\Big/\frac{q(y\mid z_i)}{p(z_i)} = \alpha(z_i, y).
  Otherwise, reject it and start again.
3 ◮ this is the transition of the HM algorithm.
The transition kernel q̃ admits π̃ as a stationary distribution:
    \tilde\pi(x)\,\tilde q(y\mid x)
      = \frac{\pi(x)\,p(x)}{\int \pi(u)\,p(u)\,du}\,\frac{\alpha(x,y)\,q(y\mid x)}{p(x)}
      = \frac{\pi(x)\,\alpha(x,y)\,q(y\mid x)}{\int \pi(u)\,p(u)\,du}
      = \frac{\pi(y)\,\alpha(y,x)\,q(x\mid y)}{\int \pi(u)\,p(u)\,du}
      = \tilde\pi(y)\,\tilde q(x\mid y).
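The rejection mechanism above can be sketched as follows, again with an illustrative standard-normal target and symmetric Gaussian proposal (`sample_qtilde` is a hypothetical helper name):

```python
import math
import random

random.seed(3)

def log_pi(x):
    return -0.5 * x * x  # stand-in unnormalised target

def alpha(z, y):
    # symmetric Gaussian proposal, so alpha reduces to min(1, pi(y)/pi(z))
    return min(1.0, math.exp(log_pi(y) - log_pi(z)))

def sample_qtilde(z, tau=2.0):
    """Draw from q~(.|z), proportional to alpha(z,.) q(.|z), by rejection:
    propose y ~ q(.|z) and accept with probability alpha(z, y) -- exactly
    the accept step of the MH transition.  The number of trials needed
    is a Geometric(p(z)) variable."""
    trials = 0
    while True:
        trials += 1
        y = random.gauss(z, tau)
        if random.random() < alpha(z, y):
            return y, trials

draws = [sample_qtilde(0.0)[0] for _ in range(2000)]
```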
24. Metropolis–Hastings revisited
Lemma
The sequence (z_i, n_i) satisfies
1 (z_i, n_i)_i is a Markov chain;
2 z_{i+1} and n_i are independent given z_i;
3 n_i is distributed as a geometric random variable with probability parameter
    p(z_i) := \int \alpha(z_i, y)\,q(y\mid z_i)\,dy;   (1)
4 (z_i)_i is a Markov chain with transition kernel Q(z, dy) = q̃(y|z) dy and stationary distribution π̃ such that
    \tilde q(\cdot\mid z) \propto \alpha(z, \cdot)\,q(\cdot\mid z) \quad\text{and}\quad \tilde\pi(\cdot) \propto \pi(\cdot)\,p(\cdot).
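Point 3 of the Lemma can be checked empirically: the mean of the simulated n_i should match 1/p(z_i). A sketch, assuming a standard-normal target and Gaussian random-walk proposal (`p_hat` and `draw_n` are illustrative names):

```python
import math
import random

random.seed(4)

def log_pi(x):
    return -0.5 * x * x  # stand-in unnormalised target

def alpha(z, y):
    return min(1.0, math.exp(log_pi(y) - log_pi(z)))

tau, z = 2.0, 0.5

# Monte Carlo approximation of p(z) = integral of alpha(z, y) q(y|z) dy
p_hat = sum(alpha(z, random.gauss(z, tau)) for _ in range(200000)) / 200000

def draw_n(z):
    """Number of times z is repeated: 1 + number of rejected proposals."""
    n = 1
    while random.random() >= alpha(z, random.gauss(z, tau)):
        n += 1
    return n

ns = [draw_n(z) for _ in range(20000)]
mean_n = sum(ns) / len(ns)
# Geometric(p(z)) has mean 1/p(z), so mean_n should be close to 1/p_hat.
```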
29. Metropolis–Hastings revisited
Old bottle, new wine [or vice-versa]
[Dependence diagram: the accepted values form a Markov chain z_{i−1} → z_i → z_{i+1}, while each occupation count n_{i−1}, n_i hangs off its own z and is independent of the rest of the sequence given that z.]
    \delta = \frac{1}{n}\sum_{t=1}^{n} h(x^{(t)}) = \frac{1}{N}\sum_{i=1}^{M_N} n_i\,h(z_i).
36. Formal importance sampling
Importance sampling perspective
1 A natural idea:
    \delta^* \simeq \frac{\sum_{i=1}^{M_N} h(z_i)/p(z_i)}{\sum_{i=1}^{M_N} 1/p(z_i)} = \frac{\sum_{i=1}^{M_N} h(z_i)\,\pi(z_i)/\tilde\pi(z_i)}{\sum_{i=1}^{M_N} \pi(z_i)/\tilde\pi(z_i)}.
2 But p is not available in closed form.
3 The geometric n_i is the obvious replacement, the one already used in the original Metropolis–Hastings estimate, since E[n_i] = 1/p(z_i).
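On a toy two-point target where p is available in closed form, the idealised importance-sampling estimator δ* can be computed exactly. This is only an illustration of the weights 1/p(z_i), not of the general continuous setting:

```python
import random

random.seed(5)

# Toy two-point target with unnormalised weights pi(0)=7, pi(1)=3,
# and a deterministic "flip" proposal q(1-x | x) = 1, so that
# p(x) = alpha(x, 1-x) = min(1, pi(1-x)/pi(x)) is known in closed form.
pi_w = {0: 7.0, 1: 3.0}

def p(x):
    return min(1.0, pi_w[1 - x] / pi_w[x])

# Run Metropolis-Hastings, keeping only the accepted values z_i
x, zs = 0, []
for _ in range(20000):
    if random.random() < p(x):
        x = 1 - x
        zs.append(x)

h = lambda v: v  # estimate E[h(X)] = P(X = 1) = 3/10 = 0.3
num = sum(h(z) / p(z) for z in zs)
den = sum(1.0 / p(z) for z in zs)
delta_star = num / den  # importance-sampling correction of the z_i sample
```

The accepted values are distributed according to π̃ ∝ π(·)p(·), and the exact weights 1/p(z_i) correct back to π, so delta_star approaches 0.3.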
39. Formal importance sampling
The crude estimate of 1/p(z_i),
    n_i = 1 + \sum_{j=1}^{\infty} \prod_{\ell \le j} \mathbb{I}\{u_\ell \ge \alpha(z_i, y_\ell)\},
can be improved:
Lemma
If (y_j)_j is an iid sequence with distribution q(y|z_i), the quantity
    \hat\xi_i = 1 + \sum_{j=1}^{\infty} \prod_{\ell \le j} \{1 - \alpha(z_i, y_\ell)\}
is an unbiased estimator of 1/p(z_i) whose variance, conditional on z_i, is lower than the conditional variance of n_i, {1 − p(z_i)}/p²(z_i).
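ξ̂_i replaces each indicator in n_i by its conditional expectation 1 − α(z_i, y_ℓ), hence the Rao–Blackwell flavour. A sketch comparing the two estimators of 1/p(z), assuming a standard-normal target; the infinite sum is truncated once the running product is numerically negligible:

```python
import math
import random

random.seed(6)

def log_pi(x):
    return -0.5 * x * x  # stand-in unnormalised target

def alpha(z, y):
    return min(1.0, math.exp(log_pi(y) - log_pi(z)))

tau, z0 = 2.0, 0.0

def n_crude(z):
    """Crude geometric estimate of 1/p(z): proposals up to acceptance."""
    n = 1
    while random.random() >= alpha(z, random.gauss(z, tau)):
        n += 1
    return n

def xi_hat(z, tol=1e-12):
    """Rao-Blackwellised estimate of 1/p(z): each indicator is replaced
    by its expectation 1 - alpha(z, y_l); the (in principle infinite)
    sum is truncated once the running product is negligible."""
    total, prod = 1.0, 1.0
    while prod > tol:
        prod *= 1.0 - alpha(z, random.gauss(z, tau))
        total += prod
    return total

R = 20000
ns = [n_crude(z0) for _ in range(R)]
xs = [xi_hat(z0) for _ in range(R)]

def var(v):
    m = sum(v) / len(v)
    return sum((a - m) ** 2 for a in v) / len(v)
# Same mean 1/p(z0), but xi_hat has the smaller empirical variance.
```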
40. Formal importance sampling
Rao–Blackwellised, for sure?
    \hat\xi_i = 1 + \sum_{j=1}^{\infty} \prod_{\ell \le j} \{1 - \alpha(z_i, y_\ell)\}
1 The sum is infinite, but it has finitely many nonzero terms with positive probability: the product vanishes as soon as some α(z_i, y_ℓ) = 1, where
    \alpha(x^{(t)}, y_t) = \min\left(1, \frac{\pi(y_t)\,q(x^{(t)}\mid y_t)}{\pi(x^{(t)})\,q(y_t\mid x^{(t)})}\right).
  For example: take a symmetric random walk as a proposal.
2 What if we wish to be sure that the sum is finite?
41. Variance reduction
Variance improvement
Proposition
If (y_j)_j is an iid sequence with distribution q(y|z_i) and (u_j)_j is an iid uniform sequence, for any k ≥ 0, the quantity
    \hat\xi_i^k = 1 + \sum_{j=1}^{\infty} \prod_{1 \le \ell \le k \wedge j} \{1 - \alpha(z_i, y_\ell)\} \prod_{k+1 \le \ell \le j} \mathbb{I}\{u_\ell \ge \alpha(z_i, y_\ell)\}
is an unbiased estimator of 1/p(z_i) with an almost surely finite number of terms. Moreover, for k ≥ 1,
    \mathbb{V}\left[\hat\xi_i^k \mid z_i\right] = \frac{1 - p(z_i)}{p^2(z_i)} - \frac{1 - (1 - 2p(z_i) + r(z_i))^k}{2p(z_i) - r(z_i)}\,\frac{(p(z_i) - r(z_i))(2 - p(z_i))}{p^2(z_i)},
where p(z_i) := ∫ α(z_i, y) q(y|z_i) dy and r(z_i) := ∫ α²(z_i, y) q(y|z_i) dy. Therefore, we have
    \mathbb{V}\left[\hat\xi_i \mid z_i\right] \le \mathbb{V}\left[\hat\xi_i^k \mid z_i\right] \le \mathbb{V}\left[\hat\xi_i^0 \mid z_i\right] = \mathbb{V}[n_i \mid z_i].
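A sketch of ξ̂_i^k under the same illustrative standard-normal target: factors of rank at most k use the Rao–Blackwellised weight 1 − α, the later ones keep the rejection indicator, so the sum stops almost surely at the first acceptance after rank k; k = 0 recovers the crude n_i:

```python
import math
import random

random.seed(7)

def log_pi(x):
    return -0.5 * x * x  # stand-in unnormalised target

def alpha(z, y):
    return min(1.0, math.exp(log_pi(y) - log_pi(z)))

tau = 2.0

def xi_hat_k(z, k):
    """Unbiased estimator of 1/p(z) with an a.s. finite number of terms:
    factors of rank <= k use the weight 1 - alpha, later ranks keep the
    rejection indicator, so the running product vanishes at the first
    acceptance after rank k."""
    total, prod, j = 1.0, 1.0, 0
    while prod > 0.0:
        j += 1
        a = alpha(z, random.gauss(z, tau))
        if j <= k:
            prod *= 1.0 - a
        elif random.random() < a:
            prod = 0.0  # acceptance: this term and all later ones vanish
        total += prod
    return total

R = 20000
est0 = [xi_hat_k(0.0, 0) for _ in range(R)]  # k = 0 recovers the crude n_i
est5 = [xi_hat_k(0.0, 5) for _ in range(R)]

def var(v):
    m = sum(v) / len(v)
    return sum((a - m) ** 2 for a in v) / len(v)
# Same mean 1/p(0); the variance decreases as k grows.
```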
45. Variance reduction
[Dependence diagram: as before, the accepted values form a Markov chain z_{i−1} → z_i → z_{i+1}, but unlike the n_i's, the weights ξ̂^k_{i−1}, ξ̂^k_i are no longer independent of the neighbouring states given their z.]
    \hat\xi_i^k = 1 + \sum_{j=1}^{\infty} \prod_{1 \le \ell \le k \wedge j} \{1 - \alpha(z_i, y_\ell)\} \prod_{k+1 \le \ell \le j} \mathbb{I}\{u_\ell \ge \alpha(z_i, y_\ell)\}
The resulting self-normalised estimator is
    \delta_M^k = \frac{\sum_{i=1}^{M} \hat\xi_i^k\, h(z_i)}{\sum_{i=1}^{M} \hat\xi_i^k}.
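Putting the pieces together: a sketch of the Rao–Blackwellised estimator δ_M^k, assuming a standard-normal target and Gaussian random-walk proposal. Each call advances the chain of accepted values while computing ξ̂_i^k from the same proposals (plus the extra ones needed to finish the sum when acceptance occurs within the first k proposals); `next_z_and_xi` is an illustrative helper name:

```python
import math
import random

random.seed(8)

def log_pi(x):
    return -0.5 * x * x  # stand-in unnormalised target (standard normal)

def alpha(z, y):
    return min(1.0, math.exp(log_pi(y) - log_pi(z)))

tau, k, M = 2.0, 2, 5000

def next_z_and_xi(z):
    """One transition of the accepted-value chain together with xi_i^k:
    terms of rank <= k use the weight 1 - alpha, later ones the rejection
    indicator; the first accepted proposal becomes the next state."""
    total, prod, j = 1.0, 1.0, 0
    accepted = None
    while prod > 0.0:
        j += 1
        y = random.gauss(z, tau)
        a = alpha(z, y)
        u = random.random()
        if accepted is None and u < a:
            accepted = y            # first acceptance: next state z_{i+1}
        if j <= k:
            prod *= 1.0 - a
        elif u < a:
            prod = 0.0              # acceptance after rank k ends the sum
        total += prod
    return total, accepted

h = lambda v: v * v                 # estimate E[X^2] = 1 under the target
z, num, den = 0.0, 0.0, 0.0
for _ in range(M):
    xi, z_next = next_z_and_xi(z)
    num += xi * h(z)
    den += xi
    z = z_next
delta_k = num / den                 # the estimator delta_M^k
```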
49. Asymptotic results
Let
    \delta_M^k = \frac{\sum_{i=1}^{M} \hat\xi_i^k\, h(z_i)}{\sum_{i=1}^{M} \hat\xi_i^k}.
For any positive function ϕ, we denote C_ϕ = {h; |h/ϕ|_∞ < ∞}. Assume that there exists a positive function ϕ ≥ 1 such that
    \forall h \in C_\varphi, \quad \frac{\sum_{i=1}^{M} h(z_i)/p(z_i)}{\sum_{i=1}^{M} 1/p(z_i)} \xrightarrow{P} \pi(h)
and a positive function ψ such that
    \forall h \in C_\psi, \quad \sqrt{M}\left(\frac{\sum_{i=1}^{M} h(z_i)/p(z_i)}{\sum_{i=1}^{M} 1/p(z_i)} - \pi(h)\right) \xrightarrow{L} \mathcal{N}(0, \Gamma(h)).
Theorem
Under the assumption that π(p) > 0, the following convergence properties hold:
i) If h ∈ C_ϕ, then
    \delta_M^k \xrightarrow{P} \pi(h) \quad (M \to \infty) \qquad (◮ Consistency)
ii) If, in addition, h²/p ∈ C_ϕ and h ∈ C_ψ, then
    \sqrt{M}\,(\delta_M^k - \pi(h)) \xrightarrow{L} \mathcal{N}(0, V_k[h - \pi(h)]) \quad (M \to \infty), \qquad (◮ CLT)
where
    V_k(h) := \pi(p) \int \pi(dz)\, \mathbb{V}\left[\hat\xi_i^k \mid z\right] h^2(z)\, p(z) + \Gamma(h).
52. Asymptotic results
We will need some additional assumptions. Assume a maximal inequality for the Markov chain (z_i)_i: there exists a measurable function ζ such that for any starting point x,
    \forall h \in C_\zeta, \quad P_x\left(\sup_{0 \le i \le N} \left|\sum_{j=0}^{i} [h(z_j) - \tilde\pi(h)]\right| > \epsilon\right) \le \frac{N\,C_h(x)}{\epsilon^2}.
Moreover, assume that there exists φ ≥ 1 such that for any starting point x,
    \forall h \in C_\phi, \quad \tilde Q^n(x, h) \xrightarrow{P} \tilde\pi(h) = \pi(ph)/\pi(p).
Theorem
Assume that h is such that h/p ∈ C_ζ and {C_{h/p}, h²/p²} ⊂ C_φ. Assume moreover that
    \sqrt{M}\left(\delta_M^0 - \pi(h)\right) \xrightarrow{L} \mathcal{N}(0, V_0[h - \pi(h)]).
Then, for any starting point x,
    \sqrt{M_N}\left(\frac{\sum_{t=1}^{N} h(x^{(t)})}{N} - \pi(h)\right) \xrightarrow{L} \mathcal{N}(0, V_0[h - \pi(h)]) \quad (N \to \infty),
where M_N is defined by
    \sum_{i=1}^{M_N} \hat\xi_i^0 \le N < \sum_{i=1}^{M_N+1} \hat\xi_i^0.
58. Illustrations
Variance gain (1)

         h(x) = x   h(x) = x²   h(x) = I_{X>0}   h(x) = p(x)
τ = .1   0.971      0.953       0.957            0.207
τ = 2    0.965      0.942       0.875            0.861
τ = 5    0.913      0.982       0.785            0.826
τ = 7    0.899      0.982       0.768            0.820

Ratios of the empirical variances of δ^∞ and δ estimating E[h(X)]: 100 MCMC iterations over 10³ replications of a random walk Gaussian proposal with scale τ.
59. Illustrations
Illustration (1)
Figure: Overlay of the variations of 250 iid realisations of the estimates δ (gold) and δ^∞ (grey) of E[X] = 0 for 1000 iterations, along with the 90% interquantile range for the estimates δ (brown) and δ^∞ (pink), in the setting of a random walk Gaussian proposal with scale τ = 10.
60. Illustrations
Extra computational effort

          median   mean   q.8   q.9   time
τ = .25   0.0      8.85   4.9   13    4.2
τ = .50   0.0      6.76   4     11    2.25
τ = 1.0   0.25     6.15   4     10    2.5
τ = 2.0   0.20     5.90   3.5   8.5   4.5

Additional computing effort: median and mean numbers of additional iterations, 80% and 90% quantiles of the additional iterations, and ratio of the average R computing times, obtained over 10⁵ simulations.
61. Illustrations
Illustration (2)
Figure: Overlay of the variations of 500 iid realisations of the estimates δ (deep grey), δ^∞ (medium grey) and of the importance sampling version (light grey) of E[X] = 10 when X ∼ Exp(.1) for 100 iterations, along with the 90% interquantile ranges (same colour code), in the setting of an independent exponential proposal with scale µ = 0.02.