### Scaling analysis of multiple-try MCMC methods
- 1. Scaling analysis of multiple-try MCMC methods. Randal Douc (randal.douc@it-sudparis.eu). Joint work with Mylène Bédard and Eric Moulines.
- 2. Themes
  1. MCMC algorithms with multiple proposals: MCTM, MTM-C.
  2. Analysis through optimal scaling (introduced by Roberts, Gelman, Gilks, 1998).
  3. The Hit-and-Run algorithm.
- 6. Outline
  1. Introduction
  2. MH algorithms with multiple proposals: Random Walk MH; the MCTM algorithm; MTM-C algorithms
  3. Optimal scaling: main results
  4. Optimising the speed-up process: the MCTM algorithm; MTM-C algorithms
  5. Conclusion
- 12. Metropolis–Hastings (MH) algorithm
  1. We wish to approximate
     $I = \int h(x)\,\frac{\pi(x)}{\int \pi(u)\,du}\,dx = \int h(x)\,\bar\pi(x)\,dx$.
  2. $x \mapsto \pi(x)$ is known, but $\int \pi(u)\,du$ is not.
  3. Approximate $I$ with $\tilde I = \frac{1}{n}\sum_{t=1}^{n} h(X[t])$, where $(X[t])$ is a Markov chain with limiting distribution $\bar\pi$.
  4. In the MH algorithm, this last condition is obtained from a detailed balance condition: $\forall x, y,\ \pi(x)p(x, y) = \pi(y)p(y, x)$.
  5. The quality of the approximation is assessed via the Law of Large Numbers or the CLT for Markov chains.
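As a concrete illustration of the estimator $\tilde I$, here is a minimal random-walk MH sketch in Python. The function names, the Gaussian proposal, and the $\mathcal{N}(0,1)$ target with $h(x) = x^2$ are illustrative assumptions, not part of the talk; note that only the unnormalised $\pi$ is needed, since the normalising constant cancels in the acceptance ratio.

```python
import math
import random

def mh_chain(log_pi, x0, n, proposal_scale=1.0):
    """Metropolis-Hastings with a symmetric Gaussian random-walk proposal.
    log_pi is the unnormalised log-density x -> ln pi(x); the unknown
    normalising constant cancels in the acceptance ratio."""
    chain, x = [x0], x0
    for _ in range(n):
        y = x + random.gauss(0.0, proposal_scale)   # Y ~ q(x; .)
        # symmetric proposal => alpha(x, y) = 1 ^ pi(y)/pi(x)
        if math.log(random.random()) < log_pi(y) - log_pi(x):
            x = y                                   # accept the proposal
        chain.append(x)                             # otherwise keep x
    return chain

# Approximate I = E[h(X)] for pi = N(0, 1) and h(x) = x^2 (true value 1)
random.seed(0)
xs = mh_chain(lambda x: -0.5 * x * x, 0.0, 20000)
I_hat = sum(x * x for x in xs[1000:]) / len(xs[1000:])
```

The ergodic average `I_hat` should approach 1 as the chain length grows, in line with point 3 above.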
- 18. Random Walk MH
  Notation: w.p. = with probability.
  Algorithm (MCMC). If $X[t] = x$, how is $X[t+1]$ simulated?
  (a) $Y \sim q(x; \cdot)$.
  (b) Accept the proposal $X[t+1] = Y$ w.p. $\alpha(x, Y)$, where $\alpha(x, y) = 1 \wedge \frac{\pi(y)\,q(y; x)}{\pi(x)\,q(x; y)}$.
  (c) Otherwise $X[t+1] = x$.
  The chain is $\pi$-reversible since $\pi(x)\,\alpha(x, y)\,q(x; y) = \pi(y)\,\alpha(y, x)\,q(y; x)$.
- 22. Random Walk MH (symmetric proposal)
  Assume that $q(x; y) = q(y; x)$ ◮ the instrumental kernel is symmetric. Typically $Y = X + U$ where $U$ has a symmetric distribution.
  Algorithm (MCMC with symmetric proposal). If $X[t] = x$, how is $X[t+1]$ simulated?
  (a) $Y \sim q(x; \cdot)$.
  (b) Accept the proposal $X[t+1] = Y$ w.p. $\alpha(x, Y)$, where $\alpha(x, y) = 1 \wedge \frac{\pi(y)}{\pi(x)}$.
  (c) Otherwise $X[t+1] = x$.
  The chain is $\pi$-reversible since $\pi(x)\,\alpha(x, y)\,q(x; y) = \pi(y)\,\alpha(y, x)\,q(y; x)$.
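The reversibility identity can be checked numerically for a concrete symmetric kernel. This is a standalone sketch; the target `pi`, the kernel width, and the test points are illustrative choices.

```python
import math

def q(x, y, s=1.0):
    """Symmetric Gaussian random-walk kernel density q(x; y)."""
    return math.exp(-0.5 * ((y - x) / s) ** 2) / (s * math.sqrt(2 * math.pi))

def pi(x):
    """Unnormalised N(0, 1) target."""
    return math.exp(-0.5 * x * x)

def alpha(x, y):
    """MH acceptance probability 1 ^ pi(y)q(y;x) / (pi(x)q(x;y))."""
    return min(1.0, pi(y) * q(y, x) / (pi(x) * q(x, y)))

# detailed balance: pi(x) alpha(x,y) q(x;y) == pi(y) alpha(y,x) q(y;x)
x, y = 0.3, -1.2
lhs = pi(x) * alpha(x, y) * q(x, y)
rhs = pi(y) * alpha(y, x) * q(y, x)
```

Both sides reduce to $\min\{\pi(x)q(x;y),\ \pi(y)q(y;x)\}$, which is why the identity holds for any pair $(x, y)$.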
- 24. MCTM algorithm: multiple proposal MCMC
  1. Liu, Liang, Wong (2000) introduced the multiple proposal MCMC. Generalized to multiple correlated proposals by Craiu and Lemieux (2007).
  2. A pool of candidates is drawn: $(Y^1, \ldots, Y^K) \mid X[t] \sim q(X[t]; \cdot)$.
  3. One candidate is selected a priori according to some "informative" criterion (e.g. favouring high values of $\pi$).
  4. The candidate is accepted with some well-chosen probability.
  ◮ Diversity of the candidates: some candidates are close to, others far away from, the current state.
  Some additional notation:
  $Y^j \mid X[t] \sim q_j(X[t]; \cdot)$  (◮ marginal distribution)  (1)
  $(Y^i)_{i \ne j} \mid X[t], Y^j \sim \bar q_j(X[t], Y^j; \cdot)$  (◮ simulation of the other candidates)  (2)
- 30. MCTM algorithm
  Assume that $q_j(x; y) = q_j(y; x)$.
  Algorithm (MCTM: Multiple Correlated Try Metropolis). If $X[t] = x$, how is $X[t+1]$ simulated?
  (a) $(Y^1, \ldots, Y^K) \sim q(x; \cdot)$. (◮ pool of candidates)
  (b) Draw an index $J \in \{1, \ldots, K\}$ with probability proportional to $[\pi(Y^1), \ldots, \pi(Y^K)]$. (◮ a priori selection)
  (c) $\{\tilde Y^{J,i}\}_{i \ne J} \sim \bar q_J(Y^J, x; \cdot)$. (◮ auxiliary variables)
  (d) Accept the proposal $X[t+1] = Y^J$ w.p. $\alpha_J(x, (Y^i)_{i=1}^K, (\tilde Y^{J,i})_{i \ne J})$, where
      $\alpha_j\big(x, (y^i)_{i=1}^K, (\tilde y^{j,i})_{i \ne j}\big) = 1 \wedge \frac{\sum_{i \ne j} \pi(y^i) + \pi(y^j)}{\sum_{i \ne j} \pi(\tilde y^{j,i}) + \pi(x)}$.  (3)
      (◮ MH acceptance probability)
  (e) Otherwise, $X[t+1] = X[t]$.
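Steps (a)–(e) can be sketched as follows for the special case of $K$ independent Gaussian proposals (the slides also allow correlated pools; the function names and the $\mathcal{N}(0,1)$ test target are illustrative assumptions).

```python
import math
import random

def mctm_step(x, log_pi, K=4, sigma=1.0):
    """One MCTM step with K independent Gaussian random-walk proposals."""
    # (a) pool of candidates around the current state x
    ys = [x + random.gauss(0.0, sigma) for _ in range(K)]
    ws = [math.exp(log_pi(y)) for y in ys]
    # (b) a priori selection: J proportional to [pi(Y^1), ..., pi(Y^K)]
    j = random.choices(range(K), weights=ws)[0]
    # (c) auxiliary variables drawn around the selected candidate Y^J
    aux = [ys[j] + random.gauss(0.0, sigma) for _ in range(K - 1)]
    # (d) acceptance ratio of Eq. (3)
    num = sum(ws)
    den = math.exp(log_pi(x)) + sum(math.exp(log_pi(z)) for z in aux)
    if random.random() < min(1.0, num / den):
        return ys[j]
    return x                      # (e) otherwise keep the current state

# Sanity check against a N(0, 1) target
random.seed(1)
x, samples = 0.0, []
for _ in range(20000):
    x = mctm_step(x, lambda z: -0.5 * z * z)
    samples.append(x)
mean = sum(samples) / len(samples)
var = sum((s - mean) ** 2 for s in samples) / len(samples)
```

The empirical mean and variance of the chain should approach 0 and 1 respectively, consistent with $\pi$-reversibility of the kernel.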
- 34. MCTM algorithm: properties
  1. It generalises the classical Random Walk Metropolis–Hastings algorithm (which is the case $K = 1$).
  2. It satisfies the detailed balance condition w.r.t. $\pi$: the unnormalised flow from $x$ to $y$ is
     $\sum_{j=1}^{K} \pi(x)\,\pi(y)\, q_j(x; y) \int\!\cdots\!\int \bar Q_j\Big(x, y; \prod_{i \ne j} dy^i\Big)\, \bar Q_j\Big(y, x; \prod_{i \ne j} d\tilde y^{j,i}\Big) \left[\frac{1}{\pi(y) + \sum_{i \ne j} \pi(y^i)} \wedge \frac{1}{\pi(x) + \sum_{i \ne j} \pi(\tilde y^{j,i})}\right]$
     ◮ symmetric w.r.t. $(x, y)$.
- 37. MCTM algorithm: simulation cost
  1. The MCTM uses the simulation of $K$ random variables for the pool of candidates and $K - 1$ auxiliary variables to compute the MH acceptance ratio.
  2. Can we reduce the number of simulated variables while keeping the diversity of the pool?
  3. Idea: draw one random variable and use transformations to create the pool of candidates and the auxiliary variables.
- 40. MTM-C algorithms
  Let $\Psi^i : \mathsf{X} \times [0,1)^r \to \mathsf{X}$ and $\Psi^{j,i} : \mathsf{X} \times \mathsf{X} \to \mathsf{X}$. Assume that
  1. for all $j \in \{1, \ldots, K\}$, $Y^j = \Psi^j(x, V)$, where $V \sim \mathcal{U}([0,1)^r)$ (◮ common random variable);
  2. for any $(i, j) \in \{1, \ldots, K\}^2$, $Y^i = \Psi^{j,i}(x, Y^j)$ (◮ reconstruction of the other candidates).  (4)
  Examples:
  - $\psi^i(x, v) = x + \sigma\, \Phi^{-1}(\langle v^i + v \rangle)$, where $v^i = \langle i a / K \rangle$, $a \in \mathbb{R}^r$, $\langle \cdot \rangle$ denotes the (componentwise) fractional part, and $\Phi$ is the cumulative distribution function of the standard normal distribution. ◮ Korobov sequence + Cranley–Patterson rotation.
  - $\psi^i(x, v) = x + \gamma^i\, \Phi^{-1}(v)$. ◮ Hit-and-Run algorithm.
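Property (4) can be demonstrated concretely for the Korobov/Cranley–Patterson maps above, in dimension $r = 1$ with $a = 1$ (illustrative choices): one uniform $V$ determines the whole pool, and any single candidate determines all the others.

```python
import random
from statistics import NormalDist

_ND = NormalDist()

def psi(i, x, v, K=4, sigma=1.0):
    """Psi^i(x, v) = x + sigma * Phi^{-1}(<i/K + v>)  (Korobov shift,
    a = 1, r = 1 -- illustrative assumptions)."""
    return x + sigma * _ND.inv_cdf(((i / K) + v) % 1.0)

def psi_recon(j, i, x, yj, K=4, sigma=1.0):
    """Psi^{j,i}(x, y^j): rebuild candidate i from candidate j alone,
    by inverting Phi to recover the common uniform."""
    u_j = _ND.cdf((yj - x) / sigma)          # recover <j/K + V>
    return x + sigma * _ND.inv_cdf((u_j + (i - j) / K) % 1.0)

# One uniform V drives the whole pool (Eq. (4) in action)
random.seed(3)
v = random.random()
pool = [psi(i, 0.0, v) for i in range(1, 5)]
# Rebuild the full pool from candidate j = 2 alone
rebuilt = [psi_recon(2, i, 0.0, pool[1]) for i in range(1, 5)]
```

`pool` and `rebuilt` agree up to floating-point error, so a single draw of $V$ indeed yields a diverse pool that any one candidate can regenerate.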
- 44. MTM-C algorithms
  Algorithm (MTM-C: Multiple Try Metropolis with common proposal).
  (a) Draw $V \sim \mathcal{U}([0,1)^r)$ and set $Y^i = \Psi^i(x, V)$ for $i = 1, \ldots, K$.
  (b) Draw an index $J \in \{1, \ldots, K\}$ with probability proportional to $[\pi(Y^1), \ldots, \pi(Y^K)]$.
  (c) Accept $X[t+1] = Y^J$ with probability $\bar\alpha_J(x, Y^J)$, where, for $j \in \{1, \ldots, K\}$,
      $\bar\alpha_j(x, y^j) = \alpha_j\big(x, \{\Psi^{j,i}(x, y^j)\}_{i=1}^K, \{\Psi^{j,i}(y^j, x)\}_{i \ne j}\big)$,  (5)
      with $\alpha_j$ given in (3).
  (d) Otherwise $X[t+1] = X[t]$.
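Putting steps (a)–(d) together for the Korobov/Cranley–Patterson maps ($r = 1$, $a = 1$, a scalar state; all illustrative assumptions), a full MTM-C step needs only one uniform draw per iteration for the whole pool, with the auxiliary pool reconstructed deterministically around $Y^J$.

```python
import math
import random
from statistics import NormalDist

_ND = NormalDist()

def _clip(u, eps=1e-12):
    """Keep quantile arguments strictly inside (0, 1)."""
    return min(max(u, eps), 1.0 - eps)

def _pool(center, anchor, j, K, sigma):
    """Candidates around `center` whose j-th member is `anchor`, for
    Psi^i(x, v) = x + sigma * Phi^{-1}(<i/K + v>)."""
    u_j = _ND.cdf((anchor - center) / sigma)      # recover <j/K + V>
    return [center + sigma * _ND.inv_cdf(_clip((u_j + (i - j) / K) % 1.0))
            for i in range(1, K + 1)]

def mtmc_step(x, log_pi, K=4, sigma=1.0):
    """One MTM-C step: a single uniform V drives all K candidates."""
    v = random.random()                           # (a) V ~ U[0, 1)
    ys = [x + sigma * _ND.inv_cdf(_clip((i / K + v) % 1.0))
          for i in range(1, K + 1)]
    ws = [math.exp(log_pi(y)) for y in ys]
    j = random.choices(range(K), weights=ws)[0]   # (b) a priori selection
    # auxiliary pool {Psi^{J,i}(Y^J, x)}: candidates around Y^J with x in
    # slot J, so the sum below equals pi(x) + sum_{i != J} pi(aux_i)
    aux = _pool(ys[j], x, j + 1, K, sigma)
    num, den = sum(ws), sum(math.exp(log_pi(z)) for z in aux)
    if random.random() < min(1.0, num / den):     # (c) accept via (5)+(3)
        return ys[j]
    return x                                      # (d) otherwise keep x

# Sanity check against a N(0, 1) target
random.seed(2)
x, samples = 0.0, []
for _ in range(20000):
    x = mtmc_step(x, lambda z: -0.5 * z * z)
    samples.append(x)
mean = sum(samples) / len(samples)
```

Compared with the MCTM step, each iteration simulates one uniform instead of $2K - 1$ Gaussians, which is exactly the saving motivating MTM-C.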
- 49. How to compare two MH algorithms
  ◮ Peskun ordering: if $P_1$ and $P_2$ are two $\pi$-reversible kernels and $p_1(x, y) \le p_2(x, y)$ for all $x \ne y$, then $P_2$ is better than $P_1$ in terms of the asymptotic variance of $N^{-1}\sum_{i=1}^{N} h(X_i)$.
  1. Off-diagonal order: not always easy to compare!
  2. Moreover, one expression of the asymptotic variance is
     $V = \mathrm{Var}_\pi(h) + 2 \sum_{t=1}^{\infty} \mathrm{Cov}_\pi\big(h(X_0), h(X_t)\big)$.
- 52. Original idea of optimal scaling
  For the RW-MH algorithm:
  1. Increase the dimension $T$.
  2. Target distribution: $\pi_T(x_{0:T}) = \prod_{t=0}^{T} f(x_t)$.
  3. Assume that $X_T[0] \sim \pi_T$.
  4. Make the random walk increasingly conservative: draw the candidate $Y_T = X_T[t] + \frac{\ell}{\sqrt{T}}\, U_T[t]$, where $U_T[t]$ is centered standard normal.
  5. What is the "best" $\ell$?
- 57. Theorem
  The first component of $(X_T[\lfloor Ts \rfloor])_{0 \le s \le 1}$ weakly converges in the Skorokhod topology to $(W[\lambda_\ell s],\ s \in \mathbb{R}^+)$, where $W$ is the stationary solution of the Langevin SDE
  $dW[s] = dB[s] + \frac{1}{2}\,[\ln f]'(W[s])\,ds$.
  In particular, the first component of $(X_T[0], X_T[\alpha_1 T], \ldots, X_T[\alpha_p T])$ converges weakly to the distribution of $(W[0], W[\lambda_\ell \alpha_1], \ldots, W[\lambda_\ell \alpha_p])$.
  Then $\ell$ is chosen to maximize the speed $\lambda_\ell = 2\ell^2\, \Phi\!\left(-\frac{\ell\sqrt{I}}{2}\right)$, where $I = \int \{[\ln f]'(x)\}^2 f(x)\,dx$.
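The maximisation of $\lambda_\ell$ is a one-dimensional problem that can be checked numerically. The sketch below uses a brute-force grid search (any 1-D optimiser would do; the helper names are illustrative) and, for $I = 1$, recovers the classical values $\ell^\star \approx 2.38$ and limiting acceptance rate $2\Phi(-\ell^\star\sqrt{I}/2) \approx 0.234$.

```python
import math

def std_normal_cdf(x):
    """Phi(x) via the error function (standard identity)."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def optimal_scale(I, n_grid=100_000, l_max=10.0):
    """Maximise the speed lambda_l = 2 l^2 Phi(-l sqrt(I)/2) over l
    by grid search on (0, l_max]."""
    best_l, best_lam = 0.0, -1.0
    for k in range(1, n_grid):
        l = k * l_max / n_grid
        lam = 2.0 * l * l * std_normal_cdf(-l * math.sqrt(I) / 2.0)
        if lam > best_lam:
            best_l, best_lam = l, lam
    return best_l, best_lam

l_star, _ = optimal_scale(I=1.0)
acc = 2.0 * std_normal_cdf(-l_star / 2.0)   # limiting acceptance rate
```

Note that $\ell^\star$ scales as $1/\sqrt{I}$, so rougher targets (larger $I$) call for smaller jumps, while the limiting acceptance rate is invariant in $I$.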
- 60. Main results: optimal scaling for the MCTM algorithm
  ◮ The pool of candidates:
  $Y^i_{T,t}[n+1] = X_{T,t}[n] + T^{-1/2}\, U^i_t[n+1]$, $\quad 0 \le t \le T$, $1 \le i \le K$,
  where for any $t \in \{0, \ldots, T\}$,
  $(U^i_t[n+1])_{i=1}^K \sim \mathcal{N}(0, \Sigma)$  (◮ MCTM)
  $U^i_t[n+1] = \psi^i(V_t)$, with $V_t \sim \mathcal{U}[0, 1]$  (◮ MTM-C)
  ◮ The auxiliary variables:
  $\tilde Y^{j,i}_{T,t}[n+1] = X_{T,t}[n] + T^{-1/2}\, \tilde U^{j,i}_t[n+1]$, $\quad i \ne j$.
- 62. Introduction MH algorithms with multiple proposals Optimal scaling Optimising the speed up process Conclusion Main results Théorème Suppose that XT [0] is distributed according to the target density πT . Then, the process (XT ,0 [sT ], s ∈ R+ ) weakly converges in the Skorokhod topology to the stationary solution (W [s], s ∈ R+ ) of the Langevin SDE 1 ′ dW [s] = λ1/2 dB[s] + λ [ln f ] (W [s])ds , 2 with λ λ I, (Γj )K , where Γj , 1 ≤ j ≤ K denotes the covariance j=1 j i ˜ j,i matrix of the random vector (U0 , (U0 )i=j , (U0 )i=j ). For the MCTM, Γj = Γj (Σ). 2K −1 α(Γ) = E A Gi − Var[Gi ]/2 i=1 , (6) where A is bounded lip. and (Gi )2K −1 ∼ N (0, Γ). i=1 K λ I, (Γj )K j=1 Γj1,1 × α IΓj , (7) 19 / 25
- 65. Plan
  1 Introduction
  2 MH algorithms with multiple proposals: Random Walk MH; MCTM algorithm; MTM-C algorithms
  3 Optimal scaling: Main results
  4 Optimising the speed up process: MCTM algorithm; MTM-C algorithms
  5 Conclusion
  20 / 25
- 66. MCTM algorithm. We optimize the speed λ = λ(I, (Γ^j(Σ))_{j=1}^K) over a subset G:
  ◮ G = {Σ = diag(ℓ₁², …, ℓ_K²), (ℓ₁, …, ℓ_K) ∈ R^K}: the proposals have different scales but are independent.
  ◮ G = {Σ = ℓ² Σ_a, ℓ² ∈ R₊}, where Σ_a is the extreme antithetic covariance matrix
  Σ_a = (K/(K−1)) I_K − (1/(K−1)) 1_K 1_K^T, with 1_K = (1, …, 1)^T.
  21 / 25
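The matrix Σ_a can be checked directly: it has unit diagonal, off-diagonal correlation −1/(K−1) (the most negative exchangeable correlation possible), and it is singular positive semidefinite, so the K increments sum to zero almost surely. A quick verification:

```python
import numpy as np

def extreme_antithetic(K):
    """Sigma_a = K/(K-1) I_K - 1/(K-1) 1_K 1_K^T."""
    one = np.ones((K, 1))
    return K / (K - 1) * np.eye(K) - (one @ one.T) / (K - 1)

K = 5
Sa = extreme_antithetic(K)
print(np.diag(Sa))             # unit variances
print(Sa[0, 1])                # correlation -1/(K-1)
print(np.linalg.eigvalsh(Sa))  # eigenvalues 0 and K/(K-1): PSD, singular
print(Sa.sum(axis=1))          # row sums 0: increments sum to zero
```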
- 67. MCTM algorithms
  Table: Optimal scaling constants, value of the speed, and mean acceptance rate for independent proposals

  | K  | 1    | 2    | 3    | 4    | 5    |
  |----|------|------|------|------|------|
  | ℓ⋆ | 2.38 | 2.64 | 2.82 | 2.99 | 3.12 |
  | λ⋆ | 1.32 | 2.24 | 2.94 | 3.51 | 4.00 |
  | a⋆ | 0.23 | 0.32 | 0.37 | 0.39 | 0.41 |

  22 / 25
- 68. MCTM algorithms
  Table: Optimal scaling constants, value of the speed, and mean acceptance rate for extreme antithetic proposals

  | K  | 1    | 2    | 3    | 4    | 5    |
  |----|------|------|------|------|------|
  | ℓ⋆ | 2.38 | 2.37 | 2.64 | 2.83 | 2.99 |
  | λ⋆ | 1.32 | 2.64 | 3.66 | 4.37 | 4.91 |
  | a⋆ | 0.23 | 0.46 | 0.52 | 0.54 | 0.55 |

  Table: Optimal scaling constants, value of the speed, and mean acceptance rate for the optimal covariance

  | K  | 1    | 2    | 3    | 4    | 5    |
  |----|------|------|------|------|------|
  | ℓ⋆ | 2.38 | 2.37 | 2.66 | 2.83 | 2.98 |
  | λ⋆ | 1.32 | 2.64 | 3.70 | 4.40 | 4.93 |
  | a⋆ | 0.23 | 0.46 | 0.52 | 0.55 | 0.56 |

  22 / 25
- 69. MTM-C algorithms
  Table: Optimal scaling constants, optimal value of the speed, and mean acceptance rate for the RQMC MTM algorithm based on the Korobov sequence and Cranley-Patterson rotations

  | K  | 1    | 2    | 3    | 4    | 5    |
  |----|------|------|------|------|------|
  | σ⋆ | 2.38 | 2.59 | 2.77 | 2.91 | 3.03 |
  | λ⋆ | 1.32 | 2.43 | 3.31 | 4.01 | 4.56 |
  | a⋆ | 0.23 | 0.36 | 0.42 | 0.47 | 0.50 |

  Table: Optimal scaling constants, value of the speed, and mean acceptance rate for the hit-and-run algorithm

  | K  | 1    | 2    | 4    | 6     | 8     |
  |----|------|------|------|-------|-------|
  | ℓ⋆ | 2.38 | 2.37 | 7.11 | 11.85 | 16.75 |
  | λ⋆ | 1.32 | 2.64 | 2.65 | 2.65  | 2.65  |
  | a⋆ | 0.23 | 0.46 | 0.46 | 0.46  | 0.46  |

  23 / 25
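In the hit-and-run table, λ⋆ plateaus near 2.65 ≈ 2 × 1.32 for every K ≥ 2 while ℓ⋆ keeps growing: all K candidates share one random line through the current state, so extra tries add little beyond K = 2. A sketch of such a pool (the symmetric offsets along the line are an illustrative assumption, not the exact MTM-HR construction):

```python
import numpy as np

rng = np.random.default_rng(1)

def hit_and_run_candidates(x, ell, K, T):
    """Hit-and-run style multiple-try pool: draw one random direction d
    on the unit sphere, then place K candidates along the line x + t*d.
    The symmetric offsets used here are an illustrative choice."""
    d = rng.standard_normal(len(x))
    d /= np.linalg.norm(d)                              # uniform direction
    R = ell * rng.standard_normal()                     # random line radius
    offsets = R * (np.arange(1, K + 1) - (K + 1) / 2)   # symmetric placements
    return x[None, :] + (offsets / np.sqrt(T))[:, None] * d[None, :]

x = np.zeros(10)
pool = hit_and_run_candidates(x, ell=2.64, K=2, T=100)
print(pool.shape)   # (K, dim): all candidates lie on one line through x
```

For K = 2 the two offsets are exact mirror images, so the pool is antithetic along the line, consistent with the K = 2 column matching the extreme antithetic speed λ⋆ = 2.64.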
- 71. Conclusion
  ◮ MCTM algorithm:
  1 Extreme antithetic proposals improve upon the MTM with independent proposals.
  2 Still, the improvement is not overly impressive, and the introduction of correlation makes the computation of the acceptance ratio more complex.
  ◮ MTM-C algorithm:
  1 The advantage of the MTM-C algorithms: only one simulation is required to obtain the pool of proposals and auxiliary variables.
  2 The MTM-RQMC performs comparably to the extreme antithetic proposals.
  3 Our preferred choice: the MTM-HR algorithm. In particular, the case K = 2 yields a speed twice that of the Metropolis algorithm, whereas the computational cost is almost the same in many scenarios.
  25 / 25
