About discretising Hamiltonians

             Christian P. Robert

      Université Paris-Dauphine and CREST
       http://xianblog.wordpress.com


Royal Statistical Society, October 13, 2010




        Christian P. Robert   About discretising Hamiltonians
Hamiltonian dynamics


   Dynamic on the level sets of
   \[
     H(\theta, p) = -L(\theta) + \frac{1}{2} \log\{(2\pi)^D |G(\theta)|\} + \frac{1}{2}\, p^T G(\theta)^{-1} p ,
   \]
   where $p$ is an auxiliary vector of dimension $D$, is associated with
   Hamilton's equations
   \[
     \dot\theta = \frac{\partial H}{\partial p}(\theta, p) , \qquad
     \dot p = -\frac{\partial H}{\partial \theta}(\theta, p) ,
   \]
   which preserve $H(\theta, p)$ and hence the target
   distribution at all times $t$
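
   The preservation claim can be checked in one line. The block below is a
   sketch that assumes the standard sign convention for Hamilton's equations
   (velocity from the momentum derivative, force as minus the position
   derivative); the chain-rule expansion is an addition, not part of the slides.

```latex
\[
  \frac{dH}{dt}
  = \left(\frac{\partial H}{\partial \theta}\right)^{T} \dot\theta
  + \left(\frac{\partial H}{\partial p}\right)^{T} \dot p
  = \left(\frac{\partial H}{\partial \theta}\right)^{T} \frac{\partial H}{\partial p}
  - \left(\frac{\partial H}{\partial p}\right)^{T} \frac{\partial H}{\partial \theta}
  = 0 .
\]
```

   Hence $H$ is constant along any exact trajectory; the argument relies on
   continuous time and says nothing about a discretised path.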




Discretised Hamiltonian

   Girolami and Calderhead reproduce Hamiltonian equations within
   the simulation domain by discretisation via the generalised leapfrog
   (!) generator,
                                         [Subliminal French bashing?!]
   but...
   the invariance and stability properties of the [background]
   continuous-time process behind the method do not carry over to the
   discretised version of the process [e.g., Langevin]

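   A minimal sketch of why exact invariance is lost, under simplifying
   assumptions that are not in the slides: one dimension, L(θ) = −θ^4, and a
   constant metric G = 1, in which case the generalised leapfrog reduces to the
   ordinary leapfrog. The function names, starting point and horizon are
   hypothetical. The code tracks how far H drifts from its initial value along
   a discretised trajectory.

```python
def grad_potential(theta):
    """Gradient of -L(theta) = theta**4 (assumed toy target)."""
    return 4.0 * theta ** 3


def hamiltonian(theta, p):
    """H(theta, p) with G = 1, dropping the constant log-determinant term."""
    return theta ** 4 + 0.5 * p ** 2


def leapfrog_step(theta, p, step):
    """One leapfrog step: half momentum, full position, half momentum."""
    p = p - 0.5 * step * grad_potential(theta)
    theta = theta + step * p
    p = p - 0.5 * step * grad_potential(theta)
    return theta, p


def max_energy_error(step, horizon=10.0, theta0=1.0, p0=0.5):
    """Largest |H - H0| seen along a trajectory of total length `horizon`."""
    h0 = hamiltonian(theta0, p0)
    theta, p = theta0, p0
    worst = 0.0
    for _ in range(int(round(horizon / step))):
        theta, p = leapfrog_step(theta, p, step)
        worst = max(worst, abs(hamiltonian(theta, p) - h0))
    return worst


# Energy error for a coarse and a fine step size.
errors = {step: max_energy_error(step) for step in (0.1, 0.01)}
for step, err in errors.items():
    print(f"step={step}: max |H - H0| = {err:.2e}")
```

   The error shrinks with the step size but does not vanish, so the
   discretised path leaves the level set of H on which the exact dynamics stay.
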
Discretised Hamiltonian (2)



      Is it useful to so painstakingly reproduce the continuous
      behaviour?
      Approximations (see R&R’s Langevin) can be corrected by a
      Metropolis-Hastings step, so why bother with a second level
      of approximation?
      Discretisation induces a calibration problem: how long is long
      enough?
      Convergence issues (for the MCMC algorithm) should not be
      impacted by inexact renderings of the continuous time process
      in discrete time: loss of efficiency?
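
   The Metropolis-Hastings correction mentioned above can be sketched on the
   toy target f(x) ∝ exp(−x^4). This is an illustrative Metropolis-adjusted
   Langevin step, not code from the discussion; the step size tau, chain
   length and seed are hypothetical choices.

```python
import math
import random


def log_target(x):
    """log f(x) up to an additive constant, for f(x) proportional to exp(-x**4)."""
    return -x ** 4


def grad_log_target(x):
    return -4.0 * x ** 3


def log_proposal(y, x, tau):
    """Log density (up to a constant) of the Euler proposal y given x."""
    mean = x + 0.5 * tau ** 2 * grad_log_target(x)
    return -0.5 * ((y - mean) / tau) ** 2


def mala(n_iter, tau, x0=0.0, seed=0):
    """Euler-discretised Langevin proposal, corrected by an accept/reject step."""
    rng = random.Random(seed)
    x, chain, accepted = x0, [], 0
    for _ in range(n_iter):
        y = x + 0.5 * tau ** 2 * grad_log_target(x) + tau * rng.gauss(0.0, 1.0)
        log_ratio = (log_target(y) + log_proposal(x, y, tau)
                     - log_target(x) - log_proposal(y, x, tau))
        if rng.random() < math.exp(min(0.0, log_ratio)):
            x, accepted = y, accepted + 1
        chain.append(x)
    return chain, accepted / n_iter


chain, rate = mala(20_000, tau=0.5)
second_moment = sum(x * x for x in chain) / len(chain)
# Exact E[x^2] under the target is Gamma(3/4) / Gamma(1/4).
exact = math.gamma(0.75) / math.gamma(0.25)
print(f"acceptance rate {rate:.2f}, E[x^2] estimate {second_moment:.3f}, exact {exact:.3f}")
```

   Whatever the discretisation error in the proposal, the accept/reject step
   keeps f as the stationary distribution; the step size then only affects
   efficiency.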




An illustration

   Comparison of the fits of discretised Langevin diffusion sequences
   to the target $f(x) \propto \exp(-x^4)$ when using a discretisation step
   $\sigma^2 = .1$ and $\sigma^2 = .0001$, after the same number $T = 10^7$ of steps.

   [Figure: fitted density; vertical axis "Density" (0.0 to 0.6), horizontal axis −1.5 to 1.5]
   [Figure: fitted density; vertical axis "Density" (0.0 to 0.8), horizontal axis −1.5 to 1.5]
   [Figure: vertical axis "time" (0 to 1e+05), horizontal axis −2 to 2]

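   A sketch for rerunning this kind of comparison, with no
   Metropolis-Hastings correction. The chain length T below is a hypothetical
   value far smaller than the 10^7 steps quoted above, and the two summaries
   (second moment and mean absolute move) are assumptions made for
   illustration, not the diagnostics behind the original figures.

```python
import math
import random


def grad_log_target(x):
    """Gradient of log f(x) = -x**4 (up to an additive constant)."""
    return -4.0 * x ** 3


def discretised_langevin(n_steps, sigma2, x0=0.0, seed=0):
    """Euler discretisation of the Langevin diffusion, without correction."""
    rng = random.Random(seed)
    sigma = math.sqrt(sigma2)
    x, chain = x0, []
    for _ in range(n_steps):
        x = x + 0.5 * sigma2 * grad_log_target(x) + sigma * rng.gauss(0.0, 1.0)
        chain.append(x)
    return chain


def second_moment(chain):
    return sum(x * x for x in chain) / len(chain)


def mean_abs_move(chain):
    """Average size of one move, a crude measure of how fast the chain travels."""
    return sum(abs(b - a) for a, b in zip(chain, chain[1:])) / (len(chain) - 1)


# Exact E[x^2] under f(x) proportional to exp(-x^4): Gamma(3/4) / Gamma(1/4).
EXACT_SECOND_MOMENT = math.gamma(0.75) / math.gamma(0.25)

T = 50_000  # hypothetical; the slide uses 10^7 steps
summaries = {}
for sigma2 in (0.1, 0.0001):
    chain = discretised_langevin(T, sigma2)
    summaries[sigma2] = (second_moment(chain), mean_abs_move(chain))
    print(sigma2, summaries[sigma2], EXACT_SECOND_MOMENT)
```

   The two step sizes trade off differently: the larger one moves quickly but
   simulates a cruder approximation of the diffusion, the smaller one is more
   faithful per step but covers far less ground in the same number of steps.
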
Back on Langevin

   For the Langevin diffusion, the corresponding (discretised) Langevin
   algorithm could just as well use a scale $\eta$ for the
   gradient that differs from the scale $\tau$ used for the noise,
   \[
     y = x_t + \eta \nabla \log \pi(x_t) + \tau \epsilon_t
   \]
   rather than a strict Euler discretisation
   \[
     y = x_t + \tau^2 \nabla \log \pi(x_t)/2 + \tau \epsilon_t
   \]
   A few experiments run in Robert and Casella (1999, Chap. 6, §6.5)
   hinted that using a scale $\eta \neq \tau^2/2$ could actually lead to
   improvements

   Which [independent] framework should we adopt for
   assessing discretised diffusions?
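
   One way to experiment with a free gradient scale is to use the two-scale
   move as a Metropolis-Hastings proposal and compare acceptance rates across
   values of η. The sketch below does this on the toy target
   f(x) ∝ exp(−x^4); the target, the value of τ, the grid of η values and the
   function names are all hypothetical choices, and acceptance rate is only one
   possible yardstick, not an answer to the question above.

```python
import math
import random


def log_target(x):
    return -x ** 4


def grad_log_target(x):
    return -4.0 * x ** 3


def log_q(y, x, eta, tau):
    """Log density (up to a constant) of the proposal y given x."""
    mean = x + eta * grad_log_target(x)
    return -0.5 * ((y - mean) / tau) ** 2


def langevin_mh(n_iter, eta, tau, x0=0.0, seed=0):
    """Proposal y = x + eta * grad + tau * noise, with an accept/reject step."""
    rng = random.Random(seed)
    x, chain, accepted = x0, [], 0
    for _ in range(n_iter):
        y = x + eta * grad_log_target(x) + tau * rng.gauss(0.0, 1.0)
        log_ratio = (log_target(y) + log_q(x, y, eta, tau)
                     - log_target(x) - log_q(y, x, eta, tau))
        if rng.random() < math.exp(min(0.0, log_ratio)):
            x, accepted = y, accepted + 1
        chain.append(x)
    return chain, accepted / n_iter


tau = 0.5
acceptance = {}
# eta = 0 is a random walk, eta = tau^2 / 2 is the strict Euler scale.
for eta in (0.0, tau ** 2 / 2, tau ** 2):
    _, acceptance[eta] = langevin_mh(20_000, eta, tau)
    print(f"eta={eta:.3f}: acceptance rate {acceptance[eta]:.2f}")
```

   Setting η = 0 recovers a random-walk Metropolis proposal and η = τ^2/2 the
   strict Euler scale, so the same function spans both ends of the comparison.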



RSS discussion of Girolami and Calderhead, October 13, 2010
