Anisotropic Metropolis Adjusted Langevin Algorithm: convergence and utility in Stochastic EM algorithm

Talk by S. Allassonnière (CMAP, École Polytechnique) at the BigMC seminar, 12/01/2011.



  1. Anisotropic Metropolis adjusted Langevin algorithm: convergence and utility in stochastic EM algorithm. Stéphanie Allassonnière (CMAP, École Polytechnique). Joint work with Estelle Kuhn (INRA, France). BigMC, January 2012.
  2–6. Introduction: where does the problem come from? Image analysis: compare two observations via the quantification of the deformation from one to the other (D'Arcy Thompson, 1917). Each element of a population is a smooth deformation of a template. Registration / variance; template estimation / mean.
  7–11. Introduction: where does the problem come from? (continued). Deformable template model ($u$ = voxel, $v_u$ its position):
  $$y(u) = I_0(v_u - m(v_u)) + \sigma\,\varepsilon(u).$$
  Estimation of the template $I_0$ and of the geometry, i.e. the law of $m$. High-dimensional setting, low sample size. The LDDMM framework is considered through the shooting equations.
  12–13. Outline:
     1. AMALA: simulation of random variables in high dimension (anisotropic MALA description; convergence property).
     2. AMALA within a stochastic algorithm for parameter estimation (maximum-likelihood estimation in the incomplete-data setting; AMALA-SAEM; convergence properties).
     3. Experiments (BME-Template model: small-deformation setting; BME-Template model: LDDMM setting).
  14–20. AMALA: general setting. Simulation of random variables in high-dimensional settings: the Gibbs sampler is not useful here. Metropolis Adjusted Langevin Algorithm (MALA), target distribution $\pi$. At iteration $k$ of this algorithm, with $X_k$ the current value:
     - Simulate $X_c \sim \mathcal{N}(X_k + \delta D(X_k), \delta\,\mathrm{Id}_d)$, where $D(x) = \frac{b}{\max(b, |\nabla \log \pi(x)|)}\,\nabla \log \pi(x)$.
     - Update $X_{k+1} = X_c$ with probability $\alpha(X_k, X_c) = \min\left(1, \frac{\pi(X_c)\,q_{\mathrm{MALA}}(X_c, X_k)}{q_{\mathrm{MALA}}(X_k, X_c)\,\pi(X_k)}\right)$ and $X_{k+1} = X_k$ otherwise.
  Problem: with the isotropic covariance matrix the chain gets numerically trapped ($\alpha(X_k, X_c) = 0$) → Anisotropic Metropolis Adjusted Langevin Algorithm (AMALA). A minimal sketch of one MALA step follows below.
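A minimal sketch of one MALA transition with the truncated drift above, in Python with NumPy. The callables `log_pi` and `grad_log_pi` and the tuning constants `delta` and `b` are user-supplied assumptions, not given in the talk.

```python
import numpy as np

def mala_step(x, log_pi, grad_log_pi, delta, b, rng):
    """One MALA step with drift D(x) = b / max(b, |grad log pi(x)|) * grad log pi(x)."""
    def drift(z):
        g = grad_log_pi(z)
        return b / max(b, np.linalg.norm(g)) * g

    # Propose X_c ~ N(X_k + delta * D(X_k), delta * Id)
    xc = x + delta * drift(x) + np.sqrt(delta) * rng.standard_normal(x.size)

    # log q(from, to) of the isotropic Gaussian proposal; the normalising
    # constant is the same in both directions, so it is dropped.
    def log_q(frm, to):
        return -np.sum((to - frm - delta * drift(frm)) ** 2) / (2.0 * delta)

    # Metropolis-Hastings accept/reject
    log_alpha = log_pi(xc) + log_q(xc, x) - log_pi(x) - log_q(x, xc)
    return xc if np.log(rng.uniform()) < log_alpha else x
```

For instance, `mala_step(np.zeros(10), log_pi, grad_log_pi, 0.1, 1000.0, np.random.default_rng(0))` performs one transition.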
  21–23. How to include anisotropy? Follow the magnitude of the gradient. First approximation: independence of the directions. Bounded covariance (for the same reason as the bounded drift).
  24–27. Anisotropic Metropolis Adjusted Langevin Algorithm (AMALA). For all $k = 1 : k_{\mathrm{end}}$ (iterates of the Markov chain):
     - Sample $X_c \sim \mathcal{N}(X_k + \delta D(X_k), \delta\,\Sigma(X_k))$ with $D(X_k) = \frac{b}{\max(b, |\nabla \log \pi(X_k)|)}\,\nabla \log \pi(X_k)$ and $\Sigma(X_k) = \mathrm{Id}_d + \mathrm{diag}\left([\nabla \log \pi(X_k)]_1^2 \wedge b, \dots, [\nabla \log \pi(X_k)]_d^2 \wedge b\right)$.
     - Compute the acceptance ratio $\alpha(X_k, X_c) = \min\left(1, \frac{\pi(X_c)\,q_c(X_c, X_k)}{q_c(X_k, X_c)\,\pi(X_k)}\right)$ ($q_c$ = the pdf of this proposal distribution).
     - Set $X_{k+1} = X_c$ with probability $\alpha(X_k, X_c)$ and $X_{k+1} = X_k$ with probability $1 - \alpha(X_k, X_c)$ (accept/reject).
  A sketch of one AMALA step follows below.
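A minimal sketch of one AMALA step, under the same assumptions as the MALA sketch above. The only change is the diagonal, gradient-adapted covariance $\Sigma(X_k)$, which must also enter the proposal density through its determinant.

```python
import numpy as np

def amala_step(x, log_pi, grad_log_pi, delta, b, rng):
    """One AMALA step: truncated drift plus anisotropic diagonal covariance."""
    def drift_cov(z):
        g = grad_log_pi(z)
        d = b / max(b, np.linalg.norm(g)) * g      # D(z)
        sig = 1.0 + np.minimum(g ** 2, b)          # diagonal of Sigma(z)
        return d, sig

    d, sig = drift_cov(x)
    xc = x + delta * d + np.sqrt(delta * sig) * rng.standard_normal(x.size)

    # Gaussian log-density of the proposal; the log-determinant no longer
    # cancels because Sigma depends on the current point (the shared
    # d*log(delta) and 2*pi terms are dropped).
    def log_q(frm, to):
        df, sf = drift_cov(frm)
        resid = to - frm - delta * df
        return -0.5 * np.sum(resid ** 2 / (delta * sf)) - 0.5 * np.sum(np.log(sf))

    log_alpha = log_pi(xc) + log_q(xc, x) - log_pi(x) - log_q(x, xc)
    return xc if np.log(rng.uniform()) < log_alpha else x
```

Setting `sig` to ones recovers the MALA step above.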
  28. Geometric ergodicity of the Markov chain. Condition: $\pi$ super-exponential. Smoothness condition on the target distribution (B1): the density $\pi$ is positive with continuous first derivative such that
  $$\lim_{|x| \to \infty} n(x) \cdot \nabla \log \pi(x) = -\infty \quad (1)$$
  and
  $$\limsup_{|x| \to \infty} n(x) \cdot m(x) < 0, \quad (2)$$
  where $\nabla$ is the gradient operator in $\mathbb{R}^d$, $n(x) = \frac{x}{|x|}$ is the unit vector pointing in the direction of $x$ and $m(x) = \frac{\nabla \pi(x)}{|\nabla \pi(x)|}$ is the unit vector in the direction of the gradient of the stationary distribution at point $x$.
  29–32. Geometric ergodicity of the Markov chain: result.
     - Existence of a small set: $\Pi(x, A) \geq \varepsilon\,\nu(A)\,\mathbb{1}_C(x)$, for all $x \in \mathcal{X}$ and all $A \in \mathcal{B}$.
     - Drift condition (pulls the chain back into the small set): $\Pi V(x) \leq \lambda V(x) + b\,\mathbb{1}_C(x)$.
     - Geometric ergodicity:
  $$\sup_{x \in \mathcal{X}} \frac{|\Pi^n V(x) - \pi(V)|}{V(x)} \leq R\,\rho^n. \quad (3)$$
  33. Experiments on synthetic data. Target: 10-dimensional Gaussian distribution with zero mean and diagonal covariance matrix, with diagonal coefficients randomly picked between 1 and 2500. Comparison of AMALA and a symmetric random walk; 500,000 iterations for each algorithm, starting at zero. Mean squared jump distance (MSJD) in stationarity: AMALA 0.1504, random walk 0.0407. A rough reconstruction of this comparison follows below.
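A rough reconstruction of the synthetic experiment, reusing `amala_step` from the sketch above. The seed and the tuning constants `delta` and `b` are illustrative assumptions; they are not given in the talk, so the MSJD value will not reproduce the quoted numbers exactly.

```python
import numpy as np

rng = np.random.default_rng(0)
var = rng.uniform(1.0, 2500.0, size=10)           # target variances in [1, 2500]
log_pi = lambda x: -0.5 * np.sum(x ** 2 / var)
grad_log_pi = lambda x: -x / var

def msjd(chain):
    """Mean squared jump distance of an (n_iter, d) sample path."""
    return np.mean(np.sum(np.diff(chain, axis=0) ** 2, axis=1))

x, chain = np.zeros(10), []
for _ in range(500_000):
    x = amala_step(x, log_pi, grad_log_pi, delta=0.5, b=1_000.0, rng=rng)
    chain.append(x)
print("AMALA MSJD:", msjd(np.asarray(chain)))
```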
  34. Figure: autocorrelation functions of the AMALA (red) and random walk (blue) samplers for four of the ten components of the 10-dimensional Gaussian distribution.
  35–38. Why not use existing MALA-like algorithms? Optimised MALA-like algorithms are usually adaptive, with good performance in practice and good theoretical properties. However: numerical problems at the first iterations (not yet stationary), hence an unclear convergence time. Most important: our goal is parameter estimation, and AMALA is one tool inside another algorithm; combining an adaptive sampler with an estimation algorithm raises numerical issues (too many degrees of freedom).
  39. Outline, part 2: AMALA within a stochastic algorithm for parameter estimation (maximum-likelihood estimation in the incomplete-data setting; AMALA-SAEM; convergence properties).
  40–47. Maximum-likelihood estimation in the incomplete-data setting.
     - $y \in \mathbb{R}^n$: observed data; $z \in \mathbb{R}^l$: missing data; $(y, z) \in \mathbb{R}^{n+l}$: complete data.
     - $\mathcal{P} = \{f(y, z; \theta), \theta \in \Theta\}$: family of parametric pdfs on $\mathbb{R}^{n+l}$.
     - Assumption: there exists $\theta \in \Theta$ such that the complete-data likelihood is $q(y, z; \theta) = f(y, z; \theta)$.
     - Observed likelihood: $g(y; \theta) = \int f(y, z; \theta)\,\mu(dz). \quad (4)$
     - Given a sample of observations $(y_i)_{1 \leq i \leq n} = y_1^n$, find $\hat{\theta}_g \in \Theta$ such that $\hat{\theta}_g = \arg\max_{\theta \in \Theta} g(y_1^n; \theta)$.
  48–52. AMALA-SAEM. Incomplete-data setting + maximum-likelihood estimation = EM algorithm. In the general case the E step is not tractable → Stochastic Approximation EM (SAEM) for its convergence properties, with an MCMC method for the simulation step. → AMALA-SAEM: using AMALA as the MCMC method.
  53–57. Description of the algorithm. Assumption: the model is in the exponential family, so all the information is carried by the sufficient statistics $S$. For $k = 1 : k_{\mathrm{end}}$ (iteration of SAEM):
     - Sample $z_k$ through a single AMALA step (simulation and accept/reject) using the current parameter $\theta_{k-1}$.
     - Compute the stochastic approximation $s_k = s_{k-1} + \gamma_k\,(S(z_k) - s_{k-1})$, where $(\gamma_k)_k$ is a sequence of positive step sizes.
     - Update the parameter: $\theta_k = \hat{\theta}(s_k)$.
  Truncation on random boundaries can be required for convergence purposes. A schematic loop follows below.
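A schematic AMALA-SAEM loop under the exponential-family assumption above, reusing `amala_step` from the earlier sketch. The truncation on random boundaries is omitted, and the step-size choice $\gamma_k = 1/k$ and all callables (`S`, `theta_hat`, `log_post`, `grad_log_post`) are placeholder assumptions for illustration.

```python
def amala_saem(y, S, theta_hat, log_post, grad_log_post, z0, theta0,
               n_iter, delta, b, rng):
    """S(z): sufficient statistics; theta_hat(s): M-step map s -> theta;
    log_post / grad_log_post: log p(z | y; theta) up to a constant, and its
    gradient in z, at fixed observations y and parameter theta."""
    z, theta = z0, theta0
    s = S(z)
    for k in range(1, n_iter + 1):
        # Simulation step: one AMALA move targeting p(z | y; theta_{k-1})
        z = amala_step(z,
                       lambda u: log_post(u, y, theta),
                       lambda u: grad_log_post(u, y, theta),
                       delta, b, rng)
        # Stochastic approximation: s_k = s_{k-1} + gamma_k (S(z_k) - s_{k-1})
        gamma = 1.0 / k
        s = s + gamma * (S(z) - s)
        # Maximisation step: theta_k = theta_hat(s_k)
        theta = theta_hat(s)
    return theta
```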
  58–61. Convergence properties. Conditions: smoothness of the model (classic conditions for the convergence of stochastic approximation and EM); condition (B1) for AMALA geometric ergodicity. Results:
     - Convergence of $(s_k)$ a.s. towards a critical point of the mean field of the problem.
     - Convergence of the estimated parameters $(\theta_k)$ a.s. towards a critical point of the observed likelihood.
     - Central limit theorem for $(\theta_k)$ with rate $1/\sqrt{\gamma_k}$.
  62. Conditions for the SA to converge. Define for any $V : \mathcal{X} \to [1, \infty]$ and any $g : \mathcal{X} \to \mathbb{R}^m$ the norm $\|g\|_V = \sup_{z \in \mathcal{X}} \frac{|g(z)|}{V(z)}$.
  (A1') $\mathcal{S}$ is an open subset of $\mathbb{R}^m$, $h : \mathcal{S} \to \mathbb{R}^m$ is continuous and there exists a continuously differentiable function $w : \mathcal{S} \to [0, \infty)$ with the following properties.
     (i) There exists an $M_0 > 0$ such that $\mathcal{L} \triangleq \{s \in \mathcal{S}, \langle \nabla w(s), h(s) \rangle = 0\} \subset \{s \in \mathcal{S}, w(s) < M_0\}$.
  63. Conditions for the SA to converge (2).
     (ii) There exists a closed convex set $\mathcal{S}_a \subset \mathcal{S}$ for which $s \mapsto s + \rho H_s(z) \in \mathcal{S}_a$ for any $\rho \in [0, 1]$ and $(z, s) \in \mathcal{X} \times \mathcal{S}_a$ ($\mathcal{S}_a$ is absorbing), and such that for any $M_1 \in (M_0, \infty]$, the set $\mathcal{W}_{M_1} \cap \mathcal{S}_a$ is a compact set of $\mathcal{S}$, where $\mathcal{W}_{M_1} \triangleq \{s \in \mathcal{S}, w(s) \leq M_1\}$.
     (iii) For any $s \in \mathcal{S} \setminus \mathcal{L}$, $\langle \nabla w(s), h(s) \rangle < 0$.
     (iv) The closure of $w(\mathcal{L})$ has an empty interior.
  (A2') For any $s \in \mathcal{S}$, $H_s : \mathcal{X} \to \mathcal{S}$ is measurable and $\int \|H_s(z)\|\,\pi_s(dz) < \infty$.
  64. Conditions for the SA to converge (3).
  (A3'') There exists a function $V : \mathcal{X} \to [1, \infty]$ such that $\{z \in \mathcal{X}, V(z) < \infty\} \neq \emptyset$, and constants $a \in (0, 1]$, $p \geq 2$, $r > 0$ and $q \geq 1$ such that for any compact subset $\mathcal{K} \subset \mathcal{S}$:
     (i)
  $$\sup_{s \in \mathcal{K}} \|H_s\|_V < \infty, \quad (5)$$
  $$\sup_{s \in \mathcal{K}} \left( \|g_s\|_V + \|\Pi_s g_s\|_V \right) < \infty, \quad (6)$$
  $$\sup_{s, s' \in \mathcal{K}} |s - s'|^{-a} \left\{ \|g_s - g_{s'}\|_{V^q} + \|\Pi_s g_s - \Pi_{s'} g_{s'}\|_{V^q} \right\} < \infty, \quad (7)$$
  where for any $s \in \mathcal{S}$ a solution of the Poisson equation $g - \Pi_s g = H_s - \pi_s(H_s)$ is denoted by $g_s$.
  65. Conditions for the SA to converge (4).
     (ii) For any sequence $\varepsilon = (\varepsilon_k)_{k \geq 0}$ satisfying $\varepsilon_k < \bar{\varepsilon}$ for a sufficiently small $\bar{\varepsilon}$, and for any sequence $\gamma = (\gamma_k)_{k \geq 0}$, there exists a constant $C$ such that for any $z \in \mathcal{X}$,
  $$\sup_{s \in \mathcal{K}} \sup_{k \geq 0} \mathbb{E}^{\gamma}_{z,s}\left[ V^p(z_k)\, \mathbb{1}_{\sigma(\mathcal{K}) \wedge \nu(\varepsilon) \geq k} \right] \leq C\, V^{p+r}(z), \quad (8)$$
  where $\nu(\varepsilon) = \inf\{k \geq 1, |s_k - s_{k-1}| \geq \varepsilon_k\}$, $\sigma(\mathcal{K}) = \inf\{k \geq 1, s_k \notin \mathcal{K}\}$, and the expectation is related to the non-homogeneous Markov chain $((z_k, s_k))_{k \geq 0}$ using the step-size sequence $\gamma = (\gamma_k)_{k \geq 0}$.
  (A4) The sequences $\gamma = (\gamma_k)_{k \geq 0}$ and $\varepsilon = (\varepsilon_k)_{k \geq 0}$ are non-increasing, positive and satisfy: $\sum_{k=0}^{\infty} \gamma_k = \infty$, $\lim_{k \to \infty} \varepsilon_k = 0$ and $\sum_{k=1}^{\infty} \{\gamma_k^2 + \gamma_k \varepsilon_k^a + (\gamma_k \varepsilon_k^{-1})^p\} < \infty$, where $a$ and $p$ are defined in (A3'').
  66. Conditions for AMALA-SAEM to converge.
  (M1) The parameter space $\Theta$ is an open subset of $\mathbb{R}^p$. The complete-data likelihood function is given by $f(y, z; \theta) = \exp\{-\psi(\theta) + \langle S(z), \phi(\theta) \rangle\}$, where $S$ is a Borel function on $\mathbb{R}^l$ taking its values in an open subset $\mathcal{S}$ of $\mathbb{R}^m$. Moreover, the convex hull of $S(\mathbb{R}^l)$ is included in $\mathcal{S}$, and, for all $\theta$ in $\Theta$, $\int \|S(z)\|\,p_\theta(z)\,\mu(dz) < \infty$.
  (M2) The functions $\psi$ and $\phi$ are twice continuously differentiable on $\Theta$.
  67. Conditions for AMALA-SAEM to converge (2).
  (M3) The function $\bar{s} : \Theta \to \mathcal{S}$ defined as $\bar{s}(\theta) \triangleq \int S(z)\,p_\theta(z)\,\mu(dz)$ is continuously differentiable on $\Theta$.
  (M4) The function $\ell : \Theta \to \mathbb{R}$ defined as the observed-data log-likelihood $\ell(\theta) \triangleq \log g(y; \theta) = \log \int f(y, z; \theta)\,\mu(dz)$ is continuously differentiable on $\Theta$ and $\partial_\theta \int f(y, z; \theta)\,\mu(dz) = \int \partial_\theta f(y, z; \theta)\,\mu(dz)$.
  68. Conditions for AMALA-SAEM to converge (3).
  (M5) There exists a function $\hat{\theta} : \mathcal{S} \to \Theta$ such that $\forall s \in \mathcal{S}, \forall \theta \in \Theta, L(s; \hat{\theta}(s)) \geq L(s; \theta)$. Moreover, the function $\hat{\theta}$ is continuously differentiable on $\mathcal{S}$.
  (M6) The functions $\ell : \Theta \to \mathbb{R}$ and $\hat{\theta} : \mathcal{S} \to \Theta$ are $m$ times differentiable.
  (M7) (i) There exists an $M_0 > 0$ such that $\{s \in \mathcal{S}, \partial_s \ell(\hat{\theta}(s)) = 0\} \subset \{s \in \mathcal{S}, -\ell(\hat{\theta}(s)) < M_0\}$. (ii) For all $M_1 > M_0$, the set $\overline{\mathrm{Conv}}(S(\mathbb{R}^l)) \cap \{s \in \mathcal{S}, -\ell(\hat{\theta}(s)) \leq M_1\}$ is a compact set of $\mathcal{S}$.
  69. Conditions for AMALA-SAEM to converge (4).
  (M8) There exists a polynomial function $P$ of degree 2 such that for all $z \in \mathcal{X}$, $\|S(z)\| \leq |P(z)|$.
  (B3) For any compact subset $\mathcal{K}$ of $\mathcal{S}$, there exists a polynomial function $Q$ of the hidden variable such that $\sup_{s \in \mathcal{K}} |\nabla_z \log p_{\hat{\theta}(s)}(z)| \leq |Q(z)|$.
  70. Outline, part 3: experiments (BME-Template model: small-deformation setting; BME-Template model: LDDMM setting).
  71–74. BME-Template model with small deformations.
     - Deformable template model ($u$ = voxel, $v_u$ its position): $y(u) = I_0(v_u - m(v_u)) + \sigma\,\varepsilon(u)$.
     - Parametric template and deformation: $I_\alpha(v) = (K_p \alpha)(v) = \sum_{j=1}^{k_p} K_p(v, r_{p,j})\,\alpha_j$ and $m_z(v) = (K_g z)(v) = \sum_{j=1}^{k_g} K_g(v, r_{g,j})\,z_j$.
     - Generative model: $z \sim \otimes_{i=1}^n \mathcal{N}_{2k_g}(0, \Gamma_g) \mid \Gamma_g$ and $y \sim \otimes_{i=1}^n \mathcal{N}_{|\Lambda|}(m_{z_i} I_\alpha, \sigma^2\,\mathrm{Id}) \mid z, \alpha, \sigma^2$.
     - Bayesian framework → MAP estimator (= penalised MLE).
  A sketch of the synthesis side of this model follows below.
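A small sketch of how one observation is synthesised under this model. The Gaussian form of the kernels $K_p$, $K_g$ and all sizes are illustrative assumptions; the talk leaves the kernels abstract.

```python
import numpy as np

def gram(a, b, scale):
    """Kernel matrix K(a_i, b_j) = exp(-|a_i - b_j|^2 / (2 scale^2))."""
    d2 = np.sum((a[:, None, :] - b[None, :, :]) ** 2, axis=-1)
    return np.exp(-d2 / (2.0 * scale ** 2))

def sample_image(v, r_p, alpha, scale_p, r_g, scale_g, Gamma_g, sigma, rng):
    """Draw z ~ N(0, Gamma_g), then y(u) = I_alpha(v_u - m_z(v_u)) + sigma eps(u).
    v: (n_pix, 2) pixel positions; r_p, r_g: template / geometry knot positions."""
    k_g = r_g.shape[0]
    z = rng.multivariate_normal(np.zeros(2 * k_g), Gamma_g).reshape(k_g, 2)
    m = gram(v, r_g, scale_g) @ z               # deformation field m_z(v), (n_pix, 2)
    warped = v - m                              # v_u - m_z(v_u)
    I = gram(warped, r_p, scale_p) @ alpha      # template I_alpha at warped positions
    return I + sigma * rng.standard_normal(len(v))
```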
  75. Training sets. Figure: left, training set (inverse video); right, noisy training set (inverse video).
  76. Estimated templates. Figure: estimated templates using different algorithms (FAM-EM, H.G.-SAEM, AMALA-SAEM) at two noise levels (no noise; noisy with variance 1). The training set includes 20 images per digit.
  77. Estimated geometric variability. Figure: synthetic samples generated with respect to the BME template model.
  78. Empirical illustration of the CLT. Figure: evolution of the estimation of the noise variance along the SAEM iterations. Left: original data. Right: noisy training set.
  79. Figure: evolution of the estimation of the noise variance along the SAEM iterations; test of the convergence of the estimated parameters towards the Gaussian distribution.
  80. Corpus callosum database. Figure: medical image template estimation, 10 corpus callosum and splenium training images among the 47 available. Figure: grey-level mean; FAM-EM estimated template; hybrid Gibbs-SAEM estimated template; AMALA-SAEM estimated template.
  81. BME-Template model with LDDMM.
     - Deformable template model ($u$ = voxel, $v_u$ its position): $y(u) = I_0(\phi_{\beta(0)}^{-1}(v_u)) + \sigma\,\varepsilon(u)$.
     - Parametric template: $I_\alpha(v) = (K_p \alpha)(v) = \sum_{j=1}^{k_p} K_p(v, r_{p,j})\,\alpha_j$, with $\phi_{\beta(0)}$ the LDDMM solution of the shooting equations with initial momentum $\beta(0)$.
     - Generative model: $z \sim \otimes_{i=1}^n \mathcal{N}_{2k_g}(0, \Gamma_g) \mid \Gamma_g$ and $y \sim \otimes_{i=1}^n \mathcal{N}_{|\Lambda|}(\phi_{\beta(0)} I_\alpha, \sigma^2\,\mathrm{Id}) \mid z, \alpha, \sigma^2$.
     - Bayesian framework → MAP estimator (= penalised MLE).
  82. LDDMM: parametric deformation. Fix some control points $c(t) = (c_1(t), \dots, c_{n_g}(t))$, choose a kernel $K_g$, and start from an initial momentum $\beta(0) = (\beta^1(0), \dots, \beta^{n_g}(0))$. Then the Hamiltonian system gives the time evolution of both momenta and control points:
  $$\begin{cases} \dfrac{dc}{dt} = \dfrac{\partial H}{\partial \beta}(c, \beta) = K_g(c(t))\,\beta(t) \\[2mm] \dfrac{d\beta}{dt} = -\dfrac{\partial H}{\partial c}(c, \beta) = -\dfrac{1}{2}\,\nabla_{c(t)} K(\beta(t), \beta(t)) \end{cases} \quad (9)$$
  83. LDDMM: parametric deformation (2). Interpolating at any point of the domain:
  $$v_t(r) = (K_g \beta(t))(r) = \sum_{k=1}^{n_g} K_g(r, c_k(t))\,\beta^k(t) \quad \forall r \in D. \quad (10)$$
  The deformation is the solution of the flow equation:
  $$\frac{\partial \phi_{\beta(0)}(t)}{\partial t} = v_t \circ \phi_{\beta(0)}(t), \qquad \phi_{\beta(0)}(0) = \mathrm{Id}, \qquad \phi_{\beta(0)} = \phi_{\beta(0)}(1). \quad (11)$$
  A sketch of the shooting integration follows below.
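A minimal Euler integration of the shooting system (9), assuming a Gaussian kernel $K(a, b) = \exp(-|a - b|^2 / (2\lambda^2))$; the kernel choice and the step count are illustrative, not from the talk.

```python
import numpy as np

def shoot(c0, beta0, lam, n_steps=20):
    """Euler integration of the Hamiltonian shooting system over t in [0, 1].
    c0, beta0: (n_g, 2) arrays of control points and initial momenta."""
    c, beta = c0.astype(float).copy(), beta0.astype(float).copy()
    dt = 1.0 / n_steps
    for _ in range(n_steps):
        diff = c[:, None, :] - c[None, :, :]                     # c_i - c_j
        K = np.exp(-np.sum(diff ** 2, axis=-1) / (2 * lam ** 2))
        dc = K @ beta                                            # dc_i/dt = sum_j K_ij beta_j
        bb = beta @ beta.T                                       # <beta_i, beta_j>
        # For the Gaussian kernel, -0.5 grad_c H reduces to
        # dbeta_i/dt = sum_j K_ij <beta_i, beta_j> (c_i - c_j) / lam^2
        dbeta = np.sum((K * bb)[:, :, None] * diff, axis=1) / lam ** 2
        c, beta = c + dt * dc, beta + dt * dbeta
    return c, beta
```

The velocity field (10) at any point $r$ is then obtained by the same kernel interpolation against the current $(c(t), \beta(t))$.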
  84–95. Gradient computation.
  $$E(c_i, \beta_i) = \underbrace{\sum_k \left(I_0(\phi_1^{-1}(y_k)) - I(y_k)\right)^2}_{A(y_k(0))} + \sigma^2\,\underbrace{\mathrm{Reg}(\phi_1)}_{L(S_0) = \beta(0)^t\,\Gamma_g(q(0), q(0))\,\beta(0)},$$
  where $S_0 = \{(c_i, \beta_i)\}_i$ and
  $$\frac{dS(t)}{dt} = F(S(t)), \quad S(0) = S_0, \qquad \frac{dy(t)}{dt} = G(S(t), y(t)), \quad y(1) = y.$$
  Chain rule: $\nabla_{S_0} E = d_{S_0} y(0)^T\,\nabla_{y(0)} A + \nabla_{S_0} L$, with
  $$\nabla_{y_k(0)} A = 2\,\left(I_0(y_k(0)) - I(y_k(1))\right)\,\nabla_{y_k(0)} I_0.$$
  Interpretation: the momenta decrease the image discrepancy; the control points are attracted by the image contours. In practice the gradient is computed through the adjoint system
  $$\frac{d\eta(t)}{dt} = \partial_{S(t)} G^T\,\eta(t), \quad \eta(0) = \nabla_{y(0)} A, \qquad \frac{d\xi(t)}{dt} = \partial_{y(t)} G^T\,\eta(t) - dF^T\,\xi(t), \quad \xi(1) = 0,$$
  which yields $\nabla_{S_0} E = \xi(0) + \nabla_{S_0} L$. A sketch of the data-term gradient follows below.
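As one concrete piece of the computation above, here is the gradient of the data term $A$, with the interpolated images supplied as callables. This is a sketch of that single formula; the full adjoint integration of $\eta$ and $\xi$ is omitted, and the callable interfaces are assumptions.

```python
import numpy as np

def grad_data_term(I0, grad_I0, y0, y1, I_obs):
    """grad_{y_k(0)} A = 2 (I0(y_k(0)) - I(y_k(1))) grad I0(y_k(0)).
    I0, grad_I0, I_obs: interpolated template, its spatial gradient, and the
    observed image, each evaluated at an (n_pts, 2) array of points."""
    resid = I0(y0) - I_obs(y1)                  # (n_pts,) image discrepancy
    return 2.0 * resid[:, None] * grad_I0(y0)   # (n_pts, 2) gradient per point
```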
  96. Using LDDMM deformations via shooting (preliminary results). Figure: templates estimated with the AMALA and hybrid Gibbs (GH) samplers.
  97–98. Conclusion.
     - Good performance (as accurate as other algorithms).
     - Reduced computational time.
     - Can handle the movement of control points in practice (theory to confirm).
     - Can handle sparsity of the template (model selection). Removing control points? In practice, why not... in theory?
  Thank you!
