Transcript of "PAWL - GPU meeting @ Warwick"

  1. Parallel Adaptive Wang–Landau Algorithm
     Pierre E. Jacob, CEREMADE - Université Paris Dauphine, funded by AXA Research
     GPU in Computational Statistics, January 25th, 2012
     Joint work with Luke Bornn (UBC), Arnaud Doucet (Oxford), Pierre Del Moral (INRIA & Université de Bordeaux), Robin J. Ryder (Dauphine)
  2. Outline
     1 Wang–Landau algorithm
     2 Improvements: Automatic Binning, Adaptive proposals, Parallel Interacting Chains
     3 Example: variable selection
     4 Conclusion
  3. Wang–Landau: context
     Unnormalized target density π on a state space X.
     A kind of adaptive MCMC algorithm: it iteratively generates a sequence X_t.
     The stationary distribution is not π itself; at each iteration a different stationary distribution is targeted.
  4. Wang–Landau: partition the space
     The state space X is cut into d bins:
         X = ∪_{i=1}^d X_i   and   ∀ i ≠ j, X_i ∩ X_j = ∅.
     Goal: the generated sequence spends a desired proportion φ_i of time in each bin X_i, and within each bin X_i the sequence is asymptotically distributed according to the restriction of π to X_i.
  5. Wang–Landau: stationary distribution
     Define the mass of π over X_i by:
         ψ_i = ∫_{X_i} π(x) dx.
     The stationary distribution of the WL algorithm is:
         π̃(x) ∝ π(x) × φ_{J(x)} / ψ_{J(x)}
     where J(x) is the index such that x ∈ X_{J(x)}.
  6. Wang–Landau: illustration
     Example with a bimodal, univariate target density: π and two π̃ corresponding to different partitions. Here φ_i = d^{-1}.
     [Figure: log density against x, three panels: original density with partition lines, biased by x, biased by log density.]
  7. Wang–Landau: plugging estimates
     In practice we cannot compute ψ_i analytically. Instead we plug in estimates θ_t(i) of ψ_i/φ_i at iteration t, and define the distribution π_{θ_t} by:
         π_{θ_t}(x) ∝ π(x) × 1 / θ_t(J(x)).
     Metropolis–Hastings: the algorithm does a Metropolis–Hastings step targeting π_{θ_t} at iteration t, generating a new point X_t and updating θ_t.
  8. Wang–Landau: estimate of the bias
     The update of the estimated bias θ_t(i) is done according to:
         θ_t(i) ← θ_{t−1}(i) [1 + γ_t (1_{X_i}(X_t) − φ_i)]
     with d the number of bins and γ_t a decreasing sequence or "step size", e.g. γ_t = 1/t.
     If 1_{X_i}(X_t) = 1 then θ_t(i) increases; otherwise θ_t(i) decreases.
  9. Wang–Landau: the algorithm itself
     1: First, ∀ i ∈ {1, . . . , d} set θ_0(i) ← 1.
     2: Choose a decreasing sequence {γ_t}, typically γ_t = 1/t.
     3: Sample X_0 from an initial distribution π_0.
     4: for t = 1 to T do
     5:   Sample X_t from P_{t−1}(X_{t−1}, ·), a MH kernel with invariant distribution π_{θ_{t−1}}(x).
     6:   Update the bias: θ_t(i) ← θ_{t−1}(i) [1 + γ_t (1_{X_i}(X_t) − φ_i)].
     7: end for
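
     A minimal R sketch of the loop on slide 9, for a one-dimensional example with the deterministic schedule γ_t = 1/t. The bimodal target, the cut points and the random-walk proposal scale below are illustrative choices, not taken from the slides.

       # Illustrative unnormalized bimodal target on the real line
       log_target <- function(x) log(0.5 * dnorm(x, -3, 1) + 0.5 * dnorm(x, 10, 1))

       cuts <- c(-Inf, 0, 5, Inf)                   # partition into d = 3 bins
       d <- length(cuts) - 1
       phi <- rep(1 / d, d)                         # desired proportions phi_i
       bin_of <- function(x) findInterval(x, cuts)  # J(x)

       n_iter <- 50000
       theta <- rep(1, d)                           # theta_0(i) <- 1
       x <- 0                                       # X_0 from some initial point
       sigma <- 2                                   # random-walk proposal scale

       for (t in 1:n_iter) {
         gamma_t <- 1 / t
         # MH step with invariant distribution pi_{theta_{t-1}}
         x_prop <- x + sigma * rnorm(1)
         log_alpha <- (log_target(x_prop) - log(theta[bin_of(x_prop)])) -
                      (log_target(x)      - log(theta[bin_of(x)]))
         if (log(runif(1)) < log_alpha) x <- x_prop
         # Bias update: theta_t(i) <- theta_{t-1}(i) [1 + gamma_t (1_{X_i}(X_t) - phi_i)]
         visited <- as.numeric(seq_len(d) == bin_of(x))
         theta <- theta * (1 + gamma_t * (visited - phi))
       }
       theta   # estimates of psi_i / phi_i, up to the normalizing constant of pi
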
  10. Wang–Landau: result
      In the end we get a sequence X_t asymptotically following π̃, as well as estimates θ_t(i) of ψ_i/φ_i.
  11. Wang–Landau: usual improvement, Flat Histogram
      Wait for the FH criterion to occur before decreasing γ_t:
          (FH)   max_{i=1...d} | ν_t(i)/t − φ_i | < c
      where ν_t(i) = Σ_{k=1}^t 1_{X_i}(X_k) and c > 0.
      WL with stochastic schedule: let κ_t be the number of times FH was reached up to iteration t. Use γ_{κ_t} at iteration t instead of γ_t. If FH is reached, reset ν_t(i) to 0.
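
     A sketch of the Flat Histogram check and of the stochastic schedule, meant to sit inside a loop like the one sketched after slide 9; the names nu, kappa and the tolerance value are illustrative.

       c_fh <- 0.1      # tolerance c > 0
       nu <- rep(0, d)  # visit counts nu_t(i), reset whenever FH is reached
       kappa <- 1       # 1 + number of times FH has been reached so far

       flat_histogram <- function(nu, phi, c_fh) {
         n <- sum(nu)
         n > 0 && max(abs(nu / n - phi)) < c_fh
       }

       # Inside the main loop, after the MH move of iteration t:
       #   nu[bin_of(x)] <- nu[bin_of(x)] + 1
       #   if (flat_histogram(nu, phi, c_fh)) { kappa <- kappa + 1; nu[] <- 0 }
       #   gamma_t <- 1 / kappa    # gamma_{kappa_t} instead of gamma_t
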
  12. Wang–Landau: theoretical understanding
      With the deterministic schedule: the schedule γ_t decreases at each iteration, hence θ_t converges, hence P_t(·, ·) converges . . . ≈ "diminishing adaptation".
      With the stochastic schedule: Flat Histogram is reached in finite time for any γ, φ, c if one uses the update
          log θ_t(i) ← log θ_{t−1}(i) + γ (1_{X_i}(X_t) − φ_i)
      instead of
          θ_t(i) ← θ_{t−1}(i) [1 + γ (1_{X_i}(X_t) − φ_i)].
  13. Automatic Binning
      Maintain some kind of uniformity within bins; if a bin is non-uniform, split it.
      [Figure: histograms of the log density within a bin, (a) before the split and (b) after the split.]
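
     The slide only states the idea; the concrete splitting rule below is an illustrative guess at a criterion, not the one used in the paper. It flags a bin whose recent samples concentrate in one half of the bin's range of log-density values.

       # 'samples_logpi' holds log pi values of recent samples assigned to one bin,
       # 'bin_lo' and 'bin_hi' are that bin's current boundaries (illustrative names).
       should_split <- function(samples_logpi, bin_lo, bin_hi, threshold = 0.75) {
         mid <- (bin_lo + bin_hi) / 2
         frac_lower <- mean(samples_logpi < mid)
         frac_lower > threshold || frac_lower < 1 - threshold
       }
       # If TRUE, insert 'mid' as a new cut point and duplicate the bin's theta
       # value for the two new halves before continuing.
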
  14. Adaptive proposals
      Target a specific acceptance rate:
          σ_{t+1} = σ_t + ρ_t (2 · 1(A > 0.234) − 1)
      or use the empirical covariance of the already-generated chain:
          Σ_t = δ × Cov(X_1, . . . , X_t).
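
     Two hedged sketches of the proposals on slide 14, reading A as the MH acceptance probability of the current proposal and taking ρ_t as a decreasing step size; the helper names and the 2.38^2/p default are assumptions, not from the slides.

       # (a) Robbins-Monro adjustment of the random-walk scale towards acceptance rate 0.234:
       #     sigma_{t+1} = sigma_t + rho_t * (2 * 1(A > 0.234) - 1)
       update_scale <- function(sigma, accept_prob, rho_t) {
         sigma + rho_t * (2 * as.numeric(accept_prob > 0.234) - 1)
       }

       # (b) Covariance-based proposal: Sigma_t = delta * Cov(X_1, ..., X_t),
       #     with 'chain' the t x p matrix of past states and delta a tuning constant
       #     (2.38^2 / p is a common default in dimension p).
       proposal_cov <- function(chain, delta) {
         delta * cov(chain)
       }
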
  15. Parallel Interacting Chains
      N chains (X_t^(1), . . . , X_t^(N)) instead of one, targeting the same biased distribution π_{θ_t} at iteration t and sharing the same estimated bias θ_t.
      The update of the estimated bias becomes:
          log θ_t(i) ← log θ_{t−1}(i) + γ_{κ_t} ( (1/N) Σ_{j=1}^N 1_{X_i}(X_t^(j)) − φ_i ).
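
     A sketch of the shared bias update for N interacting chains, assuming one-dimensional states stored in a numeric vector x_all and bins defined by cut points as before; the function name is illustrative.

       update_log_theta <- function(log_theta, x_all, gamma_kappa, phi, cuts) {
         d <- length(phi)
         bins <- findInterval(x_all, cuts)                   # J(X_t^(j)) for each chain j
         props <- tabulate(bins, nbins = d) / length(x_all)  # (1/N) sum_j 1_{X_i}(X_t^(j))
         log_theta + gamma_kappa * (props - phi)
       }

     Only this update couples the chains; given θ_t, the N MH moves stay independent, which is what makes the scheme well suited to parallel hardware such as GPUs.
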
  16. Parallel Interacting Chains: how "parallel" is PAWL?
      The algorithm's additional cost compared to independent parallel MCMC chains lies in:
      getting the proportions (1/N) Σ_{j=1}^N 1_{X_i}(X_t^(j)),
      updating (θ_t(1), . . . , θ_t(d)).
  17. Parallel Interacting Chains: example, Normal distribution
      [Figure: histogram of the binned coordinate, density against the binned coordinate.]
  18. Parallel Interacting Chains: reaching Flat Histogram
      [Figure: number of times FH is reached (#FH) against iterations, for N = 1, N = 10 and N = 100.]
  19. Parallel Interacting Chains: stabilization of the log penalties
      [Figure: log θ_t against t, for N = 1.]
  20. Parallel Interacting Chains: stabilization of the log penalties
      [Figure: log θ_t against t, for N = 10.]
  21. Parallel Interacting Chains: stabilization of the log penalties
      [Figure: log θ_t against t, for N = 100.]
  22. Parallel Interacting Chains: multiple effects of parallel chains
          log θ_t(i) ← log θ_{t−1}(i) + γ_{κ_t} ( (1/N) Σ_{j=1}^N 1_{X_i}(X_t^(j)) − φ_i )
      FH is reached more often when N increases, hence γ_{κ_t} decreases more quickly;
      log θ_t tends to vary much less when N increases, even for a fixed value of γ.
  23. Variable selection: settings
      Pollution data as in McDonald & Schwing (1973). For 60 metropolitan areas:
      15 possible explanatory variables (including precipitation, population per household, . . . ), denoted by X,
      the response variable Y is the age-adjusted mortality rate.
      This leads to 32,768 possible models to explain the data.
  24. Variable selection: posterior distribution
      Introduce γ ∈ {0, 1}^p, the "variable selector"; q_γ denotes the number of variables in model γ; g is some large value (g-prior, see Zellner 1986, Marin & Robert 2007).
      Posterior distribution:
          π(γ | y, X) ∝ (g + 1)^{−(q_γ + 1)/2} [ y^T y − (g/(g+1)) y^T X_γ (X_γ^T X_γ)^{−1} X_γ^T y ]^{−n/2}.
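
     A hedged R sketch of this log posterior; whether X_γ includes an intercept column and how y is centred depend on the exact g-prior convention, so the intercept handling below is an assumption, as are the argument names.

       # gamma: logical vector of length p; X: n x p covariate matrix; y: response; g: g-prior constant
       log_posterior <- function(gamma, y, X, g) {
         n <- length(y)
         q <- sum(gamma)
         Xg <- cbind(1, X[, gamma, drop = FALSE])   # intercept + selected covariates (assumption)
         yhat <- Xg %*% solve(crossprod(Xg), crossprod(Xg, y))
         quad <- sum(y * y) - (g / (g + 1)) * sum(y * yhat)   # y'y - g/(g+1) y'X_g(X_g'X_g)^{-1}X_g'y
         -((q + 1) / 2) * log(g + 1) - (n / 2) * log(quad)
       }
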
  25. Variable selection: sampler
      Most naive MH algorithm: the proposal flips one variable on/off at random at each iteration.
      Binning: along values of log π(x), found with a preliminary exploration, in 20 bins.
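
     A sketch of one step of this naive flip proposal, reusing the hypothetical log_posterior above; because flipping a uniformly chosen coordinate is a symmetric move, the proposal densities cancel in the MH ratio. In the Wang–Landau version the ratio would additionally include the log θ terms of the bins of log π for the current and proposed models.

       flip_step <- function(gamma, y, X, g, log_post_current) {
         j <- sample(length(gamma), 1)      # pick one of the p variables uniformly
         gamma_prop <- gamma
         gamma_prop[j] <- !gamma_prop[j]    # flip it on/off
         log_post_prop <- log_posterior(gamma_prop, y, X, g)
         if (log(runif(1)) < log_post_prop - log_post_current) {
           list(gamma = gamma_prop, log_post = log_post_prop)
         } else {
           list(gamma = gamma, log_post = log_post_current)
         }
       }
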
  26. Variable selection: results
      [Figure: log(θ) against iteration, three panels for N = 1, N = 10 and N = 100. Each run took 2 minutes (+/- 5 seconds). Dotted lines show the real ψ.]
  27. Variable selection: model saturation
      [Figure: q_γ/p (mean and 95% interval) along iterations, for N = 100; panels for Wang–Landau and for Metropolis–Hastings at temperatures 1, 10 and 100.]
  28. Conclusion
      Automatic binning, but. . . we still have to define a range of plausible (or "interesting") values.
      Parallel chains: it seems reasonable to use more than N = 1 chain, with or without GPUs. No theoretical validation of this yet. Optimal N for a given computational effort?
      Need for a stochastic schedule? It seems that using a large N makes the use, and hence the choice, of γ_t irrelevant.
  29. Would you like to know more?
      Article: An Adaptive Interacting Wang-Landau Algorithm for Automatic Density Exploration, with L. Bornn, P. Del Moral, A. Doucet.
      Article: The Wang-Landau algorithm reaches the Flat Histogram criterion in finite time, with R. Ryder.
      Software: PAWL, available on CRAN: install.packages("PAWL")
      References:
      F. Wang, D. Landau, Physical Review E, 64(5):56101.
      Y. Atchadé, J. Liu, Statistica Sinica, 20:209-233.