
MLPI Lecture 3: Advanced Sampling Techniques

This lecture covers several advanced MCMC sampling techniques, including collapsed Gibbs sampling, slice sampling, simulated and parallel tempering, the Swendsen-Wang algorithm, and Hamiltonian Monte Carlo, as well as several software systems for generic sampling.


  1. Lecture 3: Advanced Sampling Techniques. Dahua Lin, The Chinese University of Hong Kong.
  2. Overview
    • Collapsed Gibbs Sampling
    • Sampling with Auxiliary Variables
    • Slice Sampling
    • Simulated Tempering & Parallel Tempering
    • Swendsen-Wang Algorithm
    • Hamiltonian Monte Carlo
  3. Collapsed Gibbs Sampling
  4. Motivating Example. Consider a model with a parameter $\theta$ and latent variables $z_{1:n}$, with observations $x$. We want to sample from the joint posterior $p(\theta, z_{1:n} \mid x)$.
  5. Gibbs Sampling. Draw $z_i \sim p(z_i \mid z_{-i}, \theta, x)$ for each $i$, where $z_{-i}$ denotes all latent variables except $z_i$, with $\theta$ held at its current value.
  6. Gibbs Sampling (cont'd). Draw $\theta \sim p(\theta \mid z_{1:n}, x)$.
    • How well can this sampler perform when $\theta$ and $z_{1:n}$ are strongly coupled?
  7. Collapsed Gibbs Sampling
    • Basic idea: replace the original conditional distribution with a conditional distribution of a marginal distribution, often called a reduced conditional distribution.
    • For the example above, we consider a marginal distribution: $p(z_{1:n} \mid x) = \int p(\theta, z_{1:n} \mid x)\, d\theta$.
  8. Collapsed Gibbs Sampling (cont'd)
    • Draw $z_i$, with $\theta$ marginalized out, as: $z_i \sim p(z_i \mid z_{-i}, x)$.
    • Draw $\theta \sim p(\theta \mid z_{1:n}, x)$.
    • Can we exchange the order of these two steps? Why?
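As a concrete illustration, here is a minimal sketch of collapsed Gibbs sampling for a hypothetical toy model (all settings here are illustrative assumptions, not the model from the original slides): a two-component Gaussian mixture with equal weights, known observation variance, and conjugate normal priors on the component means. The means are marginalized out, so only the assignments $z_i$ are sampled from their reduced conditionals.

```python
import numpy as np

rng = np.random.default_rng(0)
sigma2, tau2 = 1.0, 25.0                     # known obs. variance, prior variance
x = np.concatenate([rng.normal(-3, 1, 50), rng.normal(3, 1, 50)])
n = len(x)
z = rng.integers(0, 2, n)                    # initial component assignments

def predictive(xi, members):
    # p(xi | points currently in the component) with the component mean
    # marginalized out (Normal-Normal conjugacy: posterior of mu, then
    # the posterior predictive of xi). The 1/sqrt(2*pi) constant cancels.
    m = len(members)
    post_var = 1.0 / (1.0 / tau2 + m / sigma2)
    post_mean = post_var * members.sum() / sigma2
    var = post_var + sigma2
    return np.exp(-0.5 * (xi - post_mean) ** 2 / var) / np.sqrt(var)

for sweep in range(200):
    for i in range(n):
        others = np.arange(n) != i           # exclude point i itself
        # Equal mixture weights, so the prior term over k is constant.
        probs = np.array([predictive(x[i], x[others & (z == k)]) for k in (0, 1)])
        z[i] = rng.choice(2, p=probs / probs.sum())

print("cluster sizes:", np.bincount(z))
```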
  9. Basic Guidelines
    • Order of steps matters!
    • Generally, one can move components from "being sampled" to "being conditioned on".
    • Replacing outputs with intermediates would change the stationary distribution.
    • A variable can be updated multiple times in an iteration.
  10. Why do collapsed samplers often perform better than full-fledged Gibbs samplers?
  11. Rao-Blackwell Theorem. Consider an example $p(x, y)$, and suppose we want to estimate $\mathbb{E}[f(x)]$. Suppose we have two tractable ways to do so: (1) draw $(x^{(i)}, y^{(i)}) \sim p(x, y)$ for $i = 1, \dots, n$, and compute $\hat{I}_1 = \frac{1}{n}\sum_{i=1}^n f(x^{(i)})$.
  12. Rao-Blackwell Theorem (cont'd). (2) draw $x^{(i)} \sim p_x$, where $p_x$ is the marginal distribution, and compute $\hat{I}_2 = \frac{1}{n}\sum_{i=1}^n f(x^{(i)})$.
    • Both are correct. By the strong LLN, both $\hat{I}_1$ and $\hat{I}_2$ converge to $\mathbb{E}[f(x)]$ almost surely.
    • Which one is better? Can you justify your answer?
  13. Rao-Blackwell Theorem (cont'd)
    • (Rao-Blackwell Theorem) Sample variance will be reduced when some components are marginalized out. With the setting above, we have $\mathrm{Var}(\hat{I}_2) \le \mathrm{Var}(\hat{I}_1)$.
    • Generally, reducing sample variance also leads to a reduction of the autocorrelation of the chain, thus improving the mixing performance.
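A minimal numerical sketch of the variance-reduction claim, using the conditional-expectation form of Rao-Blackwellization on a toy bivariate Gaussian (the model, the choice $f(x) = x$, and all parameters are illustrative assumptions): both estimators target $\mathbb{E}[x] = 0$, but averaging $\mathbb{E}[x \mid y] = \rho y$ instead of the sampled $x$ shrinks the variance by roughly $\mathrm{Var}(\mathbb{E}[x \mid y]) / \mathrm{Var}(x) = \rho^2$.

```python
import numpy as np

rng = np.random.default_rng(0)
rho, n, trials = 0.95, 200, 2000
est_plain, est_rb = [], []
for _ in range(trials):
    y = rng.normal(size=n)                                    # y ~ N(0, 1)
    x = rho * y + np.sqrt(1 - rho ** 2) * rng.normal(size=n)  # x | y ~ N(rho*y, 1-rho^2)
    est_plain.append(x.mean())           # average the sampled x's directly
    est_rb.append((rho * y).mean())      # average E[x | y] = rho * y instead

print("plain estimator variance:      ", np.var(est_plain))
print("Rao-Blackwellized variance:    ", np.var(est_rb))
```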
  14. Sampling with Auxiliary Variables
    • The Rao-Blackwell Theorem suggests that in order to achieve better performance, one should try to marginalize out as many components as possible.
    • However, in many cases, one may want to do the opposite, that is, to introduce additional variables to facilitate the simulation.
    • For example, when the target distribution is multimodal, one may use an auxiliary variable to help the chain escape from local traps.
  15. Use Auxiliary Variables
    • Specify an auxiliary variable $u$ and the joint distribution $p(x, u)$ such that $\int p(x, u)\, du = \pi(x)$ for the target distribution $\pi$.
    • Design a chain to update $(x, u)$ using the M-H algorithm or the Gibbs sampler.
    • The samples of $x$ can then be obtained through marginalization or conditioning.
  16. Slice Sampling
  17. Slice Sampler
    • Sampling $x \sim \pi$ is equivalent to sampling uniformly from the area under $\pi$: $S = \{(x, u) : 0 < u < \pi(x)\}$.
    • Gibbs sampling based on the uniform distribution over $S$. Each iteration consists of two steps:
      • Given $x$, draw $u \sim \mathrm{Uniform}(0, \pi(x))$.
      • Given $u$, draw $x$ uniformly from the slice $\{x : \pi(x) > u\}$.
  18. Slice Sampler (Illustration)
  19. Slice Sampler (Discussion)
    • The slice sampler can mix very rapidly, as it will not be locally trapped.
    • The slice sampler is often nontrivial to implement in practice. Drawing $x$ uniformly from the slice $\{x : \pi(x) > u\}$ is sometimes very difficult.
    • For distributions of certain forms, which admit an easy way to draw from the slice, slice sampling is a good strategy.
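Since drawing exactly uniformly from the slice is the hard part, a common practical workaround is Neal's (2003) stepping-out and shrinkage procedure. Below is a minimal univariate sketch; the bimodal target and the initial interval width `w` are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def logpdf(x):
    # Unnormalized log-density of a toy bimodal target (modes near -2 and 2).
    return np.logaddexp(-0.5 * (x + 2) ** 2, -0.5 * (x - 2) ** 2)

def slice_step(x, logpdf, w=1.0):
    logu = logpdf(x) + np.log(rng.random())   # vertical step: u ~ U(0, pi(x))
    lo = x - w * rng.random()                 # randomly position initial interval
    hi = lo + w
    while logpdf(lo) > logu:                  # step out until both ends leave slice
        lo -= w
    while logpdf(hi) > logu:
        hi += w
    while True:                               # shrinkage: sample, reject, shrink
        x1 = lo + (hi - lo) * rng.random()
        if logpdf(x1) > logu:
            return x1
        if x1 < x:
            lo = x1
        else:
            hi = x1

x, samples = 0.0, np.empty(5000)
for t in range(len(samples)):
    x = slice_step(x, logpdf)
    samples[t] = x
print(samples.mean(), samples.std())          # mean near 0 if both modes visited
```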
  20. Simulated Tempering
  21. Gibbs Measure. A Gibbs measure is a probability measure with a density in the following form: $\pi_\beta(x) = \frac{1}{Z(\beta)} \exp(-\beta E(x))$. Here, $E$ is called the energy function, $\beta$ is called the inverse temperature, and the normalizing constant $Z(\beta)$ depends on $\beta$.
  22. Gibbs Measure (cont'd). In the literature of MCMC sampling, we often parameterize a Gibbs measure using the temperature parameter $T$, thus $\beta = 1/T$ and $\pi_T(x) \propto \exp(-E(x)/T)$.
  23. Tempered MCMC. Typical MCMC methods usually rely on local moves to explore the state space. What is the problem?
  24. Tempered MCMC (cont'd). Local traps often lead to very poor mixing. Can we improve this?
  25. Simulated Tempering. Suppose we intend to sample from $\pi(x) \propto \exp(-E(x))$. Basic idea: augment the target distribution by including a temperature index $k$, with the joint distribution given by $p(x, k) \propto c_k \exp(-\beta_k E(x))$.
  26. Simulated Tempering (cont'd)
    • We only collect samples at the lowest temperature, where $\beta_1 = 1$.
    • The chain mixes much faster at high temperatures, but we want to collect samples at the lowest temperature. So we have to constantly switch between temperatures.
  27. Simulated Tempering (Algorithm). One iteration of simulated tempering has two steps (a code sketch follows):
    • (Base transition): update $x$ at the same temperature, i.e. holding $k$ fixed.
    • (Temperature switching): with $x$ fixed, propose $k \to k'$ with $k' \in \{k - 1, k + 1\}$, such that $r = \frac{c_{k'} \exp(-\beta_{k'} E(x))}{c_k \exp(-\beta_k E(x))}$.
    • Accept the change with probability $\min(1, r)$.
    • Any drawbacks?
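A minimal sketch of this algorithm on a hypothetical 1-D double-well energy. The temperature ladder and the pseudo-prior constants $c_k$ are illustrative guesses; as the next slides discuss, the ideal choice $c_k \approx 1/Z(\beta_k)$ is usually unknown.

```python
import numpy as np

rng = np.random.default_rng(0)

def E(x):                                  # double well with minima near +/-2
    return 0.5 * (x * x - 4.0) ** 2

betas = np.array([1.0, 0.5, 0.25, 0.1])    # betas[0] = 1 is the base level
logc = np.zeros(len(betas))                # log pseudo-priors (all-zero guess)

x, k = 2.0, 0
base_samples = []
for it in range(50000):
    # (Base transition) random-walk Metropolis at the current temperature
    x_new = x + 0.3 * rng.normal()
    if np.log(rng.random()) < betas[k] * (E(x) - E(x_new)):
        x = x_new
    # (Temperature switching) propose k -> k +/- 1 with x fixed
    k_new = k + (1 if rng.random() < 0.5 else -1)
    if 0 <= k_new < len(betas):
        log_r = (logc[k_new] - logc[k]) - (betas[k_new] - betas[k]) * E(x)
        if np.log(rng.random()) < log_r:
            k = k_new
    if k == 0:
        base_samples.append(x)             # keep only lowest-temperature samples

# Fraction of base-level mass in the right well; near 0.5 if mixing well.
print(len(base_samples), np.mean(np.array(base_samples) > 0))
```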
  28. Simulated Tempering (Discussion)
    • Set $\beta_1 = 1$. Given $\beta_k$, we should set $\beta_{k+1}$ such that uphill moves (from level $k$ to the hotter level $k + 1$) have a considerable probability of being accepted.
    • Build the temperature ladder step by step until we have a sufficiently smooth distribution at the top.
    • The time spent on the base level $\beta_1 = 1$ is around $1/K$ for a $K$-level ladder. If we have too many levels, only a very small portion of samples can be used.
  29. Simulated Tempering (Discussion)
    • All temperature levels play an important role. So it is desirable to spend a comparable amount of time at each level. Setting $c_k \propto 1/Z(\beta_k)$ for each $k$, we have $p(k) = \frac{c_k Z(\beta_k)}{\sum_{k'} c_{k'} Z(\beta_{k'})} = \frac{1}{K}$, i.e. the marginal over levels is uniform.
    • The normalizing constants $Z(\beta_k)$ are typically unknown, and estimating them is very difficult and expensive.
  30. Parallel Tempering. (Basic idea) Rather than jumping between temperatures, it simultaneously simulates multiple chains, each at a temperature level $\beta_k$, called a replica, and constantly swaps samples between replicas.
  31. Parallel Tempering (Algorithm). Each iteration consists of the following steps:
    • (Parallel update): simulate each replica with its own transition kernel.
    • (Replica exchange): propose to swap states between two replicas (say the $i$-th and $j$-th, where $j = i + 1$): $(x_i, x_j) \to (x_j, x_i)$.
  32. Parallel Tempering (Algorithm)
    • The proposal is accepted with probability $\min(1, r)$, where $r = \exp\big((\beta_i - \beta_j)(E(x_i) - E(x_j))\big)$.
    • We collect samples from the base replica (the one with $\beta = 1$).
    • Why does this algorithm produce the desired distribution?
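A minimal sketch on the same hypothetical double-well energy: each replica takes a random-walk Metropolis move at its own temperature, and then a random adjacent pair proposes a swap with the acceptance ratio above.

```python
import numpy as np

rng = np.random.default_rng(0)

def E(x):                                  # same double-well energy as before
    return 0.5 * (x * x - 4.0) ** 2

betas = np.array([1.0, 0.5, 0.25, 0.1])
xs = np.full(len(betas), 2.0)              # one replica per temperature level
base_samples = []

for it in range(20000):
    # (Parallel update) a random-walk Metropolis move for each replica
    for k in range(len(betas)):
        prop = xs[k] + 0.3 * rng.normal()
        if np.log(rng.random()) < betas[k] * (E(xs[k]) - E(prop)):
            xs[k] = prop
    # (Replica exchange) propose to swap a random adjacent pair (i, i+1)
    i = rng.integers(0, len(betas) - 1)
    j = i + 1
    log_r = (betas[i] - betas[j]) * (E(xs[i]) - E(xs[j]))
    if np.log(rng.random()) < log_r:
        xs[i], xs[j] = xs[j], xs[i]
    base_samples.append(xs[0])             # collect from the base replica (beta = 1)

print(np.mean(np.array(base_samples) > 0))  # near 0.5 if both wells are visited
```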
  33. Parallel Tempering (Justification). Let $1 = \beta_1 > \beta_2 > \cdots > \beta_K > 0$. We define $\pi(x_1, \dots, x_K) \propto \prod_{k=1}^K \exp(-\beta_k E(x_k))$. Obviously, the step of parallel update preserves the invariant distribution $\pi$.
  34. Parallel Tempering (Justification). Note that the step of replica exchange is symmetric, i.e. the probabilities of proposing to go up and down are equal; then, according to the Metropolis algorithm, we accept the swap with probability $\min(1, r)$ with $r = \frac{\exp(-\beta_i E(x_j) - \beta_j E(x_i))}{\exp(-\beta_i E(x_i) - \beta_j E(x_j))} = \exp\big((\beta_i - \beta_j)(E(x_i) - E(x_j))\big)$.
  35. Parallel Tempering (Discussion)
    • It is efficient and very easy to implement, especially in a parallel computing environment.
    • It is often an art rather than a technique to tune a parallel tempering system.
    • Parallel tempering is a special case of a large family of MCMC methods called Extended Ensemble Monte Carlo, which involves a collection of parallel Markov chains and a simulation that switches between them.
  36. Swendsen-Wang Algorithm. The Swendsen-Wang algorithm (R. Swendsen and J. Wang, 1987) is an efficient Gibbs sampling algorithm for sampling from the extended Ising model.
  37. Standard Ising Model. The standard Ising model is defined as $p(\sigma) \propto \exp\big(\beta \sum_{(i,j) \in E} \sigma_i \sigma_j\big)$, where $\sigma_i$ for each node $i$ is called a spin, and $\sigma_i \in \{-1, +1\}$.
    • Gibbs sampling is extremely slow, especially when the temperature is low.
  38. Extended Ising Model
    • We extend the model by introducing additional bond variables $u_{ij}$, one for each edge. Each bond has two states: $u_{ij} = 1$ indicating connected and $u_{ij} = 0$ indicating disconnected.
    • We define a joint distribution that couples the spins and bonds: $p(\sigma, u) \propto \prod_{(i,j) \in E} g(\sigma_i, \sigma_j, u_{ij})$.
  39. Extended Ising Model (cont'd). Here, $g(\sigma_i, \sigma_j, u_{ij})$ is described as below, with $p = 1 - e^{-2\beta}$:
    • When $u_{ij} = 0$, $g = 1 - p$ for every setting of $(\sigma_i, \sigma_j)$.
    • When $u_{ij} = 1$, $g = p \cdot \mathbf{1}\{\sigma_i = \sigma_j\}$.
  40. Extended Ising Model (cont'd). With this setting, $p(u \mid \sigma)$ can be written as a product over edges, $\prod_{(i,j) \in E} p(u_{ij} \mid \sigma_i, \sigma_j)$, where:
    • when $\sigma_i \ne \sigma_j$, $u_{ij}$ must be $0$;
    • when $\sigma_i = \sigma_j$, $u_{ij}$ is set to zero with probability $1 - p = e^{-2\beta}$.
  41. Swendsen-Wang Algorithm. Each iteration consists of two steps:
    • (Clustering): conditioned on the spins $\sigma$, draw the bonds $u$ independently. For an edge $(i, j)$:
      • If $\sigma_i \ne \sigma_j$, set $u_{ij} = 0$.
      • If $\sigma_i = \sigma_j$, set $u_{ij} = 1$ with probability $p$, or $u_{ij} = 0$ otherwise.
  42. Swendsen-Wang Algorithm
    • (Swapping): conditioned on the bonds $u$, draw the spins $\sigma$.
    • For each connected component, draw $\sigma = +1$ or $\sigma = -1$ with equal chance, and assign the resultant value to all nodes in the component.
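A minimal sketch of both steps on an L x L grid with free boundary conditions; the lattice size and $\beta$ are illustrative assumptions, and connected components are tracked with a small union-find.

```python
import numpy as np

rng = np.random.default_rng(0)
L, beta = 32, 0.6
p_bond = 1.0 - np.exp(-2.0 * beta)            # bond prob. for agreeing neighbors
spins = rng.choice([-1, 1], size=(L, L))

def find(parent, a):
    while parent[a] != a:
        parent[a] = parent[parent[a]]          # path halving
        a = parent[a]
    return a

for sweep in range(100):
    # (Clustering) bonds only between equal neighbors, each w.p. p_bond
    parent = list(range(L * L))
    for i in range(L):
        for j in range(L):
            for di, dj in ((0, 1), (1, 0)):    # right and down edges
                ni, nj = i + di, j + dj
                if ni < L and nj < L and spins[i, j] == spins[ni, nj] \
                        and rng.random() < p_bond:
                    ra, rb = find(parent, i * L + j), find(parent, ni * L + nj)
                    parent[ra] = rb            # union the two clusters
    # (Swapping) assign each connected component a fresh uniform spin
    new_spin = {}
    for i in range(L):
        for j in range(L):
            r = find(parent, i * L + j)
            if r not in new_spin:
                new_spin[r] = rng.choice([-1, 1])
            spins[i, j] = new_spin[r]

print("magnetization:", spins.mean())
```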
  43. Swendsen-Wang Algorithm (Illustration). In the case of a rectangular grid, this Gibbs sampling algorithm mixes very rapidly. The figures illustrate Gibbs sampling: spin states up and down are shown by filled and empty circles, and bond states 1 and 0 by thick lines and thin dotted lines. We start from a state with five connected components (isolated spins count as connected components, albeit of size 1). First the bonds are updated: bonds are forbidden from forming wherever the two adjacent spins are in opposite states, and the bonds that are not forbidden are set to the 1 state with probability $p$; then the spins are updated, and then the bonds again. One further property of the extended model: its partition function $Z$ is the same as that of the Ising model.
  44. Swendsen-Wang Algorithm (Discussion)
    • When $\beta$ is large, $u_{ij}$ has a high probability of being set to one, i.e. $\sigma_i$ and $\sigma_j$ are likely to be connected.
    • Experiments show that the Swendsen-Wang algorithm mixes very rapidly, especially for rectangular grids.
    • Can you provide an intuitive explanation?
  45. Swendsen-Wang Algorithm (Discussion)
    • The Swendsen-Wang algorithm can be generalized to Potts models (nodes can take values from a finite set).
    • The Swendsen-Wang algorithm has been widely used in image analysis applications, e.g. image segmentation (in this case, it is called Swendsen-Wang cut).
  46. Hamiltonian Monte Carlo
    • An MCMC method based on Hamiltonian dynamics. It was originally devised for molecular simulation.
    • In 1987, a seminal paper by Duane et al. unified MCMC and molecular dynamics. They called it Hybrid Monte Carlo, which abbreviates to HMC.
    • In many articles, people call it Hamiltonian Monte Carlo, as this name is considered to be more specific and informative, and it retains the same abbreviation "HMC".
  47. Motivating Example: Free Fall
  48. Motivating Example: Free Fall
    • The change of momentum $p$ is caused by the accumulation/release of the potential energy: $\frac{dp}{dt} = -\frac{\partial U}{\partial x}$.
    • The change of location $x$ is caused by the velocity, the derivative of the kinetic energy w.r.t. the momentum: $\frac{dx}{dt} = \frac{\partial K}{\partial p}$.
  49. Hamiltonian Dynamics
    • Hamiltonian dynamics is a generalized theory of classical mechanics, which provides an elegant and flexible abstraction of a dynamic system in physics.
    • In Hamiltonian dynamics, a physical system is described by $(q, p) = (q_1, \dots, q_d, p_1, \dots, p_d)$, where $q_i$ and $p_i$ are respectively the position and momentum of the $i$-th entity.
  50. Hamilton's Equations. The dynamics of the system is characterized by Hamilton's equations: $\frac{dq_i}{dt} = \frac{\partial H}{\partial p_i}, \qquad \frac{dp_i}{dt} = -\frac{\partial H}{\partial q_i}$. Here, $H(q, p)$ is called the Hamiltonian, which can be interpreted as the total energy of the system.
  51. Hamilton's Equations (cont'd)
    • The Hamiltonian $H$ is often formulated as the sum of the potential energy $U(q)$ and the kinetic energy $K(p)$: $H(q, p) = U(q) + K(p)$.
    • With this setting, Hamilton's equations become: $\frac{dq_i}{dt} = \frac{\partial K}{\partial p_i}, \qquad \frac{dp_i}{dt} = -\frac{\partial U}{\partial q_i}$.
  52. Conservation of the Hamiltonian. The Hamiltonian is conserved, i.e., it is invariant over time: $\frac{dH}{dt} = \sum_i \left[ \frac{\partial H}{\partial q_i} \frac{dq_i}{dt} + \frac{\partial H}{\partial p_i} \frac{dp_i}{dt} \right] = \sum_i \left[ \frac{\partial H}{\partial q_i} \frac{\partial H}{\partial p_i} - \frac{\partial H}{\partial p_i} \frac{\partial H}{\partial q_i} \right] = 0$. Intuitively, this reflects the law of energy conservation.
  53. Hamiltonian Reversibility
    • Hamiltonian dynamics is reversible.
    • Let the initial states be $(q(0), p(0))$ and the states at time $t$ be $(q(t), p(t))$. Then, if we reverse the process, starting at $(q(t), -p(t))$, the states at time $t$ would be $(q(0), -p(0))$.
    • In the context of MCMC, this leads to the reversibility of the underlying chain.
  54. Simulation of Hamiltonian Dynamics. A natural idea to simulate Hamiltonian dynamics is to use Euler's method over discretized time steps: $p(t + \epsilon) = p(t) - \epsilon \frac{\partial U}{\partial q}(q(t)), \qquad q(t + \epsilon) = q(t) + \epsilon \frac{\partial K}{\partial p}(p(t))$. Is this a good method?
  55. Leapfrog Method. Better results can be obtained with the leapfrog method: $p(t + \tfrac{\epsilon}{2}) = p(t) - \tfrac{\epsilon}{2} \frac{\partial U}{\partial q}(q(t))$; $\quad q(t + \epsilon) = q(t) + \epsilon \frac{\partial K}{\partial p}(p(t + \tfrac{\epsilon}{2}))$; $\quad p(t + \epsilon) = p(t + \tfrac{\epsilon}{2}) - \tfrac{\epsilon}{2} \frac{\partial U}{\partial q}(q(t + \epsilon))$. More importantly, the leapfrog update is reversible.
  56. Leapfrog Method (cont'd)
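A minimal sketch of the leapfrog update for $K(p) = p^\top p / 2$, plus a quick check on a quadratic potential that the Hamiltonian is nearly conserved; the step size and step count are illustrative choices.

```python
import numpy as np

def leapfrog(q, p, grad_U, eps, n_steps):
    """Leapfrog integration of Hamiltonian dynamics with K(p) = p.p/2."""
    q, p = np.copy(q), np.copy(p)
    p -= 0.5 * eps * grad_U(q)        # initial half step for the momentum
    for _ in range(n_steps - 1):
        q += eps * p                  # full step for the position (dq/dt = p)
        p -= eps * grad_U(q)          # full step for the momentum
    q += eps * p
    p -= 0.5 * eps * grad_U(q)        # final half step for the momentum
    return q, p

# Quick check on U(q) = q^2 / 2: H should be nearly conserved over many steps.
grad_U = lambda q: q
q, p = np.array([1.0]), np.array([0.0])
H0 = 0.5 * (q @ q + p @ p)
q, p = leapfrog(q, p, grad_U, eps=0.1, n_steps=100)
print(H0, 0.5 * (q @ q + p @ p))
```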
  57. Example. Consider a Hamiltonian system. Write down Hamilton's equations. Derive the solution.
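A standard worked instance of this exercise, assuming the one-dimensional harmonic oscillator as the example system:

```latex
H(q, p) = \frac{q^2}{2} + \frac{p^2}{2}
\qquad\Longrightarrow\qquad
\frac{dq}{dt} = \frac{\partial H}{\partial p} = p,
\quad
\frac{dp}{dt} = -\frac{\partial H}{\partial q} = -q.
```

These equations are solved by $q(t) = r \cos(a + t)$, $p(t) = -r \sin(a + t)$, where $r$ and $a$ are fixed by the initial state: the trajectory traces a circle of radius $r$ in phase space, along which $H = r^2/2$ is constant.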
  58. Example (Simulation)
  59. Hamiltonian Monte Carlo. (Basic idea): Consider the potential energy as the Gibbs energy, and introduce the momenta as auxiliary variables to control the dynamics.
  60. Hamiltonian Monte Carlo (cont'd). Suppose the target distribution is $\pi(q) \propto \exp(-U(q))$; then we form an augmented distribution as $p(q, p) \propto \exp(-U(q) - K(p)) = \exp(-H(q, p))$. Here, the locations $q$ represent the variables of interest, and the momenta $p$ control the dynamics of the simulation.
  61. Hamiltonian Monte Carlo (cont'd). In practice, the kinetic energy is often formalized as $K(p) = \frac{1}{2} p^\top M^{-1} p$, with a positive definite (often diagonal) mass matrix $M$.
  62. Hamiltonian Monte Carlo (Algorithm). Each iteration of HMC comprises two steps:
    • Gibbs update: sample the momenta $p$ from the Gaussian prior given by $p \sim \mathcal{N}(0, M)$.
  63. Hamiltonian Monte Carlo (Algorithm)
    • Metropolis update: use Hamiltonian dynamics to propose a new state. Starting from $(q, p)$, simulate the dynamic system with the leapfrog method for $L$ steps with step size $\epsilon$, which yields $(q^*, p^*)$. The proposed state is accepted with probability $\min\big(1, \exp(H(q, p) - H(q^*, p^*))\big)$.
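A minimal sketch of the full HMC loop for a hypothetical correlated 2-D Gaussian target, with $K(p) = p^\top p / 2$ (identity mass matrix) and illustrative settings for $\epsilon$ and $L$; the leapfrog function mirrors the earlier sketch.

```python
import numpy as np

rng = np.random.default_rng(0)
Sigma = np.array([[1.0, 0.95], [0.95, 1.0]])
Sigma_inv = np.linalg.inv(Sigma)

U = lambda q: 0.5 * q @ Sigma_inv @ q          # potential = -log target (+ const)
grad_U = lambda q: Sigma_inv @ q

def leapfrog(q, p, eps, n_steps):
    p = p - 0.5 * eps * grad_U(q)              # initial half step for momentum
    for _ in range(n_steps - 1):
        q = q + eps * p
        p = p - eps * grad_U(q)
    q = q + eps * p
    p = p - 0.5 * eps * grad_U(q)              # final half step for momentum
    return q, p

def hmc_step(q, eps=0.15, n_steps=20):
    p = rng.normal(size=q.shape)               # Gibbs update: p ~ N(0, I)
    H0 = U(q) + 0.5 * p @ p
    q_new, p_new = leapfrog(q, p, eps, n_steps)
    H1 = U(q_new) + 0.5 * p_new @ p_new
    if np.log(rng.random()) < H0 - H1:         # Metropolis accept/reject
        return q_new
    return q

q = np.zeros(2)
samples = np.empty((5000, 2))
for t in range(len(samples)):
    q = hmc_step(q)
    samples[t] = q
print(np.cov(samples.T))                       # should approach Sigma
```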
  64. HMC (Discussion)
    • If the simulation were exact, we would have $H(q^*, p^*) = H(q, p)$, and thus the proposed state would always be accepted.
    • In practice, there can be some deviation due to discretization, so we have to use the Metropolis rule to guarantee correctness.
  65. HMC (Discussion)
    • HMC has a high acceptance rate while allowing large moves along less-constrained directions at each iteration.
    • This is a key advantage compared to random-walk proposals, which, in order to maintain a reasonably high acceptance rate, have to keep a very small step size, resulting in substantial correlation between consecutive samples.
  66. Tuning HMC
    • For efficient simulation, it is important to choose appropriate values for both the leapfrog step size $\epsilon$ and the number of leapfrog steps per iteration $L$.
    • Tuning HMC (and indeed many generic sampling methods) often requires preliminary runs with different trial settings and different initial values, as well as careful analysis of the energy trajectories.
  67. Tuning HMC (cont'd)
    • For most cases, $\epsilon$ and $L$ can be tuned independently.
    • Too small a step size would waste computation time, while too large a step size would cause unstable simulation and thus a low acceptance rate.
    • One should choose $\epsilon$ such that the energy trajectory is stable and the acceptance rate is maintained at a reasonably high level.
    • One should choose $L$ such that back-and-forth movement of the states can be observed.
  68. Generic Sampling Systems. A number of software systems are available for sampling from models specified by the user.
    • WinBUGS: based on BUGS (Bayesian inference Using Gibbs Sampling).
      • Provides a friendly language for users to specify the model.
      • Runs only on Windows.
      • Note: development has been stopped since 2007.
  69. Generic Sampling Systems (cont'd)
    • JAGS: "Just Another Gibbs Sampler".
      • Cross-platform support.
      • Uses a dialect of BUGS.
      • Extensible: allows users to write customized functions, distributions, and samplers.
  70. Generic Sampling Systems (cont'd)
    • Stan: "Sampling Through Adaptive Neighborhoods".
      • Core written in C++, with interfaces available in Python, R, Matlab, and Julia.
      • A user-friendly language for model specification.
      • Uses Hamiltonian Monte Carlo (HMC) and the No-U-Turn Sampler (NUTS) as core algorithms.
      • Open source (GPLv3 licensed) and under active development on GitHub.
  71. Stan Example

    data {
      int<lower=0> N;
      vector[N] x;
      vector[N] y;
    }
    parameters {
      real alpha;
      real beta;
      real<lower=0> sigma;
    }
    model {
      for (n in 1:N)
        y[n] ~ normal(alpha + beta * x[n], sigma);
    }
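One way to drive this program from Python, sketched with the PyStan 2 interface (the file name linreg.stan, the synthetic data, and the sampler settings are assumptions; the API differs in PyStan 3 and CmdStanPy):

```python
import numpy as np
import pystan

# Synthetic data consistent with the model: y = alpha + beta * x + noise
rng = np.random.default_rng(0)
x = rng.normal(size=100)
y = 1.0 + 2.0 * x + 0.5 * rng.normal(size=100)

# Assumes the Stan program above is saved as linreg.stan
sm = pystan.StanModel(file="linreg.stan")
fit = sm.sampling(data={"N": len(x), "x": x, "y": y}, iter=2000, chains=4)
print(fit)   # posterior summaries for alpha, beta, and sigma
```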
  72. Generic Sampling Systems vs. Dedicated Algorithms

    | Generic             | Dedicated                          |
    |---------------------|------------------------------------|
    | Easy to use         | Requires knowledge and experience  |
    | High productivity   | Time-consuming to develop          |
    | Slow                | Often remarkably more efficient    |
    | Limited flexibility | Necessary for many new models      |
