# MLPI Lecture 2: Monte Carlo Methods (Basics)

This lecture covers the basics of Monte Carlo methods, including Monte Carlo integration, transform sampling, rejection sampling, importance sampling, Markov chain theory, and Markov chain Monte Carlo (MCMC).

• The proofs of some propositions as well as the justifications of several sampling algorithms are provided here: http://www.slideshare.net/lindahua2015/lec2-appendix


1. **Lecture 2: Monte Carlo Methods.** Dahua Lin, The Chinese University of Hong Kong.
2. **Monte Carlo Methods.** Monte Carlo methods are a large family of computational algorithms that rely on random sampling. These methods are mainly used for:
    - Numerical integration
    - Stochastic optimization
    - Characterizing distributions
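3. **Expectations in Statistical Analysis.** Computing expectations is perhaps the most common operation in statistical analysis.
    - Computing the normalization factor in a posterior distribution $p(\theta \mid D) = p(D \mid \theta)\,p(\theta) / Z$, where $Z = \int p(D \mid \theta)\,p(\theta)\,d\theta = \mathbb{E}_{p(\theta)}[p(D \mid \theta)]$.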
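4. **Expectations in Statistical Analysis (cont'd)**
    - Computing marginalization: $p(x) = \int p(x \mid z)\,p(z)\,dz = \mathbb{E}_{p(z)}[p(x \mid z)]$.
    - Computing the expectation of functions: $\mathbb{E}_p[f(X)] = \int f(x)\,p(x)\,dx$.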
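5. **Computing Expectation**
    - Generally, an expectation can be written as $\mathbb{E}_p[f] = \int_{\mathcal{X}} f(x)\,p(dx)$.
    - For a discrete space: $\mathbb{E}_p[f] = \sum_{x \in \mathcal{X}} f(x)\,p(x)$.
    - For a continuous space: $\mathbb{E}_p[f] = \int_{\mathcal{X}} f(x)\,p(x)\,dx$.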
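6. **Law of Large Numbers**
    - (Strong) Law of Large Numbers (LLN): Let $X_1, X_2, \ldots$ be i.i.d. random variables and $f$ be a measurable function with $\mathbb{E}|f(X_1)| < \infty$. Then $\frac{1}{n}\sum_{i=1}^{n} f(X_i) \to \mathbb{E}[f(X_1)]$ almost surely as $n \to \infty$.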
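7. **Monte Carlo Integration**
    - We can use the sample mean to approximate an expectation: draw $x_1, \ldots, x_n \sim p$ i.i.d. and use $\mathbb{E}_p[f] \approx \hat{\mu}_n = \frac{1}{n}\sum_{i=1}^{n} f(x_i)$.
    - How many samples are enough?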
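8. **Variance of Sample Mean**
    - By the Central Limit Theorem (CLT): $\sqrt{n}\,(\hat{\mu}_n - \mu) \xrightarrow{d} \mathcal{N}(0, \sigma^2)$, where $\mu = \mathbb{E}_p[f]$. Here, $\sigma^2$ is the variance of $f(X)$ for $X \sim p$.
    - The variance of $\hat{\mu}_n$ is $\sigma^2 / n$. The number of samples required to attain a certain variance $\epsilon$ is at least $\sigma^2 / \epsilon$.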
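The estimator and its CLT-based error bar can be sketched in Python (standard library only). The helper name `mc_estimate` and the test integrand $f(x) = x^2$ with $X \sim \mathrm{Uniform}(0, 1)$ (true value $1/3$) are illustrative assumptions, not part of the lecture.

```python
import math
import random

def mc_estimate(f, sampler, n, rng):
    """Monte Carlo estimate of E_p[f(X)] plus its estimated standard error.

    sampler(rng) must return one draw X ~ p. The standard error is
    sqrt(s^2 / n), where s^2 is the sample variance of f(X) (CLT-based).
    """
    values = [f(sampler(rng)) for _ in range(n)]
    mean = sum(values) / n
    var = sum((v - mean) ** 2 for v in values) / (n - 1)  # estimates sigma^2
    return mean, math.sqrt(var / n)

rng = random.Random(0)
est, se = mc_estimate(lambda x: x * x, lambda r: r.random(), 100_000, rng)

# Samples needed for a target estimator variance eps: n >= sigma^2 / eps.
sigma2_hat = se * se * 100_000
n_needed = math.ceil(sigma2_hat / 1e-6)
```

Here $\sigma^2 = \mathrm{Var}(X^2) = 1/5 - 1/9 = 4/45$, so reaching variance $10^{-6}$ needs roughly $89{,}000$ samples.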
9. **Random Number Generation**
    - All sampling methods rely on a stream of random numbers to construct random samples.
    - "True" random numbers are difficult to obtain. A more widely used approach is to use computational algorithms to produce long sequences of apparently random numbers, called pseudorandom numbers.
10. **Random Number Generation (cont'd)**
    - The sequence of pseudorandom numbers is determined by a seed.
    - If a randomized simulation is based on a single random stream, it can be exactly reproduced by fixing the seed initially.
11. **RNGs: Linear Congruential Generator (LCG)**
    - $x_{n+1} = (a\,x_n + c) \bmod m$.
    - Built into C and Java.
    - Useful for simple randomized programs.
    - Not good enough for serious Monte Carlo simulation.
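A minimal LCG sketch follows, assuming the classic Numerical Recipes constants $a = 1664525$, $c = 1013904223$, $m = 2^{32}$. These constants are an illustrative choice, not those of any particular C or Java runtime. The class name `LCG` is hypothetical. The sketch also demonstrates the seed-reproducibility point from the previous slide.

```python
class LCG:
    """Linear congruential generator: x_{n+1} = (a * x_n + c) mod m.

    Defaults (a=1664525, c=1013904223, m=2**32) are an assumed, classic
    full-period choice; they are not the constants of a specific stdlib.
    """

    def __init__(self, seed, a=1664525, c=1013904223, m=2**32):
        self.a, self.c, self.m = a, c, m
        self.state = seed % m

    def next_int(self):
        self.state = (self.a * self.state + self.c) % self.m
        return self.state

    def next_float(self):
        """Map the state to [0, 1). The low-order bits of an LCG are weak."""
        return self.next_int() / self.m

# Fixing the seed reproduces the stream exactly.
g1, g2 = LCG(42), LCG(42)
same = [g1.next_float() for _ in range(5)] == [g2.next_float() for _ in range(5)]
```

With tiny parameters such as $a = 5$, $c = 3$, $m = 16$ (which satisfy the Hull-Dobell full-period conditions), the generator visits all 16 states before repeating. That short period is one reason an LCG is not enough for serious simulation.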
12. **RNGs: Mersenne Twister (MT)**
    - Passes the Diehard tests; good enough for most Monte Carlo experiments.
    - Provided by C++11 or Boost.
    - Default RNG for MATLAB, NumPy, Julia, and many other numerical software packages.
    - Not amenable to parallel use.
13. **RNGs: Xorshift1024\***
    - Proposed in 2014.
    - Passes BigCrush.
    - Incredibly simple (5 to 6 lines of C code).
14. **Sampling a Discrete Distribution** (over $K$ values)
    - Linear search: $O(K)$ per sample.
    - Sorted search: $O(K)$ per sample, but much faster when the probability mass concentrates on a few values.
    - Binary search: $O(K)$ preprocessing, $O(\log K)$ per sample, but each step is a bit more expensive.
    - Can we do better?
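The binary-search variant can be sketched with Python's `bisect` module: $O(K)$ preprocessing builds the cumulative sums, then each draw is an $O(\log K)$ search. The function name `make_discrete_sampler` is illustrative.

```python
import bisect
import itertools
import random

def make_discrete_sampler(probs):
    """Inverse-CDF sampler over {0, ..., K-1} using binary search."""
    cdf = list(itertools.accumulate(probs))  # O(K) preprocessing
    total = cdf[-1]  # tolerate weights that do not sum exactly to 1

    def sample(rng):
        u = rng.random() * total
        # First index i with cdf[i] > u, i.e. cdf[i-1] <= u < cdf[i].
        return min(bisect.bisect_right(cdf, u), len(cdf) - 1)

    return sample

sample = make_discrete_sampler([0.2, 0.5, 0.3])
rng = random.Random(8)
counts = [0, 0, 0]
for _ in range(100_000):
    counts[sample(rng)] += 1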
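```

The `min(...)` guard only protects against a rounding edge case where `u` equals `total`.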
15. **Sampling a Discrete Distribution (cont'd)**
    - Huffman coding: $O(K \log K)$ preprocessing + expected $O(H(p) + 1)$ per sample, where $H(p)$ is the entropy of $p$.
    - Alias methods (by A. J. Walker): $O(K)$ preprocessing + $O(1)$ per sample.
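A sketch of the alias method follows, using Vose's $O(K)$ table construction, which is a later variant of Walker's method and not necessarily the exact version on the slide. Each draw picks a column uniformly, then flips one biased coin. The function names are illustrative.

```python
import random

def build_alias_table(probs):
    """Vose's O(K) construction of a Walker alias table.

    Column i keeps outcome i with probability prob[i], else yields alias[i].
    """
    k = len(probs)
    total = sum(probs)
    scaled = [p * k / total for p in probs]  # average column height is 1
    prob, alias = [0.0] * k, [0] * k
    small = [i for i, s in enumerate(scaled) if s < 1.0]
    large = [i for i, s in enumerate(scaled) if s >= 1.0]
    while small and large:
        s, g = small.pop(), large.pop()
        prob[s], alias[s] = scaled[s], g  # column s: own mass, topped up by g
        scaled[g] -= 1.0 - scaled[s]      # g donated 1 - scaled[s]
        (small if scaled[g] < 1.0 else large).append(g)
    for i in small + large:               # leftovers equal 1 up to rounding
        prob[i] = 1.0
    return prob, alias

def alias_sample(prob, alias, rng):
    """O(1) per draw: uniform column, then a biased coin."""
    i = rng.randrange(len(prob))
    return i if rng.random() < prob[i] else alias[i]
```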
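16. **Transform Sampling**
    - Let $F$ be the cumulative distribution function (CDF) of a distribution $p$. Let $U \sim \mathrm{Uniform}(0, 1)$; then $X = F^{-1}(U) \sim p$, where $F^{-1}(u) = \inf\{x : F(x) \ge u\}$.
    - Why $X \sim p$? Because $P(F^{-1}(U) \le x) = P(U \le F(x)) = F(x)$.
    - How to generate an exponentially distributed sample $X \sim \mathrm{Exp}(\lambda)$? Since $F(x) = 1 - e^{-\lambda x}$, take $X = -\ln(1 - U)/\lambda$.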
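The exponential case translates directly into code. This sketch uses `math.log1p` for $\ln(1 - u)$ because it is accurate for small $u$, and it relies on `random.random()` returning values in $[0, 1)$, so the logarithm is always defined.

```python
import math
import random

def sample_exponential(lam, rng):
    """Inverse-transform draw from Exp(lam), with F(x) = 1 - exp(-lam * x)."""
    u = rng.random()              # U ~ Uniform[0, 1)
    return -math.log1p(-u) / lam  # F^{-1}(u) = -ln(1 - u) / lam

rng = random.Random(9)
xs = [sample_exponential(2.0, rng) for _ in range(100_000)]
```

The sample mean of `xs` should be close to $1/\lambda = 0.5$.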
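17. **Sampling from Multivariate Normal.** How to draw a sample from a multivariate normal distribution $\mathcal{N}(\mu, \Sigma)$? (Algorithm)
    1. Perform Cholesky decomposition to get $L$ s.t. $L L^\top = \Sigma$.
    2. Generate a vector $z$ comprised of i.i.d. values from $\mathcal{N}(0, 1)$.
    3. Let $x = \mu + L z$. (Then $\mathrm{Cov}(x) = L\,\mathrm{Cov}(z)\,L^\top = L L^\top = \Sigma$.)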
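The three steps can be sketched in pure Python. This version uses a textbook Cholesky factorization for small dense matrices; in practice a linear-algebra library would replace it. The function names and the example $\mu$, $\Sigma$ are illustrative assumptions.

```python
import math
import random

def cholesky(sigma):
    """Lower-triangular L with L L^T = sigma (symmetric positive definite)."""
    n = len(sigma)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                L[i][j] = math.sqrt(sigma[i][i] - s)
            else:
                L[i][j] = (sigma[i][j] - s) / L[j][j]
    return L

def sample_mvn(mu, L, rng):
    """x = mu + L z with z_i ~ N(0, 1) i.i.d."""
    z = [rng.gauss(0.0, 1.0) for _ in mu]
    return [m + sum(L[i][k] * z[k] for k in range(i + 1)) for i, m in enumerate(mu)]

mu = [1.0, -1.0]
Sigma = [[4.0, 2.0], [2.0, 3.0]]
L = cholesky(Sigma)  # [[2, 0], [1, sqrt(2)]]
rng = random.Random(11)
xs = [sample_mvn(mu, L, rng) for _ in range(50_000)]
```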
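18. **Rejection Sampling**
    - (Rejection sampling algorithm): To sample from a distribution $p$ using a proposal distribution $q$, where $p(x) \le M q(x)$ for all $x$ and some $M < \infty$:
        1. Sample $x \sim q$.
        2. Accept $x$ with probability $\frac{p(x)}{M q(x)}$; otherwise, go back to step 1.
    - Why does this algorithm yield the desired distribution $p$?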
19. **Rejection Sampling (cont'd)**
    - What is the acceptance rate?
    - What are the problems with this method?
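A small rejection sampler helps with both questions. This sketch assumes the target $\mathrm{Beta}(2, 2)$, with density $6x(1-x)$ on $[0, 1]$, and a $\mathrm{Uniform}(0, 1)$ proposal. The density peaks at $1.5$, so $M = 1.5$ is a valid envelope. When $p$ and $q$ are both normalized, the acceptance rate is $1/M$. This illustrates the main problem with the method: a loose envelope, which is typical in high dimensions, wastes most proposals.

```python
import random

def rejection_sample(p, q_sample, q_pdf, M, rng):
    """Draw one x ~ p, assuming p(x) <= M * q(x) everywhere.

    Returns the sample and the number of proposals it took.
    """
    tries = 0
    while True:
        tries += 1
        x = q_sample(rng)
        if rng.random() < p(x) / (M * q_pdf(x)):  # accept w.p. p(x) / (M q(x))
            return x, tries

beta22 = lambda x: 6.0 * x * (1.0 - x)  # Beta(2, 2) density
rng = random.Random(1)
draws = [rejection_sample(beta22, lambda r: r.random(), lambda x: 1.0, 1.5, rng)
         for _ in range(20_000)]
xs = [x for x, _ in draws]
acc_rate = len(draws) / sum(t for _, t in draws)  # close to 1 / M = 2/3
```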
20. **Importance Sampling**
    - Basic idea: generate samples from an easier distribution $q$, which is often referred to as the proposal distribution, and then reweight the samples.
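21. **Importance Sampling (cont'd)**
    - Let $p$ be the target distribution and $q$ be the proposal distribution with $q(x) > 0$ whenever $p(x) > 0$, and let the importance weight be $w(x) = p(x)/q(x)$. Then $\mathbb{E}_p[f] = \int f(x)\,w(x)\,q(x)\,dx = \mathbb{E}_q[f w]$.
    - We can approximate $\mathbb{E}_p[f]$ with $\hat{\mu}^{\mathrm{IS}}_n = \frac{1}{n}\sum_{i=1}^{n} w(x_i)\,f(x_i)$, where $x_1, \ldots, x_n \sim q$ i.i.d.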
22. **Importance Sampling (cont'd)**
    - By the strong law of large numbers, we have $\hat{\mu}^{\mathrm{IS}}_n \to \mathbb{E}_p[f]$ almost surely.
    - How to choose a good proposal distribution $q$?
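The estimator $\hat{\mu}^{\mathrm{IS}}_n$ can be sketched as follows. The example is an assumption chosen for illustration: the target is $p = \mathcal{N}(0, 1)$, the proposal is the heavier-tailed $q = \mathcal{N}(0, 2^2)$, and $f(x) = x^2$, so the true value is $\mathbb{E}_p[X^2] = 1$. A proposal with lighter tails than $p$ would make the weights, and the variance, blow up.

```python
import math
import random

def normal_pdf(x, mu=0.0, sigma=1.0):
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))

def importance_estimate(f, p_pdf, q_pdf, q_sample, n, rng):
    """(1/n) * sum_i w(x_i) f(x_i), with x_i ~ q and w = p / q (p normalized)."""
    total = 0.0
    for _ in range(n):
        x = q_sample(rng)
        total += (p_pdf(x) / q_pdf(x)) * f(x)
    return total / n

est = importance_estimate(
    f=lambda x: x * x,
    p_pdf=normal_pdf,                          # target p = N(0, 1)
    q_pdf=lambda x: normal_pdf(x, sigma=2.0),  # proposal q = N(0, 2^2)
    q_sample=lambda r: r.gauss(0.0, 2.0),
    n=100_000,
    rng=random.Random(2),
)
```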
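23. **Variance of Importance Sampling**
    - The variance of $w(X) f(X)$, $X \sim q$, is $\mathbb{E}_q[w^2 f^2] - (\mathbb{E}_p[f])^2$.
    - The 2nd term does not depend on $q$, while the 1st term has, by Jensen's inequality, $\mathbb{E}_q[w^2 f^2] \ge (\mathbb{E}_q[w\,|f|])^2 = (\mathbb{E}_p[|f|])^2$.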
24. **Optimal Proposal**
    - The lower bound is attained when $q^*(x) = \frac{|f(x)|\,p(x)}{\int |f(x')|\,p(x')\,dx'}$.
    - The optimal proposal distribution is generally difficult to sample from.
    - However, this analysis leads us to an insight: we can achieve high sampling efficiency by emphasizing regions where the value of $|f(x)|\,p(x)$ is high.
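25. **Adaptive Importance Sampling**
    - Basic idea: learn to do sampling.
    - Choose the proposal from a tractable family $\{q_\theta\}$.
    - Objective: minimize the sample mean of $w_\theta(x)^2 f(x)^2$, where $w_\theta = p / q_\theta$. Update the parameter $\theta$ iteratively, e.g., by (stochastic) gradient descent on this objective.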
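26. **Self-Normalized Weights**
    - In many practical cases, $p$ is known only up to a normalizing constant: $p(x) = \tilde{p}(x)/Z$, where $Z$ is unknown.
    - For such cases, we may approximate $\mathbb{E}_p[f]$ with $\hat{\mu}^{\mathrm{SN}}_n = \sum_{i=1}^{n} \bar{w}_i\,f(x_i)$, with $\tilde{w}(x) = \tilde{p}(x)/q(x)$ and $\bar{w}_i = \tilde{w}(x_i) / \sum_{j=1}^{n} \tilde{w}(x_j)$.
    - Here, $\bar{w}_i$ is called the self-normalized weight.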
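27. **Self-Normalized Weights (cont'd).** Note: By the strong law of large numbers, we have $\frac{1}{n}\sum_{i} \tilde{w}(x_i)\,f(x_i) \to Z\,\mathbb{E}_p[f]$ and $\frac{1}{n}\sum_{i} \tilde{w}(x_i) \to Z$ almost surely. Hence $\hat{\mu}^{\mathrm{SN}}_n \to \mathbb{E}_p[f]$ almost surely.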
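A sketch of self-normalized importance sampling follows. The example is an assumption for illustration: the unnormalized target $\tilde{p}(x) = e^{-(x-1)^2/2}$ (that is, $\mathcal{N}(1, 1)$ with $Z = \sqrt{2\pi}$, which the estimator never uses) and the proposal $q = \mathcal{N}(0, 2^2)$. As a by-product, the function returns $\frac{1}{n}\sum_i \tilde{w}(x_i)$, which estimates $Z$.

```python
import math
import random

def normal_pdf(x, mu=0.0, sigma=1.0):
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))

def self_normalized_is(f, p_tilde, q_pdf, q_sample, n, rng):
    """Return (estimate of E_p[f], estimate of Z) from x_i ~ q.

    Uses w_i = p_tilde(x_i) / q(x_i) and wbar_i = w_i / sum_j w_j.
    """
    xs = [q_sample(rng) for _ in range(n)]
    ws = [p_tilde(x) / q_pdf(x) for x in xs]
    total_w = sum(ws)
    estimate = sum(w * f(x) for w, x in zip(ws, xs)) / total_w
    return estimate, total_w / n  # (1/n) sum_i w_i -> Z almost surely

est, z_hat = self_normalized_is(
    f=lambda x: x,
    p_tilde=lambda x: math.exp(-0.5 * (x - 1.0) ** 2),
    q_pdf=lambda x: normal_pdf(x, sigma=2.0),
    q_sample=lambda r: r.gauss(0.0, 2.0),
    n=100_000,
    rng=random.Random(10),
)
```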
28. **MCMC: Motivation.** Simple strategies like transform sampling, rejection sampling, and importance sampling all rely on drawing independent samples from $p$ or a proposal distribution $q$ over the sample space. This can become very difficult (if not impossible) for complex distributions.
29. **MCMC: Overview**
    - Markov Chain Monte Carlo (MCMC) explores the sample space through an ergodic Markov chain, whose equilibrium distribution is the target distribution $p$.
    - Many important samplers belong to MCMC, e.g., Gibbs sampling, the Metropolis-Hastings algorithm, and slice sampling.
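30. **Markov Processes**
    - A Markov process is a stochastic process where the future depends only on the present and not on the past.
    - A sequence of random variables $X_0, X_1, X_2, \ldots$ taking values in a measurable space $(\mathcal{X}, \mathcal{S})$ is called a (discrete-time) Markov process if it satisfies the Markov property: $P(X_{t+1} \in A \mid X_0, \ldots, X_t) = P(X_{t+1} \in A \mid X_t)$ for all $t$ and all $A \in \mathcal{S}$.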
31. **Countable Markov Chain.** We first review the formulation and properties of Markov chains under a simple setting, where $\mathcal{X}$ is a countable space. We will later extend the analysis to more general spaces.
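32. **Homogeneous Markov Chains.** A homogeneous Markov chain on a countable space $\mathcal{X}$, denoted by $(X_t)_{t \ge 0}$, is characterized by an initial distribution $\pi_0$ and a transition probability matrix (TPM), denoted by $P$, such that
    - $P(X_0 = x) = \pi_0(x)$, and
    - $P(X_{t+1} = y \mid X_t = x) = P(x, y)$ for all $t \ge 0$.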
33. **Stochastic Matrix.** The transition probability matrix $P$ is a stochastic matrix, namely it is nonnegative and has $\sum_{y \in \mathcal{X}} P(x, y) = 1$ for every $x \in \mathcal{X}$.
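34. **Evolution of State Distributions.** Let $\pi_t$ be the distribution of $X_t$; then $\pi_{t+1}(y) = \sum_{x \in \mathcal{X}} \pi_t(x)\,P(x, y)$. Treating $\pi_t$ as a row vector, we can simply write $\pi_{t+1} = \pi_t P$.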
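35. **Multi-step Transition Probabilities**
    - Consider two transition steps: $P(X_{t+2} = z \mid X_t = x) = \sum_{y} P(x, y)\,P(y, z) = P^2(x, z)$.
    - More generally, $P(X_{t+m} = y \mid X_t = x) = P^m(x, y)$.
    - Let $\pi_t$ be the distribution of $X_t$; then $\pi_{t+m} = \pi_t P^m$.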
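The identity $\pi_{t+m} = \pi_t P^m$ can be checked numerically with plain lists. The 3-state chain below is a hypothetical example, not one from the slides. Stepping a point mass forward five times gives the first row of $P^5$.

```python
def step_distribution(pi, P):
    """One step pi_{t+1}(y) = sum_x pi_t(x) P(x, y), with pi a row vector."""
    n = len(P)
    return [sum(pi[x] * P[x][y] for x in range(n)) for y in range(n)]

def mat_mul(A, B):
    """Plain matrix product A @ B for lists of lists."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]

def mat_pow(P, m):
    """P^m by repeated multiplication (adequate for small chains)."""
    n = len(P)
    R = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]  # P^0 = I
    for _ in range(m):
        R = mat_mul(R, P)
    return R

# Hypothetical 3-state chain used for illustration.
P = [[0.5, 0.5, 0.0],
     [0.25, 0.5, 0.25],
     [0.0, 0.5, 0.5]]
pi = [1.0, 0.0, 0.0]  # start in state 0
for _ in range(5):
    pi = step_distribution(pi, P)
P5 = mat_pow(P, 5)  # pi should now equal the first row of P^5
```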
36. **Classes of States**
    - A state $y$ is said to be accessible from state $x$, or $x$ leads to $y$, denoted by $x \to y$, if $P^n(x, y) > 0$ for some $n \ge 0$.
    - States $x$ and $y$ are said to communicate with each other, denoted by $x \leftrightarrow y$, if $x \to y$ and $y \to x$.
    - $\leftrightarrow$ is an equivalence relation on $\mathcal{X}$, which partitions $\mathcal{X}$ into communicating classes, where states within the same class communicate with each other.
37. **Irreducibility.** A Markov chain is said to be irreducible if it forms a single communicating class, or in other words, all states communicate with each other.
38. **Exercise 1**
    - Is this Markov chain irreducible?
    - Please identify the communicating classes.
39. **Periods of Markov Chains**
    - The period of a state $x$ is defined as $d(x) = \gcd\{n \ge 1 : P^n(x, x) > 0\}$.
    - A state $x$ is said to be aperiodic if $d(x) = 1$.
    - Period is a class property: if $x \leftrightarrow y$, then $d(x) = d(y)$.
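40. **Aperiodic Chains**
    - A Markov chain is called aperiodic if all states are aperiodic.
    - An irreducible Markov chain is aperiodic if there exists an aperiodic state.
    - Laziness breaks periodicity: $\tilde{P} = \alpha I + (1 - \alpha) P$ with $0 < \alpha < 1$ is aperiodic, since $\tilde{P}(x, x) \ge \alpha > 0$ for every $x$.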
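41. **First Return Time**
    - Suppose the chain is initially at state $x$. The first return time to state $x$ is defined to be $\tau_x = \min\{t \ge 1 : X_t = x\}$, with $\tau_x = \infty$ if the chain never returns. Note that $\tau_x$ is a random variable.
    - We also define $f_x^{(n)} = P(\tau_x = n \mid X_0 = x)$, the probability that the chain returns to $x$ for the first time after $n$ steps.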
42. **Recurrence.** A state $x$ is said to be recurrent if it is guaranteed to have a finite return time, as $P(\tau_x < \infty \mid X_0 = x) = \sum_{n=1}^{\infty} f_x^{(n)} = 1$. Otherwise, $x$ is said to be transient.
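43. **Recurrence (cont'd)**
    - $x$ is recurrent iff the chain started at $x$ returns to $x$ infinitely often: $P(X_t = x \text{ for infinitely many } t \mid X_0 = x) = 1$.
    - Recurrence is a class property: if $x \leftrightarrow y$ and $x$ is recurrent, then $y$ is also recurrent.
    - Every finite closed communicating class (one that the chain cannot leave) is recurrent.
    - An irreducible finite Markov chain is recurrent.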
44. **Invariant Distributions.** Consider a Markov chain with TPM $P$ on $\mathcal{X}$:
    - A distribution $\pi$ over $\mathcal{X}$ is called an invariant distribution (or stationary distribution) if $\pi P = \pi$.
    - An invariant distribution does NOT necessarily exist, and it is NOT necessarily unique.
    - Under certain conditions (ergodicity), there exists a unique invariant distribution $\pi$. In such cases, $\pi$ is often called an equilibrium distribution.
45. **Exercise 2**
    - Is this chain irreducible?
    - Is this chain periodic?
    - Please compute the invariant distribution.
46. **Positive Recurrence**
    - The expected return time of a state $x$ is defined to be $m_x = \mathbb{E}[\tau_x \mid X_0 = x]$.
    - When $x$ is transient, $m_x = \infty$. If $x$ is recurrent, $m_x$ is NOT necessarily finite.
    - A recurrent state $x$ is called positive recurrent if $m_x < \infty$. Otherwise, it is called null recurrent.
47. **Existence of Invariant Distributions.** For an irreducible Markov chain, if some state is positive recurrent, then all states are positive recurrent and the chain has an invariant distribution $\pi$ given by $\pi(x) = 1/m_x$.
48. **Example: 1D Random Walk**
    - Under what condition is this chain recurrent?
    - When it is recurrent, is it positive recurrent or null recurrent?
49. **Ergodic Markov Chains**
    - An irreducible, aperiodic, and positive recurrent Markov chain is called an ergodic Markov chain, or simply an ergodic chain.
    - A finite Markov chain is ergodic if and only if it is irreducible and aperiodic.
    - A Markov chain is ergodic if it is aperiodic and there exists $N$ such that any state can be reached from any other state within $N$ steps with positive probability.
50. **Convergence to Equilibrium.** Let $P$ be the transition probability matrix of an ergodic Markov chain. Then there exists a unique invariant distribution $\pi$, and with any initial distribution $\pi_0$, $\lim_{t \to \infty} \pi_0 P^t = \pi$. Particularly, $P^t(x, y) \to \pi(y)$ as $t \to \infty$ for all $x, y \in \mathcal{X}$.
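Convergence to equilibrium directly suggests a way to compute $\pi$ for a small ergodic chain: iterate $\pi_{t+1} = \pi_t P$ from any start (power iteration). This is a minimal sketch with a hypothetical 3-state chain whose invariant distribution is $(1/4, 1/2, 1/4)$. The periodic chain $\begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}$ started at a point mass never settles, which shows why aperiodicity is required.

```python
def stationary_by_power_iteration(P, pi0=None, tol=1e-12, max_iter=10_000):
    """Iterate pi_{t+1} = pi_t P until successive iterates differ by < tol.

    For an ergodic chain this converges to the unique invariant
    distribution from any initial distribution pi0 (uniform by default).
    """
    n = len(P)
    pi = list(pi0) if pi0 is not None else [1.0 / n] * n
    for _ in range(max_iter):
        new = [sum(pi[x] * P[x][y] for x in range(n)) for y in range(n)]
        if max(abs(a - b) for a, b in zip(new, pi)) < tol:
            return new
        pi = new
    raise RuntimeError("no convergence; the chain may be periodic or reducible")

P = [[0.5, 0.5, 0.0], [0.25, 0.5, 0.25], [0.0, 0.5, 0.5]]  # hypothetical ergodic chain
pi = stationary_by_power_iteration(P, pi0=[1.0, 0.0, 0.0])
```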
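51. **Ergodic Theorem.** The ergodic theorem relates the time mean to the space mean. Let $(X_t)$ be an ergodic Markov chain over $\mathcal{X}$ with equilibrium distribution $\pi$, and let $f$ be a measurable function on $\mathcal{X}$ with $\mathbb{E}_\pi|f| < \infty$. Then $\frac{1}{n}\sum_{t=1}^{n} f(X_t) \to \mathbb{E}_\pi[f]$ almost surely, for any initial distribution.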
52. **Ergodic Theorem (cont'd)**
    - More generally, we have for any positive integer $k$: $\frac{1}{n}\sum_{t=1}^{n} f(X_{kt}) \to \mathbb{E}_\pi[f]$ almost surely.
    - The ergodic theorem is the theoretical foundation for MCMC.
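A short simulation illustrates time mean versus space mean. It uses a hypothetical 3-state chain with $\pi = (1/4, 1/2, 1/4)$ and $f(x) = x$, so $\mathbb{E}_\pi[f] = 1$. The thinned average over $X_5, X_{10}, \ldots$ corresponds to the $k = 5$ case above.

```python
import random

def simulate_chain(P, x0, n_steps, rng):
    """Simulate X_1, ..., X_n of a finite chain with TPM P started at x0."""
    states = range(len(P))
    x, path = x0, []
    for _ in range(n_steps):
        x = rng.choices(states, weights=P[x])[0]  # X_{t+1} ~ P(x, .)
        path.append(x)
    return path

P = [[0.5, 0.5, 0.0], [0.25, 0.5, 0.25], [0.0, 0.5, 0.5]]  # pi = (1/4, 1/2, 1/4)
path = simulate_chain(P, 0, 100_000, random.Random(6))
time_mean = sum(path) / len(path)           # time mean of f(x) = x
space_mean = 0 * 0.25 + 1 * 0.5 + 2 * 0.25  # E_pi[f] = 1
thinned = path[4::5]                        # X_5, X_10, ...
```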
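53. **Total Variation Distance**
    - The total variation distance between two probability measures $\mu$ and $\nu$ is defined as $\|\mu - \nu\|_{TV} = \sup_{A} |\mu(A) - \nu(A)|$.
    - The total variation distance is a metric. If $\mathcal{X}$ is countable, we have $\|\mu - \nu\|_{TV} = \frac{1}{2}\sum_{x \in \mathcal{X}} |\mu(x) - \nu(x)|$.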
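54. **Mixing Time.** The time required by a Markov chain to get close to the equilibrium distribution is measured by the mixing time, defined as $t_{\mathrm{mix}}(\epsilon) = \min\{t : \max_{x \in \mathcal{X}} \|P^t(x, \cdot) - \pi\|_{TV} \le \epsilon\}$. In particular, $t_{\mathrm{mix}} = t_{\mathrm{mix}}(1/4)$.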
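For a small finite chain, both the total variation distance and the mixing time can be computed by brute force: advance every row of $P^t$ and stop when the worst starting state is within $\epsilon$ of $\pi$. The sketch uses a hypothetical 3-state chain with $\pi = (1/4, 1/2, 1/4)$. Its worst-case distance halves at every step, so $t_{\mathrm{mix}}(1/4) = 1$ and $t_{\mathrm{mix}}(0.01) = 6$.

```python
def tv_distance(mu, nu):
    """Total variation distance on a finite space: half the L1 distance."""
    return 0.5 * sum(abs(a - b) for a, b in zip(mu, nu))

def mixing_time(P, pi, eps=0.25, max_t=10_000):
    """Smallest t with max_x ||P^t(x, .) - pi||_TV <= eps."""
    n = len(P)
    rows = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]  # P^0 = I
    for t in range(max_t + 1):
        if max(tv_distance(r, pi) for r in rows) <= eps:
            return t
        rows = [[sum(r[k] * P[k][j] for k in range(n)) for j in range(n)] for r in rows]
    raise RuntimeError("eps not reached within max_t steps")

P = [[0.5, 0.5, 0.0], [0.25, 0.5, 0.25], [0.0, 0.5, 0.5]]  # hypothetical chain
pi = [0.25, 0.5, 0.25]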
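```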
55. **Spectral Representation**
    - Let $P$ be a finite stochastic matrix over $\mathcal{X}$ with $|\mathcal{X}| = n$.
    - The spectral radius of $P$, namely the maximum absolute value of all eigenvalues, is $1$.
    - Assume $P$ is ergodic and reversible with equilibrium distribution $\pi$.
    - Define an inner product: $\langle f, g \rangle_\pi = \sum_{x \in \mathcal{X}} f(x)\,g(x)\,\pi(x)$.
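56. **Spectral Representation (cont'd).** All eigenvalues of $P$ are real, given by $1 = \lambda_1 > \lambda_2 \ge \cdots \ge \lambda_n > -1$. Let $\phi_j$ be the right eigenvector associated with $\lambda_j$, with $\{\phi_j\}$ orthonormal w.r.t. $\langle \cdot, \cdot \rangle_\pi$. Then the left eigenvector is $\pi \circ \phi_j$ (element-wise product), and $P$ can be represented as $P(x, y) = \sum_{j=1}^{n} \lambda_j\,\phi_j(x)\,\phi_j(y)\,\pi(y)$.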
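57. **Spectral Gap.** Let $\lambda_\star = \max\{|\lambda_j| : j \ge 2\} = \max(\lambda_2, |\lambda_n|)$. The spectral gap is defined to be $\gamma = 1 - \lambda_2$, and the absolute spectral gap is defined to be $\gamma_\star = 1 - \lambda_\star$. Then $\left| \frac{P^t(x, y)}{\pi(y)} - 1 \right| \le \frac{\lambda_\star^t}{\pi_{\min}}$. Here, $\pi_{\min} = \min_{x \in \mathcal{X}} \pi(x)$.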
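58. **Bounds of Mixing Time.** The mixing time can be bounded by the inverse of the absolute spectral gap. With $t_{\mathrm{rel}} = 1/\gamma_\star$: $(t_{\mathrm{rel}} - 1)\log\frac{1}{2\epsilon} \le t_{\mathrm{mix}}(\epsilon) \le t_{\mathrm{rel}} \log\frac{1}{\epsilon\,\pi_{\min}}$. Generally, the goal in designing a rapidly mixing reversible Markov chain is to maximize the absolute spectral gap $\gamma_\star$.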
59. **Ergodic Flow.** Consider an ergodic Markov chain on a finite space $\mathcal{X}$ with transition probability matrix $P$ and equilibrium distribution $\pi$:
    - The ergodic flow from a subset $A$ to another subset $B$ is defined as $Q(A, B) = \sum_{x \in A} \sum_{y \in B} \pi(x)\,P(x, y)$.
60. **Conductance.** The conductance of a Markov chain is defined as $\Phi = \min_{S \subseteq \mathcal{X} :\, 0 < \pi(S) \le 1/2} \frac{Q(S, S^c)}{\pi(S)}$.
61. **Bounds of Spectral Gap.** (Jerrum and Sinclair (1989)) The spectral gap is bounded by $\frac{\Phi^2}{2} \le \gamma \le 2\Phi$.
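62. **Exercise 3.** Consider an ergodic finite chain $P$ with real eigenvalues $1 = \lambda_1 > \lambda_2 \ge \cdots \ge \lambda_n$. To improve the mixing time, one can add a little bit of laziness as $P_\alpha = \alpha I + (1 - \alpha) P$. Please solve for the optimal value of $\alpha$ that maximizes the absolute spectral gap $\gamma_\star$.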
63. **Exercise 4.** Consider a stochastic matrix $P$ that depends on a parameter.
    - Please specify the condition under which $P$ is ergodic.
    - What is the equilibrium distribution when $P$ is ergodic?
    - Solve for the optimal value of the parameter that maximizes the absolute spectral gap.
64. **General Markov Chains.** Next, we extend the formulation of Markov chains from countable spaces to general measurable spaces.
    - First, the Markov property remains.
    - But the transition probability matrix makes no sense in general.
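65. **General Markov Chains (cont'd)**
    - Generally, a homogeneous Markov chain, denoted by $(X_t)_{t \ge 0}$, over a measurable space $(\mathcal{X}, \mathcal{S})$ is characterized by
        - an initial measure $\mu_0$, and
        - a transition probability kernel $K : \mathcal{X} \times \mathcal{S} \to [0, 1]$, with $P(X_{t+1} \in A \mid X_t = x) = K(x, A)$.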
66. **Stochastic Kernel.** The transition probability kernel $K$ is a stochastic kernel:
    - Given $x \in \mathcal{X}$, $K(x, \cdot)$ is a probability measure over $(\mathcal{X}, \mathcal{S})$.
    - Given a measurable subset $A \in \mathcal{S}$, $K(\cdot, A)$ is a measurable function.
    - When $\mathcal{X}$ is a countable space, $K$ reduces to a stochastic matrix.
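67. **Evolution of State Distributions.** Suppose the distribution of $X_t$ is $\mu_t$; then $\mu_{t+1}(A) = \int_{\mathcal{X}} K(x, A)\,\mu_t(dx)$ for every $A \in \mathcal{S}$. Again, we can simply write this as $\mu_{t+1} = \mu_t K$.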
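68. **Composition of Stochastic Kernels**
    - The composition of stochastic kernels $K_1$ and $K_2$ remains a stochastic kernel, denoted by $K_1 K_2$, defined as $(K_1 K_2)(x, A) = \int_{\mathcal{X}} K_1(x, dy)\,K_2(y, A)$.
    - Recursive composition of $K$ for $m$ times results in a stochastic kernel denoted by $K^m$, and we have $K^{m+n} = K^m K^n$ and $\mu_{t+m} = \mu_t K^m$.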
69. **Example: Random Walk in $\mathbb{R}^d$.** Here, the stochastic kernel is given by $K(x, A) = \int_A \mathcal{N}(y \mid x, \sigma^2 I)\,dy$ (a Gaussian random walk).
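70. **Occupation Time, Return Time, and Hitting Time.** Let $A \in \mathcal{S}$ be a measurable set:
    - The occupation time: $\eta_A = \sum_{t=1}^{\infty} \mathbb{1}(X_t \in A)$.
    - The return time: $\tau_A = \min\{t \ge 1 : X_t \in A\}$.
    - The hitting time: $\sigma_A = \min\{t \ge 0 : X_t \in A\}$.
    - $\eta_A$, $\tau_A$ and $\sigma_A$ are all random variables.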
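71. **$\varphi$-irreducibility**
    - Define $L(x, A) = P(\tau_A < \infty \mid X_0 = x)$.
    - Given a positive measure $\varphi$ over $\mathcal{X}$, a Markov chain is called $\varphi$-irreducible if $L(x, A) > 0$ for every $x \in \mathcal{X}$ whenever $A$ is $\varphi$-positive, i.e., $\varphi(A) > 0$.
    - Intuitively, this means that for any $\varphi$-positive set $A$, there is a positive chance that the chain enters $A$ within finite time, no matter where it begins.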
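72. **$\varphi$-irreducibility (cont'd)**
    - A Markov chain over $\mathcal{X}$ is $\varphi$-irreducible if and only if either of the following statements holds:
        - $\sum_{n=1}^{\infty} K^n(x, A) > 0$ for every $x \in \mathcal{X}$ whenever $\varphi(A) > 0$;
        - for every $x \in \mathcal{X}$ and every $\varphi$-positive $A$, there exists $n \ge 1$ such that $K^n(x, A) > 0$.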
73. **$\varphi$-irreducibility (cont'd)**
    - Typical spaces usually come with a natural measure $\varphi$:
        - The natural measure for a countable space is the counting measure. In this case, the notion of $\varphi$-irreducibility coincides with the one introduced earlier.
        - The natural measure for $\mathbb{R}^d$ or a finite-dimensional manifold is the Lebesgue measure.
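74. **Transience and Recurrence.** Given a Markov chain $(X_t)$ over $\mathcal{X}$ and $A \in \mathcal{S}$:
    - $A$ is called transient if $\mathbb{E}[\eta_A \mid X_0 = x] < \infty$ for every $x \in A$.
    - $A$ is called uniformly transient if there exists $M < \infty$ such that $\mathbb{E}[\eta_A \mid X_0 = x] \le M$ for every $x \in A$.
    - $A$ is called recurrent if $\mathbb{E}[\eta_A \mid X_0 = x] = \infty$ for every $x \in A$.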
75. **Transience and Recurrence (cont'd)**
    - Consider a $\varphi$-irreducible chain. Then either:
        - every $\varphi$-positive subset is recurrent, in which case we call the chain recurrent; or
        - $\mathcal{X}$ is covered by countably many uniformly transient sets, in which case we call the chain transient.
76. **Invariant Measures**
    - A measure $\mu$ is called an invariant measure w.r.t. the stochastic kernel $K$ if $\mu K = \mu$, i.e., $\mu(A) = \int_{\mathcal{X}} K(x, A)\,\mu(dx)$ for every $A \in \mathcal{S}$.
    - A recurrent Markov chain admits a unique invariant measure $\mu$ (up to a scale constant).
    - Note: this measure $\mu$ can be finite or infinite.
77. **Positive Chains**
    - A Markov chain is called positive if it is irreducible and admits an invariant probability measure $\pi$.
    - The study of the existence of $\pi$ requires more sophisticated analysis.
    - We are not going into these details, as in MCMC practice, the existence of $\pi$ is usually not an issue.
78. **Subsampled Chains.** Let $K$ be a stochastic kernel and $\nu$ be a probability vector over $\mathbb{N} = \{0, 1, 2, \ldots\}$. Then $K_\nu$ defined as below is also a stochastic kernel: $K_\nu(x, A) = \sum_{n=0}^{\infty} \nu(n)\,K^n(x, A)$. The chain with kernel $K_\nu$ is called a subsampled chain with $\nu$.
79. **Subsampled Chains (cont'd)**
    - When $\nu = \delta_m$ (all mass on $m$), $K_\nu = K^m$.
    - If $\pi$ is invariant w.r.t. $K$, then $\pi$ is also invariant w.r.t. $K_\nu$.
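80. **Stationary Process**
    - A stochastic process $(X_t)$ is called stationary if $(X_{t_1 + k}, \ldots, X_{t_m + k})$ has the same joint distribution as $(X_{t_1}, \ldots, X_{t_m})$ for all $m$, all $t_1, \ldots, t_m$, and all $k \ge 0$.
    - A Markov chain is stationary if it has an invariant probability measure and that is also its initial distribution.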
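81. **Birkhoff Ergodic Theorem**
    - (Birkhoff Ergodic Theorem) Every irreducible stationary Markov chain $(X_t)$ is ergodic, that is, for any real-valued measurable function $f$ with $\mathbb{E}_\pi|f| < \infty$: $\frac{1}{n}\sum_{t=1}^{n} f(X_t) \to \mathbb{E}_\pi[f]$ almost surely, where $\pi$ is the invariant probability measure.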
82. **Markov Chain Monte Carlo.** (Markov Chain Monte Carlo): To sample from a target distribution $\pi$:
    - We first construct a Markov chain with transition probability kernel $K$ such that $\pi K = \pi$.
    - This is the most difficult part.
83. **Markov Chain Monte Carlo (cont'd)**
    - Then we simulate the chain, usually in two stages:
        - (Burn-in stage) simulate the chain and ignore all samples, until it gets close to stationary;
        - (Sampling stage) collect samples $x_1, \ldots, x_n$ from a subsampled chain $K^m$ or $K_\nu$.
    - Approximate the expectation of the function $f$ of interest using the sample mean.
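84. **Detailed Balance.** Most Markov chains in MCMC practice fall in a special family: reversible chains.
    - A distribution $\pi$ over a countable space is said to be in detailed balance with $P$ if $\pi(x)\,P(x, y) = \pi(y)\,P(y, x)$ for all $x, y \in \mathcal{X}$.
    - Detailed balance implies invariance: summing both sides over $x$ gives $\sum_x \pi(x)\,P(x, y) = \pi(y)$.
    - The converse is not true.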
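Both properties are easy to check on finite chains, and one example shows the converse failing. The two chains below are hypothetical. A birth-death chain satisfies detailed balance with $\pi = (1/4, 1/2, 1/4)$. A lazy 3-cycle that only moves "forward" is doubly stochastic, so the uniform distribution is invariant, but the probability flow $0 \to 1$ is never matched by any flow $1 \to 0$.

```python
def is_invariant(pi, P, tol=1e-12):
    """Check pi P == pi."""
    n = len(P)
    return all(abs(sum(pi[x] * P[x][y] for x in range(n)) - pi[y]) <= tol
               for y in range(n))

def in_detailed_balance(pi, P, tol=1e-12):
    """Check pi(x) P(x, y) == pi(y) P(y, x) for every pair (x, y)."""
    n = len(P)
    return all(abs(pi[x] * P[x][y] - pi[y] * P[y][x]) <= tol
               for x in range(n) for y in range(n))

# Reversible example: birth-death chain.
P_rev = [[0.5, 0.5, 0.0], [0.25, 0.5, 0.25], [0.0, 0.5, 0.5]]
pi_rev = [0.25, 0.5, 0.25]

# Counterexample to the converse: lazy 3-cycle, invariant but not reversible.
P_cyc = [[0.5, 0.5, 0.0], [0.0, 0.5, 0.5], [0.5, 0.0, 0.5]]
pi_cyc = [1 / 3, 1 / 3, 1 / 3]
```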
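85. **Reversible Chains.** Consider an irreducible Markov chain $(X_t)$ with transition probability matrix $P$ and an invariant distribution $\pi$:
    - This Markov chain is called reversible if $\pi$ is in detailed balance with $P$.
    - Under this condition, with $X_0 \sim \pi$, it has $P(X_0 = x_0, X_1 = x_1, \ldots, X_n = x_n) = P(X_0 = x_n, X_1 = x_{n-1}, \ldots, X_n = x_0)$, i.e., the chain looks the same when run backward in time.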
86. **Reversible Chains on General Spaces.** Over a general measurable space $(\mathcal{X}, \mathcal{S})$, a stochastic kernel $K$ is called reversible w.r.t. a probability measure $\pi$ if $\iint f(x, y)\,\pi(dx)\,K(x, dy) = \iint f(y, x)\,\pi(dx)\,K(x, dy)$ for any bounded measurable function $f$.
    - If $K$ is reversible w.r.t. $\pi$, then $\pi$ is invariant w.r.t. $K$.
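87. **Detailed Balance on General Spaces.** Suppose both $\pi$ and $K(x, \cdot)$ are absolutely continuous w.r.t. a base measure $\mu$: $\pi(dx) = \pi(x)\,\mu(dx)$ and $K(x, dy) = k(x, y)\,\mu(dy)$. Then the chain is reversible if and only if $\pi(x)\,k(x, y) = \pi(y)\,k(y, x)$ for ($\mu \otimes \mu$-almost) all $x, y$, which is called the detailed balance.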
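88. **Detailed Balance on General Spaces (cont'd).** More generally, if $K(x, dy) = m(x)\,\delta_x(dy) + k(x, y)\,\mu(dy)$ where $m(x) = 1 - \int k(x, y)\,\mu(dy)$, then the chain is reversible iff $\pi(x)\,k(x, y) = \pi(y)\,k(y, x)$.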
89. **Metropolis-Hastings: Overview**
    - In MCMC practice, the target distribution $\pi$ is usually known up to an unnormalized density $\tilde{\pi}$, such that $\pi(x) = \tilde{\pi}(x)/Z$, and the normalizing constant $Z$ is often intractable to compute.
    - The Metropolis-Hastings algorithm (M-H algorithm) is a classical and popular approach to MCMC sampling, which requires only the unnormalized density $\tilde{\pi}$.
90. **Metropolis-Hastings: How It Works**
    1. It is associated with a proposal kernel $Q$.
    2. At each iteration, a candidate $x'$ is generated from $Q(x, \cdot)$ given the current state $x$.
    3. With a certain acceptance ratio, which depends on both $x$ and $x'$, the candidate is accepted; otherwise, the chain stays at $x$.
    - The acceptance ratio is determined in a way that maintains detailed balance, so the resultant chain is reversible w.r.t. $\pi$.
91. **Metropolis Algorithm**
    - The Metropolis algorithm is a precursor (and a special case) of the M-H algorithm, which requires the designer to provide a symmetric kernel $Q$, i.e., $q(x, x') = q(x', x)$, where $q(x, \cdot)$ is the density of $Q(x, \cdot)$.
    - Note: $\pi$ is not necessarily invariant to $Q$.
    - The Gaussian random walk is a symmetric kernel.
92. **Metropolis Algorithm (cont'd)**
    - At each iteration, with current state $x$:
        1. Generate a candidate $x'$ from $Q(x, \cdot)$.
        2. Accept the candidate with acceptance ratio $a(x, x') = \min\left(1, \frac{\tilde{\pi}(x')}{\tilde{\pi}(x)}\right)$.
    - The Metropolis update satisfies detailed balance. Why?
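A random-walk Metropolis sampler with a Gaussian (symmetric) proposal can be sketched as follows. It also applies the two-stage recipe from the MCMC slides: discard a burn-in, then keep every `thin`-th state. Working with $\log \tilde{\pi}$ avoids overflow. The function name, the standard-normal test target, the deliberately poor start $x_0 = 10$, and the step size $2.4$ are illustrative assumptions.

```python
import math
import random

def metropolis(log_p_tilde, x0, step, n_burn, n_samples, thin, rng):
    """Random-walk Metropolis with proposal x' = x + N(0, step^2).

    Needs only the unnormalized log-density. The first n_burn iterations
    are discarded; afterwards every `thin`-th state is kept.
    Returns (samples, acceptance rate).
    """
    x, lp = x0, log_p_tilde(x0)
    samples, accepted = [], 0
    n_iter = n_burn + n_samples * thin
    for it in range(n_iter):
        x_new = x + rng.gauss(0.0, step)  # symmetric: q(x, x') = q(x', x)
        lp_new = log_p_tilde(x_new)
        # Accept w.p. min(1, p_tilde(x') / p_tilde(x)), computed in log space.
        if rng.random() < math.exp(min(0.0, lp_new - lp)):
            x, lp = x_new, lp_new
            accepted += 1
        if it >= n_burn and (it - n_burn) % thin == thin - 1:
            samples.append(x)
    return samples, accepted / n_iter

samples, acc = metropolis(lambda x: -0.5 * x * x, x0=10.0, step=2.4,
                          n_burn=1000, n_samples=20_000, thin=5,
                          rng=random.Random(3))
```

The normalizing constant $\sqrt{2\pi}$ of the target is never used; the acceptance test only needs the ratio $\tilde{\pi}(x')/\tilde{\pi}(x)$.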
93. **Metropolis-Hastings Algorithm**
    - The Metropolis-Hastings algorithm requires a proposal kernel $Q$, with density $q(x, \cdot)$.
    - $Q$ is NOT necessarily symmetric and does NOT necessarily admit $\pi$ as an invariant measure.
94. **Metropolis-Hastings Algorithm (cont'd)**
    - At each iteration, with current state $x$:
        - Generate a candidate $x'$ from $Q(x, \cdot)$.
        - Accept the candidate with acceptance ratio $a(x, x') = \min(1, r(x, x'))$, with $r(x, x') = \frac{\tilde{\pi}(x')\,q(x', x)}{\tilde{\pi}(x)\,q(x, x')}$.
    - The Metropolis-Hastings update satisfies detailed balance. Why?
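The Hastings correction matters whenever the proposal is asymmetric. This sketch assumes the target $\tilde{\pi}(x) = x^{k-1} e^{-x}$ on $x > 0$ (a $\mathrm{Gamma}(k, 1)$ density without its constant) and a multiplicative log-normal proposal $x' = x\,e^{\sigma \varepsilon}$, $\varepsilon \sim \mathcal{N}(0, 1)$. For this proposal $q(x, x') \propto 1/x'$ at equal log-distance, so $q(x', x)/q(x, x') = x'/x$, and that factor enters $r(x, x')$. The function name and parameters are illustrative.

```python
import math
import random

def mh_gamma(k, x0, sigma, n_iter, rng):
    """Metropolis-Hastings for p_tilde(x) = x^(k-1) e^(-x), x > 0.

    Proposal: x' = x * exp(sigma * eps), eps ~ N(0, 1) (log-normal walk).
    It is asymmetric: q(x', x) / q(x, x') = x' / x (Hastings correction).
    """
    def log_p(x):
        return (k - 1.0) * math.log(x) - x

    x, samples = x0, []
    for _ in range(n_iter):
        x_new = x * math.exp(sigma * rng.gauss(0.0, 1.0))
        # log r = log p~(x') - log p~(x) + log q(x', x) - log q(x, x')
        log_r = log_p(x_new) - log_p(x) + math.log(x_new) - math.log(x)
        if rng.random() < math.exp(min(0.0, log_r)):
            x = x_new
        samples.append(x)
    return samples

samples = mh_gamma(k=3.0, x0=1.0, sigma=0.5, n_iter=60_000, rng=random.Random(4))
kept = samples[2000:]  # drop a burn-in; E[X] = k = 3 for Gamma(3, 1)
```

Dropping the $x'/x$ factor would make the chain target $x^{k-2}e^{-x}$ instead, i.e. $\mathrm{Gamma}(k-1, 1)$.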
95. **Gibbs Sampling**
    - The Gibbs sampler was introduced by Geman and Geman (1984) for sampling from a Markov random field over images, and popularized by Gelfand and Smith (1990).
    - Each state $x$ is comprised of multiple components $x = (x_1, \ldots, x_K)$.
96. **Gibbs Sampling (cont'd).** At each iteration, follow a permutation $\sigma$ over $\{1, \ldots, K\}$. For $k = 1, \ldots, K$, let $j = \sigma(k)$ and update $x_j$ by re-drawing it conditioned on all other components: $x_j \sim \pi(x_j \mid x_{-j})$. Here $\pi(x_j \mid x_{-j})$ indicates the conditional distribution of $x_j$ given all other components.
97. **Gibbs Sampling (cont'd)**
    - At each iteration, one can use either a random scan or a fixed scan.
    - Different schedules can be used at different iterations to scan the components.
    - The Gibbs update is a special case of the M-H update (whose acceptance ratio is always 1), and thus satisfies detailed balance. Why?
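A fixed-scan Gibbs sampler is easiest to see on a target whose full conditionals are known in closed form. This sketch assumes a standard bivariate normal with correlation $\rho$, for which $X_1 \mid X_2 = x_2 \sim \mathcal{N}(\rho x_2, 1 - \rho^2)$ and symmetrically for $X_2$. This example is not from the lecture.

```python
import math
import random

def gibbs_bivariate_normal(rho, n_iter, rng, x0=(0.0, 0.0)):
    """Fixed-scan Gibbs for (X1, X2) ~ N(0, [[1, rho], [rho, 1]]).

    Full conditionals: X1 | X2 = x2 ~ N(rho * x2, 1 - rho^2), and vice versa.
    """
    x1, x2 = x0
    sd = math.sqrt(1.0 - rho * rho)
    samples = []
    for _ in range(n_iter):
        x1 = rng.gauss(rho * x2, sd)  # re-draw x1 given the current x2
        x2 = rng.gauss(rho * x1, sd)  # re-draw x2 given the new x1
        samples.append((x1, x2))
    return samples

s = gibbs_bivariate_normal(0.8, 50_000, random.Random(5))
```

With strong correlation the coordinate-wise moves become short, which is the usual reason Gibbs sampling mixes slowly on highly correlated targets.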
98. **Example 1: Gaussian Mixture Model**
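99. **Example 1: Gibbs Sampling.** Given the data $x_1, \ldots, x_n$, initialize the component assignments $z_1, \ldots, z_n$ and the component parameters. Conditioned on the data and the current parameters, re-draw each assignment $z_i$.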
100. **Example 1: Gibbs Sampling (cont'd).** Conditioned on the data and the current assignments, re-draw the component parameters from their conditional distributions.
101. **Example 2: Ising Model.** The normalizing constant $Z$ is usually unknown and intractable to compute.
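102. **Example 2: Gibbs Sampling**
     - Let $x_{-i}$ denote the entire $x$ vector except for one entry $x_i$; each Gibbs step re-draws $x_i$ from $p(x_i \mid x_{-i})$.
     - How can we schedule the computation so that many updates can be done in parallel?
     - Coloring: color the sites so that no two neighbors share a color. Sites of the same color are then conditionally independent given the rest, so they can be updated in parallel.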
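The coloring idea can be sketched for an assumed model form, since the slide's exact Ising parameterization is not given here: $p(x) \propto \exp(\beta \sum_{\langle i,j \rangle} x_i x_j)$ with $x_i \in \{-1, +1\}$ on an $L \times L$ torus with 4-neighbor interactions. Then $p(x_i = +1 \mid x_{-i}) = 1/(1 + e^{-2\beta h_i})$, where $h_i$ is the sum of the neighboring spins. A checkerboard two-coloring (valid on a torus when $L$ is even) splits each sweep into two phases, and each phase could run in parallel. The loops here are sequential for clarity.

```python
import math
import random

def ising_checkerboard_sweep(spins, beta, rng):
    """One Gibbs sweep for p(x) proportional to exp(beta * sum_<ij> x_i x_j).

    Assumed model: spins in {-1, +1} on an L x L torus, 4-neighbor coupling.
    Sites are colored by (i + j) % 2. Same-colored sites are never neighbors
    (L even), so each color's updates are conditionally independent.
    """
    L = len(spins)
    assert L % 2 == 0, "checkerboard coloring on a torus needs even L"
    for color in (0, 1):
        for i in range(L):
            for j in range(L):
                if (i + j) % 2 != color:
                    continue
                h = (spins[(i - 1) % L][j] + spins[(i + 1) % L][j]
                     + spins[i][(j - 1) % L] + spins[i][(j + 1) % L])
                # p(x_ij = +1 | rest) = e^{beta h} / (e^{beta h} + e^{-beta h})
                p_up = 1.0 / (1.0 + math.exp(-2.0 * beta * h))
                spins[i][j] = 1 if rng.random() < p_up else -1
    return spins
```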
103. **Mixture of MCMC Kernels.** Let $K_1, \ldots, K_m$ be stochastic kernels with the same invariant probability measure $\pi$:
     - Let $\alpha = (\alpha_1, \ldots, \alpha_m)$ be a probability vector. Then $K = \sum_{i=1}^{m} \alpha_i K_i$ remains a stochastic kernel with invariant probability measure $\pi$.
     - Furthermore, if $K_1, \ldots, K_m$ are all reversible, then $K$ is reversible.
104. **Composition of MCMC Kernels**
     - $K = K_1 K_2 \cdots K_m$ is also a stochastic kernel with invariant probability measure $\pi$.
     - Note: $K$ is generally not reversible even when $K_1, \ldots, K_m$ are all reversible, except when $K_1 = \cdots = K_m$.
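Both facts can be checked on a finite example. The sketch builds two finite-state Metropolis kernels $K_1$, $K_2$ for the same hypothetical $\pi = (0.2, 0.3, 0.5)$ from two different symmetric proposal matrices, so each kernel is reversible w.r.t. $\pi$. It then compares the mixture $\frac{1}{2}K_1 + \frac{1}{2}K_2$ with the composition $K_1 K_2$.

```python
def metropolis_kernel(pi, Q):
    """Finite-state Metropolis kernel from a symmetric proposal matrix Q.

    K(x, y) = Q(x, y) * min(1, pi(y) / pi(x)) for y != x;
    the rejected mass stays at x.
    """
    n = len(pi)
    K = [[0.0] * n for _ in range(n)]
    for x in range(n):
        for y in range(n):
            if y != x:
                K[x][y] = Q[x][y] * min(1.0, pi[y] / pi[x])
        K[x][x] = 1.0 - sum(K[x])
    return K

def mat_mul(A, B):
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)] for i in range(n)]

def is_invariant(pi, K, tol=1e-12):
    n = len(pi)
    return all(abs(sum(pi[x] * K[x][y] for x in range(n)) - pi[y]) <= tol
               for y in range(n))

def in_detailed_balance(pi, K, tol=1e-12):
    n = len(pi)
    return all(abs(pi[x] * K[x][y] - pi[y] * K[y][x]) <= tol
               for x in range(n) for y in range(n))

pi = [0.2, 0.3, 0.5]
Q1 = [[0.5, 0.5, 0.0], [0.5, 0.0, 0.5], [0.0, 0.5, 0.5]]  # moves between neighbors
Q2 = [[0.5, 0.0, 0.5], [0.0, 1.0, 0.0], [0.5, 0.0, 0.5]]  # proposes swapping 0 and 2
K1, K2 = metropolis_kernel(pi, Q1), metropolis_kernel(pi, Q2)
mix = [[0.5 * a + 0.5 * b for a, b in zip(r1, r2)] for r1, r2 in zip(K1, K2)]
comp = mat_mul(K1, K2)
```

Both `mix` and `comp` leave $\pi$ invariant, but only the mixture is reversible. For the composition, $\pi(0)\,(K_1K_2)(0, 1) = 0.1$ while $\pi(1)\,(K_1K_2)(1, 0) \approx 0.08$.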