MLPI Lecture 4: Graphical Model and Exponential Family

This lecture covers the basics of graphical models, exponential families, conjugate priors, as well as guidelines for formulating graphical models for practical problems.

Published in: Science

  1. Lecture 4: Graphical Model and Exponential Family. Dahua Lin, The Chinese University of Hong Kong
  2. Outline
      You will learn the basics of probabilistic modeling in this lecture:
      • Graphical Models
      • Exponential Families
      • Conjugate Prior
      • How to formulate and analyze graphical models in practice.
  3. Graphical Models
      • The key idea behind graphical models is factorization.
      • A graphical model generally refers to a family of joint distributions over multiple variables that factorize according to the structure of the underlying graph.
  4. Graphical Models
      A graphical model can be viewed in two ways:
      • A data structure that provides the skeleton for representing a joint distribution in a factorized manner.
      • A compact representation of a set of conditional independencies about a family of distributions.
      These two views are equivalent in a strict sense.
  5. Distributions on a Graph
      Consider a graph $G = (V, E)$, where edges can be directed or undirected:
      • Attach a random variable $X_v$ to each vertex $v \in V$.
      • The state space for $X_v$ is denoted by $\mathcal{X}_v$.
      • A particular instance of $X_v$ is denoted by $x_v$.
      • We can also consider a set of variables: $X_A = (X_v)_{v \in A}$ and $x_A = (x_v)_{v \in A}$ for $A \subseteq V$.
  6. Categories of Graphical Models
      • Bayesian Networks (Directed Acyclic Graphs)
      • Markov Random Fields (Undirected Graphs)
      • Chain Graphs (Directed acyclic graphs over undirected components)
      • Factor Graphs
  7. Directed Acyclic Graphs
      Consider a directed graph $G = (V, E)$:
      • $G$ is called a directed acyclic graph (DAG) if it has no directed cycles.
  8. Directed Acyclic Graphs (cont'd)
      Consider a directed acyclic graph $G = (V, E)$:
      • Given an edge $(u, v) \in E$, $u$ is called a parent of $v$, and $v$ is called a child of $u$.
      • A vertex $u$ is called an ancestor of $v$, and $v$ a descendant of $u$, denoted as $u \prec v$, if there exists a directed path from $u$ to $v$.
  9. Topological Ordering
      • A topological ordering of a directed graph $G = (V, E)$ is a linear ordering of vertices such that for each edge $(u, v) \in E$, $u$ always comes before $v$.
      • A finite directed graph is acyclic if and only if it has a topological ordering.
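The equivalence between acyclicity and the existence of a topological ordering can be checked constructively. The sketch below uses Kahn's algorithm, which is one standard choice and is not taken from the slides; the function name `topological_order` and the example graph are illustrative assumptions.

```python
from collections import deque

def topological_order(vertices, edges):
    """Kahn's algorithm: return a topological order, or None if the graph has a directed cycle."""
    indegree = {v: 0 for v in vertices}
    children = {v: [] for v in vertices}
    for u, v in edges:
        children[u].append(v)
        indegree[v] += 1
    # Start from vertices without parents.
    queue = deque(v for v in vertices if indegree[v] == 0)
    order = []
    while queue:
        u = queue.popleft()
        order.append(u)
        for v in children[u]:
            indegree[v] -= 1
            if indegree[v] == 0:
                queue.append(v)
    # If some vertex was never released, it lies on or behind a cycle.
    return order if len(order) == len(vertices) else None

# Hypothetical example graphs.
dag_order = topological_order("abcd", [("a", "b"), ("a", "c"), ("b", "d"), ("c", "d")])
cyclic_order = topological_order("ab", [("a", "b"), ("b", "a")])
```

Every edge in the first graph points forward in `dag_order`, while the second graph has no valid ordering.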
  10. Bayesian Networks
      Given a DAG $G = (V, E)$, we say a joint distribution over $X_V$ factorizes according to $G$ if its density $p$ can be expressed as:
      $p(x) = \prod_{v \in V} p_v(x_v \mid x_{\pi(v)})$
      • Such a model is called a Bayesian Network over $G$.
      • $\pi(v)$ is the set of $v$'s parents, which can be empty.
  11. Bayesian Networks: Example
      [Figure: example Bayesian network]
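As a small illustration of a density that factorizes over a DAG, the sketch below multiplies one conditional probability per vertex given its parents. The graph (a collider a → c ← b), the binary state spaces, and all probability values are hypothetical and chosen only for the example.

```python
from itertools import product

# Hypothetical DAG a -> c <- b with binary variables.
parents = {"a": (), "b": (), "c": ("a", "b")}
# cpt[v][parent_values] = P(X_v = 1 | X_parents = parent_values); values are made up.
cpt = {
    "a": {(): 0.3},
    "b": {(): 0.6},
    "c": {(0, 0): 0.1, (0, 1): 0.5, (1, 0): 0.7, (1, 1): 0.9},
}

def joint(x):
    """Joint probability as a product of per-vertex conditionals given the parents."""
    p = 1.0
    for v, pa in parents.items():
        p_one = cpt[v][tuple(x[u] for u in pa)]
        p *= p_one if x[v] == 1 else 1.0 - p_one
    return p

assignments = [dict(zip("abc", vals)) for vals in product((0, 1), repeat=3)]
total = sum(joint(x) for x in assignments)
```

Because every factor is itself a normalized conditional distribution, `total` equals one without any extra normalizing constant.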
  12. Undirected Graphs and Cliques
      Consider an undirected graph $G = (V, E)$:
      • A clique is a fully connected subset of vertices.
      • A clique is called maximal if it is not properly contained in another clique.
      • $\mathcal{C}(G)$ denotes the set of all maximal cliques.
  13. Undirected Graphs and Cliques (cont'd)
      [Figure: cliques of an undirected graph]
  14. Markov Random Fields
      Consider an undirected graph $G = (V, E)$. We say a joint distribution of $X_V$ factorizes according to $G$ if its density $p$ can be expressed as:
      $p(x) = \frac{1}{Z} \prod_{C \in \mathcal{C}(G)} \psi_C(x_C)$
      • This is called a Markov Random Field over $G$.
      • $\psi_C$ are called factors.
  15. Markov Random Fields (cont'd)
      • The normalizing constant $Z$ is usually needed to ensure the distribution is properly normalized:
        $Z = \sum_{x} \prod_{C \in \mathcal{C}(G)} \psi_C(x_C)$ (an integral replaces the sum for continuous variables).
      • Generally, the compatibility functions $\psi_C$ need not have any obvious relations with the marginal or conditional distributions over the cliques.
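For a very small model, the normalizing constant can be computed by brute-force enumeration, which makes its role concrete. The sketch assumes a hypothetical pairwise MRF on a three-vertex chain with binary variables and an agreement-favoring factor; none of these choices come from the slides.

```python
from itertools import product
import math

# Hypothetical pairwise MRF on the chain 0 - 1 - 2 with binary variables.
edges = [(0, 1), (1, 2)]

def psi(xu, xv, coupling=1.0):
    """Compatibility function: larger when neighbors agree. Not a probability."""
    return math.exp(coupling if xu == xv else -coupling)

def unnormalized(x):
    p = 1.0
    for u, v in edges:
        p *= psi(x[u], x[v])
    return p

states = list(product((0, 1), repeat=3))
Z = sum(unnormalized(x) for x in states)          # normalizing constant
prob = {x: unnormalized(x) / Z for x in states}   # properly normalized distribution
```

Enumeration costs time exponential in the number of variables, so this is only workable for toy models.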
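  16. MRF Parameterization
      [Figure: undirected graph on vertices a, b, c]
      • All MRFs can be parameterized in terms of maximal cliques. In practice, this is not necessarily the most natural way.
      • Natural parameterization: $p(x) \propto \psi_{ab}(x_a, x_b)\, \psi_{bc}(x_b, x_c)\, \psi_{ac}(x_a, x_c)$
      • Maximal-clique based: $p(x) \propto \psi_{abc}(x_a, x_b, x_c)$ (with $\psi_{abc} = \psi_{ab}\, \psi_{bc}\, \psi_{ac}$)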
  17. The graphical structure also encodes a set of conditional independencies among the variables.
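  18. Conditional Independence
      Consider a joint distribution over $(X, Y, Z)$. $X$ and $Y$ are called conditionally independent given $Z$, denoted by $X \perp Y \mid Z$, iff
      $P(X = x, Y = y \mid Z = z) = P(X = x \mid Z = z)\, P(Y = y \mid Z = z)$ for all $x, y, z$.
      More generally, for all measurable sets $A$ and $B$, $P(X \in A, Y \in B \mid Z) = P(X \in A \mid Z)\, P(Y \in B \mid Z)$ almost surely.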
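  19. Conditional Independence (cont'd)
      If the conditional distributions $P_{X \mid Z}$ and $P_{Y \mid Z}$ have densities $p_{X \mid Z}$ and $p_{Y \mid Z}$, then $X \perp Y \mid Z$ if the following equality holds almost surely:
      $p_{X, Y \mid Z}(x, y \mid z) = p_{X \mid Z}(x \mid z)\, p_{Y \mid Z}(y \mid z)$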
  20. I-map
      • Let $\mathcal{P}$ be a family of distributions (e.g. a graphical model). We define $I(\mathcal{P})$ to be the set of conditional independencies in the form of $(X_A \perp X_B \mid X_C)$ that hold for all distributions in $\mathcal{P}$.
      • Given a graph $G$ associated with a set of conditional independencies $I(G)$, $G$ is called an I-map of $\mathcal{P}$ if $I(G) \subseteq I(\mathcal{P})$.
      • An I-map is a graph that captures (part of) the conditional independencies of a distribution family.
  21. Conditional Independencies of MRFs
      • The conditional independencies of an MRF can be characterized in three ways:
        • Local independencies
        • Pairwise independencies
        • Global independencies
      • In the sequel, we consider an undirected graph $G = (V, E)$.
  22. Local Independencies
      • For each $v \in V$, the Markov blanket of $v$ w.r.t. $G$, denoted by $\mathrm{MB}_G(v)$, is the set of all neighbors of $v$.
      • Local independencies: $X_v$ is independent of the rest given its neighbors.
  23. Pairwise Independencies
      • Pairwise independencies: Given two disjoint sets $A, B \subset V$ with no direct edges between them, $X_A$ is independent of $X_B$ given the rest:
        $X_A \perp X_B \mid X_{V \setminus (A \cup B)}$
  24. Global Independencies
      • We say $C$ separates $A$ and $B$, denoted by $\mathrm{sep}_G(A; B \mid C)$, if all paths between $A$ and $B$ go through $C$.
      • Global independencies: If $C$ separates $A$ and $B$, then $X_A$ is independent of $X_B$ given $X_C$.
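Graph separation is a purely combinatorial test: remove the conditioning set and check whether the two sets are still connected. The sketch below does this with a breadth-first search; the adjacency-dictionary representation and the function name `separates` are assumptions made for the example.

```python
from collections import deque

def separates(adj, A, B, C):
    """True if every path between A and B in the undirected graph passes through C."""
    blocked = set(C)
    frontier = deque(a for a in A if a not in blocked)
    seen = set(frontier)
    while frontier:
        u = frontier.popleft()
        if u in B:
            return False  # reached B without crossing C
        for v in adj[u]:
            if v not in seen and v not in blocked:
                seen.add(v)
                frontier.append(v)
    return True

# Hypothetical chain a - b - c - d.
adj = {"a": {"b"}, "b": {"a", "c"}, "c": {"b", "d"}, "d": {"c"}}
blocked_by_c = separates(adj, {"a"}, {"d"}, {"c"})
blocked_by_nothing = separates(adj, {"a"}, {"d"}, set())
```

On the chain, observing `c` separates `a` from `d`, whereas the empty conditioning set does not.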
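  25. Relations between Independencies
      • The global independencies contain the local and pairwise ones: $I_l(G) \subseteq I(G)$ and $I_p(G) \subseteq I(G)$.
      • Given a distribution or a family of distributions $\mathcal{P}$, we say $\mathcal{P}$ satisfies $I$ if it satisfies all conditional independencies in $I$, denoted by $\mathcal{P} \models I$.
      • $\mathcal{P} \models I(G) \Rightarrow \mathcal{P} \models I_l(G)$ and $\mathcal{P} \models I_p(G)$.
      • If $\mathcal{P}$ is a family of positive distributions, then $\mathcal{P} \models I_p(G) \Rightarrow \mathcal{P} \models I(G)$.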
  26. Soundness
      • Let $P$ be a distribution that factorizes according to an undirected graph $G$, then $P \models I(G)$, or in other words, $G$ is an I-map of $P$.
      • $P \models I_l(G)$ and $P \models I_p(G)$.
      • How to prove it?
      • How is the separation assumption related to the maximal cliques?
  27. We have shown that if $P$ factorizes according to $G$, then $G$ is an I-map for $P$. Is the converse also true?
  28. Hammersley-Clifford
      • (Hammersley-Clifford Theorem) Let $P$ be a positive distribution over $\mathcal{X}$ and $G$ be an I-map of $P$, then $P$ factorizes according to $G$.
      • Combining Soundness and Hammersley-Clifford:
        • A positive distribution $P$ factorizes according to $G$ if and only if $G$ is an I-map of $P$.
  29. Conditional Independencies of BN
      • Conditional independencies of a Bayesian network can be characterized in two ways:
        • local independencies
        • global independencies (via d-separation)
      • In the sequel, we consider a directed graph $G = (V, E)$.
  30. Local Independencies
      [Figure: DAG on vertices a through i]
      • Given $v \in V$, $X_v$ is independent of its non-descendants given its parents $\pi(v)$:
        $X_v \perp X_{\mathrm{nd}(v)} \mid X_{\pi(v)}$
  31. [Figure: three-variable trails over X, Z, Y illustrating indirect effect, common cause, and common effect]
  32. d-separation
      When "influence" can flow from $X$ to $Y$ via $Z$, we say that the trail $X - Z - Y$ is active:
      • $X \to Z \to Y$ is active iff $Z$ is not observed.
      • $X \leftarrow Z \leftarrow Y$ is active iff $Z$ is not observed.
      • $X \leftarrow Z \to Y$ is active iff $Z$ is not observed.
      • (V-structure) $X \to Z \leftarrow Y$ is active iff either $Z$ or some of $Z$'s descendants is observed.
  33. d-separation (cont'd)
      • A trail $X_1 - X_2 - \cdots - X_n$ is called active when all sub-trails $X_{i-1} - X_i - X_{i+1}$ are active.
      • Let $A, B, C$ be three sets of vertices of $G$. $A$ and $B$ are d-separated by $C$, denoted by $\mathrm{d\text{-}sep}_G(A; B \mid C)$, if there is neither a direct link nor an active trail between $A$ and $B$ when $C$ is observed.
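  34. Global Independencies
      [Figure: DAG on vertices a through i]
      • Given $A, B, C \subset V$, $X_A$ is independent of $X_B$ given $X_C$ if $A$ and $B$ are d-separated by $C$ on the graph $G$:
        $I(G) = \{ (X_A \perp X_B \mid X_C) : \mathrm{d\text{-}sep}_G(A; B \mid C) \}$
      • (Soundness) If $P$ factorizes according to $G$, then $P \models I(G)$, or we say $G$ is an I-map of $P$.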
  35. Moralized Graphs
      • Given a directed graph $G = (V, E)$, we can construct a moralized graph, denoted by $\mathcal{M}[G]$, by adding undirected edges between each node and its parents and between each node's parents.
      • In $\mathcal{M}[G]$, the subgraph that spans $\{v\} \cup \pi(v)$ forms a clique, denoted by $C_v$.
      • The procedure of constructing $\mathcal{M}[G]$ from $G$ is called moralization.
  36. Moralized Graphs (Illustration)
      [Figure: a DAG on vertices a through i and its moralized graph]
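Moralization can be sketched in a few lines: connect every vertex to its parents and connect all pairs of parents that share a child. The parent-dictionary representation and the function name `moralize` are assumptions for this example.

```python
from itertools import combinations

def moralize(parents):
    """Return the undirected edge set of the moralized graph.

    `parents` maps each vertex to a list of its parents in the DAG.
    """
    edges = set()
    for v, pa in parents.items():
        for u in pa:
            edges.add(frozenset((u, v)))      # keep each parent-child link, undirected
        for u, w in combinations(pa, 2):
            edges.add(frozenset((u, w)))      # "marry" parents that share a child
    return edges

# Hypothetical v-structure a -> c <- b.
moral = moralize({"a": [], "b": [], "c": ["a", "b"]})
```

For the v-structure, moralization adds the edge between `a` and `b`, so the three vertices form a single clique.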
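  37. From BN to MRF
      If $P$ factorizes according to $G$ as
      $p(x) = \prod_{v \in V} p_v(x_v \mid x_{\pi(v)})$
      then $P$ factorizes according to $\mathcal{M}[G]$:
      $p(x) = \prod_{v \in V} \psi_{C_v}(x_{C_v})$ with $\psi_{C_v}(x_{C_v}) = p_v(x_v \mid x_{\pi(v)})$
      • $I(\mathcal{M}[G]) \subseteq I(G)$. Is the opposite true?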
  38. From BN to MRF (cont'd)
      • In general, moralization may cause the loss of conditional independencies.
      • $I(\mathcal{M}[G])$ may be a proper subset of $I(G)$.
      • Consider a DAG: $a \to c \leftarrow b$
      • $X_a \perp X_b$ holds for $G$ but not for $\mathcal{M}[G]$.
      • Not every MRF can be converted to a BN.
  39. Factor Graphs
      • An MRF does not always fully reveal the factorized structure of a distribution.
      • A factor graph can sometimes give a more accurate characterization of a family of distributions.
      • A factor graph is a bipartite graph with links between two types of nodes: variables and factors.
      • A variable $x_v$ and a factor $f$ are linked in a factor graph if the factor involves $x_v$ as an argument.
  40. Factor Graphs (Illustration)
      [Figure: factor graph example]
  41. Study of Distributions
      • Graphical models: structure of (in)dependencies
      • Exponential families: algebraic characteristics
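  42. Exponential Families
      An exponential family $\mathcal{P}$ over a measure space $(\Omega, \mathcal{F}, \nu)$:
      $p(x \mid \theta) = \frac{1}{Z(\theta)}\, h(x) \exp\left(\eta(\theta)^T \phi(x)\right)$
      • sufficient statistics: $\phi(x)$
      • canonical parameter function: $\eta(\theta)$
      • partition function: $Z(\theta)$
      • base density: $h(x)$ over $\Omega$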
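  43. Partition Function
      • The partition function is given by:
        $Z(\theta) = \int_\Omega h(x) \exp\left(\eta(\theta)^T \phi(x)\right) \nu(dx)$
      • The log-partition function given by $A(\theta) = \log Z(\theta)$ is often used instead of $Z(\theta)$.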
  44. Parameter Space
      • An exponential family is essentially determined by the domain $\Omega$ and the sufficient statistics $\phi$.
      • The set of valid parameters is $\Theta = \{\theta : Z(\theta) < \infty\}$.
      • An exponential family can be parameterized in many ways. When $\eta(\theta) = \theta$, it is said to be in the canonical form.
  45. Many important families of distributions are exponential families:
      • Binomial distribution
      • Poisson distribution
      • Normal distribution
      • Exponential distribution
      • Beta distribution
      • And many more ...
  46. Bernoulli Distribution
      Domain: $\{0, 1\}$
      Parameter: $\theta \in (0, 1)$
      Density: $p(x \mid \theta) = \theta^x (1 - \theta)^{1 - x}$
      Bernoulli distributions describe an event that may or may not happen.
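  47. Bernoulli Distribution (cont'd)
      • sufficient statistics: $\phi(x) = x$
      • canonical parameter: $\eta(\theta) = \log \frac{\theta}{1 - \theta}$
      • base density: $h(x) = 1$ w.r.t. counting measure
      • partition function: $Z(\theta) = \frac{1}{1 - \theta}$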
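  48. Bernoulli Distribution (cont'd)
      • sufficient statistics: $\phi(x) = (x, 1 - x)$
      • canonical parameters: $\eta(\theta) = (\log \theta, \log(1 - \theta))$
      • base density: $h(x) = 1$ w.r.t. counting measure
      • partition function: $Z(\theta) = 1$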
  49. Poisson Distribution
      Domain: $\{0, 1, 2, \ldots\}$
      Parameter: $\lambda > 0$
      Density: $p(x \mid \lambda) = \frac{\lambda^x e^{-\lambda}}{x!}$
      Poisson distributions characterize the number of independent events occurring at a certain rate $\lambda$ within a unit time.
  50. Poisson Distribution (cont'd)
      • sufficient statistics: $\phi(x) = x$
      • canonical parameter: $\eta(\lambda) = \log \lambda$
      • base density: $h(x) = 1 / x!$ w.r.t. counting measure
      • partition function: $Z(\lambda) = e^{\lambda}$
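A quick numerical check that the exponential-family form reproduces the usual Poisson probability mass function. The decomposition used here (sufficient statistic x, canonical parameter log λ, base density 1/x!, partition function e^λ) is the standard one; the function names are illustrative.

```python
import math

def poisson_pmf(x, lam):
    """Textbook Poisson pmf."""
    return lam ** x * math.exp(-lam) / math.factorial(x)

def poisson_expfam(x, lam):
    """Same pmf written as h(x) * exp(eta * phi(x) - A)."""
    eta = math.log(lam)            # canonical parameter
    h = 1.0 / math.factorial(x)    # base density w.r.t. counting measure
    A = lam                        # log-partition: log Z = log(e^lam)
    return h * math.exp(eta * x - A)

lam = 2.5  # arbitrary rate chosen for the check
max_gap = max(abs(poisson_pmf(x, lam) - poisson_expfam(x, lam)) for x in range(20))
```

The two expressions agree up to floating-point error for every count tested.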
  51. Exponential Distribution
      Domain: $[0, \infty)$
      Parameter: $\lambda > 0$
      Density: $p(x \mid \lambda) = \lambda e^{-\lambda x}$
      Exponential distributions characterize the time interval between independent events occurring at a certain rate $\lambda$.
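  52. Exponential Distribution (cont'd)
      • sufficient statistics: $\phi(x) = x$ or $\phi(x) = -x$
      • canonical parameter: $\eta = -\lambda$ or $\eta = \lambda$, respectively
      • base density: $h(x) = 1$ w.r.t. Lebesgue measure
      • partition function: $Z(\lambda) = \int_0^\infty e^{-\lambda x}\, dx = 1 / \lambda$, which is finite only when $\lambda > 0$.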
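  53. Normal Distribution
      Domain: $\mathbb{R}$
      Parameter: $(\mu, \sigma^2)$ with $\sigma^2 > 0$
      Density: $p(x \mid \mu, \sigma^2) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left(-\frac{(x - \mu)^2}{2\sigma^2}\right)$
      Normal distributions are probably the most widely used distributions in probabilistic analysis.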
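  54. Normal Distribution (cont'd)
      • sufficient statistics: $\phi(x) = (x, x^2)$
      • canonical parameter: $\eta(\mu, \sigma^2) = \left(\frac{\mu}{\sigma^2}, -\frac{1}{2\sigma^2}\right)$
      • base density: $h(x) = \frac{1}{\sqrt{2\pi}}$ w.r.t. Lebesgue measure
      • partition function: $Z(\mu, \sigma^2) = \sigma \exp\left(\frac{\mu^2}{2\sigma^2}\right)$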
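  55. Normal Distr. in Canonical Form
      The normal distribution can be alternatively parameterized in the canonical form:
      $p(x \mid h, J) = \exp\left(h x - \frac{1}{2} J x^2 - A(h, J)\right)$
      • potential coefficient: $h = \mu / \sigma^2$
      • precision coefficient: $J = 1 / \sigma^2$
      with $A(h, J) = \frac{h^2}{2J} - \frac{1}{2} \log J + \frac{1}{2} \log(2\pi)$.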
  56. Regular Family
      In the sequel, we focus on exponential families in the canonical form. The set of valid canonical parameters is:
      $\Theta = \{\theta \in \mathbb{R}^d : A(\theta) < \infty\}$
      The exponential family $\mathcal{P}$ is called a regular family if $\Theta$ is an open subset of $\mathbb{R}^d$. We restrict our attention to regular families.
  57. Identifiability
      Let $\mathcal{P} = \{P_\theta : \theta \in \Theta\}$ be a parameterized family:
      • $\mathcal{P}$ is called identifiable when each distribution in $\mathcal{P}$ corresponds to a unique parameter in $\Theta$:
        $P_{\theta_1} = P_{\theta_2} \Rightarrow \theta_1 = \theta_2$
      • Identifiability means that the parameter of a distribution can be learned from observed samples without the need of additional constraints.
  58. Minimal and Overcomplete
      Consider an exponential family with sufficient stats $\phi$:
      • If there exists a non-zero vector $a$ such that $a^T \phi(x) = c$ (a constant) holds almost everywhere, this is called an overcomplete representation; otherwise, it is called a minimal representation.
      • An exponential family is identifiable if and only if the representation is minimal. Why?
  59. Minimal and Overcomplete (cont'd)
      • Consider an exponential family with sufficient stats $\phi$ such that $a^T \phi(x)$ is constant. Then for each $\theta \in \Theta$, $\theta + t a$ for each $t \in \mathbb{R}$ is also in $\Theta$ and yields the same distribution.
      • We will answer why a minimal representation is identifiable later.
      • An overcomplete representation is useful as it may lead to a more natural parameterization. Also, with additional constraints, it can be made identifiable.
  60. Bernoulli Revisited
      Consider two representations:
      • [R1] $\phi(x) = x$
      • [R2] $\phi(x) = (x, 1 - x)$
      For each representation:
      • Is it minimal or overcomplete?
      • If it is overcomplete, find $a \ne 0$ such that $a^T \phi(x)$ is constant.
      • Is it identifiable or unidentifiable?
  61. Mean Parameters
      • The expectations of the sufficient statistics, as below, are called mean parameters:
        $\mu = E_\theta[\phi(X)]$
      • Under certain conditions, the distribution in an exponential family is uniquely determined by the mean parameters, which thus provide an alternative parameterization.
  62. Realizable Mean Parameters
      • Given sufficient stats $\phi$, we say a distribution $p$ realizes a mean parameter $\mu$ if $E_p[\phi(X)] = \mu$.
      • The set of (realizable) mean parameters for given sufficient stats $\phi$ is:
        $\mathcal{M} = \{\mu \in \mathbb{R}^d : \exists\, p \text{ such that } E_p[\phi(X)] = \mu\}$
      • Here, $p$ is not restricted to the exponential family.
      • $\mathcal{M}$ is a convex set. Why?
  63. Convex Hulls
      • Given a set $S \subset \mathbb{R}^d$, the convex hull of $S$, denoted by $\mathrm{conv}(S)$, is the set of all convex combinations of elements in $S$.
      • $\mathrm{conv}(S)$ is the minimum convex set containing $S$.
      • A convex hull of some finite set is called a convex polytope.
      • Convex polytopes are compact.
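  64. Probability Simplex
      Given a finite space $\Omega$, the probability simplex over $\Omega$:
      $\Delta(\Omega) = \{p : \Omega \to [0, 1],\ \sum_{x \in \Omega} p(x) = 1\}$
      When $\Omega = \{1, \ldots, n\}$, $\Delta(\Omega)$ reduces to:
      $\Delta_n = \{p \in \mathbb{R}^n : p_i \ge 0,\ \sum_{i=1}^n p_i = 1\}$
      and $\Delta_n$ is an $(n - 1)$-dimensional polytope.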
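  65. Polytope of Mean Parameters
      When the sample space $\Omega$ is finite, given any $\phi$, the set $\mathcal{M}$ is a convex polytope:
      $\mathcal{M} = \mathrm{conv}\{\phi(x) : x \in \Omega\}$
      Particularly, each $\mu \in \mathcal{M}$ can be written as
      $\mu = \sum_{x \in \Omega} p(x)\, \phi(x)$ with $p \in \Delta(\Omega)$.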
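  66. Log-partition Function
      The log-partition function given by
      $A(\theta) = \log \int_\Omega h(x) \exp\left(\theta^T \phi(x)\right) \nu(dx)$
      has the following properties:
      • $\nabla A(\theta) = E_\theta[\phi(X)]$
      • $\nabla^2 A(\theta) = \mathrm{Cov}_\theta[\phi(X)]$
      • $A$ is a convex function and thus $\Theta$ is a convex set.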
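  67. Log-partition Function (cont'd)
      • For an overcomplete representation with $a^T \phi(x) = c$, $\nabla^2 A$ has $a^T \nabla^2 A(\theta)\, a = 0$, because:
        $a^T \nabla^2 A(\theta)\, a = \mathrm{Var}_\theta\left(a^T \phi(X)\right) = 0$
      • Conversely, for a minimal representation, we have $\mathrm{Var}_\theta\left(a^T \phi(X)\right) > 0$ for every non-zero vector $a$, hence $\nabla^2 A(\theta)$ is positive definite for every $\theta \in \Theta$ and thus $A$ is strictly convex.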
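The derivative properties of the log-partition function can be sanity-checked numerically. The sketch uses the Bernoulli family in its minimal canonical form, where the log-partition function is log(1 + e^θ); the finite-difference helper and the test point are arbitrary choices for illustration.

```python
import math

def A(theta):
    """Log-partition function of the Bernoulli family in canonical form."""
    return math.log1p(math.exp(theta))

def mean(theta):
    """Mean parameter E[x] = P(x = 1), the logistic function of theta."""
    return 1.0 / (1.0 + math.exp(-theta))

def derivative(f, t, eps=1e-4):
    """Central finite difference."""
    return (f(t + eps) - f(t - eps)) / (2 * eps)

theta = 0.7  # arbitrary test point
grad = derivative(A, theta)                               # should match the mean
hess = derivative(lambda t: derivative(A, t), theta)      # should match the variance
variance = mean(theta) * (1.0 - mean(theta))
```

The positive second derivative at every test point reflects the strict convexity expected for a minimal representation.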
  68. Gradient Map
      The gradient map $\nabla A : \Theta \to \mathcal{M}$ is a mapping from the canonical parameters $\theta$ to the mean parameters $\mu$.
      • When is $\nabla A$ injective (i.e. one-to-one)?
      • When is $\nabla A$ surjective (onto $\mathcal{M}$)?
  69. Gradient Map (cont'd)
      • The gradient map is injective if and only if the exponential representation is minimal.
      • An exponential family with minimal representation is identifiable.
      • How to prove? $A$ is strictly convex $\Rightarrow$ $\langle \nabla A(\theta_1) - \nabla A(\theta_2),\ \theta_1 - \theta_2 \rangle > 0$ for $\theta_1 \ne \theta_2$.
      • With overcomplete representation, there is a one-to-one correspondence between mean parameters and affine subsets of $\Theta$.
  70. Gradient Map (cont'd)
      • With minimal representation, $\nabla A$ is onto $\mathcal{M}^\circ$, the interior of $\mathcal{M}$.
      • Each mean parameter $\mu \in \mathcal{M}^\circ$ is uniquely realized by a canonical parameter $\theta(\mu)$.
      • Given $\mu \in \mathcal{M}^\circ$, there can be many distributions that realize $\mu$, among which there is one that maximizes the entropy, which is in the exponential family associated with $\phi$ (we will see this).
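  71. Entropy
      Given an exponential family distribution
      $p_\theta(x) = h(x) \exp\left(\theta^T \phi(x) - A(\theta)\right)$
      The entropy of $p_\theta$ is defined to be:
      $H(p_\theta) = -E_\theta[\log p_\theta(X)]$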
  72. Maximum Entropy (Problem)
      Consider a finite space $\Omega$, we want to
      maximize $H(p)$ subject to $E_p[\phi(X)] = \mu$ and $p \in \Delta(\Omega)$.
      What's the solution?
  73. Maximum Entropy (Solution)
      Using the method of Lagrange multipliers, we get the optimum:
      $p^\star(x) = \exp\left(\theta^T \phi(x) - A(\theta)\right)$, where the multipliers $\theta$ are chosen so that $E_{p^\star}[\phi(X)] = \mu$.
      The solution to a maximum entropy problem with expectation constraints is always an exponential family distribution. This can be generalized to continuous space, using calculus of variations.
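A concrete instance of the maximum entropy result: on a six-sided die with a prescribed mean, the solution is an exponentially tilted distribution. The sketch finds the tilt by bisection, which works because the mean is increasing in the tilt; the die example, the target mean of 4.5, and the function names are assumptions chosen for illustration.

```python
import math

def tilted(theta, support):
    """Exponential-family distribution p(x) proportional to exp(theta * x) on a finite support."""
    weights = [math.exp(theta * x) for x in support]
    Z = sum(weights)
    return [w / Z for w in weights]

def maxent_with_mean(support, target, lo=-50.0, hi=50.0, iters=200):
    """Bisection on theta so that the tilted distribution has the target mean."""
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        m = sum(p * x for p, x in zip(tilted(mid, support), support))
        if m < target:
            lo = mid
        else:
            hi = mid
    return tilted(0.5 * (lo + hi), support)

support = [1, 2, 3, 4, 5, 6]
p = maxent_with_mean(support, 4.5)
fitted_mean = sum(pi * x for pi, x in zip(p, support))
```

With a target mean of 3.5, the same routine returns the uniform distribution, which is the unconstrained entropy maximizer.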
  74. Kullback-Leibler Divergence
      The Kullback-Leibler divergence (or KL divergence) between two probability densities $p$ and $q$ (w.r.t. the same base measure) is defined to be
      $\mathrm{KL}(p \,\|\, q) = \int p(x) \log \frac{p(x)}{q(x)}\, \nu(dx)$
      KL divergence is not symmetric.
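  75. KL Divergence (cont'd)
      • (Gibbs inequality) $\mathrm{KL}(p \,\|\, q) \ge 0$, where the equality holds if and only if $p = q$ almost everywhere (w.r.t. the base measure).
      • Given two distributions $p_{\theta_1}, p_{\theta_2}$ in the same exponential family with sufficient stats $\phi$:
        $\mathrm{KL}(p_{\theta_1} \,\|\, p_{\theta_2}) = A(\theta_2) - A(\theta_1) - \langle \mu_1,\ \theta_2 - \theta_1 \rangle$, where $\mu_1 = E_{\theta_1}[\phi(X)]$.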
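The closed form for the KL divergence inside an exponential family can be compared against the direct definition. The sketch uses the Bernoulli family in canonical form; the two parameter values are arbitrary.

```python
import math

def A(theta):
    """Bernoulli log-partition function."""
    return math.log1p(math.exp(theta))

def mean(theta):
    """Bernoulli mean parameter, the gradient of A."""
    return 1.0 / (1.0 + math.exp(-theta))

def kl_direct(t1, t2):
    """KL divergence from the definition, summing over x in {0, 1}."""
    p, q = mean(t1), mean(t2)
    return p * math.log(p / q) + (1 - p) * math.log((1 - p) / (1 - q))

def kl_expfam(t1, t2):
    """KL divergence via the log-partition function and the mean parameter."""
    return A(t2) - A(t1) - mean(t1) * (t2 - t1)

forward = kl_expfam(0.3, -1.2)
backward = kl_expfam(-1.2, 0.3)
```

The two directions give different values, which illustrates the asymmetry of the divergence.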
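  76. Projections of Distributions
      Let $\mathcal{P}$ be an exponential family and $q$ be a distribution, both over the same space $\Omega$:
      • The I-projection (information projection) of $q$ onto $\mathcal{P}$: $\arg\min_{p \in \mathcal{P}} \mathrm{KL}(p \,\|\, q)$
      • The M-projection (moment projection) of $q$ onto $\mathcal{P}$: $\arg\min_{p \in \mathcal{P}} \mathrm{KL}(q \,\|\, p)$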
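  77. Maximum Likelihood Estimation
      Given a parameterized family $\{p_\theta : \theta \in \Theta\}$ over $\Omega$, and $x_1, \ldots, x_n \in \Omega$, the log-likelihood of $\theta$:
      $\ell(\theta; x_{1:n}) = \sum_{i=1}^n \log p_\theta(x_i)$
      $\hat\theta$ is called a maximum likelihood estimate given $x_{1:n}$ if
      $\hat\theta \in \arg\max_{\theta \in \Theta} \ell(\theta; x_{1:n})$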
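  78. MLE (cont'd)
      • Given $x_1, \ldots, x_n$, $\tilde{p} = \frac{1}{n} \sum_{i=1}^n \delta_{x_i}$ is called an empirical probability measure, which has
        $E_{\tilde{p}}[f(X)] = \frac{1}{n} \sum_{i=1}^n f(x_i)$
      • The log-likelihood of $\theta$ given $x_{1:n}$ can be rewritten as
        $\ell(\theta; x_{1:n}) = n\, E_{\tilde{p}}[\log p_\theta(X)]$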
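  79. MLE (cont'd)
      • We have $E_{\tilde{p}}[\log p_\theta(X)] = -H(\tilde{p}) - \mathrm{KL}(\tilde{p} \,\|\, p_\theta)$.
      • Maximizing $\ell(\theta; x_{1:n})$ is equivalent to minimizing $\mathrm{KL}(\tilde{p} \,\|\, p_\theta)$.
      • Maximum likelihood estimation is equivalent to M-projection of the empirical distribution $\tilde{p}$ to a given family.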
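  80. M-projections
      Given an exponential family $\mathcal{P}$ and an arbitrary distribution $q$ over $\Omega$, then
      $\mathrm{KL}(q \,\|\, p_\theta) = A(\theta) - \langle \theta,\ E_q[\phi(X)] \rangle + \text{const}$, where the constant does not depend on $\theta$.
      Thus, the optimal $\theta$ is
      $\hat\theta = \arg\max_{\theta \in \Theta}\ \langle \theta,\ E_q[\phi(X)] \rangle - A(\theta)$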
  81. M-projections (cont'd)
      • This is a convex problem. $\theta$ is optimal iff: $\nabla A(\theta) = E_q[\phi(X)]$
      • M-projection to an exponential family $\mathcal{P}$ is to find a distribution in $\mathcal{P}$ whose mean parameter matches the input mean $E_q[\phi(X)]$. ($E_q[\phi(X)]$ is always realizable, why?)
      • With minimal representation the optimum is unique; otherwise, the set of optimal solutions is an affine subset of $\Theta$, all of which yield the same distribution.
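Moment matching can be seen directly in a one-parameter family. The sketch fits a Poisson model in canonical form to a few made-up counts by solving the optimality condition in closed form, then checks that nearby parameters have lower likelihood; the data and names are hypothetical.

```python
import math

data = [2, 0, 3, 1, 4, 2]  # hypothetical counts

def avg_loglik(theta, xs):
    """Average Poisson log-likelihood in canonical form: phi(x) = x, A(theta) = exp(theta), h(x) = 1/x!."""
    return sum(theta * x - math.exp(theta) - math.lgamma(x + 1) for x in xs) / len(xs)

mu_hat = sum(data) / len(data)   # empirical mean of the sufficient statistic
theta_hat = math.log(mu_hat)     # solves exp(theta) = mu_hat, i.e. gradient of A equals the empirical mean
```

Perturbing `theta_hat` in either direction lowers the average log-likelihood, as expected for the maximizer of a concave objective.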
  82. What about I-projections?
      • We will see their utility when we talk about mean field methods and variational inference.
  83. Convex Conjugate
      Let $f$ be a real-valued function $f : \mathbb{R}^d \to \mathbb{R}$:
      • The convex conjugate of $f$ is defined to be
        $f^*(\mu) = \sup_{x}\ \langle \mu, x \rangle - f(x)$
      • $f^*$ is always convex, whether or not $f$ is.
      • $f^{**}$ is convex.
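  84. Convex Conjugate
      • (Fenchel's inequality) $f(x) + f^*(\mu) \ge \langle \mu, x \rangle$
      • (Fenchel-Moreau theorem) $f = f^{**}$ iff $f$ is convex and lower semi-continuous:
        $f(x) = \sup_{\mu}\ \langle \mu, x \rangle - f^*(\mu)$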
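  85. Conjugate Duality
      • $(\theta, \mu)$ is called dually coupled if $\mu = \nabla A(\theta)$.
      • The convex conjugate to a log-partition function $A$:
        $A^*(\mu) = \sup_{\theta \in \Theta}\ \langle \theta, \mu \rangle - A(\theta)$
      • The supremum is attained at $\theta$ iff $(\theta, \mu)$ is dually coupled.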
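  86. Conjugate Duality (cont'd)
      • $A^*$ has:
        $A^*(\mu) = -H(p_{\theta(\mu)})$ for $\mu \in \mathcal{M}^\circ$, and $A^*(\mu) = +\infty$ for $\mu \notin \overline{\mathcal{M}}$.
      • $A^*$ on the boundary of $\mathcal{M}$ is determined via Cauchy sequences.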
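  87. Conjugate Duality (cont'd)
      • With $A = A^{**}$, the log-partition function $A$ has:
        $A(\theta) = \sup_{\mu \in \mathcal{M}}\ \langle \theta, \mu \rangle - A^*(\mu)$
      • The supremum is attained at $\mu$ iff $(\theta, \mu)$ is dually coupled, which has $\mu = E_\theta[\phi(X)]$.
      • With a minimal representation, $\nabla A$ maps $\Theta$ one-to-one onto $\mathcal{M}^\circ$, while $\nabla A^*$ is the inverse map.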
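  88. Prior and Posterior
      • In Bayesian analysis, we usually place a prior with density $\pi(\theta)$ over the parameter space $\Theta$.
      • A parameter $\theta$ is linked to observations $x_1, \ldots, x_n$ via a likelihood model: $p(x \mid \theta)$.
      • The posterior measure given $x_{1:n}$ is
        $p(\theta \mid x_{1:n}) \propto \pi(\theta) \prod_{i=1}^n p(x_i \mid \theta)$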
  89. Prior and Posterior (cont'd)
      • Computing the posterior distribution is in general very difficult.
      • However, under certain conditions (e.g. when the prior is conjugate to the likelihood model), the computation becomes particularly easy.
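  90. Conjugate Prior
      • A prior with density $\pi(\theta \mid \alpha)$ is called a conjugate prior to the likelihood model $p(x \mid \theta)$, if the posterior distribution given $x$ is in the same parameterized family, i.e. in the form $\pi(\theta \mid \alpha \oplus x)$.
      • $\oplus$ is left-associative and satisfies $\alpha \oplus x \oplus y = \alpha \oplus y \oplus x$.
      • When $x_1, \ldots, x_n$ are observed, the posterior is $\pi(\theta \mid \alpha \oplus x_1 \oplus \cdots \oplus x_n)$. The result is independent of the order of samples.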
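  91. CP for Exponential Families
      Generally, conjugate pairs in exponential families are in the following form:
      • Prior: $\pi(\theta \mid \alpha, \beta) = \exp\left(\langle \alpha, \eta(\theta) \rangle - \beta A(\theta) - B(\alpha, \beta)\right)$
      • Likelihood: $p(x \mid \theta) = h(x) \exp\left(\langle \eta(\theta), \phi(x) \rangle - A(\theta)\right)$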
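  92. CP for Exponential Families (cont'd)
      Hence, the posterior update:
      • with a single observation: $(\alpha, \beta) \to (\alpha + \phi(x),\ \beta + 1)$
      • with multiple observations: $(\alpha, \beta) \to \left(\alpha + \sum_{i=1}^n \phi(x_i),\ \beta + n\right)$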
  93. CP for Exponential Families (cont'd)
      • The family of conjugate priors is largely determined by the likelihood model, particularly by the form of $\eta(\theta)$ and $A(\theta)$.
      • A family of prior distributions can serve as the conjugate priors to different likelihood models.
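  94. Example: Beta-Bernoulli
      • Prior: Beta distribution, $\theta \sim \mathrm{Beta}(a, b)$
      • Likelihood: Bernoulli distribution, $x_i \sim \mathrm{Bernoulli}(\theta)$
      • Posterior: remains a Beta distribution, $\theta \mid x_{1:n} \sim \mathrm{Beta}\left(a + \sum_i x_i,\ b + n - \sum_i x_i\right)$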
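The Beta-Bernoulli update amounts to adding counts of ones and zeros to the two Beta parameters. The sketch below shows the update and checks that processing observations one at a time, in any order, gives the same posterior as a single batch update; the function name and data are illustrative.

```python
def beta_bernoulli_update(a, b, xs):
    """Posterior Beta parameters after observing binary outcomes xs under a Beta(a, b) prior."""
    successes = sum(xs)
    return a + successes, b + len(xs) - successes

xs = [1, 0, 1, 1]  # hypothetical observations
batch = beta_bernoulli_update(1, 1, xs)

# Sequential updates, processing the observations in reverse order.
sequential = (1, 1)
for x in reversed(xs):
    sequential = beta_bernoulli_update(*sequential, [x])
```

Both routes end at the same Beta parameters, which is the order-independence property of conjugate updates.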
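  95. Example: Normal-Normal
      • Prior: Normal distribution, $\mu \sim \mathcal{N}(\mu_0, \sigma_0^2)$
      • Likelihood: Normal distribution (fixed variance), $x_i \sim \mathcal{N}(\mu, \sigma^2)$
      • Posterior: remains a Normal distribution, $\mu \mid x_{1:n} \sim \mathcal{N}(\mu_n, \sigma_n^2)$ with $\frac{1}{\sigma_n^2} = \frac{1}{\sigma_0^2} + \frac{n}{\sigma^2}$ and $\mu_n = \sigma_n^2 \left(\frac{\mu_0}{\sigma_0^2} + \frac{\sum_i x_i}{\sigma^2}\right)$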
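  96. Dirichlet Distribution
      • The Dirichlet distribution is a distribution over the probability simplex $\Delta_K$.
      • It is often used as a conjugate prior to the Categorical distribution or the Multinomial distribution.
      • With $\alpha = (\alpha_1, \ldots, \alpha_K)$ as the parameter, its density:
        $p(\theta \mid \alpha) = \frac{\Gamma\left(\sum_k \alpha_k\right)}{\prod_k \Gamma(\alpha_k)} \prod_{k=1}^K \theta_k^{\alpha_k - 1}$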
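  97. Dirichlet Distribution (cont'd)
      • Mean: $E[\theta_k] = \alpha_k / \alpha_0$ with $\alpha_0 = \sum_k \alpha_k$.
      • Covariance: $\mathrm{Cov}(\theta_i, \theta_j) = \frac{\delta_{ij}\, \alpha_i \alpha_0 - \alpha_i \alpha_j}{\alpha_0^2 (\alpha_0 + 1)}$
      • Mode: $\frac{\alpha_k - 1}{\alpha_0 - K}$ (when all $\alpha_k > 1$)
      • Marginal: $\theta_k \sim \mathrm{Beta}(\alpha_k,\ \alpha_0 - \alpha_k)$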
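  98. Dirichlet-Categorical
      • Prior: $\theta \sim \mathrm{Dir}(\alpha)$
      • Likelihood: $x_i \sim \mathrm{Cat}(\theta)$
      • Posterior: remains a Dirichlet distribution: $\theta \mid x_{1:n} \sim \mathrm{Dir}(\alpha + c)$, where $c_k$ is the number of observations equal to $k$.
      • When $\alpha = (1, \ldots, 1)$, $\mathrm{Dir}(\alpha)$ reduces to a uniform distribution over $\Delta_K$.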
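The Dirichlet-Categorical update adds the per-category observation counts to the Dirichlet parameter. The sketch below also computes the posterior mean using the standard Dirichlet mean formula; categories are assumed to be coded as integers starting at zero, and the names and data are illustrative.

```python
from collections import Counter

def dirichlet_posterior(alpha, xs):
    """Posterior Dirichlet parameter after observing category labels xs (integers 0..K-1)."""
    counts = Counter(xs)
    return [a + counts.get(k, 0) for k, a in enumerate(alpha)]

def dirichlet_mean(alpha):
    """Mean of a Dirichlet distribution: each parameter divided by their sum."""
    total = sum(alpha)
    return [a / total for a in alpha]

posterior = dirichlet_posterior([1, 1, 1], [0, 2, 2, 1, 2])  # uniform prior, made-up data
posterior_mean = dirichlet_mean(posterior)
```

Starting from the uniform prior, the posterior mean is a smoothed version of the empirical category frequencies.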
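  99. Dirichlet Distribution (cont'd)
      • Dirichlet distributions are an exponential family:
        • Canonical parameter: $\alpha$
        • Sufficient stats: $\phi(\theta) = (\log \theta_1, \ldots, \log \theta_K)$
        • Log-partition: $A(\alpha) = \sum_k \log \Gamma(\alpha_k) - \log \Gamma(\alpha_0)$
      • Hence, $E[\log \theta_k] = \frac{\partial A}{\partial \alpha_k} = \psi(\alpha_k) - \psi(\alpha_0)$, where $\psi$ is the digamma function.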
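  100. Predictive Distribution
      Given $x_{1:n}$, what is the distribution of a new sample $x$?
      $p(x \mid x_{1:n}) = \int p(x \mid \theta)\, p(\theta \mid x_{1:n})\, d\theta$
      With exponential family and conjugacy, we have
      $p(x \mid \alpha, \beta) = h(x) \exp\left(B(\alpha + \phi(x),\ \beta + 1) - B(\alpha, \beta)\right)$, where $(\alpha, \beta)$ are the posterior parameters.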
  101. Important Conjugate Pairs
      • Beta: the probability parameter of Bernoulli, Binomial, Geometric, or Negative Binomial
      • Normal: the mean parameter of Normal
      • Inverse Gamma: the variance parameter of Normal
      • Gamma: the rate parameter of Exponential or Poisson, or the precision parameter of Normal
  102. Important Conjugate Pairs (cont'd)
      • Dirichlet: the probability vector of Categorical or Multinomial
      • Multivariate Normal: the mean vector of Multivariate Normal
      • Inverse Wishart: the covariance matrix of Multivariate Normal
      • Wishart: the precision matrix of Multivariate Normal
  103. Examples of Graphical Models
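  104. GMM
      [Figure: plate diagram with $\pi$, $\mu_k$, $\sigma^2$, $z_i$, $x_i$]
      A Gaussian Mixture Model (GMM) with fixed variance:
      $z_i \sim \mathrm{Cat}(\pi)$, $x_i \mid z_i \sim \mathcal{N}(\mu_{z_i}, \sigma^2)$
      • This model is not complete.
      • How are $\mu_k$ and $\pi$ generated?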
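  105. GMM (with Prior)
      [Figure: plate diagram with $\alpha$, $\pi$, $\mu_0$, $\sigma_0^2$, $\mu_k$, $\sigma^2$, $z_i$, $x_i$]
      With priors placed over model parameters, we get a Hierarchical Bayesian Model:
      $\pi \sim \mathrm{Dir}(\alpha)$, $\mu_k \sim \mathcal{N}(\mu_0, \sigma_0^2)$, $z_i \sim \mathrm{Cat}(\pi)$, $x_i \mid z_i \sim \mathcal{N}(\mu_{z_i}, \sigma^2)$
      • Hyperparameters have no parents (top level).
      • Observations have no children (bottom level).
      • Each unknown variable is generated according to its parents.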
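A hierarchical model of this kind can be simulated top-down, drawing each variable after its parents (ancestral sampling). The sketch below uses only the Python standard library; the Dirichlet draw is built from normalized Gamma variates, and all hyperparameter values are arbitrary illustrative choices.

```python
import random

def sample_dirichlet(alpha, rng):
    """Draw from a Dirichlet distribution by normalizing independent Gamma variates."""
    g = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(g)
    return [x / total for x in g]

def sample_gmm(n, alpha, mu0, sigma0, sigma, rng):
    """Ancestral sampling: hyperparameters -> (pi, mu) -> z -> x."""
    pi = sample_dirichlet(alpha, rng)                       # mixing weights
    mu = [rng.gauss(mu0, sigma0) for _ in alpha]            # component means
    z = rng.choices(range(len(alpha)), weights=pi, k=n)     # component assignments
    x = [rng.gauss(mu[zi], sigma) for zi in z]              # observations
    return pi, mu, z, x

rng = random.Random(0)
pi, mu, z, x = sample_gmm(5, [1.0, 1.0, 1.0], 0.0, 10.0, 1.0, rng)
```

The sampling order follows a topological ordering of the graph, so every variable is drawn only after all of its parents are available.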
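  106. GMM (Joint Model)
      [Figure: plate diagram with $\alpha$, $\pi$, $\mu_0$, $\sigma_0^2$, $\mu_k$, $\sigma^2$, $z_i$, $x_i$]
      $p(\pi, \mu, z, x) = \mathrm{Dir}(\pi \mid \alpha) \prod_{k} \mathcal{N}(\mu_k \mid \mu_0, \sigma_0^2) \prod_{i} \pi_{z_i}\, \mathcal{N}(x_i \mid \mu_{z_i}, \sigma^2)$
      This is an exponential family.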
  107. Topic Models
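  108. PLSI
      [Figure: plate diagram with $\theta_d$, $z_{di}$, $w_{di}$, $\beta_k$]
      Probabilistic Latent Semantic Indexing (PLSI):
      • Each topic $k$ is associated with $\beta_k$, a distribution over the vocabulary.
      • Each document $d$ comes with a vector of topic proportions $\theta_d$.
      • To generate each word $w_{di}$: $z_{di} \sim \mathrm{Cat}(\theta_d)$, $w_{di} \sim \mathrm{Cat}(\beta_{z_{di}})$
      • This is not a complete model.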
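  109. LDA
      [Figure: plate diagram with $\alpha$, $\theta_d$, $z_{di}$, $w_{di}$, $\beta_k$]
      Latent Dirichlet Allocation (LDA) completes PLSI by placing Dirichlet priors over latent variables:
      • For each document, the topic proportions are generated as $\theta_d \sim \mathrm{Dir}(\alpha)$.
      • For each topic, the word distribution $\beta_k$ is generated from a Dirichlet prior over the vocabulary.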
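The generative process can be sketched as a short simulation: draw a word distribution per topic, then for each document draw topic proportions, a topic per word position, and a word from the chosen topic. The symbols `alpha` and `eta` for the two symmetric Dirichlet concentrations, the integer coding of words, and all sizes are assumptions made for this example.

```python
import random

def sample_dirichlet(alpha, rng):
    """Draw from a Dirichlet distribution by normalizing independent Gamma variates."""
    g = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(g)
    return [x / total for x in g]

def generate_corpus(doc_lengths, n_topics, vocab_size, alpha, eta, rng):
    """Simulate documents from the LDA generative process with symmetric Dirichlet priors."""
    # One word distribution per topic (eta is a hypothetical name for its concentration).
    topics = [sample_dirichlet([eta] * vocab_size, rng) for _ in range(n_topics)]
    docs = []
    for n_d in doc_lengths:
        theta = sample_dirichlet([alpha] * n_topics, rng)          # topic proportions
        zs = rng.choices(range(n_topics), weights=theta, k=n_d)    # topic per word slot
        words = [rng.choices(range(vocab_size), weights=topics[z])[0] for z in zs]
        docs.append(words)
    return topics, docs

topics, docs = generate_corpus([4, 6], n_topics=2, vocab_size=5,
                               alpha=0.5, eta=0.5, rng=random.Random(1))
```

Each document is a list of word indices; inference in LDA runs this process in reverse, recovering topics and proportions from the words alone.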
  110. LDA (Joint Model)
      [Figure: plate diagram with $\alpha$, $\theta_d$, $z_{di}$, $w_{di}$, $\beta_k$]
      Again, an exponential family.
  111. Summary
      • The Basics of Graphical Models
        • Bayesian Networks and Markov Random Fields.
        • How the joint distribution factorizes according to the graph.
        • Relations between graphical structure and conditional independencies.
        • Factor graphs.
  112. Summary (cont'd)
      • The Basics of Exponential Families
        • The form of exponential families
        • Minimal and overcomplete representation, identifiability
        • Convexity of log-partition function, gradient map
        • KL divergence, projections of distributions
        • Conjugate duality between log-partition function and negative entropy
  113. Summary (cont'd)
      • Conjugate Prior
        • Posterior distributions in Bayesian analysis
        • Conjugate prior, especially of exponential families
        • Important conjugate pairs
  114. Summary (cont'd)
      • Practice
        • How to formulate a graphical model based on intuition
        • Graphical representation of a model, factor graph
        • Analysis of the joint distribution
