# MLPI Lecture 4: Graphical Model and Exponential Family

This lecture covers the basics of graphical models, exponential families, and conjugate priors, as well as guidelines for formulating graphical models for practical problems.

1. Lecture 4: Graphical Model and Exponential Family. Dahua Lin, The Chinese University of Hong Kong.
2. Outline: You will learn the basics of probabilistic modeling in this lecture:
   - Graphical models
   - Exponential families
   - Conjugate priors
   - How to formulate and analyze a graphical model in practice
3. Graphical Models: The key idea behind graphical models is factorization. A graphical model generally refers to a family of joint distributions over multiple variables that factorize according to the structure of the underlying graph.
4. Graphical Models: A graphical model can be viewed in two ways:
   - A data structure that provides the skeleton for representing a joint distribution in a factorized manner.
   - A compact representation of a set of conditional independencies about a family of distributions.

   These two views are equivalent in a strict sense.
5. Distributions on a Graph: Consider a graph $G = (V, E)$, where edges can be directed or undirected:
   - Attach a random variable $X_v$ to each vertex $v \in V$.
   - The state space of $X_v$ is denoted by $\mathcal{X}_v$.
   - A particular instance of $X_v$ is denoted by $x_v$.
   - We can also consider a set of variables: $X_A = (X_v)_{v \in A}$ and $x_A = (x_v)_{v \in A}$ for $A \subseteq V$.
6. Categories of Graphical Models:
   - Bayesian networks (directed acyclic graphs)
   - Markov random fields (undirected graphs)
   - Chain graphs (directed acyclic graphs over undirected components)
   - Factor graphs
7. Directed Acyclic Graphs: Consider a directed graph $G = (V, E)$. $G$ is called a directed acyclic graph (DAG) if it has no directed cycles.
8. Directed Acyclic Graphs (cont'd): Consider a directed acyclic graph $G = (V, E)$:
   - Given an edge $(u, v) \in E$, $u$ is called a parent of $v$, and $v$ is called a child of $u$.
   - A vertex $u$ is called an ancestor of $v$, and $v$ a descendant of $u$, if there exists a directed path from $u$ to $v$.
9. Topological Ordering:
   - A topological ordering of a directed graph $G = (V, E)$ is a linear ordering of its vertices such that for each edge $(u, v) \in E$, $u$ always comes before $v$.
   - A finite directed graph is acyclic if and only if it has a topological ordering.
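Since acyclicity is equivalent to the existence of a topological ordering, one way to test a finite directed graph is to try to build such an ordering. A minimal sketch using Kahn's algorithm (one standard choice; the slide does not prescribe an algorithm), assuming the graph is given as a dict mapping each vertex to a list of its children:

```python
from collections import deque

def topological_order(children):
    """Return a topological ordering of a directed graph, or None if it has a cycle.

    `children` maps each vertex u to a list of vertices v with an edge u -> v.
    """
    vertices = set(children)
    for vs in children.values():
        vertices.update(vs)
    indegree = {v: 0 for v in vertices}
    for u in children:
        for v in children[u]:
            indegree[v] += 1
    # Start from the vertices that have no incoming edges.
    queue = deque(sorted(v for v in vertices if indegree[v] == 0))
    order = []
    while queue:
        u = queue.popleft()
        order.append(u)
        for v in children.get(u, []):
            indegree[v] -= 1
            if indegree[v] == 0:
                queue.append(v)
    # Vertices never emitted lie on, or downstream of, a directed cycle.
    return order if len(order) == len(vertices) else None
```

Applied to `{"a": ["b", "c"], "b": ["d"], "c": ["d"]}` it returns `["a", "b", "c", "d"]`; for a graph containing a directed cycle it returns `None`.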
10. Bayesian Networks: Given a DAG $G = (V, E)$, we say a joint distribution over $X_V$ factorizes according to $G$ if its density $p$ can be expressed as
    $$p(x_V) = \prod_{v \in V} p(x_v \mid x_{\mathrm{pa}(v)}).$$
    - Such a model is called a Bayesian network over $G$.
    - $\mathrm{pa}(v)$ is the set of $v$'s parents, which can be empty.
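Because every factor is a normalized conditional distribution, a Bayesian network needs no global normalizing constant. A minimal sketch for binary variables; the three-node network and its probability tables are made-up illustrations, and `parents`/`cpt` are a hypothetical representation in which `cpt[v]` maps a tuple of parent values to $P(X_v = 1 \mid x_{\mathrm{pa}(v)})$:

```python
import itertools

def joint_prob(x, parents, cpt):
    """Evaluate p(x_V) = prod_v p(x_v | x_pa(v)) for binary variables."""
    prob = 1.0
    for v, pa in parents.items():
        p_one = cpt[v][tuple(x[u] for u in pa)]  # P(X_v = 1 | parent values)
        prob *= p_one if x[v] == 1 else 1.0 - p_one
    return prob

# Hypothetical network: rain -> wet <- sprinkler
parents = {"rain": (), "sprinkler": (), "wet": ("rain", "sprinkler")}
cpt = {
    "rain": {(): 0.2},
    "sprinkler": {(): 0.1},
    "wet": {(0, 0): 0.0, (0, 1): 0.9, (1, 0): 0.8, (1, 1): 0.99},
}

# Summing over all 2^3 joint assignments gives 1 without any normalization.
total = sum(
    joint_prob(dict(zip(parents, vals)), parents, cpt)
    for vals in itertools.product([0, 1], repeat=len(parents))
)
```

For example, $p(\text{rain}=1, \text{sprinkler}=0, \text{wet}=1) = 0.2 \times 0.9 \times 0.8 = 0.144$.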
11. Bayesian Networks: Example (figure)
12. Undirected Graphs and Cliques: Consider an undirected graph $G = (V, E)$:
    - A clique is a fully connected subset of vertices.
    - A clique is called maximal if it is not properly contained in another clique.
    - $\mathcal{C}(G)$ denotes the set of all maximal cliques.
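The maximal cliques in $\mathcal{C}(G)$ can be enumerated with the Bron-Kerbosch algorithm (our choice of algorithm; the lecture does not specify one). A minimal sketch with pivoting, assuming the graph is a dict mapping each vertex to the set of its neighbours:

```python
def maximal_cliques(adj):
    """Enumerate the maximal cliques of an undirected graph (Bron-Kerbosch with pivoting).

    `adj` maps each vertex to the set of its neighbours.
    """
    cliques = []

    def expand(r, p, x):
        # r: current clique, p: candidates that extend r, x: vertices already explored.
        if not p and not x:
            cliques.append(frozenset(r))  # r cannot be extended, so it is maximal
            return
        pivot = max(p | x, key=lambda u: len(adj[u] & p))
        for v in list(p - adj[pivot]):
            expand(r | {v}, p & adj[v], x & adj[v])
            p = p - {v}
            x = x | {v}

    expand(set(), set(adj), set())
    return cliques
```

For a triangle $a, b, c$ with an extra edge $c - d$, the maximal cliques are $\{a, b, c\}$ and $\{c, d\}$.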
13. Undirected Graphs and Cliques (cont'd) (figure)
14. Markov Random Fields: Consider an undirected graph $G = (V, E)$. We say a joint distribution of $X_V$ factorizes according to $G$ if its density $p$ can be expressed as
    $$p(x_V) = \frac{1}{Z} \prod_{C \in \mathcal{C}(G)} \psi_C(x_C).$$
    - This is called a Markov random field over $G$.
    - The nonnegative functions $\psi_C$ are called factors.
15. Markov Random Fields (cont'd):
    - The normalizing constant $Z$ is usually needed to ensure the distribution is properly normalized: $Z = \sum_{x_V} \prod_{C \in \mathcal{C}(G)} \psi_C(x_C)$ (an integral for continuous variables).
    - Generally, the compatibility functions $\psi_C$ need not have any obvious relation to the marginal or conditional distributions over the cliques.
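For small discrete models, $Z$ can be computed by brute-force enumeration, which also shows why it is expensive in general: the sum has $\prod_v |\mathcal{X}_v|$ terms. A minimal sketch; the binary chain $a - b - c$ with factor $\psi(u, v) = 2$ if $u = v$ and $1$ otherwise is a made-up example:

```python
import itertools
import math

def partition_function(domains, factors):
    """Brute-force Z = sum_x prod_C psi_C(x_C) for a small discrete MRF.

    `domains` maps each variable to its list of states; `factors` is a list of
    (scope, psi) pairs, where psi takes the values of `scope` as positional arguments.
    """
    names = list(domains)
    z = 0.0
    for values in itertools.product(*(domains[n] for n in names)):
        x = dict(zip(names, values))
        z += math.prod(psi(*(x[v] for v in scope)) for scope, psi in factors)
    return z

def same(u, v):
    return 2.0 if u == v else 1.0

domains = {"a": [0, 1], "b": [0, 1], "c": [0, 1]}
factors = [(("a", "b"), same), (("b", "c"), same)]
z = partition_function(domains, factors)
```

Here $Z = 18$: for each value of $b$, summing out $a$ and $c$ independently contributes $3 \times 3$.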
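16. MRF Parameterization (figure: a fully connected undirected graph over $a$, $b$, $c$):
    - All MRFs can be parameterized in terms of maximal cliques. In practice, this is not necessarily the most natural way.
    - Natural parameterization: $p(x) \propto \psi_{ab}(x_a, x_b)\, \psi_{bc}(x_b, x_c)\, \psi_{ac}(x_a, x_c)$.
    - Maximal-clique based: $p(x) \propto \psi_{abc}(x_a, x_b, x_c)$ (with $\psi_{abc} = \psi_{ab}\, \psi_{bc}\, \psi_{ac}$).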
17. The graphical structure also encodes a set of conditional independencies among the variables.
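18. Conditional Independence: Consider a joint distribution over $(X, Y, Z)$. $X$ and $Y$ are called conditionally independent given $Z$, denoted by $X \perp Y \mid Z$, iff
    $$P(X \in S, Y \in T \mid Z) = P(X \in S \mid Z)\, P(Y \in T \mid Z) \quad \text{for all measurable } S, T.$$
    More generally, for disjoint vertex sets $A, B, C$, $X_A \perp X_B \mid X_C$ is defined in the same way.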
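19. Conditional Independence (cont'd): If the conditional distributions $P(X \mid Z)$ and $P(Y \mid Z)$ have densities $p(x \mid z)$ and $p(y \mid z)$, then $X \perp Y \mid Z$ iff the following equality holds almost surely: $p(x, y \mid z) = p(x \mid z)\, p(y \mid z)$.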
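For discrete variables the definition can be checked exactly. A minimal sketch, assuming the joint is given as a dict from $(x, y, z)$ to probabilities; it tests the equivalent product form $p(x, y, z)\, p(z) = p(x, z)\, p(y, z)$, which avoids dividing by $p(z)$ (the example numbers are made up):

```python
import itertools

def is_cond_independent(joint, tol=1e-12):
    """Check X ⊥ Y | Z for a discrete joint given as {(x, y, z): probability}."""
    xs = {k[0] for k in joint}
    ys = {k[1] for k in joint}
    zs = {k[2] for k in joint}

    def p(x, y, z):
        return joint.get((x, y, z), 0.0)

    for z in zs:
        pz = sum(p(x, y, z) for x in xs for y in ys)
        for x in xs:
            pxz = sum(p(x, y, z) for y in ys)
            for y in ys:
                pyz = sum(p(x2, y, z) for x2 in xs)
                if abs(p(x, y, z) * pz - pxz * pyz) > tol:
                    return False
    return True

# A joint built as p(z) p(x | z) p(y | z), so X ⊥ Y | Z holds by construction.
prob_z = {0: 0.3, 1: 0.7}
px1 = {0: 0.2, 1: 0.9}  # P(X = 1 | Z = z)
py1 = {0: 0.5, 1: 0.1}  # P(Y = 1 | Z = z)
joint = {
    (x, y, z): prob_z[z] * (px1[z] if x else 1 - px1[z]) * (py1[z] if y else 1 - py1[z])
    for x, y, z in itertools.product([0, 1], repeat=3)
}
```

By contrast, a joint in which $Y$ is a copy of $X$ for every value of $Z$ fails the test.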
20. I-map:
    - Let $\mathcal{P}$ be a family of distributions (e.g. a graphical model). We define $\mathcal{I}(\mathcal{P})$ to be the set of conditional independencies of the form $X_A \perp X_B \mid X_C$ that hold for all distributions in $\mathcal{P}$.
    - Given a graph $G$ associated with a set of conditional independencies $\mathcal{I}(G)$, $G$ is called an I-map of $\mathcal{P}$ if $\mathcal{I}(G) \subseteq \mathcal{I}(\mathcal{P})$.
    - An I-map is a graph that captures (part of) the conditional independencies of a distribution family.
21. Conditional Independencies of MRFs:
    - The conditional independencies of an MRF can be characterized in three ways:
      - Local independencies
      - Pairwise independencies
      - Global independencies
    - In the sequel, we consider an undirected graph $G = (V, E)$.
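22. Local Independencies:
    - For each $v \in V$, the Markov blanket of $v$ w.r.t. $G$, denoted by $\mathrm{MB}_G(v)$, is the set of all neighbors of $v$.
    - Local independencies: $X_v$ is independent of the rest given its neighbors: $X_v \perp X_{V \setminus (\{v\} \cup \mathrm{MB}_G(v))} \mid X_{\mathrm{MB}_G(v)}$.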
23. Pairwise Independencies: Given two disjoint sets $A, B \subseteq V$ with no direct edges between them, $X_A$ is independent of $X_B$ given the rest: $X_A \perp X_B \mid X_{V \setminus (A \cup B)}$.
24. Global Independencies:
    - We say $C$ separates $A$ and $B$, denoted by $\mathrm{sep}_G(A; B \mid C)$, if all paths between $A$ and $B$ go through $C$.
    - Global independencies: if $C$ separates $A$ and $B$, then $X_A$ is independent of $X_B$ given $X_C$.
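Separation is a purely graph-theoretic test: remove $C$ and check whether any vertex of $B$ is still reachable from $A$. A minimal breadth-first-search sketch, assuming $A$, $B$, $C$ are disjoint vertex sets and the graph is an adjacency dict:

```python
from collections import deque

def separates(adj, a, b, c):
    """Return True if vertex set c separates vertex sets a and b in an undirected graph.

    Searches outward from a without ever entering c; separation holds iff
    no vertex of b is reached.
    """
    a, b, c = set(a), set(b), set(c)
    frontier = deque(a - c)
    seen = set(frontier)
    while frontier:
        u = frontier.popleft()
        if u in b:
            return False
        for v in adj[u]:
            if v not in seen and v not in c:
                seen.add(v)
                frontier.append(v)
    return True
```

On the 4-cycle $a - b - c - d - a$, $\{b\}$ alone does not separate $a$ from $c$ (the path through $d$ remains), while $\{b, d\}$ does.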
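25. Relations between Independencies:
    - Denote the three sets of conditional independencies above by $\mathcal{I}_{\ell}(G)$ (local), $\mathcal{I}_p(G)$ (pairwise), and $\mathcal{I}(G)$ (global).
    - Given a distribution or a family of distributions $\mathcal{P}$, we say $\mathcal{P}$ satisfies $\mathcal{I}$ if it satisfies all conditional independencies in $\mathcal{I}$, denoted by $\mathcal{P} \models \mathcal{I}$.
    - $\mathcal{P} \models \mathcal{I}(G) \Rightarrow \mathcal{P} \models \mathcal{I}_{\ell}(G) \Rightarrow \mathcal{P} \models \mathcal{I}_p(G)$.
    - If $\mathcal{P}$ is a family of positive distributions, then $\mathcal{P} \models \mathcal{I}_p(G) \Rightarrow \mathcal{P} \models \mathcal{I}(G)$, so the three are equivalent.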
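26. Soundness:
    - Let $p$ be a distribution that factorizes according to an undirected graph $G$; then $p$ satisfies all global independencies of $G$, or in other words, $G$ is an I-map of $p$.
    - Consequently, $p$ also satisfies the local and pairwise independencies of $G$.
    - How to prove it?
    - How is the separation assumption related to the maximal cliques?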
27. We have shown that if $p$ factorizes according to $G$, then $G$ is an I-map for $p$. Is the converse also true?
28. Hammersley-Clifford:
    - (Hammersley-Clifford theorem) Let $p$ be a positive distribution over $X_V$ and $G$ be an I-map of $p$; then $p$ factorizes according to $G$.
    - Combining soundness and Hammersley-Clifford: a positive distribution $p$ factorizes according to $G$ if and only if $G$ is an I-map of $p$.
29. Conditional Independencies of BNs:
    - The conditional independencies of a Bayesian network can be characterized in two ways:
      - local independencies
      - global independencies (via d-separation)
    - In the sequel, we consider a directed graph $G = (V, E)$.
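30. Local Independencies (figure: a DAG over vertices $a$ to $i$): Given $G$, $X_v$ is independent of its non-descendants given its parents $\mathrm{pa}(v)$: $X_v \perp X_{\mathrm{ND}(v)} \mid X_{\mathrm{pa}(v)}$, where $\mathrm{ND}(v)$ denotes the non-descendants of $v$ other than its parents.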
31. (Figure: the basic three-node structures over $X$, $Z$, $Y$: indirect effect, common cause, and common effect.)
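32. d-separation: When "influence" can flow from $X$ to $Y$ via $Z$, we say that the trail $X \rightleftharpoons Z \rightleftharpoons Y$ is active:
    - $X \to Z \to Y$ is active iff $Z$ is not observed.
    - $X \leftarrow Z \leftarrow Y$ is active iff $Z$ is not observed.
    - $X \leftarrow Z \to Y$ is active iff $Z$ is not observed.
    - (V-structure) $X \to Z \leftarrow Y$ is active iff either $Z$ or some of $Z$'s descendants is observed.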
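33. d-separation (cont'd):
    - A trail $X_1 \rightleftharpoons \cdots \rightleftharpoons X_n$ is called active when all sub-trails $X_{i-1} \rightleftharpoons X_i \rightleftharpoons X_{i+1}$ are active.
    - Let $A, B, C$ be three sets of vertices of $G$. $A$ and $B$ are d-separated by $C$, denoted by $\mathrm{dsep}_G(A; B \mid C)$, if there is neither a direct link nor an active trail between $A$ and $B$ when $C$ is observed.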
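Enumerating every trail is exponential, but the active-trail rules can be applied in a single traversal. A sketch of a standard reachability procedure (along the lines of the algorithm for finding nodes reachable via active trails in Koller and Friedman's textbook; not necessarily the method intended in this lecture), for single vertices $x, y$ outside the observed set, with the DAG given as a dict from each vertex to its list of parents:

```python
def d_separated(parents, x, y, observed):
    """Return True if vertices x and y are d-separated given the set `observed` in a DAG.

    `parents` maps every vertex to a list of its parents; x and y are assumed
    not to be in `observed`.
    """
    observed = set(observed)
    children = {v: [] for v in parents}
    for v, pa in parents.items():
        for u in pa:
            children[u].append(v)

    # Phase 1: observed vertices and all their ancestors. A v-structure at w is
    # active iff w is in this set (w itself or one of its descendants is observed).
    anc, stack = set(), list(observed)
    while stack:
        w = stack.pop()
        if w not in anc:
            anc.add(w)
            stack.extend(parents[w])

    # Phase 2: search over (vertex, direction) pairs along active trails.
    # "up" means we arrived from a child, "down" means we arrived from a parent.
    visited, reachable = set(), set()
    stack = [(x, "up")]
    while stack:
        w, direction = stack.pop()
        if (w, direction) in visited:
            continue
        visited.add((w, direction))
        if w not in observed:
            reachable.add(w)
        if direction == "up" and w not in observed:
            # Chains and forks through an unobserved w stay active.
            stack.extend((p, "up") for p in parents[w])
            stack.extend((c, "down") for c in children[w])
        elif direction == "down":
            if w not in observed:
                stack.extend((c, "down") for c in children[w])
            if w in anc:
                # V-structure at w: active because w or a descendant is observed.
                stack.extend((p, "up") for p in parents[w])
    return y not in reachable
```

For the v-structure $a \to c \leftarrow b$ (with $c \to d$), $a$ and $b$ are d-separated given nothing, but not given $c$ or its descendant $d$, matching the rules above.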
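34. Global Independencies (figure: a DAG over vertices $a$ to $i$):
    - Given $G$, $X_A$ is independent of $X_B$ given $X_C$ if $A$ and $B$ are d-separated by $C$ on the graph $G$. Collect these into $\mathcal{I}(G) = \{X_A \perp X_B \mid X_C : A \text{ and } B \text{ are d-separated by } C \text{ in } G\}$.
    - (Soundness) If $p$ factorizes according to $G$, then $p$ satisfies $\mathcal{I}(G)$, or we say $G$ is an I-map of $p$.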
35. Moralized Graphs:
    - Given a directed graph $G = (V, E)$, we can construct a moralized graph, denoted by $\mathcal{M}[G]$, by adding undirected edges between each node and its parents and between each pair of the node's parents (and dropping edge directions).
    - In $\mathcal{M}[G]$, the subgraph that spans $\{v\} \cup \mathrm{pa}(v)$ forms a clique, denoted by $C_v$.
    - The procedure of constructing $\mathcal{M}[G]$ from $G$ is called moralization.
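Moralization is mechanical. A minimal sketch, with the DAG given as a dict from each vertex to its list of parents and the result returned as an undirected adjacency dict:

```python
from itertools import combinations

def moralize(parents):
    """Build the moral graph of a DAG as an undirected adjacency dict.

    Each node is linked to its parents, every pair of co-parents is "married",
    and edge directions are dropped, so {v} ∪ pa(v) becomes a clique.
    """
    adj = {v: set() for v in parents}
    for v, pa in parents.items():
        for u in pa:
            adj[u].add(v)
            adj[v].add(u)
        for u, w in combinations(pa, 2):
            adj[u].add(w)
            adj[w].add(u)
    return adj
```

For the v-structure $a \to c \leftarrow b$, the moral graph gains the edge $a - b$, which is exactly the independence lost in the BN-to-MRF conversion discussed below.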
36. Moralized Graphs (Illustration) (figure: a DAG over vertices $a$ to $i$ and its moralized graph)
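37. From BN to MRF: If $p$ factorizes according to $G$ as
    $$p(x_V) = \prod_{v \in V} p(x_v \mid x_{\mathrm{pa}(v)}),$$
    then $p$ factorizes according to $\mathcal{M}[G]$: $p(x_V) = \prod_{v \in V} \psi_{C_v}(x_{C_v})$ with $\psi_{C_v}(x_{C_v}) = p(x_v \mid x_{\mathrm{pa}(v)})$ and $Z = 1$.
    - $\mathcal{I}(\mathcal{M}[G]) \subseteq \mathcal{I}(G)$. Is the opposite true?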
38. From BN to MRF (cont'd):
    - In general, moralization may cause the loss of conditional independencies: $\mathcal{I}(\mathcal{M}[G])$ may be a proper subset of $\mathcal{I}(G)$.
    - Consider the DAG $a \to c \leftarrow b$: $X_a \perp X_b$ holds for $G$ but not for $\mathcal{M}[G]$.
    - Not every MRF can be converted to a BN.
39. Factor Graphs:
    - An MRF does not always fully reveal the factorized structure of a distribution.
    - A factor graph can sometimes give a more accurate characterization of a family of distributions.
    - A factor graph is a bipartite graph with links between two types of nodes: variables and factors.
    - A variable $x_v$ and a factor $\psi_f$ are linked in a factor graph if the factor involves $x_v$ as an argument.
40. Factor Graphs (Illustration) (figure)
41. Study of Distributions:
    - Graphical models $\to$ structure of (in)dependencies
    - Exponential families $\to$ algebraic characteristics
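42. Exponential Families: An exponential family $\mathcal{E}$ over a measure space $(\mathcal{X}, \nu)$ consists of densities of the form $p(x; \theta) = \frac{1}{Z(\theta)}\, h(x) \exp\left(\eta(\theta)^\top \phi(x)\right)$, with:
    - sufficient statistics: $\phi: \mathcal{X} \to \mathbb{R}^d$
    - canonical parameter function: $\eta: \Theta \to \mathbb{R}^d$
    - partition function: $Z(\theta)$
    - base density: $h(x) \geq 0$ over $\mathcal{X}$ (w.r.t. $\nu$)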
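43. Partition Function:
    - The partition function is given by $Z(\theta) = \int_{\mathcal{X}} h(x) \exp\left(\eta(\theta)^\top \phi(x)\right) \nu(dx)$.
    - The log-partition function, given by $A(\theta) = \log Z(\theta)$, is often used instead of $Z(\theta)$.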
44. Parameter Space:
    - An exponential family is essentially determined by the domain $\mathcal{X}$ and the sufficient statistics $\phi$.
    - The set of valid parameters is $\Theta = \{\theta : Z(\theta) < \infty\}$.
    - An exponential family can be parameterized in many ways. When $\eta(\theta) = \theta$, it is said to be in the canonical form.
45. Many important families of distributions are exponential families:
    - Binomial distributions (with a fixed number of trials)
    - Poisson distributions
    - Normal distributions
    - Exponential distributions
    - Beta distributions
    - And many more ...
46. Bernoulli Distribution: Domain: $\mathcal{X} = \{0, 1\}$. Parameter: $\mu \in [0, 1]$. Density: $p(x; \mu) = \mu^x (1 - \mu)^{1 - x}$. Bernoulli distributions describe an event that may or may not happen.
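47. Bernoulli Distribution (cont'd):
    - sufficient statistics: $\phi(x) = x$
    - canonical parameter: $\theta = \log \frac{\mu}{1 - \mu}$
    - base density: $h(x) = 1$ w.r.t. the counting measure
    - partition function: $Z(\theta) = 1 + e^{\theta}$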
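48. Bernoulli Distribution (cont'd):
    - sufficient statistics: $\phi(x) = (\mathbb{1}[x = 0], \mathbb{1}[x = 1])$
    - canonical parameters: $\theta = (\theta_0, \theta_1) = (\log(1 - \mu), \log \mu)$
    - base density: $h(x) = 1$ w.r.t. the counting measure
    - partition function: $Z(\theta) = e^{\theta_0} + e^{\theta_1}$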
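49. Poisson Distribution: Domain: $\mathcal{X} = \{0, 1, 2, \ldots\}$. Parameter: $\lambda > 0$. Density: $p(x; \lambda) = \frac{\lambda^x e^{-\lambda}}{x!}$. Poisson distributions characterize the number of independent events occurring at a certain rate $\lambda$ within a unit time.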
50. Poisson Distribution (cont'd):
    - sufficient statistics: $\phi(x) = x$
    - canonical parameter: $\theta = \log \lambda$
    - base density: $h(x) = 1 / x!$ w.r.t. the counting measure
    - partition function: $Z(\theta) = \exp(e^{\theta})$
51. Exponential Distribution: Domain: $\mathcal{X} = [0, \infty)$. Parameter: $\lambda > 0$. Density: $p(x; \lambda) = \lambda e^{-\lambda x}$. Exponential distributions characterize the time interval between independent events occurring at a certain rate $\lambda$.
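52. Exponential Distribution (cont'd):
    - sufficient statistics: $\phi(x) = x$ or $\phi(x) = -x$
    - canonical parameter: $\theta = -\lambda$ or $\theta = \lambda$, respectively
    - base density: $h(x) = 1$ w.r.t. the Lebesgue measure
    - partition function: $Z(\theta) = -1/\theta$ or $Z(\theta) = 1/\theta$, respectively, which is finite only when $\theta < 0$ (resp. $\theta > 0$).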
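53. Normal Distribution: Domain: $\mathcal{X} = \mathbb{R}$. Parameters: $\mu \in \mathbb{R}$, $\sigma^2 > 0$. Density: $p(x; \mu, \sigma^2) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left(-\frac{(x - \mu)^2}{2\sigma^2}\right)$. Normal distributions are probably the most widely used distributions in probabilistic analysis.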
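54. Normal Distribution (cont'd):
    - sufficient statistics: $\phi(x) = (x, x^2)$
    - canonical parameter: $\theta = \left(\frac{\mu}{\sigma^2}, -\frac{1}{2\sigma^2}\right)$
    - base density: $h(x) = \frac{1}{\sqrt{2\pi}}$ w.r.t. the Lebesgue measure
    - partition function: $Z(\theta) = \sigma \exp\left(\frac{\mu^2}{2\sigma^2}\right) = (-2\theta_2)^{-1/2} \exp\left(-\frac{\theta_1^2}{4\theta_2}\right)$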
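As a consistency check, the exponential-family pieces $\phi(x) = (x, x^2)$, $\theta = (\mu/\sigma^2, -1/(2\sigma^2))$, $h(x) = 1/\sqrt{2\pi}$, and $A(\theta) = \log Z(\theta) = -\theta_1^2/(4\theta_2) - \tfrac{1}{2}\log(-2\theta_2)$ should reproduce the usual density. A minimal sketch:

```python
import math

def normal_pdf(x, mu, var):
    """Usual Normal density."""
    return math.exp(-(x - mu) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

def normal_expfam_pdf(x, mu, var):
    """Same density written as h(x) exp(theta . phi(x) - A(theta))."""
    theta1, theta2 = mu / var, -1.0 / (2 * var)                          # canonical parameters
    log_z = -theta1 ** 2 / (4 * theta2) - 0.5 * math.log(-2 * theta2)   # A(theta)
    h = 1.0 / math.sqrt(2 * math.pi)                                     # base density
    return h * math.exp(theta1 * x + theta2 * x ** 2 - log_z)
```

The two functions agree to floating-point precision, for example at $\mu = 1.5$, $\sigma^2 = 2$.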
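55. Normal Distribution in Canonical Form: The normal distribution can alternatively be parameterized in the canonical form $p(x) \propto \exp\left(\beta x - \frac{1}{2} \tau x^2\right)$:
    - potential coefficient: $\beta = \mu / \sigma^2$
    - precision coefficient: $\tau = 1 / \sigma^2$, with $\tau > 0$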
56. Regular Family: In the sequel, we focus on exponential families in the canonical form. The set of valid canonical parameters is $\Theta = \{\theta \in \mathbb{R}^d : A(\theta) < \infty\}$. The exponential family $\mathcal{E}$ is called a regular family if $\Theta$ is an open subset of $\mathbb{R}^d$. We restrict our attention to regular families.
57. Identifiability: Let $\{p_\theta : \theta \in \Theta\}$ be a parameterized family.
    - It is called identifiable when each distribution in the family corresponds to a unique parameter in $\Theta$: $p_{\theta_1} = p_{\theta_2} \Rightarrow \theta_1 = \theta_2$.
    - Identifiability means that the parameter of a distribution can be learned from observed samples without the need for additional constraints.
58. Minimal and Overcomplete: Consider an exponential family with sufficient statistics $\phi = (\phi_1, \ldots, \phi_d)$.
    - If there exists a nonzero $a \in \mathbb{R}^d$ such that $a^\top \phi(x) = c$ (a constant) holds almost everywhere, this is called an overcomplete representation; otherwise, it is called a minimal representation.
    - An exponential family is identifiable if and only if the representation is minimal. Why?
60. Bernoulli Revisited: Consider two representations:
    - [R1] $\phi(x) = x$
    - [R2] $\phi(x) = (\mathbb{1}[x = 0], \mathbb{1}[x = 1])$

    For each representation:
    - Is it minimal or overcomplete?
    - If it is overcomplete, find $a \neq 0$ such that $a^\top \phi(x) = \text{const}$.
    - Is it identifiable or unidentifiable?
61. Mean Parameters:
    - The expectations of the sufficient statistics, $\mu = \mathbb{E}_p[\phi(X)] = \int_{\mathcal{X}} \phi(x)\, p(x)\, \nu(dx)$, are called mean parameters.
    - Under certain conditions, the distribution in an exponential family is uniquely determined by the mean parameters, which thus provide an alternative parameterization.
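62. Realizable Mean Parameters:
    - Given sufficient statistics $\phi$, we say a distribution $p$ realizes a mean parameter $\mu$ if $\mathbb{E}_p[\phi(X)] = \mu$.
    - The set of (realizable) mean parameters for given sufficient statistics $\phi$ is $\mathcal{M} = \{\mu \in \mathbb{R}^d : \mathbb{E}_p[\phi(X)] = \mu \text{ for some } p\}$.
    - Here, $p$ is not restricted to the exponential family.
    - $\mathcal{M}$ is a convex set. Why?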
63. Convex Hulls:
    - Given a set $S$, the convex hull of $S$, denoted by $\mathrm{conv}(S)$, is the set of all convex combinations of elements in $S$.
    - $\mathrm{conv}(S)$ is the smallest convex set containing $S$.
    - The convex hull of a finite set is called a convex polytope.
    - Convex polytopes are compact.
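64. Probability Simplex: Given a finite space $\mathcal{X}$, the probability simplex over $\mathcal{X}$ is $\Delta_{\mathcal{X}} = \{p \in \mathbb{R}^{|\mathcal{X}|} : p(x) \geq 0, \sum_{x \in \mathcal{X}} p(x) = 1\}$. When $\mathcal{X} = \{1, \ldots, K\}$, $\Delta_{\mathcal{X}}$ reduces to $\Delta_K = \{(p_1, \ldots, p_K) : p_k \geq 0, \sum_{k=1}^K p_k = 1\}$, and $\Delta_K$ is a $(K-1)$-dimensional polytope.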
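65. Polytope of Mean Parameters: When the sample space $\mathcal{X}$ is finite, given any $\phi$, the set $\mathcal{M}$ is a convex polytope: $\mathcal{M} = \mathrm{conv}\{\phi(x) : x \in \mathcal{X}\}$. Particularly, each $\mu \in \mathcal{M}$ can be written as $\mu = \sum_{x \in \mathcal{X}} p(x)\, \phi(x)$ for some $p \in \Delta_{\mathcal{X}}$.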
66. Log-partition Function: The log-partition function, given by $A(\theta) = \log \int_{\mathcal{X}} h(x) \exp\left(\theta^\top \phi(x)\right) \nu(dx)$, has the following properties:
    - $A$ is a convex function, and thus $\Theta$ is a convex set.
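67. Log-partition Function (cont'd):
    - For an overcomplete representation with $a^\top \phi(x) = c$ a.e. for some $a \neq 0$, $\nabla^2 A(\theta)$ has $a^\top \nabla^2 A(\theta)\, a = 0$, because $a^\top \nabla^2 A(\theta)\, a = \mathrm{Var}_\theta[a^\top \phi(X)] = 0$.
    - Conversely, for a minimal representation, we have $a^\top \nabla^2 A(\theta)\, a = \mathrm{Var}_\theta[a^\top \phi(X)] > 0$ for every nonzero vector $a$; hence $\nabla^2 A(\theta)$ is positive definite for every $\theta \in \Theta$, and thus $A$ is strictly convex.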
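The argument uses the standard identities $\nabla A(\theta) = \mathbb{E}_\theta[\phi(X)]$ and $\nabla^2 A(\theta) = \mathrm{Cov}_\theta[\phi(X)]$. A numerical sanity check for the Poisson family ($\phi(x) = x$, $h(x) = 1/x!$), where the mean and the variance both equal $e^\theta$; the truncation point `x_max` and the test point $\theta = 0.7$ are arbitrary choices:

```python
import math

def log_partition(theta, x_max=200):
    """A(theta) = log sum_{x < x_max} exp(theta * x) / x! (truncated Poisson log-partition)."""
    logs = [theta * x - math.lgamma(x + 1) for x in range(x_max)]
    m = max(logs)
    return m + math.log(sum(math.exp(v - m) for v in logs))  # log-sum-exp for stability

theta, eps = 0.7, 1e-4
# Central finite differences for the first and second derivatives of A.
grad = (log_partition(theta + eps) - log_partition(theta - eps)) / (2 * eps)
hess = (log_partition(theta + eps) - 2 * log_partition(theta) + log_partition(theta - eps)) / eps ** 2
lam = math.exp(theta)  # Poisson mean and variance, i.e. E[phi(X)] and Var[phi(X)]
```

Both `grad` and `hess` come out equal to `lam` up to finite-difference error.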
68. Gradient Map: The gradient map $\nabla A$ is a mapping from the canonical parameters $\theta \in \Theta$ to the mean parameters $\mu \in \mathcal{M}$.
    - When is $\nabla A$ injective (i.e. one-to-one)?
    - When is $\nabla A$ surjective (onto $\mathcal{M}$)?
69. Gradient Map (cont'd):
    - The gradient map is injective if and only if the exponential family representation is minimal.
    - An exponential family with a minimal representation is identifiable.
    - How to prove? $A$ is strictly convex $\Rightarrow$ $\nabla A$ is injective.
    - With an overcomplete representation, there is a one-to-one correspondence between mean parameters and affine subsets of $\Theta$.
70. Gradient Map (cont'd):
    - With a minimal representation, $\nabla A$ is onto $\mathcal{M}^\circ$, the interior of $\mathcal{M}$.
    - Each mean parameter $\mu \in \mathcal{M}^\circ$ is uniquely realized by a canonical parameter $\theta(\mu) \in \Theta$.
    - Given $\mu \in \mathcal{M}^\circ$, there can be many distributions that realize $\mu$, among which there is one that maximizes the entropy; it lies in the exponential family associated with $\phi$ (we will see this).
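71. Entropy: Given an exponential family distribution $p(x; \theta) = h(x) \exp\left(\theta^\top \phi(x) - A(\theta)\right)$, the entropy of $p$ is defined to be $H(p) = -\int_{\mathcal{X}} p(x) \log p(x)\, \nu(dx)$.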
72. Maximum Entropy (Problem): Consider a finite space $\mathcal{X}$; we want to solve $\max_{p \in \Delta_{\mathcal{X}}} H(p)$ subject to $\mathbb{E}_p[\phi(X)] = \mu$. What's the solution?
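73. Maximum Entropy (Solution): Using the method of Lagrange multipliers, we get the optimum $p^*(x) = \exp\left(\theta^\top \phi(x) - A(\theta)\right)$, where $A(\theta) = \log \sum_{x \in \mathcal{X}} \exp(\theta^\top \phi(x))$ and $\theta$ is chosen such that $\mathbb{E}_{p^*}[\phi(X)] = \mu$. The solution to a maximum-entropy problem with expectation constraints is always an exponential family distribution. This can be generalized to continuous spaces using the calculus of variations.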
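For a scalar statistic the multiplier $\theta$ can be found numerically: it minimizes the convex function $A(\theta) - \theta\mu$, whose derivative is $\mathbb{E}_\theta[\phi(X)] - \mu$ and whose second derivative is $\mathrm{Var}_\theta[\phi(X)]$. A minimal sketch using plain Newton's method (no line search, which is adequate for this small example); the six-sided die with constrained mean 4.5 is an illustrative choice:

```python
import math

def maxent_distribution(support, phi, target, iters=100, tol=1e-12):
    """Maximum-entropy distribution on a finite support subject to E[phi(X)] = target.

    The optimum has the form p(x) ∝ exp(theta * phi(x)); theta is found by Newton's
    method on the convex dual A(theta) - theta * target.
    """
    values = [phi(x) for x in support]

    def tilted(theta):
        w = [math.exp(theta * v) for v in values]
        z = sum(w)
        return [wi / z for wi in w]

    theta = 0.0
    for _ in range(iters):
        p = tilted(theta)
        mean = sum(pi * v for pi, v in zip(p, values))
        if abs(mean - target) < tol:
            break
        var = sum(pi * (v - mean) ** 2 for pi, v in zip(p, values))
        theta -= (mean - target) / var  # Newton step: gradient / second derivative
    return theta, tilted(theta)

def entropy(p):
    return -sum(pi * math.log(pi) for pi in p if pi > 0)

theta, p = maxent_distribution(range(1, 7), lambda x: x, 4.5)
```

The resulting probabilities increase geometrically with the face value ($\theta > 0$), and any other distribution with mean 4.5, such as a fair coin between faces 4 and 5, has lower entropy.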
74. Kullback-Leibler Divergence: The Kullback-Leibler divergence (or KL divergence) between two probability densities $p$ and $q$ (w.r.t. the same base measure $\nu$) is defined to be $\mathrm{KL}(p \,\|\, q) = \int_{\mathcal{X}} p(x) \log \frac{p(x)}{q(x)}\, \nu(dx)$. KL divergence is not symmetric.
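75. KL Divergence (cont'd):
    - (Gibbs' inequality) $\mathrm{KL}(p \,\|\, q) \geq 0$, where the equality holds if and only if $p = q$ almost everywhere (w.r.t. the base measure).
    - Given two distributions $p_{\theta_1}, p_{\theta_2}$ in the same exponential family with sufficient statistics $\phi$: $\mathrm{KL}(p_{\theta_1} \,\|\, p_{\theta_2}) = A(\theta_2) - A(\theta_1) - (\theta_2 - \theta_1)^\top \mu_1$, where $\mu_1 = \mathbb{E}_{\theta_1}[\phi(X)]$.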
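For the Poisson family ($\theta = \log\lambda$, $A(\theta) = e^\theta$, $\mu = \lambda$) the closed form becomes $\lambda_2 - \lambda_1 + \lambda_1 \log(\lambda_1 / \lambda_2)$. A minimal sketch comparing it with direct (truncated) summation, which also shows the asymmetry; the rates 3 and 5 are illustrative:

```python
import math

def kl_poisson_closed_form(lam1, lam2):
    """KL(Pois(lam1) || Pois(lam2)) = A(t2) - A(t1) - (t2 - t1) * mu1,
    with t = log(lambda), A(t) = exp(t), and mu1 = lam1."""
    t1, t2 = math.log(lam1), math.log(lam2)
    return math.exp(t2) - math.exp(t1) - (t2 - t1) * lam1

def kl_poisson_by_summation(lam1, lam2, x_max=200):
    """Direct evaluation of sum_x p1(x) log(p1(x) / p2(x)), truncated at x_max."""
    total = 0.0
    for x in range(x_max):
        log_p1 = x * math.log(lam1) - lam1 - math.lgamma(x + 1)
        log_p2 = x * math.log(lam2) - lam2 - math.lgamma(x + 1)
        total += math.exp(log_p1) * (log_p1 - log_p2)
    return total

forward = kl_poisson_closed_form(3.0, 5.0)
backward = kl_poisson_closed_form(5.0, 3.0)
```

The two directions give roughly 0.47 and 0.55, respectively.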
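76. Projections of Distributions: Let $\mathcal{E}$ be an exponential family and $q$ be a distribution, both over the same space $\mathcal{X}$:
    - The I-projection (information projection) of $q$ onto $\mathcal{E}$: $\arg\min_{p \in \mathcal{E}} \mathrm{KL}(p \,\|\, q)$.
    - The M-projection (moment projection) of $q$ onto $\mathcal{E}$: $\arg\min_{p \in \mathcal{E}} \mathrm{KL}(q \,\|\, p)$.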
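77. Maximum Likelihood Estimation: Given a parameterized family $\{p_\theta : \theta \in \Theta\}$ over $\mathcal{X}$ and samples $x_1, \ldots, x_n \in \mathcal{X}$, the log-likelihood of $\theta$ is $\ell(\theta) = \sum_{i=1}^n \log p_\theta(x_i)$. $\hat{\theta}$ is called a maximum likelihood estimate given $x_1, \ldots, x_n$ if $\hat{\theta} \in \arg\max_{\theta \in \Theta} \ell(\theta)$.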
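78. MLE (cont'd):
    - Given $x_1, \ldots, x_n$, $\hat{q} = \frac{1}{n} \sum_{i=1}^n \delta_{x_i}$ is called an empirical probability measure, which has $\mathbb{E}_{\hat{q}}[f(X)] = \frac{1}{n} \sum_{i=1}^n f(x_i)$.
    - The log-likelihood of $\theta$ given $x_1, \ldots, x_n$ can be rewritten as $\ell(\theta) = n\, \mathbb{E}_{\hat{q}}[\log p_\theta(X)]$.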
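79. MLE (cont'd):
    - We have $\ell(\theta) = -n \left(H(\hat{q}) + \mathrm{KL}(\hat{q} \,\|\, p_\theta)\right)$ (for discrete $\mathcal{X}$).
    - Maximizing $\ell(\theta)$ is equivalent to minimizing $\mathrm{KL}(\hat{q} \,\|\, p_\theta)$.
    - Maximum likelihood estimation is equivalent to the M-projection of the empirical distribution $\hat{q}$ onto a given family.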
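80. M-projections: Given an exponential family $\mathcal{E}$ (in canonical form, with sufficient statistics $\phi$) and an arbitrary distribution $q$ over $\mathcal{X}$, we have $\mathrm{KL}(q \,\|\, p_\theta) = A(\theta) - \theta^\top \hat{\mu} + \text{const}$, where $\hat{\mu} = \mathbb{E}_q[\phi(X)]$ and the constant does not depend on $\theta$. Thus, the optimal $\theta^*$ is $\theta^* = \arg\max_{\theta \in \Theta} \left(\theta^\top \hat{\mu} - A(\theta)\right)$.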
81. M-projections (cont'd):
    - This is a convex problem. $\theta$ is optimal iff $\nabla A(\theta) = \hat{\mu}$.
    - M-projection onto an exponential family $\mathcal{E}$ amounts to finding a distribution in $\mathcal{E}$ whose mean parameter matches the input mean $\hat{\mu}$. ($\hat{\mu}$ is always realizable; why?)
    - With a minimal representation, the optimum is unique; otherwise, the set of optimal solutions is an affine subset of $\Theta$, all of which yield the same distribution.
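For the Normal family with $\phi(x) = (x, x^2)$, the M-projection of an empirical distribution is plain moment matching. A minimal sketch, assuming the canonical parameterization $\theta = (\mu/\sigma^2, -1/(2\sigma^2))$ with $A(\theta) = -\theta_1^2/(4\theta_2) - \tfrac{1}{2}\log(-2\theta_2)$; it converts $\hat{\mu}$ to $\theta^*$ and checks the optimality condition $\nabla A(\theta^*) = \hat{\mu}$ by finite differences (the four data points are illustrative):

```python
import math

def gaussian_log_partition(theta1, theta2):
    """A(theta) for phi(x) = (x, x^2) and h(x) = 1/sqrt(2*pi); requires theta2 < 0."""
    return -theta1 ** 2 / (4 * theta2) - 0.5 * math.log(-2 * theta2)

def m_project_gaussian(samples):
    """M-projection of the empirical distribution onto the Normal family.

    Matches the mean parameters mu_hat = (E[x], E[x^2]) and converts them to canonical form.
    """
    n = len(samples)
    m1 = sum(samples) / n
    m2 = sum(x * x for x in samples) / n
    var = m2 - m1 ** 2  # must be > 0, i.e. mu_hat lies in the interior of M
    return (m1 / var, -1.0 / (2 * var)), (m1, m2)

(theta1, theta2), (m1, m2) = m_project_gaussian([1.0, 2.0, 3.0, 4.0])

# Optimality check: the gradient of A at theta* should equal mu_hat.
eps = 1e-6
g1 = (gaussian_log_partition(theta1 + eps, theta2) - gaussian_log_partition(theta1 - eps, theta2)) / (2 * eps)
g2 = (gaussian_log_partition(theta1, theta2 + eps) - gaussian_log_partition(theta1, theta2 - eps)) / (2 * eps)
```

For this data $\hat{\mu} = (2.5, 7.5)$, so $\hat{\sigma}^2 = 1.25$ and $\theta^* = (2, -0.4)$.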
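83. Convex Conjugate: Let $f: \mathbb{R}^d \to \mathbb{R}$ be a real-valued function.
    - The convex conjugate of $f$ is defined to be $f^*(\mu) = \sup_{\theta} \left(\mu^\top \theta - f(\theta)\right)$.
    - $f^*$ is always convex, no matter whether $f$ is.
    - $f^{**}$ is convex.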
84. Convex Conjugate:
    - (Fenchel's inequality) $f(\theta) + f^*(\mu) \geq \mu^\top \theta$ for all $\theta, \mu$.
    - (Fenchel-Moreau theorem) $f^{**} = f$ iff $f$ is convex and lower semi-continuous.
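85. Conjugate Duality:
    - $(\theta, \mu)$ is called dually coupled if $\mu = \nabla A(\theta)$.
    - The convex conjugate of a log-partition function $A$: $A^*(\mu) = \sup_{\theta \in \Theta} \left(\mu^\top \theta - A(\theta)\right)$.
    - The supremum is attained at $\theta$ iff $(\theta, \mu)$ is dually coupled.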
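86. Conjugate Duality (cont'd):
    - $A^*$ has (taking $h \equiv 1$): $A^*(\mu) = -H(p_{\theta(\mu)})$ if $\mu \in \mathcal{M}^\circ$, and $A^*(\mu) = +\infty$ if $\mu \notin \overline{\mathcal{M}}$, where $p_{\theta(\mu)}$ is the member of $\mathcal{E}$ with mean parameter $\mu$.
    - $A^*$ on the boundary of $\mathcal{M}$ is determined via Cauchy sequences (limits of interior points).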
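87. Conjugate Duality (cont'd):
    - With $A^{**} = A$ ($A$ is convex and lower semi-continuous), the log-partition function has $A(\theta) = \sup_{\mu \in \mathcal{M}} \left(\theta^\top \mu - A^*(\mu)\right)$.
    - The supremum is attained at $\mu$ iff $(\theta, \mu)$ is dually coupled, which has $\mu = \nabla A(\theta)$.
    - With a minimal representation, $\nabla A$ maps $\Theta$ one-to-one onto $\mathcal{M}^\circ$, while $\nabla A^*$ is the inverse map.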
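The Bernoulli family in its minimal representation ($\phi(x) = x$, $A(\theta) = \log(1 + e^\theta)$) makes these statements concrete: the gradient map is the logistic sigmoid, its inverse is the logit, and $A^*(\mu) = \mu \log \mu + (1 - \mu)\log(1 - \mu)$, the negative entropy. A minimal numerical sketch at $\mu = 0.3$ (an arbitrary choice), with a grid search standing in for the supremum:

```python
import math

def log_partition(theta):
    return math.log1p(math.exp(theta))      # A(theta) = log(1 + e^theta)

def grad_log_partition(theta):
    return 1.0 / (1.0 + math.exp(-theta))   # gradient map: mu = sigmoid(theta)

def neg_entropy(mu):
    return mu * math.log(mu) + (1 - mu) * math.log(1 - mu)

mu = 0.3
theta_star = math.log(mu / (1 - mu))        # inverse gradient map (logit)
conj_at_star = mu * theta_star - log_partition(theta_star)

# Fenchel's inequality: mu * theta - A(theta) <= A*(mu) for every theta.
grid = [t / 100 for t in range(-1000, 1001)]
best_on_grid = max(mu * t - log_partition(t) for t in grid)
```

The value at the dually coupled point matches the negative entropy, and no grid point exceeds it.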
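88. Prior and Posterior:
    - In Bayesian analysis, we usually place a prior with density $\pi(\theta)$ over the parameter space $\Theta$.
    - A parameter $\theta$ is linked to observations $x_1, \ldots, x_n$ via a likelihood model: $p(x_1, \ldots, x_n \mid \theta) = \prod_{i=1}^n p(x_i \mid \theta)$.
    - The posterior measure given $x_1, \ldots, x_n$ is $\pi(\theta \mid x_{1:n}) = \dfrac{\pi(\theta) \prod_{i} p(x_i \mid \theta)}{\int_\Theta \pi(\theta') \prod_{i} p(x_i \mid \theta')\, d\theta'}$.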
89. Prior and Posterior (cont'd):
    - Computing the posterior distribution is in general very difficult.
    - However, under certain conditions (e.g. when the prior is conjugate to the likelihood model), the computation becomes particularly easy.
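90. Conjugate Prior:
    - A prior with density $\pi(\theta; \alpha)$ is called a conjugate prior to the likelihood model $p(x \mid \theta)$ if the posterior distribution given $x$ is in the same parameterized family, i.e. in the form $\pi(\theta; \alpha')$ with $\alpha' = u(\alpha, x)$.
    - The update $u$ is applied left-associatively and satisfies $u(u(\alpha, x_1), x_2) = u(u(\alpha, x_2), x_1)$.
    - When $x = (x_1, \ldots, x_n)$, $\alpha' = u(\cdots u(u(\alpha, x_1), x_2) \cdots, x_n)$. The result is independent of the order of samples.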
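91. CP for Exponential Families: Generally, conjugate pairs in exponential families are in the following form:
    - Prior: $\pi(\theta; \alpha, \beta) \propto \exp\left(\alpha^\top \theta - \beta A(\theta)\right)$
    - Likelihood: $p(x \mid \theta) = h(x) \exp\left(\theta^\top \phi(x) - A(\theta)\right)$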
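92. CP for Exponential Families (cont'd): Hence, the posterior update of the hyperparameters $(\alpha, \beta)$:
    - with a single observation: $(\alpha, \beta) \to (\alpha + \phi(x), \beta + 1)$
    - with multiple observations: $(\alpha, \beta) \to \left(\alpha + \sum_{i=1}^n \phi(x_i), \beta + n\right)$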
93. CP for Exponential Families (cont'd):
    - The family of conjugate priors is largely determined by the likelihood model, particularly by the form of $\phi$ and $A$.
    - A family of prior distributions can serve as the conjugate priors to different likelihood models.
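94. Example: Beta-Bernoulli:
    - Prior: Beta distribution, $\pi(\mu; a, b) = \frac{\Gamma(a + b)}{\Gamma(a)\Gamma(b)} \mu^{a - 1} (1 - \mu)^{b - 1}$
    - Likelihood: Bernoulli distribution, $p(x \mid \mu) = \mu^x (1 - \mu)^{1 - x}$
    - Posterior: remains a Beta distribution, $\mathrm{Beta}\left(a + \sum_i x_i,\; b + n - \sum_i x_i\right)$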
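The Beta-Bernoulli update only adds counts to the hyperparameters: $a$ gains the number of ones and $b$ the number of zeros. A minimal sketch; the prior $\mathrm{Beta}(2, 2)$ and the observations are illustrative. Processing the observations one at a time gives the same result as a single batch update, in line with the order independence of conjugate updates:

```python
def beta_bernoulli_posterior(a, b, observations):
    """Return the posterior Beta hyperparameters after Bernoulli observations in {0, 1}."""
    ones = sum(observations)
    zeros = len(observations) - ones
    return a + ones, b + zeros

data = [1, 0, 1, 1, 1]
a_post, b_post = beta_bernoulli_posterior(2.0, 2.0, data)
posterior_mean = a_post / (a_post + b_post)  # E[mu | data] = 6 / 9

# One observation at a time: each posterior serves as the next prior.
a_seq, b_seq = 2.0, 2.0
for x in data:
    a_seq, b_seq = beta_bernoulli_posterior(a_seq, b_seq, [x])
```

Here the posterior is $\mathrm{Beta}(6, 3)$ with mean $2/3$.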
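95. Example: Normal-Normal:
    - Prior: Normal distribution, $\mu \sim \mathcal{N}(\mu_0, \sigma_0^2)$
    - Likelihood: Normal distribution (fixed variance), $x_i \mid \mu \sim \mathcal{N}(\mu, \sigma^2)$
    - Posterior: remains a Normal distribution, $\mathcal{N}(\mu_n, \sigma_n^2)$ with $\frac{1}{\sigma_n^2} = \frac{1}{\sigma_0^2} + \frac{n}{\sigma^2}$ and $\mu_n = \sigma_n^2 \left(\frac{\mu_0}{\sigma_0^2} + \frac{\sum_i x_i}{\sigma^2}\right)$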
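With known observation variance $\sigma^2$ and prior $\mathcal{N}(\mu_0, \sigma_0^2)$ on the mean, the posterior precision is $1/\sigma_0^2 + n/\sigma^2$ and the posterior mean is the precision-weighted average $\sigma_n^2 (\mu_0/\sigma_0^2 + \sum_i x_i/\sigma^2)$. A minimal sketch with illustrative numbers, again checking that sequential and batch updates agree:

```python
def normal_normal_posterior(mu0, var0, var, observations):
    """Posterior N(mu_n, var_n) over the mean, with known observation variance `var`."""
    n = len(observations)
    precision = 1.0 / var0 + n / var                        # prior precision + data precision
    var_n = 1.0 / precision
    mu_n = var_n * (mu0 / var0 + sum(observations) / var)  # precision-weighted average
    return mu_n, var_n

batch = normal_normal_posterior(0.0, 1.0, 1.0, [2.0, 2.0])

# Sequential updates: the posterior after x_1 serves as the prior for x_2.
m, v = 0.0, 1.0
for x in [2.0, 2.0]:
    m, v = normal_normal_posterior(m, v, 1.0, [x])
```

With $\mu_0 = 0$, $\sigma_0^2 = \sigma^2 = 1$ and two observations equal to 2, the posterior is $\mathcal{N}(4/3, 1/3)$.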
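96. Dirichlet Distribution:
    - The Dirichlet distribution is a distribution over the probability simplex $\Delta_K$.
    - It is often used as a conjugate prior to the categorical distribution or the multinomial distribution.
    - With $\alpha = (\alpha_1, \ldots, \alpha_K)$, $\alpha_k > 0$, as the parameter, its density is $\mathrm{Dir}(\mu; \alpha) = \frac{\Gamma(\sum_k \alpha_k)}{\prod_k \Gamma(\alpha_k)} \prod_{k=1}^K \mu_k^{\alpha_k - 1}$.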
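97. Dirichlet Distribution (cont'd):
    - Mean: $\mathbb{E}[\mu_k] = \alpha_k / \alpha_0$ with $\alpha_0 = \sum_k \alpha_k$.
    - Covariance: $\mathrm{Cov}(\mu_j, \mu_k) = \frac{\alpha_j (\delta_{jk} \alpha_0 - \alpha_k)}{\alpha_0^2 (\alpha_0 + 1)}$.
    - Mode: $\frac{\alpha_k - 1}{\alpha_0 - K}$ (when all $\alpha_k > 1$).
    - Marginal: $\mu_k \sim \mathrm{Beta}(\alpha_k, \alpha_0 - \alpha_k)$.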
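98. Dirichlet-Categorical:
    - Prior: $\mu \sim \mathrm{Dir}(\alpha)$
    - Likelihood: $p(x \mid \mu) = \mu_x$, $x \in \{1, \ldots, K\}$
    - Posterior: remains a Dirichlet distribution, $\mathrm{Dir}(\alpha_1 + n_1, \ldots, \alpha_K + n_K)$, where $n_k$ is the number of observations equal to $k$.
    - When $\alpha = (1, \ldots, 1)$, $\mathrm{Dir}(\alpha)$ reduces to a uniform distribution over $\Delta_K$.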
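As with Beta-Bernoulli, the posterior update adds category counts to $\alpha$. A minimal sketch, using 0-based category labels $\{0, \ldots, K-1\}$ for convenience (the slide uses $1, \ldots, K$) and an illustrative uniform prior $\alpha = (1, 1, 1)$:

```python
def dirichlet_categorical_posterior(alpha, observations):
    """Return the posterior Dirichlet parameters alpha_k + n_k for labels in {0, ..., K-1}."""
    counts = [0] * len(alpha)
    for x in observations:
        counts[x] += 1
    return [a + n for a, n in zip(alpha, counts)]

alpha_post = dirichlet_categorical_posterior([1.0, 1.0, 1.0], [0, 2, 2, 1, 2])
total = sum(alpha_post)
posterior_mean = [a / total for a in alpha_post]  # E[mu_k | data] = alpha_k / alpha_0
```

The posterior is $\mathrm{Dir}(2, 2, 4)$, with mean $(0.25, 0.25, 0.5)$.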
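99. Dirichlet Distribution (cont'd): Dirichlet distributions are an exponential family:
    - Canonical parameter: $\theta = (\alpha_1 - 1, \ldots, \alpha_K - 1)$
    - Sufficient statistics: $\phi(\mu) = (\log \mu_1, \ldots, \log \mu_K)$
    - Log-partition function: $A(\theta) = \sum_{k=1}^K \log \Gamma(\theta_k + 1) - \log \Gamma\left(\sum_{k=1}^K (\theta_k + 1)\right)$
    - Hence, $\mathbb{E}[\log \mu_k] = \frac{\partial A}{\partial \theta_k} = \psi(\alpha_k) - \psi(\alpha_0)$, where $\psi$ is the digamma function.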
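100. Predictive Distribution: Given $x_1, \ldots, x_n$, what is the distribution of a new sample $x_{n+1}$?
     $$p(x_{n+1} \mid x_{1:n}) = \int_\Theta p(x_{n+1} \mid \theta)\, \pi(\theta \mid x_{1:n})\, d\theta.$$
     With an exponential family and conjugacy, we have $p(x_{n+1} \mid x_{1:n}) = h(x_{n+1}) \frac{Z_\pi(\alpha_n + \phi(x_{n+1}), \beta_n + 1)}{Z_\pi(\alpha_n, \beta_n)}$, where $(\alpha_n, \beta_n)$ are the posterior hyperparameters and $Z_\pi(\alpha, \beta) = \int_\Theta \exp\left(\alpha^\top \theta - \beta A(\theta)\right) d\theta$ is the normalizer of the conjugate prior.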
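The same ratio-of-normalizers structure appears concretely for the Dirichlet-Categorical pair: with the multivariate Beta function $B(\alpha) = \prod_k \Gamma(\alpha_k) / \Gamma(\sum_k \alpha_k)$ as the Dirichlet normalizer, the predictive probability of category $k$ is $B(\alpha + e_k) / B(\alpha) = \alpha_k / \sum_j \alpha_j$, evaluated at the posterior parameters. A minimal sketch that computes the ratio through log-Gamma functions; the posterior $\alpha = (2, 2, 4)$ and 0-based labels are illustrative:

```python
import math

def log_multivariate_beta(alpha):
    """log B(alpha) = sum_k log Gamma(alpha_k) - log Gamma(sum_k alpha_k)."""
    return sum(math.lgamma(a) for a in alpha) - math.lgamma(sum(alpha))

def predictive(alpha_post, k):
    """p(x_new = k | data) = B(alpha_post + e_k) / B(alpha_post)."""
    bumped = list(alpha_post)
    bumped[k] += 1
    return math.exp(log_multivariate_beta(bumped) - log_multivariate_beta(alpha_post))

alpha_post = [2.0, 2.0, 4.0]
probs = [predictive(alpha_post, k) for k in range(3)]
```

The computed probabilities agree with the closed form $(0.25, 0.25, 0.5)$ up to rounding.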
101. Important Conjugate Pairs:
     - Beta: the probability parameter of Bernoulli, Binomial, Geometric, or Negative Binomial
     - Normal: the mean parameter of Normal
     - Inverse-Gamma: the variance parameter of Normal
     - Gamma: the rate parameter of Exponential or Poisson, or the precision parameter of Normal
102. Important Conjugate Pairs (cont'd):
     - Dirichlet: the probability vector of Categorical or Multinomial
     - Multivariate Normal: the mean vector of Multivariate Normal
     - Inverse-Wishart: the covariance matrix of Multivariate Normal
     - Wishart: the precision matrix of Multivariate Normal
103. Examples of Graphical Models
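104. GMM (figure: plate diagram with $z_i$, $x_i$, $\mu_k$, $\sigma^2$, and $\pi$, over plates of size $N$ and $M$): A Gaussian Mixture Model (GMM) with fixed variance:
     - $z_i \sim \mathrm{Categorical}(\pi)$ and $x_i \mid z_i \sim \mathcal{N}(\mu_{z_i}, \sigma^2)$ for $i = 1, \ldots, N$, with components $k = 1, \ldots, M$.
     - This model is not complete.
     - How are $\pi$ and $\mu_k$ generated?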
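105. GMM (with Prior) (figure: the GMM plate diagram extended with hyperparameters $\mu_0$, $\sigma_0^2$, and $\alpha$): With priors placed over model parameters, e.g. $\pi \sim \mathrm{Dir}(\alpha)$ and $\mu_k \sim \mathcal{N}(\mu_0, \sigma_0^2)$, we get a hierarchical Bayesian model:
     - Hyperparameters have no parents (top level).
     - Observations have no children (bottom level).
     - Each unknown variable is generated according to its parents.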
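The hierarchical model can be simulated by ancestral sampling: draw each variable after its parents, from the top level down to the observations. A minimal sketch, assuming the priors $\pi \sim \mathrm{Dir}(\alpha, \ldots, \alpha)$ (a symmetric Dirichlet with a single scalar $\alpha$, a simplification) and $\mu_k \sim \mathcal{N}(\mu_0, \sigma_0^2)$, and using the standard construction of a Dirichlet draw from normalized Gamma variables; all numeric settings are illustrative:

```python
import random

def sample_gmm(n, m, alpha, mu0, var0, var, seed=0):
    """Ancestral sampling from a GMM with priors pi ~ Dir(alpha, ..., alpha), mu_k ~ N(mu0, var0).

    Variables are drawn top-down, each given its parents:
    hyperparameters -> (pi, mu) -> z_i -> x_i.
    """
    rng = random.Random(seed)
    g = [rng.gammavariate(alpha, 1.0) for _ in range(m)]
    total = sum(g)
    pi = [gk / total for gk in g]                          # normalized Gammas give a Dirichlet draw
    mu = [rng.gauss(mu0, var0 ** 0.5) for _ in range(m)]   # component means
    z = rng.choices(range(m), weights=pi, k=n)             # component assignments
    x = [rng.gauss(mu[zi], var ** 0.5) for zi in z]        # observations
    return pi, mu, z, x

pi, mu, z, x = sample_gmm(n=100, m=3, alpha=1.0, mu0=0.0, var0=25.0, var=1.0)
```

Fixing the seed makes the draw reproducible, which is convenient when testing inference code against data with known latent assignments.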
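106. GMM (Joint Model): $p(\pi, \mu_{1:M}, z_{1:N}, x_{1:N}) = \mathrm{Dir}(\pi; \alpha) \prod_{k=1}^M \mathcal{N}(\mu_k; \mu_0, \sigma_0^2) \prod_{i=1}^N \pi_{z_i}\, \mathcal{N}(x_i; \mu_{z_i}, \sigma^2)$. This is an exponential family.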
107. Topic Models
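108. PLSI (figure: plate diagram with $\theta_d$, $z_{di}$, $w_{di}$, and the topic parameters): Probabilistic Latent Semantic Indexing (PLSI):
     - Each topic $k$ is associated with $\beta_k$, a distribution over the vocabulary.
     - Each document $d$ comes with a vector of topic proportions $\theta_d$.
     - To generate each word $w_{di}$: draw $z_{di} \sim \mathrm{Categorical}(\theta_d)$, then $w_{di} \sim \mathrm{Categorical}(\beta_{z_{di}})$.
     - This is not a complete model.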
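109. LDA (figure: plate diagram with $\theta_d$, $z_{di}$, $w_{di}$, the topic parameters, and hyperparameter $\alpha$): Latent Dirichlet Allocation (LDA) completes PLSI by placing Dirichlet priors over the latent variables:
     - For each document, the topic proportions are generated as $\theta_d \sim \mathrm{Dir}(\alpha)$.
     - For each topic, the word distribution is generated as $\beta_k \sim \mathrm{Dir}(\eta)$.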
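The LDA generative process can likewise be simulated top-down. A minimal sketch, assuming symmetric Dirichlet priors with scalar concentrations `alpha` (topic proportions) and `eta` (topic-word distributions; `eta` is our label, not taken from the slides) and integer word ids; the document lengths and vocabulary size are illustrative:

```python
import random

def dirichlet(rng, conc, k):
    """Sample from a symmetric Dirichlet(conc, ..., conc) of dimension k via normalized Gammas."""
    g = [rng.gammavariate(conc, 1.0) for _ in range(k)]
    s = sum(g)
    return [gi / s for gi in g]

def sample_lda(doc_lengths, n_topics, vocab_size, alpha, eta, seed=0):
    """Ancestral sampling from LDA with symmetric Dirichlet priors."""
    rng = random.Random(seed)
    beta = [dirichlet(rng, eta, vocab_size) for _ in range(n_topics)]  # word distribution per topic
    docs = []
    for n_d in doc_lengths:
        theta_d = dirichlet(rng, alpha, n_topics)                   # topic proportions of document d
        z_d = rng.choices(range(n_topics), weights=theta_d, k=n_d)  # topic of each word
        w_d = [rng.choices(range(vocab_size), weights=beta[z])[0] for z in z_d]
        docs.append((theta_d, z_d, w_d))
    return beta, docs

beta, docs = sample_lda([5, 8], n_topics=2, vocab_size=10, alpha=0.5, eta=0.5)
```

Dropping the two Dirichlet draws and treating `beta` and `theta_d` as fixed parameters recovers the PLSI generative process.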
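110. LDA (Joint Model): $p(\beta, \theta, z, w) = \prod_{k} \mathrm{Dir}(\beta_k; \eta) \prod_{d=1}^M \left[\mathrm{Dir}(\theta_d; \alpha) \prod_{i=1}^{n_d} \theta_{d, z_{di}}\, \beta_{z_{di}, w_{di}}\right]$. Again, an exponential family.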
111. Summary:
     - The basics of graphical models
       - Bayesian networks and Markov random fields
       - How the joint distribution factorizes according to the graph
       - Relations between graphical structure and conditional independencies
       - Factor graphs
112. Summary (cont'd):
     - The basics of exponential families
       - The form of exponential families
       - Minimal and overcomplete representations, identifiability
       - Convexity of the log-partition function, gradient map
       - KL divergence, projections of distributions
       - Conjugate duality between the log-partition function and negative entropy
113. Summary (cont'd):
     - Conjugate priors
       - Posterior distributions in Bayesian analysis
       - Conjugate priors, especially for exponential families
       - Important conjugate pairs
114. Summary (cont'd):
     - Practice
       - How to formulate a graphical model based on intuition
       - Graphical representation of a model, factor graph
       - Analysis of the joint distribution