Lecture 4
Graphical Model and Exponential Family
Dahua Lin
The Chinese University of Hong Kong

Outline
You will learn the basics of probabilistic modeling in this lecture:
• Graphical Models
• Exponential Families
• Conjugate Prior
• How to formulate and analyze a graphical model in practice.

Graphical Models
• The key idea behind graphical models is factorization.
• A graphical model generally refers to a family of joint distributions over multiple variables that factorize according to the structure of the underlying graph.

Graphical Models
A graphical model can be viewed in two ways:
• A data structure that provides the skeleton for representing a joint distribution in a factorized manner.
• A compact representation of a set of conditional independencies about a family of distributions.
These two views are equivalent in a strict sense.

Distributions on a Graph
Consider a graph $G = (V, E)$, where edges can be directed or undirected:
• Attach a random variable $X_i$ to each vertex $i \in V$.
• The state space for $X_i$ is denoted by $\mathcal{X}_i$.
• A particular instance of $X_i$ is denoted by $x_i$.
• We can also consider a set of variables: $X_A = (X_i)_{i \in A}$ and $x_A = (x_i)_{i \in A}$.

Categories of Graphical Models
• Bayesian Networks (Directed Acyclic Graphs)
• Markov Random Fields (Undirected Graphs)
• Chain Graphs (Directed acyclic graphs over undirected components)
• Factor Graphs

Directed Acyclic Graphs
Consider a directed graph $G = (V, E)$:
• $G$ is called a directed acyclic graph (DAG) if it has no directed cycles.

Directed Acyclic Graphs (cont'd)
Consider a directed acyclic graph $G = (V, E)$:
• Given an edge $(s, t) \in E$, $s$ is called a parent of $t$, and $t$ is called a child of $s$.
• A vertex $s$ is called an ancestor of $t$ and $t$ a descendant of $s$, denoted as $s \prec t$, if there exists a directed path from $s$ to $t$.

Topological Ordering
• A topological ordering of a directed graph $G = (V, E)$ is a linear ordering of vertices such that for each edge $(s, t) \in E$, $s$ always comes before $t$.
• A finite directed graph is acyclic if and only if it has a topological ordering.

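The equivalence between acyclicity and the existence of a topological ordering can be checked constructively. The following is a minimal sketch using Kahn's algorithm (a standard technique, not taken from the slides); the function name `topological_order` and the example graphs are illustrative assumptions.

```python
from collections import deque

def topological_order(vertices, edges):
    """Return a topological ordering of a directed graph, or None if it has a directed cycle."""
    indegree = {v: 0 for v in vertices}
    children = {v: [] for v in vertices}
    for s, t in edges:
        children[s].append(t)
        indegree[t] += 1
    # Start from the vertices that have no parents.
    queue = deque(v for v in vertices if indegree[v] == 0)
    order = []
    while queue:
        s = queue.popleft()
        order.append(s)
        for t in children[s]:
            indegree[t] -= 1
            if indegree[t] == 0:
                queue.append(t)
    # If some vertices were never released, they lie on a directed cycle.
    return order if len(order) == len(vertices) else None

dag_order = topological_order(["a", "b", "c", "d"],
                              [("a", "b"), ("a", "c"), ("b", "d"), ("c", "d")])
cyclic_order = topological_order(["a", "b", "c"],
                                 [("a", "b"), ("b", "c"), ("c", "a")])
```

Here `dag_order` places every parent before its children, while `cyclic_order` is `None` because the second graph contains a directed cycle.
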
Bayesian Networks
Given a DAG $G = (V, E)$, we say a joint distribution over $X_V$ factorizes according to $G$, if its density $p$ can be expressed as:
$$p(x_V) = \prod_{i \in V} p_i(x_i \mid x_{\pi(i)})$$
• Such a model is called a Bayesian Network over $G$.
• $\pi(i)$ is the set of $i$'s parents, which can be empty.

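As a concrete illustration of the factorization, the sketch below evaluates a joint distribution as a product of per-vertex conditionals. The three-variable network (rain, sprinkler, wet) and all probability values are made up for illustration and do not come from the lecture.

```python
from itertools import product

# Hypothetical DAG: rain -> wet <- sprinkler, all variables binary.
parents = {"rain": (), "sprinkler": (), "wet": ("rain", "sprinkler")}

# Probability that a variable equals 1 given its parents' values (made-up numbers).
p_one = {
    "rain": {(): 0.2},
    "sprinkler": {(): 0.1},
    "wet": {(0, 0): 0.01, (0, 1): 0.9, (1, 0): 0.8, (1, 1): 0.99},
}

def joint(x):
    """Product of p_i(x_i | x_parents(i)) for an assignment x (dict: name -> 0 or 1)."""
    prob = 1.0
    for name, pa in parents.items():
        p1 = p_one[name][tuple(x[q] for q in pa)]
        prob *= p1 if x[name] == 1 else 1.0 - p1
    return prob

names = list(parents)
# Because every factor is a normalized conditional, the product sums to one.
total = sum(joint(dict(zip(names, values)))
            for values in product((0, 1), repeat=len(names)))
```

No separate normalizing constant is needed here, which is one practical difference from the undirected case.
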
Bayesian Networks: Example
[Figure: example Bayesian network]

Undirected Graphs and Cliques
Consider an undirected graph $G = (V, E)$:
• A clique is a fully connected subset of vertices.
• A clique is called maximal if it is not properly contained in another clique.
• $\mathcal{C}(G)$ denotes the set of all maximal cliques.

Undirected Graphs and Cliques (cont'd)

Markov Random Fields
Consider an undirected graph $G = (V, E)$. We say a joint distribution of $X_V$ factorizes according to $G$ if its density $p$ can be expressed as:
$$p(x_V) = \frac{1}{Z} \prod_{C \in \mathcal{C}(G)} \psi_C(x_C)$$
• This is called a Markov Random Field over $G$.
• $\psi_C$ are called factors.

Markov Random Fields (cont'd)
• The normalizing constant $Z$ is usually needed to ensure the distribution is properly normalized:
$$Z = \int \prod_{C \in \mathcal{C}(G)} \psi_C(x_C) \, \nu(dx_V)$$
• Generally, the compatibility functions $\psi_C$ need not have any obvious relations with the marginal or conditional distributions over the cliques.

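For a small discrete model the normalizing constant can be computed by brute-force enumeration. The sketch below assumes a hypothetical chain a - b - c with binary variables and made-up pairwise compatibility functions; it is feasible only because the state space has eight elements.

```python
from itertools import product

# Hypothetical compatibility functions on the chain a - b - c (binary variables).
def psi_ab(a, b):
    return 2.0 if a == b else 1.0

def psi_bc(b, c):
    return 3.0 if b == c else 1.0

def unnormalized(a, b, c):
    """Product of the factors, before normalization."""
    return psi_ab(a, b) * psi_bc(b, c)

states = list(product((0, 1), repeat=3))
# Normalizing constant: sum of the unnormalized density over all joint states.
Z = sum(unnormalized(*x) for x in states)
p = {x: unnormalized(*x) / Z for x in states}
```

Note that the factor values (2, 3, 1) are not probabilities of anything; only the normalized product `p` is a distribution.
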
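MRF Parameterization
[Figure: undirected graph over vertices a, b, c]
• All MRFs can be parameterized in terms of maximal cliques. In practice, this is not necessarily the most natural way.
• Natural parameterization: $p(a, b, c) \propto \psi_{ab}(a, b) \, \psi_{bc}(b, c) \, \psi_{ac}(a, c)$
• Maximal-clique based: $p(a, b, c) \propto \psi_{abc}(a, b, c)$ (with $\psi_{abc}(a, b, c) = \psi_{ab}(a, b) \, \psi_{bc}(b, c) \, \psi_{ac}(a, c)$)
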
The graphical structure also encodes a set of conditional independencies among the variables.

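Conditional Independence
Consider a joint distribution over $(X, Y, Z)$. $X$ and $Y$ are called conditionally independent given $Z$, denoted by $X \perp Y \mid Z$, iff
$$P(X \in A, Y \in B \mid Z) = P(X \in A \mid Z) \, P(Y \in B \mid Z) \quad \text{a.s. for all measurable } A, B$$
More generally, the same definition applies to sets of variables: $X_A \perp X_B \mid X_C$.
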
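Conditional Independence (cont'd)
If the conditional distributions $P_{X \mid Z}$ and $P_{Y \mid Z}$ have densities $p(x \mid z)$ and $p(y \mid z)$, then $X \perp Y \mid Z$ if the following equality holds almost surely:
$$p(x, y \mid z) = p(x \mid z) \, p(y \mid z)$$
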
I-map
• Let $\mathcal{P}$ be a family of distributions (e.g. a graphical model). We define $I(\mathcal{P})$ to be the set of conditional independencies in the form of $X_A \perp X_B \mid X_C$ that hold for all distributions in $\mathcal{P}$.
• Given a graph $G$ associated with a set of conditional independencies $I(G)$, $G$ is called an I-map of $\mathcal{P}$ if $I(G) \subseteq I(\mathcal{P})$.
• An I-map is a graph that captures (part of) the conditional independencies of a distribution family.

Conditional Independencies of MRFs
• The conditional independencies of an MRF can be characterized in three ways:
  • Local independencies
  • Pairwise independencies
  • Global independencies
• In the sequel, we consider an undirected graph $G = (V, E)$.

Local Independencies
• For each $i \in V$, the Markov blanket of $i$ w.r.t. $G$, denoted by $\mathcal{N}_G(i)$, is the set of all neighbors of $i$.
• Local independencies: $X_i$ is independent of the rest given its neighbors.

Pairwise Independencies
• Pairwise independencies: Given two disjoint sets $A, B \subset V$ with no direct edges between them, $X_A$ is independent of $X_B$ given the rest:
$$X_A \perp X_B \mid X_{V \setminus (A \cup B)}$$

Global Independencies
• We say $C$ separates $A$ and $B$, denoted by $\mathrm{sep}_G(A; B \mid C)$, if all paths between $A$ and $B$ go through $C$.
• Global independencies: If $C$ separates $A$ and $B$, then $X_A$ is independent of $X_B$ given $X_C$.

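A global independence can be verified numerically on a small model. In the sketch below, b separates a and c on a hypothetical chain a - b - c, so a and c should be conditionally independent given b for any choice of positive pairwise factors; the factor values are made up for illustration.

```python
from itertools import product

# Hypothetical pairwise factors on the chain a - b - c (binary variables).
def psi_ab(a, b):
    return 2.0 if a == b else 1.0

def psi_bc(b, c):
    return 3.0 if b == c else 0.5

states = list(product((0, 1), repeat=3))
weights = {x: psi_ab(x[0], x[1]) * psi_bc(x[1], x[2]) for x in states}
Z = sum(weights.values())
p = {x: w / Z for x, w in weights.items()}

def marginal(keep):
    """Marginal of p over the coordinates listed in keep (0 = a, 1 = b, 2 = c)."""
    out = {}
    for x, v in p.items():
        key = tuple(x[i] for i in keep)
        out[key] = out.get(key, 0.0) + v
    return out

p_b, p_ab, p_bc = marginal((1,)), marginal((0, 1)), marginal((1, 2))

# Largest violation of p(a, c | b) = p(a | b) p(c | b) over all joint states.
max_gap = max(
    abs(p[(a, b, c)] / p_b[(b,)]
        - (p_ab[(a, b)] / p_b[(b,)]) * (p_bc[(b, c)] / p_b[(b,)]))
    for a, b, c in states
)
```

The gap is zero up to floating-point error, as the factorization over the chain implies.
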
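Relations between Independencies
• $I_\ell(G) \subseteq I_p(G) \subseteq I_g(G)$, where $I_\ell$, $I_p$, and $I_g$ denote the local, pairwise, and global independencies of $G$.
• Given a distribution or a family of distributions $\mathcal{P}$, we say $\mathcal{P}$ satisfies $I$ if it satisfies all conditional independencies in $I$, denoted by $\mathcal{P} \models I$.
• $\mathcal{P} \models I_g(G) \Rightarrow \mathcal{P} \models I_p(G) \Rightarrow \mathcal{P} \models I_\ell(G)$.
• If $\mathcal{P}$ is a family of positive distributions, then
$$\mathcal{P} \models I_\ell(G) \Rightarrow \mathcal{P} \models I_g(G)$$
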
Soundness
• Let $P$ be a distribution that factorizes according to an undirected graph $G$, then $P \models I_g(G)$, or in other words, $G$ is an I-map of $P$.
• $P \models I_\ell(G)$ and $P \models I_p(G)$.
• How to prove it?
• How is the separation assumption related to the maximal cliques?

We have shown that if $P$ factorizes according to $G$, then $G$ is an I-map for $P$. Is the converse also true?

Hammersley-Clifford
• (Hammersley-Clifford Theorem) Let $P$ be a positive distribution over $\mathcal{X}$ and $G$ be an I-map of $P$, then $P$ factorizes according to $G$.
• Combining Soundness and Hammersley-Clifford:
  • A positive distribution $P$ factorizes according to $G$ if and only if $G$ is an I-map of $P$.

Conditional Independencies of BN
• Conditional independencies of a Bayesian network can be characterized in two ways:
  • local independencies
  • global independencies (via d-separation)
• In the sequel, we consider a directed graph $G = (V, E)$.

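Local Independencies
[Figure: example DAG over vertices a, b, c, d, e, f, g, h, i]
• Given $i \in V$, $X_i$ is independent of its non-descendants given its parents $X_{\pi(i)}$:
$$X_i \perp X_{\mathrm{nd}(i)} \mid X_{\pi(i)}$$
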
[Figure: three-node trails between X and Y via Z, with panels "indirect effect", "common cause", and "common effect"]

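d-separation
When "influence" can flow from $X$ to $Y$ via $Z$, we say that the trail $X \rightleftharpoons Z \rightleftharpoons Y$ is active:
• $X \to Z \to Y$ is active iff $Z$ is not observed.
• $X \leftarrow Z \leftarrow Y$ is active iff $Z$ is not observed.
• $X \leftarrow Z \to Y$ is active iff $Z$ is not observed.
• (V-structure) $X \to Z \leftarrow Y$ is active iff either $Z$ or some of $Z$'s descendants is observed.
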
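d-separation (cont'd)
• A trail $X_1 \rightleftharpoons \cdots \rightleftharpoons X_n$ is called active when all sub-trails $X_{i-1} \rightleftharpoons X_i \rightleftharpoons X_{i+1}$ are active.
• Let $A, B, C$ be three sets of vertices of $G$. $A$ and $B$ are d-separated by $C$, denoted by $\mathrm{d\text{-}sep}_G(A; B \mid C)$, if there is neither direct link nor active trail between $A$ and $B$ when $C$ is observed.
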
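Global Independencies
[Figure: example DAG over vertices a, b, c, d, e, f, g, h, i]
• Given $A, B, C \subset V$, $X_A$ is independent of $X_B$ given $X_C$ if $A$ and $B$ are d-separated by $C$ on the graph $G$:
• $I_g(G) = \{ X_A \perp X_B \mid X_C : \mathrm{d\text{-}sep}_G(A; B \mid C) \}$
• (Soundness) If $P$ factorizes according to $G$, then $P \models I_g(G)$, or we say $G$ is an I-map of $P$.
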
Moralized Graphs
• Given a directed graph $G = (V, E)$, we can construct a moralized graph, denoted by $\mathcal{M}[G]$, by adding (undirected) edges between each node and its parents and between each node's parents.
• In $\mathcal{M}[G]$, the subgraph that spans $\{i\} \cup \pi(i)$ forms a clique, denoted by $C_i$.
• The procedure of constructing $\mathcal{M}[G]$ from $G$ is called moralization.

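Moralization is mechanical enough to write down directly. The following is a minimal sketch, assuming the DAG is given as a dictionary from each vertex to its list of parents; the function name `moralize` and the v-structure example are illustrative.

```python
from itertools import combinations

def moralize(parents):
    """Build the undirected edge set of the moralized graph.

    parents: dict mapping each vertex to an iterable of its parents.
    Returns a set of frozensets, each holding the two endpoints of an edge.
    """
    edges = set()
    for child, pa in parents.items():
        pa = list(pa)
        # Keep every parent-child link, dropping its direction.
        for p in pa:
            edges.add(frozenset((p, child)))
        # "Marry" the parents: connect every pair of parents of the same child.
        for p, q in combinations(pa, 2):
            edges.add(frozenset((p, q)))
    return edges

# V-structure x -> z <- y: moralization adds the edge between x and y.
moral_edges = moralize({"x": [], "y": [], "z": ["x", "y"]})
```

In the example, the added edge between x and y is exactly what removes the marginal independence of x and y from the undirected graph.
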
Moralized Graphs (Illustration)
[Figure: the example DAG over vertices a, b, c, d, e, f, g, h, i and its moralized graph]

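From BN to MRF
If $P$ factorizes according to $G$ as
$$p(x_V) = \prod_{i \in V} p_i(x_i \mid x_{\pi(i)})$$
then $P$ factorizes according to $\mathcal{M}[G]$:
$$p(x_V) = \prod_{i \in V} \psi_{C_i}(x_{C_i}) \quad \text{with} \quad \psi_{C_i}(x_{C_i}) = p_i(x_i \mid x_{\pi(i)})$$
• $I_g(\mathcal{M}[G]) \subseteq I_g(G)$. Is the opposite true?
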
From BN to MRF (cont'd)
• In general, moralization may cause the loss of conditional independencies.
• $I_g(\mathcal{M}[G])$ may be a proper subset of $I_g(G)$.
• Consider a DAG: $X \to Z \leftarrow Y$
• $X \perp Y$ holds for $G$ but not for $\mathcal{M}[G]$.
• Not every MRF can be converted to a BN.

Factor Graphs
• An MRF does not always fully reveal the factorized structure of a distribution.
• A factor graph can sometimes give a more accurate characterization of a family of distributions.
• A factor graph is a bipartite graph with links between two types of nodes: variables and factors.
• A variable $x_i$ and a factor $\psi_k$ are linked in a factor graph if the factor involves $x_i$ as an argument.

Factor Graphs (Illustration)

Study of Distributions
• Graphical models: structure of (in)dependencies
• Exponential families: algebraic characteristics

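Exponential Families
An exponential family $\mathcal{P}$ over a measure space $(\mathcal{X}, \mathcal{B}, \nu)$:
$$p(x \mid \theta) = \frac{1}{Z(\theta)} \, h(x) \exp\left( \langle \eta(\theta), \phi(x) \rangle \right)$$
• sufficient statistics: $\phi(x)$
• canonical parameter function: $\eta(\theta)$
• partition function: $Z(\theta)$
• base density: $h$ over $\mathcal{X}$
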
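Partition Function
• The partition function is given by:
$$Z(\theta) = \int h(x) \exp\left( \langle \eta(\theta), \phi(x) \rangle \right) \nu(dx)$$
• The log-partition function given by $A(\theta) = \log Z(\theta)$ is often used instead of $Z(\theta)$.
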
Parameter Space
• An exponential family is essentially determined by the domain $\mathcal{X}$ and the sufficient statistics $\phi$.
• The set of valid parameters is
$$\Theta = \{ \theta : Z(\theta) < \infty \}$$
• An exponential family can be parameterized in many ways. When $\eta(\theta) = \theta$, it is said to be in the canonical form.

Many important families of distributions are exponential families:
• Binomial distribution
• Poisson distribution
• Normal distribution
• Exponential distribution
• Beta distribution
• And many more ...

Bernoulli Distribution
Domain: $\mathcal{X} = \{0, 1\}$
Parameter: $\theta \in (0, 1)$
Density: $p(x \mid \theta) = \theta^x (1 - \theta)^{1 - x}$
Bernoulli distributions describe an event that may or may not happen.

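Bernoulli Distribution (cont'd)
• sufficient statistics: $\phi(x) = x$
• canonical parameter: $\eta(\theta) = \log \frac{\theta}{1 - \theta}$
• base density: $h(x) = 1$ w.r.t. counting measure
• partition function: $Z(\theta) = \frac{1}{1 - \theta}$
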
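The exponential-family form of the Bernoulli distribution can be checked against the usual density. The sketch below assumes the standard minimal representation (sufficient statistic x, log-odds as canonical parameter, partition function 1/(1 - theta)); the function names are illustrative.

```python
import math

def bernoulli_pmf(x, theta):
    """Usual form: theta^x * (1 - theta)^(1 - x)."""
    return theta ** x * (1.0 - theta) ** (1 - x)

def bernoulli_expfam(x, theta):
    """Exponential-family form: h(x) * exp(eta * phi(x)) / Z with h = 1, phi(x) = x."""
    eta = math.log(theta / (1.0 - theta))  # canonical parameter (log-odds)
    Z = 1.0 / (1.0 - theta)                # partition function
    return math.exp(eta * x) / Z

# Largest discrepancy between the two forms over a few parameter values.
max_diff = max(
    abs(bernoulli_pmf(x, t) - bernoulli_expfam(x, t))
    for t in (0.1, 0.5, 0.9)
    for x in (0, 1)
)
```

The two forms agree up to floating-point error, and the partition function equals 1 + exp(eta), the sum of exp(eta * x) over x in {0, 1}.
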
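Bernoulli Distribution (cont'd)
• sufficient statistics: $\phi(x) = (x, 1 - x)$
• canonical parameters: $\eta(\theta) = (\log \theta, \log(1 - \theta))$
• base density: $h(x) = 1$ w.r.t. counting measure
• partition function: $Z(\theta) = 1$
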
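Poisson Distribution
Domain: $\mathcal{X} = \{0, 1, 2, \ldots\}$
Parameter: $\lambda > 0$
Density: $p(x \mid \lambda) = \frac{\lambda^x e^{-\lambda}}{x!}$
Poisson distributions characterize the number of independent events occurring at a certain rate $\lambda$ within a unit time.
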
Poisson Distribution (cont'd)
• sufficient statistics: $\phi(x) = x$
• canonical parameter: $\eta(\lambda) = \log \lambda$
• base density: $h(x) = 1 / x!$ w.r.t. counting measure
• partition function: $Z(\lambda) = e^{\lambda}$

Exponential Distribution
Domain: $\mathcal{X} = [0, \infty)$
Parameter: $\lambda > 0$
Density: $p(x \mid \lambda) = \lambda e^{-\lambda x}$
Exponential distributions characterize the time interval between independent events occurring at a certain rate $\lambda$.

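Exponential Distribution (cont'd)
• sufficient statistics: $\phi(x) = x$ or $\phi(x) = -x$
• canonical parameter: $\eta(\lambda) = -\lambda$ or $\eta(\lambda) = \lambda$
• base density: $h(x) = 1$ w.r.t. Lebesgue measure
• partition function: $Z(\lambda) = \int_0^\infty e^{-\lambda x} \, dx = \frac{1}{\lambda}$, which is finite only when $\lambda > 0$.
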
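Normal Distribution
Domain: $\mathcal{X} = \mathbb{R}$
Parameter: $\theta = (\mu, \sigma^2)$
Density: $p(x \mid \mu, \sigma^2) = \frac{1}{\sqrt{2 \pi \sigma^2}} \exp\left( -\frac{(x - \mu)^2}{2 \sigma^2} \right)$
Normal distributions are probably the most widely used distributions in probabilistic analysis.
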
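Normal Distribution (cont'd)
• sufficient statistics: $\phi(x) = (x, x^2)$
• canonical parameter: $\eta(\mu, \sigma^2) = \left( \frac{\mu}{\sigma^2}, -\frac{1}{2 \sigma^2} \right)$
• base density: $h(x) = 1$ w.r.t. Lebesgue measure
• partition function: $Z(\mu, \sigma^2) = \sqrt{2 \pi \sigma^2} \, \exp\left( \frac{\mu^2}{2 \sigma^2} \right)$
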
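Normal Distr. in Canonical Form
The normal distribution can be alternatively parameterized in the canonical form:
$$p(x \mid h, J) \propto \exp\left( h x - \tfrac{1}{2} J x^2 \right)$$
• potential coefficient: $h = \mu / \sigma^2$
• precision coefficient: $J = 1 / \sigma^2$
with $J > 0$.
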
Regular Family
In the sequel, we focus on exponential families in the canonical form. The set of valid canonical parameters is:
$$\Omega = \{ \theta \in \mathbb{R}^d : A(\theta) < \infty \}$$
The exponential family $\mathcal{P}$ is called a regular family if $\Omega$ is an open subset of $\mathbb{R}^d$. We restrict our attention to regular families.

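Identifiability
Let $\mathcal{P} = \{ P_\theta : \theta \in \Theta \}$ be a parameterized family:
• $\mathcal{P}$ is called identifiable when each distribution in $\mathcal{P}$ corresponds to a unique parameter in $\Theta$:
$$P_{\theta_1} = P_{\theta_2} \Rightarrow \theta_1 = \theta_2$$
• Identifiability means that the parameter of a distribution can be learned from observed samples without the need of additional constraints.
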
Minimal and Overcomplete
Consider an exponential family with sufficient stats $\phi$:
• If there exists $a \neq 0$ such that $\langle a, \phi(x) \rangle = c$ (a constant) holds almost everywhere, this is called an overcomplete representation; otherwise, it is called a minimal representation.
• An exponential family is identifiable if and only if the representation is minimal. Why?

Minimal and Overcomplete (cont'd)
• Consider an exponential family with sufficient stats $\phi$ such that $\langle a, \phi(x) \rangle$ is constant, then for each $\theta \in \Omega$, $\theta + t a$ for each $t \in \mathbb{R}$ is also in $\Omega$ and it yields the same distribution.
• We will answer why a minimal representation is identifiable later.
• An overcomplete representation is useful as it may lead to a more natural parameterization. Also, with additional constraints, it can be made identifiable.

Bernoulli Revisited
Consider two representations:
• [R1] $\phi(x) = x$
• [R2] $\phi(x) = (x, 1 - x)$
For each representation:
• Is it minimal or overcomplete?
• If it is overcomplete, find $a \neq 0$ such that $\langle a, \phi(x) \rangle$ is constant.
• Is it identifiable or unidentifiable?

Mean Parameters
• The expectations of the sufficient statistics, as below, are called mean parameters:
$$\mu = E_\theta[\phi(X)] = \int \phi(x) \, p(x \mid \theta) \, \nu(dx)$$
• Under certain conditions, the distribution in an exponential family is uniquely determined by the mean parameters, which thus provide an alternative parameterization.

Realizable Mean Parameters
• Given sufficient stats $\phi$, we say a distribution $p$ realizes a mean parameter $\mu$ if $E_p[\phi(X)] = \mu$.
• The set of (realizable) mean parameters for given sufficient stats $\phi$ is:
$$\mathcal{M} = \{ \mu \in \mathbb{R}^d : \exists p, \ E_p[\phi(X)] = \mu \}$$
• Here, $p$ is not restricted to the exponential family.
• $\mathcal{M}$ is a convex set. Why?

Convex Hulls
• Given a set $S \subset \mathbb{R}^d$, the convex hull of $S$, denoted by $\mathrm{conv}(S)$, is the set of all convex combinations of elements in $S$.
• $\mathrm{conv}(S)$ is the minimum convex set containing $S$.
• A convex hull of some finite set is called a convex polytope.
• Convex polytopes are compact.

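Probability Simplex
Given a finite space $\mathcal{X}$, the probability simplex over $\mathcal{X}$:
$$\Delta(\mathcal{X}) = \left\{ p : \mathcal{X} \to [0, 1] \ : \ \sum_{x \in \mathcal{X}} p(x) = 1 \right\}$$
When $\mathcal{X} = \{1, \ldots, n\}$, $\Delta(\mathcal{X})$ reduces to:
$$\Delta^{n-1} = \left\{ p \in \mathbb{R}^n : p_i \geq 0, \ \sum_{i=1}^n p_i = 1 \right\}$$
and $\Delta^{n-1}$ is an $(n-1)$-dimensional polytope.
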
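Polytope of Mean Parameters
When the sample space $\mathcal{X}$ is finite, given any $\phi : \mathcal{X} \to \mathbb{R}^d$, the set $\mathcal{M}$ is a convex polytope:
$$\mathcal{M} = \mathrm{conv}\{ \phi(x) : x \in \mathcal{X} \}$$
Particularly, each $\mu \in \mathcal{M}$ can be written as
$$\mu = \sum_{x \in \mathcal{X}} p(x) \, \phi(x) \quad \text{with} \quad p \in \Delta(\mathcal{X})$$
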
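Log-partition Function
The log-partition function given by
$$A(\theta) = \log \int h(x) \exp\left( \langle \theta, \phi(x) \rangle \right) \nu(dx)$$
has the following properties:
• $\nabla A(\theta) = E_\theta[\phi(X)]$
• $\nabla^2 A(\theta) = \mathrm{Cov}_\theta(\phi(X))$
• $A$ is a convex function and thus $\Omega$ is a convex set.
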
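The derivative properties of the log-partition function can be checked numerically. The sketch below uses the Bernoulli family in canonical form, where A(theta) = log(1 + exp(theta)), and compares central finite differences of A with the mean and variance of the sufficient statistic; the step sizes and helper names are illustrative choices.

```python
import math

def A(theta):
    """Log-partition function of the Bernoulli family in canonical form."""
    return math.log1p(math.exp(theta))

def mean_and_variance(theta):
    """Mean and variance of phi(X) = X under the canonical parameter theta."""
    p1 = math.exp(theta - A(theta))  # P(X = 1)
    return p1, p1 * (1.0 - p1)

def first_derivative(f, t, eps=1e-5):
    return (f(t + eps) - f(t - eps)) / (2.0 * eps)

def second_derivative(f, t, eps=1e-4):
    return (f(t + eps) - 2.0 * f(t) + f(t - eps)) / eps ** 2

theta = 0.7
mean, variance = mean_and_variance(theta)
grad_gap = abs(first_derivative(A, theta) - mean)      # A'(theta) vs. E[phi(X)]
hess_gap = abs(second_derivative(A, theta) - variance)  # A''(theta) vs. Var(phi(X))
```

Both gaps are small, and since a variance is non-negative, the second derivative is non-negative, which is the one-dimensional case of convexity.
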
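Log-partition Function (cont'd)
• For an overcomplete representation with $\langle a, \phi(x) \rangle = c$, $A$ has $a^T \nabla^2 A(\theta) \, a = 0$, because:
$$a^T \nabla^2 A(\theta) \, a = \mathrm{Var}_\theta\left( \langle a, \phi(X) \rangle \right) = 0$$
• Conversely, for a minimal representation, we have $\mathrm{Var}_\theta\left( \langle a, \phi(X) \rangle \right) > 0$ for every non-zero vector $a$, hence $\nabla^2 A(\theta)$ is positive definite for every $\theta \in \Omega$ and thus $A$ is strictly convex.
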
Gradient Map
The gradient map $\nabla A : \Omega \to \mathcal{M}$ is a mapping from the canonical parameters $\theta$ to the mean parameters $\mu$.
• When is $\nabla A$ injective (i.e. one-to-one)?
• When is $\nabla A$ surjective (onto $\mathcal{M}$)?

Gradient Map (cont'd)
• The gradient map is injective if and only if the exponential representation is minimal.
• An exponential family with minimal representation is identifiable.
• How to prove? $A$ is strictly convex $\Rightarrow$ $\langle \nabla A(\theta_1) - \nabla A(\theta_2), \theta_1 - \theta_2 \rangle > 0$ for $\theta_1 \neq \theta_2$.
• With an overcomplete representation, there is a one-to-one correspondence between mean parameters and affine subsets of $\Omega$.

Gradient Map (cont'd)
• With minimal representation, $\nabla A$ is onto $\mathcal{M}^\circ$, the interior of $\mathcal{M}$.
• Each mean parameter $\mu \in \mathcal{M}^\circ$ is uniquely realized by a canonical parameter $\theta \in \Omega$.
• Given $\mu \in \mathcal{M}^\circ$, there can be many distributions that realize $\mu$, among which there is one that maximizes the entropy, which is in the exponential family associated with $\phi$ (we will see this).

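Entropy
Given an exponential family distribution $p_\theta$
The entropy of $p_\theta$ is defined to be:
$$H(p_\theta) = -E_\theta[\log p_\theta(X)] = -\int p_\theta(x) \log p_\theta(x) \, \nu(dx)$$
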
Maximum Entropy (Problem)
Consider a finite space $\mathcal{X}$, we want to
$$\max_{p \in \Delta(\mathcal{X})} H(p) \quad \text{s.t.} \quad E_p[\phi(X)] = \mu$$
What's the solution?

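Maximum Entropy (Solution)
Using the method of Lagrange multipliers, we get the optimum $p^*$:
$$p^*(x) \propto \exp\left( \langle \theta, \phi(x) \rangle \right)$$
where $\theta$ collects the Lagrange multipliers of the expectation constraints.
The solution to a maximum entropy problem with expectation constraints is always an exponential family distribution. This can be generalized to continuous space, using calculus of variations.
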
Kullback-Leibler Divergence
The Kullback-Leibler divergence (or KL divergence) between two probability densities $p$ and $q$ (w.r.t. the same base measure) is defined to be
$$\mathrm{KL}(p \| q) = \int p(x) \log \frac{p(x)}{q(x)} \, \nu(dx)$$
KL divergence is not symmetric.

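For discrete distributions the integral becomes a sum, and the lack of symmetry is easy to see numerically. The two distributions below are made-up examples; the helper `kl` assumes both are given as lists of probabilities over the same finite space, with q positive wherever p is.

```python
import math

def kl(p, q):
    """KL(p || q) for discrete distributions given as aligned probability lists."""
    # Terms with p_i = 0 contribute zero by convention.
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

p = [0.5, 0.4, 0.1]
q = [0.2, 0.3, 0.5]

kl_pq = kl(p, q)
kl_qp = kl(q, p)
```

Both values are positive and they differ, while `kl(p, p)` is exactly zero.
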
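KL Divergence (cont'd)
• (Gibbs' inequality) $\mathrm{KL}(p \| q) \geq 0$, where the equality holds if and only if $p = q$ almost everywhere (w.r.t. the base measure).
• Given two distributions $p_{\theta_1}, p_{\theta_2}$ in the same exponential family with sufficient stats $\phi$:
$$\mathrm{KL}(p_{\theta_1} \| p_{\theta_2}) = \langle \theta_1 - \theta_2, \mu_1 \rangle - A(\theta_1) + A(\theta_2) \quad \text{with} \quad \mu_1 = E_{\theta_1}[\phi(X)]$$
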
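Projections of Distributions
Let $\mathcal{P}$ be an exponential family and $q$ be a distribution, both over the same space $\mathcal{X}$:
• The I-projection (information projection) of $q$ onto $\mathcal{P}$: $\operatorname{argmin}_{p \in \mathcal{P}} \mathrm{KL}(p \| q)$
• The M-projection (moment projection) of $q$ onto $\mathcal{P}$: $\operatorname{argmin}_{p \in \mathcal{P}} \mathrm{KL}(q \| p)$
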
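Maximum Likelihood Estimation
Given a parameterized family $\mathcal{P} = \{ p_\theta : \theta \in \Theta \}$ over $\mathcal{X}$, and $D = \{x_1, \ldots, x_n\}$, the log-likelihood of $\theta$:
$$\ell(\theta; D) = \sum_{i=1}^n \log p_\theta(x_i)$$
$\hat{\theta}$ is called a maximum likelihood estimate given $D$ if
$$\hat{\theta} \in \operatorname{argmax}_{\theta \in \Theta} \ell(\theta; D)$$
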
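MLE (cont'd)
• Given $D = \{x_1, \ldots, x_n\}$, $\tilde{p}_D = \frac{1}{n} \sum_{i=1}^n \delta_{x_i}$ is called an empirical probability measure, which has
$$E_{\tilde{p}_D}[f(X)] = \frac{1}{n} \sum_{i=1}^n f(x_i)$$
• The log-likelihood of $\theta$ given $D$ can be rewritten as
$$\ell(\theta; D) = n \, E_{\tilde{p}_D}[\log p_\theta(X)]$$
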
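MLE (cont'd)
• We have $\mathrm{KL}(\tilde{p}_D \| p_\theta) = -H(\tilde{p}_D) - \frac{1}{n} \ell(\theta; D)$.
• Maximizing $\ell(\theta; D)$ is equivalent to minimizing $\mathrm{KL}(\tilde{p}_D \| p_\theta)$.
• Maximum likelihood estimation is equivalent to M-projection of the empirical distribution $\tilde{p}_D$ to a given family.
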
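M-projections
Given an exponential family $\mathcal{P}$ and an arbitrary distribution $q$ over $\mathcal{X}$, then
$$\mathrm{KL}(q \| p_\theta) = -H(q) - E_q[\log h(X)] - \langle \theta, E_q[\phi(X)] \rangle + A(\theta)$$
Thus, the optimal $\theta$ is
$$\hat{\theta} = \operatorname{argmax}_{\theta \in \Omega} \ \langle \theta, \mu_q \rangle - A(\theta) \quad \text{with} \quad \mu_q = E_q[\phi(X)]$$
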
M-projections (cont'd)
• This is a convex problem. $\theta$ is optimal iff:
$$\nabla A(\theta) = E_\theta[\phi(X)] = \mu_q$$
• M-projection to an exponential family $\mathcal{P}$ is to find a distribution in $\mathcal{P}$ whose mean parameter matches the input mean $\mu_q$. ($\mu_q$ is always realizable, why?)
• With minimal representation the optimum is unique; otherwise, the set of optimal solutions is an affine subset of $\Omega$, all of which yield the same distribution.

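Moment matching can be illustrated by projecting a small discrete distribution onto the Poisson family. The distribution q below is a made-up example; the sketch assumes the canonical parameterization theta = log(rate), for which the optimality condition reduces to exp(theta) equal to the mean of q.

```python
import math

# Hypothetical distribution q over {0, 1, 2, 3}.
q = {0: 0.1, 1: 0.4, 2: 0.3, 3: 0.2}
mu_q = sum(x * w for x, w in q.items())  # mean of the sufficient statistic phi(x) = x

def kl_to_poisson(theta):
    """KL(q || p_theta) where p_theta is Poisson with canonical parameter theta."""
    total = 0.0
    for x, w in q.items():
        # log p_theta(x) = theta * x - exp(theta) - log(x!)
        log_p = theta * x - math.exp(theta) - math.lgamma(x + 1)
        total += w * (math.log(w) - log_p)
    return total

# Moment matching: choose theta so that the Poisson mean exp(theta) equals mu_q.
theta_hat = math.log(mu_q)
best = kl_to_poisson(theta_hat)
```

Evaluating `kl_to_poisson` slightly to either side of `theta_hat` gives a larger divergence, consistent with the convexity of the objective.
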
What about I-projections?
• We will see their utility when we talk about mean field methods and variational inference.

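Convex Conjugate
Let $f$ be a real-valued function on $\mathbb{R}^d$:
• The convex conjugate of $f$ is defined to be
$$f^*(y) = \sup_x \left\{ \langle x, y \rangle - f(x) \right\}$$
• $f^*$ is always convex, whether or not $f$ is.
• $f^{**} = (f^*)^*$ is convex.
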
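Convex Conjugate
• (Fenchel's inequality) $f(x) + f^*(y) \geq \langle x, y \rangle$
• (Fenchel-Moreau theorem) $f = f^{**}$ iff $f$ is convex and lower semi-continuous:
$$f(x) = \sup_y \left\{ \langle x, y \rangle - f^*(y) \right\}$$
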
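Conjugate Duality
• $(\theta, \mu)$ is called dually coupled if $\mu = \nabla A(\theta) = E_\theta[\phi(X)]$.
• The convex conjugate to a log-partition function $A$:
$$A^*(\mu) = \sup_{\theta \in \Omega} \left\{ \langle \theta, \mu \rangle - A(\theta) \right\}$$
• Supremum attained at $\theta$ iff $(\theta, \mu)$ is dually coupled.
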
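Conjugate Duality (cont'd)
• $A^*$ has:
$$A^*(\mu) = \begin{cases} -H(p_{\theta(\mu)}) & \mu \in \mathcal{M}^\circ \\ +\infty & \mu \notin \overline{\mathcal{M}} \end{cases}$$
• $A^*$ on $\partial \mathcal{M}$ is determined via Cauchy sequences.
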
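Conjugate Duality (cont'd)
• With $\theta \in \Omega$, the log-partition function $A$ has:
$$A(\theta) = \sup_{\mu \in \mathcal{M}} \left\{ \langle \theta, \mu \rangle - A^*(\mu) \right\}$$
• Supremum attained at $\mu$ iff $(\theta, \mu)$ is dually coupled, which has $\mu = E_\theta[\phi(X)]$.
• With a minimal representation, $\nabla A$ maps $\Omega$ one-to-one onto $\mathcal{M}^\circ$, while $\nabla A^*$ is the inverse map.
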
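Prior and Posterior
• In Bayesian analysis, we usually place a prior with density $p(\theta \mid \alpha)$ over the parameter space $\Theta$.
• A parameter $\theta$ is linked to observations $x$ via a likelihood model: $p(x \mid \theta)$.
• The posterior measure given $x$ is
$$p(\theta \mid x) = \frac{p(\theta \mid \alpha) \, p(x \mid \theta)}{\int p(\theta' \mid \alpha) \, p(x \mid \theta') \, d\theta'}$$
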
Prior and Posterior (cont'd)
• Computing the posterior distribution is in general very difficult.
• However, under certain conditions (e.g. when the prior is conjugate to the likelihood model), the computation becomes particularly easy.

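Conjugate Prior
• A prior with density $p(\theta \mid \alpha)$ is called a conjugate prior to the likelihood model $p(x \mid \theta)$, if the posterior distribution given $x$ is in the same parameterized family, i.e. in the form $p(\theta \mid \alpha \oplus x)$.
• $\oplus$ is left-associative and satisfies $\alpha \oplus x \oplus y = \alpha \oplus y \oplus x$.
• When $D = \{x_1, \ldots, x_n\}$, $p(\theta \mid D) = p(\theta \mid \alpha \oplus x_1 \oplus \cdots \oplus x_n)$. The result is independent of the order of samples.
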
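CP for Exponential Families
Generally, conjugate pairs in exponential families are in the following form:
• Prior: $p(\theta \mid \alpha, \beta) \propto \exp\left( \langle \alpha, \eta(\theta) \rangle - \beta A(\theta) \right)$
• Likelihood: $p(x \mid \theta) = h(x) \exp\left( \langle \eta(\theta), \phi(x) \rangle - A(\theta) \right)$
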
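CP for Exponential Families (cont'd)
Hence, the posterior update:
• with a single observation: $(\alpha, \beta) \oplus x = (\alpha + \phi(x), \ \beta + 1)$
• with multiple observations: $(\alpha, \beta) \oplus x_1 \oplus \cdots \oplus x_n = \left( \alpha + \sum_{i=1}^n \phi(x_i), \ \beta + n \right)$
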
CP for Exponential Families (cont'd)
• The family of conjugate priors is largely determined by the likelihood model, particularly by the form of $\eta(\theta)$ and $A(\theta)$.
• A family of prior distributions can serve as the conjugate priors to different likelihood models.

Example: Beta-Bernoulli
• Prior: Beta distribution
• Likelihood: Bernoulli distribution
• Posterior: remains a Beta distribution

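The Beta-Bernoulli update amounts to adding counts to the two Beta parameters. The sketch below assumes the usual Beta(a, b) parameterization with density proportional to theta^(a-1) (1-theta)^(b-1); the function name and the data are illustrative.

```python
def beta_bernoulli_update(a, b, data):
    """Posterior Beta parameters after observing a list of 0/1 outcomes."""
    ones = sum(data)
    zeros = len(data) - ones
    return a + ones, b + zeros

data = [1, 0, 1, 1]

# Batch update from a uniform Beta(1, 1) prior.
batch = beta_bernoulli_update(1.0, 1.0, data)

# Sequential updates, one observation at a time, in reverse order.
a, b = 1.0, 1.0
for x in reversed(data):
    a, b = beta_bernoulli_update(a, b, [x])
sequential = (a, b)

posterior_mean = batch[0] / (batch[0] + batch[1])
```

The batch and sequential results coincide, which reflects the order independence of conjugate updates.
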
Example: Normal-Normal
• Prior: Normal distribution
• Likelihood: Normal distribution (fixed variance)
• Posterior: remains a Normal distribution

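For the Normal-Normal pair the update is simplest in terms of precisions. The sketch below assumes a prior N(mu0, var0) on the mean and a likelihood N(mean, var) with known variance; the names `mu0`, `var0`, and `var` are illustrative.

```python
def normal_normal_update(mu0, var0, var, data):
    """Posterior mean and variance of the unknown mean of a Normal likelihood."""
    n = len(data)
    # Precisions add: prior precision plus one likelihood precision per sample.
    post_precision = 1.0 / var0 + n / var
    # The posterior mean is a precision-weighted combination of prior mean and data.
    post_mean = (mu0 / var0 + sum(data) / var) / post_precision
    return post_mean, 1.0 / post_precision

post_mean, post_var = normal_normal_update(mu0=0.0, var0=1.0, var=1.0, data=[1.0, 3.0])
```

With two unit-variance observations and a unit-variance prior, the posterior variance shrinks to one third and the posterior mean is pulled from the sample mean toward the prior mean.
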
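Dirichlet Distribution
• Dirichlet distribution is a distribution over $\Delta^{K-1}$.
• It is often used as a conjugate prior to the Categorical distribution or the Multinomial distribution.
• With $\alpha = (\alpha_1, \ldots, \alpha_K)$ as the parameter, its density:
$$p(\theta \mid \alpha) = \frac{\Gamma\left( \sum_{k=1}^K \alpha_k \right)}{\prod_{k=1}^K \Gamma(\alpha_k)} \prod_{k=1}^K \theta_k^{\alpha_k - 1}$$
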
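Dirichlet Distribution (cont'd)
• Mean: $E[\theta_k] = \alpha_k / \alpha_0$ with $\alpha_0 = \sum_{k=1}^K \alpha_k$.
• Covariance: $\mathrm{Cov}(\theta_i, \theta_j) = \frac{\alpha_i (\alpha_0 \delta_{ij} - \alpha_j)}{\alpha_0^2 (\alpha_0 + 1)}$
• Mode: $\theta_k = \frac{\alpha_k - 1}{\alpha_0 - K}$ (when all $\alpha_k > 1$)
• Marginal: $\theta_k \sim \mathrm{Beta}(\alpha_k, \alpha_0 - \alpha_k)$
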
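Dirichlet-Categorical
• Prior: $\theta \sim \mathrm{Dir}(\alpha)$
• Likelihood: $x_i \sim \mathrm{Cat}(\theta)$ for $i = 1, \ldots, n$
• Posterior: remains a Dirichlet distribution: $\theta \mid x_1, \ldots, x_n \sim \mathrm{Dir}(\alpha + c)$ with $c_k = |\{ i : x_i = k \}|$
• When $\alpha = (1, \ldots, 1)$, $\mathrm{Dir}(\alpha)$ reduces to a uniform distribution over $\Delta^{K-1}$.
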
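The Dirichlet-Categorical update adds the per-category counts to the concentration parameters. A minimal sketch, assuming categories are encoded as integers 0, ..., K-1 and using illustrative names:

```python
def dirichlet_categorical_update(alpha, data):
    """Posterior Dirichlet parameters after observing category indices in data."""
    posterior = list(alpha)
    for k in data:
        posterior[k] += 1.0  # each observation increments its category's parameter
    return posterior

alpha = [1.0, 1.0, 1.0]  # uniform prior over the simplex
data = [0, 2, 2, 1, 2]

posterior = dirichlet_categorical_update(alpha, data)
total = sum(posterior)
posterior_mean = [a / total for a in posterior]
```

The posterior mean is a smoothed version of the empirical frequencies, with the prior acting as one pseudo-observation per category.
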
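Dirichlet Distribution (cont'd)
• Dirichlet distributions are an exponential family:
  • Canonical parameter: $\alpha$
  • Sufficient stats: $\phi(\theta) = (\log \theta_1, \ldots, \log \theta_K)$
  • Log-partition: $A(\alpha) = \sum_{k=1}^K \log \Gamma(\alpha_k) - \log \Gamma\left( \sum_{k=1}^K \alpha_k \right)$
• Hence, $E[\log \theta_k] = \frac{\partial A}{\partial \alpha_k} = \psi(\alpha_k) - \psi(\alpha_0)$, where $\psi$ here denotes the digamma function.
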
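Predictive Distribution
Given $D = \{x_1, \ldots, x_n\}$, the distribution of a new sample $x$?
$$p(x \mid D) = \int p(x \mid \theta) \, p(\theta \mid D) \, d\theta$$
With exponential family and conjugacy, we have
$$p(x \mid D) = h(x) \exp\left( B(\alpha' + \phi(x), \beta' + 1) - B(\alpha', \beta') \right)$$
where $(\alpha', \beta')$ are the posterior hyperparameters and $B(\alpha, \beta) = \log \int \exp\left( \langle \alpha, \eta(\theta) \rangle - \beta A(\theta) \right) d\theta$ is the log-normalizer of the prior.
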
Important Conjugate Pairs
• Beta: the probability parameter of Bernoulli, Binomial, Geometric, or Negative Binomial
• Normal: the mean parameter of Normal
• Inverse Gamma: the variance parameter of Normal
• Gamma: the rate parameter of Exponential or Poisson, or the precision parameter of Normal

Important Conjugate Pairs (cont'd)
• Dirichlet: the probability vector of Categorical or Multinomial
• Multivariate Normal: the mean vector of Multivariate Normal
• Inverse Wishart: the covariance matrix of Multivariate Normal
• Wishart: the precision matrix of Multivariate Normal

Examples of Graphical Models

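GMM
[Figure: plate diagram with nodes $\pi$, $z_i$, $x_i$, $\mu_k$, $\sigma^2$ and plates $N$ and $M$]
A Gaussian Mixture Model (GMM) with fixed variance:
$$z_i \sim \mathrm{Cat}(\pi), \qquad x_i \mid z_i = k \sim \mathcal{N}(\mu_k, \sigma^2)$$
• This model is not complete.
• How are $\pi$ and $\mu_k$ generated?
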
GMM (with Prior)
[Figure: plate diagram with nodes $\alpha$, $\pi$, $z_i$, $x_i$, $\mu_k$, $\sigma^2$, $\mu_0$, $\sigma_0^2$ and plates $N$ and $M$]
With priors placed over model parameters, we get a Hierarchical Bayesian Model:
• Hyperparameters have no parents (top level)
• Observations have no children (bottom level)
• Each unknown variable is generated according to its parents

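Generating each variable from its parents, top level first, gives a direct way to sample from the hierarchical model. The sketch below is one plausible reading of the plate diagram, assuming a Dirichlet prior on the mixing weights and a Normal prior on the component means; the function names, argument names, and numeric values are illustrative, not the lecture's.

```python
import random

def sample_dirichlet(alpha, rng):
    """Draw from a Dirichlet distribution by normalizing independent Gamma draws."""
    draws = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(draws)
    return [d / total for d in draws]

def sample_gmm(n, alpha, mu0, sigma0, sigma, rng):
    """Ancestral sampling: hyperparameters -> parameters -> labels -> observations."""
    pi = sample_dirichlet(alpha, rng)                          # mixing weights
    mus = [rng.gauss(mu0, sigma0) for _ in alpha]              # component means
    zs = rng.choices(range(len(alpha)), weights=pi, k=n)       # component labels
    xs = [rng.gauss(mus[z], sigma) for z in zs]                # observations
    return pi, mus, zs, xs

rng = random.Random(0)
pi, mus, zs, xs = sample_gmm(n=5, alpha=[1.0, 1.0, 1.0],
                             mu0=0.0, sigma0=5.0, sigma=1.0, rng=rng)
```

The sampling order is a topological ordering of the underlying DAG: no variable is drawn before its parents.
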
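GMM (Joint Model)
[Figure: plate diagram with nodes $\alpha$, $\pi$, $z_i$, $x_i$, $\mu_k$, $\sigma^2$, $\mu_0$, $\sigma_0^2$ and plates $N$ and $M$]
This is an exponential family.
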
Topic Models

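PLSI
[Figure: plate diagram with nodes $\theta_d$, $z_{di}$, $w_{di}$, $\beta_k$ and plates $M$, $N$, $n_d$]
Probabilistic Latent Semantic Indexing (PLSI):
• Each topic is associated with $\beta_k$, a distribution over the vocabulary.
• Each document $d$ comes with a vector of topic proportions $\theta_d$.
• To generate each word $w_{di}$: $z_{di} \sim \mathrm{Cat}(\theta_d)$, $w_{di} \sim \mathrm{Cat}(\beta_{z_{di}})$
• This is not a complete model.
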
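LDA
[Figure: plate diagram with nodes $\alpha$, $\theta_d$, $z_{di}$, $w_{di}$, $\beta_k$ and plates $M$, $N$, $n_d$]
Latent Dirichlet Allocation (LDA) completes PLSI by placing Dirichlet priors over latent variables:
• For each document, the topic proportions are generated as $\theta_d \sim \mathrm{Dir}(\alpha)$
• For each topic, the word distribution is generated as $\beta_k \sim \mathrm{Dir}(\gamma)$
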
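The generative process of LDA can be sketched as ancestral sampling. This example assumes symmetric Dirichlet priors and a fixed document length for simplicity; the names `alpha`, `gamma`, `sample_lda`, and all sizes are illustrative assumptions rather than the lecture's notation.

```python
import random

def sample_dirichlet(alpha, rng):
    """Draw from a Dirichlet distribution by normalizing independent Gamma draws."""
    draws = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(draws)
    return [d / total for d in draws]

def sample_lda(num_docs, doc_len, num_topics, vocab_size, alpha, gamma, rng):
    """Generate word-index documents from an LDA model with symmetric priors."""
    # One word distribution per topic.
    topics = [sample_dirichlet([gamma] * vocab_size, rng) for _ in range(num_topics)]
    docs = []
    for _ in range(num_docs):
        # Topic proportions for this document.
        theta = sample_dirichlet([alpha] * num_topics, rng)
        words = []
        for _ in range(doc_len):
            z = rng.choices(range(num_topics), weights=theta)[0]      # topic of the word
            w = rng.choices(range(vocab_size), weights=topics[z])[0]  # the word itself
            words.append(w)
        docs.append(words)
    return topics, docs

rng = random.Random(0)
topics, docs = sample_lda(num_docs=3, doc_len=6, num_topics=2, vocab_size=5,
                          alpha=1.0, gamma=0.5, rng=rng)
```

Dropping the two Dirichlet draws and treating the topic and document distributions as fixed inputs gives the PLSI generative step.
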
LDA (Joint Model)
[Figure: plate diagram with nodes $\alpha$, $\theta_d$, $z_{di}$, $w_{di}$, $\beta_k$ and plates $M$, $N$, $n_d$]
Again, an exponential family.

Summary
• The Basics of Graphical Models
  • Bayesian Networks and Markov Random Fields.
  • How the joint distribution factorizes according to the graph.
  • Relations between graphical structure and conditional independencies.
  • Factor graphs.

Summary (cont'd)
• The Basics of Exponential Families
  • The form of exponential families
  • Minimal and overcomplete representation, identifiability
  • Convexity of log-partition function, gradient map
  • KL divergence, projections of distributions
  • Conjugate duality between log-partition function and negative entropy

Summary (cont'd)
• Conjugate Prior
  • Posterior distributions in Bayesian analysis
  • Conjugate prior, especially of exponential families
  • Important conjugate pairs

Summary (cont'd)
• Practice
  • How to formulate a graphical model based on intuition
  • Graphical representation of a model, factor graph
  • Analysis of the joint distribution


MLPI Lecture 4: Graphical Model and Exponential Family