2.
• Factor graph
  • undirected tree, directed tree, polytree (F8.43)
• Goal:
  • Obtain an efficient, exact inference algorithm for finding marginals
  • Share computations efficiently when several marginals are required
3.
• Calculate the marginal for a particular variable node $x$ (F 8.61)
• Write the joint distribution as a product of factors (F 8.62)

Later we shall see how to modify the algorithm to incorporate evidence corresponding to observed variables. By definition, the marginal is obtained by summing the joint distribution over all variables except $x$, so that

$$p(x) = \sum_{\mathbf{x} \setminus x} p(\mathbf{x}) \tag{8.61}$$

where $\mathbf{x} \setminus x$ denotes the set of variables in $\mathbf{x}$ with variable $x$ omitted. The idea is to substitute for $p(\mathbf{x})$ using the factor graph expression (8.59) and then interchange summations and products in order to obtain an efficient algorithm. Consider the fragment of graph shown in Figure 8.46, in which we see that the tree structure of the graph allows us to partition the factors in the joint distribution into groups, with one group associated with each of the factor nodes that is a neighbour of the variable node $x$. We see that the joint distribution can be written as a product of the form

$$p(\mathbf{x}) = \prod_{s \in \mathrm{ne}(x)} F_s(x, X_s) \tag{8.62}$$

where $\mathrm{ne}(x)$ denotes the set of factor nodes that are neighbours of $x$, $X_s$ denotes the set of all variables in the subtree connected to the variable node $x$ via the factor node $f_s$, and $F_s(x, X_s)$ represents the product of all the factors in the group associated with factor $f_s$.

Figure 8.46: A fragment of a factor graph illustrating the evaluation of the marginal $p(x)$.
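As a reference point for (8.61), the marginal can be computed by brute force, summing the joint over every variable except the target. The sketch below assumes a hypothetical representation (each factor is a `(scope, table)` pair) and made-up factor values on a small chain; none of these names come from the slides.

```python
from itertools import product

def brute_force_marginal(factors, cards, target):
    """Unnormalized marginal of `target` via (8.61): sum the joint over all other variables.

    factors: list of (scope, table); scope is a tuple of variable names and
             table maps a tuple of states (one per scope variable) to a value.
    cards:   dict mapping each variable name to its number of states.
    """
    names = list(cards)
    marginal = [0.0] * cards[target]
    for states in product(*(range(cards[n]) for n in names)):
        assignment = dict(zip(names, states))
        joint = 1.0
        for scope, table in factors:
            joint *= table[tuple(assignment[v] for v in scope)]
        marginal[assignment[target]] += joint
    return marginal

# Hypothetical binary chain a - f1 - b - f2 - c with made-up factor values.
cards = {"a": 2, "b": 2, "c": 2}
f1 = {(0, 0): 1.0, (0, 1): 2.0, (1, 0): 3.0, (1, 1): 4.0}  # f1(a, b)
f2 = {(0, 0): 1.0, (0, 1): 1.0, (1, 0): 2.0, (1, 1): 0.0}  # f2(b, c)
factors = [(("a", "b"), f1), (("b", "c"), f2)]
print(brute_force_marginal(factors, cards, "b"))
```

The loop visits every joint configuration, so its cost grows exponentially with the number of variables. This is the cost that the factorization (8.62) is meant to avoid.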
4.
• Substitute (8.62) into (8.61) and interchange the sum and product: F 8.61, F 8.62 -> F 8.63
• Message from factor node $f_s$ to variable node $x$: F 8.64
• $p(x)$ is the product of all incoming messages arriving at node $x$: F 8.63

Substituting (8.62) into (8.61) and interchanging the sums and products, we obtain

$$p(x) = \prod_{s \in \mathrm{ne}(x)} \left[ \sum_{X_s} F_s(x, X_s) \right] = \prod_{s \in \mathrm{ne}(x)} \mu_{f_s \to x}(x). \tag{8.63}$$

Here we have introduced a set of functions $\mu_{f_s \to x}(x)$, defined by

$$\mu_{f_s \to x}(x) \equiv \sum_{X_s} F_s(x, X_s) \tag{8.64}$$

which can be viewed as messages from the factor nodes $f_s$ to the variable node $x$. We see that the required marginal $p(x)$ is given by the product of all the incoming messages arriving at node $x$.

In order to evaluate these messages, we again turn to Figure 8.46 and note that each factor $F_s(x, X_s)$ is described by a factor (sub-)graph and so can itself be factorized. In particular, we can write

$$F_s(x, X_s) = f_s(x, x_1, \ldots, x_M)\, G_1(x_1, X_{s1}) \cdots G_M(x_M, X_{sM}) \tag{8.65}$$

where, for convenience, we have denoted the variables associated with factor $f_s$, in addition to $x$, by $x_1, \ldots, x_M$.
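The interchange of sums and products behind (8.63) can be checked numerically. This is a minimal sketch with two hypothetical groups `F1(x, a)` and `F2(x, b)` whose tables are made up for illustration. It compares the sum over all joint configurations with the product of the two per-group sums.

```python
# Hypothetical tables, indexed [x][other variable]; the values are made up.
F1 = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]  # F1(x, a), a has 3 states
F2 = [[0.5, 1.5], [2.0, 1.0]]            # F2(x, b), b has 2 states

def marginal_sum_of_products(x):
    # Before the interchange: sum over every (a, b) pair of the product of groups.
    return sum(F1[x][a] * F2[x][b] for a in range(3) for b in range(2))

def marginal_product_of_sums(x):
    # After the interchange, as in (8.63): one message per group, then multiply.
    mu1 = sum(F1[x])  # plays the role of mu_{f_1 -> x}(x) in (8.64)
    mu2 = sum(F2[x])  # plays the role of mu_{f_2 -> x}(x) in (8.64)
    return mu1 * mu2

for x in range(2):
    print(x, marginal_sum_of_products(x), marginal_product_of_sums(x))
```

Both forms give the same value for each `x`. The second form needs one sum per group rather than one term per joint configuration, which is where the saving comes from.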
5.
• Evaluate these messages (F 8.67)

The factorization (8.65) is illustrated in Figure 8.47. Note that the set of variables $\{x, x_1, \ldots, x_M\}$ is the set of variables on which the factor $f_s$ depends, and so it can also be denoted $\mathbf{x}_s$, using the notation of (8.59). Substituting (8.65) into (8.64) we obtain

$$\begin{aligned}
\mu_{f_s \to x}(x) &= \sum_{x_1} \cdots \sum_{x_M} f_s(x, x_1, \ldots, x_M) \prod_{m \in \mathrm{ne}(f_s) \setminus x} \left[ \sum_{X_{sm}} G_m(x_m, X_{sm}) \right] \\
&= \sum_{x_1} \cdots \sum_{x_M} f_s(x, x_1, \ldots, x_M) \prod_{m \in \mathrm{ne}(f_s) \setminus x} \mu_{x_m \to f_s}(x_m)
\end{aligned} \tag{8.66}$$

where $\mathrm{ne}(f_s)$ denotes the set of variable nodes that are neighbours of the factor node $f_s$, and $\mathrm{ne}(f_s) \setminus x$ denotes the same set but with node $x$ removed. Here we have defined the following messages from variable nodes to factor nodes

$$\mu_{x_m \to f_s}(x_m) \equiv \sum_{X_{sm}} G_m(x_m, X_{sm}). \tag{8.67}$$

We have therefore introduced two distinct kinds of message, those that go from factor nodes to variable nodes, denoted $\mu_{f \to x}(x)$, and those that go from variable nodes to factor nodes, denoted $\mu_{x \to f}(x)$. In each case, we see that messages passed along a link are always a function of the variable associated with the variable node that link connects to.

Figure 8.47: Illustration of the factorization of the subgraph associated with factor node $f_s$.
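The second line of (8.66) translates fairly directly into code. The function below is a sketch under assumed conventions (a factor is a dict from state tuples to values, a message is a list indexed by state); `factor_to_variable` and its parameters are hypothetical names, not part of the slides.

```python
from itertools import product

def factor_to_variable(table, scope, cards, target, incoming):
    """Message mu_{f_s -> x}(x) following (8.66).

    table:    maps a tuple of states (ordered as in `scope`) to the factor value.
    scope:    tuple of variable names the factor depends on.
    cards:    dict mapping variable name to number of states.
    target:   the variable x that receives the message.
    incoming: dict mapping every other scope variable to its message
              mu_{x_m -> f_s}, given as a list indexed by state.
    """
    message = [0.0] * cards[target]
    for states in product(*(range(cards[v]) for v in scope)):
        assignment = dict(zip(scope, states))
        term = table[states]
        for v in scope:
            if v != target:
                term *= incoming[v][assignment[v]]  # product of incoming messages
        message[assignment[target]] += term         # marginalize over x_1, ..., x_M
    return message

# Hypothetical factor f(x, y) on two binary variables, with a made-up message from y.
f = {(0, 0): 1.0, (0, 1): 2.0, (1, 0): 3.0, (1, 1): 4.0}
msg = factor_to_variable(f, ("x", "y"), {"x": 2, "y": 2}, "x", {"y": [1.0, 2.0]})
print(msg)
```

The cost is exponential only in the number of variables attached to the one factor, not in the size of the whole graph.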
6.
• Evaluate messages from variable nodes to factor nodes using the sub-graph factorization
• F 8.67 + F 8.68 -> F 8.69

The result (8.66) says that to evaluate the message sent by a factor node to a variable node along the link connecting them, take the product of the incoming messages along all other links coming into the factor node, multiply by the factor associated with that node, and then marginalize over all of the variables associated with the incoming messages. This is illustrated in Figure 8.47. It is important to note that a factor node can send a message to a variable node once it has received incoming messages from all other neighbouring variable nodes.

Finally, we derive an expression for evaluating the messages from variable nodes to factor nodes, again by making use of the (sub-)graph factorization. From Figure 8.48, we see that the term $G_m(x_m, X_{sm})$ associated with node $x_m$ is given by a product of terms $F_l(x_m, X_{ml})$, each associated with one of the factor nodes $f_l$ that is linked to node $x_m$ (excluding node $f_s$), so that

$$G_m(x_m, X_{sm}) = \prod_{l \in \mathrm{ne}(x_m) \setminus f_s} F_l(x_m, X_{ml}) \tag{8.68}$$

where the product is taken over all neighbours of node $x_m$ except for node $f_s$. Note that each of the factors $F_l(x_m, X_{ml})$ represents a subtree of the original graph of precisely the same kind as introduced in (8.62). Substituting (8.68) into (8.67), we then obtain

$$\mu_{x_m \to f_s}(x_m) = \prod_{l \in \mathrm{ne}(x_m) \setminus f_s} \left[ \sum_{X_{ml}} F_l(x_m, X_{ml}) \right] = \prod_{l \in \mathrm{ne}(x_m) \setminus f_s} \mu_{f_l \to x_m}(x_m) \tag{8.69}$$

where we have used the definition (8.64) of the messages passed from factor nodes to variable nodes.

Figure 8.48: Illustration of the evaluation of the message sent by a variable node to an adjacent factor node.
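The variable-to-factor rule (8.69) is a pointwise product of the messages arriving from all other factors. The sketch below uses a hypothetical helper `variable_to_factor` and made-up message values, with messages stored as lists indexed by state.

```python
def variable_to_factor(incoming, exclude, n_states):
    """Message mu_{x_m -> f_s}(x_m) following (8.69).

    incoming: dict mapping factor name to its message mu_{f_l -> x_m} (list per state).
    exclude:  name of the recipient factor f_s, whose own message is left out.
    n_states: number of states of x_m.
    """
    message = [1.0] * n_states  # an empty product is 1
    for name, mu in incoming.items():
        if name == exclude:
            continue
        for k in range(n_states):
            message[k] *= mu[k]
    return message

# Hypothetical incoming messages at a binary variable with three neighbouring factors.
incoming = {"fa": [2.0, 3.0], "fb": [0.5, 4.0], "fc": [1.0, 2.0]}
print(variable_to_factor(incoming, "fb", 2))  # uses only fa and fc
print(variable_to_factor({}, "f", 2))         # variable with no other neighbours
```

No summation is needed at a variable node. A variable whose only neighbour is the recipient factor has nothing to multiply, so the function returns the constant message 1.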
7.
• Messages sent by leaf nodes (variable node / factor node)
• Find marginals for every variable node (introduced by John-san08)
• Sum-product algorithm

If the leaf node is a variable node, the message that it sends is

$$\mu_{x \to f}(x) = 1 \tag{8.70}$$

and if the leaf node is a factor node, we see from (8.66) that the message sent should take the form

$$\mu_{f \to x}(x) = f(x). \tag{8.71}$$

Figure 8.49: The sum-product algorithm begins with messages sent by the leaf nodes, which depend on whether the leaf node is (a) a variable node, or (b) a factor node.

Figure 8.50: The sum-product algorithm can be viewed purely in terms of messages sent out by factor nodes to other factor nodes. In this example, the outgoing message shown by the blue arrow is obtained by taking the product of all the incoming messages shown by green arrows, multiplying by the factor $f_s$, and marginalizing over the variables $x_1$ and $x_2$.

... and indeed the notion of one node having a special status was introduced only as a ...
8.
• Normalization (undirected graph)
  • to get the normalization coefficient 1/Z: $p(\mathbf{x}) = \tilde{p}(\mathbf{x})/Z$
  • use sum-product to find the unnormalized marginals $\tilde{p}(x_i)$
  • the coefficient 1/Z can be obtained by normalizing any one of these marginals
  • efficient, as the normalization is calculated over only a single variable
• Unnormalized joint distribution: F 8.73

Figure 8.51: A simple factor graph used to illustrate the sum-product algorithm (variable nodes $x_1$, $x_2$, $x_3$, $x_4$; factor nodes $f_a$, $f_b$, $f_c$).

Figure 8.51 shows a simple factor graph whose unnormalized joint distribution is given by

$$\tilde{p}(\mathbf{x}) = f_a(x_1, x_2)\, f_b(x_2, x_3)\, f_c(x_2, x_4). \tag{8.73}$$
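The normalization step in the bullets above amounts to summing one unnormalized marginal over its single variable. In this sketch the joint table and the helper `normalize` are hypothetical. The marginals are computed directly from a tiny two-variable table, standing in for the output of sum-product, to show that every variable yields the same Z.

```python
def normalize(unnormalized_marginal):
    """Return (Z, p): Z is the sum over the single variable, p the normalized marginal."""
    Z = sum(unnormalized_marginal)
    return Z, [value / Z for value in unnormalized_marginal]

# Hypothetical unnormalized joint over two binary variables, indexed [x1][x2].
joint = [[1.0, 3.0],
         [2.0, 4.0]]
m1 = [sum(row) for row in joint]                      # unnormalized marginal of x1
m2 = [joint[0][j] + joint[1][j] for j in range(2)]    # unnormalized marginal of x2

Z1, p1 = normalize(m1)
Z2, p2 = normalize(m2)
print(Z1, p1)
print(Z2, p2)
```

Either marginal gives the same Z, so Z never has to be computed by summing the full joint distribution.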
9.
In order to apply the sum-product algorithm to this graph, let us designate node $x_3$ as the root, in which case there are two leaf nodes $x_1$ and $x_4$. Starting with the leaf nodes, we then have the following sequence of six messages

\begin{align}
\mu_{x_1 \to f_a}(x_1) &= 1 \tag{8.74} \\
\mu_{f_a \to x_2}(x_2) &= \sum_{x_1} f_a(x_1, x_2) \tag{8.75} \\
\mu_{x_4 \to f_c}(x_4) &= 1 \tag{8.76} \\
\mu_{f_c \to x_2}(x_2) &= \sum_{x_4} f_c(x_2, x_4) \tag{8.77} \\
\mu_{x_2 \to f_b}(x_2) &= \mu_{f_a \to x_2}(x_2)\, \mu_{f_c \to x_2}(x_2) \tag{8.78} \\
\mu_{f_b \to x_3}(x_3) &= \sum_{x_2} f_b(x_2, x_3)\, \mu_{x_2 \to f_b}(x_2). \tag{8.79}
\end{align}

The direction of flow of these messages is illustrated in Figure 8.52. Once this message propagation is complete, we can then propagate messages from the root node out to the leaf nodes, and these are given by

\begin{align}
\mu_{x_3 \to f_b}(x_3) &= 1 \tag{8.80} \\
\mu_{f_b \to x_2}(x_2) &= \sum_{x_3} f_b(x_2, x_3) \tag{8.81} \\
\mu_{x_2 \to f_a}(x_2) &= \mu_{f_b \to x_2}(x_2)\, \mu_{f_c \to x_2}(x_2) \tag{8.82} \\
\mu_{f_a \to x_1}(x_1) &= \sum_{x_2} f_a(x_1, x_2)\, \mu_{x_2 \to f_a}(x_2) \tag{8.83} \\
\mu_{x_2 \to f_c}(x_2) &= \mu_{f_a \to x_2}(x_2)\, \mu_{f_b \to x_2}(x_2) \tag{8.84} \\
\mu_{f_c \to x_4}(x_4) &= \sum_{x_2} f_c(x_2, x_4)\, \mu_{x_2 \to f_c}(x_2). \tag{8.85}
\end{align}

Figure 8.52: Flow of messages for the sum-product algorithm applied to the example graph in Figure 8.51. (a) From the leaf nodes $x_1$ and $x_4$ towards the root node $x_3$. (b) From the root node towards the leaf nodes.
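The twelve messages (8.74) to (8.85) can be traced in code for the graph of Figure 8.51. The factor tables below are made-up values for binary variables, and the variable names and the `brute_force` check are hypothetical. This is a sketch of the schedule in the text, not a general implementation.

```python
from itertools import product

# Hypothetical factor values for binary x1..x4 (made up for illustration).
fa = [[1.0, 2.0], [3.0, 4.0]]  # fa[x1][x2]
fb = [[2.0, 1.0], [1.0, 3.0]]  # fb[x2][x3]
fc = [[1.0, 5.0], [2.0, 2.0]]  # fc[x2][x4]
S = range(2)

# Leaves to root (x3 is the root), following (8.74)-(8.79).
mu_x1_fa = [1.0 for _ in S]                                           # (8.74)
mu_fa_x2 = [sum(fa[x1][x2] * mu_x1_fa[x1] for x1 in S) for x2 in S]   # (8.75)
mu_x4_fc = [1.0 for _ in S]                                           # (8.76)
mu_fc_x2 = [sum(fc[x2][x4] * mu_x4_fc[x4] for x4 in S) for x2 in S]   # (8.77)
mu_x2_fb = [mu_fa_x2[x2] * mu_fc_x2[x2] for x2 in S]                  # (8.78)
mu_fb_x3 = [sum(fb[x2][x3] * mu_x2_fb[x2] for x2 in S) for x3 in S]   # (8.79)

# Root to leaves, following (8.80)-(8.85).
mu_x3_fb = [1.0 for _ in S]                                           # (8.80)
mu_fb_x2 = [sum(fb[x2][x3] * mu_x3_fb[x3] for x3 in S) for x2 in S]   # (8.81)
mu_x2_fa = [mu_fb_x2[x2] * mu_fc_x2[x2] for x2 in S]                  # (8.82)
mu_fa_x1 = [sum(fa[x1][x2] * mu_x2_fa[x2] for x2 in S) for x1 in S]   # (8.83)
mu_x2_fc = [mu_fa_x2[x2] * mu_fb_x2[x2] for x2 in S]                  # (8.84)
mu_fc_x4 = [sum(fc[x2][x4] * mu_x2_fc[x2] for x2 in S) for x4 in S]   # (8.85)

# Unnormalized marginals: product of all messages arriving at each variable node.
marginals = {
    "x1": mu_fa_x1,
    "x2": [mu_fa_x2[k] * mu_fb_x2[k] * mu_fc_x2[k] for k in S],
    "x3": mu_fb_x3,
    "x4": mu_fc_x4,
}

def brute_force(index):
    """Unnormalized marginal of the variable at position `index` by full enumeration."""
    out = [0.0, 0.0]
    for x in product(S, repeat=4):
        x1, x2, x3, x4 = x
        out[x[index]] += fa[x1][x2] * fb[x2][x3] * fc[x2][x4]
    return out

for i, name in enumerate(["x1", "x2", "x3", "x4"]):
    print(name, marginals[name], brute_force(i))
```

With these tables each message-passing marginal matches the enumeration, and all four sum to the same normalization constant.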
10.
• Marginal $p(x_2)$ can be calculated
• references: @n_shuyo @sleepy_yoshi @nokuno

One message has now passed in each direction across each link, and we can now evaluate the marginals. As a simple check, let us verify that the marginal $p(x_2)$ is given by the correct expression. Using (8.63) and substituting for the messages using the above results, we have

\begin{align}
\tilde{p}(x_2) &= \mu_{f_a \to x_2}(x_2)\, \mu_{f_b \to x_2}(x_2)\, \mu_{f_c \to x_2}(x_2) \notag \\
&= \left[ \sum_{x_1} f_a(x_1, x_2) \right] \left[ \sum_{x_3} f_b(x_2, x_3) \right] \left[ \sum_{x_4} f_c(x_2, x_4) \right] \notag \\
&= \sum_{x_1} \sum_{x_3} \sum_{x_4} f_a(x_1, x_2)\, f_b(x_2, x_3)\, f_c(x_2, x_4) \notag \\
&= \sum_{x_1} \sum_{x_3} \sum_{x_4} \tilde{p}(\mathbf{x}) \tag{8.86}
\end{align}

as required.

So far, we have assumed that all of the variables in the graph are hidden. In most practical applications, a subset of the variables will be observed, and we wish to calculate posterior distributions conditioned on these observations. Observed nodes are easily handled within the sum-product algorithm as follows. Suppose we partition $\mathbf{x}$ into hidden variables $\mathbf{h}$ and observed variables $\mathbf{v}$, and that the observed value of $\mathbf{v}$ is denoted $\hat{\mathbf{v}}$. Then we simply multiply the joint distribution $p(\mathbf{x})$ by $\prod_i I(v_i, \hat{v}_i)$, where $I(v, \hat{v}) = 1$ if $v = \hat{v}$ and $I(v, \hat{v}) = 0$ otherwise. This product corresponds to $p(\mathbf{h}, \mathbf{v} = \hat{\mathbf{v}})$ and hence is an unnormalized version of $p(\mathbf{h} \mid \mathbf{v} = \hat{\mathbf{v}})$. By running the sum-product algorithm, we can efficiently calculate the posterior marginals $p(h_i \mid \mathbf{v} = \hat{\mathbf{v}})$ up to a normalization coefficient whose value can be found efficiently ...
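The indicator trick for observed variables can be illustrated on a two-variable example. The joint table, the observed value `v_hat` and the helper `indicator` are assumptions made for this sketch, and the sum over v is done directly rather than by message passing.

```python
# Hypothetical joint p(h, v) for binary h and ternary v, indexed [h][v].
joint = [[0.10, 0.20, 0.10],
         [0.30, 0.05, 0.25]]
v_hat = 1  # observed value of v (an assumption for this example)

def indicator(v, v_obs):
    """I(v, v_hat): 1 if the state matches the observation, 0 otherwise."""
    return 1.0 if v == v_obs else 0.0

# Multiply the joint by I(v, v_hat) and sum out v; this gives p(h, v = v_hat).
unnormalized = [
    sum(joint[h][v] * indicator(v, v_hat) for v in range(3))
    for h in range(2)
]
Z = sum(unnormalized)                       # equals p(v = v_hat)
posterior = [u / Z for u in unnormalized]   # p(h | v = v_hat)
print(unnormalized, Z, posterior)
```

The normalization again involves a sum over a single hidden variable only. In a larger graph the same indicator factors would be multiplied in before running the message-passing schedule.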