02 Probabilistic Inference in Graphical Models

Slide 1: Part 3: Probabilistic Inference in Graphical Models
Sebastian Nowozin and Christoph H. Lampert
Colorado Springs, 25th June 2011
Outline: Belief Propagation, Variational Inference, Sampling, Break
Slide 2: Belief Propagation
- "Message-passing" algorithm
- Exact and optimal for tree-structured graphs
- Approximate for cyclic graphs
Slide 3: Example: Inference on Chains
Factor graph: the chain $Y_i - F - Y_j - G - Y_k - H - Y_l$.
$$Z = \sum_{y \in \mathcal{Y}} \exp(-E(y)) = \sum_{y_i \in \mathcal{Y}_i} \sum_{y_j \in \mathcal{Y}_j} \sum_{y_k \in \mathcal{Y}_k} \sum_{y_l \in \mathcal{Y}_l} \exp\big(-(E_F(y_i, y_j) + E_G(y_j, y_k) + E_H(y_k, y_l))\big)$$
Slide 6: Example: Inference on Chains (cont)
By distributivity, each factor can be pulled out of the sums over variables it does not depend on:
$$Z = \sum_{y_i \in \mathcal{Y}_i} \sum_{y_j \in \mathcal{Y}_j} \exp(-E_F(y_i, y_j)) \sum_{y_k \in \mathcal{Y}_k} \exp(-E_G(y_j, y_k)) \sum_{y_l \in \mathcal{Y}_l} \exp(-E_H(y_k, y_l))$$
Slide 9: Example: Inference on Chains (cont)
Define the message $r_{H \to Y_k} \in \mathbb{R}^{\mathcal{Y}_k}$ by
$$r_{H \to Y_k}(y_k) = \sum_{y_l \in \mathcal{Y}_l} \exp(-E_H(y_k, y_l)).$$
Then
$$Z = \sum_{y_i \in \mathcal{Y}_i} \sum_{y_j \in \mathcal{Y}_j} \exp(-E_F(y_i, y_j)) \sum_{y_k \in \mathcal{Y}_k} \exp(-E_G(y_j, y_k))\, r_{H \to Y_k}(y_k).$$
Slide 11: Example: Inference on Chains (cont)
Similarly, define $r_{G \to Y_j} \in \mathbb{R}^{\mathcal{Y}_j}$ by
$$r_{G \to Y_j}(y_j) = \sum_{y_k \in \mathcal{Y}_k} \exp(-E_G(y_j, y_k))\, r_{H \to Y_k}(y_k),$$
so that
$$Z = \sum_{y_i \in \mathcal{Y}_i} \sum_{y_j \in \mathcal{Y}_j} \exp(-E_F(y_i, y_j))\, r_{G \to Y_j}(y_j).$$
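The chain computation above is easy to check numerically. Below is a toy sketch (not code from the tutorial) with four binary variables and hypothetical, made-up energy tables: the right-to-left messages $r_{H \to Y_k}$ and $r_{G \to Y_j}$ reproduce the brute-force partition function.

```python
import math
from itertools import product

# Hypothetical pairwise energies on the chain Yi - F - Yj - G - Yk - H - Yl
# (binary states; the numbers are made up for illustration).
E_F = [[0.2, 1.1], [0.7, 0.3]]
E_G = [[0.5, 0.9], [1.4, 0.1]]
E_H = [[0.3, 0.6], [0.8, 0.2]]

# Brute force: sum exp(-E(y)) over all 2^4 joint states.
Z_brute = sum(
    math.exp(-(E_F[yi][yj] + E_G[yj][yk] + E_H[yk][yl]))
    for yi, yj, yk, yl in product(range(2), repeat=4)
)

# Messages, right to left: each one sums out the rightmost remaining variable.
r_H = [sum(math.exp(-E_H[yk][yl]) for yl in range(2)) for yk in range(2)]
r_G = [sum(math.exp(-E_G[yj][yk]) * r_H[yk] for yk in range(2)) for yj in range(2)]
Z_msg = sum(math.exp(-E_F[yi][yj]) * r_G[yj] for yi in range(2) for yj in range(2))

print(Z_brute, Z_msg)  # identical up to floating point
```

The brute-force sum touches $2^4$ states, while the message version only ever sums over one variable at a time.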
Slide 13: Example: Inference on Trees
Factor graph: the chain $Y_i - F - Y_j - G - Y_k - H - Y_l$ plus a factor $I$ connecting $Y_k$ to $Y_m$.
$$Z = \sum_{y \in \mathcal{Y}} \exp(-E(y)) = \sum_{y_i \in \mathcal{Y}_i} \sum_{y_j \in \mathcal{Y}_j} \sum_{y_k \in \mathcal{Y}_k} \sum_{y_l \in \mathcal{Y}_l} \sum_{y_m \in \mathcal{Y}_m} \exp\big(-(E_F(y_i, y_j) + \dots + E_I(y_k, y_m))\big)$$
Slide 14: Example: Inference on Trees (cont)
$$Z = \sum_{y_i} \sum_{y_j} \exp(-E_F(y_i, y_j)) \sum_{y_k} \exp(-E_G(y_j, y_k)) \cdot \underbrace{\sum_{y_l} \exp(-E_H(y_k, y_l))}_{r_{H \to Y_k}(y_k)} \cdot \underbrace{\sum_{y_m} \exp(-E_I(y_k, y_m))}_{r_{I \to Y_k}(y_k)}$$
Slide 15: Example: Inference on Trees (cont)
$$Z = \sum_{y_i} \sum_{y_j} \exp(-E_F(y_i, y_j)) \sum_{y_k} \exp(-E_G(y_j, y_k)) \cdot \big(r_{H \to Y_k}(y_k) \cdot r_{I \to Y_k}(y_k)\big)$$
Slide 16: Example: Inference on Trees (cont)
$$Z = \sum_{y_i} \sum_{y_j} \exp(-E_F(y_i, y_j)) \sum_{y_k} \exp(-E_G(y_j, y_k)) \cdot \underbrace{\big(r_{H \to Y_k}(y_k) \cdot r_{I \to Y_k}(y_k)\big)}_{q_{Y_k \to G}(y_k)}$$
Slide 17: Example: Inference on Trees (cont)
$$Z = \sum_{y_i} \sum_{y_j} \exp(-E_F(y_i, y_j)) \sum_{y_k} \exp(-E_G(y_j, y_k))\, q_{Y_k \to G}(y_k)$$
Slide 18: Factor Graph Sum-Product Algorithm
"Message": a pair of vectors on each factor-graph edge $(i, F) \in E$:
1. $r_{F \to Y_i} \in \mathbb{R}^{\mathcal{Y}_i}$: factor-to-variable message
2. $q_{Y_i \to F} \in \mathbb{R}^{\mathcal{Y}_i}$: variable-to-factor message
The algorithm iteratively updates the messages; after convergence, $Z$ and the factor marginals $\mu_F$ can be obtained from them.
Slide 20: Sum-Product: Variable-to-Factor Message
Set of factors adjacent to variable $i$: $M(i) = \{F \in \mathcal{F} : (i, F) \in E\}$.
Variable-to-factor message:
$$q_{Y_i \to F}(y_i) = \prod_{F' \in M(i) \setminus \{F\}} r_{F' \to Y_i}(y_i)$$
Slide 21: Sum-Product: Factor-to-Variable Message
Factor-to-variable message:
$$r_{F \to Y_i}(y_i) = \sum_{\substack{y_F \in \mathcal{Y}_F,\\ [y_F]_i = y_i}} \exp(-E_F(y_F)) \prod_{j \in N(F) \setminus \{i\}} q_{Y_j \to F}(y_j)$$
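The two message equations can be sketched generically. Everything here (binary states, the dictionary-based factor representation, the demo energy table) is a hypothetical setup chosen for illustration, not the tutorial's code.

```python
import math
from itertools import product

def variable_to_factor(i, F, M, r):
    """q_{Yi->F}(yi): product of incoming r messages at Yi, excluding factor F."""
    q = [1.0, 1.0]  # two states per variable in this sketch
    for G in M[i]:
        if G != F:
            q = [q[s] * r[(G, i)][s] for s in range(2)]
    return q

def factor_to_variable(F, i, scope, energy, q):
    """r_{F->Yi}(yi): sum over joint factor states consistent with [yF]_i = yi."""
    pos_i = scope.index(i)
    r = [0.0, 0.0]
    for yF in product(range(2), repeat=len(scope)):
        w = math.exp(-energy[yF])
        for pos, j in enumerate(scope):
            if j != i:
                w *= q[(j, F)][yF[pos]]
        r[yF[pos_i]] += w
    return r

# Demo: one pairwise factor 'F' over variables (0, 1). Variable 1 is a leaf,
# so its q message is all ones, and r_{F->Y0} is a plain sum over y1.
E = {(a, b): 0.5 * a + 0.3 * b for a in range(2) for b in range(2)}  # hypothetical
r_msg = factor_to_variable('F', 0, (0, 1), E, {(1, 'F'): [1.0, 1.0]})

# Demo: if variable 0 touches factors 'F' and 'G', the message to 'F'
# uses only the incoming message from 'G'.
q_msg = variable_to_factor(0, 'F', {0: ['F', 'G']},
                           {('F', 0): [1.0, 2.0], ('G', 0): [3.0, 4.0]})
print(r_msg, q_msg)
```

The message excluding the receiving factor is what prevents information from being "reflected back" to where it came from.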
Slide 22: Message Scheduling
$$q_{Y_i \to F}(y_i) = \prod_{F' \in M(i) \setminus \{F\}} r_{F' \to Y_i}(y_i)$$
$$r_{F \to Y_i}(y_i) = \sum_{\substack{y_F \in \mathcal{Y}_F,\\ [y_F]_i = y_i}} \exp(-E_F(y_F)) \prod_{j \in N(F) \setminus \{i\}} q_{Y_j \to F}(y_j)$$
- Problem: message updates depend on each other
- No dependencies if the product is empty ($= 1$)
- For tree-structured graphs we can resolve all dependencies
Slide 25: Message Scheduling: Trees
(Figure: a tree-structured factor graph with the message updates numbered 1-10 in a valid order, first toward the root, then back toward the leaves.)
1. Select one variable node as tree root
2. Compute leaf-to-root messages
3. Compute root-to-leaf messages
Slide 26: Inference Results: Z and Marginals
Partition function, evaluated at the root $r$:
$$Z = \sum_{y_r \in \mathcal{Y}_r} \prod_{F \in M(r)} r_{F \to Y_r}(y_r)$$
Marginal distributions, for each factor:
$$\mu_F(y_F) = p(Y_F = y_F) = \frac{1}{Z} \exp(-E_F(y_F)) \prod_{i \in N(F)} q_{Y_i \to F}(y_i)$$
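A minimal end-to-end check of these two formulas on a three-variable chain (the smallest interesting tree), with hypothetical binary-state energies: the messages yield the same $Z$ and factor marginal $\mu_F$ as brute-force enumeration.

```python
import math
from itertools import product

# Hypothetical energies for the chain Y0 - F - Y1 - G - Y2 (binary states).
E_F = [[0.1, 0.9], [0.6, 0.2]]
E_G = [[0.4, 0.7], [1.0, 0.3]]

# Leaf-to-root messages with root Y1; leaves send all-ones q messages, so:
r_F = [sum(math.exp(-E_F[y0][y1]) for y0 in range(2)) for y1 in range(2)]
r_G = [sum(math.exp(-E_G[y1][y2]) for y2 in range(2)) for y1 in range(2)]

# Partition function, evaluated at the root.
Z = sum(r_F[y1] * r_G[y1] for y1 in range(2))

# Factor marginal mu_F: exp(-E_F) times the incoming q messages, normalized.
# q_{Y0->F} = 1 (leaf); q_{Y1->F}(y1) = r_G(y1) (root-to-leaf direction).
mu_F = [[math.exp(-E_F[y0][y1]) * r_G[y1] / Z for y1 in range(2)]
        for y0 in range(2)]

# Brute-force enumeration for comparison.
Z_brute = sum(math.exp(-(E_F[a][b] + E_G[b][c]))
              for a, b, c in product(range(2), repeat=3))
mu_brute = [[sum(math.exp(-(E_F[a][b] + E_G[b][c])) for c in range(2)) / Z_brute
             for b in range(2)] for a in range(2)]
print(Z, mu_F)
```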
Slide 27: Max-Product/Max-Sum Algorithm
- Belief propagation for MAP inference: $y^* = \operatorname{argmax}_{y \in \mathcal{Y}} p(Y = y \mid x, w)$
- Exact for trees
- For cyclic graphs: not as well understood as the sum-product algorithm
Messages (log domain):
$$q_{Y_i \to F}(y_i) = \sum_{F' \in M(i) \setminus \{F\}} r_{F' \to Y_i}(y_i)$$
$$r_{F \to Y_i}(y_i) = \max_{\substack{y_F \in \mathcal{Y}_F,\\ [y_F]_i = y_i}} \Big(-E_F(y_F) + \sum_{j \in N(F) \setminus \{i\}} q_{Y_j \to F}(y_j)\Big)$$
Slide 28: Sum-Product/Max-Sum Comparison
Sum-product:
$$q_{Y_i \to F}(y_i) = \prod_{F' \in M(i) \setminus \{F\}} r_{F' \to Y_i}(y_i), \qquad r_{F \to Y_i}(y_i) = \sum_{\substack{y_F \in \mathcal{Y}_F,\\ [y_F]_i = y_i}} \exp(-E_F(y_F)) \prod_{j \in N(F) \setminus \{i\}} q_{Y_j \to F}(y_j)$$
Max-sum:
$$q_{Y_i \to F}(y_i) = \sum_{F' \in M(i) \setminus \{F\}} r_{F' \to Y_i}(y_i), \qquad r_{F \to Y_i}(y_i) = \max_{\substack{y_F \in \mathcal{Y}_F,\\ [y_F]_i = y_i}} \Big(-E_F(y_F) + \sum_{j \in N(F) \setminus \{i\}} q_{Y_j \to F}(y_j)\Big)$$
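Max-sum decoding can be sketched on the same kind of toy chain (hypothetical binary-state energies, chosen so the MAP state is unique): messages are computed in the log domain toward one end, the maximizers are backtracked, and the result matches brute-force MAP search.

```python
from itertools import product

# Hypothetical chain energies (binary variables Y0 - F - Y1 - G - Y2).
E_F = [[0.1, 0.9], [0.6, 0.2]]
E_G = [[0.4, 0.7], [1.0, 0.35]]

# Max-sum messages toward Y0 (sums replaced by maxima, log domain).
r_G = [max(-E_G[y1][y2] for y2 in range(2)) for y1 in range(2)]
r_F = [max(-E_F[y0][y1] + r_G[y1] for y1 in range(2)) for y0 in range(2)]

# Decode by backtracking the maximizers.
y0 = max(range(2), key=lambda s: r_F[s])
y1 = max(range(2), key=lambda s: -E_F[y0][s] + r_G[s])
y2 = max(range(2), key=lambda s: -E_G[y1][s])

# Brute-force MAP (minimum-energy state) for comparison.
y_star = min(product(range(2), repeat=3),
             key=lambda y: E_F[y[0]][y[1]] + E_G[y[1]][y[2]])
print((y0, y1, y2), y_star)
```

On a chain this is the familiar Viterbi recursion.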
Slide 29: Example: Pictorial Structures
(Figure: a tree of body-part variables Y_top, Y_head, Y_torso, Y_larm, Y_rarm, Y_lhnd, Y_rhnd, Y_lleg, Y_rleg, Y_lfoot, Y_rfoot, each also connected to the image X.)
- Tree-structured model for articulated pose (Felzenszwalb and Huttenlocher, 2000), (Fischler and Elschlager, 1973)
- Body-part variables, states: discretized tuple $(x, y, s, \theta)$ with $(x, y)$ position, $s$ scale, and $\theta$ rotation
Slide 31: Example: Pictorial Structures (cont)
$$r_{F \to Y_i}(y_i) = \max_{y_j \in \mathcal{Y}_j} \big(-E_F(y_i, y_j) + q_{Y_j \to F}(y_j)\big) \qquad (1)$$
- Because $\mathcal{Y}_i$ is large ($\approx$ 500k states), $\mathcal{Y}_i \times \mathcal{Y}_j$ is too big to enumerate
- (Felzenszwalb and Huttenlocher, 2000) use a special form of $E_F(y_i, y_j)$ so that (1) is computable in $O(|\mathcal{Y}_i|)$
Slide 32: Loopy Belief Propagation
- Key difference: there is no schedule that removes all dependencies
- But: the message computation is still well defined
- Therefore, classic loopy belief propagation (Pearl, 1988):
1. Initialize message vectors to 1 (sum-product) or 0 (max-sum)
2. Update messages, hoping for convergence
3. Upon convergence, treat the beliefs $\mu_F$ as approximate marginals
- Different messaging schedules (synchronous/asynchronous, static/dynamic)
- Improvements: generalized BP (Yedidia et al., 2001), convergent BP (Heskes, 2006)
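Loopy BP can be sketched on the smallest cyclic graph, a triangle of binary variables with hypothetical weak pairwise energies. Messages start at 1, are updated synchronously with normalization, and the resulting beliefs are compared against exact brute-force marginals; as the slide warns, they agree only approximately.

```python
import math
from itertools import product

# Hypothetical weak pairwise energies on the cycle Y0 - A - Y1 - B - Y2 - C - Y0.
E = {
    'A': [[0.0, 0.2], [0.3, 0.1]],
    'B': [[0.1, 0.4], [0.2, 0.0]],
    'C': [[0.0, 0.3], [0.2, 0.1]],
}
scope = {'A': (0, 1), 'B': (1, 2), 'C': (2, 0)}

# Factor-to-variable messages, initialized to 1 and updated synchronously.
r = {(F, i): [1.0, 1.0] for F in E for i in scope[F]}
for _ in range(100):
    new_r = {}
    for F, (a, b) in scope.items():
        for i, j in ((a, b), (b, a)):
            # q_{Yj->F} = message from the *other* factor adjacent to Yj.
            other = next(G for G in E if j in scope[G] and G != F)
            q = r[(other, j)]
            msg = [0.0, 0.0]
            for yi in range(2):
                for yj in range(2):
                    ya, yb = (yi, yj) if i == a else (yj, yi)
                    msg[yi] += math.exp(-E[F][ya][yb]) * q[yj]
            s = sum(msg)
            new_r[(F, i)] = [m / s for m in msg]   # normalize for stability
    r = new_r

# Beliefs: product of incoming messages at each variable, normalized.
belief = {}
for i in range(3):
    b = [1.0, 1.0]
    for F in E:
        if i in scope[F]:
            b = [b[s] * r[(F, i)][s] for s in range(2)]
    s = sum(b)
    belief[i] = [v / s for v in b]

# Exact marginal of Y0 by brute force, for comparison (loopy BP is approximate).
Z = sum(math.exp(-(E['A'][y[0]][y[1]] + E['B'][y[1]][y[2]] + E['C'][y[2]][y[0]]))
        for y in product(range(2), repeat=3))
exact0 = [sum(math.exp(-(E['A'][a][b] + E['B'][b][c] + E['C'][c][a]))
              for b in range(2) for c in range(2)) / Z for a in range(2)]
print(belief[0], exact0)
```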
Slide 34: Synchronous Iteration
(Figure: a 3×3 grid-structured factor graph shown before and after one synchronous iteration; all messages are recomputed from the previous iteration's values.)
Slide 35: Mean Field Methods
- Mean field methods (Jordan et al., 1999), (Xing et al., 2003)
- Target distribution $p(y|x, w)$: inference intractable
- Approximate it by a distribution $q(y)$ from a tractable family $Q$
Slide 36: Mean Field Methods (cont)
$$q^* = \operatorname{argmin}_{q \in Q} D_{\mathrm{KL}}(q(y)\,\|\,p(y|x,w))$$
$$\begin{aligned}
D_{\mathrm{KL}}(q(y)\,\|\,p(y|x,w)) &= \sum_{y \in \mathcal{Y}} q(y) \log \frac{q(y)}{p(y|x,w)} \\
&= \sum_{y \in \mathcal{Y}} q(y) \log q(y) - \sum_{y \in \mathcal{Y}} q(y) \log p(y|x,w) \\
&= -H(q) + \sum_{F \in \mathcal{F}} \sum_{y_F \in \mathcal{Y}_F} \mu_{F,y_F}(q)\, E_F(y_F; x_F, w) + \log Z(x,w),
\end{aligned}$$
where $H(q)$ is the entropy of $q$ and $\mu_{F,y_F}$ are the marginals of $q$. (The form of $\mu$ depends on $Q$.)
Slide 38: Mean Field Methods (cont)
$$q^* = \operatorname{argmin}_{q \in Q} D_{\mathrm{KL}}(q(y)\,\|\,p(y|x,w))$$
- When $Q$ is rich: $q^*$ is close to $p$
- Marginals of $q^*$ approximate marginals of $p$
- Gibbs' inequality: $D_{\mathrm{KL}}(q(y)\,\|\,p(y|x,w)) \ge 0$
- Therefore, we have a lower bound:
$$\log Z(x,w) \ge H(q) - \sum_{F \in \mathcal{F}} \sum_{y_F \in \mathcal{Y}_F} \mu_{F,y_F}(q)\, E_F(y_F; x_F, w).$$
Slide 40: Naive Mean Field
Set $Q$ to all distributions of the form
$$q(y) = \prod_{i \in V} q_i(y_i).$$
(Figure: each variable node gets its own independent factor $q_i$.)
Marginals $\mu_{F,y_F}$ then take the form
$$\mu_{F,y_F}(q) = \prod_{i \in N(F)} q_i(y_i),$$
and the entropy $H(q)$ decomposes:
$$H(q) = \sum_{i \in V} H_i(q_i) = -\sum_{i \in V} \sum_{y_i \in \mathcal{Y}_i} q_i(y_i) \log q_i(y_i).$$
Slide 42: Naive Mean Field (cont)
Putting it together,
$$\begin{aligned}
\operatorname{argmin}_{q \in Q} D_{\mathrm{KL}}(q(y)\,\|\,p(y|x,w))
&= \operatorname{argmax}_{q \in Q}\ H(q) - \sum_{F \in \mathcal{F}} \sum_{y_F \in \mathcal{Y}_F} \mu_{F,y_F}(q)\, E_F(y_F; x_F, w) - \log Z(x,w) \\
&= \operatorname{argmax}_{q \in Q}\ H(q) - \sum_{F \in \mathcal{F}} \sum_{y_F \in \mathcal{Y}_F} \mu_{F,y_F}(q)\, E_F(y_F; x_F, w) \\
&= \operatorname{argmax}_{q \in Q}\ -\sum_{i \in V} \sum_{y_i \in \mathcal{Y}_i} q_i(y_i) \log q_i(y_i) - \sum_{F \in \mathcal{F}} \sum_{y_F \in \mathcal{Y}_F} \Big(\prod_{i \in N(F)} q_i(y_i)\Big) E_F(y_F; x_F, w).
\end{aligned}$$
Optimizing over $Q$ is optimizing over $q_i \in \Delta_i$, the probability simplices.
Slide 43: Naive Mean Field (cont)
$$\operatorname{argmax}_{q \in Q}\ -\sum_{i \in V} \sum_{y_i \in \mathcal{Y}_i} q_i(y_i) \log q_i(y_i) - \sum_{F \in \mathcal{F}} \sum_{y_F \in \mathcal{Y}_F} \Big(\prod_{i \in N(F)} q_i(y_i)\Big) E_F(y_F; x_F, w)$$
- Non-concave maximization problem → hard (for general $E_F$ and pairwise or higher-order factors)
- Block coordinate ascent: closed-form update for each $q_i$
Slide 44: Naive Mean Field (cont)
Closed-form update for $q_i$:
$$q_i^*(y_i) = \exp\Big(1 - \sum_{\substack{F \in \mathcal{F},\\ i \in N(F)}} \sum_{\substack{y_F \in \mathcal{Y}_F,\\ [y_F]_i = y_i}} \Big(\prod_{j \in N(F) \setminus \{i\}} \hat{q}_j(y_j)\Big) E_F(y_F; x_F, w) + \lambda\Big)$$
$$\lambda = -\log \sum_{y_i \in \mathcal{Y}_i} \exp\Big(1 - \sum_{\substack{F \in \mathcal{F},\\ i \in N(F)}} \sum_{\substack{y_F \in \mathcal{Y}_F,\\ [y_F]_i = y_i}} \Big(\prod_{j \in N(F) \setminus \{i\}} \hat{q}_j(y_j)\Big) E_F(y_F; x_F, w)\Big)$$
Looks scary, but it is very easy to implement.
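The closed-form update can be sketched on a three-variable toy chain (hypothetical binary-state energies): each coordinate update sets $q_i(y_i) \propto \exp(-\text{expected adjacent energy given } y_i)$ under the current $\hat{q}$ of the neighbors, which matches the update above since the normalizer absorbs the constants.

```python
import math
from itertools import product

# Hypothetical energies for the chain Y0 - F - Y1 - G - Y2 (binary states).
E_F = [[0.1, 0.9], [0.6, 0.2]]
E_G = [[0.4, 0.7], [1.0, 0.3]]

q = [[0.5, 0.5] for _ in range(3)]  # fully factorized q(y) = q0 q1 q2

def update(i):
    """Coordinate update: q_i(yi) proportional to exp(-E_q[-i][energy | yi])."""
    s = [0.0, 0.0]
    for yi in range(2):
        if i == 0:
            s[yi] = sum(q[1][b] * E_F[yi][b] for b in range(2))
        elif i == 1:
            s[yi] = (sum(q[0][a] * E_F[a][yi] for a in range(2))
                     + sum(q[2][c] * E_G[yi][c] for c in range(2)))
        else:
            s[yi] = sum(q[1][b] * E_G[b][yi] for b in range(2))
    z = sum(math.exp(-v) for v in s)
    q[i] = [math.exp(-v) / z for v in s]

for _ in range(50):            # block coordinate ascent sweeps
    for i in range(3):
        update(i)

# Exact marginal of Y0 for comparison (mean field is only approximate).
Z = sum(math.exp(-(E_F[a][b] + E_G[b][c])) for a, b, c in product(range(2), repeat=3))
exact0 = [sum(math.exp(-(E_F[a][b] + E_G[b][c])) for b in range(2) for c in range(2)) / Z
          for a in range(2)]
print(q[0], exact0)
```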
Slide 45: Structured Mean Field
- The naive mean field approximation can be poor
- Structured mean field (Saul and Jordan, 1995) uses factorial approximations with larger tractable subgraphs
- Block coordinate ascent: optimize an entire subgraph at once using exact probabilistic inference on trees
Slide 46: Sampling
Probabilistic inference and related tasks require the computation of $\mathbb{E}_{y \sim p(y|x,w)}[h(x,y)]$, where $h : \mathcal{X} \times \mathcal{Y} \to \mathbb{R}$ is an arbitrary but well-behaved function.
- Inference: with $h_{F,z_F}(x,y) = I(y_F = z_F)$, we get $\mathbb{E}_{y \sim p(y|x,w)}[h_{F,z_F}(x,y)] = p(y_F = z_F \mid x, w)$, the marginal probability of factor $F$ taking state $z_F$.
- Parameter estimation: with a feature map $h(x,y) = \phi(x,y)$, $\phi : \mathcal{X} \times \mathcal{Y} \to \mathbb{R}^d$, we get $\mathbb{E}_{y \sim p(y|x,w)}[\phi(x,y)]$, the "expected sufficient statistics under the model distribution".
Slide 49: Monte Carlo
Sample approximation from samples $y^{(1)}, y^{(2)}, \dots$:
$$\mathbb{E}_{y \sim p(y|x,w)}[h(x,y)] \approx \frac{1}{S} \sum_{s=1}^{S} h(x, y^{(s)}).$$
- When the expectation exists, the law of large numbers guarantees convergence for $S \to \infty$
- For $S$ independent samples, the approximation error is $O(1/\sqrt{S})$, independent of the dimension $d$
Problem: producing exact samples $y^{(s)}$ from $p(y|x)$ is hard.
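A minimal illustration of the Monte Carlo estimate, using a single binary variable whose distribution we can sample exactly (a made-up stand-in for $p(y|x,w)$, with $h$ the identity):

```python
import random

random.seed(0)

# Toy target: one binary variable with p(y=1) = 0.3, so E[h(y)] = 0.3
# (hypothetical numbers, chosen only to make the estimate checkable).
p1 = 0.3
S = 100_000
samples = [1 if random.random() < p1 else 0 for _ in range(S)]

estimate = sum(samples) / S
print(estimate)  # close to 0.3; the error shrinks like O(1/sqrt(S))
```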
Slide 51: Markov Chain Monte Carlo (MCMC)
Construct a Markov chain with $p(y|x)$ as stationary distribution.
(Figure: a three-state Markov chain over $\mathcal{Y} = \{A, B, C\}$ with transition probabilities on the edges; here $p(y) = (0.1905, 0.3571, 0.4524)$.)
Slide 52: Markov Chains
Definition (Finite Markov chain). Given a finite set $\mathcal{Y}$ and a matrix $P \in \mathbb{R}^{\mathcal{Y} \times \mathcal{Y}}$, a random process $(Z_1, Z_2, \dots)$ with $Z_t$ taking values in $\mathcal{Y}$ is a Markov chain with transition matrix $P$ if
$$p(Z_{t+1} = y^{(j)} \mid Z_1 = y^{(1)}, Z_2 = y^{(2)}, \dots, Z_t = y^{(t)}) = p(Z_{t+1} = y^{(j)} \mid Z_t = y^{(t)}) = P_{y^{(t)}, y^{(j)}}.$$
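Given a transition matrix $P$, the stationary distribution can be found by power iteration ($\pi \leftarrow \pi P$). The matrix below is a hypothetical example, not the one from the slide's figure.

```python
# Hypothetical 3-state transition matrix over Y = {A, B, C}; rows sum to 1.
P = [[0.5, 0.4, 0.1],
     [0.25, 0.5, 0.25],
     [0.25, 0.25, 0.5]]

pi = [1 / 3] * 3
for _ in range(200):                       # power iteration: pi <- pi P
    pi = [sum(pi[i] * P[i][j] for i in range(3)) for j in range(3)]

# At the fixed point, pi P = pi: pi is a stationary distribution of the chain.
residual = max(abs(sum(pi[i] * P[i][j] for i in range(3)) - pi[j]) for j in range(3))
print(pi, residual)
```

MCMC runs this logic in reverse: instead of finding $\pi$ for a given $P$, one designs $P$ so that the desired $p(y|x)$ is stationary.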
Slide 53: MCMC Simulation
Steps:
1. Construct a Markov chain with stationary distribution $p(y|x,w)$
2. Start at $y^{(0)}$
3. Perform a random walk according to the Markov chain
4. After a sufficient number of steps, treat each subsequent state $y^{(t)}$ as a sample from $p(y|x,w)$
Justified by the ergodic theorem.
- In practice: discard a fixed number of initial samples (the "burn-in phase") to forget the starting point
- In practice: afterwards, use between 100 and 100,000 samples
Slide 57: Gibbs Sampler
How to construct a suitable Markov chain?
- Metropolis-Hastings chain: almost always possible
- Special case: Gibbs sampler (Geman and Geman, 1984):
1. Select a variable $y_i$
2. Sample $y_i \sim p(y_i \mid y_{V \setminus \{i\}}, x)$
Slide 59: Gibbs Sampler (cont)
$$p(y_i \mid y^{(t)}_{V \setminus \{i\}}, x, w) = \frac{p(y_i, y^{(t)}_{V \setminus \{i\}} \mid x, w)}{\sum_{y_i' \in \mathcal{Y}_i} p(y_i', y^{(t)}_{V \setminus \{i\}} \mid x, w)} = \frac{\tilde{p}(y_i, y^{(t)}_{V \setminus \{i\}} \mid x, w)}{\sum_{y_i' \in \mathcal{Y}_i} \tilde{p}(y_i', y^{(t)}_{V \setminus \{i\}} \mid x, w)}$$
(The normalization $Z$ cancels, so only the unnormalized $\tilde{p}$ is needed.)
Slide 60: Gibbs Sampler (cont)
$$p(y_i \mid y^{(t)}_{V \setminus \{i\}}, x, w) = \frac{\prod_{F \in M(i)} \exp(-E_F(y_i, y^{(t)}_{F \setminus \{i\}}, x_F, w))}{\sum_{y_i' \in \mathcal{Y}_i} \prod_{F \in M(i)} \exp(-E_F(y_i', y^{(t)}_{F \setminus \{i\}}, x_F, w))}$$
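The conditional above makes a Gibbs sweep straightforward to sketch: for each variable, only the adjacent factors enter. This toy version uses a hypothetical three-variable chain with made-up binary-state energies and compares the sampled marginal against brute force.

```python
import math
import random
from itertools import product

random.seed(0)

# Hypothetical energies for the chain Y0 - F - Y1 - G - Y2 (binary states).
E_F = [[0.1, 0.9], [0.6, 0.2]]
E_G = [[0.4, 0.7], [1.0, 0.3]]

def local_energy(i, yi, y):
    """Sum of the energies of the factors adjacent to variable i."""
    e = 0.0
    if i <= 1:
        a, b = (yi, y[1]) if i == 0 else (y[0], yi)
        e += E_F[a][b]
    if i >= 1:
        b, c = (yi, y[2]) if i == 1 else (y[1], yi)
        e += E_G[b][c]
    return e

def gibbs_step(y):
    for i in range(3):
        w = [math.exp(-local_energy(i, s, y)) for s in range(2)]
        y[i] = 1 if random.random() < w[1] / (w[0] + w[1]) else 0

y = [0, 0, 0]
for _ in range(1000):          # burn-in: forget the starting point
    gibbs_step(y)
count = 0
sweeps = 20000
for _ in range(sweeps):        # after burn-in, every sweep yields a sample
    gibbs_step(y)
    count += y[0]
estimate = count / sweeps

# Exact marginal p(y0 = 1) by brute force, for comparison.
Z = sum(math.exp(-(E_F[a][b] + E_G[b][c])) for a, b, c in product(range(2), repeat=3))
exact = sum(math.exp(-(E_F[1][b] + E_G[b][c])) for b in range(2) for c in range(2)) / Z
print(estimate, exact)
```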
Slide 61: Example: Gibbs Sampler
(Figures, slides 61-64:)
- Cumulative mean of p(foreground) over 3000 Gibbs sweeps
- Gibbs sample means and the average of 50 repetitions over 3000 sweeps; estimate $p(y_i = \text{"foreground"}) \approx 0.770 \pm 0.011$
- Estimated one-unit standard deviation over the sweeps; the standard deviation decays as $O(1/\sqrt{S})$
Slide 65: Gibbs Sampler Transitions
(Figures only.)
Slide 67: Block Gibbs Sampler
Extension to larger groups of variables:
1. Select a block $y_I$
2. Sample $y_I \sim p(y_I \mid y_{V \setminus I}, x)$
→ Tractable if sampling from the blocks is tractable.
Slide 68: Summary: Sampling
Two families of Monte Carlo methods:
1. Markov Chain Monte Carlo (MCMC)
2. Importance Sampling
Properties:
- Often simple to implement, general, parallelizable
- (Cannot compute the partition function $Z$)
- Can fail without any warning signs
References:
- (Häggström, 2000), introduction to Markov chains
- (Liu, 2001), excellent Monte Carlo introduction
Slide 69: Coffee Break
Continuing at 10:30am.
Slides available at http://www.nowozin.net/sebastian/cvpr2011tutorial/