Physics of Algorithms Talk

  1. Orbit-Product Analysis of (Generalized) Gaussian Belief Propagation
     Jason Johnson, Post-Doctoral Fellow, LANL. Joint work with Michael Chertkov and Vladimir Chernyak.
     Physics of Algorithms Workshop, Santa Fe, New Mexico, September 3, 2009.
  2. Overview
     Introduction:
       • graphical models + belief propagation
       • specialization to Gaussian model
     Analysis of Gaussian BP:
       • walk-sum analysis for means, variances, covariances [1]
       • orbit-product analysis/corrections for determinant [2]
     Current work on Generalized Belief Propagation (GBP) [Yedidia et al.]:
       • uses larger "regions" to capture more walks/orbits of the graph (better approximation)
       • however, it can also lead to over-counting of walks/orbits (bad approximation/unstable algorithm)!
     [1] Earlier joint work with Malioutov & Willsky (NIPS, JMLR '06).
     [2] Johnson, Chernyak & Chertkov (ICML '09).
  3. Graphical Models
     A graphical model is a multivariate probability distribution that is expressed in terms of interactions among subsets of variables (e.g. pairwise interactions on the edges of a graph $G$):
     $$P(x) = \frac{1}{Z} \prod_{i \in V} \psi_i(x_i) \prod_{\{i,j\} \in G} \psi_{ij}(x_i, x_j)$$
     Markov property: if $S$ separates $A$ from $B$ in $G$, then
     $$P(x_A, x_B \mid x_S) = P(x_A \mid x_S)\, P(x_B \mid x_S)$$
     Given the potential functions $\psi$, the goal of inference is to compute the marginals $P(x_i) = \sum_{x_{V \setminus i}} P(x)$ or the normalization constant $Z$, which is generally difficult in large, complex graphical models.
  4. Gaussian Graphical Model
     Information form of the Gaussian density:
     $$P(x) \propto \exp\left\{ -\tfrac{1}{2} x^T J x + h^T x \right\}$$
     Gaussian graphical model: sparse $J$ matrix, with $J_{ij} \neq 0$ if and only if $\{i,j\} \in G$.
     Potentials:
     $$\psi_i(x_i) = e^{-\frac{1}{2} J_{ii} x_i^2 + h_i x_i}, \qquad \psi_{ij}(x_i, x_j) = e^{-J_{ij} x_i x_j}$$
     Inference corresponds to calculating the mean vector $\mu = J^{-1} h$, the covariance matrix $K = J^{-1}$, or the determinant $Z = \det J^{-1}$. The marginals $P(x_i)$ are specified by the means $\mu_i$ and variances $K_{ii}$.
  5. Belief Propagation
     Belief propagation iteratively updates a set of messages $\mu_{i \to j}(x_j)$ defined on the directed edges of the graph $G$ using the rule:
     $$\mu_{i \to j}(x_j) \propto \sum_{x_i} \psi_i(x_i) \prod_{k \in N(i) \setminus j} \mu_{k \to i}(x_i)\, \psi_{ij}(x_i, x_j)$$
     Iterate the message updates until they converge to a fixed point.
     Marginal estimates: combine the messages at a node,
     $$P(x_i) = \frac{1}{Z_i} \underbrace{\psi_i(x_i) \prod_{k \in N(i)} \mu_{k \to i}(x_i)}_{\tilde\psi_i(x_i)}$$
  6. Belief Propagation II
     Pairwise estimates (on edges of the graph):
     $$P(x_i, x_j) = \frac{1}{Z_{ij}} \underbrace{\tilde\psi_i(x_i)\, \tilde\psi_j(x_j)\, \frac{\psi_{ij}(x_i, x_j)}{\mu_{i \to j}(x_j)\, \mu_{j \to i}(x_i)}}_{\tilde\psi_{ij}(x_i, x_j)}$$
     Estimate of the normalization constant:
     $$Z^{bp} = \prod_{i \in V} Z_i \prod_{\{i,j\} \in G} \frac{Z_{ij}}{Z_i Z_j}$$
     The BP fixed point is a saddle point of the RHS with respect to messages/reparameterizations.
     In trees, BP converges in a finite number of steps and is exact (equivalent to variable elimination).
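  7. Gaussian Belief Propagation (GaBP)
     Messages: $\mu_{i \to j}(x_j) \propto \exp\{ \tfrac{1}{2} \alpha_{i \to j} x_j^2 + \beta_{i \to j} x_j \}$.
     The BP fixed-point equations reduce to:
     $$\alpha_{i \to j} = J_{ij}^2 \,(J_{ii} - \alpha_{i \setminus j})^{-1}$$
     $$\beta_{i \to j} = -J_{ij} \,(J_{ii} - \alpha_{i \setminus j})^{-1} (h_i + \beta_{i \setminus j})$$
     where $\alpha_{i \setminus j} = \sum_{k \in N(i) \setminus j} \alpha_{k \to i}$ and $\beta_{i \setminus j} = \sum_{k \in N(i) \setminus j} \beta_{k \to i}$.
     Marginals are specified by:
     $$K_i^{bp} = \Big( J_{ii} - \sum_{k \in N(i)} \alpha_{k \to i} \Big)^{-1}, \qquad \mu_i^{bp} = K_i^{bp} \Big( h_i + \sum_{k \in N(i)} \beta_{k \to i} \Big)$$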
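The fixed-point equations can be iterated directly, since every message is just a pair of scalars. The following is a minimal sketch, not the authors' implementation: the function name `gabp`, the parallel update schedule, the zero initialization, and the 3-node chain used as a test model are all assumptions made for illustration.

```python
def gabp(J, h, iters=200):
    """Scalar Gaussian BP on the graph of J (list of lists). Returns (means, variances)."""
    n = len(J)
    nbrs = [[j for j in range(n) if j != i and J[i][j] != 0.0] for i in range(n)]
    # One (alpha, beta) pair per directed edge (i, j), initialised to zero (assumption).
    alpha = {(i, j): 0.0 for i in range(n) for j in nbrs[i]}
    beta = dict(alpha)
    for _ in range(iters):
        new_alpha, new_beta = {}, {}
        for (i, j) in alpha:
            a = sum(alpha[(k, i)] for k in nbrs[i] if k != j)  # alpha_{i\j}
            b = sum(beta[(k, i)] for k in nbrs[i] if k != j)   # beta_{i\j}
            new_alpha[(i, j)] = J[i][j] ** 2 / (J[i][i] - a)
            new_beta[(i, j)] = -J[i][j] * (h[i] + b) / (J[i][i] - a)
        alpha, beta = new_alpha, new_beta  # parallel (synchronous) update
    variances = [1.0 / (J[i][i] - sum(alpha[(k, i)] for k in nbrs[i])) for i in range(n)]
    means = [variances[i] * (h[i] + sum(beta[(k, i)] for k in nbrs[i])) for i in range(n)]
    return means, variances


# Illustrative tree model: a 3-node chain, so BP should be exact here.
J = [[1.0, -0.3, 0.0], [-0.3, 1.0, -0.3], [0.0, -0.3, 1.0]]
h = [1.0, 0.0, 0.0]
means, variances = gabp(J, h)
```

On this chain the returned means satisfy $J\mu = h$ and the variances match the diagonal of $J^{-1}$, as expected for a tree. On a loopy graph the same code would return the BP approximations instead.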
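  8. Gaussian BP Determinant Estimate
     Estimates of pairwise covariance on edges:
     $$K^{bp}_{(ij)} = \begin{pmatrix} J_{ii} - \alpha_{i \setminus j} & J_{ij} \\ J_{ij} & J_{jj} - \alpha_{j \setminus i} \end{pmatrix}^{-1}$$
     Estimate of $Z \triangleq \det K = \det J^{-1}$:
     $$Z^{bp} = \prod_{i \in V} Z_i \prod_{\{i,j\} \in G} \frac{Z_{ij}}{Z_i Z_j}$$
     where $Z_i = K_i^{bp}$ and $Z_{ij} = \det K^{bp}_{(ij)}$.
     Exact in tree models (equivalent to Gaussian elimination), approximate in loopy models.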
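The claim that $Z^{bp}$ is exact on trees can be checked numerically. This is a hedged sketch: the function name `z_bp`, the synchronous update schedule, and the 3-node chain are illustrative assumptions. For that chain, $\det J = 0.82$ by direct expansion, so the exact value is $Z = 1/0.82$.

```python
def z_bp(J, iters=200):
    """Gaussian BP estimate of Z = det(J)^-1, using only the alpha messages."""
    n = len(J)
    nbrs = [[j for j in range(n) if j != i and J[i][j] != 0.0] for i in range(n)]
    alpha = {(i, j): 0.0 for i in range(n) for j in nbrs[i]}

    def excl(i, j):
        # alpha_{i\j}: sum of alpha messages into i, excluding the one from j
        return sum(alpha[(k, i)] for k in nbrs[i] if k != j)

    for _ in range(iters):
        alpha = {(i, j): J[i][j] ** 2 / (J[i][i] - excl(i, j)) for (i, j) in alpha}

    # Node factors Z_i = K_i^bp
    z_node = [1.0 / (J[i][i] - sum(alpha[(k, i)] for k in nbrs[i])) for i in range(n)]
    z = 1.0
    for zi in z_node:
        z *= zi
    # Edge factors Z_ij / (Z_i Z_j), with Z_ij = det K_(ij)^bp
    for i in range(n):
        for j in nbrs[i]:
            if i < j:
                det_info = (J[i][i] - excl(i, j)) * (J[j][j] - excl(j, i)) - J[i][j] ** 2
                z *= (1.0 / det_info) / (z_node[i] * z_node[j])
    return z


# Illustrative tree model (3-node chain); det(J) = 0.82 by direct expansion.
J_chain = [[1.0, -0.3, 0.0], [-0.3, 1.0, -0.3], [0.0, -0.3, 1.0]]
z_chain = z_bp(J_chain)
```

Here `z_chain` agrees with `1 / 0.82` to floating-point precision. On a graph with cycles the same function returns only the approximation $Z^{bp}$.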
  9. The BP Computation Tree
     BP marginal estimates are equivalent to the exact marginal in a tree-structured model [Weiss & Freeman].
     [Figure: a 3×3 grid graph with nodes 1 to 9 and the corresponding BP computation tree rooted at node 1; edges are labeled with the messages $\mu^{(4)}_{2 \to 1}$, $\mu^{(3)}_{3 \to 2}$, $\mu^{(2)}_{6 \to 3}$, $\mu^{(1)}_{5 \to 6}$.]
     The BP messages correspond to upwards variable elimination steps in this computation tree.
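  10. Walk-Summable Gaussian Models
      Let $J = I - R$. If $\rho(R) < 1$ then $(I - R)^{-1} = \sum_{L=0}^{\infty} R^L$.
      Walk-sum interpretation of inference, where $R^w$ denotes the product of the edge weights along the walk $w$:
      $$K_{ij} = \sum_{L=0}^{\infty} \sum_{w : i \xrightarrow{L} j} R^w \overset{?}{=} \sum_{w : i \to j} R^w$$
      $$\mu_i = \sum_j h_j \sum_{L=0}^{\infty} \sum_{w : j \xrightarrow{L} i} R^w \overset{?}{=} \sum_{w : * \to i} h_* R^w$$
      The model is walk-summable if $\sum_{w : i \to j} |R^w|$ converges for all $i, j$. Absolute convergence implies convergence of walk-sums (to the same value) for arbitrary orderings and partitions of the set of walks. This is equivalent to $\rho(|R|) < 1$.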
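The length-ordered walk-sum is just the truncated Neumann series, so it is easy to check on a small example. This sketch assumes a 3-node cycle with mixed-sign weights (chosen for illustration) and uses the maximum absolute row sum of $R$ as a sufficient, not necessary, test for $\rho(|R|) < 1$; the helper names are hypothetical.

```python
def matmul(A, B):
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)] for i in range(n)]


def walk_sum(R, max_len):
    """Sum of R^L for L = 0..max_len; entry (i, j) sums R^w over walks i -> j up to that length."""
    n = len(R)
    term = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]  # R^0
    total = [row[:] for row in term]
    for _ in range(max_len):
        term = matmul(term, R)  # walks that are one step longer
        total = [[total[i][j] + term[i][j] for j in range(n)] for i in range(n)]
    return total


def max_abs_row_sum(R):
    # Upper bound on rho(|R|); a value below 1 is sufficient for walk-summability.
    return max(sum(abs(x) for x in row) for row in R)


# Illustrative model: 3-cycle with mixed-sign edge weights of magnitude 0.2.
R = [[0.0, 0.2, -0.2], [0.2, 0.0, 0.2], [-0.2, 0.2, 0.0]]
bound = max_abs_row_sum(R)
K = walk_sum(R, 60)
```

Multiplying `K` by $I - R$ recovers the identity up to truncation error, which confirms that the walk-sum reproduces $K = (I - R)^{-1}$ for this model.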
  11. Walk-Sum Interpretation of GaBP
      Combine the interpretation of BP as exact inference on the computation tree with the walk-sum interpretation of Gaussian inference in trees:
        • messages represent walk-sums in subtrees of the computation tree
        • Gaussian BP converges in walk-summable models
        • complete walk-sum for the means
        • incomplete walk-sum for the variances
  12. Complete Walk-Sum for Means
      Every walk in $G$ ending at a node $i$ maps to a walk of the computation tree $T_i$ (ending at the root node of $T_i$)...
      [Figure: the 3×3 grid graph (nodes 1 to 9) and its computation tree rooted at node 1.]
      Gaussian BP converges to the correct means in walk-summable (WS) models.
  13. Incomplete Walk-Sum for Variances
      Only the totally backtracking walks of $G$ can be embedded as closed walks in the computation tree...
      [Figure: the 3×3 grid graph (nodes 1 to 9) and its computation tree rooted at node 1.]
      Gaussian BP converges to incorrect variance estimates (underestimates in non-negative models).
  14. Zeta Function and Orbit-Product
      What about the determinant?
      Definition of orbits:
        • A walk is closed if it begins and ends at the same vertex.
        • It is primitive if it does not repeat a shorter walk.
        • Two primitive walks are equivalent if one is a cyclic shift of the other.
        • Define the orbits $\ell \in \mathcal{L}$ of $G$ to be the equivalence classes of closed, primitive walks.
      Theorem. Let $Z \triangleq \det(I - R)^{-1}$. If $\rho(|R|) < 1$ then
      $$Z = \prod_{\ell \in \mathcal{L}} (1 - R^\ell)^{-1} \triangleq \prod_{\ell \in \mathcal{L}} Z_\ell$$
      This is a kind of zeta function in graph theory.
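The orbit-product theorem can be checked by brute force on a tiny graph. The sketch below is an illustration under stated assumptions: it enumerates closed walks up to a cutoff length, keeps one canonical representative per primitive orbit (the lexicographically smallest cyclic shift), and multiplies the factors $(1 - R^\ell)^{-1}$. The function names, the cutoff of 12, and the triangle with $r = 0.2$ are all hypothetical choices, and $R$ is assumed to have a zero diagonal.

```python
def is_canonical_primitive(seq):
    # True iff seq is strictly smaller than every non-trivial cyclic shift.
    # This holds for exactly one representative of each primitive orbit and
    # fails for every walk that repeats a shorter walk.
    return all(seq < seq[k:] + seq[:k] for k in range(1, len(seq)))


def closed_walks(R, nbrs, seq, length):
    """Yield vertex tuples (v0, ..., v_{length-1}) that close up via an edge back to v0."""
    if len(seq) == length:
        if R[seq[-1]][seq[0]] != 0.0:
            yield tuple(seq)
        return
    for j in nbrs[seq[-1]]:
        seq.append(j)
        yield from closed_walks(R, nbrs, seq, length)
        seq.pop()


def orbit_product(R, max_len):
    """Product of (1 - R^orbit)^-1 over primitive orbits of length <= max_len."""
    n = len(R)
    nbrs = [[j for j in range(n) if R[i][j] != 0.0] for i in range(n)]
    z = 1.0
    for length in range(2, max_len + 1):
        for start in range(n):
            for seq in closed_walks(R, nbrs, [start], length):
                if not is_canonical_primitive(seq):
                    continue
                weight = 1.0
                for a in range(length):
                    weight *= R[seq[a]][seq[(a + 1) % length]]
                z *= 1.0 / (1.0 - weight)
    return z


# Illustrative model: triangle with uniform edge weight r = 0.2.
r = 0.2
R_tri = [[0.0 if i == j else r for j in range(3)] for i in range(3)]
Z_orbit = orbit_product(R_tri, 12)
```

For this triangle the adjacency eigenvalues are 2, -1, -1, so $\det(I - R) = (1 - 2r)(1 + r)^2 = 0.864$ and the exact value is $Z = 1/0.864$. The truncated product approaches it from below, because every omitted orbit factor exceeds 1 when all weights are positive.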
  15. $Z^{bp}$ as Totally-Backtracking Orbit-Product
      Definition of totally-backtracking orbits:
        • An orbit is reducible if it contains backtracking steps $\ldots(ij)(ji)\ldots$, else it is irreducible (or backtrackless).
        • Every orbit $\ell$ has a unique irreducible core $\gamma = \Gamma(\ell)$, obtained by iteratively deleting pairs of backtracking steps until no more remain. Let $\mathcal{L}_\gamma$ denote the set of all orbits that reduce to $\gamma$.
        • An orbit $\ell$ is totally backtracking (or trivial) if it reduces to the empty orbit, $\Gamma(\ell) = \emptyset$, else it is non-trivial.
      Theorem. If $\rho(|R|) < 1$ then $Z^{bp}$ (defined earlier) is equal to the totally-backtracking orbit-product:
      $$Z^{bp} = \prod_{\ell \in \mathcal{L}_\emptyset} Z_\ell$$
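  16. Orbit-Product Correction and Error Bound
      Orbit-product correction to $Z^{bp}$:
      $$Z = Z^{bp} \prod_{\ell \notin \mathcal{L}_\emptyset} Z_\ell$$
      Error bound: the missing orbits must all involve cycles of the graph...
      $$\frac{1}{n} \left| \log \frac{Z}{Z^{bp}} \right| \leq \frac{\rho^g}{g (1 - \rho)}$$
      where $\rho \triangleq \rho(|R|) < 1$ and $g$ is the girth of the graph (the length of its shortest cycle).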
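  17. Reduction to Backtrackless Orbit-Product Correction
      We may reduce the orbit-product correction to one over just the backtrackless orbits $\gamma$:
      $$Z = Z^{bp} \prod_{\ell \notin \mathcal{L}_\emptyset} Z_\ell = Z^{bp} \prod_{\gamma \neq \emptyset} \Big( \prod_{\ell \in \mathcal{L}_\gamma} Z_\ell \Big) \triangleq Z^{bp} \prod_{\gamma \neq \emptyset} Z'_\gamma$$
      with modified orbit-factors $Z'_\gamma$ based on GaBP:
      $$Z'_\gamma = \Big( 1 - \prod_{(ij) \in \gamma} r'_{ij} \Big)^{-1} \quad \text{where} \quad r'_{ij} \triangleq (1 - \alpha_{i \setminus j})^{-1} r_{ij}$$
      The factor $(1 - \alpha_{i \setminus j})^{-1}$ serves to reconstruct totally-backtracking walks at each point $i$ along the backtrackless orbit $\gamma$.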
  18. Backtrackless Determinant Correction
      Define the backtrackless graph $G'$ of $G$ as follows: the nodes of $G'$ correspond to the directed edges of $G$, with edges $(ij) \to (jk)$ for $k \neq i$.
      [Figure: the 3×3 grid graph (nodes 1 to 9) and its backtrackless graph, whose nodes are the directed edges 12, 21, 23, 32, 14, 41, ...]
      Let $R'$ be the adjacency matrix of $G'$ with modified edge weights $r'$ based on GaBP. Then,
      $$Z = Z^{bp} \det(I - R')^{-1}$$
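The determinant correction can be verified numerically on a small loopy graph. This is a sketch under assumptions, not the authors' code: the model is normalized so that $J = I - R$ with unit diagonal, the transition $(ij) \to (jk)$ in $R'$ is given the weight $r'_{jk} = r_{jk} / (1 - \alpha_{j \setminus k})$ (putting the weight on the target edge is a convention chosen here; the slides do not say which end carries it), and the test model is $K_4$ with uniform weight $r = 0.32$. All function names are hypothetical.

```python
def det(M):
    """Determinant by Gaussian elimination with partial pivoting."""
    A = [row[:] for row in M]
    n, d = len(A), 1.0
    for c in range(n):
        p = max(range(c, n), key=lambda row: abs(A[row][c]))
        if A[p][c] == 0.0:
            return 0.0
        if p != c:
            A[c], A[p] = A[p], A[c]
            d = -d
        d *= A[c][c]
        for row in range(c + 1, n):
            f = A[row][c] / A[c][c]
            for k in range(c, n):
                A[row][k] -= f * A[c][k]
    return d


def backtrackless_correction(R, iters=500):
    """Return (Z_bp, Z_bp * det(I - R')^-1) for the normalized model J = I - R."""
    n = len(R)
    nbrs = [[j for j in range(n) if j != i and R[i][j] != 0.0] for i in range(n)]
    edges = [(i, j) for i in range(n) for j in nbrs[i]]  # directed edges of G
    alpha = {e: 0.0 for e in edges}

    def excl(i, j):
        # alpha_{i\j}: alpha messages into i, excluding the one from j
        return sum(alpha[(k, i)] for k in nbrs[i] if k != j)

    for _ in range(iters):
        alpha = {(i, j): R[i][j] ** 2 / (1.0 - excl(i, j)) for (i, j) in edges}

    # GaBP determinant estimate Z_bp
    z_node = [1.0 / (1.0 - sum(alpha[(k, i)] for k in nbrs[i])) for i in range(n)]
    z_bp = 1.0
    for z in z_node:
        z_bp *= z
    for (i, j) in edges:
        if i < j:
            z_edge = 1.0 / ((1.0 - excl(i, j)) * (1.0 - excl(j, i)) - R[i][j] ** 2)
            z_bp *= z_edge / (z_node[i] * z_node[j])

    # I - R' on the backtrackless graph: (ij) -> (jk) for k != i, weight r'_{jk}
    index = {e: a for a, e in enumerate(edges)}
    m = len(edges)
    M = [[1.0 if a == b else 0.0 for b in range(m)] for a in range(m)]
    for (i, j) in edges:
        for k in nbrs[j]:
            if k != i:
                M[index[(i, j)]][index[(j, k)]] -= R[j][k] / (1.0 - excl(j, k))
    return z_bp, z_bp / det(M)


# Illustrative model: K4 with uniform edge weight r = 0.32.
R_k4 = [[0.0 if i == j else 0.32 for j in range(4)] for i in range(4)]
z_bp_k4, z_corrected = backtrackless_correction(R_k4)
z_true = 1.0 / det([[(1.0 if i == j else 0.0) - R_k4[i][j] for j in range(4)] for i in range(4)])
```

With these conventions `z_corrected` matches `z_true` to numerical precision, while `z_bp_k4` alone falls well short of it. This is consistent with BP missing every orbit that involves a cycle.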
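  19. Region-Based Estimates/Corrections
      Select a set of regions $\mathcal{R} \subset 2^V$ that is closed under intersections and covers all vertices and edges of $G$.
      Define region counts $(n_A \in \mathbb{Z},\ A \in \mathcal{R})$ by the inclusion-exclusion rule:
      $$n_A = 1 - \sum_{B \in \mathcal{R} \,\mid\, A \subsetneq B} n_B$$
      To capture all orbits covered by any region (without over-counting) we calculate the estimate:
      $$Z_{\mathcal{R}} \triangleq \prod_{B \in \mathcal{R}} Z_B^{n_B} \triangleq \prod_{B \in \mathcal{R}} \big( \det(I - R_B)^{-1} \big)^{n_B}$$
      Error bound. Select regions to cover all orbits up to length $L$. Then,
      $$\frac{1}{n} \left| \log \frac{Z_{\mathcal{R}}}{Z} \right| \leq \frac{\rho^L}{L (1 - \rho)}$$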
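  20. Example: 2-D Grids
      Choice of regions for grids: overlapping $L \times L$, $\tfrac{L}{2} \times L$, $L \times \tfrac{L}{2}$, and $\tfrac{L}{2} \times \tfrac{L}{2}$ blocks (shifted by $\tfrac{L}{2}$).
      [Figure: example for a 6 × 6 grid with block size $L = 4$, showing the three types of region with counts $n = +1$, $n = -1$, $n = +1$.]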
  21. 256 × 256 periodic grid, uniform edge weights $r \in [0, 0.25]$. Test with $L = 2, 4, 8, 16, 32$.
      [Figure: plots versus edge weight $r$ and versus block size $L$. Curves shown: $\rho(|R|)$ and $\rho(|R'|)$; $n^{-1} \log Z_{true}$, $n^{-1} \log Z_{bp}$, $n^{-1} \log Z_B$ ($L = 2, 4, 8, \ldots$) and $n^{-1} \log Z_{bp} Z'_B$; and the errors $n^{-1} |\log Z_{true}^{-1} Z_B|$ and $n^{-1} |\log Z_{true}^{-1} Z_{bp} Z'_B|$ on a logarithmic scale from $10^{-12}$ to $10^{-2}$.]
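  22. Generalized Belief Propagation
      Select a set of regions $\mathcal{R} \subset 2^V$ that is closed under intersections and covers all vertices and edges of $G$.
      Define region counts $(n_A \in \mathbb{Z},\ A \in \mathcal{R})$ by the inclusion-exclusion rule:
      $$n_A = 1 - \sum_{B \in \mathcal{R} \,\mid\, A \subsetneq B} n_B$$
      Then GBP solves for a saddle point of
      $$Z_{\mathcal{R}}(\psi) \triangleq \prod_{A \in \mathcal{R}} Z(\psi_A)^{n_A}$$
      over reparameterizations $\{\psi_A, A \in \mathcal{R}\}$ of the form
      $$P(x) = \frac{1}{Z} \prod_{A \in \mathcal{R}} \psi_A(x_A)^{n_A}$$
      Denote the saddle point by $Z^{gbp} = Z_{\mathcal{R}}(\psi^{gbp})$.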
  23. Example: 2-D Grid Revisited
      [Figure: free energy error (logarithmic scale, $10^{-16}$ to $10^{-2}$) versus block size (4 to 16) for the block estimate and the GBP estimate.]
  24. GBP Toy Example
      Look at the graph $G = K_4$ (nodes 1, 2, 3, 4) and consider different choices of regions...
      BP regions:
        • $n = +1$: edges 12, 13, 14, 23, 24, 34
        • $n = -2$: nodes 1, 2, 3, 4
  25. GBP "3∆" regions:
        • $n = +1$: triangles 123, 124, 134
        • $n = -1$: edges 12, 13, 14
        • $n = +1$: node 1
      GBP "4∆" regions:
        • $n = +1$: triangles 123, 124, 134, 234
        • $n = -1$: edges 12, 13, 14, 23, 24, 34
        • $n = +1$: nodes 1, 2, 3, 4
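The region counts listed for the $K_4$ examples follow mechanically from the inclusion-exclusion rule $n_A = 1 - \sum_{B \supsetneq A} n_B$. A minimal sketch that recomputes them is given here; the function names are hypothetical, and regions are represented as Python `frozenset`s of node labels.

```python
from itertools import combinations


def close_under_intersection(regions):
    """Add all non-empty pairwise intersections until nothing new appears."""
    regs = set(frozenset(r) for r in regions)
    changed = True
    while changed:
        changed = False
        for a, b in combinations(list(regs), 2):
            c = a & b
            if c and c not in regs:
                regs.add(c)
                changed = True
    return regs


def region_counts(regions):
    """Counts n_A = 1 - sum of n_B over regions B that strictly contain A."""
    regs = close_under_intersection(regions)
    counts = {}
    # Largest regions first, so every strict superset is counted before its subsets.
    for a in sorted(regs, key=len, reverse=True):
        counts[a] = 1 - sum(counts[b] for b in counts if a < b)
    return counts


edges_k4 = [{1, 2}, {1, 3}, {1, 4}, {2, 3}, {2, 4}, {3, 4}]
nodes_k4 = [{1}, {2}, {3}, {4}]
counts_bp = region_counts(edges_k4 + nodes_k4)
counts_3 = region_counts([{1, 2, 3}, {1, 2, 4}, {1, 3, 4}])
counts_4 = region_counts([{1, 2, 3}, {1, 2, 4}, {1, 3, 4}, {2, 3, 4}])
```

The results reproduce the counts on the slides: edges $+1$ and nodes $-2$ for the BP regions, and triangles $+1$, edges $-1$, nodes $+1$ for both triangle-based choices. In the 3∆ case only the edges 12, 13, 14 and node 1 arise as intersections.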
  26. Computational experiment with equal edge weights $r = 0.32$ (the model becomes singular/indefinite for $r \geq \tfrac{1}{3}$):
      $Z = 10.9$, $Z^{bp} = 2.5$, $Z^{gbp}(3\Delta) = 9.9$, $Z^{gbp}(4\Delta) = 54.4$ (!!!)
      GBP with 3∆ regions is a big improvement over BP (GBP captures more orbits).
      What went wrong with the 4∆ method?
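The exact and BP values in this experiment can be reproduced in closed form. The sketch below rests on assumptions that are not stated on the slides: $J = I - rA$ with $A$ the adjacency matrix of $K_4$ (eigenvalues 3, -1, -1, -1), and a fully symmetric GaBP fixed point $\alpha_{i \to j} = \alpha$, which satisfies $\alpha = r^2 / (1 - 2\alpha)$. The smaller root is taken, being the one reached from zero initialization. The function name is hypothetical, and the GBP values are not recomputed.

```python
import math


def k4_exact_and_bp(r):
    """Closed-form Z and Z_bp for K4 with uniform edge weight r (valid for 8 r^2 <= 1)."""
    # det(I - rA) = (1 - 3r)(1 + r)^3 from the eigenvalues 3, -1, -1, -1 of A
    z_true = 1.0 / ((1.0 - 3.0 * r) * (1.0 + r) ** 3)
    # Symmetric fixed point: 2 alpha^2 - alpha + r^2 = 0, smaller root
    alpha = (1.0 - math.sqrt(1.0 - 8.0 * r * r)) / 4.0
    z_i = 1.0 / (1.0 - 3.0 * alpha)                   # node factor, three incoming messages
    z_ij = 1.0 / ((1.0 - 2.0 * alpha) ** 2 - r * r)   # edge factor, det of the 2x2 covariance
    # 4 node factors and 6 edge terms Z_ij / (Z_i Z_j) give Z_ij^6 / Z_i^8
    z_bp = z_ij ** 6 / z_i ** 8
    return z_true, z_bp


z_true_k4, z_bp_closed = k4_exact_and_bp(0.32)
```

At $r = 0.32$ this gives $Z \approx 10.87$ and $Z^{bp} \approx 2.47$, in line with the rounded values 10.9 and 2.5 quoted on the slide. The factor $(1 - 3r)$ also shows why the model becomes singular at $r = 1/3$.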
  27. Orbit-Product Interpretation of GBP
      Answer: sometimes GBP can overcount orbits of the graph.
        • Let $\mathcal{T}(\mathcal{R})$ be the set of hypertrees $T$ one may construct from the regions $\mathcal{R}$.
        • An orbit $\ell$ spans $T$ if we can embed $\ell$ in $T$ but cannot embed it in any sub-hypertree of $T$.
        • Let $g_\ell \triangleq \#\{T \in \mathcal{T}(\mathcal{R}) \mid \ell \text{ spans } T\}$.
      Orbit-product interpretation of GBP:
      $$Z^{gbp} = \prod_{\ell} Z_\ell^{g_\ell}$$
      Remark. GBP may also include multiples of an orbit as independent orbits (these are not counted by $Z$).
      We say GBP is consistent if $g_\ell \leq 1$ for all (primitive) orbits and $g_\ell = 0$ for multiples of orbits (no overcounting).
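  28. Examples of Over-Counting
      Orbit $\ell = [(12)(23)(34)(41)]$:
      [Figure: the orbit drawn on $K_4$ together with the triangle regions 124, 134, 123, 234.]
      Orbit $\ell = [(12)(23)(34)(42)(21)]$:
      [Figure: the orbit drawn on $K_4$ together with the regions 12, 124, 123, 134, 234.]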
  29. Conclusion and Future Work
      A graphical view of inference in walk-summable Gaussian graphical models that is very intuitive for understanding iterative inference algorithms and approximation methods.
      Future work:
        • many open questions on GBP
        • multiscale method to approximate longer orbits from a coarse-grained model
        • beyond walk-summable?
