Orbit-Product Analysis of (Generalized) Gaussian Belief Propagation

   Jason Johnson, Post-Doctoral Fellow, LANL
   Joint work with Michael Chertkov and Vladimir Chernyak

   Physics of Algorithms Workshop
   Santa Fe, New Mexico
   September 3, 2009
Overview

   Introduction
       graphical models + belief propagation
       specialization to the Gaussian model

   Analysis of Gaussian BP
       walk-sum analysis for means, variances, covariances [1]
       orbit-product analysis/corrections for the determinant [2]

   Current Work on Generalized Belief Propagation (GBP) [Yedidia et al.]
       uses larger “regions” to capture more walks/orbits of the graph
       (better approximation)
       however, it can also lead to over-counting of walks/orbits
       (bad approximation/unstable algorithm)!

   [1] Earlier joint work with Malioutov & Willsky (NIPS, JMLR ’06).
   [2] Johnson, Chernyak & Chertkov (ICML ’09).
Graphical Models

   A graphical model is a multivariate probability distribution that is
   expressed in terms of interactions among subsets of variables (e.g.
   pairwise interactions on the edges of a graph G):

       P(x) = (1/Z) ∏_{i∈V} ψi(xi) ∏_{{i,j}∈G} ψij(xi, xj)

   Markov property: if a subset S of vertices separates A from B in the
   graph, then

       P(xA, xB | xS) = P(xA | xS) P(xB | xS)

   Given the potential functions ψ, the goal of inference is to compute
   the marginals P(xi) = Σ_{x_{V∖i}} P(x) or the normalization constant
   Z, which is generally difficult in large, complex graphical models.
Gaussian Graphical Model

   Information form of the Gaussian density:

       P(x) ∝ exp{ −(1/2) xᵀJx + hᵀx }

   Gaussian graphical model: sparse J matrix,

       Jij ≠ 0 (for i ≠ j) if and only if {i, j} ∈ G

   Potentials:

       ψi(xi) = exp{ −(1/2) Jii xi² + hi xi }
       ψij(xi, xj) = exp{ −Jij xi xj }

   Inference corresponds to calculation of the mean vector µ = J⁻¹h, the
   covariance matrix K = J⁻¹, or the determinant Z = det J⁻¹. Marginals
   P(xi) are specified by the means µi and variances Kii.
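As a concrete sketch (example values assumed, not from the talk), all three inference quantities follow directly from J and h by matrix inversion on a small model:

```python
import numpy as np

# Assumed 3-node chain: J is sparse (zero where {i,j} is not an edge).
J = np.array([[1.0, 0.3, 0.0],
              [0.3, 1.0, 0.3],
              [0.0, 0.3, 1.0]])
h = np.array([1.0, 0.0, -1.0])

K = np.linalg.inv(J)      # covariance matrix K = J^-1
mu = K @ h                # mean vector mu = J^-1 h
Z = np.linalg.det(K)      # determinant Z = det J^-1
print(mu)                 # marginal means mu_i
print(np.diag(K))         # marginal variances K_ii
```

Direct inversion costs O(n³), which is exactly what BP-style algorithms try to avoid on large sparse graphs.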
Belief Propagation

   Belief propagation iteratively updates a set of messages µi→j(xj)
   defined on the directed edges of the graph G using the rule:

       µi→j(xj) ∝ Σ_{xi} ψi(xi) [ ∏_{k∈N(i)∖j} µk→i(xi) ] ψij(xi, xj)

   Iterate the message updates until they converge to a fixed point.

   Marginal Estimates: combine the messages at a node,

       P(xi) = (1/Zi) ψi(xi) ∏_{k∈N(i)} µk→i(xi) ≜ (1/Zi) ψ̃i(xi)
Belief Propagation II

   Pairwise Estimates (on the edges of the graph):

       P(xi, xj) = (1/Zij) ψ̃i(xi) ψ̃j(xj) ψ̃ij(xi, xj),
       where ψ̃ij(xi, xj) ≜ ψij(xi, xj) / ( µi→j(xj) µj→i(xi) )

   Estimate of the Normalization Constant:

       Z^bp = ∏_{i∈V} Zi ∏_{{i,j}∈G} Zij / (Zi Zj)

   A BP fixed point is a saddle point of the RHS with respect to
   messages/reparameterizations.
   On trees, BP converges in a finite number of steps and is exact
   (equivalent to variable elimination).
Gaussian Belief Propagation (GaBP)

   Messages µi→j(xj) ∝ exp{ (1/2) αi→j xj² + βi→j xj }.

   The BP fixed-point equations reduce to:

       αi→j = Jij² (Jii − αi∖j)⁻¹
       βi→j = −Jij (Jii − αi∖j)⁻¹ (hi + βi∖j)

   where αi∖j = Σ_{k∈N(i)∖j} αk→i and βi∖j = Σ_{k∈N(i)∖j} βk→i.

   Marginals are specified by:

       Ki^bp = (Jii − Σ_{k∈N(i)} αk→i)⁻¹
       µi^bp = Ki^bp (hi + Σ_{k∈N(i)} βk→i)
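A hedged sketch of these fixed-point updates (3-node chain assumed; since a chain is a tree, the fixed point reproduces exact inference and can be checked against direct inversion):

```python
import numpy as np

# Assumed 3-node chain; names alpha/beta follow the slide's equations.
J = np.array([[1.0, 0.3, 0.0],
              [0.3, 1.0, 0.3],
              [0.0, 0.3, 1.0]])
h = np.array([1.0, 0.0, -1.0])
n = 3
nbrs = {0: [1], 1: [0, 2], 2: [1]}

alpha = {(i, j): 0.0 for i in range(n) for j in nbrs[i]}
beta = {(i, j): 0.0 for i in range(n) for j in nbrs[i]}
for _ in range(50):                      # iterate the updates to a fixed point
    for (i, j) in list(alpha):
        a = sum(alpha[(k, i)] for k in nbrs[i] if k != j)   # alpha_{i\j}
        b = sum(beta[(k, i)] for k in nbrs[i] if k != j)    # beta_{i\j}
        alpha[(i, j)] = J[i, j] ** 2 / (J[i, i] - a)
        beta[(i, j)] = -J[i, j] * (h[i] + b) / (J[i, i] - a)

K_bp = np.array([1.0 / (J[i, i] - sum(alpha[(k, i)] for k in nbrs[i]))
                 for i in range(n)])
mu_bp = K_bp * np.array([h[i] + sum(beta[(k, i)] for k in nbrs[i])
                         for i in range(n)])
print(mu_bp)   # equals J^-1 h on a tree
print(K_bp)    # equals diag(J^-1) on a tree
```

Each message costs O(1), so a sweep over all directed edges is linear in the number of edges.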
Gaussian BP Determinant Estimate

   Estimates of the pairwise covariance on edges:

       K(ij)^bp = [ Jii − αi∖j      Jij      ]⁻¹
                  [ Jij         Jjj − αj∖i   ]

   Estimate of Z ≜ det K = det J⁻¹:

       Z^bp = ∏_{i∈V} Zi ∏_{{i,j}∈G} Zij / (Zi Zj)

   where Zi = Ki^bp and Zij = det K(ij)^bp.

   Exact in tree models (equivalent to Gaussian elimination),
   approximate in loopy models.
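Since BP is exact on trees, the determinant estimate can be checked there with the exact marginals standing in for the BP fixed point (chain example assumed):

```python
import numpy as np

# Assumed 3-node chain: form Z^bp from Z_i = K_ii and Z_ij = det K_(ij)
# using the exact covariance; on a tree this recovers det J^-1 exactly.
J = np.array([[1.0, 0.3, 0.0],
              [0.3, 1.0, 0.3],
              [0.0, 0.3, 1.0]])
K = np.linalg.inv(J)
edges = [(0, 1), (1, 2)]

Z_bp = np.prod(np.diag(K))                    # product of node factors Z_i
for i, j in edges:
    Z_ij = K[i, i] * K[j, j] - K[i, j] ** 2   # det of 2x2 pairwise covariance
    Z_bp *= Z_ij / (K[i, i] * K[j, j])        # edge factors Z_ij/(Z_i Z_j)

print(Z_bp, np.linalg.det(K))                 # equal on a tree
```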
The BP Computation Tree

   BP marginal estimates are equivalent to the exact marginal in a
   tree-structured model [Weiss & Freeman].

   [Figure: a 3×3 grid (nodes 1–9) and the computation tree rooted at
   node 1; the messages µ5→6^(1), µ6→3^(2), µ3→2^(3), µ2→1^(4) are
   passed up successive levels of the tree.]

   The BP messages correspond to upward variable-elimination steps in
   this computation tree.
Walk-Summable Gaussian Models

   Let J = I − R. If ρ(R) < 1 then (I − R)⁻¹ = Σ_{L=0}^∞ R^L.

   Walk-sum interpretation of inference:

       Kij = Σ_{L=0}^∞ Σ_{w: i→j, |w|=L} R^w  ?=  Σ_{w: i→j} R^w

       µi  = Σ_j hj Σ_{L=0}^∞ Σ_{w: j→i, |w|=L} R^w  ?=  Σ_{w: ∗→i} h∗ R^w

   Walk-summable if Σ_{w: i→j} |R^w| converges for all i, j. Absolute
   convergence implies convergence of the walk-sums (to the same value)
   for arbitrary orderings and partitions of the set of walks. This is
   equivalent to ρ(|R|) < 1.
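A quick numerical sketch of the walk-sum (Neumann) series on a small loopy model (triangle with an assumed uniform weight):

```python
import numpy as np

# J = I - R on a triangle with uniform edge weight r; rho(|R|) = 2r < 1.
r = 0.2
R = r * (np.ones((3, 3)) - np.eye(3))
assert np.max(np.abs(np.linalg.eigvals(np.abs(R)))) < 1   # walk-summable

S = np.zeros((3, 3))
P = np.eye(3)
for L in range(200):        # accumulate sum_L R^L: walks grouped by length
    S += P
    P = P @ R

print(np.max(np.abs(S - np.linalg.inv(np.eye(3) - R))))   # ~ 0
```

Entry (i, j) of R^L sums the weights R^w of all length-L walks from i to j, which is exactly the grouping used in the display above.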
Walk-Sum Interpretation of GaBP




   Combine the interpretation of BP as exact inference on the
   computation tree with the walk-sum interpretation of Gaussian
   inference on trees:
       messages represent walk-sums in subtrees of the computation tree
       Gaussian BP converges in walk-summable models
       complete walk-sum for the means
       incomplete walk-sum for the variances
Complete Walk-Sum for Means


   Every walk in G ending at a node i maps to a walk of the
   computation tree Ti (ending at the root node of Ti)...

   [Figure: the 3×3 grid and the computation tree rooted at node 1,
   illustrating how a walk in G lifts to a walk of T1 ending at the
   root.]

   Gaussian BP converges to the correct means in walk-summable models.
Incomplete Walk-Sum for Variances


   Only the totally backtracking closed walks of G can be embedded as
   closed walks in the computation tree...

   [Figure: the 3×3 grid and the computation tree rooted at node 1; a
   closed walk around a loop of G has no closed counterpart in T1.]

   Gaussian BP converges to incorrect variance estimates
   (underestimates in a non-negative model).
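The underestimate is easy to see numerically (triangle with an assumed uniform weight r; the symmetric variance fixed point can be iterated by hand):

```python
import numpy as np

# GaBP variances sum only totally backtracking closed walks, so in a
# non-negative loopy model they fall short of the true variances.
r = 0.2
a = 0.0
for _ in range(200):                 # symmetric fixed point a = r^2/(1 - a)
    a = r ** 2 / (1.0 - a)
K_bp = 1.0 / (1.0 - 2.0 * a)         # two incoming messages per node

R = r * (np.ones((3, 3)) - np.eye(3))
K_true = np.diag(np.linalg.inv(np.eye(3) - R))[0]
print(K_bp, K_true)                  # K_bp < K_true
```

The gap K_true − K_bp is exactly the weight of the closed walks that circulate around the triangle, which the computation tree cannot close.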
Zeta Function and Orbit-Product

   What about the determinant?
   Definition of orbits:
       A walk is closed if it begins and ends at the same vertex.
       It is primitive if it does not repeat a shorter walk.
       Two primitive walks are equivalent if one is a cyclic shift of
       the other.
       Define the orbits ℓ ∈ L of G to be the equivalence classes of
       closed, primitive walks.
   Theorem. Let Z ≜ det(I − R)⁻¹. If ρ(|R|) < 1 then

       Z = ∏_{ℓ∈L} (1 − R^ℓ)⁻¹ ≜ ∏_{ℓ∈L} Zℓ.

   This is a kind of zeta function in graph theory.
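Enumerating orbits directly is awkward, but the same closed walks can be regrouped by length instead of by orbit, giving the trace identity log Z = −log det(I − R) = Σ_{L≥1} tr(R^L)/L, which is easy to check numerically (triangle with an assumed weight):

```python
import numpy as np

# tr(R^L) sums the weights of all closed walks of length L, so this
# series and the orbit-product both expand the same log-determinant.
r = 0.2
R = r * (np.ones((3, 3)) - np.eye(3))        # rho(|R|) = 0.4 < 1
log_Z = -np.log(np.linalg.det(np.eye(3) - R))
walk_sum = sum(np.trace(np.linalg.matrix_power(R, L)) / L
               for L in range(1, 200))
print(log_Z, walk_sum)                        # agree to high precision
```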
Zbp as Totally-Backtracking Orbit-Product

   Definition of totally-backtracking orbits:
       An orbit ℓ is reducible if it contains backtracking steps
       ...(ij)(ji)..., else it is irreducible (or backtrackless).
       Every orbit ℓ has a unique irreducible core γ = Γ(ℓ), obtained
       by iteratively deleting pairs of backtracking steps until no
       more remain. Let Lγ denote the set of all orbits that reduce
       to γ.
       An orbit ℓ is totally backtracking (or trivial) if it reduces
       to the empty orbit, Γ(ℓ) = ∅; else it is non-trivial.

   Theorem. If ρ(|R|) < 1 then Z^bp (defined earlier) is equal to the
   totally-backtracking orbit-product:

       Z^bp = ∏_{ℓ∈L∅} Zℓ
Orbit-Product Correction and Error Bound

   Orbit-product correction to Z^bp:

       Z = Z^bp ∏_{ℓ∉L∅} Zℓ

   Error Bound: the missing orbits must all involve cycles of the
   graph...

       (1/n) log(Z / Z^bp) ≤ ρ^g / (g (1 − ρ))

   where ρ ≜ ρ(|R|) < 1 and g is the girth of the graph (the length of
   its shortest cycle).
Reduction to Backtrackless Orbit-Product Correction

   We may reduce the orbit-product correction to one over just the
   backtrackless orbits γ:

       Z = Z^bp ∏_γ ∏_{ℓ∈Lγ} Zℓ = Z^bp ∏_γ Z′γ

   with modified orbit-factors Z′γ based on GaBP:

       Z′γ = (1 − ∏_{(ij)∈γ} r′ij)⁻¹   where   r′ij ≜ (1 − αi∖j)⁻¹ rij

   The factor (1 − αi∖j)⁻¹ serves to reconstruct the
   totally-backtracking walks at each point i along the backtrackless
   orbit γ.
Backtrackless Determinant Correction

   Define the backtrackless graph G′ of G as follows: the nodes of G′
   correspond to the directed edges of G, with edges (ij) → (jk) for
   k ≠ i.

   [Figure: a 3×3 grid G and its backtrackless graph G′, whose nodes
   are the directed edges 12, 21, 14, 41, ... of G.]

   Let R′ be the adjacency matrix of G′ with the modified edge weights
   r′ based on GaBP. Then,

       Z = Z^bp det(I − R′)⁻¹
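A hedged end-to-end sketch of this correction on an assumed triangle (GaBP fixed point, Z^bp, and the backtrackless matrix R′ are all built explicitly and compared against the exact determinant):

```python
import numpy as np

# Triangle with uniform weight r; J = I - R. Check Z = Z^bp det(I - R')^-1
# with R' the directed-edge (backtrackless) matrix carrying the modified
# weights r'_ij = r_ij / (1 - alpha_{i\j}).
r = 0.2
n = 3
R = r * (np.ones((n, n)) - np.eye(n))
nbrs = {i: [j for j in range(n) if j != i] for i in range(n)}

alpha = {(i, j): 0.0 for i in range(n) for j in nbrs[i]}
for _ in range(200):                 # GaBP variance fixed point (J_ii = 1)
    for (i, j) in list(alpha):
        a = sum(alpha[(k, i)] for k in nbrs[i] if k != j)
        alpha[(i, j)] = R[i, j] ** 2 / (1.0 - a)

# Z^bp from node and edge factors
Zi = {i: 1.0 / (1.0 - sum(alpha[(k, i)] for k in nbrs[i])) for i in range(n)}
Z_bp = np.prod(list(Zi.values()))
for i in range(n):
    for j in nbrs[i]:
        if i < j:
            aij = sum(alpha[(k, i)] for k in nbrs[i] if k != j)
            aji = sum(alpha[(k, j)] for k in nbrs[j] if k != i)
            Zij = 1.0 / ((1.0 - aij) * (1.0 - aji) - R[i, j] ** 2)
            Z_bp *= Zij / (Zi[i] * Zi[j])

# Backtrackless graph: one node per directed edge, steps (ij) -> (jk), k != i
dedges = [(i, j) for i in range(n) for j in nbrs[i]]
idx = {e: t for t, e in enumerate(dedges)}
Rp = np.zeros((len(dedges), len(dedges)))
for (i, j) in dedges:
    aij = sum(alpha[(k, i)] for k in nbrs[i] if k != j)
    for k in nbrs[j]:
        if k != i:
            Rp[idx[(i, j)], idx[(j, k)]] = R[i, j] / (1.0 - aij)

Z_corrected = Z_bp / np.linalg.det(np.eye(len(dedges)) - Rp)
print(Z_corrected, 1.0 / np.linalg.det(np.eye(n) - R))  # both ~ 1.15741
```

Only cycle products of R′ enter det(I − R′), so attaching r′ij to the outgoing row of node (ij) reproduces the orbit factors r′(γ) above.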
Region-Based Estimates/Corrections

   Select a set of regions R ⊂ 2^V that is closed under intersections
   and covers all vertices and edges of G.
   Define region counts (nA ∈ Z, A ∈ R) by the inclusion-exclusion
   rule:

       nA = 1 − Σ_{B∈R: A⊊B} nB

   To capture all orbits covered by any region (without over-counting)
   we calculate the estimate:

       Z_R ≜ ∏_B ZB^{nB} = ∏_B (det(I − RB)⁻¹)^{nB}

   Error Bounds. Select the regions to cover all orbits up to length L.
   Then,

       (1/n) log(Z_R / Z) ≤ ρ^L / (L (1 − ρ))
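A sketch of the inclusion-exclusion counts on an assumed region set (the three triangles through vertex 1 of K4, their pairwise intersections, and vertex 1, matching the “3∆” example later in the talk):

```python
# Compute n_A = 1 - sum of n_B over regions B strictly containing A,
# processing larger regions first so every needed n_B already exists.
regions = [frozenset(s) for s in
           [{1, 2, 4}, {1, 2, 3}, {1, 3, 4},   # triangles
            {1, 2}, {1, 3}, {1, 4},            # pairwise intersections
            {1}]]                              # common intersection
n = {}
for A in sorted(regions, key=len, reverse=True):
    n[A] = 1 - sum(n[B] for B in regions if A < B)   # A < B: proper subset
for A in regions:
    print(sorted(A), n[A])   # triangles +1, edges -1, vertex {1} +1
```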
Example: 2-D Grids

   Choice of regions for grids: overlapping L × L, (L/2) × L, L × (L/2)
   and (L/2) × (L/2) blocks (shifted by L/2).

   For example, in a 6 × 6 grid with block size L = 4:

   [Figure: the 4 × 4 blocks carry n = +1, their overlaps n = −1, and
   the blocks where those overlaps intersect n = +1.]
   Experiment: 256 × 256 periodic grid, uniform edge weights
   r ∈ [0, .25]. Test with L = 2, 4, 8, 16, 32.

   [Figure: (left) ρ(|R|) and ρ(|R′|) versus r; (right) n⁻¹ log Ztrue,
   n⁻¹ log Zbp and n⁻¹ log ZB (L = 2, 4, 8, ...) versus r; (bottom)
   the errors n⁻¹|log Ztrue⁻¹ ZB| and n⁻¹|log Ztrue⁻¹ Zbp Z′B| versus
   r and versus L, on a log scale from 10⁻² down to 10⁻¹².]
Generalized Belief Propagation

   Select a set of regions R ⊂ 2^V that is closed under intersections
   and covers all vertices and edges of G.
   Define region counts (nA ∈ Z, A ∈ R) by the inclusion-exclusion
   rule:

       nA = 1 − Σ_{B∈R: A⊊B} nB

   Then, GBP solves for a saddle point of

       Z_R(ψ) ≜ ∏_{A∈R} Z(ψA)^{nA}

   over reparameterizations {ψA, A ∈ R} of the form

       P(x) = (1/Z) ∏_{A∈R} ψA(xA)^{nA}

   Denote the saddle point by Zgbp = Z_R(ψ^gbp).
Example: 2-D Grid Revisited

   [Figure: free-energy error versus block size (4 to 16), comparing
   the block estimate and the GBP estimate on a log scale from 10⁻²
   down to 10⁻¹⁶.]
GBP Toy Example

   Look at the graph G = K4 and consider different choices of
   regions...

   [Figure: K4 with vertices 1, 2, 3, 4.]

   BP Regions: the six edges 12, 13, 14, 23, 24, 34 (n = +1) and the
   four vertices (n = −2).

   GBP “3∆” Regions: the three triangles 124, 123, 134 (n = +1), their
   pairwise intersections 12, 14, 13 (n = −1), and the vertex 1
   (n = +1).

   GBP “4∆” Regions: all four triangles 124, 123, 134, 234 (n = +1),
   the six edges (n = −1), and the four vertices (n = +1).
   Computational experiment with equal edge weights r = .32 (the model
   becomes singular/indefinite for r ≥ 1/3):

                        Z   = 10.9
                       Zbp  = 2.5
                 Zgbp (3∆)  = 9.9
                 Zgbp (4∆)  = 54.4 (!)

   GBP with the 3∆ regions is a big improvement over BP (GBP captures
   more orbits).
   What went wrong with the 4∆ method?
Orbit-Product Interpretation of GBP
   Answer: sometimes GBP can over-count orbits of the graph.
       Let T(R) be the set of hypertrees T one may construct from the
       regions R.
       An orbit ℓ spans T if we can embed ℓ in T but cannot embed it
       in any sub-hypertree of T.
       Let gℓ ≜ #{T ∈ T(R) | ℓ spans T}.
   Orbit-product interpretation of GBP:

       Zgbp = ∏_ℓ Zℓ^{gℓ}

   Remark. GBP may also include multiples of an orbit as independent
   orbits (these are not counted by Zℓ).
   We say GBP is consistent if gℓ ≤ 1 for all (primitive) orbits ℓ and
   gℓ = 0 for multiples of orbits (no over-counting).
Examples of Over-Counting
   Orbit ℓ = [(12)(23)(34)(41)]:

   [Figure: the 4-cycle orbit of K4 and its embeddings into hypertrees
   formed from the 4∆ regions; it spans more than one hypertree.]

   Orbit ℓ = [(12)(23)(34)(42)(21)]:

   [Figure: this orbit likewise embeds into more than one hypertree of
   the 4∆ construction, so it is over-counted.]
Conclusion and Future Work



   A graphical view of inference in walk-summable Gaussian graphical
   models that is very intuitive for understanding iterative inference
   algorithms and approximation methods.
   Future Work:
       many open questions on GBP
       multiscale methods to approximate longer orbits from a
       coarse-grained model
       beyond walk-summability?

 
Alternating direction
Alternating directionAlternating direction
Alternating directionDerek Pang
 
Ellipsoidal Representations about correlations (2011-11, Tsukuba, Kakenhi-Sym...
Ellipsoidal Representations about correlations (2011-11, Tsukuba, Kakenhi-Sym...Ellipsoidal Representations about correlations (2011-11, Tsukuba, Kakenhi-Sym...
Ellipsoidal Representations about correlations (2011-11, Tsukuba, Kakenhi-Sym...Toshiyuki Shimono
 
Kernel based models for geo- and environmental sciences- Alexei Pozdnoukhov –...
Kernel based models for geo- and environmental sciences- Alexei Pozdnoukhov –...Kernel based models for geo- and environmental sciences- Alexei Pozdnoukhov –...
Kernel based models for geo- and environmental sciences- Alexei Pozdnoukhov –...Beniamino Murgante
 

Similar to Physics of Algorithms Talk (20)

Normal
NormalNormal
Normal
 
T. Popov - Drinfeld-Jimbo and Cremmer-Gervais Quantum Lie Algebras
T. Popov - Drinfeld-Jimbo and Cremmer-Gervais Quantum Lie AlgebrasT. Popov - Drinfeld-Jimbo and Cremmer-Gervais Quantum Lie Algebras
T. Popov - Drinfeld-Jimbo and Cremmer-Gervais Quantum Lie Algebras
 
Divergence clustering
Divergence clusteringDivergence clustering
Divergence clustering
 
CVPR2010: Advanced ITinCVPR in a Nutshell: part 7: Future Trend
CVPR2010: Advanced ITinCVPR in a Nutshell: part 7: Future TrendCVPR2010: Advanced ITinCVPR in a Nutshell: part 7: Future Trend
CVPR2010: Advanced ITinCVPR in a Nutshell: part 7: Future Trend
 
Sparsity with sign-coherent groups of variables via the cooperative-Lasso
Sparsity with sign-coherent groups of variables via the cooperative-LassoSparsity with sign-coherent groups of variables via the cooperative-Lasso
Sparsity with sign-coherent groups of variables via the cooperative-Lasso
 
YSC 2013
YSC 2013YSC 2013
YSC 2013
 
iTute Notes MM
iTute Notes MMiTute Notes MM
iTute Notes MM
 
Iwsmbvs
IwsmbvsIwsmbvs
Iwsmbvs
 
Slides2 130201091056-phpapp01
Slides2 130201091056-phpapp01Slides2 130201091056-phpapp01
Slides2 130201091056-phpapp01
 
Bayesian case studies, practical 2
Bayesian case studies, practical 2Bayesian case studies, practical 2
Bayesian case studies, practical 2
 
Talk at CIRM on Poisson equation and debiasing techniques
Talk at CIRM on Poisson equation and debiasing techniquesTalk at CIRM on Poisson equation and debiasing techniques
Talk at CIRM on Poisson equation and debiasing techniques
 
Triangle counting handout
Triangle counting handoutTriangle counting handout
Triangle counting handout
 
01 graphical models
01 graphical models01 graphical models
01 graphical models
 
Jam 2006 Test Papers Mathematical Statistics
Jam 2006 Test Papers Mathematical StatisticsJam 2006 Test Papers Mathematical Statistics
Jam 2006 Test Papers Mathematical Statistics
 
Lagrange
LagrangeLagrange
Lagrange
 
Bayesian regression models and treed Gaussian process models
Bayesian regression models and treed Gaussian process modelsBayesian regression models and treed Gaussian process models
Bayesian regression models and treed Gaussian process models
 
Alternating direction
Alternating directionAlternating direction
Alternating direction
 
Igv2008
Igv2008Igv2008
Igv2008
 
Ellipsoidal Representations about correlations (2011-11, Tsukuba, Kakenhi-Sym...
Ellipsoidal Representations about correlations (2011-11, Tsukuba, Kakenhi-Sym...Ellipsoidal Representations about correlations (2011-11, Tsukuba, Kakenhi-Sym...
Ellipsoidal Representations about correlations (2011-11, Tsukuba, Kakenhi-Sym...
 
Kernel based models for geo- and environmental sciences- Alexei Pozdnoukhov –...
Kernel based models for geo- and environmental sciences- Alexei Pozdnoukhov –...Kernel based models for geo- and environmental sciences- Alexei Pozdnoukhov –...
Kernel based models for geo- and environmental sciences- Alexei Pozdnoukhov –...
 

Physics of Algorithms Talk

  • 1. Orbit-Product Analysis of (Generalized) Gaussian Belief Propagation Jason Johnson, Post-Doctoral Fellow, LANL Joint work with Michael Chertkov and Vladimir Chernyak Physics of Algorithms Workshop Santa Fe, New Mexico September 3, 2009
  • 2. Overview Introduction: graphical models + belief propagation; specialization to the Gaussian model. Analysis of Gaussian BP: walk-sum analysis for means, variances, covariances [1]; orbit-product analysis/corrections for the determinant [2]. Current work on Generalized Belief Propagation (GBP) [Yedidia et al]: uses larger “regions” to capture more walks/orbits of the graph (better approximation); however, it can also lead to over-counting of walks/orbits (bad approximation/unstable algorithm)! [1] Earlier joint work with Malioutov & Willsky (NIPS, JMLR ’06). [2] Johnson, Chernyak & Chertkov (ICML ’09).
  • 3. Graphical Models A graphical model is a multivariate probability distribution expressed in terms of interactions among subsets of variables (e.g. pairwise interactions on the edges of a graph G): P(x) = (1/Z) ∏_{i∈V} ψ_i(x_i) ∏_{{i,j}∈G} ψ_ij(x_i, x_j). Markov property: if S separates A from B, then P(x_A, x_B | x_S) = P(x_A | x_S) P(x_B | x_S). Given the potential functions ψ, the goal of inference is to compute marginals P(x_i) = Σ_{x_{V∖i}} P(x) or the normalization constant Z, which is generally difficult in large, complex graphical models.
  • 4. Gaussian Graphical Model Information form of the Gaussian density: P(x) ∝ exp{−½ xᵀJx + hᵀx}. Gaussian graphical model: sparse J matrix, with J_ij ≠ 0 if and only if {i, j} ∈ G. Potentials: ψ_i(x_i) = exp{−½ J_ii x_i² + h_i x_i} and ψ_ij(x_i, x_j) = exp{−J_ij x_i x_j}. Inference corresponds to calculation of the mean vector μ = J⁻¹h, the covariance matrix K = J⁻¹, or the determinant Z ≜ det J⁻¹. Marginals P(x_i) are specified by the means μ_i and variances K_ii.
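A quick numerical sketch of these definitions (plain NumPy; the 3-node chain, its weights, and the variable names are my own illustration, not from the talk):

```python
import numpy as np

# Information-form Gaussian on a 3-node chain: sparse J, potential vector h.
J = np.array([[1.0, 0.3, 0.0],
              [0.3, 1.0, 0.3],
              [0.0, 0.3, 1.0]])
h = np.array([1.0, 0.0, -1.0])

K = np.linalg.inv(J)      # covariance matrix K = J^{-1}
mu = K @ h                # mean vector mu = J^{-1} h
Z = np.linalg.det(K)      # determinant Z = det J^{-1}, as defined on the slide

# The marginal of x_i is Gaussian with mean mu[i] and variance K[i, i].
print(mu, np.diag(K), Z)
```

Note that J_02 = 0, matching the absence of an edge {1, 3} in the chain.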
  • 5. Belief Propagation Belief propagation iteratively updates a set of messages μ_{i→j}(x_j), defined on the directed edges of the graph G, using the rule: μ_{i→j}(x_j) ∝ Σ_{x_i} ψ_i(x_i) ψ_ij(x_i, x_j) ∏_{k∈N(i)∖j} μ_{k→i}(x_i). Iterate the message updates until they converge to a fixed point. Marginal estimates combine the messages at a node: P̂(x_i) = (1/Z_i) ψ_i(x_i) ∏_{k∈N(i)} μ_{k→i}(x_i) ≜ (1/Z_i) ψ̃_i(x_i).
  • 6. Belief Propagation II Pairwise estimates (on the edges of the graph): P̂(x_i, x_j) = (1/Z_ij) ψ̃_i(x_i) ψ̃_j(x_j) ψ̃_ij(x_i, x_j), where ψ̃_ij(x_i, x_j) ≜ ψ_ij(x_i, x_j) / (μ_{i→j}(x_j) μ_{j→i}(x_i)). Estimate of the normalization constant: Z_bp = ∏_{i∈V} Z_i ∏_{{i,j}∈G} Z_ij / (Z_i Z_j). A BP fixed point is a saddle point of the RHS with respect to messages/reparameterizations. In trees, BP converges in a finite number of steps and is exact (equivalent to variable elimination).
  • 7. Gaussian Belief Propagation (GaBP) Messages μ_{i→j}(x_j) ∝ exp{½ α_{i→j} x_j² + β_{i→j} x_j}. The BP fixed-point equations reduce to: α_{i→j} = J_ij² (J_ii − α_{i∖j})⁻¹ and β_{i→j} = −J_ij (J_ii − α_{i∖j})⁻¹ (h_i + β_{i∖j}), where α_{i∖j} = Σ_{k∈N(i)∖j} α_{k→i} and β_{i∖j} = Σ_{k∈N(i)∖j} β_{k→i}. Marginals are specified by: K_i^bp = (J_ii − Σ_{k∈N(i)} α_{k→i})⁻¹ and μ_i^bp = K_i^bp (h_i + Σ_{k∈N(i)} β_{k→i}).
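These scalar fixed-point equations are short enough to implement directly. A minimal sketch (my own code, not the authors'; it assumes a walk-summable model so the denominators J_ii − α_{i∖j} stay positive, and uses parallel message updates):

```python
import numpy as np

def gabp(J, h, iters=100):
    """Scalar Gaussian BP with the slide's fixed-point updates:
    alpha_{i->j} = J_ij^2 / (J_ii - alpha_{i\\j}),
    beta_{i->j}  = -J_ij (h_i + beta_{i\\j}) / (J_ii - alpha_{i\\j}),
    where alpha_{i\\j}, beta_{i\\j} sum messages into i, excluding j's."""
    n = J.shape[0]
    edges = [(i, j) for i in range(n) for j in range(n)
             if i != j and J[i, j] != 0.0]
    alpha = {e: 0.0 for e in edges}
    beta = {e: 0.0 for e in edges}
    for _ in range(iters):
        new_a, new_b = {}, {}
        for (i, j) in edges:
            a = sum(alpha[(k, t)] for (k, t) in edges if t == i and k != j)
            b = sum(beta[(k, t)] for (k, t) in edges if t == i and k != j)
            d = J[i, i] - a
            new_a[(i, j)] = J[i, j] ** 2 / d
            new_b[(i, j)] = -J[i, j] * (h[i] + b) / d
        alpha, beta = new_a, new_b
    # Combine all incoming messages at each node for the marginals.
    K = np.empty(n)
    mu = np.empty(n)
    for i in range(n):
        a = sum(alpha[(k, t)] for (k, t) in edges if t == i)
        b = sum(beta[(k, t)] for (k, t) in edges if t == i)
        K[i] = 1.0 / (J[i, i] - a)
        mu[i] = K[i] * (h[i] + b)
    return mu, K

# On a tree (here a 3-node chain) GaBP is exact, as the slides note.
J = np.array([[1.0, 0.3, 0.0],
              [0.3, 1.0, 0.3],
              [0.0, 0.3, 1.0]])
h = np.array([1.0, 0.0, -1.0])
mu, K = gabp(J, h)
```

On this chain, `mu` and `K` should match the exact J⁻¹h and diag(J⁻¹).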
  • 8. Gaussian BP Determinant Estimate Estimates of the pairwise covariance on edges: K_(ij)^bp = [ J_ii − α_{i∖j}, J_ij ; J_ij, J_jj − α_{j∖i} ]⁻¹. Estimate of Z ≜ det K = det J⁻¹: Z_bp = ∏_{i∈V} Z_i ∏_{{i,j}∈G} Z_ij / (Z_i Z_j), where Z_i = K_i^bp and Z_ij = det K_(ij)^bp. Exact in tree models (equivalent to Gaussian elimination), approximate in loopy models.
  • 9. The BP Computation Tree BP marginal estimates are equivalent to the exact marginals in a tree-structured model [Weiss & Freeman]. [Figure: the computation tree of a 3×3 grid rooted at node 1, with messages μ_{2→1}^(4), μ_{3→2}^(3), μ_{6→3}^(2), μ_{5→6}^(1) passed up successive levels.] The BP messages correspond to upward variable-elimination steps in this computation tree.
  • 10. Walk-Summable Gaussian Models Let J = I − R. If ρ(R) < 1 then (I − R)⁻¹ = Σ_{L=0}^∞ R^L. Walk-sum interpretation of inference: K_ij = Σ_{L=0}^∞ Σ_{w:i→j, |w|=L} R^w = Σ_{w:i→j} R^w and μ_i = Σ_j h_j Σ_{w:j→i} R^w = Σ_{w:∗→i} h_∗ R^w. The model is walk-summable if Σ_{w:i→j} |R^w| converges for all i, j. Absolute convergence implies convergence of the walk-sums (to the same value) for arbitrary orderings and partitions of the set of walks. Equivalent to ρ(|R|) < 1.
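The Neumann series above is easy to check numerically. A small sketch (my own example matrix; the walk-summability test ρ(|R|) < 1 is the slide's criterion):

```python
import numpy as np

# A walk-summable example: symmetric R with spectral radius of |R| below 1.
R = np.array([[0.0, 0.2, 0.1],
              [0.2, 0.0, 0.2],
              [0.1, 0.2, 0.0]])
assert max(abs(np.linalg.eigvals(np.abs(R)))) < 1  # rho(|R|) < 1

# Partial walk-sums: sum_{L=0}^{N-1} R^L accumulates walks of length < N;
# entry (i, j) of R^L sums the weights R^w over length-L walks i -> j.
K_exact = np.linalg.inv(np.eye(3) - R)
S = np.zeros((3, 3))
P = np.eye(3)
for L in range(200):
    S += P
    P = P @ R
print(np.max(np.abs(S - K_exact)))
```

The partial sums converge geometrically at rate ρ(R), so 200 terms are far more than enough here.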
  • 11. Walk-Sum Interpretation of GaBP Combine the interpretation of BP as exact inference on the computation tree with the walk-sum interpretation of Gaussian inference in trees: messages represent walk-sums in subtrees of the computation tree. Gaussian BP converges in walk-summable models, computing a complete walk-sum for the means but an incomplete walk-sum for the variances.
  • 12. Complete Walk-Sum for Means Every walk in G ending at a node i maps to a walk of the computation tree T_i (ending at the root node of T_i). [Figure: a walk in the 3×3 grid and its image in the computation tree rooted at node 1.] Gaussian BP converges to the correct means in walk-summable models.
  • 13. Incomplete Walk-Sum for Variances Only the totally backtracking walks of G can be embedded as closed walks in the computation tree. [Figure: a closed walk around a loop of the 3×3 grid that has no closed embedding in the computation tree.] Gaussian BP converges to incorrect variance estimates (underestimates in non-negative models).
  • 14. Zeta Function and Orbit-Product What about the determinant? Definition of orbits: a walk is closed if it begins and ends at the same vertex; it is primitive if it does not repeat a shorter walk; two primitive walks are equivalent if one is a cyclic shift of the other. Define the orbits ℓ ∈ L of G to be the equivalence classes of closed, primitive walks. Theorem. Let Z ≜ det(I − R)⁻¹. If ρ(|R|) < 1 then Z = ∏_{ℓ∈L} (1 − R_ℓ)⁻¹ ≜ ∏_{ℓ∈L} Z_ℓ, where R_ℓ is the product of edge weights along ℓ. A kind of zeta function in graph theory.
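Behind this orbit-product is the closed-walk expansion of the log-determinant, log det(I − R)⁻¹ = Σ_{L≥1} tr(R^L)/L for ρ(R) < 1; grouping the closed walks into orbits yields the product above. A numerical check of the expansion itself (my own illustration, without the orbit grouping):

```python
import numpy as np

R = np.array([[0.0, 0.2, 0.1],
              [0.2, 0.0, 0.2],
              [0.1, 0.2, 0.0]])

# log Z for Z = det(I - R)^{-1}.
logZ = -np.log(np.linalg.det(np.eye(3) - R))

# Closed-walk expansion: log Z = sum_{L>=1} tr(R^L) / L,
# since tr(R^L) sums the weights of all closed walks of length L.
acc = 0.0
P = R.copy()
for L in range(1, 300):
    acc += np.trace(P) / L
    P = P @ R
print(abs(acc - logZ))
```

The terms decay like ρ(R)^L / L, so the truncated sum agrees with logZ to machine precision here.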
  • 15. Z_bp as Totally-Backtracking Orbit-Product Definition of totally-backtracking orbits: an orbit is reducible if it contains backtracking steps …(ij)(ji)…, else it is irreducible (or backtrackless). Every orbit has a unique irreducible core γ = Γ(ℓ), obtained by iteratively deleting pairs of backtracking steps until no more remain. Let L_γ denote the set of all orbits that reduce to γ. An orbit is totally backtracking (or trivial) if it reduces to the empty orbit, Γ(ℓ) = ∅; else it is non-trivial. Theorem. If ρ(|R|) < 1 then Z_bp (defined earlier) is equal to the totally-backtracking orbit-product: Z_bp = ∏_{ℓ∈L_∅} Z_ℓ.
  • 16. Orbit-Product Correction and Error Bound Orbit-product correction to Z_bp: Z = Z_bp ∏_{ℓ∉L_∅} Z_ℓ. Error bound: the missing orbits must all involve cycles of the graph, so (1/n) log(Z / Z_bp) ≤ ρ^g / (g (1 − ρ)), where ρ ≜ ρ(|R|) < 1 and g is the girth of the graph (the length of its shortest cycle).
  • 17. Reduction to Backtrackless Orbit-Product Correction We may reduce the orbit-product correction to one over just the backtrackless orbits γ: Z = Z_bp ∏_γ (∏_{ℓ∈L_γ} Z_ℓ) = Z_bp ∏_γ Z′_γ, with modified orbit-factors Z′_γ based on GaBP: Z′_γ = (1 − ∏_{(ij)∈γ} r′_ij)⁻¹, where r′_ij ≜ (1 − α_{i∖j})⁻¹ r_ij. The factor (1 − α_{i∖j})⁻¹ serves to reconstruct the totally-backtracking walks at each point i along the backtrackless orbit γ.
  • 18. Backtrackless Determinant Correction Define the backtrackless graph G′ of G as follows: the nodes of G′ correspond to the directed edges of G, with edges (ij) → (jk) for k ≠ i. [Figure: the backtrackless graph of the 3×3 grid, with nodes labeled by directed edges 12, 21, 23, 32, ….] Let R′ be the adjacency matrix of G′ with modified edge weights r′ based on GaBP. Then Z = Z_bp det(I − R′)⁻¹.
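A minimal sketch of this backtrackless-graph construction (my own code, with unit weights rather than the GaBP-modified r′ the slide uses; the triangle check is my own illustration):

```python
import numpy as np

def backtrackless_adjacency(edges):
    """Adjacency matrix of the backtrackless graph G': one node per
    directed edge (i,j) of G, and an edge (ij) -> (jk) whenever k != i."""
    darts = [(i, j) for (i, j) in edges] + [(j, i) for (i, j) in edges]
    idx = {d: t for t, d in enumerate(darts)}
    B = np.zeros((len(darts), len(darts)))
    for (i, j) in darts:
        for (a, k) in darts:
            if a == j and k != i:          # step (ij) -> (jk), no backtrack
                B[idx[(i, j)], idx[(j, k)]] = 1.0
    return darts, B

# On a triangle, each directed edge has exactly one non-backtracking
# successor, so B is a permutation splitting into two 3-cycles: B^3 = I.
darts, B = backtrackless_adjacency([(0, 1), (1, 2), (2, 0)])
B3 = np.linalg.matrix_power(B, 3)
```

In the determinant correction, the same structure would carry the GaBP-based weights r′_ij in place of the ones.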
  • 19. Region-Based Estimates/Corrections Select a set of regions R ⊂ 2^V that is closed under intersections and covers all vertices and edges of G. Define the region counts (n_A ∈ ℤ, A ∈ R) by the inclusion-exclusion rule: n_A = 1 − Σ_{B∈R, B⊋A} n_B. To capture all orbits covered by any region (without over-counting) we calculate the estimate: Z_R ≜ ∏_B Z_B^{n_B} = ∏_B (det(I − R_B)⁻¹)^{n_B}. Error bounds: select regions to cover all orbits up to length L; then (1/n) |log(Z_R / Z)| ≤ ρ^L / (L (1 − ρ)).
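The inclusion-exclusion rule for the counts is easy to mechanize. A small sketch (function name and the K₄ check are my own; the check reproduces the BP-region counts quoted on slide 24, edges n = +1 and vertices n = −2):

```python
def region_counts(regions):
    """Inclusion-exclusion counts n_A = 1 - sum_{B strictly containing A} n_B,
    computed from the largest regions downward."""
    regs = [frozenset(r) for r in regions]
    order = sorted(regs, key=len, reverse=True)  # supersets counted first
    n = {}
    for A in order:
        n[A] = 1 - sum(n[B] for B in order if A < B)  # A < B: proper subset
    return n

# BP regions for K4: the six edges plus the four single-vertex regions
# (the vertices are the pairwise intersections of the edges).
edges = [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3)]
regions = [set(e) for e in edges] + [{v} for v in range(4)]
n = region_counts(regions)
```

Each edge has no strict superset among the regions, so n = +1; each vertex lies in three edges, so n = 1 − 3 = −2.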
  • 20. Example: 2-D Grids Choice of regions for grids: overlapping L × L blocks shifted by L/2, whose intersections are L/2 × L, L × L/2, and L/2 × L/2 blocks. [Figure: in a 6 × 6 grid with block size L = 4, the L × L blocks have n = +1, their overlaps n = −1, and the overlaps of overlaps n = +1.]
  • 21. 256 × 256 Periodic Grid, uniform edge weights r ∈ [0, .25]. Test with L = 2, 4, 8, 16, 32. [Figure: panels plotting ρ(|R|) and ρ(|R′|) versus r; n⁻¹ log Z_true, n⁻¹ log Z_bp, and n⁻¹ log Z_B (L = 2, 4, 8, …) versus r; and the errors n⁻¹|log Z_true⁻¹ Z_B| and n⁻¹|log Z_true⁻¹ Z_bp Z′_B| versus r and versus L.]
  • 22. Generalized Belief Propagation Select a set of regions R ⊂ 2^V that is closed under intersections and covers all vertices and edges of G. Define the region counts (n_A ∈ ℤ, A ∈ R) by the inclusion-exclusion rule: n_A = 1 − Σ_{B∈R, B⊋A} n_B. Then GBP solves for a saddle point of Z_R(ψ) ≜ ∏_{A∈R} Z(ψ_A)^{n_A} over reparameterizations {ψ_A, A ∈ R} of the form P(x) = (1/Z) ∏_{A∈R} ψ_A(x_A)^{n_A}. Denote the saddle point by Z_gbp = Z_R(ψ^gbp).
  • 23. Example: 2-D Grid Revisited [Figure: free-energy error versus block size (4 to 16) for the block estimate and the GBP estimate, on a log scale from 10⁻² down to 10⁻¹⁶.]
  • 24. GBP Toy Example Look at the graph G = K₄ and consider different choices of regions. BP regions: the six edges, each with n = +1, and the four vertices, each with n = −2.
  • 25. GBP “3∆” regions: the triangles 124, 123, 134 with n = +1, their pairwise intersections (the edges 12, 14, 13) with n = −1, and the vertex 1 with n = +1. GBP “4∆” regions: all four triangles 124, 123, 134, 234 with n = +1, the six edges with n = −1, and the four vertices with n = +1.
  • 26. Computational Experiment With equal edge weights r = .32 (the model becomes singular/indefinite for r ≥ 1/3): Z = 10.9, Z_bp = 2.5, Z_gbp(3∆) = 9.9, Z_gbp(4∆) = 54.4!!! GBP with the 3∆ regions is a big improvement over BP (GBP captures more orbits). What went wrong with the 4∆ method?
  • 27. Orbit-Product Interpretation of GBP Answer: sometimes GBP can over-count orbits of the graph. Let T(R) be the set of hypertrees T one may construct from the regions R. An orbit ℓ spans T if we can embed ℓ in T but cannot embed it in any sub-hypertree of T. Let g_ℓ ≜ #{T ∈ T(R) | ℓ spans T}. Orbit-product interpretation of GBP: Z_gbp = ∏_ℓ Z_ℓ^{g_ℓ}. Remark: GBP may also include multiples of an orbit as independent orbits (these are not counted by Z). We say GBP is consistent if g_ℓ ≤ 1 for all (primitive) orbits and g_ℓ = 0 for multiples of orbits (no over-counting).
  • 28. Examples of Over-Counting Orbit ℓ = [(12)(23)(34)(41)]: [figure of its embeddings in hypertrees built from the triangle regions 124, 123, 134, 234 of K₄]. Orbit ℓ = [(12)(23)(34)(42)(21)]: [figure of its embeddings].
  • 29. Conclusion and Future Work A graphical view of inference in walk-summable Gaussian graphical models that is very intuitive for understanding iterative inference algorithms and approximation methods. Future work: many open questions on GBP; a multiscale method to approximate longer orbits from a coarse-grained model; beyond walk-summability?