Part 2: Introduction to Graphical Models

Sebastian Nowozin and Christoph H. Lampert

Colorado Springs, 25th June 2011
Graphical Models

Introduction

    Model: relating observations x to quantities of interest y
    Example 1: given an RGB image x, infer a depth y for each pixel
    Example 2: given an RGB image x, infer the presence and positions y of
    all objects shown

    [Figure: a mapping f : X → Y; X: image, Y: object annotations]
Introduction (cont)

    General case: mapping x ∈ X to y ∈ Y
    Graphical models are a concise language to define this mapping
    The mapping can be ambiguous: measurement noise, lack of
    well-posedness (e.g. occlusions)
    Probabilistic graphical models: define a distribution p(y|x) or
    p(x, y) for all y ∈ Y

    [Figure: instead of a single value f(x), the model provides a
    distribution p(Y | X = x) over Y]
Graphical Models

    A graphical model defines
        a family of probability distributions over a set of random
        variables,
        by means of a graph,
        so that the random variables satisfy conditional independence
        assumptions encoded in the graph.

    Popular classes of graphical models:
        Undirected graphical models (Markov random fields),
        Directed graphical models (Bayesian networks),
        Factor graphs,
        Others: chain graphs, influence diagrams, etc.
Bayesian Networks

    Graph: G = (V, E), E ⊂ V × V, directed and acyclic
    Variable domains Y_i
    Factorization over distributions, by conditioning on parent nodes:

        p(Y = y) = \prod_{i \in V} p(y_i | y_{pa_G(i)})

    Example (a simple Bayes net with edges Y_i → Y_k, Y_j → Y_k, Y_k → Y_l):

        p(Y = y) = p(Y_l = y_l | Y_k = y_k) p(Y_k = y_k | Y_i = y_i, Y_j = y_j)
                   p(Y_i = y_i) p(Y_j = y_j).

    The graph thus defines a family of distributions.
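To make the factorization concrete, the following minimal sketch (our own illustration, not code from the tutorial; all conditional probability table entries are invented) evaluates the joint of the example network as a product of local conditionals:

```python
import itertools

# Example network Y_i -> Y_k <- Y_j, Y_k -> Y_l, with binary variables.
p_i = {0: 0.6, 1: 0.4}                       # p(Y_i)
p_j = {0: 0.7, 1: 0.3}                       # p(Y_j)
p_k = {(0, 0): {0: 0.9, 1: 0.1},             # p(Y_k | Y_i, Y_j)
       (0, 1): {0: 0.5, 1: 0.5},
       (1, 0): {0: 0.4, 1: 0.6},
       (1, 1): {0: 0.2, 1: 0.8}}
p_l = {0: {0: 0.8, 1: 0.2},                  # p(Y_l | Y_k)
       1: {0: 0.3, 1: 0.7}}

def joint(yi, yj, yk, yl):
    """p(Y = y): the product of the local conditionals."""
    return p_l[yk][yl] * p_k[(yi, yj)][yk] * p_i[yi] * p_j[yj]

# Because each node carries a normalized conditional, the joint is
# automatically normalized: it sums to 1 over all 2^4 states.
total = sum(joint(*y) for y in itertools.product((0, 1), repeat=4))
print(total)   # -> 1.0 (up to floating point)
```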
Undirected Graphical Models

    = Markov random field (MRF) = Markov network
    Graph: G = (V, E), E ⊂ V × V, undirected, no self-edges
    Variable domains Y_i
    Factorization over potentials ψ at cliques:

        p(y) = \frac{1}{Z} \prod_{C \in \mathcal{C}(G)} \psi_C(y_C)

    Normalizing constant Z = \sum_{y \in \mathcal{Y}} \prod_{C \in \mathcal{C}(G)} \psi_C(y_C)
    Example (a simple MRF, the chain Y_i — Y_j — Y_k):

        p(y) = \frac{1}{Z} \psi_i(y_i) \psi_j(y_j) \psi_k(y_k) \psi_{i,j}(y_i, y_j) \psi_{j,k}(y_j, y_k)
Example 1

    The chain Y_i — Y_j — Y_k.
    Cliques \mathcal{C}(G): vertex sets V' ⊆ V that are fully connected,
    i.e. E ∩ (V' × V') = V' × V'
    Here \mathcal{C}(G) = {{i}, {i,j}, {j}, {j,k}, {k}}, hence

        p(y) = \frac{1}{Z} \psi_i(y_i) \psi_j(y_j) \psi_k(y_k) \psi_{i,j}(y_i, y_j) \psi_{j,k}(y_j, y_k)
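As a sanity check of the clique factorization, here is a hedged sketch (our own, with made-up potential tables) that evaluates Z and p(y) for this chain by brute-force enumeration:

```python
import itertools

# Brute-force evaluation of the chain MRF Y_i - Y_j - Y_k with binary
# variables. Potential values are invented; in real models they come
# from a learned parameterization.
psi_i = [1.0, 2.0]                       # psi_i(y_i)
psi_j = [1.5, 0.5]                       # psi_j(y_j)
psi_k = [1.0, 1.0]                       # psi_k(y_k)
psi_ij = [[2.0, 0.5], [0.5, 2.0]]        # psi_{i,j}(y_i, y_j)
psi_jk = [[2.0, 0.5], [0.5, 2.0]]        # psi_{j,k}(y_j, y_k)

def unnormalized(yi, yj, yk):
    return (psi_i[yi] * psi_j[yj] * psi_k[yk]
            * psi_ij[yi][yj] * psi_jk[yj][yk])

# Partition function: sum the clique product over all 2^3 joint states.
Z = sum(unnormalized(*y) for y in itertools.product((0, 1), repeat=3))

def p(yi, yj, yk):
    return unnormalized(yi, yj, yk) / Z

print(Z, p(0, 0, 0))
```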
Example 2

    The fully connected graph on {Y_i, Y_j, Y_k, Y_l}.
    Here \mathcal{C}(G) = 2^V : all subsets of V are cliques, hence

        p(y) = \frac{1}{Z} \prod_{A \in 2^{\{i,j,k,l\}}} \psi_A(y_A).
Factor Graphs

    Graph: G = (V, F, E), E ⊆ V × F
        variable nodes V,
        factor nodes F,
        edges E between variable and factor nodes,
        scope of a factor: N(F) = {i ∈ V : (i, F) ∈ E}
    Variable domains Y_i
    Factorization over potentials ψ at factors:

        p(y) = \frac{1}{Z} \prod_{F \in \mathcal{F}} \psi_F(y_{N(F)})

    Normalizing constant Z = \sum_{y \in \mathcal{Y}} \prod_{F \in \mathcal{F}} \psi_F(y_{N(F)})

    [Figure: a factor graph over Y_i, Y_j, Y_k, Y_l]
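The definition suggests a direct data structure. Below is a hedged sketch of a factor-graph container of our own design (the tutorial does not prescribe any API): each factor stores its scope N(F) and a table of potential values, and the partition function is computed by brute-force enumeration, which is only feasible for tiny models:

```python
import itertools

class FactorGraph:
    """Tiny brute-force factor graph: p(y) = (1/Z) * prod_F psi_F(y_N(F))."""

    def __init__(self, domains):
        self.domains = domains           # dict: variable name -> list of states
        self.factors = []                # list of (scope tuple, potential table)

    def add_factor(self, scope, table):
        self.factors.append((tuple(scope), table))

    def unnormalized(self, assignment):
        value = 1.0
        for scope, table in self.factors:
            value *= table[tuple(assignment[v] for v in scope)]
        return value

    def partition_function(self):
        names = list(self.domains)
        return sum(
            self.unnormalized(dict(zip(names, states)))
            for states in itertools.product(*(self.domains[v] for v in names)))

# Usage: two binary variables, one unary and one pairwise factor.
fg = FactorGraph({"i": [0, 1], "j": [0, 1]})
fg.add_factor(("i",), {(0,): 1.0, (1,): 2.0})
fg.add_factor(("i", "j"), {(0, 0): 2.0, (0, 1): 0.5, (1, 0): 0.5, (1, 1): 2.0})
print(fg.partition_function())
```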
Why factor graphs?

    [Figure: three graphical models over Y_i, Y_j, Y_k, Y_l with the same
    connectivity but different factorizations]

    Factor graphs are explicit about the factorization
    Hence, they are easier to work with
    Universal (just like MRFs and Bayesian networks)
Capacity

    [Figure: two factor graphs over Y_i, Y_j, Y_k, Y_l with different sets
    of factors]

    A factor graph defines a family of distributions
    Some families are larger than others
Four remaining pieces

    1. Conditional distributions (CRFs)
    2. Parameterization
    3. Test-time inference
    4. Learning the model from training data
Conditional Distributions

    We have discussed p(y); how do we define p(y|x)?
    Potentials become a function of x_{N(F)}
    The partition function depends on x
    Conditional random fields (CRFs): x is not part of the probability
    model, i.e. it is not treated as a random variable

        p(y) = \frac{1}{Z} \prod_{F \in \mathcal{F}} \psi_F(y_{N(F)})

        p(y | x) = \frac{1}{Z(x)} \prod_{F \in \mathcal{F}} \psi_F(y_{N(F)}; x_{N(F)})

    [Figure: a conditional distribution with observed nodes X_i, X_j
    attached to output nodes Y_i, Y_j]
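The only change relative to the unconditional model is that the potentials, and hence the partition function, depend on x. A hedged sketch (invented features and weights, our own illustration) for a two-variable CRF:

```python
import itertools
import math

def psi_unary(y, x):
    # Log-linear unary potential psi(y; x) = exp(-E(y; x)) with an
    # illustrative energy E(y; x) = -w[y] * x.
    w = [0.5, -0.5]
    return math.exp(w[y] * x)

def psi_pair(yi, yj):
    return 2.0 if yi == yj else 0.5

def p_y_given_x(y, x):
    def unnorm(yi, yj):
        return psi_unary(yi, x) * psi_unary(yj, x) * psi_pair(yi, yj)
    # Z(x) must be recomputed for every observation x.
    Zx = sum(unnorm(a, b) for a, b in itertools.product((0, 1), repeat=2))
    return unnorm(*y) / Zx

print(p_y_given_x((0, 0), x=1.3))
print(p_y_given_x((0, 0), x=0.2))   # different x -> different Z(x)
```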
Potentials and Energy Functions

    For each factor F ∈ \mathcal{F}, let \mathcal{Y}_F = \times_{i \in N(F)} \mathcal{Y}_i and define an
    energy function

        E_F : \mathcal{Y}_F \to \mathbb{R}.

    Potentials and energies are related by (assuming ψ_F(y_F) > 0)

        \psi_F(y_F) = \exp(-E_F(y_F))   and   E_F(y_F) = -\log \psi_F(y_F).

    Then p(y) can be written as

        p(Y = y) = \frac{1}{Z} \prod_{F \in \mathcal{F}} \psi_F(y_F)
                 = \frac{1}{Z} \exp\big(-\sum_{F \in \mathcal{F}} E_F(y_F)\big).

    Hence, p(y) is completely determined by the energy
    E(y) = \sum_{F \in \mathcal{F}} E_F(y_F).
Energy Minimization

        \operatorname{argmax}_{y \in \mathcal{Y}} p(Y = y)
            = \operatorname{argmax}_{y \in \mathcal{Y}} \frac{1}{Z} \exp\big(-\sum_{F \in \mathcal{F}} E_F(y_F)\big)
            = \operatorname{argmax}_{y \in \mathcal{Y}} \exp\big(-\sum_{F \in \mathcal{F}} E_F(y_F)\big)
            = \operatorname{argmax}_{y \in \mathcal{Y}} \big(-\sum_{F \in \mathcal{F}} E_F(y_F)\big)
            = \operatorname{argmin}_{y \in \mathcal{Y}} \sum_{F \in \mathcal{F}} E_F(y_F)
            = \operatorname{argmin}_{y \in \mathcal{Y}} E(y).

    Energy minimization can be interpreted as solving for the most likely
    state of some factor graph model.
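For tiny state spaces the argmin can be found by enumeration. A hedged sketch (made-up energy tables; real models need dedicated algorithms such as graph cuts or message passing):

```python
import itertools

# Exhaustive MAP inference / energy minimization for a 3-variable chain.
unary = [[0.0, 1.5], [0.5, 0.5], [2.0, 0.0]]     # E_i(y_i)

def pairwise(a, b):                               # Potts-style pair energy
    return 0.0 if a == b else 1.0

def energy(y):
    return (sum(unary[i][yi] for i, yi in enumerate(y))
            + sum(pairwise(y[i], y[i + 1]) for i in range(len(y) - 1)))

y_map = min(itertools.product((0, 1), repeat=3), key=energy)
print(y_map, energy(y_map))   # the minimizer y* and its energy E(y*)
```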
Parameterization

    Factor graphs define a family of distributions
    Parameterization: identifying individual members of the family by a
    parameter vector w

    [Figure: distributions indexed by w (e.g. p_{w_1}, p_{w_2}) as points
    inside the family of distributions]
Example: Parameterization

    Image segmentation model
    Pairwise "Potts" energy function E_F(y_i, y_j; w_1),

        E_F : \{0,1\} \times \{0,1\} \times \mathbb{R} \to \mathbb{R},
        E_F(0, 0; w_1) = E_F(1, 1; w_1) = 0,
        E_F(0, 1; w_1) = E_F(1, 0; w_1) = w_1.
Example: Parameterization (cont)

    Image segmentation model
    Unary energy function E_F(y_i; x, w),

        E_F : \{0,1\} \times \mathcal{X} \times \mathbb{R}^{\{0,1\} \times D} \to \mathbb{R},
        E_F(0; x, w) = \langle w(0), \psi_F(x) \rangle,
        E_F(1; x, w) = \langle w(1), \psi_F(x) \rangle.

    Features \psi_F : \mathcal{X} \to \mathbb{R}^D, e.g. image filters
Example: Parameterization (cont)

    [Figure: grid-structured factor graph; each unary factor has energies
    \langle w(0), \psi_F(x) \rangle and \langle w(1), \psi_F(x) \rangle,
    each pairwise factor the Potts energy table (0, w_1; w_1, 0)]

    Total number of parameters: D + D + 1
    Parameters are shared across factors, but the energies differ because
    the features ψ_F(x) differ
    General form, linear in w:

        E_F(y_F; x_F, w) = \langle w(y_F), \psi_F(x_F) \rangle
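A hedged sketch of this shared, linear-in-w parameterization (all numbers invented, our own illustration): every unary factor reuses the same weight vectors w(0), w(1) but sees factor-specific features ψ_F(x), and all pairwise factors share the single Potts weight w_1, giving D + D + 1 parameters for D = 3 features.

```python
import numpy as np

D = 3
w = {0: np.array([0.2, -0.1, 0.4]),    # w(0)
     1: np.array([-0.3, 0.5, 0.1])}    # w(1)
w1 = 0.8                                # Potts parameter

def unary_energy(label, features):
    """E_F(label; x, w) = <w(label), psi_F(x)>."""
    return float(w[label] @ features)

def potts_energy(yi, yj):
    return 0.0 if yi == yj else w1

psi_F = np.array([1.0, 0.2, -0.5])      # psi_F(x) for one particular factor
print(unary_energy(0, psi_F), unary_energy(1, psi_F), potts_energy(0, 1))
```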
Test-time Inference

Making Predictions

    Making predictions: given x ∈ X, predict y ∈ Y
    How do we measure the quality of a prediction, or of a prediction
    function f : X → Y?
Loss Function

    Define a loss function

        \Delta : \mathcal{Y} \times \mathcal{Y} \to \mathbb{R}_+,

    so that Δ(y, y^*) measures the loss incurred by predicting y when y^*
    is the true label.
    The loss function is application dependent.
Test-time Inference

    Loss function Δ(y, f(x)): correct label y, prediction f(x), with
    Δ : \mathcal{Y} \times \mathcal{Y} \to \mathbb{R}
    True joint distribution d(x, y) and true conditional d(y|x)
    Model distribution p(y|x)
    Expected loss as the quality of a prediction:

        R^\Delta_f(x) = E_{y \sim d(y|x)} [\Delta(y, f(x))]
                      = \sum_{y \in \mathcal{Y}} d(y|x) \, \Delta(y, f(x))
                      \approx E_{y \sim p(y|x;w)} [\Delta(y, f(x))],

    where the approximation assumes that p(y|x; w) ≈ d(y|x).
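Under the model, the expectation is a finite sum, so for small Y it can be evaluated exactly. A hedged sketch with an invented model conditional over two binary variables:

```python
# Expected loss of a fixed prediction under the model distribution,
# R(x) ~= E_{y ~ p(y|x;w)} loss(y, prediction), by explicit enumeration.
p = {(0, 0): 0.5, (0, 1): 0.2, (1, 0): 0.2, (1, 1): 0.1}   # made-up p(y|x;w)

def expected_loss(prediction, loss):
    return sum(prob * loss(y, prediction) for y, prob in p.items())

zero_one = lambda y, y_star: 0.0 if y == y_star else 1.0
print(expected_loss((0, 0), zero_one))   # -> 0.5
```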
Example 1: 0/1 loss

    Loss 0 iff perfectly predicted, 1 otherwise:

        \Delta_{0/1}(y, y^*) = I(y \neq y^*) =
            \begin{cases} 0 & \text{if } y = y^* \\ 1 & \text{otherwise} \end{cases}

    Plugging it in,

        y^* := \operatorname{argmin}_{y' \in \mathcal{Y}} E_{y \sim p(y|x)} [\Delta_{0/1}(y, y')]
             = \operatorname{argmax}_{y \in \mathcal{Y}} p(y|x)
             = \operatorname{argmin}_{y \in \mathcal{Y}} E(y, x).

    Minimizing the expected 0/1 loss → MAP prediction (energy
    minimization)
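A hedged numerical check, on the same invented toy distribution as before, that the minimizer of the expected 0/1 loss is exactly the argmax of p(y|x):

```python
p = {(0, 0): 0.5, (0, 1): 0.2, (1, 0): 0.2, (1, 1): 0.1}   # made-up p(y|x)

def risk_01(y_star):
    # E_{y ~ p} [Delta_{0/1}(y, y_star)] = 1 - p(y_star | x)
    return sum(prob for y, prob in p.items() if y != y_star)

y_best = min(p, key=risk_01)
print(y_best, y_best == max(p, key=p.get))   # -> (0, 0) True
```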
Example 2: Hamming loss

    Count the fraction of mislabeled variables:

        \Delta_H(y, y^*) = \frac{1}{|V|} \sum_{i \in V} I(y_i \neq y_i^*)

    Plugging it in,

        y^* := \operatorname{argmin}_{y' \in \mathcal{Y}} E_{y \sim p(y|x)} [\Delta_H(y, y')]
             = \big( \operatorname{argmax}_{y_i \in \mathcal{Y}_i} p(y_i|x) \big)_{i \in V}

    Minimizing the expected Hamming loss → maximum posterior marginal
    (MPM, Max-Marg) prediction
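MPM prediction decomposes over variables: compute each marginal, then take its argmax. A hedged sketch on the same invented toy distribution:

```python
p = {(0, 0): 0.5, (0, 1): 0.2, (1, 0): 0.2, (1, 1): 0.1}   # made-up p(y|x)

def marginal(i):
    """mu_i(y_i): sum over all joint states consistent with y_i."""
    m = {0: 0.0, 1: 0.0}
    for y, prob in p.items():
        m[y[i]] += prob
    return m

y_mpm = []
for i in range(2):
    m = marginal(i)
    y_mpm.append(max(m, key=m.get))    # componentwise argmax of marginals
print(tuple(y_mpm))                    # -> (0, 0)
```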
Example 3: Squared error

    Assume a vector space on \mathcal{Y}_i (pixel intensities, optical
    flow vectors, etc.).
    Sum of squared errors:

        \Delta_Q(y, y^*) = \frac{1}{|V|} \sum_{i \in V} \| y_i - y_i^* \|^2.

    Plugging it in,

        y^* := \operatorname{argmin}_{y' \in \mathcal{Y}} E_{y \sim p(y|x)} [\Delta_Q(y, y')]
             = \Big( \sum_{y_i \in \mathcal{Y}_i} p(y_i|x) \, y_i \Big)_{i \in V}

    Minimizing the expected squared error → minimum mean squared error
    (MMSE) prediction, i.e. the componentwise posterior mean
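For vector-valued variables the Bayes-optimal prediction under squared error is the posterior mean of each variable. A hedged sketch over an invented discrete posterior on real values:

```python
# Made-up posterior over two real-valued variables (e.g. flow components).
p = {(0.0, 1.0): 0.5, (2.0, 1.0): 0.3, (2.0, 3.0): 0.2}   # p(y | x)

y_mmse = tuple(sum(prob * y[i] for y, prob in p.items()) for i in range(2))
print(y_mmse)   # -> (1.0, 1.4): the componentwise posterior means
```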
Inference Task: Maximum A Posteriori (MAP) Inference

    Definition (Maximum A Posteriori (MAP) Inference)
    Given a factor graph, a parameterization, a weight vector w, and an
    observation x, find

        y^* = \operatorname{argmax}_{y \in \mathcal{Y}} p(Y = y | x, w)
            = \operatorname{argmin}_{y \in \mathcal{Y}} E(y; x, w).
Inference Task: Probabilistic Inference

    Definition (Probabilistic Inference)
    Given a factor graph, a parameterization, a weight vector w, and an
    observation x, compute

        \log Z(x, w) = \log \sum_{y \in \mathcal{Y}} \exp(-E(y; x, w)),
        \mu_F(y_F) = p(Y_F = y_F | x, w),   for all F ∈ \mathcal{F}, y_F ∈ \mathcal{Y}_F.

    This typically includes the variable marginals

        \mu_i(y_i) = p(y_i | x, w).
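For a tiny model both quantities can again be computed by enumeration. A hedged sketch with an invented energy function (real models require belief propagation, variational, or sampling methods):

```python
import itertools
import math

def E(y1, y2):
    # Invented energy of a two-variable binary model given some fixed x, w.
    return 0.5 * y1 - 0.3 * y2 + (0.8 if y1 != y2 else 0.0)

states = list(itertools.product((0, 1), repeat=2))
Z = sum(math.exp(-E(*y)) for y in states)
log_Z = math.log(Z)

def mu(i, value):
    """Variable marginal mu_i(value) = p(y_i = value | x, w)."""
    return sum(math.exp(-E(*y)) for y in states if y[i] == value) / Z

print(log_Z, mu(0, 1), mu(1, 1))
```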
Example: Man-made structure detection

    [Figure: left: input image x; middle: ground truth labeling on
    16-by-16 pixel blocks; right: factor graph model with unary factors
    ψ¹_i, observation factors ψ²_i linking X_i to Y_i, and pairwise
    factors ψ³_{i,k} linking Y_i to Y_k]

    Features: gradient and color histograms
    Model parameters are estimated from ≈ 60 training images
Example: Man-made structure detection (cont)

    [Figure: left: input image x; middle (probabilistic inference):
    visualization of the variable marginals p(y_i = "manmade" | x, w);
    right (MAP inference): joint MAP labeling
    y^* = \operatorname{argmax}_{y \in \mathcal{Y}} p(y | x, w)]
Training

Training the Model

    What can be learned?
        Model structure: the factors
        Model variables: observed variables are fixed, but we can add
        unobserved variables
        Factor energies: the parameters
Training: Overview

    Assume a fully observed, independent and identically distributed (iid)
    sample set

        \{(x^n, y^n)\}_{n=1,\dots,N},   (x^n, y^n) \sim d(X, Y)

    Goal: predict well
    Alternative goal: first model d(y|x) well by p(y|x, w), then predict
    by minimizing the expected loss
Probabilistic Learning

    Problem (Probabilistic Parameter Learning)
    Let d(y|x) be the (unknown) conditional distribution of labels for the
    problem to be solved. For a parameterized conditional distribution
    p(y|x, w) with parameters w ∈ \mathbb{R}^D, probabilistic parameter
    learning is the task of finding a point estimate w^* that makes
    p(y|x, w^*) closest to d(y|x).

    We will discuss probabilistic parameter learning in detail.
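One standard instance of probabilistic parameter learning (our choice of illustration; the slides do not fix a method here) is conditional maximum likelihood, i.e. minimizing the negative conditional log-likelihood of the sample. A hedged one-parameter sketch with invented data and features, using numerical gradients:

```python
import math

data = [(0.5, 1), (1.0, 1), (-0.8, 0), (0.3, 0)]     # invented (x^n, y^n)

def phi(x, y):
    return x if y == 1 else 0.0                       # invented feature

def neg_log_lik(w):
    """Negative conditional log-likelihood under p(y|x,w) ~ exp(w*phi(x,y))."""
    nll = 0.0
    for x, y in data:
        log_Zx = math.log(sum(math.exp(w * phi(x, yy)) for yy in (0, 1)))
        nll -= w * phi(x, y) - log_Zx
    return nll

w, lr = 0.0, 0.5
for _ in range(200):                                   # plain gradient descent
    grad = (neg_log_lik(w + 1e-5) - neg_log_lik(w - 1e-5)) / 2e-5
    w -= lr * grad
print(w, neg_log_lik(w))   # point estimate w* and its training objective
```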
Loss-Minimizing Parameter Learning

    Problem (Loss-Minimizing Parameter Learning)
    Let d(x, y) be the unknown distribution of data and labels, and let
    Δ : \mathcal{Y} \times \mathcal{Y} \to \mathbb{R} be a loss function.
    Loss-minimizing parameter learning is the task of finding a parameter
    value w^* such that the expected prediction risk

        E_{(x,y) \sim d(x,y)} [\Delta(y, f_p(x))]

    is as small as possible, where f_p(x) = \operatorname{argmax}_{y \in \mathcal{Y}} p(y|x, w^*).

    Requires a loss function at training time
    Directly learns a prediction function f_p(x)

01 graphical models

  • 1.
    Graphical Models Factor Graphs Test-time Inference Training Part 2: Introduction to Graphical Models Sebastian Nowozin and Christoph H. Lampert Colorado Springs, 25th June 2011 Sebastian Nowozin and Christoph H. Lampert Part 2: Introduction to Graphical Models
  • 2.
    Graphical Models Factor Graphs Test-time Inference Training Graphical Models Introduction Model: relating observations x to quantities of interest y f Example 1: given RGB image x, infer depth y for each pixel Example 2: given RGB image x, infer X Y presence and positions y of all objects f :X →Y shown Sebastian Nowozin and Christoph H. Lampert Part 2: Introduction to Graphical Models
  • 3.
    Graphical Models Factor Graphs Test-time Inference Training Graphical Models Introduction Model: relating observations x to quantities of interest y f Example 1: given RGB image x, infer depth y for each pixel Example 2: given RGB image x, infer X Y presence and positions y of all objects f :X →Y shown X : image, Y: object annotations Sebastian Nowozin and Christoph H. Lampert Part 2: Introduction to Graphical Models
  • 4.
    Graphical Models Factor Graphs Test-time Inference Training Graphical Models Introduction General case: mapping x ∈ X to y ∈ Y Graphical models are a concise language to define this mapping x Mapping can be ambiguous: f (x) measurement noise, lack of X Y well-posedness (e.g. occlusions) f :X →Y Probabilistic graphical models: define form p(y |x) or p(x, y ) for all y ∈ Y Sebastian Nowozin and Christoph H. Lampert Part 2: Introduction to Graphical Models
  • 5.
    Graphical Models Factor Graphs Test-time Inference Training Graphical Models Introduction General case: mapping x ∈ X to y ∈ Y Graphical models are a concise ? language to define this mapping x Mapping can be ambiguous: ? measurement noise, lack of X Y well-posedness (e.g. occlusions) p(Y |X = x) Probabilistic graphical models: define form p(y |x) or p(x, y ) for all y ∈ Y Sebastian Nowozin and Christoph H. Lampert Part 2: Introduction to Graphical Models
  • 6.
    Graphical Models Factor Graphs Test-time Inference Training Graphical Models Graphical Models A graphical model defines a family of probability distributions over a set of random variables, by means of a graph, so that the random variables satisfy conditional independence assumptions encoded in the graph. Sebastian Nowozin and Christoph H. Lampert Part 2: Introduction to Graphical Models
  • 7.
    Graphical Models Factor Graphs Test-time Inference Training Graphical Models Graphical Models A graphical model defines a family of probability distributions over a set of random variables, by means of a graph, so that the random variables satisfy conditional independence assumptions encoded in the graph. Popular classes of graphical models, Undirected graphical models (Markov random fields), Directed graphical models (Bayesian networks), Factor graphs, Others: chain graphs, influence diagrams, etc. Sebastian Nowozin and Christoph H. Lampert Part 2: Introduction to Graphical Models
  • 8.
    Graphical Models Factor Graphs Test-time Inference Training Graphical Models Bayesian Networks Graph: G = (V , E), E ⊂ V × V Yi Yj directed acyclic Variable domains Yi Yk Factorization p(Y = y ) = p(yi |ypaG (i) ) Yl i∈V over distributions, by conditioning on parent A simple Bayes net nodes. Example p(Y = y ) =p(Yl = yl |Yk = yk )p(Yk = yk |Yi = yi , Yj = yj ) p(Yi = yi )p(Yj = yj ). Sebastian Nowozin and Christoph H. Lampert Family of distributions Part 2: Introduction to Graphical Models
  • 9.
    Graphical Models Factor Graphs Test-time Inference Training Graphical Models Bayesian Networks Graph: G = (V , E), E ⊂ V × V Yi Yj directed acyclic Variable domains Yi Yk Factorization p(Y = y ) = p(yi |ypaG (i) ) Yl i∈V over distributions, by conditioning on parent A simple Bayes net nodes. Example p(Y = y ) =p(Yl = yl |Yk = yk )p(Yk = yk |Yi = yi , Yj = yj ) p(Yi = yi )p(Yj = yj ). Sebastian Nowozin and Christoph H. Lampert Family of distributions Part 2: Introduction to Graphical Models
  • 10.
    Graphical Models Factor Graphs Test-time Inference Training Graphical Models Undirected Graphical Models Yi Yj Yk = Markov random field (MRF) = Markov network A simple MRF Graph: G = (V , E), E ⊂ V × V undirected, no self-edges Variable domains Yi Factorization over potentials ψ at cliques, 1 p(y ) = ψC (yC ) Z C ∈C(G ) Constant Z = y ∈Y C ∈C(G ) ψC (yC ) Example 1 p(y ) = ψi (yi )ψj (yj )ψl (yl )ψi,j (yi , yj ) Z Sebastian Nowozin and Christoph H. Lampert Part 2: Introduction to Graphical Models
  • 11.
    Graphical Models Factor Graphs Test-time Inference Training Graphical Models Undirected Graphical Models Yi Yj Yk = Markov random field (MRF) = Markov network A simple MRF Graph: G = (V , E), E ⊂ V × V undirected, no self-edges Variable domains Yi Factorization over potentials ψ at cliques, 1 p(y ) = ψC (yC ) Z C ∈C(G ) Constant Z = y ∈Y C ∈C(G ) ψC (yC ) Example 1 p(y ) = ψi (yi )ψj (yj )ψl (yl )ψi,j (yi , yj ) Z Sebastian Nowozin and Christoph H. Lampert Part 2: Introduction to Graphical Models
  • 12.
    Graphical Models Factor Graphs Test-time Inference Training Graphical Models Example 1 Yi Yj Yk Cliques C(G ): set of vertex sets V with V ⊆ V , E ∩ (V × V ) = V × V Here C(G ) = {{i}, {i, j}, {j}, {j, k}, {k}} 1 p(y ) = ψi (yi )ψj (yj )ψl (yl )ψi,j (yi , yj ) Z Sebastian Nowozin and Christoph H. Lampert Part 2: Introduction to Graphical Models
  • 13.
    Graphical Models Factor Graphs Test-time Inference Training Graphical Models Example 2 Yi Yj Yk Yl Here C(G ) = 2V : all subsets of V are cliques 1 p(y ) = ψA (yA ). Z A∈2{i,j,k,l} Sebastian Nowozin and Christoph H. Lampert Part 2: Introduction to Graphical Models
  • 14.
    Graphical Models Factor Graphs Test-time Inference Training Factor Graphs Factor Graphs Graph: G = (V , F, E), E ⊆ V × F Yi Yj variable nodes V , factor nodes F , edges E between variable and factor nodes. scope of a factor, N(F ) = {i ∈ V : (i, F ) ∈ E} Yk Yl Variable domains Yi Factorization over potentials ψ at factors, Factor graph 1 p(y ) = ψF (yN(F ) ) Z F ∈F Constant Z = y ∈Y F ∈F ψF (yN(F ) ) Sebastian Nowozin and Christoph H. Lampert Part 2: Introduction to Graphical Models
  • 15.
    Graphical Models Factor Graphs Test-time Inference Training Factor Graphs Factor Graphs Graph: G = (V , F, E), E ⊆ V × F Yi Yj variable nodes V , factor nodes F , edges E between variable and factor nodes. scope of a factor, N(F ) = {i ∈ V : (i, F ) ∈ E} Yk Yl Variable domains Yi Factorization over potentials ψ at factors, Factor graph 1 p(y ) = ψF (yN(F ) ) Z F ∈F Constant Z = y ∈Y F ∈F ψF (yN(F ) ) Sebastian Nowozin and Christoph H. Lampert Part 2: Introduction to Graphical Models
  • 16.
    Graphical Models Factor Graphs Test-time Inference Training Factor Graphs Why factor graphs? Yi Yj Yi Yj Yi Yj Yk Yl Yk Yl Yk Yl Factor graphs are explicit about the factorization Hence, easier to work with Universal (just like MRFs and Bayesian networks) Sebastian Nowozin and Christoph H. Lampert Part 2: Introduction to Graphical Models
  • 17.
    Graphical Models Factor Graphs Test-time Inference Training Factor Graphs Capacity Yi Yj Yi Yj Yk Yl Yk Yl Factor graph defines family of distributions Some families are larger than others Sebastian Nowozin and Christoph H. Lampert Part 2: Introduction to Graphical Models
  • 18.
    Graphical Models Factor Graphs Test-time Inference Training Factor Graphs Four remaining pieces 1. Conditional distributions (CRFs) 2. Parameterization 3. Test-time inference 4. Learning the model from training data Sebastian Nowozin and Christoph H. Lampert Part 2: Introduction to Graphical Models
  • 19.
    Graphical Models Factor Graphs Test-time Inference Training Factor Graphs Four remaining pieces 1. Conditional distributions (CRFs) 2. Parameterization 3. Test-time inference 4. Learning the model from training data Sebastian Nowozin and Christoph H. Lampert Part 2: Introduction to Graphical Models
  • 20.
    Graphical Models Factor Graphs Test-time Inference Training Factor Graphs Conditional Distributions We have discussed p(y ), Xi Xj How do we define p(y |x)? Potentials become a function of xN(F ) Partition function depends on x Yi Yj Conditional random fields (CRFs) x is not part of the probability model, i.e. not conditional treated as random variable distribution Sebastian Nowozin and Christoph H. Lampert Part 2: Introduction to Graphical Models
  • 21.
    Graphical Models Factor Graphs Test-time Inference Training Factor Graphs Conditional Distributions We have discussed p(y ), Xi Xj How do we define p(y |x)? Potentials become a function of xN(F ) Partition function depends on x Yi Yj Conditional random fields (CRFs) x is not part of the probability model, i.e. not conditional treated as random variable distribution 1 p(y ) = ψF (yN(F ) ) Z F ∈F 1 p(y |x) = ψF (yN(F ) ; xN(F ) ) Z (x) F ∈F Sebastian Nowozin and Christoph H. Lampert Part 2: Introduction to Graphical Models
  • 22.
    Graphical Models Factor Graphs Test-time Inference Training Factor Graphs Conditional Distributions We have discussed p(y ), Xi Xj How do we define p(y |x)? Potentials become a function of xN(F ) Partition function depends on x Yi Yj Conditional random fields (CRFs) x is not part of the probability model, i.e. not conditional treated as random variable distribution 1 p(y ) = ψF (yN(F ) ) Z F ∈F 1 p(y |x) = ψF (yN(F ) ; xN(F ) ) Z (x) F ∈F Sebastian Nowozin and Christoph H. Lampert Part 2: Introduction to Graphical Models
  • 23.
Factor Graphs

Potentials and Energy Functions

For each factor F ∈ F, let Y_F = ×_{i∈N(F)} Y_i and E_F : Y_F → R.

Potentials and energies (assume ψ_F(y_F) > 0):
    ψ_F(y_F) = exp(−E_F(y_F)),   and   E_F(y_F) = −log ψ_F(y_F).

Then p(y) can be written as
    p(Y = y) = (1/Z) ∏_{F∈F} ψ_F(y_F)
             = (1/Z) exp(−∑_{F∈F} E_F(y_F)).

Hence, p(y) is completely determined by the energy function E(y) = ∑_{F∈F} E_F(y_F).
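A short numeric check of the potential/energy correspondence, with a made-up (strictly positive) potential table:

    import numpy as np

    psi = np.array([[2.0, 0.5],
                    [0.5, 2.0]])            # illustrative potential table, psi > 0
    E = -np.log(psi)                        # energies  E_F(y_F) = -log psi_F(y_F)
    assert np.allclose(psi, np.exp(-E))     # potentials: psi_F(y_F) = exp(-E_F(y_F))
    print(E)                                # low energy for agreement, high for disagreement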
Factor Graphs

Energy Minimization

    argmax_{y∈Y} p(Y = y) = argmax_{y∈Y} (1/Z) exp(−∑_{F∈F} E_F(y_F))
                          = argmax_{y∈Y} exp(−∑_{F∈F} E_F(y_F))
                          = argmax_{y∈Y} −∑_{F∈F} E_F(y_F)
                          = argmin_{y∈Y} ∑_{F∈F} E_F(y_F)
                          = argmin_{y∈Y} E(y).

Energy minimization can be interpreted as solving for the most likely state of some factor graph model.
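Because the toy state space is tiny, the equivalence can be checked directly: enumerate all assignments and take the energy minimizer. The energy tables below are invented for illustration; realistic models need the dedicated MAP inference algorithms discussed later.

    import itertools
    import numpy as np

    E_pair = np.array([[0.0, 1.0],
                       [1.0, 0.0]])           # Potts-like: cost 1 for disagreement
    E_unary = [np.array([0.2, 0.0]),          # node 1 slightly prefers label 1
               np.array([0.0, 0.0]),
               np.array([0.0, 0.6])]          # node 3 prefers label 0

    def energy(y):
        """E(y): sum of unary and pairwise factor energies on a 3-node chain."""
        return (sum(E_unary[i][y[i]] for i in range(3))
                + E_pair[y[0], y[1]] + E_pair[y[1], y[2]])

    y_map = min(itertools.product([0, 1], repeat=3), key=energy)
    print("MAP labeling:", y_map, "energy:", energy(y_map))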
Factor Graphs

Parameterization

Factor graphs define a family of distributions
Parameterization: identifying individual members of the family by parameters w

[Figure: the set of all distributions, the subset of distributions in the family, and members indexed by parameters, e.g. p_{w1}, p_{w2}]
Factor Graphs

Example: Parameterization

Image segmentation model

Pairwise "Potts" energy function E_F(y_i, y_j; w_1), E_F : {0,1} × {0,1} × R → R,
    E_F(0, 0; w_1) = E_F(1, 1; w_1) = 0
    E_F(0, 1; w_1) = E_F(1, 0; w_1) = w_1

[Figure: image segmentation model]
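The Potts energy transcribes directly into code; the weight in the usage lines is arbitrary:

    def potts_energy(y_i, y_j, w1):
        """Pairwise Potts energy: 0 if the labels agree, w1 if they disagree."""
        return 0.0 if y_i == y_j else w1

    assert potts_energy(0, 0, 2.5) == 0.0
    assert potts_energy(0, 1, 2.5) == 2.5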
Factor Graphs

Example: Parameterization (cont)

Image segmentation model

Unary energy function E_F(y_i; x, w), E_F : {0,1} × X × R^{{0,1}×D} → R,
    E_F(0; x, w) = ⟨w(0), ψ_F(x)⟩
    E_F(1; x, w) = ⟨w(1), ψ_F(x)⟩

Features ψ_F : X → R^D, e.g. image filters

[Figure: image segmentation model]
Factor Graphs

Example: Parameterization (cont)

[Figure: grid model with unary energy tables (⟨w(0), ψ_F(x)⟩, ⟨w(1), ψ_F(x)⟩) at each variable and pairwise Potts tables (0, w_1; w_1, 0) between neighbors]

Total number of parameters: D + D + 1
Parameters are shared between factors, but the energies differ because the features ψ_F(x) differ
General form, linear in w: E_F(y_F; x_F, w) = ⟨w(y_F), ψ_F(x_F)⟩
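A sketch of the general linear form for a unary factor; the feature dimension, weight vectors, and feature values below are random stand-ins for learned weights and actual image features.

    import numpy as np

    D = 4                                    # feature dimension (illustrative)
    rng = np.random.default_rng(0)
    w = {0: rng.normal(size=D),              # w(0): weights for label 0
         1: rng.normal(size=D)}              # w(1): weights for label 1

    def unary_energy(y, phi_x):
        """Linear unary energy E_F(y; x, w) = <w(y), psi_F(x)>."""
        return float(w[y] @ phi_x)

    phi_x = rng.normal(size=D)               # stand-in for features psi_F(x)
    print(unary_energy(0, phi_x), unary_energy(1, phi_x))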
Test-time Inference

Making Predictions

Making predictions: given x ∈ X, predict y ∈ Y (or, equivalently, learn a prediction function f : X → Y)
How do we measure the quality of a prediction?
Test-time Inference

Loss function

Define a loss function ∆ : Y × Y → R_+ such that ∆(y, y*) measures the loss incurred by predicting y when y* is true.
The loss function is application dependent.
Test-time Inference

Test-time Inference

Loss function ∆ : Y × Y → R, where ∆(y, f(x)) compares the correct label y with the prediction f(x)
True joint distribution d(X, Y) with true conditional d(y|x); model distribution p(y|x)

Expected loss as the quality of a prediction:
    R^∆_f(x) = E_{y∼d(y|x)}[∆(y, f(x))] = ∑_{y∈Y} d(y|x) ∆(y, f(x))
             ≈ E_{y∼p(y|x;w)}[∆(y, f(x))],
assuming that p(y|x; w) ≈ d(y|x).
Test-time Inference

Example 1: 0/1 loss

Loss 0 iff perfectly predicted, 1 otherwise:
    ∆_{0/1}(y, y*) = I(y ≠ y*) = { 0 if y = y*, 1 otherwise }

Plugging it in,
    y* := argmin_{y′∈Y} E_{y∼p(y|x)}[∆_{0/1}(y, y′)]
        = argmax_{y∈Y} p(y|x)
        = argmin_{y∈Y} E(y; x).

Minimizing the expected 0/1 loss → MAP prediction (energy minimization)
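A numeric sanity check with a made-up posterior over a tiny label space: the minimizer of the expected 0/1 loss coincides with the MAP state.

    import itertools
    import numpy as np

    rng = np.random.default_rng(1)
    states = list(itertools.product([0, 1], repeat=3))   # a tiny Y
    p = rng.random(len(states))
    p /= p.sum()                                         # invented posterior p(y|x)

    def expected_01_loss(y_hat):
        # E_{y~p}[ 1{y != y_hat} ] = 1 - p(y_hat)
        return 1.0 - p[states.index(y_hat)]

    best = min(states, key=expected_01_loss)
    assert best == states[int(np.argmax(p))]             # equals the MAP state
    print("MAP state:", best)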
Test-time Inference

Example 2: Hamming loss

Count the fraction of mislabeled variables:
    ∆_H(y, y*) = (1/|V|) ∑_{i∈V} I(y_i ≠ y_i*)

Plugging it in,
    y* := argmin_{y′∈Y} E_{y∼p(y|x)}[∆_H(y, y′)],
    with solution y_i* = argmax_{y_i∈Y_i} p(y_i|x) for each i ∈ V.

Minimizing the expected Hamming loss → maximum posterior marginal (MPM, Max-Marg) prediction
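The analogous check for the Hamming loss, again with an invented posterior: the per-variable argmax of the marginals (MPM) matches the brute-force minimizer of the expected Hamming loss (up to ties).

    import itertools
    import numpy as np

    rng = np.random.default_rng(2)
    states = list(itertools.product([0, 1], repeat=3))
    p = rng.random(len(states))
    p /= p.sum()                                         # invented posterior p(y|x)

    # Variable marginals p(y_i = 1 | x), obtained by summing the joint
    marg1 = [sum(pk for yk, pk in zip(states, p) if yk[i] == 1) for i in range(3)]
    y_mpm = tuple(int(m > 0.5) for m in marg1)           # per-variable argmax

    def expected_hamming(y_hat):
        return sum(pk * np.mean([a != b for a, b in zip(yk, y_hat)])
                   for yk, pk in zip(states, p))

    assert min(states, key=expected_hamming) == y_mpm
    print("MPM labeling:", y_mpm)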
Test-time Inference

Example 3: Squared error

Assume a vector space on Y_i (pixel intensities, optical flow vectors, etc.). Sum of squared errors:
    ∆_Q(y, y*) = (1/|V|) ∑_{i∈V} ‖y_i − y_i*‖²

Plugging it in,
    y* := argmin_{y′∈Y} E_{y∼p(y|x)}[∆_Q(y, y′)],
    with solution y_i* = ∑_{y_i∈Y_i} p(y_i|x) y_i for each i ∈ V (the posterior mean).

Minimizing the expected squared error → minimum mean squared error (MMSE) prediction
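And for the squared error: on a grid of candidate predictions, the expected squared error is minimized at the posterior mean. The label values and the marginal below are made up.

    import numpy as np

    rng = np.random.default_rng(3)
    vals = np.array([0.0, 1.0, 2.0, 3.0])      # Y_i: scalar labels (illustrative)
    p = rng.random(4)
    p /= p.sum()                                # invented marginal p(y_i | x)

    mmse = float(p @ vals)                      # posterior mean prediction

    grid = np.linspace(-1.0, 4.0, 1001)                 # candidate predictions t
    risk = ((vals[None, :] - grid[:, None]) ** 2) @ p   # E_p[(y - t)^2] per t
    assert abs(grid[np.argmin(risk)] - mmse) < 1e-2
    print("MMSE prediction:", mmse)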
Test-time Inference

Inference Task: Maximum A Posteriori (MAP) Inference

Definition (Maximum A Posteriori (MAP) Inference)
Given a factor graph, a parameterization, and a weight vector w, and given an observation x, find
    y* = argmax_{y∈Y} p(Y = y | x, w) = argmin_{y∈Y} E(y; x, w).
Test-time Inference

Inference Task: Probabilistic Inference

Definition (Probabilistic Inference)
Given a factor graph, a parameterization, and a weight vector w, and given an observation x, find
    log Z(x, w) = log ∑_{y∈Y} exp(−E(y; x, w)),
    µ_F(y_F) = p(Y_F = y_F | x, w)   for all F ∈ F, y_F ∈ Y_F.

This typically includes the variable marginals µ_i(y_i) = p(y_i | x, w).
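For a toy model, both quantities in the definition can be computed by plain enumeration; the chain energies below are illustrative, and the log-sum-exp is evaluated in a numerically stable way.

    import itertools
    import numpy as np

    E_pair = np.array([[0.0, 1.0],
                       [1.0, 0.0]])              # illustrative pairwise energies
    states = list(itertools.product([0, 1], repeat=3))

    def energy(y):
        return E_pair[y[0], y[1]] + E_pair[y[1], y[2]]

    neg_E = np.array([-energy(y) for y in states])
    m = neg_E.max()
    logZ = m + np.log(np.exp(neg_E - m).sum())   # log Z = log sum_y exp(-E(y))
    p = np.exp(neg_E - logZ)                     # p(y) for every state

    mu = np.zeros((2, 2))                        # factor marginal for (Y_1, Y_2)
    for yk, pk in zip(states, p):
        mu[yk[0], yk[1]] += pk
    print("log Z =", logZ)
    print("mu_F:\n", mu)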
Test-time Inference

Example: Man-made structure detection

[Figure: Left: input image x; Middle: ground-truth labeling on 16-by-16 pixel blocks; Right: factor graph model with observation X_i, unary factors ψ¹_i, ψ²_i at Y_i, and pairwise factors ψ³_{i,k} between Y_i and Y_k]

Features: gradient and color histograms
Model parameters estimated from ≈ 60 training images
Test-time Inference

Example: Man-made structure detection

[Figure: Left: input image x; Middle (probabilistic inference): visualization of the variable marginals p(y_i = "man-made" | x, w); Right (MAP inference): joint MAP labeling y* = argmax_{y∈Y} p(y | x, w)]
Training

Training the Model

What can be learned?
  Model structure: the factors
  Model variables: observed variables are fixed, but we can add unobserved variables
  Factor energies: the parameters
Training

Training: Overview

Assume a fully observed, independent and identically distributed (iid) sample set
    {(x^n, y^n)}_{n=1,...,N},   (x^n, y^n) ∼ d(X, Y).

Goal: predict well
Alternative goal: first model d(y|x) well by p(y|x, w), then predict by minimizing the expected loss
Training

Probabilistic Learning

Problem (Probabilistic Parameter Learning)
Let d(y|x) be the (unknown) conditional distribution of labels for the problem to be solved. For a parameterized conditional distribution p(y|x, w) with parameters w ∈ R^D, probabilistic parameter learning is the task of finding a point estimate w* of the parameter that makes p(y|x, w*) closest to d(y|x).

We will discuss probabilistic parameter learning in detail.
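A minimal sketch of probabilistic parameter learning as conditional maximum likelihood, for the simplest possible case: a single binary output with linear energies E(y; x, w) = ⟨w(y), ψ(x)⟩. The data generator, feature dimension, learning rate, and iteration count are all assumptions made for illustration; with many output variables, computing the gradient requires the probabilistic inference discussed above.

    import numpy as np

    rng = np.random.default_rng(4)
    D, N = 3, 500
    X = rng.normal(size=(N, D))                 # stand-ins for features psi(x^n)
    w_true = np.stack([np.zeros(D), np.array([1.0, -2.0, 0.5])])

    def cond_probs(w, X):
        """p(y|x, w) for y in {0, 1}, with E(y; x, w) = <w[y], psi(x)>."""
        neg_E = -(X @ w.T)                      # shape (N, 2)
        neg_E -= neg_E.max(axis=1, keepdims=True)
        q = np.exp(neg_E)
        return q / q.sum(axis=1, keepdims=True)

    Y = (rng.random(N) < cond_probs(w_true, X)[:, 1]).astype(int)  # y^n ~ d(y|x^n)

    w = np.zeros((2, D))
    for _ in range(2000):                       # gradient descent on the NLL
        p = cond_probs(w, X)
        onehot = np.eye(2)[Y]                   # indicator 1{y^n = y}
        grad = (onehot - p).T @ X               # dNLL/dw[y] = sum_n (1{y^n=y} - p(y|x^n)) psi(x^n)
        w -= 0.1 / N * grad
    print("learned w(1) - w(0):", w[1] - w[0])  # should roughly recover w_true[1] - w_true[0]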
Training

Loss-Minimizing Parameter Learning

Problem (Loss-Minimizing Parameter Learning)
Let d(x, y) be the (unknown) distribution of data and labels, and let ∆ : Y × Y → R be a loss function. Loss-minimizing parameter learning is the task of finding a parameter value w* such that the expected prediction risk
    E_{(x,y)∼d(x,y)}[∆(y, f_p(x))]
is as small as possible, where f_p(x) = argmax_{y∈Y} p(y|x, w*).

Requires the loss function already at training time
Directly learns a prediction function f_p(x)