Part 2: Introduction to Graphical Models

Sebastian Nowozin and Christoph H. Lampert

Colorado Springs, 25th June 2011

Introduction

- Model: relating observations x to quantities of interest y
- Example 1: given an RGB image x, infer the depth y of each pixel
- Example 2: given an RGB image x, infer the presence and positions y of all objects shown

[Figure: a mapping f : X → Y; X: images, Y: object annotations]

Introduction

- General case: mapping x ∈ X to y ∈ Y
- Graphical models are a concise language to define this mapping
- The mapping can be ambiguous: measurement noise, lack of well-posedness (e.g. occlusions)
- Probabilistic graphical models: define p(y|x) or p(x, y) for all y ∈ Y

[Figure: instead of a single prediction f(x), the model assigns a conditional distribution p(Y | X = x) over outputs]

Graphical Models

A graphical model defines
- a family of probability distributions over a set of random variables,
- by means of a graph,
- so that the random variables satisfy conditional independence assumptions encoded in the graph.

Popular classes of graphical models:
- Undirected graphical models (Markov random fields),
- Directed graphical models (Bayesian networks),
- Factor graphs,
- Others: chain graphs, influence diagrams, etc.

Bayesian Networks

- Graph: G = (V, E), E ⊂ V × V, directed and acyclic
- Variable domains Y_i
- Factorization over distributions, by conditioning on parent nodes:

    p(Y = y) = ∏_{i ∈ V} p(y_i | y_{pa_G(i)})

- This defines a family of distributions.
- Example (for the simple Bayes net in the figure):

    p(Y = y) = p(Y_l = y_l | Y_k = y_k) p(Y_k = y_k | Y_i = y_i, Y_j = y_j) p(Y_i = y_i) p(Y_j = y_j)

[Figure: a simple Bayes net with edges Y_i → Y_k, Y_j → Y_k, Y_k → Y_l]

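To make the factorization concrete, a minimal Python sketch that evaluates the example Bayes net above; all conditional probability tables are hypothetical and the variables are assumed binary.

```python
# Minimal sketch: evaluate p(y) = p(y_i) p(y_j) p(y_k | y_i, y_j) p(y_l | y_k)
# for binary variables. All table values are hypothetical.
import itertools

p_i = {0: 0.6, 1: 0.4}                    # p(Y_i)
p_j = {0: 0.7, 1: 0.3}                    # p(Y_j)
p_k = {(0, 0): {0: 0.9, 1: 0.1},          # p(Y_k | Y_i, Y_j)
       (0, 1): {0: 0.5, 1: 0.5},
       (1, 0): {0: 0.4, 1: 0.6},
       (1, 1): {0: 0.2, 1: 0.8}}
p_l = {0: {0: 0.8, 1: 0.2},               # p(Y_l | Y_k)
       1: {0: 0.3, 1: 0.7}}

def joint(yi, yj, yk, yl):
    return p_i[yi] * p_j[yj] * p_k[(yi, yj)][yk] * p_l[yk][yl]

# Because each local table is normalized, the product is a valid distribution:
total = sum(joint(*y) for y in itertools.product((0, 1), repeat=4))
print(total)  # 1.0 (up to floating point)
```
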
Undirected Graphical Models

- = Markov random field (MRF) = Markov network
- Graph: G = (V, E), E ⊂ V × V, undirected, no self-edges
- Variable domains Y_i
- Factorization over potentials ψ at cliques:

    p(y) = (1/Z) ∏_{C ∈ C(G)} ψ_C(y_C)

- Normalizing constant Z = ∑_{y ∈ Y} ∏_{C ∈ C(G)} ψ_C(y_C)
- Example (for the simple chain MRF in the figure):

    p(y) = (1/Z) ψ_i(y_i) ψ_j(y_j) ψ_k(y_k) ψ_{i,j}(y_i, y_j) ψ_{j,k}(y_j, y_k)

[Figure: a simple MRF, the chain Y_i - Y_j - Y_k]

Example 1

[Figure: chain MRF Y_i - Y_j - Y_k]

- Cliques C(G): set of vertex sets V' with V' ⊆ V and E ∩ (V' × V') = V' × V', i.e. the fully connected subsets
- Here C(G) = {{i}, {i,j}, {j}, {j,k}, {k}}

    p(y) = (1/Z) ψ_i(y_i) ψ_j(y_j) ψ_k(y_k) ψ_{i,j}(y_i, y_j) ψ_{j,k}(y_j, y_k)

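A brute-force Python sketch of this chain MRF: it computes the partition function Z by enumeration and checks that p(y) normalizes. The positive potential tables are hypothetical.

```python
# Sketch: brute-force evaluation of the chain MRF
# p(y) = (1/Z) psi_i(y_i) psi_j(y_j) psi_k(y_k) psi_ij(y_i,y_j) psi_jk(y_j,y_k).
# All potential values are hypothetical.
import itertools

psi_i = {0: 1.0, 1: 2.0}
psi_j = {0: 1.5, 1: 0.5}
psi_k = {0: 1.0, 1: 1.0}
psi_ij = {(0, 0): 3.0, (0, 1): 1.0, (1, 0): 1.0, (1, 1): 3.0}  # "attractive"
psi_jk = {(0, 0): 2.0, (0, 1): 1.0, (1, 0): 1.0, (1, 1): 2.0}

def score(yi, yj, yk):                      # unnormalized product of potentials
    return psi_i[yi] * psi_j[yj] * psi_k[yk] * psi_ij[(yi, yj)] * psi_jk[(yj, yk)]

Z = sum(score(*y) for y in itertools.product((0, 1), repeat=3))
p = {y: score(*y) / Z for y in itertools.product((0, 1), repeat=3)}
print(Z, sum(p.values()))                   # p sums to 1 after normalization
```
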
Example 2

[Figure: fully connected MRF over Y_i, Y_j, Y_k, Y_l]

- Here C(G) = 2^V: all subsets of V are cliques

    p(y) = (1/Z) ∏_{A ∈ 2^{{i,j,k,l}}} ψ_A(y_A)

Factor Graphs

- Graph: G = (V, F, E), E ⊆ V × F
  - variable nodes V,
  - factor nodes F,
  - edges E between variable and factor nodes,
  - scope of a factor: N(F) = {i ∈ V : (i, F) ∈ E}
- Variable domains Y_i
- Factorization over potentials ψ at factors:

    p(y) = (1/Z) ∏_{F ∈ F} ψ_F(y_{N(F)})

- Constant Z = ∑_{y ∈ Y} ∏_{F ∈ F} ψ_F(y_{N(F)})

[Figure: a factor graph over Y_i, Y_j, Y_k, Y_l]

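One way to make the (V, F, E) structure tangible is to store each factor together with its explicit scope N(F). The sketch below does this for a small hypothetical factor graph and evaluates the unnormalized product ∏_F ψ_F(y_{N(F)}); variable names and tables are made up.

```python
# Sketch: a factor graph as data. Each factor carries its scope N(F) and a
# potential table over the joint states of that scope. Tables are hypothetical.
import itertools

domains = {"Yi": (0, 1), "Yj": (0, 1), "Yk": (0, 1), "Yl": (0, 1)}
factors = [  # (scope N(F), potential psi_F)
    (("Yi", "Yj"), {(0, 0): 2.0, (0, 1): 1.0, (1, 0): 1.0, (1, 1): 2.0}),
    (("Yj", "Yl"), {(0, 0): 1.0, (0, 1): 3.0, (1, 0): 3.0, (1, 1): 1.0}),
    (("Yi", "Yk"), {(0, 0): 1.0, (0, 1): 1.0, (1, 0): 0.5, (1, 1): 2.0}),
]

def unnormalized(assignment):               # prod over factors of psi_F(y_N(F))
    prod = 1.0
    for scope, psi in factors:
        prod *= psi[tuple(assignment[v] for v in scope)]
    return prod

names = list(domains)
Z = sum(unnormalized(dict(zip(names, states)))
        for states in itertools.product(*(domains[v] for v in names)))
print(Z)
```
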
Why factor graphs?

[Figure: three factor graphs over Y_i, Y_j, Y_k, Y_l with different factor structures]

- Factor graphs are explicit about the factorization
- Hence, they are easier to work with
- Universal (just like MRFs and Bayesian networks)

Capacity

[Figure: two factor graphs over Y_i, Y_j, Y_k, Y_l with different sets of factors]

- A factor graph defines a family of distributions
- Some families are larger than others

Four remaining pieces

1. Conditional distributions (CRFs)
2. Parameterization
3. Test-time inference
4. Learning the model from training data

Conditional Distributions

- We have discussed p(y); how do we define p(y|x)?
- Potentials become a function of x_{N(F)}
- The partition function depends on x
- Conditional random fields (CRFs)
- x is not part of the probability model, i.e. it is not treated as a random variable

    p(y) = (1/Z) ∏_{F ∈ F} ψ_F(y_{N(F)})

    p(y|x) = (1/Z(x)) ∏_{F ∈ F} ψ_F(y_{N(F)}; x_{N(F)})

[Figure: a conditional distribution; observed nodes X_i, X_j are attached to output nodes Y_i, Y_j]

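A toy CRF sketch: each unary potential depends on the observation, so the partition function Z(x) has to be recomputed for every input. The weight, features, and pairwise table here are hypothetical.

```python
# Sketch: a two-variable toy CRF, p(y|x) = (1/Z(x)) prod_F psi_F(y; x).
# Weight and potential values are hypothetical.
import itertools, math

w = 1.5

def psi_unary(yi, xi):              # observation-dependent potential
    return math.exp(w * xi) if yi == 1 else 1.0

def psi_pair(yi, yj):               # x-independent smoothing potential
    return 2.0 if yi == yj else 1.0

def p_cond(y, x):
    def score(yy):
        return psi_unary(yy[0], x[0]) * psi_unary(yy[1], x[1]) * psi_pair(*yy)
    Zx = sum(score(yy) for yy in itertools.product((0, 1), repeat=2))
    return score(y) / Zx            # Z(x) depends on the observation x

print(p_cond((1, 1), (0.9, 0.8)), p_cond((1, 1), (-0.9, -0.8)))
```
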
Potentials and Energy Functions

- For each factor F ∈ F, with Y_F = ×_{i ∈ N(F)} Y_i, define an energy function

    E_F : Y_{N(F)} → R

- Potentials and energies (assume ψ_F(y_F) > 0):

    ψ_F(y_F) = exp(−E_F(y_F))   and   E_F(y_F) = −log(ψ_F(y_F))

- Then p(y) can be written as

    p(Y = y) = (1/Z) ∏_{F ∈ F} ψ_F(y_F) = (1/Z) exp(−∑_{F ∈ F} E_F(y_F))

- Hence, p(y) is completely determined by the energy E(y) = ∑_{F ∈ F} E_F(y_F)

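The ψ = exp(−E) correspondence is easy to check numerically; a small sketch with a hypothetical potential table:

```python
# Sketch: converting between potentials and energies, E_F = -log(psi_F),
# and checking psi_F = exp(-E_F). The potential table is hypothetical.
import math

psi = {(0, 0): 2.0, (0, 1): 0.5, (1, 0): 0.5, (1, 1): 2.0}
E = {y: -math.log(v) for y, v in psi.items()}

for y in psi:
    assert abs(psi[y] - math.exp(-E[y])) < 1e-12
print(E[(0, 0)], E[(0, 1)])  # low energy corresponds to high potential
```
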
Energy Minimization

    argmax_{y ∈ Y} p(Y = y) = argmax_{y ∈ Y} (1/Z) exp(−∑_{F ∈ F} E_F(y_F))
                            = argmax_{y ∈ Y} exp(−∑_{F ∈ F} E_F(y_F))
                            = argmax_{y ∈ Y} −∑_{F ∈ F} E_F(y_F)
                            = argmin_{y ∈ Y} ∑_{F ∈ F} E_F(y_F)
                            = argmin_{y ∈ Y} E(y)

- Energy minimization can be interpreted as solving for the most likely state of some factor graph model

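A brute-force sketch of this equivalence on a tiny chain model with hypothetical energies: the minimizer of E(y) is found without ever computing Z.

```python
# Sketch: MAP prediction by exhaustive energy minimization; no Z needed.
# All energy values are hypothetical.
import itertools

E_unary = [{0: 0.0, 1: 1.0}, {0: 0.5, 1: 0.0}, {0: 0.2, 1: 0.2}]
E_pair = {(0, 0): 0.0, (0, 1): 1.0, (1, 0): 1.0, (1, 1): 0.0}  # Potts-like

def E(y):  # E(y) = sum of factor energies on a 3-variable chain
    unary = sum(E_unary[i][yi] for i, yi in enumerate(y))
    return unary + E_pair[(y[0], y[1])] + E_pair[(y[1], y[2])]

y_map = min(itertools.product((0, 1), repeat=3), key=E)
print(y_map, E(y_map))
```
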
Parameterization

- Factor graphs define a family of distributions
- Parameterization: identifying individual members of the family by parameters w

[Figure: parameter vectors w index distributions in the family; different values w1, w2 pick out different members p_{w1}, p_{w2}]

Example: Parameterization

- Image segmentation model
- Pairwise "Potts" energy function E_F(y_i, y_j; w_1),

    E_F : {0, 1} × {0, 1} × R → R,

    E_F(0, 0; w_1) = E_F(1, 1; w_1) = 0,
    E_F(0, 1; w_1) = E_F(1, 0; w_1) = w_1

[Figure: image segmentation model]

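The Potts energy from this slide in code form, a one-line sketch:

```python
# Sketch: the pairwise Potts energy E_F(y_i, y_j; w1) from the slide.
def potts_energy(yi, yj, w1):
    return 0.0 if yi == yj else w1  # 0 for agreeing labels, w1 otherwise

print(potts_energy(0, 0, 2.5), potts_energy(0, 1, 2.5))  # 0.0 2.5
```
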
Example: Parameterization (cont)

- Image segmentation model
- Unary energy function E_F(y_i; x, w),

    E_F : {0, 1} × X × R^{{0,1}×D} → R,

    E_F(0; x, w) = ⟨w(0), ψ_F(x)⟩,
    E_F(1; x, w) = ⟨w(1), ψ_F(x)⟩

- Features ψ_F : X → R^D, e.g. image filters

[Figure: image segmentation model]

Example: Parameterization (cont)

[Figure: grid model; each unary factor has energies ⟨w(0), ψ_F(x)⟩ and ⟨w(1), ψ_F(x)⟩, each pairwise factor has the Potts energy table (0, w_1; w_1, 0)]

- Total number of parameters: D + D + 1
- Parameters are shared, but energies differ because of the different ψ_F(x)
- General form, linear in w:

    E_F(y_F; x_F, w) = ⟨w(y_F), ψ_F(x_F)⟩

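A sketch of the linear form ⟨w(y_F), ψ_F(x_F)⟩ for a binary unary factor; the weight vectors and the 3-dimensional feature map stand in for real image filter responses and are hypothetical.

```python
# Sketch: linearly parameterized unary energy E_F(y; x, w) = <w(y), psi_F(x)>.
# Weights and the feature map are hypothetical stand-ins (D = 3).
import math

w = {0: [0.2, -1.0, 0.5], 1: [-0.4, 0.8, 0.1]}  # one weight vector per label

def feat(x):
    """Hypothetical feature map X -> R^3, a stand-in for image filter responses."""
    return [x, x * x, math.sin(x)]

def E_unary(y, x):
    return sum(wi * fi for wi, fi in zip(w[y], feat(x)))  # inner product

print(E_unary(0, 0.3), E_unary(1, 0.3))
```
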
Making Predictions

- Making predictions: given x ∈ X, predict y ∈ Y
- How do we measure the quality of a prediction (or of a prediction function f : X → Y)?

Loss function

- Define a loss function

    Δ : Y × Y → R_+,

  so that Δ(y, y*) measures the loss incurred by predicting y when y* is true.
- The loss function is application dependent

Test-time Inference

- Loss function Δ(y, f(x)): correct label y, prediction f(x),

    Δ : Y × Y → R

- True joint distribution d(X, Y) and true conditional d(y|x)
- Model distribution p(y|x)
- Expected loss: quality of the prediction,

    R_f^Δ(x) = E_{y∼d(y|x)} [Δ(y, f(x))]
             = ∑_{y ∈ Y} d(y|x) Δ(y, f(x))
             ≈ E_{y∼p(y|x;w)} [Δ(y, f(x))],

  assuming that p(y|x; w) ≈ d(y|x)

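A sketch of the last approximation: the expected loss of a fixed prediction f(x) under the model distribution, here with a hypothetical p(y|x) over two binary variables and the Hamming loss.

```python
# Sketch: expected loss E_{y ~ p(y|x;w)}[Delta(y, f(x))] by direct summation.
# The model distribution p(y|x) and the prediction are hypothetical.
p = {(0, 0): 0.5, (0, 1): 0.2, (1, 0): 0.2, (1, 1): 0.1}  # p(y | x; w)

def delta_hamming(y, y_pred):
    return sum(a != b for a, b in zip(y, y_pred)) / len(y)

f_x = (0, 0)                                              # the prediction f(x)
risk = sum(pv * delta_hamming(y, f_x) for y, pv in p.items())
print(risk)  # 0.3
```
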
Example 1: 0/1 loss

Loss 0 iff perfectly predicted, 1 otherwise:

    Δ_{0/1}(y, y*) = I(y ≠ y*) = { 0 if y = y*; 1 otherwise }

Plugging it in,

    y* := argmin_{y ∈ Y} E_{y'∼p(y'|x)} [Δ_{0/1}(y', y)]
        = argmax_{y ∈ Y} p(y|x)
        = argmin_{y ∈ Y} E(y; x)

- Minimizing the expected 0/1 loss → MAP prediction (energy minimization)

Example 2: Hamming loss

Count the number of mislabeled variables (normalized by |V|):

    Δ_H(y, y*) = (1/|V|) ∑_{i ∈ V} I(y_i ≠ y_i*)

Plugging it in,

    y* := argmin_{y ∈ Y} E_{y'∼p(y'|x)} [Δ_H(y', y)]
        = (argmax_{y_i ∈ Y_i} p(y_i|x))_{i ∈ V}

- Minimizing the expected Hamming loss → maximum posterior marginal (MPM, max-marginal) prediction

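A sketch of the MPM rule on a hypothetical two-variable distribution (already conditioned on x); it also shows that the marginally best states need not form the jointly best labeling.

```python
# Sketch: MPM prediction sets each variable to the state maximizing its
# marginal p(y_i | x). The toy distribution is hypothetical.
p = {(0, 0): 0.40, (0, 1): 0.00, (1, 0): 0.25, (1, 1): 0.35}

def marginal(i, s):                 # p(y_i = s | x)
    return sum(v for y, v in p.items() if y[i] == s)

y_mpm = tuple(max((0, 1), key=lambda s: marginal(i, s)) for i in range(2))
y_map = max(p, key=p.get)
print(y_mpm, y_map)  # (1, 0) vs (0, 0): MPM and MAP predictions can differ
```
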
Example 3: Squared error

Assume a vector space on Y_i (pixel intensities, optical flow vectors, etc.).
Sum of squared errors:

    Δ_Q(y, y*) = (1/|V|) ∑_{i ∈ V} ||y_i − y_i*||²

Plugging it in,

    y* := argmin_{y ∈ Y} E_{y'∼p(y'|x)} [Δ_Q(y', y)]
        = (∑_{y_i ∈ Y_i} p(y_i|x) y_i)_{i ∈ V}

- Minimizing the expected squared error → minimum mean squared error (MMSE) prediction

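A sketch of the MMSE rule for one variable, using a hypothetical marginal over a few intensity levels: the prediction is the posterior mean, which need not coincide with any single high-probability state.

```python
# Sketch: MMSE prediction is the per-variable posterior mean E[y_i | x].
# The marginal over intensity levels is hypothetical.
marg = {0: 0.1, 1: 0.2, 2: 0.4, 3: 0.3}          # p(y_i = k | x)

y_mmse = sum(k * pk for k, pk in marg.items())   # posterior mean
y_mpm = max(marg, key=marg.get)                  # marginal mode, for contrast
print(y_mmse, y_mpm)  # 1.9 vs 2
```
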
Inference Task: Maximum A Posteriori (MAP) Inference

Definition (Maximum A Posteriori (MAP) Inference)
Given a factor graph, a parameterization, and a weight vector w, and given the observation x, find

    y* = argmax_{y ∈ Y} p(Y = y|x, w) = argmin_{y ∈ Y} E(y; x, w).

Inference Task: Probabilistic Inference

Definition (Probabilistic Inference)
Given a factor graph, a parameterization, and a weight vector w, and given the observation x, find

    log Z(x, w) = log ∑_{y ∈ Y} exp(−E(y; x, w)),

    μ_F(y_F) = p(Y_F = y_F | x, w),   ∀F ∈ F, ∀y_F ∈ Y_F.

- This typically includes the variable marginals

    μ_i(y_i) = p(y_i | x, w)

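A brute-force sketch of both quantities for a tiny two-variable energy-based model; the energies are hypothetical, and enumeration stands in for real inference algorithms.

```python
# Sketch: probabilistic inference by enumeration on a tiny model,
# computing log Z(x, w) and variable marginals mu_i. Energies are hypothetical.
import itertools, math

E_unary = [{0: 0.0, 1: 0.8}, {0: 0.6, 1: 0.0}]
E_pair = {(0, 0): 0.0, (0, 1): 1.0, (1, 0): 1.0, (1, 1): 0.0}

def E(y):
    return E_unary[0][y[0]] + E_unary[1][y[1]] + E_pair[y]

states = list(itertools.product((0, 1), repeat=2))
log_Z = math.log(sum(math.exp(-E(y)) for y in states))

def mu(i, s):                       # mu_i(y_i = s) = p(y_i = s)
    return sum(math.exp(-E(y) - log_Z) for y in states if y[i] == s)

print(log_Z, mu(0, 0), mu(0, 1))
```
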
Example: Man-made structure detection

[Figure: three panels and a factor-graph detail with factors ψ¹_i, ψ²_i (linking X_i to Y_i), and ψ³_{i,k} (linking Y_i to Y_k)]

- Left: input image x
- Middle: ground truth labeling on 16-by-16 pixel blocks
- Right: factor graph model
- Features: gradient and color histograms
- Estimate model parameters from ≈ 60 training images

Example: Man-made structure detection

[Figure: three panels]

- Left: input image x
- Middle (probabilistic inference): visualization of the variable marginals p(y_i = "man-made" | x, w)
- Right (MAP inference): joint MAP labeling y* = argmax_{y ∈ Y} p(y|x, w)

Training the Model

What can be learned?
- Model structure: the factors
- Model variables: observed variables are fixed, but we can add unobserved variables
- Factor energies: the parameters

Training: Overview

- Assume a fully observed, independent and identically distributed (iid) sample set

    {(x^n, y^n)}_{n=1,...,N},   (x^n, y^n) ∼ d(X, Y)

- Goal: predict well
- Alternative goal: first model d(y|x) well by p(y|x, w), then predict by minimizing the expected loss

Probabilistic Learning

Problem (Probabilistic Parameter Learning)
Let d(y|x) be the (unknown) conditional distribution of labels for a problem to be solved. For a parameterized conditional distribution p(y|x, w) with parameters w ∈ R^D, probabilistic parameter learning is the task of finding a point estimate of the parameter w* that makes p(y|x, w*) closest to d(y|x).

- We will discuss probabilistic parameter learning in detail.

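One standard instantiation of this problem is maximum conditional likelihood; a minimal sketch on a hypothetical one-variable model p(y = 1 | x, w) = exp(wx)/(1 + exp(wx)), trained by gradient ascent on the log-likelihood of made-up data.

```python
# Sketch: probabilistic parameter learning via maximum conditional likelihood
# on a toy one-variable model. Data, model, and learning rate are hypothetical.
import math

data = [(2.0, 1), (1.0, 1), (1.0, 0), (-1.5, 0), (-0.5, 0), (0.5, 1)]

def p1(x, w):                       # p(y = 1 | x, w)
    return 1.0 / (1.0 + math.exp(-w * x))

w, lr = 0.0, 0.1
for _ in range(200):                # gradient ascent on the log-likelihood
    grad = sum(x * (y - p1(x, w)) for x, y in data)
    w += lr * grad
print(w, [round(p1(x, w), 2) for x, _ in data])
```
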
Loss-Minimizing Parameter Learning

Problem (Loss-Minimizing Parameter Learning)
Let d(x, y) be the unknown distribution of data and labels, and let Δ : Y × Y → R be a loss function. Loss-minimizing parameter learning is the task of finding a parameter value w* such that the expected prediction risk

    E_{(x,y)∼d(x,y)} [Δ(y, f_p(x))]

is as small as possible, where f_p(x) = argmax_{y ∈ Y} p(y|x, w*).

- Requires the loss function at training time
- Directly learns a prediction function f_p(x)


Maximum likelihood estimation of regularisation parameters in inverse problem...Maximum likelihood estimation of regularisation parameters in inverse problem...
Maximum likelihood estimation of regularisation parameters in inverse problem...
 
Basics of probability in statistical simulation and stochastic programming
Basics of probability in statistical simulation and stochastic programmingBasics of probability in statistical simulation and stochastic programming
Basics of probability in statistical simulation and stochastic programming
 
An introduction to quantum stochastic calculus
An introduction to quantum stochastic calculusAn introduction to quantum stochastic calculus
An introduction to quantum stochastic calculus
 
Tuto part2
Tuto part2Tuto part2
Tuto part2
 
Numerical solution of boundary value problems by piecewise analysis method
Numerical solution of boundary value problems by piecewise analysis methodNumerical solution of boundary value problems by piecewise analysis method
Numerical solution of boundary value problems by piecewise analysis method
 

More from zukun

My lyn tutorial 2009
My lyn tutorial 2009My lyn tutorial 2009
My lyn tutorial 2009zukun
 
ETHZ CV2012: Tutorial openCV
ETHZ CV2012: Tutorial openCVETHZ CV2012: Tutorial openCV
ETHZ CV2012: Tutorial openCVzukun
 
ETHZ CV2012: Information
ETHZ CV2012: InformationETHZ CV2012: Information
ETHZ CV2012: Informationzukun
 
Siwei lyu: natural image statistics
Siwei lyu: natural image statisticsSiwei lyu: natural image statistics
Siwei lyu: natural image statisticszukun
 
Lecture9 camera calibration
Lecture9 camera calibrationLecture9 camera calibration
Lecture9 camera calibrationzukun
 
Brunelli 2008: template matching techniques in computer vision
Brunelli 2008: template matching techniques in computer visionBrunelli 2008: template matching techniques in computer vision
Brunelli 2008: template matching techniques in computer visionzukun
 
Modern features-part-4-evaluation
Modern features-part-4-evaluationModern features-part-4-evaluation
Modern features-part-4-evaluationzukun
 
Modern features-part-3-software
Modern features-part-3-softwareModern features-part-3-software
Modern features-part-3-softwarezukun
 
Modern features-part-2-descriptors
Modern features-part-2-descriptorsModern features-part-2-descriptors
Modern features-part-2-descriptorszukun
 
Modern features-part-1-detectors
Modern features-part-1-detectorsModern features-part-1-detectors
Modern features-part-1-detectorszukun
 
Modern features-part-0-intro
Modern features-part-0-introModern features-part-0-intro
Modern features-part-0-introzukun
 
Lecture 02 internet video search
Lecture 02 internet video searchLecture 02 internet video search
Lecture 02 internet video searchzukun
 
Lecture 01 internet video search
Lecture 01 internet video searchLecture 01 internet video search
Lecture 01 internet video searchzukun
 
Lecture 03 internet video search
Lecture 03 internet video searchLecture 03 internet video search
Lecture 03 internet video searchzukun
 
Icml2012 tutorial representation_learning
Icml2012 tutorial representation_learningIcml2012 tutorial representation_learning
Icml2012 tutorial representation_learningzukun
 
Advances in discrete energy minimisation for computer vision
Advances in discrete energy minimisation for computer visionAdvances in discrete energy minimisation for computer vision
Advances in discrete energy minimisation for computer visionzukun
 
Gephi tutorial: quick start
Gephi tutorial: quick startGephi tutorial: quick start
Gephi tutorial: quick startzukun
 
EM algorithm and its application in probabilistic latent semantic analysis
EM algorithm and its application in probabilistic latent semantic analysisEM algorithm and its application in probabilistic latent semantic analysis
EM algorithm and its application in probabilistic latent semantic analysiszukun
 
Object recognition with pictorial structures
Object recognition with pictorial structuresObject recognition with pictorial structures
Object recognition with pictorial structureszukun
 
Iccv2011 learning spatiotemporal graphs of human activities
Iccv2011 learning spatiotemporal graphs of human activities Iccv2011 learning spatiotemporal graphs of human activities
Iccv2011 learning spatiotemporal graphs of human activities zukun
 

More from zukun (20)

My lyn tutorial 2009
My lyn tutorial 2009My lyn tutorial 2009
My lyn tutorial 2009
 
ETHZ CV2012: Tutorial openCV
ETHZ CV2012: Tutorial openCVETHZ CV2012: Tutorial openCV
ETHZ CV2012: Tutorial openCV
 
ETHZ CV2012: Information
ETHZ CV2012: InformationETHZ CV2012: Information
ETHZ CV2012: Information
 
Siwei lyu: natural image statistics
Siwei lyu: natural image statisticsSiwei lyu: natural image statistics
Siwei lyu: natural image statistics
 
Lecture9 camera calibration
Lecture9 camera calibrationLecture9 camera calibration
Lecture9 camera calibration
 
Brunelli 2008: template matching techniques in computer vision
Brunelli 2008: template matching techniques in computer visionBrunelli 2008: template matching techniques in computer vision
Brunelli 2008: template matching techniques in computer vision
 
Modern features-part-4-evaluation
Modern features-part-4-evaluationModern features-part-4-evaluation
Modern features-part-4-evaluation
 
Modern features-part-3-software
Modern features-part-3-softwareModern features-part-3-software
Modern features-part-3-software
 
Modern features-part-2-descriptors
Modern features-part-2-descriptorsModern features-part-2-descriptors
Modern features-part-2-descriptors
 
Modern features-part-1-detectors
Modern features-part-1-detectorsModern features-part-1-detectors
Modern features-part-1-detectors
 
Modern features-part-0-intro
Modern features-part-0-introModern features-part-0-intro
Modern features-part-0-intro
 
Lecture 02 internet video search
Lecture 02 internet video searchLecture 02 internet video search
Lecture 02 internet video search
 
Lecture 01 internet video search
Lecture 01 internet video searchLecture 01 internet video search
Lecture 01 internet video search
 
Lecture 03 internet video search
Lecture 03 internet video searchLecture 03 internet video search
Lecture 03 internet video search
 
Icml2012 tutorial representation_learning
Icml2012 tutorial representation_learningIcml2012 tutorial representation_learning
Icml2012 tutorial representation_learning
 
Advances in discrete energy minimisation for computer vision
Advances in discrete energy minimisation for computer visionAdvances in discrete energy minimisation for computer vision
Advances in discrete energy minimisation for computer vision
 
Gephi tutorial: quick start
Gephi tutorial: quick startGephi tutorial: quick start
Gephi tutorial: quick start
 
EM algorithm and its application in probabilistic latent semantic analysis
EM algorithm and its application in probabilistic latent semantic analysisEM algorithm and its application in probabilistic latent semantic analysis
EM algorithm and its application in probabilistic latent semantic analysis
 
Object recognition with pictorial structures
Object recognition with pictorial structuresObject recognition with pictorial structures
Object recognition with pictorial structures
 
Iccv2011 learning spatiotemporal graphs of human activities
Iccv2011 learning spatiotemporal graphs of human activities Iccv2011 learning spatiotemporal graphs of human activities
Iccv2011 learning spatiotemporal graphs of human activities
 

Recently uploaded

GraphRAG is All You need? LLM & Knowledge Graph
GraphRAG is All You need? LLM & Knowledge GraphGraphRAG is All You need? LLM & Knowledge Graph
GraphRAG is All You need? LLM & Knowledge Graph
Guy Korland
 
Communications Mining Series - Zero to Hero - Session 1
Communications Mining Series - Zero to Hero - Session 1Communications Mining Series - Zero to Hero - Session 1
Communications Mining Series - Zero to Hero - Session 1
DianaGray10
 
The Art of the Pitch: WordPress Relationships and Sales
The Art of the Pitch: WordPress Relationships and SalesThe Art of the Pitch: WordPress Relationships and Sales
The Art of the Pitch: WordPress Relationships and Sales
Laura Byrne
 
UiPath Test Automation using UiPath Test Suite series, part 4
UiPath Test Automation using UiPath Test Suite series, part 4UiPath Test Automation using UiPath Test Suite series, part 4
UiPath Test Automation using UiPath Test Suite series, part 4
DianaGray10
 
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdfFIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
FIDO Alliance
 
Microsoft - Power Platform_G.Aspiotis.pdf
Microsoft - Power Platform_G.Aspiotis.pdfMicrosoft - Power Platform_G.Aspiotis.pdf
Microsoft - Power Platform_G.Aspiotis.pdf
Uni Systems S.M.S.A.
 
GraphSummit Singapore | Graphing Success: Revolutionising Organisational Stru...
GraphSummit Singapore | Graphing Success: Revolutionising Organisational Stru...GraphSummit Singapore | Graphing Success: Revolutionising Organisational Stru...
GraphSummit Singapore | Graphing Success: Revolutionising Organisational Stru...
Neo4j
 
FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdfFIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
FIDO Alliance
 
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
James Anderson
 
Epistemic Interaction - tuning interfaces to provide information for AI support
Epistemic Interaction - tuning interfaces to provide information for AI supportEpistemic Interaction - tuning interfaces to provide information for AI support
Epistemic Interaction - tuning interfaces to provide information for AI support
Alan Dix
 
GraphSummit Singapore | Enhancing Changi Airport Group's Passenger Experience...
GraphSummit Singapore | Enhancing Changi Airport Group's Passenger Experience...GraphSummit Singapore | Enhancing Changi Airport Group's Passenger Experience...
GraphSummit Singapore | Enhancing Changi Airport Group's Passenger Experience...
Neo4j
 
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdfFIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
FIDO Alliance
 
Why You Should Replace Windows 11 with Nitrux Linux 3.5.0 for enhanced perfor...
Why You Should Replace Windows 11 with Nitrux Linux 3.5.0 for enhanced perfor...Why You Should Replace Windows 11 with Nitrux Linux 3.5.0 for enhanced perfor...
Why You Should Replace Windows 11 with Nitrux Linux 3.5.0 for enhanced perfor...
SOFTTECHHUB
 
A tale of scale & speed: How the US Navy is enabling software delivery from l...
A tale of scale & speed: How the US Navy is enabling software delivery from l...A tale of scale & speed: How the US Navy is enabling software delivery from l...
A tale of scale & speed: How the US Navy is enabling software delivery from l...
sonjaschweigert1
 
Essentials of Automations: The Art of Triggers and Actions in FME
Essentials of Automations: The Art of Triggers and Actions in FMEEssentials of Automations: The Art of Triggers and Actions in FME
Essentials of Automations: The Art of Triggers and Actions in FME
Safe Software
 
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
DanBrown980551
 
Video Streaming: Then, Now, and in the Future
Video Streaming: Then, Now, and in the FutureVideo Streaming: Then, Now, and in the Future
Video Streaming: Then, Now, and in the Future
Alpen-Adria-Universität
 
National Security Agency - NSA mobile device best practices
National Security Agency - NSA mobile device best practicesNational Security Agency - NSA mobile device best practices
National Security Agency - NSA mobile device best practices
Quotidiano Piemontese
 
Generative AI Deep Dive: Advancing from Proof of Concept to Production
Generative AI Deep Dive: Advancing from Proof of Concept to ProductionGenerative AI Deep Dive: Advancing from Proof of Concept to Production
Generative AI Deep Dive: Advancing from Proof of Concept to Production
Aggregage
 
Removing Uninteresting Bytes in Software Fuzzing
Removing Uninteresting Bytes in Software FuzzingRemoving Uninteresting Bytes in Software Fuzzing
Removing Uninteresting Bytes in Software Fuzzing
Aftab Hussain
 

Recently uploaded (20)

GraphRAG is All You need? LLM & Knowledge Graph
GraphRAG is All You need? LLM & Knowledge GraphGraphRAG is All You need? LLM & Knowledge Graph
GraphRAG is All You need? LLM & Knowledge Graph
 
Communications Mining Series - Zero to Hero - Session 1
Communications Mining Series - Zero to Hero - Session 1Communications Mining Series - Zero to Hero - Session 1
Communications Mining Series - Zero to Hero - Session 1
 
The Art of the Pitch: WordPress Relationships and Sales
The Art of the Pitch: WordPress Relationships and SalesThe Art of the Pitch: WordPress Relationships and Sales
The Art of the Pitch: WordPress Relationships and Sales
 
UiPath Test Automation using UiPath Test Suite series, part 4
UiPath Test Automation using UiPath Test Suite series, part 4UiPath Test Automation using UiPath Test Suite series, part 4
UiPath Test Automation using UiPath Test Suite series, part 4
 
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdfFIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
 
Microsoft - Power Platform_G.Aspiotis.pdf
Microsoft - Power Platform_G.Aspiotis.pdfMicrosoft - Power Platform_G.Aspiotis.pdf
Microsoft - Power Platform_G.Aspiotis.pdf
 
GraphSummit Singapore | Graphing Success: Revolutionising Organisational Stru...
GraphSummit Singapore | Graphing Success: Revolutionising Organisational Stru...GraphSummit Singapore | Graphing Success: Revolutionising Organisational Stru...
GraphSummit Singapore | Graphing Success: Revolutionising Organisational Stru...
 
FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdfFIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
 
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
 
Epistemic Interaction - tuning interfaces to provide information for AI support
Epistemic Interaction - tuning interfaces to provide information for AI supportEpistemic Interaction - tuning interfaces to provide information for AI support
Epistemic Interaction - tuning interfaces to provide information for AI support
 
GraphSummit Singapore | Enhancing Changi Airport Group's Passenger Experience...
GraphSummit Singapore | Enhancing Changi Airport Group's Passenger Experience...GraphSummit Singapore | Enhancing Changi Airport Group's Passenger Experience...
GraphSummit Singapore | Enhancing Changi Airport Group's Passenger Experience...
 
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdfFIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
 
Why You Should Replace Windows 11 with Nitrux Linux 3.5.0 for enhanced perfor...
Why You Should Replace Windows 11 with Nitrux Linux 3.5.0 for enhanced perfor...Why You Should Replace Windows 11 with Nitrux Linux 3.5.0 for enhanced perfor...
Why You Should Replace Windows 11 with Nitrux Linux 3.5.0 for enhanced perfor...
 
A tale of scale & speed: How the US Navy is enabling software delivery from l...
A tale of scale & speed: How the US Navy is enabling software delivery from l...A tale of scale & speed: How the US Navy is enabling software delivery from l...
A tale of scale & speed: How the US Navy is enabling software delivery from l...
 
Essentials of Automations: The Art of Triggers and Actions in FME
Essentials of Automations: The Art of Triggers and Actions in FMEEssentials of Automations: The Art of Triggers and Actions in FME
Essentials of Automations: The Art of Triggers and Actions in FME
 
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
 
Video Streaming: Then, Now, and in the Future
Video Streaming: Then, Now, and in the FutureVideo Streaming: Then, Now, and in the Future
Video Streaming: Then, Now, and in the Future
 
National Security Agency - NSA mobile device best practices
National Security Agency - NSA mobile device best practicesNational Security Agency - NSA mobile device best practices
National Security Agency - NSA mobile device best practices
 
Generative AI Deep Dive: Advancing from Proof of Concept to Production
Generative AI Deep Dive: Advancing from Proof of Concept to ProductionGenerative AI Deep Dive: Advancing from Proof of Concept to Production
Generative AI Deep Dive: Advancing from Proof of Concept to Production
 
Removing Uninteresting Bytes in Software Fuzzing
Removing Uninteresting Bytes in Software FuzzingRemoving Uninteresting Bytes in Software Fuzzing
Removing Uninteresting Bytes in Software Fuzzing
 

01 graphical models

  • 6–7. Graphical Models: a graphical model defines a family of probability
    distributions over a set of random variables, by means of a graph, so that
    the random variables satisfy conditional independence assumptions encoded
    in the graph. Popular classes of graphical models: undirected graphical
    models (Markov random fields), directed graphical models (Bayesian
    networks), factor graphs, and others (chain graphs, influence diagrams,
    etc.).
  • 8–9. Bayesian Networks. Graph G = (V, E), E ⊂ V × V, directed and acyclic;
    variable domains Y_i. Factorization over distributions, by conditioning on
    the parent nodes:
        p(Y = y) = ∏_{i ∈ V} p(y_i | y_{pa_G(i)}).
    Example [figure: a simple Bayes net over Y_i, Y_j, Y_k, Y_l]:
        p(Y = y) = p(Y_l = y_l | Y_k = y_k) p(Y_k = y_k | Y_i = y_i, Y_j = y_j)
                   p(Y_i = y_i) p(Y_j = y_j).
    This defines a family of distributions.
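As a concrete illustration of the factorization above, here is a minimal Python sketch (the conditional probability tables and variable names are invented for this example, not taken from the slides): it evaluates p(Y = y) as the product of local conditionals and checks that the result is a valid joint distribution.

import itertools

# Hypothetical CPTs for the example net: Yi and Yj are roots, Yk depends on
# (Yi, Yj), and Yl depends on Yk; all variables are binary.
p_i = {0: 0.6, 1: 0.4}                        # p(Yi = yi)
p_j = {0: 0.7, 1: 0.3}                        # p(Yj = yj)
p_k = {(0, 0): {0: 0.9, 1: 0.1},              # p(Yk = yk | Yi = yi, Yj = yj)
       (0, 1): {0: 0.5, 1: 0.5},
       (1, 0): {0: 0.4, 1: 0.6},
       (1, 1): {0: 0.2, 1: 0.8}}
p_l = {0: {0: 0.8, 1: 0.2},                   # p(Yl = yl | Yk = yk)
       1: {0: 0.3, 1: 0.7}}

def joint(yi, yj, yk, yl):
    # p(Y = y) = p(yl | yk) p(yk | yi, yj) p(yi) p(yj)
    return p_l[yk][yl] * p_k[(yi, yj)][yk] * p_i[yi] * p_j[yj]

# The factorization yields a valid distribution: the joint sums to one.
total = sum(joint(*y) for y in itertools.product([0, 1], repeat=4))
assert abs(total - 1.0) < 1e-12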
  • 10–11. Undirected Graphical Models (= Markov random field (MRF) = Markov
    network). Graph G = (V, E), E ⊂ V × V, undirected, no self-edges; variable
    domains Y_i. Factorization over potentials ψ at the cliques:
        p(y) = (1/Z) ∏_{C ∈ C(G)} ψ_C(y_C),
    with constant Z = Σ_{y ∈ Y} ∏_{C ∈ C(G)} ψ_C(y_C).
    Example [figure: a simple MRF, the chain Y_i - Y_j - Y_k]:
        p(y) = (1/Z) ψ_i(y_i) ψ_j(y_j) ψ_k(y_k) ψ_{i,j}(y_i, y_j) ψ_{j,k}(y_j, y_k).
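The chain MRF can be spelled out the same way. A brute-force sketch with made-up potential values, showing how the constant Z arises by summing the unnormalized product over all joint states (exponential in the number of variables, so only viable for toy models):

import itertools

states = [0, 1]                               # domain of every variable
psi_i  = {0: 1.0, 1: 2.0}                     # hypothetical unary potentials
psi_j  = {0: 1.5, 1: 0.5}
psi_k  = {0: 1.0, 1: 1.0}
psi_ij = {(0, 0): 2.0, (0, 1): 0.5, (1, 0): 0.5, (1, 1): 2.0}   # pairwise
psi_jk = {(0, 0): 2.0, (0, 1): 0.5, (1, 0): 0.5, (1, 1): 2.0}

def unnorm(yi, yj, yk):
    # product of clique potentials for the chain Yi - Yj - Yk
    return (psi_i[yi] * psi_j[yj] * psi_k[yk]
            * psi_ij[(yi, yj)] * psi_jk[(yj, yk)])

Z = sum(unnorm(*y) for y in itertools.product(states, repeat=3))

def p(yi, yj, yk):
    return unnorm(yi, yj, yk) / Z             # p(y) = (1/Z) prod_C psi_C(y_C)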
  • 12. Example 1: the chain Y_i - Y_j - Y_k. Cliques C(G): vertex subsets
    V′ ⊆ V that are fully connected, i.e. every pair of distinct vertices in V′
    is joined by an edge in E. Here C(G) = {{i}, {i,j}, {j}, {j,k}, {k}}, so
        p(y) = (1/Z) ψ_i(y_i) ψ_j(y_j) ψ_k(y_k) ψ_{i,j}(y_i, y_j) ψ_{j,k}(y_j, y_k).
  • 13. Example 2: the fully connected graph on Y_i, Y_j, Y_k, Y_l. Here
    C(G) = 2^V: all subsets of V are cliques, so
        p(y) = (1/Z) ∏_{A ∈ 2^{{i,j,k,l}}} ψ_A(y_A).
  • 14–15. Factor Graphs. Graph G = (V, 𝓕, E), E ⊆ V × 𝓕: variable nodes V,
    factor nodes 𝓕, edges E between variable and factor nodes. Scope of a
    factor: N(F) = {i ∈ V : (i, F) ∈ E}; variable domains Y_i. Factorization
    over potentials ψ at the factors [figure: a factor graph over Y_i, Y_j,
    Y_k, Y_l]:
        p(y) = (1/Z) ∏_{F ∈ 𝓕} ψ_F(y_{N(F)}),
    with constant Z = Σ_{y ∈ Y} ∏_{F ∈ 𝓕} ψ_F(y_{N(F)}).
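A factor graph is conveniently represented as a list of (scope, table) pairs. The following generic sketch (our own data layout, invented for illustration) computes the normalized distribution exactly as in the formula above:

import itertools

# Each factor: (scope, table), where scope lists the variable indices N(F)
# and table maps a state tuple y_{N(F)} to a positive potential value.
factors = [
    ((0,),      {(0,): 1.0, (1,): 2.0}),
    ((0, 1),    {(0, 0): 2.0, (0, 1): 0.5, (1, 0): 0.5, (1, 1): 2.0}),
    ((1, 2, 3), {y: 1.0 for y in itertools.product([0, 1], repeat=3)}),
]
num_vars, states = 4, [0, 1]

def unnorm(y):
    prod = 1.0
    for scope, table in factors:
        prod *= table[tuple(y[i] for i in scope)]   # psi_F(y_{N(F)})
    return prod

Z = sum(unnorm(y) for y in itertools.product(states, repeat=num_vars))

def p(y):
    return unnorm(y) / Z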
  • 16. Why factor graphs? [Figure: three graphical representations of a model
    over Y_i, Y_j, Y_k, Y_l.] Factor graphs are explicit about the
    factorization, hence easier to work with, and universal (just like MRFs
    and Bayesian networks).
  • 17. Capacity. [Figure: two factor graphs over Y_i, Y_j, Y_k, Y_l.] A
    factor graph defines a family of distributions, and some families are
    larger than others.
  • 18–19. Four remaining pieces: 1. conditional distributions (CRFs),
    2. parameterization, 3. test-time inference, 4. learning the model from
    training data.
  • 20–22. Conditional Distributions. We have discussed p(y); how do we define
    p(y|x)? The potentials become a function of x_{N(F)}, and the partition
    function depends on x. Conditional random fields (CRFs): x is not part of
    the probability model, i.e. it is not treated as a random variable; the
    model is the conditional distribution [figure: CRF with observed nodes
    X_i, X_j and output nodes Y_i, Y_j]:
        p(y) = (1/Z) ∏_{F ∈ 𝓕} ψ_F(y_{N(F)})   becomes
        p(y|x) = (1/Z(x)) ∏_{F ∈ 𝓕} ψ_F(y_{N(F)}; x_{N(F)}).
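To see why the partition function becomes Z(x), consider this small CRF sketch (the potential, a quadratic penalty pulling labels toward the observation, is invented for illustration): every new observation x changes the potentials and therefore requires recomputing the normalizer.

import itertools, math

states = [0, 1]

def psi_unary(yi, xi, w=1.0):
    # x-dependent potential: prefers labels close to the observed value
    return math.exp(-w * (yi - xi) ** 2)

def unnorm(y, x):
    return math.prod(psi_unary(yi, xi) for yi, xi in zip(y, x))

def Z(x):
    # Z(x) = sum_y prod_F psi_F(y_{N(F)}; x_{N(F)}) -- depends on x
    return sum(unnorm(y, x) for y in itertools.product(states, repeat=len(x)))

x = (0.1, 0.9, 0.4)

def p_cond(y):
    return unnorm(y, x) / Z(x)                # p(y | x)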
  • 23–25. Potentials and Energy Functions. For each factor F ∈ 𝓕, let
    Y_F = ×_{i ∈ N(F)} Y_i and E_F : Y_{N(F)} → R. Potentials and energies are
    related by (assuming ψ_F(y_F) > 0)
        ψ_F(y_F) = exp(−E_F(y_F))   and   E_F(y_F) = −log ψ_F(y_F).
    Then p(y) can be written as
        p(Y = y) = (1/Z) ∏_{F ∈ 𝓕} ψ_F(y_F) = (1/Z) exp(−Σ_{F ∈ 𝓕} E_F(y_F)),
    hence p(y) is completely determined by the energy E(y) = Σ_{F ∈ 𝓕} E_F(y_F).
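The potential/energy correspondence is a one-liner in each direction; this quick numeric check (with arbitrary positive potential values) confirms that the product of potentials equals exp of minus the summed energies:

import math

def energy_from_potential(psi):
    return -math.log(psi)                     # E_F(y_F) = -log psi_F(y_F)

def potential_from_energy(E):
    return math.exp(-E)                       # psi_F(y_F) = exp(-E_F(y_F))

psis = [2.0, 0.5, 1.5]                        # factor values at some fixed y
Es = [energy_from_potential(v) for v in psis]

# prod_F psi_F(y_F) == exp(-sum_F E_F(y_F)): p(y) is determined by E(y)
assert abs(math.prod(psis) - math.exp(-sum(Es))) < 1e-12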
  • 26. Energy Minimization.
        argmax_{y ∈ Y} p(Y = y) = argmax_{y ∈ Y} (1/Z) exp(−Σ_{F ∈ 𝓕} E_F(y_F))
                                = argmax_{y ∈ Y} exp(−Σ_{F ∈ 𝓕} E_F(y_F))
                                = argmax_{y ∈ Y} (−Σ_{F ∈ 𝓕} E_F(y_F))
                                = argmin_{y ∈ Y} Σ_{F ∈ 𝓕} E_F(y_F)
                                = argmin_{y ∈ Y} E(y).
    Energy minimization can be interpreted as solving for the most likely
    state of some factor graph model.
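An exhaustive check of this chain of identities on a tiny made-up model: the minimizer of the energy is the maximizer of p(y), because Z and the monotone exp(−·) drop out of the argmax.

import itertools, math

E_unary    = {0: 0.2, 1: 1.0}                           # hypothetical energies
E_pairwise = {(0, 0): 0.0, (0, 1): 1.5, (1, 0): 1.5, (1, 1): 0.0}

def E(y):
    # E(y) = sum_F E_F(y_F) for a 3-variable chain
    return (sum(E_unary[yi] for yi in y)
            + sum(E_pairwise[(y[i], y[i + 1])] for i in range(len(y) - 1)))

ys = list(itertools.product([0, 1], repeat=3))
Z = sum(math.exp(-E(y)) for y in ys)

def p(y):
    return math.exp(-E(y)) / Z

assert min(ys, key=E) == max(ys, key=p)                 # MAP = energy minimum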
  • 27–28. Parameterization. Factor graphs define a family of distributions;
    parameterization means identifying individual members of the family by
    parameters w [figure: distributions p_{w1}, p_{w2} indexed by w inside the
    family of distributions].
  • 29. Example: Parameterization (image segmentation model). Pairwise “Potts”
    energy function E_F(y_i, y_j; w_1), with E_F : {0,1} × {0,1} × R → R:
        E_F(0, 0; w_1) = E_F(1, 1; w_1) = 0,
        E_F(0, 1; w_1) = E_F(1, 0; w_1) = w_1.
  • 30. Example: Parameterization (cont). Unary energy function
    E_F(y_i; x, w), with E_F : {0,1} × X × R^({0,1}×D) → R:
        E_F(0; x, w) = ⟨w(0), ψ_F(x)⟩,
        E_F(1; x, w) = ⟨w(1), ψ_F(x)⟩,
    and features ψ_F : X → R^D, e.g. image filters.
  • 31–32. Example: Parameterization (cont). [Figure: grid model with unary
    energies ⟨w(0), ψ_F(x)⟩ and ⟨w(1), ψ_F(x)⟩ at each site, and the Potts
    table (0, w_1; w_1, 0) on each edge.] Total number of parameters:
    D + D + 1. The parameters are shared across factors, but the energies
    differ because of the different ψ_F(x). General form, linear in w:
        E_F(y_F; x_F, w) = ⟨w(y_F), ψ_F(x_F)⟩.
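The D + D + 1 parameters of the segmentation model translate directly into code. A sketch (feature values and weights are random placeholders; ψ_F(x) would come from image filters in practice):

import numpy as np

D = 5                                          # assumed feature dimension

def potts_energy(yi, yj, w1):
    # pairwise Potts term: 0 if the labels agree, w1 if they disagree
    return 0.0 if yi == yj else w1

def unary_energy(yi, psi_x, w):
    # linear unary term <w(yi), psi_F(x)>, one weight vector per label
    return float(np.dot(w[yi], psi_x))

w  = {0: np.random.randn(D), 1: np.random.randn(D)}   # 2 * D parameters
w1 = 0.8                                              # + 1 Potts parameter
psi_x = np.random.rand(D)                             # features at one site

# Parameters are shared by all factors; the energies still differ from site
# to site because psi_F(x) differs.
e0, e1 = unary_energy(0, psi_x, w), unary_energy(1, psi_x, w)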
  • 33. Making Predictions. Given x ∈ X, predict y ∈ Y (or a prediction
    function f : X → Y). How do we measure the quality of a prediction?
  • 34. Loss function. Define a loss function ∆ : Y × Y → R_+, so that
    ∆(y, y∗) measures the loss incurred by predicting y when y∗ is true. The
    loss function is application dependent.
  • 35–37. Test-time Inference. Loss ∆(y, f(x)) with correct label y and
    prediction f(x), ∆ : Y × Y → R. True joint distribution d(x, y) and true
    conditional d(y|x); model distribution p(y|x). Expected loss (quality of
    the prediction):
        R^∆_f(x) = E_{y∼d(y|x)}[∆(y, f(x))] = Σ_{y ∈ Y} d(y|x) ∆(y, f(x))
                 ≈ E_{y∼p(y|x;w)}[∆(y, f(x))],
    assuming that p(y|x; w) ≈ d(y|x).
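Given the model distribution, the (approximate) expected loss of any candidate prediction can be evaluated by enumeration on small models. A sketch with a made-up p(y|x; w) and the Hamming loss standing in for ∆:

import itertools

ys = list(itertools.product([0, 1], repeat=3))
p = {y: 1.0 / len(ys) for y in ys}            # hypothetical model p(y|x; w)

def delta_hamming(y, y_pred):
    return sum(a != b for a, b in zip(y, y_pred)) / len(y)

def expected_loss(y_pred):
    # R(x) ~= sum_y p(y|x; w) * Delta(y, y_pred)
    return sum(p[y] * delta_hamming(y, y_pred) for y in ys)

best = min(ys, key=expected_loss)             # prediction minimizing the risk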
  • 38–39. Example 1: 0/1 loss. Loss 0 iff perfectly predicted, 1 otherwise:
        ∆_{0/1}(y, y∗) = I(y ≠ y∗) = { 0 if y = y∗, 1 otherwise }.
    Plugging it in:
        y∗ := argmin_{y ∈ Y} E_{y′∼p(y′|x)}[∆_{0/1}(y′, y)]
            = argmax_{y ∈ Y} p(y|x) = argmin_{y ∈ Y} E(y, x).
    Minimizing the expected 0/1 loss → MAP prediction (energy minimization).
  • 40–41. Example 2: Hamming loss. Count the number of mislabeled variables:
        ∆_H(y, y∗) = (1/|V|) Σ_{i ∈ V} I(y_i ≠ y_i∗).
    Plugging it in:
        y∗ := argmin_{y ∈ Y} E_{y′∼p(y′|x)}[∆_H(y′, y)],
        with y_i∗ = argmax_{y_i ∈ Y_i} p(y_i|x) for each i ∈ V.
    Minimizing the expected Hamming loss → maximum posterior marginal (MPM,
    Max-Marg) prediction.
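MPM prediction only needs the per-variable marginals, which a brute-force sketch can compute by summing p(y|x) over all states that agree at position i (the distribution below is an arbitrary made-up example):

import itertools

ys = list(itertools.product([0, 1], repeat=3))
weights = {y: 1.0 + 2.0 * sum(y) for y in ys}          # unnormalized, made up
Z = sum(weights.values())
p = {y: v / Z for y, v in weights.items()}

def marginal(i, yi):
    return sum(p[y] for y in ys if y[i] == yi)         # p(y_i = yi | x)

y_mpm = tuple(max([0, 1], key=lambda yi: marginal(i, yi)) for i in range(3))
y_map = max(ys, key=lambda y: p[y])
# In general y_mpm and y_map need not coincide: MPM minimizes the expected
# Hamming loss, MAP the expected 0/1 loss.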
  • 42–43. Example 3: Squared error. Assume a vector space on Y_i (pixel
    intensities, optical flow vectors, etc.). Sum of squared errors:
        ∆_Q(y, y∗) = (1/|V|) Σ_{i ∈ V} ‖y_i − y_i∗‖².
    Plugging it in:
        y∗ := argmin_{y ∈ Y} E_{y′∼p(y′|x)}[∆_Q(y′, y)],
        with y_i∗ = Σ_{y_i ∈ Y_i} p(y_i|x) y_i for each i ∈ V.
    Minimizing the expected squared error → minimum mean squared error (MMSE)
    prediction.
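Under squared error the optimal prediction is the posterior mean, which may fall between label values; a two-line sketch with a made-up marginal over intensities {0, 1, 2}:

marg = {0: 0.2, 1: 0.5, 2: 0.3}               # hypothetical p(y_i | x)
y_mmse = sum(prob * yi for yi, prob in marg.items())   # = 1.1, between labels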
  • 44. Inference Task: Maximum A Posteriori (MAP) Inference. Definition:
    given a factor graph, parameterization, and weight vector w, and given the
    observation x, find
        y∗ = argmax_{y ∈ Y} p(Y = y | x, w) = argmin_{y ∈ Y} E(y; x, w).
  • 45. Inference Task: Probabilistic Inference. Definition: given a factor
    graph, parameterization, and weight vector w, and given the observation x,
    find
        log Z(x, w) = log Σ_{y ∈ Y} exp(−E(y; x, w)),   and
        µ_F(y_F) = p(Y_F = y_F | x, w),   ∀F ∈ 𝓕, ∀y_F ∈ Y_F.
    This typically includes the variable marginals µ_i(y_i) = p(y_i | x, w).
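Both quantities of the probabilistic inference task can be computed exactly on toy models by enumeration; the energy below is invented purely to have something to sum over:

import itertools, math

ys = list(itertools.product([0, 1], repeat=3))

def E(y):
    # arbitrary toy energy: unary cost 0.5 per '1', disagreement cost 0.7
    return 0.5 * sum(y) + (0.7 if y[0] != y[1] else 0.0)

log_Z = math.log(sum(math.exp(-E(y)) for y in ys))

def mu_F(y0, y1):
    # factor marginal for the scope {0, 1}: p(Y_0 = y0, Y_1 = y1 | x, w)
    return sum(math.exp(-E(y) - log_Z) for y in ys if (y[0], y[1]) == (y0, y1))

mu_0 = {yi: sum(mu_F(yi, y1) for y1 in [0, 1]) for yi in [0, 1]}  # variable marginal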
  • 46. Example: Man-made structure detection. [Figure: left, input image x;
    middle, ground-truth labeling on 16-by-16 pixel blocks; right, factor
    graph model with factors ψ¹_i, ψ²_i linking X_i to Y_i and pairwise
    factors ψ³_{i,k} between Y_i and Y_k.] Features: gradient and color
    histograms. The model parameters are estimated from ≈ 60 training images.
  • 47. Example: Man-made structure detection (results). Left: input image x.
    Middle (probabilistic inference): visualization of the variable marginals
    p(y_i = “manmade” | x, w). Right (MAP inference): joint MAP labeling
    y∗ = argmax_{y ∈ Y} p(y | x, w).
  • 48–49. Training the Model. What can be learned? Model structure: the
    factors. Model variables: the observed variables are fixed, but we can add
    unobserved variables. Factor energies: the parameters.
  • 50. Training: Overview. Assume a fully observed, independent and
    identically distributed (iid) sample set {(xⁿ, yⁿ)}_{n=1,…,N} with
    (xⁿ, yⁿ) ∼ d(x, y). Goal: predict well. Alternative goal: first model
    d(y|x) well by p(y|x, w), then predict by minimizing the expected loss.
  • 51–52. Probabilistic Learning. Problem (Probabilistic Parameter Learning):
    let d(y|x) be the (unknown) conditional distribution of labels for a
    problem to be solved. For a parameterized conditional distribution
    p(y|x, w) with parameters w ∈ R^D, probabilistic parameter learning is the
    task of finding a point estimate of the parameter, w∗, that makes
    p(y|x, w∗) closest to d(y|x). We will discuss probabilistic parameter
    learning in detail.
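One standard way to make p(y|x, w∗) close to d(y|x) is conditional maximum likelihood; the details come later in the tutorial, but here is a self-contained one-parameter sketch (feature, data, and step size are all invented) using the classic gradient of the form "observed features minus model expectation":

import math

# One binary output with energy E(y; x, w) = w * phi(x, y), phi(x, 1) = x,
# phi(x, 0) = 0, so p(y | x, w) = exp(-w * phi(x, y)) / Z(x, w).
data = [(0.5, 1), (1.2, 1), (-0.3, 0), (-1.0, 0), (0.8, 0), (-0.5, 1)]

def phi(x, y):
    return x if y == 1 else 0.0

def neg_ll_grad(w):
    # d/dw of -sum_n log p(y^n | x^n, w)
    #   = sum_n ( phi(x^n, y^n) - E_{p(y|x^n, w)}[phi(x^n, y)] )
    g = 0.0
    for x, y in data:
        Z = sum(math.exp(-w * phi(x, yy)) for yy in (0, 1))
        model_exp = sum(math.exp(-w * phi(x, yy)) / Z * phi(x, yy)
                        for yy in (0, 1))
        g += phi(x, y) - model_exp
    return g

w = 0.0
for _ in range(200):                          # plain gradient descent
    w -= 0.1 * neg_ll_grad(w)                 # w converges to the estimate w*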
  • 53–54. Loss-Minimizing Parameter Learning. Problem (Loss-Minimizing
    Parameter Learning): let d(x, y) be the unknown distribution of data and
    labels, and let ∆ : Y × Y → R be a loss function. Loss-minimizing
    parameter learning is the task of finding a parameter value w∗ such that
    the expected prediction risk E_{(x,y)∼d(x,y)}[∆(y, f_p(x))] is as small as
    possible, where f_p(x) = argmax_{y ∈ Y} p(y|x, w∗). This requires the loss
    function at training time and directly learns a prediction function
    f_p(x).