Probabilistic Segmentation

Computer Science and Engineering,
Indian Institute of Technology Kharagpur.
Mixture Model (Image Segmentation)

Probability of generating a pixel measurement vector:

    p(x) = \sum_l p(x \mid \theta_l) \, \pi_l

The mixture model has the form:

    p(x \mid \Theta) = \sum_{l=1}^{g} \alpha_l \, p_l(x \mid \theta_l)

Component densities:

    p_l(x \mid \theta_l) = \frac{1}{(2\pi)^{d/2} \det(\Sigma_l)^{1/2}} \exp\left( -\frac{1}{2} (x - \mu_l)^T \Sigma_l^{-1} (x - \mu_l) \right)
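As a concrete illustration, the following is a minimal Python sketch (not from the slides; the helper name `mixture_density` and the example numbers are mine) of evaluating the mixture density p(x | Θ) with Gaussian components of the form above.

```python
import numpy as np
from scipy.stats import multivariate_normal

def mixture_density(x, alphas, mus, sigmas):
    """Evaluate p(x | Theta) = sum_l alpha_l N(x; mu_l, Sigma_l) for feature vectors x (n x d)."""
    return sum(a * multivariate_normal.pdf(x, mean=m, cov=s)
               for a, m, s in zip(alphas, mus, sigmas))

# Illustrative two-component mixture over 2-D pixel features.
alphas = [0.7, 0.3]
mus = [np.array([0.0, 0.0]), np.array([3.0, 3.0])]
sigmas = [np.eye(2), 0.5 * np.eye(2)]
x = np.array([[0.1, -0.2], [2.9, 3.1]])
print(mixture_density(x, alphas, mus, sigmas))   # one density value per pixel
```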
Image Segmentation

Likelihood for all observations (data points):

    \prod_{j \in \text{observations}} \left( \sum_{l=1}^{g} \alpha_l \, p_l(x_j \mid \theta_l) \right)
Mixture Model (Line Fitting)

    p(W) = \sum_l \pi_l \, p(W \mid a_l)

Likelihood for a set of observations:

    \prod_{j \in \text{observations}} \left( \sum_{l=1}^{g} \pi_l \, p_l(W_j \mid a_l) \right)
Missing data problems

Complete data log-likelihood:

    L_c(x ; u) = \log \left( \prod_j p_c(x_j ; u) \right) = \sum_j \log p_c(x_j ; u)

The incomplete data space:

    p_i(y ; u) = \int_{\{x \mid f(x) = y\}} p_c(x ; u) \, d\eta

where \eta measures volume on the space of x such that f(x) = y.
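A brief gloss (mine, not on the slides): when the missing part of x is a discrete label, the volume integral collapses to a sum over label values. Writing the complete data for a mixture as x = [y, z] with f(x) = y,

    p_i(y ; u) = \int_{\{x \mid f(x)=y\}} p_c(x ; u) \, d\eta = \sum_{l=1}^{g} p_c([y, z{=}l] ; u) = \sum_l \pi_l \, p(y \mid a_l),

which recovers the mixture density used on the earlier slides.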
Missing data problems

The incomplete data likelihood:

    \prod_{j \in \text{observations}} p_i(y_j ; u)

The incomplete data log-likelihood:

    L_i(y ; u) = \log \left( \prod_j p_i(y_j ; u) \right)
               = \sum_j \log p_i(y_j ; u)
               = \sum_j \log \left( \int_{\{x \mid f(x) = y_j\}} p_c(x ; u) \, d\eta \right)
EM for mixture models

The complete data is a composition of the incomplete data and the missing data:

    x_j = [y_j, z_j]

Mixture model:

    p(y) = \sum_l \pi_l \, p(y \mid a_l)

Complete data log-likelihood:

    \sum_{j \in \text{observations}} \left( \sum_{l=1}^{g} z_{lj} \log p(y_j \mid a_l) \right)
EM

E-step: Compute the expected value of z_j for each j, i.e. compute \bar{z}_j^{(s)}. This results in \bar{x}^s = [y, \bar{z}^s].

M-step: Maximize the complete data log-likelihood with respect to u:

    u^{s+1} = \arg\max_u L_c(\bar{x}^s ; u) = \arg\max_u L_c([y, \bar{z}^s] ; u)
EM in the General Case

Expected value of the complete data log-likelihood:

    Q(u ; u^{(s)}) = \int L_c(x ; u) \, p(x \mid u^{(s)}, y) \, dx

We maximize with respect to u to get:

    u^{s+1} = \arg\max_u Q(u ; u^{(s)})
Image Segmentation

What is missing data? An (n × g) matrix I of indicator variables.

Expectation step:

    E(I_{lm}) = \bar{I}_{lm} = 1 \cdot P(l\text{th pixel comes from } m\text{th blob}) + 0 \cdot P(l\text{th pixel does not come from } m\text{th blob})
              = P(l\text{th pixel comes from } m\text{th blob})

We get:

    \bar{I}_{lm} = \frac{\alpha_m^{(s)} \, p_m(x_l \mid \theta_m^{(s)})}{\sum_{k=1}^{K} \alpha_k^{(s)} \, p_k(x_l \mid \theta_k^{(s)})}
Image Segmentation

Complete data log-likelihood:

    L_c([x, \bar{I}_{lm}] ; \Theta^{(s)}) = \sum_{l \in \text{all pixels}} \left( \sum_{m=1}^{g} \bar{I}_{lm} \log p(x_l \mid \theta_m) \right)

Maximization step:

    \Theta^{(s+1)} = \arg\max_\Theta L_c([x, \bar{I}_{lm}] ; \Theta^{(s)})
Image Segmentation

Maximization step:

    \alpha_m^{(s+1)} = \frac{1}{n} \sum_{l=1}^{n} p(m \mid x_l, \Theta^{(s)})

    \mu_m^{(s+1)} = \frac{\sum_{l=1}^{n} x_l \, p(m \mid x_l, \Theta^{(s)})}{\sum_{l=1}^{n} p(m \mid x_l, \Theta^{(s)})}

    \Sigma_m^{(s+1)} = \frac{\sum_{l=1}^{n} p(m \mid x_l, \Theta^{(s)}) \, (x_l - \mu_m^{(s)})(x_l - \mu_m^{(s)})^T}{\sum_{l=1}^{n} p(m \mid x_l, \Theta^{(s)})}
How EM works for Image Segmentation

E-step:

    \bar{I}_{lm} = \frac{\alpha_m^{(s)} \, p_m(x_l \mid \theta_m^{(s)})}{\sum_{k=1}^{K} \alpha_k^{(s)} \, p_k(x_l \mid \theta_k^{(s)})}

    For each pixel, compute the value \alpha_m^{(s)} \, p_m(x_l \mid \theta_m^{(s)}) for each segment m.
    For each pixel, compute the sum \sum_{k=1}^{K} \alpha_k^{(s)} \, p_k(x_l \mid \theta_k^{(s)}), i.e. sum over all K segments.
    Divide the former by the latter.

M-step:

    Compute \alpha_m^{(s+1)}, \mu_m^{(s+1)}, \Sigma_m^{(s+1)}.
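The following is a minimal Python sketch of this E/M loop over pixel feature vectors. It is illustrative only: the function name `em_segment`, the initialization, and the small covariance regularizer are my choices, not from the slides.

```python
import numpy as np
from scipy.stats import multivariate_normal

def em_segment(x, n_segments, n_iters=50, seed=0):
    """EM for a Gaussian mixture over pixel feature vectors x (n x d).

    Returns mixing weights alpha (g,), means mu (g, d), covariances
    sigma (g, d, d), and responsibilities I_bar (n, g)."""
    rng = np.random.default_rng(seed)
    n, d = x.shape
    g = n_segments

    # Crude initialization: random pixels as means, shared covariance, uniform weights.
    alpha = np.full(g, 1.0 / g)
    mu = x[rng.choice(n, size=g, replace=False)].astype(float)
    sigma = np.stack([np.cov(x.T) + 1e-6 * np.eye(d)] * g)

    for _ in range(n_iters):
        # E-step: I_bar[l, m] = alpha_m p_m(x_l | theta_m) / sum_k alpha_k p_k(x_l | theta_k)
        weighted = np.stack(
            [alpha[m] * multivariate_normal.pdf(x, mean=mu[m], cov=sigma[m]) for m in range(g)],
            axis=1)
        I_bar = weighted / weighted.sum(axis=1, keepdims=True)

        # M-step: the closed-form updates for alpha, mu, Sigma from the previous slide.
        N_m = I_bar.sum(axis=0)                    # soft count of pixels per segment
        alpha = N_m / n
        mu = (I_bar.T @ x) / N_m[:, None]
        for m in range(g):
            diff = x - mu[m]
            sigma[m] = (I_bar[:, m, None] * diff).T @ diff / N_m[m] + 1e-6 * np.eye(d)

    return alpha, mu, sigma, I_bar
```

A hard segmentation is then `I_bar.argmax(axis=1)`; in practice one would also add a convergence test on the log-likelihood and a better initialization (e.g. k-means), which touches on the difficulties listed later.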
Line Fitting (Expectation Maximization)

What is missing data? An (n × g) matrix M of indicator variables.

    (k, l)\text{th entry of } M = m_{k,l} = 1 \text{ if point } k \text{ is drawn from line } l, \text{ 0 otherwise}

    \sum_l P(m_{kl} = 1 \mid \text{point } k, \text{ line } l\text{'s parameters}) = 1.

How to formulate the likelihood?

    \exp\left( -\frac{(\text{distance from point } k \text{ to line } l)^2}{2\sigma^2} \right)
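A minimal Python sketch of this line-fitting EM, under my own additional assumptions: lines are parameterized as a x + b y + c = 0 with unit normal (a, b), the per-line likelihood is the exponential of the squared point-to-line distance as on the slide, and mixing weights are carried along as in the earlier mixture formulation. Names such as `em_lines` are mine.

```python
import numpy as np

def em_lines(points, n_lines, sigma=1.0, n_iters=30, seed=0):
    """EM for fitting n_lines lines a*x + b*y + c = 0 (a^2 + b^2 = 1) to 2-D points."""
    rng = np.random.default_rng(seed)
    n = len(points)

    # Initialize each line from a random pair of points.
    lines = []
    for _ in range(n_lines):
        p, q = points[rng.choice(n, size=2, replace=False)]
        direction = q - p
        normal = np.array([-direction[1], direction[0]])
        normal /= np.linalg.norm(normal)
        lines.append(np.array([normal[0], normal[1], -normal @ p]))
    lines = np.stack(lines)
    pi = np.full(n_lines, 1.0 / n_lines)

    for _ in range(n_iters):
        # E-step: soft assignment of each point to each line via exp(-d^2 / (2 sigma^2)).
        d = points @ lines[:, :2].T + lines[:, 2]        # signed distances, shape (n, g)
        w = pi * np.exp(-d**2 / (2.0 * sigma**2))
        w = np.maximum(w, 1e-300)                        # avoid division by zero
        w /= w.sum(axis=1, keepdims=True)

        # M-step: weighted total-least-squares refit of each line.
        for l in range(n_lines):
            wl = w[:, l]
            mean = (wl[:, None] * points).sum(axis=0) / wl.sum()
            centered = points - mean
            cov = (wl[:, None] * centered).T @ centered
            normal = np.linalg.eigh(cov)[1][:, 0]        # eigenvector of smallest eigenvalue
            lines[l] = np.array([normal[0], normal[1], -normal @ mean])
        pi = w.mean(axis=0)

    return lines, w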
Motion Segmentation (EM)

What is missing data? The motion field to which each pixel belongs. The indicator variable V_{xy,l} is the (xy, l)th entry of V:

    V_{xy,l} = 1 \text{ if the } xy\text{th pixel belongs to the } l\text{th motion field, 0 otherwise}

How to formulate the likelihood?

    L(V, \Theta) = -\sum_{xy,l} V_{xy,l} \, \frac{\left( I_1(x, y) - I_2(x + m_1(x, y ; \theta_l),\; y + m_2(x, y ; \theta_l)) \right)^2}{2\sigma^2}

where \Theta = \{\theta_1, \theta_2, \ldots, \theta_g\}.
Motion Segmentation (EM)

The quantity to compute in the expectation step is P(V_{xy,l} = 1 ; I_1, I_2, \Theta).

A common choice is the affine motion model:

    \begin{pmatrix} m_1 \\ m_2 \end{pmatrix}(x, y ; \theta_l) = \begin{pmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{pmatrix} \begin{pmatrix} x \\ y \end{pmatrix} + \begin{pmatrix} a_{13} \\ a_{23} \end{pmatrix}

where \theta_l = (a_{11}, a_{12}, \ldots, a_{23}).    (Layered representation)
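A tiny sketch of evaluating this affine model at a pixel (the helper name and example parameter values are mine, purely illustrative):

```python
def affine_motion(theta, x, y):
    """Evaluate the affine motion model (m1, m2) at pixel coordinates (x, y).
    theta = (a11, a12, a13, a21, a22, a23), as on the slide."""
    a11, a12, a13, a21, a22, a23 = theta
    m1 = a11 * x + a12 * y + a13
    m2 = a21 * x + a22 * y + a23
    return m1, m2

# Example: translation by (2, -1) plus a 1% expansion about the origin.
m1, m2 = affine_motion((0.01, 0.0, 2.0, 0.0, 0.01, -1.0), x=10.0, y=5.0)
```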
Identifying Outliers (EM)

    We construct an explicit model of the outliers:

        (1 - \lambda) \, P(\text{measurements} \mid \text{model}) + \lambda \, P(\text{outliers})

    Here \lambda \in [0, 1] models the frequency with which outliers occur,
    and P(outliers) is the probability model for the outliers.

What is missing data?
A variable that indicates which component generated each point.

Complete data likelihood:

    \prod_j \left[ (1 - \lambda) \, P(\text{measurement}_j \mid \text{model}) + \lambda \, P(\text{measurement}_j \mid \text{outliers}) \right]
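A short Python sketch of this two-component inlier/outlier mixture, under my own assumptions that the inlier model is Gaussian and P(outliers) is a constant density (e.g. uniform over the measurement volume); the slides leave both unspecified.

```python
import numpy as np
from scipy.stats import multivariate_normal

def mixture_with_outliers_loglik(x, lam, mu, sigma, outlier_density):
    """log prod_j [(1 - lambda) P(x_j | model) + lambda P(x_j | outliers)]."""
    inlier = multivariate_normal.pdf(x, mean=mu, cov=sigma)
    return np.sum(np.log((1.0 - lam) * inlier + lam * outlier_density))

def outlier_responsibility(x, lam, mu, sigma, outlier_density):
    """E-step: posterior probability that each point came from the outlier component."""
    inlier = multivariate_normal.pdf(x, mean=mu, cov=sigma)
    return lam * outlier_density / ((1.0 - lam) * inlier + lam * outlier_density)
```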
Background Subtraction (EM)

    For each pixel we get a series of observations over successive frames.
    The source of these observations is a mixture model with two components: the background and the noise (foreground).
    The background can be modeled as a Gaussian.
    The noise can come from some uniform source.
    Any pixel which belongs to the noise component is not background.
Difficulties (Expectation Maximization)

    Local minima.
    Proper initialization.
    Extremely small expected weights.
    Parameters converging to the boundaries of parameter space.
Model Selection

    Should we consider minimizing the negative log-likelihood?
    We should have a penalty term that increases as the number of components increases.

An Information Criterion (AIC):

    -2 L(x ; \Theta^*) + 2p

Bayesian Information Criterion (BIC):

    -L(D ; \theta^*) + \frac{p}{2} \log N

where p is the number of free parameters.
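A small helper sketch for these criteria (function names are mine; the free-parameter count for a full-covariance Gaussian mixture is my addition, not from the slides). One would fit models for a range of component counts g and keep the one with the smallest criterion value.

```python
import numpy as np

def aic(log_lik, p):
    """AIC as on the slide: -2 L(x; Theta*) + 2 p (smaller is better)."""
    return -2.0 * log_lik + 2.0 * p

def bic(log_lik, p, n_obs):
    """BIC (and MDL) as on the slides: -L(D; theta*) + (p / 2) log N (smaller is better)."""
    return -log_lik + 0.5 * p * np.log(n_obs)

def gmm_free_params(g, d):
    """Free parameters of a g-component, d-dimensional full-covariance Gaussian mixture:
    (g - 1) mixing weights + g*d means + g*d*(d+1)/2 covariance entries."""
    return (g - 1) + g * d + g * d * (d + 1) // 2
```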
Bayesian Information Criterion (BIC)

    P(M \mid D) = \frac{P(D \mid M)}{P(D)} \, P(M) = \frac{\int P(D \mid M, \theta) \, P(\theta) \, d\theta}{P(D)} \, P(M)

Maximizing the posterior P(M | D) yields:

    -L(D ; \theta^*) + \frac{p}{2} \log N

where p is the number of free parameters.
Minimum Description Length (MDL) criterion

MDL yields a selection criterion which is the same as BIC:

    -L(D ; \theta^*) + \frac{p}{2} \log N

where p is the number of free parameters.
