The document discusses probabilistic segmentation using mixture models and the expectation-maximization (EM) algorithm. It addresses image segmentation and line fitting applications.
For image segmentation, the missing data is an (n × g) matrix of indicator variables showing which pixel belongs to which segment. The E-step computes the probability that each pixel belongs to each segment. The M-step re-estimates the mixture model parameters to maximize the expected complete data log-likelihood.
For line fitting, the missing data is similarly an (n × g) matrix showing which point belongs to which line. The E-step computes the probability that each point was drawn from each line. The M-step then re-estimates the line parameters.
1. Probabilistic Segmentation
Computer Science and Engineering,
Indian Institute of Technology Kharagpur
2. Mixture Model Image Segmentation
Probability of generating a pixel measurement vector:
p(x) = \sum_l p(x \mid \theta_l)\, \pi_l
The mixture model has the form:
p(x \mid \Theta) = \sum_{l=1}^{g} \alpha_l \, p_l(x \mid \theta_l)
Component densities:
p_l(x \mid \theta_l) = \frac{1}{(2\pi)^{d/2} \det(\Sigma_l)^{1/2}} \exp\left( -\tfrac{1}{2} (x - \mu_l)^\top \Sigma_l^{-1} (x - \mu_l) \right)
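To make the model concrete, here is a minimal NumPy/SciPy sketch (not part of the original slides) that evaluates the mixture density above for a set of pixel feature vectors; the function name `mixture_density` and the array layout are assumptions made for illustration.

```python
import numpy as np
from scipy.stats import multivariate_normal

def mixture_density(X, alphas, mus, Sigmas):
    """Evaluate p(x | Theta) = sum_l alpha_l * p_l(x | theta_l) for each row of X.

    X      : (n, d) array of pixel feature vectors
    alphas : (g,) mixing weights summing to 1
    mus    : (g, d) component means
    Sigmas : (g, d, d) component covariances
    """
    n, g = X.shape[0], len(alphas)
    p = np.zeros(n)
    for l in range(g):
        p += alphas[l] * multivariate_normal.pdf(X, mean=mus[l], cov=Sigmas[l])
    return p
```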
6. Mixture Model Line Fitting
p(W) = \sum_l \pi_l \, p(W \mid a_l)
Likelihood for a set of observations:
\prod_{j \in \text{observations}} \sum_{l=1}^{g} \pi_l \, p_l(W_j \mid a_l)
8. Missing data problems
L_c(x \,; u) = \log \prod_j p_c(x_j \,; u) = \sum_j \log p_c(x_j \,; u)
The incomplete data space:
p_i(y \,; u) = \int_{\{x \,\mid\, f(x) = y\}} p_c(x \,; u)\, d\eta
where \eta measures volume on the space of x such that f(x) = y
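For the mixture models used throughout these slides, the missing part of the complete data is a discrete component label, so the integral above reduces to a sum over components; a worked special case (my own restatement, consistent with the later slides):

```latex
% Incomplete-data density when the missing data is the component indicator
p_i(y \,; u) = \sum_{l=1}^{g} \pi_l \, p(y \mid a_l)
```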
15. Missing data problems
The incomplete data likelihood:
\prod_{j \in \text{observations}} p_i(y_j \,; u)
L_i(y \,; u) = \log \prod_j p_i(y_j \,; u) = \sum_j \log p_i(y_j \,; u) = \sum_j \log \int_{\{x \,\mid\, f(x) = y_j\}} p_c(x \,; u)\, d\eta
18. EM for mixture models
The complete data is a composition of the incomplete data and the missing data:
x_j = (y_j, z_j)
Mixture model:
p(y) = \sum_l \pi_l \, p(y \mid a_l)
Complete data log-likelihood:
\sum_{j \in \text{observations}} \sum_{l=1}^{g} z_{lj} \log p(y_j \mid a_l)
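Since z_{lj} is a 0/1 indicator, its conditional expectation given y_j is just a posterior probability; a one-line restatement (my own, but consistent with the image segmentation slides that follow):

```latex
\bar{z}_{lj} = E\left[ z_{lj} \mid y_j, u^{(s)} \right]
            = \frac{\pi_l \, p(y_j \mid a_l^{(s)})}{\sum_{k=1}^{g} \pi_k \, p(y_j \mid a_k^{(s)})}
```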
21. EM
E-step: Compute the expected value \bar{z}_j^{(s)} of z_j for each j. This results in \bar{x}^s = [y, \bar{z}^s].
M-step: Maximize the complete data log-likelihood with respect to u:
u^{s+1} = \arg\max_u L_c(\bar{x}^s \,; u) = \arg\max_u L_c([y, \bar{z}^s] \,; u)
24. EM in General Case
Expected value of the complete data log-likelihood:
Q(u \,; u^{(s)}) = \int L_c(x \,; u)\, p(x \mid u^{(s)}, y)\, dx
We maximize with respect to u to get:
u^{s+1} = \arg\max_u Q(u \,; u^{(s)})
25. Image Segmentation
What is missing data? An (n × g) matrix I of indicator variables.
Expectation step:
E(I_{lm}) = \bar{I}_{lm} = 1 \cdot P(l\text{th pixel comes from } m\text{th blob}) + 0 \cdot P(l\text{th pixel does not come from } m\text{th blob}) = P(l\text{th pixel comes from } m\text{th blob})
We get:
\bar{I}_{lm} = \frac{\alpha_m^{(s)} \, p_m(x_l \mid \theta_m^{(s)})}{\sum_{k=1}^{K} \alpha_k^{(s)} \, p_k(x_l \mid \theta_k^{(s)})}
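A hedged sketch of this expectation step in NumPy; the name `e_step`, the array shapes, and the use of `scipy.stats.multivariate_normal` are choices made for the example rather than anything prescribed by the slides.

```python
import numpy as np
from scipy.stats import multivariate_normal

def e_step(X, alphas, mus, Sigmas):
    """Compute responsibilities resp[l, m] = P(pixel l comes from segment m)."""
    n, g = X.shape[0], len(alphas)
    weighted = np.zeros((n, g))
    for m in range(g):
        # alpha_m^(s) * p_m(x_l | theta_m^(s)) for every pixel l
        weighted[:, m] = alphas[m] * multivariate_normal.pdf(X, mean=mus[m], cov=Sigmas[m])
    # divide by the sum over all segments (the denominator on the slide)
    return weighted / weighted.sum(axis=1, keepdims=True)
```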
29. Image Segmentation
Complete data log-likelihood:
L_c([x, \bar{I}_{lm}] \,; \Theta^{(s)}) = \sum_{l \in \text{all pixels}} \sum_{m=1}^{g} \bar{I}_{lm} \log p(x_l \mid \theta_m)
Maximization step:
\Theta^{(s+1)} = \arg\max_{\Theta} L_c([x, \bar{I}_{lm}] \,; \Theta)
31. Image Segmentation
Maximization step:
\alpha_m^{(s+1)} = \frac{1}{n} \sum_{l=1}^{n} p(m \mid x_l, \Theta^{(s)})
\mu_m^{(s+1)} = \frac{\sum_{l=1}^{n} x_l \, p(m \mid x_l, \Theta^{(s)})}{\sum_{l=1}^{n} p(m \mid x_l, \Theta^{(s)})}
\Sigma_m^{(s+1)} = \frac{\sum_{l=1}^{n} p(m \mid x_l, \Theta^{(s)}) \, (x_l - \mu_m^{(s)}) (x_l - \mu_m^{(s)})^\top}{\sum_{l=1}^{n} p(m \mid x_l, \Theta^{(s)})}
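The corresponding maximization step might be sketched as follows; note that this version plugs the freshly updated mean into the covariance update (a common variant), whereas the slide keeps the previous iterate \mu_m^{(s)}.

```python
import numpy as np

def m_step(X, resp):
    """Re-estimate (alphas, mus, Sigmas) from responsibilities resp[l, m] = p(m | x_l, Theta^(s))."""
    n, d = X.shape
    g = resp.shape[1]
    Nm = resp.sum(axis=0)                      # effective number of pixels per segment
    alphas = Nm / n
    mus = (resp.T @ X) / Nm[:, None]
    Sigmas = np.zeros((g, d, d))
    for m in range(g):
        diff = X - mus[m]                      # uses the freshly updated mean (a common variant)
        Sigmas[m] = (resp[:, m, None] * diff).T @ diff / Nm[m]
    return alphas, mus, Sigmas
```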
32. How EM works for Image Segmentation
E-step:
\bar{I}_{lm} = \frac{\alpha_m^{(s)} \, p_m(x_l \mid \theta_m^{(s)})}{\sum_{k=1}^{K} \alpha_k^{(s)} \, p_k(x_l \mid \theta_k^{(s)})}
For each pixel, compute the values \alpha_m^{(s)} p_m(x_l \mid \theta_m^{(s)}) for each segment m.
For each pixel, compute the sum \sum_{k=1}^{K} \alpha_k^{(s)} p_k(x_l \mid \theta_k^{(s)}), i.e. perform the summation over all K segments.
Divide the former by the latter.
M-step: Compute \alpha_m^{(s+1)}, \mu_m^{(s+1)}, \Sigma_m^{(s+1)}.
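Putting the pieces together, a possible end-to-end loop reusing `mixture_density`, `e_step`, and `m_step` from the earlier sketches; the random initialization, tolerance, and iteration cap are assumptions for the example, not taken from the slides.

```python
import numpy as np

def em_segmentation(X, g, n_iter=100, tol=1e-6, seed=0):
    """Fit a g-component Gaussian mixture to pixel features X with EM (sketch)."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    alphas = np.full(g, 1.0 / g)
    mus = X[rng.choice(n, size=g, replace=False)]        # random pixels as initial means
    Sigmas = np.tile(np.cov(X.T) + 1e-6 * np.eye(d), (g, 1, 1))
    prev_ll = -np.inf
    for _ in range(n_iter):
        resp = e_step(X, alphas, mus, Sigmas)            # E-step: responsibilities
        alphas, mus, Sigmas = m_step(X, resp)            # M-step: parameter updates
        ll = np.sum(np.log(mixture_density(X, alphas, mus, Sigmas)))
        if ll - prev_ll < tol:                           # stop when the likelihood stalls
            break
        prev_ll = ll
    labels = resp.argmax(axis=1)                         # hard segment assignment
    return alphas, mus, Sigmas, labels
```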
33. Line Fitting Expectation Maximization
What is missing data? An (n × g) matrix M of indicator variables:
m_{k,l} = \begin{cases} 1 & \text{if point } k \text{ is drawn from line } l \\ 0 & \text{otherwise} \end{cases}
\sum_l P(m_{kl} = 1 \mid \text{point } k, \text{ line } l\text{'s parameters}) = 1
How to formulate the likelihood?
\exp\left( -\frac{(\text{distance from point } k \text{ to line } l)^2}{2\sigma^2} \right)
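A sketch of the corresponding E- and M-steps for line fitting, assuming lines parameterized as ax + by + c = 0 with a^2 + b^2 = 1 and a fixed sigma; the weighted total-least-squares refit is one reasonable choice for the M-step, not the only one.

```python
import numpy as np

def line_distances(P, lines):
    """Perpendicular distances of points P (n, 2) to lines (g, 3) given as (a, b, c) with a^2+b^2=1."""
    return np.abs(P @ lines[:, :2].T + lines[:, 2])        # (n, g)

def e_step_lines(P, lines, pis, sigma):
    """Responsibilities P(point k drawn from line l), using exp(-d^2 / 2 sigma^2)."""
    w = pis * np.exp(-line_distances(P, lines) ** 2 / (2 * sigma ** 2))
    return w / w.sum(axis=1, keepdims=True)

def m_step_lines(P, resp):
    """Weighted total-least-squares refit of each line from the soft assignments."""
    g = resp.shape[1]
    lines = np.zeros((g, 3))
    for l in range(g):
        w = resp[:, l]
        mean = (w[:, None] * P).sum(axis=0) / w.sum()
        C = (w[:, None] * (P - mean)).T @ (P - mean)       # weighted scatter matrix
        normal = np.linalg.eigh(C)[1][:, 0]                # eigenvector of the smallest eigenvalue
        lines[l] = [normal[0], normal[1], -normal @ mean]
    return lines, resp.mean(axis=0)                        # new line parameters and mixing weights
```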
37. Motion Segmentation EM
What is missing data? It is the motion field to which each pixel belongs. The indicator variable V_{xy,l} is the (xy, l)th entry of V:
V_{xy,l} = \begin{cases} 1 & \text{if the } xy\text{th pixel belongs to the } l\text{th motion field} \\ 0 & \text{otherwise} \end{cases}
How to formulate the likelihood?
L(V, \Theta) = -\sum_{xy,l} V_{xy,l} \, \frac{\left( I_1(x, y) - I_2\big(x + m_1(x, y \,; \theta_l),\; y + m_2(x, y \,; \theta_l)\big) \right)^2}{2\sigma^2}
where \Theta = \{\theta_1, \theta_2, \ldots, \theta_g\}
39. Motion Segmentation EM
The quantity needed in the E-step is P(V_{xy,l} = 1 \,; I_1, I_2, \Theta).
A common choice is the affine motion model:
\begin{pmatrix} m_1 \\ m_2 \end{pmatrix}(x, y \,; \theta_l) = \begin{pmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{pmatrix} \begin{pmatrix} x \\ y \end{pmatrix} + \begin{pmatrix} a_{13} \\ a_{23} \end{pmatrix}
where \theta_l = (a_{11}, a_{12}, \ldots, a_{23}). This corresponds to a layered representation.
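For illustration, evaluating this affine motion model over a pixel grid could look like the following; the parameter ordering \theta_l = (a_{11}, a_{12}, a_{13}, a_{21}, a_{22}, a_{23}) and the example values are assumptions for the sketch.

```python
import numpy as np

def affine_motion(theta, xs, ys):
    """Evaluate (m1, m2) = A [x, y]^T + t for an affine motion field theta = (a11, a12, a13, a21, a22, a23)."""
    a11, a12, a13, a21, a22, a23 = theta
    m1 = a11 * xs + a12 * ys + a13
    m2 = a21 * xs + a22 * ys + a23
    return m1, m2

# Example: motion vectors on a 4x4 pixel grid for one layer's parameters
ys, xs = np.mgrid[0:4, 0:4]
m1, m2 = affine_motion((0.01, 0.0, 0.5, 0.0, 0.01, -0.2), xs, ys)
```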
40. Identifying Outliers EM
We construct an explicit model of the outliers:
(1 - \lambda)\, P(\text{measurements} \mid \text{model}) + \lambda\, P(\text{outliers})
Here \lambda \in [0, 1] models the frequency with which the outliers occur, and P(\text{outliers}) is the probability model for the outliers.
What is missing data? A variable that indicates which component generated each point.
Complete data likelihood:
\prod_j \left[ (1 - \lambda)\, P(\text{measurement}_j \mid \text{model}) + \lambda\, P(\text{measurement}_j \mid \text{outliers}) \right]
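A minimal sketch of the E-step for this inlier/outlier mixture, assuming a one-dimensional Gaussian inlier model and a uniform outlier density over a known data range (both modelling choices are assumptions for the example):

```python
import numpy as np
from scipy.stats import norm

def outlier_responsibilities(y, lam, mu, sigma, y_range):
    """P(point j is an outlier) under (1 - lam) * N(mu, sigma^2) + lam * Uniform over y_range."""
    p_inlier = (1 - lam) * norm.pdf(y, loc=mu, scale=sigma)
    p_outlier = lam * (1.0 / y_range)            # uniform outlier density
    return p_outlier / (p_inlier + p_outlier)
```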
42. Background Subtraction EM
For each pixel we get a series of observations over the successive frames.
The source of these observations is a mixture model with two components: the background and the noise (foreground).
The background can be modeled as a Gaussian.
The noise can come from some uniform source.
Any pixel which belongs to the noise component is not background.
43. Difficulties with Expectation Maximization
Local minima.
Proper initialization.
Extremely small expected weights.
Parameters converging to the boundaries of the parameter space.
44. Model Selection
Should we consider minimizing the negative of the log-likelihood?
We should have a penalty term which increases as the number of components increases.
An Information Criterion (AIC):
-2 L(x \,; \Theta^*) + 2p
where p is the number of free parameters.
Bayesian Information Criterion (BIC):
-L(D \,; \theta^*) + \frac{p}{2} \log N
where p is the number of free parameters.
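A small helper implementing the two criteria above for a fitted g-component Gaussian mixture; the count of free parameters for full covariances (mixing weights, means, symmetric covariances) is my own bookkeeping and should be treated as an illustrative assumption.

```python
import numpy as np

def gmm_free_parameters(g, d):
    """Free parameters of a g-component Gaussian mixture in d dimensions with full covariances."""
    return (g - 1) + g * d + g * d * (d + 1) // 2   # weights + means + covariances

def aic_bic(log_lik, g, d, n):
    """Return (AIC, BIC) using the conventions on the slide."""
    p = gmm_free_parameters(g, d)
    aic = -2.0 * log_lik + 2.0 * p
    bic = -log_lik + 0.5 * p * np.log(n)
    return aic, bic
```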
48. Bayesian Information Criterion (BIC)
P(M \mid D) = P(M)\, \frac{P(D \mid M)}{P(D)} = P(M)\, \frac{\int P(D \mid M, \theta)\, P(\theta)\, d\theta}{P(D)}
Maximizing the posterior P(M \mid D) yields:
-L(D \,; \theta^*) + \frac{p}{2} \log N
where p is the number of free parameters.
49. Minimum Description Length (MDL) criterion
It yields a selection criterion which is the same as BIC:
-L(D \,; \theta^*) + \frac{p}{2} \log N
where p is the number of free parameters.