1. Probabilistic Segmentation
Computer Science and Engineering,
Indian Institute of Technology Kharagpur
4. Mixture Model Image Segmentation
Probability of generating a pixel measurement vector:
$$p(x) = \sum_l p(x \mid \theta_l)\, \pi_l$$
The mixture model has the form:
$$p(x \mid \Theta) = \sum_{l=1}^{g} \alpha_l\, p_l(x \mid \theta_l)$$
Component densities:
$$p_l(x \mid \theta_l) = \frac{1}{(2\pi)^{d/2} \det(\Sigma_l)^{1/2}} \exp\left( -\frac{1}{2} (x - \mu_l)^\top \Sigma_l^{-1} (x - \mu_l) \right)$$
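As a concrete illustration of the density above, here is a minimal Python sketch (not from the slides; names such as mixture_density are illustrative) that evaluates a Gaussian mixture at a pixel measurement vector:

```python
# A minimal sketch of evaluating the mixture density
# p(x | Theta) = sum_l alpha_l * p_l(x | theta_l) with Gaussian components.
import numpy as np
from scipy.stats import multivariate_normal

def mixture_density(x, alphas, means, covs):
    """Evaluate p(x | Theta) for a Gaussian mixture.

    x      : (d,) pixel measurement vector
    alphas : (g,) mixing weights, non-negative and summing to 1
    means  : list of g mean vectors mu_l, each (d,)
    covs   : list of g covariance matrices Sigma_l, each (d, d)
    """
    return sum(a * multivariate_normal.pdf(x, mean=mu, cov=S)
               for a, mu, S in zip(alphas, means, covs))

# Example: a two-component mixture over 2-D colour features.
x = np.array([0.4, 0.6])
alphas = np.array([0.7, 0.3])
means = [np.zeros(2), np.ones(2)]
covs = [np.eye(2), 0.5 * np.eye(2)]
print(mixture_density(x, alphas, means, covs))
```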
7. Mixture Model Line Fitting
$$p(W) = \sum_l \pi_l\, p(W \mid a_l)$$
Likelihood for a set of observations:
$$\prod_{j \in \text{observations}} \sum_{l=1}^{g} \pi_l\, p_l(W_j \mid a_l)$$
12. Missing data problems
$$L_c(x ; u) = \log \prod_j p_c(x_j ; u) = \sum_j \log p_c(x_j ; u)$$
The incomplete data space:
$$p_i(y ; u) = \int_{\{x \mid f(x) = y\}} p_c(x ; u)\, d\eta$$
where $\eta$ measures volume on the space of $x$ such that $f(x) = y$.
17. Missing data problems
The incomplete data likelihood:
$$\prod_{j \in \text{observations}} p_i(y_j ; u)$$
$$L_i(y ; u) = \log \prod_j p_i(y_j ; u) = \sum_j \log p_i(y_j ; u) = \sum_j \log \int_{\{x \mid f(x) = y_j\}} p_c(x ; u)\, d\eta$$
20. EM for mixture models
The complete data is a composition of the incomplete data and the missing data:
$$x_j = [y_j, z_j]$$
Mixture model:
$$p(y) = \sum_l \pi_l\, p(y \mid a_l)$$
Complete data log-likelihood:
$$\sum_{j \in \text{observations}} \sum_{l=1}^{g} z_{lj} \log p(y_j \mid a_l)$$
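Since the indicators $z_{lj}$ are unobserved, EM works with the expectation of this complete data log-likelihood. Taking the expectation simply replaces each $z_{lj}$ by its conditional expectation under the current parameters $u^{(s)}$, the same soft-assignment ratio that appears later for image segmentation; a short sketch of that step:

```latex
% Expected complete-data log-likelihood: each unobserved z_{lj} is replaced
% by its conditional expectation under the current estimate u^{(s)}.
E\left[ L_c \right]
  = \sum_{j} \sum_{l=1}^{g} \bar{z}_{lj} \log p\left( y_j \mid a_l \right),
\qquad
\bar{z}_{lj}
  = E\left[ z_{lj} \mid y_j , u^{(s)} \right]
  = \frac{\pi_l \, p\left( y_j \mid a_l \right)}
         {\sum_{k=1}^{g} \pi_k \, p\left( y_j \mid a_k \right)} .
```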
23. EM
E-step: Compute the expected value of $z_j$ for each $j$, i.e. compute $\bar{z}_j^{(s)}$. This results in $\bar{x}^s = [y, \bar{z}^s]$.
M-step: Maximize the complete data log-likelihood with respect to $u$:
$$u^{s+1} = \arg\max_u L_c(\bar{x}^s ; u) = \arg\max_u L_c([y, \bar{z}^s] ; u)$$
24. EM in General Case
Expected value of the complete data log-likelihood:
$$Q(u ; u^{(s)}) = \int L_c(x ; u)\, p(x \mid u^{(s)}, y)\, dx$$
We maximize with respect to $u$ to get:
$$u^{s+1} = \arg\max_u Q(u ; u^{(s)})$$
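The general case suggests a simple algorithmic skeleton. The following Python sketch is illustrative, not code from the lecture: e_step and m_step are caller-supplied placeholders, and the parameters are assumed to be packed into a flat array.

```python
# Generic EM loop: alternate E-step and M-step until the parameters settle.
import numpy as np

def em(y, u0, e_step, m_step, max_iters=100, tol=1e-6):
    """Run EM from an initial parameter vector u0 (a flat numpy array).

    e_step(y, u)     -> expected missing data z_bar given y and current u
    m_step(y, z_bar) -> parameters maximizing L_c([y, z_bar]; u)
    """
    u = np.asarray(u0, dtype=float)
    for _ in range(max_iters):
        z_bar = e_step(y, u)                  # E-step: fill in missing data
        u_new = np.asarray(m_step(y, z_bar), dtype=float)
        if np.max(np.abs(u_new - u)) < tol:   # stop when parameters settle
            return u_new
        u = u_new
    return u
```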
28. Image Segmentation
WHAT IS MISSING DATA? An (n × g) matrix I of indicator variables.
Expectation step:
$$E(I_{lm}) = \bar{I}_{lm} = 1 \cdot P(l\text{-th pixel comes from } m\text{-th blob}) + 0 \cdot P(l\text{-th pixel does not come from } m\text{-th blob})$$
$$= P(l\text{-th pixel comes from } m\text{-th blob})$$
We get:
$$\bar{I}_{lm} = \frac{\alpha_m^{(s)}\, p_m(x_l \mid \theta_m^{(s)})}{\sum_{k=1}^{K} \alpha_k^{(s)}\, p_k(x_l \mid \theta_k^{(s)})}$$
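A vectorized Python sketch of this expectation step (names are illustrative; one responsibility per pixel-segment pair) might look like:

```python
# E-step sketch: I_bar[l, m] = alpha_m * p_m(x_l) / sum_k alpha_k * p_k(x_l)
import numpy as np
from scipy.stats import multivariate_normal

def e_step(X, alphas, means, covs):
    """X: (n, d) pixel features; returns (n, g) responsibilities."""
    n, g = X.shape[0], len(alphas)
    weighted = np.empty((n, g))
    for m in range(g):
        # alpha_m^(s) * p_m(x_l | theta_m^(s)) for all pixels at once
        weighted[:, m] = alphas[m] * multivariate_normal.pdf(
            X, mean=means[m], cov=covs[m])
    # normalize each row by the sum over all K segments
    return weighted / weighted.sum(axis=1, keepdims=True)
```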
30. Image Segmentation
COMPLETE DATA LOG-LIKELIHOOD:
$$L_c([x, \bar{I}_{lm}] ; \Theta^{(s)}) = \sum_{l \in \text{all pixels}} \sum_{m=1}^{g} \bar{I}_{lm} \log p(x_l \mid \theta_m)$$
Maximization step:
$$\Theta^{(s+1)} = \arg\max_\Theta L_c([x, \bar{I}_{lm}] ; \Theta)$$
31. Image Segmentation
Maximization step:
$$\alpha_m^{(s+1)} = \frac{1}{n} \sum_{l=1}^{n} p(m \mid x_l, \Theta^{(s)})$$
$$\mu_m^{(s+1)} = \frac{\sum_{l=1}^{n} x_l\, p(m \mid x_l, \Theta^{(s)})}{\sum_{l=1}^{n} p(m \mid x_l, \Theta^{(s)})}$$
$$\Sigma_m^{(s+1)} = \frac{\sum_{l=1}^{n} p(m \mid x_l, \Theta^{(s)})\, (x_l - \mu_m^{(s)})(x_l - \mu_m^{(s)})^\top}{\sum_{l=1}^{n} p(m \mid x_l, \Theta^{(s)})}$$
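The corresponding maximization step in Python (a sketch with illustrative names; note it reuses the freshly updated mean in the covariance update, a common variant of the slide's formula, which keeps the mean at the previous estimate):

```python
# M-step sketch: I_bar[l, m] = p(m | x_l, Theta^(s)) comes from the E-step.
import numpy as np

def m_step(X, I_bar):
    """X: (n, d) pixel features; I_bar: (n, g) responsibilities."""
    n, d = X.shape
    g = I_bar.shape[1]
    Nm = I_bar.sum(axis=0)                  # effective pixel count per segment
    alphas = Nm / n                         # alpha_m^(s+1)
    means = (I_bar.T @ X) / Nm[:, None]     # mu_m^(s+1)
    covs = []
    for m in range(g):
        diff = X - means[m]                 # (x_l - mu_m)
        covs.append((I_bar[:, m, None] * diff).T @ diff / Nm[m])
    return alphas, means, covs
```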
32. How EM works for Image Segmentation
E-step:
$$\bar{I}_{lm} = \frac{\alpha_m^{(s)}\, p_m(x_l \mid \theta_m^{(s)})}{\sum_{k=1}^{K} \alpha_k^{(s)}\, p_k(x_l \mid \theta_k^{(s)})}$$
For each pixel we compute the values $\alpha_m^{(s)}\, p_m(x_l \mid \theta_m^{(s)})$ for each segment $m$.
For each pixel we compute the sum $\sum_{k=1}^{K} \alpha_k^{(s)}\, p_k(x_l \mid \theta_k^{(s)})$, i.e. perform the summation over all $K$ segments.
Divide the former by the latter.
M-step:
Compute $\alpha_m^{(s+1)}$, $\mu_m^{(s+1)}$, $\Sigma_m^{(s+1)}$.
36. Line Fitting Expectation Maximization
WHAT IS MISSING DATA?
An (n × g) matrix M of indicator variables:
$$m_{k,l} = \begin{cases} 1 & \text{if point } k \text{ is drawn from line } l \\ 0 & \text{otherwise} \end{cases}$$
$$\sum_l P(m_{kl} = 1 \mid \text{point } k, \text{ line } l\text{'s parameters}) = 1$$
HOW TO FORMULATE LIKELIHOOD?
$$\exp\left( -\frac{(\text{distance from point } k \text{ to line } l)^2}{2\sigma^2} \right)$$
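A sketch of the resulting soft assignment in Python (names are mine; lines are parameterized as ax + by + c = 0 with a² + b² = 1, which makes ax + by + c the signed point-to-line distance):

```python
# Per-point soft assignment implied by the likelihood above: the weight of
# line l for point k is proportional to pi_l * exp(-dist(k, l)^2 / (2 sigma^2)).
import numpy as np

def line_weights(points, lines, sigma, priors):
    """points: (n, 2) array; lines: list of (a, b, c) with a^2 + b^2 = 1;
    priors: (g,) mixing weights pi_l."""
    n, g = len(points), len(lines)
    w = np.empty((n, g))
    for l, (a, b, c) in enumerate(lines):
        dist = points[:, 0] * a + points[:, 1] * b + c   # signed distance
        w[:, l] = priors[l] * np.exp(-dist**2 / (2.0 * sigma**2))
    return w / w.sum(axis=1, keepdims=True)              # normalize per point
```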
38. Motion Segmentation EM
WHAT IS MISSING DATA? It is the motion field to which each pixel belongs. The indicator variable $V_{xy,l}$ is the $(xy, l)$-th entry of $V$:
$$V_{xy,l} = \begin{cases} 1 & \text{if the } xy\text{-th pixel belongs to the } l\text{-th motion field} \\ 0 & \text{otherwise} \end{cases}$$
HOW TO FORMULATE LIKELIHOOD?
$$L(V, \Theta) = -\sum_{xy,l} V_{xy,l}\, \frac{\left( I_1(x, y) - I_2(x + m_1(x, y ; \theta_l),\; y + m_2(x, y ; \theta_l)) \right)^2}{2\sigma^2}$$
where $\Theta = (\theta_1, \theta_2, \ldots, \theta_g)$.
39. Motion Segmentation EM
The expectation step computes $P(V_{xy,l} = 1 ; I_1, I_2, \Theta)$.
A common choice is the affine motion model:
$$\begin{pmatrix} m_1 \\ m_2 \end{pmatrix}(x, y ; \theta_l) = \begin{pmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{pmatrix} \begin{pmatrix} x \\ y \end{pmatrix} + \begin{pmatrix} a_{13} \\ a_{23} \end{pmatrix}$$
where $\theta_l = (a_{11}, a_{12}, \ldots, a_{23})$.
This gives a layered representation of the sequence.
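A minimal Python sketch of the affine model (theta_l packs the six coefficients; the function name is illustrative):

```python
# Affine motion model: (m1, m2) at pixel (x, y) for parameters theta_l.
def affine_motion(x, y, theta_l):
    """theta_l = (a11, a12, a13, a21, a22, a23)."""
    a11, a12, a13, a21, a22, a23 = theta_l
    m1 = a11 * x + a12 * y + a13
    m2 = a21 * x + a22 * y + a23
    return m1, m2

# The residual entering the likelihood is I1(x, y) - I2(x + m1, y + m2).
# (Sampling I2 at a non-integer location needs interpolation in practice.)
```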
41. Identifying Outliers EM
We construct an explicit model of the outliers:
$$(1 - \lambda)\, P(\text{measurements} \mid \text{model}) + \lambda\, P(\text{outliers})$$
Here $\lambda \in [0, 1]$ models the frequency with which outliers occur, and $P(\text{outliers})$ is the probability model for the outliers.
WHAT IS MISSING DATA?
A variable that indicates which component generated each point.
Complete data likelihood:
$$\prod_j \left[ (1 - \lambda)\, P(\text{measurement}_j \mid \text{model}) + \lambda\, P(\text{measurement}_j \mid \text{outliers}) \right]$$
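A sketch of the E-step this model induces (illustrative names): the posterior probability that a measurement came from the inlier model rather than the outlier process.

```python
# Posterior inlier probability for the two-component model above.
import numpy as np
from scipy.stats import norm

def inlier_posterior(p_model, p_outlier, lam):
    """p_model, p_outlier: (n,) densities of each measurement under the
    model and the outlier process; lam: outlier frequency in [0, 1]."""
    inlier = (1.0 - lam) * p_model
    outlier = lam * p_outlier
    return inlier / (inlier + outlier)

# Example: Gaussian inliers, uniform outliers on [0, 100].
x = np.array([1.2, 0.7, 55.0])
post = inlier_posterior(norm.pdf(x, 0, 1), np.full_like(x, 1 / 100), lam=0.1)
print(post)   # the third point gets a near-zero inlier probability
```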
42. Background Subtraction EM
For each pixel we get a series of observations over successive frames.
The source of these observations is a mixture model with two components: the background and the noise (foreground).
The background can be modeled as a Gaussian.
The noise can come from a uniform source.
Any pixel observation that belongs to the noise component is not background.
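A compact sketch of this per-pixel model in Python, assuming 8-bit intensities so the noise component is uniform over [0, 255] (the initialization and names are my assumptions, not the lecture's):

```python
# EM for one pixel: Gaussian background + uniform noise over [0, 255].
import numpy as np

def fit_background(obs, iters=50):
    """obs: (T,) intensities of one pixel over T frames."""
    mu, var, lam = obs.mean(), obs.var() + 1e-6, 0.5   # crude initialization
    p_noise = 1.0 / 256.0                              # uniform noise density
    for _ in range(iters):
        # E-step: posterior that each frame shows background
        p_bg = np.exp(-(obs - mu) ** 2 / (2 * var)) / np.sqrt(2 * np.pi * var)
        w = lam * p_bg / (lam * p_bg + (1 - lam) * p_noise)
        # M-step: reweighted Gaussian parameters and mixing weight
        mu = (w * obs).sum() / w.sum()
        var = (w * (obs - mu) ** 2).sum() / w.sum() + 1e-6
        lam = w.mean()
    return mu, var, lam   # frames with low w are flagged as foreground
```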
43. Difficulties with Expectation Maximization
Local minima.
Proper initialization.
Extremely small expected weights.
Parameters converging to the boundaries of the parameter space.
47. Model Selection
Should we consider minimizing the negative of the log-likelihood?
We should have a penalty term which increases as the number of components increases.
An Information Criterion (AIC):
$$-2 L(x ; \Theta^*) + 2p$$
where $p$ is the number of free parameters.
Bayesian Information Criterion (BIC):
$$-L(D ; \theta^*) + \frac{p}{2} \log N$$
where $p$ is the number of free parameters.
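Both criteria are trivial to compute once a model has been fitted. A short Python sketch following the slide's forms (including BIC's (p/2) log N convention), plus the standard free-parameter count for a Gaussian mixture; lower scores indicate better models:

```python
# AIC and BIC as written on the slide; log_lik is the maximized
# log-likelihood, p the number of free parameters, N the number of points.
import numpy as np

def aic(log_lik, p):
    return -2.0 * log_lik + 2.0 * p

def bic(log_lik, p, N):
    return -log_lik + 0.5 * p * np.log(N)

# Free parameters of a g-component Gaussian mixture in d dimensions:
# (g - 1) mixing weights + g*d means + g*d*(d+1)/2 covariance entries.
def mixture_free_params(g, d):
    return (g - 1) + g * d + g * d * (d + 1) // 2
```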
48. Bayesian Information Criterion (BIC)
$$P(M \mid D) = P(M)\, \frac{P(D \mid M)}{P(D)} = P(M)\, \frac{\int P(D \mid M, \theta)\, P(\theta)\, d\theta}{P(D)}$$
Maximizing the posterior $P(M \mid D)$ yields:
$$-L(D ; \theta^*) + \frac{p}{2} \log N$$
where $p$ is the number of free parameters.
49. Minimum Description Length (MDL) criterion
It yields a selection criterion which is the same as BIC:
$$-L(D ; \theta^*) + \frac{p}{2} \log N$$
where $p$ is the number of free parameters.