Report on Two-stage Convex Image Segmentation Method
and Super Pixels
Hongyi LI∗
August 2014
Abstract
This summer we studied the paper 'A Two-Stage Image Segmentation Method Using a Convex Variant of the Mumford-Shah Model and Thresholding', and we also studied super pixels. This report presents our basic, introductory understanding of these two ideas. We focus on the algorithms behind these methods, implemented in MATLAB, and on the difficulties a beginner may face. We followed the two-stage segmentation algorithm provided in the paper, and we learned the normalized cut method for super pixel segmentation. Beyond that, we concentrated on a faster algorithm, the SLIC method, and made some attempts to improve it. Lastly, we also looked at applications such as saliency detection based on super pixel segmentation.
Acknowledgement
I would like to express my greatest gratitude to the people who have helped and supported me throughout my work. I wish to first thank Prof. Sunney Chan. This summer research was supported by Prof. Sunney Chan, and I deeply appreciate him for giving students like me the chance to touch on cutting-edge topics. In this summer research program I really did find some new things and ideas. Besides, I think I gained a better understanding of how research should be conducted and what the academic world is like. All the good experience I gained, and whatever progress I made, should be credited to Prof. Sunney Chan; thanks to his support, I had enough resources this summer to finish this work. I am equally grateful to my supervisor, Dr. Tieyong Zeng. This work was supervised by Dr. Tieyong Zeng, and without his help I could hardly have finished this report.
1 Introduction
Image segmentation is our main topic this summer. The first thing for a beginner in this area is to understand what an image is. Here we introduce two ways of viewing an image. First, we can regard an image as a function defined on the x-y plane. In this view, a coordinate is simply the position of a pixel, and the function value is the grey or colour value of that pixel. The advantage of this view is that we can use functional tools to extract information from the image. For example, we may take derivatives
∗12051217@life.hkbu.edu.hk
at each pixel to detect edges, which yield larger absolute values, or we can take the convolution with a suitable kernel to extract texture information. Indeed, our two-stage convex method mainly sees images this way.
Second, an image can also be regarded as a graph with pixels serving as vertices. However, since an image is usually large, 256 × 256 for instance, we need to deal with a graph with 65536 vertices. This can be a disadvantage, since it seems to require too much computation. In fact, the idea of super pixels is largely based on this perspective.
Now let us come back to segmentation. When we see an image as a function, we need to find closed curves to segment it. The famous Mumford-Shah model provides a direction for finding such curves, and our two-stage model is based on the Mumford-Shah model. The two-stage method provides a very helpful way to find curves: find the contours of another function somehow derived from the original one.

When we see an image as a graph, we need to define distances between vertices and then do clustering; each group of vertices is the collection of pixels forming one piece of our segmentation. The main idea is thus naturally to group pixels that have small distances, or large similarities, to each other. A general algorithm for this sophisticated problem is normalized cuts, which can solve all problems of this kind. Yet, since an image is always a flat plane, the SLIC method was developed to give an efficient clustering of the pixels of an image.

Our main work this summer was to study the methods mentioned above and make some attempts to combine them to get possible improvements.
2 Two-stage Image Segmentation Method Using a Convex Method
This is an efficient method to transform an image into one that is piecewise smooth enough to be directly separated by its values [1]. The most important work is to change the Mumford-Shah model into a convex model and eliminate one variable, namely $\Gamma$, in the Mumford-Shah model, which is
$$E_{ms}(g,\Gamma)=\frac{\lambda}{2}\int_\Omega (f-g)^2\,dx+\frac{\mu}{2}\int_{\Omega\setminus\Gamma}|\nabla g|^2\,dx+\mathrm{length}(\Gamma) \qquad (1)$$
into
$$E(g)=\frac{\lambda}{2}\int_\Omega (f-g)^2\,dx+\frac{\mu}{2}\int_\Omega|\nabla g|^2\,dx+\int_\Omega|\nabla g|\,dx. \qquad (2)$$
2.1 Inspiration
This idea arises from binary images. Since a binary image contains only two values, if we have found the boundary and set it to be $\Gamma$ in the Mumford-Shah model, what is left is just to find $g$, which is easy. Equivalently, we can find the region $\Sigma$ in which the image has values in a certain range, e.g. $>0.5$, such that $\partial\Sigma=\Gamma$, $\Sigma=\mathrm{Closure}(\mathrm{inside}(\Gamma))$. Let $g_1$ be the restriction of $g$ to $\Sigma\setminus\Gamma$ and $g_2$ the restriction of $g$ to $\Omega\setminus\Sigma$; then the Mumford-Shah model becomes:
$$\min_{(\Sigma,g_1,g_2)}\Big\{E(\Sigma,g_1,g_2)=\frac{\lambda}{2}\int_{\Sigma\setminus\Gamma}(f-g_1)^2\,dx+\frac{\lambda}{2}\int_{\Omega\setminus\Sigma}(f-g_2)^2\,dx+\frac{\mu}{2}\int_{\Sigma\setminus\Gamma}|\nabla g_1|^2\,dx+\frac{\mu}{2}\int_{\Omega\setminus\Sigma}|\nabla g_2|^2\,dx+\mathrm{Per}(\Sigma)\Big\}. \qquad (3)$$
Firstly, assuming that Σ is fixed, the following two minimizers need to be found:
$$\lambda\int_{\Sigma\setminus\Gamma}(f-g_1)^2\,dx+\mu\int_{\Sigma\setminus\Gamma}|\nabla g_1|^2\,dx,\qquad
\lambda\int_{\Omega\setminus\Sigma}(f-g_2)^2\,dx+\mu\int_{\Omega\setminus\Sigma}|\nabla g_2|^2\,dx. \qquad (4)$$
Proposition 1. Each of the two functionals of $g_1$, $g_2$ in (4) has a unique minimizer.
Proof. Introduce the inner product
$$\langle u,v\rangle_W=\int_\Omega\Big(uv+\sum_{i=1}^{k}\frac{\partial u}{\partial x_i}\frac{\partial v}{\partial x_i}\Big)dx;$$
by Cauchy-Schwarz, $\langle u,v\rangle_W^2\le\langle u,u\rangle_W\cdot\langle v,v\rangle_W$, and $\langle u,v\rangle_W$ is bilinear, so it is well defined. Now introduce the
W-norm: $\|u\|_W^2=\langle u,u\rangle_W$, and the
energy functional: $E(u)=\mu^2\int_\Omega(u-g)^2\,dx+\int_\Omega|\nabla u|^2\,dx$.
If we take $u_0=0$, then $E(u_0)=\mu^2\int_\Omega g^2\,dx<+\infty$, thus $0\le\inf(E)<+\infty$.
For all $u,v$ we have the parallelogram identities
$$\Big\|\frac{u+v}{2}\Big\|_2^2+\Big\|\frac{u-v}{2}\Big\|_2^2=\int_\Omega\Big(\frac{u^2}{2}+\frac{v^2}{2}\Big)dx=\frac12\|u\|_2^2+\frac12\|v\|_2^2, \qquad (5)$$
$$\Big\|\frac{u+v}{2}\Big\|_2^2=\frac12\|u\|_2^2+\frac12\|v\|_2^2-\Big\|\frac{u-v}{2}\Big\|_2^2. \qquad (6)$$
Applying this formula, we have
$$E\Big(\frac{u+v}{2}\Big)=\mu^2\int_\Omega\Big(\frac{u-g}{2}+\frac{v-g}{2}\Big)^2dx+\int_\Omega\Big|\frac{\nabla u}{2}+\frac{\nabla v}{2}\Big|^2dx$$
$$=\frac{\mu^2}{2}\int_\Omega(u-g)^2dx+\frac{\mu^2}{2}\int_\Omega(v-g)^2dx-\mu^2\int_\Omega\Big(\frac{u-v}{2}\Big)^2dx+\frac12\int_\Omega|\nabla u|^2dx+\frac12\int_\Omega|\nabla v|^2dx-\int_\Omega\Big|\frac{\nabla u-\nabla v}{2}\Big|^2dx$$
$$=\frac{E(u)}{2}+\frac{E(v)}{2}-G(u,v), \qquad (7)$$
where
$$G(u,v)=\begin{cases}\frac14\big(\mu^2\|u-v\|_W^2+(1-\mu^2)\|\nabla(u-v)\|_2^2\big) & (\mu^2<1),\\[2pt]
\frac14\big(\|u-v\|_W^2+(\mu^2-1)\|u-v\|_2^2\big) & (\mu^2\ge 1).\end{cases}$$
Suppose that $\inf(E)=m$. Then for every $\varepsilon>0$ there exist $u,v$ with $E(u),E(v)<m+\varepsilon$, and $m\le E\big(\frac{u+v}{2}\big)<m+\varepsilon-G(u,v)$; thus
$$G(u,v)<\varepsilon,\qquad \|u-v\|_W^2<\frac{4\varepsilon}{\mu^2}\ (\mu^2<1),\qquad \|u-v\|_W^2<4\varepsilon\ (\mu^2\ge 1).$$
Consequently, any sequence $u_1,u_2,u_3,\dots$ with $E(u_i)\to m$ is a Cauchy sequence. Since $W$ is complete, there is a limit $u_i\to u^*$. So we only need $E(u)$ to be continuous.
$$E(u)-E(v)=-2\mu^2\int_\Omega(u-v)g\,dx+\|u\|_W^2-\|v\|_W^2+(\mu^2-1)\int_\Omega(u^2-v^2)\,dx; \qquad (8)$$
knowing $\big|\int_\Omega(u-v)g\,dx\big|^2\le\int_\Omega g^2\,dx\int_\Omega(u-v)^2\,dx$, it is easy to see that $E(u)$ is continuous, thus $E(u^*)=m$. So (4) does have a unique minimizer.
Secondly, assuming that $g_1,g_2$ are fixed, we need to find the $\Sigma$ minimizing $E$ in (3). We change this problem into finding a function $u$ as follows and setting $\Sigma$ to be a kind of level set of $u$. Introduce the co-area formula:
$$\int_\Omega g(x)|\nabla u(x)|\,dx=\int_{-\infty}^{+\infty}\int_{u^{-1}(t)}g(x)\,dH(x)\,dt.$$
Setting $g=1$ and $u\in[0,1]$, we get
$$\int_\Omega|\nabla u(x)|\,dx=\int_0^1 \mathrm{length}(\{x\,|\,u(x)=t\})\,dt=\int_0^1 \mathrm{Per}(\{x\,|\,u(x)>\rho\})\,d\rho. \qquad (9)$$
We can also write $u\in[0,1]$ in integral form: $u(x)=\int_0^1 1_{[0,u(x)]}(\rho)\,d\rho$, where $1_I$ is the characteristic function of the set $I$. Then
$$\int_\Omega\{\lambda(f-g_1)^2+\mu|\nabla g_1|^2-\lambda(f-g_2)^2-\mu|\nabla g_2|^2\}u(x)\,dx$$
$$=\int_\Omega\{\lambda(f-g_1)^2+\mu|\nabla g_1|^2-\lambda(f-g_2)^2-\mu|\nabla g_2|^2\}\int_0^1 1_{[0,u(x)]}(\rho)\,d\rho\,dx$$
$$=\int_0^1\int_\Omega\{\lambda(f-g_1)^2+\mu|\nabla g_1|^2-\lambda(f-g_2)^2-\mu|\nabla g_2|^2\}1_{[0,u(x)]}(\rho)\,dx\,d\rho$$
$$=\int_0^1\int_{x:u(x)>\rho}\lambda(f-g_1)^2+\mu|\nabla g_1|^2-\lambda(f-g_2)^2-\mu|\nabla g_2|^2\,dx\,d\rho$$
$$=\int_0^1\int_{x:u(x)>\rho}\lambda(f-g_1)^2+\mu|\nabla g_1|^2\,dx\,d\rho+\int_0^1\int_{x:u(x)\le\rho}\lambda(f-g_2)^2+\mu|\nabla g_2|^2\,dx\,d\rho-\int_\Omega\lambda(f-g_2)^2+\mu|\nabla g_2|^2\,dx, \qquad (10)$$
where $\int_\Omega\lambda(f-g_2)^2+\mu|\nabla g_2|^2\,dx$ is independent of $u$. Fix $u$, and let $\Sigma(\rho)=\mathrm{Closure}(\{x\,|\,u(x)>\rho\})$, $\Gamma(\rho)=\partial\Sigma(\rho)$; then we have
$$\int_\Omega|\nabla u|\,dx+\frac12\int_\Omega\{\lambda(f-g_1)^2+\mu|\nabla g_1|^2-\lambda(f-g_2)^2-\mu|\nabla g_2|^2\}u\,dx \qquad (c)\ (11)$$
$$=\int_0^1\mathrm{Per}(\Sigma(\rho))\,d\rho+\frac12\int_0^1\int_{\Sigma(\rho)\setminus\Gamma(\rho)}\lambda(f-g_1)^2+\mu|\nabla g_1|^2\,dx\,d\rho+\frac12\int_0^1\int_{\Omega\setminus\Sigma(\rho)}\lambda(f-g_2)^2+\mu|\nabla g_2|^2\,dx\,d\rho-C$$
$$=\int_0^1 E(\Sigma(\rho),g_1,g_2)\,d\rho-C, \qquad (12)$$
where $C=\frac12\int_\Omega\lambda(f-g_2)^2+\mu|\nabla g_2|^2\,dx$, and $\Gamma(\rho)$ is of measure 0 so we can omit it in the integrals.
Then the $u$ minimizing (c)(11) gives us the set $\Sigma$ minimizing $E(\Sigma,g_1,g_2)$ in (3). To show this, let the minimizer be $\Sigma_0$ and let $u_0(x)=1_{\Sigma_0}$; then $u_0$ minimizes (c)(11). So we arrive at: to find $\Sigma$ is to find
$$\min_u\ \int_\Omega|\nabla u|\,dx+\frac12\int_\Omega\{\lambda(f-g_1)^2+\mu|\nabla g_1|^2-\lambda(f-g_2)^2-\mu|\nabla g_2|^2\}u\,dx.$$
Therefore, from the above analysis, we have actually changed the Mumford-Shah model into: minimize
$$E(u,g_1,g_2)=\int_\Omega|\nabla u|\,dx+\frac12\int_\Omega\{\lambda(f-g_1)^2+\mu|\nabla g_1|^2-\lambda(f-g_2)^2-\mu|\nabla g_2|^2\}u\,dx. \qquad (13)$$
What is important here is that we have used $\int_\Omega|\nabla u|\,dx$ as a substitute for $\mathrm{length}(\Gamma)$.
Then, inspired by (13), the paper gives the following mollified Mumford-Shah model:
$$E(g)=\frac{\lambda}{2}\int_\Omega(f-Ag)^2\,dx+\frac{\mu}{2}\int_\Omega|\nabla g|^2\,dx+\int_\Omega|\nabla g|\,dx, \qquad (14)$$
where $A$ is a blur operator or the identity. By minimizing it we get, from the original image $f$, a new image $g$ that is smooth enough to be clustered by the value of each pixel to give a satisfying segmentation.
2.2 Soundness and Algorithm
Proposition 2. The energy functional $E(g)$ in (14) has a unique minimizer.
Proof. • $E(g)$ in (14) is strictly convex.
Since $x\mapsto x^2$ is strictly convex, we know $\frac{\lambda}{2}\int_\Omega(f-Ag)^2\,dx+\frac{\mu}{2}\int_\Omega|\nabla g|^2\,dx$ is strictly convex; adding that $h(x_1,x_2,\dots,x_n)=\sqrt{\sum_{i=1}^n x_i^2}$ is convex, we have that $\int_\Omega|\nabla g|\,dx$ is convex. Thus $E(g)$ is strictly convex.
• $E(g)$ in (14) is coercive under the W-norm defined above.
Knowing $\|u\|_W^2=\int_\Omega u^2\,dx+\int_\Omega|\nabla u|^2\,dx$, we have $E(g)\ge\int_\Omega|\nabla g|^2\,dx$. By Poincaré's inequality, $\|g-g_\Omega\|_2\le C\|\nabla g\|_2\le C\sqrt{E(g)}$, where $g_\Omega$ denotes the mean of $g$ over $\Omega$. Then
$$\|Ag_\Omega\|_2\le\|f-Ag\|_2+\|f\|_2+\|A\|\cdot\|g-g_\Omega\|_2\le C_1\sqrt{E(g)}+\|f\|_2+C_2\sqrt{E(g)},$$
so $\|g_\Omega\|_2\le C_3+C_2\sqrt{E(g)}$, and $\|g\|_2\le\|g_\Omega\|_2+\|g-g_\Omega\|_2\le C_3+C_2\sqrt{E(g)}$. Thus $\|g\|_W\le C_4+C_5\sqrt{E(g)}$, and $E(g)$ is coercive (the $C_i$ are constants).
Therefore, from convex analysis, $E(g)$ has a unique minimizer. This idea is thus sound.
To solve $\min E(g)$ in (14), we introduce the split-Bregman algorithm [2, 3]. Define
$$D_E^p(u,v)=E(u)-E(v)-\langle p,u-v\rangle,\qquad p\in\partial E(v).$$
Assume that $\inf H(u)=0$ and that $H(u)$ is differentiable, so $\nabla H(u)\in\partial H(u)$. Bregman iteration then claims that to solve $\min_u E(u)+\lambda H(u)$, one can iteratively solve
$$u^{k+1}=\arg\min_u D_E^{p^k}(u,u^k)+\lambda H(u),\qquad p^{k+1}=p^k-\lambda\nabla H(u^{k+1}). \qquad (15)$$
Then we apply this method to solve $\min_u E(u)+\|\Phi(u)\|_1$. Let us rewrite the problem as
$$\min_u \|d\|_1+E(u)\quad\text{s.t. } d=\Phi(u),$$
and then relax it into
$$\min_{(u,d)}\|d\|_1+E(u)+\frac{\lambda}{2}\|d-\Phi(u)\|_2^2 \qquad (16)$$
($\lambda$ a positive constant); letting $J(u,d)=\|d\|_1+E(u)$, we come to
$$\min_{(u,d)}J(u,d)+\frac{\lambda}{2}\|d-\Phi(u)\|_2^2. \qquad (17)$$
Now we can apply Bregman iteration, setting all initial values to 0:
$$(u^{k+1},d^{k+1})=\arg\min_{(u,d)}D_J^{p^k}(u,u^k,d,d^k)+\frac{\lambda}{2}\|d-\Phi(u)\|_2^2,$$
$$p_u^{k+1}-p_u^k+\lambda\Phi^t(\Phi u^{k+1}-d^{k+1})=0;\qquad p_d^{k+1}-p_d^k+\lambda(d^{k+1}-\Phi u^{k+1})=0. \qquad (18)$$
Thus we have
$$p_u^{k+1}=-\lambda\Phi^t\sum_{i=1}^{k+1}(\Phi u^i-d^i);\qquad p_d^{k+1}=\lambda\sum_{i=1}^{k+1}(\Phi u^i-d^i). \qquad (19)$$
Let $b^{k+1}=b^k+(\Phi u^{k+1}-d^{k+1})=\sum_{i=1}^{k+1}(\Phi u^i-d^i)$, with $b^0=0$; then $p_u^k=-\lambda\Phi^t b^k$, $p_d^k=\lambda b^k$. Therefore,
$$(u^{k+1},d^{k+1})=\arg\min_{(u,d)}J(u,d)-J(u^k,d^k)+\lambda\langle b^k,\Phi u-\Phi u^k\rangle-\lambda\langle b^k,d-d^k\rangle+\frac{\lambda}{2}\|d-\Phi u\|_2^2$$
$$=\arg\min_{(u,d)}J(u,d)-J(u^k,d^k)-\lambda\langle b^k,d-\Phi u\rangle+\lambda\langle b^k,d^k-\Phi u^k\rangle+\frac{\lambda}{2}\|d-\Phi u\|_2^2$$
$$=\arg\min_{(u,d)}J(u,d)-\lambda\langle b^k,d-\Phi u\rangle+\frac{\lambda}{2}\|d-\Phi u\|_2^2$$
$$=\arg\min_{(u,d)}J(u,d)+\frac{\lambda}{2}\|d-\Phi u-b^k\|_2^2 \qquad (20)$$
(in the third line the terms $J(u^k,d^k)$ and $\lambda\langle b^k,d^k-\Phi u^k\rangle$ are constants in $(u,d)$ and have been dropped);
after that we get the split-Bregman iteration:
$$(u^{k+1},d^{k+1})=\arg\min_{(u,d)}J(u,d)+\frac{\lambda}{2}\|d-\Phi u-b^k\|_2^2,$$
$$b^{k+1}=b^k+(\Phi u^{k+1}-d^{k+1}). \qquad (21)$$
Now we apply this method to our minimization of $E(g)$ in (14). Our problem becomes:
$$\min_g\ \frac{\lambda}{2}\|f-Ag\|_2^2+\frac{\mu}{2}\|\nabla g\|_2^2+\|(d_x,d_y)\|_1\quad\text{s.t. } d_x=\nabla_x g,\ d_y=\nabla_y g.$$
We can transfer it to:
$$\min_{g,d_x,d_y}\frac{\lambda}{2}\|f-Ag\|_2^2+\frac{\mu}{2}\|\nabla g\|_2^2+\|(d_x,d_y)\|_1+\frac{\sigma}{2}\|d_x-\nabla_x g\|_2^2+\frac{\sigma}{2}\|d_y-\nabla_y g\|_2^2, \qquad (22)$$
from which we have:
$$(g^{k+1},d_x^{k+1},d_y^{k+1})=\arg\min_{g,d_x,d_y}\frac{\lambda}{2}\|f-Ag\|_2^2+\frac{\mu}{2}\|\nabla g\|_2^2+\|(d_x,d_y)\|_1+\frac{\sigma}{2}\|d_x-\nabla_x g-b_x^k\|_2^2+\frac{\sigma}{2}\|d_y-\nabla_y g-b_y^k\|_2^2, \qquad (23)$$
$$b_x^{k+1}=b_x^k+(\nabla_x g^{k+1}-d_x^{k+1});\qquad b_y^{k+1}=b_y^k+(\nabla_y g^{k+1}-d_y^{k+1}). \qquad (al1)\ (24)$$
Let us split (23) into:
$$g^{k+1}=\arg\min_g\frac{\lambda}{2}\|f-Ag\|_2^2+\frac{\mu}{2}\|\nabla g\|_2^2+\frac{\sigma}{2}\|d_x-\nabla_x g-b_x^k\|_2^2+\frac{\sigma}{2}\|d_y-\nabla_y g-b_y^k\|_2^2,$$
$$(d_x^{k+1},d_y^{k+1})=\arg\min_{d_x,d_y}\|(d_x,d_y)\|_1+\frac{\sigma}{2}\|d_x-\nabla_x g-b_x^k\|_2^2+\frac{\sigma}{2}\|d_y-\nabla_y g-b_y^k\|_2^2. \qquad (25)$$
The second problem of (25) can be solved by a generalized shrinkage formula [4]:
$$s_x^k=\nabla_x g^{k+1}+b_x^k;\qquad s_y^k=\nabla_y g^{k+1}+b_y^k, \qquad (al2)\ (26)$$
$$d_x^{k+1}=\max\Big(s^k-\frac{1}{\sigma},\,0\Big)\frac{s_x^k}{s^k};\qquad d_y^{k+1}=\max\Big(s^k-\frac{1}{\sigma},\,0\Big)\frac{s_y^k}{s^k}, \qquad (al3)\ (27)$$
where $s^k=\sqrt{(s_x^k)^2+(s_y^k)^2}$ pointwise.
To solve the first problem of (25), we take the derivative with respect to $g$, which gives us:
$$(\lambda A^tA-(\mu+\sigma)\Delta)g=\lambda A^tf+\sigma\nabla_x^t(d_x^k-b_x^k)+\sigma\nabla_y^t(d_y^k-b_y^k). \qquad (28)$$
But $g$ can also be regarded as a function, which allows us to use the Fourier transform to solve (28) [4] (using $F$ to denote the Fourier transform).
• (1): $F\Delta F^{-1}$ and $F^{-1}\Delta F$ are diagonal.
We only need to show one of them. Let $f$ be a function; then
$$F(f(u,v))(s,t)=\int_{-\infty}^{\infty}\int_{-\infty}^{\infty}\exp(-c_1su-c_2tv)\cdot f(u,v)\,du\,dv,$$
where $c_1,c_2$ are constants, and
$$\Delta F(f(u,v))(s,t)=\int_{-\infty}^{\infty}\int_{-\infty}^{\infty}(c_1^2u^2+c_2^2v^2)\cdot\exp(-c_1su-c_2tv)\cdot f(u,v)\,du\,dv,$$
thus
$$F^{-1}\Delta F(f(u,v))=(c_1^2u^2+c_2^2v^2)\cdot f(u,v).$$
It just multiplies each point of the original function $f$ by a constant. To get those constants, we can simply compute $F^{-1}\Delta F(1(u,v))$ for $1(u,v)\equiv 1$.
• (2): $FA^tAF^{-1}$ and $F^{-1}A^tAF$ are diagonal when $A$ is a blur.
Let $h$ be the kernel of $A$ and $f$ an image; then
$$Af(s)=(f*h)(s)=\int_{-\infty}^{\infty}f(x)\cdot h(s-x)\,dx.$$
For any other image $a$,
$$\langle Af,a\rangle=\int a(s)\int f(x)h(s-x)\,dx\,ds=\int f(x)\int a(s)h(s-x)\,ds\,dx=\langle f,A^ta\rangle,$$
thus
$$A^tf(s)=\int f(x)h(x-s)\,dx=\int f(x)h^t(s-x)\,dx=(f*h^t)(s),$$
where $h^t(s-x)=h(x-s)$. So $FA^tAF^{-1}f=F(h^t*h*F^{-1}f)=F(h^t*h)\cdot f$; it is also diagonal.
Now we can solve (28). First solve for $g^*$:
$$F(\lambda A^tA-(\mu+\sigma)\Delta)F^{-1}g^*=F\big(\lambda A^tf+\sigma\nabla_x^t(d_x^k-b_x^k)+\sigma\nabla_y^t(d_y^k-b_y^k)\big), \qquad (al4)\ (29)$$
then recover $g$:
$$g=F^{-1}g^*. \qquad (al5)\ (30)$$
Lastly, our algorithm is as follows:

Algorithm 1 two-stage
1: Initialize the tolerance and set $g^0=f$, $d_x^0=d_y^0=b_x^0=b_y^0=0$.
2: Repeat
3: Compute $g^k$ from al4(29), al5(30); $d_x^k,d_y^k$ from al2(26), al3(27); $b_x^k,b_y^k$ from al1(24).
4: Until $|g^k-g^{k-1}|\le$ tolerance
5: Output $g$.
2.3 Experimental Results
For all the following outcomes, we set σ = 2, tolerance = 0.0001 in our algorithm.
Figure 1: Antimass image segmentation. (a): the original image; (b): our result with 192 iterations, time 15.7249 s.

Figure 2: Kidney vascular system segmentation. (a): the original image; (b): our bi-segmentation by al6 with p = 0.1760, 111 iterations, time 3.1668 s.

Figure 3: Noise. (a): the original image; (b): our bi-segmentation by al6 with p = 0.8308, 63 iterations, time 0.2340 s.

Figure 4: tri-segmentation. (a): the original image; (b): our tri-segmentation by al*6 with 63 iterations, time 0.3432 s.

Figure 5: Gaussian blur ([15, 15], 15). Here we show only the central part of the images, with the same size as the original one. (a): the clean image; (b): the blurred image; (c): our result with 83 iterations, time 1.2480 s.
For simplicity, we mainly consider square images. If the original image is not square, we make it square by zero-padding and then crop back. In the second and third examples, the output is a bi-segmentation by (al6). The last example demonstrates dealing with a blurry image: we obtain a blurred image by convolving a clean image with a blur kernel, then use our algorithm to segment it. In that example we show only the central part of the blurred image and of the segmentation; the true size of those images is larger due to the matrix convolution.

Algorithm (al6): This is a very simple procedure to get a segmentation from the smooth image $g$ obtained above. Given a user-chosen constant $p\in(0,1)$, let max denote the maximum of the image $g$ and min its minimum, and set the threshold $\bar p=p\cdot\max+(1-p)\cdot\min$. Finally, set every pixel whose value in $g$ is less than $\bar p$ to 0 and all others to 1.
Algorithm (al*6): For multi-class segmentation, say $K$ segments, we use the 'kmeans' function to first compute $K$ group mean values over all pixels, and take the $K-1$ mid-points of the $K$ mean values. Then we use the $K-1$ mid-points to split all pixels into $K$ clusters according to their values, and each cluster is a segment.
3 Super Pixels by Normalized Cut
Super pixels are a very important topic [6]. The most popular method of getting super pixels is the normalized cut [7, 8, 10]. But it is also commonly agreed to be time-consuming; this is why we need the SLIC method to provide a quick way to do super pixel segmentation.

In this setting, we regard an image as a graph in which each pixel is a vertex. We have to find some relationship between pixels and turn it into a kind of distance from which we can cluster the pixels into groups; such groups are our super pixels. What we do is weight the edges between vertices with the distance between pixels and then use graph theory to do the clustering.
3.1 Criteria
Firstly, we need to define the distance between pixels:
$$w_{i,j}=\exp\Big\{-\frac{\|f(i)-f(j)\|_2^2}{\sigma_I^2}\Big\}\cdot wd_{i,j},\qquad
wd_{i,j}=\begin{cases}\exp\Big\{-\dfrac{\|X(i)-X(j)\|_2^2}{\sigma_X^2}\Big\} & (\|X(i)-X(j)\|_2<r),\\[4pt] 0 & \text{otherwise},\end{cases} \qquad (31)$$
where $w_{i,j}$ denotes our distance between pixels $i,j$, $f(i)$ denotes the grey value of pixel $i$, $X(i)$ denotes the position of pixel $i$, and $\sigma_I,\sigma_X$ are constants; we take $\sigma_I^2=0.02$, $\sigma_X^2=10$ in this report.
Define $V$ to be the set of all pixels and $W$ the distance matrix. What we want is a partition $\Gamma=\{V_1,V_2,\dots,V_K\}$, i.e. $K$ groups of pixels.
Secondly, we need some measure of the goodness of the clustering $\Gamma$ above. For $A,B\in 2^V$, define
$$\mathrm{links}(A,B)=\sum_{i\in A,\,j\in B}w(i,j),\qquad \mathrm{degree}(A)=\mathrm{links}(A,V),\qquad \mathrm{linkratio}(A,B)=\frac{\mathrm{links}(A,B)}{\mathrm{degree}(A)}.$$
Then introduce the following concepts:
• $\mathrm{knassoc}(\Gamma)=\frac{1}{K}\sum_{i=1}^K\mathrm{linkratio}(V_i,V_i)$, which measures the tightness of the super pixels; the larger the better.
• $\mathrm{kncuts}(\Gamma)=\frac{1}{K}\sum_{i=1}^K\mathrm{linkratio}(V_i,V\setminus V_i)$, which measures the differences between super pixels; the smaller the better.
In this report we use $\mathrm{knassoc}(\Gamma)$ as the criterion and try to find the $\Gamma$ giving the largest $\mathrm{knassoc}(\Gamma)$. Introduce the notation
$$W=\begin{pmatrix}w_{1,1}&w_{1,2}&w_{1,3}&\cdots\\ w_{2,1}&w_{2,2}&w_{2,3}&\cdots\\ w_{3,1}&w_{3,2}&w_{3,3}&\cdots\\ \cdots&\cdots&\cdots&\cdots\end{pmatrix}\ (h),\qquad
D=\mathrm{Diag}\Big(\sum_j w_{1,j},\ \sum_j w_{2,j},\ \sum_j w_{3,j},\ \dots\Big),$$
with the $w_{i,j}$ from (31). Then $\mathrm{links}(V_i,V_i)=X_i^t\cdot W\cdot X_i$ and $\mathrm{degree}(V_i)=X_i^t\cdot D\cdot X_i$, where $X_i$ is the indicator vector of $V_i$. Thus we want to maximize:
want to maximize:
knassoc(Γ) =
1
K
ΣK
i=1
Xt
i · W · Xi
Xt
i · D · Xi
. (32)
12
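The criterion (32) is cheap to evaluate for a given labelling; a NumPy sketch:

```python
import numpy as np

def knassoc(W, labels, K):
    """knassoc criterion (32): mean of links(Vi,Vi)/degree(Vi)."""
    total = 0.0
    for k in range(K):
        x = (labels == k).astype(float)              # indicator vector X_i
        total += (x @ W @ x) / (x @ W.sum(axis=1))   # X^t W X / X^t D X
    return total / K
```

On a block-diagonal $W$ with the matching labelling, every cluster keeps all of its weight internally and the criterion attains its maximum value 1.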
3.2 Soundness and Algorithm
Let $Z=X(X^tDX)^{-\frac12}$; then
$$Z^tWZ=(X^tDX)^{-\frac12}X^tWX(X^tDX)^{-\frac12}=\mathrm{Diag}\Big(\frac{X_1^tWX_1}{X_1^tDX_1},\ \frac{X_2^tWX_2}{X_2^tDX_2},\ \dots\Big),$$
so we have $\mathrm{knassoc}(\Gamma)=\frac{1}{K}\mathrm{tr}(Z^tWZ)$.
We thus change our problem into: maximize
$$\varepsilon(Z)=\frac{1}{K}\mathrm{tr}(Z^tWZ)\quad\text{s.t. } Z^tDZ=I_K \qquad (33)$$
($I_K$ is the $K$-dimensional identity). From the properties of the trace, if $R^tR=I_K$ then $\varepsilon(ZR)=\varepsilon(Z)$.
Let $P=D^{-1}W$; it is easy to see that $1_N$ ($N$ the total number of pixels) is an eigenvector of $P$ with eigenvalue $\lambda=1$. Let $V$ be the matrix whose columns are eigenvectors of $P$ (note $V$ is no longer the set of vertices) and $S$ the diagonal matrix of eigenvalues of $P$, so $PV=VS$. Let $\bar V=D^{\frac12}V$; then
$$D^{-\frac12}WD^{-\frac12}\bar V=D^{-\frac12}WV=D^{\frac12}VS=\bar VS. \qquad (34)$$
Since $M=D^{-\frac12}WD^{-\frac12}$ is symmetric by (h), we have $\lambda_iv_i^tv_j=v_i^tMv_j=v_i^t\lambda_jv_j$, so $\langle v_i,v_j\rangle=0$ for distinct eigenvalues (the $v_i$ being columns of $\bar V$). Hence
$$\varepsilon(\bar V)=\frac{1}{K}\mathrm{tr}(\bar V^tD^{-\frac12}WD^{-\frac12}\bar V)=\frac{1}{K}\mathrm{tr}(\bar V^t\bar VS)=\frac{1}{K}\mathrm{tr}(S)=\frac{1}{K}\sum_{i=1}^Ks_i, \qquad (35)$$
where $\bar V^t\bar V=I_K$ (after normalizing the columns), and the $s_i$ are the eigenvalues of $M$ as well as of $P$; indeed $M$ and $P$ are similar.
Proposition 3. When $s_1,s_2,\dots,s_K$ are the $K$ largest eigenvalues of $P$ defined above, $S=\mathrm{Diag}[s_1,s_2,\dots,s_K]$ and $PV=VS$, then $\varepsilon(\bar V)$ is maximized.
Proof. By the properties of the Rayleigh quotient, if $\bar V$ is composed of eigenvectors of $M$, then it is a local maximizer of $\varepsilon(Z)$. Now let $Z^*=[v_1^*,v_2^*,\dots,v_K^*]$, where $v_1^*,v_2^*,\dots,v_K^*$ are the eigenvectors corresponding to the $K$ largest eigenvalues of $P$; then $Z^*$ is the global maximizer of $\varepsilon(Z)$ (all of the above under the condition $Z^tDZ=I_K$).
Therefore, our solution set is:
$$\{Z^*\cdot R\ |\ R^tR=I_K,\ Z^*=[v_1^*\dots v_K^*]\}, \qquad (36)$$
where $v_1^*\dots v_K^*$ are the eigenvectors corresponding to the largest eigenvalues of $P$.
Since $Z=f(X)=X(X^tDX)^{-\frac12}$, we get
$$X=f^{-1}(Z)=(\mathrm{Diag}(\mathrm{diag}(ZZ^t)))^{-\frac12}\cdot Z, \qquad (al7)\ (37)$$
where, writing $ZZ^t=(z_{i,j})$, we define $\mathrm{diag}(ZZ^t)=[z_{1,1},z_{2,2},z_{3,3},\dots]$, and $\mathrm{Diag}(\mathrm{diag}(ZZ^t))$ is the diagonal matrix with these entries. This formula is correct whenever $X$ is an indicator matrix of all pixels (note that $X^tDX$ is always a diagonal matrix). Then we have
$$f^{-1}(Z^*R)=f^{-1}(Z^*)\cdot R. \qquad (38)$$
Suppose that we have found $Z^*$ from (36); the next thing is to find an $R$ with $R^tR=I_K$ that makes $f^{-1}(Z^*)R$ look most like an indicator matrix of pixels, and to use it as an approximation of the final solution. To do this, we find the $X,R$ minimizing the following $\Phi(X,R)$: define $\tilde X^*=f^{-1}(Z^*)$ and
$$\Phi(X,R)=\|X-\tilde X^*R\|^2,\quad\text{where}\quad\begin{cases}X(i,j)\in\{0,1\}\ (X=[x_1,\dots,x_K]),\\ x_1+\dots+x_K=1_N,\\ R^tR=I_K.\end{cases} \qquad (39)$$
In (39) we use the Frobenius norm, since $\varepsilon(Z)$ is continuous under that norm. To see this, consider bounded $Z_1,Z_2$: if $\|Z_1-Z_2\|<\sigma$ then $|Z_{1,i,j}-Z_{2,i,j}|<\sigma$ for all $i,j$, and
$$|\varepsilon(Z_1)-\varepsilon(Z_2)|=\Big|\frac{1}{K}\mathrm{tr}(Z_1^tW(Z_1-Z_2))+\frac{1}{K}\mathrm{tr}((Z_1-Z_2)^tWZ_2)\Big|<C\cdot\sigma$$
for a constant $C$ depending on $Z_1$, $W$.
Given $R$, we can find $X$ by:
$$X(i,l)=\mathrm{is}\big(l=\arg\max_k \tilde X^*R(i,k)\big), \qquad (al8)\ (40)$$
which means putting a 1 at one (and only one) of the row maximizers of $\tilde X^*R$ and 0 elsewhere. Given $X$, we find $R$ by:
$$X^t\tilde X^*=U\Omega\tilde U^t,\qquad R=\tilde UU^t, \qquad (al9)\ (41)$$
where the first equation is the SVD decomposition of $X^t\tilde X^*$.
Proposition 4. Each of the two algorithms above, (al8)(40) and (al9)(41), yields the minimizer of (39) in its own variable.
Proof. We know that
$$\Phi(X,R)=\mathrm{tr}\big((X-\tilde X^*R)(X^t-(\tilde X^*R)^t)\big)=\mathrm{tr}\{XX^t+\tilde X^*\tilde X^{*t}-X(\tilde X^*R)^t-(\tilde X^*R)X^t\}=C-2\,\mathrm{tr}(XR^t\tilde X^{*t}).$$
So to minimize $\Phi(X,R)$ is to maximize $\mathrm{tr}(XR^t\tilde X^{*t})$. Maximizing it with respect to $X$, it is easy to get (al8)(40). For $R$, introduce a symmetric matrix $\Lambda$ and define
$$L(R,\Lambda)=\mathrm{tr}(XR^t\tilde X^{*t})-\tfrac12\mathrm{tr}(\Lambda^t(R^tR-I_K));$$
we want to maximize $L(R,\Lambda)$. Taking gradients with respect to every entry of $R$, we get
$$\nabla_R\,L(R,\Lambda)=\tilde X^{*t}X-R\Lambda=0.$$
Since we know that in fact $R^tR=I_K$, take $\Lambda^*=R^t\tilde X^{*t}X$; with $X^t\tilde X^*=U\Omega\tilde U^t$ we get $(\Lambda^*)^t\Lambda^*=U\Omega^2U^t$, so $\Lambda^*=(\Lambda^*)^t=U\Omega U^t$. Thus $\tilde U\Omega U^t=RU\Omega U^t$, and finally $R=\tilde UU^t$.
Before iterating the computation of $X$ and $R$, we want an initial $R$ close to the exact solution. We simply build $R$ from rows of $\tilde X^*$ that are as close to mutually perpendicular as possible. This is done by:
$$R_1=\text{first row of }\tilde X^*,\quad c=0_{N\times1};\quad\text{for }k=2,3,\dots,K:\ \begin{cases}c=c+\tilde X^*\cdot R_{k-1}^t,\\ i=\arg\min_j c_j,\\ R_k=i\text{-th row of }\tilde X^*,\end{cases}\quad R=[R_1^t,R_2^t,\dots,R_K^t]. \qquad (al10)$$
Finally, the algorithm is as follows:

Algorithm 2 normalized cuts
1: Compute the matrices $W,D$ as in (31), (h) with the 'sparse' function.
2: Find $V^*$ consisting of the $K$ eigenvectors of $D^{-\frac12}WD^{-\frac12}$ corresponding to the $K$ largest eigenvalues, using the 'eigs' function, and check whether the eigenvector of eigenvalue 1 is among them.
3: Compute $Z^*=D^{-\frac12}V^*$ and $\tilde X^*$ by al7(37).
4: Initialize $R$ by al10.
5: Set the tolerance.
6: Repeat
7: Compute $X^k,R^k$ from al8(40), al9(41).
8: Until $|X^k-X^{k-1}|\le$ tolerance
9: Output $X$, the indicator matrix of pixels.
3.3 Experimental Results
For all the following outcomes, we take $\sigma_I^2=0.02$, $\sigma_X^2=10$ in our algorithm.

Figure 6: simple example. (a): the original image; (b): our result with 4 super pixels, r = 7; (c): our result with 16 super pixels, r = 7.

Figure 7: 128 × 128 image. (a): the original image; (b): our result with 64 super pixels, r = 7.

Figure 8: 128 × 128 image. (a): the original image; (b): our result with 128 super pixels, r = 5.
4 SLIC Super Pixels and Combination
One shortcoming of the normalized cut is its time cost, which is why one of the most popular methods of getting a super pixel segmentation, the SLIC method [11], was introduced.

4.1 Key Idea
The cleverest idea of this method is the introduction of cluster centres: take a position as the centre of each super pixel, and then decide which cluster each pixel belongs to by measuring the distance of that pixel to the nearby cluster centres.
At the beginning we should also define the distance between pixels. Inspired by the idea used in the normalized cut, we consider the model:
$$w_{i,j}=\sigma_I^2\cdot wc_{i,j}^2+\sigma_X^2\cdot wd_{i,j}^2, \qquad (43)$$
where
$$wc_{i,j}=\|f(i)-f(j)\|_2,\qquad wd_{i,j}=\|X(i)-X(j)\|_2\ (\|X(i)-X(j)\|_2<r),\qquad \sigma_I^2=1,\ \sigma_X=\frac{m}{S}. \qquad (j)\ (44)$$
Here $S=\sqrt{N/K}$, where $N$ is the total number of pixels, usually $256\times256$, and $K$ is the number of clusters or super pixels; thus we can regard $S$ as the edge length of each super pixel. Then we should get our initial group of cluster centres. Here we simply use grid points with step $S$, then move each a little to a position of lower gradient value, in order to avoid placing cluster centres on edges.
Secondly, we assign each pixel to the existing cluster centre with the least distance, then use all pixels in each cluster to compute the new position of each cluster centre, and assign the average grey value to each new cluster centre. Finally, we repeat the above procedure until it almost converges, and then enforce connectivity by eliminating isolated pixels. Thus we get:
Algorithm 3 SLIC
1: Initialize cluster centres $C_k$ at grid points (hexagonal).
2: Perturb the cluster centres to the lowest gradient position.
3: Set the tolerance.
4: Repeat
5: Assign each pixel to the cluster centre with minimum distance (43), (j)(44).
6: Compute new cluster centres and the residual error $E$ (the $L_1$ distance between the previous and recomputed centres).
7: Until $E\le$ tolerance
8: Enforce connectivity.
After we get our over-segmentation, an important thing is to check its goodness. We compare the segmentation from our algorithm with segmentations done by humans, and we introduce two measurements.

• Definition 1 (under segmentation error). This measures how well our super pixels lie exactly within the segments drawn by a human. If too many of our super pixels cover, in roughly equal portions, two distinct human-drawn segments, our result is less satisfactory. Let $g_1,g_2,\dots,g_m$ be the segments drawn by a human, $s_1,s_2,\dots,s_l$ our super pixels (the $g_i,s_j$ are sets of pixels), and $N$ the total number of pixels. Define
$$U=\frac{1}{N}\Big[\sum_{i=1}^m\Big(\sum_{s_j:\ |s_j\cap g_i|>B|s_j|}|s_j|\Big)-N\Big], \qquad (k)\ (45)$$
where $|s_j|$ is the number of pixels in that super pixel, and we set $B=0.005$. The smaller the under segmentation error $U$ is, the better our over-segmentation is.
• Definition 2 (boundary recall). This measures how well our super pixels recognize the boundaries of the correct human segmentation. Let $A=\{bg_1,bg_2,\dots,bg_M\}$ be the set of all boundary pixels of the human segmentation and $B=\{bs_1,bs_2,\dots,bs_L\}$ the set of all boundary pixels of our super pixels. Define:
$$C=\{bg\in A\ |\ \exists bs\in B,\ \|bg-bs\|_2\le D\},\qquad S=\frac{|C|}{|A|}, \qquad (l)\ (46)$$
where we take $D=1$. The larger the boundary recall $S$ is, the better our over-segmentation is.
Here are some experimental results ($U$, $S$ are from (k)(45), (l)(46)):

Figure 9: 256 × 256 picture 1. (a): result from a human; (b): our result with 1024 super pixels, m = 20 in (j)(44), U = 0.1826, S = 0.8969.

Figure 10: 256 × 256 picture 2. (a): result from a human; (b): our result with 1024 super pixels, m = 20 in (j)(44), U = 0.1665, S = 0.9288.

Figure 11: 310 × 310 picture 3. (a): result from a human; (b): our result with 1024 super pixels, m = 20 in (j)(44), U = 0.1930, S = 0.7222.
4.2 Combination and Probable Improvement
From (j) we suspect the SLIC method is somewhat sensitive to noise and erroneous pixels, since only in the second step can we do anything to avoid isolated points. Therefore it is natural to do the super pixel segmentation on a smoothed version of the original image that keeps the main information. The two-stage method above is an efficient way to smooth an image, so we first smooth the original picture by the two-stage method and then use the SLIC algorithm to do the over-segmentation. What is more, we want to improve the distance in (43), (j)(44). If we already had a hand-made segmentation of a picture, it would be reasonable to put that information into the distance formula: assign a very small value to pairs in the same hand-made segment and a very large value to other pairs. This inspires us to first get a rough segmentation and then use that additional information to build a better distance formula. The simplest idea is of course to use the already-smoothed image with algorithm (al*6) to get a rough segmentation, but the problem lies in the repetition of information. Thus we want another segmentation method, not exactly the same as the two-stage method above, to provide additional information.
The easiest way is to introduce a small change in the algorithm of the two-stage method. Here we change
$$E(g)=\frac{\lambda}{2}\int_\Omega(f-Ag)^2\,dx+\frac{\mu}{2}\int_\Omega|\nabla g|^2\,dx+\int_\Omega|\nabla g|\,dx$$
in (14) into:
$$E_1(g)=\frac{\lambda}{2}\int_\Omega(f-Ag)^2\,dx+\frac{\mu}{2}\int_\Omega|\nabla g|^2\,dx+\|\nabla g\|_2, \qquad (47)$$
and then do the same to minimize $E_1$(47), where the second problem of (25) becomes:
$$(d_x^{k+1},d_y^{k+1})=\arg\min_{d_x,d_y}\|(d_x,d_y)\|_2+\frac{\sigma}{2}\|d_x-\nabla_xg-b_x^k\|_2^2+\frac{\sigma}{2}\|d_y-\nabla_yg-b_y^k\|_2^2. \qquad (48)$$
To solve (48), write $d_x=[d_{1,1},d_{1,2},\dots,d_{1,N}]^t$, $d_y=[d_{2,1},d_{2,2},\dots,d_{2,N}]^t$, $\nabla_xg+b_x^k=[a_{1,1},a_{1,2},\dots,a_{1,N}]^t$, $\nabla_yg+b_y^k=[a_{2,1},a_{2,2},\dots,a_{2,N}]^t$. Let $d^t=[d_x^t,d_y^t]$ and $a^t=[(\nabla_xg+b_x^k)^t,(\nabla_yg+b_y^k)^t]$; taking the derivative of
$$\|(d_x,d_y)\|_2+\frac{\sigma}{2}\|d_x-\nabla_xg-b_x^k\|_2^2+\frac{\sigma}{2}\|d_y-\nabla_yg-b_y^k\|_2^2$$
with respect to all the $d_{i,j}$ and setting it equal to 0, we get
$$\frac{d}{\sqrt{d^t\cdot d}}+\sigma(d-a)=0, \qquad (49)$$
thus $d=c\cdot a$ for some scalar $c$, and we can solve that
$$c=1-\frac{1}{\sigma\sqrt{a^t\cdot a}}. \qquad (50)$$
Let this algorithm be (al*3).
We then found some useful properties of this change. Firstly, this algorithm converges very fast, usually reaching a relative error below 0.0001 in around 5 iterations. Secondly, it does not give a very smooth outcome, which is a disadvantage compared to the original, but it has the effect of producing a finer, though messier, segmentation. Yet messiness is not a serious problem here, because we just need a rough segmentation to serve as a parameter in our new amended SLIC algorithm. What is more, the similarity between the rough segmentation and the original image can help eliminate some errors introduced by the previous smoothing.
For example, on the left is a segmentation from the original two-stage method with $\lambda=10,\mu=1,\sigma=2,K=16$ (we take a large $\lambda$ to stay closer to the original image); on the right is one from the changed two-stage method above with $\lambda=0.5,\mu=3,\sigma=2,K=16$. We see that the new method does not give a good segmentation by itself, but it reveals more information, especially near boundaries.
After getting a rough segmentation, we put the new information into the distances between pixels by:
$$w_{i,j}^*=w_{i,j}+\mathrm{is}(|K(i)-K(j)|>3)\cdot|K(i)-K(j)|, \qquad (j^*)\ (51)$$
where the $w_{i,j}$ are the same as in (43), (j)(44), and $K(i)$ is the segmentation index of pixel $i$; note that the grey value of $i$ increases monotonically with $K(i)$.
Now comes our modification of the SLIC method.

Algorithm 4 revised SLIC
1: Smooth the original image to $g$ with the two-stage method ($\lambda=10,\mu=1,\sigma=2$).
2: Get a segmentation of the original image using the changed two-stage method above ($\lambda=0.5,\mu=3,\sigma=2,K=16$).
3: Change the distance formula in the SLIC algorithm from (j)(44) to (j*)(51).
4: Apply the SLIC algorithm to $g$.
To see the improvement of this modification, we use the code from 'VLFeat' (regularizer: 0.1) as the control group. Here are 3 examples ($U_{vl}, U_g$ denote the under segmentation errors of the images from VLFeat and ours; $S_{vl}, S_g$ denote the boundary recalls of the images from VLFeat and ours):

Figure 12: 256 × 256 picture 1. (a): result from VLFeat, 1024 super pixels, $U_{vl}=0.1778$, $S_{vl}=0.8491$; (b): ours, 1024 super pixels, $U_g=0.1540$, $S_g=0.9202$.

Figure 13: 256 × 256 picture 2. (a): result from VLFeat, 1024 super pixels, $U_{vl}=0.1598$, $S_{vl}=0.9180$; (b): ours, 1024 super pixels, $U_g=0.1298$, $S_g=0.9341$.

Figure 14: 310 × 310 picture 3. (a): result from VLFeat, 1250 super pixels, $U_{vl}=0.1830$, $S_{vl}=0.6649$; (b): ours, 1024 super pixels, $U_g=0.1793$, $S_g=0.7248$.
23
From the above examples we find that our revised method results in smaller under segmentation error and
larger boundary recall. It is then reasonable to run more experiments. We took 10 other images:
image 1 2 3 4 5 6 7 8 9 10
Uvl 0.1109 0.0880 0.0994 0.1631 0.2928 0.2761 0.1423 0.0963 0.1540 0.3232
Ug 0.1023 0.0624 0.0901 0.1464 0.2559 0.2673 0.1272 0.0858 0.1477 0.2984
Svl 0.8368 0.9383 0.8583 0.9441 0.9111 0.8396 0.7122 0.7227 0.8014 0.8430
Sg 0.9066 0.9759 0.9215 0.9827 0.9199 0.8714 0.8356 0.8242 0.8726 0.9067
We can see that our revised method tends to result in smaller under segmentation error and larger boundary
recall. To make this clearer, we look at the relative decrement of under segmentation error and the relative
increment of boundary recall with respect to the VLFeat results.
(a) (b)
Histogram (a) is about under segmentation error while histogram (b) is about boundary recall.
Therefore, from the experimental results, our revision appears to be at least somewhat useful.
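The under segmentation error used throughout this section can be computed directly from the two label maps. The sketch below, in Python with NumPy rather than our MATLAB code, implements one common form of the measure (conventions differ slightly across papers, so the exact numbers may not match the tables above; boundary recall is not sketched here):

```python
import numpy as np

def under_segmentation_error(gt, sp):
    """One common form of under segmentation error: for every ground-truth
    segment, count the 'leaked' area of super pixels that straddle its
    boundary, normalized by the image area.
    gt, sp: integer label maps of the same shape."""
    leak = 0
    for g in np.unique(gt):
        mask = (gt == g)
        for s in np.unique(sp[mask]):          # super pixels touching segment g
            sp_mask = (sp == s)
            inside = np.logical_and(sp_mask, mask).sum()
            outside = sp_mask.sum() - inside
            leak += min(inside, outside)       # count the smaller side as leakage
    return leak / gt.size
```

A perfect super pixel segmentation (every super pixel contained in one ground-truth segment) gives 0; straddling super pixels increase the score.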
5 Simple Application
After we get the super pixel segmentation of an image, we become more powerful since we have more
information. Before, we only knew things on or near each pixel, but now we have groups of pixels from
which we can gather information from many pixels simultaneously.
5.1 Saliency Detection
As we defined distances between pixels before, we now have more information to define reasonable
distances between super pixels (as in [12]).
Let i, j be two distinct super pixels, let Ri, Rj be the ranges of grey or colour values of each super pixel,
Ci,j be the range of common grey or colour values of these two super pixels, Di,j be the distance between the two
cluster centres of i, j, and d be the diagonal length of the image; then define:

sim(i, j) = (1/2) · (|Ci,j|/|Ri| + |Ci,j|/|Rj|) · (1 − Di,j/d),  (52)
where |C| just means the length of the interval C. Then sim(i, j) is a reasonable similarity or distance between
i and j: the larger sim(i, j) is, the more similar i and j are, and the more likely they should be in the same
larger segment.
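Formula (52) is easy to compute once the range and centre of each super pixel are known. A minimal Python sketch, assuming each grey range is stored as a (min, max) pair and neither range is degenerate:

```python
def sim(Ri, Rj, Dij, d):
    """Similarity (52) between two super pixels: Ri, Rj are (min, max) grey
    ranges, Dij the distance between the two cluster centres, d the image
    diagonal."""
    # |C_{i,j}|: length of the overlap of the two ranges (0 if disjoint)
    overlap = max(0.0, min(Ri[1], Rj[1]) - max(Ri[0], Rj[0]))
    return 0.5 * (overlap / (Ri[1] - Ri[0]) + overlap / (Rj[1] - Rj[0])) \
        * (1.0 - Dij / d)
```
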
On a map, a point with smaller average distance to all other points is likely near the centre; it is thus
natural to detect the saliency of each super pixel by computing a kind of 'average' similarity to all the
others. Let
GC(i) = Σj=1..K W(i, j) · |mci − mcj|,  for W(i, j) = |j| · (1 − Di,j/d),

where mci means the mean grey or colour value of super pixel i, |j| means the number of pixels in j, and
K is the number of super pixels. Thus GC(i) serves as an indicator of the saliency of i, and we normalize it
by

NGC(i) = (GC(i) − GCmin) / (GCmax − GCmin).

And define

SS(i) = (Σj=1..K sim(i, j) · Ds(j)) / (Σj=1..K sim(i, j)),   NSS(i) = (SS(i) − SSmin) / (SSmax − SSmin),

where Ds(i) is the distance between the centre of i and the centre of the image. NSS(i) is also an indicator
of the saliency of i.
Taking advantage of both NGC and NSS, define

SAL(i) = ((Σj=1..K sim(i, j) · NGC(j)) / (Σj=1..K sim(i, j))) · ((Σj=1..K sim(i, j) · NSS(j)) / (Σj=1..K sim(i, j))),  (53)
and we use SAL(i) as the final indicator of i. Thus we can segment the image according to the value of
SAL for all super pixels, putting super pixels with close SAL values into the same segment.
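Put together, GC, NGC, SS, NSS and SAL amount to a few lines of matrix arithmetic. The Python/NumPy sketch below assumes the per-super-pixel statistics (similarity matrix, mean colours, sizes, centre distances) have already been extracted; the normalizations divide by GCmax − GCmin and SSmax − SSmin, so degenerate inputs where these differences vanish are not handled.

```python
import numpy as np

def saliency_scores(sim_mat, mean_col, sizes, centre_dist, diag, Ds):
    """Saliency per super pixel, following (52)-(53).
    sim_mat: K x K matrix of sim(i, j); mean_col: mean grey value mc_i of each
    super pixel; sizes: pixel counts |j|; centre_dist: K x K matrix of D_{i,j};
    diag: image diagonal d; Ds: distance of each centre to the image centre."""
    W = sizes[None, :] * (1.0 - centre_dist / diag)
    GC = (W * np.abs(mean_col[:, None] - mean_col[None, :])).sum(axis=1)
    NGC = (GC - GC.min()) / (GC.max() - GC.min())
    SS = (sim_mat * Ds[None, :]).sum(axis=1) / sim_mat.sum(axis=1)
    NSS = (SS - SS.min()) / (SS.max() - SS.min())
    row_sum = sim_mat.sum(axis=1)
    # SAL (53): product of the two similarity-weighted averages
    SAL = ((sim_mat @ NGC) / row_sum) * ((sim_mat @ NSS) / row_sum)
    return SAL
```

Super pixels with close SAL values are then grouped into the same segment, as described above.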
This idea is similar to our two-stage method above: concentrate all the information in one measure
and segment according to that measure. Here are 2 bi-segmentation results from the above method and
the two-stage method, for 321 × 321 plane and bird images:
(a) (b) (c)
Figure 15: (a): the original image; (b): result from above method; (c): result from two-stage method.
(a) (b) (c)
Figure 16: (a): the original image; (b): result from above method; (c): result from two-stage method.
From the above results we may observe two properties of this method.
Firstly, it reveals more information about edges, for the two-stage method tends to give us smoother
edges.
Secondly, it always gives high brightness to the main part of the image: the two-stage method assigns
brightness according to the original grey value, but this method gives higher brightness to the more
central part. Thus in the bird image, the two-stage method gives an almost opposite result.
5.2 Reunion of Super Pixels
Super pixel segmentation is a kind of over segmentation which yields too many segments; it is thus
natural to union them back together to form larger segments.
Now let us reintroduce all the notations from part 3; we want to define another kind of distance
among super pixels. Let f be the original image and i, j two adjacent super pixels. Define
wgi,j = exp(−MEAN({ (∂f/∂x)(pk) + (∂f/∂y)(pk) | pk ∈ Ni,j })),  (54)

where the pk's are pixels of the image and Ni,j is the common boundary of i, j. If wgi,j gives a large value,
then there is not likely to be any edge between i and j, thus i, j should be grouped into one cluster.
Also, if both linkratio(i, j) and linkratio(j, i) are large, then i, j are similar in colour and position. Define

wli,j = (1/2) · (linkratio(i, j) + linkratio(j, i)).  (55)

Finally define

w∗i,j = wgi,j + c · wli,j  (56)

to be our new similarity measure of super pixels, where we always take c = 0.1.
To compute linkratio(i, j), let Xi, Xj be the indicator vectors of i, j, and let W, D have the same
definitions as in part 3; then

linkratio(i, j) = (Xi^t W Xj) / (Xi^t D Xi).
Then we use exactly the same algorithm as in part 3 to group our super pixels.
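The linkratio quantity, and with it (55), can be written directly in terms of the indicator vectors and the pixel affinity matrix W from part 3. A Python sketch (our actual code is in MATLAB; `W` here is any symmetric nonnegative affinity matrix):

```python
import numpy as np

def linkratio(Xi, Xj, W):
    """linkratio(i, j) = (Xi^t W Xj) / (Xi^t D Xi), where Xi, Xj are 0/1
    indicator vectors of two super pixels and D the diagonal degree matrix
    of the affinity matrix W, as in part 3."""
    D = np.diag(W.sum(axis=1))
    return (Xi @ W @ Xj) / (Xi @ D @ Xi)

def wl(Xi, Xj, W):
    # symmetric link weight (55)
    return 0.5 * (linkratio(Xi, Xj, W) + linkratio(Xj, Xi, W))
```

Adding wg from (54) gives the combined similarity (56), on which the part-3 grouping algorithm is run.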
Since the two-stage method and the two super pixel methods above can all do the same work, let us
compare our method with the other 3 methods. (All results come from a 256 × 256 image with 1024
super pixels, and ours are always on the right side.)
• Segmentation into 32 pieces, comparing with the two-stage method
We may find the two-stage method not efficient at producing this many segments, and our method can
reveal a little more information about edges.
• Segmentation into 16 pieces, comparing with the SLIC method
We may see that when making fewer segments, the SLIC method tends to give unreasonable boundaries of
segments and fails to detect many edges, while our method results in simpler shapes of segments and gives
fewer segments at the top, where there should be only one.
• Segmentation into 8 pieces, comparing with normalized cuts
We can see these two results are similar, but our method finds something about the right crux and
window and performs better at telling corners of edges.
Thus our new method appears to work over a large range of segment numbers, and I hope it would be
helpful for some people.
6 Deficiencies
Due to the limitation of my capacity, there still exists much room for improvement in part 4.2 and
part 5.2. Most significantly, information about textures has been ignored above. Actually there are many
efficient ways to extract that kind of information, as well as edges, by convolution or diffusion [13]. For
example, there are two simple methods to do segmentation through texture: the Gabor filter and the
structure tensor.
For the Gabor filter, define the kernel K = (ki,j), for

ki,j = exp(−(i′^2 + γ^2 · j′^2) / (2σ^2)) · cos(2π · i′/λ + ψ),

where i′ = i·cos(θ) + j·sin(θ), j′ = −i·sin(θ) + j·cos(θ), and we often set σ = 10, ψ = 0.2, γ = 0.1, λ = 10
and change the value of θ to get different results. In the following examples, we take the 'lenna' picture
and the Gabor kernel above with size 3.
(a) (b) (c) (d)
Figure 17: 256 × 256 lenna
(a): θ = 30, (b): θ = 30, (c): θ = 90, (d): θ = 180.
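For concreteness, the Gabor kernel described above can be generated as follows. This Python sketch assumes the real (cosine) carrier, since the extracted formula is ambiguous on that point; the default parameters follow the values quoted in the text.

```python
import numpy as np

def gabor_kernel(size=3, sigma=10.0, psi=0.2, gamma=0.1, lam=10.0, theta=0.0):
    """Gabor kernel k_{i,j} = exp(-(i'^2 + gamma^2 j'^2)/(2 sigma^2))
    * cos(2 pi i'/lam + psi), with (i', j') the coordinates rotated by theta."""
    half = size // 2
    coords = np.arange(-half, half + 1)
    i, j = np.meshgrid(coords, coords, indexing="ij")
    ip = i * np.cos(theta) + j * np.sin(theta)    # rotated coordinate i'
    jp = -i * np.sin(theta) + j * np.cos(theta)   # rotated coordinate j'
    return np.exp(-(ip**2 + gamma**2 * jp**2) / (2 * sigma**2)) \
        * np.cos(2 * np.pi * ip / lam + psi)
```

Convolving the image with kernels at several θ values gives the orientation-dependent responses shown in Figure 17.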
For the structure tensor, let the original image be f, where f(i, j) is the grey or colour value of pixel (i, j), and define:

Sw(i, j) = [S0 ∗ w](i, j),

where

S0(i, j) = [ (∂f/∂x)^2(i, j)          (∂f/∂x · ∂f/∂y)(i, j)
             (∂f/∂x · ∂f/∂y)(i, j)    (∂f/∂y)^2(i, j) ],

and w(i, j) is usually a Gaussian kernel. This yields 3 different image results from which we can do
combinations and extract useful information. Most commonly we compute, for each pixel, the eigenvector of
Sw(i, j) corresponding to the larger eigenvalue; this gives the direction with maximal gradient. Thus we
may use this directional information as an indicator of texture.
For example, take a Gaussian kernel with size 3 and σ = 3 and pixels A = (60, 60), B = (60, 180), C =
(180, 180); then the eigenvectors corresponding to the larger eigenvalues are respectively [−0.8256, −0.5642]^t,
[−0.9922, 0.1245]^t, [−0.9808, −0.1950]^t, which is shown by the arrows drawn in the above picture.
In a nutshell, it would be better if more about texture had been considered.
References
[1] Xiaohao Cai, Raymond Chan, and Tieyong Zeng. A Two-Stage Image Segmentation Method Using
a Convex Variant of the Mumford-Shah Model and Thresholding. SIAM J. Imaging Sciences (2013),
Vol. 6, No. 1, pp. 368-390.
[2] Tom Goldstein, Stanley Osher. The Split Bregman Method for L1-Regularized Problems. pp. 4-6.
[3] Jacqueline Bush, Dr. Carlos Garcia-Cervera. Bregman Algorithms (University of California, Santa
Barbara ). (2011), pp. 20-23.
[4] Yilun Wang, Wotao Yin and Yin Zhang. A Fast Algorithm for Image Deblurring with Total
Variation Regularization. CAAM (2007).
[5] Jose Bioucas Dias. Convolution operators. IP, IST (2007), pp. 2-13.
[6] Xiaofeng Ren and Jitendra Malik. Learning a Classification Model for Segmentation. CA 94720.
[7] Jianbo Shi and Jitendra Malik. Normalized Cuts and Image Segmentation. IEEE Transactions on
Pattern Analysis and Machine Intelligence (2000), Vol. 22, No. 8, pp. 888-905.
[8] Subhransu Maji, Nisheeth K. Vishnoi and Jitendra Malik. Biased Normalized Cuts. Computer
Vision and Pattern Recognition (2011), pp. 2057-2064.
[9] J. Almeida. Lanczos Algorithm: Theory and Applications. York (United Kingdom) (2012).
[10] Stella X. Yu, Jianbo Shi. Multiclass Spectral Clustering. Computer Vision (2003).
[11] Radhakrishna Achanta, Appu Shaji, Kevin Smith, Aurelien Lucchi, Pascal Fua, and Sabine
Susstrunk. SLIC Superpixels. Journal of Latex Class Files, VOL. 6, NO. 1, December 2011.
[12] Zhi Liu, Olivier Le Meur and Shuhua Luo. Superpixel-based saliency detection. 14th International
Workshop on Image Analysis for Multimedia Interactive Services (WIAMIS), 2013.
[13] Mikael Rousson, Thomas Brox and Rachid Deriche. Active Unsupervised Texture Segmentation
on a Diffusion Based Feature Space. Institut National De Recherche En Informatique et En Au-
tomatique (2003).
  • 7. Let bk+1 = bk + (Φui − di ) = Σk+1 i=1 (Φui − di ), b1 = 0, we have pk u = −λΦt bk , pk d = λbk . Therefore, (uk+1 , dk+1 ) = arg min (u,d) J(u, d) − J(uk , dk ) + λ < bk , Φu − Φuk > − λ < bk , d − dk > + λ 2 d − Φu 2 2 = arg min (u,d) J(u, d) − J(uk , dk ) − λ < bk , d − Φu > − λ < bk , dk − Φuk > + λ 2 d − Φu 2 2 = arg min (u,d) J(u, d) − λ < bk , dk − Φuk > + λ 2 d − Φu 2 2 = arg min (u,d) J(u, d) + λ 2 d − Φu − bk 2 2, (20) after that we get split-Bregman iteration: (uk+1 , dk+1 ) = arg min(u,d) J(u, d) + λ 2 d − Φu − bk 2 2, bk+1 = bk + (Φuk+1 − dk+1 ). (21) Now we apply this method into our minimizing of E(g)(14). Our problem becomes: ming λ 2 f − Ag 2 2 + µ 2 g 2 2 + (dx, dy) 1 s.t. dx = xg, dy = yg. We can transfer it to: min g,dx,dy λ 2 f − Ag 2 2 + µ 2 g 2 2 + (dx, dy) 1 + σ 2 dx − xg 2 2 + σ 2 dy − yg 2 2, (22) from which we have: (gk+1 , dk+1 x , dk+1 y ) = arg min g,dx,dy λ 2 f − Ag 2 2 + µ 2 g 2 2+ (dx, dy) 1 + σ 2 dx − xg − bk x 2 2 + σ 2 dy − yg − bk y 2 2, (23) bk+1 x = bk x + ( gk+1 − dk+1 x ); bk+1 y = bk y + ( gk+1 − dk+1 y ) (al1). (24) Let’s change (23) into: gk+1 = arg ming λ 2 f − Ag 2 2 + µ 2 g 2 2 + σ 2 dx − xg − bk x 2 2 + σ 2 dy − yg − bk y 2 2, (dk+1 x , dk+1 y ) = arg mindx,dy (dx, dy) 1 + σ 2 dx − xg − bk x 2 2 + σ 2 dy − yg − bk y 2 2. (25) The second of (25) can be solved by a generalized shrinkage solver[4]: sk x = xgk+1 + bk x; sk y = ygk+1 + bk y (al2), (26) dk+1 x = max(sk − 1 σ ) sk x sk ; dk+1 y = max(sk − 1 σ ) sk y sk (al3). (27) 7
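The generalized shrinkage step (al2)-(al3) acts independently on each pixel's gradient pair. A minimal pure-Python sketch (the function name `shrink` is ours, not from the paper):

```python
import math

def shrink(sx, sy, sigma):
    """Generalized shrinkage (al2)-(al3): given s = (sx, sy) at one pixel,
    return (dx, dy) = max(|s| - 1/sigma, 0) * s / |s|."""
    s = math.hypot(sx, sy)
    if s == 0.0:
        return 0.0, 0.0
    scale = max(s - 1.0 / sigma, 0.0) / s
    return sx * scale, sy * scale
```

With sigma = 2, a gradient vector of norm 1 is shrunk to norm 0.5, while vectors of norm below 1/sigma are set to zero; this is the standard soft-thresholding behaviour that promotes sparse gradients.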
  • 8. To solve the first of (25), we take derivative on g, which give us: (λAt A − (µ + σ)∆)g = λAt f + σ t x(dk x − bk x) + σ t y(dk y − bk y). (28) But g can also be regarded as a function, which provide us the Fourier transformation to solve (28) [4] (using F to denote Fourier transformation). • (1): F∆F−1 , F−1 ∆F are diagonal. We only need to show one of them: Let f be a function, then F(f(u, v))(s, t) = ∞ −∞ ∞ −∞ exp(−c1su − c2tv) · f(u, v)dudv, where c1, c2 are constants, ∆F(f(u, v))(s, t) = ∞ −∞ ∞ −∞ (c2 1u2 + c2 2v2 ) · exp(−c1su − c2tv) · f(u, v)dudv , thus F−1 ∆F(f(u, v)) = (c2 1u2 + c2 2v2 ) · f(u, v). It is just multiply a constant to each point of the original function f. To get those constant, we can simply compute F−1 ∆F(1(u, v)) for 1(u, v) ≡ 1. • (2): FAt AF−1 , F−1 At AF are diagonal for A be a blur. Let h be the kernel of A, f be a image, Af(s) = f ∗ h = ∞ −∞ f(x) · h(s − x)dx = ∞ 0 f(x) · h(s − x)dx, For any other image a(x), Af, a = ∞ −∞ a(s) ∞ 0 f(x) · h(s − x)dxds = ∞ 0 a(s) ∞ 0 f(x) · h(s − x)dxds = ∞ 0 f(x) ∞ 0 a(s)h(x − s)dsdx = f, At a , thus At f(s) = ∞ 0 f(x)h(x − s)dx = ∞ 0 f(x)ht (s − x)dx, where ht (s − x) = h(x − s). So FAt AF−1 f = F ∗ ht ∗ h ∗ F−1 f = F(ht ∗ h) · f, it is also diagonal. Now we can solve (28). Let’s first solve g∗ : F(λAt A − (µ + σ)∆)F−1 g∗ = F(λAt f + σ t x(dk x − bk x) + σ t y(dk y − bk y)) (al4), (29) then solve g: g = F−1 g∗ (al5). (30) Lastly our algorithm is as following: 8
Algorithm 1 two-stage
1: Initialize tolerance and set g^0 = f, dx^0 = dy^0 = bx^0 = by^0 = 0.
2: Repeat
3: Compute g^k, dx^k, dy^k, bx^k, by^k from al4 (29), al5 (30), al2 (26), al3 (27), al1 (24).
4: Until |g^k − g^(k−1)| ≤ tolerance
5: Output g.

2.3 Experimental Results

For all the following results, we set σ = 2 and tolerance = 0.0001 in our algorithm.

Figure 1: Antimass image segmentation. (a): the original image; (b): our result, 192 iterations, time 15.7249.

Figure 2: Kidney vascular system segmentation. (a): the original image; (b): our bi-segmentation by al6 with p = 0.1760, 111 iterations, time 3.1668.
  • 10. (a) (b) Figure 3: Noise (a): the original image; (b): our bi-segmentation by al6, p = 0.8308 with iteration n.b. 63, time 0.2340. (a) (b) Figure 4: tri-segmentation (a): the original image; (b): our tri-segmentation by al∗ 6 with iteration n.b. 63, time 0.3432. 10
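The bi-segmentation step al6 referenced in these captions, and its multi-class variant al6∗, are defined later in the text; both amount to simple thresholding of the smoothed image. A minimal sketch on a flat list of gray values (function names are ours; al6∗'s cluster means are passed in, e.g. as computed by k-means):

```python
def al6(g, p):
    """Bi-segmentation (al6): threshold the smoothed image g (a flat list of
    pixel values) at rho = p*max(g) + (1-p)*min(g), with p in (0, 1)."""
    lo, hi = min(g), max(g)
    rho = p * hi + (1 - p) * lo
    return [0 if v < rho else 1 for v in g]

def al6_multi(g, means):
    """Multi-class variant (al6*): given the K cluster mean values (assumed
    precomputed, e.g. by k-means), split pixels at the K-1 mid-points
    between consecutive sorted means; each band is one segment."""
    ms = sorted(means)
    cuts = [(a + b) / 2 for a, b in zip(ms, ms[1:])]
    return [sum(v >= c for c in cuts) for v in g]
```

For example, `al6(g, 0.5)` thresholds at the midpoint between the minimum and maximum of g.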
Figure 5: Gaussian blur ([15, 15], 15). Only the central part of each image is shown, at the size of the original. (a): the clean image; (b): the blurred image; (c): our result, 83 iterations, time 1.2480.

For simplicity we mainly consider square images; a non-square image is zero-padded to a square and cropped back afterwards. In the second and third examples the output is a bi-segmentation obtained by (al6). The last example demonstrates the handling of a blurred image: we blur a clean image by convolving it with a blur kernel and then segment it with our algorithm. Only the central parts of the blurred image and of the segmentation are shown; the true images are larger because of the matrix convolution.

Algorithm (al6): This is a very simple procedure for obtaining a segmentation from the smooth image g produced above. Given a user-chosen constant p ∈ (0, 1), let max and min denote the maximum and minimum of g, and set ρ = p · max + (1 − p) · min. All pixels whose value in g is less than ρ are set to 0, all others to 1.

Algorithm (al6∗): For multi-class segmentation into K segments, we use the 'kmeans' function to compute the K group mean values of all pixels and take the K − 1 mid-points between consecutive mean values. These K − 1 mid-points split the pixels into K clusters according to their values, and each cluster is one segment.

3 Super Pixels by Normalized Cut

Super pixels are an important topic [6]. The most popular way of obtaining super pixels is the normalized cut [7, 8, 10], but it is commonly agreed to be time-consuming, which is why we will later turn to the SLIC method as a faster alternative. In this setting we regard an image as a graph in which each pixel is a vertex.
We have to find some relationship between pixels and turn it into a kind of distance with which we can cluster the pixels into groups; such groups are our super pixels. Concretely, we weight the edges between vertices by the distance between the corresponding pixels and then apply tools from graph theory to do the clustering.
3.1 Criteria

Firstly we need to define the distance between pixels:

w_{i,j} = exp(−||f(i) − f(j)||_2^2 / σ_I^2) · wd_{i,j},

where

wd_{i,j} = exp(−||X(i) − X(j)||_2^2 / σ_X^2) if ||X(i) − X(j)||_2 < r, and wd_{i,j} = 0 otherwise, (31)

Here w_{i,j} denotes our distance between pixels i and j, f(i) the gray value of pixel i, X(i) the position of pixel i, and σ_I, σ_X are constants; we take σ_I^2 = 0.02, σ_X^2 = 10 in this report. Let V be the set of all pixels and W the distance matrix. What we want is a partition Γ = {V1, V2, ..., VK} of the pixels into K groups.

Secondly we need a measure of the goodness of the clustering Γ. For A, B ∈ 2^V, define

links(A, B) = Σ_{i∈A, j∈B} w(i, j), degree(A) = links(A, V), linkratio(A, B) = links(A, B) / degree(A).

Then introduce the following concepts:
• knassoc(Γ) = (1/K) Σ_{i=1}^K linkratio(Vi, Vi), which measures the tightness of the super pixels; the larger the better.
• kncuts(Γ) = (1/K) Σ_{i=1}^K linkratio(Vi, V \ Vi), which measures the difference between super pixels; the smaller the better.

In this report we use knassoc(Γ) as the criterion and look for the Γ with the largest knassoc(Γ). Introduce the notation

W = (w_{i,j})_{i,j} (h), D = Diag(Σ_j w_{1,j}, Σ_j w_{2,j}, Σ_j w_{3,j}, ...),

with the w_{i,j} from (31). Then links(Vi, Vi) = X_i^t W X_i and degree(Vi) = X_i^t D X_i, where X_i is the indicator vector of Vi. Thus we want to maximize

knassoc(Γ) = (1/K) Σ_{i=1}^K (X_i^t W X_i) / (X_i^t D X_i). (32)
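The pairwise affinity (31) is easy to sketch directly. In the sketch below (pure Python, function name ours) the parameters `sigma_I2` and `sigma_X2` stand for the squared constants σ_I² = 0.02 and σ_X² = 10 used in the report:

```python
import math

def weight(fi, fj, xi, xj, sigma_I2=0.02, sigma_X2=10.0, r=7.0):
    """Affinity w_{i,j} of eq. (31): an intensity term times a spatial term
    that is cut to zero beyond radius r. fi, fj are gray values; xi, xj are
    (row, col) positions; sigma_I2, sigma_X2 are the squared sigmas."""
    d2 = (xi[0] - xj[0]) ** 2 + (xi[1] - xj[1]) ** 2
    if d2 >= r * r:
        return 0.0  # pixels farther apart than r get zero affinity
    return math.exp(-(fi - fj) ** 2 / sigma_I2) * math.exp(-d2 / sigma_X2)
```

Identical, coincident pixels get affinity 1, and the affinity decays with both gray-value difference and spatial distance.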
  • 13. 3.2 Soundness and Algorithm Let Z = X(Xt DX)− 1 2 , then Zt WZ = (Xt DX)− 1 2 Xt WX(Xt DX)− 1 2 =      Xt 1 · W · X1 Xt 1 · D · X1 0 · · · 0 Xt 2 · W · X2 Xt 2 · D · X2 · · · · · · · · · · · ·       , so we have knassoc(Γ) = 1 K tr(Zt WZ). Then we change our problem into: maximize ε(Z) = 1 K tr(Zt WZ) s.t. Zt DZ = IK, (33) (IK is K denominational identity). From property of tr, if Rt R = IK, then ε(ZR) = ε(Z). Let P = D−1 W, it is easy to see that 1N (N is total number of pixels) is eigenvector of eigenvalue λ = 1. Let V be matrix whose columns are eigenvector of P (note V is no longer the set of verticals), S be the matrix has eigenvalues of P as diagonal entries, then PV = V S. Let V = D− 1 2 · V , then D− 1 2 WD− 1 2 V = D− 1 2 WV = D− 1 2 V S = V S. (34) Since M = D− 1 2 WD− 1 2 is Laplacian from (h), for which λivt i vj = vt i Mvj = vt i λjvj, so vi, vj = 0 (vis are columns of V ), we have ε(V ) = 1 K tr(V t D− 1 2 WD− 1 2 V ) = 1 K tr(V t V S) = 1 K tr(S) = 1 K ΣK i=1si, (35) where V t V = IK, sis are eigenvalues of M as well as of P, indeed M = P. Proposition 3. When s1, s2, ..., sK are the K largest eigenvalues of P defined above, S = Diag[s1, s2, ..., sK], PV = V S, then ε(V ) is maximized. Proof. By property of Rayleigh quotient, if V is composed by eigenvectors of P, then it is local maxi- mizer of ε(Z). Now let Z∗ = [v∗ 1, v∗ 2, ..., v∗ K], for v∗ 1, v∗ 2, ..., v∗ K are eigenvectors corresponding to the K largest eigen- values of P, we have Z∗ to be the global maximizer of ε(Z). (all above are under the condition that Zt DZ = IK.) Therefore, our solution is: {Z∗ · R | Rt R = IK, Z∗ = [v∗ 1...v∗ K]}, (36) where v∗ 1...v∗ K are eigenvector corresponding to largest eigenvalues of P. For Z = f(X) = X(Xt DX)− 1 2 , we get X = f−1 (Z) = (Diag(diag(ZZt )))− 1 2 · Z (al7), (37) 13
  • 14. where, let ZZt =   z1,1 z1,2 · · · z2,1 z2,2 · · · · · · · · · · · ·  , define diag(ZZt ) = [z1,1, z2,2, z3,3, ...], then we should have Diag(diag(ZZt )) =   z1,1 0 · · · 0 z2,2 · · · · · · · · · · · ·  . This formula is correct whenever X is indicator matrix of all pixels (note that Xt DX is always diagonal matrix). Then we have f−1 (Z∗ R) = F−1 (Z∗ ) · R. (38) Suppose that we have find Z∗ from (36), the next thing is to find R s.t. Rt R = IK which give us f−1 (Z∗ )R looks most like an indicator matrix of pixels and use it as an approximation of the final solution. To do this, we come to find such X, R that minimize the following Φ(X, R): define ˜X∗ = f−1 (Z∗ ) Φ(X, R) = X − ˜X∗R 2 , where    X(i, j) ∈ {0, 1} (X = [x1, ..xK]), x1 + ... + xK = 1K, Rt R = IK. (39) In (39) we use Frobenious norm since ε(Z) is continuous under that norm. To show this, consider Z1, Z2 being bounded, if Z1 − Z2 < σ then ∀i, j, |Z1,i,j − Z2,i,j| < σ thus |ε(Z1) − ε(Z2)| = | 1 K tr(Zt 1W(Z1 − Z2)) + 1 K tr((Z1 − Z2)t WZ2)| < C · σ for constant C depending on Z1, W. Given R, we can find X by: X(i, l) = is(l = arg max k ˜X∗R(i, k)) (al8), (40) which means define 1 on any but only one of row-maximizers of ˜X∗R and 0 on others. Given X, we find R by: Xt ˜X∗ = UΩ ˜Ut , R = ˜UUt . (al9) (41) Here the first row is SVD decomposition of Xt ˜X∗. Proposition 4. The two algorithm above (al8)(40), (al9)(41) can yield unique minimizer of (39) Proof. We know that Φ(X, R) = tr((X − ˜X∗R) · (Xt − ( ˜X∗R)t )) = tr{XXt + ˜X∗ ˜X∗ t − X( ˜X∗R)t − Xt ˜X∗R} = C − 2tr(XRt ˜X∗ t ). Then to minimize Φ(X, R) is to maximize 2tr(XRt ˜X∗ t ). To maximize 2tr(XRt ˜X∗ t ) with respect to X, it is easy to get (al8)(40). Introduce symmetric matrix Λ and define 14
  • 15. L(R, Λ) = tr(XRt ˜X∗ t ) − 1 2 tr(Λt (Rt R − IK)), then we want to maximize L(R, Λ). Take gradients on every entry of R, we get tr(XRt ˜X∗ t − 1 2 Λt (Rt R − IK)) = ˜X∗ t X − RΛ = 0, since we know actually Rt R = IK, take Λ∗ = Rt ˜X∗ t X, Xt ˜X∗ = UΩ ˜Ut , ∴ (Λ∗ )t Λ = UΩ2 Ut , Λ∗ = (Λ∗ )t = UΩUt , thus Rt ˜UΩUt = UΩUt , finally we get R = ˜UUt . Before the iteration of computing X, R, we want a initial R which is close to the exact solution. Thus we simply take R composed by rows of ˜X∗ what are as perpendicular as possible. This is done by: R1 = first row of ( ˜X∗)t , c = 0N×1, k = 2, 3, ..., K,    c = c + ˜X∗ · Rk−1, i = arg minj cj, Rk = ith row of ˜X∗, R = [R1, R2, ...Rk] (al10). (42) Finally, the algorithm is as following: Algorithm 2 normalized cuts 1: Compute matrix W, D as (31),(h) with ’spares’ function. 2: Find V ∗ consisting K eigenvectors of D− 1 2 WD− 1 2 corresponding to K largest eigenvalues using ’egis’ function and check if 1N is inside. 3: Compute Z∗ = D− 1 2 V ∗ , ˜X∗ by al7(37). 4: Initialize R by al10(42). 5: Set tolerance. 6: Repeat 7: Compute Xk , Rk from al8(40), al9(41). 8: Until |Xk − Xk−1 | tolerance 9: Get X, the indicator matrix of pixels. 3.3 Experimental Results For all the following outcomes, we take σ2 I = 0.02, σ2 X = 10 in our algorithm. 15
  • 16. (a) (b) (c) Figure 6: simple example (a): the original image; (b): our result with 4 super pixels, r = 7; (c): our result with 16 super pixels, r = 7. (a) (b) Figure 7: 128 × 128 image (a): the original image; (b): our result with 64 super pixels, r = 7. 16
Figure 8: 128 × 128 image. (a): the original image; (b): our result with 128 super pixels, r = 5.

4 SLIC Super Pixels and Combination

One shortcoming of normalized cuts is their time cost, which is why one of the most popular methods for super pixel segmentation, the SLIC method [11], was introduced.

4.1 Key Idea

The key idea of this method is the introduction of cluster centres: one position is taken as the centre of each super pixel, and each pixel is assigned to a cluster by measuring its distance to the nearby cluster centres. As before, we must first define the distance between pixels. Inspired by the idea used in normalized cuts, we consider the model

w_{i,j} = σ_I^2 · wc_{i,j}^2 + σ_X^2 · wd_{i,j}^2, (43)

where

wc_{i,j} = ||f(i) − f(j)||_2, wd_{i,j} = ||X(i) − X(j)||_2 (for ||X(i) − X(j)||_2 < r), σ_I^2 = 1, σ_X = m/S. (j) (44)

Here S = sqrt(N/K), where N is the total number of pixels (usually 256 × 256) and K is the number of clusters or super pixels, so S can be regarded as the edge length of a super pixel. Next we need an initial set of cluster centres. We simply take grid points with step S and move each slightly to a nearby position of lower gradient, in order to avoid placing cluster centres on
edges. Secondly, we assign each pixel to the existing cluster centre at least distance, then use all pixels in each cluster to compute the new position of its cluster centre and assign the cluster's average grey value to that new centre. Finally, we repeat this procedure until it almost converges and then enforce connectivity by eliminating isolated pixels. This gives:

Algorithm 3 SLIC
1: Initialize cluster centres Ck at grid points (hexagonal).
2: Perturb each cluster centre to the lowest-gradient position nearby.
3: Set tolerance.
4: Repeat
5: Assign each pixel to the cluster centre at minimum distance (43), (j)(44).
6: Compute new cluster centres and the residual error E (L1 distance between previous and recomputed centres).
7: Until E ≤ tolerance
8: Enforce connectivity.

After obtaining our over-segmentation, it is important to check its goodness. We compare the segmentation produced by our algorithm with segmentations drawn by humans, using two measurements.

• Definition 1. Under-segmentation error. This measures how well our super pixels lie inside the segments drawn by a human: if too many super pixels cover two distinct human segments in roughly equal portions, the result is less satisfactory. Let g1, g2, ..., gm be the human segments and s1, s2, ..., sl our super pixels (the gi and sj are sets of pixels), and let N be the total number of pixels. Define

U = (1/N)[Σ_{i=1}^m (Σ_{sj : |sj ∩ gi| > B|sj|} |sj|) − N] (k), (45)

where |sj| is the number of pixels in super pixel sj and we set B = 0.005. The smaller the under-segmentation error U, the better the over-segmentation.

• Definition 2. Boundary recall. This measures how well our super pixels recover the boundaries of the correct human segmentation. Let A = {bg1, bg2, ..., bgM} be the set of all boundary pixels of the human segmentation and B = {bs1, bs2, ..., bsL} the set of all boundary pixels of our super pixels.
Define

C = {bg ∈ A | ∃ bs ∈ B, ||bg − bs||_2 ≤ D}, S = |C| / |A| (l), (46)

where we take D = 1. The larger the boundary recall S, the better the over-segmentation.
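Both quality measures can be sketched directly from their definitions (pure Python; function names ours, segments given as sets of pixel ids and boundaries as lists of (row, col) positions):

```python
def under_seg_error(human_segments, superpixels, N, B=0.005):
    """Under-segmentation error U of eq. (45): each superpixel s_j is charged
    to every human segment g_i it overlaps by more than B*|s_j| pixels."""
    total = 0
    for g in human_segments:
        for s in superpixels:
            if len(s & g) > B * len(s):
                total += len(s)
    return (total - N) / N

def boundary_recall(human_boundary, sp_boundary, D=1.0):
    """Boundary recall S of eq. (46): fraction of human boundary pixels that
    lie within distance D of some superpixel boundary pixel."""
    hit = 0
    for bg in human_boundary:
        if any((bg[0] - bs[0]) ** 2 + (bg[1] - bs[1]) ** 2 <= D * D
               for bs in sp_boundary):
            hit += 1
    return hit / len(human_boundary)
```

For instance, superpixels that coincide exactly with the human segments give U = 0, while a single superpixel straddling two human segments is counted against both of them.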
  • 19. Here are some experimental results (U, S’s are from (k)(45), (l)(46)): (a) (b) Figure 9: 256 × 256 picture 1 (a): result from human; (b): our result with 1024 super pixels, m = 20 in (j)(44), U = 0.1826, S = 0.8969. (a) (b) Figure 10: 256 × 256 picture 2 (a) result from human; (b): our result with 1024 super pixels, m = 20 in (j)(44), U = 0.1665, S = 0.9288. 19
Figure 11: 310 × 310 picture 3. (a): result from human; (b): our result with 1024 super pixels, m = 20 in (j)(44), U = 0.1930, S = 0.7222.

4.2 Combination and Probable Improvement

From (j) we may suspect that the SLIC method is somewhat sensitive to noise and erroneous pixels, since only in the final step can anything be done about isolated points. It is therefore natural to run the super pixel segmentation on a smoothed version of the original image that keeps its main information, and the two-stage method above is an efficient way to smooth an image. Thus we first smooth the original picture with the two-stage method and then apply the SLIC algorithm for the over-segmentation.

Moreover, we want to improve the distance in (43), (j)(44). If we already had a hand-made segmentation of a picture, it would be reasonable to put that information into the distance formula: the direct way is to assign a very small value to pairs inside the same hand-made segment and a very large value to other pairs. This inspires us to first obtain a rough segmentation and then use that additional information to build a better distance formula. The simplest idea would of course be to apply algorithm (al6∗) to the already smoothed image to get a rough segmentation, but that would just repeat the same information. We therefore want another segmentation, not exactly the same as the two-stage method above, to provide additional information. The easiest way is to introduce a small change in the two-stage algorithm: here we change E(g) = (λ/2) ∫Ω (f − Ag)^2 dx + (µ/2) ∫Ω |∇g|^2 dx + ∫Ω |∇g| dx in (14) into

E1(g) = (λ/2) ∫Ω (f − Ag)^2 dx + (µ/2) ∫Ω |∇g|^2 dx + ||∇g||_2, (47)
  • 21. and then do the similar thing to minimize E1(47), where the second of (25) becomes: (dk+1 x , dk+1 y ) = arg min dx,dy (dx, dy) 2 + σ 2 dx − xg − bk x 2 2 + σ 2 dy − yg − bk y 2 2. (48) To solve (48), assume that dx = [d1,1, d1,2, · · · d1,N ]t , dy = [d2,1, d2,2, · · · d2,N ]t , xg+bk x = [a1,1, a1,2, · · · a1,N ]t , yg + bk y = [a2,1, a2,2, · · · a2,N ]t . Then let [dt x, dt y] = dt , [( xg + bk x)t , ( yg + bk y)t ] = at , take derivative on (dx, dy) 2 + σ 2 dx − xg − bk x 2 2 + σ 2 dy − yg − bk y 2 2 with respect to all di,js and set them equal to 0, we get d √ dt · d + σ · d − a = 0, (49) thus d = c · a for some scalar c and we can solve that c = 1 σ (1 − 1 √ at · a ). (50) Let this algorithm be (al∗ 3). Then we find some useful properties of this change. Firstly this algorithm converges very fast, usually getting a relative error less than 0.0001 in around 5 iterations. Secondly it doesn’t give a very smooth outcome, which is a disadvantage comparing to the original one, but this has an effect of getting finder although more messy segmentation. Yet messiness is not a serious problem here because we just need a rough segmentation to serve as a parameter in our new amended SLIC algorithm. What is more, the similarity of the rough segmentation and original image can be helpful to eliminate some errors introduced by previous smoothing. For example, on the left is segmentation from original two-stage method λ = 10, µ = 1, σ = 2, K = 16, we take large λ to get more similar to original image; on the right is from changed two-stage method above with λ = 0.5, µ = 3, σ = 2, K = 16. We see that the new method should not be a good segmentation but it reveals more information especially near boundaries. After getting a rough segmentation, we put new information into distances among pixels by: w∗ i,j = wi,j + is(|K(i) − K(j)| > 3) · |K(i) − K(j)| (j∗ ), (51) 21
where the w_{i,j} are the same as in (43), (j)(44), and K(i) is the segmentation index of pixel i, noting that the gray value of i increases monotonically with K(i). This leads to our modification of the SLIC method.

Algorithm 4 revised SLIC
1: Smooth the original image to g with the two-stage method (λ = 10, µ = 1, σ = 2).
2: Get a rough segmentation of the original image with the changed two-stage method above (λ = 0.5, µ = 3, σ = 2, K = 16).
3: Change the distance formula in the SLIC algorithm from (j)(44) to (j∗)(51).
4: Apply the SLIC algorithm to g.

To assess the improvement brought by this modification, we use the code from 'VLFeat' (regularizer: 0.1) as the control group. Here are 3 examples (Uvl, Ug denote the under-segmentation errors of the VLFeat images and ours; Svl, Sg denote the boundary recalls of the VLFeat images and ours).

Figure 12: 256 × 256 picture 1. (a): result from VLFeat, 1024 super pixels, Uvl = 0.1778, Svl = 0.8491; (b): ours, 1024 super pixels, Ug = 0.1540, Sg = 0.9202.
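The distance change at the heart of the revised SLIC algorithm is small; a sketch of both formulas (pure Python, names ours; the gray value is scalar here, and the square root is taken for readability, which leaves the argmin of the cluster assignment unchanged):

```python
import math

def slic_dist(fi, fj, xi, xj, m, S):
    """SLIC distance of eqs. (43)-(44): sqrt(wc^2 + (m/S)^2 * wd^2), with wc
    the gray-value difference, wd the spatial distance, S the grid step and
    m the compactness weight (m = 20 in the report's experiments)."""
    wc2 = (fi - fj) ** 2
    wd2 = (xi[0] - xj[0]) ** 2 + (xi[1] - xj[1]) ** 2
    return math.sqrt(wc2 + (m / S) ** 2 * wd2)

def revised_dist(w, Ki, Kj):
    """Revised distance (j*)/(51): add the penalty |K(i) - K(j)| whenever the
    rough-segmentation indices of the two pixels differ by more than 3."""
    gap = abs(Ki - Kj)
    return w + (gap if gap > 3 else 0)
```

Pixels whose rough-segmentation labels are close keep the plain SLIC distance, while pixels from clearly different rough segments are pushed apart.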
  • 23. (a) (b) Figure 13: 256 × 256 picture 2 (a): result from VLFeat, 1024 super pixels, Uvl = 0.1598, Svl = 0.9180; (b): ours, 1024 super pixels, Ug = 0.1298, Sg = 0.9341. (a) (b) Figure 14: 310 × 310 picture 3 (a): result from VLFeat, 1250 super pixels, Uvl = 0.1830, Svl = 0.6649; (b): ours, 1024 super pixels, Ug = 0.1793, Sg = 0.7248. 23
From the above examples we find that our revised method results in smaller under-segmentation error and larger boundary recall. It is then reasonable to run more experiments; we took 10 further images:

image   1      2      3      4      5      6      7      8      9      10
Uvl   0.1109 0.0880 0.0994 0.1631 0.2928 0.2761 0.1423 0.0963 0.1540 0.3232
Ug    0.1023 0.0624 0.0901 0.1464 0.2559 0.2673 0.1272 0.0858 0.1477 0.2984
Svl   0.8368 0.9383 0.8583 0.9441 0.9111 0.8396 0.7122 0.7227 0.8014 0.8430
Sg    0.9066 0.9759 0.9215 0.9827 0.9199 0.8714 0.8356 0.8242 0.8726 0.9067

Again our revised method tends to give smaller under-segmentation error and larger boundary recall. To make this clearer, we look at the relative decrease of the under-segmentation error and the relative increase of the boundary recall with respect to the VLFeat results: histogram (a) shows the under-segmentation error and histogram (b) the boundary recall. The experimental results therefore suggest that our revision is at least somewhat useful.

5 Simple Application

Once we have the super pixel segmentation of an image we have more information at our disposal: before, we only knew things on or near each pixel, but now we have groups of pixels from which we can draw information from many pixels simultaneously.

5.1 Saliency Detection

Just as we defined a distance between pixels, we now have enough information to define a reasonable distance between super pixels (as in [12]). Let i, j be two distinct super pixels, Ri, Rj the ranges of grey or colour values of the two super pixels, Ci,j the range of values they have in common, Di,j the distance between the two cluster centres of i and j, and d the diagonal length of the image. Then define

sim(i, j) = (1/2)(|Ci,j|/|Ri| + |Ci,j|/|Rj|) · (1 − Di,j/d), (52)
where |C| denotes the length of an interval C. Then sim(i, j) is a reasonable similarity between i and j: the larger sim(i, j), the more similar i and j are and the more likely they belong to the same larger segment. On a map, a point with smaller average distance to all other points lies near the centre, so it is natural to detect the saliency of each super pixel by computing a kind of 'average' similarity to all the others. Let

GC(i) = Σ_{j=1}^K W(i, j) · |mc_i − mc_j|, with W(i, j) = |j| · (1 − D_{i,j}/d),

where mc_i is the mean grey or colour value of super pixel i, |j| the number of pixels in j, and K the number of super pixels. GC(i) serves as an indicator of the saliency of i, and we normalize it by

NGC(i) = (GC(i) − GC_min) / (GC_max − GC_min).

Further define

SS(i) = Σ_{j=1}^K sim(i, j) · Ds(j) / Σ_{j=1}^K sim(i, j), NSS(i) = (SS(i) − SS_min) / (SS_max − SS_min),

where Ds(j) is the distance between the centre of super pixel j and the centre of the image; NSS(i) is another indicator of the saliency of i. Taking advantage of both NGC and NSS, define

SAL(i) = [Σ_{j=1}^K sim(i, j) · NGC(j) / Σ_{j=1}^K sim(i, j)] · [Σ_{j=1}^K sim(i, j) · NSS(j) / Σ_{j=1}^K sim(i, j)], (53)

and we use SAL(i) as the final indicator for i. We can then segment the image according to the SAL values of the super pixels, putting super pixels with close SAL values into the same segment. The idea is similar to our two-stage method above: concentrate all the information into one measure and segment according to that measure. Below are two bi-segmentation results from this method and from the two-stage method.

Example: 321 × 321 plane and bird images.

Figure 15: (a): the original image; (b): result from the above method; (c): result from the two-stage method.
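The similarity (52) and the final indicator (53) are simple weighted averages; a sketch (pure Python, names ours; `sims`, `ngc`, `nss` are the precomputed per-superpixel lists for a fixed i):

```python
def sim(Ri, Rj, Cij, Dij, d):
    """Similarity of eq. (52): Ri, Rj are the lengths of the two superpixels'
    gray ranges, Cij the length of their overlap, Dij the distance between
    their centres, d the image diagonal."""
    return 0.5 * (Cij / Ri + Cij / Rj) * (1 - Dij / d)

def saliency(sims, ngc, nss):
    """SAL(i) of eq. (53) for one superpixel i: the product of the
    sim-weighted means of the NGC and NSS cues over all superpixels j."""
    w = sum(sims)
    return (sum(s * g for s, g in zip(sims, ngc)) / w) * \
           (sum(s * n for s, n in zip(sims, nss)) / w)
```

Two superpixels with identical gray ranges and coincident centres get sim = 1, and sim decays as their ranges diverge or their centres move apart.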
Figure 16: (a): the original image; (b): result from the above method; (c): result from the two-stage method.

From the above results we may note two properties of this method. Firstly, it reveals more information about edges, since the two-stage method tends to give smoother edges. Secondly, it always assigns high brightness to the main part of the image, whereas the two-stage method assigns brightness according to the original grey values; this method gives higher brightness to the more central parts, so on the bird image the two-stage method gives an almost opposite result.

5.2 Reunion of Super Pixels

Super pixel segmentation is a kind of over-segmentation that yields too many segments, so it is natural to merge them back together into larger segments. Reintroducing all the notation of part 3, we define another kind of distance among super pixels. Let f be the original image and let i, j be two adjacent super pixels. Define

wg_{i,j} = exp(−MEAN({|∂f/∂x|(pk) + |∂f/∂y|(pk) | pk ∈ N_{i,j}})), (54)

where the pk are pixels of the image and N_{i,j} is the common boundary of i and j. If wg_{i,j} is large, there is unlikely to be any edge between i and j, so i and j should be grouped into one cluster. Also, if both linkratio(i, j) and linkratio(j, i) are large, then i and j are similar in colour and position. Define

wl_{i,j} = (1/2)(linkratio(i, j) + linkratio(j, i)). (55)

Finally define

w∗_{i,j} = wg_{i,j} + c · wl_{i,j} (56)

as our new similarity measure between super pixels; we always take c = 0.1. To compute linkratio(i, j), let Xi, Xj be the indicator vectors of i, j and let W, D be as defined in part 3; then

linkratio(i, j) = (Xi^t W Xj) / (Xi^t D Xi).

We then use exactly the algorithm of part 3 to group our super pixels.
Since all the methods above, the two-stage method and the two super pixel methods, can do the same work, let us compare our method with the other three. (All results come from a 256 × 256 image with 1024 super pixels, and ours are always on the right side.)

• Segmentation into 32 pieces, compared with the two-stage method. The two-stage method is not efficient at segmenting into many pieces, and our method reveals a little more information about edges.

• Segmentation into 16 pieces, compared with the SLIC method. With fewer segments the SLIC method tends to give unreasonable segment boundaries and fails to detect many edges, while our method yields simpler segment shapes and fewer segments at the top, where there should be only one.

• Segmentation into 8 pieces, compared with normalized cuts.
We can see these two results are similar, but our method finds something around the right cross and window and is better at following the corners of edges. Our new method is thus useful over a large range of segment numbers, and I hope it will be helpful for some people.

6 Deficiencies

Due to the limits of my capacity, there remains much room for improvement in parts 4.2 and 5.2. Most significantly, texture information has been ignored above, although there are many efficient ways to extract such information, as well as edges, by convolution or diffusion [13]. For example, two simple ways to segment by texture are the Gabor filter and the structure tensor. For the Gabor filter, define the kernel K = (k_{i,j}) with

k_{i,j} = exp(−(i'^2 + γ^2 · j'^2) / (2σ^2)) · cos(2π i'/λ + ψ),

where i' = i·cos(θ) + j·sin(θ), j' = −i·sin(θ) + j·cos(θ). We often set σ = 10, ψ = 0.2, γ = 0.1, λ = 10 and vary θ to obtain different responses. In the following examples we take the 'lenna' picture and the Gabor kernel above with size 3.
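The kernel above can be generated directly. A sketch (pure Python, function name ours), taking the carrier as the cosine, i.e. the real part of the complex exponential that the text's formula presumably abbreviates:

```python
import math

def gabor_kernel(size, theta, sigma=10.0, psi=0.2, gamma=0.1, lam=10.0):
    """Gabor kernel with the report's parameters. theta is in degrees; size
    is the half-width, so the kernel is (2*size+1) x (2*size+1). The carrier
    is cos(2*pi*i'/lam + psi), the real part of the complex exponential."""
    t = math.radians(theta)
    k = []
    for i in range(-size, size + 1):
        row = []
        for j in range(-size, size + 1):
            ip = i * math.cos(t) + j * math.sin(t)    # rotated coordinate i'
            jp = -i * math.sin(t) + j * math.cos(t)   # rotated coordinate j'
            env = math.exp(-(ip * ip + gamma ** 2 * jp * jp) / (2 * sigma ** 2))
            row.append(env * math.cos(2 * math.pi * ip / lam + psi))
        k.append(row)
    return k
```

Convolving the image with kernels at several values of θ gives orientation-selective responses that can serve as texture features.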
Figure 17: 256 × 256 lenna. (a): θ = 30, (b): θ = 30, (c): θ = 90, (d): θ = 180.

For the structure tensor, let the original image be f, with f(i, j) the gray or colour value of pixel (i, j), and define

Sw(i, j) = [S0 ∗ w](i, j), where S0(i, j) = [ (∂f/∂x)^2, (∂f/∂x)(∂f/∂y) ; (∂f/∂x)(∂f/∂y), (∂f/∂y)^2 ](i, j),

and w(i, j) is usually a Gaussian kernel. This yields three different image results which can be combined to extract useful information. Most commonly we compute, for each pixel, the eigenvector of Sw(i, j) corresponding to the larger eigenvalue, which gives the direction of maximal gradient; this directional information can serve as an indicator of texture. For example, with a Gaussian kernel of size 3 and σ = 3, at the pixels A = (60, 60), B = (60, 180), C = (180, 180) the eigenvectors corresponding to the larger eigenvalues are respectively [−0.8256, −0.5642]^t, [−0.9922, 0.1245]^t, [−0.9808, −0.1950]^t, shown by the arrows drawn in the picture above. In a nutshell, the results would be better if more texture information had been taken into account.
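The per-pixel eigen-direction of the smoothed tensor Sw has a closed form, since Sw(i, j) is a symmetric 2 × 2 matrix. A sketch (pure Python, function name ours; a, b, c are the smoothed entries <fx^2>, <fx fy>, <fy^2>):

```python
import math

def top_eigvec(a, b, c):
    """Unit eigenvector of the symmetric 2x2 matrix [[a, b], [b, c]] for its
    larger eigenvalue, e.g. a smoothed structure tensor entry-triple."""
    # Larger root of the characteristic polynomial of [[a, b], [b, c]].
    lam = 0.5 * (a + c + math.sqrt((a - c) ** 2 + 4 * b * b))
    if abs(b) > 1e-12:
        vx, vy = b, lam - a          # from (a - lam)*vx + b*vy = 0
    else:
        vx, vy = (1.0, 0.0) if a >= c else (0.0, 1.0)
    n = math.hypot(vx, vy)
    return vx / n, vy / n
```

Applying this at every pixel of the smoothed tensor field gives the maximal-gradient direction used as a texture indicator above.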
References

[1] Xiaohao Cai, Raymond Chan, and Tieyong Zeng. A Two-Stage Image Segmentation Method Using a Convex Variant of the Mumford-Shah Model and Thresholding. SIAM J. Imaging Sciences (2013), Vol. 6, No. 1, pp. 368-390.
[2] Tom Goldstein and Stanley Osher. The Split Bregman Algorithm for L1 Regularized Problems. pp. 4-6.
[3] Jacqueline Bush and Carlos Garcia-Cervera. Bregman Algorithms. University of California, Santa Barbara (2011), pp. 20-23.
[4] Yilun Wang, Wotao Yin and Yin Zhang. A Fast Algorithm for Image Deblurring with Total Variation Regularization. CAAM (2007).
[5] Jose Bioucas Dias. Convolution Operators. IP, IST (2007), pp. 2-13.
[6] Xiaofeng Ren and Jitendra Malik. Learning a Classification Model for Segmentation. CA 94720.
[7] Jianbo Shi and Jitendra Malik. Normalized Cuts and Image Segmentation. IEEE Transactions on Pattern Analysis and Machine Intelligence (2000), Vol. 22, No. 8, pp. 888-905.
[8] Subhransu Maji, Nisheeth K. Vishnoi and Jitendra Malik. Biased Normalized Cuts. Computer Vision and Pattern Recognition (2011), pp. 2057-2064.
[9] J. Almeida. Lanczos Algorithm: Theory and Applications. York (United Kingdom) (2012).
[10] Stella X. Yu and Jianbo Shi. Multiclass Spectral Clustering. Computer Vision (2003).
[11] Radhakrishna Achanta, Appu Shaji, Kevin Smith, Aurelien Lucchi, Pascal Fua, and Sabine Susstrunk. SLIC Superpixels. Journal of LaTeX Class Files, Vol. 6, No. 1, December 2011.
[12] Zhi Liu, Olivier Le Meur and Shuhua Luo. Superpixel-Based Saliency Detection. Image Analysis for Multimedia Interactive Services (2013).
[13] Mikael Rousson, Thomas Brox and Rachid Deriche. Active Unsupervised Texture Segmentation on a Diffusion Based Feature Space. Institut National de Recherche en Informatique et en Automatique (2003).