Lecture 8: Spectral Clustering
- 1. Eric Xing © Eric Xing @ CMU, 2006-2010 1
Machine Learning
Spectral Clustering
Eric Xing
Lecture 8, August 13, 2010
Reading:
- 2. Eric Xing © Eric Xing @ CMU, 2006-2010 2
Data Clustering
- 4. Eric Xing © Eric Xing @ CMU, 2006-2010 4
Data Clustering
[Figure: two toy datasets, one suited to each criterion: compactness vs. connectivity]
Two different criteria:
Compactness, e.g., k-means, mixture models
Connectivity, e.g., spectral clustering
- 5. Eric Xing © Eric Xing @ CMU, 2006-2010 5
Spectral Clustering
Data Similarities
- 6. Eric Xing © Eric Xing @ CMU, 2006-2010 6
Some graph terminology
Objects (e.g., pixels, data points): i ∈ I = vertices of graph G
Edges (i,j) = pixel pairs with Wij > 0
Similarity matrix: W = [Wij]
Degree: di = Σj∈G Wij
Degree of a subset A ⊆ G: dA = Σi∈A di
Assoc(A,B) = Σi∈A Σj∈B Wij
[Figure: a weighted graph with edge weight Wij between vertices i and j, and two highlighted vertex subsets A and B]
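As a concrete illustration (not from the slides), a minimal NumPy sketch of these quantities on a made-up 4-vertex graph:

```python
import numpy as np

# Toy symmetric similarity matrix for 4 vertices (made-up values).
W = np.array([[0.0, 0.8, 0.1, 0.0],
              [0.8, 0.0, 0.0, 0.2],
              [0.1, 0.0, 0.0, 0.9],
              [0.0, 0.2, 0.9, 0.0]])

d = W.sum(axis=1)                 # d_i = sum_{j in G} W_ij (vertex degrees)
A, B = [0, 1], [2, 3]             # two vertex subsets A, B
d_A = d[A].sum()                  # d_A = sum_{i in A} d_i
assoc_AB = W[np.ix_(A, B)].sum()  # Assoc(A,B) = sum_{i in A, j in B} W_ij
```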
Weighted Graph Partitioning
- 7. Eric Xing © Eric Xing @ CMU, 2006-2010 7
Cuts in a Graph
(edge) cut = set of edges whose removal makes a graph disconnected
weight of a cut: cut(A,B) = Σi∈A Σj∈B Wij = Assoc(A,B)
Normalized Cut criterion: minimize
Ncut(A,Ā) = cut(A,Ā)/dA + cut(A,Ā)/dĀ
More generally, for a k-way partition A1, …, Ak of V:
Ncut(A1, A2, …, Ak) = Σr=1..k cut(Ar, Ār)/dAr = Σr=1..k (Σi∈Ar, j∈V−Ar Wij) / (Σi∈Ar, j∈V Wij)
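Continuing the toy example above, a sketch of cut and Ncut as code:

```python
import numpy as np

def cut(W, A, B):
    """cut(A,B) = total weight of edges crossing between A and B."""
    return W[np.ix_(A, B)].sum()

def ncut(W, A, B):
    """Ncut(A,B) = cut(A,B)/d_A + cut(A,B)/d_B."""
    d = W.sum(axis=1)
    c = cut(W, A, B)
    return c / d[A].sum() + c / d[B].sum()

print(ncut(W, [0, 1], [2, 3]))  # low Ncut: the toy W is nearly block-diagonal
```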
- 8. Eric Xing © Eric Xing @ CMU, 2006-2010 8
Graph-based Clustering
Data grouping, image segmentation
A weighted graph G = {V, E}, with weight Wij on the edge between nodes i and j:
Affinity matrix: W = [wi,j], with Wij = f(d(xi, xj))
Degree matrix: D = diag(di)
Laplacian matrix: L = D − W
(bipartite) partition vector: x = [x1, x2, …, xN] = [1, …, 1, −1, …, −1]
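A minimal sketch of the degree and Laplacian constructions, assuming an affinity matrix W as above:

```python
import numpy as np

def degree_matrix(W):
    """D = diag(d_i) with d_i = sum_j W_ij."""
    return np.diag(W.sum(axis=1))

def laplacian(W):
    """Unnormalized graph Laplacian L = D - W."""
    return degree_matrix(W) - W
```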
- 9. Eric Xing © Eric Xing @ CMU, 2006-2010 9
Affinity Function
Wi,j = exp(−‖Xi − Xj‖² / 2σ²)
Affinities grow as σ grows.
How does the choice of σ affect the results?
What would be the optimal choice for σ?
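A sketch of this affinity function (the helper name gaussian_affinity is mine); the closing comment restates the σ behaviour described above:

```python
import numpy as np
from scipy.spatial.distance import cdist

def gaussian_affinity(X, sigma):
    """W_ij = exp(-||X_i - X_j||^2 / (2 sigma^2)) for rows X_i of X."""
    sq_dists = cdist(X, X, metric="sqeuclidean")
    return np.exp(-sq_dists / (2.0 * sigma ** 2))

# As sigma grows, all entries approach 1 (everything looks similar);
# as sigma shrinks, W approaches the identity (everything looks dissimilar).
```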
- 10. Eric Xing © Eric Xing @ CMU, 2006-2010 10
Clustering via Optimizing
Normalized Cut
The normalized cut:
Ncut(A,B) = cut(A,B)/dA + cut(A,B)/dB
Computing an optimal normalized cut over all possible y (i.e., partitions) is NP-hard.
Transform the Ncut equation into matrix form (Shi & Malik, 2000):
Ncut(A,B) = cut(A,B)/deg(A) + cut(A,B)/deg(B)
 = (1+x)ᵀ(D−W)(1+x) / (k·1ᵀD1) + (1−x)ᵀ(D−W)(1−x) / ((1−k)·1ᵀD1)
 = …
where xi ∈ {1, −1} and k = (Σxi>0 Dii) / (Σi Dii). This yields
min x Ncut(x) = min y yᵀ(D−W)y / (yᵀDy)   (a Rayleigh quotient)
subject to: yi ∈ {1, −b}, yᵀD1 = 0
Still an NP-hard problem.
- 11. Eric Xing © Eric Xing @ CMU, 2006-2010 11
Relaxation
Instead, relax into the continuous domain by solving the generalized eigenvalue system:
(D − W) y = λ D y
which gives:
min y yᵀ(D − W) y   s.t.  yᵀD y = 1   (Rayleigh quotient theorem)
Note that (D − W) 1 = 0, so the first eigenvector is y0 = 1 with eigenvalue 0.
The second smallest eigenvector is the real-valued solution to this problem!
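As a sketch (not part of the slides), the relaxed problem maps directly onto SciPy's generalized symmetric eigensolver; the helper name relaxed_ncut_vector is mine:

```python
import numpy as np
from scipy.linalg import eigh

def relaxed_ncut_vector(W):
    """Solve (D - W) y = lambda D y and return the eigenvector belonging
    to the second smallest eigenvalue (the relaxed Ncut solution).
    Assumes all degrees are positive so D is positive definite."""
    D = np.diag(W.sum(axis=1))
    eigvals, eigvecs = eigh(D - W, D)  # generalized problem A y = lambda B y
    return eigvecs[:, 1]               # column 0 is y0 = 1 with eigenvalue 0
```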
- 12. Eric Xing © Eric Xing @ CMU, 2006-2010 12
Algorithm
1. Define a similarity function between two nodes, e.g.:
wi,j = exp(−‖Xi − Xj‖² / 2σX²)
2. Compute the affinity matrix (W) and the degree matrix (D).
3. Solve (D − W) y = λ D y,
i.e., do a singular value decomposition (SVD) of the graph Laplacian L = D − W = VΛVᵀ.
4. Use the eigenvector y* with the second smallest eigenvalue to bipartition the graph.
For each threshold k: Ak = {i | yi among the k largest elements of y*},
Bk = {i | yi among the n−k smallest elements of y*}.
Compute Ncut(Ak, Bk).
Output k* = arg mink Ncut(Ak, Bk), and Ak* and Bk*.
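A hedged sketch of steps 3 and 4, reusing the hypothetical ncut and relaxed_ncut_vector helpers from the earlier snippets:

```python
import numpy as np

def spectral_bipartition(W):
    """Sweep thresholds over y* and keep the split with the smallest Ncut."""
    y = relaxed_ncut_vector(W)
    order = np.argsort(-y)            # vertex indices by decreasing y_i
    best_score, best_split = np.inf, None
    for k in range(1, len(y)):
        A = list(order[:k])           # k largest elements of y*
        B = list(order[k:])           # n-k smallest elements of y*
        score = ncut(W, A, B)
        if score < best_score:
            best_score, best_split = score, (A, B)
    return best_split
```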
- 13. Eric Xing © Eric Xing @ CMU, 2006-2010 13
Ideally …
Ncut(A,B) = yᵀ(D − S) y / (yᵀD y),  with yi ∈ {1, −b} and yᵀD 1 = 0
(S denotes the similarity matrix, W above), relaxed to the generalized eigenvalue system:
(D − S) y = λ D y
[Figure: ideally, the entries yi of the second eigenvector take one discrete value on A and another on Ā, directly indicating the partition]
- 14. Eric Xing © Eric Xing @ CMU, 2006-2010 14
Example (Xing et al, 2001)
- 15. Eric Xing © Eric Xing @ CMU, 2006-2010 15
Poor features can lead to a poor outcome (Xing et al., 2001)
- 16. Eric Xing © Eric Xing @ CMU, 2006-2010 16
Cluster vs. Block matrix
Ncut(A,B) = cut(A,B)/dA + cut(A,B)/dB,  where Degree(A) = dA = Σi∈A, j∈V Wi,j
[Figure: the similarity matrix W with rows and columns ordered by cluster membership, so clusters A and B appear as diagonal blocks]
- 17. Eric Xing © Eric Xing @ CMU, 2006-2010 17
Compare to minimum cut. Criterion for partition:
min A,B cut(A,B) = min A,B Σi∈A, j∈B Wi,j
First proposed by Wu and Leahy.
Problem! The weight of a cut is directly proportional to the number of edges in the cut.
[Figure: the ideal cut vs. cuts with lesser weight than the ideal cut, which minimum cut prefers]
- 18. Eric Xing © Eric Xing @ CMU, 2006-2010 18
Superior Performance?
K-means and Gaussian mixture methods are biased toward
convex clusters
- 19. Eric Xing © Eric Xing @ CMU, 2006-2010 19
Ncut is superior in certain cases
- 21. Eric Xing © Eric Xing @ CMU, 2006-2010 21
General Spectral Clustering
Data Similarities
- 22. Eric Xing © Eric Xing @ CMU, 2006-2010 22
Representation
Partition matrix X (pixels × segments): X = [X1, …, XK]
Pair-wise similarity matrix W: W(i,j) = aff(i,j)
Degree matrix D: D(i,i) = Σj wi,j
Laplacian matrix L: L = D − W
- 23. Eric Xing © Eric Xing @ CMU, 2006-2010 23
A Spectral Clustering Algorithm
(Ng, Jordan, and Weiss, 2003)
Given a set of points S = {s1, …, sn}:
Form the affinity matrix: wi,j = exp(−‖Si − Sj‖² / 2σ²) for i ≠ j, and wi,i = 0.
Define the diagonal matrix Dii = Σk ai,k.
Form the matrix L = D^{−1/2} W D^{−1/2}.
Stack the k largest eigenvectors of L to form the columns of the new matrix X = [x1 x2 … xk].
Renormalize each of X's rows to have unit length, giving the new matrix Y. Cluster the rows of Y as points in R^k.
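A minimal sketch of this procedure, assuming NumPy/SciPy/scikit-learn; the slides leave the final "cluster rows of Y" step generic, so using KMeans there is my choice:

```python
import numpy as np
from scipy.spatial.distance import cdist
from sklearn.cluster import KMeans

def njw_spectral_clustering(S, k, sigma):
    """Sketch of the Ng-Jordan-Weiss procedure described above."""
    W = np.exp(-cdist(S, S, "sqeuclidean") / (2.0 * sigma ** 2))
    np.fill_diagonal(W, 0.0)                        # w_ii = 0
    D_inv_sqrt = np.diag(1.0 / np.sqrt(W.sum(axis=1)))
    L = D_inv_sqrt @ W @ D_inv_sqrt                 # L = D^(-1/2) W D^(-1/2)
    _, eigvecs = np.linalg.eigh(L)                  # ascending eigenvalues
    X = eigvecs[:, -k:]                             # k largest eigenvectors
    Y = X / np.linalg.norm(X, axis=1, keepdims=True)  # unit-length rows
    return KMeans(n_clusters=k, n_init=10).fit_predict(Y)
```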
- 24. Eric Xing © Eric Xing @ CMU, 2006-2010 24
SC vs Kmeans
- 25. Eric Xing © Eric Xing @ CMU, 2006-2010 25
Why does it work?
K-means in the spectral space!
- 26. Eric Xing © Eric Xing @ CMU, 2006-2010 26
Eigenvectors and blocks
Block matrices have block eigenvectors. Running an eigensolver on

W = | 1 1 0 0 |
    | 1 1 0 0 |
    | 0 0 1 1 |
    | 0 0 1 1 |

gives λ1 = 2, λ2 = 2, λ3 = 0, λ4 = 0, with top eigenvectors v1 = [.71, .71, 0, 0]ᵀ and v2 = [0, 0, .71, .71]ᵀ.

Near-block matrices have near-block eigenvectors. For

W = | 1   1   .2  0  |
    | 1   1   0  -.2 |
    | .2  0   1   1  |
    | 0  -.2  1   1  |

the eigensolver gives λ1 = 2.02, λ2 = 2.02, λ3 = -0.02, λ4 = -0.02, with top eigenvectors v1 = [.71, .69, .14, 0]ᵀ and v2 = [0, -.14, .69, .71]ᵀ.
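A quick NumPy check of the block case (eigenvectors are determined only up to sign, and up to rotation within a repeated eigenvalue, so a solver's basis may differ from the one shown):

```python
import numpy as np

W = np.array([[1., 1., 0., 0.],
              [1., 1., 0., 0.],
              [0., 0., 1., 1.],
              [0., 0., 1., 1.]])
eigvals, eigvecs = np.linalg.eigh(W)
print(eigvals)          # [0, 0, 2, 2] in ascending order
print(eigvecs[:, -2:])  # block indicators, e.g. [.71,.71,0,0] and [0,0,.71,.71]
```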
- 27. Eric Xing © Eric Xing @ CMU, 2006-2010 27
Spectral Space
Can put items into blocks by eigenvectors:

W = | 1   1   .2  0  |      e1 = [.71, .69, .14, 0]ᵀ
    | 1   1   0  -.2 |      e2 = [0, -.14, .69, .71]ᵀ
    | .2  0   1   1  |
    | 0  -.2  1   1  |

Clusters are clear regardless of row ordering; permuting the rows and columns just permutes the eigenvector entries in the same way:

W = | 1   .2  1   0  |      e1 = [.71, .14, .69, 0]ᵀ
    | .2  1   0   1  |      e2 = [0, .69, -.14, .71]ᵀ
    | 1   0   1  -.2 |
    | 0   1  -.2  1  |
- 28. Eric Xing © Eric Xing @ CMU, 2006-2010 28
More formally …
Recall the generalized Ncut:
Ncut(A1, A2, …, Ak) = Σr=1..k cut(Ar, Ār)/dAr = Σr=1..k (Σi∈Ar, j∈V−Ar Wij) / (Σi∈Ar, j∈V Wij)
Minimizing it,
min A1,…,Ak Ncut(A1, A2, …, Ak) = min Σr=1..k cut(Ar, Ār)/dAr,
is equivalent to spectral clustering: maximize
tr(Yᵀ D^{−1/2} W D^{−1/2} Y)   s.t.  YᵀY = I
over the (pixels × segments) partition matrix Y.
- 29. Eric Xing © Eric Xing @ CMU, 2006-2010 29
Spectral Clustering
Algorithms that cluster points using eigenvectors of matrices
derived from the data
Obtain data representation in the low-dimensional space that
can be easily clustered
Variety of methods that use the eigenvectors differently (we
have seen an example)
Empirically very successful
Authors disagree on:
Which eigenvectors to use
How to derive clusters from these eigenvectors
Two general methods
- 30. Eric Xing © Eric Xing @ CMU, 2006-2010 30
Method #1
Partition using only one eigenvector at a time
Use procedure recursively
Example: Image Segmentation
Uses the 2nd smallest eigenvector to define the optimal cut
Recursively generates two clusters with each cut
- 31. Eric Xing © Eric Xing @ CMU, 2006-2010 31
Method #2
Use k eigenvectors (k chosen by user)
Directly compute k-way partitioning
Experimentally has been seen to be “better”
- 32. Eric Xing © Eric Xing @ CMU, 2006-2010 32
Toy examples
Images from Matthew Brand (TR-2002-42)
- 33. Eric Xing © Eric Xing @ CMU, 2006-2010 33
User’s Prerogative
Choice of k, the number of clusters
Choice of the scaling factor σ²
Realistically, search over σ and pick the value that gives the tightest clusters (see the sketch after this list)
Choice of clustering method: k-way or recursive bipartitioning
Choice of kernel for the affinity matrix: wi,j = K(Si, Sj)
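One way that search could look, as a sketch reusing the hypothetical njw_spectral_clustering helper above; the "tightness" score (within-cluster squared distance in input space) is my assumption, since the slides don't pin it down:

```python
import numpy as np

def pick_sigma(S, k, sigmas):
    """Try each candidate sigma; keep the one whose clustering is tightest
    (smallest total within-cluster squared distance to the centroids)."""
    best_sigma, best_dist = None, np.inf
    for sigma in sigmas:
        labels = njw_spectral_clustering(S, k, sigma)
        dist = sum(((S[labels == c] - S[labels == c].mean(axis=0)) ** 2).sum()
                   for c in range(k))
        if dist < best_dist:
            best_sigma, best_dist = sigma, dist
    return best_sigma
```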
- 34. Eric Xing © Eric Xing @ CMU, 2006-2010 34
Conclusions
Good news:
Simple and powerful methods to segment images.
Flexible and easy to apply to other clustering problems.
Bad news:
High memory requirements (use sparse matrices).
Very dependent on the scale factor σ for a specific problem:
W(i,j) = exp(−‖Xi − Xj‖² / 2σX²)