Lecture 8: Spectral Clustering
- 1. Eric Xing © Eric Xing @ CMU, 2006-2010 1
Machine Learning
Spectral Clustering
Eric Xing
Lecture 8, August 13, 2010
Reading:
- 2. Eric Xing © Eric Xing @ CMU, 2006-2010 2
Data Clustering
- 4. Eric Xing © Eric Xing @ CMU, 2006-2010 4
Data Clustering
[Figure: two toy datasets, one suited to each criterion: compactness vs. connectivity]
Two different criteria:
Compactness, e.g., k-means, mixture models
Connectivity, e.g., spectral clustering
- 5. Eric Xing © Eric Xing @ CMU, 2006-2010 5
Spectral Clustering
Data Similarities
- 6. Eric Xing © Eric Xing @ CMU, 2006-2010 6
Some graph terminology
Objects (e.g., pixels, data points): i ∈ I = vertices of graph G
Edges (i,j) = pixel pairs with Wij > 0
Similarity matrix: W = [Wij]
Degree: di = Σj∈G Wij
Degree of a subset A ⊆ G: dA = Σi∈A di
Assoc(A,B) = Σi∈A Σj∈B Wij
[Figure: a weighted graph with edge weight Wij between vertices i and j, and two highlighted vertex subsets A and B]
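As a concrete illustration (not from the slides), a minimal NumPy sketch of these quantities on a made-up 4-vertex graph:

```python
import numpy as np

# Toy symmetric similarity matrix for 4 vertices (made-up values).
W = np.array([[0.0, 0.8, 0.1, 0.0],
              [0.8, 0.0, 0.0, 0.2],
              [0.1, 0.0, 0.0, 0.9],
              [0.0, 0.2, 0.9, 0.0]])

d = W.sum(axis=1)                 # d_i = sum_{j in G} W_ij (vertex degrees)
A, B = [0, 1], [2, 3]             # two vertex subsets A, B
d_A = d[A].sum()                  # d_A = sum_{i in A} d_i
assoc_AB = W[np.ix_(A, B)].sum()  # Assoc(A,B) = sum_{i in A, j in B} W_ij
```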
Weighted Graph Partitioning
- 7. Eric Xing © Eric Xing @ CMU, 2006-2010 7
Cuts in a Graph
(edge) cut = set of edges whose removal makes a graph disconnected
weight of a cut: cut(A,B) = Σi∈A Σj∈B Wij = Assoc(A,B)
Normalized Cut criterion: minimize
Ncut(A,Ā) = cut(A,Ā)/dA + cut(A,Ā)/dĀ
More generally, for a k-way partition A1, …, Ak of V:
Ncut(A1, A2, …, Ak) = Σr=1..k cut(Ar, Ār)/dAr = Σr=1..k (Σi∈Ar, j∈V−Ar Wij) / (Σi∈Ar, j∈V Wij)
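Continuing the toy example above, a sketch of cut and Ncut as code:

```python
import numpy as np

def cut(W, A, B):
    """cut(A,B) = total weight of edges crossing between A and B."""
    return W[np.ix_(A, B)].sum()

def ncut(W, A, B):
    """Ncut(A,B) = cut(A,B)/d_A + cut(A,B)/d_B."""
    d = W.sum(axis=1)
    c = cut(W, A, B)
    return c / d[A].sum() + c / d[B].sum()

print(ncut(W, [0, 1], [2, 3]))  # low Ncut: the toy W is nearly block-diagonal
```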
- 8. Eric Xing © Eric Xing @ CMU, 2006-2010 8
Graph-based Clustering
Data grouping, image segmentation
A weighted graph G = {V, E}, with weight Wij on the edge between nodes i and j:
Affinity matrix: W = [wi,j], with Wij = f(d(xi, xj))
Degree matrix: D = diag(di)
Laplacian matrix: L = D − W
(bipartite) partition vector: x = [x1, x2, …, xN] = [1, …, 1, −1, …, −1]
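A minimal sketch of the degree and Laplacian constructions, assuming an affinity matrix W as above:

```python
import numpy as np

def degree_matrix(W):
    """D = diag(d_i) with d_i = sum_j W_ij."""
    return np.diag(W.sum(axis=1))

def laplacian(W):
    """Unnormalized graph Laplacian L = D - W."""
    return degree_matrix(W) - W
```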
- 9. Eric Xing © Eric Xing @ CMU, 2006-2010 9
Affinity Function
Wi,j = exp(−‖Xi − Xj‖² / 2σ²)
Affinities grow as σ grows.
How does the choice of σ affect the results?
What would be the optimal choice for σ?
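A sketch of this affinity function (the helper name gaussian_affinity is mine); the closing comment restates the σ behaviour described above:

```python
import numpy as np
from scipy.spatial.distance import cdist

def gaussian_affinity(X, sigma):
    """W_ij = exp(-||X_i - X_j||^2 / (2 sigma^2)) for rows X_i of X."""
    sq_dists = cdist(X, X, metric="sqeuclidean")
    return np.exp(-sq_dists / (2.0 * sigma ** 2))

# As sigma grows, all entries approach 1 (everything looks similar);
# as sigma shrinks, W approaches the identity (everything looks dissimilar).
```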
- 10. Eric Xing © Eric Xing @ CMU, 2006-2010 10
Clustering via Optimizing
Normalized Cut
The normalized cut:
Ncut(A,B) = cut(A,B)/dA + cut(A,B)/dB
Computing an optimal normalized cut over all possible y (i.e., partitions) is NP-hard.
Transform the Ncut equation into matrix form (Shi & Malik, 2000):
Ncut(A,B) = cut(A,B)/deg(A) + cut(A,B)/deg(B)
 = (1+x)ᵀ(D−W)(1+x) / (k·1ᵀD1) + (1−x)ᵀ(D−W)(1−x) / ((1−k)·1ᵀD1)
 = …
where xi ∈ {1, −1} and k = (Σxi>0 Dii) / (Σi Dii). This yields
min x Ncut(x) = min y yᵀ(D−W)y / (yᵀDy)   (a Rayleigh quotient)
subject to: yi ∈ {1, −b}, yᵀD1 = 0
Still an NP-hard problem.
- 11. Eric Xing © Eric Xing @ CMU, 2006-2010 11
Relaxation
Instead, relax into the continuous domain by solving the generalized eigenvalue system:
(D − W) y = λ D y
which gives:
min y yᵀ(D − W) y   s.t.  yᵀD y = 1   (Rayleigh quotient theorem)
Note that (D − W) 1 = 0, so the first eigenvector is y0 = 1 with eigenvalue 0.
The second smallest eigenvector is the real-valued solution to this problem!
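As a sketch (not part of the slides), the relaxed problem maps directly onto SciPy's generalized symmetric eigensolver; the helper name relaxed_ncut_vector is mine:

```python
import numpy as np
from scipy.linalg import eigh

def relaxed_ncut_vector(W):
    """Solve (D - W) y = lambda D y and return the eigenvector belonging
    to the second smallest eigenvalue (the relaxed Ncut solution).
    Assumes all degrees are positive so D is positive definite."""
    D = np.diag(W.sum(axis=1))
    eigvals, eigvecs = eigh(D - W, D)  # generalized problem A y = lambda B y
    return eigvecs[:, 1]               # column 0 is y0 = 1 with eigenvalue 0
```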
- 12. Eric Xing © Eric Xing @ CMU, 2006-2010 12
Algorithm
1. Define a similarity function between two nodes, e.g.:
wi,j = exp(−‖Xi − Xj‖² / 2σX²)
2. Compute the affinity matrix (W) and the degree matrix (D).
3. Solve (D − W) y = λ D y,
i.e., do a singular value decomposition (SVD) of the graph Laplacian L = D − W = VΛVᵀ.
4. Use the eigenvector y* with the second smallest eigenvalue to bipartition the graph.
For each threshold k: Ak = {i | yi among the k largest elements of y*},
Bk = {i | yi among the n−k smallest elements of y*}.
Compute Ncut(Ak, Bk).
Output k* = arg mink Ncut(Ak, Bk), and Ak* and Bk*.
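A hedged sketch of steps 3 and 4, reusing the hypothetical ncut and relaxed_ncut_vector helpers from the earlier snippets:

```python
import numpy as np

def spectral_bipartition(W):
    """Sweep thresholds over y* and keep the split with the smallest Ncut."""
    y = relaxed_ncut_vector(W)
    order = np.argsort(-y)            # vertex indices by decreasing y_i
    best_score, best_split = np.inf, None
    for k in range(1, len(y)):
        A = list(order[:k])           # k largest elements of y*
        B = list(order[k:])           # n-k smallest elements of y*
        score = ncut(W, A, B)
        if score < best_score:
            best_score, best_split = score, (A, B)
    return best_split
```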
- 13. Eric Xing © Eric Xing @ CMU, 2006-2010 13
Ideally …
Ncut(A,B) = yᵀ(D − S) y / (yᵀD y),  with yi ∈ {1, −b} and yᵀD 1 = 0
(S denotes the similarity matrix, W above), relaxed to the generalized eigenvalue system:
(D − S) y = λ D y
[Figure: ideally, the entries yi of the second eigenvector take one discrete value on A and another on Ā, directly indicating the partition]
- 14. Eric Xing © Eric Xing @ CMU, 2006-2010 14
Example (Xing et al, 2001)
- 15. Eric Xing © Eric Xing @ CMU, 2006-2010 15
Poor features can lead to a poor outcome (Xing et al., 2001)
- 16. Eric Xing © Eric Xing @ CMU, 2006-2010 16
Cluster vs. Block matrix
Ncut(A,B) = cut(A,B)/dA + cut(A,B)/dB,  where Degree(A) = dA = Σi∈A, j∈V Wi,j
[Figure: the similarity matrix W with rows and columns ordered by cluster membership, so clusters A and B appear as diagonal blocks]
- 17. Eric Xing © Eric Xing @ CMU, 2006-2010 17
Compare to minimum cut. Criterion for partition:
min A,B cut(A,B) = min A,B Σi∈A, j∈B Wi,j
First proposed by Wu and Leahy.
Problem! The weight of a cut is directly proportional to the number of edges in the cut.
[Figure: the ideal cut vs. cuts with lesser weight than the ideal cut, which minimum cut prefers]
- 18. Eric Xing © Eric Xing @ CMU, 2006-2010 18
Superior Performance?
K-means and Gaussian mixture methods are biased toward
convex clusters
- 19. Eric Xing © Eric Xing @ CMU, 2006-2010 19
Ncut is superior in certain cases
- 21. Eric Xing © Eric Xing @ CMU, 2006-2010 21
General Spectral Clustering
Data Similarities
- 22. Eric Xing © Eric Xing @ CMU, 2006-2010 22
Representation
Partition matrix X (pixels × segments): X = [X1, …, XK]
Pair-wise similarity matrix W: W(i,j) = aff(i,j)
Degree matrix D: D(i,i) = Σj wi,j
Laplacian matrix L: L = D − W
- 23. Eric Xing © Eric Xing @ CMU, 2006-2010 23
A Spectral Clustering Algorithm
(Ng, Jordan, and Weiss, 2003)
Given a set of points S = {s1, …, sn}:
Form the affinity matrix: wi,j = exp(−‖Si − Sj‖² / 2σ²) for i ≠ j, and wi,i = 0.
Define the diagonal matrix Dii = Σk ai,k.
Form the matrix L = D^{−1/2} W D^{−1/2}.
Stack the k largest eigenvectors of L to form the columns of the new matrix X = [x1 x2 … xk].
Renormalize each of X's rows to have unit length, giving the new matrix Y. Cluster the rows of Y as points in R^k.
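A minimal sketch of this procedure, assuming NumPy/SciPy/scikit-learn; the slides leave the final "cluster rows of Y" step generic, so using KMeans there is my choice:

```python
import numpy as np
from scipy.spatial.distance import cdist
from sklearn.cluster import KMeans

def njw_spectral_clustering(S, k, sigma):
    """Sketch of the Ng-Jordan-Weiss procedure described above."""
    W = np.exp(-cdist(S, S, "sqeuclidean") / (2.0 * sigma ** 2))
    np.fill_diagonal(W, 0.0)                        # w_ii = 0
    D_inv_sqrt = np.diag(1.0 / np.sqrt(W.sum(axis=1)))
    L = D_inv_sqrt @ W @ D_inv_sqrt                 # L = D^(-1/2) W D^(-1/2)
    _, eigvecs = np.linalg.eigh(L)                  # ascending eigenvalues
    X = eigvecs[:, -k:]                             # k largest eigenvectors
    Y = X / np.linalg.norm(X, axis=1, keepdims=True)  # unit-length rows
    return KMeans(n_clusters=k, n_init=10).fit_predict(Y)
```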
- 24. Eric Xing © Eric Xing @ CMU, 2006-2010 24
SC vs Kmeans
- 25. Eric Xing © Eric Xing @ CMU, 2006-2010 25
Why does it work?
K-means in the spectral space!
- 26. Eric Xing © Eric Xing @ CMU, 2006-2010 26
Eigenvectors and blocks
Block matrices have block eigenvectors. Running an eigensolver on

W = | 1 1 0 0 |
    | 1 1 0 0 |
    | 0 0 1 1 |
    | 0 0 1 1 |

gives λ1 = 2, λ2 = 2, λ3 = 0, λ4 = 0, with top eigenvectors v1 = [.71, .71, 0, 0]ᵀ and v2 = [0, 0, .71, .71]ᵀ.

Near-block matrices have near-block eigenvectors. For

W = | 1   1   .2  0  |
    | 1   1   0  -.2 |
    | .2  0   1   1  |
    | 0  -.2  1   1  |

the eigensolver gives λ1 = 2.02, λ2 = 2.02, λ3 = -0.02, λ4 = -0.02, with top eigenvectors v1 = [.71, .69, .14, 0]ᵀ and v2 = [0, -.14, .69, .71]ᵀ.
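A quick NumPy check of the block case (eigenvectors are determined only up to sign, and up to rotation within a repeated eigenvalue, so a solver's basis may differ from the one shown):

```python
import numpy as np

W = np.array([[1., 1., 0., 0.],
              [1., 1., 0., 0.],
              [0., 0., 1., 1.],
              [0., 0., 1., 1.]])
eigvals, eigvecs = np.linalg.eigh(W)
print(eigvals)          # [0, 0, 2, 2] in ascending order
print(eigvecs[:, -2:])  # block indicators, e.g. [.71,.71,0,0] and [0,0,.71,.71]
```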
- 27. Eric Xing © Eric Xing @ CMU, 2006-2010 27
Spectral Space
Can put items into blocks by eigenvectors:

W = | 1   1   .2  0  |      e1 = [.71, .69, .14, 0]ᵀ
    | 1   1   0  -.2 |      e2 = [0, -.14, .69, .71]ᵀ
    | .2  0   1   1  |
    | 0  -.2  1   1  |

Clusters are clear regardless of row ordering; permuting the rows and columns just permutes the eigenvector entries in the same way:

W = | 1   .2  1   0  |      e1 = [.71, .14, .69, 0]ᵀ
    | .2  1   0   1  |      e2 = [0, .69, -.14, .71]ᵀ
    | 1   0   1  -.2 |
    | 0   1  -.2  1  |
- 28. Eric Xing © Eric Xing @ CMU, 2006-2010 28
More formally …
Recall the generalized Ncut:
Ncut(A1, A2, …, Ak) = Σr=1..k cut(Ar, Ār)/dAr = Σr=1..k (Σi∈Ar, j∈V−Ar Wij) / (Σi∈Ar, j∈V Wij)
Minimizing it,
min A1,…,Ak Ncut(A1, A2, …, Ak) = min Σr=1..k cut(Ar, Ār)/dAr,
is equivalent to spectral clustering: maximize
tr(Yᵀ D^{−1/2} W D^{−1/2} Y)   s.t.  YᵀY = I
over the (pixels × segments) partition matrix Y.
- 29. Eric Xing © Eric Xing @ CMU, 2006-2010 29
Spectral Clustering
Algorithms that cluster points using eigenvectors of matrices
derived from the data
Obtain data representation in the low-dimensional space that
can be easily clustered
Variety of methods that use the eigenvectors differently (we
have seen an example)
Empirically very successful
Authors disagree on:
Which eigenvectors to use
How to derive clusters from these eigenvectors
Two general methods
- 30. Eric Xing © Eric Xing @ CMU, 2006-2010 30
Method #1
Partition using only one eigenvector at a time
Use procedure recursively
Example: Image Segmentation
Uses the 2nd smallest eigenvector to define the optimal cut
Recursively generates two clusters with each cut
- 31. Eric Xing © Eric Xing @ CMU, 2006-2010 31
Method #2
Use k eigenvectors (k chosen by user)
Directly compute k-way partitioning
Experimentally has been seen to be “better”
- 32. Eric Xing © Eric Xing @ CMU, 2006-2010 32
Toy examples
Images from Matthew Brand (TR-2002-42)
- 33. Eric Xing © Eric Xing @ CMU, 2006-2010 33
User’s Prerogative
Choice of k, the number of clusters
Choice of the scaling factor σ²
Realistically, search over σ and pick the value that gives the tightest clusters (see the sketch after this list)
Choice of clustering method: k-way or recursive bipartitioning
Choice of kernel for the affinity matrix: wi,j = K(Si, Sj)
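One way that search could look, as a sketch reusing the hypothetical njw_spectral_clustering helper above; the "tightness" score (within-cluster squared distance in input space) is my assumption, since the slides don't pin it down:

```python
import numpy as np

def pick_sigma(S, k, sigmas):
    """Try each candidate sigma; keep the one whose clustering is tightest
    (smallest total within-cluster squared distance to the centroids)."""
    best_sigma, best_dist = None, np.inf
    for sigma in sigmas:
        labels = njw_spectral_clustering(S, k, sigma)
        dist = sum(((S[labels == c] - S[labels == c].mean(axis=0)) ** 2).sum()
                   for c in range(k))
        if dist < best_dist:
            best_sigma, best_dist = sigma, dist
    return best_sigma
```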
- 34. Eric Xing © Eric Xing @ CMU, 2006-2010 34
Conclusions
Good news:
Simple and powerful methods to segment images.
Flexible and easy to apply to other clustering problems.
Bad news:
High memory requirements (use sparse matrices).
Very dependent on the scale factor σ for a specific problem:
W(i,j) = exp(−‖Xi − Xj‖² / 2σX²)