08. Spectral Clustering

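(Goal)
Suppose the data set consists of $n$ "objects".
The objects can be images, words, or anything else.
We want to partition the data set into $k$ sets (clusters).
(Criterion for the sets)
A similarity is defined between the $n$ objects.
Objects that belong to the same set should have high similarity,
and objects that belong to different sets should have low similarity.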

Let us use the k-means algorithm to split the data set below
into 2 clusters.

These are the clusters found by k-means.
"On Spectral Clustering: Analysis and an algorithm" by AY Ng
When the clusters are not convex sets, as here, k-means cannot recover them.

So how can we find the clusters when they are not convex sets?
How about the following approach?
Transform the original data points into new points (embedding, dimensionality reduction),
and then run a clustering algorithm on the new point set.
"On Spectral Clustering: Analysis and an algorithm" by AY Ng
Original data space → new data space (a space of reduced dimension)

(How?)
If similarities between the $n$ objects are given,
build a similarity matrix from them.
If no similarities are given, map the objects to points
and compute the similarities between the points.
Use the similarity matrix
to build a similarity graph.
Here, building a similarity graph means building an adjacency matrix
and a Laplacian matrix.
The Laplacian matrix is then used to obtain the new point set.
We use a graph to find the clusters of the objects!
With a graph, the computation becomes easy.
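The pipeline above (similarity matrix, then adjacency matrix, then Laplacian) can be sketched as follows. This is a minimal sketch, not the slides' own code: it assumes a Gaussian similarity with a hypothetical bandwidth `sigma` and a fully connected graph, neither of which is fixed by the slides, and the helper names are illustrative.

```python
import numpy as np

def gaussian_similarity(points, sigma=1.0):
    """Assumed similarity: w_ij = exp(-||p_i - p_j||^2 / (2 sigma^2))."""
    diff = points[:, None, :] - points[None, :, :]
    sq_dist = np.sum(diff ** 2, axis=-1)
    W = np.exp(-sq_dist / (2.0 * sigma ** 2))
    np.fill_diagonal(W, 0.0)  # no self-loops in the similarity graph
    return W

def unnormalized_laplacian(W):
    """L = D - W, where D is the diagonal matrix of row sums (degrees)."""
    D = np.diag(W.sum(axis=1))
    return D - W

# Four points: two near the origin, two near (5, 5).
points = np.array([[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [5.1, 5.0]])
W = gaussian_similarity(points, sigma=1.0)
L = unnormalized_laplacian(W)
```

Nearby points receive a weight close to 1 and distant points a weight close to 0, and every row of `L` sums to zero.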

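(Why is the computation easy? Basic concepts)
① $n$ data objects (points) $o_1, \dots, o_n$
② The similarity between every pair of objects: $s(o_i, o_j) = s_{ij}$
Let $x_1$ be the new data point corresponding to $o_1$, and so on,
up to $x_n$, the new data point corresponding to $o_n$:
$(o_1, \dots, o_n)^T \mapsto \mathbf{x} = (x_1, \dots, x_n)^T$
In the figure with the four objects $o_1, o_2, o_3, o_4$:
$x_1$ and $x_2$ have almost the same value, $x_1$ and $x_3$ differ somewhat,
and $x_1$ and $x_4$ differ a lot.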

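The clusters we are looking for can be expressed as follows.
$s_{ij} \uparrow \;\Rightarrow\; (x_i - x_j)^2 \downarrow$
$s_{ij} \downarrow \;\Rightarrow\; (x_i - x_j)^2 \uparrow$
This is equivalent to the following problem, written first with the similarities $s_{ij}$ and then with the edge weights $w_{ij}$ of the similarity graph.
$\min_{\mathbf{x}} \sum_{i,j} s_{ij} (x_i - x_j)^2$
$\min_{\mathbf{x}} \sum_{i,j} w_{ij} (x_i - x_j)^2$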

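The solution of the problem above is the eigenvector of $L$
that corresponds to its smallest eigenvalue (under a normalization constraint such as $\|\mathbf{x}\| = 1$, which excludes the trivial solution $\mathbf{x} = 0$).
$\mathbf{x} = (x_1, \dots, x_n)^T$
Eigenvector for the smallest eigenvalue: $\mathbf{v}_1 = (v_1, \dots, v_n)^T$
$v_1$ is the new point corresponding to $o_1$, $v_2$ is the new point corresponding to $o_2$, and so on.
These are the new points (they live in the new space).
The reason is that the objective is a quadratic form in $L$: since $\mathbf{x}^T L \mathbf{x} = \frac{1}{2} \sum_{i,j} w_{ij} (x_i - x_j)^2$ (sum over all ordered pairs),
$\min_{\mathbf{x}} \sum_{i,j} w_{ij} (x_i - x_j)^2$ has the same minimizer as $\min_{\mathbf{x}} \mathbf{x}^T L \mathbf{x}$.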
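The step from the weighted sum to the quadratic form in $L$ rests on a standard identity. A short derivation, assuming $W$ is symmetric and writing $d_i = \sum_j w_{ij}$ for the degree of vertex $i$ (the symbol $d_i$ is introduced here for illustration):

```latex
\begin{aligned}
\mathbf{x}^T L \mathbf{x}
  &= \mathbf{x}^T D \mathbf{x} - \mathbf{x}^T W \mathbf{x}
   = \sum_i d_i x_i^2 - \sum_{i,j} w_{ij} x_i x_j \\
  &= \frac{1}{2}\Big( \sum_i d_i x_i^2 - 2 \sum_{i,j} w_{ij} x_i x_j + \sum_j d_j x_j^2 \Big) \\
  &= \frac{1}{2} \sum_{i,j} w_{ij} (x_i - x_j)^2
\end{aligned}
```

The constant factor does not change the minimizer, so minimizing the weighted sum of squared differences and minimizing $\mathbf{x}^T L \mathbf{x}$ select the same $\mathbf{x}$.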

By ,
this change of representation enhances the cluster-properties
in the data, so that clusters can be trivially detected in the new
representation.

Spectral Clustering Algorithm
▪ Input: data points $\mathbf{p}_1, \mathbf{p}_2, \dots, \mathbf{p}_n \in \mathbb{R}^n$, similarity matrix $W \in M_{n \times n}$,
number of clusters $k$
○ Build the similarity graph from the similarity matrix.
○ Build the graph Laplacian from the adjacency matrix $W$ and the diagonal (degree) matrix $D$.
① $L = D - W$: unnormalized graph Laplacian
○ Compute the $k$ eigenvectors $\mathbf{v}_1, \dots, \mathbf{v}_k \in \mathbb{R}^n$ of $L$
that correspond to the $k$ smallest eigenvalues.

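○ Form the matrix $V \in M_{n \times k}$ whose columns are the $k$ eigenvectors.
○ Interpret the rows of $V$ as the new data points
$\mathbf{x}_1, \mathbf{x}_2, \dots, \mathbf{x}_n \in \mathbb{R}^k$.
○ Partition the $n$ new data points $\mathbf{x}_1, \mathbf{x}_2, \dots, \mathbf{x}_n$ into $k$
clusters with the k-means algorithm.
$V = [\mathbf{v}_1, \mathbf{v}_2, \dots, \mathbf{v}_k] = \begin{pmatrix} v_{11} & v_{12} & \cdots & v_{1(k-1)} & v_{1k} \\ v_{21} & v_{22} & \cdots & v_{2(k-1)} & v_{2k} \\ \vdots & \vdots & \ddots & \vdots & \vdots \\ v_{n1} & v_{n2} & \cdots & v_{n(k-1)} & v_{nk} \end{pmatrix}$
Row 1 of $V$ is the new data point $\mathbf{x}_1$, row 2 is the new data point $\mathbf{x}_2$, ..., row $n$ is the new data point $\mathbf{x}_n$.
Dimensionality reduction: $n \times n \to n \times k$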
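The algorithm steps above can be put together in a few lines of NumPy. This is a sketch under stated assumptions, not a reference implementation: the small `kmeans` helper (Lloyd's algorithm with a deterministic farthest-point start) and the two-ring test data with bandwidth 0.5 are choices made here for illustration.

```python
import numpy as np

def kmeans(X, k, n_iter=100):
    """Plain Lloyd's algorithm; farthest-point initialization keeps it deterministic."""
    centers = [X[0]]
    for _ in range(1, k):
        d = np.min([np.sum((X - c) ** 2, axis=1) for c in centers], axis=0)
        centers.append(X[np.argmax(d)])
    centers = np.array(centers, dtype=float)
    labels = np.zeros(len(X), dtype=int)
    for _ in range(n_iter):
        dist = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=-1)
        labels = dist.argmin(axis=1)
        new_centers = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels

def spectral_clustering(W, k):
    """Unnormalized spectral clustering on an adjacency matrix W."""
    L = np.diag(W.sum(axis=1)) - W
    _, eigvecs = np.linalg.eigh(L)   # eigenvalues come back in ascending order
    V = eigvecs[:, :k]               # n x k: row i is the new point for object i
    return kmeans(V, k)

# Two concentric rings (non-convex clusters), 20 points each.
t = np.linspace(0.0, 2.0 * np.pi, 20, endpoint=False)
inner = np.c_[np.cos(t), np.sin(t)]
P = np.vstack([inner, 4.0 * inner])
sq_dist = ((P[:, None, :] - P[None, :, :]) ** 2).sum(axis=-1)
W = np.exp(-sq_dist / (2.0 * 0.5 ** 2))
np.fill_diagonal(W, 0.0)

labels = spectral_clustering(W, 2)
```

On this data the inner ring and the outer ring end up with different labels, which k-means on the raw coordinates would not achieve.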

Let us look at an experiment (a very easy example): a graph with 8 nodes, numbered 1 to 8.
We will split it into 3 clusters.

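[Figure: the eigenvectors $\mathbf{v}_1, \mathbf{v}_2, \mathbf{v}_3$ and the new space (the embedded space)]
8 new points are generated:
(0, 0, -0.5)
(0, 0, -0.5)
(0, 0, -0.5)
(0, 0, -0.5)
(-0.7071, 0, 0)
(-0.7071, 0, 0)
(0, -0.7071, 0)
(0, -0.7071, 0)
In the new space (the embedded space) they occupy only three locations: 4 points at (0, 0, -0.5), 2 points at (-0.7071, 0, 0), and 2 points at (0, -0.7071, 0).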
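The collapse of the 8 points onto three locations can be reproduced numerically. The sketch below assumes the example graph consists of three disjoint complete subgraphs on nodes 1 to 4, 5 to 6 and 7 to 8; the slides do not list the edges, so this structure is an assumption. The solver may return a different orthonormal basis of the zero eigenspace than the one shown on the slide, so the coordinates can differ by a rotation, but the row norms (0.5 and 0.7071) and the grouping stay the same.

```python
import numpy as np

def block_adjacency(sizes):
    """Adjacency matrix of disjoint complete graphs (assumed example structure)."""
    n = sum(sizes)
    W = np.zeros((n, n))
    start = 0
    for s in sizes:
        W[start:start + s, start:start + s] = 1.0
        start += s
    np.fill_diagonal(W, 0.0)
    return W

W = block_adjacency([4, 2, 2])
L = np.diag(W.sum(axis=1)) - W
eigvals, eigvecs = np.linalg.eigh(L)
V = eigvecs[:, :3]                      # the 8 new points are the rows of V

row_norms = np.linalg.norm(V, axis=1)   # 0.5 for the 4-node group, 0.7071 for the pairs
```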

Questions
- What is the mathematical justification for selecting the $k$ eigenvectors that correspond to the $k$ smallest eigenvalues?
- Why $k$?
- How can $k$ be chosen?
Go to appendix 1

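▪ If the graph is connected, the first Laplacian eigenvector is the constant vector $\mathbf{1}$.
▪ If the graph is disconnected (with $k$ connected components), the graph Laplacian is a block diagonal
matrix, and in general the first $k$ Laplacian eigenvectors are the indicator vectors of the components (constant on one component, zero elsewhere).
Go to appendix 2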

Schematic of spectral clustering (more than one connected component)

▪ If the components are only very loosely connected, that is, if the
graph Laplacian is not exactly block diagonal, the first Laplacian
eigenvector is still $\mathbf{1}$.
○ For a balanced min-cut, use the second Laplacian eigenvector.
○ For $k$ clusters, use $k$ eigenvectors; these eigenvectors are slightly perturbed
versions of the ideal ones.
※ See the Davis-Kahan theorem for details.
Go to appendix 3
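A small numerical illustration of the loosely connected case. The graph below (two 4-node cliques joined by one edge of weight 0.01) is a made-up example, and splitting by the sign of the second eigenvector is one common way to read off a bipartition, used here as an assumption rather than as the slides' prescribed rule.

```python
import numpy as np

# Two cliques of 4 nodes each, joined by a single weak edge.
W = np.zeros((8, 8))
W[:4, :4] = 1.0
W[4:, 4:] = 1.0
np.fill_diagonal(W, 0.0)
W[3, 4] = W[4, 3] = 0.01

L = np.diag(W.sum(axis=1)) - W
eigvals, eigvecs = np.linalg.eigh(L)

first = eigvecs[:, 0]     # constant vector, since the graph is connected
second = eigvecs[:, 1]    # second Laplacian eigenvector
labels = (second > 0).astype(int)
```

The first eigenvector carries no cluster information, while the sign pattern of the second one separates the two cliques.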

Schematic of spectral clustering (one connected component)

Because spectral clustering runs k-means on the embedded points rather than on the original data,
we can find clusters with non-convex boundaries in the data set.

▪ Eigengap
○ One of several ways to choose $k$.
○ Choose the number $k$ such that all eigenvalues $\lambda_1, \dots, \lambda_k$ are very small, but $\lambda_{k+1}$ is
relatively large.
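The eigengap rule can be sketched as "pick the position of the largest jump between consecutive eigenvalues". The function name, the optional `max_k` cap and the test graph (three cliques joined by weak edges) are illustrative assumptions.

```python
import numpy as np

def choose_k_by_eigengap(L, max_k=None):
    """Return k such that the gap between lambda_k and lambda_{k+1} is largest."""
    eigvals = np.linalg.eigvalsh(L)      # ascending order
    if max_k is not None:
        eigvals = eigvals[:max_k + 1]
    gaps = np.diff(eigvals)              # gaps[i] = eigvals[i + 1] - eigvals[i]
    return int(np.argmax(gaps)) + 1

# Three cliques (sizes 4, 3, 3) joined by two weak edges.
W = np.zeros((10, 10))
for lo, hi in [(0, 4), (4, 7), (7, 10)]:
    W[lo:hi, lo:hi] = 1.0
np.fill_diagonal(W, 0.0)
W[3, 4] = W[4, 3] = 0.01
W[6, 7] = W[7, 6] = 0.01

L = np.diag(W.sum(axis=1)) - W
k = choose_k_by_eigengap(L)
```

Here the first three eigenvalues are close to 0 and the fourth is close to 3, so the heuristic selects three clusters.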

▪ Unnormalized graph Laplacian (related to $RatioCut$)
$L = D - W$
▪ Normalized graph Laplacians (related to $NCut$)
$L_{sym} = D^{-1/2} L D^{-1/2} = I - D^{-1/2} W D^{-1/2}$
$L_{rw} = D^{-1} L = I - D^{-1} W$
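The three Laplacians can be computed directly from $W$. A minimal sketch, assuming every vertex has strictly positive degree (otherwise $D^{-1}$ and $D^{-1/2}$ are undefined); the 4-node weight matrix is a made-up example.

```python
import numpy as np

def laplacians(W):
    """Return (L, L_sym, L_rw) for a symmetric weight matrix with positive degrees."""
    d = W.sum(axis=1)
    L = np.diag(d) - W
    d_inv_sqrt = 1.0 / np.sqrt(d)
    L_sym = d_inv_sqrt[:, None] * L * d_inv_sqrt[None, :]   # D^{-1/2} L D^{-1/2}
    L_rw = L / d[:, None]                                   # D^{-1} L
    return L, L_sym, L_rw

W = np.array([[0, 1, 1, 0],
              [1, 0, 1, 0],
              [1, 1, 0, 2],
              [0, 0, 2, 0]], dtype=float)
L, L_sym, L_rw = laplacians(W)
n = len(W)
d = W.sum(axis=1)
```

$L_{sym}$ and $L_{rw}$ are similar matrices, so they share the same eigenvalues; only their eigenvectors differ (by a factor $D^{1/2}$).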

▪ If the graph is very regular and most vertices have approximately the same degree,
$L$, $L_{rw}$ and $L_{sym}$ give nearly identical clustering results.
▪ If the degrees of the graph are very broadly distributed (the paper author's recommendation):
○ normalized rather than unnormalized
○ $L_{rw}$ rather than $L_{sym}$

▪ We want to partition such that
○ points in different clusters are dissimilar to each other
① minimize the between-cluster similarity
② minimize $cut(A, A^C)$
○ points in the same cluster are similar to each other
① maximize the within-cluster similarity
② maximize $W(A, A)$ and $W(A^C, A^C)$

▪ between-cluster similarity
○ Both $NCut$ and $RatioCut$ address this objective.
▪ within-cluster similarity
○ $W(A, A) = W(A, V) - W(A, A^C) = vol(A) - cut(A, A^C)$
○ If $cut(A, A^C)$ is small and $vol(A)$ is large, then the within-cluster similarity is maximized.
○ We can achieve this by minimizing $NCut$.
▪ Normalized spectral clustering implements both clustering objectives
mentioned above, while unnormalized spectral clustering only implements the
first objective.
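The quantities above can be checked on a toy graph. This sketch assumes the usual definitions $W(A, B) = \sum_{i \in A, j \in B} w_{ij}$, $cut(A, A^C) = W(A, A^C)$ and $vol(A) = W(A, V)$; the helper names and the 6-node graph (two triangles joined by an edge of weight 0.1) are illustrative.

```python
import numpy as np

def set_weight(W, A, B):
    """W(A, B): total weight of edges from vertex set A to vertex set B."""
    return W[np.ix_(A, B)].sum()

def complement(n, A):
    members = set(A)
    return [i for i in range(n) if i not in members]

def ratio_cut(W, parts):
    n = len(W)
    return sum(set_weight(W, A, complement(n, A)) / len(A) for A in parts)

def ncut(W, parts):
    n = len(W)
    everything = list(range(n))
    return sum(set_weight(W, A, complement(n, A)) / set_weight(W, A, everything)
               for A in parts)

W = np.zeros((6, 6))
W[:3, :3] = 1.0
W[3:, 3:] = 1.0
np.fill_diagonal(W, 0.0)
W[2, 3] = W[3, 2] = 0.1

A = [0, 1, 2]
within = set_weight(W, A, A)
vol_A = set_weight(W, A, list(range(6)))
cut_A = set_weight(W, A, complement(6, A))

good = [[0, 1, 2], [3, 4, 5]]   # cuts only the weak edge
bad = [[0, 1], [2, 3, 4, 5]]    # cuts through a triangle
```

Both criteria score the partition along the weak edge far lower than the partition through a triangle, and `within` equals `vol_A - cut_A`.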

▪ Spectral clustering solves a relaxation of a discrete problem.
▪ Most importantly, there is no guarantee whatsoever on the quality of the
solution of the relaxed problem compared to the exact solution.
▪ The reason why the spectral relaxation is so appealing is not that it leads to
particularly good solutions.
▪ Its popularity is mainly due to the fact that it results in a standard linear
algebra problem which is simple to solve.

▪ http://snap.stanford.edu/class/cs224w-readings/ng01spectralcluster.pdf
▪ http://www.cs.berkeley.edu/~malik/papers/SM-ncut.pdf
▪ http://www.kyb.mpg.de/fileadmin/user_upload/files/publications/attachments/Luxburg07_tutorial_4488%5b0%5d.pdf
▪ http://ranger.uta.edu/~chqding/papers/KmeansPCA1.pdf
▪ http://ranger.uta.edu/~chqding/Spectral/spectralA.pdf

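Mathematical formulation of clusters:
$\min_{\mathbf{x}} \sum_{i,j} s_{ij} (x_i - x_j)^2 \;\to\; \min_{\mathbf{x}} \sum_{i,j} w_{ij} (x_i - x_j)^2 \;\to\; \min_{\mathbf{x}} \mathbf{x}^T L \mathbf{x}$
Graph cut:
$RatioCut$, $NCut$ $\;\to\; \min_{\mathbf{x}} \mathbf{x}^T L \mathbf{x}$ (by the definition of $\mathbf{x}$)
Spectral clustering is a way
to solve relaxed versions
of cut problems.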

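▪ The $RatioCut(A_1, A_2, \dots, A_k)$ problem
○ Given a partition of $V$ into $k$ sets, we define $k$ indicator vectors $h_j = (h_{1,j}, \dots, h_{n,j})^T$ by
① $h_{i,j} = \frac{1}{\sqrt{|A_j|}}$ if $v_i \in A_j$
② $h_{i,j} = 0$ otherwise
for all $i = 1, \dots, n$; $j = 1, \dots, k$.
$RatioCut(A_1, \dots, A_k) := \sum_{i=1}^{k} \frac{cut(A_i, A_i^c)}{|A_i|} = \sum_{i=1}^{k} h_i^T L h_i = \sum_{i=1}^{k} (H^T L H)_{ii} = Tr(H^T L H)$
Here $H \in \mathbb{R}^{n \times k}$ is the matrix with columns $h_1, \dots, h_k$. Dropping the requirement that $H$ has this discrete form gives the relaxed problem
$\min_{H \in \mathbb{R}^{n \times k}} Tr(H^T L H)$ s.t. $H^T H = I$
By the Rayleigh-Ritz theorem, the solution of this problem is given by the first $k$ eigenvectors of the unnormalized Laplacian $L$ (as the columns of $H$).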
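The identity $RatioCut = Tr(H^T L H)$ and the effect of the relaxation can be verified numerically. A sketch on a made-up 6-node graph (two triangles joined by an edge of weight 0.1); variable names are illustrative.

```python
import numpy as np

def ratio_cut_indicators(n, parts):
    """Columns h_j with entries 1/sqrt(|A_j|) on A_j and 0 elsewhere."""
    H = np.zeros((n, len(parts)))
    for j, A in enumerate(parts):
        H[A, j] = 1.0 / np.sqrt(len(A))
    return H

n = 6
W = np.zeros((n, n))
W[:3, :3] = 1.0
W[3:, 3:] = 1.0
np.fill_diagonal(W, 0.0)
W[2, 3] = W[3, 2] = 0.1
L = np.diag(W.sum(axis=1)) - W

parts = [[0, 1, 2], [3, 4, 5]]
H = ratio_cut_indicators(n, parts)
trace_value = np.trace(H.T @ L @ H)

# RatioCut computed directly from its definition.
ratio_cut = sum(
    W[np.ix_(A, [i for i in range(n) if i not in A])].sum() / len(A)
    for A in parts
)

# Relaxed problem: the first k eigenvectors of L minimize the trace.
eigvals, eigvecs = np.linalg.eigh(L)
H_relaxed = eigvecs[:, :2]
relaxed_value = np.trace(H_relaxed.T @ L @ H_relaxed)
```

The relaxed optimum equals the sum of the two smallest eigenvalues and is never larger than the value of any discrete partition.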

▪ The $NCut(A_1, A_2, \dots, A_k)$ problem
○ Given a partition of $V$ into $k$ sets, we define $k$ indicator vectors $h_j = (h_{1,j}, \dots, h_{n,j})^T$ by
① $h_{i,j} = \frac{1}{\sqrt{vol(A_j)}}$ if $v_i \in A_j$
② $h_{i,j} = 0$ otherwise
for all $i = 1, \dots, n$; $j = 1, \dots, k$.
$\min_{A_1, \dots, A_k} Tr(H^T L H)$ s.t. $H^T D H = I$
Relaxing the discreteness of $H$ and substituting $T = D^{1/2} H$ gives
$\min_{T \in \mathbb{R}^{n \times k}} Tr(T^T D^{-1/2} L D^{-1/2} T)$ s.t. $T^T T = I$
By the Rayleigh-Ritz theorem, the solution $T$ of this problem consists of the first $k$ eigenvectors of the normalized Laplacian $L_{sym}$.
Substituting back, $H = D^{-1/2} T$ consists of the first $k$ eigenvectors of the normalized Laplacian $L_{rw}$.
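The same kind of check works for $NCut$: with entries $1/\sqrt{vol(A_j)}$ the indicator matrix satisfies $H^T D H = I$ and $Tr(H^T L H)$ equals $NCut$, and the substitution $T = D^{1/2} H$ links $L_{sym}$ to $L_{rw}$. A sketch on a made-up 6-node graph; all names are illustrative.

```python
import numpy as np

n = 6
W = np.zeros((n, n))
W[:3, :3] = 1.0
W[3:, 3:] = 1.0
np.fill_diagonal(W, 0.0)
W[2, 3] = W[3, 2] = 0.1

d = W.sum(axis=1)
D = np.diag(d)
L = D - W
parts = [[0, 1, 2], [3, 4, 5]]

# Discrete indicators: 1/sqrt(vol(A_j)) on A_j, 0 elsewhere.
H = np.zeros((n, len(parts)))
for j, A in enumerate(parts):
    H[A, j] = 1.0 / np.sqrt(d[A].sum())
trace_value = np.trace(H.T @ L @ H)

# NCut computed directly from its definition.
ncut = sum(
    W[np.ix_(A, [i for i in range(n) if i not in A])].sum() / d[A].sum()
    for A in parts
)

# Relaxed problem: T from L_sym, then H = D^{-1/2} T.
d_inv_sqrt = 1.0 / np.sqrt(d)
L_sym = d_inv_sqrt[:, None] * L * d_inv_sqrt[None, :]
eigvals, eigvecs = np.linalg.eigh(L_sym)
T = eigvecs[:, :2]
H_relaxed = d_inv_sqrt[:, None] * T
L_rw = L / d[:, None]
```

The columns of `H_relaxed` are eigenvectors of `L_rw` with the same eigenvalues that `T` has for `L_sym`.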

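Consider the ideal case: suppose there are $k$ connected components,
with no connections between different components.
(Connected component: every pair of elements within the component is joined by a path.)
Then, with the vertices ordered by component, $L$ is a block diagonal matrix:
$L = \begin{pmatrix} L_1 & & & \\ & L_2 & & \\ & & \ddots & \\ & & & L_k \end{pmatrix}$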

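※ Proof
○ First, suppose there is exactly one connected component, and let $\mathbf{f}$ be an
eigenvector corresponding to the eigenvalue 0. Then $0 = \sum_{i,j} w_{i,j} (f_i - f_j)^2$.
Since $w_{i,j} \ge 0$, every term of the sum must vanish, so $f_i = f_j$ whenever $w_{i,j} > 0$.
Because all vertices are joined to each other by a path, $f_i$ and $f_j$ take the same value for all $i, j$.
Therefore $\mathbf{f} = c\mathbf{1}$ where $c \in \mathbb{R}$.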

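※ Proof
○ Now suppose there is more than one connected component. An eigenvector for the eigenvalue 0
can be treated as above. Since the Laplacian is block diagonal,
$f_i = f_j$ is forced only within a connected component; between components
$w_{i,j} = 0$, so there is no constraint and the entries outside the component can be set to 0. Hence there are
as many eigenvectors as there are components, each equal to a constant on the entries of one diagonal block
and 0 elsewhere. All of these eigenvectors correspond to the eigenvalue 0.
$L = \begin{pmatrix} L_1 & & & \\ & L_2 & & \\ & & \ddots & \\ & & & L_k \end{pmatrix}$
Eigenvectors: $(c_1, \dots, c_1, 0, \dots, 0)^T$, $(0, \dots, 0, c_2, \dots, c_2, 0, \dots, 0)^T$, ..., $(0, \dots, 0, c_k, \dots, c_k)^T$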

○ Since $L$ is a block diagonal matrix, the spectrum of $L$ is given by the union of the spectra
of the $L_i$, and the corresponding eigenvectors of $L$ are the eigenvectors of the $L_i$, filled with
0 at the positions of the other blocks.
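The statement about block diagonal matrices is easy to confirm numerically. A sketch with two made-up blocks (complete graphs on 3 and 2 vertices); the helper name is illustrative.

```python
import numpy as np

def complete_graph_laplacian(m):
    """Unnormalized Laplacian of the complete graph on m vertices."""
    W = np.ones((m, m)) - np.eye(m)
    return np.diag(W.sum(axis=1)) - W

blocks = [complete_graph_laplacian(3), complete_graph_laplacian(2)]
n = sum(b.shape[0] for b in blocks)

# Assemble the block diagonal Laplacian.
L = np.zeros((n, n))
start = 0
for b in blocks:
    m = b.shape[0]
    L[start:start + m, start:start + m] = b
    start += m

spectrum_L = np.linalg.eigvalsh(L)
spectrum_blocks = np.sort(np.concatenate([np.linalg.eigvalsh(b) for b in blocks]))

# An eigenvector of the second block, padded with zeros, is an eigenvector of L.
vals2, vecs2 = np.linalg.eigh(blocks[1])
padded = np.zeros(n)
padded[3:5] = vecs2[:, 1]

num_zero = int(np.count_nonzero(np.isclose(spectrum_L, 0.0, atol=1e-9)))
```

The number of zero eigenvalues (`num_zero`) matches the number of blocks, that is, the number of connected components.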

▪ Perturbation theory
○ It studies how the eigenvalues and eigenvectors of a matrix $A$ change if we add a small
perturbation $H$.
○ Most perturbation theorems state that a certain distance between eigenvalues or
eigenvectors of $A$ and the perturbed matrix $A_p = A + H$ is bounded by a constant times
a norm of $H$.
○ Call the case of completely separated connected components the ideal case, and the case of loosely connected
components the nearly ideal case. In the nearly ideal case we still have distinct
clusters, but the between-cluster similarity is not exactly 0.
① Regard the Laplacian matrix of the nearly ideal case as a perturbed version of the ideal-case Laplacian.
② By perturbation theory, the eigenvectors of the perturbed Laplacian differ very little from
the indicator vectors of the ideal case (the eigenvectors of the ideal Laplacian).
③ Hence the new points $y_i$ built from the ideal-case eigenvectors and the new points $y_i$ built from
the nearly-ideal-case eigenvectors are approximately equal, up to a small
error term.

○ There are, however, two properties to keep in mind.
① The order of the eigenvalues and eigenvectors must be meaningful.
– For $L$ it is meaningful; for $W$ or $S$ it is not.
② The components of the eigenvectors must be "safely bounded away" from 0.
– $L$ and $L_{rw} = D^{-1} L$ satisfy this property well.
– The eigenvectors of $L_{sym} = D^{-1/2} L D^{-1/2}$ are $D^{1/2} \mathbf{1}_{A_i}$. Hence, if the degrees of the vertices differ
a lot, or if there are vertices with very low degree, the corresponding
entries of the eigenvectors become very close to 0. A row-normalization
step is used to counter this.
– $L_{sym}$ should be used with care when there are vertices with very low degrees.
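The row-normalization step can be illustrated on a graph with one very low-degree vertex. This sketch assumes the step means rescaling each row of the eigenvector matrix of $L_{sym}$ to unit length; the function name and the 5-node graph are made up for illustration.

```python
import numpy as np

def normalized_embedding(W, k):
    """First k eigenvectors of L_sym as rows, before and after row normalization."""
    d = W.sum(axis=1)
    d_inv_sqrt = 1.0 / np.sqrt(d)
    L_sym = np.eye(len(W)) - d_inv_sqrt[:, None] * W * d_inv_sqrt[None, :]
    _, eigvecs = np.linalg.eigh(L_sym)
    T = eigvecs[:, :k]
    Y = T / np.linalg.norm(T, axis=1, keepdims=True)   # each row scaled to length 1
    return T, Y

# Component {0, 1, 2}: vertex 0 has very low degree. Component {3, 4}: one edge.
W = np.zeros((5, 5))
W[0, 1] = W[1, 0] = 0.01
W[0, 2] = W[2, 0] = 0.01
W[1, 2] = W[2, 1] = 10.0
W[3, 4] = W[4, 3] = 1.0

T, Y = normalized_embedding(W, 2)
```

Before normalization the row of vertex 0 lies close to the origin, far from the rows of vertices 1 and 2; after normalization all three rows of that component coincide.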
