icml2004 tutorial on spectral clustering part I
  1. A Tutorial on Spectral Clustering
     Chris Ding
     Computational Research Division, Lawrence Berkeley National Laboratory, University of California
     Supported by Office of Science, U.S. Dept. of Energy
     Tutorial on Spectral Clustering, ICML 2004, Chris Ding © University of California
  2. Some historical notes
     • Fiedler, 1973, 1975, graph Laplacian matrix
     • Donath & Hoffman, 1973, bounds
     • Pothen, Simon, Liou, 1990, spectral graph partitioning (many related papers thereafter)
     • Hagen & Kahng, 1992, Ratio-cut
     • Chan, Schlag & Zien, multi-way Ratio-cut
     • Chung, 1997, spectral graph theory book
     • Shi & Malik, 2000, Normalized Cut
  3. Spectral Gold-Rush of 2001: 9 papers on spectral clustering
     • Meila & Shi, AI-Stat 2001. Random-walk interpretation of Normalized Cut
     • Ding, He & Zha, KDD 2001. Perturbation analysis of the Laplacian matrix on sparsely connected graphs
     • Ng, Jordan & Weiss, NIPS 2001. K-means algorithm on the embedded eigenspace
     • Belkin & Niyogi, NIPS 2001. Spectral embedding
     • Dhillon, KDD 2001. Bipartite graph clustering
     • Zha et al, CIKM 2001. Bipartite graph clustering
     • Zha et al, NIPS 2001. Spectral relaxation of K-means
     • Ding et al, ICDM 2001. MinMaxCut; uniqueness of the relaxation
     • Gu et al. K-way relaxation of NormCut and MinMaxCut
  4. Part I: Basic Theory, 1973 – 2001
  5. Spectral Graph Partitioning
     MinCut: min cutsize, where cutsize = # of cut edges
     Constraint on sizes: |A| = |B|
  6. 2-way Spectral Graph Partitioning
     Partition membership indicator: q_i = +1 if i ∈ A, −1 if i ∈ B
     J = CutSize = (1/4) Σ_{i,j} w_ij (q_i − q_j)²
       = (1/4) Σ_{i,j} w_ij (q_i² + q_j² − 2 q_i q_j) = (1/2) Σ_{i,j} q_i (d_i δ_ij − w_ij) q_j
       = (1/2) q^T (D − W) q
     Relaxing the indicators q_i from discrete to continuous values, the solution of min J(q) is given by the eigenvectors of (D − W) q = λ q
     (Fiedler, 1973, 1975; Pothen, Simon, Liou, 1990)
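The whole 2-way procedure fits in a few lines. Below is a minimal sketch: build L = D − W, take the second eigenvector, and split by sign. The graph (two triangles joined by one bridge edge) is my own illustrative choice, not from the tutorial.

```python
import numpy as np

# Two triangles {0,1,2} and {3,4,5} joined by the single bridge edge (2,3).
W = np.zeros((6, 6))
for i, j in [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]:
    W[i, j] = W[j, i] = 1.0

D = np.diag(W.sum(axis=1))
L = D - W                       # graph Laplacian

vals, vecs = np.linalg.eigh(L)  # eigh returns eigenvalues in ascending order
q2 = vecs[:, 1]                 # Fiedler vector (second-smallest eigenvalue)

# Split by sign of the relaxed indicator.
A = {i for i in range(6) if q2[i] < 0}
B = set(range(6)) - A
```

Up to the arbitrary sign of the eigenvector, the sign split recovers the two triangles, cutting only the bridge edge.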
  7. Properties of the Graph Laplacian
     Laplacian matrix of the graph: L = D − W
     • L is positive semi-definite: x^T L x ≥ 0 for any x
     • The first eigenvector is q_1 = (1, …, 1)^T = e, with λ_1 = 0
     • The second eigenvector q_2 is the desired solution
     • The smaller λ_2, the better the quality of the partitioning; perturbation analysis gives
       λ_2 = cutsize/|A| + cutsize/|B|
     • Higher eigenvectors are also useful
  8. Recovering Partitions
     From the definition of the cluster indicators, partitions A, B are determined by:
     A = {i | q_2(i) < 0},  B = {i | q_2(i) ≥ 0}
     However, the objective function J(q) is insensitive to an additive constant c:
     J = CutSize = (1/4) Σ_{i,j} w_ij [(q_i + c) − (q_j + c)]²
     Thus we sort q_2 into increasing order and cut at the middle point.
  9. Multi-way Graph Partitioning
     • Recursively applying the 2-way partitioning
       • Recursive 2-way partitioning
       • Using Kernighan-Lin to do local refinements
     • Using higher eigenvectors
       • Using q_3 to further partition the clusters obtained via q_2
     • Popular graph-partitioning packages
       • Metis, Univ of Minnesota
       • Chaco, Sandia Nat'l Lab
  10. 2-way Spectral Clustering
     • Undirected graphs (pairwise similarities)
     • Bipartite graphs (contingency tables)
     • Directed graphs (web graphs)
  11. Spectral Clustering
     min cutsize, without explicit size constraints
     But where to cut? Need to balance sizes.
  12. Clustering Objective Functions
     Between-cluster similarity: s(A,B) = Σ_{i∈A} Σ_{j∈B} w_ij
     • Ratio Cut:
       J_Rcut(A,B) = s(A,B)/|A| + s(A,B)/|B|
     • Normalized Cut, with d_A = Σ_{i∈A} d_i:
       J_Ncut(A,B) = s(A,B)/d_A + s(A,B)/d_B
                   = s(A,B)/[s(A,A) + s(A,B)] + s(A,B)/[s(B,B) + s(A,B)]
     • Min-Max-Cut:
       J_MMC(A,B) = s(A,B)/s(A,A) + s(A,B)/s(B,B)
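As a concrete check of the three objectives, here is a small pure-Python computation on a toy graph (two triangles joined by one bridge edge; the graph and the partition are my own illustrative choices):

```python
# Edges of the toy graph; undirected, so store both directions with weight 1.
edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
w = {}
for i, j in edges:
    w[(i, j)] = w[(j, i)] = 1.0

def s(X, Y):
    """Sum of weights over ordered pairs (i, j) with i in X, j in Y."""
    return sum(w.get((i, j), 0.0) for i in X for j in Y)

A, B = {0, 1, 2}, {3, 4, 5}
dA, dB = s(A, A | B), s(B, A | B)    # cluster degrees d_A, d_B

J_rcut = s(A, B) / len(A) + s(A, B) / len(B)   # Ratio Cut
J_ncut = s(A, B) / dA + s(A, B) / dB           # Normalized Cut
J_mmc  = s(A, B) / s(A, A) + s(A, B) / s(B, B) # Min-Max-Cut
```

Note that the identity d_A = s(A,A) + s(A,B) used on the slide holds here: d_A = 6 + 1 = 7.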
  13. Ratio Cut (Hagen & Kahng, 1992)
     Min similarity between A, B: s(A,B) = Σ_{i∈A} Σ_{j∈B} w_ij
     Size balance (Wei & Cheng, 1989):
     J_Rcut(A,B) = s(A,B)/|A| + s(A,B)/|B|
     Cluster membership indicator:
     q(i) = √(n_2/(n_1 n)) if i ∈ A,  −√(n_1/(n_2 n)) if i ∈ B
     Normalization: q^T q = 1, q^T e = 0
     Substituting q leads to J_Rcut(q) = q^T (D − W) q
     Now relax q; the solution is given by the 2nd eigenvector of L.
  14. Normalized Cut (Shi & Malik, 1997)
     Min similarity between A & B: s(A,B) = Σ_{i∈A} Σ_{j∈B} w_ij
     Balance weights: J_Ncut(A,B) = s(A,B)/d_A + s(A,B)/d_B, where d_A = Σ_{i∈A} d_i
     Cluster indicator, with d = Σ_{i∈G} d_i:
     q(i) = √(d_B/(d_A d)) if i ∈ A,  −√(d_A/(d_B d)) if i ∈ B
     Normalization: q^T D q = 1, q^T D e = 0
     Substituting q leads to J_Ncut(q) = q^T (D − W) q
     min_q q^T (D − W) q + λ (q^T D q − 1)
     The solution is an eigenvector of (D − W) q = λ D q
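The generalized eigenproblem (D − W)q = λDq can be solved with an ordinary symmetric eigensolver via the substitution v = D^{1/2}q, which turns it into D^{−1/2}(D − W)D^{−1/2} v = λv. A sketch on a toy graph of my own choosing (two triangles plus a bridge edge):

```python
import numpy as np

# Two dense triangles {0,1,2}, {3,4,5} joined by the bridge edge (2,3).
W = np.zeros((6, 6))
for i, j in [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]:
    W[i, j] = W[j, i] = 1.0

d = W.sum(axis=1)
L = np.diag(d) - W

# Substitute v = D^{1/2} q: (D - W) q = lambda D q becomes the standard
# symmetric problem  D^{-1/2} L D^{-1/2} v = lambda v.
Dm = np.diag(d**-0.5)
vals, vecs = np.linalg.eigh(Dm @ L @ Dm)
q2 = Dm @ vecs[:, 1]           # back-transform: q = D^{-1/2} v

A = {i for i in range(6) if q2[i] < 0}
B = set(range(6)) - A
```

The recovered q2 satisfies the slide's constraint q^T D e = 0 up to scale, and its sign split separates the two triangles.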
  15. MinMaxCut (Ding et al 2001)
     Min similarity between A & B: s(A,B) = Σ_{i∈A} Σ_{j∈B} w_ij
     Max similarity within A & B: s(A,A) = Σ_{i∈A} Σ_{j∈A} w_ij
     J_MMC(A,B) = s(A,B)/s(A,A) + s(A,B)/s(B,B)
     Cluster indicator: q(i) = √(d_B/(d_A d)) if i ∈ A,  −√(d_A/(d_B d)) if i ∈ B
     Substituting,
     J_MMC(q) = (1 + d_B/d_A)/(J_m + d_B/d_A) + (1 + d_A/d_B)/(J_m + d_A/d_B) − 2,
     where J_m = (q^T W q)/(q^T D q).
     Because dJ_MMC/dJ_m < 0, min J_MMC ⇒ max J_m(q)
     ⇒ W q = ξ D q ⇒ (D − W) q = λ D q
  16. A simple example
     2 dense clusters, with sparse connections between them.
     [Figures: adjacency matrix; eigenvector q_2]
  17. Comparison of Clustering Objectives
     • If clusters are well separated, all three objectives give very similar and accurate results.
     • When clusters are marginally separated, NormCut and MinMaxCut give better results.
     • When clusters overlap significantly, MinMaxCut tends to give more compact and balanced clusters.
     J_Ncut = s(A,B)/[s(A,A) + s(A,B)] + s(A,B)/[s(B,B) + s(A,B)]
     Cluster compactness ⇒ max s(A,A)
  18. 2-way Clustering of Newsgroups
     Newsgroups                         RatioCut      NormCut       MinMaxCut
     Atheism / Comp.graphics            63.2 ± 16.2   97.2 ± 0.8    97.2 ± 1.1
     Baseball / Hockey                  54.9 ± 2.5    74.4 ± 20.4   79.5 ± 11.0
     Politics.mideast / Politics.misc   53.6 ± 3.1    57.5 ± 0.9    83.6 ± 2.5
  19. Cluster Balance Analysis I: Random Graph Model
     • Random graph: edges are randomly assigned with probability p, 0 ≤ p ≤ 1.
     • RatioCut & NormCut show no size dependence:
       J_Rcut(A,B) = p|A||B|/|A| + p|A||B|/|B| = np = constant
       J_Ncut(A,B) = p|A||B|/(p|A|(n−1)) + p|A||B|/(p|B|(n−1)) = n/(n−1) = constant
     • MinMaxCut favors balanced clusters, |A| = |B|:
       J_MMC(A,B) = p|A||B|/(p|A|(|A|−1)) + p|A||B|/(p|B|(|B|−1)) = |B|/(|A|−1) + |A|/(|B|−1)
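The three expected-value formulas on this slide can be verified directly with a few lines of arithmetic (n and p below are arbitrary illustrative values):

```python
n, p = 100, 0.3   # illustrative graph size and edge probability

def j_rcut(a):
    """Expected RatioCut for split sizes a and n - a on a random graph."""
    b = n - a
    return p*a*b/a + p*a*b/b                    # simplifies to p*n for every a

def j_ncut(a):
    """Expected NormCut: constant n/(n-1), independent of the split."""
    b = n - a
    return p*a*b/(p*a*(n-1)) + p*a*b/(p*b*(n-1))

def j_mmc(a):
    """Expected MinMaxCut: b/(a-1) + a/(b-1), minimized at a = b."""
    b = n - a
    return p*a*b/(p*a*(a-1)) + p*a*b/(p*b*(b-1))

best_split = min(range(2, n - 1), key=j_mmc)    # balanced split a = n/2
```

RatioCut and NormCut are flat across all split sizes, while MinMaxCut attains its minimum exactly at the balanced split.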
  20. 2-way Clustering of Newsgroups: Cluster Balance
     [Figures: eigenvector; J_Ncut(i); J_MMC(i)]
  21. Cluster Balance Analysis II: Large Overlap Case
     Overlap fraction: f = s(A,B) / [(1/2)(s(A,A) + s(B,B))] > 0.5
     Conditions for skewed cuts:
     NormCut:   s(A,A) ≥ (1/(2f) − 1/2) s(A,B)  (i.e., s(A,B)/2 at f = 1/2)
     MinMaxCut: s(A,A) ≥ (1/(2f)) s(A,B)        (i.e., s(A,B) at f = 1/2)
     Thus MinMaxCut is much less prone to skewed cuts.
  22. Spectral Clustering of Bipartite Graphs
     Simultaneous clustering of rows and columns of a contingency table (adjacency matrix B)
     Examples of bipartite graphs:
     • Information retrieval: word-by-document matrix
     • Market basket data: transaction-by-item matrix
     • DNA gene expression profiles
     • Protein vs protein-complex
  23. Spectral Clustering of Bipartite Graphs
     Simultaneous clustering of rows and columns (adjacency matrix B)
     s(B_{R1,C2}) = Σ_{r_i∈R1} Σ_{c_j∈C2} b_ij
     min between-cluster cut weights: s(R1,C2), s(R2,C1)
     max within-cluster weights: s(R1,C1), s(R2,C2)
     J_MMC(C1,C2; R1,R2) = [s(B_{R1,C2}) + s(B_{R2,C1})] / (2 s(B_{R1,C1}))
                         + [s(B_{R1,C2}) + s(B_{R2,C1})] / (2 s(B_{R2,C2}))
     (Ding, AI-STAT 2003)
  24. Bipartite Graph Clustering
     Clustering indicators for rows and columns:
     f(i) = +1 if r_i ∈ R1, −1 if r_i ∈ R2
     g(i) = +1 if c_i ∈ C1, −1 if c_i ∈ C2
     B = [ B_{R1,C1}  B_{R1,C2} ; B_{R2,C1}  B_{R2,C2} ],  W = [ 0  B ; B^T  0 ],  q = (f; g)
     Substituting, we obtain
     J_MMC(C1,C2; R1,R2) = s(W_12)/s(W_11) + s(W_12)/s(W_22)
     f, g are determined by the generalized eigenproblem
     ( [ D_r  0 ; 0  D_c ] − [ 0  B ; B^T  0 ] ) (f; g) = λ [ D_r  0 ; 0  D_c ] (f; g)
  25. Clustering of Bipartite Graphs
     Let B̃ = D_r^{−1/2} B D_c^{−1/2},  z = (u; v) = D^{1/2} q = (D_r^{1/2} f; D_c^{1/2} g)
     We obtain
     [ 0  B̃ ; B̃^T  0 ] (u; v) = λ (u; v)
     The solution is the SVD:  B̃ = Σ_{k=1}^{m} u_k λ_k v_k^T
     (Zha et al, 2001; Dhillon, 2001)
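This SVD-based co-clustering can be sketched in a few lines. The small contingency table below is my own illustrative choice (two dense row/column blocks with weak cross entries); the procedure, normalizing B, taking the second singular vectors, and back-transforming, follows the slide:

```python
import numpy as np

# Contingency table: rows {0,1} associate with columns {0,1},
# rows {2,3} with columns {2,3}; off-block entries are weak.
B = np.array([[3., 3., 0., 0.],
              [3., 3., 0., 1.],
              [0., 0., 3., 3.],
              [1., 0., 3., 3.]])

dr, dc = B.sum(axis=1), B.sum(axis=0)
Bt = np.diag(dr**-0.5) @ B @ np.diag(dc**-0.5)   # B~ = Dr^{-1/2} B Dc^{-1/2}

U, svals, Vt = np.linalg.svd(Bt)
f2 = dr**-0.5 * U[:, 1]     # row indicator:    f = Dr^{-1/2} u_2
g2 = dc**-0.5 * Vt[1, :]    # column indicator: g = Dc^{-1/2} v_2

# Cut at z_r = z_c = 0, as on the next slide.
R1 = {i for i in range(4) if f2[i] < 0}; R2 = set(range(4)) - R1
C1 = {j for j in range(4) if g2[j] < 0}; C2 = set(range(4)) - C1
```

The leading singular value of B̃ is 1 (the trivial pair u_1 ∝ d_r^{1/2}, v_1 ∝ d_c^{1/2}), and the second pair simultaneously splits rows and columns into the two blocks.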
  26. Clustering of Bipartite Graphs
     Recovering row clusters:
     R1 = {r_i | f_2(i) < z_r},  R2 = {r_i | f_2(i) ≥ z_r}
     Recovering column clusters:
     C1 = {c_i | g_2(i) < z_c},  C2 = {c_i | g_2(i) ≥ z_c}
     z_r = z_c = 0 are the natural dividing points, but the relaxation is invariant up to a constant shift.
     Algorithm: search for optimal cut points i_cut, j_cut, and set z_r = f_2(i_cut), z_c = g_2(j_cut), such that J_MMC(C1,C2; R1,R2) is minimized. (Zha et al, 2001)
  27. Clustering of Directed Graphs
     Min directed edge weights between A & B: s(A,B) = Σ_{i∈A} Σ_{j∈B} (w_ij + w_ji)
     Max directed edge weights within A & B: s(A,A) = Σ_{i∈A} Σ_{j∈A} (w_ij + w_ji)
     • Equivalent to working with W̃ = W + W^T
     • All spectral methods apply to W̃
     • For example, web graphs are clustered in this way (He, Ding, Zha, Simon, ICDM 2001)
  28. K-way Spectral Clustering, K ≥ 2
  29. K-way Clustering Objectives
     • Ratio Cut:
       J_Rcut(C_1, …, C_K) = Σ_{k<l} [ s(C_k,C_l)/|C_k| + s(C_k,C_l)/|C_l| ] = Σ_k s(C_k, G−C_k)/|C_k|
     • Normalized Cut:
       J_Ncut(C_1, …, C_K) = Σ_{k<l} [ s(C_k,C_l)/d_k + s(C_k,C_l)/d_l ] = Σ_k s(C_k, G−C_k)/d_k
     • Min-Max-Cut:
       J_MMC(C_1, …, C_K) = Σ_{k<l} [ s(C_k,C_l)/s(C_k,C_k) + s(C_k,C_l)/s(C_l,C_l) ] = Σ_k s(C_k, G−C_k)/s(C_k,C_k)
  30. K-way Spectral Relaxation
     Prove that the solution lies in the subspace spanned by the first k eigenvectors, for:
     • Ratio Cut
     • Normalized Cut
     • Min-Max-Cut
  31. K-way Spectral Relaxation
     Unsigned cluster indicators:
     h_1 = (1…1, 0…0, …, 0…0)^T
     h_2 = (0…0, 1…1, …, 0…0)^T
     ⋮
     h_k = (0…0, 0…0, …, 1…1)^T
     Re-write:
     J_Rcut(h_1, …, h_k) = h_1^T(D−W)h_1 / h_1^T h_1 + … + h_k^T(D−W)h_k / h_k^T h_k
     J_Ncut(h_1, …, h_k) = h_1^T(D−W)h_1 / h_1^T D h_1 + … + h_k^T(D−W)h_k / h_k^T D h_k
     J_MMC(h_1, …, h_k) = h_1^T(D−W)h_1 / h_1^T W h_1 + … + h_k^T(D−W)h_k / h_k^T W h_k
  32. K-way Ratio Cut Spectral Relaxation
     Unsigned cluster indicators: x_k = (0…0, 1…1, 0…0)^T / n_k^{1/2}
     Re-write:
     J_Rcut(x_1, …, x_k) = x_1^T(D−W)x_1 + … + x_k^T(D−W)x_k = Tr(X^T(D−W)X),  X = (x_1, …, x_k)
     Optimize: min_X Tr(X^T(D−W)X), subject to X^T X = I
     By Ky Fan's theorem, the optimal solution consists of eigenvectors: X = (v_1, v_2, …, v_k), (D−W)v_k = λ_k v_k, with lower bound
     λ_1 + … + λ_k ≤ min J_Rcut(x_1, …, x_k)
     (Chan, Schlag, Zien, 1994)
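The Ky Fan bound on this slide is easy to verify numerically: the trace achieved by the first k eigenvectors equals λ_1 + … + λ_k, and any discrete (normalized) indicator matrix, being a feasible X, can only give a larger trace. A sketch on a random weighted graph of my own choosing:

```python
import numpy as np

rng = np.random.default_rng(0)
# Random symmetric weight matrix and its Laplacian L = D - W.
W = rng.random((8, 8)); W = (W + W.T) / 2
np.fill_diagonal(W, 0)
L = np.diag(W.sum(axis=1)) - W

k = 3
vals, vecs = np.linalg.eigh(L)
fan_bound = vals[:k].sum()              # Ky Fan: min Tr(X^T L X) over X^T X = I

X = vecs[:, :k]                         # optimal relaxed solution
trace_at_opt = np.trace(X.T @ L @ X)

# A discrete partition {0,1,2},{3,4,5},{6,7}, column-normalized so Xd^T Xd = I.
H = np.zeros((8, k))
for i, c in enumerate([0, 0, 0, 1, 1, 1, 2, 2]):
    H[i, c] = 1.0
Xd = H / np.sqrt(H.sum(axis=0))
trace_discrete = np.trace(Xd.T @ L @ Xd)  # >= fan_bound by Ky Fan's theorem
```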
  33. K-way Normalized Cut Spectral Relaxation
     Unsigned cluster indicators: y_k = D^{1/2}(0…0, 1…1, 0…0)^T / ||D^{1/2} h_k||
     Re-write:
     J_Ncut(y_1, …, y_k) = y_1^T(I − W̃)y_1 + … + y_k^T(I − W̃)y_k = Tr(Y^T(I − W̃)Y),  W̃ = D^{−1/2} W D^{−1/2}
     Optimize: min_Y Tr(Y^T(I − W̃)Y), subject to Y^T Y = I
     By Ky Fan's theorem, the optimal solution consists of eigenvectors: Y = (v_1, v_2, …, v_k), (I − W̃)v_k = λ_k v_k,
     equivalently (D − W)u_k = λ_k D u_k with u_k = D^{−1/2} v_k, and
     λ_1 + … + λ_k ≤ min J_Ncut(y_1, …, y_k)
     (Gu, et al, 2001)
  34. K-way Min-Max Cut Spectral Relaxation
     Unsigned cluster indicators: y_k = D^{1/2} h_k / ||D^{1/2} h_k||,  W̃ = D^{−1/2} W D^{−1/2}
     Re-write:
     J_MMC(y_1, …, y_k) = 1/(y_1^T W̃ y_1) + … + 1/(y_k^T W̃ y_k) − k
     Optimize: min_Y J_MMC(Y), subject to Y^T Y = I, y_k^T W̃ y_k > 0
     Theorem: the optimal solution is given by eigenvectors: Y = (v_1, v_2, …, v_k), W̃ v_k = λ_k v_k, with
     k²/(λ_1 + … + λ_k) − k ≤ min J_MMC(y_1, …, y_k)
     (Gu, et al, 2001)
  35. K-way Spectral Clustering
     • Embedding (similar to the PCA subspace approach)
       – Embed data points in the subspace of the K eigenvectors
       – Cluster the embedded points using another algorithm, such as K-means (Shi & Malik; Ng et al; Zha et al)
     • Recursive 2-way clustering (standard graph partitioning)
       – If the desired K is not a power of 2, how to optimally choose the next sub-cluster to split? (Ding, et al 2002)
     • Neither approach above uses the K-way clustering objective functions.
     • Refining the obtained clusters using the K-way clustering objective function typically improves the results (Ding et al 2002).
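The embedding approach can be sketched end-to-end: embed each node as a row of the first K eigenvectors of the normalized Laplacian, then cluster the rows with K-means. The three-block graph, the weights, and the deterministic one-seed-per-block initialization below are all my own illustrative assumptions, not from the tutorial:

```python
import numpy as np

# Three blocks of 5 nodes: dense within (0.9), sparse across (0.05).
n, k = 15, 3
blocks = np.repeat([0, 1, 2], 5)
W = np.where(blocks[:, None] == blocks[None, :], 0.9, 0.05)
np.fill_diagonal(W, 0)

d = W.sum(axis=1)
Lsym = np.eye(n) - np.diag(d**-0.5) @ W @ np.diag(d**-0.5)
vals, vecs = np.linalg.eigh(Lsym)
Y = vecs[:, :k]                       # embed node i as the i-th row of Y

def kmeans(X, seeds, iters=20):
    """Plain Lloyd iterations, seeded deterministically for reproducibility."""
    centers = X[seeds].copy()
    for _ in range(iters):
        dist = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        assign = dist.argmin(axis=1)
        for c in range(len(seeds)):
            members = X[assign == c]
            if len(members):
                centers[c] = members.mean(axis=0)
    return assign

assign = kmeans(Y, seeds=[0, 5, 10])  # one seed per block, assumed known here
```

In practice one would use a more robust K-means initialization; the fixed seeds keep this sketch deterministic.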
  36. DNA Gene Expression: Lymphoma Cancer (Alizadeh et al, 2000)
     [Figure: genes × tissue samples expression matrix]
     Effects of feature selection: select 900 genes out of 4025 genes
  37. Lymphoma Cancer Tissue Samples
     B-cell lymphoma goes through different stages:
     – 3 cancer stages
     – 3 normal stages
     Key question: can we detect them automatically?
     [Figure: PCA 2D display]
  38. [Figure-only slide]
  39. Brief summary of Part I
     • Spectral graph partitioning as origin
     • Clustering objective functions and solutions
     • Extensions to bipartite and directed graphs
     • Characteristics
       – Principled approach
       – Well-motivated objective functions
       – Clear and unambiguous
       – A framework of rich structures and contents
       – Everything is proved rigorously (within the relaxation framework, i.e., using continuous approximation of the discrete variables)
     • The above results were mostly done by 2001.
     • More to come in Part II
