Principal Component Analysis and Matrix Factorizations for Learning (Part 2), Chris Ding, ICML 2005 Tutorial

1. Part 2. Spectral Clustering from a Matrix Perspective

A brief tutorial emphasizing recent developments. (A more detailed tutorial was given at ICML'04.)
2. From PCA to Spectral Clustering Using Generalized Eigenvectors

Consider the kernel matrix $W_{ij} = \langle \phi(x_i), \phi(x_j) \rangle$. In kernel PCA we compute the eigenvectors of $W v = \lambda v$. The generalized eigenproblem is
$$W q = \lambda D q, \qquad D = \mathrm{diag}(d_1, \ldots, d_n), \quad d_i = \sum_j w_{ij}$$
This leads to spectral clustering!
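As a quick illustration (my own sketch, not code from the tutorial), the generalized eigenproblem $Wq = \lambda Dq$ can be solved directly with `scipy.linalg.eigh`; the two-blob dataset and the RBF similarity below are hypothetical stand-ins:

```python
import numpy as np
from scipy.linalg import eigh

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 0.3, (20, 2)),   # hypothetical blob 1
               rng.normal(3.0, 0.3, (20, 2))])  # hypothetical blob 2

# RBF kernel as the similarity matrix W_ij (one common choice).
sq_dists = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
W = np.exp(-sq_dists / 2.0)
D = np.diag(W.sum(axis=1))         # degree matrix, d_i = sum_j w_ij

# eigh(W, D) solves W q = lambda D q; eigenvalues are returned ascending.
evals, Q = eigh(W, D)
q2 = Q[:, -2]                      # second-largest eigenvalue's eigenvector
labels = (q2 > 0).astype(int)      # its sign splits the two clusters
```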
3. Indicator Matrix Quadratic Clustering Framework

Unsigned cluster indicator matrix $H = (h_1, \ldots, h_K)$.

Kernel K-means clustering: $\max_H \mathrm{Tr}(H^T W H)$, s.t. $H^T H = I$, $H \geq 0$.
K-means: $W = X^T X$; kernel K-means: $W = (\langle \phi(x_i), \phi(x_j) \rangle)$.

Spectral clustering (normalized cut): $\max_H \mathrm{Tr}(H^T W H)$, s.t. $H^T D H = I$, $H \geq 0$.
4. Brief Introduction to Spectral Clustering (Laplacian-matrix-based clustering)
5. Some Historical Notes

• Fiedler, 1973, 1975: graph Laplacian matrix
• Donath & Hoffman, 1973: bounds
• Hall, 1970: quadratic placement (embedding)
• Pothen, Simon & Liou, 1990: spectral graph partitioning (many related papers thereafter)
• Hagen & Kahng, 1992: Ratio-cut
• Chan, Schlag & Zien: multi-way Ratio-cut
• Chung, 1997: spectral graph theory book
• Shi & Malik, 2000: Normalized Cut
6. Spectral Gold-Rush of 2001: 9 papers on spectral clustering

• Meila & Shi, AI-Stat 2001: random-walk interpretation of Normalized Cut
• Ding, He & Zha, KDD 2001: perturbation analysis of the Laplacian matrix on sparsely connected graphs
• Ng, Jordan & Weiss, NIPS 2001: K-means on the embedded eigenspace
• Belkin & Niyogi, NIPS 2001: spectral embedding
• Dhillon, KDD 2001: bipartite graph clustering
• Zha et al., CIKM 2001: bipartite graph clustering
• Zha et al., NIPS 2001: spectral relaxation of K-means
• Ding et al., ICDM 2001: MinMaxCut, uniqueness of the relaxation
• Gu et al., 2001: K-way relaxation of NormCut and MinMaxCut
7. Spectral Clustering

Minimize the cut size, without explicit size constraints. But where to cut? We need to balance cluster sizes.
8. Graph Clustering

Minimize between-cluster similarities (weights): $\mathrm{sim}(A,B) = \sum_{i \in A} \sum_{j \in B} w_{ij}$.
Maximize within-cluster similarities (weights): $\mathrm{sim}(A,A) = \sum_{i \in A} \sum_{j \in A} w_{ij}$.
The partition should balance weight, size, or volume.
9. Clustering Objective Functions

With $s(A,B) = \sum_{i \in A} \sum_{j \in B} w_{ij}$:

• Ratio Cut: $J_{Rcut}(A,B) = \frac{s(A,B)}{|A|} + \frac{s(A,B)}{|B|}$

• Normalized Cut: with $d_A = \sum_{i \in A} d_i$,
$$J_{Ncut}(A,B) = \frac{s(A,B)}{d_A} + \frac{s(A,B)}{d_B} = \frac{s(A,B)}{s(A,A) + s(A,B)} + \frac{s(A,B)}{s(B,B) + s(A,B)}$$

• Min-Max-Cut: $J_{MMC}(A,B) = \frac{s(A,B)}{s(A,A)} + \frac{s(A,B)}{s(B,B)}$
10. Normalized Cut (Shi & Malik, 2000)

Minimize the similarity between A and B, $s(A,B) = \sum_{i \in A} \sum_{j \in B} w_{ij}$, while balancing the weights:
$$J_{Ncut}(A,B) = \frac{s(A,B)}{d_A} + \frac{s(A,B)}{d_B}, \qquad d_A = \sum_{i \in A} d_i$$

Cluster indicator, with $d = \sum_{i \in G} d_i$:
$$q(i) = \begin{cases} \sqrt{d_B / (d_A d)} & \text{if } i \in A \\ -\sqrt{d_A / (d_B d)} & \text{if } i \in B \end{cases}$$

Normalization: $q^T D q = 1$, $q^T D e = 0$. Substituting $q$ gives $J_{Ncut}(q) = q^T (D - W) q$; minimizing $q^T (D - W) q + \lambda (q^T D q - 1)$ shows the solution is an eigenvector of
$$(D - W) q = \lambda D q$$
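A minimal sketch of the resulting 2-way algorithm (the `normalized_cut` helper is my illustration, not the authors' code): solve $(D - W)q = \lambda Dq$, take the second-smallest eigenvector, and split by sign.

```python
import numpy as np
from scipy.linalg import eigh

def normalized_cut(W):
    """2-way normalized cut of a graph with symmetric similarity matrix W."""
    d = W.sum(axis=1)
    D = np.diag(d)
    # Generalized eigenproblem (D - W) q = lambda D q; eigh returns
    # eigenvalues ascending and D-orthonormal eigenvectors (q^T D q = 1).
    evals, Q = eigh(D - W, D)
    q = Q[:, 1]                    # index 0 is the trivial constant vector
    A = q >= 0                     # the sign of q assigns the two clusters
    s_AB = W[A][:, ~A].sum()       # between-cluster similarity s(A, B)
    jncut = s_AB / d[A].sum() + s_AB / d[~A].sum()
    return A, jncut
```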
11. A Simple Example

Two dense clusters, with sparse connections between them. [Figures: adjacency matrix; eigenvector $q_2$.]
12. K-way Spectral Clustering ($K \geq 2$)
13. K-way Clustering Objectives

• Ratio Cut:
$$J_{Rcut}(C_1, \ldots, C_K) = \sum_{\langle k,l \rangle} \left( \frac{s(C_k, C_l)}{|C_k|} + \frac{s(C_k, C_l)}{|C_l|} \right) = \sum_k \frac{s(C_k, G - C_k)}{|C_k|}$$

• Normalized Cut:
$$J_{Ncut}(C_1, \ldots, C_K) = \sum_{\langle k,l \rangle} \left( \frac{s(C_k, C_l)}{d_k} + \frac{s(C_k, C_l)}{d_l} \right) = \sum_k \frac{s(C_k, G - C_k)}{d_k}$$

• Min-Max-Cut:
$$J_{MMC}(C_1, \ldots, C_K) = \sum_{\langle k,l \rangle} \left( \frac{s(C_k, C_l)}{s(C_k, C_k)} + \frac{s(C_k, C_l)}{s(C_l, C_l)} \right) = \sum_k \frac{s(C_k, G - C_k)}{s(C_k, C_k)}$$
14. K-way Spectral Relaxation

Unsigned cluster indicators:
$$h_1 = (1 \cdots 1, 0 \cdots 0, 0 \cdots 0)^T, \quad h_2 = (0 \cdots 0, 1 \cdots 1, 0 \cdots 0)^T, \quad \ldots, \quad h_K = (0 \cdots 0, 0 \cdots 0, 1 \cdots 1)^T$$

Rewrite the objectives:
$$J_{Rcut}(h_1, \ldots, h_K) = \frac{h_1^T (D - W) h_1}{h_1^T h_1} + \cdots + \frac{h_K^T (D - W) h_K}{h_K^T h_K}$$
$$J_{Ncut}(h_1, \ldots, h_K) = \frac{h_1^T (D - W) h_1}{h_1^T D h_1} + \cdots + \frac{h_K^T (D - W) h_K}{h_K^T D h_K}$$
$$J_{MMC}(h_1, \ldots, h_K) = \frac{h_1^T (D - W) h_1}{h_1^T W h_1} + \cdots + \frac{h_K^T (D - W) h_K}{h_K^T W h_K}$$
15. K-way Normalized Cut Spectral Relaxation

Unsigned cluster indicators (the block of ones has length $n_k$):
$$y_k = D^{1/2} (0 \cdots 0, 1 \cdots 1, 0 \cdots 0)^T / \| D^{1/2} h_k \|$$

Rewrite:
$$J_{Ncut}(y_1, \ldots, y_K) = y_1^T (I - \tilde{W}) y_1 + \cdots + y_K^T (I - \tilde{W}) y_K = \mathrm{Tr}\,(Y^T (I - \tilde{W}) Y), \qquad \tilde{W} = D^{-1/2} W D^{-1/2}$$

Optimize: $\min_Y \mathrm{Tr}\,(Y^T (I - \tilde{W}) Y)$ subject to $Y^T Y = I$.

By Ky Fan's theorem, the optimal solution is given by eigenvectors $Y = (v_1, v_2, \ldots, v_K)$ with $(I - \tilde{W}) v_k = \lambda_k v_k$, equivalently $(D - W) u_k = \lambda_k D u_k$ with $u_k = D^{-1/2} v_k$, and
$$\lambda_1 + \cdots + \lambda_K \leq \min J_{Ncut}(y_1, \ldots, y_K)$$
(Gu et al., 2001)
16. K-way Spectral Clustering Is Difficult

• Spectral clustering is most directly applied to 2-way clustering: positive entries indicate one cluster, negative entries the other.
• For K-way ($K > 2$) clustering, mixed positive and negative signs make cluster assignment difficult. Common remedies:
  – Recursive 2-way clustering.
  – Low-dimensional embedding: project the data onto the eigenvector subspace, then use another clustering method such as K-means (Ng et al.; Zha et al.; Bach & Jordan, etc.); see the sketch below.
  – Linearized cluster assignment using spectral ordering and cluster crossing.
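A hedged sketch of the embedding route, in the spirit of Ng, Jordan & Weiss (2001); details such as the row normalization vary across papers, and the helper name is mine:

```python
import numpy as np
from scipy.linalg import eigh
from sklearn.cluster import KMeans

def spectral_kway(W, K):
    """Embed in the K leading eigenvectors of D^{-1/2} W D^{-1/2}, then K-means."""
    d = W.sum(axis=1)
    W_tilde = W / np.sqrt(np.outer(d, d))   # D^{-1/2} W D^{-1/2}
    _, V = eigh(W_tilde)                    # eigenvalues ascending
    Y = V[:, -K:]                           # n x K embedding (largest K)
    Y /= np.linalg.norm(Y, axis=1, keepdims=True)  # NJW row normalization
    return KMeans(n_clusters=K, n_init=10).fit_predict(Y)
```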
17. Scaled PCA: a Unified Framework for Clustering and Ordering

• Scaled PCA has two optimality properties: distance-sensitive ordering, and min-max-principle clustering.
• SPCA on a contingency table ⇒ correspondence analysis: simultaneous ordering, and simultaneous clustering, of rows and columns.
18. Scaled PCA

Given a similarity matrix $S = (s_{ij})$ (e.g. generated from $X X^T$), let $D = \mathrm{diag}(d_1, \ldots, d_n)$, $d_i = s_{i.}$.

Nonlinear re-scaling: $\tilde{S} = D^{-1/2} S D^{-1/2}$, i.e. $\tilde{s}_{ij} = s_{ij} / (s_{i.} s_{j.})^{1/2}$. Applying SVD (eigendecomposition) to $\tilde{S}$ gives
$$S = D^{1/2} \tilde{S} D^{1/2} = D^{1/2} \sum_k z_k \lambda_k z_k^T D^{1/2} = D \left[ \sum_k q_k \lambda_k q_k^T \right] D$$
where $q_k = D^{-1/2} z_k$ is the scaled principal component. Subtracting the trivial component $\lambda_0 = 1$, $z_0 = d^{1/2} / s_{..}^{1/2}$, $q_0 = 1$ gives
$$S - d d^T / s_{..} = D \sum_{k \geq 1} q_k \lambda_k q_k^T D$$
(Ding et al., 2002)
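A bare-bones sketch of this construction on a symmetric similarity matrix (the `scaled_pca` helper is hypothetical):

```python
import numpy as np
from scipy.linalg import eigh

def scaled_pca(S):
    """Scaled principal components q_k = D^{-1/2} z_k of a symmetric S."""
    d = S.sum(axis=1)                       # d_i = s_i.
    S_tilde = S / np.sqrt(np.outer(d, d))   # D^{-1/2} S D^{-1/2}
    evals, Z = eigh(S_tilde)                # eigenvalues ascending
    Q = Z / np.sqrt(d)[:, None]             # scaled components
    # The largest eigenvalue is 1 with q = 1 (trivial); drop it and
    # return the rest in descending order.
    return evals[-2::-1], Q[:, -2::-1]
```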
19. Scaled PCA on a Rectangular Matrix ⇒ Correspondence Analysis

Nonlinear re-scaling: $\tilde{P} = D_r^{-1/2} P D_c^{-1/2}$, i.e. $\tilde{p}_{ij} = p_{ij} / (p_{i.} p_{.j})^{1/2}$. Apply SVD to $\tilde{P}$ and subtract the trivial component:
$$P - r c^T / p_{..} = D_r \sum_{k \geq 1} f_k \lambda_k g_k^T D_c$$
where $r$ and $c$ are the vectors of row and column marginals $p_{i.}$ and $p_{.j}$, and
$$f_k = D_r^{-1/2} u_k, \qquad g_k = D_c^{-1/2} v_k$$
are the scaled row and column principal components (the standard coordinates in CA).
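The rectangular analogue, a bare-bones correspondence analysis (again a hypothetical helper; it assumes a nonnegative table with no zero row or column sums):

```python
import numpy as np

def correspondence_analysis(P):
    """Scaled row/column components of a nonnegative contingency table P."""
    P = P / P.sum()                          # normalize to proportions
    r, c = P.sum(axis=1), P.sum(axis=0)      # row and column marginals
    P_tilde = P / np.sqrt(np.outer(r, c))    # D_r^{-1/2} P D_c^{-1/2}
    U, s, Vt = np.linalg.svd(P_tilde, full_matrices=False)
    # The leading singular value is 1 (trivial component); drop it.
    F = U[:, 1:] / np.sqrt(r)[:, None]       # scaled row components f_k
    G = Vt[1:, :].T / np.sqrt(c)[:, None]    # scaled column components g_k
    return F, G, s[1:]
```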
20. Correspondence Analysis (CA)

• Mainly used for graphical display of data.
• Popular in France (Benzécri, 1969).
• Long history: simultaneous row and column regression (Hirschfeld, 1935); reciprocal averaging (Richardson & Kuder, 1933; Horst, 1935; Fisher, 1940; Hill, 1974); canonical correlations, dual scaling, etc.
• The formulation is a bit complicated ("convoluted", Jolliffe, 2002, p. 342).
• "A neglected method" (Hill, 1974).
21. Clustering of Bipartite Graphs (Rectangular Matrices)

Simultaneous clustering of rows and columns of a contingency table (adjacency matrix B). Examples of bipartite graphs:
• Information retrieval: word-by-document matrix
• Market-basket data: transaction-by-item matrix
• DNA gene expression profiles
• Protein vs. protein-complex
22. Bipartite Graph Clustering

Clustering indicators for rows and columns:
$$f(i) = \begin{cases} 1 & \text{if } r_i \in R_1 \\ -1 & \text{if } r_i \in R_2 \end{cases} \qquad g(i) = \begin{cases} 1 & \text{if } c_i \in C_1 \\ -1 & \text{if } c_i \in C_2 \end{cases}$$

$$B = \begin{pmatrix} B_{R_1,C_1} & B_{R_1,C_2} \\ B_{R_2,C_1} & B_{R_2,C_2} \end{pmatrix}, \qquad W = \begin{pmatrix} 0 & B \\ B^T & 0 \end{pmatrix}, \qquad q = \begin{pmatrix} f \\ g \end{pmatrix}$$

Substituting gives
$$J_{MMC}(C_1, C_2; R_1, R_2) = \frac{s(W_{12})}{s(W_{11})} + \frac{s(W_{12})}{s(W_{22})}$$

f and g are determined by
$$\left[ \begin{pmatrix} D_r & 0 \\ 0 & D_c \end{pmatrix} - \begin{pmatrix} 0 & B \\ B^T & 0 \end{pmatrix} \right] \begin{pmatrix} f \\ g \end{pmatrix} = \lambda \begin{pmatrix} D_r & 0 \\ 0 & D_c \end{pmatrix} \begin{pmatrix} f \\ g \end{pmatrix}$$
23. Spectral Clustering of Bipartite Graphs

Simultaneous clustering of rows and columns (adjacency matrix B), with $s(B_{R_1,C_2}) = \sum_{r_i \in R_1} \sum_{c_j \in C_2} b_{ij}$: minimize the between-cluster cut weights $s(R_1, C_2)$ and $s(R_2, C_1)$; maximize the within-cluster weights $s(R_1, C_1)$ and $s(R_2, C_2)$.
$$J_{MMC}(C_1, C_2; R_1, R_2) = \frac{s(B_{R_1,C_2}) + s(B_{R_2,C_1})}{2\, s(B_{R_1,C_1})} + \frac{s(B_{R_1,C_2}) + s(B_{R_2,C_1})}{2\, s(B_{R_2,C_2})}$$
(Ding, AI-STAT 2003)
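A hedged sketch of the bipartite construction (my code, not the paper's; it assumes every row and column of B has nonzero weight so that D is invertible):

```python
import numpy as np
from scipy.linalg import eigh

def cocluster(B):
    """2-way co-clustering of rows and columns of a nonnegative matrix B."""
    n, m = B.shape
    W = np.block([[np.zeros((n, n)), B],     # W = [[0, B], [B^T, 0]]
                  [B.T, np.zeros((m, m))]])
    D = np.diag(W.sum(axis=1))
    _, Q = eigh(D - W, D)                    # (D - W) q = lambda D q
    q = Q[:, 1]                              # second-smallest eigenvector
    f, g = q[:n], q[n:]                      # row part f, column part g
    return f >= 0, g >= 0                    # row clusters, column clusters
```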
24. Internet Newsgroups

Simultaneous clustering of documents and words. [Figure.]
25. Embedding in the Principal Subspace: Cluster Self-Aggregation

(proved by perturbation analysis; cf. Hall, 1970, "quadratic placement" (embedding) of a graph)
26. Spectral Embedding: Self-Aggregation

• Compute K eigenvectors of the Laplacian.
• Embed the objects in the K-dimensional eigenspace. (Ding, 2004)
27. Spectral Embedding Is Not Topology-Preserving

700 3-D data points form 2 interlocking rings; in eigenspace, the rings shrink and separate.
28. Spectral Embedding

Simplex Embedding Theorem. Objects self-aggregate to K centroids, and the centroids are located at the K corners of a simplex.
• The simplex consists of K basis vectors plus the coordinate origin.
• The simplex is rotated by an orthogonal transformation T.
• T is determined by perturbation analysis. (Ding, 2004)
29. Perturbation Analysis

$$W q = \lambda D q \quad \Longleftrightarrow \quad \hat{W} z = (D^{-1/2} W D^{-1/2})\, z = \lambda z, \qquad q = D^{-1/2} z$$

Assume the data has 3 dense clusters $C_1, C_2, C_3$, sparsely connected:
$$W = \begin{pmatrix} W_{11} & W_{12} & W_{13} \\ W_{21} & W_{22} & W_{23} \\ W_{31} & W_{32} & W_{33} \end{pmatrix}$$

The off-diagonal blocks are between-cluster connections, assumed small, and are treated as a perturbation. (Ding et al., KDD'01)
30. Spectral Perturbation Theorem

The orthogonal transform matrix $T = (t_1, \ldots, t_K)$ is determined by $\bar{\Gamma} t_k = \lambda_k t_k$, with the spectral perturbation matrix $\bar{\Gamma} = \Omega^{-1/2}\, \Gamma\, \Omega^{-1/2}$, where
$$\Gamma = \begin{pmatrix} h_{11} & -s_{12} & \cdots & -s_{1K} \\ -s_{21} & h_{22} & \cdots & -s_{2K} \\ \vdots & \vdots & & \vdots \\ -s_{K1} & -s_{K2} & \cdots & h_{KK} \end{pmatrix}$$
$$s_{pq} = s(C_p, C_q), \qquad h_{kk} = \sum_{p \neq k} s_{kp}, \qquad \Omega = \mathrm{diag}[\rho(C_1), \ldots, \rho(C_K)]$$
31. Connectivity Network

$$C_{ij} = \begin{cases} 1 & \text{if } i, j \text{ belong to the same cluster} \\ 0 & \text{otherwise} \end{cases}$$

Scaled PCA provides $C \cong D \sum_{k=1}^{K} q_k \lambda_k q_k^T D$.

Green's function: $C \approx G = \sum_{k=2}^{K} \frac{1}{1 - \lambda_k} q_k q_k^T$.

Projection matrix: $C \approx P \equiv \sum_{k=1}^{K} q_k q_k^T$. (Ding et al., 2002)
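A small sketch of estimating the connectivity network from the spectral projection (the sign threshold is my assumption, not from the slides):

```python
import numpy as np
from scipy.linalg import eigh

def connectivity_network(W, K):
    """Approximate C by the projection P = sum_k q_k q_k^T, then threshold."""
    d = W.sum(axis=1)
    W_tilde = W / np.sqrt(np.outer(d, d))    # D^{-1/2} W D^{-1/2}
    _, Z = eigh(W_tilde)
    Q = Z[:, -K:] / np.sqrt(d)[:, None]      # K leading scaled components
    P = Q @ Q.T                              # projection approximation of C
    return (P > 0).astype(int)               # 1 = estimated "same cluster"
```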
32. First-Order Perturbation: Example 1

[Figure: similarity matrix W; connectivity matrix from the first-order solution; $\lambda_2 = 0.300$, $0.268$.]

Between-cluster connections are suppressed and within-cluster connections enhanced: the effect of self-aggregation.
33. Optimality Properties of Scaled PCA

Scaled principal components have optimality properties:
• Ordering: adjacent objects along the order are similar; far-away objects along the order are dissimilar; the optimal solution for the permutation indexes is given by scaled PCA.
• Clustering: maximize within-cluster similarity; minimize between-cluster similarity; the optimal solution for the cluster membership indicators is given by scaled PCA.
34. Spectral Graph Ordering

(Barnard, Pothen & Simon, 1993) Envelope reduction of a sparse matrix: find an ordering such that the envelope is minimized:
$$\min \sum_i \max_j |i - j|\, w_{ij} \;\Rightarrow\; \min \sum_{ij} (x_i - x_j)^2 w_{ij}$$

(Hall, 1970) "Quadratic placement of a graph": find coordinates x to minimize
$$J = \sum_{ij} (x_i - x_j)^2 w_{ij} = x^T (D - W)\, x$$
The solutions are eigenvectors of the Laplacian.
35. Distance-Sensitive Ordering

Given a graph, find an optimal ordering of the nodes, i.e. permutation indexes $\pi: (1, \ldots, n) \mapsto (\pi_1, \ldots, \pi_n)$. Define
$$J_d(\pi) = \sum_{i=1}^{n-d} w_{\pi_i, \pi_{i+d}}, \qquad \min_\pi J(\pi) = \sum_{d=1}^{n-1} d^2\, J_d(\pi)$$
The larger the distance $d$, the larger the weight (penalty).
36. Distance-Sensitive Ordering (continued)

$$J(\pi) = \sum_{ij} (i - j)^2\, w_{\pi_i, \pi_j} = \sum_{ij} (\pi_i^{-1} - \pi_j^{-1})^2\, w_{ij}$$

Define the shifted and rescaled inverse permutation indexes
$$q_i = \frac{\pi_i^{-1} - (n+1)/2}{n/2} \in \left\{ \frac{1-n}{n}, \frac{3-n}{n}, \ldots, \frac{n-1}{n} \right\}$$
so that
$$J(\pi) = \frac{n^2}{8} \sum_{ij} (q_i - q_j)^2\, w_{ij} = \frac{n^2}{4}\, q^T (D - W)\, q$$
37. Distance-Sensitive Ordering (implementation)

Once $q_2$ is computed, since $q_2(i) < q_2(j) \Rightarrow \pi_i^{-1} < \pi_j^{-1}$, the inverse permutation $\pi^{-1}$ can be uniquely recovered from $q_2$. Implementation: sorting $q_2$ induces $\pi$ (see the sketch below).
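A minimal implementation sketch (hypothetical helper; per slides 34 and 36, the relaxed solution is an eigenvector of the Laplacian $D - W$):

```python
import numpy as np
from scipy.linalg import eigh

def spectral_ordering(W):
    """Distance-sensitive ordering: sort the Fiedler vector of D - W."""
    L = np.diag(W.sum(axis=1)) - W
    _, Q = eigh(L)
    q2 = Q[:, 1]                   # second-smallest eigenvector
    return np.argsort(q2)          # sorting q2 induces the permutation pi

# Reordering W by this permutation concentrates large weights near the
# diagonal: order = spectral_ordering(W); W[np.ix_(order, order)].
```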
38. Re-ordering of Genes and Tissues

[Figure: gene-expression matrix before and after re-ordering.] Quality ratios relative to a random ordering: $r = J(\pi) / J(\text{random}) = 0.18$ and $r_{d=1} = J_{d=1}(\pi) / J_{d=1}(\text{random}) = 3.39$.
39. Spectral Clustering vs. Spectral Ordering

• The continuous approximations of both integer programming problems are given by the same eigenvector.
• Different problems can thus share the same continuous approximate solution.
• Quality of the approximation: ordering has better quality, since the solution relaxes from a set of evenly spaced discrete values; clustering has lower quality, since the solution relaxes from only 2 discrete values.
40. Linearized Cluster Assignment

Turn spectral clustering into a 1-D clustering problem:
• Spectral ordering on the connectivity network.
• Cluster crossing: sum the similarities along anti-diagonals; this gives a 1-D curve with valleys and peaks; divide the valleys and peaks into clusters.
41. Cluster Overlap and Crossing

Given a similarity matrix W and clusters A, B:
• Cluster overlap: $s(A,B) = \sum_{i \in A} \sum_{j \in B} w_{ij}$.
• Cluster crossing computes a smaller fraction of the cluster overlap. It depends on an ordering $o$ and sums the weights that cross site $i$ along the order:
$$\rho(i) = \sum_{j=1}^{m} w_{o(i-j),\, o(i+j)}$$
• This is a sum along the anti-diagonals of W (see the sketch below).
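A sketch of the crossing computation (the window size `m` is an assumed parameter):

```python
import numpy as np

def cluster_crossing(W, order, m=5):
    """rho(i) = sum_{j=1..m} w_{o(i-j), o(i+j)} along the given ordering o."""
    Wo = W[np.ix_(order, order)]             # reorder rows and columns
    n = len(order)
    rho = np.zeros(n)
    for i in range(n):
        for j in range(1, m + 1):
            if 0 <= i - j and i + j < n:
                rho[i] += Wo[i - j, i + j]   # anti-diagonal sum around site i
    return rho                               # valleys suggest cluster cuts
```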
42. Cluster Crossing

[Figure: cluster-crossing curve.]
43. K-way Clustering Experiments

Accuracy of clustering results:

Method   Linearized Assignment   Recursive 2-way clustering   Embedding + K-means
Data A   89.0%                   82.8%                        75.1%
Data B   75.7%                   67.2%                        56.4%
44. Some Additional Advanced/Related Topics

• Random walks and normalized cut
• Semi-definite programming
• Sub-sampling in spectral clustering
• Extension to semi-supervised classification
• Green's function approach
• Out-of-sample embedding
