- 1. Paper digest: “Large-Scale Spectral Clustering on Graphs” (Akisato Kimura, akisato@ieee.org, @_akisato)
- 2. One-page abstract • Approximate acceleration of spectral clustering – by introducing additional nodes that compress the original graph, – yielding a bipartite graph on which spectral clustering is computationally efficient. • Note – Suited to large-scale spectral clustering, and works especially well on dense graphs. – Not suited to large-scale graph clustering in general, since such graphs are sparse by nature.
- 3. Spectral clustering [Shi & Malik 1997] • Notation – Undirected weighted graph $G = (V, E)$ – Number of nodes $n = |V|$; number of edges $m = |E|$ – Adjacency matrix $W = (W_{i,j})_{i,j=1,2,\dots,n}$ • Objective function (solved by eigendecomposition, EVD) – $\min_{X \in \mathbb{R}^{n \times k}} \operatorname{Tr}(X^\top D^{-1/2} L D^{-1/2} X)$ s.t. $X^\top X = I$ ($D$: degree matrix of $W$, $L = D - W$: graph Laplacian of $W$, $k$: number of clusters)
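A minimal NumPy sketch of the baseline objective on slide 3, assuming a dense, symmetric, non-negative $W$ with no isolated nodes. It returns the $k$ eigenvectors of $D^{-1/2} L D^{-1/2}$ with the smallest eigenvalues as $X$; the usual final k-means step on the rows of $X$ is left out. The full `np.linalg.eigh` call is the $O(n^3)$ step that the speed-ups on slide 4 target.

```python
import numpy as np


def spectral_embedding(W: np.ndarray, k: int) -> np.ndarray:
    """Minimizer X (n x k) of Tr(X^T D^{-1/2} L D^{-1/2} X) s.t. X^T X = I.

    Assumes W is dense, symmetric, non-negative, with every degree > 0.
    """
    deg = W.sum(axis=1)                       # diagonal of the degree matrix D
    d_inv_sqrt = 1.0 / np.sqrt(deg)
    L = np.diag(deg) - W                      # graph Laplacian L = D - W
    # D^{-1/2} L D^{-1/2}, computed by scaling rows and columns
    L_sym = d_inv_sqrt[:, None] * L * d_inv_sqrt[None, :]
    # Full symmetric EVD: O(n^3) time
    _, eigvecs = np.linalg.eigh(L_sym)        # eigenvalues in ascending order
    return eigvecs[:, :k]                     # orthonormal columns, so X^T X = I
```

On two disjoint triangles with $k = 2$, the rows of $X$ coincide within each triangle and are orthogonal across triangles, so k-means on the rows recovers the two components.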
- 4. Main contribution of this work • Spectral clustering (SC) needs $O(n^3)$ computation because of the EVD. • Previous speed-ups – Compressing the adjacency matrix with the Nyström method [Fowlkes+ 2004] – Reducing the number of samples (= nodes) [Shinnou & Sasaki 2008] [Yan+ 2009] [Sakai & Imiya 2009] [Chen & Cai 2011] – Early stopping of the EVD [Chen+ 2006] [Liu+ 2007] • In contrast, this work reduces the size of the graph itself.
- 5. Introducing supernodes • Why supernodes? Intuition from co-clustering – A partition of the supernodes can induce a partition of the observed (regular) nodes, and vice versa. • Generate a set of $d \ll n$ supernodes. [Figure: the original graph, and the same graph with supernodes added next to the regular nodes]
- 6. How to generate supernodes 1. Randomly choose $d$ regular nodes as seeds. 2. Compute the shortest paths from the seeds to the other regular nodes: i. convert adjacencies (edge weights) into distances; ii. apply Dijkstra’s algorithm. 3. Partition all the regular nodes into $d$ disjoint subsets based on the shortest paths (e.g., each node joins the subset of its nearest seed). 4. Each subset corresponds to one supernode.
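The four steps above can be sketched as a multi-source Dijkstra in pure Python. Two assumptions are ours, not the paper's: a weight $w$ is converted to the distance $1/w$, and each node joins the subset of its nearest seed. The function name `generate_supernodes` and the adjacency-list format are hypothetical. A single multi-source run yields the same nearest-seed partition (up to ties) as running Dijkstra separately from each of the $d$ seeds, which is what the $O(nd \log n)$ term on slide 14 suggests.

```python
import heapq
import random


def generate_supernodes(adj, d, seed=0):
    """Partition regular nodes into d supernodes around random seeds.

    adj: adjacency list of an undirected weighted graph, adj[u] = [(v, w_uv), ...],
         nodes numbered 0..n-1, all weights positive.
    Returns (seeds, label): label[u] is the index of the seed nearest to u,
    i.e. the supernode u belongs to (-1 if u is unreachable).
    """
    n = len(adj)
    seeds = random.Random(seed).sample(range(n), d)   # step 1: d random seeds
    dist = [float("inf")] * n
    label = [-1] * n
    heap = []
    for idx, s in enumerate(seeds):
        dist[s] = 0.0
        label[s] = idx
        heap.append((0.0, s))
    heapq.heapify(heap)
    # Step 2: multi-source Dijkstra; each node inherits the label of the seed
    # whose shortest path reaches it first.
    while heap:
        du, u = heapq.heappop(heap)
        if du > dist[u]:
            continue                                  # stale queue entry
        for v, w in adj[u]:
            nd = du + 1.0 / w                         # step 2i (assumed): distance = 1 / weight
            if nd < dist[v]:
                dist[v] = nd
                label[v] = label[u]                   # step 3: v joins u's subset
                heapq.heappush(heap, (nd, v))
    return seeds, label                               # step 4: one supernode per label
```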
- 7. After generating supernodes • $R \in \{0,1\}^{d \times n}$: binary bipartite graph linking the $n$ regular nodes to the $d$ supernodes • $\hat{W} = RW \in \mathbb{R}^{d \times n}$: bipartite graph called the “reduced graph”, obtained by propagating edge weights between regular nodes and supernodes
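A small NumPy sketch of this step, assuming every regular node has a valid supernode label (for instance from the nearest-seed partition of slide 6). The helper name `reduced_graph` is hypothetical. Each row of $\hat{W}$ sums the adjacency rows of one supernode's members, so the total edge weight is preserved.

```python
import numpy as np


def reduced_graph(W: np.ndarray, label: np.ndarray, d: int):
    """Build R in {0,1}^{d x n} and the reduced graph W_hat = R W.

    label[i] in 0..d-1 is the supernode of regular node i (no -1 labels).
    """
    n = W.shape[0]
    R = np.zeros((d, n))
    R[label, np.arange(n)] = 1.0   # R[j, i] = 1 iff node i belongs to supernode j
    # W_hat[j, i] = sum of W[l, i] over the members l of supernode j,
    # i.e. edge weights propagated from regular nodes to their supernodes
    W_hat = R @ W
    return R, W_hat
```

For example, with nodes 0 and 1 in supernode 0 and nodes 2 and 3 in supernode 1, the first row of $\hat{W}$ is the sum of the first two rows of $W$.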
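- 8. Spectral clustering on reduced graphs • Consider another representation of the reduced graph, $W'$: a square adjacency matrix over all $n$ regular nodes and $d$ supernodes. • Apply spectral clustering to $W'$. [Figure: block structure of $W'$ over the $n$ regular nodes and $d$ supernodes, and the result of spectral clustering on $W'$]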
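One way to write out the equations behind slides 8 and 9, assuming $W'$ is the symmetric adjacency matrix of the bipartite reduced graph (regular nodes first) and that the normalized objective of slide 3 is applied to it. The degree matrices $D_1, D_2$ and the normalization $Z = D_2^{-1/2}\hat{W}D_1^{-1/2}$ are our assumptions, following standard bipartite spectral co-clustering; they reproduce the relation $ZZ^\top y = (1-\lambda)^2 y$ stated on slide 9.

```latex
\begin{aligned}
& W' = \begin{pmatrix} 0 & \hat{W}^{\top} \\ \hat{W} & 0 \end{pmatrix} \in \mathbb{R}^{(n+d)\times(n+d)},
\quad
D' = \begin{pmatrix} D_1 & 0 \\ 0 & D_2 \end{pmatrix},
\quad D_1 = \operatorname{diag}(\hat{W}^{\top}\mathbf{1}),\ D_2 = \operatorname{diag}(\hat{W}\mathbf{1}) \\
& D'^{-1/2} (D' - W') D'^{-1/2} \begin{pmatrix} x \\ y \end{pmatrix} = \lambda \begin{pmatrix} x \\ y \end{pmatrix}
\iff
D'^{-1/2} W' D'^{-1/2} \begin{pmatrix} x \\ y \end{pmatrix} = (1-\lambda) \begin{pmatrix} x \\ y \end{pmatrix},
\quad x \in \mathbb{R}^{n},\ y \in \mathbb{R}^{d} \\
& Z := D_2^{-1/2} \hat{W} D_1^{-1/2} \in \mathbb{R}^{d \times n}
\;\Longrightarrow\;
Z^{\top} y = (1-\lambda)\, x, \qquad Z x = (1-\lambda)\, y \\
& \;\Longrightarrow\;
Z Z^{\top} y = (1-\lambda)\, Z x = (1-\lambda)^2\, y
\end{aligned}
```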
- 9. Spectral clustering on reduced graphs (cont.) • Spectral clustering on $W'$ exhibits a co-clustering structure: $x \in \mathbb{R}^n$ (regular nodes) and $y \in \mathbb{R}^d$ (supernodes) are the right and left singular vectors of $Z \in \mathbb{R}^{d \times n}$, respectively. • This can be simplified further: $y$ is also an eigenvector of $ZZ^\top \in \mathbb{R}^{d \times d}$, because $ZZ^\top y = Z(1-\lambda)x = (1-\lambda)^2 y$. ($ZZ^\top$ looks like a compressed representation of $W$.)
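A NumPy sketch of the simplification: eigendecompose the small $d \times d$ matrix $ZZ^\top$, then recover the regular-node vectors as $x = Z^\top y / (1-\lambda)$. It assumes the normalization $Z = D_2^{-1/2}\hat{W}D_1^{-1/2}$ (not confirmed by the slides), a non-negative $\hat{W}$ with no all-zero rows or columns, and $k \le d$. The name `reduced_spectral_embedding` is hypothetical. Forming $ZZ^\top$ costs $O(nd^2)$, the EVD $O(d^3)$, and the recovery $O(ndk)$, consistent with the terms listed for Alg. 1 on slide 14.

```python
import numpy as np


def reduced_spectral_embedding(W_hat: np.ndarray, k: int) -> np.ndarray:
    """Embedding U (n x k) of the regular nodes from the d x n reduced graph.

    Assumes W_hat is non-negative with no all-zero rows or columns, and k <= d.
    """
    deg_regular = W_hat.sum(axis=0)   # diagonal of D_1 (length n)
    deg_super = W_hat.sum(axis=1)     # diagonal of D_2 (length d)
    Z = W_hat / np.sqrt(deg_super)[:, None] / np.sqrt(deg_regular)[None, :]
    # Small d x d EVD instead of an (n + d) x (n + d) one: Z Z^T y = (1 - lambda)^2 y
    s2, Y = np.linalg.eigh(Z @ Z.T)              # ascending eigenvalues
    s2, Y = s2[::-1][:k], Y[:, ::-1][:, :k]      # largest (1 - lambda)^2 = smallest lambda
    sigma = np.sqrt(np.clip(s2, 0.0, None))      # singular values 1 - lambda of Z
    # Regular-node parts: x = Z^T y / (1 - lambda); requires sigma > 0
    return (Z.T @ Y) / sigma
```

The first column is the trivial vector proportional to $D_1^{1/2}\mathbf{1}$ (singular value 1); the remaining columns carry the cluster structure.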
- 10. In summary [Figure: overview of the algorithm, marking the steps described so far and the additional steps that follow]
- 11. Regenerating supernodes • Intuitions 1. The matrix $U \in \mathbb{R}^{n \times k}$ reflects the current clustering. 2. Most nodes in the same cluster are expected to be densely connected. • Method – Select the appropriate $k - 1$ vectors of $U$ (those with large eigenvalues) as new supernodes. [Figure: $n$ regular nodes, $d$ supernodes, $k$ cluster nodes, and $W$]
- 12. In detail • New links between regular nodes and supernodes, based on the average affiliation score over all the samples. • This results in $(k - 1)$ edges from every regular node. • Every edge stands for a binarized affiliation score. • So this idea can easily be extended to quantized affiliation scores with an arbitrary number of levels.
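A sketch of one reading of slides 11 and 12, under stated assumptions: column 0 of $U$ is the trivial vector and is dropped, each of the remaining $k - 1$ columns is binarized against its average over all nodes, and each column yields two supernodes ("high" and "low" affiliation). This gives exactly $k - 1$ edges per regular node, as the slide states, but the two-supernodes-per-vector layout and the name `regenerate_supernodes` are our assumptions.

```python
import numpy as np


def regenerate_supernodes(U: np.ndarray) -> np.ndarray:
    """New binary regular-to-supernode links R_new (2(k-1) x n) from U (n x k).

    Assumes column 0 of U is the trivial vector (largest singular value).
    """
    n, k = U.shape
    V = U[:, 1:]                       # the k - 1 informative vectors
    high = V >= V.mean(axis=0)         # binarized affiliation score per node and vector
    R_new = np.zeros((2 * (k - 1), n))
    for j in range(k - 1):
        R_new[2 * j, high[:, j]] = 1.0        # supernode "high affiliation with vector j"
        R_new[2 * j + 1, ~high[:, j]] = 1.0   # supernode "low affiliation with vector j"
    return R_new                       # every column sums to k - 1
```

Substituting `R_new` for $R$ in $\hat{W} = RW$ and rerunning the small-size spectral clustering gives one refinement iteration. The quantized extension mentioned on the slide would split each vector into more than two levels instead of two.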
- 13. Finally, the algorithm is as follows [Figure: algorithm listing with two blocks, “generating or updating supernodes” and “small-size spectral clustering”] • $l$ can be replaced by a function of $t$, $l(t)$.
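- 14. Computational costs • Alg. 1: steps 1–2 $O(nd \log n)$; steps 3–4 $O(md)$; step 6 $O(nd^2) + O(d^3)$; steps 7–9 $O(ndk)$; total $O(nd \log n + md + nd^2)$ • Other annotated steps: step 5 $O(nd)$; step 3 $O(nd \log n + m(d + 1))$; step 5 $O(mk)$ • Alg. 2: $O(mk)$ • Alg. 3: $O(nd \log n + md + mkt + n(d^2 + k^2 t))$ • If $d^2 \approx k^2 t \approx \log^2 n$, Alg. 3 runs in $O(n \log^2 n)$ (the same as modularity-based clustering).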
- 15. Data sets for experiments • 2 synthetic, 2 real-world – Syn-1K: kNN graph; Syn-100K: 100-ins & 40-outs – DBLP: author network with co-conference links – IMDB: movie network with co-director links • These look like moderate-scale (not large-scale) graphs…
- 16. Experimental results [Figure: comparison of Shortest Path (see slide 6), Proposed (Alg. 1), Proposed (Alg. 3), Spectral Clustering, [Khoa & Chawla 2012], and [Fowlkes+ 2004]] • The proposed method is suitable for dense graphs. (For sparse graphs, modularity-based clustering, at $O(n \log n)$ to $O(n \log^2 n)$, would be better.)
- 17. Detailed results • [Figure: performance of the proposed methods w.r.t. the parameter $d$ (number of supernodes)] Why is it not monotonically increasing? • [Figure: performance of the proposed methods w.r.t. the parameter $t$ (number of iterations)]
- 18. Qualitative evaluations • Toy example on Syn-1K [Figure panels: ground truth, k-NN graph, SP, Proposed 1, Proposed 2 (5 iterations), SC, RESC, Nyström]
- 19. Comments • The idea and technique are interesting and possibly versatile. • A (serial or parallel) implementation would be quite simple. – MATLAB code is available at http://jialu.cs.illinois.edu/publication • The method might be suitable only for clustering dense graphs (with features).
