Spectral Clustering Tutorial

Notes
  • In this presentation, I will mainly talk about how to find a good partitioning of a given graph using spectral properties of that graph. In mathematics, the spectrum of a (finite-dimensional) matrix is the set of its eigenvalues.
  • In virtually every scientific field dealing with empirical data, people attempt to get a first impression on their data by trying to identify groups of “similar behavior” in their data. Connectivity models: hierarchical clustering; Centroid models: k-means; Distribution models: Mixture of Gaussian.
  • vol(A) measures the size of A by summing over the weights of all edges attached to vertices in A.
  • 𝜀-neighborhood graph: the distances between all connected points are roughly of the same scale.
  • As each Li is a graph Laplacian of a connected graph, we know that every Li has eigenvalue 0 with multiplicity 1, and the corresponding eigenvector is the constant one vector on the i-th connected component. Thus, the matrix L has as many eigenvalues 0 as there are connected components, and the corresponding eigenvectors are the indicator vectors of the connected components.
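A quick numeric check of this multiplicity property (a minimal sketch on a made-up 5-vertex weight matrix, using numpy; not part of the original slides):

```python
# Two connected components: {0, 1, 2} and {3, 4}, with no edges between them.
import numpy as np

W = np.array([
    [0, 1, 1, 0, 0],
    [1, 0, 1, 0, 0],
    [1, 1, 0, 0, 0],
    [0, 0, 0, 0, 1],
    [0, 0, 0, 1, 0],
], dtype=float)

L = np.diag(W.sum(axis=1)) - W          # unnormalized Laplacian L = D - W
eigvals, eigvecs = np.linalg.eigh(L)    # eigenvalues in ascending order
print(np.round(eigvals, 6))             # the first two eigenvalues are 0
# The corresponding eigenvectors span the space of the two component
# indicator vectors (eigh may return any orthonormal basis of that space).
print(np.round(eigvecs[:, :2], 3))
```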
  • It is due to the properties of the graph Laplacians that this change of representation is useful. We will see in the next sections that it enhances the cluster properties in the data, so that clusters can be trivially detected in the new representation.
  • 10-nearest neighbor graph: we can see that the first four eigenvalues are 0, and the corresponding eigenvectors are cluster indicator vectors. The reason is that the clusters form disconnected parts in the 10-nearest neighbor graph, in which case the eigenvectors are given as in Propositions 2 and 4. Fully connected graph: this graph consists of only one connected component, so eigenvalue 0 has multiplicity 1 and the first eigenvector is the constant vector; thresholding the second eigenvector at 0 already separates clusters, and the first four eigenvectors carry all the information about the four clusters.
  • The results are surprisingly good: Even for clusters that do not form convex regions or that are not cleanly separated, the algorithm reliably finds clusterings consistent with what a human would have chosen.
  • This problem can be restated as follows: we want to find a partition of the graph such that the edges between different groups have a very low weight (which means that points in different clusters are dissimilar from each other) and the edges within a group have high weight (which means that points within the same cluster are similar to each other).
  • We introduce the factor 1/2 for notational consistency; otherwise we would count each edge twice in the cut.
  • Spectral clustering is a way to solve relaxed versions of those problems.
  • Magic happens!!!
  • Again we need to re-convert the real valued solution matrix to a discrete partition. As above, the standard way is to use the k-means algorithms on the rows of U.
  • It is well known that many properties of a graph can be expressed in terms of the corresponding random walk transition matrix P.
  • The embedding used in unnormalized spectral clustering is related to the commute time embedding, but not identical.
  • Transcript

    • 1. Spectral Clustering Zitao Liu
    • 2. Agenda • Brief Clustering Review • Similarity Graph • Graph Laplacian • Spectral Clustering Algorithm • Graph Cut Point of View • Random Walk Point of View • Perturbation Theory Point of View • Practical Details
    • 3. • Brief Clustering Review • Similarity Graph • Graph Laplacian • Spectral Clustering Algorithm • Graph Cut Point of View • Random Walk Point of View • Perturbation Theory Point of View • Practical Details
    • 4. CLUSTERING REVIEW
    • 5. K-MEANS CLUSTERING • Description: Given a set of observations (x1, x2, …, xn), where each observation is a d-dimensional real vector, k-means clustering aims to partition the n observations into k sets (k ≤ n) S = {S1, S2, …, Sk} so as to minimize the within-cluster sum of squares (WCSS), Σ_{i=1..k} Σ_{x ∈ Si} ||x − μi||², where μi is the mean of points in Si. • Standard Algorithm: 1) k initial "means" (in this case k = 3) are randomly selected from the data set. 2) k clusters are created by associating every observation with the nearest mean. 3) The centroid of each of the k clusters becomes the new mean. 4) Steps 2 and 3 are repeated until convergence has been reached.
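A minimal numpy sketch of the standard algorithm described on this slide (Lloyd's iterations; the function name and parameters are mine, and empty clusters are not handled):

```python
import numpy as np

def kmeans(X, k, n_iter=100, seed=0):
    """Lloyd's algorithm, following the four steps on the slide."""
    rng = np.random.default_rng(seed)
    # 1) k initial "means" are randomly selected from the data set
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        # 2) k clusters are created by associating every observation
        #    with the nearest mean
        dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # 3) the centroid of each of the k clusters becomes the new mean
        new_centers = np.array([X[labels == j].mean(axis=0) for j in range(k)])
        # 4) steps 2 and 3 are repeated until convergence has been reached
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels, centers
```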
    • 6. • Brief Clustering Review • Similarity Graph • Graph Laplacian • Spectral Clustering Algorithm • Graph Cut Point of View • Random Walk Point of View • Perturbation Theory Point of View • Practical Details
    • 7. GENERAL: Disconnected graph components correspond to groups of points (weak connections between components, strong connections within components).
    • 8. GRAPH NOTATION
    • 9. GRAPH NOTATION
    • 10. SIMILARITY GRAPH: All of the above graphs are regularly used in spectral clustering!
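The three graph constructions from these slides, sketched in numpy (dense distance matrices only; the parameter names eps, k and sigma are mine):

```python
import numpy as np

def _sq_dists(X):
    """Pairwise squared Euclidean distances."""
    diff = X[:, None, :] - X[None, :, :]
    return (diff ** 2).sum(axis=2)

def eps_graph(X, eps):
    """epsilon-neighborhood graph: connect points closer than eps (unweighted,
    since all connected distances are roughly of the same scale eps)."""
    W = (_sq_dists(X) <= eps ** 2).astype(float)
    np.fill_diagonal(W, 0.0)
    return W

def knn_graph(X, k, mutual=False):
    """k-nearest-neighbor graph (symmetrized) or mutual kNN graph."""
    d2 = _sq_dists(X)
    np.fill_diagonal(d2, np.inf)
    idx = np.argsort(d2, axis=1)[:, :k]          # k nearest neighbors per point
    W = np.zeros_like(d2)
    W[np.repeat(np.arange(len(X)), k), idx.ravel()] = 1.0
    # "or"-symmetrization for the usual kNN graph, "and" for the mutual one
    return np.minimum(W, W.T) if mutual else np.maximum(W, W.T)

def fully_connected_graph(X, sigma):
    """Fully connected graph with Gaussian similarity
    s(x_i, x_j) = exp(-||x_i - x_j||^2 / (2 sigma^2))."""
    W = np.exp(-_sq_dists(X) / (2 * sigma ** 2))
    np.fill_diagonal(W, 0.0)
    return W
```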
    • 11. Spectral Clustering• Brief Clustering Review• Similarity Graph• Graph Laplacian• Spectral Clustering Algorithm• Graph Cut Point of View• Random Walk Point of View• Perturbation Theory Point of View• Practical Details
    • 12. GRAPH LAPLACIANS • Unnormalized Graph Laplacian L = D - W
    • 13. GRAPH LAPLACIANS • Unnormalized Graph Laplacian L = D - W
    • 14. GRAPH LAPLACIANS • Unnormalized Graph Laplacian L = D - W. Proof:
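The central identity behind this proof (as in von Luxburg's tutorial) is that for every vector f, f'Lf = 1/2 Σij wij (fi − fj)², so L is symmetric and positive semi-definite. A small numeric check of that identity, on random symmetric weights (my own sketch, not from the slides):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 6
W = rng.random((n, n))
W = (W + W.T) / 2                      # symmetric, non-negative weights
np.fill_diagonal(W, 0.0)

L = np.diag(W.sum(axis=1)) - W         # L = D - W

f = rng.standard_normal(n)
lhs = f @ L @ f
rhs = 0.5 * np.sum(W * (f[:, None] - f[None, :]) ** 2)
print(np.isclose(lhs, rhs))            # True: f'Lf is a sum of squares, so L is PSD
```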
    • 15. GRAPH LAPLACIANS • Normalized Graph Laplacian
    • 16. GRAPH LAPLACIANS • Normalized Graph Laplacian
    • 17. GRAPH LAPLACIANS • Normalized Graph Laplacian. Proof: The proof is analogous to the one of Proposition 2, using Proposition 3.
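For reference, the two normalized Laplacians used in the tutorial, written out in numpy (a sketch assuming every vertex has positive degree):

```python
import numpy as np

def normalized_laplacians(W):
    """Return L_sym = I - D^{-1/2} W D^{-1/2} and L_rw = I - D^{-1} W."""
    d = W.sum(axis=1)                   # vertex degrees (assumed > 0)
    L = np.diag(d) - W                  # unnormalized Laplacian
    D_inv_sqrt = np.diag(1.0 / np.sqrt(d))
    L_sym = D_inv_sqrt @ L @ D_inv_sqrt
    L_rw = np.diag(1.0 / d) @ L
    return L_sym, L_rw
```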
    • 18. Spectral Clustering• Brief Clustering Review• Similarity Graph• Graph Laplacian• Spectral Clustering Algorithm• Graph Cut Point of View• Random Walk Point of View• Perturbation Theory Point of View• Practical Details
    • 19. ALGORITHM
    • 20. ALGORITHM • Unnormalized Graph Laplacian L = D - W
    • 21. ALGORITHM • Normalized Graph Laplacian
    • 22. ALGORITHM • Normalized Graph Laplacian
    • 23. ALGORITHM: It really works!!!
    • 24. ALGORITHM
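A compact sketch of the two algorithm variants in this section (assumes scipy and scikit-learn are available; the normalized variant solves the generalized eigenproblem Lu = λDu, as in Shi and Malik, 2000):

```python
import numpy as np
from scipy.linalg import eigh
from sklearn.cluster import KMeans

def spectral_clustering(W, k, normalized=True):
    """Spectral clustering of a weighted adjacency matrix W into k clusters."""
    d = W.sum(axis=1)                   # degrees, assumed positive
    D = np.diag(d)
    L = D - W                           # unnormalized graph Laplacian
    if normalized:
        # first k generalized eigenvectors of L u = lambda D u
        # (equivalently, the first k eigenvectors of L_rw)
        _, vecs = eigh(L, D)
    else:
        # first k eigenvectors of the unnormalized Laplacian
        _, vecs = eigh(L)
    U = vecs[:, :k]                     # rows of U are the new representation
    return KMeans(n_clusters=k, n_init=10).fit_predict(U)
```

As discussed later in the deck, the unnormalized variant corresponds to the relaxed RatioCut problem and the normalized variant to the relaxed Ncut problem.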
    • 25. Spectral Clustering • Brief Clustering Review • Similarity Graph • Graph Laplacian • Spectral Clustering Algorithm • Graph Cut Point of View • Random Walk Point of View • Perturbation Theory Point of View • Practical Details
    • 26. GRAPH CUT: Disconnected graph components correspond to groups of points (weak connections between components, strong connections within components).
    • 27. GRAPH CUT
    • 28. GRAPH CUT Problems!!! • Sensitive to outliers (what we get vs. what we want)
    • 29. GRAPH CUT Solutions • RatioCut (Hagen and Kahng, 1992) • Ncut (Shi and Malik, 2000)
    • 30. GRAPH CUT Problem!!! • NP hard. Solution!!! • Approximation: relaxing Ncut gives Normalized Spectral Clustering; relaxing RatioCut gives Unnormalized Spectral Clustering.
    • 31. GRAPH CUT • Approximation RatioCut for k = 2. Our goal is to solve the optimization problem. Rewrite the problem in a more convenient form. Magic happens!!!
    • 32. GRAPH CUT • Approximation RatioCut for k = 2
    • 33. GRAPH CUT • Approximation RatioCut for k = 2. Additionally, f satisfies:
    • 34. GRAPH CUT • Approximation RatioCut for k = 2. Relaxation!!! Rayleigh-Ritz Theorem
    • 35. GRAPH CUT • Approximation RatioCut for k = 2. f is the eigenvector corresponding to the second smallest eigenvalue of L. Use the sign of f as the indicator re-conversion function (only works for k = 2); a more general construction works for any k.
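The k = 2 recipe on this slide in code form (numpy only; a sketch for a connected graph):

```python
import numpy as np

def ratiocut_bipartition(W):
    """Split a connected graph into two clusters using the sign of the
    eigenvector of L = D - W for the second smallest eigenvalue."""
    L = np.diag(W.sum(axis=1)) - W
    _, vecs = np.linalg.eigh(L)         # eigenvalues in ascending order
    f = vecs[:, 1]                      # relaxed cluster indicator vector
    return (f >= 0).astype(int)         # threshold the sign: labels 0 / 1
```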
    • 36. GRAPH CUT • Approximation RatioCut for arbitrary k (i = 1, …, n; j = 1, …, k)
    • 37. GRAPH CUT • Approximation RatioCut for arbitrary k. Problem reformulation. Relaxation!!! Rayleigh-Ritz Theorem. The optimal H contains the first k eigenvectors of L as columns.
    • 38. GRAPH CUT • Approximation Ncut for k = 2. Our goal is to solve the optimization problem. Rewrite the problem in a more convenient form. Similarly to the above, one can check that:
    • 39. GRAPH CUT • Approximation Ncut for k = 2. Relaxation!!! Rayleigh-Ritz Theorem!!!
    • 40. GRAPH CUT • Approximation Ncut for arbitrary k. Problem reformulation. Relaxation!!! Rayleigh-Ritz Theorem
    • 41. Spectral Clustering • Brief Clustering Review • Similarity Graph • Graph Laplacian • Spectral Clustering Algorithm • Graph Cut Point of View • Random Walk Point of View • Perturbation Theory Point of View • Practical Details
    • 42. RANDOM WALK • A random walk on a graph is a stochastic process which randomly jumps from vertex to vertex. • A random walk stays long within the same cluster and seldom jumps between clusters. • A balanced partition with a low cut will also have the property that the random walk does not have many opportunities to jump between clusters.
    • 43. RANDOM WALK
    • 44. RANDOM WALK
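The transition matrix of this random walk is P = D⁻¹W (so pij = wij / di), and it is related to the normalized Laplacian by L_rw = I − P; a tiny sketch:

```python
import numpy as np

def transition_matrix(W):
    """Row-stochastic transition matrix P = D^{-1} W of the graph random walk."""
    d = W.sum(axis=1)                   # degrees, assumed positive
    return W / d[:, None]               # each row sums to 1

# For a connected, non-bipartite graph the walk has the unique stationary
# distribution pi_i = d_i / vol(V), where vol(V) is the sum of all degrees.
```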
    • 45. RANDOM WALK • Random walks and Ncut
    • 46. RANDOM WALK • Random walks and Ncut. Proof: First of all, observe that … Using this we obtain … Now the proposition follows directly from the definition of Ncut.
    • 47. RANDOM WALK • Random walks and Ncut
    • 48. RANDOM WALK • What is commute distance? Points which are connected by a short path in the graph and lie in the same high-density region of the graph are considered closer to each other than points which are connected by a short path but lie in different high-density regions of the graph. Well suited for clustering.
    • 49. RANDOM WALK • How to calculate commute distance: generalized inverse (also called pseudo-inverse or Moore-Penrose inverse)
    • 50. RANDOM WALK • How to calculate commute distance. This result was published by Klein and Randic (1993), where it was proved by methods of electrical network theory.
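In code, the commute distances follow directly from the Moore-Penrose pseudo-inverse of the unnormalized Laplacian, cij = vol(V) (l⁺ii − 2 l⁺ij + l⁺jj) (a dense-matrix sketch; the function name is mine):

```python
import numpy as np

def commute_distances(W):
    """Matrix of commute distances c_ij = vol(V) * (l+_ii - 2 l+_ij + l+_jj)."""
    d = W.sum(axis=1)
    vol = d.sum()                          # vol(V): sum of all vertex degrees
    L = np.diag(d) - W
    Lp = np.linalg.pinv(L)                 # Moore-Penrose pseudo-inverse of L
    diag = np.diag(Lp)
    return vol * (diag[:, None] + diag[None, :] - 2 * Lp)
```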
    • 51. RANDOM WALK • Proposition 6's consequence: construct an embedding
    • 52. RANDOM WALK • A loose relation between spectral clustering and commute distance.
    • 53. Spectral Clustering • Brief Clustering Review • Similarity Graph • Graph Laplacian • Spectral Clustering Algorithm • Graph Cut Point of View • Random Walk Point of View • Perturbation Theory Point of View • Practical Details
    • 54. PERTURBATION THEORY • Perturbation theory studies the question of how eigenvalues and eigenvectors of a matrix A change if we add a small perturbation H.
    • 55. PERTURBATION THEORY: Below we will see that the size of the eigengap can also be used in a different context as a quality criterion for spectral clustering, namely when choosing the number k of clusters to construct.
    • 56. Spectral Clustering • Brief Clustering Review • Similarity Graph • Graph Laplacian • Spectral Clustering Algorithm • Graph Cut Point of View • Random Walk Point of View • Perturbation Theory Point of View • Practical Details
    • 57. PRACTICAL DETAILS • Constructing the similarity graph
    • 58. PRACTICAL DETAILS • Computing Eigenvectors: 1. How to compute the first eigenvectors efficiently for large L. 2. Numerical eigensolvers converge to some orthonormal basis of the eigenspace. • Number of Clusters
    • 59. PRACTICAL DETAILS (Well Separated / More Blurry / Overlap So Much): The eigengap heuristic usually works well if the data contains very well pronounced clusters, but in ambiguous cases it also returns ambiguous results.
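A sketch tying these two practical points together: scipy's sparse eigensolver addresses point 1 above (computing only the first eigenvalues of a large, sparse L), and the eigengap heuristic picks k where the gap between consecutive eigenvalues is largest (function and parameter names are mine; assumes the graph has more than k_max + 1 vertices):

```python
import numpy as np
from scipy.sparse import csr_matrix, diags
from scipy.sparse.linalg import eigsh

def choose_k_by_eigengap(W, k_max=10):
    """Return the k <= k_max with the largest gap between consecutive
    eigenvalues of the unnormalized Laplacian."""
    W = csr_matrix(W)
    d = np.asarray(W.sum(axis=1)).ravel()
    L = diags(d) - W
    # smallest k_max + 1 eigenvalues only (which="SM"; in practice a
    # shift-invert strategy converges faster on large graphs)
    vals = eigsh(L, k=k_max + 1, which="SM", return_eigenvectors=False)
    vals = np.sort(vals)
    gaps = np.diff(vals)
    return int(np.argmax(gaps)) + 1
```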
    • 60. PRACTICAL DETAILS • The k-means step: it is not strictly necessary; people also use other techniques. • Which graph Laplacian should be used?
    • 61. PRACTICAL DETAILS • Which graph Laplacian should be used? Why is normalized better than unnormalized spectral clustering? Objective 1: both RatioCut and Ncut directly implement it. Objective 2: only Ncut implements it. Normalized spectral clustering implements both clustering objectives mentioned above, while unnormalized spectral clustering only implements the first objective.
    • 62. PRACTICAL DETAILS • Which graph Laplacian should be used?
    • 63. REFERENCES • Ulrike von Luxburg. A Tutorial on Spectral Clustering. Max Planck Institute for Biological Cybernetics, Technical Report No. TR-149. • Marina Meila and Jianbo Shi. A random walks view of spectral segmentation. In Tenth International Workshop on Artificial Intelligence and Statistics (AISTATS), 2001. • A. Y. Ng, M. I. Jordan, and Y. Weiss. On spectral clustering: Analysis and an algorithm. Advances in Neural Information Processing Systems (NIPS), 14, 2001. • A. Azran and Z. Ghahramani. A New Approach to Data Driven Clustering. In International Conference on Machine Learning (ICML), 2006. • A. Azran and Z. Ghahramani. Spectral Methods for Automatic Multiscale Data Clustering. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2006. • Arik Azran. A Tutorial on Spectral Clustering. http://videolectures.net/mlcued08_azran_mcl/
    • 64. SPECTRAL CLUSTERING Thank you