Presented by Danushka Bollegala
• Spectrum = the set of eigenvalues
• By looking at the spectrum we can learn about the graph itself!
• A way of normalizing data (a canonical form) and then performing clustering (e.g. via k-means) in this normalized/reduced space.
• Input: A similarity matrix
• Output: A set of (non-overlapping/hard) clusters.
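A minimal sketch of this input/output contract, assuming scikit-learn is available (the toy similarity matrix below is invented for illustration):

```python
import numpy as np
from sklearn.cluster import SpectralClustering

# Toy similarity matrix for 6 points: two obvious blocks of mutually
# similar points, weakly connected to each other.
S = np.array([
    [1.0, 0.9, 0.8, 0.1, 0.0, 0.1],
    [0.9, 1.0, 0.9, 0.0, 0.1, 0.0],
    [0.8, 0.9, 1.0, 0.1, 0.0, 0.1],
    [0.1, 0.0, 0.1, 1.0, 0.9, 0.8],
    [0.0, 0.1, 0.0, 0.9, 1.0, 0.9],
    [0.1, 0.0, 0.1, 0.8, 0.9, 1.0],
])

# affinity='precomputed' means S is used directly as the affinity matrix;
# internally the points are embedded via Laplacian eigenvectors and then
# clustered with k-means, as described above.
labels = SpectralClustering(n_clusters=2, affinity='precomputed',
                            random_state=0).fit_predict(S)
print(labels)   # one hard, non-overlapping label per point, e.g. [0 0 0 1 1 1]
```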
• Undirected graph G(V, E)
• V: set of vertices (nodes in the network)
• E: set of edges (links in the network)
▪ Weight w_ij is the weight of the edge connecting vertices i and j (represented by the affinity matrix).
• Degree: the sum of the weights of the edges incident on a vertex, d_i = Σ_j w_ij.
• Measuring the size of a subset A of V: either |A|, the number of vertices in A, or vol(A) = Σ_{i∈A} d_i, the sum of their degrees (see the sketch below).
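A numpy sketch of these quantities (the affinity matrix W is a made-up example):

```python
import numpy as np

W = np.array([[0., 1., 1., 0.],
              [1., 0., 1., 0.],
              [1., 1., 0., 1.],
              [0., 0., 1., 0.]])   # symmetric affinity matrix

d = W.sum(axis=1)        # degree of each vertex: d_i = sum_j w_ij
A = [0, 1, 2]            # a subset of V, as vertex indices
size_A = len(A)          # |A|: the number of vertices in A
vol_A = d[A].sum()       # vol(A): the sum of degrees of vertices in A
print(d, size_A, vol_A)  # [2. 2. 3. 1.] 3 7.0
```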
• How to create the affinity matrix W from the similarity matrix S? Three common constructions (sketched in code below):
• ε-neighborhood graph
▪ Connect all pairs of vertices whose similarity is greater than ε.
• k-nearest neighbor graph
▪ Connect the k nearest neighbors of each vertex.
▪ Use mutual k-nearest neighbor graphs to handle the asymmetry of the k-NN relation in S.
• Fully connected graph
▪ Use the Gaussian similarity function (kernel): w_ij = exp(-||x_i - x_j||^2 / (2σ^2)).
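Sketches of the three constructions (the helper names are mine; S is a symmetric similarity matrix, X a raw data matrix):

```python
import numpy as np

def eps_neighborhood(S, eps):
    """Connect vertices whose similarity exceeds eps (typically unweighted)."""
    W = (S > eps).astype(float)
    np.fill_diagonal(W, 0.0)
    return W

def knn_graph(S, k, mutual=False):
    """Connect each vertex to its k most similar vertices.

    The k-NN relation is asymmetric; mutual=True keeps an edge only when
    both endpoints are among each other's k nearest neighbors."""
    n = S.shape[0]
    Ssym = S.copy()
    np.fill_diagonal(Ssym, -np.inf)          # never choose self as a neighbor
    nn = np.argsort(-Ssym, axis=1)[:, :k]    # k most similar vertices per row
    A = np.zeros((n, n))
    A[np.arange(n)[:, None], nn] = 1.0       # directed k-NN adjacency
    W = np.minimum(A, A.T) if mutual else np.maximum(A, A.T)
    return W * S                             # weight kept edges by similarity

def gaussian_graph(X, sigma):
    """Fully connected: w_ij = exp(-||x_i - x_j||^2 / (2 sigma^2))."""
    sq = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    W = np.exp(-sq / (2.0 * sigma ** 2))
    np.fill_diagonal(W, 0.0)
    return W
```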
• L = D - W
• D: degree matrix. A diagonal matrix diag(d_1, ..., d_n) with d_i = Σ_j w_ij.
• Properties
• For every vector f ∈ R^n: f^T L f = (1/2) Σ_{i,j} w_ij (f_i - f_j)^2.
• L is symmetric and positive semi-definite.
• The smallest eigenvalue of L is zero and the corresponding eigenvector is the constant vector 1 = (1, ..., 1)^T.
• L has n non-negative, real-valued eigenvalues 0 = λ_1 ≤ λ_2 ≤ ... ≤ λ_n.
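A quick numerical check of all four properties (a sketch, reusing the small example graph):

```python
import numpy as np

W = np.array([[0., 1., 1., 0.],
              [1., 0., 1., 0.],
              [1., 1., 0., 1.],
              [0., 0., 1., 0.]])
D = np.diag(W.sum(axis=1))            # degree matrix diag(d_1, ..., d_n)
L = D - W                             # unnormalized graph Laplacian

# Quadratic form identity: f^T L f = 0.5 * sum_ij w_ij (f_i - f_j)^2
f = np.random.randn(4)
lhs = f @ L @ f
rhs = 0.5 * (W * (f[:, None] - f[None, :]) ** 2).sum()
print(np.isclose(lhs, rhs))           # True

eigvals, eigvecs = np.linalg.eigh(L)  # eigh: symmetric solver, sorted ascending
print(np.allclose(L, L.T))            # True: L is symmetric
print(eigvals)                        # all >= 0 (up to rounding): PSD
print(np.isclose(eigvals[0], 0.0))    # smallest eigenvalue is zero
print(eigvecs[:, 0])                  # proportional to (1, 1, 1, 1)^T
```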
• Two versions exist
• L_sym = D^(-1/2) L D^(-1/2) = I - D^(-1/2) W D^(-1/2)
• L_rw = D^(-1) L = I - D^(-1) W
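A small sketch computing both versions with numpy (the function name is mine; W must have strictly positive degrees for the inverses to exist):

```python
import numpy as np

def normalized_laplacians(W):
    """Return (L_sym, L_rw) for an affinity matrix W with positive degrees."""
    d = W.sum(axis=1)                          # vertex degrees
    D_inv_sqrt = np.diag(1.0 / np.sqrt(d))
    D_inv = np.diag(1.0 / d)
    I = np.eye(W.shape[0])
    L_sym = I - D_inv_sqrt @ W @ D_inv_sqrt    # symmetric version
    L_rw = I - D_inv @ W                       # random-walk version
    return L_sym, L_rw
```

Note that L_rw is not symmetric in general; in practice its spectrum is usually obtained by solving the equivalent generalized eigenproblem L u = λ D u.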
• The partition (A_1, ..., A_k) induces a cut on the graph.
• Two types of graph cuts exist: RatioCut, which normalizes each cut by |A_i|, and Ncut, which normalizes by vol(A_i).
• Spectral clustering solves a relaxed version of the mincut problem (therefore it is an approximation).
• By the Rayleigh-Ritz theorem, the relaxed objective is minimized by the eigenvector corresponding to the second-smallest eigenvalue.
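For k = 2 the relaxed solution is exactly that second eigenvector (the Fiedler vector), so a bipartition can be read off its sign. A minimal sketch (the affinity matrix encodes two triangles joined by weak 0.1 edges):

```python
import numpy as np

W = np.array([[0. , 1. , 1. , 0.1, 0. , 0. ],
              [1. , 0. , 1. , 0. , 0.1, 0. ],
              [1. , 1. , 0. , 0. , 0. , 0.1],
              [0.1, 0. , 0. , 0. , 1. , 1. ],
              [0. , 0.1, 0. , 1. , 0. , 1. ],
              [0. , 0. , 0.1, 1. , 1. , 0. ]])
L = np.diag(W.sum(axis=1)) - W

eigvals, eigvecs = np.linalg.eigh(L)
fiedler = eigvecs[:, 1]      # eigenvector of the second-smallest eigenvalue
partition = fiedler > 0      # relaxed mincut: split vertices by sign
print(partition)             # separates {0, 1, 2} from {3, 4, 5}
```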
• Transition probability matrix and Laplacian are related!
• P = D^(-1) W
• L_rw = I - P
• L_rw-based spectral clustering (Shi & Malik, 2000) is better (especially when the degree distribution is uneven).
• Use k-nearest neighbor graphs.
• How to set the number of clusters:
▪ k = log(n)
▪ Use the eigengap heuristic.
• If using the Gaussian kernel, how to set σ:
▪ Use the mean distance of a point to its log(n)+1 nearest neighbors.
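Putting the recommendations together, one possible end-to-end sketch of L_rw-based spectral clustering on raw data points, assuming scipy and scikit-learn (the function name spectral_clustering_rw is mine; it reads log(n) as the neighborhood size of the k-NN graph, a common convention, and uses the eigengap heuristic for the number of clusters):

```python
import numpy as np
from scipy.linalg import eigh
from sklearn.cluster import KMeans
from sklearn.neighbors import NearestNeighbors

def spectral_clustering_rw(X, max_k=10):
    n = len(X)
    m = int(np.log(n)) + 1                 # log(n)+1 neighborhood size

    # sigma heuristic: mean distance of a point to its log(n)+1 nearest neighbors
    dists, _ = NearestNeighbors(n_neighbors=m + 1).fit(X).kneighbors(X)
    sigma = dists[:, 1:].mean()            # column 0 is the distance to self

    # Gaussian affinities, sparsified to a symmetric k-NN graph
    sq = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    W = np.exp(-sq / (2.0 * sigma ** 2))
    np.fill_diagonal(W, 0.0)
    keep = np.zeros_like(W, dtype=bool)
    nn = np.argsort(-W, axis=1)[:, :m]
    keep[np.arange(n)[:, None], nn] = True
    W = W * (keep | keep.T)                # keep an edge if either end selects it

    # Eigenvectors of L_rw via the generalized problem L u = lambda D u
    D = np.diag(W.sum(axis=1))
    L = D - W
    vals, vecs = eigh(L, D)

    # Eigengap heuristic: choose k at the largest gap among the small eigenvalues
    k = int(np.argmax(np.diff(vals[:max_k]))) + 1

    # Embed each point into the first k eigenvectors, then run k-means
    return KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(vecs[:, :k])
```

Usage would be simply labels = spectral_clustering_rw(X) for an (n, d) data array X.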
• Eckart-Young Theorem
• The best rank-r approximation B of a matrix A, with rank(B) = r < rank(A), is given by:
• B = U S V*, where A = U Z V* is the singular value decomposition of A, and S equals Z except that the (r+1)-th and later singular values are set to zero.
• The approximation minimizes the Frobenius norm:
▪ min_B ||A - B||_F, subject to rank(B) = r
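A direct numpy translation (a sketch; the truncated vector of singular values plays the role of S vs. Z in the statement above):

```python
import numpy as np

def best_rank_r(A, r):
    """Best rank-r approximation of A in Frobenius norm (Eckart-Young)."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)   # A = U diag(s) V^T
    s_trunc = s.copy()
    s_trunc[r:] = 0.0            # zero out the (r+1)-th and later singular values
    return (U * s_trunc) @ Vt    # B = U S V^T (columns of U rescaled by s_trunc)

A = np.random.randn(6, 4)
B = best_rank_r(A, r=2)
print(np.linalg.matrix_rank(B))         # 2
print(np.linalg.norm(A - B, 'fro'))     # minimal among all rank-2 matrices B
```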