Part 2. Spectral Clustering from a Matrix Perspective


                A brief tutorial emphasizing recent developments

(A more detailed tutorial was given at ICML'04.)



From PCA to Spectral Clustering Using Generalized Eigenvectors

Consider the kernel matrix: $W_{ij} = \langle \phi(x_i), \phi(x_j) \rangle$

In kernel PCA we compute the eigenvectors: $Wv = \lambda v$

Generalized eigenvectors: $Wq = \lambda D q$, where $D = \mathrm{diag}(d_1, \ldots, d_n)$, $d_i = \sum_j w_{ij}$

This leads to spectral clustering!
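To make this concrete, here is a minimal NumPy/SciPy sketch (the affinity matrix `W` is assumed to be a dense symmetric array; function and variable names are ours, not from the tutorial) of solving the generalized eigenproblem $Wq = \lambda Dq$; the second eigenvector already separates two weakly connected groups.

```python
import numpy as np
from scipy.linalg import eigh

def generalized_spectral_vectors(W, k=2):
    """Solve W q = lambda D q, with D = diag(row sums of W).

    Returns the k eigenpairs with the largest eigenvalues
    (the first one is the trivial constant vector q0 = 1)."""
    d = W.sum(axis=1)
    D = np.diag(d)
    # eigh solves the symmetric-definite generalized problem W q = lambda D q,
    # returning eigenvalues in ascending order.
    vals, vecs = eigh(W, D)
    return vals[::-1][:k], vecs[:, ::-1][:, :k]

# toy example: two dense blocks, weakly connected
W = np.array([[0, 5, 5, 1, 0, 0],
              [5, 0, 5, 0, 0, 0],
              [5, 5, 0, 0, 1, 0],
              [1, 0, 0, 0, 6, 6],
              [0, 0, 1, 6, 0, 6],
              [0, 0, 0, 6, 6, 0]], dtype=float)
vals, Q = generalized_spectral_vectors(W)
print(vals)              # the first value is the trivial eigenvalue 1
print(np.sign(Q[:, 1]))  # signs of the second eigenvector separate the two groups
```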
Indicator Matrix Quadratic Clustering Framework

Unsigned cluster indicator matrix $H = (h_1, \ldots, h_K)$

Kernel K-means clustering:
$$\max_H \ \mathrm{Tr}(H^T W H), \quad \text{s.t. } H^T H = I, \ H \ge 0$$

K-means: $W = X^T X$;  kernel K-means: $W = (\langle \phi(x_i), \phi(x_j) \rangle)$

Spectral clustering (normalized cut):
$$\max_H \ \mathrm{Tr}(H^T W H), \quad \text{s.t. } H^T D H = I, \ H \ge 0$$
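A hedged sketch of the standard relaxation of both objectives (names are ours): dropping the nonnegativity constraint $H \ge 0$, the optimum of $\max \mathrm{Tr}(H^T W H)$ under $H^T H = I$ (resp. $H^T D H = I$) is spanned by the K leading (generalized) eigenvectors.

```python
import numpy as np
from scipy.linalg import eigh

def relaxed_indicators(W, K, normalized=False):
    """Relaxation of max_H Tr(H^T W H) with H >= 0 dropped:
    - H^T H = I   (kernel K-means form): top-K eigenvectors of W
    - H^T D H = I (normalized-cut form): top-K generalized eigenvectors of (W, D)"""
    if normalized:
        D = np.diag(W.sum(axis=1))
        vals, vecs = eigh(W, D)          # W q = lambda D q, ascending
    else:
        vals, vecs = np.linalg.eigh(W)   # W v = lambda v, ascending
    return vecs[:, np.argsort(vals)[::-1][:K]]
```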
Brief Introduction to Spectral Clustering (Laplacian-matrix-based clustering)
Some historical notes
         •   Fiedler, 1973, 1975, graph Laplacian matrix
         •   Donath & Hoffman, 1973, bounds
         •   Hall, 1970, Quadratic Placement (embedding)
         •   Pothen, Simon & Liou, 1990, spectral graph partitioning (many related papers thereafter)
         •   Hagen & Kahng, 1992, Ratio-cut
         •   Chan, Schlag & Zien, multi-way Ratio-cut
         •   Chung, 1997, Spectral graph theory book
         •   Shi & Malik, 2000, Normalized Cut
Spectral Gold-Rush of 2001
     9 papers on spectral clustering

• Meila & Shi, AI-Stat 2001. Random walk interpretation of Normalized Cut
• Ding, He & Zha, KDD 2001. Perturbation analysis of the Laplacian matrix on sparsely connected graphs
• Ng, Jordan & Weiss, NIPS 2001. K-means algorithm on the embedded eigenspace
• Belkin & Niyogi, NIPS 2001. Spectral embedding
• Dhillon, KDD 2001. Bipartite graph clustering
• Zha et al, CIKM 2001. Bipartite graph clustering
• Zha et al, NIPS 2001. Spectral relaxation of K-means
• Ding et al, ICDM 2001. MinMaxCut, uniqueness of the relaxation
• Gu et al, 2001. K-way relaxation of NormCut and MinMaxCut
Spectral Clustering

Minimize the cut size, without explicit size constraints.
But where to cut?

Need to balance cluster sizes.
Graph Clustering

Minimize between-cluster similarities (weights):
$$s(A,B) = \sum_{i \in A} \sum_{j \in B} w_{ij}$$

Balance weight / balance size / balance volume.

Maximize within-cluster similarities (weights):
$$s(A,A) = \sum_{i \in A} \sum_{j \in A} w_{ij}$$
Clustering Objective Functions

Between-cluster similarity: $s(A,B) = \sum_{i \in A} \sum_{j \in B} w_{ij}$

• Ratio Cut
$$J_{Rcut}(A,B) = \frac{s(A,B)}{|A|} + \frac{s(A,B)}{|B|}$$

• Normalized Cut  (with $d_A = \sum_{i \in A} d_i$)
$$J_{Ncut}(A,B) = \frac{s(A,B)}{d_A} + \frac{s(A,B)}{d_B}
               = \frac{s(A,B)}{s(A,A)+s(A,B)} + \frac{s(A,B)}{s(B,B)+s(A,B)}$$

• Min-Max-Cut
$$J_{MMC}(A,B) = \frac{s(A,B)}{s(A,A)} + \frac{s(A,B)}{s(B,B)}$$
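For reference, a small NumPy helper (our naming, not from the tutorial) that evaluates the three objectives for a given 2-way partition, following the formulas above.

```python
import numpy as np

def cut_objectives(W, labels):
    """Evaluate J_Rcut, J_Ncut and J_MMC for a 2-way partition.

    `labels` is a boolean array: True for cluster A, False for cluster B;
    W is a symmetric weight matrix (a sketch for illustration)."""
    A, B = labels, ~labels
    sAB = W[np.ix_(A, B)].sum()
    sAA = W[np.ix_(A, A)].sum()
    sBB = W[np.ix_(B, B)].sum()
    dA = W[A].sum()          # = s(A,A) + s(A,B)
    dB = W[B].sum()          # = s(B,B) + s(A,B)
    J_rcut = sAB / A.sum() + sAB / B.sum()
    J_ncut = sAB / dA + sAB / dB
    J_mmc  = sAB / sAA + sAB / sBB
    return J_rcut, J_ncut, J_mmc
```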
Normalized Cut (Shi & Malik, 2000)

Minimize the similarity between A and B: $s(A,B) = \sum_{i \in A} \sum_{j \in B} w_{ij}$

Balance weights:
$$J_{Ncut}(A,B) = \frac{s(A,B)}{d_A} + \frac{s(A,B)}{d_B}, \qquad d_A = \sum_{i \in A} d_i$$

Cluster indicator:
$$q(i) = \begin{cases} \ \ \sqrt{d_B/(d_A\, d)} & \text{if } i \in A \\ -\sqrt{d_A/(d_B\, d)} & \text{if } i \in B \end{cases}
\qquad d = \sum_{i \in G} d_i$$

Normalization: $q^T D q = 1$, $q^T D e = 0$

Substituting q leads to $J_{Ncut}(q) = q^T (D - W) q$

$$\min_q \ q^T (D - W) q + \lambda (q^T D q - 1)$$

The solution is an eigenvector of $(D - W) q = \lambda D q$
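A minimal sketch of the resulting 2-way algorithm, assuming a connected graph stored as a dense symmetric NumPy array (naming is ours): take the generalized eigenvector of $(D-W)q = \lambda Dq$ with the second smallest eigenvalue and split by sign.

```python
import numpy as np
from scipy.linalg import eigh

def two_way_ncut(W):
    """2-way normalized cut (relaxed): second smallest generalized
    eigenvector of (D - W) q = lambda D q, split by sign."""
    d = W.sum(axis=1)
    D = np.diag(d)
    L = D - W
    vals, vecs = eigh(L, D)   # eigenvalues ascending; vals[0] ~ 0 (trivial)
    q = vecs[:, 1]            # Fiedler-like vector in the D-metric
    return q > 0              # True = cluster A, False = cluster B
```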
A simple example

Two dense clusters, with sparse connections between them.
(Figure: the adjacency matrix and the eigenvector $q_2$.)
K-way Spectral Clustering (K ≥ 2)
K-way Clustering Objectives

• Ratio Cut
$$J_{Rcut}(C_1, \ldots, C_K) = \sum_{k<l} \left( \frac{s(C_k, C_l)}{|C_k|} + \frac{s(C_k, C_l)}{|C_l|} \right) = \sum_k \frac{s(C_k, G - C_k)}{|C_k|}$$

• Normalized Cut
$$J_{Ncut}(C_1, \ldots, C_K) = \sum_{k<l} \left( \frac{s(C_k, C_l)}{d_k} + \frac{s(C_k, C_l)}{d_l} \right) = \sum_k \frac{s(C_k, G - C_k)}{d_k}$$

• Min-Max-Cut
$$J_{MMC}(C_1, \ldots, C_K) = \sum_{k<l} \left( \frac{s(C_k, C_l)}{s(C_k, C_k)} + \frac{s(C_k, C_l)}{s(C_l, C_l)} \right) = \sum_k \frac{s(C_k, G - C_k)}{s(C_k, C_k)}$$
K-way Spectral Relaxation

Unsigned cluster indicators:
$$h_1 = (1 \cdots 1, 0 \cdots 0, 0 \cdots 0)^T,\quad
  h_2 = (0 \cdots 0, 1 \cdots 1, 0 \cdots 0)^T,\ \ldots,\
  h_K = (0 \cdots 0, 0 \cdots 0, 1 \cdots 1)^T$$

Re-write:
$$J_{Rcut}(h_1, \ldots, h_K) = \frac{h_1^T (D-W) h_1}{h_1^T h_1} + \cdots + \frac{h_K^T (D-W) h_K}{h_K^T h_K}$$

$$J_{Ncut}(h_1, \ldots, h_K) = \frac{h_1^T (D-W) h_1}{h_1^T D h_1} + \cdots + \frac{h_K^T (D-W) h_K}{h_K^T D h_K}$$

$$J_{MMC}(h_1, \ldots, h_K) = \frac{h_1^T (D-W) h_1}{h_1^T W h_1} + \cdots + \frac{h_K^T (D-W) h_K}{h_K^T W h_K}$$
K-way Normalized Cut Spectral Relaxation

Unsigned cluster indicators (the run of 1's spans the $n_k$ entries of cluster k):
$$y_k = D^{1/2} (0 \cdots 0, 1 \cdots 1, 0 \cdots 0)^T / \| D^{1/2} h_k \|$$

Re-write:
$$J_{Ncut}(y_1, \ldots, y_K) = y_1^T (I - \tilde W) y_1 + \cdots + y_K^T (I - \tilde W) y_K
  = \mathrm{Tr}\,(Y^T (I - \tilde W) Y), \qquad \tilde W = D^{-1/2} W D^{-1/2}$$

Optimize: $\min_Y \mathrm{Tr}\,(Y^T (I - \tilde W) Y)$, subject to $Y^T Y = I$.

By Ky Fan's theorem, the optimal solution is given by eigenvectors $Y = (v_1, v_2, \ldots, v_K)$, with
$$(I - \tilde W) v_k = \lambda_k v_k, \qquad (D - W) u_k = \lambda_k D u_k, \quad u_k = D^{-1/2} v_k$$

$$\lambda_1 + \cdots + \lambda_K \le \min J_{Ncut}(y_1, \ldots, y_K) \qquad \text{(Gu, et al, 2001)}$$
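The construction above can be sketched in a few lines of NumPy (names are ours): form $\tilde W = D^{-1/2} W D^{-1/2}$, take the K eigenvectors of $I - \tilde W$ with the smallest eigenvalues, and map back with $u_k = D^{-1/2} v_k$.

```python
import numpy as np

def kway_ncut_relaxation(W, K):
    """Relaxed K-way normalized cut (a sketch of the construction above)."""
    d = W.sum(axis=1)
    Dih = np.diag(1.0 / np.sqrt(d))
    Wt = Dih @ W @ Dih                                 # W~ = D^{-1/2} W D^{-1/2}
    vals, V = np.linalg.eigh(np.eye(len(W)) - Wt)      # (I - W~) v = lambda v, ascending
    Y = V[:, :K]                                       # K smallest eigenvalues
    U = Dih @ Y                                        # generalized eigenvectors of (D - W) u = lambda D u
    return vals[:K], Y, U
```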
K-way Spectral Clustering is Difficult

• Spectral clustering is best applied to 2-way clustering
  – positive entries for one cluster
  – negative entries for the other cluster
• For K-way (K > 2) clustering
  – Positive and negative signs make cluster assignment difficult
  – Recursive 2-way clustering
  – Low-dimensional embedding: project the data onto the eigenvector subspace and use another clustering method, such as K-means, to cluster the data (Ng et al; Zha et al; Bach & Jordan, etc.) — see the sketch below
  – Linearized cluster assignment using spectral ordering and cluster crossing
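A sketch of the embedding-plus-K-means route in the spirit of Ng et al. (assuming scikit-learn is available; the row normalization follows their paper, the rest of the naming is ours):

```python
import numpy as np
from sklearn.cluster import KMeans   # assumption: scikit-learn is installed

def embed_and_kmeans(W, K, seed=0):
    """K-way clustering by spectral embedding + K-means:
    embed in the K leading eigenvectors of the normalized affinity,
    row-normalize, then run K-means on the embedded points."""
    d = W.sum(axis=1)
    Dih = np.diag(1.0 / np.sqrt(d))
    Wt = Dih @ W @ Dih
    vals, V = np.linalg.eigh(Wt)
    Y = V[:, np.argsort(vals)[::-1][:K]]                  # K leading eigenvectors
    Y = Y / np.linalg.norm(Y, axis=1, keepdims=True)      # row normalization (Ng et al.)
    return KMeans(n_clusters=K, n_init=10, random_state=seed).fit_predict(Y)
```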
Scaled PCA: a Unified Framework for Clustering and Ordering

• Scaled PCA has two optimality properties
  – Distance-sensitive ordering
  – Min-max principle clustering
• SPCA on a contingency table ⇒ Correspondence Analysis
  – Simultaneous ordering of rows and columns
  – Simultaneous clustering of rows and columns
Scaled PCA

Similarity matrix $S = (s_{ij})$ (e.g. generated from $XX^T$), with
$D = \mathrm{diag}(d_1, \ldots, d_n)$, $d_i = s_{i.}$

Nonlinear re-scaling: $\tilde S = D^{-1/2} S D^{-1/2}$, $\tilde s_{ij} = s_{ij} / (s_{i.} s_{j.})^{1/2}$

Apply SVD on $\tilde S$ ⇒
$$S = D^{1/2} \tilde S D^{1/2} = D^{1/2} \sum_k z_k \lambda_k z_k^T D^{1/2}
    = D \Big[ \sum_k q_k \lambda_k q_k^T \Big] D$$

$q_k = D^{-1/2} z_k$ is the scaled principal component.

Subtracting the trivial component ($\lambda_0 = 1$, $z_0 \propto d^{1/2}$, $q_0 \propto \mathbf{1}$) gives
$$S - d\, d^T / s_{..} = D \sum_{k \ge 1} q_k \lambda_k q_k^T D \qquad \text{(Ding, et al, 2002)}$$
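A small NumPy sketch of scaled PCA as defined above (names are ours); it returns the non-trivial eigenvalues $\lambda_k$ and the scaled principal components $q_k = D^{-1/2} z_k$.

```python
import numpy as np

def scaled_pca(S):
    """Scaled PCA of a symmetric nonnegative similarity matrix S (a sketch).
    The trivial component (lambda = 1, q0 ~ constant) is dropped."""
    d = S.sum(axis=1)                       # d_i = s_i.
    Dih = np.diag(1.0 / np.sqrt(d))
    St = Dih @ S @ Dih                      # nonlinear re-scaling S~
    vals, Z = np.linalg.eigh(St)            # S~ = sum_k z_k lambda_k z_k^T
    order = np.argsort(vals)[::-1]
    vals, Z = vals[order], Z[:, order]
    Q = Dih @ Z                             # scaled principal components q_k
    return vals[1:], Q[:, 1:]               # drop the trivial lambda_0 = 1 component
```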
Scaled PCA on a Rectangular Matrix ⇒ Correspondence Analysis

Nonlinear re-scaling: $\tilde P = D_r^{-1/2} P D_c^{-1/2}$, $\tilde p_{ij} = p_{ij} / (p_{i.}\, p_{.j})^{1/2}$

Apply SVD on $\tilde P$ and subtract the trivial component:
$$P - r\, c^T / p_{..} = D_r \sum_{k \ge 1} f_k \lambda_k g_k^T D_c,
  \qquad r = (p_{1.}, \ldots, p_{n.})^T, \quad c = (p_{.1}, \ldots, p_{.n})^T$$

$f_k = D_r^{-1/2} u_k$ and $g_k = D_c^{-1/2} v_k$ are the scaled row and column principal components (the standard coordinates in CA).
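Correspondingly, a sketch of scaled PCA on a rectangular contingency table, i.e. correspondence analysis in standard coordinates (assumes a nonnegative count matrix P with positive marginals; naming is ours):

```python
import numpy as np

def correspondence_analysis(P, K=2):
    """Scaled PCA / CA of a nonnegative rectangular matrix P (a sketch).
    Returns the first K non-trivial singular values and the scaled
    row and column components f_k, g_k."""
    P = P / P.sum()                         # turn counts into proportions
    r = P.sum(axis=1)                       # row marginals p_i.
    c = P.sum(axis=0)                       # column marginals p_.j
    Pt = P / np.sqrt(np.outer(r, c))        # P~ = Dr^{-1/2} P Dc^{-1/2}
    U, s, Vt = np.linalg.svd(Pt, full_matrices=False)
    # the first singular triplet is the trivial component (s = 1); skip it
    F = U[:, 1:K+1] / np.sqrt(r)[:, None]   # f_k = Dr^{-1/2} u_k
    G = Vt[1:K+1].T / np.sqrt(c)[:, None]   # g_k = Dc^{-1/2} v_k
    return s[1:K+1], F, G
```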
Correspondence Analysis (CA)
      • Mainly used in graphical display of data
      • Popular in France (Benzécri, 1969)
      • Long history
            – Simultaneous row and column regression (Hirschfeld,
              1935)
            – Reciprocal averaging (Richardson & Kuder, 1933;
              Horst, 1935; Fisher, 1940; Hill, 1974)
            – Canonical correlations, dual scaling, etc.
      • Formulation is a bit complicated (“convoluted”
        Jolliffe, 2002, p.342)
      • “A neglected method”, (Hill, 1974)

Clustering of Bipartite Graphs (rectangle matrix)

           Simultaneous clustering of rows and columns
           of a contingency table (adjacency matrix B )

            Examples of bipartite graphs
            • Information Retrieval: word-by-document matrix
            • Market basket data: transaction-by-item matrix
            • DNA Gene expression profiles
            • Protein vs protein-complex
Bipartite Graph Clustering

Clustering indicators for rows and columns:
$$f(i) = \begin{cases} \ \ 1 & \text{if } r_i \in R_1 \\ -1 & \text{if } r_i \in R_2 \end{cases}
\qquad
g(j) = \begin{cases} \ \ 1 & \text{if } c_j \in C_1 \\ -1 & \text{if } c_j \in C_2 \end{cases}$$

$$B = \begin{pmatrix} B_{R_1,C_1} & B_{R_1,C_2} \\ B_{R_2,C_1} & B_{R_2,C_2} \end{pmatrix},
\qquad
W = \begin{pmatrix} 0 & B \\ B^T & 0 \end{pmatrix},
\qquad
q = \begin{pmatrix} f \\ g \end{pmatrix}$$

Substituting, we obtain
$$J_{MMC}(C_1, C_2; R_1, R_2) = \frac{s(W_{12})}{s(W_{11})} + \frac{s(W_{12})}{s(W_{22})}$$

f, g are determined by
$$\left[ \begin{pmatrix} D_r & \\ & D_c \end{pmatrix} - \begin{pmatrix} 0 & B \\ B^T & 0 \end{pmatrix} \right]
\begin{pmatrix} f \\ g \end{pmatrix}
= \lambda \begin{pmatrix} D_r & \\ & D_c \end{pmatrix} \begin{pmatrix} f \\ g \end{pmatrix}$$
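A sketch of 2-way co-clustering built directly from the generalized eigenproblem above (assuming all row and column sums of B are positive; naming is ours):

```python
import numpy as np
from scipy.linalg import eigh

def bipartite_two_way(B):
    """2-way co-clustering of a rectangular matrix B: build
    W = [[0, B], [B^T, 0]], solve (D - W) q = lambda D q and split the
    row and column parts of q2 by sign."""
    n, m = B.shape
    W = np.block([[np.zeros((n, n)), B],
                  [B.T, np.zeros((m, m))]])
    D = np.diag(W.sum(axis=1))
    vals, vecs = eigh(D - W, D)   # ascending; vecs[:,0] is the trivial vector
    q = vecs[:, 1]
    f, g = q[:n], q[n:]           # row and column indicators
    return f > 0, g > 0
```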
Spectral Clustering of Bipartite Graphs

Simultaneous clustering of rows and columns (adjacency matrix B), with
$$s(B_{R_1,C_2}) = \sum_{r_i \in R_1} \sum_{c_j \in C_2} b_{ij}$$

Minimize the between-cluster sums of weights: $s(R_1, C_2)$, $s(R_2, C_1)$.
Maximize the within-cluster sums of weights: $s(R_1, C_1)$, $s(R_2, C_2)$.

$$J_{MMC}(C_1, C_2; R_1, R_2) =
  \frac{s(B_{R_1,C_2}) + s(B_{R_2,C_1})}{2\, s(B_{R_1,C_1})}
+ \frac{s(B_{R_1,C_2}) + s(B_{R_2,C_1})}{2\, s(B_{R_2,C_2})}
\qquad \text{(Ding, AI-STAT 2003)}$$
Internet Newsgroups

Simultaneous clustering of documents and words.
Embedding in Principal Subspace

Cluster self-aggregation (proved via perturbation analysis).

(Hall, 1970: "quadratic placement" (embedding) of a graph)
Spectral Embedding: Self-aggregation

                        • Compute K eigenvectors of the Laplacian.
                        • Embed objects in the K-dim eigenspace




                                                                           (Ding, 2004)
Spectral Embedding is Not Topology Preserving

700 3-D data points form 2 interlocking rings.

In eigenspace, they shrink and separate.
Spectral Embedding

Simplex Embedding Theorem.
Objects self-aggregate to K centroids.
The centroids are located at the K corners of a simplex.
• The simplex consists of K basis vectors plus the coordinate origin
• The simplex is rotated by an orthogonal transformation T
• T is determined by perturbation analysis

(Ding, 2004)
Perturbation Analysis

$$Wq = \lambda D q \ \Longrightarrow\ \hat W z = (D^{-1/2} W D^{-1/2})\, z = \lambda z, \qquad q = D^{-1/2} z$$

Assume the data has 3 dense clusters $C_1, C_2, C_3$, sparsely connected:
$$W = \begin{pmatrix} W_{11} & W_{12} & W_{13} \\ W_{21} & W_{22} & W_{23} \\ W_{31} & W_{32} & W_{33} \end{pmatrix}$$

The off-diagonal blocks are between-cluster connections, assumed small and treated as a perturbation.
(Ding et al, KDD'01)
Spectral Perturbation Theorem

Orthogonal transform matrix $T = (t_1, \ldots, t_K)$.

T is determined by $\bar\Gamma\, t_k = \lambda_k t_k$, where the spectral perturbation matrix is
$$\bar\Gamma = \Omega^{-1/2}\, \Gamma\, \Omega^{-1/2},
\qquad
\Gamma = \begin{pmatrix}
  h_{11}  & -s_{12} & \cdots & -s_{1K} \\
  -s_{21} & h_{22}  & \cdots & -s_{2K} \\
  \vdots  & \vdots  &        & \vdots  \\
  -s_{K1} & -s_{K2} & \cdots & h_{KK}
\end{pmatrix}$$

$$s_{pq} = s(C_p, C_q), \qquad h_{kk} = \sum_{p \ne k} s_{kp}, \qquad \Omega = \mathrm{diag}[\rho(C_1), \ldots, \rho(C_K)]$$
Connectivity Network

$$C_{ij} = \begin{cases} 1 & \text{if } i, j \text{ belong to the same cluster} \\ 0 & \text{otherwise} \end{cases}$$

Scaled PCA provides: $\displaystyle C \cong D \sum_{k=1}^{K} q_k \lambda_k q_k^T D$

Green's function: $\displaystyle C \approx G = \sum_{k=2}^{K} q_k \frac{1}{1 - \lambda_k} q_k^T$

Projection matrix: $\displaystyle C \approx P \equiv \sum_{k=1}^{K} q_k q_k^T$

(Ding et al, 2002)
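A sketch of the scaled-PCA approximation of the connectivity network, $C \approx D \sum_k q_k \lambda_k q_k^T D$ (naming is ours; thresholding the result recovers a block structure on well-separated data):

```python
import numpy as np

def connectivity_network(W, K):
    """Approximate the cluster connectivity matrix C from the K leading
    scaled principal components (the k = 1 component is the trivial
    lambda = 1 component, kept here as in the formula above)."""
    d = W.sum(axis=1)
    Dih = np.diag(1.0 / np.sqrt(d))
    vals, Z = np.linalg.eigh(Dih @ W @ Dih)
    order = np.argsort(vals)[::-1][:K]
    lam, Q = vals[order], Dih @ Z[:, order]    # q_k = D^{-1/2} z_k
    D = np.diag(d)
    C = D @ (Q * lam) @ Q.T @ D                # D sum_k q_k lambda_k q_k^T D
    return C
```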
1st-order Perturbation: Example 1

(Figure: similarity matrix W, the 1st-order solution, and the resulting connectivity matrix; $\lambda_2 = 0.300$, $\lambda_2 = 0.268$.)

Between-cluster connections are suppressed and within-cluster connections are enhanced: the effect of self-aggregation.
Optimality Properties of Scaled PCA

Scaled principal components have optimality properties:

Ordering
  – Adjacent objects along the order are similar
  – Far-away objects along the order are dissimilar
  – The optimal solution for the permutation index is given by scaled PCA

Clustering
  – Maximize within-cluster similarity
  – Minimize between-cluster similarity
  – The optimal solution for the cluster membership indicators is given by scaled PCA
Spectral Graph Ordering

(Barnard, Pothen & Simon, 1993) Envelope reduction of a sparse matrix: find an ordering such that the envelope is minimized:
$$\min \sum_i \max_j |i - j|\, w_{ij} \ \Rightarrow\ \min \sum_{ij} (x_i - x_j)^2 w_{ij}$$

(Hall, 1970) "Quadratic placement of a graph": find coordinates x to minimize
$$J = \sum_{ij} (x_i - x_j)^2 w_{ij} = x^T (D - W) x$$

The solutions are eigenvectors of the Laplacian.
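A minimal sketch of Hall's quadratic placement (naming is ours): the coordinate vector minimizing $x^T(D-W)x$ under the usual unit-norm constraint, orthogonal to the constant vector, is the Laplacian eigenvector with the smallest non-zero eigenvalue (the Fiedler vector).

```python
import numpy as np

def quadratic_placement(W):
    """1-D quadratic placement of a graph: return the Fiedler vector,
    which minimizes sum_ij (x_i - x_j)^2 w_ij = x^T (D - W) x."""
    D = np.diag(W.sum(axis=1))
    vals, vecs = np.linalg.eigh(D - W)   # ascending; vecs[:,0] ~ constant
    return vecs[:, 1]                    # coordinates for embedding / ordering
```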
Distance-Sensitive Ordering

Given a graph, find an optimal ordering of the nodes.
π denotes the permutation indexes: $\pi(1, \ldots, n) = (\pi_1, \ldots, \pi_n)$.

$$J_d(\pi) = \sum_{i=1}^{n-d} w_{\pi_i,\, \pi_{i+d}}$$

(For example, $J_{d=2}(\pi)$ sums $w_{\pi_1,\pi_3}, w_{\pi_2,\pi_4}, \ldots$ over node pairs two positions apart.)

$$\min_\pi J(\pi) = \sum_{d=1}^{n-1} d^2\, J_d(\pi)$$

The larger the distance, the larger the penalty weight.
Distance-Sensitive Ordering

$$J(\pi) = \sum_{ij} (i - j)^2 w_{\pi_i, \pi_j}
        = \sum_{ij} (\pi_i^{-1} - \pi_j^{-1})^2 w_{i,j}
        = \frac{n^2}{8} \sum_{ij} \Big( \frac{\pi_i^{-1} - (n+1)/2}{n/2} - \frac{\pi_j^{-1} - (n+1)/2}{n/2} \Big)^2 w_{i,j}$$

Define the shifted and rescaled inverse permutation indexes
$$q_i = \frac{\pi_i^{-1} - (n+1)/2}{n/2} \in \Big\{ \frac{1-n}{n}, \frac{3-n}{n}, \ldots, \frac{n-1}{n} \Big\}$$

$$J(\pi) = \frac{n^2}{8} \sum_{ij} (q_i - q_j)^2 w_{ij} = \frac{n^2}{4}\, q^T (D - W)\, q$$
Distance-Sensitive Ordering

Once $q_2$ is computed, since
$$q_2(i) < q_2(j) \ \Rightarrow\ \pi_i^{-1} < \pi_j^{-1},$$
$\pi_i^{-1}$ can be uniquely recovered from $q_2$.

Implementation: sorting $q_2$ induces $\pi$.
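In code this amounts to a single `argsort` (a sketch, assuming the degree-weighted formulation $(D-W)q = \lambda Dq$ used earlier; naming is ours):

```python
import numpy as np
from scipy.linalg import eigh

def spectral_ordering(W):
    """Distance-sensitive ordering: compute q2 from (D - W) q = lambda D q
    and sort it; the sorting permutation is the recovered ordering pi."""
    D = np.diag(W.sum(axis=1))
    vals, vecs = eigh(D - W, D)
    q2 = vecs[:, 1]
    order = np.argsort(q2)   # order[k] = index of the k-th node along pi
    return order
```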
Re-ordering of Genes and Tissues

(Figure: re-ordered gene expression data.)

$$r = \frac{J(\pi)}{J(\text{random})} = 0.18,
\qquad
r_{d=1} = \frac{J_{d=1}(\pi)}{J_{d=1}(\text{random})} = 3.39$$
Spectral Clustering vs. Spectral Ordering

• The continuous approximations of both integer programming problems are given by the same eigenvector.
• Different problems can have the same continuous approximate solution.
• Quality of the approximation:
  – Ordering: better quality — the solution is relaxed from a set of evenly spaced discrete values.
  – Clustering: lower quality — the solution is relaxed from only 2 discrete values.
Linearized Cluster Assignment

Turn spectral clustering into a 1-D clustering problem:

• Spectral ordering on the connectivity network
• Cluster crossing
  – Sum of similarities along the anti-diagonal
  – Gives a 1-D curve with valleys and peaks
  – Divide the valleys and peaks into clusters
Cluster Overlap and Crossing

Given a similarity matrix W and clusters A, B:

• Cluster overlap: $s(A,B) = \sum_{i \in A} \sum_{j \in B} w_{ij}$
• Cluster crossing computes a smaller fraction of the cluster overlap.
• Cluster crossing depends on an ordering o. It sums the weights crossing site i along the order:
$$\rho(i) = \sum_{j=1}^{m} w_{o(i-j),\, o(i+j)}$$
• This is a sum along the anti-diagonals of W.
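A direct sketch of the cluster-crossing curve $\rho(i)$ for a given ordering (naming is ours; an O(nm) double loop, fine for illustration):

```python
import numpy as np

def cluster_crossing(W, order, m):
    """Cluster crossing rho(i) = sum_{j=1..m} w_{o(i-j), o(i+j)} along the
    ordering `order`; valleys of rho suggest cluster boundaries."""
    n = len(order)
    rho = np.zeros(n)
    for i in range(n):
        for j in range(1, m + 1):
            if i - j >= 0 and i + j < n:
                rho[i] += W[order[i - j], order[i + j]]
    return rho
```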
Cluster Crossing

(Figure: cluster crossing curve along the ordering.)
K-way Clustering Experiments

Accuracy of clustering results:

Method    Linearized Assignment    Recursive 2-way clustering    Embedding + K-means
Data A    89.0%                    82.8%                         75.1%
Data B    75.7%                    67.2%                         56.4%
Some Additional Advanced/Related Topics

• Random walks and normalized cut
• Semi-definite programming
• Sub-sampling in spectral clustering
• Extending to semi-supervised classification
• Green's function approach
• Out-of-sample embedding

More Related Content

What's hot

Montpellier Math Colloquium
Montpellier Math ColloquiumMontpellier Math Colloquium
Montpellier Math Colloquium
Christian Robert
 
M Gumbel - SCABIO: a framework for bioinformatics algorithms in Scala
M Gumbel - SCABIO: a framework for bioinformatics algorithms in ScalaM Gumbel - SCABIO: a framework for bioinformatics algorithms in Scala
M Gumbel - SCABIO: a framework for bioinformatics algorithms in Scala
Jan Aerts
 
Elementary Landscape Decomposition of the Quadratic Assignment Problem
Elementary Landscape Decomposition of the Quadratic Assignment ProblemElementary Landscape Decomposition of the Quadratic Assignment Problem
Elementary Landscape Decomposition of the Quadratic Assignment Problem
jfrchicanog
 
Bayesian case studies, practical 2
Bayesian case studies, practical 2Bayesian case studies, practical 2
Bayesian case studies, practical 2
Robin Ryder
 
Proximal Splitting and Optimal Transport
Proximal Splitting and Optimal TransportProximal Splitting and Optimal Transport
Proximal Splitting and Optimal Transport
Gabriel Peyré
 
M. Visinescu - Higher Order First Integrals, Killing Tensors, Killing-Maxwell...
M. Visinescu - Higher Order First Integrals, Killing Tensors, Killing-Maxwell...M. Visinescu - Higher Order First Integrals, Killing Tensors, Killing-Maxwell...
M. Visinescu - Higher Order First Integrals, Killing Tensors, Killing-Maxwell...
SEENET-MTP
 
Coherent feedback formulation of a continuous quantum error correction protocol
Coherent feedback formulation of a continuous quantum error correction protocolCoherent feedback formulation of a continuous quantum error correction protocol
Coherent feedback formulation of a continuous quantum error correction protocol
hendrai
 
Predictve data mining
Predictve data miningPredictve data mining
Predictve data mining
Mintu246
 
A new class of a stable implicit schemes for treatment of stiff
A new class of a stable implicit schemes for treatment of stiffA new class of a stable implicit schemes for treatment of stiff
A new class of a stable implicit schemes for treatment of stiff
Alexander Decker
 
Beck Workshop on Modelling and Simulation of Coal-fired Power Generation and ...
Beck Workshop on Modelling and Simulation of Coal-fired Power Generation and ...Beck Workshop on Modelling and Simulation of Coal-fired Power Generation and ...
Beck Workshop on Modelling and Simulation of Coal-fired Power Generation and ...
UK Carbon Capture and Storage Research Centre
 
Theory of Relational Calculus and its Formalization
Theory of Relational Calculus and its FormalizationTheory of Relational Calculus and its Formalization
Theory of Relational Calculus and its Formalization
Yoshihiro Mizoguchi
 
Dragisa Zunic - Classical computing with explicit structural rules - the *X c...
Dragisa Zunic - Classical computing with explicit structural rules - the *X c...Dragisa Zunic - Classical computing with explicit structural rules - the *X c...
Dragisa Zunic - Classical computing with explicit structural rules - the *X c...
Dragisa Zunic
 
Lecture10 outilier l0_svdd
Lecture10 outilier l0_svddLecture10 outilier l0_svdd
Lecture10 outilier l0_svdd
Stéphane Canu
 
A Coq Library for the Theory of Relational Calculus
A Coq Library for the Theory of Relational CalculusA Coq Library for the Theory of Relational Calculus
A Coq Library for the Theory of Relational Calculus
Yoshihiro Mizoguchi
 
Mapping Ash Tree Colonization in an Agricultural Moutain Landscape_ Investiga...
Mapping Ash Tree Colonization in an Agricultural Moutain Landscape_ Investiga...Mapping Ash Tree Colonization in an Agricultural Moutain Landscape_ Investiga...
Mapping Ash Tree Colonization in an Agricultural Moutain Landscape_ Investiga...
grssieee
 
Talk iccf 19_ben_hammouda
Talk iccf 19_ben_hammoudaTalk iccf 19_ben_hammouda
Talk iccf 19_ben_hammouda
Chiheb Ben Hammouda
 
Hierarchical Deterministic Quadrature Methods for Option Pricing under the Ro...
Hierarchical Deterministic Quadrature Methods for Option Pricing under the Ro...Hierarchical Deterministic Quadrature Methods for Option Pricing under the Ro...
Hierarchical Deterministic Quadrature Methods for Option Pricing under the Ro...
Chiheb Ben Hammouda
 
Hibbeler chapter10
Hibbeler chapter10Hibbeler chapter10
Hibbeler chapter10
ahmedalnamer
 
Introduction to harmonic analysis on groups, links with spatial correlation.
Introduction to harmonic analysis on groups, links with spatial correlation.Introduction to harmonic analysis on groups, links with spatial correlation.
Introduction to harmonic analysis on groups, links with spatial correlation.
Valentin De Bortoli
 
CMA-ES with local meta-models
CMA-ES with local meta-modelsCMA-ES with local meta-models
CMA-ES with local meta-models
zyedb
 

What's hot (20)

Montpellier Math Colloquium
Montpellier Math ColloquiumMontpellier Math Colloquium
Montpellier Math Colloquium
 
M Gumbel - SCABIO: a framework for bioinformatics algorithms in Scala
M Gumbel - SCABIO: a framework for bioinformatics algorithms in ScalaM Gumbel - SCABIO: a framework for bioinformatics algorithms in Scala
M Gumbel - SCABIO: a framework for bioinformatics algorithms in Scala
 
Elementary Landscape Decomposition of the Quadratic Assignment Problem
Elementary Landscape Decomposition of the Quadratic Assignment ProblemElementary Landscape Decomposition of the Quadratic Assignment Problem
Elementary Landscape Decomposition of the Quadratic Assignment Problem
 
Bayesian case studies, practical 2
Bayesian case studies, practical 2Bayesian case studies, practical 2
Bayesian case studies, practical 2
 
Proximal Splitting and Optimal Transport
Proximal Splitting and Optimal TransportProximal Splitting and Optimal Transport
Proximal Splitting and Optimal Transport
 
M. Visinescu - Higher Order First Integrals, Killing Tensors, Killing-Maxwell...
M. Visinescu - Higher Order First Integrals, Killing Tensors, Killing-Maxwell...M. Visinescu - Higher Order First Integrals, Killing Tensors, Killing-Maxwell...
M. Visinescu - Higher Order First Integrals, Killing Tensors, Killing-Maxwell...
 
Coherent feedback formulation of a continuous quantum error correction protocol
Coherent feedback formulation of a continuous quantum error correction protocolCoherent feedback formulation of a continuous quantum error correction protocol
Coherent feedback formulation of a continuous quantum error correction protocol
 
Predictve data mining
Predictve data miningPredictve data mining
Predictve data mining
 
A new class of a stable implicit schemes for treatment of stiff
A new class of a stable implicit schemes for treatment of stiffA new class of a stable implicit schemes for treatment of stiff
A new class of a stable implicit schemes for treatment of stiff
 
Beck Workshop on Modelling and Simulation of Coal-fired Power Generation and ...
Beck Workshop on Modelling and Simulation of Coal-fired Power Generation and ...Beck Workshop on Modelling and Simulation of Coal-fired Power Generation and ...
Beck Workshop on Modelling and Simulation of Coal-fired Power Generation and ...
 
Theory of Relational Calculus and its Formalization
Theory of Relational Calculus and its FormalizationTheory of Relational Calculus and its Formalization
Theory of Relational Calculus and its Formalization
 
Dragisa Zunic - Classical computing with explicit structural rules - the *X c...
Dragisa Zunic - Classical computing with explicit structural rules - the *X c...Dragisa Zunic - Classical computing with explicit structural rules - the *X c...
Dragisa Zunic - Classical computing with explicit structural rules - the *X c...
 
Lecture10 outilier l0_svdd
Lecture10 outilier l0_svddLecture10 outilier l0_svdd
Lecture10 outilier l0_svdd
 
A Coq Library for the Theory of Relational Calculus
A Coq Library for the Theory of Relational CalculusA Coq Library for the Theory of Relational Calculus
A Coq Library for the Theory of Relational Calculus
 
Mapping Ash Tree Colonization in an Agricultural Moutain Landscape_ Investiga...
Mapping Ash Tree Colonization in an Agricultural Moutain Landscape_ Investiga...Mapping Ash Tree Colonization in an Agricultural Moutain Landscape_ Investiga...
Mapping Ash Tree Colonization in an Agricultural Moutain Landscape_ Investiga...
 
Talk iccf 19_ben_hammouda
Talk iccf 19_ben_hammoudaTalk iccf 19_ben_hammouda
Talk iccf 19_ben_hammouda
 
Hierarchical Deterministic Quadrature Methods for Option Pricing under the Ro...
Hierarchical Deterministic Quadrature Methods for Option Pricing under the Ro...Hierarchical Deterministic Quadrature Methods for Option Pricing under the Ro...
Hierarchical Deterministic Quadrature Methods for Option Pricing under the Ro...
 
Hibbeler chapter10
Hibbeler chapter10Hibbeler chapter10
Hibbeler chapter10
 
Introduction to harmonic analysis on groups, links with spatial correlation.
Introduction to harmonic analysis on groups, links with spatial correlation.Introduction to harmonic analysis on groups, links with spatial correlation.
Introduction to harmonic analysis on groups, links with spatial correlation.
 
CMA-ES with local meta-models
CMA-ES with local meta-modelsCMA-ES with local meta-models
CMA-ES with local meta-models
 

Viewers also liked

Nonlinear component analysis as a kernel eigenvalue problem
Nonlinear component analysis as a kernel eigenvalue problemNonlinear component analysis as a kernel eigenvalue problem
Nonlinear component analysis as a kernel eigenvalue problem
Michele Filannino
 
Kernel Entropy Component Analysis in Remote Sensing Data Clustering.pdf
Kernel Entropy Component Analysis in Remote Sensing Data Clustering.pdfKernel Entropy Component Analysis in Remote Sensing Data Clustering.pdf
Kernel Entropy Component Analysis in Remote Sensing Data Clustering.pdf
grssieee
 
fauvel_igarss.pdf
fauvel_igarss.pdffauvel_igarss.pdf
fauvel_igarss.pdf
grssieee
 
Different kind of distance and Statistical Distance
Different kind of distance and Statistical DistanceDifferent kind of distance and Statistical Distance
Different kind of distance and Statistical Distance
Khulna University
 
Principal Component Analysis For Novelty Detection
Principal Component Analysis For Novelty DetectionPrincipal Component Analysis For Novelty Detection
Principal Component Analysis For Novelty Detection
Jordan McBain
 
KPCA_Survey_Report
KPCA_Survey_ReportKPCA_Survey_Report
KPCA_Survey_Report
Randy Salm
 
Adaptive anomaly detection with kernel eigenspace splitting and merging
Adaptive anomaly detection with kernel eigenspace splitting and mergingAdaptive anomaly detection with kernel eigenspace splitting and merging
Adaptive anomaly detection with kernel eigenspace splitting and merging
ieeepondy
 
Analyzing Kernel Security and Approaches for Improving it
Analyzing Kernel Security and Approaches for Improving itAnalyzing Kernel Security and Approaches for Improving it
Analyzing Kernel Security and Approaches for Improving it
Milan Rajpara
 
Modeling and forecasting age-specific mortality: Lee-Carter method vs. Functi...
Modeling and forecasting age-specific mortality: Lee-Carter method vs. Functi...Modeling and forecasting age-specific mortality: Lee-Carter method vs. Functi...
Modeling and forecasting age-specific mortality: Lee-Carter method vs. Functi...
hanshang
 
Explicit Signal to Noise Ratio in Reproducing Kernel Hilbert Spaces.pdf
Explicit Signal to Noise Ratio in Reproducing Kernel Hilbert Spaces.pdfExplicit Signal to Noise Ratio in Reproducing Kernel Hilbert Spaces.pdf
Explicit Signal to Noise Ratio in Reproducing Kernel Hilbert Spaces.pdf
grssieee
 
A Comparative Study between ICA (Independent Component Analysis) and PCA (Pri...
A Comparative Study between ICA (Independent Component Analysis) and PCA (Pri...A Comparative Study between ICA (Independent Component Analysis) and PCA (Pri...
A Comparative Study between ICA (Independent Component Analysis) and PCA (Pri...
Sahidul Islam
 
Regularized Principal Component Analysis for Spatial Data
Regularized Principal Component Analysis for Spatial DataRegularized Principal Component Analysis for Spatial Data
Regularized Principal Component Analysis for Spatial Data
Wen-Ting Wang
 
Pca and kpca of ecg signal
Pca and kpca of ecg signalPca and kpca of ecg signal
Pca and kpca of ecg signal
es712
 
DataEngConf: Feature Extraction: Modern Questions and Challenges at Google
DataEngConf: Feature Extraction: Modern Questions and Challenges at GoogleDataEngConf: Feature Extraction: Modern Questions and Challenges at Google
DataEngConf: Feature Extraction: Modern Questions and Challenges at Google
Hakka Labs
 
Probabilistic PCA, EM, and more
Probabilistic PCA, EM, and moreProbabilistic PCA, EM, and more
Probabilistic PCA, EM, and more
hsharmasshare
 
Principal component analysis and matrix factorizations for learning (part 1) ...
Principal component analysis and matrix factorizations for learning (part 1) ...Principal component analysis and matrix factorizations for learning (part 1) ...
Principal component analysis and matrix factorizations for learning (part 1) ...
zukun
 
Principal Component Analysis and Clustering
Principal Component Analysis and ClusteringPrincipal Component Analysis and Clustering
Principal Component Analysis and Clustering
Usha Vijay
 
ECG: Indication and Interpretation
ECG: Indication and InterpretationECG: Indication and Interpretation
ECG: Indication and Interpretation
Rakesh Verma
 
Introduction to Statistical Machine Learning
Introduction to Statistical Machine LearningIntroduction to Statistical Machine Learning
Introduction to Statistical Machine Learning
mahutte
 
Principal component analysis
Principal component analysisPrincipal component analysis
Principal component analysis
Farah M. Altufaili
 

Viewers also liked (20)

Nonlinear component analysis as a kernel eigenvalue problem
Nonlinear component analysis as a kernel eigenvalue problemNonlinear component analysis as a kernel eigenvalue problem
Nonlinear component analysis as a kernel eigenvalue problem
 
Kernel Entropy Component Analysis in Remote Sensing Data Clustering.pdf
Kernel Entropy Component Analysis in Remote Sensing Data Clustering.pdfKernel Entropy Component Analysis in Remote Sensing Data Clustering.pdf
Kernel Entropy Component Analysis in Remote Sensing Data Clustering.pdf
 
fauvel_igarss.pdf
fauvel_igarss.pdffauvel_igarss.pdf
fauvel_igarss.pdf
 
Different kind of distance and Statistical Distance
Different kind of distance and Statistical DistanceDifferent kind of distance and Statistical Distance
Different kind of distance and Statistical Distance
 
Principal Component Analysis For Novelty Detection
Principal Component Analysis For Novelty DetectionPrincipal Component Analysis For Novelty Detection
Principal Component Analysis For Novelty Detection
 
KPCA_Survey_Report
KPCA_Survey_ReportKPCA_Survey_Report
KPCA_Survey_Report
 
Adaptive anomaly detection with kernel eigenspace splitting and merging
Adaptive anomaly detection with kernel eigenspace splitting and mergingAdaptive anomaly detection with kernel eigenspace splitting and merging
Adaptive anomaly detection with kernel eigenspace splitting and merging
 
Analyzing Kernel Security and Approaches for Improving it
Analyzing Kernel Security and Approaches for Improving itAnalyzing Kernel Security and Approaches for Improving it
Analyzing Kernel Security and Approaches for Improving it
 
Modeling and forecasting age-specific mortality: Lee-Carter method vs. Functi...
Modeling and forecasting age-specific mortality: Lee-Carter method vs. Functi...Modeling and forecasting age-specific mortality: Lee-Carter method vs. Functi...
Modeling and forecasting age-specific mortality: Lee-Carter method vs. Functi...
 
Explicit Signal to Noise Ratio in Reproducing Kernel Hilbert Spaces.pdf
Explicit Signal to Noise Ratio in Reproducing Kernel Hilbert Spaces.pdfExplicit Signal to Noise Ratio in Reproducing Kernel Hilbert Spaces.pdf
Explicit Signal to Noise Ratio in Reproducing Kernel Hilbert Spaces.pdf
 
A Comparative Study between ICA (Independent Component Analysis) and PCA (Pri...
A Comparative Study between ICA (Independent Component Analysis) and PCA (Pri...A Comparative Study between ICA (Independent Component Analysis) and PCA (Pri...
A Comparative Study between ICA (Independent Component Analysis) and PCA (Pri...
 
Regularized Principal Component Analysis for Spatial Data
Regularized Principal Component Analysis for Spatial DataRegularized Principal Component Analysis for Spatial Data
Regularized Principal Component Analysis for Spatial Data
 
Pca and kpca of ecg signal
Pca and kpca of ecg signalPca and kpca of ecg signal
Pca and kpca of ecg signal
 
DataEngConf: Feature Extraction: Modern Questions and Challenges at Google
DataEngConf: Feature Extraction: Modern Questions and Challenges at GoogleDataEngConf: Feature Extraction: Modern Questions and Challenges at Google
DataEngConf: Feature Extraction: Modern Questions and Challenges at Google
 
Probabilistic PCA, EM, and more
Probabilistic PCA, EM, and moreProbabilistic PCA, EM, and more
Probabilistic PCA, EM, and more
 
Principal component analysis and matrix factorizations for learning (part 1) ...
Principal component analysis and matrix factorizations for learning (part 1) ...Principal component analysis and matrix factorizations for learning (part 1) ...
Principal component analysis and matrix factorizations for learning (part 1) ...
 
Principal Component Analysis and Clustering
Principal Component Analysis and ClusteringPrincipal Component Analysis and Clustering
Principal Component Analysis and Clustering
 
ECG: Indication and Interpretation
ECG: Indication and InterpretationECG: Indication and Interpretation
ECG: Indication and Interpretation
 
Introduction to Statistical Machine Learning
Introduction to Statistical Machine LearningIntroduction to Statistical Machine Learning
Introduction to Statistical Machine Learning
 
Principal component analysis
Principal component analysisPrincipal component analysis
Principal component analysis
 

Similar to Principal component analysis and matrix factorizations for learning (part 2) ding - icml 2005 tutorial - 2005

icml2004 tutorial on spectral clustering part I (zukun)
ABC workshop: 17w5025 (Christian Robert)
Matrix Computations in Machine Learning (butest)
YSC 2013 (Adrien Ickowicz)
Triangle counting handout (csedays)
Principal component analysis and matrix factorizations for learning (part 3) ... (zukun)
Logistic Regression(SGD) (Prentice Xu)
Image segmentation 3 morphology (Rumah Belajar)
ABC-Gibbs (Christian Robert)
Lesson 5: Matrix Algebra (slides) (Matthew Leingang)
Social Network Analysis (rik0)
Divergence clustering (Frank Nielsen)
Masters Thesis Defense (ssj4mathgenius)
CVPR2010: Advanced ITinCVPR in a Nutshell: part 7: Future Trend (zukun)
TunUp final presentation (Gianmario Spacagna)
Program on Quasi-Monte Carlo and High-Dimensional Sampling Methods for Applie... (The Statistical and Applied Mathematical Sciences Institute)
Jam 2006 Test Papers Mathematical Statistics (ashu29)
Cs229 notes7a (VuTran231)
Automatic bayesian cubature (Jagadeeswaran Rathinavel)
C4 January 2012 QP (anicholls1234)


More from zukun

My lyn tutorial 2009
ETHZ CV2012: Tutorial openCV
ETHZ CV2012: Information
Siwei lyu: natural image statistics
Lecture9 camera calibration
Brunelli 2008: template matching techniques in computer vision
Modern features-part-4-evaluation
Modern features-part-3-software
Modern features-part-2-descriptors
Modern features-part-1-detectors
Modern features-part-0-intro
Lecture 02 internet video search
Lecture 01 internet video search
Lecture 03 internet video search
Icml2012 tutorial representation_learning
Advances in discrete energy minimisation for computer vision
Gephi tutorial: quick start
EM algorithm and its application in probabilistic latent semantic analysis
Object recognition with pictorial structures
Iccv2011 learning spatiotemporal graphs of human activities



Principal component analysis and matrix factorizations for learning (part 2) ding - icml 2005 tutorial - 2005

  • 1. Part 2. Spectral Clustering from Matrix Perspective A brief tutorial emphasizing recent developments (More detailed tutorial is given in ICML’04 ) PCA & Matrix Factorizations for Learning, ICML 2005 Tutorial, Chris Ding 56
  • 2. From PCA to spectral clustering using generalized eigenvectors
    Consider the kernel matrix: W_ij = <φ(x_i), φ(x_j)>
    In Kernel PCA we compute the eigenvectors: Wv = λv
    Generalized eigenvector: Wq = λDq, where D = diag(d_1, …, d_n) and d_i = Σ_j w_ij
    This leads to Spectral Clustering!
    PCA & Matrix Factorizations for Learning, ICML 2005 Tutorial, Chris Ding 57
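The slide stops at the eigenproblem itself. As a rough illustration (not part of the tutorial), here is a minimal numpy/scipy sketch contrasting the ordinary eigenproblem Wv = λv with the generalized one Wq = λDq on a small made-up similarity matrix; the toy matrix, variable names, and the use of scipy.linalg.eigh are our own choices.

```python
import numpy as np
from scipy.linalg import eigh

# Toy similarity (kernel) matrix W: two blocks, weakly connected.
W = np.array([[1.0, 0.9, 0.1, 0.0],
              [0.9, 1.0, 0.0, 0.1],
              [0.1, 0.0, 1.0, 0.8],
              [0.0, 0.1, 0.8, 1.0]])
D = np.diag(W.sum(axis=1))          # d_i = sum_j w_ij

# Kernel-PCA-style eigenproblem: W v = lambda v
evals, evecs = np.linalg.eigh(W)

# Generalized eigenproblem: W q = lambda D q (the spectral-clustering version)
gvals, gvecs = eigh(W, D)           # eigenvalues in ascending order
q_top = gvecs[:, -2]                # second-largest; the largest (lambda = 1) is the trivial constant vector
print(gvals, q_top)
```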
  • 3. Indicator Matrix Quadratic Clustering Framework
    Unsigned cluster indicator matrix H = (h_1, …, h_K)
    Kernel K-means clustering: max_H Tr(H^T W H), s.t. H^T H = I, H ≥ 0
    K-means: W = X^T X; Kernel K-means: W = (<φ(x_i), φ(x_j)>)
    Spectral clustering (normalized cut): max_H Tr(H^T W H), s.t. H^T D H = I, H ≥ 0
    PCA & Matrix Factorizations for Learning, ICML 2005 Tutorial, Chris Ding 58
  • 4. Brief Introduction to Spectral Clustering (Laplacian matrix based clustering) PCA & Matrix Factorizations for Learning, ICML 2005 Tutorial, Chris Ding 59
  • 5. Some historical notes
    – Fiedler, 1973, 1975, graph Laplacian matrix
    – Donath & Hoffman, 1973, bounds
    – Hall, 1970, Quadratic Placement (embedding)
    – Pothen, Simon, Liou, 1990, spectral graph partitioning (many related papers thereafter)
    – Hagen & Kahng, 1992, Ratio-cut
    – Chan, Schlag & Zien, multi-way Ratio-cut
    – Chung, 1997, Spectral Graph Theory book
    – Shi & Malik, 2000, Normalized Cut
    PCA & Matrix Factorizations for Learning, ICML 2005 Tutorial, Chris Ding 60
  • 6. Spectral Gold-Rush of 2001: 9 papers on spectral clustering
    – Meila & Shi, AI-Stat 2001. Random walk interpretation of Normalized Cut
    – Ding, He & Zha, KDD 2001. Perturbation analysis of the Laplacian matrix on sparsely connected graphs
    – Ng, Jordan & Weiss, NIPS 2001. K-means algorithm on the embedded eigenspace
    – Belkin & Niyogi, NIPS 2001. Spectral embedding
    – Dhillon, KDD 2001. Bipartite graph clustering
    – Zha et al, CIKM 2001. Bipartite graph clustering
    – Zha et al, NIPS 2001. Spectral relaxation of K-means
    – Ding et al, ICDM 2001. MinMaxCut, uniqueness of relaxation
    – Gu et al. K-way relaxation of NormCut and MinMaxCut
    PCA & Matrix Factorizations for Learning, ICML 2005 Tutorial, Chris Ding 61
  • 7. Spectral Clustering: min cutsize, without explicit size constraints. But where to cut? Need to balance sizes. PCA & Matrix Factorizations for Learning, ICML 2005 Tutorial, Chris Ding 62
  • 8. Graph Clustering
    min between-cluster similarities (weights): sim(A,B) = Σ_{i∈A} Σ_{j∈B} w_ij
    max within-cluster similarities (weights): sim(A,A) = Σ_{i∈A} Σ_{j∈A} w_ij
    Balance weight, balance size, balance volume.
    PCA & Matrix Factorizations for Learning, ICML 2005 Tutorial, Chris Ding 63
  • 9. Clustering Objective Functions, with s(A,B) = Σ_{i∈A} Σ_{j∈B} w_ij
    Ratio Cut: J_Rcut(A,B) = s(A,B)/|A| + s(A,B)/|B|
    Normalized Cut: J_Ncut(A,B) = s(A,B)/d_A + s(A,B)/d_B = s(A,B)/(s(A,A) + s(A,B)) + s(A,B)/(s(B,B) + s(A,B)), where d_A = Σ_{i∈A} d_i
    Min-Max-Cut: J_MMC(A,B) = s(A,B)/s(A,A) + s(A,B)/s(B,B)
    PCA & Matrix Factorizations for Learning, ICML 2005 Tutorial, Chris Ding 64
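To make the three objectives concrete, here is a small helper (ours, not from the slides) that evaluates J_Rcut, J_Ncut and J_MMC for a given 2-way partition of a toy similarity matrix.

```python
import numpy as np

def cut_objectives(W, A, B):
    """Evaluate J_Rcut, J_Ncut, J_MMC for a 2-way partition (A, B) of the nodes."""
    s = lambda I, J: W[np.ix_(I, J)].sum()          # s(I,J) = sum_{i in I, j in J} w_ij
    d = W.sum(axis=1)                               # node degrees
    sAB = s(A, B)
    J_rcut = sAB / len(A) + sAB / len(B)
    J_ncut = sAB / d[A].sum() + sAB / d[B].sum()    # d_A = sum_{i in A} d_i
    J_mmc  = sAB / s(A, A) + sAB / s(B, B)
    return J_rcut, J_ncut, J_mmc

# Toy graph: two dense 3-node blocks joined by one weak edge.
W = np.zeros((6, 6))
W[:3, :3] = W[3:, 3:] = 0.8
W[2, 3] = W[3, 2] = 0.1
np.fill_diagonal(W, 0.0)
print(cut_objectives(W, A=[0, 1, 2], B=[3, 4, 5]))
```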
  • 10. Normalized Cut (Shi & Malik, 2000)
    Min similarity between A and B, s(A,B) = Σ_{i∈A} Σ_{j∈B} w_ij, balanced by the weights d_A = Σ_{i∈A} d_i:
    J_Ncut(A,B) = s(A,B)/d_A + s(A,B)/d_B
    Cluster indicator: q(i) = sqrt(d_B/(d_A d)) if i ∈ A, q(i) = −sqrt(d_A/(d_B d)) if i ∈ B, where d = Σ_{i∈G} d_i
    Normalization: q^T D q = 1, q^T D e = 0
    Substituting q leads to J_Ncut(q) = q^T (D − W) q, i.e. min_q q^T (D − W) q + λ(q^T D q − 1)
    The solution is an eigenvector of (D − W) q = λ D q
    PCA & Matrix Factorizations for Learning, ICML 2005 Tutorial, Chris Ding 65
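A minimal sketch of the resulting 2-way algorithm, assuming the usual practice of splitting the second generalized eigenvector by sign; the function name and toy graph are ours, not from the tutorial.

```python
import numpy as np
from scipy.linalg import eigh

def normalized_cut_2way(W):
    """2-way normalized cut: split the second-smallest generalized eigenvector
    of (D - W) q = lambda D q by sign."""
    d = W.sum(axis=1)
    D = np.diag(d)
    vals, vecs = eigh(D - W, D)      # ascending; vals[0] ~ 0 with a constant eigenvector
    q = vecs[:, 1]                   # relaxed Ncut indicator
    return q, (q >= 0).astype(int)   # sign split gives the two clusters

# Two dense blocks with a few weak cross links (cf. the simple example on the next slide).
W = np.zeros((10, 10))
W[:5, :5] = W[5:, 5:] = 0.9
W[2, 7] = W[7, 2] = W[4, 5] = W[5, 4] = 0.05
np.fill_diagonal(W, 0.0)
q2, labels = normalized_cut_2way(W)
print(labels)
```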
  • 11. A simple example: 2 dense clusters, with sparse connections between them. (Figures: the adjacency matrix and the eigenvector q2.) PCA & Matrix Factorizations for Learning, ICML 2005 Tutorial, Chris Ding 66
  • 12. K-way Spectral Clustering K≥2 PCA & Matrix Factorizations for Learning, ICML 2005 Tutorial, Chris Ding 67
  • 13. K-way Clustering Objectives
    Ratio Cut: J_Rcut(C_1, …, C_K) = Σ_{<k,l>} [ s(C_k,C_l)/|C_k| + s(C_k,C_l)/|C_l| ] = Σ_k s(C_k, G−C_k)/|C_k|
    Normalized Cut: J_Ncut(C_1, …, C_K) = Σ_{<k,l>} [ s(C_k,C_l)/d_k + s(C_k,C_l)/d_l ] = Σ_k s(C_k, G−C_k)/d_k
    Min-Max-Cut: J_MMC(C_1, …, C_K) = Σ_{<k,l>} [ s(C_k,C_l)/s(C_k,C_k) + s(C_k,C_l)/s(C_l,C_l) ] = Σ_k s(C_k, G−C_k)/s(C_k,C_k)
    PCA & Matrix Factorizations for Learning, ICML 2005 Tutorial, Chris Ding 68
  • 14. K-way Spectral Relaxation
    Unsigned cluster indicators: h_1 = (1…1, 0…0, 0…0)^T, h_2 = (0…0, 1…1, 0…0)^T, …, h_k = (0…0, 0…0, 1…1)^T
    Re-write:
    J_Rcut(h_1, …, h_k) = h_1^T(D−W)h_1 / h_1^T h_1 + … + h_k^T(D−W)h_k / h_k^T h_k
    J_Ncut(h_1, …, h_k) = h_1^T(D−W)h_1 / h_1^T D h_1 + … + h_k^T(D−W)h_k / h_k^T D h_k
    J_MMC(h_1, …, h_k) = h_1^T(D−W)h_1 / h_1^T W h_1 + … + h_k^T(D−W)h_k / h_k^T W h_k
    PCA & Matrix Factorizations for Learning, ICML 2005 Tutorial, Chris Ding 69
  • 15. K-way Normalized Cut Spectral Relaxation
    Unsigned cluster indicators (n_k ones in block k): y_k = D^{1/2} (0…0, 1…1, 0…0)^T / ||D^{1/2} h_k||
    Re-write: J_Ncut(y_1, …, y_k) = y_1^T(I − W~)y_1 + … + y_k^T(I − W~)y_k = Tr(Y^T (I − W~) Y), where W~ = D^{−1/2} W D^{−1/2}
    Optimize: min_Y Tr(Y^T (I − W~) Y), subject to Y^T Y = I
    By Ky Fan's theorem, the optimal solution is given by eigenvectors Y = (v_1, v_2, …, v_k), (I − W~) v_k = λ_k v_k, equivalently (D − W) u_k = λ_k D u_k with u_k = D^{−1/2} v_k
    λ_1 + … + λ_k ≤ min J_Ncut(y_1, …, y_k)   (Gu et al, 2001)
    PCA & Matrix Factorizations for Learning, ICML 2005 Tutorial, Chris Ding 70
  • 16. K-way Spectral Clustering is difficult
    – Spectral clustering is best applied to 2-way clustering: positive entries for one cluster, negative entries for the other.
    – For K-way (K > 2) clustering, positive and negative signs make cluster assignment difficult. Options include:
      recursive 2-way clustering;
      low-dimensional embedding: project the data onto the eigenvector subspace and use another clustering method such as K-means (Ng et al; Zha et al; Bach & Jordan, etc);
      linearized cluster assignment using spectral ordering and cluster crossing.
    (A sketch of the embedding route follows below.)
    PCA & Matrix Factorizations for Learning, ICML 2005 Tutorial, Chris Ding 71
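A rough sketch of the embedding route mentioned above (eigenvector subspace plus K-means), in the spirit of the cited papers rather than a faithful reimplementation of any of them; the row normalization, the toy graph, and the function name are our own choices.

```python
import numpy as np
from scipy.linalg import eigh
from scipy.cluster.vq import kmeans2

def kway_spectral(W, K):
    """K-way clustering: embed the nodes into the top-K eigenvectors of
    W~ = D^{-1/2} W D^{-1/2}, then run K-means on the embedded rows."""
    d = W.sum(axis=1)
    d_isqrt = 1.0 / np.sqrt(d)
    W_tilde = d_isqrt[:, None] * W * d_isqrt[None, :]
    _, vecs = eigh(W_tilde)                            # ascending eigenvalues
    Y = vecs[:, -K:]                                   # top-K eigenvectors as columns
    Y = Y / np.linalg.norm(Y, axis=1, keepdims=True)   # row-normalize before K-means
    _, labels = kmeans2(Y, K, minit='++')
    return labels

# Toy graph: three dense blocks with weak cross links.
W = np.full((9, 9), 0.01)
for blk in (slice(0, 3), slice(3, 6), slice(6, 9)):
    W[blk, blk] = 0.9
np.fill_diagonal(W, 0.0)
print(kway_spectral(W, K=3))
```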
  • 17. Scaled PCA: a Unified Framework for Clustering and Ordering
    Scaled PCA has two optimality properties: distance-sensitive ordering, and min-max principle clustering.
    SPCA on a contingency table ⇒ Correspondence Analysis: simultaneous ordering of rows and columns; simultaneous clustering of rows and columns.
    PCA & Matrix Factorizations for Learning, ICML 2005 Tutorial, Chris Ding 72
  • 18. Scaled PCA
    Similarity matrix S = (s_ij) (generated from X X^T); D = diag(d_1, …, d_n), d_i = s_i.
    Nonlinear re-scaling: S~ = D^{−1/2} S D^{−1/2}, s~_ij = s_ij / (s_i. s_j.)^{1/2}
    Apply SVD on S~ ⇒ S = D^{1/2} S~ D^{1/2} = D^{1/2} [Σ_k z_k λ_k z_k^T] D^{1/2} = D [Σ_k q_k λ_k q_k^T] D
    q_k = D^{−1/2} z_k is the scaled principal component.
    Subtract the trivial component λ_0 = 1, z_0 = d^{1/2}/s..^{1/2}, q_0 = 1 ⇒ S − d d^T/s.. = D Σ_{k≥1} q_k λ_k q_k^T D   (Ding et al, 2002)
    PCA & Matrix Factorizations for Learning, ICML 2005 Tutorial, Chris Ding 73
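A minimal numpy sketch of this construction (ours, not from the tutorial): eigen-decompose the rescaled matrix S~ and map the eigenvectors back with D^{-1/2}.

```python
import numpy as np

def scaled_pca(S):
    """Scaled PCA of a symmetric similarity matrix S: eigen-decompose
    S~ = D^{-1/2} S D^{-1/2} and rescale the eigenvectors, q_k = D^{-1/2} z_k."""
    d = S.sum(axis=1)                      # d_i = s_i.
    d_isqrt = 1.0 / np.sqrt(d)
    S_tilde = d_isqrt[:, None] * S * d_isqrt[None, :]
    vals, Z = np.linalg.eigh(S_tilde)      # ascending
    vals, Z = vals[::-1], Z[:, ::-1]       # descending; the leading lambda_0 = 1 is the trivial component
    Q = d_isqrt[:, None] * Z               # scaled principal components q_k = D^{-1/2} z_k
    return vals, Q                         # S is approximated by D (sum_k q_k lambda_k q_k^T) D
```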
  • 19. Scaled PCA on a Rectangular Matrix ⇒ Correspondence Analysis
    Nonlinear re-scaling: P~ = D_r^{−1/2} P D_c^{−1/2}, p~_ij = p_ij / (p_i. p_.j)^{1/2}
    Apply SVD on P~ and subtract the trivial component:
    P − r c^T / p.. = D_r Σ_{k≥1} f_k λ_k g_k^T D_c, where r = (p_1., …, p_n.)^T and c = (p_.1, …, p_.n)^T
    f_k = D_r^{−1/2} u_k, g_k = D_c^{−1/2} v_k are the scaled row and column principal components (standard coordinates in CA).
    PCA & Matrix Factorizations for Learning, ICML 2005 Tutorial, Chris Ding 74
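A small sketch of the corresponding computation for a contingency table (ours); it assumes strictly positive row and column marginals, and drops the trivial leading singular triplet as on the slide.

```python
import numpy as np

def correspondence_analysis(counts):
    """Correspondence analysis via SVD of P~ = Dr^{-1/2} P Dc^{-1/2};
    returns scaled row/column coordinates f_k, g_k (trivial component dropped)."""
    P = counts / counts.sum()              # correspondence matrix, so p.. = 1
    r, c = P.sum(axis=1), P.sum(axis=0)    # row and column marginals
    P_tilde = P / np.sqrt(np.outer(r, c))
    U, svals, Vt = np.linalg.svd(P_tilde, full_matrices=False)
    # The first singular triplet (sigma = 1) is the trivial component; drop it.
    F = U[:, 1:] / np.sqrt(r)[:, None]     # scaled row coordinates f_k = Dr^{-1/2} u_k
    G = Vt.T[:, 1:] / np.sqrt(c)[:, None]  # scaled column coordinates g_k = Dc^{-1/2} v_k
    return svals[1:], F, G

counts = np.array([[30.0, 2.0, 1.0], [3.0, 25.0, 2.0], [1.0, 2.0, 20.0]])
sv, F, G = correspondence_analysis(counts)
```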
  • 20. Correspondence Analysis (CA) • Mainly used in graphical display of data • Popular in France (Benzécri, 1969) • Long history – Simultaneous row and column regression (Hirschfeld, 1935) – Reciprocal averaging (Richardson & Kuder, 1933; Horst, 1935; Fisher, 1940; Hill, 1974) – Canonical correlations, dual scaling, etc. • Formulation is a bit complicated (“convoluted” Jolliffe, 2002, p.342) • “A neglected method”, (Hill, 1974) PCA & Matrix Factorizations for Learning, ICML 2005 Tutorial, Chris Ding 75
  • 21. Clustering of Bipartite Graphs (rectangle matrix) Simultaneous clustering of rows and columns of a contingency table (adjacency matrix B ) Examples of bipartite graphs • Information Retrieval: word-by-document matrix • Market basket data: transaction-by-item matrix • DNA Gene expression profiles • Protein vs protein-complex PCA & Matrix Factorizations for Learning, ICML 2005 Tutorial, Chris Ding 76
  • 22. Bipartite Graph Clustering
    Clustering indicators for rows and columns: f(i) = 1 if r_i ∈ R_1, −1 if r_i ∈ R_2; g(i) = 1 if c_i ∈ C_1, −1 if c_i ∈ C_2
    B = [B_{R1,C1}, B_{R1,C2}; B_{R2,C1}, B_{R2,C2}],  W = [0, B; B^T, 0],  q = (f; g)
    Substituting gives J_MMC(C_1, C_2; R_1, R_2) = s(W_12)/s(W_11) + s(W_12)/s(W_22)
    f, g are determined by the generalized eigenproblem [ (D_r, 0; 0, D_c) − (0, B; B^T, 0) ] (f; g) = λ (D_r, 0; 0, D_c) (f; g)
    PCA & Matrix Factorizations for Learning, ICML 2005 Tutorial, Chris Ding 77
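A minimal sketch of this co-clustering recipe (ours, not from the tutorial), assuming B has no all-zero rows or columns so that the degree matrix is invertible.

```python
import numpy as np
from scipy.linalg import eigh

def bipartite_cocluster(B):
    """2-way co-clustering of a rectangular matrix B: build W = [[0, B], [B^T, 0]],
    solve (D - W) q = lambda D q, and split rows/columns by the sign of q = (f; g)."""
    n_r, n_c = B.shape
    W = np.block([[np.zeros((n_r, n_r)), B],
                  [B.T, np.zeros((n_c, n_c))]])
    D = np.diag(W.sum(axis=1))            # assumes no empty rows or columns in B
    vals, vecs = eigh(D - W, D)
    q = vecs[:, 1]                        # second-smallest generalized eigenvector
    f, g = q[:n_r], q[n_r:]
    return (f >= 0).astype(int), (g >= 0).astype(int)   # row labels, column labels
```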
  • 23. Spectral Clustering of Bipartite Graphs
    Simultaneous clustering of rows and columns (adjacency matrix B), with s(B_{R1,C2}) = Σ_{r_i∈R_1} Σ_{c_j∈C_2} b_ij
    min between-cluster sums of weights: s(R_1,C_2), s(R_2,C_1); max within-cluster sums of weights: s(R_1,C_1), s(R_2,C_2)
    J_MMC(C_1, C_2; R_1, R_2) = [s(B_{R1,C2}) + s(B_{R2,C1})] / [2 s(B_{R1,C1})] + [s(B_{R1,C2}) + s(B_{R2,C1})] / [2 s(B_{R2,C2})]
    (Ding, AI-STAT 2003)
    PCA & Matrix Factorizations for Learning, ICML 2005 Tutorial, Chris Ding 78
  • 24. Internet Newsgroups Simultaneous clustering of documents and words PCA & Matrix Factorizations for Learning, ICML 2005 Tutorial, Chris Ding 79
  • 25. Embedding in Principal Subspace: Cluster Self-Aggregation (proved via perturbation analysis) (Hall, 1970, “quadratic placement” (embedding) of a graph) PCA & Matrix Factorizations for Learning, ICML 2005 Tutorial, Chris Ding 80
  • 26. Spectral Embedding: Self-aggregation • Compute K eigenvectors of the Laplacian. • Embed objects in the K-dim eigenspace (Ding, 2004) PCA & Matrix Factorizations for Learning, ICML 2005 Tutorial, Chris Ding 81
  • 27. Spectral embedding is not topology preserving: 700 3-D data points forming two interlocking rings; in eigenspace they shrink and separate. PCA & Matrix Factorizations for Learning, ICML 2005 Tutorial, Chris Ding 82
  • 28. Spectral Embedding: Simplex Embedding Theorem
    Objects self-aggregate to K centroids; the centroids lie on the K corners of a simplex.
    The simplex consists of K basis vectors plus the coordinate origin.
    The simplex is rotated by an orthogonal transformation T; T is determined by perturbation analysis. (Ding, 2004)
    PCA & Matrix Factorizations for Learning, ICML 2005 Tutorial, Chris Ding 83
  • 29. Perturbation Analysis
    Wq = λDq  ⇔  Ŵ z = (D^{−1/2} W D^{−1/2}) z = λz, with q = D^{−1/2} z
    Assume the data has 3 dense clusters C_1, C_2, C_3, sparsely connected:
    W = [W_11, W_12, W_13; W_21, W_22, W_23; W_31, W_32, W_33]
    The off-diagonal blocks are between-cluster connections, assumed small and treated as a perturbation. (Ding et al, KDD’01)
    PCA & Matrix Factorizations for Learning, ICML 2005 Tutorial, Chris Ding 84
  • 30. Spectral Perturbation Theorem
    Orthogonal transform matrix T = (t_1, …, t_K); T is determined by Γ~ t_k = λ_k t_k
    Spectral perturbation matrix: Γ~ = Ω^{−1/2} Γ Ω^{−1/2}, with
    Γ = [ h_11, −s_12, …, −s_1K;  −s_21, h_22, …, −s_2K;  …;  −s_K1, −s_K2, …, h_KK ],  s_pq = s(C_p, C_q),  h_kk = Σ_{p≠k} s_kp
    Ω = diag[ρ(C_1), …, ρ(C_K)]
    PCA & Matrix Factorizations for Learning, ICML 2005 Tutorial, Chris Ding 85
  • 31. Connectivity Network
    C_ij = 1 if i, j belong to the same cluster, 0 otherwise
    Scaled PCA provides C ≅ D Σ_{k=1}^{K} q_k λ_k q_k^T D
    Green’s function: C ≈ G = Σ_{k=2}^{K} q_k q_k^T / (1 − λ_k)
    Projection matrix: C ≈ P ≡ Σ_{k=1}^{K} q_k q_k^T
    (Ding et al, 2002)
    PCA & Matrix Factorizations for Learning, ICML 2005 Tutorial, Chris Ding 86
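The three estimates can be computed directly from the scaled principal components; the following standalone sketch (ours, not from the tutorial) recomputes them from a symmetric similarity matrix S.

```python
import numpy as np

def connectivity_estimates(S, K):
    """Three approximations to the connectivity matrix C_ij (1 if i, j are in the
    same cluster, else 0) from the K leading scaled principal components of S."""
    d = S.sum(axis=1)
    d_isqrt = 1.0 / np.sqrt(d)
    vals, Z = np.linalg.eigh(d_isqrt[:, None] * S * d_isqrt[None, :])
    vals, Z = vals[::-1], Z[:, ::-1]             # descending; vals[0] = 1 is the trivial component
    Q = d_isqrt[:, None] * Z                     # scaled principal components q_k
    D = np.diag(d)
    C_spca  = D @ (Q[:, :K] * vals[:K]) @ Q[:, :K].T @ D     # C ~ D sum_k q_k lambda_k q_k^T D
    C_green = (Q[:, 1:K] / (1.0 - vals[1:K])) @ Q[:, 1:K].T  # Green's function, k = 2..K
    C_proj  = Q[:, :K] @ Q[:, :K].T                          # projection matrix
    return C_spca, C_green, C_proj
```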
  • 32. 1st-order Perturbation: Example 1. (Figures: similarity matrix W, connectivity matrix, and the 1st-order solution; λ_2 = 0.300 vs. λ_2 = 0.268.) Between-cluster connections are suppressed and within-cluster connections enhanced: the effect of self-aggregation. PCA & Matrix Factorizations for Learning, ICML 2005 Tutorial, Chris Ding 87
  • 33. Optimality Properties of Scaled PCA
    Scaled principal components have optimality properties:
    Ordering: adjacent objects along the order are similar; far-away objects along the order are dissimilar; the optimal solution for the permutation index is given by scaled PCA.
    Clustering: maximize within-cluster similarity; minimize between-cluster similarity; the optimal solution for the cluster membership indicators is given by scaled PCA.
    PCA & Matrix Factorizations for Learning, ICML 2005 Tutorial, Chris Ding 88
  • 34. Spectral Graph Ordering
    (Barnard, Pothen & Simon, 1993) Envelope reduction of a sparse matrix: find an ordering such that the envelope is minimized:
    min Σ_i max_j |i − j| w_ij  ⇒  min Σ_ij (x_i − x_j)² w_ij
    (Hall, 1970) “Quadratic placement of a graph”: find coordinates x minimizing J = Σ_ij (x_i − x_j)² w_ij = x^T (D − W) x
    The solutions are eigenvectors of the Laplacian.
    PCA & Matrix Factorizations for Learning, ICML 2005 Tutorial, Chris Ding 89
  • 35. Distance-Sensitive Ordering
    Given a graph, find an optimal ordering of the nodes. π denotes the permutation indexes, π(1, …, n) = (π_1, …, π_n).
    J_d(π) = Σ_{i=1}^{n−d} w_{π_i, π_{i+d}}   (e.g. J_{d=2}(π) sums the weights between nodes two positions apart)
    min_π J(π) = Σ_{d=1}^{n−1} d² J_d(π)
    The larger the distance, the larger the weight (penalty).
    PCA & Matrix Factorizations for Learning, ICML 2005 Tutorial, Chris Ding 90
  • 36. Distance-Sensitive Ordering
    J(π) = Σ_ij (i − j)² w_{π_i, π_j} = Σ_ij (π_i^{−1} − π_j^{−1})² w_ij
    Define shifted and rescaled inverse permutation indexes: q_i = (π_i^{−1} − (n+1)/2) / (n/2) ∈ { (1−n)/n, (3−n)/n, …, (n−1)/n }
    Then J(π) = (n²/8) Σ_ij (q_i − q_j)² w_ij = (n²/4) q^T (D − W) q
    PCA & Matrix Factorizations for Learning, ICML 2005 Tutorial, Chris Ding 91
  • 37. Distance-Sensitive Ordering
    Once q_2 is computed, since q_2(i) < q_2(j) ⇒ π_i^{−1} < π_j^{−1}, the inverse permutation can be uniquely recovered from q_2.
    Implementation: sorting q_2 induces π. (A sketch follows below.)
    PCA & Matrix Factorizations for Learning, ICML 2005 Tutorial, Chris Ding 92
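A minimal sketch of that implementation note (ours), using the Fiedler vector of the plain Laplacian; the scaled-PCA variant of the previous slides would use the generalized eigenproblem instead.

```python
import numpy as np

def spectral_ordering(W):
    """Distance-sensitive ordering: take the second-smallest eigenvector q2 of the
    Laplacian L = D - W (the Fiedler vector) and sort the nodes by its entries."""
    d = W.sum(axis=1)
    L = np.diag(d) - W
    _, vecs = np.linalg.eigh(L)           # ascending eigenvalues
    q2 = vecs[:, 1]
    order = np.argsort(q2)                # order[k] = node placed at position k
    return order, q2
```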
  • 38. Re-ordering of Genes and Tissues
    r = J(π) / J(random) = 0.18;  r_{d=1} = J_{d=1}(π) / J_{d=1}(random) = 3.39
    PCA & Matrix Factorizations for Learning, ICML 2005 Tutorial, Chris Ding 93
  • 39. Spectral Clustering vs. Spectral Ordering
    The continuous approximations of both integer programming problems are given by the same eigenvector; different problems can share the same continuous approximate solution.
    Quality of the approximation: ordering has better quality (the solution is relaxed from a set of evenly spaced discrete values); clustering has lower quality (the solution is relaxed from only 2 discrete values).
    PCA & Matrix Factorizations for Learning, ICML 2005 Tutorial, Chris Ding 94
  • 40. Linearized Cluster Assignment
    Turns spectral clustering into a 1-D clustering problem: spectral ordering on the connectivity network, then cluster crossing (the sum of similarities along the anti-diagonal), which gives a 1-D curve with valleys and peaks; divide the valleys and peaks into clusters.
    PCA & Matrix Factorizations for Learning, ICML 2005 Tutorial, Chris Ding 95
  • 41. Cluster Overlap and Crossing
    Given similarity W and clusters A, B:
    Cluster overlap: s(A,B) = Σ_{i∈A} Σ_{j∈B} w_ij
    Cluster crossing computes a smaller fraction of the cluster overlap. It depends on an ordering o: it sums the weights that cross site i along the order, ρ(i) = Σ_{j=1}^{m} w_{o(i−j), o(i+j)}
    This is a sum along the anti-diagonals of W. (A sketch follows below.)
    PCA & Matrix Factorizations for Learning, ICML 2005 Tutorial, Chris Ding 96
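A small sketch (ours) of the anti-diagonal sum; `order` would typically come from the spectral ordering of the previous slides, and `m` is the half-width of the crossing window.

```python
import numpy as np

def cluster_crossing(W, order, m):
    """Cluster crossing along a given ordering: at each site i, sum the m
    anti-diagonal weights w[o(i-j), o(i+j)], j = 1..m (edges that cross site i)."""
    n = len(order)
    rho = np.zeros(n)
    for i in range(n):
        for j in range(1, m + 1):
            if i - j >= 0 and i + j < n:
                rho[i] += W[order[i - j], order[i + j]]
    return rho      # valleys of rho suggest cluster boundaries along the order
```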
  • 42. cluster crossing PCA & Matrix Factorizations for Learning, ICML 2005 Tutorial, Chris Ding 97
  • 43. K-way Clustering Experiments. Accuracy of clustering results:
    Method   | Linearized Assignment | Recursive 2-way clustering | Embedding + K-means
    Data A   | 89.0%                 | 82.8%                      | 75.1%
    Data B   | 75.7%                 | 67.2%                      | 56.4%
    PCA & Matrix Factorizations for Learning, ICML 2005 Tutorial, Chris Ding 98
  • 44. Some Additional Advanced/Related Topics
    Random walks and normalized cut; semi-definite programming; sub-sampling in spectral clustering; extension to semi-supervised classification; Green’s function approach; out-of-sample embedding.
    PCA & Matrix Factorizations for Learning, ICML 2005 Tutorial, Chris Ding 99