COMPUTING LOCAL AND GLOBAL CENTRALITY
DAVID F. GLEICH (AND MANY OTHERS)
DATA MINING, NETWORKS AND DYNAMICS
2011 NOVEMBER 7

LOCAL, GLOBAL

Pooya Esfandiar, Francesco Bonchi, Reid Andersen, Chen Greif, Vahab Mirrokni, Laks V.S. Lakshmanan, Byung-Won On

Graph centrality

Global
How important is a
node? 

Local
How important is a
node with respect
to another one?




Graph centrality

Koschützki et al.
must respect
isomorphism

higher is better

Examples
node-degree
1/shortest-path




Graph centrality

This talk: path summation
    $\sum_{\ell} f(\text{paths of length } \ell)$

Local Katz score
    $\sum_{\ell} \alpha^{\ell} \cdot (\text{number of paths of length } \ell \text{ between } i \text{ and } j)$

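The path summation can be checked numerically on a toy graph. The sketch below is only an illustration under stated assumptions: the 4-node path graph, the value of alpha, and the helper name `katz_by_path_sum` are all hypothetical. It uses the fact that entry (i, j) of the l-th power of the adjacency matrix counts the length-l walks between i and j, and that the sum over l >= 1 equals the inverse of (I - alpha A) minus the identity when alpha is below 1 / ||A||_2.

```python
import numpy as np

def katz_by_path_sum(A, alpha, max_len):
    """Sum alpha**l * A**l for l = 1..max_len (truncated path summation)."""
    n = A.shape[0]
    total = np.zeros((n, n))
    power = np.eye(n)
    for _ in range(max_len):
        power = alpha * (power @ A)  # now holds alpha**l * A**l
        total += power
    return total

# Hypothetical example: the path graph 0 - 1 - 2 - 3.
A = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
alpha = 0.25  # assumed value, below 1 / ||A||_2 so the series converges

K_series = katz_by_path_sum(A, alpha, max_len=200)
K_closed = np.linalg.inv(np.eye(4) - alpha * A) - np.eye(4)
print(np.allclose(K_series, K_closed))
```

The off-diagonal entries of `K_closed` agree with the entries of the plain inverse, so the two conventions differ only on the diagonal.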
A – adjacency matrix
L – Laplacian matrix
P – random walk transition matrix

Katz score
    $K_{i,j} = [(I - \alpha A^T)^{-1}]_{i,j}$

Commute time
    $C_{i,j} = \mathrm{vol}(G)\,(L^+_{i,i} + L^+_{j,j} - 2 L^+_{i,j})$

PageRank
    $(I - \alpha P^T) x = (1 - \alpha) e / n$
    $X_{i,j} = (1 - \alpha) [(I - \alpha P^T)^{-1}]_{i,j}$

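For small graphs these definitions can be evaluated directly with dense linear algebra. The following is a minimal sketch, assuming an undirected graph without isolated nodes and P taken as the row-stochastic matrix D^{-1} A; the example graph, alpha = 0.85, and the function name `commute_time` are assumptions for illustration.

```python
import numpy as np

# Hypothetical undirected graph: a triangle 0-1-2 with a pendant node 3 on node 2.
A = np.array([[0, 1, 1, 0],
              [1, 0, 1, 0],
              [1, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
n = A.shape[0]
d = A.sum(axis=1)        # degrees
L = np.diag(d) - A       # combinatorial Laplacian
P = A / d[:, None]       # row-stochastic random walk matrix (assumed convention)
vol = d.sum()            # vol(G) = sum of degrees

L_pinv = np.linalg.pinv(L)  # L^+ is the Moore-Penrose pseudoinverse

def commute_time(i, j):
    """C_ij = vol(G) * (L+_ii + L+_jj - 2 L+_ij)."""
    return vol * (L_pinv[i, i] + L_pinv[j, j] - 2 * L_pinv[i, j])

alpha = 0.85  # assumed damping parameter
# Global PageRank: (I - alpha P^T) x = (1 - alpha) e / n
x = np.linalg.solve(np.eye(n) - alpha * P.T, (1 - alpha) * np.ones(n) / n)
# Pairwise (personalized) PageRank scores: column j is the vector seeded at j
X = (1 - alpha) * np.linalg.inv(np.eye(n) - alpha * P.T)

print(commute_time(2, 3), x.sum())
```

Averaging the columns of `X` recovers the global vector `x`, which is one way to see the local scores as a refinement of the global one.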
USES FOR CENTRALITY

Ranking features for web-search/classification
    Najork, M. A.; Zaragoza, H. & Taylor, M. J.
    HITS on the web: How does it compare?
    Becchetti, L.; Castillo, C.; Donato, D.; Baeza-Yates, R.
    & Leonardi, S. Link analysis for Web spam detection

Interesting nodes
    GeneRank, ProteinRank, TwitterRank, IsoRank,
    FutureRank, HostRank, DiffusionRank, ItemRank,
    SocialPageRank, SimRank




USES FOR CENTRALITY

Ranking networks of comparisons.
    Chartier, T. P.; Kreutzer, E.; Langville, A. N. & Pedings,
    K. E. Sensitivity and Stability of Ranking Vectors 

Clustering or community detection
    Andersen, R.; Chung, F. & Lang, K.
    Local Graph Partitioning using PageRank Vectors

Link prediction
    Savas et al. Hold on about 90 minutes 




THESE GET USED A LOT. THEY MUST BE FAST.

MATRICES, MOMENTS, QUADRATURE

Estimate a quadratic form
    $l \le x^T f(Z) x \le u$

Commute
    $(e_i - e_j)^T L^+ (e_i - e_j)$

Katz
    $\frac{1}{4} (e_i + e_j)^T (I - \alpha A)^{-1} (e_i + e_j) - \frac{1}{4} (e_i - e_j)^T (I - \alpha A)^{-1} (e_i - e_j)$

Also used by Benzi and Boito (LAA) for Katz
scores and the matrix exponential

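The Katz expression above is the polarization identity: for a symmetric matrix M, the off-diagonal entry M[i, j] equals a difference of two quadratic forms, which is what lets a quadratic-form estimator produce pairwise scores. A small sketch follows; the 4-node graph, the choice of alpha, and the helper name `entry_from_quadratic_forms` are assumptions for illustration.

```python
import numpy as np

def entry_from_quadratic_forms(M, i, j):
    """Recover M[i, j] of a symmetric M from two quadratic forms."""
    n = M.shape[0]
    e_i = np.zeros(n)
    e_j = np.zeros(n)
    e_i[i] = 1.0
    e_j[j] = 1.0
    plus, minus = e_i + e_j, e_i - e_j
    return 0.25 * (plus @ M @ plus) - 0.25 * (minus @ M @ minus)

# Hypothetical undirected graph (symmetric adjacency matrix).
A = np.array([[0, 1, 1, 0],
              [1, 0, 1, 0],
              [1, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
alpha = 1.0 / (np.linalg.norm(A, 2) + 1.0)   # assumed: the "hard alpha" choice
M = np.linalg.inv(np.eye(4) - alpha * A)     # dense inverse, for checking only

katz_02 = entry_from_quadratic_forms(M, 0, 2)
print(abs(katz_02 - M[0, 2]))
```

The identity relies on M being symmetric, so it applies to undirected graphs.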
MMQ - THE BIG IDEA

Quadratic form (Think)
Weighted sum (A is s.p.d., use EVD)
Stieltjes integral ("A tautology")
Quadrature approximation
Matrix equation (Lanczos)

David F. Gleich (Purdue), Univ. Chicago SSCS Seminar

MMQ PROCEDURE

Goal: lower and upper bounds on a quadratic form.
Given: bounds l and u on the eigenvalues of the matrix.

1. Run k steps of Lanczos on the matrix, starting with the vector of the quadratic form.
2. Compute the estimate with an additional eigenvalue at u. This corresponds to a Gauss-Radau rule with u as a prescribed node.
3. Compute the estimate with an additional eigenvalue at l. This corresponds to a Gauss-Radau rule with l as a prescribed node.
4. Output the two estimates as lower and upper bounds on the quadratic form.

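The Lanczos step of this procedure can be sketched in a few lines. The code below is a simplification, not the procedure on the slide: it uses the plain Gauss rule (no prescribed node), which for f(t) = 1/t on a symmetric positive definite matrix gives only a lower bound; the Gauss-Radau modifications that produce the matching upper bound are omitted. The random test graph, the seed, and the function names are assumptions.

```python
import numpy as np

def lanczos_tridiagonal(M, u, k):
    """Run up to k Lanczos steps on symmetric M; return the tridiagonal T."""
    diag, off = [], []
    q_prev = np.zeros(len(u))
    q = u / np.linalg.norm(u)
    beta = 0.0
    for _ in range(k):
        w = M @ q - beta * q_prev
        a = q @ w
        w = w - a * q
        diag.append(a)
        beta = np.linalg.norm(w)
        if beta < 1e-14:          # invariant subspace found, stop early
            break
        off.append(beta)
        q_prev, q = q, w / beta
    m = len(diag)
    return np.diag(diag) + np.diag(off[:m - 1], 1) + np.diag(off[:m - 1], -1)

def gauss_estimate_inverse(M, u, k):
    """Gauss-rule estimate of u^T M^{-1} u: ||u||^2 * [T_k^{-1}]_{1,1}."""
    T = lanczos_tridiagonal(M, u, k)
    e1 = np.zeros(T.shape[0])
    e1[0] = 1.0
    return (u @ u) * np.linalg.solve(T, e1)[0]

# Hypothetical test problem: M = I - alpha A for a random undirected graph.
rng = np.random.default_rng(0)
n = 40
upper = np.triu(rng.random((n, n)) < 0.1, 1).astype(float)
A = upper + upper.T
alpha = 1.0 / (np.linalg.norm(A, 2) + 1.0)
M = np.eye(n) - alpha * A

u = np.zeros(n)
u[0] = u[1] = 1.0                 # u = e_0 + e_1, as in the Katz quadratic form
exact = u @ np.linalg.solve(M, u)
est_5 = gauss_estimate_inverse(M, u, 5)
est_20 = gauss_estimate_inverse(M, u, 20)
print(exact - est_5, exact - est_20)
```

Each Lanczos step costs one matrix-vector product, which is the unit used on the horizontal axis of the plots in this part of the talk.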
How well does it work?

[Figure: two panels, "Bounds" and "Error", both for arxiv, Katz, hard alpha, plotted against matrix-vector products (5 to 30); $\alpha = 1/(\|A\|_2 + 1)$.]

MY COMPLAINTS


Matvecs are expensive.

Takes many iterations.

Just one score comes out!






KATZ SCORES ARE LOCALIZED

Solutions $k$ of $(I - \alpha A^T) k = e_i$ are highly localized.
Up to 50 neighbors is 99.65% of the total mass.

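One way to quantify localization is the fraction of the total mass carried by the largest few entries of the solution. The sketch below measures this on a toy ring graph; the graph, its size, and the helper name `top_mass_fraction` are assumptions, and the number it prints is not comparable to the 99.65% figure, which comes from the graph used in the talk.

```python
import numpy as np

def top_mass_fraction(vec, m):
    """Fraction of the total absolute mass held by the m largest entries."""
    mags = np.sort(np.abs(vec))[::-1]
    return mags[:m].sum() / mags.sum()

# Hypothetical toy graph: a ring on 200 nodes.
n = 200
A = np.zeros((n, n))
for v in range(n):
    A[v, (v + 1) % n] = 1.0
    A[(v + 1) % n, v] = 1.0

alpha = 1.0 / (np.linalg.norm(A, 2) + 1.0)
e_i = np.zeros(n)
e_i[0] = 1.0
k_vec = np.linalg.solve(np.eye(n) - alpha * A.T, e_i)

frac = top_mass_fraction(k_vec, 5)
print(frac)
```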
HOW CAN WE
EXPLOIT THIS?


TOP-K ALGORITHM FOR KATZ

Approximate $k$ in $(I - \alpha A^T) k = e_i$, where $e_i$ is sparse.

Keep $k$ sparse too.
Ideally, don't "touch" all of $A$.

This is possible for personalized PageRank!

Richardson for Ax = b
    $x^{(k+1)} = x^{(k)} + r^{(k)}$
    $r^{(k+1)} = b - A x^{(k+1)}$
    When $A = A^T$ and $A \succeq 0$, this is equivalent to gradient descent on $\min_x \; x^T A x - 2 x^T b$.

What about coordinate descent?

Gauss-Southwell for Ax = b
    $x^{(k+1)} = x^{(k)} + r_j^{(k)} e_j$
    $r^{(k+1)} = r^{(k)} - r_j^{(k)} A e_j$
    How to pick j?

Frequently "rediscovered" for PageRank:
McSherry (WWW2005), Berkhin (JIM 2007),
Andersen-Chung-Lang (FOCS 2006)
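A dense sketch of the Gauss-Southwell iteration, applied to one Katz column, might look as follows. It picks j as the entry with the largest residual and divides by the diagonal entry, which reduces to the update on the slide when the diagonal is 1 (as it is for I - alpha A on a graph without self-loops). The random graph, tolerance, step cap, and function name are assumptions; a practical version would use sparse storage and a priority queue so that each step touches only the neighbors of j.

```python
import numpy as np

def gauss_southwell(M, b, tol=1e-10, max_steps=200000):
    """Solve M x = b by relaxing the largest residual entry at each step."""
    x = np.zeros(len(b))
    r = np.array(b, dtype=float)           # residual r = b - M x, with x = 0
    steps = 0
    while steps < max_steps:
        j = int(np.argmax(np.abs(r)))      # pick j: largest residual
        if abs(r[j]) <= tol:
            break
        step = r[j] / M[j, j]              # equals r_j when the diagonal is 1
        x[j] += step
        r -= step * M[:, j]                # r <- r - step * M e_j
        steps += 1
    return x, steps

# Hypothetical test problem: one Katz column on a random undirected graph.
rng = np.random.default_rng(1)
n = 30
upper = np.triu(rng.random((n, n)) < 0.15, 1).astype(float)
A = upper + upper.T
alpha = 1.0 / (np.linalg.norm(A, 2) + 1.0)
M = np.eye(n) - alpha * A

b = np.zeros(n)
b[0] = 1.0
x, steps = gauss_southwell(M, b)
x_exact = np.linalg.solve(M, b)
print(steps, np.max(np.abs(x - x_exact)))
```

Because only entry j of x changes per step, the iterate stays sparse for as long as the residual stays concentrated near the seed.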
DEMO!




NEW CONVERGENCE THEORY

Katz and PageRank are equivalent if
$\alpha < 1/\|A\|_1$

Gauss-Southwell converges when $\alpha < 1/\|A\|_2$
(Luo and Tseng 1992) if j is picked as the largest
residual

Read all about it
Fast matrix computations for pair-wise and column-wise commute times and
Katz scores. Bonchi, Esfandiar, Gleich, Greif, Lakshmanan, J. Internet
Mathematics (to appear)




[Figure: hollywood, Katz, hard alpha (1,000,000 nodes, 100,000,000 edges). Precision@k for exact top-k sets (0 to 1) versus equivalent matrix-vector products (log scale, 10^-2 to 10^2). Series: k=10, k=100, k=1000, cg k=25, k=25.]

OPEN QUESTIONS

I can’t find any existing derivation of this method
in the non-symmetric case (prior to the
PageRank literature). Any thoughts?

How to show that the method converges for a
non-symmetric matrix when $(I - \alpha P^T)$ is not
diagonally dominant?






OVERLAPPING
CLUSTERS FOR
DISTRIBUTED
CENTRALITY


LARGE GRAPHS, IN PRACTICE

[Figure: three groups of edge lists (src -> dst), each stored as "Copy 1" and "Copy 2".]

Edge lists may be tied together by a common host, stored redundantly on many hard drives.

UTILIZE SOME
REDUNDANCY?
   To compute global PageRank?




Overlapping Clusters

Use the redundancy to reduce communication when solving a PageRank problem.

Overlapping clusters for distributed computation.
Andersen, Gleich, Mirrokni, WSDM2012 (to appear).

Communication
avoiding
algorithms

Communication is the limiting
factor in most computations
these days. Flops are,
relatively speaking, free.




KEY POINTS

Utilize personalized PageRank vectors to find
the clusters with “good” conductance scores.

Define “core” vertices for each cluster. Find a
good way to cover the graph with these
clusters.

Use restricted additive Schwarz to solve
(thanks Prof. Szyld and Frommer!)

All nodes solve locally using the coordinate descent method.

A core vertex for the gray cluster.

Red sends residuals to white. White sends residuals to red.

White then uses the coordinate descent method to adjust its solution.
Will cause communication to red/blue.

It works!

[Figure: relative work (0 to 2) versus volume ratio (1 to 1.7), i.e. how much more of the graph we need to store. Series: Swapping Probability (usroads), PageRank Communication (usroads), Swapping Probability (web-Google), PageRank Communication (web-Google). Reference level: Metis Partitioner.]

PERSONALIZED PAGERANK CLUSTERS

Solve $(I - \alpha P^T) x = (1 - \alpha) e_i$
to a large degree-weighted tolerance

Sweep over the vertices in order of their degree-
normalized rank. Find the best conductance set. 

A Cheeger-like inequality. (Not a heuristic.) 




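The sweep step can be illustrated on a toy graph. The sketch below solves the personalized PageRank system exactly with a dense solver (the slide instead solves only to a loose tolerance), sorts vertices by degree-normalized score, and returns the prefix with the smallest conductance. The graph (two 5-cliques joined by one edge), alpha = 0.85, and the function names are assumptions for illustration.

```python
import numpy as np

def personalized_pagerank(A, seed, alpha):
    """Dense solve of (I - alpha P^T) x = (1 - alpha) e_seed, with P = D^{-1} A."""
    n = A.shape[0]
    d = A.sum(axis=1)
    P = A / d[:, None]
    e = np.zeros(n)
    e[seed] = 1.0
    return np.linalg.solve(np.eye(n) - alpha * P.T, (1 - alpha) * e)

def sweep_cut(A, x):
    """Return the prefix (in degree-normalized order) with the best conductance."""
    d = A.sum(axis=1)
    vol_G = d.sum()
    order = np.argsort(-x / d)             # decreasing degree-normalized score
    in_set = np.zeros(len(d), dtype=bool)
    best_phi, best_size = np.inf, 0
    for size, v in enumerate(order[:-1], start=1):   # skip the full vertex set
        in_set[v] = True
        cut = A[in_set][:, ~in_set].sum()  # edges leaving the current prefix
        vol = d[in_set].sum()
        phi = cut / min(vol, vol_G - vol)
        if phi < best_phi:
            best_phi, best_size = phi, size
    return sorted(int(v) for v in order[:best_size]), best_phi

# Hypothetical graph: cliques {0..4} and {5..9} joined by the edge 4-5.
A = np.zeros((10, 10))
A[:5, :5] = 1.0
A[5:, 5:] = 1.0
np.fill_diagonal(A, 0.0)
A[4, 5] = A[5, 4] = 1.0

x = personalized_pagerank(A, seed=0, alpha=0.85)
cluster, phi = sweep_cut(A, x)
print(cluster, phi)
```

This version recomputes the cut from scratch for every prefix, which is fine for a toy example; an incremental update of the cut and volume is the usual choice on large graphs.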
CORE VERTICES

Compute the expected “leavetime” for each
vertex in a cluster. 

Keep increasing the threshold for a “good”
vertex until every vertex is core in some cluster.

Then approximate a set-cover problem to cover
the graph with clusters, and use a heuristic to
pack vertices until 




MY QUESTIONS
and future directions

REVERSE ORDER

GRAPH SPECTRA

Some work by Banerjee and Jost.
UiPath Test Automation using UiPath Test Suite series, part 4
DianaGray10
 
Accelerate your Kubernetes clusters with Varnish Caching
Accelerate your Kubernetes clusters with Varnish CachingAccelerate your Kubernetes clusters with Varnish Caching
Accelerate your Kubernetes clusters with Varnish Caching
Thijs Feryn
 
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdfFIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
FIDO Alliance
 
Le nuove frontiere dell'AI nell'RPA con UiPath Autopilot™
Le nuove frontiere dell'AI nell'RPA con UiPath Autopilot™Le nuove frontiere dell'AI nell'RPA con UiPath Autopilot™
Le nuove frontiere dell'AI nell'RPA con UiPath Autopilot™
UiPathCommunity
 
FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdf
FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdfFIDO Alliance Osaka Seminar: Passkeys at Amazon.pdf
FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdf
FIDO Alliance
 

Recently uploaded (20)

FIDO Alliance Osaka Seminar: Overview.pdf
FIDO Alliance Osaka Seminar: Overview.pdfFIDO Alliance Osaka Seminar: Overview.pdf
FIDO Alliance Osaka Seminar: Overview.pdf
 
FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdfFIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
 
By Design, not by Accident - Agile Venture Bolzano 2024
By Design, not by Accident - Agile Venture Bolzano 2024By Design, not by Accident - Agile Venture Bolzano 2024
By Design, not by Accident - Agile Venture Bolzano 2024
 
DevOps and Testing slides at DASA Connect
DevOps and Testing slides at DASA ConnectDevOps and Testing slides at DASA Connect
DevOps and Testing slides at DASA Connect
 
Elizabeth Buie - Older adults: Are we really designing for our future selves?
Elizabeth Buie - Older adults: Are we really designing for our future selves?Elizabeth Buie - Older adults: Are we really designing for our future selves?
Elizabeth Buie - Older adults: Are we really designing for our future selves?
 
SAP Sapphire 2024 - ASUG301 building better apps with SAP Fiori.pdf
SAP Sapphire 2024 - ASUG301 building better apps with SAP Fiori.pdfSAP Sapphire 2024 - ASUG301 building better apps with SAP Fiori.pdf
SAP Sapphire 2024 - ASUG301 building better apps with SAP Fiori.pdf
 
Key Trends Shaping the Future of Infrastructure.pdf
Key Trends Shaping the Future of Infrastructure.pdfKey Trends Shaping the Future of Infrastructure.pdf
Key Trends Shaping the Future of Infrastructure.pdf
 
Bits & Pixels using AI for Good.........
Bits & Pixels using AI for Good.........Bits & Pixels using AI for Good.........
Bits & Pixels using AI for Good.........
 
Free Complete Python - A step towards Data Science
Free Complete Python - A step towards Data ScienceFree Complete Python - A step towards Data Science
Free Complete Python - A step towards Data Science
 
Generative AI Deep Dive: Advancing from Proof of Concept to Production
Generative AI Deep Dive: Advancing from Proof of Concept to ProductionGenerative AI Deep Dive: Advancing from Proof of Concept to Production
Generative AI Deep Dive: Advancing from Proof of Concept to Production
 
Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...
Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...
Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...
 
De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...
De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...
De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...
 
Assuring Contact Center Experiences for Your Customers With ThousandEyes
Assuring Contact Center Experiences for Your Customers With ThousandEyesAssuring Contact Center Experiences for Your Customers With ThousandEyes
Assuring Contact Center Experiences for Your Customers With ThousandEyes
 
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024Encryption in Microsoft 365 - ExpertsLive Netherlands 2024
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024
 
PCI PIN Basics Webinar from the Controlcase Team
PCI PIN Basics Webinar from the Controlcase TeamPCI PIN Basics Webinar from the Controlcase Team
PCI PIN Basics Webinar from the Controlcase Team
 
UiPath Test Automation using UiPath Test Suite series, part 4
UiPath Test Automation using UiPath Test Suite series, part 4UiPath Test Automation using UiPath Test Suite series, part 4
UiPath Test Automation using UiPath Test Suite series, part 4
 
Accelerate your Kubernetes clusters with Varnish Caching
Accelerate your Kubernetes clusters with Varnish CachingAccelerate your Kubernetes clusters with Varnish Caching
Accelerate your Kubernetes clusters with Varnish Caching
 
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdfFIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
 
Le nuove frontiere dell'AI nell'RPA con UiPath Autopilot™
Le nuove frontiere dell'AI nell'RPA con UiPath Autopilot™Le nuove frontiere dell'AI nell'RPA con UiPath Autopilot™
Le nuove frontiere dell'AI nell'RPA con UiPath Autopilot™
 
FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdf
FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdfFIDO Alliance Osaka Seminar: Passkeys at Amazon.pdf
FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdf
 

Computing Local and Global Centrality

  • 1. COMPUTING LOCAL AND GLOBAL CENTRALITY. DAVID F. GLEICH (AND MANY OTHERS). DATA MINING, NETWORKS AND DYNAMICS, 2011 NOVEMBER 7
  • 2. LOCAL / GLOBAL. Pooya Esfandiar, Reid Andersen, Francesco Bonchi, Chen Greif, Vahab Mirrokni, Laks V.S. Lakshmanan, Byung-Won On
  • 3. Graph centrality. Global: How important is a node? Local: How important is a node with respect to another one?
  • 4. Graph centrality (Koschützki et al.): must respect isomorphism; higher is better. Examples: node degree, 1/shortest-path.
  • 5. Graph centrality. This talk: path summation, $\sum_{\ell} f(\text{paths of length } \ell)$. Local Katz score: $\sum_{\ell} \alpha^{\ell} \cdot (\text{number of paths of length } \ell \text{ between } i \text{ and } j)$.
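
The path summation on slide 5 can be sketched directly. This is a minimal illustration, not the method from the talk: it assumes the "paths" are walks (vertices may repeat), that the sum starts at length 1 and is truncated at `max_len`, and that the graph is given as adjacency lists. The function name `katz_path_sum` and the example graph are hypothetical.

```python
def katz_path_sum(adj, i, j, alpha, max_len):
    """Truncated Katz score: sum over l = 1..max_len of
    alpha**l * (number of walks of length l from i to j)."""
    n = len(adj)
    # walks[v] = number of walks of the current length from i to v
    walks = [0] * n
    walks[i] = 1
    score = 0.0
    for length in range(1, max_len + 1):
        nxt = [0] * n
        for u in range(n):
            if walks[u]:
                for v in adj[u]:
                    nxt[v] += walks[u]
        walks = nxt
        score += alpha ** length * walks[j]
    return score


# Path graph 0 - 1 - 2 as adjacency lists (assumed example).
path = [[1], [0, 2], [1]]
score = katz_path_sum(path, 0, 2, alpha=0.5, max_len=4)
```

On this path graph there is one walk of length 2 and two walks of length 4 from vertex 0 to vertex 2, so the truncated score is 0.25 + 2 · 0.0625 = 0.375.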
  • 6. A: adjacency matrix. L: Laplacian matrix. P: random walk transition matrix. Katz score: $K_{i,j} = [(I - \alpha A^T)^{-1}]_{i,j}$. Commute time: $C_{i,j} = \mathrm{vol}(G)\,(L^+_{i,i} + L^+_{j,j} - 2L^+_{i,j})$. PageRank: $(I - \alpha P^T)x = (1 - \alpha)e/n$, $X_{i,j} = (1 - \alpha)[(I - \alpha P^T)^{-1}]_{i,j}$.
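
The matrix form of the Katz score on slide 6 connects to the path summation through the Neumann series. The following is a sketch of that step, assuming $\alpha\,\rho(A) < 1$ so that the series converges (here $\rho(A)$ is the spectral radius, a symbol not used on the slides):

```latex
\sum_{\ell=0}^{\infty} \alpha^{\ell} (A^T)^{\ell} = (I - \alpha A^T)^{-1}
\quad\Longrightarrow\quad
\left[(I - \alpha A^T)^{-1}\right]_{i,j}
  = \delta_{i,j} + \sum_{\ell \ge 1} \alpha^{\ell} \left[(A^T)^{\ell}\right]_{i,j}.
```

The entry $[(A^T)^{\ell}]_{i,j}$ counts walks of length $\ell$ from $j$ to $i$, and the $\ell = 0$ term only affects the diagonal, so for $i \ne j$ the matrix entry equals the weighted path sum.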
  • 7. USES FOR CENTRALITY. Ranking features for web-search/classification: Najork, M. A.; Zaragoza, H. & Taylor, M. J., HITS on the web: How does it compare?; Becchetti, L.; Castillo, C.; Donato, D.; Baeza-Yates, R. & Leonardi, S., Link analysis for Web spam detection. Interesting nodes: GeneRank, ProteinRank, TwitterRank, IsoRank, FutureRank, HostRank, DiffusionRank, ItemRank, SocialPageRank, SimRank.
  • 8. USES FOR CENTRALITY. Ranking networks of comparisons: Chartier, T. P.; Kreutzer, E.; Langville, A. N. & Pedings, K. E., Sensitivity and Stability of Ranking Vectors. Clustering or community detection: Andersen, R.; Chung, F. & Lang, K., Local Graph Partitioning using PageRank Vectors. Link prediction: Savas et al. (hold on about 90 minutes).
  • 9. THESE GET USED A LOT. THEY MUST BE FAST.
  • 10. MATRICES, MOMENTS, QUADRATURE. Estimate a quadratic form: $l \le x^T f(Z) x \le u$. Commute: $(e_i - e_j)^T L^+ (e_i - e_j)$. Katz: $\tfrac{1}{4}(e_i + e_j)^T (I - \alpha A)^{-1} (e_i + e_j) - \tfrac{1}{4}(e_i - e_j)^T (I - \alpha A)^{-1} (e_i - e_j)$. Also used by Benzi and Boito (LAA) for Katz scores and the matrix exponential.
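
The two-term expression for Katz on slide 10 is the polarization identity, which turns an off-diagonal entry into a difference of quadratic forms. A short derivation, assuming a symmetric matrix $M$ (for example the inverse that defines Katz scores on an undirected graph; the symbol $M$ is introduced here only for illustration):

```latex
\begin{aligned}
(e_i + e_j)^T M (e_i + e_j) &= M_{i,i} + 2M_{i,j} + M_{j,j}, \\
(e_i - e_j)^T M (e_i - e_j) &= M_{i,i} - 2M_{i,j} + M_{j,j}, \\
\tfrac{1}{4}\left[(e_i + e_j)^T M (e_i + e_j) - (e_i - e_j)^T M (e_i - e_j)\right] &= M_{i,j}.
\end{aligned}
```

Each quadratic form can then be bounded separately, and the bounds combine into bounds on the single entry $M_{i,j}$.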
  • 11. MMQ - THE BIG IDEA. Think: quadratic form; weighted sum (A is s.p.d., use the EVD); Stieltjes integral (“a tautology”); quadrature approximation; matrix equation (Lanczos). (David F. Gleich (Purdue), Univ. Chicago SSCS Seminar)
  • 12. MMQ PROCEDURE. Goal: […]. Given: […]. 1. Run k steps of Lanczos on […] starting with […]. 2. Compute […], […] with an additional eigenvalue at u, set […]. Corresponds to a Gauss-Radau rule, with u as a prescribed node. 3. Compute […], […] with an additional eigenvalue at l, set […]. Corresponds to a Gauss-Radau rule, with l as a prescribed node. 4. Output […] as lower and upper bounds on […].
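
A simplified sketch of the Lanczos step in the MMQ procedure follows. It is not the procedure from the slide: it implements only the plain Gauss rule (no Gauss-Radau modification with prescribed nodes), it assumes $f(\lambda) = 1/\lambda$ and a small dense s.p.d. matrix stored as a list of lists, and the names `matvec` and `lanczos_gauss_inverse` are hypothetical. For this $f$, the Gauss rule yields lower bounds on $x^T A^{-1} x$ that improve with k.

```python
import math


def matvec(A, v):
    """Dense matrix-vector product for a list-of-lists matrix."""
    return [sum(a * x for a, x in zip(row, v)) for row in A]


def lanczos_gauss_inverse(A, x, k):
    """Gauss-quadrature estimate of x^T A^{-1} x from k Lanczos steps.

    Assumes A is symmetric positive definite.
    """
    norm_x = math.sqrt(sum(t * t for t in x))
    q_prev = [0.0] * len(x)
    q = [t / norm_x for t in x]
    alphas, betas = [], []  # diagonal and off-diagonal of the tridiagonal T_k
    beta = 0.0
    for _ in range(k):
        w = matvec(A, q)
        a = sum(wi * qi for wi, qi in zip(w, q))
        w = [wi - a * qi - beta * pi for wi, qi, pi in zip(w, q, q_prev)]
        alphas.append(a)
        beta = math.sqrt(sum(t * t for t in w))
        if beta < 1e-12:  # invariant subspace reached: the estimate is exact
            break
        betas.append(beta)
        q_prev, q = q, [t / beta for t in w]
    # [T_k^{-1}]_{1,1} as a continued fraction, evaluated from the bottom up
    m = len(alphas)
    d = alphas[m - 1]
    for t in range(m - 2, -1, -1):
        d = alphas[t] - betas[t] ** 2 / d
    return norm_x ** 2 / d


# Small s.p.d. example (assumed); the exact value of e_1^T A^{-1} e_1 is 11/18.
A = [[2.0, 1.0, 0.0], [1.0, 3.0, 1.0], [0.0, 1.0, 4.0]]
e1 = [1.0, 0.0, 0.0]
estimates = [lanczos_gauss_inverse(A, e1, k) for k in (1, 2, 3)]
```

Each Lanczos step costs one matrix-vector product, which is the cost unit on the x-axis of the plots that follow in the talk.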
  • 13. How well does it work? [Figure: two panels, “Bounds” and “Error”, for arxiv, Katz, hard alpha; x-axis: matrix-vector products.] $\alpha = 1/(\|A\|_2 + 1)$
  • 14. MY COMPLAINTS. Matvecs are expensive. Takes many iterations. Just one score comes out!
  • 15. KATZ SCORES ARE LOCALIZED. Katz scores, $(I - \alpha A^T)k = e_i$, are highly localized. Up to 50 neighbors is 99.65% of the total mass.
  • 16. HOW CAN WE EXPLOIT THIS?
  • 17. TOP-K ALGORITHM FOR KATZ. Approximate […] where […] is sparse. Keep […] sparse too. Ideally, don’t “touch” all of […].
  • 18. TOP-K ALGORITHM FOR KATZ. Approximate […] where […] is sparse. Keep […] sparse too. Ideally, don’t “touch” all of […]. This is possible for personalized PageRank!
  • 19. Richardson: $Ax = b$, $x^{(k+1)} = x^{(k)} + r^{(k)}$, $r^{(k+1)} = b - Ax^{(k+1)}$. If $A = A^T$, $A \succeq 0$: gradient descent, equivalent to $\min_x\, x^T A x - 2x^T b$. What about coordinate descent? Gauss-Southwell: $Ax = b$, $x^{(k+1)} = x^{(k)} + r_j^{(k)} e_j$, $r^{(k+1)} = r^{(k)} - r_j^{(k)} A e_j$. How to pick j? Frequently “rediscovered” for PageRank: McSherry (WWW2005), Berkhin (JIM 2007), Andersen-Chung-Lang (FOCS 2006).
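
The Gauss-Southwell iteration on slide 19 can be sketched for the Katz system $(I - \alpha A)x = e_i$. This is an illustrative sketch under stated assumptions, not the authors' implementation: the graph is undirected with no self-loops (so the system matrix has unit diagonal), it is stored as adjacency lists, j is picked as the entry with the largest residual, and the name `gauss_southwell_katz` and the stopping rule on `tol` are hypothetical. Both the solution and the residual are kept as sparse dictionaries, so only vertices near i are ever touched.

```python
def gauss_southwell_katz(adj, i, alpha, tol=1e-8, max_steps=100000):
    """Approximately solve (I - alpha*A) x = e_i by Gauss-Southwell.

    adj: adjacency lists of an undirected graph without self-loops.
    Returns the sparse solution x and the sparse residual r.
    """
    x = {}        # sparse solution
    r = {i: 1.0}  # sparse residual r = e_i - (I - alpha*A) x
    for _ in range(max_steps):
        j = max(r, key=lambda v: abs(r[v]))  # largest residual entry
        rj = r[j]
        if abs(rj) < tol:
            break
        x[j] = x.get(j, 0.0) + rj            # x <- x + r_j e_j
        # r <- r - r_j (I - alpha*A) e_j: entry j is cancelled by the unit
        # diagonal, and each neighbor of j receives alpha * r_j.
        r[j] = 0.0
        for v in adj[j]:
            r[v] = r.get(v, 0.0) + alpha * rj
    return x, r


# Path graph 0 - 1 - 2 (assumed example); the exact solution is (1.5, 1.0, 0.5).
path = [[1], [0, 2], [1]]
x, r = gauss_southwell_katz(path, 0, alpha=0.5)
```

Each step costs work proportional to the degree of the chosen vertex rather than a full matrix-vector product, which is why the later plots count "equivalent" matrix-vector products.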
  • 20. DEMO!
  • 21. NEW CONVERGENCE THEORY. Katz and PageRank are equivalent if $\alpha < 1/\|A\|_1$. Gauss-Southwell converges when $\alpha < 1/\|A\|_2$ (Luo and Tseng 1992) if j is picked as the largest residual. Read all about it: Fast matrix computations for pair-wise and column-wise commute times and Katz scores. Bonchi, Esfandiar, Gleich, Greif, Lakshmanan, J. Internet Mathematics (to appear).
  • 22. 1,000,000 nodes, 100,000,000 edges. [Figure: hollywood, Katz, hard alpha. y-axis: Precision@k for exact top-k sets; x-axis: equivalent matrix-vector products (log scale); curves: k=10, k=25, k=100, k=1000, and cg k=25.]
  • 23. OPEN QUESTIONS. I can’t find any existing derivation of this method in the non-symmetric case (prior to the PageRank literature). Any thoughts? How to show that the method converges for a non-symmetric matrix when $(I - \alpha P^T)$ is not diagonally dominant?
  • 25. LARGE GRAPHS, IN PRACTICE. [Diagram: three groups of edge lists (src -> dst), each stored as Copy 1 and Copy 2.] Edge lists may be tied together by a common host, stored redundantly on many hard drives.
  • 26. UTILIZE SOME REDUNDANCY? To compute global PageRank?
  • 27. Overlapping Clusters. Use the redundancy to reduce communication when solving a PageRank problem. Overlapping clusters for distributed computation. Andersen, Gleich, Mirrokni, WSDM2012 (to appear).
  • 28. Communication avoiding algorithms. Communication is the limiting factor in most computations these days. Flops are, relatively speaking, free.
  • 29. KEY POINTS. Utilize personalized PageRank vectors to find the clusters with “good” conductance scores. Define “core” vertices for each cluster. Find a good way to cover the graph with these clusters. Use restricted additive Schwarz to solve (thanks Prof. Szyld and Frommer!).
  • 30. All nodes solve locally using the coordinate descent method.
  • 31. All nodes solve locally using the coordinate descent method. A core vertex for the gray cluster.
  • 32. All nodes solve locally using the coordinate descent method. Red sends residuals to white. White sends residuals to red.
  • 33. White then uses the coordinate descent method to adjust its solution. Will cause communication to red/blue.
  • 34. It works! [Figure: relative work versus volume ratio; curves: Swapping Probability (usroads), PageRank Communication (usroads), Swapping Probability (web-Google), PageRank Communication (web-Google); reference: Metis Partitioner.] Volume ratio: how much more of the graph we need to store.
  • 35. PERSONALIZED PAGERANK CLUSTERS. Solve $(I - \alpha P^T)x = (1 - \alpha)e_i$ to a large degree-weighted tolerance. Sweep over the vertices in order of their degree-normalized rank. Find the best conductance set. A Cheeger-like inequality. (Not a heuristic.)
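
The sweep step on slide 35 can be sketched as follows. This is a minimal illustration under assumptions: the graph is undirected and simple, stored as a dict of adjacency lists; the input `x` is some approximate personalized PageRank vector (the values in the example are made up for illustration, not computed); and the name `sweep_cut` is hypothetical. Conductance of a set S is taken as cut(S) / min(vol(S), vol(G) - vol(S)).

```python
def sweep_cut(adj, x):
    """Return (best_set, best_conductance) over the prefixes of the
    vertices ordered by decreasing x[v] / deg(v)."""
    deg = {v: len(adj[v]) for v in adj}
    total_vol = sum(deg.values())
    order = sorted((v for v in x if x[v] > 0),
                   key=lambda v: x[v] / deg[v], reverse=True)
    in_set, vol, cut = set(), 0, 0
    best_set, best_phi = None, float("inf")
    for v in order:
        in_set.add(v)
        vol += deg[v]
        # Edges from v into the set stop being cut edges;
        # edges from v to the outside become cut edges.
        inside = sum(1 for u in adj[v] if u in in_set)
        cut += deg[v] - 2 * inside
        denom = min(vol, total_vol - vol)
        if denom > 0 and cut / denom < best_phi:
            best_phi, best_set = cut / denom, set(in_set)
    return best_set, best_phi


# Two triangles joined by the edge 2 - 3 (assumed example).
two_triangles = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3],
                 3: [2, 4, 5], 4: [3, 5], 5: [3, 4]}
# Illustrative (not computed) PageRank-like values concentrated near vertex 0.
x = {0: 0.4, 1: 0.3, 2: 0.25, 3: 0.05}
best_set, best_phi = sweep_cut(two_triangles, x)
```

On this example the sweep picks the first triangle, which has one cut edge and volume 7, so its conductance is 1/7.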
  • 36. CORE VERTICES. Compute the expected “leavetime” for each vertex in a cluster. Keep increasing the threshold for a “good” vertex until every vertex is core in some cluster. Then approximate a set-cover problem to cover the graph with clusters, and use a heuristic to pack vertices until
  • 37. MY QUESTIONS and future directions. REVERSE ORDER
  • 38. GRAPH SPECTRA. Some work by Banerjee and Jost.