Computing Local and Global Centrality

Some recent results on computing PageRank and Katz scores on large networks that I presented at a Dagstuhl workshop.

  1. COMPUTING LOCAL AND GLOBAL CENTRALITY. David F. Gleich (and many others). Data Mining, Networks and Dynamics, 2011 November 7.
  2. Collaborators. LOCAL: Pooya Esfandiar, Francesco Bonchi, Chen Greif, Laks V. S. Lakshmanan, Byung-Won On. GLOBAL: Reid Andersen, Vahab Mirrokni.
  3. Graph centrality. Global: how important is a node? Local: how important is a node with respect to another one?
  4. Graph centrality. Koschützki et al.: a centrality measure must respect isomorphism, and higher is better. Examples: node degree, 1/shortest-path distance.
  5. Graph centrality, this talk: path summation, $\sum_\ell f(\text{paths of length } \ell)$. The local Katz score: $\sum_\ell \alpha^\ell \cdot \#\{\text{paths of length } \ell \text{ between } i \text{ and } j\}$.
  6. Notation: A – adjacency matrix, L – Laplacian matrix, P – random walk transition matrix. Katz score: $K_{i,j} = [(I - \alpha A^T)^{-1}]_{i,j}$. Commute time: $C_{i,j} = \mathrm{vol}(G)\,(L^+_{i,i} + L^+_{j,j} - 2L^+_{i,j})$. PageRank: $(I - \alpha P^T)x = (1-\alpha)e/n$, with pairwise scores $X_{i,j} = (1-\alpha)[(I - \alpha P^T)^{-1}]_{i,j}$.
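To make the notation concrete, here is a minimal dense numpy sketch (not from the talk) that evaluates all three definitions on a toy 4-node path graph; the graph and the choice alpha = 0.5/||A||_2 are illustrative assumptions.

    import numpy as np

    # toy 4-node path graph; A is the adjacency matrix
    A = np.array([[0, 1, 0, 0],
                  [1, 0, 1, 0],
                  [0, 1, 0, 1],
                  [0, 0, 1, 0]], dtype=float)
    n = A.shape[0]
    d = A.sum(axis=1)                      # degrees
    P = A / d[:, None]                     # random-walk transition matrix (row-stochastic)
    L = np.diag(d) - A                     # graph Laplacian
    alpha = 0.5 / np.linalg.norm(A, 2)     # below 1/||A||_2, so I - alpha*A is invertible

    # Katz: K = (I - alpha*A^T)^{-1}
    K = np.linalg.inv(np.eye(n) - alpha * A.T)

    # Commute time: C_ij = vol(G) * (L+_ii + L+_jj - 2*L+_ij)
    Lp = np.linalg.pinv(L)
    C = d.sum() * (np.diag(Lp)[:, None] + np.diag(Lp)[None, :] - 2 * Lp)

    # PageRank: (I - alpha*P^T) x = (1 - alpha) e / n
    x = np.linalg.solve(np.eye(n) - alpha * P.T, (1 - alpha) * np.ones(n) / n)

    print(K[0, 3], C[0, 3], x)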
  7. USES FOR CENTRALITY. Ranking features for web search/classification: Najork, M. A.; Zaragoza, H. & Taylor, M. J., "HITS on the web: How does it compare?"; Becchetti, L.; Castillo, C.; Donato, D.; Baeza-Yates, R. & Leonardi, S., "Link analysis for Web spam detection". Interesting nodes: GeneRank, ProteinRank, TwitterRank, IsoRank, FutureRank, HostRank, DiffusionRank, ItemRank, SocialPageRank, SimRank.
  8. USES FOR CENTRALITY. Ranking networks of comparisons: Chartier, T. P.; Kreutzer, E.; Langville, A. N. & Pedings, K. E., "Sensitivity and Stability of Ranking Vectors". Clustering or community detection: Andersen, R.; Chung, F. & Lang, K., "Local Graph Partitioning using PageRank Vectors". Link prediction: Savas et al. (hold on about 90 minutes).
  9. THESE GET USED A LOT. THEY MUST BE FAST.
  10. MATRICES, MOMENTS, QUADRATURE. Estimate a quadratic form: bounds $l \le x^T f(Z)\, x \le u$. Commute time: $(e_i - e_j)^T L^+ (e_i - e_j)$. Katz: $\tfrac{1}{4}(e_i + e_j)^T (I - \alpha A^T)^{-1} (e_i + e_j) - \tfrac{1}{4}(e_i - e_j)^T (I - \alpha A^T)^{-1} (e_i - e_j)$. Also used by Benzi and Boito (LAA) for Katz scores and the matrix exponential.
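The pairwise scores above come from quadratic forms via the polarization identity for a symmetric matrix, $B_{i,j} = \tfrac14[(e_i+e_j)^T B (e_i+e_j) - (e_i-e_j)^T B (e_i-e_j)]$, so two-sided bounds on quadratic forms give two-sided bounds on pairwise scores. A small illustrative numpy check on a random symmetric toy graph (not the talk's code):

    import numpy as np

    rng = np.random.default_rng(0)
    A = rng.integers(0, 2, size=(6, 6)).astype(float)
    A = np.triu(A, 1); A = A + A.T                   # random symmetric 0/1 adjacency, no self loops
    alpha = 0.5 / np.linalg.norm(A, 2)
    B = np.linalg.inv(np.eye(6) - alpha * A)         # Katz matrix (A symmetric, so A^T = A)

    i, j = 1, 4
    ei, ej = np.eye(6)[i], np.eye(6)[j]
    quad = lambda v: v @ B @ v                       # the quadratic form that MMQ bounds
    katz_ij = 0.25 * (quad(ei + ej) - quad(ei - ej))
    assert np.isclose(katz_ij, B[i, j])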
  11. MMQ – THE BIG IDEA. Quadratic form → weighted sum (A is s.p.d., use the EVD) → Stieltjes integral ("a tautology") → quadrature approximation → matrix equation (Lanczos).
  12. MMQ PROCEDURE. Goal: bound the quadratic form $x^T f(Z)\, x$. 1. Run k steps of Lanczos on $Z$ starting with $x$. 2. Compute a modified tridiagonal matrix with an additional prescribed eigenvalue at $u$ (corresponds to a Gauss-Radau rule with $u$ as a prescribed node). 3. Compute another with an additional prescribed eigenvalue at $l$ (a Gauss-Radau rule with $l$ as a prescribed node). 4. Output the two quadrature values as lower and upper bounds on $x^T f(Z)\, x$.
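A hedged sketch of this procedure for $f(z) = 1/z$ and $Z = I - \alpha A$ symmetric positive definite (the Katz quadratic form $x^T Z^{-1} x$). It is a bare-bones illustration, not the talk's implementation: plain Lanczos without reorthogonalization or breakdown handling, and the spectrum brackets $l$ and $u$ are simply taken from the eigenvalues of the toy matrix.

    import numpy as np

    def lanczos(Z, x, k):
        # k steps of symmetric Lanczos on Z started from x; returns the recurrence coefficients
        q = x / np.linalg.norm(x)
        q_prev = np.zeros_like(q)
        alphas, betas = [], []
        beta = 0.0
        for _ in range(k):
            w = Z @ q - beta * q_prev
            a = q @ w
            w = w - a * q
            beta = np.linalg.norm(w)
            alphas.append(a); betas.append(beta)
            q_prev, q = q, w / beta
        return np.array(alphas), np.array(betas)

    def radau_estimate(alphas, betas, node):
        # extend the Lanczos tridiagonal T_k with a prescribed eigenvalue at `node`
        # (Gauss-Radau) and evaluate e1^T f(T) e1 for f(z) = 1/z
        k = len(alphas)
        T = np.diag(alphas) + np.diag(betas[:-1], 1) + np.diag(betas[:-1], -1)
        rhs = np.zeros(k); rhs[-1] = betas[-1] ** 2
        d = np.linalg.solve(T - node * np.eye(k), rhs)
        That = np.zeros((k + 1, k + 1))
        That[:k, :k] = T
        That[k, k - 1] = That[k - 1, k] = betas[-1]
        That[k, k] = node + d[-1]
        e1 = np.zeros(k + 1); e1[0] = 1.0
        return e1 @ np.linalg.solve(That, e1)

    rng = np.random.default_rng(1)
    A = rng.integers(0, 2, size=(30, 30)).astype(float)
    A = np.triu(A, 1); A = A + A.T                   # random symmetric adjacency
    alpha = 0.5 / np.linalg.norm(A, 2)
    Z = np.eye(30) - alpha * A                       # s.p.d. since alpha < 1/||A||_2
    x = np.zeros(30); x[0] = 1.0                     # e_i, so x^T Z^{-1} x is a diagonal Katz entry

    ev = np.linalg.eigvalsh(Z)
    l, u = 0.99 * ev.min(), 1.01 * ev.max()          # brackets of the spectrum of Z
    a_, b_ = lanczos(Z, x, k=8)
    bounds = sorted([radau_estimate(a_, b_, l), radau_estimate(a_, b_, u)])
    print(bounds, x @ np.linalg.solve(Z, x))         # the exact value should sit between the two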
  13. How well does it work? [Plots: bounds and error vs. matrix-vector products for arxiv, Katz, hard alpha, $\alpha = 1/(\|A\|_2 + 1)$.]
  14. MY COMPLAINTS. Matvecs are expensive. Takes many iterations. Just one score comes out!
  15. KATZ SCORES ARE LOCALIZED. The solutions of $(I - \alpha A^T)k = e_i$ are highly localized: up to 50 neighbors account for 99.65% of the total mass.
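A quick way to see this kind of localization on a synthetic graph (a random sparse graph here, not the arXiv network from the talk): solve one Katz column and measure how much of its mass sits in the 50 largest entries.

    import numpy as np
    import scipy.sparse as sp
    import scipy.sparse.linalg as spla

    rng = np.random.default_rng(0)
    n = 2000
    A = sp.random(n, n, density=5.0 / n, random_state=rng)
    A = ((A + A.T) > 0).astype(float)                # sparse, symmetric 0/1 adjacency
    alpha = 0.5 / A.sum(axis=0).max()                # safely below 1/||A||_1
    rhs = np.zeros(n); rhs[0] = 1.0
    k = spla.spsolve((sp.eye(n) - alpha * A.T).tocsc(), rhs)

    top = np.sort(np.abs(k))[::-1]
    print("mass in the 50 largest entries:", top[:50].sum() / top.sum())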
  16. HOW CAN WE EXPLOIT THIS?
  17. TOP-K ALGORITHM FOR KATZ. Approximate the solution of $(I - \alpha A^T)k = e_i$ with a sparse approximation. Keep the residual sparse too. Ideally, don't "touch" all of the matrix.
  18. TOP-K ALGORITHM FOR KATZ. Approximate the solution with a sparse approximation and a sparse residual, without "touching" all of the matrix. This is possible for personalized PageRank!
  19. Richardson for $Ax = b$: $x^{(k+1)} = x^{(k)} + r^{(k)}$, where $r^{(k)} = b - Ax^{(k)}$. When $A = A^T$, $A \succeq 0$, this is gradient descent on $\min\, x^T A x - 2x^T b$. What about coordinate descent? Gauss-Southwell for $Ax = b$: $x^{(k+1)} = x^{(k)} + r_j^{(k)} e_j$, $r^{(k+1)} = r^{(k)} - r_j^{(k)} A e_j$. How to pick $j$? Frequently "rediscovered" for PageRank: McSherry (WWW 2005), Berkhin (JIM 2007), Andersen-Chung-Lang (FOCS 2006).
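A minimal sketch of the Gauss-Southwell sweep for the Katz-style system $(I - \alpha A^T)x = e_i$: relax the coordinate with the largest residual and touch only one row of $A$ per step. The full-vector argmax and the toy random graph are simplifications; practical codes keep a priority queue and work on large sparse graphs.

    import numpy as np
    import scipy.sparse as sp

    def gauss_southwell(A, alpha, i, tol=1e-8, max_steps=200000):
        # solve (I - alpha*A^T) x = e_i by relaxing one coordinate at a time
        A = sp.csr_matrix(A)
        n = A.shape[0]
        x = np.zeros(n)
        r = np.zeros(n); r[i] = 1.0                  # initial residual: b - M*0 = e_i
        for _ in range(max_steps):
            j = int(np.argmax(np.abs(r)))            # largest residual; real codes keep a heap/queue
            rj = r[j]
            if abs(rj) < tol:
                break
            x[j] += rj
            # r <- r - rj*(e_j - alpha*A^T e_j); A^T e_j is row j of A (CSR slice below)
            r[j] -= rj
            lo, hi = A.indptr[j], A.indptr[j + 1]
            r[A.indices[lo:hi]] += alpha * rj * A.data[lo:hi]
        return x, r

    # tiny demo on a random symmetric graph
    rng = np.random.default_rng(0)
    Ad = rng.integers(0, 2, size=(200, 200)).astype(float)
    Ad = np.triu(Ad, 1); Ad = Ad + Ad.T
    alpha = 0.5 / Ad.sum(axis=0).max()               # below 1/||A||_1, so the sweep converges
    x, r = gauss_southwell(Ad, alpha, i=0)
    print(np.linalg.norm((np.eye(200) - alpha * Ad.T) @ x - np.eye(200)[:, 0]))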
  20. DEMO!
  21. NEW CONVERGENCE THEORY. Katz and PageRank are equivalent if $\alpha < 1/\|A\|_1$. Gauss-Southwell converges when $\alpha < 1/\|A\|_2$ (Luo and Tseng 1992) if $j$ is picked as the largest residual. Read all about it: "Fast matrix computations for pair-wise and column-wise commute times and Katz scores." Bonchi, Esfandiar, Gleich, Greif, Lakshmanan, J. Internet Mathematics (to appear).
  22. 1,000,000 nodes, 100,000,000 edges. [Plot: precision@k for exact top-k sets (k = 10, 25, 100, 1000, and cg with k = 25) vs. equivalent matrix-vector products; hollywood, Katz, hard alpha.]
  23. OPEN QUESTIONS. I can't find any existing derivation of this method in the non-symmetric case (prior to the PageRank literature). Any thoughts? How to show that the method converges for a non-symmetric matrix when $(I - \alpha P^T)$ is not diagonally dominant?
  24. OVERLAPPING CLUSTERS FOR DISTRIBUTED CENTRALITY
  25. LARGE GRAPHS, IN PRACTICE. [Diagram: src-dst edge lists stored as multiple copies.] Edge lists, maybe tied together by a common host, stored redundantly on many hard drives.
  26. UTILIZE SOME REDUNDANCY? To compute global PageRank?
  27. Overlapping clusters: use the redundancy to reduce communication when solving a PageRank problem. "Overlapping clusters for distributed computation." Andersen, Gleich, Mirrokni, WSDM 2012 (to appear).
  28. Communication-avoiding algorithms. Communication is the limiting factor in most computations these days. Flops are, relatively speaking, free.
  29. KEY POINTS. Utilize personalized PageRank vectors to find clusters with "good" conductance scores. Define "core" vertices for each cluster. Find a good way to cover the graph with these clusters. Use restricted additive Schwarz to solve (thanks Prof. Szyld and Frommer!).
  30. All nodes solve locally using the coordinate descent method.
  31. All nodes solve locally using the coordinate descent method. A core vertex for the gray cluster.
  32. All nodes solve locally using the coordinate descent method. Red sends residuals to white. White sends residuals to red.
  33. White then uses the coordinate descent method to adjust its solution. This will cause communication to red/blue.
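Slides 29-33 describe an overlapping-cluster solve; the sketch below shows the restricted-additive-Schwarz skeleton of that idea on a toy ring graph. The clusters, cores, and direct local solves here are stand-ins: in the paper the clusters come from personalized PageRank sweeps and the local solves use coordinate descent rather than a dense solve.

    import numpy as np

    # ring lattice: each vertex links to its 2 nearest neighbors on each side
    n = 60
    A = np.zeros((n, n))
    for i in range(n):
        for s in (1, 2):
            A[i, (i + s) % n] = A[(i + s) % n, i] = 1.0
    d = A.sum(axis=1)
    P = A / d[:, None]
    alpha = 0.85
    M = np.eye(n) - alpha * P.T
    b = (1 - alpha) * np.ones(n) / n                 # global PageRank right-hand side

    # two overlapping clusters; their "core" vertices partition the graph
    clusters = [np.arange(0, 40), np.arange(25, 60)]
    cores    = [np.arange(0, 30), np.arange(30, 60)]

    x = np.zeros(n)
    for sweep in range(60):
        r = b - M @ x                                # global residual (the "communication" step)
        for idx, core in zip(clusters, cores):
            d_local = np.linalg.solve(M[np.ix_(idx, idx)], r[idx])   # local solve on the cluster
            keep = np.isin(idx, core)                # restricted update: only core vertices change
            x[idx[keep]] += d_local[keep]

    print(np.linalg.norm(x - np.linalg.solve(M, b), 1))   # should shrink toward the direct solution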
  34. It works! [Plot: relative work (swapping probability and PageRank communication, for usroads and web-Google; Metis partitioner baseline) vs. volume ratio, i.e., how much more of the graph we need to store.]
  35. PERSONALIZED PAGERANK CLUSTERS. Solve $(I - \alpha P^T)x = (1-\alpha)e_i$ to a large degree-weighted tolerance. Sweep over the vertices in order of their degree-normalized rank. Find the best conductance set. A Cheeger-like inequality. (Not a heuristic.)
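A rough sketch of the sweep step, assuming the personalized PageRank vector is already available (a dense solve stands in below for the degree-weighted push method): order vertices by degree-normalized value and keep the prefix set with the best conductance.

    import numpy as np

    def best_conductance_sweep(A, x):
        d = A.sum(axis=1)
        vol_G = d.sum()
        order = np.argsort(-x / d)                   # sweep in degree-normalized order
        in_set = np.zeros(len(x), dtype=bool)
        cut = 0.0; vol = 0.0
        best_phi, best_k = np.inf, 0
        for k, v in enumerate(order[:-1], start=1):
            in_set[v] = True
            vol += d[v]
            cut += d[v] - 2 * A[v, in_set].sum()     # v's outside edges enter the cut, inside edges leave it
            phi = cut / min(vol, vol_G - vol)
            if phi < best_phi:
                best_phi, best_k = phi, k
        return order[:best_k], best_phi

    # toy example: two dense blocks joined by a couple of edges
    rng = np.random.default_rng(0)
    B = (rng.random((40, 40)) < 0.4).astype(float)
    A = np.zeros((80, 80)); A[:40, :40] = B; A[40:, 40:] = B
    A[0, 40] = A[1, 41] = 1.0
    A = np.triu(A, 1); A = A + A.T

    alpha = 0.85
    P = A / A.sum(axis=1)[:, None]
    x = np.linalg.solve(np.eye(80) - alpha * P.T, (1 - alpha) * np.eye(80)[:, 0])
    S, phi = best_conductance_sweep(A, x)
    print(len(S), phi)                               # should roughly recover the seeded block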
  36. CORE VERTICES. Compute the expected "leave time" for each vertex in a cluster. Keep increasing the threshold for a "good" vertex until every vertex is core in some cluster. Then approximate a set-cover problem to cover the graph with clusters, and use a heuristic to pack vertices until ...
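The covering step can be approximated with the standard greedy heuristic for set cover; a tiny illustrative sketch follows (the leave-time computation and the packing heuristic from the talk are not reproduced):

    def greedy_cover(n_vertices, clusters):
        # clusters: list of sets of vertex ids whose union is {0, ..., n_vertices - 1}
        uncovered = set(range(n_vertices))
        chosen = []
        while uncovered:
            best = max(range(len(clusters)), key=lambda c: len(clusters[c] & uncovered))
            chosen.append(best)
            uncovered -= clusters[best]
        return chosen

    clusters = [{0, 1, 2, 3}, {2, 3, 4}, {4, 5, 6, 7}, {6, 7, 0}]
    print(greedy_cover(8, clusters))                 # picks the two 4-element clusters: [0, 2]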
  37. MY QUESTIONS and future directions (reverse order).
  38. GRAPH SPECTRA. Some work by Banerjee and Jost.