
# Computing Local and Global Centrality


Some recent results on computing PageRank and Katz scores on large networks that I presented at a Dagstuhl workshop.


### Computing Local and Global Centrality

1. COMPUTING LOCAL AND GLOBAL CENTRALITY. David F. Gleich (and many others). Data Mining, Networks and Dynamics, 7 November 2011.
2. LOCAL / GLOBAL collaborators: Pooya Esfandiar, Reid Andersen, Francesco Bonchi, Chen Greif, Vahab Mirrokni, Laks V. S. Lakshmanan, Byung-Won On.
3. Graph centrality. Global: how important is a node? Local: how important is a node with respect to another one?
4. Graph centrality (Koschützki et al.): a centrality must respect isomorphism, and higher is better. Examples: node degree, 1/shortest-path distance.
5. Graph centrality, this talk: path summation, $\sum_\ell f(\text{paths of length } \ell)$; in particular the local Katz score, $\sum_\ell \alpha^\ell \cdot (\text{number of paths of length } \ell \text{ between } i \text{ and } j)$.
6. A – adjacency matrix, L – Laplacian matrix, P – random-walk transition matrix. Katz score: $K_{i,j} = [(I - \alpha A^T)^{-1}]_{i,j}$. Commute time: $C_{i,j} = \mathrm{vol}(G)(L^+_{i,i} + L^+_{j,j} - 2L^+_{i,j})$. PageRank: $(I - \alpha P^T)x = (1 - \alpha)e/n$. Personalized PageRank: $X_{i,j} = (1 - \alpha)[(I - \alpha P^T)^{-1}]_{i,j}$.
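These definitions are easy to sanity-check on a toy graph. A sketch in NumPy; the path graph and the parameter values are arbitrary choices for illustration, not from the talk:

```python
import numpy as np

# A small undirected path graph 0-1-2-3 (an arbitrary example).
A = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
n = A.shape[0]

# Katz: K = (I - alpha A^T)^{-1}; needs alpha < 1/||A||_2 to converge.
alpha_katz = 0.1
K = np.linalg.inv(np.eye(n) - alpha_katz * A.T)

# Commute time: C_ij = vol(G) (L+_ii + L+_jj - 2 L+_ij).
L = np.diag(A.sum(axis=1)) - A
Lplus = np.linalg.pinv(L)            # pseudoinverse of the Laplacian
C = A.sum() * (np.diag(Lplus)[:, None] + np.diag(Lplus)[None, :] - 2 * Lplus)

# Global PageRank: solve (I - alpha P^T) x = (1 - alpha) e / n.
alpha_pr = 0.85
P = A / A.sum(axis=1, keepdims=True)
x = np.linalg.solve(np.eye(n) - alpha_pr * P.T, (1 - alpha_pr) * np.ones(n) / n)
```

Dense inverses are only viable at toy scale, which is exactly the point the rest of the talk addresses.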
7. USES FOR CENTRALITY. Ranking features for web search/classification: Najork, Zaragoza & Taylor, "HITS on the web: How does it compare?"; Becchetti, Castillo, Donato, Baeza-Yates & Leonardi, "Link analysis for Web spam detection". Interesting nodes: GeneRank, ProteinRank, TwitterRank, IsoRank, FutureRank, HostRank, DiffusionRank, ItemRank, SocialPageRank, SimRank.
8. USES FOR CENTRALITY. Ranking networks of comparisons: Chartier, Kreutzer, Langville & Pedings, "Sensitivity and Stability of Ranking Vectors". Clustering or community detection: Andersen, Chung & Lang, "Local Graph Partitioning using PageRank Vectors". Link prediction: Savas et al. (hold on about 90 minutes).
9. THESE GET USED A LOT. THEY MUST BE FAST.
10. MATRICES, MOMENTS, QUADRATURE. Estimate a quadratic form: $l \le x^T f(Z)\, x \le u$. Commute time: $(e_i - e_j)^T L^+ (e_i - e_j)$. Katz: $\tfrac{1}{4}(e_i + e_j)^T (I - \alpha A^T)^{-1} (e_i + e_j) - \tfrac{1}{4}(e_i - e_j)^T (I - \alpha A^T)^{-1} (e_i - e_j)$. Also used by Benzi and Boito (LAA) for Katz scores and the matrix exponential.
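The reduction of a pairwise score to two quadratic forms is just the polarization identity, $e_i^T N e_j = \tfrac14 (e_i+e_j)^T N (e_i+e_j) - \tfrac14 (e_i-e_j)^T N (e_i-e_j)$ for symmetric $N$. A quick NumPy check on an arbitrary triangle graph:

```python
import numpy as np

# Triangle graph; alpha = 0.3 < 1/||A||_2 = 1/2, so the Katz matrix exists.
A = np.array([[0, 1, 1],
              [1, 0, 1],
              [1, 1, 0]], dtype=float)
n, alpha, i, j = 3, 0.3, 0, 2
N = np.linalg.inv(np.eye(n) - alpha * A.T)   # Katz matrix (symmetric here)
ei, ej = np.eye(n)[i], np.eye(n)[j]

# Polarization identity: recover N_ij from two quadratic forms.
quad = 0.25 * (ei + ej) @ N @ (ei + ej) - 0.25 * (ei - ej) @ N @ (ei - ej)
```

Each quadratic form can then be bounded by the MMQ machinery without ever forming $N$.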
11. MMQ – THE BIG IDEA. Quadratic form → ($A$ is s.p.d.: think of the EVD) → weighted sum → ("a tautology") → Stieltjes integral → quadrature approximation → matrix equation, via Lanczos. [Slide footer: David F. Gleich (Purdue), Univ. Chicago SSCS Seminar.]
12. MMQ PROCEDURE. Goal: lower and upper bounds on a quadratic form $x^T f(A)\,x$. Given $A$, $x$, and spectrum estimates $l \le \lambda_{\min}(A)$, $u \ge \lambda_{\max}(A)$: (1) run $k$ steps of Lanczos on $A$ starting with $x$; (2) extend the resulting tridiagonal matrix with an additional eigenvalue at $u$, which corresponds to a Gauss-Radau rule with $u$ as a prescribed node; (3) do the same with an additional eigenvalue at $l$, a Gauss-Radau rule with $l$ as a prescribed node; (4) output the two quadrature values as lower and upper bounds on $x^T f(A)\,x$.
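A minimal sketch of the Lanczos-quadrature idea. For brevity it uses the plain Gauss rule (a single estimate) rather than the Gauss-Radau modification that yields the paired lower and upper bounds; all names are illustrative:

```python
import numpy as np

def lanczos_quadrature(A, x, k, f):
    """Estimate x^T f(A) x for symmetric A via k Lanczos steps:
    x^T f(A) x ~ ||x||^2 * e_1^T f(T_k) e_1  (the Gauss rule)."""
    beta0 = np.linalg.norm(x)
    q = x / beta0
    Q = [q]
    alphas, betas = [], []
    for _ in range(k):
        w = A @ q
        if len(Q) > 1:
            w -= betas[-1] * Q[-2]
        a = q @ w
        w -= a * q
        for qq in Q:                      # full reorthogonalization, for stability
            w -= (qq @ w) * qq
        alphas.append(a)
        b = np.linalg.norm(w)
        if b < 1e-12:                     # breakdown: Krylov space is invariant
            break
        betas.append(b)
        q = w / b
        Q.append(q)
    m = len(alphas)
    T = np.diag(alphas) + np.diag(betas[:m-1], 1) + np.diag(betas[:m-1], -1)
    vals, vecs = np.linalg.eigh(T)        # quadrature nodes and weights
    return beta0**2 * np.sum(vecs[0, :]**2 * f(vals))
```

With $f(t) = 1/(1-\alpha t)$ this estimates a diagonal Katz entry $e_i^T(I-\alpha A)^{-1}e_i$ using only matrix-vector products.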
13. How well does it work? [Plots: bounds and error vs. number of matrix-vector products; arxiv graph, Katz, hard $\alpha = 1/(\|A\|_2 + 1)$.]
14. MY COMPLAINTS. Matvecs are expensive. It takes many iterations. And just one score comes out!
15. KATZ SCORES ARE LOCALIZED. The solutions of $(I - \alpha A^T)k = e_i$ are highly localized: up to 50 neighbors is 99.65% of the total mass.
16. HOW CAN WE EXPLOIT THIS?
17. TOP-K ALGORITHM FOR KATZ. Approximate the solution of $(I - \alpha A^T)k = e_i$, where the right-hand side is sparse. Keep the iterate $k$ sparse too. Ideally, don't "touch" all of the graph.
18. TOP-K ALGORITHM FOR KATZ (continued). This is possible for personalized PageRank!
19. Richardson for $Ax = b$: $x^{(k+1)} = x^{(k)} + r^{(k)}$, $r^{(k+1)} = b - Ax^{(k+1)}$. When $A = A^T$, $A \succeq 0$, this is gradient descent, equivalent to $\min\, x^T A x - 2x^T b$. What about coordinate descent? Gauss-Southwell for $Ax = b$: $x^{(k+1)} = x^{(k)} + r_j^{(k)} e_j$, $r^{(k+1)} = r^{(k)} - r_j^{(k)} A e_j$. How to pick $j$? Frequently "rediscovered" for PageRank: McSherry (WWW 2005), Berkhin (JIM 2007), Andersen, Chung & Lang (FOCS 2006).
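Gauss-Southwell with largest-residual selection can be sketched for the personalized PageRank system $(I - \alpha P^T)x = (1-\alpha)e_{\mathrm{seed}}$. This is a toy sketch, not the paper's implementation; the lazily-updated heap and all names are illustrative, and it assumes an unweighted graph with no dangling nodes or self-loops:

```python
import heapq

def gauss_southwell_pagerank(adj, alpha, seed, tol):
    """Solve (I - alpha P^T) x = (1 - alpha) e_seed by Gauss-Southwell:
    repeatedly relax the coordinate with the largest residual.
    adj is an adjacency-list representation of the graph."""
    x, r = {}, {seed: 1.0 - alpha}
    heap = [(-r[seed], seed)]            # max-heap via negation, lazily updated
    while heap:
        _, j = heapq.heappop(heap)
        rj = r.get(j, 0.0)
        if rj < tol:                     # stale or already-converged entry
            continue
        x[j] = x.get(j, 0.0) + rj        # x_j += r_j
        r[j] = 0.0
        # r += alpha * r_j * P^T e_j: push mass to j's out-neighbors
        for u in adj[j]:
            r[u] = r.get(u, 0.0) + alpha * rj / len(adj[j])
            heapq.heappush(heap, (-r[u], u))
    return x
```

Both `x` and `r` stay sparse dictionaries, so the solver only "touches" the part of the graph where the solution lives.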
20. DEMO!
21. NEW CONVERGENCE THEORY. Katz and PageRank are equivalent if $\alpha < 1/\|A\|_1$. Gauss-Southwell converges when $\alpha < 1/\|A\|_2$ (Luo and Tseng, 1992) if $j$ is picked as the largest residual. Read all about it: Bonchi, Esfandiar, Gleich, Greif, Lakshmanan, "Fast matrix computations for pair-wise and column-wise commute times and Katz scores", Internet Mathematics (to appear).
22. 1,000,000 nodes, 100,000,000 edges (hollywood graph, Katz, hard alpha). [Plot: precision@k for exact top-$k$ sets, $k \in \{10, 25, 100, 1000\}$ plus cg at $k = 25$, vs. equivalent matrix-vector products.]
23. OPEN QUESTIONS. I can't find any existing derivation of this method in the non-symmetric case (prior to the PageRank literature); any thoughts? How can we show that the method converges for a non-symmetric matrix when $(I - \alpha P^T)$ is not diagonally dominant?
24. OVERLAPPING CLUSTERS FOR DISTRIBUTED CENTRALITY
25. LARGE GRAPHS, IN PRACTICE. [Diagram: edge lists (src → dst), each stored as Copy 1 and Copy 2 across several machines.] Edge lists may be tied together by a common host, stored redundantly on many hard drives.
26. Can we utilize some of this redundancy to compute global PageRank?
27. Overlapping clusters: use the redundancy to reduce communication when solving a PageRank problem. "Overlapping clusters for distributed computation", Andersen, Gleich, Mirrokni, WSDM 2012 (to appear).
28. COMMUNICATION-AVOIDING ALGORITHMS. Communication is the limiting factor in most computations these days; flops are, relatively speaking, free.
29. KEY POINTS. Utilize personalized PageRank vectors to find clusters with "good" conductance scores. Define "core" vertices for each cluster, and find a good way to cover the graph with these clusters. Use restricted additive Schwarz to solve (thanks, Prof. Szyld and Prof. Frommer!).
30. All nodes solve locally using the coordinate-descent method.
31. All nodes solve locally using the coordinate-descent method. [Figure highlights a core vertex for the gray cluster.]
32. All nodes solve locally using the coordinate-descent method. Red sends residuals to white; white sends residuals to red.
33. White then uses the coordinate-descent method to adjust its solution; this will cause communication to red/blue.
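The loop in these slides is, in essence, a restricted additive Schwarz (RAS) sweep. A dense toy sketch under stated assumptions: the cluster/core structure and all names are illustrative, and a real deployment solves each local system sparsely with coordinate descent rather than a direct solve:

```python
import numpy as np

def ras_pagerank(P, alpha, clusters, cores, tol=1e-10, max_sweeps=1000):
    """Restricted additive Schwarz for (I - alpha P^T) x = (1 - alpha) e / n.
    clusters: overlapping lists of vertex indices.
    cores: disjoint sets, cores[c] a subset of clusters[c], covering all vertices."""
    n = P.shape[0]
    M = np.eye(n) - alpha * P.T
    b = np.full(n, (1.0 - alpha) / n)
    x = np.zeros(n)
    for _ in range(max_sweeps):
        r = b - M @ x
        if np.abs(r).max() < tol:
            break
        dx = np.zeros(n)
        for idx, core in zip(clusters, cores):
            # each cluster solves its restricted system on the global residual
            local = np.linalg.solve(M[np.ix_(idx, idx)], r[idx])
            # "restricted": only core vertices keep their local update
            for i, g in enumerate(idx):
                if g in core:
                    dx[g] = local[i]
        x += dx
    return x
```

Because the cores are disjoint, the per-cluster updates combine without averaging, which is what makes the restricted variant communication-friendly.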
34. It works! [Plot: relative work vs. volume ratio, i.e. how much more of the graph we need to store, for swapping probability and PageRank communication on usroads and web-Google, against a Metis partitioner baseline.]
35. PERSONALIZED PAGERANK CLUSTERS. Solve $(I - \alpha P^T)x = (1 - \alpha)e_i$ to a large degree-weighted tolerance $\varepsilon$. Sweep over the vertices in order of their degree-normalized rank and find the best-conductance set. A Cheeger-like inequality applies. (Not a heuristic.)
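The sweep step can be sketched as follows, assuming an undirected graph given as adjacency lists (no self-loops) and a sparse PPR vector given as a dict; the names are illustrative:

```python
def sweep_cut(adj, ppr):
    """Sweep over the support of a PPR vector in order of degree-normalized
    score; return the prefix set with the best (smallest) conductance."""
    order = sorted(ppr, key=lambda u: ppr[u] / len(adj[u]), reverse=True)
    vol_G = sum(len(nbrs) for nbrs in adj)
    in_set, vol, cut = set(), 0, 0
    best_phi, best_k = float("inf"), 0
    for k, u in enumerate(order, 1):
        # edges from u into the current set leave the boundary; the rest join it
        inside = sum(1 for v in adj[u] if v in in_set)
        vol += len(adj[u])
        cut += len(adj[u]) - 2 * inside
        in_set.add(u)
        denom = min(vol, vol_G - vol)
        phi = cut / denom if denom > 0 else float("inf")
        if phi < best_phi:
            best_phi, best_k = phi, k
    return set(order[:best_k]), best_phi
```

Each step updates the cut and volume incrementally, so the whole sweep costs time proportional to the volume of the PPR vector's support.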
36. CORE VERTICES. Compute the expected "leave time" for each vertex in a cluster. Keep increasing the threshold for a "good" vertex until every vertex is core in some cluster. Then approximate a set-cover problem to cover the graph with clusters, and use a heuristic to pack vertices until …
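The set-cover step admits the standard greedy approximation (within a $\ln n$ factor of optimal); a sketch with illustrative names:

```python
def greedy_cover(n, clusters):
    """Greedy set cover: repeatedly choose the cluster that covers the most
    still-uncovered vertices. clusters is a list of vertex sets over 0..n-1."""
    uncovered, chosen = set(range(n)), []
    while uncovered:
        c = max(range(len(clusters)),
                key=lambda i: len(uncovered & clusters[i]))
        if not uncovered & clusters[c]:
            raise ValueError("clusters do not cover every vertex")
        chosen.append(c)
        uncovered -= clusters[c]
    return chosen
```

The chosen clusters then get packed onto machines; the packing heuristic itself is not specified on the slide.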
37. MY QUESTIONS (and future directions), in reverse order.
38. GRAPH SPECTRA. Some work by Banerjee and Jost.