Slideshare uses cookies to improve functionality and performance, and to provide you with relevant advertising. If you continue browsing the site, you agree to the use of cookies on this website. See our User Agreement and Privacy Policy.

Slideshare uses cookies to improve functionality and performance, and to provide you with relevant advertising. If you continue browsing the site, you agree to the use of cookies on this website. See our Privacy Policy and User Agreement for details.

Like this presentation? Why not share!

- Design lesson not taught in schools by Pavithra Solai Ja... 1196 views
- Catastrophic Cancellation by C4Media 976 views
- Matrices in computer applications by Rayyan777 1889 views
- Profitable growth via adjacency - G... by Peter Spung 317 views
- Multiplication of matrices and its ... by nayanika bhalla 5344 views
- Adjacency pair by LanzManipor 19354 views

1,992 views

Published on

Slides from a seminar I gave at Emory University on 2012-01-20

No Downloads

Total views

1,992

On SlideShare

0

From Embeds

0

Number of Embeds

2

Shares

0

Downloads

17

Comments

0

Likes

1

No embeds

No notes for slide

- 1. FAST MATRIX COMPUTATIONS FOR PAIRWISE AND COLUMN-WISE KATZ SCORES AND COMMUTE TIMES David F. Gleich Purdue University Emory University Math/CS Seminar January 20, 2012 With Pooya Esfandiar, Francesco Bonchi, Chen Greif, Laks V. S. Lakshmanan, and Byung-Won OnDavid F. Gleich (Purdue) Emory Math/CS Seminar 1 / 47
- 2. MAIN RESULTSA – adjacency matrix For KatzL – Laplacian matrix Compute one fast Compute top fastKatz score : Commute time: For CommuteCompute one fastEstimate David F. Gleich (Purdue) Emory Math/CS Seminar 2 of 47
- 3. OUTLINEWhy study these measures?Katz Rank and Commute TimeHow else do people compute them?Quadrature rules for pairwise scoresSparse linear systems solves for top-kAs many results as we have time for…David F. Gleich (Purdue) Emory Math/CS Seminar 3 of 47
- 4. WHY? LINK PREDICTION Neighborhood based Path based Liben-Nowell and Kleinberg 2003, 2006 found that path based link prediction was more efficientDavid F. Gleich (Purdue) Emory Math/CS Seminar 4 of 47
- 5. NOTEAll graphs are undirectedAll graphs are connectedDavid F. Gleich (Purdue) Emory Math/CS Seminar 5 of 47
- 6. LEO KATZDavid F. Gleich (Purdue) Emory Math/CS Seminar 6 of 47
- 7. NOT QUITE, WIKIPEDIA : adjacency, : random walk PageRank Katz These are equivalent if has constantdegreeDavid F. Gleich (Purdue) Emory Math/CS Seminar 7 of 47
- 8. WHAT KATZ ACTUALLY SAID “we assume that each link independently has the same probability of being effective” … “we conceive a constant , depending on the group and the context of the particular investigation, which has the force of a probability of effectiveness of a single link. A k-step chain then, has probability of being effective.” “We wish to find the column sums of the matrix” Leo Katz 1953, A New Status Index Derived from Sociometric Analysis, Psychometria 18(1):39-43David F. Gleich (Purdue) Emory Math/CS Seminar 8 of 47
- 9. A MODERN TAKEThe Katz score (node-based) is The Katz score (edge-based) is David F. Gleich (Purdue) Emory Math/CS Seminar 9 of 47
- 10. RETURNING TO THE MATRIX Carl Neumann David F. Gleich (Purdue) Emory Math/CS Seminar 10 of 47
- 11. Carl Neumann I’ve heard the Neumann series called the “von Neumann” series more than I’d like! In fact, the von Neumann kernel of a graph should be named the “Neumann” kernel! Wikipedia pageDavid F. Gleich (Purdue) Emory Math/CS Seminar 11 / 47
- 12. PROPERTIES OF KATZ’S MATRIX is symmetric exists when is spd when Note that 1/max-degree sufficesDavid F. Gleich (Purdue) Emory Math/CS Seminar 12 of 47
- 13. COMMUTE TIME Picture taken from Google images, seems to be Bay Bridge Traffic by Jim M. Goldstein.David F. Gleich (Purdue) Emory Math/CS Seminar 13 / 47
- 14. COMMUTE TIMEConsider a uniform random walk on agraph Also called the hitting time from node i to j, or the first transition time David F. Gleich (Purdue) Emory Math/CS Seminar 14 of 47
- 15. SKIPPING DETAILS : graph Laplacian is the only null-vectorDavid F. Gleich (Purdue) Emory Math/CS Seminar 15 of 47
- 16. WHAT DO OTHER PEOPLE DO?1) Just work with the linear algebra formulations2) For Katz, Truncate the Neumann series as a few (3-5) terms, or MMQ (Benzi & Boito)3) Use low-rank approximations from EVD(A) or EVD(L)4) For commute, use Johnson-Lindenstrauss inspired random sampling5) Approximately decompose into smaller problems Liben-Nowell and Kleinberg CIKM2003, Acar et al. ICDM2009, Spielman and Srivastava STOC2008, Sarkar and Moore UAI2007, Wang et al. ICDM2007David F. Gleich (Purdue) Emory Math/CS Seminar 16 of 47
- 17. THE PROBLEM All of these techniques are preprocessing based because most people’s goal is to compute all the scores. We want to avoid preprocessing the graph. There are a few caveats here! i.e. one could solve the system instead of looking for the matrix inverseDavid F. Gleich (Purdue) Emory Math/CS Seminar 17 of 47
- 18. WHY NO PREPROCESSING? The graph is constantly changing as I rate new movies.David F. Gleich (Purdue) Emory Math/CS Seminar 18 of 47
- 19. WHY NO PREPROCESSING? Top-k predicted “links” are movies to watch! Pairwise scores give user similarityDavid F. Gleich (Purdue) Emory Math/CS Seminar 19 of 47
- 20. PAIR-WISE ALGORITHMSDavid F. Gleich (Purdue) Emory Math/CS Seminar 20 / 47
- 21. PAIRWISE ALGORITHMS Katz Commute Golub and Meurant to the rescue!David F. Gleich (Purdue) Emory Math/CS Seminar 21 of 47
- 22. MMQ - THE BIG IDEAQuadratic form Think Weighted sum A is s.p.d. use EVD Stieltjes integral “A tautology” Quadrature approximation Matrix equation LanczosDavid F. Gleich (Purdue) Emory Math/CS Seminar 22 of 47
- 23. LANCZOS , k-steps of the Lanczos methodproduce and = David F. Gleich (Purdue) Emory Math/CS Seminar 23 of 47
- 24. PRACTICAL LANCZOSOnly need to store the last 2 vectors in Updating requires O(matvec) work is not orthogonalDavid F. Gleich (Purdue) Emory Math/CS Seminar 24 of 47
- 25. MMQ PROCEDUREGoal Given 1. Run k-steps of Lanczos on starting with 2. Compute , with an additional eigenvalue at , set Correspond to a Gauss-Radau rule, with u as a prescribed node3. Compute , with an additional eigenvalue at , set Correspond to a Gauss-Radau rule, with l as a prescribed node4. Output as lower and upper bounds on David F. Gleich (Purdue) Emory Math/CS Seminar 25 of 47
- 26. PRACTICAL MMQIncrease k to become more accurateBad eigenvalue bounds yield worse results and are easy to compute not required, we can iterativelyupdate it’s LU factorizationDavid F. Gleich (Purdue) Emory Math/CS Seminar 26 of 47
- 27. PRACTICAL MMQDavid F. Gleich (Purdue) Emory Math/CS Seminar 27 of 47
- 28. ONE LAST STEP FOR KATZ Katz David F. Gleich (Purdue) Emory Math/CS Seminar 28 of 47
- 29. COLUMN-WISE ALGORITHMSDavid F. Gleich (Purdue) Emory Math/CS Seminar 29 / 47
- 30. COLUMN-WISE COMMUTE-TIME requires entire diagonal of = Each vector is computed by the a Lanczos based CG algorithm. Paige and Saunders, 1975David F. Gleich (Purdue) Emory Math/CS Seminar 30 of 47
- 31. COLUMN-WISE COMMUTE TIME following Hein et al. 2010 is a MUCH better and faster approximation.David F. Gleich (Purdue) Emory Math/CS Seminar 31
- 32. KATZ SCORES ARE LOCALIZED Up to 50 neighbors is 99.65% of the total massDavid F. Gleich (Purdue) Emory Math/CS Seminar 32 of 47
- 33. PARTICIPATION RATIOSParticipation Ratios “effectivenon-zeros”in a vector David F. Gleich (Purdue) Emory Math/CS Seminar 33 of 47
- 34. TOP-K ALGORITHM FOR KATZApproximate where is sparseKeep sparse tooIdeally, don’t “touch” all of David F. Gleich (Purdue) Emory Math/CS Seminar 34 of 47
- 35. INSPIRATION - PAGERANK Approximate where is sparse Keep sparse too? YES! Ideally, don’t “touch” all of ? YES!McSherry WWW2005, Berkhin 2007, Anderson et al. FOCS2008 – Thanks to Reid Anderson for telling me McSherry did this too. David F. Gleich (Purdue) Emory Math/CS Seminar 35 of 47
- 36. THE ALGORITHM - MCSHERRYFor Start with the Richardson iteration Rewrite Richardson converges if David F. Gleich (Purdue) Emory Math/CS Seminar 36 of 47
- 37. THE ALGORITHMNote is sparse.If , then is sparse.Idea only add one component of to David F. Gleich (Purdue) Emory Math/CS Seminar 37 of 47
- 38. THE ALGORITHMFor Init: How to pick ?David F. Gleich (Purdue) Emory Math/CS Seminar 38 of 47
- 39. THE ALGORITHM FOR KATZFor Init: Pick as max Storing the non-zeros of the residual in a heap makes picking the max log(n) time. See Anderson et al. FOCS2008 for moreDavid F. Gleich (Purdue) Emory Math/CS Seminar 39 of 47
- 40. CONVERGENCE?If 1/max-degree then is sub-stochastic and the PageRank based proofapplies because the matrix is diagonallydominantFor , then for symmetric , thisalgorithm is the Gauss-Southwellprocedure and it still converges.David F. Gleich (Purdue) Emory Math/CS Seminar 40 of 47
- 41. NONSYMMETRIC ALGORITHM?Was this algorithm known in the non-symmetric case?YES! It’s just Gauss-Seidel/Gauss-Southwell, but with a column vs. roworientation.Called dynamic residual to AMGersDavid F. Gleich (Purdue) Emory Math/CS Seminar 41 of 47
- 42. RESULTS – DATA, PARAMETERS All unweighted, connected graphs Easy Hard David F. Gleich (Purdue) Emory Math/CS Seminar 42 of 47
- 43. KATZ BOUND CONVERGENCEDavid F. Gleich (Purdue) Emory Math/CS Seminar 43 of 47
- 44. COMMUTE BOUND CONVERG.David F. Gleich (Purdue) Emory Math/CS Seminar 44 of 47
- 45. KATZ SET CONVERGENCE For arXiv graph.David F. Gleich (Purdue) Emory Math/CS Seminar 45 of 47
- 46. TIMINGDavid F. Gleich (Purdue) Emory Math/CS Seminar 46 of 47
- 47. CONCLUSIONSThese algorithms are faster than manyalternatives.For pairwise commute, stopping criteriaare simpler with boundsFor top-k problems, we often need lessthan 1 matvec for good enough resultsDavid F. Gleich (Purdue) Emory Math/CS Seminar 47 of 47
- 48. Paper at WAW2010 and J. Internet Mathematics Slides online Code is online already www.cs.purdue.edu/ homes/dgleich/codes/fast-katz-2011 By AngryDogDesign on DeviantArtDavid F. Gleich (Purdue) Emory Math/CS Seminar 48

No public clipboards found for this slide

×
### Save the most important slides with Clipping

Clipping is a handy way to collect and organize the most important slides from a presentation. You can keep your great finds in clipboards organized around topics.

Be the first to comment