Fast matrix computations for pair-wise and column-wise Katz scores and commute times
A seminar I gave at the University of Chicago about the ideas behind computing Katz scores and commute times quickly.

Transcript

  • 1. FAST MATRIX COMPUTATIONS FOR PAIRWISE AND COLUMN-WISE KATZ SCORES AND COMMUTE TIMES. David F. Gleich, Purdue University. University of Chicago Statistical and Scientific Computing Seminar, October 6th, 2011. With Pooya Esfandiar, Francesco Bonchi, Chen Greif, Laks V. S. Lakshmanan, and Byung-Won On.
  • 2. MAIN RESULTS. A – adjacency matrix, L – Laplacian matrix. Katz score: K = (I − αA)^{-1} − I. Commute time: c(i,j) = vol(G) (e_i − e_j)^T L^+ (e_i − e_j). For Katz: compute one score fast, and compute the top-k scores fast. For commute time: compute one score fast.
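A minimal dense baseline that simply evaluates the two definitions above on a tiny undirected example graph (this is the naive computation the talk wants to avoid; the graph and all names here are illustrative):

    import numpy as np

    A = np.array([[0, 1, 1, 0],
                  [1, 0, 1, 0],
                  [1, 1, 0, 1],
                  [0, 0, 1, 0]], dtype=float)   # adjacency matrix of a small graph
    n = A.shape[0]
    alpha = 0.9 / A.sum(axis=1).max()           # alpha < 1/max-degree keeps things safe

    # Katz scores: K = (I - alpha*A)^{-1} - I, i.e. sum over l >= 1 of alpha^l (A^l)_{ij}
    K = np.linalg.inv(np.eye(n) - alpha * A) - np.eye(n)

    # Commute times: c(i,j) = vol(G) * (e_i - e_j)^T L^+ (e_i - e_j)
    L = np.diag(A.sum(axis=1)) - A              # graph Laplacian
    Lpinv = np.linalg.pinv(L)
    vol = A.sum()                               # sum of degrees
    C = vol * (np.diag(Lpinv)[:, None] + np.diag(Lpinv)[None, :] - 2 * Lpinv)

    print("Katz score k(0,3)    =", K[0, 3])
    print("Commute time c(0,3)  =", C[0, 3])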
  • 3. OUTLINE. Why study these measures? Katz rank and commute time. How else do people compute them? Quadrature rules for pairwise scores. Sparse linear system solves for top-k. As many results as we have time for…
  • 4. WHY? LINK PREDICTION. Neighborhood based vs. path based. Liben-Nowell and Kleinberg (2003, 2006) found that path-based link prediction was more effective.
  • 5. NOTE. All graphs are undirected. All graphs are connected.
  • 6. LEO KATZ
  • 7. NOT QUITE, WIKIPEDIA. A: adjacency matrix; P: random walk transition matrix (A normalized by degrees). PageRank scores solve (I − αP) x = (1 − α) v; Katz scores solve (I − αA) x = e. These are equivalent if the graph has constant degree.
  • 8. WHAT KATZ ACTUALLY SAID. "we assume that each link independently has the same probability of being effective" … "we conceive a constant α, depending on the group and the context of the particular investigation, which has the force of a probability of effectiveness of a single link. A k-step chain, then, has probability α^k of being effective." "We wish to find the column sums of the matrix." Leo Katz, 1953. A New Status Index Derived from Sociometric Analysis, Psychometrika 18(1):39–43.
  • 9. A MODERN TAKE. The Katz score (node-based) is k_i = Σ_j Σ_{l≥1} α^l (A^l)_{ij}, i.e. k = ((I − αA)^{-1} − I) e. The Katz score (edge-based) is K_{ij} = Σ_{l≥1} α^l (A^l)_{ij} = [(I − αA)^{-1} − I]_{ij}.
  • 10. RETURNING TO THE MATRIX. (I − αA)^{-1} = I + αA + α^2 A^2 + α^3 A^3 + ⋯ (the Neumann series), so K = (I − αA)^{-1} − I accumulates the α-weighted walk counts of every length. Carl Neumann.
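A short sketch of the Neumann-series view: truncating the series gives a matvec-only approximation to one Katz column (this is the truncation approach mentioned later on slide 16; names are illustrative):

    import numpy as np

    def katz_column_neumann(A, alpha, i, nterms=10):
        """Approximate column i of (I - alpha*A)^{-1} - I with nterms series terms."""
        n = A.shape[0]
        term = np.zeros(n)
        term[i] = 1.0                      # current term alpha^l A^l e_i, starting at l = 0
        x = np.zeros(n)
        for _ in range(nterms):
            term = alpha * (A @ term)      # advance to the next term of the series
            x += term                      # accumulate the terms l = 1, ..., nterms
        return x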
  • 11. Carl Neumann. I've heard the Neumann series called the "von Neumann" series more than I'd like! In fact, the von Neumann kernel of a graph should be named the "Neumann" kernel! (Wikipedia page.)
  • 12. PROPERTIES OF KATZ'S MATRIX. (I − αA) is symmetric. (I − αA)^{-1} exists when α < 1/λ_max(A). (I − αA) is s.p.d. when α < 1/λ_max(A). Note that α < 1/max-degree suffices.
  • 13. COMMUTE TIME. Picture taken from Google images, seems to be Bay Bridge Traffic by Jim M. Goldstein.
  • 14. COMMUTE TIME. Consider a uniform random walk on a graph. The hitting time from node i to j (also called the first transition time) is the expected number of steps for the walk to reach j when started at i. The commute time is the round trip: c(i,j) = hitting time from i to j plus hitting time from j to i.
  • 15. SKIPPING DETAILS. L = D − A: the graph Laplacian. c(i,j) = vol(G) (e_i − e_j)^T L^+ (e_i − e_j), where vol(G) is the sum of the degrees and L^+ is the pseudo-inverse of L. For a connected graph, the all-ones vector e is the only null-vector.
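A small sketch of that formula: because e_i − e_j is orthogonal to the all-ones null vector, L^+(e_i − e_j) is the minimum-norm solution of L x = e_i − e_j, so one pairwise commute time needs only one Laplacian solve. Here lstsq stands in for the iterative solver discussed later in the talk:

    import numpy as np

    def commute_time(A, i, j):
        """Pairwise commute time on a small dense adjacency matrix A."""
        L = np.diag(A.sum(axis=1)) - A
        b = np.zeros(A.shape[0])
        b[i], b[j] = 1.0, -1.0                       # b = e_i - e_j, orthogonal to ones
        x, *_ = np.linalg.lstsq(L, b, rcond=None)    # minimum-norm solution = L^+ b
        return A.sum() * (b @ x)                     # vol(G) * (e_i - e_j)^T L^+ (e_i - e_j)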
  • 16. WHAT DO OTHER PEOPLE DO? 1) Just work with the linear algebra formulations. 2) For Katz, truncate the Neumann series at a few (3–5) terms. 3) Use low-rank approximations from EVD(A) or EVD(L). 4) For commute time, use Johnson-Lindenstrauss-inspired random sampling. 5) Approximately decompose into smaller problems. Liben-Nowell and Kleinberg CIKM2003, Acar et al. ICDM2009, Spielman and Srivastava STOC2008, Sarkar and Moore UAI2007, Wang et al. ICDM2007.
  • 17. THE PROBLEM. All of these techniques are preprocessing based, because most people's goal is to compute all the scores. We want to avoid preprocessing the graph. There are a few caveats here, e.g. one could solve the system instead of looking for the matrix inverse.
  • 18. WHY NO PREPROCESSING? The graph is constantly changing as I rate new movies.
  • 19. WHY NO PREPROCESSING? Top-k predicted "links" are movies to watch! Pairwise scores give user similarity.
  • 20. PAIR-WISE ALGORITHMS
  • 21. PAIRWISE ALGORITHMS. Katz: k(i,j) = e_i^T ((I − αA)^{-1} − I) e_j. Commute: c(i,j) = vol(G) (e_i − e_j)^T L^+ (e_i − e_j). Both reduce to quadratic forms u^T f(M) u. Golub and Meurant to the rescue!
  • 22. MMQ - THE BIG IDEA. Quadratic form u^T f(A) u (think f(A) = A^{-1}). A is s.p.d., so the EVD A = QΛQ^T turns it into a weighted sum, u^T f(A) u = Σ_i f(λ_i) (q_i^T u)^2. Write that sum as a Stieltjes integral ∫ f(λ) dμ(λ) ("a tautology"). Approximate the integral by quadrature; the quadrature nodes and weights come from a small matrix equation, namely the tridiagonal matrix produced by Lanczos.
  • 23. LANCZOS. Given a symmetric matrix A and a starting vector u, k steps of the Lanczos method produce a basis V_k and a tridiagonal matrix T_k with A V_k = V_k T_k + β_k v_{k+1} e_k^T, so V_k^T A V_k = T_k.
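A minimal Lanczos sketch, assuming symmetric A and no reorthogonalization (consistent with the caveats on the next slide); it returns the tridiagonal T_k and the recurrence coefficients that the quadrature step needs. All names are illustrative:

    import numpy as np

    def lanczos_tridiag(A, u, k):
        """Run k Lanczos steps on symmetric A starting from u.
        Returns the k-by-k tridiagonal T_k and the off-diagonal coefficients
        (betas[-1] is beta_k, needed for the Gauss-Radau bounds)."""
        u = np.asarray(u, dtype=float)
        v_prev = np.zeros_like(u)
        v = u / np.linalg.norm(u)
        beta = 0.0
        alphas, betas = [], []
        for _ in range(k):
            w = A @ v - beta * v_prev          # three-term recurrence: only 2 vectors kept
            alpha = float(v @ w)
            w = w - alpha * v
            beta = float(np.linalg.norm(w))
            alphas.append(alpha)
            betas.append(beta)
            v_prev, v = v, (w / beta if beta > 0 else w)
        T = np.diag(alphas) + np.diag(betas[:-1], 1) + np.diag(betas[:-1], -1)
        return T, betas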
  • 24. PRACTICAL LANCZOS. Only need to store the last 2 vectors in V_k. Each update requires O(matvec) work. In finite precision, V_k is not orthogonal.
  • 25. MMQ PROCEDURE. Goal: lower and upper bounds on u^T f(A) u. Given: estimates l and u of the extreme eigenvalues of A. 1. Run k steps of Lanczos on A starting with the vector u. 2. Compute a modified tridiagonal matrix with an additional prescribed eigenvalue at the upper estimate and evaluate e_1^T f(T) e_1; this corresponds to a Gauss-Radau rule with u as a prescribed node. 3. Compute the analogous modification with an additional prescribed eigenvalue at the lower estimate; this corresponds to a Gauss-Radau rule with l as a prescribed node. 4. Output the two values as lower and upper bounds on u^T f(A) u.
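A sketch of steps 2–3 for f(A) = A^{-1}, following the Golub-Meurant Gauss-Radau construction: append one row and column to T_k so the rule has a prescribed node at an eigenvalue estimate z, then evaluate ||u||^2 e_1^T T_ext^{-1} e_1. The helper pairs with the Lanczos sketch above; calling it with a lower eigenvalue estimate gives one side of the bound and with an upper estimate the other. Names are illustrative:

    import numpy as np

    def gauss_radau_value(T, beta_k, unorm2, z):
        """Gauss-Radau estimate of u^T A^{-1} u with prescribed node z.
        T: k-by-k Lanczos tridiagonal; beta_k: last off-diagonal coefficient;
        unorm2: squared norm of the Lanczos start vector u."""
        k = T.shape[0]
        rhs = np.zeros(k)
        rhs[-1] = beta_k ** 2
        delta = np.linalg.solve(T - z * np.eye(k), rhs)   # Golub-Meurant modification
        T_ext = np.zeros((k + 1, k + 1))
        T_ext[:k, :k] = T
        T_ext[k, k] = z + delta[-1]                       # modified corner entry
        T_ext[k - 1, k] = T_ext[k, k - 1] = beta_k
        e1 = np.zeros(k + 1)
        e1[0] = 1.0
        return unorm2 * np.linalg.solve(T_ext, e1)[0]     # ||u||^2 * e_1^T T_ext^{-1} e_1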
  • 26. PRACTICAL MMQ. Increase k to become more accurate. Bad eigenvalue bounds yield worse results. The two bound values are easy to compute. The inverse of the tridiagonal matrix is not required; we can iteratively update its LU factorization.
  • 27. PRACTICAL MMQ
  • 28. ONE LAST STEP FOR KATZ. The Katz score k(i,j) = e_i^T ((I − αA)^{-1} − I) e_j involves two different vectors, so it is a bilinear form rather than a quadratic form; it must be reduced to quadratic forms before MMQ applies.
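If the "last step" is the usual reduction of that bilinear form to quadratic forms, a polarization identity does it; a dense sketch (the solve() call is where the MMQ bounds would be used instead, and names are illustrative):

    import numpy as np

    def katz_pairwise_dense(A, alpha, i, j):
        """e_i^T (I - alpha*A)^{-1} e_j via two quadratic forms (equals the
        Katz score for i != j, since the identity part vanishes off-diagonal)."""
        n = A.shape[0]
        B = np.eye(n) - alpha * A
        u = np.zeros(n); u[i] = 1.0
        v = np.zeros(n); v[j] = 1.0
        quad = lambda w: w @ np.linalg.solve(B, w)     # w^T B^{-1} w (bound with MMQ)
        return 0.25 * (quad(u + v) - quad(u - v))      # polarization identity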
  • 29. COLUMN-WISE ALGORITHMS
  • 30. COLUMN-WISE COMMUTE TIME. A column of commute times c(i,j) = vol(G)(L^+_{ii} + L^+_{jj} − 2 L^+_{ij}) requires the entire diagonal of L^+ in addition to the column L^+ e_i. Each vector is computed by a Lanczos-based CG algorithm (Paige and Saunders, 1975).
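A sketch of one such solve using plain conjugate gradients from SciPy, standing in for the Lanczos-based solver of Paige and Saunders: the right-hand side e_i − (1/n)·ones lies in the range of the singular L, so CG is applicable and returns the L^+ column up to a multiple of the ones vector. Names are illustrative:

    import numpy as np
    from scipy.sparse import diags
    from scipy.sparse.linalg import cg

    def lap_pinv_column(A_sparse, i):
        """Column i of L^+ for a scipy.sparse adjacency matrix A_sparse."""
        n = A_sparse.shape[0]
        d = np.asarray(A_sparse.sum(axis=1)).ravel()
        L = diags(d) - A_sparse                 # graph Laplacian (singular, SPSD)
        b = -np.ones(n) / n
        b[i] += 1.0                             # b = e_i - (1/n)*ones, lies in range(L)
        x, info = cg(L, b)                      # CG = the Lanczos-based solver here
        x -= x.mean()                           # project out the ones-component
        return x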
  • 31. COLUMN-WISE COMMUTE TIME. The degree-based approximation of commute time, following Hein et al. 2010, is a MUCH better and faster approximation.
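Assuming the approximation meant here is the degree-based limit of von Luxburg, Radl, and Hein, c(i,j) ≈ vol(G)(1/d_i + 1/d_j), a one-line sketch:

    import numpy as np

    def commute_approx(A, i, j):
        """Degree-based commute-time approximation on a dense adjacency matrix A."""
        d = A.sum(axis=1)
        return A.sum() * (1.0 / d[i] + 1.0 / d[j])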
  • 32. KATZ SCORES ARE LOCALIZED. Up to 50 neighbors account for 99.65% of the total mass.
  • 33. PARTICIPATION RATIOS. The participation ratio measures the "effective number of non-zeros" in a vector.
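A tiny sketch, assuming the standard participation-ratio definition (Σ_i x_i^2)^2 / Σ_i x_i^4, which equals 1 for a single non-zero and n for a flat vector of length n:

    import numpy as np

    def participation_ratio(x):
        """Effective number of non-zeros in a vector x."""
        x = np.asarray(x, dtype=float)
        return (x @ x) ** 2 / np.sum(x ** 4)

    print(participation_ratio([1, 0, 0, 0]))   # 1.0 : one effective non-zero
    print(participation_ratio([1, 1, 1, 1]))   # 4.0 : all entries participate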
  • 34. TOP-K ALGORITHM FOR KATZ. Approximate x ≈ (I − αA)^{-1} e_i where x is sparse. Keep the residual sparse too. Ideally, don't "touch" all of the graph.
  • 35. INSPIRATION - PAGERANK. Approximate the PageRank solution x ≈ (I − αP)^{-1} v where x is sparse? YES! Keep the residual sparse too? YES! Ideally, don't "touch" all of the graph? YES! McSherry WWW2005, Berkhin 2007, Andersen et al. FOCS2008 – thanks to Reid Andersen for telling me McSherry did this too.
  • 36. THE ALGORITHM - MCSHERRY. For (I − αA) x = e_i, start with the Richardson iteration x^{(k+1)} = x^{(k)} + r^{(k)}, where r^{(k)} = e_i − (I − αA) x^{(k)}. Rewrite the residual update as r^{(k+1)} = αA r^{(k)}. Richardson converges if the spectral radius of αA is below 1.
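A dense sketch of that residual-form Richardson iteration, where every step is one matvec (names are illustrative):

    import numpy as np

    def katz_richardson(A, alpha, i, iters=50):
        """Richardson iteration for (I - alpha*A) x = e_i in residual form."""
        n = A.shape[0]
        x = np.zeros(n)
        r = np.zeros(n)
        r[i] = 1.0                     # r_0 = e_i - (I - alpha*A) x_0 with x_0 = 0
        for _ in range(iters):
            x += r                     # x_{k+1} = x_k + r_k
            r = alpha * (A @ r)        # r_{k+1} = alpha * A r_k
        return x                       # approximates (I - alpha*A)^{-1} e_i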
  • 37. THE ALGORITHM. Note that r^{(k)} is sparse. If r^{(k)} has few non-zeros, then αA r^{(k)} is sparse too. Idea: only add one component of r^{(k)} to x^{(k)} at a time.
  • 38. THE ALGORITHM. For (I − αA) x = e_i. Init: x^{(0)} = 0, r^{(0)} = e_i. Repeatedly pick an index j, add r_j to x_j, and update the residual, r ← r − r_j e_j + α r_j A e_j, which only touches the neighbors of j. How to pick j?
  • 39. THE ALGORITHM FOR KATZ. For (I − αA) x = e_i. Init: x^{(0)} = 0, r^{(0)} = e_i. Pick j as the index of the largest residual entry. Storing the non-zeros of the residual in a heap makes picking the max log(n) time. See Andersen et al. FOCS2008 for more.
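A sketch of the resulting top-k style solver: Gauss-Southwell coordinate relaxation with a lazy max-heap over the residual, keeping both the solution and the residual as sparse dictionaries and touching only the neighborhoods of relaxed nodes. The adjacency-dict input and all names are illustrative:

    import heapq
    from collections import defaultdict

    def katz_topk(adj, alpha, i, tol=1e-4, max_steps=100000):
        """Sparse approximation to (I - alpha*A)^{-1} e_i.
        adj: dict node -> list of neighbors (unweighted, undirected graph)."""
        x = defaultdict(float)
        r = defaultdict(float)
        r[i] = 1.0
        heap = [(-1.0, i)]                        # max-heap via negated residual values
        steps = 0
        while heap and steps < max_steps:
            neg_rj, j = heapq.heappop(heap)
            if -neg_rj != r[j]:                   # stale heap entry, skip it
                continue
            if r[j] < tol:                        # largest residual below tolerance: done
                break
            rj = r[j]
            x[j] += rj                            # relax coordinate j (diagonal of I - alpha*A is 1)
            r[j] = 0.0
            for u in adj[j]:                      # residual update touches only neighbors of j
                r[u] += alpha * rj
                heapq.heappush(heap, (-r[u], u))
            steps += 1
        return dict(x)

Subtracting 1 from entry i of the returned dictionary gives the Katz column proper, since K = (I − αA)^{-1} − I.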
  • 40. CONVERGENCE? If α < 1/max-degree, then αA is sub-stochastic and the PageRank-based proof applies because the matrix is diagonally dominant. For 1/max-degree < α < 1/λ_max(A) and symmetric A, this algorithm is the Gauss-Southwell procedure and it still converges.
  • 41. RESULTS – DATA, PARAMETERS. All unweighted, connected graphs. "Easy" and "hard" parameter settings (the table of graphs and parameter values appears on the slide).
  • 42. KATZ BOUND CONVERGENCE
  • 43. COMMUTE BOUND CONVERGENCE
  • 44. KATZ SET CONVERGENCE. For the arXiv graph.
  • 45. TIMING
  • 46. CONCLUSIONS. These algorithms are faster than many alternatives. For pairwise commute times, stopping criteria are simpler with bounds. For top-k problems, we often need less than one matvec of work for good-enough results.
  • 47. Paper at WAW2010 and in the Journal of Internet Mathematics. Slides should be online soon. Code is online already: www.cs.purdue.edu/homes/dgleich/ /codes/fast-katz-2011. By AngryDogDesign on DeviantArt.