0
FAST KATZ AND COMMUTERS                                                       Quadrature Rules and Sparse Linear Solvers  ...
MAIN RESULTSA – adjacency matrix                    For KatzL – Laplacian matrix                    Compute one       fast...
OUTLINEWhy study these measures?Katz Rank and Commute TimeHow else do people compute them?Quadrature rules for pairwise sc...
WHY? LINK PREDICTION                                                                                 Neighborhood based   ...
NOTEAll graphs are undirectedAll graphs are connectedDavid F. Gleich (Sandia)   Berkeley SCMC Seminar   5 of 43
LEO KATZDavid F. Gleich (Sandia)   Berkeley SCMC Seminar   6 of 43
NOT QUITE, WIKIPEDIA    : adjacency,                     : random walk                 PageRank                           ...
WHAT KATZ ACTUALLY SAID                                           “we assume that each link independently has the         ...
A MODERN TAKEThe Katz score (node-based) is                                     The Katz score (edge-based) is            ...
RETURNING TO THE MATRIX                                                                                                   ...
Carl Neumann                           I’ve heard the Neumann series called the “von Neumann”                           se...
PROPERTIES OF KATZ’S MATRIX    is symmetric    exists when                                      is spd when               ...
COMMUTE TIME                           Picture taken from Google images, seems to be                                Bay Br...
COMMUTE TIMEConsider a uniform random walk on agraph                                                                      ...
SKIPPING DETAILS                    : graph Laplacian                                             is the only null-vectorD...
WHAT DO OTHER PEOPLE DO?1) Just work with the linear algebra formulations2) For Katz, Truncate the Neumann series as a few...
THE PROBLEM                             All of these techniques are                            preprocessing based because...
WHY NO PREPROCESSING?                           The graph is constantly changing                                as I rate ...
WHY NO PREPROCESSING?                                                      Top-k predicted “links”                        ...
PAIRWISE ALGORITHMS                Katz                  Commute                                                          ...
MMQ - THE BIG IDEAQuadratic form                                          Think                           Weighted sum    ...
LANCZOS        , k-steps of the Lanczos methodproduce                                    and                              ...
PRACTICAL LANCZOSOnly need to store the last 2 vectors in      Updating requires O(matvec) work      is not orthogonalDavi...
MMQ PROCEDUREGoal                        Given                        1. Run k-steps of Lanczos on     starting with    2....
PRACTICAL MMQIncrease k to become more accurateBad eigenvalue bounds yield worse results      and     are easy to compute ...
PRACTICAL MMQDavid F. Gleich (Sandia)   Berkeley SCMC Seminar   26 of 43
ONE LAST STEP FOR KATZ                      Katz                                                                         ...
KATZ SCORES ARE LOCALIZED                                                   Up to 50 neighbors is                         ...
TOP-K ALGORITHM FOR KATZApproximate                                 where     is sparseKeep     sparse tooIdeally, don’t “...
INSPIRATION - PAGERANK Approximate                                   where     is sparse Keep     sparse too? YES! Ideally...
THE ALGORITHM - MCSHERRYFor              Start with the Richardson iteration                                      Rewrite ...
THE ALGORITHMNote     is sparse.If                 , then         is sparse.Idea  only add one component of         to    ...
THE ALGORITHMFor              Init:                                                                How to pick   ?David F....
THE ALGORITHM FOR KATZFor                            Init:                                                                ...
CONVERGENCE?The “standard” proof works for        1/max-degree.For                       , then for symmetric     ,this al...
RESULTS – DATA, PARAMETERS               All unweighted, connected graphs                    Easy                         ...
KATZ BOUND CONVERGENCEDavid F. Gleich (Sandia)   Berkeley SCMC Seminar   37 of 43
COMMUTE BOUND CONVERG.David F. Gleich (Sandia)   Berkeley SCMC Seminar   38 of 43
KATZ SET CONVERGENCE                                                   For arXiv graph.David F. Gleich (Sandia)   Berkeley...
TIMINGDavid F. Gleich (Sandia)   Berkeley SCMC Seminar   40 of 43
CONCLUSIONSThese algorithms are faster than manyalternatives.For pairwise commute, stopping criteriaare simpler with bound...
WARTS AND TODOSStopping criteria on our top-k algorithmcan be a bit hairy… we should refine it.The top-k approach doesn’t ...
Paper at WAW2010                                                     Slides should be online soon                         ...
F-MEASUREDavid F. Gleich (Sandia)   Berkeley SCMC Seminar   44 of 43
MAIN RESULTS – SLIDE ONEA – adjacency matrixL – Laplacian matrixKatz score :                             Commute time:    ...
MAIN RESULTS – SLIDE TWOFor Katz Compute one       fast         Compute top       fastFor Commute       Compute one       ...
MAIN RESULTS – SLIDE THREEDavid F. Gleich (Sandia)   Berkeley SCMC Seminar   47 of 43
KATZ BOUND CONVERGENCEDavid F. Gleich (Sandia)   Berkeley SCMC Seminar   48 of 43
KATZ ERROR CONVERGENCEDavid F. Gleich (Sandia)   Berkeley SCMC Seminar   49 of 43
COMMUTE BOUND CONVERG.David F. Gleich (Sandia)   Berkeley SCMC Seminar   50 of 43
COMMUTE ERROR CONVERG.David F. Gleich (Sandia)   Berkeley SCMC Seminar   51 of 43
KATZ SET CONVERGENCEDavid F. Gleich (Sandia)   Berkeley SCMC Seminar   52 of 43
KATZ ORDER CONVERGENCEDavid F. Gleich (Sandia)   Berkeley SCMC Seminar   53 of 43
Upcoming SlideShare
Loading in...5
×

Fast Katz and Commuters

716

Published on

0 Comments
0 Likes
Statistics
Notes
  • Be the first to comment

  • Be the first to like this

No Downloads
Views
Total Views
716
On Slideshare
0
From Embeds
0
Number of Embeds
0
Actions
Shares
0
Downloads
8
Comments
0
Likes
0
Embeds 0
No embeds

No notes for slide

Transcript of "Fast Katz and Commuters"

  1. 1. FAST KATZ AND COMMUTERS Quadrature Rules and Sparse Linear Solvers for Link Prediction Heuristics David F. Gleich Sandia National Labs Berkeley LAPACK Seminar February 3, 2011 With Pooya Esfandiar, Francesco Bonchi, Chen Grief, Laks V. S. Lakshmanan, and Byung-Won OnSandia National Laboratories is a multi-program laboratory managed and operated by Sandia Corporation, a wholly owned subsidiary of Lockheed Martin Corporation, for the U.S. Department of Energy’s National Nuclear Security Administration under contract DE-AC04-94AL85000.David F. Gleich (Sandia) Berkeley SCMC Seminar 1 / 43
  2. 2. MAIN RESULTSA – adjacency matrix For KatzL – Laplacian matrix Compute one       fast Compute top       fastKatz score :                         Commute time:                For CommuteCompute one       fastDavid F. Gleich (Sandia) Berkeley SCMC Seminar 2 of 43
  3. 3. OUTLINEWhy study these measures?Katz Rank and Commute TimeHow else do people compute them?Quadrature rules for pairwise scoresSparse linear systems solves for top-kAs many results as we have time for…David F. Gleich (Sandia) Berkeley SCMC Seminar 3 of 43
  4. 4. WHY? LINK PREDICTION Neighborhood based Path based Liben-Nowell and Kleinberg 2003, 2006 found that path based link prediction was more efficientDavid F. Gleich (Sandia) Berkeley SCMC Seminar 4 of 43
  5. 5. NOTEAll graphs are undirectedAll graphs are connectedDavid F. Gleich (Sandia) Berkeley SCMC Seminar 5 of 43
  6. 6. LEO KATZDavid F. Gleich (Sandia) Berkeley SCMC Seminar 6 of 43
  7. 7. NOT QUITE, WIKIPEDIA    : adjacency,                     : random walk PageRank               Katz              These are equivalent if     has constantdegreeDavid F. Gleich (Sandia) Berkeley SCMC Seminar 7 of 43
  8. 8. WHAT KATZ ACTUALLY SAID “we assume that each link independently has the same probability of being effective” … “we conceive a constant     , depending on the group and the context of the particular investigation, which has the force of a probability of effectiveness of a single link. A k-step chain then, has probability       of being effective.” “We wish to find the column sums of the matrix” Leo Katz 1953, A New Status Index Derived from Sociometric Analysis, Psychometria 18(1):39-43David F. Gleich (Sandia) Berkeley SCMC Seminar 8 of 43
  9. 9. A MODERN TAKEThe Katz score (node-based) is           The Katz score (edge-based) is           David F. Gleich (Sandia) Berkeley SCMC Seminar 9 of 43
  10. 10. RETURNING TO THE MATRIX                                         Carl Neumann                            David F. Gleich (Sandia) Berkeley SCMC Seminar 10 of 43
  11. 11. Carl Neumann I’ve heard the Neumann series called the “von Neumann” series more than I’d like! In fact, the von Neumann kernel of a graph should be named the “Neumann” kernel! Wikipedia pageDavid F. Gleich (Sandia) Berkeley SCMC Seminar 11 / 43
  12. 12. PROPERTIES OF KATZ’S MATRIX    is symmetric    exists when                                      is spd when                      Note that         1/max-degree sufficesDavid F. Gleich (Sandia) Berkeley SCMC Seminar 12 of 43
  13. 13. COMMUTE TIME Picture taken from Google images, seems to be Bay Bridge Traffic by Jim M. Goldstein.David F. Gleich (Sandia) Berkeley SCMC Seminar 13 / 43
  14. 14. COMMUTE TIMEConsider a uniform random walk on agraph                     Also called the hitting time from node i to j, or the first transition time                                                     David F. Gleich (Sandia) Berkeley SCMC Seminar 14 of 43
  15. 15. SKIPPING DETAILS                    : graph Laplacian                                             is the only null-vectorDavid F. Gleich (Sandia) Berkeley SCMC Seminar 15 of 43
  16. 16. WHAT DO OTHER PEOPLE DO?1) Just work with the linear algebra formulations2) For Katz, Truncate the Neumann series as a few (3-5) terms3) Use low-rank approximations from EVD(A) or EVD(L)4) For commute, use Johnson-Lindenstrauss inspired random sampling5) Approximately decompose into smaller problems Liben-Nowell and Kleinberg CIKM2003, Acar et al. ICDM2009, Spielman and Srivastava STOC2008, Sarkar and Moore UAI2007,Wang et al. ICDM2007David F. Gleich (Sandia) Berkeley SCMC Seminar 16 of 43
  17. 17. THE PROBLEM All of these techniques are preprocessing based because most people’s goal is to compute all the scores. We want to avoid preprocessing the graph. There are a few caveats here! i.e. one could solve the system instead of looking for the matrix inverseDavid F. Gleich (Sandia) Berkeley SCMC Seminar 17 of 43
  18. 18. WHY NO PREPROCESSING? The graph is constantly changing as I rate new movies.David F. Gleich (Sandia) Berkeley SCMC Seminar 18 of 43
  19. 19. WHY NO PREPROCESSING? Top-k predicted “links” are movies to watch! Pairwise scores give user similarityDavid F. Gleich (Sandia) Berkeley SCMC Seminar 19 of 43
  20. 20. PAIRWISE ALGORITHMS Katz            Commute                                         Golub and Meurant to the rescue!David F. Gleich (Sandia) Berkeley SCMC Seminar 20 of 43
  21. 21. MMQ - THE BIG IDEAQuadratic form                 Think                       Weighted sum           A is s.p.d. use EVD   Stieltjes integral           “A tautology”   Quadrature approximation             Matrix equation         LanczosDavid F. Gleich (Sandia) Berkeley SCMC Seminar 21 of 43
  22. 22. LANCZOS        , k-steps of the Lanczos methodproduce                                    and                              =              David F. Gleich (Sandia) Berkeley SCMC Seminar 22 of 43
  23. 23. PRACTICAL LANCZOSOnly need to store the last 2 vectors in      Updating requires O(matvec) work      is not orthogonalDavid F. Gleich (Sandia) Berkeley SCMC Seminar 23 of 43
  24. 24. MMQ PROCEDUREGoal                        Given                        1. Run k-steps of Lanczos on     starting with    2. Compute       ,     with an additional eigenvalue at     , set            Correspond to a Gauss-Radau rule, with u as a prescribed node3. Compute     ,     with an additional eigenvalue at   , set           Correspond to a Gauss-Radau rule, with l as a prescribed node4. Output               as lower and upper bounds on    David F. Gleich (Sandia) Berkeley SCMC Seminar 24 of 43
  25. 25. PRACTICAL MMQIncrease k to become more accurateBad eigenvalue bounds yield worse results      and     are easy to compute    not required, we can iterativelyupdate it’s LU factorizationDavid F. Gleich (Sandia) Berkeley SCMC Seminar 25 of 43
  26. 26. PRACTICAL MMQDavid F. Gleich (Sandia) Berkeley SCMC Seminar 26 of 43
  27. 27. ONE LAST STEP FOR KATZ Katz                                                                            David F. Gleich (Sandia) Berkeley SCMC Seminar 27 of 43
  28. 28. KATZ SCORES ARE LOCALIZED Up to 50 neighbors is 99.65% of the total massDavid F. Gleich (Sandia) Berkeley SCMC Seminar 28 of 43
  29. 29. TOP-K ALGORITHM FOR KATZApproximate                   where     is sparseKeep     sparse tooIdeally, don’t “touch” all of    David F. Gleich (Sandia) Berkeley SCMC Seminar 29 of 43
  30. 30. INSPIRATION - PAGERANK Approximate                    where     is sparse Keep     sparse too? YES! Ideally, don’t “touch” all of     ? YES!McSherry WWW2005, Berkhin 2007, Anderson et al. FOCS2008 – Thanks to Reid Anderson for telling me McSherry did this too. David F. Gleich (Sandia) Berkeley SCMC Seminar 30 of 43
  31. 31. THE ALGORITHM - MCSHERRYFor              Start with the Richardson iteration                            Rewrite                    Richardson converges if                        David F. Gleich (Sandia) Berkeley SCMC Seminar 31 of 43
  32. 32. THE ALGORITHMNote     is sparse.If                 , then         is sparse.Idea only add one component of         to        David F. Gleich (Sandia) Berkeley SCMC Seminar 32 of 43
  33. 33. THE ALGORITHMFor              Init:                                                                How to pick   ?David F. Gleich (Sandia) Berkeley SCMC Seminar 33 of 43
  34. 34. THE ALGORITHM FOR KATZFor                            Init:                                                                       Pick   as max     Storing the non-zeros of the residual in a heap makes picking the max log(n) time. See Anderson et al. FOCS2008 for moreDavid F. Gleich (Sandia) Berkeley SCMC Seminar 34 of 43
  35. 35. CONVERGENCE?The “standard” proof works for        1/max-degree.For                       , then for symmetric     ,this algorithm is the Gauss-Southwellprocedure and it still converges.David F. Gleich (Sandia) Berkeley SCMC Seminar 35 of 43
  36. 36. RESULTS – DATA, PARAMETERS All unweighted, connected graphs Easy                                     Hard                            David F. Gleich (Sandia) Berkeley SCMC Seminar 36 of 43
  37. 37. KATZ BOUND CONVERGENCEDavid F. Gleich (Sandia) Berkeley SCMC Seminar 37 of 43
  38. 38. COMMUTE BOUND CONVERG.David F. Gleich (Sandia) Berkeley SCMC Seminar 38 of 43
  39. 39. KATZ SET CONVERGENCE For arXiv graph.David F. Gleich (Sandia) Berkeley SCMC Seminar 39 of 43
  40. 40. TIMINGDavid F. Gleich (Sandia) Berkeley SCMC Seminar 40 of 43
  41. 41. CONCLUSIONSThese algorithms are faster than manyalternatives.For pairwise commute, stopping criteriaare simpler with boundsFor top-k problems, we often need lessthan 1 matvec for good enough resultsDavid F. Gleich (Sandia) Berkeley SCMC Seminar 41 of 43
  42. 42. WARTS AND TODOSStopping criteria on our top-k algorithmcan be a bit hairy… we should refine it.The top-k approach doesn’t work right forcommute time… we have an alternative.                        requires the entire diagonal of      Take advantage of new research to “seed”commute time better! von Luxburg et al. NIPS2010David F. Gleich (Sandia) Berkeley SCMC Seminar 42 of 43
  43. 43. Paper at WAW2010 Slides should be online soon Code is online already stanford.edu/~dgleich/ publications/2010/codes/fast-katz By AngryDogDesign on DeviantArtDavid F. Gleich (Sandia) Berkeley SCMC Seminar 43
  44. 44. F-MEASUREDavid F. Gleich (Sandia) Berkeley SCMC Seminar 44 of 43
  45. 45. MAIN RESULTS – SLIDE ONEA – adjacency matrixL – Laplacian matrixKatz score :                         Commute time:                 David F. Gleich (Sandia) Berkeley SCMC Seminar 45 of 43
  46. 46. MAIN RESULTS – SLIDE TWOFor Katz Compute one       fast Compute top       fastFor Commute Compute one       fastFor almost commute Compute top $F_{i,:}$ fastDavid F. Gleich (Sandia) Berkeley SCMC Seminar 46 of 43
  47. 47. MAIN RESULTS – SLIDE THREEDavid F. Gleich (Sandia) Berkeley SCMC Seminar 47 of 43
  48. 48. KATZ BOUND CONVERGENCEDavid F. Gleich (Sandia) Berkeley SCMC Seminar 48 of 43
  49. 49. KATZ ERROR CONVERGENCEDavid F. Gleich (Sandia) Berkeley SCMC Seminar 49 of 43
  50. 50. COMMUTE BOUND CONVERG.David F. Gleich (Sandia) Berkeley SCMC Seminar 50 of 43
  51. 51. COMMUTE ERROR CONVERG.David F. Gleich (Sandia) Berkeley SCMC Seminar 51 of 43
  52. 52. KATZ SET CONVERGENCEDavid F. Gleich (Sandia) Berkeley SCMC Seminar 52 of 43
  53. 53. KATZ ORDER CONVERGENCEDavid F. Gleich (Sandia) Berkeley SCMC Seminar 53 of 43
  1. A particular slide catching your eye?

    Clipping is a handy way to collect important slides you want to go back to later.

×