FAST KATZ AND COMMUTERS                                                              Efficient Estimation of Social Relate...
MAIN RESULTS                                       For Katz                                       Compute one       fastA ...
OUTLINEKatz Rank and Commute Time, then WhyMatrices, moments, and quadrature rulesfor pairwise scoresSparse linear systems...
NOTE – EVERYTHING IS SIMPLEAll graphs are undirectedAll graphs are connectedAll graphs are unweightedDavid F. Gleich (Sand...
KATZ SCORESThe Katz score is                                                                                              ...
COMMUTE TIMEConsider a uniform random walk on agraph                                                                      ...
WHAT DO OTHER PEOPLE DO?1) Just work with the linear algebra formulations2) For Katz, truncate the Neumann series as a few...
THE PROBLEM                             All of these techniques are                            preprocessing based because...
WHY? LINK PREDICTION                                                                                 Neighborhood based   ...
WHY NO PREPROCESSING?                           The graph is constantly changing                                as I rate ...
WHY NO PREPROCESSING?                                                    Top-k predicted “links”                          ...
PAIRWISE ALGORITHMS                Katz                 Commute                                                           ...
MMQ - THE BIG IDEAQuadratic form                                         Think                           Weighted sum     ...
MMQ PROCEDURE SKETCH                                                 Bad bounds give worseGoal                            ...
PRACTICAL MMQDavid F. Gleich (Sandia)   ICME la/opt seminar   15 of 28
ONE LAST STEP FOR KATZ                      Katz                                                                         ...
TOP-K ALGORITHM FOR KATZApproximate                                 where     is sparseKeep     sparse tooIdeally, don’t “...
THE ALGORITHM - MCSHERRYFor              Start with the Richardson iteration                                              ...
THE ALGORITHM FOR KATZFor                          Init:                                                                  ...
CONVERGENCE?The “standard” proof works for        1/max-degree.If you pick   as the maximum element, wecan show this is co...
RESULTS – DATA, PARAMETERS               All unweighted, connected graphs                     Easy     :                  ...
KATZ BOUND CONVERGENCEDavid F. Gleich (Sandia)   ICME la/opt seminar   22 of 28
COMMUTE BOUND CONVERG.David F. Gleich (Sandia)   ICME la/opt seminar   23 of 28
KATZ SET CONVERGENCE                                                 For arXiv graph.David F. Gleich (Sandia)   ICME la/op...
TIMINGDavid F. Gleich (Sandia)   ICME la/opt seminar   25 of 28
CONCLUSIONSThese algorithms are faster than manyalternatives in these special cases.For pairwise commute, stopping criteri...
WARTS AND TODOSStopping criteria on our top-k algorithmcan be a bit hairy… we should refine it.The top-k approach doesn’t ...
More details in the paper                                                       Slides should be online soon              ...
Upcoming SlideShare
Loading in …5
×

Fast Katz and Commuters: Efficient Estimation of Social Relatedness in Large Networks

879 views

Published on

WAW2010 presentation

Published in: Technology
0 Comments
0 Likes
Statistics
Notes
  • Be the first to comment

  • Be the first to like this

No Downloads
Views
Total views
879
On SlideShare
0
From Embeds
0
Number of Embeds
1
Actions
Shares
0
Downloads
11
Comments
0
Likes
0
Embeds 0
No embeds

No notes for slide

Fast Katz and Commuters: Efficient Estimation of Social Relatedness in Large Networks

  1. 1. FAST KATZ AND COMMUTERS Efficient Estimation of Social Relatedness David F. Gleich Sandia National Labs WAW2010 16 December 2010 Palo Alto, CA With Pooya Esfandiar, Francesco Bonchi, Chen Grief, Laks V. S. Lakshmanan, and Byung-Won OnSandia National Laboratories is a multi-program laboratory managed and operated by Sandia Corporation, a wholly owned subsidiary of Lockheed Martin Corporation, for the U.S. Department of Energy’s National Nuclear Security Administration under contract DE-AC04-94AL85000.David F. Gleich (Sandia) ICME la/opt seminar 1 / 28
  2. 2. MAIN RESULTS For Katz Compute one       fastA – adjacency matrix Compute top       fastL – Laplacian matrixKatz score :                         Commute time:                 For Commute Compute one       fastDavid F. Gleich (Sandia) ICME la/opt seminar 2 of 28
  3. 3. OUTLINEKatz Rank and Commute Time, then WhyMatrices, moments, and quadrature rulesfor pairwise scoresSparse linear systems solves for top-kSome resultsDavid F. Gleich (Sandia) ICME la/opt seminar 3 of 28
  4. 4. NOTE – EVERYTHING IS SIMPLEAll graphs are undirectedAll graphs are connectedAll graphs are unweightedDavid F. Gleich (Sandia) ICME la/opt seminar 4 of 28
  5. 5. KATZ SCORESThe Katz score is                                                    Carl Neumann                                 David F. Gleich (Sandia) ICME la/opt seminar 5 of 28
  6. 6. COMMUTE TIMEConsider a uniform random walk on agraph                   Also called the hitting time from node i to j, or                            the first transition time                                              : graph Laplacian                                               is the only null-vector Fouss et al.TKDE 2007David F. Gleich (Sandia) ICME la/opt seminar 6 of 28
  7. 7. WHAT DO OTHER PEOPLE DO?1) Just work with the linear algebra formulations2) For Katz, truncate the Neumann series as a few (3-5) terms3) Use low-rank approximations from EVD(A) or EVD(L)4) For commute, use Johnson-Lindenstrauss inspired random sampling5) Approximately decompose into smaller problems Liben-Nowell and Kleinberg CIKM2003, Acar et al. ICDM2009, Spielman and Srivastava STOC2008, Sarkar and Moore UAI2007,Wang et al. ICDM2007David F. Gleich (Sandia) ICME la/opt seminar 7 of 28
  8. 8. THE PROBLEM All of these techniques are preprocessing based because most people’s goal is to compute all the scores. We want to avoid preprocessing the graph. There are a few caveats here! i.e. one could solve the system instead of looking for the matrix inverseDavid F. Gleich (Sandia) ICME la/opt seminar 8 of 28
  9. 9. WHY? LINK PREDICTION Neighborhood based Path based Liben-Nowell and Kleinberg 2003, 2006 found that path based link prediction was more efficientDavid F. Gleich (Sandia) ICME la/opt seminar 9 of 28
  10. 10. WHY NO PREPROCESSING? The graph is constantly changing as I rate new movies.David F. Gleich (Sandia) ICME la/opt seminar 10 of 28
  11. 11. WHY NO PREPROCESSING? Top-k predicted “links” are movies to watch! Pairwise scores give user similarityDavid F. Gleich (Sandia) ICME la/opt seminar 11 of 28
  12. 12. PAIRWISE ALGORITHMS Katz           Commute                                         Golub and Meurant to the rescue!David F. Gleich (Sandia) ICME la/opt seminar 12 of 28
  13. 13. MMQ - THE BIG IDEAQuadratic form                 Think                       Weighted sum           A is s.p.d. use EVD   Stieltjes integral           “A tautology”   Quadrature approximation             Matrix equation         LanczosDavid F. Gleich (Sandia) ICME la/opt seminar 13 of 28
  14. 14. MMQ PROCEDURE SKETCH Bad bounds give worseGoal                         performance!Given                         Larger k gives better results; k steps is k matvecs1. Run k-steps of Lanczos on     starting with    2. Compute       ,     with an additional eigenvalue at     , set            Correspond to a Gauss-Radau rule, with u as a prescribed node Easy to “update” this inverse as k increases.3. Compute     ,     with an additional eigenvalue at   , set           Correspond to a Gauss-Radau rule, with l as a prescribed node4. Output               as lower and upper bounds on bDavid F. Gleich (Sandia) ICME la/opt seminar 14 of 28
  15. 15. PRACTICAL MMQDavid F. Gleich (Sandia) ICME la/opt seminar 15 of 28
  16. 16. ONE LAST STEP FOR KATZ Katz                                                                          David F. Gleich (Sandia) ICME la/opt seminar 16 of 28
  17. 17. TOP-K ALGORITHM FOR KATZApproximate                   where     is sparseKeep     sparse tooIdeally, don’t “touch” all of     For PageRank, we can do this!David F. Gleich (Sandia) ICME la/opt seminar 17 of 28
  18. 18. THE ALGORITHM - MCSHERRYFor              Start with the Richardson iteration                             Richardson converges ifRewrite                                 Note     is sparseIf                 , then         is sparse.Idea Only add one component of         to         McSherry WWW2005David F. Gleich (Sandia) ICME la/opt seminar 18 of 28
  19. 19. THE ALGORITHM FOR KATZFor                          Init:                                                                     Pick   as max     Storing the non-zeros of the residual in a heap makes picking the max log(n) time. See Anderson et al. FOCS2008 for moreDavid F. Gleich (Sandia) ICME la/opt seminar 19 of 28
  20. 20. CONVERGENCE?The “standard” proof works for        1/max-degree.If you pick   as the maximum element, wecan show this is convergent if Richardsonconverges. This proof requires     to besymmetric positive definite.David F. Gleich (Sandia) ICME la/opt seminar 20 of 28
  21. 21. RESULTS – DATA, PARAMETERS All unweighted, connected graphs Easy     :                                 Hard     :                        David F. Gleich (Sandia) ICME la/opt seminar 21 of 28
  22. 22. KATZ BOUND CONVERGENCEDavid F. Gleich (Sandia) ICME la/opt seminar 22 of 28
  23. 23. COMMUTE BOUND CONVERG.David F. Gleich (Sandia) ICME la/opt seminar 23 of 28
  24. 24. KATZ SET CONVERGENCE For arXiv graph.David F. Gleich (Sandia) ICME la/opt seminar 24 of 28
  25. 25. TIMINGDavid F. Gleich (Sandia) ICME la/opt seminar 25 of 28
  26. 26. CONCLUSIONSThese algorithms are faster than manyalternatives in these special cases.For pairwise commute, stopping criteriaare simpler with bounds.For top-k problems, we often need lessthan 1 matvec for good enough resultsDavid F. Gleich (Sandia) ICME la/opt seminar 26 of 28
  27. 27. WARTS AND TODOSStopping criteria on our top-k algorithmcan be a bit hairy… we should refine it.The top-k approach doesn’t work right forcommute time… we have an alternative.Take advantage of new research to “seed”commute time better!Evaluate more datasets, like Netflix. von Luxburg et al. NIPS2010David F. Gleich (Sandia) ICME la/opt seminar 27 of 28
  28. 28. More details in the paper Slides should be online soon Code is online already stanford.edu/~dgleich/ publications/2010/codes/fast-katz By AngryDogDesign on DeviantArtDavid F. Gleich (Sandia) ICME la/opt seminar 28
  29. 29. LEO KATZDavid F. Gleich (Sandia) ICME la/opt seminar 29 of 28
  30. 30. NOT QUITE, WIKIPEDIA    : adjacency,                     : random walk PageRank              Katz             These are equivalent if     has constantdegreeDavid F. Gleich (Sandia) ICME la/opt seminar 30 of 28
  31. 31. WHAT KATZ ACTUALLY SAID “we assume that each link independently has the same probability of being effective” … “we conceive a constant     , depending on the group and the context of the particular investigation, which has the force of a probability of effectiveness of a single link. A k-step chain then, has probability       of being effective.” “We wish to find the column sums of the matrix” Leo Katz 1953, A New Status Index Derived from Sociometric Analysis, Psychometria 18(1):39-43David F. Gleich (Sandia) ICME la/opt seminar 31 of 28
  32. 32. RETURNING TO THE MATRIX                                         Carl Neumann                                 David F. Gleich (Sandia) ICME la/opt seminar 32 of 28
  33. 33. PROPERTIES OF KATZ’S MATRIX    is symmetric    exists when                                  is sym. pos. def. when                      Note that         1/max-degree sufficesDavid F. Gleich (Sandia) ICME la/opt seminar 33 of 28
  34. 34. SKIPPING DETAILS                    : graph Laplacian                                             is the only null-vectorDavid F. Gleich (Sandia) ICME la/opt seminar 34 of 28
  35. 35. Carl Neumann I’ve heard the Neumann series called the “von Neumann” series more than I’d like! In fact, the von Neumann kernel of a graph should be named the “Neumann” kernel! Wikipedia pageDavid F. Gleich (Sandia) ICME la/opt seminar 35 / 28
  36. 36. LANCZOS          , k-steps of the Lanczos methodproduce                                    and                              =             David F. Gleich (Sandia) ICME la/opt seminar 36 of 28
  37. 37. PRACTICAL LANCZOSOnly need to store the last 2 vectors in      Updating requires O(matvec) work      is not orthogonalDavid F. Gleich (Sandia) ICME la/opt seminar 37 of 28
  38. 38. PRACTICAL MMQIncrease k to become more accurateBad eigenvalue bounds yield worse results      and     are easy to compute    not required, we can iterativelyupdate it’s LU factorizationDavid F. Gleich (Sandia) ICME la/opt seminar 38 of 28
  39. 39. THE ALGORITHMFor              Init:                                                                  How to pick   ?David F. Gleich (Sandia) ICME la/opt seminar 39 of 28
  40. 40. MAIN RESULTS – SLIDE ONEA – adjacency matrixL – Laplacian matrixKatz score :                         Commute time:                 David F. Gleich (Sandia) ICME la/opt seminar 40 of 28
  41. 41. TOP-K INSPIRATION: PAGERANK Approximate                    where     is sparse Keep     sparse too? YES! Ideally, don’t “touch” all of     ? YES!McSherry WWW2005, Berkhin 2007, Anderson et al. FOCS2008 – Thanks to Reid Anderson for telling me McSherry did this too. David F. Gleich (Sandia) ICME la/opt seminar 41 of 28
  42. 42. THE ALGORITHMNote     is sparse.If                 , then         is sparse.Idea only add one component of         to        David F. Gleich (Sandia) ICME la/opt seminar 42 of 28
  43. 43. MAIN RESULTS – SLIDE TWOFor Katz Compute one       fast Compute top       fastFor Commute Compute one       fastFor almost commute Compute top       fastDavid F. Gleich (Sandia) ICME la/opt seminar 43 of 28
  44. 44. MAIN RESULTS – SLIDE THREEDavid F. Gleich (Sandia) ICME la/opt seminar 44 of 28
  45. 45. F-MEASUREDavid F. Gleich (Sandia) ICME la/opt seminar 45 of 28
  46. 46. PAIRWISE RESULTSKatz upper and lower boundsKatz error convergenceCommute-time upper and lower boundsCommute-time error convergenceFor the arXiv graph hereDavid F. Gleich (Sandia) ICME la/opt seminar 46 of 28

×