Successfully reported this slideshow.
We use your LinkedIn profile and activity data to personalize ads and to show you more relevant ads. You can change your ad preferences anytime.
Tweet along @dgleich
FAST KATZ AND COMMUTERS
Quadrature Rules and Sparse Linear Solvers
for Link Prediction Heuristics
Dav...
Tweet along @dgleich
MAIN RESULTS – SLIDE ONE
A – adjacency matrix
L – Laplacian matrix
Katz score :
                     ...
Tweet along @dgleich
MAIN RESULTS – SLIDE TWO
For Katz Compute one       fast
Compute top       fast
For Commute
Compute o...
Tweet along @dgleich
MAIN RESULTS – SLIDE THREE
David F. Gleich (Sandia) ICME la/opt seminar 4 of 50
Tweet along @dgleich
OUTLINE
Why study these measures?
Katz Rank and Commute Time
How else do people compute them?
Quadrat...
Tweet along @dgleich
WHY? LINK PREDICTION
David F. Gleich (Sandia) ICME la/opt seminar
Liben-Nowell and Kleinberg 2003, 20...
Tweet along @dgleich
NOTE
All graphs are undirected
All graphs are connected
David F. Gleich (Sandia) ICME la/opt seminar ...
Tweet along @dgleich
LEO KATZ
David F. Gleich (Sandia) ICME la/opt seminar 8 of 50
Tweet along @dgleich
NOT QUITE,WIKIPEDIA
    : adjacency,                     : random walk
PageRank                      ...
Tweet along @dgleich
WHAT KATZ ACTUALLY SAID
Leo Katz 1953,A New Status Index Derived from Sociometric Analysis, Psychomet...
Tweet along @dgleich
A MODERN TAKE
The Katz score (node-based) is
                   
The Katz score (edge-based) is
     ...
Tweet along @dgleich
RETURNING TO THE MATRIX
                     
                                                     
C...
Tweet along @dgleich
Carl Neumann
I’ve heard the Neumann series called the “von Neumann”
series more than I’d like! In fac...
Tweet along @dgleich
PROPERTIES OF KATZ’S MATRIX
    is symmetric
    exists when                      
              is s...
Tweet along @dgleich
COMMUTE TIME
Consider a uniform random walk on a
graph                    
                          ...
Tweet along @dgleich
SKIPPING DETAILS
                : graph Laplacian
                                                  ...
Tweet along @dgleich
WHAT DO OTHER PEOPLE DO?
1) Just work with the linear algebra formulations
2) For Katz, Truncate the ...
Tweet along @dgleich
THE PROBLEM
All of these techniques are
preprocessing based because
most people’s goal is to compute
...
Tweet along @dgleich
WHY NO PREPROCESSING?
The graph is constantly changing
as I rate new movies.
David F. Gleich (Sandia)...
Tweet along @dgleich
WHY NO PREPROCESSING?
David F. Gleich (Sandia) ICME la/opt seminar
Top-k predicted “links”
are movies...
Tweet along @dgleich
PAIRWISE ALGORITHMS
Katz                    
Commute                                        
David F....
Tweet along @dgleich
MMQ - THE BIG IDEA
Quadratic form                
   
Weighted sum                  
   
Stieltjes in...
Tweet along @dgleich
LANCZOS
          , $k$-steps of the Lanczos method
produce
                                    and  ...
Tweet along @dgleich
PRACTICAL LANCZOS
Only need to store the last 2 vectors in      
Updating requires O(matvec) work
   ...
Tweet along @dgleich
MMQ PROCEDURE
Goal                        
Given                        
1. Run k-steps of Lanczos on...
Tweet along @dgleich
PRACTICAL MMQ
Increase k to become more accurate
Bad eigenvalue bounds yield worse results
      and ...
Tweet along @dgleich
PRACTICAL MMQ
David F. Gleich (Sandia) ICME la/opt seminar 27 of 50
Tweet along @dgleich
ONE LAST STEP FOR KATZ
Katz                    
          
                                         ...
Tweet along @dgleich
TOP-K ALGORITHM FOR KATZ
Approximate    
                           
where     is sparse
Keep     spa...
Tweet along @dgleich
INSPIRATION - PAGERANK
Approximate    
                           
where     is sparse
Keep     spars...
Tweet along @dgleich
THE ALGORITHM - MCSHERRY
For              
Start with the Richardson iteration
                      ...
Tweet along @dgleich
THE ALGORITHM
Note     is sparse.
If                 , then         is sparse.
Idea
only add one comp...
Tweet along @dgleich
THE ALGORITHM
For              
Init:                                  
                             ...
Tweet along @dgleich
THE ALGORITHM FOR KATZ
For                          
Init:                                  
        ...
Tweet along @dgleich
CONVERGENCE?
If you pick   as the maximum element, we
can show this is convergent if Richardson
conve...
Tweet along @dgleich
RESULTS - DATA
All unweighted, connected graphs
David F. Gleich (Sandia) ICME la/opt seminar 36 of 50
Tweet along @dgleich
RESULTS – KATZ ALPHAS
Easy    
                               
Hard    
                       
David...
Tweet along @dgleich
PAIRWISE RESULTS
Katz upper and lower bounds
Katz error convergence
Commute-time upper and lower boun...
Tweet along @dgleich
KATZ BOUND CONVERGENCE
David F. Gleich (Sandia) ICME la/opt seminar 39 of 50
Tweet along @dgleich
KATZ ERROR CONVERGENCE
David F. Gleich (Sandia) ICME la/opt seminar 40 of 50
Tweet along @dgleich
COMMUTE BOUND CONVERG.
David F. Gleich (Sandia) ICME la/opt seminar 41 of 50
Tweet along @dgleich
COMMUTE ERROR CONVERG.
David F. Gleich (Sandia) ICME la/opt seminar 42 of 50
Tweet along @dgleich
TOP-K RESULTS
Katz set convergence
Katz order convergence
For arXiv graph
David F. Gleich (Sandia) IC...
Tweet along @dgleich
KATZ SET CONVERGENCE
David F. Gleich (Sandia) ICME la/opt seminar 44 of 50
Tweet along @dgleich
KATZ ORDER CONVERGENCE
David F. Gleich (Sandia) ICME la/opt seminar 45 of 50
Tweet along @dgleich
CONCLUSIONS
These algorithms are faster than many
alternatives.
For pairwise commute, stopping criter...
Tweet along @dgleich
WARTS
Stopping criteria on our top-k algorithm
can be a bit hairy
The top-k approach doesn’t work rig...
Tweet along @dgleich
TODO
Try on netflix data! 
Explore our “almost commute measure
more”
David F. Gleich (Sandia) ICME l...
Tweet along @dgleich
F-MEASURE
David F. Gleich (Sandia) ICME la/opt seminar 49 of 50
Tweet along @dgleich
By AngryDogDesign on DeviantArt
Preprint available by request
Slides should be online soon
Code is on...
Upcoming SlideShare
Loading in …5
×

Fast katz-presentation

1,258 views

Published on

Fast Katz and Commuters - Quadrature Rules and Sparse Linear Solvers
for Link Prediction Heuristics

Motivated by social network data mining problems such as link
prediction and collaborative filtering, significant research effort
has been devoted to computing topological measures including the Katz
score and the commute time. Existing approaches approximate all
pairwise relationships simultaneously. We are interested in
computing

the score for a single pair of nodes;
the top-k nodes with the best scores from a given source node.

For the pairwise problem, we introduce an iterative algorithm that
computes upper and lower bounds for the measures we seek. This
algorithm exploits a relationship between the Lanczos process and a
quadrature rule.

For the top-k problem, we propose an algorithm that only accesses a
small portion of the graph, similar to algorithms used in personalized
PageRank computing. To test scalability and accuracy, we experiment
with three real-world networks and find that our algorithms run in
milliseconds to seconds without any preprocessing.

Published in: Technology, Education
  • Be the first to comment

Fast katz-presentation

  1. 1. Tweet along @dgleich FAST KATZ AND COMMUTERS Quadrature Rules and Sparse Linear Solvers for Link Prediction Heuristics David F. Gleich Sandia National Labs la/opt seminar October 14th 2010 With Pooya Esfandiar, Francesco Bonchi, Chen Grief, LaksV. S. Lakshmanan, and Byung-Won On David F. Gleich (Sandia) ICME la/opt seminar 1 / 50
  2. 2. Tweet along @dgleich MAIN RESULTS – SLIDE ONE A – adjacency matrix L – Laplacian matrix Katz score :                                                 Commute time:                                 David F. Gleich (Sandia) ICME la/opt seminar 2 of 50
  3. 3. Tweet along @dgleich MAIN RESULTS – SLIDE TWO For Katz Compute one       fast Compute top       fast For Commute Compute one       fast For almost commute Compute top       fast David F. Gleich (Sandia) ICME la/opt seminar 3 of 50
  4. 4. Tweet along @dgleich MAIN RESULTS – SLIDE THREE David F. Gleich (Sandia) ICME la/opt seminar 4 of 50
  5. 5. Tweet along @dgleich OUTLINE Why study these measures? Katz Rank and Commute Time How else do people compute them? Quadrature rules for pairwise scores Sparse linear systems solves for top-k As many results as we have time for… David F. Gleich (Sandia) ICME la/opt seminar 5 of 50
  6. 6. Tweet along @dgleich WHY? LINK PREDICTION David F. Gleich (Sandia) ICME la/opt seminar Liben-Nowell and Kleinberg 2003, 2006 found that path based link prediction was more efficient Neighborhood based Path based 6 of 50
  7. 7. Tweet along @dgleich NOTE All graphs are undirected All graphs are connected David F. Gleich (Sandia) ICME la/opt seminar 7 of 50
  8. 8. Tweet along @dgleich LEO KATZ David F. Gleich (Sandia) ICME la/opt seminar 8 of 50
  9. 9. Tweet along @dgleich NOT QUITE,WIKIPEDIA     : adjacency,                     : random walk PageRank                         Katz                         These are equivalent if     has constant degree David F. Gleich (Sandia) ICME la/opt seminar 9 of 50
  10. 10. Tweet along @dgleich WHAT KATZ ACTUALLY SAID Leo Katz 1953,A New Status Index Derived from Sociometric Analysis, Psychometria 18(1):39-43 “we assume that each link independently has the same probability of being effective” … “we conceive a constant     , depending on the group and the context of the particular investigation, which has the force of a probability of effectiveness of a single link.A k-step chain then, has probability       of being effective.” “We wish to find the column sums of the matrix” David F. Gleich (Sandia) ICME la/opt seminar 10 of 50
  11. 11. Tweet along @dgleich A MODERN TAKE The Katz score (node-based) is                     The Katz score (edge-based) is                     David F. Gleich (Sandia) ICME la/opt seminar 11 of 50
  12. 12. Tweet along @dgleich RETURNING TO THE MATRIX                                                                             Carl Neumann                                                               David F. Gleich (Sandia) ICME la/opt seminar 12 of 50
  13. 13. Tweet along @dgleich Carl Neumann I’ve heard the Neumann series called the “von Neumann” series more than I’d like! In fact, the von Neumann kernel of a graph should be named the “Neumann” kernel! David F. Gleich (Sandia) ICME la/opt seminar Wikipedia page 13 / 50
  14. 14. Tweet along @dgleich PROPERTIES OF KATZ’S MATRIX     is symmetric     exists when                                     is sym. pos. def. when                       Note that         1/max-degree suffices David F. Gleich (Sandia) ICME la/opt seminar 14 of 50
  15. 15. Tweet along @dgleich COMMUTE TIME Consider a uniform random walk on a graph                                                                                                                           David F. Gleich (Sandia) ICME la/opt seminar Also called the hitting time from node i to j, or the first transition time 15 of 50
  16. 16. Tweet along @dgleich SKIPPING DETAILS                 : graph Laplacian                                                                           is the only null-vector David F. Gleich (Sandia) ICME la/opt seminar 16 of 50
  17. 17. Tweet along @dgleich WHAT DO OTHER PEOPLE DO? 1) Just work with the linear algebra formulations 2) For Katz, Truncate the Neumann series as a few (3-5) terms (I’m searching for this ref.) 3) Use low-rank approximations from EVD(A) or EVD(L) 4) For commute, use Johnson-Lindenstrauss inspired random sampling 5) Approximately decompose into smaller problems David F. Gleich (Sandia) ICME la/opt seminar Liben-Nowll and Kleinberg CIKM2003,Acar et al. ICDM2009,Spielman and Srivastava STOC2008,Sarkar and Moore UAI2007 17 of 50
  18. 18. Tweet along @dgleich THE PROBLEM All of these techniques are preprocessing based because most people’s goal is to compute all the scores. We want to avoid preprocessing the graph. David F. Gleich (Sandia) ICME la/opt seminar There are a few caveats here! i.e. one could solve the system instead of looking for the matrix inverse 18 of 50
  19. 19. Tweet along @dgleich WHY NO PREPROCESSING? The graph is constantly changing as I rate new movies. David F. Gleich (Sandia) ICME la/opt seminar 19 of 50
  20. 20. Tweet along @dgleich WHY NO PREPROCESSING? David F. Gleich (Sandia) ICME la/opt seminar Top-k predicted “links” are movies to watch! Pairwise scores give user similarity 20 of 50
  21. 21. Tweet along @dgleich PAIRWISE ALGORITHMS Katz                     Commute                                         David F. Gleich (Sandia) ICME la/opt seminar Golub and Meurant to the rescue! 21 of 50
  22. 22. Tweet along @dgleich MMQ - THE BIG IDEA Quadratic form                     Weighted sum                       Stieltjes integral                       Quadrature approximation                       Matrix equation               David F. Gleich (Sandia) ICME la/opt seminar Think                     A is s.p.d. use EVD “A tautology” Lanczos 22 of 50
  23. 23. Tweet along @dgleich LANCZOS           , $k$-steps of the Lanczos method produce                                     and                       David F. Gleich (Sandia) ICME la/opt seminar                    =               23 of 50
  24. 24. Tweet along @dgleich PRACTICAL LANCZOS Only need to store the last 2 vectors in       Updating requires O(matvec) work       is not orthogonal David F. Gleich (Sandia) ICME la/opt seminar 24 of 50
  25. 25. Tweet along @dgleich MMQ PROCEDURE Goal                         Given                         1. Run k-steps of Lanczos on     starting with     2. Compute       ,     with an additional eigenvalue at     , set                     3. Compute     ,     with an additional eigenvalue at   , set                   4. Output               as lower and upper bounds on b David F. Gleich (Sandia) ICME la/opt seminar Correspond to a Gauss-Radau rule, with u as a prescribed node Correspond to a Gauss-Radau rule, with l as a prescribed node 25 of 50
  26. 26. Tweet along @dgleich PRACTICAL MMQ Increase k to become more accurate Bad eigenvalue bounds yield worse results       and     are easy to compute       not required, we can iteratively update it’s LU factorization David F. Gleich (Sandia) ICME la/opt seminar 26 of 50
  27. 27. Tweet along @dgleich PRACTICAL MMQ David F. Gleich (Sandia) ICME la/opt seminar 27 of 50
  28. 28. Tweet along @dgleich ONE LAST STEP FOR KATZ Katz                                                                                                                                          David F. Gleich (Sandia) ICME la/opt seminar 28 of 50
  29. 29. Tweet along @dgleich TOP-K ALGORITHM FOR KATZ Approximate                                 where     is sparse Keep     sparse too Ideally, don’t “touch” all of     David F. Gleich (Sandia) ICME la/opt seminar 29 of 50
  30. 30. Tweet along @dgleich INSPIRATION - PAGERANK Approximate                                 where     is sparse Keep     sparse too? YES! Ideally, don’t “touch” all of     ? YES! David F. Gleich (Sandia) ICME la/opt seminar McSherryWWW2005, Berkhin 2007,Anderson et al. FOCS2008 –Thanks to Reid Anderson for telling me McSherry did this too. 30 of 50
  31. 31. Tweet along @dgleich THE ALGORITHM - MCSHERRY For               Start with the Richardson iteration                                                       Rewrite                                       Richardson converges if                         David F. Gleich (Sandia) ICME la/opt seminar 31 of 50
  32. 32. Tweet along @dgleich THE ALGORITHM Note     is sparse. If                 , then         is sparse. Idea only add one component of         to         David F. Gleich (Sandia) ICME la/opt seminar 32 of 50
  33. 33. Tweet along @dgleich THE ALGORITHM For               Init:                                                                                                   How to pick   ? David F. Gleich (Sandia) ICME la/opt seminar 33 of 50
  34. 34. Tweet along @dgleich THE ALGORITHM FOR KATZ For                           Init:                                                                                                             Pick   as max       David F. Gleich (Sandia) ICME la/opt seminar Storing the non-zeros of the residual in a heap makes picking the max log(n) time. See Anderson et al. FOCS2008 for more 34 of 50
  35. 35. Tweet along @dgleich CONVERGENCE? If you pick   as the maximum element, we can show this is convergent if Richardson converges. This proof requires     to be symmetric positive definite. David F. Gleich (Sandia) ICME la/opt seminar 35 of 50
  36. 36. Tweet along @dgleich RESULTS - DATA All unweighted, connected graphs David F. Gleich (Sandia) ICME la/opt seminar 36 of 50
  37. 37. Tweet along @dgleich RESULTS – KATZ ALPHAS Easy                                     Hard                             David F. Gleich (Sandia) ICME la/opt seminar 37 of 50
  38. 38. Tweet along @dgleich PAIRWISE RESULTS Katz upper and lower bounds Katz error convergence Commute-time upper and lower bounds Commute-time error convergence For the arXiv graph here David F. Gleich (Sandia) ICME la/opt seminar 38 of 50
  39. 39. Tweet along @dgleich KATZ BOUND CONVERGENCE David F. Gleich (Sandia) ICME la/opt seminar 39 of 50
  40. 40. Tweet along @dgleich KATZ ERROR CONVERGENCE David F. Gleich (Sandia) ICME la/opt seminar 40 of 50
  41. 41. Tweet along @dgleich COMMUTE BOUND CONVERG. David F. Gleich (Sandia) ICME la/opt seminar 41 of 50
  42. 42. Tweet along @dgleich COMMUTE ERROR CONVERG. David F. Gleich (Sandia) ICME la/opt seminar 42 of 50
  43. 43. Tweet along @dgleich TOP-K RESULTS Katz set convergence Katz order convergence For arXiv graph David F. Gleich (Sandia) ICME la/opt seminar 43 of 50
  44. 44. Tweet along @dgleich KATZ SET CONVERGENCE David F. Gleich (Sandia) ICME la/opt seminar 44 of 50
  45. 45. Tweet along @dgleich KATZ ORDER CONVERGENCE David F. Gleich (Sandia) ICME la/opt seminar 45 of 50
  46. 46. Tweet along @dgleich CONCLUSIONS These algorithms are faster than many alternatives. For pairwise commute, stopping criteria are simpler For top-k, we often need less than 1 matvec for good enough results David F. Gleich (Sandia) ICME la/opt seminar 46 of 50
  47. 47. Tweet along @dgleich WARTS Stopping criteria on our top-k algorithm can be a bit hairy The top-k approach doesn’t work right for commute time David F. Gleich (Sandia) ICME la/opt seminar 47 of 50
  48. 48. Tweet along @dgleich TODO Try on netflix data!  Explore our “almost commute measure more” David F. Gleich (Sandia) ICME la/opt seminar 48 of 50
  49. 49. Tweet along @dgleich F-MEASURE David F. Gleich (Sandia) ICME la/opt seminar 49 of 50
  50. 50. Tweet along @dgleich By AngryDogDesign on DeviantArt Preprint available by request Slides should be online soon Code is online already stanford.edu/~dgleich/ publications/2010/codes/fast-katz David F. Gleich (Sandia) ICME la/opt seminar 50

×