SlideShare a Scribd company logo
1 of 50
Download to read offline
Tweet along @dgleich
FAST KATZ AND COMMUTERS
Quadrature Rules and Sparse Linear Solvers
for Link Prediction Heuristics
David F. Gleich
Sandia National Labs
la/opt seminar
October 14th 2010
With Pooya Esfandiar, Francesco Bonchi, Chen Grief,
LaksV. S. Lakshmanan, and Byung-Won On
David F. Gleich (Sandia) ICME la/opt seminar 1 / 50
Tweet along @dgleich
MAIN RESULTS – SLIDE ONE
A – adjacency matrix
L – Laplacian matrix
Katz score :
                                               
Commute time:
                               
David F. Gleich (Sandia) ICME la/opt seminar 2 of 50
Tweet along @dgleich
MAIN RESULTS – SLIDE TWO
For Katz Compute one       fast
Compute top       fast
For Commute
Compute one       fast
For almost commute
Compute top       fast
David F. Gleich (Sandia) ICME la/opt seminar 3 of 50
Tweet along @dgleich
MAIN RESULTS – SLIDE THREE
David F. Gleich (Sandia) ICME la/opt seminar 4 of 50
Tweet along @dgleich
OUTLINE
Why study these measures?
Katz Rank and Commute Time
How else do people compute them?
Quadrature rules for pairwise scores
Sparse linear systems solves for top-k
As many results as we have time for…
David F. Gleich (Sandia) ICME la/opt seminar 5 of 50
Tweet along @dgleich
WHY? LINK PREDICTION
David F. Gleich (Sandia) ICME la/opt seminar
Liben-Nowell and Kleinberg 2003, 2006 found that path based link prediction was more efficient
Neighborhood based
Path based
6 of 50
Tweet along @dgleich
NOTE
All graphs are undirected
All graphs are connected
David F. Gleich (Sandia) ICME la/opt seminar 7 of 50
Tweet along @dgleich
LEO KATZ
David F. Gleich (Sandia) ICME la/opt seminar 8 of 50
Tweet along @dgleich
NOT QUITE,WIKIPEDIA
    : adjacency,                     : random walk
PageRank                        
Katz                        
These are equivalent if     has constant
degree
David F. Gleich (Sandia) ICME la/opt seminar 9 of 50
Tweet along @dgleich
WHAT KATZ ACTUALLY SAID
Leo Katz 1953,A New Status Index Derived from Sociometric Analysis, Psychometria 18(1):39-43
“we assume that each link independently has the
same probability of being effective” …
“we conceive a constant     , depending
on the group and the context of the particular
investigation, which has the force of a probability
of effectiveness of a single link.A k-step chain
then, has probability       of being effective.”
“We wish to find the column sums of the matrix”
David F. Gleich (Sandia) ICME la/opt seminar 10 of 50
Tweet along @dgleich
A MODERN TAKE
The Katz score (node-based) is
                   
The Katz score (edge-based) is
                   
David F. Gleich (Sandia) ICME la/opt seminar 11 of 50
Tweet along @dgleich
RETURNING TO THE MATRIX
                     
                                                     
Carl Neumann                          
                                   
David F. Gleich (Sandia) ICME la/opt seminar 12 of 50
Tweet along @dgleich
Carl Neumann
I’ve heard the Neumann series called the “von Neumann”
series more than I’d like! In fact, the von Neumann kernel
of a graph should be named the “Neumann” kernel!
David F. Gleich (Sandia) ICME la/opt seminar
Wikipedia page
13 / 50
Tweet along @dgleich
PROPERTIES OF KATZ’S MATRIX
    is symmetric
    exists when                      
              is sym. pos. def. when                      
Note that         1/max-degree suffices
David F. Gleich (Sandia) ICME la/opt seminar 14 of 50
Tweet along @dgleich
COMMUTE TIME
Consider a uniform random walk on a
graph                    
                                                     
                                               
David F. Gleich (Sandia) ICME la/opt seminar
Also called the hitting
time from node i to j, or
the first transition time
15 of 50
Tweet along @dgleich
SKIPPING DETAILS
                : graph Laplacian
                                                             
            is the only null-vector
David F. Gleich (Sandia) ICME la/opt seminar 16 of 50
Tweet along @dgleich
WHAT DO OTHER PEOPLE DO?
1) Just work with the linear algebra formulations
2) For Katz, Truncate the Neumann series as a few (3-5)
terms (I’m searching for this ref.)
3) Use low-rank approximations from EVD(A) or EVD(L)
4) For commute, use Johnson-Lindenstrauss inspired
random sampling
5) Approximately decompose into smaller problems
David F. Gleich (Sandia) ICME la/opt seminar
Liben-Nowll and Kleinberg CIKM2003,Acar et al. ICDM2009,Spielman and Srivastava STOC2008,Sarkar and Moore UAI2007
17 of 50
Tweet along @dgleich
THE PROBLEM
All of these techniques are
preprocessing based because
most people’s goal is to compute
all the scores.
We want to avoid
preprocessing the graph.
David F. Gleich (Sandia) ICME la/opt seminar
There are a few caveats here! i.e. one could solve the system instead of looking for the matrix inverse
18 of 50
Tweet along @dgleich
WHY NO PREPROCESSING?
The graph is constantly changing
as I rate new movies.
David F. Gleich (Sandia) ICME la/opt seminar 19 of 50
Tweet along @dgleich
WHY NO PREPROCESSING?
David F. Gleich (Sandia) ICME la/opt seminar
Top-k predicted “links”
are movies to watch!
Pairwise scores give
user similarity
20 of 50
Tweet along @dgleich
PAIRWISE ALGORITHMS
Katz                    
Commute                                        
David F. Gleich (Sandia) ICME la/opt seminar
Golub and Meurant
to the rescue!
21 of 50
Tweet along @dgleich
MMQ - THE BIG IDEA
Quadratic form                
   
Weighted sum                  
   
Stieltjes integral                  
   
Quadrature approximation                  
   
Matrix equation              
David F. Gleich (Sandia) ICME la/opt seminar
Think                    
A is s.p.d. use EVD
“A tautology”
Lanczos
22 of 50
Tweet along @dgleich
LANCZOS
          , $k$-steps of the Lanczos method
produce
                                    and                      
David F. Gleich (Sandia) ICME la/opt seminar
                   =
             
23 of 50
Tweet along @dgleich
PRACTICAL LANCZOS
Only need to store the last 2 vectors in      
Updating requires O(matvec) work
      is not orthogonal
David F. Gleich (Sandia) ICME la/opt seminar 24 of 50
Tweet along @dgleich
MMQ PROCEDURE
Goal                        
Given                        
1. Run k-steps of Lanczos on     starting with    
2. Compute       ,     with an additional eigenvalue at     ,
set                    
3. Compute     ,     with an additional eigenvalue at   , set
                 
4. Output               as lower and upper bounds on b
David F. Gleich (Sandia) ICME la/opt seminar
Correspond to a Gauss-Radau rule, with
u as a prescribed node
Correspond to a Gauss-Radau rule, with
l as a prescribed node
25 of 50
Tweet along @dgleich
PRACTICAL MMQ
Increase k to become more accurate
Bad eigenvalue bounds yield worse results
      and     are easy to compute
      not required, we can iteratively
update it’s LU factorization
David F. Gleich (Sandia) ICME la/opt seminar 26 of 50
Tweet along @dgleich
PRACTICAL MMQ
David F. Gleich (Sandia) ICME la/opt seminar 27 of 50
Tweet along @dgleich
ONE LAST STEP FOR KATZ
Katz                    
          
                                                                   
                                     
David F. Gleich (Sandia) ICME la/opt seminar 28 of 50
Tweet along @dgleich
TOP-K ALGORITHM FOR KATZ
Approximate    
                           
where     is sparse
Keep     sparse too
Ideally, don’t “touch” all of    
David F. Gleich (Sandia) ICME la/opt seminar 29 of 50
Tweet along @dgleich
INSPIRATION - PAGERANK
Approximate    
                           
where     is sparse
Keep     sparse too? YES!
Ideally, don’t “touch” all of     ? YES!
David F. Gleich (Sandia) ICME la/opt seminar
McSherryWWW2005, Berkhin 2007,Anderson et al. FOCS2008 –Thanks to Reid Anderson for telling me McSherry did this too.
30 of 50
Tweet along @dgleich
THE ALGORITHM - MCSHERRY
For              
Start with the Richardson iteration
                                                     
Rewrite
                                     
Richardson converges if                        
David F. Gleich (Sandia) ICME la/opt seminar 31 of 50
Tweet along @dgleich
THE ALGORITHM
Note     is sparse.
If                 , then         is sparse.
Idea
only add one component of         to        
David F. Gleich (Sandia) ICME la/opt seminar 32 of 50
Tweet along @dgleich
THE ALGORITHM
For              
Init:                                  
                               
                               
How to pick   ?
David F. Gleich (Sandia) ICME la/opt seminar 33 of 50
Tweet along @dgleich
THE ALGORITHM FOR KATZ
For                          
Init:                                  
                             
                                           
Pick   as max      
David F. Gleich (Sandia) ICME la/opt seminar
Storing the non-zeros of the residual in a heap makes picking the max log(n) time. See Anderson et al. FOCS2008 for more
34 of 50
Tweet along @dgleich
CONVERGENCE?
If you pick   as the maximum element, we
can show this is convergent if Richardson
converges. This proof requires     to be
symmetric positive definite.
David F. Gleich (Sandia) ICME la/opt seminar 35 of 50
Tweet along @dgleich
RESULTS - DATA
All unweighted, connected graphs
David F. Gleich (Sandia) ICME la/opt seminar 36 of 50
Tweet along @dgleich
RESULTS – KATZ ALPHAS
Easy    
                               
Hard    
                       
David F. Gleich (Sandia) ICME la/opt seminar 37 of 50
Tweet along @dgleich
PAIRWISE RESULTS
Katz upper and lower bounds
Katz error convergence
Commute-time upper and lower bounds
Commute-time error convergence
For the arXiv graph here
David F. Gleich (Sandia) ICME la/opt seminar 38 of 50
Tweet along @dgleich
KATZ BOUND CONVERGENCE
David F. Gleich (Sandia) ICME la/opt seminar 39 of 50
Tweet along @dgleich
KATZ ERROR CONVERGENCE
David F. Gleich (Sandia) ICME la/opt seminar 40 of 50
Tweet along @dgleich
COMMUTE BOUND CONVERG.
David F. Gleich (Sandia) ICME la/opt seminar 41 of 50
Tweet along @dgleich
COMMUTE ERROR CONVERG.
David F. Gleich (Sandia) ICME la/opt seminar 42 of 50
Tweet along @dgleich
TOP-K RESULTS
Katz set convergence
Katz order convergence
For arXiv graph
David F. Gleich (Sandia) ICME la/opt seminar 43 of 50
Tweet along @dgleich
KATZ SET CONVERGENCE
David F. Gleich (Sandia) ICME la/opt seminar 44 of 50
Tweet along @dgleich
KATZ ORDER CONVERGENCE
David F. Gleich (Sandia) ICME la/opt seminar 45 of 50
Tweet along @dgleich
CONCLUSIONS
These algorithms are faster than many
alternatives.
For pairwise commute, stopping criteria
are simpler
For top-k, we often need less than 1
matvec for good enough results
David F. Gleich (Sandia) ICME la/opt seminar 46 of 50
Tweet along @dgleich
WARTS
Stopping criteria on our top-k algorithm
can be a bit hairy
The top-k approach doesn’t work right for
commute time
David F. Gleich (Sandia) ICME la/opt seminar 47 of 50
Tweet along @dgleich
TODO
Try on netflix data! 
Explore our “almost commute measure
more”
David F. Gleich (Sandia) ICME la/opt seminar 48 of 50
Tweet along @dgleich
F-MEASURE
David F. Gleich (Sandia) ICME la/opt seminar 49 of 50
Tweet along @dgleich
By AngryDogDesign on DeviantArt
Preprint available by request
Slides should be online soon
Code is online already
stanford.edu/~dgleich/
publications/2010/codes/fast-katz
David F. Gleich (Sandia) ICME la/opt seminar 50

More Related Content

More from David Gleich

Correlation clustering and community detection in graphs and networks
Correlation clustering and community detection in graphs and networksCorrelation clustering and community detection in graphs and networks
Correlation clustering and community detection in graphs and networksDavid Gleich
 
Spectral clustering with motifs and higher-order structures
Spectral clustering with motifs and higher-order structuresSpectral clustering with motifs and higher-order structures
Spectral clustering with motifs and higher-order structuresDavid Gleich
 
Spacey random walks and higher-order data analysis
Spacey random walks and higher-order data analysisSpacey random walks and higher-order data analysis
Spacey random walks and higher-order data analysisDavid Gleich
 
Non-exhaustive, Overlapping K-means
Non-exhaustive, Overlapping K-meansNon-exhaustive, Overlapping K-means
Non-exhaustive, Overlapping K-meansDavid Gleich
 
Using Local Spectral Methods to Robustify Graph-Based Learning
Using Local Spectral Methods to Robustify Graph-Based LearningUsing Local Spectral Methods to Robustify Graph-Based Learning
Using Local Spectral Methods to Robustify Graph-Based LearningDavid Gleich
 
Spacey random walks and higher order Markov chains
Spacey random walks and higher order Markov chainsSpacey random walks and higher order Markov chains
Spacey random walks and higher order Markov chainsDavid Gleich
 
Localized methods in graph mining
Localized methods in graph miningLocalized methods in graph mining
Localized methods in graph miningDavid Gleich
 
PageRank Centrality of dynamic graph structures
PageRank Centrality of dynamic graph structuresPageRank Centrality of dynamic graph structures
PageRank Centrality of dynamic graph structuresDavid Gleich
 
Iterative methods with special structures
Iterative methods with special structuresIterative methods with special structures
Iterative methods with special structuresDavid Gleich
 
Big data matrix factorizations and Overlapping community detection in graphs
Big data matrix factorizations and Overlapping community detection in graphsBig data matrix factorizations and Overlapping community detection in graphs
Big data matrix factorizations and Overlapping community detection in graphsDavid Gleich
 
Anti-differentiating approximation algorithms: A case study with min-cuts, sp...
Anti-differentiating approximation algorithms: A case study with min-cuts, sp...Anti-differentiating approximation algorithms: A case study with min-cuts, sp...
Anti-differentiating approximation algorithms: A case study with min-cuts, sp...David Gleich
 
Localized methods for diffusions in large graphs
Localized methods for diffusions in large graphsLocalized methods for diffusions in large graphs
Localized methods for diffusions in large graphsDavid Gleich
 
Anti-differentiating Approximation Algorithms: PageRank and MinCut
Anti-differentiating Approximation Algorithms: PageRank and MinCutAnti-differentiating Approximation Algorithms: PageRank and MinCut
Anti-differentiating Approximation Algorithms: PageRank and MinCutDavid Gleich
 
Fast relaxation methods for the matrix exponential
Fast relaxation methods for the matrix exponential Fast relaxation methods for the matrix exponential
Fast relaxation methods for the matrix exponential David Gleich
 
Gaps between the theory and practice of large-scale matrix-based network comp...
Gaps between the theory and practice of large-scale matrix-based network comp...Gaps between the theory and practice of large-scale matrix-based network comp...
Gaps between the theory and practice of large-scale matrix-based network comp...David Gleich
 
MapReduce Tall-and-skinny QR and applications
MapReduce Tall-and-skinny QR and applicationsMapReduce Tall-and-skinny QR and applications
MapReduce Tall-and-skinny QR and applicationsDavid Gleich
 
Recommendation and graph algorithms in Hadoop and SQL
Recommendation and graph algorithms in Hadoop and SQLRecommendation and graph algorithms in Hadoop and SQL
Recommendation and graph algorithms in Hadoop and SQLDavid Gleich
 
Relaxation methods for the matrix exponential on large networks
Relaxation methods for the matrix exponential on large networksRelaxation methods for the matrix exponential on large networks
Relaxation methods for the matrix exponential on large networksDavid Gleich
 
Personalized PageRank based community detection
Personalized PageRank based community detectionPersonalized PageRank based community detection
Personalized PageRank based community detectionDavid Gleich
 
Tall and Skinny QRs in MapReduce
Tall and Skinny QRs in MapReduceTall and Skinny QRs in MapReduce
Tall and Skinny QRs in MapReduceDavid Gleich
 

More from David Gleich (20)

Correlation clustering and community detection in graphs and networks
Correlation clustering and community detection in graphs and networksCorrelation clustering and community detection in graphs and networks
Correlation clustering and community detection in graphs and networks
 
Spectral clustering with motifs and higher-order structures
Spectral clustering with motifs and higher-order structuresSpectral clustering with motifs and higher-order structures
Spectral clustering with motifs and higher-order structures
 
Spacey random walks and higher-order data analysis
Spacey random walks and higher-order data analysisSpacey random walks and higher-order data analysis
Spacey random walks and higher-order data analysis
 
Non-exhaustive, Overlapping K-means
Non-exhaustive, Overlapping K-meansNon-exhaustive, Overlapping K-means
Non-exhaustive, Overlapping K-means
 
Using Local Spectral Methods to Robustify Graph-Based Learning
Using Local Spectral Methods to Robustify Graph-Based LearningUsing Local Spectral Methods to Robustify Graph-Based Learning
Using Local Spectral Methods to Robustify Graph-Based Learning
 
Spacey random walks and higher order Markov chains
Spacey random walks and higher order Markov chainsSpacey random walks and higher order Markov chains
Spacey random walks and higher order Markov chains
 
Localized methods in graph mining
Localized methods in graph miningLocalized methods in graph mining
Localized methods in graph mining
 
PageRank Centrality of dynamic graph structures
PageRank Centrality of dynamic graph structuresPageRank Centrality of dynamic graph structures
PageRank Centrality of dynamic graph structures
 
Iterative methods with special structures
Iterative methods with special structuresIterative methods with special structures
Iterative methods with special structures
 
Big data matrix factorizations and Overlapping community detection in graphs
Big data matrix factorizations and Overlapping community detection in graphsBig data matrix factorizations and Overlapping community detection in graphs
Big data matrix factorizations and Overlapping community detection in graphs
 
Anti-differentiating approximation algorithms: A case study with min-cuts, sp...
Anti-differentiating approximation algorithms: A case study with min-cuts, sp...Anti-differentiating approximation algorithms: A case study with min-cuts, sp...
Anti-differentiating approximation algorithms: A case study with min-cuts, sp...
 
Localized methods for diffusions in large graphs
Localized methods for diffusions in large graphsLocalized methods for diffusions in large graphs
Localized methods for diffusions in large graphs
 
Anti-differentiating Approximation Algorithms: PageRank and MinCut
Anti-differentiating Approximation Algorithms: PageRank and MinCutAnti-differentiating Approximation Algorithms: PageRank and MinCut
Anti-differentiating Approximation Algorithms: PageRank and MinCut
 
Fast relaxation methods for the matrix exponential
Fast relaxation methods for the matrix exponential Fast relaxation methods for the matrix exponential
Fast relaxation methods for the matrix exponential
 
Gaps between the theory and practice of large-scale matrix-based network comp...
Gaps between the theory and practice of large-scale matrix-based network comp...Gaps between the theory and practice of large-scale matrix-based network comp...
Gaps between the theory and practice of large-scale matrix-based network comp...
 
MapReduce Tall-and-skinny QR and applications
MapReduce Tall-and-skinny QR and applicationsMapReduce Tall-and-skinny QR and applications
MapReduce Tall-and-skinny QR and applications
 
Recommendation and graph algorithms in Hadoop and SQL
Recommendation and graph algorithms in Hadoop and SQLRecommendation and graph algorithms in Hadoop and SQL
Recommendation and graph algorithms in Hadoop and SQL
 
Relaxation methods for the matrix exponential on large networks
Relaxation methods for the matrix exponential on large networksRelaxation methods for the matrix exponential on large networks
Relaxation methods for the matrix exponential on large networks
 
Personalized PageRank based community detection
Personalized PageRank based community detectionPersonalized PageRank based community detection
Personalized PageRank based community detection
 
Tall and Skinny QRs in MapReduce
Tall and Skinny QRs in MapReduceTall and Skinny QRs in MapReduce
Tall and Skinny QRs in MapReduce
 

Recently uploaded

Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Drew Madelung
 
A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?Igalia
 
Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024The Digital Insurer
 
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost SavingRepurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost SavingEdi Saputra
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc
 
Strategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherStrategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherRemote DBA Services
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Miguel Araújo
 
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProduct Anonymous
 
presentation ICT roal in 21st century education
presentation ICT roal in 21st century educationpresentation ICT roal in 21st century education
presentation ICT roal in 21st century educationjfdjdjcjdnsjd
 
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024The Digital Insurer
 
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUnderstanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUK Journal
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfsudhanshuwaghmare1
 
GenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdfGenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdflior mazor
 
Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...apidays
 
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin WoodPolkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin WoodJuan lago vázquez
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...Martijn de Jong
 
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityPrincipled Technologies
 
Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024The Digital Insurer
 

Recently uploaded (20)

Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
 
A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?
 
Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024
 
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost SavingRepurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
 
Strategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherStrategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a Fresher
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
 
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
 
presentation ICT roal in 21st century education
presentation ICT roal in 21st century educationpresentation ICT roal in 21st century education
presentation ICT roal in 21st century education
 
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
 
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUnderstanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdf
 
GenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdfGenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdf
 
Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...
 
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin WoodPolkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
 
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...
 
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivity
 
Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024
 

Fast katz-presentation

  • 1. Tweet along @dgleich FAST KATZ AND COMMUTERS Quadrature Rules and Sparse Linear Solvers for Link Prediction Heuristics David F. Gleich Sandia National Labs la/opt seminar October 14th 2010 With Pooya Esfandiar, Francesco Bonchi, Chen Grief, LaksV. S. Lakshmanan, and Byung-Won On David F. Gleich (Sandia) ICME la/opt seminar 1 / 50
  • 2. Tweet along @dgleich MAIN RESULTS – SLIDE ONE A – adjacency matrix L – Laplacian matrix Katz score :                                                 Commute time:                                 David F. Gleich (Sandia) ICME la/opt seminar 2 of 50
  • 3. Tweet along @dgleich MAIN RESULTS – SLIDE TWO For Katz Compute one       fast Compute top       fast For Commute Compute one       fast For almost commute Compute top       fast David F. Gleich (Sandia) ICME la/opt seminar 3 of 50
  • 4. Tweet along @dgleich MAIN RESULTS – SLIDE THREE David F. Gleich (Sandia) ICME la/opt seminar 4 of 50
  • 5. Tweet along @dgleich OUTLINE Why study these measures? Katz Rank and Commute Time How else do people compute them? Quadrature rules for pairwise scores Sparse linear systems solves for top-k As many results as we have time for… David F. Gleich (Sandia) ICME la/opt seminar 5 of 50
  • 6. Tweet along @dgleich WHY? LINK PREDICTION David F. Gleich (Sandia) ICME la/opt seminar Liben-Nowell and Kleinberg 2003, 2006 found that path based link prediction was more efficient Neighborhood based Path based 6 of 50
  • 7. Tweet along @dgleich NOTE All graphs are undirected All graphs are connected David F. Gleich (Sandia) ICME la/opt seminar 7 of 50
  • 8. Tweet along @dgleich LEO KATZ David F. Gleich (Sandia) ICME la/opt seminar 8 of 50
  • 9. Tweet along @dgleich NOT QUITE,WIKIPEDIA     : adjacency,                     : random walk PageRank                         Katz                         These are equivalent if     has constant degree David F. Gleich (Sandia) ICME la/opt seminar 9 of 50
  • 10. Tweet along @dgleich WHAT KATZ ACTUALLY SAID Leo Katz 1953,A New Status Index Derived from Sociometric Analysis, Psychometria 18(1):39-43 “we assume that each link independently has the same probability of being effective” … “we conceive a constant     , depending on the group and the context of the particular investigation, which has the force of a probability of effectiveness of a single link.A k-step chain then, has probability       of being effective.” “We wish to find the column sums of the matrix” David F. Gleich (Sandia) ICME la/opt seminar 10 of 50
  • 11. Tweet along @dgleich A MODERN TAKE The Katz score (node-based) is                     The Katz score (edge-based) is                     David F. Gleich (Sandia) ICME la/opt seminar 11 of 50
  • 12. Tweet along @dgleich RETURNING TO THE MATRIX                                                                             Carl Neumann                                                               David F. Gleich (Sandia) ICME la/opt seminar 12 of 50
  • 13. Tweet along @dgleich Carl Neumann I’ve heard the Neumann series called the “von Neumann” series more than I’d like! In fact, the von Neumann kernel of a graph should be named the “Neumann” kernel! David F. Gleich (Sandia) ICME la/opt seminar Wikipedia page 13 / 50
  • 14. Tweet along @dgleich PROPERTIES OF KATZ’S MATRIX     is symmetric     exists when                                     is sym. pos. def. when                       Note that         1/max-degree suffices David F. Gleich (Sandia) ICME la/opt seminar 14 of 50
  • 15. Tweet along @dgleich COMMUTE TIME Consider a uniform random walk on a graph                                                                                                                           David F. Gleich (Sandia) ICME la/opt seminar Also called the hitting time from node i to j, or the first transition time 15 of 50
  • 16. Tweet along @dgleich SKIPPING DETAILS                 : graph Laplacian                                                                           is the only null-vector David F. Gleich (Sandia) ICME la/opt seminar 16 of 50
  • 17. Tweet along @dgleich WHAT DO OTHER PEOPLE DO? 1) Just work with the linear algebra formulations 2) For Katz, Truncate the Neumann series as a few (3-5) terms (I’m searching for this ref.) 3) Use low-rank approximations from EVD(A) or EVD(L) 4) For commute, use Johnson-Lindenstrauss inspired random sampling 5) Approximately decompose into smaller problems David F. Gleich (Sandia) ICME la/opt seminar Liben-Nowll and Kleinberg CIKM2003,Acar et al. ICDM2009,Spielman and Srivastava STOC2008,Sarkar and Moore UAI2007 17 of 50
  • 18. Tweet along @dgleich THE PROBLEM All of these techniques are preprocessing based because most people’s goal is to compute all the scores. We want to avoid preprocessing the graph. David F. Gleich (Sandia) ICME la/opt seminar There are a few caveats here! i.e. one could solve the system instead of looking for the matrix inverse 18 of 50
  • 19. Tweet along @dgleich WHY NO PREPROCESSING? The graph is constantly changing as I rate new movies. David F. Gleich (Sandia) ICME la/opt seminar 19 of 50
  • 20. Tweet along @dgleich WHY NO PREPROCESSING? David F. Gleich (Sandia) ICME la/opt seminar Top-k predicted “links” are movies to watch! Pairwise scores give user similarity 20 of 50
  • 21. Tweet along @dgleich PAIRWISE ALGORITHMS Katz                     Commute                                         David F. Gleich (Sandia) ICME la/opt seminar Golub and Meurant to the rescue! 21 of 50
  • 22. Tweet along @dgleich MMQ - THE BIG IDEA Quadratic form                     Weighted sum                       Stieltjes integral                       Quadrature approximation                       Matrix equation               David F. Gleich (Sandia) ICME la/opt seminar Think                     A is s.p.d. use EVD “A tautology” Lanczos 22 of 50
  • 23. Tweet along @dgleich LANCZOS           , $k$-steps of the Lanczos method produce                                     and                       David F. Gleich (Sandia) ICME la/opt seminar                    =               23 of 50
  • 24. Tweet along @dgleich PRACTICAL LANCZOS Only need to store the last 2 vectors in       Updating requires O(matvec) work       is not orthogonal David F. Gleich (Sandia) ICME la/opt seminar 24 of 50
  • 25. Tweet along @dgleich MMQ PROCEDURE Goal                         Given                         1. Run k-steps of Lanczos on     starting with     2. Compute       ,     with an additional eigenvalue at     , set                     3. Compute     ,     with an additional eigenvalue at   , set                   4. Output               as lower and upper bounds on b David F. Gleich (Sandia) ICME la/opt seminar Correspond to a Gauss-Radau rule, with u as a prescribed node Correspond to a Gauss-Radau rule, with l as a prescribed node 25 of 50
  • 26. Tweet along @dgleich PRACTICAL MMQ Increase k to become more accurate Bad eigenvalue bounds yield worse results       and     are easy to compute       not required, we can iteratively update it’s LU factorization David F. Gleich (Sandia) ICME la/opt seminar 26 of 50
  • 27. Tweet along @dgleich PRACTICAL MMQ David F. Gleich (Sandia) ICME la/opt seminar 27 of 50
  • 28. Tweet along @dgleich ONE LAST STEP FOR KATZ Katz                                                                                                                                          David F. Gleich (Sandia) ICME la/opt seminar 28 of 50
  • 29. Tweet along @dgleich TOP-K ALGORITHM FOR KATZ Approximate                                 where     is sparse Keep     sparse too Ideally, don’t “touch” all of     David F. Gleich (Sandia) ICME la/opt seminar 29 of 50
  • 30. Tweet along @dgleich INSPIRATION - PAGERANK Approximate                                 where     is sparse Keep     sparse too? YES! Ideally, don’t “touch” all of     ? YES! David F. Gleich (Sandia) ICME la/opt seminar McSherryWWW2005, Berkhin 2007,Anderson et al. FOCS2008 –Thanks to Reid Anderson for telling me McSherry did this too. 30 of 50
  • 31. Tweet along @dgleich THE ALGORITHM - MCSHERRY For               Start with the Richardson iteration                                                       Rewrite                                       Richardson converges if                         David F. Gleich (Sandia) ICME la/opt seminar 31 of 50
  • 32. Tweet along @dgleich THE ALGORITHM Note     is sparse. If                 , then         is sparse. Idea only add one component of         to         David F. Gleich (Sandia) ICME la/opt seminar 32 of 50
  • 33. Tweet along @dgleich THE ALGORITHM For               Init:                                                                                                   How to pick   ? David F. Gleich (Sandia) ICME la/opt seminar 33 of 50
  • 34. Tweet along @dgleich THE ALGORITHM FOR KATZ For                           Init:                                                                                                             Pick   as max       David F. Gleich (Sandia) ICME la/opt seminar Storing the non-zeros of the residual in a heap makes picking the max log(n) time. See Anderson et al. FOCS2008 for more 34 of 50
  • 35. Tweet along @dgleich CONVERGENCE? If you pick   as the maximum element, we can show this is convergent if Richardson converges. This proof requires     to be symmetric positive definite. David F. Gleich (Sandia) ICME la/opt seminar 35 of 50
  • 36. Tweet along @dgleich RESULTS - DATA All unweighted, connected graphs David F. Gleich (Sandia) ICME la/opt seminar 36 of 50
  • 37. Tweet along @dgleich RESULTS – KATZ ALPHAS Easy                                     Hard                             David F. Gleich (Sandia) ICME la/opt seminar 37 of 50
  • 38. Tweet along @dgleich PAIRWISE RESULTS Katz upper and lower bounds Katz error convergence Commute-time upper and lower bounds Commute-time error convergence For the arXiv graph here David F. Gleich (Sandia) ICME la/opt seminar 38 of 50
  • 39. Tweet along @dgleich KATZ BOUND CONVERGENCE David F. Gleich (Sandia) ICME la/opt seminar 39 of 50
  • 40. Tweet along @dgleich KATZ ERROR CONVERGENCE David F. Gleich (Sandia) ICME la/opt seminar 40 of 50
  • 41. Tweet along @dgleich COMMUTE BOUND CONVERG. David F. Gleich (Sandia) ICME la/opt seminar 41 of 50
  • 42. Tweet along @dgleich COMMUTE ERROR CONVERG. David F. Gleich (Sandia) ICME la/opt seminar 42 of 50
  • 43. Tweet along @dgleich TOP-K RESULTS Katz set convergence Katz order convergence For arXiv graph David F. Gleich (Sandia) ICME la/opt seminar 43 of 50
  • 44. Tweet along @dgleich KATZ SET CONVERGENCE David F. Gleich (Sandia) ICME la/opt seminar 44 of 50
  • 45. Tweet along @dgleich KATZ ORDER CONVERGENCE David F. Gleich (Sandia) ICME la/opt seminar 45 of 50
  • 46. Tweet along @dgleich CONCLUSIONS These algorithms are faster than many alternatives. For pairwise commute, stopping criteria are simpler For top-k, we often need less than 1 matvec for good enough results David F. Gleich (Sandia) ICME la/opt seminar 46 of 50
  • 47. Tweet along @dgleich WARTS Stopping criteria on our top-k algorithm can be a bit hairy The top-k approach doesn’t work right for commute time David F. Gleich (Sandia) ICME la/opt seminar 47 of 50
  • 48. Tweet along @dgleich TODO Try on netflix data!  Explore our “almost commute measure more” David F. Gleich (Sandia) ICME la/opt seminar 48 of 50
  • 49. Tweet along @dgleich F-MEASURE David F. Gleich (Sandia) ICME la/opt seminar 49 of 50
  • 50. Tweet along @dgleich By AngryDogDesign on DeviantArt Preprint available by request Slides should be online soon Code is online already stanford.edu/~dgleich/ publications/2010/codes/fast-katz David F. Gleich (Sandia) ICME la/opt seminar 50