Anti-differentiating Approximation Algorithms: PageRank and MinCut
We study how Google's PageRank method relates to mincut and a particular type of electrical flow in a network. We also explain in detail how the "push method" accelerates the computation of PageRank. This has implications for semi-supervised learning and machine learning, as well as for social network analysis.

Anti-differentiating Approximation Algorithms: PageRank and MinCut Presentation Transcript

  • 1. Anti-differentiating approximation algorithms & new relationships between PageRank, spectral, and localized flow. David F. Gleich, Purdue University. Joint work with Michael Mahoney. Supported by NSF CAREER 1149756-CCF and the Simons Institute.
  • 2. Anti-differentiating approximation algorithms & new relationships between PageRank, spectral, and localized flow. (1) A new derivation of the PageRank vector for an undirected graph based on Laplacians, cuts, or flows. (2) A new understanding of the "push" method to compute personalized PageRank. (3) An empirical improvement to methods for semi-supervised learning.
  • 3. The PageRank problem. The PageRank random surfer: (1) with probability $\beta$, follow a random-walk step; (2) with probability $(1-\beta)$, jump randomly according to the distribution $v$. Goal: find the stationary distribution $x$. Algorithm: solve the linear system $(I - \beta A D^{-1})x = (1-\beta)v$, i.e., $x = \beta A D^{-1} x + (1-\beta)v$, where $A$ is the symmetric adjacency matrix, $D$ the diagonal degree matrix, and $v$ the jump vector.
  • 4. The PageRank problem & the Laplacian. Three equivalent formulations: (1) $(I - \beta A D^{-1})x = (1-\beta)v$; (2) $(I - \beta \hat{A})y = (1-\beta)D^{-1/2}v$, where $\hat{A} = D^{-1/2} A D^{-1/2}$ and $x = D^{1/2}y$; and (3) $[\alpha D + L]z = \alpha v$, where $\beta = 1/(1+\alpha)$, $x = Dz$, and $L = D - A$ is the combinatorial Laplacian. (A numerical sketch of the equivalence follows below.)
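A minimal numerical sketch of this equivalence, assuming NumPy; the small graph, $\beta$, and $v$ are illustrative choices, not from the slides. All three systems recover the same PageRank vector $x$.

```python
import numpy as np

# Small undirected example graph; any connected adjacency matrix works here.
A = np.array([[0, 1, 1, 0],
              [1, 0, 1, 1],
              [1, 1, 0, 1],
              [0, 1, 1, 0]], dtype=float)
d = A.sum(axis=1)                 # degrees
D = np.diag(d)
L = D - A                         # combinatorial Laplacian
n = A.shape[0]
beta = 0.85
v = np.ones(n) / n                # jump vector

# (1) (I - beta A D^{-1}) x = (1 - beta) v
x1 = np.linalg.solve(np.eye(n) - beta * A @ np.linalg.inv(D), (1 - beta) * v)

# (2) (I - beta Ahat) y = (1 - beta) D^{-1/2} v, with x = D^{1/2} y
Dhalf_inv = np.diag(1.0 / np.sqrt(d))
Ahat = Dhalf_inv @ A @ Dhalf_inv
y = np.linalg.solve(np.eye(n) - beta * Ahat, (1 - beta) * Dhalf_inv @ v)
x2 = np.diag(np.sqrt(d)) @ y

# (3) (alpha D + L) z = alpha v, with beta = 1/(1 + alpha) and x = D z
alpha = (1 - beta) / beta
z = np.linalg.solve(alpha * D + L, alpha * v)
x3 = D @ z

print(np.allclose(x1, x2), np.allclose(x1, x3))   # True True
```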
  • 5. The Push Algorithm for PageRank. Proposed (in closest form) in Andersen, Chung, and Lang (also by McSherry, and Jeh & Widom) for personalized PageRank. Strongly related to Gauss-Seidel (see my talk at Simons for this). Derived to show improved runtime for balanced solvers. The Push Method (with parameters $\tau$, $\rho$):
    1. $x^{(1)} = 0$, $r^{(1)} = (1-\beta)e_i$, $k = 1$
    2. while any $r_j > \tau d_j$ ($d_j$ is the degree of node $j$)
    3. $x^{(k+1)} = x^{(k)} + (r_j - \tau d_j \rho)\,e_j$
    4. $r^{(k+1)}_i = \begin{cases} \tau d_j \rho & i = j \\ r^{(k)}_i + \beta (r_j - \tau d_j \rho)/d_j & i \sim j \\ r^{(k)}_i & \text{otherwise} \end{cases}$
    5. $k \leftarrow k + 1$
    (A Python sketch of this recurrence follows below.)
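A short Python sketch of the recurrence above, assuming an adjacency-list graph; the queue-based sweep order and the name push_pagerank are my own choices, not from the slides. Both $x$ and $r$ are kept as sparse dictionaries, and $x$ accumulates a sparse approximation to the PageRank vector with jump vector $e_{\text{seed}}$.

```python
from collections import deque

def push_pagerank(adj, seed, beta=0.85, tau=1e-4, rho=1.0):
    """Sparse PageRank approximation via the push method as written on the
    slide. adj maps each node to a list of neighbors; seed is the single
    nonzero of the jump vector e_seed."""
    deg = {u: len(adj[u]) for u in adj}
    x = {}                                   # solution, stored sparsely
    r = {seed: 1.0 - beta}                   # residual r = (1 - beta) e_seed
    queue = deque([seed])                    # nodes whose residual may exceed tau * d_j
    while queue:
        j = queue.popleft()
        rj = r.get(j, 0.0)
        if rj <= tau * deg[j]:
            continue                         # residual already below threshold
        push_amount = rj - tau * deg[j] * rho
        x[j] = x.get(j, 0.0) + push_amount   # move mass from residual to solution
        r[j] = tau * deg[j] * rho            # residual at j drops to tau * d_j * rho
        for i in adj[j]:                     # spread beta * push_amount / d_j to neighbors
            r[i] = r.get(i, 0.0) + beta * push_amount / deg[j]
            if r[i] > tau * deg[i]:
                queue.append(i)
    return x

# Example: 4-node path graph seeded at node 0.
adj = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
print(push_pagerank(adj, seed=0))
```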
  • 6. … demo of push …
  • 7. Why do we care about push? (1) Used for empirical studies of "communities". (2) Used for "fast PageRank" approximation. It produces sparse approximations to PageRank! (Figure: Newman's netscience graph, 379 vertices, 1828 nonzeros; the solution is "zero" on most of the nodes, and v has a single one here.)
  • 8. Our question: why does the "push method" have such incredible empirical utility?
  • 9. The O(correct) answer: (1) PageRank is related to the Laplacian; (2) the Laplacian is related to cuts; (3) Andersen, Chung, Lang provides the "right" bounds and "localization". This talk: the Θ(correct) answer? A deeper insight into the relationship.
  • 10. Intellectually indebted to … Chin, Mądry, Miller & Peng [2013]; Orecchia & Zhu [2014].
  • 11. The s-t min-cut problem: minimize $\|Bx\|_{C,1} = \sum_{ij \in E} C_{i,j} |x_i - x_j|$ subject to $x_s = 1$, $x_t = 0$, $x \ge 0$, where $B$ is the unweighted incidence matrix and $C$ is the diagonal capacity matrix. (A small sketch follows below.)
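By max-flow/min-cut duality, the optimum of this 1-norm objective equals the capacity of a minimum s-t cut, so a flow solver recovers it. A minimal sketch, assuming NetworkX; the toy capacitated graph is my own, and a DiGraph with edges in both directions stands in for an undirected capacitated graph.

```python
import networkx as nx

# Toy capacitated graph; 's' and 't' are the source and sink.
edges = [('s', 'a', 3.0), ('s', 'b', 2.0), ('a', 'b', 1.0),
         ('a', 't', 2.0), ('b', 't', 3.0)]
G = nx.DiGraph()
for u, v, c in edges:
    G.add_edge(u, v, capacity=c)   # both directions mimic an undirected edge
    G.add_edge(v, u, capacity=c)

# The cut value equals the optimum of the 1-norm objective above;
# the indicator of the source side is an optimal x.
cut_value, (source_side, sink_side) = nx.minimum_cut(G, 's', 't')
print(cut_value, sorted(source_side))
```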
  • 12. The localized cut graph. Related to a construction used in "FlowImprove", Andersen & Lang (2007); and Orecchia & Zhu (2014). Connect $s$ to vertices in $S$ with weight $\alpha \cdot$ degree; connect $t$ to vertices in $\bar{S}$ with weight $\alpha \cdot$ degree: $A_S = \begin{bmatrix} 0 & \alpha d_S^T & 0 \\ \alpha d_S & A & \alpha d_{\bar{S}} \\ 0 & \alpha d_{\bar{S}}^T & 0 \end{bmatrix}$
  • 13. The localized cut graph. Connect $s$ to vertices in $S$ with weight $\alpha \cdot$ degree; connect $t$ to vertices in $\bar{S}$ with weight $\alpha \cdot$ degree, with incidence matrix $B_S = \begin{bmatrix} e & -I_S & 0 \\ 0 & B & 0 \\ 0 & -I_{\bar{S}} & e \end{bmatrix}$. Solve the s-t min-cut: minimize $\|B_S x\|_{C(\alpha),1}$ subject to $x_s = 1$, $x_t = 0$, $x \ge 0$. (A construction sketch follows below.)
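A minimal sketch of the construction, assuming NumPy; the helper name and the node ordering [s, original nodes, t] are my own conventions. It builds the augmented adjacency matrix $A_S$ of the localized cut graph from a seed set $S$.

```python
import numpy as np

def localized_cut_graph(A, S, alpha):
    """Augmented adjacency matrix A_S: a source s attached to the seed set S
    with weight alpha * degree, and a sink t attached to the complement with
    weight alpha * degree. Node order in the result: [s, original nodes, t]."""
    n = A.shape[0]
    d = A.sum(axis=1)
    in_S = np.zeros(n, dtype=bool)
    in_S[list(S)] = True
    dS = np.where(in_S, d, 0.0)          # d_S: degrees restricted to S
    dSbar = np.where(~in_S, d, 0.0)      # d_{Sbar}: degrees restricted to the complement
    AS = np.zeros((n + 2, n + 2))
    AS[1:n + 1, 1:n + 1] = A             # original graph in the middle block
    AS[0, 1:n + 1] = AS[1:n + 1, 0] = alpha * dS       # s <-> S edges
    AS[n + 1, 1:n + 1] = AS[1:n + 1, n + 1] = alpha * dSbar  # t <-> Sbar edges
    return AS

# Example on a 4-node graph with seed set S = {0, 1}.
A = np.array([[0, 1, 1, 0], [1, 0, 1, 1], [1, 1, 0, 1], [0, 1, 1, 0]], float)
print(localized_cut_graph(A, S={0, 1}, alpha=0.1))
```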
  • 14. The localized cut graph. Connect $s$ to vertices in $S$ with weight $\alpha \cdot$ degree; connect $t$ to vertices in $\bar{S}$ with weight $\alpha \cdot$ degree, with incidence matrix $B_S = \begin{bmatrix} e & -I_S & 0 \\ 0 & B & 0 \\ 0 & -I_{\bar{S}} & e \end{bmatrix}$. Solve the "electrical flow" s-t min-cut: minimize $\|B_S x\|_{C(\alpha),2}$ subject to $x_s = 1$, $x_t = 0$.
  • 15. s-t min-cut → PageRank. The PageRank vector $z$ that solves $(\alpha D + L)z = \alpha v$ with $v = d_S/\mathrm{vol}(S)$ is a renormalized solution of the electrical cut computation: minimize $\|B_S x\|_{C(\alpha),2}$ subject to $x_s = 1$, $x_t = 0$. Specifically, if $x$ is the solution, then $x = \begin{bmatrix} 1 \\ \mathrm{vol}(S)\, z \\ 0 \end{bmatrix}$. Proof: square and expand the objective into a Laplacian, then apply the constraints. (A numerical check follows below.)
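A quick numerical check of this statement, assuming NumPy; the toy graph, seed set, and $\alpha$ are illustrative. It forms the Laplacian of the localized cut graph, eliminates the fixed boundary values $x_s = 1$ and $x_t = 0$, and compares the interior solution with $\mathrm{vol}(S)\, z$.

```python
import numpy as np

A = np.array([[0, 1, 1, 0], [1, 0, 1, 1], [1, 1, 0, 1], [0, 1, 1, 0]], float)
d = A.sum(axis=1); D = np.diag(d); L = D - A
n = A.shape[0]
S = [0, 1]; alpha = 0.1

# PageRank side: (alpha D + L) z = alpha v with v = d_S / vol(S).
dS = np.zeros(n); dS[S] = d[S]
volS = d[S].sum()
z = np.linalg.solve(alpha * D + L, alpha * dS / volS)

# Electrical-flow side: Laplacian of the localized cut graph [s, 1..n, t],
# minimize x^T L_S x with x_s = 1, x_t = 0 fixed, i.e. solve the reduced system.
dSbar = d - dS
AS = np.zeros((n + 2, n + 2))
AS[1:n + 1, 1:n + 1] = A
AS[0, 1:n + 1] = AS[1:n + 1, 0] = alpha * dS
AS[n + 1, 1:n + 1] = AS[1:n + 1, n + 1] = alpha * dSbar
LS = np.diag(AS.sum(axis=1)) - AS
interior = slice(1, n + 1)
xG = np.linalg.solve(LS[interior, interior], -LS[interior, 0])  # boundary: x_s = 1

print(np.allclose(xG, volS * z))   # True: x = [1; vol(S) z; 0]
```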
  • 16. PageRank → s-t min-cut. That equivalence works if $v$ is degree-weighted. What if $v$ is the uniform vector? Use $A(s) = \begin{bmatrix} 0 & \alpha s^T & 0 \\ \alpha s & A & \alpha(d - s) \\ 0 & \alpha(d - s)^T & 0 \end{bmatrix}$.
  • 17. And beyond … Easy to cook up interesting diffusion-like problems and adapt them to this framework. In particular, Zhou et al. (2004) gave a semi-supervised learning diffusion we study soon: $\begin{bmatrix} 0 & e_S^T & 0 \\ e_S & \theta A & e_{\bar{S}} \\ 0 & e_{\bar{S}}^T & 0 \end{bmatrix}$, with the associated system $(I + \theta L)x = e_S$.
  • 18. Back to the push method. Let $x$ be the output from the push method with $0 < \beta < 1$, $v = d_S/\mathrm{vol}(S)$, $\rho = 1$, and $\tau > 0$. Set $\alpha = (1-\beta)/\beta$ and $\kappa = \tau\,\mathrm{vol}(S)/\beta$, and let $z_G$ solve: minimize $\tfrac{1}{2}\|B_S z\|^2_{C(\alpha),2} + \kappa \|Dz\|_1$ subject to $z_s = 1$, $z_t = 0$, $z \ge 0$, where $z = \begin{bmatrix} 1 \\ z_G \\ 0 \end{bmatrix}$. Then $x = D z_G/\mathrm{vol}(S)$. Proof: write out the KKT conditions and show that the push method solves them (the slackness argument was "tricky"). The $\ell_1$ term is regularization for sparsity; the degree scaling reflects the need for normalization.
  • 19. … demo of equivalence …
  • 20. This is a case of Algorithmic Anti-differentiation!
  • 21. The ideal world. Given problem P, derive solution characterization C, show algorithm A finds a solution where C holds. Profit? For example: given "min-cut", derive "max-flow is equivalent to min-cut", show push-relabel solves max-flow. Profit!
  • 22. (The ideal world)'. Given problem P, derive approximate solution characterization C', show algorithm A' quickly finds a solution where C' holds. Profit? For example: given "sparsest-cut", derive a Rayleigh-quotient approximation, show the power method finds a good Rayleigh quotient. Profit?
  • 23. The real world? Given task P, hack around until you find something useful, write a paper presenting a "novel heuristic" H for P, and … Profit! For example: given "find-communities", hack around, ??? (hidden) ???, write a paper presenting "three matvecs finds real-world communities". Profit!
  • 24. The real world, with Algorithmic Anti-differentiation: given heuristic H, is there a problem P' such that H is an algorithm for P'? Understand why H works: derive a characterization of heuristic H, guess and check until you find something H solves, and show heuristic H solves P'. For example: given "find-communities", hack around, write a paper presenting "three matvecs finds real-world communities". Profit! (e.g., Mahoney & Orecchia)
  • 25. The real world, with Algorithmic Anti-differentiation: given heuristic H, is there a problem P' such that H is an algorithm for P'? If your algorithm is related to optimization, this is: given a procedure X, what objective does it optimize? In an unconstrained case, this is just "anti-differentiation!"
  • 26. Algorithmic Anti-differentiation in the literature. Dhillon et al. (2007): spectral clustering, trace minimization & kernel k-means. Saunders (1995): LSQR & Craig iterative methods.
  • 27. Why does it matter? These details matter in many empirical studies, and can dramatically impact performance (speed or quality).
  • 28. Semi-supervised Learning on Graphs. Build a similarity graph over the data points $d_i$: $A_{i,j} = \exp\left(-\frac{\|d_i - d_j\|_2^2}{2\sigma^2}\right)$. (Figure: example graphs for $\sigma = 2.5$ and $\sigma = 1.25$.) Zhou et al. NIPS (2003).
  • 29. Semi-supervised Learning on Graphs. (Figure: similarity graphs for $\sigma = 2.5$ and $\sigma = 1.25$.) Experiment: predict the unlabeled images from the labeled ones.
  • 30. Semi-supervised Learning on Graphs. Kernels: $K_1 = (I - A)^{-1}$, $K_2 = (D - A)^{-1}$, $K_3 = (\mathrm{Diag}(Ae) - A)^{-1}$ (our new "kernel"). Predictions: $Y = K_i L$, where $L$ holds the indicators on the revealed labels, and $y = \operatorname{argmax}_j Y$. Experiment: vary the number of labeled images and track performance. (A sketch of the pipeline follows below.)
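A rough sketch of the prediction pipeline on synthetic 2-D points, assuming NumPy. To sidestep any ambiguity in the exact forms of $K_1$, $K_2$, $K_3$, it uses the diffusion from slide 17, $K = (I + \theta L)^{-1}$ (what the later slides call the regularized $K_3$); the data, $\sigma$, and $\theta$ values are illustrative, not from the slides.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: three Gaussian clusters in 2-D standing in for the image features d_i.
pts = np.vstack([rng.normal(c, 0.5, size=(30, 2)) for c in ([0, 0], [4, 0], [2, 3])])
true_labels = np.repeat([0, 1, 2], 30)
n, num_classes = pts.shape[0], 3

# Gaussian similarity graph A_ij = exp(-||d_i - d_j||^2 / (2 sigma^2)); sigma illustrative.
sigma = 1.25
sq_dists = ((pts[:, None, :] - pts[None, :, :]) ** 2).sum(axis=-1)
A = np.exp(-sq_dists / (2 * sigma ** 2))
np.fill_diagonal(A, 0.0)
L = np.diag(A.sum(axis=1)) - A            # Laplacian of the similarity graph

# Reveal a few labels per class; the label matrix has one indicator column per class.
revealed = np.concatenate([np.where(true_labels == c)[0][:3] for c in range(num_classes)])
labels = np.zeros((n, num_classes))
labels[revealed, true_labels[revealed]] = 1.0

# Kernel from the slide-17 diffusion (the "regularized" variant): K = (I + theta L)^{-1}.
theta = 0.1                                # illustrative regularization value
K = np.linalg.inv(np.eye(n) + theta * L)

Y = K @ labels                             # diffuse the revealed labels
y = Y.argmax(axis=1)                       # predict the class with the largest diffusion value
print("error rate:", np.mean(y != true_labels))
```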
  • 31. Semi-supervised Learning on Graphs. Same kernels $K_1$, $K_2$, $K_3$ with predictions $Y = K_i L$ and $y = \operatorname{argmax}_j Y$. Experiment: vary the number of labeled images and track performance. (Plot: error rate vs. number of labels for $K_1$, $K_2$, $K_3$, and the regularized $K_3$ (RK3), $\sigma = 1.25$.) Zhou et al. NIPS (2004).
  • 32. Semi-supervised Learning on Graphs. Same kernels and predictions; experiment: vary the number of labeled images and track performance. (Plot: $\sigma = 2.5$; the regularized $K_3$, our new value, compared against random guessing.)
  • 33. Semi-supervised Learning on Graphs. Same setup. (Plot: error rate vs. number of labels for $K_1$, $K_2$, $K_3$, and RK3 at $\sigma = 2.5$; the regularized $K_3$ is our new value, with random guessing shown for reference.)
  • 34. What's happening? (Plots: ROC curves, true positive rate vs. false positive rate, for the "2 vs. 1,2,3,4" task at $\sigma = 2.50$ and $\sigma = 1.25$, comparing $K_1$, $K_2$, $K_3$, and RK3.) Much better performance!
  • 35. The results of our regularized estimate. (Figure.)
  • 36. Why does it matter? Theory has the answer! We "sweep" over cuts from approximate eigenvectors; it's the order, not the values.
  • 37. Improved performance. Same kernels $K_1 = (I - A)^{-1}$, $K_2 = (D - A)^{-1}$, $K_3 = (\mathrm{Diag}(Ae) - A)^{-1}$ (our new value: regularized $K_3$), predictions $Y = K_i L$, now ranked via $y = \operatorname{argmin}_j \mathrm{SortedRank}(Y)$. We have spent no time tuning the regularization parameter. (Plot: error rate vs. number of labels, $\sigma = 2.5$, comparing $K_1$, $K_2$, $K_3$, RK3.)
  • 38. Recap & Conclusions. New relationships between localized cuts & PageRank. New understanding of the PPR "push" procedure. Improvements to semi-supervised learning on graphs. Open issues: better treatment of directed graphs? An algorithm for $\rho < 1$? ($\rho$ is set to ½ in most "uses"; needs new analysis.) (Figure: sparse solutions with 16, 15, 284, and 24 nonzeros.)