# Anti-differentiating Approximation Algorithms: PageRank and MinCut

We study how Google's PageRank method relates to mincut and a particular type of electrical flow in a network. We also explain the details of how the "push method" for computing PageRank helps to accelerate it. This has implications for semi-supervised learning and machine learning, as well as social network analysis.

### Anti-differentiating Approximation Algorithms: PageRank and MinCut

1. Anti-differentiating approximation algorithms & new relationships between PageRank, spectral, and localized flow. David F. Gleich, Purdue University. Joint work with Michael Mahoney. Supported by NSF CAREER 1149756-CCF, Simons Inst. ICERM David Gleich · Purdue
2. Anti-differentiating approximation algorithms & new relationships between PageRank, spectral, and localized flow. A new derivation of the PageRank vector for an undirected graph based on Laplacians, cuts, or flows. A new understanding of the “push” methods to compute Personalized PageRank. An empirical improvement to methods for semi-supervised learning.
3. The PageRank problem. The PageRank random surfer: (1) with probability $\beta$, follow a random-walk step; (2) with probability $1-\beta$, jump randomly according to the distribution $v$. Goal: find the stationary distribution $x$, which satisfies $x = \beta A D^{-1} x + (1-\beta) v$. Algorithm: solve the linear system $(I - \beta A D^{-1}) x = (1-\beta) v$, where $A$ is the symmetric adjacency matrix, $D$ is the diagonal degree matrix, $x$ is the solution, and $v$ is the jump vector.
4. The PageRank problem & the Laplacian. (1) $(I - \beta A D^{-1}) x = (1-\beta) v$; (2) $(I - \beta \mathcal{A}) y = (1-\beta) D^{-1/2} v$, where $\mathcal{A} = D^{-1/2} A D^{-1/2}$ and $x = D^{1/2} y$; and (3) $[\alpha D + L] z = \alpha v$, where $\beta = 1/(1+\alpha)$ and $x = Dz$. Here $L$ is the combinatorial Laplacian.
5. The Push Algorithm for PageRank. Proposed (in closest form) in Andersen, Chung, Lang (also by McSherry, Jeh & Widom) for personalized PageRank. Strongly related to Gauss-Seidel (see my talk at Simons for this). Derived to show improved runtime for balanced solvers. The push method with parameters $\tau, \rho$: (1) $x^{(1)} = 0$, $r^{(1)} = (1-\beta) e_i$, $k = 1$; (2) while any $r_j > \tau d_j$ ($d_j$ is the degree of node $j$): (3) $x^{(k+1)} = x^{(k)} + (r_j - \tau d_j \rho) e_j$; (4) $r_i^{(k+1)} = \tau d_j \rho$ if $i = j$, $r_i^{(k)} + \beta (r_j - \tau d_j \rho)/d_j$ if $i \sim j$, and $r_i^{(k)}$ otherwise; (5) $k \leftarrow k + 1$.
6. … demo of push …
7. Why do we care about push? (1) Used for empirical studies of “communities”. (2) Used for “fast PageRank” approximation. It produces sparse approximations to PageRank! Example: Newman’s netscience, 379 vertices, 1828 nnz; $v$ has a single one, and the result is “zero” on most of the nodes.
8. Our question: why does the “push method” have such incredible empirical utility?
9. The O(correct) answer: (1) PageRank related to Laplacian; (2) Laplacian related to cuts; (3) Andersen, Chung, Lang provides the “right” bounds and “localization”. This talk: the θ(correct) answer? A deeper insight into the relationship.
10. Intellectually indebted to … Chin, Mądry, Miller & Peng [2013]; Orecchia & Zhu [2014]
11. The s-t min-cut problem: minimize $\|Bx\|_{C,1} = \sum_{ij \in E} C_{i,j} |x_i - x_j|$ subject to $x_s = 1$, $x_t = 0$, $x \ge 0$, where $B$ is the unweighted incidence matrix and $C$ is the diagonal capacity matrix.
12. The localized cut graph. Related to a construction used in “FlowImprove”, Andersen & Lang (2007); and Orecchia & Zhu (2014). $A_S = \begin{bmatrix} 0 & \alpha d_S^T & 0 \\ \alpha d_S & A & \alpha d_{\bar S} \\ 0 & \alpha d_{\bar S}^T & 0 \end{bmatrix}$. Connect $s$ to vertices in $S$ with weight $\alpha \cdot$ degree. Connect $t$ to vertices in $\bar S$ with weight $\alpha \cdot$ degree.
13. The localized cut graph. Connect $s$ to vertices in $S$ with weight $\alpha \cdot$ degree. Connect $t$ to vertices in $\bar S$ with weight $\alpha \cdot$ degree. $B_S = \begin{bmatrix} e & -I_S & 0 \\ 0 & B & 0 \\ 0 & -I_{\bar S} & e \end{bmatrix}$. Solve the s-t min-cut: minimize $\|B_S x\|_{C(\alpha),1}$ subject to $x_s = 1$, $x_t = 0$, $x \ge 0$.
14. The localized cut graph. Connect $s$ to vertices in $S$ with weight $\alpha \cdot$ degree. Connect $t$ to vertices in $\bar S$ with weight $\alpha \cdot$ degree. $B_S = \begin{bmatrix} e & -I_S & 0 \\ 0 & B & 0 \\ 0 & -I_{\bar S} & e \end{bmatrix}$. Solve the “electrical flow” s-t min-cut: minimize $\|B_S x\|_{C(\alpha),2}$ subject to $x_s = 1$, $x_t = 0$.
15. s-t min-cut → PageRank. The PageRank vector $z$ that solves $(\alpha D + L) z = \alpha v$ with $v = d_S/\mathrm{vol}(S)$ is a renormalized solution of the electrical cut computation: minimize $\|B_S x\|_{C(\alpha),2}$ subject to $x_s = 1$, $x_t = 0$. Specifically, if $x$ is the solution, then $x = \begin{bmatrix} 1 \\ \mathrm{vol}(S)\, z \\ 0 \end{bmatrix}$. Proof: square and expand the objective into a Laplacian, then apply constraints.
16. PageRank → s-t min-cut. That equivalence works if $v$ is degree-weighted. What if $v$ is the uniform vector? $A(s) = \begin{bmatrix} 0 & \alpha s^T & 0 \\ \alpha s & A & \alpha (d - s) \\ 0 & \alpha (d - s)^T & 0 \end{bmatrix}$.
17. And beyond … Easy to cook up interesting diffusion-like problems and adapt them to this framework. In particular, Zhou et al. (2004) gave a semi-supervised learning diffusion we study soon: $\begin{bmatrix} 0 & e_S^T & 0 \\ e_S & \theta A & e_{\bar S} \\ 0 & e_{\bar S}^T & 0 \end{bmatrix}$, $(I + \theta L) x = e_S$.
18. Back to the push method. Let $x$ be the output from the push method with $0 < \beta < 1$, $v = d_S/\mathrm{vol}(S)$, $\rho = 1$, and $\tau > 0$. Set $\alpha = (1-\beta)/\beta$, $\kappa = \tau\, \mathrm{vol}(S)/\beta$, and let $z_G$ solve: minimize $\tfrac{1}{2} \|B_S z\|_{C(\alpha),2}^2 + \kappa \|Dz\|_1$ subject to $z_s = 1$, $z_t = 0$, $z \ge 0$, where $z = \begin{bmatrix} 1 \\ z_G \\ 0 \end{bmatrix}$. Then $x = D z_G/\mathrm{vol}(S)$. Proof: write out KKT conditions; show that the push method solves them. Slackness was “tricky”. Notes: regularization for sparsity; need for normalization.
19. … demo of equivalence …
20. This is a case of Algorithmic Anti-differentiation!
21. The ideal world. Given Problem P. Derive solution characterization C. Show algorithm A finds a solution where C holds. Profit? Example: given “min-cut”, derive “max-flow is equivalent to min-cut”, show push-relabel solves max-flow. Profit!
22. (The ideal world)’. Given Problem P. Derive solution approx. characterization C’. Show algorithm A’ quickly finds a solution where C’ holds. Profit? Example: given “sparsest-cut”, derive Rayleigh-quotient approximation, show power-method finds a good Rayleigh-quotient. Profit?
23. The real world? Given Task P. Hack around until you find something useful. Write paper presenting “novel heuristic” H for P and … Profit! Example: given “find-communities”, hack around ??? (hidden) ???, write paper presenting “three matvecs finds real-world communities”. Profit!
24. The real world. Given “find-communities”, hack around, write paper presenting “three matvecs finds real-world communities”. Profit! Understand why H works! Show heuristic H solves P’. Guess and check until you find something H solves. Derive characterization of heuristic H. Algorithmic Anti-differentiation: given heuristic H, is there a problem P’ such that H is an algorithm for P’? E.g. Mahoney & Orecchia.
25. The real world. Algorithmic Anti-differentiation: given heuristic H, is there a problem P’ such that H is an algorithm for P’? If your algorithm is related to optimization, this is: given a procedure X, what objective does it optimize? In an unconstrained case, this is just “anti-differentiation!”
26. Algorithmic Anti-differentiation in the literature. Dhillon et al. (2007): spectral clustering, trace minimization & kernel k-means. Saunders (1995): LSQR & Craig iterative methods.
27. Why does it matter? These details matter in many empirical studies, and can dramatically impact performance (speed or quality).
28. Semi-supervised Learning on Graphs. $A_{i,j} = \exp\left(-\frac{\|d_i - d_j\|_2^2}{2\sigma^2}\right)$ for data points $d_i$, $d_j$. [Figure: graphs for $\sigma = 2.5$ and $\sigma = 1.25$.] Zhou et al. NIPS (2003)
29. Semi-supervised Learning on Graphs. [Figure: graphs for $\sigma = 2.5$ and $\sigma = 1.25$.] Experiment: predict unlabeled images from the labeled ones.
30. Semi-supervised Learning on Graphs. Kernels: K1 = (I A) 1, K2 = (D A) 1, K3 = (Diag(Ae) A) 1. $Y = K_i L$, where $L$ holds the indicators on the revealed labels and $Y$ the predictions; $y = \operatorname{argmax}_j Y$. Annotation: our new “kernel”. Experiment: vary number of labeled images and track perf.
31. Semi-supervised Learning on Graphs. Kernels: K1 = (I A) 1, K2 = (D A) 1, K3 = (Diag(Ae) A) 1. $Y = K_i L$; $y = \operatorname{argmax}_j Y$. Experiment: vary number of labeled images and track perf. [Figure: error rate vs. number of labels for K1, K2, K3, and RK3 (regularized K3), $\sigma = 1.25$.] Zhou et al. NIPS (2004)
32. Semi-supervised Learning on Graphs. Kernels: K1 = (I A) 1, K2 = (D A) 1, K3 = (Diag(Ae) A) 1. $Y = K_i L$; $y = \operatorname{argmax}_j Y$. Experiment: vary number of labeled images and track perf. [Figure: regularized K3, $\sigma = 2.5$; plot annotations: “Our new value”, “Random guessing”.]
33. Semi-supervised Learning on Graphs. Kernels: K1 = (I A) 1, K2 = (D A) 1, K3 = (Diag(Ae) A) 1. $Y = K_i L$; $y = \operatorname{argmax}_j Y$. Experiment: vary number of labeled images and track perf. [Figure: error rate vs. number of labels for K1, K2, K3, and RK3 (regularized K3), $\sigma = 2.5$; plot annotations: “Our new value”, “Random guessing”.]
34. What’s happening? [Figure: two ROC panels, “2 vs. 1,2,3,4, σ=2.50” and “2 vs. 1,2,3,4, σ=1.25”; axes: false pos. vs. true pos.; curves: K1, K2, K3, RK3.] Much better performance!
35. The results of our regularized estimate. [Figure: plot of the regularized estimate.]
36. Why does it matter? Theory has the answer! We “sweep” over cuts from approximate eigenvectors! It’s the order, not the values.
37. Improved performance. Kernels: K1 = (I A) 1, K2 = (D A) 1, K3 = (Diag(Ae) A) 1. $Y = K_i L$ with regularized K3; $y = \operatorname{argmin}_j \mathrm{SortedRank}(Y)$. [Figure: error rate vs. number of labels for K1, K2, K3, RK3, $\sigma = 2.5$; plot annotation: “Our new value”.] We have spent no time tuning the reg. parameter.
38. Anti-differentiating Approximation Algorithms: Recap & Conclusions. New relationships between localized cuts & PageRank. New understanding of PPR push procedure. Improvements to semi-supervised learning on graphs! Open issues: better treatment of directed graphs? Algorithm for rho < 1? (rho set to ½ in most “uses”; need new analysis.) [Figure labels: 16 nonzeros, 15 nonzeros, 284 nonzeros, 24 nonzeros.]
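
The PageRank formulations on slides 3 and 4 can be checked numerically. The following is a minimal sketch, assuming a small hypothetical undirected graph stored as an adjacency dict; the names `pagerank_fixed_point` and `laplacian_residual` are illustrative and do not come from the talk. It iterates $x \leftarrow \beta A D^{-1} x + (1-\beta) v$ and then measures how well $z = D^{-1} x$ satisfies $(\alpha D + L) z = \alpha v$ with $\alpha = (1-\beta)/\beta$.

```python
def pagerank_fixed_point(adj, v, beta=0.85, iters=300):
    """Iterate x <- beta * A D^{-1} x + (1 - beta) * v on an undirected graph."""
    deg = {u: len(nbrs) for u, nbrs in adj.items()}
    x = dict(v)
    for _ in range(iters):
        nxt = {u: (1.0 - beta) * v[u] for u in adj}
        for j, nbrs in adj.items():
            share = beta * x[j] / deg[j]  # column j of A D^{-1} spreads x_j evenly
            for i in nbrs:
                nxt[i] += share
        x = nxt
    return x


def laplacian_residual(adj, x, v, beta=0.85):
    """Largest entry of |(alpha D + L) z - alpha v| with z = D^{-1} x."""
    alpha = (1.0 - beta) / beta
    deg = {u: len(nbrs) for u, nbrs in adj.items()}
    z = {u: x[u] / deg[u] for u in adj}
    worst = 0.0
    for i in adj:
        # Row i of (alpha D + L) z, using L = D - A.
        lhs = (alpha + 1.0) * deg[i] * z[i] - sum(z[j] for j in adj[i])
        worst = max(worst, abs(lhs - alpha * v[i]))
    return worst


# Hypothetical example graph: two triangles joined by the edge 2-3.
adj = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2, 4, 5], 4: [3, 5], 5: [3, 4]}
v = {u: 1.0 if u == 0 else 0.0 for u in adj}  # jump vector concentrated on node 0
x = pagerank_fixed_point(adj, v)
print(sum(x.values()), laplacian_residual(adj, x, v))
```

The fixed-point iteration is used here only because it needs no linear-algebra library; any linear solver applied to either system should give the same vector up to rounding.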
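
The push method on slide 5 can be sketched in a few lines. This is a hedged illustration rather than the authors' implementation: the function name `push_pagerank`, the queue-based choice of which node to push next, the `max_pushes` guard, and the example graph are all assumptions. The default `rho=0.5` follows the remark on the last slide that rho is set to ½ in most uses.

```python
from collections import deque


def push_pagerank(adj, seed, beta=0.85, tau=1e-3, rho=0.5, max_pushes=1_000_000):
    """Approximate personalized PageRank for `seed` by repeated push steps.

    adj maps each node to a list of neighbours (undirected, no self-loops).
    Returns (x, r): the sparse solution and residual, stored as dicts.
    """
    deg = {u: len(nbrs) for u, nbrs in adj.items()}
    x = {}
    r = {seed: 1.0 - beta}           # r^(1) = (1 - beta) e_seed
    queue = deque([seed])            # candidates with r_j > tau * d_j (assumed order)
    pushes = 0
    while queue and pushes < max_pushes:
        j = queue.popleft()
        rj = r.get(j, 0.0)
        if rj <= tau * deg[j]:
            continue                 # stale queue entry, nothing to push
        amount = rj - tau * deg[j] * rho
        x[j] = x.get(j, 0.0) + amount
        r[j] = tau * deg[j] * rho
        for i in adj[j]:             # neighbours i ~ j receive beta * amount / d_j
            r[i] = r.get(i, 0.0) + beta * amount / deg[j]
            if r[i] > tau * deg[i]:
                queue.append(i)
        pushes += 1
    return x, r


# Hypothetical example graph: two triangles joined by the edge 2-3.
adj = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2, 4, 5], 4: [3, 5], 5: [3, 4]}
x, r = push_pagerank(adj, seed=0)
print(sorted(x.items()))
```

Only nodes that were pushed at least once appear in `x`, which is the source of the sparsity noted on slide 7; on a six-node graph the effect is not visible, but the residual identity $r = (1-\beta) e_i - (I - \beta A D^{-1}) x$ holds after every step.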
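
The graph construction on slide 28 uses a Gaussian similarity between data points. A minimal sketch, assuming points are given as tuples of floats; the function name `gaussian_affinity` and the toy points are hypothetical, and whether the talk keeps or zeroes the diagonal is not stated, so this version simply evaluates the formula for every pair.

```python
import math


def gaussian_affinity(points, sigma):
    """A[i][j] = exp(-||d_i - d_j||_2^2 / (2 * sigma^2)) for all pairs of points."""
    n = len(points)
    A = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            sq_dist = sum((a - b) ** 2 for a, b in zip(points[i], points[j]))
            A[i][j] = math.exp(-sq_dist / (2.0 * sigma ** 2))
    return A


# Toy points (assumed for illustration); the first pair is at distance 5.
points = [(0.0, 0.0), (3.0, 4.0), (1.0, 1.0)]
A_wide = gaussian_affinity(points, sigma=2.5)
A_narrow = gaussian_affinity(points, sigma=1.25)
print(A_wide[0][1], A_narrow[0][1])
```

A smaller $\sigma$ makes distant points nearly disconnected, which is why the two values of $\sigma$ on the slides give noticeably different graphs.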
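
Slide 36 notes that a sweep over cuts depends on the order of the entries, not their values. The sketch below illustrates one common form of the sweep, assuming nodes are ordered by score divided by degree and each prefix set is rated by conductance; that normalization, the function name `sweep_cut`, and the scores are assumptions for illustration and are not a computed PageRank vector.

```python
def sweep_cut(adj, scores):
    """Return (best conductance, best prefix set) over the degree-normalized order."""
    deg = {u: len(nbrs) for u, nbrs in adj.items()}
    total_vol = sum(deg.values())
    order = sorted((u for u in scores if scores[u] > 0),
                   key=lambda u: scores[u] / deg[u], reverse=True)
    in_set, cut, vol = set(), 0, 0
    best = (float("inf"), frozenset())
    for u in order:
        inside = sum(1 for w in adj[u] if w in in_set)
        cut += deg[u] - 2 * inside   # edges into the set stop being cut, the rest start
        vol += deg[u]
        in_set.add(u)
        denom = min(vol, total_vol - vol)
        if denom == 0:
            break                    # the prefix is the whole graph
        phi = cut / denom
        if phi < best[0]:
            best = (phi, frozenset(in_set))
    return best


# Hypothetical example graph: two triangles joined by the edge 2-3.
adj = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2, 4, 5], 4: [3, 5], 5: [3, 4]}
scores = {0: 0.5, 1: 0.3, 2: 0.4, 3: 0.1, 4: 0.02, 5: 0.02}  # illustrative values
phi, best_set = sweep_cut(adj, scores)
print(phi, sorted(best_set))
```

Any monotone rescaling of `scores / deg` leaves the result unchanged, which is the sense in which only the order matters.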