# Anti-differentiating approximation algorithms: A case study with min-cuts, spectral, and flow

This talk covers the idea of anti-differentiating approximation algorithms, which is an idea to explain the success of widely used heuristic procedures. Formally, this involves finding an optimization problem solved exactly by an approximation algorithm or heuristic.

### Transcript

1. **Algorithmic Anti-Differentiation: a case study with min-cuts, spectral, and flow.** David F. Gleich (Purdue University) and Michael W. Mahoney (Berkeley ICSI). Code: www.cs.purdue.edu/homes/dgleich/codes/l1pagerank
2. **Algorithmic Anti-differentiation.** Understanding how and why heuristic procedures, such as early stopping, truncating small entries, and so on, are actually algorithms for implicit objectives. (ICML talk; David Gleich, Purdue.)
3. **The ideal world.** Given problem P, derive a solution characterization C, and show that algorithm A finds a solution where C holds. Profit! Example: given "min-cut," derive "max-flow is equivalent to min-cut," and show that push-relabel solves max-flow. Profit!
4. **(The ideal world)′.** Given problem P, derive an approximate solution characterization C′, and show that algorithm A′ finds a solution where C′ holds. Profit? (In academia!) Example: given "sparsest-cut," derive the Rayleigh-quotient approximation, and show that the power method finds a good Rayleigh quotient. Profit?
5. **The real world.** Given task P, hack around until you find something useful, then write a paper presenting a "novel heuristic" H for P, and profit! Example: given "find-communities," hack around (… hidden …), then write a paper on "three steps of the power method finds communities." Profit!
6. **(The ideal world)′′.** Understand why H works: show that heuristic H solves some problem P′ by deriving a characterization of the heuristic H, then guessing and checking until you find something that H solves exactly. Example: given "find-communities" and the paper "three steps of the power method finds communities," recover the implicit problem those three steps solve.
7. **Algorithmic Anti-differentiation.** Given a heuristic H, is there a problem P′ such that H is an algorithm for P′? If your algorithm is related to optimization, this asks: given a procedure X, what objective does it optimize? In the smooth, unconstrained case, this is just "anti-differentiation!"
8. **Algorithmic anti-differentiation in the literature.** Mahoney & Orecchia (2011): three steps of the power method and p-norm regularization. Dhillon et al. (2007): spectral clustering, trace minimization, and kernel k-means. Saunders (1995): the LSQR and Craig iterative methods for $Ax = b$. … and many more.
9. **Outline.** (1) A new derivation of the PageRank vector for an undirected graph based on Laplacians, cuts, or flows. (2) An understanding of the implicit regularization in the PageRank "push" method. (3) The impact of this on a few applications.
10. **The PageRank problem.** The PageRank random surfer: (1) with probability $\beta$, follow a random-walk step; (2) with probability $1-\beta$, jump randomly according to a distribution $v$. Goal: find the stationary distribution $x$. With symmetric adjacency matrix $A$, diagonal degree matrix $D$, and jump vector $v$, the solution solves
$$(I - \beta A D^{-1})\, x = (1-\beta)\, v.$$
This is equivalent to the system with the combinatorial Laplacian $L = D - A$:
$$[\alpha D + L]\, z = \alpha v, \quad \text{where } \beta = 1/(1+\alpha) \text{ and } x = Dz.$$
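The equivalence between the two systems on this slide is easy to check numerically. The sketch below uses a small illustrative graph (the graph and parameter values are not from the talk):

```python
import numpy as np

# Hypothetical 4-node undirected graph for illustration.
A = np.array([[0., 1., 1., 0.],
              [1., 0., 1., 1.],
              [1., 1., 0., 1.],
              [0., 1., 1., 0.]])
d = A.sum(axis=1)                 # degrees
D = np.diag(d)
L = D - A                         # combinatorial Laplacian

beta = 0.85                       # random-walk probability
v = np.ones(4) / 4                # uniform jump vector

# Standard PageRank system: (I - beta * A * D^{-1}) x = (1 - beta) v
x = np.linalg.solve(np.eye(4) - beta * A @ np.diag(1 / d), (1 - beta) * v)

# Equivalent Laplacian form with alpha = (1 - beta) / beta, i.e. beta = 1/(1+alpha)
alpha = (1 - beta) / beta
z = np.linalg.solve(alpha * D + L, alpha * v)

print(np.allclose(x, D @ z))      # the two formulations agree: x = D z
```

Substituting $x = Dz$ into the first system and dividing by $\beta$ gives $((1+\alpha)D - A)z = \alpha v$, which is exactly $(\alpha D + L)z = \alpha v$.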
11. **The Push Algorithm for PageRank.** Proposed (in closest form) by Andersen, Chung & Lang (also by McSherry, and by Jeh & Widom) for personalized PageRank. Strongly related to Gauss–Seidel and coordinate descent; derived to approximate PageRank quickly while preserving sparsity. The push method, with parameters $\tau$, $\rho$:
    1. $x^{(1)} = 0$, $r^{(1)} = (1-\beta)\, e_i$, $k = 1$
    2. while any $r_j > \tau d_j$ ($d_j$ is the degree of node $j$):
    3. $\quad x^{(k+1)} = x^{(k)} + (r_j - \tau d_j \rho)\, e_j$
    4. $\quad r_i^{(k+1)} = \begin{cases} \tau d_j \rho & i = j \\ r_i^{(k)} + \beta (r_j - \tau d_j \rho)/d_j & i \sim j \\ r_i^{(k)} & \text{otherwise} \end{cases}$
    5. $\quad k \leftarrow k + 1$
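The steps above can be sketched as a short queue-based routine. This is a minimal reading of the slide's pseudocode, not the authors' code; the demo graph and parameter values are illustrative:

```python
import numpy as np
from collections import deque

def push_pagerank(A, seed, beta=0.5, tau=1e-4, rho=1.0):
    """Sketch of the push method from the slide (parameters tau, rho).

    x accumulates the approximation; r holds the residual. A node is
    pushed while its residual exceeds tau times its degree, so the
    iteration only ever touches nodes near the seed.
    """
    n = A.shape[0]
    d = A.sum(axis=1)
    x, r = np.zeros(n), np.zeros(n)
    r[seed] = 1.0 - beta
    queue = deque([seed])
    while queue:
        j = queue.popleft()
        if r[j] <= tau * d[j]:
            continue
        delta = r[j] - tau * d[j] * rho
        x[j] += delta                      # step 3: move mass into x_j
        r[j] = tau * d[j] * rho            # step 4, case i = j
        for i in np.nonzero(A[j])[0]:      # step 4, case i ~ j
            below = r[i] <= tau * d[i]
            r[i] += beta * delta / d[j]
            if below and r[i] > tau * d[i]:
                queue.append(i)            # i just crossed the threshold
    return x, r

# Demo on a hypothetical 8-node cycle graph, seeded at node 0.
n = 8
A = np.zeros((n, n))
for i in range(n):
    A[i, (i + 1) % n] = A[(i + 1) % n, i] = 1.0
x, r = push_pagerank(A, seed=0, beta=0.5, tau=1e-3)
```

At termination every residual satisfies $r_j \le \tau d_j$, so $x$ is a sparse under-approximation of the exact personalized PageRank vector.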
12. **The push method stays local.**
13. **Why do we care about push?** (1) It is used for empirical studies of "communities" and is an ingredient in an empirically successful community finder (Whang et al., CIKM 2013). (2) It is used for "fast PageRank" approximation. (3) It produces sparse approximations to PageRank: on Newman's netscience graph (379 vertices, 1828 non-zeros), with $v$ having a single one, the result is "zero" on most of the nodes.
14. **The s-t min-cut problem.** With $B$ the unweighted incidence matrix and $C$ the diagonal cost matrix:
$$\text{minimize } \|Bx\|_{C,1} = \sum_{ij \in E} C_{ij}\, |x_i - x_j| \quad \text{subject to } x_s = 1,\ x_t = 0,\ x \ge 0.$$
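For a 0/1 indicator vector, this objective is exactly the total weight of the edges crossing the cut, which is why minimizing it over relaxed $x$ recovers a min-cut. A minimal sketch on a made-up 4-node weighted graph (the graph is illustrative, not from the talk):

```python
import numpy as np

# Hypothetical weighted graph: two tight pairs {0,1} and {2,3}
# joined by a single light edge (0,2). C holds symmetric edge costs.
C = np.array([[0., 3., 1., 0.],
              [3., 0., 0., 0.],
              [1., 0., 0., 2.],
              [0., 0., 2., 0.]])
edges = [(i, j) for i in range(4) for j in range(i + 1, 4) if C[i, j] > 0]

def cut_objective(x):
    """The slide's objective ||Bx||_{C,1} = sum_{ij in E} C_ij |x_i - x_j|."""
    return sum(C[i, j] * abs(x[i] - x[j]) for i, j in edges)

x = np.array([1., 1., 0., 0.])   # indicator: the s-side is {0, 1}
print(cut_objective(x))           # only edge (0,2) crosses, so this is 1.0
```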
15. **The localized cut graph.** Related to a construction used in FlowImprove (Andersen & Lang, 2008) and by Orecchia & Zhu (2014). Connect $s$ to each vertex in $S$ with weight $\alpha \cdot \text{degree}$, and connect $t$ to each vertex in $\bar S$ with weight $\alpha \cdot \text{degree}$:
$$A_S = \begin{bmatrix} 0 & \alpha d_S^T & 0 \\ \alpha d_S & A & \alpha d_{\bar S} \\ 0 & \alpha d_{\bar S}^T & 0 \end{bmatrix}$$
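The construction is mechanical enough to sketch directly; the function name and demo graph below are illustrative, not from the talk:

```python
import numpy as np

def localized_cut_graph(A, S, alpha):
    """Sketch of the slide's augmented graph A_S: a source s (index 0)
    wired to each vertex of S with weight alpha * degree, and a sink t
    (last index) wired to the complement the same way."""
    n = A.shape[0]
    d = A.sum(axis=1)
    in_S = np.zeros(n, dtype=bool)
    in_S[list(S)] = True
    AS = np.zeros((n + 2, n + 2))
    AS[1:n + 1, 1:n + 1] = A                  # original graph in the middle block
    AS[0, 1:n + 1] = alpha * d * in_S         # s -- S edges, weight alpha * d_i
    AS[1:n + 1, 0] = AS[0, 1:n + 1]
    AS[n + 1, 1:n + 1] = alpha * d * (~in_S)  # t -- complement edges
    AS[1:n + 1, n + 1] = AS[n + 1, 1:n + 1]
    return AS

# Tiny demo: a 4-node path with S = {0, 1}.
A = np.diag(np.ones(3), 1) + np.diag(np.ones(3), -1)
AS = localized_cut_graph(A, {0, 1}, alpha=0.5)
```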
16. **The localized cut graph & PageRank.** Solve the s-t min-cut:
$$\text{minimize } \|B_S x\|_{C(\alpha),1} \quad \text{subject to } x_s = 1,\ x_t = 0,\ x \ge 0.$$
17. **The localized cut graph & PageRank.** Now solve the "spectral" s-t min-cut:
$$\text{minimize } \|B_S x\|_{C(\alpha),2} \quad \text{subject to } x_s = 1,\ x_t = 0,\ x \ge 0.$$
The PageRank vector $z$ that solves $(\alpha D + L)\, z = \alpha v$ with $v = d_S/\operatorname{vol}(S)$ is a renormalized solution of this electrical cut computation. Specifically, if $x$ is the solution, then
$$x = \begin{bmatrix} 1 \\ \operatorname{vol}(S)\, z \\ 0 \end{bmatrix}.$$
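This renormalization claim can be verified numerically. In the 2-norm problem, the interior vertices of the localized cut graph are harmonic given $x_s = 1$, $x_t = 0$, which reduces to $((1+\alpha)D - A)\,x = \alpha d_S$; since $(1+\alpha)D - A = \alpha D + L$, the interior solution is $\operatorname{vol}(S)\,z$. A sketch on a hypothetical 5-node path graph (graph and parameters are illustrative):

```python
import numpy as np

# Check: PageRank solve with v = d_S / vol(S) matches the electrical
# (2-norm) s-t problem on the localized cut graph, up to vol(S) scaling.
n = 5
A = np.zeros((n, n))
for i in range(n - 1):
    A[i, i + 1] = A[i + 1, i] = 1.0
d = A.sum(axis=1)
D = np.diag(d)
L = D - A
beta = 0.85
alpha = (1 - beta) / beta
S = np.array([True, True, False, False, False])
volS = d[S].sum()

# PageRank-style system (alpha*D + L) z = alpha*v with v = d_S / vol(S).
v = np.where(S, d, 0.0) / volS
z = np.linalg.solve(alpha * D + L, alpha * v)

# Electrical problem: interior vertices are harmonic given x_s = 1, x_t = 0,
# which gives ((1 + alpha) D - A) x = alpha * d_S on the interior.
x_int = np.linalg.solve((1 + alpha) * D - A, alpha * np.where(S, d, 0.0))

print(np.allclose(x_int, volS * z))   # → True
```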
18. **Back to the push method.** Let $x$ be the output of the push method with $0 < \beta < 1$, $v = d_S/\operatorname{vol}(S)$, $\rho = 1$, and $\tau > 0$. Set $\alpha = (1-\beta)/\beta$ and $\kappa = \tau \operatorname{vol}(S)/\beta$, and let $z_G$ solve
$$\text{minimize } \tfrac12 \|B_S z\|^2_{C(\alpha),2} + \kappa \|Dz\|_1 \quad \text{subject to } z_s = 1,\ z_t = 0,\ z \ge 0,$$
where $z = [1;\ z_G;\ 0]$. Then $x = D z_G / \operatorname{vol}(S)$. Proof: write out the KKT conditions and show that the push method solves them; the slackness condition was "tricky." The 1-norm term is regularization for sparsity, and the $\operatorname{vol}(S)$ factor is the needed normalization.
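To see the sparsity the theorem's 1-norm term induces, one can solve the regularized interior problem directly, here by projected proximal gradient rather than push (this is NOT the push method, just a direct solver for comparison; the graph, seed set, and parameters are illustrative):

```python
import numpy as np

# Interior form of the theorem's objective over z >= 0:
#   1/2 z^T (alpha D + L) z - alpha dS^T z + kappa ||D z||_1
n = 10
A = np.zeros((n, n))
for i in range(n - 1):
    A[i, i + 1] = A[i + 1, i] = 1.0      # a 10-node path graph
d = A.sum(axis=1)
D = np.diag(d)
L = D - A

alpha, kappa = 1.0, 0.1                  # alpha = (1-beta)/beta with beta = 1/2
dS = np.where(np.arange(n) < 2, d, 0.0)  # seed set S = {0, 1}

Q = alpha * D + L
b = alpha * dS
eta = 1.0 / (3 * d.max())                # step below 1/lambda_max(Q) (Gershgorin)
z = np.zeros(n)
for _ in range(5000):
    # gradient step on the smooth part plus the linear 1-norm penalty
    # (linear because z >= 0), then projection onto the nonnegative orthant
    z = np.maximum(0.0, z - eta * (Q @ z - b + kappa * d))

z_pr = np.linalg.solve(Q, b)             # unregularized PageRank-style solution
print(np.count_nonzero(z > 1e-12), "vs", np.count_nonzero(z_pr > 1e-12))
```

The regularized solution is supported only near the seed set, while the unregularized solve is dense: the same sparsity push delivers implicitly.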
19. **A simple example.** The vectors $x_{pr}$, $z$, and $x(\alpha, S)$ are the PageRank vectors from Theorem 1, where $x(\alpha, S)$ solves Prob. (4) and the others are from the problems at the end of Section 2. The vector $x_{cut}$ solves the cut Prob. (2), and $z_G$ solves Prob. (6).

    | Deg. | $x_{pr}$ | $z$ | $x(\alpha, S)$ | $x_{cut}$ | $z_G$ |
    |------|----------|--------|----------------|-----------|--------|
    | 2    | 0.0788   | 0.0394 | 0.8276         | 1         | 0.2758 |
    | 4    | 0.1475   | 0.0369 | 0.7742         | 1         | 0.2437 |
    | 7    | 0.2362   | 0.0337 | 0.7086         | 1         | 0.2138 |
    | 4    | 0.1435   | 0.0359 | 0.7533         | 1         | 0.2325 |
    | 4    | 0.1297   | 0.0324 | 0.6812         | 1         | 0.1977 |
    | 7    | 0.1186   | 0.0169 | 0.3557         | 0         | 0      |
    | 3    | 0.0385   | 0.0128 | 0.2693         | 0         | 0      |
    | 2    | 0.0167   | 0.0083 | 0.1749         | 0         | 0      |
    | 4    | 0.0487   | 0.0122 | 0.2554         | 0         | 0      |
    | 3    | 0.0419   | 0.0140 | 0.2933         | 0         | 0      |

    The solution of Prob. (6) (an $\ell_1$-regularized $\ell_2$ regression problem) has 24 non-zeros. The true "min-cut" set is large in both the 2-norm PageRank problem and the regularized problem. Thus we identify the underlying graph feature correctly, but the implicitly regularized ACL procedure does so with many fewer non-zeros than the vanilla PageRank procedure.
20. **Figure.** Examples of the different cut vectors on a portion of the netscience graph. The left subfigure shows the set $S$ highlighted with its vertices enlarged; the other subfigures show the solution vectors from the various cut problems (from left to right, Probs. (2), (4), and (6), solved with min-cut, PageRank, and ACL) for this set $S$, with 16, 15, 284, and 24 non-zeros respectively. Each vector determines the color and size of a vertex, where high values are large and dark; white vertices with outlines are numerically non-zero (which is why most vertices in the fourth subfigure are outlined, in contrast to the third). The true min-cut set is large in all vectors, but the implicitly regularized problem achieves this with many fewer non-zeros than the vanilla PageRank problem. Push's sparsity helps it identify the "right" graph feature with fewer non-zeros.
21. **It's easy to make this apply broadly.** It is easy to cook up interesting diffusion-like problems and adapt them to this framework. In particular, Zhou et al. (2004) gave a semi-supervised learning diffusion we are currently studying, with the augmented graph
$$\begin{bmatrix} 0 & e_S^T & 0 \\ e_S & \theta A & e_{\bar S} \\ 0 & e_{\bar S}^T & 0 \end{bmatrix}$$
and the problems
$$\text{minimize } \tfrac12 \|B_S \hat x\|_2^2 + \kappa \|\hat x\|_1 \quad \text{subject to } \hat x_s = 1,\ \hat x_t = 0,\ \hat x \ge 0$$
$$\text{minimize } \tfrac12 x^T (I + \theta L)\, x - x^T e_S + \kappa \|x\|_1 \quad \text{subject to } x \ge 0.$$
22. **Recap & conclusions.** (1) "Defined" algorithmic anti-differentiation to understand why heuristics work. (2) Found the equivalence with PageRank and cut/flow. (3) Push & 1-norm regularization. Key point: we don't solve the 1-norm regularized problem with a 1-norm solver, but with the efficient push method; run push, and you get 1-norm regularization with early stopping. Open issues: a better treatment of directed graphs? An algorithm for $\rho < 1$? ($\rho$ is set to 1/2 in most "uses" and needs a new analysis; coming soon.) Improvements to semi-supervised learning on graphs. Supported by NSF CAREER 1149756-CCF. www.cs.purdue.edu/homes/dgleich
23. **PageRank → s-t min-cut.** That equivalence works if $s$ is degree-weighted. What if $s$ is the uniform vector?
$$A(s) = \begin{bmatrix} 0 & \alpha s^T & 0 \\ \alpha s & A & \alpha (d - s) \\ 0 & \alpha (d - s)^T & 0 \end{bmatrix}$$
(MMDS 2014)