Anti-differentiating approximation algorithms: A case study with min-cuts, spectral, and flow

This talk covers anti-differentiating approximation algorithms, an approach to explaining the success of widely used heuristic procedures. Formally, this means finding an optimization problem that a given approximation algorithm or heuristic solves exactly.



  1. Algorithmic Anti-Differentiation: a case study with min-cuts, spectral, and flow. David F. Gleich · Purdue University; Michael W. Mahoney · Berkeley ICSI. Code: www.cs.purdue.edu/homes/dgleich/codes/l1pagerank
  2. Algorithmic Anti-differentiation: understanding how and why heuristic procedures (early stopping, truncating small entries, etc.) are actually algorithms for implicit objectives.
  3. The ideal world. Given problem P, derive a solution characterization C, and show algorithm A finds a solution where C holds. Profit? Example: given "min-cut", derive "max-flow is equivalent to min-cut", and show push-relabel solves max-flow. Profit!
  4. (The ideal world)'. Given problem P, derive an approximate solution characterization C', and show algorithm A' finds a solution where C' holds. Profit? (In academia!) Example: given "sparsest-cut", derive the Rayleigh-quotient approximation, and show the power method finds a good Rayleigh quotient. Profit?
  5. The real world. Given task P, hack around until you find something useful, write a paper presenting a "novel heuristic" H for P, and profit! Example: given "find-communities", hack around (... hidden ...), write a paper on "three steps of the power method finds communities". Profit!
  6. (The ideal world)''. Understand why H works: guess and check until you find a problem P' that H solves, show that heuristic H solves P', and derive a characterization of the heuristic H. Example: given "find-communities", hack around, write a paper on "three steps of the power method finds communities". Profit!
  7. Algorithmic Anti-differentiation. Given a heuristic H, is there a problem P' such that H is an algorithm for P'? If your algorithm is related to optimization, this asks: given a procedure X, what objective does it optimize? In the smooth, unconstrained case, this is just "anti-differentiation"!
  8. Algorithmic Anti-differentiation in the literature. Mahoney & Orecchia (2011): three steps of the power method and p-norm regularization. Dhillon et al. (2007): spectral clustering, trace minimization, and kernel k-means. Saunders (1995): LSQR and Craig iterative methods for Ax = b. ... many more ...
  9. Outline. (1) A new derivation of the PageRank vector for an undirected graph based on Laplacians, cuts, or flows. (2) An understanding of the implicit regularization of the PageRank "push" method. (3) The impact of this on a few applications.
  10. The PageRank problem. The PageRank random surfer: (1) with probability $\beta$, follow a random-walk step; (2) with probability $1-\beta$, jump randomly according to the distribution $v$. Goal: find the stationary distribution $x$. With symmetric adjacency matrix $A$, diagonal degree matrix $D$, and jump vector $v$, the solution satisfies $(I - \beta A D^{-1})x = (1-\beta)v$. This is equivalent to the combinatorial-Laplacian system $[\alpha D + L]z = \alpha v$, where $\beta = 1/(1+\alpha)$ and $x = Dz$.
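As a concrete illustration, here is a minimal dense-matrix sketch (my own, not the talk's posted code) of both formulations; `A` is assumed to be a symmetric NumPy adjacency matrix with no isolated vertices and `v` a probability vector. The two functions should return the same $x$.

```python
import numpy as np

def pagerank(A, v, beta=0.85):
    # Solve (I - beta * A * D^{-1}) x = (1 - beta) v directly.
    d = A.sum(axis=1)                         # degrees; assumes no isolated vertices
    n = len(d)
    M = np.eye(n) - beta * (A / d[None, :])   # A D^{-1} scales column j by 1/d_j
    return np.linalg.solve(M, (1 - beta) * v)

def pagerank_laplacian(A, v, beta=0.85):
    # Equivalent form: (alpha D + L) z = alpha v, with beta = 1/(1+alpha) and x = D z.
    d = A.sum(axis=1)
    L = np.diag(d) - A                        # combinatorial Laplacian
    alpha = (1 - beta) / beta
    z = np.linalg.solve(alpha * np.diag(d) + L, alpha * v)
    return d * z                              # x = D z
```

The equivalence follows by substituting $x = Dz$ and dividing by $\beta$: $(1/\beta)Dz - Az = \alpha v$ becomes $(\alpha D + L)z = \alpha v$ since $1/\beta = 1 + \alpha$.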
  11. The Push Algorithm for PageRank. Proposed (in closest form) in Andersen, Chung, and Lang (also by McSherry, and Jeh & Widom) for personalized PageRank. Strongly related to Gauss-Seidel and coordinate descent; derived to quickly approximate PageRank with sparsity. The push method with parameters $\tau$, $\rho$:
      1. $x^{(1)} = 0$, $r^{(1)} = (1-\beta)e_i$, $k = 1$
      2. while any $r_j > \tau d_j$ ($d_j$ is the degree of node $j$):
      3.     $x^{(k+1)} = x^{(k)} + (r_j - \tau d_j \rho)e_j$
      4.     $r^{(k+1)}_i = \tau d_j \rho$ if $i = j$; $r^{(k)}_i + \beta(r_j - \tau d_j \rho)/d_j$ if $i \sim j$; $r^{(k)}_i$ otherwise
      5.     $k \leftarrow k + 1$
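A queue-based transcription of the loop above (a sketch under the slide's notation, not the authors' reference code; sparse dictionaries stand in for $x$ and $r$, and `adj`/`deg` are assumed adjacency lists and degrees):

```python
from collections import deque

def push_pagerank(adj, deg, seed, beta=0.85, tau=1e-4, rho=1.0):
    # adj[j]: iterable of neighbors of j; deg[j]: degree of j.
    x, r = {}, {seed: 1.0 - beta}       # sparse iterate x; residual r = (1-beta) e_seed
    queue = deque([seed])
    while queue:                        # implements "while any r_j > tau * d_j"
        j = queue.popleft()
        rj = r.get(j, 0.0)
        if rj <= tau * deg[j]:
            continue                    # stale queue entry; nothing to push
        delta = rj - tau * deg[j] * rho # mass moved onto x_j this step
        x[j] = x.get(j, 0.0) + delta
        r[j] = tau * deg[j] * rho
        for i in adj[j]:                # each neighbor receives beta * delta / d_j
            r[i] = r.get(i, 0.0) + beta * delta / deg[j]
            if r[i] > tau * deg[i]:
                queue.append(i)
    return x                            # zero off the visited set, hence sparse
```

Because each push only touches one node and its neighbors, the work stays local to the seed, which is the point of the next slide.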
  12. The push method stays local.
  13. Why do we care about push? (1) It is used for empirical studies of "communities" and is an ingredient in an empirically successful community finder (Whang et al., CIKM 2013). (2) It is used for "fast PageRank" approximation. (3) It produces sparse approximations to PageRank: on Newman's netscience graph (379 vertices, 1828 non-zeros), with $v$ a single one at a single node, the result is "zero" on most of the nodes.
  14. The s-t min-cut problem. With unweighted incidence matrix $B$ and diagonal cost matrix $C$: minimize $\|Bx\|_{C,1} = \sum_{ij \in E} C_{i,j}|x_i - x_j|$ subject to $x_s = 1$, $x_t = 0$, $x \ge 0$.
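Since the objective is a weighted 1-norm of edge differences, this is a linear program. Here is an illustrative cvxpy sketch (a generic-solver stand-in of my own, not how one would compute min-cuts at scale):

```python
import cvxpy as cp

def st_mincut_relaxation(B, c, s, t):
    # B: |E| x |V| signed incidence matrix; c: nonnegative edge costs (C's diagonal).
    n = B.shape[1]
    x = cp.Variable(n)
    objective = cp.Minimize(c @ cp.abs(B @ x))   # sum_{ij in E} C_ij |x_i - x_j|
    problem = cp.Problem(objective, [x[s] == 1, x[t] == 0, x >= 0])
    problem.solve()
    return x.value
```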
  15. The localized cut graph. Related to a construction used in "FlowImprove", Andersen & Lang (2007), and in Orecchia & Zhu (2014). Connect $s$ to the vertices in $S$ with weight $\alpha \cdot \text{degree}$, and connect $t$ to the vertices in $\bar{S}$ with weight $\alpha \cdot \text{degree}$:
$$A_S = \begin{bmatrix} 0 & \alpha d_S^T & 0 \\ \alpha d_S & A & \alpha d_{\bar S} \\ 0 & \alpha d_{\bar S}^T & 0 \end{bmatrix}$$
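A dense sketch of this construction (my own helper, with node 0 as $s$ and node $n+1$ as $t$):

```python
import numpy as np

def localized_cut_graph(A, S, alpha):
    # Node 0 is s, nodes 1..n are the original graph, node n+1 is t.
    n = A.shape[0]
    d = A.sum(axis=1)
    mask = np.zeros(n)
    mask[list(S)] = 1.0
    dS, dSbar = d * mask, d * (1.0 - mask)           # degrees on S and its complement
    AS = np.zeros((n + 2, n + 2))
    AS[0, 1:n+1] = AS[1:n+1, 0] = alpha * dS         # s -- S edges, weight alpha * degree
    AS[1:n+1, 1:n+1] = A                             # the original graph in the middle
    AS[n+1, 1:n+1] = AS[1:n+1, n+1] = alpha * dSbar  # t -- complement edges
    return AS
```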
  16. The localized cut graph & PageRank. Solve the s-t min-cut on it: minimize $\|B_S x\|_{C(\alpha),1}$ subject to $x_s = 1$, $x_t = 0$, $x \ge 0$.
  17. The localized cut graph & PageRank. Solve the "spectral" s-t min-cut instead: minimize $\|B_S x\|_{C(\alpha),2}$ subject to $x_s = 1$, $x_t = 0$, $x \ge 0$. The PageRank vector $z$ that solves $(\alpha D + L)z = \alpha v$ with $v = d_S/\mathrm{vol}(S)$ is a renormalized solution of this electrical cut computation: if $x$ is the solution, then $x = [1;\ \mathrm{vol}(S)z;\ 0]$.
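One way to check this claim numerically (my sketch, not from the talk): the weighted 2-norm problem is a Laplacian linear system, so solve it by harmonic extension on the localized cut graph and compare the interior entries against $\mathrm{vol}(S)\,z$ from the PageRank system with $v = d_S/\mathrm{vol}(S)$.

```python
import numpy as np

def electrical_cut(AS):
    # Minimize ||B_S x||^2_{C(alpha),2} with x_s = 1, x_t = 0: the solution is a
    # harmonic extension on the Laplacian of the localized cut graph AS.
    m = AS.shape[0]
    L = np.diag(AS.sum(axis=1)) - AS
    inner = np.arange(1, m - 1)       # all nodes except s (index 0) and t (index m-1)
    x = np.zeros(m)
    x[0] = 1.0                        # boundary conditions x_s = 1, x_t = 0
    x[inner] = np.linalg.solve(L[np.ix_(inner, inner)], -L[inner, 0])
    return x
```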
  18. Back to the push method. Theorem: let $x$ be the output from the push method with $0 < \beta < 1$, $v = d_S/\mathrm{vol}(S)$, $\rho = 1$, and $\tau > 0$. Set $\alpha = (1-\beta)/\beta$ and $\kappa = \tau\,\mathrm{vol}(S)/\beta$, and let $z_G$ solve: minimize $\tfrac{1}{2}\|B_S z\|^2_{C(\alpha),2} + \kappa\|Dz\|_1$ subject to $z_s = 1$, $z_t = 0$, $z \ge 0$, where $z = [1;\ z_G;\ 0]$. Then $x = D z_G/\mathrm{vol}(S)$. The 1-norm term is regularization for sparsity, and the division by $\mathrm{vol}(S)$ is the needed normalization. Proof: write out the KKT conditions and show that the push method solves them; the complementary slackness part was "tricky".
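To sanity-check the theorem, one could solve the regularized problem directly with a convex solver and compare against push. A cvxpy sketch of mine, where `BS` and `c` are the assumed incidence matrix and edge weights $C(\alpha)$ of the localized cut graph, `d` the original degrees, and the caller supplies $\alpha = (1-\beta)/\beta$ and $\kappa = \tau\,\mathrm{vol}(S)/\beta$:

```python
import cvxpy as cp

def regularized_spectral_cut(BS, c, d, kappa):
    # BS: incidence matrix of the localized cut graph (s first, t last);
    # c: edge weights C(alpha); d: degrees of the original graph.
    n = len(d)
    z = cp.Variable(n + 2)
    quad = 0.5 * cp.sum(cp.multiply(c, cp.square(BS @ z)))  # 0.5 ||B_S z||^2_{C(alpha),2}
    reg = kappa * cp.norm1(cp.multiply(d, z[1:n+1]))        # kappa ||D z_G||_1
    problem = cp.Problem(cp.Minimize(quad + reg),
                         [z[0] == 1, z[n+1] == 0, z >= 0])
    problem.solve()
    return z.value[1:n+1]                                   # z_G, the graph-node entries
```

The theorem then predicts that `d * regularized_spectral_cut(...) / vol_S` matches the push output entrywise, up to solver tolerance.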
  19. A simple example. The vectors $x_{pr}$, $z$, and $x(\alpha, S)$ are the PageRank vectors from Theorem 1, where $x(\alpha, S)$ solves Prob. (4) and the others are from the problems at the end of Section 2. The vector $x_{cut}$ solves the cut Prob. (2), and $z_G$ solves Prob. (6).

      Deg.   x_pr     z        x(α,S)   x_cut   z_G
      2      0.0788   0.0394   0.8276   1       0.2758
      4      0.1475   0.0369   0.7742   1       0.2437
      7      0.2362   0.0337   0.7086   1       0.2138
      4      0.1435   0.0359   0.7533   1       0.2325
      4      0.1297   0.0324   0.6812   1       0.1977
      7      0.1186   0.0169   0.3557   0       0
      3      0.0385   0.0128   0.2693   0       0
      2      0.0167   0.0083   0.1749   0       0
      4      0.0487   0.0122   0.2554   0       0
      3      0.0419   0.0140   0.2933   0       0

      The solution of Prob. (6), an $\ell_1$-regularized $\ell_2$ regression problem, has 24 non-zeros. The true "min-cut" set is large in both the 2-norm PageRank problem and the regularized problem. Thus, we identify the underlying graph feature correctly, but the implicitly regularized ACL procedure does so with many fewer non-zeros than the vanilla PageRank procedure.
  20. [Figure: four views of a portion of the netscience graph with its vertices enlarged. From left to right: the set S (16 non-zeros), the min-cut solution of Prob. (2) (15 non-zeros), the PageRank solution of Prob. (4) (284 non-zeros), and the ACL/push solution of Prob. (6) (24 non-zeros). Each vector determines the color and size of a vertex; high values are large and dark, and white vertices with outlines are numerically non-zero. The true min-cut set is large in all vectors, but the implicitly regularized problem achieves this with many fewer non-zeros than the vanilla PageRank problem.] Push's sparsity helps it identify the "right" graph feature with fewer non-zeros.
  21. It's easy to make this apply broadly. It is easy to cook up interesting diffusion-like problems and adapt them to this framework. In particular, Zhou et al. (2004) gave a semi-supervised learning diffusion we are currently studying, with cut graph
$$\begin{bmatrix} 0 & e_S^T & 0 \\ e_S & \theta A & e_{\bar S} \\ 0 & e_{\bar S}^T & 0 \end{bmatrix}$$
      giving: minimize $\tfrac{1}{2}\|B_S \hat{x}\|_2^2 + \kappa\|\hat{x}\|_1$ subject to $\hat{x}_s = 1$, $\hat{x}_t = 0$, $\hat{x} \ge 0$; equivalently, minimize $\tfrac{1}{2}x^T(I + \theta L)x - x^T e_S + \kappa\|x\|_1$ subject to $x \ge 0$.
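For completeness, a cvxpy sketch (my own, again an illustration rather than the authors' code) of the second, unconstrained-boundary form of this objective:

```python
import cvxpy as cp
import numpy as np

def zhou_style_diffusion(A, S, theta, kappa):
    # minimize 0.5 x^T (I + theta L) x - x^T e_S + kappa ||x||_1  subject to x >= 0
    n = A.shape[0]
    L = np.diag(A.sum(axis=1)) - A       # combinatorial Laplacian
    eS = np.zeros(n)
    eS[list(S)] = 1.0                    # indicator of the seed set S
    x = cp.Variable(n)
    objective = cp.Minimize(0.5 * cp.quad_form(x, np.eye(n) + theta * L)
                            - eS @ x + kappa * cp.norm1(x))
    cp.Problem(objective, [x >= 0]).solve()
    return x.value
```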
  22. Recap & Conclusions. (1) "Defined" algorithmic anti-differentiation to understand why heuristics work. (2) Found equivalences with PageRank and cut/flow. (3) Push & 1-norm regularization. Key point: we don't solve the 1-norm regularized problem with a 1-norm solver, but with the efficient push method; run push, and you get a 1-norm regularizer with early stopping. Open issues: a better treatment of directed graphs? An algorithm for $\rho < 1$? ($\rho$ is set to 1/2 in most "uses" and needs a new analysis; coming soon.) Improvements to semi-supervised learning on graphs. Supported by NSF CAREER 1149756-CCF. www.cs.purdue.edu/homes/dgleich
  23. PageRank → s-t min-cut. That equivalence works if $s$ is degree-weighted. What if $s$ is the uniform vector?
$$A(s) = \begin{bmatrix} 0 & \alpha s^T & 0 \\ \alpha s & A & \alpha(d-s) \\ 0 & \alpha(d-s)^T & 0 \end{bmatrix}$$
