This talk covers anti-differentiating approximation algorithms, an approach to explaining the success of widely used heuristic procedures. Formally, this involves finding an optimization problem that an approximation algorithm or heuristic solves exactly.
Big data matrix factorizations and overlapping community detection in graphs · David Gleich
In a talk at the Chinese Academy of Sciences Institute of Automation, I discuss some of the MapReduce and community detection methods I've worked on.
Fast relaxation methods for the matrix exponential · David Gleich
The matrix exponential is a matrix computation primitive used in link prediction and community detection. We describe a fast method to compute it using relaxation on a large linear system of equations. This lets us compute a column of the matrix exponential in sublinear time, or under a second on a standard desktop computer.
A copy of my slides from the SILO Seminar at UW Madison on our recent developments for the NEO-K-Means methods, including new optimization routines and results.
Localized methods for diffusions in large graphs · David Gleich
I describe a few ongoing research projects on diffusions in large graphs and how matrix computations let us determine them efficiently.
Localized methods in graph mining exploit the local structures in a graph instead of attempting to find global structures. They are widely successful at problems including community detection and label propagation.
PageRank centrality of dynamic graph structures · David Gleich
A talk I gave at the SIAM Annual Meeting mini-symposium on the mathematics of the power grid, organized by Mahantesh Halappanavar. I discuss a few ideas on how our dynamic centrality could help analyze such situations.
Anti-differentiating Approximation Algorithms: PageRank and MinCut · David Gleich
We study how Google's PageRank method relates to min-cut and a particular type of electrical flow in a network. We also explain how the "push method" for computing PageRank accelerates it. This has implications for semi-supervised learning and machine learning, as well as social network analysis.
Spacey random walks and higher-order Markov chains · David Gleich
My talk at the SIAM NetSci workshop (2015) on our new spacey random walk and spacey random surfer models and how we derived them. There are many potential extensions and opportunities to use this for analyzing big data as tensors.
Spacey random walks and higher-order data analysis · David Gleich
My talk at TMA 2016 (The workshop on Tensors, Matrices, and their Applications) on the relationship between a spacey random walk process and tensor eigenvectors.
Higher-order organization of complex networks · David Gleich
A talk I gave at the Park City Mathematics Institute about our recent work on using motifs to analyze and cluster networks. This involves a higher-order Cheeger inequality in terms of motifs.
Spectral clustering with motifs and higher-order structures · David Gleich
I presented these slides at the #strathna meeting in Glasgow in June 2017. They are an updated and enhanced version of the earlier talks on the subject.
Correlation clustering and community detection in graphs and networks · David Gleich
We show a new relationship between various community detection objectives and a correlation clustering framework. This enables us to detect communities with good bounds on the solution quality.
Using Local Spectral Methods to Robustify Graph-Based Learning · David Gleich
This is my KDD 2015 talk on robustness in semi-supervised learning. The paper is on Michael Mahoney's website: http://www.stat.berkeley.edu/~mmahoney/pubs/robustifying-kdd15.pdf See the KDD paper for all the details, which this talk is a bit light on.
Gaps between the theory and practice of large-scale matrix-based network comp... · David Gleich
I discuss some runtimes for the personalized PageRank vector and how they relate to open questions in how we should tackle these network-based measures via matrix computations.
Relaxation methods for the matrix exponential on large networks · David Gleich
My talk from the Stanford ICME seminar series on network analysis and link prediction using a fast algorithm for the matrix exponential on graph problems.
Tensor Train (TT) decomposition [3] is a generalization of the SVD from matrices to tensors (multidimensional arrays).
It represents a tensor compactly in terms of factors and allows one to work with the tensor via its factors without materializing the tensor itself.
For example, we can find the elementwise product of two TT-tensors with 2^100 entries each and get the result in the TT-format as well.
In the talk, we will show how Tensor Train decomposition can be used to represent the parameters of neural networks [1] and polynomial models [2].
This parametrization allows exponentially many 'virtual' parameters while working only with the small factors of the TT-format.
To train the model, i.e., to optimize the objective subject to the constraint that the parameters are in the TT-format, [2] uses stochastic Riemannian optimization.
[1] Novikov, A., Podoprikhin, D., Osokin, A., & Vetrov, D. P. (2015). Tensorizing neural networks. In Advances in Neural Information Processing Systems.
[2] Novikov, A., Trofimov, M., & Oseledets, I. (2016). Tensor Train polynomial models via Riemannian optimization. arXiv:1605.03795.
[3] Oseledets, I. (2011). Tensor-train decomposition. SIAM Journal on Scientific Computing.
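To make the factor-level arithmetic concrete, here is a minimal NumPy sketch of the TT format (my illustration, not the talk's code): a d-way tensor is stored as cores G_k of shape (r_{k-1}, n_k, r_k), and the elementwise product is computed core-by-core via Kronecker products of core slices, so the TT-ranks multiply but the full tensor is never needed.

```python
import numpy as np

def tt_full(cores):
    """Materialize a TT tensor (only sensible for tiny examples)."""
    out = cores[0]                                   # shape (1, n1, r1)
    for G in cores[1:]:
        out = np.tensordot(out, G, axes=([-1], [0]))
    return out.squeeze(axis=(0, -1))

def tt_hadamard(a_cores, b_cores):
    """Elementwise product in TT format: each new core slice is the
    Kronecker product of the corresponding slices (ranks multiply)."""
    out = []
    for Ga, Gb in zip(a_cores, b_cores):
        ra1, n, ra2 = Ga.shape
        rb1, _, rb2 = Gb.shape
        C = np.einsum('anb,cnd->acnbd', Ga, Gb)
        out.append(C.reshape(ra1 * rb1, n, ra2 * rb2))
    return out

rng = np.random.default_rng(0)
d, n, r = 4, 3, 2
shapes = [(1 if k == 0 else r, n, 1 if k == d - 1 else r) for k in range(d)]
A = [rng.standard_normal(s) for s in shapes]
B = [rng.standard_normal(s) for s in shapes]
C = tt_hadamard(A, B)       # computed purely on the small cores
# ... and it agrees with the materialized elementwise product:
assert np.allclose(tt_full(C), tt_full(A) * tt_full(B))
```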
In this talk we consider the question of how to use QMC with an empirical dataset, such as a set of points generated by MCMC. Using ideas from partitioning for parallel computing, we apply recursive bisection to reorder the points, and then interleave the bits of the QMC coordinates to select the appropriate point from the dataset. Numerical tests show that in the case of known distributions this is almost as effective as applying QMC directly to the original distribution. The same recursive bisection can also be used to thin the dataset, by recursively bisecting down to many small subsets of points, and then randomly selecting one point from each subset. This makes it possible to reduce the size of the dataset greatly without significantly increasing the overall error. Co-author: Fei Xie
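As a toy illustration of the reordering idea (my sketch, not the authors' code, and simplified to a one-dimensional index instead of the bit interleaving of multi-dimensional QMC coordinates described above): recursively bisect the point set along its widest coordinate, then stride through the resulting order with an equispaced index set, which is stratified with respect to the bisection tree.

```python
import numpy as np

def bisect_order(points, idx=None):
    """Order dataset indices by recursive median bisection,
    splitting each block along its widest coordinate."""
    if idx is None:
        idx = np.arange(len(points))
    if len(idx) <= 1:
        return list(idx)
    block = points[idx]
    dim = np.ptp(block, axis=0).argmax()          # widest coordinate
    order = idx[np.argsort(block[:, dim], kind='stable')]
    mid = len(order) // 2
    return bisect_order(points, order[:mid]) + bisect_order(points, order[mid:])

rng = np.random.default_rng(1)
pts = rng.standard_normal((1024, 2))              # stand-in for MCMC output
order = np.asarray(bisect_order(pts))
# The leading bits of an index into `order` walk the bisection tree, so an
# equispaced (QMC-like) index set picks one point per small subset:
u = (np.arange(64) + 0.5) / 64
sample = pts[order[(u * len(order)).astype(int)]]
print(sample.mean(axis=0))                        # estimate of the mean from 64 points
```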
Solving a system of algebraic equations by iteration methods · Akhtar Kamal
How to solve a system of algebraic equations by iteration methods, with a classification of the methods and worked examples of each:
(1) Jacobi's method
(2) Gauss-Seidel method
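For reference, a minimal NumPy sketch of the two methods named above, on a small strictly diagonally dominant system (a standard condition guaranteeing both iterations converge):

```python
import numpy as np

def jacobi(A, b, iters=50):
    x = np.zeros_like(b)
    D = np.diag(A)
    R = A - np.diagflat(D)
    for _ in range(iters):
        x = (b - R @ x) / D          # every component updated from the old x
    return x

def gauss_seidel(A, b, iters=50):
    x = np.zeros_like(b)
    n = len(b)
    for _ in range(iters):
        for i in range(n):            # each update uses the newest values
            x[i] = (b[i] - A[i, :i] @ x[:i] - A[i, i+1:] @ x[i+1:]) / A[i, i]
    return x

A = np.array([[4., 1., 0.], [1., 5., 2.], [0., 2., 6.]])
b = np.array([1., 2., 3.])
print(jacobi(A, b), gauss_seidel(A, b), np.linalg.solve(A, b))
```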
Animashree Anandkumar, Electrical Engineering and CS Dept, UC Irvine at MLcon... · MLconf
Anima Anandkumar has been a faculty member in the EECS Dept. at U.C. Irvine since August 2010. Her research interests are in the area of large-scale machine learning and high-dimensional statistics. She received her B.Tech in Electrical Engineering from IIT Madras in 2004 and her PhD from Cornell University in 2009. She was a visiting faculty member at Microsoft Research New England in 2012 and a postdoctoral researcher in the Stochastic Systems Group at MIT between 2009 and 2010. She is the recipient of the Microsoft Faculty Fellowship, the ARO Young Investigator Award, the NSF CAREER Award, and the IBM Fran Allen PhD Fellowship.
Hierarchical matrix techniques for maximum likelihood covariance estimation · Alexander Litvinenko
1. We apply hierarchical matrix techniques (HLIB, HLIBpro) to approximate huge covariance matrices. We are able to work with 250K-350K non-regular grid nodes.
2. We maximize a non-linear, non-convex Gaussian log-likelihood function to identify the hyper-parameters of the covariance.
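As a dense, small-scale stand-in for that computation (my sketch; the talk's point is that hierarchical matrices make the Cholesky and log-determinant feasible at 250K-350K nodes, which plain NumPy cannot do), here is the negative Gaussian log-likelihood for an assumed exponential covariance with hyper-parameters (sigma, ell), minimized with SciPy:

```python
import numpy as np
from scipy.optimize import minimize
from scipy.linalg import cho_factor, cho_solve

rng = np.random.default_rng(2)
n = 200
X = rng.uniform(size=(n, 2))                      # stand-in for grid nodes
D = np.linalg.norm(X[:, None] - X[None, :], axis=-1)

def neg_loglik(theta, z):
    sigma, ell = np.exp(theta)                    # enforce positivity
    C = sigma**2 * np.exp(-D / ell) + 1e-8 * np.eye(n)
    L, low = cho_factor(C, lower=True)
    quad = z @ cho_solve((L, low), z)             # z' C^{-1} z via Cholesky
    logdet = 2 * np.sum(np.log(np.diag(L)))       # log det C
    return 0.5 * (logdet + quad + n * np.log(2 * np.pi))

# synthetic data drawn with sigma = 1, ell = 0.3
Ctrue = np.exp(-D / 0.3)
z = np.linalg.cholesky(Ctrue + 1e-8 * np.eye(n)) @ rng.standard_normal(n)
fit = minimize(neg_loglik, x0=np.log([0.5, 0.1]), args=(z,), method='Nelder-Mead')
print(np.exp(fit.x))                              # estimated (sigma, ell)
```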
MapReduce Tall-and-skinny QR and applications · David Gleich
A talk at the Simons Institute workshop on Parallel and Distributed Algorithms for Inference and Optimization on how to do tall-and-skinny QR factorizations on MapReduce using a communication-avoiding algorithm.
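The communication-avoiding structure is easy to sketch serially (my illustration, not the talk's MapReduce code): QR each row block independently (the map step), stack the small R factors, and QR once more (the reduce step).

```python
import numpy as np

def tsqr_r(A, nblocks=4):
    """R factor of a tall-and-skinny A, without ever forming A'A."""
    blocks = np.array_split(A, nblocks, axis=0)
    Rs = [np.linalg.qr(B, mode='r') for B in blocks]   # map step
    return np.linalg.qr(np.vstack(Rs), mode='r')       # reduce step

A = np.random.default_rng(3).standard_normal((10000, 8))
R = tsqr_r(A)
Rd = np.linalg.qr(A, mode='r')
# R is unique up to the signs of its rows:
assert np.allclose(np.abs(R), np.abs(Rd))
```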
A history of PageRank from the numerical computing perspective · David Gleich
We survey some of the underlying ideas from Google's PageRank algorithm, along the lines of Massimo Franceschet's CACM history.
I've taken some slight liberties to make it more accessible.
This talk is a new update based on some of our recent results on doing tall-and-skinny QRs in MapReduce. In particular, the "fast" iterative refinement approximation based on a sample is new.
How does Google Google: A journey into the wondrous mathematics behind your f... · David Gleich
A talk I gave at the annual meeting of the MetroNY section of the MAA about how Google works from a link-ranking perspective. (http://sections.maa.org/metrony/)
Based on a talk by Margot Gerritsen (which used elements from another talk I gave years ago; yay, co-author improvements!)
Vertex neighborhoods, low conductance cuts, and good seeds for local communit... · David Gleich
My talk from KDD 2012 about vertex neighborhoods and low conductance cuts. See the paper here: http://arxiv.org/abs/1112.0031 and http://dl.acm.org/citation.cfm?id=2339628
Recommendation and graph algorithms in Hadoop and SQL · David Gleich
A talk I gave at ancestry.com on Hadoop, SQL, recommendation, and graph algorithms. It's a tutorial overview; there are better algorithms than those I describe, but these are a simple starting point.
Overlapping clusters for distributed computation · David Gleich
My talk from WSDM 2012. See the paper on my webpage: http://www.cs.purdue.edu/homes/dgleich/publications/Andersen%202012%20-%20overlapping.pdf
And the codes: http://www.cs.purdue.edu/homes/dgleich/codes/overlapping/
Fast matrix primitives for ranking, link-prediction and more · David Gleich
I gave this talk at Netflix about some of the recent work I've been doing on fast matrix primitives for link prediction, and also some non-standard uses of the nuclear norm for ranking.
My talk at the International Conference on Monte Carlo Methods and Applications (MCM 2023), on advances in mathematical aspects of stochastic simulation and Monte Carlo methods, at Sorbonne Université on June 28, 2023. It covers my recent works (i) "Numerical Smoothing with Hierarchical Adaptive Sparse Grids and Quasi-Monte Carlo Methods for Efficient Option Pricing" (link: https://doi.org/10.1080/14697688.2022.2135455) and (ii) "Multilevel Monte Carlo with Numerical Smoothing for Robust and Efficient Computation of Probabilities and Densities" (link: https://arxiv.org/abs/2003.05708).
Random Matrix Theory and Machine Learning - Part 3 · Fabian Pedregosa
ICML 2021 tutorial on random matrix theory and machine learning.
Part 3 covers: 1. Motivation: average-case versus worst-case in high dimensions; 2. Algorithm halting times (runtimes); 3. Outlook.
MVPA with SpaceNet: sparse structured priors · Elvis Dohmatob
The GraphNet (aka S-Lasso), as well as other "sparsity + structure" priors like TV (Total Variation) and TV-L1, are not easily applicable to brain data because of technical problems relating to the selection of the regularization parameters. Also, in their own right, such models lead to challenging high-dimensional optimization problems. In this manuscript, we present some heuristics for speeding up the overall optimization process: (a) early stopping, whereby one halts the optimization process when the test score (performance on left-out data) for the internal cross-validation for model selection stops improving, and (b) univariate feature screening, whereby irrelevant (non-predictive) voxels are detected and eliminated before the optimization problem is entered, thus reducing the size of the problem. Empirical results with GraphNet on real MRI (Magnetic Resonance Imaging) datasets indicate that these heuristics are a win-win strategy, as they add speed without sacrificing the quality of the predictions. We expect the proposed heuristics to work on other models like TV-L1.
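A minimal illustration of heuristic (b) with scikit-learn (my sketch, not the manuscript's code; the sizes and effect are made up): univariate ANOVA F-scores screen out non-predictive voxels before the expensive optimization is entered.

```python
import numpy as np
from sklearn.feature_selection import SelectKBest, f_classif

rng = np.random.default_rng(4)
X = rng.standard_normal((100, 5000))      # 100 scans, 5000 voxels
y = rng.integers(0, 2, 100)               # two experimental conditions
X[:, :10] += 2 * y[:, None]               # 10 genuinely predictive voxels

screen = SelectKBest(f_classif, k=100).fit(X, y)
X_small = screen.transform(X)             # the solver now sees 100 voxels
print(screen.get_support()[:10].all())    # the informative voxels survive
```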
On Continuous Approximate Solution of Ordinary Differential Equations · Waqas Tariq
In this work the problem of the continuous approximate solution of ordinary differential equations is investigated. An approach to constructing the continuous approximate solution, based on the discrete approximate solution and spline interpolation, is provided. The existence and uniqueness of such a continuous approximate solution are pointed out, its error is estimated, and its convergence is considered. Finally, with the aid of a modern PC and mathematical software, three practical computer approaches to performing the above construction are offered.
Inference for stochastic differential equations via approximate Bayesian comp... · Umberto Picchini
Despite the title, the methods are appropriate for more general dynamical models (including state-space models). Presentation given at Nordstat 2012, Umeå. Relevant research paper at http://arxiv.org/abs/1204.5459 and software code at https://sourceforge.net/projects/abc-sde/
We formulate the initial value problem to model the evolution of the interface between two fluids of different density in three spatial dimensions. The evolution equations account for the action of gravity on the fluids, surface tension in the fluids, and prescribed far-field conditions.
The flow in each fluid is incompressible and irrotational, so classical potential theory applies and allows for a boundary-integral representation by dipoles. This representation satisfies the kinematic condition of continuous normal velocity and the Laplace-Young condition for the pressure. The dipole strength is related to the jump in potential across the interface. The model of the exact nonlinear three-dimensional motion of the interface is formulated and includes expressions for integral invariants of the motion, the mean height of the interface, and the total energy per wavelength.
We develop a numerical method that employs a special generalized isothermal interface parameterization. It enables the use of implicit non-stiff time-integration methods via a small-scale decomposition. Our method includes efficient algorithms for generating initial data with the generalized isothermal parameterization, either by evolving a flat interface toward a prescribed initial surface shape or by an appropriate choice of the tangential velocities.
The method is used to efficiently compute the nonlinear evolution of a doubly periodic interface separating two fluids in the Rayleigh-Taylor instability, and internal waves with surface tension.
GraphRAG is All You Need? LLM & Knowledge Graph · Guy Korland
Guy Korland, CEO and Co-founder of FalkorDB, will review two articles on the integration of language models with knowledge graphs:
1. Unifying Large Language Models and Knowledge Graphs: A Roadmap.
https://arxiv.org/abs/2306.08302
2. Microsoft Research's GraphRAG paper and a review paper on various uses of knowledge graphs:
https://www.microsoft.com/en-us/research/blog/graphrag-unlocking-llm-discovery-on-narrative-private-data/
Neuro-symbolic is not enough, we need neuro-*semantic* · Frank van Harmelen
Neuro-symbolic (NeSy) AI is on the rise. However, simply doing machine learning on just any symbolic structure is not sufficient to really harvest the gains of NeSy. These will only be gained when the symbolic structures have an actual semantics. I give an operational definition of semantics as "predictable inference".
All of this is illustrated with link prediction over knowledge graphs, but the argument is general.
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -... · DanBrown980551
Do you want to learn how to model and simulate an electrical network from scratch in under an hour?
Then welcome to this PowSyBl workshop, hosted by Rte, the French Transmission System Operator (TSO)!
During the webinar, you will discover the PowSyBl ecosystem as well as handle and study an electrical network through an interactive Python notebook.
PowSyBl is an open source project hosted by LF Energy, which offers a comprehensive set of features for electrical grid modelling and simulation. Among other advanced features, PowSyBl provides:
- A fully editable and extendable library for grid component modelling;
- Visualization tools to display your network;
- Grid simulation tools, such as power flows, security analyses (with or without remedial actions) and sensitivity analyses;
The framework is mostly written in Java, with a Python binding so that Python developers can access PowSyBl functionalities as well.
What you will learn during the webinar:
- For beginners: discover PowSyBl's functionalities through a quick general presentation and the notebook, without needing any expert coding skills;
- For advanced developers: master the skills to efficiently apply PowSyBl functionalities to your real-world scenarios.
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo... · James Anderson
Effective Application Security in Software Delivery lifecycle using Deployment Firewall and DBOM
The modern software delivery process (or the CI/CD process) includes many tools, distributed teams, open-source code, and cloud platforms. Constant focus on speed to release software to market, along with the traditional slow and manual security checks has caused gaps in continuous security as an important piece in the software supply chain. Today organizations feel more susceptible to external and internal cyber threats due to the vast attack surface in their applications supply chain and the lack of end-to-end governance and risk management.
The software team must secure its software delivery process to avoid vulnerability and security breaches. This needs to be achieved with existing tool chains and without extensive rework of the delivery processes. This talk will present strategies and techniques for providing visibility into the true risk of the existing vulnerabilities, preventing the introduction of security issues in the software, resolving vulnerabilities in production environments quickly, and capturing the deployment bill of materials (DBOM).
Speakers:
Bob Boule
Robert Boule is a technology enthusiast with a passion for making things work, along with a knack for helping others understand how things work. He comes with around 20 years of solution engineering experience in application security, software continuous delivery, and SaaS platforms. He is known for his dynamic presentations on CI/CD and application security integrated into the software delivery lifecycle.
Gopinath Rebala
Gopinath Rebala is the CTO of OpsMx, where he has overall responsibility for the machine learning and data processing architectures for Secure Software Delivery. Gopi also has a strong connection with our customers, leading design and architecture for strategic implementations. Gopi is a frequent speaker and well-known leader in continuous delivery and integrating security into software delivery.
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024 · Tobias Schneck
As AI technology pushes into IT, I wondered, as an "infrastructure container Kubernetes guy", how this fancy AI technology gets managed from an infrastructure operations view. Is it possible to apply our lovely cloud-native principles as well? What benefits could both technologies bring to each other?
Let me take these questions and guide you on a short journey through existing deployment models and use cases for AI software. Through practical examples, we discuss what cloud/on-premise strategy we may need to apply it to our own infrastructure and make it work from an enterprise perspective. I want to give an overview of infrastructure requirements and technologies, and what could be beneficial or limiting for your AI use cases in an enterprise environment. An interactive demo will give you some insights into what approaches I already got working for real.
Generating a custom Ruby SDK for your web service or Rails API using Smithy · g2nightmarescribd
Have you ever wanted a Ruby client API to communicate with your web service? Smithy is a protocol-agnostic language for defining services and SDKs. Smithy Ruby is an implementation of Smithy that generates a Ruby SDK using a Smithy model. In this talk, we will explore Smithy and Smithy Ruby to learn how to generate custom feature-rich SDKs that can communicate with any web service, such as a Rails JSON API.
Epistemic Interaction - tuning interfaces to provide information for AI support · Alan Dix
Paper presented at SYNERGY workshop at AVI 2024, Genoa, Italy. 3rd June 2024
https://alandix.com/academic/papers/synergy2024-epistemic/
As machine learning integrates deeper into human-computer interactions, the concept of epistemic interaction emerges, aiming to refine these interactions to enhance system adaptability. This approach encourages minor, intentional adjustments in user behaviour to enrich the data available for system learning. This paper introduces epistemic interaction within the context of human-system communication, illustrating how deliberate interaction design can improve system understanding and adaptation. Through concrete examples, we demonstrate the potential of epistemic interaction to significantly advance human-computer interaction by leveraging intuitive human communication strategies to inform system design and functionality, offering a novel pathway for enriching user-system engagements.
Software Delivery At the Speed of AI: Inflectra Invests In AI-Powered Quality · Inflectra
In this insightful webinar, Inflectra explores how artificial intelligence (AI) is transforming software development and testing. Discover how AI-powered tools are revolutionizing every stage of the software development lifecycle (SDLC), from design and prototyping to testing, deployment, and monitoring.
Learn about:
• The Future of Testing: How AI is shifting testing towards verification, analysis, and higher-level skills, while reducing repetitive tasks.
• Test Automation: How AI-powered test case generation, optimization, and self-healing tests are making testing more efficient and effective.
• Visual Testing: Explore the emerging capabilities of AI in visual testing and how it's set to revolutionize UI verification.
• Inflectra's AI Solutions: See demonstrations of Inflectra's cutting-edge AI tools like the ChatGPT plugin and Azure Open AI platform, designed to streamline your testing process.
Whether you're a developer, tester, or QA professional, this webinar will give you valuable insights into how AI is shaping the future of software delivery.
Transcript: Selling digital books in 2024: Insights from industry leaders - T... · BookNet Canada
The publishing industry has been selling digital audiobooks and ebooks for over a decade and has found its groove. What’s changed? What has stayed the same? Where do we go from here? Join a group of leading sales peers from across the industry for a conversation about the lessons learned since the popularization of digital books, best practices, digital book supply chain management, and more.
Link to video recording: https://bnctechforum.ca/sessions/selling-digital-books-in-2024-insights-from-industry-leaders/
Presented by BookNet Canada on May 28, 2024, with support from the Department of Canadian Heritage.
Elevating Tactical DDD Patterns Through Object Calisthenics · Dorra Bartaguiz
After immersing yourself in the blue book and its red counterpart, attending DDD-focused conferences, and applying tactical patterns, you're left with a crucial question: How do I ensure my design is effective? Tactical patterns within Domain-Driven Design (DDD) serve as guiding principles for creating clear and manageable domain models. However, achieving success with these patterns requires additional guidance. Interestingly, we've observed that a set of constraints initially designed for training purposes remarkably aligns with effective pattern implementation, offering a more ‘mechanical’ approach. Let's explore together how Object Calisthenics can elevate the design of your tactical DDD patterns, offering concrete help for those venturing into DDD for the first time!
UiPath Test Automation using UiPath Test Suite series, part 3 · DianaGray10
Welcome to UiPath Test Automation using UiPath Test Suite series part 3. In this session, we will cover desktop automation along with UI automation.
Topics covered:
UI automation introduction
UI automation sample
Desktop automation flow
Pradeep Chinnala, Senior Consultant Automation Developer @WonderBotz and UiPath MVP
Deepak Rai, Automation Practice Lead, Boundaryless Group and UiPath MVP
Connector Corner: Automate dynamic content and events by pushing a button · DianaGray10
Here is something new! In our next Connector Corner webinar, we will demonstrate how you can use a single workflow to:
Create a campaign using Mailchimp with merge tags/fields
Send an interactive Slack channel message (using buttons)
Have the message received by managers and peers along with a test email for review
But there’s more:
In a second workflow supporting the same use case, you’ll see:
Your campaign sent to target colleagues for approval
If the “Approve” button is clicked, a Jira/Zendesk ticket is created for the marketing design team
But, if the “Reject” button is pushed, colleagues will be alerted via Slack message
Join us to learn more about this new, human-in-the-loop capability, brought to you by Integration Service connectors.
Speakers:
Akshay Agnihotri, Product Manager
Charlie Greenberg, Host
Anti-differentiating approximation algorithms: A case study with min-cuts, spectral, and flow
1. Algorithmic Anti-Differentiation: A case study with min-cuts, spectral, and flow
David F. Gleich · Purdue University
Michael W. Mahoney · Berkeley ICSI
Code: www.cs.purdue.edu/homes/dgleich/codes/l1pagerank
2. Algorithmic Anti-differentiation
Understanding how and why heuristic procedures
• Early stopping
• Truncating small entries
• etc.
are actually algorithms for implicit objectives.
3. The ideal world
Given problem P.
Derive solution characterization C.
Show algorithm A finds a solution where C holds.
Profit?
Example: given “min-cut”, derive “max-flow is equivalent to min-cut”, show push-relabel solves max-flow. Profit!
4. (The ideal world)′
Given problem P.
Derive approximate solution characterization C′.
Show algorithm A′ finds a solution where C′ holds.
Profit? (In academia!)
Example: given “sparsest-cut”, derive the Rayleigh-quotient approximation, show the power method finds a good Rayleigh quotient. Profit?
5. The real world
Given task P.
Hack around until you find something useful.
Write paper presenting “novel heuristic” H for P.
Profit!
Example: given “find-communities”, hack around … hidden …, write paper on “three steps of power method finds communities”. Profit!
6. (The ideal world)′′
Understand why H works!
Guess and check until you find something H solves.
Show heuristic H solves P′.
Derive a characterization of heuristic H.
Example: given “find-communities”, hack around, write paper on “three steps of power method finds communities”. Profit!
7. The real world: if your algorithm is related to optimization, the question is: given a procedure X, what objective does it optimize?
Algorithmic Anti-differentiation: given heuristic H, is there a problem P′ such that H is an algorithm for P′?
In the smooth, unconstrained case, this is just “anti-differentiation!”
8. Algorithmic Anti-differentiation in the literature
Mahoney & Orecchia (2011): three steps of the power method and p-norm regularization.
Dhillon et al. (2007): spectral clustering, trace minimization & kernel k-means.
Saunders (1995): LSQR & Craig iterative methods for Ax = b.
… many more …
9. Outline
1. A new derivation of the PageRank vector for an undirected graph based on Laplacians, cuts, or flows.
2. An understanding of the implicit regularization of the PageRank “push” method.
3. The impact of this on a few applications.
10. The PageRank problem
The PageRank random surfer:
1. With probability β, follow a random-walk step.
2. With probability (1 − β), jump randomly according to the distribution v.
Goal: find the stationary distribution x.
(I − β A D^{−1}) x = (1 − β) v
where A is the symmetric adjacency matrix, D is the diagonal degree matrix, and v is the jump vector.
Equivalent to [α D + L] z = α v, where β = 1/(1 + α) and x = D z, with L the combinatorial Laplacian.
11. The Push Algorithm for PageRank
Proposed (in closest form) in Andersen, Chung, and Lang (also by McSherry, and Jeh & Widom) for personalized PageRank.
Strongly related to Gauss-Seidel and coordinate descent.
Derived to quickly approximate PageRank with sparsity.
The Push Method, with parameters τ, ρ:
1. x^(1) = 0, r^(1) = (1 − β) e_i, k = 1
2. while any r_j > τ d_j (d_j is the degree of node j):
3.   x^(k+1) = x^(k) + (r_j − τ d_j ρ) e_j
4.   r_i^(k+1) = τ d_j ρ if i = j;  r_i^(k) + β (r_j − τ d_j ρ)/d_j if i ∼ j;  r_i^(k) otherwise
5.   k ← k + 1
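A direct transcription of the push method into Python (a sketch, not the authors' code): the residual r starts as (1 − β)e_i, and mass is moved into x one coordinate at a time while any residual exceeds its threshold.

```python
from collections import deque

def push_pagerank(adj, seed, beta=0.85, tau=1e-4, rho=1.0):
    """Sparse personalized-PageRank approximation via push.

    adj: dict mapping node -> list of neighbors (undirected graph).
    Only nodes actually pushed appear in the returned dict x;
    everything else is exactly zero.  rho=1 matches the theorem later
    in the deck; rho=1/2 is the common choice in practice.
    """
    deg = {u: len(nbrs) for u, nbrs in adj.items()}
    x, r = {}, {seed: 1.0 - beta}
    queue = deque([seed])
    while queue:
        j = queue.popleft()
        if r.get(j, 0.0) <= tau * deg[j]:
            continue                        # stale queue entry
        amount = r[j] - tau * deg[j] * rho
        x[j] = x.get(j, 0.0) + amount       # move mass into the solution
        r[j] = tau * deg[j] * rho           # residual left behind at j
        for u in adj[j]:                    # spread beta*amount to neighbors
            r[u] = r.get(u, 0.0) + beta * amount / deg[j]
            if r[u] > tau * deg[u]:
                queue.append(u)
    return x

adj = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2]}
print(push_pagerank(adj, seed=0))
```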
13. Why do we care about push?
1. Used for empirical studies of “communities” and as an ingredient in an empirically successful community finder (Whang et al., CIKM 2013).
2. Used for “fast PageRank” approximation.
3. It produces sparse approximations to PageRank!
[Figure: Newman’s netscience graph, 379 vertices, 1828 nonzeros; v has a single one at one node, and the push solution is “zero” on most of the nodes.]
14. The s-t min-cut problem
minimize ‖Bx‖_{C,1} = Σ_{ij∈E} C_{ij} |x_i − x_j|
subject to x_s = 1, x_t = 0, x ≥ 0
where B is the unweighted incidence matrix and C is the diagonal cost matrix.
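This 1-norm problem is a linear program whose dual is max-flow, so the cut value can be checked combinatorially; a tiny example with SciPy (my illustration, not from the slides; this routine wants integer capacities):

```python
import numpy as np
from scipy.sparse import csr_matrix
from scipy.sparse.csgraph import maximum_flow

# nodes: 0 = s, 3 = t; entries are directed edge capacities
cap = np.array([[0, 3, 2, 0],
                [0, 0, 1, 2],
                [0, 0, 0, 2],
                [0, 0, 0, 0]])
res = maximum_flow(csr_matrix(cap, dtype=np.int32), 0, 3)
print(res.flow_value)   # 4, which equals the min-cut value ({s,1,2} vs {t})
```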
15. The localized cut graph
Related to a construction used in “FlowImprove”, Andersen & Lang (2007), and in Orecchia & Zhu (2014).
A_S =
[ 0        α d_S^T    0        ]
[ α d_S    A          α d_S̄    ]
[ 0        α d_S̄^T    0        ]
Connect s to vertices in S with weight α · degree; connect t to vertices in S̄ with weight α · degree.
16. The localized cut graph & PageRank
Solve the s-t min-cut:
minimize ‖B_S x‖_{C(α),1}
subject to x_s = 1, x_t = 0, x ≥ 0.
17. The localized cut graph & PageRank
Solve the “spectral” s-t min-cut:
minimize ‖B_S x‖_{C(α),2}
subject to x_s = 1, x_t = 0, x ≥ 0.
The PageRank vector z that solves (αD + L) z = αv with v = d_S / vol(S) is a renormalized solution of the electrical cut computation:
minimize ‖B_S x‖_{C(α),2} subject to x_s = 1, x_t = 0.
Specifically, if x is the solution, then x = [1; vol(S) z; 0].
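A small numerical check of this statement (my sketch, not from the slides): build the localized cut graph, compute the harmonic voltages with x_s = 1 and x_t = 0, and compare with vol(S)·z.

```python
import numpy as np

A = np.array([[0, 1, 1, 0],
              [1, 0, 1, 0],
              [1, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
n = 4
d = A.sum(axis=1)
D = np.diag(d)
L = D - A
alpha = 0.25
S = np.array([0, 1])                       # seed set
dS = np.zeros(n); dS[S] = d[S]
volS = dS.sum()

# localized cut graph: original nodes 0..n-1, then s = n, t = n+1
AS = np.zeros((n + 2, n + 2))
AS[:n, :n] = A
AS[n, :n] = AS[:n, n] = alpha * dS                # s -- S, weight alpha*degree
AS[n + 1, :n] = AS[:n, n + 1] = alpha * (d - dS)  # t -- S-bar, same weights
LS = np.diag(AS.sum(axis=1)) - AS

# harmonic (electrical) voltages on the interior, with x_s = 1, x_t = 0
x_int = np.linalg.solve(LS[:n, :n], -LS[:n, n])
# PageRank vector from (alpha*D + L) z = alpha * d_S / vol(S)
z = np.linalg.solve(alpha * D + L, alpha * dS / volS)
assert np.allclose(x_int, volS * z)               # x = [1; vol(S) z; 0]
```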
18. Back to the push method
Theorem. Let x be the output from the push method with 0 < β < 1, v = d_S/vol(S), ρ = 1, and τ > 0. Set α = (1 − β)/β and κ = τ vol(S)/β, and let z_G solve
minimize ½ ‖B_S z‖²_{C(α),2} + κ ‖Dz‖₁
subject to z_s = 1, z_t = 0, z ≥ 0,
where z = [1; z_G; 0]. Then x = D z_G / vol(S).
The κ‖Dz‖₁ term is the regularization for sparsity; the division by vol(S) is the needed normalization.
Proof: write out the KKT conditions and show that the push method solves them. The slackness conditions were the “tricky” part.
19. A simple example
The vectors x_pr, z, and x(α, S) are the PageRank vectors from Theorem 1, where x(α, S) solves Prob. (4) and the others are from the problems at the end of Section 2. The vector x_cut solves the cut Prob. (2), and z_G solves Prob. (6).
Deg.   x_pr     z        x(α,S)   x_cut   z_G
2      0.0788   0.0394   0.8276   1       0.2758
4      0.1475   0.0369   0.7742   1       0.2437
7      0.2362   0.0337   0.7086   1       0.2138
4      0.1435   0.0359   0.7533   1       0.2325
4      0.1297   0.0324   0.6812   1       0.1977
7      0.1186   0.0169   0.3557   0       0
3      0.0385   0.0128   0.2693   0       0
2      0.0167   0.0083   0.1749   0       0
4      0.0487   0.0122   0.2554   0       0
3      0.0419   0.0140   0.2933   0       0
The solution of Prob. (6), an ℓ1-regularized ℓ2 regression problem, has 24 non-zeros. The true “min-cut” set is large in both the 2-norm PageRank problem and the regularized problem. Thus, we identify the underlying graph feature correctly, but the implicitly regularized ACL procedure does so with many fewer non-zeros than the vanilla PageRank procedure.
20. [Figure 2 of the paper: examples of the different cut vectors on a portion of the netscience graph; the panels show the set S (16 nonzeros), the min-cut solution (15 nonzeros), the PageRank solution (284 nonzeros), and the push solution (24 nonzeros). The left subfigure shows the set S highlighted with its vertices enlarged; the others show the solution vectors from the various cut problems (from left to right, Probs. (2), (4), and (6), solved with min-cut, PageRank, and ACL) for this set S. Each vector determines the color and size of a vertex, where high values are large and dark. White vertices with outlines are numerically non-zero, which is why most of the vertices in the fourth figure are outlined, in contrast to the third. The true min-cut set is large in all vectors, but the implicitly regularized problem achieves this with many fewer non-zeros than the vanilla PageRank problem.]
Push’s sparsity helps it identify the “right” graph feature with fewer non-zeros.
21. It’s easy to make this apply broadly
It is easy to cook up interesting diffusion-like problems and adapt them to this framework. In particular, Zhou et al. (2004) gave a semi-supervised learning diffusion we are currently studying, with the localized graph
[ 0      e_S^T   0      ]
[ e_S    θA      e_S̄    ]
[ 0      e_S̄^T   0      ]
and the problem
minimize ½ ‖B_S x̂‖₂² + κ ‖x̂‖₁
subject to x̂_s = 1, x̂_t = 0, x̂ ≥ 0,
which is equivalent to
minimize ½ x^T (I + θL) x − x^T e_S + κ ‖x‖₁
subject to x ≥ 0.
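Since the last objective is a nonnegativity-constrained quadratic plus an ℓ1 term (which is linear on x ≥ 0), even plain projected gradient descent solves it; a tiny sketch of that (mine, not the slides'; θ and κ are arbitrary choices):

```python
import numpy as np

A = np.array([[0, 1, 1, 0],
              [1, 0, 1, 0],
              [1, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
L = np.diag(A.sum(axis=1)) - A
theta, kappa = 5.0, 0.05
eS = np.array([1., 1., 0., 0.])           # seed indicator
M = np.eye(4) + theta * L

x = np.zeros(4)
step = 1 / np.linalg.norm(M, 2)           # 1/Lipschitz constant of the gradient
for _ in range(500):
    grad = M @ x - eS + kappa             # on x >= 0, the l1 term has gradient kappa
    x = np.maximum(x - step * grad, 0.0)  # project onto x >= 0
print(x)                                  # entries decay away from the seed set
```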
22. Recap & Conclusions
1. “Defined” algorithmic anti-differentiation to understand why heuristics work.
2. Found equivalences between PageRank and cut/flow problems.
3. Push & 1-norm regularization.
Key point: we don’t solve the 1-norm regularized problem with a 1-norm solver, but with the efficient push method. Run push, and you get 1-norm regularization with early stopping.
Open issues:
• Better treatment of directed graphs?
• An algorithm for ρ < 1? (ρ is set to ½ in most “uses”; this needs a new analysis.)
• (Coming soon) Improvements to semi-supervised learning on graphs.
Supported by NSF CAREER 1149756-CCF
www.cs.purdue.edu/homes/dgleich
23. PageRank → s-t min-cut
That equivalence works if s is degree-weighted. What if s is the uniform vector?
A(s) =
[ 0            α s^T          0            ]
[ α s          A              α (d − s)    ]
[ 0            α (d − s)^T    0            ]