# Localized methods for diffusions in large graphs

I describe a few ongoing research projects on diffusions in large graphs and how we can design efficient matrix computations to evaluate them.


### Localized methods for diffusions in large graphs

1. Localized methods for diffusions in large graphs. David F. Gleich, Purdue University. Joint work with Kyle Kloster @ Purdue and Michael Mahoney @ Berkeley. Supported by NSF CAREER CCF-1149756. Code: www.cs.purdue.edu/homes/dgleich/codes/nexpokit and www.cs.purdue.edu/homes/dgleich/codes/l1pagerank. David Gleich · Purdue · MMDS 2014
2. Image from rockysprings, deviantart, CC share-alike. Everything in the world can be explained by a matrix, and we see how deep the rabbit hole goes. The talk ends, you believe -- whatever you want to.
3. [Image-only slide.]
4. Graph diffusions: $f = \sum_{k=0}^{\infty} \alpha_k P^k s$, on a network or mesh, as in a typical problem in scientific computing. Here $A$ is the adjacency matrix, $D$ the degree matrix, $P$ a column-stochastic operator, $s$ the "seed" (a sparse vector), $f$ the diffusion result, and $\alpha_k$ the path weights; $P = AD^{-1}$, so $(Px)_i = \sum_{j \to i} \frac{1}{d_j} x_j$. Graph diffusions help: 1. attribute prediction; 2. community detection; 3. "ranking"; 4. finding small-conductance sets.
5. Graph diffusions. PageRank: $x = (1-\beta)\sum_{k=0}^{\infty} \beta^k P^k s$, equivalently $(I - \beta P)x = (1-\beta)s$. Heat kernel: $h = e^{-t}\sum_{k=0}^{\infty} \frac{t^k}{k!} P^k s = e^{-t}\exp\{tP\}s$. As before, $P = AD^{-1}$ and $(Px)_i = \sum_{j \to i} \frac{1}{d_j} x_j$.
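Both series on this slide can be evaluated directly by truncation. The sketch below is not the speaker's code: the 3-node path graph, the values of $\beta$ and $t$, and the truncation lengths are all illustrative choices.

```python
# Truncated-series evaluation of the PageRank and heat kernel
# diffusions on a tiny path graph (illustrative sketch only).
import math

A = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]   # adjacency of a 3-node path
d = [sum(col) for col in zip(*A)]       # degrees (column sums)
# column-stochastic P = A D^{-1}: P[i][j] = A[i][j] / d[j]
P = [[A[i][j] / d[j] for j in range(3)] for i in range(3)]

def matvec(M, v):
    return [sum(M[i][j] * v[j] for j in range(3)) for i in range(3)]

def pagerank(s, beta=0.85, terms=200):
    # x = (1-beta) * sum_k beta^k P^k s
    x, pk = [0.0] * 3, s[:]
    for k in range(terms):
        x = [xi + (1 - beta) * beta**k * pi for xi, pi in zip(x, pk)]
        pk = matvec(P, pk)
    return x

def heat_kernel(s, t=5.0, terms=60):
    # h = e^{-t} * sum_k (t^k / k!) P^k s
    h, pk = [0.0] * 3, s[:]
    for k in range(terms):
        h = [hi + math.exp(-t) * t**k / math.factorial(k) * pi
             for hi, pi in zip(h, pk)]
        pk = matvec(P, pk)
    return h

s = [1.0, 0.0, 0.0]   # seed on node 0
x = pagerank(s)
h = heat_kernel(s)
```

Because $P$ is column-stochastic and the path weights sum to one, both diffusion vectors conserve the seed's mass.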
6. Graph diffusions. PageRank: $x = (1-\beta)\sum_{k=0}^{\infty} \beta^k P^k s$, i.e. $(I - \beta P)x = (1-\beta)s$. Heat kernel: $h = e^{-t}\sum_{k=0}^{\infty} \frac{t^k}{k!} P^k s = e^{-t}\exp\{tP\}s$. [Plot: path weight vs. path length for the heat kernel ($t = 1, 5, 15$) and PageRank ($\alpha = 0.85, 0.99$).]
7. Uniformly localized solutions in livejournal: $x = \exp(P)e_c$ with $\mathrm{nnz}(x) = 4{,}815{,}948$. [Plots: magnitude of the entries of $x$, and 1-norm error vs. number of largest non-zeros retained.] Gleich & Kloster, arXiv:1310.3423.
8. Our mission: find the solution with work roughly proportional to the localization, not the matrix.
9. Two types of localization, $x \approx x^*$. Uniform (strong): $\|x - x^*\|_1 \le \varepsilon$; a good global approximation using only a local region; "hard" to prove; "needs" a graph property. Entry-wise (weak): $\|D^{-1}(x - x^*)\|_\infty \le \varepsilon$; a good approximation for cuts and communities; "easy" to prove; "fast" algorithms. Localized vectors are not sparse, but they can be approximated by sparse vectors.
10. We have four results: 1. a new interpretation of the PageRank diffusion in relation to a mincut problem (undirected graphs only); 2. a new understanding of the scalable, localized PageRank "push" method (entry-wise localization); 3. a new algorithm for the heat kernel diffusion in a degree-weighted norm; 4. algorithms for diffusions as functions of matrices (directed graphs, uniform localization; K. Kloster's poster on Thurs.).
11. Our algorithms for uniform localization: www.cs.purdue.edu/homes/dgleich/codes/nexpokit. [Plots: 1-norm error vs. non-zeros retained for gexpm, gexpmq, and expmimv.] work $= O\big(\log(\tfrac{1}{\varepsilon})(\tfrac{1}{\varepsilon})^{3/2} d^2 (\log d)^2\big)$, nnz $= O\big(\log(\tfrac{1}{\varepsilon})(\tfrac{1}{\varepsilon})^{3/2} d \log d\big)$.
12. PageRank, mincuts, and the push method via Algorithmic Anti-Differentiation. Gleich & Mahoney, ICML 2014.
13. The PageRank problem & the Laplacian on undirected graphs. Combinatorial Laplacian $L = D - A$. The PageRank random surfer: 1. with probability $\beta$, follow a random-walk step; 2. with probability $1-\beta$, jump randomly according to the distribution $s$. Goal: find the stationary distribution $x = (1-\beta)\sum_{k=0}^{\infty} \beta^k P^k s$. Equivalently: 1. $(I - \beta AD^{-1})x = (1-\beta)s$; 2. $[\alpha D + L]z = \alpha s$, where $\beta = 1/(1+\alpha)$ and $x = Dz$.
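The equivalence between the two linear systems on this slide can be checked numerically. A minimal sketch, assuming an illustrative 3-node path graph and a tiny Gauss-Jordan solver (neither is from the talk):

```python
# Check that (I - beta*A*D^{-1}) x = (1-beta) s gives the same x
# as x = D z with (alpha*D + L) z = alpha*s and beta = 1/(1+alpha).
A = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]   # 3-node path graph
d = [sum(row) for row in A]
n = len(A)
alpha = 0.2
beta = 1.0 / (1.0 + alpha)
s = [1.0, 0.0, 0.0]

def solve(M, b):
    # small dense Gauss-Jordan elimination with partial pivoting
    M = [row[:] + [bi] for row, bi in zip(M, b)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(n):
            if r != c:
                f = M[r][c] / M[c][c]
                M[r] = [a - f * b_ for a, b_ in zip(M[r], M[c])]
    return [M[r][n] / M[r][r] for r in range(n)]

# system 1: (I - beta * A D^{-1}) x = (1-beta) s
M1 = [[(1.0 if i == j else 0.0) - beta * A[i][j] / d[j]
       for j in range(n)] for i in range(n)]
x1 = solve(M1, [(1 - beta) * si for si in s])

# system 2: (alpha*D + L) z = alpha*s, with L = D - A, i.e.
# alpha*D + L = (alpha+1)*D - A; then x = D z
M2 = [[(alpha + 1.0) * d[i] * (i == j) - A[i][j]
       for j in range(n)] for i in range(n)]
z = solve(M2, [alpha * si for si in s])
x2 = [d[i] * z[i] for i in range(n)]
```

Expanding $\alpha D + L = (1+\alpha)D - A$ and substituting $x = Dz$ recovers system 1 after dividing by $1+\alpha$, so the two solutions agree entrywise.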
14. The s-t min-cut problem: minimize $\|Bx\|_{C,1} = \sum_{ij \in E} C_{i,j}\,|x_i - x_j|$ subject to $x_s = 1$, $x_t = 0$, $x \ge 0$, where $B$ is the unweighted incidence matrix and $C$ the diagonal capacity matrix. In the unweighted case, solve via max-flow; in the weighted case, solve via network simplex or an industrial LP.
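In the unweighted case the min-cut value equals the max-flow value. A hedged sketch of that route (the 4-node capacity matrix and the Edmonds-Karp helper are illustrative, not from the talk):

```python
# s-t min-cut value via max-flow (Edmonds-Karp) on a tiny graph.
import collections

def max_flow(cap, s, t):
    n = len(cap)
    flow = 0
    while True:
        # BFS for an augmenting path in the residual graph
        parent = [-1] * n
        parent[s] = s
        q = collections.deque([s])
        while q and parent[t] == -1:
            u = q.popleft()
            for v in range(n):
                if parent[v] == -1 and cap[u][v] > 0:
                    parent[v] = u
                    q.append(v)
        if parent[t] == -1:
            return flow          # no augmenting path: flow is maximal
        # find the bottleneck capacity along the path, then augment
        v, aug = t, float('inf')
        while v != s:
            u = parent[v]
            aug = min(aug, cap[u][v])
            v = u
        v = t
        while v != s:
            u = parent[v]
            cap[u][v] -= aug
            cap[v][u] += aug
            v = u
        flow += aug

cap = [[0, 1, 1, 0],   # s = node 0
       [0, 0, 1, 1],
       [0, 0, 0, 1],
       [0, 0, 0, 0]]   # t = node 3
mincut_value = max_flow([row[:] for row in cap], 0, 3)  # -> 2
```

By max-flow/min-cut duality, `mincut_value` is the optimal objective of the 1-norm cut program above for this toy instance.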
15. The localized cut graph. Related to a construction used in "FlowImprove", Andersen & Lang (2007), and Orecchia & Zhu (2014). Connect $s$ to vertices in $S$ with weight $\alpha \cdot$ degree; connect $t$ to vertices in $\bar S$ with weight $\alpha \cdot$ degree: $A_S = \begin{bmatrix} 0 & \alpha d_S^T & 0 \\ \alpha d_S & A & \alpha d_{\bar S} \\ 0 & \alpha d_{\bar S}^T & 0 \end{bmatrix}$.
16. The localized cut graph. Connect $s$ to vertices in $S$ with weight $\alpha \cdot$ degree; connect $t$ to vertices in $\bar S$ with weight $\alpha \cdot$ degree: $B_S = \begin{bmatrix} e & -I_S & 0 \\ 0 & B & 0 \\ 0 & I_{\bar S} & -e \end{bmatrix}$. Solve the s-t min-cut: minimize $\|B_S x\|_{C(\alpha),1}$ subject to $x_s = 1$, $x_t = 0$, $x \ge 0$.
17. The localized cut graph. Connect $s$ to vertices in $S$ with weight $\alpha \cdot$ degree; connect $t$ to vertices in $\bar S$ with weight $\alpha \cdot$ degree: $B_S = \begin{bmatrix} e & -I_S & 0 \\ 0 & B & 0 \\ 0 & I_{\bar S} & -e \end{bmatrix}$. Solve the "electrical flow" s-t min-cut: minimize $\|B_S x\|_{C(\alpha),2}$ subject to $x_s = 1$, $x_t = 0$.
18. s-t min-cut → PageRank. The PageRank vector $z$ that solves $(\alpha D + L)z = \alpha s$ with $s = d_S/\mathrm{vol}(S)$ is a renormalized solution of the electrical-cut computation: minimize $\|B_S x\|_{C(\alpha),2}$ subject to $x_s = 1$, $x_t = 0$. Specifically, if $x$ is the solution, then $x = \begin{bmatrix} 1 \\ \mathrm{vol}(S)\,z \\ 0 \end{bmatrix}$. Proof: square and expand the objective into a Laplacian, then apply the constraints.
19. PageRank → s-t min-cut. That equivalence works if $s$ is degree-weighted. What if $s$ is the uniform vector? $A(s) = \begin{bmatrix} 0 & \alpha s^T & 0 \\ \alpha s & A & \alpha(d-s) \\ 0 & \alpha(d-s)^T & 0 \end{bmatrix}$.
20. Insight 1: PageRank implicitly approximates the solution of these s-t mincut problems.
21. The Push Algorithm for PageRank. Proposed (in closest form) in Andersen, Chung, Lang (also by McSherry, Jeh & Widom) for personalized PageRank. Strongly related to Gauss-Seidel on $Ax = b$ (see my talk at Simons). Derived to show improved runtime for balanced solvers. The push method, with parameters $\tau, \rho$:
1. $x^{(1)} = 0$, $r^{(1)} = (1-\beta)e_i$, $k = 1$
2. while any $r_j > \tau d_j$ ($d_j$ is the degree of node $j$):
3. $x^{(k+1)} = x^{(k)} + (r_j - \tau d_j \rho)e_j$
4. $r_i^{(k+1)} = \tau d_j \rho$ if $i = j$; $r_i^{(k)} + \beta(r_j - \tau d_j \rho)/d_j$ if $i \sim j$; $r_i^{(k)}$ otherwise
5. $k \leftarrow k + 1$
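The five steps above can be sketched as working code. This is an illustrative rendering, not the speaker's implementation: the queue-based sweep order, the tiny path graph, and the parameter values are my choices, with $\rho = 1$ as on the later slides.

```python
# Push method for (I - beta*A*D^{-1}) x = (1-beta) e_seed,
# keeping a residual r and pushing any node with r_j > tau*d_j.
import collections

adj = {0: [1], 1: [0, 2], 2: [1]}       # 3-node path graph
deg = {u: len(vs) for u, vs in adj.items()}
beta, tau, rho = 0.85, 1e-8, 1.0
seed = 0

x = {u: 0.0 for u in adj}
r = {u: 0.0 for u in adj}
r[seed] = 1.0 - beta
queue = collections.deque([seed])
while queue:
    j = queue.popleft()
    if r[j] <= tau * deg[j]:
        continue                        # stale queue entry
    m = r[j] - tau * deg[j] * rho       # mass moved into the solution
    x[j] += m
    r[j] = tau * deg[j] * rho
    for i in adj[j]:                    # spread beta*m over neighbors
        had_excess = r[i] > tau * deg[i]
        r[i] += beta * m / deg[j]
        if not had_excess and r[i] > tau * deg[i]:
            queue.append(i)
```

At termination every residual satisfies $r_i \le \tau d_i$, and the invariant $\mathbf{1}^T x + \mathbf{1}^T r/(1-\beta) = 1$ holds throughout, which is why only the touched nodes ever cost any work.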
22. Why do we care about push? 1. Used for empirical studies of "communities". 2. Local Cheeger inequality. 3. Used for "fast PageRank approximation". 4. It produces weakly localized approximations to PageRank: $\|D^{-1}(x - x^*)\|_\infty \le \varepsilon$ with roughly $\frac{1}{(1-\beta)\varepsilon}$ edges of work. Example: Newman's netscience graph, 379 vertices, 1828 non-zeros; $s$ has a single one, and the result is "zero" on most of the nodes.
23. The push method revisited. Let $x$ be the output from the push method with $0 < \beta < 1$, $v = d_S/\mathrm{vol}(S)$, $\rho = 1$, and $\tau > 0$. Set $\alpha = \frac{1-\beta}{\beta}$, $\kappa = \tau\,\mathrm{vol}(S)/\beta$, and let $z_G$ solve: minimize $\frac{1}{2}\|B_S z\|^2_{C(\alpha),2} + \kappa\|Dz\|_1$ subject to $z_s = 1$, $z_t = 0$, $z \ge 0$, where $z = \begin{bmatrix} 1 \\ z_G \\ 0 \end{bmatrix}$. Then $x = Dz_G/\mathrm{vol}(S)$. Proof: write out the KKT conditions and show that the push method solves them; slackness was "tricky". The 1-norm term is regularization for sparsity, and the division by $\mathrm{vol}(S)$ is the needed normalization.
24. Insight 2: the PageRank push method implicitly solves a 1-norm-regularized 2-norm cut approximation.
25. Insight 2′: we get 3 digits of accuracy on P and 16 digits of accuracy on P′.
26. [Figure: examples of the different cut vectors on a portion of the netscience graph, with its vertices enlarged. The left subfigure shows the set $S$ highlighted; the other subfigures show the solution vectors from the various cut problems (min-cut, PageRank, and ACL/push) for this set $S$. Large values are dark; white vertices with outlines are numerically non-zero. The true min-cut set is large in all versions; the subfigures have 16, 15, 284, and 24 non-zeros.] Push's sparsity helps it identify the "right" graph feature with many fewer non-zeros than the vanilla PageRank problem.
27. The push method revisited: the same regularized cut characterization, $x = Dz_G/\mathrm{vol}(S)$ with the 1-norm penalty $\kappa\|Dz\|_1$, gives regularization for sparsity in both the solution and the residual. The push method is scalable because it gives us sparse solutions AND sparse residuals $r$.
28. This is a case of Algorithmic Anti-differentiation!
29. Algorithmic anti-differentiation: given heuristic H, is there a problem P′ such that H is an algorithm for P′? The real world: given "find-communities", hack around, then write a paper presenting "three steps of the power method on P finds communities". One route: guess and check until you find something H solves. Better: derive a characterization of heuristic H, show that H solves P′, and thereby understand why H works. E.g., Mahoney & Orecchia; Dhillon et al. (Graclus); Saunders.
30. Without these insights, we'd draw the wrong conclusion. Our s-t mincut framework extends to many diffusions used in semi-supervised learning. Gleich & Mahoney, Submitted.
31. Without these insights, we'd draw the wrong conclusion (continued). [Plot: error rate vs. average training samples per class for K2, RK2, K3, RK3 with an off-the-shelf SSL procedure.]
32. Without these insights, we'd draw the wrong conclusion (continued). [Plots: error rate vs. average training samples per class for K2, RK2, K3, RK3; off-the-shelf SSL procedure vs. rank-rounded SSL.]
33. Recap so far: 1. used the relationship between PageRank and mincut to get a new understanding of the implicit properties of the push method; 2. showed that this insight helps improve semi-supervised learning. Next: a new algorithm for the heat kernel diffusion in a degree-weighted norm.
34. Graph diffusions (recap). PageRank: $x = (1-\beta)\sum_{k=0}^{\infty} \beta^k P^k s$, i.e. $(I - \beta P)x = (1-\beta)s$. Heat kernel: $h = e^{-t}\sum_{k=0}^{\infty} \frac{t^k}{k!} P^k s = e^{-t}\exp\{tP\}s$. [Plot: weight vs. length for $t = 1, 5, 15$ and $\alpha = 0.85, 0.99$.] Many "empirically useful" properties of PageRank also hold for the heat kernel diffusion; e.g., Chung (2007) showed a local Cheeger inequality. There was no "local" algorithm until a randomized method by Simpson & Chung (2013).
35. We can turn the heat kernel into a linear system. Direct expansion: $x = \exp(P)e_c \approx \sum_{k=0}^{N} \frac{1}{k!}P^k e_c = x_N$. This truncation is the solution of the block-bidiagonal system $\begin{bmatrix} I & & & \\ -P/1 & I & & \\ & \ddots & \ddots & \\ & & -P/N & I \end{bmatrix}\begin{bmatrix} v_0 \\ v_1 \\ \vdots \\ v_N \end{bmatrix} = \begin{bmatrix} e_c \\ 0 \\ \vdots \\ 0 \end{bmatrix}$, with $x_N = \sum_{i=0}^{N} v_i$; compactly, $(I \otimes I_N - S_N \otimes P)v = e_1 \otimes e_c$. Lemma: we approximate $x_N$ well if we approximate $v$ well. Kloster & Gleich, WAW2013.
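Forward substitution on the block-bidiagonal system gives $v_k = (P/k)\,v_{k-1}$, so summing the blocks reproduces the truncated Taylor series. A small sketch of that check on an illustrative 3-node graph (not the speaker's code):

```python
# Solve the block-bidiagonal system by forward substitution and
# compare x_N = sum_k v_k against the direct truncated Taylor sum.
import math

A = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]   # 3-node path graph
d = [sum(col) for col in zip(*A)]
n, N = 3, 20
P = [[A[i][j] / d[j] for j in range(n)] for i in range(n)]

def matvec(M, w):
    return [sum(M[i][j] * w[j] for j in range(n)) for i in range(n)]

c = 0
v = [1.0 if i == c else 0.0 for i in range(n)]   # v_0 = e_c
xN = v[:]
for k in range(1, N + 1):
    v = [vi / k for vi in matvec(P, v)]          # v_k = (P/k) v_{k-1}
    xN = [a + b for a, b in zip(xN, v)]

# direct truncated Taylor sum sum_k P^k e_c / k! for comparison
direct = [0.0] * n
pk = [1.0 if i == c else 0.0 for i in range(n)]
for k in range(N + 1):
    direct = [a + b / math.factorial(k) for a, b in zip(direct, pk)]
    pk = matvec(P, pk)
```

Since $P$ is column-stochastic, the entries of $x_N$ sum to $\sum_{k=0}^{N} 1/k!$, which is within machine precision of $e$ for $N = 20$.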
36. There is a fast deterministic adaptation of the push method. Kloster & Gleich, KDD2014.

```python
# G is graph as dictionary-of-sets,
# seed is an array of seeds,
# t, eps, N, psis are precomputed
import collections, math
x = {}                        # store x, r as dictionaries
r = {}                        # initialize residual
Q = collections.deque()       # initialize queue
for s in seed:
    r[(s, 0)] = 1. / len(seed)
    Q.append((s, 0))
while len(Q) > 0:
    (v, j) = Q.popleft()      # v has r[(v,j)] ...
    rvj = r[(v, j)]
    # perform the hk-relax step
    if v not in x:
        x[v] = 0.
    x[v] += rvj
    r[(v, j)] = 0.
    mass = (t * rvj / (float(j) + 1.)) / len(G[v])
    for u in G[v]:            # for neighbors of v
        next = (u, j + 1)     # in the next block
        if j + 1 == N:        # last step, add to soln
            x[u] += rvj / len(G[v])
            continue
        if next not in r:
            r[next] = 0.
        thresh = math.exp(t) * eps * len(G[u])
        thresh = thresh / (N * psis[j + 1]) / 2.
        if r[next] < thresh and r[next] + mass >= thresh:
            Q.append(next)    # add u to queue
        r[next] = r[next] + mass
```

Figure 2: pseudo-code for our algorithm as working python code; the graph is stored as a dictionary-of-sets. Let $h = e^{-t}\exp\{tP\}s$ and let $x$ be the hk-push($\varepsilon$) output. Then $\|D^{-1}(x - h)\|_\infty \le \varepsilon$ after looking at $\frac{2Ne^t}{\varepsilon}$ edges. We believe the bound $N \le 2t\log(1/\varepsilon)$ suffices.
37. PageRank vs. Heat Kernel. [Plots: runtime and conductance for hkgrow vs. pprgrow (25th/50th/75th percentiles) as a function of $\log_{10}(|V|+|E|)$.] On large graphs, our heat kernel takes slightly longer than a localized PageRank, but produces sets with smaller (better) conductance scores. Our python code on clueweb12 (72B edges) via libbvg: 99 seconds to load, 1 second to compute.
38. References and ongoing work. Gleich and Kloster – Relaxation methods for the matrix exponential, Submitted. Kloster and Gleich – Heat kernel based community detection, KDD2014. Gleich and Mahoney – Algorithmic Anti-differentiation, ICML 2014. Gleich and Mahoney – Regularized diffusions, Submitted. Code: www.cs.purdue.edu/homes/dgleich/codes/nexpokit and www.cs.purdue.edu/homes/dgleich/codes/l1pagerank. Ongoing: improved localization bounds for functions of matrices; asynchronous and parallel "push"-style methods. Supported by NSF CAREER 1149756-CCF. www.cs.purdue.edu/homes/dgleich
