Relaxation methods for the matrix exponential on large networks

My talk from the Stanford ICME seminar series on network analysis and link prediction using a fast algorithm for the matrix exponential on graph problems.



  1. Coordinate descent methods for the matrix exponential on large networks. David F. Gleich, Purdue University. Joint work with Kyle Kloster @ Purdue, supported by NSF CAREER 1149756-CCF. Code: www.cs.purdue.edu/homes/dgleich/codes/nexpokit. ICME David Gleich · Purdue
  2. This talk: compute x = exp(P) ec, where P is a large, sparse, stochastic matrix, ec is the column to extract, and the solution x is localized.
  3. Localized solutions. [Figure: plot(x) for x = exp(P) ec with length(x) = 513,969 and nnz(x) = 513,969; a log-log plot of error vs. number of nonzeros retained shows that a small fraction of the entries captures the vector accurately.]
  4. Our mission: find the solution with work roughly proportional to the localization, not the matrix.
  5. Our algorithm: www.cs.purdue.edu/homes/dgleich/codes/nexpokit. [Figure: error vs. number of nonzeros retained, on log-log axes.]
  6. Outline:
     1. Motivation and setup
     2. Converting x = exp(P) ec into a linear system
     3. Coordinate descent methods for linear systems from large networks
     4. Error analysis
     5. Experiments
  7. Models and algorithms for high-performance matrix and network computations. [Overview slide of the speaker's research: simulation data analysis (SIMAX '09, SISC '11, MapReduce '11, ICASSP '12); network alignment via matrix methods for edge overlap and tensor methods for matching triangles, using the SSHOPM method due to Kolda and Mayo (ICDM '09, SC '11, TKDE '13); fast and scalable network centrality (SC '05, WAW '07, SISC '10, WWW '10); data clustering (WSDM '12, KDD '12, CIKM '13); and massive matrix computations (Ax = b, min ||Ax − b||, Ax = λx) on multi-threaded and distributed architectures.]
  8. Matrix exponentials. For A an n × n real matrix, exp(A) is defined as sum_{k=0}^∞ (1/k!) A^k. The series always converges, and exp(A) is a special case of a function of a matrix. It is the evolution operator for an ODE: dx/dt = Ax(t) if and only if x(t) = exp(tA) x(0).
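As a concrete illustration of the definition (a sketch, not from the talk), here is a minimal pure-Python truncation of the series. For the nilpotent matrix A = [[0, 1], [0, 0]], the series terminates after two terms, so exp(A) = I + A exactly:

```python
import math

def mat_mul(A, B):
    """Multiply two small square matrices given as lists of lists."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def expm_taylor(A, terms=20):
    """Approximate exp(A) = sum_{k>=0} A^k / k! by truncating the series."""
    n = len(A)
    result = [[float(i == j) for j in range(n)] for i in range(n)]  # k = 0 term: I
    power = [[float(i == j) for j in range(n)] for i in range(n)]
    for k in range(1, terms):
        power = mat_mul(power, A)  # power holds A^k
        for i in range(n):
            for j in range(n):
                result[i][j] += power[i][j] / math.factorial(k)
    return result

# Nilpotent example: A^2 = 0, so exp(A) = I + A exactly.
A = [[0.0, 1.0], [0.0, 0.0]]
E = expm_taylor(A)
print(E)  # [[1.0, 1.0], [0.0, 1.0]]
```

Dense truncation like this is only sensible for tiny matrices; the rest of the talk is about avoiding exactly this kind of computation at scale.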
  9. Nineteen Dubious Ways to Compute the Exponential of a Matrix, Twenty-Five Years Later. Cleve Moler and Charles Van Loan, SIAM Review, Vol. 45, No. 1, pp. 3–49, 2003.
  10. Matrix exponentials on large networks: exp(A) = sum_{k=0}^∞ (1/k!) A^k. If A is the adjacency matrix, then A^k counts the number of length-k paths between node pairs [Estrada 2000; Farahat et al. 2002, 2006]; large entries denote important nodes or edges, used for link prediction and centrality. If P is a transition matrix, then P^k gives the probability of a length-k walk between node pairs, and exp(P) = sum_{k=0}^∞ (1/k!) P^k [Kondor & Lafferty 2002; Kunegis & Lommatzsch 2009; Chung 2007]; used for link prediction, kernels, and clustering or community detection.
  11. Another useful matrix exponential. Let P = A^T D^{-1} be column stochastic, where A is the adjacency matrix. If A is symmetric, then exp(P^T) = exp(D^{-1} A) = D^{-1} exp(A D^{-1}) D = D^{-1} exp(P) D.
  12. Another useful matrix exponential: the negative normalized Laplacian. With L = I − D^{-1/2} A D^{-1/2} and A symmetric, exp(−L) = exp(D^{-1/2} A D^{-1/2} − I) = (1/e) exp(D^{-1/2} A D^{-1/2}) = (1/e) D^{-1/2} exp(A D^{-1}) D^{1/2} = (1/e) D^{-1/2} exp(P) D^{1/2}.
  13. Matrix exponentials on large networks: is a single column interesting? Yes! exp(P) ec = sum_{k=0}^∞ (1/k!) P^k ec gives link-prediction scores for node c and a community relative to node c. But modern networks are large (~O(10^9) nodes), sparse (~O(10^11) edges), and constantly changing, so we'd like speed over accuracy.
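A single column of exp(P) can be approximated with nothing but sparse matrix-vector products against the series, never forming exp(P). A pure-Python sketch on a hypothetical toy graph (the graph, node labels, and N are illustrative, not from the talk):

```python
import math

# Toy undirected graph as an adjacency list (hypothetical example).
graph = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2]}

def matvec_P(x):
    """Apply the column-stochastic transition matrix P = A D^{-1} to a sparse dict vector."""
    y = {}
    for j, xj in x.items():
        w = xj / len(graph[j])          # column j of P has 1/deg(j) on each neighbor
        for i in graph[j]:
            y[i] = y.get(i, 0.0) + w
    return y

def expm_column(c, N=20):
    """Approximate exp(P) e_c = sum_{k=0}^{N} P^k e_c / k! with sparse matvecs."""
    term = {c: 1.0}                     # P^0 e_c
    x = dict(term)
    for k in range(1, N + 1):
        term = matvec_P(term)           # term now holds P^k e_c
        for i, v in term.items():
            x[i] = x.get(i, 0.0) + v / math.factorial(k)
    return x

x = expm_column(0)
# P is column stochastic and e_c sums to 1, so sum(x) is close to sum_k 1/k! = e.
print(sum(x.values()))
```

On a small-diameter network the iterate fills in after a few matvecs, which is exactly the problem the next slides address.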
  14. The issue with existing methods: we want good results in less than one matvec, but our graphs have small diameter and fast fill-in. Krylov methods (exp(P) ec ≈ ρ V exp(H) e1 [Sidje 1998], ExpoKit): a few matvecs, but quick loss of sparsity due to orthogonality. Direct expansion (exp(P) ec ≈ sum_{k=0}^N (1/k!) P^k ec): a few matvecs, but quick loss of sparsity due to fill-in.
  15. Outline:
     1. Motivation and setup ✓
     2. Converting x = exp(P) ec into a linear system
     3. Coordinate descent methods for linear systems from large networks
     4. Error analysis
     5. Experiments
  16. Our underlying method: direct expansion, x = exp(P) ec ≈ sum_{k=0}^N (1/k!) P^k ec = xN. A few matvecs, with quick loss of sparsity due to fill-in. This method is stable for stochastic P: no cancellation, no unbounded norms, etc. Lemma: ||x − xN||_1 ≤ 1/(N!·N).
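The lemma's bound is easy to sanity-check numerically: for column-stochastic P each term P^k ec has 1-norm 1, so the truncation error ||x − xN||_1 is at most the scalar Taylor tail e − sum_{k≤N} 1/k!, and the lemma says that tail is at most 1/(N!·N). A small check of the scalar inequality:

```python
import math

def taylor_tail(N):
    """Scalar series tail sum_{k>N} 1/k!, which bounds ||x - x_N||_1 for stochastic P."""
    return math.e - sum(1.0 / math.factorial(k) for k in range(N + 1))

def lemma_bound(N):
    """The lemma's closed-form bound 1/(N! * N)."""
    return 1.0 / (math.factorial(N) * N)

for N in (1, 2, 5, 8, 11):
    print(N, taylor_tail(N), lemma_bound(N))
```

The bound is tight-ish: at N = 11 (the value used in the code slide later in the deck) the tail is about 2.3e-9 against a bound of about 2.8e-9, so a dozen terms already give high accuracy.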
  17. Our underlying method as a linear system. The terms vk of the direct expansion satisfy the block bidiagonal system

        [  I            ] [ v0 ]   [ ec ]
        [ -P/1   I      ] [ v1 ]   [ 0  ]
        [      ..   ..  ] [ .. ] = [ .. ]
        [      -P/N   I ] [ vN ]   [ 0  ]

     or compactly (I ⊗ I − SN ⊗ P) v = e1 ⊗ ec, with xN = sum_{i=0}^N vi. Lemma: we approximate xN well if we approximate v well.
  18. Our mission (2): approximately solve Ax = b when A and b are sparse and x is localized.
  19. Outline:
     1. Motivation and setup ✓
     2. Converting x = exp(P) ec into a linear system ✓
     3. Coordinate descent methods for linear systems from large networks
     4. Error analysis
     5. Experiments
  20. Coordinate descent, Gauss-Southwell, Gauss-Seidel, relaxation, and "push" methods for Ax = b.
     Algebraically: r(k) = b − A x(k); x(k+1) = x(k) + ej (ej^T r(k)); r(k+1) = r(k) − rj(k) A ej.
     Procedurally:
        Solve(A,b)
          x = sparse(size(A,1),1)
          r = b
          while (1)
            pick j where r(j) != 0
            z = r(j)
            x(j) = x(j) + z
            for i where A(i,j) != 0
              r(i) = r(i) - z*A(i,j)
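The procedural loop above can be sketched in a few lines of Python. This assumes diag(A) = I (as the later "justification of terminology" slide does) and uses a linear scan for the largest residual purely for clarity; the matrix and right-hand side are hypothetical test data:

```python
def gauss_southwell(A, b, tol=1e-10, max_steps=100000):
    """Relaxation on Ax = b with diag(A) = I: repeatedly zero the largest residual entry.

    A is a dense list-of-lists with unit diagonal; a real implementation would use
    sparse columns and a heap for the max (this linear scan is for exposition).
    """
    n = len(b)
    x = [0.0] * n
    r = list(b)                      # r = b - A x, and x = 0 initially
    for _ in range(max_steps):
        j = max(range(n), key=lambda i: abs(r[i]))   # Gauss-Southwell choice
        z = r[j]
        if abs(z) < tol:
            break
        x[j] += z                    # x(j) <- x(j) + r(j), since A(j,j) = 1
        for i in range(n):           # r <- r - z * A e_j
            r[i] -= z * A[i][j]
    return x, r

# Diagonally dominant test system with unit diagonal (hypothetical example).
A = [[1.0, -0.2, 0.0],
     [-0.3, 1.0, -0.1],
     [0.0, -0.2, 1.0]]
b = [1.0, 0.0, 0.0]
x, r = gauss_southwell(A, b)
print(x, max(abs(v) for v in r))
```

Each step touches only one column of A, which is what makes the method attractive when x is localized: work scales with the entries actually updated, not with n.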
  21. It’s called the “push” method because of PageRank. For (I − αP) x = v, the relaxation step “r(k+1) = r(k) − rj(k) A ej” becomes: r(k+1)_i = 0 if i = j; r(k)_i + α P_{i,j} r(k)_j if P_{i,j} ≠ 0; and r(k)_i otherwise.
     PageRankPush(links,v,alpha)
       x = sparse(n,1)
       r = v
       while (1)
         pick j where r(j) != 0
         z = r(j)
         x(j) = x(j) + z
         r(j) = 0
         z = alpha*z / deg(j)
         for i where “j links to i”
           r(i) = r(i) + z
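The PageRank special case runs in a few lines of Python. This sketch assumes a hypothetical toy graph and alpha = 0.85, and it includes the α factor from the residual formula above; the push invariant v = (I − αP) x + r makes it easy to check correctness:

```python
# Toy directed graph: node -> list of out-neighbors (hypothetical example).
links = {0: [1, 2], 1: [2], 2: [0], 3: [0, 2]}
alpha = 0.85

def pagerank_push(links, v, alpha, tol=1e-8):
    """Solve (I - alpha * P) x = v with P = A D^{-1} by residual pushes."""
    x, r = {}, dict(v)
    while True:
        j, rj = max(r.items(), key=lambda kv: kv[1], default=(None, 0.0))
        if rj < tol:
            break
        x[j] = x.get(j, 0.0) + rj    # move residual mass into the solution
        r.pop(j)
        z = alpha * rj / len(links[j])
        for i in links[j]:           # push alpha * P_{ij} * r_j to each out-neighbor
            r[i] = r.get(i, 0.0) + z
    return x, r

x, r = pagerank_push(links, {0: 1.0}, alpha)
# Invariant: v = (I - alpha P) x + r, so (1 - alpha) * sum(x) + sum(r) = sum(v) = 1.
print((1 - alpha) * sum(x.values()) + sum(r.values()))
```

Every push removes rj from the residual and returns only α·rj of it, so the total residual shrinks monotonically and the loop terminates for any tol > 0.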
  22. It’s called the “push” method because of PageRank. [Demo]
  23. Justification of terminology. This method is frequently “rediscovered” (three times for PageRank!). Let Ax = b with diag(A) = I. It’s Gauss-Seidel if j is chosen cyclically; Gauss-Southwell if j is the largest entry in the residual; coordinate descent if A is symmetric positive definite; and a relaxation step for any A. It works great for other problems too [Bonchi, Gleich, et al., J. Internet Math. 2012].
  24. Back to the exponential. Solve the block system (I ⊗ I − SN ⊗ P) v = e1 ⊗ ec, with xN = sum_{i=0}^N vi, via the same method. Optimization 1: build the system implicitly. Optimization 2: don't store the vi; just store their running sum xN.
  25. Code (inefficient, but working) for Gauss-Southwell to solve the truncated-series system:

     function x = nexpm(P,c,tol)
     n = size(P,1); N = 11; sumr = 1;
     r = zeros(n,N+1); r(c,1) = 1;  % the residual
     x = zeros(n,1);                % the solution
     while sumr >= tol              % use a max iteration too
       [ml,q] = max(r(:));          % use a heap in practice for the max
       i = mod(q-1,n)+1; k = ceil(q/n);
       r(q) = 0; x(i) = x(i)+ml; sumr = sumr-ml;  % zero the residual, add to solution
       [nset,~,vals] = find(P(:,i));              % look up the neighbors of node i
       ml = ml/k;
       for j = 1:numel(nset)                      % for all neighbors
         if k == N
           x(nset(j)) = x(nset(j)) + vals(j)*ml;  % add to solution
         else
           r(nset(j),k+1) = r(nset(j),k+1) + vals(j)*ml;  % or add to the next residual
           sumr = sumr + vals(j)*ml;
         end
       end
     end

     Todo: use a dictionary for x and r, and a heap or queue for the residual.
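A hedged Python rendering of the MATLAB sketch above, following the slide's own "todo" by using dictionaries for x and r (still a linear scan for the maximum rather than a heap; the toy graph is a hypothetical example):

```python
import math

def nexpm_push(graph, c, tol=1e-6, N=11):
    """Approximate exp(P) e_c, P = A D^{-1}, by Gauss-Southwell on the block system.

    graph maps each node to its neighbor list; r[(i, k)] is the residual for node i
    in Taylor-term block k (block k holds the term P^{k-1} e_c / (k-1)!).
    """
    x = {}
    r = {(c, 1): 1.0}   # block 1 holds the k = 0 term, e_c
    sumr = 1.0
    while sumr >= tol and r:
        (i, k), ml = max(r.items(), key=lambda kv: kv[1])  # largest residual entry
        del r[(i, k)]
        x[i] = x.get(i, 0.0) + ml        # move mass into the solution
        sumr -= ml
        w = ml / (k * len(graph[i]))     # v_k = (P / k) v_{k-1}; P has 1/deg on neighbors
        for nbr in graph[i]:
            if k == N:                   # last block: truncate, add directly to x
                x[nbr] = x.get(nbr, 0.0) + w
            else:                        # otherwise pass mass to the next block's residual
                key = (nbr, k + 1)
                r[key] = r.get(key, 0.0) + w
                sumr += w
    return x

graph = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2]}  # toy undirected graph
x = nexpm_push(graph, 0, tol=1e-8)
print(sum(x.values()))  # close to e, since the columns of exp(P) each sum to e
```

Because P is column stochastic, all residual mass is nonnegative and sum(x) should land within tol (plus the truncation tail) of e, which gives a convenient correctness check.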
  26. Outline:
     1. Motivation and setup ✓
     2. Converting x = exp(P) ec into a linear system ✓
     3. Coordinate descent methods for linear systems from large networks ✓
     4. Error analysis
     5. Experiments
  27. Error analysis for Gauss-Southwell on (I ⊗ I − SN ⊗ P) v = e1 ⊗ ec. Theorem: assume P is column-stochastic and v(0) = 0. (Nonnegativity, the "easy" part) The iterates and residuals stay nonnegative: v(l) ≥ 0 and r(l) ≥ 0. (Convergence, the "annoying" part) The residual goes to 0: ||r(l)||_1 ≤ prod_{k=1}^l (1 − 1/(2dk)) ≤ l^{−1/(2d)}, where d is the largest degree.
  28. Proof sketch. Gauss-Southwell picks the largest residual, so we can bound the update by the average number of nonzeros in the residual (a sloppy bound). This gives algebraic convergence with a slow rate, but each update is REALLY fast: O(dmax log n). If d is log log n, then our method runs in sub-linear time (but so does just about anything).
  29. Overall error analysis. Components: truncation to N terms; converting the residual to an error; the approximate solve. Theorem: after ℓ steps of Gauss-Southwell, ||xN(ℓ) − x||_1 ≤ 1/(N!·N) + (1/e) · ℓ^{−1/(2d)}.
  30. Outline:
     1. Motivation and setup ✓
     2. Converting x = exp(P) ec into a linear system ✓
     3. Coordinate descent methods for linear systems from large networks ✓
     4. Error analysis ✓
     5. Experiments
  31. Our implementations. A C++ MEX implementation with a heap to implement Gauss-Southwell, and a C++ MEX implementation with a queue that stores all residual entries ≥ 1/(tol·n·N); at completion, the residual norm is ≤ tol. We use the queue version except for the runtime comparison.
  32. Accuracy vs. tolerance. [Boxplot over 50 trials: precision at 100 vs. log10 of the residual tolerance, pgp-cc.] For the pgp social graph (10k vertices), we study the precision in finding the 100 largest nodes as we vary the tolerance. This set of 100 does not include the node's immediate neighbors.
  33. Accuracy vs. work. [Plot: precision vs. effective matrix-vector products for dblp-cc at tol = 10^-4 and 10^-5, reporting precision @10, @25, @100, and @1000.] For the dblp collaboration graph (225k vertices), we study the precision in finding the 100 largest nodes as we vary the work. This set of 100 does not include the node's immediate neighbors. (One column, but representative.)
  34. Runtime. [Log-log plot: runtime (secs) vs. |E| + |V| for TSGS, TSGSQ, EXPV, MEXPV, and TAYLOR.] Flickr social network: 500k nodes, 5M edges.
  35. Outline:
     1. Motivation and setup ✓
     2. Converting x = exp(P) ec into a linear system ✓
     3. Coordinate descent methods for linear systems from large networks ✓
     4. Error analysis ✓
     5. Experiments ✓
  36. References and ongoing work. Kloster and Gleich, Workshop on Algorithms for the Web-graph, 2013 (forthcoming). www.cs.purdue.edu/homes/dgleich/codes/nexpokit
     • Error analysis using the queue
     • Better linear systems for faster convergence
     • Asynchronous coordinate descent methods
     • Scaling up to billion-node graphs
     • More explicit localization in algorithms
