Relaxation methods for the matrix exponential on large networks
 


My talk from the Stanford ICME seminar series on network analysis and link prediction using a fast algorithm for the matrix exponential on graph problems.

    Presentation Transcript

    • Coordinate descent methods for the matrix exponential on large networks. David F. Gleich, Purdue University. Joint work with Kyle Kloster (Purdue), supported by NSF CAREER 1149756-CCF. Code: www.cs.purdue.edu/homes/dgleich/codes/nexpokit. ICME David Gleich · Purdue
    • This talk: $x = \exp(P) e_c$, where $x$ is the solution (localized), $P$ is the matrix (large, sparse, stochastic), and $e_c$ is the column (column $c$ of the identity).
    • Localized solutions: $x = \exp(P) e_c$ with length(x) = 513,969 and nnz(x) = 513,969. [Figure: plot(x), and two log-log plots of error versus nonzeros.]
    • Our mission: find the solution with work roughly proportional to the localization, not the matrix.
    • Our algorithm: www.cs.purdue.edu/homes/dgleich/codes/nexpokit [Figure: two log-log plots of error versus nonzeros.]
    • Outline: 1. Motivation and setup; 2. Converting $x = \exp(P) e_c$ into a linear system; 3. Coordinate descent methods for linear systems from large networks; 4. Error analysis; 5. Experiments.
    • Models and algorithms for high performance matrix and network computations. Tensor eigenvalues and a power method: maximize $\sum_{ijk} T_{ijk} x_i x_j x_k$ subject to $\|x\|_2 = 1$, using the SSHOPM method due to Kolda and Mayo, $[x^{(\mathrm{next})}]_i = \rho \cdot \bigl(\sum_{jk} T_{ijk} x_j x_k + \text{shift} \cdot x_i\bigr)$, where $\rho$ ensures the 2-norm. Simulation data analysis: SIMAX '09, SISC '11, MapReduce '11, ICASSP '12. Network alignment (matrix methods for edge overlap, tensor methods for matching triangles): ICDM '09, SC '11, TKDE '13. Fast and scalable network centrality: SC '05, WAW '07, SISC '10, WWW '10, … Data clustering: WSDM '12, KDD '12, CIKM '13, … Core problems: $Ax = b$, $\min \|Ax - b\|$, $Ax = \lambda x$. Massive matrix computations on multi-threaded and distributed architectures.
    • Matrix exponentials: for a real $n \times n$ matrix $A$, $\exp(A)$ is defined as $\sum_{k=0}^{\infty} \frac{1}{k!} A^k$. The series always converges; it is a special case of a function of a matrix. It is the evolution operator for an ODE: $\frac{dx}{dt} = A x(t) \iff x(t) = \exp(tA) x(0)$.
    • Cleve Moler and Charles Van Loan, "Nineteen Dubious Ways to Compute the Exponential of a Matrix, Twenty-Five Years Later," SIAM Review, Vol. 45, No. 1, pp. 3–49, 2003.
    • Matrix exponentials on large networks. $\exp(A) = \sum_{k=0}^{\infty} \frac{1}{k!} A^k$: if $A$ is the adjacency matrix, then $A^k$ counts the number of length-$k$ walks (paths that may revisit nodes) between node pairs [Estrada 2000, Farahat et al. 2002, 2006]. Large entries denote important nodes or edges. Used for link prediction and centrality. $\exp(P) = \sum_{k=0}^{\infty} \frac{1}{k!} P^k$: if $P$ is a transition matrix, then $P^k$ gives the probability of a length-$k$ walk between node pairs [Kondor & Lafferty 2002, Kunegis & Lommatzsch 2009, Chung 2007]. Used for link prediction, kernels, and clustering or community detection.
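The walk-counting reading of $A^k$ can be checked by hand on a tiny graph. The following is a minimal sketch, assuming a hypothetical 3-node path graph 0 - 1 - 2; the helper names `matmul` and `walk_counts` are illustrative and not part of the talk or of nexpokit.

```python
def matmul(X, Y):
    """Dense product of two square matrices stored as lists of lists."""
    n = len(X)
    return [[sum(X[i][t] * Y[t][j] for t in range(n)) for j in range(n)]
            for i in range(n)]


def walk_counts(A, k):
    """Return A^k; entry [i][j] counts the length-k walks from i to j."""
    n = len(A)
    result = [[int(i == j) for j in range(n)] for i in range(n)]  # A^0 = I
    for _ in range(k):
        result = matmul(result, A)
    return result


# Hypothetical example: the path graph 0 - 1 - 2.
A = [[0, 1, 0],
     [1, 0, 1],
     [0, 1, 0]]
A2 = walk_counts(A, 2)  # A2[1][1] == 2: the walks 1-0-1 and 1-2-1
```

Weighting each power by $1/k!$ and summing gives the exponential, so short walks dominate the score between two nodes.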
    • Another useful matrix exponential. $P$ is column stochastic, e.g. $P = A^T D^{-1}$, where $A$ is the adjacency matrix. If $A$ is symmetric, $\exp(P^T) = \exp(D^{-1} A) = D^{-1} \exp(A D^{-1}) D = D^{-1} \exp(P) D$.
    • Another useful matrix exponential: the negative normalized Laplacian. $P$ is column stochastic, e.g. $P = A^T D^{-1}$, where $A$ is the adjacency matrix. If $A$ is symmetric, $\exp(-L) = \exp(D^{-1/2} A D^{-1/2} - I) = \frac{1}{e} \exp(D^{-1/2} A D^{-1/2}) = \frac{1}{e} D^{-1/2} \exp(A D^{-1}) D^{1/2} = \frac{1}{e} D^{-1/2} \exp(P) D^{1/2}$.
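Both identities for symmetric $A$ rest on the fact that the exponential commutes with a similarity transform, which follows directly from the series definition. A short derivation, where $M$ and $S$ are generic symbols introduced here for illustration ($S$ invertible):

```latex
\exp(S^{-1} M S)
  = \sum_{k=0}^{\infty} \frac{1}{k!} \, (S^{-1} M S)^k
  = S^{-1} \Bigl( \sum_{k=0}^{\infty} \frac{1}{k!} \, M^k \Bigr) S
  = S^{-1} \exp(M) \, S ,
\qquad
\exp(M - I) = e^{-1} \exp(M) .
```

Taking $M = A D^{-1}$ with $S = D$ gives $\exp(D^{-1} A) = D^{-1} \exp(A D^{-1}) D$, and $S = D^{1/2}$ gives $\exp(D^{-1/2} A D^{-1/2}) = D^{-1/2} \exp(A D^{-1}) D^{1/2}$. The second relation holds because $I$ commutes with every $M$.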
    • Matrix exponentials on large networks. Is a single column interesting? Yes! $\exp(P) e_c = \sum_{k=0}^{\infty} \frac{1}{k!} P^k e_c$ gives link prediction scores for node $c$, or a community relative to node $c$. But modern networks are large (~$O(10^9)$ nodes), sparse (~$O(10^{11})$ edges), and constantly changing, so we'd like speed over accuracy.
    • The issue with existing methods. We want good results in less than one matvec. Our graphs have small diameter and fast fill-in. Krylov methods, $\exp(P) e_c \approx \rho V \exp(H) e_1$ [Sidje 1998, ExpoKit]: a few matvecs, quick loss of sparsity due to orthogonality. Direct expansion, $\exp(P) e_c \approx \sum_{k=0}^{N} \frac{1}{k!} P^k e_c$: a few matvecs, quick loss of sparsity due to fill-in.
    • Outline: 1. Motivation and setup ✓; 2. Converting $x = \exp(P) e_c$ into a linear system; 3. Coordinate descent methods for linear systems from large networks; 4. Error analysis; 5. Experiments.
    • Our underlying method: direct expansion, $x = \exp(P) e_c \approx \sum_{k=0}^{N} \frac{1}{k!} P^k e_c = x_N$. A few matvecs, quick loss of sparsity due to fill-in. This method is stable for stochastic $P$: no cancellation, unbounded norm, etc. Lemma: $\|x - x_N\|_1 \le \frac{1}{N! \, N}$.
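The direct expansion and its truncation lemma can be tried on a toy graph. This is a minimal sketch, assuming a hypothetical 4-node ring with a column-stochastic $P$ stored as a dict of sparse columns; `taylor_column` and `ring` are illustrative names, not the talk's code.

```python
import math


def taylor_column(P_cols, c, N):
    """Return x_N = sum_{k=0}^{N} P^k e_c / k! as a dict {node: value}.

    P_cols[j] is a dict {i: P[i][j]} holding the nonzeros of column j.
    """
    term = {c: 1.0}              # term_0 = e_c
    x = dict(term)
    for k in range(1, N + 1):
        nxt = {}
        for j, val in term.items():
            for i, pij in P_cols[j].items():
                nxt[i] = nxt.get(i, 0.0) + pij * val / k   # term_k = P term_{k-1} / k
        term = nxt
        for i, val in term.items():
            x[i] = x.get(i, 0.0) + val
    return x


# Hypothetical example: random walk on a 4-node ring (column stochastic).
ring = {j: {(j - 1) % 4: 0.5, (j + 1) % 4: 0.5} for j in range(4)}
x5 = taylor_column(ring, 0, 5)

# For column-stochastic P the entries of exp(P) e_c are nonnegative and sum to e,
# so the 1-norm truncation error is e minus the mass captured so far.
truncation_error = math.e - sum(x5.values())
lemma_bound = 1.0 / (math.factorial(5) * 5)
```

Even on this tiny graph every node is nonzero after two terms, which is the fill-in the slide warns about.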
    • Our underlying method as a linear system. Direct expansion: $x = \exp(P) e_c \approx \sum_{k=0}^{N} \frac{1}{k!} P^k e_c = x_N$. Write it as $\begin{bmatrix} I & & & & \\ -P/1 & I & & & \\ & -P/2 & I & & \\ & & \ddots & \ddots & \\ & & & -P/N & I \end{bmatrix} \begin{bmatrix} v_0 \\ v_1 \\ \vdots \\ \vdots \\ v_N \end{bmatrix} = \begin{bmatrix} e_c \\ 0 \\ \vdots \\ \vdots \\ 0 \end{bmatrix}$ with $x_N = \sum_{i=0}^{N} v_i$, or compactly $(I \otimes I_N - S_N \otimes P) v = e_1 \otimes e_c$. Lemma: we approximate $x_N$ well if we approximate $v$ well.
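Reading the block system row by row shows why its solution reproduces the truncated series. A short worked step, using only the symbols on the slide:

```latex
v_0 = e_c, \qquad
v_k = \tfrac{1}{k} \, P \, v_{k-1} \quad (k = 1, \dots, N)
\;\Longrightarrow\;
v_k = \tfrac{1}{k!} \, P^k e_c ,
\qquad
\sum_{k=0}^{N} v_k = \sum_{k=0}^{N} \tfrac{1}{k!} \, P^k e_c = x_N .
```

Row $k$ of the system states $v_k - \tfrac{1}{k} P v_{k-1} = 0$, so forward substitution yields exactly the Taylor terms.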
    • Our mission (2): approximately solve $Ax = b$ when $A$ and $b$ are sparse and $x$ is localized.
    • Outline: 1. Motivation and setup ✓; 2. Converting $x = \exp(P) e_c$ into a linear system ✓; 3. Coordinate descent methods for linear systems from large networks; 4. Error analysis; 5. Experiments.
    • Coordinate descent, Gauss-Southwell, Gauss-Seidel, relaxation & "push" methods. Algebraically, for $Ax = b$: $r^{(k)} = b - A x^{(k)}$, $x^{(k+1)} = x^{(k)} + e_j e_j^T r^{(k)}$, $r^{(k+1)} = r^{(k)} - r^{(k)}_j A e_j$. Procedurally:
          Solve(A,b)
            x = sparse(size(A,1),1)
            r = b
            While (1)
              Pick j where r(j) != 0
              z = r(j)
              x(j) = x(j) + r(j)
              For i where A(i,j) != 0
                r(i) = r(i) - z*A(i,j)
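The procedural form translates almost line for line into a dictionary-based routine. The following is a minimal sketch, assuming $A$ has a unit diagonal and is stored as a dict of sparse columns; the stopping tolerance, the step cap and the name `relax_solve` are additions for illustration, and the coordinate is picked by the Gauss-Southwell rule (largest residual entry).

```python
def relax_solve(A_cols, b, tol=1e-10, max_steps=10000):
    """Coordinate relaxation for A x = b, assuming diag(A) = 1.

    A_cols[j] is a dict {i: A[i][j]}; b is a dict {i: b_i}.
    Returns the sparse solution x and the final residual r.
    """
    x = {}
    r = dict(b)
    for _ in range(max_steps):
        if not r:
            break
        j = max(r, key=lambda i: abs(r[i]))   # Gauss-Southwell choice
        z = r[j]
        if abs(z) < tol:                      # every residual entry is below tol
            break
        x[j] = x.get(j, 0.0) + z
        for i, aij in A_cols[j].items():      # only column j of A is touched
            r[i] = r.get(i, 0.0) - z * aij    # the unit diagonal zeroes r[j]
    return x, r


# Hypothetical 2x2 example: A = [[1, -0.25], [-0.25, 1]], b = e_0.
A_cols = {0: {0: 1.0, 1: -0.25}, 1: {0: -0.25, 1: 1.0}}
x, r = relax_solve(A_cols, {0: 1.0})   # exact solution is (16/15, 4/15)
```

Each step reads one column of $A$ and touches only the entries of `x` and `r` that the column reaches, which is what keeps the work tied to the localization rather than to the matrix size.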
    • It's called the "push" method because of PageRank. For $(I - \alpha P) x = v$: $r^{(k)} = v - (I - \alpha P) x^{(k)}$, $x^{(k+1)} = x^{(k)} + e_j e_j^T r^{(k)}$, and the generic update "$r^{(k+1)} = r^{(k)} - r^{(k)}_j A e_j$" becomes $r^{(k+1)}_i = \begin{cases} 0 & i = j \\ r^{(k)}_i + \alpha P_{i,j} r^{(k)}_j & P_{i,j} \neq 0 \\ r^{(k)}_i & \text{otherwise} \end{cases}$
          PageRankPush(links,v,alpha)
            x = sparse(size(v,1),1)
            r = v
            While (1)
              Pick j where r(j) != 0
              z = r(j)
              x(j) = x(j) + z
              r(j) = 0
              z = alpha * z / deg(j)
              For i where "j links to i"
                r(i) = r(i) + z
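The push step can be written directly over adjacency lists. This is a minimal sketch, assuming every node has at least one out-link and that $P_{i,j} = 1/\deg(j)$ when $j$ links to $i$; the tolerance, the step cap and the name `pagerank_push` are illustrative assumptions.

```python
def pagerank_push(out_links, v, alpha, tol=1e-8, max_steps=100000):
    """Push method for (I - alpha P) x = v with P[i][j] = 1/deg(j) if j -> i.

    out_links[j] lists the nodes that j links to; v is a dict {node: mass}.
    """
    x = {}
    r = dict(v)
    for _ in range(max_steps):
        j = max(r, key=r.get)            # largest residual entry
        z = r[j]
        if z < tol:
            break
        x[j] = x.get(j, 0.0) + z         # move the residual into the solution
        r[j] = 0.0
        share = alpha * z / len(out_links[j])
        for i in out_links[j]:           # "push" alpha * z along the out-links
            r[i] = r.get(i, 0.0) + share
    return x, r


# Hypothetical 3-node graph: 0 -> 1, 1 -> 0, 1 -> 2, 2 -> 0.
out_links = {0: [1], 1: [0, 2], 2: [0]}
x, r = pagerank_push(out_links, {0: 1.0}, alpha=0.5)
```

Since $P$ is column stochastic, $(1 - \alpha) \sum_i x_i + \sum_i r_i = \sum_i v_i$ holds after every push, which gives a cheap sanity check on an implementation.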
    • It's called the "push" method because of PageRank: demo.
    • Justification of terminology. This method is frequently "rediscovered" (3 times for PageRank!). Let $Ax = b$ with $\mathrm{diag}(A) = I$. It's Gauss-Seidel if $j$ is chosen cyclically. It's Gauss-Southwell if $j$ is the largest entry in the residual. It's coordinate descent if $A$ is symmetric positive definite. It's a relaxation step for any $A$. Works great for other problems too! [Bonchi, Gleich, et al. J. Internet Math. 2012]
    • Back to the exponential. $\begin{bmatrix} I & & & & \\ -P/1 & I & & & \\ & -P/2 & I & & \\ & & \ddots & \ddots & \\ & & & -P/N & I \end{bmatrix} \begin{bmatrix} v_0 \\ v_1 \\ \vdots \\ \vdots \\ v_N \end{bmatrix} = \begin{bmatrix} e_c \\ 0 \\ \vdots \\ \vdots \\ 0 \end{bmatrix}$ with $x_N = \sum_{i=0}^{N} v_i$, i.e. $(I \otimes I_N - S_N \otimes P) v = e_1 \otimes e_c$. Solve this system via the same method. Optimization 1: build the system implicitly. Optimization 2: don't store the $v_i$, just store the sum $x_N$.
    • Code (inefficient, but working) for Gauss-Southwell to solve the system:
          function x = nexpm(P,c,tol)
          n = size(P,1); N = 11; sumr = 1;
          % the residual and solution
          r = zeros(n,N+1); r(c,1) = 1; x = zeros(n,1);
          while sumr >= tol % use max iteration too
            % use a heap in practice for max
            [ml,q] = max(r(:)); i = mod(q-1,n)+1; k = ceil(q/n);
            % zero the residual, add to solution
            r(q) = 0; x(i) = x(i)+ml; sumr = sumr-ml;
            % look up the neighbors of node i
            [nset,~,vals] = find(P(:,i)); ml = ml/k;
            for j=1:numel(nset) % for all neighbors
              if k==N
                x(nset(j)) = x(nset(j)) + vals(j)*ml; % add to solution
              else
                r(nset(j),k+1) = r(nset(j),k+1) + vals(j)*ml; % or add to next residual
                sumr = sumr + vals(j)*ml;
              end
            end
          end
      Todo: use a dictionary for x and r, and use a heap or queue for the residual.
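The "Todo" on this slide (dictionaries for x and r, a heap for the residual) can be sketched in a few lines. This is a hedged illustration rather than the nexpokit implementation: `nexpm_gs`, the lazy-deletion heap, the step cap and the ring test graph are all assumptions made here.

```python
import heapq
import math


def nexpm_gs(P_cols, c, tol=1e-6, N=11, max_steps=10**6):
    """Gauss-Southwell sketch for x ~ exp(P) e_c with column-stochastic P.

    P_cols[i] is a dict {j: P[j][i]} holding the nonzeros of column i.
    The residual is keyed by (node, block) with blocks 1..N.
    """
    x = {}
    r = {(c, 1): 1.0}
    heap = [(-1.0, c, 1)]                # max-heap via negated values
    sumr, steps = 1.0, 0
    while heap and sumr >= tol and steps < max_steps:
        negval, i, k = heapq.heappop(heap)
        if r.get((i, k)) != -negval:     # stale heap entry, skip it
            continue
        steps += 1
        val = r.pop((i, k))              # zero the residual entry
        x[i] = x.get(i, 0.0) + val       # and add it to the solution
        sumr -= val
        push = val / k
        for j, pji in P_cols[i].items():
            if k == N:                   # last block: add straight to x
                x[j] = x.get(j, 0.0) + pji * push
            else:                        # otherwise feed the next block
                new = r.get((j, k + 1), 0.0) + pji * push
                r[(j, k + 1)] = new
                sumr += pji * push
                heapq.heappush(heap, (-new, j, k + 1))
    return x


# Hypothetical example: random walk on a 4-node ring (column stochastic).
ring = {j: {(j - 1) % 4: 0.5, (j + 1) % 4: 0.5} for j in range(4)}
x = nexpm_gs(ring, 0, tol=1e-8)
```

Stale heap entries are skipped instead of being updated in place, which keeps the code short at the cost of some extra heap memory; an indexed heap would avoid that.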
    • Outline: 1. Motivation and setup ✓; 2. Converting $x = \exp(P) e_c$ into a linear system ✓; 3. Coordinate descent methods for linear systems from large networks ✓; 4. Error analysis; 5. Experiments.
    • Error analysis for Gauss-Southwell on $(I \otimes I_N - S_N \otimes P) v = e_1 \otimes e_c$. Theorem: assume $P$ is column-stochastic and $v^{(0)} = 0$. (Nonnegativity, "easy") Iterates and residuals are nonnegative: $v^{(l)} \ge 0$ and $r^{(l)} \ge 0$. (Convergence, "annoying") The residual goes to 0: $\|r^{(l)}\|_1 \le \prod_{k=1}^{l} \bigl(1 - \frac{1}{2dk}\bigr) \le l^{-\frac{1}{2d}}$, where $d$ is the largest degree.
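The second inequality in the convergence bound is easy to check numerically. A minimal sketch, with `residual_bound` as an illustrative helper name and the values of $l$ and $d$ chosen arbitrarily:

```python
def residual_bound(l, d):
    """Evaluate prod_{k=1}^{l} (1 - 1/(2 d k)), the product form of the bound."""
    prod = 1.0
    for k in range(1, l + 1):
        prod *= 1.0 - 1.0 / (2.0 * d * k)
    return prod


# Compare the product with the closed form l^(-1/(2d)) for a few cases.
checks = [(l, d, residual_bound(l, d), l ** (-1.0 / (2.0 * d)))
          for l in (1, 10, 1000) for d in (1, 5, 50)]
```

The inequality follows from $1 - t \le e^{-t}$ and $\sum_{k=1}^{l} 1/k \ge \ln l$. For large $d$ the exponent $1/(2d)$ is tiny, which is the slow algebraic rate.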
    • Proof sketch. Gauss-Southwell picks the largest residual ⇒ bound the update by the average of the nonzeros in the residual (sloppy) ⇒ algebraic convergence with a slow rate, but each update is really fast: $O(d_{\max} \log n)$. If $d$ is $\log \log n$, then our method runs in sub-linear time (but so does just about anything).
    • Overall error analysis. Components: truncation to $N$ terms, residual to error, approximate solve. Theorem: after $\ell$ steps of Gauss-Southwell, $\|x_N^{(\ell)} - x\|_1 \le \frac{1}{N! \, N} + e \cdot \ell^{-\frac{1}{2d}}$.
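The truncation term of the theorem gives a simple rule for picking $N$. A minimal sketch, assuming the truncation error alone should fall below a target `tol`; the helper name `taylor_degree` is illustrative.

```python
import math


def taylor_degree(tol):
    """Smallest N with 1 / (N! * N) <= tol, from the truncation term."""
    N = 1
    while 1.0 / (math.factorial(N) * N) > tol:
        N += 1
    return N


N_for_1e8 = taylor_degree(1e-8)   # 11, since 1/(11! * 11) is about 2.3e-9
```

A truncation target of $10^{-8}$ gives $N = 11$, which is consistent with the fixed `N = 11` in the MATLAB sketch, although the talk does not state how that value was chosen.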
    • Outline: 1. Motivation and setup ✓; 2. Converting $x = \exp(P) e_c$ into a linear system ✓; 3. Coordinate descent methods for linear systems from large networks ✓; 4. Error analysis ✓; 5. Experiments.
    • Our implementations. A C++ mex implementation with a heap to implement Gauss-Southwell. A C++ mex implementation with a queue to store all residual entries ≥ tol/(nN); since there are at most nN residual entries, the residual norm is ≤ tol at completion. We use the queue except for the runtime comparison.
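The queue variant can be sketched by replacing the heap with a first-in, first-out queue and a threshold. This is one plausible reading of the slide, not the C++ mex code: the threshold `tol / (n * N)`, the name `nexpm_queue` and the ring test graph are assumptions.

```python
import math
from collections import deque


def nexpm_queue(P_cols, c, tol=1e-6, N=11):
    """Queue-based relaxation sketch for x ~ exp(P) e_c.

    Residual entries are processed first-in, first-out, and only entries
    that reach tol / (n * N) are queued.
    """
    n = len(P_cols)
    thresh = tol / (n * N)
    x = {}
    r = {(c, 1): 1.0}
    queue = deque([(c, 1)])
    queued = {(c, 1)}
    while queue:
        key = queue.popleft()
        queued.discard(key)
        i, k = key
        val = r.pop(key)                 # zero the residual entry
        x[i] = x.get(i, 0.0) + val
        push = val / k
        for j, pji in P_cols[i].items():
            if k == N:                   # last block: add straight to x
                x[j] = x.get(j, 0.0) + pji * push
            else:
                nxt = (j, k + 1)
                r[nxt] = r.get(nxt, 0.0) + pji * push
                if r[nxt] >= thresh and nxt not in queued:
                    queue.append(nxt)
                    queued.add(nxt)
    return x, r


# Hypothetical example: random walk on a 4-node ring (column stochastic).
ring = {j: {(j - 1) % 4: 0.5, (j + 1) % 4: 0.5} for j in range(4)}
x, r = nexpm_queue(ring, 0, tol=1e-6)
```

When the queue empties, every remaining residual entry is below the threshold and there are at most $nN$ of them, so their sum is below `tol`.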
    • Accuracy vs. tolerance (pgp social graph, 10k vertices). For the pgp social graph, we study the precision in finding the 100 largest nodes as we vary the tolerance. This set of 100 does not include the node's immediate neighbors. (Boxplot over 50 trials.) [Figure: precision at 100 versus log10 of residual tolerance, from −2 to −7, on pgp-cc.]
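The precision measure used in these experiments can be written as a small helper. A minimal sketch, assuming scores are stored as dicts and that excluded nodes (the seed's immediate neighbors, for example) are passed explicitly; `precision_at_k` and the toy scores are illustrative and are not data from the talk.

```python
def precision_at_k(approx, exact, k, exclude=()):
    """Fraction of the exact top-k nodes that the approximate top-k recovers."""
    def top(scores):
        ranked = sorted((node for node in scores if node not in exclude),
                        key=lambda node: (-scores[node], node))
        return set(ranked[:k])
    return len(top(approx) & top(exact)) / k


# Hypothetical scores for five nodes; node 0 plays the role of the seed.
exact = {0: 5.0, 1: 4.0, 2: 3.0, 3: 2.0, 4: 1.0}
approx = {0: 5.0, 1: 1.0, 2: 3.0, 3: 4.0}
p = precision_at_k(approx, exact, k=2, exclude={0})
```

Here the exact top two are nodes 1 and 2 and the approximate top two are nodes 3 and 2, so the precision is 0.5.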
    • Accuracy vs. work (dblp collaboration graph, 225k vertices). For the dblp collaboration graph, we study the precision in finding the 100 largest nodes as we vary the work. This set of 100 does not include the node's immediate neighbors. (One column, but representative.) [Figure: precision versus effective matrix-vector products, from $10^{-2}$ to $10^{0}$, on dblp-cc, with curves for @10, @25, @100 and @1000 and markers at tol = $10^{-4}$ and tol = $10^{-5}$.]
    • Runtime (Flickr social network, 500k nodes, 5M edges). [Figure: runtime in seconds versus |E| + |V| on log-log axes for TSGS, TSGSQ, EXPV, MEXPV and TAYLOR.]
    • Outline: 1. Motivation and setup ✓; 2. Converting $x = \exp(P) e_c$ into a linear system ✓; 3. Coordinate descent methods for linear systems from large networks ✓; 4. Error analysis ✓; 5. Experiments ✓.
    • References and ongoing work. Kloster and Gleich, Workshop on Algorithms for the Web-graph, 2013 (forthcoming). Code: www.cs.purdue.edu/homes/dgleich/codes/nexpokit. Ongoing work: error analysis using the queue; better linear systems for faster convergence; asynchronous coordinate descent methods; scaling up to billion-node graphs; more explicit localization in algorithms.