Coordinate descent methods for the matrix exponential on large networks
David F. Gleich, Purdue University
Joint work with Kyle Kloster @ Purdue
Supported by NSF CAREER award CCF-1149756
Code: www.cs.purdue.edu/homes/dgleich/codes/nexpokit
ICME
David Gleich · Purdue
1
This talk: x = exp(P) ec
x is the solution: localized.
P is the matrix: large, sparse, stochastic.
ec is the c-th column of the identity.
Localized solutions
[Figure: plot(x) with nnz(x) = 513,969, and a log-log plot of error vs. nonzeros retained]
x = exp(P) ec, length(x) = 513,969
Our mission
Find the solution with work roughly proportional to the localization, not the matrix.
Our algorithm
www.cs.purdue.edu/homes/dgleich/codes/nexpokit
[Figure: error vs. nonzeros for our algorithm, on the same log-log axes as before]
Outline
1.  Motivation and setup
2.  Converting x = exp(P) ec into a linear system
3.  Coordinate descent methods for linear systems from large networks
4.  Error analysis
5.  Experiments
Models and algorithms for high-performance matrix and network computations
[Figure residue from a simulation-data-analysis image: predicted vs. measured standard deviation of bubble locations, panels (b) Std, s = 0.39 cm and (d) Std, s = 1.95 cm]
Tensor eigenvalues and a power method

[Figure: previous work from the PI tackled network alignment with matrix methods for edge overlap; this proposal matches triangles using tensor methods]

maximize sum_{ijk} T_ijk x_i x_j x_k  subject to ‖x‖_2 = 1

[x^(next)]_i = ρ · ( sum_{jk} T_ijk x_j x_k + x_i ),  where ρ ensures the 2-norm constraint

SSHOPM method due to Kolda and Mayo
Simulation data analysis: SIMAX '09, SISC '11, MapReduce '11, ICASSP '12
Network alignment: ICDM '09, SC '11, TKDE '13
Fast & scalable network centrality: SC '05, WAW '07, SISC '10, WWW '10, ...
Data clustering: WSDM '12, KDD '12, CIKM '13, ...

Ax = b,  min ‖Ax - b‖,  Ax = λx
Massive matrix computations on multi-threaded and distributed architectures
Matrix exponentials
exp(A) is defined as sum_{k=0}^∞ (1/k!) A^k. The series always converges; exp(A) is a special case of a function of a matrix.

dx/dt = Ax(t)  ⟺  x(t) = exp(tA) x(0): the evolution operator for an ODE.

A is n × n, real.
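The series definition can be checked directly. A minimal sketch in Python/SciPy (the deck's own code is MATLAB; the small test matrix here is an arbitrary assumption for illustration):

```python
import numpy as np
from scipy.linalg import expm

def taylor_expm(A, N=30):
    """Truncated Taylor series sum_{k=0}^N A^k / k!, built term by term."""
    term = np.eye(A.shape[0])     # k = 0 term: A^0 / 0! = I
    total = term.copy()
    for k in range(1, N + 1):
        term = term @ A / k       # multiply previous term by A/k to get A^k / k!
        total = total + term
    return total

A = np.array([[0.0, 1.0],
              [1.0, 0.0]])
err = np.abs(taylor_expm(A) - expm(A)).max()
```

For a matrix of modest norm, 30 terms already agree with SciPy's Padé-based `expm` to machine precision.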
Moler and Van Loan, "Nineteen Dubious Ways to Compute the Exponential of a Matrix, Twenty-Five Years Later," SIAM Review 45(1), pp. 3-49, 2003.
Matrix exponentials on large networks
exp(A) = sum_{k=0}^∞ (1/k!) A^k
If A is the adjacency matrix, then A^k counts the number of length-k paths between node pairs. [Estrada 2000; Farahat et al. 2002, 2006]
Large entries denote important nodes or edges. Used for link prediction and centrality.

exp(P) = sum_{k=0}^∞ (1/k!) P^k
If P is a transition matrix, then P^k gives the probability of a length-k walk between node pairs. [Kondor & Lafferty 2002; Kunegis & Lommatzsch 2009; Chung 2007]
Used for link prediction, kernels, and clustering or community detection.
Another useful matrix exponential
P column stochastic, e.g. P = A^T D^{-1}, where A is the adjacency matrix and D the diagonal degree matrix.
If A is symmetric:
exp(P^T) = exp(D^{-1} A) = D^{-1} exp(A D^{-1}) D = D^{-1} exp(P) D
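This similarity identity is easy to verify numerically. A sketch in Python/SciPy, assuming D is the diagonal degree matrix and using a 3-node path graph as an illustrative example:

```python
import numpy as np
from scipy.linalg import expm

# Small symmetric adjacency matrix (path graph on 3 nodes)
A = np.array([[0., 1., 0.],
              [1., 0., 1.],
              [0., 1., 0.]])
D = np.diag(A.sum(axis=0))          # diagonal degree matrix
Dinv = np.linalg.inv(D)
P = A.T @ Dinv                      # column-stochastic transition matrix

lhs = expm(P.T)                     # exp(P^T) = exp(D^{-1} A)
rhs = Dinv @ expm(A @ Dinv) @ D     # D^{-1} exp(A D^{-1}) D = D^{-1} exp(P) D
err = np.abs(lhs - rhs).max()
```

The identity holds exactly in exact arithmetic because D^{-1} A = D^{-1} (A D^{-1}) D, and similarity commutes with the exponential.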
Another useful matrix exponential
P column stochastic, e.g. P = A^T D^{-1}, where A is the adjacency matrix and D the diagonal degree matrix.
If A is symmetric, then for the negative normalized Laplacian -L = D^{-1/2} A D^{-1/2} - I:
exp(-L) = exp(D^{-1/2} A D^{-1/2} - I)
        = (1/e) exp(D^{-1/2} A D^{-1/2})
        = (1/e) D^{-1/2} exp(A D^{-1}) D^{1/2}
        = (1/e) D^{-1/2} exp(P) D^{1/2}
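The Laplacian identity can be checked the same way; a Python/SciPy sketch on an illustrative path graph (the graph and variable names are assumptions):

```python
import numpy as np
from scipy.linalg import expm

A = np.array([[0., 1., 0.],
              [1., 0., 1.],
              [0., 1., 0.]])                   # symmetric path graph
d = A.sum(axis=0)                              # degrees [1, 2, 1]
Dhalf = np.diag(np.sqrt(d))
Dhalf_inv = np.diag(1.0 / np.sqrt(d))
Dinv = np.diag(1.0 / d)

L = np.eye(3) - Dhalf_inv @ A @ Dhalf_inv      # normalized Laplacian
P = A @ Dinv                                   # column stochastic (A symmetric)

lhs = expm(-L)
rhs = (1 / np.e) * Dhalf_inv @ expm(P) @ Dhalf
err = np.abs(lhs - rhs).max()
```

The key step is that D^{-1/2} A D^{-1/2} = D^{-1/2} P D^{1/2}, so the normalized adjacency and the transition matrix are similar.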
Matrix exponentials on large networks
Is a single column interesting? Yes!
exp(P) ec = sum_{k=0}^∞ (1/k!) P^k ec gives link prediction scores for node c, or a community relative to node c.
But modern networks are large (~10^9 nodes), sparse (~10^11 edges), and constantly changing, so we'd like speed over accuracy.
The issue with existing methods
We want good results in less than one matvec. Our graphs have small diameter and fast fill-in.

Krylov methods: a few matvecs, quick loss of sparsity due to orthogonality.

Direct expansion: a few matvecs, quick loss of sparsity due to fill-in.
Krylov (EXPOKIT): exp(P) ec ≈ ρ V exp(H) e1   [Sidje 1998]
Direct expansion: exp(P) ec ≈ sum_{k=0}^N (1/k!) P^k ec
Outline
1.  Motivation and setup ✓
2.  Converting x = exp(P) ec into a linear system
3.  Coordinate descent methods for linear systems from large networks
4.  Error analysis
5.  Experiments
Our underlying method
Direct expansion: a few matvecs, quick loss of sparsity due to fill-in.
This method is stable for stochastic P: no cancellation, no unbounded norms, etc.
x = exp(P) ec ≈ sum_{k=0}^N (1/k!) P^k ec = xN

Lemma: ‖x - xN‖_1 ≤ 1/(N! N)
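The truncation lemma can be sanity-checked numerically. A Python/NumPy sketch (the random column-stochastic matrix is an assumption for illustration):

```python
import numpy as np
from scipy.linalg import expm
from math import factorial

rng = np.random.default_rng(0)
n, N = 8, 6
A = rng.random((n, n))
P = A / A.sum(axis=0)            # column stochastic
ec = np.zeros(n); ec[0] = 1.0

xN = ec.copy()                   # k = 0 term
term = ec.copy()
for k in range(1, N + 1):
    term = P @ term / k          # P^k ec / k!
    xN += term

x = expm(P) @ ec                 # dense reference answer
err = np.abs(x - xN).sum()       # 1-norm error
bound = 1.0 / (factorial(N) * N)
```

Because ‖P^k ec‖_1 = 1 for column-stochastic P, the 1-norm of the tail is exactly sum_{k>N} 1/k!, which the bound 1/(N! N) dominates.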
Our underlying method as a linear system
Direct expansion, rewritten as one big linear system:
x = exp(P) ec ≈ sum_{k=0}^N (1/k!) P^k ec = xN

[  I                  ] [ v0 ]   [ ec ]
[ -P/1   I            ] [ v1 ]   [ 0  ]
[       -P/2   I      ] [ v2 ] = [ 0  ]
[              ...    ] [ .. ]   [ .. ]
[            -P/N   I ] [ vN ]   [ 0  ]

xN = sum_{i=0}^N vi

Compactly, (I_{N+1} ⊗ I_n - SN ⊗ P) v = e1 ⊗ ec, where SN is the (N+1) × (N+1) shift matrix with (SN)_{k+1,k} = 1/k.

Lemma: we approximate xN well if we approximate v well.
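A small Python/NumPy sketch of this construction, with SN as described above (the dense Kronecker build is for illustration only; the actual algorithm never forms this matrix):

```python
import numpy as np

rng = np.random.default_rng(1)
n, N = 5, 8
A = rng.random((n, n))
P = A / A.sum(axis=0)                        # column stochastic
ec = np.zeros(n); ec[0] = 1.0

# SN: (N+1)x(N+1) shift-and-scale matrix with (SN)_{k+1,k} = 1/k
S = np.zeros((N + 1, N + 1))
for k in range(1, N + 1):
    S[k, k - 1] = 1.0 / k

M = np.eye((N + 1) * n) - np.kron(S, P)      # I ⊗ I - SN ⊗ P
rhs = np.kron(np.eye(N + 1)[0], ec)          # e1 ⊗ ec
v = np.linalg.solve(M, rhs)
xN_from_v = v.reshape(N + 1, n).sum(axis=0)  # xN = sum_i v_i

# The same xN computed by the truncated Taylor recursion directly
xN = ec.copy(); term = ec.copy()
for k in range(1, N + 1):
    term = P @ term / k
    xN += term

err = np.abs(xN_from_v - xN).max()
```

Block row k of the system reads v_k - P v_{k-1}/k = 0, so v_k = P^k ec / k! and the block sum recovers xN.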
Our mission (2)
Approximately solve Ax = b when A, b are sparse and x is localized.
Outline
1.  Motivation and setup ✓
2.  Converting x = exp(P) ec into a linear system ✓
3.  Coordinate descent methods for linear systems from large networks
4.  Error analysis
5.  Experiments
Coordinate descent, Gauss-Southwell,
Gauss-Seidel, relaxation & “push” methods
Algebraically / Procedurally

Solve(A,b)
  x = sparse(size(A,1),1)
  r = b
  While (1)
    Pick j where r(j) != 0
    z = r(j)
    x(j) = x(j) + z
    For i where A(i,j) != 0
      r(i) = r(i) - z*A(i,j)

Ax = b
r(k) = b - A x(k)
x(k+1) = x(k) + ej ej^T r(k)
r(k+1) = r(k) - r(k)_j A ej
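The Solve(A,b) loop above can be sketched as runnable Python/NumPy with the Gauss-Southwell pivot choice, assuming diag(A) = I as the deck states later (the test system is an illustrative assumption):

```python
import numpy as np

def gauss_southwell(A, b, tol=1e-10, maxit=10000):
    """Coordinate relaxation for Ax = b with unit diagonal: repeatedly pick
    the largest residual entry j, push it into x(j), and update r = b - Ax."""
    n = len(b)
    x = np.zeros(n)
    r = b.astype(float).copy()
    for _ in range(maxit):
        j = int(np.argmax(np.abs(r)))
        if abs(r[j]) < tol:
            break
        z = r[j]
        x[j] += z
        r -= z * A[:, j]          # r(i) -= z*A(i,j); zeroes r(j) since A(j,j) = 1
    return x

# Diagonally dominant test system with unit diagonal
rng = np.random.default_rng(2)
B = 0.1 * rng.random((6, 6))
A = np.eye(6) + B - np.diag(np.diag(B))
b = rng.random(6)
x = gauss_southwell(A, b)
err = np.abs(A @ x - b).max()
```

Each step zeroes one residual entry at the cost of touching only the nonzeros of column j, which is the whole point for sparse A.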
It’s called the “push” method
because of PageRank
(I - αP) x = v
r(k) = v - (I - αP) x(k)
x(k+1) = x(k) + ej ej^T r(k)
"r(k+1) = r(k) - r(k)_j A ej"

r(k+1)_i =  0                          if i = j
            r(k)_i + α P_i,j r(k)_j    if P_i,j != 0
            r(k)_i                     otherwise

PageRankPush(links,v,alpha)
  x = sparse(size(A,1),1)
  r = v
  While (1)
    Pick j where r(j) != 0
    z = r(j)
    x(j) = x(j) + z
    r(j) = 0
    z = alpha * z / deg(j)
    For i where "j links to i"
      r(i) = r(i) + z
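The PageRankPush pseudocode can be made runnable. A Python sketch, assuming no dangling nodes; the 3-node cycle graph and names are illustrative assumptions:

```python
import numpy as np

def pagerank_push(adj, v, alpha=0.85, tol=1e-8):
    """Push method for (I - alpha*P) x = v, with P column stochastic built
    from out-degrees. adj[j] lists the nodes j links to (no dangling nodes)."""
    n = len(adj)
    x = np.zeros(n)
    r = v.astype(float).copy()
    while True:
        j = int(np.argmax(r))          # Gauss-Southwell choice; any r(j) > 0 works
        if r[j] < tol:
            break
        z = r[j]
        x[j] += z
        r[j] = 0.0
        z = alpha * z / len(adj[j])
        for i in adj[j]:               # "for i where j links to i"
            r[i] += z
    return x

adj = [[1], [2], [0]]                  # 3-node cycle: 0 -> 1 -> 2 -> 0
v = np.array([1.0, 0.0, 0.0])
x = pagerank_push(adj, v)

# Check against the direct solve of (I - alpha*P) x = v
P = np.zeros((3, 3))
for j, outs in enumerate(adj):
    for i in outs:
        P[i, j] = 1.0 / len(outs)
x_true = np.linalg.solve(np.eye(3) - 0.85 * P, v)
err = np.abs(x - x_true).max()
```

Each push removes (1 - alpha) of the pushed residual mass from the system, so the residual sum decays geometrically.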
It’s called the “push” method
because of PageRank
Demo
Justification of terminology
This method is frequently "rediscovered" (3 times for PageRank!)

Let Ax = b with diag(A) = I.
It's Gauss-Seidel if j is chosen cyclically.
It's Gauss-Southwell if j is the largest entry in the residual.
It's coordinate descent if A is symmetric, positive definite.
It's a relaxation step for any A.

Works great for other problems too! [Bonchi, Gleich, et al., J. Internet Math. 2012]
Back to the exponential
(I_{N+1} ⊗ I_n - SN ⊗ P) v = e1 ⊗ ec:

[  I                  ] [ v0 ]   [ ec ]
[ -P/1   I            ] [ v1 ]   [ 0  ]
[       -P/2   I      ] [ v2 ] = [ 0  ]
[              ...    ] [ .. ]   [ .. ]
[            -P/N   I ] [ vN ]   [ 0  ]

xN = sum_{i=0}^N vi

Solve this system via the same push method.
Optimization 1: build the system implicitly.
Optimization 2: don't store the vi, just store the running sum xN.
Code (inefficient, but working) for Gauss-Southwell to solve the exponential system:

function x = nexpm(P,c,tol)
n = size(P,1); N = 11; sumr = 1;
r = zeros(n,N+1); r(c,1) = 1;        % the residual
x = zeros(n,1);                      % and the solution
while sumr >= tol                    % use a max-iteration cap too
    [ml,q] = max(r(:));              % use a heap in practice for the max
    i = mod(q-1,n)+1; k = ceil(q/n);
    r(q) = 0; x(i) = x(i)+ml;        % zero the residual, add to solution
    sumr = sumr-ml;
    [nset,~,vals] = find(P(:,i));    % look up the neighbors of node i
    ml = ml/k;
    for j = 1:numel(nset)            % for all neighbors
        if k == N                    % last term: add straight to the solution
            x(nset(j)) = x(nset(j)) + vals(j)*ml;
        else                         % otherwise push into the next residual
            r(nset(j),k+1) = r(nset(j),k+1) + vals(j)*ml;
            sumr = sumr + vals(j)*ml;
        end
    end
end

Todo: use a dictionary for x, r and a heap or queue for the residual.
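For readers without MATLAB, a rough Python transcription of the same loop (a sketch, not the released NEXPOKIT code; dense arrays replace the sparse structures, heap, and queue of the real implementation):

```python
import numpy as np
from scipy.linalg import expm

def nexpm_push(P, c, tol=1e-6, N=11, maxit=100000):
    """Gauss-Southwell-style push on the Taylor linear system for exp(P) ec.
    r[:, k] holds the residual for Taylor term v_k."""
    n = P.shape[0]
    r = np.zeros((n, N))
    r[c, 0] = 1.0
    x = np.zeros(n)
    sumr = 1.0
    it = 0
    while sumr >= tol and it < maxit:
        it += 1
        i, k = np.unravel_index(np.argmax(r), r.shape)   # heap in practice
        ml = r[i, k]
        r[i, k] = 0.0
        x[i] += ml
        sumr -= ml
        ml /= (k + 1)                    # next term divides by k+1
        if k == N - 1:
            x += P[:, i] * ml            # last kept term: add straight to x
        else:
            r[:, k + 1] += P[:, i] * ml  # push into the next term's residual
            sumr += ml                   # columns of P sum to 1
    return x

rng = np.random.default_rng(3)
n = 10
A = rng.random((n, n))
P = A / A.sum(axis=0)                    # column stochastic
x = nexpm_push(P, c=0)
err = np.abs(x - expm(P)[:, 0]).sum()
```

With N = 11 and tol = 1e-6, the combined truncation and residual error stays far below 1e-4 for a column-stochastic P.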
Outline
1.  Motivation and setup ✓
2.  Converting x = exp(P) ec into a linear system ✓
3.  Coordinate descent methods for linear systems from large networks ✓
4.  Error analysis
5.  Experiments
Error analysis for Gauss-Southwell
Theorem. Assume P is column-stochastic and v(0) = 0, for the system
(I_{N+1} ⊗ I_n - SN ⊗ P) v = e1 ⊗ ec.

(Nonnegativity, the "easy" part) Iterates and residuals are nonnegative: v(l) ≥ 0 and r(l) ≥ 0.

(Convergence, the "annoying" part) The residual goes to 0:
‖r(l)‖_1 ≤ prod_{k=1}^{l} (1 - 1/(2dk)) ≤ l^{-1/(2d)}
where d is the largest degree.

Proof sketch
Gauss-Southwell picks the largest residual entry.
⇒ Bound the update by the average number of nonzeros in the residual (sloppy).
⇒ Algebraic convergence with a slow rate, but each update is REALLY fast: O(d_max log n).
If d is log log n, then our method runs in sub-linear time (but so does just about anything).
Overall error analysis
Components
•  Truncation to N terms
•  Residual-to-error conversion
•  The approximate solve

Theorem. After ℓ steps of Gauss-Southwell,
‖xN(ℓ) - x‖_1 ≤ 1/(N! N) + (1/e) · ℓ^{-1/(2d)}
Outline
1.  Motivation and setup ✓
2.  Converting x = exp(P) ec into a linear system ✓
3.  Coordinate descent methods for linear systems from large networks ✓
4.  Error analysis ✓
5.  Experiments
Our implementations
A C++ mex implementation with a heap to implement Gauss-Southwell.
A C++ mex implementation with a queue that stores all residual entries ≥ tol/(nN), which guarantees the residual norm is ≤ tol at completion.
We use the queue except for the runtime comparison.
Accuracy vs. tolerance
[Figure: boxplots of precision at 100 vs. log10 of residual tolerance (-2 through -7), pgp-cc]
For the pgp social graph (10k vertices), we study the precision in finding the 100 largest nodes as we vary the tolerance. This set of 100 does not include the node's immediate neighbors. (Boxplot over 50 trials.)
Accuracy vs. work
For the dblp collaboration graph (225k vertices), we study the precision in finding the 100 largest nodes as we vary the work. This set of 100 does not include the node's immediate neighbors. (One column, but representative.)
[Figure: precision at 10/25/100/1000 vs. effective matrix-vector products on dblp-cc, for tol = 10^-4 and 10^-5]
Runtime
[Figure: runtime (secs) vs. |E| + |V| for TSGS, TSGSQ, EXPV, MEXPV, and TAYLOR on the Flickr social network, 500k nodes, 5M edges]
Outline
1.  Motivation and setup ✓
2.  Converting x = exp(P) ec into a linear system ✓
3.  Coordinate descent methods for linear systems from large networks ✓
4.  Error analysis ✓
5.  Experiments ✓
References and ongoing work
Kloster and Gleich, Workshop on Algorithms for the
Web-graph, 2013 (forthcoming).
www.cs.purdue.edu/homes/dgleich/codes/nexpokit
•  Error analysis using the queue
•  Better linear systems for faster convergence
•  Asynchronous coordinate descent methods
•  Scaling up to billion node graphs
•  More explicit localization in algorithms
