
Localized methods in graph mining


Localized methods in graph mining exploit the local structures in a graph instead of attempting to find global structures. These methods are widely successful across many problems, including community detection, label propagation, and several others.


  1. Localized methods in graph mining. David F. Gleich, Purdue University. Joint work with Kyle Kloster @ Purdue and Michael Mahoney @ Berkeley. Supported by NSF CAREER CCF-1149756. David Gleich · Purdue
  2. (image-only slide)
  3. STORY TIME. Simple theme. Many pictures!
  4. Localized methods in graph mining use the local structure of a network (and not the global structure). USE THIS, NOT THIS.
  5. Point 1: Localized methods are the right thing to use for large graph mining. Point 2: Localized methods are still the right thing to use even if you don't believe my answer to Point 1.
  6. Some graphs have global structure. Image by R. Rossi from our paper on clique detection for temporal strong components.
  7. Some graphs do not.
  8. Some graphs are random.
  9. Can you tell which one is random?
  10. At large scales, real networks look random (or slightly better).
  11. Localized methods only operate on meaningful local structures in the data.
  12. CAVEATS. There are large-scale global structures, BUT they don't look like what your small-scale intuition would predict. Continents exist in Facebook, but they don't look like small-scale structures. Leskovec, Lang, Dasgupta, Mahoney. Internet Math, 2009. Ugander, Backstrom. WSDM 2013. Jeub, Balachandran, Porter, Mucha, Mahoney. Phys. Rev. E, 2015.
  13. Point 1: Localized methods are the right thing to use for large graph mining. Point 2: Localized methods are still the right thing to use even if you don't believe my answer to Point 1.
  14. Local algorithms give fast answers to global queries (for small-source diffusions).
  15. Local algorithms give useful answers to global queries (for small-source diffusions).
  16. Pictures from the Sparse Matrix Repository (Davis & Hu), www.cise.ufl.edu/research/sparse/matrices/
  17. Graph diffusions. Diffusions show how {importance, rank, information, status, …} flows from a source to target nodes via edges. (Figure: a diffusion on a network/mesh from a typical problem in scientific computing, colored from high to low.)
  18. Graph diffusions. $f = \sum_{k=0}^{\infty} \alpha_k P^k s$. Here A is the adjacency matrix, D the degree matrix, $P = AD^{-1}$ the column-stochastic operator with $(Px)_i = \sum_{j \to i} x_j / d_j$, s the "seed" (a sparse vector), f the diffusion result, and $\alpha_k$ the path weights. Graph diffusions help: 1. attribute prediction, 2. community detection, 3. "ranking", 4. finding small-conductance sets, 5. graph label propagation.
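As a concrete illustration, the truncated sum $f \approx \sum_{k<K} \alpha_k P^k s$ can be computed with repeated matrix-vector products. This is a minimal sketch on a toy four-node graph; the graph, seed, and PageRank-style weights are illustrative choices, not data from the talk.

```python
import numpy as np

# Toy undirected graph (adjacency matrix) -- illustrative only.
A = np.array([[0, 1, 1, 0],
              [1, 0, 1, 0],
              [1, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
d = A.sum(axis=0)                     # degrees
P = A / d                             # column-stochastic P = A D^{-1}

def diffusion(P, s, weights):
    """Truncated diffusion f = sum_k alpha_k P^k s."""
    f = np.zeros_like(s)
    Pks = s.copy()                    # current power P^k s, starting at k = 0
    for a in weights:
        f += a * Pks
        Pks = P @ Pks                 # advance to the next power
    return f

s = np.array([1.0, 0.0, 0.0, 0.0])    # seed on node 0
alpha = 0.85
# PageRank-style path weights alpha_k = (1 - alpha) * alpha^k, truncated
weights = [(1 - alpha) * alpha ** k for k in range(50)]
f = diffusion(P, s, weights)
```

Since P is column-stochastic and the seed sums to one, the mass of f equals the sum of the truncated weights.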
  19. Graph diffusions. PageRank: $x = (1-\alpha)\sum_{k=0}^{\infty} \alpha^k P^k s$, equivalently $(I - \alpha P)x = (1-\alpha)s$. Heat kernel: $h = e^{-t}\sum_{k=0}^{\infty} \frac{t^k}{k!} P^k s = e^{-t}\exp\{tP\}s$. As before, $P = AD^{-1}$ with $(Px)_i = \sum_{j \to i} x_j / d_j$.
  20. Graph diffusions. PageRank: $(I - \alpha P)x = (1-\alpha)s$ with $x = (1-\alpha)\sum_{k=0}^{\infty} \alpha^k P^k s$. Heat kernel: $h = e^{-t}\exp\{tP\}s$. (Plot: path weight vs. path length for the heat kernel with t = 1, 5, 15 and PageRank with α = 0.85, 0.99.)
  21. TwitterRank, GeneRank, IsoRank, MonitorRank, BookRank, TimedPageRank, CiteRank, AuthorRank, PopRank, FactRank, ObjectRank, FolkRank, ItemRank, BuddyRank, HostRank, DirRank, TrustRank, BadRank, VisualRank, … PAGERANK BEYOND THE WEB. "PageRank beyond the Web," http://arxiv.org/abs/1407.5107: $(I - \alpha P)x = (1-\alpha)s$. Image by Jessica Leber, Fast Magazine.
  22. Diffusion-based community detection. 1. Given a seed, approximate the diffusion. 2. Extract the community. Both are local operations.
  23. Conductance communities. Conductance is one of the most important community scores [Schaeffer07]. The conductance of a set of vertices is the ratio of edges leaving the set to total edges in the set: $\phi(S) = \frac{\mathrm{cut}(S)}{\min(\mathrm{vol}(S), \mathrm{vol}(\bar{S}))}$. Equivalently, it's the probability that a random edge leaves the set. Small conductance ⇔ good community. Example on the slide: $\mathrm{cut}(S) = 7$, $\mathrm{vol}(S) = 33$, $\mathrm{vol}(\bar{S}) = 11$, so $\phi(S) = 7/11$.
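The definition translates directly into code. A minimal sketch, using a toy two-triangle graph of my own rather than the example pictured on the slide:

```python
def conductance(adj, S):
    """phi(S) = cut(S) / min(vol(S), vol(complement of S)).

    adj: dict mapping node -> set of neighbors (undirected graph).
    S: set of vertices.
    """
    cut = sum(1 for u in S for v in adj[u] if v not in S)
    vol_S = sum(len(adj[u]) for u in S)
    vol_total = sum(len(adj[u]) for u in adj)
    vol_comp = vol_total - vol_S
    return cut / min(vol_S, vol_comp)

# Toy graph: two triangles joined by one edge (illustrative).
adj = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3},
       3: {2, 4, 5}, 4: {3, 5}, 5: {3, 4}}
phi = conductance(adj, {0, 1, 2})  # cut = 1, vol(S) = 7, vol(comp) = 7
```

One triangle is a good community here: only one of its seven edge endpoints leaves the set.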
  24. Andersen-Chung-Lang personalized PageRank community theorem [Andersen et al. 2006] [Ghosh et al. 2014, KDD]. Informally: suppose the seeds are in a set of good conductance; then a sweep-cut on a diffusion will find a set with conductance that's nearly as good. … also, it's really fast.
  25. Sweep-cuts find small-conductance sets. Check the conductance for all "prefixes" of the diffusion vector sorted by value; there is a fast incremental update, so the whole sweep is O(sum of degrees) work. GOOD SET 1, GOOD SET 2, GOOD SET 3. (Figure: a study of the paradoxical effects of value-based rounding on diffusions; panels compare Zhou and Andersen-Lang with 3 and 15 labels on a sample with three unbalanced classes.)
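The sweep with its fast conductance update can be sketched as follows; the toy graph and diffusion values are illustrative inputs, not data from the talk.

```python
def sweep_cut(adj, x):
    """Sweep over the diffusion vector x (dict node -> value), sorted by
    value, and return the prefix set with the smallest conductance.

    adj: dict node -> set of neighbors (undirected, no dangling nodes).
    Conductance is updated incrementally as each node joins the set, so
    the whole sweep costs O(sum of degrees of the ranked nodes).
    """
    order = sorted(x, key=x.get, reverse=True)
    vol_total = sum(len(adj[u]) for u in adj)
    in_set = set()
    cut = vol = 0
    best_phi, best_k = float('inf'), 0
    for k, u in enumerate(order, start=1):
        inside = sum(1 for v in adj[u] if v in in_set)
        # edges from u into the set stop being cut; the rest become cut
        cut += len(adj[u]) - 2 * inside
        vol += len(adj[u])
        in_set.add(u)
        denom = min(vol, vol_total - vol)
        if denom > 0 and cut / denom < best_phi:
            best_phi, best_k = cut / denom, k
    return set(order[:best_k]), best_phi

# Toy graph: two triangles joined by one edge, with a diffusion
# concentrated on the left triangle (values are illustrative).
adj = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3},
       3: {2, 4, 5}, 4: {3, 5}, 5: {3, 4}}
x = {0: 0.5, 1: 0.4, 2: 0.3, 3: 0.1, 4: 0.05, 5: 0.01}
best_set, best_phi = sweep_cut(adj, x)
```

The sweep recovers the left triangle, the prefix with the smallest conductance.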
  26. Diffusions are localized, and we have algorithms to find their local regions.
  27. Uniformly localized solutions in flickr: plot(x). Crawl of flickr from 2006, ~800k nodes, 6M edges, β = 1/2: $(I - \beta P)x = (1-\beta)s$. Although $\mathrm{nnz}(x) \approx 800\mathrm{k}$, only a small number of entries are needed to reach $\|D^{-1}(x - x^*)\|_\infty \le \varepsilon$. (Plots: solution values and their decay on log-log axes.)
  28. Our mission: find the solution with work roughly proportional to the localization, not the matrix.
  29. Our point: the push procedure gives localized algorithms for diffusions in a pleasingly wide variety of settings. Our results: new empirical and theoretical insights into why and how "push" is so effective.
  30. The Push Algorithm for PageRank. Proposed (in closest form) in Andersen, Chung, Lang (also by McSherry, Jeh & Widom, Berkhin) for fast approximate PageRank; derived to show improved runtime for balanced solvers. 1. Used for empirical studies of "communities". 2. Local Cheeger inequality. 3. Used for "fast PageRank approximation". 4. Works on massive graphs: O(1 second) for a 4-billion-edge graph on a laptop. 5. It yields weakly localized PageRank approximations! Produces an ε-accurate entrywise localized PageRank vector in work $\frac{1}{\varepsilon(1-\alpha)}$. (Example graph: Newman's netscience, 379 vertices, 1828 nonzeros.)
  31. Gauss-Seidel and Gauss-Southwell. Methods to solve $Ax = b$. Update $x^{(k+1)} = x^{(k)} + \rho_j e_j$ such that $[Ax^{(k+1)}]_j = [b]_j$. In words: "relax" or "free" the jth coordinate of your solution vector in order to satisfy the jth equation of your linear system. Gauss-Seidel: repeatedly cycle through j = 1 to n. Gauss-Southwell: use the value of j that has the highest-magnitude residual $r^{(k)} = b - Ax^{(k)}$.
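The Gauss-Southwell rule above can be sketched in a few lines; the small system at the end is an illustrative example of my own.

```python
import numpy as np

def gauss_southwell(A, b, tol=1e-10, max_steps=100000):
    """Solve A x = b one coordinate at a time, Gauss-Southwell style:
    pick the coordinate j with the largest-magnitude residual and set
    x_j so the j-th equation [A x]_j = b_j holds exactly.
    """
    x = np.zeros(len(b))
    r = b - A @ x                 # residual r^{(k)} = b - A x^{(k)}
    while np.abs(r).max() > tol and max_steps > 0:
        j = int(np.argmax(np.abs(r)))
        rho = r[j] / A[j, j]      # relax coordinate j
        x[j] += rho
        r -= rho * A[:, j]        # only one column of A is touched
        max_steps -= 1
    return x

# Small symmetric diagonally dominant system (illustrative).
A = np.array([[4.0, 1.0], [1.0, 3.0]])
b = np.array([1.0, 2.0])
x = gauss_southwell(A, b)
```

The residual update touches only one column of A per step, which is why sparse variants of this idea cost work proportional to the entries they actually change.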
  32. The Push Method (almost "the push" method), with parameters ε, ρ:
      1. $x^{(1)} = 0$, $r^{(1)} = (1-\alpha)e_i$, $k = 1$
      2. while any $r_j > \varepsilon d_j$ ($d_j$ is the degree of node j)
      3. $x^{(k+1)} = x^{(k)} + (r_j - \varepsilon d_j \rho)\,e_j$
      4. $r_i^{(k+1)} = \begin{cases} \varepsilon d_j \rho & i = j \\ r_i^{(k)} + \alpha(r_j - \varepsilon d_j \rho)/d_j & i \sim j \\ r_i^{(k)} & \text{otherwise} \end{cases}$
      5. $k \leftarrow k + 1$
      Only push "some" of the residual: if we want tolerance "eps", then push to tolerance "eps" and no further.
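The push step above can be sketched as a queue-driven procedure. This is a minimal sketch of the ρ = 0 variant (push the whole residual of a node at once); the toy graph is illustrative, and the function name `pagerank_push` is mine.

```python
from collections import deque

def pagerank_push(adj, seed, alpha=0.85, eps=1e-6):
    """Seeded-PageRank push: approximately solve
        (I - alpha * P) x = (1 - alpha) * e_seed,   P = A D^{-1},
    keeping x and the residual r as sparse dicts. While some node v has
    r[v] > eps * deg(v), move r[v] into x[v] and spread alpha * r[v]
    evenly to v's neighbors. Assumes an undirected graph with no
    dangling nodes (every node has at least one neighbor).
    """
    x, r = {}, {seed: 1.0 - alpha}
    Q = deque([seed])
    while Q:
        v = Q.popleft()
        rv = r.pop(v, 0.0)
        x[v] = x.get(v, 0.0) + rv
        spread = alpha * rv / len(adj[v])
        for u in adj[v]:
            old = r.get(u, 0.0)
            r[u] = old + spread
            # enqueue u the moment its residual crosses its threshold
            if old <= eps * len(adj[u]) < r[u]:
                Q.append(u)
    return x

# Toy graph: two triangles joined by one edge (illustrative).
adj = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3},
       3: {2, 4, 5}, 4: {3, 5}, 5: {3, 4}}
x = pagerank_push(adj, seed=0)
```

Each push removes more residual mass than it creates, so the loop terminates, and the final residual bound $r_v \le \varepsilon d_v$ gives the degree-normalized accuracy guarantee.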
  33. Push is fast! For the PageRank diffusion $(I - \alpha P)x = (1-\alpha)e_i$, Push gives constant work (entry-wise). Andersen, Chung, Lang, FOCS 2006.
      1. For the Katz diffusion $(I - \alpha A)x = (1-\alpha)e_i$, Push works empirically fast. Bonchi, Gleich, et al., 2012, Internet Math.
      2. For the exponential $x = \exp(P)e_i$, Push gives uniform localization on power-law graphs and fast runtimes. Gleich and Kloster, 2014, Internet Math.
      3. For the heat-kernel diffusion $x = \exp(tP)e_i$, Push gives constant work (entry-wise). Kloster and Gleich, 2014, KDD.
      4. For the PageRank diffusion, Push yields sparsity regularization. Gleich and Mahoney, ICML 2014.
      5. For a general class of diffusions, there is a Cheeger inequality like before. Ghosh, Teng, et al., KDD 2014.
      6. For the PageRank diffusion, Push gives the solution path in constant work (entry-wise). Kloster and Gleich, arXiv:1503.00322.
  34. Push is useful! 1. Push implicitly regularizes semi-supervised learning. Gleich and Mahoney, submitted. 2. Push gives state-of-the-art results for overlapping community detection. Whang, Gleich, Dhillon, CIKM 2013; Whang, Gleich, Dhillon, in prep. 3. Push for overlapping clusters decreases communication in parallel solutions. Andersen, Gleich, Mirrokni, WSDM 2012. (Figure 3: F1 and F2 measures comparing algorithmic communities on DBLP, higher indicates better communities; methods: demon, bigclam, graclus centers, spread hubs, random, egonet; phases: seeding, seed-set expansion, propagation. Joyce Jiyoung Whang, The University of Texas at Austin, CIKM.)
  35. Heat Kernel Based Community Detection, KDD 2014. Kyle Kloster and David F. Gleich, Purdue University. $f = \exp\{tP\}s = \sum_{k=0}^{\infty} \frac{t^k}{k!} P^k s$. Convert to a linear system, and solve in constant time. Heat kernel (us) vs. personalized PageRank (prev. best):

                            HK (us)   PPR (prev. best)
      This set:   F1        0.87      0.34
                  Set size  14        67
      Amazon      F1        0.33      0.14
      (average):  Set size  192       15293
  36. Heat kernel localization. General recipe: 1. Take problem X, convert into a linear system. 2. Apply "push" to that linear system. 3. Analyze and bound the total work. Heat kernel recipe: 1. Convert $x = \exp(tP)e_i$ into
      $$\begin{bmatrix} I & & & \\ -tP/1 & I & & \\ & \ddots & \ddots & \\ & & -tP/N & I \end{bmatrix} \begin{bmatrix} v_0 \\ v_1 \\ \vdots \\ v_N \end{bmatrix} = \begin{bmatrix} e_i \\ 0 \\ \vdots \\ 0 \end{bmatrix}$$
      2. Apply "push". 3. Analyze the work bound.
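Forward substitution on that block bidiagonal system gives $v_k = (t^k/k!)P^k e_i$, so summing the blocks reproduces the degree-N Taylor polynomial of $\exp(tP)e_i$. A small dense numpy sketch on a toy triangle graph (illustrative parameters, not from the talk):

```python
import numpy as np

# Toy column-stochastic operator P = A D^{-1} on a triangle (illustrative).
A = np.array([[0.0, 1.0, 1.0],
              [1.0, 0.0, 1.0],
              [1.0, 1.0, 0.0]])
P = A / A.sum(axis=0)

t, N = 2.0, 10
e0 = np.zeros(3); e0[0] = 1.0

# Forward substitution on the block system:
#   v_0 = e_i,  v_k = (t/k) P v_{k-1},  k = 1..N
# gives v_k = (t^k / k!) P^k e_i, and x = sum_k v_k is the
# degree-N Taylor polynomial of exp(tP) e_i.
v = e0.copy()
x = v.copy()
for k in range(1, N + 1):
    v = (t / k) * (P @ v)
    x += v

# Reference: a much longer Taylor expansion of exp(tP) e_i.
ref = e0.copy()
term = e0.copy()
for k in range(1, 61):
    term = (t / k) * (P @ term)
    ref += term
```

The push algorithm of the talk does the same substitution, but sparsely and with per-entry tolerances instead of dense matrix-vector products.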
  37. There is a fast deterministic adaptation of the push method. Kloster & Gleich, KDD 2014.

      # G is graph as dictionary-of-sets,
      # seed is an array of seeds,
      # t, eps, N, psis are precomputed
      x = {}  # store x, r as dictionaries
      r = {}  # initialize residual
      Q = collections.deque()  # initialize queue
      for s in seed:
          r[(s, 0)] = 1. / len(seed)
          Q.append((s, 0))
      while len(Q) > 0:
          (v, j) = Q.popleft()  # v has r[(v,j)] ...
          rvj = r[(v, j)]
          # perform the hk-relax step
          if v not in x:
              x[v] = 0.
          x[v] += rvj
          r[(v, j)] = 0.
          mass = (t * rvj / (float(j) + 1.)) / len(G[v])
          for u in G[v]:  # for neighbors of v
              next = (u, j + 1)  # in the next block
              if j + 1 == N:  # last step, add to soln
                  x[u] += rvj / len(G[v])
                  continue
              if next not in r:
                  r[next] = 0.
              thresh = math.exp(t) * eps * len(G[u])
              thresh = thresh / (N * psis[j + 1]) / 2.
              if r[next] < thresh and r[next] + mass >= thresh:
                  Q.append(next)  # add u to queue
              r[next] = r[next] + mass

      (Figure 2: Pseudo-code for our algorithm as working python code; the graph is stored as a dictionary-of-sets.)

      THEOREM. Let $h = e^{-t}\exp\{tP\}s$ and let x be the hk-push(ε) output. Then $\|D^{-1}(x - h)\|_\infty \le \varepsilon$ after looking at $\frac{2Ne^t}{\varepsilon}$ edges. We believe that the bound $N \le 2t\log(1/\varepsilon)$ suffices. MMDS 2014.
  38. Analysis, three pages in one slide. Kloster & Gleich, KDD 2014. 1. State the approximation error that results from approximating using the linear system: a "standard" matrix-approximation result. 2. Bound the work involved in doing push: the iterate satisfies y ≥ 0 and the residual r ≥ 0; each step moves "mass" from r to y, preserving non-negativity and monotonicity. Each step moves at least deg(i)·ε mass in deg(i) work, so in T steps we push $\sum_{i \in \text{steps}} \varepsilon \deg(i)$. But we can only push "so much": $\sum_{i \in \text{steps}} \varepsilon \deg(i) \le e^t$, so we can bound the pushed mass from above and invert to get a total work bound.
  39. Runtime. (Plot: runtime in seconds vs. log10(|V|+|E|) for hkgrow and pprgrow, median with 25% and 75% quantiles.)
  40. PageRank solution paths. Compute one diffusion, and all sweep-cuts, for all values of epsilon. Kloster & Gleich, arXiv:1503.00322. (Plot: degree-normalized PageRank vs. 1/ε, with curves at conductances φ = 0.005, 0.010, 0.060, 0.111, 0.268.)
  41. PageRank solution paths. These take about a second to compute with our "new" push-based algorithm on graphs with millions of nodes and edges. Related to the LARS method for 1-norm regularized problems.
  42. Use "centers" of graph partitions to seed overlapping communities. Flickr social network: 2M vertices, 22M edges. We can cover 95% of the network with communities of conductance ~0.15. (Plots: maximum conductance vs. coverage percentage for (a) AstroPh and (d) Flickr; methods: egonet, graclus centers, spread hubs, random, bigclam.)
  43. References and ongoing work.
      Gleich and Kloster – Relaxation methods for the matrix exponential, J. Internet Math.
      Kloster and Gleich – Heat kernel based community detection, KDD 2014.
      Gleich and Mahoney – Algorithmic anti-differentiation, ICML 2014.
      Gleich and Mahoney – Regularized diffusions, submitted.
      Whang, Gleich, Dhillon – Seeds for overlapping communities, CIKM 2013.
      www.cs.purdue.edu/homes/dgleich/codes/nexpokit
      www.cs.purdue.edu/homes/dgleich/codes/l1pagerank
      • Improved localization bounds for functions of matrices
      • Asynchronous and parallel "push"-style methods
      • Localized methods beyond conductance
      Supported by NSF CAREER CCF-1149756. www.cs.purdue.edu/homes/dgleich
