Localized methods in graph mining
David F. Gleich
Purdue University
Joint work with Kyle Kloster @ Purdue & Michael Mahoney @ Berkeley
Supported by NSF CAREER CCF-1149756
David Gleich · Purdue
STORY TIME
•  Simple theme
•  Many pictures!
Localized methods in graph mining
use the local structure of a network
(and not the global structure).
USE THIS
 NOT THIS
Point 1
Localized methods are the right thing to
use for large graph mining
Point 2
Localized methods are still the right thing
to use even if you don't believe my
answer to Point 1.

Some graphs have global structure
Image by R. Rossi from our paper
on clique detection for
Temporal Strong-Components
Some graphs do not
Some graphs are random
Can you tell
which one is
random?
At large scales,
real networks
look random
(or slightly better)
Localized methods only operate on
meaningful local structures in the data
CAVEATS
There are large-scale global
structures.
BUT 
They don’t look like what your
small-scale intuition would predict.
Continents exist in Facebook, but
they don’t look like small-scale
structures.
Leskovec, Lang, Dasgupta, Mahoney.
Internet Math, 2009.
Ugander, Backstrom, WSDM (2013)
Jeub, Balachandran, Porter, Mucha,
Mahoney. Phys Rev E 2015.
Point 1
Localized methods are the right thing to
use for large graph mining
Point 2
Localized methods are still the right thing
to use even if you don't believe my
answer to Point 1.

Local algorithms
give fast answers
to global queries
(for small-source diffusions)
Local algorithms
give useful answers
to global queries
(for small-source diffusions)
Pictures from Sparse Matrix Repository (Davis & Hu)
www.cise.ufl.edu/research/sparse/matrices/
Graph diffusions
[Figure: a network, or mesh, from a typical problem in scientific computing; color scale from high to low]
Diffusions show how
{importance, rank,
information, status, …}

flows from a source to
target nodes via edges
Graph diffusions
$f = \sum_{k=0}^{\infty} \alpha_k P^k s$
A – adjacency matrix
D – degree matrix
P – column stochastic operator
s – the “seed” (a sparse vector)
f – the diffusion result
$\alpha_k$ – the path weights
$P = A D^{-1}$
$(Px)_i = \sum_{j \to i} \frac{1}{d_j} x_j$
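A minimal sketch of the diffusion series, assuming the graph is stored as a dictionary of neighbor sets and the series is truncated after a fixed number of terms. The names `apply_P` and `diffusion`, the path graph, and the geometric weights are illustrative assumptions, not taken from the talk.

```python
def apply_P(G, x):
    """Return P x for P = A D^{-1}: node j sends x[j] / d_j to each neighbor."""
    y = {}
    for j, xj in x.items():
        share = xj / len(G[j])
        for i in G[j]:
            y[i] = y.get(i, 0.0) + share
    return y


def diffusion(G, seed, weights):
    """Truncated diffusion f = sum_k weights[k] * P^k s, with s uniform on seed."""
    term = {v: 1.0 / len(seed) for v in seed}   # term holds P^k s
    f = {}
    for w in weights:
        for v, val in term.items():
            f[v] = f.get(v, 0.0) + w * val
        term = apply_P(G, term)
    return f


# Hypothetical example: the path graph 0 - 1 - 2 - 3, seeded at node 0.
G = {0: {1}, 1: {0, 2}, 2: {1, 3}, 3: {2}}
alpha = 0.5
weights = [(1 - alpha) * alpha**k for k in range(50)]  # example path weights
f = diffusion(G, [0], weights)
```

Because P is column stochastic, each term of the series keeps the total mass of s, so the entries of f sum to the sum of the weights.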
Graph diffusions help:
1.  Attribute prediction
2.  Community detection
3.  “Ranking”
4.  Find small conductance sets
5.  Graph label propagation
Graph diffusions
Heat kernel
$h = e^{-t} \sum_{k=0}^{\infty} \frac{t^k}{k!} P^k s$
$h = e^{-t} \exp\{tP\} s$
PageRank
$x = (1 - \alpha) \sum_{k=0}^{\infty} \alpha^k P^k s$
$(I - \alpha P) x = (1 - \alpha) s$
$P = A D^{-1}$
$(Px)_i = \sum_{j \to i} \frac{1}{d_j} x_j$
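The two diffusions differ only in their path weights: $(1-\alpha)\alpha^k$ for PageRank and $e^{-t} t^k / k!$ for the heat kernel. The sketch below tabulates both; the function names and the truncation length of 200 are assumptions for illustration. The heat kernel weights use a recurrence so that no large factorial is ever formed.

```python
import math


def pagerank_weights(alpha, K):
    """First K PageRank path weights (1 - alpha) * alpha**k."""
    return [(1 - alpha) * alpha**k for k in range(K)]


def heat_kernel_weights(t, K):
    """First K heat kernel path weights exp(-t) * t**k / k!, built by recurrence."""
    w = [math.exp(-t)]
    for k in range(1, K):
        w.append(w[-1] * t / k)
    return w


pr = pagerank_weights(0.85, 200)
hk = heat_kernel_weights(5.0, 200)
```

Both weight sequences sum to (nearly) one, but the heat kernel weights fall off much faster for long paths, which is one way to see why the two diffusions can behave differently.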
Graph diffusions
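Heat kernel
$h = e^{-t} \sum_{k=0}^{\infty} \frac{t^k}{k!} P^k s$
$h = e^{-t} \exp\{tP\} s$
PageRank
$x = (1 - \alpha) \sum_{k=0}^{\infty} \alpha^k P^k s$
$(I - \alpha P) x = (1 - \alpha) s$
[Figure: Weight ($10^{-5}$ to $10^{0}$) against Length (0 to 100) for t=1, t=5, t=15 and α=0.85, α=0.99]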
TwitterRank, GeneRank, IsoRank, MonitorRank, BookRank, TimedPageRank, CiteRank, AuthorRank, PopRank, FactRank, ObjectRank, FolkRank, ItemRank, BuddyRank, HostRank, DirRank, TrustRank, BadRank, VisualRank
PageRank beyond the Web
http://arxiv.org/abs/1407.5107
$(I - \alpha P) x = (1 - \alpha) s$
by Jessica Leber
Fast Magazine
Diffusion-based
community detection

1.  Given a seed, approximate the
diffusion.
2.  Extract the community.

Both are local operations.
Conductance communities
Conductance is one of the most
important community scores [Schaeffer07]
The conductance of a set of vertices is
the ratio of edges leaving to total edges:
$\phi(S) = \frac{\mathrm{cut}(S)}{\min(\mathrm{vol}(S), \mathrm{vol}(\bar{S}))}$
(cut: edges leaving the set; vol: total edges in the set)
Equivalently, it’s the probability that a
random edge leaves the set.
Small conductance ⇔ Good community
Example: cut(S) = 7, vol(S) = 33, vol($\bar{S}$) = 11, so $\phi(S) = 7/11$
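As a small worked check of the definition, the sketch below computes the conductance of a vertex set in an undirected graph stored as a dictionary of neighbor sets. The function name `conductance` and the example graph (two triangles joined by one edge) are assumptions for illustration; the set S must be neither empty nor the whole vertex set.

```python
def conductance(G, S):
    """phi(S) = cut(S) / min(vol(S), vol(complement of S))."""
    S = set(S)
    cut = sum(1 for u in S for v in G[u] if v not in S)   # edges leaving S
    vol_S = sum(len(G[u]) for u in S)                      # sum of degrees in S
    vol_total = sum(len(nbrs) for nbrs in G.values())
    return cut / min(vol_S, vol_total - vol_S)


# Hypothetical example: two triangles joined by the edge 2 - 3.
G = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3},
     3: {2, 4, 5}, 4: {3, 5}, 5: {3, 4}}
phi = conductance(G, {0, 1, 2})   # cut = 1, vol(S) = 7, vol(rest) = 7
```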
Andersen-Chung-Lang personalized PageRank community theorem
[Andersen et al. 2006]
[Ghosh et al. 2014, KDD]
Informally
If the seeds are in a set
of good conductance, then a
sweep-cut on a diffusion will find
a set with conductance that’s
nearly as good.
… also, it’s really fast.
Sweep-cuts find small-conductance sets
[Figure 4: A study of the paradoxical effects of value-based rounding on diffusion. Panels: (a) the adjacency structure of our sample with the three unbalanced classes indicated; (b) Zhou (3 labels); (c) Andersen-Lang (3 labels); (e) Zhou (15 labels); (f) Andersen-Lang (15 labels). Axes: Class 1, Class 2, Class 3.]
Good set 1, good set 2, good set 3
Check the conductance of all “prefixes” of the diffusion vector
sorted by value. There is a fast update: O(sum of degrees) work.
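The prefix check can be sketched as follows. This is an illustration under assumptions: the graph is a simple undirected graph stored as a dictionary of neighbor sets, and nodes are sorted by the degree-normalized value x[v]/d_v (the slide says only "sorted by value"). The name `sweep_cut` and the example vector are hypothetical. The cut and volume are updated incrementally as each node joins the prefix, so the total work is the sum of the degrees of the nodes in x plus the sort.

```python
def sweep_cut(G, x):
    """Return (best_set, best_conductance) over prefixes of the sorted nodes."""
    order = sorted(x, key=lambda v: x[v] / len(G[v]), reverse=True)
    vol_total = sum(len(nbrs) for nbrs in G.values())
    in_set = set()
    cut = 0
    vol = 0
    best_phi, best_size = float("inf"), 0
    for k, v in enumerate(order, start=1):
        # Edges from v into the current set stop being cut edges;
        # the remaining edges of v become cut edges.
        inside = sum(1 for u in G[v] if u in in_set)
        cut += len(G[v]) - 2 * inside
        vol += len(G[v])
        in_set.add(v)
        denom = min(vol, vol_total - vol)
        if denom > 0 and cut / denom < best_phi:
            best_phi, best_size = cut / denom, k
    return set(order[:best_size]), best_phi


# Hypothetical example: two triangles joined by the edge 2 - 3,
# with a made-up diffusion vector concentrated on the first triangle.
G = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3},
     3: {2, 4, 5}, 4: {3, 5}, 5: {3, 4}}
x = {0: 0.4, 1: 0.3, 2: 0.25, 3: 0.05}
S, phi = sweep_cut(G, x)
```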
Diffusions are localized
and we have algorithms to find their local regions
Uniformly localized
solutions in flickr
plot(x)
[Figure: plot(x), and two log-log plots with axis label "nonzeros" ($10^{0}$ to $10^{6}$) and values from $10^{-15}$ to $10^{0}$]
Crawl of flickr from 2006: ~800k nodes, 6M edges, beta=1/2
$(I - \beta P) x = (1 - \beta) s$
$\mathrm{nnz}(x) \approx 800\text{k}$
$\|D^{-1}(x - x^{*})\|_{\infty}$
Our mission
Find the solution with work
roughly proportional to the
localization, not the matrix.
Our Point
The push procedure gives
localized algorithms for diffusions
in a pleasingly wide variety of settings.
Our Results
New empirical and theoretical insights into
why and how “push” is so effective
The Push Algorithm for PageRank
Proposed (in closest form) in Andersen,
Chung, Lang (also by McSherry, Jeh &
Widom, Berkhin) for fast approx.
PageRank
Derived to show improved runtime for
balanced solvers
1.  Used for empirical studies
of “communities”
2.  Local Cheeger inequality.
3.  Used for “fast PageRank
approximation”
4.  Works on massive graphs
O(1 second) for 4 billion
edge graph on a laptop.
5.  It yields weakly localized
PageRank approximations!
[Figure: Newman’s netscience, 379 vertices, 1828 nnz]
Produce an ε-accurate entrywise
localized PageRank vector in work
$\frac{1}{\varepsilon (1 - \alpha)}$
Gauss-Seidel and
Gauss-Southwell
Methods to solve A x = b
Update $x^{(k+1)} = x^{(k)} + \rho_j e_j$ such that $[A x^{(k+1)}]_j = [b]_j$
In words: “relax” or “free” the jth coordinate of your solution vector in
order to satisfy the jth equation of your linear system.
Gauss-Seidel: repeatedly cycle through j = 1 to n
Gauss-Southwell: use the value of j that has the highest magnitude residual
$r^{(k)} = b - A x^{(k)}$
[Figure: nodes a, b, c]
Almost “the push” method
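The Gauss-Southwell rule can be sketched on a small dense system. This is a minimal illustration, assuming a matrix with nonzero diagonal for which the iteration converges (the example is symmetric and diagonally dominant); the function name, tolerance, and step limit are assumptions.

```python
def gauss_southwell(A, b, tol=1e-10, max_steps=10000):
    """Solve A x = b by repeatedly relaxing the coordinate with the largest residual."""
    n = len(b)
    x = [0.0] * n
    r = list(b)                          # residual r = b - A x, with x = 0
    for _ in range(max_steps):
        j = max(range(n), key=lambda i: abs(r[i]))
        if abs(r[j]) <= tol:
            break
        rho = r[j] / A[j][j]             # step that satisfies the j-th equation
        x[j] += rho
        for i in range(n):               # r <- r - rho * (column j of A)
            r[i] -= rho * A[i][j]
    return x


# Hypothetical diagonally dominant system.
A = [[4.0, 1.0, 0.0],
     [1.0, 4.0, 1.0],
     [0.0, 1.0, 4.0]]
b = [1.0, 2.0, 3.0]
x = gauss_southwell(A, b)
```

Replacing the `max` selection with a fixed cycle over j = 0, ..., n-1 would give the Gauss-Seidel ordering instead. On a sparse matrix only the entries of r that touch column j change, which is what makes the update local.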
The Push Method
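Parameters: $\varepsilon$, $\rho$
1. $x^{(1)} = 0$, $r^{(1)} = (1 - \alpha) e_i$, $k = 1$
2. while any $r_j > \varepsilon d_j$ ($d_j$ is the degree of node $j$)
3. $x^{(k+1)} = x^{(k)} + (r_j - \varepsilon d_j \rho) e_j$
4. $r_i^{(k+1)} = \begin{cases} \varepsilon d_j \rho & i = j \\ r_i^{(k)} + \alpha (r_j - \varepsilon d_j \rho)/d_j & i \sim j \\ r_i^{(k)} & \text{otherwise} \end{cases}$
5. $k \leftarrow k + 1$
Only push “some” of the residual. If we want tolerance “eps” then
push to tolerance “eps” and no further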
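The push steps can be sketched in Python for the PageRank system $(I - \alpha P)x = (1 - \alpha)e_i$ with $P = AD^{-1}$. This is an illustration under assumptions, not the authors' code: the graph is a dictionary of neighbor sets with no self-loops, a queue tracks which residual entries exceed the threshold, and the name `pagerank_push` and the default parameter values are hypothetical.

```python
import collections


def pagerank_push(G, seed, alpha=0.85, eps=1e-4, rho=0.5):
    """Approximate (I - alpha P) x = (1 - alpha) e_seed by push steps."""
    x = {}
    r = {seed: 1.0 - alpha}
    Q = collections.deque([seed])
    while Q:
        j = Q.popleft()
        dj = len(G[j])
        rj = r.get(j, 0.0)
        if rj <= eps * dj:
            continue
        push = rj - eps * dj * rho           # only push "some" of the residual
        x[j] = x.get(j, 0.0) + push
        r[j] = eps * dj * rho
        for i in G[j]:
            old = r.get(i, 0.0)
            r[i] = old + alpha * push / dj
            # enqueue i when its residual first crosses its threshold
            if old <= eps * len(G[i]) < r[i]:
                Q.append(i)
    return x, r


# Hypothetical example: two triangles joined by the edge 2 - 3.
G = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3},
     3: {2, 4, 5}, 4: {3, 5}, 5: {3, 4}}
alpha, eps = 0.85, 1e-4
x, r = pagerank_push(G, 0, alpha=alpha, eps=eps)
```

Each push conserves mass in the sense that sum(x) + sum(r)/(1 - alpha) stays equal to 1, and on exit every residual entry satisfies r[j] <= eps * d_j.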
Push is fast!
For the PageRank diffusion, Push
gives constant work (entry-wise).
Andersen, Chung, Lang FOCS 2006
1.  For the Katz diffusion
Push works empirically fast
Bonchi, Gleich, et al., 2012, Internet Math.
2.  For the exponential
Push gives uniform localization
on power-law graphs and fast
runtimes
Gleich and Kloster, 2014, Internet Math.
3.  For the heat-kernel diffusion
Push gives constant work
(entry-wise)
Kloster and Gleich, 2014, KDD
4.  For the PageRank diffusion
Push yields sparsity
regularization
Gleich and Mahoney, ICML 2014
5.  For a general class of diffusions
There is a Cheeger inequality
like before
Ghosh, Teng, et al. KDD 2014
6.  For the PageRank diffusion
Push gives the solution path in
constant work (entry-wise)
Kloster and Gleich, arXiv:1503.00322
$x = \exp(tP) e_i$
$x = \exp(P) e_i$
PageRank: $(I - \alpha P) x = (1 - \alpha) e_i$
Katz: $(I - \alpha A) x = (1 - \alpha) e_i$
Push is useful!
1.  Push implicitly regularizes semi-supervised learning
Gleich and Mahoney, submitted
2.  Push gives state-of-the-art
results for overlapping
community detection
Whang, Gleich, Dhillon, CIKM 2013
Whang, Gleich, Dhillon, In prep.
3.  Push for overlapping clusters
decreases communication in
parallel solutions
Andersen, Gleich, Mirrokni, WSDM 2012
[Figure 3: F1 and F2 measures comparing our algorithmic comm … indicates better communities. Dataset: DBLP. Methods: demon, bigclam, graclus centers, spread hubs, random, egonet.]
[Figure: Run time for demon, bigclam, graclus centers, spread hubs, random.]
Seeding Phase
Seed Set Expansion Phase
Propagation Phase
Joyce Jiyoung Whang, The University of Texas at Austin, Conference on Information and Knowledge Management
            HK (us)    PPR (prev. best)
F1          0.87       0.34
Set size    14         67
F1          0.33       0.14
Set size    192        15293
Row groups: This set; Amazon (Average)
Heat Kernel Based Community Detection
KDD 2014
Kyle Kloster and David F. Gleich
Purdue University
$f = \exp\{tP\} s = \sum_{k=0}^{\infty} \frac{t^k}{k!} P^k s$
Convert to a linear system, and
solve in constant time
Heat kernel localization
General recipe
1.  Take problem X,
convert into a linear
system
2.  Apply “push” to that
linear system
3.  Analyze and bound
total work
Heat kernel recipe
1.  Convert $x = \exp(tP) e_i$ into
$$
\begin{bmatrix}
I & & & & \\
-tP/1 & I & & & \\
& -tP/2 & \ddots & & \\
& & \ddots & I & \\
& & & -tP/N & I
\end{bmatrix}
\begin{bmatrix} v_0 \\ v_1 \\ \vdots \\ \vdots \\ v_N \end{bmatrix}
=
\begin{bmatrix} e_i \\ 0 \\ \vdots \\ \vdots \\ 0 \end{bmatrix}
$$
2.  Apply “push”
3.  Analyze work bound
There is a fast deterministic
adaptation of the push method
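To see why the block system encodes the exponential, note that forward substitution gives $v_0 = e_i$ and $v_k = (t/k) P v_{k-1}$, so $v_k = \frac{t^k}{k!} P^k e_i$ and the sum of the blocks is the degree-N Taylor polynomial of $\exp(tP) e_i$. The sketch below solves the system exactly this way; it is a dense check of the conversion, not the push method itself, and the names `apply_P` and `heat_kernel_taylor` are assumptions. On a single edge, $P^2 = I$, so $\exp(tP) e_0 = (\cosh t, \sinh t)$, which gives a closed form to compare against.

```python
import math


def apply_P(G, x):
    """Return P x for P = A D^{-1}: node j sends x[j] / d_j to each neighbor."""
    y = {}
    for j, xj in x.items():
        share = xj / len(G[j])
        for i in G[j]:
            y[i] = y.get(i, 0.0) + share
    return y


def heat_kernel_taylor(G, seed, t, N):
    """Forward-substitute the block system and return v_0 + v_1 + ... + v_N."""
    v = {seed: 1.0}                      # v_0 = e_seed
    x = dict(v)
    for k in range(1, N + 1):
        Pv = apply_P(G, v)
        v = {u: t * val / k for u, val in Pv.items()}   # v_k = (t/k) P v_{k-1}
        for u, val in v.items():
            x[u] = x.get(u, 0.0) + val
    return x


# Hypothetical example: a single edge 0 - 1.
G = {0: {1}, 1: {0}}
x = heat_kernel_taylor(G, 0, t=1.0, N=20)
```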
Kloster & Gleich,
KDD2014


import collections
import math

# G is graph as dictionary-of-sets,
# seed is an array of seeds,
# t, eps, N, psis are precomputed
x = {}  # Store x, r as dictionaries
r = {}  # initialize residual
Q = collections.deque()  # initialize queue
for s in seed:
    r[(s, 0)] = 1. / len(seed)
    Q.append((s, 0))
while len(Q) > 0:
    (v, j) = Q.popleft()  # v has r[(v,j)] ...
    rvj = r[(v, j)]
    # perform the hk-relax step
    if v not in x: x[v] = 0.
    x[v] += rvj
    r[(v, j)] = 0.
    mass = (t * rvj / (float(j) + 1.)) / len(G[v])
    for u in G[v]:  # for neighbors of v
        nxt = (u, j + 1)  # in the next block (renamed from `next`, a builtin)
        if j + 1 == N:  # last step, add to soln
            x[u] = x.get(u, 0.) + rvj / len(G[v])  # u may not be in x yet
            continue
        if nxt not in r: r[nxt] = 0.
        thresh = math.exp(t) * eps * len(G[u])
        thresh = thresh / (N * psis[j + 1]) / 2.
        if r[nxt] < thresh and \
           r[nxt] + mass >= thresh:
            Q.append(nxt)  # add u to queue
        r[nxt] = r[nxt] + mass
Figure 2: Pseudo-code for our algorithm as working Python code. The graph is stored as a dic-
THEOREM
Let $h = e^{-t} \exp\{tP\} s$.
Let $x = \text{hk-push}(\varepsilon)$ output.
Then $\|D^{-1}(x - h)\|_{\infty} \le \varepsilon$
after looking at $\frac{2 N e^{t}}{\varepsilon}$ edges.
We believe that the bound below suffices
$N \le 2 t \log(1/\varepsilon)$
MMDS 2014
Analysis, three pages to one slide
1.  State the approximation error that results from
approximating using the linear system.
“Standard” matrix-approximation result.
2.  Bound the work involved in doing push.
Iterate y ≥ 0, residual r ≥ 0
Each step moves “mass” from r to y,
keeps non-neg and increasing property.
Each step moves at least “deg(i)·ε” mass in deg(i) work
So in T steps, we “push” Sum [ deg(i)·ε , i in each step]
But we can only push “so much”, so we can bound this
from above, and invert to get a total work bound.
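Written out, the inversion in step 2 is one line. This is a sketch under the slide's simplified accounting: a push at node $i$ costs $\deg(i)$ work and moves at least $\varepsilon \deg(i)$ mass, and the total mass that can be pushed is taken to be at most $e^{t}$. It recovers only the $e^{t}/\varepsilon$ scaling; the factor $2N$ in the theorem is not derived here.

```latex
\[
\text{work}
\;=\; \sum_{i \in \text{steps}} \deg(i)
\;=\; \frac{1}{\varepsilon} \sum_{i \in \text{steps}} \varepsilon \deg(i)
\;\le\; \frac{e^{t}}{\varepsilon}.
\]
```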
Kloster & Gleich,
KDD2014
$\sum_{i \in \text{steps}} \varepsilon \deg(i) \le e^{t}$
Runtime
[Figure: Runtime: hk vs. ppr. x-axis: log10(|V|+|E|), 5 to 9; y-axis: Runtime (s), 0 to 2; curves: hkgrow and pprgrow, each at 25%, 50%, 75%]
PageRank solution paths
[Figure: degree-normalized PageRank ($10^{-5}$ to $10^{0}$) against 1/ε ($10^{1}$ to $10^{5}$), annotated with φ=0.005, φ=0.010, φ=0.060, φ=0.111, φ=0.268]
Compute one diffusion, and all sweep-cuts, for all values of epsilon
Kloster & Gleich,
arXiv:1503.00322
PageRank solution paths


These take about a second
to compute with our “new”
push-based algorithm on
graphs with millions of
nodes and edges

Related to the LARS
method for 1-norm
regularized problems
Use “centers” of graph partitions to
seed for overlapping communities
[Figure: maximum conductance (0 to 1) against coverage (percentage, 0 to 100) for egonet, graclus centers, spread hubs, random, bigclam; panels (a) AstroPh and (d) Flickr]
Flickr social network
2M vertices
22M edges
We can cover
95% of the network
with communities
of cond. ~0.15.
References and ongoing work
Gleich and Kloster – Relaxation methods for the matrix exponential, J. Internet Math
Kloster and Gleich – Heat kernel based community detection, KDD 2014
Gleich and Mahoney – Algorithmic Anti-differentiation, ICML 2014
Gleich and Mahoney – Regularized diffusions, Submitted
Whang, Gleich, Dhillon – Seeds for overlapping communities, CIKM 2013
www.cs.purdue.edu/homes/dgleich/codes/nexpokit
www.cs.purdue.edu/homes/dgleich/codes/l1pagerank
•  Improved localization bounds for functions of matrices
•  Asynchronous and parallel “push”-style methods
•  Localized methods beyond conductance
Supported by NSF CAREER 1149756-CCF 
 www.cs.purdue.edu/homes/dgleich

More Related Content

What's hot

Spectral clustering with motifs and higher-order structures
Spectral clustering with motifs and higher-order structuresSpectral clustering with motifs and higher-order structures
Spectral clustering with motifs and higher-order structuresDavid Gleich
 
Personalized PageRank based community detection
Personalized PageRank based community detectionPersonalized PageRank based community detection
Personalized PageRank based community detectionDavid Gleich
 
Using Local Spectral Methods to Robustify Graph-Based Learning
Using Local Spectral Methods to Robustify Graph-Based LearningUsing Local Spectral Methods to Robustify Graph-Based Learning
Using Local Spectral Methods to Robustify Graph-Based LearningDavid Gleich
 
Non-exhaustive, Overlapping K-means
Non-exhaustive, Overlapping K-meansNon-exhaustive, Overlapping K-means
Non-exhaustive, Overlapping K-meansDavid Gleich
 
Big data matrix factorizations and Overlapping community detection in graphs
Big data matrix factorizations and Overlapping community detection in graphsBig data matrix factorizations and Overlapping community detection in graphs
Big data matrix factorizations and Overlapping community detection in graphsDavid Gleich
 
Fast relaxation methods for the matrix exponential
Fast relaxation methods for the matrix exponential Fast relaxation methods for the matrix exponential
Fast relaxation methods for the matrix exponential David Gleich
 
Iterative methods with special structures
Iterative methods with special structuresIterative methods with special structures
Iterative methods with special structuresDavid Gleich
 
Correlation clustering and community detection in graphs and networks
Correlation clustering and community detection in graphs and networksCorrelation clustering and community detection in graphs and networks
Correlation clustering and community detection in graphs and networksDavid Gleich
 
Gaps between the theory and practice of large-scale matrix-based network comp...
Gaps between the theory and practice of large-scale matrix-based network comp...Gaps between the theory and practice of large-scale matrix-based network comp...
Gaps between the theory and practice of large-scale matrix-based network comp...David Gleich
 
A dynamical system for PageRank with time-dependent teleportation
A dynamical system for PageRank with time-dependent teleportationA dynamical system for PageRank with time-dependent teleportation
A dynamical system for PageRank with time-dependent teleportationDavid Gleich
 
Lesson 26: Integration by Substitution (handout)
Lesson 26: Integration by Substitution (handout)Lesson 26: Integration by Substitution (handout)
Lesson 26: Integration by Substitution (handout)Matthew Leingang
 
Relaxation methods for the matrix exponential on large networks
Relaxation methods for the matrix exponential on large networksRelaxation methods for the matrix exponential on large networks
Relaxation methods for the matrix exponential on large networksDavid Gleich
 
A lattice-based consensus clustering
A lattice-based consensus clusteringA lattice-based consensus clustering
A lattice-based consensus clusteringDmitrii Ignatov
 
High-Performance Approach to String Similarity using Most Frequent K Characters
High-Performance Approach to String Similarity using Most Frequent K CharactersHigh-Performance Approach to String Similarity using Most Frequent K Characters
High-Performance Approach to String Similarity using Most Frequent K CharactersHolistic Benchmarking of Big Linked Data
 
A new generalized lindley distribution
A new generalized lindley distributionA new generalized lindley distribution
A new generalized lindley distributionAlexander Decker
 
Pattern-based classification of demographic sequences
Pattern-based classification of demographic sequencesPattern-based classification of demographic sequences
Pattern-based classification of demographic sequencesDmitrii Ignatov
 
Uncertainty Modeling in Deep Learning
Uncertainty Modeling in Deep LearningUncertainty Modeling in Deep Learning
Uncertainty Modeling in Deep LearningSungjoon Choi
 

What's hot (20)

Spectral clustering with motifs and higher-order structures
Spectral clustering with motifs and higher-order structuresSpectral clustering with motifs and higher-order structures
Spectral clustering with motifs and higher-order structures
 
Personalized PageRank based community detection
Personalized PageRank based community detectionPersonalized PageRank based community detection
Personalized PageRank based community detection
 
Using Local Spectral Methods to Robustify Graph-Based Learning
Using Local Spectral Methods to Robustify Graph-Based LearningUsing Local Spectral Methods to Robustify Graph-Based Learning
Using Local Spectral Methods to Robustify Graph-Based Learning
 
Non-exhaustive, Overlapping K-means
Non-exhaustive, Overlapping K-meansNon-exhaustive, Overlapping K-means
Non-exhaustive, Overlapping K-means
 
Big data matrix factorizations and Overlapping community detection in graphs
Big data matrix factorizations and Overlapping community detection in graphsBig data matrix factorizations and Overlapping community detection in graphs
Big data matrix factorizations and Overlapping community detection in graphs
 
Fast relaxation methods for the matrix exponential
Fast relaxation methods for the matrix exponential Fast relaxation methods for the matrix exponential
Fast relaxation methods for the matrix exponential
 
Iterative methods with special structures
Iterative methods with special structuresIterative methods with special structures
Iterative methods with special structures
 
Correlation clustering and community detection in graphs and networks
Correlation clustering and community detection in graphs and networksCorrelation clustering and community detection in graphs and networks
Correlation clustering and community detection in graphs and networks
 
Gaps between the theory and practice of large-scale matrix-based network comp...
Gaps between the theory and practice of large-scale matrix-based network comp...Gaps between the theory and practice of large-scale matrix-based network comp...
Gaps between the theory and practice of large-scale matrix-based network comp...
 
QMC Program: Trends and Advances in Monte Carlo Sampling Algorithms Workshop,...
QMC Program: Trends and Advances in Monte Carlo Sampling Algorithms Workshop,...QMC Program: Trends and Advances in Monte Carlo Sampling Algorithms Workshop,...
QMC Program: Trends and Advances in Monte Carlo Sampling Algorithms Workshop,...
 
QMC Program: Trends and Advances in Monte Carlo Sampling Algorithms Workshop,...
QMC Program: Trends and Advances in Monte Carlo Sampling Algorithms Workshop,...QMC Program: Trends and Advances in Monte Carlo Sampling Algorithms Workshop,...
QMC Program: Trends and Advances in Monte Carlo Sampling Algorithms Workshop,...
 
A dynamical system for PageRank with time-dependent teleportation
A dynamical system for PageRank with time-dependent teleportationA dynamical system for PageRank with time-dependent teleportation
A dynamical system for PageRank with time-dependent teleportation
 
Lesson 26: Integration by Substitution (handout)
Lesson 26: Integration by Substitution (handout)Lesson 26: Integration by Substitution (handout)
Lesson 26: Integration by Substitution (handout)
 
Cs36565569
Cs36565569Cs36565569
Cs36565569
 
Relaxation methods for the matrix exponential on large networks
Relaxation methods for the matrix exponential on large networksRelaxation methods for the matrix exponential on large networks
Relaxation methods for the matrix exponential on large networks
 
A lattice-based consensus clustering
A lattice-based consensus clusteringA lattice-based consensus clustering
A lattice-based consensus clustering
 
High-Performance Approach to String Similarity using Most Frequent K Characters
High-Performance Approach to String Similarity using Most Frequent K CharactersHigh-Performance Approach to String Similarity using Most Frequent K Characters
High-Performance Approach to String Similarity using Most Frequent K Characters
 
A new generalized lindley distribution
A new generalized lindley distributionA new generalized lindley distribution
A new generalized lindley distribution
 
Pattern-based classification of demographic sequences
Pattern-based classification of demographic sequencesPattern-based classification of demographic sequences
Pattern-based classification of demographic sequences
 
Uncertainty Modeling in Deep Learning
Uncertainty Modeling in Deep LearningUncertainty Modeling in Deep Learning
Uncertainty Modeling in Deep Learning
 

Viewers also liked

The spectre of the spectrum
The spectre of the spectrumThe spectre of the spectrum
The spectre of the spectrumDavid Gleich
 
Overlapping clusters for distributed computation
Overlapping clusters for distributed computationOverlapping clusters for distributed computation
Overlapping clusters for distributed computationDavid Gleich
 
Fast matrix primitives for ranking, link-prediction and more
Fast matrix primitives for ranking, link-prediction and moreFast matrix primitives for ranking, link-prediction and more
Fast matrix primitives for ranking, link-prediction and moreDavid Gleich
 
Graph libraries in Matlab: MatlabBGL and gaimc
Graph libraries in Matlab: MatlabBGL and gaimcGraph libraries in Matlab: MatlabBGL and gaimc
Graph libraries in Matlab: MatlabBGL and gaimcDavid Gleich
 
The power and Arnoldi methods in an algebra of circulants
The power and Arnoldi methods in an algebra of circulantsThe power and Arnoldi methods in an algebra of circulants
The power and Arnoldi methods in an algebra of circulantsDavid Gleich
 
What you can do with a tall-and-skinny QR factorization in Hadoop: Principal ...
What you can do with a tall-and-skinny QR factorization in Hadoop: Principal ...What you can do with a tall-and-skinny QR factorization in Hadoop: Principal ...
What you can do with a tall-and-skinny QR factorization in Hadoop: Principal ...David Gleich
 
A history of PageRank from the numerical computing perspective
A history of PageRank from the numerical computing perspectiveA history of PageRank from the numerical computing perspective
A history of PageRank from the numerical computing perspectiveDavid Gleich
 
Iterative methods for network alignment
Iterative methods for network alignmentIterative methods for network alignment
Iterative methods for network alignmentDavid Gleich
 
Tall and Skinny QRs in MapReduce
Tall and Skinny QRs in MapReduceTall and Skinny QRs in MapReduce
Tall and Skinny QRs in MapReduceDavid Gleich
 
A multithreaded method for network alignment
A multithreaded method for network alignmentA multithreaded method for network alignment
A multithreaded method for network alignmentDavid Gleich
 
MapReduce Tall-and-skinny QR and applications
MapReduce Tall-and-skinny QR and applicationsMapReduce Tall-and-skinny QR and applications
MapReduce Tall-and-skinny QR and applicationsDavid Gleich
 
Direct tall-and-skinny QR factorizations in MapReduce architectures
Direct tall-and-skinny QR factorizations in MapReduce architecturesDirect tall-and-skinny QR factorizations in MapReduce architectures
Direct tall-and-skinny QR factorizations in MapReduce architecturesDavid Gleich
 
How does Google Google: A journey into the wondrous mathematics behind your f...
How does Google Google: A journey into the wondrous mathematics behind your f...How does Google Google: A journey into the wondrous mathematics behind your f...
How does Google Google: A journey into the wondrous mathematics behind your f...David Gleich
 
Tall-and-skinny QR factorizations in MapReduce architectures
Tall-and-skinny QR factorizations in MapReduce architecturesTall-and-skinny QR factorizations in MapReduce architectures
Tall-and-skinny QR factorizations in MapReduce architecturesDavid Gleich
 
Vertex neighborhoods, low conductance cuts, and good seeds for local communit...
Vertex neighborhoods, low conductance cuts, and good seeds for local communit...Vertex neighborhoods, low conductance cuts, and good seeds for local communit...
Vertex neighborhoods, low conductance cuts, and good seeds for local communit...David Gleich
 
MapReduce for scientific simulation analysis
MapReduce for scientific simulation analysisMapReduce for scientific simulation analysis
MapReduce for scientific simulation analysisDavid Gleich
 

Viewers also liked (16)

The spectre of the spectrum
The spectre of the spectrumThe spectre of the spectrum
The spectre of the spectrum
 
Overlapping clusters for distributed computation
Overlapping clusters for distributed computationOverlapping clusters for distributed computation
Overlapping clusters for distributed computation
 
Fast matrix primitives for ranking, link-prediction and more
Fast matrix primitives for ranking, link-prediction and moreFast matrix primitives for ranking, link-prediction and more
Fast matrix primitives for ranking, link-prediction and more
 
Graph libraries in Matlab: MatlabBGL and gaimc
Graph libraries in Matlab: MatlabBGL and gaimcGraph libraries in Matlab: MatlabBGL and gaimc
Graph libraries in Matlab: MatlabBGL and gaimc
 
The power and Arnoldi methods in an algebra of circulants
The power and Arnoldi methods in an algebra of circulantsThe power and Arnoldi methods in an algebra of circulants
The power and Arnoldi methods in an algebra of circulants
 
What you can do with a tall-and-skinny QR factorization in Hadoop: Principal ...
What you can do with a tall-and-skinny QR factorization in Hadoop: Principal ...What you can do with a tall-and-skinny QR factorization in Hadoop: Principal ...
What you can do with a tall-and-skinny QR factorization in Hadoop: Principal ...
 
A history of PageRank from the numerical computing perspective
A history of PageRank from the numerical computing perspectiveA history of PageRank from the numerical computing perspective
A history of PageRank from the numerical computing perspective
 
Iterative methods for network alignment
Iterative methods for network alignmentIterative methods for network alignment
Iterative methods for network alignment
 
Tall and Skinny QRs in MapReduce
Tall and Skinny QRs in MapReduceTall and Skinny QRs in MapReduce
Tall and Skinny QRs in MapReduce
 
A multithreaded method for network alignment
A multithreaded method for network alignmentA multithreaded method for network alignment
A multithreaded method for network alignment
 
MapReduce Tall-and-skinny QR and applications
MapReduce Tall-and-skinny QR and applicationsMapReduce Tall-and-skinny QR and applications
MapReduce Tall-and-skinny QR and applications
 
Direct tall-and-skinny QR factorizations in MapReduce architectures
Direct tall-and-skinny QR factorizations in MapReduce architecturesDirect tall-and-skinny QR factorizations in MapReduce architectures
Direct tall-and-skinny QR factorizations in MapReduce architectures
 
How does Google Google: A journey into the wondrous mathematics behind your f...
How does Google Google: A journey into the wondrous mathematics behind your f...How does Google Google: A journey into the wondrous mathematics behind your f...
How does Google Google: A journey into the wondrous mathematics behind your f...
 
Tall-and-skinny QR factorizations in MapReduce architectures
Tall-and-skinny QR factorizations in MapReduce architecturesTall-and-skinny QR factorizations in MapReduce architectures
Tall-and-skinny QR factorizations in MapReduce architectures
 
Vertex neighborhoods, low conductance cuts, and good seeds for local communit...
Vertex neighborhoods, low conductance cuts, and good seeds for local communit...Vertex neighborhoods, low conductance cuts, and good seeds for local communit...
Vertex neighborhoods, low conductance cuts, and good seeds for local communit...
 
MapReduce for scientific simulation analysis




Localized methods in graph mining

  • 1. Localized methods in graph mining. David F. Gleich, Purdue University. Joint work with Kyle Kloster (Purdue) and Michael Mahoney (Berkeley). Supported by NSF CAREER CCF-1149756. David Gleich · Purdue
  • 3. STORY TIME: a simple theme, many pictures.
  • 4. Localized methods in graph mining use the local structure of a network (and not the global structure). USE THIS, NOT THIS.
  • 5. Point 1: Localized methods are the right thing to use for large graph mining. Point 2: Localized methods are still the right thing to use even if you don't believe my answer to part 1.
  • 6. Some graphs have global structure. (Image by R. Rossi from our paper on clique detection for temporal strong components.)
  • 7. Some graphs do not.
  • 8. Some graphs are random.
  • 9. Can you tell which one is random?
  • 10. At large scales, real networks look random (or slightly better).
  • 11. Localized methods only operate on meaningful local structures in the data.
  • 12. CAVEATS: There are large-scale global structures, BUT they don’t look like what your small-scale intuition would predict. Continents exist in Facebook, but they don’t look like small-scale structures. References: Leskovec, Lang, Dasgupta, Mahoney, Internet Math, 2009; Ugander, Backstrom, WSDM 2013; Jeub, Balachandran, Porter, Mucha, Mahoney, Phys Rev E, 2015.
  • 13. Point 1: Localized methods are the right thing to use for large graph mining. Point 2: Localized methods are still the right thing to use even if you don't believe my answer to part 1.
  • 14. Local algorithms give fast answers to global queries (for small-source diffusions).
  • 15. Local algorithms give useful answers to global queries (for small-source diffusions).
  • 16. Pictures from the Sparse Matrix Repository (Davis & Hu): www.cise.ufl.edu/research/sparse/matrices/
  • 17. Graph diffusions. Diffusions show how {importance, rank, information, status, …} flows from a source to target nodes via edges. (Figure: a network, or mesh, from a typical problem in scientific computing, colored from high to low.)
  • 18. Graph diffusions: $f = \sum_{k=0}^{\infty} \alpha_k P^k s$, where A is the adjacency matrix, D is the degree matrix, $P = A D^{-1}$ is the column-stochastic operator with $(Px)_i = \sum_{j \to i} \frac{1}{d_j} x_j$, s is the “seed” (a sparse vector), f is the diffusion result, and $\alpha_k$ are the path weights. Graph diffusions help with: 1. attribute prediction; 2. community detection; 3. “ranking”; 4. finding small-conductance sets; 5. graph label propagation. (Figure: a network, or mesh, from a typical problem in scientific computing, colored from high to low.)
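  • 19. Graph diffusions. Heat kernel: $h = e^{-t} \sum_{k=0}^{\infty} \frac{t^k}{k!} P^k s = e^{-t} \exp\{tP\} s$. PageRank: $x = (1-\alpha) \sum_{k=0}^{\infty} \alpha^k P^k s$, equivalently $(I - \alpha P) x = (1-\alpha) s$. Here $P = A D^{-1}$ and $(Px)_i = \sum_{j \to i} \frac{1}{d_j} x_j$. (Figure: a network, or mesh, from a typical problem in scientific computing, colored from high to low.)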
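  • 20. Graph diffusions. Heat kernel: $h = e^{-t} \sum_{k=0}^{\infty} \frac{t^k}{k!} P^k s = e^{-t} \exp\{tP\} s$. PageRank: $x = (1-\alpha) \sum_{k=0}^{\infty} \alpha^k P^k s$, equivalently $(I - \alpha P) x = (1-\alpha) s$. (Figure: path weight against path length, 0 to 100, with weights on a log scale from $10^{-5}$ to $10^{0}$, for t = 1, t = 5, t = 15 and for α = 0.85, α = 0.99.)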
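To make the two families of path weights concrete, here is a minimal sketch in plain Python that evaluates a truncated diffusion sum on a tiny graph. The example graph `G`, the helper names, and the truncation lengths are assumptions chosen for illustration; they are not taken from the talk.

```python
import math

# Hypothetical example graph (dictionary-of-sets): two triangles joined by one edge.
G = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2, 4, 5}, 4: {3, 5}, 5: {3, 4}}

def pagerank_weights(alpha, n_terms):
    """Path weights (1 - alpha) * alpha**k for k = 0 .. n_terms - 1."""
    return [(1.0 - alpha) * alpha**k for k in range(n_terms)]

def heat_kernel_weights(t, n_terms):
    """Path weights exp(-t) * t**k / k! for k = 0 .. n_terms - 1."""
    return [math.exp(-t) * t**k / math.factorial(k) for k in range(n_terms)]

def apply_P(graph, x):
    """Compute P x for P = A D^{-1}: node j sends x[j] / d_j to each neighbor."""
    y = {v: 0.0 for v in graph}
    for j, xj in x.items():
        if xj == 0.0:
            continue
        share = xj / len(graph[j])
        for i in graph[j]:
            y[i] += share
    return y

def diffusion(graph, seed, weights):
    """Truncated diffusion f = sum_k weights[k] * P^k s with s = e_seed."""
    term = {v: 0.0 for v in graph}
    term[seed] = 1.0
    f = {v: 0.0 for v in graph}
    for w in weights:
        for v in graph:
            f[v] += w * term[v]
        term = apply_P(graph, term)  # term now holds the next power P^k s
    return f

heat = diffusion(G, 0, heat_kernel_weights(1.0, 30))
ppr = diffusion(G, 0, pagerank_weights(0.85, 400))
```

Because P is column stochastic, each term `P^k s` keeps total mass 1, so the entries of `f` sum to the sum of the weights used. This dense version touches every node at every step; it only illustrates the definition and is not a localized method.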
  • 22. Diffusion-based community detection: 1. Given a seed, approximate the diffusion. 2. Extract the community. Both are local operations.
  • 23. Conductance communities. Conductance is one of the most important community scores [Schaeffer07]. The conductance of a set of vertices is the ratio of edges leaving to total edges: $\phi(S) = \frac{\mathrm{cut}(S)}{\min(\mathrm{vol}(S), \mathrm{vol}(\bar{S}))}$, that is, (edges leaving the set) / (total edges in the set). Equivalently, it’s the probability that a random edge leaves the set. Small conductance ⇔ good community. Example: cut(S) = 7, vol(S) = 33, vol(S̄) = 11, so φ(S) = 7/11.
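The conductance formula can be sketched directly for a graph stored as a dictionary-of-sets. The graph `G` and the function names below are illustrative assumptions, and the sketch assumes an undirected graph without self-loops.

```python
# Hypothetical example graph: two triangles joined by the single edge 2-3.
G = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2, 4, 5}, 4: {3, 5}, 5: {3, 4}}

def volume(graph, nodes):
    """vol(S): sum of the degrees of the nodes in S."""
    return sum(len(graph[v]) for v in nodes)

def cut_size(graph, S):
    """cut(S): number of edges with exactly one endpoint in S."""
    S = set(S)
    return sum(1 for v in S for u in graph[v] if u not in S)

def conductance(graph, S):
    """phi(S) = cut(S) / min(vol(S), vol(complement of S))."""
    S = set(S)
    denom = min(volume(graph, S), volume(graph, set(graph) - S))
    if denom == 0:
        raise ValueError("conductance is undefined when one side has zero volume")
    return cut_size(graph, S) / denom

phi = conductance(G, {0, 1, 2})  # one triangle: cut = 1, vol = 7 on each side
```

With the numbers on the slide, the same formula gives 7 / min(33, 11) = 7/11.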
  • 24. Andersen-Chung-Lang personalized PageRank community theorem [Andersen et al. 2006] [Ghosh et al. 2014, KDD]. Informally: suppose the seeds are in a set of good conductance; then a sweep-cut on a diffusion will find a set with conductance that’s nearly as good. Also, it’s really fast.
  • 25. Sweep-cuts find small-conductance sets. Check the conductance for all “prefixes” of the diffusion vector sorted by value; there is a fast update, with O(sum of degrees) work. The figure marks GOOD SET 1, GOOD SET 2, and GOOD SET 3. (Inset figure: “Figure 4: A study of the paradoxical effects of value-based rounding on diffusion”, with panels (a) the adjacency structure of our sample with the three unbalanced classes indicated, (b) Zhou (3 labels), (c) Andersen-Lang (3 labels), (e) Zhou (15 labels), (f) Andersen-Lang (15 labels).)
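The sweep procedure with its incremental update can be sketched as follows. This is an illustration under stated assumptions: the graph `G` and the vector `x` are made up, nodes are ordered by degree-normalized value (the ordering used by Andersen, Chung, and Lang; sorting by raw value is a one-line change to the `key`), and every node with a positive value is assumed to have at least one neighbor.

```python
# Hypothetical example graph: two triangles joined by the single edge 2-3.
G = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2, 4, 5}, 4: {3, 5}, 5: {3, 4}}

def sweep_cut(graph, x):
    """Return (best_set, best_conductance) over prefixes of the sorted nodes."""
    total_vol = sum(len(nbrs) for nbrs in graph.values())
    order = sorted((v for v in x if x[v] > 0),
                   key=lambda v: x[v] / len(graph[v]), reverse=True)
    in_set = set()
    vol = 0   # volume of the current prefix
    cut = 0   # number of edges leaving the current prefix
    best_phi, best_size = float("inf"), 0
    for size, v in enumerate(order, start=1):
        deg = len(graph[v])
        inside = sum(1 for u in graph[v] if u in in_set)
        vol += deg
        cut += deg - 2 * inside  # edges to the prefix stop being cut edges
        in_set.add(v)
        denom = min(vol, total_vol - vol)
        if denom == 0:
            break  # the prefix now covers all of the volume
        phi = cut / denom
        if phi < best_phi:
            best_phi, best_size = phi, size
    return set(order[:best_size]), best_phi

# Made-up diffusion values concentrated on the first triangle.
x = {0: 0.5, 1: 0.3, 2: 0.3, 3: 0.05, 4: 0.01, 5: 0.01}
best_set, best_phi = sweep_cut(G, x)
```

Adding a node costs work proportional to its degree, so the whole sweep costs the sum of the degrees of the nodes with nonzero value, plus the sort.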
  • 26. Diffusions are localized, and we have algorithms to find their local regions.
  • 27. Uniformly localized solutions in flickr. Crawl of flickr from 2006: ~800k nodes, 6M edges, beta = 1/2, $(I - \beta P) x = (1 - \beta) s$, nnz(x) ≈ 800k. (Figures: plot(x); log-log plots against the number of nonzeros, labeled $\|D^{-1}(x - x^*)\|_\infty$ and $\varepsilon$.)
  • 28. Our mission: find the solution with work roughly proportional to the localization, not the matrix.
  • 29. Our point: the push procedure gives localized algorithms for diffusions in a pleasingly wide variety of settings. Our results: new empirical and theoretical insights into why and how “push” is so effective.
  • 30. The Push Algorithm for PageRank. Proposed (in closest form) in Andersen, Chung, Lang (also by McSherry, Jeh & Widom, Berkhin) for fast approximate PageRank. Derived to show improved runtime for balanced solvers. 1. Used for empirical studies of “communities”. 2. Local Cheeger inequality. 3. Used for “fast PageRank approximation”. 4. Works on massive graphs: O(1 second) for a 4-billion-edge graph on a laptop. 5. It yields weakly localized PageRank approximations! It produces an ε-accurate, entrywise localized PageRank vector in work $\frac{1}{\varepsilon (1 - \alpha)}$. (Figure: Newman’s netscience, 379 vertices, 1828 nnz.)
  • 31. Gauss-Seidel and Gauss-Southwell: methods to solve $Ax = b$. Update $x^{(k+1)} = x^{(k)} + \rho_j e_j$ such that $[A x^{(k+1)}]_j = [b]_j$. In words: “relax” or “free” the jth coordinate of your solution vector in order to satisfy the jth equation of your linear system. Gauss-Seidel: repeatedly cycle through j = 1 to n. Gauss-Southwell: use the value of j that has the highest-magnitude residual, $r^{(k)} = b - A x^{(k)}$.
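A minimal Gauss-Southwell sketch for a small dense system shows the coordinate relaxation in code. The matrix `A`, right-hand side `b`, tolerance, and step limit are illustrative assumptions; the example matrix is strictly diagonally dominant, a setting in which this iteration converges.

```python
def gauss_southwell(A, b, tol=1e-10, max_steps=10_000):
    """Relax, at each step, the coordinate with the largest-magnitude residual."""
    n = len(b)
    x = [0.0] * n
    r = list(b)  # residual r = b - A x, with x = 0 initially
    for _ in range(max_steps):
        j = max(range(n), key=lambda i: abs(r[i]))
        if abs(r[j]) <= tol:
            break
        rho = r[j] / A[j][j]       # chosen so that equation j holds exactly
        x[j] += rho
        for i in range(n):
            r[i] -= rho * A[i][j]  # r <- r - rho * A e_j
    return x, r

# Hypothetical strictly diagonally dominant test system.
A = [[4.0, 1.0, 0.0],
     [1.0, 3.0, 1.0],
     [0.0, 1.0, 2.0]]
b = [1.0, 2.0, 3.0]
x, r = gauss_southwell(A, b)
```

Replacing the `max` selection with a fixed cycle over j = 0 .. n-1 gives Gauss-Seidel. With a sparse matrix, the inner loop touches only the nonzeros in column j, which is what makes the update local.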
  • 32. The Push Method (almost “the push” method): 1. $x^{(1)} = 0$, $r^{(1)} = (1-\alpha) e_i$, $k = 1$. 2. While any $r_j > \varepsilon d_j$ ($d_j$ is the degree of node j): 3. $x^{(k+1)} = x^{(k)} + (r_j - \varepsilon d_j \rho) e_j$; 4. $r_i^{(k+1)} = \begin{cases} \varepsilon d_j \rho & i = j \\ r_i^{(k)} + \alpha (r_j - \varepsilon d_j \rho)/d_j & i \sim j \\ r_i^{(k)} & \text{otherwise} \end{cases}$; 5. $k \leftarrow k + 1$. On ε and ρ: only push “some” of the residual. If we want tolerance “eps”, then push to tolerance “eps” and no further.
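The push steps can be sketched with a queue of nodes whose residual exceeds the threshold. This is an illustrative reading of the slide, not the authors' code: the graph `G`, the function name, and the default parameters are assumptions, the neighbor update includes the factor α that the PageRank system $(I - \alpha P)x = (1-\alpha)e_i$ requires, and the graph is assumed undirected with no self-loops or isolated nodes.

```python
from collections import deque

# Hypothetical example graph: two triangles joined by the single edge 2-3.
G = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2, 4, 5}, 4: {3, 5}, 5: {3, 4}}

def pagerank_push(graph, seed, alpha=0.85, eps=1e-4, rho=0.0):
    """Approximate seeded PageRank; stop when every r[j] <= eps * d_j."""
    x = {}                      # sparse solution
    r = {seed: 1.0 - alpha}     # sparse residual
    queue = deque([seed])
    while queue:
        j = queue.popleft()
        dj = len(graph[j])
        rj = r.get(j, 0.0)
        if rj <= eps * dj:
            continue
        push = rj - eps * dj * rho      # push only "some" of the residual
        x[j] = x.get(j, 0.0) + push
        r[j] = eps * dj * rho
        share = alpha * push / dj
        for i in graph[j]:
            old = r.get(i, 0.0)
            new = old + share
            r[i] = new
            thresh = eps * len(graph[i])
            if old <= thresh < new:     # i just crossed its threshold
                queue.append(i)
    return x, r

x, r = pagerank_push(G, 0)
```

Each push changes the residual sum by `-(1 - alpha) * push`, so `sum(x) + sum(r) / (1 - alpha)` stays equal to 1 throughout, which gives a convenient sanity check. Only nodes that receive residual are ever stored, so the work depends on how far the diffusion spreads rather than on the size of the graph.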
  • 33. Push is fast! 1. For the PageRank diffusion, push gives constant work (entry-wise): Andersen, Chung, Lang, FOCS 2006. 2. For the Katz diffusion, push works empirically fast: Bonchi, Gleich, et al., 2012, Internet Math. 3. For the exponential, push gives uniform localization on power-law graphs and fast runtimes: Gleich and Kloster, 2014, Internet Math. 4. For the heat-kernel diffusion, push gives constant work (entry-wise): Kloster and Gleich, 2014, KDD. 5. For the PageRank diffusion, push yields sparsity regularization: Gleich and Mahoney, ICML 2014. 6. For a general class of diffusions, there is a Cheeger inequality like before: Ghosh, Teng, et al., KDD 2014. 7. For the PageRank diffusion, push gives the solution path in constant work (entry-wise): Kloster and Gleich, arXiv:1503.00322. Equations shown: $x = \exp(tP) e_i$; $x = \exp(P) e_i$; PageRank, $(I - \alpha P) x = (1 - \alpha) e_i$; Katz, $(I - \alpha A) x = (1 - \alpha) e_i$.
  • 34. Push is useful! 1. Push implicitly regularizes semi-supervised learning (Gleich and Mahoney, submitted). 2. Push gives state-of-the-art results for overlapping community detection (Whang, Gleich, Dhillon, CIKM 2013; Whang, Gleich, Dhillon, in prep.). 3. Push for overlapping clusters decreases communication in parallel solutions (Andersen, Gleich, Mirrokni, WSDM 2012). (Inset slide by Joyce Jiyoung Whang, The University of Texas at Austin, Conference on Information and Knowledge Management: seeding phase, seed set expansion phase, propagation phase. Inset figure: “Figure 3: F1 and F2 measures” on DBLP for demon, bigclam, graclus centers, spread hubs, random, and egonet, plus a run-time panel.)
  • 35. Heat Kernel Based Community Detection, KDD 2014, Kyle Kloster and David F. Gleich, Purdue University. $f = \exp\{tP\} s = \sum_{k=0}^{\infty} \frac{t^k}{k!} P^k s$. Convert to a linear system, and solve in constant time. Results, HK (“us”) versus PPR (“prev. best”): on this set, F1 0.87 vs. 0.34 and set size 14 vs. 67; on Amazon (average), F1 0.33 vs. 0.14 and set size 192 vs. 15293.
  • 36. Heat kernel localization. General recipe: 1. Take problem X and convert it into a linear system. 2. Apply “push” to that linear system. 3. Analyze and bound the total work. Heat kernel recipe: 1. Convert $x = \exp(tP) e_i$ into $\begin{bmatrix} I & & & & \\ -tP/1 & I & & & \\ & -tP/2 & \ddots & & \\ & & \ddots & I & \\ & & & -tP/N & I \end{bmatrix} \begin{bmatrix} v_0 \\ v_1 \\ \vdots \\ \vdots \\ v_N \end{bmatrix} = \begin{bmatrix} e_i \\ 0 \\ \vdots \\ \vdots \\ 0 \end{bmatrix}$. 2. Apply “push”. 3. Analyze the work bound.
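Reading the block system one block row at a time shows why it encodes a truncated Taylor series for the matrix exponential. The derivation below is a sketch that assumes the sub-diagonal blocks are $-tP/k$ (the extracted slide text does not show the signs) and that the approximation is recovered by summing the blocks $v_k$:

```latex
\begin{align*}
v_0 &= e_i, \\
-\tfrac{t}{k} P\, v_{k-1} + v_k &= 0 \quad (k = 1, \dots, N)
  \;\Longrightarrow\; v_k = \tfrac{t}{k} P\, v_{k-1} = \tfrac{t^k}{k!} P^k e_i, \\
\sum_{k=0}^{N} v_k &= \sum_{k=0}^{N} \tfrac{t^k}{k!} P^k e_i \;\approx\; \exp(tP)\, e_i .
\end{align*}
```

Solving the system approximately with push therefore yields an approximation of the degree-N Taylor polynomial applied to $e_i$, with an error that combines the truncation at N and the push tolerance.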
  • 37. There is a fast deterministic adaptation of the push method (Kloster & Gleich, KDD2014; MMDS 2014). Theorem: let $h = e^{-t} \exp\{tP\} s$ and let x be the output of hk-push(ε). Then $\|D^{-1}(x - h)\|_\infty \le \varepsilon$ after looking at $2 N e^{t} / \varepsilon$ edges. We believe that the bound $N \le 2t \log(1/\varepsilon)$ suffices. Figure 2: Pseudo-code for our algorithm as working Python code:
      import collections
      import math

      def hk_relax(G, seed, t, eps, N, psis):
          # G is the graph as a dictionary-of-sets,
          # seed is an array of seeds,
          # t, eps, N, psis are precomputed
          x = {}  # store x, r as dictionaries
          r = {}  # initialize residual
          Q = collections.deque()  # initialize queue
          for s in seed:
              r[(s, 0)] = 1. / len(seed)
              Q.append((s, 0))
          while len(Q) > 0:
              (v, j) = Q.popleft()  # v has r[(v,j)] ...
              rvj = r[(v, j)]
              # perform the hk-relax step
              if v not in x: x[v] = 0.
              x[v] += rvj
              r[(v, j)] = 0.
              mass = (t * rvj / (float(j) + 1.)) / len(G[v])
              for u in G[v]:  # for neighbors of v
                  nxt = (u, j + 1)  # in the next block
                  if j + 1 == N:  # last step, add to soln
                      # G[v], not G(v); u may not be in x yet
                      x[u] = x.get(u, 0.) + rvj / len(G[v])
                      continue
                  if nxt not in r: r[nxt] = 0.
                  thresh = math.exp(t) * eps * len(G[u])
                  thresh = thresh / (N * psis[j + 1]) / 2.
                  if r[nxt] < thresh and r[nxt] + mass >= thresh:
                      Q.append(nxt)  # add u to queue
                  r[nxt] = r[nxt] + mass
          return x
  • 38. Analysis, three pages to one slide (Kloster & Gleich, KDD2014). 1. State the approximation error that results from approximating using the linear system: a “standard” matrix-approximation result. 2. Bound the work involved in doing push. Iterate y ≥ 0, residual r ≥ 0. Each step moves “mass” from r to y and keeps the non-negative and increasing property. Each step moves at least “deg(i)·ε” mass in deg(i) work. So in T steps, we “push” Sum [deg(i)·ε, i in each step]. But we can only push “so much”, so we can bound this from above, $\sum_{i \in \text{steps}} \varepsilon \deg(i) \le e^{t}$, and invert to get a total work bound.
  • 39. Runtime: hk vs. ppr. (Figure: runtime in seconds, 0 to 2, against log10(|V|+|E|), 5 to 9, with 25%, 50%, and 75% curves for hkgrow and for pprgrow.)
  • 40. PageRank solution paths. Compute one diffusion, and all sweep-cuts, for all values of epsilon (Kloster & Gleich, arXiv:1503.00322). (Figure: degree-normalized PageRank against 1/ε on log-log axes, annotated with φ = 0.005, 0.010, 0.060, 0.111, 0.268.)
  • 41. PageRank solution paths. These take about a second to compute with our “new” push-based algorithm on graphs with millions of nodes and edges. Related to the LARS method for 1-norm regularized problems.
  • 42. Use “centers” of graph partitions to seed for overlapping communities. Flickr social network: 2M vertices, 22M edges. We can cover 95% of the network with communities of cond. ~0.15. (Figure: maximum conductance against coverage (percentage) for (a) AstroPh and (d) Flickr; methods: egonet, graclus centers, spread hubs, random, bigclam.)
  • 43. References and ongoing work. Gleich and Kloster, Relaxation methods for the matrix exponential, J. Internet Math. Kloster and Gleich, Heat kernel based community detection, KDD 2014. Gleich and Mahoney, Algorithmic Anti-differentiation, ICML 2014. Gleich and Mahoney, Regularized diffusions, submitted. Whang, Gleich, Dhillon, Seeds for overlapping communities, CIKM 2013. Code: www.cs.purdue.edu/homes/dgleich/codes/nexpokit and www.cs.purdue.edu/homes/dgleich/codes/l1pagerank. Ongoing work: improved localization bounds for functions of matrices; asynchronous and parallel “push”-style methods; localized methods beyond conductance. Supported by NSF CAREER 1149756-CCF. www.cs.purdue.edu/homes/dgleich