My talk from KDD2012 about vertex neighborhoods and low conductance cuts. See the paper here: http://arxiv.org/abs/1112.0031 and http://dl.acm.org/citation.cfm?id=2339628
Vertex neighborhoods, low conductance cuts, and good seeds for local community methods
1. Vertex Neighborhoods, Low Conductance Cuts, and Good Seeds for Local Community Methods
DAVID F. GLEICH
PURDUE
C. SESHADHRI
SANDIA - LIVERMORE
KDD2012
David Gleich · Purdue
5. Neighborhoods are good communities
There exists a vertex neighborhood with conductance ≤ (4-4𝜅)/(3-2𝜅),
where 𝜅 is the clustering coefficient
and the graph has a heavy-tailed degree distribution.
7. A vertex neighborhood is a
“good” conductance community
in a graph with a heavy-tailed
degree distribution and large
clustering coefficient.
8. Our contributions
1. The previous theorem and its proof. This
shows that good communities are expected
and easy to find in modern networks with
heavy-tailed degrees and large clustering.
2. An empirical evaluation of neighborhood
communities that shows vertex
neighborhoods are the “backbone” of the
network community profile.
9. Formal background for the theorem
1. Vertex neighborhoods
2. Low conductance cuts
3. Clustering coefficients
10. Vertex neighborhoods
The set of a vertex and all its neighbors.
Also called an “egonet”.
Prior research on egonets of social networks from the “structural holes” perspective [Burt95,Kleinberg08].
Used for anomaly detection [Akoglu10], community seeds [Huang11,Schaeffer11], and overlapping communities [Schaeffer07,Rees10].
11. Conductance communities
Conductance is one of the most
important community scores [Schaeffer07]
The conductance of a set of vertices is
the ratio of edges leaving to total edges:
Equivalently, it’s the probability that a
random edge leaves the set.
Small conductance ⇔ Good community
φ(S) = cut(S) / min(vol(S), vol(S̄))
where cut(S) is the number of edges leaving the set and vol(S) is the total edges in the set.
Example: cut(S) = 7, vol(S) = 33, vol(S̄) = 11, so φ(S) = 7/11.
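The definition above can be sketched in a few lines of Python. This is a minimal illustration, assuming an undirected graph stored as a dictionary of neighbor sets and taking vol(S) to be the sum of the degrees in S; the function name `conductance` and the toy graph are hypothetical and do not come from the talk.

```python
def conductance(adj, S):
    """Conductance of vertex set S; adj maps each vertex to a set of neighbors."""
    S = set(S)
    # cut(S): edges with exactly one endpoint in S
    cut = sum(1 for u in S for v in adj[u] if v not in S)
    # vol(S): sum of degrees inside S; vol of the complement is the remainder
    vol_S = sum(len(adj[u]) for u in S)
    vol_rest = sum(len(nbrs) for nbrs in adj.values()) - vol_S
    denom = min(vol_S, vol_rest)
    return cut / denom if denom > 0 else float("inf")


# Toy graph (an assumption for illustration): two triangles joined by one edge.
adj = {
    0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3},
    3: {2, 4, 5}, 4: {3, 5}, 5: {3, 4},
}
phi = conductance(adj, {0, 1, 2})  # cut = 1, vol = 7 on both sides
```

For the left triangle, one edge leaves the set and both sides have volume 7, so the sketch returns 1/7.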
12. Clustering coefficients
Global clustering coefficient = (number of closed wedges) / (number of wedges)
Probability that a random wedge is closed.
[Figure labels: wedge, center of wedge, closed wedge]
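The wedge-counting definition of the global clustering coefficient can be checked directly on a small graph. The sketch below assumes an undirected graph stored as a dictionary of neighbor sets; the name `global_clustering` and the toy graph are illustrative assumptions, and the brute-force enumeration is meant for small examples only.

```python
from itertools import combinations


def global_clustering(adj):
    """Fraction of wedges (paths u - v - w centered at v) that are closed."""
    wedges = 0
    closed = 0
    for v, nbrs in adj.items():
        # every unordered pair of neighbors of v forms a wedge centered at v
        for a, b in combinations(nbrs, 2):
            wedges += 1
            if b in adj[a]:  # the wedge is closed when its endpoints are adjacent
                closed += 1
    return closed / wedges if wedges else 0.0


# Toy graph (an assumption): a triangle 0-1-2 with a pendant vertex 3 attached to 2.
adj = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2}}
kappa = global_clustering(adj)  # 3 closed wedges out of 5
```

Here vertices 0 and 1 each center one closed wedge, and vertex 2 centers three wedges of which one is closed, giving 3/5.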
13. Simple version of theorem
If global clustering coefficient = 1, then the graph is a disjoint union of cliques.
Vertex neighborhoods are optimal communities!
14. Theorem
Condition: Let graph G have clustering coefficient 𝜅 and have vertex degrees bounded by a power-law function with exponent 𝛾 less than 3.
Theorem: Then there exists a vertex neighborhood with conductance ≤ 4(1 − 𝜅)/(3 − 2𝜅).
[Figure: log probability versus log degree, with the degree distribution bounded between α₁n/d^𝛾 and α₂n/d^𝛾]
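To get a feel for the bound in the theorem, it helps to evaluate it at a few values of 𝜅. The helper below is a hypothetical illustration (the name `neighborhood_conductance_bound` is not from the talk); it only evaluates the expression 4(1 − 𝜅)/(3 − 2𝜅) and says nothing about whether a given graph meets the theorem's degree condition.

```python
def neighborhood_conductance_bound(kappa):
    """Evaluate 4(1 - kappa) / (3 - 2 kappa) for a clustering coefficient in [0, 1]."""
    if not 0.0 <= kappa <= 1.0:
        raise ValueError("clustering coefficient must lie in [0, 1]")
    return 4 * (1 - kappa) / (3 - 2 * kappa)


# Evaluate the bound at a few illustrative values of kappa.
bounds = {k: neighborhood_conductance_bound(k) for k in (0.5, 0.75, 1.0)}
```

Since conductance never exceeds 1, the expression is informative only when 4 − 4𝜅 < 3 − 2𝜅, that is, when 𝜅 > 1/2. At 𝜅 = 1 it gives 0, which matches the disjoint-cliques case.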
15. Proof Sketch
1) Large clustering coefficient ⇒ many wedges are closed
2) Heavy-tailed degree distribution ⇒ a few vertices have a very large degree
3) Large degree ⇒ O(d²) wedges ⇒ “most” of the wedges
Thus, there must exist a vertex with a high edge density ⇒ “good” conductance
Use the probabilistic method to formalize
[Figure: CDF of the number of wedges versus degree (log-scale degree axis)]
17. We view this theory as “intuition for the truth”
18. Empirical Evaluation using
Network Community Profiles
[Figure: network community profile (panel label: fb-A-oneyear). x-axis: community size, with “max deg” marked; y-axis: minimum conductance for any community of the given size.]
Canonical shape found by Leskovec, Lang, Dasgupta, and Mahoney.
Holds for a variety of approximations to conductance.
19. Empirical Evaluation using
Network Community Profiles
[Figure: egonet community profile (panel label: fb-A-oneyear; 1.1M verts, 4M edges; Facebook data from Wilson et al. 2009). x-axis: community size (degree + 1), with “max deg” marked; y-axis: minimum conductance for any neighborhood community of the given size.]
The “egonet community profile” shows the same shape and takes 3 secs to compute.
The Fiedler community computed from the normalized Laplacian is a neighborhood!
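An egonet community profile of this kind records, for each size (degree + 1), the smallest conductance among the vertex neighborhoods of that size. The following is a minimal sketch under stated assumptions: the graph is an undirected dictionary of neighbor sets, the name `egonet_profile` is hypothetical, and the conductance of each neighborhood is computed naively rather than with the fast method used for the timings in the talk.

```python
def egonet_profile(adj):
    """Map size (degree + 1) to the minimum conductance over egonets of that size."""
    total_vol = sum(len(nbrs) for nbrs in adj.values())
    profile = {}
    for v, nbrs in adj.items():
        ego = nbrs | {v}  # the vertex together with all of its neighbors
        vol = sum(len(adj[u]) for u in ego)
        cut = sum(1 for u in ego for w in adj[u] if w not in ego)
        denom = min(vol, total_vol - vol)
        if denom == 0:
            continue  # egonet covers the whole graph or is an isolated vertex
        size = len(ego)
        profile[size] = min(cut / denom, profile.get(size, float("inf")))
    return profile


# Toy graph (an assumption): two triangles joined by one edge.
adj = {
    0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3},
    3: {2, 4, 5}, 4: {3, 5}, 5: {3, 4},
}
profile = egonet_profile(adj)  # sizes 3 and 4 appear in this toy graph
```

On the toy graph, the four degree-2 vertices give neighborhoods of size 3 with conductance 1/7, and the two bridge endpoints give neighborhoods of size 4 with conductance 1/2.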
20. Not just one graph
[Figure: neighborhood community profiles for several graphs (panel labels: arxiv, ca-AstroPh, soc-LiveJournal1). x-axis: number of vertices in cluster, with “max deg” and “verts/2” marked; y-axis: conductance.]
arXiv – 86k verts, 500k edges
soc-LiveJournal – 5M verts, 42M edges
15 more graphs available
www.cs.purdue.edu/~dgleich/codes/neighborhoods
21. Filling in the Network Community Profile
[Figure: egonet community profile (panel label: fb-A-oneyear). x-axis: community size (degree + 1); y-axis: minimum conductance for any neighborhood community of the given size.]
We are missing a region of the NCP when we just look at neighborhoods.
22. Personalized PageRank
Communities [Andersen06]
To find the canonical NCP structure, Leskovec et
al. used a personalized PageRank based
community finder.
These start with a single vertex seed, and then
expand the community based on the solution of a
personalized PageRank problem.
The resulting community satisfies a local Cheeger
inequality.
This needs to run thousands of times for an NCP
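The seed-and-expand procedure described above can be sketched with a push-style approximation of personalized PageRank followed by a sweep cut. This is an illustrative sketch, not the implementation used for the talk: the names `ppr_push` and `sweep_cut`, the convention that `alpha` is the probability of following an edge, and the tolerance `tol` are all assumptions, and the seed is assumed to have at least one neighbor.

```python
from collections import deque


def ppr_push(adj, seed, alpha=0.99, tol=1e-4):
    """Approximate personalized PageRank from one seed by local push steps."""
    x, r = {}, {seed: 1.0}  # x: solution estimate, r: residual mass
    queue = deque([seed])
    while queue:
        v = queue.popleft()
        rv = r[v]
        x[v] = x.get(v, 0.0) + (1 - alpha) * rv  # keep the teleport share
        r[v] = 0.0
        share = alpha * rv / len(adj[v])  # spread the rest over the neighbors
        for u in adj[v]:
            old = r.get(u, 0.0)
            r[u] = old + share
            # enqueue u only when its residual crosses the threshold
            if old < tol * len(adj[u]) <= r[u]:
                queue.append(u)
    return x


def sweep_cut(adj, x):
    """Best-conductance prefix when vertices are ordered by x[v] / degree(v)."""
    total_vol = sum(len(nbrs) for nbrs in adj.values())
    order = sorted(x, key=lambda v: x[v] / len(adj[v]), reverse=True)
    inside, vol, cut = set(), 0, 0
    best_phi, best_size = float("inf"), 0
    for i, v in enumerate(order, start=1):
        inside_nbrs = sum(1 for u in adj[v] if u in inside)
        inside.add(v)
        vol += len(adj[v])
        cut += len(adj[v]) - 2 * inside_nbrs  # new cut edges minus edges now internal
        denom = min(vol, total_vol - vol)
        if denom > 0 and cut / denom < best_phi:
            best_phi, best_size = cut / denom, i
    return set(order[:best_size]), best_phi


# Toy graph (an assumption): two triangles joined by one edge, seeded at vertex 0.
adj = {
    0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3},
    3: {2, 4, 5}, 4: {3, 5}, 5: {3, 4},
}
community, phi = sweep_cut(adj, ppr_push(adj, seed=0))
```

On the toy graph the sweep recovers the seed's own triangle. Building a full NCP this way requires repeating the computation from many seeds, which is where the cost comes from.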
23. Filling in the Network Community Profile
[Figure: network community profile (panel label: fb-A-oneyear). x-axis: community size; y-axis: minimum conductance for any community of the given size.]
7807 seconds
This region fills when using the PPR method (like now!)
24. Vertex Neighborhoods, Low Conductance Cuts, and Good Seeds for Local Community Methods
25. Am I a good seed? Locally Minimal Communities
“My conductance is the best locally.”
φ(N(v)) ≤ φ(N(w)) for all w adjacent to v
In Zachary’s Karate Club
network, there are four locally
minimal communities, the two
leaders and two peripheral
nodes.
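The locally minimal condition can be tested vertex by vertex once every neighborhood's conductance is known. The sketch below assumes an undirected graph stored as a dictionary of neighbor sets; `egonet_conductance` and `locally_minimal_seeds` are hypothetical names, and the example uses a small toy graph rather than the karate club network.

```python
def egonet_conductance(adj, v, total_vol):
    """Conductance of the neighborhood of v (v together with its neighbors)."""
    ego = adj[v] | {v}
    vol = sum(len(adj[u]) for u in ego)
    cut = sum(1 for u in ego for w in adj[u] if w not in ego)
    denom = min(vol, total_vol - vol)
    return cut / denom if denom > 0 else float("inf")


def locally_minimal_seeds(adj):
    """Vertices whose neighborhood conductance is no worse than any neighbor's."""
    total_vol = sum(len(nbrs) for nbrs in adj.values())
    phi = {v: egonet_conductance(adj, v, total_vol) for v in adj}
    return [v for v in adj if all(phi[v] <= phi[w] for w in adj[v])]


# Toy graph (an assumption): two triangles joined by one edge.
adj = {
    0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3},
    3: {2, 4, 5}, 4: {3, 5}, 5: {3, 4},
}
seeds = locally_minimal_seeds(adj)
```

In the toy graph the two bridge endpoints have neighborhood conductance 1/2 while every other vertex has 1/7, so only the four non-bridge vertices qualify as seeds.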
26. Locally minimal communities capture extremal neighborhoods
[Figure: egonet community profile (panel label: fb-A-oneyear) with locally minimal communities marked. x-axis: community size; y-axis: conductance.]
Red dots are conductance and size of a locally minimal community.
Usually about 1% of # of vertices.
The red circles (the best local mins) find the extremes in the egonet profile.
27. Filling in the NCP: Growing locally minimal communities
[Figure: community profiles (panel label: fb-A-oneyear). Legend: Full NCP, Locally min NCP, Original Egonet. x-axis: community size; y-axis: conductance.]
Growing only locally minimal communities: 283 seconds vs. 7807 seconds
29. Recap
A theorem relating clustering, heavy-tailed degrees, and low-conductance cuts of vertex neighborhoods.
Empirical evaluation of vertex neighborhoods.
More on k-cores in the paper.
⇒ Many communities are easy to find!
⇒ Explains success of community detection?
Acknowledgements
David supported by NSF CAREER
award 1149756-CCF.
Sesh supported by the Sandia
LDRD program (project 158477) and
the applied mathematics program at
the Dept. of Energy.
Code and results available online
www.cs.purdue.edu/~dgleich/codes/neighborhoods
30. Two words on computing
Can be done by just counting the triangles at each node.
Linear complexity in |E| in a power-law graph.
It’s possible to do this in
MapReduce too.
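One way to see why triangle counts suffice: the neighborhood of v contains deg(v) edges from v to its neighbors plus one edge per triangle through v, so cut = vol − 2 · (deg(v) + triangles(v)). The sketch below applies that identity. It assumes an undirected graph stored as a dictionary of neighbor sets, the function name is hypothetical, and the set-intersection triangle count is a simple illustration rather than the tuned routine behind the timings in the talk.

```python
def egonet_conductances_by_triangles(adj):
    """Conductance of every vertex neighborhood from degrees and triangle counts."""
    deg = {v: len(nbrs) for v, nbrs in adj.items()}
    total_vol = sum(deg.values())
    phi = {}
    for v, nbrs in adj.items():
        # each triangle through v is seen once from each of its two other corners
        triangles = sum(len(adj[u] & nbrs) for u in nbrs) // 2
        vol = deg[v] + sum(deg[u] for u in nbrs)
        internal_edges = deg[v] + triangles
        cut = vol - 2 * internal_edges  # every internal edge uses two degree units
        denom = min(vol, total_vol - vol)
        phi[v] = cut / denom if denom > 0 else float("inf")
    return phi


# Toy graph (an assumption): two triangles joined by one edge.
adj = {
    0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3},
    3: {2, 4, 5}, 4: {3, 5}, 5: {3, 4},
}
phi = egonet_conductances_by_triangles(adj)
```

For vertex 2 in the toy graph: degree 3, one triangle, volume 10, so the cut is 10 − 2 · 4 = 2 and the conductance is 2/4.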
32. Clustering coefficients
Global clustering coefficient = (number of closed wedges) / (number of wedges)
Local clustering coefficient: Cv = (number of closed wedges centered at v) / (number of wedges centered at v)
Probability that a random wedge is closed.
[Figure labels: wedge, center of wedge, closed wedge]