New perspectives on measuring network clustering
1. MINING AND MODELING
NETWORK DATA
SIAM DM'18 1
SIAM DM 2018
June 7, 2018
Denver, Colorado, USA
Act 1.
09:30–09:55 New Perspectives on Measuring Network Clustering
10:00–10:25 Hypergraph Kronecker Models for Networks
10:30–10:55 Mitigating Overexposure in Viral Marketing
11:00–11:25 Modeling and Mining Dynamic Competition Networks
Act 2.
2:45–3:10 Graph Matching Via Low Rank Factors
3:15–3:40 Tuning the Activity of Neural Networks at Criticality
3:45–4:10 Detectability of Hierarchical Community Structure in Preprocessed Multilayer Networks
4:15–4:40 Evaluating Overfit and Underfit in Models of Network Community Structure
Davis and Leinhardt. The structure of positive interpersonal
relations in small groups. Sociological Theories in Progress,
1971.
2. New perspectives on
measuring network
clustering
Austin R. Benson · Cornell
SIAM DM'18Benson 2
Joint work with
Hao Yin · Stanford
Jure Leskovec · Stanford
Johan Ugander · Stanford
slides ⟶ bit.ly/arb-DM-18 code ⟶ github.com/arbenson/HigherOrderClustering.jl
3. Many networks are globally sparse but locally
dense.
[Figures: coauthorship network; brain network (Sporns and Bullmore, Nature Rev. Neuro., 2012)]
Networks for real-world systems have modules, clusters, communities.
[Watts-Strogatz 98; Flake 00; Newman 04, 06; many others…]
4. The clustering coefficient is a fundamental
measure in network science about how much a
network clusters.
C(u) = fraction of length-2 paths centered at node u
that form a triangle.
Average clustering coefficient C = mean of C(u).
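The two definitions above can be sketched in a few lines of Python. This is a minimal illustration, not the talk's code: the toy graph, the function names, and the convention that a node with fewer than two neighbors gets C(u) = 0 are all assumptions.

```python
from itertools import combinations


def local_clustering(adj, u):
    """Fraction of length-2 paths centered at u whose endpoints are adjacent."""
    nbrs = adj[u]
    wedges = len(nbrs) * (len(nbrs) - 1) // 2
    if wedges == 0:
        return 0.0  # assumed convention for nodes with degree < 2
    closed = sum(1 for v, w in combinations(nbrs, 2) if w in adj[v])
    return closed / wedges


def average_clustering(adj):
    """Mean of C(u) over all nodes."""
    return sum(local_clustering(adj, u) for u in adj) / len(adj)


# Hypothetical toy graph: triangle a-b-c plus a pendant edge c-d.
adj = {
    "a": {"b", "c"},
    "b": {"a", "c"},
    "c": {"a", "b", "d"},
    "d": {"c"},
}

for u in sorted(adj):
    print(u, local_clustering(adj, u))
print("average", average_clustering(adj))
```

Node c sits in one triangle but has three neighbor pairs, so C(c) = 1/3, while a and b each have C = 1.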
• Data insights. Average clustering coefficient is larger than we would expect.
[Watts-Strogatz 98] > 36k citations!
• Domain phenomenon. Triadic closure in sociology.
[Simmel 1908; Rapoport 53; Granovetter 73]
• Statistical feature. Role discovery, anomaly detection, mental health study.
[Henderson+ 12; La Fond+ 14, 16; Bearman-Moody 04]
• Modeling tool. Key property for generative models.
[Newman 09; Seshadhri-Kolda-Pinar 12; Roble+ 16]
5. This talk introduces two new classes of network
clustering measures with an eye towards data
mining.
1. Higher-order clustering coefficients.
The clustering coefficient measures the closure
probability of just one simple structure—the triangle.
We will show that triangles are sufficient to explain clustering
in only some networks. We need larger cliques.
There is evidence in the literature that this should be true…
• 4-cliques reveal community structure in word association and PPI networks [Palla+ 05]
• 4-/5-cliques (+ other structure) identify network type & dimension [Yaveroğlu+ 14, Bonato+ 14]
• 4-node motifs identify community structure in neural systems [Benson-Gleich-Leskovec 16]
6. This talk introduces two new classes of network
clustering measures with an eye towards data
mining.
2. Closure coefficients.
Why do we measure clustering from the center node?
We will show that
• large closure coefficients theoretically imply local community structure
• closure coefficients with directed edges expose hierarchy
The well-known proverb
a friend of my friend is my friend
suggests a different way of measuring clustering.
7. Part I. Higher-order clustering coefficients
Yin, Benson, and Leskovec.
Higher-order clustering in networks.
Physical Review E, 2018.
github.com/arbenson/HigherOrderClustering.jl
8. We view clustering as a clique expansion process.
1. Find a 2-clique 2. Attach adjacent edge 3. Check for (2+1)-clique
C2 = avg. fraction of (2-clique, adjacent edge) pairs that induce a (2+1)-clique.
Increase clique size by 1 to get a higher-order clustering coefficient!
1. Find a 3-clique 2. Attach adjacent edge 3. Check for (3+1)-clique
C3 = avg. fraction of (3-clique, adjacent edge) pairs that induce a (3+1)-clique.
1. Find a 4-clique 2. Attach adjacent edge 3. Check for (4+1)-clique
C4 = avg. fraction of (4-clique, adjacent edge) pairs that induce a (4+1)-clique.
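The clique expansion process can be sketched by brute-force enumeration. The code below is an illustration under stated assumptions, not the HigherOrderClustering.jl implementation: it takes the local coefficient at node u to range over cliques that contain u and adjacent edges that attach at u, and it returns 0 when no such pair exists. The toy graph and all function names are hypothetical.

```python
from itertools import combinations


def build_adj(edges):
    """Undirected adjacency sets from an edge list."""
    adj = {}
    for a, b in edges:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    return adj


def is_clique(adj, nodes):
    return all(b in adj[a] for a, b in combinations(nodes, 2))


def local_higher_order_clustering(adj, u, ell):
    """Fraction of (ell-clique containing u, adjacent edge at u) pairs
    that induce an (ell+1)-clique. ell = 2 gives the classical C(u)."""
    total = closed = 0
    nbrs = sorted(adj[u])
    for others in combinations(nbrs, ell - 1):
        if not is_clique(adj, (u,) + others):
            continue  # step 1: need an ell-clique containing u
        for v in nbrs:
            if v in others:
                continue
            total += 1  # step 2: adjacent edge (u, v)
            if all(v in adj[x] for x in others):
                closed += 1  # step 3: clique plus v is an (ell+1)-clique
    return closed / total if total else 0.0


# Hypothetical toy graph: 4-clique {0, 1, 2, 3} plus node 4 attached to 0 and 1.
edges = [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3), (0, 4), (1, 4)]
adj = build_adj(edges)
c2 = local_higher_order_clustering(adj, 0, 2)
c3 = local_higher_order_clustering(adj, 0, 3)
print(c2, c3)
```

Here node 0 has C2 = 2/3 but C3 = 3/8: the triangle {0, 1, 4} never expands into a 4-clique. Enumeration is exponential in the clique size, so this only suits small graphs.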
9. We can think of higher-order closure processes in
everyday life.
1. Start with a group of 3 friends (Alice, Bob, Charlie).
2. One person in the group befriends someone new (Dave).
3. The group might increase in size.
Image credits: rollingstone.com, oprah.com
10. Higher-order clustering coefficients offer
several advantages.
Theory & analysis.
• Small-world and Gn,p random graph models.
• Extremal combinatorics for general graphs.
Data insights.
• old idea ⟶ pretty much all real-world networks exhibit clustering.
• new idea ⟶ real-world networks may only cluster up to a certain order.
11. Background.
Local, average, and global clustering coefficients.
• Second-order (classical) local clustering coefficient at node u.
• Second-order (classical) global clustering coefficient.
• Second-order (classical) average clustering coefficient.
12. Higher-order (third-order)
local, average, and global clustering coefficients.
• Third-order local clustering coefficient at node u.
• Third-order global clustering coefficient.
• Third-order average clustering coefficient.
13. We can analyze higher-order clustering with
small-world models.
Theorem [Watts-Strogatz 98]
• Start with n nodes on a ring, each connected to its 2k nearest neighbors,
and then rewire each edge with probability p.
Example shown: n = 16, k = 3, p = 0.
[Yin-Benson-Leskovec 18]
[Watts-Strogatz 98]
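The construction above can be sketched as follows. This is an illustration only: the rewiring routine is a simplified variant assumed for brevity (not necessarily the exact Watts-Strogatz schedule), and the function names and seed are hypothetical.

```python
import random
from itertools import combinations


def ring_lattice(n, k):
    """n nodes on a ring, each joined to its k nearest neighbors on each side (2k total)."""
    adj = {u: set() for u in range(n)}
    for u in range(n):
        for step in range(1, k + 1):
            v = (u + step) % n
            adj[u].add(v)
            adj[v].add(u)
    return adj


def rewire(adj, p, seed=0):
    """Simplified rewiring (assumption): each original edge (u, v) with u < v is
    replaced, with probability p, by an edge from u to a random non-neighbor."""
    rng = random.Random(seed)
    new = {u: set(vs) for u, vs in adj.items()}
    nodes = sorted(adj)
    for u in nodes:
        for v in sorted(adj[u]):
            if u < v and rng.random() < p:
                candidates = [w for w in nodes if w != u and w not in new[u]]
                if candidates:
                    w = rng.choice(candidates)
                    new[u].discard(v)
                    new[v].discard(u)
                    new[u].add(w)
                    new[w].add(u)
    return new


def average_clustering(adj):
    """Mean fraction of adjacent neighbor pairs; nodes with degree < 2 count as 0."""
    total = 0.0
    for u, nbrs in adj.items():
        pairs = len(nbrs) * (len(nbrs) - 1) // 2
        if pairs:
            closed = sum(1 for v, w in combinations(nbrs, 2) if w in adj[v])
            total += closed / pairs
    return total / len(adj)


lattice = ring_lattice(16, 3)  # the n = 16, k = 3, p = 0 example
rewired = rewire(lattice, p=0.5, seed=1)
c_lattice = average_clustering(lattice)
c_rewired = average_clustering(rewired)
print(c_lattice, c_rewired)
```

With p = 0, every node has 6 neighbors and 9 of its 15 neighbor pairs are adjacent, so the average clustering coefficient of the lattice is 0.6.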
14. We can also analyze higher-order clustering in
Gn,p.
Theorem [Yin-Benson-Leskovec 2017]
Everything scales exponentially in the order of the clustering coefficient...
Even if a node’s neighborhood is dense, i.e., C2(u) is large,
higher-order clustering still decays exponentially in Gn,p.
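A short back-of-the-envelope calculation, offered as intuition rather than as the theorem's statement, suggests where the exponential decay comes from. Fix an ℓ-clique K containing u and an adjacent edge (u, v) in Gn,p. The pair closes into an (ℓ+1)-clique exactly when v is joined to the other ℓ − 1 nodes of K, and each of those edges is present independently with probability p:

```latex
\Pr\bigl[\, K \cup \{v\} \text{ is an } (\ell+1)\text{-clique}
   \;\bigm|\; K \text{ is an } \ell\text{-clique},\ (u,v) \text{ is an edge} \,\bigr]
   = p^{\ell-1},
\qquad
\text{e.g. } p = 0.1:\quad
\ell = 2 \mapsto 0.1,\quad
\ell = 3 \mapsto 0.01,\quad
\ell = 4 \mapsto 0.001 .
```

Each increase in the order multiplies the closure probability by another factor of p.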
15. Extremal combinatorics show relationships
between clustering coefficients of different orders.
Theorem [Yin-Benson-Leskovec 18]
17. Global clustering patterns vary widely across
datasets.
                        C2    C3    C4
Neural connections      0.18  0.08  0.06   decreases with order
Facebook friendships    0.16  0.11  0.12   decreases and increases
Coauthorships           0.32  0.33  0.36   increases with order
Not obviously due to cliques in coauthorship!
High-degree nodes in co-authorships exhibit
clique + star structure where C3(u) > C2(u).
18. Average higher-order clustering also varies widely.
                                      C2    C3
Neural connections                    0.31  0.14
  Random configurations               0.15  0.04
  Random configurations (C2 fixed)    0.31  0.17
Facebook friendships                  0.25  0.18
  Random configurations               0.03  0.00
  Random configurations (C2 fixed)    0.25  0.14
Coauthorships                         0.68  0.61
  Random configurations               0.01  0.00
  Random configurations (C2 fixed)    0.68  0.60
Figure annotations: statistically significantly less clustering; statistically significantly more clustering; not significantly different clustering.
(using sampling tools from [Bollobás 1980; Milo+ 03; Park-Newman 04; Colomer de Simón+ 13])
19. Local higher-order clustering gives a more nuanced
view.
[Figure: three panels (Neural connections, Facebook friendships, Coauthorships) showing actual network data and random configurations with C2 fixed, with a Gn,p baseline and an upper bound. Annotations: dense but nearly random regions; dense and structured regions; hitting upper bound.]
20.
• old idea ⟶ pretty much all real-world networks exhibit clustering.
• new idea ⟶ networks may only cluster up to a certain order.
21. Part II. Closure coefficients.
Yin, Benson, and Leskovec.
The Local Closure Coefficient.
Submitted, 2018.
Yin, Ugander, and Benson.
Directed Closure Coefficients.
In preparation, 2018.
22. We typically measure clustering from the center of
a wedge, but we could just as well measure from
the head.
Clustering coefficient.
• A common friend provides a
friendship opportunity.
• C(u) = fraction of neighbor pairs
that are connected.
Closure coefficient.
• A friend of my friend provides a
friendship opportunity.
• H(u) = fraction of length-2 paths with u at
the head that close into a triangle.
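The contrast between the two measures can be sketched in code. This is a minimal illustration assuming that the closure coefficient H(u) is the fraction of length-2 paths u - v - w, with u at the head, whose far end w is adjacent to u. The toy graph, the function names, and the zero convention for nodes with no such paths are assumptions.

```python
from itertools import combinations


def clustering_coefficient(adj, u):
    """C(u): fraction of pairs of neighbors of u that are adjacent (u is the center)."""
    nbrs = adj[u]
    pairs = len(nbrs) * (len(nbrs) - 1) // 2
    if pairs == 0:
        return 0.0
    return sum(1 for v, w in combinations(nbrs, 2) if w in adj[v]) / pairs


def closure_coefficient(adj, u):
    """H(u): fraction of length-2 paths u - v - w (u is the head) with w adjacent to u."""
    total = closed = 0
    for v in adj[u]:
        for w in adj[v]:
            if w == u:
                continue
            total += 1
            if w in adj[u]:
                closed += 1
    return closed / total if total else 0.0


# Hypothetical toy graph: triangle a-b-c plus a pendant edge c-d.
adj = {
    "a": {"b", "c"},
    "b": {"a", "c"},
    "c": {"a", "b", "d"},
    "d": {"c"},
}

for u in sorted(adj):
    print(u, clustering_coefficient(adj, u), closure_coefficient(adj, u))
```

On this graph the two measures disagree: node c has C(c) = 1/3 but H(c) = 1, while node a has C(a) = 1 but H(a) = 2/3.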
23. There is no universal correlation between
clustering and closure.
24. Closure coefficients tend to increase with degree
while clustering coefficients tend to decrease with
degree.
Theorem [Yin-Benson-Leskovec]
25. Large closure coefficients imply existence of
communities.
Conductance is one of the most important cluster quality scores [Schaeffer 07]
conductance(S) = (edges leaving S) / (edge end points in S)
Theorem [Yin-Benson-Leskovec]
where N(u) is the 1-hop neighborhood
of u.
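Conductance as written on this slide can be sketched as below. This is an illustration under assumptions: it uses the slide's ratio directly (the common general definition divides by the smaller of the two sides' volumes, which coincides when S holds at most half of the edge end points), and it treats the 1-hop neighborhood as including u itself. The toy graph and function names are hypothetical.

```python
def build_adj(edges):
    """Undirected adjacency sets from an edge list."""
    adj = {}
    for a, b in edges:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    return adj


def conductance(adj, S):
    """(edges leaving S) / (edge end points in S), following the slide."""
    S = set(S)
    cut = sum(1 for u in S for v in adj[u] if v not in S)
    vol = sum(len(adj[u]) for u in S)
    return cut / vol if vol else 0.0


def one_hop_neighborhood(adj, u):
    """Assumption: the neighborhood includes u itself."""
    return {u} | adj[u]


# Hypothetical toy graph: two triangles joined by the bridge 2-3.
edges = [(0, 1), (0, 2), (1, 2), (2, 3), (3, 4), (3, 5), (4, 5)]
adj = build_adj(edges)
S = one_hop_neighborhood(adj, 0)
phi = conductance(adj, S)
print(sorted(S), phi)
```

The neighborhood of node 0 is the first triangle, which has one leaving edge and seven edge end points, so its conductance is 1/7.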
26. Directed closure coefficients offer additional
insights.
• There are 8 analogous clustering coefficients, too.
• We’ll see that closure coefficients are more useful features than
clustering coefficients for some supervised prediction problems.
27. Closure coefficients help detect social hierarchy.
Davis and Leinhardt. The structure of
positive interpersonal relations in small
groups. Sociological Theories in Progress,
1971.
• Corporate law advice network [Lazega 01]
• Nodes are lawyers; ~50/50 associates/partners
• Edges represent advice
i → j if lawyer i goes to lawyer j for advice
To whom did you go for basic professional advice? For instance, you
want to make sure that you are handling a case right, making a proper
decision, and you want to consult someone whose professional opinions
are in general of great value to you. By advice I do not mean simply
technical advice.
• We used closure coefficients in a supervised learning setting to
predict seniority (if a lawyer is a partner).
28. Closure coefficients help detect social hierarchy.
Degree. 79%
Clustering. 64%
Degree + Clustering. 79%
Closure. 87%
Degree + Closure. 87%
• Lasso regression (L1-regularized linear regression).
• Features are in/out degree, clustering coeffs., closure coeffs.
[Plot axis: regularization level]
29. Closure coefficients are good features for
classifying fish vs. non-fish in food webs.
Bascompte, Melián, and Sala. Interaction
strength combinations and the overfishing
of a marine food web. PNAS, 2005.
• Florida Bay food web [Ulanowicz+ 98]
• Nodes represent species
• Edges represent carbon exchange
i → j if j consumes i
• Same experiments as for lawyer network,
but for predicting if a node is a fish.
30. Closure coefficients help detect fish in food webs.
Degree. 63%
Clustering. 70%
Degree + Clustering. 74%
Closure. 88%
Degree + Closure. 88%
• Lasso regression (L1-regularized linear regression).
• Features are in/out degree, clustering coeffs., closure coeffs.
[Plot axis: regularization level]
31. We should keep various cluster measures in mind
when mining and modeling network data.
1. Higher-order clustering coefficients and closure coefficients offer
additional measures of network clustering.
→ We should plug these features into ML pipelines for network data.
2. Only using triangles gives a misleading notion of clustering.
Some networks do not even exhibit clustering w/r/t larger cliques!
→ Are there models that capture higher-order clustering statistics?
3. Measuring clustering from the center of a wedge is also misleading.
Measuring from the head actually connects clustering to communities!
→ Are there models that capture closure statistics?
32. New perspectives on measuring network clustering.
Thanks for your attention!
Austin R. Benson
http://cs.cornell.edu/~arb
@austinbenson
arb@cs.cornell.edu
Yin, Benson, and Leskovec. Higher-order clustering in networks. Physical Review E, 2018.
→ github.com/arbenson/HigherOrderClustering.jl
Yin, Benson, and Leskovec. The Local Closure Coefficient. Submitted, 2018.
Yin, Ugander, and Benson. Directed Closure Coefficients. In preparation, 2018.