Network Science
Communities
Section 2 Zachary’s Karate Club
W.W. Zachary, J. Anthropol. Res. 33:452-473 (1977).
A.-L. Barabási, Network Science: Communities.
Zachary's karate club is a social network of a university karate club,
described in the paper "An Information Flow Model for Conflict and
Fission in Small Groups" by Wayne W. Zachary. The network became a
popular example of community structure in networks after its use by
Michelle Girvan and Mark Newman in 2002.
A social network of a karate club was studied by Wayne
W. Zachary for a period of three years from 1970 to
1972.[2]
The network captures 34 members of a karate club,
documenting links between pairs of members who
interacted outside the club.
During the study a conflict arose between the
administrator "John A" and instructor "Mr. Hi"
(pseudonyms), which led to the split of the club into
two.
Half of the members formed a new club around Mr. Hi;
members from the other part found a new instructor or
gave up karate.
Based on the collected data, Zachary correctly assigned all
but one member of the club to the groups they actually
joined after the split.
Section 2 Zachary’s Karate Club
Citation history of the Zachary Karate Club paper
W.W. Zachary, J. Anthropol. Res. 33:452-473 (1977).
Section 2 Zachary Karate Club Club
The first scientist at any conference on networks
who uses Zachary's karate club as an example is
inducted into the Zachary Karate Club Club, and
awarded a prize.
Chris Moore (9 May 2013).
Mason Porter (NetSci, June 2013).
Yong-Yeol Ahn (Oxford University, July 2013).
Marián Boguñá (ECCS, September 2013).
Mark Newman (NetSci, June 2014).
http://networkkarate.tumblr.com/
Section 2 Auxiliary information
• Karate Club: breakup of the club
• Belgian Phone Data: language spoken
• Belgium appears to be the model bicultural society:
59% of its citizens are Flemish, speaking Dutch, and
40% are Walloons, who speak French.
• In 2007 Vincent Blondel and his students developed
an algorithm to identify the country’s community
structure, starting from the mobile call network.
Section 2 Biological Modules
E. Ravasz et al., Science 297 (2002).
Communities in Metabolic Networks
The E. coli metabolism offers a community structure of biological
systems [11].
a. The biological modules (communities) identified by the Ravasz
algorithm [11] (SECTION 9.3). The color of each node, capturing
the predominant biochemical class to which it belongs, indicates
that different functional classes are segregated in distinct
network neighborhoods. The highlighted region selects the nodes
that belong to the pyrimidine metabolism, one of the predicted
communities.
b. The topological overlap matrix of the E. coli metabolism and the
corresponding dendrogram that allows us to identify the modules
shown in (a). The color of the branches reflects the predominant
biochemical role of the participating molecules, like
carbohydrates (blue), nucleotide and nucleic acid metabolism
(red), and lipid metabolism (cyan).
c. The red right branch of the dendrogram tree shown in (b),
highlighting the region corresponding to the pyrimidine module.
d. The detailed metabolic reactions within the pyrimidine module.
The boxes around the reactions highlight the communities
predicted by the Ravasz algorithm.
Basics of communities
Section 3
What do we really mean by a community?
How many communities are in a network?
How many different ways can we partition a
network into communities?
Section 2 Communities
We focus on the mesoscopic scale of the network
Microscopic Mesoscopic Macroscopic
Section 2 Fundamental Hypothesis
H1: A network’s community structure is
uniquely encoded in its wiring diagram
According to the fundamental hypothesis there is a
ground truth about a network’s community
organization that can be uncovered by inspecting
the adjacency matrix Aij.
Section 3 Basics of Communities
H2: Connectedness Hypothesis
A community corresponds to a connected
subgraph.
H3: Density Hypothesis
Communities correspond to locally dense
neighborhoods of a network.
Section 3 Basics of Communities
Cliques as communities
A clique is a complete subgraph of k nodes.
R.D. Luce & A.D. Perry, Psychometrika 14 (1949)
Section 3 Basics of Communities
• Triangles are frequent; larger cliques
are rare.
• Communities do not necessarily
correspond to complete subgraphs, as
many of their nodes do not link directly
to each other.
• Finding the cliques of a network is
computationally rather demanding,
being a so-called NP-complete problem.
Cliques as communities
Section 3 Basics of Communities
Consider a connected subgraph C of Nc nodes.
Internal degree ki^int: the number of links of node i that connect
to other nodes of the same community C.
External degree ki^ext: the number of links of node i that connect
to the rest of the network.
If ki^ext = 0: all neighbors of i belong to C, and C is a good
community for i.
If ki^int = 0: all neighbors of i belong to other communities,
and i should be assigned to a different community.
Strong and weak communities
Section 3 Basics of Communities
Strong community:
Each node of C has more links within the community than with
the rest of the graph: ki^int(C) > ki^ext(C) for every node i of C.
Weak community:
The total internal degree of C exceeds its total external degree:
Σi∈C ki^int(C) > Σi∈C ki^ext(C).
Clique ⊂ Strong ⊂ Weak
Each clique is a strong community, and each strong community is a weak
community. The converse is generally not true.
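The two definitions can be checked directly from a neighbor-list representation of the graph. A minimal sketch in Python (the function name and graph layout are illustrative, not from the slides):

```python
def community_type(adj, C):
    """Classify a node set C as a strong, weak, or neither community.

    adj: dict mapping each node to a list of its neighbors.
    C:   iterable of nodes forming the candidate community.
    """
    C = set(C)
    k_int = {i: sum(1 for j in adj[i] if j in C) for i in C}  # internal degrees
    k_ext = {i: len(adj[i]) - k_int[i] for i in C}            # external degrees
    if all(k_int[i] > k_ext[i] for i in C):
        return "strong"   # every node has more links inside than outside
    if sum(k_int.values()) > sum(k_ext.values()):
        return "weak"     # only the totals satisfy the inequality
    return "neither"
```

For example, in two triangles joined by one bridge link, each triangle is a strong community: the bridge node still has internal degree 2 against external degree 1.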
Section 3 Number of Partitions
How many ways can we partition a network into 2 communities?
Graph bisection: divide a network into two equal non-overlapping subgraphs, such that the
number of links between the nodes in the two groups is minimized.
Two subgroups of size n1 and n2; total number of combinations: N! / (n1! n2!).
N = 10 → 256 partitions (1 ms)
N = 100 → ~10^26 partitions (10^21 years)
Section 3 Graph Partitions (history)
An integrated circuit can contain 2.5 billion transistors:
partition its full wiring diagram into smaller
subgraphs so as to minimize the
number of connections between them.
Graph Partitioning
Two-way partitioning problem:
• Each node has unit size
• Each edge has unit weight
Find two partitions V1 and V2 such that
• each of V1 and V2 has equal size, and
• the external wiring (the size of the cut set) is minimized.
[Figure: a 16-node graph with an st-numbering, s = 1, t = 16.]
st-numbering: number the nodes from 1 to N so that s receives 1, t receives N,
and every other node i has two neighbors j and k with j < i < k.
[Figure: a bipartition induced by the st-numbering; size of cutset = 4.]
[Figure: a different st-numbering of the same graph; size of cutset = 3.]
To find a bipartition with the minimum cutset, we would have to enumerate all
bipartitions, i.e., enumerate all st-numberings.
The two-way partitioning problem (equal-size partitions V1 and V2 with minimum
cut-set) is an NP-hard problem.
Heuristic techniques are used to approximate solutions.
Section 3 Graph Partitions (history)
Kernighan–Lin Algorithm for graph bisection
• Partition a network into two groups of
predefined size. The set of links between the two
groups is called the cut.
• Inspect each pair of nodes, one from each
group. Identify the pair that results in the largest
reduction of the cut size (links between the two
groups) if we swap them.
• Swap them.
• If no pair reduces the cut size, we swap the pair
that increases the cut size the least.
• The process is repeated until each node has been
moved once.
Fiduccia–Mattheyses (FM) Partitioning Algorithm
Kernighan-Lin (KL) Algorithm
Initial partition A, B with |A| = |B| = n.
Size of the cut set: T = Σa∈A Σb∈B C(a, b), where C(a, b) is the weight of the
edge between a and b.
We have to minimize the size of the cut set.
Initial partition A, B → optimal partition A*, B*.
Swap a subset Y ⊆ B with a subset X ⊆ A, |X| = |Y|, such that
A* = (A − X) ∪ Y and B* = (B − Y) ∪ X.
How to find X and Y?
Kernighan-Lin (KL) Algorithm
• Iterate as long as the cutsize improves:
• Find a pair of vertices that result in the largest decrease in
cutsize if exchanged
• Exchange the two vertices (potential move)
• “Lock” the vertices
• If no improvement possible, and
still some vertices unlocked, then
exchange vertices that result in smallest increase in cutsize
Kernighan-Lin (KL) Algorithm
• Initialize
• Bipartition G into V1 and V2, s.t. |V1| = |V2| ± 1
• n = |V|
• Repeat
• for i = 1 to n/2
• Find a pair of unlocked vertices va_i ∈ V1 and vb_i ∈ V2 whose
exchange makes the largest decrease or smallest increase
in cut-cost
• Mark va_i and vb_i as locked
• Store the gain g_i
• Find k s.t. Gain_k = Σ_{i=1..k} g_i is maximized
• If Gain_k > 0 then
move va_1, ..., va_k from V1 to V2 and
vb_1, ..., vb_k from V2 to V1
• Until Gain_k ≤ 0
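The pseudocode above can be sketched compactly in Python. This is a minimal single-pass implementation assuming unit edge weights; `adj` is a neighbor-list dict, and the helper names are illustrative:

```python
def cut_size(adj, A):
    """Number of links with exactly one endpoint in A (unit weights)."""
    A = set(A)
    return sum(1 for u in A for v in adj[u] if v not in A)

def kl_pass(adj, A, B):
    """One Kernighan-Lin pass: tentatively swap the best pair, lock the
    swapped nodes, then keep only the prefix of swaps with the best
    cumulative gain and undo the rest."""
    A, B = set(A), set(B)
    def D(v):                                  # D = E - I for v's current side
        side = A if v in A else B
        ext = sum(1 for w in adj[v] if w not in side)
        return ext - (len(adj[v]) - ext)
    unlockedA, unlockedB = set(A), set(B)
    swaps, gains = [], []
    while unlockedA and unlockedB:
        # gain of swapping (x, y) is D(x) + D(y) - 2*c(x, y)
        g, a, b = max((D(x) + D(y) - 2 * (y in adj[x]), x, y)
                      for x in unlockedA for y in unlockedB)
        A.remove(a); B.add(a); B.remove(b); A.add(b)   # tentative swap
        unlockedA.remove(a); unlockedB.remove(b)       # lock both nodes
        swaps.append((a, b)); gains.append(g)
    best_k, best_gain, run = 0, 0, 0
    for k, g in enumerate(gains, 1):           # best prefix of swaps
        run += g
        if run > best_gain:
            best_k, best_gain = k, run
    for a, b in swaps[best_k:]:                # undo the remaining swaps
        B.remove(a); A.add(a); A.remove(b); B.add(b)
    return A, B, best_gain
```

On the six-node example used later in these slides (A = {2, 3, 4}, B = {1, 5, 6}), a single pass swaps nodes 4 and 1, reducing the cut from 3 to 1.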
Kernighan-Lin (KL) Example [©Sarrafzadeh]
Vertices a–h; the table records each tentative swap:
Step No. | Vertex Pair | Gain | Gain sum | Cut-cost
0        | --          |  0   |  0       | 5
1        | { d, g }    |  3   |  3       | 2
2        | { c, f }    |  1   |  4       | 1
3        | { b, h }    | -2   |  2       | 3
4        | { a, e }    | -2   |  0       | 5
Kernighan-Lin (KL) : Analysis
• Time complexity?
• Inner (for) loop
• Iterates n/2 times
• Iteration 1: (n/2) × (n/2) pair comparisons
• Iteration i: (n/2 − i + 1)² pair comparisons
• Passes? Usually independent of n
• O(n3)
• Drawbacks?
• Local optimum
• Balanced partitions only
• No weight for the vertices
• High time complexity
Internal and external cost [©Kang]
Consider the partition A = {a1, ..., an}, B = {b1, ..., bm}.
Internal cost of x ∈ A: Ix = Σai∈A C(x, ai), the total weight of the edges
from x to nodes of its own group.
External cost of x ∈ A: Ex = Σbj∈B C(x, bj), the total weight of the edges
from x to the other group.
D value: Dx = Ex − Ix.
Likewise, for y ∈ B: Iy, Ey, and Dy = Ey − Iy.
Gain Calculation [©Kang]
• Lemma: Consider any ai ∈ A, bj ∈ B.
If ai, bj are interchanged, the gain is
g = Dai + Dbj − 2 C(ai, bj)
• Proof:
Total cost before interchange (T) between A and B:
T = Eai + Ebj − C(ai, bj) + (cost for all others)
Total cost after interchange (T') between A and B:
T' = Iai + Ibj + C(ai, bj) + (cost for all others)
Therefore
g = T − T' = (Eai − Iai) + (Ebj − Ibj) − 2 C(ai, bj) = Dai + Dbj − 2 C(ai, bj)
Gain Calculation (cont.) [©Kang]
• Lemma: Let Dx', Dy' be the new D values for elements of
A − {ai} and B − {bj}. Then after interchanging ai and bj,
Dx' = Dx + 2 C(x, ai) − 2 C(x, bj), for x ∈ A − {ai}
Dy' = Dy + 2 C(y, bj) − 2 C(y, ai), for y ∈ B − {bj}
• Proof:
• The edge x–ai changed from internal (in Dx) to external (in Dx')
• The edge y–bj changed from internal (in Dy) to external (in Dy')
• The x–bj edge changed from external to internal
• The y–ai edge changed from external to internal
• More clarification in the next two slides
Clarification of the Lemma [©Kang]
• Decompose Ix and Ex to separate the edges to ai and bj.
Before the move:
Ix = Ixa + C(x, ai),  Ex = Exb + C(x, bj)
where Ixa and Exb collect all the remaining internal and external edges of x.
• After the move, the edge to ai becomes external and the edge to bj
becomes internal:
Ix' = Ixa + C(x, bj),  Ex' = Exb + C(x, ai)
• Hence
Dx' = Ex' − Ix' = (Exb + C(x, ai)) − (Ixa + C(x, bj)) = Dx + 2 C(x, ai) − 2 C(x, bj)
Example: KL
• Step 1 - Initialization
A = {2, 3, 4}, B = {1, 5, 6}
A’ = A = {2, 3, 4}, B’ = B = {1, 5, 6}
• Step 2 - Compute D values
D1 = E1 - I1 = 1-0 = +1
D2 = E2 - I2 = 1-2 = -1
D3 = E3 - I3 = 0-1 = -1
D4 = E4 - I4 = 2-1 = +1
D5 = E5 - I5 = 1-1 = +0
D6 = E6 - I6 = 1-1 = +0
[©Kang]
[Figure: the six-node example graph with initial partition A = {2, 3, 4}, B = {1, 5, 6}.]
Example: KL (cont.)
• Step 3 - compute gains
g21 = D2 + D1 - 2C21 = (-1) + (+1) - 2(1) = -2
g25 = D2 + D5 - 2C25 = (-1) + (+0) - 2(0) = -1
g26 = D2 + D6 - 2C26 = (-1) + (+0) - 2(0) = -1
g31 = D3 + D1 - 2C31 = (-1) + (+1) - 2(0) = 0
g35 = D3 + D5 - 2C35 = (-1) + (0) - 2(0) = -1
g36 = D3 + D6 - 2C36 = (-1) + (0) - 2(0) = -1
g41 = D4 + D1 - 2C41 = (+1) + (+1) - 2(0) = +2
g45 = D4 + D5 - 2C45 = (+1) + (+0) - 2(+1) = -1
g46 = D4 + D6 - 2C46 = (+1) + (+0) - 2(+1) = -1
• The largest g value is g41 = +2
interchange 4 and 1 (a1, b1) = (4, 1)
A’ = A’ - {4} = {2, 3}
B’ = B’ - {1} = {5, 6} both not empty
[©Kang]
Example: KL (cont.)
• Step 4 - update D values of node connected to vertices (4, 1)
D2’ = D2 + 2C24 - 2C21 = (-1) + 2(+1) - 2(+1) = -1
D5’ = D5 + 2C51 - 2C54 = +0 + 2(0) - 2(+1) = -2
D6’ = D6 + 2C61 - 2C64 = +0 + 2(0) - 2(+1) = -2
• Assign Di = Di’, repeat step 3 :
g25 = D2 + D5 - 2C25 = -1 - 2 - 2(0) = -3
g26 = D2 + D6 - 2C26 = -1 - 2 - 2(0) = -3
g35 = D3 + D5 - 2C35 = -1 - 2 - 2(0) = -3
g36 = D3 + D6 - 2C36 = -1 - 2 - 2(0) = -3
• All values are equal;
arbitrarily choose g36 = -3  (a2, b2) = (3, 6)
A’ = A’ - {3} = {2}, B’ = B’ - {6} = {5}
New D values are:
D2’ = D2 + 2C23 - 2C26 = -1 + 2(1) - 2(0) = +1
D5’ = D5 + 2C56 - 2C53 = -2 + 2(1) - 2(0) = +0
• New gain with D2  D2’, D5  D5’
g25 = D2 + D5 - 2C52 = +1 + 0 - 2(0) = +1  (a3, b3) = (2, 5) [©Kang]
Example: KL (cont.)
• Step 5 - Determine the # of
moves to take
g1 = +2
g1 + g2 = +2 - 3 = -1
g1 + g2 + g3 = +2 - 3 + 1 = 0
• The value of k for max G is 1
X = {a1} = {4}, Y = {b1} = {1}
• Move X to B, Y to A  A = {1, 2, 3}, B = {4, 5, 6}
• Repeat the whole process:
• • • • •
• The final solution is A = {1, 2, 3}, B = {4, 5, 6}
[Figure: final partition A = {1, 2, 3}, B = {4, 5, 6}.]
Section 3 Number of communities
Community detection
The number and size of the communities are unknown at the beginning.
Partition
Division of a network into groups of nodes, so that each node belongs to one group.
Bell Number: number of possible partitions
of N nodes
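The Bell number grows super-exponentially with N, which is why brute-force enumeration of all partitions is hopeless. A small sketch using the Bell-triangle recurrence:

```python
def bell(n):
    """B_n: the number of ways to partition a set of n labeled elements.
    Bell triangle: each row starts with the last entry of the previous
    row, and each subsequent entry adds the entry above-left."""
    row = [1]
    for _ in range(n):
        nxt = [row[-1]]
        for x in row:
            nxt.append(nxt[-1] + x)
        row = nxt
    return row[0]
```

For example bell(3) = 5, bell(10) = 115,975, and for a 34-node network like the karate club the count is already astronomical.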
Hierarchical Clustering
Section 4
Section 4 Hierarchical Clustering
Agglomerative algorithms merge nodes and communities with high
similarity.
Divisive algorithms split communities by removing links that connect
nodes with low similarity.
1. Build a similarity matrix for the network. The similarity matrix captures
how similar two nodes are to each other, and must be determined from the
adjacency matrix.
2. Hierarchical clustering iteratively identifies groups of nodes with high
similarity, following one of the two strategies above.
3. Hierarchical tree or dendrogram: visualizes the history of the merging or
splitting process the algorithm follows. Horizontal cuts of this tree offer
various community partitions.
Section 4 Agglomerative Algorithms
Step 1: Define the Similarity Matrix (Ravasz algorithm)
• High for node pairs that likely belong to the same
community, low for those that likely belong to different
communities.
• Nodes that connect directly to each other and/or share
multiple neighbors are more likely to belong to the same
dense local neighborhood, hence their similarity should
be large.
Topological overlap matrix:
JN(i, j): the number of common neighbors of
nodes i and j, plus one if there is a direct link
between i and j.
E. Ravasz et al., Science 297 (2002).
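The similarity used in Step 1 can be sketched as follows. Note that slightly different normalizations of JN(i, j) appear in the literature; the denominator below (the smaller of the two degrees) is one common choice, not the only one:

```python
def topological_overlap(adj, i, j):
    """Topological overlap of nodes i and j: common neighbors of i and j
    (plus 1 if i and j are directly linked), normalized by the smaller
    of the two degrees."""
    common = len(set(adj[i]) & set(adj[j]))
    j_n = common + (1 if j in adj[i] else 0)
    return j_n / min(len(adj[i]), len(adj[j]))
```

On a 4-cycle, opposite nodes share both neighbors (overlap 1), while adjacent nodes share none and are linked directly (overlap 0.5), so the measure is high exactly for pairs embedded in the same dense neighborhood.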
Agglomerative algorithms merge nodes and communities with high similarity.
Section 4 Agglomerative Algorithms
E. Ravasz et al., Science 297 (2002).
Step 2: Decide Group Similarity
• Groups are merged based on their mutual similarity through single, complete or
average cluster linkage
Section 4 Agglomerative Algorithms
Step 3: Apply Hierarchical Clustering
• Assign each node to a community of its own and evaluate the similarity
for all node pairs. The initial similarities between these “communities” are
simply the node similarities.
• Find the community pair with the highest similarity and merge them to
form a single community.
• Calculate the similarity between the new community and all other
communities.
• Repeat from Step 2 until all nodes are merged into a single community.
Step 4: Build Dendrogram
• Describes the precise order in which the nodes are assigned to
communities.
E. Ravasz et al., Science 297 (2002).
Section 4 Agglomerative Algorithms
Computational complexity:
• Step 1 (calculation of the similarity matrix)
• Steps 2-3 (group similarity)
• Step 4 (dendrogram)
E. Ravasz et al., Science 297 (2002).
Section 4 Divisive Algorithms
Step 1: Define a Centrality Measure (Girvan-Newman algorithm)
• Link betweenness is the number of shortest paths
between all node pairs that run along a link.
• Random-walk betweenness. A pair of nodes m and n are
chosen at random. A walker starts at m, following each
adjacent link with equal probability until it reaches n.
Random walk betweenness xij is the probability that the
link i→j was crossed by the walker after averaging over
all possible choices for the starting nodes m and n
Divisive algorithms split communities by removing links that connect nodes
with low similarity.
M. Girvan & M.E.J. Newman, PNAS 99 (2002).
Section 4 Divisive Algorithms
M. Girvan & M.E.J. Newman, PNAS 99 (2002).
Step 2: Hierarchical Clustering
a) Compute the centrality of
each link.
b) Remove the link with the
largest centrality; in case of a
tie, choose one at random.
c) Recalculate the centrality of
each link for the altered
network.
d) Repeat until all links are
removed (this yields a
dendrogram).
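Steps (a)-(d) hinge on link betweenness, which can be sketched with Brandes-style accumulation (shortest-path version, unit-length links; the function names are illustrative):

```python
from collections import deque

def edge_betweenness(adj):
    """Shortest-path betweenness of every link (Brandes accumulation)."""
    eb = {}
    for u in adj:
        for v in adj[u]:
            eb[tuple(sorted((u, v)))] = 0.0
    for s in adj:
        dist, sigma = {s: 0}, {v: 0 for v in adj}
        sigma[s] = 1
        preds = {v: [] for v in adj}
        order, q = [], deque([s])
        while q:                              # BFS with path counts sigma
            v = q.popleft()
            order.append(v)
            for w in adj[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    q.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        delta = {v: 0.0 for v in adj}
        for w in reversed(order):             # back-propagate dependencies
            for v in preds[w]:
                c = sigma[v] / sigma[w] * (1 + delta[w])
                eb[tuple(sorted((v, w)))] += c
                delta[v] += c
    return {e: b / 2 for e, b in eb.items()}  # each pair counted twice

def girvan_newman_step(adj):
    """Remove the link with the largest betweenness (step b)."""
    eb = edge_betweenness(adj)
    u, v = max(eb, key=eb.get)
    adj[u].remove(v)
    adj[v].remove(u)
    return (u, v)
```

On two triangles joined by a bridge, the bridge carries all 9 cross-pairs of shortest paths, so it is removed first, splitting the network into its two communities.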
Section 4 Divisive Algorithm
M. Girvan & M.E.J. Newman, PNAS 99 (2002).
Computational complexity:
• Step 1a (calculation betweenness
centrality):
• Step 1b (Recalculation of betweenness
centrality for all links):
for sparse networks
Section 4 Hierarchy in networks
Can a hierarchical network be scale-free?
Section 4 Hierarchy in networks
(1) Scale-free property
The obtained network is scale-free, its
degree distribution following a power-
law with
E. Ravasz & A.-L. Barabási, PRE 67 (2003).
A construction
Section 4 Hierarchy in networks
(2) Clustering coefficient scaling with k
Small k nodes:
*high clustering
coefficient;
*their neighbors tend to
link to each other in highly
interlinked, compact
communities.
High k nodes (hubs):
*small clustering coefficient;
*connect independent
communities.
E. Ravasz & A.-L. Barabási, PRE 67 (2003).
Section 4 Hierarchy in networks
(3) Clustering coefficient independent of N
E. Ravasz & A.-L. Barabási, PRE 67 (2003).
1. Scale-free
2. Scaling clustering coefficient (DGM)
3. Clustering coefficient independent of N
E. Ravasz & A.-L. Barabási, PRE 67 (2003).
Section 4 Hierarchy in networks
Section 4 Hierarchy in real networks
POWER GRID INTERNET
Section 4 Ambiguity in Hierarchical clustering
Where to “cut”?
Phylogenetic dendrograms
In bioinformatics, clusters and dendrograms have been studied for a long time.
For example, the sequences of the same protein or gene in different species are
selected and compared with each other. A similarity matrix is constructed
between these sequences by looking at how many amino acids/nucleotides stay
in place.
Modularity
Section 4
Section 4 Modularity
MEJ Newman, PNAS 103 (2006).
H4: Random Hypothesis
Randomly wired networks are not expected to have a community structure.
Imagine a partition into nc communities. Modularity compares the original data
(Aij) with the expected connections in a randomly rewired model (a random
network with the same degrees, pij = ki kj / 2L), relative to a specific
partition:
M = Σc=1..nc (1/2L) Σ(i,j)∈Cc (Aij − ki kj / 2L)
Modularity is a measure associated to a partition.
Section 4 Modularity
Another way of writing M
MEJ Newman, PNAS 103 (2006).
We can rewrite the first term as (1/2L) Σ(i,j)∈Cc Aij = Lc / L, where Lc is the
number of links within community Cc. In a similar fashion, the second term
becomes (1/2L) Σ(i,j)∈Cc ki kj / 2L = (kc / 2L)², where kc is the total degree
of the nodes in Cc. Finally we get:
M = Σc=1..nc [ Lc / L − (kc / 2L)² ]
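The final expression translates directly into code. A minimal sketch for a neighbor-list graph (the names are illustrative):

```python
def modularity(adj, communities):
    """M = sum_c [ L_c / L - (k_c / 2L)^2 ] for a given partition."""
    L = sum(len(nbrs) for nbrs in adj.values()) / 2   # total number of links
    M = 0.0
    for c in communities:
        cset = set(c)
        Lc = sum(1 for u in cset for v in adj[u] if v in cset) / 2  # internal links
        kc = sum(len(adj[u]) for u in cset)                         # total degree
        M += Lc / L - (kc / (2 * L)) ** 2
    return M
```

For two triangles joined by one link, the partition into the two triangles gives M = 2·(3/7 − (7/14)²) = 5/14 ≈ 0.357, while putting the whole network in one community gives M = 0.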
Section 4 Modularity
MEJ Newman, PNAS 103 (2006).
H5: Maximal Modularity Hypothesis
The partition with the maximum modularity M for a given network offers the
optimal community structure.
Goal: find the partition that maximizes M.
Section 4 Modularity
• Optimal partition, that
maximizes the modularity.
• Sub-optimal but positive
modularity.
• Negative Modularity: If we
assign each node to a different
community.
• Zero modularity: Assigning all
nodes to the same community,
independent of the network
structure.
• Modularity is size dependent
Which partition ?
Section 4 Modularity based community identification
A greedy algorithm, which iteratively joins nodes if the move increases the new
partition’s modularity.
Step 1. Assign each node to a community of its own. Hence we start with N
communities.
Step 2. Inspect each pair of communities connected by at least one link and
compute the modularity variation ΔM obtained if we merge these two communities.
Step 3. Identify the community pair for which ΔM is the largest and merge them.
Note that the modularity of a particular partition is always calculated from the
full topology of the network.
Step 4. Repeat Step 2 until all nodes are merged into a single community.
Step 5. Record M for each step and select the partition for which the modularity
is maximal.
MEJ Newman, PRE 69 (2004).
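The five steps can be sketched as follows. This is a slow reference implementation that recomputes M from scratch at every merge; the fast variants discussed in the text avoid exactly this:

```python
def greedy_modularity(adj):
    """Greedy agglomeration: start from singleton communities, always
    merge the connected pair whose merge yields the largest modularity,
    and return the best partition encountered along the way."""
    L = sum(len(v) for v in adj.values()) / 2
    label = {u: u for u in adj}                 # node -> community id

    def partition_modularity():
        comms = {}
        for u, c in label.items():
            comms.setdefault(c, set()).add(u)
        M = 0.0
        for c in comms.values():
            Lc = sum(1 for u in c for v in adj[u] if v in c) / 2
            kc = sum(len(adj[u]) for u in c)
            M += Lc / L - (kc / (2 * L)) ** 2
        return M, list(comms.values())

    best_M, best_part = partition_modularity()
    while True:
        pairs = {tuple(sorted((label[u], label[v])))
                 for u in adj for v in adj[u] if label[u] != label[v]}
        if not pairs:
            break
        trial = []
        for a, b in pairs:                      # evaluate every candidate merge
            saved = dict(label)
            for u in saved:
                if saved[u] == b:
                    label[u] = a
            trial.append((partition_modularity()[0], a, b))
            label = saved                       # undo the trial merge
        _, a, b = max(trial)                    # best (possibly negative) change
        for u in list(label):
            if label[u] == b:
                label[u] = a
        M, part = partition_modularity()
        if M > best_M:
            best_M, best_part = M, part
    return best_M, best_part
```

On two triangles joined by a bridge, the recorded maximum is M = 5/14, reached exactly when the two triangles form the two communities.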
Section 4 Modularity
Which partition ?
Modularity can be used to compare different partitions provided by
other algorithms, like hierarchical clustering
It can be used to design new algorithms, aiming at maximizing M
Section 4 Modularity for the Girvan-Newman algorithm
Which partition ?
Section 4 Modularity based community identification
MEJ Newman, PRE 69 (2004).
Computational complexity:
• Step 1-2 (calculation of ΔM for L links ):
• Step 3 (matrix update):
• Step 4 (N-1 community merges):
for sparse networks
Section 4 Limits of Modularity
Resolution limit
Consider two communities A and B with total degrees kA and kB, connected to
each other by lAB links. Merging them changes the modularity by
ΔMAB = lAB / L − kA kB / (2L²)
If lAB ≥ 1 and kA kB / (2L) < 1, then ΔMAB > 0 and we merge A and B to
maximize modularity.
Assuming kA = kB = k, this happens whenever k ≤ √(2L).
Modularity has a resolution limit, as it cannot detect communities smaller than
this size.
Section 4 Limits of Modularity
One maximum?
Section 4 Limits of Modularity
Null models
The null model (the expected connections) can take into account weights,
directions, and node attributes or space.
S. Fortunato, Phys. Rep. 486 (2010)
P. Expert et al., PNAS 108 (2011)
Section 5 Online Resources (Modularity)
Gephi
NetworkX
R assigns self-loops to nodes to increase or decrease the aversion of nodes to form communities
Finds the partition that maximizes modularity
(considers weights and direction)
Calculates the modularity of the partition you
provide
Section 4 Online Resources (1)
The greedy algorithm is neither particularly fast nor particularly successful at
maximizing M.
Scalability: due to the sparsity of the adjacency matrix, updating the matrix
involves a large number of useless operations. The use of data structures for
sparse matrices can decrease the complexity of the computation, allowing us to
analyze much larger networks. See the "Fast Modularity" Community Structure
Inference Algorithm, http://cs.unm.edu/~aaron/research/fastmodularity.htm, for
the code.
A fast greedy algorithm was proposed by Blondel and collaborators that can
process networks with millions of nodes. For a description of the algorithm, see
the Louvain method: Finding communities in large networks,
https://sites.google.com/site/findcommunities/, for the code.
Overlapping Communities
Section 6
Section 5 Overlapping Communities
G. Palla et al., Nature 435 (2005).
Section 5 Clique Percolation (CFinder)
Start with a k-clique (a complete subgraph of k nodes), a 3-clique for example.
Start "rolling" the clique over adjacent cliques. Two k-cliques are considered
adjacent if they share k − 1 nodes.
A k-clique community is the largest connected subgraph obtained by the union of
all adjacent k-cliques.
Other k-cliques that cannot be reached from a particular clique correspond to
other clique-communities.
G. Palla et al., Nature 435 (2005).
Section 5 Overlapping Communities
Bright:
• community containing
light-related words (glow
or dark);
• community capturing
different colors (yellow,
brown)
• community consisting of
astronomical terms (sun,
ray).
• community linked to
intelligence (gifted,
brilliant).
Section 5 Could CP communities emerge by chance?
Notice that if a random network is sufficiently dense, there are cliques of
varying order.
A k-clique community emerges in a random graph only if the connection
probability exceeds the threshold pc(k) = [(k − 1) N]^(−1/(k−1)).
I. Derényi et al., PRL 94 (2005).
Section 5 Could CP communities emerge by chance?
Random networks with N = 20 (pc = 0.16):
p = 0.13 (< pc)    p = 0.22 (> pc)
I. Derényi et al., PRL 94 (2005).
Section 5 Could CP communities emerge by chance?
Compare your CFinder output with what you obtain
in a random graph with the same N and p!
Section 5 Clique percolation
Computational complexity:
• Finding maximal cliques requires exponential time.
• However, the algorithm only has to find k-cliques, which can be done in
polynomial time.
I. Derényi et al., PRL 94 (2005).
Section 5 Online Resources (CFinder)
The CFinder software package that implements the Clique
Percolation Method can be downloaded at
www.cfinder.org
NetworkX
Section 5 Link Clustering
In social networks, a link may indicate that two people:
• are in the same family,
• work together, or
• share a hobby.
In biological networks, each interaction of a protein is responsible for a
different function, uniquely defining the protein’s role in the cell.
Nodes tend to belong to multiple communities, while links tend to be specific,
capturing the nature of the relationship between two nodes.
Ahn, Bagrow and Lehmann, Nature 466 (2010).
Define a hierarchical algorithm based on the similarity of links.
Section 6 Link Clustering
n+(i): the list of the neighbors of node i,
including itself.
S measures the relative number of common
neighbors of i and j.
Ahn, Bagrow and Lehmann, Nature 466 (2010).
1. Define link similarity
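For two links e(i, k) and e(j, k) attached to a common node k, the similarity is the Jaccard index of the inclusive neighborhoods, S = |n+(i) ∩ n+(j)| / |n+(i) ∪ n+(j)|. A one-function sketch (names illustrative):

```python
def link_similarity(adj, i, j):
    """Similarity of two links e(i, k) and e(j, k) attached to a common
    node k: Jaccard index of the inclusive neighborhoods of i and j."""
    ni = set(adj[i]) | {i}     # n+(i): neighbors of i, including i itself
    nj = set(adj[j]) | {j}
    return len(ni & nj) / len(ni | nj)
```

On a 4-cycle, the two links meeting at a node have similarity 0.5: their inclusive neighborhoods share two of four nodes.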
Section 5 Link Clustering
2. Apply hierarchical clustering (agglomerative, single linkage)
Ahn, Bagrow and Lehmann, Nature 466 (2010).
Section 5 Link Clustering
Ahn, Bagrow and Lehmann, Nature 466 (2010).
The network of characters in Victor
Hugo’s 1862 novel Les
Misérables. Two characters are
connected if they interact directly
with each other in the story. The
link colors indicate the clusters,
grey nodes corresponding to
single-link clusters. Each node is
depicted as a pie chart, illustrating
its membership in multiple
communities. Not surprisingly, the
main character, Jean Valjean, has
the most diverse community
membership.
Section 5 Link Clustering
Ahn, Bragow and Lehmann, Nature 466 (2010).
A.-L. Barabási, Network Science: Communities.
Computational complexity:
• Step 1: comparing two links requires max(k1, k2) steps. For scale-free networks this step has complexity O(N^(2/(γ-1))).
• Step 2: hierarchical clustering requires O(L^2) steps, i.e. O(N^2) for sparse networks.

Communities in Network Science

  • 1.
  • 2.
    Section 2 Zachary’sKarate Club W.W. Zachary, J. Anthropol. Res. 33:452-473 (1977). A.-L. Barabási, Network Science: Communities. Zachary's karate club is a social network of a university karate club, described in the paper "An Information Flow Model for Conflict and Fission in Small Groups" by Wayne W. Zachary. The network became a popular example of community structure in networks after its use by Michelle Girvan and Mark Newman in 2002.
  • 3.
    A social networkof a karate club was studied by Wayne W. Zachary for a period of three years from 1970 to 1972.[2] The network captures 34 members of a karate club, documenting links between pairs of members who interacted outside the club. During the study a conflict arose between the administrator "John A" and instructor "Mr. Hi" (pseudonyms), which led to the split of the club into two. Half of the members formed a new club around Mr. Hi; members from the other part found a new instructor or gave up karate. Based on collected data Zachary correctly assigned all but one member of the club to the groups they actually joined after the split.
  • 4.
    Section 2 Zachary’sKarate Club Citation history of the Zachary’s Karate club paper W.W. Zachary, J. Anthropol. Res. 33:452-473 (1977). A.-L. Barabási, Network Science: Communities.
  • 5.
    Section 2 ZacharyKarate Club Club The first scientist at any conference on networks who uses Zachary's karate club as an example is inducted into the Zachary Karate Club Club, and awarded a prize. Chris Moore (9 May 2013). Mason Porter (NetSci, June 2013). Yong-Year Ahn (Oxford University, July 2013) Marián Boguñá (ECCS, September 2013). Mark Newman (Netsci, June 2014) http://networkkarate.tumblr.com/)
  • 6.
    Section 2 Auxiliaryinformation  Karate Club: Breakup of the club  Belgian Phone Data: Language spoken
  • 7.
    • Belgium appearsto be the model bicultural society: 59% of its citizens are Flemish, speaking Dutch and 40% are Walloons who speak French. • Vincent Blondel and his students in 2007 developed an algorithm to identify the country’s community structure. They started from the mobile call network.
  • 8.
    Section 2 BiologicalModules E. Ravasz et al., Science 297 (2002). A.-L. Barabási, Network Science: Communities. The E. coli metabolism offers a community structure of biological systems. a.The biological modules (communities) identified by the Ravasz algorithm
  • 9.
    Communities in MetabolicNetworks The E. coli metabolism offers a community structure of biological systems [11]. a.The biological modules (communities) identified by the Ravasz algorithm [11] (SECTION 9.3). The color of each node, capturing the predominant biochemical class to which it belongs, indicates that different functional classes are segregated in distinct network neighborhoods. The highlighted region selects the nodes that belong to the pyrimidine metabolism, one of the predicted communities. b.The topologic overlap matrix of the E. coli metabolism and the corresponding dendrogram that allows us to identify the modules shown in (a). The color of the branches reflect the predominant biochemical role of the participating molecules, like carbohydrates (blue), nucleotide and nucleic acid metabolism (red), and lipid metabolism (cyan). c.The red right branch of the dendrogram tree shown in (b), highlighting the region corresponding to the pyridine module. d.The detailed metabolic reactions within the pyrimidine module. The boxes around the reactions highlight the communities predicted by the Ravasz algorithm.
  • 10.
  • 11.
    What do wereally mean by a community? How many communities are in a network? How many different ways can we partition a network into communities?
  • 12.
    Section 2 Communities A.-L.Barabási, Network Science: Communities. We focus on the mesoscopic scale of the network Microscopic Mesoscopic Macroscopic
  • 13.
    Section 2 FundamentalHypothesis A.-L. Barabási, Network Science: Communities. H1: A network’s community structure is uniquely encoded in its wiring diagram According to the fundamental hypothesis there is a ground truth about a network’s community organization, that can be uncovered by inspecting Aij.
  • 14.
    Section 3 Basicsof Communities H2: Connectedness Hypothesis A community corresponds to a connected subgraph. H3: Density Hypothesis Communities correspond to locally dense neighborhoods of a network. A.-L. Barabási, Network Science: Communities.
  • 15.
    Section 3 Basicsof Communities H2: Connectedness Hypothesis A community corresponds to a connected subgraph. H3: Density Hypothesis Communities correspond to locally dense neighborhoods of a network. A.-L. Barabási, Network Science: Communities.
  • 16.
    Section 3 Basicsof Communities Cliques as communities A clique is a complete subgraph of k-nodes R.D. Luce & A.D. Perry, Psychometrika 14 (1949) A.-L. Barabási, Network Science: Communities.
  • 17.
    Section 3 Basicsof Communities • Triangles are frequent; larger cliques are rare. • Communities do not necessarily correspond to complete subgraphs, as many of their nodes do not link directly to each other. • Finding the cliques of a network is computationally rather demanding, being a so-called NP-complete problem. Cliques as communities
  • 18.
    Section 3 Basicsof Communities Consider a connected subgraph C of Nc nodes Internal degree, ki int : set of links of node i that connects to other nodes of the same community C. External degree ki ext: the set of links of node i that connects to the rest of the network. If ki ext=0: all neighbors of i belong to C, and C is a good community for i. If ki int=0, all neighbors of i belong to other communities, then i should be assigned to a different community. Strong and weak communities A.-L. Barabási, Network Science: Communities.
  • 19.
    Section 3 Basicsof Communities Strong community: Each node of C has more links within the community than with the rest of the graph. Weak community: The total internal degree of C exceeds its total external degree, Clique Strong Weak A.-L. Barabási, Network Science: Communities. Each clique is a strong community and each strong community is a week community. The converse is generally not true.
  • 20.
    Section 3 Numberof Partitions How many ways can we partition a network into 2 communities? Divide a network into two equal non-overlapping subgraphs, such that the number of links between the nodes in the two groups is minimized. Two subgroups of size n1 and n2. Total number of combinations: N=10  256 partitions (1 ms) N=100 1026 partitions (1021 years) Graph bisection A.-L. Barabási, Network Science: Communities.
  • 21.
    Section 3 GraphPartitions (history) 2.5 billion transistors partition the full wiring diagram of an integrated circuit into smaller subgraphs, so that they minimize the number of connections between them. Graph Partitioning
  • 22.
    Two-way partitioning problem Eachnode has unit size Each edge has unit weight Find two partition V1 and V2 such that Each of V1 and V2 has equal size External wiring will be minimum (cut-set will have to minimize)
  • 23.
  • 24.
    2 3 4 5 6 7 8 9 10 11 12 13 14 15 s =1 t = 16 st-numbering i s t  , has two neighbors j, k . k i j  
  • 25.
    2 3 4 5 6 7 8 9 10 11 12 13 14 15 s =1 t = 16 st-numbering Size of cutset = 4
  • 26.
    2 3 4 5 6 7 8 9 10 11 12 13 14 15 s = 1 t= 16 st-numbering Size of cutset = 3 To find a bipartition with the minimum cutset, we have to enumerate all bipartitions. We need to enumerate all st-numbering.
  • 27.
    Two-way partitioning problem Eachnode has unit size Each edge has unit weight Find two partitions V1 and V2 such that Each of V1 and V2 has equal size External wiring will be minimum (cut-set will have to minimmize) NP-hard problem. Heuristic techniques to approximate solutions.
  • 28.
    Section 3 GraphPartitions (history) Kerninghan-Lin Algorithm for graph bisection • Partition a network into two groups of predefined size. This partition is called cut. • Inspect each a pair of nodes, one from each group. Identify the pair that results in the largest reduction of the cut size (links between the two groups) if we swap them • Swap them. • If no pair deduces the cut size, we swap the pair that increases the cut size the least. • The process is repeated until each node is moved once. Fiduccia–Mattheyses (FM) Partitioning Algorithm
  • 29.
    Kernighan-Lin (KL) Algorithm     B A n B A  Initialpartition A, B Size of the cut set    B b A a ab c T , We have to minimize the size of the cut set.
  • 30.
    Initial Partition Optimal Partition A,B A*, B* Swap B Y with A X   such that B A Y B A X Y X   * *   
  • 31.
    Initial Partition OptimalPartition B* Swap B Y with A X   such that B A Y B A X Y X   * *    How to find X and Y ? A B A* X Y X Y
  • 32.
    Kernighan-Lin (KL) Algorithm •Iterate as long as the cutsize improves: • Find a pair of vertices that result in the largest decrease in cutsize if exchanged • Exchange the two vertices (potential move) • “Lock” the vertices • If no improvement possible, and still some vertices unlocked, then exchange vertices that result in smallest increase in cutsize
  • 33.
    Kernighan-Lin (KL) Algorithm •Initialize • Bipartition G into V1 and V2, s.t., |V1| = |V2|  1 • n = |V| • Repeat • for i=1 to n/2 • Find a pair of unlocked vertices vai V1 and vbi V2 whose exchange makes the largest decrease or smallest increase in cut-cost • Mark vai and vbi as locked • Store the gain gi. • Find k, s.t. i=1..k gi=Gaink is maximized • If Gaink > 0 then move va1,...,vak from V1 to V2 and vb1,...,vbk from V2 to V1. • Until Gaink  0
  • 34.
    Kernighan-Lin (KL) Example a b c d e f g h 4{ a, e } -2 0 -- 0 1 { d, g } 3 2 { c, f } 1 3 { b, h } -2 Step No. Vertex Pair Gain 5 5 2 1 3 Cut-cost [©Sarrafzadeh] Gain sum 0 3 4 2 0
  • 35.
    Kernighan-Lin (KL) Example a bc d e f g h 4 { a, e } -2 0 -- 0 1 { d, g } 3 2 { c, f } 1 3 { b, h } -2 Step No. Vertex Pair Gain 5 5 2 1 3 Cut-cost [©Sarrafzadeh] Gain sum 0 3 4 2 0
  • 36.
    Kernighan-Lin (KL) :Analysis • Time complexity? • Inner (for) loop • Iterates n/2 times • Iteration 1: (n/2) x (n/2) • Iteration i: (n/2 – i + 1)2. • Passes? Usually independent of n • O(n3) • Drawbacks? • Local optimum • Balanced partitions only • No weight for the vertices • High time complexity
  • 37.
    Internal cost GA GB a1 a2 an ai a3 a5 a6 a4 b2 bj b4 b3 b1 b6 b7 b5          A x B y y b x b b b b a a a j j j j j i i i C C I E D I E D Likewise, [©Kang] External cost       B y y a a A x x a a i i i i C E C I ,
  • 38.
    • Lemma: Considerany ai  A, bj  B. If ai, bj are interchanged, the gain is • Proof: Total cost before interchange (T) between A and B Total cost after interchange (T’) between A and B Therefore Gain Calculation (cont.) j i j i b a b a C D D g 2    [©Kang] others) all for cost (     j i j i b a b a C E E T others) all for cost (      j i j i b a b a C I I T j i j j i i b a b b a a C I E I E T T g 2         i a D j b D
  • 39.
    Gain Calculation (cont.) •Lemma: • Let Dx’, Dy’ be the new D values for elements of A - {ai} and B - {bj}. Then after interchanging ai & bj, • Proof: • The edge x-ai changed from internal in Dx to external in Dx’ • The edge y-bj changed from internal in Dx to external in Dx’ • The x-bj edge changed from external to internal • The y-ai edge changed from external to internal • More clarification in the next two slides } { , 2 2 } { , 2 2 j ya yb y y i xb xa x x b B y C C D D a A x C C D D i j j i             [©Kang]
  • 40.
    Clarification of theLemma ai bj x a b
  • 41.
    • Decompose Ixand Ex to separate edges from ai and bj: Write the equations before the move • ... And after the move b a     j i xb x xa x C E C I j i i j xb xa xa xb x x x C C C C I E D            b a a b ) ( ) ( j i j i xb xa x xb xa x C C D C C D 2 2          b a b a       i j xa x xb x C E C I
  • 42.
    Example: KL • Step1 - Initialization A = {2, 3, 4}, B = {1, 5, 6} A’ = A = {2, 3, 4}, B’ = B = {1, 5, 6} • Step 2 - Compute D values D1 = E1 - I1 = 1-0 = +1 D2 = E2 - I2 = 1-2 = -1 D3 = E3 - I3 = 0-1 = -1 D4 = E4 - I4 = 2-1 = +1 D5 = E5 - I5 = 1-1 = +0 D6 = E6 - I6 = 1-1 = +0 [©Kang] 5 6 4 2 1 3 Initial partition 4 5 6 2 3 1
  • 43.
    Example: KL (cont.) •Step 3 - compute gains g21 = D2 + D1 - 2C21 = (-1) + (+1) - 2(1) = -2 g25 = D2 + D5 - 2C25 = (-1) + (+0) - 2(0) = -1 g26 = D2 + D6 - 2C26 = (-1) + (+0) - 2(0) = -1 g31 = D3 + D1 - 2C31 = (-1) + (+1) - 2(0) = 0 g35 = D3 + D5 - 2C35 = (-1) + (0) - 2(0) = -1 g36 = D3 + D6 - 2C36 = (-1) + (0) - 2(0) = -1 g41 = D4 + D1 - 2C41 = (+1) + (+1) - 2(0) = +2 g45 = D4 + D5 - 2C45 = (+1) + (+0) - 2(+1) = -1 g46 = D4 + D6 - 2C46 = (+1) + (+0) - 2(+1) = -1 • The largest g value is g41 = +2 interchange 4 and 1 (a1, b1) = (4, 1) A’ = A’ - {4} = {2, 3} B’ = B’ - {1} = {5, 6} both not empty [©Kang]
  • 44.
    Example: KL (cont.) •Step 4 - update D values of node connected to vertices (4, 1) D2’ = D2 + 2C24 - 2C21 = (-1) + 2(+1) - 2(+1) = -1 D5’ = D5 + 2C51 - 2C54 = +0 + 2(0) - 2(+1) = -2 D6’ = D6 + 2C61 - 2C64 = +0 + 2(0) - 2(+1) = -2 • Assign Di = Di’, repeat step 3 : g25 = D2 + D5 - 2C25 = -1 - 2 - 2(0) = -3 g26 = D2 + D6 - 2C26 = -1 - 2 - 2(0) = -3 g35 = D3 + D5 - 2C35 = -1 - 2 - 2(0) = -3 g36 = D3 + D6 - 2C36 = -1 - 2 - 2(0) = -3 • All values are equal; arbitrarily choose g36 = -3  (a2, b2) = (3, 6) A’ = A’ - {3} = {2}, B’ = B’ - {6} = {5} New D values are: D2’ = D2 + 2C23 - 2C26 = -1 + 2(1) - 2(0) = +1 D5’ = D5 + 2C56 - 2C53 = -2 + 2(1) - 2(0) = +0 • New gain with D2  D2’, D5  D5’ g25 = D2 + D5 - 2C52 = +1 + 0 - 2(0) = +1  (a3, b3) = (2, 5) [©Kang]
  • 45.
    Example: KL (cont.) •Step 5 - Determine the # of moves to take g1 = +2 g1 + g2 = +2 - 3 = -1 g1 + g2 + g3 = +2 - 3 + 1 = 0 • The value of k for max G is 1 X = {a1} = {4}, Y = {b1} = {1} • Move X to B, Y to A  A = {1, 2, 3}, B = {4, 5, 6} • Repeat the whole process: • • • • • • The final solution is A = {1, 2, 3}, B = {4, 5, 6} 5 6 4 2 1 3
  • 46.
    Section 3 Numberof communities Community detection The number and size of the communities are unknown at the beginning. Partition Division of a network into groups of nodes, so that each node belongs to one group. Bell Number: number of possible partitions of N nodes A.-L. Barabási, Network Science: Communities.
  • 47.
  • 48.
    Section 4 HierarchicalClustering Agglomerative algorithms merge nodes and communities with high similarity. Divisive algorithms split communities by removing links that connect nodes with low similarity. 1. Build a similarity matrix for the network 2. Similarity matrix: how similar two nodes are to each other  we need to determine from the adjacency matrix 3. Hierarchical clustering iteratively identifies groups of nodes with high similarity, following one of two distinct strategies: Hierarchical tree or dendrogram: visualize the history of the merging or splitting process the algorithm follows. Horizontal cuts of this tree offer various community partitions. 4.
  • 49.
    Section 4 AgglomerativeAlgorithms Step 1: Define the Similarity Matrix (Ravasz algorithm) • High for node pairs that likely belong to the same community, low for those that likely belong to different communities. • Nodes that connect directly to each other and/or share multiple neighbors are more likely to belong to the same dense local neighborhood, hence their similarity should be large. Topological overlap matrix: JN(i,j): number of common neighbors of node i and j; (+1) if there is a direct link between i and j; E. Ravasz et al., Science 297 (2002). A.-L. Barabási, Network Science: Communities. Agglomerative algorithms merge nodes and communities with high similarity.
  • 50.
    Section 4 AgglomerativeAlgorithms E. Ravasz et al., Science 297 (2002). A.-L. Barabási, Network Science: Communities. Step 2: Decide Group Similarity • Groups are merged based on their mutual similarity through single, complete or average cluster linkage
  • 51.
    Section 4 AgglomerativeAlgorithms Step 3: Apply Hierarchical Clustering • Assign each node to a community of its own and evaluate the similarity for all node pairs. The initial similarities between these “communities” are simply the node similarities. • Find the community pair with the highest similarity and merge them to form a single community. • Calculate the similarity between the new community and all other communities. • Repeat from Step 2 until all nodes are merged into a single community. Step 4: Build Dendrogram • Describes the precise order in which the nodes are assigned to communities. E. Ravasz et al., Science 297 (2002). A.-L. Barabási, Network Science: Communities.
  • 52.
    Section 4 AgglomerativeAlgorithms Computational complexity: • Step 1 (calculation similarity matrix): • Step 2-3 (group similarity): • Step 4 (dendrogram): E. Ravasz et al., Science 297 (2002). A.-L. Barabási, Network Science: Communities.
  • 53.
    Section 4 DivisiveAlgorithms Step 1: Define a Centrality Measure (Girvan-Newman algorithm) • Link betweenness is the number of shortest paths between all node pairs that run along a link. • Random-walk betweenness. A pair of nodes m and n are chosen at random. A walker starts at m, following each adjacent link with equal probability until it reaches n. Random walk betweenness xij is the probability that the link i→j was crossed by the walker after averaging over all possible choices for the starting nodes m and n Divisive algorithms split communities by removing links that connect nodes with low similarity. M. Girvan & M.E.J. Newman, PNAS 99 (2002). A.-L. Barabási, Network Science: Communities.
  • 54.
    Section 4 DivisiveAlgorithms M. Girvan & M.E.J. Newman, PNAS 99 (2002). A.-L. Barabási, Network Science: Communities. Step 2: Hierarchical Clustering a) Compute of the centrality of each link. b) Remove the link with the largest centrality; in case of a tie, choose one randomly. c) Recalculate the centrality of each link for the altered network. d) Repeat until all links are removed (yields a dendrogram).
  • 55.
    Section 4 DivisiveAlgorithms M. Girvan & M.E.J. Newman, PNAS 99 (2002). A.-L. Barabási, Network Science: Communities. Step 2: Hierarchical Clustering a) Compute of the centrality of each link. b) Remove the link with the largest centrality; in case of a tie, choose one randomly. c) Recalculate the centrality of each link for the altered network. d) Repeat until all links are removed (yields a dendrogram).
  • 56.
    Section 4 DivisiveAlgorithm M. Girvan & M.E.J. Newman, PNAS 99 (2002). A.-L. Barabási, Network Science: Communities. Computational complexity: • Step 1a (calculation betweenness centrality): • Step 1b (Recalculation of betweenness centrality for all links): for sparse networks
  • 57.
    Section 4 Hierarchyin networks Can a hierarchical network be scale-free?
  • 58.
    Section 4 Hierarchyin networks (1) Scale-free property The obtained network is scale-free, its degree distribution following a power- law with E. Ravasz & A.-L. Barabási, PRE 67 (2003). A.-L. Barabási, Network Science: Communities. A construction
  • 59.
    Section 4 Hierarchyin networks (1) Scale-free property The obtained network is scale-free, its degree distribution following a power- law with E. Ravasz & A.-L. Barabási, PRE 67 (2003). A.-L. Barabási, Network Science: Communities.
  • 60.
    Section 4 Hierarchyin networks (2) Clustering coefficient scaling with k Small k nodes: *high clustering coefficient; *their neighbors tend to link to each other in highly interlinked, compact communities. High k nodes (hubs): *small clustering coefficient; *connect independent communities. E. Ravasz & A.-L. Barabási, PRE 67 (2003). A.-L. Barabási, Network Science: Communities.
  • 61.
    Section 4 Hierarchyin networks (3) Clustering coefficient independent of N E. Ravasz & A.-L. Barabási, PRE 67 (2003). A.-L. Barabási, Network Science: Communities.
  • 62.
    Section 4 Hierarchyin networks (3) Clustering coefficient independent of N E. Ravasz & A.-L. Barabási, PRE 67 (2003). A.-L. Barabási, Network Science: Communities.
  • 63.
    2. Scaling clustering coefficient(DGM) 1. Scale-free 3. Clustering coefficient independent of N x E. Ravasz & A.-L. Barabási, PRE 67 (2003). A.-L. Barabási, Network Science: Communities. Section 4 Hierarchy in networks
  • 64.
    A.-L. Barabási, NetworkScience: Communities. Section 4 Hierarchy in real networks POWER GRID INTERNET
  • 65.
    Section 4 Ambiguityin Hierarchical clustering A.-L. Barabási, Network Science: Communities. Where to “cut”?
  • 66.
    Phylogenetic dendrograms In bioinformatrics,clusters and dendrograms have been studied for a long time. For example, the sequences of the same protein or gene in different species are selected, and compared with each other.
  • 67.
    Phylogenetic dendrograms A similaritymatrix is constructed between these sequences, by looking at how many aminoacids/nucleotides stay in place
  • 68.
    Phylogenetic dendrograms A similaritymatrix is constructed between these sequences, by looking at how many aminoacids/nucleotides stay in place
  • 69.
  • 70.
  • 71.
  • 72.
    Section 4 Modularity MEJNewman, PNAS 103 (2006). A.-L. Barabási, Network Science: Communities. H4: Random Hypothesis Randomly wired networks are not expected to have a community structure.
  • 73.
    Section 4 Modularity MEJNewman, PNAS 103 (2006). A.-L. Barabási, Network Science: Communities. Imagine a partition in nc communities Modularity H4: Random Hypothesis Randomly wired networks are not expected to have a community structure.
  • 74.
    Section 4 Modularity MEJNewman, PNAS 103 (2006). A.-L. Barabási, Network Science: Communities. Imagine a partition in nc communities Modularity Original data H4: Random Hypothesis Randomly wired networks are not expected to have a community structure.
  • 75.
    Section 4 Modularity MEJNewman, PNAS 103 (2006). A.-L. Barabási, Network Science: Communities. Imagine a partition in nc communities Modularity Original data Expected connections, a model H4: Random Hypothesis Randomly wired networks are not expected to have a community structure.
  • 76.
    Section 4 Modularity MEJNewman, PNAS 103 (2006). A.-L. Barabási, Network Science: Communities. Imagine a partition in nc communities Modularity Original data Expected connections, a model Relative to a specific partition H4: Random Hypothesis Randomly wired networks are not expected to have a community structure.
  • 77.
    Section 4 Modularity MEJNewman, PNAS 103 (2006). A.-L. Barabási, Network Science: Communities. Imagine a partition in nc communities Modularity Original data Expected connections, a model Relative to a specific partition Modularity is a measure associated to a partition Random network H4: Random Hypothesis Randomly wired networks are not expected to have a community structure.
  • 78.
    Section 4 Modularity Anotherway of writing M MEJ Newman, PNAS 103 (2006). A.-L. Barabási, Network Science: Communities. where LC is the number of links within C. In a similar fashion, the second term becomes We can rewrite the first term as Finally we get:
  • 79.
    Section 4 Modularity MEJNewman, PNAS 103 (2006). A.-L. Barabási, Network Science: Communities. H5: Maximal Modularity Hypothesis The partition with the maximum modularity M for a given network offers the optimal community structure
  • 80.
    Section 4 Modularity MEJNewman, PNAS 103 (2006). A.-L. Barabási, Network Science: Communities. H5: Maximal Modularity Hypothesis The partition with the maximum modularity M for a given network offers the optimal community structure Find Goal that maximizes M
  • 81.
    Section 4 Modularity •Optimal partition, that maximizes the modularity. • Sub-optimal but positive modularity. • Negative Modularity: If we assign each node to a different community. • Zero modularity: Assigning all nodes to the same community, independent of the network structure. • Modularity is size dependent Which partition ? A.-L. Barabási, Network Science: Communities.
  • 82.
    Section 4 Modularitybased community identification A greedy algorithm, which iteratively joins nodes if the move increases the new partition’s modularity. Step 1. Assign each node to a community of its own. Hence we start with N communities. Step 2. Inspect each pair of communities connected by at least one link and compute the modularity variation obtained if we merge these two communities. Step 3. Identify the community pairs for which ΔM is the largest and merge them. Note that modularity of a particular partition is always calculated from the full topology of the network. Step 4. Repeat step 2 until all nodes are merged into a single community. Step 5. Record for each step and select the partition for which the modularity is maximal. MEJ Newman, PRE 69 (2004). A.-L. Barabási, Network Science: Communities.
  • 83.
    Section 4 Modularity Whichpartition ? A.-L. Barabási, Network Science: Communities. Modularity can be used to compare different partitions provided by other algorithms, like hierarchical clustering It can be used to design new algorithms, aiming at maximizing M
  • 84.
    Section 4 Modularityfor the Girvan-Newman Which partition ? A.-L. Barabási, Network Science: Communities.
  • 85.
    Section 4 Modularitybased community identification MEJ Newman, PRE 69 (2004). A.-L. Barabási, Network Science: Communities. Computational complexity: • Step 1-2 (calculation of ΔM for L links ): • Step 3 (matrix update): • Step 4 (N-1 community merges): for sparse networks
  • 86.
    Section 4 Modularitybased community identification MEJ Newman, PRE 69 (2004). A.-L. Barabási, Network Science: Communities. Computational complexity: • Step 1-2 (calculation of ΔM for L links ): • Step 3 (matrix update): • Step 4 (N-1 community merges): for sparse networks
  • 87.
    Section 4 Limitsof Modularity A.-L. Barabási, Network Science: Communities. kA and kB total degree in A and B A B Resolution limit
  • 88.
    Section 4 Limitsof Modularity A.-L. Barabási, Network Science: Communities. kA and kB total degree in A and B If and A B Resolution limit
  • 89.
    Section 4 Limitsof Modularity A.-L. Barabási, Network Science: Communities. kA and kB total degree in A and B If and A B We merge A and B to maximize modularity. Resolution limit
  • 90.
    Section 4 Limitsof Modularity A.-L. Barabási, Network Science: Communities. kA and kB total degree in A and B If and Assuming A B We merge A and B to maximize modularity. Resolution limit
  • 91.
    Section 4 Limitsof Modularity A.-L. Barabási, Network Science: Communities. kA and kB total degree in A and B If and Assuming Modularity has a resolution limit, as it cannot detect communities smaller than this size. A B We merge A and B to maximize modularity. Resolution limit
  • 92.
    Section 4 Limitsof Modularity A.-L. Barabási, Network Science: Communities. One maximum?
  • 93.
    Section 4 Limitsof Modularity Null models Expected connections, a model can take into account weights can take into account directions can take into account attributes or space S. Fortunato, Phys. Rep. 486 (2010) S. Fortunato, Phys. Rep. 486 (2010) P. Expert el al., PNAS 108 (2011)
  • 94.
    Section 5 OnlineResources (Modularity) Gephi NetworkX R assigns self-loops to nodes to increase or decrease the aversion of nodes to form communities Finds the partition that maximizes modularity (considers weights and direction) Calculates the modularity of the partition you provide
  • 95.
    Section 4 OnlineResources (1) The greedy algorithm is neither particularly fast nor particularly successful at maximizing M. Scalability: Due to the sparsity of the adjacency matrix, the update of the matrix involves a large number of useless operations. The use of data structures for sparse matrices can decrease the complexity of the computational algorithm to , which allows us to analyze is of networks up to nodes. See "Fast Modularity" Community Structure Inference Algorithm http://cs.unm.edu/~aaron/research/fastmodularity.htm for the code. A fast greedy algorithm was proposed by Blondel and collaborators, that can process networks with millions of nodes. For the description of the algorithm see Louvain method: Finding communities in large networks https://sites.google.com/site/findcommunities/ for the code.
  • 96.
  • 97.
    Section 5 OverlappingCommunities G. Palla et al., Nature 435 (2005). A.-L. Barabási, Network Science: Communities.
  • 98.
    Section 5 CliquePercolation (CFinder) Other k-cliques that can not be reached from a particular clique correspond to other clique- communities Start with a k-clique (complete subgraphs of k nodes), a 3-clique for example Start “rolling” the clique over adjacent cliques. Two k-cliques are considered adjacent if they share k-1 nodes A k-clique community is the largest connected subgraph obtained by the union of all adjacent k–cliques G. Palla et al., Nature 435 (2005). A.-L. Barabási, Network Science: Communities.
  • 99.
    Section 5 OverlappingCommunities Bright: • community containing light-related words (glow or dark); • community capturing different colors (yellow, brown) • community consisting of astronomical terms (sun, ray). • community linked to intelligence (gifted, brilliant). A.-L. Barabási, Network Science: Communities.
  • 100.
    Notice that ifa random network is sufficiently dense, there are cliques of varying order. I. Derényi et al., PRL 94 (2005). A.-L. Barabási, Network Science: Communities. Section 5 Could CP communities emerge by chance?
  • 101.
    Notice that ifa random network is sufficiently dense, there are cliques of varying order. A k-clique community emerges in a random graph only if the connection probability exceeds the threshold: I. Derényi et al., PRL 94 (2005). A.-L. Barabási, Network Science: Communities. Section 5 Could CP communities emerge by chance?
  • 102.
    p=0.13 (<pc) p=0.22(>pc) N=20 pc=0.16 I. Derényi et al., PRL 94 (2005). A.-L. Barabási, Network Science: Communities. Random networks with Section 5 Could CP communities emerge by chance?
  • 103.
    p=0.13 (<pc) p=0.22(>pc) N=20 pc=0.16 I. Derényi et al., PRL 94 (2005). A.-L. Barabási, Network Science: Communities. Random networks with Section 5 Could CP communities emerge by chance? Compare your Cfinder output with that you obtain in a random graph with same N and p!
  • 104.
    Section 5 Cliquepercolation Computational complexity: • Finding maximal cliques require exponential time. • However the algorithm has to find only k-cliques, which can be done in polynomial time I. Derényi et al., PRL 94 (2005). A.-L. Barabási, Network Science: Communities.
  • 105.
    Section 5 OnlineResources (CFinder) The CFinder software package that implements the Clique Percolation Method can be downloaded at www.cfinder.org NetworkX
    Section 5 Link Clustering In a social network, a link may indicate that two individuals: • are in the same family; • work together; • share a hobby. In biological networks, each interaction of a protein is responsible for a different function, uniquely defining the protein’s role in the cell. Nodes tend to belong to multiple communities, while links tend to be specific, capturing the nature of the relationship between two nodes. This motivates a hierarchical algorithm based on the similarity of links. Ahn, Bagrow and Lehmann, Nature 466 (2010). A.-L. Barabási, Network Science: Communities.
    Section 6 Link Clustering 1. Define link similarity. Let n+(i) be the list of the neighbors of node i, including itself. For two links e_ik and e_jk that share node k, the similarity S(e_ik, e_jk) = |n+(i) ∩ n+(j)| / |n+(i) ∪ n+(j)| measures the relative number of common neighbors i and j have. Ahn, Bagrow and Lehmann, Nature 466 (2010). A.-L. Barabási, Network Science: Communities.
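The link-similarity definition as an illustrative sketch (the function names are mine, not from the paper's code):

```python
import networkx as nx

def inclusive_neighbors(G, v):
    """n+(v): the neighbors of v, including v itself."""
    return set(G[v]) | {v}

def link_similarity(G, i, j):
    """Jaccard similarity of n+(i) and n+(j), used to compare two
    links e_ik, e_jk that share the node k (Ahn et al., 2010)."""
    ni, nj = inclusive_neighbors(G, i), inclusive_neighbors(G, j)
    return len(ni & nj) / len(ni | nj)

# A triangle with a pendant node: links (0,1) and (2,1) share node 1,
# so we compare the non-shared endpoints 0 and 2.
G = nx.Graph([(0, 1), (1, 2), (0, 2), (2, 3)])
print(link_similarity(G, 0, 2))  # -> 0.75  (|{0,1,2}| / |{0,1,2,3}|)
```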
    Section 5 Link Clustering 2. Apply hierarchical clustering (agglomerative, single linkage). Ahn, Bagrow and Lehmann, Nature 466 (2010). A.-L. Barabási, Network Science: Communities.
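Putting the two steps together, a minimal self-contained sketch (not the authors' reference implementation): compute pairwise link similarities, then run single-linkage agglomerative clustering on the distances 1 - S with SciPy.

```python
from itertools import combinations
import networkx as nx
from scipy.cluster.hierarchy import linkage, fcluster

G = nx.barbell_graph(4, 0)   # two 4-cliques joined by one bridge edge
edges = list(G.edges())

def nplus(v):
    return set(G[v]) | {v}   # n+(v): neighbors of v, including v

def link_sim(e1, e2):
    """Similarity of two links; non-zero only when they share a node."""
    shared = set(e1) & set(e2)
    if not shared:
        return 0.0
    i, j = (set(e1) - shared).pop(), (set(e2) - shared).pop()
    return len(nplus(i) & nplus(j)) / len(nplus(i) | nplus(j))

# Condensed distance matrix (1 - similarity), in combinations order.
dist = [1.0 - link_sim(e1, e2) for e1, e2 in combinations(edges, 2)]
Z = linkage(dist, method="single")          # agglomerative, single linkage
labels = fcluster(Z, t=0.5, criterion="distance")

# Each clique's links form one link community; the bridge link,
# only weakly similar to its neighbors, stays on its own.
print(len(set(labels)))  # -> 3
```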
    Section 5 Link Clustering The network of characters in Victor Hugo’s 1862 novel Les Misérables: two characters are connected if they interact directly with each other in the story. The link colors indicate the clusters, with grey nodes corresponding to single-link clusters. Each node is depicted as a pie chart, illustrating its membership in multiple communities. Not surprisingly, the main character, Jean Valjean, has the most diverse community membership. Ahn, Bagrow and Lehmann, Nature 466 (2010). A.-L. Barabási, Network Science: Communities.
    Section 5 Link Clustering Computational complexity: • Step 1: comparing two links requires max(k1, k2) steps; for scale-free networks this step has complexity O(N^{2/(γ-1)}). • Step 2: hierarchical clustering requires O(L²), i.e. O(N²) for sparse networks. Ahn, Bagrow and Lehmann, Nature 466 (2010). A.-L. Barabási, Network Science: Communities.