Network Science
Communities
Section 2 Zachary’s Karate Club
W.W. Zachary, J. Anthropol. Res. 33:452-473 (1977).
A.-L. Barabási, Network Science: Communities.
Zachary's karate club is a social network of a university karate club,
described in the paper "An Information Flow Model for Conflict and
Fission in Small Groups" by Wayne W. Zachary. The network became a
popular example of community structure in networks after its use by
Michelle Girvan and Mark Newman in 2002.
A social network of a karate club was studied by Wayne
W. Zachary for a period of three years from 1970 to
1972.[2]
The network captures 34 members of a karate club,
documenting links between pairs of members who
interacted outside the club.
During the study a conflict arose between the
administrator "John A" and instructor "Mr. Hi"
(pseudonyms), which led to the split of the club into
two.
Half of the members formed a new club around Mr. Hi;
members from the other part found a new instructor or
gave up karate.
Based on the collected data, Zachary correctly assigned all
but one member of the club to the groups they actually
joined after the split.
Section 2 Zachary’s Karate Club
Citation history of the Zachary Karate Club paper
W.W. Zachary, J. Anthropol. Res. 33:452-473 (1977).
Section 2 Zachary Karate Club Club
The first scientist at any conference on networks
who uses Zachary's karate club as an example is
inducted into the Zachary Karate Club Club, and
awarded a prize.
Chris Moore (9 May 2013).
Mason Porter (NetSci, June 2013).
Yong-Yeol Ahn (Oxford University, July 2013).
Marián Boguñá (ECCS, September 2013).
Mark Newman (NetSci, June 2014).
http://networkkarate.tumblr.com/
Section 2 Auxiliary information
• Karate Club: breakup of the club
• Belgian Phone Data: language spoken
• Belgium appears to be the model bicultural society:
59% of its citizens are Flemish, speaking Dutch, and
40% are Walloons, who speak French.
• In 2007 Vincent Blondel and his students developed
an algorithm to identify the country’s community
structure, starting from the mobile call network.
Section 2 Biological Modules
E. Ravasz et al., Science 297 (2002).
Communities in Metabolic Networks
The E. coli metabolism offers a community structure of biological
systems [11].
a. The biological modules (communities) identified by the Ravasz
algorithm [11] (SECTION 9.3). The color of each node, capturing
the predominant biochemical class to which it belongs, indicates
that different functional classes are segregated in distinct
network neighborhoods. The highlighted region selects the nodes
that belong to the pyrimidine metabolism, one of the predicted
communities.
b. The topological overlap matrix of the E. coli metabolism and the
corresponding dendrogram that allows us to identify the modules
shown in (a). The color of the branches reflects the predominant
biochemical role of the participating molecules, like
carbohydrates (blue), nucleotide and nucleic acid metabolism
(red), and lipid metabolism (cyan).
c. The red right branch of the dendrogram tree shown in (b),
highlighting the region corresponding to the pyrimidine module.
d. The detailed metabolic reactions within the pyrimidine module.
The boxes around the reactions highlight the communities
predicted by the Ravasz algorithm.
Basics of communities
Section 3
What do we really mean by a community?
How many communities are in a network?
How many different ways can we partition a
network into communities?
Section 2 Communities
We focus on the mesoscopic scale of the network
Microscopic Mesoscopic Macroscopic
Section 2 Fundamental Hypothesis
H1: A network’s community structure is
uniquely encoded in its wiring diagram
According to the fundamental hypothesis there is a
ground truth about a network’s community
organization that can be uncovered by inspecting
the adjacency matrix Aij.
Section 3 Basics of Communities
H2: Connectedness Hypothesis
A community corresponds to a connected
subgraph.
H3: Density Hypothesis
Communities correspond to locally dense
neighborhoods of a network.
Section 3 Basics of Communities
Cliques as communities
A clique is a complete subgraph of k nodes.
R.D. Luce & A.D. Perry, Psychometrika 14 (1949)
Section 3 Basics of Communities
• Triangles are frequent; larger cliques
are rare.
• Communities do not necessarily
correspond to complete subgraphs, as
many of their nodes do not link directly
to each other.
• Finding the cliques of a network is
computationally rather demanding,
being a so-called NP-complete problem.
Cliques as communities
Section 3 Basics of Communities
Consider a connected subgraph C of Nc nodes.
Internal degree ki^int: the number of links of node i that connect
to other nodes of the same community C.
External degree ki^ext: the number of links of node i that connect
to the rest of the network.
If ki^ext = 0: all neighbors of i belong to C, and C is a good
community for i.
If ki^int = 0: all neighbors of i belong to other communities,
and i should be assigned to a different community.
Strong and weak communities
Section 3 Basics of Communities
Strong community:
Each node of C has more links within the community than with
the rest of the graph: ki^int(C) > ki^ext(C) for every node i of C.
Weak community:
The total internal degree of C exceeds its total external degree:
Σi∈C ki^int(C) > Σi∈C ki^ext(C).
Clique ⊂ Strong ⊂ Weak
Each clique is a strong community, and each strong community is a weak
community. The converse is generally not true.
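The two definitions can be checked directly from a neighbor-list representation of the graph. A minimal sketch in Python (the function name and graph layout are illustrative, not from the slides):

```python
def community_type(adj, C):
    """Classify a node set C as a strong, weak, or neither community.

    adj: dict mapping each node to a list of its neighbors.
    C:   iterable of nodes forming the candidate community.
    """
    C = set(C)
    k_int = {i: sum(1 for j in adj[i] if j in C) for i in C}  # internal degrees
    k_ext = {i: len(adj[i]) - k_int[i] for i in C}            # external degrees
    if all(k_int[i] > k_ext[i] for i in C):
        return "strong"   # every node has more links inside than outside
    if sum(k_int.values()) > sum(k_ext.values()):
        return "weak"     # only the totals satisfy the inequality
    return "neither"
```

For example, in two triangles joined by one bridge link, each triangle is a strong community: the bridge node still has internal degree 2 against external degree 1.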
Section 3 Number of Partitions
How many ways can we partition a network into 2 communities?
Graph bisection: divide a network into two equal non-overlapping subgraphs, such that the
number of links between the nodes in the two groups is minimized.
Two subgroups of size n1 and n2; total number of combinations: N! / (n1! n2!).
N = 10 → 256 partitions (1 ms)
N = 100 → ~10^26 partitions (10^21 years)
Section 3 Graph Partitions (history)
An integrated circuit can contain 2.5 billion transistors:
partition its full wiring diagram into smaller
subgraphs so as to minimize the
number of connections between them.
Graph Partitioning
Two-way partitioning problem:
• Each node has unit size
• Each edge has unit weight
Find two partitions V1 and V2 such that
• each of V1 and V2 has equal size, and
• the external wiring (the size of the cut set) is minimized.
[Figure: a 16-node graph with an st-numbering, s = 1, t = 16.]
st-numbering: number the nodes from 1 to N so that s receives 1, t receives N,
and every other node i has two neighbors j and k with j < i < k.
[Figure: a bipartition induced by the st-numbering; size of cutset = 4.]
[Figure: a different st-numbering of the same graph; size of cutset = 3.]
To find a bipartition with the minimum cutset, we would have to enumerate all
bipartitions, i.e., enumerate all st-numberings.
The two-way partitioning problem (equal-size partitions V1 and V2 with minimum
cut-set) is an NP-hard problem.
Heuristic techniques are used to approximate solutions.
Section 3 Graph Partitions (history)
Kernighan–Lin Algorithm for graph bisection
• Partition a network into two groups of
predefined size. The set of links between the two
groups is called the cut.
• Inspect each pair of nodes, one from each
group. Identify the pair that results in the largest
reduction of the cut size (links between the two
groups) if we swap them.
• Swap them.
• If no pair reduces the cut size, we swap the pair
that increases the cut size the least.
• The process is repeated until each node has been
moved once.
Fiduccia–Mattheyses (FM) Partitioning Algorithm
Kernighan-Lin (KL) Algorithm
Initial partition A, B with |A| = |B| = n.
Size of the cut set: T = Σa∈A Σb∈B C(a, b), where C(a, b) is the weight of the
edge between a and b.
We have to minimize the size of the cut set.
Initial partition A, B → optimal partition A*, B*.
Swap a subset Y ⊆ B with a subset X ⊆ A, |X| = |Y|, such that
A* = (A − X) ∪ Y and B* = (B − Y) ∪ X.
How to find X and Y?
Kernighan-Lin (KL) Algorithm
• Iterate as long as the cutsize improves:
• Find a pair of vertices that result in the largest decrease in
cutsize if exchanged
• Exchange the two vertices (potential move)
• “Lock” the vertices
• If no improvement possible, and
still some vertices unlocked, then
exchange vertices that result in smallest increase in cutsize
Kernighan-Lin (KL) Algorithm
• Initialize
• Bipartition G into V1 and V2, s.t. |V1| = |V2| ± 1
• n = |V|
• Repeat
• for i = 1 to n/2
• Find a pair of unlocked vertices va_i ∈ V1 and vb_i ∈ V2 whose
exchange makes the largest decrease or smallest increase
in cut-cost
• Mark va_i and vb_i as locked
• Store the gain g_i
• Find k s.t. Gain_k = Σ_{i=1..k} g_i is maximized
• If Gain_k > 0 then
move va_1, ..., va_k from V1 to V2 and
vb_1, ..., vb_k from V2 to V1
• Until Gain_k ≤ 0
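The pseudocode above can be sketched compactly in Python. This is a minimal single-pass implementation assuming unit edge weights; `adj` is a neighbor-list dict, and the helper names are illustrative:

```python
def cut_size(adj, A):
    """Number of links with exactly one endpoint in A (unit weights)."""
    A = set(A)
    return sum(1 for u in A for v in adj[u] if v not in A)

def kl_pass(adj, A, B):
    """One Kernighan-Lin pass: tentatively swap the best pair, lock the
    swapped nodes, then keep only the prefix of swaps with the best
    cumulative gain and undo the rest."""
    A, B = set(A), set(B)
    def D(v):                                  # D = E - I for v's current side
        side = A if v in A else B
        ext = sum(1 for w in adj[v] if w not in side)
        return ext - (len(adj[v]) - ext)
    unlockedA, unlockedB = set(A), set(B)
    swaps, gains = [], []
    while unlockedA and unlockedB:
        # gain of swapping (x, y) is D(x) + D(y) - 2*c(x, y)
        g, a, b = max((D(x) + D(y) - 2 * (y in adj[x]), x, y)
                      for x in unlockedA for y in unlockedB)
        A.remove(a); B.add(a); B.remove(b); A.add(b)   # tentative swap
        unlockedA.remove(a); unlockedB.remove(b)       # lock both nodes
        swaps.append((a, b)); gains.append(g)
    best_k, best_gain, run = 0, 0, 0
    for k, g in enumerate(gains, 1):           # best prefix of swaps
        run += g
        if run > best_gain:
            best_k, best_gain = k, run
    for a, b in swaps[best_k:]:                # undo the remaining swaps
        B.remove(a); A.add(a); A.remove(b); B.add(b)
    return A, B, best_gain
```

On the six-node example used later in these slides (A = {2, 3, 4}, B = {1, 5, 6}), a single pass swaps nodes 4 and 1, reducing the cut from 3 to 1.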
Kernighan-Lin (KL) Example [©Sarrafzadeh]
Vertices a–h; the table records each tentative swap:
Step No. | Vertex Pair | Gain | Gain sum | Cut-cost
0        | --          |  0   |  0       | 5
1        | { d, g }    |  3   |  3       | 2
2        | { c, f }    |  1   |  4       | 1
3        | { b, h }    | -2   |  2       | 3
4        | { a, e }    | -2   |  0       | 5
Kernighan-Lin (KL) : Analysis
• Time complexity?
• Inner (for) loop
• Iterates n/2 times
• Iteration 1: (n/2) × (n/2) pair comparisons
• Iteration i: (n/2 − i + 1)² pair comparisons
• Passes? Usually independent of n
• O(n3)
• Drawbacks?
• Local optimum
• Balanced partitions only
• No weight for the vertices
• High time complexity
Internal and external cost [©Kang]
Consider the partition A = {a1, ..., an}, B = {b1, ..., bm}.
Internal cost of x ∈ A: Ix = Σai∈A C(x, ai), the total weight of the edges
from x to nodes of its own group.
External cost of x ∈ A: Ex = Σbj∈B C(x, bj), the total weight of the edges
from x to the other group.
D value: Dx = Ex − Ix.
Likewise, for y ∈ B: Iy, Ey, and Dy = Ey − Iy.
Gain Calculation [©Kang]
• Lemma: Consider any ai ∈ A, bj ∈ B.
If ai, bj are interchanged, the gain is
g = Dai + Dbj − 2 C(ai, bj)
• Proof:
Total cost before interchange (T) between A and B:
T = Eai + Ebj − C(ai, bj) + (cost for all others)
Total cost after interchange (T') between A and B:
T' = Iai + Ibj + C(ai, bj) + (cost for all others)
Therefore
g = T − T' = (Eai − Iai) + (Ebj − Ibj) − 2 C(ai, bj) = Dai + Dbj − 2 C(ai, bj)
Gain Calculation (cont.) [©Kang]
• Lemma: Let Dx', Dy' be the new D values for elements of
A − {ai} and B − {bj}. Then after interchanging ai and bj,
Dx' = Dx + 2 C(x, ai) − 2 C(x, bj), for x ∈ A − {ai}
Dy' = Dy + 2 C(y, bj) − 2 C(y, ai), for y ∈ B − {bj}
• Proof:
• The edge x–ai changed from internal (in Dx) to external (in Dx')
• The edge y–bj changed from internal (in Dy) to external (in Dy')
• The x–bj edge changed from external to internal
• The y–ai edge changed from external to internal
• More clarification in the next two slides
Clarification of the Lemma [©Kang]
• Decompose Ix and Ex to separate the edges to ai and bj.
Before the move:
Ix = Ixa + C(x, ai),  Ex = Exb + C(x, bj)
where Ixa and Exb collect all the remaining internal and external edges of x.
• After the move, the edge to ai becomes external and the edge to bj
becomes internal:
Ix' = Ixa + C(x, bj),  Ex' = Exb + C(x, ai)
• Hence
Dx' = Ex' − Ix' = (Exb + C(x, ai)) − (Ixa + C(x, bj)) = Dx + 2 C(x, ai) − 2 C(x, bj)
Example: KL
• Step 1 - Initialization
A = {2, 3, 4}, B = {1, 5, 6}
A’ = A = {2, 3, 4}, B’ = B = {1, 5, 6}
• Step 2 - Compute D values
D1 = E1 - I1 = 1-0 = +1
D2 = E2 - I2 = 1-2 = -1
D3 = E3 - I3 = 0-1 = -1
D4 = E4 - I4 = 2-1 = +1
D5 = E5 - I5 = 1-1 = +0
D6 = E6 - I6 = 1-1 = +0
[©Kang]
[Figure: the six-node example graph with initial partition A = {2, 3, 4}, B = {1, 5, 6}.]
Example: KL (cont.)
• Step 3 - compute gains
g21 = D2 + D1 - 2C21 = (-1) + (+1) - 2(1) = -2
g25 = D2 + D5 - 2C25 = (-1) + (+0) - 2(0) = -1
g26 = D2 + D6 - 2C26 = (-1) + (+0) - 2(0) = -1
g31 = D3 + D1 - 2C31 = (-1) + (+1) - 2(0) = 0
g35 = D3 + D5 - 2C35 = (-1) + (0) - 2(0) = -1
g36 = D3 + D6 - 2C36 = (-1) + (0) - 2(0) = -1
g41 = D4 + D1 - 2C41 = (+1) + (+1) - 2(0) = +2
g45 = D4 + D5 - 2C45 = (+1) + (+0) - 2(+1) = -1
g46 = D4 + D6 - 2C46 = (+1) + (+0) - 2(+1) = -1
• The largest g value is g41 = +2
interchange 4 and 1 (a1, b1) = (4, 1)
A’ = A’ - {4} = {2, 3}
B’ = B’ - {1} = {5, 6} both not empty
[©Kang]
Example: KL (cont.)
• Step 4 - update D values of node connected to vertices (4, 1)
D2’ = D2 + 2C24 - 2C21 = (-1) + 2(+1) - 2(+1) = -1
D5’ = D5 + 2C51 - 2C54 = +0 + 2(0) - 2(+1) = -2
D6’ = D6 + 2C61 - 2C64 = +0 + 2(0) - 2(+1) = -2
• Assign Di = Di’, repeat step 3 :
g25 = D2 + D5 - 2C25 = -1 - 2 - 2(0) = -3
g26 = D2 + D6 - 2C26 = -1 - 2 - 2(0) = -3
g35 = D3 + D5 - 2C35 = -1 - 2 - 2(0) = -3
g36 = D3 + D6 - 2C36 = -1 - 2 - 2(0) = -3
• All values are equal;
arbitrarily choose g36 = -3  (a2, b2) = (3, 6)
A’ = A’ - {3} = {2}, B’ = B’ - {6} = {5}
New D values are:
D2’ = D2 + 2C23 - 2C26 = -1 + 2(1) - 2(0) = +1
D5’ = D5 + 2C56 - 2C53 = -2 + 2(1) - 2(0) = +0
• New gain with D2  D2’, D5  D5’
g25 = D2 + D5 - 2C52 = +1 + 0 - 2(0) = +1  (a3, b3) = (2, 5) [©Kang]
Example: KL (cont.)
• Step 5 - Determine the # of
moves to take
g1 = +2
g1 + g2 = +2 - 3 = -1
g1 + g2 + g3 = +2 - 3 + 1 = 0
• The value of k for max G is 1
X = {a1} = {4}, Y = {b1} = {1}
• Move X to B, Y to A  A = {1, 2, 3}, B = {4, 5, 6}
• Repeat the whole process:
• • • • •
• The final solution is A = {1, 2, 3}, B = {4, 5, 6}
[Figure: final partition A = {1, 2, 3}, B = {4, 5, 6}.]
Section 3 Number of communities
Community detection
The number and size of the communities are unknown at the beginning.
Partition
Division of a network into groups of nodes, so that each node belongs to one group.
Bell Number: number of possible partitions
of N nodes
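The Bell number grows super-exponentially with N, which is why brute-force enumeration of all partitions is hopeless. A small sketch using the Bell-triangle recurrence:

```python
def bell(n):
    """B_n: the number of ways to partition a set of n labeled elements.
    Bell triangle: each row starts with the last entry of the previous
    row, and each subsequent entry adds the entry above-left."""
    row = [1]
    for _ in range(n):
        nxt = [row[-1]]
        for x in row:
            nxt.append(nxt[-1] + x)
        row = nxt
    return row[0]
```

For example bell(3) = 5, bell(10) = 115,975, and for a 34-node network like the karate club the count is already astronomical.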
Hierarchical Clustering
Section 4
Section 4 Hierarchical Clustering
Agglomerative algorithms merge nodes and communities with high
similarity.
Divisive algorithms split communities by removing links that connect
nodes with low similarity.
1. Build a similarity matrix for the network. The similarity matrix captures
how similar two nodes are to each other, and must be determined from the
adjacency matrix.
2. Hierarchical clustering iteratively identifies groups of nodes with high
similarity, following one of the two strategies above.
3. Hierarchical tree or dendrogram: visualizes the history of the merging or
splitting process the algorithm follows. Horizontal cuts of this tree offer
various community partitions.
Section 4 Agglomerative Algorithms
Step 1: Define the Similarity Matrix (Ravasz algorithm)
• High for node pairs that likely belong to the same
community, low for those that likely belong to different
communities.
• Nodes that connect directly to each other and/or share
multiple neighbors are more likely to belong to the same
dense local neighborhood, hence their similarity should
be large.
Topological overlap matrix:
JN(i, j): the number of common neighbors of
nodes i and j, plus one if there is a direct link
between i and j.
E. Ravasz et al., Science 297 (2002).
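The similarity used in Step 1 can be sketched as follows. Note that slightly different normalizations of JN(i, j) appear in the literature; the denominator below (the smaller of the two degrees) is one common choice, not the only one:

```python
def topological_overlap(adj, i, j):
    """Topological overlap of nodes i and j: common neighbors of i and j
    (plus 1 if i and j are directly linked), normalized by the smaller
    of the two degrees."""
    common = len(set(adj[i]) & set(adj[j]))
    j_n = common + (1 if j in adj[i] else 0)
    return j_n / min(len(adj[i]), len(adj[j]))
```

On a 4-cycle, opposite nodes share both neighbors (overlap 1), while adjacent nodes share none and are linked directly (overlap 0.5), so the measure is high exactly for pairs embedded in the same dense neighborhood.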
Agglomerative algorithms merge nodes and communities with high similarity.
Section 4 Agglomerative Algorithms
E. Ravasz et al., Science 297 (2002).
Step 2: Decide Group Similarity
• Groups are merged based on their mutual similarity through single, complete or
average cluster linkage
Section 4 Agglomerative Algorithms
Step 3: Apply Hierarchical Clustering
• Assign each node to a community of its own and evaluate the similarity
for all node pairs. The initial similarities between these “communities” are
simply the node similarities.
• Find the community pair with the highest similarity and merge them to
form a single community.
• Calculate the similarity between the new community and all other
communities.
• Repeat from Step 2 until all nodes are merged into a single community.
Step 4: Build Dendrogram
• Describes the precise order in which the nodes are assigned to
communities.
E. Ravasz et al., Science 297 (2002).
Section 4 Agglomerative Algorithms
Computational complexity:
• Step 1 (calculation of the similarity matrix)
• Steps 2-3 (group similarity)
• Step 4 (dendrogram)
E. Ravasz et al., Science 297 (2002).
Section 4 Divisive Algorithms
Step 1: Define a Centrality Measure (Girvan-Newman algorithm)
• Link betweenness is the number of shortest paths
between all node pairs that run along a link.
• Random-walk betweenness. A pair of nodes m and n are
chosen at random. A walker starts at m, following each
adjacent link with equal probability until it reaches n.
Random walk betweenness xij is the probability that the
link i→j was crossed by the walker after averaging over
all possible choices for the starting nodes m and n
Divisive algorithms split communities by removing links that connect nodes
with low similarity.
M. Girvan & M.E.J. Newman, PNAS 99 (2002).
Section 4 Divisive Algorithms
M. Girvan & M.E.J. Newman, PNAS 99 (2002).
Step 2: Hierarchical Clustering
a) Compute the centrality of
each link.
b) Remove the link with the
largest centrality; in case of a
tie, choose one at random.
c) Recalculate the centrality of
each link for the altered
network.
d) Repeat until all links are
removed (this yields a
dendrogram).
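Steps (a)-(d) hinge on link betweenness, which can be sketched with Brandes-style accumulation (shortest-path version, unit-length links; the function names are illustrative):

```python
from collections import deque

def edge_betweenness(adj):
    """Shortest-path betweenness of every link (Brandes accumulation)."""
    eb = {}
    for u in adj:
        for v in adj[u]:
            eb[tuple(sorted((u, v)))] = 0.0
    for s in adj:
        dist, sigma = {s: 0}, {v: 0 for v in adj}
        sigma[s] = 1
        preds = {v: [] for v in adj}
        order, q = [], deque([s])
        while q:                              # BFS with path counts sigma
            v = q.popleft()
            order.append(v)
            for w in adj[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    q.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        delta = {v: 0.0 for v in adj}
        for w in reversed(order):             # back-propagate dependencies
            for v in preds[w]:
                c = sigma[v] / sigma[w] * (1 + delta[w])
                eb[tuple(sorted((v, w)))] += c
                delta[v] += c
    return {e: b / 2 for e, b in eb.items()}  # each pair counted twice

def girvan_newman_step(adj):
    """Remove the link with the largest betweenness (step b)."""
    eb = edge_betweenness(adj)
    u, v = max(eb, key=eb.get)
    adj[u].remove(v)
    adj[v].remove(u)
    return (u, v)
```

On two triangles joined by a bridge, the bridge carries all 9 cross-pairs of shortest paths, so it is removed first, splitting the network into its two communities.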
Section 4 Divisive Algorithm
M. Girvan & M.E.J. Newman, PNAS 99 (2002).
Computational complexity:
• Step 1a (calculation betweenness
centrality):
• Step 1b (Recalculation of betweenness
centrality for all links):
for sparse networks
Section 4 Hierarchy in networks
Can a hierarchical network be scale-free?
Section 4 Hierarchy in networks
(1) Scale-free property
The obtained network is scale-free, its
degree distribution following a power-
law with
E. Ravasz & A.-L. Barabási, PRE 67 (2003).
A construction
Section 4 Hierarchy in networks
(2) Clustering coefficient scaling with k
Small k nodes:
*high clustering
coefficient;
*their neighbors tend to
link to each other in highly
interlinked, compact
communities.
High k nodes (hubs):
*small clustering coefficient;
*connect independent
communities.
E. Ravasz & A.-L. Barabási, PRE 67 (2003).
Section 4 Hierarchy in networks
(3) Clustering coefficient independent of N
E. Ravasz & A.-L. Barabási, PRE 67 (2003).
1. Scale-free
2. Scaling clustering coefficient (DGM)
3. Clustering coefficient independent of N
E. Ravasz & A.-L. Barabási, PRE 67 (2003).
Section 4 Hierarchy in networks
Section 4 Hierarchy in real networks
POWER GRID INTERNET
Section 4 Ambiguity in Hierarchical clustering
Where to “cut”?
Phylogenetic dendrograms
In bioinformatics, clusters and dendrograms have been studied for a long time.
For example, the sequences of the same protein or gene in different species are
selected and compared with each other. A similarity matrix is constructed
between these sequences by looking at how many amino acids/nucleotides stay
in place.
Modularity
Section 4
Section 4 Modularity
MEJ Newman, PNAS 103 (2006).
H4: Random Hypothesis
Randomly wired networks are not expected to have a community structure.
Imagine a partition into nc communities. Modularity compares the original data
(Aij) with the expected connections in a randomly rewired model (a random
network with the same degrees, pij = ki kj / 2L), relative to a specific
partition:
M = Σc=1..nc (1/2L) Σ(i,j)∈Cc (Aij − ki kj / 2L)
Modularity is a measure associated to a partition.
Section 4 Modularity
Another way of writing M
MEJ Newman, PNAS 103 (2006).
We can rewrite the first term as (1/2L) Σ(i,j)∈Cc Aij = Lc / L, where Lc is the
number of links within community Cc. In a similar fashion, the second term
becomes (1/2L) Σ(i,j)∈Cc ki kj / 2L = (kc / 2L)², where kc is the total degree
of the nodes in Cc. Finally we get:
M = Σc=1..nc [ Lc / L − (kc / 2L)² ]
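The final expression translates directly into code. A minimal sketch for a neighbor-list graph (the names are illustrative):

```python
def modularity(adj, communities):
    """M = sum_c [ L_c / L - (k_c / 2L)^2 ] for a given partition."""
    L = sum(len(nbrs) for nbrs in adj.values()) / 2   # total number of links
    M = 0.0
    for c in communities:
        cset = set(c)
        Lc = sum(1 for u in cset for v in adj[u] if v in cset) / 2  # internal links
        kc = sum(len(adj[u]) for u in cset)                         # total degree
        M += Lc / L - (kc / (2 * L)) ** 2
    return M
```

For two triangles joined by one link, the partition into the two triangles gives M = 2·(3/7 − (7/14)²) = 5/14 ≈ 0.357, while putting the whole network in one community gives M = 0.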
Section 4 Modularity
MEJ Newman, PNAS 103 (2006).
H5: Maximal Modularity Hypothesis
The partition with the maximum modularity M for a given network offers the
optimal community structure.
Goal: find the partition that maximizes M.
Section 4 Modularity
• Optimal partition, that
maximizes the modularity.
• Sub-optimal but positive
modularity.
• Negative Modularity: If we
assign each node to a different
community.
• Zero modularity: Assigning all
nodes to the same community,
independent of the network
structure.
• Modularity is size dependent
Which partition ?
Section 4 Modularity based community identification
A greedy algorithm, which iteratively joins nodes if the move increases the new
partition’s modularity.
Step 1. Assign each node to a community of its own. Hence we start with N
communities.
Step 2. Inspect each pair of communities connected by at least one link and
compute the modularity variation ΔM obtained if we merge these two communities.
Step 3. Identify the community pair for which ΔM is the largest and merge them.
Note that the modularity of a particular partition is always calculated from the
full topology of the network.
Step 4. Repeat Step 2 until all nodes are merged into a single community.
Step 5. Record M for each step and select the partition for which the modularity
is maximal.
MEJ Newman, PRE 69 (2004).
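The five steps can be sketched as follows. This is a slow reference implementation that recomputes M from scratch at every merge; the fast variants discussed in the text avoid exactly this:

```python
def greedy_modularity(adj):
    """Greedy agglomeration: start from singleton communities, always
    merge the connected pair whose merge yields the largest modularity,
    and return the best partition encountered along the way."""
    L = sum(len(v) for v in adj.values()) / 2
    label = {u: u for u in adj}                 # node -> community id

    def partition_modularity():
        comms = {}
        for u, c in label.items():
            comms.setdefault(c, set()).add(u)
        M = 0.0
        for c in comms.values():
            Lc = sum(1 for u in c for v in adj[u] if v in c) / 2
            kc = sum(len(adj[u]) for u in c)
            M += Lc / L - (kc / (2 * L)) ** 2
        return M, list(comms.values())

    best_M, best_part = partition_modularity()
    while True:
        pairs = {tuple(sorted((label[u], label[v])))
                 for u in adj for v in adj[u] if label[u] != label[v]}
        if not pairs:
            break
        trial = []
        for a, b in pairs:                      # evaluate every candidate merge
            saved = dict(label)
            for u in saved:
                if saved[u] == b:
                    label[u] = a
            trial.append((partition_modularity()[0], a, b))
            label = saved                       # undo the trial merge
        _, a, b = max(trial)                    # best (possibly negative) change
        for u in list(label):
            if label[u] == b:
                label[u] = a
        M, part = partition_modularity()
        if M > best_M:
            best_M, best_part = M, part
    return best_M, best_part
```

On two triangles joined by a bridge, the recorded maximum is M = 5/14, reached exactly when the two triangles form the two communities.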
Section 4 Modularity
Which partition ?
Modularity can be used to compare different partitions provided by
other algorithms, like hierarchical clustering
It can be used to design new algorithms, aiming at maximizing M
Section 4 Modularity for the Girvan-Newman algorithm
Which partition ?
Section 4 Modularity based community identification
MEJ Newman, PRE 69 (2004).
Computational complexity:
• Step 1-2 (calculation of ΔM for L links ):
• Step 3 (matrix update):
• Step 4 (N-1 community merges):
for sparse networks
Section 4 Limits of Modularity
Resolution limit
Consider two communities A and B with total degrees kA and kB, connected to
each other by lAB links. Merging them changes the modularity by
ΔMAB = lAB / L − kA kB / (2L²)
If lAB ≥ 1 and kA kB / (2L) < 1, then ΔMAB > 0 and we merge A and B to
maximize modularity.
Assuming kA = kB = k, this happens whenever k ≤ √(2L).
Modularity has a resolution limit, as it cannot detect communities smaller than
this size.
Section 4 Limits of Modularity
One maximum?
Section 4 Limits of Modularity
Null models
The null model (the expected connections) can take into account weights,
directions, and node attributes or space.
S. Fortunato, Phys. Rep. 486 (2010)
P. Expert et al., PNAS 108 (2011)
Section 5 Online Resources (Modularity)
Gephi
NetworkX
R assigns self-loops to nodes to increase or decrease the aversion of nodes to form communities
Finds the partition that maximizes modularity
(considers weights and direction)
Calculates the modularity of the partition you
provide
Section 4 Online Resources (1)
The greedy algorithm is neither particularly fast nor particularly successful at
maximizing M.
Scalability: due to the sparsity of the adjacency matrix, updating the matrix
involves a large number of useless operations. The use of data structures for
sparse matrices can decrease the complexity of the computation, allowing us to
analyze much larger networks. See the "Fast Modularity" Community Structure
Inference Algorithm, http://cs.unm.edu/~aaron/research/fastmodularity.htm, for
the code.
A fast greedy algorithm was proposed by Blondel and collaborators that can
process networks with millions of nodes. For a description of the algorithm, see
the Louvain method: Finding communities in large networks,
https://sites.google.com/site/findcommunities/, for the code.
Overlapping Communities
Section 6
Section 5 Overlapping Communities
G. Palla et al., Nature 435 (2005).
Section 5 Clique Percolation (CFinder)
Start with a k-clique (a complete subgraph of k nodes), a 3-clique for example.
Start "rolling" the clique over adjacent cliques. Two k-cliques are considered
adjacent if they share k − 1 nodes.
A k-clique community is the largest connected subgraph obtained by the union of
all adjacent k-cliques.
Other k-cliques that cannot be reached from a particular clique correspond to
other clique-communities.
G. Palla et al., Nature 435 (2005).
Section 5 Overlapping Communities
Bright:
• community containing
light-related words (glow
or dark);
• community capturing
different colors (yellow,
brown)
• community consisting of
astronomical terms (sun,
ray).
• community linked to
intelligence (gifted,
brilliant).
Section 5 Could CP communities emerge by chance?
Notice that if a random network is sufficiently dense, there are cliques of
varying order.
A k-clique community emerges in a random graph only if the connection
probability exceeds the threshold pc(k) = [(k − 1) N]^(−1/(k−1)).
I. Derényi et al., PRL 94 (2005).
Section 5 Could CP communities emerge by chance?
Random networks with N = 20 (pc = 0.16):
p = 0.13 (< pc)    p = 0.22 (> pc)
I. Derényi et al., PRL 94 (2005).
Section 5 Could CP communities emerge by chance?
Compare your CFinder output with what you obtain
in a random graph with the same N and p!
Section 5 Clique percolation
Computational complexity:
• Finding maximal cliques requires exponential time.
• However, the algorithm only has to find k-cliques, which can be done in
polynomial time.
I. Derényi et al., PRL 94 (2005).
Section 5 Online Resources (CFinder)
The CFinder software package that implements the Clique
Percolation Method can be downloaded at
www.cfinder.org
NetworkX
Section 5 Link Clustering
In social networks, a link may indicate that two people:
• are in the same family,
• work together, or
• share a hobby.
In biological networks, each interaction of a protein is responsible for a
different function, uniquely defining the protein’s role in the cell.
Nodes tend to belong to multiple communities, while links tend to be specific,
capturing the nature of the relationship between two nodes.
Ahn, Bagrow and Lehmann, Nature 466 (2010).
Define a hierarchical algorithm based on the similarity of links.
Section 6 Link Clustering
n+(i): the list of the neighbors of node i,
including itself.
S measures the relative number of common
neighbors of i and j.
Ahn, Bagrow and Lehmann, Nature 466 (2010).
1. Define link similarity
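For two links e(i, k) and e(j, k) attached to a common node k, the similarity is the Jaccard index of the inclusive neighborhoods, S = |n+(i) ∩ n+(j)| / |n+(i) ∪ n+(j)|. A one-function sketch (names illustrative):

```python
def link_similarity(adj, i, j):
    """Similarity of two links e(i, k) and e(j, k) attached to a common
    node k: Jaccard index of the inclusive neighborhoods of i and j."""
    ni = set(adj[i]) | {i}     # n+(i): neighbors of i, including i itself
    nj = set(adj[j]) | {j}
    return len(ni & nj) / len(ni | nj)
```

On a 4-cycle, the two links meeting at a node have similarity 0.5: their inclusive neighborhoods share two of four nodes.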
Section 5 Link Clustering
2. Apply hierarchical clustering (agglomerative, single linkage)
Ahn, Bagrow and Lehmann, Nature 466 (2010).
Section 5 Link Clustering
Ahn, Bagrow and Lehmann, Nature 466 (2010).
The network of characters in Victor
Hugo’s 1862 novel Les
Misérables. Two characters are
connected if they interact directly
with each other in the story. The
link colors indicate the clusters,
grey nodes corresponding to
single-link clusters. Each node is
depicted as a pie chart, illustrating
its membership in multiple
communities. Not surprisingly, the
main character, Jean Valjean, has
the most diverse community
membership.
Section 5 Link Clustering
Ahn, Bragow and Lehmann, Nature 466 (2010).
A.-L. Barabási, Network Science: Communities.
Computational complexity:
• Step 1: comparing two links requires max(k1, k2) steps. For scale-free networks this step has complexity O(N^(2/(γ-1))).
• Step 2: hierarchical clustering requires O(L^2) steps, i.e. O(N^2) for sparse networks.

Communities in Network Science

  • 1.
  • 2.
    Section 2 Zachary’sKarate Club W.W. Zachary, J. Anthropol. Res. 33:452-473 (1977). A.-L. Barabási, Network Science: Communities. Zachary's karate club is a social network of a university karate club, described in the paper "An Information Flow Model for Conflict and Fission in Small Groups" by Wayne W. Zachary. The network became a popular example of community structure in networks after its use by Michelle Girvan and Mark Newman in 2002.
  • 3.
    A social networkof a karate club was studied by Wayne W. Zachary for a period of three years from 1970 to 1972.[2] The network captures 34 members of a karate club, documenting links between pairs of members who interacted outside the club. During the study a conflict arose between the administrator "John A" and instructor "Mr. Hi" (pseudonyms), which led to the split of the club into two. Half of the members formed a new club around Mr. Hi; members from the other part found a new instructor or gave up karate. Based on collected data Zachary correctly assigned all but one member of the club to the groups they actually joined after the split.
  • 4.
    Section 2 Zachary’sKarate Club Citation history of the Zachary’s Karate club paper W.W. Zachary, J. Anthropol. Res. 33:452-473 (1977). A.-L. Barabási, Network Science: Communities.
  • 5.
    Section 2 ZacharyKarate Club Club The first scientist at any conference on networks who uses Zachary's karate club as an example is inducted into the Zachary Karate Club Club, and awarded a prize. Chris Moore (9 May 2013). Mason Porter (NetSci, June 2013). Yong-Year Ahn (Oxford University, July 2013) Marián Boguñá (ECCS, September 2013). Mark Newman (Netsci, June 2014) http://networkkarate.tumblr.com/)
  • 6.
    Section 2 Auxiliaryinformation  Karate Club: Breakup of the club  Belgian Phone Data: Language spoken
  • 7.
    • Belgium appearsto be the model bicultural society: 59% of its citizens are Flemish, speaking Dutch and 40% are Walloons who speak French. • Vincent Blondel and his students in 2007 developed an algorithm to identify the country’s community structure. They started from the mobile call network.
  • 8.
    Section 2 BiologicalModules E. Ravasz et al., Science 297 (2002). A.-L. Barabási, Network Science: Communities. The E. coli metabolism offers a community structure of biological systems. a.The biological modules (communities) identified by the Ravasz algorithm
  • 9.
    Communities in MetabolicNetworks The E. coli metabolism offers a community structure of biological systems [11]. a.The biological modules (communities) identified by the Ravasz algorithm [11] (SECTION 9.3). The color of each node, capturing the predominant biochemical class to which it belongs, indicates that different functional classes are segregated in distinct network neighborhoods. The highlighted region selects the nodes that belong to the pyrimidine metabolism, one of the predicted communities. b.The topologic overlap matrix of the E. coli metabolism and the corresponding dendrogram that allows us to identify the modules shown in (a). The color of the branches reflect the predominant biochemical role of the participating molecules, like carbohydrates (blue), nucleotide and nucleic acid metabolism (red), and lipid metabolism (cyan). c.The red right branch of the dendrogram tree shown in (b), highlighting the region corresponding to the pyridine module. d.The detailed metabolic reactions within the pyrimidine module. The boxes around the reactions highlight the communities predicted by the Ravasz algorithm.
  • 10.
  • 11.
    What do wereally mean by a community? How many communities are in a network? How many different ways can we partition a network into communities?
  • 12.
    Section 2 Communities A.-L.Barabási, Network Science: Communities. We focus on the mesoscopic scale of the network Microscopic Mesoscopic Macroscopic
  • 13.
    Section 2 FundamentalHypothesis A.-L. Barabási, Network Science: Communities. H1: A network’s community structure is uniquely encoded in its wiring diagram According to the fundamental hypothesis there is a ground truth about a network’s community organization, that can be uncovered by inspecting Aij.
  • 14.
    Section 3 Basicsof Communities H2: Connectedness Hypothesis A community corresponds to a connected subgraph. H3: Density Hypothesis Communities correspond to locally dense neighborhoods of a network. A.-L. Barabási, Network Science: Communities.
  • 15.
    Section 3 Basicsof Communities H2: Connectedness Hypothesis A community corresponds to a connected subgraph. H3: Density Hypothesis Communities correspond to locally dense neighborhoods of a network. A.-L. Barabási, Network Science: Communities.
  • 16.
    Section 3 Basicsof Communities Cliques as communities A clique is a complete subgraph of k-nodes R.D. Luce & A.D. Perry, Psychometrika 14 (1949) A.-L. Barabási, Network Science: Communities.
  • 17.
    Section 3 Basicsof Communities • Triangles are frequent; larger cliques are rare. • Communities do not necessarily correspond to complete subgraphs, as many of their nodes do not link directly to each other. • Finding the cliques of a network is computationally rather demanding, being a so-called NP-complete problem. Cliques as communities
  • 18.
    Section 3 Basicsof Communities Consider a connected subgraph C of Nc nodes Internal degree, ki int : set of links of node i that connects to other nodes of the same community C. External degree ki ext: the set of links of node i that connects to the rest of the network. If ki ext=0: all neighbors of i belong to C, and C is a good community for i. If ki int=0, all neighbors of i belong to other communities, then i should be assigned to a different community. Strong and weak communities A.-L. Barabási, Network Science: Communities.
  • 19.
    Section 3 Basicsof Communities Strong community: Each node of C has more links within the community than with the rest of the graph. Weak community: The total internal degree of C exceeds its total external degree, Clique Strong Weak A.-L. Barabási, Network Science: Communities. Each clique is a strong community and each strong community is a week community. The converse is generally not true.
  • 20.
    Section 3 Numberof Partitions How many ways can we partition a network into 2 communities? Divide a network into two equal non-overlapping subgraphs, such that the number of links between the nodes in the two groups is minimized. Two subgroups of size n1 and n2. Total number of combinations: N=10  256 partitions (1 ms) N=100 1026 partitions (1021 years) Graph bisection A.-L. Barabási, Network Science: Communities.
  • 21.
    Section 3 GraphPartitions (history) 2.5 billion transistors partition the full wiring diagram of an integrated circuit into smaller subgraphs, so that they minimize the number of connections between them. Graph Partitioning
  • 22.
    Two-way partitioning problem Eachnode has unit size Each edge has unit weight Find two partition V1 and V2 such that Each of V1 and V2 has equal size External wiring will be minimum (cut-set will have to minimize)
  • 23.
  • 24.
    2 3 4 5 6 7 8 9 10 11 12 13 14 15 s =1 t = 16 st-numbering i s t  , has two neighbors j, k . k i j  
  • 25.
    2 3 4 5 6 7 8 9 10 11 12 13 14 15 s =1 t = 16 st-numbering Size of cutset = 4
  • 26.
    2 3 4 5 6 7 8 9 10 11 12 13 14 15 s = 1 t= 16 st-numbering Size of cutset = 3 To find a bipartition with the minimum cutset, we have to enumerate all bipartitions. We need to enumerate all st-numbering.
  • 27.
    Two-way partitioning problem Eachnode has unit size Each edge has unit weight Find two partitions V1 and V2 such that Each of V1 and V2 has equal size External wiring will be minimum (cut-set will have to minimmize) NP-hard problem. Heuristic techniques to approximate solutions.
  • 28.
    Section 3 GraphPartitions (history) Kerninghan-Lin Algorithm for graph bisection • Partition a network into two groups of predefined size. This partition is called cut. • Inspect each a pair of nodes, one from each group. Identify the pair that results in the largest reduction of the cut size (links between the two groups) if we swap them • Swap them. • If no pair deduces the cut size, we swap the pair that increases the cut size the least. • The process is repeated until each node is moved once. Fiduccia–Mattheyses (FM) Partitioning Algorithm
  • 29.
    Kernighan-Lin (KL) Algorithm     B A n B A  Initialpartition A, B Size of the cut set    B b A a ab c T , We have to minimize the size of the cut set.
  • 30.
    Initial Partition Optimal Partition A,B A*, B* Swap B Y with A X   such that B A Y B A X Y X   * *   
  • 31.
    Initial Partition OptimalPartition B* Swap B Y with A X   such that B A Y B A X Y X   * *    How to find X and Y ? A B A* X Y X Y
  • 32.
    Kernighan-Lin (KL) Algorithm •Iterate as long as the cutsize improves: • Find a pair of vertices that result in the largest decrease in cutsize if exchanged • Exchange the two vertices (potential move) • “Lock” the vertices • If no improvement possible, and still some vertices unlocked, then exchange vertices that result in smallest increase in cutsize
  • 33.
    Kernighan-Lin (KL) Algorithm •Initialize • Bipartition G into V1 and V2, s.t., |V1| = |V2|  1 • n = |V| • Repeat • for i=1 to n/2 • Find a pair of unlocked vertices vai V1 and vbi V2 whose exchange makes the largest decrease or smallest increase in cut-cost • Mark vai and vbi as locked • Store the gain gi. • Find k, s.t. i=1..k gi=Gaink is maximized • If Gaink > 0 then move va1,...,vak from V1 to V2 and vb1,...,vbk from V2 to V1. • Until Gaink  0
  • 34.
    Kernighan-Lin (KL) Example a b c d e f g h 4{ a, e } -2 0 -- 0 1 { d, g } 3 2 { c, f } 1 3 { b, h } -2 Step No. Vertex Pair Gain 5 5 2 1 3 Cut-cost [©Sarrafzadeh] Gain sum 0 3 4 2 0
  • 35.
    Kernighan-Lin (KL) Example a bc d e f g h 4 { a, e } -2 0 -- 0 1 { d, g } 3 2 { c, f } 1 3 { b, h } -2 Step No. Vertex Pair Gain 5 5 2 1 3 Cut-cost [©Sarrafzadeh] Gain sum 0 3 4 2 0
  • 36.
    Kernighan-Lin (KL) :Analysis • Time complexity? • Inner (for) loop • Iterates n/2 times • Iteration 1: (n/2) x (n/2) • Iteration i: (n/2 – i + 1)2. • Passes? Usually independent of n • O(n3) • Drawbacks? • Local optimum • Balanced partitions only • No weight for the vertices • High time complexity
  • 37.
    Internal cost GA GB a1 a2 an ai a3 a5 a6 a4 b2 bj b4 b3 b1 b6 b7 b5          A x B y y b x b b b b a a a j j j j j i i i C C I E D I E D Likewise, [©Kang] External cost       B y y a a A x x a a i i i i C E C I ,
  • 38.
    • Lemma: Considerany ai  A, bj  B. If ai, bj are interchanged, the gain is • Proof: Total cost before interchange (T) between A and B Total cost after interchange (T’) between A and B Therefore Gain Calculation (cont.) j i j i b a b a C D D g 2    [©Kang] others) all for cost (     j i j i b a b a C E E T others) all for cost (      j i j i b a b a C I I T j i j j i i b a b b a a C I E I E T T g 2         i a D j b D
  • 39.
    Gain Calculation (cont.) •Lemma: • Let Dx’, Dy’ be the new D values for elements of A - {ai} and B - {bj}. Then after interchanging ai & bj, • Proof: • The edge x-ai changed from internal in Dx to external in Dx’ • The edge y-bj changed from internal in Dx to external in Dx’ • The x-bj edge changed from external to internal • The y-ai edge changed from external to internal • More clarification in the next two slides } { , 2 2 } { , 2 2 j ya yb y y i xb xa x x b B y C C D D a A x C C D D i j j i             [©Kang]
  • 40.
    Clarification of theLemma ai bj x a b
  • 41.
    • Decompose Ixand Ex to separate edges from ai and bj: Write the equations before the move • ... And after the move b a     j i xb x xa x C E C I j i i j xb xa xa xb x x x C C C C I E D            b a a b ) ( ) ( j i j i xb xa x xb xa x C C D C C D 2 2          b a b a       i j xa x xb x C E C I
  • 42.
    Example: KL • Step1 - Initialization A = {2, 3, 4}, B = {1, 5, 6} A’ = A = {2, 3, 4}, B’ = B = {1, 5, 6} • Step 2 - Compute D values D1 = E1 - I1 = 1-0 = +1 D2 = E2 - I2 = 1-2 = -1 D3 = E3 - I3 = 0-1 = -1 D4 = E4 - I4 = 2-1 = +1 D5 = E5 - I5 = 1-1 = +0 D6 = E6 - I6 = 1-1 = +0 [©Kang] 5 6 4 2 1 3 Initial partition 4 5 6 2 3 1
  • 43.
    Example: KL (cont.) •Step 3 - compute gains g21 = D2 + D1 - 2C21 = (-1) + (+1) - 2(1) = -2 g25 = D2 + D5 - 2C25 = (-1) + (+0) - 2(0) = -1 g26 = D2 + D6 - 2C26 = (-1) + (+0) - 2(0) = -1 g31 = D3 + D1 - 2C31 = (-1) + (+1) - 2(0) = 0 g35 = D3 + D5 - 2C35 = (-1) + (0) - 2(0) = -1 g36 = D3 + D6 - 2C36 = (-1) + (0) - 2(0) = -1 g41 = D4 + D1 - 2C41 = (+1) + (+1) - 2(0) = +2 g45 = D4 + D5 - 2C45 = (+1) + (+0) - 2(+1) = -1 g46 = D4 + D6 - 2C46 = (+1) + (+0) - 2(+1) = -1 • The largest g value is g41 = +2 interchange 4 and 1 (a1, b1) = (4, 1) A’ = A’ - {4} = {2, 3} B’ = B’ - {1} = {5, 6} both not empty [©Kang]
  • 44.
    Example: KL (cont.) •Step 4 - update D values of node connected to vertices (4, 1) D2’ = D2 + 2C24 - 2C21 = (-1) + 2(+1) - 2(+1) = -1 D5’ = D5 + 2C51 - 2C54 = +0 + 2(0) - 2(+1) = -2 D6’ = D6 + 2C61 - 2C64 = +0 + 2(0) - 2(+1) = -2 • Assign Di = Di’, repeat step 3 : g25 = D2 + D5 - 2C25 = -1 - 2 - 2(0) = -3 g26 = D2 + D6 - 2C26 = -1 - 2 - 2(0) = -3 g35 = D3 + D5 - 2C35 = -1 - 2 - 2(0) = -3 g36 = D3 + D6 - 2C36 = -1 - 2 - 2(0) = -3 • All values are equal; arbitrarily choose g36 = -3  (a2, b2) = (3, 6) A’ = A’ - {3} = {2}, B’ = B’ - {6} = {5} New D values are: D2’ = D2 + 2C23 - 2C26 = -1 + 2(1) - 2(0) = +1 D5’ = D5 + 2C56 - 2C53 = -2 + 2(1) - 2(0) = +0 • New gain with D2  D2’, D5  D5’ g25 = D2 + D5 - 2C52 = +1 + 0 - 2(0) = +1  (a3, b3) = (2, 5) [©Kang]
  • 45.
    Example: KL (cont.) •Step 5 - Determine the # of moves to take g1 = +2 g1 + g2 = +2 - 3 = -1 g1 + g2 + g3 = +2 - 3 + 1 = 0 • The value of k for max G is 1 X = {a1} = {4}, Y = {b1} = {1} • Move X to B, Y to A  A = {1, 2, 3}, B = {4, 5, 6} • Repeat the whole process: • • • • • • The final solution is A = {1, 2, 3}, B = {4, 5, 6} 5 6 4 2 1 3
  • 46.
    Section 3 Numberof communities Community detection The number and size of the communities are unknown at the beginning. Partition Division of a network into groups of nodes, so that each node belongs to one group. Bell Number: number of possible partitions of N nodes A.-L. Barabási, Network Science: Communities.
  • 47.
  • 48.
    Section 4 HierarchicalClustering Agglomerative algorithms merge nodes and communities with high similarity. Divisive algorithms split communities by removing links that connect nodes with low similarity. 1. Build a similarity matrix for the network 2. Similarity matrix: how similar two nodes are to each other  we need to determine from the adjacency matrix 3. Hierarchical clustering iteratively identifies groups of nodes with high similarity, following one of two distinct strategies: Hierarchical tree or dendrogram: visualize the history of the merging or splitting process the algorithm follows. Horizontal cuts of this tree offer various community partitions. 4.
  • 49.
    Section 4 AgglomerativeAlgorithms Step 1: Define the Similarity Matrix (Ravasz algorithm) • High for node pairs that likely belong to the same community, low for those that likely belong to different communities. • Nodes that connect directly to each other and/or share multiple neighbors are more likely to belong to the same dense local neighborhood, hence their similarity should be large. Topological overlap matrix: JN(i,j): number of common neighbors of node i and j; (+1) if there is a direct link between i and j; E. Ravasz et al., Science 297 (2002). A.-L. Barabási, Network Science: Communities. Agglomerative algorithms merge nodes and communities with high similarity.
  • 50.
    Section 4 AgglomerativeAlgorithms E. Ravasz et al., Science 297 (2002). A.-L. Barabási, Network Science: Communities. Step 2: Decide Group Similarity • Groups are merged based on their mutual similarity through single, complete or average cluster linkage
  • 51.
    Section 4 AgglomerativeAlgorithms Step 3: Apply Hierarchical Clustering • Assign each node to a community of its own and evaluate the similarity for all node pairs. The initial similarities between these “communities” are simply the node similarities. • Find the community pair with the highest similarity and merge them to form a single community. • Calculate the similarity between the new community and all other communities. • Repeat from Step 2 until all nodes are merged into a single community. Step 4: Build Dendrogram • Describes the precise order in which the nodes are assigned to communities. E. Ravasz et al., Science 297 (2002). A.-L. Barabási, Network Science: Communities.
  • 52.
    Section 4 AgglomerativeAlgorithms Computational complexity: • Step 1 (calculation similarity matrix): • Step 2-3 (group similarity): • Step 4 (dendrogram): E. Ravasz et al., Science 297 (2002). A.-L. Barabási, Network Science: Communities.
  • 53.
    Section 4 DivisiveAlgorithms Step 1: Define a Centrality Measure (Girvan-Newman algorithm) • Link betweenness is the number of shortest paths between all node pairs that run along a link. • Random-walk betweenness. A pair of nodes m and n are chosen at random. A walker starts at m, following each adjacent link with equal probability until it reaches n. Random walk betweenness xij is the probability that the link i→j was crossed by the walker after averaging over all possible choices for the starting nodes m and n Divisive algorithms split communities by removing links that connect nodes with low similarity. M. Girvan & M.E.J. Newman, PNAS 99 (2002). A.-L. Barabási, Network Science: Communities.
  • 54.
    Section 4 DivisiveAlgorithms M. Girvan & M.E.J. Newman, PNAS 99 (2002). A.-L. Barabási, Network Science: Communities. Step 2: Hierarchical Clustering a) Compute of the centrality of each link. b) Remove the link with the largest centrality; in case of a tie, choose one randomly. c) Recalculate the centrality of each link for the altered network. d) Repeat until all links are removed (yields a dendrogram).
  • 55.
    Section 4 DivisiveAlgorithms M. Girvan & M.E.J. Newman, PNAS 99 (2002). A.-L. Barabási, Network Science: Communities. Step 2: Hierarchical Clustering a) Compute of the centrality of each link. b) Remove the link with the largest centrality; in case of a tie, choose one randomly. c) Recalculate the centrality of each link for the altered network. d) Repeat until all links are removed (yields a dendrogram).
  • 56.
    Section 4 DivisiveAlgorithm M. Girvan & M.E.J. Newman, PNAS 99 (2002). A.-L. Barabási, Network Science: Communities. Computational complexity: • Step 1a (calculation betweenness centrality): • Step 1b (Recalculation of betweenness centrality for all links): for sparse networks
  • 57.
    Section 4 Hierarchyin networks Can a hierarchical network be scale-free?
  • 58.
    Section 4 Hierarchyin networks (1) Scale-free property The obtained network is scale-free, its degree distribution following a power- law with E. Ravasz & A.-L. Barabási, PRE 67 (2003). A.-L. Barabási, Network Science: Communities. A construction
  • 59.
    Section 4 Hierarchyin networks (1) Scale-free property The obtained network is scale-free, its degree distribution following a power- law with E. Ravasz & A.-L. Barabási, PRE 67 (2003). A.-L. Barabási, Network Science: Communities.
  • 60.
    Section 4 Hierarchyin networks (2) Clustering coefficient scaling with k Small k nodes: *high clustering coefficient; *their neighbors tend to link to each other in highly interlinked, compact communities. High k nodes (hubs): *small clustering coefficient; *connect independent communities. E. Ravasz & A.-L. Barabási, PRE 67 (2003). A.-L. Barabási, Network Science: Communities.
  • 61.
    Section 4 Hierarchyin networks (3) Clustering coefficient independent of N E. Ravasz & A.-L. Barabási, PRE 67 (2003). A.-L. Barabási, Network Science: Communities.
  • 62.
    Section 4 Hierarchyin networks (3) Clustering coefficient independent of N E. Ravasz & A.-L. Barabási, PRE 67 (2003). A.-L. Barabási, Network Science: Communities.
  • 63.
    2. Scaling clustering coefficient(DGM) 1. Scale-free 3. Clustering coefficient independent of N x E. Ravasz & A.-L. Barabási, PRE 67 (2003). A.-L. Barabási, Network Science: Communities. Section 4 Hierarchy in networks
  • 64.
    A.-L. Barabási, NetworkScience: Communities. Section 4 Hierarchy in real networks POWER GRID INTERNET
  • 65.
    Section 4 Ambiguityin Hierarchical clustering A.-L. Barabási, Network Science: Communities. Where to “cut”?
  • 66.
    Phylogenetic dendrograms In bioinformatrics,clusters and dendrograms have been studied for a long time. For example, the sequences of the same protein or gene in different species are selected, and compared with each other.
  • 67.
    Phylogenetic dendrograms A similaritymatrix is constructed between these sequences, by looking at how many aminoacids/nucleotides stay in place
  • 68.
    Phylogenetic dendrograms A similaritymatrix is constructed between these sequences, by looking at how many aminoacids/nucleotides stay in place
  • 69.
  • 70.
  • 71.
  • 72.
    Section 4 Modularity MEJNewman, PNAS 103 (2006). A.-L. Barabási, Network Science: Communities. H4: Random Hypothesis Randomly wired networks are not expected to have a community structure.
  • 73.
    Section 4 Modularity MEJNewman, PNAS 103 (2006). A.-L. Barabási, Network Science: Communities. Imagine a partition in nc communities Modularity H4: Random Hypothesis Randomly wired networks are not expected to have a community structure.
  • 74.
    Section 4 Modularity MEJNewman, PNAS 103 (2006). A.-L. Barabási, Network Science: Communities. Imagine a partition in nc communities Modularity Original data H4: Random Hypothesis Randomly wired networks are not expected to have a community structure.
  • 75.
    Section 4 Modularity MEJNewman, PNAS 103 (2006). A.-L. Barabási, Network Science: Communities. Imagine a partition in nc communities Modularity Original data Expected connections, a model H4: Random Hypothesis Randomly wired networks are not expected to have a community structure.
  • 76.
    Section 4 Modularity MEJNewman, PNAS 103 (2006). A.-L. Barabási, Network Science: Communities. Imagine a partition in nc communities Modularity Original data Expected connections, a model Relative to a specific partition H4: Random Hypothesis Randomly wired networks are not expected to have a community structure.
  • 77.
    Section 4 Modularity MEJNewman, PNAS 103 (2006). A.-L. Barabási, Network Science: Communities. Imagine a partition in nc communities Modularity Original data Expected connections, a model Relative to a specific partition Modularity is a measure associated to a partition Random network H4: Random Hypothesis Randomly wired networks are not expected to have a community structure.
  • 78.
    Section 4 Modularity Anotherway of writing M MEJ Newman, PNAS 103 (2006). A.-L. Barabási, Network Science: Communities. where LC is the number of links within C. In a similar fashion, the second term becomes We can rewrite the first term as Finally we get:
  • 79.
    Section 4 Modularity MEJNewman, PNAS 103 (2006). A.-L. Barabási, Network Science: Communities. H5: Maximal Modularity Hypothesis The partition with the maximum modularity M for a given network offers the optimal community structure
  • 80.
    Section 4 Modularity MEJNewman, PNAS 103 (2006). A.-L. Barabási, Network Science: Communities. H5: Maximal Modularity Hypothesis The partition with the maximum modularity M for a given network offers the optimal community structure Find Goal that maximizes M
  • 81.
    Section 4 Modularity •Optimal partition, that maximizes the modularity. • Sub-optimal but positive modularity. • Negative Modularity: If we assign each node to a different community. • Zero modularity: Assigning all nodes to the same community, independent of the network structure. • Modularity is size dependent Which partition ? A.-L. Barabási, Network Science: Communities.
  • 82.
    Section 4 Modularitybased community identification A greedy algorithm, which iteratively joins nodes if the move increases the new partition’s modularity. Step 1. Assign each node to a community of its own. Hence we start with N communities. Step 2. Inspect each pair of communities connected by at least one link and compute the modularity variation obtained if we merge these two communities. Step 3. Identify the community pairs for which ΔM is the largest and merge them. Note that modularity of a particular partition is always calculated from the full topology of the network. Step 4. Repeat step 2 until all nodes are merged into a single community. Step 5. Record for each step and select the partition for which the modularity is maximal. MEJ Newman, PRE 69 (2004). A.-L. Barabási, Network Science: Communities.
  • 83.
    Section 4 Modularity Whichpartition ? A.-L. Barabási, Network Science: Communities. Modularity can be used to compare different partitions provided by other algorithms, like hierarchical clustering It can be used to design new algorithms, aiming at maximizing M
  • 84.
    Section 4 Modularityfor the Girvan-Newman Which partition ? A.-L. Barabási, Network Science: Communities.
  • 85.
    Section 4 Modularitybased community identification MEJ Newman, PRE 69 (2004). A.-L. Barabási, Network Science: Communities. Computational complexity: • Step 1-2 (calculation of ΔM for L links ): • Step 3 (matrix update): • Step 4 (N-1 community merges): for sparse networks
  • 86.
    Section 4 Modularitybased community identification MEJ Newman, PRE 69 (2004). A.-L. Barabási, Network Science: Communities. Computational complexity: • Step 1-2 (calculation of ΔM for L links ): • Step 3 (matrix update): • Step 4 (N-1 community merges): for sparse networks
  • 87.
    Section 4 Limitsof Modularity A.-L. Barabási, Network Science: Communities. kA and kB total degree in A and B A B Resolution limit
  • 88.
    Section 4 Limitsof Modularity A.-L. Barabási, Network Science: Communities. kA and kB total degree in A and B If and A B Resolution limit
  • 89.
    Section 4 Limitsof Modularity A.-L. Barabási, Network Science: Communities. kA and kB total degree in A and B If and A B We merge A and B to maximize modularity. Resolution limit
  • 90.
    Section 4 Limitsof Modularity A.-L. Barabási, Network Science: Communities. kA and kB total degree in A and B If and Assuming A B We merge A and B to maximize modularity. Resolution limit
  • 91.
    Section 4 Limitsof Modularity A.-L. Barabási, Network Science: Communities. kA and kB total degree in A and B If and Assuming Modularity has a resolution limit, as it cannot detect communities smaller than this size. A B We merge A and B to maximize modularity. Resolution limit
  • 92.
    Section 4 Limitsof Modularity A.-L. Barabási, Network Science: Communities. One maximum?
  • 93.
    Section 4 Limitsof Modularity Null models Expected connections, a model can take into account weights can take into account directions can take into account attributes or space S. Fortunato, Phys. Rep. 486 (2010) S. Fortunato, Phys. Rep. 486 (2010) P. Expert el al., PNAS 108 (2011)
  • 94.
    Section 5 OnlineResources (Modularity) Gephi NetworkX R assigns self-loops to nodes to increase or decrease the aversion of nodes to form communities Finds the partition that maximizes modularity (considers weights and direction) Calculates the modularity of the partition you provide
  • 95.
    Section 4 OnlineResources (1) The greedy algorithm is neither particularly fast nor particularly successful at maximizing M. Scalability: Due to the sparsity of the adjacency matrix, the update of the matrix involves a large number of useless operations. The use of data structures for sparse matrices can decrease the complexity of the computational algorithm to , which allows us to analyze is of networks up to nodes. See "Fast Modularity" Community Structure Inference Algorithm http://cs.unm.edu/~aaron/research/fastmodularity.htm for the code. A fast greedy algorithm was proposed by Blondel and collaborators, that can process networks with millions of nodes. For the description of the algorithm see Louvain method: Finding communities in large networks https://sites.google.com/site/findcommunities/ for the code.
  • 96.
  • 97.
    Section 5 OverlappingCommunities G. Palla et al., Nature 435 (2005). A.-L. Barabási, Network Science: Communities.
  • 98.
    Section 5 CliquePercolation (CFinder) Other k-cliques that can not be reached from a particular clique correspond to other clique- communities Start with a k-clique (complete subgraphs of k nodes), a 3-clique for example Start “rolling” the clique over adjacent cliques. Two k-cliques are considered adjacent if they share k-1 nodes A k-clique community is the largest connected subgraph obtained by the union of all adjacent k–cliques G. Palla et al., Nature 435 (2005). A.-L. Barabási, Network Science: Communities.
  • 99.
    Section 5 OverlappingCommunities Bright: • community containing light-related words (glow or dark); • community capturing different colors (yellow, brown) • community consisting of astronomical terms (sun, ray). • community linked to intelligence (gifted, brilliant). A.-L. Barabási, Network Science: Communities.
  • 100.
    Notice that ifa random network is sufficiently dense, there are cliques of varying order. I. Derényi et al., PRL 94 (2005). A.-L. Barabási, Network Science: Communities. Section 5 Could CP communities emerge by chance?
  • 101.
    Notice that ifa random network is sufficiently dense, there are cliques of varying order. A k-clique community emerges in a random graph only if the connection probability exceeds the threshold: I. Derényi et al., PRL 94 (2005). A.-L. Barabási, Network Science: Communities. Section 5 Could CP communities emerge by chance?
  • 102.
    p=0.13 (<pc) p=0.22(>pc) N=20 pc=0.16 I. Derényi et al., PRL 94 (2005). A.-L. Barabási, Network Science: Communities. Random networks with Section 5 Could CP communities emerge by chance?
  • 103.
    p=0.13 (<pc) p=0.22(>pc) N=20 pc=0.16 I. Derényi et al., PRL 94 (2005). A.-L. Barabási, Network Science: Communities. Random networks with Section 5 Could CP communities emerge by chance? Compare your Cfinder output with that you obtain in a random graph with same N and p!
  • 104.
    Section 5 Cliquepercolation Computational complexity: • Finding maximal cliques require exponential time. • However the algorithm has to find only k-cliques, which can be done in polynomial time I. Derényi et al., PRL 94 (2005). A.-L. Barabási, Network Science: Communities.
  • 105.
    Section 5 OnlineResources (CFinder) The CFinder software package that implements the Clique Percolation Method can be downloaded at www.cfinder.org NetworkX
    Section 5 Link Clustering In a social network, a link may indicate that two individuals: • are in the same family; • work together; • share a hobby. In biological networks, each interaction of a protein is responsible for a different function, uniquely defining the protein’s role in the cell. Nodes tend to belong to multiple communities, while links tend to be specific, capturing the nature of the relationship between two nodes. This motivates a hierarchical algorithm based on the similarity of links. Ahn, Bagrow and Lehmann, Nature 466 (2010). A.-L. Barabási, Network Science: Communities.
    Section 6 Link Clustering 1. Define link similarity. Let n+(i) be the list of the neighbors of node i, including itself. For two links e_ik and e_jk that share node k, the similarity S(e_ik, e_jk) = |n+(i) ∩ n+(j)| / |n+(i) ∪ n+(j)| measures the relative number of common neighbors i and j have. Ahn, Bagrow and Lehmann, Nature 466 (2010). A.-L. Barabási, Network Science: Communities.
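The link-similarity definition as an illustrative sketch (the function names are mine, not from the paper's code):

```python
import networkx as nx

def inclusive_neighbors(G, v):
    """n+(v): the neighbors of v, including v itself."""
    return set(G[v]) | {v}

def link_similarity(G, i, j):
    """Jaccard similarity of n+(i) and n+(j), used to compare two
    links e_ik, e_jk that share the node k (Ahn et al., 2010)."""
    ni, nj = inclusive_neighbors(G, i), inclusive_neighbors(G, j)
    return len(ni & nj) / len(ni | nj)

# A triangle with a pendant node: links (0,1) and (2,1) share node 1,
# so we compare the non-shared endpoints 0 and 2.
G = nx.Graph([(0, 1), (1, 2), (0, 2), (2, 3)])
print(link_similarity(G, 0, 2))  # -> 0.75  (|{0,1,2}| / |{0,1,2,3}|)
```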
    Section 5 Link Clustering 2. Apply hierarchical clustering (agglomerative, single linkage). Ahn, Bagrow and Lehmann, Nature 466 (2010). A.-L. Barabási, Network Science: Communities.
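Putting the two steps together, a minimal self-contained sketch (not the authors' reference implementation): compute pairwise link similarities, then run single-linkage agglomerative clustering on the distances 1 - S with SciPy.

```python
from itertools import combinations
import networkx as nx
from scipy.cluster.hierarchy import linkage, fcluster

G = nx.barbell_graph(4, 0)   # two 4-cliques joined by one bridge edge
edges = list(G.edges())

def nplus(v):
    return set(G[v]) | {v}   # n+(v): neighbors of v, including v

def link_sim(e1, e2):
    """Similarity of two links; non-zero only when they share a node."""
    shared = set(e1) & set(e2)
    if not shared:
        return 0.0
    i, j = (set(e1) - shared).pop(), (set(e2) - shared).pop()
    return len(nplus(i) & nplus(j)) / len(nplus(i) | nplus(j))

# Condensed distance matrix (1 - similarity), in combinations order.
dist = [1.0 - link_sim(e1, e2) for e1, e2 in combinations(edges, 2)]
Z = linkage(dist, method="single")          # agglomerative, single linkage
labels = fcluster(Z, t=0.5, criterion="distance")

# Each clique's links form one link community; the bridge link,
# only weakly similar to its neighbors, stays on its own.
print(len(set(labels)))  # -> 3
```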
    Section 5 Link Clustering The network of characters in Victor Hugo’s 1862 novel Les Misérables: two characters are connected if they interact directly with each other in the story. The link colors indicate the clusters, with grey nodes corresponding to single-link clusters. Each node is depicted as a pie chart, illustrating its membership in multiple communities. Not surprisingly, the main character, Jean Valjean, has the most diverse community membership. Ahn, Bagrow and Lehmann, Nature 466 (2010). A.-L. Barabási, Network Science: Communities.
    Section 5 Link Clustering Computational complexity: • Step 1: comparing two links requires max(k1, k2) steps; for scale-free networks this step has complexity O(N^{2/(γ-1)}). • Step 2: hierarchical clustering requires O(L²), i.e. O(N²) for sparse networks. Ahn, Bagrow and Lehmann, Nature 466 (2010). A.-L. Barabási, Network Science: Communities.