7. “a group of densely interconnected nodes”
[Background: paper excerpt, arXiv:cond-mat/0308[...]]
[...] in understanding and visualizing the structure of networks. In this paper we show how this can be achieved.
The study of community structure in networks has a long history. It is closely related to the ideas of graph partitioning in graph theory and computer science, and [...]
8. Hundreds of community detection methods
[Background: paper excerpt, arXiv:cond-m[...]]
FIG. 1: A small network with community structure of the type considered in this paper. In this case there are three communities, denoted by the dashed circles, which have dense internal links but between which there are only a lower density of external links.
17. [Background: paper excerpt, left column truncated in extraction]
[...] observed network data using the tools of statistical inference, combining a maximum likelihood approach [15] with a Monte Carlo sampling algorithm [16] on the space of [...]
FIG. 1: A hierarchical network with structure on many scales and the corresponding hierarchical random graph. Each internal node r of the dendrogram is associated with a probability p_r that a pair of vertices in the left and right subtrees of that node are connected. (The shades of the internal nodes in the figure represent the probabilities.)
Clauset et al., Nature (2008)
20. [Background: paper excerpt, left column truncated in extraction]
[...] large network we introduce the distributions of these four basic quantities. In particular we focus on their cumulative distribution [...]
G. Palla, I. Derényi, I. Farkas & T. Vicsek, Nature, 2005
21. Multiple Contexts
[Figure panels A, B, C]
C: overlap and hierarchy do not mix
Labels: Family; University; buildings in same neighborhood; home and work
23. [Figure panels C, D]
C: overlap and hierarchy do not mix
Labels: Family; University; buildings in same neighborhood; home and work; joint appointment
D: Single dendrogram cannot represent multiple hierarchical contexts
40. What the xxxx is this?
[Background: first page of a paper, reconstructed from the two-column layout]
[...] measures, and second, these measures are, crucially, recalculated after each removal. We also propose a measure for the strength of the community structure found by our algorithms, which gives us an objective metric for choosing the number of communities into which a network should be divided. We demonstrate that our algorithms are highly effective at discovering community structure in both computer-generated and real-world network data, and show how they can be used to shed light on the sometimes dauntingly complex structure of networked systems.

I. INTRODUCTION

Empirical studies and theoretical modeling of networks have been the subject of a large body of recent research in statistical physics and applied mathematics [1, 2, 3, 4]. Network ideas have been applied with great success to topics as diverse as the Internet and the world wide web [5, 6, 7], epidemiology [8, 9, 10, 11], scientific citation and collaboration [12, 13], metabolism [14, 15], and ecosystems [16, 17], to name but a few. A property that seems to be common to many networks is community structure, the division of network nodes into groups within which the network connections are dense, but between which they are sparser (see Fig. 1). The ability to find and analyze such groups can provide invaluable help in understanding and visualizing the structure of networks. In this paper we show how this can be achieved.

The study of community structure in networks has a long history. It is closely related to the ideas of graph partitioning in graph theory and computer science, and hierarchical clustering in sociology [18, 19]. Before presenting our own findings, it is worth reviewing some of this preceding work, to understand its achievements and where it falls short.

Graph partitioning is a problem that arises in, for example, parallel computing. Suppose we have a number n of intercommunicating computer processes, which we wish to distribute over a number g of computer processors. Processes do not necessarily need to communicate with all others, and the pattern of required communications can be represented by a graph or network in which the vertices represent processes and edges join process pairs that need to communicate. The problem is to allocate the processes to processors in such a way as roughly to balance the load on each processor, while at the same time minimizing the number of edges that run between processors, so that the amount of interprocessor communication (which is normally slow) is minimized. In general, finding an exact solution to a partitioning task of this kind is believed to be an NP-complete problem, making it prohibitively difficult to solve for large graphs, but a wide variety of heuristic algorithms have been developed that give acceptably good solutions in many cases, the best known being perhaps the Kernighan–Lin algorithm [20], which runs in time O(n^3) on sparse graphs.

A solution to the graph partitioning problem is however not particularly helpful for analyzing and understanding networks in general. If we merely want to find if and how a given network breaks down into communities, we probably don’t know how many such communities there are going to be, and there is no reason why they should be roughly the same size. Furthermore, the number of inter-community edges needn’t be strictly minimized either, since more such edges are admissible between large communities than between small ones.

As far as our goals in this paper are concerned, a more useful approach is that taken by social network analysis [...]

FIG. 1: A small network with community structure of the [...]
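The partitioning objective described in the excerpt (balance the load on each processor while keeping few edges between processors) can be illustrated with a minimal Python sketch. The example graph, the two assignments, and the function names `cut_size` and `loads` are invented for illustration and do not come from the paper; this only evaluates a given partition and is not a partitioning algorithm such as Kernighan–Lin.

```python
from collections import Counter


def cut_size(edges, assignment):
    """Count edges whose endpoints are assigned to different processors."""
    return sum(1 for u, v in edges if assignment[u] != assignment[v])


def loads(assignment):
    """Number of processes placed on each processor."""
    return Counter(assignment.values())


# Hypothetical example: six processes forming two triangles joined by one edge.
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
good = {0: "A", 1: "A", 2: "A", 3: "B", 4: "B", 5: "B"}
bad = {0: "A", 1: "B", 2: "A", 3: "B", 4: "A", 5: "B"}

# Both assignments are balanced (3 processes each), but they differ in cut size.
print(cut_size(edges, good), dict(loads(good)))
print(cut_size(edges, bad), dict(loads(bad)))
```

Both assignments balance the load equally, so the cut size is what separates them: the first keeps each triangle on one processor and cuts only the bridging edge.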
41. Word association network: Network of “commonly associated English words”
[Background: paper excerpt, truncated in extraction]
Figure 2 | The community structure around a particular node in three different networks. The communities are colour coded, [...]
G. Palla, I. Derényi, I. Farkas & T. Vicsek, Nature, 2005
42. a: Spouses Alice and Bob also work together. b: Word Association examples
Link communities: Family, Work (Alice, Bob)
Node communities: Family, Work (Alice, Bob)
The Alice-Bob link was placed in family but both home and work relationships are identified
Words shown: COMBINE, JOIN, FRUIT, BLENDER, INTEGRATE, JUICE, BLEND, MIX, MIXTURE; LOOK, DISAPPEAR, APPEAR, VANISH, SEE, REAPPEAR, SHOW, ATTEND
Figure S4: Overlapping links. In the link community framework, a link may be assigned to only one community. By deriving node communities, however, the problem of effectively discovering multiple relationships between nodes is effectively solved. Two nodes can belong to many communities together regardless of the membership of the link between them. Left: illustration of the situation. Right: real examples from word association network. In the upper example, Blend and blender belong to both the ‘fruit juice’ community and the ‘mix’ community. In the bottom example, the link between appear and reappear does not even belong to any of the other communities, but they belong to several communities together.
43. Simple / Complex; Global / Local
Figure S16: Overlapping community structure around Acetyl-CoA in the E. coli metabolic network. Acetyl-CoA plays several different and important roles in metabolism. Shown are only communities with homogeneity score equal to 1 (all compounds inside each community share at least one pathway annotation); all other links, including those that contribute to community structure, are omitted. Pathway annotations shared by all community members are displayed with corresponding colors. The two communities to the right of Acetyl-CoA are grouped since they share the same exact pathway annotations.
Words shown: BROOM, PAINT, SWEEP, PAINTER, GROOM, PAINTING, BRUSH, HAIR, TOOTHBRUSH, COMB, HAIRSPRAY, TOOTHPASTE
• SUNSET, SUNRISE, ORANGE
• SUNSET, SUNRISE, RED
• SUNSET, SUNRISE, PRETTY, BEAUTIFUL
• SUNSET, SUNRISE, MOON
• SUNSET, SUNRISE, BEACH
• SUNSET, SUNRISE, SUN, DAWN, DUSK, SUNSHINE
• SUNSET, SUNRISE, DAWN, DUSK, AFTERNOON, EVENING
45. Then, how can we find hierarchical community structure in COMPLEX networks with pervasive overlap?
65. [Figure panels: (A) edges e_ik and e_jk sharing node k; (B) triple a, c, b with S(e_ac, e_bc)]
Figure S1: (A) The similarity measure S(e_ik, e_jk) between edges [...] For this example, |n+(i) ∪ n+(j)| = 12 and |n+(i) ∩ n+(j)| = 4, [...] cases: (B) an isolated (k_a = k_b = 1), connected triple (a,c,b) has S [...] triangle has S = 1.
[...] structure can become radically different.) Thus, we neglect the ne[...] first define the inclusive neighbors of a node i as:
66. [Figure panels: (A) edges e_ik and e_jk sharing node k; (B) triple a, c, b with S(e_ac, e_bc)]
Figure S1: (A) The similarity measure S(e_ik, e_jk) between edges [...] For this example, |n+(i) ∪ n+(j)| = 12 and |n+(i) ∩ n+(j)| = 4, [...] cases: (B) an isolated (k_a = k_b = 1), connected triple (a,c,b) has S [...] triangle has S = 1.
[Overlay: 4/12]
[...] structure can become radically different.) Thus, we neglect the ne[...] first define the inclusive neighbors of a node i as:
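The edge similarity in Figure S1 can be sketched in a few lines of Python. This assumes the measure is the ratio of the intersection to the union of the inclusive neighborhoods of the two unshared endpoints, which is what the counts in the caption suggest. The inclusive neighborhood is taken to be a node plus its direct neighbors. The function names and the adjacency-dictionary representation are choices made here for illustration.

```python
def inclusive_neighbors(adj, i):
    """Node i together with its direct neighbors (assumed definition of n+(i))."""
    return {i} | set(adj[i])


def edge_similarity(adj, i, j):
    """Similarity of edges e_ik and e_jk that share a node k.

    Assumed form: |n+(i) & n+(j)| / |n+(i) | n+(j)|, a Jaccard index
    over the inclusive neighborhoods of the unshared endpoints i and j.
    """
    ni = inclusive_neighbors(adj, i)
    nj = inclusive_neighbors(adj, j)
    return len(ni & nj) / len(ni | nj)


# Isolated connected triple (a, c, b): a and b each have degree 1.
triple = {"a": {"c"}, "b": {"c"}, "c": {"a", "b"}}
# Isolated triangle on the same three nodes.
triangle = {"a": {"b", "c"}, "b": {"a", "c"}, "c": {"a", "b"}}

print(edge_similarity(triple, "a", "b"))
print(edge_similarity(triangle, "a", "b"))
```

Under this assumed form, the triangle gives S = 1, matching the caption, and the caption's example counts (intersection 4, union 12) would give 4/12.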
82. Quantitative Evaluation Framework
• Community quality: How homogeneous is each community?
• Overlap quality: How accurate is the amount of overlap?
• Community coverage: How many nodes are covered?
• Overlap coverage: How many memberships are assigned?
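The two coverage questions above can be made concrete with a small Python sketch. It assumes that community coverage is the fraction of nodes in at least one non-trivial community and that overlap coverage is the average number of non-trivial memberships per node. The `min_size` cutoff for "non-trivial", the function name and the toy data are assumptions made here, not taken from the slides.

```python
from collections import Counter


def coverage_metrics(nodes, communities, min_size=3):
    """Return (community coverage, overlap coverage).

    Assumption: communities with fewer than min_size nodes are trivial
    and contribute no memberships.
    """
    memberships = Counter()
    for community in communities:
        if len(community) >= min_size:
            memberships.update(community)  # one membership per member node
    covered = sum(1 for n in nodes if memberships[n] > 0)
    community_coverage = covered / len(nodes)
    overlap_coverage = sum(memberships[n] for n in nodes) / len(nodes)
    return community_coverage, overlap_coverage


# Hypothetical toy data: the two-node community {"d", "e"} counts as trivial.
nodes = ["a", "b", "c", "d", "e"]
communities = [{"a", "b", "c"}, {"b", "c", "d"}, {"d", "e"}]
print(coverage_metrics(nodes, communities))
```

The quality measures are not sketched because they depend on per-network metadata, which varies from one dataset to the next.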
86. Metadata
Figure R11: Example of the network and available metadata for the Amazon.com product co-purchases network. Here we show a
particular book (upper left), some of the books it is often bought with (lower left), the set of subjects it is classified into by Amazon.com
(upper right), and the set of popular “tags” Amazon.com users have chosen to describe or annotate the book’s content (lower right).
We can use shared tags to quantify how similar pairs of books are, and the more subjects a book has, the more communities it is
expected to belong to. Other combinations of metadata are certainly possible. Other networks used here have analogous metadata.
87. Quantitative Evaluation Framework
• Community quality (example: Amazon.com). Subjects shown: HIV / AIDS; Medical; Africa - General; Africa; History; Nonfiction / General; Infectious Diseases
• Community coverage: nodes with no membership; high coverage vs. low coverage
• Overlap quality (example: Metabolic network). Acetyl-CoA: 1. Glycolysis / Gluconeogenesis, 2. TCA cycle, 3. Fatty acid biosynthesis, 4. ... (many pathway memberships, high overlap). IDP (Inosine diphosphate): 1. Purine metabolism (few pathway memberships, low overlap)
• Overlap coverage: community memberships; high overlap coverage vs. low overlap coverage
88. [...] and topologies (for example, the networks range from sparse (average degree 6.34) to dense (average degree 38.95)).

network | description | N | k (average degree) | metadata: community | metadata: overlap
--- | --- | --- | --- | --- | ---
PPI (Y2H) | PPI network of S. cerevisiae obtained by yeast two-hybrid (Y2H) experiment [3] | 1647 | 3.06 | Set of each protein’s known functions (GO terms)^a | The number of GO terms
PPI (AP/MS) | Affinity purification mass spectrometry (AP/MS) experiment | 1004 | 16.57 | GO terms | GO terms
PPI (LC) | Literature curated (LC) | 1213 | 4.21 | GO terms | GO terms
PPI (all) | Union of Y2H, AP/MS, and LC PPI networks^b | 2729 | 8.92 | GO terms | GO terms
Metabolic | Metabolic network (metabolites connected by reactions) of E. coli | 1042 | 16.81 | Set of each metabolite’s pathway annotations (KEGG)^c | The number of KEGG pathway annotations
Phone | Social contacts between mobile phone users [15, 16, 17] | 885989 | 6.34 | Each user’s most likely geographic location | Call activity (number of phone calls^d)
Actor | Film actors that appear in the same movies during 2000–2009 [18] | 67411 | 8.90 | Set of plot keywords for all of the actor’s films | Length of career (year of first role)
US Congress | Congressmen who co-sponsor bills during the 108th US Congress [19, 20] | 390 | 38.95 | Political ideology, from the common space score [21, 22] | Seniority (number of congresses served)
Philosopher | Philosophers and their philosophical influences, from the English Wikipedia^e | 1219 | 9.80 | Set of (wikipedia) hyperlinks exiting in the philosopher’s page | Number of wikipedia subject categories
Word Assoc. | English words that are often mentally associated [23] | 5018 | 22.02 | Set of each word’s senses, as documented by WordNet^f | Number of senses
Amazon.com | Products that users frequently buy together | 18142 | 5.09^g | Set of each product’s user tags (annotations) | Number of product categories
97. [Panel B: Proteasome, with Core and Regulatory particle communities]
• proteasome core complex (GO:0005839, C)
• threonine-type endopeptidase activity (GO:0004298, F)
• Signalosome (GO:0008180, C)
• ubiquitin-dependent protein catabolic process (GO:0006511, P)
• Protein deneddylation (GO:0000338, P)
• proteasome regulatory particle (GO:0005838, C)
• ubiquitin-dependent protein catabolic process (GO:0006511, P)
• endopeptidase activity (GO:0004175, F)
Figure S16: Another example of overlapping community structure. (A) The subnetwork [...]
98. [Plot, panel B, E. coli: number of communities versus number of metabolites per community (log-log). Inset: number of metabolites versus number of communities per metabolite (0 to 200), with ATP, ADP, H2O, H+ and Pi labeled. A second plot follows with truncated axis labels "[...]mmunities" and "[...]ber of users".]
99. Currencies
[Plot, panels B and C, E. coli: number of communities versus number of metabolites per community (log-log). Inset: number of metabolites versus number of communities per metabolite (0 to 200), with ATP, ADP, H2O, H+ and Pi labeled. A second plot follows with truncated axis labels "[...]mmunities" and "[...]ber of users".]
109. [Figure: map panels A–F (scale bar 50 km) showing communities at thresholds 0.20, 0.23, 0.24 and 0.27; color scale from 0.4 to 1]
110. Remaining hierarchy
[Plot, panel e: Q/Qmax versus link dendrogram threshold, t, for the Phone, Metabolic and Word association networks; curves: Actual, Control]
Figure 4 | Meaningful communities at multiple levels of the link dendrogram. a–c, The social network of mobile phone users displays co-located, overlapping communities on multiple scales. a, Heat map of the most likely locations of all users in the region, showing several cities. b, Cutting the dendrogram above the optimum threshold yields small, intra-[...]
111. [Figure panels a–d: communities in the word association network. Community labels: Planets; Astronomy; Astronomy (more general terms); Scuba diving; Scuba diving, Coral reef; Diving, Swim, Marine life; Diving with animals; Water and aquatic animals. Panels c and d: a community at threshold = 0.20, and sub-communities at threshold = 0.28]
Figure 23: Examples of hierarchical structure in the word association network. The word association network is a nice example for this purpose, since it is easy to appreciate the meanings and contexts of the individual words and communities. (a) Here we pick a link and follow how the link merges with others as we climb the hierarchical tree. (b) We start from the link MARS–[...]
112. Conclusion
• Link viewpoint effectively removes the problem of overlap.
• Global hierarchical structure can be found by clustering links.
• doi:10.1038/nature09182
• http://barabasilab.neu.edu/projects/linkcommunities/
115. Acknowledgements
A.-L. Barabási, H. Yu, S. Ahnert, J. Park, D.-S. Lee, P.-J. Kim, M. A. Yildirim
T. S. Evans, R. Lambiotte, Line Graphs, Link Partitions and Overlapping Communities, http://sites.google.com/site/linegraphs/
117. a: Spouses Alice and Bob also work together. b: Word Association examples
Link communities: Family, Work (Alice, Bob)
Node communities: Family, Work (Alice, Bob)
The Alice-Bob link was placed in family but both home and work relationships are identified
Words shown: COMBINE, JOIN, FRUIT, BLENDER, INTEGRATE, JUICE, BLEND, MIX, MIXTURE; LOOK, DISAPPEAR, APPEAR, VANISH, SEE, REAPPEAR, SHOW, ATTEND
[...]ultiple relationships between nodes be found by link communities that assume one membe[...] [...]hemselves “inherit” multiple memberships from their links. Two nodes can belong to many c[...]
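The idea that nodes inherit memberships from their links can be sketched in Python. Each link is assigned to exactly one community, and a node's memberships are the communities of all links it touches. The "Child" and "Colleague" nodes and the function name `node_memberships` are hypothetical additions for illustration; the slide names only Alice, Bob, Family and Work.

```python
from collections import defaultdict


def node_memberships(link_communities):
    """Map each node to the set of communities of the links it touches."""
    memberships = defaultdict(set)
    for (u, v), community in link_communities.items():
        memberships[u].add(community)
        memberships[v].add(community)
    return dict(memberships)


# Each link has exactly one community. "Child" and "Colleague" are invented nodes.
link_communities = {
    ("Alice", "Bob"): "Family",
    ("Alice", "Child"): "Family",
    ("Bob", "Child"): "Family",
    ("Alice", "Colleague"): "Work",
    ("Bob", "Colleague"): "Work",
}

m = node_memberships(link_communities)
print(sorted(m["Alice"]), sorted(m["Bob"]))
```

The Alice-Bob link itself sits only in Family, yet both Alice and Bob end up in Family and Work, which is the situation the slide describes.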
118. link communities
a: Internal groups without distinguishing features are undetectable to ALL methods. Communities: language class, basketball team; students a–j; project prob. p. All students are identical: one community, D = 0.750
b: Subtle structural differences are found by link communities. Communities: language class, basketball team; students a–j and a coach; project prob. p. The coach separates them: two communities, D = 0.756
c: Students 1–30 grouped as juniors, basketball team and seniors; project prob. p. Three communities, D = 0.745. Annotations: juniors and basketball players; seniors and basketball players. Multiple relationships are found: the link between students 18 and 20 is senior but both 18 and 20 belong to both seniors and basketball players!
Figure 5: Some small, illustrative examples of the subtle structural changes that link communities detect, using the bipartite social model of [21] with p = 0.8, followed by our link communities algorithm. In (a) there are no distinguishing structural features to separate the “subsumed” basketball team from the language class. Detecting the team is impossible for all methods.
Editor's Notes
organizing principle, underlying mechanism, dynamics, function