Advanced Methods in Network Science: Community Detection Algorithms

Advanced Network Analysis Methods: Community Detection

MICHAEL J BOMMARITO II
DANIEL MARTIN KATZ
Definition – Simple Version

• Broadly: a group of nodes that are relatively densely connected to each other but sparsely connected to other dense groups in the network
    • Porter, Onnela, Mucha. Communities in Networks. Notices of the AMS, 2009.
• Examples:
    • Cliques in a high school social network
    • Voting coalitions in Congress
    • Consumer types in a network of co-purchases
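
The "densely connected inside, sparsely connected outside" idea can be made concrete with a small sketch. The toy graph below (two triangles joined by one bridge edge) and the helper names are assumptions for illustration only; they are not from the slides.

```python
def edge_density(nodes, edges):
    """Fraction of possible undirected edges among `nodes` that are present."""
    nodes = set(nodes)
    possible = len(nodes) * (len(nodes) - 1) // 2
    present = sum(1 for u, v in edges if u in nodes and v in nodes)
    return present / possible if possible else 0.0


def between_density(group_a, group_b, edges):
    """Fraction of possible edges between two disjoint groups that are present."""
    a, b = set(group_a), set(group_b)
    present = sum(1 for u, v in edges
                  if (u in a and v in b) or (u in b and v in a))
    return present / (len(a) * len(b))


# Hypothetical toy graph: two triangles joined by the single bridge edge (2, 3).
EDGES = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
LEFT, RIGHT = [0, 1, 2], [3, 4, 5]

internal_left = edge_density(LEFT, EDGES)       # every possible edge is present
internal_right = edge_density(RIGHT, EDGES)
between = between_density(LEFT, RIGHT, EDGES)   # one of nine possible edges
```

Each triangle is fully connected internally while only one of the nine possible cross-group edges exists, which is the pattern the definition describes.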

Michael J. Bommarito II, Daniel Martin Katz

Example – Social Networks

Imagine this graph ...


Example – Social Networks

Vertices: People
Edges: Friendship
[Figure labels: Girls, Boys]

What factors might affect the formation of friendships in a high school social network?

Ideas: Age, Gender, Class, Race, Interests

How might we assign communities to this network?


Example – Voting Coalitions

Vertices: People
Edges: Co-voted at least once
[Figure labels: Republicans, Democrats, Independents]

Now let's look at the same network as if it represented co-voting in the Senate.

Ideas: Issue position, geography, ethnicity, gender

How might we assign communities to this network?

Context!

Note that we have assigned community membership differently despite observing the same graph!

Community detection is not a concept that can be divorced from context.

Directedness

[Figure panels: Undirected, Directed]

Many methods do not incorporate direction!

Many methods that do incorporate direction do not allow for bidirected edges.

Different software packages may implement the same method with or without support for directed edges.
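
Before handing a directed network to a method, it can help to check for bidirected edges and to see what dropping direction would do. This is a minimal sketch with a made-up edge list; the function names are illustrative and do not come from any package.

```python
def reciprocal_pairs(directed_edges):
    """Vertex pairs that are connected in both directions (bidirected edges)."""
    edge_set = set(directed_edges)
    return {frozenset((u, v)) for u, v in edge_set
            if u != v and (v, u) in edge_set}


def to_undirected(directed_edges):
    """Collapse directed edges into undirected ones, discarding direction."""
    return {frozenset((u, v)) for u, v in directed_edges if u != v}


# Hypothetical directed edge list: a <-> b is bidirected, the rest are one-way.
DIRECTED = [("a", "b"), ("b", "a"), ("b", "c"), ("c", "d")]

bidirected = reciprocal_pairs(DIRECTED)
undirected = to_undirected(DIRECTED)
```

Collapsing to an undirected graph loses information (four directed edges become three undirected ones here), so whether it is acceptable depends on the question being asked.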


Weights

Note edge thickness.

Unweighted
• Binary relationships
• Data limitations

Weighted
• Relationship strength
• Frequency of relationship
• Flow

Weights

Many methods do not incorporate edge weights!

Methods that do incorporate edge weights may differ in acceptable values!
• Integers or real weights
• Strictly positive weights

Different software packages may implement the same method with or without support for weighted edges.
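
A quick pre-flight check on weights can catch values a given method would reject. The sketch below is an assumption about what such a check might look like; the function and its flags are hypothetical and are not part of igraph or any other package.

```python
import math


def check_weights(weighted_edges, require_integer=False, require_positive=True):
    """Return (edge, reason) pairs for weights that violate the given constraints."""
    problems = []
    for u, v, w in weighted_edges:
        is_number = isinstance(w, (int, float)) and not isinstance(w, bool)
        if not is_number or not math.isfinite(w):
            problems.append(((u, v), "not a finite number"))
        elif require_positive and w <= 0:
            problems.append(((u, v), "not strictly positive"))
        elif require_integer and float(w) != int(w):
            problems.append(((u, v), "not an integer"))
    return problems


# Hypothetical weighted edge list mixing integer, real, negative, and zero weights.
WEIGHTED = [("a", "b", 2), ("b", "c", 0.5), ("c", "d", -1.0), ("d", "e", 0)]

positive_only = check_weights(WEIGHTED)
positive_integers = check_weights(WEIGHTED, require_integer=True)
```

Which constraints apply depends on the method and on the package implementing it, so the flags should be set from that package's documentation.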

Resolution

Resolution is a concept inherited from optics.

According to Wiki, "Optical resolution describes the ability of an imaging system to resolve detail in the object that is being imaged."

High resolution
• Can make out many details! (15.1MP)
• But...
• Details may be noise
• Sometimes they don't matter!

Low resolution
• Can't read a word!
• But...
• Can focus on broad regions
• Noise is out of focus

Resolution

Same graphs!

[Figure panels: High resolution (microscopic), Low resolution (macroscopic)]

Resolution

Different hypotheses or questions correspond to different resolutions.

Different methods are more or less effective at detecting community structure at different resolutions.

Modularity-based methods cannot detect structure below a known resolution limit.
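
The resolution limit can be seen in a standard worked example: a ring of small cliques, each joined to the next by a single edge. The sketch below assumes the usual modularity form Q = sum_i [ e_i/m - (d_i/(2m))^2 ] and a ring of 30 cliques of 5 vertices; both the example and the helper names are illustrative assumptions, not taken from the slides.

```python
def modularity_from_modules(modules, m):
    """Q = sum_i [e_i/m - (d_i/(2m))^2], with modules given as (e_i, d_i) pairs."""
    return sum(e / m - (d / (2 * m)) ** 2 for e, d in modules)


def ring_of_cliques_q(n_cliques, clique_size, merge_pairs):
    """Modularity of a ring of cliques, each joined to the next by one edge.

    merge_pairs=False puts every clique in its own module;
    merge_pairs=True puts each pair of adjacent cliques in one module
    (n_cliques is assumed to be even and at least 4).
    """
    inside = clique_size * (clique_size - 1) // 2     # edges inside one clique
    m = n_cliques * (inside + 1)                      # plus one ring edge per clique
    degree = clique_size * (clique_size - 1) + 2      # total degree of one clique
    if merge_pairs:
        modules = [(2 * inside + 1, 2 * degree)] * (n_cliques // 2)
    else:
        modules = [(inside, degree)] * n_cliques
    return modularity_from_modules(modules, m)


q_single = ring_of_cliques_q(30, 5, merge_pairs=False)   # roughly 0.876
q_paired = ring_of_cliques_q(30, 5, merge_pairs=True)    # roughly 0.888
```

Pairing adjacent cliques scores higher than keeping each clique separate, so a method that maximizes Q would merge the intuitively obvious communities in this network.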

Overlapping Communities

Palla, Derenyi, Farkas, Vicsek. Uncovering the overlapping community structure of complex networks in nature and society. Nature 435, 2005.

Computational Complexity Refresher

Computational complexity is a serious issue!

Data is becoming more abundant and more detailed.

Many quantitative research projects hinge on the feasibility of calculations.

Understanding computational complexity can allow you to communicate with department IT personnel or computer scientists to solve your problem.

Make sure your project is feasible before committing the time!

Computational Complexity Refresher

Computational complexity in the context of modern computing is primarily focused on two resources:

1. Time: How long does it take to perform a sequence of operations?
    • CPU/GPU
    • Exact vs. approximate solutions
2. Storage: How much space does it take to store our problem?
    • Memory and persistent storage (to a lesser degree)
    • Data representations

We tend to communicate time and storage complexity through Big-O notation.

Computational Complexity Refresher

In computational complexity, Big-O notation conveys information about how time and storage costs scale with inputs.

• O(1): constant - independent of input
• O(n): scales linearly with the size of input
• O(n^2): scales quadratically with the size of input
• O(n^3): scales cubically with the size of input

These terms often occur with log n terms and are then given the prefix quasi-.

For graph algorithms, the input n is typically
• |V|, the number of vertices
• |E|, the number of edges
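
To make the scaling classes concrete, the sketch below simply counts loop iterations for increasing input sizes. The functions are illustrative stand-ins for real algorithms, not anything from the slides.

```python
def linear_steps(n):
    """One step per item: O(n)."""
    return sum(1 for _ in range(n))


def quadratic_steps(n):
    """One step per ordered pair of items: O(n^2)."""
    return sum(1 for _ in range(n) for _ in range(n))


def cubic_steps(n):
    """One step per ordered triple of items: O(n^3)."""
    return sum(1 for _ in range(n) for _ in range(n) for _ in range(n))


# Doubling the input size multiplies the work by 2, 4, and 8 respectively.
growth = {
    "O(n)": linear_steps(40) / linear_steps(20),
    "O(n^2)": quadratic_steps(40) / quadratic_steps(20),
    "O(n^3)": cubic_steps(40) / cubic_steps(20),
}
```

For a graph with |V| = 1,000,000 vertices, the gap between these classes is the difference between a calculation that finishes and one that never will.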

Taxonomy of Methods

This taxonomy of methods follows the history of their development.

• Divisive Methods
    • Edge-betweenness (2002)
• Modularity Methods
    • Fast-greedy (2004)
    • Leading Eigenvector (2006)
• Dynamic Methods
    • Clique percolation (2005)
    • Walktrap (2005)

Edge Betweenness

Publication(s): Girvan, Newman. Community structure in social and biological networks. PNAS, 2002.

Basic Idea: Divide the network into successively smaller pieces by finding and removing the edges that bridge communities.

Constraints:
• Can be adapted to directed networks (igraph).
• Can be adapted to weights (no public software).

Time Complexity: O(|V|^3) in general, O(|V|^2 log |V|) for special cases
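
The basic idea can be sketched in pure Python: compute shortest-path betweenness for every edge, remove the highest-scoring edge, and repeat until the network falls apart. This is a teaching sketch for small, unweighted, undirected graphs without self-loops, not the igraph implementation; the toy edge list is an assumption.

```python
from collections import deque


def edge_betweenness(adj):
    """Shortest-path betweenness of every edge (Brandes-style accumulation)."""
    score = {frozenset((u, v)): 0.0 for u in adj for v in adj[u]}
    for s in adj:
        dist = {s: 0}
        sigma = {v: 0 for v in adj}      # number of shortest paths from s
        sigma[s] = 1
        preds = {v: [] for v in adj}
        order = []
        queue = deque([s])
        while queue:                     # breadth-first search from s
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        delta = {v: 0.0 for v in adj}
        for w in reversed(order):        # push path counts back toward s
            for v in preds[w]:
                share = sigma[v] / sigma[w] * (1.0 + delta[w])
                score[frozenset((v, w))] += share
                delta[v] += share
    # Every vertex pair was counted once from each endpoint.
    return {edge: value / 2.0 for edge, value in score.items()}


def components(adj):
    """Connected components as a list of vertex sets."""
    seen, parts = set(), []
    for s in adj:
        if s in seen:
            continue
        part, stack = set(), [s]
        while stack:
            v = stack.pop()
            if v not in part:
                part.add(v)
                stack.extend(adj[v] - part)
        seen |= part
        parts.append(part)
    return parts


def girvan_newman_split(edges):
    """Remove highest-betweenness edges until the number of components grows."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    start = len(components(adj))
    while len(components(adj)) == start and any(adj.values()):
        scores = edge_betweenness(adj)
        u, v = max(scores, key=scores.get)
        adj[u].discard(v)
        adj[v].discard(u)
    return components(adj)


# Hypothetical toy graph: two triangles joined by the bridge edge (2, 3).
EDGES = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
communities = girvan_newman_split(EDGES)
```

Betweenness is recomputed after every removal, which is where the high time complexity comes from.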

Edge Betweenness

From the paper: [figure not reproduced]

Quick Aside – Zach's Karate Club

Zachary's Karate Club: Social network of friendships between 34 members of a karate club at a US university in the 1970s

Event: During the observation period, the club broke into 2 smaller clubs. This split occurred along a pre-existing social division between the two communities in the network.

Drawn from the Paper: Zachary. An information flow model for conflict and fission in small groups. Journal of Anthropological Research 33, 1977.

Download the Data: http://www-personal.umich.edu/~mejn/netdata/

Edge Betweenness

[Figure label: Only misclassification]

Edge Betweenness

Betweenness tends to get the big picture right.

However, resolution can be a problem!

Do not draw conclusions about small communities from this algorithm alone.

Modularity

• e_i is the number of edges in module i
• d_i is the total degree of vertices in module i
• m is the total number of edges in the network

Q is the difference between the observed connectivity within modules and its expected value under the configuration model (degree distribution fixed).
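
The equation on this slide did not survive extraction. The sketch below assumes the standard form consistent with the definitions above, Q = sum_i [ e_i/m - (d_i/(2m))^2 ], for an undirected, unweighted graph; the toy graph and function name are illustrative.

```python
def modularity(edges, membership):
    """Q = sum_i [ e_i/m - (d_i/(2m))^2 ] for an undirected, unweighted graph."""
    m = len(edges)
    internal = {}   # e_i: edges with both endpoints in module i
    degree = {}     # d_i: total degree of the vertices in module i
    for u, v in edges:
        cu, cv = membership[u], membership[v]
        degree[cu] = degree.get(cu, 0) + 1
        degree[cv] = degree.get(cv, 0) + 1
        if cu == cv:
            internal[cu] = internal.get(cu, 0) + 1
    return sum(internal.get(c, 0) / m - (d / (2 * m)) ** 2
               for c, d in degree.items())


# Hypothetical toy graph: two triangles joined by the bridge edge (2, 3).
EDGES = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]

q_two_triangles = modularity(EDGES, {0: "L", 1: "L", 2: "L", 3: "R", 4: "R", 5: "R"})
q_one_module = modularity(EDGES, {v: "all" for v in range(6)})
```

Splitting along the bridge gives Q = 5/14, while putting every vertex in one module gives Q = 0, since observed and expected connectivity then coincide.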

Modularity

Remember our previous discussion on computational complexity?

Modularity maximization is an NP-hard problem.

This means that no polynomial-time algorithm for finding the exact maximum is known, and none exists unless P = NP!

All practical methods therefore try to solve for approximate solutions.
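
One way to get a feel for why exhaustive search is hopeless: the number of ways to partition n vertices into communities is the Bell number B_n. The sketch below computes Bell numbers with the Bell triangle. It illustrates the size of the search space only and is not an argument for NP-hardness.

```python
def bell_numbers(n_max):
    """Bell numbers B_0..B_n_max computed with the Bell triangle."""
    bells = [1]
    row = [1]
    for _ in range(n_max):
        new_row = [row[-1]]                 # each row starts with the previous row's last entry
        for value in row:
            new_row.append(new_row[-1] + value)
        row = new_row
        bells.append(row[0])
    return bells


partitions = bell_numbers(34)
# partitions[n] is the number of distinct ways to split n vertices into communities.
```

Already at n = 10 there are 115,975 candidate partitions, and the 34-vertex karate club network has far more than could ever be scored one by one.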

Modularity

Benjamin H. Good, Yves-Alexandre de Montjoye & Aaron Clauset, The Performance of Modularity Maximization in Practical Contexts, Phys. Rev. E 81, 046106 (2010)

Fast Greedy

Publication(s):
• Newman. Fast algorithm for detecting community structure in networks. Phys. Rev. E, 2004.
• Clauset, Newman, Moore. Finding community structure in very large networks. Phys. Rev. E, 2004.
• Wakita, Tsurumi. Finding Community Structure in Mega-scale Social Networks. 2007.

Basic Idea: Greedily assemble larger and larger communities from the ground up. Start by placing each vertex in its own community and then repeatedly combine the pair of communities that produces the best modularity at that step.

Constraints:
• Can be adapted to directed edges (no public software).
• Can be adapted to weights (igraph).

Time Complexity: O(|E||V| log |V|) worst case
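
The agglomerative idea can be sketched naively: try every merge of two connected communities, keep the one that gives the best modularity, and remember the best partition seen along the way. This sketch recomputes Q from scratch at every step and has none of the data-structure tricks that make the published algorithms fast; the toy graph, the modularity form Q = sum_i [ e_i/m - (d_i/(2m))^2 ], and the function names are assumptions.

```python
def partition_modularity(edges, assign):
    """Q = sum_i [ e_i/m - (d_i/(2m))^2 ] for the partition in `assign`."""
    m = len(edges)
    internal, degree = {}, {}
    for u, v in edges:
        cu, cv = assign[u], assign[v]
        degree[cu] = degree.get(cu, 0) + 1
        degree[cv] = degree.get(cv, 0) + 1
        if cu == cv:
            internal[cu] = internal.get(cu, 0) + 1
    return sum(internal.get(c, 0) / m - (d / (2 * m)) ** 2
               for c, d in degree.items())


def merge(assign, keep, drop):
    """Return a copy of `assign` with community `drop` folded into `keep`."""
    return {v: (keep if c == drop else c) for v, c in assign.items()}


def greedy_modularity(edges):
    """Naive agglomerative modularity optimisation; returns (partition, Q)."""
    nodes = sorted({x for edge in edges for x in edge})
    current = {v: i for i, v in enumerate(nodes)}   # every vertex on its own
    best, best_q = dict(current), partition_modularity(edges, current)
    while True:
        # Only communities joined by at least one edge are merge candidates.
        candidates = sorted({tuple(sorted((current[u], current[v])))
                             for u, v in edges if current[u] != current[v]})
        if not candidates:
            break
        keep, drop = max(
            candidates,
            key=lambda pair: partition_modularity(edges, merge(current, *pair)))
        current = merge(current, keep, drop)
        q = partition_modularity(edges, current)
        if q > best_q:
            best, best_q = dict(current), q
    return best, best_q


# Hypothetical toy graph: two triangles joined by the bridge edge (2, 3).
EDGES = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
partition, q_value = greedy_modularity(EDGES)
```

Merging continues even when Q drops, so the full merge history forms a dendrogram and the reported partition is the level with the highest Q.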

Fast Greedy

Fast-Greedy also tends to aggressively create larger communities to the detriment of smaller communities.

Why is this node red instead of blue?

Leading Eigenvector

Publication(s):
• Newman. Finding community structure in networks using the eigenvectors of matrices. Phys. Rev. E, 2006.
• Leicht, Newman. Community structure in directed networks. Phys. Rev. Lett., 2008.

Basic Idea: Use the sign of the components of the leading eigenvector of the modularity matrix to sequentially divide the network.

Constraints:
• Can be adapted to directed edges (no public software).
• Can be adapted to weights (igraph).

Time Complexity: O(|V|^2)
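
A single bisection step can be sketched with plain power iteration on the modularity matrix B_ij = A_ij - k_i k_j / (2m). This is an illustrative sketch for small undirected graphs, assuming a toy edge list; real implementations use sparse eigensolvers and repeat the split within each part.

```python
def leading_eigenvector_split(edges, iterations=1000):
    """Split vertices by the signs of the leading eigenvector of B."""
    nodes = sorted({x for edge in edges for x in edge})
    index = {v: i for i, v in enumerate(nodes)}
    n, m = len(nodes), len(edges)
    adjacency = [[0.0] * n for _ in range(n)]
    for u, v in edges:
        adjacency[index[u]][index[v]] += 1.0
        adjacency[index[v]][index[u]] += 1.0
    degree = [sum(row) for row in adjacency]
    # Modularity matrix: observed adjacency minus configuration-model expectation.
    b = [[adjacency[i][j] - degree[i] * degree[j] / (2.0 * m) for j in range(n)]
         for i in range(n)]
    # Shift by a bound on the eigenvalue magnitudes so that power iteration
    # converges to the most positive eigenvalue, not the largest in magnitude.
    shift = max(sum(abs(x) for x in row) for row in b)
    vector = [float(i + 1) for i in range(n)]   # arbitrary non-uniform start
    for _ in range(iterations):
        nxt = [sum(b[i][j] * vector[j] for j in range(n)) + shift * vector[i]
               for i in range(n)]
        norm = max(abs(x) for x in nxt)
        vector = [x / norm for x in nxt]
    positive = {nodes[i] for i in range(n) if vector[i] > 0}
    negative = set(nodes) - positive
    return positive, negative


# Hypothetical toy graph: two triangles joined by the bridge edge (2, 3).
EDGES = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
side_a, side_b = leading_eigenvector_split(EDGES)
```

If one side comes back empty, the leading eigenvector has no sign change and this step offers no split that improves modularity.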

Leading Eigenvector

Note that eigenvector's results seem to split the difference between edge betweenness and fast-greedy in this case.

Why are these nodes not a part of the larger modules?

Walktrap

Publication(s): Pons, Latapy. Computing communities in large networks using random walks. JGAA, 2006.

Basic Idea: Simulate many short random walks on the network and compute pairwise similarity measures based on these walks. Use these similarity values to aggregate vertices into communities.

Constraints:
• Can be adapted to directed edges (igraph).
• Can be adapted to weights (igraph).
• Can alter resolution by walk length (igraph).

Time Complexity: depends on walk length, O(|V|^2 log |V|) typically
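
The similarity step can be sketched as follows. Instead of simulating walks, this sketch computes the exact probability of being at each vertex after a fixed number of steps and compares two start vertices with a degree-normalised distance. That substitution, the toy graph, and the function names are assumptions for illustration; every vertex is assumed to have at least one neighbour.

```python
import math


def walk_probabilities(adj, start, steps):
    """Probability of being at each vertex after `steps` random-walk steps."""
    prob = {v: 0.0 for v in adj}
    prob[start] = 1.0
    for _ in range(steps):
        nxt = {v: 0.0 for v in adj}
        for v, p in prob.items():
            if p:
                for w in adj[v]:
                    nxt[w] += p / len(adj[v])   # move to a uniformly chosen neighbour
        prob = nxt
    return prob


def walk_distance(adj, a, b, steps=4):
    """Degree-normalised distance between the walk distributions of a and b."""
    pa = walk_probabilities(adj, a, steps)
    pb = walk_probabilities(adj, b, steps)
    return math.sqrt(sum((pa[k] - pb[k]) ** 2 / len(adj[k]) for k in adj))


# Hypothetical toy graph: two triangles joined by the bridge edge (2, 3).
ADJ = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2, 4, 5}, 4: {3, 5}, 5: {3, 4}}

same_side = walk_distance(ADJ, 0, 1)       # both in the left triangle
opposite_sides = walk_distance(ADJ, 0, 5)  # one in each triangle
```

Vertices in the same triangle end up with nearly identical walk distributions, so their distance is small; an agglomerative step would merge them first.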


Walktrap

Walktrap assigns vertices to different communities than the previous algorithms.

Note that the walk length can be changed to alter resolution.

Furthermore, when the walks are simulated, the procedure is stochastic and thus results may change even after fixing the walk length and input graph! Implementations that compute the walk probabilities exactly do not have this variability.

Method Comparison

[Figure panels: Edge-Betweenness, Fast-Greedy, Walktrap, Leading Eigenvector]

Recommended Software - igraph

• Core Library: C
• Interfaces: Python, R, Ruby
• Features: Graph operations & algorithms, random graph generation, graph statistics, community detection, visualization layout, plotting
• URL: http://igraph.sourceforge.net/
• Documentation: http://igraph.sourceforge.net/documentation.html

Example Python Source Code
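
The source code shown on this slide did not survive extraction. The sketch below is not the authors' code; it is one way the four methods discussed above might be run on Zachary's karate club with the python-igraph interface, assuming that package is installed (the sketch returns None if it is not).

```python
try:
    import igraph as ig   # python-igraph; optional for this sketch
except ImportError:
    ig = None


def compare_methods():
    """Run four igraph community detection methods on Zachary's karate club."""
    if ig is None:
        return None
    graph = ig.Graph.Famous("Zachary")
    clusterings = {
        # Dendrogram-producing methods are cut at their best-modularity level.
        "edge_betweenness": graph.community_edge_betweenness().as_clustering(),
        "fast_greedy": graph.community_fastgreedy().as_clustering(),
        "leading_eigenvector": graph.community_leading_eigenvector(),
        "walktrap": graph.community_walktrap(steps=4).as_clustering(),
    }
    return {name: (len(c), c.modularity) for name, c in clusterings.items()}


results = compare_methods()
if results is not None:
    for name, (count, q) in results.items():
        print(f"{name}: {count} communities, modularity {q:.3f}")
```

Comparing the number of communities and the modularity across methods on the same graph is a quick way to see how much the choice of algorithm matters.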

Frontiers of Community Detection: Temporal Network Dynamics

Gergely Palla, Albert-Laszlo Barabasi & Tamas Vicsek, Quantifying Social Group Evolution, Nature 446:7136, 664-667 (2007)

Frontiers of Community Detection: Community Structure Over Scales, Time Period, etc.

Science 14 May 2010, Vol. 328. no. 5980, pp. 876 - 878

Community Detection Review Articles

Some Useful Review Articles:

Mason A. Porter, Jukka-Pekka Onnela and Peter J. Mucha. 2009. Communities in Networks. Notices of the American Mathematical Society 56: 1082-1166.

Santo Fortunato. 2010. Community detection in graphs. Physics Reports 486: 75-174.

A Transition to Our Sink Method Paper

• Now we are going to transition to a specific project where we apply some of the ideas contained herein
• Provide a very brief introduction to the Exponential Random Graph Models (p*)

Our Sink Paper – Physica A

Dynamic Acyclic Digraphs

• We are interested in conducting community detection in the special case of dynamic acyclic digraphs ...
• Before we transition to the full presentation – some background
• Dynamic = Changing both locally and globally
• Digraph = Directed graph
• Acyclic = No cycles, because current documents generally cannot cite documents in the future
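
Whether a citation network really is acyclic can be checked with a topological sort. The sketch below uses Kahn's algorithm on a made-up list of (citing, cited) pairs; the case names are placeholders, not real data.

```python
from collections import deque


def topological_order(citations):
    """Return vertices in topological order, or None if the digraph has a cycle.

    `citations` is a list of (citing, cited) pairs.
    """
    successors, indegree = {}, {}
    for citing, cited in citations:
        successors.setdefault(citing, []).append(cited)
        successors.setdefault(cited, [])
        indegree[cited] = indegree.get(cited, 0) + 1
        indegree.setdefault(citing, 0)
    queue = deque(v for v in successors if indegree[v] == 0)
    order = []
    while queue:
        v = queue.popleft()
        order.append(v)
        for w in successors[v]:
            indegree[w] -= 1
            if indegree[w] == 0:
                queue.append(w)
    # Any vertex left unprocessed sits on a cycle.
    return order if len(order) == len(successors) else None


# Hypothetical citations: the newest case C cites B and A, and B cites A.
CASES = [("C", "B"), ("C", "A"), ("B", "A")]
order = topological_order(CASES)
with_cycle = topological_order(CASES + [("A", "C")])
```

In real citation data, a failed check usually points to data errors or to documents that were revised after publication.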

Dynamic Acyclic Digraphs

Case to Case Judicial Citation Networks are Dynamic Acyclic Digraphs

So are Academic Citation Networks, Patents, etc.
