05 Communities in Networks (2016)

Communities in Networks

Peter J. Mucha

University of North Carolina at Chapel Hill

[Title-slide figures: college football teams grouped into communities; U.S. House committees; states versus Congress number (1807-1809 through 2007-2009), coupling = 0.2: 13 communities.]

Communities in Networks

1.  What is a community and why are they useful?
2.  How do you calculate communities?
   •  Descriptive: e.g., Modularity
   •  Generative: e.g., Stochastic Block Models
3.  Where is community detection going in the future?

… with apologies that this presentation will seriously err on the self-absorbed side. It’s a big field, and I do not promise to know nor present it all.

“Communities in Networks,” Porter, Onnela & Mucha, Notices of the American Mathematical Society 56, 1082-97 & 1164-6 (2009).
“Community Detection in Graphs,” S. Fortunato, Physics Reports 486, 75-174 (2010).

Acknowledgements:

•  Shankar Bhamidi, Jean Carlson, Aaron Clauset, Skyler Cranmer, James Fowler, James Gleeson, Scott Grafton, Jim Moody, Mark Newman, Andrew Nobel, Mason Porter
•  Dani Bassett, Elizabeth Leicht, Nishant Malik, Sergey Melnik, J.-P. Onnela, Serguei Saavedra
•  Dan Fenn, Elizabeth Menninga, Feng “Bill” Shi, Ashton Verdery, Simi Wang, James Wilson, Andrew Waugh
•  Thomas Callaghan, A. J. Friend, Christi Frost, Eric Kelsic, Kevin Macon, Sean Myers, Ye Pei, Scott Powers, Stephen Reid, Thomas Richardson, Mandi Traud, Casey Warmbrand, Yan Zhang
•  NSF (CAREER/REU & VIGRE), NIGMS (SNAH), JSMF (MAP/JF & PJM), Caltech SURF, UNC (AGEP, CAS, SURF)

•  Jim Moody (paraphrased): “I’ve been accused of turning everything into a network.”
•  PJM (in response): “I’m accused of turning everything into a network and a graph partitioning problem.”
•  “Structure ↔ Function”

How to extend the notion of modularity in networks to multiple networks between the same actors/units, i.e. how to properly use identity in modularity?

Philosophical Disclaimer

Images by Aaron Clauset

Karate Club Example

This partition optimizes modularity, which measures the number of intra-community ties (relative to randomness)

“If your method doesn’t work on this network, then go home.”
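To make "intra-community ties relative to randomness" concrete, here is a minimal sketch of the standard Newman-Girvan modularity of a given partition. It uses a small hypothetical toy network (two triangles joined by one edge) rather than the karate club data, and the function and variable names are illustrative, not taken from any package.

```python
from collections import defaultdict

def modularity(edges, community):
    """Newman-Girvan modularity of a partition of an undirected, unweighted graph.

    edges: list of (u, v) pairs; community: dict mapping node -> community label.
    """
    m = len(edges)
    intra = defaultdict(int)       # number of edges inside each community
    degree_sum = defaultdict(int)  # total degree of the nodes in each community
    for u, v in edges:
        degree_sum[community[u]] += 1
        degree_sum[community[v]] += 1
        if community[u] == community[v]:
            intra[community[u]] += 1
    # Observed fraction of intra-community edges minus the fraction expected
    # if edges were placed at random while preserving the degrees.
    return sum(intra[c] / m - (degree_sum[c] / (2 * m)) ** 2 for c in degree_sum)

# Hypothetical toy network: two triangles joined by a single bridge edge.
edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
two_triangles = {0: "a", 1: "a", 2: "a", 3: "b", 4: "b", 5: "b"}
one_group = {node: "all" for node in range(6)}

q_split = modularity(edges, two_triangles)
q_single = modularity(edges, one_group)
print(q_split, q_single)
```

Putting every node in one group scores zero by construction, so any positive value means more internal ties than the degree-preserving random baseline would give.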

Karate Club Example

Brought to you by Mason Porter and The Power Law Shop
http://www.cafepress.com/thepowerlawshop
Women’s and kids’ sizes also available

“Cris Moore (left) is the inaugural recipient of the Zachary Karate Club Club prize, awarded on behalf of the community by Aric Hagberg (right). (9 May 2013)”

Facebook

Traud et al., “Comparing community structure to characteristics in online collegiate social networks” (2011)
Traud et al., “Social structure of Facebook networks” (2012)

Caltech 2005: Colors indicate residential “House” affiliations
Purple = Not provided

Logistic Regression:

zRand:

Roll call as a network?

Scientific Coauthorship v. Roll Call Similarities

see Waugh et al., “Party polarization in Congress: a network science approach” (2009)
Moody & Mucha, “Portrait of political party polarization” (2013)

Parker et al., “Network Analysis Reveals Sex- and Antibiotic Resistance-Associated Antivirulence Targets in Clinical Uropathogens” (2015)

Communities in Networks

1.  What is a community and why are they useful?
2.  How do you calculate communities?
   •  Descriptive: e.g., Modularity
   •  Generative: e.g., Stochastic Block Models
3.  Where is community detection going in the future?

Community Detection Firehose Overview

•  Computational sledgehammer for large data
•  “Hard/rigid” v. “soft/overlapping” clusters
•  cf. biclustering methods and mathematics of expander graphs
•  A community should describe a “cohesive group,” and there are varying formulations and algorithms
   –  Linkage clustering (average, single), local clustering coefficients, betweenness (geodesic, random walk), spectral, conductance,…
•  Classic approach in CS: Spectral Graph Partitioning
   –  Need to specify number of communities sought
•  Conductance
•  MDL, Infomap, OSLOM, … (many other things I’ve missed) …
•  Modularity: a good partition has more intra-community edges than one would expect at random
•  Stochastic Block Models: a generative random graph model with different in/out probabilities between labeled groups
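As an illustration of the last bullet, the following is a minimal sketch of sampling from the simplest stochastic block model, where every pair of nodes is linked independently with one probability inside a group and another between groups. The function name `sample_sbm` and its parameters are hypothetical; dedicated packages fit far more general models.

```python
import random
from itertools import combinations

def sample_sbm(sizes, p_in, p_out, seed=None):
    """Sample an undirected graph from a two-parameter stochastic block model.

    sizes: list of group sizes; p_in / p_out: edge probabilities for pairs of
    nodes in the same group / in different groups.
    Returns (labels, edges), where labels[i] is the group of node i.
    """
    rng = random.Random(seed)
    labels = [group for group, size in enumerate(sizes) for _ in range(size)]
    edges = []
    for u, v in combinations(range(len(labels)), 2):
        p = p_in if labels[u] == labels[v] else p_out
        if rng.random() < p:  # independent Bernoulli trial for each node pair
            edges.append((u, v))
    return labels, edges

# Two groups of 20 nodes, denser inside groups than between them.
labels, edges = sample_sbm([20, 20], p_in=0.5, p_out=0.05, seed=1)
within = sum(labels[u] == labels[v] for u, v in edges)
print(len(edges), "edges,", within, "of them within groups")
```

Community detection with a block model runs this process in reverse: given only the edges, infer the labels and probabilities that make the observed graph most plausible.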

Images by Aaron Clauset

Structure ↔ Function/Process

“Modularity” Approach:

Community Detection: Null Model & Computational Heuristics

•  GOAL: Assign nodes to communities in order to maximize quality function Q
•  NP-Complete [Brandes et al. 2008] ~ enumerate possible partitions
•  Numerous packages developed/developing
   –  e.g. igraph library (R, python), NetworkX
   –  Need appropriate null model
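To see why "enumerate possible partitions" is hopeless beyond tiny networks, note that the number of ways to partition n labeled nodes into non-empty groups is the Bell number B(n). A short sketch that computes it with the Bell triangle (the function name is illustrative):

```python
def bell_number(n):
    """Number of partitions of n labeled nodes into non-empty groups."""
    row = [1]
    for _ in range(n - 1):
        # Each new row of the Bell triangle starts with the last entry of the
        # previous row; every further entry adds the neighbor from the row above.
        next_row = [row[-1]]
        for value in row:
            next_row.append(next_row[-1] + value)
        row = next_row
    return row[-1]

for n in (5, 10, 20, 34):
    print(n, bell_number(n))
```

Even a 34-node network such as the karate club is far beyond exhaustive search, which is why practical codes rely on heuristics.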

Maximizing Modularity

(Newman & Girvan, PRE 2004; Newman, PRE 2004, PNAS 2006, PRE 2006)

•  Independent edges, constrained to expected degree sequence same as observed.
•  Requires Pij = f(ki)f(kj), quickly yielding Pij = ki kj / (2m)
•  γ resolution parameter ad hoc (default = 1) (Reichardt & Bornholdt, PRE 2006; Lambiotte et al., arXiv 2008)
•  Resolution limit (Fortunato & Barthelemy, PNAS 2007)
   Degenerate landscape (Good, de Montjoye & Clauset, PRE 2010)
   Forces partition (many authors!)

Fenn et al., Chaos 2009
Macon, PJM & MAP, Physica A 2012
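The role of γ and the resolution limit can be sketched with the usual generalized form Q(γ) = Σ_c [L_c/m − γ (d_c/2m)²], where L_c is the number of edges inside community c, d_c its total degree, and m the number of edges. The example below is an assumption-laden illustration, not the slide's own figure: it builds a ring of 30 five-node cliques (the textbook resolution-limit setup) and compares one community per clique with merging neighboring cliques in pairs.

```python
from collections import defaultdict
from itertools import combinations

def ring_of_cliques(num_cliques, clique_size):
    """Cliques arranged in a ring, each joined to the next by a single edge."""
    edges = []
    for c in range(num_cliques):
        nodes = range(c * clique_size, (c + 1) * clique_size)
        edges.extend(combinations(nodes, 2))
        first_of_next = ((c + 1) % num_cliques) * clique_size
        edges.append((nodes[-1], first_of_next))
    return edges

def modularity(edges, community, gamma=1.0):
    """Modularity with resolution parameter gamma (gamma = 1 is the default form)."""
    m = len(edges)
    intra = 0
    degree_sum = defaultdict(int)
    for u, v in edges:
        degree_sum[community[u]] += 1
        degree_sum[community[v]] += 1
        intra += community[u] == community[v]
    return intra / m - gamma * sum((d / (2 * m)) ** 2 for d in degree_sum.values())

num_cliques, clique_size = 30, 5
edges = ring_of_cliques(num_cliques, clique_size)
n = num_cliques * clique_size
one_per_clique = {v: v // clique_size for v in range(n)}
merged_pairs = {v: v // (2 * clique_size) for v in range(n)}

for gamma in (1.0, 2.0):
    print(gamma,
          modularity(edges, one_per_clique, gamma),
          modularity(edges, merged_pairs, gamma))
```

At γ = 1 the merged-pairs partition scores higher even though the cliques are the obvious groups; raising γ restores the one-clique-per-community answer, which is why the parameter matters in practice.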

Community Detection: Other Models

•  Erdős-Rényi (Bernoulli)
•  Newman-Girvan*
•  Leicht-Newman* (directed)
•  Barber* (bipartite)

Political Blogs (Adamic & Glance, WWW-2005)

“On closer inspection, we find that the method [(a)] fails in this case because it does not take into account the wide variation among the degrees of nodes in the network. In this network (and many others) degrees vary over a great range, whereas degrees in the block model are Poisson distributed and narrowly peaked about their mean. This means, in effect, that there is no choice of parameters for the model that gives a good fit to the data. Fitting this block model is similar to fitting a straight line through an inherently curved set of data points: you can do it, but it is unlikely to give you a meaningful answer.” (Newman, Nature Physics 2012)

Similar visualizations from different models in Amini et al., arXiv (2012)
Bottom Right: Partitions v. overlap & extraction (Wilson et al. in prep)

Fortunato & Barthelemy, PNAS 2007
Ball, Karrer & Newman, PRE 2011

Louvain (Blondel et al., J. Stat. Mech. 2008)

Other great codes to know:
http://www.mapequation.org/
https://graph-tool.skewed.de/

InfoMap (Rosvall & Bergstrom 2008)

OSLOM (Lancichinetti et al., PLoS One 2011)

•  Score: Significance
•  “Homeless” vertices
•  Overlap
•  Cluster hierarchy
•  Because of the way the algorithm evolves clusters, it can naturally be used for temporal network data.

Conductance & NCP Plots (Leskovec, Mahoney, …)
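For reference, the conductance of a node set S is commonly defined as the number of edges leaving S divided by the smaller of the total degree inside S and the total degree outside it. A minimal sketch on a hypothetical toy graph follows; the function name is illustrative, and NCP plots additionally require minimizing this quantity over sets of each size, which is not shown.

```python
def conductance(edges, subset):
    """Conductance of a node set in an undirected, unweighted graph."""
    subset = set(subset)
    cut = 0       # edges with exactly one endpoint in the subset
    vol_in = 0    # sum of degrees of nodes inside the subset
    vol_out = 0   # sum of degrees of nodes outside the subset
    for u, v in edges:
        for node in (u, v):
            if node in subset:
                vol_in += 1
            else:
                vol_out += 1
        if (u in subset) != (v in subset):
            cut += 1
    smaller_volume = min(vol_in, vol_out)
    if smaller_volume == 0:
        raise ValueError("conductance is undefined when one side has no edges")
    return cut / smaller_volume

# Hypothetical toy network: two triangles joined by a single bridge edge.
edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
print(conductance(edges, {0, 1, 2}))  # one triangle: a single edge crosses the cut
print(conductance(edges, {0, 1}))     # a worse cut through the triangle
```

Lower conductance indicates a better-separated set, so the triangle scores better than the two-node set that splits it.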

Stochastic Block Models

R: Mixer
Python: Graph-Tool
At the most general level…

Two related but different issues to keep straight:

1.  Theoretical Concept (e.g., “Modularity”, “Map Equation”, “Stochastic Block Models”)
2.  Computational Heuristic & Implementation (e.g. “Fast Greedy”, “Louvain”, “Iterative Improvement”, or the specific SBM code [possible initialization issues with some])

And, finally, how do you compare communities?

Comparing Partitions
(e.g. Section 15.2 of Fortunato 2010)

R x C Contingency Table:

1.  Cluster Matching
   –  Requires injection
2.  Pair Counting
   –  “Adjusted” v. “Standardized”
3.  Information Theory
   –  Variation of Information, Normalized Mutual Information

Information-Theoretic Comparisons
(e.g. Section 15.2 of Fortunato 2010)
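The formulas for this slide did not survive extraction. As a hedged sketch, the two measures named earlier can be computed from the contingency table using their common definitions: VI = H(A) + H(B) − 2 I(A;B) and NMI = 2 I(A;B) / (H(A) + H(B)). Other NMI normalizations exist, so the choice here is an assumption, as are the function and variable names.

```python
from collections import Counter
from math import log

def information_measures(labels_a, labels_b):
    """Return (variation of information, normalized mutual information).

    labels_a, labels_b: community labels of the same nodes under two partitions.
    Natural logarithms are used, so VI is measured in nats.
    """
    n = len(labels_a)
    if n == 0 or n != len(labels_b):
        raise ValueError("both partitions must label the same non-empty node set")
    count_a = Counter(labels_a)
    count_b = Counter(labels_b)
    count_ab = Counter(zip(labels_a, labels_b))  # the R x C contingency table
    h_a = -sum(c / n * log(c / n) for c in count_a.values())
    h_b = -sum(c / n * log(c / n) for c in count_b.values())
    mutual = sum(c / n * log(c * n / (count_a[a] * count_b[b]))
                 for (a, b), c in count_ab.items())
    vi = h_a + h_b - 2 * mutual
    # Two trivial one-group partitions carry no entropy; treat them as identical.
    nmi = 1.0 if h_a + h_b == 0 else 2 * mutual / (h_a + h_b)
    return vi, nmi

same_vi, same_nmi = information_measures([0, 0, 1, 1, 2, 2], [5, 5, 7, 7, 9, 9])
indep_vi, indep_nmi = information_measures([0, 0, 1, 1], [0, 1, 0, 1])
print(same_vi, same_nmi, indep_vi, indep_nmi)
```

Both measures depend only on how nodes are grouped, not on the label names, so relabeled copies of the same partition compare as identical.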

Pair Counting & Standardization
(see, e.g., Traud et al., SIAM Review 2011)

wαβ counts: α & β binary indicator for same/different

•  Rand, Jaccard, Minkowski, Fowlkes-Mallows,…
•  “Adjusted”: center on mean with perfect match = 1
•  “Standardized” by stdev, expressed as z-score
•  Linear in w11 → equal z
•  Monotonic in w11 → equal p
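A minimal sketch of the pair counts and three of the listed scores, using the textbook definitions of the Rand, Jaccard, and adjusted Rand indices. The standardized z-score needs the variance of w11 under a permutation null and is deliberately not attempted here; function names are illustrative.

```python
from itertools import combinations

def pair_counts(labels_a, labels_b):
    """Count node pairs by whether they share a community in A and in B."""
    w11 = w10 = w01 = w00 = 0
    for i, j in combinations(range(len(labels_a)), 2):
        same_a = labels_a[i] == labels_a[j]
        same_b = labels_b[i] == labels_b[j]
        if same_a and same_b:
            w11 += 1
        elif same_a:
            w10 += 1
        elif same_b:
            w01 += 1
        else:
            w00 += 1
    return w11, w10, w01, w00

def pair_counting_scores(labels_a, labels_b):
    """Rand, Jaccard, and adjusted Rand indices for two partitions."""
    w11, w10, w01, w00 = pair_counts(labels_a, labels_b)
    total = w11 + w10 + w01 + w00
    if total == 0:
        raise ValueError("need at least two nodes")
    union = w11 + w10 + w01
    pairs_a, pairs_b = w11 + w10, w11 + w01
    expected = pairs_a * pairs_b / total  # mean of w11 under random labeling
    maximum = (pairs_a + pairs_b) / 2
    return {
        "rand": (w11 + w00) / total,
        "jaccard": w11 / union if union else 1.0,
        # "Adjusted": center on the mean and rescale so a perfect match is 1.
        "adjusted_rand": 1.0 if maximum == expected
        else (w11 - expected) / (maximum - expected),
    }

a = [0, 0, 0, 1, 1, 1]
b = [0, 0, 1, 1, 2, 2]
print(pair_counts(a, b))
print(pair_counting_scores(a, b))
```

Because the Rand and adjusted Rand indices are both linear in w11 for fixed partition sizes, they lead to the same z-score once standardized, which is the point of the slide's last two bullets.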

Communities in Networks

1.  What is a community and why are they useful?
2.  How do you calculate communities?
   •  Modularity, Stochastic Block Models, Infomap
3.  Where is community detection going in the future?

Multilayer Networks

Ordered    Categorical

Mucha et al., “Community structure in time-dependent, multiscale, and multiplex networks” (2010)
Kivelä et al., “Multilayer Networks” (2014)

Multilayer Modularity Derivation

•  Generalized Lambiotte et al. (2008) connection between modularity and autocorrelation under Laplacian dynamics to rederive null models for bipartite (Barber), directed (Leicht-Newman), and signed (Traag et al.) networks, via one-step conditional probabilities

[Equation annotations: intra-slice adjacency data and null; inter-slice identity arcs]

Same formalism works for more general multilayer networks, with sum over inter-layer connections within same community

Mucha et al., “Community structure in time-dependent, multiscale, and multiplex networks” (2010)
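The equation itself was an image and is not recoverable from this transcript. As a sketch of the quantity the annotations describe, the multislice modularity published in Mucha et al. (2010) combines a per-slice modularity term with identity couplings between copies of the same node: Q = (1/2μ) Σ [(A_ijs − γ k_is k_js / 2m_s) δ_sr + δ_ij C_jsr] δ(g_is, g_jr). The code below assumes unweighted slices, every node present in every slice, a single γ, and ordered coupling of strength ω between adjacent slices only; names are illustrative.

```python
from collections import defaultdict

def multilayer_modularity(layers, community, gamma=1.0, omega=1.0):
    """Multislice modularity with ordered (adjacent-slice) identity coupling.

    layers: list of edge lists, one per slice, over a shared node set.
    community: dict mapping (node, slice_index) -> community label.
    """
    total = 0.0   # numerator: sum of intra-slice and inter-slice contributions
    two_mu = 0.0  # total edge weight (intra-slice plus coupling), counted twice
    for s, edges in enumerate(layers):
        m_s = len(edges)
        intra = 0
        degree_sum = defaultdict(int)
        for u, v in edges:
            cu, cv = community[(u, s)], community[(v, s)]
            degree_sum[cu] += 1
            degree_sum[cv] += 1
            intra += cu == cv
        if m_s:
            # Intra-slice adjacency data minus the Newman-Girvan null model.
            total += 2 * intra - gamma * sum(d * d for d in degree_sum.values()) / (2 * m_s)
        two_mu += 2 * m_s
    nodes = {node for (node, _slice) in community}
    for s in range(len(layers) - 1):
        for node in nodes:
            # Inter-slice identity arc between a node and its copy in the next slice.
            two_mu += 2 * omega
            if community[(node, s)] == community[(node, s + 1)]:
                total += 2 * omega
    return total / two_mu

# Hypothetical example: the same two-triangle graph observed in two slices.
slice_edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
layers = [slice_edges, slice_edges]
community = {(v, s): v // 3 for v in range(6) for s in range(2)}

print(multilayer_modularity(layers, community, omega=0.0))
print(multilayer_modularity(layers, community, omega=1.0))
```

With ω = 0 the slices decouple and the score reduces to ordinary single-slice modularity; increasing ω rewards keeping a node in the same community across slices.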

110 Senates (two-year Congresses)

PJM & MAP, Chaos 2010

“Multilayer Stochastic Block Model”

Strata MLSBM (sMLSBM)

Stanley et al., “Clustering network layers with the strata multilayer stochastic block model” (to appear)

[Flowchart. Initialization: kmeans clusters the L layers into S strata. Iterative Process: update the number of strata to the number of unique clustering patterns according to (1) and (2), then kmeans clusters the layers into S strata.]

sMLSBM on SparCC microbial interactions

Stanley et al., “Clustering network layers with the strata multilayer stochastic block model” (to appear)

Summary

•  Community detection is an exploratory tool that can provide a simplified high-level view of the organization of a network.
•  There are many methods. Don’t tie yourself down to one method: good clusters should be robust, and (hopefully) your story shouldn’t depend on the precise method (or understand why).
•  Many of these methods have parameters and it is important to know about them for best use.
•  Multilayer networks are very general. There are relatively few options currently available for finding communities in multilayer network data, but this area will expand rapidly.
