2. Definition – Simple Version
• Broadly: a group of nodes that are relatively densely connected to each other but sparsely connected to other dense groups in the network
  - Porter, Onnela, Mucha. Communities in Networks. Notices of the AMS, 2009.
• Examples:
  - Cliques in a high school social network
  - Voting coalitions in Congress
  - Consumer types in a network of co-purchases
Michael J. Bommarito II, Daniel Martin Katz
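To make "densely connected inside, sparsely connected outside" concrete, here is a minimal sketch that compares the two densities for one candidate group. The toy graph (two triangles joined by one edge) and the function name are illustrative assumptions, not part of the original slides.

```python
def density_inside_and_across(edges, group, n_nodes):
    """Return (internal density, external density) for one group of nodes.

    Assumes an undirected simple graph and a group that contains
    at least two nodes but not every node.
    """
    group = set(group)
    inside = sum(1 for u, v in edges if u in group and v in group)
    across = sum(1 for u, v in edges if (u in group) != (v in group))
    n_in = len(group)
    n_out = n_nodes - n_in
    possible_inside = n_in * (n_in - 1) // 2   # pairs within the group
    possible_across = n_in * n_out             # pairs leaving the group
    return inside / possible_inside, across / possible_across

# Hypothetical toy graph: two triangles joined by the single edge (2, 3).
edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
internal, external = density_inside_and_across(edges, {0, 1, 2}, n_nodes=6)
print(internal, external)
```

A group reads as a community in the sense above when the first number is much larger than the second.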
3. Example – Social Networks
Imagine this graph…
4. Example – Social Networks
Vertices: People
Edges: Friendship
What factors might affect the formation of friendships in a high school social network?
Ideas: Age, Gender, Class, Race, Interests
How might we assign communities to this network?
5. Example – Social Networks
Same graph and questions as the previous slide, now with two communities labeled: Girls and Boys.
6. Example – Voting Coalitions
Vertices: People
Edges: Co-voted at least once
Now let's look at the same network as if it represented co-voting in the Senate.
Ideas: Issue position, geography, ethnicity, gender
How might we assign communities to this network?
7. Example – Voting Coalitions
Same graph and questions as the previous slide, now with three communities labeled: Republicans, Democrats, and Independents.
8. Context!
Note that we have assigned community membership differently despite observing the same graph!
Community detection is not a concept that can be divorced from context.
10. Directedness
Many methods do not incorporate direction!
Many methods that do incorporate direction do not allow for bidirected edges.
Different software packages may implement the same method with or without support for directed edges.
11. Weights
Unweighted:
• Binary relationships
• Data limitations
Weighted:
• Relationship strength
• Frequency of relationship
• Flow
12. Weights
Note edge thickness.
13. Weights
Many methods do not incorporate edge weights!
Methods that do incorporate edge weights may differ in acceptable values!
• Integers or real weights
• Strictly positive weights
Different software packages may implement the same method with or without support for weighted edges.
14. Resolution
Resolution is a concept inherited from optics. According to Wikipedia, "Optical resolution describes the ability of an imaging system to resolve detail in the object that is being imaged."
High resolution:
• Can make out many details! (15.1MP)
• But…
• Details may be noise
• Sometimes they don't matter!
Low resolution:
• Can't read a word!
• But…
• Can focus on broad regions
• Noise is out of focus
15. Resolution
Same graphs!
• High resolution (microscopic)
• Low resolution (macroscopic)
16. Resolution
Different hypotheses or questions correspond to different resolutions.
Different methods are more or less effective at detecting community structure at different resolutions.
Modularity-based methods cannot detect structure below a known resolution limit.
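The resolution limit can be illustrated with a small numerical experiment. The sketch below assumes the common definition of modularity, Q = sum over modules of [e_i/m - (d_i/(2m))^2], and a hypothetical "ring of cliques" test graph; neither the graph nor the function names come from the slides. It compares one community per clique against merging neighbouring cliques in pairs.

```python
def modularity(edges, membership):
    """Q = sum_i [ e_i/m - (d_i/(2m))^2 ] for an undirected, unweighted graph."""
    m = len(edges)
    internal = {}  # community -> number of edges with both ends inside it
    degree = {}    # community -> total degree of its vertices
    for u, v in edges:
        cu, cv = membership[u], membership[v]
        degree[cu] = degree.get(cu, 0) + 1
        degree[cv] = degree.get(cv, 0) + 1
        if cu == cv:
            internal[cu] = internal.get(cu, 0) + 1
    return sum(internal.get(c, 0) / m - (d / (2 * m)) ** 2
               for c, d in degree.items())

def ring_of_cliques(k, c):
    """k cliques of c vertices each; neighbouring cliques share one linking edge."""
    edges = []
    for i in range(k):
        base = i * c
        edges += [(base + a, base + b) for a in range(c) for b in range(a + 1, c)]
        # link the last vertex of this clique to the first vertex of the next one
        edges.append((base + c - 1, ((i + 1) % k) * c))
    return edges

def compare(k, c=3):
    """Return (Q with one community per clique, Q with cliques merged in pairs)."""
    edges = ring_of_cliques(k, c)
    n = k * c
    one_per_clique = {v: v // c for v in range(n)}
    merged_pairs = {v: v // (2 * c) for v in range(n)}
    return modularity(edges, one_per_clique), modularity(edges, merged_pairs)

q_small = compare(4)    # short ring: separate cliques score higher
q_large = compare(30)   # long ring: merged pairs score higher
print(q_small, q_large)
```

In the longer ring the merged partition scores higher even though each triangle is as dense as it can be, so a method that only maximizes Q would tend to overlook the individual triangles.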
17. Overlapping Communities
Palla, Derenyi, Farkas, Vicsek. Uncovering the overlapping community structure of complex networks in nature and society. Nature 435, 2005.
18. Computational Complexity Refresher
Computational complexity is a serious issue!
Data is becoming more abundant and more detailed.
Many quantitative research projects hinge on the feasibility of calculations.
Understanding computational complexity can allow you to communicate with department IT personnel or computer scientists to solve your problem.
Make sure your project is feasible before committing the time!
19. Computational Complexity Refresher
Computational complexity in the context of modern computing is primarily focused on two resources:
1. Time: How long does it take to perform a sequence of operations?
  - CPU/GPU
  - Exact vs. approximate solutions
2. Storage: How much space does it take to store our problem?
  - Memory and persistent storage (to a lesser degree)
  - Data representations
We tend to communicate time and storage complexity through Big-O notation.
20. Computational Complexity Refresher
In computational complexity, Big-O notation conveys information about how time and storage costs scale with inputs.
• O(1): constant, independent of input
• O(n): scales linearly with the size of input
• O(n^2): scales quadratically with the size of input
• O(n^3): scales cubically with the size of input
These terms often occur with log n terms and are then given the prefix quasi- (for example, O(n log n) is quasilinear).
For graph algorithms, the input n is typically
• |V|, the number of vertices
• |E|, the number of edges
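As a small illustration of how |V| and |E| enter storage costs, the sketch below counts stored entries for two common graph representations. The function names and the assumed density of five edges per vertex are hypothetical, chosen only for the example.

```python
def adjacency_matrix_cells(num_vertices):
    """A dense adjacency matrix stores one cell per ordered vertex pair: O(|V|^2)."""
    return num_vertices * num_vertices

def adjacency_list_entries(num_vertices, num_edges):
    """An undirected adjacency list stores each vertex once and each edge twice: O(|V| + |E|)."""
    return num_vertices + 2 * num_edges

for n in (1_000, 10_000, 100_000):
    e = 5 * n  # assumption: a sparse graph with about 5 edges per vertex
    print(n, adjacency_matrix_cells(n), adjacency_list_entries(n, e))
```

For sparse graphs the matrix grows quadratically while the list grows roughly linearly, which is why the choice of data representation matters for feasibility.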
21. Taxonomy of Methods
This taxonomy of methods follows the history of their development.
• Divisive Methods
  - Edge-betweenness (2002)
• Modularity Methods
  - Fast-greedy (2004)
  - Leading Eigenvector (2006)
• Dynamic Methods
  - Clique percolation (2005)
  - Walktrap (2005)
22. Edge Betweenness
Publication(s): Girvan, Newman. Community structure in social and biological networks. PNAS, 2002.
Basic Idea: Divide the network into successively smaller pieces by finding edges that bridge communities.
Constraints:
• Can be adapted to directed networks (igraph).
• Can be adapted to weights (no public software).
Time Complexity: O(|E|^2 |V|) in general, which is O(|V|^3) for sparse networks; O(|V|^2 log |V|) for special cases
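The divisive idea can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the igraph implementation: it handles only unweighted, undirected graphs, stops at the first split, and uses hypothetical function names and a toy graph of two triangles joined by one bridge.

```python
from collections import deque

def edge_betweenness(adj):
    """Shortest-path edge betweenness (Brandes-style accumulation), unweighted graph."""
    score = {}
    for s in adj:
        dist = {s: 0}
        paths = {s: 1.0}               # number of shortest paths from s
        preds = {v: [] for v in adj}   # predecessors on shortest paths
        order = []
        queue = deque([s])
        while queue:                   # breadth-first search from s
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    paths[w] = 0.0
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    paths[w] += paths[v]
                    preds[w].append(v)
        credit = {v: 0.0 for v in order}
        for w in reversed(order):      # push path credit back toward s
            for v in preds[w]:
                share = paths[v] / paths[w] * (1.0 + credit[w])
                edge = (min(v, w), max(v, w))
                score[edge] = score.get(edge, 0.0) + share
                credit[v] += share
    return score

def components(adj):
    """Connected components as a list of vertex sets."""
    seen, parts = set(), []
    for s in adj:
        if s in seen:
            continue
        part, stack = set(), [s]
        while stack:
            v = stack.pop()
            if v not in part:
                part.add(v)
                stack.extend(adj[v] - part)
        seen |= part
        parts.append(part)
    return parts

def first_split(edges):
    """Remove the highest-betweenness edge repeatedly until the graph first splits."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    start = len(components(adj))
    removed = []
    while len(components(adj)) == start:
        score = edge_betweenness(adj)  # recomputed after every removal
        u, v = max(score, key=score.get)
        adj[u].discard(v)
        adj[v].discard(u)
        removed.append((u, v))
    return components(adj), removed

edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
parts, removed = first_split(edges)
print(parts, removed)
```

Recomputing betweenness after every removal is what makes the method expensive, which is consistent with the time complexity quoted above.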
24. Quick Aside – Zach's Karate Club
Zachary's Karate Club: Social network of friendships between 34 members of a karate club at a US university in the 1970s.
Event: During the observation period, the club broke into 2 smaller clubs. This split occurred along a pre-existing social division between the two communities in the network.
Drawn from the Paper: Zachary. An information flow model for conflict and fission in small groups. Journal of Anthropological Research 33, 1977.
Download the Data: http://www-personal.umich.edu/~mejn/netdata/
25. Edge Betweenness
Only misclassification
26. Edge Betweenness
Betweenness tends to get the big picture right.
However, resolution can be a problem!
Do not draw conclusions about small communities from this algorithm alone.
27. Modularity
• e_i is the number of edges in module i
• d_i is the total degree of vertices in module i
• m is the total number of edges in the network
Q is the difference between the observed connectivity within modules and its expected value under the configuration model (degree distribution fixed).
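The formula itself did not survive in this transcript. A standard way to write modularity that matches the three definitions above, writing e_i and d_i for the per-module quantities, is given here as an assumption rather than as the authors' exact notation:

```latex
Q \;=\; \sum_{i} \left[ \frac{e_i}{m} \;-\; \left( \frac{d_i}{2m} \right)^{2} \right]
```

The first term is the observed fraction of edges that fall inside module i; the second is the fraction expected if edges were rewired at random while keeping every vertex degree fixed.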
28. Modularity
Remember our previous discussion on computational complexity?
Modularity maximization is an NP-hard problem.
This means that no polynomial-time algorithm for finding the exact maximum is known, and none exists unless P = NP!
Practical methods therefore try to solve for approximate solutions.
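One way to see why exhaustive search is hopeless is to count the candidate partitions. The number of ways to split n vertices into non-empty communities is the Bell number B(n); the sketch below computes it with the Bell triangle. This counting argument is added for illustration and is not from the slides.

```python
def bell_numbers(n_max):
    """Return [B(0), ..., B(n_max)] using the Bell triangle."""
    bells = [1]
    row = [1]
    for _ in range(n_max):
        new_row = [row[-1]]               # each row starts with the last entry of the previous row
        for value in row:
            new_row.append(new_row[-1] + value)
        row = new_row
        bells.append(row[0])
    return bells

bells = bell_numbers(34)
print(bells[10])   # partitions of a 10-vertex network
print(bells[34])   # partitions of a 34-vertex network
```

Even a network with only 34 vertices has far too many partitions to score one by one, which is why heuristics are used in practice.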
29. Modularity
Benjamin H. Good, Yves-Alexandre de Montjoye & Aaron Clauset, The Performance of Modularity Maximization in Practical Contexts, Phys. Rev. E 81, 046106 (2010)
30. Fast Greedy
Publication(s):
• Newman. Fast algorithm for detecting community structure in networks. Phys. Rev. E, 2004.
• Clauset, Newman, Moore. Finding community structure in very large networks. Phys. Rev. E, 2004.
• Wakita, Tsurumi. Finding Community Structure in Mega-scale Social Networks. 2007.
Basic Idea: Greedily assemble larger and larger communities from the ground up. Start by placing each vertex in its own community and then, at each step, merge the pair of communities that produces the best modularity at that step.
Constraints:
• Can be adapted to directed edges (no public software).
• Can be adapted to weights (igraph).
Time Complexity: O(|E||V| log |V|) worst case
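The agglomerative idea can be sketched naively as follows. This is not the efficient data structure from the papers above and not the igraph implementation: it recomputes modularity from scratch for every candidate merge and, as a simplifying assumption, stops at the first step where no merge improves Q. Function names and the toy graph are hypothetical.

```python
def modularity(edges, membership):
    """Q = sum_i [ e_i/m - (d_i/(2m))^2 ] for an undirected, unweighted graph."""
    m = len(edges)
    internal, degree = {}, {}
    for u, v in edges:
        cu, cv = membership[u], membership[v]
        degree[cu] = degree.get(cu, 0) + 1
        degree[cv] = degree.get(cv, 0) + 1
        if cu == cv:
            internal[cu] = internal.get(cu, 0) + 1
    return sum(internal.get(c, 0) / m - (d / (2 * m)) ** 2
               for c, d in degree.items())

def greedy_modularity(edges):
    """Merge the best pair of communities at each step while Q keeps improving."""
    nodes = sorted({v for edge in edges for v in edge})
    membership = {v: v for v in nodes}          # every vertex starts alone
    best_q = modularity(edges, membership)
    while True:
        labels = sorted(set(membership.values()))
        best_merge = None
        for i, a in enumerate(labels):
            for b in labels[i + 1:]:
                # candidate: relabel community b as community a
                trial = {v: (a if c == b else c) for v, c in membership.items()}
                q = modularity(edges, trial)
                if q > best_q + 1e-12:          # keep the best improving merge
                    best_q, best_merge = q, trial
        if best_merge is None:                  # no merge improves Q
            return membership, best_q
        membership = best_merge

# Hypothetical toy graph: two triangles joined by the single edge (2, 3).
edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
membership, q = greedy_modularity(edges)
print(membership, q)
```

Because every step commits to the locally best merge, an early merge is never undone, which is one reason greedy results can differ from the true modularity maximum.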
31. Fast Greedy
Fast-Greedy also tends to aggressively create larger communities to the detriment of smaller communities.
Why is this node red instead of blue?
32. Leading Eigenvector
Publication(s):
• Newman. Finding community structure in networks using the eigenvectors of matrices. Phys. Rev. E, 2006.
• Leicht, Newman. Community structure in directed networks. Phys. Rev. Lett., 2008.
Basic Idea: Use the signs of the components of the leading eigenvector of the modularity matrix to sequentially divide the network.
Constraints:
• Can be adapted to directed edges (no public software).
• Can be adapted to weights (igraph).
Time Complexity: O(|V|^2)
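A single bisection step can be sketched with a shifted power iteration. The sketch assumes the modularity matrix B_ij = A_ij - k_i k_j / (2m) from Newman's 2006 paper, performs only one split, and omits both the check for an indivisible network and the repeated subdivision. The function name, iteration count, and toy graph are assumptions for illustration.

```python
def leading_eigenvector_split(edges, n, iterations=500):
    """Split vertices 0..n-1 by the sign of the leading eigenvector of B."""
    # adjacency matrix and vertex degrees
    A = [[0.0] * n for _ in range(n)]
    for u, v in edges:
        A[u][v] = A[v][u] = 1.0
    k = [sum(row) for row in A]
    two_m = sum(k)
    # modularity matrix: observed edges minus configuration-model expectation
    B = [[A[i][j] - k[i] * k[j] / two_m for j in range(n)] for i in range(n)]
    # adding shift * I makes all eigenvalues non-negative, so power iteration
    # converges to the eigenvector with the largest (most positive) eigenvalue
    shift = max(sum(abs(x) for x in row) for row in B)
    x = [float(i + 1) for i in range(n)]        # arbitrary non-uniform start vector
    for _ in range(iterations):
        y = [sum(B[i][j] * x[j] for j in range(n)) + shift * x[i]
             for i in range(n)]
        norm = sum(value * value for value in y) ** 0.5
        x = [value / norm for value in y]
    positive = {i for i in range(n) if x[i] > 0}
    return positive, set(range(n)) - positive

# Hypothetical toy graph: two triangles joined by the single edge (2, 3).
edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
side_a, side_b = leading_eigenvector_split(edges, 6)
print(side_a, side_b)
```

The overall sign of an eigenvector is arbitrary, so only the grouping matters, not which side is reported as positive.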
33. Leading Eigenvector
Note that the leading eigenvector's results seem to split the difference between edge betweenness and fast-greedy in this case.
Why are these nodes not a part of the larger modules?
34. Walktrap
Publication(s): Pons, Latapy. Computing communities in large networks using random walks. JGAA, 2006.
Basic Idea: Use short random walks on the network to compute pairwise similarity measures between vertices. Use these similarity values to aggregate vertices into communities.
Constraints:
• Can be adapted to directed edges (igraph).
• Can be adapted to weights (igraph).
• Can alter resolution by walk length (igraph).
Time Complexity: depends on walk length, O(|V|^2 log |V|) typically
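The similarity measure can be sketched as a random-walk distance between two vertices. The sketch follows the degree-normalized distance described by Pons and Latapy, with two assumptions of its own: it computes the walk probabilities exactly instead of sampling walks, and it does not add self-loops. It covers only the distance, not the merging of vertices into communities. Names and the toy graph are hypothetical.

```python
def walk_distribution(adj, start, t):
    """Exact probability of being at each vertex after t random-walk steps from start."""
    prob = {v: 0.0 for v in adj}
    prob[start] = 1.0
    for _ in range(t):
        nxt = {v: 0.0 for v in adj}
        for v, p in prob.items():
            for w in adj[v]:
                nxt[w] += p / len(adj[v])   # move to a uniformly chosen neighbour
        prob = nxt
    return prob

def walk_distance(adj, i, j, t=3):
    """Degree-normalized distance between the t-step distributions of i and j."""
    pi = walk_distribution(adj, i, t)
    pj = walk_distribution(adj, j, t)
    return sum((pi[k] - pj[k]) ** 2 / len(adj[k]) for k in adj) ** 0.5

# Hypothetical toy graph: two triangles joined by the single edge (2, 3).
adj = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2, 4, 5], 4: [3, 5], 5: [3, 4]}
same_triangle = walk_distance(adj, 0, 1)
different_triangles = walk_distance(adj, 0, 5)
print(same_triangle, different_triangles)
```

Vertices in the same dense group see the rest of the network in a similar way after a few steps, so their distance is small; the walk length t plays the role of the resolution knob mentioned above.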
35. Walktrap
36. Walktrap
Walktrap assigns vertices to different communities than previous algorithms.
Note that the walk length can be changed to alter resolution.
Furthermore, when the walks are simulated rather than computed exactly, the process is stochastic, so results may change even after fixing the walk length and input graph!
37. Method Comparison
• Edge-Betweenness
• Fast-Greedy
• Walktrap
• Leading Eigenvector
38. Recommended Software – igraph
• Core Library: C
• Interfaces: Python, R, Ruby
• Features: Graph operations & algorithms, random graph generation, graph statistics, community detection, visualization layout, plotting
• URL: http://igraph.sourceforge.net/
• Documentation: http://igraph.sourceforge.net/documentation.html
40. Frontiers of Community Detection: Temporal Network Dynamics
Gergely Palla, Albert-Laszlo Barabasi & Tamas Vicsek, Quantifying Social Group Evolution, Nature 446:7136, 664-667 (2007)
41. Frontiers of Community Detection: Community Structure Over Scales, Time Period, etc.
Science 14 May 2010, Vol. 328, no. 5980, pp. 876-878
42. Community Detection Review Articles
Some Useful Review Articles:
• Mason A. Porter, Jukka-Pekka Onnela and Peter J. Mucha. 2009. Communities in Networks. Notices of the American Mathematical Society 56: 1082-1166.
• Santo Fortunato. 2010. Community detection in graphs. Physics Reports 486: 75-174.
43. A Transition to Our Sink Method Paper
• Now we are going to transition to a specific project where we apply some of the ideas contained herein
• Provide a very brief introduction to the Exponential Random Graph Models (p*)
44. Our Sink Paper – Physica A
45. Dynamic Acyclic Digraphs
• We are interested in conducting community detection in the special case of dynamic acyclic digraphs…
• Before we transition to the full presentation, some background:
• Dynamic = Changing both locally and globally
• Digraph = Directed graph
• Acyclic = No cycles, because current documents generally cannot cite documents in the future
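Whether a citation dataset really is acyclic can be checked directly. The sketch below uses Kahn's topological-sort algorithm, a standard technique that the slides do not name; the function name and the document labels "A", "B", "C" are hypothetical.

```python
from collections import deque

def is_acyclic(citations):
    """citations: iterable of (citing, cited) pairs. True if there is no directed cycle."""
    out_edges, in_degree = {}, {}
    for citing, cited in citations:
        out_edges.setdefault(citing, []).append(cited)
        out_edges.setdefault(cited, [])
        in_degree[cited] = in_degree.get(cited, 0) + 1
        in_degree.setdefault(citing, 0)
    # start from documents that nobody cites and peel the graph layer by layer
    queue = deque(v for v, d in in_degree.items() if d == 0)
    visited = 0
    while queue:
        v = queue.popleft()
        visited += 1
        for w in out_edges[v]:
            in_degree[w] -= 1
            if in_degree[w] == 0:
                queue.append(w)
    # vertices on a cycle never reach in-degree zero, so they are never visited
    return visited == len(in_degree)

forward_in_time = [("C", "B"), ("C", "A"), ("B", "A")]   # newer documents cite older ones
mutual = [("A", "B"), ("B", "A")]                         # two documents citing each other
print(is_acyclic(forward_in_time), is_acyclic(mutual))
```

Real citation data sometimes contains a few cycles, for example from documents issued together, which fits the "generally" in the definition above.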
46. Dynamic Acyclic Digraphs
Case-to-case judicial citation networks are dynamic acyclic digraphs.
So are academic citation networks, patents, etc.