Multi-level analysis on structures and dynamics of OSN
1. Multi-level Analysis on Structures and Dynamics of OSN
Haewoon Kwak
Department of Computer Science, KAIST
Ph.D. thesis defense, April 12th, 2011
Advisor: Sue Moon
2. Outline
• Background: data sources for social research
– Traditional methodologies developed in sociology
– Electronic footprints
• Complex structures and dynamics of OSN
• Multi-level approach
– Individual level
– Dyad level
– Community level
– Network-wide level
• Summary & future direction
3. Methodologies developed in sociology
• Surveys
• Questionnaires
• Archives
• Observations
• Experiments
• Issues of data scalability, quality, & measurement [Marsden 90]
4. Self-report vs. observed
Eagle N, Pentland A, Lazer D (2009) Inferring friendship network structure using mobile phone data. PNAS, 106:15274–15278.
5. The emergence of electronic footprints
Tracking People's Electronic Footprints, Science, 10 November 2006: Vol. 314 no. 5801 pp. 914-916
6. Forms of electronic footprints
• Mobile phone call logs [J. P. Onnela et al. 07]
• E-mail logs [Kossinets & D. Watts 09]
• Online ‘friend’ relationships [Ahn et al. 07]
• Facebook wall postings [S. Golder et al. 07]
• Conversations on MSN [Leskovec & Horvitz 07]
• Photos in Flickr [Marlow et al. 07]
8. Tremendous volume of records
• Facebook
– 30B pieces of new information / month
• Twitter
– 1B messages / week
– 7TB / day
– “300GB while I give this talk”
http://www.facebook.com/press/info.php?statistics
NoSQL at Twitter (NoSQL EU 2010)
9. Rich behavior in OSN
• Establish online ‘friend’ relationship
• Send a message
• Send a gift
• Upload a photo
• Share one’s location
• Play a game
13. Q: Why do we study OSN? A: OSN is …
• New source that reveals human nature
– As a miniature of human society
– e.g. verifying balance theory & the weak tie hypothesis
• Virtual world interacting with the real world
– Election campaign
– Gossip propagation
– Reputation management
– Money exchange
14. Complex structures and dynamics (1)
“Chains of affection: The structure of adolescent romantic and sexual networks”, American Journal of Sociology, Vol. 110, No. 1 (2004)
Facebook social graph by Facebook Data team
15. Complex structures and dynamics (2)
“The ‘New’ Science of Networks”, Annual Review of Sociology, Vol. 30 (2004)
“Network biology: understanding the cell's functional organization”, Nature Reviews Genetics, Vol. 5 (2004)
“Analysis of Topological Characteristics of Huge Online Social Networking Services”, WWW (2007)
16. Complex structures and dynamics (3)
• OSN’s evolution… only #(users)?
17. Our approach – Divide and conquer
• Multi-level approach to OSN
• Multiple views focusing on different entities
– Individual
– Dyad
– Community
– Network-wide
18. Strong points of multi-level approach
• Each level has its own inherent resolution
– to capture elemental processes
– to understand the complex structures and dynamics as a combination of findings across multiple levels
19. A story begins with an individual
e.g. How does a user come to have positional power?
28. Complex structures and dynamics (3): Complex phenomena in multi-level
• Different perspectives of the evolution of OSN
– Increasing avg. time on sites
– Increasing # of friends
– Diversifying types of relationship
– Increasing # of cohesive groups
– Increasing density of the network
– Shortening the avg. diameter of the network
– Absorbing other networks
29. Complex structures and dynamics of OSN
• Cannot be answered at one level only
• Thus, we tackle it step by step from microscopic to macroscopic view
– Individual level
– Dyad level
– Community level
– Network-wide level
30. Overview of this thesis
• Multi-level analysis of structures and dynamics
– Personal preferences and friend recommendation
– Relationship dynamics
– Consistent community identification
– Interplay between structures and dynamics
31. Overview of this thesis
2008 2009 2010 2011
Comparison of Online Social Relations In Terms of Volume vs. Interaction: A Case Study of Cyworld
Hyunwoo Chun, Haewoon Kwak, Young-Ho Eom, Yong-Yeol Ahn, Sue Moon, and Hawoong Jeong, The 8th ACM SIGCOMM Conference on Internet Measurement (IMC), 2008.
32. Overview of this thesis
2008 2009 2010 2011
Connecting Users with Similar Interests Across Multiple Web Services
Haewoon Kwak, Hwa-Yong Shin, Jong-Il Yoon, and Sue Moon, The 3rd International AAAI Conference on Weblogs and Social Media (ICWSM), Poster, 2009.
Mining communities in networks: A solution for consistency and its evaluation
Haewoon Kwak, Yoonchan Choi, Young-Ho Eom, Hawoong Jeong, and Sue Moon. The 9th ACM SIGCOMM Conference on Internet Measurement (IMC), 2009.
33. Overview of this thesis
2008 2009 2010 2011
What is Twitter, a Social Network or News Media?
Haewoon Kwak, Changhyun Lee, Hosung Park, and Sue Moon, The 19th International Conference on World Wide Web (WWW), 2010
Finding Influentials Based on the Temporal Order of Info. Adoption in Twitter
Changhyun Lee, Haewoon Kwak, Hosung Park, and Sue Moon, The 19th International Conference on World Wide Web (WWW), Poster, 2010
Ph.D. Thesis Proposal
34. Overview of this thesis
2008 2009 2010 2011
Fragile Online Relationship: A First Look At Unfollow Dynamics In Twitter
Haewoon Kwak, Hyunwoo Chun, and Sue Moon, The 29th International Conference on Human Factors in Computing Systems (CHI), 2011.
35. Individual & dyad-level view: Personal preferences & friend recommendation
Connecting Users with Similar Interests Across Multiple Web Services
Haewoon Kwak, Hwa-Yong Shin, Jong-Il Yoon, and Sue Moon, The 3rd International AAAI Conference on Weblogs and Social Media (ICWSM), Poster, 2009.
System and method for extracting similar users across heterogeneous web servers
Sue Moon, Haewoon Kwak, Hwa-Yong Shin, and Jong-Il Yoon, Korean Patent No. 10-1010997, 2011.
36. Motivation
• Groups in OSN
– Motivate user activities
– Share common interests and offline commonalities
(Figure label: Blackbox)
But, restricted within the boundary of each service
37. Our goals
• To recommend people who share interests across various kinds of web services
• To support many web services without modification
• Not to burden users with additional profile management
38. Key insights
• Inferring interests from tags
– Free keywords to describe user-generated content
– Publicly accessible
– Simple format (plain text)
40. Datasets
• Tags from six different services
• Refinement by WordNet API

Service      Contents  Uniq. users  Uniq. tags  Avg. tags
Del.icio.us  Bookmark       40,072   1,092,534      227.2
Flickr       Photo           6,366      71,724       32.4
YouTube      Video           9,481     171,990       56.5
LiveJournal  Blog           49,792     729,975      44.49
Last.FM      Music          54,464      95,901      10.95
AllBlog      Blog           24,559     383,374      44.04
41. Findings about user interests
• Highly skewed popularity
– Top 20% of tags associated with 90% of items
• Service-dependent
– About 20% of tags belong to more than one service
• Frequently change over time
42. Recommendation algorithms
• Tag weight assignment
– Absolute number of times a tag is used
– Normalized number, N, of times a tag is used
– N-idf
• Similarity calculation between two tag sets
– Sum of the weights of common tags
– Cosine similarity based on the vector model
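The N-idf weighting and cosine similarity listed on this slide can be sketched as follows. This is a minimal illustration, not the implementation evaluated in the thesis: it assumes N is a tag's count divided by the user's total tag uses, and that idf is the logarithm of (number of users / number of users who used the tag). The function names and the input layout are hypothetical.

```python
import math
from collections import Counter


def n_idf_weights(user_tags):
    """Map each user to {tag: N-idf weight}.

    user_tags: dict of user -> list of tags (repeats allowed).
    ASSUMPTION: N = tag count / total tag uses of the user,
    idf = log(number of users / number of users who used the tag).
    """
    n_users = len(user_tags)
    # Document frequency: how many users used each tag at least once.
    df = Counter(tag for tags in user_tags.values() for tag in set(tags))
    weights = {}
    for user, tags in user_tags.items():
        counts = Counter(tags)
        total = sum(counts.values())
        weights[user] = {
            tag: (count / total) * math.log(n_users / df[tag])
            for tag, count in counts.items()
        }
    return weights


def cosine_similarity(w1, w2):
    """Cosine of the angle between two sparse tag-weight vectors."""
    dot = sum(w * w2[tag] for tag, w in w1.items() if tag in w2)
    norm1 = math.sqrt(sum(w * w for w in w1.values()))
    norm2 = math.sqrt(sum(w * w for w in w2.values()))
    if norm1 == 0 or norm2 == 0:
        return 0.0
    return dot / (norm1 * norm2)
```

Users with no tag in common get similarity 0, and a tag used by every user gets weight 0, so it cannot drive a recommendation by itself.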
43. User study of algorithm evaluation
N-idf + cosine similarity works
44. Conditional prob. of recommendation
• Flickr and YouTube users find good matches in other services
• LiveJournal attracts users from other services better than any other service
45. Dyad-level view: Social relationship dynamics
Comparison of Online Social Relations In Terms of Volume vs. Interaction: A Case Study of Cyworld
Hyunwoo Chun, Haewoon Kwak, Young-Ho Eom, Yong-Yeol Ahn, Sue Moon, and Hawoong Jeong, The 8th ACM SIGCOMM Conference on Internet Measurement (IMC’08), 2008.
Fragile Online Relationship: A First Look At Unfollow Dynamics In Twitter
Haewoon Kwak, Hyunwoo Chun, and Sue Moon, The 29th International Conference on Human Factors in Computing Systems (CHI’11), 2011.
46. Relationship dynamics observed from
• Exchange of guestbook messages in Cyworld
• Relationship dissolution by unfollow in Twitter
47. Part I: guestbook log analysis in Cyworld
• Most popular online SNS in Korea (22M users)
• Guestbook is the most popular feature
48. Motivation
• Online ‘friends’ relationship
– Needs no more cost once established
– Commonly mutual
– All online friends are considered equally
Thus, online ‘friends’ are not enough to represent social relationships at that time
49. Key insights
• Directionality and strength of user interactions reveal more meaningful relationships than online ‘friends’ do
50. Graph construction
From logs to the activity network
< From, To, When >
<A, C, 20040103T1103>
<B, C, 20040103T1106>
<C, B, 20040104T1201>
<B, C, 20040104T0159>
Directed & weighted “Activity network”: A→C (weight 1), B→C (weight 2), C→B (weight 1)
8 billion messages
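The aggregation from log records to a directed, weighted activity network can be sketched as below, using the four records shown on the slide. The function name is hypothetical, and the sketch assumes an edge weight is simply the number of messages sent in that direction.

```python
from collections import Counter


def build_activity_network(log):
    """Aggregate <From, To, When> records into directed edge weights.

    ASSUMPTION: the weight of edge (sender, receiver) is the number of
    messages sent in that direction; timestamps are not used here.
    """
    weights = Counter()
    for sender, receiver, _when in log:
        weights[(sender, receiver)] += 1
    return weights


log = [
    ("A", "C", "20040103T1103"),
    ("B", "C", "20040103T1106"),
    ("C", "B", "20040104T1201"),
    ("B", "C", "20040104T0159"),
]
network = build_activity_network(log)
for (sender, receiver), weight in sorted(network.items()):
    print(f"{sender} -> {receiver}: {weight}")
```

Keeping (B, C) and (C, B) as separate keys is what preserves directionality, so reciprocity can later be read off by comparing the two weights.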
51. Most social networks
• Have power-law degree distribution
– A small number of high-degree nodes
– A large number of low-degree nodes
• Have common characteristics
– Short diameter
– Fault tolerant
Nature Reviews Genetics 5, 101-113, 2004
57. Reciprocity in user activities
(Plot reference line: y = x)
– Highly reciprocal
– Quantitative proof of spammers
58. Disparity
• Do users interact evenly with all friends?
Disparity is defined for node i; Y(k) is the average over all nodes of degree k
Journal of Physics A: Mathematical and General, 20:5273–5288, 1987.
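The slide's equation did not survive extraction. A minimal sketch of the disparity measure as it is commonly defined in weighted-network analysis is given below, assuming Y_i is the sum over partners j of (w_ij / s_i)^2, where s_i is the total weight of node i. Whether the slide used exactly this form is an assumption, and the function names are hypothetical.

```python
from collections import defaultdict


def disparity(partner_weights):
    """Y_i for one node, given {partner: weight}.

    ASSUMED definition: sum over partners of (w_ij / s_i)^2, where
    s_i is the node's total weight.
    """
    strength = sum(partner_weights.values())
    if strength == 0:
        return 0.0
    return sum((w / strength) ** 2 for w in partner_weights.values())


def disparity_by_degree(network):
    """Y(k): mean Y_i over all nodes with exactly k partners.

    network: dict of node -> {partner: weight}.
    """
    by_degree = defaultdict(list)
    for partners in network.values():
        if partners:
            by_degree[len(partners)].append(disparity(partners))
    return {k: sum(ys) / len(ys) for k, ys in by_degree.items()}
```

Under this definition a node that spreads its messages evenly over k partners has Y = 1/k, while a node that talks almost exclusively to one partner has Y close to 1, which is why k·Y(k) is often plotted against k.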
60. Disparity in user activities
Communication pattern changes by #(partners)
61. Network Motifs
• 13 possible interaction patterns with 3 users
• Proportions of each pattern (motif) determine the characteristic of the entire network
Network Motifs: Simple Building Blocks of Complex Networks, Science, 298(5594):824-827, 2002
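The count of 13 can be checked by brute force: enumerate every directed graph on three labelled nodes, keep those that are connected when direction is ignored, and merge graphs that differ only by relabelling. The sketch below is an illustration of that count, not code from the thesis; all names are hypothetical.

```python
from itertools import permutations, product

NODES = (0, 1, 2)
# The 6 possible directed edges among 3 nodes (no self-loops).
ARCS = [(u, v) for u in NODES for v in NODES if u != v]


def is_weakly_connected(arcs):
    """True if all 3 nodes are linked when edge direction is ignored.

    With 3 nodes, any 2 distinct undirected pairs already form a path
    through all of them.
    """
    return len({frozenset(arc) for arc in arcs}) >= 2


def canonical(arcs):
    """Smallest relabelled arc list over all 6 node permutations."""
    return min(
        tuple(sorted((p[u], p[v]) for u, v in arcs))
        for p in permutations(NODES)
    )


def connected_triad_motifs():
    """Set of distinct connected 3-node directed patterns."""
    motifs = set()
    for mask in product((0, 1), repeat=len(ARCS)):
        arcs = [arc for arc, keep in zip(ARCS, mask) if keep]
        if is_weakly_connected(arcs):
            motifs.add(canonical(arcs))
    return motifs


print(len(connected_triad_motifs()))
```

Of the 64 labelled graphs, 16 patterns remain after relabelling; the three that are not connected (no edge, one edge, one mutual pair) are excluded, leaving 13.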
62. Motif analysis in complex networks
Superfamilies of Evolved and Designed Networks, Science, 303(5663):1538-1542, 2004
In social networks, triads are likely to be observed
69. Motivation
• Relationship formation and dissolution
– Formation has received much attention
– Dissolution has received little, due to the lack of data
• A proxy such as the disappearance of communication
– has difficulty capturing all means of communication
– regards the absence of an event as strictly intentional
70. Key insights & research questions
• Unfollow in Twitter is an explicit expression of relationship dissolution
• What are the characteristics of unfollow?
• Why do people unfollow?
71. Datasets
• 1.2M Korean-speaking users detected by
– Korean in tweets, bio, location, or screen name
• Daily snapshots of follow relationships
– G(I): June 25th to July 15th, 2010
– G(II): August 2nd to August 31st, 2010
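One way daily snapshots could yield unfollow events is a set difference between consecutive days. The sketch below assumes each snapshot is a set of (follower, followee) pairs and that an unfollow is a pair present on one day and absent on the next; the slide does not spell out the extraction procedure, so both the representation and the function name are assumptions. Links that vanish because an account was deleted or suspended would also show up here and would need separate filtering.

```python
def diff_snapshots(previous, current):
    """Compare two daily snapshots of follow relationships.

    Each snapshot is a set of (follower, followee) pairs.
    Returns (new_follows, unfollows).
    """
    new_follows = current - previous
    unfollows = previous - current
    return new_follows, unfollows


day1 = {("a", "b"), ("b", "a"), ("a", "c")}
day2 = {("a", "b"), ("b", "a"), ("c", "a")}
new_follows, unfollows = diff_snapshots(day1, day2)
print(sorted(new_follows), sorted(unfollows))
```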
72. Growing Korean social graphs
• Increasing # of users
– G(I): 870,057, +7,599/day
– G(II): 1,203,196, +8,515/day
• Increasing (high) reciprocity
– G(I): 56~58%
– G(II): 61~62%
• Increasing avg. # of followees
– 59.7 → 75.7
79. Volume of activity is not a good proxy
• 85.6% of links do not involve a single reply, mention, or retweet
– 96.3% involve 3 or fewer
• People just subscribe to others’ tweets passively
81. Interviews about motivation
• Burst tweets
• Uninteresting topics
• Mundane details of daily life
• Politics
82. Confirmed by data
• Burst tweets are likely to lead to unfollow
All users: Pearson corr. = 0.0554
Followee < 200: Pearson corr. = 0.5833
83. Community-level view: Consistent community identification
Mining communities in networks: A solution for consistency and its evaluation
Haewoon Kwak, Yoonchan Choi, Young-Ho Eom, Hawoong Jeong, and Sue Moon. The 9th ACM SIGCOMM Conference on Internet Measurement (IMC’09), 2009.
Consistent Community Identification in Complex Networks
Haewoon Kwak, Young-Ho Eom, Yoonchan Choi, Hawoong Jeong, and Sue Moon, Preprint (arXiv:0910.1508v2), 2009
85. Modularity, Q
• e_ii: ratio of the number of links between nodes belonging to community i over all links
• a_i: ratio of ends of edges that are attached to vertices in community i
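The slide's formula image is missing from the transcript. With the two quantities defined above, modularity is conventionally written as Q = sum over communities i of (e_ii − a_i^2). The sketch below computes it for an unweighted, undirected edge list under that standard definition; the function name and input layout are hypothetical.

```python
from collections import defaultdict


def modularity(edges, community):
    """Q = sum_i (e_ii - a_i^2) for an undirected, unweighted graph.

    edges: list of (u, v) pairs; community: dict of node -> label.
    """
    m = len(edges)
    if m == 0:
        return 0.0
    internal = defaultdict(int)  # links with both ends in community i
    ends = defaultdict(int)      # edge ends attached to community i
    for u, v in edges:
        ends[community[u]] += 1
        ends[community[v]] += 1
        if community[u] == community[v]:
            internal[community[u]] += 1
    return sum(
        internal[c] / m - (ends[c] / (2 * m)) ** 2
        for c in ends
    )


# Two triangles joined by a single bridge edge (2, 3).
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
split = {0: "A", 1: "A", 2: "A", 3: "B", 4: "B", 5: "B"}
print(modularity(edges, split))
```

For this toy graph the two-triangle split gives Q = 5/14, while putting every node in one community gives Q = 0.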
87. Motivation
• Greedy algorithms to maximize Q
– Widely used for networks of a few tens of millions of nodes
– By nature, no guarantee of finding the global maximum
– Local maxima = inconsistent partitioning
88. Our goals
• Measure the level of inconsistency
• Develop a new method to achieve consistency
95. Intuitions behind new algorithm
• Every edge has a pairwise membership prob.
• High pairwise membership prob. indicates that two nodes are likely to be in the same community
• Weighted versions of existing algorithms place edges of high weight within the community
96. Our new algorithm
1. After a cycle of N runs,
– Calculate pairwise membership prob. of each edge
– Assign pairwise membership prob. to edge weight
2. Return to another cycle of N runs with the weighted network
3. Go to 1. again until C >= τ (predefined threshold)
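The three steps above can be sketched as a loop around any randomized, weight-aware community detector. This is an illustrative skeleton under stated assumptions, not the thesis implementation: `detect` is a hypothetical callable supplied by the caller, and because the definition of the consistency measure C is not in this transcript, the sketch substitutes an assumed one (1 minus the mean binary entropy of the edge probabilities). `max_cycles` is an added safety cap that is not on the slide.

```python
import math


def pairwise_membership(edges, partitions):
    """Fraction of runs in which both endpoints of each edge share a community."""
    n_runs = len(partitions)
    return {
        (u, v): sum(p[u] == p[v] for p in partitions) / n_runs
        for u, v in edges
    }


def consistency(probs):
    """ASSUMED stand-in for C: 1 - mean binary entropy of edge probabilities."""
    def entropy(p):
        if p in (0.0, 1.0):
            return 0.0
        return -(p * math.log2(p) + (1 - p) * math.log2(1 - p))
    return 1.0 - sum(entropy(p) for p in probs.values()) / len(probs)


def consistent_communities(edges, detect, n_runs=10, tau=0.99, max_cycles=20):
    """Repeat cycles of n_runs detections, reweighting edges between cycles.

    detect(weights) -> dict of node -> community label, where weights maps
    each edge (u, v) to its current weight.
    """
    weights = {edge: 1.0 for edge in edges}
    for _ in range(max_cycles):
        partitions = [detect(weights) for _ in range(n_runs)]
        probs = pairwise_membership(edges, partitions)  # step 1
        c = consistency(probs)
        if c >= tau:                                    # step 3
            break
        weights = probs                                 # step 2
    return partitions[0], c
```

Under the assumed measure, C reaches 1 only when every edge is placed inside (or outside) a community in all N runs, which matches the slide's notion of a consistent partitioning.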
98. Network-wide view: Interplay between structures and dynamics
What is Twitter, a Social Network or News Media?
Haewoon Kwak, Changhyun Lee, Hosung Park, and Sue Moon, The 19th International Conference on World Wide Web (WWW), 2010
Finding Influentials Based on the Temporal Order of Info. Adoption in Twitter
Changhyun Lee, Haewoon Kwak, Hosung Park, and Sue Moon, The 19th International Conference on World Wide Web (WWW), Poster, 2010
102. Low reciprocity in follow relationships
• Only 22.1% of user pairs follow each other
• Much lower than
– 68% on Flickr
– 84% on Yahoo! 360
– 77% on Cyworld guestbook
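A reciprocity figure of this kind can be computed from a list of follow links as sketched below. The sketch assumes the denominator is the number of connected user pairs (as the slide's wording "user pairs" suggests) rather than the number of directed links; the function name is hypothetical.

```python
def reciprocity(follows):
    """Fraction of connected user pairs that follow each other.

    follows: iterable of (follower, followee) pairs.
    ASSUMPTION: the denominator counts unordered connected pairs.
    """
    links = {(u, v) for u, v in follows if u != v}
    pairs = {frozenset(link) for link in links}
    if not pairs:
        return 0.0
    # Each mutual pair contributes two directed links, hence the // 2.
    mutual = sum(1 for u, v in links if (v, u) in links) // 2
    return mutual / len(pairs)


follows = [("a", "b"), ("b", "a"), ("a", "c"), ("d", "a")]
print(reciprocity(follows))
```

In the toy input only the pair (a, b) is mutual out of three connected pairs, so the result is one third.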
110. Summary
• Multi-level analysis on structures and dynamics
– Individual-level
– Dyad-level
– Community-level
– Network-wide level
111. Questions we raise
• Elemental processes in each level
– Personal preferences and friend recommendation
– Relationship dynamics
– Consistent community identification
– Interplay between structures and dynamics
112. Our contributions
• From new algorithms to analyses
– New algorithms and their evaluation for recommending those who have similar interests across services
– Analysis of actual interactions in contrast to friend relationships in a huge OSN, Cyworld
– Quantitative and qualitative analysis of unfollow
– New metrics for evaluation of consistency in communities identified by existing algorithms
– Analysis of structures and dynamics of a huge directed network, Twitter
113. Future directions
• The interplay among the parallel social networks from multiple services
• Conflicting information pathways in social media
Multirelational organization of large-scale social networks in an online world, PNAS 107(31):13636-13641
114. Our goals
• We aggregate parallel social networks captured in each service and construct a big network
• We observe the differences between parallel networks
• We observe the interplay between parallel networks such as node migration
• We compare users’ positional power between parallel networks
115. Conflicting information pathways
• Some people selectively propagate information by their preferences
• Different transmission paths
– Some know the correct info., but others do not