
Multi-level analysis on structures and dynamics of OSN

Ph.D thesis defense. (April 2011)

Published in: Data & Analytics


  1. Multi-level Analysis on Structures and Dynamics of OSN
     Haewoon Kwak, Department of Computer Science, KAIST
     Ph.D. thesis defense, April 12th, 2011
     Advisor: Sue Moon
  2. Outline
     • Background: data sources for social research
       – Traditional methodologies developed in sociology
       – Electronic footprints
     • Complex structures and dynamics of OSN
     • Multi-level approach
       – Individual level
       – Dyad level
       – Community level
       – Network-wide level
     • Summary & future direction
  3. Methodologies developed in sociology
     • Surveys
     • Questionnaires
     • Archives
     • Observations
     • Experiments
     • Issues of data scalability, quality, & measurement [Marsden 90]
  4. Self-report vs. observed
     Eagle N, Pentland A, Lazer D (2009) Inferring friendship network structure using mobile phone data. PNAS, 106:15274–15278.
  5. The emergence of electronic footprints
     Tracking People's Electronic Footprints, Science 10 November 2006: Vol. 314 no. 5801 pp. 914-916
  6. Forms of electronic footprints
     • Mobile phone call logs [J. P. Onnela et al. 07]
     • E-mail logs [Kossinets & D. Watts 09]
     • Online ‘friend’ relationships [Ahn et al. 07]
     • Facebook wall postings [S. Golder et al. 07]
     • Conversations on MSN [Leskovec & Horvitz 07]
     • Photos in Flickr [Marlow et al. 07]
  7. The era of online social networks
  8. Tremendous volume of records
     • Facebook
       – 30B pieces of new information / month
     • Twitter
       – 1B messages / week
       – 7TB / day
       – “300GB while I give this talk”
     Sources: http://www.facebook.com/press/info.php?statistics; NoSQL at Twitter (NoSQL EU 2010)
  9. Rich behavior in OSN
     • Establish online ‘friend’ relationship
     • Send a message
     • Send a gift
     • Upload a photo
     • Share one’s location
     • Play a game
  10. Q: Why do we study OSN? A: OSN is …
      • New source that reveals human nature
        – As a miniature of human society
        – e.g. verifying balance theory & weak tie hypothesis
      • Virtual world interacting with a real world
        – Election campaign
  11. Q: Why do we study OSN? A: OSN is …
      • New source that reveals human nature
        – As a miniature of human society
        – e.g. verifying balance theory & weak tie hypothesis
      • Virtual world interacting with a real world
        – Election campaign
        – Gossip propagation
  12. Q: Why do we study OSN? A: OSN is …
      • New source that reveals human nature
        – As a miniature of human society
        – e.g. verifying balance theory & weak tie hypothesis
      • Virtual world interacting with a real world
        – Election campaign
        – Gossip propagation
        – Reputation management
  13. Q: Why do we study OSN? A: OSN is …
      • New source that reveals human nature
        – As a miniature of human society
        – e.g. verifying balance theory & weak tie hypothesis
      • Virtual world interacting with a real world
        – Election campaign
        – Gossip propagation
        – Reputation management
        – Money exchange
  14. Complex structures and dynamics (1)
      American Journal of Sociology, Vol. 100, No. 1. “Chains of affection: The structure of adolescent romantic and sexual networks” (2004)
      Facebook social graph by Facebook Data team
  15. Complex structures and dynamics (2)
      Annual Review of Sociology, Vol. 30, “THE ‘NEW’ SCIENCE OF NETWORKS” (2004)
      Nature Reviews Genetics, Vol. 5, “Network biology: understanding the cell's functional organization” (2004)
      WWW, “Analysis of Topological Characteristics of Huge Online Social Networking Services” (2007)
  16. Complex structures and dynamics (3)
      • OSN’s evolution… only #(users)?
  17. Our approach – Divide and conquer
      • Multi-level approach to OSN
      • Multiple views focusing on different entities
        – Individual
        – Dyad
        – Community
        – Network-wide
  18. Strong points of multi-level approach
      • Each level has its own inherent resolution
        – to capture elemental processes
        – to understand the complex structures and dynamics as a combination of findings across multiple levels
  19. A story begins with an individual
      e.g. How does a user have positional power?
  20. An individual has properties
  21. Individuals have properties
  22. Similar properties drive relationships
  23. Densely connected individuals form a community
  24. Individuals interact with others in the same community
  25. Some individuals connect to many people in a community
  26. Some individuals bridge different communities
  27. Complex structures and dynamics (3)
      • OSN’s evolution… only #(users)?
  28. Complex phenomena in multi-level
      • Different perspectives of the evolution of OSN
        – Increasing avg. time on sites
        – Increasing # of friends
        – Diversifying types of relationship
        – Increasing # of cohesive groups
        – Increasing density of the network
        – Shortening the avg. diameter of the network
        – Absorbing other networks
  29. Complex structures and dynamics of OSN
      • Cannot be answered at one level only
      • Thus, we tackle it step by step from microscopic to macroscopic view
        – Individual level
        – Dyad level
        – Community level
        – Network-wide level
  30. Overview of this thesis
      • Multi-level analysis of structures and dynamics
        – Personal preferences and friend recommendation
        – Relationship dynamics
        – Consistent community identification
        – Interplay between structures and dynamics
  31. Overview of this thesis (timeline 2008–2011: 2008)
      Comparison of Online Social Relations In Terms of Volume vs. Interaction: A Case Study of Cyworld
      Hyunwoo Chun, Haewoon Kwak, Young-Ho Eom, Yong-Yeol Ahn, Sue Moon, and Hawoong Jeong, The 8th ACM SIGCOMM Conference on Internet Measurement (IMC), 2008.
  32. Overview of this thesis (timeline 2008–2011: 2009)
      Connecting Users with Similar Interests Across Multiple Web Services
      Haewoon Kwak, Hwa-Yong Shin, Jong-Il Yoon, and Sue Moon, The 3rd International AAAI Conference on Weblogs and Social Media (ICWSM), Poster, 2009.
      Mining communities in networks: A solution for consistency and its evaluation
      Haewoon Kwak, Yoonchan Choi, Young-Ho Eom, Hawoong Jeong, and Sue Moon, The 9th ACM SIGCOMM Conference on Internet Measurement (IMC), 2009.
  33. Overview of this thesis (timeline 2008–2011: 2010)
      What is Twitter, a Social Network or News Media?
      Haewoon Kwak, Changhyun Lee, Hosung Park, and Sue Moon, The 19th international conference on World wide web (WWW), 2010.
      Finding Influentials Based on the Temporal Order of Info. Adoption in Twitter
      Changhyun Lee, Haewoon Kwak, Hosung Park, and Sue Moon, The 19th international conference on World wide web (WWW), Poster, 2010.
      Ph.D. Thesis Proposal
  34. Overview of this thesis (timeline 2008–2011: 2011)
      Fragile Online Relationship: A First Look At Unfollow Dynamics In Twitter
      Haewoon Kwak, Hyunwoo Chun, and Sue Moon, The 29th international conference on Human factors in computing systems (CHI), 2011.
  35. Individual & dyad-level view: Personal preferences & friend recommendation
      Connecting Users with Similar Interests Across Multiple Web Services
      Haewoon Kwak, Hwa-Yong Shin, Jong-Il Yoon, and Sue Moon, The 3rd International AAAI Conference on Weblogs and Social Media (ICWSM), Poster, 2009.
      System and method for extracting similar users across heterogeneous web servers (title translated from Korean)
      Sue Moon, Haewoon Kwak, Hwa-Yong Shin, and Jong-Il Yoon, Korean Patent No. 10-1010997, 2011.
  36. Motivation
      • Groups in OSN
        – Motivate user activities
        – Share common interests and offline commonalities
      But, restricted within the boundary of each service (figure label: Blackbox)
  37. Our goals
      • To recommend people who share interests across various kinds of web services
      • To support many web services without modification
      • Not to burden users with additional profile management
  38. Key insights
      • Inferring interests from tags
        – Free keywords to describe user-generated contents
        – Publicly accessible
        – Simple format (plain text)
  39. Tags are supported by many services
  40. Datasets
      • Tags from 6 various services
      • Refinement by WordNet API

      Service       Contents   Uniq. users   Uniq. tags   Avg. tags
      Del.icio.us   Bookmark   40,072        1,092,534    227.2
      Flickr        Photo      6,366         71,724       32.4
      YouTube       Video      9,481         171,990      56.5
      LiveJournal   Blog       49,792        729,975      44.49
      Last.FM       Music      54,464        95,901       10.95
      AllBlog       Blog       24,559        383,374      44.04
  41. Findings about user interests
      • Highly skewed popularity
        – Top 20% of tags associated with 90% of items
      • Service-dependent
        – About 20% of tags belong to more than one service
      • Frequently change over time
  42. Recommendation algorithms
      • Tag weight assignment
        – Absolute number of times a tag is used
        – Normalized number, N, of times a tag is used
        – N-idf
      • Similarity calculation between two tag sets
        – Sum of the weights of common tags
        – Cosine similarity based on the vector model
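The weighting and similarity options on slide 42 can be sketched in a few lines. This is a minimal illustration, not the thesis implementation: the slide does not define N-idf, so the code assumes N is a tag's share of one user's tag uses and idf is log(#users / #users who used the tag). The user names and tags are invented.

```python
import math
from collections import Counter

def normalized_weights(tags):
    """Normalized number N: a tag's count divided by the user's total tag uses."""
    counts = Counter(tags)
    total = sum(counts.values())
    return {tag: count / total for tag, count in counts.items()}

def n_idf_weights(users):
    """Assumed N-idf: N times log(#users / #users who used the tag)."""
    n_users = len(users)
    doc_freq = Counter(tag for tags in users.values() for tag in set(tags))
    weights = {}
    for user, tags in users.items():
        n = normalized_weights(tags)
        weights[user] = {t: w * math.log(n_users / doc_freq[t]) for t, w in n.items()}
    return weights

def cosine_similarity(a, b):
    """Cosine similarity of two sparse tag-weight vectors."""
    dot = sum(w * b[t] for t, w in a.items() if t in b)
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

# Hypothetical users on three services (illustrative data only).
users = {
    "flickr_user": ["sunset", "beach", "travel", "travel"],
    "youtube_user": ["travel", "beach", "music"],
    "lastfm_user": ["music", "jazz"],
}
weights = n_idf_weights(users)
sim_close = cosine_similarity(weights["flickr_user"], weights["youtube_user"])
sim_far = cosine_similarity(weights["flickr_user"], weights["lastfm_user"])
print(sim_close > sim_far)  # users sharing tags score higher
```

Swapping `cosine_similarity` for a plain sum over common tags gives the other similarity option listed on the slide.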
  43. User study of algorithm evaluation
      N-idf + cosine similarity works
  44. Conditional prob. of recommendation
      • Flickr and YouTube users find good matches in other services
      • LiveJournal attracts users from other services better than any other service
  45. Dyad-level view: Social relationship dynamics
      Comparison of Online Social Relations In Terms of Volume vs. Interaction: A Case Study of Cyworld
      Hyunwoo Chun, Haewoon Kwak, Young-Ho Eom, Yong-Yeol Ahn, Sue Moon, and Hawoong Jeong, The 8th ACM SIGCOMM Conference on Internet Measurement (IMC’08), 2008.
      Fragile Online Relationship: A First Look At Unfollow Dynamics In Twitter
      Haewoon Kwak, Hyunwoo Chun, and Sue Moon, The 29th international conference on Human factors in computing systems (CHI’11), 2011.
  46. Relationship dynamics observed from
      • Exchange of guestbook messages in Cyworld
      • Relationship dissolution by unfollow in Twitter
  47. Part I: guestbook log analysis in Cyworld
      • Most popular online SNS in Korea (22M users)
      • Guestbook is the most popular feature
  48. Motivation
      • Online ‘friends’ relationship
        – Needs no more cost once established
        – Commonly mutual
        – All online friends are considered equally
      Thus, online ‘friends’ are not enough to represent social relationships at that time
  49. Key insights
      • Directionality and strength of user interactions reveal more meaningful relationships than online ‘friends’ do
  50. From logs to the activity network
      Log format: <From, To, When>
        <A, C, 20040103T1103>
        <B, C, 20040103T1106>
        <C, B, 20040104T1201>
        <B, C, 20040104T0159>
      Graph construction over 8 billion messages yields a directed & weighted “Activity network”
      (figure: nodes A, B, C with edge weights 1, 2, 1)
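The graph construction on slide 50 amounts to counting messages per ordered (sender, receiver) pair. The sketch below assumes each log record is a `(from, to, when)` tuple; the function name is hypothetical and the four records are the ones shown on the slide.

```python
from collections import Counter

def build_activity_network(log):
    """Return directed edge weights: number of messages per (sender, receiver)."""
    weights = Counter()
    for sender, receiver, _timestamp in log:
        weights[(sender, receiver)] += 1
    return weights

# The four guestbook records listed on the slide.
log = [
    ("A", "C", "20040103T1103"),
    ("B", "C", "20040103T1106"),
    ("C", "B", "20040104T1201"),
    ("B", "C", "20040104T0159"),
]
network = build_activity_network(log)
for (sender, receiver), weight in sorted(network.items()):
    print(f"{sender} -> {receiver}: {weight}")
# A -> C: 1
# B -> C: 2
# C -> B: 1
```

Keeping direction in the key is what lets later slides measure reciprocity and disparity on the same structure.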
  51. Most social networks
      • Have power-law degree distribution
        – A small number of high-degree nodes
        – A large number of low-degree nodes
      • Have common characteristics
        – Short diameter
        – Fault tolerant
      Nature Reviews Genetics 5, 101-113, 2004
  52. Multi-scaling behavior implies heterogeneous relations
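  53. Weighted clustering coefficient
      PNAS, 101(11):3747–3752, 2004
      Worked example (figure): nodes i1 and i2, edge weights w = 10 and w = 1
      $$C^w_{i_1} = \frac{1}{12(3-1)} \left( \frac{(10+1) + (1+1)}{2} \right), \qquad C^w_{i_2} = \frac{1}{12(3-1)} \left( \frac{(1+10) + (10+1)}{2} \right)$$
      $$C^w_{i_1} < C^w_{i_2}$$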
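As a hedged illustration of slide 53, the sketch below implements the weighted clustering coefficient as defined in the cited PNAS 2004 paper (Barrat et al.): C^w_i = 1/(s_i (k_i − 1)) · Σ_{j,h} (w_ij + w_ih)/2 · a_ij a_ih a_jh, summed over ordered neighbor pairs. The toy graph is an assumption made for illustration and is not the slide's figure.

```python
def make_graph(edges):
    """Build an undirected weighted adjacency map from (u, v, weight) triples."""
    adj = {}
    for u, v, w in edges:
        adj.setdefault(u, {})[v] = w
        adj.setdefault(v, {})[u] = w
    return adj

def weighted_clustering(adj, i):
    """Weighted clustering coefficient of node i (Barrat et al. definition)."""
    neighbors = list(adj[i])
    k = len(neighbors)
    if k < 2:
        return 0.0
    strength = sum(adj[i].values())  # s_i: total weight of edges at i
    total = 0.0
    for j in neighbors:
        for h in neighbors:
            # Count ordered pairs (j, h) of neighbors that are themselves linked.
            if j != h and h in adj[j]:
                total += (adj[i][j] + adj[i][h]) / 2
    return total / (strength * (k - 1))

# Hypothetical node "i" with neighbors a, b, c; only a and b are linked.
heavy_outside = make_graph([("i", "a", 1), ("i", "b", 1), ("i", "c", 10), ("a", "b", 1)])
heavy_inside = make_graph([("i", "a", 10), ("i", "b", 1), ("i", "c", 1), ("a", "b", 1)])
unit_weights = make_graph([("i", "a", 1), ("i", "b", 1), ("i", "c", 1), ("a", "b", 1)])

c_outside = weighted_clustering(heavy_outside, "i")
c_inside = weighted_clustering(heavy_inside, "i")
c_unit = weighted_clustering(unit_weights, "i")
print(c_outside < c_unit < c_inside)  # True
```

When the heavy edge lies outside the triangle the weighted value falls below the unweighted one, which is the pattern slide 54 reports for the activity network (Cw < C).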
  54. Weighted clustering coefficient
      • In activity network Cw = 0.0965 < C = 0.1665
      Edges with large weights are less likely to form a triad
  55. Degree correlation of social network
      Plot axes: degree vs. avg. degree of neighbors
      “Assortative mixing”
      M.E.J. Newman. Assortative mixing in networks, Phys. Rev. Lett. 89, 208701 (2002).
  56. Degree correlation of activity network
      Assortative mixing is observed
  57. Reciprocity in user activities
      Plot reference line: y = x
      - Highly reciprocal
      - Quantitative proof of spammers
  58. Disparity
      • Do users interact evenly with all friends?
      Journal of Physics A: Mathematical and General, 20:5273–5288, 1987.
      For node i, disparity is defined by a formula shown as an image on the slide; Y(k) is the average over all nodes of degree k
  59. Interpretation of Y(k)
      Nature 427, 839–843, 2004
      Figure panels: Communicate evenly / Have dominant partner
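The disparity formula itself did not survive in the transcript. The sketch below assumes the standard definition used in the cited references, Y_i = Σ_j (w_ij / s_i)², where s_i is the total weight of node i's edges. Under that assumption k·Y(k) stays near 1 when a user communicates evenly and grows toward k when one partner dominates. The weights are invented.

```python
from collections import defaultdict

def disparity(weights):
    """Y_i = sum over partners of (w_ij / s_i)^2, assuming the standard definition."""
    strength = sum(weights)
    return sum((w / strength) ** 2 for w in weights)

def disparity_by_degree(node_weights):
    """Y(k): average disparity over all nodes with k partners."""
    by_degree = defaultdict(list)
    for weights in node_weights.values():
        by_degree[len(weights)].append(disparity(weights))
    return {k: sum(values) / len(values) for k, values in by_degree.items()}

# Hypothetical message counts to four partners each.
even = disparity([5, 5, 5, 5])        # communicates evenly
dominant = disparity([97, 1, 1, 1])   # one dominant partner
print(4 * even)      # 1.0
print(4 * dominant > 3)  # True: close to k = 4
```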
  60. Disparity in user activities
      Communication pattern changes by #(partners)
  61. Network Motifs
      • 13 possible interaction patterns with 3 users
      • Proportions of each pattern (motif) determine the characteristic of the entire network
      Network Motifs: Simple Building Blocks of Complex Networks, Science, 298(5594):824-827, 2002
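The count of 13 on slide 61 can be checked by brute force: enumerate every set of directed edges on three labeled nodes, keep the weakly connected ones, and merge those that differ only by relabeling. The function names below are illustrative.

```python
from itertools import permutations, product

NODES = (0, 1, 2)
# The 6 possible directed edges among three users (no self-loops).
POSSIBLE_EDGES = [(u, v) for u in NODES for v in NODES if u != v]

def is_weakly_connected(edges):
    """Three nodes are connected iff at least two distinct node pairs are linked."""
    linked_pairs = {frozenset(edge) for edge in edges}
    return len(linked_pairs) >= 2

def canonical_form(edges):
    """Smallest relabeled edge list over all 6 node permutations."""
    forms = []
    for perm in permutations(NODES):
        forms.append(tuple(sorted((perm[u], perm[v]) for u, v in edges)))
    return min(forms)

def three_node_motifs():
    """Set of distinct connected interaction patterns among three users."""
    motifs = set()
    for mask in product([False, True], repeat=len(POSSIBLE_EDGES)):
        edges = [edge for edge, keep in zip(POSSIBLE_EDGES, mask) if keep]
        if is_weakly_connected(edges):
            motifs.add(canonical_form(edges))
    return motifs

motif_count = len(three_node_motifs())
print(motif_count)  # 13
```

Motif analysis then compares how often each of these 13 patterns occurs in the real network against randomized networks.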
  62. Motif analysis in complex networks
      Superfamilies of Evolved and Designed Networks, Science, 303(5663):1538-1542, 2004
      In social networks, triads are likely to be observed
  63. Network motifs in user activities
      Triads are common in Cyworld
  64. Network motifs in user activities
      Not in social networks, but in OSN
  65. Dunbar’s number
      Behavioral and Brain Sciences, 16(4):681–735, 1993
      The maximum number of social relations a modern human can manage is 150
  66. Does #(friends) stimulate interaction?
      The more friends one has (up to 200), the more active one is
      Plot axis: Median #(sent msgs)
  67. Time interval between messages
      Nature, 435:207–211, 2005
      Proceedings of WWW, 2008
      Figure labels: intra-session, inter-session, daily-peak
  68. Part II: unfollow analysis in Twitter
      “Stop following”
  69. Motivation
      • Relationship formation and dissolution
        – Formation has received much attention
        – Dissolution has received little, due to the lack of data
      • A proxy such as the disappearance of communication
        – makes it difficult to capture all means of communication
        – regards the absence of an event as strictly intentional
  70. Key insights & research questions
      • Unfollow in Twitter is an explicit expression of relationship dissolution
      • What are the characteristics of unfollow?
      • Why do people unfollow?
  71. Datasets
      • 1.2M Korean-speaking users detected by
        – Korean in tweets, bio, location, or screen name
      • Daily snapshots of follow relationships
        – G(I): June 25th to July 15th, 2010
        – G(II): August 2nd to August 31st, 2010
  72. Growing Korean social graphs
      • Increasing # of users
        – G(I): 870,057, +7,599/day
        – G(II): 1,203,196, +8,515/day
      • Increasing (high) reciprocity
        – G(I): 56~58%
        – G(II): 61~62%
      • Increasing avg. # of followees
        – 59.7 → 75.7
  73. Relationship formation and dissolution
  74. 4 factors to unfollow
      • Reciprocity of the relationships
      • Duration of a relationship
      • Followees’ informativeness
      • Tie strength
  75. Non-reciprocal links are more fragile
      Figure labels: P(broken) = 0.1228, P(broken) = 0.0529, P(broken) = 0.2345
  76. Newer links are more fragile
  77. Non-informative links are more fragile
      Figure label: Random links
  78. Weak links are more fragile
  79. Volume of activity is not a good proxy
      • 85.6% of links do not involve any single reply, mention, or retweet
        – 96.3% involve 3 or fewer
      • People just subscribe to others’ tweets passively
  80. Demographics of 22 participants
      11 Male, 11 Female
  81. Interviews about motivation
      • Burst tweets
      • Uninteresting topics
      • Mundane details of daily life
      • Politics
  82. Confirmed by data
      • Burst tweets are likely to lead to unfollow
      Pearson corr. = 0.0554 (all users); Pearson corr. = 0.5833 (followee < 200)
  83. Community-level view: Consistent community identification
      Mining communities in networks: A solution for consistency and its evaluation
      Haewoon Kwak, Yoonchan Choi, Young-Ho Eom, Hawoong Jeong, and Sue Moon, The 9th ACM SIGCOMM Conference on Internet Measurement (IMC’09), 2009.
      Consistent Community Identification in Complex Networks
      Haewoon Kwak, Young-Ho Eom, Yoonchan Choi, Hawoong Jeong, and Sue Moon, Preprint (arXiv:0910.1508v2), 2009
  84. Which partitioning is better?
  85. Modularity, Q
      • e_ii: ratio of the number of links between nodes belonging to community i over all links
      • a_i: ratio of ends of edges that are attached to vertices in community i
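Slide 85 defines e_ii and a_i, but the formula for Q did not survive in the transcript. The sketch below assumes the usual Newman-Girvan form Q = Σ_i (e_ii − a_i²) and evaluates it on an invented toy graph of two triangles joined by one edge.

```python
from collections import defaultdict

def modularity(edges, community):
    """Q = sum_i (e_ii - a_i^2) for an undirected, unweighted edge list."""
    m = len(edges)
    internal = defaultdict(int)  # links with both ends inside community i
    ends = defaultdict(int)      # edge ends attached to vertices in community i
    for u, v in edges:
        cu, cv = community[u], community[v]
        ends[cu] += 1
        ends[cv] += 1
        if cu == cv:
            internal[cu] += 1
    return sum(internal[c] / m - (ends[c] / (2 * m)) ** 2 for c in ends)

# Hypothetical graph: triangles {0,1,2} and {3,4,5} joined by edge (2,3).
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
split = {0: "A", 1: "A", 2: "A", 3: "B", 4: "B", 5: "B"}
single = {node: "A" for node in range(6)}

q_split = modularity(edges, split)
q_single = modularity(edges, single)
print(q_split > q_single)  # True: the two-triangle partition scores higher
```

Higher Q means more links fall inside communities than the edge-end ratios alone would predict, which is why slide 86 prefers the partition with the larger Q.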
  86. High Q is better
      Figure labels: Q = 0.41979 and Q = 0.380671
  87. Motivation
      • Greedy algorithms to maximize Q
        – Widely used for networks of a few tens of millions of nodes
        – By nature, no guarantee to find the global maximum
        – Local maxima = inconsistent partitioning
  88. Our goals
      • Measure the level of inconsistency
      • Develop a new method to achieve consistency
  89. Datasets of 12 networks
  90. Measure 1: Pairwise membership prob.
      • The likelihood of a pair of nodes ending up in the same community over N runs
  91. Example of pairwise membership prob.
  92. Example of pairwise membership prob.
      p.m.p. = 2/2
      p.m.p. = 1/2
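Measure 1 can be sketched directly from its definition on slide 90: for each edge, count the runs in which both endpoints share a community and divide by N. The two runs below are invented to reproduce the 2/2 and 1/2 values of the example slide. Labels are only compared within a run, so runs may name their communities differently.

```python
def pairwise_membership_prob(edges, partitions):
    """Fraction of runs in which both endpoints of each edge share a community."""
    n_runs = len(partitions)
    return {
        (u, v): sum(1 for p in partitions if p[u] == p[v]) / n_runs
        for u, v in edges
    }

# Hypothetical results of N = 2 runs of a community detection algorithm.
edges = [("a", "b"), ("b", "c")]
run_1 = {"a": 0, "b": 0, "c": 0}  # all three nodes together
run_2 = {"a": 7, "b": 7, "c": 9}  # c split off; label values are arbitrary
pmp = pairwise_membership_prob(edges, [run_1, run_2])
print(pmp[("a", "b")])  # 1.0  (2/2)
print(pmp[("b", "c")])  # 0.5  (1/2)
```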
  93. Measure 2: Consistency, C
      • To quantify the network-wide consistency
  94. Consistency in 12 networks
  95. Intuitions behind new algorithm
      • Every edge has a pairwise membership prob.
      • High pairwise membership prob. indicates that two nodes are likely to be in the same community
      • Weighted versions of existing algorithms place edges of high weight within the community
  96. Our new algorithm
      1. After a cycle of N runs,
         – Calculate pairwise membership prob. of each edge
         – Assign pairwise membership prob. to edge weight
      2. Return to another cycle of N runs with the weighted network
      3. Go to 1. again until C >= τ (predefined threshold)
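The loop on slide 96 can be outlined as below. This is a sketch under several loudly stated assumptions: the transcript does not give the definition of C, so `stand_in_consistency` is a placeholder (the share of edges on which all runs agree), and `component_detector` is a toy deterministic stand-in for the weighted greedy modularity algorithms the thesis refers to. The `max_cycles` guard is also an addition.

```python
def pairwise_membership_prob(edges, partitions):
    """Fraction of runs in which both endpoints of each edge share a community."""
    n_runs = len(partitions)
    return {(u, v): sum(1 for p in partitions if p[u] == p[v]) / n_runs
            for u, v in edges}

def stand_in_consistency(pmp):
    """Placeholder for C (not the thesis definition): share of edges with p.m.p. 0 or 1."""
    return sum(1 for p in pmp.values() if p in (0.0, 1.0)) / len(pmp)

def iterate_until_consistent(edges, detect, n_runs, threshold, max_cycles=20):
    """Repeat cycles of N runs, feeding p.m.p. back as edge weights."""
    weights = {edge: 1.0 for edge in edges}  # start from the unweighted network
    cycles = 0
    while True:
        partitions = [detect(edges, weights) for _ in range(n_runs)]
        weights = pairwise_membership_prob(edges, partitions)  # step 1
        cycles += 1
        if stand_in_consistency(weights) >= threshold or cycles >= max_cycles:
            return weights, cycles  # step 3: stop once C >= threshold

def component_detector(edges, weights):
    """Toy detector: communities are connected components over positive-weight edges."""
    label = {}
    for u, v in edges:
        label.setdefault(u, u)
        label.setdefault(v, v)
    changed = True
    while changed:
        changed = False
        for u, v in edges:
            if weights[(u, v)] > 0 and label[u] != label[v]:
                label[u] = label[v] = min(label[u], label[v])
                changed = True
    return label

edges = [("a", "b"), ("b", "c"), ("d", "e")]
final_weights, cycles = iterate_until_consistent(edges, component_detector,
                                                 n_runs=5, threshold=0.9)
print(cycles)  # 1: a deterministic detector is consistent after one cycle
```

With a randomized detector in place of `component_detector`, the loop would typically need several cycles, which is the convergence behavior slide 97 plots.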
  97. Convergence of C
  98. Network-wide view: Interplay between structures and dynamics
      What is Twitter, a Social Network or News Media?
      Haewoon Kwak, Changhyun Lee, Hosung Park, and Sue Moon, The 19th international conference on World wide web (WWW), 2010
      Finding Influentials Based on the Temporal Order of Info. Adoption in Twitter
      Changhyun Lee, Haewoon Kwak, Hosung Park, and Sue Moon, The 19th international conference on World wide web (WWW), Poster, 2010
  99. Motivation
      In Twitter: “I follow you”
      In most OSNs: “We are friends”
  100. Our goals
       • Measure how one-way relationship affects the network-wide structures and dynamics
  101. Datasets
       • 41.7M user profiles
       • 1.47B follow relationships
       • 4,262 trending topics
       • 106M tweets mentioning trending topics
         – Spam tweets removed by CleanTweets
  102. Low reciprocity in follow relationships
       • Only 22.1% of user pairs follow each other
       • Much lower than
         – 68% on Flickr
         – 84% on Yahoo! 360
         – 77% on Cyworld guestbook
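A reciprocity figure like the 22.1% on slide 102 can be computed from a follow list as below. The sketch assumes reciprocity means the share of connected user pairs in which both users follow each other; the thesis may normalize differently. The follow list is invented.

```python
def reciprocity(follows):
    """Share of connected user pairs whose follow relationship is mutual."""
    edges = set(follows)
    pairs = {frozenset((u, v)) for u, v in edges if u != v}
    mutual = 0
    for pair in pairs:
        u, v = tuple(pair)
        if (u, v) in edges and (v, u) in edges:
            mutual += 1
    return mutual / len(pairs)

# Hypothetical (follower, followee) records.
follows = [("a", "b"), ("b", "a"), ("a", "c"), ("d", "a"), ("b", "c")]
r = reciprocity(follows)
print(r)  # 0.25: one mutual pair out of four connected pairs
```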
  103. Plenty of super hubs
  104. Degree of separation
       The avg. path length = 4.1
  105. Timely trending topics
       54.3% “headline news”
       31.5% “ephemeral”
  106. Empirical retweet trees
  107. Boosting audience by retweets
  108. 35% of RTs < 10 min., 55% < 1 hr
  109. Powers from structures and dynamics
  110. Summary
       • Multi-level analysis on structures and dynamics
         – Individual-level
         – Dyad-level
         – Community-level
         – Network-wide level
  111. Questions we raise
       • Elemental processes in each level
         – Personal preferences and friend recommendation
         – Relationship dynamics
         – Consistent community identification
         – Interplay between structures and dynamics
  112. Our contributions
       • From new algorithms to analyses
         – New algorithms and their evaluation for recommending those who have similar interests across services
         – Analysis of actual interactions in contrast to friend relationships in a huge OSN, Cyworld
         – Quantitative and qualitative analysis of unfollow
         – New metrics for evaluation of consistency in communities identified by existing algorithms
         – Analysis of structures and dynamics of a huge directed network, Twitter
  113. Future directions
       • The interplay among the parallel social networks from multiple services
       • Conflicting information pathways in social media
       Multirelational organization of large-scale social networks in an online world, PNAS 107(31):13636-13641
  114. Our goals
       • We aggregate parallel social networks captured in each service and construct a big network
       • We observe the differences between parallel networks
       • We observe the interplay between parallel networks, such as node migration
       • We compare a user’s positional power between parallel networks
  115. Conflicting information pathways
       • Some people selectively propagate information by their preferences
       • Different transmission paths
         – Some know the correct info., but others do not
  116. Thank you
