
Suicide ideation of individuals in online social networks tokyo webmining


  1. Suicide ideation of individuals in online social networks. N. Masuda, I. Kurahashi and H. Onari, arXiv:1207.2548, 2012. Hiroko Onari, #TokyoWebmining, 26th of August, 2012
  2. What is a social network?
     • A graph that represents relationships (ties, links) between independent users (nodes)
  3. Directed networks, where ties have direction
     e.g.) online directed networks: Twitter, Google+, YouTube, Flickr
     Undirected networks, where ties have no direction
     e.g.) online undirected networks: mixi, Facebook, Skype, LinkedIn
  4. What is suicide? - association with social isolation -
     • Suicide is defined as all cases of death resulting directly or indirectly from a positive (e.g., shooting oneself) or negative (e.g., refusing to eat) act of the victim himself, which he knows will produce this result. [Durkheim, 1951]
     • Suicide is neither an individual act nor a personal action. The force that determines suicide is not psychological but social. Suicide is the result of social disorganization or a lack of social integration or social solidarity. [Durkheim, 1951]
       => Social isolation
  5. Social network analysis on suicide & social isolation
     • A small number of friends and a small fraction of triangles to which an individual belongs, both signs of social isolation, contribute significantly to suicide ideation (from studies of the relationship between suicidal behavior and egocentric social networks among adolescents). [Bearman & Moody, 2004] [Cui et al., 2010]
     • The paucity of triangles, or intransitivity, also characterizes social isolation. [Wasserman & Faust, 1994] [Bearman & Moody, 2004]
     • Individuals without triangles are considered to lack membership in a social group even if they have many friends. [Krackhardt, 1999]
  6. Social Statistics by OECD
     Japan's suicide rate per 100,000 persons is higher than that of any other OECD country.
     [Figure: suicide rates per 100,000 persons per year, 1960-2006, for Denmark, Greece, Hungary, Ireland, Japan, Switzerland, and the OECD average]
  7. Research Questions
     • From the perspective of network science, can quantitative analysis of an online social network reveal indicators that could help reduce suicide?
     • Can we say that online social networks reflect real personal relationships?
  8. Data
     • Social network of $2.7 \times 10^7$ registered users from mixi as of March 2012.
     • More than $4.5 \times 10^6$ user-defined communities on various topics in mixi as of April 2012. A community is a group of users who share a common interest, such as a hobby. User-defined communities are a distinctive feature that other major SNSs such as Facebook do not have.
     • mixi is a major SNS in Japan; it launched in 2004.
  9. Analysis Environment
     [Diagram of the analysis computer: edge list, Perl, Tokyo Cabinet, personal info., community info., clustering coefficient, result report]
     • Example data shown in the diagram:
       - Edge list (ID, friend's ID): 1, 2 / 1, 3 / 2, 1 / 2, 4 / 2, 5
       - Tokyo Cabinet (ID: friend's IDs): 1: 2, 3 / 2: 1, 4, 5
       - Result (ID, clustering coefficient): 1, 0.4 / 2, 0.2
     • We calculated the clustering coefficient using Tokyo Cabinet, a library of routines for managing a key-value store database. We accessed Tokyo Cabinet through its Perl API. Data analysis was implemented in R.
     • Irrelevant private information was deleted, and relevant information was encrypted. We conducted all analyses in the Tokyo HQ office of mixi using a computer that is not connected to the Internet.
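
The edge-list-to-key-value step on this slide can be sketched as follows. This is only an illustration: the authors used Perl with Tokyo Cabinet, whereas here a plain Python dict stands in for the key-value store, and the function name `build_adjacency` is hypothetical.

```python
from collections import defaultdict


def build_adjacency(edge_rows):
    """Group an (ID, friend's ID) edge list into {ID: sorted friend IDs}."""
    adjacency = defaultdict(set)
    for user_id, friend_id in edge_rows:
        if user_id == friend_id:
            continue  # assumption: self-loops carry no information here
        # mixi friendship is undirected, so record both directions.
        adjacency[user_id].add(friend_id)
        adjacency[friend_id].add(user_id)
    return {user: sorted(friends) for user, friends in adjacency.items()}


# Edge list copied from the slide's illustration.
edges = [(1, 2), (1, 3), (2, 1), (2, 4), (2, 5)]
adjacency = build_adjacency(edges)
print(adjacency[1])  # [2, 3]
print(adjacency[2])  # [1, 4, 5]
```

Using sets while building the table makes duplicate rows such as (1, 2) and (2, 1) harmless.
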
  10. Sampling Procedure (1/2)
      • Suicide seed sample (9990 users): We selected 4 suicide-related communities according to the following criteria:
        (1) the name of the user-defined community includes the word "suicide" ("jisatsu" in Japanese)
        (2) at least 1000 members
        (3) at least 100 comments posted for each topic
        (4) at least 3 independent topics on which comments were made in October 2011
        (5) admission to the community is open to the public
        * excluded communities that concentrated on methods of committing suicide and communities that encouraged members to live with hope
        * discarded users with 0 or 1 friend on mixi
      • From the suicidal communities, we then sampled 9990 active users who existed as of January 23, 2012 and logged on to mixi on more than 20 days per month on average from August through December 2011.
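
The user-level part of this procedure (at least 2 friends, and more than 20 login days per month on average) could be expressed roughly as below. The `User` record and its field names are hypothetical; the slides do not describe how the mixi data were actually stored.

```python
from dataclasses import dataclass


@dataclass
class User:
    # All field names are assumptions made for this sketch.
    user_id: int
    n_friends: int
    monthly_login_days: list  # login days per month, August-December 2011


def is_active_candidate(user, min_friends=2, min_avg_login_days=20):
    """True if the user has enough friends and logs in often enough on average."""
    if user.n_friends < min_friends or not user.monthly_login_days:
        return False
    average = sum(user.monthly_login_days) / len(user.monthly_login_days)
    # The slide says "more than 20 days", hence the strict inequality.
    return average > min_avg_login_days


users = [
    User(1, 5, [25, 22, 30, 28, 21]),   # active, enough friends
    User(2, 1, [31, 30, 31, 30, 31]),   # too few friends
    User(3, 10, [20, 20, 20, 20, 20]),  # average is exactly 20, not more
]
sample_ids = [u.user_id for u in users if is_active_candidate(u)]
```
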
  11. Sampling Procedure (2/2)
      • Depression seed sample (24410 users): We selected 7 depression-related communities using criteria similar to those for the suicide seed sample. The difference is that the name of the user-defined community includes the word "depression" ("utsu" in Japanese). We sampled 24410 active users from the depression communities.
      • Control seed sample (228949 users): A random sample of active users who had at least 2 friends and had joined neither the suicidal communities nor the depression-related communities.
  12. Measurements of a social network
      • The following network indicators were adopted:
        - degree
        - degree distribution
        - clustering coefficient
        - homophily
  13. Degree
      • Degree is the number of neighbors (i.e., friends), denoted by $k_i$ for user $i$. A small degree is an indicator of social isolation.
  14. Degree distribution
      • The degree distributions of human relationships are known to be long tailed. Most people have a relatively small degree, but a few people have a very large degree, being connected to many other people.
  15. Clustering coefficient
      • The clustering coefficient is a measure of the number of triangles in a network. In social networks, when the clustering coefficient is large, the user is considered to be embedded in close-knit social groups (Wasserman & Faust, 1994; Watts & Strogatz, 1998; Newman, 2010). A small value is an indicator of social isolation.
      • The clustering coefficient can be measured in two ways: the global clustering coefficient (often called transitivity) and the local clustering coefficient. The global measure gives an overall indication of the clustering in the network, whereas the local measure gives an indication of the embeddedness of single nodes.
      • In our research, we use the local clustering coefficient.
        * The local clustering coefficient is often used in network science (complex networks), and the global value is often used in sociology.
  16. Local clustering coefficient for undirected networks
      • The local clustering coefficient $C_i$ for each vertex (user) $v_i$ is defined by
        $$C_i \equiv \frac{\text{number of triangles connected to } v_i}{k_i (k_i - 1)/2}.$$
        * By definition, $0 \le C_i \le 1$.
        * $k_i$ is the degree of user $v_i$.
        * Users who have 0 or 1 friend ($k_i = 0$ or $1$) should/can be removed.
      • The average of the local clustering coefficient is defined by
        $$C \equiv \frac{1}{N} \sum_{i=1}^{N} C_i.$$
        * By definition, $0 \le C \le 1$.
        * $N$ is the total number of users in the network, excluding those with $k_i = 0$ or $1$.
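
A minimal Python sketch of these two definitions, assuming the network is held as a dict that maps each user to the set of their friends. The toy graph and function names are illustrative, not the authors' code.

```python
from itertools import combinations


def local_clustering(adjacency, node):
    """C_i = triangles through node / (k_i * (k_i - 1) / 2); None if k_i < 2."""
    neighbors = adjacency[node]
    k = len(neighbors)
    if k < 2:
        return None  # users with 0 or 1 friend are excluded, as on the slide
    # A triangle through `node` exists for every pair of neighbors that are friends.
    triangles = sum(1 for u, v in combinations(neighbors, 2) if v in adjacency[u])
    return triangles / (k * (k - 1) / 2)


def average_clustering(adjacency):
    """Mean of C_i over the N users that have at least 2 friends."""
    values = [local_clustering(adjacency, node) for node in adjacency]
    values = [c for c in values if c is not None]
    return sum(values) / len(values)


# Toy undirected graph: a triangle 1-2-3 with user 4 attached to user 3.
toy = {1: {2, 3}, 2: {1, 3}, 3: {1, 2, 4}, 4: {3}}
c3 = local_clustering(toy, 3)      # one triangle out of three possible pairs
c_avg = average_clustering(toy)    # averages users 1, 2 and 3 only
```

Checking every pair of neighbors costs on the order of $k_i^2$ operations per user, which is acceptable here because degrees in the data are capped at 1000.
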
  17. Degree and clustering coefficient
      • The influence of $k_i$ and $C_i$ needs to be distinguished carefully. Here is an example: two people each have 5 friends, but different numbers of links among those friends.
        $$k_i = 5,\quad C_i = \frac{0}{5(5-1)/2} = 0 \qquad\text{and}\qquad k_i = 5,\quad C_i = \frac{3}{5(5-1)/2} = \frac{3}{10}$$
  18. Degree and clustering coefficient
      • Each data point $C(k)$ for degree $k$ is obtained by averaging $C_i$ over the users in a group with degree $k$. Large fluctuations of $C(k)$ at large $k$ values are caused by the paucity of users having large $k$. $C_i$ decreases with $k$ in many networks (Newman, 2010).
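
The averaging that produces $C(k)$ can be sketched as a group-by over degree. The input pairs below are made-up values for illustration only, and `clustering_by_degree` is a hypothetical helper name.

```python
from collections import defaultdict


def clustering_by_degree(degree_and_c):
    """Average C_i over all users that share the same degree k."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for k, c in degree_and_c:
        sums[k] += c
        counts[k] += 1
    # Return degrees in increasing order so the result can be plotted directly.
    return {k: sums[k] / counts[k] for k in sorted(sums)}


# (k_i, C_i) pairs for five imaginary users.
samples = [(2, 1.0), (2, 0.0), (3, 1 / 3), (5, 0.3), (5, 0.1)]
c_of_k = clustering_by_degree(samples)
```

With few users at a given degree, each average rests on a handful of values, which is the source of the fluctuations at large $k$ mentioned on the slide.
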
  19. Homophily
      • Similar individuals are more likely to become friends. This is called homophily. In this study, we measure it as the fraction of neighbors with suicide ideation.
      • It should be noted that, if a user has relatively many friends with suicide ideation, this does not necessarily imply that suicide is contagious. Homophily may be a cause of such assortativity.
      • FYI: There is some research that differentiates the effects of influence and homophily (Aral et al., 2009; Shalizi & Thomas, 2011).
        * My presentation on the effects of influence and homophily, based on Aral's paper, is on SlideShare: http://www.slideshare.net/hirokoonari/ss-13221508
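
As an illustration of the homophily variable, the fraction of a user's neighbors who belong to a given group can be computed as below. The adjacency dict, the membership set, and the function name are assumptions made for this sketch.

```python
def homophily_fraction(adjacency, group_members, node):
    """Fraction of node's neighbors that belong to group_members; None if no neighbors."""
    neighbors = adjacency[node]
    if not neighbors:
        return None
    in_group = sum(1 for friend in neighbors if friend in group_members)
    return in_group / len(neighbors)


# Toy network: user 1 has four friends, two of whom (2 and 3) are group members.
toy = {1: {2, 3, 4, 5}, 2: {1, 3}, 3: {1, 2}, 4: {1}, 5: {1}, 6: set()}
group = {2, 3}
h1 = homophily_fraction(toy, group, 1)  # 2 of 4 friends are in the group
h4 = homophily_fraction(toy, group, 4)  # the only friend (user 1) is not in the group
```
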
  20. Homophily
      • In this study, users in the suicide group have comparatively more similar friends than users in the control group. The same tendency holds for users in the depression-related group.
  21. Independent variables
      Personal variables
        - Age
        - Gender
      Local network variables
        - Degree: number of neighbors (friends)
        - Local clustering coefficient: undirected clustering coefficient
        - Homophily: fraction of neighbors who belong to the suicide / depression community
      Behavioral variables in mixi
        - Community number: number of communities that a user has joined
        - Registration period: number of days between the registration date and Jan. 23, 2012
  22. Statistical models
      • Univariate and multivariate logistic regressions: estimating the likelihood of belonging to a suicidal or a depressive community
      • VIF (variance inflation factor): checking for multicollinearity between independent variables to justify the use of the multivariate logistic regression. The VIF should be smaller than 10 (preferably smaller than 5).
      • Pearson, Spearman, and Kendall correlation coefficients: measuring correlation between the independent variables
      • AUC (area under the receiver operating characteristic curve): quantifying the explanatory power of the logistic model. The AUC value falls between 0.5 and 1. A large AUC value indicates that the logistic regression fits well.
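
To make the AUC concrete, here is a small pure-Python sketch based on its rank interpretation: the probability that a randomly chosen group member receives a higher model score than a randomly chosen control user, with ties counted as one half. The scores and labels are invented for illustration, and this is not the authors' R code.

```python
def auc(scores, labels):
    """Area under the ROC curve via pairwise comparison of positives and negatives."""
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0   # positive ranked above negative
            elif p == n:
                wins += 0.5   # ties count as half
    return wins / (len(positives) * len(negatives))


# Hypothetical predicted probabilities and true group labels (1 = suicide group).
scores = [0.9, 0.8, 0.4, 0.3, 0.2]
labels = [1, 0, 1, 0, 0]
value = auc(scores, labels)  # 5 of the 6 positive-negative pairs are ordered correctly
```

The double loop is quadratic in the sample size, so for data on the scale of this study a rank-based or library implementation would be the practical choice.
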
  23. Univariate statistics of independent variables for the suicide and control groups
      Suicide group: N = 9,990; control group: N = 228,949
      Variable | Suicide Mean±SD | Suicide range (min, max) | Control Mean±SD | Control range (min, max) | p-value
      Age | 27.4±10.3 | (17, 97) | 27.7±9.2 | (14, 96) | 0.000652
      Community number | 283.7±284.3 | (1, 1000) | 46.3±79.4 | (1, 1000) | <0.0001
      $k_i$ | 82.9±98.7 | (2, 1000) | 65.8±67.6 | (2, 1000) | <0.0001
      $C_i$ | 0.087±0.097 | (0, 1) | 0.150±0.138 | (0, 1) | <0.0001
      Homophily (suicide) | 0.0110±0.0329 | (0, 1.000) | 0.0012±0.0080 | (0, 0.667) | <0.0001
      Registration period | 1235.7±638.9 | (122, 2878) | 1333.5±670.5 | (102, 2891) | <0.0001
      Gender (female), count (%) | 5,786 (57.9%) | | 126,941 (55.4%) | | <0.0001
      No. suicidal communities | 1.20±0.51 | (1, 4) | N/A | N/A | N/A
      No. login days | 28.9±4.4 | (1, 31) | 26.9±6.3 | (1, 31) | <0.0001
  24. Multivariate logistic regression of suicide ideation on individual and network variables
      Variable | OR | CI | p-value | VIF
      Age | 1.00463 | (1.00211, 1.00716) | 0.000313 | 1.091
      Gender (female = 1) | 0.821 | (0.783, 0.861) | <0.0001 | 1.028
      Community number | 1.00733 | (1.00720, 1.00747) | <0.0001 | 1.197
      $k_i$ | 0.99790 | (0.99758, 0.99821) | <0.0001 | 1.156
      $C_i$ | 0.0093 | (0.0069, 0.0126) | <0.0001 | 1.081
      Homophily (suicide) | $2.22 \times 10^{12}$ | ($0.57 \times 10^{12}$, $8.65 \times 10^{12}$) | <0.0001 | 1.016
      Registration period | 0.999383 | (0.999346, 0.999420) | <0.0001 | 1.135
      AUC: 0.873
      * OR: odds ratio; CI: 95% confidence interval; VIF: variance inflation factor
      • On average, the odds of belonging to the suicide group rather than the control group are multiplied by:
        - 1.00463 for a user who is one year older
        - 0.821 for being female
        - 1.00733 for membership in one additional community
        - 0.99790 for having one additional friend
        - $0.0093^{0.01} = 0.95$ for an increase in $C_i$ of 0.01
        - $(2.22 \times 10^{12})^{0.01} = 1.33$ for an increase of 0.01 in the fraction of friends in the suicide group
        - 0.999383 for one additional day of registration period
      • The AUC is large, so this logistic regression fits well.
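
The rescaling used for $C_i$ and homophily follows from the logistic model: an odds ratio is $e^{\beta}$ per unit change, so a change of $\Delta$ multiplies the odds by $e^{\beta \Delta} = \mathrm{OR}^{\Delta}$. A short check of the two figures quoted on the slide (the function name is hypothetical):

```python
import math


def odds_ratio_for_increment(odds_ratio_per_unit, increment):
    """Odds multiplier for a change of `increment` in a predictor: OR ** increment."""
    beta = math.log(odds_ratio_per_unit)  # logistic regression coefficient
    return math.exp(beta * increment)


# Odds ratios per unit change, taken from the table on this slide.
c_i_step = odds_ratio_for_increment(0.0093, 0.01)        # C_i increases by 0.01
homophily_step = odds_ratio_for_increment(2.22e12, 0.01)  # homophily increases by 0.01
print(round(c_i_step, 2), round(homophily_step, 2))  # 0.95 1.33
```

A full-unit change is not meaningful for variables bounded between 0 and 1, which is why the per-unit odds ratios look extreme while the per-0.01 values are moderate.
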
  25. Correlation coefficients between pairs of independent variables
      Each group column lists P, S, K in that order.
      Variable 1 | Variable 2 | Control (P S K) | Suicide (P S K) | Depression (P S K)
      Age | Gender | −.053 −.026 −.022 | −.094 −.137 −.116 | −.166 −.174 −.145
      Age | Community number | −.032 .023 .015 | −.045 −.105 −.073 | −.089 −.131 −.091
      Age | $k_i$ | −.279 −.385 −.271 | −.103 −.224 −.157 | −.168 −.268 −.187
      Age | $C_i$ | .041 −.152 −.111 | −.048 −.220 −.154 | −.092 −.273 −.192
      Age | Homophily (suicide) | −.011 −.090 .074 | .031 −.037 −.029 | N/A N/A N/A
      Age | Homophily (depression) | −.007 −.083 −.066 | N/A N/A N/A | .166 .121 −.089
      Age | Registration period | .278 .460 .337 | .159 .356 .259 | .203 .364 .266
      Gender | Community number | .110 .116 .095 | .205 .204 .166 | .086 .083 .068
      Gender | $k_i$ | .015 .014 .011 | .048 .046 .038 | .048 .046 .038
      Gender | $C_i$ | −.084 −.085 −.069 | −.109 −.097 −.080 | −.061 −.030 −.024
      Gender | Homophily (suicide) | −.012 −.017 −.017 | −.007 .031 .028 | N/A N/A N/A
      Gender | Homophily (depression) | .000 .009 .008 | N/A N/A N/A | −.053 −.021 −.018
      Gender | Registration period | .025 .025 .020 | −.064 −.061 −.050 | −.078 −.079 −.065
      Community number | $k_i$ | .375 .372 .258 | .348 .338 .231 | .375 .360 .248
      Community number | $C_i$ | −.376 −.399 −.277 | −.231 −.200 −.136 | −.201 −.171 −.116
      Community number | Homophily (suicide) | .027 .113 .091 | −.034 .140 .105 | N/A N/A N/A
      Community number | Homophily (depression) | .038 .166 .132 | N/A N/A N/A | −.150 .034 .025
      Community number | Registration period | .339 .338 .230 | .166 .152 .102 | .187 .172 .115
      $k_i$ | $C_i$ | −.363 −.248 −.175 | −.251 −.116 −.085 | −.240 −.105 −.074
      $k_i$ | Homophily (suicide) | −.013 .191 .150 | −.175 .174 .107 | N/A N/A N/A
      $k_i$ | Homophily (depression) | −.027 .254 .188 | N/A N/A N/A | −.210 .076 .029
      $k_i$ | Registration period | .102 .081 .055 | .170 .154 .103 | .172 .152 .101
      $C_i$ | Homophily (suicide) | −.026 −.100 −.080 | −.047 −.213 −.162 | N/A N/A N/A
      $C_i$ | Homophily (depression) | −.031 −.145 −.114 | N/A N/A N/A | −.055 −.243 −.182
      $C_i$ | Registration period | −.221 −.249 −.168 | −.143 −.112 −.162 | −.133 −.099 −.068
      Homophily (suicide) | Registration period | −.039 −.031 −.025 | −.104 −.059 −.044 | N/A N/A N/A
      Homophily (depression) | Registration period | −.024 .011 .009 | N/A N/A N/A | −.120 −.049 −.036
      * P: Pearson; S: Spearman; K: Kendall correlation coefficients
      * [...] 0.2: These correlation coefficients are sufficiently small.
  26. Univariate logistic regression of suicide ideation on individual and network variables
      Variable | OR | CI | p-value | AUC
      Age | 0.99604 | (0.99377, 0.99832) | 0.000651 | 0.515
      Gender (female = 1) | 1.106 | (1.062, 1.152) | <0.0001 | 0.512
      Community number | 1.00728 | (1.00716, 1.00741) | <0.0001 | 0.867
      $k_i$ | 1.00259 | (1.00237, 1.00280) | <0.0001 | 0.549
      $C_i$ | 0.000581 | (0.000428, 0.000789) | <0.0001 | 0.690
      Homophily (suicide) | $1.57 \times 10^{16}$ | ($0.41 \times 10^{16}$, $6.08 \times 10^{16}$) | <0.0001 | 0.643
      Registration period | 0.999783 | (0.999753, 0.999813) | <0.0001 | 0.545
      * OR: odds ratio; CI: 95% confidence interval; AUC: area under the curve
      - The community number makes by far the largest contribution among the seven independent variables (AUC 0.867).
      - The second largest explanatory power is that of the clustering coefficient (AUC 0.690). This result is consistent with the previous one (Bearman & Moody, 2004).
      - The third largest explanatory power is that of homophily (AUC 0.643).
  27. Conclusions
      • Online social behavior of users, rather than demographic properties, explains suicide ideation. The following factors contribute to suicide ideation by the largest amounts:
        - increase in the community number
        - decrease in the local clustering coefficient
        - increase in the homophily variable
      • The finding that age and gender have little influence on suicide ideation is inconsistent with previous findings (Wray et al., 2011).
      • The finding that the degree explains little of suicide ideation is inconsistent with previous studies (Bearman & Moody, 2004; Cui et al., 2010).
      • User-defined communities of mixi cover virtually all major topics. Applying the present methods to other topics may be profitable as a future study.
  28. Appendix - Analysis of depressive symptoms -
  29. Univariate statistics of independent variables for the depression and control groups
      Depression group: N = 24,410; control group: N = 228,949
      Variable | Depression Mean±SD | Depression range (min, max) | Control Mean±SD | Control range (min, max) | p-value
      Age | 28.8±9.4 | (16, 97) | 27.7±9.2 | (14, 96) | <0.0001
      Community number | 249.6±263.1 | (1, 1000) | 46.3±79.4 | (1, 1000) | <0.0001
      $k_i$ | 81.9±88.1 | (2, 1000) | 65.8±67.6 | (2, 1000) | <0.0001
      $C_i$ | 0.085±0.089 | (0, 1) | 0.150±0.138 | (0, 1) | <0.0001
      Homophily (depression) | 0.0196±0.0501 | (0, 1.000) | 0.0031±0.0131 | (0, 0.667) | <0.0001
      Registration period | 1389.4±659.2 | (122, 2885) | 1333.5±670.5 | (102, 2891) | <0.0001
      Gender (female), count (%) | 16,872 (69.1%) | | 126,941 (55.4%) | | <0.0001
      No. suicidal communities | 1.16±0.47 | (1, 6) | N/A | N/A | N/A
      No. login days | 28.8±4.4 | (1, 31) | 26.9±6.3 | (1, 31) | <0.0001
  30. Multivariate logistic regression of depressive symptoms on individual and network variables
      Variable | OR | CI | p-value | VIF
      Age | 1.0141 | (1.0124, 1.0158) | <0.0001 | 1.104
      Gender (female = 1) | 1.532 | (1.481, 1.585) | <0.0001 | 1.019
      Community number | 1.00790 | (1.00778, 1.00803) | <0.0001 | 1.155
      $k_i$ | 0.99833 | (0.99810, 0.99856) | <0.0001 | 1.154
      $C_i$ | 0.0145 | (0.0118, 0.0178) | <0.0001 | 1.079
      Homophily (depression) | $1.98 \times 10^{10}$ | ($0.99 \times 10^{10}$, $4.02 \times 10^{10}$) | <0.0001 | 1.022
      Registration period | 0.999744 | (0.999720, 0.999769) | <0.0001 | 1.117
      AUC: 0.866
      * OR: odds ratio; CI: 95% confidence interval; VIF: variance inflation factor
  31. Univariate logistic regression of depressive symptoms on individual and network variables
      Variable | OR | CI | p-value | AUC
      Age | 1.0110 | (1.0097, 1.0123) | <0.0001 | 0.551
      Gender (female = 1) | 1.799 | (1.748, 1.850) | <0.0001 | 0.568
      Community number | 1.00826 | (1.00814, 1.00837) | <0.0001 | 0.860
      $k_i$ | 1.00258 | (1.00243, 1.00274) | <0.0001 | 0.566
      $C_i$ | 0.000415 | (0.000338, 0.000509) | <0.0001 | 0.692
      Homophily (depression) | $2.12 \times 10^{12}$ | ($1.05 \times 10^{12}$, $4.28 \times 10^{12}$) | <0.0001 | 0.658
      Registration period | 1.000126 | (1.000106, 1.000145) | <0.0001 | 0.522
      * OR: odds ratio; CI: 95% confidence interval; AUC: area under the curve
