Identifying and Characterizing User Communities on Twitter during Crisis Events

406 views
326 views

Published on

Twitter is a prominent online social media which is used to share information and opinions. Previous research has shown that current real world news topics and events dominate the discussions on Twitter. In this paper, we present a preliminary study to identify and characterize communities from a set of users who post messages on Twitter during crisis events. We present our work in progress by analyzing three major crisis events of 2011 as case studies (Hurricane Irene, Riots in England, and Earthquake in Virginia). Hurricane Irene alone, caused a damage of about 7-10 billion USD and claimed 56 lives. The aim of this paper is to identify the different user communities, and characterize them by the top central users. First, we defined a similarity metric between users based on their links, content posted and meta-data. Second, we applied spectral clustering to obtain communities of users formed during three different cri- sis events. Third, we evaluated the mechanism to identify top central users using degree centrality; we showed that the top users represent the topics and opinions of all the users in the community with 81% accuracy on an average. The top central people identified represent what the entire community shares. Therefore to understand a community, we need to monitor and analyze only these top users rather than all the users in a community.

Published in: Technology
0 Comments
1 Like
Statistics
Notes
  • Be the first to comment

No Downloads
Views
Total views
406
On SlideShare
0
From Embeds
0
Number of Embeds
1
Actions
Shares
0
Downloads
3
Comments
0
Likes
1
Embeds 0
No embeds

No notes for slide

Identifying and Characterizing User Communities on Twitter during Crisis Events

  1. 1. Identifying and Characterizing User Communities on Twitter during Crisis Events Adi$  Gupta,  Anupam  Joshi,    Ponnurangam  Kumaraguru     Oct.  29,  2012   UMSOCIAL  @  CIKM’12  
  2. 2. Outline  Problem  Statement    Introduc$on    Data    Methodology    Results    Future  Work    References   2  
  3. 3. Problem Statement  To  iden$fy  and  characterize  user   communi$es  on  TwiRer  during  crisis  events,   using  semi-­‐automated  techniques.   3  
  4. 4. Data: Three 2011 Crisis Events Event:  Hurricane  Irene   Tweets  collected:    90,237   Users:  55,718   Event:  Earthquake  in  VA   Tweets  collected:  277,604   Users:  219,621   Event:  England  Riots   Tweets  collected:  1,165,628   Users:  546,966   4  
  5. 5. Our Contributions  Defined  a  novel  similarity  metric  to  compute   similari$es  between  users  based  on  their  links,   content  posted  and  meta  data.      Applied  spectral  clustering  to  obtain   communi$es  of  users  formed  during  three   different  crisis  events.      Showed  most  central  people  (by  degree   centrality)  represent  what  users  in  the  cluster   are  talking  about  with  81%  accuracy  on   average.   5  
  6. 6. Methodology  STEP  1:  Computer  User*User  Similarity  Matrix      STEP  2:  Use  spectral  clustering  to  obtain   communi$es  of  users      STEP  3:  Use  degree  centrality  to  compute  top   users  in  each  cluster   6  
  7. 7. Methodology  user*user  similarity  matrix,  U[i,j]   -   Content  Similarity,  C[i,j]     Computed  based  on  common  words,  hashtags  and  URLs   in  tweets   - Link  Similarity,  L[i,j]     Based  on  users  retweet,  men$on  or  reply  to  each  other   tweets  or  a  common  third  person’s  tweet   - Meta-­‐data  Similarity,  M[i,j]     Similarity  between  two  users  based  on  their  loca$on   (country,  state  or  city)    U[i,j]  =  C[i,j]  +  L[i,j]  +  M[i,j],  for  each  user  i  and  j   7  
  8. 8. Top Users Identification  Use  degree  centrality   - Social  Network  Analysis  metric   - Number  of  links  incident  upon  a  node  (i.e.,  the   number  of  $es  that  a  node  has)      In  each  cluster  obtained     - We  compute  the  top  30  users  using  degree   centrality   8  
  9. 9. Evaluation Metrics  Modularity   - How  good  is  the  division  of  the  network  in   communi$es      Mean  average  precision   - With  how  much  precision  can  we  say  that  the   influencers  in  the  cluster  represent  the  en$re   cluster     9  
  10. 10. Results: Modularity score  The  modularity  score  greater  than  0.5  for  all   events   - Hurricane  Irene  =  0.67   - England  Riots  =  0.51   - Virginia  Earthquake  =  0.53      We  conclude  that  the  division  of  graph  into   communi$es  is  of  good  quality   10  
  11. 11. Results: MAP Performance •  Top  users  based  on  degree  centrality  represent  the  opinions  and   topic  of  the  en$re  cluster  during  crisis  events.   •  In  case,  of  a  random  TwiRer  sample,  the  above  is  not  true.   11  
  12. 12. Results: Characterization Events Community 1 Community 2 Community 3 Hurricane Irene hurricane, irene, http, jada, song, brenda, butistillloveu, photos, icanhonestlysay, check, neverapologizefor, video, scandal, angelina, jolie, storm, nick, college, bahamas, puerto irene, hurricane, http, coast, category, bahamas, east, puerto, storm, rico, winds, advisory, issued, fore- cast, heads, track, path, update, moving, number hurricane, irene, http, puerto, coast, storm, rico, news, track, winds, bahamas, category, east, forecast, advisory, florida, update, issued, tropical, latest England Riots londonriots, http, ukriots, police, london, riots, people, hackney, riot ers, news, fire, dont, riot, cameron, riotcleanup, birmingham, looting, croydon, night, stop londonriots, http, ukriots, police, london, riots, people, rioters, cameron, hackney, news, fire, dont, riot, looting, prayforlondon, birm- ingham, manchesterriots, stop, riot- cleanup londonriots, http, ukriots, police, riots, london, people, rioters, hackney, cameron, riot, birmingham, news, fire, riotcleanup, dont, looting, make, manchesterriots, stop, rioting Virginia Earthquake earthquake, http, today, quake, east, coast, earth, richmond, washington, august, virginia, news, campbell, glen, monument, heard, norwegian, alaska, new york, video earthquake, http, east, coast, virginia, washington, felt, magnitude, news, monument, feel, damage, today, nuclear, quake, epicenter, people, school, video earthquake, http, coast, east, washington, felt, virginia, news, damage, today, magnitude, monument, feel, quake, people, video, area, breaking, shit, nuclear Hurricane  Irene   •  Community  1:  Contains  spam  words  –  spammers  dominate  this  cluster   •  Community  2  and  3  contain  similar  words   •  Analyze  the  corresponding  top  central  users  (next  slide)   12  
  13. 13. Results: Characterization C2:Username C2:Description C3:Username C3:Description top_tw_science none YaekoAvilez3837 none top_trend_world none MarxInbody5668 none Neednewsnow none KerrieRamagano3 none iBreakings none AureliaHickman1 none metofficestorm Official Met Office page for latest information on tropical storms, hurricanes, typhoons and cyclones LenitaBross5906 none RES911CU RES911CU Bringing you the latest news/weather from around the world & country using over 1000 news sources BeckieSteakley4 none qianafgtt none KirbyPietig6221 none Georgeannla47 none GailSterner5774 none moneyworlds none LeroyRasheed none LinJosh2011 none ClorindaKan1926 none top_trend_uk none StaceeRosebure2 none ShareWith911 Emergency Information Sharing Network connecting 911,First Responders and Citizens before, during and after emergencies of all shapes and sizes. VaniaAsmus none Hurricane  Irene   •  Community  2:  formed  of  official  news  media,  emergency  and  alert  profiles   (ShareWith911,  metofficestorm,  RES911CU,  Neednewsnow)   •  Community  3:  formed  with  common  or  general  users  of  TwiRer  (KerrieRamagano3,   YaekoAvilez3837,  AureliaHickman1)   13  
  14. 14. Conclusions  The  results  obtained  with  the  case  study  of   three  events  show  the  poten$al  of  applying   spectral  clustering  and  centrality  measures,   to  iden$fy  community  of  users  and  the   prominent  top  users  in  each  community,   during  crisis  events.     14  
  15. 15. Future Work  Construct  weighted  similarity  metrics  to   compute  similarity  between  users      Include  behavioral  features  of  users  in   similarity  metric  construc$on      Conduct  a  more  generic  experiment  with   more  crisis  events   15  
  16. 16. References   Gupta  et  al.  Credibility  ranking  of  tweets  during  high  impact  events.  In  Proceedings  of  the  1st   Workshop  on  Privacy  and  Security  in  Online  Social  Media,  PSOSM  ’12.     Java  et  al.  Detec$ng  Communi$es  via  Simultaneous  Clustering  of  Graphs  and  Folksonomies.  In   Proceedings  of  the  Tenth  Workshop  on  Web  Mining  and  Web  Usage  Analysis  (WebKDD).  ACM,   August  2008.     Kwak  et  al.  What  is  twiRer,  a  social  network  or  a  news  media?  In  Proceedings  of  the  19th   interna$onal  conference  on  World  wide  web,  WWW  ’10.  ACM.     Longueville  et  al.  “omg,  from  here,  i  can  see  the  flames!”:  a  use  case  of  mining  loca$on  based   social  networks  to  acquire  spa$o-­‐temporal  data  on  forest  fires,  LBSN,  2009.     Mendoza  et  al.  TwiRer  under  crisis:  can  we  trust  what  we  rt?  In  Proceedings  of  the  First   Workshop  on  Social  Media  Analy$cs,  SOMA  ’10.     Newman  et  al.  Analysis  of  weighted  networks.  Phys  Rev  E  Stat  Nonlin  Sor  MaRer  Phys,  70(5  Pt   2),  Nov.  2004.     Shi  et  al.  Normalized  cuts  and  image  segmenta$on.  In  Proceedings  of  the  1997  Conference  on   Computer  Vision  and  PaRern  Recogni$on  (CVPR  ’97).       Vieweg  et  al.  Microblogging  during  two  natural  hazards  events:  what  twiRer  may  contribute  to   situa$onal  awareness.  In  CHI  ’10.       Wang  et  al.  Detec$ng  community  kernels  in  large  social  networks.  In  Proceedings  of   Interna$onal  Conference  on  Data  Mining,  ICDM  ’11.       Welch  et  al.  Topical  seman$cs  of  twiRer  links.  In  Proceedings  of  the  fourth  interna$onal   conference  on  Web  search  and  data  mining,  WSDM  ’11.   16  
  17. 17. Thank you Contact:   adi$g@iiitd.ac.in   joshi@cs.umbc.edu  [ebiquity.umbc.edu]   pk@iiitd.ac.in  [precog.iiitd.edu.in]   17  
  18. 18. For any further information, please write to pk@iiitd.ac.in precog.iiitd.edu.in 18  

×