Predicting Crowd Behavior with Big Public Data


Presented at 23rd International World Wide Web Conference, Seoul, Korea, April, 2014

With public information widely accessible and shared on today's web, greater insight is possible into crowd actions by citizens and non-state actors, such as large protests and cyber activism. We present efforts to predict the occurrence, specific timeframe, and location of such actions before they occur, based on public data collected from over 300,000 open-content web sources in 7 languages from all over the world, ranging from mainstream news to government publications to blogs and social media. Using natural language processing, we extract event information from this content: the type of event, the entities involved and their roles, sentiment and tone, and the time range in which the event is said to occur. Statements on Twitter that refer to a date later than their time of posting prove particularly indicative. We consider in particular the case of the 2013 Egyptian coup d'état. The study validates and quantifies the common intuition that data on social media (beyond mainstream news sources) are able to predict major events.
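
The forward-looking signal described above can be illustrated with a short sketch. It counts reports from one source type that were published in the days before a target date and that refer to that date. The record layout `(published, event_day, source)`, the function name, and the single-day target are assumptions made for illustration, not the pipeline used in the study. The first two sample reports mirror the Beirut protest of June 9, 2013 that appears in the slides, and the 10-day look-back follows the feature list on slide 18.

```python
from datetime import date, timedelta


def forward_looking_count(reports, target_day, source, lookback_days=10):
    """Count reports from `source` that were published within
    `lookback_days` days before `target_day` and refer to `target_day`."""
    window_start = target_day - timedelta(days=lookback_days)
    return sum(
        1
        for published, event_day, src in reports
        if src == source
        and event_day == target_day
        and window_start <= published < target_day  # strictly before the event
    )


# Illustrative records: (publish date, date the report talks about, source type).
reports = [
    (date(2013, 6, 5), date(2013, 6, 9), "twitter"),     # call to protest, 4 days ahead
    (date(2013, 6, 8), date(2013, 6, 9), "mainstream"),  # news item, 1 day ahead
    (date(2013, 6, 9), date(2013, 6, 9), "mainstream"),  # hypothetical same-day report, not forward-looking
]

target = date(2013, 6, 9)
twitter_signal = forward_looking_count(reports, target, "twitter")
mainstream_signal = forward_looking_count(reports, target, "mainstream")
print(twitter_signal, mainstream_signal)
```

The same-day report is excluded because it was not published before the day it describes; in this sketch only earlier statements about the target day count as forward-looking.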

Predicting Crowd Behavior with Big Public Data

  1. Predicting Crowd Behavior with Big Public Data
     Nathan Kallus, Massachusetts Institute of Technology
     As seen on:
     April 8, 2014, 23rd International World Wide Web Conference, Seoul, Korea
  2–6. How did crowds use to come together, and how did we hear about it in the analog era?
  7–8. The Growth of Data
  9–12. The Growth of Data: “Since the revolt in Syria, the security situation in Lebanon has deteriorated.”
  13. Is this data predictive?
      • Sunday 6/9/13: One protester dead in a violent Beirut protest against Hezbollah's interference in Syria
      • 1 day before in the news: “Lebanese faction organizes two demonstrations tomorrow rejecting the participation of Hezbollah in the fighting in Syria” (translated from Arabic)
      • 4 days before on Twitter: “Say no to #WarCrimes and demonstrate against #Hezbollah fighting in #Qusayr on June 9 at 12 PM in Downtown #Beirut”
      • General sense of violence through news fragments:
        – 6/6: “Fatwa Calls For Suicide Attacks Against Hezbollah” (source not preserved)
        – 6/4: “Since the revolt in Syria, the security situation in Lebanon has deteriorated” (Al Bawaba)
        – 5/23: “The revolt in Syria has exacerbated tensions in Lebanon, which ... remains deeply divided” (Huff Post)
  14. The signal is there… We just need the data…
      [Figure: reports of protest in Lebanon by publish day. Axes: Day (9/15 to 11/15) vs. Mentions (ticks at 50 and 100). Series: mainstream news; forward-looking Twitter.]
  15. Data provided by Recorded Future
      • Continually scans 300,000+ sources in 7 languages
      • News, blogs, social media, government publications…
  16. Extract events, times, entities
      [Figure: sample text annotated with entities (who and where) and the reported event.]
  17. Use this to quantify the signals

      Country         All       Mainstream   Twitter
      Afghanistan     60918     13979        27655
      Bahrain         246136    32873        177310
      Egypt           944998    246882       397105
      France          172508    22648        111702
      Greece          122416    18037        70521
      India           491475    56981        274027
      Indonesia       34007     6870         17120
      Iran            118704    26487        53962
      Italy           65569     8977         43803
      Jordan          35396     7991         19369
      Lebanon         44153     9610         23394
      Libya           162721    43093        69437
      Nigeria         70635     7873         38700
      Pakistan        289643    25982        213636
      Saudi Arabia    39556     12452        13670
      Sudan           28680     6733         13654
      Syria           212815    63538        79577
      Tunisia         99000     35218        27233
      Yemen           70583     29140        16712

      • 19 countries
      • Events published 1/1/2011–7/10/2013
      • Millions of reports of protests!
      • Train (+ cross-val) on 1/1/2011–3/5/2013
      • Test on 3/6/2013–7/10/2013
  18. Features for Random Forest
      To determine whether a future time will have significant protests, base our prediction on…
      • Clustering of country in question (hierarchical clustering on Kolmogorov distance)
      • Same-day mainstream reports of protests over past 10 days
      • Level of violence language in those
      • Forward-looking events reported on Twitter about days in question posted over past 10 days
      • Forward-looking events reported by mainstream sources about days in question posted over past 10 days
  19. Results
      [Figure: two ROC plots of TPR vs. FPR, each axis running from 0% to 100%. Left panel: locale-relative scale of significance. Right panel: global absolute scale of significance (23% of positive training instances in Egypt).]
  20. Case Study: 2013 Egyptian Coup
      • 6/30/13: Anniversary of Morsi’s rule
        – Protests broadly anticipated (even Kerry made a statement asking for calm)
      • 6/28-6/29/13: Unanticipated warm-up protests
      • 7/3/13 and onward: Morsi removed from power, nonstop protesting and violence persists for weeks to come
  21. [Figure: predictions over rolling three-day windows, from 6/17–6/19 through 7/8–7/10. Axes: date prediction is made (June 16 to July 7) vs. dates of protests in question.]
  22. A more recent prediction
  23. A more recent prediction
      From a March 26 email:
      “Right now I think we can likely expect something to happen in Egypt on Friday. There've been protests so far … but these have been at Egypt's "baseline" levels so far. With tensions rising and calls on from the pro-Brotherhood side as well as counter calls for celebratory demonstrations from the pro-Sisi side citing Tamarod … This will bubble up further throughout today and likely erupt on Friday.”
      43 HOURS LATER:
      (The previous big coverage, a week before, about unrest in Egypt included a single death. I.e., this is significant.)
  24. A more recent prediction
      From a March 26 email:
      “Right now I think we can likely expect something to happen in Egypt on Friday. There've been protests so far … but these have been at Egypt's "baseline" levels so far. With tensions rising and calls on from the pro-Brotherhood side as well as counter calls for celebratory demonstrations from the pro-Sisi side citing Tamarod … This will bubble up further throughout today and likely erupt on Friday.”
      Facts mentioned in AP release at a later time
      • Track those social media trends that will lead to a significant event to understand what’s going on and what’s going to go on and why
  25. A more recent prediction
  26. A more recent prediction
      From an April 1 email:
      “… upcoming protest in Bahrain. The tweets behind that prediction reveal it to be anti-government demonstrations planned to start ahead of the upcoming F1 Grand Prix.”
      56 HOURS LATER:
  27. More Results: Predicting Hacktivist Cyber Attacks
      By targets or perpetrator.

      Target            BAC      Perpetrator   BAC
      Israel            68.9%    Anonymous     70.3%
      Germany           65.4%    AnonGhost     70.8%
      South Korea       63.1%    LulzSec       60.6%
      United Kingdom    65.5%    Guccifer      66.7%
  28. THANK YOU! QUESTIONS?
      Nathan Kallus
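
The TPR and FPR axes on slide 19 and the BAC figures on slide 27 can be related with a small worked example. This sketch assumes that BAC stands for balanced accuracy, taken here as the average of the true positive rate and the true negative rate (1 - FPR). The function names and the labels and predictions below are made up for illustration; they are not results from the study.

```python
def rates(y_true, y_pred):
    """Return (TPR, FPR) for binary labels, where 1 marks a significant event."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    if tp + fn == 0 or fp + tn == 0:
        raise ValueError("need at least one positive and one negative label")
    return tp / (tp + fn), fp / (fp + tn)


def balanced_accuracy(y_true, y_pred):
    """Average of TPR and TNR; assumed here to be what 'BAC' denotes."""
    tpr, fpr = rates(y_true, y_pred)
    return 0.5 * (tpr + (1.0 - fpr))


# Hypothetical outcomes for eight prediction windows.
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 1, 1]

tpr, fpr = rates(y_true, y_pred)
bac = balanced_accuracy(y_true, y_pred)
print(tpr, fpr, bac)  # 0.75 0.5 0.625
```

Under this definition a predictor that always says "no event" scores 50% however rare events are, which gives a baseline for reading the percentages on slide 27.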