Identifying Relevant Messages in a Twitter-based Citizen Channel for Natural Disaster Situations

Presentation given at the Social Web for Disaster Management Workshop at the WWW 2015 conference; Florence, Italy.

Published in: Education

  1. Identifying Relevant Messages in a Twitter-based Citizen Channel for Natural Disaster Situations
     Alfredo Cobo (ajcobo@uc.cl), Denis Parra (dparra@ing.puc.cl), Jaime Navón (jnavon@ing.puc.cl)
     Pontificia Universidad Católica de Chile, Departamento de Ciencia de la Computación
     Av. Vicuña Mackenna 4860, Macul, Santiago, Chile
  2. I (… and some other people in this room) … come from Chile
     Pictures from http://www.quadrodemedalhas.com/images/mapas/mapa-chile.jpg and http://upload.wikimedia.org/wikipedia/commons/thumb/9/91/Chile_in_South_America_(-mini_map_-rivers).svg/409px-Chile_in_South_America_(-mini_map_-rivers).svg.png
  3. Chile, well-known for its…
     • Copper (top producer)
     Image credits: "Top 5 Copper Producers" by Plazak - Own work. Licensed under CC BY-SA 3.0 via Wikimedia Commons - http://commons.wikimedia.org/wiki/File:Top_5_Copper_Producers.png#/media/File:Top_5_Copper_Producers.png
     https://www.google.com/url?sa=i&rct=j&q=&esrc=s&source=images&cd=&cad=rja&uact=8&ved=0CAYQjB0&url=http%3A%2F%2Fcommons.wikimedia.org%2Fwiki%2FFile%3ANative_Copper_(mineral).jpg&ei=L31ZVbOsL4r1UrbRgKAB&bvm=bv.93564037,d.d24&psig=AFQjCNHr2zm5m4Jmim7AgkCwwSb0b5mGUA&ust=1432014509629311
  4. Chile, well-known for its…
     • Wine (price + quality)
     Image credit: "Fiesta de Vendimia" by LuxoDresden - Own work. Licensed under CC BY-SA 3.0 via Wikimedia Commons - http://commons.wikimedia.org/wiki/File:Fiesta_de_Vendimia.JPG#/media/File:Fiesta_de_Vendimia.JPG
  5. If you start typing in Google… "9 out of 10 disasters …"
  6. If you start typing in Google… "9 out of 10 disasters … prefer Chile"
  7. … and for Natural Disasters :(
     • Largest earthquake ever recorded: Valdivia, Chile, May 22, 1960 (magnitude 9.5)
     • We usually have one large earthquake every 30 years (magnitude ~8)
     • The last one was in 2010, close to Concepción, but it also affected Santiago (the capital)
  8. … so, at PUC Chile
     • We created CIGIDEN, the "National Research Center for the Integrated Administration of Natural Disasters"
  9. CIGIDEN's Goal in this project
     • Help citizens stay informed during natural disasters by using social media.
     • Build a mobile application (Carlos Molina)
     • Automatically filter relevant messages from those not related to earthquakes (Alfredo Cobo) to feed the application
  10. Our Task: Building a Twitter classifier
      • Separate tweets related to natural disasters from those that are not.
  11. Related Work
      • Manual classification: Vieweg et al. (2010); Imran et al. (2013); Mendoza et al. (2010)
      • Data post-processing: Mendoza et al. (2010); Castillo et al. (2011) (information credibility on Twitter)
      • Feature generation (not necessarily for natural disasters): Gimpel et al. (2011); Kouloumpis et al. (2011); Liu et al. (2012); Wu et al. (2011); Lee et al. (2014)
      • Tools for disaster management: Hiltz et al. (2013); Power et al. (2013); Caragea et al. (2011); Abel et al. (2012); Middleton et al. (2014); Morstatter et al. (2013); Imran et al. (2014)
  12. Why would building this classifier be a contribution?
      • Building and validating a ground truth for classifying tweets in Spanish.
      • Building the classifier and dealing with:
        • Class imbalance
        • Number of latent dimensions (feature generation using LDA)
  13. Workflow of Activities
      [Diagram] Chile's 2010 earthquake data (Castillo et al., 2010) → sampling, cleaning & filtering → our ground truth; our ground truth + non-relevant messages → realistic dataset → classifiers (feature selection with LDA; class imbalance, 10%-80%)
  14. Building the ground truth
      • Random sample of 5,000 tweets from the Castillo et al. (2010) dataset, which was used to study credibility around Chile's 2010 earthquake.
      • Dates: from February 27 until March 2, 2010 (spanning 4 days)
      • We kept only Spanish messages and removed messages that were too similar (Levenshtein distance): 2,187 messages left
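The near-duplicate removal step on this slide could be sketched roughly as follows. This is only an illustration under stated assumptions: the slides do not give the similarity threshold or the comparison strategy, so `max_ratio` and the greedy "compare against messages already kept" loop are hypothetical choices, not the authors' procedure.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance (insertions, deletions, substitutions) between a and b."""
    if len(a) < len(b):
        a, b = b, a  # keep the inner row as short as possible
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(previous[j] + 1,          # delete ca
                               current[j - 1] + 1,       # insert cb
                               previous[j - 1] + cost))  # substitute
        previous = current
    return previous[-1]


def drop_near_duplicates(messages, max_ratio=0.2):
    """Greedily keep a message only if it is not too close to one already kept.

    max_ratio is a hypothetical threshold: edit distance divided by the
    length of the longer of the two messages.
    """
    kept = []
    for msg in messages:
        is_duplicate = any(
            levenshtein(msg, other) <= max_ratio * max(len(msg), len(other), 1)
            for other in kept
        )
        if not is_duplicate:
            kept.append(msg)
    return kept
```

The pairwise comparison is quadratic in the number of messages, which is manageable for a 5,000-tweet sample but would need blocking or hashing on a larger stream.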
  15. Validating the ground truth
      • Fleiss' kappa: κ = 0.645, p < .001
      • Intraclass correlation, ICC(2,1): ICC = 0.646, p < .001
      • Landis and Koch (1977)
      • Relevant messages were labeled based on the Imran et al. (2013) classification:
        • Caution/Warning
        • Casualties and Damage
        • People (missing, found, etc.)
        • Information source
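For readers unfamiliar with the agreement statistic on this slide, here is a minimal sketch of how Fleiss' kappa is computed from a table of rating counts. The input layout (`counts[i][j]` = number of raters who put item `i` in category `j`) and the toy data in the test are assumptions for illustration; the slides do not show the annotation matrix.

```python
def fleiss_kappa(counts):
    """Fleiss' kappa for a list of per-item category counts.

    counts[i][j] is the number of raters who assigned item i to category j.
    Every item must be rated by the same number of raters (at least 2).
    Undefined (division by zero) if all ratings fall in a single category.
    """
    n_items = len(counts)
    n_raters = sum(counts[0])
    n_categories = len(counts[0])

    # Share of all assignments that went to each category.
    p_category = [
        sum(row[j] for row in counts) / (n_items * n_raters)
        for j in range(n_categories)
    ]
    # Observed agreement per item, then averaged over items.
    p_item = [
        (sum(c * c for c in row) - n_raters) / (n_raters * (n_raters - 1))
        for row in counts
    ]
    p_observed = sum(p_item) / n_items
    # Agreement expected by chance.
    p_expected = sum(p * p for p in p_category)
    return (p_observed - p_expected) / (1 - p_expected)
```

On the Landis and Koch (1977) scale, values between 0.61 and 0.80 are labeled "substantial" agreement, which is where the reported κ = 0.645 falls.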
  16. Workflow of Activities
      [Diagram] Chile's 2010 earthquake data (Castillo et al., 2010) → sampling, cleaning & filtering → our ground truth; our ground truth + non-relevant messages → realistic dataset → classifiers (feature selection with LDA; class imbalance)
  17. Classification Problem
      Features:
      • User network: followers, followees
      • Content (4,766 unique words): hashtags, words, user mentions
      Class imbalance:
      • The ground truth is not a realistic representation of Twitter
      • We added "noise": introduced tweets not relevant to the event (20%-80%)
      • Sampled non-relevant tweets from 5 months
      • Removed all tweets posted during days of seismic activity
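The noise-injection idea on this slide can be sketched as below. This is a hedged illustration, not the authors' code: it assumes the noise proportion means the share of non-relevant noise tweets in the final dataset, and the function name, the `(text, label)` tuple layout, and the label `0` for noise are all hypothetical.

```python
import random


def mix_with_noise(ground_truth, noise_pool, noise_proportion, seed=0):
    """Append sampled noise tweets so they make up `noise_proportion`
    of the returned dataset (assumed definition of the proportion).

    ground_truth: list of (text, label) pairs.
    noise_pool:   list of texts known to be unrelated to the event.
    """
    if not 0 <= noise_proportion < 1:
        raise ValueError("noise_proportion must be in [0, 1)")
    # Solve n_noise / (n_truth + n_noise) = noise_proportion for n_noise.
    n_noise = round(len(ground_truth) * noise_proportion / (1 - noise_proportion))
    if n_noise > len(noise_pool):
        raise ValueError("not enough noise tweets for this proportion")

    rng = random.Random(seed)  # fixed seed keeps the sample reproducible
    noise = rng.sample(noise_pool, n_noise)
    dataset = list(ground_truth) + [(text, 0) for text in noise]
    rng.shuffle(dataset)
    return dataset
```

For example, 20 labeled tweets mixed at a proportion of 0.8 yield 80 noise tweets and a dataset of 100.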
  18. Classification Results
      Model                Precision  Recall  F1 score  Accuracy  AUC    Dimensions  Noise proportion
      Baseline             0.625      0.545   0.53      0.5       0.568  -           0
      Bernoulli NB         0.831      0.226   0.355     0.594     0.605  2000        0
      Logistic Regression  0.827      0.641   0.722     0.756     0.834  2000        0.6
      Linear SVM           0.687      0.677   0.682     0.687     0.719  1000        0.6
      Random Forest        0.807      0.673   0.734     0.758     0.844  1000        0.8
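As a reading aid for the results table, the F1 score is the harmonic mean of precision and recall. The short sketch below recomputes it from the reported precision and recall values; the function name is illustrative and the numbers are copied from the table.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (0.0 when both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (precision, recall) pairs as reported on the slide.
reported = {
    "Bernoulli NB": (0.831, 0.226),
    "Logistic Regression": (0.827, 0.641),
    "Linear SVM": (0.687, 0.677),
    "Random Forest": (0.807, 0.673),
}

for model, (precision, recall) in reported.items():
    print(f"{model}: F1 = {f1_score(precision, recall):.3f}")
```

The four trained models reproduce the reported F1 values to three decimals. The baseline row (precision 0.625, recall 0.545, F1 0.53) does not follow this formula, which may mean its scores were averaged differently; the slides do not say.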
  19. Analysis ~ LDA Dimensions and Noise
  20. Analysis ~ LDA Dimensions and Noise
  21. Conclusions & Future Work
      • We built and validated a ground truth of tweets in Spanish relevant to disasters
      • We implemented a classifier and analyzed its performance across several algorithms while dealing with the class imbalance problem
      • Future work: move the application from prototype to production and test online scalability
  22. That's all folks!
      • Thanks! Please send questions to the corresponding author, Alfredo Cobo (ajcobo@uc.cl), or to Denis Parra (dparra@uc.cl)
  23. Chile, a small country, but well-known for its…
      • Length (4,300 km)
      [Map labels: ~4,300 km; ~8,000 km]
  24. Model Features
      • Newman et al. (2007)
      • Biro et al. (2008)
      • Wei et al. (2006)
      • Wang et al. (2012)
      • Han (2005)
      Features: followers, friends
      Corpora features: hashtags, words, user mentions
  25. Results
      • Amatriain et al. (2013)
  26. Architecture
  27. Plots of bootstrap
      [Figure panels: Agreement Day 1, Agreement Day 2, Agreement Day 3, Agreement Day 4]
  28. Word Frequencies
  29. Just "Terremoto": AUC
  30. Related Work
  31. Manual classification
      • Vieweg et al. (2010)
      • Imran et al. (2013)
  32. Post Processing
      • Castillo et al. (2011)
      • Mendoza et al. (2010)
  33. Feature Generation Approaches
      • Gimpel et al. (2011)
      • Kouloumpis et al. (2011)
      • Liu et al. (2012)
      • Wu et al. (2011)
      • Lee et al. (2014)
  34. Tools For Disaster Management
      • Hiltz et al. (2013)
      • Power et al. (2013)
      • Caragea et al. (2011)
      • Abel et al. (2012)
      • Middleton et al. (2014)
      • Morstatter et al. (2013)
      • Imran et al. (2014)
  35. Building the ground truth
      • Mendoza et al. (2010)
      • Imran et al. (2013)
  36. Algorithms and evaluation procedure
      • Castillo et al. (2011)
      • Fawcett et al. (2004)
      • Manning et al. (2008)
      • Wen et al. (2014)