
Crisis Computing

Finding relevant and credible information in social media during disasters

Big Data Analytics Conference
Delhi, India, December 2014

Published in: Social Media

Crisis Computing

  1. Crisis Computing: Finding relevant and credible information on social media during disasters. Big Data Analytics Conference, Delhi, India, December 2014
  2. How/when did it start for me? January 2010
  3. Humanitarian Computing. At least 775 publications:
      ● Crisis Analysis (55)
      ● Crisis Management (309)
      ● Situational Awareness (67)
      ● Social Media (231)
      ● Mobile Phones (74)
      ● Crowdsourcing (116)
      ● Software and Tools (97)
      ● Human-Computer Interaction (28)
      ● Natural Language Processing (33)
      ● Trust and Security (33)
      ● Geographical Analysis (53)
      Source: http://humanitariancomp.referata.com/
  4. Humanitarian Computing Topics
  5. http://www.youtube.com/watch?v=0UFsJhYBxzY
  6. An earthquake hits a Twitter user. Carlos Castillo – chato@acm.org http://www.chato.cl/research/
      • When an earthquake strikes, the first tweets are posted 20-30 seconds later
      • Damaging seismic waves travel at 3-5 km/s, while network communications travel at light speed on fiber/copper, plus latency
      • After ~100 km, seismic waves may be overtaken by tweets about them (http://xkcd.com/723/)
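The "~100 km" figure on slide 6 follows from simple arithmetic: a wave travelling at speed v reaches distance d after d / v seconds, so a tweet posted after a delay t arrives first wherever d > v × t. The sketch below is only an illustration of that calculation; the function name and the `network_latency_s` parameter are assumptions made here, and network delivery is treated as near-instantaneous.

```python
def crossover_distance_km(tweet_delay_s: float,
                          wave_speed_km_s: float,
                          network_latency_s: float = 0.0) -> float:
    """Distance beyond which a tweet arrives before the seismic wave.

    The wave needs d / wave_speed seconds to reach distance d; the tweet
    needs tweet_delay + network_latency seconds regardless of distance
    (an assumption: propagation over fiber/copper is treated as negligible).
    """
    return wave_speed_km_s * (tweet_delay_s + network_latency_s)


# Bounds taken from the slide: 20-30 s delay, 3-5 km/s wave speed.
nearest = crossover_distance_km(20, 3)
farthest = crossover_distance_km(30, 5)
print(f"Tweets overtake the wave somewhere between {nearest:.0f} and {farthest:.0f} km")
```

With the slide's ranges the crossover falls between 60 and 150 km, which is consistent with the quoted "~100 km".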
  7. Examples of crisis tweets
  8. Examples of crisis tweets (cont.). Alexandra Olteanu, Sarah Vieweg and Carlos Castillo: What to Expect When the Unexpected Happens: Social Media Communications Across Crises. To appear in CSCW 2015.
  9. Fertile grounds for applied research
      ✔ Problems of global significance
      ✔ Solved with labor-intensive methods
      ✔ Better solution provides a public good
      ✔ Large and noisy data sets available
      ✔ Engage volunteer communities
  10. Fertile grounds for applied research
      ✔ Problems of global significance
      ✔ Solved with labor-intensive methods
      ✔ Better solution provides a public good
      ✔ Large and noisy data sets available
      ✔ Engage volunteer communities
      • Relevance to practitioners?
  11. Current collaborators
      • Patrick Meier – QCRI
      • Sarah Vieweg – QCRI
      • Muhammad Imran – QCRI
      • Irina Temnikova – QCRI
      • Alexandra Olteanu – EPFL
      • Aditi Gupta – IIIT Delhi
      • “P.K.” Kumaraguru – IIIT Delhi
      • Fernando Diaz – Microsoft
  12. Outline: Crisis Maps, Extraction, Matching, Verification, Credibility
  13. Crisis maps from social media
      • Carlos Castillo, Fernando Diaz, and Hemant Purohit: Leveraging Social Media and Web of Data to Assist Crisis Response Coordination. Tutorial at SDM, Philadelphia, PA, USA. April 2014.
      • Hemant Purohit, Carlos Castillo, Patrick Meier and Amit Sheth: Crisis Mapping, Citizen Sensing and Social Media Analytics. Tutorial at ICWSM, May 2013.
  14. “What can speed humanitarian response to tsunami-ravaged coasts? Expose human rights atrocities? Launch helicopters to rescue earthquake victims? Outwit corrupt regimes? A map.” Patrick Meier, Social Innovation Director @ QCRI – http://irevolution.net/
  15. Crisis mapping goes mainstream (2011)
  16. http://newsbeatsocial.com/watch/0_s6xxcr3p
  17. Understanding Crisis Tweets. Alexandra Olteanu, Sarah Vieweg and Carlos Castillo: What to Expect When the Unexpected Happens: Social Media Communications Across Crises. To appear in CSCW 2015.
  18. Types of Disaster
  19. Our approach: 1. Filtering, 2. Classification, 3. Extraction
  20. Filtering
      • Is disaster-related? (Yes / No)
      • Contributes to situational awareness? (Yes / No)
  21. Classification of filtered tweets
      • Caution & Advice, Information Sources, Damage & Casualties, Donations, ...
      • Gov, Eyewitness, Media, NGO, Outsider, ...
  22. A large-scale study of crisis tweets
      • Collect tweets from 26 disasters
      • Classify according to:
          ● Informative / Not informative
          ● Information provided
          ● Information source
  23. Advice on labeling
      • Your instructions will never be correct the first time you try
          – e.g. personal / eyewitness
          – Instructions must be re-written reactively
          – Perform small-scale labeling first
      • Instructions must be concrete and brief
          – If you can't do it, the task has to be divided
  24. Information Provided in Crisis Tweets. N=26; data available at http://crisislex.org/
  25. What do people tweet about?
      • Affected individuals
          – 20% on average (min. 5%, max. 57%)
          – most prevalent in human-induced, focalized & instantaneous events
      • Sympathy and emotional support
          – 20% on average (min. 3%, max. 52%)
          – most prevalent in instantaneous events
      • Other useful information
          – 32% on average (min. 7%, max. 59%)
          – least prevalent in diffused events
  26. What do people tweet about? (cont.)
      • Infrastructure and utilities
          – 7% on average (min. 0%, max. 22%)
          – most prevalent in diffused events, in particular floods
      • Caution and advice
          – 10% on average (min. 0%, max. 34%)
          – least prevalent in instantaneous & human-induced events
      • Donations and volunteering
          – 10% on average (min. 0%, max. 44%)
          – most prevalent in natural hazards
  27. Distribution over information sources
  28. Distribution over time
  29. Extracting information and matching emergency-related resources
      • Muhammad Imran, Shady Elbassuoni, Carlos Castillo, Fernando Diaz and Patrick Meier: Extracting Information Nuggets from Disaster-Related Messages in Social Media. In ISCRAM. Baden-Baden, Germany, 2013. Best paper award.
      • Hemant Purohit, Amit Sheth, Carlos Castillo, Patrick Meier, Fernando Diaz: Emergency-Relief Coord. on Social Media: Auto. Matching Resource Requests and Offers. First Monday 19 (1), January 2014.
      • Muhammad Imran, Shady Elbassuoni, Carlos Castillo, Fernando Diaz and Patrick Meier: Practical Extraction of Disaster-Relevant Information from Social Media. In SWDM. Rio de Janeiro, Brazil, 2013.
  30. Information Extraction: classified tweets ... “@JimFreund: Apparently we have no choice. There is a tornado watch in effect tonight.”
  31. Extraction
      • #hashtags, @user mentions, URLs, etc.
          – Regular expressions
          – Text library from Twitter
      • Temporal expressions
          – Part-of-speech tagger + heuristics
          – Natty library
      • Supervised learning
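The regular-expression route mentioned on slide 31 can be sketched as follows. This is a minimal illustration, not the talk's implementation: the patterns and the `extract_entities` helper are assumptions made here, and they are much simpler than the rules in Twitter's own text library (they ignore, for example, non-Latin hashtags and trailing punctuation in URLs).

```python
import re

# Simplified patterns (assumptions, not Twitter's official rules).
# The negative lookbehind avoids matching e-mail addresses such as user@example.org.
HASHTAG_RE = re.compile(r"(?<!\w)#(\w+)")
MENTION_RE = re.compile(r"(?<!\w)@(\w+)")
URL_RE = re.compile(r"https?://\S+")


def extract_entities(tweet: str) -> dict:
    """Return the hashtags, user mentions and URLs found in a tweet."""
    urls = URL_RE.findall(tweet)
    # Blank out URLs first so that a '#fragment' inside a URL is not
    # mistaken for a hashtag.
    text = URL_RE.sub(" ", tweet)
    return {
        "hashtags": HASHTAG_RE.findall(text),
        "mentions": MENTION_RE.findall(text),
        "urls": urls,
    }


tweet = ("RT @twc_hurricane: Wind gusts over 60 mph are being reported at "
         "Central Park and JFK airport in #NYC this hour. #Sandy")
print(extract_entities(tweet))
```

For production use, a maintained entity-extraction library is usually a safer choice than hand-written patterns, which is presumably why the slide lists both options.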
  32. Labels for extraction
      • Type-dependent instruction
      • Ask evaluators to copy-paste a word/phrase from each tweet
  33. Learning: Conditional Random Fields
      • Used extensively in NLP for part-of-speech tagging and information extraction
      • Representation of observations is important (capitalization, position, etc.)
      [Figure: HMM vs. linear-chain CRF, showing hidden and observed variables]
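Slide 33 notes that the representation of observations (capitalization, position, etc.) matters for a CRF. A minimal sketch of such a per-token representation is shown below. The `token_features` function and its feature names are hypothetical, chosen here for illustration; they are not the feature set of any particular toolkit.

```python
def token_features(tokens: list, i: int) -> dict:
    """Describe token i of a tokenized tweet with simple observation features.

    A sequence labeller such as a linear-chain CRF would receive one such
    dictionary per token, plus the label sequence during training.
    """
    tok = tokens[i]
    return {
        "lower": tok.lower(),
        "is_capitalized": tok[:1].isupper(),
        "is_all_caps": tok.isupper(),
        "has_digit": any(c.isdigit() for c in tok),
        "is_first": i == 0,                      # position in the tweet
        "is_last": i == len(tokens) - 1,
        "prev_lower": tokens[i - 1].lower() if i > 0 else "<START>",
        "next_lower": tokens[i + 1].lower() if i < len(tokens) - 1 else "<END>",
    }


tokens = "There is a tornado watch in effect tonight".split()
features = [token_features(tokens, i) for i in range(len(tokens))]
print(features[3])
```

Whitespace splitting is used only to keep the example short; tweets generally need a Twitter-aware tokenizer.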
  34. Tool
      • CMU ARK Twitter NLP
          – Tokenization
          – Feature extraction
          – CRF learning
      • Very easy to use: simply change the training set (part-of-speech tags) into anything, and re-train
  35. Output examples
      • RT @weatherchannel: .@NYGovCuomo orders closing of NYC bridges. Only Staten Island bridges unaffected at this time. Bridges must close by 7pm. #Sandy #NYC
      • Wow what a mess #Sandy has made. Be sure to check on the elderly and homeless please! Thoughts and prayers to all affected
      • RT @twc_hurricane: Wind gusts over 60 mph are being reported at Central Park and JFK airport in #NYC this hour. #Sandy
      • RT @mitchellreports: Red Cross tells us grateful for Romney donation but prefer people send money or donate blood dont collect goods NOT best way to help #Sandy
  36. Extractor evaluation (setting: recall, precision)
      • Train 2/3 Joplin, test 1/3 Joplin: recall 78%, precision 90%
      • Train 2/3 Sandy, test 1/3 Sandy: recall 41%, precision 79%
      • Train Joplin, test Sandy: recall 11%, precision 78%
      • Train Joplin + 10% Sandy, test 90% Sandy: recall 21%, precision 81%
      • Precision is: one word or more in common with what humans extracted
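The lenient matching rule on slide 36 (an extraction counts as correct if it shares one word or more with what humans extracted) can be sketched as below. Several details are assumptions made for this illustration and are not stated in the slides: the whitespace tokenization, the lower-casing, and the recall denominator (tweets for which humans extracted something).

```python
import string


def words(text: str) -> set:
    """Lower-cased words with surrounding punctuation stripped (an assumed tokenization)."""
    return {w.strip(string.punctuation).lower() for w in text.split()} - {""}


def lenient_scores(system: list, human: list) -> tuple:
    """Compute (precision, recall) under the 'one word or more in common' rule.

    `system` and `human` are aligned per tweet; each entry is the extracted
    phrase, or None when nothing was extracted for that tweet.
    """
    hits = sum(1 for s, h in zip(system, human)
               if s and h and words(s) & words(h))
    n_system = sum(1 for s in system if s)
    n_human = sum(1 for h in human if h)
    precision = hits / n_system if n_system else 0.0
    recall = hits / n_human if n_human else 0.0
    return precision, recall


# Toy data invented for illustration only.
system = ["closing of NYC bridges", None, "Central Park", "blood"]
human = ["NYC bridges", "check on the elderly", "JFK airport", "donate blood"]
precision, recall = lenient_scores(system, human)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

Because a single shared word is enough, this measure is generous to the extractor; a stricter variant would require exact span matches.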
  37. Donations matching
      • Identify and match requests/offers for donations
          – Money, clothing, food, shelter, volunteers, blood
      • Average precision = 0.21 (0.16 if only text similarity is used)
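The text-similarity-only baseline mentioned on slide 37 could look roughly like the sketch below, which ranks offers for a request by word overlap. Jaccard similarity is an assumption made here, since the slides do not say which similarity measure was used, and the messages are invented examples.

```python
import re


def tokens(text: str) -> set:
    """Lower-cased alphanumeric tokens of a message."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def jaccard(a: set, b: set) -> float:
    """Size of the intersection divided by the size of the union."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def rank_offers(request: str, offers: list) -> list:
    """Return offers sorted from most to least similar to the request."""
    req = tokens(request)
    return sorted(offers, key=lambda o: jaccard(req, tokens(o)), reverse=True)


# Invented messages, for illustration only.
request = "we need blood donors in NYC"
offers = [
    "selling concert tickets",
    "I can donate blood in NYC today",
    "free clothing for families",
]
print(rank_offers(request, offers)[0])
```

Pure word overlap misses matches phrased differently (for example "donors" versus "donate"), which is one plausible reason the slide reports better average precision when more than text similarity is used.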
  38. Crowdsourced stream processing systems. Muhammad Imran, Ioanna Lykourentzou and Carlos Castillo: Engineering Crowdsourced Stream Processing Systems. http://arxiv.org/abs/1310.5463
  39. (no text on this slide)
  40. Design objectives and principles (design objective; example metric; design principles for automatic and crowdsourced components)
      • Low latency. Metric: end-to-end time. Automatic: keep items moving. Crowdsourced: trivial tasks.
      • High throughput. Metric: output items per unit of time. Automatic: high-performance processing. Crowdsourced: task automation.
      • Load adaptability. Metric: rate response function. Automatic: load shedding, load queueing. Crowdsourced: task prioritization.
      • Cost effectiveness. Metric: cost vs. quality, throughput, etc. Automatic: N/A. Crowdsourced: task frugality.
      • High quality. Metric: application-dependent. Principles: redundancy, aggregation and quality control.
  41. Design patterns
      ● QA loop
      ● Task assignment
      ● Process/verify
      ● Supervised learning
      ● Crowdwork sub-task chaining
      ● Humans are not a bottleneck
      ● Humans review every output element
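The "redundancy, aggregation and quality control" principle listed for the high-quality objective on slide 40 can be illustrated with a majority vote over several workers' labels for the same item. This is a minimal sketch under assumptions made here: the `aggregate_labels` name and the `min_agreement` threshold are hypothetical and do not come from the talk.

```python
from collections import Counter
from typing import Optional


def aggregate_labels(labels: list, min_agreement: float = 0.6) -> Optional[str]:
    """Return the majority label, or None if workers do not agree enough.

    Each item is shown to several workers (redundancy); their answers are
    combined by majority vote (aggregation); items without sufficient
    agreement are held back, e.g. for more labels (quality control).
    """
    if not labels:
        return None
    label, count = Counter(labels).most_common(1)[0]
    return label if count / len(labels) >= min_agreement else None


print(aggregate_labels(["donations", "donations", "caution"]))  # agreement 2/3
print(aggregate_labels(["donations", "caution"]))               # agreement 1/2
```

Raising `min_agreement` trades throughput and cost for quality, the same tension the design objectives on slide 40 describe.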
  42. http://aidr.qcri.org/
  43. Self-service for crisis-related classification
      [Diagram components: unstructured text reports, categorized information, automatic classifier, model builder, crowdsourced ground-truth, library of training data]
  44. Credibility and verification
      • Aditi Gupta, Ponnurangam Kumaraguru, Carlos Castillo and Patrick Meier: TweetCred: A Real-time Web-based System for Credibility of Content on Twitter. In SocInfo 2014. Runner-up for best paper award.
      • Carlos Castillo, Marcelo Mendoza, Barbara Poblete: Predicting Information Credibility in Time-Sensitive Social Media. In Internet Research, Vol. 23, Issue 5. October 2013.
      • A. Popoola, D. Krasnoshtan, A. Toth, V. Naroditskiy, C. Castillo, P. Meier and I. Rahwan: Information Verification during Natural Disasters. Social Web and Disaster Management (SWDM) workshop, 2013.
  45. 3
  46. http://www.youtube.com/watch?v=pAHoEO-K0Ek
  47. Crowdsourced verification: Veri.ly (@VeriDotLy, http://veri.ly/)
      • Frame crowdwork correctly
      • Not upvoting/downvoting a claim
      • Instead, providing evidence for/against
  48. Examples of evidence provided
  49. Automatic credibility evaluation: TweetCred
      • Real-time web-based service
      • Used as a Chrome extension
      • Annotates Twitter's timeline with credibility scores
  50. http://twitdigest.iiitd.edu.in/TweetCred/
  51. Next steps
      • Credibility facets
          – Factually written
          – Detailed
          – Author on the ground
          – ...
      • Respond to searches about an event
  52. Closing remarks
  53. [Diagram labels: Computationally feasible; Supported by data; Useful; Good projects in this space]
  54. [Diagram labels: Computationally feasible; Supported by data; Useful; Good projects in this space; Temptation!; Danger!; Poorly planned projects :-(; AI-complete problems]
  55. Some venues
      • SWDM – Workshop on Social Web for Disaster Management
          – Deadline: January 24th
      • ISCRAM – International Conference on Information Systems for Crisis Response and Management
      + the usual suspects, depending on your area ;-)
  56. Possibility of large impact by using computer science to support humanitarian work = Applied computing at its best
  57. Thank you! Carlos Castillo · chato@acm.org http://www.chato.cl/research/ With thanks to Patrick Meier for several slides
