Crisis Informatics (November 2013)

Talk at Microsoft Research, New York City, November 2013.


  1. Crisis informatics: Finding relevant and credible information on social media during disasters
  2. How/when did it start for me? January 2010
  3. Fertile grounds for applied research: ✔ Problems of global significance ✔ Solved with labor-intensive methods ✔ Better solution provides a public good ✔ Large and noisy data sets available ✔ Engage volunteer communities. Carlos Castillo – chato@acm.org http://www.chato.cl/research/
  4. State of the art. At least 650 publications: Crisis Analysis (52), Crisis Management (280), Situational Awareness (58), Social Media (203), Mobile Phones (64), Crowdsourcing (109), Software and Tools (90), Human-Computer Interaction (28), Natural Language Processing (33), Trust and Security (31), Geographical Analysis (45). Source: http://humanitariancomp.referata.com/
  5. Publication titles
  6. Fertile grounds for applied research: ✔ Problem of global significance ✔ Solved with labor-intensive methods ✔ Better solution provides a public good ✔ Large and noisy data sets available ✔ Engage volunteer communities • Relevance to practitioners?
  7. Patrick Meier, Social Innovation Director @ QCRI – http://irevolution.net/
  8. “What can speed humanitarian response to tsunami-ravaged coasts? Expose human rights atrocities? Launch helicopters to rescue earthquake victims? Outwit corrupt regimes? A map.” Patrick Meier, Social Innovation Director @ QCRI – http://irevolution.net/
  9. Collaborators: Muhammad Imran – QCRI; Ioanna Lykourentzou – INRIA; Hemant Purohit – Wright State Univ.; Alexandra Olteanu – EPFL; Shady Elbassuoni – Univ. of Beirut; Lalana Kagal et al. – CSAIL MIT; Jakob Rogstadius – Univ. of Madeira; Fernando Diaz – Microsoft
  10. Outline • Motivation • Handling crisis tweets • Crowdsourced verification • Ongoing work – Automatic classification – Resource matchmaking
  11. Crisis Mapping. Hemant Purohit, Carlos Castillo, Patrick Meier and Amit Sheth: Crisis Mapping, Citizen Sensing and Social Media Analytics. Tutorial at ICWSM, May 2013.
  12. I don't have time for social networks! • We all have spare capacity – Television, TV series, Internet sites • We overestimate ourselves in general – Don't underestimate social media users; it is a bad starting point
  13. An earthquake hits a Twitter user • When an earthquake strikes, the first tweets are posted 20-30 seconds later • Damaging seismic waves travel at 3-5 km/s, while network communications travel at close to light speed on fiber/copper, plus latency • Beyond ~100 km, seismic waves may be overtaken by tweets about them http://xkcd.com/723/
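
The ~100 km figure on this slide can be checked with simple arithmetic: it is roughly the distance a seismic wave covers before the first tweets appear. The sketch below is illustrative only; the function name is hypothetical, and it assumes network propagation time is negligible compared with the 20-30 second posting delay.

```python
def overtake_distance_km(wave_speed_km_s: float, tweet_delay_s: float) -> float:
    """Distance the seismic wave has travelled when the first tweet is posted.

    Assumption (not from the talk): network propagation time is treated as
    negligible next to the posting delay, so readers farther away than this
    distance can see the tweet before the wave reaches them.
    """
    return wave_speed_km_s * tweet_delay_s


# Bounds implied by the slide's ranges: 3-5 km/s and 20-30 s.
lower_bound_km = overtake_distance_km(3.0, 20.0)
upper_bound_km = overtake_distance_km(5.0, 30.0)
```

With the slide's ranges the bounds bracket the quoted ~100 km.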
  14. Crisis Mappers Conference 2013: Next week!
  15. Classifying and extracting information from tweets. Muhammad Imran, Shady Elbassuoni, Carlos Castillo, Fernando Diaz and Patrick Meier: Practical Extraction of Disaster-Relevant Information from Social Media. In SWDM. Rio de Janeiro, Brazil, 2013. Muhammad Imran, Shady Elbassuoni, Carlos Castillo, Fernando Diaz and Patrick Meier: Extracting Information Nuggets from Disaster-Related Messages in Social Media. In ISCRAM. Baden-Baden, Germany, 2013. Best paper award.
  16. Our approach: 1. Filtering 2. Classification 3. Extraction
  17. 1. Filtering: Is disaster-related? (Yes/No). If yes: Contributes to situational awareness? (Yes/No)
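
The two filtering questions form a cascade: the second is asked only when the first answer is yes. A minimal sketch of that control flow follows. The keyword rules are hypothetical stand-ins for trained classifiers and are not the method used in the talk.

```python
from typing import Callable

Predicate = Callable[[str], bool]


def make_filter(is_disaster_related: Predicate, is_informative: Predicate) -> Predicate:
    """Build the two-question cascade: keep a tweet only if both answers are yes."""
    def keep(text: str) -> bool:
        # The second question is only evaluated when the first answer is yes.
        return is_disaster_related(text) and is_informative(text)
    return keep


# Hypothetical stand-ins for trained classifiers (keyword rules, illustration only).
def mentions_sandy(text: str) -> bool:
    return "sandy" in text.lower()


def looks_informative(text: str) -> bool:
    lowered = text.lower()
    return any(word in lowered for word in ("closed", "closing", "shelter", "reported"))


keep_tweet = make_filter(mentions_sandy, looks_informative)
```

Passing the two predicates in as arguments keeps the cascade independent of how each question is answered, so the keyword rules could be swapped for learned models.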
  18. Labeling task: Classify the following tweet from Hurricane Sandy as: ● Personal: only of interest to author and immediate circle of friends ● Informative: interesting to other people ● Off-topic: not related to Hurricane Sandy ● Other/can't judge
  19. Advice on labeling • Your instructions will never be correct the first time you try – e.g. personal / eyewitness – Instructions must be re-written reactively – Perform small-scale labeling first • Instructions must be concrete and brief – If you can't do it, the task has to be divided
  20. 2. Classification: Filtered tweets are assigned to categories: Caution & Advice, Information Sources, Damage & Casualties, Donations, ...; Health, Water, Food, Shelter, Logistics, ...
  21. Distribution of tweet types, Joplin Tornado (2011). Chart categories: Caution/Advice, Info Source, Donations, Casualties/Damage, Unknown. Chart shares: 6%, 10%, 16%, 50%, 18% (the transcript does not preserve which share belongs to which category)
  22. Classification results (AUC per class): Caution and advice 0.91; Information source 0.76; Donations 0.89; Casualties/damage 0.87
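
For readers unfamiliar with the metric, AUC can be read as the probability that a randomly chosen positive example receives a higher score than a randomly chosen negative one. The sketch below computes it directly from that definition; it is a generic illustration, not the evaluation code behind the numbers on the slide, and the function name is hypothetical.

```python
from typing import Sequence


def auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Area under the ROC curve via pairwise comparison.

    Counts the fraction of (positive, negative) pairs in which the positive
    example has the higher score; ties count as half a win.
    """
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative example")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))
```

The pairwise loop is quadratic, which is fine for small labeled samples; a rank-based formulation would be preferable for large ones.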
  23. 3. Extraction: Classified tweets, e.g. @JimFreund: Apparently we have no choice. There is a tornado watch in effect tonight. ...
  24. Extraction • #hashtags, @user mentions, URLs, etc. – Regular expressions – Text library from Twitter • Temporal expressions – Part-of-speech tagger + heuristics – Natty library • Supervised learning
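
The regular-expression route for hashtags, mentions and URLs can be sketched with the Python standard library alone. The patterns below are deliberately simplified assumptions (for instance, they would treat the domain part of an e-mail address as a mention) and are not the rules implemented in Twitter's own text library.

```python
import re
from typing import Dict, List

# Simplified patterns, for illustration only.
URL = re.compile(r"https?://\S+")
HASHTAG = re.compile(r"#\w+")
MENTION = re.compile(r"@\w+")


def extract_entities(text: str) -> Dict[str, List[str]]:
    """Pull URLs, hashtags and user mentions out of a tweet."""
    urls = URL.findall(text)
    # Blank out URLs first so that fragments such as "#section" inside a
    # link are not mistaken for hashtags.
    rest = URL.sub(" ", text)
    return {
        "urls": urls,
        "hashtags": HASHTAG.findall(rest),
        "mentions": MENTION.findall(rest),
    }
```

Applied to one of the Hurricane Sandy tweets quoted in the talk, this returns the retweeted account as a mention and `#NYC` and `#Sandy` as hashtags.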
  25. Labels for extraction • Type-dependent instruction • Ask evaluators to copy-paste a word/phrase from each tweet
  26. Learning: Conditional Random Fields (diagram: hidden and observed variables in an HMM and in a linear-chain CRF) • Used extensively in NLP for part-of-speech tagging and information extraction • Representation of observations is important (capitalization, position, etc.)
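
To make "representation of observations" concrete, the sketch below turns each token into a dictionary of features of the kind the slide mentions (capitalization, position) plus a little neighbouring context. The feature names are hypothetical and this is not the feature set of any particular CRF toolkit; it only shows the shape of input a linear-chain CRF is typically trained on.

```python
from typing import Dict, List, Union

Feature = Union[str, bool, int]


def token_features(tokens: List[str], i: int) -> Dict[str, Feature]:
    """Describe token i by its surface form, its position and its neighbours."""
    token = tokens[i]
    last = len(tokens) - 1
    return {
        "lower": token.lower(),
        "is_capitalized": token[:1].isupper(),
        "is_all_caps": token.isupper(),
        "has_digit": any(ch.isdigit() for ch in token),
        "position": i,
        "is_first": i == 0,
        "is_last": i == last,
        # Neighbouring words give the model local context.
        "prev_lower": tokens[i - 1].lower() if i > 0 else "<START>",
        "next_lower": tokens[i + 1].lower() if i < last else "<END>",
    }


def sentence_features(tokens: List[str]) -> List[Dict[str, Feature]]:
    """One feature dictionary per token, in order."""
    return [token_features(tokens, i) for i in range(len(tokens))]
```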
  27. Tool • CMU ARK Twitter NLP – Tokenization – Feature extraction – CRF learning • Very easy to use: simply change the labels in the training set (originally part-of-speech tags) into anything, and re-train
  28. Output examples ● RT @weatherchannel: .@NYGovCuomo orders closing of NYC bridges. Only Staten Island bridges unaffected at this time. Bridges must close by 7pm. #Sandy #NYC ● Wow what a mess #Sandy has made. Be sure to check on the elderly and homeless please! Thoughts and prayers to all affected ● RT @twc_hurricane: Wind gusts over 60 mph are being reported at Central Park and JFK airport in #NYC this hour. #Sandy ● RT @mitchellreports: Red Cross tells us grateful for Romney donation but prefer people send money or donate blood dont collect goods NOT best way to help #Sandy
  29. Extractor evaluation (recall / precision): Train 2/3 Joplin, test 1/3 Joplin: 78% / 90%; Train 2/3 Sandy, test 1/3 Sandy: 41% / 79%; Train Joplin, test Sandy: 11% / 78%; Train Joplin + 10% Sandy, test 90% Sandy: 21% / 81% • Precision counts an extraction as correct when it has one word or more in common with what humans extracted
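
The lenient matching rule behind the precision column (at least one word shared with the human extraction) is easy to state in code. The sketch below is an assumption-laden illustration: whitespace tokenization, lowercasing and the function names are not from the talk, and only precision is computed because the slide does not spell out how recall was counted.

```python
from typing import List, Optional


def is_hit(extracted: str, reference: str) -> bool:
    """True when the two phrases share at least one word (case-insensitive)."""
    return bool(set(extracted.lower().split()) & set(reference.lower().split()))


def lenient_precision(predictions: List[Optional[str]], references: List[str]) -> float:
    """Share of attempted extractions that overlap the human-extracted phrase.

    predictions[i] is the extractor output for tweet i (None if it produced
    nothing); references[i] is what a human copied from the same tweet.
    """
    attempted = [(p, r) for p, r in zip(predictions, references) if p]
    if not attempted:
        return 0.0
    hits = sum(1 for p, r in attempted if is_hit(p, r))
    return hits / len(attempted)
```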
  30. Donations matching • Identify and match requests/offers for donations – Money, clothing, food, shelter, volunteers, blood. Average precision = 0.21 (0.16 if only text similarity is used)
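
As a rough picture of the text-similarity baseline and of the metric quoted above, the sketch below ranks offers for a request by word overlap and scores a ranking with average precision. Jaccard similarity is an assumption chosen for brevity; the talk does not say which similarity measure was used, and the code does not reproduce the reported numbers.

```python
from typing import List


def jaccard(a: str, b: str) -> float:
    """Word-level Jaccard similarity between two short texts."""
    words_a, words_b = set(a.lower().split()), set(b.lower().split())
    if not words_a or not words_b:
        return 0.0
    return len(words_a & words_b) / len(words_a | words_b)


def rank_offers(request: str, offers: List[str]) -> List[str]:
    """Order candidate offers from most to least similar to the request."""
    return sorted(offers, key=lambda offer: jaccard(request, offer), reverse=True)


def average_precision(ranked_relevance: List[bool]) -> float:
    """Mean of precision@k taken at each rank k that holds a relevant item."""
    hits = 0
    total = 0.0
    for rank, relevant in enumerate(ranked_relevance, start=1):
        if relevant:
            hits += 1
            total += hits / rank
    return total / hits if hits else 0.0
```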
  31. Crowdsourced stream processing systems. Muhammad Imran, Ioanna Lykourentzou and Carlos Castillo: Engineering Crowdsourced Stream Processing Systems (Submitted for publication)
  32. (slide without transcribed text)
  33. Design objectives and principles (design objective | example metric | principle for automatic components | principle for crowdsourced components) • Low latency | End-to-end time | Keep items moving | Trivial tasks • High throughput | Output items per unit of time | High-performance processing | Task automation • Load adaptability | Rate response function | Load shedding, load queueing | Task prioritization • Cost effectiveness | Cost vs. quality, throughput, etc. | N/A | Task frugality • High quality | Application-dependent | Redundancy, aggregation and quality control
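
Load shedding, listed above as a principle for load adaptability, can be illustrated with a bounded queue that discards items once it is full. The class below is a hypothetical sketch: the drop-newest policy and all names are assumptions and do not describe how the system in the talk sheds load.

```python
from collections import deque
from typing import Deque, Optional


class SheddingQueue:
    """Bounded FIFO queue that drops new items when full (load shedding)."""

    def __init__(self, capacity: int) -> None:
        if capacity <= 0:
            raise ValueError("capacity must be positive")
        self._items: Deque[str] = deque()
        self._capacity = capacity
        self.dropped = 0  # number of items shed so far

    def put(self, item: str) -> bool:
        """Enqueue the item, or shed it and return False if the queue is full."""
        if len(self._items) >= self._capacity:
            self.dropped += 1
            return False
        self._items.append(item)
        return True

    def get(self) -> Optional[str]:
        """Dequeue the oldest item, or return None when the queue is empty."""
        return self._items.popleft() if self._items else None
```

Tracking the number of shed items makes it possible to report how much input was lost during a burst.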
  34. Design patterns ● QA loop ● Task assignment ● Process/verify ● Supervised learning ● Crowdwork sub-task chaining. Annotations on the slide: "Humans are not a bottleneck"; "Humans review every output element"
  35. http://aidr.qcri.org/
  36. Self-service for crisis-related classification (diagram components: Unstructured text reports; Report Classifier; Structured information; Model Builder; Crowdsourced active learning; Library of training data)
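
In active learning, the model chooses which items humans should label next. One common selection rule is uncertainty sampling: send the crowd the items the current classifier is least sure about. The sketch below illustrates that rule for a binary classifier; whether the system on the slide uses this particular strategy is an assumption, and the function name is hypothetical.

```python
from typing import Dict, List


def most_uncertain(probabilities: Dict[str, float], k: int) -> List[str]:
    """Return the k item ids whose predicted probability is closest to 0.5.

    probabilities maps an item id to the classifier's estimated probability
    that the item belongs to the positive class.
    """
    by_uncertainty = sorted(probabilities, key=lambda item: abs(probabilities[item] - 0.5))
    return by_uncertainty[:k]
```

Items selected this way would be labeled by volunteers and added to the training data before the classifier is retrained.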
  37. (slide without transcribed text)
  38. Preliminary results: efficiency. Maximum documented input load during a natural disaster = 270 tweets/sec.
  39. Preliminary results: effectiveness. Task: Informative vs. {Personal, Other}
  40. Free software • AIDR is free software • The official launch date is November 20th, during the Crisis Mappers conference in Nairobi, Kenya
  41. Mobile applications. Fuming Shih, Oshani Seneviratne, Daniela Miao, Ilaria Liccardi, Lalana Kagal, Evan Patton, Patrick Meier, Carlos Castillo: Democratizing Mobile App Development for Disaster Management. To be presented at the IJCAI Workshop on Semantic Cities. Beijing, China, 2013.
  42. Mobile components (AppInventor) • Components useful for DIY emergency response apps – e.g. off-line tolerant photo uploads • Aggregating/federating linked open data
  43. Helping developers query linked data
  44. Resource matching
  45. Crowdsourced verification
  46. 3
  47. Crowdsourced verification for crisis information • Veri.ly • Joint project between MASDAR and QCRI • Iyad Rahwan, Abdulfatai Popoola, Dmytro Krasnoshtan, Attila Toth (MASDAR), Victor Naroditskiy (Univ. Southampton) + QCRI
  48. Closing remarks
  49. Good projects in this space (diagram): Useful; Computationally feasible; Supported by data
  50. Good projects in this space (diagram): Useful; Computationally feasible; Supported by data. Other regions of the diagram are labeled: Poorly planned projects :-(; AI-complete problems; Temptation!; Danger!
  51. Some venues • ISCRAM – International Conference on Information Systems for Crisis Response and Management • SWDM – Workshop on Social Web for Disaster Management • SMERST – Social Media and Semantic Technologies in Emergency Response + the usual suspects, depending on your area ;-)
  52. Possibility of large impact by using computer science to support humanitarian work = Applied computing at its best
  53. Thank you! Carlos Castillo · chato@acm.org http://www.chato.cl/research/ With thanks to Patrick Meier for several slides
