Keynote talk: Big Crisis Data, an Open Invitation

Talk in Manaus, Brazil. Higher-resolution and source available upon request. Reference: http://dx.doi.org/10.1145/2820426.2822359

Published in: Government & Nonprofit

  1. BIG CRISIS DATA: An Open Invitation. CARLOS CASTILLO, @BigCrisisData. Manaus, Brazil, October 2015
  2. 2. BigCrisis Data — Carlos Castillo 2 This talk is about ... ● Disasters and time-critical situations – Natural, social, or technological hazards – Mass convergence events ● Social media – Particularly microtext ● Computing – Applications of many fields including NLP, ML, IR
  3. Video: http://www.youtube.com/watch?v=0UFsJhYBxzY
  4. An earthquake hits a Twitter user (http://xkcd.com/723/) ● When an earthquake strikes, the first tweets are posted 20-30 seconds later ● Damaging seismic waves travel at 3-5 km/s, while network signals travel through fiber or copper at a large fraction of the speed of light, plus network latency ● Beyond ~100 km, seismic waves may be overtaken by tweets about them
  5. How/when did it start for me? January 2010
  6. Humanitarian Computing. At least 775 publications: ● Crisis Analysis (55) ● Crisis Management (309) ● Situational Awareness (67) ● Social Media (231) ● Mobile Phones (74) ● Crowdsourcing (116) ● Software and Tools (97) ● Human-Computer Interaction (28) ● Natural Language Processing (33) ● Trust and Security (33) ● Geographical Analysis (53) Source: http://humanitariancomp.referata.com/
  7. Humanitarian Computing Topics
  10. Fertile grounds for applied research ✔ Problems of global significance ✔ Solved with labor-intensive methods ✔ Better solution provides a public good ✔ Large and noisy data sets available ✔ Engage volunteer communities
  11. Fertile grounds for applied research (same checklist as slide 10). Relevance to practitioners?
  12. Recent collaborators: Patrick Meier; Sarah Vieweg (QCRI); Muhammad Imran (QCRI); Irina Temnikova (QCRI); Alexandra Olteanu (EPFL); Aditi Gupta (IIIT Delhi); “P.K.” Kumaraguru (IIIT Delhi); Fernando Diaz (Microsoft)
  13. Outline: Volume, Vagueness, Visualization, Volunteering, Values
  14. Disaster Communications and Scale
  15. Crises and disasters ● Crises are unstable situations – May or may not lead to a disaster ● Disasters are social phenomena – Disruptions of routines
  16. Temporal and Spatial Dimensions
  17. Examples
  18. REEL LIFE OR REAL LIFE?
  20. Video: https://www.youtube.com/watch?v=MylI8HmgMBk
  21. In Real Life ... ● Some people panic, most people don't ● People gather information from familiar sources ● People quickly decide whether to flee, take cover, or take action ● People improvise complex rescue operations on the spot (Devon, UK, June 2014; London, UK, May 2015; San José Boquerón, Paraguay, Oct 2013)
  22. Example Disaster-Related Messages ● “OMG! The fire seems out of control: It’s running down the hills!” Bush fire near Marseilles, France, in 2009 [Longueville et al. 2009] ● “Red River at East Grand Forks is 48.70 feet, +20.7 feet of flood stage, -5.65 feet of 1997 crest. #flood09” Red River Valley floods in 2009 [Starbird et al. 2010] ● “My moms backyard in Hatteras. That dock is usually about 3 feet above water [photo]” Hurricane Sandy 2012 [Leavitt and Clark 2014] ● “Sirens going off now!! Take cover...be safe!” Moore Tornado 2013 [Blanford et al. 2014] ● “There is shooting at Utøya, my little sister is there and just called home!” 2011 attacks in Norway [Perng et al. 2013]
  23. Social media usage during disasters ● Interpersonal (horizontal) – Stay in touch with family and friends ● Citizen sensing (bottom-up) – Read/Write reports on ground situation ● Official communications (top-down) – E.g. advice, warnings, or evacuation orders
  24. Scale: Tweets per Second
  25. Requirements ● Typical users – Emergency response services – Humanitarian relief agencies – Journalists and the Public ● Underspecified requirements that vary over time ● Usually a combination of: 1) Capture the “Big Picture” 2) Obtain “Actionable Insights”
  26. Understanding, Classifying and Extracting
  27. Example: “Media must report about d alleged 20k RSS chaps off 2 #Nepal.here’s a pic coz d 1 @ShainaNC shared isn’t true.. ;)”
  28. Social media messages ● Social media is more like a transcript of a conversation than like text meant to stand on its own – Awkward entry methods: ● Fragmented language and incomplete sentences ● Many typographic and grammatical errors – Conversational: ● Little or no context (hard to comprehend in isolation) ● Code switching and borrowing ● Internet slang
  29. Slang
  30. Classification [diagram: filtered tweets sorted into categories; labels shown: Caution & Advice, Information Sources, Damage & Casualties, Donations, Gov, Eyewitness, Media, NGO, Outsider, ...]
  31. Classification Axes ● By usefulness (application-dependent!) – Not related, Related but useless, Useful ● By factual, subjective, or emotional content ● By information provided ● By information source – Government, NGOs, media, eyewitnesses, etc. ● By humanitarian clusters
  32. Humanitarian Clusters
  33. Humanitarian Clusters (cont.). Alexandra Olteanu, Sarah Vieweg and Carlos Castillo: "What to Expect When the Unexpected Happens: Social Media Communications Across Crises." To appear in CSCW 2015.
  34. A large-scale study of crisis tweets ● Collect tweets from 26 disasters ● Classify according to: – Informative / Not informative – Information provided – Information source ● Several iterations required to write the “right” instructions. Alexandra Olteanu, Sarah Vieweg and Carlos Castillo: "What to Expect When the Unexpected Happens: Social Media Communications Across Crises." In CSCW 2015, 14-18 March in Vancouver, Canada. ACM Press.
  35. Information Provided in Crisis Tweets. N=26; data available at http://crisislex.org/
  36. What do people tweet about? ● Affected individuals – 20% on average (min. 5%, max. 57%) – most prevalent in human-induced, focalized & instantaneous events ● Sympathy and emotional support – 20% on average (min. 3%, max. 52%) – most prevalent in instantaneous events ● Other useful information – 32% on average (min. 7%, max. 59%) – least prevalent in diffused events
  37. What do people tweet about? (cont.) ● Infrastructure and utilities – 7% on average (min. 0%, max. 22%) – most prevalent in diffused events, in particular floods ● Caution and advice – 10% on average (min. 0%, max. 34%) – least prevalent in instantaneous & human-induced events ● Donations and volunteering – 10% on average (min. 0%, max. 44%) – most prevalent in natural hazards
  38. Distribution over information sources
  39. Distribution over time
  40. Dataset: CrisisLexT26, www.crisislex.org
  41. Information Extraction ... Classified tweets “@JimFreund: Apparently we have no choice. There is a tornado watch in effect tonight.”
  42. Extraction ● #hashtags, @user mentions, URLs, etc. – Regular expressions – Text library from Twitter ● Temporal expressions – Part-of-speech tagger + heuristics – Natty library ● Supervised learning. Muhammad Imran, Shady Elbassuoni, Carlos Castillo, Fernando Diaz and Patrick Meier: "Practical Extraction of Disaster-Relevant Information from Social Media." Social Web and Disaster Management (SWDM) workshop. Rio de Janeiro, Brazil, 2013.
  43. Labels for extraction ● Type-dependent instruction ● Ask evaluators to copy-paste a word/phrase from each tweet
  44. Learning: Conditional Random Fields ● Extends HMM to incorporate more possible dependencies ● Used extensively in NLP for part-of-speech tagging and information extraction [figure: HMM vs. linear-chain CRF, with hidden and observed nodes]
  45. Tool ● CMU ARK Twitter NLP – Tokenization – Feature extraction – CRF learning ● Very easy to use – Simply change the training set (part-of-speech tags) – Then re-train
  46. Output examples ● “RT @weatherchannel: .@NYGovCuomo orders closing of NYC bridges. Only Staten Island bridges unaffected at this time. Bridges must close by 7pm. #Sandy #NYC” ● “Wow what a mess #Sandy has made. Be sure to check on the elderly and homeless please! Thoughts and prayers to all affected” ● “RT @twc_hurricane: Wind gusts over 60 mph are being reported at Central Park and JFK airport in #NYC this hour. #Sandy” ● “RT @mitchellreports: Red Cross tells us grateful for Romney donation but prefer people send money or donate blood dont collect goods NOT best way to help #Sandy”
  47. Extractor evaluation (recall / precision) ● Train 2/3 Joplin, test 1/3 Joplin: 78% / 90% ● Train 2/3 Sandy, test 1/3 Sandy: 41% / 79% ● Train Joplin, test Sandy: 11% / 78% ● Train Joplin + 10% Sandy, test 90% Sandy: 21% / 81% ● For precision, an extraction counts as correct if it has one or more words in common with what humans extracted
  48. Donations matching ● Identify and match requests/offers for donations – Money, clothing, food, shelter, volunteers, blood ● Method – Classify – Determine key aspects – Extract key aspects – Per-aspect matching. Hemant Purohit, Amit Sheth, Carlos Castillo, Patrick Meier and Fernando Diaz: "Emergency-Relief Coordination on Social Media: Automatically Matching Resource Requests and Offers." First Monday 19 (1), January 2014.
  49. Donations matching: average precision = 0.21 (0.16 if only text similarity is used)
  50. Crisis maps from social media
  53. Patrick Meier, Social Innovation Director @ QCRI – http://irevolution.net/ “What can speed humanitarian response to tsunami-ravaged coasts? Expose human rights atrocities? Launch helicopters to rescue earthquake victims? Outwit corrupt regimes? A map.”
  54. Crisis mapping goes mainstream (2011)
  60. Automatic Mapping (floods) ● Top: hydrological data ● Bottom: tweet density ● Broad match with affected areas ● Many biases towards places with higher density of smartphones. João Porto de Albuquerque, Benjamin Herfort, Alexander Brenning and Alexander Zipf: "A geographic approach for combining social media and authoritative data towards identifying useful information for disaster management." International Journal of Geographical Information Science, 29(4), 667–689, 2015.
  61. Automatic Mapping (Dengue) ● Top: official reports ● Bottom: tweets. Janaina Gomide, Adriano Veloso, Wagner Meira, Virgilio Almeida, Fabricio Benevenuto, Fernanda Ferraz and Mauro Teixeira: "Dengue surveillance based on a computational model of spatio-temporal locality of Twitter." In Proceedings of the ACM WebSci'11, June 14-17 2011, Koblenz, Germany, pp. 1-8.
  62. Current Approach: hybrid real-time systems ● MicroMappers ● Manual processing: crowdsourcing ● Automatic processing: machine learning
  63. http://newsbeatsocial.com/watch/0_s6xxcr3p
  66. Video: https://www.youtube.com/watch?v=uKgE3yWJ0_I
  67. Volunteering and Values
  68. Volunteering is a constant ● Integral part of how communities react to disasters ● Organizational types: – Existing – Extending – Expanding – Emerging ● Emergent organizations are a mixed blessing for existing ones ● New scenario: digital volunteering – E.g. volunteer annotations, including crisis mapping
  69. Why do people volunteer? Altruism is key, but it's one of many reasons
  70. Privacy and Ethics ● Protect the privacy of individuals – ICRC Data Protection Guidelines – UN Guidelines on Cyber Security ● Protect victims and responders during armed attacks ● Protect volunteers from distal exposure ● Protect citizen reporters from danger and retaliation ● Give back and share results and data
  71. “I'm dying, they are tweeting”: Digital Voyeurism
  72. CONCLUSIONS
  73. [diagram with labels: Computationally feasible; Supported by data; Useful; Good projects in this space]
  74. [same diagram with added labels: Temptation!; Danger!; Poorly planned projects :-(; AI-complete problems]
  75. Interdisciplinary Research ● Like many things, it has Good, Bad, and Ugly aspects ● Good – You learn a lot, and it's the only way of supporting claims of practical utility in applied research ● Bad – Formal response organizations can be very difficult to engage with; relationships should be established between operations ● Ugly – Working software and 24/7 support for a critical need now vs. advanced proof-of-concept later
  76. Possibility of large impact by using computer science to support humanitarian work = Applied computing at its best
  77. References ● Carlos Castillo: "Big Crisis Data." Cambridge University Press, 2016 (forthcoming). ● Muhammad Imran, Carlos Castillo, Fernando Diaz and Sarah Vieweg: "Processing Social Media Messages in Mass Emergency: A Survey." In ACM Computing Surveys, Volume 47, Issue 4, June 2015. ● Alexandra Olteanu, Sarah Vieweg and Carlos Castillo: "What to Expect When the Unexpected Happens: Social Media Communications Across Crises." In CSCW 2015, 14-18 March in Vancouver, Canada. ACM Press. ● Muhammad Imran, Ioanna Lykourentzou, Yannick Naudet and Carlos Castillo: "Engineering Crowdsourced Stream Processing Systems." Technical report, 2015. ● Hemant Purohit, Amit Sheth, Carlos Castillo, Patrick Meier and Fernando Diaz: "Emergency-Relief Coordination on Social Media: Automatically Matching Resource Requests and Offers." First Monday 19 (1), January 2014. ● Sarah Vieweg, Carlos Castillo and Muhammad Imran: "Integrating Social Media Communications into the Rapid Assessment of Sudden Onset Disasters." SocInfo 2014. ● Alexandra Olteanu, Carlos Castillo, Fernando Diaz and Sarah Vieweg: "CrisisLex: A Lexicon for Collecting and Filtering Microblogged Communications in Crises." In ICWSM. Ann Arbor, MI, USA. June 2014. ● Carlos Castillo, Marcelo Mendoza and Barbara Poblete: "Predicting Information Credibility in Time-Sensitive Social Media (+Supplementary Material)." In Internet Research, Vol. 23, Issue 5, Special issue on The Predictive Power of Social Media, pp. 560-588. October 2013. ● Muhammad Imran, Shady Elbassuoni, Carlos Castillo, Fernando Diaz and Patrick Meier: "Practical Extraction of Disaster-Relevant Information from Social Media." Social Web and Disaster Management (SWDM) workshop. Rio de Janeiro, Brazil, 2013. ● Muhammad Imran, Shady Elbassuoni, Carlos Castillo, Fernando Diaz and Patrick Meier: "Extracting Information Nuggets from Disaster-Related Messages in Social Media." In ISCRAM. Baden-Baden, Germany, 2013. Best paper award.
  78. Thank you! Follow @BigCrisisData
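
Note on slide 4: the claim that tweets can overtake seismic waves at roughly 100 km can be checked with simple arithmetic. The sketch below is only an illustration, not something from the talk. It assumes the tweet is readable as soon as it is posted, so network latency defaults to zero, and the function names are made up for this example.

```python
def crossover_distance_km(wave_speed_km_s, tweet_delay_s, network_latency_s=0.0):
    """Distance beyond which a tweet is readable before the shaking arrives.

    The wave needs distance / speed seconds to arrive, while the tweet needs
    tweet_delay_s + network_latency_s seconds regardless of distance
    (assumption: latency does not grow noticeably with distance).
    """
    return wave_speed_km_s * (tweet_delay_s + network_latency_s)


def tweet_arrives_first(distance_km, wave_speed_km_s, tweet_delay_s,
                        network_latency_s=0.0):
    """True if the tweet is readable before the seismic wave reaches the reader."""
    wave_arrival_s = distance_km / wave_speed_km_s
    return tweet_delay_s + network_latency_s < wave_arrival_s


if __name__ == "__main__":
    # Ranges quoted on the slide: 3-5 km/s wave speed, 20-30 s until first tweets.
    for speed in (3.0, 5.0):
        for delay in (20.0, 30.0):
            distance = crossover_distance_km(speed, delay)
            print(f"speed={speed} km/s, delay={delay} s: crossover at {distance} km")
```

With the slide's ranges the crossover distance falls between 60 km and 150 km, which is consistent with the "~100 km" figure.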

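Note on slide 47: the slide counts an extraction as correct when it shares one or more words with what humans extracted. A minimal sketch of that lenient matching follows. The slide defines only the precision criterion, so the recall definition, the whitespace tokenization, and all names here are assumptions made for illustration.

```python
def tokens(text):
    """Lowercased whitespace tokens (assumption: the talk's tokenizer may differ)."""
    return set(text.lower().split())


def lenient_match(predicted, gold):
    """True if the two phrases have one or more words in common."""
    return bool(tokens(predicted) & tokens(gold))


def lenient_precision_recall(pairs):
    """Compute lenient precision and recall.

    pairs is a list of (predicted, gold) tuples, one per tweet; either element
    may be None when the extractor or the human annotator extracted nothing.
    """
    n_predicted = sum(1 for pred, _ in pairs if pred is not None)
    n_gold = sum(1 for _, gold in pairs if gold is not None)
    hits = sum(
        1
        for pred, gold in pairs
        if pred is not None and gold is not None and lenient_match(pred, gold)
    )
    precision = hits / n_predicted if n_predicted else 0.0
    recall = hits / n_gold if n_gold else 0.0
    return precision, recall


if __name__ == "__main__":
    example = [
        ("tornado watch", "a tornado watch"),  # overlap: counted as a hit
        ("7pm", None),                         # extractor fired, humans did not
        (None, "60 mph"),                      # humans extracted, extractor missed
        ("bridges", "NYC bridges"),            # overlap: counted as a hit
    ]
    print(lenient_precision_recall(example))
```

Because a single shared word is enough, this criterion is generous; an exact-match criterion would give lower figures.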
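
Note on slide 42: hashtags, user mentions, and URLs are said to be extracted with regular expressions or with Twitter's text library. The sketch below uses deliberately simplified patterns written for this note. They are not the patterns from Twitter's library and will miss edge cases such as non-ASCII hashtags or e-mail addresses being read as mentions.

```python
import re

# Simplified patterns (assumptions for illustration only).
URL_RE = re.compile(r"https?://\S+")
HASHTAG_RE = re.compile(r"#(\w+)")
MENTION_RE = re.compile(r"@(\w+)")


def extract_entities(tweet):
    """Return the hashtags, mentions and URLs found in a tweet."""
    urls = URL_RE.findall(tweet)
    # Blank out URLs first so that a '#' or '@' inside a URL is not
    # mistaken for a hashtag or a mention.
    text = URL_RE.sub(" ", tweet)
    return {
        "hashtags": HASHTAG_RE.findall(text),
        "mentions": MENTION_RE.findall(text),
        "urls": urls,
    }


if __name__ == "__main__":
    example = ("RT @twc_hurricane: Wind gusts over 60 mph are being reported "
               "at Central Park and JFK airport in #NYC this hour. #Sandy")
    print(extract_entities(example))
```

For production use, a maintained entity-extraction library is likely a safer choice than hand-written patterns.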