
The Role of Social Media and Artificial Intelligence for Disaster Response


Keynote slides for ISCRAM 2016.

"Social Media platforms such as Twitter are invaluable sources of time-critical information. Information communicated on social media during emergencies conveys timely and actionable information. For rapid crisis response, real-time insights are important for emergency responders. Although many humanitarian organizations would like to use this information, they struggle due to a number of issues such as information overload, vague information, low credibility, and misinformation. In this talk, I will describe the role of social media and the artificial intelligence techniques that humanitarian organizations and decision makers can use to make sense of social media data for rapid crisis response."

Published in: Technology

  1. 1. The Role of Social Media and Artificial Intelligence for Disaster Response Muhammad Imran Qatar Computing Research Institute Hamad Bin Khalifa University Doha, Qatar May 25, 2016 http://mimran.me/ From traditional to emerging tools for crisis response
  2. 2. This Talk is About… •  The Role of Information in Time-critical Situations –  Natural disasters and their destructions –  Man-made disasters and mass convergence events •  The Role of Social Media for Disaster Response –  Particular focus on micro-blogging platforms –  Availability of various types of information and opportunities •  The Role of Artificial Intelligence for Disaster Response –  How AI is useful in disaster response –  Various AI techniques, approaches, and tools –  Work of crisis computing group at QCRI –  Ongoing research –  Future directions
  3. 3. Source: UNISDR Most Affected Countries by Natural Disasters (1995-2015)
  4. 4. Humans Suffering and Economic Damage by Disasters Humans suffering from the impacts of disasters, crises, and armed conflict Millions of people affected each year by disasters; At an annual cost to the global economy that exceeds $300 billion
  5. 5. Plan and Prepare Humans suffering from the impacts of disasters, crises, and armed conflict Millions of people affected each year by disasters; At an annual cost to the global economy that exceeds $300 billion Disasters are unavoidable but planning can lessen their effects
  6. 6. Plan and Prepare Humans suffering from the impacts of disasters, crises, and armed conflict Millions of people affected each year by disasters; At an annual cost to the global economy that exceeds $300 billion Provide helping hand…
  7. 7. -  Between 2008 and 2014, 184 million people displaced by natural disasters -  Over 60 million due to conflicts -  An average 26.4 million each year The Urgency to Act and Plan
  8. 8. Information: A Lifeline During Disasters The opaqueness induced by disasters is overwhelming People need information as much as water, food, medicine or shelter Lack of information can make people victims of disaster and targets of aid
  9. 9. Data or Dialogue? The Role of Information in Disasters Iain Logan, former head of disaster operations, International Federation: “The very first thing you need to do is climb into a helicopter. You can’t get to see how many people are buried but you get an eagle’s eye view… You can see which airfields are working, which bridges are down. After three to four hours in a helicopter, I had a complete overview of the geographical extent of the disaster (earthquake in El Salvador, 2001), the logistics involved, the population centers – also where not to send people. Then you must get on the ground to get the quality.” - Information Bestows Power -
  10. 10. The Role of Social Media
  11. 11. Social Media Use During Christchurch Earthquake A self-organized workforce of 10,000 volunteers gathered on Facebook
  12. 12. The Role of Twitter During Thailand Floods [Alisa Kongthon et al. 2011]
  13. 13. Twitter Breaks Events Faster First report Breaks the story 33 minutes before local TV Hudson Plane Crash Westgate Mall Attack
  14. 14. Twitter Breaks Events Faster First report on Twitter After 1 minute After 2 minutes Boston Bombing
  15. 15. Crisis Communications Before and Now Gerald Baron
  16. 16. Analysis of Twitter Crisis-Related Data An In-depth Study
  17. 17. Twitter Crisis-Related Data (2012) Source: Qatar Computing Research Institute - Published in World Humanitarian Data and Trends 2014 (UN OCHA) Twitter data from 13 crises; Analyzed over 100,000 tweets; Information types and sources
  18. 18. Twitter Crisis-Related Data (2013) Source: Qatar Computing Research Institute - Published in World Humanitarian Data and Trends 2014 (UN OCHA) Twitter data from 13 crises; Analyzed over 100,000 tweets; Information types and sources
  19. 19. Twitter Crisis-Related Data (All) -  Twitter data from 13 recent crises -  Over 100,000 tweets -  Information types -  Types of sources Source: Qatar Computing Research Institute - Published in World Humanitarian Data and Trends 2014 (UN OCHA)
  20. 20. UN OCHA Humanitarian Clusters-Related Information on Twitter
  21. 21. Typhoon Yolanda – OCHA Clusters -  Performed analysis of more than 440,000 tweets during the first 48 hours -  15% of the tweets found potentially relevant Source: Qatar Computing Research Institute - Published in World Humanitarian Data and Trends 2014 (UN OCHA)
  22. 22. Typhoon Yolanda – OCHA Clusters Source: Qatar Computing Research Institute - Published in World Humanitarian Data and Trends 2014 (UN OCHA) -  Performed analysis of more than 440,000 tweets during the first 48 hours -  15% of the tweets found potentially relevant
  23. 23. Sandy Hurricane Twitter Data Analysis @NYGovCuomo orders closing of NYC bridges. Only Staten Island bridges unaffected at this time. Bridges must close by 7pm. #Sandy #NYC. rt @911buff: public help needed: 2 boys 2 & 4 missing nearly 24 hours after they got separated from their mom when car submerged in si. #sandy #911buff freaking out. home alone. will just watch tv #Sandy #NYC. 400 Volunteers are needed for areas that #Sandy destroyed.
  24. 24. @NYGovCuomo orders closing of NYC bridges. Only Staten Island bridges unaffected at this time. Bridges must close by 7pm. #Sandy #NYC. rt @911buff: public help needed: 2 boys 2 & 4 missing nearly 24 hours after they got separated from their mom when car submerged in si. #sandy #911buff freaking out. home alone. will just watch tv #Sandy #NYC. 400 Volunteers are needed for areas that #Sandy destroyed. Personal Informative Sandy Hurricane Twitter Data Analysis
  25. 25. @NYGovCuomo orders closing of NYC bridges. Only Staten Island bridges unaffected at this time. Bridges must close by 7pm. #Sandy #NYC. rt @911buff: public help needed: 2 boys 2 & 4 missing nearly 24 hours after they got separated from their mom when car submerged in si. #sandy #911buff freaking out. home alone. will just watch tv #Sandy #NYC. 400 Volunteers are needed for areas that #Sandy destroyed. Personal Informative Caution and Advice Casualties and Damage Donations Sandy Hurricane Twitter Data Analysis
  26. 26. @NYGovCuomo orders closing of NYC bridges. Only Staten Island bridges unaffected at this time. Bridges must close by 7pm. #Sandy #NYC. rt @911buff: public help needed: 2 boys 2 & 4 missing nearly 24 hours after they got separated from their mom when car submerged in si. #sandy #911buff freaking out. home alone. will just watch tv #Sandy #NYC. 400 Volunteers are needed for areas that #Sandy destroyed. Personal Informative Caution and Advice Casualties and Damage Donations Sandy Hurricane Twitter Data Analysis
  27. 27. MERS Outbreak Twitter Data Analysis Middle East Respiratory Syndrome (MERS) Twitter data collection from: 2014-04-27 to 2014-07-14 using hashtag #MERS (Total = 215,370) Data analysis: Reports of symptoms Affected people reports Death reports Disease transmission reports Prevention questions Treatment questions Reports of signs or symptoms such as fever, cough or questions Reports of affected people due to the MERS disease Reports of deaths due to the MERS disease Questions or suggestions related to the prevention of disease Reports or questions related to the transmission of the disease Questions or suggestions regarding the treatment of the disease
  28. 28. Social Media During MERS Outbreak RT @abeoel: Two workers at FL hospital exposed to a patient with Middle East Respiratory Syndrome are showing flu-like symptoms Coronavirus symptoms include: fever, coughing, shortness of breath, congestion in the nose and throat, and in some cases diarrhea. MERS #MERS is a relatively new respiratory illness, spread b/w people in close contact. Symptoms are fever, cough, & shortness of breath. Saudi Arabia finds another 32 MERS cases as disease spreads: RIYADH (Reuters) - Saudi Arabia said on Thursday ... http://t.co/cPhm0uTRCo Signs and symptoms Signs and symptoms Signs and symptoms Affected individuals
  29. 29. Social Media During MERS Outbreak First Case of Deadly Middle Eastern Virus Found in U.S.: The Centers for Disease Control has confirmed that a case of the deadly Midd... Third Case of MERS Confirmed in the U.S.: The U.S. Centers for Disease Control and Prevention confirmed on Sat... http://t.co/Sb8PMyxVUn No clear transmission link btwn camels and humans for MERS. 94% Egyptian camels seropositive but no human cases yet. Hmm #asm2014 Saudi health authorities announced on Monday that the death toll from the MERS coronavirus has reached 115 since the respiratory disease ... Transmission Death reports Affected individuals Affected individuals
  30. 30. ISCRAM Call for Papers
  31. 31. Aid is Out There! Aider is Out There! AIDR is Also Out There!
  32. 32. The Role of Artificial Intelligence
  33. 33. 2013 Pakistan Earthquake September 28 at 07:34 UTC 2010 Haiti Earthquake January 12 at 21:53 UTC Data and Opportunities Social Media Platforms Availability of Immense Data: Around 16 thousand tweets per minute were posted during Hurricane Sandy in the US. Opportunities: -  Early warning and event detection -  Situational awareness -  Actionable information -  Rapid crisis response -  Post-disaster analysis Disease outbreaks
  34. 34. Processing Social Media Data Filter: removing duplicates, spam, and messages from bots Classify: categorization of items into information types Cluster: identify trending and emerging topics Aggregate: making sense by connecting different pieces Extract: short snippets of focused information Summarize: learning a bigger picture of an event
  35. 35. WAIT! Before applying any technique? Please! Have a look at your data first
  36. 36. Data Characteristics and Preparation •  Single-word slangs: pls (please), srsly (seriously) •  Multi-word slangs: imo (in my opinion) •  Misspellings: missin (missing), ovrcme (overcome) •  Phonetic substitution: 2morrow (tomorrow) •  Word without spaces: prayfornepal (pray for nepal) Can you guess? “r u ok m8” ?? >> “Are you OK, mate?”
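A minimal sketch of the dictionary-based normalization that the slide above motivates. The lookup table contains only the slide's own examples; the names `SLANG` and `normalize` are hypothetical, and a real pipeline would likely need a much larger out-of-vocabulary dictionary and proper hashtag segmentation.

```python
import re

# Lookup built only from the examples on the slide (an assumption for
# illustration, not a real out-of-vocabulary dictionary).
SLANG = {
    "pls": "please", "srsly": "seriously", "imo": "in my opinion",
    "missin": "missing", "ovrcme": "overcome", "2morrow": "tomorrow",
    "r": "are", "u": "you", "m8": "mate",
    "prayfornepal": "pray for nepal",
}

# Lowercase word tokens; punctuation and the '#' of hashtags are dropped.
TOKEN = re.compile(r"[a-z0-9']+")

def normalize(text):
    """Replace known slang, misspellings, and run-together words."""
    tokens = TOKEN.findall(text.lower())
    return " ".join(SLANG.get(token, token) for token in tokens)

print(normalize("r u ok m8"))
print(normalize("pls help, 2 boys missin #prayfornepal"))
```

Tokens that are not in the table pass through unchanged, so the function is safe to run on every message before feature extraction.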
  37. 37. Tools to Process Social Media Data
  38. 38. Systems for Crisis-Relevant Data Processing Twitris [Purohit and Sheth 2013] Twitter; semantic enrichment, classify automatically, geotag SensePlace2 [MacEachren et al. 2011] Twitter; geotag, visualize heat-maps based on geotags EMERSE Enhanced Messaging for the Emergency Response Sector [Caragea et al. 2011] Twitter and SMS; machine-translate, classify automatically, alerts ESA Emergency Situation Awareness [Yin et al. 2012; Power et al. 2014] Twitter; detect bursts, classify, cluster, geotag
  39. 39. Systems for Crisis-Relevant Data Processing Twitcident [Abel et al. 2012] Twitter and TwitPic; semantic enrichment, classify CrisisTracker [Rogstadius et al. 2013] Twitter; cluster, annotate manually Tweedr [Ashktorab et al. 2014] Twitter; classify automatically, extract information, geotag AIDR: Artificial Intelligence for Disaster Response [Imran et al. 2014a] Twitter; annotate manually, classify automatically
  40. 40. Artificial Intelligence for Disaster Response
  41. 41. Information Processing Data collection 1 2 Human annotations on sample data Machine training 3 Classification 4 Disaster Timeline: DATA COLLECTION Humans alone cannot process large amounts of data, so we only use them to help process a subset We train machine using human input to automatically process large Data at high speed For example using Keywords, hashtags etc.
  42. 42. Impact and Response Timeline Department of Community Safety, Queensland Govt. & UNOCHA, 2011 Disaster response (today) Disaster response (our target) Requires real-time processing of data
  43. 43. Data collection 1 2 Human annotations Machine training 3 Classification 4 ONLINE APPROACH DATA COLLECTION H A Learning-1 CLASSIFICATION OF DATA & DECISION MAKING PROCESS Learning-2 Learning-3 … Learning-n Human annotation - 1 Human annotation - 2 Human annotation - 3 … Human annotation - n First few hours Information Processing (Real-time)
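One way to picture the online approach on the slide above: the model is updated each time a new batch of human annotations arrives (Learning-1, Learning-2, …), rather than being trained once after the event. The sketch below uses a bag-of-words perceptron purely as a stand-in. The slides do not say which learner is used, and `OnlinePerceptron` and its methods are hypothetical names.

```python
class OnlinePerceptron:
    """Binary classifier (1 = relevant, 0 = not relevant) updated batch by batch."""

    def __init__(self):
        self.weights = {}  # token -> integer weight
        self.bias = 0

    def _tokens(self, text):
        return set(text.lower().split())

    def predict(self, text):
        score = self.bias + sum(self.weights.get(t, 0) for t in self._tokens(text))
        return 1 if score > 0 else 0

    def update(self, batch):
        """Apply one round of human annotations: a list of (text, label) pairs."""
        for text, label in batch:
            error = label - self.predict(text)  # -1, 0, or +1
            if error:
                for t in self._tokens(text):
                    self.weights[t] = self.weights.get(t, 0) + error
                self.bias += error


model = OnlinePerceptron()
# Human annotation - 1
model.update([("please donate blood", 1), ("watching tv at home", 0)])
before = model.predict("volunteers needed urgently")
# Human annotation - 2
model.update([("volunteers needed", 1), ("so bored today", 0)])
after = model.predict("volunteers needed urgently")
print(before, after)
```

Classification continues between annotation rounds, so predictions are available from the first batch onward and improve as more labels arrive.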
  44. 44. Big Challenges – 4Vs •  Volume Scale of data (20m tweets in 5 days Typhoon Oklahoma) •  Velocity Analysis of streaming data (16k/min during Sandy) •  Variety Different forms/types of data (information types) •  Veracity Uncertainty of data
  45. 45. Machine Learning + Crowdsourcing http://aidr.qcri.org/ AIDR = Machine learning + Crowdsourcing
  46. 46. Crowdsourced Stream Processing Combining human and machine computation [Figure panels: a: Split automatic/manual processing; b: Detect-verify paradigm; c: Improving quality through active learning. Panel annotations: difficult, ambiguous items to be labeled by crowd; performing verification; providing training data] Quality assurance loops: human processing elements do the work, automatic processing elements check for consistency Process-verify: work is done automatically, humans check low-confidence or borderline cases Online supervised learning: humans train machines to perform work automatically
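The process-verify idea on the slide above (work is done automatically, humans check low-confidence or borderline cases) can be sketched as a simple routing step. This is an illustration under stated assumptions: the function name `route`, the tuple layout, and the 0.8 threshold are all hypothetical and not taken from AIDR.

```python
def route(predictions, threshold=0.8):
    """Split classifier output into auto-accepted items and items for the crowd.

    `predictions` is a list of (text, label, confidence) tuples, with
    confidence in [0, 1]. The threshold is an arbitrary illustrative value.
    """
    auto_accepted, for_crowd = [], []
    for text, label, confidence in predictions:
        if confidence >= threshold:
            auto_accepted.append((text, label))
        else:
            for_crowd.append(text)  # a human verifies or relabels this item
    return auto_accepted, for_crowd


auto, crowd = route([
    ("Bridges must close by 7pm", "caution_and_advice", 0.93),
    ("freaking out. home alone.", "personal", 0.55),
])
print(auto)
print(crowd)
```

Labels returned by the crowd for the second list can be fed back as training data, which is how the routing step connects to online supervised learning.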
  47. 47. http://aidr.qcri.org/ AIDR —Artificial Intelligence for Disaster Response— is a free, open-source, and easy-to-use platform to automatically filter and classify relevant tweets posted during humanitarian crises. 1 2 3 Collect Curate Classify Awarded the Grand Prize in the Open Source Software World Challenge 2015
  48. 48. AIDR: From End-users Perspective Collection Classifier(s) •  Keywords, hashtags •  Geographical bounding box •  Languages •  Follow specific set of users A collection is a set of filters A classifier is a set of tags •  Donations requests & offers •  Damage & casualties •  Eyewitness accounts •  … 2-step approach 1 2 http://aidr.qcri.org/
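To make "a collection is a set of filters" concrete, here is a hedged sketch of how such a filter set could be represented and applied. It is not AIDR's actual data model: the `Collection` class, the tweet dictionary fields (`text`, `lang`, `user`, `coordinates`), and the choice to require every configured filter to match are all assumptions for illustration.

```python
from dataclasses import dataclass, field
from typing import Optional, Tuple


@dataclass
class Collection:
    keywords: set = field(default_factory=set)    # keywords and hashtags
    languages: set = field(default_factory=set)   # e.g. {"en"}
    users: set = field(default_factory=set)       # accounts to follow
    # (min_lon, min_lat, max_lon, max_lat); None means no geographic filter
    bounding_box: Optional[Tuple[float, float, float, float]] = None

    def matches(self, tweet):
        """Return True if the tweet passes every filter that is configured."""
        text = tweet.get("text", "").lower()
        if self.keywords and not any(k.lower() in text for k in self.keywords):
            return False
        if self.languages and tweet.get("lang") not in self.languages:
            return False
        if self.users and tweet.get("user") not in self.users:
            return False
        if self.bounding_box is not None:
            coords = tweet.get("coordinates")
            if coords is None:
                return False
            lon, lat = coords
            min_lon, min_lat, max_lon, max_lat = self.bounding_box
            if not (min_lon <= lon <= max_lon and min_lat <= lat <= max_lat):
                return False
        return True


sandy = Collection(keywords={"#sandy"}, languages={"en"})
print(sandy.matches({"text": "Bridges must close by 7pm #Sandy", "lang": "en"}))
```

A classifier, by contrast, would be attached to a collection afterwards as a named set of tags; only items that pass the collection's filters would ever reach it.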
  49. 49. Real-time Classification in AIDR http://aidr.qcri.org/ Trainer
  50. 50. AIDR – Collection Setting Collection detail dashboard http://aidr.qcri.org/ Geographical region filter Language filter Collection definition
  51. 51. http://aidr.qcri.org/ AIDR – Classifiers Setting
  52. 52. AIDR – Classifier Setting (cont.) http://aidr.qcri.org/
  53. 53. Human Annotation in AIDR Internal Tagging Interface http://aidr.qcri.org/
  54. 54. Human Annotation Using MicroMappers MicroMapper Interface (web clicker) http://aidr.qcri.org/ Mobile clicker
  55. 55. Tagged Items and Machine Output http://aidr.qcri.org/ Training examples Classifiers’ output
  56. 56. High-level Architecture http://aidr.qcri.org/ Items Collector Feature Extractor Classifier(s) Learner Crowdsourcing Task Generator Stream of incoming items from data sources Item & features Item An expert defines classifiers by giving a name and description for each category Expert Items Crowd workers/volunteers Model parameter Classified Item A list of classified items by category and classifier’s confidence Labeling tasks Labeled item Data source Data source
  57. 57. Quality, Cost, and Performance of AIDR
  58. 58. Quality vs. Cost in AIDR http://aidr.qcri.org/ Goal: Maximizing quality while minimizing cost •  Quality •  classification accuracy •  Precision •  Cost (human labels) •  monetary in case of paid-workers •  time in case of volunteers
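For reference, the two quality measures named on the slide above can be computed as follows. This uses the standard definitions (accuracy = correct predictions / all predictions; precision = true positives / predicted positives); the function names and the toy labels are illustrative only.

```python
def accuracy(y_true, y_pred):
    """Fraction of items whose predicted label equals the human label."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def precision(y_true, y_pred, positive=1):
    """Fraction of items predicted `positive` that are truly `positive`."""
    truths_of_predicted = [t for t, p in zip(y_true, y_pred) if p == positive]
    if not truths_of_predicted:
        return 0.0  # convention chosen here when nothing is predicted positive
    return sum(t == positive for t in truths_of_predicted) / len(truths_of_predicted)


human_labels = [1, 0, 1, 1, 0]
machine_labels = [1, 1, 1, 0, 0]
print(accuracy(human_labels, machine_labels))
print(precision(human_labels, machine_labels))
```

Cost, in this framing, is simply the number of human labels that had to be collected to reach a given quality level.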
  59. 59. Quality vs. Cost in AIDR http://aidr.qcri.org/ Quality vs. cost using passive learning and de-duplication Quality vs. cost using active learning and de-duplication
  60. 60. Performance http://aidr.qcri.org/ In terms of throughput and latency Latency of feature extractor, classifier, and the system Throughput of feature extractor, classifier, and the system
  61. 61. Typhoon HAGUPIT (2014)
  62. 62. UNICEF U-Report and AIDR-SMS AI to Answer Health Queries via SMS •  Every hour Zambian youth get infected with HIV/AIDS •  UNICEF launched U-Report project in Zambia •  Usage of U-Report platform has recently increased 300%
  63. 63. UNICEF U-Report in Zambia Manual processing and routing of SMS Counselors (experts of HIV, STIs) SMS service 1 2 3 4 5 6 Vulnerable people
  64. 64. UNICEF U-Report in Zambia + AIDR Manual processing and routing of SMS Counselors (experts of HIV, STIs) SMS service 1 2 3 4 5 6 Vulnerable people
  65. 65. New Scientist Featured This Work
  66. 66. Media Coverage
  67. 67. Human Annotation Selection and scheduling for supervised classification system Ongoing Work
  68. 68. Human Annotation - Challenges 1- Labeling task selection •  Which tasks to pick for labeling? •  No duplicate tasks should be labeled •  Prioritize tasks that are likely to increase classifier’s accuracy Crowdsourcing is a big research topic. We address two challenges here:
  69. 69. Twitter Crises Datasets 1.  Joplin-2011 •  Consists of 206,764 tweets collected using (#joplin) 2.  Sandy-2012 •  Consists of 4,906,521 tweets collected using (#sandy, hurricane sandy, …) 3.  Oklahoma-2013 •  Consists of 2,742,588 tweets collected using (Oklahoma, tornado, …)
  70. 70. Distribution of Tweets into Phases Pre: preparedness phase Impact: phase corresponds to the period in which the main effects are felt Post: corresponds to response and recovery phase Joplin (left), Sandy (center), and Oklahoma (right). Number of tweets per day in all datasets.
  71. 71. Labeling Task Selection Experiment: Is de-duplication necessary?
        Train phase    Train size    Test phase     Test size    AUC (without de-duplication)    AUC (with de-duplication)
        S1 (pre)       1,500         S1 (pre)       500          0.78                            0.74
        S1 (pre)       500           S1 (pre)       500          0.73                            0.72
        S2 (impact)    500           S2 (impact)    500          0.80                            0.72
        S3 (post)      500           S3 (post)      500          0.79                            0.73
        S4 (post’)     500           S4 (post’)     500          0.70                            0.64
      •  29-74% of tweets are re-tweets & 60-75% are near duplicates •  Duplication causes an artificial increase in accuracy •  De-duplication is necessary to reduce classifier bias; otherwise the classifier learns fewer concepts •  De-duplication is also necessary to improve workers' experience [Rogstadius et al. 2011]
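A sketch of one common way to drop re-tweets and near duplicates before labeling: strip the "RT @user:" prefix and discard any tweet whose token set is too similar (Jaccard similarity) to one already kept. The slides do not state which de-duplication method was used, so the technique, the 0.8 threshold, and all function names here are assumptions.

```python
import re

# Matches a leading "RT @user:" or "rt @user" prefix.
RT_PREFIX = re.compile(r"^rt\s+@\w+:?\s*", re.IGNORECASE)


def token_set(text):
    return set(RT_PREFIX.sub("", text).lower().split())


def jaccard(a, b):
    """Size of the intersection divided by size of the union of two sets."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def deduplicate(tweets, threshold=0.8):
    """Keep the first tweet of each group of near duplicates."""
    kept, kept_tokens = [], []
    for tweet in tweets:
        tokens = token_set(tweet)
        if all(jaccard(tokens, seen) < threshold for seen in kept_tokens):
            kept.append(tweet)
            kept_tokens.append(tokens)
    return kept


sample = [
    "400 Volunteers are needed for areas that #Sandy destroyed.",
    "RT @someone: 400 Volunteers are needed for areas that #Sandy destroyed.",
    "freaking out. home alone. will just watch tv #Sandy #NYC.",
]
print(deduplicate(sample))
```

The pairwise comparison is quadratic in the number of kept tweets, which is fine for a labeling queue but would need an approximate method for a full stream.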
  72. 72. Labeling Task Selection Experiment: Passive learning vs. Active learning JOPLIN SANDY OKLAHOMA S1 S2 S3 S4 AUC stabilizes with fewer training items using active learning
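A sketch of the contrast between the two settings. Passive learning labels items in arrival order, while active learning asks humans to label the items the current classifier is least sure about. The uncertainty-sampling rule below (probability closest to 0.5) is one standard active-learning strategy; the slides do not say which strategy the experiment used, and the function name is hypothetical.

```python
def select_for_labeling(scored_items, k):
    """Pick the k items whose predicted probability is closest to 0.5.

    `scored_items` is a list of (text, probability_of_positive_class) pairs
    produced by the current classifier.
    """
    by_uncertainty = sorted(scored_items, key=lambda item: abs(item[1] - 0.5))
    return [text for text, _ in by_uncertainty[:k]]


scored = [
    ("Bridges must close by 7pm", 0.95),
    ("public help needed: 2 boys missing", 0.52),
    ("will just watch tv", 0.10),
    ("400 volunteers are needed", 0.45),
]
print(select_for_labeling(scored, k=2))
```

Items the classifier already handles confidently (0.95, 0.10) are skipped, so each human label is spent where it is most likely to change the model.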
  73. 73. Labeling Task Scheduling •  All-at-once labeling •  Obtain 1,500 labels on S1 and use all for training •  Cumulative labeling •  Obtain 500 labels in each of S1, S2, and S3 and train on labels available up to each phase •  Independent labeling •  Obtain 500 labels in each of S1, S2, and S3 and use the most recent labels for training, discarding old. 2- Labeling task scheduling
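The three scheduling strategies on the slide above differ only in which labels are used for training at a given phase. A small sketch, assuming labels are stored as one list per phase (index 0 = S1, 1 = S2, 2 = S3); the function name and data layout are hypothetical.

```python
def training_set(strategy, labels_by_phase, current_phase):
    """Return the labels used to train the classifier at `current_phase`.

    For "all_at_once", the caller is expected to have put the whole labeling
    budget (1,500 labels on the slide) into phase 0.
    """
    if strategy == "all_at_once":
        return list(labels_by_phase[0])
    if strategy == "cumulative":
        return [label
                for phase in labels_by_phase[:current_phase + 1]
                for label in phase]
    if strategy == "independent":
        return list(labels_by_phase[current_phase])
    raise ValueError(f"unknown strategy: {strategy}")


labels = [["s1-a", "s1-b"], ["s2-a"], ["s3-a"]]
print(training_set("cumulative", labels, current_phase=1))
print(training_set("independent", labels, current_phase=2))
```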
  74. 74. Labeling Task Scheduling Experiment: Which labeling strategy to follow? JOPLIN SANDY OKLAHOMA Informative Informative (50%) Donations The all-at-once approach dominates for the informative class, and the cumulative strategy seems better for donations
  75. 75. Domain Adaptation Ability of a system to apply knowledge and skills learned in previous domains to novel domains Ongoing Work Our Goal: To build a system that can understand natural language
  76. 76. Domain Adaptation Labeled source, but unlabeled target Feature extractor Machine learning algorithm Feature extractor Classifier model Input documents (blue domain) Feature vectors Labels Feature vectors Machine classified items Input documents (orange domain) Training Prediction Source event data Target event data
  77. 77. Same Domain Learning Training data Machine learning model Testing data infer predict Apples Apples Apples Oranges Different shapes, colors, skins, tastes, etc. Source domain Target domain Oranges Oranges BUT
  78. 78. Crisis-related Data Classification Training data Machine learning model Testing data infer predict Italy earthquake Queensland floods Sandy hurricane Costa Rica earthquake Colorado floods Typhoon Haiyan Different events, languages, needs etc. Source domain Target domain
  79. 79. Domain Adaptation
  80. 80. Model Adaptation Experiments •  Model adaptation using single source – Using both: in-domain and cross-domain •  Model adaptation using multiple sources – In-domain – Multiple source events without the target – Multiple source events with the target •  Model adaptation in special cases – Same languages – Similar languages
  81. 81. Observations & Findings •  Data from early hours of a crisis help •  Past events of same type are useful •  Same language data as target event is also useful •  Similar languages are also useful •  Cross-domain training does not show significant improvements
  82. 82. Rapid Crisis Response Future Directions
  83. 83. Image Processing for Damage Assessment Tasks •  Image categorization •  E.g. building, bridge, road damage •  Damage and severity assessment •  Given a damage image, identify severity of damage (low, mild, high)
  84. 84. Deep Learning to Improve Classification Tasks •  Improve text classification performance •  Availability of big data •  Automatic features learning •  Binary and multi-class classification •  Testing data from multiple past events
  85. 85. Transfer Learning Differences in classification tasks: •  Different classification tasks •  Different types of disasters, stakeholders, information needs Task: •  Learn from source to classify target •  Semantic similarity between tasks •  Instances similarity between domains •  Instance weighting
  86. 86. Summarization and Prioritization of Actionable Information Information needs & problem: •  Different stakeholders •  Different goals, requirements, and info. needs General situational awareness vs. Target situational awareness •  High-level general updates from an event •  Specific updates (infrastructure damages)
  87. 87. Resources, Datasets, And Tools
  88. 88. Towards Standard Baselines and Datasets CrisisNLP.qcri.org -  Access to 52 million tweets -  Around 50k labeled tweets into humanitarian categories -  Largest word2vec embeddings trained on 52m crisis-related tweets -  Out-of-vocabulary dictionaries
  89. 89. Towards Standard Baselines and Datasets
  90. 90. Upcoming Book
  91. 91. Digital Humanitarians: Book
  92. 92. ACM Computing Survey Processing Social Media Messages in Mass Emergency: A Survey [Imran et al. 2015]
  93. 93. Conclusions •  Information bestows power for disaster response –  People need information as much as water, shelter, and food –  Disasters are unavoidable, but planning can lessen their effects •  Social media as time-critical information source –  Early warnings, event detection, event monitoring –  Availability of information opens new opportunities •  Artificial Intelligence for Disaster Response –  Applied research at its best –  AI + humans-in-the-loop can enable rapid crisis response –  AI techniques useful for: •  Situational awareness •  Actionable information extraction •  Summarization
  94. 94. THANK YOU! http://mimran.me/ Muhammad Imran CrisisNLP.qcri.org AIDR.qcri.org
