
Processing Social Media Messages in Mass Emergency: A Survey

Millions of people use social media to share information during disasters and mass emergencies. Information available on social media, particularly in the early hours of an event when few other sources are available, can be extremely valuable for emergency responders and decision makers, helping them gain situational awareness and plan relief efforts. Processing social media content to obtain such information involves solving multiple challenges, including parsing brief and informal messages, handling information overload, and prioritizing different types of information. These challenges can be mapped to information processing operations such as filtering, classifying, ranking, aggregating, extracting, and summarizing. This work highlights these challenges and presents state of the art computational techniques to deal with social media messages, focusing on their application to crisis scenarios.

  1. Processing Social Media Messages in Mass Emergency: Survey Summary
     Authors: Muhammad Imran (mimran@hbku.edu.qa), Carlos Castillo (chato@acm.org), Fernando Diaz (diazf@acm.org), Sarah Vieweg (sarahvieweg@gmail.com)
     Date: 25th April 2018
  2. Overarching Goal
     “To extract time-critical information from social media that is useful for emergency responders, affected communities, and other concerned population in disaster situations.”
     Example messages: Urgent help need; Urgent aid need
  3. Survey Study Selection
     • Domain filters: Humanitarian; Disaster response; Mass emergencies
     • Topic filters: Computing; Artificial intelligence; Machine learning
     • Data filters: Twitter; Facebook; Micro-blogging
     Keywords (Domain + Topics + Data) → >700 articles → Duplicate filters → Final selection = 180 published research papers
  4. Topics Covered (Humanitarian + Social Media + AI)
     • Volume & Velocity (~18): Data acquisition, storage, and retrieval
     • Event Detection (~36): Topic detection and tracking
     • Classification & Clustering (~40): Classification and clustering
     • Information Summarization (~15): Abstractive and extractive summarization
     • Semantics and Crisis Ontologies (~10): Semantic enrichment & crisis ontologies
     • Information Veracity (~18): Credibility and misinformation
     • Information Visualization (~12): Crisis maps, dashboards
     Total: ~180 papers surveyed
  5. Volume & Velocity
  6. Twitter Storms during Emergencies
     • Volume: 27 million in 3 days
     • Velocity: 72k tweets/min
     Source: https://www.wsj.com/articles/twitter-storms-can-help-gauge-damage-of-real-storms-and-disasters-study-says-1457722801 (Castillo C, Big Crisis Data, 2016, Cambridge University Press)
  7. Twitter Activity Across Locations during Disasters
     [Figure panels: Activity; Retweeting. Blue represents a location farther from the disaster; red represents a location closer to the disaster.]
     Strong relationship between proximity to Sandy’s path and social media activity
     (Yury Kryvasheyeu et al. Sci Adv 2016;2:e1500779)
  8. Event Detection
  9. Event Description
     • Why detect events from social media?
       – Human sensors report incidents very quickly
       – Tweet waves travel faster than earthquake waves
     • What is an event?
       – Events can be defined as situations, actions or occurrences that happen in a certain location at a specific time (Dou et al. 2012)
     • An event is generally characterized by 5W1H
       – Who? When? Where? What? Why? How?
  10. Event Detection using Bursty Behavior (Liang et al. Quantifying Information Flow During Emergencies, 2014, Nature.)
  11. Event Detection Systems

      System | Approach | Event types | Real-time | Query type | Spatio-temporal | Sub-events | Reference
      Twitter Monitor | Burst detection | Open domain | Yes | Open | No | No | [Mathioudakis et al. 2010]
      TwitInfo | Burst detection | Earthquakes | Yes | Keyword | Spatial | Yes | [Marcus et al. 2011]
      Twevent | Burst detection | Open domain | Yes | Open | No | No | [Li et al. 2012b]
      TEDAS | Supervised classification | Crime/disasters | No | Keyword | Yes | No | [Li et al. 2012a]
      LeadLine | Burst detection | Open domain | No | Keyword | Yes | No | [Dou et al. 2012]
      TwiCal | Supervised classification | Conflicts/politics | Yes | Open | Temporal | No | [Ritter et al. 2012]
      Tweet4Act | Dictionaries | Disasters | Yes | Keyword | No | No | [Chowdhury et al. 2013]
      ESA | Burst detection | Open domain | Yes | Keyword | Spatial | No | [Robinson et al. 2013a]
  12. Challenges and Future Directions
      • Inadequate spatial information
        – Spatial and temporal information are two integral components of an event
        – Automatic text-based geo-tagging may help
      • Mundane events
        – #MusicMonday #FollowFriday are misleading
      • Describing the events
        – Named-entities, tracking, semantic enhancements
  13. Information Classification and Clustering
  14. By Information Provided
      • Caution and advice [Imran et al. 2013b]; warnings [Acar and Muraki 2011]; hazard preparation [Olteanu et al. 2014]; tips [Leavitt and Clark 2014]; advice [Bruns 2014]; status, protocol [Hughes et al. 2014b]
      • Affected or trapped people [Caragea et al. 2011]; casualties, people missing, found, or seen [Imran et al. 2013b]; self-reports [Acar and Muraki 2011]; injured, missing, killed [Vieweg et al. 2010]; looking for missing people [Qu et al. 2011]
      • Infrastructure/utilities damage [Imran et al. 2013b]; collapsed structure [Caragea et al. 2011]; built environment [Vieweg et al. 2010]; closure and services [Hughes et al. 2014b]
      • Needs and donations of money, goods, services [Imran et al. 2013b]; food/water shortage [Caragea et al. 2011]; donations or volunteering [Olteanu et al. 2014]; help requests, relief coordination [Qu et al. 2011]; relief, donations, resources [Hughes et al. 2014b]; help and fundraising [Bruns 2014]
      • Other useful information: hospital/clinic service, water sanitation [Caragea et al. 2011]; consequences [Olteanu et al. 2014]
  15. By Information Provided (same categories and references as slide 14), with the techniques used:
      – Supervised classification techniques
      – Learning algorithms include SVMs, Random Forest, ensemble methods, and lately deep learning, e.g., RNN
      – Unsupervised: clustering, and LDA for topic modeling
      Formal response organizations prefer supervised classification because, most of the time, the categories are already defined.
  16. Systems for Crisis Data Processing
      • Twitris [Purohit and Sheth 2013]: Twitter; semantic enrichment, classify automatically, geotag
      • SensePlace2 [MacEachren et al. 2011]: Twitter; geotag, visualize heat-maps based on geotags
      • EAIMS, Emergency Analysis Identification and Management System [McCreadie et al. 2016]: Twitter; sentiment, alerts, credibility
      • ESA, Emergency Situation Awareness [Yin et al. 2012; Power et al. 2014]: Twitter; detect bursts, classify, cluster, geotag
  17. Systems for Crisis Data Processing
      • Twitcident [Abel et al. 2012]: Twitter and TwitPic; semantic enrichment, classify
      • CrisisTracker [Rogstadius et al. 2013]: Twitter; cluster, annotate manually
      • Tweedr [Ashktorab et al. 2014]: Twitter; classify automatically, extract information, geotag
      • AIDR, Artificial Intelligence for Disaster Response [Imran et al. 2014a]: Twitter & Facebook; annotate manually, classify automatically (text + image)
  18. Challenges and Future Directions
      • Missing actionable insights
        – Who needs help and where
        – Automatic extraction of actionable/serviceable messages
      • Labeled data scarcity
        – Most of the systems are hungry for labeled data
        – More robust domain adaptation and transfer learning techniques are required
      • Focus on other content types (images)
        – Images contain critical information (e.g., damage)
        – More focus on multimodal research is required
  19. Information Summarization
  20. Information Summarization
      • Tribhuvan international airport closed after the quake
      • Airport closed after 7.9 Earthquake in Kathmandu
      • Tribhuvan international airport closed after 7.9 earthquake in Kathmandu.
      Summaries reduce the information overload issue
  21. Key Objectives and Challenges
      • Information coverage
        – Capture most situational updates from data. The summary should be rich in terms of information coverage
      • Less redundant information
        – Messages on Twitter contain duplicate information. Produce summaries with less redundant but important updates
      • Readability
        – Twitter messages are often noisy, informal, and full of grammatical mistakes. The aim here is to produce more readable summaries
      • Real-time (online/updated summaries)
        – The system should not be so overloaded with computation that, by the time the summary is produced, the utility of that information is marginal
      (McCreadie et al. 2013; Aslam et al. 2013; Nenkova and McKeown 2011; Guo et al. 2013; Rudra et al. 2016)
  22. Crisis Datasets (Labeled + Unlabeled)
      • CrisisMMD: Multimodal Twitter Datasets from Natural Disasters
      • http://CrisisNLP.qcri.org/
      • http://CrisisLex.org/
  23. Conclusion and Future Directions
      • Applied Research at its Best
        – Real-world problems and challenges
        – Social Media for Social Good
        – Decent work on information filtering and classification (last 6-8 years)
      • Social media imagery content is another potential source of information
      • Labeled data scarcity problem
        – No or few labeled data instances (in early hours)
        – High diversity among organizations’ needs
        – Information needs change over time
        – Domain adaptation and transfer learning techniques required
      • From situational to actionable insights
        – Identify requests and needs in real-time
        – Triangulate missing information
        – Rank them based on their urgency to help responders
  24. Thank you!
      For queries, questions, and datasets, contact me at: mimran@hbku.edu.qa OR @mimran15
      Recommended books:
      Full survey paper: Processing Social Media Messages in Mass Emergency: A Survey. ACM Computing Surveys, 2015.
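
Most of the systems in the slide 11 table rely on burst detection. The following is a minimal sketch of that general idea, not the method of any surveyed system. It assumes tweets matching a keyword have already been counted per fixed time window, and it flags a window as a burst when its count clearly exceeds the recent average. The function name `detect_bursts`, the parameters `history`, `k`, and `min_count`, and the sample counts are all illustrative assumptions.

```python
from statistics import mean, pstdev


def detect_bursts(counts, history=5, k=3.0, min_count=10):
    """Return indices of windows whose count exceeds mean + k * std
    of the preceding `history` windows (hypothetical parameters)."""
    bursts = []
    for i in range(history, len(counts)):
        past = counts[i - history:i]
        mu = mean(past)
        # Floor the deviation at 1.0 so a perfectly flat history
        # does not turn every tiny increase into a burst.
        sigma = max(pstdev(past), 1.0)
        if counts[i] >= min_count and counts[i] > mu + k * sigma:
            bursts.append(i)
    return bursts


# Made-up per-minute tweet counts for one keyword.
counts = [12, 15, 11, 14, 13, 12, 90, 160, 40, 14]
print(detect_bursts(counts))  # [6, 7]
```

A detector of this kind reacts to any spike, so recurring hashtags such as #MusicMonday would also be flagged. That is the "mundane events" limitation listed on slide 12.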

Editor's Notes

  • Our goal in this paper was to survey systems, techniques, and computational models that help extract time-critical information from social media useful for emergency responders and affected communities.

    For example, look at these two messages. The message on the left, collected during the recent Hurricane Harvey, asks for urgent help for an elderly person who was trapped.
    The message on the right reports an urgent need for baby food and medicines during a flood in Kashmir.
  • Before we started reading papers, we decided on three aspects that determine which papers to select. We formed several keyword searches by combining domain, topic, and data-source terms, and ran them on several scholarly search engines.

    After getting the results, two of the authors reviewed the papers and filtered out those that were not relevant. Our final set has around 184 papers.
    ----- Meeting Notes (4/16/18 13:04) -----
    - No listing, but the message
    - Opinions (
    -
  • These are some numbers from a few major past disasters from 2010 to 2013, originally reported in the WSJ. There were 27 million tweets posted in the 3 days after the Boston Marathon bombing in 2013.

    How fast do these messages arrive?

    During the 2011 Japan earthquake, the highest recorded velocity, according to the Big Crisis Data book, was 72k tweets per minute.

    Not only is the velocity high: social media also breaks stories faster than traditional channels. When a magnitude-5.8 earthquake hit Virginia in 2011, the first Twitter report from a bystander at the epicenter reached New York about 40 seconds ahead of the quake’s first shock waves (source: WSJ).
  • Given all this volume and velocity, the question is whether this Twitter activity indicates anything or is random.

    According to the paper “Rapid assessment of disaster damage using social media activity”, published in Science Advances, there is a strong relationship between disaster proximity and social media activity.

    In all charts, the primary plot shows results for messages with keyword “sandy” and the small chart for keyword “weather” to contrast behaviors between event-related and neutral words.

    Blue represents a location farther from the disaster. Red represents a location closer to the disaster.

    A: Chart A shows a sharp decline in the activity as the distance between a location and the path of the hurricane increases.

    B: Chart B shows the activity and retweet fraction. The retweet rate appears to be inversely related to activity, with affected areas producing more original content.

    None of the features discussed above are present for neutral words (see the insets in all panels).

    --Backup—
    A: After the distance exceeds 1200 to 1500 km, its effect on the strength of response disappears. This trend may be caused by a combination of factors, with direct observation of disaster effects and perception of risk both increasing the tweet activity of the East Coast cities. Anxiety, anticipation, and risk perception evidently contribute to the magnitude of response because many of the communities falling into the decreasing trend were not directly hit or were affected only marginally, whereas New Orleans, for example, shows a significant tweeting level that reflects its historical experience with damaging hurricanes like Katrina.

    C: Chart C shows content popularity. Content created in the disaster area is also more popular, so popularity increases with activity as well.
  • Now, with all this activity on social media during disasters, can we use it to automatically detect disaster events?
  • We want to detect events from social media because 1) human sensors are generally fast, and 2) as we saw, tweet waves travel faster than earthquake waves.
  • In a study published in Nature, “Quantifying Information Flow During Emergencies”, the authors used mobile SMS and calls to predict suspicious events.

    According to this study, the actions and reactions of affected people due to a disaster or due to a non-disaster event are differentiable.

    G0 are users who are directly affected by the disaster
    G1 are users who are contacted by G0 users

    If you compare the bombing, jet scare, and plane crash with the concert event, you notice a consistent pattern in all the disaster events that is not visible in the non-disaster event.

    G0 activity goes up when the disaster hits
    G1 activity also goes up in an emergency, but not really in a non-emergency event
  • Several systems and techniques have been developed in the last couple of years.

    Here I list a few important ones with their capabilities, e.g., event type, real-time operation, query type, spatio-temporal support, and whether they are able to identify sub-events.

    Most of these systems are based on burst detection, which can be misleading, especially on social media, because of messages about mundane events.

    Temporal = able to predict the time of a detected event
    Spatial = able to predict the location of an event
  • After an event is detected, the next step is to analyze the data. Two well-known techniques, classification and clustering, have been used for this purpose.
  • Here I listed a number of works, with their detailed task.
  • Unfortunately, most of these systems were not developed based on stakeholders’ needs. Future systems should be requirements-driven.
  • Information summarization is another very important step after classification.
    There are two main types of summarization approaches: extractive, in which content from the source is reused to generate summaries, and abstractive, in which new content is generated to summarize a set of documents.
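
To make the extractive approach concrete, here is a minimal sketch of a greedy extractive summarizer that targets two objectives from slide 21, information coverage and low redundancy. It is only one simple way to do this, not the method of any cited paper. It repeatedly picks the message that contributes the most words not yet covered and stops when the best remaining message adds too little. The names `tokenize` and `summarize`, the parameters `max_items` and `min_new_tokens`, and the lowercase duplicate message are illustrative assumptions. The first two messages come from slide 20.

```python
import re


def tokenize(text):
    """Lowercase a message and split it into a set of alphanumeric tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def summarize(messages, max_items=3, min_new_tokens=2):
    """Greedy extractive summary: select source messages verbatim,
    favoring coverage of new tokens and skipping redundant ones."""
    selected, covered = [], set()
    remaining = [(m, tokenize(m)) for m in messages]
    while remaining and len(selected) < max_items:
        # Pick the message that adds the most uncovered tokens.
        best_msg, best_toks = max(remaining, key=lambda p: len(p[1] - covered))
        if len(best_toks - covered) < min_new_tokens:
            break  # everything left is (nearly) redundant
        selected.append(best_msg)
        covered |= best_toks
        remaining = [p for p in remaining if p[0] != best_msg]
    return selected


messages = [
    "Tribhuvan international airport closed after the quake",
    "Airport closed after 7.9 Earthquake in Kathmandu",
    "airport closed after 7.9 earthquake in kathmandu",  # made-up near-duplicate
]
for line in summarize(messages):
    print(line)
```

With these inputs the sketch keeps the two distinct messages and drops the near-duplicate. An abstractive system would go further and generate a new merged sentence instead of reusing source messages.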
