ISCRAM 2013: Extracting Information Nuggets from Disaster-Related Messages in Social Media

Authors: Muhammad Imran, Shady Elbassuoni, Carlos Castillo, Fernando Diaz, Patrick Meier
Qatar Computing Research Institute

Published in: Technology, Business
  • Social media empowers individuals by giving them a platform from which to share opinions, experiences, and information from anywhere at any time. The shared information can be highly useful, provided it is analyzed in a timely and effective manner. That is what I am going to present in this session.
  • Finding tactical and actionable information in the millions of messages that people post on social media is a complex and challenging task. For this purpose, and specifically for disasters, we came up with a sensible ontology that has three main stages. Each stage refines a piece of information so that it can contribute more to disaster management. To get to the actionable information, we first need to categorize each incoming message into a predefined, disaster-specific category.
  • Identifies named entities, caution/advice, temporal information, and other details.
  • The inter-annotator agreement value shows the level of agreement among workers on an assessable unit (in our case, a tweet). High agreement indicates that different workers frequently gave the same response for the same tweet.
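As a rough illustration of the agreement idea in the note above, the sketch below scores one tweet by the fraction of workers who chose the most common label. This simple majority fraction is an assumption made for illustration; the notes do not say how the crowdsourcing platform computes its agreement value, and the tweet IDs and labels are made up.

```python
from collections import Counter


def agreement(labels):
    """Fraction of workers whose label equals the most common label for one tweet."""
    if not labels:
        raise ValueError("need at least one label")
    top_count = Counter(labels).most_common(1)[0][1]
    return top_count / len(labels)


# Hypothetical judgments: three workers per tweet.
judgments = {
    "tweet_a": ["informative", "informative", "informative"],
    "tweet_b": ["informative", "personal", "informative"],
}
scores = {tweet_id: agreement(votes) for tweet_id, votes in judgments.items()}
```

A score of 1.0 means every worker gave the same response; lower values flag tweets that may need more judgments.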
  • Transcript of "ISCRAM 2013: Extracting Information Nuggets from Disaster-Related Messages in Social Media"

    1. 1. Extracting Information Nuggets from Disaster-Related Messages in Social Media Muhammad Imran, Shady Elbassuoni, Carlos Castillo, Fernando Diaz, Patrick Meier
    2. 2. Outline • Social Media response to disaster • Finding tactical and actionable information • Disaster ontologies • Filtering, classification and extraction • Ongoing work • Discussion
    3. 3. Disaster and Social Media 2.3 million tweets reflecting the words “Haiti” or “Red Cross” from Jan 12 to Jan 14, 2010 http://www.sysomos.com
    4. 4. Disaster and Social Media
    5. 5. Why Social Media? • Virtual Collaboration, Information Sharing • Highly valuable information • Contribute to situational awareness • Highly useful, if analyzed timely and effectively
    6. 6. Sandy Tweets @NYGovCuomo orders closing of NYC bridges. Only Staten Island bridges unaffected at this time. Bridges must close by 7pm. #Sandy #NYC. rt @911buff: public help needed: 2 boys 2 & 4 missing nearly 24 hours after they got separated from their mom when car submerged in si. #sandy #911buff freaking out. home alone. will just watch tv #Sandy #NYC. 400 Volunteers are needed for areas that #Sandy destroyed.
    7. 7. Sandy Tweets @NYGovCuomo orders closing of NYC bridges. Only Staten Island bridges unaffected at this time. Bridges must close by 7pm. #Sandy #NYC. rt @911buff: public help needed: 2 boys 2 & 4 missing nearly 24 hours after they got separated from their mom when car submerged in si. #sandy #911buff freaking out. home alone. will just watch tv #Sandy #NYC. 400 Volunteers are needed for areas that #Sandy destroyed. Personal Informative
    8. 8. Sandy Tweets @NYGovCuomo orders closing of NYC bridges. Only Staten Island bridges unaffected at this time. Bridges must close by 7pm. #Sandy #NYC. rt @911buff: public help needed: 2 boys 2 & 4 missing nearly 24 hours after they got separated from their mom when car submerged in si. #sandy #911buff freaking out. home alone. will just watch tv #Sandy #NYC. 400 Volunteers are needed for areas that #Sandy destroyed. Personal Informative Caution and Advice Casualties and Damage Donations
    9. 9. Sandy Tweets @NYGovCuomo orders closing of NYC bridges. Only Staten Island bridges unaffected at this time. Bridges must close by 7pm. #Sandy #NYC. rt @911buff: public help needed: 2 boys 2 & 4 missing nearly 24 hours after they got separated from their mom when car submerged in si. #sandy #911buff freaking out. home alone. will just watch tv #Sandy #NYC. 400 Volunteers are needed for areas that #Sandy destroyed. Personal Informative Caution and Advice Casualties and Damage Donations
    10. 10. Finding Tactical & Actionable Information
        Message types: Personal; Informative (Direct & Indirect); Other
        Informative categories:
        • Caution and advice: siren heard, warning issued/lifted, etc.
        • Casualties and damage: people dead, injured, damage, etc.
        • Donations: money, shelter, blood, goods, or services
        • People missing, found, or seen
        • Information source: webpages, photos, videos, information sources
        • …
    11. 11. Our Approach 3. Extraction 2. Classification 1. Filtering
    12. 12. Our Datasets Joplin Dataset • 206,764 tweets collected during the tornado that hit Joplin, Missouri on May 22, 2011 • Collected by researchers at the University of Colorado at Boulder • Collected through the Twitter API by monitoring tweets with the hashtags #joplin or #tornado
    13. 13. Our Datasets Sandy Dataset • 140,000 tweets collected during Hurricane Sandy, which hit the northeastern USA on Oct 29, 2012 • Collected through the Twitter API by monitoring tweets with the hashtags #sandy or #nyc
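Both datasets were gathered by monitoring hashtags. The sketch below shows one way such a hashtag match could be expressed offline, over tweet text that has already been downloaded. It does not call the Twitter API, and the function name and regular expression are assumptions made for illustration.

```python
import re

# Matches a '#' followed by word characters, e.g. "#Sandy".
HASHTAG = re.compile(r"#(\w+)")


def matches_hashtags(text, tracked):
    """Return True if the text has at least one tracked hashtag.

    `tracked` holds lowercase tag names without the leading '#'.
    """
    tags = {tag.lower() for tag in HASHTAG.findall(text)}
    return not tags.isdisjoint(tracked)


SANDY_TAGS = {"sandy", "nyc"}
JOPLIN_TAGS = {"joplin", "tornado"}
```

For example, `matches_hashtags("Bridges must close by 7pm. #Sandy #NYC.", SANDY_TAGS)` is true, while a tweet with no tracked hashtag is skipped.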
    14. 14. 1. Filtering • Is disaster-related? (Yes / No) • Contributes to situational awareness? (Yes / No)
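The two filtering questions can be read as a cascade. The sketch below assumes the second question is asked only for tweets that pass the first, which the slide suggests but does not state. The two keyword rules are hypothetical stand-ins for illustration; the slides describe classifiers trained on crowdsourced labels, not keyword matching.

```python
def passes_filter(text, is_disaster_related, contributes_to_awareness):
    """Return True only if both filtering questions are answered 'yes'."""
    if not is_disaster_related(text):
        return False  # first "No" branch: drop the tweet
    return contributes_to_awareness(text)


# Hypothetical stand-in rules, for illustration only.
def toy_disaster_related(text):
    lowered = text.lower()
    return any(word in lowered for word in ("sandy", "tornado", "joplin"))


def toy_contributes(text):
    lowered = text.lower()
    return any(word in lowered for word in ("closing", "missing", "needed", "warning"))
```

Passing the predicates in as arguments keeps the cascade independent of whichever classifier answers each question.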
    15. 15. 1. Filtering: Training Data • 4406 tweets sampled uniformly from the Joplin dataset • Annotated using CrowdFlower • Chart labels: Personal, Informative, Other • Chart values: 32%, 60%, 8%
    16. 16. 2. Classification Caution & Advice Information Sources Damage & Casualties Donations Health Shelter Food Water Logistics ... ... Filtered tweets
    17. 17. Distribution of Tweet Types, Joplin Tornado (2011) • Chart labels: Caution/Advice, Info Source, Donations, Casualties/Damage, Unknown • Chart values: 50%, 18%, 16%, 10%, 6%
    18. 18. Automatic Classification
        Class                Prec   Rec    F-Measure   AUC
        Caution and advice   0.85   0.76   0.80        0.91
        Information source   0.54   0.58   0.56        0.76
        Donations            0.72   0.71   0.72        0.89
        Casualties/damage    0.52   0.65   0.58        0.87
        Features:
        • Binary (hashtags, URL, emotion etc.)
        • Scalar (tweet length)
        • Text features (Unigram, bigram, POS tags, Verbnet etc.)
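The F-Measure column can be checked against the precision and recall columns. The sketch below assumes the F-Measure is the balanced F1 score (the harmonic mean of precision and recall); the slide does not state the formula. Because the slide reports rounded precision and recall, a recomputed value may differ from the reported one in the last digit.

```python
def f_measure(precision, recall):
    """Balanced F1: harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (precision, recall, reported F-Measure) as given on the slide.
reported = {
    "Caution and advice": (0.85, 0.76, 0.80),
    "Information source": (0.54, 0.58, 0.56),
    "Donations": (0.72, 0.71, 0.72),
    "Casualties/damage": (0.52, 0.65, 0.58),
}

for name, (prec, rec, f_reported) in reported.items():
    print(f"{name}: computed {f_measure(prec, rec):.3f}, reported {f_reported:.2f}")
```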
    19. 19. 3. Extraction ... Classified tweets @JimFreund: Apparently we have no choice. There is a tornado watch in effect tonight.
    20. 20. Labels for Extraction: Training Data • Type-dependent instruction • Ask evaluators to copy-paste a word/phrase from each tweet
    21. 21. Tool • CMU ARK Twitter NLP – Tokenization – Feature extraction – CRF learning • Very easy to use: simply change the training set (part-of-speech tags) into anything, and retrain
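Retraining a sequence tagger on new labels needs one label per token. The sketch below shows one possible way to turn a copy-pasted phrase into token-level labels, under stated assumptions: a B/I/O tagging scheme and whitespace tokenization are both chosen for illustration, and neither is confirmed by the slides. It does not use the CMU ARK tools, and the extracted phrase in the example is hypothetical.

```python
def bio_labels(tokens, phrase_tokens):
    """Label the first occurrence of the phrase with B/I; all other tokens get O."""
    labels = ["O"] * len(tokens)
    n = len(phrase_tokens)
    if n == 0:
        return labels
    lowered = [token.lower() for token in tokens]
    target = [token.lower() for token in phrase_tokens]
    for start in range(len(tokens) - n + 1):
        if lowered[start:start + n] == target:
            labels[start] = "B"
            for i in range(start + 1, start + n):
                labels[i] = "I"
            break  # only the first match is labeled
    return labels


tweet = "There is a tornado watch in effect tonight.".split()
phrase = "tornado watch in effect".split()  # hypothetical annotator selection
labels = bio_labels(tweet, phrase)
```

Each (token, label) pair then takes the place that a (token, part-of-speech tag) pair would have in the tagger's training set.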
    22. 22. Extraction Evaluation
        Setting                                     Rec   Prec
        Train 2/3 Joplin, Test 1/3 Joplin           78%   90%
        Train 2/3 Sandy, Test 1/3 Sandy             41%   79%
        Train Joplin, Test Sandy                    11%   78%
        Train Joplin + 10% Sandy, Test 90% Sandy    21%   81%
        • Precision is: one word or more in common with what humans extracted (Imran et al., 2013)
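The match criterion on this slide (one word or more in common with the human extraction) can be sketched as follows. Only the criterion comes from the slide. The denominators used here are assumptions: precision is taken over tweets where the system extracted something, and recall over tweets where a human extracted something. The cited paper may define them differently.

```python
def overlaps(predicted, gold):
    """True if the two phrases share at least one word (case-insensitive)."""
    return bool(set(predicted.lower().split()) & set(gold.lower().split()))


def evaluate(pairs):
    """pairs: list of (predicted, gold) strings; an empty string means nothing was extracted."""
    hits = sum(1 for predicted, gold in pairs if overlaps(predicted, gold))
    n_predicted = sum(1 for predicted, _ in pairs if predicted.strip())
    n_gold = sum(1 for _, gold in pairs if gold.strip())
    precision = hits / n_predicted if n_predicted else 0.0
    recall = hits / n_gold if n_gold else 0.0
    return precision, recall
```

Because a single shared word counts as a hit, this is a lenient measure: it rewards finding the right region of the tweet, not exact phrase boundaries.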
    23. 23. Ongoing work
    24. 24. Self-service for crisis-related classification • Machine learning software can be provided as a service – e.g. Google Prediction API • Can we provide crisis-related tweet classification as a service? – Automatic collection of tweets – Re-usable ontologies / default training sets – Active learning
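The slide names active learning without saying which strategy is meant. As one common possibility, the sketch below uses uncertainty sampling: it asks humans to label the tweets whose predicted probability is closest to 0.5. The strategy, the function name, and the example scores are all assumptions made for illustration.

```python
def most_uncertain(scored, k):
    """Pick the k tweets whose positive-class probability is closest to 0.5.

    scored: list of (tweet, probability) pairs from some binary classifier.
    """
    ranked = sorted(scored, key=lambda item: abs(item[1] - 0.5))
    return [tweet for tweet, _ in ranked[:k]]


# Hypothetical classifier outputs.
scored = [("a", 0.95), ("b", 0.52), ("c", 0.10), ("d", 0.45)]
to_label = most_uncertain(scored, 2)
```

The selected tweets would be sent for labeling and added to the training set before the classifier is retrained.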
    25. 25. Request Labeled / Unlabeled Datasets Contact us at: mimran@qf.org.qa
    26. 26. References
        • K. Starbird, L. Palen, A. Hughes, and S. Vieweg. Chatter on the red: what hazards threat reveals about the social life of microblogged information. In Proceedings of the 2010 ACM Conference on Computer Supported Cooperative Work, pages 241–250. ACM, 2010.
        • M. Latonero and I. Shklovski. “Respectfully Yours in Safety and Service”: Emergency Management & Social Media Evangelism. In Proceedings of the 7th International ISCRAM Conference, Seattle, Vol. 1, 2010.
        • M. Imran, S. Elbassuoni, C. Castillo, F. Diaz, and P. Meier. Practical Extraction of Disaster-Relevant Information from Social Media. WWW-2013 SWDM, May 2013.
    27. 27. Thank you! Muhammad Imran mimran@qf.org.qa With thanks to Carlos Castillo for several slides
