Introduction to Machine Learning: An Application to Disaster Response

Muhammad Imran
Scientist at the Qatar Computing Research Institute; Lead of the Crisis Computing group at QCRI
Muhammad Imran & Shafiq Joty
Qatar Computing Research Institute
Hamad Bin Khalifa University
Doha, Qatar
DISASTERS - SOCIAL MEDIA – RESPONSE EFFORTS
Humans suffering from the impacts of disasters, crises, and armed conflicts.
In the last two decades, disasters affected 218 million people each year, at an annual cost to the global economy that exceeds $300 billion. (Source: UN)
SANDY HURRICANE TWEETS
@NYGovCuomo orders closing of NYC bridges. Only Staten Island bridges unaffected at this time. Bridges must close by 7pm. #Sandy #NYC.
rt @911buff: public help needed: 2 boys 2 & 4 missing nearly 24 hours after they got separated from their mom when car submerged in si. #sandy #911buff
freaking out. home alone. will just watch tv #Sandy #NYC.
400 Volunteers are needed for areas that #Sandy destroyed.
Personal
Informative
Caution and Advice
Reports of missing people
Help/volunteers needed
FINDING TACTICAL AND ACTIONABLE INFORMATION
Personal
Informative (Direct & Indirect)
• Caution and advice: siren heard, warning issued/lifted, etc.
• Casualties and damage: people dead, injured, damage, etc.
• Donations: money, shelter, blood, goods, or services
• People missing, found, or seen
• Information source: webpages, photos, videos
• …
Other
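The ontology on this slide can be written down as a nested mapping. The following is a minimal sketch: the `ONTOLOGY` name, the dict layout, and the `label_path` helper are illustrative assumptions, not part of any published tool.

```python
# Illustrative encoding of the three-level ontology from the slide.
# The dict layout and the helper name are assumptions made for this sketch.
ONTOLOGY = {
    "Personal": {},
    "Informative": {
        "Caution and advice": ["siren heard", "warning issued/lifted"],
        "Casualties and damage": ["people dead", "people injured", "damage"],
        "Donations": ["money", "shelter", "blood", "goods", "services"],
        "People missing, found, or seen": [],
        "Information source": ["webpages", "photos", "videos"],
    },
    "Other": {},
}


def label_path(subtype):
    """Return (level 1, level 2, level 3) for a known subtype, else None."""
    for top, categories in ONTOLOGY.items():
        for category, subtypes in categories.items():
            if subtype in subtypes:
                return (top, category, subtype)
    return None


print(label_path("shelter"))  # ('Informative', 'Donations', 'shelter')
```

Keeping the levels explicit makes it easy to classify in stages: first Personal vs. Informative vs. Other, then the information type, then the subtype.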
USEFUL INFORMATION ON TWITTER
[Figure: charts showing the percentage of informative tweets in four categories, each broken down by subtype. Caution and advice: a siren heard; tornado warning issued/lifted; tornado sighting/touchdown. Information source: photos; webpages; videos. Donations: money; equipment, shelter, volunteers, blood; other donations. Casualties & damage: people injured; people dead; damage. The individual percentages cannot be matched to their labels in this transcript.]
Ref: “Extracting Information Nuggets from Disaster-Related Messages in Social Media”. Imran et al. ISCRAM-2013, Baden-Baden, Germany.
INFORMATION PROCESSING PIPELINE (SUPERVISED LEARNING): OFFLINE APPROACH
1. Data collection
2. Human annotations on sample data
3. Machine training
4. Classification
Disaster timeline: data collection
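The four offline steps can be sketched end to end with a small multinomial naive Bayes classifier written with the standard library only. This is a hedged illustration, not the method used in the talk: the choice of naive Bayes, the function names, and the "informative"/"personal" labels attached to the sample tweets are assumptions.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase a tweet and split it into word, hashtag, and mention tokens."""
    return re.findall(r"[#@]?\w+", text.lower())


def train(labeled_tweets):
    """Step 3: count tokens per label for a multinomial naive Bayes model."""
    token_counts = defaultdict(Counter)
    label_counts = Counter()
    for text, label in labeled_tweets:
        label_counts[label] += 1
        token_counts[label].update(tokenize(text))
    vocabulary = {tok for counts in token_counts.values() for tok in counts}
    return token_counts, label_counts, vocabulary


def classify(model, text):
    """Step 4: return the label with the highest log-posterior."""
    token_counts, label_counts, vocabulary = model
    total = sum(label_counts.values())
    scores = {}
    for label, n in label_counts.items():
        # Add-one smoothing over the training vocabulary.
        denom = sum(token_counts[label].values()) + len(vocabulary)
        score = math.log(n / total)
        for tok in tokenize(text):
            if tok in vocabulary:  # ignore tokens never seen in training
                score += math.log((token_counts[label][tok] + 1) / denom)
        scores[label] = score
    return max(scores, key=scores.get)


# Steps 1-2: a collected sample with human annotations (labels assumed here).
sample = [
    ("Bridges must close by 7pm. #Sandy #NYC", "informative"),
    ("400 Volunteers are needed for areas that #Sandy destroyed.", "informative"),
    ("freaking out. home alone. will just watch tv #Sandy #NYC.", "personal"),
]
model = train(sample)
print(classify(model, "volunteers needed near the bridges"))
```

In the offline setting all four steps run once, after the data has been collected, which is why the results arrive too late for a response that is still under way.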
IMPACT AND RESPONSE TIMELINE
Department of Community Safety, Queensland Govt. & UNOCHA, 2011
Disaster response (today) Disaster response (target)
Target disaster response requires real-time processing of data.
TIME-CRITICAL ANALYSIS OF BIG CRISIS DATA
Apply machine learning
Apply crowdsourcing
REQUIREMENTS & CHALLENGES
• Real-time analysis of data is required
• For rapid crisis response
• To reduce community harm
• Combine human and machine intelligence
• Usable and useful for end-users (mostly non-technical)
• End-users (stakeholders)
• Crisis managers (policy makers)
• Crisis responders (field workers)
REQUIREMENTS & CHALLENGES
Other key challenges:
• Volume: scale of data (20m tweets in 5 days)
• Velocity: analysis of streaming data (16k/min)
• Variety: different forms/types of data (information types)
• Veracity: uncertainty of data
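To put the volume and velocity figures side by side, the short calculation below converts them to common units. Treating 16k/min as a peak rate on the same stream as the 20m-tweet total is an assumption of this sketch; the slide does not say so.

```python
# Figures quoted on the slide.
total_tweets = 20_000_000   # 20m tweets
days = 5
peak_per_minute = 16_000    # 16k/min (assumed here to be a peak rate)

minutes = days * 24 * 60                                 # 7,200 minutes in 5 days
average_per_minute = total_tweets / minutes              # about 2,778 tweets/min
peak_per_second = peak_per_minute / 60                   # about 267 tweets/s
peak_to_average = peak_per_minute / average_per_minute   # about 5.8x

print(round(average_per_minute), round(peak_per_second), round(peak_to_average, 1))
```

Under that assumption, a system sized for the average rate would fall behind by a factor of almost six during bursts, so capacity has to be planned around the peak.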
STREAM PROCESSING USING SUPERVISED ML
Combining human and machine computation
• Quality assurance loops: human processing elements do the work, automatic processing elements check for consistency
• Process-verify: work is done automatically, humans check low-confidence or borderline cases
• Online supervised learning: humans train the machine to do the work automatically
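The process-verify pattern can be illustrated with a simple confidence threshold. This is a sketch under stated assumptions: the `route` function, the 0.8 threshold, the label names, and the example confidence values are all hypothetical.

```python
def route(item, label, confidence, threshold=0.8):
    """Accept confident machine labels; send the rest to a human queue."""
    if confidence >= threshold:
        return ("auto", item, label)
    return ("human_review", item, label)


# Hypothetical classifier outputs: (tweet, predicted label, confidence).
predictions = [
    ("Bridges must close by 7pm. #Sandy", "caution_and_advice", 0.93),
    ("400 Volunteers are needed", "donations", 0.55),
]
routed = [route(*p) for p in predictions]
review_queue = [item for decision, item, _ in routed if decision == "human_review"]
print(review_queue)
```

Raising the threshold sends more items to humans and lowers the risk of wrong automatic labels; lowering it does the opposite.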
INFORMATION PROCESSING PIPELINE: ONLINE APPROACH (REAL-TIME)
1. Data collection
2. Human annotations
3. Machine training
4. Classification
[Figure: timeline of the online approach, labeled "First few hours". Data collection runs alongside rounds of human annotation (1, 2, 3, …, n) and learning steps (Learning-1, Learning-2, Learning-3, …, Learning-n), which feed the classification of data and the decision-making process.]
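In the online approach the model is updated every time a new round of human annotations arrives, instead of being trained once. A minimal sketch of that loop, assuming a binary perceptron over bag-of-words features (the class, the feature choice, and the labels are illustrative, not taken from the talk):

```python
import re
from collections import defaultdict


class OnlinePerceptron:
    """Binary perceptron over bag-of-words features, updated one label at a time."""

    def __init__(self):
        self.weights = defaultdict(float)

    @staticmethod
    def features(text):
        return set(re.findall(r"\w+", text.lower()))

    def score(self, text):
        return sum(self.weights[tok] for tok in self.features(text))

    def predict(self, text):
        return "informative" if self.score(text) > 0 else "personal"

    def update(self, text, label):
        """Adjust weights only when the current prediction is wrong."""
        if self.predict(text) != label:
            step = 1.0 if label == "informative" else -1.0
            for tok in self.features(text):
                self.weights[tok] += step


model = OnlinePerceptron()
# Each batch stands for one round of human annotation (labels assumed).
annotation_rounds = [
    [("volunteers are needed in Staten Island", "informative")],
    [("home alone will just watch tv", "personal")],
]
for batch in annotation_rounds:
    for text, label in batch:
        model.update(text, label)  # Learning-1, Learning-2, ...

print(model.predict("volunteers needed"))
```

Because the classifier is usable after the first round, classification can begin in the first few hours and improve as more labels arrive.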
http://aidr.qcri.org/
AIDR (Artificial Intelligence for Disaster Response) is a free, open-source, and easy-to-use platform to automatically filter and classify relevant tweets posted during humanitarian crises.
1. Collect
2. Curate
3. Classify
AIDR: FROM END-USERS' PERSPECTIVE
2-step approach
1. Collection: a collection is a set of filters
• Keywords, hashtags
• Geographical bounding box
• Languages
• Follow specific set of users
2. Classifier(s): a classifier is a set of tags
• Donation requests & offers
• Damage & casualties
• Eyewitness accounts
• …
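A collection defined as a set of filters can be sketched as a small predicate over incoming tweets. Everything here is hypothetical: the `Collection` class, its field names, the tweet dictionary layout, and the choice to combine filters with AND are assumptions of this sketch, and the platform may combine them differently.

```python
from dataclasses import dataclass, field


@dataclass
class Collection:
    """A collection as a set of filters; field names are illustrative."""
    keywords: set = field(default_factory=set)
    languages: set = field(default_factory=set)
    users: set = field(default_factory=set)
    bbox: tuple = None  # (min_lon, min_lat, max_lon, max_lat)

    def matches(self, tweet):
        """Return True if the tweet passes every configured filter."""
        text = tweet["text"].lower()
        if self.keywords and not any(k.lower() in text for k in self.keywords):
            return False
        if self.languages and tweet.get("lang") not in self.languages:
            return False
        if self.users and tweet.get("user") not in self.users:
            return False
        if self.bbox is not None:
            coords = tweet.get("coords")
            if coords is None:
                return False
            lon, lat = coords
            min_lon, min_lat, max_lon, max_lat = self.bbox
            if not (min_lon <= lon <= max_lon and min_lat <= lat <= max_lat):
                return False
        return True


sandy = Collection(keywords={"#sandy"}, languages={"en"})
print(sandy.matches({"text": "Bridges must close by 7pm. #Sandy #NYC", "lang": "en"}))
```

An unset filter is skipped, so a collection with only keywords behaves like a plain keyword search.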
AIDR APPROACH
[Figure: diagram with a Collection feeding Classifier(s). Labels shown: Tag, Learner, Classifier-1, Classifier-2, 30k/min.]
AIDR: HIGH-LEVEL ARCHITECTURE
[Figure: AIDR high-level architecture. Components: Collector, Feature Extractor, Classifier(s), Learner, Crowdsourcing Task Generator, Expert, Crowd workers/volunteers, Data sources. Data-flow labels: stream of incoming items from data sources; items; item & features; labeling tasks; labeled item; model parameter; classified item.]
An expert defines classifiers by giving a name and description for each category.
Output: a list of classified items by category and classifier’s confidence.
QUALITY VS. COST
• Gaining acceptable quality
• Quality (classification accuracy)
• Cost (human labels: monetary in case of paid workers, time in case of volunteers)
[Figures: quality vs. cost using passive learning; quality vs. cost using active learning]
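Active learning lowers labeling cost by choosing which items humans label, rather than sampling them at random as passive learning does. The slides do not say which selection strategy is used; the sketch below assumes uncertainty sampling, one common strategy, and uses hypothetical names and confidence values.

```python
def select_for_labeling(scored_items, budget):
    """Pick the items whose top-class confidence is lowest (uncertainty sampling)."""
    ranked = sorted(scored_items, key=lambda pair: pair[1])
    return [item for item, _ in ranked[:budget]]


# Hypothetical (tweet id, classifier confidence) pairs.
scored = [("t1", 0.97), ("t2", 0.52), ("t3", 0.61), ("t4", 0.88)]
print(select_for_labeling(scored, budget=2))  # ['t2', 't3']
```

The `budget` argument stands for the cost side of the trade-off: the number of human labels one is willing to pay for in a round.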
PERFORMANCE
• In terms of throughput and latency
[Figures: throughput of feature extractor, classifier, and the system; latency of feature extractor, classifier, and the system]
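Throughput and latency of a single stage can be measured with a simple timing loop. This is only an illustration of the two metrics: the `measure` helper and the stand-in `extract_features` function are assumptions, and the numbers it prints depend on the machine and say nothing about AIDR itself.

```python
import time


def measure(stage, items):
    """Return (throughput in items/s, mean latency in s) for one stage."""
    latencies = []
    start = time.perf_counter()
    for item in items:
        t0 = time.perf_counter()
        stage(item)
        latencies.append(time.perf_counter() - t0)
    elapsed = time.perf_counter() - start
    return len(items) / elapsed, sum(latencies) / len(latencies)


def extract_features(text):
    """Stand-in for a feature extractor: lowercase word tokens."""
    return text.lower().split()


tweets = ["400 Volunteers are needed for areas that #Sandy destroyed."] * 1000
throughput, latency = measure(extract_features, tweets)
print(f"{throughput:.0f} items/s, {latency * 1e6:.1f} microseconds per item")
```

Measuring each stage separately, as the slide does, shows which component limits the system as a whole.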
CHALLENGES: DOMAIN ADAPTATION
• Crisis-specific labels are necessary
• Contrasting vocabulary use
• Differences in public concerns, affected infrastructure
• New labels should be collected for each new crisis
[ Imran et al. 2013b ]
• Domain adaptation
• Train models using all past labeled data (all types of events)
• Train on labeled data from past similar events
• Train on data from neighboring countries on similar events
AIDR – COLLECTION SETUP
Collection definition
Geographical region filter
Language filter
Collection detail dashboard
AIDR – CLASSIFIER SETUP
AIDR – CLASSIFIER SETUP (cont.)
AIDR – CROWDSOURCING-1
Internal Tagging Interface
AIDR – CROWDSOURCING-2
MicroMapper Interface (browser clicker)
Mobile clicker
AIDR – OUTPUT
Training examples
Classified output (achieved accuracy ~ 75%)
TYPHOON HAGUPIT (2014)
- Killed 27 people
- A million evacuated
- $114 million of damage
DEMO
AIDR has been awarded the Grand Prize in the
Open Source Software World Challenge 2015
Thank you!


Editor's Notes

  1. Finding tactical and actionable information among the millions of messages that people post on social media is a complex and challenging task. For this purpose, specifically for disasters, we came up with a sensible ontology that has three main stages. Every stage refines a piece of information so that it can contribute substantially to disaster management. To get to the actionable information, we first need to categorize each incoming message into a predefined, disaster-specific category.
  2. I would like to start by showing you the results of our paper at last year's ISCRAM. These charts show that a significant amount of valuable information is available in tweets and can be extracted from them. In this work, we used state-of-the-art supervised machine learning techniques to classify tweets posted during disaster situations. Our conclusion from that work was that social media platforms like Twitter contain useful information. Now the big question is: how can one get that useful information during an ongoing disaster for an effective disaster response?
  3. One of the big challenges in the crowdsourcing
  4. The AIDR tagger is a machine computation component. Users define classifiers by specifying the categories they would like tweets to be classified against. The AIDR tagger requires tagged examples for its learning process.
  5. On the left side, the screenshot lists the training examples of a particular collection. One can review them and remove them if required. On the right side, the screenshot shows the classified output, that is, tweets classified into categories with a confidence score.
  6. During disasters, nothing is better than helping those who are affected in order to save lives. That is exactly what we did during Typhoon Hagupit, which struck the Philippines in early December 2014: at the request of the UN, we helped them find requests for help, infrastructure damage, and aid needed and provided, using social media data. On your left side is the Guardian page covering the whole story, and on your right side is a map generated by our platforms, AIDR and MicroMappers.
  7. One of the big challenges in the crowdsourcing