Real-Time Processing of Social Media Content for Social Good

  1. 1. Real-Time Processing of Social Media Content for Social Good. Muhammad Imran, Research Scientist, Qatar Computing Research Institute, Hamad Bin Khalifa University, Doha, Qatar. April 20th, 2017. Data Science Workshop
  2. 2. Outline •  P1: Background of Humanitarian Computing (10%) –  Sudden-onset emergencies, time-critical situations –  Social Good factors –  Aid and information needs •  P2: The Role of Social Media for Social Good (20%) –  Particular focus on micro-blogging platforms –  Availability of various types of information and opportunities •  P3: The Role of Artificial Intelligence for Social Good (70%) –  How AI is useful in crisis response –  Various AI techniques, approaches, and tools –  Work of the crisis computing group at QCRI –  Ongoing research –  Future directions
  3. 3. Aid Needs, Information Needs, and Gaps. Disaster event (earthquake, flood). Urgent needs of affected people. Information gathering. Humanitarian organizations and local administration. Information gathering, especially in real-time, is the most challenging part. Relief operations: -  Food, water -  Shelter -  Medical assistance -  Donations -  Service and utilities
  4. 4. Aid Needs, Information Needs, and Gaps. Disaster event (earthquake, flood). Urgent needs of affected people. Information gathering. Humanitarian organizations and local administration. Information gathering, especially in real-time, is the most challenging part. Relief operations: -  Food, water -  Shelter -  Medical assistance -  Donations -  Service and utilities --Information Bestows Power-- Will access to information solve the problem?
  5. 5. Decision-Making and Response Department of Community Safety, Queensland Govt. & UNOCHA, 2011 -  Delayed decision-making -  Delayed crisis response -  High community harm -  Early decision-making -  Rapid crisis response -  Low community harm Target
  6. 6. Decision-Making and Response Department of Community Safety, Queensland Govt. & UNOCHA, 2011 -  Delayed decision-making -  Delayed crisis response -  High community harm -  Early decision-making -  Rapid crisis response -  Low community harm Target --Need Early Information-- How early do we need it?
  7. 7. The Value of Timely Information During Disasters. Based on a large-scale FEMA survey among emergency management professionals across the US. Information value. When information is too late
  9. 9. Information Types and Needs •  Reports of injured or dead people •  Infrastructure damage (e.g., buildings, bridges, roads) •  Urgent needs of affected people (e.g., food, water, shelter) •  Donation offers and requests (e.g., money, volunteers) •  Medical emergencies •  Disease symptom reports •  Disease treatment reports and questions •  …
  10. 10. Part 2: The Role of Social Media
  11. 11. Communications Before and After ICT and Social Media. Gerald Baron
  12. 12. Information Availability in the Age of ICT and Social Media. Based on a large-scale FEMA survey among emergency management professionals across the US. 1990s, 2000s, 2010s. Information value. When information is too late
  13. 13. Sandy Hurricane Twitter Data Analysis. Tweets: "@NYGovCuomo orders closing of NYC bridges. Only Staten Island bridges unaffected at this time. Bridges must close by 7pm. #Sandy #NYC." | "rt @911buff: public help needed: 2 boys 2 & 4 missing nearly 24 hours after they got separated from their mom when car submerged in si. #sandy #911buff" | "freaking out. home alone. will just watch tv #Sandy #NYC." | "400 Volunteers are needed for areas that #Sandy destroyed."
  14. 14. Sandy Hurricane Twitter Data Analysis. Tweets: "@NYGovCuomo orders closing of NYC bridges. Only Staten Island bridges unaffected at this time. Bridges must close by 7pm. #Sandy #NYC." | "rt @911buff: public help needed: 2 boys 2 & 4 missing nearly 24 hours after they got separated from their mom when car submerged in si. #sandy #911buff" | "freaking out. home alone. will just watch tv #Sandy #NYC." | "400 Volunteers are needed for areas that #Sandy destroyed." Labels: Personal; Informative
  15. 15. Sandy Hurricane Twitter Data Analysis. Tweets: "@NYGovCuomo orders closing of NYC bridges. Only Staten Island bridges unaffected at this time. Bridges must close by 7pm. #Sandy #NYC." | "rt @911buff: public help needed: 2 boys 2 & 4 missing nearly 24 hours after they got separated from their mom when car submerged in si. #sandy #911buff" | "freaking out. home alone. will just watch tv #Sandy #NYC." | "400 Volunteers are needed for areas that #Sandy destroyed." Labels: Personal; Informative; Caution and Advice; Missing people report; Donation request
  16. 16. MERS Outbreak: Twitter Data Analysis. Middle East Respiratory Syndrome (MERS) Twitter data analysis from 2014-04-27 to 2014-07-14. Qualitative analysis categories: Reports of symptoms (reports of signs or symptoms such as fever, cough or questions); Affected people reports (reports of affected people due to the MERS disease); Death reports (reports of deaths due to the MERS disease); Disease transmission reports (reports or questions related to the transmission of the disease); Prevention questions (questions or suggestions related to the prevention of disease); Treatment questions (questions or suggestions regarding the treatment of the disease)
  17. 17. Social Media During MERS Outbreak. [Signs and symptoms] "RT @abecel: Two workers at FL hospital exposed to a patient with Middle East Respiratory Syndrome are showing flu-like symptoms" | [Signs and symptoms] "Coronavirus symptoms include: fever, coughing, shortness of breath, congestion in the nose and throat, and in some cases diarrhea. MERS" | [Signs and symptoms] "#MERS is a relatively new respiratory illness, spread b/w people in close contact. Symptoms are fever, cough, & shortness of breath." | [Affected individuals] "Saudi Arabia finds another 32 MERS cases as disease spreads: RIYADH (Reuters) - Saudi Arabia said on Thursday ... http://t.co/cPhm0uTRCo"
  18. 18. Social Media During MERS Outbreak. [Affected individuals] "First Case of Deadly Middle Eastern Virus Found in U.S.: The Centers for Disease Control has confirmed that a case of the deadly Midd..." | [Affected individuals] "Third Case of MERS Confirmed in the U.S.: The U.S. Centers for Disease Control and Prevention confirmed on Sat... http://t.co/Sb8PMyxVUn" | [Transmission] "No clear transmission link btwn camels and humans for MERS. 94% Egyptian camels seropositive but no human cases yet. Hmm #asm2014" | [Death reports] "Saudi health authorities announced on Monday that the death toll from the MERS coronavirus has reached 115 since the respiratory disease ..."
  19. 19. Twitter Breaks Events Faster. First report. Breaks the story 33 minutes before local TV. Hudson Plane Crash. Westgate Mall Attack
  20. 20. Twitter Breaks Events Faster. First report on Twitter. After 1 minute. After 2 minutes. Boston Bombing
  21. 21. Types of Information on Twitter -  Twitter data from 13 recent crises -  Over 100,000 tweets -  Information types -  Types of sources Source: Qatar Computing Research Institute - Published in World Humanitarian Data and Trends 2014 (UN OCHA)
  22. 22. Data and Opportunities. Social Media Platforms. 2013 Pakistan Earthquake, September 28 at 07:34 UTC. 2010 Haiti Earthquake, January 12 at 21:53 UTC. Disease outbreaks. Availability of Immense Data: Around 16 thousand tweets per minute were posted during Hurricane Sandy in the US. Opportunities: -  Early warning and event detection -  Situational awareness -  Actionable information extraction -  Rapid response -  Effective communications
  23. 23. Part 3 The Role of AI and Data Science for Social Good
  24. 24. Big Data Challenges – 4Vs (Under Time-critical Situations) •  Volume: Scale of data (e.g., millions of tweets after an event) •  Velocity: High-velocity streams (e.g., thousands of tweets/min) •  Variety: Different forms/types of data (information types) •  Veracity: Uncertainty of data
  25. 25. Data Acquisition
  26. 26. Twitter Data Collection •  REST APIs –  Provide programmatic access to post a new tweet, read profile, and followers. •  Streaming APIs –  Receive live updates on the latest tweets matching a search query. •  Ads API, MoPub, and Gnip –  Twitter advertising management; MoPub is a mobile ad exchange and ad server. –  Gnip provides commercial-grade access to real-time and historical Twitter data.
  27. 27. REST vs. Streaming API. REST API. Streaming API: Public streams, User streams, Site streams. Streaming endpoints. Sample code: https://github.com/twitterdev
  28. 28. Properties of Social Media Data •  Most SM data is publicly available •  Near real-time access •  1% to 3% geo-tagged •  Highly informal, often brief, and non-structured •  Written by different people in many languages •  Contains rumors and misinformation
  29. 29. Slang and Shortened Forms •  Single-word slang: pls (please), srsly (seriously) •  Multi-word slang: imo (in my opinion) •  Misspellings: missin (missing), ovrcme (overcome) •  Phonetic substitution: 2morrow (tomorrow) •  Words without spaces: prayfornepal (pray for nepal) Can you guess? “r u ok m8” ?? >> “Are you OK, mate?”
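
The slang categories above suggest a simple lookup-based normalization step. The following is a minimal sketch, assuming a hand-built dictionary (the `SLANG` table and the `normalize` function are hypothetical names, populated only from the examples on the slide); it is not the normalization used by the author's systems.

```python
import re

# Hypothetical lookup table built from the examples on the slide.
SLANG = {
    "pls": "please", "srsly": "seriously", "imo": "in my opinion",
    "missin": "missing", "ovrcme": "overcome", "2morrow": "tomorrow",
    "r": "are", "u": "you", "m8": "mate",
}

def normalize(text: str) -> str:
    """Replace known slang tokens, leaving unknown tokens untouched."""
    # Split into word tokens and single punctuation marks.
    tokens = re.findall(r"\w+|[^\w\s]", text.lower())
    words = [SLANG.get(tok, tok) for tok in tokens]
    out = " ".join(words)
    # Re-attach punctuation to the preceding word.
    return re.sub(r"\s+([^\w\s])", r"\1", out)

print(normalize("r u ok m8"))
```

A dictionary handles the first four categories; words without spaces (prayfornepal) would need a separate word-segmentation step that this sketch does not attempt.
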
  30. 30. Data Velocity and Volume. High velocity •  2012 Hurricane Sandy: 18,000 tweets/min •  2013 Boston bombings: 54,000 tweets/min •  2011 Japan earthquake: 66,000 tweets/min High volume •  2012 Hurricane Sandy: 20 million tweets in 5 days. Increase in data velocity: Batch, Periodic, Near real-time, Real-time, Stream. Increase in data volume: KB, MB, GB, TB, PB. Storage: File system, MySQL, Postgres, MongoDB, Apache Cassandra, Redis
  31. 31. Data Processing
  32. 32. Social Media Information Processing •  Natural Language Processing Methods – Information extraction (e.g. person, location, organization) – Classification and clustering – Automatic summarization – Semantic search – Machine translation •  Imagery content processing – Object detection & recognition – Image retrieval and filtering – Automatic annotation
  33. 33. Supervised Classification. Event timeline: 1. Data collection (for example using keywords, hashtags, etc.) 2. Human annotations on sample data (humans alone cannot process large amounts of data, so we only use them to help process a subset) 3. Machine training (we train the machine using human input to automatically process large data at high speed) 4. Classification
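
The four steps above can be sketched end to end in a few lines. This is a minimal illustration, assuming a tiny multinomial Naive Bayes classifier with add-one smoothing; the sample tweets, labels, and function names are invented for the example, and Naive Bayes is a stand-in chosen for brevity (the AIDR architecture slide names Random Forest).

```python
import math
from collections import Counter, defaultdict

def train(labeled):
    """labeled: list of (text, label) pairs annotated by humans."""
    word_counts = defaultdict(Counter)   # label -> word frequencies
    label_counts = Counter()             # label -> number of examples
    for text, label in labeled:
        label_counts[label] += 1
        word_counts[label].update(text.lower().split())
    vocab = {w for counts in word_counts.values() for w in counts}
    return word_counts, label_counts, vocab

def classify(model, text):
    """Return the label with the highest smoothed log-probability."""
    word_counts, label_counts, vocab = model
    total = sum(label_counts.values())
    best_label, best_score = None, -math.inf
    for label, n in label_counts.items():
        denom = sum(word_counts[label].values()) + len(vocab)
        score = math.log(n / total)
        for w in text.lower().split():
            if w in vocab:               # ignore words never seen in training
                score += math.log((word_counts[label][w] + 1) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label

# Steps 1-2: a collected sample with (hypothetical) human annotations.
sample = [
    ("bridge collapsed near the river", "damage"),
    ("building damaged by the flood", "damage"),
    ("volunteers needed to donate food", "donation"),
    ("please donate water and blankets", "donation"),
]
model = train(sample)                                  # step 3
print(classify(model, "school building collapsed"))    # step 4
```
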
  34. 34. Data Stream Processing 1.  Data items arrive online 2.  Streams have infinite length and are unbounded in size 3.  No control over the order in which data items arrive 4.  Processed items are either discarded or archived 5.  No retrieval unless stored in memory (often small size). Examples of data streams: credit card fraud detection, sensor data classification, social media stream mining
  35. 35. Traditional vs. Stream Processing. Property: Traditional system / Stream processing system. Number of passes: Multiple / Single. Memory availability: Unlimited / Restricted. Processing time: Unlimited / Restricted. Results availability: Delayed / Real-time. Results reliability: Accurate / Improvable
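
The single-pass, restricted-memory constraints in the comparison above can be illustrated with reservoir sampling, a standard technique for keeping a fixed-size uniform sample of a stream of unknown length. This is a generic sketch and not something the slides attribute to any particular system; the function name and the synthetic stream are assumptions.

```python
import random

def reservoir_sample(stream, k, seed=0):
    """Keep a uniform random sample of k items from a stream seen once."""
    rng = random.Random(seed)
    sample = []
    for i, item in enumerate(stream):
        if i < k:
            sample.append(item)          # fill the reservoir first
        else:
            j = rng.randint(0, i)        # inclusive on both ends
            if j < k:
                sample[j] = item         # keep item with probability k/(i+1)
    return sample

# A generator can be consumed only once, like a live stream.
tweets = (f"tweet-{n}" for n in range(10_000))
kept = reservoir_sample(tweets, k=5)
print(len(kept))
```

Memory use stays at k items no matter how long the stream runs, which is the trade the table describes: a single pass and bounded memory in exchange for an approximate view of the data.
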
  36. 36. Pure Stream Processing and Issues •  Rely entirely on automated algorithms •  SM data streams can be imprecise, highly variable, and often unseen –  Concept drift: happens due to slow changes in the concepts –  Concept evolution: happens due to the presence of unknown classes. Aurora Stream Processing (Brown University). Flu pandemic 2009
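
One simple way to notice concept drift is to track a classifier's accuracy over a sliding window and compare it with an earlier baseline. The sketch below assumes that a correctness signal is available for each item (for example from human-verified labels); the `DriftMonitor` class, the window size, and the tolerance are all hypothetical choices, and established drift detectors are more sophisticated than this.

```python
from collections import deque

class DriftMonitor:
    """Flags possible drift when recent accuracy falls below a baseline."""

    def __init__(self, window=50, tolerance=0.15):
        self.recent = deque(maxlen=window)  # last `window` outcomes (1 or 0)
        self.baseline = None                # accuracy of the first full window
        self.tolerance = tolerance

    def update(self, correct: bool) -> bool:
        self.recent.append(1 if correct else 0)
        if len(self.recent) < self.recent.maxlen:
            return False                    # not enough evidence yet
        accuracy = sum(self.recent) / len(self.recent)
        if self.baseline is None:
            self.baseline = accuracy        # first full window sets baseline
            return False
        return accuracy < self.baseline - self.tolerance

monitor = DriftMonitor(window=10, tolerance=0.25)
outcomes = [True] * 10 + [False] * 5        # accuracy degrades at the end
flags = [monitor.update(o) for o in outcomes]
print(flags.index(True))
```

A flag from such a monitor would only signal that the model may need retraining; it does not distinguish concept drift from concept evolution.
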
  37. 37. Crowdsourced Stream Processing (CSP). In cases where critical (in terms of cost, time or reliability) decision-making needs to take place in real-time, based on data streams that are potentially noisy and unseen, fully automated stream processing systems do not meet the needs. Figure labels: Stream processing systems (SPs); Crowdsourcing systems (Cs); Crowdsourced stream processing systems (CSPs); Human processing role; Automatic processing role; Composition; Binary classification; N-ary classification; Open-ended; Computation; Filtering; Task-generation; Task-assignment; Task-aggregation; Serial; Parallel; Complex; Hierarchical taxonomy; Faceted taxonomy; System. Ref. Imran, Muhammad, Ioanna Lykourentzou, Yannick Naudet, and Carlos Castillo. "Engineering crowdsourced stream processing systems." arXiv preprint arXiv:1310.5463.
  38. 38. http://aidr.qcri.org/ AIDR (Artificial Intelligence for Disaster Response) is a free, open, and easy-to-use platform to automatically filter and classify relevant tweets posted during humanitarian crises. Steps: 1. Collect 2. Curate 3. Classify. Grand Prize Winner of the Open Source Software World Challenge 2015
  39. 39. Near Real-time Processing (online approach). Steps: 1. Data collection 2. Human annotations 3. Machine training 4. Classification. During the first few hours, data collection runs continuously while successive rounds of human annotation (Human annotation-1, -2, -3, …, -n) feed successive rounds of learning (Learning-1, -2, -3, …, -n), which support classification of data and the decision-making process
  40. 40. Data Classification. Goal: To find relevant and actionable information in near real-time. Growing stack of data. Filter failure. Apply machine learning. Need human-labeled examples. Apply crowdsourcing. AIDR: Machine Learning + Crowdsourcing
  41. 41. Real-time Classification of Social Media Data http://aidr.qcri.org/
  42. 42. AIDR Architecture. Components: Twitter streaming API; Tweets collector; Features extractor (uni-grams, bi-grams, information gain); Classifier and Learner (Random Forest, decision trees; model parameters); Task generator (task selection, task prioritization); Annotator (human-in-the-loop, crowdsourcing); Output adapters. Messages: query; tweets; 〈tweet〉; 〈tweet, features〉; 〈task〉; 〈task, label〉; 〈tweet, label, confidence〉. Legend: P/S = Redis channel; Q = Redis queue; load shedding. Database: Postgres. Application layer: Java EE, RESTful services, Weka machine learning library. Data flow and control flow: Redis. Front-end: ExtJS (JavaScript library)
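
Two pieces of the architecture above lend themselves to a short illustration: the uni-gram/bi-gram feature extractor and load shedding on a bounded queue. The sketch below is written in Python for brevity (the slide lists Java EE and Redis for the real system), and every name in it is hypothetical. In particular, dropping the oldest item when the queue is full is an assumption; the slide does not say which items AIDR sheds.

```python
from collections import Counter, deque

def extract_features(tweet: str) -> Counter:
    """Count uni-grams and bi-grams in a whitespace-tokenized tweet."""
    tokens = tweet.lower().split()
    bigrams = [" ".join(pair) for pair in zip(tokens, tokens[1:])]
    return Counter(tokens + bigrams)

class SheddingQueue:
    """Bounded queue that drops the oldest item when full (load shedding)."""

    def __init__(self, capacity: int):
        self.items = deque(maxlen=capacity)
        self.dropped = 0

    def put(self, item):
        if len(self.items) == self.items.maxlen:
            self.dropped += 1            # deque evicts the oldest item
        self.items.append(item)

queue = SheddingQueue(capacity=2)
for text in ["Bridge closed", "Need water now", "Shelter open downtown"]:
    queue.put((text, extract_features(text)))
print(queue.dropped, [t for t, _ in queue.items])
```

Shedding keeps latency bounded when tweets arrive faster than downstream components can process them, at the cost of losing some items.
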
  43. 43. Data Collection in AIDR (Twitter). Collection details dashboard http://aidr.qcri.org/ Geographical region filter. Language filter. Collection setup
  44. 44. Data Classification Approach: 1. Filtering 2. Classification 3. Extraction
  45. 45. 1. Filtering. Is the item event-related? (Yes/No) Does it contribute to situational awareness? (Yes/No)
  46. 46. 2. Classification. Filtered tweets are classified into categories: Caution & Advice, Information Sources, Damage & Casualties, Donations, Health, Shelter, Food, Water, Logistics, ...
  47. 47. http://aidr.qcri.org/ Setting up Classifiers
  48. 48. AIDR – Classifier Setting (cont.) http://aidr.qcri.org/
  49. 49. Human Annotation in AIDR. Internal Tagging Interface http://aidr.qcri.org/
  50. 50. Human Annotation Using MicroMappers. MicroMapper Interface (web clicker) http://aidr.qcri.org/ Mobile clicker
  51. 51. Tagged Items and Machine Output http://aidr.qcri.org/ Training examples. Classifiers’ output
  52. 52. Quality, Cost, and Performance of AIDR
  53. 53. Quality vs. Cost in AIDR http://aidr.qcri.org/ Goals: Maximize quality – Minimize cost •  Quality •  Classification accuracy •  Precision/AUC •  Cost to obtain labeled data •  Monetary in case of paid workers •  Time in case of volunteers
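
The quality measures named above have compact definitions. The sketch below computes precision from hard predictions and AUC from scores using the pairwise-ranking definition; the labels and scores are made-up illustration data, not AIDR results, and the functions assume binary labels with both classes present.

```python
def precision(y_true, y_pred):
    """Fraction of predicted positives that are truly positive."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if p == 1 and t == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if p == 1 and t == 0)
    return tp / (tp + fp) if tp + fp else 0.0

def auc(y_true, scores):
    """Probability that a random positive outranks a random negative."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    # Ties between a positive and a negative count as half a win.
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

y_true = [1, 0, 1, 1, 0]                    # hypothetical gold labels
scores = [0.9, 0.8, 0.7, 0.4, 0.2]          # hypothetical classifier scores
y_pred = [1 if s >= 0.5 else 0 for s in scores]
print(precision(y_true, y_pred), auc(y_true, scores))
```

Precision depends on the chosen decision threshold (0.5 here), whereas AUC summarizes ranking quality across all thresholds.
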
  54. 54. Quality vs. Cost in AIDR http://aidr.qcri.org/ Quality vs. cost using passive learning, with and without de-duplication. Quality vs. cost using active learning, with and without de-duplication
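
De-duplication and active learning both aim to spend the labeling budget on items that teach the classifier something new. The sketch below assumes exact-match de-duplication on normalized text and uncertainty sampling (picking items whose predicted probability is closest to 0.5); the slides do not specify which de-duplication or active-learning strategy AIDR uses, so both are illustrative assumptions, as are the probabilities.

```python
def deduplicate(tweets):
    """Drop tweets whose normalized text has already been seen."""
    seen, unique = set(), []
    for text in tweets:
        key = " ".join(text.lower().split())   # case and spacing insensitive
        if key not in seen:
            seen.add(key)
            unique.append(text)
    return unique

def select_for_labeling(scored, budget):
    """Pick the items whose positive-class probability is closest to 0.5."""
    ranked = sorted(scored, key=lambda pair: abs(pair[1] - 0.5))
    return [text for text, _ in ranked[:budget]]

tweets = ["Bridge is closed", "bridge  is CLOSED", "Need water", "Nice sunset"]
unique = deduplicate(tweets)
# Hypothetical classifier probabilities for the unique tweets.
probs = {"Bridge is closed": 0.95, "Need water": 0.55, "Nice sunset": 0.10}
picked = select_for_labeling([(t, probs[t]) for t in unique], budget=1)
print(picked)
```

Passive learning would instead label items in arrival order, which is the baseline the slide compares against.
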
  55. 55. Performance http://aidr.qcri.org/ In terms of throughput and latency. Latency of feature extractor, classifier, and the system. Throughput of feature extractor, classifier, and the system
  56. 56. Processing Evolving Data Streams
  57. 57. Data Stream Processing 1.  Data items in the stream arrive online 2.  Streams have infinite length and are unbounded in size 3.  No control over the order in which data items arrive 4.  Processed items are either discarded or archived 5.  No retrieval unless stored in memory (often small size). Examples of data streams: credit card fraud detection, sensor data classification, social media stream mining
  58. 58. Types of Changes in SM Streams. Types of stream drifts: Concept drift (class boundaries change over time); Feature evolution (feature subspace may change, new features appear, feature distribution changes); Concept evolution (novel classes emerge, recurrent novel classes re-appear)
  59. 59. Types of Changes in Streaming Data. Except Noise and Blip, all the presented changes are treated as concept drift and require model adaptation. Ref. Brzeziński, Dariusz. "Mining data streams with concept drift." Master’s thesis, Poznan University of Technology, 2010.
  60. 60. Information Variability on Social Media •  Different events present different information categories •  Even for recurring events, category proportions change
  65. 65. Social Media Data Streams Classification. Two major issues in the supervised classification of social media streams: 1.  How to keep the categories used for classification up-to-date? 2.  While adding new categories, how to maintain high classification accuracy? Figure labels: (a) Split automatic/manual processing; (b) Detect-verify paradigm; (c) Improving quality through active learning; input; automatic processing; output; by crowd; performing verification; providing training data
  66. 66. Identification of Novel Categories. Classes: -  Injured people -  Infrastructure damage -  Shelter needs -  Donation requests -  Missing or stranded people -  Different health issues -  Novel urgent needs like -  Blankets -  Medicine -  Schools shut -  Airport closed/open -  … Pre-defined classes. Unseen classes (Miscellaneous). Keep in mind we have a new class “Miscellaneous”
  67. 67. Expert-Machine-Crowd Setting. Constraints Outlier Detection (COD-Means): 1.  Constraints formation using classified items 2.  Clustering using COD-Means 3.  Labeling errors identification (using outlier detection). Figure labels: List of categories; documents stream; Supervised Learning System; Novel Categories Detector Using COD-Means; Crowdsourcing task generator; Emerging novel categories; Crowdsourcing tasks to be labeled by crowd; An expert; Crowd workers; Crowd/machine classified items (machine classified items with confidence score >= 0.90); Incoming uncategorized documents stream; Machine categorized items as (item, category, machine confidence score) triplets; Refined training set; Human labels; Labels; steps 1 to 4
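
The figure mentions that only machine-classified items with a confidence score of at least 0.90 are passed on. A minimal sketch of that thresholding over (item, category, confidence) triplets follows; the `route` function and the sample triplets are hypothetical, and what the system does with items below the threshold is not stated on the slide.

```python
CONFIDENCE_THRESHOLD = 0.90  # value taken from the slide

def route(triplets, threshold=CONFIDENCE_THRESHOLD):
    """Split (item, category, confidence) triplets by machine confidence."""
    confident, uncertain = [], []
    for item, category, confidence in triplets:
        target = confident if confidence >= threshold else uncertain
        target.append((item, category, confidence))
    return confident, uncertain

# Hypothetical machine-categorized items.
stream = [
    ("Bridge on 5th street collapsed", "Infrastructure damage", 0.97),
    ("We need blankets in the camp", "Miscellaneous", 0.62),
    ("Volunteers wanted this weekend", "Donation requests", 0.90),
]
confident, uncertain = route(stream)
print(len(confident), len(uncertain))
```
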
  68. 68. Input and Output. INPUT: Category A, Category B, Category C, Miscellaneous Z. OUTPUT: Category A’, Category B’, Category C’, Z1, Z2, Z’
  69. 69. Constraints Formation 1. Items in the same category have Must-link constraints 2. Items belonging to different categories have Cannot-link constraints. Figure labels: Category A, Category B, Category C, Category Z; Must-link; Cannot-link. Note: Items in Z do not have any constraints
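
The two rules above translate directly into pairwise constraints. The sketch below generates must-link and cannot-link pairs from labeled items and leaves items in Z unconstrained, as the note says; the function name and the item identifiers are hypothetical.

```python
from itertools import combinations

def form_constraints(labeled, unconstrained="Z"):
    """Build must-link and cannot-link pairs from (item_id, category) pairs."""
    must_link, cannot_link = [], []
    # Items in the miscellaneous category Z carry no constraints.
    constrained = [(i, c) for i, c in labeled if c != unconstrained]
    for (a, cat_a), (b, cat_b) in combinations(constrained, 2):
        if cat_a == cat_b:
            must_link.append((a, b))      # same category
        else:
            cannot_link.append((a, b))    # different categories
    return must_link, cannot_link

items = [(1, "A"), (2, "A"), (3, "B"), (4, "Z")]
ml, cl = form_constraints(items)
print(ml, cl)
```

The number of pairs grows quadratically with the number of labeled items, so a practical system would likely sample constraints rather than enumerate all of them.
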
  70. 70. Objective Function. Standard distortion error. If an ML constraint is violated, then the cost of the violation is equal to the distance between the two centroids that contain the instances. If a CL constraint is violated, then the error cost is the distance between the centroid c assigned to the pair and its nearest centroid h(c).
  71. 71. Assignment and Update Rules. Rule 1: For items without any constraints (standard distortion error). Rule 2: For items with Must-link constraints; the cost of violation is the distance b/w their centroids. Rule 3: For items with Cannot-link constraints; the cost is the distance b/w centroid c and its nearest centroid. The Kronecker delta function is 1 if x=y and 0 if x != y. Update rule: The update rule computes a modified average of all points that belong to a cluster.
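
The equations on these two slides did not survive extraction. As a reading aid, the verbal description can be written as a constrained clustering objective. The form below is a reconstruction from the text only, and all notation is an assumption: $\pi_j$ is cluster $j$ with centroid $c_j$, $g(x)$ is the index of the cluster assigned to $x$, $h(j)$ is the index of the centroid nearest to $c_j$, $d(\cdot,\cdot)$ is the distance measure (the slide does not say whether it is squared), $\delta$ is the Kronecker delta, and $ML$ and $CL$ are the must-link and cannot-link sets. It may differ from the exact objective used in COD-Means, and it omits any outlier-related term.

```latex
E = \sum_{j=1}^{k} \Bigg[
      \sum_{x_i \in \pi_j} d(x_i, c_j)
    + \sum_{\substack{(x_i, x_a) \in ML \\ x_i \in \pi_j}}
        d\big(c_j, c_{g(x_a)}\big)\,\big(1 - \delta(g(x_i), g(x_a))\big)
    + \sum_{\substack{(x_i, x_a) \in CL \\ x_i \in \pi_j}}
        d\big(c_j, c_{h(j)}\big)\,\delta(g(x_i), g(x_a))
    \Bigg]
```

The first term is the standard distortion error (Rule 1), the second is non-zero only when a must-link pair is split across clusters (Rule 2), and the third is non-zero only when a cannot-link pair lands in the same cluster (Rule 3).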
  72. 72. COD-Means Algorithm. 1. Initialization (e.g. random pick of k centroids) 2. Assignment of items based on the 3 assignment rules, considering ML and CL constraints 3. Points in each cluster are sorted based on their distance to the centroid, and the top l are removed and inserted into L
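
Step 3 of the algorithm can be sketched on its own. The code below assumes Euclidean distance and already-computed clusters and centroids; it covers only the outlier-removal step and leaves out the constraint-aware assignment of step 2. All names and data points are illustrative.

```python
import math

def remove_outliers(clusters, centroids, l):
    """For each cluster, move the l points farthest from the centroid to L."""
    outliers, kept = [], {}
    for name, points in clusters.items():
        # Sort points from farthest to nearest relative to the centroid.
        ranked = sorted(points, key=lambda p: math.dist(p, centroids[name]),
                        reverse=True)
        outliers.extend(ranked[:l])      # top-l farthest points go into L
        kept[name] = ranked[l:]
    return kept, outliers

clusters = {"A": [(0, 0), (1, 0), (9, 9)], "B": [(5, 5), (5, 7)]}
centroids = {"A": (0.5, 0.0), "B": (5.0, 5.5)}
kept, L = remove_outliers(clusters, centroids, l=1)
print(L)
```

In the full algorithm the centroids would then be recomputed without the removed points, and the assignment and removal steps would repeat until convergence.
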
  73. 73. Dataset and Experiments 1.  Are the new clusters identified by the COD-Means algorithm genuinely different and novel? 2.  What is the nature of outliers (labeling errors) discovered by the COD-Means algorithm? Are they genuine outliers? 3.  What is the impact of outliers on the quality of clusters generated by COD-Means? 4.  Once refined clusters (without labeling errors) are used in the training process, does the overall accuracy improve? 8 disaster-related datasets from Twitter were used
  74. 74. Clusters Novelty and Coherence. K-Means vs. COD-Means •  The proposed approach generates more cohesive and novel clusters by removing outliers •  As the value of L increases, tighter and more coherent clusters emerge
  75. 75. Data Improvements Evaluation. [Figure: precision (0 to 1) per category (Affected individuals, Caution and advice, Donations and volunteering, Infrastructure and utilities, Sympathy and support, Misc. to other categories) for eight events: 2012 Colorado Wildfires, 2013 Alberta Floods, 2013 Boston Bombings, 2013 Colorado Floods, 2013 Train Crash, 2013 Australia Bushfire, 2013 Queensland Floods, 2013 West Texas Explosion. Panels: 1. Labeling errors in non-miscellaneous categories; 2. Items incorrectly labeled as miscellaneous]
  76. 76. Impact on Classification Performance
  77. 77. Social Media Image Processing: An Application of Computer Vision
  78. 78. “A picture is worth a thousand words.”
  79. 79. Research Goals •  Social media image filtering – Real-time image retrieval, processing, and storage – Duplicate or near-duplicate detection – Irrelevant image detection •  Actionable information extraction – Infrastructure damage assessment – Injured people detection
  80. 80. Automatic Image Processing Pipeline. Dat Tien Nguyen, Firoj Alam, Ferda Ofli, Muhammad Imran. Automatic Image Filtering on Social Networks Using Deep Learning and Perceptual Hashing During Crises. Accepted for publication at the 14th International Conference on Information Systems for Crisis Response And Management (ISCRAM). 2017 Albi, France.
  81. 81. Disaster Datasets (Twitter). Dataset details for all four disaster events with their year and number of images. Number of labeled images for each dataset and each damage category
  82. 82. Relevancy Filtering. Examples of irrelevant images showing cartoons, banners, advertisements, celebrities, etc. Performance of the relevancy filtering. Task: Build a binary classifier. Approach: Transfer learning (fine-tune a pre-trained convolutional neural network, e.g., VGG16*) * Simonyan, K. and Zisserman, A. (2014). “Very deep convolutional networks for large-scale image recognition”. In: arXiv preprint arXiv:1409.1556
  83. 83. Duplicate Filtering. Examples of near-duplicate images. Task: Compute similarity between a pair of images. Approach: Perceptual Hash* + Hamming Distance (w/ threshold) * Lei, Y. et al. (2011). “Robust image hash in Radon transform domain for authentication”. In: Signal Processing: Image Communication 26.6, pp. 280–288.
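
The hash-plus-Hamming-distance idea can be shown with a much simpler hash than the one cited. The sketch below uses an average hash (one bit per pixel, set when the pixel is brighter than the image mean) as a stand-in for the Radon-transform hash of Lei et al.; the tiny 2×2 "images", the function names, and the threshold value are all assumptions made for illustration. A real pipeline would first convert each image to grayscale and downscale it to a small fixed size.

```python
def average_hash(pixels):
    """pixels: 2D list of grayscale values; returns a list of 0/1 bits."""
    flat = [v for row in pixels for v in row]
    mean = sum(flat) / len(flat)
    return [1 if v > mean else 0 for v in flat]

def hamming(a, b):
    """Number of positions where two equal-length hashes differ."""
    return sum(x != y for x, y in zip(a, b))

def is_near_duplicate(img_a, img_b, threshold=2):
    """Treat two images as near-duplicates if their hashes are close."""
    return hamming(average_hash(img_a), average_hash(img_b)) <= threshold

original   = [[10, 200], [220, 30]]
brightened = [[25, 215], [235, 45]]   # same picture, uniformly brighter
different  = [[200, 10], [30, 220]]
print(is_near_duplicate(original, brightened),
      is_near_duplicate(original, different))
```

Because the hash compares each pixel with the image's own mean, a uniform brightness change leaves it unchanged, which is the kind of robustness that makes perceptual hashes suitable for near-duplicate detection.
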
  84. 84. Before/After Image Filtering. Number of images that remain in our dataset after each image filtering operation: ~ 2 % ~ 2 % ~ 50 % ~ 58 % ~ 50 % ~ 30 %
  85. 85. Before/After Image Filtering. Number of images that remain in our dataset after each image filtering operation: ~ 2 % ~ 2 % ~ 50 % ~ 58 % ~ 50 % ~ 30 %. Assuming that tagging an image costs $1, we could have gotten the same job done by paying $17k less, saving almost 2/3 of the budget!
  86. 86. Infrastructure Damage Assessment •  Three-class classification – Categories: severe, mild & little-to-none •  Distinction between categories is ambiguous. •  Agreement among human annotators is low. –  in particular for the mild category •  Fine-tuning a pre-trained CNN (e.g., VGG16)
  87. 87. AIDR SMS Processing AIDR Helps Answer Thousands of Health Queries
  88. 88. Public Health: AIDR + UNICEF Zambia. Manual processing and routing of SMS. Counselors (experts of HIV, STIs). SMS service. Vulnerable people. Steps 1 to 6
  90. 90. New Scientist Featured This Work
  91. 91. Media Coverage
  92. 92. Domain Adaptation/Transfer Learning. Ongoing Work. Ability of a system to apply knowledge and skills learned in previous domains to novel domains. Our Goal: To build a system that can understand natural language
  93. 93. Domain Adaptation. Labeled source, but unlabeled target. Training: input documents (blue domain, source event data), feature extractor, feature vectors and labels, machine learning algorithm, classifier model. Prediction: input documents (orange domain, target event data), feature extractor, feature vectors, classifier model, machine classified items
  94. 94. Same Domain Learning. Training data (infer), machine learning model (predict), testing data. Source domain: Apples. Target domain: Apples. BUT when the target domain is Oranges: different shapes, colors, skins, tastes, etc.
  95. 95. Crisis-related Data Classification. Training data (infer), machine learning model (predict), testing data. Source domain and target domain events: Italy earthquake, Queensland floods, Sandy hurricane, Costa Rica earthquake, Colorado floods, Typhoon Haiyan. Different events, languages, needs, etc.
  96. 96. Domain Adaptation
  97. 97. Model Adaptation Evaluation •  Model adaptation using single source – Using both: in-domain and cross-domain •  Model adaptation using multiple sources – In-domain – Multiple source events without the target – Multiple source events with the target •  Model adaptation in special cases – Same languages – Similar languages
  98. 98. Transfer Learning. Differences in classification tasks: •  Different classification tasks •  Different types of disasters, stakeholders, information needs. Task: •  Learn from source to classify target •  Semantic similarity between tasks •  Zero-shot learning (no training examples) •  One-shot learning (few training examples)
  99. 99. Summarization and Prioritization of Actionable Information. Information needs & problem: •  Different stakeholders •  Different goals, requirements, and info. needs. General situational awareness vs. target situational awareness •  High-level general updates from an event •  Specific updates (infrastructure damages)
  100. 100. Information Summarization In Real-Time. Classified documents stream: Class A, Class B, Class C, Class D, each with its own summary
  101. 101. Resources, Datasets, And Tools
  102. 102. Towards Standard Baselines and Datasets. CrisisNLP.qcri.org -  Access to 52 million tweets -  Around 50k tweets labeled into humanitarian categories -  Largest word2vec embeddings trained on 52m crisis-related tweets -  Out-of-vocabulary dictionaries -  Tweets downloader
  103. 103. ACM Computing Surveys. Processing Social Media Messages in Mass Emergency: A Survey [Imran et al. 2015]
  104. 104. 27 Free Data Mining Books http://www.datasciencecentral.com/profiles/blogs/27-free-data-mining-books
  105. 105. Special Issues. Organizing Editors: Christian Reuter (University of Siegen), Muhammad Imran (Qatar Computing Research Institute), Amanda Hughes (Utah State University), Starr Roxanne Hiltz (New Jersey Institute of Technology), Linda Plotnick (Jacksonville State University). Special Issue on “Exploitation of Social Media for Emergency Relief and Preparedness”. Deadline: July 1st 2017. Marie-Francine Moens, KU Leuven, Belgium; Gareth Jones, Dublin City University, Ireland; Muhammad Imran, Qatar Computing Research Institute; Saptarshi Ghosh, IIT Kharagpur, India; Kripabandhu Ghosh, IIT Kanpur, India; Debasis Ganguly, IBM Research Labs, Dublin, Ireland; Tanmoy Chakraborty, University of Maryland, College Park, USA
  106. 106. Conclusions •  Information bestows power for disaster response –  People need information as much as water, shelter, and food –  Disasters are unavoidable, but planning can lessen their effects •  Social media as time-critical information source –  Early warnings, event detection, event monitoring –  Availability of information opens new opportunities •  Artificial Intelligence for Social Good –  Applied research at its best –  AI + humans-in-the-loop can enable rapid crisis response –  AI techniques useful for: •  Situational awareness •  Actionable information extraction •  Summarization
  107. 107. THANK YOU! CrisisNLP.qcri.org AIDR.qcri.org Email: mimran@hbku.edu.qa Homepage: http://mimran.me Twitter: @mimran15
