Crisis Mapping, Citizen Sensing and Social Media Analytics 
Hemant Purohit 
Amit Sheth 
Carlos Castillo 
Patrick Meier 
Outline 
•Introduction 
•Gaps & Challenges 
•Role of Computer Science 
•Applied Crisis Computing 
•DM is not the same as D...
Applied Crisis Computing Example to Assist Coordination: Donations Matching
Thanks, But No Thanks … 
•Many people want to donate during disasters 
•Waste occurs due to resources being over- or under-...
Matching requests with offers 
How to volunteer, donate to Hurricane Sandy: <URL> 
If you have clothes to donate to those ...
RT @OpOKRelief: Southgate Baptist Church on 4th Street in Moore has food, water, clothes, diapers, toys, and more. If you ...
A supervised learning approach
Information extraction: core & facets 
•Core of the phrase is the “what”
•Other facets may include “who”, “where”, “when”,...
Statistics
Some example matches [naïve method] 
•Pair 1: 
•Anyone know of volunteer opportunities for hurricane Sandy? Would like to ...
Much work remains to be done 
•Matching quality depends on type of donation 
•Improvements on item representation are nece...
Objective: Support Decision Making and Coordination of Actions
An analogy: product comparison sites 
•What product comparison sites do today 
•Collect pieces of information having diver...
First: extract facets from unstructured text 
•Collect messages 
•Classify according to several ontologies 
•Not only cont...
Second: manage data and enable faceted retrieval 
•Support real-time insertions 
•Must be visible immediately 
•Support re...
Third: discover relationships and clusters 
•Clustering 
•Near-duplicate detection 
•Same event/story/etc. 
•Data-driven g...
Fourth: enable high-level operations 
•Summarization [static] 
•Synthesize/extract a high-level description from a set of ...
Focus on decision making and coordination 
•Do not start by thinking about data visualization
•Data visualization is constra...
Example questions during decision-making by actors 
(a.) Seeker/Demander 
•Whom to follow (provider) 
•Where to find resou...
Open problems
Data availability: chicken-or-egg problem 
http://www.vtaide.com/ 
People’s posts don’t include some data 
Because nobody ...
The semantic gap 
•Introduced ca. 1989 in the context of multimedia retrieval 
•Low-level features are far from high-level...
•Vertical operators facilitate transcending from data-information-knowledge-wisdom using background knowledge
•Horizontal...
Semantic gap: ML/DM/NLP/IR/… 
•Automatic methods for classifying and extracting information from short pieces of text are ...
Intentions: chicken-or-egg problem 
http://www.vtaide.com/ 
Some types of coordination do not often happen online 
Because...
Fine-grained analysis of intentions 
•People go online during disasters for a variety of reasons 
•How good is our underst...
Towards a generic crisis response ontology 
•UN effort on generic ontology (taxonomy and relationships) 
•HXL (Humanitaria...
Continuously-evolving models 
•How do we capture the existing knowledge evolving around an event 
Moore is a suburb of Okl...
Outline 
•Introduction 
•Gaps & Challenges 
•Role of Computer Science 
•Applied Crisis Computing 
•Design Principles 
•New...
Principle 1: Explicitly identify target users 
•This may not be a homogeneous group
•Identify profiles 
•Background, ski...
Target users: examples 
•Headquarters Humanitarians 
•Policy, Information Products, Coordination 
•Field Humanitarians 
•L...
Headquarters
The Field
Digital Humanitarians
Principle 2: Engage users in co-design
•Do not let them offload requirements and then leave 
•We want them to co-design ...
Principle 3: Socio-technical systems 
•Conceptualize the system as hybrid (human and computer intelligence) from the begin...
Principle 4: Empirical evaluation through actions 
•We want systems that look good and are easy to use 
•We do not evaluat...
There is a part for everybody in this community 
Hackers, scientists, humanitarians, everybody.
Hackers 
•Create and curate useful datasets 
•Create dataset remixes 
•Create software tools 
•Create libraries 
•Create i...
Computer scientists 
•There are many open problems in ML, DM, NLP, etc. 
•Collaborating and partnering in humanitarian com...
Social scientists 
•There are many open questions about how and why people coordinate, how to motivate them, what informatio...
Humanitarian organizations 
•It takes two to tango! 
•Your scientific partners are not providers/vendors 
•Scientists want...
Everybody 
•Interdisciplinary research is not easy to execute 
•But a unidirectional approach will create only more gaps ...
Thanks to 
•National Science Foundation (NSF) for SoCS project grant: Social Media Enhanced Organizational Sensemaking in Em...
Questions, Discussion and Feedback 
•References and reading material: 
•http://www.knoesis.org/hemant/present/icwsm2013 
•...
ICWSM 2013 tutorial: Crisis Mapping, Citizen Sensing and Social Media Analytics for Response Coordination


1.) Tutorial at the International AAAI Conference on Weblogs and Social Media, ICWSM-2013 (http://www.icwsm.org/2013/program/tutorial/), giving an in-depth view of Crisis Mapping, Citizen Sensing and Social Media Analytics, presented by Kno.e.sis, Wright State University and the Qatar Computing Research Institute (QCRI).
2.) It focuses on leveraging citizen roles for crisis response coordination and urges everyone to join hands together for improving crisis computing for response coordination-- hackers, computer scientists, social scientists and humanitarians.

3.) Description: (http://www.knoesis.org/hemant/present/icwsm2013)
With the explosion in social media (1B+ Facebook users, 500M+ Twitter users) and ubiquitous mobile access (6B+ mobile phone subscribers), people share their observations and opinions at a scale that gives us unprecedented opportunities to extract social signals, create spatio-temporal mappings, perform analytics on social data, and support applications ranging from situational awareness during the crisis response, preparedness and rebuilding phases to advanced analytics that yield insights for improved decision making.

This tutorial weaves together three themes and their related topics: a.) citizen sensing and crisis mapping, b.) technical challenges and recent research for leveraging citizen sensing to improve crisis response coordination, and c.) experiences in building robust and scalable platforms/systems. It couples technical insights with identification of computational techniques and algorithms along with real-world examples. We also demonstrate three exemplary platforms, Sahana, Crowdmap and Twitris, while elaborating on the practical issues and pitfalls of developing and operating these large-scale platforms, especially during real-time crisis response.

Activities supported by NSF SoCS project: Social Media Enhanced Organizational Sensemaking in Emergency Response (http://knoesis.org/research/semsoc/projects/socs)

1. Crisis Mapping, Citizen Sensing and Social Media Analytics
Hemant Purohit, Amit Sheth, Carlos Castillo, Patrick Meier
The Ohio Center of Excellence in Knowledge-enabled Computing (Kno.e.sis), Wright State, USA
Qatar Computing Research Institute (QCRI), Doha, Qatar
Leveraging Citizen Roles for Crisis Response Coordination
2. Introduction: Kno.e.sis and QCRI
•At Kno.e.sis: NSF SoCS project on ‘Social Media Enhanced Organizational Sensemaking during Emergency Response’
•At QCRI: ‘Artificial Intelligence for Disaster Response’ (AIDR) project for Social Innovation
3. Outline
•Introduction
•Gaps & Challenges
•Role of Computer Science
•Applied Crisis Computing
•Design Principles
4. 7.0 Magnitude Earthquake
5. Motivation -P
6. Ushahidi Map for Haiti
7. EMERGENCY HACKATHONS AFTER HAITI DEVASTATION .. Thousands of miles away!
8. “YOUR SITE HELPED SAVE HUNDREDS OF LIVES” -US MARINE CORPS
9. Citizen Sensing
10. Jakarta twitter map
11. Digital Footprints of Twitterers .. Pulse of the planet
12. FEMA Task Force Haiti Twitter
13. Why do we care about Citizen Sensing?
-It forms self-organizing communities!
14. Crisis Response Coordination
UN Cluster system
15. We all need to join hands to effectively improve response coordination!
-Humanitarians
-Computer and Social Scientists
16. -Big Data in crisis situations needs computing help!
-Humanitarians alone can’t handle it!
17. Outline
•Introduction
•Gaps & Challenges
  •Scale, velocity, redundancy, heterogeneity, bias, noise & verifiability
•Role of Computer Science
•Applied Crisis Computing
•Design Principles
18. Puzzle of Crisis Informatics
•What emergency-responders want?
  1. Any available prior knowledge about the impact of similar past disasters in the region?
  2. Are existing response strategies sufficient?
  3. Which factors will worsen conditions?
  4. How many fatalities? Extent of damage?
(Diagram labels: What emergency-responders want; What computer scientists can provide; What is supported by current social media data)
19. Puzzle of Crisis Informatics
•What computer scientists can provide?
  •Algorithms to detect and predict abnormal trends
  •Semantic abstraction and summarization of data
  •Human+Machine readable knowledge organization via ontologies
  •Technology to map geo-located information
  •Visual data interface for quicker comprehension
(Diagram labels as on slide 18)
20. Puzzle of Crisis Informatics
•What is supported by social media data?
  •Real-time updates on the situation
  •Textual summaries, images, videos
  •Messages about needs and offers
  •Geo-location metadata
(Diagram labels as on slide 18)
21. Crisis Response Analytics
•Three major methods of information extraction and mapping:
  •Manual feed (processed info.) based
    •e.g., most of the formal and hybrid response organizations (Red Cross, UNOCHA), Recovers.org, AIDMatrix, SparkRelief, etc.
  •Crowdsourcing with limited automation
    •e.g., Crowdmap/Ushahidi, etc.
  •Automated processing based
    •e.g., Twitris, CrisisTracker, etc.
•Information management for resource coordination:
  •e.g., Sahana
22. Illustrative Crisis Informatics Projects
Project | Host Team | Focus
Sahana | Univ. of Maryland | Information Management
EPIC (Tweak-the-Tweet) | Univ. of Colorado and UC Irvine | Information extraction and behavioral aspects in response
NSF SoCS | Kno.e.sis, Wright State Univ. and Ohio State Univ. | Organizational sensemaking and Coordination
AIDR | QCRI, Doha | Targeted Information extraction
NSF GeoNets | Univ. of Southern California | Ad hoc Geospatial Data Sharing
Note that this is not an exhaustive list; see more resources here: http://wiki.knoesis.org/index.php?title=Summary_about_Social_Media_Research_in_Disaster/Emergency_Response_Systems&oldid=5177
23. Illustrative Crisis Mapping and Analytics tools
Tool Visual Geo Mapping Human Inputs Real-time Update People to engage with Topical summary Explore data Semantics
CrowdMap (Ushahidi) Y Y Y Y
Sahana Y Y Y Y Y
AIDMatrix Y Y
Recovers.org Y Y
SparkRelief Y Y Y
Twitris* Y Y Y Y Y
Crisis Tracker* Y Y Y
*Social Media driven
Note that this is not an exhaustive list; see more resources here: http://wiki.knoesis.org/index.php?title=Summary_about_Social_Media_Research_in_Disaster/Emergency_Response_Systems&oldid=5177
24. Tools: Sahana
•A free & open source portable web tool for Disaster Management
•Features:
  •Organization Registry
    •Maintains data (contact, services, etc.) of organizations and volunteers in response
  •Missing Persons / Disaster Victim Registry
    •Helps track and find missing, deceased, injured and displaced people and families
  •Request Management
    •Tracks all requests and helps match pledges for support, aid and supplies to fulfilment
  •Shelter Registry
    •Tracks data on all temporary shelters set up following the disaster
More: http://www.slideshare.net/skbohra/sahana-disaster-management-system
25. Tools: Sahana (Organization registry)
26. Tools: Sahana (Requests List)
27. Tools: CrowdMap
•The well-known Ushahidi’s version
•Geo-located reports
•Crowdsourced data pieces, turned into powerful information nuggets as reports from regions
•Video:
  •http://www.youtube.com/watch?v=GjPc39OXr6I
28. Tools: CrowdMap (Overview)
http://zombiejournalism.com/2010/09/how-to-build-manage-and-customize-a-crowdmap/
29. Tools: CrowdMap (Reports)
30. Tools Demo: Twitris
•Example of automatic processing compared to the previous tool based on manual-feed processing for crisis computing
•A Semantic Social Web platform for comprehensive event analysis
•Real-time monitoring and multi-faceted analysis of social signals:
  •space, time, people, content, network, and additionally sentiment and emotion
•Platform for on-going research for situational awareness and coordination using social media and knowledge on the Web
31. Tools Demo: Twitris (Topical nugget summary)
•Important tags to summarize the Big Data flow related to the Oklahoma tornado
•Images and videos related to the Oklahoma tornado
32. Tools Demo: Twitris (Real-time information for needs)
•Incoming tweets with need types, to give a quick idea of what is needed and where, currently #OKC
•Legends for different needs #OKC
•Clicking on a tag brings contextual information: relevant tweets, news/blogs, and Wikipedia articles
33. Tools Demo: Twitris (Influencers to engage with, for specific needs)
•Influential users are shown for the respective needs. The right side shows their interaction network on social media.
•Engaging with influencers in the self-organizing communities can be very powerful for: a.) getting important information, b.) correcting rumors in the network, c.) propagating important information back into the citizen sensors community
34. Tools Demo: Twitris (In R&D: Engagement Interface for responders)
•What-Where-How-Who-Why Coordination
•Influential users to engage with and resources for seekers/supplies at a location, at a timestamp
•Contextual information for chosen topical tags
35. Tools Demo: Twitris during Oklahoma-Tornado disaster response
•Video of the on-going monitoring on the next morning of the Oklahoma Tornado:
  •http://twitris.knoesis.org/images/datasets-and-models/Twitris--for-Oklahoma-disaster.mov
•Snapshots during the analysis:
  •Images
36. Questions to social media tools for Disaster Response Coordination
•Who are the people to engage with in the evolving ad-hoc social community?
•Which needs are of utmost importance?
•Who are the resource seekers and suppliers?
•Where can I go for volunteering at my location?
•How and where can one donate?
Actionable information improves the decision making process.
37. Challenges
38. Challenge: Heterogeneity
•Multiple channels
  •Phone, fax, TV, radio, newspapers, internet, sensor networks, etc.
  •Coexistence of technologies, a constant
•Social media is heterogeneous
  •Verified accounts
  •Re-tweets from well-known sources
  •Eyewitness reports
  •Lots more!
•Different types (unstructured text, structured, multimedia) may require different tools
http://blogs.lse.ac.uk
39. Challenge: Velocity
•Social media information is more valuable in the first minutes and hours after a disaster
  •Affected people are there before anybody else
  •When emergency responders arrive, their priority may not be to keep information flowing
•After hours/days social media is still valuable, but there is much more information from other sources
•In the early hours of a disaster, television feels so slow in comparison
  •Often a few seconds of footage repeated over and over and over
http://seventhinc.com/
40. Challenge: Scale
•In some countries a sizable fraction of the population has Internet access
•Tweets are small and nimble but they point to webpages, include images, videos, etc.
•You need to process a lot to obtain a little
  •There are many tweets but
  •Only some of them contain usable information
  •Only a fraction of those can be handled by automatic systems
Top-4 countries by Twitter penetration among Internet users; by Comscore via http://5mk.co/
41. Challenge: Redundancy
•Information from multiple information channels may not be unique
•Near-duplicates frustrate users and waste their time
•Definition of abstraction level (to merge items) is always arbitrary, depends on the application
•Automatic systems tend to pick what is redundant first
  •Not necessarily a bad thing, e.g. phrases that are often repeated, tweets that are often re-tweeted, etc.
Millennials’ information sources http://ypulse.com/
42. Challenge: Biases
•Social Media Bias:
  •Younger people are heavier users than older people
  •Educated users are more present than uneducated ones
  •Technology-privileged users are more present than unprivileged ones
•Study carefully, with a grain of salt!
  •Smart sampling
  •Smart data cleaning
  •Smart algorithms
43. Challenge: Noise
•Everyone wants to be heard
  •Independently of adding any value
•Emotional expressions and even jokes drive the data traffic
•Informal text and jargon hinder automatic text processing
44. Challenge: Verifiability
•Social media users are starting to develop their own methods to validate information
•In crisis scenarios most rumors are spread by well-intentioned people
  •But there are also some pranksters
•We need a more fine-grained approach than true/false (we have always needed it)
Edelman 2012 http://edelman.com/trust
45. Outline
•Introduction
•Gaps & Challenges
•Role of Computer Science
  •IR, DM, ML, NLP, SN, HCI
•Applied Crisis Computing
•Design Principles
46. Information Retrieval (IR)
•The research field that created web search
•No problem working with subjective definitions
  •Relevance has always been in the eye of the beholder
•Can help us by providing searching and ranking operations on social media reports
47. IR Method: inverted indexes
•What does it do?
  •Allows locating documents containing a term without having to scan the whole collection
•How does it work?
  •An inverted index contains a list of terms, and a list of documents containing each term
•How can it help us?
  •Indexing a collection of reports can help us locate specific ones very quickly
Encyclopedia of Language and Linguistics
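The term-to-documents mapping described on this slide can be sketched in a few lines. This is a minimal illustration, not the implementation of any tool named in the tutorial; the function names and the sample reports are assumptions made up for the example.

```python
from collections import defaultdict

def build_inverted_index(reports):
    """Map each term to the set of report ids whose text contains it."""
    index = defaultdict(set)
    for report_id, text in reports.items():
        for term in text.lower().split():
            index[term].add(report_id)
    return index

def search(index, *terms):
    """Return ids of reports containing every query term (boolean AND)."""
    postings = [index.get(term.lower(), set()) for term in terms]
    return set.intersection(*postings) if postings else set()

# Hypothetical reports, invented for illustration.
reports = {
    1: "need water and food in moore",
    2: "shelter open in norman",
    3: "water available at the church in moore",
}
index = build_inverted_index(reports)
matches = search(index, "water", "moore")
```

A lookup touches only the posting sets of the query terms, so the cost does not depend on scanning every report.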
48. IR/ML Method: learning-to-rank paradigm
•What does it do?
  •Find relevant documents for a search
•How does it work?
  •Modern methods use hundreds of static (document-dependent) and dynamic (query-document-dependent) characteristics and a learning-to-rank framework
•How can it help us?
  •Modern IR is well beyond hard rules, and beyond heuristic scoring functions; no need to re-invent the wheel
http://people.dsv.su.se/~eriks/
49. IR Method: document clustering
•What does it do?
  •Group search results in order to better scan them; can be done in a query-dependent or query-independent way
•How does it work?
  •One way is to do a weighted dot product in which the weights reflect how informative (~rare) words are
•How can it help us?
  •Makes it easier to deal with large, redundant collections of text
http://carrot2.org/
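One common way to realize the "weighted dot product" above is cosine similarity over IDF-weighted term counts. The sketch below assumes that choice (the slide does not name a specific weighting), and the three documents are invented; a clustering algorithm would then group documents whose similarity exceeds some threshold.

```python
import math
from collections import Counter

def idf_weights(docs):
    """Inverse document frequency: rarer words get larger weights."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc.split()))
    return {term: math.log(n / count) for term, count in df.items()}

def weighted_vector(doc, idf):
    """Term counts scaled by their IDF weight."""
    return {t: c * idf.get(t, 0.0) for t, c in Counter(doc.split()).items()}

def cosine(u, v):
    """Normalized dot product of two sparse vectors."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

# Hypothetical reports, invented for illustration.
docs = [
    "bridge collapsed near river",
    "river bridge collapsed today",
    "donate blankets at shelter",
]
idf = idf_weights(docs)
vecs = [weighted_vector(d, idf) for d in docs]
same_story = cosine(vecs[0], vecs[1])
different_story = cosine(vecs[0], vecs[2])
```

The first two reports share three terms and score above zero, while the third shares none with the first and scores zero.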
50. Example of document clustering
Crisis Tracker
51. Data Mining (DM)
•The science of finding patterns in data
  •Finding association rules, categories of elements, anomalies, etc.
•Managing temporal data
  •Can help us detect and track trends and topics
•Managing static data
  •Can help us reduce the dimensionality of data
52. Yan Huang, UNT.edu
53. DM Method: burst detection
•What does it do?
  •Reliably identifies anomalies in a time series (e.g. volume of tweets w/hashtag vs time)
•How does it work?
  •Look for increases above the norm; look for change patterns that precede crisis
  •In general it is hard over noisy signals
•How can it help us?
  •Detection of sub-events in an ongoing crisis is important to rapidly respond to them
Volume for query “boston” in Google (trends.google.com).
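"Increases above the norm" can be made concrete with a trailing-window z-score. This is one simple assumed approach among many, not the method of any tool in the tutorial; the window size, the threshold, and the tweet counts are all hypothetical.

```python
from statistics import mean, stdev

def detect_bursts(counts, window=5, threshold=3.0):
    """Return indices whose count exceeds the trailing-window mean by
    more than `threshold` standard deviations."""
    bursts = []
    for i in range(window, len(counts)):
        history = counts[i - window:i]
        mu, sigma = mean(history), stdev(history)
        # Floor sigma so a nearly flat history does not blow up the score.
        if (counts[i] - mu) / max(sigma, 1.0) > threshold:
            bursts.append(i)
    return bursts

# Hypothetical tweets-per-minute series for one hashtag.
volume = [10, 12, 11, 9, 10, 11, 60, 75, 12, 10]
bursts = detect_bursts(volume)
```

Only the onset of the spike is flagged here: once the first high value enters the history window it inflates the baseline, which is one reason the slide notes that detection is hard over noisy signals.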
54. DM Method: topic detection and tracking
•What does it do?
  •Track the relative popularity of different topics over time
•How does it work?
  •Cluster documents per time slice, merge across time slices
•How can it help us?
  •See emerging stories, track new developments, sub-stories, etc.
TextFlow
55. DM Method: dimensionality reduction
•What does it do?
  •Represent complex data in simpler terms
•How does it work?
  •Find independent pieces of information, discard/merge correlated ones
•How can it help us?
  •We can focus on the big picture, not just hash-tags and keywords, but topics
Example: 4 dimensions (x, y, z, color) reduced to 2 dimensions (x’, y’)
  X, Y axes are correlated => X’ axis
  Z is independent => Y’
  Color is equivalent to X’ => gone
http://www.cs.otago.ac.nz/
56. IR/DM Method: reduce text dimensionality
LDA. Illustration by Lisa M. Rhody
Input: thousands of dimensions (one for every word)
Output: a handful of dimensions (one for every topic)
57. Statistical Machine Learning (ML)
•A branch of artificial intelligence
•While DM focuses on discovery, ML focuses on prediction
•ML aims at representing data and generalizing from it
•Supervised statistical machine learning is a well-established framework to learn the relationship between inputs and outputs
•Can help us learn from human labeling efforts to create automatic labels for new data
58. ML method: supervised classification
•What does it do?
  •Learn to separate different classes of elements, given (relatively) few examples
•How does it work?
  •Several methods to choose from, popular ones are SVMs and Decision Trees/Forests
•How can it help us?
  •Automatic classification of reports
http://www.quora.com/
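To show the learn-from-labeled-examples workflow end to end without external libraries, the sketch below uses multinomial Naive Bayes. Note the substitution: this is not the SVM or decision-tree approach named on the slide, only a simpler classifier chosen for brevity. The class name, the training tweets, and their labels are hypothetical.

```python
import math
from collections import Counter, defaultdict

class NaiveBayes:
    """Multinomial Naive Bayes over whitespace tokens with add-one smoothing."""

    def fit(self, texts, labels):
        self.class_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)
        for text, label in zip(texts, labels):
            self.word_counts[label].update(text.lower().split())
        self.vocab = {w for c in self.word_counts.values() for w in c}
        return self

    def predict(self, text):
        total_docs = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, n_docs in self.class_counts.items():
            counts = self.word_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            score = math.log(n_docs / total_docs)
            for word in text.lower().split():
                if word in self.vocab:  # ignore words never seen in training
                    score += math.log((counts[word] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label

# Hypothetical labeled tweets, invented for illustration.
train = [
    ("we need water and food", "Water"),
    ("no drinking water here", "Water"),
    ("shelter open at the school", "Shelter"),
    ("looking for shelter tonight", "Shelter"),
]
model = NaiveBayes().fit([t for t, _ in train], [l for _, l in train])
label = model.predict("is there any water")
```

In practice the labeled examples would come from human annotation efforts, and the classifier would be evaluated on held-out data before use.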
59. Example: automatic tweet classification
Caution & Advice, Information Sources, Damage & Casualties, Donations, Health, Shelter, Food, Water, Logistics, ...
60. ML method: regression
•What does it do?
  •The same as supervised classification but the target is numerical, not categorical
•How does it work?
  •It learns the parameters of a function that fits what is observed
•How can it help us?
  •It can predict an outcome from current data
http://qcri.qa/
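The simplest instance of "learning the parameters of a function that fits what is observed" is a least-squares line. The sketch below assumes a linear model; the variable names and the data points are hypothetical.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

# Hypothetical data: hours since the event vs. number of requests logged.
hours = [1, 2, 3, 4, 5]
requests = [12, 19, 31, 42, 48]
slope, intercept = fit_line(hours, requests)
predicted_at_6 = slope * 6 + intercept  # extrapolate one hour ahead
```

The fitted parameters summarize the observed trend, and plugging in a new input gives the predicted numerical outcome.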
61. Natural Language Processing (NLP)
•A research area that has fought against several (possibly AI-complete) problems
•Watson and other projects have visibly demonstrated their success
•Can help us to classify and extract information by doing automatically:
  •Morphological analysis
  •Dependency parsing
  •Entity linking / Word sense disambiguation
http://voices.washingtonpost.com/
62. NLP method: tagging
•What does it do?
  •Determines classes for tokens or segments in a text: part-of-speech tags, named entities
•How does it work?
  •Supervised learning with structured outputs
•How can it help us?
  •A richer representation of tweets yields better predictions
  •Spotting named entities or key phrases can help summarize tweets
I/pronoun can/modal see/verb the/determiner flames/noun from/preposition here/adverb
63. NLP methods: dependency parsing
•What does it do?
  •Identifies relationships between different parts of a text
•How does it work?
  •Learned from labeled data using structured output (output is a parse tree)
•How can it help us?
  •Identifying key elements in text can help find cases where a named entity is central to a report
“Bills on ports and immigration were submitted by Senator Brownback, Republican of Kansas”
http://nlp.stanford.edu/
64. NLP method: disambiguation/linking
•What does it do?
  •Connect named entities to concepts, e.g. a sense in a dictionary or a URL in Wikipedia
•How does it work?
  •Entities can have multiple senses; the correct one is picked by using contextual clues
•How can it help us?
  •Once we have determined a concept we can map it to broader classes
Meaning of “interest”:
  1 readiness to give attention
  2 quality of causing attention to be given
  3 activity, subject, etc., which one gives time and attention to
  4 advantage, advancement, or favour
  5 a share (in a company, business, etc.)
  6 money paid for the use of money
This may be of interest [2] to you
The money grows because of compound interest [6]
http://www.ling.gu.se/~lager/
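Picking a sense "by using contextual clues" can be illustrated with a simplified Lesk-style overlap count: choose the sense whose gloss shares the most content words with the sentence. This is an assumed, deliberately naive technique, not necessarily what the slide's source used; the stopword list and function names are made up for the example, while the glosses are the ones listed on the slide.

```python
import re

# Small hand-picked stopword list, an assumption for this example.
STOPWORDS = {"a", "an", "the", "of", "to", "in", "for", "be", "is", "or",
             "and", "which", "one", "this", "may", "you", "because", "etc"}

# Sense glosses for "interest" as listed on the slide.
SENSES = {
    1: "readiness to give attention",
    2: "quality of causing attention to be given",
    3: "activity, subject, etc., which one gives time and attention to",
    4: "advantage, advancement, or favour",
    5: "a share (in a company, business, etc.)",
    6: "money paid for the use of money",
}

def content_words(text):
    """Lowercased word set with stopwords removed."""
    return set(re.findall(r"[a-z]+", text.lower())) - STOPWORDS

def disambiguate(sentence, target, senses):
    """Pick the sense whose gloss shares the most content words with the sentence."""
    context = content_words(sentence) - {target}
    return max(senses, key=lambda s: len(context & content_words(senses[s])))

sense = disambiguate("The money grows because of compound interest",
                     "interest", SENSES)
```

The word "money" links the sentence to gloss 6. When no content word overlaps, as in "This may be of interest to you", this sketch simply falls back to the first sense listed, so real systems need richer context models than word overlap.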
  65. 65. Graph Theory (GT) a.k.a. link analysis, network analysis •Social graphs are important abstractions, they represent social connections as a graph •Lots of information can be derived from properties of this graph •Communities •Central users •Bridges •Availability of large datasets from online social networking sites has brought new life to this field http://www.hackingalert.net/
  66. 66. GT method: graph clustering •What does it do? •Find communities of densely connected nodes •How does it work? •There are many methods, depending on the definition of community •How can it help us? •We can identify groups of people who are closely connected http://griffsgraphs.com/
  67. 67. GT method: centrality metrics •What does it do? •Identifies which nodes in a graph lie on more shortest paths (betweenness centrality), or are more likely to be at the end of a random walk (PageRank) •How does it work? •PageRank is computed through iterative calculations over the entire graph •How can it help us? •These are good proxies for importance in a network Wikipedia
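The iterative PageRank calculation can be sketched as a plain power iteration. This is a minimal version, assuming a graph stored as a dictionary of outgoing links, a damping factor of 0.85, and a fixed number of iterations; the small "follows" graph and its account names are hypothetical.

```python
def pagerank(graph, damping=0.85, iterations=50):
    """Power-iteration PageRank over a dict {node: [nodes it links to]}.

    Every link target must also appear as a key of the dict.
    """
    nodes = list(graph)
    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        # Every node receives the "random jump" share first.
        new_rank = {node: (1.0 - damping) / n for node in nodes}
        for node, out_links in graph.items():
            if out_links:
                share = damping * rank[node] / len(out_links)
                for target in out_links:
                    new_rank[target] += share
            else:
                # Dangling node: spread its rank evenly over all nodes.
                share = damping * rank[node] / n
                for target in nodes:
                    new_rank[target] += share
        rank = new_rank
    return rank


# Hypothetical "X follows Y" graph: an edge points to the followed account.
FOLLOWS = {
    "alice": ["relief_org"],
    "bob": ["relief_org"],
    "carol": ["relief_org", "alice"],
    "relief_org": ["alice"],
}
RANKS = pagerank(FOLLOWS)
TOP_ACCOUNT = max(RANKS, key=RANKS.get)
```

In this toy graph the account followed by everyone else ends up with the highest score, which matches the intuition that PageRank is a proxy for importance.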
  68. 68. Human-Computer Interaction (HCI) •Technologies should bring people joy, not frustration •Design principles and methodologies have been developed over years •More important, evaluation and validation criteria have emerged
  69. 69. HCI method: user-centered design •What does it do? •Ensures users can use a tool effectively •How does it work? •Puts users and their tasks at the center of the design process •How can it help us? •We can avoid losing focus during application development by starting with the users’ concerns http://usability.msu.edu/
  70. 70. HCI method: prototypes and continuous evaluation •What does it do? •Helps understand what users want early on, and determine whether a design is effective •How does it work? •Build mock-ups and low-fidelity prototypes early on, evaluate them empirically •How can it help us? •Users may not know what they want until they see it; integrating them in the design requires communicating effectively; we also need to know how we are going to measure success.
  71. 71. Outline •Introduction •Gaps & Challenges •Role of Computer Science •Applied Crisis Computing •DM is not the same as DM •Design Principles
  72. 72. Applied Crisis Computing Example to Assist Coordination: Donations Matching
  73. 73. Thanks, But No Thanks … •Many people want to donate during disasters •Waste occurs due to resources being over- or under-supplied •Goal: understanding what is needed and what is offered by social media users http://www.npr.org/2013/01/09/168946170/thanks-but-no-thanks-when-post-disaster-donations-overwhelm
  74. 74. Matching requests with offers How to volunteer, donate to Hurricane Sandy: <URL> If you have clothes to donate to those who are victims of Hurricane Sandy … Red Cross is urging blood donations to support those affected <URL> I have TONS of cute shoes & purses I want to donate to hurricane victims … Does anyone know how to donate clothes to hurricane #Sandy victims? Does anyone know of community service organizations to volunteer to help out? Needs to get something, suggests scarcity: REQUEST (demand) Offers or wants to give, suggests abundance: OFFER (supply)
  75. 75. Coordination of needs and offers using social media. Actors: CITIZEN SENSORS, RESPONSE TEAMS (including humanitarian org. & ‘pseudo’ responders), VICTIM SITE. Example tweets: RT @OpOKRelief: Southgate Baptist Church on 4th Street in Moore has food, water, clothes, diapers, toys, and more. If you can't go, call 794 / Text "FOOD" to 32333, REDCROSS to 90999, or STORM to 80888 to donate $10 in storm relief. #moore #oklahoma #disasterrelief #donate / Want to help animals in #Oklahoma? @ASPCA tells how you can help: http://t.co/mt8l9PwzmO / Does anyone know where to send a check to donate to the tornado victims? / Where do I go to help out for volunteer work around Moore? Anyone know? / Anyone know where to donate to help the animals from the Oklahoma disaster? #oklahoma #dogs / If you would like to volunteer today, help is desperately needed in Shawnee. Call 273-5331 for more info. Matched, Matched, Matched: Serving the need!
  76. 76. A supervised learning approach
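The transcript does not preserve the model or features used on this slide. As a minimal sketch only, assuming a bag-of-words multinomial Naive Bayes classifier (a choice made here for illustration, not necessarily the authors' method), request/offer classification could look like the following. The training tweets are shortened versions of examples quoted in this deck; the class name and the labels of the first tweet in each group are assumptions based on the REQUEST/OFFER definitions given earlier.

```python
import math
import re
from collections import Counter

# Shortened example tweets from this deck; labels follow the slide definitions
# (OFFER = wants to give, REQUEST = needs to get something).
TRAIN = [
    ("I have TONS of cute shoes & purses I want to donate to hurricane victims", "OFFER"),
    ("I want to send some clothes for hurricane relief", "OFFER"),
    ("Anyone know of volunteer opportunities for hurricane Sandy? "
     "Would like to try and help in anyway possible", "OFFER"),
    ("Red Cross is urging blood donations to support those affected", "REQUEST"),
    ("How To Volunteer, Donate To Help Hurricane Sandy Victims", "REQUEST"),
    ("Coordinating a clothing/food drive for families affected by Hurricane Sandy. "
     "If you would like to donate, DM us.", "REQUEST"),
]


def tokenize(text):
    """Lower-case the text and keep alphabetic tokens only."""
    return re.findall(r"[a-z]+", text.lower())


class NaiveBayes:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, examples):
        self.doc_counts = Counter()
        self.word_counts = {}
        self.vocab = set()
        for text, label in examples:
            self.doc_counts[label] += 1
            counts = self.word_counts.setdefault(label, Counter())
            for token in tokenize(text):
                counts[token] += 1
                self.vocab.add(token)
        return self

    def predict(self, text):
        total_docs = sum(self.doc_counts.values())
        scores = {}
        for label, counts in self.word_counts.items():
            denominator = sum(counts.values()) + len(self.vocab)
            score = math.log(self.doc_counts[label] / total_docs)
            for token in tokenize(text):
                score += math.log((counts[token] + 1) / denominator)
            scores[label] = score
        return max(scores, key=scores.get)


CLASSIFIER = NaiveBayes().fit(TRAIN)
PREDICTED_OFFER = CLASSIFIER.predict("I want to give shoes")
PREDICTED_REQUEST = CLASSIFIER.predict("Red Cross urging blood donations")
```

Six training tweets are far too few for a usable model; the point is only to show how labeled examples turn into a classifier that outputs REQUEST or OFFER.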
  77. 77. Information extraction: core & facets •Core of the phrase is the “what” •Other facets may include “who”, “where”, “when”, etc. Rotary collecting clothing and other donations in New Jersey <URL> {source: “Twitter”, author: “@NN”, text: “Rotary collecting clothing and other donations in New Jersey <URL>”, donation-info: { donation-type: “Request”, donation-type-confidence: 0.8, donation-organization: “Rotary”, donation-item: “clothing and other donations”, donation-location: “New Jersey” }, … }
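The extracted record above maps naturally onto a small typed structure. The following is a sketch under assumptions: the class names and the use of underscores instead of hyphens are choices made here, and only the fields visible on the slide are modeled (the elided part of the record is left out).

```python
import json
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass
class DonationInfo:
    """Core ("what") plus facets ("who", "where") of a donation message."""
    donation_type: str                  # "Request" or "Offer"
    donation_type_confidence: float     # classifier confidence in [0, 1]
    donation_item: str                  # the core of the phrase
    donation_organization: Optional[str] = None
    donation_location: Optional[str] = None


@dataclass
class Message:
    source: str
    author: str
    text: str
    donation_info: DonationInfo


RECORD = Message(
    source="Twitter",
    author="@NN",
    text="Rotary collecting clothing and other donations in New Jersey <URL>",
    donation_info=DonationInfo(
        donation_type="Request",
        donation_type_confidence=0.8,
        donation_item="clothing and other donations",
        donation_organization="Rotary",
        donation_location="New Jersey",
    ),
)

# asdict() converts nested dataclasses too, so the record serializes directly.
RECORD_JSON = json.dumps(asdict(RECORD))
```

Keeping the facets optional reflects that many messages mention an item but no organization or location.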
  78. 78. Statistics
  79. 79. Some example matches [naïve method] •Pair 1: •Anyone know of volunteer opportunities for hurricane Sandy? Would like to try and help in anyway possible (OFFER) •RT @Gothamist: How To Volunteer, Donate To Help Hurricane Sandy Victims http://t.co/fXUOnzJe (REQUEST) •Pair 2: •I want to send some clothes for hurricane relief (OFFER) •Me and @CeceVancePR are coordinating a clothing/food drive for families affected by Hurricane Sandy. If you would like to donate, DM us. (REQUEST)
  80. 80. Much work remains to be done •Matching quality depends on the type of donation •Improvements in item representation are necessary •Sparsity is part of the problem •Improvements in matching quality are necessary •A hybrid approach needs to be investigated •With a budget of K crowdsourcing calls, which items should be annotated? •A real-world system should use continuous querying; is this efficient? A similar approach is applicable in other coordination problems as well!
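The slide does not define the naïve method. One plausible reading, shown here purely as an assumption, is to pair each offer with the request that shares the most words, scored by Jaccard similarity over content tokens. The stopword list and function names are hypothetical.

```python
import re

# Ad-hoc stopword list (an assumption made for this sketch).
STOPWORDS = {"i", "me", "and", "a", "the", "to", "for", "of", "in", "is", "are",
             "if", "you", "us", "by", "some", "would", "like", "want", "those"}


def content_tokens(text):
    """Lower-case the text and keep alphabetic tokens that are not stopwords."""
    return {w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS}


def jaccard(a, b):
    """Size of the intersection divided by size of the union (0.0 if empty)."""
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)


def best_match(offer, requests):
    """Return (score, request) for the request most similar to the offer."""
    offer_tokens = content_tokens(offer)
    scored = [(jaccard(offer_tokens, content_tokens(r)), r) for r in requests]
    return max(scored, key=lambda pair: pair[0])


OFFER = "I want to send some clothes for hurricane relief"
REQUESTS = [
    "Red Cross is urging blood donations to support those affected",
    "Me and @CeceVancePR are coordinating a clothing/food drive for families "
    "affected by Hurricane Sandy. If you would like to donate, DM us.",
]
SCORE, MATCHED_REQUEST = best_match(OFFER, REQUESTS)
```

Here the offer is paired with the clothing drive, but only through the shared word "hurricane": "clothes" and "clothing" do not match as exact tokens. Exact word overlap is therefore a weak representation of the donated item.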
  81. 81. Objective: Support Decision Making and Coordination of Actions
  82. 82. An analogy: product comparison sites •What product comparison sites do today •Collect pieces of information having diverse structure (each site has its own) •Enrich them with automatically-extracted facets (photo, name, reviews, etc.) •Cluster/de-duplicate •Enable search by extracted facets •In our case, there is almost no structure to start with, just context http://pricegrabber.com/
  83. 83. First: extract facets from unstructured text •Collect messages •Classify according to several ontologies •Not only content classification •Also Author/Source classification: discover roles •Extract core aspects / information nuggets •Identify the key portion of a message •Extract facets •Geo-locate
  84. 84. Second: manage data and enable faceted retrieval •Support real-time insertions •Must be visible immediately •Support real-time updates •E.g. new user assessments/labels of data •E.g. new parameters for an automatic classifier •Support complex queries •Faceted retrieval on complex predicates •Return relevant results •Relevance is based on multiple signals (geo, time, IR-based, etc.)
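The three requirements above (immediate visibility of inserts, in-place updates, faceted queries) can be sketched with a toy in-memory index. This is only an illustration: the class, its method names, and the facet values attached to the example messages are hypothetical, and a production system would need persistence, indexing, and a richer relevance function than recency alone.

```python
class FacetedIndex:
    """Toy in-memory store with exact-match faceted search."""

    def __init__(self):
        self._records = []

    def insert(self, record):
        """Add a record; it is visible to the very next search call."""
        self._records.append(dict(record))
        return len(self._records) - 1

    def update(self, record_id, **facets):
        """Apply new labels or assessments to an existing record in place."""
        self._records[record_id].update(facets)

    def search(self, **facets):
        """Return records matching every given facet, most recent first."""
        hits = [r for r in self._records
                if all(r.get(key) == value for key, value in facets.items())]
        return sorted(hits, key=lambda r: r["time"], reverse=True)


INDEX = FacetedIndex()
INDEX.insert({"text": "Where do I go to help out for volunteer work around Moore?",
              "type": "offer", "location": "Moore", "time": 1})
SHAWNEE_ID = INDEX.insert({"text": "Help is desperately needed in Shawnee",
                           "type": "request", "location": "Shawnee", "time": 2})
INDEX.insert({"text": "Anyone know where to donate to help the animals "
                      "from the Oklahoma disaster?",
              "type": "offer", "location": "Oklahoma", "time": 3})

# A new user assessment arrives later and is applied in place.
INDEX.update(SHAWNEE_ID, verified=True)
OFFERS = INDEX.search(type="offer")
VERIFIED_REQUESTS = INDEX.search(type="request", verified=True)
```

Sorting by time stands in for the multi-signal relevance mentioned on the slide; geographic distance or text-retrieval scores could be combined into the sort key in the same place.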
  85. 85. Third: discover relationships and clusters •Clustering •Near-duplicate detection •Same event/story/etc. •Data-driven geographical regions •Discover relationships •Content-Reply? •Claim-Refutation? •Etc. •Best supported by linked data management systems http://www.jaunted.com/ At a high level, what are the names of the tourist hot spots of the world?
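Near-duplicate detection, for example spotting retweets of the same appeal, can be sketched with word shingles and Jaccard similarity. The shingle size of three words and the example messages (adapted from tweets quoted earlier in this deck) are assumptions made for illustration.

```python
import re


def shingles(text, size=3):
    """Return the set of overlapping word n-grams (shingles) of a text."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return {tuple(tokens[i:i + size]) for i in range(len(tokens) - size + 1)}


def similarity(a, b, size=3):
    """Jaccard similarity between the shingle sets of two texts."""
    sa, sb = shingles(a, size), shingles(b, size)
    if not sa or not sb:
        return 0.0
    return len(sa & sb) / len(sa | sb)


ORIGINAL = "Text FOOD to 32333 to donate $10 in storm relief"
RETWEET = "RT: Text FOOD to 32333 to donate $10 in storm relief #moore"
UNRELATED = "Where do I go to help out for volunteer work around Moore?"

RETWEET_SCORE = similarity(ORIGINAL, RETWEET)      # 8 shared of 10 shingles
UNRELATED_SCORE = similarity(ORIGINAL, UNRELATED)  # no shared shingles
```

A threshold on this score (the exact value would have to be tuned on real data) separates near-duplicates from distinct messages; at scale, techniques such as MinHash approximate the same similarity without comparing every pair.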
  86. 86. Fourth: enable high-level operations •Summarization [static] •Synthesize/extract a high-level description from a set of items •Semantic clustering [static] •Determine clusters based on high-level characteristics of data •Event detection [dynamic] •Discover large changes in the data at some level of abstraction •Topic tracking [dynamic] •Discover how a topic (an aspect of the data) evolves over time
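Event detection as "discovering large changes in the data" can be sketched as burst detection on message counts per time window. This is a minimal version, assuming a sliding-window z-score with a threshold of 3; the window size, threshold, and the example counts are hypothetical.

```python
from statistics import mean, pstdev


def detect_bursts(counts, window=5, threshold=3.0):
    """Return indices whose count exceeds the recent mean by > threshold sigmas.

    Each value is compared against the `window` values that precede it.
    """
    bursts = []
    for i in range(window, len(counts)):
        history = counts[i - window:i]
        mu = mean(history)
        # Floor sigma at 1.0 so a perfectly flat history cannot divide by zero.
        sigma = max(pstdev(history), 1.0)
        if (counts[i] - mu) / sigma > threshold:
            bursts.append(i)
    return bursts


# Hypothetical number of messages mentioning a keyword per 10-minute window.
COUNTS = [10, 12, 11, 9, 10, 11, 60, 58, 12]
BURSTS = detect_bursts(COUNTS)
```

Only the onset of the spike is flagged, because once the spike enters the history window it inflates both the mean and the standard deviation. Topic tracking would need a different signal, such as how the vocabulary around the keyword shifts over time.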
  87. 87. Focus on decision making and coordination •Do not start by thinking about data visualization •Data visualization is constrained by the richness of your data •Start by thinking about how to make your data richer •Key questions to prioritize R&D on these systems: •Who will consume the data? •What decisions does this person or this community need to make? •Which aspects of the data support these decisions? •How do we know the decision was correct? •Can the end-users of the social media analysis make better decisions than the non-users?
  88. 88. Example questions during decision-making by actors (a.) Seeker/Demander •Whom to follow (provider) •Where to find resource info •Whom to contact in the Responder teams (b.) Provider/Supplier •Whom to follow (Seeker) •Where to find resource scarcity info •Whom to inform on the Responder side (c.) Responder •Whom (seeker/provider) to contact/DM/Mention •Where to find resource scarcity/availability info •Whom to communicate with to deliver the right info at the right time
  89. 89. Open problems
  90. 90. Data availability: chicken-or-egg problem http://www.vtaide.com/ People’s posts don’t include some data Because nobody is looking for that data
  91. 91. The semantic gap •Introduced ca. 1989 in the context of multimedia retrieval •Low-level features are far from high-level information needs http://www.semanticmetadata.net/
  92. 92. Analogy for low-level data to high-level transformation •Vertical operators facilitate transcending from data to information to knowledge to wisdom using background knowledge •Horizontal operators facilitate semantic integration of multimodal observations http://www.slideshare.net/apsheth/physical-cyber-social-computing-an-early-21st-century-approach-to-computing-for-human-experience
  93. 93. Semantic gap: ML/DM/NLP/IR/… •Automatic methods for classifying and extracting information from short pieces of text are usable but far from perfect •Noisy texts make the problem harder •Social media English is a particular dialect of English •Short texts make the problem harder •There is not enough context to disambiguate •Frequency-based methods to determine key words are not usable •Important subtleties escape us •e.g. irony in sentiment analysis
  94. 94. Intentions: chicken-or-egg problem http://www.vtaide.com/ Some types of coordination do not often happen online Because there are no platforms supporting such coordination
  95. 95. Fine-grained analysis of intentions •People go online during disasters for a variety of reasons •How good is our understanding of these reasons? •Suppose we know the top-3 reasons: how many people do those reasons cover? •The only way of operating with a long tail of information needs is to think in the most general terms possible •Plus opportunistically creating “vertical” systems for niche needs
  96. 96. Towards a generic crisis response ontology •UN effort on generic ontology (taxonomy and relationships) •HXL (Humanitarian Exchange Language) •Still a gap between what has been modeled so far vs. what can be used (supported via data and analytics) •Current efforts in the W3C community on ‘Emergency Information management’ aim to extend HXL with other existing relevant ontologies and create a necessary and sufficient model More about HXL: http://hxl.humanitarianresponse.info/ns/index.html
  97. 97. Continuously-evolving models •How do we capture the existing knowledge evolving around an event? Moore is a suburb of Oklahoma City If you would like to volunteer today, help is desperately needed in Shawnee. Call 273-5331 for more info Shawnee is a suburb near Moore Geographies: Shawnee, Moore Focus areas for data collection, processing and analytics
  98. 98. Outline •Introduction •Gaps & Challenges •Role of Computer Science •Applied Crisis Computing •Design Principles •New systems focused on actions and coordination
  99. 99. Principle 1: Explicitly identify target users •This may not be a homogeneous group •Identify profiles •Background, skills, etc.
  100. 100. Target users: examples •Headquarters Humanitarians •Policy, Information Products, Coordination •Field Humanitarians •Logistics, Relief, Coordination •Digital Humanitarians •Information Collection •Analysis
  101. 101. Headquarters
  102. 102. The Field
  103. 103. Digital Humanitarians
  104. 104. Principle 2: Engage users in co-design •Do not let them offload requirements and then leave •We want them to co-design with us •This requires effective tools for communication •e.g. wireframe designs, user stories, etc.
  105. 105. Principle 3: Socio-technical systems •Conceptualize the system as hybrid (human and computer intelligence) from the beginning •Improve response in a continuous fashion •We want users to be part of the operation of the systems themselves
  106. 106. Principle 4: Empirical evaluation through actions •We want systems that look good and are easy to use •We do not evaluate based on looks •Are the actions of users better than those of non-users?
  107. 107. There is a part for everybody in this community Hackers, scientists, humanitarians, everybody.
  108. 108. Hackers •Create and curate useful datasets •Create dataset remixes •Create software tools •Create libraries •Create interoperability
  109. 109. Computer scientists •There are many open problems in ML, DM, NLP, etc. •Collaborating and partnering in humanitarian computing •It is easier to share data and solutions in this application domain than in commercially-driven ones •This is also a rich test bed for testing algorithms •Those algorithms can be useful well beyond humanitarian computing
  110. 110. Social scientists •There are many open questions about how and why people coordinate, how to motivate them, what information they require, how to present that information, etc. •Which organizational structures are better during the different phases of a crisis: mitigation, rescue, relief, recovery and rebuilding? •Humanitarian and crisis computing projects need to be assessed and evaluated for intended impact. •These projects need to be communicated in non-technical language that humanitarian policy makers understand.
  111. 111. Humanitarian organizations •It takes two to tango! •Your scientific partners are not providers/vendors •Scientists want access to your experts and data •Access to experts and problems is extremely important •This is a win-win situation: help us create partnerships http://worcestertango.org/
  112. 112. Everybody •Interdisciplinary research is not easy to execute •But a unidirectional approach will only create more gaps in the research-to-practice pipeline.
  113. 113. Thanks to •National Science Foundation (NSF) for SoCS project grant: Social Media Enhanced Organizational Sensemaking in Emergency Response •Kno.e.sis Twitris team, Prof. Valerie Shalin, Prof. John Flach, Andrew Hampton in the Dept. of Psychology (Wright State U) •Prof. Srini Parthasarathy, Yiye Ruan, Dave Fuhry (Ohio State U) •Fernando Diaz, Microsoft Research •Shady Elbaussoni, Beirut University •Muhammad Imran, QCRI •Jakob Rogstadius, Madeira University •Our colleagues for suggestions on the material, including the Sahana project @UMD and ISI @USC, etc. •Images used here belong to their respective owners; we are grateful that their work could be used for illustration in this context. Many thanks!
  114. 114. Questions, Discussion and Feedback •References and reading material: •http://www.knoesis.org/hemant/present/icwsm2013 •http://humanitariancomp.referata.com/ •Got Questions? –Talk to us on Twitter: @hemant_pt, @ChaToX, @PatrickMeier, @amit_p