Finding Relevant Crisis Information on Social Media

Crisis Computing
Finding relevant and credible information on social
media during disasters
Big Data Analytics Conference
Delhi, India, December 2014

January 2010
How/when did it start for me?

Humanitarian Computing
At least 775 publications:
● Crisis Analysis (55)
● Crisis Management (309)
● Situational Awareness (67)
● Social Media (231)
● Mobile Phones (74)
● Crowdsourcing (116)
● Software and Tools (97)
● Human-Computer Interaction (28)
● Natural Language Processing (33)
● Trust and Security (33)
● Geographical Analysis (53)
Source: http://humanitariancomp.referata.com/

http://www.youtube.com/watch?v=0UFsJhYBxzY

Carlos Castillo – chato@acm.org
http://www.chato.cl/research/
An earthquake hits a Twitter user
• When an earthquake strikes, the first tweets are
posted 20-30 seconds later
• Damaging seismic waves travel at 3-5 km/s, while
network communications travel at near light speed over
fiber/copper (plus latency)
• Beyond ~100 km, seismic waves may be overtaken by
tweets about them
http://xkcd.com/723/
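
The ~100 km figure can be checked with simple arithmetic: multiply the wave speed by the delay before the first tweets appear. The sketch below uses only the ranges given on the slide and assumes network latency is negligible compared with the 20-30 s posting delay; the function name is made up for illustration.

```python
# Back-of-the-envelope check of the figures on the slide.
# Assumption: network latency is ignored, since it is tiny compared
# with the 20-30 s it takes people to post.

def overtake_distance_km(wave_speed_km_s: float, tweet_delay_s: float) -> float:
    """Distance the seismic wave has covered when the first tweets appear."""
    return wave_speed_km_s * tweet_delay_s

for speed in (3.0, 5.0):        # km/s, range from the slide
    for delay in (20.0, 30.0):  # seconds, range from the slide
        distance = overtake_distance_km(speed, delay)
        print(f"{speed} km/s with a {delay:.0f} s delay: {distance:.0f} km")
```

Under these assumptions the break-even distance falls between 60 km and 150 km, which is consistent with the rough ~100 km quoted above.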

Alexandra Olteanu, Sarah Vieweg and Carlos Castillo: What to Expect When the
Unexpected Happens: Social Media Communications Across Crises.
To appear in CSCW 2015.
Examples of crisis tweets (cont.)

Fertile grounds for applied research
✔ Problems of global significance
✔ Solved with labor-intensive methods
✔ Better solution provides a public good
✔ Large and noisy data sets available
✔ Engage volunteer communities

• Relevance to practitioners?

Current collaborators
Patrick Meier – QCRI
Sarah Vieweg – QCRI
Muhammad Imran – QCRI
Irina Temnikova – QCRI
Alexandra Olteanu – EPFL
Aditi Gupta – IIIT Delhi
“P.K.” Kumaraguru – IIIT Delhi
Fernando Diaz – Microsoft

Outline
Crisis Maps
Extraction
Matching
Verification
Credibility

Crisis maps from social media
Carlos Castillo, Fernando Diaz, and Hemant Purohit:
Leveraging Social Media and Web of Data to Assist Crisis Response Coordination
Tutorial at SDM, Philadelphia, PA, USA. April 2014.
Hemant Purohit, Carlos Castillo, Patrick Meier and Amit Sheth:
Crisis Mapping, Citizen Sensing and Social Media Analytics
Tutorial at ICWSM, May 2013.

Patrick Meier, Social Innovation Director @ QCRI – http://irevolution.net/
“What can speed humanitarian
response to tsunami-ravaged
coasts? Expose human rights
atrocities? Launch helicopters to
rescue earthquake victims?
Outwit corrupt regimes?
A map.”

Crisis mapping goes mainstream (2011)

http://newsbeatsocial.com/watch/0_s6xxcr3p

Understanding Crisis Tweets
Alexandra Olteanu, Sarah Vieweg and Carlos Castillo: What to Expect When the
Unexpected Happens: Social Media Communications Across Crises.
To appear in CSCW 2015.

Types of Disaster

Our approach
1. Filtering
2. Classification
3. Extraction

Filtering
• Is disaster-related? (Yes / No)
• Contributes to situational awareness? (Yes / No)
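
The two filtering questions can be read as a chain of yes/no tests in which a tweet survives only if both answers are "yes". The sketch below is a minimal illustration under that reading. The keyword lists and function names are hypothetical, and keyword matching merely stands in for whatever classifier answers each question in a real system.

```python
from typing import Iterable, List, Set

# Hypothetical keyword lists, for illustration only.
DISASTER_TERMS = {"earthquake", "flood", "tornado", "hurricane", "tsunami"}
AWARENESS_TERMS = {"closed", "damaged", "injured", "evacuate", "warning", "shelter"}

def tokens(text: str) -> Set[str]:
    """Lowercased words with surrounding punctuation and #/@ removed."""
    return {w.strip(".,!?:;#@\"'").lower() for w in text.split()}

def is_disaster_related(text: str) -> bool:
    return bool(tokens(text) & DISASTER_TERMS)

def contributes_to_awareness(text: str) -> bool:
    return bool(tokens(text) & AWARENESS_TERMS)

def filter_tweets(tweets: Iterable[str]) -> List[str]:
    """Keep only tweets that pass both questions, in order."""
    return [t for t in tweets
            if is_disaster_related(t) and contributes_to_awareness(t)]

sample = [
    "Tornado warning in effect tonight",
    "Thoughts and prayers after the earthquake",
    "Great coffee this morning",
]
print(filter_tweets(sample))
```

The second sample tweet is disaster-related but is dropped at the second question, which is the distinction the slide draws.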

Classification
Input: filtered tweets
• Information provided: Caution & Advice, Information Sources, Damage & Casualties, Donations, ...
• Information source: Gov, Eyewitness, Media, NGO, Outsider, ...

A large-scale study of crisis tweets
• Collect tweets from 26 disasters
• Classify according to:
● Informative / Not informative
● Information provided
● Information source

Advice on labeling
• Your instructions will never be correct the first
time you try
– e.g. personal / eyewitness
– Instructions must be re-written reactively
– Perform small-scale labeling first
• Instructions must be concrete and brief
– If you cannot make them concrete and brief, divide the task into smaller tasks

Information Provided in Crisis Tweets
N=26; Data available at http://crisislex.org/

What do people tweet about?
• Affected individuals
– 20% on average (min. 5%, max. 57%)
– most prevalent in human-induced, focalized & instantaneous events
• Sympathy and emotional support
– most prevalent in instantaneous events
• Other useful information
– least prevalent in diffused events

What do people tweet about? (cont.)
• Infrastructure and utilities
– most prevalent in diffused events, in particular floods
• Caution and advice
– least prevalent in instantaneous & human-induced events
• Donations and volunteering
– most prevalent in natural hazards

Distribution over information sources

Extracting information and matching
emergency-related resources
Muhammad Imran, Shady Elbassuoni, Carlos Castillo, Fernando Diaz and Patrick Meier:
Extracting Information Nuggets from Disaster-Related Messages in Social Media
In ISCRAM. Baden-Baden, Germany, 2013. Best paper award.
Hemant Purohit, Amit Sheth, Carlos Castillo, Patrick Meier, Fernando Diaz:
Emergency-Relief Coordination on Social Media: Automatically Matching Resource Requests and Offers
First Monday 19 (1), January 2014
Muhammad Imran, Shady Elbassuoni, Carlos Castillo, Fernando Diaz and Patrick Meier:
Practical Extraction of Disaster-Relevant Information from Social Media
In SWDM. Rio de Janeiro, Brazil, 2013

Information Extraction
Input: classified tweets
Example: @JimFreund: Apparently we have no choice. There is a tornado watch in effect tonight.

Extraction
• #hashtags, @user mentions, URLs, etc.
– Regular expressions
– Text library from Twitter
• Temporal expressions
– Part-of-speech tagger + heuristics
– Natty library
• Supervised learning
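
For the first bullet, a few regular expressions already go a long way. The patterns below are a simplified sketch, not the rules implemented by Twitter's text library, and the function name is an assumption for illustration. Known limitation: an e-mail address would be picked up as a mention.

```python
import re
from typing import Dict, List

# Simplified patterns (assumption: word characters only, http/https URLs only).
HASHTAG = re.compile(r"#\w+")
MENTION = re.compile(r"@\w+")
URL = re.compile(r"https?://\S+")

def extract_entities(tweet: str) -> Dict[str, List[str]]:
    urls = URL.findall(tweet)
    # Blank out URLs first so that a fragment such as "#b" inside a link
    # is not counted as a hashtag.
    text = URL.sub(" ", tweet)
    return {
        "hashtags": HASHTAG.findall(text),
        "mentions": MENTION.findall(text),
        "urls": urls,
    }

tweet = ("RT @twc_hurricane: Wind gusts over 60 mph are being reported at "
         "Central Park and JFK airport in #NYC this hour. #Sandy")
print(extract_entities(tweet))
```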

Labels for extraction
• Type-dependent instruction
• Ask evaluators to copy-paste a word/phrase from
each tweet

Learning: Conditional Random Fields
• Used extensively in NLP for part-of-speech tagging
and information extraction
• Representation of observations is important
(capitalization, position, etc.)
[Figure: graphical models of an HMM and a linear-chain CRF, showing hidden and observed variables]
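
To make "representation of observations" concrete, the sketch below builds one feature dictionary per token, including the capitalization and position cues mentioned above. The feature names are hypothetical and the list is not the feature set used in the cited papers. It only illustrates the kind of per-token input a linear-chain CRF learner could be given.

```python
from typing import Dict, List, Union

Feature = Union[str, bool, int]

def token_features(tokens: List[str], i: int) -> Dict[str, Feature]:
    """Observation features for the token at position i."""
    tok = tokens[i]
    return {
        "lower": tok.lower(),
        "is_capitalized": tok[:1].isupper(),
        "is_all_caps": tok.isupper(),
        "has_digit": any(c.isdigit() for c in tok),
        "is_hashtag": tok.startswith("#"),
        "is_mention": tok.startswith("@"),
        "position": i,
        "is_first": i == 0,
        "is_last": i == len(tokens) - 1,
        # Neighbouring words give the model local context.
        "prev_lower": tokens[i - 1].lower() if i > 0 else "<START>",
        "next_lower": tokens[i + 1].lower() if i < len(tokens) - 1 else "<END>",
    }

def sentence_features(tokens: List[str]) -> List[Dict[str, Feature]]:
    return [token_features(tokens, i) for i in range(len(tokens))]

for feats in sentence_features("Wind gusts over 60 mph #Sandy".split()):
    print(feats)
```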

Tool
• CMU ARK Twitter NLP
– Tokenization
– Feature extraction
– CRF learning
• Very easy to use: simply replace the part-of-speech tags in the
training set with your own labels, and re-train

Output examples
• RT @weatherchannel: .@NYGovCuomo orders closing of NYC bridges. Only Staten Island bridges unaffected at this time. Bridges must close by 7pm. #Sandy #NYC
• Wow what a mess #Sandy has made. Be sure to check on the elderly and homeless please! Thoughts and prayers to all affected
• RT @twc_hurricane: Wind gusts over 60 mph are being reported at Central Park and JFK airport in #NYC this hour. #Sandy
• RT @mitchellreports: Red Cross tells us grateful for Romney donation but prefer people send money or donate blood dont collect goods NOT best way to help #Sandy

Extractor evaluation
Setting                                     Recall   Precision
Train 2/3 Joplin, Test 1/3 Joplin           78%      90%
Train 2/3 Sandy, Test 1/3 Sandy             41%      79%
Train Joplin, Test Sandy                    11%      78%
Train Joplin + 10% Sandy, Test 90% Sandy    21%      81%
• Precision: an extraction counts as correct if it has one or
more words in common with what humans extracted
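
The lenient matching criterion can be sketched as follows. Precision follows the definition on the slide. The recall definition (share of human-labeled tweets for which the system produced an overlapping extraction) is an assumption, since the slide does not spell it out. The example pairs are toy data, not taken from the evaluation.

```python
from typing import List, Optional, Tuple

def words(text: str) -> set:
    return set(text.lower().split())

def overlaps(predicted: str, gold: str) -> bool:
    """Lenient match: one or more words in common."""
    return bool(words(predicted) & words(gold))

def evaluate(pairs: List[Tuple[Optional[str], str]]) -> Tuple[float, float]:
    """pairs holds (system extraction or None, human extraction) per tweet."""
    predicted = [(p, g) for p, g in pairs if p is not None]
    hits = sum(1 for p, g in predicted if overlaps(p, g))
    precision = hits / len(predicted) if predicted else 0.0
    recall = hits / len(pairs) if pairs else 0.0  # assumed definition
    return precision, recall

# Toy data for illustration only.
toy = [
    ("closing of NYC bridges", "NYC bridges"),  # overlap: hit
    ("60 mph", "wind gusts over 60 mph"),       # overlap: hit
    ("this hour", "Central Park"),              # no overlap: miss
    (None, "donate blood"),                     # nothing extracted
]
precision, recall = evaluate(toy)
print(f"precision={precision:.2f} recall={recall:.2f}")
```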

Donations matching
• Identify and match requests/offers for donations
– Money, clothing, food, shelter, volunteers, blood
Average precision = 0.21 (0.16 if only text similarity is used)
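
As a point of reference for the text-similarity-only setting, the sketch below pairs each request with its most similar offer using Jaccard similarity over word sets. This is a generic baseline chosen for illustration and is not the method of the cited paper. The function names, threshold, and messages are all assumptions.

```python
from typing import Dict, List, Optional

def tokenize(text: str) -> set:
    return set(text.lower().split())

def jaccard(a: str, b: str) -> float:
    """Size of the word intersection divided by size of the word union."""
    ta, tb = tokenize(a), tokenize(b)
    union = ta | tb
    return len(ta & tb) / len(union) if union else 0.0

def match_requests(requests: List[str], offers: List[str],
                   threshold: float = 0.2) -> Dict[str, Optional[str]]:
    """Map each request to its best offer, or None if nothing is similar enough."""
    matches: Dict[str, Optional[str]] = {}
    for req in requests:
        scored = [(jaccard(req, off), off) for off in offers]
        score, best = max(scored, default=(0.0, None))
        matches[req] = best if score >= threshold else None
    return matches

# Made-up messages for illustration only.
requests = ["need blankets and warm clothing in Brooklyn",
            "looking for volunteers to clear debris"]
offers = ["we have warm clothing and blankets to donate",
          "offering free pizza downtown"]
for req, off in match_requests(requests, offers).items():
    print(req, "->", off)
```

Word overlap alone misses matches that use different vocabulary (for example "coats" versus "clothing"), which is one reason a text-only baseline can be expected to score lower.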

Crowdsourced stream processing systems
Muhammad Imran, Ioanna Lykourentzou and Carlos Castillo:
Engineering Crowdsourced Stream Processing Systems
http://arxiv.org/abs/1310.5463

Design objectives and principles
Design objective | Example metric | Design principle: automatic components | Design principle: crowdsourced components
Low latency | End-to-end time | Keep items moving | Trivial tasks
High throughput | Output items per unit of time | High-performance processing | Task automation
Load adaptability | Rate response function | Load shedding, load queueing | Task prioritization
Cost effectiveness | Cost vs. quality, throughput, etc. | N/A | Task frugality
High quality | Application-dependent | Redundancy, aggregation and quality control (both component types)
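
One common way to combine redundancy and aggregation is to ask several workers to label the same item and accept a label only when enough of them agree. The sketch below shows that idea with a majority vote. The function name and the agreement threshold are assumptions, not a description of any particular system.

```python
from collections import Counter
from typing import List, Optional

def aggregate(labels: List[str], min_agreement: float = 0.5) -> Optional[str]:
    """Return the majority label, or None when agreement is too low.

    A None result can be routed back for more labels (quality control).
    """
    if not labels:
        return None
    label, count = Counter(labels).most_common(1)[0]
    return label if count / len(labels) > min_agreement else None

print(aggregate(["donation", "donation", "damage"]))  # clear majority
print(aggregate(["donation", "damage"]))              # tie: no decision
```

Requiring strictly more than half of the votes means ties are never resolved arbitrarily. The price is that some items need extra labels, which trades cost for quality.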

Design patterns
● QA loop
● Task assignment
● Process/verify
● Supervised learning
● Crowdwork sub-task chaining
● Humans are not a bottleneck
● Humans review every output element

http://aidr.qcri.org/

Self-service for crisis-related classification
[Diagram components: unstructured text reports, automatic classifier, categorized information, model builder, crowdsourced ground-truth, library of training data]
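
One plausible reading of these components is a loop in which crowdsourced labels feed a model builder, and the resulting classifier categorizes incoming text reports. The sketch below illustrates that loop with a tiny Naive Bayes classifier written from scratch. The choice of Naive Bayes, the class and method names, and the training examples are all assumptions for illustration, not a description of the actual system.

```python
import math
from collections import Counter, defaultdict
from typing import Iterable, List, Optional, Tuple

def tokenize(text: str) -> List[str]:
    return text.lower().split()

class NaiveBayes:
    """Multinomial Naive Bayes with add-one smoothing."""

    def __init__(self) -> None:
        self.class_counts: Counter = Counter()
        self.word_counts = defaultdict(Counter)
        self.vocab: set = set()

    def train(self, labeled: Iterable[Tuple[str, str]]) -> None:
        """The 'model builder' step: learn from (text, label) pairs."""
        for text, label in labeled:
            self.class_counts[label] += 1
            for w in tokenize(text):
                self.word_counts[label][w] += 1
                self.vocab.add(w)

    def classify(self, text: str) -> Optional[str]:
        total = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, n in self.class_counts.items():
            score = math.log(n / total)
            denom = sum(self.word_counts[label].values()) + len(self.vocab)
            for w in tokenize(text):
                score += math.log((self.word_counts[label][w] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label

# Made-up crowdsourced labels, for illustration only.
ground_truth = [
    ("bridge closed due to flooding", "damage"),
    ("roof collapsed many houses damaged", "damage"),
    ("donate blood at the red cross", "donation"),
    ("send money to help victims", "donation"),
]
model = NaiveBayes()
model.train(ground_truth)
print(model.classify("please donate money"))
```

Because training is just a call to `train` with new labeled pairs, the same code can be re-run as more crowdsourced labels arrive, which is the self-service aspect.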

Credibility and verification
Aditi Gupta, Ponnurangam Kumaraguru, Carlos Castillo and Patrick Meier:
TweetCred: A Real-time Web-based System for Credibility of Content on Twitter
In SocInfo 2014. Runner-up for best paper award.
Carlos Castillo, Marcelo Mendoza, Barbara Poblete:
Predicting Information Credibility in Time-Sensitive Social Media
In Internet Research, Vol. 23, Issue 5. October 2013.
A. Popoola, D. Krasnoshtan, A. Toth, V. Naroditskiy, C. Castillo, P. Meier and I. Rahwan:
Information Verification during Natural Disasters
Social Web and Disaster Management (SWDM) workshop, 2013.

http://www.youtube.com/watch?v=pAHoEO-K0Ek

Crowdsourced verification: Veri.ly
• Frame crowdwork correctly
• Not upvoting/downvoting a claim
• Instead, providing evidence for/against
@VeriDotLy — http://veri.ly/

Examples of evidence provided

Automatic credibility evaluation: TweetCred
• Real-time web-based service
• Used as a Chrome extension
• Annotates Twitter's timeline with credibility
scores

http://twitdigest.iiitd.edu.in/TweetCred/

Next steps
• Credibility facets
– Factually written
– Detailed
– Author on the ground
– ...
• Respond to searches about an event

[Venn diagram: good projects in this space lie at the intersection of "Computationally feasible", "Supported by data" and "Useful"]

[Same diagram, with regions outside the intersection labeled: Temptation!, Danger!, Poorly planned projects :-(, AI-complete problems]

Some venues
• SWDM – Workshop on Social Web
for Disaster Management
– Deadline: January 24th
• ISCRAM – International Conference on Information Systems
for Crisis Response and Management
+ the usual suspects, depending on your area ;-)

Possibility of large impact by using computer
science to support humanitarian work
=
Applied computing at its best

Thank you!
Carlos Castillo · chato@acm.org
With thanks to Patrick Meier for several slides
