Using Social Media to Enhance Emergency Situation Awareness

Web Information Retrieval 2018/2019
Danilo Marzilli
Andrea Lombardo
Daniele Davoli
Prof. Andrea Vitaletti - Prof. Luca Becchetti

Goals
Real-time event detection through social media
• Users as sensors
• Type of disaster: earthquake and flood
Introduction and goals
Online/Hierarchical clustering
• Topic discovery
Classification problems
• Relevant/Not relevant (SVM)
• Flood/Earthquake (SVM & NB)
Experimental results

The dataset: CRESCI-SWDM15
5,642 manually annotated tweets in Italian, covering 4 different natural disasters
that occurred in Italy between 2009 and 2014, labeled with 3 classes:
• Damage class
• No damage class
• Not relevant class
Differences from the paper's dataset, which has:
Real-time tweets collected over an entire year
English tweets
Focus on Australian natural disasters

Preprocessing and vector transformation
NLTK Python library
1. Punctuation, numbers, symbols, stop words elimination;
2. Stemming: Snowball Stemmer (for the Italian language);
3. Lemmatization: Not possible.
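The preprocessing steps above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the stop-word list is a tiny hypothetical stand-in (NLTK's full Italian list could be used instead), the regular expressions are assumptions, and the sample tweet is invented.

```python
import re

from nltk.stem.snowball import SnowballStemmer

# Hypothetical, deliberately tiny stop-word list so the sketch needs no
# corpus download; a complete Italian list would be used in practice.
STOP_WORDS = {"il", "la", "l", "di", "a", "e", "in", "un", "una", "che", "per"}

stemmer = SnowballStemmer("italian")


def preprocess(tweet):
    """Lowercase, strip URLs, punctuation, numbers and symbols,
    drop stop words, then stem what is left."""
    text = tweet.lower()
    text = re.sub(r"https?://\S+", " ", text)     # drop URLs
    text = re.sub(r"[^a-zàèéìòù\s]", " ", text)   # keep letters only
    tokens = [t for t in text.split() if t not in STOP_WORDS]
    return [stemmer.stem(t) for t in tokens]


# Invented example tweet.
print(preprocess("Forte scossa di terremoto a L'Aquila!!! 3 feriti http://t.co/x"))
```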
scikit-learn Python library (TfidfVectorizer)
1. Build the vocabulary of terms;
2. Represent each tweet as a vector in a multidimensional space;
3. TF-IDF weight.
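As an illustration of the three vectorization steps, here is a small sketch using scikit-learn's TfidfVectorizer with its default settings. The three stemmed "tweets" are invented, and the defaults are an assumption: the slides do not say which parameters the project used.

```python
from sklearn.feature_extraction.text import TfidfVectorizer

# Hypothetical, already-preprocessed tweets (stemmed tokens joined by spaces).
tweets = [
    "fort scoss terremot",
    "terremot pau cas",
    "alluvion strad allag",
]

vectorizer = TfidfVectorizer()          # default settings, not tuned
X = vectorizer.fit_transform(tweets)    # sparse matrix: one row per tweet

print(sorted(vectorizer.vocabulary_))   # the learned vocabulary of terms
print(X.shape)                          # (number of tweets, vocabulary size)
```

Each row of `X` is the TF-IDF vector of one tweet; with the default settings the rows are L2-normalized, so the dot product of two rows is their cosine similarity.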

Clustering VS Classification
Clustering
• Used for topic discovery
• Unsupervised learning
• You don’t know a priori how many clusters
there are or which ones
Classification
• Used for binary classification
problems
• Supervised learning
• You know the classes (e.g. relevant
and not relevant)
• Pre-annotated training dataset

Hierarchical/Agglomerative Clustering
Used for topic discovery
Cosine similarity is used to compute the distance
Clustering based on centroid/prototype
The prototype/centroid is the representation of a cluster
Bottom-up approach
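A bottom-up, centroid-based clustering with cosine distance might look like the sketch below. It is a simplified illustration, not the project's implementation: the function names, the stopping rule (stop when the closest pair of centroids is farther apart than `threshold`) and the toy vectors are all assumptions.

```python
import numpy as np


def cosine_distance(u, v):
    """dist = 1 - cos(u, v); assumes neither vector is all zeros."""
    return 1.0 - float(u @ v) / (np.linalg.norm(u) * np.linalg.norm(v))


def agglomerative(vectors, threshold):
    """Merge the two closest centroids until none are closer than `threshold`."""
    clusters = [[i] for i in range(len(vectors))]   # start: one cluster per tweet
    centroids = [np.asarray(v, dtype=float) for v in vectors]
    while len(clusters) > 1:
        # Find the pair of clusters whose centroids are closest.
        dist, a, b = min(
            ((cosine_distance(centroids[a], centroids[b]), a, b)
             for a in range(len(clusters))
             for b in range(a + 1, len(clusters))),
            key=lambda t: t[0],
        )
        if dist > threshold:
            break
        merged = clusters[a] + clusters[b]
        # The new centroid is the mean of all member vectors.
        centroid = np.mean([vectors[i] for i in merged], axis=0)
        for idx in (b, a):                          # delete the higher index first
            del clusters[idx], centroids[idx]
        clusters.append(merged)
        centroids.append(centroid)
    return clusters


# Hypothetical toy vectors: two tweets per topic.
vectors = np.array([[1.0, 0.0, 0.1], [0.9, 0.1, 0.0],
                    [0.0, 1.0, 0.0], [0.1, 0.9, 0.1]])
print(agglomerative(vectors, threshold=0.3))        # [[0, 1], [2, 3]]
```

Lowering the threshold yields more, smaller clusters; raising it merges topics together.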

Support Vector Machine & Naive Bayes
SVM finds a hyperplane that separates the 2 classes with the lowest possible error
Naive Bayes counts words and uses their relative and absolute frequencies
Target classes:
• Relevant or Not Relevant
• Flood or Earthquake
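For the Flood/Earthquake problem, the two classifiers could be trained along these lines with scikit-learn. This is a sketch under assumptions: the training tweets are invented (the real experiments used CRESCI-SWDM15), and the RBF kernel and default hyperparameters are guesses, not settings stated in the slides.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline
from sklearn.svm import SVC

# Hypothetical toy training set.
texts = [
    "forte scossa di terremoto in centro",
    "terremoto crollano case dopo la scossa",
    "scossa avvertita paura per il terremoto",
    "alluvione strade allagate in città",
    "fiume esondato alluvione in corso",
    "pioggia forte strade allagate alluvione",
]
labels = ["earthquake"] * 3 + ["flood"] * 3

# Each pipeline turns raw text into TF-IDF vectors, then classifies them.
svm = make_pipeline(TfidfVectorizer(), SVC(kernel="rbf", gamma="scale"))
nb = make_pipeline(TfidfVectorizer(), MultinomialNB())
svm.fit(texts, labels)
nb.fit(texts, labels)

new_tweet = ["nuova scossa di terremoto"]
print(svm.predict(new_tweet), nb.predict(new_tweet))
```

The Relevant/Not relevant problem has the same shape; only the labels change.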

Results
Clustering: number of clusters for each defined threshold*
Naive Bayes: parameters for validating
Accuracy by original paper (1): 0.862
Accuracy by original paper (2): 0.875
* Distance computed as dist = 1 - cos(vec1, vec2)

Results
SVM, first experiment: ROC curve and AUC
SVM, second experiment: ROC curve and AUC

Burst detection
Goal: identify a natural disaster by comparing term frequencies in a given time window
with their historical average frequencies
Not implemented in our project because:
• No real-time tweet stream
• Unknown historical average frequency
• The dataset only contains tweets from the natural-disaster time windows, so there is no noise
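Although burst detection was not implemented, the idea can be sketched in a few lines. Everything here is hypothetical: the function name, the `ratio` and `min_count` parameters, the floor of 0.5 for unseen terms, and the sample counts are illustrative choices, not values from the paper or the project.

```python
from collections import Counter


def bursty_terms(window_tokens, historical_avg, ratio=3.0, min_count=2):
    """Return terms whose count in the current window is at least `ratio`
    times their historical average count per window."""
    counts = Counter(window_tokens)
    bursts = {}
    for term, count in counts.items():
        baseline = historical_avg.get(term, 0.0)
        # Unseen terms get a small floor so the comparison stays meaningful.
        if count >= min_count and count >= ratio * max(baseline, 0.5):
            bursts[term] = count
    return bursts


# Hypothetical historical averages (occurrences per time window).
historical_avg = {"terremoto": 1.0, "oggi": 4.0, "casa": 2.0}
# Hypothetical tokens observed in the current time window.
window = ["terremoto"] * 6 + ["oggi"] * 5 + ["casa"] * 2 + ["scossa"] * 3

print(bursty_terms(window, historical_avg))  # {'terremoto': 6, 'scossa': 3}
```

Common words such as "oggi" stay below their usual level and are ignored, while "terremoto" and the previously unseen "scossa" stand out.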

Before and after preprocessing
Before
After

Vocabulary and vector representation
Vocabulary: collection of the terms found in the tweets
Vector representation: to evaluate the similarity between
tweets
TF-IDF: to evaluate the frequency and the
informativeness of a term
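For reference, the textbook form of the TF-IDF weight and of the cosine similarity between two tweet vectors is given below. The symbols are introduced here for illustration: tf is the count of term t in tweet d, df is the number of tweets containing t, and N is the total number of tweets. scikit-learn's TfidfVectorizer uses a smoothed variant of the IDF by default, so its numbers differ slightly from this form.

```latex
\[
  w_{t,d} = \mathrm{tf}_{t,d} \cdot \log\frac{N}{\mathrm{df}_t},
  \qquad
  \cos(\mathbf{u}, \mathbf{v}) =
    \frac{\mathbf{u} \cdot \mathbf{v}}
         {\lVert \mathbf{u} \rVert \, \lVert \mathbf{v} \rVert}
\]
```

A term that appears in every tweet gets weight 0, while a term concentrated in a few tweets gets a high weight.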

Gamma parameter
• The gamma parameter is the inverse of the radius of influence of the samples selected by the
model as support vectors;
• It is not the penalty for each misclassification: that role belongs to the separate C parameter;
• The higher the value of gamma, the narrower the region influenced by each support vector.
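The effect of gamma is easiest to see on the RBF kernel itself, assuming that is the kernel in use (the slides do not name it). The sketch below evaluates K(x, y) = exp(-gamma * ||x - y||^2) for two hypothetical points and a few illustrative gamma values.

```python
import math


def rbf_kernel(x, y, gamma):
    """K(x, y) = exp(-gamma * ||x - y||^2)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * sq_dist)


x, y = (0.0, 0.0), (1.0, 1.0)   # two hypothetical points, squared distance 2
for gamma in (0.01, 0.1, 1.0, 10.0):
    print(gamma, rbf_kernel(x, y, gamma))
```

As gamma grows, the kernel value for the same pair of points drops toward 0: each sample influences only its immediate neighborhood, which tends to produce a tighter, more overfit decision boundary.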

ROC curve and AUC
• The ROC curve is a graphical plot that
shows the performance of a binary classifier
as its decision threshold varies;
• It is created by plotting the true positive rate
against the false positive rate;
• AUC is the area under the curve, in [0,1]:
• 0 means that every prediction made by
the system is wrong;
• 0.5 corresponds to random guessing;
• 1 means a perfect classifier
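To make the construction concrete, here is a small dependency-free sketch that builds the ROC points by sweeping the threshold over the classifier scores and integrates them with the trapezoidal rule. The function names, labels and scores are invented for illustration; in practice scikit-learn's `roc_curve` and `auc` would normally be used.

```python
def roc_points(y_true, scores):
    """Return (FPR, TPR) points obtained by sweeping the decision threshold."""
    pos = sum(y_true)
    neg = len(y_true) - pos
    points = [(0.0, 0.0)]
    tp = fp = 0
    ranked = sorted(zip(scores, y_true), reverse=True)
    for i, (score, label) in enumerate(ranked):
        if label == 1:
            tp += 1
        else:
            fp += 1
        # Emit a point only after the last sample sharing this score (ties).
        if i + 1 == len(ranked) or ranked[i + 1][0] != score:
            points.append((fp / neg, tp / pos))
    return points


def auc(points):
    """Area under the curve by the trapezoidal rule."""
    return sum((x2 - x1) * (y1 + y2) / 2
               for (x1, y1), (x2, y2) in zip(points, points[1:]))


# Hypothetical labels (1 = relevant) and classifier scores.
y_true = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]
print(auc(roc_points(y_true, scores)))   # about 0.889 (8 of 9 pairs ranked correctly)
```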

Improvements
It could be interesting to run the following experiments:
• Repeat the same experiments on other social media platforms (Facebook and Instagram)
• Create a system to help the population and police forces in case of criminal and terrorist
attacks
• Use cross-validation when training the classification algorithms
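The cross-validation improvement could be tried with scikit-learn's `cross_val_score`, as in this minimal sketch. The toy tweets, the choice of Naive Bayes and the 3 folds are assumptions made only for illustration.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Hypothetical toy dataset: 6 earthquake tweets, then 6 flood tweets.
texts = [
    "forte scossa di terremoto", "terremoto crollano case",
    "scossa avvertita in centro", "paura per il terremoto",
    "nuova scossa stanotte", "terremoto danni alle case",
    "alluvione strade allagate", "fiume esondato in città",
    "pioggia forte alluvione", "strade allagate dal fiume",
    "alluvione danni in città", "fiume in piena pioggia",
]
labels = [1] * 6 + [0] * 6

# Vectorizing inside the pipeline keeps each test fold out of the vocabulary fit.
model = make_pipeline(TfidfVectorizer(), MultinomialNB())
fold_scores = cross_val_score(model, texts, labels, cv=3)  # accuracy per fold

print(fold_scores, fold_scores.mean())
```

Averaging accuracy over several folds gives a less optimistic estimate than a single train/test split.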

Our team
Danilo Marzilli: https://www.linkedin.com/in/danilomarzilli/
Andrea Lombardo: https://www.linkedin.com/in/andrea-lombardo-2103ba15a/
Daniele Davoli: https://www.linkedin.com/in/danieledavoli/

REFERENCES
• Jie Yin, Andrew Lampert, Mark Cameron, Bella Robinson, and Robert
Power, Using Social Media to Enhance Emergency Situation
Awareness, 2012, IEEE, 1541-1672;
https://ieeexplore.ieee.org/document/6148196/
• Cresci-SWDM15, http://socialsensing.it/en/datasets
• NumPy library, https://www.numpy.org/devdocs/
• Sklearn library, http://scikit-learn.org/stable/documentation.html
• Natural Language ToolKit library, http://www.nltk.org/
Code on GitHub
