USING SOCIAL MEDIA TO ENHANCE
EMERGENCY SITUATION AWARENESS
Web Information Retrieval 2018/2019
Danilo Marzilli
Andrea Lombardo
Daniele Davoli
Prof. Andrea Vitaletti - Prof. Luca Becchetti
Introduction and goals
Goals
Real-time event detection through social media
• Users as sensors
• Type of disaster: earthquake and flood
Online/Hierarchical clustering
• Topic discovery
Classification problems
• Relevant/Not relevant (SVM)
• Flood/Earthquake (SVM & NB)
Experimental results
The dataset: CRESCI-SWDM15
5,642 manually annotated tweets in Italian, covering 4 different natural disasters
that occurred in Italy between 2009 and 2014, labeled with 3 classes:
• Damage class
• No damage class
• Not relevant class
Differences from the paper's dataset, which contains:
• Real-time tweets collected over an entire year
• English tweets
• A focus on Australian natural disasters
Preprocessing and vector transformation
NLTK Python library
1. Removal of punctuation, numbers, symbols, and stop words;
2. Stemming: Snowball Stemmer (for the Italian language);
3. Lemmatization: not possible.
scikit-learn Python library (TfidfVectorizer)
1. Build the vocabulary of terms;
2. Represent each tweet as a vector in a multidimensional space;
3. Weight each term by TF-IDF.
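The two stages listed above can be sketched in plain Python. This is a minimal illustration, not the project's code: the stop-word set is a tiny hypothetical sample, stemming is left out (the slides use NLTK's Snowball stemmer for it), and the IDF uses the smoothed form idf = ln((1 + n) / (1 + df)) + 1 without any vector normalization. The example tweets are invented.

```python
import math
import re
from collections import Counter

# Hypothetical, tiny stop-word sample; a real run would use a full Italian list.
STOP_WORDS = {"il", "la", "di", "a", "e", "in", "un", "che"}

def preprocess(tweet):
    """Lowercase, keep only alphabetic tokens, then drop stop words."""
    tokens = re.findall(r"[^\W\d_]+", tweet.lower())
    return [t for t in tokens if t not in STOP_WORDS]

def tfidf_vectors(tweets):
    """Return (vocabulary, one TF-IDF vector per tweet)."""
    docs = [preprocess(t) for t in tweets]
    vocabulary = sorted({term for doc in docs for term in doc})
    n = len(docs)
    # Document frequency: number of tweets in which each term appears.
    df = Counter(term for doc in docs for term in set(doc))
    idf = {term: math.log((1 + n) / (1 + df[term])) + 1 for term in vocabulary}
    vectors = []
    for doc in docs:
        tf = Counter(doc)  # raw term counts within the tweet
        vectors.append([tf[term] * idf[term] for term in vocabulary])
    return vocabulary, vectors

# Invented example tweets, for illustration only.
tweets = [
    "Terremoto a Roma, 3 scosse!",
    "Alluvione in città: strade chiuse",
    "Scosse di terremoto",
]
vocabulary, vectors = tfidf_vectors(tweets)
```

Terms that occur in fewer tweets receive a larger weight, which is what makes a rare word such as a place name more informative than a word shared by many tweets.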
Clustering vs. Classification
Clustering
• Used for topic discovery
• Unsupervised learning
• The number and identity of the clusters are not known a priori
Classification
• Used for binary classification problems
• Supervised learning
• The classes are known in advance (e.g., relevant and not relevant)
• Pre-annotated training dataset
Hierarchical/Agglomerative Clustering
Used for topic discovery
Cosine similarity is used to compute the distance
Clustering based on centroid/prototype
The prototype/centroid is the representation of a cluster
Bottom-up approach
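A minimal sketch of bottom-up, centroid-based clustering with cosine distance follows. It rests on assumptions the slides do not spell out: the centroid is taken as the component-wise mean of a cluster's vectors, and merging stops once the closest pair of centroids is farther apart than a `threshold` value. The function names and the toy points are hypothetical.

```python
import math

def cosine_distance(u, v):
    """dist = 1 - cos(u, v); defined as 1.0 when either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / norm if norm else 1.0

def centroid(vectors):
    """Component-wise mean of the vectors in a cluster."""
    return [sum(column) / len(vectors) for column in zip(*vectors)]

def agglomerative(vectors, threshold):
    """Merge the two closest clusters until no pair is within the threshold."""
    clusters = [[i] for i in range(len(vectors))]  # start: one cluster per item
    while len(clusters) > 1:
        cents = [centroid([vectors[i] for i in c]) for c in clusters]
        # Closest pair of clusters, measured between their centroids.
        dist, a, b = min(
            (cosine_distance(cents[a], cents[b]), a, b)
            for a in range(len(clusters))
            for b in range(a + 1, len(clusters))
        )
        if dist > threshold:
            break  # remaining clusters are too dissimilar to merge
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return clusters

# Toy 2-D vectors, for illustration only.
points = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
clusters = agglomerative(points, threshold=0.2)
```

Raising the threshold lets more merges happen and so yields fewer clusters; lowering it leaves more, smaller clusters.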
Support Vector Machine & Naive Bayes
SVM finds a hyperplane that separates 2 classes with the lowest possible error
Naive Bayes counts words and uses their relative and absolute frequencies
Target classes:
• Relevant or Not Relevant
• Flood or Earthquake
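The word-counting idea behind Naive Bayes can be sketched as follows. This assumes a multinomial model with add-one smoothing, which the slides do not confirm as the variant used in the project; the training tweets are invented and the function names are hypothetical.

```python
import math
from collections import Counter, defaultdict

def train_nb(samples):
    """samples: list of (tokens, label). Returns the counts the model needs."""
    class_counts = Counter(label for _, label in samples)
    word_counts = defaultdict(Counter)
    for tokens, label in samples:
        word_counts[label].update(tokens)  # absolute word frequency per class
    vocabulary = {t for tokens, _ in samples for t in tokens}
    return class_counts, word_counts, vocabulary

def predict_nb(model, tokens):
    """Pick the class with the highest log-posterior (add-one smoothing)."""
    class_counts, word_counts, vocabulary = model
    total = sum(class_counts.values())
    scores = {}
    for label, count in class_counts.items():
        score = math.log(count / total)  # prior: relative class frequency
        denom = sum(word_counts[label].values()) + len(vocabulary)
        for t in tokens:
            if t in vocabulary:  # ignore words never seen in training
                score += math.log((word_counts[label][t] + 1) / denom)
        scores[label] = score
    return max(scores, key=scores.get)

# Invented toy training data, for illustration only.
training = [
    (["terremoto", "scossa", "forte"], "earthquake"),
    (["scossa", "paura", "terremoto"], "earthquake"),
    (["alluvione", "fiume", "acqua"], "flood"),
    (["acqua", "strade", "alluvione"], "flood"),
]
model = train_nb(training)
label = predict_nb(model, ["forte", "scossa"])
```

Working in log space avoids numerical underflow when many small word probabilities are multiplied together.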
Results
Clustering: number of clusters for each defined threshold
Naive Bayes: validation parameters
Accuracy reported by the original paper (1): 0.862
Accuracy reported by the original paper (2): 0.875
Note: distance computed as dist = 1 - cos(vec1, vec2)
Results
SVM, first experiment: ROC curve and AUC
SVM, second experiment: ROC curve and AUC
Burst detection
Goal: identify a natural disaster by comparing the frequency of terms in a given time window
with their historical average frequency
Not implemented in our project because:
• No real-time tweet stream
• Unknown historical average frequency
• The tweets only cover the time windows of the natural disasters, so there is no background noise
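Although burst detection was not implemented, the idea can be illustrated with a simple ratio test. This is only a sketch under stated assumptions: a term is flagged when its relative frequency in the current window exceeds `ratio` times its historical average, and the `ratio`, `min_count`, and floor values are hypothetical choices rather than the test used in the original paper. All numbers are invented.

```python
from collections import Counter

def bursty_terms(window_tokens, historical_freq, ratio=3.0, min_count=2):
    """Return terms whose relative frequency in the window exceeds
    ratio times their historical average relative frequency."""
    counts = Counter(window_tokens)
    total = len(window_tokens)
    bursts = []
    for term, count in counts.items():
        if count < min_count:
            continue  # too rare in this window to be meaningful
        current = count / total
        # Terms with no history get a small floor so unseen words can burst.
        baseline = historical_freq.get(term, 1e-6)
        if current > ratio * baseline:
            bursts.append(term)
    return sorted(bursts)

# Invented numbers, for illustration only.
history = {"terremoto": 0.001, "oggi": 0.1, "roma": 0.01}
window = ["terremoto"] * 6 + ["oggi"] * 2 + ["roma", "scossa"]
result = bursty_terms(window, history)
```

A common word such as "oggi" does not burst because its window frequency stays close to its historical average, which is exactly why the historical baseline is needed.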
Before and after preprocessing
Before
After
Vocabulary and vector representation
Vocabulary: collection of the terms found in the tweets
Vector representation: used to evaluate the similarity between
tweets
TF-IDF: used to evaluate the frequency and the
informativeness of a term
Agglomerative clustering code
SVM code
Naive Bayes Code
Gamma parameter
• The gamma parameter can be seen as the inverse of the radius of influence of the samples selected by the
model as support vectors;
• The penalty for each misclassification is set by the separate C parameter, not by gamma;
• The higher the value of gamma, the smaller the region that each support vector influences.
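The effect of gamma is easiest to see on the kernel itself. The sketch below assumes an RBF kernel, K(x, y) = exp(-gamma * ||x - y||^2), which the slides do not explicitly name; the points and gamma values are arbitrary.

```python
import math

def rbf_kernel(x, y, gamma):
    """K(x, y) = exp(-gamma * ||x - y||^2)."""
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * squared_distance)

support_vector = [0.0, 0.0]
sample = [1.0, 1.0]  # squared distance 2 from the support vector

# The same pair of points looks less and less similar as gamma grows.
for gamma in (0.1, 1.0, 10.0):
    print(gamma, rbf_kernel(support_vector, sample, gamma))
```

With a large gamma, only samples very close to a support vector are affected by it, so the model can fit the training data very tightly and risks overfitting.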
ROC curve and AUC
• The ROC curve is a graphical plot that
shows the performance of a binary classifier
as its decision threshold varies;
• It is created by plotting the true positive rate
against the false positive rate;
• AUC is the area under the ROC curve, a value in [0, 1]:
• 0 means that every prediction made by
the system is wrong;
• 1 means a perfect classifier
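As a worked illustration, the curve and its area can be computed by hand from a list of true labels and classifier scores. This is a minimal sketch with invented data and hypothetical function names; it assumes both classes are present and uses the trapezoidal rule for the area.

```python
def roc_points(labels, scores):
    """Return the ROC curve as a list of (false positive rate, true positive rate)."""
    positives = sum(labels)
    negatives = len(labels) - positives
    points = [(0.0, 0.0)]
    tp = fp = 0
    # Lower the decision threshold one distinct score at a time.
    ranked = sorted(zip(scores, labels), reverse=True)
    i = 0
    while i < len(ranked):
        threshold = ranked[i][0]
        while i < len(ranked) and ranked[i][0] == threshold:
            if ranked[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        points.append((fp / negatives, tp / positives))
    return points

def auc(points):
    """Area under the curve by the trapezoidal rule."""
    return sum((x2 - x1) * (y1 + y2) / 2
               for (x1, y1), (x2, y2) in zip(points, points[1:]))

# Invented labels (1 = relevant) and classifier scores, for illustration only.
labels = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.3, 0.2]
curve = roc_points(labels, scores)
area = auc(curve)
```

Here 8 of the 9 positive/negative pairs are ranked in the right order, so the area is 8/9; AUC can be read as the probability that a randomly chosen positive is scored above a randomly chosen negative.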
Improvements
It could be interesting to run the following experiments:
• Repeat the same experiments on other social media (Facebook and Instagram)
• Create a system to help the population and police forces in case of criminal and terrorist
attacks
• Use cross-validation when training the classification algorithms
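The cross-validation suggested in the last point could start from a k-fold split such as the one sketched below. The function name is hypothetical, and the folds are contiguous and unshuffled for simplicity; in practice the tweets would usually be shuffled (and possibly stratified by class) first.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_indices, test_indices) for each of k contiguous folds."""
    # Spread the remainder over the first folds so sizes differ by at most 1.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = list(range(start, start + size))
        train = [i for i in range(n_samples) if i < start or i >= start + size]
        yield train, test
        start += size

folds = list(k_fold_indices(10, 3))
for train, test in folds:
    print("train:", train, "test:", test)
```

Each tweet is used for testing exactly once, and the accuracy is averaged over the k runs instead of depending on a single train/test split.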
Our team
Danilo Marzilli: https://www.linkedin.com/in/danilomarzilli/
Andrea Lombardo: https://www.linkedin.com/in/andrea-lombardo-2103ba15a/
Daniele Davoli: https://www.linkedin.com/in/danieledavoli/
REFERENCES
• Jie Yin, Andrew Lampert, Mark Cameron, Bella Robinson, and Robert
Power, Using Social Media to Enhance Emergency Situation
Awareness, 2012, IEEE, ISSN 1541-1672;
https://ieeexplore.ieee.org/document/6148196/
• Cresci-SWDM15, http://socialsensing.it/en/datasets
• NumPy library, https://www.numpy.org/devdocs/
• Sklearn library, http://scikit-learn.org/stable/documentation.html
• Natural Language ToolKit library, http://www.nltk.org/
Code on GitHub
