Identifying Relevant Messages in a
Twitter-based Citizen Channel
for Natural Disaster Situations
Alfredo Cobo (ajcobo@uc.cl)
Denis Parra (dparra@ing.puc.cl)
Jaime Navón (jnavon@ing.puc.cl)
Pontificia Universidad Católica de Chile
Departamento de Ciencia de la Computación
Av. Vicuña Mackenna 4860, Macul
Santiago, Chile
I (… and some other people in this room)
… come from Chile
Picture from http://www.quadrodemedalhas.com/images/mapas/mapa-chile.jpg
http://upload.wikimedia.org/wikipedia/commons/thumb/9/91/Chile_in_South_America_(-mini_map_-rivers).svg/409px-Chile_in_South_America_(-mini_map_-rivers).svg.png
Chile, well-known for its..
• Copper (Top Producer)
"Top 5 Copper Producers" by Plazak - Own work. Licensed under CC BY-SA 3.0 via Wikimedia Commons - http://commons.wikimedia.org/wiki/File:Top_5_Copper_Producers.png#/media/File:Top_5_Copper_Producers.png
https://www.google.com/url?sa=i&rct=j&q=&esrc=s&source=images&cd=&cad=rja&uact=8&ved=0CAYQjB0&url=http%3A%2F%2Fcommons.wikimedia.org%2Fwiki%2FFile%3ANative_Copper_(mineral).jpg&ei=L31ZVbOsL4r1UrbRgKAB&bvm=bv.93564037,d.d24&psig=AFQjCNHr2zm5m4Jmim7AgkCwwSb0b5mGUA&ust=1432014509629311
Chile, well-known for its..
• Wine (Price + quality)
"Fiesta de Vendimia" by LuxoDresden - Own work. Licensed under CC BY-SA 3.0 via Wikimedia Commons - http://commons.wikimedia.org/wiki/File:Fiesta_de_Vendimia.JPG#/media/File:Fiesta_de_Vendimia.JPG
If you start typing in Google…
9 out of 10 disasters … prefer Chile
… and for Natural Disasters :(
• Largest earthquake ever recorded: Valdivia, Chile, May 22, 1960 (9.5 on the Richter scale)
• We usually have one large earthquake every 30 years (around 8 on the Richter scale)
• The last one, in 2010, struck close to Concepción, but it also affected Santiago (the capital)
… so, at PUC Chile
• We created CIGIDEN, the “National Research Center for the Integrated Administration of Natural Disasters”
CIGIDEN’s Goal in this project
• Help citizens stay informed during natural disasters by using social media.
• Build a mobile application (Carlos Molina)
• Automatically filter relevant messages from those not related to earthquakes (Alfredo Cobo) to feed the application
Our Task: Building a Twitter classifier
- Separate tweets related to natural disasters from those that are not.
Related Work
Manual Classification
• Vieweg et al. (2010)
• Imran et al. (2013)
• Mendoza et al. (2010)
Data Post-processing
• Mendoza et al. (2010)
• Castillo et al. (2011) (Information Credibility on Twitter)
Feature Generation (not necessarily for natural disasters)
• Gimpel et al. (2011)
• Kouloumpis et al. (2011)
• Liu et al. (2012)
• Wu et al. (2011)
• Lee et al. (2014)
Tools for Disaster Management
• Hiltz et al. (2013)
• Power et al. (2013)
• Caragea et al. (2011)
• Abel et al. (2012)
• Middleton et al. (2014)
• Morstatter et al. (2013)
• Imran et al. (2014)
Why would building this classifier be a contribution?
• Building and validating a ground truth for classifying tweets in Spanish.
• Building the classifier and dealing with:
  • Class imbalance
  • Number of latent dimensions (feature generation using LDA)
Workflow of Activities
[Workflow diagram. Boxes: Chile's 2010 earthquake dataset, Castillo et al. (2010); sampling, cleaning & filtering; our ground truth; non-relevant messages; realistic dataset; classifiers (feature selection with LDA, class imbalance). Label: 10% - 80%]
Building the ground truth
• Random sample of 5,000 tweets from the Castillo et al. (2010) dataset, which was used to study credibility around Chile's 2010 earthquake.
• Dates: February 27 to March 2, 2010 (spanning 4 days)
• We kept only Spanish messages and removed messages that were too similar to each other (Levenshtein distance): 2,187 messages left
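The near-duplicate filtering step could look roughly like the sketch below. It is only an illustration: the slides name Levenshtein distance but give neither the similarity measure nor the cut-off, so the normalized `similarity` function, the `threshold=0.8` default, and the sample tweets are all assumptions.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance between two strings (insertions, deletions, substitutions)."""
    if len(a) < len(b):
        a, b = b, a
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def similarity(a: str, b: str) -> float:
    """Normalized similarity in [0, 1]; 1.0 means identical (assumed measure)."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return 1.0 - levenshtein(a, b) / longest


def drop_near_duplicates(tweets, threshold=0.8):
    """Keep a tweet only if it is not too similar to any tweet already kept.

    The threshold is hypothetical; the slides do not state the value used.
    """
    kept = []
    for tweet in tweets:
        if all(similarity(tweet, other) < threshold for other in kept):
            kept.append(tweet)
    return kept


# Made-up sample messages, not taken from the dataset.
sample = [
    "Terremoto en Chile",
    "Terremoto en Chile!",
    "Busco a mi hermano en Concepción",
]
unique = drop_near_duplicates(sample)
print(unique)
```

This greedy pass compares every tweet with all tweets kept so far, which is quadratic in the number of messages; that is workable for a 5,000-tweet sample but would need blocking or hashing at a larger scale.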
  
Validating the ground truth
• Fleiss' kappa:
  • κ = 0.645, p < .001
• Intraclass correlation:
  • ICC(2,1) = 0.646, p < .001
• Landis and Koch (1977)
• Relevant messages were labeled based on the Imran et al. (2013) classification:
  • Caution/Warning
  • Casualties and Damage
  • People (missing, found, etc.)
  • Information source
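For readers unfamiliar with Fleiss' kappa, the following sketch shows how the statistic is computed from a table of rating counts and how the Landis and Koch (1977) bands are read. The function names and the toy rating tables are illustrative assumptions; the slides do not state the number of annotators or show the rating data.

```python
def fleiss_kappa(counts):
    """Fleiss' kappa for a table where counts[i][j] is the number of raters
    who assigned item i to category j (same number of raters per item)."""
    n_items = len(counts)
    n_raters = sum(counts[0])
    n_categories = len(counts[0])

    # Share of all ratings that fell into each category.
    p_category = [
        sum(row[j] for row in counts) / (n_items * n_raters)
        for j in range(n_categories)
    ]
    # Observed agreement per item, then averaged over items.
    p_item = [
        (sum(c * c for c in row) - n_raters) / (n_raters * (n_raters - 1))
        for row in counts
    ]
    p_observed = sum(p_item) / n_items
    # Agreement expected by chance.
    p_expected = sum(p * p for p in p_category)
    if p_expected == 1.0:
        return 1.0  # every rating is in one category; agreement is trivially perfect
    return (p_observed - p_expected) / (1.0 - p_expected)


def landis_koch_label(kappa):
    """Agreement band from Landis and Koch (1977)."""
    if kappa < 0:
        return "poor"
    for upper, label in [(0.20, "slight"), (0.40, "fair"),
                         (0.60, "moderate"), (0.80, "substantial")]:
        if kappa <= upper:
            return label
    return "almost perfect"


# Toy example: 4 tweets, 3 hypothetical raters, categories (relevant, not relevant).
toy_counts = [[3, 0], [0, 3], [2, 1], [1, 2]]
print(fleiss_kappa(toy_counts))
print(landis_koch_label(0.645))
```

Under the Landis and Koch bands, the reported κ = 0.645 falls in the 0.61 to 0.80 range, which they label substantial agreement.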
  
Classification Problem
Features
• User network: Followers, Followees
• Content (4,766 unique words): Hashtags, Words, User mentions
Class Imbalance
• The ground truth is not a realistic representation of Twitter
• We added “noise”: introduced tweets not relevant to the event (20% - 80%)
• Sampled non-relevant tweets from 5 months.
• Removed all tweets posted during days of seismic activity
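One way to picture the noise step is the sketch below, which mixes labeled ground-truth tweets with randomly sampled unrelated tweets until the unrelated ones make up a chosen share of the result. Treating the "noise proportion" as the fraction of the final dataset that is noise is an assumption, as are the function name `mix_with_noise`, the `(text, is_relevant)` pair layout, and the synthetic data.

```python
import random


def mix_with_noise(ground_truth, noise_pool, noise_proportion, seed=0):
    """Return ground_truth plus sampled noise tweets, shuffled.

    ground_truth: list of (text, is_relevant) pairs.
    noise_pool: list of tweet texts unrelated to the event.
    noise_proportion: assumed to be the share of noise in the final dataset.
    """
    if not 0 <= noise_proportion < 1:
        raise ValueError("noise_proportion must be in [0, 1)")
    # Solve n_noise / (n_truth + n_noise) = noise_proportion for n_noise.
    n_noise = round(len(ground_truth) * noise_proportion / (1 - noise_proportion))
    if n_noise > len(noise_pool):
        raise ValueError("noise pool is too small for the requested proportion")

    rng = random.Random(seed)
    noise = [(text, False) for text in rng.sample(noise_pool, n_noise)]
    mixed = list(ground_truth) + noise
    rng.shuffle(mixed)
    return mixed


# Synthetic stand-ins for the real tweets.
truth = [("quake %d" % i, True) for i in range(80)]
pool = ["noise %d" % i for i in range(500)]
dataset = mix_with_noise(truth, pool, 0.2)
print(len(dataset))
```

Fixing the seed keeps the sampled noise reproducible, which matters when comparing classifiers across noise levels.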
  
Classification Results
Model                Precision  Recall  F1 score  Accuracy  AUC    Dimensions  Noise Proportion
Baseline             0.625      0.545   0.53      0.5       0.568  -           0
Bernoulli NB         0.831      0.226   0.355     0.594     0.605  2000        0
Logistic Regression  0.827      0.641   0.722     0.756     0.834  2000        0.6
Linear SVM           0.687      0.677   0.682     0.687     0.719  1000        0.6
Random Forest        0.807      0.673   0.734     0.758     0.844  1000        0.8
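As a quick consistency check on the reported metrics, the F1 score is the harmonic mean of precision and recall. The sketch below recomputes it for the four trained classifiers from the precision and recall values in the results table; the `f1_score` helper and the `reported` dictionary are illustrative names, not part of the original work. The baseline row does not reproduce this way, and the slides do not say how it was computed, so it is left out.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (precision, recall, reported F1) copied from the results table.
reported = {
    "Bernoulli NB": (0.831, 0.226, 0.355),
    "Logistic Regression": (0.827, 0.641, 0.722),
    "Linear SVM": (0.687, 0.677, 0.682),
    "Random Forest": (0.807, 0.673, 0.734),
}

for model, (precision, recall, f1_reported) in reported.items():
    recomputed = round(f1_score(precision, recall), 3)
    print(model, recomputed, f1_reported)
```

The Bernoulli NB row shows why F1 is more informative than precision alone here: its precision is the highest in the table, but the low recall pulls the harmonic mean down sharply.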
Analysis ~ LDA Dimensions and Noise
Conclusions & Future Work
• We built and validated a ground truth of tweets in Spanish relevant to disasters
• We implemented a classifier and analyzed its performance with several algorithms while dealing with the class imbalance problem
• Future work: move the application from prototype to production and test online scalability
That’s all folks!
• Thanks! Questions to the corresponding author, Alfredo Cobo (ajcobo@uc.cl), or Denis Parra (dparra@uc.cl)
Chile, small country, but well-known for its..
• Length (4,300 km)
[Map labels: ~4,300 km, ~8,000 km]
Model Features
• Newman et al. (2007)
• Biro et al. (2008)
• Wei et al. (2006)
• Wang et al. (2012)
• Han (2005)
Features: Followers, Friends
Corpora Features: Hashtags, Words, User mentions
Results
• Amatriain et al. (2013)
Architecture
Plots of bootstrap
[Panels: Agreement Day 1, Agreement Day 2, Agreement Day 3, Agreement Day 4]
Word Frequencies
Just “Terremoto”: AUC
Related Work
Manual classification
• Vieweg et al. (2010)
• Imran et al. (2013)
Post Processing
• Castillo et al. (2011)
• Mendoza et al. (2010)
Feature Generation Approaches
• Gimpel et al. (2011)
• Kouloumpis et al. (2011)
• Liu et al. (2012)
• Wu et al. (2011)
• Lee et al. (2014)
Tools For Disaster Management
• Hiltz et al. (2013)
• Power et al. (2013)
• Caragea et al. (2011)
• Abel et al. (2012)
• Middleton et al. (2014)
• Morstatter et al. (2013)
• Imran et al. (2014)
Building the ground truth
• Mendoza et al. (2010)
• Imran et al. (2013)
Algorithms and evaluation procedure
• Castillo et al. (2011)
• Fawcett et al. (2004)
• Manning et al. (2008)
• Wen et al. (2014)

More Related Content

Similar to Identifying Relevant Messages in a Twitter-based Citizen Channel for Natural Disaster Situations

Subword and spatiotemporal models for identifying actionable information in ...
Subword and spatiotemporal models for identifying actionable information in ...Subword and spatiotemporal models for identifying actionable information in ...
Subword and spatiotemporal models for identifying actionable information in ...
Robert Munro
 
D-sieve : A Novel Data Processing Engine for Crises Related Social Messages
D-sieve : A Novel Data Processing Engine for Crises Related Social MessagesD-sieve : A Novel Data Processing Engine for Crises Related Social Messages
D-sieve : A Novel Data Processing Engine for Crises Related Social Messages
wire unitn
 
Coordinating Human and Machine Intelligence to Classify Microblog Communica0o...
Coordinating Human and Machine Intelligence to Classify Microblog Communica0o...Coordinating Human and Machine Intelligence to Classify Microblog Communica0o...
Coordinating Human and Machine Intelligence to Classify Microblog Communica0o...
Muhammad Imran
 
MediaEval 2018 Pixel Privacy: Task Overview
MediaEval 2018 Pixel Privacy: Task OverviewMediaEval 2018 Pixel Privacy: Task Overview
MediaEval 2018 Pixel Privacy: Task Overview
multimediaeval
 
The Opportunities and Challenges of Putting the Latest Computer Vision and De...
The Opportunities and Challenges of Putting the Latest Computer Vision and De...The Opportunities and Challenges of Putting the Latest Computer Vision and De...
The Opportunities and Challenges of Putting the Latest Computer Vision and De...
Albert Y. C. Chen
 
SBQS 2013 Keynote: Cooperative Testing and Analysis
SBQS 2013 Keynote: Cooperative Testing and AnalysisSBQS 2013 Keynote: Cooperative Testing and Analysis
SBQS 2013 Keynote: Cooperative Testing and Analysis
Tao Xie
 
Introduction to Machine Learning with H2O - Jo-Fai (Joe) Chow, H2O
Introduction to Machine Learning with H2O - Jo-Fai (Joe) Chow, H2OIntroduction to Machine Learning with H2O - Jo-Fai (Joe) Chow, H2O
Introduction to Machine Learning with H2O - Jo-Fai (Joe) Chow, H2O
Data Science Milan
 
Spohrer GAMP 20230628 v17.pptx
Spohrer GAMP 20230628 v17.pptxSpohrer GAMP 20230628 v17.pptx
Spohrer GAMP 20230628 v17.pptx
ISSIP
 
MapReduce In The Cloud Infinispan Distributed Task Execution Framework
MapReduce In The Cloud Infinispan Distributed Task Execution FrameworkMapReduce In The Cloud Infinispan Distributed Task Execution Framework
MapReduce In The Cloud Infinispan Distributed Task Execution Framework
Manik Surtani
 
Collecting and Coding Twitter Data in DiscoverText
Collecting and Coding Twitter Data in DiscoverTextCollecting and Coding Twitter Data in DiscoverText
Collecting and Coding Twitter Data in DiscoverText
Jill Hopke
 
AI4SE: Challenges and opportunities in the integration of Systems Engineering...
AI4SE: Challenges and opportunities in the integration of Systems Engineering...AI4SE: Challenges and opportunities in the integration of Systems Engineering...
AI4SE: Challenges and opportunities in the integration of Systems Engineering...
CARLOS III UNIVERSITY OF MADRID
 
Challenges in the integration of Systems Engineering and the AI/ML model life...
Challenges in the integration of Systems Engineering and the AI/ML model life...Challenges in the integration of Systems Engineering and the AI/ML model life...
Challenges in the integration of Systems Engineering and the AI/ML model life...
CARLOS III UNIVERSITY OF MADRID
 
ICServ2023 20230914 v8.pptx
ICServ2023 20230914 v8.pptxICServ2023 20230914 v8.pptx
ICServ2023 20230914 v8.pptx
ISSIP
 
2023-08-22 CoLLAs Tutorial - Beyond CIL.pdf
2023-08-22 CoLLAs Tutorial - Beyond CIL.pdf2023-08-22 CoLLAs Tutorial - Beyond CIL.pdf
2023-08-22 CoLLAs Tutorial - Beyond CIL.pdf
Vincenzo Lomonaco
 
Sahana overview and history of sahana-aaai 2015
Sahana   overview and history of sahana-aaai 2015Sahana   overview and history of sahana-aaai 2015
Sahana overview and history of sahana-aaai 2015
Chamindra de Silva
 
UCSC Tech4Good 20240306 v12 David_Lee Leadership_and_Career
UCSC Tech4Good 20240306 v12 David_Lee Leadership_and_CareerUCSC Tech4Good 20240306 v12 David_Lee Leadership_and_Career
UCSC Tech4Good 20240306 v12 David_Lee Leadership_and_Career
ISSIP
 
Brian Fabo
Brian FaboBrian Fabo
Brian Fabo
Eduworks Network
 
Deep Learning: Changing the Playing Field of Artificial Intelligence - MaRS G...
Deep Learning: Changing the Playing Field of Artificial Intelligence - MaRS G...Deep Learning: Changing the Playing Field of Artificial Intelligence - MaRS G...
Deep Learning: Changing the Playing Field of Artificial Intelligence - MaRS G...
MaRS Discovery District
 
AIM 20240515 v15 Solomon_Darwin Berkeley at UCSCSV.pptx
AIM 20240515 v15 Solomon_Darwin Berkeley at UCSCSV.pptxAIM 20240515 v15 Solomon_Darwin Berkeley at UCSCSV.pptx
AIM 20240515 v15 Solomon_Darwin Berkeley at UCSCSV.pptx
ISSIP
 
Classifying Crisis Information Relevancy with Semantics (ESWC 2018)
Classifying Crisis Information Relevancy with Semantics (ESWC 2018)Classifying Crisis Information Relevancy with Semantics (ESWC 2018)
Classifying Crisis Information Relevancy with Semantics (ESWC 2018)
Prashant Khare
 

Similar to Identifying Relevant Messages in a Twitter-based Citizen Channel for Natural Disaster Situations (20)

Subword and spatiotemporal models for identifying actionable information in ...
Subword and spatiotemporal models for identifying actionable information in ...Subword and spatiotemporal models for identifying actionable information in ...
Subword and spatiotemporal models for identifying actionable information in ...
 
D-sieve : A Novel Data Processing Engine for Crises Related Social Messages
D-sieve : A Novel Data Processing Engine for Crises Related Social MessagesD-sieve : A Novel Data Processing Engine for Crises Related Social Messages
D-sieve : A Novel Data Processing Engine for Crises Related Social Messages
 
Coordinating Human and Machine Intelligence to Classify Microblog Communica0o...
Coordinating Human and Machine Intelligence to Classify Microblog Communica0o...Coordinating Human and Machine Intelligence to Classify Microblog Communica0o...
Coordinating Human and Machine Intelligence to Classify Microblog Communica0o...
 
MediaEval 2018 Pixel Privacy: Task Overview
MediaEval 2018 Pixel Privacy: Task OverviewMediaEval 2018 Pixel Privacy: Task Overview
MediaEval 2018 Pixel Privacy: Task Overview
 
The Opportunities and Challenges of Putting the Latest Computer Vision and De...
The Opportunities and Challenges of Putting the Latest Computer Vision and De...The Opportunities and Challenges of Putting the Latest Computer Vision and De...
The Opportunities and Challenges of Putting the Latest Computer Vision and De...
 
SBQS 2013 Keynote: Cooperative Testing and Analysis
SBQS 2013 Keynote: Cooperative Testing and AnalysisSBQS 2013 Keynote: Cooperative Testing and Analysis
SBQS 2013 Keynote: Cooperative Testing and Analysis
 
Introduction to Machine Learning with H2O - Jo-Fai (Joe) Chow, H2O
Introduction to Machine Learning with H2O - Jo-Fai (Joe) Chow, H2OIntroduction to Machine Learning with H2O - Jo-Fai (Joe) Chow, H2O
Introduction to Machine Learning with H2O - Jo-Fai (Joe) Chow, H2O
 
Spohrer GAMP 20230628 v17.pptx
Spohrer GAMP 20230628 v17.pptxSpohrer GAMP 20230628 v17.pptx
Spohrer GAMP 20230628 v17.pptx
 
MapReduce In The Cloud Infinispan Distributed Task Execution Framework
MapReduce In The Cloud Infinispan Distributed Task Execution FrameworkMapReduce In The Cloud Infinispan Distributed Task Execution Framework
MapReduce In The Cloud Infinispan Distributed Task Execution Framework
 
Collecting and Coding Twitter Data in DiscoverText
Collecting and Coding Twitter Data in DiscoverTextCollecting and Coding Twitter Data in DiscoverText
Collecting and Coding Twitter Data in DiscoverText
 
AI4SE: Challenges and opportunities in the integration of Systems Engineering...
AI4SE: Challenges and opportunities in the integration of Systems Engineering...AI4SE: Challenges and opportunities in the integration of Systems Engineering...
AI4SE: Challenges and opportunities in the integration of Systems Engineering...
 
Challenges in the integration of Systems Engineering and the AI/ML model life...
Challenges in the integration of Systems Engineering and the AI/ML model life...Challenges in the integration of Systems Engineering and the AI/ML model life...
Challenges in the integration of Systems Engineering and the AI/ML model life...
 
ICServ2023 20230914 v8.pptx
ICServ2023 20230914 v8.pptxICServ2023 20230914 v8.pptx
ICServ2023 20230914 v8.pptx
 
2023-08-22 CoLLAs Tutorial - Beyond CIL.pdf
2023-08-22 CoLLAs Tutorial - Beyond CIL.pdf2023-08-22 CoLLAs Tutorial - Beyond CIL.pdf
2023-08-22 CoLLAs Tutorial - Beyond CIL.pdf
 
Sahana overview and history of sahana-aaai 2015
Sahana   overview and history of sahana-aaai 2015Sahana   overview and history of sahana-aaai 2015
Sahana overview and history of sahana-aaai 2015
 
UCSC Tech4Good 20240306 v12 David_Lee Leadership_and_Career
UCSC Tech4Good 20240306 v12 David_Lee Leadership_and_CareerUCSC Tech4Good 20240306 v12 David_Lee Leadership_and_Career
UCSC Tech4Good 20240306 v12 David_Lee Leadership_and_Career
 
Brian Fabo
Brian FaboBrian Fabo
Brian Fabo
 
Deep Learning: Changing the Playing Field of Artificial Intelligence - MaRS G...
Deep Learning: Changing the Playing Field of Artificial Intelligence - MaRS G...Deep Learning: Changing the Playing Field of Artificial Intelligence - MaRS G...
Deep Learning: Changing the Playing Field of Artificial Intelligence - MaRS G...
 
AIM 20240515 v15 Solomon_Darwin Berkeley at UCSCSV.pptx
AIM 20240515 v15 Solomon_Darwin Berkeley at UCSCSV.pptxAIM 20240515 v15 Solomon_Darwin Berkeley at UCSCSV.pptx
AIM 20240515 v15 Solomon_Darwin Berkeley at UCSCSV.pptx
 
Classifying Crisis Information Relevancy with Semantics (ESWC 2018)
Classifying Crisis Information Relevancy with Semantics (ESWC 2018)Classifying Crisis Information Relevancy with Semantics (ESWC 2018)
Classifying Crisis Information Relevancy with Semantics (ESWC 2018)
 

More from Denis Parra Santander

The Effect of Explanations & Algorithmic Accuracy on Visual Recommender Syste...
The Effect of Explanations & Algorithmic Accuracy on Visual Recommender Syste...The Effect of Explanations & Algorithmic Accuracy on Visual Recommender Syste...
The Effect of Explanations & Algorithmic Accuracy on Visual Recommender Syste...
Denis Parra Santander
 
Do Better ImageNet Models Transfer Better... for Image Recommendation?
Do Better ImageNet Models Transfer Better... for Image Recommendation?Do Better ImageNet Models Transfer Better... for Image Recommendation?
Do Better ImageNet Models Transfer Better... for Image Recommendation?
Denis Parra Santander
 
Human-Centered Machine Learning: Harnessing Visualization and Interactivity f...
Human-Centered Machine Learning: Harnessing Visualization and Interactivity f...Human-Centered Machine Learning: Harnessing Visualization and Interactivity f...
Human-Centered Machine Learning: Harnessing Visualization and Interactivity f...
Denis Parra Santander
 
Social Aspects of Interactive Recommender Systems
Social Aspects of Interactive Recommender SystemsSocial Aspects of Interactive Recommender Systems
Social Aspects of Interactive Recommender Systems
Denis Parra Santander
 
Interactive Recommender Systems
Interactive Recommender SystemsInteractive Recommender Systems
Interactive Recommender Systems
Denis Parra Santander
 
Data Fusion for Dealing with the Recommendation Problem
Data Fusion for Dealing with the Recommendation ProblemData Fusion for Dealing with the Recommendation Problem
Data Fusion for Dealing with the Recommendation Problem
Denis Parra Santander
 
Research on Recommender Systems: Beyond Ratings and Lists
Research on Recommender Systems: Beyond Ratings and ListsResearch on Recommender Systems: Beyond Ratings and Lists
Research on Recommender Systems: Beyond Ratings and Lists
Denis Parra Santander
 
The Effect of Different Set-based Visualizations on User Exploration of Reco...
The Effect of Different Set-based  Visualizations on User Exploration of Reco...The Effect of Different Set-based  Visualizations on User Exploration of Reco...
The Effect of Different Set-based Visualizations on User Exploration of Reco...
Denis Parra Santander
 
Network Visualization guest lecture at #DataVizQMSS at @Columbia / #SNA at PU...
Network Visualization guest lecture at #DataVizQMSS at @Columbia / #SNA at PU...Network Visualization guest lecture at #DataVizQMSS at @Columbia / #SNA at PU...
Network Visualization guest lecture at #DataVizQMSS at @Columbia / #SNA at PU...
Denis Parra Santander
 

More from Denis Parra Santander (9)

The Effect of Explanations & Algorithmic Accuracy on Visual Recommender Syste...
The Effect of Explanations & Algorithmic Accuracy on Visual Recommender Syste...The Effect of Explanations & Algorithmic Accuracy on Visual Recommender Syste...
The Effect of Explanations & Algorithmic Accuracy on Visual Recommender Syste...
 
Do Better ImageNet Models Transfer Better... for Image Recommendation?
Do Better ImageNet Models Transfer Better... for Image Recommendation?Do Better ImageNet Models Transfer Better... for Image Recommendation?
Do Better ImageNet Models Transfer Better... for Image Recommendation?
 
Human-Centered Machine Learning: Harnessing Visualization and Interactivity f...
Human-Centered Machine Learning: Harnessing Visualization and Interactivity f...Human-Centered Machine Learning: Harnessing Visualization and Interactivity f...
Human-Centered Machine Learning: Harnessing Visualization and Interactivity f...
 
Social Aspects of Interactive Recommender Systems
Social Aspects of Interactive Recommender SystemsSocial Aspects of Interactive Recommender Systems
Social Aspects of Interactive Recommender Systems
 
Interactive Recommender Systems
Interactive Recommender SystemsInteractive Recommender Systems
Interactive Recommender Systems
 
Data Fusion for Dealing with the Recommendation Problem
Data Fusion for Dealing with the Recommendation ProblemData Fusion for Dealing with the Recommendation Problem
Data Fusion for Dealing with the Recommendation Problem
 
Research on Recommender Systems: Beyond Ratings and Lists
Research on Recommender Systems: Beyond Ratings and ListsResearch on Recommender Systems: Beyond Ratings and Lists
Research on Recommender Systems: Beyond Ratings and Lists
 
The Effect of Different Set-based Visualizations on User Exploration of Reco...
The Effect of Different Set-based  Visualizations on User Exploration of Reco...The Effect of Different Set-based  Visualizations on User Exploration of Reco...
The Effect of Different Set-based Visualizations on User Exploration of Reco...
 
Network Visualization guest lecture at #DataVizQMSS at @Columbia / #SNA at PU...
Network Visualization guest lecture at #DataVizQMSS at @Columbia / #SNA at PU...Network Visualization guest lecture at #DataVizQMSS at @Columbia / #SNA at PU...
Network Visualization guest lecture at #DataVizQMSS at @Columbia / #SNA at PU...
 

Recently uploaded

The History of Stoke Newington Street Names
The History of Stoke Newington Street NamesThe History of Stoke Newington Street Names
The History of Stoke Newington Street Names
History of Stoke Newington
 
How to Manage Your Lost Opportunities in Odoo 17 CRM
How to Manage Your Lost Opportunities in Odoo 17 CRMHow to Manage Your Lost Opportunities in Odoo 17 CRM
How to Manage Your Lost Opportunities in Odoo 17 CRM
Celine George
 
Exploiting Artificial Intelligence for Empowering Researchers and Faculty, In...
Exploiting Artificial Intelligence for Empowering Researchers and Faculty, In...Exploiting Artificial Intelligence for Empowering Researchers and Faculty, In...
Exploiting Artificial Intelligence for Empowering Researchers and Faculty, In...
Dr. Vinod Kumar Kanvaria
 
writing about opinions about Australia the movie
writing about opinions about Australia the moviewriting about opinions about Australia the movie
writing about opinions about Australia the movie
Nicholas Montgomery
 
Liberal Approach to the Study of Indian Politics.pdf
Liberal Approach to the Study of Indian Politics.pdfLiberal Approach to the Study of Indian Politics.pdf
Liberal Approach to the Study of Indian Politics.pdf
WaniBasim
 
A Independência da América Espanhola LAPBOOK.pdf
A Independência da América Espanhola LAPBOOK.pdfA Independência da América Espanhola LAPBOOK.pdf
A Independência da América Espanhola LAPBOOK.pdf
Jean Carlos Nunes Paixão
 
How to Add Chatter in the odoo 17 ERP Module
How to Add Chatter in the odoo 17 ERP ModuleHow to Add Chatter in the odoo 17 ERP Module
How to Add Chatter in the odoo 17 ERP Module
Celine George
 
What is Digital Literacy? A guest blog from Andy McLaughlin, University of Ab...
What is Digital Literacy? A guest blog from Andy McLaughlin, University of Ab...What is Digital Literacy? A guest blog from Andy McLaughlin, University of Ab...
What is Digital Literacy? A guest blog from Andy McLaughlin, University of Ab...
GeorgeMilliken2
 
How to Setup Warehouse & Location in Odoo 17 Inventory
How to Setup Warehouse & Location in Odoo 17 InventoryHow to Setup Warehouse & Location in Odoo 17 Inventory
How to Setup Warehouse & Location in Odoo 17 Inventory
Celine George
 
Reimagining Your Library Space: How to Increase the Vibes in Your Library No ...
Reimagining Your Library Space: How to Increase the Vibes in Your Library No ...Reimagining Your Library Space: How to Increase the Vibes in Your Library No ...
Reimagining Your Library Space: How to Increase the Vibes in Your Library No ...
Diana Rendina
 
Main Java[All of the Base Concepts}.docx
Main Java[All of the Base Concepts}.docxMain Java[All of the Base Concepts}.docx
Main Java[All of the Base Concepts}.docx
adhitya5119
 
Life upper-Intermediate B2 Workbook for student
Life upper-Intermediate B2 Workbook for studentLife upper-Intermediate B2 Workbook for student
Life upper-Intermediate B2 Workbook for student
NgcHiNguyn25
 
MARY JANE WILSON, A “BOA MÃE” .
MARY JANE WILSON, A “BOA MÃE”           .MARY JANE WILSON, A “BOA MÃE”           .
MARY JANE WILSON, A “BOA MÃE” .
Colégio Santa Teresinha
 
Wound healing PPT
Wound healing PPTWound healing PPT
Wound healing PPT
Jyoti Chand
 
ISO/IEC 27001, ISO/IEC 42001, and GDPR: Best Practices for Implementation and...
ISO/IEC 27001, ISO/IEC 42001, and GDPR: Best Practices for Implementation and...ISO/IEC 27001, ISO/IEC 42001, and GDPR: Best Practices for Implementation and...
ISO/IEC 27001, ISO/IEC 42001, and GDPR: Best Practices for Implementation and...
PECB
 
How to Create a More Engaging and Human Online Learning Experience
How to Create a More Engaging and Human Online Learning Experience How to Create a More Engaging and Human Online Learning Experience
How to Create a More Engaging and Human Online Learning Experience
Wahiba Chair Training & Consulting
 
Hindi varnamala | hindi alphabet PPT.pdf
Hindi varnamala | hindi alphabet PPT.pdfHindi varnamala | hindi alphabet PPT.pdf
Hindi varnamala | hindi alphabet PPT.pdf
Dr. Mulla Adam Ali
 
BBR 2024 Summer Sessions Interview Training
BBR  2024 Summer Sessions Interview TrainingBBR  2024 Summer Sessions Interview Training
BBR 2024 Summer Sessions Interview Training
Katrina Pritchard
 
Cognitive Development Adolescence Psychology
Cognitive Development Adolescence PsychologyCognitive Development Adolescence Psychology
Cognitive Development Adolescence Psychology
paigestewart1632
 
PCOS corelations and management through Ayurveda.
PCOS corelations and management through Ayurveda.PCOS corelations and management through Ayurveda.
PCOS corelations and management through Ayurveda.
Dr. Shivangi Singh Parihar
 

Recently uploaded (20)

The History of Stoke Newington Street Names
The History of Stoke Newington Street NamesThe History of Stoke Newington Street Names
The History of Stoke Newington Street Names
 
How to Manage Your Lost Opportunities in Odoo 17 CRM
How to Manage Your Lost Opportunities in Odoo 17 CRMHow to Manage Your Lost Opportunities in Odoo 17 CRM
How to Manage Your Lost Opportunities in Odoo 17 CRM
 
Exploiting Artificial Intelligence for Empowering Researchers and Faculty, In...
Exploiting Artificial Intelligence for Empowering Researchers and Faculty, In...Exploiting Artificial Intelligence for Empowering Researchers and Faculty, In...
Exploiting Artificial Intelligence for Empowering Researchers and Faculty, In...
 

Identifying Relevant Messages in a Twitter-based Citizen Channel for Natural Disaster Situations

  • 1. Identifying Relevant Messages in a Twitter-based Citizen Channel for Natural Disaster Situations. Alfredo Cobo (ajcobo@uc.cl), Denis Parra (dparra@ing.puc.cl), Jaime Navón (jnavon@ing.puc.cl). Pontificia Universidad Católica de Chile, Departamento de Ciencia de la Computación, Av. Vicuña Mackenna 4860, Macul, Santiago, Chile
  • 2. I (… and some other people in this room) … come from Chile
      Pictures from http://www.quadrodemedalhas.com/images/mapas/mapa-chile.jpg and http://upload.wikimedia.org/wikipedia/commons/thumb/9/91/Chile_in_South_America_(-mini_map_-rivers).svg/409px-Chile_in_South_America_(-mini_map_-rivers).svg.png
  • 3. Chile, well-known for its..
      • Copper (Top Producer)
      "Top 5 Copper Producers" by Plazak - Own work. Licensed under CC BY-SA 3.0 via Wikimedia Commons - http://commons.wikimedia.org/wiki/File:Top_5_Copper_Producers.png#/media/File:Top_5_Copper_Producers.png
      https://www.google.com/url?sa=i&rct=j&q=&esrc=s&source=images&cd=&cad=rja&uact=8&ved=0CAYQjB0&url=http%3A%2F%2Fcommons.wikimedia.org%2Fwiki%2FFile%3ANative_Copper_(mineral).jpg&ei=L31ZVbOsL4r1UrbRgKAB&bvm=bv.93564037,d.d24&psig=AFQjCNHr2zm5m4Jmim7AgkCwwSb0b5mGUA&ust=1432014509629311
  • 4. Chile, well-known for its..
      • Wine (Price + quality)
      "Fiesta de Vendimia" by LuxoDresden - Own work. Licensed under CC BY-SA 3.0 via Wikimedia Commons - http://commons.wikimedia.org/wiki/File:Fiesta_de_Vendimia.JPG#/media/File:Fiesta_de_Vendimia.JPG
  • 5. If you start typing in Google… 9 out of 10 disasters …
  • 6. If you start typing in Google… 9 out of 10 disasters … prefer Chile
  • 7. … and for Natural Disasters :(
      • Largest earthquake ever recorded: Valdivia, Chile, 22 May 1960 (9.5 on the Richter scale)
      • We usually have 1 large earthquake every 30 years (~8 on the Richter scale)
      • Last one in 2010, close to Concepción, but it also affected Santiago (the capital)
  • 8. … so, at PUC Chile
      • We created CIGIDEN, the “National Research Center for the Integrated Administration of Natural Disasters”
  • 9. CIGIDEN’s Goal in this project
      • Help citizens stay informed during natural disasters by using social media
      • Build a mobile application (Carlos Molina)
      • Automatically filter relevant messages from those not related to earthquakes (Alfredo Cobo) to feed the application
  • 10. Our Task: Building a Twitter classifier
      • Filter tweets related to natural disasters from those that are not
  • 11. Related Work
      • Manual Classification: Vieweg et al. (2010), Imran et al. (2013), Mendoza et al. (2010)
      • Data Post-processing (Information Credibility on Twitter): Mendoza et al. (2010), Castillo et al. (2011)
      • Feature Generation (not necessarily for natural disasters): Gimpel et al. (2011), Kouloumpis et al. (2011), Liu et al. (2012), Wu et al. (2011), Lee et al. (2014)
      • Tools for Disaster Management: Hiltz et al. (2013), Power et al. (2013), Caragea et al. (2011), Abel et al. (2012), Middleton et al. (2014), Morstatter et al. (2013), Imran et al. (2014)
  • 12. Why would building this classifier be a contribution?
      • Building and validating a ground truth for classifying tweets in Spanish
      • Building the classifier and dealing with:
          • Class imbalance
          • Number of latent dimensions (feature generation using LDA)
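Slide 12 names LDA as the feature-generation step, where each tweet is represented by its proportions over a chosen number of latent topics. The slides do not say which implementation or hyperparameters were used, so the following is only a minimal sketch using collapsed Gibbs sampling in plain Python. The function name `lda_gibbs`, the values of `alpha` and `beta`, the whitespace tokenization and the toy tweets are all assumptions made for illustration.

```python
import random


def lda_gibbs(docs, num_topics, iterations=200, alpha=0.1, beta=0.01, seed=0):
    """Collapsed Gibbs sampling for LDA (illustrative, not the authors' code).

    docs is a list of token lists. Returns one vector of topic proportions
    per document, usable as a dense feature vector for a classifier.
    """
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    word_id = {w: i for i, w in enumerate(vocab)}
    vocab_size = len(vocab)

    doc_topic = [[0] * num_topics for _ in docs]            # tokens of doc d in topic k
    topic_word = [[0] * vocab_size for _ in range(num_topics)]  # word w in topic k
    topic_total = [0] * num_topics                           # tokens in topic k
    assignments = []

    # Random initial topic for every token.
    for d, doc in enumerate(docs):
        z_doc = []
        for w in doc:
            k = rng.randrange(num_topics)
            z_doc.append(k)
            doc_topic[d][k] += 1
            topic_word[k][word_id[w]] += 1
            topic_total[k] += 1
        assignments.append(z_doc)

    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                wid = word_id[w]
                k = assignments[d][i]
                # Remove the token's current assignment from the counts.
                doc_topic[d][k] -= 1
                topic_word[k][wid] -= 1
                topic_total[k] -= 1
                # Resample its topic from the collapsed conditional distribution.
                weights = [
                    (doc_topic[d][t] + alpha)
                    * (topic_word[t][wid] + beta)
                    / (topic_total[t] + vocab_size * beta)
                    for t in range(num_topics)
                ]
                k = rng.choices(range(num_topics), weights=weights)[0]
                assignments[d][i] = k
                doc_topic[d][k] += 1
                topic_word[k][wid] += 1
                topic_total[k] += 1

    features = []
    for d, doc in enumerate(docs):
        denom = len(doc) + num_topics * alpha
        features.append([(doc_topic[d][t] + alpha) / denom for t in range(num_topics)])
    return features


# Toy tweets (invented), tokenized by whitespace.
tweets = [
    "terremoto fuerte en concepcion".split(),
    "alerta de tsunami tras terremoto".split(),
    "gran partido de futbol hoy".split(),
]
for vector in lda_gibbs(tweets, num_topics=2, iterations=50):
    print([round(p, 3) for p in vector])
```

The `num_topics` argument plays the role of the "Dimensions" setting that the results slide varies (1000 or 2000); a toy corpus like this one only makes sense with a handful of topics.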
  • 13. Workflow of Activities
      [Workflow diagram] Chile’s 2010 earthquake dataset (Castillo et al., 2010) → sampling, cleaning & filtering → our ground truth; ground truth + non-relevant messages → realistic dataset → classifiers (feature selection with LDA; class imbalance 10% - 80%)
  • 14. Building the ground truth
      • Random sampling of 5,000 tweets from the Castillo et al. (2010) dataset, used to study credibility around Chile’s 2010 earthquake
      • Dates: from February 27th until March 2nd, 2010 (spanning 4 days)
      • We kept only Spanish messages and removed messages that were too similar (Levenshtein distance): 2,187 messages left
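Slide 14 removes messages that are too similar according to Levenshtein distance, but it does not give the threshold. A minimal sketch of that filtering step follows; the helper names and the `max_ratio=0.2` cutoff (edit distance relative to the longer message) are assumptions, and the example tweets are invented.

```python
def levenshtein(a, b):
    """Edit distance between two strings (insertions, deletions, substitutions)."""
    if len(a) < len(b):
        a, b = b, a
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def drop_near_duplicates(messages, max_ratio=0.2):
    """Keep a message only if it is not too close to one already kept.

    max_ratio is a hypothetical cutoff, not a value from the slides.
    """
    kept = []
    for msg in messages:
        too_similar = any(
            levenshtein(msg, other) <= max_ratio * max(len(msg), len(other), 1)
            for other in kept
        )
        if not too_similar:
            kept.append(msg)
    return kept


sample = [
    "terremoto en Chile",
    "terremoto en chile!",      # near-duplicate of the first message
    "se busca a mi hermano",
]
print(drop_near_duplicates(sample))
```

This pairwise comparison is quadratic in the number of messages, which is manageable for a sample of 5,000 tweets but would need blocking or hashing for much larger sets.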
  • 15. Validating the ground truth
      • Relevant messages were labeled based on the Imran et al. (2013) classification:
          • Caution/Warning
          • Casualties and Damage
          • People (missing, found, etc.)
          • Information source
      • Fleiss’ Kappa: κ = 0.645, p < .001
      • Intraclass correlation, ICC(2,1): ICC = 0.646, p < .001
      • Landis and Koch (1977)
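The agreement figure on slide 15 is a Fleiss' kappa. As a reminder of how that statistic is computed, here is a small sketch of the standard formula. The number of raters per tweet is not given in the slides, so the three raters and the counts in the example are made up.

```python
def fleiss_kappa(ratings):
    """Fleiss' kappa for a table where ratings[i][j] is the number of
    raters who put item i into category j (same rater count per item)."""
    n_items = len(ratings)
    n_raters = sum(ratings[0])
    n_categories = len(ratings[0])

    # Share of all assignments that went to each category.
    p_j = [
        sum(row[j] for row in ratings) / (n_items * n_raters)
        for j in range(n_categories)
    ]
    # Observed agreement per item, then averaged.
    p_i = [
        (sum(c * c for c in row) - n_raters) / (n_raters * (n_raters - 1))
        for row in ratings
    ]
    p_bar = sum(p_i) / n_items
    # Agreement expected by chance.
    p_e = sum(p * p for p in p_j)
    return (p_bar - p_e) / (1 - p_e)


# Hypothetical counts: 3 raters, columns are [relevant, non-relevant].
labels = [
    [3, 0],
    [2, 1],
    [0, 3],
    [3, 0],
    [1, 2],
    [0, 3],
]
print(round(fleiss_kappa(labels), 3))
```

On the Landis and Koch (1977) scale cited on the slide, values between 0.61 and 0.80 are read as substantial agreement, which is where κ = 0.645 falls.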
  • 16. Workflow of Activities
      [Workflow diagram] Chile’s 2010 earthquake dataset (Castillo et al., 2010) → sampling, cleaning & filtering → our ground truth; ground truth + non-relevant messages → realistic dataset → classifiers (feature selection with LDA; class imbalance)
  • 17. Classification Problem
      • Features: User, Network, Content (4,766 unique words). Individual features: followers, followees, hashtags, user mentions, words
      • Class imbalance:
          • The ground truth is not a realistic representation of Twitter
          • We added “noise”: introduced tweets not relevant to the event (20% - 80%)
          • Sampled non-relevant tweets from 5 months
          • Removed all tweets posted during days of seismic activity
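The noise step on slide 17 can be sketched as mixing sampled non-relevant tweets into the labeled set. The slides do not define the noise proportion precisely; this sketch assumes it is the fraction of the final dataset made up of injected tweets. It also assumes the pool has already been cleared of tweets from days with seismic activity. The function name `add_noise` and the toy data are hypothetical.

```python
import random


def add_noise(ground_truth, noise_pool, noise_proportion, seed=0):
    """Return the labeled data plus sampled non-relevant tweets (label 0).

    ground_truth: list of (text, label) pairs.
    noise_pool: list of texts assumed non-relevant to the event.
    noise_proportion: assumed share of injected tweets in the result, in [0, 1).
    """
    if not 0 <= noise_proportion < 1:
        raise ValueError("noise_proportion must be in [0, 1)")
    n_noise = round(len(ground_truth) * noise_proportion / (1 - noise_proportion))
    if n_noise > len(noise_pool):
        raise ValueError("not enough non-relevant tweets in the pool")
    rng = random.Random(seed)
    noise = [(text, 0) for text in rng.sample(noise_pool, n_noise)]
    mixed = list(ground_truth) + noise
    rng.shuffle(mixed)
    return mixed


# Toy data: 20 labeled tweets and a pool of 200 unrelated ones.
labeled = [(f"tweet sobre el terremoto {i}", 1) for i in range(20)]
pool = [f"tweet cotidiano {i}" for i in range(200)]
dataset = add_noise(labeled, pool, noise_proportion=0.8)
print(len(dataset), sum(1 for _, label in dataset if label == 0))
```

Fixing the seed keeps the mixed dataset reproducible, which matters when the same noise level is compared across several classifiers.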
  • 18. Classification Results
      Model               | Precision | Recall | F1 score | Accuracy | AUC   | Dimensions | Noise proportion
      Baseline            | 0.625     | 0.545  | 0.53     | 0.5      | 0.568 | -          | 0
      Bernoulli NB        | 0.831     | 0.226  | 0.355    | 0.594    | 0.605 | 2000       | 0
      Logistic Regression | 0.827     | 0.641  | 0.722    | 0.756    | 0.834 | 2000       | 0.6
      Linear SVM          | 0.687     | 0.677  | 0.682    | 0.687    | 0.719 | 1000       | 0.6
      Random Forest       | 0.807     | 0.673  | 0.734    | 0.758    | 0.844 | 1000       | 0.8
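The F1 score reported on slide 18 is the harmonic mean of precision and recall. A short sketch, with a hypothetical helper name, shows the arithmetic using the Logistic Regression and Random Forest rows:

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall (0.0 when both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


print(round(f1_score(0.827, 0.641), 3))  # Logistic Regression row -> 0.722
print(round(f1_score(0.807, 0.673), 3))  # Random Forest row -> 0.734
```

Because the harmonic mean is pulled toward the smaller value, a model with high precision but low recall, such as Bernoulli NB in the results, ends up with a low F1.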
  • 19. Analysis ~ LDA Dimensions and Noise
  • 20. Analysis ~ LDA Dimensions and Noise
  • 21. Conclusions & Future Work
      • We built and validated a ground truth of tweets in Spanish relevant to disasters
      • We implemented a classifier and analyzed its performance with several algorithms while dealing with the class imbalance problem
      • Future work: move the application from prototype to production, test online scalability
  • 22. That’s all folks!
      • Thanks! Questions to corresponding author Alfredo Cobo: ajcobo@uc.cl or Denis Parra: dparra@uc.cl
  • 23. Chile, small country, but well-known for its..
      • Length (4,300 km)
      [Map labels: ~4,300 km, ~8,000 km]
  • 24. Model Features
      • Features: followers, friends, user mentions
      • Corpora features: hashtags, words
      • Newman et al. (2007), Biro et al. (2008), Wei et al. (2006), Wang et al. (2012), Han (2005)
  • 25. Results: Amatriain et al. (2013)
  • 27. Plots of bootstrap [panels: Agreement Day 1, Agreement Day 2, Agreement Day 3, Agreement Day 4]
  • 31. Manual classification: Vieweg et al. (2010), Imran et al. (2013)
  • 32. Post Processing: Castillo et al. (2011), Mendoza et al. (2010)
  • 33. Feature Generation Approaches: Gimpel et al. (2011), Kouloumpis et al. (2011), Liu et al. (2012), Wu et al. (2011), Lee et al. (2014)
  • 34. Tools For Disaster Management: Hiltz et al. (2013), Power et al. (2013), Caragea et al. (2011), Abel et al. (2012), Middleton et al. (2014), Morstatter et al. (2013), Imran et al. (2014)
  • 35. Building the ground truth: Mendoza et al. (2010), Imran et al. (2013)
  • 36. Algorithms and evaluation procedure: Castillo et al. (2011), Fawcett et al. (2004), Manning et al. (2008), Wen et al. (2014)