Twitter floods when it rains: A case study of the UK floods in early 2014

Published in: Science

  1. 1. Twitter floods when it rains: A case study of the UK floods in early 2014
     Antonia Saravanou (University of Athens), Dimitrios Gunopulos (University of Athens), George Valkanas (Stevens Institute of Technology), Gennady Andrienko (Fraunhofer Institute IAIS, DE)
     Social Web for Disaster Management (WWW workshop 2015), Florence, Italy
     National and Kapodistrian University of Athens
  2. 2. Outline
     ● Motivation
     ● Research Questions
     ● Methodology
       ○ Data Collection
       ○ Filtering Step: Flood-Related Lexicon
       ○ Clustering Step
       ○ Second Level Clustering
     ● Results
     Twitter floods when it rains: A case study of the UK floods in early 2014, 18 May 2015
  3. 3. Motivation
  4. 4. Motivation
  5. 5. Motivation
  6. 6. Motivation
     ● Identify the event and the affected area early
     ● Monitor the evolution of the event
     ● Inform users about emergencies
     ● Resource allocation
     ● Immediate notification of special incident management units
  7. 7. Research questions
     RQ1: How can we identify the areas that have been hit the most by an event?
       - where to dispatch emergency response units
     RQ2: How effective can we be in identifying these areas?
       - robust and effective techniques on which to base decisions
     RQ3: Can we identify areas that have been stricken by the event in a similar manner?
       - transfer the same techniques to similarly affected areas
  8. 8. Data Collection
     ● Twitter - custom crawler
       ○ Streaming API
     ● Collection of public tweets
       ○ Bounding box that covers the UK
       ○ Extract only tweets with GPS coordinates
     ● 13-17 January 2014
     ● > 2.3 million geotagged tweets
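The bounding-box and GPS filters on this slide could be sketched as follows. This is only an illustration: the box coordinates are rough values chosen for the example, and the `coordinates` field name is hypothetical. Neither comes from the slides or from the authors' crawler.

```python
# Hypothetical, approximate bounding box around the UK (not from the slides):
# (min_lon, min_lat, max_lon, max_lat)
UK_BBOX = (-8.7, 49.8, 1.8, 60.9)


def in_bbox(lon, lat, bbox=UK_BBOX):
    """Return True if the point lies inside the bounding box."""
    min_lon, min_lat, max_lon, max_lat = bbox
    return min_lon <= lon <= max_lon and min_lat <= lat <= max_lat


def keep_geotagged(tweets, bbox=UK_BBOX):
    """Keep only tweets that carry GPS coordinates inside the box."""
    kept = []
    for tweet in tweets:
        coords = tweet.get("coordinates")  # hypothetical field: (lon, lat) or None
        if coords is not None and in_bbox(coords[0], coords[1], bbox):
            kept.append(tweet)
    return kept


sample = [
    {"id": 1, "coordinates": (-0.12, 51.5)},   # London
    {"id": 2, "coordinates": None},            # no GPS information
    {"id": 3, "coordinates": (2.35, 48.85)},   # Paris, outside the box
]
kept_ids = [t["id"] for t in keep_geotagged(sample)]
```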
  9. 9. Flood Related Tweets
     [Diagram: Entire Dataset (GPS location within UK bounding box) → ? → Flood-Related Tweets]
  10. 10. Filtering Step: Custom Flood-Related lexicon
      ● Initial seed set (13 tokens): rain, flood, weather, storm, showers, ...
      ● Scan the Entire Dataset for tokens that contain at least one word of the initial seed set as a substring (1546 tokens)
      ● Manually review each keyword and discard unrelated false positives, e.g. brain, train, etc.
      ● Keep only tokens related to the event (456 tokens), e.g. raining, floods, #ukweather, etc.
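A minimal sketch of the lexicon-building step, assuming tweets are plain strings and that the manual review is represented by a hand-written set of rejected tokens. The tokenizer, function names and sample data are illustrative assumptions, not the authors' code.

```python
import re


def candidate_tokens(tweets, seeds):
    """Collect every token that contains at least one seed word as a substring."""
    candidates = set()
    for text in tweets:
        # Assumed tokenizer: runs of word characters, with '#' kept for hashtags.
        for token in re.findall(r"[#\w]+", text.lower()):
            if any(seed in token for seed in seeds):
                candidates.add(token)
    return candidates


def flood_lexicon(candidates, rejected):
    """Drop the tokens a human reviewer marked as unrelated to the event."""
    return candidates - set(rejected)


seeds = {"rain", "flood", "storm"}
tweets = [
    "It is raining again #ukweather",
    "Missed my train",
    "Floods near the river",
]
candidates = candidate_tokens(tweets, seeds)              # includes the false positive "train"
lexicon = flood_lexicon(candidates, rejected={"train"})   # manual review step
```

Substring matching is deliberately over-inclusive here, which is why the manual review pass is needed afterwards.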
  11. 11. Original vs. Flood Related Lexicon
      ● Manual cleaning process is necessary
      ● Only 4 keywords are flood-related in the original lexicon
      ● Flood Lexicon is ⅓ of the Original
      - Slow process
      + Done only once, at the beginning
      [Table: Top-10 most frequent keywords]
  12. 12. Flood Related Tweets
      [Diagram: Entire Dataset + Flood Related Lexicon → Flood-Related Tweets]
      ● A tweet is kept if it has an exact match with at least one keyword from our flood related lexicon
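The exact-match filter can be sketched like this, again assuming plain-string tweets and a simple regular-expression tokenizer (both are assumptions made for the example). Matching whole tokens rather than substrings keeps words such as "brainstorming" out.

```python
import re


def is_flood_related(text, lexicon):
    """True if at least one token of the tweet is exactly a lexicon keyword."""
    tokens = set(re.findall(r"[#\w]+", text.lower()))
    return not tokens.isdisjoint(lexicon)


lexicon = {"raining", "floods", "#ukweather"}
tweets = [
    "Roads closed, floods everywhere",
    "My train is late",
    "Brainstorming session today",
    "Grey skies again #UKWeather",
]
flood_tweets = [t for t in tweets if is_flood_related(t, lexicon)]
```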
  13. 13. RQ1: Identifying flood-affected areas
      ● Why we care
        ○ where to dispatch emergency response units
        ○ notify citizens about areas with problems caused by floods
      ● From GPS to areas
        ○ Perform spatial clustering using the GPS coordinates
          ■ Convert GPS coordinates to Cartesian ones
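The slide does not say which GPS-to-Cartesian conversion was used. One common option, shown here purely as an assumption, treats the Earth as a sphere and maps latitude and longitude to 3-D coordinates, so that straight-line distances between nearby points approximate ground distances.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; the spherical model is an assumption


def gps_to_cartesian(lat_deg, lon_deg, radius=EARTH_RADIUS_KM):
    """Convert latitude/longitude in degrees to (x, y, z) on a sphere."""
    lat = math.radians(lat_deg)
    lon = math.radians(lon_deg)
    x = radius * math.cos(lat) * math.cos(lon)
    y = radius * math.cos(lat) * math.sin(lon)
    z = radius * math.sin(lat)
    return x, y, z


origin = gps_to_cartesian(0.0, 0.0)      # equator at the prime meridian
pole = gps_to_cartesian(90.0, 0.0)       # north pole
london = gps_to_cartesian(51.5, -0.12)
```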
  14. 14. Clustering Step: K-Means
      ● K = 10, 100, 500, 1000
      ● Generated clusters as Voronoi polygons
        ➔ more splits in the densely populated areas
      [Figure panels: 10 clusters, 100 clusters, 500 clusters, 1000 clusters]
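For reference, a bare-bones version of K-Means (Lloyd's algorithm) on 2-D points is sketched below. It is a toy illustration with made-up data and a fixed random seed, not the implementation used for the slides. Assigning each point to its nearest centroid is what makes the resulting areas coincide with the Voronoi cells of the centroids.

```python
import random


def kmeans(points, k, iterations=20, seed=0):
    """Plain Lloyd's algorithm on 2-D points; returns (centroids, labels)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # pick k distinct points as initial centroids
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point goes to its nearest centroid.
        for i, (px, py) in enumerate(points):
            labels[i] = min(
                range(k),
                key=lambda c: (px - centroids[c][0]) ** 2 + (py - centroids[c][1]) ** 2,
            )
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [points[i] for i in range(len(points)) if labels[i] == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = (
                    sum(p[0] for p in members) / len(members),
                    sum(p[1] for p in members) / len(members),
                )
    return centroids, labels


points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
centroids, labels = kmeans(points, k=2)
```

With many more points in dense regions, more centroids end up there, which matches the slide's remark about more splits in densely populated areas.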
  15. 15. Which areas are the most affected?
      ● Prioritize generated areas by their potential of being affected
      Prioritization schemes by area a:
      1. By total #tweets: baseline
      2. By flood-related #tweets
      3. By Signal-to-Noise Ratio: score(a) = (#flood-related tweets in a) / (#tweets in a)
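The three prioritization schemes can be compared on a toy example. The area names and counts below are invented for illustration, and the `stats` layout is an assumption.

```python
def rank_areas(stats, scheme):
    """Rank areas from most to least affected.

    stats maps an area id to (total_tweets, flood_related_tweets).
    """
    def score(area):
        total, flood = stats[area]
        if scheme == "total":   # 1. baseline: total #tweets
            return total
        if scheme == "flood":   # 2. flood-related #tweets
            return flood
        if scheme == "snr":     # 3. flood-related share of all tweets
            return flood / total if total else 0.0
        raise ValueError(f"unknown scheme: {scheme}")

    return sorted(stats, key=score, reverse=True)


stats = {"city": (10000, 50), "town": (800, 40), "village": (100, 20)}
by_total = rank_areas(stats, "total")
by_flood = rank_areas(stats, "flood")
by_snr = rank_areas(stats, "snr")
```

In this toy data the large city tops the first two rankings simply because it produces more tweets, while the SNR ranking puts the small but heavily affected village first.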
  16. 16. Visualization of top-100 most affected areas
      Top 100 for K-Means (K=500)
      [Map panels: 1. total #tweets, 2. flood-related #tweets, 3. SNR]
  17. 17. RQ2: Identification Effectiveness
      Ground Truth - MetOffice
      1. Likert Scale [1-5]: to specify the degree to which an area has been affected
         a. 1 = “normal levels of rainfall”
         b. 5 = “completely flooded”
      2. Running Average Likert:
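The formula for the running average did not survive in this transcript. One plausible reading, stated here as an assumption rather than as the authors' definition, is the mean ground-truth Likert score of the top-k ranked areas, computed for each k.

```python
def running_average(likert_in_rank_order):
    """Mean Likert score of the top-k areas, for k = 1 .. n.

    Assumed definition: the input lists the ground-truth score of each area,
    ordered from the highest-ranked area to the lowest-ranked one.
    """
    averages = []
    total = 0
    for k, score in enumerate(likert_in_rank_order, start=1):
        total += score
        averages.append(total / k)
    return averages


# Invented scores for four areas, in the order a ranking scheme returned them.
averages = running_average([5, 4, 1, 2])
```

Under this reading, a better ranking scheme keeps the running average high for small k, because the areas it places first really are the most affected ones.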
  18. 18. Results
  19. 19. Results (k = 100)
      ● Baseline < Flood, SNR
      ● Flood ~ SNR
  20. 20. Results (k = 500)
      ● Baseline << Flood < SNR
      ● #tweets is not a good proxy
      ● #flood-related tweets is a better one
  21. 21. Results (k = 1000)
      ● SNR is the best metric (especially for the top 20)
      ● how many users talk about the specific event
  22. 22. RQ3: Similarly affected areas
      Identify areas that behave similarly over time in the way the flooding event was perceived by Twitter users
      Underlying connection:
      ● population level, e.g., similar posting patterns
      ● other variable, e.g., a nearby river
  23. 23. Second Level Clustering: Attributes
      Features that show the temporal evolution of the event in an area
      1. Number of tweets in day d, count(d)
      2. Ratio of day d from area a, ratio(d) = count(d) / Σ count(d’), summing over all days d’
      3. Speed of day d, speed(d) = ratio(d) - ratio(d-1)
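The three attributes can be computed per area from a list of daily counts. The sketch below assumes one count per day in chronological order and uses invented numbers. Speed is left undefined for the first day, since it has no previous day.

```python
def temporal_features(counts):
    """Return (ratios, speeds) for one area.

    counts: number of tweets per day, in chronological order.
    ratio(d) = count(d) / sum of count(d') over all days d'
    speed(d) = ratio(d) - ratio(d-1), defined from the second day on
    """
    total = sum(counts)
    ratios = [c / total if total else 0.0 for c in counts]
    speeds = [ratios[d] - ratios[d - 1] for d in range(1, len(ratios))]
    return ratios, speeds


# Invented counts for the five collection days (13-17 January 2014).
counts = [10, 10, 20, 30, 30]
ratios, speeds = temporal_features(counts)
```

A positive speed means the area's share of activity is growing from one day to the next, and a negative speed means it is shrinking.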
  24. 24. Second Level Clustering: Areas from 2 clusters
      [Map legend: cluster 1, cluster 2]
      ● Speed feature
      ● Red cluster: Scotland, Liverpool and Ireland, mostly unaffected
      ● Purple cluster: Midlands, affected
      ● Red speed decreases
      ● Purple speed increases
      ● Verification with historical data
  25. 25. The INSIGHT project
      Intelligent Synthesis and Real-time Response using Massive Streaming of Heterogeneous Data
      Detecting Events:
      - sensors on road network
      - sensors on buses
      - Twitter data
      http://www.insight-ict.eu/
  26. 26. Conclusions
      ● Analysis on Twitter data
        ○ emergencies, disaster management & relief
      ● Experimental analysis on floods
        ○ establishment of “flood related lexicon”
        ○ division of the entire UK into affected areas
        ○ identification of flood-stricken areas with high accuracy
      ● Comparison with ground truth data
        ○ quality evaluation
  27. 27. Future Work
      ● Collect more data of similar flooding events and test our approach on larger datasets
        ○ generalize to other areas
        ○ test with a larger timespan
      ● Develop online clustering approaches (1ier)
      ● Incorporate into the INSIGHT tool
  28. 28. Thank you!
      Acknowledgements:
      MMD - Mining Mobility Data
      INSIGHT - Intelligent Synthesis and Real-time Response using Massive Streaming of Heterogeneous Data
