Analysis of Twitter Data During Hurricane Sandy


Learn more about this report at
When Sandy made landfall in New York and along the New Jersey shoreline, the storm had already devastated islands in the Caribbean and moved along the eastern seaboard, causing damage in coastal states along the way. In much the same way, the tweet stream about the storm ebbed and flowed and moved along with it. By the time the storm struck New York and New Jersey, the Twitter data stream was already heavy with talk about its impact and its approach to the northeast.

Finding sense in a tweet storm is sometimes like trying to hold back the storm itself. The magnitude of the data that emerges in social media is matched only by the number of questions asked about information needs, situational information and the integrity of that information.

Accurate analysis of data using standardized statistical methods is critical to determining the validity of empirical research [source]. But in the emerging practice of using social media during disasters, there are few documented good practices for data collection and analysis. What facts can be derived from the data? Is the data ‘good’ enough to analyze? What types of questions or statistics can be applied in a manner that would allow ongoing empirical research comparing future events against past events? Today, we are very pleased to release the report Analysis of Twitter Data during Hurricane Sandy. The report provides a unique snapshot of the tweets emerging in the initial days just before and after the storm made landfall in New York.

Published in: Technology

  1. Analysis of Twitter Data during Hurricane Sandy
     Statistics Without Borders and Humanity Road
     With data aggregated by TweetTracker
  2. Introduction (Published April 3, 2013)
     • To further the research and analysis of the use of communications tools and social media during disasters, Humanity Road sponsored a project to analyze a discrete set of Hurricane Sandy tweets that originated from Long Island, NY.
     • The goal was to identify statistically valid data that would add value in understanding the flow of communications during the response and recovery process. Additional research is recommended for the same geography now in the recovery phase of Hurricane Sandy.
     • There is a need to shorten the timeline for analysis of data during emerging events. We recommend additional research to study the elements and interplay of geography, population, social networks and devices.
  3. The Team
     • The team explored what data could be made available quickly to disaster response organizations during an emerging event, and sought to identify the steps needed to ensure that ‘good clean data’ is used for the analysis.
     • The team included experienced members of the technology community. Statistics Without Borders performed analysis on data that was aggregated by TweetTracker from the Arizona State University Data Mining and Machine Learning Lab (DMML). TweetTracker is a project sponsored by the Office of Naval Research.
  4. Parameters
     • The data set was collected over six days, from October 26, 2012 through October 31, 2012, on a slow-moving event, Hurricane Sandy.
     • The geoboundary set for the research included all of Long Island.
     • Geocoding is approximate and based on user preferences; exact location may vary due to variables in Twitter, cell phone and service provider settings.
     • The report was compiled in partnership with Statistics Without Borders, with data aggregation by TweetTracker from the Arizona State University DMML Lab (a project sponsored by the Office of Naval Research).
  5. Total Tweet Volume
     • Looking at Twitter traffic by day shows that it may be difficult to isolate the effects of time from the effects of the hurricane.
       ▫ The lowest volume day was the Sunday before Sandy hit.
       ▫ The highest volume day was the day after Sandy hit.
     • In order to identify significant shifts in total tweet volume, it may be necessary to use longer timelines of local data.
  6. Total Tweet Volume
     • Views of tweet volume over time, given a dataset covering a small time window, may be made more useful by filtering the tweets to focus on keywords related to the disaster event.
     • In the image below, the tweets counted towards the tweet volume have been filtered by the keyword “Sandy”.
     • Although overall tweet volume does not change dramatically, as shown in the previous slide, tweets about Sandy rise sharply once the storm hits New York.
     [Figure: Volume of tweets that mention “Sandy” over the timespan of the data, with the start of Sandy marked]
  7. Tweets by Source
     • Further analysis of daily trends by source indicates that there may be some limitations to what Twitter data can be geocoded during weather events.
       ▫ Starting from Sunday, October 28th, the percentage of geocodable tweets drops from 67% to 36%, indicating that there may have been some interference with the ability of mobile users’ phones to provide coordinates.
       ▫ This is especially notable as the percentage of mobile tweets remains fairly constant at around 80%.
     * Tweets were classified as “geocodable” if they were geotagged and were not listed as being from a web source.
  8. Tweets by Source (continued)
       ▫ The percentage of geocodable tweets remains low in the days just after the storm as well.
         ▪ This could be caused by damage to mobile geotagging functionality.
         ▪ It could also represent more users turning off the GPS function of their phones in order to conserve battery life.
  9. Tweet Locations: Manhattan - Baseline
     • The map below shows tweets per 10k people on October 28th, 2012.
     • Tweet volume on that Sunday was particularly low.
  10. Tweet Locations: Manhattan - Event Day
     • The map below shows tweets per 10k people on October 29th, 2012, the day Sandy hit.
  11. Tweet Locations: Long Island - Baseline
     • The map below shows tweets per 10k people on October 28th, 2012.
  12. Tweet Locations: Long Island - Event Day
     • The map below shows tweets per 10k people on October 29th, 2012.
     • Tweet volume on the preceding Sunday (October 28th) was particularly low.
  13. Storm Surge Data
     • The map below overlays additional storm surge figures on the Twitter heat map.
     • There still seems to be fairly strong Twitter traffic even in areas with high storm surge.
     • Storm surge data acquired from AccuWeather.
  14. Network relationships
     • The social network visualization below shows interactions between Twitter accounts in general and those that contain the string “weather”.
     • Links are only made where the tweets in question mentioned “sandy”.
     • Filtering the data in this way and then rendering network relationships can yield useful views.
     • This view may reveal something of where various Twitter users were getting their Sandy-related weather updates.
  15. Twitter analytics summary
     • In order to draw any strong conclusions from Twitter data, it may be necessary to conduct more detailed analysis of overall patterns.
     • Insight may be gained by interactively visualizing the data and filtering for keywords of interest.
     • Map visualization provides some information on locations, high volume areas and overall patterns.
       ▫ Unfortunately, major events like this hurricane may interfere with the ability to get good location data from Twitter.
     • Overlaying weather or other event information may add more actionable information to the analysis.
     • Some mapping software provides easy sharing via the web, and could be used to share maps during emergencies.
       ▫ These mapping systems would be interactive as well, which would make the data more actionable.
         ▪ ArcGIS Explorer
         ▪ Google Earth
       ▫ Some of these systems also include important location information like parks, schools, hospitals and churches.
     • Network visualization may be useful in gaining insights that geospatial and temporal views elide, such as which news organizations Twitter users interact with about a crisis event.
  16. Data considerations
     • To preserve data integrity, the raw data should be imported directly into a statistical or GIS package. Loss of integrity can result when using spreadsheet applications, which are not designed to manage data.
     • Maps should make use of standard geographies (e.g., Census tracts) wherever possible, as these maps are both freely available and have population counts.
     • Raw data can be assumed to contain duplicate records and blanks (no text in the tweet). Standard data quality checks should include the removal of duplicates (on ID variables, tweet text and date-time) and blanks.
     • Accuracy of geocoding should be assessed by looking for unusual (or implausible) concentrations of tweets in specific geographies.
  17. Data considerations (continued)
     • There are hundreds of different tweet publishing platforms, but only a few account for any substantial proportion of tweets. The top 4 publishing modes account for 80% of tweets; the top 8 account for 90% of tweets. These should be kept in mind when considering any type of device-specific content.

     Platform                    Percent
     Twitter for iPhone          45.5%
     Twitter for Android         13.7%
     Instagram                   10.5%
     foursquare                  10.2%
     Tweetbot for iOS             4.9%
     (name missing in source)     2.3%
     Tweetbot for Mac             2.1%
     Twitter for BlackBerry       1.8%
  18. Overall Summary
     • Data treatment steps such as formatting, deduplication and geotagging analysis are important to presenting the data.
     • Geocoding is approximate and based on user preferences; exact location may vary due to variables but can be useful to determine
     • Geocoded information can decrease or degrade in certain types of events, and this warrants more research.
     • Deduplication should be a standard part of any data cleaning prior to analysis.
     • A geocode trend line should be included in future reports to continue communications research.
     • Tweet volume can remain the same, but shifts in subject matter can be tracked through keyword analysis.
     • Analysis of publish codes for platform is possible and recommended at the county level for emergency managers to determine device types and relevant applications. Some codes allow you to infer the device type (e.g., Android, iPhone, iPad, iOS, BlackBerry); others don’t (e.g., Instagram, Foursquare, TweetDeck).
     • Some mapping can be done with free tools such as Google Earth, ArcGIS and Geofeedia, but no matter what tool is used, statistical analysis from Statistics Without Borders can help identify trends as well as help to create visually useful content.
  19. Credits
     Special thanks to the following for contributing their time and dialogue to the preparation of this report:
     • Team selection: Cathy Furlong, Statistics Without Borders
     • GIS and heat map results: Paige Stover, Statistics Without Borders
     • Network relationships: Joshua Saxe, Statistics Without Borders
     • Analytics and data considerations: Tim B. Gravelle, Statistics Without Borders
     • Additional guidance and recommendations: Joanna Lane, NY VOST
     • TweetTracker developed by Shamanth Kumar, Fred Morstatter and Dr. Huan Liu, Arizona State University DMML Lab, under a grant from the Office of Naval Research
     • Summary and project management: Cat Graham, Humanity Road
     • Storm surge data acquired from AccuWeather
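
The data quality checks under Data considerations (removal of blanks and of duplicates on ID, tweet text and date-time), the “geocodable” rule from Tweets by Source, and the keyword filter from Total Tweet Volume can be sketched roughly as follows. This is a minimal illustration, not the method the team used: the record layout, the field names (`id`, `text`, `created_at`, `source`, `lat`, `lon`), the sample tweets and the assumption that a web source is labeled `"web"` are all hypothetical and do not reflect the TweetTracker export format.

```python
from collections import Counter

# Hypothetical records: field names and the "web" source label are
# illustrative assumptions, not the TweetTracker export schema.
SAMPLE_TWEETS = [
    {"id": "1", "text": "Sandy is coming", "created_at": "2012-10-28T09:00:00",
     "source": "Twitter for iPhone", "lat": 40.70, "lon": -73.50},
    # Exact duplicate of the record above.
    {"id": "1", "text": "Sandy is coming", "created_at": "2012-10-28T09:00:00",
     "source": "Twitter for iPhone", "lat": 40.70, "lon": -73.50},
    # Blank tweet (no text).
    {"id": "2", "text": "", "created_at": "2012-10-28T10:00:00",
     "source": "Twitter for Android", "lat": None, "lon": None},
    {"id": "3", "text": "Power is out", "created_at": "2012-10-29T20:00:00",
     "source": "Twitter for Android", "lat": None, "lon": None},
    {"id": "4", "text": "Flooding on our street", "created_at": "2012-10-29T21:00:00",
     "source": "web", "lat": 40.65, "lon": -73.60},
    {"id": "5", "text": "Stay safe #Sandy", "created_at": "2012-10-29T22:00:00",
     "source": "Twitter for iPhone", "lat": 40.72, "lon": -73.40},
]


def clean_tweets(tweets):
    """Drop blank tweets and duplicates on (id, text, date-time)."""
    seen = set()
    cleaned = []
    for tweet in tweets:
        text = (tweet.get("text") or "").strip()
        if not text:
            continue  # blank: no text in the tweet
        key = (tweet["id"], text, tweet["created_at"])
        if key in seen:
            continue  # duplicate record
        seen.add(key)
        cleaned.append(tweet)
    return cleaned


def is_geocodable(tweet):
    """Geotagged and not listed as being from a web source."""
    has_coords = tweet.get("lat") is not None and tweet.get("lon") is not None
    source = (tweet.get("source") or "").lower()
    return has_coords and source != "web"


def geocodable_share_by_day(tweets):
    """Fraction of tweets per calendar day that are geocodable."""
    totals = Counter()
    geocodable = Counter()
    for tweet in tweets:
        day = tweet["created_at"][:10]  # YYYY-MM-DD prefix of the timestamp
        totals[day] += 1
        if is_geocodable(tweet):
            geocodable[day] += 1
    return {day: geocodable[day] / totals[day] for day in sorted(totals)}


def keyword_volume_by_day(tweets, keyword):
    """Count tweets per day whose text mentions the keyword (case-insensitive)."""
    needle = keyword.lower()
    counts = Counter()
    for tweet in tweets:
        if needle in tweet["text"].lower():
            counts[tweet["created_at"][:10]] += 1
    return dict(counts)


if __name__ == "__main__":
    cleaned = clean_tweets(SAMPLE_TWEETS)
    print(len(cleaned), "tweets after cleaning")
    print(geocodable_share_by_day(cleaned))
    print(keyword_volume_by_day(cleaned, "sandy"))
```

Cleaning before computing any daily share matters here, since duplicates would otherwise inflate both the total and the geocodable counts. In practice the same steps would more likely be run inside the statistical or GIS package the report recommends.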