Earthquake Shakes Twitter Users: Real-time Event Detection by Social Sensors


  • 1. @MikeMayer
  • 2. What is Twitter?
      • Twitter is categorized as a microblogging service.
      • Twitter users post short messages of 140 characters or fewer, called tweets.
      • With URL shorteners and services tailored for Twitter, a lot of information can be conveyed in that small space.
      • Twitter is very free-form, yet ways to categorize tweets have still emerged.
      • Fusion Search (hashtags)
  • 3. How is Twitter useful as a sensor?
      • Twitter users often report their status, whether or not it is relevant to anyone else.
      • This means that the public timeline is full of noise.
      • The timeline is updated in real time, faster than a blog and faster than a “static” document.
      • Tweets are faster than traditional news, and users select from a buffet of other users to customize their news.
      • However, if the tweets are carefully selected, a great deal of useful information can be found.
      • Tweets contain a great deal of metadata.
  • 4. JSON representation of a single Tweet. Source:
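To make the metadata point concrete, here is a minimal sketch of reading the fields a "social sensor" would care about from one tweet's JSON. The payload below is a hypothetical, heavily trimmed example, not the one shown on the slide; the field names (`created_at`, `text`, `coordinates`, `user.location`) follow the classic Twitter REST tweet object, and `extract_sensor_reading` is an illustrative helper name.

```python
import json
from datetime import datetime

# Hypothetical, heavily trimmed payload for illustration only.
SAMPLE_TWEET = """
{
  "created_at": "Wed Aug 11 07:15:30 +0000 2010",
  "text": "Earthquake!",
  "coordinates": {"type": "Point", "coordinates": [139.69, 35.68]},
  "user": {"screen_name": "example_user", "location": "Tokyo, Japan"}
}
"""


def extract_sensor_reading(raw_json):
    """Pull out the time, place, and text of a single tweet."""
    tweet = json.loads(raw_json)
    # created_at uses a fixed English format such as "Wed Aug 11 07:15:30 +0000 2010".
    created = datetime.strptime(tweet["created_at"], "%a %b %d %H:%M:%S %z %Y")
    # "coordinates" is null for tweets that are not geotagged;
    # when present, the pair is ordered [longitude, latitude].
    point = tweet.get("coordinates")
    lon, lat = point["coordinates"] if point else (None, None)
    return {
        "text": tweet["text"],
        "created_at": created,
        "lat": lat,
        "lon": lon,
        "user_location": tweet.get("user", {}).get("location"),
    }


reading = extract_sensor_reading(SAMPLE_TWEET)
print(reading["created_at"].isoformat(), reading["lat"], reading["lon"])
```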
  • 5. “Each Twitter user is a sensor and each Tweet is sensory information”
      • Of course, context must be considered… more on that soon.
      • A bag-of-words approach isn’t good enough for detecting earthquakes:
          • “My dryer is shaking like crazy”
          • “Didn’t they used to have a ride at carnivals called Earthquake?”
      • The paper suggests a machine learning approach to determining the context.
  • 6. Event Detection
      • The primary focus of the paper is how to detect an event using so-called social sensors.
      • Events are “arbitrary classifications of space/time regions”.
      • Targeted events are natural occurrences (weather, earthquakes, etc.) and human-made ones (traffic jams, crime, etc.).
  • 7. Semantic Analysis for Tweets
      • As said before, a bag of words is simply not good enough.
      • To detect target events, they use an SVM (support vector machine), a widely used machine-learning algorithm.
      • They describe each Tweet with three groups of features:
          A. Statistical features (number of words…)
          B. Keyword features
          C. Word context features (words around a “query word”)
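The three feature groups can be sketched roughly as follows. This is an illustration under assumptions, not the paper's code: the function name `extract_features`, the whitespace tokenization, the inclusion of the query word's position in group A, and the one-word context window are all choices made here for the example.

```python
import string


def extract_features(text, query_word):
    """Return the (A, B, C) feature groups for one tweet."""
    # Naive tokenization: lowercase, split on whitespace, trim punctuation.
    words = [w.strip(string.punctuation) for w in text.lower().split()]
    words = [w for w in words if w]
    position = words.index(query_word) if query_word in words else -1

    # A. Statistical features: how long the tweet is, where the query word sits.
    statistical = {"num_words": len(words), "query_position": position}

    # B. Keyword features: the set of words in the tweet.
    keywords = set(words)

    # C. Word context features: the words right before and after the query word.
    before = words[position - 1] if position > 0 else None
    after = words[position + 1] if 0 <= position < len(words) - 1 else None
    context = {"before": before, "after": after}

    return statistical, keywords, context


print(extract_features("My dryer is shaking like crazy", "shaking"))
print(extract_features("Earthquake!", "earthquake"))
```

The two example tweets show the contrast the deck describes: the dryer tweet is longer and surrounds the query word with everyday context, while the real report is a single word.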
  • 8. Support Vector Machine
      • Support vector machines (SVMs) are a set of related supervised learning methods used for classification and regression. In simple words, given a set of training examples, each marked as belonging to one of two categories, an SVM training algorithm builds a model that predicts whether a new example falls into one category or the other.1
      • Very mathy; basically a way to classify data better.
      1.
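For readers who want to see the "two categories" idea in action, below is a toy linear SVM trained by sub-gradient descent on the hinge loss. It is only a sketch: the paper's actual SVM setup is not given in these slides, the hyperparameters and the 2-D feature vectors are made up, and a real system would use an established library rather than this hand-rolled loop.

```python
def train_linear_svm(samples, labels, epochs=50, lr=0.1, lam=0.01):
    """Fit weights w and bias b by sub-gradient descent on the regularized hinge loss."""
    w = [0.0] * len(samples[0])
    b = 0.0
    for _ in range(epochs):
        for x, y in zip(samples, labels):
            margin = y * (sum(wi * xi for wi, xi in zip(w, x)) + b)
            if margin < 1:
                # Misclassified or inside the margin: step toward the example.
                w = [wi + lr * (y * xi - lam * wi) for wi, xi in zip(w, x)]
                b += lr * y
            else:
                # Correct with room to spare: only apply the regularization shrink.
                w = [wi - lr * lam * wi for wi in w]
    return w, b


def predict(w, b, x):
    """Return +1 or -1 depending on which side of the hyperplane x falls."""
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b >= 0 else -1


# Made-up feature vectors: +1 = "about the event", -1 = "not about the event".
demo_samples = [(2, 2), (3, 3), (2, 3), (-2, -2), (-3, -3), (-2, -3)]
demo_labels = [1, 1, 1, -1, -1, -1]
w, b = train_linear_svm(demo_samples, demo_labels)
print(predict(w, b, (2.5, 2.5)), predict(w, b, (-2.5, -2.5)))
```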
  • 9. Tweets as sensory values
      • Assumption 1 – “Each twitter user is regarded as a sensor…”
          • Twitter has over 100 million users1
          • That’s enough sensors to make up for the ones not operating correctly (asleep, tweeting gibberish, busy doing something else…)
      • Assumption 2 – “Each tweet is associated with a time and location…”
          • The location is the most fundamental requirement for tweets as a sensor
      1. making/articleshow/5808927.cms
  • 10. Modeling
      • Temporal Model
          • Every Tweet has a created_at chunk of data
          • Using probability, the paper describes a way to estimate the probability of an event occurring
          • Need to consider time and a delay as the event spreads (earthquake)
      • Spatial Model
          • Tweets considered in this system require geolocation information
          • The spatial model is far more complicated
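One simple way to turn a count of positive tweets into an event probability is sketched below. It assumes each sensor raises a false alarm independently with probability `p_f`, so the chance that all `n` positive reports are false is `p_f ** n`. The slides do not spell out the formula, so treat this as an illustration; the function names, the example `p_f` value, and the alarm threshold are assumptions made here.

```python
def event_probability(n_positive, false_positive_rate):
    """Probability that at least one of n independent positive reports is genuine."""
    return 1.0 - false_positive_rate ** n_positive


def should_alarm(n_positive, false_positive_rate, threshold=0.99):
    """Raise an alarm once the estimated probability reaches the threshold."""
    return event_probability(n_positive, false_positive_rate) >= threshold


# Illustrative false-positive rate only; a real system would estimate it from data.
for n in (1, 3, 5, 10):
    print(n, round(event_probability(n, 0.35), 5), should_alarm(n, 0.35))
```

The independence assumption matters: retweets and users reacting to each other's posts mean the real sensors are correlated, so this estimate is optimistic.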
  • 11. Spatial Model Continued
      • Kalman Filters
          • The paper describes an application of Kalman filters to model two cases:
              1. Location estimate of earthquake center
              2. Trajectory estimation of a typhoon
      • Particle Filters
          • Using Twitter user geographic distribution
          • Generate a set of coordinates and sort them by weight
          • Resample and generate a new set, predict new sets, weigh the sets, measure, then iterate until convergence
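The particle filter loop (generate, predict, weigh, resample, repeat) can be sketched as below. This is a generic sequential importance resampling filter on a flat 2-D plane, not the paper's exact model: the uniform initial spread, the Gaussian weighting, the jitter size, and all parameter names are assumptions chosen for illustration. Each tweet location is treated as a noisy observation of the event center.

```python
import math
import random


def estimate_location(observations, n_particles=500, bounds=(0.0, 10.0, 0.0, 10.0),
                      obs_sigma=1.0, jitter=0.1, seed=0):
    """Estimate an event center from noisy (x, y) observations with a particle filter."""
    rng = random.Random(seed)
    xmin, xmax, ymin, ymax = bounds
    # Generate: start with particles spread uniformly over the search area.
    particles = [(rng.uniform(xmin, xmax), rng.uniform(ymin, ymax))
                 for _ in range(n_particles)]
    for ox, oy in observations:
        # Predict: nudge each particle slightly to keep the set diverse.
        particles = [(x + rng.gauss(0, jitter), y + rng.gauss(0, jitter))
                     for x, y in particles]
        # Weigh: particles closer to the observation get a higher weight.
        weights = [math.exp(-((x - ox) ** 2 + (y - oy) ** 2) / (2 * obs_sigma ** 2))
                   for x, y in particles]
        if sum(weights) == 0.0:
            weights = [1.0] * n_particles  # fall back to uniform if all weights underflow
        # Resample: draw a new set in proportion to the weights.
        particles = rng.choices(particles, weights=weights, k=n_particles)
    # Measure: report the mean of the surviving particles.
    est_x = sum(x for x, _ in particles) / n_particles
    est_y = sum(y for _, y in particles) / n_particles
    return est_x, est_y


# Made-up tweet locations scattered around a "true" center of (3.0, 7.0).
tweets_xy = [(2.5, 7.2), (3.4, 6.9), (3.1, 7.3), (2.8, 6.6),
             (3.2, 7.0), (3.0, 7.0), (2.9, 7.1), (3.1, 6.9)]
print(estimate_location(tweets_xy))
```

Unlike a Kalman filter, nothing here requires the noise to be Gaussian or the model to be linear; the Gaussian weight is just one convenient choice of likelihood.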
  • 12. Twitter problems that affect statistical analysis
      • Sensors are not independent of each other
      • One user can see another user’s tweets and then re-post or re-tweet them
      • Some of the algorithms described before would be more accurate if the sensors were independent
  • 13. Experimentation and Evaluations
      • Finally, they describe their experimentation methodology and evaluate their findings.
      • First, their algorithm:
          1. Given a set of query terms G for a target event
          2. Issue a query every s seconds and obtain tweets T
          3. For each tweet, obtain the features A, B, and C that were described earlier
          4. Calculate the probability of occurrence using the SVM
          5. For each tweet, estimate its location based on the coordinates given or by querying Google Maps with the registered location of the user
          6. Calculate the estimated distance from the Tweet to the event
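Steps 5 and 6 of the algorithm can be sketched as follows. The slides do not say which distance measure the paper uses, so the haversine great-circle formula here is a swapped-in, common choice. The `geocode` argument is a hypothetical callable standing in for the Google Maps lookup; no real geocoding API is called, and the tweet dictionary layout is assumed.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def locate_tweet(tweet, geocode):
    """Step 5: prefer the tweet's own coordinates, else geocode the user's registered location."""
    point = tweet.get("coordinates")
    if point:
        lon, lat = point["coordinates"]  # stored as [longitude, latitude]
        return lat, lon
    place = tweet.get("user", {}).get("location")
    return geocode(place) if place else None


def haversine_km(lat1, lon1, lat2, lon2):
    """Step 6: great-circle distance in kilometres between two lat/lon points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def fake_geocode(place):
    """Hypothetical stand-in geocoder that only knows one place."""
    return {"Tokyo, Japan": (35.68, 139.69)}.get(place)


tweet = {"coordinates": None, "user": {"location": "Tokyo, Japan"}}
lat, lon = locate_tweet(tweet, fake_geocode)
print(round(haversine_km(lat, lon, 34.69, 135.50), 1), "km to a made-up event location")
```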
  • 14. Semantic Analysis Evaluation
      • It turns out that the most important part of a Tweet is neither the context of the words (C) nor the content (B); it is in fact the statistical property (A).
      • During an event, users are surprised and send very short messages:
          • “Earthquake!”
  • 15. Spatial Estimation Evaluation
      • The Kalman filter did a poor job of filtering out the noise when determining the probable location of the event.
      • It was difficult to locate events in sparsely populated areas, as well as events surrounded by water.
      • They note, in a naïve and straightforward way, that a larger number of sensors gives more accurate positioning of an event.
  • 16. Conclusions 1
      • I’ve thought for months that using Twitter as a sensor was an interesting idea.
      • The first thing my mom does when there is an earthquake is run to her laptop and Tweet “EARTHQUAKE #socal”.
      • This paper is too mathematical for me to fully grasp in the short time given.
  • 17. Conclusions 2
      • I found this fascinating: the fastest accurate detection of an event took 19 seconds.
      • The accuracy they managed was very impressive.
  • 18. Discussion Time
      • Questions?
      • Otherwise… on to the required points…
  • 19. Discussion 1
      1. What is the paper about?
          • Using Twitter (Tweets) as a sensor
      2. What is the major contribution?
          • Showing that accuracy is possible
      3. What did you like best?
          • The way the paper actually ended with positive results
      4. What are the weaknesses (according to you)?
          • Generally they accomplished what they set out to do, but it was very limited in scope (Japan). It could also have been applied to many more types of events.
  • 20. Discussion 2
      1. What is the difference between a document, blog, and a micro-blog in the context of search systems?
      2. Tweets are considered to represent real time information. Is that right? What are its implications for News?
      3. What is a target event? How are tweets related to that?
      4. What is the goal of the system discussed in this paper? Do you think they are successful in their goal?
      5. Describe a particle filter. What does it do generally? How is it used in this paper?
  • 21. Discussion 3
      6. What is a support vector machine? Why is it needed in this system?
      7. Human Sensors is an increasingly popular concept. Why do you think this is important? Give three examples where this could be effective.
      8. Discuss the system. How does it help? What are the critical steps in this algorithm?
      9. This paper talks about Kalman Filter and Particle Filter. What is the difference between these two? Do we need both or just one? If you are developing an application to detect location of an accident based on tweets, which one will you use?
      10. How has this paper changed your ideas of Twitter?
  • 22. Thank You.
      • Follow me on Twitter if you want…
      • Personal: @MikeMayer
      • Public: @MikeMayerDev