Earthquake shakes twitter users real-time event detection by social sensors

Published in: Technology, Education

  1. @MikeMayer
  2. What is Twitter?
     • Twitter is categorized as a microblogging service.
     • Twitter users post small blurbs of text, called tweets, that are 140 characters or less.
     • With URL shorteners and services tailored for Twitter, a lot of information can be conveyed in that small space.
     • Twitter is very free-form, yet ways to categorize tweets have still emerged. Fusion Search (hashtags)
  3. How is Twitter useful as a sensor?
     • Twitter users often report their status, however relevant or irrelevant it is to others
     • This means that the public timeline is full of noise
     • The timeline is updated in real time, faster than a blog, faster than a “static” document
     • Tweets are faster than traditional news, and users select from a buffet of other users to customize their news
     • However, if the tweets are carefully selected, a great deal of useful information can be found
     • Tweets contain a great deal of metadata
  4. JSON representation of a single Tweet
     Source: http://www.readwriteweb.com/archives/this_is_what_a_tweet_looks_like.php
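As a rough illustration of the metadata a tweet carries, the sketch below parses a made-up tweet in the classic Twitter REST JSON shape. The sample values are invented, and the field names used here (`created_at`, `text`, `coordinates`, `user.location`) are assumed from that format rather than taken from the slide.

```python
import json
from datetime import datetime

# A made-up tweet; every value here is invented for illustration only.
SAMPLE_TWEET = """
{
  "created_at": "Sat Apr 17 03:14:15 +0000 2010",
  "text": "Earthquake!",
  "coordinates": {"type": "Point", "coordinates": [139.69, 35.68]},
  "user": {"screen_name": "example_user", "location": "Tokyo, Japan"}
}
"""


def read_sensor_fields(raw):
    """Pull out the time, text, and location that a 'sensor reading' needs."""
    tweet = json.loads(raw)
    when = datetime.strptime(tweet["created_at"], "%a %b %d %H:%M:%S %z %Y")
    point = tweet.get("coordinates")
    # GeoJSON points are ordered [longitude, latitude].
    lon_lat = tuple(point["coordinates"]) if point else None
    return {
        "time": when,
        "text": tweet["text"],
        "lon_lat": lon_lat,
        # Free-text profile location, usable as a fallback when no point is attached.
        "profile_location": tweet["user"].get("location"),
    }


fields = read_sensor_fields(SAMPLE_TWEET)
```

Most tweets carry no attached point, which is why the free-text profile location is kept as a fallback.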
  5. “Each Twitter user is a sensor and each Tweet is sensory information”
     • Of course context must be considered… more on that soon
     • A bag of words approach isn’t good enough for detecting earthquakes
       • “My dryer is shaking like crazy”
       • “Didn’t they used to have a ride at carnivals called Earthquake?”
     • The paper suggests a machine learning approach to determining the context
  6. Event Detection
     • The primary focus of the paper is to determine the means to detect an event using so-called social sensors
     • Events are “arbitrary classifications of space/time regions”
     • Targeted events are natural occurrences (weather, earthquakes, etc.) and human-made ones (traffic jams, crime, etc.)
  7. Semantic Analysis for Tweets
     • As said before, a bag of words is simply not good enough
     • To detect and target events they use an SVM (support vector machine), a widely used machine-learning algorithm
     • They describe each Tweet with three groups of features:
       A. Statistical features (number of words…)
       B. Keyword features
       C. Word context features (words around a “query word”)
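To make the three feature groups concrete, here is a minimal sketch of how they might be pulled from a tweet's text. The tokenization, the dictionary layout, and the extra "position of the query word" statistic are assumptions made for illustration; the slide only names the groups.

```python
def extract_features(text, query_word="earthquake"):
    """Return illustrative A/B/C feature groups for one tweet."""
    # Naive tokenization: split on whitespace, trim punctuation, lowercase.
    words = [w.strip(".,!?\"'").lower() for w in text.split()]
    words = [w for w in words if w]

    position = words.index(query_word) if query_word in words else -1
    before = words[position - 1] if position > 0 else None
    after = words[position + 1] if 0 <= position < len(words) - 1 else None

    return {
        # A: statistical features
        "A_statistical": {"num_words": len(words), "query_position": position},
        # B: keyword features (the words that appear in the tweet)
        "B_keywords": sorted(set(words)),
        # C: word context features (words around the query word)
        "C_context": {"before": before, "after": after},
    }


short = extract_features("Earthquake!")
chatty = extract_features(
    "Didn't they used to have a ride at carnivals called Earthquake?"
)
```

Both tweets contain the query word, so a bag of words cannot separate them; the word count (1 versus 11) and the surrounding words can.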
  8. Support Vector Machine
     • Support vector machines (SVMs) are a set of related supervised learning methods used for classification and regression. In simple words, given a set of training examples, each marked as belonging to one of two categories, an SVM training algorithm builds a model that predicts whether a new example falls into one category or the other. [1]
     • Very mathy: basically a way to classify data better
     [1] http://en.wikipedia.org/w/index.php?title=Special:Cite&page=Support_vector_machine&id=361629294
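For readers who want something less mathy, the sketch below trains a tiny linear SVM by sub-gradient descent on the hinge loss. It is a toy under stated assumptions, not the paper's setup: the two features (word count and query-word position, both divided by 10), the training points, and the hyperparameters are all invented for illustration.

```python
def train_linear_svm(samples, labels, epochs=200, lr=0.1, reg=0.001):
    """Minimise hinge loss plus an L2 penalty with plain sub-gradient steps."""
    w = [0.0] * len(samples[0])
    b = 0.0
    for _ in range(epochs):
        for x, y in zip(samples, labels):
            margin = y * (sum(wi * xi for wi, xi in zip(w, x)) + b)
            if margin < 1:
                # Inside the margin or misclassified: step toward the example.
                w = [wi + lr * (y * xi - reg * wi) for wi, xi in zip(w, x)]
                b += lr * y
            else:
                # Safely classified: only the L2 penalty shrinks the weights.
                w = [wi - lr * reg * wi for wi in w]
    return w, b


def predict(w, b, x):
    """Return +1 (event report) or -1 (not an event report)."""
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b >= 0 else -1


# Invented toy data: (num_words / 10, query_position / 10).
SAMPLES = [(0.1, 0.0), (0.2, 0.0), (0.3, 0.1), (0.2, 0.1),
           (1.1, 1.0), (0.9, 0.5), (1.2, 0.7), (1.0, 0.8)]
LABELS = [1, 1, 1, 1, -1, -1, -1, -1]

w, b = train_linear_svm(SAMPLES, LABELS)
```

In practice a library implementation would be used; the point here is only that training yields a weight vector and bias, and classifying a new tweet is a single dot product.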
  9. Tweets as sensory values
     • Assumption 1 – “Each twitter user is regarded as a sensor…”
       • Twitter has over 100 million users [1]
       • That’s enough sensors to make up for the ones not operating correctly (asleep, tweeting gibberish, busy doing something else…)
     • Assumption 2 – “Each tweet is associated with a time and location…”
       • The location is the most fundamental requirement for tweets as a sensor
     [1] http://economictimes.indiatimes.com/infotech/internet/Twitter-snags-over-100-million-users-eyes-money-making/articleshow/5808927.cms
  10. Modeling
      Temporal Model
      • Every Tweet has a created_at chunk of data
      • Using probability, the paper describes a way to estimate the probability of an event occurring
      Spatial Model
      • Tweets considered in this system require geolocation information
      • The spatial model is far more complicated
      • Need to consider time and a delay as the event spreads (earthquake)
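The slide does not give the temporal formula, so the following is only one simple way to read "the probability of an event occurring". It assumes every positive tweet is an independent sensor that raises a false alarm with the same probability, so that an event is ruled out only if all of them are false alarms. The function name and the example rate of 0.35 are placeholders.

```python
def occurrence_probability(num_positive_tweets, false_positive_rate):
    """P(event) = 1 - p_f ** n, assuming n independent sensors."""
    if not 0.0 <= false_positive_rate <= 1.0:
        raise ValueError("false_positive_rate must be between 0 and 1")
    if num_positive_tweets < 0:
        raise ValueError("num_positive_tweets must be non-negative")
    return 1.0 - false_positive_rate ** num_positive_tweets


# With a placeholder false-positive rate of 0.35, confidence rises quickly
# as more positive tweets arrive.
curve = [occurrence_probability(n, 0.35) for n in range(0, 6)]
```

The independence assumption is a strong one, since users re-tweet each other, so this estimate is optimistic.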
  11. Spatial Model Continued
      Kalman Filters
      • The paper describes an application of Kalman filters to model two cases:
        1. Location estimate of earthquake center
        2. Trajectory estimation of a typhoon
      Particle Filters
      • Using Twitter user geographic distribution
      • Generate a set of coordinates and sort them by weight
      • Resample and generate a new set, predict new sets, weigh the sets, measure, then iterate until convergence
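The particle-filter loop (generate, weigh, resample, predict, repeat) can be sketched in a few lines. This is a simplified illustration rather than the paper's implementation. It treats latitude and longitude as a flat plane, uses a Gaussian likelihood with an arbitrary width, starts from a uniform spread instead of the Twitter user distribution, and runs once per observation instead of testing for convergence. All coordinates and parameters are invented.

```python
import math
import random


def estimate_epicenter(observations, num_particles=500, sigma=1.0, jitter=0.1,
                       bounds=((30.0, 40.0), (134.0, 144.0)), seed=0):
    """Estimate an event location from (lat, lon) tweet positions."""
    rng = random.Random(seed)
    (lat_lo, lat_hi), (lon_lo, lon_hi) = bounds

    # Generate: spread particles uniformly over the search area.
    particles = [(rng.uniform(lat_lo, lat_hi), rng.uniform(lon_lo, lon_hi))
                 for _ in range(num_particles)]

    for obs_lat, obs_lon in observations:
        # Weigh: particles close to the observed tweet get higher weight.
        weights = [
            math.exp(-((lat - obs_lat) ** 2 + (lon - obs_lon) ** 2)
                     / (2 * sigma ** 2))
            for lat, lon in particles
        ]
        # Resample: draw a new set in proportion to the weights.
        particles = rng.choices(particles, weights=weights, k=num_particles)
        # Predict: add a little noise so the set does not collapse to one point.
        particles = [(lat + rng.gauss(0, jitter), lon + rng.gauss(0, jitter))
                     for lat, lon in particles]

    lat_est = sum(p[0] for p in particles) / num_particles
    lon_est = sum(p[1] for p in particles) / num_particles
    return lat_est, lon_est


# Invented tweet locations scattered around (35.0, 139.0).
TWEET_LOCATIONS = [(35.1, 139.2), (34.9, 138.9), (35.2, 139.0),
                   (34.8, 139.1), (35.0, 138.8), (35.1, 139.1)]
estimate = estimate_epicenter(TWEET_LOCATIONS)
```

Because the estimate is just the mean of the surviving particles, it does not require the noise to be Gaussian or the motion to be linear, which is what a Kalman filter assumes.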
  12. Twitter problems that affect statistical analysis
      • Sensors are not independent of each other
      • One user will see another user’s tweets and can then re-post or re-tweet them
      • Some of the algorithms described before would be more accurate if the sensors were independent
  13. Experimentation and Evaluations
      • Finally they describe their experimentation methodology and evaluate their findings
      First, their algorithm:
      1. Given a set of query terms G for a target event
      2. Issue a query every s seconds and obtain tweets T
      3. For each tweet obtain the features A, B, and C that were described earlier
      4. Calculate the probability of occurrence using the SVM
      5. For each tweet estimate its location based on the coordinates given or by querying Google Maps with the registered location of the user
      6. Calculate the estimated distance from the Tweet to the event
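Step 6 needs a distance between a tweet's coordinates and a candidate event location. The slide does not say how the paper computes it; one common choice, assumed here, is the great-circle (haversine) distance on a spherical Earth with a mean radius of 6371 km.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean radius; the Earth is treated as a sphere


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two (lat, lon) points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = phi2 - phi1
    d_lambda = math.radians(lon2 - lon1)
    a = (math.sin(d_phi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


# Invented example: a tweet location and a candidate event location.
distance = haversine_km(35.68, 139.69, 34.69, 135.50)
```

For tweets that have only a registered profile location, the coordinates would first have to come from a geocoding step such as the Google Maps query in step 5.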
  14. Semantic Analysis Evaluation
      • It turns out that the most important part of a Tweet is neither the context of the words (C) nor the content (B); it is in fact the statistical property (A)
      • During an event users are surprised and send very short messages
        • “Earthquake!”
  15. Spatial Estimation Evaluation
      • The Kalman filter did a poor job of filtering out the noise when determining the probable location of the event
      • It was difficult to locate events in sparsely populated areas, as well as events surrounded by water
      • They mention, in a naïve and straightforward way, that having more sensors gives the most accurate positioning of an event
  16. Conclusions 1
      • I’ve thought for months that using Twitter as a sensor was an interesting idea.
      • The first thing my mom does when there is an earthquake is run to her laptop and Tweet “EARTHQUAKE #socal”
      • This paper is too mathematical for me to fully grasp in the short time given
  17. Conclusions 2
      I found this fascinating:
      • The fastest accurate detection of an event took 19 seconds.
      • The accuracy they managed was very impressive.
  18. Discussion Time
      • Questions?
      • Otherwise… onto the required points…
  19. Discussion 1
      1. What is the paper about?
         • Using Twitter (Tweets) as a sensor
      2. What is the major contribution?
         • Showing that accuracy is possible
      3. What did you like best?
         • The way the paper actually ended with positive results
      4. What are the weaknesses (according to you)?
         • Generally they accomplished what they set out to do, but it was very limited in scope (Japan). It could also have been applied to many more types of events.
  20. Discussion 2
      1. What is the difference between a document, blog, and a micro-blog in the context of search systems?
      2. Tweets are considered to represent real time information. Is that right? What are its implications for News?
      3. What is a target event? How are tweets related to that?
      4. What is the goal of the system discussed in this paper? Do you think they are successful in their goal?
      5. Describe a particle filter. What does it do generally? How is it used in this paper?
  21. Discussion 3
      6. What is a support vector machine? Why is it needed in this system?
      7. Human Sensors is an increasingly popular concept. Why do you think this is important? Give three examples where this could be effective.
      8. Discuss the system. How does it help? What are the critical steps in this algorithm?
      9. This paper talks about the Kalman filter and the particle filter. What is the difference between the two? Do we need both or just one? If you were developing an application to detect the location of an accident based on tweets, which one would you use?
      10. How has this paper changed your ideas of Twitter?
  22. Thank You.
      • Follow me on Twitter if you want…
      • Personal: @MikeMayer
      • Public: @MikeMayerDev
  23. @MikeMayer
