Detecting Trends Through Twitter Stream v2


Published in: Technology

  1. 1. <ul><li>Detecting Trends through Twitter Stream </li></ul><ul>Neil Marion dela Cruz, BSCS <li>Institute of Computer Science
  2. 2. University of the Philippines Los Banos </li></ul>
  3. 3. <ul><li>Introduction </li></ul>
  4. 4. <ul><li>Twitter and Trending Topics </li></ul><ul><li>Twitter, in particular, is currently the major microblogging service, with more than 11 million active users.
  5. 5. Trending Topics, on the other hand, is a list that Twitter provides on its homepage. A trend is a type of search: a term that is trending is simply a term that appears at a higher frequency than other terms over a set amount of time. </li></ul>
  6. 6. <ul><li>Twitter and Trending Topics </li></ul><ul>What Makes A Trend? <li>Twitter tracks the volume of terms mentioned on Twitter on an ongoing basis. Topics break into the Trends list when the volume of Tweets about that topic at a given moment increases dramatically. </li></ul><ul><li>What the above equation implies is that a term whose tweet frequency increases dramatically in a relatively short amount of time tends to get a high trend score. </li></ul>
  7. 7. <ul><li>The Novelty Over Popularity Philosophy </li></ul><ul><li>Take, for example, #justinbieber and #diablo3release, two terms that have each trended at least once. There would certainly be a sudden surge of tweets for #diablo3release compared to #justinbieber, because Diablo III is a highly anticipated computer game that is almost certain to attract worldwide attention right after its release. On the other hand, #justinbieber, associated with the famous young pop artist, will not trend as strongly as #diablo3release, or even come close, since this topic has already been popular for a long time. In other words, #justinbieber has always been popular among tweeters, so it is safe to conclude that the topic will not register as a trend. </li></ul>
  8. 8. <ul><li>Twitter and Trending Topics </li></ul><ul><li>Studying Twitter’s trending topics mechanism can aid future research that involves extracting trends from different types of media. Extracting trends is becoming a necessity for social networking media. This study can therefore help enthusiasts and developers of future social networking applications implement their own trending topics module. </li></ul>
  9. 9. <ul><li>What Will Be Presented </li></ul><ul><li>In this paper we present a method of extracting trending topics from the Twitter stream by means of the Twitter API for obtaining the stream, Z-Score as the scoring method and Lossy-Counting as our streaming algorithm. </li></ul>
  10. 10. <ul><li>Objectives </li></ul>
  11. 11. <ul><li>General </li></ul><ul><li>The main objective of this study is to reproduce Twitter’s
  12. 12. Trending Topics module. </li></ul>
  13. 13. <ul><li>Specific </li></ul><ul><li>To be able to acquire tweets through Twitter’s streaming API
  14. 14. To apply the appropriate streaming algorithm that will determine the most frequent terms in the Tweet stream
  15. 15. To formulate a novelty measurement of trends that takes into account the frequency/time element of the terms
  16. 16. To create an application that will implement and help analyze trending topics. </li></ul>
  17. 17. <ul><li>Materials and Methods </li></ul>
  18. 18. <ul><li>The Twitter Stream </li></ul>Twitter provides an API that lets anyone download a stream of data. The stream contains actual tweets from Twitter users around the world at the time of streaming. It provides public statuses from all users, which can be filtered in several ways: by user ID, by keyword, or by geographic location. This study needs a stream produced by random sampling, which the API provides.
  19. 19. <ul><li>The Scoring Method </li></ul>In order to determine which terms trend, we need a robust scoring method. As discussed earlier, Twitter trends are determined by their degree of novelty. And from this we can conclude that
  20. 20. <ul><li>The Scoring Method </li></ul>We propose to use the standard score (also called the Z-score) as our scoring method. The standard score is $z = \frac{x - \mu}{\sigma}$, where x is a raw score to be standardized, μ is the mean of the population and σ is the standard deviation of the population. In the case of scoring tweet trends: x is the currently observed frequency of a term, μ is the mean of all the historical frequencies of the term and σ is the standard deviation of the historical frequencies of the term.
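The scoring rule above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `z_score`, the treatment of a flat history (σ = 0) and the frequency numbers are all made up for the example and do not come from the study.

```python
from statistics import mean, pstdev


def z_score(current, history):
    """Standard score of the current frequency against a term's history.

    `history` is a list of past per-interval frequencies of one term.
    """
    mu = mean(history)
    sigma = pstdev(history)  # population standard deviation
    if sigma == 0:
        # Assumption for this sketch: a perfectly flat history gives no
        # spread to standardize against, so report a neutral score.
        return 0.0
    return (current - mu) / sigma


# Hypothetical per-minute frequencies, invented for illustration.
steady_history = [980, 1010, 995, 1005, 1010]  # always-popular term
new_history = [2, 0, 3, 1, 4]                  # rarely seen term

steady = z_score(1020, steady_history)  # small rise on a high baseline
surge = z_score(60, new_history)        # sudden jump on a low baseline

print(f"steady term: {steady:.2f}, surging term: {surge:.2f}")
```

Although the always-popular term has by far the higher raw frequency, the surging term gets the much larger score, which is the novelty-over-popularity behaviour described earlier.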
  21. 21. <ul><li>The Streaming Algorithm </li></ul>The Lossy-Counting streaming algorithm will be used to deal with the very large volume of the stream. The algorithm stores tuples which comprise an item, a lower bound on its count, and a ’delta’ (Δ) value which records the difference between the upper bound and the lower bound. When processing the i-th item in the stream, if information is currently stored about the item then its lower bound is increased by one; else, a new tuple for the item is created with the lower bound set to one, and Δ set to ⌊i/k⌋. Periodically, all tuples whose upper bound is less than ⌊i/k⌋ are deleted. These are correct upper and lower bounds on the count of each item, so at the end of the stream of n items, all items whose count exceeds n/k must be stored.
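A minimal Python sketch of the Lossy-Counting description above follows. The class name `LossyCounter`, the choice to prune once every k items and the toy stream are assumptions made for illustration; the study's own implementation is not shown in these slides.

```python
class LossyCounter:
    """Lossy Counting over a stream, following the description above."""

    def __init__(self, k):
        if k <= 0:
            raise ValueError("k must be positive")
        self.k = k
        self.n = 0          # number of items processed so far
        self.entries = {}   # item -> [lower_bound, delta]

    def add(self, item):
        self.n += 1
        bucket = self.n // self.k  # floor(i / k) for the i-th item
        entry = self.entries.get(item)
        if entry is not None:
            entry[0] += 1          # raise the lower bound by one
        else:
            self.entries[item] = [1, bucket]
        # Assumption: "periodically" is taken to mean once every k items.
        if self.n % self.k == 0:
            self.entries = {
                term: e for term, e in self.entries.items()
                if e[0] + e[1] >= bucket  # drop if upper bound < floor(i/k)
            }

    def bounds(self):
        """Return {item: (lower_bound, upper_bound)} for stored items."""
        return {term: (e[0], e[0] + e[1]) for term, e in self.entries.items()}


# Toy stream invented for illustration: "lol" is frequent, the rest are not.
stream = ["lol", "a", "lol", "b", "lol", "c",
          "lol", "d", "lol", "e", "lol", "f"]
counter = LossyCounter(k=4)
for term in stream:
    counter.add(term)

result = counter.bounds()
print(result)
```

With n = 12 and k = 4, any term occurring more than n/k = 3 times must survive, so "lol" (6 occurrences) is retained with exact bounds, while early one-off terms such as "a" are pruned.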
  22. 22. <ul><li>Processing the Twitter Stream </li></ul>
  23. 23. <ul><li>Processing the Twitter Stream </li></ul>
  24. 24. <ul><li>Results and Discussion </li></ul>
  25. 25. <ul><li>The program was run six times, on different days, each run lasting around 400 to 600 minutes. Over each run, we identified the terms that occurred around 90% of the time, meaning they had a nonzero frequency in almost every minute. These terms are ”LOL”, ”LOVE”, ”PHOTO”, ”THE”, and ”YOU.”
  26. 26. From this, it can be concluded that the terms enumerated are the most frequent terms on Twitter. Figure 2 shows that the frequency of these terms stays consistently high. In addition, these terms have a high percentage of occurrences, as shown in Table 5. </li></ul>
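The "percentage of occurrences" measure can be sketched as the share of one-minute windows in which a term appears at least once. The helper name `presence_ratio` and the four-minute sample below are hypothetical and are not the study's data.

```python
from collections import Counter


def presence_ratio(minute_counts):
    """Fraction of minutes in which each term has a nonzero frequency.

    `minute_counts` is a list with one Counter of term frequencies per minute.
    """
    total_minutes = len(minute_counts)
    minutes_seen = Counter()
    for counts in minute_counts:
        for term, freq in counts.items():
            if freq > 0:
                minutes_seen[term] += 1
    return {term: seen / total_minutes for term, seen in minutes_seen.items()}


# Hypothetical four-minute sample, invented for illustration.
sample = [
    Counter({"THE": 5, "LOL": 2}),
    Counter({"THE": 3}),
    Counter({"THE": 4, "DIABLO": 1}),
    Counter({"THE": 6, "LOL": 1}),
]
ratios = presence_ratio(sample)
print(ratios)
```

A term with a ratio near 0.9 or above in a real run would correspond to the always-present terms listed above.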
  27. 32. <ul><li>Conclusions and Future Work </li></ul>
  28. 33. <ul><li>Our experiment supports the correctness of our trending topics algorithm, in that its behavior parallels Twitter’s novelty over popularity philosophy. The algorithm can be used on data sets other than tweets. For those who wish to develop their own microblogging sites, the algorithm is free to be extended. </li></ul>