Detecting Trends

Stanislav Nikolov (MIT, Twitter)
Devavrat Shah (MIT)

Interdisciplinary Workshop on Information and Decision in Social Networks 2012

  1. Detecting Trends: Stanislav Nikolov (MIT, Twitter), Devavrat Shah (MIT)
  2. Source: http://twoinformcanada.ca/wp-content/uploads/2012/07/barclays.jpg
  4. The Barclays Libor scandal: at 12:49, “#Barclays” is listed as a trending topic on Twitter.
  6. •  Is there enough information before the “jump”?
      •  Can we predict which topics will trend in advance?
  7. Yes.
  8. Results (best parameter setting):
      •  79% of trends detected early.
      •  1.43 hours (about 1 h 26 min) mean early detection.
      •  95% true positive rate (TPR), 4% false positive rate (FPR).
  12. What are Trending Topics?
      •  Twitter: a global communication network.
      •  Tweet: a short, public message.
      •  Topic: a phrase in a tweet.
      •  Trending topic (a “trend”): a topic that becomes popular.
  13-18. A Parametric Model
      •  Expect a certain type of pattern (e.g. constant + jumps).
      •  Fit parameters to the data (e.g. how large the jump is).
      •  Decide whether the jump is big enough.
      [Figure: activity vs. time; across successive builds the fitted jump parameter grows from p = 0.1 to p = 0.6 to p = 4.1, and the last build is marked “trend detected!”]
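A minimal sketch of the constant-plus-jump detector described above. It assumes the jump parameter p is the relative increase of recent activity over a constant baseline fitted on the window just before it; that definition, the function names, the window sizes and the threshold are all hypothetical, not taken from the slides.

```python
from statistics import mean
from typing import Sequence, Tuple


def jump_parameter(activity: Sequence[float], baseline_len: int, recent_len: int) -> float:
    """Hypothetical jump size p: relative increase of the recent mean activity
    over the constant baseline level fitted on the window just before it."""
    if baseline_len < 1 or recent_len < 1 or len(activity) < baseline_len + recent_len:
        raise ValueError("need positive window sizes and enough history")
    baseline = mean(activity[-(baseline_len + recent_len):-recent_len])
    recent = mean(activity[-recent_len:])
    if baseline == 0:
        # No baseline activity: any recent activity counts as an unbounded jump.
        return float("inf") if recent > 0 else 0.0
    return (recent - baseline) / baseline


def parametric_detect(activity: Sequence[float], baseline_len: int = 6,
                      recent_len: int = 2, threshold: float = 2.0) -> Tuple[bool, float]:
    """Constant + jump model: declare a trend when the fitted p exceeds a fixed threshold."""
    p = jump_parameter(activity, baseline_len, recent_len)
    return p > threshold, p
```

For example, `parametric_detect([10, 10, 11, 9, 10, 10, 12, 60])` returns `(True, 2.6)`. Everything hinges on one fitted number and one fixed threshold, so a topic whose activity grows in a shape other than a sharp jump may never cross it.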
  19-22. Parametric Models are Inadequate!
      [Figures: activity vs. time, each annotated “trend detected!”]
  29. A Data-Driven Approach
      •  All of the information is in the data.
      •  Hypothesis:
         –  Tweets are written by people.
         –  People are simple:
            •  in how they spread information;
            •  in how they connect to one another.
         –  There is a small number of distinct “ways” in which a topic can become trending.
  30-39. Classification by Experts
      [Figure: an observed signal (“observations”) is compared with reference signals (“r”), which cast votes (“vote”).]
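One way to read the “experts” picture is as weighted voting: each labeled reference signal r (a past trend or non-trend) votes on the observation s with a weight that shrinks as the distance between them grows, and a trend is declared when trend votes sufficiently outweigh non-trend votes. The sketch below assumes an exponential weight exp(-gamma * d), a squared distance minimized over time shifts, and a ratio threshold theta; these choices and all names are illustrative assumptions, not a specification from the slides.

```python
import math
from typing import List, Sequence


def shifted_distance(r: Sequence[float], s: Sequence[float]) -> float:
    """Squared distance between observation s and the best-aligned window of
    reference r (minimum over every shift of s along r)."""
    n = len(s)
    if len(r) < n:
        raise ValueError("reference must be at least as long as the observation")
    return min(
        sum((r[k + i] - s[i]) ** 2 for i in range(n))
        for k in range(len(r) - n + 1)
    )


def _log_sum_exp(xs: List[float]) -> float:
    """Numerically stable log(sum(exp(x) for x in xs)); xs must be non-empty."""
    m = max(xs)
    return m + math.log(sum(math.exp(x - m) for x in xs))


def log_vote_ratio(s, trends, non_trends, gamma: float = 1.0) -> float:
    """log(total trend votes / total non-trend votes), where each reference r
    votes with weight exp(-gamma * d(r, s)). Both lists must be non-empty."""
    pos = _log_sum_exp([-gamma * shifted_distance(r, s) for r in trends])
    neg = _log_sum_exp([-gamma * shifted_distance(r, s) for r in non_trends])
    return pos - neg


def classify(s, trends, non_trends, gamma: float = 1.0, theta: float = 1.0) -> bool:
    """Declare a trend when the vote ratio exceeds the threshold theta (> 0)."""
    return log_vote_ratio(s, trends, non_trends, gamma) > math.log(theta)
```

Working in log space keeps the comparison meaningful even when every exp(-gamma * d) would underflow to zero. With one toy trend `[0, 0, 0, 1, 3, 9]` and one toy non-trend `[1, 1, 1, 1, 1, 1]`, the observation `[0, 1, 3]` is classified as a trend and `[1, 1, 1]` is not.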
  40. Properties
      •  Simple (just compute distances).
      •  Scalable (distances can be computed in parallel).
      •  Non-parametric: model “parameters” scale with the data.
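The “just compute distances” and “in parallel” properties can be illustrated by handing each reference signal's distance computation to a worker pool: every reference is independent of the others, so the work splits trivially, and adding labeled references enlarges the model without introducing new parameters. This sketch uses Python's `concurrent.futures.ThreadPoolExecutor` for portability; because pure-Python arithmetic holds the GIL, real speedups would need a `ProcessPoolExecutor` or a distributed framework instead. The function names and the shifted squared distance are hypothetical choices.

```python
from concurrent.futures import ThreadPoolExecutor
from functools import partial
from typing import List, Sequence


def window_distance(r: Sequence[float], s: Sequence[float]) -> float:
    """Squared distance between s and the best-aligned window of r.
    Assumes len(r) >= len(s); otherwise min() receives no candidates."""
    n = len(s)
    return min(sum((r[k + i] - s[i]) ** 2 for i in range(n))
               for k in range(len(r) - n + 1))


def all_distances(s: Sequence[float], references: List[Sequence[float]],
                  workers: int = 4) -> List[float]:
    """Compute d(r, s) for every reference r concurrently.
    pool.map returns results in the same order as `references`."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(partial(window_distance, s=s), references))
```

For instance, `all_distances([0, 1, 3], [[0, 0, 0, 1, 3, 9], [1, 1, 1, 1, 1, 1]])` returns `[0, 5]`.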
  41. Experimental Results
  42. Experiment
      •  500 trends.
      •  500 non-trends.
      •  Trend detection performed on a 50% held-out set.
      •  Online signal classification.
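The protocol above can be sketched as follows: shuffle each class and split it in half, keep one half as labeled references, and feed every held-out signal to the detector one time step at a time, recording the first step at which it fires. The detector is passed in as a function; the names, the fixed seed and the one-step-at-a-time reveal are assumptions made for illustration.

```python
import random
from typing import Callable, List, Optional, Sequence, Tuple


def split_half(signals: Sequence, seed: int = 0) -> Tuple[List, List]:
    """Shuffle a list of signals and split it into reference and held-out halves."""
    rng = random.Random(seed)
    shuffled = list(signals)
    rng.shuffle(shuffled)
    mid = len(shuffled) // 2
    return shuffled[:mid], shuffled[mid:]


def first_detection(signal: Sequence[float],
                    detector: Callable[[Sequence[float]], bool],
                    min_len: int = 1) -> Optional[int]:
    """Online classification: reveal the signal one step at a time and return
    the index of the first step at which detector(prefix) fires, else None."""
    for t in range(min_len, len(signal) + 1):
        if detector(signal[:t]):
            return t - 1
    return None
```

With a toy detector that fires once the latest value exceeds 4, `first_detection([1, 1, 5, 9], lambda prefix: prefix[-1] > 4)` returns `2`.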
  43. Results – Early Detection (best parameter setting)
  44. Results – FPR / TPR Tradeoff
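An FPR/TPR tradeoff like this one comes from sweeping the detection threshold: a lower threshold flags more true trends (higher TPR) but also more non-trends (higher FPR). A minimal sketch, assuming each held-out signal has already been reduced to a single score with a known label; the function names and the toy scores in the usage note are hypothetical.

```python
from typing import List, Sequence, Tuple


def tpr_fpr(scores: Sequence[float], labels: Sequence[bool],
            threshold: float) -> Tuple[float, float]:
    """True/false positive rates when every score above `threshold` is flagged."""
    pos = sum(labels)
    neg = len(labels) - pos
    if pos == 0 or neg == 0:
        raise ValueError("need at least one trend and one non-trend")
    flagged = [sc > threshold for sc in scores]
    tp = sum(f and lab for f, lab in zip(flagged, labels))
    fp = sum(f and not lab for f, lab in zip(flagged, labels))
    return tp / pos, fp / neg


def sweep(scores: Sequence[float], labels: Sequence[bool],
          thresholds: Sequence[float]) -> List[Tuple[float, float, float]]:
    """Tradeoff curve: one (threshold, TPR, FPR) point per threshold."""
    return [(th, *tpr_fpr(scores, labels, th)) for th in thresholds]
```

With toy scores `[0.9, 0.8, 0.4, 0.3, 0.2, 0.1]` for three trends followed by three non-trends, a threshold of 0.5 gives TPR 2/3 and FPR 0, while 0.25 gives TPR 1.0 and FPR 1/3.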
  45. Results – Early / Late Tradeoff
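Early-detection statistics such as the 79% and 1.43-hour figures quoted at the start can be computed from per-trend detection times. This sketch assumes both our detection time and the reference onset time (when the topic actually started trending) are given in hours for each detected trend, counts a detection as early when it precedes the onset, and averages the lead time over early detections only; that averaging choice and the function name are assumptions.

```python
from typing import Sequence, Tuple


def early_detection_stats(detect_times: Sequence[float],
                          onset_times: Sequence[float]) -> Tuple[float, float]:
    """Return (fraction of detections that precede the reference onset,
    mean lead time in hours among those early detections)."""
    if not detect_times or len(detect_times) != len(onset_times):
        raise ValueError("need matching, non-empty lists of times")
    leads = [onset - det for det, onset in zip(detect_times, onset_times)]
    early = [lead for lead in leads if lead > 0]
    frac = len(early) / len(leads)
    mean_lead = sum(early) / len(early) if early else 0.0
    return frac, mean_lead
```

Raising the detection threshold in a sweep typically lowers the FPR but pushes detection times later, which is what shrinks both numbers returned here.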
  49. Concluding Remarks
      •  Algorithm to detect trends early.
      •  Scalable nonparametric time series analysis:
         –  classification
         –  anomaly detection
         –  prediction
