Detecting Trends

Stanislav Nikolov (MIT, Twitter)
Devavrat Shah (MIT)

Interdisciplinary Workshop on Information and Decision in Social Networks 2012

  1. Detecting Trends. Stanislav Nikolov §,†, Devavrat Shah § (§ MIT, † Twitter)
  2-3. Source: http://twoinformcanada.ca/wp-content/uploads/2012/07/barclays.jpg
  4. The Barclays Libor scandal. 12:49: “#Barclays” is listed as a trending topic on Twitter.
  5-6.
      • Is there enough information before the “jump”?
      • Can we predict which topics will trend in advance?
  7. Yes.
  8. Results at the best parameter setting:
      • 79% early detection
      • 1.43 hours mean early detection
      • 95% true positive rate (TPR), 4% false positive rate (FPR)
  9-12. What are Trending Topics?
      • Twitter: a global communication network.
      • Tweet: a short, public message.
      • Topic: a phrase in a tweet.
      • Trending topic (a “trend”): a topic that becomes popular.
  13-18. A Parametric Model
      • Expect a certain type of pattern (e.g. constant + jumps).
      • Fit parameters to data (e.g. how much of a jump).
      • Decide if the jump is big enough.
      [Plot: activity vs. time. Successive frames show p = 0.1, p = 0.6, and p = 4.1, with “trend detected!” at p = 4.1.]
  19-22. Parametric Models are Inadequate
      [Plot: activity vs. time, marked “trend detected!”]
  23-29. A Data-Driven Approach
      • All of the information is in the data.
      • Hypothesis
          – Tweets are written by people.
          – People are simple.
              • In how they spread information.
              • In how they connect to one another.
          – Small number of distinct “ways” in which a topic can become trending.
  30-39. Classification by Experts
      [Diagram labels: observations, r, vote]
  40. Properties
      • Simple (just compute distances)
      • Scalable (can compute distances in parallel)
      • Non-parametric: model “parameters” scale with the data
  41. Experimental Results
  42. Experiment
      • 500 trends.
      • 500 non-trends.
      • Do trend detection on a 50% hold-out set.
      • Online signal classification.
  43. Results: Early Detection (best parameter setting)
  44. Results: FPR / TPR Tradeoff
  45. Results: Early / Late Tradeoff
  46-50. Concluding Remarks
      • Algorithm to detect trends early
      • Scalable nonparametric time series analysis: classification, anomaly detection, prediction
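Slides 13 to 18 outline the parametric recipe: assume a constant baseline plus jumps, fit a jump parameter, and declare a trend once the jump is big enough. A minimal Python sketch of that recipe follows. The slides do not define p, so the relative-jump statistic, the function names, the baseline length, and the threshold of 4.0 are all illustrative assumptions. The activity counts are invented so that p passes through values like those shown on the slides.

```python
from statistics import mean


def jump_statistic(activity, baseline_len):
    """Hypothetical jump size p: how far the latest value sits above the
    baseline mean, relative to that mean. Not a definition from the slides."""
    baseline = mean(activity[:baseline_len])
    return (activity[-1] - baseline) / max(baseline, 1e-9)


def detect_jump(activity, baseline_len=3, threshold=4.0):
    """Return the first index where p exceeds the threshold, or None."""
    for t in range(baseline_len, len(activity)):
        p = jump_statistic(activity[: t + 1], baseline_len)
        if p > threshold:
            return t
    return None


# Invented tweet counts per time step: flat baseline, then a jump.
activity = [10, 10, 10, 11, 16, 51, 80]
print(detect_jump(activity))  # → 5 (p = 0.1, 0.6, then 4.1 > 4.0)
```

The weakness argued on slides 19 to 22 shows up here as well: the detector only fires for the one shape it was built to expect, and the threshold has to be chosen by hand.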
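Slides 30 to 40 describe classification by "experts": labeled example signals vote on a new observation, and the only computation needed is a distance to each example. The sketch below is one plausible reading of that idea, not the authors' confirmed method. The exponential vote weight, the squared Euclidean distance, and the parameters `gamma` and `theta` are assumptions introduced for illustration, as are all function names.

```python
import math


def distance(a, b):
    """Squared Euclidean distance between two equal-length sequences."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def vote_ratio(observation, trend_examples, non_trend_examples, gamma=1.0):
    """Each labeled example casts a vote that decays with its distance to
    the observation (assumed weighting: exp(-gamma * distance))."""
    trend_votes = sum(
        math.exp(-gamma * distance(observation, r)) for r in trend_examples
    )
    non_trend_votes = sum(
        math.exp(-gamma * distance(observation, r)) for r in non_trend_examples
    )
    if non_trend_votes == 0:
        return math.inf
    return trend_votes / non_trend_votes


def is_trend(observation, trend_examples, non_trend_examples,
             gamma=1.0, theta=1.0):
    """Declare a trend when the trend votes outweigh the rest by theta."""
    ratio = vote_ratio(observation, trend_examples, non_trend_examples, gamma)
    return ratio > theta


# Invented example signals, one per class.
trends = [[1, 2, 4, 8]]
non_trends = [[1, 1, 1, 1]]
print(is_trend([1, 2, 4, 7], trends, non_trends))  # → True
print(is_trend([1, 1, 1, 2], trends, non_trends))  # → False
```

Each distance is independent of the others, which is why the slides can call the approach scalable: the sums can be split across workers. The number of "parameters" is simply the number of stored examples, so the model grows with the data.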
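Slides 8 and 44 report results as a true positive rate (TPR) and a false positive rate (FPR) on a hold-out set. As a reminder of how those two numbers are computed, here is a small Python sketch using the standard definitions TPR = TP / (TP + FN) and FPR = FP / (FP + TN). The function name and the labels are hypothetical and are not data from the experiment.

```python
def tpr_fpr(predicted, actual):
    """Return (TPR, FPR) for parallel lists of boolean predictions and
    ground-truth labels, where True means "trend"."""
    pairs = list(zip(predicted, actual))
    tp = sum(1 for p, a in pairs if p and a)
    fn = sum(1 for p, a in pairs if not p and a)
    fp = sum(1 for p, a in pairs if p and not a)
    tn = sum(1 for p, a in pairs if not p and not a)
    return tp / (tp + fn), fp / (fp + tn)


# Invented hold-out labels: four trends followed by four non-trends.
actual = [True, True, True, True, False, False, False, False]
predicted = [True, True, True, False, True, False, False, False]
print(tpr_fpr(predicted, actual))  # → (0.75, 0.25)
```

Sweeping a decision threshold and recomputing this pair at each setting traces out the kind of FPR / TPR tradeoff curve named on slide 44.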
