
Attention flow by tagging prediction


A topic trend can be inferred from the usage of tags; we call this attention. Time-series analysis of tagging activity can indicate how the flow of attention evolves. This slide deck takes political analysis as an example, applying time-series techniques to discover interesting patterns.



  1. Attention Flow by Tagging Prediction: Time-Series Data Analysis. Yong Zheng and Mengran Liu
  2. Introduction
     • Flow of Attention (Popular Attention)
     • Attentions in our life: Stars (movie star, music star, sports star), Media (Harry Potter), New Tech (iPad3, Canon 5D III), Rumor (iPhone 4S vs. iPhone 5)
     • Attentions on the Internet: Social Networks (Facebook, Twitter, MySpace), Social Media (Netflix, Pandora, Flickr), Social Bookmarking (Delicious.com, Diigo.com)
  3. Tags & Our Dataset
  4. Dataset
     • From: Diigo.com
     • Date Range: Sep. 2004 – Oct. 2010
     • Tag: "Bush"
     • Daily Dataset:
         Date         TagCount
         09/01/2010   1
         09/02/2010   2
     • Weekly Dataset:
         Date         TagCount
         09/01/2010   5
         09/08/2010   9
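The daily-to-weekly rollup implied by the two tables above can be sketched in pure Python; the sample counts are the ones shown on the slide, while the function name and the fixed 7-day bucketing rule are assumptions of ours:

```python
from datetime import date, timedelta

# Daily (date, tag count) pairs: the slide's two example rows plus an assumed
# third day so the weekly sums work out to the slide's 5 and 9.
daily = [
    (date(2010, 9, 1), 1),
    (date(2010, 9, 2), 2),
    (date(2010, 9, 3), 2),
    (date(2010, 9, 8), 4),
    (date(2010, 9, 9), 5),
]

def weekly_counts(pairs, start):
    """Sum daily tag counts into consecutive 7-day buckets beginning at `start`."""
    buckets = {}
    for day, count in pairs:
        week = start + timedelta(days=((day - start).days // 7) * 7)
        buckets[week] = buckets.get(week, 0) + count
    return sorted(buckets.items())

weekly = weekly_counts(daily, date(2010, 9, 1))
# weekly -> [(2010-09-01, 5), (2010-09-08, 9)], matching the weekly table
```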
  5. Data Preparation
     • Stationarity & Serial Correlation & Normality Test
     • [Figures: time plot of the original series (tag), time plot after the square-root transform (tagsqrt), and time plot of the differenced square-root series (dtagsqrt), Jan. 2004 – Jan. 2012]
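The preparation pipeline shown in the plots, a square-root transform to stabilize the variance followed by first differencing to remove the trend, takes only a few lines of Python; the tiny sample series here is invented purely for illustration:

```python
import math

def sqrt_transform(counts):
    """Variance-stabilizing square-root transform (tag -> tagsqrt)."""
    return [math.sqrt(c) for c in counts]

def difference(series):
    """First difference x_t - x_{t-1} (tagsqrt -> dtagsqrt)."""
    return [b - a for a, b in zip(series, series[1:])]

tag = [1, 4, 9, 16, 9]            # illustrative daily counts
tagsqrt = sqrt_transform(tag)     # [1.0, 2.0, 3.0, 4.0, 3.0]
dtagsqrt = difference(tagsqrt)    # [1.0, 1.0, 1.0, -1.0]
```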
  6. Data Preparation
     • Stationarity & Serial Correlation & Normality Test
     • [Figures: histogram and normal Q-Q plot of dtagsqrt]
     • Ljung-Box test, p-value = 1.412e-11 => serially correlated
     • Dickey-Fuller test, p-value < 0.0001 => no unit roots
     • Normality test, p-value = 5.14e-9 => not normally distributed
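The Ljung-Box statistic quoted above is Q = n(n+2) * sum_{k=1..h} rho_k^2 / (n - k), where rho_k is the lag-k sample autocorrelation; under the white-noise null, Q is approximately chi-square with h degrees of freedom. A self-contained sketch of that formula (the slides' own computation was presumably done in a statistics package):

```python
def acf(x, k):
    """Lag-k sample autocorrelation of the series x."""
    n = len(x)
    mean = sum(x) / n
    c0 = sum((v - mean) ** 2 for v in x)
    ck = sum((x[t] - mean) * (x[t + k] - mean) for t in range(n - k))
    return ck / c0

def ljung_box_q(x, h):
    """Ljung-Box Q = n(n+2) * sum_{k=1}^{h} acf(x, k)^2 / (n - k)."""
    n = len(x)
    return n * (n + 2) * sum(acf(x, k) ** 2 / (n - k) for k in range(1, h + 1))

# A strongly alternating series has large |rho_1|, hence a large Q at h = 1.
q = ljung_box_q([1, -1, 1, -1, 1, -1, 1, -1], 1)
```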
  7. Model Fitting & Selection
     • BIC Process
     • EACF
     • Candidate models: m1 = ARIMA(1,1,1), m2 = ARIMA(0,1,1), m3 = ARIMA(4,1,1)
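For the BIC step, the criterion is BIC = k*ln(n) - 2*ln(L), where k counts the estimated parameters, n is the number of observations, and L is the maximized likelihood; lower is better, and the k*ln(n) term penalizes models such as m3 with its extra AR lags. A minimal sketch (the slides do not report likelihood values, so none are assumed here):

```python
import math

def bic(log_likelihood, n_params, n_obs):
    """Bayesian information criterion: k*ln(n) - 2*ln(L); lower is better."""
    return n_params * math.log(n_obs) - 2.0 * log_likelihood

# At equal fit, the model with more parameters is penalized more heavily.
penalized_more = bic(-100.0, 4, 1000) > bic(-100.0, 3, 1000)
```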
  8. Check Models
     • Potential Models: m1 = ARIMA(1,1,1), m2 = ARIMA(0,1,1), m3 = ARIMA(4,1,1)
     • Build Models:
       1) Guarantee all parameters are significant; otherwise, remove the insignificant ones.
       2) Double-check the residual analysis via ACF/PACF plots, a white-noise test, and a normality test.
  9. Residuals Check
     • m2 is removed from the candidate list (non-zero residual autocorrelation)
  10. Model Selection
     • Because its extra parameters are insignificant, m3 reduces to the same model as m1, i.e., ARIMA(1,1,1)
     • Ljung-Box test, p-value = 0.8969 => residuals are white noise
  11. Model & Forecasting
     • Thus, the model can be represented by:
     • (1 - 0.24093B)(1 - B)X_t = 0.001351 + a_t - 0.86902 a_{t-1}
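Expanding the operator polynomial, (1 - 0.24093B)(1 - B)X_t gives X_t = c + (1 + phi)X_{t-1} - phi X_{t-2} + a_t - theta a_{t-1}, and the one-step forecast follows by setting the future shock a_t to its expectation, zero. A sketch with the fitted coefficients (we assume X_t is the square-root-transformed count, per the data-preparation slides):

```python
PHI, THETA, C = 0.24093, 0.86902, 0.001351  # fitted ARIMA(1,1,1) parameters

def one_step_forecast(x_prev, x_prev2, a_prev):
    """One-step-ahead mean of (1 - PHI*B)(1 - B)X_t = C + a_t - THETA*a_{t-1}:
    X_hat = C + (1 + PHI)*x_{t-1} - PHI*x_{t-2} - THETA*a_{t-1}."""
    return C + (1.0 + PHI) * x_prev - PHI * x_prev2 - THETA * a_prev

# With a flat recent history and no last shock, the forecast stays nearly flat.
forecast = one_step_forecast(5.0, 5.0, 0.0)
```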
  12. Forecasting
  13. Conclusions
     • The forecasts are not bad: the model catches the peaks. Whether it can predict the bursts is still under discussion. The data are limited; it would be better to have more years of data.
     • The model includes both AR and MA components: the AR part captures a long-term effect, while the MA part relies on short-term memory. This makes sense when the topic is "Bush".
  14. Future Research
     • Use the Bush data to predict the Obama data; presidents usually experience similar activities, such as elections, national congresses, and other meetings, so tag usage may show similar time-series effects. Political trends are an interesting research area.
     • The approach can be extended to other topics/tags, such as "iPad3" or "Thanksgiving", which may show seasonal effects.
     • Further research: Topic Detection & Evolution
  15. Thanks. Q&A
