
Influx/Days 2017 San Francisco | Jared Lander



MODELING TIME SERIES IN R
Temporal data is being produced in ever greater quantity, so it is fortunate that our ability to analyze it with time series methods has kept pace. This talk looks at a number of techniques for modeling time series data. We start with traditional methods such as ARMA, then move on to more modern tools such as Prophet and machine learning models like XGBoost. Along the way we look at a bit of theory and the code for training these models.


  1. Time Series in R (11/20/2017). Jared P. Lander
  2. [No text on slide]
  3. [No text on slide]
  4. [No text on slide]
  5. Time Series and R
  6. Agenda
      1. Our Data
      2. Time Series Objects
      3. Stationarity
      4. Autocorrelation
      5. ARIMA
      6. Prophet
      7. Neural Networks
      8. XGBoost
      9. Accuracy
  7. [No text on slide]
  8. [Interactive Leaflet map]
  9. Riders
  10. ts
  11. zoo
  12. xts
  13. Convert to xts
      library(xts)
      library(dplyr)
      # dataDir (the folder holding the CSV) is not defined on the slide
      citi <- readr::read_csv(file.path(dataDir, 'citibike_201307_201707.csv'))
      citiTS <- xts(citi %>% dplyr::select(Trips), order.by=citi$Date)
      citiTS['2017-07-23/']

                 Trips
      2017-07-23 47779
      2017-07-24 44702
      2017-07-25 66620
      2017-07-26 71672
      2017-07-27 67066
      2017-07-28 65089
      2017-07-29 53897
      2017-07-30 60402
      2017-07-31 65523
  14. Stationarity
  15. Stationarity
      1. [formula missing from the transcript]
      2. [formula missing from the transcript]
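The two conditions on the Stationarity slide did not survive extraction. For orientation only, the textbook definition of weak (covariance) stationarity is sketched below; it is an assumption that this is what the slide listed, and the symbols $\mu$ and $\gamma(h)$ are introduced here rather than taken from the talk.

```latex
\begin{align}
  \mathbb{E}[y_t] &= \mu
    && \text{for all } t \quad \text{(constant mean)} \\
  \operatorname{Cov}(y_t,\, y_{t+h}) &= \gamma(h)
    && \text{for all } t \quad \text{(covariance depends only on the lag } h\text{)}
\end{align}
```

Setting $h = 0$ in the second condition gives a constant variance $\gamma(0)$.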
  16. Autocorrelation Function
  17. ACF
      acf(citiTS)
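As a rough illustration of what `acf()` reports, the sketch below computes the sample autocorrelation by hand and compares it with the built-in function. The helper name `sample_acf` and the simulated random walk are assumptions for illustration; neither comes from the slides, and the Citi Bike data is not used.

```r
# Sample autocorrelation at lag k, computed directly from its definition:
# the lag-k sum of cross-products of deviations over the lag-0 sum of squares.
sample_acf <- function(x, k) {
  n <- length(x)
  x_bar <- mean(x)
  num <- sum((x[(k + 1):n] - x_bar) * (x[1:(n - k)] - x_bar))
  den <- sum((x - x_bar)^2)
  num / den
}

set.seed(42)                 # toy data, not the Citi Bike series
x <- cumsum(rnorm(200))      # a random walk, so strongly autocorrelated

manual  <- sapply(0:5, function(k) sample_acf(x, k))
builtin <- as.numeric(acf(x, lag.max = 5, plot = FALSE)$acf)

# The hand-rolled values should agree with acf() up to rounding error.
all.equal(manual, builtin)
```

The lag-0 value is always 1, which is why ACF plots start with a full-height bar.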
  18. Partial Autocorrelation Function
      [formula missing from the transcript; the surviving text reads "where ... is the last element of ... and ... and ..."]
  19. PACF
      pacf(citiTS)
  20. Dickey Fuller Test
  21. ADF Test
      library(tseries)  # adf.test() and kpss.test()
      adf.test(citiTS)

              Augmented Dickey-Fuller Test

      data:  citiTS
      Dickey-Fuller = -2.9189, Lag order = 11, p-value = 0.1893
      alternative hypothesis: stationary
  22. KPSS Test
      Test that [hypothesis missing from the transcript]
  23. KPSS Test
      kpss.test(citiTS)

              KPSS Test for Level Stationarity

      data:  citiTS
      KPSS Level = 5.9245, Truncation lag parameter = 8, p-value = 0.01
  24. Difference the Data
  25. Diff
      library(dygraphs)  # dygraph(), dyRangeSelector(), dyUnzoom()
      citiDiff <- diff(citiTS)
      dygraph(citiDiff, elementId='CitiDiff') %>%
          dyRangeSelector() %>% dyUnzoom()
  26. ACF and PACF
      # [-1] drops the leading NA that diff() leaves on an xts object
      acf(citiDiff[-1])
      pacf(citiDiff[-1])
  27. ADF on Diff
      adf.test(citiDiff[-1])

              Augmented Dickey-Fuller Test

      data:  citiDiff[-1]
      Dickey-Fuller = -17.493, Lag order = 11, p-value = 0.01
      alternative hypothesis: stationary
  28. KPSS on Diff
      kpss.test(citiDiff[-1])

              KPSS Test for Level Stationarity

      data:  citiDiff[-1]
      KPSS Level = 0.015114, Truncation lag parameter = 8, p-value = 0.1
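To see why differencing helps, here is a minimal base R sketch on simulated data. The random walk and all variable names are assumptions for illustration, not part of the talk. A random walk is nonstationary in level, and its first difference is exactly the white-noise increments it was built from.

```r
set.seed(1)                   # toy data, not the Citi Bike series
steps <- rnorm(500)           # stationary white-noise increments
walk  <- cumsum(steps)        # random walk: nonstationary in level

walk_diff <- diff(walk)       # first difference, y[t] - y[t-1]

# Differencing recovers the increments (all but the first one).
all.equal(walk_diff, steps[-1])

# Lag-1 autocorrelation before and after differencing
# (element 1 of $acf is lag 0, so element 2 is lag 1).
acf(walk, plot = FALSE)$acf[2]
acf(walk_diff, plot = FALSE)$acf[2]
```

On a plain numeric vector `diff()` returns one fewer value, whereas on an xts object it keeps the length and pads the first entry with `NA`, which is why the slides index with `[-1]`.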
  29. Box-Cox Transformation
  30. Box-Cox
      library(forecast)  # BoxCox()
      citiBC <- BoxCox(citiTS, lambda=0)
      dygraph(citiBC, elementId='CitiBC') %>%
          dyRangeSelector() %>% dyUnzoom()
  31. ACF and PACF
      acf(citiBC)
      pacf(citiBC)
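For reference, the standard Box-Cox transformation is written out below. The symbol $w_t$ for the transformed series is introduced here and does not appear on the slides.

```latex
w_t =
\begin{cases}
  \log(y_t), & \lambda = 0, \\[4pt]
  \dfrac{y_t^{\lambda} - 1}{\lambda}, & \lambda \neq 0.
\end{cases}
```

With `lambda=0`, as in the call on the Box-Cox slide, the transformation reduces to the natural log of the trip counts.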
  32. Starter Models
  33. Autoregressive
  34. AR(p)
      ar14 <- ar(citiTS, order.max=14)
      ar14$ar

       [1]  0.48285396  0.03748923  0.01301544  0.01364710  0.05740610
       [6]  0.10362113  0.13929120 -0.01445352  0.02920200 -0.04967610
      [11] -0.05221794  0.05979242  0.05699536  0.09126062
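A quick way to build intuition for `ar()` is to fit it to data whose true coefficient is known. The sketch below simulates an AR(1) process and checks that the estimate lands close to the truth. The coefficient 0.7, the sample size, and the variable names are assumptions chosen for illustration.

```r
set.seed(123)                                  # toy data
phi <- 0.7                                     # assumed true AR(1) coefficient
sim <- arima.sim(model = list(ar = phi), n = 2000)

# By default ar() picks the order by AIC, up to order.max.
# aic = FALSE forces the fitted order to be exactly order.max.
fit <- ar(sim, order.max = 1, aic = FALSE)

fit$ar                                         # estimate should land near 0.7
```

In the AR(p) slide the `aic` argument is left at its default, so the 14 coefficients shown mean that AIC selected the largest order allowed.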
  35. Moving Average
  36. MA(q)
      # forecast::ma() computes a moving-average smoother of order 14,
      # which is not the same thing as fitting an MA(q) model
      ma14 <- ma(citiTS, 14)
      summary(ma14)

         Min. 1st Qu.  Median    Mean 3rd Qu.    Max.    NA's
         5337   21823   30914   30836   39158   60702      14
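The term "moving average" covers two different things: a smoother that averages neighbouring observations, and the MA(q) model, in which each observation depends on past error terms. The base R sketch below shows both on toy data. It uses `stats::filter()` as a stand-in for the smoother and `arima()` for the model; all data and names here are assumptions for illustration.

```r
x <- c(2, 4, 6, 8, 10, 12, 14)     # toy data

# Centered moving-average smoother of order 3: each value is the mean of
# itself and its two neighbours; the ends lack a full window and are NA.
smooth3 <- as.numeric(stats::filter(x, rep(1 / 3, 3), sides = 2))
smooth3

# An MA(1) model, by contrast, is fit with arima(). The order argument is
# c(p, d, q), so c(0, 0, 1) means no AR terms, no differencing, one MA term.
set.seed(7)
sim <- arima.sim(model = list(ma = 0.5), n = 1000)   # assumed true theta = 0.5
ma1_fit <- arima(sim, order = c(0, 0, 1))
coef(ma1_fit)
```

The `NA` values at the ends of the smoother are the same effect as the `NA's` column in the summary on the MA(q) slide.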
  37. Autoregressive Moving Average
  38. ARMA(p, q)
      arma1_1 <- arma(citiTS, order=c(1, 1))
      summary(arma1_1)

      Call:
      arma(x = citiTS, order = c(1, 1))

      Model:
      ARMA(1,1)

      Residuals:
         Min     1Q Median     3Q    Max
      -47976  -4214   1077   5069  20960

      Coefficient(s):
                   Estimate  Std. Error  t value Pr(>|t|)
      ar1          0.990907    0.004651  213.067   <2e-16 ***
      ma1         -0.749162    0.033873  -22.117   <2e-16 ***
      intercept  309.551314  152.815760    2.026   0.0428 *
      ---
      Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

      Fit:
      sigma^2 estimated as 63353316,  Conditional Sum-of-Squares = 93889663352,  AIC = 30876.34
  39. Autoregressive Integrated Moving Average
  40. ARIMA(p, d, q)
      arima1_1_1 <- arima(citiTS, order=c(1, 1, 1))
      summary(arima1_1_1)

      Call:
      arima(x = citiTS, order = c(1, 1, 1))

      Coefficients:
               ar1      ma1
            0.3834  -0.9016
      s.e.  0.0278   0.0108

      sigma^2 estimated as 56481996:  log likelihood = -15340.13,  aic = 30686.27

      Training set error measures:
                         ME     RMSE      MAE       MPE     MAPE      MASE       ACF1
      Training set 169.4511 7512.918 5554.759 -14.66341 31.23256 0.9318706 0.03266496
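To connect the fitted ARIMA(1, 1, 1) coefficients to a model equation, the block below writes the model in the sign convention R's `arima()` uses, where MA terms enter with a plus sign, and then plugs in the estimates reported on the slide. The backshift operator $B$ and the error term $\varepsilon_t$ are notation introduced here, not taken from the talk.

```latex
\begin{align}
  (1 - \phi_1 B)(1 - B)\, y_t &= (1 + \theta_1 B)\, \varepsilon_t \\
  \intertext{With $\hat{\phi}_1 = 0.3834$ and $\hat{\theta}_1 = -0.9016$, expanding gives}
  y_t &= y_{t-1} + 0.3834\,(y_{t-1} - y_{t-2}) + \varepsilon_t - 0.9016\,\varepsilon_{t-1}
\end{align}
```

Read this way, each day's ridership is the previous day's value, plus a fraction of the most recent day-over-day change, plus a correction for the previous day's forecast error.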
  41. Seasonal ARIMA
  42. ARIMA(p, d, q)(P, D, Q)
      arima_season <- arima(citiTS, order=c(1, 1, 1), seasonal=c(12, 2, 12))
      summary(arima_season)

      Call:
      arima(x = citiTS, order = c(1, 1, 1), seasonal = c(12, 2, 12))

      Coefficients:
                ar1      ma1     sar1     sar2     sar3    sar4    sar5    sar6
            -0.4859  -0.9998  -0.2511  -0.0837  -0.0011  0.1572  0.0111  0.3046
      s.e.   0.2157   0.0035   0.2579   0.0321   0.0160  0.0234  0.0259  0.0089
               sar7    sar8    sar9    sar10   sar11    sar12     sma1     sma2
             0.3297  0.5942  0.2559  -0.5208  0.0967  -0.1624  -0.7959  -0.5989
      s.e.   0.1686  0.1876  0.2031   0.1055  0.0883   0.1959   0.0716   0.0857
               sma3     sma4    sma5     sma6     sma7     sma8    sma9   sma10
             0.0765  -0.0006  0.2784  -0.1645  -0.0077  -0.2307  0.3967  0.9158
      s.e.   0.0286   0.0494  0.0534   0.0080   0.1215   0.0658  0.0129  0.0936
               sma11    sma12
             -0.8379  -0.0310
      s.e.    0.1090   0.0778

      sigma^2 estimated as 48239970:  log likelihood = -15222.75,  aic = 30499.5

      Training set error measures:
                         ME     RMSE      MAE       MPE     MAPE      MASE        ACF1
      Training set 244.1609 6938.476 4892.245 -11.09788 28.59577 0.8207268 -0.01560002
  43. Automatic Fitting
      citi_arima <- auto.arima(citiTS,
                               max.p=10, max.d=2, max.q=10,
                               max.P=30, max.D=2, max.Q=30,
                               seasonal=TRUE)
  44. ARIMA Forecast
      arimaForecast <- forecast(citi_arima, h=30)
      autoplot(arimaForecast)
  45. Prophet
  46. Prophet
  47. Trend
  48. Seasonality
  49. Holidays
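The Trend, Seasonality and Holidays slides correspond to the three components of Prophet's additive model. Whatever formula the slides showed did not survive extraction, so the decomposition below is supplied from Prophet's published model description rather than from the talk.

```latex
y(t) = g(t) + s(t) + h(t) + \varepsilon_t
```

Here $g(t)$ is the trend, $s(t)$ the periodic (weekly and yearly) seasonality, $h(t)$ the effect of holidays, and $\varepsilon_t$ the remaining error.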
  50. Data For Prophet
      citiDF <- tibble::tibble(ds=index(citiTS), y=as.numeric(citiTS))
      citiDF

      # A tibble: 1,484 x 2
                 ds     y
             <date> <dbl>
       1 2013-07-01 16650
       2 2013-07-02 22745
       3 2013-07-03 21864
       4 2013-07-04 22326
       5 2013-07-05 21842
       6 2013-07-06 20467
       7 2013-07-07 20477
       8 2013-07-08 21615
       9 2013-07-09 26641
      10 2013-07-10 25732
      # ... with 1,474 more rows
  51. Prophet with Defaults
      library(prophet)  # prophet(), make_future_dataframe()
      prophet1 <- prophet(citiDF)

      Initial log joint probability = -48.3415
      Optimization terminated normally:
        Convergence detected: relative gradient magnitude is below tolerance

      futureDF1 <- make_future_dataframe(prophet1, periods=30)
      preds1 <- predict(prophet1, df=futureDF1)
  52. Prophet Results
      plot.prophet(prophet1, preds1, elementID='preds1') %>%
          dyRangeSelector(dateWindow=as.Date(c('2017-01-01', '2017-10-30')))
  53. Model Components
      prophet_plot_components(prophet1, fcst=preds1)
  54. Specify Seasonality
      prophet2 <- prophet(citiDF,
                          yearly.seasonality=TRUE,
                          weekly.seasonality=TRUE)

      Initial log joint probability = -48.3415
      Optimization terminated normally:
        Convergence detected: relative gradient magnitude is below tolerance

      futureDF2 <- make_future_dataframe(prophet2, periods=30)
      preds2 <- predict(prophet2, df=futureDF2)
  55. Seasonal Components
      prophet_plot_components(prophet2, fcst=preds2)
  56. Holiday Data
      holidays <- readr::read_csv(file.path(dataDir, 'holidays_2013_2017.csv'))
      tail(holidays)

      # A tibble: 6 x 2
                ds          holiday
            <date>            <chr>
      1 2017-07-04 Independence Day
      2 2017-09-04        Labor Day
      3 2017-10-09     Columbus Day
      4 2017-11-10     Veterans Day
      5 2017-11-23 Thanksgiving Day
      6 2017-12-25    Christmas Day
  57. Prophet with Holidays
      prophet3 <- prophet(citiDF,
                          yearly.seasonality=TRUE,
                          weekly.seasonality=TRUE,
                          holidays=holidays)

      Initial log joint probability = -48.3415
      Optimization terminated normally:
        Convergence detected: relative gradient magnitude is below tolerance

      futureDF3 <- make_future_dataframe(prophet3, periods=30)
      preds3 <- predict(prophet3, df=futureDF3)
  58. Seasonal Components
      prophet_plot_components(prophet3, fcst=preds3)
  59. Neural Network
  60. Simple Neural Network
      net1 <- nnetar(citiTS)
      netPreds1 <- forecast(net1, h=30)
  61. Neural Net Forecast
      autoplot(netPreds1, include=180)
  62. Boosted Trees
  63. XGBoost
      [formula missing from the transcript; only the word "where" survives]
  64. XGBoost
      library(forecastxgb)  # xgbar()
      citiXG1 <- xgbar(as.ts(citiTS), maxlag=14,
                       trend_method="differencing",
                       seas_method="fourier")
      citiXGPreds1 <- forecast(citiXG1, h=30)
  65. XGBoost Predictions
      autoplot(citiXGPreds1, include=180)
  66. Accuracy
  67. Accuracy
      # Mean absolute error between forecast and actual values
      forecastAccuracy <- function(y_hat, y)
      {
          mean(abs(as.vector(y_hat) - as.vector(y)))
      }

      # citiTest_ts (the held-out actuals) is not defined on the slides
      results <- tibble::tibble(
          Method=c('ARIMA', 'Prophet', 'Neural Net', 'XGBoost'),
          MAE=c(
              forecastAccuracy(arimaForecast$mean, citiTest_ts),
              forecastAccuracy(preds3 %>% filter(ds >= '2017-07-31') %>% pull(yhat), citiTest_ts),
              forecastAccuracy(netPreds1$mean, citiTest_ts),
              forecastAccuracy(citiXGPreds1$mean, citiTest_ts)
          )
      )
  68. Accuracy
      library(ggplot2)
      ggplot(results, aes(x=Method, y=MAE)) +
          geom_bar(stat='identity', width=.01)
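The `forecastAccuracy()` helper from the Accuracy slide is a mean absolute error (MAE). The small worked example below applies it to made-up numbers so the arithmetic can be checked by hand. The vectors `actual`, `forecastA` and `forecastB` are hypothetical and have nothing to do with the Citi Bike results.

```r
# Same definition as on the Accuracy slide.
forecastAccuracy <- function(y_hat, y) {
  mean(abs(as.vector(y_hat) - as.vector(y)))
}

# Hypothetical numbers for illustration only.
actual    <- c(100, 120, 90, 110)
forecastA <- c(105, 115, 95, 100)   # absolute errors: 5, 5, 5, 10
forecastB <- c(100, 140, 90, 110)   # absolute errors: 0, 20, 0, 0

maeA <- forecastAccuracy(forecastA, actual)   # (5 + 5 + 5 + 10) / 4 = 6.25
maeB <- forecastAccuracy(forecastB, actual)   # (0 + 20 + 0 + 0) / 4 = 5
c(A = maeA, B = maeB)
```

Forecast B scores better on MAE despite its single large miss, because MAE weights all errors linearly. A squared-error metric such as RMSE would rank the two the other way round.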
  69. What Have We Seen?
      · Different time series types in R
      · Interactive Plots
      · Stationarity
      · ACF/PACF
      · Dickey Fuller and KPSS Tests
      · Differencing
      · Box-Cox
      · ARIMA
      · Prophet
      · Neural Nets
      · XGBoost
  70. Going Forward
      · Use the forecast package
      · Fit CNNs and RNNs
      · timetk
      · tibbletime
      · Multivariate Models
      · GARCH Models
  71. Thank You
  72. Jared P. Lander
      · Chief Data Scientist of Lander Analytics
      · Author of [title missing from the transcript]
      · Adjunct Professor at Columbia University
      · Organizer of New York Open Statistical Programming (The R) Meetup
      · Organizer of New York R Conference
  73. Packages
