A talk given at the 26th European Conference on Operational Research, Rome, Italy, 1-4 July 2013.


- 1. Cross-validation aggregation for forecasting. Devon K. Barrow, Sven F. Crone (www.lancs.ac.uk)
- 2. Outline
  1. Motivation
  2. Cross-validation and model selection
  3. Cross-validation aggregation
  4. Empirical evaluation
  5. Conclusions and future work
- 3. Motivation
  - Scenario: the statistician constructs a model and wishes to estimate the error rate of this model when it is used to predict future values.
  - Comparison of the two approaches:

    | | Bootstrapping (Efron, 1979) | Cross-validation (Stone, 1974) |
    |---|---|---|
    | Goal | Estimating generalisation error | Estimating generalisation error |
    | Procedure | Random sampling with replacement from a single learning set (bootstrap samples). The validation set is the same as the original learning set. | Splits the data into mutually exclusive subsets, using part of the data to train each model and the remaining part as a validation sample (Arlot & Celisse, 2010). |
    | Properties | Low variance but downward biased (Efron and Tibshirani, 1997). | Generalisation error estimate is nearly unbiased but can be highly variable (Efron and Tibshirani, 1997). |
  - Forecast aggregation: Bagging (Breiman 1996) aggregates the outputs of models trained on bootstrap samples.
  - Bagging for time series forecasting:
    - Forecasting with many predictors (Watson 2005)
    - Macro-economic time series, e.g. consumer price inflation (Inoue & Kilian 2008)
    - Volatility prediction (Hillebrand & Medeiros 2010)
    - Small datasets with few observations (Langella 2010)
    - In combination with other approaches, e.g. feature selection with PCA (Lin and Zhu 2007)
  - Figure: citation results for publications on bagging for time series: (a) published items in each year, (b) citations in each year. Annotation: 1996, Breiman introduces bootstrapping and aggregation (bagging).
  - Research gap: in contrast to bootstrapping, cross-validation has not been used for forecast aggregation.
  - Research contribution: we propose to combine the benefits of cross-validation and forecast aggregation (Crogging).
- 4. Motivation: the Bagging algorithm
  - Inputs: learning set $S = \{(\mathbf{x}_1, y_1), (\mathbf{x}_2, y_2), \dots, (\mathbf{x}_N, y_N)\}$
  - Select the number of bootstraps $K$
  - For $k = 1$ to $K$:
    - Generate a bootstrap sample $S^k$ from $S$ (using your favourite bootstrap method)
    - Using training set $S^k$, estimate a model $\hat{m}_k$ such that $\hat{m}_k(\mathbf{x}_i) \approx y_i$
  - Combine the models to obtain: $\hat{M}(\mathbf{x}) = \frac{1}{K} \sum_{k=1}^{K} \hat{m}_k(\mathbf{x})$
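
The Bagging steps on slide 4 can be sketched in a few lines of Python. This is only an illustrative sketch: the base model here is a one-variable least-squares line (an assumption made to keep the example dependency-free; the talk uses neural networks), and the names `fit_line` and `bagging_predict` are hypothetical.

```python
import random


def fit_line(xs, ys):
    """Least-squares fit of y = a + b*x; falls back to the mean if x is constant."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0.0:
        return mean_y, 0.0
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def bagging_predict(xs, ys, x_new, n_bootstraps=25, seed=0):
    """Average the predictions of K models, each fitted to one bootstrap sample."""
    rng = random.Random(seed)
    n = len(xs)
    predictions = []
    for _ in range(n_bootstraps):
        # Bootstrap sample S^k: draw N indices with replacement from the learning set.
        idx = [rng.randrange(n) for _ in range(n)]
        intercept, slope = fit_line([xs[i] for i in idx], [ys[i] for i in idx])
        predictions.append(intercept + slope * x_new)
    # Combine: M(x) = (1/K) * sum_k m_k(x)
    return sum(predictions) / n_bootstraps


# Toy learning set lying exactly on y = 2x + 1 (illustrative data, not from the talk).
xs = [float(i) for i in range(10)]
ys = [2.0 * x + 1.0 for x in xs]
bagged = bagging_predict(xs, ys, x_new=12.0)
```

On noise-free linear data every bootstrap model recovers the same line, so the aggregate equals the single-model prediction; the averaging only matters once the individual models differ.
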
- 5. Outline, section 2: Cross-validation and model selection
- 6. Cross-validation: background
  - Cross-validation is a widely used strategy for:
    - Estimating the predictive accuracy of a model
    - Performing model selection, e.g.:
      - Choosing among variables in a regression or the degrees of freedom of a nonparametric model (selection for identification)
      - Parameter estimation and tuning (selection for estimation)
  - Main features:
    - Main idea: test the model on data not used in estimation
    - Split the data once or several times
    - Part of the data is used for training each model (the training sample), and the remaining part is used for estimating the prediction error of the model (the validation sample)
- 7. Cross-validation: how does it work?
  - K-fold cross-validation
  - Figure: the data split into K samples (Sample 1, Sample 2, …, Sample K-1, Sample K), each of one or more observations and marked as estimation or validation; the split is repeated K times.
- 8. Cross-validation strategies
  - k-fold cross-validation
    - Divides the data into k non-overlapping, mutually exclusive sub-samples of approximately equal size
    - If k=2: 2-fold cross-validation
    - If k=10: 10-fold cross-validation
    - If k=N: leave-one-out cross-validation (LOOCV)
  - Monte Carlo cross-validation
    - Randomly splits the data into two sub-samples (training and validation) multiple times, each time drawing at random without replacement
  - Hold-out method
    - A single split into two data sub-samples
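
As a rough illustration of the three strategies on slide 8, the sketch below generates (training, validation) index pairs. The function names are hypothetical, and two details are assumptions not stated in the slides: the k-fold validation blocks are contiguous, and the hold-out validation sample is taken from the end of the data.

```python
import random


def k_fold_splits(n, k):
    """Return k (train, validation) index pairs with mutually exclusive folds."""
    base, extra = divmod(n, k)
    splits, start = [], 0
    for fold in range(k):
        # The first `extra` folds get one more observation (approximately equal size).
        size = base + (1 if fold < extra else 0)
        stop = start + size
        validation = list(range(start, stop))
        train = list(range(0, start)) + list(range(stop, n))
        splits.append((train, validation))
        start = stop
    return splits


def monte_carlo_splits(n, n_validation, repeats, seed=0):
    """Repeated random splits; each validation sample is drawn without replacement."""
    rng = random.Random(seed)
    splits = []
    for _ in range(repeats):
        validation = sorted(rng.sample(range(n), n_validation))
        held_out = set(validation)
        train = [i for i in range(n) if i not in held_out]
        splits.append((train, validation))
    return splits


def hold_out_split(n, n_validation):
    """Single split: the last n_validation observations form the validation sample."""
    cut = n - n_validation
    return list(range(cut)), list(range(cut, n))


ten_fold = k_fold_splits(20, 10)   # 10-fold cross-validation
loocv = k_fold_splits(5, 5)        # k = N gives leave-one-out
```

Unlike the k-fold folds, the Monte Carlo validation samples may overlap across repetitions, since each repetition draws independently.
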
- 9. Cross-validation: model selection
  - Goal: select the model with the smallest generalisation error.
  - Compute an approximation of the generalisation error, defined as:

    $$E_{gen}(m) = \lim_{N \to \infty} \frac{1}{N} \sum_{i=1}^{N} \left(y_i - \hat{m}(\mathbf{x}_i)\right)^2$$
  - Estimate model $m$ on the training set; the error on the validation set for sample $k$ is:

    $$E_k(m) = \frac{1}{N/K} \sum_{i=1}^{N/K} \left(y_i^{val} - \hat{m}(\mathbf{x}_i^{val})\right)^2$$
  - Estimate the generalisation error after $K$ repetitions as the average error across all repetitions:

    $$\hat{E}_{gen}(m) = \frac{1}{K} \sum_{k=1}^{K} E_k(m)$$
  - What about the K models estimated on the different data sets?
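
The model-selection procedure on slide 9 could look roughly like the following sketch, which computes the per-fold validation error, averages it over the K folds, and picks the candidate with the smallest estimate. The two candidate models (a constant mean and a least-squares line), the interleaved fold assignment, and the toy data are all assumptions made for illustration.

```python
def fit_mean(xs, ys):
    """Candidate model 1: always predict the training mean."""
    mean_y = sum(ys) / len(ys)
    return lambda x: mean_y


def fit_line(xs, ys):
    """Candidate model 2: one-variable least-squares line."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx if sxx else 0.0
    return lambda x: mean_y + slope * (x - mean_x)


def cv_error(fit, xs, ys, k):
    """Average over K folds of the mean squared error on each validation sample."""
    n = len(xs)
    fold_errors = []
    for fold in range(k):
        # Interleaved folds (an assumption): observation i belongs to fold i mod k.
        val_idx = list(range(fold, n, k))
        held_out = set(val_idx)
        train_idx = [i for i in range(n) if i not in held_out]
        model = fit([xs[i] for i in train_idx], [ys[i] for i in train_idx])
        squared = [(ys[i] - model(xs[i])) ** 2 for i in val_idx]
        fold_errors.append(sum(squared) / len(squared))   # E_k(m)
    return sum(fold_errors) / k                            # estimated E_gen(m)


# Toy data: a linear trend with a small deterministic wiggle.
xs = [float(i) for i in range(12)]
ys = [3.0 + 0.5 * x + 0.1 * (-1) ** i for i, x in enumerate(xs)]

candidates = {"mean": fit_mean, "line": fit_line}
errors = {name: cv_error(fit, xs, ys, k=4) for name, fit in candidates.items()}
best = min(errors, key=errors.get)
```

Note that the K fitted models inside `cv_error` are discarded once their errors are recorded, which is exactly the point the slide's closing question raises.
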
- 10. Outline, section 3: Cross-validation aggregation
- 11. Cross-validation aggregation: Crogging
  - In model selection, the model obtained is the one built on all the data (no data reserved for validation)
    - However, predictive accuracy is judged on models built on different parts of the data
    - These supplementary models are thrown away after they have served their purpose
  - The proposed approach:
    - We save the predictions made by the K estimated models
    - This gives us a prediction for every observation in the training sample, derived from a model that was built when that observation was in the validation sample
    - We then average across the predictions from the K models to produce a final prediction: $\hat{M}(\mathbf{x}_t) = \frac{1}{K} \sum_{k=1}^{K} \hat{m}_k(\mathbf{x}_t)$
    - In the case of neural networks, we also use the validation samples for early stopping during training
    - We average across multiple initialisations together with cross-validation aggregation (to reduce variance)
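
A minimal sketch of the aggregation step on slide 11 follows: instead of discarding the K cross-validation models, it keeps them and averages their predictions. It is not the authors' implementation. The base model is a least-squares line rather than a neural network, the folds are interleaved, and early stopping and multiple initialisations are omitted; `crogging_fit` and `crogging_predict` are hypothetical names.

```python
def fit_line(xs, ys):
    """One-variable least-squares line, returned as a prediction function."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx if sxx else 0.0
    return lambda x: mean_y + slope * (x - mean_x)


def crogging_fit(xs, ys, k):
    """Fit one model per cross-validation fold and keep all K of them."""
    n = len(xs)
    models = []
    for fold in range(k):
        # Fold assignment i mod k is an assumption of this sketch.
        held_out = set(range(fold, n, k))
        train_idx = [i for i in range(n) if i not in held_out]
        models.append(fit_line([xs[i] for i in train_idx],
                               [ys[i] for i in train_idx]))
    return models


def crogging_predict(models, x_new):
    """Final prediction: M(x) = (1/K) * sum_k m_k(x)."""
    return sum(model(x_new) for model in models) / len(models)


# Toy data lying exactly on y = 2x + 1 (illustrative, not from the talk).
xs = [float(i) for i in range(10)]
ys = [2.0 * x + 1.0 for x in xs]
models = crogging_fit(xs, ys, k=5)
forecast = crogging_predict(models, x_new=12.0)
```

Structurally this differs from the Bagging sketch only in how the K training sets are formed: mutually exclusive validation folds are removed instead of sampling with replacement.
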
- 12. Outline, section 4: Empirical evaluation
- 13. Evaluation: design and implementation
  - Time series data
    - NN3 dataset: 111 time series from the NN3 competition (Crone, Hibon, and Nikolopoulos 2011)
  - Summary description of the NN3 competition time series dataset:

    | | Complete dataset: Short | Complete dataset: Long | Reduced dataset: Normal | Reduced dataset: Difficult | SUM |
    |---|---|---|---|---|---|
    | Non-seasonal | 25 (NS) | 25 (NL) | 4 (NN) | 3 (ND) | 57 |
    | Seasonal | 25 (SS) | 25 (SL) | 4 (SN) | - | 54 |
    | SUM | 50 | 50 | 8 | 3 | 111 |
  - Figure: plot of 10 time series from the NN3 dataset (panels NN3_101 to NN3_108 are legible).
  - The following experimental setup is used:
    - Forecast horizon: 12 months
    - Holdout period: 18 months
    - Error measures: SMAPE and MASE
    - Rolling origin evaluation (Tashman, 2000)
  - Neural network specification:
    - A univariate Multilayer Perceptron (MLP) with lags $Y_t$ up to $Y_{t-13}$
    - Each MLP network contains a single hidden layer, two hidden nodes, and a single output node with a linear identity function. The hyperbolic tangent transfer function is used.
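
The slides name SMAPE and MASE as error measures without giving formulas. The sketch below uses commonly cited definitions, which are assumptions here: SMAPE as the mean of |actual - forecast| divided by the average of |actual| and |forecast|, in percent, and MASE as the forecast's mean absolute error scaled by the in-sample mean absolute error of the one-step naive forecast. The exact variants used in the talk may differ.

```python
def smape(actual, forecast):
    """Symmetric MAPE in percent: mean of |a - f| / ((|a| + |f|) / 2).

    Undefined when an actual and its forecast are both zero.
    """
    terms = [abs(a - f) / ((abs(a) + abs(f)) / 2.0)
             for a, f in zip(actual, forecast)]
    return 100.0 * sum(terms) / len(terms)


def mase(actual, forecast, in_sample):
    """Forecast MAE scaled by the in-sample MAE of the one-step naive forecast."""
    naive_errors = [abs(in_sample[t] - in_sample[t - 1])
                    for t in range(1, len(in_sample))]
    scale = sum(naive_errors) / len(naive_errors)
    mae = sum(abs(a - f) for a, f in zip(actual, forecast)) / len(actual)
    return mae / scale


# Toy numbers (not from the NN3 data).
history = [1.0, 2.0, 4.0, 7.0]
actual = [8.0, 10.0]
forecast = [7.0, 12.0]
example_smape = smape(actual, forecast)
example_mase = mase(actual, forecast, history)
```

A MASE below 1 means the forecast's average absolute error is smaller than that of the naive one-step forecast on the training data.
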
- 14. Evaluation: findings
  - MASE and SMAPE averaged over all time series on the training, validation and test datasets.

    MASE:

    | Method | Train | Validation | Test |
    |---|---|---|---|
    | BESTMLP | 1.25 | 0.96 | 1.49 |
    | HOLDOUT | 0.64 | 0.75 | 1.20 |
    | BAG | 0.76 | 0.70 | 1.21 |
    | MONTECV | 0.76 | 0.41 | 1.16 |
    | 10FOLDCV | 0.69 | 0.45 | 1.07 |
    | 2FOLDCV | 0.73 | 0.60 | 1.15 |

    SMAPE:

    | Method | Train | Validation | Test |
    |---|---|---|---|
    | BESTMLP | 12.36 | 11.10 | 17.89 |
    | HOLDOUT | 11.78 | 12.57 | 16.08 |
    | BAG | 12.95 | 13.17 | 16.32 |
    | MONTECV | 13.81 | 8.29 | 15.35 |
    | 10FOLDCV | 12.65 | 8.94 | 15.52 |
    | 2FOLDCV | 13.68 | 11.19 | 15.29 |
  - Figure: boxplots of the MASE and SMAPE averaged over all time series for the different methods. The line of reference represents the median value of the distributions.
  - Across all time series:
    - On the validation set, Monte Carlo cross-validation is always best
    - All Crogging variants outperform the benchmark Bagging algorithm and the hold-out method (NN model averaging)
- 15. Evaluation: findings
  - SMAPE on the test set averaged over long time series, for short, medium and long forecast horizons:

    | Length | Method | Horizon 1-3 | Horizon 4-12 | Horizon 13-18 | Horizon 1-18 |
    |---|---|---|---|---|---|
    | Long | BESTMLP | 10.79 | 16.59 | 20.02 | 16.77 |
    | Long | HOLDOUT | 9.34 | 14.96 | 16.20 | 14.43 |
    | Long | BAG | 9.74 | 15.46 | 16.38 | 14.81 |
    | Long | MONTECV | 10.86 | 15.16 | 15.43 | 14.54 |
    | Long | 10FOLDCV | 10.39 | 14.04 | 14.82 | 13.69 |
    | Long | 2FOLDCV | 9.03 | 14.64 | 15.69 | 14.06 |
  - SMAPE on the test set averaged over short time series, for short, medium and long forecast horizons:

    | Length | Method | Horizon 1-3 | Horizon 4-12 | Horizon 13-18 | Horizon 1-18 |
    |---|---|---|---|---|---|
    | Short | BESTMLP | 16.83 | 17.03 | 20.66 | 18.20 |
    | Short | HOLDOUT | 17.59 | 17.04 | 20.12 | 18.16 |
    | Short | BAG | 17.20 | 17.27 | 20.96 | 18.49 |
    | Short | MONTECV | 15.47 | 14.71 | 19.05 | 16.28 |
    | Short | 10FOLDCV | 16.00 | 15.91 | 20.25 | 17.37 |
    | Short | 2FOLDCV | 15.86 | 14.51 | 18.95 | 16.21 |
  - Figure: boxplots of the SMAPE averaged across long (left) and short (right) time series.
  - Data conditions:
    - Long time series: 10-fold cross-validation has the smallest error for medium to long horizons, and over forecast lead times 1-18
    - Short time series: 2-fold cross-validation and Monte Carlo cross-validation outperform 10-fold cross-validation for all forecast horizons
- 16. Evaluation: findings
  - NN3 competition:

    | ID | Method | SMAPE (average error) | MASE (average error) | SMAPE rank, all methods | MASE rank, all methods | SMAPE rank, NN/CI | MASE rank, NN/CI |
    |---|---|---|---|---|---|---|---|
    | B09 | Wildi | 14.84 | 1.13 | 1 | 2 | − | − |
    | B07 | Theta | 14.89 | 1.13 | 2 | 2 | − | − |
    | C27 | Illies | 15.18 | 1.25 | 3 | 9 | 1 | 7 |
    | ** | 2FOLDCV | 15.29 | 1.15 | 4 | 3 | 2 | 2 |
    | ** | MONTECV | 15.35 | 1.16 | 5 | 4 | 3 | 3 |
    | B03 | ForecastPro | 15.44 | 1.17 | 6 | 5 | − | − |
    | … | … | … | … | … | … | … | … |
    | ** | BAG | 16.32 | 1.21 | 13 | 8 | 7 | 5 |
    | … | … | … | … | … | … | … | … |
    | B00 | AutomatANN | 16.81 | 1.21 | 14 | 8 | 8 | 5 |
    | ** | MLP | 17.89 | 1.50 | 15 | 10 | 9 | 6 |
- 17. Outline, section 5: Conclusions and future work
- 18. Conclusions and future work
  - Conclusions:
    - Cross-validation aggregation outperforms model selection, Bagging, and the current approaches to model averaging that use a single hold-out (validation) sample
    - It is especially effective when the amount of data available for training the model is limited, as shown for short time series
    - Improvements in forecast accuracy increase with the forecast horizon
    - It offers promising results on the NN3 competition
  - Future work:
    - Perform a bias-variance decomposition and analysis
    - Consider base model types other than neural networks
    - Evaluate forecast accuracy on a larger set of time series: the M3 competition data (3003 time series, an established benchmark)
  - Not a forecasting method! A general method for improving the accuracy of a forecast model.
- 19. Devon K. Barrow Lancaster University Management School Centre for Forecasting Lancaster, LA1 4YX, UK Tel.: +44 (0) 7960271368 Email: d.barrow@lancaster.ac.uk
