# Time Series Forecasting using Neural Nets (GRNNs)

Paper review: Toward Automatic Time-Series Forecasting Using Neural Networks (GRNNs)


1. Toward Automatic Time-Series Forecasting Using Neural Networks
    - Author: Weizhong Yan. Presenter: Sean Golliher.
2. Relationship to Research
    - Currently analyzing the performance of NEAT for Time Series Forecasting (TSF).
    - The paper summarizes common approaches to, and issues with, using ANNs for TSF.
3. Claims of the Paper
    - Develops an automatic TSF model using a Generalized Regression Neural Network (GRNN).
    - Shows promising results by winning the NN3 time-series competition against 60 different models.
4. General Problems with ANNs
    - Most approaches are ad hoc, relying on some type of preprocessing of the data.
    - Practitioners typically try different ANN architectures to see which performs better.
    - Nelson et al.: ANN inconsistency on TSF is the result of different preprocessing strategies.
    - Balkin et al.: ANNs require a large number of samples to be trained, while real-world series (financial, etc.) offer only short training samples.
5. RBF Networks
    - An RBF network can be viewed as a local linear regression model.
    - A Gaussian kernel is applied to the input data; all inputs feed nodes of the form
      $$G(x) = \exp\left(-\frac{\|x - c\|^2}{\sigma^2}\right) \qquad (1)$$
    - Center points are found by assigning a center $c$ to each point in the data set and measuring distances to it. This is equivalent to doing a local regression, with $\sigma$ controlling the smoothing of the approximation.
    - The output layer (the weights) is trained using least-squares regression.
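A minimal sketch of the Gaussian RBF node in Eq. (1), assuming the squared Euclidean distance reading of the exponent:

```python
import numpy as np

def rbf_node(x, c, sigma):
    """Gaussian RBF: G(x) = exp(-||x - c||^2 / sigma^2), per Eq. (1)."""
    x, c = np.asarray(x, float), np.asarray(c, float)
    return float(np.exp(-np.sum((x - c) ** 2) / sigma ** 2))
```

The response is 1 at the center and decays with distance, at a rate set by $\sigma$.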
6. Generalized Definition for Regression
    - Compute the most probable value of $Y$ for each value of $X$, based on a finite number of possibly noisy measurements of $X$.
    - The conditional mean of $y$ given $X$ (the regression of $y$ on $X$) is
      $$E[y \mid X] = \frac{\int_{-\infty}^{\infty} y\, f(X, y)\, dy}{\int_{-\infty}^{\infty} f(X, y)\, dy} \qquad (2)$$
    - Since the density function $f(X, y)$ is typically unknown, it is estimated using a Parzen window density estimator.
7. Generalized Definition for Regression (cont'd)
    - The generalized definition yields the regression function
      $$\hat{Y}(X) = \frac{\sum_{i=1}^{n} Y_i \exp\left(-\frac{D_i^2}{2\sigma^2}\right)}{\sum_{i=1}^{n} \exp\left(-\frac{D_i^2}{2\sigma^2}\right)} \qquad (3)$$
      where $D_i^2 = (X - X_i)^T (X - X_i)$.
    - In the GRNN, $X$ is the input data and the $X_i$ are the centers.
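Eq. (3) is a kernel-weighted average of the training targets, which can be sketched directly (function and variable names are my own):

```python
import numpy as np

def grnn_predict(x, centers, targets, sigma):
    """GRNN output per Eq. (3): Gaussian-weighted mean of targets."""
    x = np.asarray(x, float)
    centers = np.asarray(centers, float)
    targets = np.asarray(targets, float)
    d2 = np.sum((centers - x) ** 2, axis=1)   # D_i^2 = (X - X_i)^T (X - X_i)
    w = np.exp(-d2 / (2.0 * sigma ** 2))      # kernel weights
    return float(np.sum(w * targets) / np.sum(w))
```

Note that the prediction is always a convex combination of the stored targets, which is why training reduces to choosing centers and $\sigma$.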
8. GRNN
    - $G(x, x_i)$ are the standard radial basis functions.
    - $w_i$ comes from the generalized regression equation.
    - The spread factor dictates the performance.
9. Claimed Benefits of GRNN
    - Easy to train.
    - Can accurately approximate functions from sparse and noisy data.
    - Note: a recent paper (Ahmed et al.) claims the GRNN is inferior to the MLP for TSF.
10. Methodology Requirements
    - Minimal human intervention.
    - Computationally efficient over a large number of series.
    - Good forecasting across a range of data sets.
11. Preprocessing: Outliers
    - Real-world time series have outliers.
    - An observation $x_i$ is identified as an outlier when
      $$|x_i| \geq 4 \max(|m_a|, |m_b|) \qquad (4)$$
      where $m_a = \mathrm{median}(x_{i-3}, x_{i-2}, x_{i-1})$ and $m_b = \mathrm{median}(x_{i+1}, x_{i+2}, x_{i+3})$.
    - If $x_i$ is an outlier, its value is replaced with the average of the two points before and the two points after it.
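The outlier rule in Eq. (4) can be sketched as follows; I read "average of two points before and after" as the mean of the four surrounding observations, which is an assumption about the paper's exact rule:

```python
import numpy as np

def replace_outliers(series):
    """Replace outliers per Eq. (4): |x_i| >= 4*max(|m_a|, |m_b|),
    where m_a/m_b are medians of the three preceding/following points."""
    x = np.asarray(series, float).copy()
    for i in range(3, len(x) - 3):
        ma = np.median(x[i - 3:i])       # median of three points before x_i
        mb = np.median(x[i + 1:i + 4])   # median of three points after x_i
        if abs(x[i]) >= 4.0 * max(abs(ma), abs(mb)):
            # assumed reading: mean of the two neighbours on each side
            x[i] = np.mean([x[i - 2], x[i - 1], x[i + 1], x[i + 2]])
    return x
```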
12. Preprocessing: Trends
    - Real-world time series have trends, which may be due to seasonality or other factors.
    - Common approaches are curve fitting, filtering, and differencing; identifying trends is difficult to do algorithmically.
    - Proposed detrending scheme: split the series into segments (12 for monthly data, 4 for quarterly), then subtract the mean of the historical observations within each segment from every historical observation in that segment.
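Under one reading of the scheme, a "segment" groups observations by their position within the season (all Januaries, all Februaries, ...); that interpretation, which is an assumption on my part, can be sketched as:

```python
import numpy as np

def detrend(series, period=12):
    """Subtract each segment's historical mean (12 segments for monthly
    data, 4 for quarterly), assuming segments are positions in the season."""
    x = np.asarray(series, float)
    out = x.copy()
    for s in range(period):
        out[s::period] -= x[s::period].mean()   # center each segment at 0
    return out
```

After this step each segment has mean zero, so segment-level offsets no longer dominate the series.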
13. Preprocessing: Seasonality
    - Identifying seasonality is typically a manual process.
    - The author uses a simple approach, defining a short series as $n \leq 60$ and a long one as $n > 60$.
    - Autocorrelation coefficients at one and two seasonal lags are used to decide whether a series is seasonal.
    - A standard method is used to subtract seasonality from the series.
14. ANN Modeling
    - Spread factor: typically found empirically, since no good analytic approach has been found. Some guidance was given by Haykin: $\sigma = \frac{d_{\max}}{\sqrt{2n}}$, where $d_{\max}$ is the maximum distance between the training points.
    - The paper proposes setting the spread factor to $d_{50}$, $d_{75}$, and $d_{95}$ (percentiles) of the nearest distance of all training samples to the rest of the points.
    - Three GRNNs take the same input and are combined to give the final output; the choice of combining three GRNNs is based on previous success in the literature.
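A straightforward reading of the percentile rule: take each training sample's nearest-neighbour distance, then the 50th, 75th, and 95th percentiles of those distances as the three candidate spreads (the function name is mine):

```python
import numpy as np

def spread_candidates(X):
    """d50/d75/d95 percentiles of each sample's nearest-neighbour distance."""
    X = np.asarray(X, float)
    # pairwise Euclidean distances via broadcasting
    D = np.sqrt(((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1))
    np.fill_diagonal(D, np.inf)        # exclude self-distance
    nearest = D.min(axis=1)            # nearest-neighbour distance per sample
    return np.percentile(nearest, [50, 75, 95])
```

Each percentile then parameterizes one of the three GRNNs in the ensemble.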
15. ANN Modeling (cont'd)
    - Input selection is considered one of the most important aspects of TSF.
    - Two general approaches: filter and wrapper. Filtering selects features based on the data itself (independent of the learning algorithm); wrapping uses the learning algorithm itself and typically performs better.
    - The author uses contiguous lags, limited to one full season for 12-month data.
16. Experimental Results
    - Uses the NN3 time-series competition dataset, which is composed of Dataset A and Dataset B.
    - Dataset A is 111 monthly time series drawn from empirical business time series; Dataset B is a small subset of Dataset A consisting of 11 series.
    - Error is measured using sMAPE.
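For reference, sMAPE as commonly defined for the NN3 competition is the mean of $|F - A| / \frac{|A| + |F|}{2}$ in percent; the exact NN3 convention for zero denominators may differ, so treat this as a sketch:

```python
import numpy as np

def smape(actual, forecast):
    """Symmetric MAPE in percent: mean of |F - A| / ((|A| + |F|) / 2)."""
    a = np.asarray(actual, float)
    f = np.asarray(forecast, float)
    return float(100.0 * np.mean(2.0 * np.abs(f - a) / (np.abs(a) + np.abs(f))))
```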
17. Experimental Results (cont'd)
    - In the results table, B indicates a statistical model and C indicates a computational intelligence model.
18. Ablation Studies
    - SP: spread; MSA: multiple step ahead.
19. Discussion
    - Are TSF competitions just a demonstration of the no-free-lunch theorem? Why is the theorem not mentioned?
    - Did the author prove his approach is "better", or did it merely outperform on one particular contest?
    - Why doesn't the training of the GRNN factor out outliers and seasonality on its own? Isn't that what training is for?
    - Why choose a GRNN, when previous papers said they perform poorly?
    - What kind of bias does the detrending scheme introduce?
    - The paper is "rule of thumb" oriented. Is there a way to make an automatic approach more rigorous?