Multi-Step-Ahead Simultaneously Forecasting For
Multiple Time-Series, Using Truncated Singular Value
Decomposition (SVD)
Florian Cartuta¹
¹ Bucharest, Romania
E-mail: floriancartuta2@yahoo.com
Abstract
Purpose: Time series forecasting remains a challenging task across many application fields despite extensive work done in this domain [6-7]. The purpose of this paper is to propose a scalable and efficient method which simplifies multi-step-ahead simultaneous forecasting of a large number of time-series. The method proposed here seeks to improve the efficiency and accuracy of multi-step-ahead forecasting over medium/long-term forecast horizons, performed simultaneously in one go for a large number of time-series. The method is also exemplified on a store-item forecasting application in the retail domain.
The proposed method uses Truncated Singular Value Decomposition at its core to extract the dominant correlations of multiple time-series stored in a matrix. It is shown that a very small number of extracted components (sometimes even as low as one or two right singular vectors) may be sufficient to simultaneously forecast hundreds or more time series through their dominant correlations. After the main components are extracted, the forecast is made only on the truncated right singular vectors matrix, which encodes the time-bound evolution of the underlying structure of the data, using a standard stochastic time-series forecasting method such as Holt-Winters Triple Exponential Smoothing. In a subsequent step, the original matrix is recomposed. The recomposed matrix contains both the reconstructed history approximation and the predicted values for each original time-series. Thus, by modeling only a few dominant correlations of the entire set, forecasts can be generated simultaneously for a very large number of time series.
Benefits: The method is scalable, accurate, more processing-time efficient than individual time-series forecasting, and can be used to forecast a very large number of time-series simultaneously.
Keywords: Singular Value Decomposition, Multiple Time-Series, Simultaneous Forecasting, Multi-Step-Ahead Forecasting
1. Introduction
The method proposed in this paper seeks to improve the efficiency and accuracy of multi-step-ahead forecasting over medium/long-term forecast horizons, performed simultaneously in one go for a large number of time-series. The example given is a store-item retail forecasting application, but the method is fairly broad and can potentially be applied to numerous other real-data applications where multi-step-ahead simultaneous forecasting of a large number of time-series is needed.
Benefits and challenges: The method introduced here can be used to forecast a very large number of time series simultaneously. SVD has linear scalability with the number of rows and cubic scalability with the number of attributes when a full decomposition is computed; a low-rank decomposition is typically linear in both the number of rows and the number of columns, so SVD has a reasonable computing cost [8]. There are a number of benefits and challenges to forecasting multiple time series in one go, especially for large sets of time series in the order of thousands or tens of thousands, for example when the task is store-item demand forecasting in the retail industry. Among the benefits is simplicity: avoiding the need to prepare, train and maintain a separate model for each time-series. A challenge of simultaneous forecasting methods and models is the level of accuracy: it is difficult to achieve the same or superior accuracy when many time series with possibly different behaviors are predicted simultaneously with the same model, compared with the situation when each time-series is predicted using its own model. This is usually the case when the task is demand prediction at store-item level of granularity. At such a low level of granularity, the product demand, which is estimated using daily sales, is prone to be influenced by perturbing factors such as lack of store-item stock. Sales-perturbing factors usually negatively influence demand forecast accuracy in the retail domain.
Singular Value Decomposition (SVD) [1] is one of the most important matrix factorization techniques. SVD is used to obtain a low-rank approximation of matrices. It is often the case that complex systems generate data that is naturally arranged in large matrices. For example, multiple time-series of store-item sales may be arranged in a matrix with each row containing the daily sales of one store-item and each column containing the sales of all items at a given date. Remarkably, such data are typically low rank, meaning that there are a few dominant patterns that explain the high-dimensional data. The SVD is a numerically robust and efficient method of extracting these patterns from data. [1]
Definition of the SVD [1]
We are interested in analyzing a large data set X ∈ R^(n×m). (eq. 1)
For example, in this paper X will consist of time-series data. The columns are often called snapshots.
The SVD is a unique matrix decomposition that exists for every complex-valued matrix X ∈ C^(n×m):
X = U Σ V^T (eq. 2)
where U ∈ C^(n×n) and V ∈ C^(m×m) are unitary matrices with orthonormal columns, and Σ ∈ R^(n×m) is a matrix with real, non-negative entries on the diagonal and zeros off the diagonal. As is the case in demand forecasting, we will only use real-valued matrices, for which U and V are orthogonal.
Matrix Approximation
SVD provides a low-rank approximation to the matrix X. According to the Eckart-Young theorem [9], the optimal rank-r approximation to X, in a least-squares sense, is given by the rank-r SVD truncation:
X̃ = Ũ Σ̃ Ṽ^T (eq. 3)
Here, Ũ and Ṽ denote the first r leading columns of U and V, and Σ̃ contains the leading r × r sub-block of Σ.
2. Methodology
In this section I describe the method for multi-step-ahead simultaneous forecasting of a large number of time-series arranged in a matrix: first the low-rank approximation matrix is extracted, then the multi-step-ahead forecast is generated for just a limited number (even as low as one or two) of main components of the right singular vectors matrix instead of all time-series. Finally, the original matrix is reconstructed; the reconstructed matrix also contains the forecast values for the entire time-series set.
2.1 Data transformation:
First we need to arrange and transform the data into a format suitable for SVD.
2.1.1 Data arrangement in a matrix
The dataset X, containing time-series values retrieved at the same points in time t0, t1, …, tm for each series (t0 being the oldest value), is transformed into a matrix with the time-series arranged in rows and the columns representing the values at t0, …, tm respectively. This format is chosen in order to comply with the format required by the SVD decomposition: X ∈ R^(n×m).
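As an illustration of this arrangement (the names and toy values below are my own, not from the paper), a few daily series observed at the same m points in time can be stacked as rows of X:

```python
import numpy as np

# Toy illustration of 2.1.1: n = 3 time-series observed at the same
# m = 10 points in time, stacked as rows so that X has shape (n, m).
rng = np.random.default_rng(0)
m = 10
series_a = rng.poisson(20, size=m)   # e.g. daily sales of one store-item
series_b = rng.poisson(35, size=m)
series_c = rng.poisson(50, size=m)

# Rows = time-series, columns = the values at t0, ..., t9.
X = np.vstack([series_a, series_b, series_c])
print(X.shape)  # (3, 10)
```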
2.1.2 Data normalization: In this step, the dataset is scaled to prepare it for the SVD transformation, first by applying a power transformation (e.g. the natural log) to stabilize the variance and to obtain a more Gaussian distribution. SVD makes the assumption that the underlying data is Gaussian distributed and can be well described in terms of means and covariances. [9] After the power transformation step, the data is scaled to standard normal by rows (which represent the time-series). Outlier treatment may be beneficial since SVD can be sensitive to outliers [9]. This is an important data pre-processing step.
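A minimal sketch of this pre-processing step (the function and variable names are my own; the paper does not prescribe an API). log1p is used here instead of a bare log as an assumption, to avoid log(0) on zero-sales days:

```python
import numpy as np

def normalize_rows(X):
    """Power transform (natural log) then per-row standardisation,
    as described in 2.1.2. Returns mu and sigma so the scaling can
    be inverted after the forecast is recomposed."""
    X_log = np.log1p(X)                        # variance-stabilising transform
    mu = X_log.mean(axis=1, keepdims=True)     # per-series mean
    sigma = X_log.std(axis=1, keepdims=True)   # per-series standard deviation
    return (X_log - mu) / sigma, mu, sigma

X = np.array([[10., 12., 9., 14.],
              [100., 95., 110., 120.]])
X_scaled, mu, sigma = normalize_rows(X)
# Each row of X_scaled now has zero mean and unit variance.
```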
2.2 Singular Value Decomposition (SVD)
Next, the scaled X data is decomposed using SVD and the number of modes to be retained is computed. Three matrices are obtained through the decomposition: U (the left singular vectors matrix), Σ (the singular values, sorted in order of importance) and V (the right singular vectors matrix). There are several software libraries which will perform this step. [2]
After singular value decomposition, the original matrix X is expressed through its singular values and singular vectors: X = U Σ V^T.
U and V are unitary matrices and essentially induce a rotation of the input data; Σ, the singular values matrix, is a diagonal matrix inducing scaling.
2.2.1 Optimal low-rank X input matrix approximation
A very important point in truncated SVD decomposition is selecting the rank of the truncation in order to obtain the optimal low-rank approximation of X. Instead of taking all the singular values and their corresponding left and right singular vectors, we take only the k largest singular values and their corresponding singular vectors.
As we will see later in this paper, while choosing a higher k gives a closer approximation to X, choosing a smaller k saves effort overall because fewer components need to be forecast. Neglecting all but the first k components is justified since the first k components supposedly capture the underlying structure, or signal, of the data. [3] An example is shown in figure 1.
Figure 1. Truncated SVD with k-reduced singular decomposition of X
If the original input matrix X had dimension n × m, the k-truncated SVD matrices will have the following dimensions: U (n × k), Σ (k × k), V^T (k × m).
If, for example, we start with X containing 500 time-series with 365 values each and k = 2 components are used for the truncated decomposition, the resulting dimensions of the low-rank matrices are: U (500 × 2), Σ (2 × 2), V^T (2 × 365).
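A truncation producing exactly these shapes can be sketched as:

```python
import numpy as np

# Rank-k truncation of the SVD factors for 500 series x 365 days, k = 2.
X = np.random.default_rng(2).normal(size=(500, 365))
U, s, Vt = np.linalg.svd(X, full_matrices=False)

k = 2
U_k = U[:, :k]            # (500, 2)  left singular vectors
S_k = np.diag(s[:k])      # (2, 2)    leading singular values
Vt_k = Vt[:k, :]          # (2, 365)  right singular vectors

# Optimal rank-2 approximation in the least-squares sense (Eckart-Young).
X_approx = U_k @ S_k @ Vt_k
```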
There are several methods to compute the optimal k value. In this paper I used an empirical 'elbow'-like method: plotting the semi-log of the singular values and choosing the cut-off at the inflection point, correlated with inspection of the auto-correlation function (ACF) of the main components of V^T.
In figure 1, which represents the semi-log plot of the singular values obtained through the decomposition (the diagonal of Σ), we can see that for the example presented in this paper the slope tends to stabilize after k = 1, and we can use this value for the matrix approximation.
Fig. 1 Optimal k selection by the semi-log plot of singular values
Depending on the time-series characteristics, a larger k might be needed. In other datasets I have investigated, the optimal rank k was in the range 7-9. Because this does not change the approach, in this paper I refer to the dataset used for exemplification, which is a Kaggle challenge dataset for store-item forecasting [5].
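The paper picks the cut-off by visual inspection of the semi-log plot plus ACF checks; an automated variant of the same 'elbow' idea might look like the sketch below (the log-drop threshold and helper name are my assumptions, not part of the paper's method):

```python
import numpy as np

def elbow_rank(singular_values, drop_threshold=1.0):
    """Pick k at the last large drop in the log of the singular values.
    A drop between positions i-1 and i larger than the threshold means
    the modes up to i-1 still dominate, so k is set to i."""
    log_s = np.log(singular_values)
    drops = -np.diff(log_s)            # successive log-drops (>= 0 since sorted)
    large = np.flatnonzero(drops > drop_threshold)
    return int(large[-1]) + 1 if large.size else 1

# One dominant singular value followed by a flat tail -> k = 1,
# matching the behaviour described for the store-item dataset.
s = np.array([100.0, 5.0, 4.5, 4.2, 4.0])
print(elbow_rank(s))  # 1
```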
3. Forecasting the main components of the right singular vectors matrix V^T
The V matrix encodes the time-series dynamics. The k vectors of the truncated right singular vectors matrix V^T represent the time-bound evolution of the underlying structure of the data, and only they will be forecast. Therefore, instead of forecasting all n time-series, we forecast only k time-series, with k << n. As shown below, k can be as small as one or two, so the forecast for n time-series can be computed by forecasting only a few (k << n) main V^T components.
To recompose the original input matrix and generate the forecast for all time-series, we use the truncated matrices U_truncated and Σ_truncated obtained in 2.2.1, together with a new matrix V^T_forecast obtained from the horizontal concatenation of the truncated V^T (shape k × m, from the truncated singular value decomposition in 2.2.1) and the k forecasts, each of shape 1 × forecast_horizon.
The V^T_forecast matrix therefore has shape k × m', where m' = m + forecast_horizon.
Finally, X_forecast is computed as the dot product of U_truncated, Σ_truncated and V^T_forecast, with X_forecast of shape n × m', as per formula (3).
3.1 Decision about the forecasting model
Because the final forecast is influenced only by a limited number (k) of forecasts made on the main components of the V matrix, it is very important that these k forecasts are as accurate as possible. We may need to take into account both long-term and short-term cycles and use an appropriate time-series forecasting model. One challenge is that the model should be able to accommodate multiple cycles (e.g. weekly and yearly).
For this study, the model used was the Triple Exponential Smoothing (Holt-Winters) [4] stochastic model because it can easily accommodate a long-term cycle. As seen in figure 3, the data presents yearly and weekly seasonality. Other models could also be tested for this purpose, such as the Auto-Regressive Integrated Moving Average (ARIMA), but this remains for a future test.
The solution used in this study was Holt-Winters Triple Exponential Smoothing with yearly seasonality for the prediction of the first component (choosing k = 1).
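In practice a library implementation would typically be used for this step; purely for illustration, a minimal self-contained additive Holt-Winters with hand-fixed smoothing parameters (an assumption for the sketch; in practice the parameters are fitted) can be written as:

```python
import numpy as np

def holt_winters_additive(y, season_len, horizon,
                          alpha=0.3, beta=0.05, gamma=0.1):
    """Minimal additive triple exponential smoothing [4].
    Initialises level/trend/season from the first two seasons, then
    runs the standard additive update equations."""
    y = np.asarray(y, dtype=float)
    level = y[:season_len].mean()
    trend = (y[season_len:2 * season_len].mean() - y[:season_len].mean()) / season_len
    season = y[:season_len] - level
    for t in range(season_len, len(y)):
        s = season[t % season_len]
        new_level = alpha * (y[t] - s) + (1 - alpha) * (level + trend)
        trend = beta * (new_level - level) + (1 - beta) * trend
        season[t % season_len] = gamma * (y[t] - new_level) + (1 - gamma) * s
        level = new_level
    return np.array([level + (h + 1) * trend + season[(len(y) + h) % season_len]
                     for h in range(horizon)])

# A perfectly weekly-seasonal series is continued exactly.
y = np.tile(np.arange(7.0), 20)          # 140 days of a repeating weekly pattern
fc = holt_winters_additive(y, season_len=7, horizon=7)
print(fc)  # [0. 1. 2. 3. 4. 5. 6.]
```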
4. Example: Application to a store-item simultaneous forecasting task
To exemplify the forecasting method, I used a dataset with 500 time-series from a Kaggle challenge [5]. The dataset contains 5 years of daily store-item sales data for 50 different items at 10 different stores (500 store-item time-series in total). The prediction was made for 3 months of sales: 92 values representing daily sales for each time-series.
Figure 2 shows one example store-item time-series; it can be observed that it exhibits yearly seasonality. Through analysis of the auto-correlation (ACF) and partial auto-correlation (PACF) graphs we will see that the main components of the low-rank approximation matrix also exhibit both yearly and weekly seasonality.
Figure 2. Daily sales of one store-item
4.1 Data transformation: First the data was split into train and validation dataframes with shapes 500 x 1734 and 500 x 92 respectively; I reserved the last three months of data (92 days) for results validation.
As a data pre-processing step, the train data was log transformed and then standardized as described in section 2.1, Data Transformation. According to figure 1, the rank k can be set to 1, and consequently there is only one component (mode 0 of the right singular vectors matrix) to be forecast. In the next step, singular value decomposition (SVD) was applied to the scaled train data and the low-rank matrices U (500 x 1), Σ (1 x 1) and V^T (1 x 1734) were computed.
4.2 Data Analysis and Modeling
Figure 3 shows the graph of V^T mode 0.
Figure 3. V^T mode 0
As in the original time-series, the yearly and weekly cycles are also captured in the time-series corresponding to mode 0. It also exhibits a trend.
The auto-correlation (ACF) graph for 50 lags of mode 0 (fig. 4) displays the weekly seasonality through the lag-7 spikes of the differenced mode 0 time-series.
Figure 4. Auto-correlation (ACF) of the differenced mode 0 time-series (50 lags)
The auto-correlation (ACF) graph of the second component, mode 1 (fig. 5), displays no significant correlation at any lag, which is one more reason to limit our rank k to the first mode (mode 0, k = 1).
Figure 5. Auto-correlation (ACF) of the mode 1 time-series (50 lags)
Based on the ACF graph, I used a Holt-Winters model with yearly seasonality for forecasting mode 0 (figure 6).
Figure 6. Forecast of the first right singular vector (mode 0)
4.3 Generating the forecast for all 500 time-series:
The forecast is produced using the approximation formula:
scaled_forecast_sales = U_truncated * Σ_truncated * np.hstack((V_truncated, V_forecast))
Note that the right singular vectors matrix used is the horizontal concatenation of V_truncated and V_forecast, with shape 1 x 1826, so scaled_forecast_sales has shape 500 x 1826. The resulting dataset contains both the reconstructed history (1734 values per store-item time-series) and the forecast (92 values per store-item time-series).
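Since the recomposed matrix is still in the scaled domain, the row scaling and the power transform from section 2.1.2 must be inverted to return to sales units. A sketch with assumed names (the paper does not show this step explicitly, and log1p/expm1 is my choice of power-transform pair):

```python
import numpy as np

def inverse_transform(X_scaled, mu, sigma):
    """Undo the per-row standardisation, then the log1p power transform."""
    return np.expm1(X_scaled * sigma + mu)

# Round trip on toy data: forward transform, then inverse.
X = np.array([[10., 12., 9., 14.]])
X_log = np.log1p(X)
mu = X_log.mean(axis=1, keepdims=True)
sigma = X_log.std(axis=1, keepdims=True)
X_scaled = (X_log - mu) / sigma
X_back = inverse_transform(X_scaled, mu, sigma)
```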
Figure 7 below displays the final forecast result for one time-series; the forecast horizon is 92 days.
Figure 7. Example of a store-item time-series forecast (forecast horizon of 92 days)
4.4 Method evaluation: Comparative results against a baseline forecasting method: Triple Exponential Smoothing (Holt-Winters) [4]
To evaluate forecasting accuracy I used two metrics which are widely used for assessing prediction performance: Mean Absolute Percentage Error (MAPE) and Root Mean Squared Error (RMSE).
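The two metrics, written out explicitly (NumPy versions with my own helper names):

```python
import numpy as np

def mape(y_true, y_pred):
    """Mean Absolute Percentage Error, in percent (y_true must be non-zero)."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return 100.0 * np.mean(np.abs((y_true - y_pred) / y_true))

def rmse(y_true, y_pred):
    """Root Mean Squared Error."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))

print(mape([100, 200], [90, 220]))  # 10.0
```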
Table 1 presents, for both the method described in this paper and the well-known Holt-Winters Triple Exponential Smoothing method, the average of these two metrics over all 500 time-series.
It can be seen that the proposed method gave better results than the Holt-Winters method: on average, MAPE was improved by 22.7% and RMSE by ~19%.
Table 1: Forecasting Accuracy: results comparison
5. Concluding remarks
In this work I proposed a novel method for multi-step-ahead simultaneous forecasting of multiple time-series, using a matrix factorization, Truncated Singular Value Decomposition, at its core. As the central algorithm, the method uses the Truncated Singular Value Decomposition of a dataset (named X, of shape n x m) containing many time-series to be predicted. After low-rank approximation of the dataset, the multi-step-ahead prediction is made only on the main components of the truncated right singular vectors matrix V_truncated. Therefore, instead of forecasting n time-series, it is enough to forecast k, with k << n. In this example it was enough to set the rank k to 1, meaning only one mode was used.
By horizontal concatenation of V_truncated with its forecast, a new matrix V_truncated* results, which is used to compose the forecast of the original X using formula (3). The X_forecast matrix contains both the approximated values of the original X and the multi-step-ahead forecast for all time-series.
I have shown that this method has several important advantages:
a. It is very scalable.
b. It can simultaneously predict a large number n of time-series through the prediction of only k main components (modes), with k << n (k can be as low as 1).
c. The processing time necessary for multi-step-ahead simultaneous forecasting of multiple time-series is a fraction of the processing time needed when individual forecasts are performed for all n time-series.
d. The forecasting accuracy improves on average over a well-known stochastic time-series forecasting method, namely Holt-Winters Triple Exponential Smoothing.
The proposed method is fairly broad and can potentially be
applied to numerous other real data applications where multi-
step-ahead simultaneous forecasting of a large number of
time-series is needed.
Table 1. Forecasting Accuracy – Results Comparison

Method                                      | Average MAPE [%] | Average RMSE | Std. dev. MAPE | Std. dev. RMSE
Prediction on main modes using SVD          | 18.15            | 11.03        | 3.47           | 4.04
Triple Exponential Smoothing (Holt Winters) | 23.48            | 13.61        | 5.54           | 5.12
References
[1] Brunton S., Kutz J.N., February 2019. Singular Value Decomposition (SVD). researchgate.net/publication/331230334_Singular_Value_Decomposition_SVD
[2] For the SVD decomposition this work used numpy.linalg.svd: https://numpy.org/doc/stable/reference/generated/numpy.linalg.svd.html
[3] Frank M., Buhmann J.M., June 2011. Selecting the Rank of Truncated SVD by Maximum Approximation Capacity.
[4] https://en.wikipedia.org/wiki/Exponential_smoothing#Triple_exponential_smoothing_(Holt_Winters)
[5] Dataset: https://www.kaggle.com/c/demand-forecasting-kernels-only
[6] Nielsen A., 2019. Practical Time Series Analysis: Prediction with Statistics & Machine Learning.
[7] Mills R., 2019. Applied Time Series Analysis: A Practical Guide to Modeling and Forecasting. Academic Press.
[8] Oracle Database Online Documentation 12c, Release 1 (12.1), Data Warehousing and Business Intelligence.
[9] Low-rank approximation
 
Formulas dax para power bI de microsoft.pdf
Formulas dax para power bI de microsoft.pdfFormulas dax para power bI de microsoft.pdf
Formulas dax para power bI de microsoft.pdfRobertoOcampo24
 
原件一样伦敦国王学院毕业证成绩单留信学历认证
原件一样伦敦国王学院毕业证成绩单留信学历认证原件一样伦敦国王学院毕业证成绩单留信学历认证
原件一样伦敦国王学院毕业证成绩单留信学历认证pwgnohujw
 
NOAM AAUG Adobe Summit 2024: Summit Slam Dunks
NOAM AAUG Adobe Summit 2024: Summit Slam DunksNOAM AAUG Adobe Summit 2024: Summit Slam Dunks
NOAM AAUG Adobe Summit 2024: Summit Slam Dunksgmuir1066
 
Predictive Precipitation: Advanced Rain Forecasting Techniques
Predictive Precipitation: Advanced Rain Forecasting TechniquesPredictive Precipitation: Advanced Rain Forecasting Techniques
Predictive Precipitation: Advanced Rain Forecasting TechniquesBoston Institute of Analytics
 
obat aborsi Bontang wa 081336238223 jual obat aborsi cytotec asli di Bontang6...
obat aborsi Bontang wa 081336238223 jual obat aborsi cytotec asli di Bontang6...obat aborsi Bontang wa 081336238223 jual obat aborsi cytotec asli di Bontang6...
obat aborsi Bontang wa 081336238223 jual obat aborsi cytotec asli di Bontang6...yulianti213969
 
Data Analytics for Digital Marketing Lecture for Advanced Digital & Social Me...
Data Analytics for Digital Marketing Lecture for Advanced Digital & Social Me...Data Analytics for Digital Marketing Lecture for Advanced Digital & Social Me...
Data Analytics for Digital Marketing Lecture for Advanced Digital & Social Me...Valters Lauzums
 
Seven tools of quality control.slideshare
Seven tools of quality control.slideshareSeven tools of quality control.slideshare
Seven tools of quality control.slideshareraiaryan448
 
如何办理(UPenn毕业证书)宾夕法尼亚大学毕业证成绩单本科硕士学位证留信学历认证
如何办理(UPenn毕业证书)宾夕法尼亚大学毕业证成绩单本科硕士学位证留信学历认证如何办理(UPenn毕业证书)宾夕法尼亚大学毕业证成绩单本科硕士学位证留信学历认证
如何办理(UPenn毕业证书)宾夕法尼亚大学毕业证成绩单本科硕士学位证留信学历认证acoha1
 
Aggregations - The Elasticsearch "GROUP BY"
Aggregations - The Elasticsearch "GROUP BY"Aggregations - The Elasticsearch "GROUP BY"
Aggregations - The Elasticsearch "GROUP BY"John Sobanski
 
1:1原版定制伦敦政治经济学院毕业证(LSE毕业证)成绩单学位证书留信学历认证
1:1原版定制伦敦政治经济学院毕业证(LSE毕业证)成绩单学位证书留信学历认证1:1原版定制伦敦政治经济学院毕业证(LSE毕业证)成绩单学位证书留信学历认证
1:1原版定制伦敦政治经济学院毕业证(LSE毕业证)成绩单学位证书留信学历认证dq9vz1isj
 
Northern New England Tableau User Group (TUG) May 2024
Northern New England Tableau User Group (TUG) May 2024Northern New England Tableau User Group (TUG) May 2024
Northern New England Tableau User Group (TUG) May 2024patrickdtherriault
 
MATERI MANAJEMEN OF PENYAKIT TETANUS.ppt
MATERI  MANAJEMEN OF PENYAKIT TETANUS.pptMATERI  MANAJEMEN OF PENYAKIT TETANUS.ppt
MATERI MANAJEMEN OF PENYAKIT TETANUS.pptRachmaGhifari
 
obat aborsi Tarakan wa 081336238223 jual obat aborsi cytotec asli di Tarakan9...
obat aborsi Tarakan wa 081336238223 jual obat aborsi cytotec asli di Tarakan9...obat aborsi Tarakan wa 081336238223 jual obat aborsi cytotec asli di Tarakan9...
obat aborsi Tarakan wa 081336238223 jual obat aborsi cytotec asli di Tarakan9...yulianti213969
 
一比一原版(ucla文凭证书)加州大学洛杉矶分校毕业证学历认证官方成绩单
一比一原版(ucla文凭证书)加州大学洛杉矶分校毕业证学历认证官方成绩单一比一原版(ucla文凭证书)加州大学洛杉矶分校毕业证学历认证官方成绩单
一比一原版(ucla文凭证书)加州大学洛杉矶分校毕业证学历认证官方成绩单aqpto5bt
 

Recently uploaded (20)

如何办理(WashU毕业证书)圣路易斯华盛顿大学毕业证成绩单本科硕士学位证留信学历认证
如何办理(WashU毕业证书)圣路易斯华盛顿大学毕业证成绩单本科硕士学位证留信学历认证如何办理(WashU毕业证书)圣路易斯华盛顿大学毕业证成绩单本科硕士学位证留信学历认证
如何办理(WashU毕业证书)圣路易斯华盛顿大学毕业证成绩单本科硕士学位证留信学历认证
 
What is Insertion Sort. Its basic information
What is Insertion Sort. Its basic informationWhat is Insertion Sort. Its basic information
What is Insertion Sort. Its basic information
 
如何办理(UCLA毕业证书)加州大学洛杉矶分校毕业证成绩单学位证留信学历认证原件一样
如何办理(UCLA毕业证书)加州大学洛杉矶分校毕业证成绩单学位证留信学历认证原件一样如何办理(UCLA毕业证书)加州大学洛杉矶分校毕业证成绩单学位证留信学历认证原件一样
如何办理(UCLA毕业证书)加州大学洛杉矶分校毕业证成绩单学位证留信学历认证原件一样
 
Genuine love spell caster )! ,+27834335081) Ex lover back permanently in At...
Genuine love spell caster )! ,+27834335081)   Ex lover back permanently in At...Genuine love spell caster )! ,+27834335081)   Ex lover back permanently in At...
Genuine love spell caster )! ,+27834335081) Ex lover back permanently in At...
 
The Significance of Transliteration Enhancing
The Significance of Transliteration EnhancingThe Significance of Transliteration Enhancing
The Significance of Transliteration Enhancing
 
Credit Card Fraud Detection: Safeguarding Transactions in the Digital Age
Credit Card Fraud Detection: Safeguarding Transactions in the Digital AgeCredit Card Fraud Detection: Safeguarding Transactions in the Digital Age
Credit Card Fraud Detection: Safeguarding Transactions in the Digital Age
 
Formulas dax para power bI de microsoft.pdf
Formulas dax para power bI de microsoft.pdfFormulas dax para power bI de microsoft.pdf
Formulas dax para power bI de microsoft.pdf
 
原件一样伦敦国王学院毕业证成绩单留信学历认证
原件一样伦敦国王学院毕业证成绩单留信学历认证原件一样伦敦国王学院毕业证成绩单留信学历认证
原件一样伦敦国王学院毕业证成绩单留信学历认证
 
NOAM AAUG Adobe Summit 2024: Summit Slam Dunks
NOAM AAUG Adobe Summit 2024: Summit Slam DunksNOAM AAUG Adobe Summit 2024: Summit Slam Dunks
NOAM AAUG Adobe Summit 2024: Summit Slam Dunks
 
Predictive Precipitation: Advanced Rain Forecasting Techniques
Predictive Precipitation: Advanced Rain Forecasting TechniquesPredictive Precipitation: Advanced Rain Forecasting Techniques
Predictive Precipitation: Advanced Rain Forecasting Techniques
 
obat aborsi Bontang wa 081336238223 jual obat aborsi cytotec asli di Bontang6...
obat aborsi Bontang wa 081336238223 jual obat aborsi cytotec asli di Bontang6...obat aborsi Bontang wa 081336238223 jual obat aborsi cytotec asli di Bontang6...
obat aborsi Bontang wa 081336238223 jual obat aborsi cytotec asli di Bontang6...
 
Data Analytics for Digital Marketing Lecture for Advanced Digital & Social Me...
Data Analytics for Digital Marketing Lecture for Advanced Digital & Social Me...Data Analytics for Digital Marketing Lecture for Advanced Digital & Social Me...
Data Analytics for Digital Marketing Lecture for Advanced Digital & Social Me...
 
Seven tools of quality control.slideshare
Seven tools of quality control.slideshareSeven tools of quality control.slideshare
Seven tools of quality control.slideshare
 
如何办理(UPenn毕业证书)宾夕法尼亚大学毕业证成绩单本科硕士学位证留信学历认证
如何办理(UPenn毕业证书)宾夕法尼亚大学毕业证成绩单本科硕士学位证留信学历认证如何办理(UPenn毕业证书)宾夕法尼亚大学毕业证成绩单本科硕士学位证留信学历认证
如何办理(UPenn毕业证书)宾夕法尼亚大学毕业证成绩单本科硕士学位证留信学历认证
 
Aggregations - The Elasticsearch "GROUP BY"
Aggregations - The Elasticsearch "GROUP BY"Aggregations - The Elasticsearch "GROUP BY"
Aggregations - The Elasticsearch "GROUP BY"
 
1:1原版定制伦敦政治经济学院毕业证(LSE毕业证)成绩单学位证书留信学历认证
1:1原版定制伦敦政治经济学院毕业证(LSE毕业证)成绩单学位证书留信学历认证1:1原版定制伦敦政治经济学院毕业证(LSE毕业证)成绩单学位证书留信学历认证
1:1原版定制伦敦政治经济学院毕业证(LSE毕业证)成绩单学位证书留信学历认证
 
Northern New England Tableau User Group (TUG) May 2024
Northern New England Tableau User Group (TUG) May 2024Northern New England Tableau User Group (TUG) May 2024
Northern New England Tableau User Group (TUG) May 2024
 
MATERI MANAJEMEN OF PENYAKIT TETANUS.ppt
MATERI  MANAJEMEN OF PENYAKIT TETANUS.pptMATERI  MANAJEMEN OF PENYAKIT TETANUS.ppt
MATERI MANAJEMEN OF PENYAKIT TETANUS.ppt
 
obat aborsi Tarakan wa 081336238223 jual obat aborsi cytotec asli di Tarakan9...
obat aborsi Tarakan wa 081336238223 jual obat aborsi cytotec asli di Tarakan9...obat aborsi Tarakan wa 081336238223 jual obat aborsi cytotec asli di Tarakan9...
obat aborsi Tarakan wa 081336238223 jual obat aborsi cytotec asli di Tarakan9...
 
一比一原版(ucla文凭证书)加州大学洛杉矶分校毕业证学历认证官方成绩单
一比一原版(ucla文凭证书)加州大学洛杉矶分校毕业证学历认证官方成绩单一比一原版(ucla文凭证书)加州大学洛杉矶分校毕业证学历认证官方成绩单
一比一原版(ucla文凭证书)加州大学洛杉矶分校毕业证学历认证官方成绩单
 

Multi-Step-Ahead Simultaneously Forecasting For Multiple Time-Series, Using Truncated Singular Value Decomposition (SVD)

Florian Cartuta
Bucharest, Romania
E-mail: floriancartuta2@yahoo.com

Abstract

Purpose: Time series forecasting remains a challenging task across many application fields despite extensive work done in this domain [6-7]. This paper proposes a scalable and efficient method that simplifies multi-step-ahead simultaneous forecasting of a large number of time-series, seeking to improve both the efficiency and the accuracy of multi-step-ahead forecasts over medium/long-term horizons produced simultaneously, in one go. The method is also exemplified on a store-item forecasting application in the retail domain.

The proposed method uses Truncated Singular Value Decomposition at its core to extract the dominant correlations of multiple time-series stored in a matrix. It is shown that a very small number of extracted components (sometimes even as low as one or two right singular vectors) can be sufficient to simultaneously forecast hundreds or more time-series through their dominant correlations. After the main components are extracted, the forecast is made only on the truncated right singular vectors matrix, which encodes the time-bound evolution of the underlying structure of the data, using a standard stochastic time-series forecasting method such as Holt-Winters Triple Exponential Smoothing. In a subsequent step, the original matrix is recomposed; the recomposed matrix contains both the reconstructed history approximation and the predicted values for each original time-series. Thus, by modeling only a few dominant correlations of the entire set, forecasts can be generated simultaneously for a very large number of time-series.

Benefits: The method is scalable, accurate, more processing-time efficient than individual time-series forecasting, and can be used to forecast a very large number of time-series simultaneously.

Keywords: Singular Value Decomposition, Multiple Time-Series, Simultaneous Forecasting, Multi-Step-Ahead Forecasting
1. Introduction

The method proposed in this paper seeks to improve the efficiency and accuracy of multi-step-ahead forecasting over medium/long-term forecast horizons performed simultaneously, in one go, for a large number of time-series. The example given is a store-item retail forecasting application, but the method is fairly broad and can potentially be applied to numerous other real-data applications where multi-step-ahead simultaneous forecasting of a large number of time-series is needed.

Benefits and challenges: The method introduced here can be used to forecast a very large number of time-series simultaneously. SVD has linear scalability with the number of rows and cubic scalability with the number of attributes when a full decomposition is computed; a low-rank decomposition is typically linear in both the number of rows and the number of columns, so SVD has a reasonable computing cost [8]. There are a number of benefits and challenges to forecasting multiple time-series in one go, especially for large numbers of time-series on the order of thousands or tens of thousands, as in store-item demand forecasting in the retail industry. Among the benefits is simplicity: there is no need to prepare, train, and maintain a separate model for each time-series. A challenge of simultaneous forecasting methods and models is the level of accuracy: it is difficult to achieve the same or superior accuracy when many time-series with possibly different behaviors are predicted simultaneously with the same model, compared with predicting each time-series using its own model. This is usually the situation when the task is demand prediction at store-item level of granularity. At such a low level of granularity, the product demand, which is estimated from daily sales, is prone to be influenced by perturbing factors such as lack of store-item stock.
Sales perturbing factors usually negatively influence demand forecast accuracy in the retail domain.

Singular Value Decomposition (SVD) [1] is one of the most important matrix factorization techniques. SVD is used to obtain a low-rank approximation of matrices. It is often the case that complex systems generate data that is naturally arranged in large matrices. For example, multiple time-series of store-item sales may be arranged in a matrix with each row containing the daily sales of one store-item and each column containing the sales of all items at a given date. Remarkably, such data are typically low rank, meaning that a few dominant patterns explain the high-dimensional data. The SVD is a numerically robust and efficient method of extracting these patterns from data. [1]

Definition of the SVD [1]

We are interested in analyzing a large data set

X ∈ R^(n×m) (1)

For example, in this paper X will consist of time-series data. The columns are often called snapshots. The SVD is a unique matrix decomposition that exists for every complex-valued matrix X ∈ C^(n×m):

X = U Σ V^T (2)

where U ∈ C^(n×n) and V ∈ C^(m×m) are unitary matrices with orthonormal columns, and Σ ∈ R^(n×m) is a matrix with real, non-negative entries on the diagonal and zeros off the diagonal. As is the case in demand forecasting, we will only use real numbers.

Matrix Approximation

SVD provides a low-rank approximation to the matrix X. According to the Eckart-Young theorem [9], the optimal rank-r approximation to X, in a least-squares sense, is given by the rank-r SVD truncation:

X̃ = Ũ Σ̃ Ṽ^T (3)

Here, Ũ and Ṽ denote the first r leading columns of U and V, and Σ̃ contains the leading r x r sub-block of Σ.

2. Methodology

In this section I describe the method for multi-step-ahead simultaneous forecasting of a large number of time-series arranged in a matrix: first the low-rank approximation matrix is extracted, then the multi-step-ahead forecast is generated for just a limited number (even as low as one or two) of main components of the right singular vectors instead of all time-series. Finally, the original matrix is reconstructed; the reconstructed matrix also contains the forecast values for the entire time-series set.

2.1 Data transformation

First we need to arrange and transform the data in a format suitable for SVD.

2.1.1 Data arrangement in a matrix

The dataset X, containing time-series values retrieved at the same points in time t0, t1, ..., tm for each series (t0 being the oldest value), is transformed into a matrix with time-series arranged in rows and with columns representing the values at t0, ..., tm respectively. This format is chosen in order to comply with the format required by the SVD decomposition: X ∈ R^(n×m).

2.1.2 Data normalization

In this step the dataset is scaled to prepare it for the SVD transformation, first by applying a power transformation (i.e., natural log) to stabilize the variance and obtain a more Gaussian distribution. SVD makes the assumption that the underlying data is Gaussian distributed and can be well described in terms of means and covariances [9]. After the power transformation step, the data is scaled to standard normal by rows (which represent the time-series). Outlier treatment may be beneficial, since SVD can be sensitive to outliers [9]. This is an important data pre-processing step.

2.2 Singular Value Decomposition (SVD)

Next, the scaled X data is decomposed using SVD and the number of modes to be retained is computed. Three matrices are obtained through decomposition: U (left singular vectors matrix), Σ (singular values sorted in order of importance), and V (right singular vectors matrix). There are different software libraries which will perform this step [2]. After singular value decomposition, the original matrix X is expressed through its singular values and singular vectors: X = U Σ V^T. U and V are unitary matrices and essentially induce a rotation of the input data; Σ, the singular values matrix, is a diagonal matrix inducing scaling.

2.2.1 Optimal low-rank approximation of the input matrix X

A very important point in truncated SVD decomposition is selecting the rank of the truncation in order to obtain the optimal low-rank approximation of X. Instead of taking all the singular values and their corresponding left and right singular vectors, we take only the k largest singular values and their corresponding singular vectors.
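The data-preparation and truncation steps described above can be sketched with NumPy. This is a minimal illustration on a toy random matrix standing in for the real sales data; using log1p rather than a plain log is an assumption made here so that zero sales do not break the transformation.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-in for the real dataset: n = 500 time-series (rows),
# m = 1734 daily values (columns), strictly positive "sales".
n, m = 500, 1734
X_raw = rng.gamma(shape=2.0, scale=10.0, size=(n, m))

# 2.1.2: power transformation (log) to stabilize variance...
X_log = np.log1p(X_raw)

# ...then standardize each row (each time-series) to zero mean, unit variance.
row_mean = X_log.mean(axis=1, keepdims=True)
row_std = X_log.std(axis=1, keepdims=True)
X = (X_log - row_mean) / row_std

# 2.2: full SVD, then keep only the k largest singular values/vectors.
U, s, Vt = np.linalg.svd(X, full_matrices=False)
k = 2
U_trunc, S_trunc, Vt_trunc = U[:, :k], np.diag(s[:k]), Vt[:k, :]

# Rank-k approximation of X (optimal in the least-squares sense
# by the Eckart-Young theorem).
X_approx = U_trunc @ S_trunc @ Vt_trunc
print(U_trunc.shape, S_trunc.shape, Vt_trunc.shape)  # (500, 2) (2, 2) (2, 1734)
```

Note that `full_matrices=False` already gives the economy-size decomposition, so only the leading columns/rows need to be sliced off for the rank-k truncation.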
As we will see later in this paper, while choosing a higher k gives a closer approximation to X, choosing a smaller k saves effort overall, because fewer components need to be forecast. Neglecting all but the first k components is justified since the first k components supposedly capture the underlying structure, or signal, of the data [3]. An example is shown in figure 1.

Figure 1. Truncated SVD with k-reduced singular decomposition of W

If the original input matrix X had dimension n x m, the k-truncated SVD matrices have the following dimensions: U (n x k), Σ (k x k), V^T (k x m). If, for example, we start with X containing 500 time-series with 365 values each and k = 2 components are used for the truncated decomposition, the resulting dimensions of the low-rank matrices are: U (500 x 2), Σ (2 x 2), V^T (2 x 365).

There are several methods to compute the optimal k value. In this paper I used an empirical 'elbow'-like method: plotting the semi-log of the singular values and choosing the cut-off at the inflection point, correlated with inspection of the auto-correlation function (ACF) of the main components of V^T. In figure 1, which represents the semi-log plot of the singular values obtained through decomposition (the diagonal of Σ), we can see that, for the example presented in this paper, after k = 1 the slope tends to stabilize, so this value can be used for the matrix approximation.

Fig. 1 Optimal k selection by the method of semi-log plot of singular values

Depending on the time-series characteristics, a larger k might be needed; in other datasets I have investigated, the optimal rank k was in the range 7-9. Because this does not change the approach, in this paper I refer to the dataset used for exemplification, which is a Kaggle challenge dataset for store-item forecasting [5].

3. Forecasting the main components of the right singular vectors matrix V^T

The V matrix encodes the time-series dynamics.
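The semi-log 'elbow' inspection can also be approximated programmatically. The sketch below builds a synthetic matrix with one dominant shared pattern and picks the rank where the largest drop between consecutive log singular values occurs; this automated drop criterion is only a stand-in for the visual inflection-point inspection used in the paper.

```python
import numpy as np

rng = np.random.default_rng(1)

# Matrix with strong rank-1 structure plus noise, mimicking many
# correlated time-series (toy stand-in for the scaled sales matrix).
t = np.linspace(0, 4 * np.pi, 365)
pattern = np.sin(t) + 0.5 * np.sin(7 * t)        # shared seasonal pattern
loadings = rng.normal(1.0, 0.2, size=(500, 1))   # per-series scaling
X = loadings @ pattern[None, :] + 0.05 * rng.normal(size=(500, 365))

s = np.linalg.svd(X, compute_uv=False)           # singular values only

# Semi-log "elbow": largest drop between consecutive log singular values.
log_s = np.log10(s)
k = int(np.argmax(-np.diff(log_s))) + 1
print(k)  # the construction above makes the dominant rank 1
```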
The k vectors of the truncated right singular vectors matrix V^T represent the time-bound evolution of the underlying structure of the data, and we will forecast only them. Therefore, instead of forecasting all n time-series, we forecast only k time-series, for a small k << n. As we will see below, k can be as small as one or two, so the forecast for n time-series can be computed by forecasting only a few (k << n) main V^T components. To recompose the original input matrix and generate the forecast for all time-series, we use the following truncated matrices: U truncated and Σ truncated, obtained in 2.2.1, and a new V^T matrix, V^T_forecast, obtained from the horizontal concatenation of the truncated V^T (shape k x m, from the truncated singular value decomposition of 2.2.1) and the k forecasts, each of shape 1 x forecast_horizon.
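The concatenation just described is pure shape bookkeeping, which a minimal sketch with dummy (zero) matrices makes explicit; the paper's example sizes are used, with k = 2 assumed here for illustration.

```python
import numpy as np

# Paper's example sizes: n = 500 series, m = 1734 history points,
# k retained modes, 92-day forecast horizon. Zeros stand in for the
# real truncated matrices.
n, m, k, horizon = 500, 1734, 2, 92

U_trunc = np.zeros((n, k))
S_trunc = np.zeros((k, k))
Vt_trunc = np.zeros((k, m))
Vt_modes_forecast = np.zeros((k, horizon))  # one forecast per retained mode

# Horizontal concatenation: history part of V^T followed by its forecast.
Vt_forecast = np.hstack((Vt_trunc, Vt_modes_forecast))
X_forecast = U_trunc @ S_trunc @ Vt_forecast

print(Vt_forecast.shape)  # (2, 1826): k x (m + horizon)
print(X_forecast.shape)   # (500, 1826): reconstructed history + forecast
```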
Therefore the V^T_forecast matrix has shape k x m', where m' = m + forecast_horizon. Finally, X_forecast is computed as the dot product of U_truncated, Σ_truncated, and V^T_forecast, with X_forecast of shape n x m', as per formula (3).

3.1 Decision about the forecasting model

Because the final forecast is influenced only by a limited number (k) of forecasts made on the main components of the V matrix, it is very important that these k forecasts be as accurate as possible. We may need to take into account both long-term and short-term cycles and use an appropriate time-series forecasting model. One challenge is that the model should be able to accommodate multiple cycles (e.g., weekly and yearly). For this study, the model used was the Triple Exponential Smoothing (Holt-Winters) [4] stochastic model, because it can easily accommodate a long-term cycle; as seen in figure 3, the data presents yearly and weekly seasonality. Other models, e.g., Auto-Regressive Integrated Moving Average (ARIMA), could also be tested for this purpose, but this remains for a future test. The solution used in this study was Holt-Winters Triple Exponential Smoothing with yearly seasonality for the prediction of the first component (choosing k = 1).

4. Example: application to a store-item simultaneous forecasting task

To exemplify the forecasting method, I used a dataset with 500 time-series from a Kaggle challenge [5]. The dataset contains 5 years of daily store-item sales data for 50 different items at 10 different stores (500 store-item time-series in total). The prediction was made for 3 months of sales: 92 values representing daily sales for each time-series. Figure 2 shows one example store-item time-series; it can be observed that it exhibits yearly seasonality.
Through analysis of the auto-correlation (ACF) and partial auto-correlation (PACF) graphs we will see that the main components of the low-rank approximation matrix also exhibit both yearly and weekly seasonality.

Figure 2. Daily sales of one store-item

4.1 Data transformation

First the data was split into train and validation dataframes of shapes 500 x 1734 and 500 x 92 respectively; I reserved the last three months of data (92 days) for results validation. As a data preprocessing step, the train data was log transformed and then standardized, as described in paragraph 2.1 Data transformation. According to figure 1, the rank k can be set to 1, and consequently there is only one component (mode 0 of the right singular vectors matrix) to be forecast. In the next step, singular value decomposition (SVD) was applied on the scaled train data and the low-rank U (500, 1), Σ (1, 1), V^T (1, 1734) matrices were computed.

4.2 Data analysis and modeling

Figure 3 shows the graph of V^T mode 0.

Figure 3. V^T mode 0

As in the original time-series, the yearly and weekly cycles are also captured in the time-series corresponding to mode 0; it also exhibits a trend. The auto-correlation (ACF) graph for 50 lags of mode 0 (fig. 4) displays the weekly seasonality through the lag-7 spikes of the differenced mode 0 time-series.

Figure 4. Auto-correlation (ACF) of the differenced mode 0 time-series (50 lags)

The auto-correlation (ACF) graph of the second term, mode 1 (fig. 5), displays no significant correlation at any lag, which is one more reason to limit our rank k to the first mode (mode 0, k = 1).

Figure 5. Auto-correlation (ACF) of the mode 1 time-series (50 lags)
According to the ACF graph, I used for forecasting mode 0 a Holt-Winters model with yearly seasonality (figure 6).

Figure 6. Forecast of the first right singular vector (mode 0)

4.3 Generating the forecast for all 500 time-series

The forecast is produced using the approximation formula:

scaled_forecast_sales = U_truncated * Σ_truncated * np.hstack((V_truncated, V_forecast))

Note that the right singular vectors matrix used here is the horizontal concatenation of V_truncated and V_forecast, with shape 1 x 1826, so scaled_forecast_sales has shape 500 x 1826. The resulting dataset contains both the reconstructed history (1734 values per store-item time-series) and the forecast (92 values per store-item time-series). Figure 7 below displays the final forecast result for one time-series; the forecast horizon is 92 days.

Figure 7. Example of a store-item time-series forecast (forecast horizon is 92 days)

4.4 Method evaluation: comparative results against the base forecasting method, Triple Exponential Smoothing (Holt-Winters) [4]

For the evaluation of forecasting accuracy I used two metrics widely used for assessing prediction performance: Mean Absolute Percentage Error (MAPE) and Root Mean Squared Error (RMSE). Table 1 presents the comparison of the averages of these two metrics over all 500 time-series, computed for the method described in this paper and for another well-known time-series prediction method, Holt-Winters Triple Exponential Smoothing. It can be seen that the proposed method had better results than the Holt-Winters method: on average, MAPE was improved by 22.7% and RMSE by ~19%.

5. Concluding remarks

In this work I proposed a novel method for multi-step-ahead simultaneous forecasting of multiple time-series, using matrix factorization (Truncated Singular Value Decomposition) at its core.
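The two evaluation metrics can be implemented directly; the arrays below are illustrative, not the paper's data.

```python
import numpy as np

def mape(y_true, y_pred):
    """Mean Absolute Percentage Error, in percent."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    return 100.0 * float(np.mean(np.abs((y_true - y_pred) / y_true)))

def rmse(y_true, y_pred):
    """Root Mean Squared Error."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))

# Tiny illustration.
actual = np.array([100.0, 120.0, 80.0, 90.0])
predicted = np.array([110.0, 115.0, 85.0, 95.0])
print(round(mape(actual, predicted), 2))  # 6.49
print(round(rmse(actual, predicted), 2))  # 6.61
```

MAPE is undefined for zero actuals, so days with zero sales would need special handling in practice.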
As its central algorithm, the method uses Truncated Singular Value Decomposition of a dataset X containing the many time-series to be predicted (shape n x m). After low-rank approximation of the dataset, the multi-step-ahead prediction is made only on the main components of the truncated right singular vectors matrix V_truncated. Therefore, instead of forecasting n time-series, it is enough to forecast k, with k << n; in this example it was enough to set the rank k to 1, meaning only one mode was used. By horizontal concatenation of V_truncated with its forecast, a new V_truncated* matrix results, which is used to compose the forecast of the original X using formula (3). The X_forecast matrix contains both the approximated values of the original X and the multi-step-ahead forecast for all time-series.

I have shown that this method has several important advantages:
a. It is very scalable.
b. It can simultaneously predict a large number n of time-series through prediction of only k main components (modes), with k << n (k can be as low as 1).
c. The processing time necessary for multi-step-ahead simultaneous forecasting of multiple time-series is a fraction of the processing time needed when individual forecasts are performed for all n time-series.
d. The forecasting accuracy improves on average over a well-known stochastic time-series forecasting method, namely Holt-Winters Triple Exponential Smoothing.

The proposed method is fairly broad and can potentially be applied to numerous other real-data applications where multi-step-ahead simultaneous forecasting of a large number of time-series is needed.

Table 1: Forecasting Accuracy – Results Comparison

Method                                      | Avg. MAPE [%] | Avg. RMSE | Std. dev. MAPE | Std. dev. RMSE
Prediction on main modes using SVD          | 18.15         | 11.03     | 3.47           | 4.04
Triple Exponential Smoothing (Holt-Winters) | 23.48         | 13.61     | 5.54           | 5.12
References

[1] Brunton S., Kutz J.N., February 2019. Singular Value Decomposition (SVD). researchgate.net/publication/331230334_Singular_Value_Decomposition_SVD
[2] In this work the SVD decomposition was computed with NumPy: https://numpy.org/doc/stable/reference/generated/numpy.linalg.svd.html
[3] Frank M., Buhmann J.M., June 2011. Selecting the rank of truncated SVD by Maximum Approximation Capacity.
[4] https://en.wikipedia.org/wiki/Exponential_smoothing#Triple_exponential_smoothing_(Holt_Winters)
[5] Dataset: https://www.kaggle.com/c/demand-forecasting-kernels-only
[6] Nielsen A., 2019. Practical Time Series Analysis: Prediction with Statistics & Machine Learning.
[7] Mills R., 2019. Applied Time Series Analysis: A Practical Guide to Modeling and Forecasting. Academic Press.
[8] Oracle Database Online Documentation 12c, Release 1 (12.1): Data Warehousing and Business Intelligence.
[9] Low-rank approximation