- The document describes using time series analysis models like ARIMA to forecast daily sales quantities of products like paintings for an online retailer.
- The best model was found to be an ARIMA(7,0,2) model, which uses the previous 7 days' values together with 2 moving-average (error) terms to predict future values, without differencing the data.
- This model provided more accurate predictions than the Facebook Prophet model based on error metrics, while converging during both training and testing.
2. Introduction (1)
• Purpose: Forecast the daily quantity of each product to be sold by Vivre
• Background: Vivre is a leading online retailer for Home and Lifestyle in Central and Eastern Europe, operating in 8 countries (including Bulgaria)
– Offers both campaigns (limited-time discounts for products from similar fields) and long-term available products (“products catalogue”)
– Launched in 2012 → lots of historical data regarding sales, providers, customers, etc.
• Result: An application using Time Series Analysis (TSA)
to forecast the number of paintings that will be sold
daily by Vivre Deco
• Design decision: Group the similar products and then
forecast the daily quantity to be sold for each group
12.09.2018 AIMSA 2018 1
3. • Issues:
– Over 10⁶ different product ids → time- and resource-consuming
– No more than about 50,000–60,000 products available at any time
– Many products were available for only a very limited amount of time → not enough information for prediction
– There are products that may replace one another → their sales are tightly connected
– The sale of a product is highly influenced by many factors: seasonality, marketing budget, existing campaigns, providers of the available goods → not all the info is logged in the system
• Motivation: increase the company’s profit:
– Help maintain in the (limited storage area of the) warehouse only the right stock of each product → efficiency
– Allow faster product delivery towards clients → satisfaction
– Improve the flow of the incoming and outgoing deliveries for the products → optimize warehouse usage → eliminate time loss
Introduction (2)
4. Used Models
• Time series: a set of data items collected at successive, uniformly spaced points in time
• TSA: class of methods for processing the TS data to extract its main patterns → predict future values using them
– Autoregressive (AR) model: output depends linearly on its own
previous values
– 1951: Autoregressive Moving Average (ARMA) model → 2 parts: the AR model and a moving average (MA) part, where the error is a linear combination of the current error term and some of the previous ones
– 1970: Autoregressive Integrated Moving Average (ARIMA) model → non-stationary data are converted to stationary by replacing the actual values with the differences between them and previous values
– ARIMA(p,d,q): ARIMA(p,0,q) = ARMA(p,q); ARIMA(p,0,0) = AR(p)
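The AR part described above (output as a linear function of the series' own previous values) can be illustrated with a minimal pure-Python least-squares sketch; the synthetic series and the `fit_ar2` helper are ours, not part of the paper:

```python
import random

def fit_ar2(x):
    """Fit x[t] ~ a1*x[t-1] + a2*x[t-2] by least squares.

    Solves the 2x2 normal equations directly; real TSA libraries
    (e.g. statsmodels) do this for arbitrary p, with many extras."""
    s11 = s12 = s22 = b1 = b2 = 0.0
    for t in range(2, len(x)):
        s11 += x[t - 1] * x[t - 1]
        s12 += x[t - 1] * x[t - 2]
        s22 += x[t - 2] * x[t - 2]
        b1 += x[t] * x[t - 1]
        b2 += x[t] * x[t - 2]
    det = s11 * s22 - s12 * s12
    return (b1 * s22 - b2 * s12) / det, (s11 * b2 - s12 * b1) / det

# Simulate an AR(2) process with known coefficients 0.6 and -0.3.
random.seed(0)
x = [0.0, 0.0]
for _ in range(5000):
    x.append(0.6 * x[-1] - 0.3 * x[-2] + random.gauss(0, 1))

a1, a2 = fit_ar2(x)  # estimates close to 0.6 and -0.3
```

The same least-squares idea generalizes to AR(p); the ARMA/ARIMA extensions add the MA error terms and differencing on top.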
5. • Given (p, d, q) and X, ARIMA uses the Box-Jenkins method to find the best fit of the TS to its past values → the resulting parameters are then used for predictions
• Choosing d: if data is not stationary, increase d until it
becomes so
• Choosing the best values of p and q is not easy:
– Using AICc – the Akaike information criterion with correction
– Using ACF (autocorrelation function) plots to determine p and PACF (partial autocorrelation function) plots to determine q
• Existing applications: various tasks
– Estimate the monthly catches of pilchard, forecast the Irish inflation, predict next-day electricity prices in Spain and California, estimate the incidence of haemorrhagic fever with renal syndrome in China, forecast the sugarcane production, etc.
– Stock forecasting: find the best parameters for estimating the
stock prices of a particular stock
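The autocorrelation values behind such plots are simple to compute; a pure-Python sketch using the standard sample-autocorrelation estimate (the helper name is ours):

```python
def acf(x, max_lag):
    """Sample autocorrelation of a series, for lags 1..max_lag."""
    n = len(x)
    mean = sum(x) / n
    var = sum((v - mean) ** 2 for v in x)
    out = []
    for k in range(1, max_lag + 1):
        cov = sum((x[t] - mean) * (x[t - k] - mean) for t in range(k, n))
        out.append(cov / var)
    return out

# An alternating 0/1 series: strongly negative at lag 1, positive at lag 2.
series = [float(i % 2) for i in range(100)]
print(acf(series, 2))  # → [-0.99, 0.98]
```

In practice one reads the orders off plots of these values (e.g. statsmodels' `plot_acf` / `plot_pacf`), which also draw confidence bands around zero.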
Behind the Models
6. • Used the TSA models to predict the daily quantity to be sold for
one generic name (“painting”):
– Enough data available to enable predictions (few days with 0 counts)
– Distribution has some seasonality
– Some spikes that could be interpreted as outliers
– Training: data from 01.01.2014 – 31.12.2017 – 1461 samples
– Testing: data from 01.01.2018 – 07.05.2018 – 127 samples
– Model quality was tested using 3 error measures: RMS (root mean square error) for both the training and testing sets, and MAPE (mean absolute percentage error) for the test set
Case Study (1)
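The two error measures used in the tables that follow can be sketched in a few lines of pure Python (helper names are ours):

```python
def rms(actual, predicted):
    """Root mean square error."""
    return (sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)) ** 0.5

def mape(actual, predicted):
    """Mean absolute percentage error; actual values must be non-zero."""
    return 100.0 * sum(abs((a - p) / a) for a, p in zip(actual, predicted)) / len(actual)

err_rms = rms([10, 20, 30], [12, 18, 30])    # ≈ 1.63
err_mape = mape([10, 20, 30], [12, 18, 30])  # ≈ 10.0
```

Note that MAPE divides by the actual value, so days with 0 sales would be a problem; the chosen series has few such days.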
7. • Built several ARIMA models, with different parameters, and tested the forecasting accuracy:
– AR with different orders (p: 1, 7, 31, 365) – current values are influenced by the one from the previous day, by the ones from the previous week, month or year
– Used various differencing settings to see if the data seasonality improves the results: tested daily, weekly and monthly seasonality (by differencing of d, we understand single differencing with lag = d: instead of X_i − X_{i−1}, we used X_i − X_{i−d})
Case Study (2)
Columns: (p, differencing lag)

| Error | (365,0) | (31,0) | (7,0) | (1,0) | (31,31) | (31,7) | (31,1) | (7,7) | (7,1) | (1,1) |
| RMS | 68.3 | 38.77 | 40.82 | 43.33 | 58.3 | 57.74 | 83.24 | 39.56 | 54.37 | 37.13 |
| MAPE | 109.7 | 42.06 | 40.89 | 45.26 | 79.3 | 72.04 | 169.2 | 56.44 | 88.73 | 41.55 |
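The lag-d differencing described above (X_i − X_{i−d}) is a one-liner; a small sketch with our own helper names:

```python
def difference(x, lag=1):
    """y[t] = x[t] - x[t-lag]; lag=1 is ordinary first differencing,
    lag=7 targets weekly seasonality, lag=31 monthly, lag=365 yearly."""
    return [x[t] - x[t - lag] for t in range(lag, len(x))]

def undifference(value, history, lag=1):
    """Invert one step: recover x[t] from a differenced value and past data."""
    return value + history[-lag]

x = [3, 5, 4, 8, 6, 9, 7, 11]
print(difference(x, lag=1))  # → [2, -1, 4, -2, 3, -2, 4]
print(difference(x, lag=7))  # → [8]
```

Forecasts made on a differenced series must be undifferenced before computing errors against the original values.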
8. – ARMA models with different p (2–7) and q (1, 2, 3)
• Used ACF & PACF to determine the best values of p & q
• The best model, ARMA(5,2), did not converge → unable to generate predictions for all the 127 test samples
• To choose the best model, we considered only the models that converged → best model: ARMA(7,2)
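The selection rule above — discard the models that did not converge, then take the lowest error — can be sketched as follows; the `results` dictionary is a stand-in for real ARMA fits, with numbers echoing the table on this slide:

```python
# (p, q) -> (iterative RMS, converged?); stand-ins for real ARMA fits.
results = {
    (7, 2): (31.94, True),
    (6, 2): (32.32, True),
    (5, 2): (26.70, False),  # best raw error, but the fit did not converge
    (4, 2): (32.65, True),
    (7, 1): (32.02, True),
}

# Keep only the converged models, then pick the one with the lowest error.
converged = {pq: err for pq, (err, ok) in results.items() if ok}
best = min(converged, key=converged.get)
print(best)  # → (7, 2)
```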
Case Study (3)
Columns: (p,q), differencing lag

| Error | (7,2), 0 | (6,2), 0 | (5,2), 0 | (4,2), 0 | (3,2), 0 | (7,1), 0 | (5,1), 0 | (3,2), 365 | (4,3), 7 |
| RMS Train | 33.53 | 33.68 | 33.66 | 33.97 | 33.98 | 33.57 | 33.83 | 46.49 | 48.1 |
| RMS 1 step | 38.52 | 38.48 | 36.1 | 38.46 | 38.47 | 38.57 | 35.96 | 57.76 | 51.77 |
| MAPE 1 step | 38.73 | 38.69 | 31.5 | 38.76 | 38.79 | 38.75 | 39.21 | 73.55 | 59.48 |
| RMS iter | 31.94 | 32.32 | 26.7 | 32.65 | 32.66 | 32.02 | 31.58 | 74.02 | 77.27 |
| MAPE iter | 32.86 | 33.41 | 24.22 | 33.8 | 34.77 | 33.23 | 39.82 | 97.22 | 144.66 |
| Converged | yes | yes | no | yes | yes | yes | yes | yes | yes |
| Converged iter | yes | no | no | no | yes | no | no | no | no |
| ACF | 5 | 5 | 5 | 5 | 5 | 5 | 5 | 3 | 4 |
| PACF | 2 | 2 | 2 | 2 | 2 | 2 | 2 | 2 | 3 |
| Lags | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 365 | 7 |
| Predictions # | 127 | 127 | 27 | 127 | 127 | 127 | 127 | 114 | 127 |
9. Case Study (4)
| Solver | Method | Time (s) | RMS | MAPE | Converged | Converged iter | Predictions |
| lbfgs | css-mle | 377.23 | 31.94 | 32.86 | yes | yes | 127 |
| lbfgs | mle | 365.52 | 31.94 | 32.86 | yes | yes | 127 |
| lbfgs | css | 45.5 | 31.91 | 32.88 | yes | yes | 127 |
| newton | css-mle | 633.2 | 31.94 | 32.86 | yes | yes | 127 |
| newton | mle | 961.88 | 31.94 | 32.86 | yes | yes | 127 |
| newton | css | 78.65 | 31.91 | 32.88 | yes | yes | 127 |
| nm | css-mle | 200.06 | 31.95 | 32.81 | no | no | 127 |
| nm | mle | 131.72 | 31.97 | 32.94 | no | no | 127 |
| nm | css | 30.83 | 31.91 | 32.93 | no | no | 127 |
| cg | css-mle | 1197.41 | 31.94 | 32.88 | no | no | 127 |
| cg | mle | 923.67 | 31.94 | 32.87 | no | mostly | 127 |
| cg | css | 96.39 | 31.92 | 32.85 | no | no | 127 |
| ncg | css-mle | 1099.61 | 31.94 | 32.86 | yes | yes | 127 |
| ncg | mle | 1168.33 | 31.94 | 32.86 | yes | yes | 127 |
| ncg | css | 171.38 | 31.91 | 32.88 | yes | yes | 127 |
| powell | css-mle | 208.18 | 31.93 | 32.83 | yes | yes | 127 |
| powell | mle | 139.3 | 31.97 | 32.96 | yes | yes | 127 |
| powell | css | 31.61 | 31.94 | 33.01 | yes | yes | 127 |
– Best ARMA model – ARMA(7,2) = ARIMA(7,0,2):
• Used multiple solvers: lbfgs, newton, nm, cg, ncg, powell
• Used 3 different methods: css, mle & css-mle
• Measured the time needed to make the predictions: important, since the whole process should be repeated a couple of thousand times per day
10. Case Study (5)
– Best model – ARIMA(7,0,2):
• Used the Augmented Dickey–Fuller test to verify if the TS was
stationary (to find out if differencing could improve the
results)
• The test showed that the series was stationary (integration will not help) → we still verified a couple of integrated models (with d = 1) to see the seasonality influence:
– Differencing was made explicitly before running ARIMA → d = 0
– We also made a test on a random combination of p and q (5, 1) to see the implicit integration of ARIMA(5,1,1)
| Error | lag 0 | lag 365 | lag 31 | lag 7 | lag 1 | (5,1,1) |
| RMS Train | 33.53 | 45.74 | 48.77 | 39.54 | 33.36 | 65.3 |
| RMS 1 step | 38.52 | 57.78 | 52.56 | 71.66 | 68.87 / 72.88 | 34.17 |
| MAPE 1 step | 38.73 | 73.62 | 61.64 | 106.4 | 100 | 43.87 |
| RMS iter | 31.94 | 54.62 | 64.74 | 45.65 | 62.15 | 31.58 |
| MAPE iter | 32.86 | 68.98 | 107.91 | 80.63 | 99.2 | 39.82 |
| Converged | yes | yes | yes | yes | yes | yes |
| Converged iter | yes | yes | yes | yes | no | no |
| ACF | 5 | 3 | 4 | 4 | 1 | 5 |
| PACF | 2 | 2 | 2 | 3 | 1 | 2 |
| Lags | 0 | 365 | 31 | 7 | 1 | 0 |
| Predictions # | 127 | 127 | 127 | 127 | 38 | 127 |
11. Results
[Figure: triangles = original values, circles = 1-step forecasting, ‘x’-s = iterative predictions]
• Used the ARIMA class from the statsmodels.tsa.arima_model Python module → 2 types of prediction:
– Forecast: predicts all the values in a single run
– Predict: predicts only the next value in the time series; it was run iteratively, interleaved with re-fitting the model using the true value
– Best results (when the model converged and the time was acceptable): ARIMA(7,0,2) using the powell solver and the css-mle method
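The iterative "Predict" strategy (one step ahead, then re-fit on the true value) can be sketched with a stand-in forecaster; here a naive last-7-days mean replaces the actual statsmodels ARIMA fit, purely to show the loop structure:

```python
def fit_and_predict_next(history):
    """Stand-in for re-fitting ARIMA(7,0,2) and forecasting one step:
    here simply the mean of the last 7 observations."""
    window = history[-7:]
    return sum(window) / len(window)

def iterative_forecast(train, test):
    """Predict one step, then append the TRUE value (not the prediction)
    and re-fit -- mirroring the iterative 'Predict' strategy above."""
    history = list(train)
    preds = []
    for actual in test:
        preds.append(fit_and_predict_next(history))
        history.append(actual)
    return preds

preds = iterative_forecast([10.0] * 30, [10.0, 17.0, 10.0])
print(preds)  # → [10.0, 10.0, 11.0]
```

By contrast, the "Forecast" mode produces all test predictions from a single fit, feeding its own predictions forward instead of the true values.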
12. ARIMA vs FBProphet
[Figure: triangles = original values, circles = FBProphet + spikes model, ‘x’-s = iterative predictions]
• FBProphet may be run with or without daily seasonality → similar results in both situations (RMS 1 step 33.18 / 33.21; MAPE 1 step 37.85 / 37.77; RMS iterative 32.26 / 32.25; MAPE iterative 41.29 / 41.35; runtime 1668 s / 1380 s)
• FBProphet + a model for the TS spikes using a different distribution: the results did not improve much (RMS 1 step 33.04; MAPE 1 step 37.43), remaining poorer than those of ARIMA(7,0,2)
13. • AR order: the best model was the one involving the previous 7 days
• Differencing: except for AR(1) with daily seasonality, including differencing according to the different seasonalities worsened the results → the best solution is to work directly with the data, without differencing
• p & q: choosing p & q using ACF and PACF does not work in our case:
– The model with p and q generated by ACF and PACF (ARIMA(5,0,2)) was the only one that did not converge during training
– None of the models with p and q chosen this way converged during testing
• Although ARIMA(5,0,2) seems to have the best results, it only
managed to provide 27 predictions out of 127
• The best model was ARIMA(7,0,2) which converged during both
training and testing
• Solvers and optimization methods: The investigation of different
solvers and optimization methods showed that they have a very
small influence on the obtained results
Conclusions (1)
14. • Convergence: since some of the methods did not converge, for further experiments we chose the combination leading to the second-best result (powell solver, css-mle method) → it also had a reduced running time, which counts when the process has to be repeated 27,000 times / day
• Results: promising, some of them even better than FBProphet’s, but not yet fit for production
• Possible improvements:
– Results were obtained using the data from a single generic name → use the information related to the other generic names as well (change ARIMA for a multivariate model)
– Use additional information: marketing budget, sales events, number
and type of products on sale in each sale event, number of page views,
products availability, etc.
– Test other ML models: logistic regression, random forest, neural nets
and deep learning
– Create an ensemble from different models that generate predictions
in different ways, which might help eliminate some of the prediction
errors
Conclusions (2)
15. Thank you very much!
This work was partly supported through the GEX contract 20/25.09.2017
funded by the University Politehnica of Bucharest. We would also like to thank
Vivre Deco for providing us the data that made this study possible.
Questions
Editor's Notes
- number of (group of) products for which predictions should be provided dropped to around 27,000
- diminish the scarcity and variability in the data
- all the obtained groups had at least one representative available on the site
- the problem with rarely available products and with the replaceable products was also solved
AR(p): X_t = c + γ₁·X_{t−1} + … + γ_p·X_{t−p} + ε_t, where γ₁, γ₂, …, γ_p are the parameters of the model, c is a constant, and the ε_t are white-noise error terms
MA(q): the moving-average part is θ₁·ε_{t−1} + … + θ_q·ε_{t−q}, where θ₁, θ₂, …, θ_q are the MA model parameters
p is the autoregressive order, d is the degree of differencing and q is the order of the moving-average model
RMS (root mean square error)
MAPE (mean absolute percentage error)
1st value is the p value
2nd value is the d value
(lbfgs – limited-memory Broyden–Fletcher–Goldfarb–Shanno; newton – Newton–Raphson; nm – Nelder–Mead; cg – conjugate gradient; ncg – Newton conjugate gradient; and powell)
(css – maximize the conditional sum of squares likelihood; mle – maximize the exact likelihood via the Kalman filter; and css-mle – maximize the conditional sum of squares likelihood, then use these values as starting values for the computation of the exact likelihood via the Kalman filter)
triangles = original values, circles = 1-step forecasting and the ‘x’-s = iterative predictions