This document provides an overview of basic principles for creating a time series forecast. It discusses key concepts like stationary series, differencing to make series stationary, and transformations like logarithms and Box-Cox. It also covers evaluating autocorrelation and partial autocorrelation to identify lag effects, and metrics for evaluating forecasts like mean error and mean absolute error. The goal is to explain traditional methodologies for time series analysis and prediction using Python code examples.
It is most useful for BBA students taking the subject "Data Analysis and Modeling".
It covers the content of the chapter "Data Regression Model".
For more, visit www.ramkumarshah.com.np/
Basic Principles to Create a Time Series Forecast
We are surrounded by patterns that can be found everywhere: patterns in the four seasons in relation to the weather; patterns in peak-hour traffic volume; in your heartbeats; in the shares of the stock market; and in the sales cycles of certain products.

Analyzing time series data can be extremely useful for identifying these patterns and creating predictions for the future. There are several ways to create these forecasts; in this post I will cover the concepts behind the most basic and traditional methodologies. All code is written in Python, and additional information can be seen on my Github.
So let's start with the initial condition for analyzing a time series:
Stationary Series
A stationary time series is one whose statistical properties, such as mean, variance and autocorrelation, are relatively constant over time. A non-stationary series, therefore, is one whose statistical properties change over time.
Before starting any predictive modeling it is necessary to verify whether these statistical properties are constant. I will explain each of these points below:

• Constant mean
• Constant variance
• Autocorrelation
Constant Mean
A stationary series has a relatively constant mean over time, with no bullish or bearish trends. Having a constant mean, with small variations around it, makes it much easier to extrapolate into the future.

There are cases where the variance is small relative to the mean, and using the mean itself may be a good way to make predictions for the future. Below is a chart showing a relatively constant mean in relation to the variations over time:
If the series is not stationary, the forecast for the future will not be efficient, because the variations around the mean deviate significantly, as can be seen in the chart below:

In the chart above it is clear that there is a bullish trend and that the mean is gradually rising. In this case, if the mean were used to make future forecasts, the error would be significant, since the forecast prices would always be below the real prices.
Constant Variance
When the series has constant variance, we have an idea of the typical variation in relation to the mean. When the variance is not constant (as in the image below), the forecast will probably have bigger errors in certain periods, and those periods will not be predictable; the variance is expected to remain inconstant over time, including in the future.

In order to reduce the variance effect, a logarithmic transformation can be applied. A power transformation, like the Box-Cox method, or an inflation adjustment can be used as well.
Autocorrelated Series
When two variables vary together in a similar way over time, we say that these variables are correlated. For instance, body weight and heart disorders: the greater the weight, the greater the incidence of heart problems. In this case the correlation is positive, and the graph would look something like this:

A case of negative correlation would be something like: the greater the investment in safety measures at work, the smaller the number of work-related accidents.

Here are several examples of scatter plots with different correlation levels:

source: wikipedia
Autocorrelation means that there is a correlation between certain previous periods and the current period; the period with this correlation is called a lag. For instance, in a series with hourly measurements, today's temperature at 12:00 is very similar to the temperature at 12:00 24 hours ago. If you compare the variation of temperatures over this 24-hour time frame, there will be an autocorrelation; in this case we will have an autocorrelation with the 24th lag.

Autocorrelation is a condition for creating forecasts with a single variable, because if there is no correlation you cannot use past values to predict the future. When there are several variables, you can check whether there is a correlation between the dependent variable and the lags of the independent variables.

If a series has no autocorrelation it is a series of random and unpredictable values, and the best way to make a prediction is usually to use the value from the previous day. I will show more detailed charts and explanations below.
From here on I will analyze the weekly hydrous ethanol prices from Esalq (a price reference used to negotiate hydrous ethanol in Brazil); the data can be downloaded here. The price is in Brazilian Reais per cubic meter (BRL/m3).

Before starting any analysis, let's split the data into a training set and a test set.
Dividing the data into training and test sets
When we are going to create a time series prediction model, it's crucial to separate the data into two parts:

Training set: this data will be the basis for estimating the coefficients/parameters of the model;

Test set: this data is held out and is not seen by the model; it is used to test whether the model works (generally these values are compared using a walk-forward method and the mean error is then measured).

The size of the test set is usually about 20% of the total sample, although this percentage depends on the sample size you have and on how far ahead you want to forecast. The test set should ideally be at least as large as the maximum forecast horizon required.
Unlike other prediction problems, such as classification and regression without the influence of time, in time series we cannot split the training and test data with random samples from any part of the data; we must follow the time order of the series, where the training data always comes before the test data.

In this example of Esalq hydrous ethanol prices we have 856 weeks. We will use the first 700 weeks as the training set and the last 156 weeks (3 years, ~18%) as the test set:
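A minimal sketch of this split, assuming a hypothetical CSV file (esalq_hydrous_ethanol.csv) with date and price columns; the real file and column names depend on where you downloaded the data:

```python
import pandas as pd

# Hypothetical file and column names: weekly Esalq hydrous ethanol prices (856 rows).
prices = pd.read_csv("esalq_hydrous_ethanol.csv",
                     parse_dates=["date"], index_col="date")["price"]

train = prices.iloc[:700]   # first 700 weeks: used to fit the models
test = prices.iloc[700:]    # last 156 weeks (~3 years): used only for validation
print(len(train), len(test))
```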
From now on we will only use the training set for the studies; the test set will only be used to validate the predictions that we make.

Every time series can be broken down into 3 parts: trend, seasonality and residuals, which is what remains after removing the first two parts from the series. Below, the separation of these parts:
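A minimal sketch of this decomposition with statsmodels, reusing the train series from the split above (period=52 because the data is weekly and the seasonality is yearly):

```python
import matplotlib.pyplot as plt
from statsmodels.tsa.seasonal import seasonal_decompose

# Additive decomposition into trend, seasonal and residual components.
decomposition = seasonal_decompose(train, model="additive", period=52)
decomposition.plot()
plt.show()
```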
Clearly the series has an uptrend, with peaks between the end and the beginning of each year and lows between April and September (the beginning of the sugarcane crushing season in the center-south of Brazil).
However, it is advisable to use statistical tests to confirm whether the series is stationary. We will use two tests: the Dickey-Fuller test and the KPSS test.

First, we will use the Dickey-Fuller test. I will use a base p-value of 5%; that is, if the p-value is below 5%, the series is considered statistically stationary.

In addition, the test returns a test statistic, which can be compared with the critical values at 1%, 5% and 10%; if the test statistic is below the chosen critical value, the series is stationary:
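A minimal sketch of the test with statsmodels, again on the train series from above:

```python
from statsmodels.tsa.stattools import adfuller

# Null hypothesis of the ADF test: the series has a unit root (is non-stationary).
adf_stat, p_value, _, _, critical_values, _ = adfuller(train)
print(f"ADF statistic: {adf_stat:.3f}")
print(f"p-value:       {p_value:.3f}")
for level, value in critical_values.items():
    print(f"critical value ({level}): {value:.3f}")
```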
In this case, the Dickey-Fuller test indicated that the series is not stationary (the p-value is 36% and the test statistic is above the 5% critical value).
Now we are going to analyze the series with the KPSS test. Unlike the Dickey-Fuller test, the KPSS test assumes that the series is stationary, and it will only be considered non-stationary if the p-value is less than 5% or the test statistic is greater than a critical value:
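A minimal sketch with statsmodels' kpss function:

```python
from statsmodels.tsa.stattools import kpss

# Null hypothesis of the KPSS test: the series IS stationary, so a small p-value
# (or a large test statistic) is evidence of non-stationarity.
kpss_stat, p_value, lags, critical_values = kpss(train, regression="c", nlags="auto")
print(f"KPSS statistic: {kpss_stat:.3f}")
print(f"p-value:        {p_value:.3f}")
print("critical values:", critical_values)
```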
Confirming the Dickey-Fuller test, the KPSS test also shows that the series is not stationary, because the p-value is at 1% and the test statistic is above every critical value.
Next I will demonstrate ways of making a series stationary.

Making the series stationary
Differencing
Differencing is used to remove trend signals and also to reduce the variance; it is simply the difference between the value of period T and the value of the previous period T-1.

To make it easier to understand, below we take only a fraction of the ethanol prices for better visualization. Note that from May/2005 prices start rising until mid-May/2006; these weekly rises accumulate, creating an uptrend, so we have a non-stationary series.
When the first difference is taken (graph below), we remove the cumulative effect of the series and show only the variation of period T against period T-1 throughout the whole series. So if the price 3 days ago was BRL 800.00 and it changed to BRL 850.00, the value of the difference will be BRL 50.00, and if today's value is BRL 860.00 then the difference will be BRL 10.00.

Normally only one difference is necessary to make a series stationary, but if necessary a second difference can be applied; in this case the differencing is done on the values of the first difference (there will hardly be cases needing more than 2 differences).

Using the same example, to take a second difference we must take the difference of the first differences, T minus T-1: BRL 2.9 − BRL 5.5 = −BRL 2.6, and so on.
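In pandas, differencing is a one-liner; a minimal sketch on the train series:

```python
import matplotlib.pyplot as plt

# First difference: value at period T minus value at period T-1.
diff1 = train.diff().dropna()

# Second difference (rarely needed): the difference of the first difference.
diff2 = train.diff().diff().dropna()

diff1.plot(title="First difference of the training series")
plt.show()
```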
Let's run the Dickey-Fuller test to see whether the series becomes stationary with the first difference:

In this case we confirm that the series is stationary: the p-value is zero and the test statistic is far below the critical values.
In the next example we will try to make the series stationary using an inflation adjustment.
Inflation Adjustment
Prices are relative to the time at which they were traded. In 2002 the price of ethanol was BRL 680.00; if this product were traded at that price today, many mills would certainly close, as it is a very low price.

To try to make the series stationary, I will adjust the whole series to current values using the IPCA index (the Brazilian CPI), accumulating from the end of the training period (Apr/2016) back to the beginning of the study; the source of the data is the IBGE website.

Now let's see how the series looks and whether it has become stationary.
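A minimal sketch of one way to do the adjustment, assuming a hypothetical ipca_index.csv holding the IPCA index level by date; every price is re-expressed in the money of the last training week by multiplying it by the ratio between the index at that final date and the index at the price's own date:

```python
import pandas as pd
import matplotlib.pyplot as plt

# Hypothetical file and column names for the IPCA (Brazilian CPI) index level.
ipca = pd.read_csv("ipca_index.csv", parse_dates=["date"], index_col="date")["ipca"]

# Align the monthly index level with the weekly price dates.
ipca_weekly = ipca.reindex(train.index, method="ffill")

# Rebase every price to the money of the last training period (Apr/2016).
train_adjusted = train * (ipca_weekly.iloc[-1] / ipca_weekly)

train_adjusted.plot(title="Inflation-adjusted training series (BRL/m3, Apr/2016 money)")
plt.show()
```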
As can be seen, the uptrend has disappeared, with only the seasonal oscillations remaining, and the Dickey-Fuller test confirms that the series is now stationary.

Just out of curiosity, see below the graph of the inflation-adjusted price against the original series.
Reducing variance
Logarithm
The logarithm is usually used to transform series that grow exponentially into series with more linear growth. In this example we will use the natural logarithm (ln), whose base is 2.718; this type of logarithm is widely used in economic models.

The difference between values transformed with ln is approximately equal to the percentage variation of the values in the original series, which makes it a good basis for reducing the variance in series with very different price levels. See the example below:
If we have a product whose price increased in 2000 from BRL 50.00 to BRL 52.50, and some years later (2019) the price was already BRL 100.00 and changed to BRL 105.00, the absolute differences are BRL 2.50 and BRL 5.00 respectively, but the percentage difference in both cases is 5%.

When we use ln on these prices we have: ln(52.50) − ln(50.00) = 3.961 − 3.912 = 0.049, or 4.9%; in the same way, using ln on the second pair of prices we have: ln(105) − ln(100) = 4.654 − 4.605 = 0.049, or 4.9%.

In this example, we reduce the variation of the values by bringing almost everything to the same basis.

Below, the same example in code:
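A minimal sketch of the computation, using only NumPy:

```python
import numpy as np

# Two price moves with the same 5% percentage change but different absolute sizes.
first = np.log(52.50) - np.log(50.00)     # ~0.049
second = np.log(105.00) - np.log(100.00)  # ~0.049

print(f"The percentage variation of the first example is {first * 100:.1f} "
      f"and the second is {second * 100:.1f}")
```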
Result: The percentage variation of the first example is 4.9 and the
second is 4.9
Below is a table comparing the percentage variation of X with the variation of ln(X):
Let's plot the comparison between the original series and the log-transformed series:
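A minimal plotting sketch, reusing the train series from above:

```python
import numpy as np
import matplotlib.pyplot as plt

fig, axes = plt.subplots(2, 1, figsize=(10, 6), sharex=True)
train.plot(ax=axes[0], title="Original series (BRL/m3)")
np.log(train).plot(ax=axes[1], title="Natural log of the series")
plt.tight_layout()
plt.show()
```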
Box-Cox Transformation (Power Transform)
The Box-Cox transformation is another way to transform a series; the lambda (λ) value is the parameter used in the transformation. In short, this function unifies a family of power transformations, and we search for the value of lambda that transforms the series so that its distribution is as close as possible to a normal (Gaussian) distribution. A condition for using this transformation is that the series has only positive values. The formula is:

y(λ) = (y^λ − 1) / λ when λ ≠ 0, and y(λ) = ln(y) when λ = 0
Below I will plot the original series with its distribution and, after that, the transformed series with the optimal value of lambda and its new distribution. To find the value of lambda we will use the boxcox function from the SciPy library, which returns the transformed series and the optimal lambda:
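A minimal sketch using scipy.stats.boxcox (and scipy.special.inv_boxcox to undo the transformation later):

```python
from scipy import stats
from scipy.special import inv_boxcox

# boxcox returns the transformed values and the lambda that makes the result
# as close to a normal distribution as possible (maximum likelihood).
transformed, best_lambda = stats.boxcox(train)
print(f"optimal lambda: {best_lambda:.3f}")

# After forecasting on the transformed scale, invert back to BRL/m3.
back_to_original = inv_boxcox(transformed, best_lambda)
```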
Below is an interactive chart where you can change the lambda value and see how the chart changes.

This tool is usually used to improve the performance of the model, since it produces distributions closer to normal. Remember that after the model's prediction is finished, you must return to the original scale by inverting the transformation: y = (λ·y(λ) + 1)^(1/λ) when λ ≠ 0, and y = exp(y(λ)) when λ = 0.
Looking for correlated lags
To be predictable, a series with a single variable must have autocorrelation; that is, the current period must be explainable by an earlier period (a lag).

As this series has weekly periods and 1 year is approximately 52 weeks, I will use the autocorrelation function over a period of 60 lags to check the correlation of the current period with these lags.
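A minimal sketch with statsmodels' plot_acf on the train series:

```python
import matplotlib.pyplot as plt
from statsmodels.graphics.tsaplots import plot_acf

# Autocorrelation of the training series for the first 60 weekly lags.
plot_acf(train, lags=60)
plt.show()
```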
Analyzing the autocorrelation chart above, it seems that all lags could be used to create forecasts of future events, since they have a positive correlation close to 1 and are outside the confidence interval; but this characteristic is typical of a non-stationary series.
Another very important function is the partial autocorrelation function, where the effect of the earlier lags on the current period is removed and only the effect of the lag being analyzed remains. For instance, the partial autocorrelation of the fourth lag removes the effects of the first, second and third lags.

Below, the partial autocorrelation graph:
As can be seen, almost no lag has an effect on the current period. But, as demonstrated earlier, the series without differencing is not stationary, so we will now plot these two functions for the series with one difference to see how it behaves:
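A minimal sketch plotting both functions on the differenced training series:

```python
import matplotlib.pyplot as plt
from statsmodels.graphics.tsaplots import plot_acf, plot_pacf

diff1 = train.diff().dropna()             # first difference of the training series

fig, axes = plt.subplots(2, 1, figsize=(10, 6))
plot_acf(diff1, lags=60, ax=axes[0])      # autocorrelation after differencing
plot_pacf(diff1, lags=60, ax=axes[1])     # partial autocorrelation after differencing
plt.tight_layout()
plt.show()
```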
The autocorrelation plot changed significantly, showing that the series has a significant correlation only in the first lag and a seasonal effect with negative correlation around the 26th lag (half a year, since the data is weekly).

To create forecasts, we must pay attention to an extremely important detail about correlated lags: it is important that there is a reason behind the correlation, because if there is no logical reason it may just be chance, and the correlation may disappear when you include more data.
Another important point is that the autocorrelation and partial autocorrelation graphs are very sensitive to outliers, so it is important to analyze the time series itself and compare it with the two autocorrelation charts.

In this example the first lag has a high correlation with the current period, since prices historically do not vary much from one week to the next; likewise, the 26th lag presents a negative correlation, indicating a tendency contrary to the current period, probably due to the different periods of supply and demand over the course of a year.

As the inflation-adjusted series has become stationary, we will use it to create our forecasts. Below, the autocorrelation and partial autocorrelation graphs of the adjusted series:
We will use only the first two lags as predictors for the auto-regressive model.

For more information, Duke University professor Robert Nau's website is one of the best resources on this subject.
Metrics to evaluate the model
In order to check whether the forecasts are close to the actual values, we must measure the error; the error (or residual) in this case is basically Yreal − Ypred.

The error on the training data is evaluated to verify whether the model has good accuracy, and the model is validated by checking the error on the test data (data that was not "seen" by the model). Checking the error is very important to verify whether your model is overfitting or underfitting when you compare the training data with the test data.
Below are the key metrics used to evaluate time series models:
MEAN FORECAST ERROR — (BIAS)
This is nothing more than the average of the errors of the evaluated series; the values can be positive or negative. This metric indicates whether the model tends to make predictions above the real value (negative errors) or below the real value (positive errors), so the mean forecast error can also be said to be the bias of the model.
MAE — MEAN ABSOLUTE ERROR
This metric is very similar to the mean forecast error mentioned above; the only difference is that negative errors are turned into positive values before the mean is calculated.

This metric is widely used in time series, since there are cases where negative errors can cancel out positive errors and give the impression that the model is accurate. With the MAE this does not happen, because the metric shows how far the forecast is from the real values, regardless of whether it is above or below. See the case below:
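A minimal sketch of this comparison, assuming hypothetical real and predicted arrays chosen so that the errors match the result shown below:

```python
import numpy as np

real = np.array([10, 11, 12, 13, 14])
predicted = np.array([14, 13, 12, 11, 10])
errors = real - predicted                 # [-4 -2  0  2  4]

mfe = errors.mean()                       # positive and negative errors cancel: 0.0
mae = np.abs(errors).mean()               # absolute errors do not cancel: 2.4

print(f"The error of each model value looks like this: {errors}")
print(f"The MFE error was {mfe}, the MAE error was {mae}")
```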
Result: The error of each model value looks like this: [-4 -2 0 2 4]
The MFE error was 0.0, the MAE error was 2.4
MSE — MEAN SQUARED ERROR
This metric places more weight on larger errors, because each individual error is squared before the mean is calculated. Thus, this metric is very sensitive to outliers and puts a lot of weight on predictions with larger errors. Unlike the MAE and the MFE, the MSE values are in squared units rather than the units of the series.
RMSE — ROOT MEAN SQUARED ERROR
This metric is simply the square root of the MSE, so the error returns to the unit of measure of the series (BRL/m3). It is widely used in time series because it is more sensitive to larger errors, due to the squaring in the MSE from which it originates.
MAPE — MEAN ABSOLUTE PERCENTAGE ERROR
This is another interesting metric, generally used in management reports, because the error is measured in percentage terms, so the error of a product X can be compared with the error of a product Y. The calculation takes the absolute value of the error divided by the actual value, and then the mean is calculated:
Let’s create a function to evaluate the errors of training and test
data with several evaluation metrics:
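A minimal sketch of such a function; the author's original helper is not shown, so this is an assumption about its shape:

```python
import numpy as np

def evaluate(real, predicted, label=""):
    """Print MFE (bias), MAE, MSE, RMSE and MAPE for a forecast."""
    real = np.asarray(real, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    errors = real - predicted

    mfe = errors.mean()
    mae = np.abs(errors).mean()
    mse = (errors ** 2).mean()
    rmse = np.sqrt(mse)
    mape = np.abs(errors / real).mean() * 100

    print(f"{label} MFE: {mfe:.2f} | MAE: {mae:.2f} | MSE: {mse:.2f} "
          f"| RMSE: {rmse:.2f} | MAPE: {mape:.2f}%")
    return mfe, mae, mse, rmse, mape
```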
Checking the residual values
It is not enough to create the model and check the error according to the chosen metric; you must also analyze the characteristics of the residuals themselves, as there are cases where the model cannot capture all the information necessary to make a good forecast, resulting in residuals that still contain information that could be used to improve the forecast.

To check the residuals we will look at:
• Actual vs. predicted values (sequential chart);
• Residuals vs. predicted values (scatter chart):
It is very important to analyze this chart, since in it we can spot patterns that tell us whether some modification is needed in the model; ideally the errors are distributed evenly, with no pattern, along the forecast sequence.
• QQ plot of the residuals (scatter chart):
In short, this is a chart that shows where the residuals should theoretically fall, following a Gaussian distribution, versus where they actually fall.
• Residual autocorrelation (sequential chart):
There should be no values outside the confidence margin; otherwise the model is leaving information out.
We need to create another function to plot these graphs:
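A minimal sketch of such a plotting helper, assuming real and predicted are pandas Series aligned on the same index (again, an assumption about the original function, which is not shown):

```python
import matplotlib.pyplot as plt
import scipy.stats as scs
from statsmodels.graphics.tsaplots import plot_acf

def plot_residuals(real, predicted):
    """Actual vs predicted, residuals vs predicted, QQ plot and residual autocorrelation."""
    residuals = real - predicted
    fig, axes = plt.subplots(2, 2, figsize=(12, 8))

    axes[0, 0].plot(real.values, label="actual")
    axes[0, 0].plot(predicted.values, label="predicted")
    axes[0, 0].set_title("Actual vs predicted")
    axes[0, 0].legend()

    axes[0, 1].scatter(predicted, residuals, s=10)
    axes[0, 1].axhline(0, color="grey")
    axes[0, 1].set_title("Residuals vs predicted")

    scs.probplot(residuals, dist="norm", plot=axes[1, 0])
    axes[1, 0].set_title("QQ plot of the residuals")

    plot_acf(residuals, lags=30, ax=axes[1, 1])
    axes[1, 1].set_title("Residual autocorrelation")

    plt.tight_layout()
    plt.show()
```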
Most basic ways to make a forecast
From now on we will create some price forecast models for hydrous ethanol. Below are the steps we will follow for each model:
• Create the prediction on the training data and subsequently validate it on the test data;
• Check the error of each model according to the metrics mentioned above;
• Plot the model with the residual comparisons.
Let’s go to the models:
Naive approach:
The simplest way to make a forecast is to use the value of the previous period. In some cases this is the best approach possible, with a lower error than more elaborate forecast methodologies.

Generally this methodology does not work well for predicting many periods ahead, as the errors tend to grow relative to the real values.

Many people also use this approach as a baseline to try to improve on with more complex models.

Below we will use the training and test data to run the simulations:
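A minimal sketch, reusing the train/test split and the evaluate and plot_residuals helpers sketched earlier:

```python
# Naive forecast on the training data: each week's prediction is the previous week's price.
naive_train_pred = train.shift(1).dropna()
evaluate(train.iloc[1:], naive_train_pred, label="Naive (train)")
plot_residuals(train.iloc[1:], naive_train_pred)

# Walk-forward on the test data: the first prediction is the last training value,
# after that each prediction is the previously observed test value.
naive_test_pred = test.shift(1)
naive_test_pred.iloc[0] = train.iloc[-1]
evaluate(test, naive_test_pred, label="Naive (test)")
plot_residuals(test, naive_test_pred)
```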
The QQ chart shows that there are some residuals larger (up and down) than there theoretically should be; these are the so-called outliers. There is also still a significant autocorrelation in the first, sixth and seventh lags, which could be used to improve the model.

In the same way, we will now make the forecast on the test data. The first value of the predicted series will be the last value of the training data; after that, the predictions are updated step by step with the actual values of the test set, and so on:
The RMSE and MAE errors were similar to those of the training data, and the QQ chart shows residuals more in line with what they should theoretically be, probably due to the small number of samples compared to the training data.

In the chart comparing the residuals with the predicted values, it can be seen that the errors tend to increase in absolute value when prices increase; perhaps a logarithmic adjustment would reduce this error expansion. Finally, the residual correlation chart shows that there is still room for improvement, as there is a strong correlation in the first lag, so a regression based on the first lag could probably be added to improve the predictions. The next model is the simple mean:
Simple Mean:
Another way to make predictions is to use the mean of the series. Usually this form of forecasting is good when the values oscillate closely around the mean, with constant variance and no uptrend or downtrend, but it is usually possible to use better methods that exploit seasonal patterns, among others.

This model uses the mean from the beginning of the data up to the previous period, expanding with each period until the end of the data; in the end, the tendency is for the forecast line to become straight. We will now compare the error of this model with the first model:

On the test data, I will continue using the mean from the beginning of the training data and expand it with the values that are added from the test data:
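A minimal sketch of the expanding mean, reusing the earlier variables and helpers:

```python
import pandas as pd

# Training data: the prediction for each week is the mean of all previous weeks.
mean_train_pred = train.expanding().mean().shift(1).dropna()
evaluate(train.iloc[1:], mean_train_pred, label="Expanding mean (train)")

# Test data: the mean keeps expanding over the full history observed so far.
history = pd.concat([train, test])
mean_test_pred = history.expanding().mean().shift(1).loc[test.index]
evaluate(test, mean_test_pred, label="Expanding mean (test)")
plot_residuals(test, mean_test_pred)
```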
The simple mean model failed to capture relevant information from the series, as can be seen in the Actual vs. Forecast chart, and also in the correlation and Residuals vs. Predicted charts.
Simple Moving Average:
The moving average is a mean calculated over a given window (5 periods, for example) that moves along the series, always being recalculated over that window; in this case we will always use the average of the last 5 periods to predict the value of the next period.
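A minimal sketch with a 5-period rolling window, reusing the earlier variables and helpers:

```python
import pandas as pd

# Training data: predict each week with the mean of the previous 5 weeks.
ma_train_pred = train.rolling(window=5).mean().shift(1).dropna()
evaluate(train.loc[ma_train_pred.index], ma_train_pred, label="Moving average (train)")

# Test data: the rolling window slides over the full observed history.
history = pd.concat([train, test])
ma_test_pred = history.rolling(window=5).mean().shift(1).loc[test.index]
evaluate(test, ma_test_pred, label="Moving average (test)")
plot_residuals(test, ma_test_pred)
```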
The error was lower than the simple mean, but still higher than the naive model. Below, the test model:

Similarly to the training data, the moving average model is better than the simple mean, but it still does not beat the naive model.

The predictions show autocorrelation in two lags, and the error has a very high variance relative to the predicted values.
Exponential Moving Average:
The simple moving average model described above has the property of treating the last X observations equally and completely ignoring all earlier observations. Intuitively, past data should be discounted more gradually: the most recent observation should matter slightly more than the second most recent, the second most recent a little more than the third most recent, and so on. The Exponential Moving Average (EMA) model does exactly this.

With α (alpha) a constant between 0 and 1, we will calculate the forecast with the following formula:

Forecast(t) = Forecast(t-1) + α × (Actual(t-1) − Forecast(t-1))

The first value of the forecast is the corresponding actual value; the other values are updated by α times the difference between the actual value and the forecast of the previous period. When α is zero we have a constant based on the first value of the forecast; when α is 1 we have the naive model, because the result is the actual value of the previous period.
Below is a chart with several values of α:

The average age of the data in the EMA forecast is 1/α. For example, when α = 0.5 the lag is equivalent to 2 periods; when α = 0.2 the lag is 5 periods; when α = 0.1 the lag is 10 periods; and so on.
In this model we will arbitrarily use an α of 0.50, but you could do a grid search for the α that reduces the error on the training data and also on the validation data. Let's see how it looks:
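A minimal sketch that builds the forecast recursively from the formula above (α = 0.50), reusing the earlier variables and helpers:

```python
import pandas as pd

def ema_forecast(series, alpha):
    """Forecast(t) = Forecast(t-1) + alpha * (Actual(t-1) - Forecast(t-1))."""
    forecast = [series.iloc[0]]                 # first forecast = first observed value
    for t in range(1, len(series)):
        prev = forecast[-1]
        forecast.append(prev + alpha * (series.iloc[t - 1] - prev))
    return pd.Series(forecast, index=series.index)

alpha = 0.50
ema_train_pred = ema_forecast(train, alpha)
evaluate(train, ema_train_pred, label="EMA (train)")
plot_residuals(train, ema_train_pred)
```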
The error of this model was similar to the error of the moving average; however, we still have to validate the model on the test set:

On the validation data, the error so far is the second best of the models we have trained, but the characteristics of the residual charts are very similar to those of the 5-period moving average model.
Auto-Regressive:
An auto-regressive model is basically a linear regression on significantly correlated lags, so the autocorrelation and partial autocorrelation charts should be plotted first to check whether there is anything relevant.

Below are the autocorrelation and partial autocorrelation charts of the training series, which show the signature of an auto-regressive model with 2 significantly correlated lags.

Below we will create the model based on the training data and, after obtaining the coefficients of the model, we will apply them to the values that are realized in the test data:
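A minimal sketch of the fit with statsmodels' AutoReg (here applied to the train series; in the post the inflation-adjusted series is used), reusing the evaluate helper:

```python
from statsmodels.tsa.ar_model import AutoReg

# Fit an auto-regressive model with the two significant lags on the training data.
ar_model = AutoReg(train, lags=2).fit()
print(ar_model.params)            # intercept and the coefficients of lags 1 and 2

# In-sample predictions, skipping the first 2 periods that are used as lags.
ar_train_pred = ar_model.predict(start=2, end=len(train) - 1)
evaluate(train.iloc[2:], ar_train_pred, label="AR(2) (train)")
```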
In this model the error was the lowest compared to all the other models we trained on the training data; now let's use its coefficients to do the step-by-step forecast of the test data:
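A minimal walk-forward sketch: at each test week the fitted coefficients are applied to the two previously observed values (reusing the earlier variables and helpers):

```python
import pandas as pd

const, phi1, phi2 = ar_model.params
history = pd.concat([train, test])

# Prediction for week t uses the actual values observed at t-1 and t-2.
ar_test_pred = (const + phi1 * history.shift(1) + phi2 * history.shift(2)).loc[test.index]
evaluate(test, ar_test_pred, label="AR(2) (test)")
plot_residuals(test, ar_test_pred)
```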
Note that on the test data the error did not remain stable and was even worse than the naive model. Note in the chart that the forecasts are almost always below the actual values; the bias measurement shows that the real values are, on average, BRL 50.19 above the predictions. Perhaps tuning some parameters of the training model would reduce this difference.

To improve these models you can apply several transformations, such as those explained in this post, and you can also add external variables as forecast inputs; however, that is a subject for another post.
Final considerations
Each time series model has its own characteristics and should be analyzed individually, so that we can extract as much information as possible to make good predictions and reduce the uncertainty about the future.

Checking for stationarity, transforming the data, creating the model on the training data, validating it on the test data and checking the residuals are the key steps to create a good time series forecast.