TIME SERIES ANALYSIS
& FORECASTING OF
ALUMINIUM PRICES
Feb 2016
MC DONNELL, CIAN
12476828
Supervisor: S. Ashe
Acknowledgments
I would like to sincerely thank Sinead Ashe for her time, support and guidance throughout the
course of this project. It was greatly appreciated.
I hereby certify that this material, which I now submit for assessment on the
programme of study leading to the award of (degree or masters), is entirely my
own work and has not been taken from the work of others save and to the
extent that such work has been cited and acknowledged within the text of my
work.
Signed: ___________________ ID no: _____________ Date: ________
Table of Contents
Section 1: Introduction
  Introduction
  Literature review
  Data
Section 2: Time Series Analysis
  Basic concepts
  Components
Section 3: Multiple Linear Regression
  MLR
  Analysis
  Diagnostics
Section 4: Box-Jenkins
  Moving average
  Autoregressive
  ARMA
  ARIMA
  Diagnostics
  Forecasting
Section 5: Volatility Modelling
  ARCH/GARCH
Section 6: VAR Model
  VAR(p)
  Forecasting
Section 7: Conclusion
References
Section 1: Introduction
Today, aluminium ranks second in consumption volume among all metals, surpassed
only by steel. In the coming decades the demand for aluminium is expected to keep rising
rapidly. Recent developments in the motor industry, the rapid growth of cities, new
potential uses of aluminium as a substitute for copper in the power industry and many other trends
mean that the metal is well placed to strengthen its position as a key structural material
of the 21st century.
Changes in aluminium prices matter greatly to aluminium producers, as well as to the motor,
transport and power industries. Falling aluminium prices would be hugely beneficial to companies
like Coca-Cola (CCE), where aluminium represents 15% of production costs. All automobile
manufacturers use aluminium, so a fall in prices would lower production costs, and the savings
could be kept as profit or reinvested. Companies like Alcoa (AA) and Alcan (AL), which are
involved in all stages of aluminium production, would benefit greatly from increases in the price
of aluminium.
Aluminium prices are also of particular interest to investors who wish to profit from price
speculation. Investors profit by trading in the futures market. This is a "central financial exchange
where people can trade standardised futures contracts; that is, a contract to buy specific quantities
of a commodity at a specified price with delivery set at a specified time in the future."
If an investor believes that the future price of the asset will be less than the price
quoted in the futures market, he may take a short position: he enters into a contract
in which he agrees to "deliver" an agreed amount of the commodity at a future date for a specified
price. If his prediction of the price movement is accurate, he will be able to buy the agreed
amount of the commodity at the reduced price when the contract expires, sell it at the
price specified in the contract, and profit from the difference.
Conversely, if he believes that the price will rise beyond the futures price stated in the contract,
he may take a long position, where he agrees to buy a specified amount of aluminium at a future
date for an agreed price. In this case, if he is right, he may sell the aluminium at its market price
when the contract expires and profit from the difference.
Clearly, the ability to forecast future price movements accurately is a hugely profitable skill for
investors to have.
There are 3 main reasons for modelling time series:
• To obtain a model that explains past movements and possible influencing factors
• To control the process by identifying relationships between variables
• To forecast future movements
The aim of my project is, given a data set of monthly aluminium closing prices on the LME
(London Metal Exchange) from January 2012 to December 2015, to develop a price model
which can accurately explain price movements and estimate future price levels of aluminium.
I will examine some of the different statistical modelling techniques related to time
series, which entail using the past in order to predict the future.
Literature Review
Time series analysis covers a large number of forecasting methods. I have examined several
publications, projects and books {Watkins & McAleer (2004), Brockwell & Davis (2002),
Montgomery & Johnson (1990) and Bowerman & O'Connell (2004)} on time series
forecasting and have interpreted their results, analysing the benefits and problems of these
methods. The two main methods I have studied are multiple linear regression and the Box-
Jenkins method.
In statistical modelling, regression analysis is a statistical process for estimating the
relationships among variables. It includes many techniques for modelling and analysing
several variables, when the focus is on the relationship between a dependent variable and one
or more independent variables (or 'predictors'). More specifically, regression analysis helps
one understand how the typical value of the dependent variable changes when any one of the
independent variables is varied while the other independent variables are held fixed.
Box-Jenkins analysis refers to a systematic method of identifying, fitting, checking and
using autoregressive integrated moving average (ARIMA) time series models. The approach
treats the data as a realisation of a stochastic process, which means that future outputs depend
on the past.
The results I have seen using regression techniques seem to provide a reasonable forecast for
future prices {Ismail et al (2009) and Kapl & Muller (2010)}, but major problems
emerge which reduce the accuracy of the forecast. The first problem is
multicollinearity, where two or more of the explanatory variables are highly correlated,
making it difficult to determine the individual effects they have on the dependent variable.
This can be overcome by increasing the size of the data set or by omitting one of the highly
collinear variables. Another problem is heteroskedasticity, where the variance of the error terms
is non-constant across values of the independent variables. This gives inefficient estimates of
the standard errors, and can be reduced by transforming the variables. The biggest problem
with regression in our case is actually choosing the independent variables: the full set of
factors that affect the price in this market is unknown, which makes
forecasting quite difficult.
The results I have seen in various publications on the Box-Jenkins methodology {Green
(2011) and Adebiyi et al (2014)} provide a fairly accurate forecast in the short term. What I
like about this method is its relative ease of application, because it minimises the number of
parameters needed. It is also flexible enough to be applied to a variety of data (seasonal/non-
seasonal, linear/nonlinear). This method, however, comes with some major drawbacks. Firstly,
some of the traditional techniques for identifying the correct model from
the class of possible models are difficult to understand, and in some cases even if the right
model is chosen it still may not be reliable. Also, the long term forecast eventually flattens to a
straight line and is poor at predicting series with turning points. Therefore this model should
only be used to forecast in the very short term, and care should be taken when identifying the
correct model.
Data
The data was taken on a monthly basis (the 1st of every month) from January 2012 to December
2015. I plotted my data using the R software package. Firstly, I downloaded the relevant data
from their respective sources and saved them under the file “cian.csv” in my documents.
Data source:
The price of aluminium was taken from the London Metal Exchange and is measured in US$
per metric tonne.
The futures price was taken from investing.com and is also measured in US$ per metric
tonne.
World primary aluminium production was taken from the world aluminium database and is
shown per thousand metric tonne.
To read the file into R:
file = read.csv("cian.csv", header = TRUE, sep = ",")
attach(file)
file
To plot prices:
TS1 = as.ts(rev(prices))   # reverse so the series runs from oldest to newest
par(mfrow = c(2, 2))       # arrange the plots in a 2 x 2 grid
plot(rev(prices))
plot(TS1, xlab = "Time", ylab = "Prices")
The same method was used to plot the other three graphs shown below:
[Figure: 2 x 2 panel of time series plots of Prices, Returns, Futures and Production over the sample period.]
Section 2: Time Series Analysis
Basic concepts of time series modelling
A time series is a sequential set of data points, typically measured at successive points in time. It is
mathematically defined as a set of values y(t), t = 0, 1, 2, ..., n, where t represents the time
elapsed. The variable y(t) is treated as a random variable.
Components of a time series
A time series in general is supposed to be affected by four main components that can be
separated from the observed data. These components are: Trend, Cyclical, Seasonal and
Irregular.
1) Trend is a long term movement in a time series. With regards to modelling share
prices, it reflects the long term change in the price of a stock.
2) The Cyclical variation in a time series describes the medium term changes in the
series, caused by circumstances which repeat in cycles. The duration of a cycle
extends over longer periods of time, usually two or more years. Financial market
indices are sensitive to what are known as business cycles where businesses grow,
peak, contract, trough and recover.
3) Seasonal variations in a time series are fluctuations within a year where patterns are
repeated over a known time period (week, month and season). Examples include the
increase in retail sales over the Christmas period and the ‘January effect’ where
stock markets seem to over-perform during the month of January.
4) Irregular variations in a time series are caused by unpredictable influences which do
not have a pattern.
Considering the effects of these four components, two different types of models are
generally used for a time series.
• Additive model: y(t) = T(t) + S(t) + C(t) + I(t)
• Multiplicative model: y(t) = T(t) * S(t) * C(t) * I(t)
Here y(t) is the observation at time t, and
T, S, C, I denote the Trend, Seasonal, Cyclical and Irregular components.
Multiplicative models are based on the assumption that the four components of a time series
are not necessarily independent and can affect one another, whereas the additive model
assumes that the four components are independent of each other.
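As an illustration of separating these components in R, here is a minimal sketch using classical decomposition (my addition; the object name TSm is hypothetical, and decompose() folds any cyclical movement into the estimated trend):
# build a monthly time series starting January 2012 (frequency 12)
TSm = ts(rev(prices), start = c(2012, 1), frequency = 12)
# classical additive decomposition into trend, seasonal and irregular parts
dec = decompose(TSm, type = "additive")
plot(dec)   # plots the observed series together with its estimated components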
Section 3: Multiple Linear Regression
Multiple linear regression attempts to model the relationship between two or more
explanatory variables and a response variable by fitting a linear equation to the observed
data. Every value of an independent variable x is associated with a value of the dependent
variable y. Once you have identified how these multiple variables relate to the dependent
variable, you can use information about all of the independent variables to make
much more powerful and accurate predictions.
A general multiple regression is:
y_t = \beta_0 + \beta_1 x_{1,t} + \cdots + \beta_p x_{p,t} + \varepsilon_t
y_t: value of the response variable at time t.
\beta_i: parameters to be estimated from the data, where i = 0, 1, ..., p.
x_{i,t}: value of the i-th explanatory variable at time t.
\varepsilon_t: random error at time t, which is the variation in the response variable that cannot be
explained by the rest of the model.
I chose two explanatory variables to try to explain the movements in aluminium prices,
these being:
Aluminium futures
Aluminium futures are standardised, exchange-traded contracts in which the contract buyer
agrees to take delivery from the seller of a specific quantity of aluminium at a predetermined
price on a future delivery date. Producers employ a short hedge to lock in a selling price,
while businesses that require aluminium employ a long hedge to secure a purchase price.
The futures price is a good indicator of price speculation in the market. Futures are traded by
speculators, who assume the price risk that hedgers try to avoid in return for a chance to
profit from favourable price movements. Speculators buy futures when they believe prices
will go up; conversely, they sell futures when they believe prices will go down.
The conclusion from the IMF working paper titled “Do commodity futures help forecast spot
prices?” by Reichsfeld & Roache read:
“Futures price-based forecasts are hard to beat. Futures prices perform at least as well as a
random walk for most commodities and at most horizons and, in some cases, do significantly
better.”
Production
Clearly the physical factors (supply and demand) of aluminium have a major influence on price
movements: excess supply causes prices to fall and excess demand causes prices to
rise. Every year world aluminium production increases as a result of the ever growing
demand for this metal. On average, world aluminium demand grows by 5-7% annually
(aluminiumleader.com), so production increases need to be in line with this to prevent even
more volatility in prices.
Analysis
To perform the multiple linear regression in R:
#Multivariable
lm4 = lm(rev(prices) ~ (1 + rev(production) + rev(futures)))
summary(lm4)
This gave me the following output:
Coefficients:
              Estimate    Std. Error   t value   p value
(Intercept)  -188.10213    83.79010    -2.245    0.0297 *
Production      0.01926     0.01226     1.571    0.1233
Futures         1.05415     0.02016    52.283    <2e-16 ***
Multiple R-squared: 0.9908, Adjusted R-squared: 0.9903
So our fitted regression function is:
\hat{y}_t = -188.10 + 0.02\,p_t + 1.05\,f_t
where p_t is world production and f_t is the futures price at time t.
Explanation of the regression equation
The y-intercept (\beta_0) of -188.1 is the aluminium price we would expect if the two predictor
variables both equalled zero. However, the intercept only has a meaningful interpretation if it
is reasonable for both predictor variables to be zero, and if the dataset actually includes
values for production and futures that are close to zero. Since neither of these conditions
holds, \beta_0 has no meaningful interpretation here; it simply anchors the regression line in
the right place. Indeed, if production were zero there would be no
aluminium to sell and therefore no price.
The parameters in front of the predictor variables represent the mean change in the price
of aluminium for a one-unit change in that predictor while the other
predictor in the model is held constant. In our case this means that if production increased by one
thousand metric tonnes while the futures price remained constant, the aluminium price would
increase by about 1.9 cents per metric tonne. If the futures price increased by one US$, the
aluminium price would be expected to increase by 1.05 US$.
The p value for each term tests the null hypothesis that the coefficient is equal to zero (no
effect). A low p value indicates that you can reject the null hypothesis, while a larger p value
indicates that changes in the predictor are not associated with changes in the response. The
p value of <2e-16 indicates that changes in aluminium futures prices are strongly associated
with changes in the price of aluminium, while the p value of 0.1233 suggests that production
does not have a significant effect on prices.
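As a quick sanity check of the fitted equation, here is a hedged example (the input values below are hypothetical, chosen only to lie within the range of the data):
# hypothetical inputs: production = 4800 thousand tonnes, futures = 1600 US$/tonne
b = coef(lm4)                          # coefficients from the regression fitted above
yhat = b[1] + b[2]*4800 + b[3]*1600
yhat   # approx -188.10 + 92.45 + 1686.64 = 1590.99 US$/tonne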
Multiple linear regression analysis makes several key assumptions:
1. Linear relationship: multiple linear regression requires the relationship between the
independent and dependent variables to be linear. This assumption can be tested
using scatter plots.
2. Normality: the error terms are random and follow a normal distribution with mean
equal to zero. This implies that the response y_t also follows a normal
distribution.
3. Homogeneous variance: all the error terms have the same variance. If this
assumption is broken, weighted least squares can be applied in place of ordinary
least squares.
4. Independence: multiple linear regression assumes that there is little or no
multicollinearity in the data. Multicollinearity occurs when the independent
variables are not independent from each other (see the VIF sketch after this list). A second
important independence assumption is that the error terms are uncorrelated with one another.
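A common way to check the multicollinearity part of this assumption is the variance inflation factor (VIF); a minimal sketch using the car package (my addition; the report does not state which diagnostic, if any, was used):
library(car)   # provides vif()
vif(lm4)       # values near 1 indicate little collinearity; values above about 5-10 are problematic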
Diagnostics:
After fitting a regression model it is important to determine whether all the necessary
model assumptions hold. If there are any violations, subsequent procedures may be invalid
resulting in faulty conclusions. Therefore, it is crucial to perform appropriate model
diagnostics. Model diagnostic procedures include graphical methods as well as formal
statistical tests.
1. Linearity: nonlinearity is usually most evident in a plot of observed versus
predicted values or a plot of residuals versus fitted values. The points should be
symmetrically distributed around a diagonal line in the first plot and around a
horizontal line in the second, with roughly a constant variance.
2. Normality: the best test for normally distributed errors is a normal probability
plot or normal quantile plot of the residuals. These are plots of the fractiles of the error
distribution versus the fractiles of a normal distribution having the same mean and
variance. If the distribution is normal, the points on such a plot should fall close to
the diagonal reference line.
3. Homoscedasticity: this is evident in a plot of residuals versus fitted values. What you are
hoping to see is a random pattern around the zero line.
4. Independence: the best test for serial correlation is to look at a residual time series
plot. Ideally, the pattern of residuals should be random.
To plot residuals versus time:
ls(lm4)
lm4$residuals
plot(lm4$residuals, type= "p")
The assumption of independence clearly holds as there is a random pattern in the plot of
residuals vs. time and the residuals bounce randomly around the residual = 0 line.
[Figure: lm4 residuals plotted against observation index.]
To plot normal Q-Q plot:
qqnorm(lm4$residuals)
qqline(lm4$residuals)
[Figure: normal Q-Q plot of the lm4 residuals.]
The points lie approximately on the straight line, but the tails deviate from it, indicating that
the error terms might not be normal.
To plot residuals vs fitted:
plot(lm4)
The assumptions of equal variance and linearity hold, as there is a random pattern around
the zero line and the residuals do not grow larger as the fitted values get larger.
Section 4: Box-Jenkins methodology
Moving average
A moving average model measures the relationship between the current value of the variable
and previous error terms. An MA(q) model is a moving average model of order q, where q is
the number of past errors that determine the current observation.
The general form of a moving average model is:
y_t = \mu + \varepsilon_t + \theta_1 \varepsilon_{t-1} + \theta_2 \varepsilon_{t-2} + \cdots + \theta_q \varepsilon_{t-q}
or
y_t = \mu + \varepsilon_t + \sum_{i=1}^{q} \theta_i \varepsilon_{t-i}
Where:
• y_t = response variable at time t
• \mu = mean of the data set
• \varepsilon_t = random shock at time t
• \theta_i = parameters estimated from the data
Autoregressive
An autoregressive model AR(p) expresses the relationship between the current value of the
variable and previous observations, plus a random error term. Here p is the number
of past observations that affect the response variable.
The general form of an autoregressive model is:
y_t = \theta_0 + \theta_1 y_{t-1} + \theta_2 y_{t-2} + \cdots + \theta_p y_{t-p} + \varepsilon_t
or
y_t = \theta_0 + \sum_{i=1}^{p} \theta_i y_{t-i} + \varepsilon_t
Where:
• y_t = response variable at time t
• \theta_i = parameters estimated from the data
• \varepsilon_t = random shock at time t
ARMA
Combining the AR(p) and MA(q) models we get the ARMA(p,q) model, with autoregressive
order p and moving average order q.
The general form of an ARMA model is:
y_t = c + \varepsilon_t + \sum_{i=1}^{p} \phi_i y_{t-i} + \sum_{i=1}^{q} \theta_i \varepsilon_{t-i}
where \phi_i are the autoregressive parameters and \theta_i the moving average parameters.
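To get a feel for what such a process looks like, here is a short simulation sketch (purely illustrative; the parameter values are made up):
set.seed(1)
# simulate 200 observations from an ARMA(1,1): y_t = 0.6*y_{t-1} + e_t + 0.4*e_{t-1}
sim = arima.sim(model = list(ar = 0.6, ma = 0.4), n = 200)
plot(sim, main = "Simulated ARMA(1,1)")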
Conditions necessary for ARIMA modelling
Invertibility
The model has to be able to express y_t in terms of past y observations. While it is clear that
the AR part of the model is invertible by definition, certain conditions have to be met by the
MA part of the model. A moving average model is invertible if its parameters decay to
zero as the process moves backwards through time; formally, the roots of the moving average
polynomial must lie outside the unit circle. For an MA(1) model this reduces to
|\theta_1| < 1
Stationarity
The ARIMA model is based on the time series being stationary. A time series is stationary if
its mean, variance and autocovariance structure are constant over time:
E(y_t) = \mu
Var(y_t) = \sigma^2
Cov(y_t, y_{t+k}) = \gamma_k
If the time series is not stationary a transformation is required. It can be seen from the time
series plot of aluminium prices that the time series is not stationary.
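A formal check for stationarity is the augmented Dickey-Fuller test; a minimal sketch using the tseries package (my addition; the report relies on the visual inspection described above):
library(tseries)
# null hypothesis: the series has a unit root (is non-stationary)
adf.test(TS1)   # a small p-value suggests the series is stationary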
Differencing
Differencing is a common way of making the time series stationary.
I will difference the data so that:
diff(y_t) = y_t - y_{t-1}
If this fails to make the time series stationary, second order differencing can be applied:
diff^2(y_t) = (y_t - y_{t-1}) - (y_{t-1} - y_{t-2})
This is done in R by:
TS5 = diff(TS1)   # first order differences
TS6 = diff(TS5)   # second order differences
plot(TS6, main = "second order differencing", xlab = "second order differencing pairs", ylab = "TS6")
plot(TS5, main = "first order differencing", xlab = "first order differencing pairs", ylab = "TS5")
We can see from the graphs below that first order differencing makes the series stationary, and
since second order differencing does not make it significantly more stationary, I will use first
order differencing in my ARIMA model.
[Figure: first order differenced series (TS5) and second order differenced series (TS6).]
ARIMA
The Auto-Regressive Integrated Moving Average model is a generalisation of the ARMA
model. An ARIMA(p,d,q) model expresses the relationship between the current value of the
response variable and its past values and past errors, where d is the order of differencing
applied to the data. This model is known to be robust and
efficient in financial time series forecasting, especially short-term prediction. It can be
applied to non-stationary data and has an easy to follow methodology.
Using the following steps we can apply the ARIMA model:
1. Check that the time series is stationary; if not, transform the data. This is usually done by
differencing.
2. Find the ACF and PACF of the stationary data. Check for spikes at constant multiples of a lag
(i.e. L, 2L, 3L, ...), which express seasonality, and make the necessary changes to fix this.
3. Choose the order of the ARIMA model. This can be done by looking at the ACF and PACF or
by using the auto.arima function in R.
4. Perform model diagnostics.
5. Forecast.
Autocorrelation and Partial Autocorrelation Functions (ACF and PACF)
To determine a proper model for given time series data, it is necessary to carry out
ACF and PACF analysis. These functions reflect how the observations in a time series are related to
each other. For modelling and forecasting purposes it is often useful to plot the ACF and
PACF against consecutive time lags.
The ACF at lag k measures the correlation between time series observations separated
by k time units. The PACF measures the correlation between an observation k
periods ago and the current observation; it is the same as the ACF except that the effects of the
intervening observations are removed.
The model selection guidelines below were taken from Bowerman & O'Connell, "Forecasting
and Time Series: An Applied Approach".
• ACF has spikes at lags 1, 2, ..., q and cuts off after q; PACF dies down: moving average model of order q.
• ACF dies down; PACF has spikes at lags 1, 2, ..., p and cuts off after p: autoregressive model of order p.
• ACF cuts off after q and PACF cuts off after p: choose the model that cuts off more abruptly. If both seem to cut off equally abruptly, fit (1) the moving average of order q alone and (2) the autoregressive of order p alone, then choose the operator that yields the best results.
• Both ACF and PACF die down: mixed model with an autoregressive part of order p and a moving average part of order q.
• No spikes in either: no operators (random walk model).
As shown above, we need to difference the original data with order d = 1 to obtain
stationarity.
The ACF and PACF were obtained by:
acf(TS5)
pacf(TS5)
As the ACF cuts off after lag one and the PACF cuts off at lag one, an ARIMA(1,1,0) model
should be fitted. The auto.arima function in R also suggested the
ARIMA(1,1,0) model.
[Figure: ACF and PACF of the first-differenced series TS5.]
AIC & BIC
Lastly, to make sure I chose the correct model, I used the Akaike Information Criterion (AIC)
and the Bayesian Information Criterion (BIC). These are standard methods of comparing the
quality of models, and hence provide a means for model selection.
The AIC is a measure of the relative quality of a statistical model
for a given set of data: it estimates the quality of each model relative to the other models, and
provides a trade-off between the goodness of fit and the complexity of the model.
The BIC usually penalises added parameters more heavily than the AIC.
We deem the model with the lowest AIC and BIC the most suitable
model; the optimal model order is the one that
minimises both criteria. I tested 8 different models of difference order one to see which
had the lowest AIC and BIC. As the table below shows, the ARIMA(1,1,0) model is
the optimal model to use.
ARIMA(p,d,q) AIC BIC
(0,1,0) 573.55 575.40
(1,1,0) 572.16 575.86
(1,1,1) 573.92 579.47
(2,1,1) 575.25 582.65
(2,1,2) 573.30 582.55
(1,1,2) 574.27 581.67
(0,1,2) 572.86 578.41
(0,1,1) 572.94 576.64
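A sketch of how such a table can be produced by looping over candidate orders (my reconstruction; the exact code used is not shown in the report):
# candidate (p,d,q) orders, all with first order differencing
orders = list(c(0,1,0), c(1,1,0), c(1,1,1), c(2,1,1), c(2,1,2), c(1,1,2), c(0,1,2), c(0,1,1))
for (o in orders) {
  fit = arima(TS1, order = o)
  cat(sprintf("(%d,%d,%d)  AIC = %.2f  BIC = %.2f\n", o[1], o[2], o[3], AIC(fit), BIC(fit)))
}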
Model Diagnostics
To plot the residuals versus observation order (the model fit is included here for completeness,
as the report uses ARmodel without showing how it was created):
ARmodel = arima(TS1, order = c(1, 1, 0))   # the chosen ARIMA(1,1,0) model
resid = ARmodel$residuals
plot(resid, main = 'Residuals of the Model', ylab = 'Residual', xlab = 'Observation Order', type = 'p')
The residuals versus order plot suggests that the mean of the errors is zero, the variance is
constant, and there is no systematic pattern, so the error terms appear to be independent.
The histogram was given by:
h = hist(ARmodel$residuals)   # draw the histogram and keep its bin information
xfit = seq(min(ARmodel$residuals), max(ARmodel$residuals), length.out = 100)
yfit = dnorm(xfit, mean = mean(ARmodel$residuals), sd = sd(ARmodel$residuals))
yfit = yfit * diff(h$mids[1:2]) * length(ARmodel$residuals)   # scale the density to the count scale
lines(xfit, yfit, col = "black", lwd = 2)   # overlay the fitted normal curve
The histogram forms an approximate bell shape around 0, which indicates that the residuals
are approximately normal.
The Q-Q plot was given by:
qqnorm(resid, main = 'Probability Plot of Residuals')
qqline(resid)   # reference line for comparison
The points form an approximately straight line, indicating that the residuals are normally
distributed.
To get the ACF and PACF of the residuals of the model:
acf(resid)
pacf(resid)
All lags of the ACF and the PACF lie well within the significance bands. Therefore we can
conclude that there is no significant autocorrelation between the residuals, i.e. they are
independently distributed.
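A formal complement to this visual check is the Ljung-Box test (my addition; not reported in the project):
# null hypothesis: the residuals are independently distributed
Box.test(resid, lag = 10, type = "Ljung-Box", fitdf = 1)   # fitdf = 1 for the AR(1) parameter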
Forecast
To forecast and plot the model's expected outcomes:
TSpredict = predict(ARmodel, n.ahead = 7)
TSpredict
plot(TS1, xlim = c(40, 58), ylim = c(1200, 2500))
points(c(48:55), c(TS1[48], TSpredict$pred), type = "l", col = "red")
# dotted lines: forecast plus/minus one standard error
points(c(48:55), c(TS1[48], TSpredict$pred + 1*TSpredict$se), type = "l", col = "red", lty = 3)
points(c(48:55), c(TS1[48], TSpredict$pred - 1*TSpredict$se), type = "l", col = "red", lty = 3)
[Figure: observed series TS1 with the ARIMA(1,1,0) forecast (red) and one-standard-error bands (dotted).]
Time            Forecast    Actual
January 2016    1523.64     1526.50
February 2016   1573.27     1621.00
March 2016      1600.84     N/A
April 2016      1643.39     N/A
May 2016        1675.77     N/A
June 2016       1715.05     N/A
July 2016       1749.65     N/A
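As a quick accuracy check against the two months for which actuals are available (my own calculation from the table above):
actual = c(1526.50, 1621.00)
fcst   = c(1523.64, 1573.27)
100 * (actual - fcst) / actual   # percentage errors: roughly 0.19% and 2.94%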
Section 5: Volatility modelling
A major assumption of ARMA/ARIMA modelling is that the conditional variance is constant
given past information. However, not all time series satisfy this: many feature a
variance that is a function of time. We now present methods to model this.
ARCH
An ARCH model is used when there is reason to believe that the variance of the time series is
non-constant. Heteroskedasticity, and therefore the need to model the variance, will be
apparent in the analysis of the residuals.
This model assumes that the variance at time t depends on the squares of the
error terms at previous times:
\sigma_t^2 = \alpha_0 + \alpha_1 \varepsilon_{t-1}^2 + \cdots + \alpha_q \varepsilon_{t-q}^2 = \alpha_0 + \sum_{i=1}^{q} \alpha_i \varepsilon_{t-i}^2
GARCH
A GARCH model is an extension of the ARCH model, well suited to observations taken in
quick succession. It adds to the basic ARCH model by letting the current variance depend on
previous variances as well as the squares of previous errors. It is therefore an autoregressive
model of the variance of order p combined with a moving average of squared error terms of
order q:
\sigma_t^2 = \alpha_0 + \sum_{i=1}^{q} \alpha_i \varepsilon_{t-i}^2 + \sum_{i=1}^{p} \beta_i \sigma_{t-i}^2
This is quite close to the ARMA model, except that it is applied to modelling the variance.
After running the diagnostics on my regression model I found there was homoskedasticity in
the data and therefore no need to model the variance.
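For completeness, here is a sketch of how one might test for ARCH effects and fit a GARCH(1,1) had heteroskedasticity been present (hypothetical in this project, since the residuals were homoskedastic):
library(tseries)
# Ljung-Box test on the squared residuals: a small p-value suggests ARCH effects
Box.test(resid^2, lag = 10, type = "Ljung-Box")
# if ARCH effects are found, fit a GARCH(1,1) to the differenced series
gfit = garch(diff(TS1), order = c(1, 1))
summary(gfit)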
Section 6: VAR model
VAR (vector autoregressive) models are used for multivariate time series. All
variables in a VAR are treated symmetrically in a structural sense: each variable has an
equation explaining its evolution based on its own lags and the lags of the other model
variables. First the variables are put into a (k × 1) vector, k being the number of variables
involved.
A VAR(p) model is defined as:
y_t = a + \sum_{i=1}^{p} A_i y_{t-i} + \varepsilon_t
Where:
• y_t is the vector of response time series variables at time t
• a is a constant vector to be estimated from the data
• A_i are (k × k) autoregressive coefficient matrices, one for each lag i = 1, ..., p
• \varepsilon_t is the vector of errors
For the VAR model I am using prices and futures because production was shown earlier to be
statistically insignificant.
To change my data into the appropriate form:
long = append(rev(prices), rev(futures))
long
datamatrix = matrix(c(long), 48, 2)   # 48 months by 2 series (prices, futures)
datamatrix
TS5 = as.ts(datamatrix)               # note: this reuses the name TS5 from the differencing section
Order Selection
To pick the order of my VAR(p) model:
library(vars)   # provides VARselect() and VAR()
VARselect(TS5, lag.max = 6, type = "const")
This gave me:
1 2 3 4
AIC(n) 1.499415e+01 1.495799e+01 1.508302e+01 1.501362e+01
HQ(n) 1.508514e+01 1.510963e+01 1.529533e+01 1.528659e+01
SC(n) 1.524239e+01 1.537172e+01 1.566224e+01 1.575834e+01
FPE(n) 3.251535e+06 3.141637e+06 3.574357e+06 3.358867e+06
5 6
AIC(n) 1.511536e+01 1.504350e+01
HQ(n) 1.544898e+01 1.543779e+01
SC(n) 1.602556e+01 1.611920e+01
FPE(n) 3.761570e+06 3.560736e+06
AIC(n) HQ(n) SC(n) FPE(n)
2 1 1 2
Two selection criteria told me to choose order 1 and two told me to choose order 2, so I
chose a VAR(1) model to keep the number of parameters at a minimum and avoid overfitting.
Forecasting
To plug my model order into R:
modelVAR = VAR(TS5,p=1,type="const")
Forecast with model:
VARpred = predict(modelVAR,n.ahead=7)
VARpred
$Series.1
fcst lower upper CI
[1,] 1524.295 1328.109 1720.481 196.1855
[2,] 1559.308 1300.869 1817.747 258.4392
[3,] 1586.911 1291.571 1882.251 295.3399
[4,] 1610.743 1291.150 1930.336 319.5926
[5,] 1631.060 1294.891 1967.229 336.1690
[6,] 1648.411 1300.656 1996.167 347.7555
[7,] 1663.225 1307.260 2019.189 355.9643
$Series.2
fcst lower upper CI
[1,] 1538.577 1348.484 1728.671 190.0934
[2,] 1570.838 1320.305 1821.370 250.5328
[3,] 1597.917 1311.308 1884.526 286.6090
[4,] 1621.091 1310.810 1931.372 310.2812
[5,] 1640.871 1314.414 1967.328 326.4573
[6,] 1657.760 1319.999 1995.522 337.7616
[7,] 1672.180 1326.411 2017.949 345.7694
Where series 1 is aluminium prices and series 2 is futures.
Comparing ARIMA and VAR Forecasts
The prediction comparison was produced in R by:
plot(c(40:55), TS1[40:55], xlim = c(40, 55), type = 'l', ylab = "Time Series", xlab = "Time")
par(lwd = 1)
points(c(48:55), c(TS1[48], TSpredict$pred), type = 'l', col = "red")                  # ARIMA forecast
points(c(48:55), c(TS1[48], VARpred$fcst$Series.1[,1]), type = 'l', col = "green")     # VAR forecast
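To compare the two models numerically on the months with known actuals (my own calculation, using the actual prices from the forecast table in Section 4):
actual = c(1526.50, 1621.00)               # January and February 2016
arima_fc = TSpredict$pred[1:2]             # ARIMA(1,1,0) forecasts
var_fc = VARpred$fcst$Series.1[1:2, 1]     # VAR(1) forecasts for prices
rbind(ARIMA = abs(actual - arima_fc), VAR = abs(actual - var_fc))   # absolute errors in US$/tonne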
Section 7: Conclusion
Overall I found this topic of study quite interesting and enjoyable, as I have a keen interest in
financial markets. I am sure that what I have learned over the course of this project will be of
great use to me in the future. Forecasting prices is a very difficult task, but I think I have
done a fairly reasonable job of it.
I think that learning R is something that will stand by me in my future career. Even though it
took a long time to write the code and learn its commands, I now feel comfortable with the
language and look forward to using it again.
I found it quite difficult to find data that would explain the movements of price in my
regression model. Most of the data I wanted to include did not come freely; an LME
subscription was needed to view most of their data. I wanted to include LME warehouse
inventories but was unable to, and I also wanted to include variables that could affect
the production of aluminium, such as energy costs, but again found this difficult.
Overall, I found the project a thoroughly enjoyable experience. I found the study of time
series and its applications both useful and interesting, and I have developed a greater
understanding of the subject.
References
• "Econometric modelling of non-ferrous metal prices" - Watkins & McAleer (2004)
• "Introduction to Time Series and Forecasting" - Brockwell & Davis (2002)
• "Forecasting and Time Series Analysis" - Montgomery & Johnson (1990)
• "Forecasting, Time Series, and Regression" - Bowerman & O'Connell (2004)
• "Forecasting Gold Prices Using Multiple Linear Regression Method" [American Journal of Applied Sciences, Volume 6, Issue 8] - Ismail et al (2009)
• "Prediction of steel prices: A comparison between a conventional regression model and MSSA" [Statistics and Its Interface, Volume 3] - Kapl & Muller (2010)
• "Do Commodity Futures Help Forecast Spot Prices?" [IMF Working Paper] - Reichsfeld & Roache (2011)
• "Time Series Analysis of Stock Prices Using the Box-Jenkins Approach" - Green (2011)
• "Comparison of ARIMA and Artificial Neural Networks Models for Stock Price Prediction" [Journal of Applied Mathematics, Volume 2014] - Adebiyi et al (2014)

More Related Content

Similar to TIME SERIES ANALYSIS & FORECASTING OF ALUMINIUM PRICES

Descriptionsordernametypeformatvallabvarlab1location_idint8.0gNum.docx
Descriptionsordernametypeformatvallabvarlab1location_idint8.0gNum.docxDescriptionsordernametypeformatvallabvarlab1location_idint8.0gNum.docx
Descriptionsordernametypeformatvallabvarlab1location_idint8.0gNum.docxdonaldp2
 
FIRE ADMIN UNIT 1 .orct121320#ffffff#fa951a#FFFFFF#e7b3513VERSON.docx
FIRE ADMIN UNIT 1 .orct121320#ffffff#fa951a#FFFFFF#e7b3513VERSON.docxFIRE ADMIN UNIT 1 .orct121320#ffffff#fa951a#FFFFFF#e7b3513VERSON.docx
FIRE ADMIN UNIT 1 .orct121320#ffffff#fa951a#FFFFFF#e7b3513VERSON.docxAKHIL969626
 
Large Scale Automatic Forecasting for Millions of Forecasts
Large Scale Automatic Forecasting for Millions of ForecastsLarge Scale Automatic Forecasting for Millions of Forecasts
Large Scale Automatic Forecasting for Millions of ForecastsAjay Ohri
 
esmaeili-2016-ijca-911399
esmaeili-2016-ijca-911399esmaeili-2016-ijca-911399
esmaeili-2016-ijca-911399Nafas Esmaeili
 
Forecasting Methodology Used in Restructured Electricity Market: A Review
Forecasting Methodology Used in Restructured Electricity Market: A ReviewForecasting Methodology Used in Restructured Electricity Market: A Review
Forecasting Methodology Used in Restructured Electricity Market: A ReviewDr. Sudhir Kumar Srivastava
 
TIME SERIES & CROSS ‎SECTIONAL ANALYSIS
TIME SERIES & CROSS ‎SECTIONAL ANALYSISTIME SERIES & CROSS ‎SECTIONAL ANALYSIS
TIME SERIES & CROSS ‎SECTIONAL ANALYSISLibcorpio
 
Module 3 - Time Series.pptx
Module 3 - Time Series.pptxModule 3 - Time Series.pptx
Module 3 - Time Series.pptxnikshaikh786
 
5.0 -Chapter Introduction
5.0 -Chapter Introduction5.0 -Chapter Introduction
5.0 -Chapter IntroductionSabrina Baloi
 
Meteorology Lab ReportIntroductionMeteorologists draw
Meteorology Lab ReportIntroductionMeteorologists draw Meteorology Lab ReportIntroductionMeteorologists draw
Meteorology Lab ReportIntroductionMeteorologists draw AbramMartino96
 
Guidelines to Understanding Design of Experiment and Reliability Prediction
Guidelines to Understanding Design of Experiment and Reliability PredictionGuidelines to Understanding Design of Experiment and Reliability Prediction
Guidelines to Understanding Design of Experiment and Reliability Predictionijsrd.com
 
Statistics, Data Analysis, and Decision ModelingFOURTH EDITION.docx
Statistics, Data Analysis, and Decision ModelingFOURTH EDITION.docxStatistics, Data Analysis, and Decision ModelingFOURTH EDITION.docx
Statistics, Data Analysis, and Decision ModelingFOURTH EDITION.docxdessiechisomjj4
 
Mb0048 operations research
Mb0048   operations researchMb0048   operations research
Mb0048 operations researchsmumbahelp
 
16 ch ken black solution
16 ch ken black solution16 ch ken black solution
16 ch ken black solutionKrunal Shah
 
RDO_01_2016_Journal_P_Web
RDO_01_2016_Journal_P_WebRDO_01_2016_Journal_P_Web
RDO_01_2016_Journal_P_WebSahl Martin
 
Mba 532 2011_part_3_time_series_analysis
Mba 532 2011_part_3_time_series_analysisMba 532 2011_part_3_time_series_analysis
Mba 532 2011_part_3_time_series_analysisChandra Kodituwakku
 
Generalized Analysis Value Behavior
Generalized Analysis Value BehaviorGeneralized Analysis Value Behavior
Generalized Analysis Value BehaviorBob Prieto
 

Similar to TIME SERIES ANALYSIS & FORECASTING OF ALUMINIUM PRICES (20)

Descriptionsordernametypeformatvallabvarlab1location_idint8.0gNum.docx
Descriptionsordernametypeformatvallabvarlab1location_idint8.0gNum.docxDescriptionsordernametypeformatvallabvarlab1location_idint8.0gNum.docx
Descriptionsordernametypeformatvallabvarlab1location_idint8.0gNum.docx
 
FIRE ADMIN UNIT 1 .orct121320#ffffff#fa951a#FFFFFF#e7b3513VERSON.docx
FIRE ADMIN UNIT 1 .orct121320#ffffff#fa951a#FFFFFF#e7b3513VERSON.docxFIRE ADMIN UNIT 1 .orct121320#ffffff#fa951a#FFFFFF#e7b3513VERSON.docx
FIRE ADMIN UNIT 1 .orct121320#ffffff#fa951a#FFFFFF#e7b3513VERSON.docx
 
Large Scale Automatic Forecasting for Millions of Forecasts
Large Scale Automatic Forecasting for Millions of ForecastsLarge Scale Automatic Forecasting for Millions of Forecasts
Large Scale Automatic Forecasting for Millions of Forecasts
 
esmaeili-2016-ijca-911399
esmaeili-2016-ijca-911399esmaeili-2016-ijca-911399
esmaeili-2016-ijca-911399
 
Forecasting Methodology Used in Restructured Electricity Market: A Review
Forecasting Methodology Used in Restructured Electricity Market: A ReviewForecasting Methodology Used in Restructured Electricity Market: A Review
Forecasting Methodology Used in Restructured Electricity Market: A Review
 
TIME SERIES & CROSS ‎SECTIONAL ANALYSIS
TIME SERIES & CROSS ‎SECTIONAL ANALYSISTIME SERIES & CROSS ‎SECTIONAL ANALYSIS
TIME SERIES & CROSS ‎SECTIONAL ANALYSIS
 
The Short-term Swap Rate Models in China
The Short-term Swap Rate Models in ChinaThe Short-term Swap Rate Models in China
The Short-term Swap Rate Models in China
 
Module 3 - Time Series.pptx
Module 3 - Time Series.pptxModule 3 - Time Series.pptx
Module 3 - Time Series.pptx
 
Training Module
Training ModuleTraining Module
Training Module
 
5.0 -Chapter Introduction
5.0 -Chapter Introduction5.0 -Chapter Introduction
5.0 -Chapter Introduction
 
Meteorology Lab ReportIntroductionMeteorologists draw
Meteorology Lab ReportIntroductionMeteorologists draw Meteorology Lab ReportIntroductionMeteorologists draw
Meteorology Lab ReportIntroductionMeteorologists draw
 
Guidelines to Understanding Design of Experiment and Reliability Prediction
Guidelines to Understanding Design of Experiment and Reliability PredictionGuidelines to Understanding Design of Experiment and Reliability Prediction
Guidelines to Understanding Design of Experiment and Reliability Prediction
 
Statistics, Data Analysis, and Decision ModelingFOURTH EDITION.docx
Statistics, Data Analysis, and Decision ModelingFOURTH EDITION.docxStatistics, Data Analysis, and Decision ModelingFOURTH EDITION.docx
Statistics, Data Analysis, and Decision ModelingFOURTH EDITION.docx
 
Mb0048 operations research
Mb0048   operations researchMb0048   operations research
Mb0048 operations research
 
16 ch ken black solution
16 ch ken black solution16 ch ken black solution
16 ch ken black solution
 
RDO_01_2016_Journal_P_Web
RDO_01_2016_Journal_P_WebRDO_01_2016_Journal_P_Web
RDO_01_2016_Journal_P_Web
 
Demand forecasting
Demand forecastingDemand forecasting
Demand forecasting
 
Demand forecasting
Demand forecastingDemand forecasting
Demand forecasting
 
Mba 532 2011_part_3_time_series_analysis
Mba 532 2011_part_3_time_series_analysisMba 532 2011_part_3_time_series_analysis
Mba 532 2011_part_3_time_series_analysis
 
Generalized Analysis Value Behavior
Generalized Analysis Value BehaviorGeneralized Analysis Value Behavior
Generalized Analysis Value Behavior
 

TIME SERIES ANALYSIS & FORECASTING OF ALUMINIUM PRICES

  • 1. TIME SERIES ANALYSIS & FORECASTING OF ALUMINIUM PRICES Feb 2016 MC DONNELL, CIAN 12476828 Supervisor: S. Ashe
  • 2. 1 | P a g e Acknowledgments I would like to sincerely thank Sinead Ashe for her time, support and guidance throughout the course of this project. It was greatly appreciated.
  • 3. 2 | P a g e I hereby certify that this material, which I now submit for assessment on the programme of study leading to the award of (degree or masters) is entirely my own work and had not been taken from the work of others save and to the extent that such work has been cited and acknowledged within the text of my work. Signed: ___________________ ID no: _____________ Date: ________
  • 4. 3 | P a g e Table of contents Section1 INTRODUCTION Introduction………………………………..4 Literature review…………………………..5 Data……………………………………......6 Section2 Time Series Analysis Basic concepts……………………………...8 Components………………………………...9 Section3 Multiple Linear Regression MLR………………………………………..10 Analysis…………………………………….10 Diagnostics…………………………………12 Section4 Box-Jenkins Moving Average……………………………16 Auto Regressive…………………………….16 ARMA………………………………………17 ARIMA……………………………………...19 Diagnostics…………………………………..24 Forecasting…………………………………..28 Section5 Volatility Modelling Arch/Garch…………………………………..30 Section6 VAR model VAR(p)……………………………………….31 Forecasting…………………………………….33 Section7 Conclusion…………………………………..35 References…………………………………..35
  • 5. 4 | P a g e Section1: Introduction Today, aluminium ranks number two in the consumption volumes among all the metals, surpassed only by steel. In the coming decades the demand for aluminium will continue increasing at unstoppable rates. Recent developments in the motor industry, the rapid growth of cities, new potential uses of aluminium as a substitute to copper in the power industry and many other trends mean that the metal is well placed to strengthen its dominant position as a key structural material of the 21st century. Changes in aluminium prices are hugely important to aluminium producers, as well as the motor, transport and power industry. Falling aluminium prices would be hugely beneficial to companies like Coca-Cola (CCE) where aluminium represents 15% of production costs. All automobile manufacturers use aluminium, so a fall in prices would lower production costs which could be kept as profits or reinvested. Companies like Alcoa (AA) and Alcan (AL) would benefit greatly from increases in the price of aluminium, these companies are involved in all stages of aluminium production. However, they are also a particular interest to investors who wish to profit from price speculation. Investors profit from trading in the futures market. This is a “central financial exchange where people can trade standardised futures contracts; that is, a contract to buy specific quantities of a commodity at a specified price with delivery set at a specified time in the future. If the investor believes that the price of the asset in the future is going to be less than the price quoted in the futures market he may take a short position, meaning he will enter into a contract where he agrees to “deliver” an agreed amount of the commodity at a future date for a specified price. If his prediction of future price movement was accurate then he will be able to buy the agreed amount of the commodity at the time the contract expires at a reduced price and then sell it at the specified price of the contract, and profit from the difference. Conversely, if he believes that the price will increase beyond the future price stated in the contract he may take a long position, were he agrees to buy a specified amount of aluminium at a future date for an agreed price, in this case if he is right he may sell the aluminium at its price at the time the contract expires and profit from the difference.
  • 6. 5 | P a g e Clearly, the ability to forecast future price movements accurately is a hugely profitable skill for investors to have. There are 3 main reasons for modelling time series: • To obtain a model that explains past movements and possible influencing factors • To control the process by identifying relationships between variables • To forecast future movements The aim of my project will be, given a data set of monthly aluminium closing prices on the LME (London Metal Exchange) from January 2012 to December 2015, to develop a price model which can accurately explain price movements and estimate future price levels of aluminium. I will examine some of the different statistical modelling techniques related to time series which entails using the past in order to predict the future. Literature Review Time series analysis covers a large number of forecasting methods. I have examined several publications, projects and books {Watkins & McAleer (2004), Brockwell & Davis (2002), Montgomery & Johnson (1990) and Bowerman & O’Connell(2004)} examining time series forecasting and have interpreted its results, analysing the benefits and problems with these methods. The two main methods I have studied are multiple linear regression and the Box- Jenkins method. In statistical modelling, regression analysis is a statistical process for estimating the relationships among variables. It includes many techniques for modelling and analysing several variables, when the focus is on the relationship between a dependant variable and one or more independent variables (or 'predictors'). More specifically, regression analysis helps one understand how the typical value of the dependent variable changes when any one of the independent variables is varied, while the other independent variables are held fixed. Box - Jenkins Analysis refers to a systematic method of identifying, fitting, checking, and using integrated autoregressive, moving average (ARIMA) time series models. The model is based on the data being a stochastic process which means the future outputs are dependent on the past.
  • 7. 6 | P a g e The results I have seen using regression techniques seem to provide a reasonable forecast for future prices {Ismail et al (2009) and Kapl & Muller (2010)} but there are major problems that emerge which reduce the accuracy of the forecast. The first problem is that of multicollinearity, where two or more of the explanatory variables are highly correlated, making it difficult to determine the individual effects they have on the dependant variable. This can be overcome by increasing the size of the data or by omitting one of the highly collinear variables. Another problem is Hetroskedacity, where the variance of the error terms is non-constant for all values of the independent variables. This gives inefficient estimates of the standard errors. This can be reduced by transforming the variables. The biggest problem with regression in our case is actually choosing the independent variables. The number of known factors that affect the price in this market is unknown and therefore would make forecasting quite difficult. The results I have seen in various publications of the Box Jenkins methodology {Green (2011) and Adebiyi (2014)} have provided a fairly accurate forecast in the short term. What I like about this method is its relative ease to apply because it minimises the parameters needed. Also it is flexible enough to be implemented on a variety of data (seasonal/non- seasonal, linear/nonlinear). This method however comes with some major drawbacks. Firstly, some of the traditional model identification techniques for identifying the correct model from the class of possible models are difficult to understand and in some cases even if the right model is chosen it still may not be reliable. Also the long term forecast eventually goes to be straight line and is poor at predicting series with turning points. Therefore this model should only be used to forecast in the very short term and care should be taken when identifying the correct model. Data The data was taken on a monthly basis (1st of every month) from January 2012 to December 2015. I plotted my data using the R software package. Firstly, I downloaded the relevant data from their respective sources and saved them under the file “cian.csv” in my documents. Data source: The price of aluminium was taken from the London Metal Exchange and is measured in US$ per metric tonne. The futures price was taken from investing.com and is also measured in US$ per metric tonne. World primary aluminium production was taken from the world aluminium database and is shown per thousand metric tonne. To read the file in R file: read.csv ("cian.csv", header = T, sep = ",") attach (file) file
  • 8. 7 | P a g e To plot prices: TS1 = as.ts(rev(prices)) par(mfrow=c(2,2)) plot(rev(prices)) plot(TS1, xlab="Time", ylab="Prices") The same method was used to plot the other three graphs shown below: Time Prices 0 10 20 30 40 1600180020002200 Time returns 0 10 20 30 40 -0.100.000.10 Time Futures 0 10 20 30 40 1600180020002200 Time Production 0 10 20 30 40 400044004800
  • 9. 8 | P a g e Section2: Time Series Analysis Basic Concepts of time series modelling A time series is a sequential set of data points, measured typically over successive times. It is mathematically defined as a set of vectors y(t), t = 0,1,2,…,n where t represents the time elapsed. The variable y(t) is treated as a random variable. Components of a time series A time series in general is supposed to be affected by four main components that can be separated from the observed data. These components are: Trend, Cyclical, Seasonal and Irregular. 1) Trend is a long term movement in a time series. With regards to modelling share prices, it reflects the long term change in the price of a stock. 2) The Cyclical variation in a time series describes the medium term changes in the series, caused by circumstances which repeat in cycles. The duration of a cycle extends over longer periods of time, usually two or more years. Financial market indices are sensitive to what are known as business cycles where businesses grow, peak, contract, trough and recover. 3) Seasonal variations in a time series are fluctuations within a year where patterns are repeated over a known time period (week, month and season). Examples include the increase in retail sales over the Christmas period and the ‘January effect’ where stock markets seem to over-perform during the month of January. 4) Irregular variations in a time series are caused by unpredictable influences which do not have a pattern. Considering the effects of these four components, two different types of models are generally used for a time series. • Additive model: y(t) = T(t) + S(t) + C(t) + I(t) • Multiplicative model: y(t) = T(t) * S(t) + C(t) * I(t) Here y(t) is the observation T, S, C, I = Trend, Seasonal, Cyclical, Irregular
  • 10. 9 | P a g e Multiplicative models are based on the assumption that the four components of a time series are not necessarily independent and they can effect one another, whereas in the additive model it is assumed that the four components are independent of each other. Section3: Multiple Linear Regression Multiple linear regression attempts to model the relationship between two or more explanatory variables and a predictor variable by fitting a linear equation to the observed data. Every value of the independent variable x is associated with a value of the dependent variable y. Once you have identified how these multiple variables relate to the dependent variable, you can take information about all of the independent variables and use it to make much more powerful and accurate predictions. A general multiple regression is: 𝑦𝑦𝑡𝑡 = 𝛽𝛽0 + 𝛽𝛽1 𝑥𝑥1 + ⋯ + 𝛽𝛽𝑝𝑝 𝑥𝑥𝑝𝑝 + 𝜀𝜀𝑡𝑡 𝑦𝑦𝑡𝑡: Value of response variable at time t. 𝛽𝛽𝑖𝑖: Are parameters to be estimated from the data where 𝑖𝑖 = 0,1,…,p 𝑥𝑥𝑖𝑖: Value of the explanatory variable 𝜀𝜀𝑡𝑡: Random error of sample t, which is the variation in the response variable that cannot be explained by the rest of the model. I choose two explanatory variables to try and explain the movements in aluminium prices, these are:
  • 11. 10 | P a g e Aluminium futures Aluminium futures are standardized, exchange traded contracts in which the contract buyer agrees to take delivery, from the seller, a specific quantity of aluminium at a predetermined price on a future delivery date. Producers employ a short hedge to lock in a selling price while businesses that require aluminium employ a long hedge to secure a selling price. Futures is a good indicator of price speculation in the market. Futures are traded by speculators who assume the price risk that hedgers try to avoid in return for a chance to profit from favourable price movement. Speculators buy futures when they believe prices will go up. Conversely, they will sell futures when they believe prices will go down. The conclusion from the IMF working paper titled “Do commodity futures help forecast spot prices?” by Reichsfeld & Roache read: “Futures price-based forecasts are hard to beat. Futures prices perform at least as well as a random walk for most commodities and at most horizons and, in some cases, do significantly better.” Production Clearly the physical factors (supply & demand) of aluminium have a major influence on price movements. Excess supply cause prices to fall and excess demand would cause prices to rise. Every year world aluminium production increases as a result of the ever growing demand for this metal. On average, world aluminium demand grows by 5-7% annually (aluminiumleader.com) so production increases need to be line with this to prevent even more volatility in prices. Analysis To perform the multiple linear regression in R: #Multivariable lm4 = lm(rev(prices) ~ (1 + rev(production) + rev(futures))) summary(lm4) This gave me the following output: Coefficients: Estimate Std. Error t value P value (Intercept) -188.10213 83.79010 -2.245 0.0297 * Production 0.01926 0.01226 1.571 0.1233 Futures 1.05415 0.02016 52.283 <2e-16 ***
  • 12. 11 | P a g e Multiple R-squared: 0.9908, Adjusted R-squared: 0.9903 So our regression function is: 𝒚𝒚�𝒕𝒕 = −𝟏𝟏𝟏𝟏𝟏𝟏. 𝟏𝟏𝟏𝟏 + 𝟎𝟎. 𝟎𝟎𝟐𝟐𝒑𝒑𝒕𝒕 + 𝟏𝟏. 𝟎𝟎𝟎𝟎𝒇𝒇𝒕𝒕 + 𝜺𝜺𝒕𝒕 Explanation of regression equation The y-intercept (𝛽𝛽0) of -188.1 is the aluminium price we expect if the two predictor variables both equal zero. However, the intercept is only a meaningful interpretation if it was reasonable for both predictor variables to be zero, and if the dataset actually included values for production and futures that were close to zero. Since both of these conditions are not true 𝛽𝛽0 really has no meaningful interpretation, it simply anchors the regression line in the right place. In this case, if production was equal to zero then there would be no aluminium to sell and therefore no price. The parameters in front of the predictor variables represent the mean change in the price of aluminium for one unit of change in the predictor variable while holding the other predictor in the model constant. In our case it means that if production increased by one thousand metric tonnes and the futures price remained constant aluminium prices would increase by 1.9 cents per metric tonne. If the futures price increased by one US$ then the aluminium price would be expected to increase by 1.05 US$. The p value for each term tests the null hypothesis that the coefficient is equal to zero (no effect). A low p value indicates that you can reject the null hypothesis. A larger p value indicates that changes in the predictor are not associated with changes in the response. The p value of 2e-16 indicates that changes in aluminium futures prices are highly correlated with changes in the price of aluminium while the p value of 0.1233 suggests that production doesn’t have a significant effect on prices.
Multiple linear regression analysis makes several key assumptions:

1. Linear relationship: multiple linear regression requires the relationship between the independent and dependent variables to be linear. This assumption can be tested using scatter plots.
2. Normality: the error terms are random and follow a normal distribution with mean equal to zero. This also implies that the response $y_t$ follows a normal distribution.
3. Homogeneous variance: all the error terms have the same variance. If this assumption is broken, the method of weighted least squares can be applied in place of ordinary least squares.
4. Independence: multiple linear regression assumes that there is little or no multicollinearity in the data. Multicollinearity occurs when the independent variables are not independent of each other. A second important independence assumption is that the error terms are uncorrelated with one another and with the independent variables.

Diagnostics:

After fitting a regression model it is important to determine whether all the necessary model assumptions hold. If there are any violations, subsequent procedures may be invalid, resulting in faulty conclusions, so it is crucial to perform appropriate model diagnostics. Model diagnostic procedures include graphical methods as well as formal statistical tests.

1. Linearity: nonlinearity is usually most evident in a plot of observed versus predicted values or a plot of residuals versus fitted values. The points should be symmetrically distributed around a diagonal line in the first plot and around a horizontal line in the second, with roughly constant variance.
2. Normality: the best test for normally distributed errors is a normal probability plot or normal quantile plot of the residuals. These plot the quantiles of the error distribution against the quantiles of a normal distribution having the same mean and variance. If the distribution is normal, the points should fall close to the diagonal reference line.
3. Homoscedasticity: this is evident in a plot of residuals versus fitted values. What you hope to see is a random pattern around the zero line.
4. Independence: the best check for serial correlation is a residual time series plot. Ideally, the pattern of residuals should be random.
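One assumption the plots that follow do not cover is multicollinearity. A quick check is sketched here (the correlation uses base R; vif assumes the car package is installed):

# correlation between the two predictors (values near 1 would signal multicollinearity)
cor(production, futures)
# variance inflation factors for the fitted model (assumes the car package is installed)
library(car)
vif(lm4)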
To plot residuals versus time:

ls(lm4)              # list the components of the fitted model object
lm4$residuals        # extract the residuals
plot(lm4$residuals, type = "p")

[Plot: residuals of lm4 versus observation index]

The assumption of independence appears to hold, as there is a random pattern in the plot of residuals versus time and the residuals bounce randomly around the residual = 0 line.
To plot the normal Q-Q plot:

qqnorm(lm4$residuals)
qqline(lm4$residuals)

[Plot: normal Q-Q plot of the lm4 residuals]
The data lie approximately on the straight line, but the tails deviate from it, indicating that the error terms might not be exactly normal.

To plot residuals versus fitted values:

plot(lm4, which = 1)   # the first of R's lm diagnostic plots is residuals vs fitted
The assumptions of equal variance and linearity hold, as there is a random pattern around the zero line and the residuals do not get larger as the fitted values get larger.

Section4: Box-Jenkins methodology

Moving average

A moving average model measures the relationship between the current value of the variable and previous error terms. An MA(q) model is a moving average model of order q, where q is the number of past errors that determine the current observation. The general form of a moving average model is:

$$y_t = \mu + \varepsilon_t + \theta_1\varepsilon_{t-1} + \theta_2\varepsilon_{t-2} + \dots + \theta_q\varepsilon_{t-q}$$

or

$$y_t = \mu + \varepsilon_t + \sum_{i=1}^{q}\theta_i\varepsilon_{t-i}$$

Where:
• $y_t$ = response variable at time t
• $\mu$ = mean of the data set
• $\varepsilon_t$ = random shock at time t
• $\theta_i$ = parameters estimated from the data

Auto regressive

An auto regressive model AR(p) expresses the relationship between the current value of the variable and its previous observations plus a random error term. p is the number of past observations that affect the response variable. The general form of an auto regressive model is:
$$y_t = \theta_0 + \theta_1 y_{t-1} + \theta_2 y_{t-2} + \dots + \theta_p y_{t-p} + \varepsilon_t$$

or

$$y_t = \theta_0 + \sum_{i=1}^{p}\theta_i y_{t-i} + \varepsilon_t$$

Where:
• $y_t$ = response variable at time t
• $\theta_i$ = parameters estimated from the data
• $\varepsilon_t$ = random shock at time t

ARMA

Combining the AR(p) and MA(q) models we get the ARMA(p,q) model, with autoregressive order p and moving average order q. The general form of an ARMA model is (writing $\phi_i$ for the AR parameters to distinguish them from the MA parameters $\theta_i$):

$$y_t = c + \varepsilon_t + \sum_{i=1}^{p}\phi_i y_{t-i} + \sum_{i=1}^{q}\theta_i\varepsilon_{t-i}$$

Conditions necessary for ARIMA modelling

Invertibility

The model has to be able to express $y_t$ in terms of past observations. While the AR part of the model is invertible by definition, certain conditions have to be met by the MA part. A moving average model is invertible if its parameters decrease to zero as the process moves backwards through time. This implies that the sum of the MA parameters must be less than one:

$$\sum_{i=1}^{q}\theta_i < 1$$

(more precisely, invertibility requires the roots of the MA characteristic polynomial to lie outside the unit circle).
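To illustrate, a stationary and invertible ARMA process can be simulated and re-fitted in R (a sketch using base R's arima.sim and arima; the parameter values are arbitrary choices that satisfy the conditions above):

set.seed(1)
# simulate 200 observations from an ARMA(1,1) with AR parameter 0.5 and MA parameter 0.3
sim = arima.sim(model = list(ar = 0.5, ma = 0.3), n = 200)
# fitting an ARMA(1,1), i.e. an ARIMA(1,0,1), should approximately recover both parameters
arima(sim, order = c(1, 0, 1))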
Stationarity

The ARIMA model is based on the time series being stationary. A time series is stationary if its mean, variance and autocorrelation are constant over time:

$$E(y_t) = \mu$$
$$Var(y_t) = \sigma^2$$
$$Cov(y_t, y_{t+k}) = \gamma_k$$

If the time series is not stationary, a transformation is required. It can be seen from the time series plot of aluminium prices that the series is not stationary.

Differencing

Differencing is a common way of making a time series stationary. I will difference the data so that:

$$\text{diff}(y_t) = y_t - y_{t-1}$$

If this fails to make the time series stationary, second order differencing can be applied:

$$\text{diff}^2(y_t) = (y_t - y_{t-1}) - (y_{t-1} - y_{t-2})$$

This is done in R by:

TS5 = diff(TS1)   # first order differencing
TS6 = diff(TS5)   # second order differencing
plot(TS5, main = "first order differencing", xlab = "first order differencing pairs", ylab = "TS5")
plot(TS6, main = "second order differencing", xlab = "second order differencing pairs", ylab = "TS6")
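As a formal complement to the visual check, a unit root test can be applied to the differenced series (a sketch, assuming the tseries package is installed):

library(tseries)   # provides adf.test
adf.test(TS5)      # null hypothesis: the series contains a unit root (is non-stationary)

A small p value here would support treating the first-differenced series as stationary.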
[Plots: first order differenced series (TS5) and second order differenced series (TS6)]

We can see from the graphs that first order differencing gives the model stationarity, and since second order differencing doesn't make the model significantly more stationary, I will use first order differencing in my ARIMA model.

ARIMA

The Auto-Regressive Integrated Moving Average model is a generalisation of the ARMA model. An ARIMA(p,d,q) model expresses the relationship between the current value of the response variable and its past values and past errors, where d is the order of differencing applied to the data. This model is known to be robust and
efficient in financial time series forecasting, especially short-term prediction. It can be applied to non-stationary data and has an easy-to-follow methodology. Using the following steps we can apply the ARIMA model:

1. Check that the time series is stationary; if not, transform the data. This is usually done by differencing.
2. Find the ACF and PACF of the stationary data. Check for spikes at constant multiples of a lag (i.e. L, 2L, 3L, …), which indicate seasonality, and make the necessary changes to fix this.
3. Choose the order of the ARIMA model. This can be done by looking at the ACF and PACF or by using the auto.arima function in R.
4. Perform model diagnostics.
5. Forecast.

Autocorrelation and Partial Autocorrelation Functions (ACF and PACF)

To determine a proper model for a given time series, it is necessary to carry out ACF and PACF analysis. These functions reflect how the observations in a time series are related to each other. For modelling and forecasting purposes it is often useful to plot the ACF and PACF against consecutive time lags. The ACF at lag k measures the correlation between time series observations separated by a lag of k time units. The PACF measures the correlation between an observation k periods ago and the current observation; it is the same as the ACF except that the effect of the intervening observations is removed. The model selection guidelines below were taken from Bowerman & O'Connell, "Forecasting and Time Series: An Applied Approach".
ACF: Spikes at lags 1, 2, …, q and cuts off after q.   PACF: Dies down.
→ Moving average model of order q.

ACF: Dies down.   PACF: Spikes at lags 1, 2, …, p and cuts off after p.
→ Autoregressive model of order p.

ACF: Spikes at lags 1, 2, …, q and cuts off after q.   PACF: Spikes at lags 1, 2, …, p and cuts off after p.
→ Choose the model that cuts off more abruptly. If both seem to cut off equally abruptly, fit (1) the moving average of order q alone and (2) the autoregressive of order p alone, then choose the operator that yields the best results.

ACF: Dies down.   PACF: Dies down.
→ Moving average model of order q combined with an autoregressive model of order p.

ACF: No spikes at any lag.   PACF: No spikes at any lag.
→ No operators (random walk model).
As shown above, we need to difference the original data with order d = 1 to obtain stationarity. The ACF and PACF were obtained by:

acf(TS5)
pacf(TS5)

[Plots: ACF and PACF of the first-differenced series TS5]

As the ACF cuts off after lag one and the PACF cuts off at lag one, an ARIMA(1,1,0) model should be fitted. I also used the auto.arima function in R, which likewise selected the ARIMA(1,1,0) model.
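The auto.arima call itself is not shown above; a minimal sketch (assuming the forecast package is installed) would be:

library(forecast)   # provides auto.arima
auto.arima(TS1)     # searches over candidate (p,d,q) orders and selects by information criteria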
AIC & BIC

Lastly, to make sure I chose the correct model, I used the Akaike Information Criterion (AIC) and the Bayesian Information Criterion (BIC). These are standard methods of comparing the quality of statistical models and hence provide a means for model selection. The AIC measures the relative quality of a statistical model for a given set of data: it estimates the quality of each model relative to the other models, providing a trade-off between goodness of fit and model complexity. The BIC usually penalises added parameters more heavily than the AIC. Usually, we deem the model with the lowest combined AIC and BIC the most suitable, so the optimal model order is the one that minimises the AIC and BIC.

I tested 8 different models of difference order one to see which had the lowest AIC and BIC. As you can see from the table below, the ARIMA(1,1,0) model is the optimal model to use.

ARIMA(p,d,q)    AIC       BIC
(0,1,0)         573.55    575.40
(1,1,0)         572.16    575.86
(1,1,1)         573.92    579.47
(2,1,1)         575.25    582.65
(2,1,2)         573.30    582.55
(1,1,2)         574.27    581.67
(0,1,2)         572.86    578.41
(0,1,1)         572.94    576.64
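These values can be reproduced with a short loop (a sketch; it assumes TS1 holds the price series used throughout):

orders = list(c(0,1,0), c(1,1,0), c(1,1,1), c(2,1,1),
              c(2,1,2), c(1,1,2), c(0,1,2), c(0,1,1))
for (ord in orders) {
  fit = arima(TS1, order = ord)   # fit each candidate ARIMA(p,1,q)
  cat(sprintf("(%d,%d,%d)  AIC = %.2f  BIC = %.2f\n",
              ord[1], ord[2], ord[3], AIC(fit), BIC(fit)))
}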
Model Diagnostics

The model object ARmodel used below is assumed to come from a fitting call along these lines (the call itself is not shown in the original):

ARmodel = arima(TS1, order = c(1, 1, 0))   # fit the chosen ARIMA(1,1,0) model (assumed)
resid = ARmodel$residuals                  # residuals used in the diagnostics below

To plot residuals versus order:

plot(resid, main = 'Residuals of the Model', ylab = 'Residual',
     xlab = 'Observation Order', type = 'p')

The residuals versus order plot suggests that the mean of the errors is zero and that the variance is constant, which is consistent with independent error terms.
The histogram was given by:

h = hist(ARmodel$residuals)   # draw the histogram and keep its breakpoints
xfit = seq(min(ARmodel$residuals), max(ARmodel$residuals), length.out = 100)
yfit = dnorm(xfit, mean = mean(ARmodel$residuals), sd = sd(ARmodel$residuals))
yfit = yfit * diff(h$mids[1:2]) * length(ARmodel$residuals)   # scale the density to the histogram counts
lines(xfit, yfit, col = "black", lwd = 2)

(Note that the original assigned the histogram to an object named hist, shadowing the hist function; it is renamed h here.)

The histogram forms an approximate bell shape around 0, which indicates the residuals are normal.
The Q-Q plot was given by:

qqnorm(resid, main = 'Probability Plot of Residuals')
qqline(resid)   # reference line for comparison

The residuals form an approximately straight line, indicating that they are normally distributed.
To get the ACF and PACF of the residuals of the model:

acf(resid)
pacf(resid)

All lags of the ACF and the PACF lie well within the significance bands. Therefore we can conclude that there is no significant autocorrelation between the residuals, i.e. they are independently distributed.
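The same conclusion can be checked formally with a Ljung-Box test (a sketch using base R's Box.test; the lag choice of 10 is arbitrary):

# tests the joint null hypothesis that the first 10 residual autocorrelations are zero;
# fitdf = 1 accounts for the single AR parameter estimated by the model
Box.test(resid, lag = 10, type = "Ljung-Box", fitdf = 1)

A large p value would support the conclusion that the residuals are uncorrelated.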
Forecast

To forecast and plot the model's expected outcomes:

TSpredict = predict(ARmodel, n.ahead = 7)
TSpredict
plot(TS1, xlim = c(40, 58), ylim = c(1200, 2500))
points(c(48:55), c(TS1[48], TSpredict$pred), type = "l", col = "red")
points(c(48:55), c(TS1[48], TSpredict$pred + 1 * TSpredict$se), type = "l", col = "red", lty = 3)
points(c(48:55), c(TS1[48], TSpredict$pred - 1 * TSpredict$se), type = "l", col = "red", lty = 3)

[Plot: TS1 with the seven-step ARIMA forecast (solid red) and ±1 standard error bands (dotted red)]
Time            Forecasted    Actual
January 2016    1523.64       1526.50
February 2016   1573.27       1621.00
March 2016      1600.84       N/A
April 2016      1643.39       N/A
May 2016        1675.77       N/A
June 2016       1715.05       N/A
July 2016       1749.65       N/A
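For the two months where actual prices are available, the forecast errors can be computed directly (a minimal sketch using the objects defined above):

actual = c(1526.50, 1621.00)                # observed January and February 2016 prices
forecasted = TSpredict$pred[1:2]            # corresponding ARIMA forecasts
abs(actual - forecasted)                    # absolute errors
100 * abs(actual - forecasted) / actual     # percentage errors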
Section5: Volatility modelling

A major assumption of ARMA/ARIMA modelling is that the conditional variance is constant given past information. However, not all time series follow this; many feature variances that are functions of time. We will now present methods to model this.

ARCH

An ARCH model is used when there is reason to believe that there is non-constant variance in the time series. It will be clear in the analysis of the residuals if there is heteroskedasticity and therefore a need to model the variance. This model assumes that the variance at time t depends on the squares of the error terms at previous times:

$$\sigma_t^2 = \alpha_0 + \alpha_1\varepsilon_{t-1}^2 + \dots + \alpha_q\varepsilon_{t-q}^2 = \alpha_0 + \sum_{i=1}^{q}\alpha_i\varepsilon_{t-i}^2$$

GARCH

A GARCH model is an improved model for observations taken in quick succession. It extends the basic ARCH model by letting the current variance depend on previous variances as well as the squares of previous errors. It is therefore an autoregressive model of the variance of order p combined with a moving average of squared error terms of order q:

$$\sigma_t^2 = \alpha_0 + \sum_{i=1}^{q}\alpha_i\varepsilon_{t-i}^2 + \sum_{i=1}^{p}\beta_i\sigma_{t-i}^2$$

This is quite close to the ARMA model, except that it is now applied to modelling the variance. After running the diagnostics on my regression model I found homoskedasticity in the data and therefore no need to model the variance.
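One simple way to check for ARCH effects, had they been suspected, is a Ljung-Box test on the squared residuals (a sketch using base R; the lag choice is arbitrary):

# a small p value here would suggest ARCH effects, i.e. time-varying variance
Box.test(resid^2, lag = 10, type = "Ljung-Box")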
Section6: VAR model

VAR models (vector autoregressive models) are used for multivariate time series. All variables in a VAR are treated symmetrically in a structural sense: each variable has an equation explaining its evolution based on its own lags and the lags of the other model variables. First the variables are put into a $(k \times 1)$ vector, k being the number of variables involved. A VAR(p) model is defined as:

$$y_t = a + \sum_{i=1}^{p} A_i y_{t-i} + \varepsilon_t$$

Where:
• $y_t$ is the vector of response time series variables at time t
• $a$ is a constant vector to be estimated from the data
• $A_i$ are k-by-k autoregressive matrices, one for each of the p lags
• $\varepsilon_t$ is the vector of errors

For the VAR model I am using prices and futures, because production was shown earlier to be statistically insignificant. To change my data into the appropriate form:

long = append(rev(prices), rev(futures))   # stack the two series into one vector
long
datamatrix = matrix(c(long), 48, 2)        # reshape into a 48 x 2 matrix, one column per series
datamatrix
TS5 = as.ts(datamatrix)                    # note: this reuses the name TS5 from the differencing section
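For k = 2 (prices and futures), a VAR(1) writes out as a pair of linked equations, which may make the structure clearer:

$$\text{prices}_t = a_1 + A_{11}\,\text{prices}_{t-1} + A_{12}\,\text{futures}_{t-1} + \varepsilon_{1,t}$$
$$\text{futures}_t = a_2 + A_{21}\,\text{prices}_{t-1} + A_{22}\,\text{futures}_{t-1} + \varepsilon_{2,t}$$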
Order Selection

To pick the order of my VAR(p) model:

VARselect(TS5, lag.max = 6, type = "const")

This gave me:

                    1             2             3             4             5             6
AIC(n)   1.499415e+01  1.495799e+01  1.508302e+01  1.501362e+01  1.511536e+01  1.504350e+01
HQ(n)    1.508514e+01  1.510963e+01  1.529533e+01  1.528659e+01  1.544898e+01  1.543779e+01
SC(n)    1.524239e+01  1.537172e+01  1.566224e+01  1.575834e+01  1.602556e+01  1.611920e+01
FPE(n)   3.251535e+06  3.141637e+06  3.574357e+06  3.358867e+06  3.761570e+06  3.560736e+06

Selected orders: AIC(n) = 2, HQ(n) = 1, SC(n) = 1, FPE(n) = 2

Two selection criteria told me to choose order 1 and two told me to choose order 2, so I chose a VAR(1) model to keep the number of parameters at a minimum and avoid overfitting.

Forecasting

To fit my chosen model order in R:

modelVAR = VAR(TS5, p = 1, type = "const")

Forecast with the model:

VARpred = predict(modelVAR, n.ahead = 7)
VARpred
$Series.1
         fcst     lower     upper        CI
[1,] 1524.295  1328.109  1720.481  196.1855
[2,] 1559.308  1300.869  1817.747  258.4392
[3,] 1586.911  1291.571  1882.251  295.3399
[4,] 1610.743  1291.150  1930.336  319.5926
[5,] 1631.060  1294.891  1967.229  336.1690
[6,] 1648.411  1300.656  1996.167  347.7555
[7,] 1663.225  1307.260  2019.189  355.9643

$Series.2
         fcst     lower     upper        CI
[1,] 1538.577  1348.484  1728.671  190.0934
[2,] 1570.838  1320.305  1821.370  250.5328
[3,] 1597.917  1311.308  1884.526  286.6090
[4,] 1621.091  1310.810  1931.372  310.2812
[5,] 1640.871  1314.414  1967.328  326.4573
[6,] 1657.760  1319.999  1995.522  337.7616
[7,] 1672.180  1326.411  2017.949  345.7694

Where Series.1 is aluminium prices and Series.2 is futures.
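The vars package (which supplies the VAR and VARselect functions used above) also offers a one-line way to visualise these forecasts with their confidence bands:

fanchart(VARpred)   # plots both series' forecasts with shaded confidence regions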
Comparing ARIMA and VAR Forecasts

The prediction comparison was produced in R by:

plot(c(40:55), TS1[40:55], xlim = c(40, 55), type = 'l', ylab = "Time Series", xlab = "Time")
par(lwd = 1)
points(c(48:55), c(TS1[48], TSpredict$pred), type = 'l', col = "red")                # ARIMA forecast
points(c(48:55), c(TS1[48], VARpred$fcst$Series.1[, 1]), type = 'l', col = "green")  # VAR forecast
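Numerically, the two forecasts can be compared against the months for which actual prices are available (a minimal sketch using the objects defined above):

actual = c(1526.50, 1621.00)                           # observed January and February 2016 prices
arima_err = abs(actual - TSpredict$pred[1:2])          # ARIMA absolute errors
var_err = abs(actual - VARpred$fcst$Series.1[1:2, 1])  # VAR absolute errors
c(ARIMA = mean(arima_err), VAR = mean(var_err))        # mean absolute error of each model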
Section7: Conclusion

Overall I found this topic of study quite interesting and enjoyable, as I have a keen interest in financial markets. I'm sure that what I have learned over the course of this project will be of great use to me in the future. Forecasting prices is a very difficult task, but I think I have done a reasonable job of it. I think that learning R is something that will serve me well in my future career. Even though it took a long time to write the code and learn its commands, I now feel comfortable with the language and look forward to using it again.

I found it quite difficult to find data that would explain the movements of price in my regression model. Most of the data I wanted to include did not come freely; an LME subscription was needed to view most of their data. I wanted to include LME warehouse inventories in my data but was unable to. I also wanted to include variables that could affect the production of aluminium, such as energy costs, but again I found this task difficult.

Overall, I found the project a thoroughly enjoyable experience. I found the study of time series and its applications both useful and interesting, and I have developed a greater understanding of the subject.

References

• "Econometric Modelling of Non-Ferrous Metal Prices" – Watkins & McAleer (2004)
• "Introduction to Time Series and Forecasting" – Brockwell & Davis (2002)
• "Forecasting and Time Series Analysis" – Montgomery & Johnson (1990)
• "Forecasting, Time Series, and Regression" – Bowerman & O'Connell (2004)
• "Forecasting Gold Prices Using Multiple Linear Regression Method" [American Journal of Applied Sciences, Volume 6, Issue 8] – Ismail et al. (2009)
• "Prediction of Steel Prices: A Comparison Between a Conventional Regression Model and MSSA" [Statistics and Its Interface, Volume 3 (2010)] – Kapl & Muller (2010)
• "Time Series Analysis of Stock Prices Using the Box-Jenkins Approach" – Green (2011)
• "Comparison of ARIMA and Artificial Neural Networks Models for Stock Price Prediction" [Journal of Applied Mathematics, Volume 2014] – Adebiyi et al. (2014)