5/12/2019 Time Series Analysis MATH1318
file:///C:/Users/kumar/Downloads/Semester 3/Time Series Analysis/Assignment 2/Egg Deposition_Time Series Analysis.nb.html 1/27
Time Series Analysis MATH1318
Arpan Kumar (s3696599)
May 12, 2019
Importing necessary libraries
library(TSA)
library(lmtest)
library(tseries)
library(rlang)
library(pillar)
library(forecast)
Introduction
Coregonus hoyi is a silver-coloured freshwater fish found mostly in Lake Nipigon and the Great Lakes, where it
inhabits underwater slopes. It is also known as the Bloater and belongs to the family Salmonidae.
The objective of this report is to identify any trend or seasonality in the egg depositions of age-3 Bloaters in Lake
Huron (one of the five Great Lakes of North America) using time series analysis methods, and to predict egg
depositions for the next five years. The report covers finding a relevant model and applying suitable approaches,
using visualisation and R functions, to fit a model to the provided dataset.
Reading Dataset
Eggs_Depositions <- read.csv("C:/Users/kumar/Downloads/Semester 3/Time Series Analysis/Assignment 2/eggs.csv")
The egg deposition series is available as the BloaterLH dataset in the FSAdata package and consists of two
variables: Year (1981 to 1996), a numerical variable, and egg depositions (in millions).
Before converting the dataset into a time series, it is important to check its class.
class(Eggs_Depositions)
[1] "data.frame"
Converting the ‘data frame’ into ‘time-series’ format using ts() function
Data <- ts(as.vector(Eggs_Depositions$eggs),start = 1981, end = 1996, frequency =1)
class(Data)
[1] "ts"
Data Exploration
Time Series visualisation
plot(Data,ylab = "Egg deposition (in Mns)",xlab="Years", main = "Figure 1, Egg depositions of age 3 Bloaters in Lake Huron (1981-1996)",type="o",col="darkblue",xaxt="n")
axis(1,at=seq(1981,1996,by=1),las=2)
From Figure 1, a trend is clearly visible. Egg depositions rise to a peak in 1990, decline until 1993 and then
increase again after 1993. There also appear to be changes in variance within the series. In addition, each
observation appears to follow closely from the previous one, suggesting autoregressive behaviour. These features
make it challenging to prepare the data for predicting the next five years.
Scatterplot comparing lagged value
y=Data
x=zlag(Data)
index=2:length(x)
cor(y[index],x[index])
[1] 0.7445657
plot(y=y,x=x,ylab = "Egg deposition (in Mns)",xlab="Egg depositions for previous year (in Mns)",
main = "Figure 2, Scatter plot for Egg depositions against its lagged value",col="darkblue")
With a correlation of 0.74 and the pattern in Figure 2, there is a strong positive correlation between a year's egg
deposition and its lagged value (the previous year's egg deposition).
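The lag-1 correlation above relies on TSA's zlag(), which shifts the series by one position. As a minimal base-R sketch (the helper name lag1_cor is hypothetical and not part of TSA), the same quantity can be computed by pairing each value with the one before it; applied to Data, it computes the same correlation as the cor(y[index], x[index]) call above.

```r
# Lag-1 correlation without TSA::zlag(): pair each value y_t with y_{t-1}.
lag1_cor <- function(x) {
  x <- as.numeric(x)
  n <- length(x)
  current  <- x[2:n]         # y_t,     t = 2..n
  previous <- x[1:(n - 1)]   # y_{t-1}, t = 2..n
  cor(current, previous)
}

# Toy check (not the egg data): a straight line has lag-1 correlation 1
lag1_cor(c(1, 2, 3, 4, 5))
```

With the report's series, lag1_cor(Data) drops the first year exactly as index = 2:length(x) does.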
Interpreting Time Series using Modeling Techniques
To select the best model, different trend models are fitted and compared to identify the one that fits the data
best.
Linear Model
model1 = lm(Data~time(Data))
summary(model1)
Call:
lm(formula = Data ~ time(Data))
Residuals:
Min 1Q Median 3Q Max
-0.4048 -0.2768 -0.1933 0.2536 1.1857
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) -165.98275 49.58836 -3.347 0.00479 **
time(Data) 0.08387 0.02494 3.363 0.00464 **
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 0.4598 on 14 degrees of freedom
Multiple R-squared: 0.4469, Adjusted R-squared: 0.4074
F-statistic: 11.31 on 1 and 14 DF, p-value: 0.004642
plot(Data, ylab = "Egg deposition (in Mns)",xlab="Years", main = "Figure 3, Fitted linear trend model",type="o",col="darkblue",xaxt="n")
axis(1,at=seq(1981,1996,by=1),las=2)
abline(model1,col="Blue",lty=2)
res.model1 = rstudent(model1)
plot(y=res.model1, x = as.vector(time(Data)),xlab="Years",ylab="Standardised Residuals",main =
"Figure 4, Residual of linear trend model",type="o",col="darkblue",xaxt="n")
axis(1,at=seq(1981,1996,by=1),las=2)
qqnorm(res.model1,main="Figure 5, Normal QQ Plot for residual values")
qqline(res.model1,col =4,lwd=1,lty=2)
shapiro.test(res.model1)
Shapiro-Wilk normality test
data: res.model1
W = 0.7726, p-value = 0.001205
From the linear model summary, the adjusted R-squared is about 0.41, so the linear trend explains less than half of
the variance in the series;
from Figure 5, the residuals deviate noticeably from the reference line, indicating that they are not normally
distributed; and
from the Shapiro-Wilk test, the p-value (0.0012) is less than 0.05, so normality of the residuals is rejected.
It can therefore be inferred that the linear model is not a good model to proceed with.
Quadratic Model
t = time(Data)
t2 = t^2
model2= lm(Data ~ t+t2)
summary(model2)
Call:
lm(formula = Data ~ t + t2)
Residuals:
Min 1Q Median 3Q Max
-0.50896 -0.25523 -0.02701 0.16615 0.96322
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) -4.647e+04 2.141e+04 -2.170 0.0491 *
t 4.665e+01 2.153e+01 2.166 0.0494 *
t2 -1.171e-02 5.415e-03 -2.163 0.0498 *
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 0.4092 on 13 degrees of freedom
Multiple R-squared: 0.5932, Adjusted R-squared: 0.5306
F-statistic: 9.479 on 2 and 13 DF, p-value: 0.00289
plot(ts(fitted(model2)),ylim=c(min(c(fitted(model2),as.vector(Data))),max(c(fitted(model2),as.vector(Data)))),
     ylab="Egg deposition (in Mns)", main = "Figure 6, fitted quadratic model", type ="l",lty =2, col="blue",xlab="Years")
lines(as.vector(Data),type="o")
res.model2 = rstudent(model2)
plot(y=res.model2, x= as.vector(time(Data)),xlab="Year",ylab="Standardised Residuals",type="o",
     xaxt="n",main="Figure 7, Residual of quadratic model",col="darkblue")
axis(1,at=seq(1981,1996,by=1),las=2)
abline(h=0, col="Blue")
qqnorm(res.model2,main="Figure 8, Normal QQ Plot for residual values")
qqline(res.model2, col=4,lwd=1,lty=2)
shapiro.test(res.model2)
Shapiro-Wilk normality test
data: res.model2
W = 0.87948, p-value = 0.03809
From the quadratic model summary, the adjusted R-squared is 0.53, so the quadratic trend explains more of the
variance than the linear model;
from Figure 8, the residuals lie closer to the reference line than in Figure 5; and
from the Shapiro-Wilk test, the p-value (0.038) is still less than 0.05.
It can be inferred that the quadratic model is slightly better than the linear model, with a higher adjusted
R-squared (53%) and QQ plot points closer to the reference line. However, for both models the Shapiro-Wilk
p-value is less than 0.05, so the null hypothesis is rejected and the residuals are not normally distributed.
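The very large raw coefficients of the quadratic model (for example, an intercept of about -4.6e4) arise because year and year squared are almost collinear over 1981 to 1996. A minimal sketch, assuming centred time is acceptable for reporting (this is not part of the original analysis): fitting the same quadratic on time centred at its mean gives the same fitted values and R-squared, with a better-conditioned design and more readable coefficients. The helper fit_quadratic and the synthetic series are hypothetical.

```r
# Quadratic trend on centred time: same fitted values as lm(Data ~ t + t2),
# but the design matrix is far better conditioned than with raw years.
fit_quadratic <- function(y, years) {
  tc <- years - mean(years)        # centred time
  lm(y ~ tc + I(tc^2))
}

# Synthetic series over the same years (not the egg data)
years <- 1981:1996
set.seed(1)
y <- 0.5 + 0.08 * (years - 1988) - 0.01 * (years - 1988)^2 + rnorm(16, sd = 0.1)

m_raw <- lm(y ~ years + I(years^2))   # raw-year parameterisation, as in model2
m_cen <- fit_quadratic(y, years)
all.equal(unname(fitted(m_raw)), unname(fitted(m_cen)))   # same fitted values
coef(m_cen)                                               # small, interpretable coefficients
```

On the report's data the call would be fit_quadratic(as.vector(Data), as.vector(time(Data))); only the coefficients change, not the fit.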
Preparing Dataset
Hypothesis Testing
ACF (autocorrelation function) and PACF (partial autocorrelation function).
ACF and PACF are used for an initial check of the following hypotheses.
H0: Dataset is non-stationary
HA: Dataset is stationary
The ACF measures how the present values of the series are correlated with their lagged (past) values.
acf(Data)
The PACF measures the correlation between the series and its value at lag k after removing the effect of the intermediate lags 1, ..., k-1.
pacf(Data)
From the ACF and PACF plots above, the ACF shows a slowly decaying pattern and the PACF has a high first-lag
correlation. This indicates the presence of a trend and that the series is non-stationary. To make the series
stationary, transformation and differencing are applied to the data.
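The sample ACF at lag k is r_k = sum_{t=k+1}^{n} (y_t - ybar)(y_{t-k} - ybar) / sum_{t=1}^{n} (y_t - ybar)^2. The sketch below (the helper sample_acf is hypothetical) computes it directly and checks it against acf() on a synthetic random walk, a series whose ACF typically decays slowly like the one described above. It also checks that the lag-1 PACF equals the lag-1 ACF, since there are no intermediate lags to remove. stats::acf is called explicitly because TSA, loaded at the top, provides its own acf() that omits lag 0 by default.

```r
# Sample autocorrelation at lag k:
#   r_k = sum_{t=k+1}^{n} (y_t - ybar)(y_{t-k} - ybar) / sum_{t=1}^{n} (y_t - ybar)^2
sample_acf <- function(y, k) {
  y <- as.numeric(y)
  n <- length(y)
  d <- y - mean(y)
  sum(d[(k + 1):n] * d[1:(n - k)]) / sum(d^2)
}

set.seed(42)
y <- cumsum(rnorm(30))                                         # synthetic random walk (not the egg data)
manual  <- sapply(1:5, function(k) sample_acf(y, k))
builtin <- stats::acf(y, lag.max = 5, plot = FALSE)$acf[2:6]  # element 1 is lag 0
all.equal(manual, builtin)
# With no intermediate lags to remove, the lag-1 PACF equals the lag-1 ACF
all.equal(stats::pacf(y, lag.max = 5, plot = FALSE)$acf[1], builtin[1])
```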
Transformation
Before transforming the dataset, the Augmented Dickey-Fuller (ADF) test is used to confirm statistically that the
series is non-stationary, and the Shapiro-Wilk test is used to check normality.
adf.test(Data) #Augmented Dickey-fuller test
Augmented Dickey-Fuller Test
data: Data
Dickey-Fuller = -2.0669, Lag order = 2, p-value = 0.5469
alternative hypothesis: stationary
shapiro.test(Data) #Shapiro-Wilk normality test
Shapiro-Wilk normality test
data: Data
W = 0.94201, p-value = 0.3744
Since the p-value of the Dickey-Fuller test (0.55) is greater than 0.05, we fail to reject the null hypothesis; in
other words, the series is non-stationary.
In the Shapiro-Wilk test, the p-value (0.37) is greater than 0.05, so we fail to reject the null hypothesis of
normality; there is no significant evidence that the series departs from normality.
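The ADF test regresses the differenced series on a constant, a linear trend, the lagged level and k lagged differences, and uses the t-value of the lagged-level coefficient as its statistic; this is compared with Dickey-Fuller critical values rather than the usual t distribution. The sketch below (helper name hypothetical) computes that statistic with lm(). It assumes tseries' default lag order trunc((n - 1)^(1/3)), which gives k = 2 for the 16 annual values, consistent with the "Lag order = 2" reported above. It does not compute the p-value, which adf.test() interpolates from tabulated critical values.

```r
# Augmented Dickey-Fuller regression with constant and linear trend:
#   dy_t = a + b*t + g*y_{t-1} + d_1*dy_{t-1} + ... + d_k*dy_{t-k} + e_t
# The ADF statistic is the t-value of g; H0: g = 0 (unit root, non-stationary).
adf_stat <- function(x, k = trunc((length(x) - 1)^(1/3))) {
  x  <- as.numeric(x)
  dx <- diff(x)
  n  <- length(dx)
  z  <- embed(dx, k + 1)          # column 1: dy_t; columns 2..k+1: dy_{t-1}..dy_{t-k}
  idx <- (k + 1):n                # positions of dy_t with a full set of lags
  yt  <- z[, 1]
  xt1 <- x[idx]                   # lagged level y_{t-1}
  tt  <- idx                      # linear trend
  fit <- if (k > 0) lm(yt ~ xt1 + tt + z[, -1, drop = FALSE]) else lm(yt ~ xt1 + tt)
  list(lag_order = k,
       statistic = unname(summary(fit)$coefficients["xt1", "t value"]))
}

set.seed(1)
rw <- cumsum(rnorm(16))           # synthetic random walk (not the egg data)
adf_stat(rw)
```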
Box-Cox Transformation
Data_T=BoxCox.ar(Data,method="yule-walker")
possible convergence problem: optim gave code = 1
possible convergence problem: optim gave code = 1
Data_T
$`lambda`
[1] -2.0 -1.9 -1.8 -1.7 -1.6 -1.5 -1.4 -1.3 -1.2 -1.1 -1.0 -0.9 -0.8 -0.7 -0.6 -0.5 -0.4 -0.3 -0.2 -0.1 0.0 0.1 0.2 0.3 0.4 0.5
[27] 0.6 0.7 0.8 0.9 1.0 1.1 1.2 1.3 1.4 1.5 1.6 1.7 1.8 1.9 2.0
$loglike
[1] -18.4457011 -15.1801393 -11.9522118 -8.7654260 -5.6237637 -2.5317646 0.5053751 3.4816875 6.3903005 9.2232907 11.9715279
[12] 14.6245247 17.1703124 19.5953784 21.8847108 24.0220046 25.9900870 27.7715993 29.3499352 30.7103710 31.8412544 32.7350622
[23] 33.3891294 33.8059042 33.9926850 33.9609089 33.7251475 33.3019920 32.7089913 31.9637533 31.0832597 30.0833992 28.9786936
[34] 27.7821762 26.5053815 25.1584095 23.7500337 22.2878321 20.7783238 19.2271034 17.6389661
$mle
[1] 0.4
$ci
[1] 0.1 0.8
The maximum likelihood estimate of lambda is 0.4, and the 95% confidence interval for lambda is [0.1, 0.8]. The
midpoint of this interval, 0.45, is used as the lambda value for the Box-Cox transformation.
lambda = 0.45
Data_T_BoxCox = (Data^lambda-1)/lambda
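The transformation applied above belongs to the Box-Cox family: (y^lambda - 1)/lambda for lambda not equal to 0, and log(y) for lambda = 0; it requires strictly positive values. A minimal sketch of the transform and its inverse (the helper names are hypothetical). The inverse is what is needed to express forecasts made on the transformed scale back in millions of eggs.

```r
# Box-Cox transform (lambda = 0 is the log case) and its inverse; y must be > 0
box_cox <- function(y, lambda) {
  if (lambda == 0) log(y) else (y^lambda - 1) / lambda
}
inv_box_cox <- function(z, lambda) {
  if (lambda == 0) exp(z) else (lambda * z + 1)^(1 / lambda)
}

y <- c(0.5, 1, 2, 4)                    # toy positive values
z <- box_cox(y, 0.45)                   # same formula as Data_T_BoxCox
all.equal(inv_box_cox(z, 0.45), y)      # round trip recovers y
box_cox(1, 0.45)                        # 1 always maps to 0
```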
Normality check post BoxCox transformation
qqnorm(Data_T_BoxCox,main="Figure 9, Normal QQ Plot post BoxCox transformation, lambda=0.45")
qqline(Data_T_BoxCox,col=4)
shapiro.test(Data_T_BoxCox)
Shapiro-Wilk normality test
data: Data_T_BoxCox
W = 0.96269, p-value = 0.7107
The Shapiro-Wilk p-value increases from 0.374 to 0.711, and in Figure 9 the points deviate less from the reference
line. This indicates that the Box-Cox transformation has improved the normality of the series, and there is no
evidence against normality of the transformed series.
Differencing
First Differencing
Data_T_BoxCox_diff = diff(Data_T_BoxCox)
plot(Data_T_BoxCox_diff,type="o",ylab="Egg deposition (in Mns)", main = "Figure 10, First differencing plot on transformed data",xaxt="n",xlab="Years",col="darkblue")
axis(1,at=seq(1981,1996,by=1),las=2)
adf.test(Data_T_BoxCox_diff)
Augmented Dickey-Fuller Test
data: Data_T_BoxCox_diff
Dickey-Fuller = -3.6798, Lag order = 2, p-value = 0.0443
alternative hypothesis: stationary
With a p-value of 0.044, less than the significance level of 0.05, we reject the null hypothesis, which means the
series is stationary after first differencing. However, a trend can still be observed in Figure 10, so second
differencing is applied to eliminate it.
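Second differencing applies the difference operator twice, giving y_t - 2y_{t-1} + y_{t-2}; each difference removes one observation, so the 16 annual values become 14. The toy sketch below (not the egg data) illustrates why a trend that survives one difference can be removed by a second: a quadratic trend becomes linear after one difference and constant after two.

```r
# Second differencing: y_t - 2*y_{t-1} + y_{t-2}, i.e. differencing twice.
x  <- c(1, 4, 9, 16, 25, 36)      # toy quadratic trend t^2 (not the egg data)
d1 <- diff(x)                      # 3 5 7 9 11 : a linear trend remains
d2 <- diff(x, differences = 2)     # 2 2 2 2    : trend removed
identical(d2, diff(diff(x)))       # same as differencing the first differences
length(x) - length(d2)             # 2 observations lost
```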
Second Differencing
Data_T2_BoxCox_diff = diff(Data_T_BoxCox,differences = 2)
plot(Data_T2_BoxCox_diff,type="o",ylab="Egg deposition (in Mns)", main = "Figure 11, Second differencing plot on transformed data",xaxt="n",xlab="Years",col="darkblue")
axis(1,at=seq(1981,1996,by=1),las=2)
Figure 11 no longer shows a trend, suggesting that the series is now stationary. To check this, adf.test is applied
again.
adf.test(Data_T2_BoxCox_diff)
Augmented Dickey-Fuller Test
data: Data_T2_BoxCox_diff
Dickey-Fuller = -3.1733, Lag order = 2, p-value = 0.1254
alternative hypothesis: stationary
Figure 11 shows no remaining trend after second differencing. The ADF p-value (0.125) is greater than 0.05, so the
test alone does not confirm stationarity; however, with only 14 observations after second differencing the test has
limited power. Based on the visual evidence, the second-differenced series is used for further analysis.
Modeling
Model Specification
With the Box-Cox transformation and second differencing, the trend has been removed from the series (Figure 11),
and multiple candidate models are now built using the approaches covered in the course. Before fitting, model
specification is carried out using several approaches.
Model specification using ACF and PACF for the differenced series
acf(Data_T2_BoxCox_diff)
pacf(Data_T2_BoxCox_diff)
From the ACF plot, no lag is significant. The PACF plot, however, shows a significant lag at 4, suggesting that the
series is not pure white noise. Therefore ARIMA(1,2,0) is considered as a candidate model.
Model specification using EACF (Extended ACF) for the differenced series
The EACF can be used to identify the orders of the AR (autoregressive) and MA (moving average) components of an
ARMA model.
eacf(Data_T2_BoxCox_diff, ar.max=3, ma.max=3)
AR/MA
0 1 2 3
0 o o o o
1 o o o o
2 o o o o
3 o o o o
In the EACF output, the top-left vertex of the triangle of 'o's is at (0,0), which suggests white noise behaviour.
Considering the neighbouring points, ARIMA(0,2,1), ARIMA(1,2,1) and ARIMA(1,2,0) are additionally taken forward
for further analysis.
Model specification using BIC (Bayesian Information Criterion) for the differenced series
res = armasubsets(y=Data_T2_BoxCox_diff,nar=3,nma=2,y.name='test',ar.method='ols')
model order: 7 singularities in the computation of the projection matrix results are only valid
up to model order 6
plot(res)
From the BIC subset plot above, the shaded columns correspond to the AR(1) and AR(3) coefficients, and two MA
effects, MA(1) and MA(2), can also be seen. Using these coefficients, four more candidate models are considered:
ARIMA(1,2,1), ARIMA(1,2,2), ARIMA(3,2,1) and ARIMA(3,2,2).
The resulting set of candidate ARIMA models is:
ARIMA(1,2,0) - using ACF and PACF
ARIMA(0,2,1) - using EACF
ARIMA(1,2,1) - using BIC
ARIMA(1,2,2) - using BIC
ARIMA(3,2,1) - using BIC
ARIMA(3,2,2) - using BIC
Parameter estimation
ARIMA(1,2,0)
model_120_css = arima(Data,order=c(1,2,0),method='CSS')
coeftest(model_120_css)
z test of coefficients:
Estimate Std. Error z value Pr(>|z|)
ar1 -0.45944 0.23810 -1.9296 0.05365 .
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
model_120_ml = arima(Data,order=c(1,2,0),method='ML')
coeftest(model_120_ml)
z test of coefficients:
Estimate Std. Error z value Pr(>|z|)
ar1 -0.42966 0.22743 -1.8892 0.05886 .
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
AR(1) is insignificant at the 5% level under both CSS and ML estimation (p = 0.054 and 0.059).
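The z values and p-values reported by coeftest() for these arima fits follow the Wald form z = estimate / standard error, with a two-sided normal p-value. A minimal sketch (the helper wald_z is hypothetical) that reproduces the ARIMA(1,2,0) CSS row above from its estimate and standard error:

```r
# Wald z-test as reported by lmtest::coeftest() for arima() fits:
#   z = estimate / std. error,  two-sided p = 2 * P(Z > |z|)
wald_z <- function(estimate, se) {
  z <- estimate / se
  c(z = z, p = 2 * pnorm(-abs(z)))
}

# ARIMA(1,2,0), CSS row from the output above
wald_z(-0.45944, 0.23810)   # z about -1.93, p about 0.054
```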
ARIMA(0,2,1)
model_021_css = arima(Data,order=c(0,2,1),method='CSS')
coeftest(model_021_css)
z test of coefficients:
Estimate Std. Error z value Pr(>|z|)
ma1 -1.066739 0.071847 -14.847 < 2.2e-16 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
model_021_ml = arima(Data,order=c(0,2,1),method='ML')
coeftest(model_021_ml)
z test of coefficients:
Estimate Std. Error z value Pr(>|z|)
ma1 -1.00000 0.25823 -3.8725 0.0001077 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
MA(1) is significant under both CSS and ML estimation. Note that the ML estimate (-1.00) lies on the boundary of the invertibility region.
ARIMA(1,2,1)
model_121_css = arima(Data,order=c(1,2,1),method='CSS')
coeftest(model_121_css)
z test of coefficients:
Estimate Std. Error z value Pr(>|z|)
ar1 0.073817 0.284315 0.2596 0.7951
ma1 -1.132556 0.074796 -15.1419 <2e-16 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
model_121_ml = arima(Data,order=c(1,2,1),method='ML')
coeftest(model_121_ml)
z test of coefficients:
Estimate Std. Error z value Pr(>|z|)
ar1 0.071764 0.269251 0.2665 0.7898
ma1 -0.999999 0.236872 -4.2217 2.425e-05 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
AR(1) is insignificant under both CSS and ML estimation, while MA(1) is significant under both.
ARIMA(1,2,2)
model_122_css = arima(Data,order=c(1,2,2),method='CSS')
coeftest(model_122_css)
z test of coefficients:
Estimate Std. Error z value Pr(>|z|)
ar1 1.005671 0.049778 20.203 < 2.2e-16 ***
ma1 -2.824099 0.125344 -22.531 < 2.2e-16 ***
ma2 1.838559 0.114620 16.040 < 2.2e-16 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
model_122_ml = arima(Data,order=c(1,2,2),method='ML')
coeftest(model_122_ml)
z test of coefficients:
Estimate Std. Error z value Pr(>|z|)
ar1 0.058932 1.166803 0.0505 0.9597
ma1 -0.987060 1.155631 -0.8541 0.3930
ma2 -0.012925 1.130998 -0.0114 0.9909
AR(1), MA(1) and MA(2) are all significant under CSS, but none of them is significant under ML estimation.
ARIMA(3,2,1)
model_321_css = arima(Data,order=c(3,2,1),method='CSS')
coeftest(model_321_css)
z test of coefficients:
Estimate Std. Error z value Pr(>|z|)
ar1 -0.17209 0.26139 -0.6584 0.510286
ar2 -0.19198 0.25364 -0.7569 0.449106
ar3 -0.52748 0.24922 -2.1165 0.034300 *
ma1 -0.64906 0.24639 -2.6343 0.008432 **
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
model_321_ml = arima(Data,order=c(3,2,1),method='ML')
coeftest(model_321_ml)
z test of coefficients:
Estimate Std. Error z value Pr(>|z|)
ar1 0.0042103 0.3147996 0.0134 0.9893
ar2 -0.0431425 0.2891200 -0.1492 0.8814
ar3 -0.3350403 0.2798031 -1.1974 0.2311
ma1 -0.9018704 0.6489515 -1.3897 0.1646
AR(3) and MA(1) are significant under CSS, while AR(1), AR(2), AR(3) and MA(1) are all insignificant under ML
estimation.
ARIMA(3,2,2)
model_322_css = arima(Data,order=c(3,2,2),method='CSS')
coeftest(model_322_css)
z test of coefficients:
Estimate Std. Error z value Pr(>|z|)
ar1 -0.139922 0.348205 -0.4018 0.68780
ar2 -0.198602 0.258520 -0.7682 0.44235
ar3 -0.544728 0.277057 -1.9661 0.04928 *
ma1 -0.683391 0.360419 -1.8961 0.05795 .
ma2 0.045948 0.329455 0.1395 0.88908
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
model_322_ml = arima(Data,order=c(3,2,2),method='ML')
coeftest(model_322_ml)
z test of coefficients:
Estimate Std. Error z value Pr(>|z|)
ar1 0.132356 0.472334 0.2802 0.77931
ar2 -0.099038 0.323387 -0.3063 0.75941
ar3 -0.384399 0.314004 -1.2242 0.22088
ma1 -0.996970 0.589359 -1.6916 0.09072 .
ma2 0.199121 0.472894 0.4211 0.67371
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
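AR(3) is significant under CSS, while all coefficients (AR(1), AR(2), AR(3), MA(1) and MA(2)) are insignificant under
ML estimation.
Comparing all the results above, ARIMA(0,2,1) and ARIMA(1,2,2) are the only models whose coefficients are all
significant under CSS, and only ARIMA(0,2,1) also has all its coefficients significant under ML estimation.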
Sort AIC and BIC based on score
Function to sort models by their AIC or BIC score
sort.score <- function(x, score = c("bic", "aic")){
  score <- match.arg(score)  # defaults to "bic"; rejects invalid values
  if (score == "aic") x[with(x, order(AIC)), ] else x[with(x, order(BIC)), ]
}
Sorting AIC model
sort.score(AIC(model_120_ml,model_021_ml,model_121_ml,model_122_ml,model_321_ml,model_322_ml),score="aic")
             df      AIC
model_021_ml  2 22.74602
model_121_ml  3 24.67428
model_120_ml  2 26.57611
model_122_ml  4 26.67412
model_321_ml  5 26.90919
model_322_ml  6 28.75165
sort.score(BIC(model_120_ml,model_021_ml,model_121_ml,model_122_ml,model_321_ml,model_322_ml),score="bic")
             df      BIC
model_021_ml  2 24.02413
model_121_ml  3 26.59145
model_120_ml  2 27.85423
model_122_ml  4 29.23035
model_321_ml  5 30.10448
model_322_ml  6 32.58599
ARIMA(0,2,1) has the lowest AIC (22.75) and the lowest BIC (24.02), so it is the best model according to both criteria.
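Both criteria penalise the maximised log-likelihood: AIC = -2 logLik + 2k and BIC = -2 logLik + k log(n), where k counts the estimated parameters (including the noise variance) and n is the number of observations used. The reported values are consistent with n = 14, the 16 annual values minus the two lost to second differencing. The sketch below (helper name hypothetical) recovers the ARIMA(0,2,1) log-likelihood from its AIC and reproduces its BIC.

```r
# AIC = -2*logLik + 2*k ;  BIC = -2*logLik + k*log(n)
aic_bic <- function(loglik, k, n) {
  c(AIC = -2 * loglik + 2 * k, BIC = -2 * loglik + k * log(n))
}

# ARIMA(0,2,1): k = 2 (ma1 and sigma^2), n = 16 - 2 = 14 after second differencing
ll <- (2 * 2 - 22.74602) / 2          # log-likelihood implied by the reported AIC
aic_bic(ll, k = 2, n = 14)            # AIC 22.74602, BIC about 24.024
```

Because log(14) is about 2.64, larger than 2, BIC penalises each extra parameter more heavily than AIC, so the gap between the two criteria widens as the df column grows.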
Overfitting
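ARIMA(1,2,1) and ARIMA(1,2,2) both extend ARIMA(0,2,1) with additional parameters. Under ML estimation their extra coefficients (AR(1), and MA(2) for ARIMA(1,2,2)) are insignificant, and both have higher AIC and BIC, so these overfitted models do not improve on ARIMA(0,2,1).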
Best model selection
After comparing the AIC and BIC results and checking the overfitted models, it can be concluded that
ARIMA(0,2,1) is the best model for predicting egg depositions (in Mns) over the next five years.
Model Diagnostics
For the selected model, the standardised residuals are analysed for normality and autocorrelation, and the
Ljung-Box test is performed to check the adequacy of the selected model.
residual.analysis <- function(model, std = TRUE){
  # Requires TSA (rstandard() for Arima fits) and FitAR (LBQPlot());
  # install FitAR once beforehand rather than inside the function
  library(TSA)
  library(FitAR)
  if (std == TRUE){
    res.model = rstandard(model)   # standardised residuals
  }else{
    res.model = residuals(model)   # raw residuals
  }
  par(mfrow=c(3,2))
  plot(res.model,type='o',ylab='Standardised residuals', main="Time series plot of standardised residuals")
  abline(h=0)
  hist(res.model,main="Histogram of standardised residuals")
  qqnorm(res.model,main="QQ plot of standardised residuals")
  qqline(res.model, col = 2)
  acf(res.model,main="ACF of standardised residuals")
  pacf(res.model,main="PACF of standardised residuals")
  print(shapiro.test(res.model))
  k=0
  # Ljung-Box test p-values for increasing lags
  LBQPlot(res.model, lag.max = length(model$residuals)-1, StartLag = k + 1, k = 0, SquaredQ = FALSE)
  par(mfrow=c(1,1))
}
residual.analysis(model=model_021_ml)
Shapiro-Wilk normality test
data: res.model
W = 0.92478, p-value = 0.2013
From the output above, the following can be concluded:
The time series plot of standardised residuals shows no general trend and no obvious changing variance,
supporting the selected ARIMA(0,2,1) model.
The histogram of standardised residuals is roughly similar to a normal distribution.
In the QQ plot of standardised residuals, most points lie close to the reference line. Points at the ends deviate
from it, which could be explained by the small number of observations.
With a Shapiro-Wilk p-value of 0.20, greater than 0.05, we fail to reject the null hypothesis, so there is no
evidence against normality of the residuals.
The ACF and PACF of the residuals show no significant lags, so the residuals appear to behave like white noise.
In the Ljung-Box test plot, none of the p-values fall below the red significance line, supporting the adequacy of
the model.
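The Ljung-Box statistic combines the first h residual autocorrelations, Q = n(n+2) sum_{k=1}^{h} r_k^2/(n-k), and compares Q with a chi-squared distribution on h minus the number of fitted ARMA coefficients. The sketch below (the helper ljung_box is hypothetical; the residuals are synthetic) computes Q directly and checks it against base R's Box.test(). For ARIMA(0,2,1) the fitdf argument would be 1, for the single MA coefficient.

```r
# Ljung-Box statistic on residuals:
#   Q = n(n+2) * sum_{k=1}^{h} r_k^2 / (n - k),  compared with chi-squared(h - fitdf)
ljung_box <- function(res, h, fitdf = 0) {
  res <- as.numeric(res)
  n <- length(res)
  r <- stats::acf(res, lag.max = h, plot = FALSE)$acf[2:(h + 1)]   # lags 1..h
  Q <- n * (n + 2) * sum(r^2 / (n - 1:h))
  c(Q = Q, p = pchisq(Q, df = h - fitdf, lower.tail = FALSE))
}

set.seed(7)
e <- rnorm(14)                                  # synthetic white-noise residuals
mine <- ljung_box(e, h = 6, fitdf = 1)          # fitdf = 1: one MA coefficient
ref  <- Box.test(e, lag = 6, type = "Ljung-Box", fitdf = 1)
all.equal(unname(mine["Q"]), unname(ref$statistic))
```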
Forecast
To predict or forecast the next five year ‘Egg depositions (in Mns)’, ARIMA (0,2,1) model will be leveraged.
Fit = Arima(Data,c(0,2,1),lambda=0.45)
plot(forecast(Fit,h=5),xlab="Year",ylab="Egg Deposition (in Mns)",type="o",xaxt="n",
     main="Forecasting next five year values for Egg deposition (1997-2001)",col=4,fcol=4,shadebars=TRUE)
axis(1,at=seq(1981,2001,by=1),las=2)
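forecast::Arima() with lambda fits the model to the Box-Cox transformed series and back-transforms the forecasts. A minimal base-R sketch of the same idea, under stated assumptions: the helper name is hypothetical, the data are synthetic, and estimation uses method = "ML" whereas Arima() uses its own default (CSS-ML), so numbers may differ slightly. stats::arima is named explicitly because TSA also exports an arima(). Back-transforming the point forecast without a bias adjustment gives a median forecast on the original scale; prediction intervals are omitted here.

```r
# Rough base-R analogue of Arima(Data, c(0,2,1), lambda = 0.45) + forecast(h = 5):
# transform, fit ARIMA(0,2,1), forecast, then back-transform the point forecasts.
forecast_boxcox_arima <- function(y, lambda, h = 5) {
  z   <- (y^lambda - 1) / lambda                            # Box-Cox transform
  fit <- stats::arima(z, order = c(0, 2, 1), method = "ML")
  pz  <- predict(fit, n.ahead = h)$pred                     # transformed-scale forecasts
  (lambda * pz + 1)^(1 / lambda)                            # back to the original scale
}

set.seed(3)
toy <- ts(1 + cumsum(abs(rnorm(16, 0.05, 0.1))), start = 1981)  # synthetic, positive
fc  <- forecast_boxcox_arima(toy, lambda = 0.45, h = 5)
fc   # point forecasts for 1997-2001
```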
Summary
This report covers the process followed to achieve the final goal, i.e., predicting the egg depositions of age-3
Bloaters for the five years after 1996. The first step was to make the non-stationary series stationary using a
Box-Cox transformation and differencing. Candidate models were then specified from the ACF, PACF, EACF and BIC
results. Parameters were estimated using both the CSS and ML methods, and AIC and BIC identified the best model.
Model diagnostics supported ARIMA(0,2,1) as the most suitable model for forecasting the egg depositions of
Bloaters over the next five years. The forecast indicates an increase in egg deposition over the five years after
1996.

Time Series Analysis on Egg depositions (in millions) of age-3 Lake Huron Blo...Time Series Analysis on Egg depositions (in millions) of age-3 Lake Huron Blo...
Time Series Analysis on Egg depositions (in millions) of age-3 Lake Huron Blo...
 
R language Project report
R language Project reportR language Project report
R language Project report
 
CELEBRATION 2000 (3rd Deployment) & ALP 2002 Generation of a 3D-Model of the ...
CELEBRATION 2000 (3rd Deployment) & ALP 2002Generation of a 3D-Model of the ...CELEBRATION 2000 (3rd Deployment) & ALP 2002Generation of a 3D-Model of the ...
CELEBRATION 2000 (3rd Deployment) & ALP 2002 Generation of a 3D-Model of the ...
 
50120130405032
5012013040503250120130405032
50120130405032
 
Optimum Algorithm for Computing the Standardized Moments Using MATLAB 7.10(R2...
Optimum Algorithm for Computing the Standardized Moments Using MATLAB 7.10(R2...Optimum Algorithm for Computing the Standardized Moments Using MATLAB 7.10(R2...
Optimum Algorithm for Computing the Standardized Moments Using MATLAB 7.10(R2...
 
MUMS: Bayesian, Fiducial, and Frequentist Conference - Generalized Probabilis...
MUMS: Bayesian, Fiducial, and Frequentist Conference - Generalized Probabilis...MUMS: Bayesian, Fiducial, and Frequentist Conference - Generalized Probabilis...
MUMS: Bayesian, Fiducial, and Frequentist Conference - Generalized Probabilis...
 
Longintro
LongintroLongintro
Longintro
 
Eli plots visualizing innumerable number of correlations
Eli plots   visualizing innumerable number of correlationsEli plots   visualizing innumerable number of correlations
Eli plots visualizing innumerable number of correlations
 
CLIM: Transition Workshop - Statistical Approaches for Un-Mixing Problem and ...
CLIM: Transition Workshop - Statistical Approaches for Un-Mixing Problem and ...CLIM: Transition Workshop - Statistical Approaches for Un-Mixing Problem and ...
CLIM: Transition Workshop - Statistical Approaches for Un-Mixing Problem and ...
 
Regression and Classification with R
Regression and Classification with RRegression and Classification with R
Regression and Classification with R
 
A multiplicative time series model
A multiplicative time series modelA multiplicative time series model
A multiplicative time series model
 
Byungchul Yea (Project)
Byungchul Yea (Project)Byungchul Yea (Project)
Byungchul Yea (Project)
 

Recently uploaded

Best VIP Call Girls Noida Sector 22 Call Me: 8448380779
Best VIP Call Girls Noida Sector 22 Call Me: 8448380779Best VIP Call Girls Noida Sector 22 Call Me: 8448380779
Best VIP Call Girls Noida Sector 22 Call Me: 8448380779Delhi Call girls
 
꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call
꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call
꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Callshivangimorya083
 
VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130
VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130
VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130Suhani Kapoor
 
CALL ON ➥8923113531 🔝Call Girls Chinhat Lucknow best sexual service Online
CALL ON ➥8923113531 🔝Call Girls Chinhat Lucknow best sexual service OnlineCALL ON ➥8923113531 🔝Call Girls Chinhat Lucknow best sexual service Online
CALL ON ➥8923113531 🔝Call Girls Chinhat Lucknow best sexual service Onlineanilsa9823
 
Call Girls 🫤 Dwarka ➡️ 9711199171 ➡️ Delhi 🫦 Two shot with one girl
Call Girls 🫤 Dwarka ➡️ 9711199171 ➡️ Delhi 🫦 Two shot with one girlCall Girls 🫤 Dwarka ➡️ 9711199171 ➡️ Delhi 🫦 Two shot with one girl
Call Girls 🫤 Dwarka ➡️ 9711199171 ➡️ Delhi 🫦 Two shot with one girlkumarajju5765
 
Determinants of health, dimensions of health, positive health and spectrum of...
Determinants of health, dimensions of health, positive health and spectrum of...Determinants of health, dimensions of health, positive health and spectrum of...
Determinants of health, dimensions of health, positive health and spectrum of...shambhavirathore45
 
ALSO dropshipping via API with DroFx.pptx
ALSO dropshipping via API with DroFx.pptxALSO dropshipping via API with DroFx.pptx
ALSO dropshipping via API with DroFx.pptxolyaivanovalion
 
Ravak dropshipping via API with DroFx.pptx
Ravak dropshipping via API with DroFx.pptxRavak dropshipping via API with DroFx.pptx
Ravak dropshipping via API with DroFx.pptxolyaivanovalion
 
Data-Analysis for Chicago Crime Data 2023
Data-Analysis for Chicago Crime Data  2023Data-Analysis for Chicago Crime Data  2023
Data-Analysis for Chicago Crime Data 2023ymrp368
 
VidaXL dropshipping via API with DroFx.pptx
VidaXL dropshipping via API with DroFx.pptxVidaXL dropshipping via API with DroFx.pptx
VidaXL dropshipping via API with DroFx.pptxolyaivanovalion
 
Accredited-Transport-Cooperatives-Jan-2021-Web.pdf
Accredited-Transport-Cooperatives-Jan-2021-Web.pdfAccredited-Transport-Cooperatives-Jan-2021-Web.pdf
Accredited-Transport-Cooperatives-Jan-2021-Web.pdfadriantubila
 
100-Concepts-of-AI by Anupama Kate .pptx
100-Concepts-of-AI by Anupama Kate .pptx100-Concepts-of-AI by Anupama Kate .pptx
100-Concepts-of-AI by Anupama Kate .pptxAnupama Kate
 
Best VIP Call Girls Noida Sector 39 Call Me: 8448380779
Best VIP Call Girls Noida Sector 39 Call Me: 8448380779Best VIP Call Girls Noida Sector 39 Call Me: 8448380779
Best VIP Call Girls Noida Sector 39 Call Me: 8448380779Delhi Call girls
 
April 2024 - Crypto Market Report's Analysis
April 2024 - Crypto Market Report's AnalysisApril 2024 - Crypto Market Report's Analysis
April 2024 - Crypto Market Report's Analysismanisha194592
 
BigBuy dropshipping via API with DroFx.pptx
BigBuy dropshipping via API with DroFx.pptxBigBuy dropshipping via API with DroFx.pptx
BigBuy dropshipping via API with DroFx.pptxolyaivanovalion
 
Invezz.com - Grow your wealth with trading signals
Invezz.com - Grow your wealth with trading signalsInvezz.com - Grow your wealth with trading signals
Invezz.com - Grow your wealth with trading signalsInvezz1
 
Capstone Project on IBM Data Analytics Program
Capstone Project on IBM Data Analytics ProgramCapstone Project on IBM Data Analytics Program
Capstone Project on IBM Data Analytics ProgramMoniSankarHazra
 
Cheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 night
Cheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 nightCheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 night
Cheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 nightDelhi Call girls
 

Recently uploaded (20)

Best VIP Call Girls Noida Sector 22 Call Me: 8448380779
Best VIP Call Girls Noida Sector 22 Call Me: 8448380779Best VIP Call Girls Noida Sector 22 Call Me: 8448380779
Best VIP Call Girls Noida Sector 22 Call Me: 8448380779
 
Delhi 99530 vip 56974 Genuine Escort Service Call Girls in Kishangarh
Delhi 99530 vip 56974 Genuine Escort Service Call Girls in  KishangarhDelhi 99530 vip 56974 Genuine Escort Service Call Girls in  Kishangarh
Delhi 99530 vip 56974 Genuine Escort Service Call Girls in Kishangarh
 
꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call
꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call
꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call
 
VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130
VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130
VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130
 
CALL ON ➥8923113531 🔝Call Girls Chinhat Lucknow best sexual service Online
CALL ON ➥8923113531 🔝Call Girls Chinhat Lucknow best sexual service OnlineCALL ON ➥8923113531 🔝Call Girls Chinhat Lucknow best sexual service Online
CALL ON ➥8923113531 🔝Call Girls Chinhat Lucknow best sexual service Online
 
Call Girls 🫤 Dwarka ➡️ 9711199171 ➡️ Delhi 🫦 Two shot with one girl
Call Girls 🫤 Dwarka ➡️ 9711199171 ➡️ Delhi 🫦 Two shot with one girlCall Girls 🫤 Dwarka ➡️ 9711199171 ➡️ Delhi 🫦 Two shot with one girl
Call Girls 🫤 Dwarka ➡️ 9711199171 ➡️ Delhi 🫦 Two shot with one girl
 
Determinants of health, dimensions of health, positive health and spectrum of...
Determinants of health, dimensions of health, positive health and spectrum of...Determinants of health, dimensions of health, positive health and spectrum of...
Determinants of health, dimensions of health, positive health and spectrum of...
 
ALSO dropshipping via API with DroFx.pptx
ALSO dropshipping via API with DroFx.pptxALSO dropshipping via API with DroFx.pptx
ALSO dropshipping via API with DroFx.pptx
 
Ravak dropshipping via API with DroFx.pptx
Ravak dropshipping via API with DroFx.pptxRavak dropshipping via API with DroFx.pptx
Ravak dropshipping via API with DroFx.pptx
 
Data-Analysis for Chicago Crime Data 2023
Data-Analysis for Chicago Crime Data  2023Data-Analysis for Chicago Crime Data  2023
Data-Analysis for Chicago Crime Data 2023
 
VidaXL dropshipping via API with DroFx.pptx
VidaXL dropshipping via API with DroFx.pptxVidaXL dropshipping via API with DroFx.pptx
VidaXL dropshipping via API with DroFx.pptx
 
Accredited-Transport-Cooperatives-Jan-2021-Web.pdf
Accredited-Transport-Cooperatives-Jan-2021-Web.pdfAccredited-Transport-Cooperatives-Jan-2021-Web.pdf
Accredited-Transport-Cooperatives-Jan-2021-Web.pdf
 
100-Concepts-of-AI by Anupama Kate .pptx
100-Concepts-of-AI by Anupama Kate .pptx100-Concepts-of-AI by Anupama Kate .pptx
100-Concepts-of-AI by Anupama Kate .pptx
 
Best VIP Call Girls Noida Sector 39 Call Me: 8448380779
Best VIP Call Girls Noida Sector 39 Call Me: 8448380779Best VIP Call Girls Noida Sector 39 Call Me: 8448380779
Best VIP Call Girls Noida Sector 39 Call Me: 8448380779
 
April 2024 - Crypto Market Report's Analysis
April 2024 - Crypto Market Report's AnalysisApril 2024 - Crypto Market Report's Analysis
April 2024 - Crypto Market Report's Analysis
 
BigBuy dropshipping via API with DroFx.pptx
BigBuy dropshipping via API with DroFx.pptxBigBuy dropshipping via API with DroFx.pptx
BigBuy dropshipping via API with DroFx.pptx
 
Invezz.com - Grow your wealth with trading signals
Invezz.com - Grow your wealth with trading signalsInvezz.com - Grow your wealth with trading signals
Invezz.com - Grow your wealth with trading signals
 
Capstone Project on IBM Data Analytics Program
Capstone Project on IBM Data Analytics ProgramCapstone Project on IBM Data Analytics Program
Capstone Project on IBM Data Analytics Program
 
Sampling (random) method and Non random.ppt
Sampling (random) method and Non random.pptSampling (random) method and Non random.ppt
Sampling (random) method and Non random.ppt
 
Cheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 night
Cheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 nightCheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 night
Cheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 night
 

Egg deposition for Bloaters, aged 3, in Lake Huron and predicting next five year values

Time Series Analysis MATH1318
Arpan Kumar (s3696599)
May 12, 2019

Importing necessary libraries
library(TSA)
library(lmtest)
library(tseries)
library(rlang)
library(pillar)
library(forecast)

Introduction
Coregonus hoyi is a silver-coloured freshwater fish found mostly in Lake Nipigon and the Great Lakes, where it inhabits underwater slopes. It is also known as the bloater and belongs to the family Salmonidae.
The objective of this report is to identify any pattern or seasonality in the egg depositions of age-3 bloaters in Lake Huron (one of the five Great Lakes of North America) using time-series analysis methods, and to predict egg depositions for the next five years. The report covers finding a relevant model and applying suitable approaches to fit it, using visualisation and R functions on the provided dataset.

Reading Dataset
Eggs_Depositions <- read.csv("C:/Users/kumar/Downloads/Semester 3/Time Series Analysis/Assignment 2/eggs.csv")
The egg deposition series is available as the BloaterLH dataset in the FSAdata package. It consists of two variables: Year (1981 to 1996, numeric) and egg depositions (in millions).
Before converting the dataset into a time series, it is important to check its class.
class(Eggs_Depositions)
[1] "data.frame"
Converting the 'data frame' into 'time-series' format using the ts() function
Data <- ts(as.vector(Eggs_Depositions$eggs), start = 1981, end = 1996, frequency = 1)
class(Data)
[1] "ts"

Data Exploration
Time Series visualisation
plot(Data, ylab = "Egg deposition (in Mns)", xlab = "Years", main = "Figure 1, Egg depositions of age 3 Bloaters in Lake Huron (1981-1996)", type = "o", col = "darkblue", xaxt = "n")
axis(1, at = seq(1981, 1996, by = 1), las = 2)
Figure 1 clearly shows a trend. Egg depositions rise to a peak around 1990, fall until 1993 and rise again after 1993. The series also appears to have changing variance. In addition, each observation appears to depend on the preceding ones, which suggests autoregressive behaviour. These features make it challenging to prepare the data for predicting the next five years.

Scatterplot comparing lagged value
y = Data
x = zlag(Data)
index = 2:length(x)
cor(y[index], x[index])
[1] 0.7445657
plot(y = y, x = x, ylab = "Egg deposition (in Mns)", xlab = "Egg depositions for previous year (in Mns)", main = "Figure 2, Scatter plot for Egg depositions against its lagged value", col = "darkblue")
The correlation is 0.74, and Figure 2 shows the same pattern. Together they indicate a strong positive correlation between a year's egg deposition and its lagged value (the previous year's egg deposition).

Interpreting Time Series using Modeling Techniques
To select the best model, different modeling techniques are used to identify the model that fits the data best.

Linear Model
model1 = lm(Data ~ time(Data))
summary(model1)
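The lag-1 correlation above relies on TSA's zlag(). The sketch below computes the same quantity in base R: the Pearson correlation between each value and the previous year's value. It assumes a simulated stand-in series; the values are random and are not the BloaterLH observations. The helper name lag1_cor is hypothetical.

```r
# Lag-1 Pearson correlation using base R only.
# ASSUMPTION: 'y' is a simulated stand-in series, not the egg deposition data;
# 'lag1_cor' is a hypothetical helper name.
set.seed(1)
y <- ts(cumsum(rnorm(16)), start = 1981, frequency = 1)

lag1_cor <- function(x) {
  x <- as.vector(x)
  current  <- x[-1]          # years 2..n
  previous <- x[-length(x)]  # years 1..n-1, aligned with 'current'
  cor(current, previous)
}

r1 <- lag1_cor(y)
print(round(r1, 3))
```

This pairs the same observations as cor(y[index], x[index]) with x = zlag(y) and index = 2:length(x). The result can differ slightly from the lag-1 value reported by acf(), which centres both sides on the overall mean of the series.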
Call:
lm(formula = Data ~ time(Data))

Residuals:
Min 1Q Median 3Q Max
-0.4048 -0.2768 -0.1933 0.2536 1.1857

Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) -165.98275 49.58836 -3.347 0.00479 **
time(Data) 0.08387 0.02494 3.363 0.00464 **
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 0.4598 on 14 degrees of freedom
Multiple R-squared: 0.4469, Adjusted R-squared: 0.4074
F-statistic: 11.31 on 1 and 14 DF, p-value: 0.004642

plot(Data, ylab = "Egg deposition (in Mns)", xlab = "Years", main = "Figure 3, Fitted linear trend model", type = "o", col = "darkblue", xaxt = "n")
axis(1, at = seq(1981, 1996, by = 1), las = 2)
abline(model1, col = "Blue", lty = 2)
res.model1 = rstudent(model1)
plot(y = res.model1, x = as.vector(time(Data)), xlab = "Years", ylab = "Standardised Residuals", main = "Figure 4, Residual of linear trend model", type = "o", col = "darkblue", xaxt = "n")
axis(1, at = seq(1981, 1996, by = 1), las = 2)
qqnorm(res.model1, main = "Figure 5, Normal QQ Plot for residual values")
qqline(res.model1, col = 4, lwd = 1, lty = 2)
shapiro.test(res.model1)

Shapiro-Wilk normality test

data: res.model1
W = 0.7726, p-value = 0.001205

The linear model's adjusted R-squared of 0.41 means that it explains only about 41% of the variation in egg depositions. In Figure 5 the points deviate noticeably from the reference line, which indicates that the residuals are not normal. The Shapiro-Wilk test confirms this (p-value = 0.0012 < 0.05). The linear model is therefore not a good model to go ahead with.

Quadratic Model
t = time(Data)
t2 = t^2
model2 = lm(Data ~ t + t2)
summary(model2)
Call:
lm(formula = Data ~ t + t2)

Residuals:
Min 1Q Median 3Q Max
-0.50896 -0.25523 -0.02701 0.16615 0.96322

Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) -4.647e+04 2.141e+04 -2.170 0.0491 *
t 4.665e+01 2.153e+01 2.166 0.0494 *
t2 -1.171e-02 5.415e-03 -2.163 0.0498 *
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 0.4092 on 13 degrees of freedom
Multiple R-squared: 0.5932, Adjusted R-squared: 0.5306
F-statistic: 9.479 on 2 and 13 DF, p-value: 0.00289

plot(ts(fitted(model2), start = 1981), ylim = c(min(c(fitted(model2), as.vector(Data))), max(c(fitted(model2), as.vector(Data)))), ylab = "Egg deposition (in Mns)", main = "Figure 6, fitted quadratic model", type = "l", lty = 2, col = "blue", xlab = "Years")
lines(Data, type = "o")
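The quadratic fit above uses raw years. That is why the intercept (-4.647e+04) and the t and t2 coefficients are so large and largely offset each other. The sketch below shows one alternative: centring time at its mean. This is an illustrative choice, not part of the original analysis, and the series is simulated. The fitted curve stays the same, but the coefficients come out on a readable scale.

```r
# Quadratic trend with raw vs centred time.
# ASSUMPTIONS: simulated data; centring time is an illustrative alternative,
# not the report's method.
set.seed(2)
yrs <- 1981:1996
y <- ts(1 + 0.08 * (yrs - 1988.5) - 0.01 * (yrs - 1988.5)^2 + rnorm(16, sd = 0.3),
        start = 1981, frequency = 1)

t_raw <- as.vector(time(y))
t_c   <- t_raw - mean(t_raw)            # centre time at its mean (1988.5)

fit_raw <- lm(y ~ t_raw + I(t_raw^2))   # same form as Data ~ t + t2
fit_c   <- lm(y ~ t_c + I(t_c^2))

# Same fitted curve, up to rounding error ...
max_diff <- max(abs(fitted(fit_raw) - fitted(fit_c)))
print(max_diff)
# ... but the centred coefficients are on a readable scale
print(coef(fit_c))
```

Because only the parameterisation changes, R-squared and the residuals are unaffected. Only the size and mutual correlation of the coefficient estimates change.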
res.model2 = rstudent(model2)
plot(y = res.model2, x = as.vector(time(Data)), xlab = "Year", ylab = "Standardised Residuals", type = "o", xaxt = "n", main = "Figure 7, Residual of quadratic model", col = "darkblue")
axis(1, at = seq(1981, 1996, by = 1), las = 2)
abline(h = 0, col = "Blue")
qqnorm(res.model2, main = "Figure 8, Normal QQ Plot for residual values")
qqline(res.model2, col = 4, lwd = 1, lty = 2)
shapiro.test(res.model2)

Shapiro-Wilk normality test

data: res.model2
W = 0.87948, p-value = 0.03809

The quadratic model has an adjusted R-squared of 0.53, so it explains more of the variation than the linear model (0.41). In Figure 8 the points also lie closer to the reference line. The quadratic model is therefore slightly better than the linear model. However, the Shapiro-Wilk p-value is below 0.05 for both models (0.0012 and 0.038). The null hypothesis of normality is therefore rejected, and the residuals of neither model are normally distributed.

Preparing Dataset
Hypothesis Testing
ACF (Auto-correlation function) and PACF (Partial auto-correlation function)
ACF and PACF are used as an initial check of the following hypotheses.
H0: Dataset is non-stationary
HA: Dataset is stationary
The ACF measures how the present values of the series are related to their lagged (past) values.
acf(Data)
The PACF measures the correlation between the series and its value at a given lag, after removing the effect of the intermediate lags.
pacf(Data)
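The two definitions can be made concrete in base R. The sample ACF at lag k is r_k = sum over t = k+1..n of (y_t - ybar)(y_{t-k} - ybar), divided by sum over t = 1..n of (y_t - ybar)^2. The PACF at lag k is the last coefficient of an AR(k) model solved from the Yule-Walker equations. The sketch below assumes a simulated AR(1) series rather than the egg data, and the helper names sample_acf and sample_pacf are hypothetical. Their results should agree with acf() and pacf().

```r
# Sample ACF and PACF computed by hand, compared with stats::acf()/pacf().
# ASSUMPTIONS: 'y' is a simulated AR(1) series, not the egg deposition data;
# helper names are hypothetical.
set.seed(3)
y <- arima.sim(model = list(ar = 0.7), n = 16)

# Lag-k sample autocorrelation: centred lagged products over the total sum of squares
sample_acf <- function(x, k) {
  x <- as.vector(x); n <- length(x); d <- x - mean(x)
  sum(d[(k + 1):n] * d[1:(n - k)]) / sum(d^2)
}

# Lag-k partial autocorrelation: last coefficient of the Yule-Walker AR(k) solution
sample_pacf <- function(x, k) {
  r <- sapply(0:k, function(j) sample_acf(x, j))  # r[1] is lag 0, r[k + 1] is lag k
  R <- toeplitz(r[1:k])                           # autocorrelations at lags 0..k-1
  phi <- solve(R, r[2:(k + 1)])                   # AR(k) coefficients
  phi[k]
}

a <- acf(y, lag.max = 3, plot = FALSE)$acf    # element 1 is lag 0
p <- pacf(y, lag.max = 3, plot = FALSE)$acf   # element 1 is lag 1
print(c(by_hand = sample_acf(y, 2), acf = a[3]))
print(c(by_hand = sample_pacf(y, 3), pacf = p[3]))
```

At lag 1 the PACF equals the ACF, because there are no intermediate lags to remove.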
In the ACF and PACF plots above, the ACF decays slowly and the PACF has a large first correlation. This indicates the presence of a trend; that is, the series is non-stationary. Transformation and differencing are therefore needed to make the series stationary.

Transformation
Before transforming the dataset, the Augmented Dickey-Fuller (ADF) test is used to confirm statistically that the series is non-stationary. The Shapiro-Wilk test is used to check normality.
adf.test(Data) # Augmented Dickey-Fuller test

Augmented Dickey-Fuller Test

data: Data
Dickey-Fuller = -2.0669, Lag order = 2, p-value = 0.5469
alternative hypothesis: stationary

shapiro.test(Data) # Shapiro-Wilk normality test
Shapiro-Wilk normality test

data: Data
W = 0.94201, p-value = 0.3744

The p-value of the Dickey-Fuller test (0.55) is greater than 0.05, so we fail to reject the null hypothesis; in other words, the series is non-stationary. The p-value of the Shapiro-Wilk test (0.37) is also greater than 0.05, so we fail to reject the null hypothesis of normality. There is no evidence that the series departs from normality.

Box-Cox Transformation
Data_T = BoxCox.ar(Data, method = "yule-walker")
possible convergence problem: optim gave code = 1
possible convergence problem: optim gave code = 1
Data_T
$lambda
[1] -2.0 -1.9 -1.8 -1.7 -1.6 -1.5 -1.4 -1.3 -1.2 -1.1 -1.0 -0.9 -0.8 -0.7 -0.6 -0.5 -0.4 -0.3 -0.2 -0.1 0.0 0.1 0.2 0.3 0.4 0.5
[27] 0.6 0.7 0.8 0.9 1.0 1.1 1.2 1.3 1.4 1.5 1.6 1.7 1.8 1.9 2.0

$loglike
[1] -18.4457011 -15.1801393 -11.9522118 -8.7654260 -5.6237637 -2.5317646 0.5053751 3.4816875 6.3903005 9.2232907 11.9715279
[12] 14.6245247 17.1703124 19.5953784 21.8847108 24.0220046 25.9900870 27.7715993 29.3499352 30.7103710 31.8412544 32.7350622
[23] 33.3891294 33.8059042 33.9926850 33.9609089 33.7251475 33.3019920 32.7089913 31.9637533 31.0832597 30.0833992 28.9786936
[34] 27.7821762 26.5053815 25.1584095 23.7500337 22.2878321 20.7783238 19.2271034 17.6389661

$mle
[1] 0.4

$ci
[1] 0.1 0.8

The 95% confidence interval for lambda is [0.1, 0.8], and the maximum-likelihood estimate is 0.4. The midpoint of the confidence interval, (0.1 + 0.8)/2 = 0.45, is used as the lambda value for the Box-Cox transformation.
lambda = 0.45
Data_T_BoxCox = (Data^lambda - 1)/lambda

Normality check post Box-Cox transformation
qqnorm(Data_T_BoxCox, main = "Figure 9, Normal QQ Plot post BoxCox transformation, lambda=0.45")
qqline(Data_T_BoxCox, col = 4)
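The transformation applied above is the Box-Cox power transform. It is (y^lambda - 1)/lambda for lambda not equal to 0 and log(y) for lambda = 0, and it is defined only for positive y. The sketch below uses hypothetical helper names box_cox and inv_box_cox. The inverse is included because forecasts made on the transformed scale would need to be mapped back to millions of eggs.

```r
# Box-Cox transform and its inverse.
# ASSUMPTION: 'box_cox' and 'inv_box_cox' are hypothetical helper names.
box_cox <- function(y, lambda) {
  stopifnot(all(y > 0))                   # defined for positive values only
  if (abs(lambda) < 1e-8) log(y) else (y^lambda - 1) / lambda
}

inv_box_cox <- function(z, lambda) {
  if (abs(lambda) < 1e-8) exp(z) else (lambda * z + 1)^(1 / lambda)
}

# Midpoint of the reported 95% CI for lambda, as used in the report
lambda <- mean(c(0.1, 0.8))               # 0.45

print(box_cox(c(1, 4, 9), 0.5))           # [1] 0 2 4
y <- c(0.5, 1.2, 2.3)                     # illustrative positive values
z <- box_cox(y, lambda)
print(inv_box_cox(z, lambda))             # recovers 0.5 1.2 2.3
```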
shapiro.test(Data_T_BoxCox)

Shapiro-Wilk normality test

data: Data_T_BoxCox
W = 0.96269, p-value = 0.7107

The Shapiro-Wilk p-value increased from 0.374 to 0.711. In Figure 9 the points also fall closer to the reference line, with less deviation than before. The Box-Cox transformation has therefore improved the normality of the series, and there is no evidence against normality of the transformed series.

Differencing
First Differencing
Data_T_BoxCox_diff = diff(Data_T_BoxCox)
plot(Data_T_BoxCox_diff, type = "o", ylab = "Egg deposition (in Mns)", main = "Figure 10, First differencing plot on transformed data", xaxt = "n", xlab = "Years", col = "darkblue")
axis(1, at = seq(1981, 1996, by = 1), las = 2)
adf.test(Data_T_BoxCox_diff)

Augmented Dickey-Fuller Test

data: Data_T_BoxCox_diff
Dickey-Fuller = -3.6798, Lag order = 2, p-value = 0.0443
alternative hypothesis: stationary

The p-value of 0.044 is less than the significance level of 0.05, so we reject the null hypothesis; the series is stationary after first differencing. However, a trend can still be observed in Figure 10, so second differencing is applied to eliminate it.

Second Differencing
Data_T2_BoxCox_diff = diff(Data_T_BoxCox, differences = 2)
plot(Data_T2_BoxCox_diff, type = "o", ylab = "Egg deposition (in Mns)", main = "Figure 11, Second differencing plot on transformed data", xaxt = "n", xlab = "Years", col = "darkblue")
axis(1, at = seq(1981, 1996, by = 1), las = 2)
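A noise-free quadratic trend shows why a second difference removes what the first one leaves behind. This is an illustrative assumption that echoes the quadratic trend model fitted earlier. The first difference of a quadratic is a straight line, and the second difference is a constant. Each difference also shortens the series by one observation.

```r
# First and second differences of a deterministic quadratic trend.
# ASSUMPTION: illustrative noise-free data, not the egg deposition series.
tt <- 1:10
quad <- 3 + 2 * tt + 0.5 * tt^2

d1 <- diff(quad)                   # linear in tt: 3.5, 4.5, ..., 11.5
d2 <- diff(quad, differences = 2)  # constant: second difference of 0.5 * tt^2 is 1

print(d1)
print(d2)
# diff(x, differences = 2) is the same as differencing twice
print(identical(d2, diff(diff(quad))))
```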
No trend can be observed in Figure 11, which suggests that the series is now stationary. To check this, the ADF test is applied.
adf.test(Data_T2_BoxCox_diff)

Augmented Dickey-Fuller Test

data: Data_T2_BoxCox_diff
Dickey-Fuller = -3.1733, Lag order = 2, p-value = 0.1254
alternative hypothesis: stationary

The p-value (0.125) is greater than 0.05, so the ADF test does not formally reject non-stationarity, although the p-value is not far above the threshold and the series is short (14 values after second differencing). Because no trend remains in Figure 11, the second-differenced series is used for further analysis.

Modeling
Model Specification
The transformation and differencing have removed the changing variance and the trend from the series (Figures 10 and 11), and the series is now treated as stationary. Multiple models are built using the approaches taught, and model specification is done using several approaches.

Model specification using ACF and PACF for the differenced series
acf(Data_T2_BoxCox_diff)
pacf(Data_T2_BoxCox_diff)
The ACF plot shows no significant lag. However, the PACF plot above shows a significant lag at 4, which indicates that the series is not white noise. Therefore, an ARIMA(1,2,0) model is considered.

Model specification using EACF (Extended ACF) for the differenced series
The EACF is used to identify the orders of the AR (autoregressive) and MA (moving average) components of an ARMA model.
eacf(Data_T2_BoxCox_diff, ar.max = 3, ma.max = 3)
AR/MA
  0 1 2 3
0 o o o o
1 o o o o
2 o o o o
3 o o o o
The vertex of the triangle of o's is at the upper-left point (0,0), which indicates white noise behaviour. Considering the neighbouring points, the models ARIMA(0,2,1) and ARIMA(1,2,1), together with ARIMA(1,2,0), are taken for further analysis.

Model specification using BIC (Bayesian Information Criterion) for the differenced series
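A lag is called significant above when its bar crosses the dashed bounds drawn by acf() and pacf(). Under a white-noise assumption these bounds are approximately ±1.96/sqrt(n). The sketch below uses a hypothetical helper significant_lags and a simulated white-noise series of length 14, the length of a twice-differenced 16-year series.

```r
# Flag ACF/PACF lags outside the approximate 95% white-noise bounds.
# ASSUMPTIONS: 'significant_lags' is a hypothetical helper;
# 'x' is simulated white noise of length 14.
significant_lags <- function(x, lag.max = 5) {
  bound <- qnorm(0.975) / sqrt(length(x))               # dashed-line bound on the plots
  a <- acf(x, lag.max = lag.max, plot = FALSE)$acf[-1]  # drop lag 0
  p <- pacf(x, lag.max = lag.max, plot = FALSE)$acf
  list(bound = bound,
       acf_lags = which(abs(a) > bound),
       pacf_lags = which(abs(p) > bound))
}

set.seed(4)
x <- rnorm(14)
res <- significant_lags(x)
print(round(res$bound, 3))   # 1.96 / sqrt(14), about 0.524
print(res$acf_lags)
print(res$pacf_lags)
```

With only 14 values the bound is wide (about 0.52). With 5 lags checked in each plot and 95% bounds, a spurious crossing is also quite likely by chance alone.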
res = armasubsets(y = Data_T2_BoxCox_diff, nar = 3, nma = 2, y.name = 'test', ar.method = 'ols')
model order: 7 singularities in the computation of the projection matrix
results are only valid up to model order 6
plot(res)
In the BIC table above, the shaded columns correspond to the AR(1) and AR(3) coefficients and to two MA effects, MA(1) and MA(2). Using these coefficients, four more possible models can be considered: ARIMA(1,2,1), ARIMA(1,2,2), ARIMA(3,2,1) and ARIMA(3,2,2).
In total, the possible set of ARIMA models is:
ARIMA(1,2,0) - using ACF and PACF
ARIMA(0,2,1) - using EACF
ARIMA(1,2,1) - using BIC
ARIMA(1,2,2) - using BIC
ARIMA(3,2,1) - using BIC
ARIMA(3,2,2) - using BIC

Parameter estimation
ARIMA(1,2,0)
model_120_css = arima(Data, order = c(1,2,0), method = 'CSS')
coeftest(model_120_css)

z test of coefficients:

Estimate Std. Error z value Pr(>|z|)
ar1 -0.45944 0.23810 -1.9296 0.05365 .
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

model_120_ml = arima(Data, order = c(1,2,0), method = 'ML')
coeftest(model_120_ml)

z test of coefficients:

Estimate Std. Error z value Pr(>|z|)
ar1 -0.42966 0.22743 -1.8892 0.05886 .
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

AR(1) is insignificant at the 5% level for both CSS and ML estimation.

ARIMA(0,2,1)
model_021_css = arima(Data, order = c(0,2,1), method = 'CSS')
coeftest(model_021_css)

z test of coefficients:

Estimate Std. Error z value Pr(>|z|)
ma1 -1.066739 0.071847 -14.847 < 2.2e-16 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

model_021_ml = arima(Data, order = c(0,2,1), method = 'ML')
coeftest(model_021_ml)
z test of coefficients:

Estimate Std. Error z value Pr(>|z|)
ma1 -1.00000 0.25823 -3.8725 0.0001077 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

MA(1) is significant for both CSS and ML estimation.

ARIMA(1,2,1)
model_121_css = arima(Data, order = c(1,2,1), method = 'CSS')
coeftest(model_121_css)

z test of coefficients:

Estimate Std. Error z value Pr(>|z|)
ar1 0.073817 0.284315 0.2596 0.7951
ma1 -1.132556 0.074796 -15.1419 <2e-16 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

model_121_ml = arima(Data, order = c(1,2,1), method = 'ML')
coeftest(model_121_ml)

z test of coefficients:

Estimate Std. Error z value Pr(>|z|)
ar1 0.071764 0.269251 0.2665 0.7898
ma1 -0.999999 0.236872 -4.2217 2.425e-05 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

AR(1) is insignificant for both CSS and ML estimation, while MA(1) is significant.

ARIMA(1,2,2)
model_122_css = arima(Data, order = c(1,2,2), method = 'CSS')
coeftest(model_122_css)
z test of coefficients:

     Estimate Std. Error z value  Pr(>|z|)    
ar1  1.005671   0.049778  20.203 < 2.2e-16 ***
ma1 -2.824099   0.125344 -22.531 < 2.2e-16 ***
ma2  1.838559   0.114620  16.040 < 2.2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

model_122_ml = arima(Data,order=c(1,2,2),method='ML')
coeftest(model_122_ml)

z test of coefficients:

     Estimate Std. Error z value Pr(>|z|)
ar1  0.058932   1.166803  0.0505   0.9597
ma1 -0.987060   1.155631 -0.8541   0.3930
ma2 -0.012925   1.130998 -0.0114   0.9909

AR(1), MA(1) and MA(2) are all significant under CSS, but none of them is significant under ML estimation.

ARIMA(3,2,1)
model_321_css = arima(Data,order=c(3,2,1),method='CSS')
coeftest(model_321_css)

z test of coefficients:

    Estimate Std. Error z value Pr(>|z|)   
ar1 -0.17209    0.26139 -0.6584 0.510286   
ar2 -0.19198    0.25364 -0.7569 0.449106   
ar3 -0.52748    0.24922 -2.1165 0.034300 * 
ma1 -0.64906    0.24639 -2.6343 0.008432 **
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

model_321_ml = arima(Data,order=c(3,2,1),method='ML')
coeftest(model_321_ml)
z test of coefficients:

      Estimate Std. Error z value Pr(>|z|)
ar1  0.0042103  0.3147996  0.0134   0.9893
ar2 -0.0431425  0.2891200 -0.1492   0.8814
ar3 -0.3350403  0.2798031 -1.1974   0.2311
ma1 -0.9018704  0.6489515 -1.3897   0.1646

Under CSS, AR(3) and MA(1) are significant; under ML estimation, none of AR(1), AR(2), AR(3) or MA(1) is significant.

ARIMA(3,2,2)
model_322_css = arima(Data,order=c(3,2,2),method='CSS')
coeftest(model_322_css)

z test of coefficients:

     Estimate Std. Error z value Pr(>|z|)  
ar1 -0.139922   0.348205 -0.4018  0.68780  
ar2 -0.198602   0.258520 -0.7682  0.44235  
ar3 -0.544728   0.277057 -1.9661  0.04928 *
ma1 -0.683391   0.360419 -1.8961  0.05795 .
ma2  0.045948   0.329455  0.1395  0.88908  
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

model_322_ml = arima(Data,order=c(3,2,2),method='ML')
coeftest(model_322_ml)

z test of coefficients:

     Estimate Std. Error z value Pr(>|z|)  
ar1  0.132356   0.472334  0.2802  0.77931  
ar2 -0.099038   0.323387 -0.3063  0.75941  
ar3 -0.384399   0.314004 -1.2242  0.22088  
ma1 -0.996970   0.589359 -1.6916  0.09072 .
ma2  0.199121   0.472894  0.4211  0.67371  
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Under CSS, only AR(3) is significant at the 5% level (MA(1) is marginal at 10%); under ML estimation, none of AR(1), AR(2), AR(3), MA(1) or MA(2) is significant at the 5% level.

Comparing the results above, ARIMA(1,2,2) has all of its coefficients significant under the CSS method, although none of them is significant under ML estimation.
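The z tests printed by coeftest() for an arima() fit can be rebuilt by hand: each estimate is divided by its standard error (the square root of the diagonal of the fitted model's var.coef matrix) and the two-sided p-value is taken from the standard normal. A minimal sketch, assuming a simulated ARIMA(0,2,1) stand-in series (the egg data are not reproduced here) and a hypothetical helper z_table(); it fits the same specification by CSS and by ML so the two estimation methods can be compared side by side.

```r
# Hypothetical helper (not part of lmtest): rebuild the "z test of coefficients"
# table for an arima() fit from its estimates and estimated covariance matrix.
z_table <- function(fit) {
  est <- coef(fit)                 # AR/MA estimates
  se  <- sqrt(diag(fit$var.coef))  # standard errors from the covariance matrix
  z   <- est / se                  # Wald z statistics
  p   <- 2 * pnorm(-abs(z))        # two-sided p-values, standard normal reference
  cbind(Estimate = est, `Std. Error` = se, `z value` = z, `Pr(>|z|)` = p)
}

# Simulated stand-in series (NOT the egg data): ARIMA(0,2,1) with ma1 = -0.6.
# arima.sim() returns n + d values, so n = 14 gives 16 annual points, matching
# the length of the 1981-1996 series.
set.seed(1)
sim <- ts(as.numeric(arima.sim(model = list(order = c(0, 2, 1), ma = -0.6), n = 14)),
          start = 1981)

# Same specification, two estimation methods
fit_css <- arima(sim, order = c(0, 2, 1), method = "CSS")
fit_ml  <- arima(sim, order = c(0, 2, 1), method = "ML")

print(round(z_table(fit_css), 4))
print(round(z_table(fit_ml), 4))
```

With the egg series loaded as Data, z_table(model_021_css) should reproduce the coeftest() table reported for that model, since coeftest() draws its standard errors from the same covariance matrix.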
Sorting models by AIC and BIC
A helper function to sort the fitted models by their AIC or BIC score:

# Sort a table returned by AIC() or BIC() on several models by the chosen criterion.
# match.arg() validates 'score' (an error is raised for anything other than
# "bic" or "aic") and picks the first choice, "bic", when no value is given.
sort.score = function(x, score = c("bic", "aic")){
  score <- match.arg(score)
  if (score == "aic"){
    x[with(x, order(AIC)), ]
  } else {
    x[with(x, order(BIC)), ]
  }
}

Sorting by AIC
sort.score(AIC(model_120_ml,model_021_ml,model_121_ml,model_122_ml,model_321_ml,model_322_ml),score="aic")

             df      AIC
model_021_ml  2 22.74602
model_121_ml  3 24.67428
model_120_ml  2 26.57611
model_122_ml  4 26.67412
model_321_ml  5 26.90919
model_322_ml  6 28.75165

Sorting by BIC
sort.score(BIC(model_120_ml,model_021_ml,model_121_ml,model_122_ml,model_321_ml,model_322_ml),score="bic")

             df      BIC
model_021_ml  2 24.02413
model_121_ml  3 26.59145
model_120_ml  2 27.85423
model_122_ml  4 29.23035
model_321_ml  5 30.10448
model_322_ml  6 32.58599

ARIMA(0,2,1) has the lowest AIC (22.75) and the lowest BIC (24.02), so it is the preferred model under both criteria.
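The AIC and BIC rankings agree here because, for a fixed sample, the two criteria differ only in how strongly they penalise the number of estimated parameters k: AIC = -2 log L + 2k and BIC = -2 log L + k log(n), so BIC = AIC + k(log(n) - 2). A minimal check, assuming an effective sample size of n = 14 (the 16 annual observations less the 2 lost to second-order differencing, which is how arima() counts observations when d = 2). The AIC values and the df column (k) are copied from the sorted AIC table above.

```r
# AIC values and parameter counts (df column) copied from the sorted AIC table
aic <- c(model_021_ml = 22.74602, model_121_ml = 24.67428, model_120_ml = 26.57611,
         model_122_ml = 26.67412, model_321_ml = 26.90919, model_322_ml = 28.75165)
k   <- c(2, 3, 2, 4, 5, 6)

# Assumed effective sample size: 16 observations minus d = 2 differences
n_eff <- 16 - 2

# BIC = AIC + k * (log(n) - 2); the extra penalty per parameter is log(14) - 2
bic <- aic + k * (log(n_eff) - 2)
print(round(sort(bic), 5))
```

The recomputed values match the BIC table to within rounding and give the same ordering, with ARIMA(0,2,1) first.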
Overfitting
ARIMA(1,2,1) and ARIMA(1,2,2) extend ARIMA(0,2,1) with additional terms. Under ML estimation none of the added coefficients is significant (AR(1) in both models, and MA(2) in ARIMA(1,2,2)), so these larger models overfit ARIMA(0,2,1) without improving on it.

Best model selection
Taking the AIC and BIC results together with the overfitting check, ARIMA(0,2,1) is selected as the best model for forecasting egg depositions (in millions) over the next five years.

Model Diagnostics
For the selected model, the standardised residuals are examined for normality and autocorrelation, and a Ljung-Box test is used to check that no autocorrelation remains in the residuals.

# Residual diagnostics for a fitted arima() model: time plot, histogram, QQ plot,
# ACF, PACF, Shapiro-Wilk test and Ljung-Box p-value plot.
# FitAR must be installed beforehand; calling install.packages("FitAR") inside the
# function fails with "Updating loaded packages" once the package is loaded.
residual.analysis <- function(model, std = TRUE){
  library(TSA)    # rstandard() method for Arima objects
  library(FitAR)  # LBQPlot()
  if (std == TRUE){
    res.model = rstandard(model)
  } else {
    res.model = residuals(model)
  }
  par(mfrow = c(3, 2))
  plot(res.model, type = 'o', ylab = 'Standardised residuals',
       main = "Time series plot of standardised residuals")
  abline(h = 0)
  hist(res.model, main = "Histogram of standardised residuals")
  qqnorm(res.model, main = "QQ plot of standardised residuals")
  qqline(res.model, col = 2)
  acf(res.model, main = "ACF of standardised residuals")
  pacf(res.model, main = "PACF of standardised residuals")
  print(shapiro.test(res.model))
  k = 0
  LBQPlot(res.model, lag.max = length(model$residuals) - 1, StartLag = k + 1, k = 0,
          SquaredQ = FALSE)
  par(mfrow = c(1, 1))
}
residual.analysis(model = model_021_ml)
	Shapiro-Wilk normality test

data:  res.model
W = 0.92478, p-value = 0.2013

From the output above:
The time series plot of the standardised residuals shows no trend and no obvious change in variance, which supports ARIMA(0,2,1).
The histogram of the standardised residuals is roughly bell-shaped, resembling a normal distribution.
In the QQ plot, most points lie close to the reference line, but points at both ends deviate from it, which may reflect the small sample (16 annual observations).
The Shapiro-Wilk p-value of 0.20 is greater than α = 0.05, so we fail to reject the null hypothesis of normality: there is no evidence that the residuals depart from a normal distribution.
The ACF and PACF of the residuals show no significant lags, which is consistent with the residuals being white noise.
In the Ljung-Box plot, no p-value falls below the red significance line, so there is no evidence of remaining autocorrelation in the residuals, which supports the model.

Forecast
The ARIMA(0,2,1) model is used to forecast egg depositions (in millions) for the next five years.
Fit = Arima(Data,c(0,2,1),lambda=0.45)
plot(forecast(Fit,h=5),xlab="Year",ylab="Egg Deposition (in Mns)",type="o",xaxt="n",
     main="Forecasting next five year values for Egg deposition (1997-2001)",col=4,fcol=4,shadebars=TRUE)
axis(1,at=seq(1981,2001,by=1),las=2)

Summary
This report covers the steps followed to reach the final goal: forecasting the egg depositions of Bloaters for the five years after 1996. The first step was to make the non-stationary series stationary through transformation and differencing. Candidate models were then identified from the ACF, PACF, EACF and BIC results, and their parameters were estimated by conditional sum of squares (CSS) and maximum likelihood (ML). The AIC and BIC scores identified the best model, and the model diagnostics confirmed ARIMA(0,2,1) as the most suitable model for forecasting Bloater egg depositions over the next five years. The forecast indicates an increase in egg deposition over the five years after 1996.