Time series analysis of the daily closing price of Bitcoin from 27 April 2013 to 3 March 2018.
MATH1318 Time Series Analysis
Final Project - Competitive
Shiman CAO s3194292
Cheng CHEN s3666057
Shuai GAO s3596156
Yifei SUN s3572436
25 May 2018
1 Introduction
The data we analysed in this report is the daily closing price of Bitcoin from 27 April 2013 to 3 March 2018.
The objective of this report is to analyse the Bitcoin closing price using time series analysis methods, choose the best model among a set of candidate models for this dataset, and forecast the Bitcoin price for the next 10 days.
The rest of this report is organised as follows. Section 2 gives an overview of our methodology. Section 3 describes the data preprocessing required for further analysis. Section 4 presents a descriptive analysis. Section 5 focuses on fitting a quadratic time trend model. Section 6 fits the best ARIMA model. Section 7 discusses GARCH models for the transformed series. Section 8 explores ARMA+GARCH models. In Section 9 we make our final selection of the best fitting model. Section 10 reports the mean absolute scaled error (MASE) for each of the model fits and forecasts. The last section concludes with a summary.
2 Methodology
We considered four classes of models - quadratic trend, ARIMA, GARCH and ARMA+GARCH - in order to find the best model for the Bitcoin data, using parameter estimation and diagnostic checking. Furthermore, we calculated MASE values for our best model over both fitted values and forecasts.
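MASE scales a forecast's mean absolute error by the in-sample mean absolute error of the naive previous-value forecast, so values below 1 beat the naive method. As an illustrative sketch of the formula (in Python rather than the report's R, with made-up numbers rather than the Bitcoin series):

```python
def mase(actual, predicted, train):
    """Mean absolute scaled error: the forecast MAE divided by the
    in-sample MAE of the naive (previous-value) forecast."""
    naive_mae = sum(abs(train[i] - train[i - 1])
                    for i in range(1, len(train))) / (len(train) - 1)
    mae = sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)
    return mae / naive_mae

train = [100.0, 102.0, 101.0, 105.0]   # illustrative training prices
actual = [106.0, 108.0]                # illustrative hold-out prices
predicted = [105.5, 107.0]             # illustrative forecasts
print(mase(actual, predicted, train))  # well below 1: better than naive
```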
3 Data Preprocessing
In this report, we used the following R packages.
• TSA
• tseries
• FSAdata
• fUnitRoots
• forecast
• CombMSC
• lmtest
• fGarch
• FitAR
• tswge
• rugarch
library(TSA)
library(tseries)
library(FSAdata)
library(fUnitRoots)
library(forecast)
library(CombMSC)
library(lmtest)
library(fGarch)
library(FitAR)
library(tswge)
library(rugarch)
First, we read the Bitcoin data into R and check its format.
setwd("~/Documents/MATH1318 Time Series Analysis/Final Project/")
bitcoin <- read.csv('Bitcoin_Historical_Price.csv', header = TRUE)
class(bitcoin)
## [1] "data.frame"
The dataset is read in as a data frame, so we use the ts function to convert the closing prices into a time series object for further analysis.
inds <- seq(as.Date("2013-04-27"), as.Date("2018-3-3"), by = "day")
coin <- ts(bitcoin$Close,start = c(2013, as.numeric(format(inds[1], "%j"))),
frequency = 365)
4 Descriptive analysis
4.1 Time series plot
plot(coin,type='o',ylab='Closing price',
main='Time Series Plot')
Based on the plot above, four main characteristics can be noticed:
• Trend: there is an obvious upward trend, which indicates nonstationarity.
• Repeated pattern: no repeated pattern and no seasonal component (so there is no need to consider a seasonal or harmonic model for this series).
• Changing variance: the variance changes noticeably over time.
• Behaviour: succeeding observations imply the existence of autoregressive behaviour (which may result from the trend in the series).
4.2 Scatter plot
To find out whether consecutive days are related in some way, so that one day's Bitcoin closing price can help forecast the next day's, the scatter plot below investigates the correlation between pairs of consecutive Bitcoin closing prices.
plot(y=coin,x=zlag(coin),
ylab='Closing price',
xlab='Previous day closing price', main = "Scatter plot")
The scatter plot indicates there might be a very high positive correlation.
Figure 1: Time Series Plot

Figure 2: Scatter Plot
4.3 Normality Checking
Estimates are calculated under a normality assumption; for the estimation to be valid, we have to check whether the data is normally distributed.
qqnorm(coin)
qqline(coin, col = 2, lwd = 1, lty = 2)
Figure 3: Normal Q-Q plot of the original series
shapiro.test(coin)
##
## Shapiro-Wilk normality test
##
## data: coin
## W = 0.48004, p-value < 2.2e-16
The Q-Q plot reveals that the series does not meet the requirements of normality, and the Shapiro-Wilk test confirms this with a p-value less than 0.01, so we reject the null hypothesis that the data are normally distributed.
5 Modelling - Quadratic time trend model
To capture the trend seen in the time series plot, we first try to fit a quadratic time trend model.
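The model is coin_t = β0 + β1·t + β2·t², which lm(coin ~ t + t2) fits by ordinary least squares. As an illustrative Python sketch of the same fit (not part of the R analysis), solving the 3×3 normal equations directly on noise-free synthetic data:

```python
def fit_quadratic(ts):
    """Least-squares fit of y = b0 + b1*t + b2*t^2 via the normal equations."""
    n = len(ts)
    X = [[1.0, t, t * t] for t in range(n)]
    # Build X'X and X'y.
    A = [[sum(X[i][r] * X[i][c] for i in range(n)) for c in range(3)]
         for r in range(3)]
    b = [sum(X[i][r] * ts[i] for i in range(n)) for r in range(3)]
    # Solve the 3x3 system by Gaussian elimination with partial pivoting.
    for col in range(3):
        piv = max(range(col, 3), key=lambda r: abs(A[r][col]))
        A[col], A[piv] = A[piv], A[col]
        b[col], b[piv] = b[piv], b[col]
        for r in range(col + 1, 3):
            f = A[r][col] / A[col][col]
            for c in range(col, 3):
                A[r][c] -= f * A[col][c]
            b[r] -= f * b[col]
    beta = [0.0, 0.0, 0.0]
    for r in (2, 1, 0):  # back substitution
        beta[r] = (b[r] - sum(A[r][c] * beta[c]
                              for c in range(r + 1, 3))) / A[r][r]
    return beta

# Synthetic series generated from y = 2 + 0.5*t + 3*t^2; OLS recovers
# the coefficients up to floating-point error.
series = [2 + 0.5 * t + 3 * t * t for t in range(20)]
b0, b1, b2 = fit_quadratic(series)
print(b0, b1, b2)
```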
5.1 Fit the quadratic model
t = time(coin)
t2 = t^2
model1 = lm(coin~t+t2)
summary(model1)
##
## Call:
## lm(formula = coin ~ t + t2)
##
## Residuals:
## Min 1Q Median 3Q Max
## -3041.0 -1317.1 248.7 893.0 12418.7
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 3.830e+09 1.019e+08 37.59 <2e-16 ***
## t -3.801e+06 1.011e+05 -37.60 <2e-16 ***
## t2 9.431e+02 2.508e+01 37.61 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1854 on 1769 degrees of freedom
## Multiple R-squared: 0.629, Adjusted R-squared: 0.6285
## F-statistic: 1499 on 2 and 1769 DF, p-value: < 2.2e-16
plot(ts(fitted(model1)),
     ylim = c(min(c(fitted(model1), as.vector(coin))),
              max(c(fitted(model1), as.vector(coin)))),
     ylab = 'Close price', main = "Fitted quadratic curve",
     type = "l", lty = 2, col = "red")
lines(as.vector(coin), lty = 2, type = "o")
5.2 Diagnostic checking of quadratic model
par(mfrow=c(2,2))
y = rstudent(model1)
plot(y = y, x = as.vector(time(coin)),
xlab = 'Time', ylab='Standardized Residuals',type='o')
Figure 4: Fitted quadratic curve
hist(rstudent(model1), xlab='Standardized Residuals')
qqnorm(y)
qqline(y, col = 2, lwd = 1, lty = 2)
shapiro.test(y)
##
## Shapiro-Wilk normality test
##
## data: y
## W = 0.78251, p-value < 2.2e-16
acf(y, main="ACF of standardized residuals")
par(mfrow=c(1,1))
• Residuals plot: clearly does not meet the expectation of randomness. The plot shows dramatic patterns that cast doubt on the model: an obvious trend, no seasonality or repeated pattern, but clearly changing variance and autoregressive behaviour.
• Histogram: cannot be viewed as symmetric.
• Q-Q plot: a significant number of points depart from the red dashed line, so the normality assumption does not hold for this series.
Figure 5: Residual analysis of the quadratic time trend model
• Shapiro-Wilk test: with a p-value less than 0.01, we reject the null hypothesis that the stochastic component of this model is normally distributed.
• ACF plot: all lags in the ACF plot are significant (outside the horizontal dashed lines), confirming that strong autocorrelation is left in the residuals.
Overall, based on the analysis above, normality does not hold and the quadratic trend model does not fit this time series at all. Given the significant trend in the original series, we need to detrend the series to overcome its nonstationarity and move on to ARIMA models.
6 Modelling - ARIMA model
6.1 Existence of trend and nonstationarity
The ACF and PACF plots and the ADF test again support the evidence of nonstationarity.
par(mfrow=c(1,2))
acf(coin)
pacf(coin)
Figure 6: ACF and PACF of the original series
par(mfrow=c(1,1))
adf.test(coin)
##
## Augmented Dickey-Fuller Test
##
## data: coin
## Dickey-Fuller = -1.6976, Lag order = 12, p-value = 0.7063
## alternative hypothesis: stationary
# Nonstationary
The slowly decaying pattern in the ACF and the very high first lag in the PACF imply the existence of a trend and nonstationarity. With an ADF test p-value of 0.7063 (greater than 0.05), we cannot reject the null hypothesis that the series is nonstationary.
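Intuitively, the ADF test asks whether the series behaves like a random walk (unit root) or reverts to a mean. An illustrative Python sketch (not part of the R analysis) contrasting the lag-1 sample autocorrelation of simulated white noise with that of a random walk built from it:

```python
import random

def lag1_autocorr(x):
    """Sample lag-1 autocorrelation of a sequence."""
    n = len(x)
    mean = sum(x) / n
    num = sum((x[t] - mean) * (x[t - 1] - mean) for t in range(1, n))
    den = sum((v - mean) ** 2 for v in x)
    return num / den

random.seed(1)
noise = [random.gauss(0, 1) for _ in range(2000)]  # stationary series

walk = []                                          # random walk: unit root
level = 0.0
for e in noise:
    level += e
    walk.append(level)

print(lag1_autocorr(noise))  # near 0: no persistence
print(lag1_autocorr(walk))   # near 1: nonstationary, like the Bitcoin series
```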
6.2 Overcome the nonstationary nature of this series
6.2.1 BoxCox Transformation
First, we apply a Box-Cox transformation, with the power chosen by maximum likelihood, in order to reduce the nonstationary variance in the series.
coin.tr = BoxCox.ar(coin+abs(min(coin))+0.1, lambda = c(-1,1))
Figure 7: Log-likelihood for the Box-Cox transformation
coin.tr$ci
## [1] -1 -1
From the log-likelihood plot, the interval appears to capture 0, so we decided to apply the log transformation to the Bitcoin series.
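For context, the Box-Cox family is y(λ) = (y^λ − 1)/λ for λ ≠ 0 and log y at λ = 0, which is why a likelihood interval near 0 points to the log transform. An illustrative Python sketch (the price 250 is a made-up value):

```python
import math

def box_cox(y, lam):
    """Box-Cox transform: (y**lam - 1)/lam for lam != 0, log(y) for lam == 0."""
    if lam == 0:
        return math.log(y)
    return (y ** lam - 1.0) / lam

y = 250.0                 # an illustrative closing price
print(box_cox(y, 0))      # natural log, ~5.5215
print(box_cox(y, 1e-8))   # as lambda -> 0 the transform converges to the log
print(box_cox(y, 1.0))    # lambda = 1 is just a shift: y - 1
```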
par(mfrow=c(1,2))
acf(log(coin))
pacf(log(coin))
Figure 8: ACF and PACF of the log-transformed series
par(mfrow=c(1,1))
adf.test(log(coin))
##
## Augmented Dickey-Fuller Test
##
## data: log(coin)
## Dickey-Fuller = -1.0155, Lag order = 12, p-value = 0.9362
## alternative hypothesis: stationary
# Nonstationary
qqnorm(log(coin), main = 'QQ plot for the natural log')
qqline(log(coin))
There is still a slowly decaying pattern in the ACF, and the very high first lag in the PACF implies that trend and nonstationarity remain in the log-transformed data. With an ADF test p-value of 0.9362, we cannot reject the null hypothesis that the series is nonstationary.
Figure 9: Q-Q plot of the log-transformed series
The Q-Q plot does not meet the requirement of normality either. So we need to difference the transformed data.
6.2.2 Differencing
We apply differencing to the log-transformed series and use the ADF unit-root test to check whether nonstationarity remains.
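Note that first-differencing the log series yields the daily log returns, since log(p_t) − log(p_{t−1}) = log(p_t / p_{t−1}). A small Python sketch of this identity with made-up prices:

```python
import math

prices = [100.0, 110.0, 104.5, 120.0]  # illustrative closing prices

# First difference of the log series ...
diff_log = [math.log(prices[t]) - math.log(prices[t - 1])
            for t in range(1, len(prices))]
# ... equals the log of the price ratio, i.e. the daily log return.
log_returns = [math.log(prices[t] / prices[t - 1])
               for t in range(1, len(prices))]

print(diff_log)
print(log_returns)  # agrees with diff_log up to floating-point error
```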
6.2.2.1 First differencing
diff.log.coin = diff(log(coin), difference = 1)
plot(diff.log.coin, type = 'o', ylab = 'Closing price')
order = ar(diff(diff.log.coin))$order
adfTest(diff.log.coin, lags = order, title = NULL, description = NULL)
## Warning in adfTest(diff.log.coin, lags = order, title = NULL, description =
## NULL): p-value smaller than printed p-value
##
## Title:
Figure 10: Time series plot of the log-transformed and first-differenced series
## Augmented Dickey-Fuller Test
##
## Test Results:
## PARAMETER:
## Lag Order: 32
## STATISTIC:
## Dickey-Fuller: -6.5432
## P VALUE:
## 0.01
##
## Description:
## Sat May 26 12:18:05 2018 by user:
McLeod.Li.test(y=diff.log.coin, main="McLeod-Li Test Statistics for Daily Bitcoin price")
The plot shows that the trend has been successfully removed, but changing variance remains, especially from 2017 onwards. With an ADF test p-value of 0.01, we can reject the null hypothesis of nonstationarity: the differenced series is now stationary.
The McLeod-Li test is significant at the 5% level for all lags, giving a strong indication of volatility clustering. It also explains the obvious changing variance in the plot.
Figure 11: McLeod-Li test of the log-transformed and first-differenced series
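The McLeod-Li test is essentially a Ljung-Box test applied to the squared series: if squared returns are autocorrelated, the variance is predictable and volatility clusters. An illustrative Python sketch of the underlying quantity, the lag-1 autocorrelation of squared returns, on made-up returns with a calm stretch followed by a volatile stretch:

```python
def autocorr(x, k):
    """Sample lag-k autocorrelation of a sequence."""
    n = len(x)
    m = sum(x) / n
    num = sum((x[t] - m) * (x[t - k] - m) for t in range(k, n))
    den = sum((v - m) ** 2 for v in x)
    return num / den

# Synthetic returns: a calm regime, then a volatile regime (clustered variance).
returns = [0.01, -0.01, 0.02, -0.02] * 50 + [0.2, -0.25, 0.3, -0.2] * 50
squared = [r * r for r in returns]

print(autocorr(returns, 1))  # returns themselves may show little correlation
print(autocorr(squared, 1))  # clearly positive: volatility clustering
```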
6.3 ARIMA model specifications
par(mfrow = c(1,2))
acf(diff.log.coin)
pacf(diff.log.coin)
eacf(diff.log.coin)
## AR/MA
## 0 1 2 3 4 5 6 7 8 9 10 11 12 13
## 0 o o o o x x o o o x x o o o
## 1 x o o o o x o o o o x o o o
## 2 x x o o o x o o o o x o o o
## 3 x x x o o x o o o o x o o o
## 4 x x o x o x o o o o x o o o
## 5 x x x x x o o o o o x o o o
## 6 x x o x x x o o o o o o o o
## 7 x x o x x x x o o o o o o o
par(mfrow = c(1,1))
res = armasubsets(y=diff.log.coin, nar = 13, nma = 13, y.name = 'test', ar.method = 'ols')
## Warning in leaps.setup(x, y, wt = wt, nbest = nbest, nvmax = nvmax,
Figure 12: ACF and PACF of the log-transformed and first-differenced series
## force.in = force.in, : 7 linear dependencies found
## Reordering variables and trying again:
plot(res)
From the EACF output, we can include ARIMA(1,1,1), ARIMA(1,1,2) and ARIMA(2,1,2). The BIC table highlights AR(5), MA(4) and MA(5) components.
Overall, the candidate models are ARIMA(1,1,1), ARIMA(1,1,2), ARIMA(2,1,2), ARIMA(5,1,1), ARIMA(5,1,2), ARIMA(1,1,4), ARIMA(2,1,4), ARIMA(5,1,4), ARIMA(1,1,5), ARIMA(2,1,5) and ARIMA(5,1,5).
6.4 Parameter estimation
Two methods are used to estimate the parameters:
• conditional sum of squares estimation of the coefficients with significance tests, method = 'CSS'
• maximum likelihood estimation of the coefficients with significance tests, method = 'ML'
6.5 Fit models and find estimations
We fit each model below and report its parameter estimates with the related significance tests.
ARIMA(1,1,2)
model_112_css = arima(log(coin),order=c(1,1,2),method='CSS')
coeftest(model_112_css)
##
## z test of coefficients:
##
## Estimate Std. Error z value Pr(>|z|)
## ar1 -0.057253 0.233120 -0.2456 0.8060
## ma1 0.067643 0.232955 0.2904 0.7715
## ma2 -0.025280 0.023225 -1.0884 0.2764
model_112_ml = arima(log(coin),order=c(1,1,2),method='ML')
coeftest(model_112_ml)
##
## z test of coefficients:
##
## Estimate Std. Error z value Pr(>|z|)
## ar1 0.0049591 0.4402458 0.0113 0.9910
## ma1 0.0049531 0.4395671 0.0113 0.9910
## ma2 -0.0289824 0.0231665 -1.2510 0.2109
The p-values for the AR(1), MA(1) and MA(2) coefficients are all greater than 0.05, so none of the coefficients is significant under either the CSS or the ML method.
ARIMA(2,1,2)
model_212_css = arima(log(coin),order=c(2,1,2),method='CSS')
coeftest(model_212_css)
##
## z test of coefficients:
##
## Estimate Std. Error z value Pr(>|z|)
## ar1 0.954612 0.061743 15.461 < 2.2e-16 ***
## ar2 -0.810779 0.053255 -15.225 < 2.2e-16 ***
## ma1 -0.985525 0.063667 -15.479 < 2.2e-16 ***
## ma2 0.809654 0.056937 14.220 < 2.2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
model_212_ml = arima(log(coin),order=c(2,1,2),method='ML')
coeftest(model_212_ml)
##
## z test of coefficients:
##
## Estimate Std. Error z value Pr(>|z|)
## ar1 0.744968 0.016816 44.301 < 2.2e-16 ***
## ar2 -0.972707 0.014831 -65.588 < 2.2e-16 ***
## ma1 -0.741698 0.024191 -30.661 < 2.2e-16 ***
## ma2 0.944688 0.020423 46.255 < 2.2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
The p-values for the AR(1), AR(2), MA(1) and MA(2) coefficients are all less than 0.05, so all coefficients are significant under both the CSS and ML methods.
ARIMA(5,1,1)
model_511_css = arima(log(coin),order=c(5,1,1),method='CSS')
coeftest(model_511_css)
##
## z test of coefficients:
##
## Estimate Std. Error z value Pr(>|z|)
## ar1 0.1922124 0.2545669 0.7551 0.4502
## ar2 -0.0307148 0.0241429 -1.2722 0.2033
## ar3 0.0062171 0.0250590 0.2481 0.8041
## ar4 0.0295781 0.0240681 1.2289 0.2191
## ar5 0.0600586 0.0237934 2.5242 0.0116 *
## ma1 -0.1929995 0.2540491 -0.7597 0.4474
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
model_511_ml = arima(log(coin),order=c(5,1,1),method='ML')
coeftest(model_511_ml)
##
## z test of coefficients:
##
## Estimate Std. Error z value Pr(>|z|)
## ar1 0.861301 0.080652 10.6793 <2e-16 ***
## ar2 -0.038479 0.031386 -1.2260 0.2202
## ar3 0.024587 0.031482 0.7810 0.4348
## ar4 0.031165 0.031419 0.9919 0.3212
## ar5 0.020623 0.027440 0.7516 0.4523
## ma1 -0.859505 0.077448 -11.0979 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
With the CSS method, all p-values are greater than 0.05 except for the AR(5) coefficient. With the ML method, all p-values are greater than 0.05 except for the AR(1) and MA(1) coefficients.
ARIMA(5,1,2)
model_512_css = arima(log(coin),order=c(5,1,2),method='CSS')
coeftest(model_512_css)
##
## z test of coefficients:
##
## Estimate Std. Error z value Pr(>|z|)
## ar1 0.6592474 0.0984020 6.6995 2.091e-11 ***
## ar2 -0.7165209 0.1164030 -6.1555 7.483e-10 ***
## ar3 0.0255579 0.0332531 0.7686 0.4421
## ar4 -0.0050122 0.0298186 -0.1681 0.8665
## ar5 0.0624359 0.0258437 2.4159 0.0157 *
## ma1 -0.6628140 0.0952682 -6.9573 3.467e-12 ***
## ma2 0.6974260 0.1151418 6.0571 1.386e-09 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
model_512_ml = arima(log(coin),order=c(5,1,2),method='ML')
coeftest(model_512_ml)
##
## z test of coefficients:
##
## Estimate Std. Error z value Pr(>|z|)
## ar1 0.775954 0.464914 1.6690 0.09511 .
## ar2 0.040787 0.413627 0.0986 0.92145
## ar3 0.021336 0.034974 0.6100 0.54183
## ar4 0.032984 0.032467 1.0159 0.30967
## ar5 0.024382 0.033196 0.7345 0.46264
## ma1 -0.774123 0.464629 -1.6661 0.09569 .
## ma2 -0.079053 0.411545 -0.1921 0.84767
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
With the CSS method, the coefficients are significant at the 5% level except AR(3) and AR(4). With the ML method, all p-values are greater than 0.05, so no coefficient is significant.
ARIMA(1,1,4)
model_114_css = arima(log(coin),order=c(1,1,4),method='CSS')
coeftest(model_114_css)
##
## z test of coefficients:
##
## Estimate Std. Error z value Pr(>|z|)
## ar1 0.1668019 0.2800245 0.5957 0.55140
## ma1 -0.1602397 0.2814797 -0.5693 0.56917
## ma2 -0.0333841 0.0245644 -1.3590 0.17413
## ma3 0.0092024 0.0249034 0.3695 0.71174
## ma4 0.0431831 0.0260723 1.6563 0.09766 .
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
model_114_ml = arima(log(coin),order=c(1,1,4),method='ML')
coeftest(model_114_ml)
##
## z test of coefficients:
##
## Estimate Std. Error z value Pr(>|z|)
## ar1 0.926442 0.042537 21.7795 <2e-16 ***
## ma1 -0.924445 0.048394 -19.1025 <2e-16 ***
## ma2 -0.038163 0.032486 -1.1748 0.2401
## ma3 0.030731 0.032251 0.9529 0.3407
## ma4 0.036053 0.023877 1.5099 0.1311
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
With the CSS method, all p-values are greater than 0.05, so no coefficient is significant. With the ML method, the coefficients are significant except MA(2), MA(3) and MA(4).
ARIMA(2,1,4)
model_214_css = arima(log(coin),order=c(2,1,4),method='CSS')
coeftest(model_214_css)
##
## z test of coefficients:
##
## Estimate Std. Error z value Pr(>|z|)
## ar1 0.937463 0.072607 12.9114 <2e-16 ***
## ar2 -0.762439 0.069316 -10.9995 <2e-16 ***
## ma1 -0.938400 0.076122 -12.3275 <2e-16 ***
## ma2 0.738491 0.077035 9.5864 <2e-16 ***
## ma3 0.018296 0.034348 0.5327 0.5943
## ma4 0.024422 0.025266 0.9666 0.3337
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
model_214_ml = arima(log(coin),order=c(2,1,4),method='ML')
coeftest(model_214_ml)
##
## z test of coefficients:
##
## Estimate Std. Error z value Pr(>|z|)
## ar1 0.223182 0.281327 0.7933 0.427593
## ar2 0.668226 0.269626 2.4783 0.013199 *
## ma1 -0.221597 0.281883 -0.7861 0.431792
## ma2 -0.707633 0.269894 -2.6219 0.008744 **
## ma3 0.015131 0.024672 0.6133 0.539682
## ma4 0.069739 0.026058 2.6763 0.007445 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
With the CSS method, all coefficients are significant except MA(3) and MA(4). With the ML method, only the AR(2), MA(2) and MA(4) coefficients are significant.
ARIMA(5,1,4)
model_514_css = arima(log(coin),order=c(5,1,4),method='CSS')
coeftest(model_514_css)
##
## z test of coefficients:
##
## Estimate Std. Error z value Pr(>|z|)
## ar1 0.618801 0.333882 1.8534 0.06383 .
## ar2 -0.223678 0.530398 -0.4217 0.67323
## ar3 -0.468648 0.432412 -1.0838 0.27845
## ar4 0.493471 0.216503 2.2793 0.02265 *
## ar5 0.039369 0.026566 1.4819 0.13837
## ma1 -0.621015 0.332601 -1.8671 0.06188 .
## ma2 0.217115 0.527681 0.4115 0.68074
## ma3 0.471968 0.412870 1.1431 0.25298
## ma4 -0.460210 0.191208 -2.4069 0.01609 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
model_514_ml = arima(log(coin),order=c(5,1,4),method='ML')
# coeftest(model_514_ml)
With the CSS method, none of the coefficients is significant at the 5% level except AR(4) and MA(4).
ARIMA(1,1,5)
model_115_css = arima(log(coin),order=c(1,1,5),method='CSS')
coeftest(model_115_css)
##
## z test of coefficients:
##
## Estimate Std. Error z value Pr(>|z|)
## ar1 0.1514020 0.1961901 0.7717 0.44029
## ma1 -0.1473827 0.1960818 -0.7516 0.45227
## ma2 -0.0288168 0.0242309 -1.1893 0.23434
## ma3 0.0063575 0.0233482 0.2723 0.78540
## ma4 0.0325136 0.0243124 1.3373 0.18112
## ma5 0.0459273 0.0226228 2.0301 0.04234 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
model_115_ml = arima(log(coin),order=c(1,1,5),method='ML')
coeftest(model_115_ml)
##
## z test of coefficients:
##
## Estimate Std. Error z value Pr(>|z|)
## ar1 0.9190625 0.0496161 18.5235 <2e-16 ***
## ma1 -0.9166829 0.0552893 -16.5798 <2e-16 ***
## ma2 -0.0375922 0.0323477 -1.1621 0.2452
## ma3 0.0289471 0.0325592 0.8891 0.3740
## ma4 0.0312379 0.0309922 1.0079 0.3135
## ma5 0.0072406 0.0269564 0.2686 0.7882
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
With the CSS method, none of the coefficients is significant except MA(5). With the ML method, only the AR(1) and MA(1) coefficients are significant.
ARIMA(2,1,5)
model_215_css = arima(log(coin),order=c(2,1,5),method='CSS')
coeftest(model_215_css)
##
## z test of coefficients:
##
## Estimate Std. Error z value Pr(>|z|)
## ar1 0.9054549 0.1029288 8.7969 <2e-16 ***
## ar2 -0.7555299 0.0672920 -11.2276 <2e-16 ***
## ma1 -0.9058495 0.1058401 -8.5587 <2e-16 ***
## ma2 0.7324045 0.0742457 9.8646 <2e-16 ***
## ma3 0.0297180 0.0359135 0.8275 0.4080
## ma4 0.0082017 0.0309218 0.2652 0.7908
## ma5 0.0223769 0.0264274 0.8467 0.3971
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
model_215_ml = arima(log(coin),order=c(2,1,5),method='ML')
# coeftest(model_215_ml)
With the CSS method, the AR(1), AR(2), MA(1) and MA(2) coefficients are significant, while the higher-order MA components are not.
ARIMA(5,1,5)
model_515_css = arima(log(coin),order=c(5,1,5),method='CSS')
# coeftest(model_515_css)
model_515_ml = arima(log(coin),order=c(5,1,5),method='ML')
coeftest(model_515_ml)
##
## z test of coefficients:
##
## Estimate Std. Error z value Pr(>|z|)
## ar1 0.538577 0.012935 41.636 < 2.2e-16 ***
## ar2 -0.648658 0.018734 -34.625 < 2.2e-16 ***
## ar3 0.606861 0.028922 20.983 < 2.2e-16 ***
## ar4 -0.532641 0.020064 -26.547 < 2.2e-16 ***
## ar5 0.925718 0.036598 25.294 < 2.2e-16 ***
## ma1 -0.529027 0.019678 -26.884 < 2.2e-16 ***
## ma2 0.635363 0.027054 23.485 < 2.2e-16 ***
## ma3 -0.586011 0.039693 -14.763 < 2.2e-16 ***
## ma4 0.523827 0.029264 17.900 < 2.2e-16 ***
## ma5 -0.864737 0.047732 -18.116 < 2.2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
With the ML method, all coefficients are significant.
Based on the significance tests above, ARIMA(2,1,2) and ARIMA(5,1,5) appear to be good candidates.
6.6 AIC and BIC values
sort.score(AIC(model_111_ml, model_112_ml, model_212_ml, model_511_ml,
model_512_ml, model_114_ml, model_214_ml, model_514_ml,
model_115_ml, model_215_ml, model_515_ml), score = "aic")
## df AIC
## model_515_ml 11 -5980.026
## model_514_ml 10 -5971.925
## model_212_ml 5 -5970.630
## model_114_ml 6 -5958.634
## model_511_ml 7 -5957.491
## model_115_ml 7 -5956.695
## model_214_ml 7 -5956.326
## model_512_ml 8 -5955.527
## model_215_ml 8 -5954.640
## model_111_ml 3 -5948.861
## model_112_ml 4 -5948.000
# sort.score(BIC(model_111_ml, model_112_ml, model_212_ml, model_511_ml,
# model_512_ml, model_114_ml, model_214_ml, model_514_ml, model_115_ml,
# model_215_ml, model_515_ml), score = "bic")
We focus only on the AIC values here. From the results, ARIMA(5,1,5) has the lowest AIC, so we chose it as the best-fitting ARIMA model.
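sort.score ranks the fits by AIC = 2k − 2·log L, where k is the number of estimated parameters, so extra parameters must buy enough likelihood to pay for themselves. An illustrative Python sketch with hypothetical log-likelihoods (not the fitted values above):

```python
def aic(k, log_lik):
    """Akaike information criterion: 2k - 2*logL (smaller is better)."""
    return 2 * k - 2 * log_lik

# Hypothetical log-likelihoods for two candidate models; the larger model
# wins only because its likelihood gain exceeds its parameter penalty.
candidates = {
    "ARIMA(2,1,2)": aic(5, 2990.3),
    "ARIMA(5,1,5)": aic(11, 3001.0),
}
best = min(candidates, key=candidates.get)
print(candidates)
print(best)
```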
6.6.1 Residual Analysis
residual.analysis(model = model_515_ml)
##
## Shapiro-Wilk normality test
##
## data: res.model
## W = 0.88675, p-value < 2.2e-16
Figure 14: Residual analysis of the ARIMA(5,1,5) model
• Residuals plot: although the model removes the trend in the series, changing variance is still very obvious in the residuals.
• Histogram: cannot be viewed as symmetric.
• Q-Q plot: a significant number of points depart from the reference line, so the normality assumption does not hold for this series; the thick tails imply an ARCH effect.
• Shapiro-Wilk test: with a p-value less than 0.01, we reject the null hypothesis that the stochastic component of this model is normally distributed.
• ACF plot: a few significant lags confirm that some autocorrelation is left in the residuals.
• Ljung-Box test: all points are above the red dashed line, so we have no evidence to reject the null hypothesis that the error terms are uncorrelated.
Overall, based on the residual analysis above, normality does not hold for ARIMA(5,1,5), and the model does not fully capture the dependence structure of the Bitcoin series.
6.6.2 Forecasting the ARIMA model
r.coin = diff(log(coin))
log.data = log(bitcoin$Close)
fit = arima(coin, order = c(5,1,5),
xreg = data.frame(constant = seq(coin)))
n = length(coin)
n.ahead = 10
newxreg = data.frame(constant = (n+1):(n+n.ahead))
plot.Arima(fit, n.ahead = n.ahead, newxreg = newxreg,
ylab = 'Bitcoin price',
xlab = 'Year', n1 = c(2013,117),
col='red')
Figure 15: Forecast plot of the Bitcoin series for the next 10 days
fit2 = Arima(coin, order = c(5,1,5))
forecast(fit2, h=10)
## Point Forecast Lo 80 Hi 80 Lo 95 Hi 95
## 2018.1726 11107.68 10824.49 11390.86 10674.583 11540.77
## 2018.1753 11333.36 10910.22 11756.49 10686.231 11980.48
## 2018.1781 11256.40 10739.87 11772.92 10466.441 12046.35
## 2018.1808 11488.63 10887.62 12089.64 10569.470 12407.79
## 2018.1836 11476.27 10807.46 12145.08 10453.419 12499.12
## 2018.1863 11178.59 10439.75 11917.43 10048.626 12308.55
## 2018.1890 11400.63 10591.36 12209.90 10162.955 12638.31
## 2018.1918 11306.78 10438.70 12174.85 9979.165 12634.39
## 2018.1945 11583.85 10657.92 12509.78 10167.764 12999.94
## 2018.1973 11414.00 10438.25 12389.74 9921.726 12906.27
6.6.2.1 Original series
# fit = Arima(coin,c(5,1,5))
# plot(forecast(fit,h=10))
This forecast does not make much sense and does not provide us with any valuable information.
Referring back to the McLeod-Li test, the series is significant at the 5% level for all lags, which indicates volatility clustering. Additionally, the changing variance remains very obvious even after detrending with the ARIMA model. Therefore, we decided to model the conditional variance by applying GARCH models in the next section.
7 Modelling - GARCH model
We use the log-transformed and first-differenced series to examine the GARCH effect.
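A GARCH(1,1) model lets today's conditional variance depend on the last squared return and the last variance: h_t = a0 + a1·r_{t-1}² + b1·h_{t-1}. The following Python sketch of that recursion uses illustrative coefficients of a magnitude typical for daily returns (made-up values, not estimates from this report):

```python
def garch11_variance(returns, a0, a1, b1):
    """Conditional variance recursion h_t = a0 + a1*r_{t-1}^2 + b1*h_{t-1},
    started at the unconditional variance a0 / (1 - a1 - b1)."""
    h = [a0 / (1.0 - a1 - b1)]
    for r in returns[:-1]:
        h.append(a0 + a1 * r * r + b1 * h[-1])
    return h

# Illustrative daily log returns: the large 0.15 shock raises next-day
# variance, which then decays geometrically at rate a1 + b1.
rets = [0.01, -0.02, 0.15, 0.01, 0.0, -0.01]
h = garch11_variance(rets, a0=3.6e-05, a1=0.1, b1=0.85)
print(h)  # variance jumps right after the shock, then persists
```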
7.1 ACF and PACF
par(mfrow=c(3,2))
acf(r.coin,main="Sample ACF of lg Daily Bitcoin price change:
April 27, 2013 to March 3, 2018")
pacf(r.coin,main="Sample PACF of lg Daily Bitcoin price change:
April 27, 2013 to March 3, 2018")
acf(abs(r.coin),main="Sample ACF of the Absolute Daily Bitcoin price")
pacf(abs(r.coin),main="Sample PACF of the Absolute Daily Bitcoin price")
acf(r.coin^2,main="Sample ACF of the Squared Daily Bitcoin price")
pacf(r.coin^2,main="Sample PACF of the Squared Daily Bitcoin price")
Figure 16: ACF and PACF plots of the log-transformed and first-differenced series, its absolute values, and its squared values
par(mfrow=c(1,1))
There are many significant correlations in the ACF and PACF of both the squared and absolute value series, reflecting how volatile the series is; this also reveals that the daily closing price of Bitcoin is not independently and identically distributed.
7.2 Tests for the absolute values
McLeod.Li.test(y=abs(r.coin),main="McLeod-Li Test Statistics for Absolute Daily Bitcoin price")
Figure 17: McLeod-Li test of the absolute values
qqnorm(abs(r.coin), main="Q-Q Normal Plot of the Absolute Daily Bitcoin price")
qqline(abs(r.coin))
shapiro.test(abs(r.coin))
##
## Shapiro-Wilk normality test
##
## data: abs(r.coin)
## W = 0.70523, p-value < 2.2e-16
The McLeod-Li test is significant, which strongly suggests the existence of volatility clustering. Meanwhile, the fat tails in the Q-Q plot are also consistent with volatility clustering, and the Shapiro-Wilk test reveals that normality does not hold.
Figure 18: Q-Q plot of the absolute values
7.3 Tests for the squared values
McLeod.Li.test(y=r.coin^2,main="McLeod-Li Test Statistics for the Squared Daily Bitcoin price")
qqnorm(r.coin^2, main="Q-Q Normal Plot of the Squared Daily Bitcoin price")
qqline(r.coin^2)
shapiro.test(r.coin^2)
##
## Shapiro-Wilk normality test
##
## data: r.coin^2
## W = 0.3113, p-value < 2.2e-16
Again, the McLeod-Li test is significant, strongly suggesting volatility clustering; the fat tails are consistent with this, and the Shapiro-Wilk test shows that normality does not hold.
7.4 Order Specification
eacf(r.coin)
Figure 19: McLeod-Li test of the squared values

Figure 20: Q-Q plot of the squared values
## AR/MA
## 0 1 2 3 4 5 6 7 8 9 10 11 12 13
## 0 o o o o x x o o o x x o o o
## 1 x o o o o x o o o o x o o o
## 2 x x o o o x o o o o x o o o
## 3 x x x o o x o o o o x o o o
## 4 x x o x o x o o o o x o o o
## 5 x x x x x o o o o o x o o o
## 6 x x o x x x o o o o o o o o
## 7 x x o x x x x o o o o o o o
# No white noise
eacf(abs(r.coin))
## AR/MA
## 0 1 2 3 4 5 6 7 8 9 10 11 12 13
## 0 x x x x x x x x x x x x x x
## 1 x o o x x x o o o x o o o o
## 2 x x o o o o o o o x o o o o
## 3 x x x o o o o o o x o o o o
## 4 x x x x o o o o o x o o o o
## 5 x x x x x o o o o o o o o o
## 6 x x x x x o o o o o o o o o
## 7 x o x x x x x x o o o o o o
eacf(r.coin^2)
## AR/MA
## 0 1 2 3 4 5 6 7 8 9 10 11 12 13
## 0 x x x x x x x x x x x x x x
## 1 x x x x x x o x o x o o x o
## 2 x x o o o x o o o x o o x o
## 3 x x x o o x o o o x o o x o
## 4 x x x o o o o o o o o o x o
## 5 x x x o x x o o o o o o o o
## 6 x x x x x x o o o o o o o o
## 7 x x x o x x x x o o o o o o
The EACF of the log-transformed, first-differenced series confirms there is little serial correlation,
suggesting an essentially white noise series.
The EACF of the absolute values suggests GARCH(1,1) and GARCH(1,2).
The EACF of the squared values suggests GARCH(2,2) and GARCH(2,3).
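Reading GARCH orders off the EACF rests on the fact that if e_t follows a GARCH(p,q) process, then e_t^2 behaves approximately like an ARMA(max(p,q), p) process, so the usual ARMA vertex rules apply to the squared (or absolute) series. A simulation sketch, assuming the `TSA` package for `eacf` as used above:

```r
library(TSA)                          # provides eacf(), as used above
set.seed(123)
n <- 2000
z <- rnorm(n)
h <- numeric(n); e <- numeric(n)
h[1] <- 0.05 / (1 - 0.15 - 0.8)       # start at the unconditional variance
e[1] <- z[1] * sqrt(h[1])
for (t in 2:n) {
  h[t] <- 0.05 + 0.15 * e[t - 1]^2 + 0.8 * h[t - 1]  # GARCH(1,1) recursion
  e[t] <- z[t] * sqrt(h[t])
}
eacf(e^2)   # the 'o' vertex should appear near the ARMA(1,1) position
```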
7.5 Model Fitting, Estimation and Diagnostic Checking
We fit the models identified by the EACF and check the residuals for violations of assumptions in
order to find the best orders for the GARCH components.
7.5.1 GARCH(1,1)
m.11 = garch(r.coin,order=c(1,1),trace = FALSE)
summary(m.11) # All coefficients are significant at 5% level of significance.
##
## Call:
## garch(x = r.coin, order = c(1, 1), trace = FALSE)
##
## Model:
## GARCH(1,1)
##
## Residuals:
## Min 1Q Median 3Q Max
## -8.55811 -0.33740 0.06798 0.53194 4.76716
##
## Coefficient(s):
## Estimate Std. Error t value Pr(>|t|)
## a0 3.632e-05 3.816e-06 9.518 <2e-16 ***
## a1 1.485e-01 9.555e-03 15.546 <2e-16 ***
## b1 8.522e-01 7.025e-03 121.310 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Diagnostic Tests:
## Jarque Bera Test
##
## data: Residuals
## X-squared = 5626.9, df = 2, p-value < 2.2e-16
##
##
## Box-Ljung test
##
## data: Squared.Residuals
## X-squared = 1.7522, df = 1, p-value = 0.1856
#m.11_2 = garchFit(formula = ~garch(1,1), data =r.coin )
#summary(m.11_2)
residual.analysis(m.11,class="GARCH",start=25)
##
## Shapiro-Wilk normality test
##
## data: res.model
## W = 0.9021, p-value < 2.2e-16
All coefficients are significant at the 5% level of significance. Normality does not hold, and there is
autocorrelation left in the residuals.
(Panels: time series plot, histogram, ACF, PACF and Q-Q plot of the standardised residuals, and Ljung-Box test p-values.)
Figure 21: Residual analysis of GARCH model
7.5.2 GARCH(1,2)
m.12 = garch(r.coin,order=c(1,2),trace = FALSE)
summary(m.12) # All coefficients are significant at 5% level of significance.
##
## Call:
## garch(x = r.coin, order = c(1, 2), trace = FALSE)
##
## Model:
## GARCH(1,2)
##
## Residuals:
## Min 1Q Median 3Q Max
## -8.51686 -0.32915 0.06704 0.51713 4.74037
##
## Coefficient(s):
## Estimate Std. Error t value Pr(>|t|)
## a0 5.215e-05 5.795e-06 8.999 <2e-16 ***
## a1 1.808e-01 1.921e-02 9.411 <2e-16 ***
## a2 8.007e-08 2.309e-02 0.000 1
## b1 8.182e-01 1.274e-02 64.233 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Diagnostic Tests:
## Jarque Bera Test
##
## data: Residuals
## X-squared = 6437.8, df = 2, p-value < 2.2e-16
##
##
## Box-Ljung test
##
## data: Squared.Residuals
## X-squared = 0.16261, df = 1, p-value = 0.6868
m.12_2 = garchFit(formula = ~garch(1,2), data =r.coin )
##
## Series Initialization:
## ARMA Model: arma
## Formula Mean: ~ arma(0, 0)
## GARCH Model: garch
## Formula Variance: ~ garch(1, 2)
## ARMA Order: 0 0
## Max ARMA Order: 0
## GARCH Order: 1 2
## Mean and Variance Equation:
## data ~ garch(1, 2)
## <environment: 0x7fa47b3b34e8>
## [data = r.coin]
##
## Conditional Distribution:
## norm
##
## Coefficient(s):
## mu omega alpha1 beta1 beta2
## 1.6358e-03 4.2894e-05 2.1253e-01 2.3193e-01 5.6246e-01
##
## Std. Errors:
## based on Hessian
##
## Error Analysis:
## Estimate Std. Error t value Pr(>|t|)
## mu 1.636e-03 7.114e-04 2.299 0.021479 *
## omega 4.289e-05 1.213e-05 3.537 0.000405 ***
## alpha1 2.125e-01 2.781e-02 7.642 2.13e-14 ***
## beta1 2.319e-01 6.797e-02 3.412 0.000644 ***
## beta2 5.625e-01 6.317e-02 8.904 < 2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Log Likelihood:
## 3306.982 normalized: 1.867296
##
## Description:
## Sat May 26 12:18:17 2018 by user:
##
##
## Standardised Residuals Tests:
## Statistic p-Value
## Jarque-Bera Test R Chi^2 5296.654 0
## Shapiro-Wilk Test R W 0.9066555 0
## Ljung-Box Test R Q(10) 37.47967 4.673712e-05
## Ljung-Box Test R Q(15) 43.96988 0.0001111644
## Ljung-Box Test R Q(20) 54.04404 5.698434e-05
## Ljung-Box Test R^2 Q(10) 7.616766 0.6662139
## Ljung-Box Test R^2 Q(15) 10.03355 0.817624
## Ljung-Box Test R^2 Q(20) 12.65941 0.8915146
## LM Arch Test R TR^2 8.639396 0.733383
##
## Information Criterion Statistics:
## AIC BIC SIC HQIC
## -3.728946 -3.713477 -3.728962 -3.723231
# residual.analysis(m.12,class="GARCH",start=3)
Combining the two estimation methods, we can conclude that all coefficients are significant at the
5% level of significance. Normality does not hold, but there is no autocorrelation left in the residuals.
GARCH(1,2) therefore looks quite good.
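For reference, the fitted GARCH(1,2) variance equation (in the garchFit parameterisation above) is h_t = omega + alpha1*e^2_{t-1} + beta1*h_{t-1} + beta2*h_{t-2}. A sketch of this recursion using the rounded estimates from the summary; `cond_var` is our own illustrative helper, not part of any package:

```r
omega <- 4.289e-05; alpha1 <- 0.2125; beta1 <- 0.2319; beta2 <- 0.5625
cond_var <- function(e) {
  n <- length(e)
  h <- rep(var(e), n)                  # initialise at the sample variance
  for (t in 3:n) {
    h[t] <- omega + alpha1 * e[t - 1]^2 + beta1 * h[t - 1] + beta2 * h[t - 2]
  }
  h
}
# h <- cond_var(r.coin)   # r.coin: log-transformed, first-differenced series
```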
7.5.3 GARCH(2,2)
# m.22 = garch(r.coin,order=c(2,2),trace = FALSE)
# summary(m.22) # Not all coefficients are significant at 5% level of significance.
# m.22_2 = garchFit(formula = ~garch(2,2), data =r.coin )
# summary(m.22_2)
# residual.analysis(m.22,class="GARCH",start=11)
Not all coefficients are significant at 5% level of significance.
7.5.4 GARCH(2,3)
#m.23 = garch(r.coin,order=c(2,3),trace = FALSE)
#summary(m.23) # Not all coefficients are significant at 5% level of significance.
# m.23_2 = garchFit(formula = ~garch(2,3), data =r.coin )
# summary(m.23_2)
# residual.analysis(m.23,class="GARCH",start=4)
Not all coefficients are significant at the 5% level of significance, especially those of the higher-order
components.
# sort.score(AIC(m.11,m.12,m.22,m.23), score = "aic")
As a result, all models give similar residual diagnostics, but the parameter estimates indicate that
GARCH(1,2) is the best, even though the AIC comparison did not give the same result. We will
continue with the GARCH(1,2) model to produce estimated conditional variance forecasts.
7.6 Estimated Conditional Variance Prediction
plot((fitted(m.12)[,1])^2,type='l',
ylab='Conditional Variance',
xlab='t',main="Estimated Conditional Variances of the Daily Bitcoin Price")
7.7 Prediction with Confidence Intervals
## Sign Bias 1.0782 0.2811
## Negative Sign Bias 0.1318 0.8952
## Positive Sign Bias 0.0200 0.9840
## Joint Effect 2.0908 0.5538
##
##
## Adjusted Pearson Goodness-of-Fit Test:
## ------------------------------------
## group statistic p-value(g-1)
## 1 20 310.2 1.646e-54
## 2 30 328.6 1.664e-52
## 3 40 349.5 1.545e-51
## 4 50 363.8 1.191e-49
##
##
## Elapsed time : 0.328917
# All trend components significant.
After increasing the order of the MA component, the p-value of the MA(2) coefficient, although still
significant, becomes larger.
par(mfrow=c(2,2))
plot(m.12_11, which=8)
plot(m.12_11, which=9)
plot(m.12_11, which=10)
par(mfrow=c(1,1))
The residuals are still not white noise, and the distribution of the standardised residuals has fat tails
and appears far from normality.
This implies it is safer to stay with ARMA(1,1). Next we look for the most suitable orders for the
GARCH part.
8.1.4 ARMA(1,1) + GARCH(2,1)
model5<-ugarchspec(variance.model = list(model = "sGARCH", garchOrder = c(2, 1)),
mean.model = list(armaOrder = c(1, 1), include.mean = FALSE),
distribution.model = "norm")
m.11_21<-ugarchfit(spec=model5,data=r.coin,out.sample = 100)
m.11_21
##
## *---------------------------------*
## * GARCH Model Fit *
## *---------------------------------*
##
## Conditional Variance Dynamics
## -----------------------------------
(Panels: empirical density of standardised residuals, normal Q-Q plot, and ACF of standardised residuals.)
Figure 26: Residual analysis of ARMA+GARCH model
## Negative Sign Bias 0.2221 0.8242
## Positive Sign Bias 0.4836 0.6287
## Joint Effect 1.5024 0.6817
##
##
## Adjusted Pearson Goodness-of-Fit Test:
## ------------------------------------
## group statistic p-value(g-1)
## 1 20 303.7 3.446e-53
## 2 30 327.4 2.913e-52
## 3 40 338.3 2.346e-49
## 4 50 354.2 7.746e-48
##
##
## Elapsed time : 0.151274
par(mfrow=c(2,2))
plot(m.11_13, which=8)
plot(m.11_13, which=9)
plot(m.11_13, which=10)
par(mfrow=c(1,1))
Not all component parameters are significant, and the residuals were not improved.
As a result, we should stay with GARCH(1,2). The best-fitting model is expected to be
ARMA(1,1)+GARCH(1,2).
(Panels: empirical density of standardised residuals, normal Q-Q plot, and ACF of standardised residuals.)
Figure 29: Residual analysis of ARMA+GARCH model
8.2 Model Selection
We now compare the information criteria from the fit summaries above to check whether
ARMA(1,1)+GARCH(1,2) attains the lowest values.
m.11_11
m.01_11
m.12_11
m.11_21
m.11_12
m.11_13
Based on the information criteria of the GARCH model fits:
Criteria m.11_11 m.01_11 m.12_11 m.11_21 m.11_12 m.11_13
Akaike -3.8065 -3.8068 -3.7206 -3.8035 -3.8094 -3.7247
Bayes -3.7903 -3.7938 -3.7021 -3.7841 -3.7899 -3.7031
Shibata -3.8065 -3.8068 -3.7206 -3.8036 -3.8094 -3.7248
Hannan-Quinn -3.8005 -3.8020 -3.7138 -3.7963 -3.8021 -3.7167
Therefore, ARMA(1,1)+GARCH(1,2) is the best model for this dataset: except for the Bayes criterion,
m.11_12 has the smallest value of every criterion. This also agrees with the orders identified in the
GARCH-effect analysis above.
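The criteria table above can be assembled directly from the rugarch fits with `infocriteria()`, which returns the Akaike, Bayes, Shibata and Hannan-Quinn values for each fit:

```r
library(rugarch)
# Collect the six candidate fits defined earlier in one named list
fits <- list(m.11_11 = m.11_11, m.01_11 = m.01_11, m.12_11 = m.12_11,
             m.11_21 = m.11_21, m.11_12 = m.11_12, m.11_13 = m.11_13)
crit <- sapply(fits, infocriteria)          # one column per model
rownames(crit) <- c("Akaike", "Bayes", "Shibata", "Hannan-Quinn")
round(crit, 4)
```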
9 Forecasting
forc.11_12 = ugarchforecast(m.11_12, data = r.coin, n.ahead = 10, n.roll = 90,
out.sample = 100)
forc.11_12
##
## *------------------------------------*
## * GARCH Model Forecast *
## *------------------------------------*
## Model: sGARCH
## Horizon: 10
## Roll Steps: 90
## Out of Sample: 10
##
## 0-roll forecast [T0=1974-07-30 10:00:00]:
## Series Sigma
## T+1 0.003018 0.04142
## T+2 0.002880 0.04365
## T+3 0.002749 0.04301
## T+4 0.002623 0.04394
## T+5 0.002503 0.04399
## T+6 0.002389 0.04452
## T+7 0.002280 0.04478
## T+8 0.002176 0.04519
## T+9 0.002077 0.04551
## T+10 0.001982 0.04587
plot(forc.11_12,which="all")
(Panels: forecast series with unconditional 1-sigma bands, horizon 10; rolling forecast vs actual series with conditional 2-sigma bands, horizon 90; forecast unconditional sigma; rolling sigma forecast vs |series|.)
Figure 30: Forecasting of ARMA(1,1)+GARCH(1,2) Model
The first of the four plots is the usual forecast plot; it indicates there may be a slight general
decreasing trend over the next 10 days.
Rolling forecasts are generally used for longer-term forecasting. We observe that the conditional
variance will move upward in the future, and the plots suggest the Bitcoin closing price will keep
fluctuating and gently increase in the longer term.
10 MASE
By calculating the MASE, we can find the best MASE values for our candidate models over both
the fitted values and the forecasts.
a denotes the MASE value over the forecasts.
b denotes the MASE value over the fitted values.
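`MASE()` below is a course-supplied helper; a minimal sketch consistent with how it is called, scaling the mean absolute error by the in-sample mean absolute one-step naive error (the function name `MASE_sketch` is ours, not the helper itself):

```r
MASE_sketch <- function(observed, predicted) {
  scale <- mean(abs(diff(observed)))        # naive one-step forecast error
  mean(abs(observed - predicted)) / scale
}
MASE_sketch(c(10, 12, 11, 13), c(10.5, 11.5, 11.5, 12.5))   # 0.3
```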
forc.coin <- read.csv("Bitcoin_Prices_Forecasts.csv")
inds1 <- seq(as.Date("2018-03-04"), as.Date("2018-03-13"), by = "day")
forc.coin1 <- ts(forc.coin$Closing.price, start = c(2018, as.numeric(format(inds1[1],
"%j"))), frequency = 365)
10.1 MASE for ARMA(1,1)+GARCH(1,1)
# MASE over the forecasts:
forc = ugarchforecast(m.11_11, data = r.coin, n.ahead = 9, n.roll = 90, out.sample = 100)
# forc@forecast$seriesFor
a = forc@forecast$seriesFor[, ncol(forc@forecast$seriesFor)]
a = diffinv(a, differences = 1, xi = log(11512.6))
a = exp(a)
MASE(as.numeric(forc.coin$Closing.price), as.numeric(a)) # 3832.578
## $MASE
## MASE
## 1 3832.578
# MASE over the fitted values:
model.fit = m.11_11
b = model.fit@fit$fitted.values
b = diffinv(b, differences = 1, xi = log(134.21))
b = exp(b)
MASE(coin, b) # 49.70846
## $MASE
## MASE
## 1 49.70846
• a= 3832.578
• b= 49.70846
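The `diffinv()`/`exp()` back-transform above undoes the log and first difference in one pass; a toy check with made-up prices that it recovers the original level exactly:

```r
price <- c(134.21, 140.00, 138.50, 150.25)    # illustrative closing prices
d <- diff(log(price))                          # log-returns
back <- exp(diffinv(d, differences = 1, xi = log(price[1])))
all.equal(back, price)                         # TRUE
```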
10.2 MASE for ARMA(0,1)+GARCH(1,1)
forc = ugarchforecast(m.01_11, data = r.coin, n.ahead = 9, n.roll = 90, out.sample = 100)
# forc@forecast$seriesFor
a = forc@forecast$seriesFor[, ncol(forc@forecast$seriesFor)]
a = diffinv(a, differences = 1, xi = log(11512.6))
a = exp(a)
MASE(as.numeric(forc.coin$Closing.price), as.numeric(a)) # 3826.62
## $MASE
## MASE
## 1 3826.62
model.fit = m.01_11
b = model.fit@fit$fitted.values
b = diffinv(b, differences = 1, xi = log(134.21))
b = exp(b)
MASE(coin, b) # 51.58759
## $MASE
## MASE
## 1 51.58759
• a= 3826.62
• b= 51.58759
10.3 MASE for ARMA(1,2)+GARCH(1,1)
forc = ugarchforecast(m.12_11, data = r.coin, n.ahead = 9, out.sample = 100)
# forc@forecast$seriesFor
a = forc@forecast$seriesFor[, ncol(forc@forecast$seriesFor)]
a = diffinv(a, differences = 1, xi = log(11512.6))
a = exp(a)
MASE(as.numeric(forc.coin$Closing.price), as.numeric(a)) # 3857.048
## $MASE
## MASE
## 1 3857.048
model.fit = m.12_11
b = model.fit@fit$fitted.values
b = diffinv(b, differences = 1, xi = log(134.21))
b = exp(b)
MASE(coin, b) # 20.73478
## $MASE
## MASE
## 1 20.73478
• a=3857.048
• b=20.73478
10.4 MASE for ARMA(1,1)+GARCH(2,1)
forc = ugarchforecast(m.11_21, data = r.coin, n.ahead = 9, n.roll = 90, out.sample = 100)
# forc@forecast$seriesFor
a = forc@forecast$seriesFor[, ncol(forc@forecast$seriesFor)]
a = diffinv(a, differences = 1, xi = log(11512.6))
a = exp(a)
MASE(as.numeric(forc.coin$Closing.price), as.numeric(a)) # 3832.99
## $MASE
## MASE
## 1 3832.99
model.fit = m.11_21
b = model.fit@fit$fitted.values
b = diffinv(b, differences = 1, xi = log(134.21))
b = exp(b)
MASE(coin, b) # 49.73594
## $MASE
## MASE
## 1 49.73594
• a= 3832.99
• b= 49.73594
10.5 MASE for ARMA(1,1)+GARCH(1,2)
# MASE over the forecasts:
forc = ugarchforecast(m.11_12, data = r.coin, n.ahead = 9, n.roll = 90, out.sample = 100)
# forc@forecast$seriesFor
a = forc@forecast$seriesFor[, ncol(forc@forecast$seriesFor)]
a = diffinv(a, differences = 1, xi = log(11512.6))
MASE(as.numeric(forc.coin$Closing.price), as.numeric(a)) # 1.326658
## $MASE
## MASE
## 1 1.326658
# MASE over fitted values:
model.fit = m.11_12
b = model.fit@fit$fitted.values
b = diffinv(b, differences = 1, xi = log(134.21))
b = exp(b)
MASE(coin, b) # 49.60984
## $MASE
## MASE
## 1 49.60984
• a= 1.326658
• b= 49.60984
10.6 MASE for ARMA(1,1)+GARCH(1,3)
# forc = ugarchforecast(m.11_13, data = r.coin, n.ahead = 9, n.roll = 90,
# out.sample = 100) forc@forecast$seriesFor a =
# forc@forecast$seriesFor[,ncol(forc@forecast$seriesFor)] a = diffinv(a,
# differences = 1, xi = log(11512.60)) a = exp(a)
# MASE(as.numeric(forc.coin$Closing.price), as.numeric(a)) # 3831.291
# model.fit = m.11_13 b = model.fit@fit$fitted.values b = diffinv(b,
# differences = 1, xi = log(134.21)) b = exp(b)
# MASE(coin, b) # 21.6302
• a= 3831.291
• b= 21.6302
The best value of a results from ARMA(1,1)+GARCH(1,2); the best value of b is generated by
ARMA(1,2)+GARCH(1,1).
Overall, the following table lists the best MASE values of our best models:
Type BEST MASE
Over fitted values 20.73478
Over forecasts 1.326658
11 Conclusion
In conclusion, this report has focused on model specification, parameter estimation and diagnostic
checking among a set of candidate models for the daily closing price of Bitcoin from the 27th of
April 2013 to the 3rd of March 2018, based on the characteristics of its time series plot.
ARMA(1,1)+GARCH(1,2) is confirmed as the best model for the log-transformed, first-differenced
Bitcoin series by coefficient significance tests, though its residual diagnostics are not perfect.
Additionally, predictions of the Bitcoin closing price for the next 10 days imply a slight decreasing
trend, but the price is expected to increase in the long term, with the conditional variance trending
upward as well.
ARMA(1,2)+GARCH(1,1) is the best-fitting model as judged by the MASE method.
Finally, the best MASE value over the fitted values is 20.73478, from ARMA(1,2)+GARCH(1,1),
and the best MASE value over the forecasts is 1.326658, from ARMA(1,1)+GARCH(1,2).