Natural Gas Time Series Analysis
Shalavadi
Natural Gas Time Series Analysis
By Sandesh Shalavadi
PSTAT 174
June 7, 2016
Sandesh Shalavadi (#7210834)
PSTAT 174
Gyorgy Terdik
6/6/2016
Natural Gas Data Time Series Analysis
I. Introduction
Natural gas is a hydrocarbon gas made up mainly of methane, with smaller amounts of
carbon dioxide, hydrogen sulfide, helium, or nitrogen. It forms as layers of plant and
animal matter are exposed to extreme heat and pressure under the Earth’s surface. It is
a fossil fuel that is currently used for cooking, electricity generation, and heating.
Compared with its counterparts, such as petroleum or coal, it burns more efficiently and
releases far fewer emissions into the environment. It is therefore regarded as a relatively
clean and environmentally friendly fuel, which is significant in today’s polluted
atmosphere. In addition, natural gas is odorless, colorless, and shapeless. Because it is
lighter than air, leaked gas tends to rise and disperse rather than pool near the ground,
which lowers the risk of a buildup that could cause an explosion.
I chose natural gas as my research topic because I am interested in recent efforts to
address climate change and combat global warming. At the Paris climate talks in 2015,
195 countries gathered to discuss how to slow global warming by reducing greenhouse gas
emissions, helping developing countries become more eco-friendly and adapt to the adverse
effects of climate change, and building a financial plan to support a pathway to
climate-resilient development. Time series analysis of natural gas prices can help improve
predictions of changes in manufacturing and labor costs, which will help each country
reach its respective goals.
The goal of this project is to carry out a time series analysis of a real data set. I chose
the monthly price of natural gas from 1996 to 2016, a span of 20 years that shows how the
price has fluctuated. I used R to analyze the data and fit a model. First, I differenced
the data to obtain a stationary time series by removing the trend. Then, I identified
candidate model orders from the ACF and PACF of the differenced time series. I examined
several plots to select five candidate models and compared them by AIC. Finally, I used
the selected model to forecast future values.
II. Sections
According to the original time series plot, the process is not stationary, since there are a
few upward trends that reflect significant increases in the price of natural gas. The mean is
4.625945 and the variance is 5.428623. No seasonality is apparent in the original
plot. There were sharp changes in the years 2001, 2006, and 2008. The price reached its
highest point in October 2005 and its lowest point in December 1998.
Modelling:
Time Series plot of original data
ACF and PACF plot of original data
In the autocorrelation function (ACF) plot, the gradual decay of the ACF indicates non-stationarity.
Therefore, to obtain a stationary series, I differenced the data to remove the trend.
I used differencing rather than a Box-Cox or log transformation to avoid adding an
extra transformation step.
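The effect of first differencing can be sketched on simulated data. The series below is a random walk standing in for a trending price series; it is not the natural gas data, and the names `sim_price` and `sim_diff` are made up for this illustration.

```r
# Simulated stand-in for a trending monthly price series (not the report's data).
set.seed(174)
n <- 217                                   # months from April 1996 to April 2014
sim_price <- ts(5 + cumsum(rnorm(n, sd = 0.8)),
                start = c(1996, 4), frequency = 12)

# First difference: d_t = x_t - x_{t-1}; one observation is lost.
sim_diff <- diff(sim_price, lag = 1, differences = 1)

# Lag-1 sample autocorrelation before and after differencing.
r1_level <- acf(sim_price, lag.max = 1, plot = FALSE)$acf[2]
r1_diff  <- acf(sim_diff,  lag.max = 1, plot = FALSE)$acf[2]

cat("length before/after:", length(sim_price), length(sim_diff), "\n")
cat("lag-1 ACF before/after:", round(r1_level, 3), round(r1_diff, 3), "\n")
```

For a random walk the lag-1 autocorrelation of the levels is close to 1, while the differenced series behaves like white noise, which mirrors the slowly decaying ACF described above.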
Differenced time series data:
Time series plot of differenced data
ACF and PACF plot of differenced data
Parameter estimation
After differencing, the new time series plot looks stationary. The mean is
0.01101852, which is approximately 0, and the variance decreased to 0.6964492. The
ACF also suggests some seasonality. Since both the ACF and the PACF tail off rapidly, an
ARIMA model is plausible. According to the ACF plot, the ACF cuts off at lag 5 and lag 9.
According to the PACF plot, the PACF also cuts off at lag 5 and lag 9.
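Reading cut-offs from ACF and PACF plots amounts to checking which sample correlations fall outside the approximate 95% band. A minimal sketch on a simulated AR(1) series (illustrative only; the series, the 12-lag window, and the variable names are assumptions, not part of the analysis):

```r
set.seed(1)
x <- arima.sim(model = list(ar = 0.6), n = 300)   # simulated AR(1), illustration only
n <- length(x)
bound <- qnorm(0.975) / sqrt(n)                   # approximate 95% band drawn by acf()

# Sample ACF (lag 0 dropped) and PACF (starts at lag 1) for lags 1..12.
acf_vals  <- as.numeric(acf(x,  lag.max = 12, plot = FALSE)$acf)[-1]
pacf_vals <- as.numeric(pacf(x, lag.max = 12, plot = FALSE)$acf)

# Lags whose correlations fall outside the band.
sig_acf  <- which(abs(acf_vals)  > bound)
sig_pacf <- which(abs(pacf_vals) > bound)

cat("ACF lags outside the band: ", sig_acf, "\n")
cat("PACF lags outside the band:", sig_pacf, "\n")
```

With 12 lags checked at the 5% level, an isolated spike can occur by chance, so single lags outside the band are usually treated as suggestive rather than decisive.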
----------------------------------------------------------------------------------------------------------------
adf.test(newgas)
Augmented Dickey-Fuller Test
data: newgas
Dickey-Fuller = -6.1023, Lag order = 5, p-value = 0.01
alternative hypothesis: stationary
A p-value below 0.05 indicates stationarity.
The low p-value from the ADF test therefore supports treating the differenced data as stationary.
----------------------------------------------------------------------------------------------------------------
plot(decompose(newgas))
Seasonal trend of fitted data
From the decomposition of the time series, we can see a roughly quadratic trend together with a
seasonal component, which suggests fitting a seasonal model to the data.
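As a rough illustration of what `decompose` returns, the sketch below applies it to a simulated monthly series with a known trend and seasonal pattern. The series and the names `sim` and `parts` are assumptions for this example only.

```r
set.seed(2)
t_idx <- 1:120
# Simulated monthly series: linear trend + 12-month sine wave + noise.
sim <- ts(0.02 * t_idx + 2 * sin(2 * pi * t_idx / 12) + rnorm(120, sd = 0.5),
          start = c(2000, 1), frequency = 12)

parts <- decompose(sim)              # additive model: trend + seasonal + random
seasonal_by_month <- parts$figure    # one estimated seasonal effect per month

print(round(seasonal_by_month, 2))
plot(parts)                          # panels: observed, trend, seasonal, random
```

`decompose` always returns a seasonal component for a series with `frequency = 12`, so the size of the seasonal effects relative to the random component matters more than their mere presence.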
Model Diagnostics:
Fit the models
To find the best model, I fit the following five models and then compared their AIC values
to find the one with the smallest AIC. Each label gives the seasonal order, followed by the
full specification fitted in R:
Model 1 ARIMA (1,1,0): SARIMA(1,0,0)×(1,1,0)12
Model 2 ARIMA (1,0,0): SARIMA(1,0,0)×(1,0,0)12; lowest AIC
Model 3 ARIMA (2,1,0): SARIMA(2,0,0)×(2,1,0)12
Model 4 ARIMA (2,1,1): SARIMA(2,0,0)×(2,1,1)12; second lowest AIC
Model 5 ARIMA (2,0,1): SARIMA(2,0,0)×(2,0,1)12
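The fit-then-compare-AIC step can be sketched as a loop over candidate specifications. The series and the candidate set below are hypothetical and chosen only to keep the example self-contained; they are not the five models above.

```r
set.seed(3)
# Simulated monthly series standing in for the differenced prices.
y <- ts(as.numeric(arima.sim(model = list(ar = 0.5), n = 216)), frequency = 12)

# Hypothetical candidate set; all share the same differencing orders.
candidates <- list(
  ar1      = list(order = c(1, 0, 0), seasonal = c(0, 0, 0)),
  ar2      = list(order = c(2, 0, 0), seasonal = c(0, 0, 0)),
  ar1_sar1 = list(order = c(1, 0, 0), seasonal = c(1, 0, 0))
)

# Fit each candidate and collect its AIC.
aic_values <- sapply(candidates, function(spec) {
  fit <- arima(y, order = spec$order,
               seasonal = list(order = spec$seasonal, period = 12))
  fit$aic
})

print(sort(aic_values))
best <- names(which.min(aic_values))
cat("smallest AIC:", best, "\n")
```

AIC values are most directly comparable among models fitted with the same differencing orders, because seasonal differencing changes the number of observations that enter the likelihood.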
Using the “auto.arima” function for our first model gives us a general idea. We will then
attempt to find the seasonal ARIMA model with the best AIC value.
auto.arima(gas)
Series: gas
ARIMA(1,0,0) with non-zero mean
Coefficients:
ar1 sar1
0.0400 -0.4823
s.e. 0.0698 0.0591
sigma^2 estimated as 0.6964: log likelihood=-266.92
AIC=537.84 AICc=537.9 BIC=544.59
From the “auto.arima” output, ARIMA(1,0,0) appears to be a reasonable starting point, although
the ar1 estimate of 0.0400 is small relative to its standard error of 0.0698.
Model                      AIC      Coefficient      s.e.     Conclusion
SARIMA(1,0,0)×(1,1,0)12    615.75   ar1 = 0.0400     0.0698   not significant
                                    sar1 = -0.4823   0.0591   significant
SARIMA(1,0,0)×(1,0,0)12    540.88   ar1 = -0.0093    0.0683   not significant
                                    sar1 = -0.0668   0.0680   not significant
SARIMA(2,0,0)×(2,1,0)12    593.32   ar1 = 0.0388     0.0698   not significant
                                    sar1 = -0.6561   0.0648   significant
SARIMA(2,0,0)×(2,1,1)12    541.01   ar1 = 0.0117     0.0701   not significant
                                    sar1 = -0.1245   0.0717   not significant
                                    sma1 = -1.0000   0.0731   significant
SARIMA(2,0,0)×(2,0,1)12    546.23   ar1 = -0.0151    0.0690   not significant
                                    sar1 = -0.0297   0.4389   not significant
                                    sar2 = -0.0507   0.0733   not significant
                                    sma1 = -0.0422   0.4374   not significant
Here a coefficient is called significant when its absolute value exceeds about twice its
standard error.
SARIMA(1,0,0)×(1,1,0)12 is not a bad model, but its AIC of 615.75 is the largest of the five.
We conclude that SARIMA(2,0,0)×(2,1,1)12 is the best fit: its AIC of 541.01 is the second
lowest, within 0.13 of the lowest (540.88), and its sma1 coefficient is significant.
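The report does not state how significance was judged. One common rule of thumb, assumed here, compares each estimate with its standard error and flags ratios beyond about 1.96 in absolute value. The sketch applies that rule to the estimates and standard errors reported for SARIMA(2,0,0)×(2,1,1)12 in the diagnostics section; the vector names are made up for this example.

```r
# Estimates and standard errors as reported for fit4 in the diagnostics section.
est <- c(ar1 = 0.0117, ar2 = -0.0102, sar1 = -0.1245, sar2 = -0.1152, sma1 = -1.0000)
se  <- c(ar1 = 0.0701, ar2 = 0.0705,  sar1 = 0.0717,  sar2 = 0.0696,  sma1 = 0.0731)

z <- est / se                              # approximate z-ratios
significant <- abs(z) > qnorm(0.975)       # assumed two-sided 5% rule

print(round(z, 2))
print(significant)
```

Under this assumed rule only `sma1` is clearly different from zero; `sar1` and `sar2` have ratios near -1.7 and fall just short of the threshold.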
Diagnostic checking for ARIMA(1,1,0):
> fit1 = arima(newgas,order=c(1,0,0), seasonal = list(order = c(1, 1, 0), period = 12))
> fit1
Call:
arima(x = newgas, order = c(1, 0, 0), seasonal = list(order = c(1, 1, 0), period = 12))
Coefficients:
ar1 sar1
0.0400 -0.4823
s.e. 0.0698 0.0591
sigma^2 estimated as 1.145: log likelihood = -304.88, aic = 615.75
> Box.test(residuals(fit1), lag=1, type = c("Box-Pierce","Ljung-Box"), fitdf =0)
Box-Pierce test
data: residuals(fit1)
X-squared = 0.00081791, df = 1, p-value = 0.9772
The p-value is larger than 0.05, so the residuals pass the Box-Pierce test.
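The Box-Pierce statistic behind this output can be reproduced by hand, which makes clear what is being tested. The sketch uses simulated white noise in place of model residuals; `res`, `h`, and the other names are illustrative assumptions.

```r
set.seed(4)
res <- rnorm(200)                      # stand-in for model residuals (white noise)
h <- 1                                 # number of lags tested, as in the report

# Sample autocorrelations at lags 1..h (lag 0 dropped).
r <- as.numeric(acf(res, lag.max = h, plot = FALSE)$acf)[-1]

# Box-Pierce statistic Q = n * sum(r_k^2), compared with a chi-squared(h).
Q <- length(res) * sum(r^2)
p_manual <- pchisq(Q, df = h, lower.tail = FALSE)

# Same test via the built-in function.
bp <- Box.test(res, lag = h, type = "Box-Pierce", fitdf = 0)

cat("manual:   Q =", Q, " p =", p_manual, "\n")
cat("Box.test: Q =", as.numeric(bp$statistic), " p =", bp$p.value, "\n")
```

With `lag = 1` the test only looks at the lag-1 autocorrelation of the residuals. Testing more lags, and setting `fitdf` to the number of estimated ARMA coefficients (which requires `lag` to exceed `fitdf`), gives a broader check.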
SARIMA(1,0,0)×(1,1,0)12 is not a bad model: the sar1 coefficient of -0.4823 is significant,
although the ar1 estimate of 0.0400 is small relative to its standard error. Next, I fit the
other SARIMA models to the series and compare which one fits best. The model with significant
coefficients and the lowest AIC value should be the best.
Diagnostic checking for ARIMA (1,0,0):
> fit2 = arima(newgas,order=c(1,0,0), seasonal = list(order = c(1, 0, 0), period = 12))
> fit2
Call:
arima(x = newgas, order = c(1, 0, 0), seasonal = list(order = c(1, 0, 0), period = 12))
Coefficients:
ar1 sar1 intercept
-0.0093 -0.0668 0.0110
s.e. 0.0683 0.0680 0.0527
sigma^2 estimated as 0.69: log likelihood = -266.44, aic = 540.88
> Box.test(residuals(fit2), lag=1, type = c("Box-Pierce","Ljung-Box"), fitdf =0)
Box-Pierce test
data: residuals(fit2)
X-squared = 6.8896e-08, df = 1, p-value = 0.9998
>tsdiag(fit2)
The p-value is larger than 0.05, so the residuals pass the Box-Pierce test.
According to the tsdiag plot for fit2, the standardized residuals plot doesn’t show clusters of
volatility. The ACF plot shows no significant autocorrelation among the residuals, and the
p-values for the Ljung–Box statistics are mostly above the blue dashed line. This could be our
optimal model, and its AIC (540.88) is marginally lower than that of fit4 (541.01). However,
fit4 has the smaller estimated residual variance (0.6438 versus 0.69), so it is preferred.
Diagnostic checking for ARIMA (2,1,0):
> fit3 = arima(newgas,order=c(2,0,0), seasonal = list(order = c(2, 1, 0), period = 12))
> fit3
Call:
arima(x = newgas, order = c(2, 0, 0), seasonal = list(order = c(2, 1, 0), period = 12))
Coefficients:
ar1 ar2 sar1 sar2
0.0388 -0.0592 -0.6561 -0.3351
s.e. 0.0698 0.0699 0.0648 0.0626
sigma^2 estimated as 0.9913: log likelihood = -291.66, aic = 593.32
> Box.test(residuals(fit3), lag=1, type = c("Box-Pierce","Ljung-Box"), fitdf =0)
Box-Pierce test
data: residuals(fit3)
X-squared = 0.0045882, df = 1, p-value = 0.946
The p-value is larger than 0.05, so the residuals pass the Box-Pierce test.
tsdiag plot of fit 2 model
Diagnostic checking for ARIMA (2,1,1):
> fit4 = arima(newgas,order=c(2,0,0), seasonal = list(order = c(2, 1, 1), period = 12))
> fit4
Call:
arima(x = newgas, order = c(2, 0, 0), seasonal = list(order = c(2, 1, 1), period = 12))
Coefficients:
ar1 ar2 sar1 sar2 sma1
0.0117 -0.0102 -0.1245 -0.1152 -1.0000
s.e. 0.0701 0.0705 0.0717 0.0696 0.0731
sigma^2 estimated as 0.6438: log likelihood = -264.5, aic = 541.01
> Box.test(residuals(fit4), lag=1, type = c("Box-Pierce","Ljung-Box"), fitdf =0)
Box-Pierce test
data: residuals(fit4)
X-squared = 2.5971e-06, df = 1, p-value = 0.9987
tsdiag(fit4)
The p-value is larger than 0.05, so the residuals pass the Box-Pierce test.
According to the tsdiag plot for fit4, the standardized residuals plot doesn’t show clusters of
volatility. The ACF plot shows no significant autocorrelation among the residuals, and the
p-values for the Ljung–Box statistics are mostly above the blue dashed line. Therefore, the
residuals of SARIMA(2,0,0)×(2,1,1)12 behave like white noise, and it is an adequate model.
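The model equation is: $(1 - \phi_1 B - \phi_2 B^2)(1 - \Phi_1 B^{12} - \Phi_2 B^{24})(1 - B^{12})X_t = (1 + \Theta_1 B^{12})W_t$, where $X_t$ is the differenced series and $W_t$ is white noise.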
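Substituting the estimates reported for fit4 gives the fitted form below. This assumes the sign convention used by R's `arima` (AR terms enter with a minus sign, MA terms with a plus sign) and writes out the full SARIMA(2,0,0)×(2,1,1)12 structure, including the seasonal moving-average factor.

```latex
\begin{aligned}
\phi(B) &= 1 - 0.0117\,B + 0.0102\,B^{2},\\
\Phi(B^{12}) &= 1 + 0.1245\,B^{12} + 0.1152\,B^{24},\\
\Theta(B^{12}) &= 1 - 1.0000\,B^{12},\\
\phi(B)\,\Phi(B^{12})\,(1 - B^{12})\,X_t &= \Theta(B^{12})\,W_t .
\end{aligned}
```

The estimated seasonal MA factor, $1 - B^{12}$, coincides with the seasonal differencing operator on the left-hand side. An MA estimate on the boundary like this is often read as a hint of over-differencing.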
Diagnostic checking for ARIMA (2,0,1):
> fit5 = arima(newgas,order=c(2,0,0), seasonal = list(order = c(2, 0, 1), period = 12))
> fit5
Call:
arima(x = newgas, order = c(2, 0, 0), seasonal = list(order = c(2, 0, 1), period = 12))
Coefficients:
ar1 ar2 sar1 sar2 sma1
-0.0151 0.0072 -0.0297 -0.0507 -0.0422
s.e. 0.0690 0.0690 0.4389 0.0733 0.4374
intercept
0.0105
s.e. 0.0501
sigma^2 estimated as 0.6877: log likelihood = -266.12, aic = 546.23
> Box.test(residuals(fit5), lag=1, type = c("Box-Pierce","Ljung-Box"), fitdf =0)
Box-Pierce test
data: residuals(fit5)
X-squared = 6.152e-06, df = 1, p-value = 0.998
The p-value is larger than 0.05, so the residuals pass the Box-Pierce test.
Forecasting:
plot(forecast(fit4))
The forecast plot appears to capture the overall movement of the data adequately.
Diagnostic Plots for Time-Series Fits
Comparing real time data with 10 future values
y=read.table("realdataofnaturalgas.txt", header=T)
data=ts(y$gas, frequency = 12, start = c(1991))
par(mfrow=c(2,1))
plot(forecast(fit4))
plot(data,main="real data")
The second time series plot above shows the real data. It has a decreasing trend from 2014 to
2016, which is similar to the forecast plot. The forecast values are relatively close to the
real values, because most of the real observations from 2014 to 2016 fall within the
prediction intervals (the shaded region of the forecast plot).
As a result, the forecast appears fairly adequate.
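The check described above, whether held-out observations fall inside the prediction intervals, can be made numeric. The sketch below uses a simulated series with a 10-point holdout and an AR(1) model, all of which are assumptions for illustration; it does not use the natural gas data or fit4.

```r
set.seed(5)
sim <- as.numeric(arima.sim(model = list(ar = 0.5), n = 210))
train   <- sim[1:200]
holdout <- sim[201:210]               # plays the role of the later "real data"

fit <- arima(train, order = c(1, 0, 0))
fc  <- predict(fit, n.ahead = 10)     # list with $pred (forecasts) and $se

# Approximate 95% prediction intervals under a normality assumption.
pred  <- as.numeric(fc$pred)
se    <- as.numeric(fc$se)
lower <- pred - qnorm(0.975) * se
upper <- pred + qnorm(0.975) * se

inside   <- holdout >= lower & holdout <= upper
coverage <- mean(inside)              # share of held-out points inside the interval

cat("held-out points inside the 95% interval:", sum(inside), "of", length(inside), "\n")
```

With only ten held-out points the coverage figure is a coarse check, but it gives a number to set beside the visual comparison of the two plots.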
Real data time series plot
III. Sources
Data from: http://www.indexmundi.com/commodities/?commodity=natural-gas&months=240
IV. Code
library(astsa)      # acf2()
library(forecast)   # auto.arima(), forecast()
library(tseries)    # adf.test()
library(MASS)
# other packages listed for this project: timeDate, timeSeries
setwd("C:/Sandesh/College Stuff/UCSB/PSTAT 126")
x = read.table("realdataofnaturalgas.csv", sep=",", header=TRUE)
gas = ts(x$Price, start = c(1996,4), end = c(2014,4), frequency = 12)
# mean and variance of gas data set before transformation
mean(gas)
var(gas)
# ACF and PACF plot of original data set (non-stationary), ACF trails off
acf2(ts(gas))
# difference the data to obtain a stationary time series
newgas = diff(gas)
# plots of the original and differenced series to show the effect of removing the trend
plot(gas, main='Price of natural gas (US Dollars)', xlab='Year', ylab='Price', lwd=2)
plot(newgas, main='Differenced price of natural gas (US Dollars)', xlab='Year', ylab='Change in price', lwd=2)
# mean and variance for transformed data set
mean(newgas)
var(newgas)
# ACF and PACF of transformed data set
acf2(ts(newgas))
# stationarity test, decomposition and automatic model selection
adf.test(newgas)
plot(decompose(newgas))
auto.arima(newgas)
# fit models
fit1 = arima(newgas, order=c(1,0,0), seasonal = list(order = c(1, 1, 0), period = 12))
fit2 = arima(newgas, order=c(1,0,0), seasonal = list(order = c(1, 0, 0), period = 12))
fit3 = arima(newgas, order=c(2,0,0), seasonal = list(order = c(2, 1, 0), period = 12))
fit4 = arima(newgas, order=c(2,0,0), seasonal = list(order = c(2, 1, 1), period = 12))
fit5 = arima(newgas, order=c(2,0,0), seasonal = list(order = c(2, 0, 1), period = 12))
# fit2 has the smallest AIC (540.88), closely followed by fit4 (541.01) and fit5 (546.23);
# fit3 (593.32) and fit1 (615.75) have much larger AIC values.
# print fitted models
fit1
fit2
fit3
fit4
fit5
# Diagnostic checking for models fit1, fit2, fit3, fit4, fit5
# plot standardized residuals, ACF of residuals and Ljung-Box p-values
tsdiag(fit2)
tsdiag(fit4)
plot(forecast(fit1))
# Box.test() uses the first entry of type, so these calls run the Box-Pierce test
Box.test(residuals(fit1), lag=1, type = c("Box-Pierce","Ljung-Box"), fitdf = 0)
Box.test(residuals(fit2), lag=1, type = c("Box-Pierce","Ljung-Box"), fitdf = 0)
Box.test(residuals(fit3), lag=1, type = c("Box-Pierce","Ljung-Box"), fitdf = 0)
Box.test(residuals(fit4), lag=1, type = c("Box-Pierce","Ljung-Box"), fitdf = 0)
Box.test(residuals(fit5), lag=1, type = c("Box-Pierce","Ljung-Box"), fitdf = 0)
# confidence intervals for the coefficients of all fitted models
confint(fit1)
confint(fit2)
confint(fit3)
confint(fit4)
confint(fit5)
# real data for comparison with the forecast
y = read.table("realdata3.csv", sep=",", header=TRUE)
data = ts(y$Price, start = c(2014,6), end = c(2016,4), frequency = 12)
# forecast the next 10 observations and keep their standard errors
pred <- predict(fit4, n.ahead = 10)
pred.se <- pred$se
par(mfrow=c(2,1))
plot(forecast(fit4))
plot.ts(data, main='real data', xlab='Year', ylab='Price', lwd=2)

More Related Content

Viewers also liked

A biblioteca escolar e os desafios no contexto da sociedade atual
A biblioteca escolar e os desafios no contexto da sociedade atualA biblioteca escolar e os desafios no contexto da sociedade atual
A biblioteca escolar e os desafios no contexto da sociedade atualpaulabarrocas
 
Taller práctico 10 claves para la implementación de tendencias y enfoques inn...
Taller práctico 10 claves para la implementación de tendencias y enfoques inn...Taller práctico 10 claves para la implementación de tendencias y enfoques inn...
Taller práctico 10 claves para la implementación de tendencias y enfoques inn...JOHANNA MOSQUERA
 
VASEP: Giai phap tiep thi thong tin doanh nghiep thuy san toan cau
VASEP: Giai phap tiep thi thong tin doanh nghiep thuy san toan cauVASEP: Giai phap tiep thi thong tin doanh nghiep thuy san toan cau
VASEP: Giai phap tiep thi thong tin doanh nghiep thuy san toan cauOneOanh
 
Leh to Kanyakumari on Mahindra Mojo - Press Release
Leh to Kanyakumari on Mahindra Mojo - Press ReleaseLeh to Kanyakumari on Mahindra Mojo - Press Release
Leh to Kanyakumari on Mahindra Mojo - Press ReleaseRushLane
 
Name one gay club/bar for each of these global cities? And a little descripti...
Name one gay club/bar for each of these global cities? And a little descripti...Name one gay club/bar for each of these global cities? And a little descripti...
Name one gay club/bar for each of these global cities? And a little descripti...olsonqswrkyjwxq
 

Viewers also liked (8)

A biblioteca escolar e os desafios no contexto da sociedade atual
A biblioteca escolar e os desafios no contexto da sociedade atualA biblioteca escolar e os desafios no contexto da sociedade atual
A biblioteca escolar e os desafios no contexto da sociedade atual
 
Teorias
TeoriasTeorias
Teorias
 
Taller práctico 10 claves para la implementación de tendencias y enfoques inn...
Taller práctico 10 claves para la implementación de tendencias y enfoques inn...Taller práctico 10 claves para la implementación de tendencias y enfoques inn...
Taller práctico 10 claves para la implementación de tendencias y enfoques inn...
 
Overview poe-2004
Overview poe-2004Overview poe-2004
Overview poe-2004
 
VASEP: Giai phap tiep thi thong tin doanh nghiep thuy san toan cau
VASEP: Giai phap tiep thi thong tin doanh nghiep thuy san toan cauVASEP: Giai phap tiep thi thong tin doanh nghiep thuy san toan cau
VASEP: Giai phap tiep thi thong tin doanh nghiep thuy san toan cau
 
KPIs ARE BEYOND 1,2,3,4!
KPIs ARE BEYOND 1,2,3,4!KPIs ARE BEYOND 1,2,3,4!
KPIs ARE BEYOND 1,2,3,4!
 
Leh to Kanyakumari on Mahindra Mojo - Press Release
Leh to Kanyakumari on Mahindra Mojo - Press ReleaseLeh to Kanyakumari on Mahindra Mojo - Press Release
Leh to Kanyakumari on Mahindra Mojo - Press Release
 
Name one gay club/bar for each of these global cities? And a little descripti...
Name one gay club/bar for each of these global cities? And a little descripti...Name one gay club/bar for each of these global cities? And a little descripti...
Name one gay club/bar for each of these global cities? And a little descripti...
 

Similar to Natural Gas Time Series Analysis: SARIMA(1,0,0)×(2,1,1)12 Best Fit Model

R language Project report
R language Project reportR language Project report
R language Project reportTianyue Wang
 
Brock Butlett Time Series-Great Lakes
Brock Butlett Time Series-Great Lakes Brock Butlett Time Series-Great Lakes
Brock Butlett Time Series-Great Lakes Brock Butlett
 
Byungchul Yea (Project)
Byungchul Yea (Project)Byungchul Yea (Project)
Byungchul Yea (Project)Byung Chul Yea
 
Time Series Analysis - 2 | Time Series in R | ARIMA Model Forecasting | Data ...
Time Series Analysis - 2 | Time Series in R | ARIMA Model Forecasting | Data ...Time Series Analysis - 2 | Time Series in R | ARIMA Model Forecasting | Data ...
Time Series Analysis - 2 | Time Series in R | ARIMA Model Forecasting | Data ...Simplilearn
 
Forecasting_CO2_Emissions.pptx
Forecasting_CO2_Emissions.pptxForecasting_CO2_Emissions.pptx
Forecasting_CO2_Emissions.pptxMOINDALVS
 
Arima model (time series)
Arima model (time series)Arima model (time series)
Arima model (time series)Kumar P
 
Time series analysis on The daily closing price of bitcoin from the 27th of A...
Time series analysis on The daily closing price of bitcoin from the 27th of A...Time series analysis on The daily closing price of bitcoin from the 27th of A...
Time series analysis on The daily closing price of bitcoin from the 27th of A...ShuaiGao3
 
Arima model
Arima modelArima model
Arima modelJassika
 
Different Models Used In Time Series - InsideAIML
Different Models Used In Time Series - InsideAIMLDifferent Models Used In Time Series - InsideAIML
Different Models Used In Time Series - InsideAIMLVijaySharma802
 
arimamodel-170204090012.pdf
arimamodel-170204090012.pdfarimamodel-170204090012.pdf
arimamodel-170204090012.pdfssuserdca880
 
Business Analytics Foundation with R tool - Part 5
Business Analytics Foundation with R tool - Part 5Business Analytics Foundation with R tool - Part 5
Business Analytics Foundation with R tool - Part 5Beamsync
 
A SCC Recursive Meta-Algorithm for Computing Preferred Labellings in Abstract...
A SCC Recursive Meta-Algorithm for Computing Preferred Labellings in Abstract...A SCC Recursive Meta-Algorithm for Computing Preferred Labellings in Abstract...
A SCC Recursive Meta-Algorithm for Computing Preferred Labellings in Abstract...Federico Cerutti
 
STA457 Assignment Liangkai Hu 999475884
STA457 Assignment Liangkai Hu 999475884STA457 Assignment Liangkai Hu 999475884
STA457 Assignment Liangkai Hu 999475884Liang Kai Hu
 
Long Memory presentation to SURF
Long Memory presentation to SURFLong Memory presentation to SURF
Long Memory presentation to SURFRichard Hunt
 
Title of the ReportA. Partner, B. Partner, and C. Partner.docx
Title of the ReportA. Partner, B. Partner, and C. Partner.docxTitle of the ReportA. Partner, B. Partner, and C. Partner.docx
Title of the ReportA. Partner, B. Partner, and C. Partner.docxjuliennehar
 

Similar to Natural Gas Time Series Analysis: SARIMA(1,0,0)×(2,1,1)12 Best Fit Model (20)

R language Project report
R language Project reportR language Project report
R language Project report
 
Brock Butlett Time Series-Great Lakes
Brock Butlett Time Series-Great Lakes Brock Butlett Time Series-Great Lakes
Brock Butlett Time Series-Great Lakes
 
Presentation
PresentationPresentation
Presentation
 
Byungchul Yea (Project)
Byungchul Yea (Project)Byungchul Yea (Project)
Byungchul Yea (Project)
 
Time Series Analysis - 2 | Time Series in R | ARIMA Model Forecasting | Data ...
Time Series Analysis - 2 | Time Series in R | ARIMA Model Forecasting | Data ...Time Series Analysis - 2 | Time Series in R | ARIMA Model Forecasting | Data ...
Time Series Analysis - 2 | Time Series in R | ARIMA Model Forecasting | Data ...
 
Forecasting_CO2_Emissions.pptx
Forecasting_CO2_Emissions.pptxForecasting_CO2_Emissions.pptx
Forecasting_CO2_Emissions.pptx
 
Arima model (time series)
Arima model (time series)Arima model (time series)
Arima model (time series)
 
Time series analysis on The daily closing price of bitcoin from the 27th of A...
Time series analysis on The daily closing price of bitcoin from the 27th of A...Time series analysis on The daily closing price of bitcoin from the 27th of A...
Time series analysis on The daily closing price of bitcoin from the 27th of A...
 
ARIMA Models - [Lab 3]
ARIMA Models - [Lab 3]ARIMA Models - [Lab 3]
ARIMA Models - [Lab 3]
 
ARIMA.pptx
ARIMA.pptxARIMA.pptx
ARIMA.pptx
 
Arima model
Arima modelArima model
Arima model
 
Different Models Used In Time Series - InsideAIML
Different Models Used In Time Series - InsideAIMLDifferent Models Used In Time Series - InsideAIML
Different Models Used In Time Series - InsideAIML
 
arimamodel-170204090012.pdf
arimamodel-170204090012.pdfarimamodel-170204090012.pdf
arimamodel-170204090012.pdf
 
Business Analytics Foundation with R tool - Part 5
Business Analytics Foundation with R tool - Part 5Business Analytics Foundation with R tool - Part 5
Business Analytics Foundation with R tool - Part 5
 
Seasonal ARIMA
Seasonal ARIMASeasonal ARIMA
Seasonal ARIMA
 
A SCC Recursive Meta-Algorithm for Computing Preferred Labellings in Abstract...
A SCC Recursive Meta-Algorithm for Computing Preferred Labellings in Abstract...A SCC Recursive Meta-Algorithm for Computing Preferred Labellings in Abstract...
A SCC Recursive Meta-Algorithm for Computing Preferred Labellings in Abstract...
 
STA457 Assignment Liangkai Hu 999475884
STA457 Assignment Liangkai Hu 999475884STA457 Assignment Liangkai Hu 999475884
STA457 Assignment Liangkai Hu 999475884
 
Long Memory presentation to SURF
Long Memory presentation to SURFLong Memory presentation to SURF
Long Memory presentation to SURF
 
Title of the ReportA. Partner, B. Partner, and C. Partner.docx
Title of the ReportA. Partner, B. Partner, and C. Partner.docxTitle of the ReportA. Partner, B. Partner, and C. Partner.docx
Title of the ReportA. Partner, B. Partner, and C. Partner.docx
 
Ascoli et al. 2014 ICFFR
Ascoli et al. 2014 ICFFRAscoli et al. 2014 ICFFR
Ascoli et al. 2014 ICFFR
 

Natural Gas Time Series Analysis: SARIMA(1,0,0)×(2,1,1)12 Best Fit Model

  • 1. Natural Gas Time Series Analysis Shalavadi 1 Natural Gas Time Series Analysis By Sandesh Shalavadi PSTAT 174 June 7, 2016
  • 2. Natural Gas Time Series Analysis Shalavadi 2 Sandesh Shalavadi (#7210834) PSTAT 174 Gyorgy Terdik 6/6/2016 Natural Gas Data Time Series Analysis I. Introduction Natural gas is hydrocarbon gas that is made of mainly methane and smaller amounts of carbon dioxide, hydrogen sulfide, helium, or nitrogen. It is formed as numerous layers of plant and animal life are exposed to extreme heat and pressure under the Earth’s surface. It is a fossil fuel that is currently used for cooking, electricity generation, and heating. Although, in comparison to its counterparts, such as petroleum or coal, it is much more efficient and releases far fewer emissions to the environment without polluting the air. Therefore, it is a very safe gas and is also environmentally friendly, which is significant in today’s polluted atmosphere. In addition, natural gas is odorless, colorless, and shapeless. If a leak occurs, it will instantly accumulate upwards, so it wouldn’t be able to build and cause a potential explosion. I chose natural gas as my idea to research because I am interested in the recent efforts to change climate control and combat global warming. In the Paris climate control talks in 2015, 195 countries gathered to discuss how they could alleviate the rapid expansion of global warming by reducing the consumption of greenhouse gases and making developing countries more eco-friendly so as to adapt to the adverse effects of the climate change as well as build a financial plan to support a pathway to climate-resilient development. The time series analysis of natural gas will help to improve competent prediction for the change of manufacturing costs and labor, which will each country reach their respective goals. The goal for this project is to initiate time series analysis for an actual time series growth data set. I have chosen the (Price of Natural Gas) from 1996 to 2016. This is a span of 20 years which will show how fluctuation in price affected the overall data. 
First, I used R to analyze the data and then tried to fit a model. First, I used a differencing to transform the data to get a stationary time series and remove trend and seasonality. Then, I estimated the parameters from the ACF and PACF of the new differenced time series. I analyzed several plots to select five possible models to fit the data. Subsequently, I checked for the smallest AIC. Finally, I used the fitting models to forecast the future values.
  • 3. Natural Gas Time Series Analysis Shalavadi 3 II. Sections According to the original time series plot, it is not a stationary process since they are a few upwards trends that portray a significant increase in the price of natural gas. The mean is 4.625945 and the variance is 5.428623. Therefore, there was no seasonality in the original plot. There was sharp changes in the years 2001, 2006, and 2008. The price reached its highest point at October 2005 and its lowest point at December 1998. Modelling: Time Series plot of original data ACF and PACF plot of original data
  • 4. Natural Gas Time Series Analysis Shalavadi 4 From the autocorrelation function plot (ACF), the gradual decay of ACF shows not stationarity. Therefore, in order to get a stationary series, I differenced the data so I could remove the trend. I used differencing to transform the data instead of using box-cox or log transformation so as to avoid using adding process. Differenced time series data: Time series plot of differenced data ACF and PACF plot of differenced data
  • 5. Natural Gas Time Series Analysis Shalavadi 5 Parameter estimation After differencing the time series, the new time series plot looks stationary. The mean is 0.01101852, which is approximately 0. While the variance decreased to 0.6964492. Also, the ACF shows seasonality. Also, since the ACF and the PACF plot trail off rapidly, an ARIMA model is possible. According to the ACF plot, the ACF cuts off at lag 5 and lag 9. According to the PACF plot, the PACF cuts of at lag 5 and lag 9. ---------------------------------------------------------------------------------------------------------------- adf.test(newgas) Augmented Dickey-Fuller Test data: newgas Dickey-Fuller = -6.1023, Lag order = 5, p-value = 0.01 alternative hypothesis: stationary p-values < 0.05 indicated stationary. The low p-value from the ADF test also confirms that the data is stationary. ---------------------------------------------------------------------------------------------------------------- plot(decompose(newgas)) Seasonal trend of fitted data From the decomposition of the time series, we can see that there is a quadratic trend with a seasonal trend, which suggests a seasonal model to fit the data.
  • 6. Natural Gas Time Series Analysis Shalavadi 6 Model Diagnostics: Fit the models In order to find the best model, I fit the following models and afterwards, I checked their AIC in order to find the one model with the smallest AIC. Here are the formulas for the models: Model 1 ARIMA (1,1,0) Model 2 ARIMA (1,0,0): lowest AIC; Model 3 ARIMA (2,1,0) Model 4 ARIMA (2,1,1): second lowest AIC; Model 5 ARIMA (2,0,1) Utilizing “auto.arima” funciton for our first model to give us a general idea.We will attempt to find the best seasonal ARIMA model AIC value. auto.arima(gas) Series: gas ARIMA(1,0,0) with non-zero mean Coefficients: ar1 sar1 0.0400 -0.4823 s.e. 0.0698 0.0591 sigma^2 estimated as 0.6964: log likelihood=-266.92 AIC=537.84 AICc=537.9 BIC=544.59 From “auto.arima” function, we can see that ARIMA(1,0,0) is not a bad model with significance coefficient of 0.0400 for ar1. Model AIC Coefficient s.e. Conclusion SARIMA(1,0,0)×(1,1,0)12 615.75 ar1=0.0400 0.0698 significant sar1-0.4823 0.0591 significant SARIMA(1,0,0)×(1,0,0)12 540.88 ar1=-0.0093 0.0683 significant sar1=-.0668 0.68 not significant SARIMA(2,0,0)×(2,1,0)12 593.32 ar1=0.0388 0.0698 significant sar1=-0.6561 0.0648 not significant SARIMA(2,0,0)×(2,1,1)12 541.01 ar1=0.0117 0.0701 significant sar1=-.1245 0.0717 significant sma1=-1.00 0.0731 significant SARIMA(2,0,0)×(2,0,1)12 546.23 ar1=-0.0151 0.069 significant sar1=-0.0297 0.4389 not significant sar2=-0.0507 0.0733 not significant sma1=-.0422 -.0422 not significant Although SARIMA(1,0,0)×(1,1,0)12 is a not bad model with all its coefficients are significant. We will conclude that SARIMA(1,0,0)×(2,1,0)12 is the best fit because it has smaller AIC value of 541.01 with all its coefficients are significant.
  • 7. Natural Gas Time Series Analysis Shalavadi 7 Diagnostic checking for ARIMA(1,1,0): > fit1 = arima(newgas,order=c(1,0,0), seasonal = list(order = c(1, 1, 0), period = 12)) > fit1 Call: arima(x = newgas, order = c(1, 0, 0), seasonal = list(order = c(1, 1, 0), period = 12)) Coefficients: ar1 sar1 0.0400 -0.4823 s.e. 0.0698 0.0591 sigma^2 estimated as 1.145: log likelihood = -304.88, aic = 615.75 > Box.test(residuals(fit1), lag=1, type = c("Box-Pierce","Ljung-Box"), fitdf =0) Box-Pierce test data: residuals(fit1) X-squared = 0.00081791, df = 1, p-value = 0.9772 All p-values larger than 0.05, so it passed the tests. SARIMA(1,0,0)×(1, 1, 0)12 is not a bad model with significance coefficient of 0.0400 for ar1 and significant coefficient of -0.4823 for ma1. Now, I will try SARIMA model to fit the series, and compared which one fits the best. The one with significant coefficient and lowest AIC values should be the best. Diagnostic checking for ARIMA (1,0,0): > fit2 = arima(newgas,order=c(1,0,0), seasonal = list(order = c(1, 0, 0), period = 12)) > fit2 Call: arima(x = newgas, order = c(1, 0, 0), seasonal = list(order = c(1, 0, 0), period = 12)) Coefficients: ar1 sar1 intercept -0.0093 -0.0668 0.0110 s.e. 0.0683 0.0680 0.0527 sigma^2 estimated as 0.69: log likelihood = -266.44, aic = 540.88 > Box.test(residuals(fit2), lag=1, type = c("Box-Pierce","Ljung-Box"), fitdf =0) Box-Pierce test data: residuals(fit2) X-squared = 6.8896e-08, df = 1, p-value = 0.9998 >tsdiag(fit2) All p-values larger than 0.05, so it passed the tests.
  • 8. According to the tsdiag plot for fit2, the standardized residuals plot doesn’t show clusters of volatility. The ACF plot shows no significant autocorrelation between the residuals. The p-values for the Ljung–Box statistics are mostly above the blue line that marks the 0.05 level. This could be our optimal model. Its AIC of 540.88 is marginally lower than the 541.01 of fit4, which is the model I prefer below.
    Diagnostic checking for Model 3, ARIMA(2,1,0), fitted as SARIMA(2,0,0)×(2,1,0)12:
    > fit3 = arima(newgas,order=c(2,0,0), seasonal = list(order = c(2, 1, 0), period = 12))
    > fit3
    Call:
    arima(x = newgas, order = c(2, 0, 0), seasonal = list(order = c(2, 1, 0), period = 12))
    Coefficients:
             ar1      ar2     sar1     sar2
          0.0388  -0.0592  -0.6561  -0.3351
    s.e.  0.0698   0.0699   0.0648   0.0626
    sigma^2 estimated as 0.9913: log likelihood = -291.66, aic = 593.32
    > Box.test(residuals(fit3), lag=1, type = c("Box-Pierce","Ljung-Box"), fitdf =0)
    Box-Pierce test
    data: residuals(fit3)
    X-squared = 0.0045882, df = 1, p-value = 0.946
    The p-value is larger than 0.05, so the model passes the test.
    [Figure: tsdiag plot of the fit2 model]
  • 9. Diagnostic checking for Model 4, ARIMA(2,1,1), fitted as SARIMA(2,0,0)×(2,1,1)12:
    > fit4 = arima(newgas,order=c(2,0,0), seasonal = list(order = c(2, 1, 1), period = 12))
    > fit4
    Call:
    arima(x = newgas, order = c(2, 0, 0), seasonal = list(order = c(2, 1, 1), period = 12))
    Coefficients:
             ar1      ar2     sar1     sar2     sma1
          0.0117  -0.0102  -0.1245  -0.1152  -1.0000
    s.e.  0.0701   0.0705   0.0717   0.0696   0.0731
    sigma^2 estimated as 0.6438: log likelihood = -264.5, aic = 541.01
    > Box.test(residuals(fit4), lag=1, type = c("Box-Pierce","Ljung-Box"), fitdf =0)
    Box-Pierce test
    data: residuals(fit4)
    X-squared = 2.5971e-06, df = 1, p-value = 0.9987
    > tsdiag(fit4)
    The p-value is larger than 0.05, so the model passes the test.
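The reported AIC can be reproduced by hand from the log likelihood. The worked equation below assumes the usual definition AIC = -2ℓ + 2k, where ℓ is the log likelihood and k counts the estimated coefficients plus one for the innovation variance; for fit4 that gives k = 6 (ar1, ar2, sar1, sar2, sma1, and sigma^2).

```latex
\[
\begin{aligned}
\mathrm{AIC} &= -2\,\ell + 2k \\
             &= -2(-264.5) + 2(6) \\
             &= 529 + 12 = 541.0
\end{aligned}
\]
```

This agrees with the printed aic = 541.01 up to the rounding of the displayed log likelihood.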
  • 10. According to the tsdiag plot for fit4, the standardized residuals plot doesn’t show clusters of volatility. The ACF plot shows no significant autocorrelation between the residuals. The p-values for the Ljung–Box statistics are mostly above the blue line that marks the 0.05 level. Therefore, the residuals of SARIMA(2,0,0)×(2,1,1)12 behave like white noise, and it is an adequate model. The model equation, with $X_t$ the differenced series newgas and $W_t$ white noise, is:
    $(1 - \phi_1 B - \phi_2 B^2)(1 - \Phi_1 B^{12} - \Phi_2 B^{24})(1 - B^{12}) X_t = (1 + \Theta_1 B^{12}) W_t$
    Diagnostic checking for Model 5, ARIMA(2,0,1), fitted as SARIMA(2,0,0)×(2,0,1)12:
    > fit5 = arima(newgas,order=c(2,0,0), seasonal = list(order = c(2, 0, 1), period = 12))
    > fit5
    Call:
    arima(x = newgas, order = c(2, 0, 0), seasonal = list(order = c(2, 0, 1), period = 12))
    Coefficients:
              ar1     ar2     sar1     sar2     sma1  intercept
          -0.0151  0.0072  -0.0297  -0.0507  -0.0422     0.0105
    s.e.   0.0690  0.0690   0.4389   0.0733   0.4374     0.0501
    sigma^2 estimated as 0.6877: log likelihood = -266.12, aic = 546.23
    > Box.test(residuals(fit5), lag=1, type = c("Box-Pierce","Ljung-Box"), fitdf =0)
    Box-Pierce test
    data: residuals(fit5)
    X-squared = 6.152e-06, df = 1, p-value = 0.998
    The p-value is larger than 0.05, so the model passes the test.
    Forecasting:
    > plot(forecast(fit4))
    The forecast plot looks reasonable, and it seems to capture the overall movement of the data adequately.
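Because the models are fitted to newgas, the first difference of the price series, their predictions are on the differenced scale. If price-level forecasts are wanted, one option is to cumulate the predicted differences from the last observed price. The sketch below illustrates this with a simulated random-walk series and a plain AR(1) fit; the series, seed, and model order are hypothetical stand-ins, not the report's data or chosen model.

```r
set.seed(1)  # arbitrary seed so the sketch is reproducible

# Hypothetical monthly price series: a random walk standing in for the real data.
price <- ts(5 + cumsum(rnorm(120, sd = 0.5)), frequency = 12, start = c(2000, 1))

# Difference the series, as the report does with newgas = diff(gas).
dprice <- diff(price)

# Fit a simple model to the differenced series and predict 10 steps ahead.
fit <- arima(dprice, order = c(1, 0, 0))
pred <- predict(fit, n.ahead = 10)

# Undo the differencing: add the cumulated predicted changes to the last price.
last_price <- as.numeric(tail(price, 1))
price_forecast <- last_price + cumsum(as.numeric(pred$pred))

print(round(price_forecast, 2))
```

An alternative with the same effect is to fit the model to the undifferenced series with d = 1 in the order argument, in which case predict() returns price-level forecasts directly.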
  • 11. [Figure: Diagnostic Plots for Time-Series Fits]
    Comparing real data with 10 future values
    y=read.table("realdataofnaturalgas.txt", header=T)
    data=ts(y$gas, frequency = 12, start = c(1991))
    par(mfrow=c(2,1))
    plot(forecast(fit4))
    plot(data,main="real data")
    The second time series plot above is from the real data. Looking at the real data plot, there is a decreasing trend from 2014 to 2016, which is similar to the forecast plot. The forecast values are relatively close to the real values, because most of the real data points from 2014 to 2016 fall inside the prediction intervals (the shaded region of the forecast plot). As a result, the forecast appears fairly adequate.
    [Figure: Real data time series plot]
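The visual comparison above can also be expressed as a number: the share of held-out observations that fall inside the prediction interval. The sketch below assumes approximate 95% intervals of the form forecast ± 1.96 × s.e.; every value in it is made up for illustration and none is taken from the report.

```r
# Hypothetical numbers for illustration only; not taken from the report.
forecast_mean <- c(4.0, 3.9, 3.8, 3.7, 3.6)  # point forecasts
forecast_se   <- c(0.5, 0.7, 0.9, 1.0, 1.1)  # forecast standard errors
actual        <- c(4.2, 3.1, 2.8, 5.9, 2.6)  # observed values

# Approximate 95% prediction interval around each forecast.
lower <- forecast_mean - qnorm(0.975) * forecast_se
upper <- forecast_mean + qnorm(0.975) * forecast_se

# Flag which observations fall inside their interval, then average.
inside <- actual >= lower & actual <= upper
coverage <- mean(inside)

print(inside)
print(coverage)
```

With real forecasts, forecast_mean and forecast_se would come from the pred and se components returned by predict() on the fitted model.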
  • 12. III. Sources
    Data from: http://www.indexmundi.com/commodities/?commodity=natural-gas&months=240
    IV. Code
    # packages used: astsa (acf2), forecast (auto.arima, forecast), MASS, tseries (adf.test)
    # also listed in the original: timeDate, timeSeries
    library(astsa)
    library(forecast)
    library(MASS)
    library(tseries)
    setwd("C:/Sandesh/College Stuff/UCSB/PSTAT 126")
    x=read.table("realdataofnaturalgas.csv", sep=",", header=T)
    gas=ts(x$Price, start = c(1996,4), end = c(2014,4), frequency = 12)
    # mean and variance of gas data set before transformation
    mean(gas)
    var(gas)
    # ACF and PACF plot of original data set (non-stationary), ACF trails off
    acf2(ts(gas))
    # difference the data set to make a stationary time series
    newgas=diff(gas)
    # plot of original and new data set to show the difference after removing the trend by differencing at lag 1
    plot(gas,main ='Price of natural gas (US Dollars)', xlab='Year', ylab='Price', lwd=2)
    plot(newgas,main ='Price of natural gas (US Dollars)', xlab='Year', ylab='Price', lwd=2)
    # mean and variance for transformed data set
    mean(newgas)
    var(newgas)
    # ACF and PACF of transformed data set
    acf2(ts(newgas))
    adf.test(newgas)
    plot(decompose(newgas))
    auto.arima(newgas)
    # fit models
    fit1 = arima(newgas,order=c(1,0,0), seasonal = list(order = c(1, 1, 0), period = 12))
    fit2 = arima(newgas,order=c(1,0,0), seasonal = list(order = c(1, 0, 0), period = 12))
    fit3 = arima(newgas,order=c(2,0,0), seasonal = list(order = c(2, 1, 0), period = 12))
    fit4 = arima(newgas,order=c(2,0,0), seasonal = list(order = c(2, 1, 1), period = 12))
    fit5 = arima(newgas,order=c(2,0,0), seasonal = list(order = c(2, 0, 1), period = 12))
  • 13. # ARIMA(1,0,0) has the smallest AIC and ARIMA(2,1,1) is a close second;
    # ARIMA(2,0,1) is similar, while ARIMA(2,1,0) and ARIMA(1,1,0) have much larger AIC.
    # print the fitted models
    fit1
    fit2
    fit3
    fit4
    fit5
    # Diagnostic checking for models fit1, fit2, fit3, fit4, fit5
    # plot standardized residuals, ACF of residuals, and Ljung-Box p-values
    tsdiag(fit2)
    tsdiag(fit4)
    plot(forecast(fit1))
    # Box-Pierce test for all fitted models
    # (with both types supplied, Box.test uses the first one, Box-Pierce)
    Box.test(residuals(fit1), lag=1, type = c("Box-Pierce","Ljung-Box"), fitdf =0)
    Box.test(residuals(fit2), lag=1, type = c("Box-Pierce","Ljung-Box"), fitdf =0)
    Box.test(residuals(fit3), lag=1, type = c("Box-Pierce","Ljung-Box"), fitdf =0)
    Box.test(residuals(fit4), lag=1, type = c("Box-Pierce","Ljung-Box"), fitdf =0)
    Box.test(residuals(fit5), lag=1, type = c("Box-Pierce","Ljung-Box"), fitdf =0)
    # confidence intervals for the coefficients of all fitted models
    confint(fit1)
    confint(fit2)
    confint(fit3)
    confint(fit4)
    confint(fit5)
    y=read.table("realdata3.csv",sep=",", header=T)
    data=ts(y$Price,start = c(2014,6), end = c(2016,4), frequency=12)
    # forecast the next 10 observations (fit4 was fitted to the differenced series newgas)
    pred<-predict(fit4, n.ahead = 10)
    # standard errors for the prediction interval
    pred.se<-pred$se
    par(mfrow=c(2,1))
    plot(forecast(fit4))
    plot.ts(data,main='real data', xlab='Year',ylab='Price',lwd=2)