Natural Gas Time Series Analysis
The author analyzes natural gas price data from 1996 to 2016 using R. After differencing to achieve stationarity, five seasonal ARIMA models are fitted, and SARIMA(2,0,0)×(2,1,1)12 is selected because it has the lowest AIC among the seasonally differenced candidates and a clearly significant seasonal MA coefficient. Forecasts from this model follow a decreasing trend similar to that of the later observed data. Diagnostic checks indicate that the residuals resemble white noise. The analysis provides a useful prediction of natural gas prices.
Natural Gas Time Series Analysis
By Sandesh Shalavadi
PSTAT 174
June 7, 2016
Sandesh Shalavadi (#7210834)
PSTAT 174
Gyorgy Terdik
6/6/2016
Natural Gas Data Time Series Analysis
I. Introduction
Natural gas is a hydrocarbon gas made mainly of methane, with smaller amounts of
carbon dioxide, hydrogen sulfide, helium, or nitrogen. It forms as layers of plant and
animal matter are exposed to extreme heat and pressure under the Earth’s surface. It is
a fossil fuel that is currently used for cooking, electricity generation, and heating. In
comparison to its counterparts, such as petroleum or coal, it is more efficient and
releases far fewer emissions into the environment. It is therefore regarded as a relatively
safe and environmentally friendly fuel, which is significant in today’s polluted
atmosphere. In addition, natural gas is odorless, colorless, and shapeless. Because it is
lighter than air, a leak tends to rise and disperse rather than pool near the ground, which
lowers the risk of a buildup that could cause an explosion.
I chose natural gas as my research topic because I am interested in the recent efforts to
address climate change and combat global warming. At the Paris climate talks in 2015,
195 countries gathered to discuss how they could slow global warming by reducing
greenhouse gas emissions, helping developing countries become more eco-friendly and
adapt to the adverse effects of climate change, and building a financial plan to support a
pathway to climate-resilient development. A time series analysis of natural gas prices will
help to improve predictions of changes in manufacturing and labor costs, which will help
each country reach its respective goals.
The goal of this project is to carry out a time series analysis of a real data set. I have
chosen the price of natural gas from 1996 to 2016, a span of 20 years that shows how the
price has fluctuated. I used R to analyze the data and fit a model. First, I differenced the
data to get a stationary time series and remove trend and seasonality. Then, I estimated the
parameters from the ACF and PACF of the differenced time series. I analyzed several
plots to select five candidate models, and subsequently checked which had the smallest
AIC. Finally, I used the fitted model to forecast future values.
II. Sections
According to the original time series plot, the process is not stationary, since there are a
few upward trends that reflect significant increases in the price of natural gas. The mean is
4.625945 and the variance is 5.428623. No clear seasonality is visible in the original
plot. There were sharp changes in the years 2001, 2006, and 2008. The price reached its
highest point in October 2005 and its lowest point in December 1998.
Modelling:
Time Series plot of original data
ACF and PACF plot of original data
From the autocorrelation function (ACF) plot, the gradual decay of the ACF indicates non-stationarity.
Therefore, in order to get a stationary series, I differenced the data to remove the trend.
I used differencing to transform the data instead of a Box-Cox or log transformation so as
to avoid adding an extra transformation step.
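As a minimal sketch of what first differencing does, the snippet below applies diff() to a simulated random walk. The data, seed, and variable names are illustrative assumptions and are not the natural gas prices; only base R is used.

```r
# Illustrative only: simulated data, not the natural gas series.
set.seed(174)
n <- 217                                  # length of a monthly series from 1996-04 to 2014-04
x <- ts(4 + cumsum(rnorm(n, sd = 0.8)),   # random walk: a non-stationary level
        start = c(1996, 4), frequency = 12)

dx <- diff(x)                             # first difference: x[t] - x[t-1]

# Differencing drops one observation.
length(x)
length(dx)

# Lag-1 sample autocorrelation before and after differencing:
# close to 1 for the random walk, close to 0 for its increments.
r1_x  <- acf(x,  lag.max = 1, plot = FALSE)$acf[2]
r1_dx <- acf(dx, lag.max = 1, plot = FALSE)$acf[2]
c(level = r1_x, differenced = r1_dx)
```

A slowly decaying ACF for the level and a quickly vanishing one for the differences is the same pattern described for the gas series.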
Differenced time series data:
Time series plot of differenced data
ACF and PACF plot of differenced data
Parameter estimation
After differencing the time series, the new time series plot looks stationary. The mean is
0.01101852, which is approximately 0, while the variance decreased to 0.6964492. The
ACF also shows seasonality. Since the ACF and the PACF trail off rapidly, an ARIMA
model is possible. According to the ACF plot, the ACF cuts off at lag 5 and lag 9.
According to the PACF plot, the PACF cuts off at lag 5 and lag 9.
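When reading spikes off an ACF or PACF plot, it can help to know where the dashed significance bounds sit. The sketch below computes the usual approximate 95% bound for a series of 216 observations (the length of the differenced series) and applies it to simulated white noise; the simulated data and the seed are assumptions for illustration only.

```r
# Approximate 95% bound used on ACF/PACF plots: +/- qnorm(0.975) / sqrt(n)
n <- 216
bound <- qnorm(0.975) / sqrt(n)
round(bound, 4)

# Illustrative white noise of the same length (not the gas data)
set.seed(9)
w <- rnorm(n)
r <- acf(w, lag.max = 24, plot = FALSE)$acf[-1]   # drop lag 0

# Lags whose sample autocorrelation falls outside the bounds;
# about 1 in 20 is expected to do so by chance alone.
outside <- which(abs(r) > bound)
outside
```

With 24 lags inspected, one or two isolated spikes outside the bounds are not strong evidence of structure by themselves.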
----------------------------------------------------------------------------------------------------------------
adf.test(newgas)
Augmented Dickey-Fuller Test
data: newgas
Dickey-Fuller = -6.1023, Lag order = 5, p-value = 0.01
alternative hypothesis: stationary
A p-value below 0.05 rejects the null hypothesis of a unit root in favor of stationarity.
The low p-value from the ADF test therefore supports the conclusion that the differenced data is stationary.
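For reference, the regression behind this test can be written as follows. This assumes the adf.test implementation in the tseries package, which to my understanding includes a constant and a linear trend and picks the lag order from the series length; the report itself does not spell this out.

```latex
\begin{align*}
\Delta x_t &= \alpha + \beta t + \gamma x_{t-1} + \sum_{i=1}^{k} \delta_i \, \Delta x_{t-i} + \varepsilon_t,
  \qquad H_0 : \gamma = 0 \ \text{(unit root)} \\
k &= \left\lfloor (n-1)^{1/3} \right\rfloor = \left\lfloor 215^{1/3} \right\rfloor = \lfloor 5.99 \rfloor = 5
  \qquad \text{for } n = 216
\end{align*}
```

The value k = 5 agrees with the "Lag order = 5" line in the output. The printed p-value of 0.01 is the smallest value the function tabulates, so it is best read as "at most 0.01".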
----------------------------------------------------------------------------------------------------------------
plot(decompose(newgas))
Seasonal trend of fitted data
From the decomposition of the time series, we can see a quadratic-looking trend together with a
seasonal component, which suggests fitting a seasonal model to the data.
Model Diagnostics:
Fit the models
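In order to find the best model, I fit the following models and afterwards checked their AIC
in order to find the one model with the smallest AIC. Here are the models:
Model 1: SARIMA(1,0,0)×(1,1,0)12, seasonal part ARIMA(1,1,0)
Model 2: SARIMA(1,0,0)×(1,0,0)12, seasonal part ARIMA(1,0,0): lowest AIC;
Model 3: SARIMA(2,0,0)×(2,1,0)12, seasonal part ARIMA(2,1,0)
Model 4: SARIMA(2,0,0)×(2,1,1)12, seasonal part ARIMA(2,1,1): second lowest AIC;
Model 5: SARIMA(2,0,0)×(2,0,1)12, seasonal part ARIMA(2,0,1)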
Using the “auto.arima” function for our first model gives us a general idea. We will then attempt to
find the seasonal ARIMA model with the best AIC value.
auto.arima(gas)
Series: gas
ARIMA(1,0,0) with non-zero mean
Coefficients:
ar1 sar1
0.0400 -0.4823
s.e. 0.0698 0.0591
sigma^2 estimated as 0.6964: log likelihood=-266.92
AIC=537.84 AICc=537.9 BIC=544.59
From the “auto.arima” function, we can see that ARIMA(1,0,0) is a reasonable starting point, although
the ar1 estimate of 0.0400 is small relative to its standard error of 0.0698.
A coefficient is called significant here when its estimate exceeds roughly two standard errors in absolute value.
Model                      AIC      Coefficient       s.e.     Conclusion
SARIMA(1,0,0)×(1,1,0)12    615.75   ar1 = 0.0400      0.0698   not significant
                                    sar1 = -0.4823    0.0591   significant
SARIMA(1,0,0)×(1,0,0)12    540.88   ar1 = -0.0093     0.0683   not significant
                                    sar1 = -0.0668    0.0680   not significant
SARIMA(2,0,0)×(2,1,0)12    593.32   ar1 = 0.0388      0.0698   not significant
                                    sar1 = -0.6561    0.0648   significant
SARIMA(2,0,0)×(2,1,1)12    541.01   ar1 = 0.0117      0.0701   not significant
                                    sar1 = -0.1245    0.0717   not significant
                                    sma1 = -1.0000    0.0731   significant
SARIMA(2,0,0)×(2,0,1)12    546.23   ar1 = -0.0151     0.0690   not significant
                                    sar1 = -0.0297    0.4389   not significant
                                    sar2 = -0.0507    0.0733   not significant
                                    sma1 = -0.0422    0.4374   not significant
Although SARIMA(1,0,0)×(1,1,0)12 is not a bad model, we conclude that SARIMA(2,0,0)×(2,1,1)12
is the best fit because, among the seasonally differenced models, it has the smallest AIC value,
541.01, and its seasonal MA coefficient is clearly significant.
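The AIC values and significance calls can be cross-checked from the numbers printed in the fits, without refitting anything. The sketch below is a rough check under two assumptions: that the AIC is computed as -2 × log-likelihood + 2 × (number of coefficients + 1), the extra 1 counting sigma^2, and that "significant" means an estimate more than about 1.96 standard errors from zero. The variable names are illustrative.

```r
# Reported log-likelihoods and coefficient counts, copied from the fits in this report
fits <- data.frame(
  model  = c("fit1", "fit2", "fit3", "fit4", "fit5"),
  loglik = c(-304.88, -266.44, -291.66, -264.50, -266.12),
  ncoef  = c(2, 3, 4, 5, 6)       # fit2 and fit5 include an intercept
)

# AIC = -2 * loglik + 2 * (ncoef + 1); the "+ 1" counts sigma^2
fits$aic <- -2 * fits$loglik + 2 * (fits$ncoef + 1)
fits[order(fits$aic), ]

# Approximate z-ratios for fit4: estimate divided by standard error
est <- c(ar1 = 0.0117, ar2 = -0.0102, sar1 = -0.1245, sar2 = -0.1152, sma1 = -1.0000)
se  <- c(0.0701, 0.0705, 0.0717, 0.0696, 0.0731)
z   <- est / se
round(z, 2)

# Coefficients more than 1.96 standard errors from zero
significant <- names(z)[abs(z) > qnorm(0.975)]
significant
```

The recomputed AIC values agree with the printed ones to within rounding of the log-likelihoods. Note that AIC values are only loosely comparable between models with and without seasonal differencing, because the differenced fits use fewer effective observations.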
Diagnostic checking for Model 1, SARIMA(1,0,0)×(1,1,0)12:
> fit1 = arima(newgas,order=c(1,0,0), seasonal = list(order = c(1, 1, 0), period = 12))
> fit1
Call:
arima(x = newgas, order = c(1, 0, 0), seasonal = list(order = c(1, 1, 0), period = 12))
Coefficients:
ar1 sar1
0.0400 -0.4823
s.e. 0.0698 0.0591
sigma^2 estimated as 1.145: log likelihood = -304.88, aic = 615.75
> Box.test(residuals(fit1), lag=1, type = c("Box-Pierce","Ljung-Box"), fitdf =0)
Box-Pierce test
data: residuals(fit1)
X-squared = 0.00081791, df = 1, p-value = 0.9772
The p-value is larger than 0.05, so the test does not reject uncorrelated residuals and the model passes.
SARIMA(1,0,0)×(1,1,0)12 is not a bad model: its sar1 coefficient of -0.4823 is significant, although
the ar1 coefficient of 0.0400 is small relative to its standard error of 0.0698. Next, I fit the other
SARIMA models to the series and compare which one fits best. The one with significant coefficients
and the lowest AIC value should be the best.
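To make the residual test concrete, the sketch below shows what Box.test computes at lag 1 and how a multi-lag Ljung-Box test could be run instead. It uses a simulated AR(1) series, not the gas data; the seed, the lag of 24, and the fitdf value are assumptions chosen for illustration.

```r
# Illustrative data: simulated AR(1), not the natural gas series
set.seed(1)
x   <- arima.sim(model = list(ar = 0.5), n = 216)
fit <- arima(x, order = c(1, 0, 0))
res <- residuals(fit)

# Box-Pierce at lag 1: the statistic is n * r1^2, where r1 is the
# lag-1 sample autocorrelation of the residuals.
bp1 <- Box.test(res, lag = 1, type = "Box-Pierce")
r1  <- acf(res, lag.max = 1, plot = FALSE)$acf[2]
c(box_test = unname(bp1$statistic), by_hand = length(res) * r1^2)

# A test at lag 1 only looks at one autocorrelation. Testing more lags,
# with fitdf set to the number of fitted ARMA coefficients, is stricter.
lb <- Box.test(res, lag = 24, type = "Ljung-Box", fitdf = 1)
lb$parameter   # degrees of freedom = lag - fitdf
lb$p.value
```

Passing a two-element vector as type, as in the session above, makes R fall back to the first option, so the tests reported here are Box-Pierce tests.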
Diagnostic checking for Model 2, SARIMA(1,0,0)×(1,0,0)12:
> fit2 = arima(newgas,order=c(1,0,0), seasonal = list(order = c(1, 0, 0), period = 12))
> fit2
Call:
arima(x = newgas, order = c(1, 0, 0), seasonal = list(order = c(1, 0, 0), period = 12))
Coefficients:
ar1 sar1 intercept
-0.0093 -0.0668 0.0110
s.e. 0.0683 0.0680 0.0527
sigma^2 estimated as 0.69: log likelihood = -266.44, aic = 540.88
> Box.test(residuals(fit2), lag=1, type = c("Box-Pierce","Ljung-Box"), fitdf =0)
Box-Pierce test
data: residuals(fit2)
X-squared = 6.8896e-08, df = 1, p-value = 0.9998
> tsdiag(fit2)
The p-value is larger than 0.05, so the model passes the test.
According to the tsdiag plot for fit2, the standardized residuals plot doesn’t show clusters of
volatility. The ACF plot shows no significant autocorrelation among the residuals. The p-values
for the Ljung–Box statistics are mostly above the blue dashed line. This could be our optimal
model, and its AIC (540.88) is nearly identical to that of fit4 (541.01), but fit4 is preferred
because it includes the seasonal differencing suggested by the decomposition.
Diagnostic checking for Model 3, SARIMA(2,0,0)×(2,1,0)12:
> fit3 = arima(newgas,order=c(2,0,0), seasonal = list(order = c(2, 1, 0), period = 12))
> fit3
Call:
arima(x = newgas, order = c(2, 0, 0), seasonal = list(order = c(2, 1, 0), period = 12))
Coefficients:
ar1 ar2 sar1 sar2
0.0388 -0.0592 -0.6561 -0.3351
s.e. 0.0698 0.0699 0.0648 0.0626
sigma^2 estimated as 0.9913: log likelihood = -291.66, aic = 593.32
> Box.test(residuals(fit3), lag=1, type = c("Box-Pierce","Ljung-Box"), fitdf =0)
Box-Pierce test
data: residuals(fit3)
X-squared = 0.0045882, df = 1, p-value = 0.946
The p-value is larger than 0.05, so the model passes the test.
tsdiag plot of fit 2 model
Diagnostic checking for Model 4, SARIMA(2,0,0)×(2,1,1)12:
> fit4 = arima(newgas,order=c(2,0,0), seasonal = list(order = c(2, 1, 1), period = 12))
> fit4
Call:
arima(x = newgas, order = c(2, 0, 0), seasonal = list(order = c(2, 1, 1), period = 12))
Coefficients:
ar1 ar2 sar1 sar2 sma1
0.0117 -0.0102 -0.1245 -0.1152 -1.0000
s.e. 0.0701 0.0705 0.0717 0.0696 0.0731
sigma^2 estimated as 0.6438: log likelihood = -264.5, aic = 541.01
> Box.test(residuals(fit4), lag=1, type = c("Box-Pierce","Ljung-Box"), fitdf =0)
Box-Pierce test
data: residuals(fit4)
X-squared = 2.5971e-06, df = 1, p-value = 0.9987
> tsdiag(fit4)
The p-value is larger than 0.05, so the model passes the test.
According to the tsdiag plot for fit4, the standardized residuals plot doesn’t show clusters of
volatility. The ACF plot shows no significant autocorrelation among the residuals. The p-values
for the Ljung–Box statistics are mostly above the blue dashed line. Therefore, the residuals of
SARIMA(2,0,0)×(2,1,1)12 resemble white noise, and it is an adequate model.
The model equation is: $(1 - \phi_1 B - \phi_2 B^2)(1 - \Phi_1 B^{12} - \Phi_2 B^{24})(1 - B^{12}) X_t = (1 + \Theta_1 B^{12}) W_t$
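For reference, the general SARIMA(2,0,0)×(2,1,1)12 form and the same equation with the fit4 estimates substituted are written out below. The sign convention is assumed to be the one R's arima uses (AR terms subtracted, MA terms added), and X_t denotes the differenced series newgas.

```latex
\begin{align*}
(1 - \phi_1 B - \phi_2 B^2)(1 - \Phi_1 B^{12} - \Phi_2 B^{24})(1 - B^{12}) X_t
  &= (1 + \Theta_1 B^{12}) W_t \\
(1 - 0.0117 B + 0.0102 B^2)(1 + 0.1245 B^{12} + 0.1152 B^{24})(1 - B^{12}) X_t
  &= (1 - B^{12}) W_t
\end{align*}
```

With the seasonal MA estimate at -1.0000, the factor (1 - B^12) appears on both sides. This is commonly read as a hint that the seasonal difference may not be needed; it is an observation about the reported numbers, not a conclusion drawn in the report.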
Diagnostic checking for Model 5, SARIMA(2,0,0)×(2,0,1)12:
> fit5 = arima(newgas,order=c(2,0,0), seasonal = list(order = c(2, 0, 1), period = 12))
> fit5
Call:
arima(x = newgas, order = c(2, 0, 0), seasonal = list(order = c(2, 0, 1), period = 12))
Coefficients:
ar1 ar2 sar1 sar2 sma1
-0.0151 0.0072 -0.0297 -0.0507 -0.0422
s.e. 0.0690 0.0690 0.4389 0.0733 0.4374
intercept
0.0105
s.e. 0.0501
sigma^2 estimated as 0.6877: log likelihood = -266.12, aic = 546.23
> Box.test(residuals(fit5), lag=1, type = c("Box-Pierce","Ljung-Box"), fitdf =0)
Box-Pierce test
data: residuals(fit5)
X-squared = 6.152e-06, df = 1, p-value = 0.998
The p-value is larger than 0.05, so the model passes the test.
Forecasting:
plot(forecast(fit))
The forecast plot looks pretty good, and it seems to capture the overall movement of the data
adequately.
Diagnostic Plots for Time-Series Fits
Comparing real time data with 10 future values
y=read.table("realdataofnaturalgas.txt", header=T)
data=ts(y$gas, frequency = 12, start = c(1991))
par(mfrow=c(2,1))
plot(forecast(fit4))
plot(data,main="real data")
The second time series plot above is from the real data. Looking at the real data plot, there is a
decreasing trend from 2014 to 2016, which is similar to the forecast plot. We can see that the
forecast values are relatively close to the real values because most of the real data points from 2014
to 2016 fall within the prediction intervals (the shaded part of the forecast plot).
As a result, the outcome shows that the forecast was fairly adequate.
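The comparison of forecasts with later observations can also be made numerically rather than by eye. The following is a minimal sketch on simulated monthly data with a holdout year; the series, the seed, and the model order are assumptions chosen for illustration and are not the fitted gas model. It uses predict() from base R instead of the forecast package.

```r
# Illustrative monthly series with a seasonal pattern (not the gas data)
set.seed(42)
n      <- 240
season <- rep(sin(2 * pi * (1:12) / 12), length.out = n)
noise  <- as.numeric(arima.sim(model = list(ar = 0.3), n = n))
x      <- ts(season + noise, start = c(1996, 1), frequency = 12)

# Hold out the final 12 months for comparison
train <- window(x, end = c(2014, 12))
test  <- window(x, start = c(2015, 1))

# Hypothetical seasonal model for the sketch
fit <- arima(train, order = c(1, 0, 0),
             seasonal = list(order = c(0, 1, 1), period = 12),
             method = "ML")

# Point forecasts and approximate 95% prediction intervals
p     <- predict(fit, n.ahead = 12)
lower <- as.numeric(p$pred) - qnorm(0.975) * as.numeric(p$se)
upper <- as.numeric(p$pred) + qnorm(0.975) * as.numeric(p$se)

# Share of held-out observations that fall inside the intervals
inside <- as.numeric(test) >= lower & as.numeric(test) <= upper
mean(inside)
```

A share close to 0.95 is what well-calibrated 95% intervals would give; with only 10 to 12 held-out points the share is a coarse indicator.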
Real data time series plot
III. Sources
Data from: http://www.indexmundi.com/commodities/?commodity=natural-gas&months=240
IV. Code
library(astsa)     # acf2
library(forecast)  # auto.arima, forecast
library(MASS)
library(tseries)   # adf.test
# other packages listed for this project: timeDate, timeSeries
setwd("C:/Sandesh/College Stuff/UCSB/PSTAT 126")
x=read.table("realdataofnaturalgas.csv", sep=",", header=TRUE)
gas=ts(x$Price, start = c(1996,4), end = c(2014,4), frequency = 12)
# mean and variance of gas data set before transformation
mean(gas)
var(gas)
# ACF and PACF plot of original data set (non-stationary), ACF trails off
acf2(ts(gas))
# difference the data set to make a stationary time series
newgas=diff(gas)
# plot of original and differenced data set to show the effect of removing the trend
plot(gas,main ='Price of natural gas (US Dollars)', xlab='Year', ylab='Price', lwd=2)
plot(newgas,main ='Differenced price of natural gas (US Dollars)', xlab='Year', ylab='Price change', lwd=2)
# mean and variance for differenced data set
mean(newgas)
var(newgas)
# ACF and PACF of differenced data set
acf2(ts(newgas))
# unit-root test, decomposition, and automatic model suggestion
adf.test(newgas)
plot(decompose(newgas))
auto.arima(newgas)
# fit models (newgas is already first-differenced, so d = 0 in each order)
fit1 = arima(newgas,order=c(1,0,0), seasonal = list(order = c(1, 1, 0), period = 12))
fit2 = arima(newgas,order=c(1,0,0), seasonal = list(order = c(1, 0, 0), period = 12))
fit3 = arima(newgas,order=c(2,0,0), seasonal = list(order = c(2, 1, 0), period = 12))
fit4 = arima(newgas,order=c(2,0,0), seasonal = list(order = c(2, 1, 1), period = 12))
fit5 = arima(newgas,order=c(2,0,0), seasonal = list(order = c(2, 0, 1), period = 12))
# AIC by model: fit2 540.88 (smallest), fit4 541.01, fit5 546.23,
# fit3 593.32, fit1 615.75 (largest)
# print fitted models
fit1
fit2
fit3
fit4
fit5
# Diagnostic checking for models fit1, fit2, fit3, fit4, fit5
# plot standardized residuals, ACF of residuals, and Ljung-Box p-values
tsdiag(fit2)
tsdiag(fit4)
plot(forecast(fit1))
# Box-Pierce test at lag 1 for all fitted models
Box.test(residuals(fit1), lag=1, type = "Box-Pierce", fitdf =0)
Box.test(residuals(fit2), lag=1, type = "Box-Pierce", fitdf =0)
Box.test(residuals(fit3), lag=1, type = "Box-Pierce", fitdf =0)
Box.test(residuals(fit4), lag=1, type = "Box-Pierce", fitdf =0)
Box.test(residuals(fit5), lag=1, type = "Box-Pierce", fitdf =0)
# confidence intervals for the coefficients of all fitted models
confint(fit1)
confint(fit2)
confint(fit3)
confint(fit4)
confint(fit5)
# later observed data used for comparison with the forecast
y=read.table("realdata3.csv",sep=",", header=TRUE)
data=ts(y$Price,start = c(2014,6), end = c(2016,4), frequency=12)
# forecast next 10 observations of the differenced time series
# point predictions and their standard errors
pred<-predict(fit4, n.ahead = 10)
pred.se<-pred$se
par(mfrow=c(2,1))
plot(forecast(fit4))
plot.ts(data,main='real data', xlab='Year',ylab='Price',lwd=2)