Case Study of Petroleum Consumption With R Code
Society consumes an enormous amount of petroleum every month. The question at hand is
whether our consumption follows a pattern, and in fact it does! Presented in the data is the total
monthly petroleum consumption of the residential sector from 1984 to 2013. Figure 1 clearly shows
that consumption fluctuates above and below the overall mean of the series, and overlaying a Loess
curve makes the rising and falling trends plainly visible. To remedy this trend we first take a log of
the data, which stabilizes its varying scale, and then take a first difference. Behold, the result is very
much stationary: the transformed series fluctuates only slightly about a constant mean, which is
exactly the behavior we need to treat the data as stationary!
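The transform can be illustrated in a few lines of Python (synthetic data stands in for the petroleum series here, since Data.txt is not reproduced in this report):

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in for the petroleum series: an exponential trend times
# a noisy level, so the scale of the fluctuations grows with the level.
t = np.arange(1, 360)
series = np.exp(0.002 * t) * (50 + rng.normal(0, 2, t.size))

log_series = np.log(series)   # the log stabilizes the varying scale
diffed = np.diff(log_series)  # the first difference removes the trend

# After both steps the series hovers about a near-zero constant mean.
print(round(float(diffed.mean()), 4), round(float(diffed.std()), 4))
```

This mirrors the `log()` followed by `diff()` applied to the real data in the R session below.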
Now, what model should we employ to represent our data? Figure 2 contains the ACF and
the PACF plots of our transformed and differenced data, and quite clearly neither cuts off sharply, so
neither a pure autoregressive model nor a pure moving average model can represent the data on its
own. We must use an ARMA model, and because we have taken a first difference we must employ an
ARIMA(p,d,q) model! Using the auto.arima() function from R's forecast package to determine the
orders, the chosen model comes out to be ARIMA(5,1,4).
Figure 1: The raw, transformed, and transformed + first-differenced data. Notice how the trend
becomes less and less apparent!
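auto.arima() searches over candidate orders and keeps the one with the best information criterion. The core idea can be sketched in Python with a plain least-squares AR fit (a simplified stand-in for the forecast package's stepwise AICc search, run on simulated data rather than the petroleum series):

```python
import numpy as np

rng = np.random.default_rng(42)

# Simulate a stationary AR(2) process as a stand-in series.
n, phi = 500, (0.6, 0.3)
x = np.zeros(n)
for t in range(2, n):
    x[t] = phi[0] * x[t - 1] + phi[1] * x[t - 2] + rng.normal()

max_p = 5
y = x[max_p:]  # common target so the AIC values are comparable

def ar_aic(p):
    """Fit AR(p) by least squares and return an AIC-style score."""
    if p == 0:
        resid = y - y.mean()
    else:
        X = np.column_stack([x[max_p - k:n - k] for k in range(1, p + 1)])
        beta, *_ = np.linalg.lstsq(X, y, rcond=None)
        resid = y - X @ beta
    sigma2 = resid @ resid / y.size
    return y.size * np.log(sigma2) + 2 * (p + 1)

aics = [ar_aic(p) for p in range(max_p + 1)]
best_p = int(np.argmin(aics))
print("best AR order by AIC:", best_p)
```

The penalty term 2(p + 1) is what stops the criterion from always preferring the largest model; auto.arima applies the same trade-off across both AR and MA orders.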
We now have a model to fit to our data! The next step is to perform a number of diagnostics
to check that the model fits well. First we look at the ACF and PACF of the residuals, and there
happen to be significant values at particular lags, which tells me that the (5,1,4) model may not be the
best model to represent our data. Next we check whether the residuals fluctuate about a mean of 0,
and they certainly do. The next question is: are the residuals normal and independent? The normal
Q-Q plot shows that they are indeed approximately normal, and the Box-Ljung test gives a large
p-value (p = .94), which is consistent with independent residuals at lag 1.
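The Box-Ljung statistic is simple enough to compute directly. Here is a self-contained Python version (a sketch, not R's Box.test; for the default single lag the chi-square(1) tail has the closed form erfc(sqrt(Q/2))):

```python
import math
import numpy as np

def ljung_box(resid, lag=1):
    """Ljung-Box Q statistic over the first `lag` autocorrelations."""
    r = np.asarray(resid, dtype=float)
    r = r - r.mean()
    n = r.size
    denom = r @ r
    q = 0.0
    for k in range(1, lag + 1):
        rho_k = (r[:-k] @ r[k:]) / denom   # lag-k sample autocorrelation
        q += rho_k ** 2 / (n - k)
    return n * (n + 2) * q

rng = np.random.default_rng(1)
white = rng.normal(size=358)               # white-noise stand-in for residuals
q = ljung_box(white, lag=1)
p_value = math.erfc(math.sqrt(q / 2))      # chi-square(1) survival function
print(round(q, 4), round(p_value, 4))
```

A large p-value, as with white noise here, means the lag-1 autocorrelation is too small to reject independence; a strongly trending input would instead give a huge Q and a p-value near zero.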
Figure 2: The ACF and PACF of our transformed and differenced data. Notice that neither cuts off,
implying that an ARMA model is appropriate! We also have the periodogram of our data: notice the
large peak near frequency .0833, which corresponds to a cycle of 12 months, an annual seasonal
pattern!
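The conversion from spectral peak to cycle length follows directly from the periodogram's indexing; a quick Python check using the bin index reported by which.max() in the session below:

```python
# which.max(dataSmooth$spec) returned bin 30; the smoothed periodogram has
# 180 frequency bins j/360 (spec.pgram pads the 358 points to 360).
peak_bin, n_pad = 30, 360
f = peak_bin / n_pad       # cycles per month
period_months = 1 / f      # months per cycle
print(f, period_months)
```

So the dominant frequency is 30/360 ≈ 0.0833 cycles per month, i.e. one full cycle every 12 months.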
However, do take notice that the Ljung-Box test plot clearly shows that beyond two lags our model
no longer has independent-looking residuals. What does this mean? It implies that any lag higher
than two may be poorly represented; in other words, our model is only decent for examining about
two future values. Figure 3 presents these conclusions graphically. Our results on the residuals are as
follows:
1. Independent for lags up to 2
2. Constant variance
3. Fluctuate about a mean of 0
4. Approximately normal
Our residuals therefore do not represent pure white noise; there is structure the model fails to capture.
So the question now is: how effective is our model at predicting future petroleum consumption?
Using our suggested model, ARIMA(5,1,4), we attempt to predict the last 12 months of our
observed data. Figure 4 shows our forecasted values, our observed values, and our prediction bands
(forecast plus and minus one standard error). Our predicted values are actually quite spot on! The
observed and forecasted values track one another closely and lie within the prediction bands, so our
model, ARIMA(5,1,4), turned out to be a decent fit for representing future values!
Figure 3: Residual plots from the residual analysis of our model. Notice how our model fails the
Ljung-Box test and clearly shows significant values in its ACF and PACF! This implies our model is
not the best!
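The bands are the point forecast plus and minus one standard error, which can be sanity-checked in Python using the values printed in the R session below:

```python
# Forecasts and bands copied from the R output (pred$pred +/- pred$se).
fore = [96.98064, 96.96685, 94.07727, 82.66665, 75.30919, 63.93817,
        59.73830, 59.37145, 62.29238, 72.78900, 79.44755, 90.20105]
upper = [109.44587, 112.64558, 111.26915, 100.66319, 93.42068, 82.07828,
         77.88662, 77.55500, 80.87230, 92.11042, 100.52580, 113.21682]
lower = [84.51540, 81.28813, 76.88539, 64.67011, 57.19769, 45.79806,
         41.58997, 41.18790, 43.71247, 53.46758, 58.36930, 67.18528]

# Each band midpoint should equal the forecast, and each half-width is
# one standard error, which grows with the forecast horizon.
midpoints = [(u + l) / 2 for u, l in zip(upper, lower)]
half_widths = [(u - l) / 2 for u, l in zip(upper, lower)]
print(max(abs(m - v) for m, v in zip(midpoints, fore)))
```

The widening half-widths are the quantitative version of the "room for error" discussed next: uncertainty compounds the further ahead we predict.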
Using time series methods we were able to draw successful predictions of how much petroleum
the residential sector will consume per month! This example shows the effectiveness of time series
analysis in predicting the future. However, I must note that our predictions were for the next year
only, not the year after that or the one beyond. What should be taken with a grain of salt is that
although we can predict the future to a relatively good degree, we are still unable to account for
possible future shocks with the methods presented here. The width of the prediction bands in Figure
4 clearly shows how large our room for error is! Nevertheless, time series analysis remains a
powerful tool for predicting the future based on past observations.
Figure 4: Our predicted values and the observed values. Take note of how well our model actually
predicted them!
Figure 6: The periodogram of our final model. Notice that it still retains the same dominant
frequency (.0899), which is made even more apparent in its smoothed counterpart!
Explicit R Code & Output:
> data <- read.table("Data.txt", header = TRUE)
>
> #Determine if the Series is stationary.
>
> #Plot the original data
> par(mfrow=(c(3,1)))
> Petro <- data$Petro
> Count = (1:359)
>
> plot(Petro~Count, type = "l", main = "Petroleum Consumed By Residential Sector (Trillions) 1984 - 2013", ylab = "Original Petroleum")
> fitO = loess(Petro~Count, span = .25, family = "gaussian")
> lines(Count,fitO$fitted,col="red")
> abline(h = mean(Petro), col = 'blue')
> legend(0,100, c("Observed", "Loess", "Mean"), lty = c(1,1), lwd=c(2.5,2.5),col = c("black","red",
"blue"))
>
> #Transform the Data
> transData <- log(Petro)
> plot(transData ~ Count, type = "l", main = "Petroleum Consumed By Residential Sector (Trillions) 1984 - 2013", ylab = "Log Petroleum")
> fitL = loess(transData ~ Count, span = .25, family = "gaussian")
> lines(Count,fitL$fitted,col="red")
> abline(h = mean(transData), col = 'blue')
> legend(0,4.4, c("Observed", "Loess", "Mean"), lty = c(1,1), lwd=c(2.5,2.5),col = c("black","red",
"blue"))
>
> #Take the First difference
> transDif <- diff(transData,differences=1)
> newCount = (1:358)
>
> #Plot the transformed data.
> plot(transDif~newCount, type = "l", main = "Petroleum Consumed By Residential Sector (Trillions) 1984 - 2013", ylab = "First Difference Petroleum")
> fitD = loess(transDif ~ newCount, span = .25, family = "gaussian")
> lines(newCount,fitD$fitted,col="red")
> abline(h = mean(transDif), col = 'blue')
> legend(100,-.2, c("Observed", "Loess", "Mean"), lty = c(1,1), lwd=c(2.5,2.5),col = c("black","red",
"blue"))
>
> #This will make our Data stationary.
> #I will choose the Log + first difference of the Data.
> par(mfrow=c(2,2))
> acf (transDif, lag = 400)
> pacf (transDif, lag = 400)
> dataPeriod <- spec.pgram(transDif, taper = 0, fast=FALSE, detrend=FALSE, demean=TRUE, log="no")
> k = kernel("modified.daniell", c(6,6))
> dataSmooth <- spec.pgram(transDif, k, taper=0, detrend=FALSE, demean=TRUE, log="no")
> which.max(dataSmooth$spec)
[1] 30
> dataSmooth$spec
[1] 0.001264492 0.001262252 0.001258110 0.001255941 0.001258441 0.001264701 0.001267342
0.001254761 0.001227113 0.001201434 0.001185966 0.001170703
[13] 0.001152362 0.001138549 0.001135637 0.001142318 0.001149538 0.003714350 0.011430265
0.021780544 0.032202750 0.042688695 0.053230640 0.063797912
[25] 0.074367912 0.084954830 0.095570415 0.106190513 0.116819454 0.122353888 0.117608038
0.107654882 0.097649276 0.087546969 0.077315275 0.067037882
[37] 0.056772306 0.046487385 0.036148892 0.025785512 0.015401723 0.007545190 0.004845270
0.004743850 0.004594111 0.004448212 0.004388544 0.005059599
[49] 0.007097739 0.009831243 0.012583215 0.015356612 0.018171704 0.021008704 0.023799299
0.026541293 0.029318707 0.032133087 0.034957185 0.036453173
[61] 0.035248132 0.032676137 0.030111676 0.027542471 0.024950235 0.022379948 0.019875935
0.017450727 0.015056079 0.012668207 0.010239976 0.008366218
[73] 0.007789588 0.007892669 0.007971573 0.008027764 0.008053122 0.008076994 0.008147917
0.008187799 0.008166425 0.008111941 0.008041100 0.008015643
[85] 0.008000086 0.007988294 0.008046627 0.008163193 0.008324955 0.008404989 0.008274497
0.008066992 0.007879108 0.007689639 0.007516088 0.007352704
[97] 0.007199961 0.007011720 0.006713313 0.006328845 0.005897773 0.005503301 0.005221036
0.005000439 0.004806468 0.004635803 0.004464696 0.004473053
[109] 0.004829694 0.005395164 0.006066362 0.006808720 0.007589485 0.008430153 0.009365512
0.010430756 0.011573510 0.012742851 0.013964865 0.014880976
[121] 0.015124874 0.014982793 0.014762723 0.014578933 0.014445398 0.014303281 0.014067069
0.013600449 0.012958840 0.012264598 0.011521265 0.010893322
[133] 0.010589373 0.010517923 0.010465841 0.010283997 0.009971791 0.010131472 0.011418931
0.013484145 0.015822529 0.018271292 0.020717686 0.023187057
[145] 0.025653494 0.028048051 0.030435900 0.032904177 0.035450444 0.036900974 0.036047474
0.033849221 0.031288464 0.028530360 0.025783871 0.023068047
[157] 0.020398738 0.017794329 0.015191281 0.012567859 0.009992336 0.008147941 0.007657208
0.007900026 0.008349436 0.008983892 0.009668505 0.010320199
[169] 0.010926820 0.011538437 0.012218961 0.012937053 0.013547797 0.013941216 0.014164212
0.014394409 0.014640860 0.014777446 0.014802968 0.014796573
> freq = 30/360 # spectral peak at bin 30 of the 180 smoothed-periodogram frequencies
> length(Petro)
[1] 359
> cyc <- 1/freq # number of months in a cycle
> cyc
[1] 12
> cycTotal <- length(Petro)/cyc # number of total cycles in our data
> cycTotal
[1] 29.91667
> specMax = .122
> #What is the preliminary Model we should choose.
>
> #ACF AND PACF BOTH TAIL OFF TO ZERO.
> library('forecast')
Loading required package: zoo
Attaching package: ‘zoo’
The following objects are masked from ‘package:base’:
as.Date, as.Date.numeric
Loading required package: timeDate
This is forecast 5.2
> library('astsa')
> library("FitAR")
Loading required package: lattice
Loading required package: leaps
Loading required package: ltsa
Loading required package: bestglm
Loading required package: lars
Loaded lars 1.2
Loading required package: ElemStatLearn
Attaching package: ‘FitAR’
The following object is masked from ‘package:forecast’:
BoxCox
>
> #I choose an ARIMA model, d = 0 since the data is already differenced.
>
> #Select the final model using ID Criterion
> auto.arima(transDif, max.p = 8, max.q = 8, stationary = TRUE, ic = c("aicc", "aic", "bic"), trace =
TRUE)
ARIMA(2,0,2) with non-zero mean : -439.6081
ARIMA(0,0,0) with non-zero mean : -390.096
ARIMA(1,0,0) with non-zero mean : -414.5041
ARIMA(0,0,1) with non-zero mean : -401.6552
ARIMA(1,0,2) with non-zero mean : -435.8972
ARIMA(3,0,2) with non-zero mean : -590.5202
ARIMA(3,0,1) with non-zero mean : -553.3992
ARIMA(3,0,3) with non-zero mean : -624.6525
ARIMA(4,0,4) with non-zero mean : -641.8194
ARIMA(4,0,4) with zero mean : -643.3304
ARIMA(3,0,4) with zero mean : 1e+20
ARIMA(5,0,4) with zero mean : -654.9737
ARIMA(5,0,3) with zero mean : -636.1066
ARIMA(5,0,5) with zero mean : -629.8412
ARIMA(4,0,3) with zero mean : -636.4764
ARIMA(6,0,5) with zero mean : 1e+20
ARIMA(5,0,4) with non-zero mean : -653.4301
ARIMA(6,0,4) with zero mean : 1e+20
Best model: ARIMA(5,0,4) with zero mean
Series: transDif
ARIMA(5,0,4) with zero mean
Coefficients:
ar1 ar2 ar3 ar4 ar5 ma1 ma2 ma3 ma4
-0.2040 1.0071 0.1723 -0.9667 -0.1743 -0.0248 -1.2324 -0.1158 0.8848
s.e. 0.0613 0.0171 0.0601 0.0152 0.0599 0.0335 0.0319 0.0293 0.0282
sigma^2 estimated as 0.008753: log likelihood=336.29
AIC=-652.58 AICc=-651.95 BIC=-613.78
>
> #My model is going to be of order (5,1,4)
>
>
> #Perform residual analysis, show that our model is good. Make sure our error is white noise.
> par(mfrow = c(4,2))
>
> #Plot of Residuals
> fit <- Arima(transDif, order = c(5,0,4))
> plot(fit$residual, type = "p")
>
> abline(h=0, col = "red")
>
> #Normally Distributed?
> hist(fit$residuals)
>
> #AutoCorrelations present?
> acf(fit$residuals)
> pacf(fit$residuals)
>
> #Are they Normal?
> qqnorm(fit$residuals)
>
> #Does the Test say it is a good fit?
> Box.test(fit$residuals, type = c("Ljung-Box"))
Box-Ljung test
data: fit$residuals
X-squared = 0.0056, df = 1, p-value = 0.9402
> LBQPlot(fit$residuals, lag.max = 30)
> mean(fit$res)
[1] -0.001070552
>
> #Smoothed periodogram of residuals
> k = kernel("modified.daniell", c(6,6))
> spec.pgram(fit$residuals, k, taper=0, detrend=FALSE, demean=TRUE, log="no")
>
>
> par(mfrow = c(2,1))
> #Plot the spectral density of the final model, as well as its smoothed version
> finalModel <- fitted(fit)
> spec.pgram(finalModel, taper = 0, fast=FALSE, detrend=FALSE,demean =TRUE,log="no")
> k = kernel("modified.daniell", c(6,6))
> spec.pgram(finalModel, k, taper=0, detrend=FALSE, demean=TRUE, log="no")
>
> #Refit The final model
>
> refit <- Arima(Petro[1:348], order = c(5,1,4))
> pred <- predict(refit, n.ahead = 12, d =1)
> pred$pred
Time Series:
Start = 349
End = 360
Frequency = 1
[1] 96.98064 96.96685 94.07727 82.66665 75.30919 63.93817 59.73830 59.37145 62.29238 72.78900
79.44755 90.20105
> pred$pred + pred$se
Time Series:
Start = 349
End = 360
Frequency = 1
[1] 109.44587 112.64558 111.26915 100.66319 93.42068 82.07828 77.88662 77.55500 80.87230
92.11042 100.52580 113.21682
> forePlus <- c(109.44587, 112.64558 ,111.26915 ,100.66319 , 93.42068 , 82.07828 , 77.88662,
77.55500, 80.87230 , 92.11042 ,100.52580, 113.21682)
>
> pred$pred - pred$se
Time Series:
Start = 349
End = 360
Frequency = 1
[1] 84.51540 81.28813 76.88539 64.67011 57.19769 45.79806 41.58997 41.18790 43.71247 53.46758
58.36930 67.18528
> foreMins <- c(84.51540, 81.28813, 76.88539, 64.67011, 57.19769, 45.79806, 41.58997 ,41.18790,
43.71247, 53.46758 ,58.36930 ,67.18528)
>
> index <- c(348:359)
> fore <- c(96.98064 ,96.96685, 94.07727, 82.66665, 75.30919, 63.93817, 59.73830, 59.37145,
62.29238, 72.78900, 79.44755, 90.20105)
> par(mfrow=c(2,1))
> which.max(fore)
[1] 1
>
> par(mfrow = c(1,1))
>
> plot(Petro[348:359]~index, type = 'l', col = 'black', main = 'Plot of Forecasted and Observed Values', xlab = "Time", ylab = "Petroleum Consumption", ylim = c(35,130))
> lines(fore~index, col = "red")
> lines(forePlus~index, col = "blue")
> lines(foreMins~index, col = "blue")
> legend(348,130, c("Observed","Predicted", "Prediction Bands"), lty = c(1,1), lwd=c(2.5,2.5),col =
c("black","red", "blue"))
