Statistical Consulting for TELCAP
M2 Statistics and Econometrics
by
Orestis Ampeliotis, Sayli Javadekar, Zhao-qiu Luo, Damien Quesada
ACKNOWLEDGEMENT
We would like to thank TelCap for providing us with the necessary logistical support to complete this assignment. We would also like to thank Prof. Daouia and Prof. Orozco for their guidance, without which we would not have been able to complete it.
TABLE OF CONTENTS
ACKNOWLEDGEMENT
INTRODUCTION
THEORETICAL BACKGROUND
TIME SERIES 1 - TOULOUSE
Overview
Seasonality
Differencing
Estimation & Model Selection
Prediction
TIME SERIES 2 – AFGHANISTAN
Overview
Differencing
Estimation and Model Selection
MODEL 1
MODEL 2
MODEL 3
CONCLUSION
ANNEXE:
Codes for Toulouse
Codes for Kabul
Introduction
In this assignment we were given time series data on mobile data traffic for two locations, Toulouse and Kabul. The aim was to provide a model that fits the data and could be used to predict the traffic. The data for the two places was provided by TelCap. For Toulouse, we worked with cell421 and found an ARIMA(1,(7)) model with an error of 13.3%, and we could predict 365 days ahead. For Kabul, we worked with cell 279 and found an AR(6) model with an error of 11.1%, and we could predict 20 days ahead. We checked that our model fits quite well for four randomly selected cells of Toulouse; however, for the cells in Afghanistan, since the data there behave more erratically, it is not possible to fit the same model to other cells. All the code for this project was written in SAS.
Theoretical Background
First we consider y_t to be the time series. Now we define the models we have used in our project.
• Stationarity: If neither the mean μ_t nor the autocovariances γ_jt depend on the date t, then the process for y_t is said to be weakly stationary:
o E(y_t) = μ for all t
o γ_jt = cov(y_t, y_{t-j}) = E[(y_t - μ)(y_{t-j} - μ)] = γ(j) for all t and any j.
In practice, we consider a time series stationary based on:
1. whether the chronogram of the time series has a constant mean and constant variance over time,
2. whether the ACF, PACF and IACF [1] plots decrease exponentially,
3. the Augmented Dickey-Fuller unit-root test [2] (a minimal SAS sketch of these checks follows).
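The three practical checks above can be produced in a single IDENTIFY step of PROC ARIMA. This is only a minimal sketch: the dataset work.series and the variable y are placeholder names, not names from the project.

proc arima data=work.series;
/* prints the ACF, PACF and IACF of y and runs Augmented Dickey-Fuller tests at lags 0, 1 and 2 */
identify var=y nlag=30 stationarity=(adf=(0,1,2));
run;
quit;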
• Autocorrelation function (ACF):
We denote by ρ(τ) the autocorrelation function, defined as
ρ(τ) = γ(τ) / γ(0)
where γ(0) = cov(y_t, y_t) = var(y_t) and γ(τ) = cov(y_t, y_{t-τ}).
• Partial autocorrelation function (PACF) and inverse autocorrelation function (IACF):
The PACF and IACF are more involved to define and understand; we refer to several textbooks [3] for the reader who wants the details (see references).
• Lag operator:
We define the lag operator B such that B y_t = y_{t-1}.
• White noise:
A stationary time series ε_t is said to be white noise if cov(ε_t, ε_s) = 0 for all t ≠ s.
[1] Brocklebank J. and Dickey D. (2003). SAS for Forecasting Time Series. United States, pp. 58-78.
[2] Cryer J. and Chan K.S. (2008). Time Series Analysis: With Applications in R. Springer, United States, p. 129.
[3] Aragon Y. (2006). Séries Temporelles appliquées.
In practice, this is verified using the Portmanteau Tests available for testing for autocorrelations in the
residuals of a model: it tests whether any of a group of autocorrelations of the residual time series are
different from zero.
• ARMA(p, q) Model
If the time series is stationary and its ACF, PACF and IACF decrease rapidly, we can fit an ARMA(p, q) model:
Φ_p(B) y_t = Θ_q(B) ε_t
y_t - φ_1 y_{t-1} - ... - φ_p y_{t-p} = θ_0 + ε_t - θ_1 ε_{t-1} - ... - θ_q ε_{t-q}
where the ε_t are white noise, i.e. ε_t ~ WN(0, σ²). This model has two parameters:
- the AR order p, with AR coefficients φ_1, φ_2, ..., φ_p
- the MA order q, with MA coefficients θ_1, θ_2, ..., θ_q
• SARMA(p, q)(P, Q)_s Model
If the time series presents a seasonality of period s, we use:
Φ_P(B^s) Φ_p(B) y_t = c + Θ_Q(B^s) Θ_q(B) ε_t
• ARIMA(p, d, q) Model
Φ_p(B) Δ^d y_t = Θ_q(B) ε_t
where Δ^d y_t = (1 - B)^d y_t follows an ARMA model.
• SARIMA(p, d, q)(P, D, Q)_s Model
We define Θ_Q(B^s) = 1 - b_1 B^s - ... - b_Q B^{Qs} and Φ_P(B^s) = 1 - a_1 B^s - ... - a_P B^{Ps}, and the model is
(1 - B)^d (1 - B^s)^D y_t = [Θ_Q(B^s) Θ_q(B) / (Φ_P(B^s) Φ_p(B))] ε_t
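As a concrete worked instance of this notation (an illustration, not a model fitted to the data): a SARIMA(1,1,0)(0,1,0)_s model has p = d = D = 1 and q = P = Q = 0, so all polynomials except Φ_1(B) = 1 - φ_1 B reduce to 1, and the model is

(1 - φ_1 B)(1 - B)(1 - B^s) y_t = ε_t,

i.e. the series z_t = (1 - B)(1 - B^s) y_t, obtained by one simple difference and one seasonal difference of period s, follows an AR(1): z_t = φ_1 z_{t-1} + ε_t.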
Time Series 1 - Toulouse
Overview
We randomly chose one of the cells for Toulouse (cell 512), which gives us daily data from 6th July 2010 to 31st December 2013, accounting for 1,212 observations. The variable of interest is Traffic_CS. Before building any model on our series, our first step is to see whether this series is stationary. To check this we plot the ACF and PACF. During this, we noticed that a few days are missing in the data, so we have to 'fill' these gaps and have a value for each day of our interval. To do so, we used the SAS procedure proc expand with the spline method [4].
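The gap-filling step is essentially the PROC EXPAND call from the annexe (tel1 is the dataset holding the cell's daily traffic after the outlier removal described below):

proc expand data=tel1 out=tel1 to=day method=spline;
id date;
run;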
So our raw data after filling the gaps looks like this:
Graphic1.1: the original chronological traffic volume of Toulouse Saint-Rome during Jul. 2010 to Jan. 2014
[4] The spline method is just a way to join two spaced points with a segmented function consisting of third-degree (cubic) polynomial functions, so that the whole curve and its first and second derivatives are continuous. The method itself is not very important for what follows; the real need is to fill every gap of the time series. For more explanation, see the book: Bartels R.H., Beatty J.C. and Barsky B.A. (1998). "Hermite and Cubic Spline Interpolation." Ch. 3 in An Introduction to Splines for Use in Computer Graphics and Geometric Modelling. San Francisco, CA: Morgan Kaufmann, pp. 9-17. Or the website: http://mathworld.wolfram.com/CubicSpline.html
From the figure above we see that there are a few observations of the variable Traffic_CS equal to 16. These can be considered atypical and potential outliers. After confirming with Mr. Olivier Rostaing that these values were probably observed due to an equipment failure, they have been deleted and the missing values have been filled by the spline method, in order to have a proper time series to work with. We see that this series displays a non-constant mean and variance; thus the series is not stationary. To stabilise the variance we use a log transformation. From now on, the variables of interest will be
x_t = Traffic_CS_t
y_t = log(x_t)
Converting the series to a stationary one is our first step, but before that we observe other patterns in the series.
Seasonality
From the figure below, we can see that the same pattern is repeated for all three years (see arrows). This tells us that our series displays yearly seasonality.
Graphic1.2: the chronological traffic volume x_t of Toulouse Saint-Rome during Jul. 2010 to Jan. 2014 after removing the outliers
Next, we checked the ACF and the PACF plots of this series and we obtain:
Graphic1.3: The autocorrelation plots, partial autocorrelation plots and inverse autocorrelation plots of the Toulouse Saint-Rome traffic volume x_t
The plots tell us that there is a pattern repeated every 7 days as well (see arrows). Hence there is weekly seasonality along with yearly seasonality in our non-stationary time series.
Differencing
If a time series y_t is non-stationary with a seasonality of period s, then to make y_t stationary we difference it at order s:
Δ_s y_t = (1 - B^s) y_t = y_t - y_{t-s}
If the ACF and PACF of the differenced series decrease rapidly to zero, then we have a stationary series and we can fit an ARMA model to this 'new' time series.
We tried 3 approaches (a DATA step building the retained differences is sketched after this list):
• Difference by 7: the seasonality is removed, but the resulting model is not valid in the end.
• Difference by 365: it does not eliminate the seasonality.
• Difference by 365 and 7: in this case we eliminate the seasonality and we obtain white noise residuals, so we use this method.
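In SAS the two differences can be built explicitly with the DIFn functions, as in the annexe (ltra_CS is the log-transformed traffic):

data tel2;
set tel1;
ltra_CS = log(Traffic_CS);   /* log transformation */
ltra7 = dif7(ltra_CS);       /* difference at lag 7 (weekly seasonality) */
ltra365 = dif365(ltra7);     /* then difference at lag 365 (yearly seasonality) */
run;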
After the log transform and differencing by 365 and 7, the Augmented Dickey-Fuller test confirms that our series is likely to be stationary.
The time series becomes:
Graphic1.4: the chronological traffic volume of Toulouse Saint-Rome (z_t) during Jul. 2010 to Jan. 2014
Here we see that the series has a constant mean and variance. From now on, we are interested in
z_t = (1 - B^7)(1 - B^365) y_t
Because of the seasonality, we want to fit a SARMA model to z_t, and so a SARIMA model to y_t.
Estimation & Model Selection
Next, to see which model would fit our data well, we take a look at the ACF and the PACF plots. We choose an AR(p) model if the PACF is null after lag p and an MA(q) model if the ACF is null after lag q. Below, we see that the ACF is null after lag 7. We want to make clear that the differenced time series does not look strictly stationary; however, we will assume that it is for what follows, since the period is large and the decay can be considered fast enough.
Graphic1.5: The autocorrelation plots, partial autocorrelation plots and inverse autocorrelation plots of the Toulouse Saint-Rome traffic volume (z_t) after log transformation and differencing by 7 and 365
Our autocorrelation plots suggest the use of an MA(7) model:
z_t = μ + ε_t + θ_1 ε_{t-1} + ... + θ_7 ε_{t-7}
where the ε_t are white noise error terms.
Next we fit an MA(7) on our series using proc arima in SAS. However, we do not obtain white noise, as seen from the Autocorrelation Check of Residuals in the SAS output (Portmanteau test). Here we test
H0: 'There is no autocorrelation' against H1: 'There is autocorrelation'.
The p-value (Pr > Khi-2) is very small (usually we compare this value to a 5% risk level), which allows us to reject H0. We can say that there is significant autocorrelation, and so we have to reject the hypothesis of white noise (see table below). Therefore, the MA(7) is not valid.
To help us find a good model, we next use proc arima with the MINIC option, which computes the 'optimal' model according to the AIC or BIC criteria [5].
According to this method the optimal model is an AR(1):
z_t = c + φ z_{t-1} + ε_t,
but again, as we see from the table below, the p-values of the residual autocorrelation check are < .05, hence we reject the null hypothesis that there is no autocorrelation.
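The MINIC selection is requested as an option of the IDENTIFY statement; this is the call used in the annexe (PERROR= sets the range of autoregressive orders used to estimate the error series for the criterion):

proc arima data=tel2;
i var=ltra_CS(7,365) minic perror=(1:11);
run;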
	
  
	
  
Since the two models are not valid, we try several combinations of them. The model ARMA(1, 7) is
z_t = c + φ z_{t-1} + ε_t + θ_1 ε_{t-1} + ... + θ_7 ε_{t-7}
[5] Cryer J. and Chan K.S. (2008). Time Series Analysis: With Applications in R. Springer, United States, pp. 130-131.
Here we get white noise, but we notice that the θ_i, i = 1, ..., 6 are not significant (except i = 5, which is close to the 5% level, so its significance is not very clear), so we delete them.
Next, we notice that the intercept (MU, the 'c' in our formula) is not significant either:
After deleting the constant,
Thus, we have a model with white noise residuals and significant coefficients.
To ensure that this is the best model, we also tried to fit an ARMA(2,(7)) and an ARMA(1,(8)) to the data, but we do not obtain white noise.
Thus, we keep our model ARMA(1,(7)):
z_t = ε_t + φ z_{t-1} + θ ε_{t-7}
And so
(1 - B^7)(1 - B^365) y_t = [Θ_(7)(B) / Φ_1(B)] ε_t,  i.e.  (1 - B^7)(1 - B^365) y_t = [(1 - θ B^7) / (1 - φ B)] ε_t
We have the equation of a SARIMA model. However, the theory defines it with only one seasonality s. Here we have two different seasonalities, s1 = 7 and s2 = 365, and so it is a non-standard SARIMA model:
y_t ~ SARIMA(1, 0, (7))(1, 1, 0)_{7,365}
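For reference, the PROC ARIMA statements from the annexe that fit this model and produce the one-year forecast discussed next are:

proc arima data=tel2;
i var=ltra_CS(7,365) minic perror=(1:11);run;   /* differencing for 7 and 365 */
e p=1 q=(7) noint plot;run;                     /* estimation of the ARMA(1,(7)) without intercept */
f out=previs lead=365 id=date interval=day noprint; run; quit;   /* 365-day forecast stored in dataset previs */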
Prediction
From this model, we can compute some predictions.
We fit our non-standard SARIMA to the y_t variable in order to get the forecast. However, we predicted y_t = log(Traffic_CS). Coming back to x_t = Traffic_CS is simple but requires care: because of the log transformation, we have to use the bias-corrected back-transformation
x_{t,forecast} = exp( y_{t,forecast} + σ²/2 )
where σ² is the variance of the forecast y_{t,forecast}.
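In SAS this is the DATA step from the annexe, where forecast and STD are the forecast and its standard error in the output dataset previs produced by PROC ARIMA:

data previs;
set previs;
Traffic_forecast = exp(forecast + STD*STD/2);   /* back-transformation with the sigma^2/2 correction */
run;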
After the vertical line we see what our model predicts (one year prediction). We can see that the forecast
(in red) has the same shape as the original data (in black).
Graphic1.6: One year forecasting of our whole data set
To check whether our prediction is good or not, we delete a part of the data (here the data from 1st January 2013 to 1st January 2014) and predict it using our model:
Graphic1.7: Check of the forecast on only a part of the data
Then we compare our predicted values with the original ones, which are the true values, and calculate the error rate:
error rate = Mean( |x_t - x_{t,forecast}| / x_t )
Our model has a 13.3% error for the one-year forecast.
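In SAS the error rate is the mean of the daily relative absolute errors over the deleted period, as computed in the annexe:

data difference;
set previbis;
if date > '31DEC2012'd then
Rerror = ABS(Traffic_CS - Traffic_forecast)/Traffic_CS;   /* relative error for each held-out day */
run;
proc means data=difference;
var Rerror;   /* its mean is the reported error rate */
run;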
We fitted this model on 4 randomly selected cells from Toulouse. The model was valid for 3 of the 4 cells selected (cell512, cell521G1, cell451D3); the 4th cell (cell421D1) needed adjustments to the lags.
We tried the same methodology without the weekends to check whether there was any improvement in the model. We did find another model, but its final error was larger than before, so we keep the whole data and our model.
Time Series 2 – Afghanistan
Overview
We chose one of the cells for Afghanistan (cell279) from the data set provided to us, which contains daily data from 25th May 2012 to 22nd August 2014, accounting for 839 observations. The variable of interest is Traffic_CS. The first step, as discussed before, is to check whether our time series is stationary. In order to plot the time series and the ACF and PACF graphs, we have to 'treat' the missing values in our data. Using again the method that SAS provides (spline), we achieve that.
Graphic2.1: the original chronological traffic volume of Afghanistan from May 2012 to Sept 2014
From Graphic2.1 we see that there is no clear trend and that there is a sudden fall in the traffic between 2/4/2013 and 26/6/2013, which is a main reason for the non-stationarity of the time series.
The usual method to forecast this kind of regime change would be Markov chains; however, that is impossible here because we only have one occurrence of the fall. We can propose 3 approaches:
• Model 1: Fit an ARIMA model to the whole data.
• Model 2: Forecast the whole data, but after lifting the fallen part up into the continuity of the data.
• Model 3: Use only the data after the gap (we would have to know why this fall appeared; maybe it won't happen anymore).
Like before, let the variable of interest be
x_t = Traffic_CS_t
Differencing
If we lift the gap, i.e. fill the fall back into the continuity of the series, we see yearly seasonality. And since this gap occurs only once in our data, we can consider it unnatural.
Now, we check the ACF and the PACF plots of the original time series and we obtain the following results:
Graphic2.2: The autocorrelation plots, partial autocorrelation plots and inverse autocorrelation plots of the Afghanistan traffic volume time series
As we can notice from the ACF and PACF plots, they are not decreasing exponentially, providing evidence of a non-stationary time series. So, according to the Augmented Dickey-Fuller Single Mean test obtained from the SAS output at lag 5, it is required to difference our time series.
Thus we difference our time series once. We also take a seasonal difference of order 365, which we will justify later. Hence we have the following time series:
Graphic2.3: the differenced chronological traffic volume of Afghanistan
And the corresponding autocorrelation plots:
Graphic2.4: The autocorrelation plots, partial autocorrelation plots and inverse autocorrelation plots of the Afghanistan traffic volume time series after differencing by 1 and 365
After the first simple difference we get
Δ_1 x_t = (1 - B) x_t = x_t - x_{t-1}
The seasonal difference of order 365 of the Δ_1 x_t series then gives
z_t = (1 - B^365)^1 Δ_1 x_t
where s = 365 and D = 1.
As we can see, the ACF and PACF plots are now rapidly decreasing, so we can assume that our time series is stationary. We confirm this with the Augmented Dickey-Fuller test.
Estimation and Model Selection
MODEL 1
The next step is to fit a model to our time series in order to perform forecasting. For this purpose we use the ACF and PACF plots. We see that the ACF plot is null after lag 1 and the PACF plot is null after lag 6. Thus we first try to fit an MA(1) on this series; however, we do not get white noise. Next we try an AR(6), and this gives us white noise and significant coefficients.
Here we see that for all the estimates the p-values are less than .05, thus they are significant.
The Autocorrelation Check of Residuals gives us:
This is in accordance with the Portmanteau tests for autocorrelation. As the p-values are all greater than .05, the null hypothesis that there is no autocorrelation is not rejected. Thus white noise is obtained.
Our model finally is
z_t = ε_t + φ_1 z_{t-1} + φ_2 z_{t-2} + φ_3 z_{t-3} + φ_4 z_{t-4} + φ_5 z_{t-5} + φ_6 z_{t-6}
which, in terms of x_t (such that z_t = (1 - B^365)^1 Δ_1 x_t), can be written
z_t = (1 - B^365)(1 - B) x_t = [1 / Φ_6(B)] ε_t
Thus x_t follows a SARIMA(6,1,0)(0,1,0)_365.
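For reference, these are the PROC ARIMA steps from the annexe that fit this AR(6) on the differenced series and produce the forecast discussed below:

proc arima data=tel279;
i var=Traffic_CS(1,365) nlag=100 minic perror=(1:11);   /* simple difference and seasonal difference of 365 */
e p=6 noint;                                            /* estimate the AR(6) without intercept */
f out=previs_af279 lead=365 id=date interval=day;       /* forecast for 365 days */
run;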
Prediction
With this pure autoregressive model AR(6), we first tried to forecast one year of traffic values after 22 September 2014 (see graph below).
Graphic2.5: Forecast of the original data of Afghanistan
Then we calculate the error rate as the mean absolute value of the difference between the predictions and the true values, divided by the true values. This amounts to 11.1%.
Now that we have a forecast, we have a 'bigger' time series to observe, and we clearly see the seasonality, with a maximum in July.
MODEL 2
In this model, we lift the gap by 130 and fit a model to this new series (the lifting step is sketched below). The new series looks like this:
Graphic2.6: chronogram of the traffic volume of Afghanistan after lifting the fall
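The lift itself is a one-line adjustment in a DATA step, as in the annexe:

data tel279lift;
set tel279;
if Date > '01APR2013'd and Date < '23JUN2013'd then Traffic_CS = Traffic_CS + 130;   /* lift the fallen segment */
run;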
We difference this series once and by 365 as before. According to the ACF and PACF plots, x_t then fits the following SARIMA(10,1,0)(0,1,0)_365:
(1 - B^365)^1 (1 - B) x_t = [1 / Φ_10(B)] ε_t
To verify this model we check the following:
- The autocorrelation check of residuals: as mentioned earlier, we obtain white noise according to the Portmanteau tests.
- All of the estimates are significant, as seen in the table below.
So x_t fits the model SARIMA(10,1,0)(0,1,0)_365:
z_t = (1 - B^365)^1 (1 - B) x_t = [1 / Φ_10(B)] ε_t
Predictions
We use this model to make predictions for 365 days. Below is the graph of the series.
Further, we calculate the error as explained before and we get 14.6% for 30 days.
MODEL 3
Here we work on the series after the gap and delete the previous data.
Graphic 2.7: Chronogram of the selected data after the gap
To get a stationary series we difference it once, and to account for the seasonality we also difference it by 365 as before. With the same methodology, we fit a model.
To verify this model we check the following:
- The autocorrelation check of residuals confirms white noise according to the test.
- The estimates are significant.
The final model is
(1 - B^365)^1 (1 - B) x_t = [1 / Φ_10(B)] ε_t
So x_t follows a SARIMA(10,1,0)(0,1,0)_365.
Predictions:
We use this model to make predictions for 365 days. Below is the graph of the series.
Further, we calculate the error as explained before and we get 10.98% for 20 days.

So, for the 3 models, we get quite similar SARIMA models:

                                     Original Data               Lifted Data                  Cut Data
Model of x_t                         SARIMA(6,1,0)(0,1,0)_365    SARIMA(10,1,0)(0,1,0)_365    SARIMA(10,1,0)(0,1,0)_365
Model of z_t = (1 - B^365) Δ_1 x_t   ARMA(6,0)                   ARMA(10,0)                   ARMA(10,0)
Error rate (%)                       11.1                        14.6                         10.98
Conclusion
In this assignment, we analysed the telecommunication traffic series of Toulouse and of Afghanistan. Since the traffic series in Toulouse behaves much better than that in Afghanistan, the prediction is effective over a longer horizon: we predicted one year's traffic volume for Toulouse but only twenty days' for Afghanistan. Both traffic series exhibit non-stationarity in this telecommunication traffic modelling study, because the demand patterns influencing the series are not relatively stable; this requires transforming the series, which is generally done by differencing (as we did at lags 7 and 365 for Toulouse, and at lags 1 and 365 for Afghanistan). From our study we can say that modern telecommunication traffic, with its strong correlation characteristics, can be appropriately modelled by time series methods, especially seasonal ARIMA. Evaluating the seasonal ARIMA models developed and finally chosen as the most appropriate in this study showed a fairly high performance with respect to the residuals, which did not show any remaining correlation. To conclude, we strongly recommend using ARIMA models with customised lags for each cell of Toulouse and Afghanistan.
Annexe:
Codes for Toulouse
/* importing the dataset*/
PROC IMPORT OUT=tel DATAFILE= "C:UsersUSERDesktopTSE M2 Eco STatTelcapDonnesTls dataFinaldata.xlsx"
DBMS=xlsx REPLACE;
SHEET="HistoricalTraffic";
GETNAMES=YES;
RUN;
/* we are keeping from the dataset only the variable of our interest (Traffic_CS
and the date)*/
Data tel (keep=date Traffic_CS);
Set tel;
run;
/*we are deleting the potential outliers i.e values for which Traffic_CS is very
small*/
data tel1;
set tel;
if Traffic_CS<50 then delete;
run;
/* proc expand method using spline to fill the gaps in the data(for more
information see the references)*/
Proc expand data=tel1 out=tel1 to=day method=spline plots=TRANSFORMIN;
id date;
run;
/* deleting the data after 31/12/2012 in order to compare with the prediction.
ATTENTION: This is the final code. Initially we did the whole procedure with
Traffic_CS and not with Traffic_CSbis in order to find the ARMA(1,(7)), as follows:
data tel2;
set telp;
ltra_CS=log(Traffic_CS);
ltra7=dif7(ltra_CS);
ltra365=dif365(ltra7);
run;
proc arima data=tel2;
i var=ltra_CS(7,365) minic perror=(1:11);run;
e p=1 q=(7) noint plot;run;
*/
/* we take the log of Traffic_CS and difference it twice: by 7 for the weekly
seasonality and by 365 for the yearly seasonality */
data tel2;
set tel1;
ltra_CS=log(Traffic_CS);
ltra7=dif7(ltra_CS);
ltra365=dif365(ltra7);
run;
/* we are fitting the model ARMA(1,(7)) and in this part we are predicting also for
the following 365 days*/
Proc arima data=tel2;
I var=ltra_CS(7,365) minic perror=(1:11);run;/*differencing for 7 and 365*/
e p=1 q=(7) noint plot;run;/* estimation of ARMA(p, q) without the intercept */
f out=previs lead=365 id=date interval=day noprint; run; quit;/* forecast of
the estimated model for lead=365 days, the outputs are stored in dataset previs */
/* we have taken the log transformation before so now we make the transformation
mentioned in the report */
Data previs;
set previs;
Traffic_forecast=exp(forecast + STD*STD/2);
run;
data previbis;
merge previs tel2;
by date;
run;
/* we are plotting the time series and the prediction*/
Proc gplot data=previbis;
symbol1 v=plus i=join color=black;
symbol2 v=star i=join color=red;
plot Traffic_CS * date=1 Traffic_forecast * date=2 / overlay href='01JAN2013'd;
run;
/*Prediction and calculating the error*/
/* After having settled on our model, we take Traffic_CSbis in order to compute the
error of our prediction as below (the reason why we delete a part of the data has
already been explained in our report). */
Data telp;
set tel1;
Traffic_CSbis = Traffic_CS;
if date > '31DEC2012'd then
Traffic_CSbis = .;
run;
data tel2;
set telp;
ltra_CS=log(Traffic_CSbis);
ltra7=dif7(ltra_CS);
ltra365=dif365(ltra7);
run;
/* we are fitting the model ARMA(1,(7)) and in this part we are predicting also for
the following 365 days*/
Proc arima data=tel2;
I var=ltra_CS(7,365) minic perror=(1:11);run;/*differencing for 7 and 365*/
e p=1 q=(7) noint plot;run;/* estimation of ARMA(p, q) without the intercept */
f out=previs lead=365 id=date interval=day noprint; run; quit;/* forecast of
the estimated model for lead=365 days, the outputs are stored in dataset previs */
/* we have taken the log transformation before so now we make the transformation
mentioned in the report */
Data previs;
set previs;
Traffic_forecast=exp(forecast + STD*STD/2);
run;
data previbis;
merge previs tel2;
by date;
run;
/* we are plotting the time series and the prediction*/
Proc gplot data=previbis;
symbol1 v=plus i=join color=black;
symbol2 v=star i=join color=red;
plot Traffic_CS * date=1 Traffic_forecast * date=2 / overlay href='01JAN2013'd;
run;
/* Then we calculate the error over the held-out period */
data difference;
set previbis;
if date > '31DEC2012'd then do;
error = ABS(Traffic_CS - Traffic_forecast);
Rerror = ABS(Traffic_CS - Traffic_forecast)/Traffic_CS;
Qerror = (Traffic_CS - Traffic_forecast)**2;
end;
run;
proc means data=difference;
var error Rerror Qerror;
run;
	
  
Codes for Kabul
/* import the data */
proc import out=kblc279_4 datafile= "C:Usersdamien1991DesktopM2Stat Consultingkblc279_4"
dbms=xlsx replace;
sheet="Historical Traffic";
getnames=yes;
run;
/* only keep date and Traffic_CS */
data tel279 (keep=date Traffic_CS);
set kblc279_4;
run;
/*fill the gaps with spline method */
proc expand data=tel279 out=tel279 to=day method=spline plots=TRANSFORMIN;
id date;
run;
proc arima data=tel279;
i var=Traffic_CS(1,365) nlag=100 minic perror=(1:11);run;
e p=6 noint;run; /* estimate the model AR(6) */
f out=previs_af279 lead=365 id=date interval=day ; run; /* forecast for leads =
365 days */
data previs_af;
set previs_af279;
Traffic_forecast=forecast;
run;
/* merge forecasted and real data */
data previbis_af;
merge previs_af tel279;
by date;
run;
/* plot forecast */
proc gplot data=previbis_af;
symbol1 v=plus i=join color=black;
symbol2 v=star i=join color=red;
plot Traffic_CS * date=1 Traffic_forecast * date=2/overlay href='12SEP2014'd;
run;
/********************** calculate the error *****************************/
data tel279_bis;
set tel279;
if Date>'02SEP2014'd then delete;
run;
proc arima data=tel279_bis;
i var=Traffic_CS(1,365) nlag=100 minic perror=(1:11);run;
e p=6 noint;run;
f out=previs_af279 lead=20 id=date interval=day ; run;
data previs_af;
set previs_af279;
Traffic_forecast=forecast;
run;
data previbis_af;
merge previs_af tel279;
by date;
run;
proc gplot data=previbis_af;
symbol1 v=plus i=join color=black;
symbol2 v=star i=join color=red;
plot Traffic_CS * date=1 Traffic_forecast * date=2/overlay href='02SEP2014'd;
run;
data difference;
set previbis_af;
if date>'02SEP2014'd then do;
error = ABS(Traffic_CS - Traffic_forecast);
Rerror = ABS(Traffic_CS - Traffic_forecast)/Traffic_CS;
Qerror = (Traffic_CS - Traffic_forecast)**2;
end;
run;
proc means data=difference;
var error Rerror Qerror;
run;
Afghanistan Lift: we try to fix the fall by lifting it.
/* import the data */
proc import out=kblc279_4 datafile= "C:Usersdamien1991DesktopM2Stat Consultingkblc279_4"
dbms=xlsx replace;
sheet="Historical Traffic";
getnames=yes;
run;
/*keep the variables of interest date and Traffic_CS */
data tel279 (keep=date Traffic_CS);
set kblc279_4;
run;
/* fill the gaps with spline method */
proc expand data=tel279 out=tel279 to=day method=spline plots=TRANSFORMIN;
id date;
run;
/* lift the fall */
data tel279lift;
set tel279;
if Date>'01APR2013'd and Date<'23JUN2013'd then Traffic_CS=Traffic_CS+130;
run;
proc arima data=tel279lift;
i var=Traffic_CS(1,365) minic perror=(1:11);run; /* identification of the model for
Traffic_CS simple diff and seasonal diff 365 */
e p=10 noint;run; /* estimation of the model AR(p=10) */
f out=previs_af279 lead=365 id=date interval=day ; run; /*forecast for leads =
365 days */
data previs_af279;
set previs_af279;
Traffic_forecast=forecast;
run;
/* merge of forecasted and original data */
data previbis_af;
merge previs_af279 tel279lift;
by date;
run;
/* plot of the forecast */
proc gplot data=previbis_af;
symbol1 v=plus i=join color=black;
symbol2 v=star i=join color=red;
plot Traffic_CS * date=1 Traffic_forecast * date=2/overlay href='22SEP2014'd;
run;
/********************** calculate the error *****************************/
data tel279lift_bis;
set tel279lift;
if Date>'22AUG2014'd then delete;
run;
proc arima data=tel279lift_bis;
i var=Traffic_CS(1,365) minic perror=(1:11);run; /* identification of the model for
Traffic_CS simple diff and seasonal diff 365 */
e p=10 noint;run; /* estimation of the model AR(p=10) */
f out=previs_af279 lead=365 id=date interval=day ; run; /*forecast for leads =
365 days */
data previs_af279;
set previs_af279;
Traffic_forecast=forecast;
run;
/* merge of forecasted and original data */
data previbis_af;
merge previs_af279 tel279lift;
by date;
run;
/* plot of the forecast */
proc gplot data=previbis_af;
symbol1 v=plus i=join color=black;
symbol2 v=star i=join color=red;
plot Traffic_CS * date=1 Traffic_forecast * date=2/overlay href='22AUG2014'd;
run;
/* computation of the error */
data difference;
set previbis_af;
if date>'22AUG2014'd then do;
error = ABS(Traffic_CS - Traffic_forecast);
Rerror = ABS(Traffic_CS - Traffic_forecast)/Traffic_CS;
Qerror = (Traffic_CS - Traffic_forecast)**2;
end;
run;
proc means data=difference;
var error Rerror Qerror;
run;
Afghanistan cut: we keep only the data after the fall.
/* import the data */
proc import out=kblc279_4 datafile= "C:Usersdamien1991DesktopM2Stat Consultingkblc279_4"
dbms=xlsx replace;
sheet="Historical Traffic";
getnames=yes;
run;
/* only keep date and Traffic_CS */
data tel279 (keep=date Traffic_CS);
set kblc279_4;
run;
/*fill the gaps with spline method */
proc expand data=tel279 out=tel279 to=day method=spline plots=TRANSFORMIN;
id date;
run;
/*select the data after the fall */
data tel279cut;
set tel279;
if Date<'26JUN2013'd then delete;
run;
proc arima data=tel279cut;
i var=Traffic_CS(1,365) nlag=100 minic perror=(1:11);run;
e p=10 noint;run; /* estimate the model AR(10) */
f out=previs_af279 lead=365 id=date interval=day ; run; /* forecast for leads =
365 days */
data previs_af;
set previs_af279;
Traffic_forecast=forecast;
run;
/* merge forecasted and real data */
data previbis_af;
merge previs_af tel279cut;
by date;
run;
/* plot forecast */
proc gplot data=previbis_af;
symbol1 v=plus i=join color=black;
symbol2 v=star i=join color=red;
plot Traffic_CS * date=1 Traffic_forecast * date=2/overlay href='12SEP2014'd;
run;
/********************** calculate the error *****************************/
data tel279cut_bis;
set tel279cut;
if Date>'02SEP2014'd then delete;
run;
proc arima data=tel279cut_bis;
i var=Traffic_CS(1,365) nlag=100 minic perror=(1:11);run;
e p=10 noint;run;
f out=previs_af279 lead=20 id=date interval=day ; run;
data previs_af;
set previs_af279;
Traffic_forecast=forecast;
run;
data previbis_af;
merge previs_af tel279cut;
by date;
run;
proc gplot data=previbis_af;
symbol1 v=plus i=join color=black;
symbol2 v=star i=join color=red;
plot Traffic_CS * date=1 Traffic_forecast * date=2/overlay href='02SEP2014'd;
run;
data difference;
set previbis_af;
if date>'02SEP2014'd then do;
error = ABS(Traffic_CS - Traffic_forecast);
Rerror = ABS(Traffic_CS - Traffic_forecast)/Traffic_CS;
Qerror = (Traffic_CS - Traffic_forecast)**2;
end;
run;
proc means data=difference;
var error Rerror Qerror;
run;

More Related Content

What's hot

QR Factorizations and SVDs for Tall-and-skinny Matrices in MapReduce Architec...
QR Factorizations and SVDs for Tall-and-skinny Matrices in MapReduce Architec...QR Factorizations and SVDs for Tall-and-skinny Matrices in MapReduce Architec...
QR Factorizations and SVDs for Tall-and-skinny Matrices in MapReduce Architec...Austin Benson
 
Simply typed lambda-calculus modulo isomorphisms
Simply typed lambda-calculus modulo isomorphismsSimply typed lambda-calculus modulo isomorphisms
Simply typed lambda-calculus modulo isomorphismsAlejandro Díaz-Caro
 
“Solving QCD: from BG/P to BG/Q”. Prof. Dr. Attilio Cucchieri – IFSC/USP.
“Solving QCD: from BG/P to BG/Q”. Prof. Dr. Attilio Cucchieri – IFSC/USP.“Solving QCD: from BG/P to BG/Q”. Prof. Dr. Attilio Cucchieri – IFSC/USP.
“Solving QCD: from BG/P to BG/Q”. Prof. Dr. Attilio Cucchieri – IFSC/USP.lccausp
 
Chapter 2 laplace transform
Chapter 2 laplace transformChapter 2 laplace transform
Chapter 2 laplace transformLenchoDuguma
 
Maximum likelihood estimation for generalized autoregressive score models - A...
Maximum likelihood estimation for generalized autoregressive score models - A...Maximum likelihood estimation for generalized autoregressive score models - A...
Maximum likelihood estimation for generalized autoregressive score models - A...SYRTO Project
 
Lecture 3 tangent & velocity problems
Lecture 3   tangent & velocity problemsLecture 3   tangent & velocity problems
Lecture 3 tangent & velocity problemsnjit-ronbrown
 
Jif 315 lesson 1 Laplace and fourier transform
Jif 315 lesson 1 Laplace and fourier transformJif 315 lesson 1 Laplace and fourier transform
Jif 315 lesson 1 Laplace and fourier transformKurenai Ryu
 
laplace transform and inverse laplace, properties, Inverse Laplace Calculatio...
laplace transform and inverse laplace, properties, Inverse Laplace Calculatio...laplace transform and inverse laplace, properties, Inverse Laplace Calculatio...
laplace transform and inverse laplace, properties, Inverse Laplace Calculatio...Waqas Afzal
 

What's hot (19)

QR Factorizations and SVDs for Tall-and-skinny Matrices in MapReduce Architec...
QR Factorizations and SVDs for Tall-and-skinny Matrices in MapReduce Architec...QR Factorizations and SVDs for Tall-and-skinny Matrices in MapReduce Architec...
QR Factorizations and SVDs for Tall-and-skinny Matrices in MapReduce Architec...
 
Simply typed lambda-calculus modulo isomorphisms
Simply typed lambda-calculus modulo isomorphismsSimply typed lambda-calculus modulo isomorphisms
Simply typed lambda-calculus modulo isomorphisms
 
“Solving QCD: from BG/P to BG/Q”. Prof. Dr. Attilio Cucchieri – IFSC/USP.
“Solving QCD: from BG/P to BG/Q”. Prof. Dr. Attilio Cucchieri – IFSC/USP.“Solving QCD: from BG/P to BG/Q”. Prof. Dr. Attilio Cucchieri – IFSC/USP.
“Solving QCD: from BG/P to BG/Q”. Prof. Dr. Attilio Cucchieri – IFSC/USP.
 
25 String Matching
25 String Matching25 String Matching
25 String Matching
 
Inverse Laplace Transform
Inverse Laplace TransformInverse Laplace Transform
Inverse Laplace Transform
 
Laplace transform
Laplace transformLaplace transform
Laplace transform
 
Chapter 2 laplace transform
Chapter 2 laplace transformChapter 2 laplace transform
Chapter 2 laplace transform
 
Maximum likelihood estimation for generalized autoregressive score models - A...
Maximum likelihood estimation for generalized autoregressive score models - A...Maximum likelihood estimation for generalized autoregressive score models - A...
Maximum likelihood estimation for generalized autoregressive score models - A...
 
CMSI計算科学技術特論C (2015) ALPS と量子多体問題①
CMSI計算科学技術特論C (2015) ALPS と量子多体問題①CMSI計算科学技術特論C (2015) ALPS と量子多体問題①
CMSI計算科学技術特論C (2015) ALPS と量子多体問題①
 
Laplace Transform
Laplace TransformLaplace Transform
Laplace Transform
 
Isomorphism
IsomorphismIsomorphism
Isomorphism
 
smtlecture.10
smtlecture.10smtlecture.10
smtlecture.10
 
smtlecture.3
smtlecture.3smtlecture.3
smtlecture.3
 
Laplace transform
Laplace transformLaplace transform
Laplace transform
 
Lecture 3 tangent & velocity problems
Lecture 3   tangent & velocity problemsLecture 3   tangent & velocity problems
Lecture 3 tangent & velocity problems
 
Convex hull
Convex hullConvex hull
Convex hull
 
Jif 315 lesson 1 Laplace and fourier transform
Jif 315 lesson 1 Laplace and fourier transformJif 315 lesson 1 Laplace and fourier transform
Jif 315 lesson 1 Laplace and fourier transform
 
laplace transform and inverse laplace, properties, Inverse Laplace Calculatio...
laplace transform and inverse laplace, properties, Inverse Laplace Calculatio...laplace transform and inverse laplace, properties, Inverse Laplace Calculatio...
laplace transform and inverse laplace, properties, Inverse Laplace Calculatio...
 
convex hull
convex hullconvex hull
convex hull
 

Viewers also liked

IES Vall d'Alba - Correllengua - 1r i 2n d'ESO
IES Vall d'Alba - Correllengua - 1r i 2n d'ESOIES Vall d'Alba - Correllengua - 1r i 2n d'ESO
IES Vall d'Alba - Correllengua - 1r i 2n d'ESOiesvalldalba
 
ROLE & RESPONSIBILITY - ARJUN
ROLE & RESPONSIBILITY - ARJUNROLE & RESPONSIBILITY - ARJUN
ROLE & RESPONSIBILITY - ARJUNARJUN SHARMA
 
Helping High Achievers Find the Magic Within
Helping High Achievers Find the Magic WithinHelping High Achievers Find the Magic Within
Helping High Achievers Find the Magic WithinRebekah Black
 
Fuente de poder
Fuente de poderFuente de poder
Fuente de poderaurachacon
 
Present. cuanti el diablo viste a la moda
Present. cuanti el diablo viste a la modaPresent. cuanti el diablo viste a la moda
Present. cuanti el diablo viste a la modaanalygm
 
Gerencia de proyecto
Gerencia de proyectoGerencia de proyecto
Gerencia de proyectoLeyner Yesid
 
Arquitectura computacional
Arquitectura computacionalArquitectura computacional
Arquitectura computacionalmarisolalvarez30
 
Creacion de Blogger.
Creacion de Blogger.Creacion de Blogger.
Creacion de Blogger.KarenLorenaOh
 
Presentación1 erika
Presentación1 erikaPresentación1 erika
Presentación1 erikaEk Hdez
 

Viewers also liked (9)

IES Vall d'Alba - Correllengua - 1r i 2n d'ESO
IES Vall d'Alba - Correllengua - 1r i 2n d'ESOIES Vall d'Alba - Correllengua - 1r i 2n d'ESO
IES Vall d'Alba - Correllengua - 1r i 2n d'ESO
 
ROLE & RESPONSIBILITY - ARJUN
ROLE & RESPONSIBILITY - ARJUNROLE & RESPONSIBILITY - ARJUN
ROLE & RESPONSIBILITY - ARJUN
 
Helping High Achievers Find the Magic Within
Helping High Achievers Find the Magic WithinHelping High Achievers Find the Magic Within
Helping High Achievers Find the Magic Within
 
Fuente de poder
Fuente de poderFuente de poder
Fuente de poder
 
Present. cuanti el diablo viste a la moda
Present. cuanti el diablo viste a la modaPresent. cuanti el diablo viste a la moda
Present. cuanti el diablo viste a la moda
 
Gerencia de proyecto
Gerencia de proyectoGerencia de proyecto
Gerencia de proyecto
 
Arquitectura computacional
Arquitectura computacionalArquitectura computacional
Arquitectura computacional
 
Creacion de Blogger.
Creacion de Blogger.Creacion de Blogger.
Creacion de Blogger.
 
Presentación1 erika
Presentación1 erikaPresentación1 erika
Presentación1 erika
 

Similar to Statitical consulting project report

Investigation of Parameter Behaviors in Stationarity of Autoregressive and Mo...
Investigation of Parameter Behaviors in Stationarity of Autoregressive and Mo...Investigation of Parameter Behaviors in Stationarity of Autoregressive and Mo...
Investigation of Parameter Behaviors in Stationarity of Autoregressive and Mo...BRNSS Publication Hub
 
Time Series Analysis - 2 | Time Series in R | ARIMA Model Forecasting | Data ...
Time Series Analysis - 2 | Time Series in R | ARIMA Model Forecasting | Data ...Time Series Analysis - 2 | Time Series in R | ARIMA Model Forecasting | Data ...
Time Series Analysis - 2 | Time Series in R | ARIMA Model Forecasting | Data ...Simplilearn
 
Time Series Analysis with R
Time Series Analysis with RTime Series Analysis with R
Time Series Analysis with RARCHIT GUPTA
 
Brock Butlett Time Series-Great Lakes
Brock Butlett Time Series-Great Lakes Brock Butlett Time Series-Great Lakes
Brock Butlett Time Series-Great Lakes Brock Butlett
 
11.[1 11]a seasonal arima model for nigerian gross domestic product
11.[1 11]a seasonal arima model for nigerian gross domestic product11.[1 11]a seasonal arima model for nigerian gross domestic product
11.[1 11]a seasonal arima model for nigerian gross domestic productAlexander Decker
 
Byungchul Yea (Project)
Byungchul Yea (Project)Byungchul Yea (Project)
Byungchul Yea (Project)Byung Chul Yea
 
Project time series ppt
Project time series pptProject time series ppt
Project time series pptamar patil
 
Different Models Used In Time Series - InsideAIML
Different Models Used In Time Series - InsideAIMLDifferent Models Used In Time Series - InsideAIML
Different Models Used In Time Series - InsideAIMLVijaySharma802
 
R language Project report
R language Project reportR language Project report
R language Project reportTianyue Wang
 
Time series modelling arima-arch
Time series modelling  arima-archTime series modelling  arima-arch
Time series modelling arima-archjeevan solaskar
 
Long Memory presentation to SURF
Long Memory presentation to SURFLong Memory presentation to SURF
Long Memory presentation to SURFRichard Hunt
 
Forecasting_CO2_Emissions.pptx
Forecasting_CO2_Emissions.pptxForecasting_CO2_Emissions.pptx
Forecasting_CO2_Emissions.pptxMOINDALVS
 

Similar to Statitical consulting project report (20)

Investigation of Parameter Behaviors in Stationarity of Autoregressive and Mo...
Investigation of Parameter Behaviors in Stationarity of Autoregressive and Mo...Investigation of Parameter Behaviors in Stationarity of Autoregressive and Mo...
Investigation of Parameter Behaviors in Stationarity of Autoregressive and Mo...
 
04_AJMS_288_20.pdf
04_AJMS_288_20.pdf04_AJMS_288_20.pdf
04_AJMS_288_20.pdf
 
Time Series Analysis - 2 | Time Series in R | ARIMA Model Forecasting | Data ...
Time Series Analysis - 2 | Time Series in R | ARIMA Model Forecasting | Data ...Time Series Analysis - 2 | Time Series in R | ARIMA Model Forecasting | Data ...
Time Series Analysis - 2 | Time Series in R | ARIMA Model Forecasting | Data ...
 
Seasonal ARIMA
Seasonal ARIMASeasonal ARIMA
Seasonal ARIMA
 
Time Series Analysis with R
Time Series Analysis with RTime Series Analysis with R
Time Series Analysis with R
 
Brock Butlett Time Series-Great Lakes
Brock Butlett Time Series-Great Lakes Brock Butlett Time Series-Great Lakes
Brock Butlett Time Series-Great Lakes
 
11.[1 11]a seasonal arima model for nigerian gross domestic product
11.[1 11]a seasonal arima model for nigerian gross domestic product11.[1 11]a seasonal arima model for nigerian gross domestic product
11.[1 11]a seasonal arima model for nigerian gross domestic product
 
Byungchul Yea (Project)
Byungchul Yea (Project)Byungchul Yea (Project)
Byungchul Yea (Project)
 
Project time series ppt
Project time series pptProject time series ppt
Project time series ppt
 
Different Models Used In Time Series - InsideAIML
Different Models Used In Time Series - InsideAIMLDifferent Models Used In Time Series - InsideAIML
Different Models Used In Time Series - InsideAIML
 
ACF.ppt
ACF.pptACF.ppt
ACF.ppt
 
R language Project report
R language Project reportR language Project report
R language Project report
 
ARIMA Models - [Lab 3]
ARIMA Models - [Lab 3]ARIMA Models - [Lab 3]
ARIMA Models - [Lab 3]
 
Time series modelling arima-arch
Time series modelling  arima-archTime series modelling  arima-arch
Time series modelling arima-arch
 
Writing Sample 1
Writing Sample 1Writing Sample 1
Writing Sample 1
 
4267
42674267
4267
 
4267
42674267
4267
 
Long Memory presentation to SURF
Long Memory presentation to SURFLong Memory presentation to SURF
Long Memory presentation to SURF
 
20120140503019
2012014050301920120140503019
20120140503019
 
Forecasting_CO2_Emissions.pptx
Forecasting_CO2_Emissions.pptxForecasting_CO2_Emissions.pptx
Forecasting_CO2_Emissions.pptx
 

Statitical consulting project report

  • 1. 1     Statistical Consulting for TELCAP M2 Statistics and Econometrics by Orestis Ampeliotis Sayli Javadekar Zhao-qiu Luo Damien Quesada    
  • 2. 2     ACKNOLWDGEMENT We would like to thank TelCap for providing us with the necessary logistic requirements to complete this assignment. We would also like to thank Prof Daouia and Prof Orozco for their guidance without which we would not be able to complete this assignment.        
  • 3. 3     TABLE OF CONTENTS ACKNOLWDGEMENT   2   INTRODUCTION   4   THEORETICAL BACKGROUND   5   TIME SERIES 1 - TOULOUSE   7   Overview   7   Seasonality   8   Differencing   9   Estimation  &Model  Selection   10   Prediction   15   TIME SERIES 2 – AFGHANISTAN   17   Overview   17   Differencing   18   Estimation  and  Model  Selection   21   MODEL  1   21   MODEL  2   23   MODEL  3   26   CONCLUSION   29   ANNEXE:   30   Codes  for  Toulouse   30   Codes  for  Kabul   33        
  • 4. 4     Introduction In this assignment we were given time series data for mobile data traffic for two locations, Toulouse and Kabul. The aim was to provide a model that fits the data and could be utilised to predict the traffic. The data for the two places was provided by TelCap. For Toulouse, we worked with cell421and found a ARIMA (1,(7)) with an error of 13.3% and we could predict for 365 days. Whereas for Kabul, we worked with cell 279 and found a AR (6) with an error of 11.1% and we could predict for 20 days. We tested that our model fits quite well for four randomly selected cells of Toulouse, however for the cells in Afghanistan as the data in Afghanistan behaves more erratic it’s not possible to fit the same model in other cells. We have programmed all the codes in SAS for this project.    
  • 5. 5     Theoretical Background First we consider 𝑦! to be the time series. Now we define the models we have used in our project. • Stationarity: If neither the mean 𝜇!nor the autocovariances 𝛾!"depend on the date t, then the process for 𝑦! is said to be weakly stationary: o E(𝑦!)= μ for all t o 𝛾!" = 𝑐𝑜𝑣 𝑦!, 𝑦!!! =  E(𝑦! − 𝜇)(𝑦!!! −  𝜇)=𝛾(𝑗) for all t and any j. In practice, we consider a time series stationary based on : 1. The chronogram of the time series has a constant mean and constant variance over time 2. If the ACF,PACF and IACF1 plots are decreasing exponentially 3. The Augmented Dickey-Fuller Unit-Root test2 • Autocorrelation function (ACF): We denote 𝜌(𝜏) as autocorrelation function, it defined as 𝜌 𝜏 = 𝛾 𝜏 𝛾 0 where  𝛾 0 =  𝑐𝑜𝑣 𝑦!, 𝑦! = 𝑣𝑎𝑟  (𝑦!) and 𝛾 𝜏 = 𝑐𝑜𝑣 𝑦!, 𝑦!!! • Partial autocorrelation function (PACF) and Inverse autocorrelation function (IACF) PACF and IACF are complicated to define and understand. We referenced several test books3 for you to read it if necessary. (see references) • Lag operator: We define the lag operator B such that 𝐵𝑦! = 𝑦!!! • White noise: A stationary time series 𝜀! is said white noise if cov(𝜀!, 𝜀!) = 0 for all t ≠  s                                                                                                                 1 Brocklebank  J.  and  Dickey  D.(2003).  SAS  for  Forecasting  Time  Series,  United  States,  pp.58-­‐78.   2 Cryer.  J.  and  Chan.  KS.  (2008).  Time  series  analysis  :  with  application  in  R.  Springer,  United  States,  pp.  129.     3  Yves  ARAGON(2006).  Séries  Temporelles  appliquées.  
  • 6. 6     In practice, this is verified using the Portmanteau Tests available for testing for autocorrelations in the residuals of a model: it tests whether any of a group of autocorrelations of the residual time series are different from zero. • ARMA(p, q) Model If the time series is stationary and the ACF, PACF and IACF decrease rapidly Φ! 𝐵 𝑦! =  Θ! 𝐵 𝜀! 𝑦! −   𝜙! 𝑦!!! − ⋯ −   𝜙! 𝑦!!! =   𝜃! +   𝜀! −   𝜃! 𝜀!!! − ⋯ −   𝜃! 𝜀!!! Where 𝜀!are White Noise i.e. 𝜀!~𝑊𝑁 0, 𝜎! . This model has two parameters: Ø order of AR is p with coefficients AR:𝜙!, 𝜙!, 𝜙!, . . . , 𝜙! Ø order of MA is q with coefficients MA:𝜃!, 𝜃!, 𝜃!, … , 𝜃! • SARMA(p, q)(P,Q)s Model If the time series presents a seasonality of period s, we use : Φ! 𝐵! Φ! 𝐵 𝑦! = 𝑐 +  Θ! 𝐵! Θ! 𝐵 𝜀! • ARIMA(p,d,q) Model Φ! 𝐵 Δ! 𝑦! =  Θ! 𝐵 𝜀! Where Δ! 𝑦! = (1 − 𝐵)! 𝑦! and follows an ARMA model • SARIMA(p,d,q)(P, D, Q)S Model We defineΘ! 𝐵! = 1 − 𝑏! 𝐵! −. . . −𝑏! 𝐵!" and Φ! 𝐵! = 1 − 𝑎! 𝐵! −. . . −𝑎! 𝐵!" 1 − 𝐵 ! 1 −   𝐵! ! 𝑦! =   Θ! 𝐵! Θ! 𝐵 Φ! 𝐵! Φ! 𝐵 𝜀!
  • 7. 7     Time Series 1 - Toulouse Overview We have chosen randomly one of the cells for Toulouse (cell 512) which has given us the daily data from 6th July 2010 to 31st December 2013, accounting for 1,212 observations. The variable of interest is Traffic_CS. Before building any model on our series, our first step is to see if this series is stationary. Thus to check for this we try to plot the ACF and PACF plots. During this, we noticed that in the data, a few days are missing, thus we have to ‘fill’ these gaps and have a value for each day of our interval. To do so, we used a SAS procedure proc expand with the method spline4 . So our raw data after filling the gaps looks like this: Graphic1.1 : the original chronological traffic volume of Toulouse Saint-Rome during Jul.2010 to Jan.2014                                                                                                                   4 The spline method is just a way to join together two spaced points thanks to a segmented function consisting of third-degree (cubic) polynomial functions, so that the whole curve and its first and second derivatives are continuous. The methods should not be very important for the following. The real need is to fill every gap of the time series. For more explanations, see the book: Bartels, R. H.; Beatty, J. C.; and Barsky, B. A. "Hermite and Cubic Spline Interpolation." Ch. 3 in An Introduction to Splines for Use in Computer Graphics and Geometric Modelling. San Francisco, CA: Morgan Kaufmann, pp. 9-17, 1998. Or the website: http://mathworld.wolfram.com/CubicSpline.html    
  • 8. 8     From the figure above we see that, there are a few observations of the variable Traffic_CS equal to 16. These can be considered atypical and potential outliers. Confirming with Mr. Olivier Rostaing, that these have been observed probably due to a failure of the equipment, hence they have been deleted and the missing values have been filled by the “spline method” in order to have a proper time series to work with. We see that this series displays a non-constant mean and variance. Thus the series is not stationary. To stabilize the variance we use a log transformation. From now on, the variable of interest will be 𝑥! = 𝑇𝑟𝑎𝑓𝑓𝑖𝑐_𝐶𝑆! 𝑦! = log  (𝑥!) To convert it to a stationary series would be our first step, but before that we observe other patterns in the series.   Seasonality From the figure below, we can see that the same pattern is repeated for all the three years. (see arrows) This tells us that our series displays yearly seasonality. Graphic1.2:   the   chronological   traffic   volume𝑥!  of   Toulouse   Saint-­‐Rome   during     Jul.2010   to   Jan.2014   after   removing   the   outliers Next, we checked the ACF and the PACF plots of this series and we obtain,
  • 9. 9       Graphic1.3  :   The   autocorrelation   plots,   partial   autocorrelation   plots   and   inverse   autocorrelation   plots   of     Toulouse   Saint-­‐ Rome    traffic  volume  𝑥!   The plots tell us that there is a pattern repeated every 7 days as well (see arrows). Hence there is weekly seasonality along with yearly seasonality in our non stationary time series. Differencing If there is a non-stationary time series yt, and a seasonality of period s, then to make yt stationary, we difference with the order s: ∆! 𝑦! = 1 − 𝐵! 𝑦! =   𝑦! − 𝑦!!! If the ACF and PACF decrease rapidly to null, then it means that we have a stationary series and now we can fit an ARMA model to this ‘new’ time series. We try 3 methods: • Difference by 7: the seasonality is removed, but the model is not valid at the end. • Difference by 365: it does not eliminate the seasonality. • Difference by 365 and 7: in this case we eliminate the seasonality and we get a white noise, so we use this method.
  • 10. 10     After the log transform and differencing by 365 and 7 the Augmented Dickey-Fuller test confirms that our series is likely to be stationnary     The time series becomes: Graphic1.4:  the  chronological  traffic  volume  of  Toulouse  Saint-­‐Rome  (𝑧!)during  Jul.2010  to  Jan.2014   Here we see that the series has a constant mean and variance. From now, we are interested in 𝑧! = (1 − 𝐵! ) (1 −   𝐵!"# )𝑦! Because of the seasonality, we want to fit a SARMA model to 𝑧! and so a SARIMA to 𝑦! Estimation &Model Selection Next, to see which model would fit well for our data, we take a look at the ACF and the PACF plots. We have to choose a model AR (p) if the PACF is null after rank p and a MA (q) if the ACF is null after rank q. Below, we see that the ACF is null after lag 7. We want to make clear that the differenciated time series doesn’t look strictly stationary, however, we will assume it is for the following, the period being large, we could consider this decaying fast enough
  • 11. 11     Graphic1.5  The  autocorrelation  plots,  partial  autocorrelation  plots  and  inverse  autocorrelation  plots  of  Toulouse  Saint-­‐Rome  (𝑧!)traffic   volume  after  log  transformation  and  differencing  for  7  and  365   Our autocorrelation plots suggests the use of a model MA (7): 𝑧! =  𝜇 +   𝜀! +   𝜃! 𝜀!!! ! !!! Where all εt are a white noise term error. Next we fit a MA (7) on our series using proc arima in SAS. However we do not obtain a white noise which is seen from the Autocorrelation Checks of residuals in the SAS output (Portmanteau Test). Here, we test H0 = ‘There is no autocorrelation’ against H1 = ‘There is autocorrelation’ The p-value (Pr> Khi-2) is very small (usually we compare this value to a 5% risk level), which allows us to reject H0. We can say that there is a significant autocorrelation, and so, we have to reject the hypothesis of White Noise. (see table below). Therefore, MA (7) is not valid.
  • 12. 12       To help us get a good model, we next use the proc arima with the minic method, method which computes the ‘optimal’ model according to the AIC or BIC5 creteria. According to this method the optimal model is an AR (1) : 𝑧! = 𝑐 +  𝜑𝑧!!! +   𝜀!, but again as we see from the table below, the p-values are <.05 hence we reject the null hypothesis that there is no autocorrelation.     Since the two models are not valid, we try several combinations of them. The model ARMA (1, 7) 𝑧! = 𝑐 +   𝜀! +  𝜑 ∙ 𝑧!!! +   𝜃! 𝜀!_! ! !!!                                                                                                                 5 Cryer.  J.  and  Chan.  KS.  (2008).  Time  series  analysis  :  with  application  in  R.  Springer,  United  States,  pp.  130-­‐131.  
  • 13. 13     Here we get white noise, but we noticed that the θi, i = 1, … 6 are not significant (except i=5, but it’s close to 5%, its significance is not very obvious), so we delete them. Next, we noticed that the intercept (MU, ‘c’ in our formula) is not significant: After deleting the constant,
  • 14. 14     Thus, we have a model with white noise and significant coefficients. To ensure that this is the best model, we tried to fit a model ARMA(2,(7)) and model ARMA(1,(8)) to the data, but we do not get the white noise. Thus, we keep our model ARMA(1, (7)). 𝒛𝒕 =   𝜺𝒕 +  𝝋𝒛𝒕!𝟏 +  𝜽𝜺𝒕!𝟕 And so 1 − 𝐵! 1 −   𝐵!"# 𝑦! = Θ ! (𝐵) Φ!(𝐵) 𝜀! ↔     1 − 𝐵! 1 −   𝐵!"# 𝑦! =   1 −  𝜃  𝐵! 1 −  𝜙𝐵 𝜀! We have the equation of a SARIMA model. However, the theory defines it by only one seasonality s. Here we have two different seasonalities s1=7 and s2=365 and so it is a non standard SARIMA model. 𝑦!~  𝑆𝐴𝑅𝐼𝑀𝐴(1,0, (7))(1, 1,0)!,!"#
  • 15. 15     Prediction From this model, we can compute some predictions. We fit our non-standard SARIMA to the 𝑦! variable in order to get the forecast. However, we predicted 𝑦! =  log (Traffic_CS). Coming back to 𝑥! =  Traffic_CS is easy but not trivial. We have to use the transformation: 𝑥!,!"#$%&'( =   𝑒!!,!"#$%&'(!   !! !   where σ² is the variance of the forecast  𝑦!,!"#$%&'( After the vertical line we see what our model predicts (one year prediction). We can see that the forecast (in red) has the same shape as the original data (in black). Graphic1.6: One year forecasting of our whole data set
  • 16. 16     To check whether our prediction is good or not, we delete a part of data (here data deleted is from 1st January 2013 to 1st January 2014) and predict them by using our model : Graphic1.7: Check of the forecast only on a part of the data Then we compare our predicted values with the original ones which are the true values and calculate the error rate. 𝑒𝑟𝑟𝑜𝑟  𝑟𝑎𝑡𝑒 =  𝑀𝑒𝑎𝑛( 𝑥! −   𝑥!!"#$%&'( 𝑥! ) Our model has a 13.3% error for the one year forecast. We fitted this model on 4 randomly selected cells from Toulouse. The model was valid for the 3 out of the 4 cells selected (cell512, cell521G1, cell451D3), the 4th cell (cell421D1) needed adjustments with the lag. We tried the same methodology without the weekends to check if there were any improvements in the model. Here we succeeded in finding another model, but the final error was bigger than before so we keep the whole data and our model.
Time Series 2 – Afghanistan

Overview

We choose one of the Afghanistan cells (cell279) from the data set that has been provided to us, which contains daily data from 25th May 2012 to 22nd August 2014, accounting for 839 observations. The variable of interest is Traffic_CS. The first step, as before, is to check whether our time series is stationary. In order to plot the time series and the ACF and PACF graphs we first have to treat the missing values in the data, which we again do with the spline method provided by SAS (proc expand).

Graphic2.1: The original chronological traffic volume of Afghanistan from May 2012 to September 2014.

From Graphic2.1 we see that there is no clear trend, but there is a sudden fall in the traffic between 2/4/2013 and 26/6/2013, which is a main source of non-stationarity. The usual tool for forecasting such regime changes is a Markov chain model, but that is impossible here because we only observe one occurrence of the fall. We therefore propose 3 approaches:

• Model 1: fit an ARIMA model to the whole data set.
• Model 2: forecast the whole data set, but after shifting the fallen part back up into the continuity of the data.
• Model 3: use only the data just after the gap (we would need to know why this fall appeared; perhaps it will not happen again).

As before, let the variable of interest be $x_t = \text{Traffic\_CS}_t$.

Differencing

If we lift the gap, a yearly seasonality becomes apparent; and since this gap occurs only once in our data, we can consider it unnatural. Now we check the ACF and PACF plots of the original time series and obtain the following results:

Graphic2.2: The autocorrelation, partial autocorrelation and inverse autocorrelation plots of the Afghanistan traffic volume series.

As the ACF and PACF plots are not decreasing exponentially, they provide evidence of a non-stationary time series. Accordingly, the Augmented Dickey-Fuller Single Mean test in the SAS output (for lag 5) indicates that our time series needs to be differenced.
Thus we take a simple (lag-1) difference of our time series. We also take a seasonal difference at lag 365, which we justify later. Hence we have the following time series:

Graphic2.3: The differenced chronological traffic volume of Afghanistan.

And the corresponding autocorrelation plots:
Graphic2.4: The autocorrelation, partial autocorrelation and inverse autocorrelation plots of the Afghanistan traffic volume series after differencing at lags 1 and 365.

The first simple difference gives

$$\Delta_1 x_t = (1 - B)\, x_t = x_t - x_{t-1}$$

and the seasonal difference at lag 365 of the $\Delta_1 x_t$ series gives

$$z_t = (1 - B^{365})^1\, \Delta_1 x_t, \qquad s = 365,\; D = 1.$$

As we can see, the ACF and PACF plots are now rapidly decreasing, so we can assume that our time series is stationary. We confirm this with the Augmented Dickey-Fuller test.
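Both differences can be requested directly in the identify statement of PROC ARIMA; a minimal sketch follows, using the dataset tel279 from the annexe. The stationarity option shown here is our assumption about how the Dickey-Fuller output in the report was produced (with 5 augmenting lags):

proc arima data=tel279;
   i var=Traffic_CS(1,365) stationarity=(adf=(5));   /* simple and seasonal (365) differences, plus ADF tests */
   run;
quit;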
Estimation and Model Selection

MODEL 1

The next step is to fit a model to our time series in order to perform forecasting. For this purpose we use the ACF and PACF plots. We see that the ACF is null after lag 1 and the PACF is null after lag 6. We therefore first try to fit an MA(1) on this series, but we do not obtain white-noise residuals. Next we try an AR(6), and this gives us white noise and significant coefficients: for all the estimates the p-values are below .05, so they are significant. The Autocorrelation Check of Residuals gives us:
This is in accordance with the portmanteau test for residual autocorrelation: as the p-values are all greater than .05, the null hypothesis of no autocorrelation is not rejected, so white noise is obtained. Our final model is

$$z_t = \varepsilon_t + \varphi_1 z_{t-1} + \varphi_2 z_{t-2} + \varphi_3 z_{t-3} + \varphi_4 z_{t-4} + \varphi_5 z_{t-5} + \varphi_6 z_{t-6}$$

which, in operator form (with $z_t = (1 - B^{365})^1\, \Delta_1 x_t$), can be written

$$z_t = (1 - B^{365})^1 (1 - B)\, x_t = \frac{\Theta_0(B)}{\Phi_6(B)}\, \varepsilon_t$$

Thus $x_t$ follows a SARIMA(6,1,0)(0,1,0)$_{365}$.

Prediction

With this pure autoregressive model AR(6), we first tried to forecast one year of traffic values after 22 September 2014 (see the graph below; the corresponding PROC ARIMA calls are sketched after the graphic).

Graphic2.5: Forecast of the original data of Afghanistan.
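The estimation and forecasting calls, following the annexe program (dataset tel279, one-year forecast horizon):

proc arima data=tel279;
   i var=Traffic_CS(1,365) nlag=100 minic perror=(1:11);   /* identification on the differenced series */
   e p=6 noint;                                             /* AR(6) without intercept */
   f out=previs_af279 lead=365 id=date interval=day;        /* one-year forecast */
   run;
quit;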
We then calculate the error rate as the mean absolute difference between the predictions and the true values, divided by the true values. This amounts to 11.1%. Now that we have this forecast, we have a "longer" time series to observe, and we clearly see the seasonality, with a maximum in July.

MODEL 2

In this model we lift the gap by 130 and fit a model to this new series, so our new series looks as below (the data step used for the lift is sketched after the model equation).

Graphic2.6: Chronogram of the traffic volume of Afghanistan after lifting the fall.

We take a simple difference and a seasonal difference at lag 365 as before and, according to the ACF and PACF plots, $x_t$ then fits the following SARIMA(10,1,0)(0,1,0)$_{365}$:

$$(1 - B^{365})^1 (1 - B)\, x_t = \frac{\Theta_0(B)}{\Phi_{10}(B)}\, \varepsilon_t$$
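The lift itself is a simple data step, as in the annexe program (the value 130 and the dates of the fall come from that program):

data tel279lift;
   set tel279;
   if Date > '01APR2013'd and Date < '23JUN2013'd then Traffic_CS = Traffic_CS + 130;   /* shift the fallen segment up */
run;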
To verify this model we check the information below.

- The Autocorrelation Check of Residuals gives us the table below; as mentioned earlier, we obtain white noise according to the portmanteau test.
- All of the estimates are significant, as seen in the table below.
$x_t$ thus fits the model SARIMA(10,1,0)(0,1,0)$_{365}$:

$$z_t = (1 - B^{365})^1 (1 - B)\, x_t = \frac{\Theta_0(B)}{\Phi_{10}(B)}\, \varepsilon_t$$

Predictions

We use this model to make predictions for 365 days; the graph of the series is shown below. We then calculate the error as explained before and obtain 14.6% over a 30-day horizon.
MODEL 3

Here we work on the series after the gap and delete the earlier data (the corresponding data step is sketched after the model below).

Graphic2.7: Chronogram of the selected data after the gap.

To obtain a stationary series we take a simple difference and, to account for the seasonality, a seasonal difference at lag 365 as before. With the same methodology we fit a model. To verify this model we check the information below.

- The Autocorrelation Check of Residuals confirms the white noise according to the test.
- The estimates are all significant.

The final model is

$$(1 - B^{365})^1 (1 - B)\, x_t = \frac{\Theta_0(B)}{\Phi_{10}(B)}\, \varepsilon_t$$

so $x_t$ follows a SARIMA(10,1,0)(0,1,0)$_{365}$.
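For reference, selecting the post-gap data is a one-line data step, as in the annexe program:

data tel279cut;
   set tel279;
   if Date < '26JUN2013'd then delete;   /* keep only the observations after the fall */
run;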
Predictions

We use this model to make predictions for 365 days; the graph of the series is shown below. We then calculate the error as explained before and obtain 10.98% over a 20-day horizon.

So, for the 3 models, we get quite similar SARIMA specifications:

                                          Original Data               Lifted Data                  Cut Data
Model of x_t                              SARIMA(6,1,0)(0,1,0)_365    SARIMA(10,1,0)(0,1,0)_365    SARIMA(10,1,0)(0,1,0)_365
Model of z_t = (1 - B^365) Delta_1 x_t    ARMA(6,0)                   ARMA(10,0)                   ARMA(10,0)
Error rate (%)                            11.1                        14.6                         10.98
Conclusion

In this assignment we analysed the telecommunication traffic series of Toulouse and of Afghanistan. Since the Toulouse series behaves much better than the Afghanistan one, its prediction is effective over a longer horizon: we predicted one year of traffic volume for Toulouse but only twenty days for Afghanistan. Both traffic series are non-stationary, because the demand patterns driving them are not stable, so the series had to be transformed, which is generally done by differencing (here at lags 7 and 365 for Toulouse, and a simple difference together with a seasonal difference at lag 365 for Afghanistan). From our study we can say that modern telecommunication traffic, with its strong correlation structure, can be appropriately modelled by time series methods, especially seasonal ARIMA. The seasonal ARIMA models developed and finally chosen as the most appropriate in this study showed fairly good performance, with residuals free of any remaining correlation. To conclude, we strongly recommend using ARIMA models with customised lags for each cell of Toulouse and Afghanistan.
Annexe:

Codes for Toulouse

/* importing the dataset */
PROC IMPORT OUT=tel DATAFILE= "C:UsersUSERDesktopTSE M2 Eco STatTelcapDonnesTls dataFinaldata.xlsx"
   DBMS=xlsx REPLACE;
   SHEET="HistoricalTraffic";
   GETNAMES=YES;
RUN;

/* we keep from the dataset only the variables of interest (Traffic_CS and the date) */
Data tel (keep=date Traffic_CS);
   Set tel;
run;

/* we delete the potential outliers, i.e. values for which Traffic_CS is very small */
data tel1;
   set tel;
   if Traffic_CS < 50 then delete;
run;

/* proc expand with the spline method to fill the gaps in the data (for more information see the references) */
Proc expand data=tel1 out=tel1 to=day method=spline plots=TRANSFORMIN;
   id date;
run;

/* deleting the data after 31/12/2012 in order to compare with the prediction */

ATTENTION: this is the final code. Initially we did the whole procedure with Traffic_CS and not with Traffic_CSbis in order to find the ARMA(1,(7)), as follows:

data tel2;
   set telp;
   ltra_CS=log(Traffic_CS);
   ltra7=dif7(ltra_CS);
   ltra365=dif365(ltra7);
run;
proc arima data=tel2;
   i var=ltra_CS(7,365) minic perror=(1:11); run;
   e p=1 q=(7) noint plot; run;
/* we take the log of Traffic_CS and difference it twice, with respect to the weekly and the yearly seasonality */
data tel2;
   set tel1;
   ltra_CS=log(Traffic_CS);
   ltra7=dif7(ltra_CS);
   ltra365=dif365(ltra7);
run;

/* we fit the model ARMA(1,(7)) and in this part we also predict the following 365 days */
Proc arima data=tel2;
   I var=ltra_CS(7,365) minic perror=(1:11); run;   /* differencing for 7 and 365 */
   e p=1 q=(7) noint plot; run;                     /* estimation of ARMA(p, q) without the intercept */
   f out=previs lead=365 id=date interval=day noprint; run;
quit;   /* forecast of the estimated model for lead=365 days; the outputs are stored in dataset previs */

/* we took the log transformation before, so now we apply the back-transformation mentioned in the report */
Data previs;
   set previs;
   Traffic_forecast=exp(forecast + STD*STD/2);
run;

data previbis;
   merge previs tel2;
   by date;
run;

/* we plot the time series and the prediction */
Proc gplot data=previbis;
   symbol1 v=plus i=join color=black;
   symbol2 v=star i=join color=red;
   plot Traffic_CS * date=1 Traffic_forecast * date=2 / overlay href='01JAN2013'd;
run;

/* Prediction and calculating the error */
/* After having concluded on our model, we use Traffic_CSbis in order to find the error of our prediction, as below
   (the reason why we delete a part of the data has already been explained in the report) */
Data telp;
   set tel1;
   Traffic_CSbis = Traffic_CS;
   if date > '31DEC2012'd then Traffic_CSbis = .;
run;
data tel2;
   set telp;
   ltra_CS=log(Traffic_CSbis);
   ltra7=dif7(ltra_CS);
   ltra365=dif365(ltra7);
run;

/* we fit the model ARMA(1,(7)) and in this part we also predict the following 365 days */
Proc arima data=tel2;
   I var=ltra_CS(7,365) minic perror=(1:11); run;   /* differencing for 7 and 365 */
   e p=1 q=(7) noint plot; run;                     /* estimation of ARMA(p, q) without the intercept */
   f out=previs lead=365 id=date interval=day noprint; run;
quit;   /* forecast of the estimated model for lead=365 days; the outputs are stored in dataset previs */

/* we took the log transformation before, so now we apply the back-transformation mentioned in the report */
Data previs;
   set previs;
   Traffic_forecast=exp(forecast + STD*STD/2);
run;

data previbis;
   merge previs tel2;
   by date;
run;

/* we plot the time series and the prediction */
Proc gplot data=previbis;
   symbol1 v=plus i=join color=black;
   symbol2 v=star i=join color=red;
   plot Traffic_CS * date=1 Traffic_forecast * date=2 / overlay href='01JAN2013'd;
run;

/* then we calculate the error */
data difference;
   set previbis;
   if date > '31DEC2012'd then do;
      error = ABS(Traffic_CS - Traffic_forecast);
      Rerror = ABS(Traffic_CS - Traffic_forecast)/Traffic_CS;
      Qerror = (Traffic_CS - Traffic_forecast)**2;
   end;
run;
proc means data=difference;
   var error Rerror Qerror;
run;
Codes for Kabul

/* import the data */
proc import out=kblc279_4 datafile= "C:Usersdamien1991DesktopM2Stat Consultingkblc279_4"
   dbms=xlsx replace;
   sheet="Historical Traffic";
   getnames=yes;
run;

/* only keep date and Traffic_CS */
data tel279 (keep=date Traffic_CS);
   set kblc279_4;
run;

/* fill the gaps with the spline method */
proc expand data=tel279 out=tel279 to=day method=spline plots=TRANSFORMIN;
   id date;
run;

proc arima data=tel279;
   i var=Traffic_CS(1,365) nlag=100 minic perror=(1:11); run;
   e p=6 noint; run;   /* estimate the model AR(6) */
   f out=previs_af279 lead=365 id=date interval=day; run;   /* forecast for lead = 365 days */

data previs_af;
   set previs_af279;
   Traffic_forecast=forecast;
run;

/* merge forecasted and real data */
data previbis_af;
   merge previs_af tel279;
   by date;
run;

/* plot forecast */
proc gplot data=previbis_af;
   symbol1 v=plus i=join color=black;
   symbol2 v=star i=join color=red;
   plot Traffic_CS * date=1 Traffic_forecast * date=2 / overlay href='12SEP2014'd;
run;

/********************** calculate the error *****************************/
data tel279_bis;
   set tel279;
   if Date>'02SEP2014'd then delete;
run;

proc arima data=tel279_bis;
   i var=Traffic_CS(1,365) nlag=100 minic perror=(1:11); run;
   e p=6 noint; run;
   f out=previs_af279 lead=20 id=date interval=day; run;

data previs_af;
   set previs_af279;
   Traffic_forecast=forecast;
run;

data previbis_af;
   merge previs_af tel279;
   by date;
run;

proc gplot data=previbis_af;
   symbol1 v=plus i=join color=black;
   symbol2 v=star i=join color=red;
   plot Traffic_CS * date=1 Traffic_forecast * date=2 / overlay href='02SEP2014'd;
run;

data difference;
   set previbis_af;
   if date>'02SEP2014'd then do;
      error = ABS(Traffic_CS - Traffic_forecast);
      Rerror = ABS(Traffic_CS - Traffic_forecast)/Traffic_CS;
      Qerror = (Traffic_CS - Traffic_forecast)**2;
   end;
run;

proc means data=difference;
   var error Rerror Qerror;
run;

Afghanistan Lift: we try to fix the fall by lifting it.

/* import the data */
proc import out=kblc279_4 datafile= "C:Usersdamien1991DesktopM2Stat Consultingkblc279_4"
   dbms=xlsx replace;
   sheet="Historical Traffic";
   getnames=yes;
run;

/* keep the variables of interest, date and Traffic_CS */
data tel279 (keep=date Traffic_CS);
   set kblc279_4;
run;

/* fill the gaps with the spline method */
proc expand data=tel279 out=tel279 to=day method=spline plots=TRANSFORMIN;
   id date;
run;

/* lift the fall */
data tel279lift;
   set tel279;
   if Date>'01APR2013'd and Date<'23JUN2013'd then Traffic_CS=Traffic_CS+130;
run;

proc arima data=tel279lift;
   i var=Traffic_CS(1,365) minic perror=(1:11); run;   /* identification for Traffic_CS, simple diff and seasonal diff 365 */
   e p=10 noint; run;                                  /* estimation of the model AR(p=10) */
   f out=previs_af279 lead=365 id=date interval=day; run;   /* forecast for lead = 365 days */

data previs_af279;
   set previs_af279;
   Traffic_forecast=forecast;
run;

/* merge forecasted and original data */
data previbis_af;
   merge previs_af279 tel279lift;
   by date;
run;

/* plot of the forecast */
proc gplot data=previbis_af;
   symbol1 v=plus i=join color=black;
   symbol2 v=star i=join color=red;
   plot Traffic_CS * date=1 Traffic_forecast * date=2 / overlay href='22SEP2014'd;
run;

/********************** calculate the error *****************************/
data tel279lift_bis;
   set tel279lift;
   if Date>'22AUG2014'd then delete;
run;

proc arima data=tel279lift_bis;
   i var=Traffic_CS(1,365) minic perror=(1:11); run;   /* identification for Traffic_CS, simple diff and seasonal diff 365 */
   e p=10 noint; run;                                  /* estimation of the model AR(p=10) */
   f out=previs_af279 lead=365 id=date interval=day; run;   /* forecast for lead = 365 days */

data previs_af279;
   set previs_af279;
   Traffic_forecast=forecast;
run;

/* merge forecasted and original data */
data previbis_af;
   merge previs_af279 tel279lift;
   by date;
run;
/* plot of the forecast */
proc gplot data=previbis_af;
   symbol1 v=plus i=join color=black;
   symbol2 v=star i=join color=red;
   plot Traffic_CS * date=1 Traffic_forecast * date=2 / overlay href='22AUG2014'd;
run;

/* computation of the error */
data difference;
   set previbis_af;
   if date>'22AUG2014'd then do;
      error = ABS(Traffic_CS - Traffic_forecast);
      Rerror = ABS(Traffic_CS - Traffic_forecast)/Traffic_CS;
      Qerror = (Traffic_CS - Traffic_forecast)**2;
   end;
run;

proc means data=difference;
   var error Rerror Qerror;
run;

Afghanistan cut: we keep the data after the fall.

/* import the data */
proc import out=kblc279_4 datafile= "C:Usersdamien1991DesktopM2Stat Consultingkblc279_4"
   dbms=xlsx replace;
   sheet="Historical Traffic";
   getnames=yes;
run;

/* only keep date and Traffic_CS */
data tel279 (keep=date Traffic_CS);
   set kblc279_4;
run;

/* fill the gaps with the spline method */
proc expand data=tel279 out=tel279 to=day method=spline plots=TRANSFORMIN;
   id date;
run;

/* select the data after the fall */
data tel279cut;
   set tel279;
   if Date<'26JUN2013'd then delete;
run;

proc arima data=tel279cut;
   i var=Traffic_CS(1,365) nlag=100 minic perror=(1:11); run;
   e p=10 noint; run;   /* estimate the model AR(10) */
   f out=previs_af279 lead=365 id=date interval=day; run;   /* forecast for lead = 365 days */
data previs_af;
   set previs_af279;
   Traffic_forecast=forecast;
run;

/* merge forecasted and real data */
data previbis_af;
   merge previs_af tel279cut;
   by date;
run;

/* plot forecast */
proc gplot data=previbis_af;
   symbol1 v=plus i=join color=black;
   symbol2 v=star i=join color=red;
   plot Traffic_CS * date=1 Traffic_forecast * date=2 / overlay href='12SEP2014'd;
run;

/********************** calculate the error *****************************/
data tel279cut_bis;
   set tel279cut;
   if Date>'02SEP2014'd then delete;
run;

proc arima data=tel279cut_bis;
   i var=Traffic_CS(1,365) nlag=100 minic perror=(1:11); run;
   e p=10 noint; run;
   f out=previs_af279 lead=20 id=date interval=day; run;

data previs_af;
   set previs_af279;
   Traffic_forecast=forecast;
run;

data previbis_af;
   merge previs_af tel279cut;
   by date;
run;

proc gplot data=previbis_af;
   symbol1 v=plus i=join color=black;
   symbol2 v=star i=join color=red;
   plot Traffic_CS * date=1 Traffic_forecast * date=2 / overlay href='02SEP2014'd;
run;

data difference;
   set previbis_af;
   if date>'02SEP2014'd then do;
      error = ABS(Traffic_CS - Traffic_forecast);
      Rerror = ABS(Traffic_CS - Traffic_forecast)/Traffic_CS;
      Qerror = (Traffic_CS - Traffic_forecast)**2;
   end;
run;

proc means data=difference;
   var error Rerror Qerror;