Statistical Consulting for TELCAP
M2 Statistics and Econometrics
by
Orestis Ampeliotis Sayli Javadekar Zhao-qiu Luo Damien Quesada
ACKNOWLEDGEMENT
We would like to thank TelCap for providing us with the logistical support necessary to complete this assignment. We would also like to thank Prof Daouia and Prof Orozco for their guidance, without which we would not have been able to complete this assignment.
TABLE OF CONTENTS
ACKNOWLEDGEMENT
INTRODUCTION
THEORETICAL BACKGROUND
TIME SERIES 1 - TOULOUSE
  Overview
  Seasonality
  Differencing
  Estimation & Model Selection
  Prediction
TIME SERIES 2 – AFGHANISTAN
  Overview
  Differencing
  Estimation and Model Selection
    MODEL 1
    MODEL 2
    MODEL 3
CONCLUSION
ANNEXE
  Codes for Toulouse
  Codes for Kabul
Introduction
In this assignment we were given time series of mobile data traffic for two locations, Toulouse and Kabul. The aim was to provide a model that fits the data and can be used to predict the traffic. The data for both locations was provided by TelCap. For Toulouse, we worked with cell 421 and found an ARMA(1,(7)) model on the differenced log series, with an error of 13.3% on a 365-day forecast. For Kabul, we worked with cell 279 and found an AR(6) model on the differenced series, with an error of 11.1% on a 20-day forecast. We tested the Toulouse model on four randomly selected cells of Toulouse, where it fits quite well. In Afghanistan, however, the data behaves more erratically, so the same model cannot be fitted to other cells. All the code for this project was written in SAS.
Theoretical Background
First, let $y_t$ denote the time series. We now define the notions and models used in our project.
• Stationarity: If neither the mean $\mu_t$ nor the autocovariances $\gamma_{jt}$ depend on the date $t$, then the process for $y_t$ is said to be weakly stationary:
o $E(y_t) = \mu$ for all $t$
o $\gamma_{jt} = \mathrm{cov}(y_t, y_{t-j}) = E[(y_t - \mu)(y_{t-j} - \mu)] = \gamma(j)$ for all $t$ and any $j$.
In practice, we consider a time series stationary if:
1. The chronogram of the time series has a constant mean and a constant variance over time
2. The ACF, PACF and IACF¹ plots decrease exponentially
3. The Augmented Dickey-Fuller unit-root test² rejects the presence of a unit root
• Autocorrelation function (ACF):
We denote the autocorrelation function by $\rho(\tau)$; it is defined as
$$\rho(\tau) = \frac{\gamma(\tau)}{\gamma(0)}$$
where $\gamma(0) = \mathrm{cov}(y_t, y_t) = \mathrm{var}(y_t)$ and $\gamma(\tau) = \mathrm{cov}(y_t, y_{t-\tau})$.
• Partial autocorrelation function (PACF) and Inverse autocorrelation function (IACF)
The PACF and IACF are more complicated to define and understand. We refer the reader to several textbooks³ if necessary (see references).
• Lag operator:
We define the lag operator $B$ such that $B y_t = y_{t-1}$.
• White noise:
A stationary time series $\varepsilon_t$ is said to be white noise if $\mathrm{cov}(\varepsilon_t, \varepsilon_s) = 0$ for all $t \neq s$.
¹ Brocklebank J. and Dickey D. (2003). SAS for Forecasting Time Series, United States, pp. 58-78.
² Cryer J. and Chan K.S. (2008). Time series analysis: with application in R. Springer, United States, p. 129.
³ Yves Aragon (2006). Séries temporelles appliquées.
In practice, this is verified with the portmanteau tests for autocorrelation in the residuals of a model: they test whether any of a group of autocorrelations of the residual time series is different from zero.
• ARMA(p, q) Model
If the time series is stationary and the ACF, PACF and IACF decrease rapidly, we use:
$$\Phi_p(B)\, y_t = \Theta_q(B)\, \varepsilon_t$$
$$y_t - \phi_1 y_{t-1} - \cdots - \phi_p y_{t-p} = \theta_0 + \varepsilon_t - \theta_1 \varepsilon_{t-1} - \cdots - \theta_q \varepsilon_{t-q}$$
where the $\varepsilon_t$ are white noise, i.e. $\varepsilon_t \sim WN(0, \sigma^2)$. This model has two parameters:
o the order of the AR part is $p$, with AR coefficients $\phi_1, \phi_2, \phi_3, \ldots, \phi_p$
o the order of the MA part is $q$, with MA coefficients $\theta_1, \theta_2, \theta_3, \ldots, \theta_q$
• SARMA(p, q)(P, Q)s Model
If the time series presents a seasonality of period $s$, we use:
$$\Phi_P(B^s)\, \Phi_p(B)\, y_t = c + \Theta_Q(B^s)\, \Theta_q(B)\, \varepsilon_t$$
• ARIMA(p, d, q) Model
$$\Phi_p(B)\, \Delta^d y_t = \Theta_q(B)\, \varepsilon_t$$
where $\Delta^d y_t = (1 - B)^d y_t$ follows an ARMA model.
• SARIMA(p, d, q)(P, D, Q)s Model
We define $\Theta_Q(B^s) = 1 - b_1 B^s - \ldots - b_Q B^{sQ}$ and $\Phi_P(B^s) = 1 - a_1 B^s - \ldots - a_P B^{sP}$. Then
$$(1 - B)^d (1 - B^s)^D\, y_t = \frac{\Theta_Q(B^s)\, \Theta_q(B)}{\Phi_P(B^s)\, \Phi_p(B)}\, \varepsilon_t$$
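As an illustration of the practical checks listed above (chronogram, ACF/PACF/IACF plots, Augmented Dickey-Fuller test, portmanteau test), the following minimal sketch simulates a stationary AR(1) series and inspects it with proc arima. The data set name sim, the coefficient 0.6, the seed and the lag choices are assumptions made for this example only; none of them come from the TelCap data.

```sas
/* Hypothetical example: simulate 500 points of y_t = 0.6*y_{t-1} + e_t */
data sim;
  call streaminit(12345);          /* fixed seed so the run is reproducible */
  y = 0;
  do t = 1 to 500;
    y = 0.6*y + rand('normal');    /* e_t drawn from N(0,1) */
    output;
  end;
run;

/* IDENTIFY prints the ACF, PACF and IACF and the autocorrelation check
   for white noise; STATIONARITY= adds Augmented Dickey-Fuller tests
   for the listed numbers of lagged differences */
proc arima data=sim;
  identify var=y nlag=24 stationarity=(adf=(0,1,2));
run;
quit;
```

For a series of this kind one would expect an ACF that decays exponentially and a PACF that cuts off after lag 1, which is the pattern used in this report to recognise an AR(p) model.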
Time Series 1 - Toulouse
Overview
We randomly chose one of the Toulouse cells (cell 512), which gives us daily data from 6th July 2010 to 31st December 2013, accounting for 1,212 observations. The variable of interest is Traffic_CS. Before building any model on our series, our first step is to check whether the series is stationary, which we do with the ACF and PACF plots. While doing so, we noticed that a few days are missing from the data, so we have to fill these gaps and obtain a value for each day of our interval. To do so, we used the SAS procedure proc expand with the spline method⁴.
So our raw data after filling the gaps looks like this:
Graphic1.1 : the original chronological traffic volume of Toulouse Saint-Rome during Jul.2010 to Jan.2014
⁴ The spline method joins two spaced points with a piecewise function made of third-degree (cubic) polynomials, such that the whole curve and its first and second derivatives are continuous. The choice of method is not very important for what follows; the real need is to fill every gap in the time series. For more explanations, see the book:
Bartels, R. H.; Beatty, J. C.; and Barsky, B. A. "Hermite and Cubic Spline Interpolation." Ch. 3 in An Introduction to Splines for Use in Computer Graphics and Geometric Modelling. San Francisco, CA: Morgan Kaufmann, pp. 9-17, 1998.
Or the website:
http://mathworld.wolfram.com/CubicSpline.html
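The gap-filling step can be tried on a toy series before applying it to real data. The sketch below is only an illustration: the data set toy, its six values and the output name toy_filled are invented for the example, and 04JAN2013 and 05JAN2013 are deliberately absent.

```sas
/* Hypothetical daily series in which two calendar days are missing */
data toy;
  input date :date9. Traffic_CS;
  format date date9.;
  datalines;
01JAN2013 120
02JAN2013 135
03JAN2013 128
06JAN2013 150
07JAN2013 142
08JAN2013 138
;
run;

/* to=day puts the series on a daily grid; method=spline fits a cubic
   spline through the observed points and evaluates it on the new days */
proc expand data=toy out=toy_filled to=day method=spline;
  id date;
  convert Traffic_CS;
run;

proc print data=toy_filled;
run;
```

The printed data set should contain one row per calendar day, including interpolated values for the two missing dates.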
From the figure above we see that a few observations of the variable Traffic_CS are equal to 16. These can be considered atypical and potential outliers. Mr. Olivier Rostaing confirmed that these values were probably caused by an equipment failure; hence they have been deleted and the resulting missing values filled with the spline method, in order to have a proper time series to work with. We see that this series displays a non-constant mean and variance, so the series is not stationary. To stabilize the variance we use a log transformation. From now on, the variables of interest are
$$x_t = \text{Traffic\_CS}_t$$
$$y_t = \log(x_t)$$
Our first step would be to convert the series into a stationary one, but before that we look at other patterns in the series.
Seasonality
From the figure below, we can see that the same pattern is repeated in each of the three years (see arrows). This tells us that our series displays yearly seasonality.
Graphic1.2: the chronological traffic volume $x_t$ of Toulouse Saint-Rome during Jul.2010 to Jan.2014 after removing the outliers
Next, we checked the ACF and PACF plots of this series and obtained:
Graphic1.3: the autocorrelation, partial autocorrelation and inverse autocorrelation plots of the Toulouse Saint-Rome traffic volume $x_t$
The plots tell us that there is also a pattern repeated every 7 days (see arrows). Hence there is weekly seasonality along with yearly seasonality in our non-stationary time series.
Differencing
If a non-stationary time series $y_t$ has a seasonality of period $s$, then to make $y_t$ stationary we take a seasonal difference of order $s$:
$$\Delta_s y_t = (1 - B^s)\, y_t = y_t - y_{t-s}$$
If the ACF and PACF decrease rapidly to zero, we have a stationary series and can fit an ARMA model to this 'new' time series.
We try 3 methods:
• Difference by 7: the seasonality is removed, but the model is not valid at the end.
• Difference by 365: it does not eliminate the seasonality.
• Difference by 365 and 7: in this case we eliminate the seasonality and we get a white noise, so
we use this method.
After the log transform and differencing by 365 and 7, the Augmented Dickey-Fuller test confirms that our series is likely to be stationary.
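The three differencing schemes compared above can be inspected side by side in a single proc arima call. This is a sketch only: it assumes the data set tel2 and the log variable ltra_CS built in the annex, and the values chosen for nlag and for the ADF lags are arbitrary illustrations, not the settings used for the report.

```sas
proc arima data=tel2;
  /* weekly difference only */
  identify var=ltra_CS(7)     nlag=50 stationarity=(adf=(0,1,2));
  /* yearly difference only */
  identify var=ltra_CS(365)   nlag=50 stationarity=(adf=(0,1,2));
  /* weekly and yearly differences combined, the scheme retained here */
  identify var=ltra_CS(7,365) nlag=50 stationarity=(adf=(0,1,2));
run;
quit;
```

Each identify statement prints the correlation plots and the unit-root tests for the series after the differencing given in parentheses, so the three outputs can be compared directly.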
The time series becomes:
Graphic1.4: the chronological traffic volume of Toulouse Saint-Rome ($z_t$) during Jul.2010 to Jan.2014
Here we see that the series has a constant mean and variance. From now on, we are interested in
$$z_t = (1 - B^{7})(1 - B^{365})\, y_t$$
Because of the seasonality, we want to fit a SARMA model to $z_t$, and hence a SARIMA model to $y_t$.
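Expanding the product of the two difference operators makes explicit which past values enter $z_t$. The expansion below uses only the definition of the lag operator $B$ given in the theoretical background.

```latex
\[
z_t = (1 - B^{7})(1 - B^{365})\, y_t
    = \left(1 - B^{7} - B^{365} + B^{372}\right) y_t
    = y_t - y_{t-7} - y_{t-365} + y_{t-372}
\]
```

One consequence is that $z_t$ can only be computed once 372 earlier values of $y_t$ are available, so the first 372 days of the filled series are lost to differencing.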
Estimation & Model Selection
Next, to see which model would fit our data well, we look at the ACF and PACF plots. We choose an AR(p) model if the PACF is zero after lag p, and an MA(q) model if the ACF is zero after lag q. Below, we see that the ACF is zero after lag 7. We want to make clear that the differenced time series does not look strictly stationary; however, since the period is large, we consider that the autocorrelations decay fast enough and assume stationarity in what follows.
Graphic1.5: the autocorrelation, partial autocorrelation and inverse autocorrelation plots of the Toulouse Saint-Rome traffic volume ($z_t$) after log transformation and differencing by 7 and 365
Our autocorrelation plots suggest the use of an MA(7) model:
$$z_t = \mu + \varepsilon_t - \sum_{i=1}^{7} \theta_i \varepsilon_{t-i}$$
where the $\varepsilon_t$ are white-noise error terms.
Next we fit an MA(7) to our series using proc arima in SAS. However, we do not obtain white noise, as seen from the Autocorrelation Check of Residuals in the SAS output (portmanteau test). Here, we test
H0 = 'There is no autocorrelation' against H1 = 'There is autocorrelation'
The p-value (Pr > Khi-2) is very small (we compare this value to a 5% risk level), which allows us to reject H0. There is significant autocorrelation, so we have to reject the white-noise hypothesis (see table below). Therefore, MA(7) is not valid.
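For reference, a portmanteau statistic of the Ljung-Box type has the form below. The report does not state which variant its SAS output uses, so this is given as the standard textbook form rather than as a transcription of that output. Here $n$ is the number of residuals, $r_k$ the sample autocorrelation of the residuals at lag $k$, $K$ the number of lags tested, and $p$, $q$ the numbers of estimated AR and MA coefficients.

```latex
\[
Q_K = n\,(n+2) \sum_{k=1}^{K} \frac{r_k^{2}}{n-k},
\qquad
Q_K \;\overset{H_0}{\approx}\; \chi^{2}_{K-p-q}
\]
```

A large $Q_K$, and therefore a small p-value, means that the first $K$ residual autocorrelations are jointly too large to be compatible with white noise, which is the situation observed for the MA(7).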
To help us find a good model, we next use proc arima with the minic option, which computes the 'optimal' model according to the AIC or BIC⁵ criteria.
According to this method the optimal model is an AR(1):
$$z_t = c + \varphi z_{t-1} + \varepsilon_t,$$
but again, as we see from the table below, the p-values are < .05, hence we reject the null hypothesis that there is no autocorrelation.
Since neither model is valid, we try several combinations of them. The model ARMA(1, 7) is
$$z_t = c + \varphi z_{t-1} + \varepsilon_t - \sum_{i=1}^{7} \theta_i \varepsilon_{t-i}$$
⁵ Cryer J. and Chan K.S. (2008). Time series analysis: with application in R. Springer, United States, pp. 130-131.
Here we get white noise, but we noticed that the $\theta_i$, $i = 1, \ldots, 6$, are not significant (except for $i = 5$, whose p-value is close to 5%, so its significance is not clear-cut), so we delete them.
Next, we noticed that the intercept (MU, 'c' in our formula) is not significant:
After deleting the constant,
Thus, we have a model with white noise and significant coefficients.
To ensure that this is the best model, we also tried to fit an ARMA(2,(7)) and an ARMA(1,(8)) to the data, but neither gives white noise.
Thus, we keep our model ARMA(1,(7)):
$$z_t = \varphi z_{t-1} + \varepsilon_t - \theta \varepsilon_{t-7}$$
and so
$$(1 - B^{7})(1 - B^{365})\, y_t = \frac{\Theta_7(B)}{\Phi_1(B)}\, \varepsilon_t \;\Leftrightarrow\; (1 - B^{7})(1 - B^{365})\, y_t = \frac{1 - \theta B^{7}}{1 - \varphi B}\, \varepsilon_t$$
We have the equation of a SARIMA model. However, the theory defines it with only one seasonality $s$. Here we have two different seasonalities, $s_1 = 7$ and $s_2 = 365$, so it is a non-standard SARIMA model:
$$y_t \sim \mathrm{SARIMA}(1, 0, (7))(1, 1, 0)_{7,365}$$
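The sequence of candidate models discussed in this section can be reproduced with successive estimate statements. The annex only lists the final model, so the statements below are a reconstruction from the text, not the authors' exact code. They assume the data set tel2 and the variable ltra_CS from the annex.

```sas
proc arima data=tel2;
  identify var=ltra_CS(7,365) noprint;  /* z_t = (1-B**7)(1-B**365) y_t */
  estimate q=7;                /* MA(7), MA lags 1 to 7, with intercept   */
  estimate p=1;                /* AR(1), the model proposed by minic      */
  estimate p=1 q=7;            /* ARMA(1,7), all MA lags 1 to 7           */
  estimate p=1 q=(7);          /* ARMA(1,(7)), MA term at lag 7 only      */
  estimate p=1 q=(7) noint;    /* same model without the intercept        */
  estimate p=2 q=(7) noint;    /* alternative ARMA(2,(7))                 */
  estimate p=1 q=(8) noint;    /* alternative ARMA(1,(8))                 */
run;
quit;
```

After each estimate statement, the parameter table gives the significance of the coefficients and the Autocorrelation Check of Residuals gives the portmanteau test, which are the two criteria used above to accept or reject a model.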
Prediction
From this model, we can compute some predictions.
We fit our non-standard SARIMA to the variable $y_t$ in order to get the forecast. However, what we predict is $y_t = \log(\text{Traffic\_CS})$. Coming back to $x_t = \text{Traffic\_CS}$ is easy but not trivial.
We have to use the transformation:
$$x_{t,\text{predicted}} = \exp\!\left(y_{t,\text{predicted}} + \frac{\sigma^2}{2}\right)$$
where $\sigma^2$ is the variance of the forecast $y_{t,\text{predicted}}$.
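The correction term $\sigma^2/2$ comes from the mean of a log-normal variable. The short derivation below assumes that the forecast error on the log scale is Gaussian, an assumption the report does not state explicitly; $m$ stands for the point forecast of $y_t$ and $Z$ for a standard normal variable.

```latex
\[
y \sim \mathcal{N}(m, \sigma^{2})
\;\Longrightarrow\;
E\!\left[e^{y}\right]
  = e^{m}\, E\!\left[e^{\sigma Z}\right]
  = e^{m}\, e^{\sigma^{2}/2}
  = \exp\!\left(m + \frac{\sigma^{2}}{2}\right)
\]
```

As a made-up numerical illustration, with $m = 5$ and $\sigma = 0.3$ the naive back-transform gives $e^{5} \approx 148.4$, while the corrected forecast is $e^{5.045} \approx 155.2$, about 4.6% higher. The naive value $e^{m}$ is the median of $x_t$ rather than its mean.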
After the vertical line we see what our model predicts (one year prediction). We can see that the forecast
(in red) has the same shape as the original data (in black).
Graphic1.6: One year forecasting of our whole data set
To check whether our prediction is good or not, we delete a part of the data (here from 1st January 2013 to 1st January 2014) and predict it with our model:
Graphic1.7: Check of the forecast only on a part of the data
Then we compare our predicted values with the original ones, which are the true values, and calculate the error rate:
$$\text{error rate} = \mathrm{Mean}\left(\frac{\left|x_t - x_{t,\text{predicted}}\right|}{x_t}\right)$$
Our model has a 13.3% error for the one-year forecast.
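To make the error rate concrete, here is the computation on three invented observations (true values 100, 200, 400 and predictions 90, 230, 380). These numbers are purely illustrative and unrelated to the TelCap data.

```latex
\[
\text{error rate}
  = \frac{1}{3}\left(
      \frac{|100 - 90|}{100} + \frac{|200 - 230|}{200} + \frac{|400 - 380|}{400}
    \right)
  = \frac{0.10 + 0.15 + 0.05}{3}
  = 0.10 = 10\%
\]
```

Because each error is divided by the true value, days with low traffic weigh as much as days with high traffic in this measure.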
We fitted this model to 4 randomly selected cells from Toulouse. The model was valid for 3 of the 4 cells selected (cell512, cell521G1, cell451D3); the 4th cell (cell421D1) needed adjustments to the lag.
We tried the same methodology without the weekends to check whether the model improved. We succeeded in finding another model, but its final error was larger than before, so we keep the whole data and our model.
Time Series 2 – Afghanistan
Overview
We chose one of the cells for Afghanistan (cell 279) from the data set provided to us, which contains daily data from 25th May 2012 to 22nd August 2014, accounting for 839 observations. The variable of interest is Traffic_CS. The first step, as discussed before, is to check whether our time series is stationary. In order to plot the time series and the ACF and PACF graphs, we have to treat the missing values in our data, which we again do with the spline method provided by SAS.
Graphic2.1 : the original chronological traffic volume of Afghanistan from May 2012 to Sept 2014
From Graphic2.1 we see that there is no clear trend, and that there is a sudden fall in the traffic between 2/4/2013 and 26/6/2013, which is a main reason for the non-stationarity of the time series.
The usual method to forecast such a fall is Markov chains; however, this is impossible here because we only have one occurrence of the fall. We propose 3 methods:
• Model 1: Fit an ARIMA model to the whole data.
• Model 2: Forecast the whole data, but after lifting the fall up into the continuity of the data.
• Model 3: Use only the data after the fall (we would have to know why this fall appeared; maybe it will not happen again).
As before, let the variable of interest be
$$x_t = \text{Traffic\_CS}_t$$
Differencing
If we lift the fall up to the level of the surrounding data, we see a yearly seasonality. Since this fall occurs only once in our data, we can consider it unnatural.
Now, we check the ACF and PACF plots of the original time series and obtain the following results:
Graphic2.2: the autocorrelation, partial autocorrelation and inverse autocorrelation plots of the Afghanistan traffic volume time series
As we can see, the ACF and PACF plots do not decrease exponentially, which is evidence of a non-stationary time series. In accordance with the Augmented Dickey-Fuller Single Mean test obtained from the SAS output for lag 5, we need to difference our time series.
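For readers unfamiliar with the test, the "Single Mean" version of the Augmented Dickey-Fuller test with 5 lags corresponds to the regression below, written here in its usual textbook form. The coefficient names $\alpha$, $\gamma$ and $\delta_i$ are notation introduced for this note.

```latex
\[
\Delta x_t = \alpha + \gamma\, x_{t-1} + \sum_{i=1}^{5} \delta_i\, \Delta x_{t-i} + \varepsilon_t,
\qquad
H_0 : \gamma = 0 \ \text{(unit root)}
\quad \text{against} \quad
H_1 : \gamma < 0
\]
```

When $H_0$ is not rejected, the series is treated as having a unit root, which is why a first difference is applied next.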
Thus we take a first difference (lag 1) of our time series. We also take a seasonal difference of 365, which we will justify later. Hence we have the following time series:
Graphic2.3: the differenced chronological traffic volume of Afghanistan
And the corresponding autocorrelation plots:
Graphic2.4: the autocorrelation, partial autocorrelation and inverse autocorrelation plots of the Afghanistan traffic volume time series differenced at lags 1 and 365
After the first simple difference we get
$$\Delta_1 x_t = (1 - B^{1})\, x_t = x_t - x_{t-1}$$
The seasonal difference of 365 of the $\Delta_1 x_t$ series gives
$$z_t = (1 - B^{365})^{1}\, \Delta_1 x_t$$
where $s = 365$ and $D = 1$.
As we can see, the ACF and PACF plots now decrease rapidly, so we can assume that our time series is stationary. We confirm it with the Augmented Dickey-Fuller test.
Estimation and Model Selection
MODEL 1
The next step is to fit a model to our time series in order to perform forecasting. For this purpose we use the ACF and PACF plots. We see that the ACF is zero after lag 1 and the PACF is zero after lag 6. Thus we try to fit an MA(1) to this series; however, we do not get white noise. Next we try an AR(6), and this gives us white noise and significant coefficients.
Here we see that for all the estimates the p-values are less than .05, thus they are significant.
The Autocorrelation Check for the residuals gives us,
This is in accordance with the portmanteau tests for autocorrelation. As the p-values are all greater than .05, the null hypothesis that there is no autocorrelation is not rejected. Thus white noise is obtained.
Our final model is
$$z_t = \varepsilon_t + \varphi_1 z_{t-1} + \varphi_2 z_{t-2} + \varphi_3 z_{t-3} + \varphi_4 z_{t-4} + \varphi_5 z_{t-5} + \varphi_6 z_{t-6}$$
which can be represented in terms of $x_t$ as
$$z_t = (1 - B^{365})^{1} (1 - B^{1})\, x_t = \frac{\Theta_0(B)}{\Phi_6(B)}\, \varepsilon_t$$
Thus $x_t$ (such that $z_t = (1 - B^{365})^{1} \Delta_1 x_t$) is a SARIMA(6,1,0)(0,1,0)365.
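Since the AR(6) is fitted to $z_t$, a forecast of $z_t$ has to be turned back into a forecast of the traffic $x_t$ by undoing both differences. The worked step below shows the one-day-ahead case, with $T$ denoting the last observed day and hats denoting forecasts (notation introduced for this illustration). proc arima carries out this inversion itself when the differencing is declared in the identify statement.

```latex
\[
z_t = (1 - B)(1 - B^{365})\, x_t = x_t - x_{t-1} - x_{t-365} + x_{t-366}
\;\Longrightarrow\;
x_t = z_t + x_{t-1} + x_{t-365} - x_{t-366}
\]
\[
\hat{x}_{T+1} = \hat{z}_{T+1} + x_{T} + x_{T-364} - x_{T-365},
\qquad
\hat{z}_{T+1} = \sum_{i=1}^{6} \varphi_i\, z_{T+1-i}
\]
```

For horizons beyond one day, the unknown values of $x$ and $z$ on the right-hand side are replaced by their own forecasts, which is one reason the forecast uncertainty grows with the horizon.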
Prediction
With this pure autoregressive model AR(6), we first tried to forecast one year of traffic values after 22 September 2014 (see graph below).
Graphic2.5: Forecast of the original data of Afghanistan
Then we calculate the error rate as the mean of the absolute difference between the predictions and the true values, divided by the true values. This amounts to 11.1%.
Now that we have forecasted, we have a 'bigger' time series to observe, and we clearly see the seasonality, with a maximum in July.
MODEL 2
In this model, we lift the fall by 130 and fit a model to this new series. Our new series looks as follows:
Graphic2.6: chronogram of traffic volume of Afghanistan after lifting the fall
We difference this series at lag 1 and at lag 365 as before. According to the ACF and PACF, $x_t$ then fits the following SARIMA(10,1,0)(0,1,0)365:
$$(1 - B^{365})^{1} (1 - B^{1})\, x_t = \frac{\Theta_0(B)}{\Phi_{10}(B)}\, \varepsilon_t$$
To verify this model we check the information below:
- The Autocorrelation check of residuals gives us,
- As mentioned earlier, we obtain white noise according to the portmanteau tests. All of the estimates are also significant, as seen in the table below.
$x_t$ fits the model SARIMA(10,1,0)(0,1,0)365:
$$z_t = (1 - B^{365})^{1} (1 - B^{1})\, x_t = \frac{\Theta_0(B)}{\Phi_{10}(B)}\, \varepsilon_t$$
Predictions
We use this model to make predictions for 365 days. Below is the graph of the series.
Further, we calculate the error as explained before and get 14.6% for 30 days.
MODEL 3
Here we work on the series after the gap and delete the previous data.
Graphic 2.7: Chronogram of the selected data after the gap
To get a stationary series we difference it once at lag 1, and to account for the seasonality we difference it at lag 365, as before. With the same methodology, we try to fit a model.
To verify this model we check the information below.
- The Autocorrelation check of residuals confirms white noise according to the test.
- We obtain the estimates to be significant.
The final model is
$$(1 - B^{365})^{1} (1 - B^{1})\, x_t = \frac{\Theta_0(B)}{\Phi_{10}(B)}\, \varepsilon_t$$
So $x_t$ follows a SARIMA(10,1,0)(0,1,0)365.
Predictions:
We use this model to make predictions for 365 days. Below is the graph of the series.
Further, we calculate the error as explained before and get 10.98% for 20 days.
So, for the 3 models, we get quite similar SARIMA models.
                                    | Original Data           | Lifted Data              | Cut Data
Model of x_t                        | SARIMA(6,1,0)(0,1,0)365 | SARIMA(10,1,0)(0,1,0)365 | SARIMA(10,1,0)(0,1,0)365
Model of z_t = (1-B^365) Δ_1 x_t    | ARMA(6,0)               | ARMA(10,0)               | ARMA(10,0)
Error Rate (%)                      | 11.1                    | 14.6                     | 10.98
Conclusion
In this assignment, we analysed the telecommunication traffic series of Toulouse and Afghanistan. Since the traffic series in Toulouse behaves much better than that in Afghanistan, the prediction is effective over a longer horizon: we predicted one year of traffic volume for Toulouse but only twenty days for Afghanistan. Both traffic series were non-stationary, because the demand patterns influencing the series were not stable, and thus required a transformation of the series, which is generally done by differencing (at lags 7 and 365 for Toulouse, and at lags 1 and 365 for Afghanistan). From our study we can say that modern telecommunication traffic, with its strong correlation characteristics, can be appropriately modelled by time series models, especially seasonal ARIMA. The seasonal ARIMA model developed and finally chosen as the most appropriate in this study performed fairly well with respect to the residuals, which did not show any correlation. To conclude, we strongly recommend using ARIMA models with customised lags for each cell of Toulouse and Afghanistan.
Annexe:
Codes for Toulouse
/* importing the dataset*/
PROC IMPORT OUT=tel DATAFILE= "C:UsersUSERDesktopTSE M2 Eco STatTelcapDonnesTls dataFinaldata.xlsx"
DBMS=xlsx REPLACE;
SHEET="HistoricalTraffic";
GETNAMES=YES;
RUN;
/* we are keeping from the dataset only the variable of our interest (Traffic_CS
and the date)*/
Data tel (keep=date Traffic_CS);
Set tel;
run;
/*we are deleting the potential outliers i.e values for which Traffic_CS is very
small*/
data tel1;
set tel;
if Traffic_CS<50 then delete;
run;
/* proc expand method using spline to fill the gaps in the data(for more
information see the references)*/
Proc expand data=tel1 out=tel1 to=day method=spline plots=TRANSFORMIN;
id date;
run;
/* ATTENTION: this is the final code. The error computation further below uses
Traffic_CSbis, i.e. Traffic_CS with the data after 31/12/2012 deleted, in order to
compare with the prediction. Initially we did all the procedure with Traffic_CS and
not with Traffic_CSbis in order to find the ARMA(1,(7)). As follows:
data tel2;
set telp;
ltra_CS=log(Traffic_CS);
ltra7=dif7(ltra_CS);
ltra365=dif365(ltra7);
run;
proc arima data=tel2;
i var=ltra_CS(7,365) minic perror=(1:11);run;
e p=1 q=(7) noint plot;run;
*/
/* we take the log of Traffic_CS and take two seasonal differences, one for the
weekly seasonality and one for the yearly seasonality */
data tel2;
set tel1;
/* tel1 only contains Traffic_CS; Traffic_CSbis is created later, in telp */
ltra_CS=log(Traffic_CS);
ltra7=dif7(ltra_CS);
ltra365=dif365(ltra7);
run;
/* we are fitting the model ARMA(1,(7)) and in this part we are predicting also for
the following 365 days*/
Proc arima data=tel2;
I var=ltra_CS(7,365) minic perror=(1:11);run;/*differencing for 7 and 365*/
e p=1 q=(7) noint plot;run;/* estimation of ARMA(p, q) without the intercept */
f out=previs lead=365 id=date interval=day noprint; run; quit;/* forecast of
the estimated model for lead=365 days, the outputs are stored in dataset previs */
/* we have taken the log transformation before so now we make the transformation
mentioned in the report */
Data previs;
set previs;
Traffic_forecast=exp(forecast + STD*STD/2);
run;
data previbis;
merge previs tel2;
by date;
run;
/* we are plotting the time series and the prediction*/
Proc gplot data=previbis;
symbol1 v=plus i=join color=black;
symbol2 v=star i=join color=red;
plot Traffic_CS * date=1 Traffic_forecast * date=2 / overlay href='01JAN2013'd;
run;
/*Prediction and calculating the error*/
/* Having settled on our model, we build Traffic_CSbis in order to compute the
error of our prediction, as below (the reason why we delete a part of the data is
explained in the report). */
Data telp;
set tel1;
Traffic_CSbis = Traffic_CS;
if date > '31DEC2012'd then Traffic_CSbis = .;
run;
data tel2;
set telp;
ltra_CS=log(Traffic_CSbis);
ltra7=dif7(ltra_CS);
ltra365=dif365(ltra7);
run;
/* we are fitting the model ARMA(1,(7)) and in this part we are predicting also for
the following 365 days*/
Proc arima data=tel2;
I var=ltra_CS(7,365) minic perror=(1:11);run;/*differencing for 7 and 365*/
e p=1 q=(7) noint plot;run;/* estimation of ARMA(p, q) without the intercept */
f out=previs lead=365 id=date interval=day noprint; run; quit;/* forecast of
the estimated model for lead=365 days, the outputs are stored in dataset previs */
/* we have taken the log transformation before so now we make the transformation
mentioned in the report */
Data previs;
set previs;
Traffic_forecast=exp(forecast + STD*STD/2);
run;
data previbis;
merge previs tel2;
by date;
run;
/* we are plotting the time series and the prediction*/
Proc gplot data=previbis;
symbol1 v=plus i=join color=black;
symbol2 v=star i=join color=red;
plot Traffic_CS * date=1 Traffic_forecast * date=2 / overlay href='01JAN2013'd;
run;
/* Then we calculate the error, on the forecast period only */
data difference;
set previbis;
if date > '31DEC2012'd then do;
error = ABS(Traffic_CS - Traffic_forecast);
Rerror = ABS(Traffic_CS - Traffic_forecast)/Traffic_CS;
Qerror = (Traffic_CS - Traffic_forecast)**2;
end;
run;
proc means data=difference;
var error Rerror Qerror;
run;
Codes for Kabul
/* import the data */
proc import out=kblc279_4 datafile= "C:Usersdamien1991DesktopM2Stat
Consultingkblc279_4"
dbms=xlsx replace;
sheet="Historical Traffic";
getnames=yes;
run;
/* only keep date and Traffic_CS */
data tel279 (keep=date Traffic_CS);
set kblc279_4;
run;
/*fill the gaps with spline method */
proc expand data=tel279 out=tel279 to=day method=spline plots=TRANSFORMIN;
id date;
run;
proc arima data=tel279;
i var=Traffic_CS(1,365) nlag=100 minic perror=(1:11);run;
e p=6 noint;run; /* estimate the model AR(6) */
f out=previs_af279 lead=365 id=date interval=day ; run; /* forecast for leads =
365 days */
data previs_af;
set previs_af279;
Traffic_forecast=forecast;
run;
/* merge forecasted and real data */
data previbis_af;
merge previs_af tel279;
by date;
run;
/* plot forecast */
proc gplot data=previbis_af;
symbol1 v=plus i=join color=black;
symbol2 v=star i=join color=red;
plot Traffic_CS * date=1 Traffic_forecast * date=2/overlay href='12SEP2014'd;
run;
/********************** calculate the error *****************************/
data tel279_bis;
set tel279;
if Date>'02SEP2014'd then delete;
run;
proc arima data=tel279_bis;
i var=Traffic_CS(1,365) nlag=100 minic perror=(1:11);run;
e p=6 noint;run;
f out=previs_af279 lead=20 id=date interval=day ; run;
data previs_af;
set previs_af279;
Traffic_forecast=forecast;
run;
data previbis_af;
merge previs_af tel279;
by date;
run;
proc gplot data=previbis_af;
symbol1 v=plus i=join color=black;
symbol2 v=star i=join color=red;
plot Traffic_CS * date=1 Traffic_forecast * date=2/overlay href='02SEP2014'd;
run;
data difference;
set previbis_af;
if date>'02SEP2014'd then do;
error = ABS(Traffic_CS - Traffic_forecast);
Rerror = ABS(Traffic_CS - Traffic_forecast)/Traffic_CS;
Qerror = (Traffic_CS - Traffic_forecast)**2;
end;
run;
proc means data=difference;
var error Rerror Qerror;
run;
Afghanistan Lift : we try to fix the fall by lifting it.
/* import the data */
proc import out=kblc279_4 datafile= "C:Usersdamien1991DesktopM2Stat
Consultingkblc279_4"
dbms=xlsx replace;
sheet="Historical Traffic";
getnames=yes;
run;
/*keep the variables of interest date and Traffic_CS */
data tel279 (keep=date Traffic_CS);
set kblc279_4;
run;
/* fill the gaps with spline method */
proc expand data=tel279 out=tel279 to=day method=spline plots=TRANSFORMIN;
id date;
run;
/* lift the fall */
data tel279lift;
set tel279;
if Date>'01APR2013'd and Date<'23JUN2013'd then Traffic_CS=Traffic_CS+130;
run;
proc arima data=tel279lift;
i var=Traffic_CS(1,365) minic perror=(1:11);run; /* identification of the model for
Traffic_CS simple diff and seasonal diff 365 */
e p=10 noint;run; /* estimation of the model AR(p=10) */
f out=previs_af279 lead=365 id=date interval=day ; run; /*forecast for leads =
365 days */
data previs_af279;
set previs_af279;
Traffic_forecast=forecast;
run;
/* merge of forecasted and original data */
data previbis_af;
merge previs_af279 tel279lift;
by date;
run;
/* plot of the forecast */
proc gplot data=previbis_af;
symbol1 v=plus i=join color=black;
symbol2 v=star i=join color=red;
plot Traffic_CS * date=1 Traffic_forecast * date=2/overlay href='22SEP2014'd;
run;
/********************** calculate the error *****************************/
data tel279lift_bis;
set tel279lift;
if Date>'22AUG2014'd then delete;
run;
proc arima data=tel279lift_bis;
i var=Traffic_CS(1,365) minic perror=(1:11);run; /* identification of the model for
Traffic_CS simple diff and seasonal diff 365 */
e p=10 noint;run; /* estimation of the model AR(p=10) */
f out=previs_af279 lead=365 id=date interval=day ; run; /*forecast for leads =
365 days */
data previs_af279;
set previs_af279;
Traffic_forecast=forecast;
run;
/* merge of forecasted and original data */
data previbis_af;
merge previs_af279 tel279lift;
by date;
run;
/* plot of the forecast */
proc gplot data=previbis_af;
symbol1 v=plus i=join color=black;
symbol2 v=star i=join color=red;
plot Traffic_CS * date=1 Traffic_forecast * date=2/overlay href='22AUG2014'd;
run;
/* computation of the error */
data difference;
set previbis_af;
if date>'22AUG2014'd then do;
error = ABS(Traffic_CS - Traffic_forecast);
Rerror = ABS(Traffic_CS - Traffic_forecast)/Traffic_CS;
Qerror = (Traffic_CS - Traffic_forecast)**2;
end;
run;
proc means data=difference;
var error Rerror Qerror;
run;
Afghanistan cut : we keep the data after the fall
/* import the data */
proc import out=kblc279_4 datafile= "C:Usersdamien1991DesktopM2Stat
Consultingkblc279_4"
dbms=xlsx replace;
sheet="Historical Traffic";
getnames=yes;
run;
/* only keep date and Traffic_CS */
data tel279 (keep=date Traffic_CS);
set kblc279_4;
run;
/*fill the gaps with spline method */
proc expand data=tel279 out=tel279 to=day method=spline plots=TRANSFORMIN;
id date;
run;
/*select the data after the fall */
data tel279cut;
set tel279;
if Date<'26JUN2013'd then delete;
run;
proc arima data=tel279cut;
i var=Traffic_CS(1,365) nlag=100 minic perror=(1:11);run;
e p=10 noint;run; /* estimate the model AR(10) */
f out=previs_af279 lead=365 id=date interval=day ; run; /* forecast for leads =
365 days */
data previs_af;
set previs_af279;
Traffic_forecast=forecast;
run;
/* merge forecasted and real data */
data previbis_af;
merge previs_af tel279cut;
by date;
run;
/* plot forecast */
proc gplot data=previbis_af;
symbol1 v=plus i=join color=black;
symbol2 v=star i=join color=red;
plot Traffic_CS * date=1 Traffic_forecast * date=2/overlay href='12SEP2014'd;
run;
/********************** calculate the error *****************************/
data tel279cut_bis;
set tel279cut;
if Date>'02SEP2014'd then delete;
run;
proc arima data=tel279cut_bis;
i var=Traffic_CS(1,365) nlag=100 minic perror=(1:11);run;
e p=10 noint;run;
f out=previs_af279 lead=20 id=date interval=day ; run;
data previs_af;
set previs_af279;
Traffic_forecast=forecast;
run;
data previbis_af;
merge previs_af tel279cut;
by date;
run;
proc gplot data=previbis_af;
symbol1 v=plus i=join color=black;
symbol2 v=star i=join color=red;
plot Traffic_CS * date=1 Traffic_forecast * date=2/overlay href='02SEP2014'd;
run;
data difference;
set previbis_af;
if date>'02SEP2014'd then do;
error = ABS(Traffic_CS - Traffic_forecast);
Rerror = ABS(Traffic_CS - Traffic_forecast)/Traffic_CS;
Qerror = (Traffic_CS - Traffic_forecast)**2;
end;
run;
proc means data=difference;
var error Rerror Qerror;