Time Series Analysis: Monthly Postal Revenue, Department of Post, India


    1. Modeling of Univariate Data: Total Revenue of Department of Post, Government of India
    Course Instructor: Prof. Nityananda Sarkar [1]
    Report by: Probal Mojumder (QE0810) [2]
    Date: 23rd May, 2009
    Contents –
        Objective
        Department of Post, India
        The Series
        Detection and Removal of Seasonality
        A Note on Stationarity
        A Note on Unit Root Test
        Detection of Stationarity
        Estimation of the Model
        Diagnostic Tests
        Forecasting
        Test for Structural Break
        Conclusions
        Acknowledgement
        Software Used
        References
    Objective – Our time series analysis of the given data consists of the following exercises.
        1. Plotting the data and making visual inferences from it.
        2. Testing for seasonality in the data, and deseasonalising the series if seasonality exists.
        3. Carrying out the Augmented Dickey Fuller (ADF) test to check for the existence of a unit root.
        4. Obtaining the most appropriate model (AR/MA/ARMA) for the stationary series obtained.
        5. Carrying out the diagnostic tests.
        6. Forecasting with in-sample observations and a hold-out sample.
        7. Testing for a structural break.
        8. In case of a structural break, examining whether the conclusion on the stationary/non-stationary property of the time series obtained above remains unchanged.
    [1] Professor, Economic Research Unit, Indian Statistical Institute, 203, B.T. Road, Kolkata - 700108.
    [2] Student, M.S (Q.E) – 1st year, Indian Statistical Institute, 203, B.T. Road, Kolkata - 700108. email: probalmojumder@yahoo.co.in
    2. Department of Post, India – The Department of Posts, functioning under the brand name India Post, is a government-operated postal system in India; within India it is generally referred to as "the post office". The Indian Postal Service, with 155,333 post offices, is the most widely distributed post office system in the world (China is next, with 57,000). The large numbers are a result of a long tradition of many disparate postal systems which were unified in the Indian Union post-Independence. Owing to this far-flung reach and its presence in remote areas, the Indian postal service is also involved in other services such as small savings banking and financial services. The postal service comes under the Department of Posts, which is a part of the Ministry of Communications and Information Technology under the Government of India. The apex body of the department is the Postal Service Board.
    Financial Management – The Department of Post provides postal services to the public through a large nationwide network of Post Offices. Besides providing purely postal services, Post Offices perform agency functions like Savings Bank, Payment of Pension, Sale of Cash Certificates, etc., on behalf of other Ministries and Departments of the Government of India and other organisations. The total revenue earned, including remuneration for SB and SC work, during the year 2006-07 was Rs. 532243.9 Lakhs, and the amount received from other Ministries/Departments as Agency charges was Rs. 20715.6 Lakhs. Gross working expenditure for the year 2006-07 was Rs. 677912.0 Lakhs against the previous year's expenditure of Rs. 642915.2 Lakhs (i.e. an increase of about 5.44%). The increase was mainly due to payment of Dearness Allowance/Dearness Relief and payment of Pensionary charges etc. Despite the increase in salaries and Pensionary charges, the deficit of the Department was Rs. 124952.5 Lakhs against the previous year's (2005-06) deficit of Rs. 120988.4 Lakhs.
    During the year the funds made available by the Ministry of Finance for 'Working Expenses' and 'Capital Outlay' were appropriately utilized. Surplus funds were surrendered in time to the Ministry of Finance. This appreciable achievement was made possible by effective budgetary control and by monitoring the progress of expenditure on a monthly basis.
    The subtle intricacies in the functioning of an important unit of the Government of India like the Department of Post ignited my interest in studying its monthly total revenue series after the New Economic Policy of 1991. By studying the behavior of the series I expect to understand where the postal system of our country stands in the light of new developments such as modern communication technologies. I also wish to get some idea of how well the welfare functions that a common man's 'Post Office' must provide are being carried out.
    3. The Series – The series under my consideration is Total Postal Revenue of Department of Post, Ministry of Communication and Information Technology, India. I considered monthly data from April 1991 to March 2008. Unit of data is Rs. Lakhs. Let Y denote the Data. Date Data Date Data Date Data Apr-91 6834 Dec-96 10839 Aug-02 20435 May-91 6560 Jan-97 10905 Sep-02 20504 Jun-91 6365 Feb-97 9425 Oct-02 19952 Jul-91 NA Mar-97 10661 Nov-02 19523 Aug-91 7199 Apr-97 9261 Dec-02 20889 Sep-91 8317 May-97 9367 Jan-03 22306 Oct-91 8129 Jun-97 11773 Feb-03 18488 Nov-91 7639 Jul-97 13594 Mar-03 183742 Dec-91 8718 Aug-97 13468 Apr-03 23340 Jan-92 8318 Sep-97 14018 May-03 19132 Feb-92 8413 Oct-97 12119 Jun-03 19032 Mar-92 NA Nov-97 13687 Jul-03 23305 Apr-92 7029 Dec-97 13450 Aug-03 14890 May-92 7250 Jan-98 14428 Sep-03 20311 Jun-92 8458 Feb-98 11861 Oct-03 20817 Jul-92 9191 Mar-98 19573 Nov-03 20131 Aug-92 8305 Apr-98 10851 Dec-03 23144 Sep-92 9407 May-98 11322 Jan-04 21434 Oct-92 9018 Jun-98 12249 Feb-04 17398 Nov-92 7041 Jul-98 12466 Mar-04 202766 Dec-92 12149 Aug-98 13521 Apr-04 18294 Jan-93 9156 Sep-98 13941 May-04 20137 Feb-93 8430 Oct-98 14773 Jun-04 20485 Mar-93 17064 Nov-98 14694 Jul-04 21948 Apr-93 6798 Dec-98 16698 Aug-04 25248 May-93 8548 Jan-99 14621 Sep-04 18226 Jun-93 8066 Feb-99 14637 Oct-04 20669 Jul-93 9201 Mar-99 22483 Nov-04 18958 Aug-93 8547 Apr-99 13312 Dec-04 23606 Sep-93 8952 May-99 13825 Jan-05 21432 Oct-93 8542 Jun-99 16782 Feb-05 21924 Nov-93 7899 Jul-99 17615 Mar-05 22081 Dec-93 11066 Aug-99 18898 Apr-05 20137 Jan-94 10560 Sep-99 16480 May-05 19323 Feb-94 9053 Oct-99 14793 Jun-05 21353 Mar-94 110517 Nov-99 14932 Jul-05 22802 Apr-94 8335 Dec-99 20334 Aug-05 22394 May-94 8310 Jan-00 20334 Sep-05 22674 Jun-94 8351 Feb-00 16770 Oct-05 21896 3
    4. Jul-94 9769 Mar-00 20794 Nov-05 20533 Aug-94 9816 Apr-00 14254 Dec-05 24005 Sep-94 9501 May-00 16362 Jan-06 22815 Oct-94 9399 Jun-00 16155 Feb-06 23028 Nov-94 9758 Jul-00 17750 Mar-06 29183 Dec-94 10448 Aug-00 18113 Apr-06 19257 Jan-95 10379 Sep-00 16032 May-06 21697 Feb-95 8420 Oct-00 16692 Jun-06 21226 Mar-95 14552 Nov-00 15815 Jul-06 23020 Apr-95 8613 Dec-00 11699 Aug-06 26378 May-95 8843 Jan-01 18432 Sep-06 23685 Jun-95 8772 Feb-01 16012 Oct-06 19745 Jul-95 9237 Mar-01 21860 Nov-06 25568 Aug-95 9971 Apr-01 13703 Dec-06 24709 Sep-95 9607 May-01 15729 Jan-07 22448 Oct-95 9613 Jun-01 17906 Feb-07 24147 Nov-95 10470 Jul-01 19992 Mar-07 27868 Dec-95 10869 Aug-01 17946 Apr-07 21340 Jan-96 10039 Sep-01 19073 May-07 24105 Feb-96 8639 Oct-01 16718 Jun-07 21889 Mar-96 10787 Nov-01 17730 Jul-07 25538 Apr-96 7566 Dec-01 19790 Aug-07 26107 May-96 8406 Jan-02 20485 Sep-07 24875 Jun-96 9161 Feb-02 18070 Oct-07 24034 Jul-96 10225 Mar-02 172579 Nov-07 24936 Aug-96 13161 Apr-02 16279 Dec-07 23882 Sep-96 9132 May-02 18109 Jan-08 27908 Oct-96 7525 Jun-02 18226 Feb-08 25971 Nov-96 10906 Jul-02 22507 Mar-08 27844
    Source of the Series – Monthly Abstract of Statistics (Vol 44 – Vol 61), Central Statistical Organisation (CSO), Ministry of Statistics and Programme Implementation, Government of India.
    No. of observations – The series consists of 203 observations.
    5. Plot of the Series –
    [Figure 1: Postal Revenue (Rs. Lakhs) plotted against time, 1991 to 2008; vertical axis from 0 to 240000.]
    Comment – The above graph shows that the series follows a steady pattern, with pronounced spikes in March that point to seasonality. The March spikes are highest in 1994, 2002, 2003 and 2004.
    Detection and Removal of Seasonality – Many time series display seasonality. By seasonality, we mean periodic fluctuations. For example, retail sales tend to peak for the Christmas season and then decline after the holidays. Seasonality is quite common in economic time series. It is less common in engineering and scientific data.
    Graphical Approach – A seasonal sub-series plot is a specialized technique for showing seasonality.
    [Figure 2: REVENUE by Season, a seasonal sub-series plot with the mean for each month, Jan to Dec; vertical axis from 0 to 250000.]
    6. Comment – The seasonal stacked line graph shows seasonality in March. Revenue receipts in March are consistently higher than in other months. March 1994, 2002, 2003 and 2004 show drastically high postal revenue receipts.
    Test for Seasonality – We can also check for seasonality using statistical tests, of which there are many. According to Startz, the simplest way is to regress the dependent variable on seasonal dummies. Since we are working with monthly data we take 11 dummy variables and 1 intercept. The dummy variable SEAS(i) is defined as follows:
        SEAS(i) = 1 if the observation refers to the ith month,
        SEAS(i) = 0 otherwise.
    The Regression:
        $$ Y_t \ (\text{in log values}) = C + \sum_{i=1}^{11} \beta_i \, \mathrm{SEAS}(i)_t + \varepsilon_t $$
    The following table shows the results of the seasonality check –
    Table 1:
        Variable     Coefficient   Std. Error   t-Statistic   Prob.
        C              4.199528     0.050813     82.64598     0.0000
        @SEAS(1)      -0.003293     0.071861     -0.045825    0.9635
        @SEAS(2)      -0.047936     0.071861     -0.667067    0.5055
        @SEAS(3)       0.319685     0.072975      4.380736    0.0000
        @SEAS(4)      -0.115656     0.071861     -1.609442    0.1092
        @SEAS(5)      -0.092183     0.071861     -1.282799    0.2011
        @SEAS(6)      -0.072598     0.071861     -1.010253    0.3137
        @SEAS(7)      -0.005423     0.072975     -0.074308    0.9408
        @SEAS(8)      -0.026545     0.071861     -0.369397    0.7122
        @SEAS(9)      -0.036843     0.071861     -0.512705    0.6088
        @SEAS(10)     -0.055515     0.071861     -0.772536    0.4408
        @SEAS(11)     -0.04719      0.071861     -0.656683    0.5122
    The intercept and SEAS(3) (i.e. the seasonal coefficient of March) are significant; no other monthly coefficient is significant even at the 10% level. Thus we have deterministic and stable seasonality in March.
    Deseasonalised Series – The series has seasonality in the month of March. We use the additive seasonal adjustment technique provided by the U.S. Census Bureau to obtain the deseasonalised series. Let Y_SA denote this data.
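    The dummy-variable regression described above can be illustrated with a small sketch. It is not the EViews computation behind Table 1: it uses only two years of the raw series (Apr-94 to Mar-96, copied from the data table), assumes base-10 logarithms (an assumption; the intercept of about 4.2 in Table 1 is consistent with it), and takes December as the omitted base month. With a full set of month dummies the least-squares fit has a closed form, so no matrix algebra is needed. The function name `seasonal_dummy_fit` is hypothetical.

```python
import math

# Two years of the raw series (Rs. Lakhs), Apr-94 to Mar-96, copied from the data table.
months = ["Apr", "May", "Jun", "Jul", "Aug", "Sep",
          "Oct", "Nov", "Dec", "Jan", "Feb", "Mar"] * 2
revenue = [8335, 8310, 8351, 9769, 9816, 9501, 9399, 9758, 10448, 10379, 8420, 14552,
           8613, 8843, 8772, 9237, 9971, 9607, 9613, 10470, 10869, 10039, 8639, 10787]


def seasonal_dummy_fit(labels, values, base="Dec"):
    """OLS of values on an intercept plus one dummy per non-base month.

    With one dummy per non-base month the least-squares solution is known in
    closed form: the intercept is the base-month mean, and each dummy
    coefficient is that month's mean minus the base-month mean.
    """
    groups = {}
    for lab, v in zip(labels, values):
        groups.setdefault(lab, []).append(v)
    means = {lab: sum(vs) / len(vs) for lab, vs in groups.items()}
    intercept = means[base]
    coefs = {lab: m - intercept for lab, m in means.items() if lab != base}
    return intercept, coefs


log_rev = [math.log10(v) for v in revenue]  # assumption: base-10 logs
intercept, coefs = seasonal_dummy_fit(months, log_rev)

# Print the dummy coefficients from largest to smallest.
for lab in sorted(coefs, key=coefs.get, reverse=True):
    print(f"{lab}: {coefs[lab]:+.4f}")
```

    Even on this short sub-sample March is the only month whose coefficient is positive relative to December, which is the same qualitative pattern as in Table 1. Standard errors and t-ratios are left out of the sketch.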
    7. The Series – Date Data Date Data Date Data 1991M04 6815.872 1997M01 9819.177 2002M10 23378.24 1991M05 6851.71 1997M02 9794.331 2002M11 23747.38 1991M06 8625.112 1997M03 5011.013 2002M12 23614.93 1991M07 NA 1997M04 10756.39 2003M01 24748.68 1991M08 9391.296 1997M05 10773.62 2003M02 6909.438 1991M09 9956.906 1997M06 12998.84 2003M03 165755.8 1991M10 10157.63 1997M07 14001.58 2003M04 24882.36 1991M11 10602.18 1997M08 12949.87 2003M05 24032.27 1991M12 9087.815 1997M09 14753.24 2003M06 23118.92 1992M01 6737.003 1997M10 13587.93 2003M07 24531.51 1992M02 6588.07 1997M11 13974.75 2003M08 17517.01 1992M03 NA 1997M12 12818.89 2003M09 23294.43 1992M04 6838.377 1998M01 13416.12 2003M10 24004.03 1992M05 7452.823 1998M02 12539.53 2003M11 23816.25 1992M06 10952.01 1998M03 13990.22 2003M12 24920.54 1992M07 10874.38 1998M04 12939.14 2004M01 23800.66 1992M08 10451.34 1998M05 12980.86 2004M02 6660.983 1992M09 11045.22 1998M06 12804.27 2004M03 186265.8 1992M10 10954.32 1998M07 12275.56 2004M04 19405.73 1992M11 9670.792 1998M08 12903.07 2004M05 24844.51 1992M12 12340.39 1998M09 14741.48 2004M06 24679.48 1993M01 7587.12 1998M10 16604.03 2004M07 23390.35 1993M02 6779.203 1998M11 15744.68 2004M08 27495.14 1993M03 7942.875 1998M12 16794.47 2004M09 20777.59 1993M04 6487.404 1999M01 13732.03 2004M10 23457.94 1993M05 8652.125 1999M02 14014.78 2004M11 21766.6 1993M06 10727.21 1999M03 15281.26 2004M12 24284.58 1993M07 10938.02 1999M04 15925.19 2005M01 23211.1 1993M08 10330.62 1999M05 15620.33 2005M02 13485 1993M09 10587.31 1999M06 17474.43 2005M03 8595.753 1993M10 10452.35 1999M07 17234.69 2005M04 20797 1993M11 10000.47 1999M08 18673.54 2005M05 23513.93 1993M12 11014.94 1999M09 17671.27 2005M06 25439.73 1994M01 8995.562 1999M10 17102.15 2005M07 24366.55 1994M02 7738.075 1999M11 16974.06 2005M08 23483.16 1994M03 102342.6 1999M12 21544.17 2005M09 24770.45 1994M04 8217.758 2000M01 20018 2005M10 24235.43 1994M05 8548.738 2000M02 13477.4 2005M11 22137.13 1994M06 10854.71 2000M03 10549 2005M12 
23627.7 1994M07 11304.55 2000M04 17274.27 2006M01 23172.91 1994M08 11040.66 2000M05 18600.94 2006M02 17899.34 1994M09 10967.4 2000M06 17335.33 2006M03 20013.43 1994M10 11284.05 2000M07 17301.09 2006M04 19644 1994M11 11247.23 2000M08 18939.74 2006M05 24803.27 1994M12 10414.26 2000M09 17746.41 2006M06 24685.45 1995M01 8801.369 2000M10 19663.3 2006M07 24268.32 1995M02 7502.185 2000M11 19044.64 2006M08 26496.52 1995M03 7362.359 2000M12 14007.6 2006M09 25212.98 1995M04 8773.445 2001M01 19337.28 2006M10 21817.01 1995M05 9313.592 2001M02 8886.636 2006M11 26403.66 7
    8. 1995M06 11188.76 2001M03 7667.626 2006M12 23843.99 1995M07 10579.62 2001M04 16592.31 2007M01 21745.85 1995M08 10568.07 2001M05 18996.41 2007M02 21403.16 1995M09 10792.68 2001M06 20234.7 2007M03 21598.85 1995M10 11307.47 2001M07 20000.1 2007M04 21757.13 1995M11 11195.07 2001M08 19680.08 2007M05 26427.86 1995M12 10552.04 2001M09 21538.35 2007M06 24820.78 1996M01 8739.476 2001M10 20034.82 2007M07 26526.88 1996M02 8422.225 2001M11 21646.81 2007M08 25519.91 1996M03 4474.657 2001M12 22692.78 2007M09 26062.82 1996M04 8371.044 2002M01 22258.62 2007M10 26158.52 1996M05 9399.018 2002M02 8025.531 2007M11 25446.74 1996M06 10996.16 2002M03 155792.1 2007M12 22835.5 1996M07 11110 2002M04 18454.92 2008M01 26660.95 1996M08 13155.27 2002M05 22319.68 2008M02 24467.38 1996M09 9988.303 2002M06 21501.16 2008M03 22925.15 1996M10 9070.546 2002M07 23036.02 1996M11 11226.79 2002M08 23027.07 1996M12 10259.1 2002M09 23389.11
    *Calculated using e-views 5.0. M0i = ith month.
    A Note on Stationarity – Univariate time series models are a class of specifications where one attempts to model and to predict the variable under study by using only the information contained in its past values and possibly current and past values of the error term. The starting point is the concept of a stochastic process. A stochastic process is a collection of random variables {Xt} with distribution functions {Ft(Xt)}, where t denotes an integer that indexes the time period. From the point of view of analyzing a time series using statistical methods, it is useful to regard an observed time series as a particular realization of a stochastic process. Thus for each random variable there is one observation available. As specifying the complete form of the probability distribution of a stochastic process will be a difficult task, it is customary to define the stochastic process in terms of the first and second moments of the random variable Xt.
    The fact that the unknown mean and variance parameters change with t presents us with a difficult problem, since there are too many parameters to be estimated. Thus, given a single realization, we need to reduce the number of parameters. To this end, we first make the restrictive assumption that the process is stationary. Stationarity can be of two types –
        1. A stochastic process is called weakly (or covariance, or second-order) stationary if the mean and the variance of the stochastic process remain the same over time, and the covariances (called autocovariances) between $X_t$ and $X_{t-k}$ depend only on the lag k.
        2. A stronger condition is called strong or strict stationarity. A stochastic process is said to be strongly or strictly stationary if its properties are unaffected by a change of time origin; in other words, the joint probability distribution of the stochastic process at any set of times $t_1, t_2, \ldots, t_m$ is the same as that at $t_1+k, t_2+k, \ldots, t_m+k$, where k, an integer, is an arbitrary shift along the time axis.
    9. It may be noted that the strong stationarity assumption is in terms of the complete distribution, whereas weak stationarity is in terms of the first and second moments only. When we are doing data analysis we make sure that the data under consideration is weakly stationary.
    Examining whether a series is stationary or non-stationary is essential for the following reasons:
        1. The stationarity of a series can strongly influence its behavior and properties. For example, for a stationary series, shocks to the system will gradually die away.
        2. The use of non-stationary data can lead to spurious regression.
        3. If the variables employed in the model are not stationary, then the standard assumptions of the data analysis will not be valid. In other words, the usual 't-ratios' will not follow a t-distribution, and the F-statistic will not follow an F-distribution.
    Thus, if the test for a unit root finds that the series is non-stationary, we can try to make the series stationary by differencing. However, differencing leads to the loss of one data point.
    A Note on Unit Root Tests – To test for the existence of unit roots we consider the Dickey Fuller (DF), Augmented Dickey Fuller (ADF) and Phillips-Perron (PP) tests. Consider the equation
        $$ \Delta y_t = \phi \, y_{t-1} + u_t, $$
    where $u_t \sim WN(0, \sigma^2)$. This is the AR(1) model $y_t = \rho \, y_{t-1} + u_t$ with $y_{t-1}$ subtracted from both sides, so that $\phi = \rho - 1$. The Dickey Fuller test says that testing for a unit root is equivalent to testing the null hypothesis that the autoregressive parameter $\rho = 1$ against the alternative hypothesis $\rho < 1$, i.e.
        $H_0: \rho = 1$, $H_1: \rho < 1$ (one-sided test), which we can also write as $H_0: \phi = 0$, $H_1: \phi < 0$.
    Said and Dickey modified the original DF test to allow for correlated $u_t$'s through an AR(p) representation. This gives the Augmented Dickey Fuller test, whose estimating equation is
        $$ \Delta y_t = \phi \, y_{t-1} + \sum_{j=1}^{m} \delta_j \, \Delta y_{t-j} + u_t \qquad (2) $$
    where $\phi = -\alpha(1)$, $m = p-1$ and $\delta_j = -(\alpha_{j+1} + \cdots + \alpha_p)$, with $\alpha(L) = 1 - \alpha_1 L - \cdots - \alpha_p L^p$ the AR(p) lag polynomial. (This is the ADF equation.)
    The null hypothesis and the alternative hypothesis of the ADF unit root test are
        $H_0: \phi = 0$, $H_1: \phi < 0$ (one-sided test).
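    To make the DF regression concrete, the following is a minimal sketch (not the EViews routine used in this report) of the no-constant, no-augmentation case: it regresses the first difference on the lagged level by least squares and returns the t-ratio of φ. The simulated AR(1) series, the seed, and the function names `df_tstat` and `simulate_ar1` are assumptions made for illustration only. The t-ratio has to be compared with Dickey-Fuller critical values (for example the 5% value of -1.942491 reported in Table 2 for the case without constant and trend), not with normal or t tables.

```python
import math
import random


def df_tstat(y):
    """Estimate phi and its t-ratio in  dy_t = phi * y_{t-1} + u_t  (no constant)."""
    lagged = y[:-1]
    dy = [b - a for a, b in zip(y[:-1], y[1:])]
    sxx = sum(x * x for x in lagged)
    phi = sum(x * d for x, d in zip(lagged, dy)) / sxx
    resid = [d - phi * x for x, d in zip(lagged, dy)]
    s2 = sum(e * e for e in resid) / (len(dy) - 1)  # one estimated parameter
    return phi, phi / math.sqrt(s2 / sxx)


def simulate_ar1(rho, n, seed):
    """Simulate y_t = rho * y_{t-1} + e_t with standard normal shocks (illustrative data)."""
    rng = random.Random(seed)
    y = [0.0]
    for _ in range(n - 1):
        y.append(rho * y[-1] + rng.gauss(0.0, 1.0))
    return y


phi_s, t_s = df_tstat(simulate_ar1(0.5, 200, seed=1))  # stationary AR(1)
phi_u, t_u = df_tstat(simulate_ar1(1.0, 200, seed=1))  # random walk (unit root)
print(f"rho = 0.5: phi_hat = {phi_s:.3f}, t = {t_s:.2f}")
print(f"rho = 1.0: phi_hat = {phi_u:.3f}, t = {t_u:.2f}")
```

    For the stationary series the estimate of φ should be close to ρ - 1 = -0.5 and the t-ratio far below the critical value, so the unit root null is rejected; for the random walk the estimate of φ should be close to zero. The augmented version adds the lagged differences of equation (2) as further regressors.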
    10. The ADF test statistic does not follow an asymptotic standard normal distribution under H0; it has a non-standard limiting distribution. Further, this non-standard limiting distribution under the unit root null hypothesis is the same as the DF distribution, and hence the critical values of the DF test are applicable for the ADF test as well. The reason behind this result is that in a regression of an I(1) variable on an I(1) and an I(0) variable, the asymptotic distributions of the coefficients of the I(1) and I(0) variables are independent. Said and Dickey have also proved the validity of this ADF test statistic for the general case in which, under the unit root null hypothesis, the series of first differences Δyt follows a general ARMA(p,q) process with p and q unknown. This increases the power of the ADF test compared to the DF test.
    When the errors ut are autocorrelated, Phillips and Perron suggested a modification of the original DF test statistic using a non-parametric approach. This test should ideally be called a semi-parametric test, since it essentially considers the usual parametric regression but treats serial correlation in a non-parametric way. Although the PP test is more sophisticated and sometimes more powerful than the ADF test for testing unit roots, this relative advantage is limited by its severe size distortion in finite samples. On the basis of their Monte Carlo studies, De Jong et al. have further argued that the PP tests have very low power (less than 0.1) against trend stationary alternatives, whereas the ADF test has power of around one third, and thus the ADF test is likely to be more useful in practice. Hence the ADF test is used in the following univariate analysis.
    Further discussion on ADF Test – In the ADF test discussed so far, both the DGP and the estimating equation have been assumed to have no drift (constant term) or deterministic trend.
    But, as stated in the preceding sub-section, this is often quite unrealistic. Therefore, we now outline the ADF test for models with drift and deterministic trend. In other words, we now keep in mind that a stochastic trend may be present in the DGP along with a deterministic trend and drift. In this case, the ADF unit root test is usually performed sequentially, starting with the estimating equation (2) and then the following two, (3) and (4), in that order:
        $$ \Delta y_t = \alpha + \phi \, y_{t-1} + \sum_{j=1}^{m} \delta_j \, \Delta y_{t-j} + u_t \qquad (3) $$
    and
        $$ \Delta y_t = \alpha + \beta t + \phi \, y_{t-1} + \sum_{j=1}^{m} \delta_j \, \Delta y_{t-j} + u_t \qquad (4) $$
    It may be worth noting that while the limiting distribution of the t-statistic under the null $\phi = 0$, as already stated, does not depend on the $\delta_j$'s or other characteristics of the short-run dynamics (such as the autocorrelations incorporated through the augmentation), it does depend on the deterministic terms that are included in the equation. Hence, the critical values also depend on the deterministic terms included in the estimating regression, and accordingly different critical values are used when a constant and/or a linear trend term is added, as in (3) and (4).
    11. These critical values are available in Fuller (1976, 1996) and Dickey and Fuller (1981). In this context, it is worth noting that including seasonal dummies in addition to a constant and/or a linear trend does not result in further changes in the limiting distributions, and hence in the critical values.
    Selection of the lag length, m, for the ADF test is a crucial issue, as it has been observed that the size and power of the ADF test are very sensitive to the choice of m. Based on Monte Carlo studies, Hall (1994) and Ng and Perron (1995) have found that both the Akaike information criterion (AIC) and the Schwarz Bayesian information criterion (BIC) underestimate the optimum lag length, which in turn results in high size distortions. Based on their findings, Ng and Perron (1995) have advocated that it is preferable (to AIC and BIC) to follow Hall's prescription of a general to specific approach, i.e. to start with a reasonably high lag value, test the significance of the last coefficient, and reduce the value of m iteratively until a significant statistic is found. Hall has also shown that the other rule of model selection, viz. the specific to general approach, is not generally valid asymptotically.
    Detection of Stationarity –
    Step 1 → ADF Test without Intercept and Trend on Y_SA_LOG (i.e. log values of Y_SA).
    Table 2:
        Null Hypothesis: Y_SA_LOG has a unit root
        Exogenous: None
        Lag Length: 11 (Automatic based on AIC)
                                                      t-Statistic    Prob.
        Augmented Dickey-Fuller test statistic         1.567514     0.9713
        Test critical values:     1% level            -2.577062
                                  5% level            -1.942491
                                  10% level           -1.615600

        R-squared             0.617949     Mean dependent var       0.002590
        Adjusted R-squared    0.594602     S.D. dependent var       0.258992
        S.E. of regression    0.164902     Akaike info criterion   -0.706467
        Sum squared resid     4.894693     Schwarz criterion       -0.502873
        Log likelihood        79.82082     Durbin-Watson stat       2.002369
    Comments – The absolute value of the ADF test statistic (1.567514) is less than the absolute value of the critical value at the 10% level of significance (1.615600). Thus the null hypothesis cannot be rejected at the 10% level of significance. Hence we conclude that the level data has a unit root, i.e. is non-stationary, when the ADF test is performed with no intercept and trend. Also, the AIC and BIC criteria have negative values, which is possible when the variance term in the criterion is less than 1.
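    The negative information criteria can be checked by hand. The sketch below assumes the per-observation definitions used by EViews, AIC = (-2 logL + 2k)/T and SC = (-2 logL + k ln T)/T. The log likelihoods are the reported ones; k is the number of estimated coefficients (12 for the regression without deterministic terms, 14 once a constant and a trend are added); T = 192 is an assumption, since the sample size is not printed in the output, and is chosen because it reproduces the reported figures. The function name `per_observation_criteria` is hypothetical.

```python
import math


def per_observation_criteria(loglik, n_obs, n_params):
    """Return (AIC, Schwarz criterion), both divided by the number of observations."""
    aic = (-2.0 * loglik + 2.0 * n_params) / n_obs
    sc = (-2.0 * loglik + n_params * math.log(n_obs)) / n_obs
    return aic, sc


# ADF regression without constant/trend: 1 lagged level + 11 lagged differences.
# T = 192 is an assumption (not stated in the source output).
aic_none, sc_none = per_observation_criteria(79.82082, n_obs=192, n_params=12)

# ADF regression with constant and linear trend: two more coefficients.
aic_trend, sc_trend = per_observation_criteria(82.60157, n_obs=192, n_params=14)

print(round(aic_none, 6), round(sc_none, 6))
print(round(aic_trend, 6), round(sc_trend, 6))
```

    Because the log likelihood is positive here (the residual variance of the log series is well below 1), the term -2 logL dominates the penalty and both criteria come out negative. Only differences in these criteria between models matter, not their sign.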
    12. Lag 11 has been determined by Hall's general to specific procedure. We start with a large lag and carry out the ADF test. If the last coefficient is insignificant at the 5% level, we reduce the lag by 1 and repeat the process until we find that the last coefficient is significant. Thus at the level values the series is non-stationary.
    Step 2 → ADF Test with intercept and trend on Y_SA_LOG.
    Table 3:
        Null Hypothesis: Y_SA_LOG has a unit root
        Exogenous: Constant, Linear Trend
        Lag Length: 11 (Automatic based on AIC)
                                                      t-Statistic    Prob.
        Augmented Dickey-Fuller test statistic        -2.214542     0.4784
        Test critical values:     1% level            -4.006566
                                  5% level            -3.433401
                                  10% level           -3.140550

        Variable             Coefficient   Std. Error   t-Statistic   Prob.
        Y_SA_LOG(-1)          -0.439037     0.198252     -2.214542    0.0281
        D(Y_SA_LOG(-1))       -0.633966     0.193415     -3.277754    0.0013
        D(Y_SA_LOG(-2))       -0.551402     0.191613     -2.877682    0.0045
        D(Y_SA_LOG(-3))       -0.462497     0.187654     -2.464625    0.0147
        D(Y_SA_LOG(-4))       -0.419611     0.180923     -2.319277    0.0215
        D(Y_SA_LOG(-5))       -0.393598     0.172398     -2.283071    0.0236
        D(Y_SA_LOG(-6))       -0.368569     0.163270     -2.257423    0.0252
        D(Y_SA_LOG(-7))       -0.365065     0.152809     -2.389031    0.0179
        D(Y_SA_LOG(-8))       -0.321450     0.140978     -2.280136    0.0238
        D(Y_SA_LOG(-9))       -0.268490     0.126128     -2.128710    0.0347
        D(Y_SA_LOG(-10))      -0.247754     0.104417     -2.372734    0.0187
        D(Y_SA_LOG(-11))      -0.340732     0.070119     -4.859363    0.0000
        C                      1.733580     0.770159      2.250936    0.0256
        @TREND(1991M04)        0.001146     0.000596      1.921289    0.0563

        R-squared             0.628857     Mean dependent var       0.002590
        Adjusted R-squared    0.601751     S.D. dependent var       0.258992
        S.E. of regression    0.163442     Akaike info criterion   -0.714600
        Sum squared resid     4.754946     Schwarz criterion       -0.477074
        Log likelihood        82.60157     F-statistic              23.19998
        Durbin-Watson stat    1.982146     Prob(F-statistic)        0.000000
    Comments – The absolute value of the ADF test statistic (2.214542) is less than the absolute value of the critical value at the 10% level of significance (3.140550). Thus the null hypothesis cannot be rejected at the 10% level of significance. Hence we conclude that the level data has a unit root, i.e. is non-stationary, when the ADF test is performed with intercept and trend.
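    The general to specific lag search described above can be sketched as a simple loop. This is only an outline under stated assumptions: `last_lag_tstat` is a hypothetical caller-supplied function that would fit the ADF regression with m lagged differences and return the t-ratio of the m-th one, and 1.96 is used as the two-sided 5% critical value. The t-ratios for lags 12 to 14 in the illustration are made up; only the lag-11 value is taken from Table 3.

```python
def general_to_specific(last_lag_tstat, max_lag, crit=1.96):
    """Return the largest lag m <= max_lag whose last coefficient is significant.

    last_lag_tstat(m) is a hypothetical callback: it should fit the ADF
    regression with m lagged differences and return the t-ratio of the
    coefficient on the m-th lagged difference.
    """
    for m in range(max_lag, 0, -1):
        if abs(last_lag_tstat(m)) > crit:
            return m
    return 0  # no lagged difference is significant: use the plain DF regression


# Made-up t-ratios for lags 12 to 14; the lag-11 value is the reported
# t-ratio of D(Y_SA_LOG(-11)) in Table 3.
illustrative_t = {14: 0.41, 13: -1.02, 12: 0.87, 11: -4.859363}
chosen = general_to_specific(lambda m: illustrative_t[m], max_lag=14)
print(chosen)
```

    The loop stops at the first significant last coefficient, so lags below the chosen value are never examined; in the illustration only lags 14, 13, 12 and 11 are visited.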
    13. Detrending the Series – We detrend the deseasonalised non-stationary series by differencing. Here we take the 1st difference of the series. Let DY_SA denote the new series. The plot after taking the difference is:
    [Figure 3: DY_SA plotted against time, 1992 to 2006; vertical axis from -200000 to 200000.]
    Comment – From the graph, DY_SA appears likely to be stationary. However, around 2002 to 2004 there is an erratic aberration, which may be due to a structural break.
    Now we formally check whether DY_SA is really stationary or not. We carry out the ADF test on DY_SA. The optimum lag value of 10 is obtained by Hall's procedure. The results are shown in Table 4.
    Table 4:
        Null Hypothesis: DY_SA has a unit root
        Exogenous: Constant, Linear Trend
        Lag Length: 10 (Automatic based on AIC, MAXLAG=50)
                                                      t-Statistic    Prob.
        Augmented Dickey-Fuller test statistic        -11.28941     0.0000
        Test critical values:     1% level            -4.006566
                                  5% level            -3.433401
                                  10% level           -3.140550

        Variable             Coefficient   Std. Error   t-Statistic   Prob.
        DY_SA(-1)             -9.214590     0.816216     -11.28941    0.0000
        D(DY_SA(-1))           7.212275     0.790129      9.127966    0.0000
        D(DY_SA(-2))           6.266767     0.740604      8.461693    0.0000
        D(DY_SA(-3))           5.381412     0.673604      7.988990    0.0000
        D(DY_SA(-4))           4.552312     0.594270      7.660347    0.0000
        D(DY_SA(-5))           3.778670     0.506801      7.455926    0.0000
    14.
        D(DY_SA(-6))           3.051982     0.414266      7.367198    0.0000
        D(DY_SA(-7))           2.360534     0.319824      7.380734    0.0000
        D(DY_SA(-8))           1.715098     0.226677      7.566268    0.0000
        D(DY_SA(-9))           1.115783     0.138915      8.032138    0.0000
        D(DY_SA(-10))          0.555456     0.062151      8.937182    0.0000
        C                      1497.441     2807.310      0.533408    0.5944
        @TREND(1991M04)       -6.539395     23.19569     -0.281923    0.7783

        R-squared             0.889559     Mean dependent var      -11.71245
        Adjusted R-squared    0.882156     S.D. dependent var       51884.59
        S.E. of regression    17811.19     Akaike info criterion    22.47835
        Sum squared resid     5.68E+10     Schwarz criterion        22.69891
        Log likelihood       -2144.921     F-statistic              120.1484
        Durbin-Watson stat    2.015379     Prob(F-statistic)        0.000000
    Comment – The absolute value of the ADF test statistic (11.28941) is greater than the absolute value of the critical value at the 1% level of significance (4.006566). Thus the null hypothesis is rejected at the 1% level of significance. Hence we conclude that the differenced series DY_SA does not have a unit root, i.e. it is stationary. Here the intercept and the deterministic trend are insignificant.
    Estimation of Stationary Model – We fit the model using the first 80% of the observations. We first check the correlogram of the series; the nature of the ACF/PACF gives us an idea about the underlying model.
    Table 5 –
        Lag      AC        PAC      Q-Stat
         1     -0.552    -0.552    50.548
         2      0.052    -0.363    50.996
         3      0.002    -0.265    50.996
         4      0.002    -0.203    50.997
         5     -0.003    -0.167    50.999
         6      0.005    -0.133    51.003
         7     -0.012    -0.128    51.027
         8      0.009    -0.112    51.040
         9     -0.001    -0.099    51.040
        10      0.034    -0.020    51.240
        11     -0.316    -0.582    68.944
    From the correlogram it is visible that the PAC comes close to zero as the lag increases, while the AC is large at lag 1 and negligible at lags 2 to 10. On this basis we can presume that the data follow an MA(q) model. However, at the 11th lag both AC and PAC are significantly non-zero. Thus, to be doubly sure, it is better to consider that the series follows an ARMA(p,q) model.
    Now we must identify the orders p and q. For this purpose the AIC and BIC values of ARMA(p,q) models, where p and q belong to {0, 1, 2, 3}, will be considered.
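    The differencing step and the correlogram quantities used above can be sketched in a few lines. This is an illustration, not a reproduction of Table 5: it uses only the twelve deseasonalised values for 1993 (copied from the Y_SA table), and it assumes the Ljung-Box form of the Q-statistic. The function names are hypothetical.

```python
def first_difference(y):
    """Return y_t - y_{t-1}; the result is one observation shorter than y."""
    return [b - a for a, b in zip(y[:-1], y[1:])]


def sample_acf(x, k):
    """Sample autocorrelation of x at lag k (k >= 1)."""
    n = len(x)
    mean = sum(x) / n
    dev = [v - mean for v in x]
    c0 = sum(d * d for d in dev)
    ck = sum(dev[t] * dev[t - k] for t in range(k, n))
    return ck / c0


def ljung_box_q(x, max_lag):
    """Ljung-Box Q-statistic cumulated over lags 1..max_lag."""
    n = len(x)
    return n * (n + 2) * sum(sample_acf(x, k) ** 2 / (n - k)
                             for k in range(1, max_lag + 1))


# Deseasonalised values 1993M01 to 1993M12, copied from the Y_SA table.
y_sa = [7587.12, 6779.203, 7942.875, 6487.404, 8652.125, 10727.21,
        10938.02, 10330.62, 10587.31, 10452.35, 10000.47, 11014.94]
dy_sa = first_difference(y_sa)

print(len(y_sa), len(dy_sa))  # differencing loses one observation
print([round(sample_acf(dy_sa, k), 3) for k in (1, 2, 3)])
print(round(ljung_box_q(dy_sa, 3), 3))
```

    With so few observations the autocorrelations are very noisy; the sketch is meant to show the mechanics of the calculation, not to support any conclusion about the model order.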
    15. Model Fitting –
    Table 6 (A) –
                              AR order
        MA order              0        1        2        3
        0         AIC        ---     23.23    23.10    23.05
                  BIC        ---     23.24    23.14    23.11
        1         AIC       22.82    22.82    22.84    22.85
                  BIC       22.84    22.86    22.89    22.93
        2         AIC       22.81    22.80    22.82    22.84
                  BIC       22.85    22.86    22.90    22.93
        3         AIC       22.82    22.81    22.80    22.82
                  BIC       22.88    22.89    22.90    22.93
    Table 6 (B) –
        AR order     4        5        6        7
        AIC        23.03    23.02    23.02    23.02
        BIC        23.10    23.11    23.14    23.16
    Tables 6 (A) and 6 (B) report the Akaike Info Criterion (AIC) and Schwarz Criterion (BIC) for the various AR(p), MA(q) and ARMA(p,q) models. Among the pure AR models the AIC stops falling at AR(5) (23.02), although the BIC is marginally lower for AR(4) (23.10 against 23.11). Note that several models in Table 6 (A) report lower values, for example an AIC of 22.80 for ARMA(1,2) and a BIC of 22.84 for MA(1). The analysis proceeds with the AR(5) model for the series DY_SA.
    The Model – AR(5)
    Table 7 –
        Dependent Variable: DY_SA (80% in-sample values)
        Method: Least Squares
        Sample (adjusted): 1991M10 2004M11
        Included observations: 158 after adjustments
        Convergence achieved after 3 iterations

        Variable     Coefficient   Std. Error   t-Statistic   Prob.
        AR(1)         -0.936328     0.079706     -11.74729    0.0000
        AR(2)         -0.750066     0.106134     -7.067184    0.0000
        AR(3)         -0.550506     0.113854     -4.835178    0.0000
        AR(4)         -0.353925     0.106137     -3.334597    0.0011
        AR(5)         -0.167351     0.079714     -2.099388    0.0374

        R-squared             0.476885     Mean dependent var       74.74586
        Adjusted R-squared    0.463209     S.D. dependent var       32453.12
        S.E. of regression    23777.11     Akaike info criterion    23.02197
        Sum squared resid     8.65E+10     Schwarz criterion        23.11889
        Log likelihood       -1813.736     Durbin-Watson stat       2.044628

        Inverted AR Roots     .27-.67i    .27+.67i    -.39-.56i    -.39+.56i    -.68
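    The inverted AR roots printed with Table 7 can be cross-checked against the estimated coefficients. The sketch below does not recompute the roots; it evaluates the characteristic polynomial z^5 - φ1 z^4 - ... - φ5 at each printed root (which should give a value near zero, up to the two-decimal rounding of the roots) and checks that every root has modulus below one. The function name `char_poly` is hypothetical.

```python
# AR(5) coefficients as reported in Table 7.
phi = [-0.936328, -0.750066, -0.550506, -0.353925, -0.167351]

# Inverted AR roots as printed in Table 7 (rounded to two decimals).
reported_roots = [0.27 - 0.67j, 0.27 + 0.67j, -0.39 - 0.56j, -0.39 + 0.56j, -0.68 + 0j]


def char_poly(z, coefs):
    """Evaluate z^p - phi_1 z^(p-1) - ... - phi_p; its zeros are the inverted AR roots."""
    p = len(coefs)
    return z ** p - sum(c * z ** (p - i) for i, c in enumerate(coefs, start=1))


residuals = [abs(char_poly(z, phi)) for z in reported_roots]
moduli = [abs(z) for z in reported_roots]

for z, res, mod in zip(reported_roots, residuals, moduli):
    print(f"root {z}: modulus = {mod:.3f}, |polynomial value| = {res:.4f}")
print("all roots inside the unit circle:", all(m < 1 for m in moduli))
```

    The largest modulus belongs to the complex pair .27±.67i and is well inside the unit circle, which is the condition for the fitted AR(5) process to be covariance stationary.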
    16. The estimated model has the form –
        $$ X_t = a_t - 0.936328\,X_{t-1} - 0.750066\,X_{t-2} - 0.550506\,X_{t-3} - 0.353925\,X_{t-4} - 0.167351\,X_{t-5} $$
    Diagnostic Tests – Now I will carry out several diagnostic tests to check whether the specified AR(5) model for Xt explains the series adequately.
    1. Residual Diagnostic (Q-statistic) – If the AR model is correctly specified, then the residuals from the model should be nearly white noise. This means that there should be no serial correlation left in the residuals.
    Table 8 –
        Sample: 1991M10 2004M11
        Included observations: 158
        Lag      AC        PAC      Q-Stat    Prob
         1     -0.023    -0.023    0.0821    0.774
         2     -0.047    -0.048    0.4431    0.801
         3     -0.074    -0.076    1.3253    0.723
         4     -0.100    -0.107    2.9668    0.563
         5     -0.114    -0.130    5.1049    0.403
         6     -0.201    -0.236    11.800    0.067
         7     -0.060    -0.130    12.395    0.088
         8     -0.053    -0.151    12.862    0.117
         9     -0.038    -0.169    13.103    0.158
        10     -0.027    -0.198    13.231    0.211
        11     -0.026    -0.246    13.346    0.271
    For the lags shown in Table 8 (1 to 11), none of the Q-statistics is significant at the 5% level; the smallest p-value is 0.067, at lag 6. The correlogram has a significant spike at lag 12 (not shown in Table 8), but the subsequent Q-statistics are not highly significant. This result indicates that the specified model is adequate.
    2. Root Diagnostic – The roots view displays the inverse roots of the AR characteristic polynomial. The graph plots the roots in the complex plane, where the horizontal axis is the real part and the vertical axis is the imaginary part of each root.
    [Plot: Inverse Roots of AR/MA Polynomial(s), showing the AR roots; both axes from -1.5 to 1.5.]
    17. Figure – 4

    Since all inverse AR roots lie inside the unit circle, the estimated AR process is (covariance) stationary.

    3. Correlogram Diagnostics – This compares the autocorrelation pattern of the structural residuals with that of the estimated AR(5) model for a specified number of periods. Here 24 lags are considered.

    [Figure: two panels, Autocorrelation and Partial autocorrelation, each showing Actual and Theoretical values for lags 2 to 24]

    Figure – 5

    In Figure 5 the residual and theoretical (estimated) autocorrelations and partial autocorrelations are quite "close". Thus AR(5) is not a bad model for the series.

    Forecasting – Forecasting refers to predicting likely values at future time points based on a given time series of observations.

    Forecasting in-sample observations – The fitted model is AR(5), and we forecast the in-sample observations (80% of the data, 1991M05 to 2004M11).

    [Figure: static forecast DY_SAF, 1992 to 2004, with the following statistics]

    Forecast: DY_SAF
    Actual: DY_SA
    Forecast sample: 1991M05 2004M11
    Adjusted sample: 1991M10 2004M11
    Included observations: 158
    Root Mean Squared Error        23397.87
    Mean Absolute Error            8391.537
    Mean Abs. Percent Error        2508.158
    Theil Inequality Coefficient   0.427823
    Bias Proportion                0.000207
    Variance Proportion            0.183085
    Covariance Proportion          0.816707
    18. Figure – 6 (a) : Static Forecast

    Figure 6 (a) shows the forecasts together with statistics evaluating the quality of the fit to the actual data. This is a one-step-ahead forecast and has a symmetric confidence interval. The red lines are the (+-) standard error bands. Now let us examine the graph of the actual in-sample observations versus the fitted values.

    [Figure: series DIF_IN and DIF_INF, 1992 to 2006]

    Figure – 6 (b)

    The actual and fitted series plotted above are quite close to each other. This shows that the fit is good.

    Forecasting holdout-sample – The fitted model is AR(5), and we forecast the out-of-sample observations (20% of the data, 2004M12 to 2008M03).
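    A static (one-step-ahead) forecast uses the actual lagged values at every step rather than earlier forecasts. The sketch below applies the AR(5) coefficients of Table 7 in that way; the function name and the toy input series are hypothetical and serve only to show the indexing. It does not compute the standard error bands.

```python
# AR(1)..AR(5) coefficients from Table 7.
PHI = [-0.936328, -0.750066, -0.550506, -0.353925, -0.167351]

def static_forecasts(series, phi=PHI):
    """One-step-ahead forecasts: each forecast uses the p most recent actual values."""
    p = len(phi)
    forecasts = []
    for t in range(p, len(series)):
        # series[t - 1] is lag 1, ..., series[t - p] is lag p.
        forecasts.append(sum(phi[i] * series[t - 1 - i] for i in range(p)))
    return forecasts

# Hypothetical toy series (not the postal revenue data).
toy = [1.0, 0.0, 0.0, 0.0, 0.0, 2.0, 0.0]
fc = static_forecasts(toy)
print(fc)
```

    The first five observations are consumed as starting lags, which is why the adjusted sample in the in-sample forecast starts five months after the forecast sample.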
    19. [Figure: static forecast DY_SAOF, 2005M01 to 2008M01, with the following statistics]

    Forecast: DY_SAOF
    Actual: DY_SA
    Forecast sample: 2004M12 2008M03
    Included observations: 40
    Root Mean Squared Error        4012.841
    Mean Absolute Error            3005.525
    Mean Abs. Percent Error        333.8227
    Theil Inequality Coefficient   0.577285
    Bias Proportion                0.000942
    Variance Proportion            0.001191
    Covariance Proportion          0.997867

    Figure – 7 (a) : Static Forecast

    Figure 7 (a) shows the forecasts together with statistics evaluating the quality of the fit to the actual out-of-sample data. This is a one-step-ahead forecast and has a symmetric confidence interval. The red lines are the (+-) standard error bands. Now let us examine the graph of the actual out-of-sample observations versus the fitted values.

    [Figure: series OUTSAM and OUTFIT, 1992 to 2006]

    Figure – 7 (b)

    The actual and fitted series plotted above are quite close to each other. This shows that the fit is good.
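    The evaluation statistics reported beside the forecast graphs can be computed from any pair of actual and forecast series. The sketch below assumes the usual definitions: the Theil inequality coefficient is the RMSE divided by the sum of the root mean squares of the forecast and actual series, and the mean squared error is split into bias, variance and covariance proportions that sum to one. The function name and the toy data are hypothetical.

```python
import math

def forecast_evaluation(actual, forecast):
    """RMSE, MAE, MAPE, Theil coefficient and the MSE decomposition."""
    n = len(actual)
    errors = [f - a for a, f in zip(actual, forecast)]
    mse = sum(e * e for e in errors) / n
    rmse = math.sqrt(mse)
    mae = sum(abs(e) for e in errors) / n
    # MAPE is undefined if any actual value is zero.
    mape = 100.0 * sum(abs(e / a) for e, a in zip(errors, actual)) / n
    rms_f = math.sqrt(sum(f * f for f in forecast) / n)
    rms_a = math.sqrt(sum(a * a for a in actual) / n)
    theil = rmse / (rms_f + rms_a)

    mean_a, mean_f = sum(actual) / n, sum(forecast) / n
    sd_a = math.sqrt(sum((a - mean_a) ** 2 for a in actual) / n)
    sd_f = math.sqrt(sum((f - mean_f) ** 2 for f in forecast) / n)
    cov = sum((a - mean_a) * (f - mean_f) for a, f in zip(actual, forecast)) / n
    corr = cov / (sd_a * sd_f)

    return {
        "rmse": rmse, "mae": mae, "mape": mape, "theil": theil,
        "bias": (mean_f - mean_a) ** 2 / mse,
        "variance": (sd_f - sd_a) ** 2 / mse,
        "covariance": 2.0 * (1.0 - corr) * sd_f * sd_a / mse,
    }

# Hypothetical toy data (not the postal revenue series).
actual = [120.0, -80.0, 45.0, -30.0, 60.0]
forecast = [100.0, -60.0, 50.0, -45.0, 40.0]
stats = forecast_evaluation(actual, forecast)
for name, value in stats.items():
    print(f"{name}: {value:.6f}")
```

    Because MAPE divides each error by the actual value, it can become very large when actual values are close to zero, which can happen for a differenced series.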
    20. Test for Structural Break –

    Structural change (or break) is an important problem in time series, and it affects the inferential procedures of any time series analysis. Simply stated, structural change means that at least one of the underlying parameters has changed at some date, called the break date. We use the Quandt-Andrews test to check for a structural break in the series. The Quandt-Andrews test is applicable to non-linear models as well. Here we plot the sequence of Wald statistics (or LR or LM statistics) as a function of the candidate break date: the candidate break dates are along the x-axis and the values of the log likelihood ratio are on the y-axis. After plotting these values we check whether the maximum of the Wald statistic (or LR or LM statistic) lies above the Andrews critical value; if it does, we conclude that there is a break.

    Table 9 –

    Quandt-Andrews unknown breakpoint test
    Null Hypothesis: No breakpoints within trimmed data
    Equation Sample: 1991M03 2008M03
    Test Sample: 1992M08 2007M10
    Number of breaks compared: 182

    Statistic                             Value      Prob.
    Maximum LR F-statistic (2001M02)      2.983682   0.4972
    Maximum Wald F-statistic (2001M02)    2.983682   0.4972
    Exp LR F-statistic                    0.251994   0.5002
    Exp Wald F-statistic                  0.251994   0.5002
    Ave LR F-statistic                    0.528733   0.4258
    Ave Wald F-statistic                  0.528733   0.4258

    Note: probabilities calculated using Hansen's (1997) method

    Thus the null hypothesis of no structural break cannot be rejected at the 5% level of significance at any candidate break date (all p-values in Table 9 exceed 0.42). We therefore conclude that there is no structural break in our data.

    Conclusions –

    1. The monthly series (April 1991 - March 2008) is non-stationary at the level value.
    2. There exists deterministic seasonality in the data in the month of March.
    3. The first difference of the original series is stationary.
    4. The best model estimated on the stationary data is AR(5).
    5. There exists no structural break in the data series.
    6. Thus, we can conclude that the total postal revenue series shows evidence of seasonality in March but has had a stable structure during the financial years 1991-1992 to 2007-2008.
    7. There have been key shocks to the total revenue in March of 1994, 2002, 2003 and 2004. They cannot be considered entirely as part of the seasonality and should be treated as outliers in the series.
    21. 8. The Department of Post, Government of India has been incorporating newer and more modern ideas while keeping a strong hold on its tasks of efficiency and welfare. Since the New Economic Policy the Postal Department has grown slowly and steadily.
    9. The total revenue figures for March of 1994, 2002, 2003 and 2004 may reflect the large investments the Postal Department has undertaken, and the revenue it has earned, to give itself a big push.

    Acknowledgement –

    I would like to thank Prof. Nityananda Sarkar of the Economic Research Unit, Indian Statistical Institute, Kolkata, who taught me Time Series Analysis and Forecasting. He has been a constant source of guidance throughout this project. His informative lecture notes were of great help.

    Software used –

    1. EViews 5.0
    2. EViews 6.0
    3. SPSS 17

    References –

    1. Monthly Abstract of Statistics (Vol 44 – Vol 61), Central Statistical Organisation.
    2. Annual Report (2007-2008), Department of Posts, India.
    3. IndiaPost Official Web-site – www.indiapost.gov.in
    4. Time Series Techniques for Economists, T.C. Mills.
    5. "Unit Root Tests in Time Series Econometrics", Nityananda Sarkar and Samarjit Das, Indian Statistical Institute, Kolkata.
    6. Lecture Notes of Prof. Nityananda Sarkar on Time Series Analysis and Forecasting.
    7. X12-ARIMA Reference Manual, Time Series Staff, Statistical Research Division, U.S. Census Bureau, Washington DC.
