Time series data are observations collected over time on one or more variables. They can be used to analyze problems involving change over time, such as movements in stock prices, GDP, and exchange rates. For standard regression analysis to be valid, time series data should be stationary, meaning that statistical properties such as the mean and variance do not change over time; otherwise there is a risk of spurious regressions. Non-stationary time series can often be transformed to stationarity by differencing, removing trends, or taking logs. Common time series models such as ARIMA rely on stationary data.
2. Types of Data
• Time Series Data
• Cross-Sectional Data
• Panel Data
3. Time Series Data
• Time series data, as the name suggests, are data that have been
collected over a period of time on one or more variables.
• Time series data have associated with them a particular frequency
of observation or frequency of collection of data points.
• The frequency is simply a measure of the interval over, or the
regularity with which, the data are collected or recorded.
4. Problems that could be tackled using time series data:
• How the value of a country’s stock index has varied with that
country’s macroeconomic fundamentals.
• How the value of a company’s stock price has varied when it
announced the value of its dividend payment.
• The effect on a country’s exchange rate of an increase in its trade
deficit.
In all of the above cases, it is clearly the time dimension which is the
most important, and the analysis will be conducted using the values
of the variables over time.
5. Cross-Sectional Data
Cross-sectional data are data on one or more variables collected at a
single point in time. For example, the data might be on:
• A poll of usage of internet stockbroking services.
• A cross-section of stock returns on the New York Stock Exchange
(NYSE)
• A sample of bond credit ratings for UK banks.
• Data on sales volume, sales revenue, number of customers and
expenses for the past month at each Starbucks location.
6. Problems that could be tackled using cross-sectional data:
• The relationship between company size and the return to investing
in its shares.
• The relationship between a country’s GDP level and the probability
that the government will default on its sovereign debt.
7. Panel, Longitudinal, or Micropanel Data
Panel data have the dimensions of both time series and cross-
sections, e.g. the daily prices of a number of blue chip stocks over
two years.
Cross-sectional data are collected at a particular point in time. For instance, suppose you are studying the GDP of three developing countries in the year 1999 only. Your data would look like this:

Country  Time  GDP
India    1999  ----
China    1999  ----
Brazil   1999  ----

This is cross-sectional data, since you are studying the entities (the countries) at a single point in time (here, the year 1999).
Panel data arise when you study, say, the GDP of the same three developing countries over a period of time (say, three years, from 1999 to 2001). Your data would look like this:

Country  Time  GDP
India    1999  ----
India    2000  ----
India    2001  ----
China    1999  ----
China    2000  ----
China    2001  ----
Brazil   1999  ----
Brazil   2000  ----
Brazil   2001  ----

Here you are studying the same entities (India, China, and Brazil) over a period of time (the three years from 1999 to 2001). This is called panel data.
8. Balanced and Unbalanced Panel Data
• If all the companies/persons/entities have the same number of observations, we have what is called a balanced panel. A panel is balanced if, for example, each person is observed in every year.
• If the number of observations is not the same for each company/person/entity, it is called an unbalanced panel. A panel is unbalanced if, say, person 1 is not observed in year 2003 and person 3 is not observed in 2003 or 2001.
9. Repeated Cross-Sections or Pooled Cross-Sections.
• There is also a type of data that is in between cross-sectional data and panel data. It is typically called repeated cross-sections or pooled cross-sections.
• For example, annual labour force surveys are repeated cross-
sections, because every year, a new random sample is taken from
the population. In this case, there is a time component, so
these are not cross-sectional data, but every year, new individuals
are surveyed, so these are also not panel data. That's why these
are called repeated cross-sections.
10. What is a time series?
A time series is any series of data that varies over time. For example:
• Monthly tourist arrivals from other countries.
• Quarterly GDP of the USA.
• Hourly prices of stocks and shares.
• Weekly quantity of beer sold in a pub.
Because of the widespread availability of time series databases, most empirical studies use time series data.
Definition of a time series: an ordered sequence of values of a variable at equally spaced time intervals.
11. Applications Of Time Series Analysis
Time Series Analysis is used for many applications such as:
• Economic Forecasting
• Sales Forecasting
• Budgetary Analysis
• Stock Market Analysis
• Yield Projections
• Process and Quality Control
• Inventory Studies
• Workload Projections
• Utility Studies
• Census Analysis.
12. Time Series Data
• One of the most important and frequently used types of data in empirical analysis.
• But it poses several challenges to econometricians/practitioners, e.g.:
1. Empirical work based on time series data assumes that the underlying time series is stationary.
2. Autocorrelation: successive observations are often correlated, particularly when the underlying time series is non-stationary.
3. Spurious/nonsense regression: a very high R2 and significant regression coefficients even though there is no meaningful relationship between the two variables.
13. Caveats in Using Time Series Data in Applied Econometric Modeling
• Data Should Be Stationary
• Presence of Autocorrelation
• Guard Against Spurious Regressions
• Establish Cointegration
• Reconcile Short-Run (SR) with Long-Run (LR) Behavior via an Error Correction Model (ECM)
• Implications for Forecasting
• Possibility of Volatility Clustering
16. Stationary Processes
• A stochastic process is said to be weakly stationary (also called covariance or second-order stationary) if:
o Its mean and variance are constant over time, and
o The covariance between two time periods depends only on the distance/lag between the two periods and not on the actual time at which the covariance is computed.
• E.g., let Yt be a stochastic process; then:
Mean: E(Yt) = μ (1)
Variance: var(Yt) = E(Yt − μ)² = σ² (2)
Covariance: γk = E[(Yt − μ)(Yt+k − μ)] (3)
• Here γk is the covariance (or autocovariance) at lag k.
• If k = 0, we obtain γ0, which is simply the variance of Y (= σ²); if k = 1, γ1 is the covariance between two adjacent values of Y.
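As a concrete illustration (not from the slides), here is a minimal Python sketch computing the sample analogues of equations (1)-(3) on a simulated series; the data and all names are invented for the example.

```python
# Sample analogues of equations (1)-(3), computed on simulated white noise,
# which is stationary by construction.
import numpy as np

rng = np.random.default_rng(42)
y = rng.normal(loc=0.0, scale=1.0, size=500)

mu = y.mean()                    # sample analogue of E(Yt) = mu
sigma2 = ((y - mu) ** 2).mean()  # sample analogue of var(Yt) = sigma^2

def autocov(y, k):
    """Sample autocovariance gamma_k = E[(Yt - mu)(Yt+k - mu)]."""
    m = y.mean()
    if k == 0:
        return ((y - m) ** 2).mean()          # gamma_0 equals the variance
    return ((y[:-k] - m) * (y[k:] - m)).mean()

print(mu, sigma2, autocov(y, 0), autocov(y, 1))  # gamma_1 near 0 for white noise
```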
17. Stationarity
A time series has stationarity if a shift in time does not cause a change in the shape of the distribution: basic properties of the distribution, like the mean, variance, and covariance, are constant over time.
18. Why are Stationary Time Series so Important?
• Because if a time series is non-stationary, we can study its
behavior only for the time period under consideration, and as a
consequence, it is not possible to generalize it to other time
periods.
• Therefore, for the purpose of forecasting, such (non-stationary)
time series may be of little practical value.
• Non-stationary Stochastic Processes: Although our interest is in stationary time series, one often encounters non-stationary time series in practice.
• A non-stationary time series will have a time-varying mean or a time-varying variance or both.
19. Transformations to Achieve Stationarity
If the time series is not stationary, we can often transform it to stationarity with one
of the following techniques.
1. We can difference the data. That is, given the series Yt, we create the
new series ΔYt = Yt − Yt−1. The differenced data will contain one less point
than the original data. Although you can difference the data more than
once, one difference is usually sufficient.
2. If the data contain a trend, we can fit some type of curve to the data
and then model the residuals from that fit. Since the purpose of the fit is
to simply remove long term trend, a simple fit, such as a straight line, is
typically used.
3. For non-constant variance, taking the logarithm or square root of the
series may stabilize the variance. For negative data, you can add a
suitable constant to make all the data positive before applying the
transformation. This constant can then be subtracted from the model to
obtain predicted (i.e., the fitted) values and forecasts for future points.
The above techniques are intended to generate series with constant location and
scale. Although seasonality also violates stationarity, it is usually explicitly
incorporated into the time series model.
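A short Python sketch of the three techniques above on a made-up trending series (NumPy only; all names and parameter values are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
t = np.arange(200)
y = np.exp(0.01 * t) + 0.5 * rng.standard_normal(200)  # trending series

# 1. Differencing: the differenced series has one fewer point than the original.
dy = np.diff(y)

# 2. Trend removal: fit a simple straight line and model the residuals from it.
coeffs = np.polyfit(t, y, deg=1)
detrended = y - np.polyval(coeffs, t)

# 3. Log (or square root) transform to stabilise a variance that grows with the
#    level; add a suitable constant first if the series has non-positive values.
log_y = np.log(y - y.min() + 1.0)
```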
20. (Refers to a figure of nine example series, (a)–(i), not reproduced here.)
• Seasonality rules out series (d), (h) and (i).
• Trend rules out series (a), (c), (e), (f) and (i).
• Series (b) and (g) are stationary.
21. Differencing
In the figure (not reproduced here), the Google stock price was non-stationary in panel (a), but the daily changes were stationary in panel (b). This shows one way to make a non-stationary time series stationary: compute the differences between consecutive observations. This is known as differencing.
23. EViews Commands

Transformation         EViews command
Log                    loggdp = log(gdp)
1st Difference         d1loggdp = d(loggdp)
1st Difference + Log   d1loggdp = dlog(gdp)
2nd Difference         d2loggdp = d(d1loggdp), or
                       d2loggdp = d(loggdp,2), or
                       d2loggdp = dlog(gdp,2)
• First take the log of the series and check stationarity, because standard unit root tests assume a linear structure.
• If the log-transformed series is not stationary, then difference the log-transformed series.
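For readers working outside EViews, here is a rough pandas equivalent of the commands above; gdp is a simulated stand-in series, not real data:

```python
import numpy as np
import pandas as pd

gdp = pd.Series(np.exp(np.linspace(7.0, 8.0, 40)))  # stand-in for a GDP series

loggdp = np.log(gdp)             # EViews: loggdp = log(gdp)
d1loggdp = loggdp.diff()         # EViews: d(loggdp), same result as dlog(gdp)
d2loggdp = loggdp.diff().diff()  # EViews: d(loggdp,2) or dlog(gdp,2)
```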
26. Transformations to Achieve Stationarity
A seasonal difference is the difference between an observation and the corresponding observation from the previous year:
Y′t = Yt − Yt−m
where m = the number of seasons (e.g., m = 12 for monthly data). These are also called "lag-m differences", as we subtract the observation after a lag of m periods.
27. US net electricity generation (billion kWh)
• Sometimes it is necessary to do both a seasonal difference and a first difference to obtain stationary data, as shown in the figure (not reproduced here).
• Here, the data are first transformed using logarithms (second panel).
• Then seasonal differences are calculated (third panel).
• The data still seem a little non-stationary, and so a further lot of first differences are computed (bottom panel).
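A hedged pandas sketch of the same pipeline (log, then seasonal difference, then first difference) on a made-up monthly series with m = 12:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(1)
n = 240
idx = pd.date_range("2000-01-01", periods=n, freq="MS")
y = pd.Series(np.exp(0.005 * np.arange(n)
                     + 0.1 * np.sin(2 * np.pi * np.arange(n) / 12)
                     + 0.05 * rng.standard_normal(n)), index=idx)

log_y = np.log(y)      # logarithms (second panel)
seas = log_y.diff(12)  # seasonal, i.e. lag-12, differences (third panel)
both = seas.diff()     # a further first difference (bottom panel)
```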
28. Types of Stationarity
Models can show different types of stationarity:
• Strict stationarity means that the joint distribution of the process (and hence all of its moments, of any order) never depends on time. This definition is in practice too strict to be used for any real-life model.
• First-order stationary series have means that never change with time. Any other statistic (like the variance) can change.
• Second-order stationary (also called weakly stationary) time series have a constant mean, a constant variance, and an autocovariance that does not change with time. Other statistics in the system are free to change over time. This constrained version of strict stationarity is very common.
• Trend-stationary models fluctuate around a deterministic trend (the series mean). These deterministic trends can be linear or quadratic, but the amplitude (height of one oscillation) of the fluctuations neither increases nor decreases across the series.
• Difference-stationary models need one or more differencings to become stationary.
29. White Noise Processes
• We call a stochastic process (time series) a purely random/white noise process if it has zero mean, constant variance σ², and is serially uncorrelated, i.e. ut ∼ IIDN(0, σ²).
• Note: from here onward, the "white noise" assumption applies to ut in all equations.
31. Unit Root Testing: Formal Tests to Establish Stationarity of Time Series
• Dickey-Fuller (DF) Test
• Augmented Dickey-Fuller (ADF) Test
• Phillips-Perron (PP) Unit Root Test
• Dickey-Pantula Unit Root Test
• GLS Transformed Dickey-Fuller Test
• ERS Point Optimal Test
• KPSS Test (run as a complement to the unit root tests)
• Ng and Perron Test
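As an illustration, the ADF and KPSS tests are also available outside EViews; the sketch below uses Python's statsmodels on a simulated random walk (note that the two tests have complementary null hypotheses):

```python
import numpy as np
from statsmodels.tsa.stattools import adfuller, kpss

rng = np.random.default_rng(7)
y = rng.standard_normal(300).cumsum()  # a random walk has a unit root

adf_stat, adf_p, *_ = adfuller(y, autolag="AIC")               # H0: unit root
kpss_stat, kpss_p, *_ = kpss(y, regression="c", nlags="auto")  # H0: stationary

print(f"ADF p-value:  {adf_p:.3f}  (large -> cannot reject a unit root)")
print(f"KPSS p-value: {kpss_p:.3f}  (small -> reject stationarity)")
```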
32. Some Useful Models for Time Series
1. A purely random process,
2. A random walk,
3. A moving average (MA) process,
4. An autoregressive (AR) process,
5. An autoregressive moving average (ARMA) process, and
6. An autoregressive integrated moving average (ARIMA) process.
33. Estimation of AR, MA, and ARMA Models
Testing Goodness of Fit
• When an AR, MA, or ARMA model has been fitted to a given
time series, it is advisable to check that the model does really
give an adequate description of the data
• There are two criteria often used that reflect the closeness of fit
and the number of parameters estimated.
• One is the Akaike Information Criterion (AIC), and the other is
the Schwarz Bayesian Criterion (SBC), also known as the Bayesian
Information Criterion (BIC).
34. The Box-Jenkins Approach
• The Box-Jenkins approach is one of the most widely used
methodologies for the analysis of time-series data
• It is popular because of its generality; it can handle any series,
stationary or not, with or without seasonal elements, and it has
well-documented computer programs.
• Although Box and Jenkins were neither the originators nor the
most important contributors in the field of ARMA models,
they popularized these models and made them readily
accessible to everyone, so much so that ARMA models are sometimes
referred to as Box-Jenkins models.
35. The Box-Jenkins Approach
The basic steps in the Box-Jenkins methodology are
1. Differencing the series so as to achieve Stationarity,
2. Identification of a tentative model,
3. Estimation of the model,
4. Diagnostic checking (if the model is found inadequate, we go
back to step 2), and
5. Using the model for forecasting and control.
36. The Box-Jenkins Approach
1. Differencing to achieve Stationarity: How do we conclude
whether a time series is stationary or not?
• We can do this by studying the graph of the correlogram of the series.
• The correlogram of a stationary series drops off as k, the number of
lags, becomes large, but this is not usually the case for a non-
stationary series.
• Thus, the common procedure is to plot the correlogram of the given
series Yt and of its successive differences ΔYt, Δ²Yt, and so on, and look
at the correlograms at each stage.
• We keep differencing until the correlogram dampens.
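A minimal sketch of this check on a simulated random walk, using statsmodels' plot_acf in place of an EViews correlogram (the data are invented):

```python
import numpy as np
import matplotlib.pyplot as plt
from statsmodels.graphics.tsaplots import plot_acf

rng = np.random.default_rng(3)
y = rng.standard_normal(300).cumsum()  # non-stationary in levels

fig, axes = plt.subplots(2, 1, figsize=(8, 6))
plot_acf(y, ax=axes[0], lags=24, title="Level: acf dies off very slowly")
plot_acf(np.diff(y), ax=axes[1], lags=24, title="First difference: acf dampens")
plt.tight_layout()
plt.show()
```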
37. The Box-Jenkins Approach
2. Once we have used the differencing procedure to get a
stationary time series, we examine the correlogram to decide on
the appropriate orders of the AR and MA components.
• The correlogram of an MA process is zero after a point.
• That of an AR process declines geometrically. The correlograms
of ARMA processes show different patterns (but all dampen
after a while).
• Based on these, one arrives at a tentative ARMA model.
• This step involves more of a judgmental procedure than the use
of any clear-cut rules.
39. The Box-Jenkins Approach
3. The next step is the estimation of the tentative ARMA model
identified in step 2. We have discussed in the preceding section
the estimation of ARMA models.
4. The next step is diagnostic checking, to check the adequacy of the
tentative model. We discussed in the preceding section the Q and
Q* statistics commonly used in diagnostic checking. As argued
there, the Q-statistic is inappropriate in autoregressive models,
and thus we need to replace it with some LM test statistic.
5. The final step is forecasting.
40. Approaches to Economic Forecasting: The Box-Jenkins Approach (flowchart)
Difference the series to achieve stationarity
→ Identify a model to be tentatively entertained
→ Estimate the parameters of the tentative model
→ Diagnostic checking: is the model adequate?
• No: return to the identification step.
• Yes: use the model for forecasting and control.
41. The defining characteristics of AR, MA and ARMA processes:
An autoregressive process has:
• a geometrically decaying acf
• a number of non-zero points of pacf = AR order.
A moving average process has:
• a geometrically decaying pacf.
• number of non-zero points of acf = MA order
A combined autoregressive moving average (ARMA) process has:
• a geometrically decaying acf
• a geometrically decaying pacf.
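These signatures can be checked on simulated data. The sketch below (parameter values are arbitrary) uses statsmodels' ArmaProcess; note that it expects the lag polynomials, i.e. a leading 1 and negated AR coefficients:

```python
import numpy as np
from statsmodels.tsa.arima_process import ArmaProcess
from statsmodels.tsa.stattools import acf, pacf

ar1 = ArmaProcess(ar=[1, -0.7], ma=[1])  # AR(1) with phi = 0.7
ma1 = ArmaProcess(ar=[1], ma=[1, 0.7])   # MA(1) with theta = 0.7

y_ar = ar1.generate_sample(nsample=1000)
y_ma = ma1.generate_sample(nsample=1000)

print("AR(1) acf decays geometrically:   ", np.round(acf(y_ar, nlags=4), 2))
print("AR(1) pacf cuts off after lag 1:  ", np.round(pacf(y_ar, nlags=4), 2))
print("MA(1) acf cuts off after lag 1:   ", np.round(acf(y_ma, nlags=4), 2))
print("MA(1) pacf decays geometrically:  ", np.round(pacf(y_ma, nlags=4), 2))
```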
42. Diagnostic Checking
• Diagnostic checking consists of evaluating the adequacy of the estimated model. Considerable skill is required to choose the actual ARIMA(p,d,q) model so that the residuals estimated from this model are white noise.
• The autocorrelations of the residuals are therefore estimated for the diagnostic checking of the model. They are also judged by the Ljung-Box statistic, under the null hypothesis that the autocorrelation coefficients are jointly equal to zero.
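A minimal sketch of this residual check in Python's statsmodels (the slides use EViews; the series and the ARIMA(1,1,1) order here are invented for illustration):

```python
import numpy as np
from statsmodels.tsa.arima.model import ARIMA
from statsmodels.stats.diagnostic import acorr_ljungbox

rng = np.random.default_rng(5)
y = rng.standard_normal(300).cumsum()  # made-up I(1) series

res = ARIMA(y, order=(1, 1, 1)).fit()
lb = acorr_ljungbox(res.resid, lags=[10], return_df=True)
print(lb)  # a large p-value is consistent with white-noise residuals
```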
43. MA(1) model
Sample autocorrelation and partial autocorrelation functions for an MA(1) model:
• The MA(1) has an acf that is significant for only lag 1, while the pacf declines
geometrically, and is significant until lag 7.
• The acf at lag 1 and all of the pacfs are negative as a result of the negative
coefficient in the MA generating process.
44. MA(2) model
Sample autocorrelation and partial autocorrelation functions for an MA(2) model:
• The first two autocorrelation coefficients only are significant, while the partial
autocorrelation coefficients are geometrically declining.
• Since the second coefficient on the lagged error term in the MA is negative, the
acf and pacf alternate between positive and negative.
45. AR(1) model
Sample autocorrelation and partial autocorrelation functions for an AR(1) model:
• The AR(1) has a pacf that is significant for only lag 1, while the acf declines
geometrically.
• Only the first pacf coefficient is significant, while all others are virtually zero and
are not significant.
46. AR(1) model
Sample autocorrelation and partial autocorrelation functions for an AR(1) model:
• This AR(1) was generated using identical error terms but a much smaller
autoregressive coefficient. In this case, the autocorrelation function dies away
much more quickly than in the previous example, and in fact becomes
insignificant after around five lags.
47. AR(1) model with a unit coefficient
• Sample autocorrelation and partial autocorrelation functions for a non-stationary
model (i.e. a unit coefficient):
• For a non-stationary process, the sample acf may eventually die away, but typically only very slowly.
• The pacf, however, is significant only for lag 1, correctly suggesting that an
autoregressive model with no moving average term is most appropriate.
48. ARMA(1, 1) model
• Sample autocorrelation and partial autocorrelation functions for an ARMA(1, 1)
model:
• In such a process, both the acf and the pacf decline geometrically – the acf as a
result of the AR part and the pacf as a result of the MA part.
49. GDP
• Check through the AIC and BIC criteria.
• Important note: ARIMA model output differs between EViews 8 and later
versions. EViews 8 and earlier versions estimate ARIMA models by the
conditional least squares method, whereas version 9 and newer estimate
ARIMA models (and forecast from them) by maximum likelihood.
• Therefore the AIC and BIC values will also differ between EViews 8 and later
versions.
51. AIC vs SBIC
• When many observations (data points) are available, select the model
chosen by AIC. This is because AIC will always select a higher-order model
than SBIC. A higher-order model (e.g. ARMA(5,5)) means the loss of 5 data
points at the start of the sample, so more data should be available to
compensate for the lags lost.
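A sketch of order selection by information criteria on simulated data (a small grid for speed; this is not the GDP series from the slides):

```python
import numpy as np
from statsmodels.tsa.arima.model import ARIMA

rng = np.random.default_rng(11)
y = rng.standard_normal(400)
for i in range(1, 400):
    y[i] += 0.6 * y[i - 1]  # an AR(1)-style series

results = {}
for p in range(3):
    for q in range(3):
        fit = ARIMA(y, order=(p, 0, q)).fit()
        results[(p, q)] = (fit.aic, fit.bic)

best_aic = min(results, key=lambda k: results[k][0])
best_bic = min(results, key=lambda k: results[k][1])
print("AIC picks", best_aic, "| BIC picks", best_bic)  # BIC favours smaller models
```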
52. Computing Summary Statistics
• UKHP.xls
• Import into EViews.
• Calculate simple percentage changes in the series:
dhp = 100*(hp-hp(-1))/hp(-1)
• To obtain descriptive summary statistics of a series, select Quick/Series Statistics/Histogram and Stats and type the name of the variable (DHP).
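A pandas sketch of the same calculation (UKHP.xls is not reproduced here, so hp below is a made-up stand-in for the house-price series):

```python
import numpy as np
import pandas as pd

hp = pd.Series(100 * np.exp(0.003 * np.arange(120)))  # hypothetical hp series
dhp = 100 * (hp - hp.shift(1)) / hp.shift(1)          # same formula as the EViews line
# equivalently: dhp = 100 * hp.pct_change()

print(dhp.describe())  # mean, std, min, max, etc. of DHP
```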
56. AIC selects an ARMA(4,5), while SBIC selects the smaller ARMA(2,0) model, i.e. an
AR(2).
(The accompanying table of Akaike and Schwarz information criteria for models up to
ARMA(5,5), calculated using EViews, is not reproduced here.)
57. Why forecast?
Some examples in finance of where forecasts from econometric
models might be useful include:
• Forecasting tomorrow’s return on a particular share.
• Forecasting the price of a house given its characteristics.
• Forecasting the riskiness of a portfolio over the next year.
• Forecasting the volatility of bond returns
• Forecasting the correlation between US and UK stock market
movements tomorrow
• Forecasting the likely number of defaults on a portfolio of home
loans.
58. Checking Forecasting Accuracy
• Root Mean Squared Error: the smaller the error, the better the
forecasting ability of that model
• Theil Inequality Coefficient: always lies between zero and one,
where zero indicates a perfect fit.
http://www.eviews.com/help/helpintro.html#page/content/Forecast-Forecast_Basics.html
http://www.eviews.com/help/helpintro.html#page/content%2FForecast-An_Illustration.html%23
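A sketch of both accuracy measures (the actual/forecast numbers are made up; the Theil coefficient is computed as commonly defined, RMSE divided by the sum of the root mean squares of the two series):

```python
import numpy as np

actual = np.array([2.0, 2.5, 2.2, 2.8, 3.0])
forecast = np.array([2.1, 2.4, 2.4, 2.7, 2.9])

rmse = np.sqrt(np.mean((actual - forecast) ** 2))

# Theil inequality coefficient: 0 indicates a perfect fit, 1 the worst fit.
theil_u = rmse / (np.sqrt(np.mean(actual ** 2)) + np.sqrt(np.mean(forecast ** 2)))

print(f"RMSE = {rmse:.4f}, Theil U = {theil_u:.4f}")
```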
59. Diagnostic Checking
• How do we know that the model is a reasonable fit to the data?
• One simple diagnostic is to obtain the residuals from the identified equation and obtain the ACF and PACF of these residuals, say, up to lag 25 (or 1/3 to 1/4 of the total number of observations).
• In the estimated ACF and PACF, none of the autocorrelations and partial autocorrelations should be individually statistically significant, nor should the sum of the 25 squared autocorrelations, as measured by the Box-Pierce Q and Ljung-Box (LB) statistics, be statistically significant.
• In other words, if the correlograms of both the autocorrelations and partial autocorrelations give the impression that the residuals estimated from the equation are purely random, there may be no need to look for another ARIMA model.
60. Diagnostic Checking
Serial Correlation (Autocorrelation)
• Normally, serial correlation (autocorrelation) in a regression model (Y = a + bX) is detected through the Durbin-Watson (DW) statistic.
• However, when the regression model is autoregressive (AR) in nature (Y = a + bYt-1), the Durbin-Watson statistic gives invalid results.
• The Durbin-Watson statistic can in any case only test for serial correlation at one lag (first-order serial correlation).
• Testing serial correlation in EViews:
1. Correlogram:
a. ACF and PACF.
b. The Ljung-Box (LB) Q-statistics and their p-values.
2. The Lagrange Multiplier (LM) test (given by Breusch and Godfrey).
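A minimal statsmodels sketch of the Breusch-Godfrey LM test (the slides do this in EViews; the regression and data below are invented):

```python
import numpy as np
import statsmodels.api as sm
from statsmodels.stats.diagnostic import acorr_breusch_godfrey

rng = np.random.default_rng(9)
x = rng.standard_normal(200)
y = 1.0 + 2.0 * x + rng.standard_normal(200)

res = sm.OLS(y, sm.add_constant(x)).fit()
lm_stat, lm_pval, f_stat, f_pval = acorr_breusch_godfrey(res, nlags=2)
print(f"LM statistic = {lm_stat:.3f}, p-value = {lm_pval:.3f}")
```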
61. Diagnostic Checking (cont.)
• Model accuracy is generally assessed by using the root mean squared error criterion.
• Calculate all the errors, square them, calculate the average, and then take the square root of that average.
• The model with the lowest root mean squared error is judged the most accurate.
62. Autoregressive (AR) Forecasting Equation
• To make forecasts j years into the future using a third-order autoregressive model, AR(3), you need only the most recent p = 3 values (Yn, Yn-1 and Yn-2) and the regression estimates a0, a1, a2, and a3.
To forecast one year ahead, the equation becomes:
Ŷn+1 = a0 + a1·Yn + a2·Yn−1 + a3·Yn−2
To forecast two years ahead:
Ŷn+2 = a0 + a1·Ŷn+1 + a2·Yn + a3·Yn−1
To forecast three years ahead:
Ŷn+3 = a0 + a1·Ŷn+2 + a2·Ŷn+1 + a3·Yn
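A small Python sketch of this recursion, showing how earlier forecasts feed back in as inputs to later ones (the coefficient values and history are invented):

```python
import numpy as np

a = np.array([0.5, 0.6, 0.2, 0.1])  # a0, a1, a2, a3 from a fitted AR(3)
history = [10.0, 10.4, 10.9]        # Y_{n-2}, Y_{n-1}, Y_n

def ar3_forecast(history, a, steps):
    vals = list(history)
    out = []
    for _ in range(steps):
        yhat = a[0] + a[1] * vals[-1] + a[2] * vals[-2] + a[3] * vals[-3]
        out.append(yhat)
        vals.append(yhat)  # the forecast becomes an input at the next step
    return out

print(ar3_forecast(history, a, steps=3))  # one-, two- and three-step forecasts
```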
63. Moving Average (MA) Forecasting Equation
• An autoregression of the residual error time series is called a Moving Average (MA) model. This is confusing, because it has nothing to do with the moving average smoothing process. Think of it as the sibling of the autoregressive (AR) process, except applied to lagged residual errors rather than lagged raw observations.

MA order   Regression equation
MA(1)      Yt = c + ut + θ1·ut−1
MA(2)      Yt = c + ut + θ1·ut−1 + θ2·ut−2

where ut = error.
64. Seasonality
Time series that show regular patterns of movement within a year, across years.
• Seasonal lags are most often included as the value lagged one full year before the current value.
• We detect such patterns through the autocorrelations in the data.
• For quarterly data, the fourth autocorrelation will not be statistically zero if there is quarterly seasonality; for monthly data, the 12th, and so on.
• To correct for seasonality, we can include an additional lagged term to capture the seasonality. For quarterly data, we would include a prior-year quarterly seasonal lag (see the sketch below), as in:
xt = b0 + b1·xt−1 + b2·xt−4 + εt
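A hedged statsmodels sketch of estimating this quarterly seasonal-lag regression by OLS (the quarterly series is simulated; note the lag-4 regressor):

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm

rng = np.random.default_rng(4)
n = 120
x = np.zeros(n)
for t in range(4, n):
    x[t] = 0.3 + 0.5 * x[t - 1] + 0.3 * x[t - 4] + rng.standard_normal()

df = pd.DataFrame({"x": x})
df["lag1"] = df["x"].shift(1)
df["lag4"] = df["x"].shift(4)  # the quarterly seasonal lag
df = df.dropna()

res = sm.OLS(df["x"], sm.add_constant(df[["lag1", "lag4"]])).fit()
print(res.params)  # estimates of b0, b1, b2
```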