1) The document discusses cointegration analysis, which models the complex interdependencies between financial assets. It examines the non-stationary nature of financial time series data and explores vector autoregressive (VAR) models and cointegration techniques to analyze relationships between non-stationary variables.
2) VAR models provide a framework for modeling dynamic relationships between stationary time series variables. The document outlines univariate and multivariate VAR models and discusses estimation and lag order selection for VAR models.
3) Cointegration techniques allow modeling of relationships between non-stationary time series variables. The document reviews tests for identifying stationary and non-stationary time series, including the Augmented Dickey-Fuller and Phillips-Perron tests.
The document discusses the topic of autocorrelation. It begins by defining autocorrelation as data that is correlated with itself over successive time periods, rather than being correlated with other external data. It then explains the concept of autocorrelation and how it violates the classical linear regression assumption that disturbances are independent over time. Several potential sources of autocorrelation are described, including omitted variables, interpolation of data, and misspecification of the random error term. The document concludes by providing mathematical expressions that describe how the mean, variance, and covariance of autocorrelated disturbances differ from the independent case.
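As an illustration of the kind of expressions meant here, the block below works through the common first-order autoregressive, AR(1), case. The AR(1) scheme and the symbols $\rho$, $\varepsilon_t$ and $\sigma_\varepsilon^2$ are assumptions made for this sketch; the summary does not say which error process the document uses.

```latex
% Assumed AR(1) disturbance process
u_t = \rho\, u_{t-1} + \varepsilon_t, \qquad |\rho| < 1, \qquad
\varepsilon_t \sim \text{i.i.d.}(0, \sigma_\varepsilon^2)

% Moments of the autocorrelated disturbance
E(u_t) = 0, \qquad
\operatorname{Var}(u_t) = \frac{\sigma_\varepsilon^2}{1 - \rho^2}, \qquad
\operatorname{Cov}(u_t, u_{t-s}) = \rho^{s}\, \frac{\sigma_\varepsilon^2}{1 - \rho^2}
\quad (s \ge 0)
```

Setting $\rho = 0$ recovers the independent case: the variance reduces to $\sigma_\varepsilon^2$ and every covariance with $s \ge 1$ is zero.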
Dummy variables are used to represent qualitative or categorical variables that take on only two values, usually 0 and 1. A dummy variable indicates the presence or absence of a particular attribute. For example, a dummy variable could represent gender where 1 = male and 0 = female. Dummy variables allow qualitative variables to be used in regression models. However, there is a "dummy variable trap" where including dummy variables for all categories of a qualitative variable leads to perfect multicollinearity. To avoid this, only n-1 dummy variables should be included where there are n categories.
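A minimal sketch of the n-1 rule in plain Python follows. The function name `dummy_encode` and the region data are hypothetical, chosen only for illustration.

```python
def dummy_encode(values, drop_first=True):
    """Return (column_names, rows) of 0/1 dummies for a list of categories."""
    categories = sorted(set(values))
    # Dropping one category (the reference group) avoids the dummy variable trap.
    kept = categories[1:] if drop_first else categories
    rows = [[1 if v == c else 0 for c in kept] for v in values]
    return kept, rows


# Hypothetical categorical variable with n = 3 categories.
regions = ["north", "south", "west", "south", "north"]

names, rows = dummy_encode(regions)            # n - 1 = 2 dummy columns
all_names, all_rows = dummy_encode(regions, drop_first=False)

print(names)   # ['south', 'west']; 'north' is the reference category
for row in rows:
    print(row)
```

With `drop_first=False`, every row of dummies sums to 1, which duplicates the intercept column and produces the perfect multicollinearity described above.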
The document discusses autocorrelation in econometrics models. It defines autocorrelation as a violation of the assumption that errors are independently distributed over time. The key causes of autocorrelation are omitted variables, model misspecification, and systematic measurement errors. Autocorrelation can be first-order, where the current error depends on the previous error, or higher-order. Autocorrelation biases standard errors and test statistics. Methods to detect autocorrelation include graphical analysis of residuals and formal tests like Durbin-Watson and Breusch-Godfrey. Autocorrelation can be resolved by transforming the model when the autocorrelation coefficient is known.
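The Durbin-Watson statistic mentioned above can be sketched in a few lines. The residual series below are made up for illustration and do not come from the document.

```python
def durbin_watson(residuals):
    """Durbin-Watson statistic: sum of squared successive differences
    divided by the sum of squared residuals."""
    num = sum((residuals[t] - residuals[t - 1]) ** 2
              for t in range(1, len(residuals)))
    den = sum(e * e for e in residuals)
    return num / den


# Hypothetical residuals: long runs of the same sign (positive autocorrelation)
persistent = [1.0, 0.8, 0.9, 0.7, -0.6, -0.8, -0.9, -0.7]
# Hypothetical residuals: sign flips every period (negative autocorrelation)
alternating = [1.0, -1.0, 1.0, -1.0, 1.0, -1.0]

dw_persistent = durbin_watson(persistent)
dw_alternating = durbin_watson(alternating)
print(dw_persistent, dw_alternating)
```

The statistic lies between 0 and 4 and is roughly 2(1 - r), where r is the first-order autocorrelation of the residuals. Values well below 2 point to positive autocorrelation and values well above 2 to negative autocorrelation; a formal decision needs the tabulated lower and upper bounds.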
The document discusses heteroscedasticity, which occurs when the variance of the error term is not constant. It defines heteroscedasticity and provides potential causes, such as errors increasing with an independent variable or model misspecification. Consequences are that OLS estimates are no longer BLUE and standard errors are biased. Several tests for detecting heteroscedasticity are outlined, including the Park, Glejser, Spearman rank correlation, and Goldfeld-Quandt tests. The Goldfeld-Quandt test involves ordering the data, dividing it into groups, and comparing the residual sums of squares from separate regressions to test whether the error variance differs between groups.
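A rough sketch of the Goldfeld-Quandt idea for a single regressor is shown below, comparing residual sums of squares as in the standard form of the test. The helper names, the number of dropped central observations, and the synthetic data (a deterministic disturbance whose size grows with x) are all assumptions made for illustration.

```python
def ols_rss(x, y):
    """Residual sum of squares from a simple regression of y on x with intercept."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = my - slope * mx
    return sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))


def goldfeld_quandt(x, y, drop=0):
    """Sort by x, omit `drop` central observations, fit each half separately,
    and return the ratio of residual variances with its degrees of freedom."""
    pairs = sorted(zip(x, y))
    n = len(pairs)
    half = (n - drop) // 2
    low, high = pairs[:half], pairs[n - half:]
    df = half - 2  # two estimated parameters per sub-regression
    f_stat = (ols_rss(*zip(*high)) / df) / (ols_rss(*zip(*low)) / df)
    return f_stat, df


# Synthetic data: the disturbance alternates in sign and grows in size with x.
x = list(range(1, 41))
y = [2 + 0.5 * xi + ((-1) ** xi) * 0.1 * xi for xi in x]

f_stat, df = goldfeld_quandt(x, y, drop=4)
print(f_stat, df)
```

Under the null of constant variance the ratio follows an F distribution with (df, df) degrees of freedom, so a value far above 1 is evidence that the error variance rises with x.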
This document discusses heteroscedasticity, which occurs when the error variance is not constant. It provides examples of when the variance of errors may change, such as with income level or outliers. Graphical methods are presented for detecting heteroscedasticity by examining patterns in residual plots. Formal tests are also described, including the Park test which regresses the log of the squared residuals on explanatory variables, and the Glejser test which regresses the absolute value of residuals on variables related to the error variance. Detection of heteroscedasticity is important as it violates assumptions of the classical linear regression model.
Ragui Assaad, University of Minnesota
Caroline Krafft, St. Catherine University
ERF Training on Applied Micro-Econometrics and Public Policy Evaluation
Cairo, Egypt July 25-27, 2016
www.erf.org.eg
This document discusses heteroskedasticity in multiple linear regression models. Heteroskedasticity occurs when the variance of the error term is not constant, violating the assumption of homoskedasticity. If heteroskedasticity is present, ordinary least squares (OLS) estimates are still unbiased but the standard errors are biased. Various tests for heteroskedasticity are presented, including the Breusch-Pagan and White tests. Weighted least squares (WLS) methods like feasible generalized least squares (FGLS) can produce more efficient estimates than OLS when the form of heteroskedasticity is known or can be estimated.
Identification problem in simultaneous equations model (GarimaGupta229)
This presentation explains the identification problem, and why it arises, using the example of supply and demand equilibrium. It also introduces the rank and order conditions.
For further explanation, check out the YouTube link:
https://youtu.be/PyU_RJJspfE
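The order condition mentioned in this summary can be sketched as a simple counting rule. The function name and the two-equation supply and demand system below are hypothetical, chosen for illustration, and are not taken from the presentation.

```python
def order_condition(n_exog_system, n_exog_in_eq, n_endog_in_eq):
    """Order condition: compare the exogenous variables excluded from an
    equation with the number of endogenous variables it contains, minus one."""
    excluded = n_exog_system - n_exog_in_eq
    needed = n_endog_in_eq - 1
    if excluded < needed:
        return "under-identified"
    if excluded == needed:
        return "exactly identified"
    return "over-identified"


# Hypothetical system with one exogenous variable (income):
#   demand: Q = a0 + a1*P + a2*Income + u
#   supply: Q = b0 + b1*P + v
demand = order_condition(n_exog_system=1, n_exog_in_eq=1, n_endog_in_eq=2)
supply = order_condition(n_exog_system=1, n_exog_in_eq=0, n_endog_in_eq=2)

print("demand:", demand)  # demand: under-identified
print("supply:", supply)  # supply: exactly identified
```

Income shifts the demand curve and so traces out the supply curve, which is why the supply equation is the one that is identified. The order condition is only necessary; the rank condition is still needed for sufficiency.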
Advanced Econometrics by Sajid Ali Khan Rawalakot: 0334-5439066 (Sajid Ali Khan)
This document appears to be the introduction or table of contents to a textbook on advanced econometrics. It includes 10 chapters that cover topics such as simple linear regression, multiple linear regression, dummy variables, autocorrelation, and simultaneous equation systems. The introduction defines econometrics and discusses its goals of policy making, forecasting, and analyzing economic theories using quantitative methods. It also outlines the methodology of econometrics, which involves stating an economic theory, specifying mathematical and statistical models, collecting data, estimating parameters, testing hypotheses, forecasting, and using models for control or policy purposes.
Autocorrelation measures the correlation of a time series with its own past (lagged) values. It exists when observations in a time series are correlated with each other. The Durbin-Watson test can detect the presence of autocorrelation by examining the residuals of a regression model. If autocorrelation is present, it violates the assumption that errors are independent and leads to inaccurate test statistics and predictions. Common structures for autocorrelation include autoregressive (AR), moving average (MA), and autoregressive moving average (ARMA) processes.
This document provides an overview of vector autoregression (VAR) and vector error correction models (VECM) as time series methodologies. It discusses what the acronyms stand for, the practical benefits of using a VAR, how to set up and estimate a VAR, and how to interpret the results through impulse response functions and variance decompositions. A key point is that while VAR parameter estimates are often insignificant, the models can be useful for generating ancillary results like impulse responses and variance decompositions to analyze dynamic relationships between variables over time.
The document discusses panel data analysis and its application to analyzing competition in the UK banking sector. It summarizes:
1) Panel data has both time series and cross-sectional dimensions, allowing examination of how variables change over time for the same objects. A fixed effects model accounts for heterogeneity across objects.
2) A study analyzed competition in UK banking from 1980-2004 using a fixed effects panel data model. It tested for market equilibrium and calculated a contestability parameter to indicate the degree of competition.
3) The results found evidence of equilibrium and showed the contestability parameter fell from 0.78 to 0.46, suggesting competition weakened over the period.
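To make the fixed effects idea in point 1 concrete, here is a minimal sketch of the within (demeaning) estimator for one regressor. The two-bank data set is invented for illustration and has nothing to do with the UK banking study's data.

```python
from collections import defaultdict


def within_estimator(entity, x, y):
    """Fixed effects slope: demean x and y within each entity, then run OLS
    on the demeaned data (entity-specific intercepts drop out)."""
    groups = defaultdict(list)
    for e, xi, yi in zip(entity, x, y):
        groups[e].append((xi, yi))
    sxx = sxy = 0.0
    for obs in groups.values():
        mx = sum(o[0] for o in obs) / len(obs)
        my = sum(o[1] for o in obs) / len(obs)
        for xi, yi in obs:
            sxx += (xi - mx) ** 2
            sxy += (xi - mx) * (yi - my)
    return sxy / sxx


# Hypothetical panel: y = alpha_i + 2*x, with alpha_A = 10 and alpha_B = -5.
entity = ["A", "A", "A", "B", "B", "B"]
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [12.0, 14.0, 16.0, 3.0, 5.0, 7.0]

beta_fe = within_estimator(entity, x, y)
print(beta_fe)  # 2.0
```

A pooled regression on the same data would give a negative slope, because the entity with larger x values has the lower intercept. Demeaning within each entity removes that heterogeneity and recovers the common slope.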
1) Autocorrelation refers to correlation between members of a time series or cross-sectional data set ordered by time or space.
2) In a time series, successive errors are often correlated, violating the assumption of independent errors in a linear regression model.
3) Autocorrelation occurs when there is correlation between a variable and its own past or lagged values, while serial correlation refers to correlation between two different time series.
This document summarizes the key assumptions and properties of Ordinary Least Squares (OLS) regression. OLS aims to minimize the sum of squared residuals by estimating the beta coefficients. It provides the best linear unbiased estimates if its assumptions are met. The key assumptions are: 1) the regression is linear in parameters; 2) the error term has a mean of zero; 3) the error term is uncorrelated with the independent variables; 4) there is no serial correlation or autocorrelation in the error term; 5) the error term has constant variance (homoskedasticity); and 6) there is no perfect multicollinearity among independent variables. When all assumptions are met, OLS estimates
This document discusses panel data analysis. Some key points:
- Panel data combines cross-sectional and time series data to observe multiple subjects over time in balanced and unbalanced panels.
- Panel data is useful for reducing noise, studying dynamic changes, and addressing issues with limited data availability.
- Choosing between fixed effects and random effects models depends on tests like the Hausman test and whether the unobserved effects are correlated with regressors.
- Panel data regression techniques like pooled mean group allow for heterogeneity across subjects while assuming some parameters are the same.
This document discusses Granger causality and how to test for it. It provides the following key points:
1) Granger causality measures whether variable A occurs before variable B and helps predict B, but does not guarantee true causality. If A does not Granger cause B, one can be more confident A does not cause B.
2) To test for Granger causality, autoregressive models are developed with and without the variable being tested, and an F-test or t-test is used to see if adding the variable significantly lowers the residuals.
3) The document applies this to test if changes in loans Granger cause changes in deposits using quarterly U.S. financial
Unit Root Test
1: What is a unit root?
2: How to check for a unit root?
3: Types of unit root test
4: Dickey-Fuller
5: Augmented Dickey-Fuller
6: Phillips-Perron
7: Testing for a unit root in EViews
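As a rough sketch of the basic Dickey-Fuller regression listed in this outline, the code below regresses the first difference of a series on a constant and its lagged level and reports the t-ratio on the lag coefficient. The simulated series, the seed, and the function name are assumptions for illustration; the augmented version would add lagged differences as extra regressors.

```python
import math
import random


def dickey_fuller(y):
    """Regress dy_t on a constant and y_{t-1}; return (gamma, t-ratio of gamma)."""
    dy = [y[t] - y[t - 1] for t in range(1, len(y))]
    lag = y[:-1]
    n = len(dy)
    mx, my = sum(lag) / n, sum(dy) / n
    sxx = sum((x - mx) ** 2 for x in lag)
    gamma = sum((x - mx) * (d - my) for x, d in zip(lag, dy)) / sxx
    alpha = my - gamma * mx
    rss = sum((d - alpha - gamma * x) ** 2 for x, d in zip(lag, dy))
    se_gamma = math.sqrt(rss / (n - 2) / sxx)
    return gamma, gamma / se_gamma


# Simulated data: a random walk (unit root) and a stationary AR(1) with rho = 0.5.
random.seed(1)
shocks = [random.gauss(0, 1) for _ in range(500)]
walk, ar1 = [0.0], [0.0]
for e in shocks:
    walk.append(walk[-1] + e)
    ar1.append(0.5 * ar1[-1] + e)

gamma_walk, tau_walk = dickey_fuller(walk)
gamma_ar1, tau_ar1 = dickey_fuller(ar1)
print("random walk:", gamma_walk, tau_walk)
print("stationary AR(1):", gamma_ar1, tau_ar1)
```

The t-ratio does not follow the usual t distribution under the unit root null. It has to be compared with Dickey-Fuller critical values, about -2.86 at the 5% level for a regression with a constant in large samples, and the null of a unit root is rejected only when the statistic is more negative than that.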
The document discusses using a vector autoregression (VAR) model to forecast two time series - leads and binds - that interact with each other. A 5-period VAR model is found to best capture the weekly periodicity between the series. The model is shown to accurately forecast leads 1-11 days in advance, within 2% error, and binds within 5% error over a two week period, indicating the interaction between the series can be used to predict each going forward. Some conclusions drawn are that the VAR model performs well but could be improved by trying other techniques or adding external variables.
Heteroscedasticity occurs when the variance of the error terms in a regression model is not constant but instead varies with the values of the independent variables. While ordinary least squares estimators remain unbiased, their standard errors may be incorrect under heteroscedasticity. This means that confidence intervals and hypothesis tests based on the usual standard errors are unreliable and can lead to misleading conclusions.
1. A VAR model comprises multiple time series and is an extension of the autoregressive model that allows for feedback between variables.
2. The optimal lag length is chosen using information criteria like AIC and BIC to balance model fit and complexity.
3. Cointegration testing determines whether variables have a long-run relationship and whether a VECM or VAR in differences should be specified.
This document discusses testing for non-stationarity and unit roots in time series data. It introduces the Augmented Dickey-Fuller (ADF) test and Phillips-Perron test for determining if a time series is integrated of order zero (I(0)), one (I(1)), or two (I(2)). The ADF test regresses the change in a variable on its lagged level and on lags of the change to test for a unit root. If the null of a unit root is not rejected, further tests are needed to determine higher orders of integration. While ADF and Phillips-Perron tests are commonly used, their power is low if the process is near but not at the non-station
Time series data are observations collected over time on one or more variables. Time series data can be used to analyze problems involving changes over time, such as stock prices, GDP, and exchange rates. For standard regression methods, the data should be stationary, meaning that statistical properties like the mean and variance do not change over time; otherwise there is a risk of spurious regressions. Non-stationary time series can often be made stationary by differencing or removing trends, with a log transformation commonly applied first to stabilize the variance. Common time series models like ARIMA rely on stationary data.
This document discusses simultaneous equation models and issues that arise when estimating them. It introduces the concepts of structural and reduced form equations. Estimating structural equations individually using OLS will result in biased coefficients due to endogeneity. However, the reduced form equations can be estimated consistently using OLS as their right-hand side variables are exogenous. Identification issues may also arise if not enough information is present to separately estimate the structural parameters. Tests are discussed to check for exogeneity of variables.
The document discusses heteroskedasticity in econometrics. It defines heteroskedasticity as unequal variance of error terms compared to homoskedasticity which is equal variance. This violates assumptions of ordinary least squares regression. Heteroskedasticity does not bias estimates but makes them inefficient. The document outlines various tests to detect heteroskedasticity including graphical methods and formal tests. It also discusses methods to resolve heteroskedasticity such as generalized least squares, weighted least squares, and heteroskedasticity-consistent standard errors.
This document discusses autocorrelation in time series data and its effects on regression analysis. It defines autocorrelation as errors in one time period carrying over into future periods. Autocorrelation can be caused by factors like inertia in economic cycles, specification bias, lags, and nonstationarity. While OLS estimators remain unbiased with autocorrelation, they become inefficient and hypothesis tests are invalid. Autocorrelation can be detected using graphical analysis or formal tests like the Durbin-Watson test and Breusch-Godfrey test. The Cochrane-Orcutt procedure is also described as a way to transform data and remove autocorrelation.
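A single step of the Cochrane-Orcutt idea can be sketched as follows: estimate the autocorrelation coefficient from the OLS residuals, then quasi-difference the data with it. The function names and the toy residual series are hypothetical; the full procedure would transform both the dependent and explanatory variables, re-estimate the regression, and repeat until the estimate of rho settles.

```python
def estimate_rho(residuals):
    """Slope from regressing e_t on e_{t-1} without an intercept."""
    num = sum(residuals[t] * residuals[t - 1] for t in range(1, len(residuals)))
    den = sum(e * e for e in residuals[:-1])
    return num / den


def quasi_difference(series, rho):
    """Transform z_t into z_t - rho * z_{t-1} (the first observation is lost)."""
    return [series[t] - rho * series[t - 1] for t in range(1, len(series))]


# Toy residuals that decay by exactly one half each period.
residuals = [1.0, 0.5, 0.25, 0.125]

rho_hat = estimate_rho(residuals)
transformed = quasi_difference(residuals, rho_hat)
print(rho_hat)      # 0.5
print(transformed)  # [0.0, 0.0, 0.0]
```

In this toy case the transformation removes the serial dependence completely; with real data the same transformation is applied to y and to each regressor before OLS is run again.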
This document discusses multicollinearity in regression analysis. It defines multicollinearity as an exact or near-exact linear relationship between explanatory variables. In cases of perfect multicollinearity, individual regression coefficients cannot be estimated. Near or imperfect multicollinearity is more common in real data and can lead to less precise coefficient estimates with wider confidence intervals. The document discusses various methods for detecting multicollinearity, such as auxiliary regressions and variance inflation factors, and potential remedies like dropping or transforming variables. However, multicollinearity diagnosis depends on the specific data sample and goals of the analysis.
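The variance inflation factor mentioned above can be sketched for the two-regressor case, where the auxiliary regression R-squared is just the squared correlation between the two variables. The data and function name are invented for illustration; with more regressors, each VIF would need a full auxiliary regression on all the others.

```python
def vif_two_regressors(x1, x2):
    """VIF = 1 / (1 - R^2), where R^2 is the squared correlation of x1 and x2."""
    n = len(x1)
    m1, m2 = sum(x1) / n, sum(x2) / n
    s11 = sum((a - m1) ** 2 for a in x1)
    s22 = sum((b - m2) ** 2 for b in x2)
    s12 = sum((a - m1) * (b - m2) for a, b in zip(x1, x2))
    r_squared = s12 ** 2 / (s11 * s22)
    return 1.0 / (1.0 - r_squared)


# Hypothetical regressors.
x1 = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
near_copy = [2.1, 3.9, 6.2, 7.8, 10.1, 11.9]   # roughly 2 * x1
unrelated = [1.0, -1.0, 1.0, -1.0, 1.0, -1.0]  # weakly related to x1

vif_high = vif_two_regressors(x1, near_copy)
vif_low = vif_two_regressors(x1, unrelated)
print(vif_high, vif_low)
```

A VIF close to 1 indicates little collinearity. A common rule of thumb treats values above 10 as a sign of serious multicollinearity, although, as the summary notes, the diagnosis depends on the sample and the purpose of the analysis.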
This document discusses the use of dummy variables in econometric modeling. It begins by explaining that some variables cannot be quantified numerically and provides examples where dummy variables would be used. It then discusses how dummy variables are incorporated into regression models, including intercept dummy variables, slope dummy variables, and dummy variables for multiple categories. The document also covers seasonal dummy variables and concludes by explaining the Chow test and dummy variable test for testing structural stability using dummy variables.
This document discusses autocorrelation, which occurs when there is a correlation between members of a series of observed data ordered over time or space. This violates an assumption of classical linear regression that error terms are uncorrelated. Causes of autocorrelation include inertia in macroeconomic data, specification bias from excluded or incorrectly specified variables, lags, data manipulation, and non-stationarity of time series data. Autocorrelation can be detected graphically or using the Durbin-Watson and Breusch-Godfrey tests. Remedial measures include first-difference transformation, generalized transformation, and using Newey-West standard errors.
The document provides an overview of correlation and regression analysis, time series models, and cost indexes. It defines correlation and regression analysis and describes their importance and applications. It discusses simple linear regression equations, assumptions, and hypothesis testing. It also covers multiple linear regression, moving averages, exponential smoothing, and quantitative measures for evaluating time series models. The document serves as the agenda for the Advanced Economics for Engineers course taught by Leemary Berrios, Irving Rivera, and Wilfredo Robles.
This document discusses concepts and techniques for time series analysis. It defines a time series as any series of data that varies over time, and provides examples like GDP and stock prices. It outlines precautions for using time series data in econometric models, such as checking for stationarity and guarding against spurious regressions. Stationary and non-stationary time series are defined, and unit root tests like the Dickey-Fuller test are introduced. The concepts of cointegration and error correction models are also covered, along with the Granger causality test.
This document discusses concepts and techniques for time series analysis. It defines a time series as any series of data that varies over time, and provides examples like GDP and stock prices. It outlines precautions for using time series data in econometric models, such as checking for stationarity and guarding against spurious regressions. Stationary and non-stationary time series are defined, and unit root tests like the Dickey-Fuller test are introduced. The concepts of cointegration and error correction models are also covered, along with the Granger causality test.
ders 3.3 Unit root testing section 3 .pptxErgin Akalpler
The document discusses various unit root tests used to determine if a time series is stationary or non-stationary. It describes the Dickey-Fuller test and Augmented Dickey-Fuller test, which test for a unit root in a time series. The Augmented Dickey-Fuller test extends the Dickey-Fuller test by including lagged difference terms to account for autocorrelation. The tests are used to distinguish between trend-stationary and difference-stationary processes, which have different implications for forecasting and detecting spurious relationships between variables.
This document is a thesis that examines potential cointegration among biotech stocks. It begins with an introduction that provides background on cointegration and discusses how little is known about cointegration of individual stocks. The document then reviews relevant theoretical frameworks for unit root testing and cointegration testing. It also discusses characteristics of biotech stocks and criteria for stock valuation. The empirical analysis will apply unit root and cointegration tests to selected biotech stock data. The results may provide insights into whether these stocks share a common stochastic trend.
This document compares different methods for disaggregating low frequency economic time series data into higher frequency data: Chow-Lin (static model), Fernandez (static model), Litterman (static model), and Santo Silvacardoso (dynamic model). The Chow-Lin, Fernandez, and Litterman models are static, while Santo Silvacardoso uses a dynamic regression model. The models were used to disaggregate annual private consumption expenditure data into monthly data. Results showed that all methods produced high correlation between original and disaggregated data annually. At the monthly level, Santo Silvacardoso performed best with the lowest standard deviation, while Litterman performed worst.
6. bounds test for cointegration within ardl or vecm Quang Hoang
This document discusses using the bounds test approach within an autoregressive distributed lag (ARDL) model to test for cointegration and causality between time series variables. The ARDL model estimates error correction models involving the change in one variable (ΔYt or ΔXt) regressed on lags of itself and the other variable. The bounds test involves calculating an F-statistic and comparing it to critical value bounds - if the F-statistic exceeds the upper critical value bounds, then there is cointegration, and if it falls below the lower bounds, then there is no cointegration. The document provides the null and alternative hypotheses for the bounds test when each variable is the dependent variable in the error correction model. It also outlines the
Knowledge of cause-effect relationships is central to the field of climate science, supporting mechanistic understanding, observational sampling strategies, experimental design, model development and model prediction. While the major causal connections in our planet's climate system are already known, there is still potential for new discoveries in some areas. The purpose of this talk is to make this community familiar with a variety of available tools to discover potential cause-effect relationships from observed or simulation data. Some of these tools are already in use in climate science, others are just emerging in recent years. None of them are miracle solutions, but many can provide important pieces of information to climate scientists. An important way to use such methods is to generate cause-effect hypotheses that climate experts can then study further. In this talk we will (1) introduce key concepts important for causal analysis; (2) discuss some methods based on the concepts of Granger causality and Pearl causality; (3) point out some strengths and limitations of these approaches; and (4) illustrate such methods using a few real-world examples from climate science.
This document discusses several methods for temporal disaggregation, which is the process of estimating higher frequency data (e.g. monthly or daily) from observed lower frequency data (e.g. quarterly or yearly). It describes the Chow Lin method, which uses a linear model and regression to distribute errors among estimated high frequency values. It also discusses extensions by Fernandez and Litterman that allow for non-stationary errors by modeling the error process as a random walk or AR(1) process. The key steps of each method are outlined.
This document reviews testing for causality between variables. It begins by defining Granger causality, which tests whether including one time series helps forecast another. For bivariate systems, causality can be tested by examining coefficients in a vector autoregression (VAR) model. For multivariate systems, causality is more complex and graphical models may help. The document outlines procedures for testing causality between stationary and nonstationary time series using impulse responses, vector autoregressive moving average (VARMA) models, and other techniques. It provides examples and discusses challenges like potential omitted common factors.
Spillover Dynamics for Systemic Risk Measurement Using Spatial Financial Time...SYRTO Project
Spillover Dynamics for Systemic Risk Measurement Using Spatial Financial Time Series Models. Andre Lucas. Amsterdam - June, 25 2015. European Financial Management Association 2015 Annual Meetings.
This document discusses linear correlation and linear regression. It defines linear correlation as showing the linear relationship between two continuous variables, while linear regression is a multivariate technique used when the outcome is continuous that provides slopes. Linear regression assumes a linear relationship between an independent and dependent variable, normally distributed dependent variable values, equal variances, and independence of observations. Least squares estimation is used to calculate the intercept and slope that minimize the squared differences between observed and predicted dependent variable values. The slope's significance can be tested using a t-test.
This document discusses linear correlation and linear regression. It defines linear correlation as showing the linear relationship between two continuous variables, while linear regression analyzes the relationship between a continuous outcome (dependent) variable and one or more independent (predictor) variables. Linear regression finds the line of best fit to model this relationship and estimates coefficients that can be tested for statistical significance. The assumptions of linear regression include a linear relationship between variables, normally distributed errors, homogeneity of variance, and independent observations.
This document discusses linear correlation and linear regression. It defines linear correlation as showing the linear relationship between two continuous variables, while linear regression analyzes the relationship between a continuous outcome (dependent) variable and one or more independent (predictor) variables. Linear regression finds the line of best fit to model this relationship and estimates coefficients that can be used to predict the outcome variable based on the independent variables. Key assumptions of linear regression include a linear relationship between variables, normally distributed errors, homogeneity of variance, and independence of observations. The significance of regression coefficients can be tested using t-tests and the standard error of the coefficients is also discussed.
Slideset Simple Linear Regression models.pptrahulrkmgb09
This document discusses linear correlation and linear regression. It defines linear correlation as showing the linear relationship between two continuous variables, while linear regression is a multivariate technique used when the outcome is continuous that provides slopes. Linear regression assumes a linear relationship between an independent and dependent variable, normally distributed dependent variable values, equal variances, and independence of observations. It estimates a slope and intercept through least squares estimation to minimize the squared distances between observed and predicted dependent variable values. The significance of the estimated slope can be tested using a t-test.
This document discusses linear correlation and linear regression. It defines linear correlation as showing the linear relationship between two continuous variables, while linear regression analyzes the relationship between a continuous outcome (dependent) variable and one or more independent (predictor) variables. Linear regression finds the line of best fit to model this relationship and estimates coefficients that can be tested for statistical significance. The assumptions of linear regression include a linear relationship between variables, normally distributed errors, homogeneity of variance, and independent observations.
This document discusses linear correlation and linear regression. It defines linear correlation as showing the linear relationship between two continuous variables, while linear regression is a multivariate technique used when the outcome is continuous that provides slopes. Linear regression assumes a linear relationship between an independent and dependent variable, normally distributed errors, equal variances, and independence of observations. The slope is estimated using least squares to minimize the squared differences between observed and predicted values of the dependent variable. Significance of the slope is tested using a t-test.
This document discusses linear correlation and linear regression. It defines linear correlation as showing the linear relationship between two continuous variables, while linear regression is a multivariate technique used when the outcome is continuous that provides slopes. Linear regression assumes a linear relationship between the predictor and outcome variables, normality of the outcome at each value of the predictor, equal variances of the outcome, and independence of observations. It also discusses calculating the slope and intercept via least squares estimation to find the line that best fits the data by minimizing residuals.
This document discusses time series analysis and stationarity testing. It explains that time series data is common in finance and can be observed at different intervals. When using time series data, variables may influence each other with lags and non-stationary variables can cause spurious regressions. The Dickey-Fuller test and Augmented Dickey-Fuller test are introduced to test for a unit root and determine if a time series is stationary or non-stationary. The tests compare a test statistic to critical values, and a failure to reject the null hypothesis of a unit root means the series is non-stationary.
This document discusses numerical methods for solving partial differential equations (PDEs). It begins by classifying PDEs as parabolic, elliptic, or hyperbolic based on their coefficients. It then introduces finite difference methods, which approximate PDE solutions on a grid by replacing derivatives with finite differences. In particular, it describes the forward time centered space (FTCS) scheme for solving the 1D heat equation numerically and analyzing its stability using von Neumann analysis.
Any business and economic applications of forecasting involve time series data. Re-gression models can be fit to monthly, quarterly, or yearly data using the techniques de-scribed in previous chapters. However, because data collected over time tend to exhibit trends, seasonal patterns, and so forth, observations in different time periods are re¬lated or autocorrelated. That is, for time series data, the sample of observations cannot be regarded as a random sample. Problems of interpretation can arise when standard regression methods are applied to observations that are related to one another over time. Fitting regression models to time series data must be done with considerable care.
Similar to Cointegration analysis: Modelling the complex interdependencies between financial assets (20)
STREETONOMICS: Exploring the Uncharted Territories of Informal Markets throug...sameer shah
Delve into the world of STREETONOMICS, where a team of 7 enthusiasts embarks on a journey to understand unorganized markets. By engaging with a coffee street vendor and crafting questionnaires, this project uncovers valuable insights into consumer behavior and market dynamics in informal settings."
1. Elemental Economics - Introduction to mining.pdfNeal Brewster
After this first you should: Understand the nature of mining; have an awareness of the industry’s boundaries, corporate structure and size; appreciation the complex motivations and objectives of the industries’ various participants; know how mineral reserves are defined and estimated, and how they evolve over time.
Economic Risk Factor Update: June 2024 [SlideShare]Commonwealth
May’s reports showed signs of continued economic growth, said Sam Millette, director, fixed income, in his latest Economic Risk Factor Update.
For more market updates, subscribe to The Independent Market Observer at https://blog.commonwealth.com/independent-market-observer.
Falcon stands out as a top-tier P2P Invoice Discounting platform in India, bridging esteemed blue-chip companies and eager investors. Our goal is to transform the investment landscape in India by establishing a comprehensive destination for borrowers and investors with diverse profiles and needs, all while minimizing risk. What sets Falcon apart is the elimination of intermediaries such as commercial banks and depository institutions, allowing investors to enjoy higher yields.
Seminar: Gender Board Diversity through Ownership NetworksGRAPE
Seminar on gender diversity spillovers through ownership networks at FAME|GRAPE. Presenting novel research. Studies in economics and management using econometrics methods.
2. Elemental Economics - Mineral demand.pdfNeal Brewster
After this second you should be able to: Explain the main determinants of demand for any mineral product, and their relative importance; recognise and explain how demand for any product is likely to change with economic activity; recognise and explain the roles of technology and relative prices in influencing demand; be able to explain the differences between the rates of growth of demand for different products.
In a tight labour market, job-seekers gain bargaining power and leverage it into greater job quality—at least, that’s the conventional wisdom.
Michael, LMIC Economist, presented findings that reveal a weakened relationship between labour market tightness and job quality indicators following the pandemic. Labour market tightness coincided with growth in real wages for only a portion of workers: those in low-wage jobs requiring little education. Several factors—including labour market composition, worker and employer behaviour, and labour market practices—have contributed to the absence of worker benefits. These will be investigated further in future work.
What's a worker’s market? Job quality and labour market tightness
Cointegration analysis: Modelling the complex interdependencies between financial assets
1. Cointegration analysis
Modelling the complex interdependencies between financial assets
Dr. Edward Thomas Jones
Bangor Business School
Bangor University
E-mail: e.t.jones@bangor.ac.uk
February 21, 2017
Dr. Edward Thomas Jones Cointegration analysis February 21, 2017 1 / 40
2. Cointegration analysis The non-stationary process
The drunkard and his dog
The notion of cointegration: An adaptation of the drunkard’s walk
4. Background Correlation analysis
Measuring simple relationships
Correlation (e.g. Pearson’s product-moment coefficient) is an easy and well-understood
approach to measuring the relationship between two variables. It is widely used in
academia and industry through risk management and hedging calculations to explain the
relationship between the movement of financial assets as well as economic data series.
$$\rho_{X,Y} = \operatorname{corr}(X, Y) = \frac{\operatorname{cov}(X, Y)}{\sigma_X \sigma_Y} = \frac{E[(X - \mu_X)(Y - \mu_Y)]}{\sigma_X \sigma_Y}$$
where ρX,Y is the correlation coefficient, X and Y are two random variables with
expected values µX and µY and standard deviations σX and σY , and cov(X, Y ) is
the covariance of X and Y .
A correlation value can sit anywhere between -1 (perfect negative relationship) and +1
(perfect positive relationship).
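As a minimal sketch of the formula above, the coefficient can be computed directly from its definition. The function name `pearson_corr` and the toy data are illustrative assumptions, not part of the original slides.

```python
import math

def pearson_corr(x, y):
    """Pearson product-moment correlation: cov(X, Y) / (sigma_X * sigma_Y)."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Population moments are used throughout, so the 1/n factors cancel in the ratio
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / n
    var_x = sum((a - mean_x) ** 2 for a in x) / n
    var_y = sum((b - mean_y) ** 2 for b in y) / n
    return cov / math.sqrt(var_x * var_y)

# Hypothetical toy data: one series moving with x, one moving against it
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y_up = [2.0, 4.0, 6.0, 8.0, 10.0]
y_down = [10.0, 8.0, 6.0, 4.0, 2.0]

print(pearson_corr(x, y_up))    # perfect positive relationship
print(pearson_corr(x, y_down))  # perfect negative relationship
```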
5. Background Correlation analysis
Issues with a simple correlation measure
Although correlation is a popular tool for measuring relationships, it is unstable and
of very limited use (see Pearl, 2000).
- Two equal-sized groups from the same dataset (observations up to a given date and
those after it) could imply very different relationships between the series.
- Correlation analysis is only valid for stationary series; that is, series whose mean
and variance do not change over time (Alexander and Dimitriu, 2002). This
condition usually requires de-trending the series before performing
correlation analysis, which results in the loss of valuable information.
- De-trending the data series before analysis removes any possibility of detecting a
common trend, and the relationship becomes difficult to interpret when
different approaches are taken to de-trend each series. This problem is amplified
when the series are integrated of different orders, so that they must be differenced
a different number of times to become stationary.
6. Stationarity Definition of a stationary series
What is a stationary series?
A stationary series has a mean and variance that do not change over time, and it does not
follow any trend. More strictly, the joint probability distribution does not change over time.
$$E(y_t) = E(y_{t-s}) = \mu$$
$$E[(y_t - \mu)^2] = E[(y_{t-s} - \mu)^2] = \sigma_y^2$$
In addition, there is also covariance stationarity, which requires the autocovariances of
the series to be unaffected by a change in time.
$$E[(y_t - \mu)(y_{t-s} - \mu)] = E[(y_{t-j} - \mu)(y_{t-j-s} - \mu)] = \delta_s$$
where $\mu$, $\sigma_y^2$, and $\delta_s$ are constants.
A series is integrated of order d if it must be differenced d times in order to become
stationary. A stationary series is by definition an I(0) process (i.e. no integration
required).
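The idea of differencing d times can be illustrated with a small sketch. It assumes a simulated random walk, a standard example of an I(1) series; the helper `difference` and the seed are hypothetical choices made for illustration.

```python
import random

def difference(series, d=1):
    """Apply first differencing d times: each pass returns x_t - x_{t-1}."""
    out = list(series)
    for _ in range(d):
        out = [b - a for a, b in zip(out, out[1:])]
    return out

# Assumed example: a random walk y_t = y_{t-1} + shock_t, which is I(1)
random.seed(0)
shocks = [random.gauss(0.0, 1.0) for _ in range(500)]
walk = []
level = 0.0
for e in shocks:
    level += e
    walk.append(level)

# One round of differencing recovers the stationary shocks (the first one is lost)
recovered = difference(walk, d=1)
print(len(walk), len(recovered))
```

Each round of differencing shortens the series by one observation, which is why the recovered series has one element fewer than the walk.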
7. Stationarity Identifying a stationary series
The Dickey Fuller (DF) test
One of three statistical tests can be used to determine whether a data series has a unit
root; that is, whether it is stationary (no unit root) or not (unit root). The Augmented
Dickey Fuller (ADF) test, which is an extension of the Dickey Fuller (DF) test, is the
most commonly used test for stationarity.
The Dickey-Fuller test fits a trend line using the lagged values of the series (xt−1) as
the x-values and the differences between successive data points (xt − xt−1) as the
y-values. In this way, the trend line models the relationship between each point and its
change to the next point. A stationary series would behave in such a way that:
- if the x points are small, they would generally be followed by a large positive shift,
and
- if the x points are large, they would be followed by a large negative shift.
In this way, large points would descend and small points would ascend towards a mean.
Points in the middle would in general be followed by only small shifts, and the overall
gradient of the trend line would be negative.
8. Stationarity Identifying a stationary series
The Augmented Dickey Fuller (ADF) test
This gradient is denoted by a(1) in the equation:
x(t) − x(t−1) = a(0) + a(1)x(t−1) + εt
A unit root corresponds to a(1) = 0, whereas a stationary series has a(1) < 0. A t-test
with special critical values (tabulated by Dickey and Fuller) tells us whether the
estimate of this gradient is indicative of a stationary series or not. The t-statistic is
calculated by dividing the estimate of a(1) by its standard error.
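The regression behind the DF test can be sketched in a few lines. This is an illustrative implementation only: the function name `dickey_fuller_stat`, the simulated series and the seed are assumptions, and the 5% critical value of roughly −2.86 quoted in the comment is the large-sample value for the constant-only case, not a figure from the slides.

```python
import math
import random

def dickey_fuller_stat(x):
    """OLS of (x_t - x_{t-1}) on a constant and x_{t-1}; returns (a1, t_stat)."""
    lagged = x[:-1]
    delta = [b - a for a, b in zip(x, x[1:])]
    n = len(delta)
    mean_l = sum(lagged) / n
    mean_d = sum(delta) / n
    sxx = sum((l - mean_l) ** 2 for l in lagged)
    sxy = sum((l - mean_l) * (d - mean_d) for l, d in zip(lagged, delta))
    a1 = sxy / sxx                      # gradient of the fitted trend line
    a0 = mean_d - a1 * mean_l           # intercept
    rss = sum((d - a0 - a1 * l) ** 2 for l, d in zip(lagged, delta))
    se_a1 = math.sqrt(rss / (n - 2) / sxx)
    return a1, a1 / se_a1

# Assumed test data: a stationary AR(1) series and a random walk
rng = random.Random(42)
stationary = [0.0]
walk = [0.0]
for _ in range(500):
    stationary.append(0.5 * stationary[-1] + rng.gauss(0.0, 1.0))
    walk.append(walk[-1] + rng.gauss(0.0, 1.0))

a1_stat, t_stat = dickey_fuller_stat(stationary)
a1_walk, t_walk = dickey_fuller_stat(walk)

# Compare each t-statistic with the DF critical value (about -2.86 at 5%),
# not with the usual Student-t tables.
print("stationary series:", a1_stat, t_stat)
print("random walk:      ", a1_walk, t_walk)
```

For the stationary series the gradient should be clearly negative, while for the random walk it should be close to zero.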
The ADF test is an extended version of the DF test, which controls for the structural
effects (i.e. autocorrelation) in the time series:
x(t) − x(t−1) = a(0) + b(1)t + a(1)x(t−1) + d(1)Δx(t−1) + . . . + d(p−1)Δx(t−p+1) + εt
where Δx(t−i) = x(t−i) − x(t−i−1), b(1) is the coefficient of the time trend, d(i) are the
coefficients of the lagged differences of the series, and p is the lag order of the
autoregressive process. Once the structural effects have been controlled for by the b(1)
and d(i) coefficients, we are left with testing the a(1) coefficient.
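To see why lagged differences enter the ADF test equation, consider a worked case that is not taken from the slides: an AR(2) process with a constant c and hypothetical coefficients φ1 and φ2 (no time trend). Subtracting x(t−1) from both sides and rearranging gives:

```latex
\begin{align*}
x_t &= c + \phi_1 x_{t-1} + \phi_2 x_{t-2} + \varepsilon_t \\
x_t - x_{t-1} &= c + (\phi_1 - 1)\,x_{t-1} + \phi_2 x_{t-2} + \varepsilon_t \\
              &= c + (\phi_1 + \phi_2 - 1)\,x_{t-1} - \phi_2\,(x_{t-1} - x_{t-2}) + \varepsilon_t
\end{align*}
```

Matching terms gives a(0) = c, a(1) = φ1 + φ2 − 1 and d(1) = −φ2. With p = 2 a single lagged difference is needed, and a unit root (φ1 + φ2 = 1) corresponds to a(1) = 0.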
9. Stationarity Identifying a stationary series
The Augmented Dickey Fuller (ADF) test (cont.)
In some situations, outliers can greatly influence the gradient of a curve. It is advisable
to use a certain amount of judgement in performing this test:
- consider the graphs of the series,
- as well as the graph of the trend line fitted through the scattered x(t) and
(xt − xt−1) points,
- notice outliers that may influence stationarity as well as large shifts in the data
that may cause an otherwise stationary series to appear non-stationary.
By including lags of order p, the ADF formulation allows for higher-order autoregressive
processes. This means that the lag length p has to be determined when applying this
test.
10. Stationarity Identifying a stationary series
The Phillips-Perron (P-P) test
An alternative approach to identifying a unit root is the Phillips-Perron (P-P) test,
which builds upon the ADF test.
The difference between the two tests is their approach to addressing the issue that the
process generating the data series might have a higher order of autocorrelation than is
admitted in the test equation:
- the ADF test addresses this issue by introducing lags of the first differences
(d(i)Δx(t−i)) as regressors in the test equation,
- the P-P test makes a non-parametric correction to the t-test statistic by using
Newey-West standard errors to account for serial correlation.
Given this correction, the P-P test is robust with respect to unspecified autocorrelation
and heteroscedasticity in the disturbance term ε of the test equation.
11. Stationarity Lags selection for use in a statistical test
Identifying the number of lags p
Before calculating a unit root test, it is necessary to decide upon the number of lags
p to be included in the test equation:
- too many lags could increase the error in the forecast,
- too few could leave out relevant information.
The number of lags to be included in the test equation can be chosen by using
either:
- Schwarz’s Bayesian Information Criterion (SBIC),
- Akaike’s Information Criterion (AIC), or
- Hannan and Quinn’s Information Criterion (HQIC).
When all three agree, the lag selection is clear. According to Ivanov and Kilian (2001):
- AIC tends to be more accurate for monthly data,
- HQIC works better for quarterly data on samples over 120 observations, and
- SBIC works fine with any sample size for quarterly data.
In most cases, the preferred model is one that has the fewest parameters to estimate.
12. Modelling dynamic patterns of association Bivariate VAR models
Vector Autoregressive (VAR) models
Vector Autoregressive (VAR) models provide a framework for modelling dynamic
relationships between stationary time series variables (Sims, 1980).
In a bivariate first-order VAR(1) model, there are two variables y1t and y2t . Each
depends partly on the lagged values y1t−1 and y2t−1, and partly on its own random
disturbance term u1t and u2t :
y1t = β10 + β11y1t−1 + α11y2t−1 + u1t
y2t = β20 + α21y1t−1 + β21y2t−1 + u2t
Higher-order VAR models simply add further lagged terms on the right hand-side. A
bivariate VAR(p) model would be:
y1t = β10 + β11y1t−1 + α11y2t−1 + . . . + β1py1t−p + α1py2t−p + u1t
y2t = β20 + α21y1t−1 + β21y2t−1 + . . . + α2py1t−p + β2py2t−p + u2t
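A short simulation may help make the bivariate VAR(1) recursion concrete. This is a sketch under stated assumptions: the function `simulate_var1`, the coefficient values and the Gaussian disturbances are hypothetical and chosen only for illustration. The matrix `b1` is laid out as [[β11, α11], [α21, β21]].

```python
import random

def simulate_var1(b0, b1, n_obs, sigma=1.0, seed=0):
    """Simulate y1t and y2t from a bivariate VAR(1), starting from zero."""
    rng = random.Random(seed)
    y1, y2 = 0.0, 0.0
    path = []
    for _ in range(n_obs):
        u1 = rng.gauss(0.0, sigma)
        u2 = rng.gauss(0.0, sigma)
        # Both equations use the lagged values, so update them simultaneously
        y1, y2 = (
            b0[0] + b1[0][0] * y1 + b1[0][1] * y2 + u1,
            b0[1] + b1[1][0] * y1 + b1[1][1] * y2 + u2,
        )
        path.append((y1, y2))
    return path

# Hypothetical coefficients
path = simulate_var1(b0=[1.0, 2.0], b1=[[0.5, 0.1], [0.2, 0.3]], n_obs=200)
print(path[:3])
```

Setting `sigma=0.0` switches off the disturbances, which makes the deterministic part of the recursion easy to check by hand.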
13. Modelling dynamic patterns of association Multivariate VAR models
Vector Autoregressive (VAR) models (cont.)
A VAR model does not have to be bivariate; it can be multivariate. We could have a
system of equations for three, four or any number of variables. However, as the number
of equations and the lag-length increases, the number of parameters escalates rapidly.
The estimation of larger VAR models can run into degrees of freedom problems of too
many parameters and too few observations.
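How quickly the parameter count grows can be checked with simple arithmetic: each of the m equations has one constant plus m coefficients for each of the p lags. The helper below is a hypothetical illustration of that count.

```python
def var_param_count(m, p, include_constant=True):
    """Number of coefficients in an m-variate VAR(p)."""
    per_equation = m * p + (1 if include_constant else 0)
    return m * per_equation

print(var_param_count(2, 1))  # bivariate VAR(1): 6 coefficients
print(var_param_count(6, 4))  # six variables, four lags: 150 coefficients
```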
Matrix notation provides a more concise representation of VAR models. The bivariate
first-order VAR(1) model . . .
y1t = β10 + β11y1t−1 + α11y2t−1 + u1t
y2t = β20 + α21y1t−1 + β21y2t−1 + u2t
. . . can be written in matrix form as:
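$$\begin{pmatrix} y_{1t} \\ y_{2t} \end{pmatrix} = \begin{pmatrix} \beta_{10} \\ \beta_{20} \end{pmatrix} + \begin{pmatrix} \beta_{11} & \alpha_{11} \\ \alpha_{21} & \beta_{21} \end{pmatrix} \begin{pmatrix} y_{1,t-1} \\ y_{2,t-1} \end{pmatrix} + \begin{pmatrix} u_{1t} \\ u_{2t} \end{pmatrix}$$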
14. Modelling dynamic patterns of association Multivariate VAR models
Vector Autoregressive (VAR) models (cont.)
The matrix formula can be written more concisely as:
Yt = β0 + β1Yt−1 + ut
where Yt , Yt−1, and ut are (2 × 1) column vectors, β0 is a (2 × 1) column vector, and
β1 is a (2 × 2) matrix:
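$$Y_t = \begin{pmatrix} y_{1t} \\ y_{2t} \end{pmatrix},\quad Y_{t-1} = \begin{pmatrix} y_{1,t-1} \\ y_{2,t-1} \end{pmatrix},\quad u_t = \begin{pmatrix} u_{1t} \\ u_{2t} \end{pmatrix},\quad \beta_0 = \begin{pmatrix} \beta_{10} \\ \beta_{20} \end{pmatrix},\quad \beta_1 = \begin{pmatrix} \beta_{11} & \alpha_{11} \\ \alpha_{21} & \beta_{21} \end{pmatrix}$$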
By extension, in the most general case, an m-variate VAR(p) model is written:
Yt = β0 + β1Yt−1 + . . . + βpYt−p + ut
where Yt , Yt−1, and ut are (m × 1), β0 is (m × 1), β1, . . . , βp are (m × m) matrices.
15. Modelling dynamic patterns of association VARs with exogenous variables
Vector Autoregressive (VAR) models (cont.)
It is also possible to specify VARs in which the present values of the endogenous
variables y1t and y2t are determined partly by their own history, and partly by the present
and/or past values of other exogenous variables.
A first-order bivariate VAR with s exogenous variables collected together in the vector
Xt , whose current values influence the current values of the two endogenous variables
y1t and y2t , takes the following form:
Yt = β0 + β1Yt−1 + ΨXt + ut
where Yt , Yt−1, and ut are (2 × 1) vectors, Xt is an (s × 1) vector, β0 is (2 × 1),
β1 is (2 × 2), and Ψ is a (2 × s) matrix.
16. Modelling dynamic patterns of association Estimations of VARs
Vector Autoregressive (VAR) models (cont.)
If T observations on all variables are available, a full set of lagged variables will be
available only for observations t = p + 1, p + 2, . . . , T. Therefore the number of
observations that can be used in the estimation of the VAR(p) model is T − p.
Equation-by-equation OLS could be used to obtain estimates of the coefficients of the
matrices β0, β1, . . . , βp. However, in practice it is usual to estimate these coefficients
using a technique called Seemingly Unrelated Regressions (SUR) Estimation.
When the variables included on the right-hand-side of each equation are identical, SUR
produces the same estimated coefficients as equation-by-equation OLS; but it produces
different standard errors and hypothesis test statistics.
If the variables included on the right-hand-side of each equation are not identical, SUR
and equation-by-equation OLS produce different estimated coefficients.
With SUR, the estimated variance-covariance matrix of the disturbance terms takes into
account any contemporaneous correlation between the disturbance terms of different
equations (this is ignored in equation-by-equation OLS).
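Equation-by-equation OLS for a bivariate VAR(1) can be sketched with the normal equations alone. The example below is illustrative, not the estimation routine used in the slides: the data are simulated with hypothetical coefficients, all function names are assumptions, and SUR itself is not implemented.

```python
import random

def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x

def ols(rows, target):
    """Least-squares coefficients from the normal equations (X'X) b = X'y."""
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * t for r, t in zip(rows, target)) for i in range(k)]
    return solve_linear(xtx, xty)

def fit_var1(path):
    """Equation-by-equation OLS for a bivariate VAR(1); uses T - 1 observations."""
    rows = [[1.0, y1, y2] for y1, y2 in path[:-1]]   # constant and lagged values
    eq1 = ols(rows, [y1 for y1, _ in path[1:]])
    eq2 = ols(rows, [y2 for _, y2 in path[1:]])
    return eq1, eq2

# Simulated data with hypothetical true coefficients
rng = random.Random(1)
y1, y2 = 0.0, 0.0
path = []
for _ in range(3000):
    y1, y2 = (
        0.5 + 0.6 * y1 + 0.1 * y2 + rng.gauss(0.0, 1.0),
        -0.2 + 0.2 * y1 + 0.4 * y2 + rng.gauss(0.0, 1.0),
    )
    path.append((y1, y2))

eq1, eq2 = fit_var1(path)
print("equation 1 (constant, y1 lag, y2 lag):", eq1)
print("equation 2 (constant, y1 lag, y2 lag):", eq2)
```

Because both equations share the same right-hand-side variables, the same regressor rows are reused for each equation.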
17. Modelling dynamic patterns of association Lags selection for use in VAR models
Identifying the number of lags p in VAR models
Information criteria can be used to determine the most appropriate value of p, the
lag-length, based on the determinant of the variance covariance matrix of the
disturbance terms:
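$$\Sigma = E(u_t u_t') \quad \text{where} \quad u_t = \begin{pmatrix} u_{1t} \\ u_{2t} \\ \vdots \\ u_{mt} \end{pmatrix}$$
where $u_{1t}, \ldots, u_{mt}$ are the disturbance terms of the m equations.
$$\Rightarrow \Sigma = \begin{pmatrix} \operatorname{var}(u_{1t}) & \operatorname{cov}(u_{1t}, u_{2t}) & \cdots & \operatorname{cov}(u_{1t}, u_{mt}) \\ \operatorname{cov}(u_{2t}, u_{1t}) & \operatorname{var}(u_{2t}) & \cdots & \operatorname{cov}(u_{2t}, u_{mt}) \\ \vdots & \vdots & \ddots & \vdots \\ \operatorname{cov}(u_{mt}, u_{1t}) & \operatorname{cov}(u_{mt}, u_{2t}) & \cdots & \operatorname{var}(u_{mt}) \end{pmatrix}$$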
18. Modelling dynamic patterns of association Lags selection for use in VAR models
Identifying the number of lags p in VAR models (cont.)
If the uit ’s are normally distributed, the maximum likelihood estimator of Σ is:
$$\Sigma_{EST} = \begin{pmatrix} \sum e_{1t}^2 / n & \sum e_{1t} e_{2t} / n & \cdots & \sum e_{1t} e_{mt} / n \\ \sum e_{2t} e_{1t} / n & \sum e_{2t}^2 / n & \cdots & \sum e_{2t} e_{mt} / n \\ \vdots & \vdots & \ddots & \vdots \\ \sum e_{mt} e_{1t} / n & \sum e_{mt} e_{2t} / n & \cdots & \sum e_{mt}^2 / n \end{pmatrix}$$
where eit is the residual for the t’th observation in the i’th equation of the estimated
model, and n is the number of observations used in the estimation.
The determinant of ΣEST , denoted |ΣEST |, is calculated from the elements of ΣEST . The
determinant of any matrix can be interpreted as a single numerical summary of the
’information’ that is contained in the matrix. |ΣEST | plays a role similar to that of the
residual sum of squares ($\sum e_t^2$) in a single-equation regression.
19. Modelling dynamic patterns of association Lags selection for use in VAR models
Identifying the number of lags p in VAR models (cont.)
The multivariate versions of the Akaike and Schwarz Information Criteria are:
MAIC = ln(|ΣEST|) + 2k/(T − p)
MSIC = ln(|ΣEST|) + [k/(T − p)] ln(T − p)
where k is the total number of estimated coefficients (across all m equations in the
VAR), and (T − p) is the number of observations used in the estimation.
The specification (i.e. the value of p) which produces the smallest MAIC or MSIC
should be selected.
Note, comparison should only be drawn between versions of the model that have been
estimated over the same observations; that is, when comparing a VAR(p) with a
VAR(p-1) model, do not include the extra available observation in the estimation of the
VAR(p-1).
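The lag-selection rule can be sketched in a few lines of NumPy. This is only an illustration under stated assumptions: the data are simulated from a VAR(1) with made-up coefficients, `var_info_criteria` is a hypothetical helper name, and the first `p_max` observations are held back for every candidate p so that all models are compared over the same sample, as the note above requires. In the code, n plays the role of (T − p).

```python
import numpy as np

def var_info_criteria(Y, p, p_max):
    """MAIC and MSIC for a VAR(p) estimated on the sample common to all
    lag orders up to p_max (the first p_max observations are held back)."""
    T, m = Y.shape
    n = T - p_max                                  # observations actually used
    X = np.hstack([np.ones((n, 1))] +
                  [Y[p_max - j:T - j] for j in range(1, p + 1)])
    target = Y[p_max:]
    coefs, *_ = np.linalg.lstsq(X, target, rcond=None)
    E = target - X @ coefs                         # residuals, one column per equation
    sigma_est = E.T @ E / n                        # ML estimate of Sigma
    log_det = np.log(np.linalg.det(sigma_est))
    k = coefs.size                                 # coefficients across all m equations
    maic = log_det + 2 * k / n
    msic = log_det + k * np.log(n) / n
    return maic, msic

# Simulated VAR(1) data (assumed values, for illustration only).
rng = np.random.default_rng(0)
B1 = np.array([[0.5, 0.1],
               [0.2, 0.3]])
T = 1000
Y = np.zeros((T, 2))
for t in range(1, T):
    Y[t] = B1 @ Y[t - 1] + rng.standard_normal(2)

p_max = 4
criteria = {p: var_info_criteria(Y, p, p_max) for p in range(1, p_max + 1)}
best_p_msic = min(criteria, key=lambda p: criteria[p][1])
print(criteria, best_p_msic)
```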
20. Cointegration analysis Definition of cointegration
Non-stationary VAR models
If yt and xt are non-stationary, the estimation of the VAR model is subject to the
’spurious regression’ problem, and modelling cannot proceed without any adjustment. In
some cases, however, it is possible to find a pair of constants, π1 and π2, such that:
vt = yt − π1 − π2xt
is stationary or I(0), even when yt and xt are both non-stationary or I(1). Note that vt is
just a linear function of yt and xt . If this is possible (i.e. if vt is stationary), then yt and
xt are said to be cointegrated.
If two non-stationary time series variables are cointegrated, they tend to ’move together’
over time; in other words, they are bound together by a long-run equilibrium
relationship.
21. Cointegration analysis Definition of cointegration
Non-stationary VAR models (cont.)
Before investigating cointegration, we might originally have been thinking of fitting one
of the following specifications:
yt = β10 + β20xt + α1yt−1 + β21xt−1 + εt
yt = β10 + β20xt + α1yt−1 + β21xt−1 + α2yt−2 + β22xt−2 + εt
. . . or however many lags are required.
Both equations are specified in terms of I(1) variables. However, if yt and xt are
cointegrated, both specifications can easily be rearranged so that they contain I(0)
variables only:
yt − yt−1 = β10 + β20(xt − xt−1) + (α1 − 1)yt−1 + (β21 + β20)xt−1 + εt
⇒ ∆yt = δ20∆xt + Ψ(yt−1 − π1 − π2xt−1) + εt
where δ20 = β20, Ψ = α1 − 1, π1 = β10/(1 − α1), π2 = (β21 + β20)/(1 − α1).
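The step between the two lines of the rearrangement can be written out explicitly for the one-lag specification. Assuming α1 ≠ 1, the factor (α1 − 1) is taken out of the constant and of the terms in lagged levels:

```latex
\begin{aligned}
\Delta y_t &= \beta_{10} + \beta_{20}\Delta x_t + (\alpha_1 - 1)\,y_{t-1}
              + (\beta_{21} + \beta_{20})\,x_{t-1} + \varepsilon_t \\
           &= \beta_{20}\Delta x_t
              + (\alpha_1 - 1)\left[\, y_{t-1}
              - \frac{\beta_{10}}{1 - \alpha_1}
              - \frac{\beta_{21} + \beta_{20}}{1 - \alpha_1}\,x_{t-1} \right]
              + \varepsilon_t
\end{aligned}
```

Matching the bracketed term with (yt−1 − π1 − π2xt−1) gives the stated expressions for π1 and π2, and the coefficient outside the bracket is Ψ = α1 − 1.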
22. Cointegration analysis Definition of cointegration
A general definition of cointegration
The aim of cointegration is to detect any stochastic trends in the series and use these
trends for a dynamic analysis of correlation.
With cointegration, if two or more time series are non-stationary but there exists a
linear combination of them that is stationary then we can correctly conduct
hypothesis testing concerning the relationship between the variables (Engle and Granger, 1987).
The main advantage of cointegration analysis, as compared to the standard measure of
relationship such as correlation, is that it allows the use of the entire information set
when the series are non-stationary.
Furthermore, cointegration is able to explain the long-run behaviour of related series,
whereas correlation usually lacks stability, because it is a short-run measure of
co-dependency.
23. Cointegration analysis Definition of cointegration
A technical definition of cointegration
We say that the components of the vector X̃t are cointegrated of order d, b, which is
denoted by X̃t ∼ CI(d, b), if:
All components of X̃t are integrated of order d,
There exists a vector β̃ such that the linear combination
β̃′X̃t = β1X1t + β2X2t + . . . + βnXnt
is integrated of order (d − b), where b > 0 and β̃ is the Cointegrating Vector (CV).
While the amount of historical data required to support the cointegrating relationship
may be large, the attempt to use the same sample to estimate correlation coefficients
may face many obstacles such as outliers and volatility clustering.
24. Cointegration analysis Definition of cointegration
Important aspects of cointegration
Cointegration refers to a linear combination of non-stationary variables. The CV is
not unique.
For example, if (β1, β2, . . . , βn) is a CV, then for non-zero λ,
(λβ1, λβ2, . . . , λβn) is also a CV. Typically the CV is normalised with respect to
x1t by selecting λ = 1/β1.
All variables must be integrated of the same order. This is a prior condition for the
presence of a cointegrating relationship. The converse is not true; this condition
does not imply that all similarly integrated variables are cointegrated, and in fact
this is usually not the case.
If the vector ˜Xt has n components, there may be as many as (n-1) linearly
independent cointegrating vectors. For example, if n = 2 then there can be at
most one independent CV.
25. Cointegration analysis Cointegration and the error correction model
The error correction model
∆yt = δ20∆xt + Ψ(yt−1 − π1 − π2xt−1) + εt is known as an error correction model. On
the right-hand side:
The term in ∆xt represents the model’s short-run dynamics. It contains
information about the extent to which current changes in xt influence current
changes in yt (i.e. ∆xt influences ∆yt ).
The term Ψ(yt−1 − π1 − π2xt−1), which can also be written Ψvt−1, is known as
the error correction mechanism. Recall yt−1 = π1 + π2xt−1 represents the long-run
equilibrium relationship between xt and yt . Accordingly:
If vt−1 = yt−1 − π1 − π2xt−1 > 0, yt−1 was above its equilibrium value at t − 1;
If vt−1 = yt−1 − π1 − π2xt−1 < 0, yt−1 was below its equilibrium value at t − 1.
We expect Ψ < 0, so that vt−1 > 0 ⇒ Ψvt−1 < 0 and a tendency for ∆yt < 0 (yt
falling), and vt−1 < 0 ⇒ Ψvt−1 > 0 and a tendency for ∆yt > 0 (yt rising). The error
correction model incorporates an adjustment process, pushing yt towards equilibrium at
time t whenever yt was out of equilibrium at time t − 1.
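The adjustment process can be illustrated with a small deterministic sketch. It assumes that xt is held fixed (∆xt = 0) and that the error term is zero, so the error correction model reduces to ∆yt = Ψvt−1 and the disequilibrium follows vt = (1 + Ψ)vt−1. The value Ψ = −0.4, the starting gap and the function name `simulate_adjustment` are illustrative assumptions.

```python
import numpy as np

def simulate_adjustment(psi, v0, steps):
    """Path of the disequilibrium v_t when x is fixed and there are no shocks.

    Under those assumptions delta y_t = psi * v_{t-1}, so
    v_t = v_{t-1} + psi * v_{t-1}.
    """
    v = np.empty(steps + 1)
    v[0] = v0
    for t in range(1, steps + 1):
        v[t] = v[t - 1] + psi * v[t - 1]
    return v

# y starts 2 units above its equilibrium value; psi < 0 pulls it back.
path = simulate_adjustment(psi=-0.4, v0=2.0, steps=5)
print(path)
```

With Ψ between −1 and 0 the gap shrinks by the fraction |Ψ| each period; with Ψ = 0 it would never close.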
26. Cointegration analysis The Engle-Granger’s approach
A two-step approach for identifying cointegration
Engle and Granger (1987) proposed a simple two-step (residual-based) approach for testing
for cointegration and incorporating the relationship into an estimated model. Other tests
have been developed, including the Johansen procedure, which allows for more than one
cointegrating relationship among several series (the Engle and Granger approach can
identify at most one cointegrating relationship).
The first step of the Engle-Granger’s approach is to test each series individually for
their order of integration. If the individual time series are integrated of different
orders then it can be concluded with certainty that they are not cointegrated.
The Engle-Granger procedure is applicable only if both variables are non-stationary
and I(1). Assume that xt and yt are both I(1). If either or both of xt and yt are
non-stationary and I(2), then the procedure could be applied by using the
first-differences of the I(2) variable.
27. Cointegration analysis Engle-Granger’s approach
Engle-Granger’s approach: Step 1
Obtain the estimated cointegration regression: yt = π1 + π2xt + vt.
Save the residuals vt = yt − π1 − π2xt (evaluated at the estimated π1 and π2), and test
vt for stationarity using a DF or ADF-type procedure:
Test H0 : ρ = 0 against H1 : ρ < 0 in one of:
∆vt = ρvt−1 + εt
or ∆vt = ρvt−1 + δ1∆vt−1 + εt
or ∆vt = ρvt−1 + δ1∆vt−1 + δ2∆vt−2 + εt
. . . or however many lags are required.
Accept H0 ⇒ vt is non-stationary ⇒ stop (yt and xt are not cointegrated)
Reject H0 ⇒ vt is stationary ⇒ proceed to step 2 (yt and xt are cointegrated)
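Step 1 can be sketched with NumPy as follows. This is a minimal illustration, not a full implementation: the data are simulated so that yt and xt are cointegrated by construction (with assumed values π1 = 1 and π2 = 2), the helper names `ols` and `engle_granger_step1` are hypothetical, and the resulting t-ratio on ρ would still have to be compared with the appropriate Engle-Granger critical values, which are not reproduced here.

```python
import numpy as np

def ols(X, y):
    """OLS coefficients, residuals and conventional standard errors."""
    b, *_ = np.linalg.lstsq(X, y, rcond=None)
    e = y - X @ b
    s2 = e @ e / (len(y) - X.shape[1])
    se = np.sqrt(s2 * np.diag(np.linalg.inv(X.T @ X)))
    return b, e, se

def engle_granger_step1(y, x, lags=0):
    """Cointegration regression of y on x, then a DF/ADF-type regression
    on the residuals (no constant, no trend).
    Returns (pi_hat, v_hat, t-ratio on rho)."""
    X = np.column_stack([np.ones_like(x), x])
    pi_hat, v, _ = ols(X, y)
    dv = np.diff(v)
    # Regressors: v_{t-1} plus `lags` lagged differences of v.
    cols = [v[lags:-1]] + [dv[lags - j:len(dv) - j] for j in range(1, lags + 1)]
    Z = np.column_stack(cols)
    b, _, se = ols(Z, dv[lags:])
    return pi_hat, v, b[0] / se[0]

# Simulated cointegrated pair (assumed values, for illustration only).
rng = np.random.default_rng(1)
T = 500
x = np.cumsum(rng.standard_normal(T))          # I(1) random walk
y = 1.0 + 2.0 * x + rng.standard_normal(T)     # stationary deviation from 1 + 2x

pi_hat, v_hat, t_rho = engle_granger_step1(y, x, lags=1)
print(pi_hat, t_rho)
```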
28. Cointegration analysis Engle-Granger’s approach
Engle-Granger’s approach: Step 1 (cont.)
AIC or SBIC can be used to select the lag-length for the DF/ADF-type
autoregression. Selecting the correct lag-length is important because the result of
the cointegration test is sensitive to the lag-length chosen.
No constant term or trend is required in the DF/ADF-type autoregression, because
the sample mean of vt is zero and vt is untrended.
A separate set of critical values (produced by Engle and Granger) is required to
determine acceptance or rejection of H0. In the multivariate case, the critical
values are dependent on the number of xt ’s included on the right-hand side of the
cointegration regression.
The standard Dickey-Fuller critical values cannot be used, because {vt } are
residuals from regression of yt on xt . This regression will already have introduced
an element of ’smoothing’ into {vt }, which the Engle-Granger critical values take
into account.
In common with the DF and ADF test, the Engle-Granger cointegration test has
low power in small samples. It is common not to find evidence of cointegration
with this test.
29. Cointegration analysis Engle-Granger’s approach
Engle-Granger’s approach: Step 2
Obtain the estimated error correction model by estimating one of the following using
OLS:
∆yt = δ20∆xt + Ψvt−1 + εt
or ∆yt = δ20∆xt + δ11∆yt−1 + δ21∆xt−1 + Ψvt−1 + εt
. . . or however many lags are required.
AIC or SIC can be used to select the lag-length for the lagged ∆yt ’s and ∆xt ’s in
the error correction model.
It is not possible to perform hypothesis tests on π1 and π2, which is a serious
limitation of the Engle-Granger procedure, but hypothesis tests on δ20, δ11, δ21 and
Ψ can be performed.
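A minimal sketch of Step 2, assuming the simplest specification with no lagged differences. The data are generated from an error correction model with made-up values (δ20 = 0.8, Ψ = −0.3, π1 = 1, π2 = 1); these numbers and the variable names are assumptions for the example only. The residuals from the Step 1 regression are lagged once and used as a regressor alongside ∆xt.

```python
import numpy as np

rng = np.random.default_rng(2)
T = 1000
x = np.cumsum(rng.standard_normal(T))          # I(1) regressor
eps = 0.5 * rng.standard_normal(T)

# Data generated from an ECM with assumed values:
# delta20 = 0.8, psi = -0.3, pi1 = 1.0, pi2 = 1.0
y = np.empty(T)
y[0] = 1.0 + x[0]
for t in range(1, T):
    v_prev = y[t - 1] - 1.0 - 1.0 * x[t - 1]
    y[t] = y[t - 1] + 0.8 * (x[t] - x[t - 1]) - 0.3 * v_prev + eps[t]

# Step 1: cointegration regression and its residuals.
X1 = np.column_stack([np.ones(T), x])
pi_hat, *_ = np.linalg.lstsq(X1, y, rcond=None)
v_hat = y - X1 @ pi_hat

# Step 2: OLS of delta y_t on delta x_t and the lagged residual v_{t-1}.
dy, dx = np.diff(y), np.diff(x)
X2 = np.column_stack([dx, v_hat[:-1]])
(delta20_hat, psi_hat), *_ = np.linalg.lstsq(X2, dy, rcond=None)
print(delta20_hat, psi_hat)
```

A negative estimate of Ψ is what the error correction interpretation leads one to expect; lagged ∆yt and ∆xt terms could be appended to `X2` in the same way.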
30. Cointegration analysis The Engle-Granger’s approach
Drawbacks of the Engle-Granger’s approach
If the variables are in fact cointegrated then OLS regression yields superconsistent
estimates of the cointegrating parameters π1 and π2 (the CV). It has been shown by Stock
(1987) that OLS coefficient estimates converge faster towards their parameter values in
the presence of a cointegrating relationship compared with regressions involving
stationary variables. If deviations from the long-run equilibrium vt are found to be
stationary, I(0), then the yt and xt sequences are cointegrated of order (1,1). The
augmented Dickey-Fuller test can be used to determine the stationarity of the residual
series vt.
The Engle and Granger approach is relatively straightforward and easily implemented in
practice. However, there are significant drawbacks of this approach:
The Engle and Granger test for cointegration uses residuals from either of the two
equilibrium equations, so the result can depend on which variable is placed on the
left-hand side.
The major problem regarding the Engle and Granger procedure is that it relies on a
two-step estimator, so any error introduced in the first step is carried into the second.
31. Cointegration analysis Johansen’s approach
Cointegration as a special case of VAR
The Johansen (1988) maximum likelihood estimators circumvent the use of a two-step
estimator and in doing so avoid the drawbacks faced by Engle and Granger.
Instead, the Johansen (1988) procedure relies heavily on the relationship between the
rank of a matrix and its characteristic roots. Johansen (1988) demonstrated that
cointegration can also be modelled with a modified Vector Autoregression (VAR)
framework. In order to keep notation as simple as possible, this topic will be discussed
for the bivariate case only. Consider the following bivariate VAR(1) model:
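\[
\begin{pmatrix} y_{1t} \\ y_{2t} \end{pmatrix}
=
\begin{pmatrix} \beta_{10} \\ \beta_{20} \end{pmatrix}
+
\begin{pmatrix} \beta_{11} & \alpha_{11} \\ \alpha_{21} & \beta_{21} \end{pmatrix}
\begin{pmatrix} y_{1t-1} \\ y_{2t-1} \end{pmatrix}
+
\begin{pmatrix} u_{1t} \\ u_{2t} \end{pmatrix}
\]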
Suppose y1t and y2t are both non-stationary or I(1), but a linear combination of y1t and
y2t exists which is stationary or I(0). Therefore y1t and y2t are cointegrated. In this case,
the bivariate VAR(1) model can be reparameterised so that it is expressed in terms of
I(0) variables only, as follows:
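\[
\begin{pmatrix} y_{1t} - y_{1t-1} \\ y_{2t} - y_{2t-1} \end{pmatrix}
=
\begin{pmatrix} \beta_{10} \\ \beta_{20} \end{pmatrix}
+
\begin{pmatrix} \beta_{11} - 1 & \alpha_{11} \\ \alpha_{21} & \beta_{21} - 1 \end{pmatrix}
\begin{pmatrix} y_{1t-1} \\ y_{2t-1} \end{pmatrix}
+
\begin{pmatrix} u_{1t} \\ u_{2t} \end{pmatrix}
\]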
32. Cointegration analysis Johansen’s approach
Specification of a Vector Error Correction Model (VECM)
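\[
\Rightarrow
\begin{pmatrix} \Delta y_{1t} \\ \Delta y_{2t} \end{pmatrix}
=
\begin{pmatrix} \beta_{10} \\ \beta_{20} \end{pmatrix}
+
\begin{pmatrix} \pi_{11} & \pi_{12} \\ \pi_{21} & \pi_{22} \end{pmatrix}
\begin{pmatrix} y_{1t-1} \\ y_{2t-1} \end{pmatrix}
+
\begin{pmatrix} u_{1t} \\ u_{2t} \end{pmatrix}
\]
where π11 = β11 − 1, π12 = α11, π21 = α21, π22 = β21 − 1,
or ∆Yt = β0 + πYt−1 + ut
The above equation is known as a Vector Error Correction Model (VECM) representation
of the bivariate VAR(1) model. Now consider the bivariate VAR(2) model: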
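\[
\begin{pmatrix} y_{1t} \\ y_{2t} \end{pmatrix}
=
\begin{pmatrix} \beta_{10} \\ \beta_{20} \end{pmatrix}
+
\begin{pmatrix} \beta_{11} & \alpha_{11} \\ \alpha_{21} & \beta_{21} \end{pmatrix}
\begin{pmatrix} y_{1t-1} \\ y_{2t-1} \end{pmatrix}
+
\begin{pmatrix} \beta_{12} & \alpha_{12} \\ \alpha_{22} & \beta_{22} \end{pmatrix}
\begin{pmatrix} y_{1t-2} \\ y_{2t-2} \end{pmatrix}
+
\begin{pmatrix} u_{1t} \\ u_{2t} \end{pmatrix}
\]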
Again, if y1t and y2t are both I(1), but a linear combination of y1t and y2t exists which is
I(0), then the above bivariate VAR(2) model can be reparameterised and expressed in
terms of I(0) variables only, as follows:
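\[
\begin{pmatrix} \Delta y_{1t} \\ \Delta y_{2t} \end{pmatrix}
=
\begin{pmatrix} \beta_{10} \\ \beta_{20} \end{pmatrix}
+
\begin{pmatrix} \pi_{11} & \pi_{12} \\ \pi_{21} & \pi_{22} \end{pmatrix}
\begin{pmatrix} y_{1t-1} \\ y_{2t-1} \end{pmatrix}
+
\begin{pmatrix} \delta_{11} & \gamma_{12} \\ \gamma_{21} & \delta_{22} \end{pmatrix}
\begin{pmatrix} \Delta y_{1t-1} \\ \Delta y_{2t-1} \end{pmatrix}
+
\begin{pmatrix} u_{1t} \\ u_{2t} \end{pmatrix}
\]
where π11 = β11 + β12 − 1, . . . , δ11 = −β12, . . . and so on.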
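The reparameterisation can be checked numerically. In matrix form, writing B1 and B2 for the lag-1 and lag-2 coefficient matrices of the VAR(2), the mapping π11 = β11 + β12 − 1, δ11 = −β12 and so on amounts to π = B1 + B2 − I for the matrix on Yt−1 and −B2 for the matrix on ∆Yt−1. The sketch below uses arbitrary, assumed coefficient values (and the assumed names `B1`, `B2`, `Pi`, `Gamma`) and confirms that both forms give the same ∆Yt.

```python
import numpy as np

B1 = np.array([[0.6, 0.2],
               [0.1, 0.5]])      # assumed lag-1 coefficient matrix
B2 = np.array([[0.3, -0.1],
               [0.0, 0.4]])      # assumed lag-2 coefficient matrix
b0 = np.array([0.1, -0.2])       # assumed intercepts

Pi = B1 + B2 - np.eye(2)         # matrix multiplying Y_{t-1}
Gamma = -B2                      # matrix multiplying delta Y_{t-1}

# Arbitrary lagged values and disturbances.
rng = np.random.default_rng(3)
y_lag1, y_lag2, u = rng.standard_normal((3, 2))

y_var = b0 + B1 @ y_lag1 + B2 @ y_lag2 + u                   # VAR(2) in levels
dy_vecm = b0 + Pi @ y_lag1 + Gamma @ (y_lag1 - y_lag2) + u   # VECM form

print(np.allclose(y_var - y_lag1, dy_vecm))
```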
33. Cointegration analysis Johansen’s approach
Cointegration as a special case of VAR (cont.)
In the Engle-Granger formulation, there is a presumption that yt is partly determined by
xt . Accordingly ∆yt depends on ∆xt , as well as ∆yt−1 and ∆xt−1.
In the Johansen formulation, y1t and y2t are treated symmetrically, and no causation is
assumed between the current values in either direction. ∆y1t depends only on the lagged
values ∆y1t−1, ∆y2t−1 (and higher-order lags if applicable); similarly, ∆y2t depends only
on the lagged values ∆y1t−1, ∆y2t−1 etc.
34. Cointegration analysis Johansen’s approach
Conditions for cointegration
Johansen showed that the condition for a stationary or I(0) linear combination of y1t and
y2t to exist (the condition for y1t and y2t to be cointegrated) depends on the rank of the
matrix
\[
\pi = \begin{pmatrix} \pi_{11} & \pi_{12} \\ \pi_{21} & \pi_{22} \end{pmatrix}
\]
in the VECMs.
If y1t and y2t are both stationary or I(0), rank (π) = 2. In this case, any linear
combination of y1t and y2t is stationary. The matrix π contains 2 cointegrating
vectors. In one sense, y1t and y2t are trivially cointegrated. However, it would not
be common practice to refer to y1t and y2t as cointegrated in this case.
If y1t and y2t are both non-stationary or I(1), but a linear combination of y1t and
y2t exists which is stationary or I(0), rank (π) = 1 and y1t and y2t are
cointegrated. The matrix π contains 1 cointegrating vector.
If y1t and y2t are both non-stationary or I(1), and no stationary linear combination
of y1t and y2t exists, rank (π) = 0 and y1t and y2t are not cointegrated. The
matrix π contains no cointegrating vectors.
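The three cases can be illustrated with known (not estimated) π matrices. The numerical values below are assumptions chosen for the example; with estimated matrices the rank has to be tested statistically rather than read off directly.

```python
import numpy as np

cases = {
    # Full rank: consistent with both series being I(0).
    "both I(0)": np.array([[-0.5, 0.1],
                           [0.2, -0.4]]),
    # Rank one: every row is a multiple of the vector (1, -2).
    "I(1), cointegrated": np.outer([-0.3, 0.2], [1.0, -2.0]),
    # Zero matrix: no levels term survives in the VECM.
    "I(1), not cointegrated": np.zeros((2, 2)),
}

ranks = {name: int(np.linalg.matrix_rank(P)) for name, P in cases.items()}
print(ranks)
```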
35. Cointegration analysis Johansen’s approach
Interpreting the long-run equilibrium relationship
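If rank (π) = 1, it is possible to decompose π as follows:
\[
\pi = \begin{pmatrix} \pi_{11} & \pi_{12} \\ \pi_{21} & \pi_{22} \end{pmatrix}
= \begin{pmatrix} a_1 \\ a_2 \end{pmatrix}
\begin{pmatrix} 1 & -b_2 \end{pmatrix}
= \begin{pmatrix} a_1 & -a_1 b_2 \\ a_2 & -a_2 b_2 \end{pmatrix}
\]
a1 and a2 are known as the adjustment parameters. These parameters play a similar role
to Ψ seen in the previous error correction model. (1, −b2) is known as the cointegrating
vector.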
In equilibrium, nothing is changing and there is no disturbance. In order to study the
long-run equilibrium relationship between y1t and y2t, we set the variables in first
differences and the error terms to zero.
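As a small numerical sketch of the decomposition, the code below recovers the adjustment parameters and b2 from a rank-one π. The matrix values are assumed for illustration, and the recovery of b2 requires π11 ≠ 0. If the intercepts are ignored for simplicity (an assumption of this sketch), setting the differences and errors to zero leaves y1 − b2y2 = 0 as the long-run relationship.

```python
import numpy as np

Pi = np.array([[-0.3, 0.6],
               [0.2, -0.4]])     # assumed rank-one matrix

# With the cointegrating vector normalised to (1, -b2), the first column
# of Pi holds the adjustment parameters (a1, a2).
a = Pi[:, 0]
b2 = -Pi[0, 1] / Pi[0, 0]
beta = np.array([1.0, -b2])

print(a, beta)
print(np.allclose(np.outer(a, beta), Pi))   # the product rebuilds Pi
```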
36. Cointegration analysis Johansen’s approach
Testing for cointegration
In the Johansen VECM formulation, to test for cointegration we need to test hypotheses
concerning the rank of the matrix π. Let r denote rank (π).
Johansen developed two test statistics, known as the trace statistic and the maximal
eigenvalue statistic. Using Johansen notation, these are denoted λtrace and λmax .
The formulation of the null hypothesis differs very slightly between the two procedures.
In both cases, acceptance or rejection of the null is decided by comparing the test
statistic with special critical values compiled by Johansen.
37. Cointegration analysis Johansen’s approach
Testing for cointegration (cont.)
For a bivariate model, the tests would be carried out in two stages:
Stage 1      λtrace         λmax
H0           r = 0          r = 0
H1           r > 0          r = 1
Accept H0 ⇒ series are I(1) and not cointegrated ⇒ STOP.
Reject H0 ⇒ proceed to Stage 2.
Stage 2      λtrace         λmax
H0           r ≤ 1          r ≤ 1
H1           r = 2          r = 2
Accept H0 ⇒ series are I(1) and cointegrated.
Reject H0 ⇒ series are I(0).
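The two statistics can be computed by hand for a simple case. The sketch below is an illustration under several assumptions that are not taken from the slides: a VECM based on a VAR(1) with an unrestricted constant, the standard textbook formulas λtrace(r) = −n Σ ln(1 − λi) (summing over the eigenvalues ranked below the r largest) and λmax(r) = −n ln(1 − λr+1), simulated data that are cointegrated by construction, and the hypothetical function name `johansen_stats`. Critical values are not included; they must be taken from Johansen's tables.

```python
import numpy as np

def johansen_stats(Y):
    """Eigenvalues, trace and maximal-eigenvalue statistics for a
    VAR(1)-based VECM with an unrestricted constant. Y is (T, n) in levels."""
    dY = np.diff(Y, axis=0)
    Ylag = Y[:-1]
    n_obs = dY.shape[0]
    # Concentrate out the constant by demeaning both blocks.
    R0 = dY - dY.mean(axis=0)
    R1 = Ylag - Ylag.mean(axis=0)
    S00 = R0.T @ R0 / n_obs
    S01 = R0.T @ R1 / n_obs
    S11 = R1.T @ R1 / n_obs
    # Eigenvalues of S11^{-1} S10 S00^{-1} S01, sorted largest first.
    M = np.linalg.solve(S11, S01.T @ np.linalg.solve(S00, S01))
    lam = np.sort(np.linalg.eigvals(M).real)[::-1]
    log_terms = -n_obs * np.log(1.0 - lam)
    trace = np.cumsum(log_terms[::-1])[::-1]   # trace[r]: H0 rank <= r
    max_eig = log_terms                        # max_eig[r]: H0 rank = r vs r + 1
    return lam, trace, max_eig

# Simulated pair: y2 is a random walk, y1 deviates from it by white noise.
rng = np.random.default_rng(4)
T = 500
y2 = np.cumsum(rng.standard_normal(T))
y1 = y2 + rng.standard_normal(T)

lam, trace, max_eig = johansen_stats(np.column_stack([y1, y2]))
print(lam, trace, max_eig)
```

In this design the Stage 1 statistics are large while the Stage 2 statistics are small, which is the pattern the two-stage procedure associates with one cointegrating vector.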
38. Cointegration analysis References
References
Alexander, C. and Dimitriu, A. (2002) The Cointegration Alpha: Enhanced Index Tracking and
Long-Short Equity Market Neutral Strategies. ISMA Finance Discussion Paper No. 2002-08.
June. pp. 1-55.
Dickey, D. and Fuller, W. (1979) Distribution of the Estimators for Autoregressive Time Series
with a Unit Root. Journal of the American Statistical Association, 74, pp. 427-431.
Engle, R. and Granger, C. (1987) Co-integration and Error Correction: Representation,
Estimation and Testing. Econometrica, 55(2), pp. 251-276.
Johansen, S. (1988) Statistical Analysis of Cointegration Vectors. Journal of Economic
Dynamics and Control, 12, pp. 231-254.
Pearl, J. (2000) Causal Inference Without Counterfactuals: Comment. Journal of the
American Statistical Association, 95 (450), pp. 428-431.
Stock, J. (1987) Asymptotic Properties of Least Squares Estimators of Cointegrating Vectors.
Econometrica, 55(5), pp. 1035-1056.
39. Appendix Granger causality test
Granger causality test
Granger (1969) describes a test that assesses whether the past values of one variable
’cause’ the current values of another variable.
Consider the bivariate VAR(2) model:
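\[
\begin{pmatrix} y_{1t} \\ y_{2t} \end{pmatrix}
=
\begin{pmatrix} \beta_{10} \\ \beta_{20} \end{pmatrix}
+
\begin{pmatrix} \beta_{11} & \alpha_{11} \\ \alpha_{21} & \beta_{21} \end{pmatrix}
\begin{pmatrix} y_{1t-1} \\ y_{2t-1} \end{pmatrix}
+
\begin{pmatrix} \beta_{12} & \alpha_{12} \\ \alpha_{22} & \beta_{22} \end{pmatrix}
\begin{pmatrix} y_{1t-2} \\ y_{2t-2} \end{pmatrix}
+
\begin{pmatrix} u_{1t} \\ u_{2t} \end{pmatrix}
\]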
This model is equivalent to:
y1t = β10 + β11y1t−1 + α11y2t−1 + β12y1t−2 + α12y2t−2 + u1t
y2t = β20 + α21y1t−1 + β21y2t−1 + α22y1t−2 + β22y2t−2 + u2t
40. Appendix Granger causality test
Hypothesis of Granger causality testing
The terminology of Granger causality testing:
The variable y1 Granger causes the variable y2 if the coefficients on y1t−1 and
y1t−2 are significant in the equation for y2t .
The variable y2 Granger causes the variable y1 if the coefficients on y2t−1 and
y2t−2 are significant in the equation for y1t .
Normally, Granger causality tests involve testing restrictions on one equation at a time.
A standard F-test can be used to determine acceptance or rejection of the following:
H0 : α21 = α22 = 0. If we can reject this null hypothesis, we can infer that y1
Granger causes y2.
H0 : α11 = α12 = 0. If we can reject this null hypothesis, we can infer that y2
Granger causes y1.
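The F-test can be sketched by comparing the residual sums of squares of a restricted regression (own lags only) and an unrestricted regression (own lags plus lags of the other variable). The example below is an illustration under assumptions: the data are simulated so that y1 influences y2 but not the reverse, the coefficient values are made up, and `granger_f` is a hypothetical helper name. The resulting statistics would be compared with critical values of the F distribution with p and (number of observations − number of unrestricted coefficients) degrees of freedom.

```python
import numpy as np

def rss(X, y):
    """Residual sum of squares from an OLS fit of y on X."""
    b, *_ = np.linalg.lstsq(X, y, rcond=None)
    e = y - X @ b
    return e @ e

def granger_f(cause, effect, p=2):
    """F statistic for H0: the p lags of `cause` have zero coefficients in
    the equation for `effect` (which also has a constant and p own lags)."""
    T = len(effect)
    own = [effect[p - j:T - j] for j in range(1, p + 1)]
    other = [cause[p - j:T - j] for j in range(1, p + 1)]
    y = effect[p:]
    X_r = np.column_stack([np.ones(T - p)] + own)           # restricted
    X_u = np.column_stack([np.ones(T - p)] + own + other)   # unrestricted
    rss_r, rss_u = rss(X_r, y), rss(X_u, y)
    dof = (T - p) - X_u.shape[1]
    return ((rss_r - rss_u) / p) / (rss_u / dof)

# Simulated data: y1 feeds into y2, but y2 does not feed into y1.
rng = np.random.default_rng(5)
T = 500
y1 = np.zeros(T)
y2 = np.zeros(T)
for t in range(1, T):
    y1[t] = 0.5 * y1[t - 1] + rng.standard_normal()
    y2[t] = 0.3 * y2[t - 1] + 0.5 * y1[t - 1] + rng.standard_normal()

f_y1_to_y2 = granger_f(y1, y2, p=2)   # tests alpha21 = alpha22 = 0
f_y2_to_y1 = granger_f(y2, y1, p=2)   # tests alpha11 = alpha12 = 0
print(f_y1_to_y2, f_y2_to_y1)
```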