Many business and economic applications of forecasting involve time series data. Regression models can be fit to monthly, quarterly, or yearly data using the techniques described in previous chapters. However, because data collected over time tend to exhibit trends, seasonal patterns, and so forth, observations in different time periods are related, or autocorrelated. That is, for time series data, the sample of observations cannot be regarded as a random sample. Problems of interpretation can arise when standard regression methods are applied to observations that are related to one another over time. Fitting regression models to time series data must be done with considerable care.
The document discusses the assumptions and properties of ordinary least squares (OLS) estimators in linear regression analysis. It notes that OLS estimators are best linear unbiased estimators (BLUE) if the assumptions of the linear regression model are met. Specifically, it assumes errors have zero mean and constant variance, are uncorrelated, and are normally distributed. Violation of the assumption of constant variance is known as heteroscedasticity. The document outlines how heteroscedasticity impacts the properties of OLS estimators and their use in applications like econometrics.
This presentation is aimed at fitting a Simple Linear Regression model in a Python program. IDE used is Spyder. Screenshots from a working example are used for demonstration.
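Since the presentation's own screenshots are not reproduced here, the following is a minimal sketch of fitting a simple linear regression in Python; the toy data and use of scikit-learn are illustrative assumptions, not the presentation's actual code.

```python
# Minimal simple linear regression sketch (illustrative data, not the
# presentation's working example).
import numpy as np
from sklearn.linear_model import LinearRegression

X = np.array([[1.0], [2.0], [3.0], [4.0], [5.0]])   # single feature
y = np.array([2.1, 4.3, 6.2, 8.1, 9.9])             # roughly y = 2x

model = LinearRegression().fit(X, y)
print("slope:", model.coef_[0], "intercept:", model.intercept_)
print("prediction at x=6:", model.predict([[6.0]])[0])
```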
The document provides an introduction to linear algebra concepts for machine learning. It defines vectors as ordered tuples of numbers that express magnitude and direction. Vector spaces are sets that contain all linear combinations of vectors. Linear independence and basis of vector spaces are discussed. Norms measure the magnitude of a vector, with examples given of the 1-norm and 2-norm. Inner products measure the correlation between vectors. Matrices can represent linear operators between vector spaces. Key linear algebra concepts such as trace, determinant, and matrix decompositions are outlined for machine learning applications.
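As a quick numeric companion to the concepts summarized above, here is a small NumPy sketch of the 1-norm, 2-norm, and inner product; the vectors are made-up examples.

```python
# Norms measure vector magnitude; the inner product measures correlation
# between vectors (illustrative values).
import numpy as np

v = np.array([3.0, -4.0])
w = np.array([1.0, 2.0])

print("1-norm:", np.linalg.norm(v, 1))    # |3| + |-4| = 7
print("2-norm:", np.linalg.norm(v))       # sqrt(9 + 16) = 5
print("inner product:", v @ w)            # 3*1 + (-4)*2 = -5
```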
This document provides an overview of linear regression analysis. It defines key terms like dependent and independent variables. It describes simple linear regression, which involves predicting a dependent variable based on a single independent variable. It covers techniques for linear regression including least squares estimation to calculate the slope and intercept of the regression line, the coefficient of determination (R2) to evaluate the model fit, and assumptions like independence and homoscedasticity of residuals. Hypothesis testing methods for the slope and correlation coefficient using the t-test and F-test are also summarized.
This presentation introduces regression analysis. It discusses key concepts such as dependent and independent variables, simple and multiple regression, and linear and nonlinear regression models. It also covers different types of regression including simple linear regression, cross-sectional vs time series data, and methods for building regression models like stepwise regression and forward/backward selection. Examples are provided to demonstrate calculating regression equations using the least squares method and computing deviations from mean values.
The document discusses regularization techniques for machine learning models called Ridge and Lasso regression. Ridge regression, also known as L2 regularization, introduces a small bias to models to minimize testing error by reducing variance. It works by adding penalties for large weights proportional to the square of the weight. Lasso regression, or L1 regularization, is similar but can exclude useless variables from models by setting some weights to zero. Both techniques aim to reduce overfitting and improve generalization to unlabeled data.
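A hedged sketch of the two penalties described above, using scikit-learn; the synthetic dataset and alpha values are illustrative assumptions, not the document's example.

```python
# Ridge (L2) shrinks all weights; Lasso (L1) can set useless weights to zero.
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Ridge, Lasso

X, y = make_regression(n_samples=100, n_features=10, n_informative=3,
                       noise=5.0, random_state=0)

ridge = Ridge(alpha=1.0).fit(X, y)   # penalty proportional to squared weights
lasso = Lasso(alpha=1.0).fit(X, y)   # penalty proportional to absolute weights

print("ridge nonzero coefs:", np.sum(ridge.coef_ != 0))   # typically all 10
print("lasso nonzero coefs:", np.sum(lasso.coef_ != 0))   # typically fewer
```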
Multiple regression analysis is a powerful technique used for predicting the unknown value of a variable from the known value of two or more variables.
This document discusses autocorrelation in time series data and its effects on regression analysis. It defines autocorrelation as errors in one time period carrying over into future periods. Autocorrelation can be caused by factors like inertia in economic cycles, specification bias, lags, and nonstationarity. While OLS estimators remain unbiased with autocorrelation, they become inefficient and hypothesis tests are invalid. Autocorrelation can be detected using graphical analysis or formal tests like the Durbin-Watson test and Breusch-Godfrey test. The Cochrane-Orcutt procedure is also described as a way to transform data and remove autocorrelation.
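Of the detection tools listed above, the Durbin-Watson statistic is the simplest to illustrate. Below is a small statsmodels sketch on simulated data with AR(1) errors; the data and coefficients are assumptions for illustration only.

```python
# Durbin-Watson on OLS residuals: values well below 2 suggest positive
# autocorrelation (simulated data, not the document's example).
import numpy as np
import statsmodels.api as sm
from statsmodels.stats.stattools import durbin_watson

rng = np.random.default_rng(0)
x = rng.standard_normal(200)
e = np.zeros(200)
for t in range(1, 200):
    e[t] = 0.7 * e[t - 1] + rng.standard_normal()   # AR(1) error process
y = 1.0 + 2.0 * x + e

res = sm.OLS(y, sm.add_constant(x)).fit()
print("Durbin-Watson:", durbin_watson(res.resid))
```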
- Time series data involves observations across time that are related as part of a stochastic or random process. We can only observe one realization of this process.
- Static regression models assume the dependent variable is related only to independent variables in the same time period. Finite distributed lag (FDL) models allow for lags where past values of independent variables may affect current dependent values.
- In FDL models, coefficients measure the impact of temporary or permanent changes in independent variables on dependent variables over time, showing the "memory" or lag effects. Coefficients of permanent changes sum to indicate total long-run impact.
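A minimal finite-distributed-lag sketch under the setup just described: the current y depends on current and lagged x, and the sum of the lag coefficients estimates the long-run impact of a permanent change. The lag order, coefficients, and data are illustrative assumptions.

```python
# FDL regression: y_t on x_t, x_{t-1}, x_{t-2} (simulated data).
import numpy as np
import pandas as pd
import statsmodels.api as sm

rng = np.random.default_rng(1)
df = pd.DataFrame({"x": rng.standard_normal(300)})
# assumed true model: y_t = 0.5 x_t + 0.3 x_{t-1} + 0.1 x_{t-2} + noise
df["y"] = (0.5 * df["x"] + 0.3 * df["x"].shift(1)
           + 0.1 * df["x"].shift(2) + 0.2 * rng.standard_normal(300))

lags = pd.concat({f"x_lag{k}": df["x"].shift(k) for k in range(3)}, axis=1)
res = sm.OLS(df["y"], sm.add_constant(lags), missing="drop").fit()
print(res.params)   # sum of the lag coefficients ~ long-run impact
```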
▸ Machine Learning / Deep Learning models require setting the values of many hyperparameters
▸ Common examples: regularization coefficients, dropout rate, or number of neurons per layer in a Neural Network
▸ Instead of relying on some "expert advice", this presentation shows how to automatically find optimal hyperparameters
▸ Exhaustive Search, Monte Carlo Search, Bayesian Optimization, and Evolutionary Algorithms are explained with concrete examples
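As one concrete instance of the searches listed above, here is a Monte Carlo (random) search with scikit-learn; the model choice, dataset, and parameter range are illustrative assumptions rather than the presentation's exact setup.

```python
# Random search over a regularization coefficient (one of the hyperparameter
# types named above), using cross-validation to score each draw.
from scipy.stats import loguniform
from sklearn.datasets import load_digits
from sklearn.linear_model import SGDClassifier
from sklearn.model_selection import RandomizedSearchCV

X, y = load_digits(return_X_y=True)
search = RandomizedSearchCV(
    SGDClassifier(max_iter=2000, random_state=0),
    param_distributions={"alpha": loguniform(1e-6, 1e-1)},  # reg. coefficient
    n_iter=20, cv=3, random_state=0,
)
search.fit(X, y)
print("best params:", search.best_params_, "best CV score:", search.best_score_)
```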
This document discusses heteroskedasticity in econometric models. It defines heteroskedasticity as non-constant variance of the error term, in contrast to the homoskedasticity assumption of constant variance. It explains that while OLS estimates remain unbiased with heteroskedasticity, the standard errors are biased. Robust standard errors can provide consistent standard errors even with heteroskedasticity. The Breusch-Pagan and White tests are presented as methods to test for the presence of heteroskedasticity based on the residuals. Weighted least squares is also introduced as a method to obtain more efficient estimates than OLS when the form of heteroskedasticity is known.
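A small statsmodels sketch of two of the remedies named above, the Breusch-Pagan test and heteroskedasticity-robust standard errors; the simulated data are an assumption for illustration.

```python
# Breusch-Pagan test plus HC1 robust standard errors (simulated data whose
# error variance grows with x, i.e., deliberately heteroskedastic).
import numpy as np
import statsmodels.api as sm
from statsmodels.stats.diagnostic import het_breuschpagan

rng = np.random.default_rng(0)
x = rng.uniform(1, 10, 300)
y = 2 + 3 * x + rng.standard_normal(300) * x
X = sm.add_constant(x)

res = sm.OLS(y, X).fit()
lm_stat, lm_pvalue, f_stat, f_pvalue = het_breuschpagan(res.resid, X)
print("Breusch-Pagan p-value:", lm_pvalue)   # small => heteroskedasticity

robust = res.get_robustcov_results(cov_type="HC1")
print("robust standard errors:", robust.bse)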
This document summarizes various optimization techniques for deep learning models, including gradient descent, stochastic gradient descent, and variants like momentum, Nesterov's accelerated gradient, AdaGrad, RMSProp, and Adam. It provides an overview of how each technique works and comparisons of their performance on image classification tasks using MNIST and CIFAR-10 datasets. The document concludes by encouraging attendees to try out the different optimization methods in Keras and provides resources for further deep learning topics.
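In the spirit of the document's closing encouragement to try the optimizers in Keras, here is a hedged sketch of swapping them in; the tiny one-epoch architecture is an illustrative assumption, not the document's model.

```python
# Compare a few Keras optimizers on MNIST (one epoch each, toy architecture).
import tensorflow as tf

(x_train, y_train), _ = tf.keras.datasets.mnist.load_data()
x_train = x_train.reshape(-1, 784).astype("float32") / 255.0

for name in ["sgd", "rmsprop", "adam"]:
    model = tf.keras.Sequential([
        tf.keras.layers.Dense(128, activation="relu", input_shape=(784,)),
        tf.keras.layers.Dense(10, activation="softmax"),
    ])
    model.compile(optimizer=name, loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    hist = model.fit(x_train, y_train, epochs=1, batch_size=128, verbose=0)
    print(name, "training accuracy:", hist.history["accuracy"][-1])
```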
The document discusses the topic of autocorrelation. It begins by defining autocorrelation as data that is correlated with itself over successive time periods, rather than being correlated with other external data. It then explains the concept of autocorrelation and how it violates the classical linear regression assumption that disturbances are independent over time. Several potential sources of autocorrelation are described, including omitted variables, interpolation of data, and misspecification of the random error term. The document concludes by providing mathematical expressions that describe how the mean, variance, and covariance of autocorrelated disturbances differ from the independent case.
Abstract: This PDSG workshop introduces basic concepts of multiple linear regression in machine learning. Concepts covered are Feature Elimination and Backward Elimination, with examples in Python.
Level: Fundamental
Requirements: Should have some experience with Python programming.
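As a companion to the workshop abstract above, here is a minimal backward-elimination sketch in Python: repeatedly drop the predictor with the largest p-value above a threshold. The synthetic data and the 0.05 threshold are illustrative assumptions, not the workshop's materials.

```python
# Backward elimination by p-value with statsmodels (simulated data in which
# only x0 and x1 truly matter).
import numpy as np
import pandas as pd
import statsmodels.api as sm

rng = np.random.default_rng(0)
X = pd.DataFrame(rng.standard_normal((200, 5)),
                 columns=[f"x{i}" for i in range(5)])
y = 2 * X["x0"] - 3 * X["x1"] + rng.standard_normal(200)

cols = list(X.columns)
while cols:
    res = sm.OLS(y, sm.add_constant(X[cols])).fit()
    pvals = res.pvalues.drop("const")
    if pvals.max() < 0.05:          # all remaining predictors significant
        break
    cols.remove(pvals.idxmax())     # eliminate the weakest predictor

print("selected features:", cols)
```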
Visual Explanation of Ridge Regression and LASSO, by Kazuki Yoshida
Ridge regression and LASSO are regularization techniques used to address overfitting in regression analysis. Ridge regression minimizes residuals while also penalizing large coefficients, resulting in all coefficients remaining in the model. LASSO also minimizes residuals while penalizing large coefficients, but performs continuous variable selection by driving some coefficients to exactly zero. Both techniques involve a tuning parameter that controls the strength of regularization. Cross-validation is commonly used to select the optimal tuning parameter value.
Multinomial Logistic Regression: Basic Relationships, by Anirudha Si
This document provides an overview of multinomial logistic regression. It discusses how multinomial logistic regression compares multiple groups through binary logistic regressions. It describes how to interpret the results, including evaluating the overall relationship between predictors and the dependent variable and relationships between individual predictors and the dependent variable. Requirements and assumptions of the analysis are explained, such as the dependent variable being non-metric and cases-to-variable ratios. Methods for evaluating model accuracy and usefulness are also outlined.
2. Linear Algebra for Machine Learning: Basis and Dimension, by Ceni Babaoglu, PhD
The seminar series will focus on the mathematical background needed for machine learning. The first set of seminars is on "Linear Algebra for Machine Learning". These are the slides of the second part, which discusses basis and dimension.
Here is the link to the first part, which discussed linear systems: https://www.slideshare.net/CeniBabaogluPhDinMat/linear-algebra-for-machine-learning-linear-systems/1
Polynomial regression models the relationship between variables as a polynomial equation rather than a linear one. It allows for modeling of curvilinear relationships. The document discusses the definition of polynomial regression, why it is used, its history, the regression model and matrix form, how to implement it in Matlab, its advantages in fitting flexible curves, and its disadvantages related to sensitivity to outliers.
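The document implements polynomial regression in Matlab; for consistency with the other examples here, the following is an equivalent hedged sketch in Python with made-up data.

```python
# Degree-2 polynomial fit by least squares (illustrative data).
import numpy as np

x = np.linspace(-3, 3, 50)
y = 1 + 2 * x - 0.5 * x**2 + np.random.default_rng(0).normal(0, 0.3, 50)

coeffs = np.polyfit(x, y, deg=2)        # highest-degree coefficient first
print("fitted coefficients:", coeffs)
print("prediction at x=1.5:", np.polyval(coeffs, 1.5))
```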
The document provides an overview of linear models and their extensions for data science applications. It begins with an introduction to linear regression and how it finds the coefficients that minimize squared error loss. It then discusses generalizing linear models to binary data using link functions. Regularization methods like ridge regression, lasso, elastic net, and grouped lasso are introduced to reduce overfitting. The document also covers extensions such as generalized additive models, support vector machines, and mixed effects models. Overall, the document aims to convince the reader that simple linear models can be very effective while also introducing more advanced techniques.
The document discusses machine learning classification using the MNIST dataset of handwritten digits. It begins by defining classification and providing examples. It then describes the MNIST dataset and how it is fetched in scikit-learn. The document outlines the steps of classification which include dividing the data into training and test sets, training a classifier on the training set, testing it on the test set, and evaluating performance. It specifically trains a stochastic gradient descent (SGD) classifier on the MNIST data. The performance is evaluated using cross validation accuracy, confusion matrix, and metrics like precision and recall.
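Here is a compressed sketch of the workflow the document outlines: fetch MNIST in scikit-learn, keep the standard train split, and report cross-validated accuracy for an SGD classifier. The hyperparameters are defaults, assumed for illustration.

```python
# MNIST classification with SGD and 3-fold cross-validation accuracy.
from sklearn.datasets import fetch_openml
from sklearn.linear_model import SGDClassifier
from sklearn.model_selection import cross_val_score

X, y = fetch_openml("mnist_784", version=1, return_X_y=True, as_frame=False)
X_train, y_train = X[:60000], y[:60000]          # standard MNIST train split

clf = SGDClassifier(random_state=42)
scores = cross_val_score(clf, X_train, y_train, cv=3, scoring="accuracy")
print("cross-validation accuracy:", scores)
```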
The document discusses multiple linear regression and partial correlation. It explains that multiple regression allows one to analyze the unique contribution of predictor variables to an outcome variable after accounting for the effects of other predictor variables. Partial correlation similarly examines the relationship between two variables while controlling for a third, but only considers two variables, whereas multiple regression examines the effects of multiple predictor variables simultaneously. Examples are given comparing the correlation between height and weight with and without controlling for other relevant variables like gender, age, exercise habits, etc.
1. Multinomial logistic regression allows modeling of nominal outcome variables with more than two categories by calculating multiple logistic regression equations to compare each category's probability to a reference category.
2. The document provides an example of using multinomial logistic regression to model student program choice (academic, general, vocational) based on writing score and socioeconomic status.
3. The model results show that writing score significantly impacts the choice between academic and general/vocational programs, while socioeconomic status also influences general versus academic program choice.
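A hedged sketch of that modeling setup with statsmodels: one logistic equation per non-reference category. The simulated "program choice" data below merely stand in for the document's real dataset.

```python
# Multinomial logit: outcome 0=academic (reference), 1=general, 2=vocational,
# predicted from writing score and socioeconomic status (simulated data).
import numpy as np
import pandas as pd
import statsmodels.api as sm

rng = np.random.default_rng(0)
n = 300
df = pd.DataFrame({
    "write": rng.normal(52, 9, n),     # writing score
    "ses": rng.integers(1, 4, n),      # socioeconomic status, coded 1-3
})
df["prog"] = rng.integers(0, 3, n)     # placeholder outcome labels

res = sm.MNLogit(df["prog"], sm.add_constant(df[["write", "ses"]])).fit(disp=0)
print(res.summary())                   # one coefficient block per category
```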
- Regression analysis is a statistical technique for modeling relationships between variables, where one variable is dependent on the others. It allows predicting the average value of the dependent variable based on the independent variables.
- The key assumptions of regression models are that the error terms are normally distributed with zero mean and constant variance, and are independent of each other.
- Linear regression specifies that the dependent variable is a linear combination of the parameters, though the independent variables need not be linearly related. In simple linear regression with one independent variable, the least squares estimates of the intercept and slope are calculated to minimize the sum of squared errors.
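The least-squares estimates mentioned in the last point have simple closed forms, computed directly below on made-up data: the slope is the ratio of the cross-deviation sum to the x-deviation sum of squares, and the intercept passes the line through the means.

```python
# Closed-form least squares: b1 = Sxy/Sxx, b0 = ybar - b1*xbar.
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.0, 4.1, 5.9, 8.2, 9.8])

b1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
b0 = y.mean() - b1 * x.mean()
print("slope:", b1, "intercept:", b0)   # minimizes the sum of squared errors
```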
This document provides an overview of point estimation methods, including maximum likelihood estimation and the method of moments. It begins with an introduction to statistical inference and the theory of estimation. Point estimation is defined as using sample data to calculate a single value as the best estimate of an unknown population parameter. Maximum likelihood estimation maximizes the likelihood function to find the parameter values that make the observed sample data most probable. The method of moments equates sample moments to theoretical moments to derive parameter estimates. Examples are provided to illustrate how to apply each method to obtain point estimators.
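A small numeric illustration of both methods for an exponential sample, assumed here for concreteness: the MLE maximizes the log-likelihood numerically, and the method of moments equates the first sample moment to 1/lambda.

```python
# MLE vs method of moments for the rate of an exponential distribution.
import numpy as np
from scipy.optimize import minimize_scalar

rng = np.random.default_rng(0)
sample = rng.exponential(scale=2.0, size=500)     # true rate lambda = 0.5

def neg_log_likelihood(lam):
    return -(len(sample) * np.log(lam) - lam * sample.sum())

mle = minimize_scalar(neg_log_likelihood, bounds=(1e-6, 10),
                      method="bounded").x
mom = 1.0 / sample.mean()                         # from E[X] = 1/lambda
print("MLE:", mle, "method of moments:", mom)     # both near 0.5
```

For the exponential family both estimators coincide at 1/x̄, which the printout confirms; the point of the sketch is the mechanics of each method.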
This document provides an overview of regression models and their use in business analytics. It discusses simple and multiple linear regression models, how to develop regression equations from sample data, and how to interpret key outputs like the slope, intercept, coefficient of determination, and correlation coefficient. Regression analysis is presented as a valuable tool for managers to understand relationships between variables and predict outcomes. The document outlines the key steps in regression including developing scatter plots, calculating regression equations, and measuring the fit of regression models.
This document analyzes the relationship between stock market liquidity and stock returns in 27 emerging equity markets from January 1992 to December 1999. It finds that stock returns are positively correlated with measures of market liquidity, including turnover ratio, trading value, and turnover-volatility multiple, in both cross-sectional and time-series analyses. This relationship holds even after controlling for other factors and contrasts with theories supported by studies of developed markets, where liquidity and returns are negatively correlated. The findings suggest emerging markets have a lower degree of integration with the global economy.
SAS Programming and Data Analysis Portfolio - BTReilly, by Brian Reilly
This document is a portfolio submitted by Brian Thomas Reilly to Florida State University containing projects analyzing Transportation Security Administration (TSA) claims data from 2002 to 2014 using SAS software. The portfolio includes a project on SAS for data analysis that analyzes the TSA claims data, performing graphical and numerical summaries, statistical tests, and drawing conclusions. It finds that, on average, it is better financially to lose an item at a TSA checkpoint than in checked baggage, with the average compensation per loss being over $107 higher at checkpoints. The source code for importing, cleaning, and merging the claims data files is also included.
This document summarizes a study that estimated the beta of Costco Wholesale Corporation stock using the Capital Asset Pricing Model. The authors collected quarterly stock price data for Costco and the S&P 500 index over a 10-year period. They performed a linear regression of Costco's returns against the market risk premium (S&P 500 returns - risk-free rate). The regression estimated Costco's beta at 0.642, meaning its returns tend to be about 64.2% as volatile as the overall market. However, the regression had a low R-squared value, indicating the model was not a great fit for the data. Therefore, while beta provides some insight into risk, other factors like
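A hedged sketch of the beta-estimation regression the study describes: excess stock returns regressed on excess market returns. The simulated return series below are assumptions for illustration, not Costco's actual data.

```python
# CAPM beta via OLS: stock excess returns on market excess returns
# (simulated quarterly data; the 0.642 slope mimics the study's estimate).
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
market_excess = rng.normal(0.02, 0.08, 40)       # market risk premium
stock_excess = 0.642 * market_excess + rng.normal(0, 0.06, 40)

res = sm.OLS(stock_excess, sm.add_constant(market_excess)).fit()
print("estimated beta:", res.params[1])
print("R-squared:", res.rsquared)                # low R^2 => weak fit
```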
Predicting Stock Market Returns and the Efficient Market Hypothesis, by Mysa Vijay
There has been growing interest in financial forecasting in recent years, as accurate forecasting of financial prices has become an important issue in investment decision making (Lu et al. 2009). It has been argued that the exchange rate market is very efficient (Ince and Trafalis 2006). Accurate return predictions give investors the opportunity to make profitable decisions about where to invest their money, and many researchers argue that the efficient market hypothesis does not hold.

The efficient market hypothesis holds that investors in the stock market are "rational" and adapt quickly to new knowledge about stock market products, so that market opinion reflects any information revealed. The hypothesis distinguishes three levels of information sharing: the weak form, the semi-strong form, and the strong form (Fama and French 2009). Many attempts have been made to model stock market movements using quantitative information (Andersen et al. 2007; Fama and French 2009; Nartea 2009), but other studies state that stock market movements cannot be captured by firms' quantitative information (Shleifer and Vishny 1997), because the actual market is not as efficient as the efficient market hypothesis asserts.

We want to investigate this problem by testing whether stock market returns can be predicted and, in turn, testing the efficient market hypothesis. We want to see whether we can predict future stock market returns and how accurate those predictions are. After predicting, we want to test the efficient market hypothesis on the predicted values, on the premise that the market is not as efficient as the hypothesis claims.

To reach this goal, we selected three papers as our baseline. Our aim is to follow their methodologies and algorithms to predict stock market returns: we implement and reproduce the work of Lu et al. (2009), Khansa and Liginlal (2009), and Ince and Trafalis (2006). We want to know how close these authors' results were to the actual data and, in addition, how close our own predictions come to the actual data.
The document provides a review of topics that will be covered on the final exam for a quantitative analysis for business course. The three-hour exam will cover all topics discussed in the course and is worth 50% of the final grade. Key topics include linear regression, its assumptions, hypothesis testing of regression coefficients, and time series models such as moving averages, exponential smoothing, and decomposition. Multiple regression, adjusted R-squared, and seasonal variations with trend are also summarized.
This document discusses a research project investigating investor perception of mutual funds and their behavior using time series models. It provides background on the project, which analyzed daily net asset values for 30 mutual fund schemes from equity, debt and balanced categories over one year. The objectives were to study how personal and risk factors affect fund benefits and performance, and determine the causal relationship between benchmark indices and different fund schemes. The methodology section describes collecting primary data through a survey and secondary data from sources like AMFI. Variables for analysis included performance rating based on past performance, current NAV, and agency ratings. The analysis would use factor analysis, regression, and time series models.
This document summarizes a presentation on dummy variables and innovation in the manufacturing sector in Pakistan. It includes:
1) References several books on econometrics and a research paper on determinants of innovation in manufacturing in Pakistan.
2) Defines dummy variables, product innovation, and process innovation. Large firms, exporting firms, and firms with more educated managers are more likely to innovate.
3) Uses panel data from Pakistani manufacturing firms to estimate innovation rates based on firm characteristics like size, location, and industry. The highest innovation rates are in Karachi, large firms, and textile/food/garment industries.
4) Employs a probit model to estimate the effects of internal
Report earned 105% and is a complete valuation of the company based upon CAPM and the Dividend Discount Models. Includes regression analysis of macro variables, figures from conference calls and 10Ks, and a fair market stock price. (Not to be used as investment advice)
The document discusses the Durbin-Watson test for autocorrelation in regression residuals. It provides tables of critical values for different sample sizes and numbers of regressors. It explains how to use the tables to test for positive or negative autocorrelation at various significance levels. An example is given to demonstrate interpreting results using the tables.
1. The class outline covers incorporating discrete variables in regression analysis using dummy variables for variables with 2 or more categories.
2. Two application exercises are presented: the first examines the impact of competition and other factors on airfare, while the second develops a model to forecast movie box office revenues.
3. For the airfare example, dummy variables will be created for variables with two categories like advertising. For movies, dummy variables will represent the different genres to predict box office revenues from other predictors like number of theaters and overall movie rating.
The document discusses the derivation and testing of the Capital Asset Pricing Model (CAPM). It begins by restating three key equations related to the CAPM. It then describes the assumptions and derivation of the CAPM, noting that the key insight is that the market portfolio is efficient. The document outlines how the CAPM makes testable predictions about asset expected returns and betas. It discusses additional assumptions required to test the CAPM using regression analysis. Specifically, it explains the Fama-MacBeth and Gibbons-Ross-Shanken (GRS) approaches to estimating the security market line implied by the CAPM using cross-sectional and time-series regressions respectively.
Efficient Frontier Searching of Fixed Income Portfolio under CROSS, by Sun Zhi
This document describes a framework for constructing efficient frontiers for fixed income portfolios under China's CROSS (China Risk Oriented Solvency System) regulatory framework. The framework uses quadratic programming to optimize portfolios to meet expected yield targets while staying within regulatory capital limits. A simulation case examines efficient frontiers with and without duration constraints. It finds that holding long-duration corporate bonds to maturity uses less regulatory capital than trading them. The framework allows insurance firms to maximize returns within capital limits by providing optimal asset allocations.
This document discusses the use of dummy variables in econometric modeling. It begins by explaining that some variables cannot be quantified numerically and provides examples where dummy variables would be used. It then discusses how dummy variables are incorporated into regression models, including intercept dummy variables, slope dummy variables, and dummy variables for multiple categories. The document also covers seasonal dummy variables and concludes by explaining the Chow test and dummy variable test for testing structural stability using dummy variables.
The document discusses using dummy variables in regression analysis to account for categorical variables with more than two categories. It presents a model using dummy variables to estimate differences in teacher starting salaries by gender. It then discusses the "dummy variable trap" that can occur when including too many dummy variables, causing perfect multicollinearity. The solution is to omit one dummy variable, making its category the base or reference group. The document provides an example using seasonal dummy variables to estimate differences in beer consumption across quarters while avoiding the dummy variable trap.
The document summarizes the capital asset pricing model (CAPM) and reviews early empirical tests of the model. It begins by outlining the logic and key assumptions of the CAPM, including that the market portfolio must be mean-variance efficient. However, empirical tests found problems with the CAPM's predictions about the relationship between expected returns and market betas. Specifically, cross-sectional regressions did not find intercepts equal to the risk-free rate or slopes equal to the expected market premium. To address measurement error, later tests examined portfolios rather than individual assets. In general, the early empirical evidence revealed shortcomings in the CAPM's ability to explain returns.
Dummy variables are used to represent qualitative or categorical variables that take on only two values, usually 0 and 1. A dummy variable indicates the presence or absence of a particular attribute. For example, a dummy variable could represent gender where 1 = male and 0 = female. Dummy variables allow qualitative variables to be used in regression models. However, there is a "dummy variable trap" where including dummy variables for all categories of a qualitative variable leads to perfect multicollinearity. To avoid this, only n-1 dummy variables should be included where there are n categories.
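The n-1 encoding described above is one line in pandas: drop_first=True omits one category as the reference group and so avoids the dummy variable trap. The quarter labels below are an illustrative assumption.

```python
# n-1 dummy coding: Q1 becomes the base category, leaving q_Q2..q_Q4.
import pandas as pd

df = pd.DataFrame({"quarter": ["Q1", "Q2", "Q3", "Q4", "Q1", "Q3"]})
dummies = pd.get_dummies(df["quarter"], prefix="q", drop_first=True)
print(dummies)
```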
Lots of neat examples of how to use and interpret dummy variables in regression analysis. Created by Professor Marsh for his introductory statistics course at the University of Notre Dame, Notre Dame, Indiana.
This chapter discusses time series analysis and forecasting. The key components are:
1. A time series contains data recorded over time and can be analyzed to identify trends and patterns that may continue in the future.
2. The components of a time series are secular trends, cyclical variation, seasonal variation, and irregular variation.
3. Moving averages and weighted moving averages can be used to smooth time series data and identify trends. Linear and nonlinear trend lines can also model trends in the data.
4. Seasonal indices identify seasonal patterns that repeat each year and can be used to deseasonalize time series data. Autocorrelation tests whether residuals are independent or correlated over time.
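A pandas sketch of the smoothing techniques in point 3 above, a simple moving average and a weighted moving average over a toy series; the window and weights are assumptions for illustration.

```python
# Simple vs weighted 3-period moving averages on a toy time series.
import numpy as np
import pandas as pd

s = pd.Series([12, 15, 11, 18, 20, 17, 22, 25, 21, 28])

sma = s.rolling(window=3).mean()                 # equal weights
weights = np.array([0.2, 0.3, 0.5])              # heavier weight on recent data
wma = s.rolling(window=3).apply(lambda w: np.dot(w, weights))

print(pd.DataFrame({"series": s, "SMA": sma, "WMA": wma}))
```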
This document discusses autocorrelation in the context of time series regression analysis. It begins by defining autocorrelation as correlation between observations in a time series. When autocorrelation is present, the assumptions of the classical linear regression model are violated. The document then discusses some potential causes of autocorrelation, including omitted variables, incorrect functional form, and exclusion of lagged variables. It proceeds to describe several tests to detect autocorrelation, including graphical tests, the runs test, Durbin-Watson test, and Breusch-Godfrey test. The document concludes by outlining some remedial measures that can be taken if autocorrelation is present, such as generalized least squares and first differencing transformations.
This document discusses several methods for temporal disaggregation, which is the process of estimating higher frequency data (e.g. monthly or daily) from observed lower frequency data (e.g. quarterly or yearly). It describes the Chow Lin method, which uses a linear model and regression to distribute errors among estimated high frequency values. It also discusses extensions by Fernandez and Litterman that allow for non-stationary errors by modeling the error process as a random walk or AR(1) process. The key steps of each method are outlined.
This document discusses autocorrelation and its consequences. Autocorrelation occurs when error terms in a time series regression model are correlated over time. This violates the classical linear regression assumption that error terms are independent. If autocorrelation is present, it can bias standard error estimates and invalidate statistical tests. The document outlines various causes of autocorrelation like inertia in time series data, omitted variables, incorrect functional form, lags, and data manipulation. It also discusses the consequences of autocorrelation like biased standard errors and underestimated variance estimates. Methods to detect autocorrelation graphically and through statistical tests like runs tests are presented.
Autocorrelation measures the correlation of a time series with its past and lagged values. It exists when observations in a time series are correlated with each other. The Durbin-Watson test can detect the presence of autocorrelation by examining the residuals of a regression model. If autocorrelation is present, it violates the assumption that errors are independent and leads to inaccurate test statistics and predictions. Common structures for autocorrelation include autoregressive (AR), moving average (MA), and autoregressive moving average (ARMA) processes.
This document discusses multiple linear regression analysis. It begins by introducing the basic multiple regression model that includes more than one predictor variable. It then discusses the assumptions of multiple regression including adequate sample size, absence of outliers and multicollinearity, and normality, linearity and homoscedasticity of residuals. The document provides an example of predicting house prices using living area and distance from the city center as predictor variables. It shows how to check assumptions, interpret the regression output and make predictions using the fitted model.
This document discusses logistic regression, including:
- Logistic regression can be used when the dependent variable is binary and predicts the probability of an event occurring.
- The logistic regression equation calculates the log odds of an event occurring based on independent variables.
- Logistic regression is commonly used in medical research when variables are a mix of categorical and continuous.
The document provides an overview of regression analysis. It defines regression analysis as a technique used to estimate the relationship between a dependent variable and one or more independent variables. The key purposes of regression are to estimate relationships between variables, determine the effect of each independent variable on the dependent variable, and predict the dependent variable given values of the independent variables. The document also outlines the assumptions of the linear regression model, introduces simple and multiple regression, and describes methods for model building including variable selection procedures.
Logistic regression is used to model the probability of binary and multiclass classification problems. It assumes a linear relationship between predictors and the log-odds of the target variable. The regression coefficients are estimated using maximum likelihood estimation in an iterative process. Model fit is assessed using measures like deviance and likelihood ratio tests rather than R^2, with smaller deviance indicating better fit. The predictive ability of logistic regression models can be evaluated using metrics like accuracy from a confusion matrix, cross-validation, and the area under the ROC curve (AUC).
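A short scikit-learn sketch of the evaluation workflow summarized above: fit a logistic regression, then report the confusion matrix and the area under the ROC curve. The dataset is a stand-in assumption.

```python
# Logistic regression with confusion-matrix and AUC evaluation.
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import confusion_matrix, roc_auc_score
from sklearn.model_selection import train_test_split

X, y = load_breast_cancer(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

clf = LogisticRegression(max_iter=5000).fit(X_tr, y_tr)
print("confusion matrix:\n", confusion_matrix(y_te, clf.predict(X_te)))
print("AUC:", roc_auc_score(y_te, clf.predict_proba(X_te)[:, 1]))
```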
The document discusses various econometric modeling techniques including regression equations, cointegration, error correction models, vector autoregressive (VAR) modeling, and vector error correction models (VECM). It explains that regression equations can produce spurious results if the data is non-stationary, and that cointegration exists if the residuals from a regression equation are stationary. Error correction models specify the short-run relationship that maintains the long-run equilibrium between cointegrated variables. VAR models express current values of variables as functions of past values, while VECMs are VARs in first differences that incorporate the long-run cointegrating relationships between variables.
Logistic regression vs. logistic classifier: history of the confusion and the..., by Adrian Olszewski
Despite the wrong (yet widespread) claim that "logistic regression is not a regression", it is one of the key regression tools in experimental research, such as clinical trials. It is also used for advanced hypothesis testing.
The logistic regression is part of the GLM (Generalized Linear Model) regression framework. I expanded this topic here: https://medium.com/@r.clin.res/is-logistic-regression-a-regression-46dcce4945dd
This document discusses correlation, regression, and issues that can arise when performing regression analysis. It defines correlation and covariance, and how to interpret a scatter plot. It explains how to test for statistical significance of correlation and establish if a linear relationship exists between variables. Simple and multiple linear regression are explained, including assumptions, model construction, and importance of regression coefficients. It discusses how to assess the importance of independent variables in explaining the dependent variable using t-tests, F-tests, R-squared, and adjusted R-squared. Potential issues like heteroskedasticity and multicollinearity are also summarized.
This document provides an overview of regularized regression techniques including ridge regression and lasso regression. It discusses when to use regularization to prevent overfitting, the tradeoff between bias and variance, and different types of regularization. Ridge regression minimizes the sum of squared coefficients while lasso regression minimizes the sum of absolute values of coefficients, allowing it to perform variable selection. Cross-validation is described as a method for selecting the optimal regularization parameter lambda. Advantages of regularization include improved generalization and interpretability. The document also provides an example using different regression models to predict diamond prices based on other variables in a dataset.
This document provides an introduction to generalized linear mixed models (GLMMs). GLMMs allow for modeling of data that violates assumptions of linear mixed models, such as non-normal distributions and non-constant variance. The document discusses the components of a GLMM, including the linear predictor, inverse link function, and variance function. It also describes how to derive estimating equations for GLMMs and provides an example for a univariate logit model. Estimation of variance components is also briefly discussed.
Ordinary least-squares (OLS) regression is a statistical method used to model relationships between variables. It allows for prediction of a continuous dependent variable from one or more independent variables. The OLS model finds the line of "best fit" by minimizing the sum of the squares of the distances between the observed dependent variable values and the dependent variable values predicted by the linear approximation. Key outputs include coefficients that measure the strength of the relationships, goodness of fit statistics like the residual sum of squares, and tests of significance.
Distribution of Estimates: Linear Regression Model (.docx), by madlynplamondon
Distribution of Estimates

Linear Regression Model: assume $(y_t, x_t)$ are independent and identically distributed with $E(x_t e_t) = 0$.

Estimation Consistency: the estimates approach the true values as the sample size increases, and the estimation variance decreases as the sample size increases.

Illustration of Consistency: take a random sample of U.S. men and estimate a linear regression of log(wages) on education. The total sample is 9089. Start with 100 observations and sequentially increase the sample size until the final regression uses the whole 9089; the plotted sequence of slope coefficients settles toward the true value. A companion plot illustrates asymptotic normality.

Time Series: do these results (consistency, asymptotic normality, the variance formula) apply to time-series data, that is, to AR models where $x_t = y_{t-1}$, to trend and seasonal models, and to one-step and multi-step forecasting?

Derivation of Variance Formula: for simplicity, assume the variables have zero mean and the regression has no intercept, so the model is $y_t = \beta x_t + e_t$. OLS minimizes the sum of squares $\sum_{t=1}^{T} (y_t - \beta x_t)^2$. The first-order condition is $\sum_{t=1}^{T} x_t (y_t - \hat{\beta} x_t) = 0$, with solution $\hat{\beta} = \sum_t x_t y_t / \sum_t x_t^2$. Now substitute $y_t = \beta x_t + e_t$: we have $\hat{\beta} = \beta + \sum_t x_t e_t / \sum_t x_t^2$. The denominator divided by $T$ is the sample variance of $x$ (when $x$ has mean zero), so $\hat{\beta} - \beta \approx \frac{1}{T \sigma_x^2} \sum_t v_t$, where $v_t = x_t e_t$. From the covariance formula, $\mathrm{var}(\sum_t v_t) = \sum_t \mathrm{var}(v_t) + \sum_{t \ne j} \mathrm{cov}(v_t, v_j)$; when the observations are independent, the covariances are zero, and since $\mathrm{var}(v_t) = E(x_t^2 e_t^2)$ we obtain $\mathrm{var}(\hat{\beta}) \approx \frac{E(x_t^2 e_t^2)}{T (\sigma_x^2)^2}$, as stated at the beginning.

Extension to Time Series: the only place this argument used the assumption of independent observations was to show that $v_t = x_t e_t$ has zero covariance with $v_j = x_j e_j$, which is to say that $v_t$ is not autocorrelated.

Unforecastable one-step errors: in one-step-ahead forecasting, if the regression error is unforecastable, meaning $E(e_t \mid \text{information at } t-1) = 0$, then $v_t$ is not autocorrelated, and the variance formula above applies to the least-squares estimate. Why is this true? For simplicity, suppose that $x_t = 1$; then for $j < t$, $\mathrm{cov}(v_t, v_j) = E(e_t e_j) = E(E(e_t \mid \text{past}) \, e_j) = 0$.

Summary: in one-step-ahead time-series models, if the error is unforecastable, then least-squares estimates satisfy the asymptotic (approximate) distribution $\hat{\beta} \approx N(\beta, \mathrm{var}(\hat{\beta}))$. As the sample size $T$ is in the denominator of the variance, the variance decreases as the sample size increases; this means that least squares is consistent.

Variance Formula: the variance formula for the least-squares estimate takes the form $\mathrm{var}(\hat{\beta}) \approx E(x_t^2 e_t^2) / (T (\sigma_x^2)^2)$. This formula is valid in time-series regression when the error is unforecastable.

Classical Variance Formula and Homoskedasticity: if we make the simplifying assumption of "conditional homoskedasticity", $E(e_t^2 \mid x_t) = \sigma^2$, then the formula reduces to $\mathrm{var}(\hat{\beta}) \approx \sigma^2 / (T \sigma_x^2)$. This assumption is made to ease calculations and is conventional in introductory econometrics courses, but it is not used in serious econometrics.

Variance Formula, AR(1) Model: take the AR(1) model $y_t = \alpha y_{t-1} + e_t$ with unforecastable homoskedastic errors. Here $x_t = y_{t-1}$, and since in this model $\sigma_x^2 = \mathrm{var}(y_t) = \sigma^2 / (1 - \alpha^2)$, the variance of the OLS estimate is $\mathrm{var}(\hat{\alpha}) \approx (1 - \alpha^2) / T$. The asymp ...
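The AR(1) result above is easy to check by simulation. Below is a minimal Monte Carlo sketch (my own illustration, not part of the original deck) comparing the sampling variance of the OLS estimate of $\alpha$ with the formula $(1 - \alpha^2)/T$.

```python
# Monte Carlo check of var(alpha_hat) ~ (1 - alpha^2)/T for the AR(1) model
# with unforecastable homoskedastic errors.
import numpy as np

rng = np.random.default_rng(0)
alpha, T, reps = 0.5, 400, 5000
estimates = np.empty(reps)

for r in range(reps):
    e = rng.standard_normal(T + 100)          # i.i.d. N(0,1) errors
    y = np.zeros(T + 100)
    for t in range(1, T + 100):
        y[t] = alpha * y[t - 1] + e[t]        # AR(1) recursion
    y = y[100:]                               # drop burn-in
    x, ynext = y[:-1], y[1:]
    estimates[r] = (x @ ynext) / (x @ x)      # OLS slope, no intercept

print("simulated var:", estimates.var())
print("formula (1 - alpha^2)/T:", (1 - alpha**2) / T)
```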
Covariance and Correlation, by Dereje Jima
The document discusses covariance and correlation, which are mathematical models used to assess relationships between variables. Covariance measures how two variables change together, while correlation measures both the strength and direction of the linear relationship between variables. Correlation coefficients range from -1 to 1, where values closer to 1 or -1 indicate a strong linear relationship and values closer to 0 indicate no linear relationship. The document also discusses partial correlation and multiple correlation, which measure relationships while controlling for additional variables. Factors that can affect correlation analyses include sample size and outliers.
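A quick numeric companion to these definitions, on made-up data: covariance measures how two variables change together, and the correlation coefficient rescales it to [-1, 1].

```python
# Covariance and Pearson correlation with NumPy (illustrative data).
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.0, 4.0, 5.0, 4.0, 6.0])

print("covariance:", np.cov(x, y)[0, 1])
print("correlation:", np.corrcoef(x, y)[0, 1])   # strength and direction
```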
Empirical Finance, by Jordan Stone (LinkedIn)
This document provides instructions for a coursework assignment on analyzing exchange rate data using EViews. Students are asked to:
1) Import exchange rate and inflation rate data and comment on descriptive statistics.
2) Estimate a regression model testing purchasing power parity and comment on results.
3) Define autocorrelation and its consequences for OLS estimators.
4) Test the regression model from question 2 for autocorrelation using the Durbin-Watson test.
Chapter wise All Notes of First year Basic Civil Engineering.pptx, by Denish Jangid
Chapter wise All Notes of First year Basic Civil Engineering
Syllabus
Chapter-1
Introduction to the objective, scope and outcome of the subject
Chapter 2
Introduction: Scope and Specialization of Civil Engineering, Role of civil Engineer in Society, Impact of infrastructural development on economy of country.
Chapter 3
Surveying: Object, Principles & Types of Surveying; Site Plans, Plans & Maps; Scales & Units of different Measurements.
Linear Measurements: Instruments used. Linear Measurement by Tape, Ranging out Survey Lines and overcoming Obstructions; Measurements on sloping ground; Tape corrections, conventional symbols. Angular Measurements: Instruments used; Introduction to Compass Surveying, Bearings and Longitude & Latitude of a Line, Introduction to total station.
Levelling: Instrument used Object of levelling, Methods of levelling in brief, and Contour maps.
Chapter 4
Buildings: Selection of site for Buildings, Layout of Building Plan, Types of buildings, Plinth area, carpet area, floor space index, Introduction to building byelaws, concept of sun light & ventilation. Components of Buildings & their functions, Basic concept of R.C.C., Introduction to types of foundation
Chapter 5
Transportation: Introduction to Transportation Engineering; Traffic and Road Safety: Types and Characteristics of Various Modes of Transportation; Various Road Traffic Signs, Causes of Accidents and Road Safety Measures.
Chapter 6
Environmental Engineering: Environmental Pollution, Environmental Acts and Regulations, Functional Concepts of Ecology, Basics of Species, Biodiversity, Ecosystem, Hydrological Cycle; Chemical Cycles: Carbon, Nitrogen & Phosphorus; Energy Flow in Ecosystems.
Water Pollution: Water Quality standards, Introduction to Treatment & Disposal of Waste Water. Reuse and Saving of Water, Rain Water Harvesting. Solid Waste Management: Classification of Solid Waste, Collection, Transportation and Disposal of Solid. Recycling of Solid Waste: Energy Recovery, Sanitary Landfill, On-Site Sanitation. Air & Noise Pollution: Primary and Secondary air pollutants, Harmful effects of Air Pollution, Control of Air Pollution. . Noise Pollution Harmful Effects of noise pollution, control of noise pollution, Global warming & Climate Change, Ozone depletion, Greenhouse effect
Text Books:
1. Palancharmy, Basic Civil Engineering, McGraw Hill publishers.
2. Satheesh Gopi, Basic Civil Engineering, Pearson Publishers.
3. Ketki Rangwala Dalal, Essentials of Civil Engineering, Charotar Publishing House.
4. BCP, Surveying volume 1
Communicating effectively and consistently with students can help them feel at ease during their learning experience and provide the instructor with a communication trail to track the course's progress. This workshop will take you through constructing an engaging course container to facilitate effective communication.
LAND USE LAND COVER AND NDVI OF MIRZAPUR DISTRICT, UPRAHUL
This Dissertation explores the particular circumstances of Mirzapur, a region located in the
core of India. Mirzapur, with its varied terrains and abundant biodiversity, offers an optimal
environment for investigating the changes in vegetation cover dynamics. Our study utilizes
advanced technologies such as GIS (Geographic Information Systems) and Remote sensing to
analyze the transformations that have taken place over the course of a decade.
The complex relationship between human activities and the environment has been the focus
of extensive research and worry. As the global community grapples with swift urbanization,
population expansion, and economic progress, the effects on natural ecosystems are becoming
more evident. A crucial element of this impact is the alteration of vegetation cover, which plays a
significant role in maintaining the ecological equilibrium of our planet.Land serves as the foundation for all human activities and provides the necessary materials for
these activities. As the most crucial natural resource, its utilization by humans results in different
'Land uses,' which are determined by both human activities and the physical characteristics of the
land.
The utilization of land is impacted by human needs and environmental factors. In countries
like India, rapid population growth and the emphasis on extensive resource exploitation can lead
to significant land degradation, adversely affecting the region's land cover.
Therefore, human intervention has significantly influenced land use patterns over many
centuries, evolving its structure over time and space. In the present era, these changes have
accelerated due to factors such as agriculture and urbanization. Information regarding land use and
cover is essential for various planning and management tasks related to the Earth's surface,
providing crucial environmental data for scientific, resource management, policy purposes, and
diverse human activities.
Accurate understanding of land use and cover is imperative for the development planning
of any area. Consequently, a wide range of professionals, including earth system scientists, land
and water managers, and urban planners, are interested in obtaining data on land use and cover
changes, conversion trends, and other related patterns. The spatial dimensions of land use and
cover support policymakers and scientists in making well-informed decisions, as alterations in
these patterns indicate shifts in economic and social conditions. Monitoring such changes with the
help of Advanced technologies like Remote Sensing and Geographic Information Systems is
crucial for coordinated efforts across different administrative levels. Advanced technologies like
Remote Sensing and Geographic Information Systems
9
Changes in vegetation cover refer to variations in the distribution, composition, and overall
structure of plant communities across different temporal and spatial scales. These changes can
occur natural.
This document provides an overview of wound healing, its functions, stages, mechanisms, factors affecting it, and complications.
A wound is a break in the integrity of the skin or tissues, which may be associated with disruption of the structure and function.
Healing is the body’s response to injury in an attempt to restore normal structure and functions.
Healing can occur in two ways: Regeneration and Repair
There are 4 phases of wound healing: hemostasis, inflammation, proliferation, and remodeling. This document also describes the mechanism of wound healing. Factors that affect healing include infection, uncontrolled diabetes, poor nutrition, age, anemia, the presence of foreign bodies, etc.
Complications of wound healing like infection, hyperpigmentation of scar, contractures, and keloid formation.
বাংলাদেশের অর্থনৈতিক সমীক্ষা ২০২৪ [Bangladesh Economic Review 2024 Bangla.pdf] কম্পিউটার , ট্যাব ও স্মার্ট ফোন ভার্সন সহ সম্পূর্ণ বাংলা ই-বুক বা pdf বই " সুচিপত্র ...বুকমার্ক মেনু 🔖 ও হাইপার লিংক মেনু 📝👆 যুক্ত ..
আমাদের সবার জন্য খুব খুব গুরুত্বপূর্ণ একটি বই ..বিসিএস, ব্যাংক, ইউনিভার্সিটি ভর্তি ও যে কোন প্রতিযোগিতা মূলক পরীক্ষার জন্য এর খুব ইম্পরট্যান্ট একটি বিষয় ...তাছাড়া বাংলাদেশের সাম্প্রতিক যে কোন ডাটা বা তথ্য এই বইতে পাবেন ...
তাই একজন নাগরিক হিসাবে এই তথ্য গুলো আপনার জানা প্রয়োজন ...।
বিসিএস ও ব্যাংক এর লিখিত পরীক্ষা ...+এছাড়া মাধ্যমিক ও উচ্চমাধ্যমিকের স্টুডেন্টদের জন্য অনেক কাজে আসবে ...
Strategies for Effective Upskilling is a presentation by Chinwendu Peace in a Your Skill Boost Masterclass organisation by the Excellence Foundation for South Sudan on 08th and 09th June 2024 from 1 PM to 3 PM on each day.
हिंदी वर्णमाला पीपीटी, hindi alphabet PPT presentation, hindi varnamala PPT, Hindi Varnamala pdf, हिंदी स्वर, हिंदी व्यंजन, sikhiye hindi varnmala, dr. mulla adam ali, hindi language and literature, hindi alphabet with drawing, hindi alphabet pdf, hindi varnamala for childrens, hindi language, hindi varnamala practice for kids, https://www.drmullaadamali.com
GROUP ASSIGNMENT
BUSINESS PROJECTION TECHNIQUES
SUMMARY
"Regression with Time Series Data"
Lecturer:
Sigit Indrawijaya, SE, M.Si
Prepared by:
Rizano Ahdiat Rash'ada (C1B011047)
Muchlas Pratama
Roby Harianto (C1B011005)
MANAGEMENT STUDY PROGRAM
FACULTY OF ECONOMICS
UNIVERSITAS JAMBI
2013
Regression with Time Series Data
Many business and economic applications of forecasting involve time series data. Regression models can be fit to monthly, quarterly, or yearly data using the techniques described in previous chapters. However, because data collected over time tend to exhibit trends, seasonal patterns, and so forth, observations in different time periods are related, or autocorrelated. That is, for time series data, the sample of observations cannot be regarded as a random sample. Problems of interpretation can arise when standard regression methods are applied to observations that are related to one another over time. Fitting regression models to time series data must be done with considerable care.
Time Series Data and the Problem of Autocorrelation
With time series data, the assumption of independence rarely holds. Consider the annual base price for a particular model of a new car. Can you imagine the chaos that would exist if the new car prices from one year to the next were indeed unrelated (independent) of one another? In such a world, prices would be determined like numbers drawn from a random number table. Knowledge of the price in one year would not tell you anything about the price in the next year. In the real world, the price in the current year is related to (correlated with) the price in the previous year, and maybe the price two years ago, and so forth. That is, the prices in different years are autocorrelated; they are not independent.
Autocorrelation exists when successive observations over time are related to one another.
Autocorrelation can occur because the effect of a predictor variable on the response is distributed over time. For example, an increase in salary may affect your consumption (or saving) not only in the current period but also in several future periods. A current labor contract may affect the cost of production for some time to come. Over time, relationships tend to be dynamic (evolving), not static.
From a forecasting perspective, autocorrelation is not all bad. If values of a response Y in one time period are related to Y values in previous time periods, then previous Y's can be used to predict future Y's. In a regression framework, autocorrelation is handled by "fixing up" the standard regression model. To accommodate autocorrelation, sometimes it is necessary to change the mix of predictor variables and/or the form of the regression function. More typically, however, autocorrelation is handled by changing the nature of the error term.
A common kind of autocorrelation, often called first-order serial correlation, is one in which the error term in the current time period is directly related to the error term in the previous time period. In this case, with the subscript t representing time, the simple linear regression model takes the form
$$Y_t = \beta_0 + \beta_1 X_t + \varepsilon_t \qquad (1)$$
with
$$\varepsilon_t = \rho\,\varepsilon_{t-1} + v_t \qquad (2)$$
where
εt = the error at time t
ρ = the parameter (lag 1 autocorrelation coefficient) that measures the correlation between adjacent error terms
vt = a normally distributed independent error term with mean 0 and variance σ²
Equation 2 says that the level of one error term (εt−1) directly affects the level of the next error term (εt). The magnitude of the autocorrelation coefficient ρ, where −1 < ρ < 1, indicates the strength of the serial correlation. If ρ is zero, then there is no serial correlation, and the error terms are independent (εt = vt).
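To make Equations 1 and 2 concrete, here is a minimal Python sketch that simulates a regression with first-order serially correlated errors and checks the lag 1 autocorrelation of the OLS residuals; the parameter values (β0 = 10, β1 = 2, ρ = 0.8) are illustrative assumptions, not values from the text.

```python
import numpy as np

rng = np.random.default_rng(42)

# Illustrative (assumed) values: beta0 = 10, beta1 = 2, rho = 0.8
T, beta0, beta1, rho = 100, 10.0, 2.0, 0.8

x = rng.normal(size=T)
v = rng.normal(size=T)                  # independent errors v_t
eps = np.zeros(T)
for t in range(1, T):                   # first-order serial correlation:
    eps[t] = rho * eps[t - 1] + v[t]    # eps_t = rho * eps_{t-1} + v_t
y = beta0 + beta1 * x + eps

# Fit OLS and inspect the lag-1 autocorrelation of the residuals
X = np.column_stack([np.ones(T), x])
b, *_ = np.linalg.lstsq(X, y, rcond=None)
e = y - X @ b
r1 = np.corrcoef(e[1:], e[:-1])[0, 1]
print(f"coefficient estimates: {b}")
print(f"lag-1 residual autocorrelation: {r1:.2f}")   # clearly positive
```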
Durbin-Watson Test for Serial Correlation
One approach that is used frequently to determine whether serial correlation is present is the Durbin-Watson test. The test involves determining whether the autocorrelation parameter ρ shown in Equation 2 is zero. Consider
$$\varepsilon_t = \rho\,\varepsilon_{t-1} + v_t$$
The hypotheses to be tested are
$$H_0: \rho = 0 \qquad H_1: \rho > 0$$
The alternative hypothesis is ρ > 0, since business and economic time series tend to show positive autocorrelation.
If a regression model does not properly account for autocorrelation, the residuals will be autocorrelated. So, the Durbin-Watson test is carried out using the residuals from the regression analysis.
The Durbin-Watson statistic is defined as
$$DW = \frac{\sum_{t=2}^{T}(e_t - e_{t-1})^2}{\sum_{t=1}^{T} e_t^2}$$
where
et = Yt − Ŷt = the residual for time period t
et−1 = Yt−1 − Ŷt−1 = the residual for time period t − 1
For positive serial correlation, successive residuals tend to be alike, and the sum of squared differences in the numerator of the Durbin-Watson statistic will be relatively small. Small values of the Durbin-Watson statistic are consistent with positive serial correlation.
The autocorrelation coefficient ρ can be estimated by the lag 1 residual autocorrelation r1(e), and with a little mathematical maneuvering, the Durbin-Watson statistic can be related to r1(e). For moderate to large samples,
$$DW \approx 2(1 - r_1(e))$$
Since −1 < r1(e) < 1, the equation above shows that 0 < DW < 4. For r1(e) close to 0, the DW statistic will be close to 2. Positive lag 1 residual autocorrelation is associated with DW values less than 2, and negative lag 1 residual autocorrelation is associated with DW values above 2.
A useful, but sometimes not definitive, test for serial correlation can be performed by comparing the calculated value of the Durbin-Watson statistic with lower (L) and upper (U) bounds. The decision rules are:
1. When the Durbin-Watson statistic is larger than the upper (U) bound, the autocorrelation coefficient ρ is equal to zero (there is no positive autocorrelation).
2. When the Durbin-Watson statistic is smaller than the lower (L) bound, the autocorrelation coefficient ρ is greater than zero (there is positive autocorrelation).
3. When the Durbin-Watson statistic lies within the lower and upper bounds, the test is inconclusive (we don't know whether there is positive autocorrelation).
The Durbin-Watson test is used to determine whether positive autocorrelation is present:
If DW > U, conclude H0: ρ = 0. If DW < L, conclude H1: ρ > 0.
If DW lies within the lower and upper bounds (L ≤ DW ≤ U), the test is inconclusive.
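As a computational illustration, the sketch below computes the Durbin-Watson statistic directly from its definition for a series of stand-in residuals with positive lag 1 autocorrelation (ρ = 0.8, an assumed value) and compares it with the approximation 2(1 − r1(e)).

```python
import numpy as np

def durbin_watson_stat(e: np.ndarray) -> float:
    """DW = sum_{t=2}^T (e_t - e_{t-1})^2 / sum_{t=1}^T e_t^2."""
    return float(np.sum(np.diff(e) ** 2) / np.sum(e ** 2))

# Stand-in residuals with positive lag-1 autocorrelation (rho = 0.8 assumed)
rng = np.random.default_rng(0)
e = np.zeros(200)
for t in range(1, 200):
    e[t] = 0.8 * e[t - 1] + rng.normal()

r1 = np.corrcoef(e[1:], e[:-1])[0, 1]
print(f"DW = {durbin_watson_stat(e):.2f}")     # well below 2
print(f"2(1 - r1(e)) = {2 * (1 - r1):.2f}")    # close to the DW value
```

In practice the computed DW would be compared with the tabulated L and U bounds for the given sample size and number of predictors.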
Solutions to Autocorrelation Problems
Once autocorrelation has been discovered in a regression of time series data, it is necessary to remove it, or model it, before the regression function can be evaluated for its effectiveness.
The solution to the problem of serial correlation begins with an evaluation of the model specification. Is the functional form correct? Were any important variables omitted? Are there effects that might have some pattern over time that could have introduced autocorrelation into the errors?
Since a major cause of autocorrelated errors in the regression model is the omission of one or more key variables, the best approach to solving the problem is to find them. This effort is sometimes referred to as improving the model specification. Model specification not only involves finding the important predictor variables, it also involves entering these variables in the regression function in the right way. Unfortunately, it is not always possible to improve the model specification, because an important missing variable may not be quantifiable or, if it is quantifiable, the data may not be available. For example, one may suspect that business investment in future periods is related to the attitude of potential investors. However, it is difficult to quantify the variable "attitude." Nevertheless, whenever possible, the model should be specified in accordance with theoretically sound insight.
Only after the specification of the equation has been carefully reviewed should the possibility of an adjustment be considered. Several techniques for eliminating autocorrelation will be discussed.
One approach to eliminating autocorrelation is to add an omitted variable to the regression function that explains the association in the response from one period to the next.
REGRESSION WITH DIFFERENCES
For highly autocorrelated data, modeling changes rather than levels can often eliminate the serial correlation. That is, instead of formulating the regression equation in terms of Y and X1, X2, ..., Xk, the regression equation is written in terms of the differences, ΔYt = Yt − Yt−1, ΔXt1 = Xt1 − Xt−1,1, ΔXt2 = Xt2 − Xt−1,2, and so forth. Differences should be considered when the Durbin-Watson statistic associated with the regression involving the original variables is close to 0.
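A quick sketch of this advice on simulated data (all values assumed): the regression in levels gives a Durbin-Watson statistic near 0, while the regression in differences gives one near 2.

```python
import numpy as np

rng = np.random.default_rng(1)
T = 120

# Simulated trending predictor with highly autocorrelated errors (illustrative)
x = np.cumsum(rng.normal(size=T)) + 0.5 * np.arange(T)
eps = np.cumsum(rng.normal(size=T))     # near unit-root errors
y = 5.0 + 2.0 * x + eps

def ols_resid(y, x):
    X = np.column_stack([np.ones(len(x)), x])
    b, *_ = np.linalg.lstsq(X, y, rcond=None)
    return y - X @ b

def dw(e):
    return np.sum(np.diff(e) ** 2) / np.sum(e ** 2)

print(f"DW in levels:      {dw(ols_resid(y, x)):.2f}")                    # near 0
print(f"DW in differences: {dw(ols_resid(np.diff(y), np.diff(x))):.2f}")  # near 2
```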
One rationale for differencing comes from the following argument.
$$Y_t = \beta_0 + \beta_1 X_t + \varepsilon_t$$
with
$$\varepsilon_t = \rho\,\varepsilon_{t-1} + v_t$$
where
ρ = the correlation between consecutive errors
vt = a random error (equal to εt when ρ = 0)
The model holds for any time period, so
$$Y_{t-1} = \beta_0 + \beta_1 X_{t-1} + \varepsilon_{t-1}$$
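Multiplying the lagged equation by ρ and subtracting it from the original equation removes the autocorrelated part of the error, leaving a regression in "generalized differences" with the well-behaved error term vt:
$$Y_t - \rho Y_{t-1} = \beta_0(1 - \rho) + \beta_1\,(X_t - \rho X_{t-1}) + v_t$$
When ρ = 1 (the case suggested by a Durbin-Watson statistic near 0), this reduces to a regression of the simple differences Yt − Yt−1 on Xt − Xt−1, which is why differencing can eliminate strong positive serial correlation.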
Time Series Data and the Problem of Heteroscedasticity
Variability can increase if a variable is growing at a constant rate rather than a constant amount over time. Nonconstant variability is called heteroscedasticity. In a regression framework, heteroscedasticity occurs if the variance of the error term, ε, is not constant. If the variability for recent time periods is larger than it was for past time periods, then the standard error of the estimate underestimates the current standard deviation of the error term. If the standard error of the estimate is then used to set forecast limits for future observations, these limits can be too narrow for the stated confidence level.
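A small simulated illustration of why this matters (all numbers assumed): when the error standard deviation grows over time, limits built from a single pooled standard error of the estimate cover far fewer recent observations than the nominal confidence level suggests.

```python
import numpy as np

rng = np.random.default_rng(2)
T = 400
t = np.arange(T)

# Error standard deviation grows over time (heteroscedastic errors, illustrative)
sigma_t = 1.0 + 0.02 * t
y = 10.0 + 0.5 * t + rng.normal(scale=sigma_t)

# Fit a linear trend and compute one pooled standard error of the estimate
X = np.column_stack([np.ones(T), t])
b, *_ = np.linalg.lstsq(X, y, rcond=None)
e = y - X @ b
s = np.sqrt(np.sum(e ** 2) / (T - 2))

# Coverage of a +/- 2s band: fine early on, too narrow for recent periods
early = np.mean(np.abs(e[:100]) <= 2 * s)
late = np.mean(np.abs(e[-100:]) <= 2 * s)
print(f"coverage, first 100 obs: {early:.0%}; last 100 obs: {late:.0%}")
```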
Using Regression to Forecast Seasonal Data
In this model the seasonality is handled by using dummy variables in the regression function.
A seasonal model for quarterly data with a time trend is
$$Y_t = \beta_0 + \beta_1 t + \beta_2 S_2 + \beta_3 S_3 + \beta_4 S_4 + \varepsilon_t$$
where
Yt = the variable to be forecast
t = the time index
S2 = a dummy variable that is 1 for the second quarter of the year; 0 otherwise
S3 = a dummy variable that is 1 for the third quarter of the year; 0 otherwise
S4 = a dummy variable that is 1 for the fourth quarter of the year; 0 otherwise
εt = errors assumed to be independent and normally distributed with mean zero and constant variance
β0, β1, β2, β3, β4 = coefficients to be estimated
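The sketch below builds the trend and quarterly dummy columns and fits this seasonal model by least squares on a simulated quarterly series; the data and coefficient values are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(3)
T = 40                                  # ten years of quarterly data (assumed)
t = np.arange(1, T + 1)
quarter = ((t - 1) % 4) + 1             # 1, 2, 3, 4, 1, 2, ...

# Simulated series: trend plus quarterly seasonal effects plus noise
seasonal = np.array([0.0, 5.0, -3.0, 8.0])[quarter - 1]
y = 20.0 + 0.7 * t + seasonal + rng.normal(scale=1.0, size=T)

# Design matrix: intercept, trend, and dummies S2, S3, S4 (Q1 is the base case)
X = np.column_stack([
    np.ones(T),
    t,
    (quarter == 2).astype(float),
    (quarter == 3).astype(float),
    (quarter == 4).astype(float),
])
b, *_ = np.linalg.lstsq(X, y, rcond=None)
print("b0, b1, b2, b3, b4 =", np.round(b, 2))

# Forecast the next quarter (t = 41 falls in Q1, so all dummies are 0)
x_next = np.array([1.0, 41.0, 0.0, 0.0, 0.0])
print("forecast for t = 41:", round(float(x_next @ b), 2))
```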
Econometric Forecasting
When regression analysis is applied to economic data, the predictions developed
from such models are referred to as economic forecasts. However, since economic
theory frequently suggests that the values taken by the quantities of interest are
determined through the simultaneous interaction of different economic forces, it
may be necessary to model this interaction with a set of simultaneous equations.
This idea leads to the construction of simultaneous equation econometric models.
These models involve individual equations that look like regression equations.
However, in a simultaneous system the individual equations are related, and the
econometric model allows the joint determination of a set of dependent variables in
terms of several independent variables. This contrasts with the usual regression
situation in which a single equation determines the expected value of one
dependent variable in terms of the independent variables.
A simultaneous equation econometric model determines jointly the values of a set of dependent variables, called endogenous variables by econometricians, in terms of the values of independent variables, called exogenous variables. The values of the exogenous variables are assumed to influence the endogenous variables but not the other way around. A complete simultaneous equation model will involve the same number of equations as endogenous variables.
Economic theory holds that, in equilibrium, the quantity supplied is equal to the quantity demanded at a particular price. That is, the quantity demanded, the quantity supplied, and price are determined simultaneously. In one study of the price elasticity of demand, the model was specified as
$$Q_t = \alpha_0 + \alpha_1 P_t + \alpha_2 I_t + \alpha_3 T_t + \varepsilon_t$$
$$P_t = \beta_0 + \beta_1 Q_t + \beta_2 L_t + v_t$$
where
Qt = a measure of demand (quantity sold)
Pt = a measure of price (deflated dollars)
It = a measure of income per capita
Tt = a measure of temperature
Lt = a measure of labor cost
εt, vt = independent error terms that are uncorrelated with each other
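The text does not discuss how such a system is estimated; one standard approach is two-stage least squares, sketched below on simulated data: the endogenous regressor Pt in the demand equation is replaced by its fitted value from a first-stage regression on all the exogenous variables. All parameter values and variable names (I for income, Temp for temperature, L for labor cost) are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(4)
T = 500

# Exogenous variables: income I, temperature Temp, labor cost L (simulated)
I = rng.normal(10, 1, T)
Temp = rng.normal(20, 5, T)
L = rng.normal(5, 1, T)
eps, v = rng.normal(0, 1, T), rng.normal(0, 1, T)

# Solve the two structural equations jointly for the endogenous Q_t and P_t:
#   Q = a0 + a1*P + a2*I + a3*Temp + eps,   P = b0 + b1*Q + b2*L + v
a0, a1, a2, a3 = 50.0, -2.0, 1.5, 0.3
b0, b1, b2 = 5.0, 0.2, 1.0
denom = 1.0 - a1 * b1
Q = (a0 + a1 * (b0 + b2 * L + v) + a2 * I + a3 * Temp + eps) / denom
P = b0 + b1 * Q + b2 * L + v

def ols(y, X):
    return np.linalg.lstsq(X, y, rcond=None)[0]

# Stage 1: regress P on all exogenous variables; take fitted values P_hat
Z = np.column_stack([np.ones(T), I, Temp, L])
P_hat = Z @ ols(P, Z)

# Stage 2: estimate the demand equation with P_hat in place of P
X2 = np.column_stack([np.ones(T), P_hat, I, Temp])
print("2SLS demand estimates:", np.round(ols(Q, X2), 2))   # ~ [50, -2, 1.5, 0.3]
```

Ordinary least squares applied directly to the demand equation would be biased here, because Pt and εt are determined jointly; the first stage purges Pt of its correlation with the error.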
Large-scale econometric models are being used today to model the behavior of specific firms within an industry, selected industries within the economy, and the total economy. Econometric models can include any number of simultaneous multiple regression-like equations. Econometric models are used to understand how the economy works and to generate forecasts of key economic variables. Econometric models are important aids in policy formulation.