Statistics Lab
Rodolfo Metulini
IMT Institute for Advanced Studies, Lucca, Italy

Lesson 4 - The linear Regression Model: ...
The linear regression model: Theory and Application


  1. Statistics Lab. Rodolfo Metulini, IMT Institute for Advanced Studies, Lucca, Italy. Lesson 4 - The Linear Regression Model: Theory and Application - 21.01.2014
  2. Introduction
     In the past practicals we analyzed one variable at a time. It is often useful to analyze two or more variables together.
     The question we want to answer is which relations, and which causal effects, determine changes in a variable, and whether a certain phenomenon is endogenous or exogenous.
     In symbols, the idea can be represented as follows: $y = f(x_1, x_2, \dots)$
     Y is the response, which is a function of (it depends on) one or more variables.
  3. Objectives
     All in all, the regression model is the instrument used to:
     - measure the strength of the relation between two or more variables, Y and X, and the causal direction ($X \to Y$, or vice versa?)
     - forecast the value of the variable Y in response to changes in the others, $X_1, X_2, \dots$ (called explanatory variables), or for cases that are not in the sample.
  4. Simple linear regression model
     The regression model is stochastic, not deterministic.
     Given two sets of values (two variables) from a random sample of size n, $x = \{x_1, x_2, \dots, x_i, \dots, x_n\}$ and $y = \{y_1, y_2, \dots, y_i, \dots, y_n\}$:
     Deterministic formula: $y_i = \beta_0 + \beta_1 x_i, \quad \forall i = 1, \dots, n$
     Stochastic formula: $y_i = \beta_0 + \beta_1 x_i + \epsilon_i, \quad \forall i = 1, \dots, n$
     where $\epsilon_i$ is the stochastic component.
     $\beta_1$ defines the slope of the relation between X and Y (see graph in chart 1).
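To make the difference between the deterministic and the stochastic formula concrete, here is a minimal Python sketch. The parameter values (β0 = 1, β1 = 2, σ = 0.5), the helper name `simulate_linear` and the seed are assumptions chosen only for illustration; they do not come from the lesson.

```python
import random

def simulate_linear(beta0, beta1, x, sigma, seed=0):
    """Draw y_i = beta0 + beta1 * x_i + eps_i with eps_i ~ N(0, sigma^2)."""
    rng = random.Random(seed)  # local generator, so results are reproducible
    return [beta0 + beta1 * xi + rng.gauss(0.0, sigma) for xi in x]

# Hypothetical values, chosen only for illustration.
x = [float(i) for i in range(1, 11)]
y_deterministic = [1.0 + 2.0 * xi for xi in x]           # points lie exactly on the line
y_stochastic = simulate_linear(1.0, 2.0, x, sigma=0.5)   # points scatter around the line

for xi, yd, ys in zip(x, y_deterministic, y_stochastic):
    print(f"x={xi:4.1f}  deterministic={yd:5.2f}  stochastic={ys:6.3f}")
```

With `sigma=0.0` the simulated values coincide with the deterministic line, which shows that the stochastic component is the only source of randomness in Y.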
  5. Simple linear regression model - 2
     We need to find $\hat\beta = \{\hat\beta_0, \hat\beta_1\}$ as estimators of $\beta_0$ and $\beta_1$.
     After $\hat\beta$ is estimated, we can draw the estimated regression line, which corresponds to the estimated regression model, as follows:
     $\hat{y}_i = \hat\beta_0 + \hat\beta_1 x_i$
     Here, $\hat\epsilon_i = y_i - \hat{y}_i$, where $\hat{y}_i$ is the i-th element of the estimated Y vector and $y_i$ is the i-th element of the real Y vector (see graph in chart 2).
  6. Steps in the Analysis
     1. Study the relations (scatterplot, correlations) between two or more variables.
     2. Estimate the parameters of the model, $\hat\beta = \{\hat\beta_0, \hat\beta_1\}$.
     3. Run hypothesis tests on the estimated $\hat\beta_1$ to verify the causal effect between X and Y.
     4. Check the robustness of the model.
     5. Use the model to analyze the causal effect and/or to do forecasting.
  7. Why linear?
     - It is simple to estimate, to analyze and to interpret.
     - It plausibly fits most empirical cases, in which the relation between two phenomena is linear.
     - There are many established methods to transform variables in order to obtain a linear relationship (log transformation, normalization, etc.).
  8. Model Hypotheses
     For the estimation and the use of the model to be correct, certain hypotheses must hold:
     - Zero mean: $E(\epsilon_i) = 0, \ \forall i \ \longrightarrow \ E(y_i) = \beta_0 + \beta_1 x_i$
     - Homoscedasticity: $V(\epsilon_i) = \sigma_i^2 = \sigma^2, \ \forall i$
     - Null covariance: $Cov(\epsilon_i, \epsilon_j) = 0, \ \forall i \neq j$
     - Null covariance between residuals and explanatory variables: $Cov(x_i, \epsilon_i) = 0, \ \forall i$, since X is deterministic (known)
     - Normality assumption: $\epsilon_i \sim N(0, \sigma^2)$
  9. Model Hypotheses - 2
     From the hypotheses above, it follows that:
     - $V(y_i) = \sigma^2, \ \forall i$. Y is stochastic only through the $\epsilon$ component.
     - $Cov(y_i, y_j) = 0, \ \forall i \neq j$, since the residuals are uncorrelated.
     - $y_i \sim N[(\beta_0 + \beta_1 x_i), \sigma^2]$, since the residuals are also normally distributed.
  10. Ordinary Least Squares (OLS) Estimation
     OLS is the estimation method used to estimate the vector β. The idea is to minimize the size of the residuals.
     Since $e_i = y_i - \hat{y}_i$, we are interested in minimizing the component $y_i - \hat\beta_0 - \hat\beta_1 x_i$.
     N.B. $\epsilon_i = y_i - \beta_0 - \beta_1 x_i$, while $e_i = y_i - \hat\beta_0 - \hat\beta_1 x_i$.
     The method consists in minimizing the sum of the squared differences:
     $\sum_{i=1}^{n} (y_i - \hat{y}_i)^2 = \sum_{i=1}^{n} e_i^2 = \text{Min}$,
     which is equivalent to solving a system of 2 equations obtained by setting the partial derivatives to zero.
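  11. Ordinary Least Squares (OLS) Estimation - 2
     $\frac{\partial}{\partial \beta_0} \sum_{i=1}^{n} e_i^2 = 0$   (1)
     $\frac{\partial}{\partial \beta_1} \sum_{i=1}^{n} e_i^2 = 0$   (2)
     After some algebra, we end up with these estimators for the vector β:
     $\hat\beta_0 = \bar{y} - \hat\beta_1 \bar{x}$   (3)
     $\hat\beta_1 = \frac{\sum_{i=1}^{n} (y_i - \bar{y})(x_i - \bar{x})}{\sum_{i=1}^{n} (x_i - \bar{x})^2}$   (4)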
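Formulas (3) and (4) can be applied directly by hand or in a few lines of code. The following Python sketch is one possible way to do it; the function name `ols_fit` and the five data points are made up for illustration and are not the cement data used in the homework.

```python
def ols_fit(x, y):
    """Return (b0, b1), the OLS estimates from formulas (3) and (4)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    ss_x = sum((xi - x_bar) ** 2 for xi in x)                       # denominator of (4)
    s_xy = sum((yi - y_bar) * (xi - x_bar) for xi, yi in zip(x, y)) # numerator of (4)
    b1 = s_xy / ss_x
    b0 = y_bar - b1 * x_bar                                         # formula (3)
    return b0, b1

# Made-up sample, for illustration only.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

b0, b1 = ols_fit(x, y)
fitted = [b0 + b1 * xi for xi in x]
residuals = [yi - fi for yi, fi in zip(y, fitted)]
print(f"b0 = {b0:.4f}, b1 = {b1:.4f}")
print("sum of residuals:", round(sum(residuals), 10))
```

A useful sanity check is that the residuals sum to zero (up to rounding), which is exactly what equation (1) imposes.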
  12. OLS estimators
     - The OLS $\hat\beta_0$ and $\hat\beta_1$ are stochastic estimators (they have a distribution over the sample space of all the possible estimates obtained from different samples).
     - $\hat\beta_1$ measures the estimated variation in Y determined by a unit variation in X ($\partial Y / \partial X$).
     - The OLS estimators are unbiased ($E(\hat\beta_1) = \beta_1$), and they are BLUE (best linear unbiased estimators: unbiased and with the lowest variance among linear unbiased estimators).
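  13. Linear dependency index ($R^2$)
     The $R^2$ index is the most widely used index to measure the linear fit of the model.
     $R^2$ is confined to the interval [0, 1], where values near 1 mean the explanatory variables are useful to describe the changes in Y. (It is the correlation coefficient between X and Y that ranges over [-1, 1]; in the simple linear model $R^2$ is its square.)
     Let us define SQT = SQR + SQE, or
     $\sum_{i=1}^{n} (y_i - \bar{y})^2 = \sum_{i=1}^{n} (\hat{y}_i - \bar{y})^2 + \sum_{i=1}^{n} (y_i - \hat{y}_i)^2$
     $R^2$ is defined as $R^2 = \frac{SQR}{SQT}$ or $1 - \frac{SQE}{SQT}$. Or, equivalently:
     $R^2 = \frac{\sum_{i=1}^{n} (\hat{y}_i - \bar{y})^2}{\sum_{i=1}^{n} (y_i - \bar{y})^2}$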
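The decomposition SQT = SQR + SQE and the two equivalent expressions for R² can be checked numerically. This is a minimal Python sketch; the helper name `r_squared` and the data are illustrative assumptions, not part of the lesson.

```python
def r_squared(x, y):
    """Return (SQT, SQR, SQE, R2) for the simple linear model fitted by OLS."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    b1 = (sum((yi - y_bar) * (xi - x_bar) for xi, yi in zip(x, y))
          / sum((xi - x_bar) ** 2 for xi in x))
    b0 = y_bar - b1 * x_bar
    y_hat = [b0 + b1 * xi for xi in x]
    sqt = sum((yi - y_bar) ** 2 for yi in y)                 # total sum of squares
    sqr = sum((yh - y_bar) ** 2 for yh in y_hat)             # explained by the regression
    sqe = sum((yi - yh) ** 2 for yi, yh in zip(y, y_hat))    # residual sum of squares
    return sqt, sqr, sqe, sqr / sqt

# Made-up sample, for illustration only.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

sqt, sqr, sqe, r2 = r_squared(x, y)
print(f"SQT = {sqt:.3f}, SQR = {sqr:.3f}, SQE = {sqe:.3f}")
print(f"R2 = SQR/SQT = {r2:.4f}, 1 - SQE/SQT = {1 - sqe / sqt:.4f}")
```

Both expressions print the same value, and it always falls between 0 and 1 when the model includes an intercept.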
  14. Hypothesis testing on β1
     The estimated slope parameter $\hat\beta_1$ is stochastic. It is normally distributed:
     $\hat\beta_1 \sim N[\beta_1, \sigma^2 / SS_x]$, where $SS_x = \sum_{i=1}^{n} (x_i - \bar{x})^2$.
     We can use hypothesis testing to investigate the causal relation between Y and X:
     $H_0: \beta_1 = 0$
     $H_1: \beta_1 \neq 0$, where the alternative hypothesis means a causal relation.
     The test statistic is $z = \frac{\hat\beta_1 - \beta_1}{\sqrt{\sigma^2 / SS_x}} \sim N(0, 1)$.
     When $\sigma^2$ is unknown, we estimate it as $s^2 = \frac{1}{n-2} \sum_{i=1}^{n} e_i^2$ and we use a t-test with n - 2 degrees of freedom.
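As a rough illustration of the test, the sketch below computes the t statistic for H0: β1 = 0 in Python. It assumes that σ² is estimated by s² = SQE / (n - 2), the usual choice in simple regression, so that the statistic has n - 2 degrees of freedom. The function name, the data, and the hard-coded critical value (3.182, the two-sided 5% value of Student's t with 3 degrees of freedom) are assumptions for this example only.

```python
import math

def slope_t_statistic(x, y):
    """Return (b1, se_b1, t) for testing H0: beta1 = 0 in the simple linear model."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    ss_x = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sum((yi - y_bar) * (xi - x_bar) for xi, yi in zip(x, y)) / ss_x
    b0 = y_bar - b1 * x_bar
    sqe = sum((yi - b0 - b1 * xi) ** 2 for xi, yi in zip(x, y))
    s2 = sqe / (n - 2)               # assumed estimator of sigma^2
    se_b1 = math.sqrt(s2 / ss_x)     # estimated standard error of the slope
    return b1, se_b1, b1 / se_b1

# Made-up sample, for illustration only (n = 5, so n - 2 = 3 degrees of freedom).
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

b1, se_b1, t_stat = slope_t_statistic(x, y)
T_CRIT_3DF_5PCT = 3.182              # two-sided 5% critical value, Student's t, 3 df
reject_h0 = abs(t_stat) > T_CRIT_3DF_5PCT
print(f"b1 = {b1:.3f}, se = {se_b1:.4f}, t = {t_stat:.2f}, reject H0: {reject_h0}")
```

For a different sample size the critical value has to be looked up again for n - 2 degrees of freedom; the standard library has no Student's t quantile function, which is why it is hard-coded here.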
  15. Forecasting within the regression model
     The question we want to answer is the following: what is the expected value of Y (say $y_{n+1}$) for a certain observation that is not in the sample?
     Suppose we have, for that observation, the value of the variable X (say $x_{n+1}$).
     We make use of the estimated $\hat\beta$ to determine:
     $\hat{y}_{n+1} = \hat\beta_0 + \hat\beta_1 x_{n+1}$
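In code, forecasting amounts to plugging the new X value into the estimated line. A minimal Python sketch follows; the data, the new value x = 6 and the helper names are hypothetical.

```python
def ols_fit(x, y):
    """Return the OLS estimates (b0, b1) of the simple linear model."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    b1 = (sum((yi - y_bar) * (xi - x_bar) for xi, yi in zip(x, y))
          / sum((xi - x_bar) ** 2 for xi in x))
    return y_bar - b1 * x_bar, b1

def forecast(b0, b1, x_new):
    """Expected value of Y for an observation with X = x_new."""
    return b0 + b1 * x_new

# Made-up sample, for illustration only.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

b0, b1 = ols_fit(x, y)
x_new = 6.0                      # hypothetical observation outside the sample
y_new = forecast(b0, b1, x_new)
print(f"forecast for x = {x_new}: {y_new:.2f}")
```

Note that x = 6 lies outside the observed range of X, so the forecast relies on the linear relation continuing to hold there.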
  16. Model Checking
     Several methods are used to test the robustness of the model, most of them based on the stochastic part of the model: the estimated residuals.
     - Graphical checks: plot of residuals versus fitted values; qq-plot for normality
     - Shapiro-Wilk test for normality
     - Durbin-Watson test for serial correlation
     - Breusch-Pagan test for heteroscedasticity
     Moreover, the leverage is used to evaluate the importance of each observation in determining the estimated coefficients $\hat\beta$.
     The stepwise procedure is used to choose between different model specifications.
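The slide mentions the leverage without giving a formula. For the simple linear model, the standard expression is h_i = 1/n + (x_i - x̄)² / SSx; this formula is supplied here as background and does not appear in the lesson. A short Python sketch with made-up X values:

```python
def leverages(x):
    """Leverage h_i = 1/n + (x_i - x_bar)^2 / SS_x for simple linear regression."""
    n = len(x)
    x_bar = sum(x) / n
    ss_x = sum((xi - x_bar) ** 2 for xi in x)
    return [1.0 / n + (xi - x_bar) ** 2 / ss_x for xi in x]

# Made-up X values, for illustration only.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
h = leverages(x)
for xi, hi in zip(x, h):
    print(f"x = {xi:.1f}  leverage = {hi:.2f}")
print("sum of leverages:", round(sum(h), 10))
```

Observations far from the mean of X have the highest leverage, and the leverages sum to 2, the number of estimated coefficients.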
  17. Model Checking using estimated residuals - Linearity
     An example of departure from the linearity assumption: we can draw a curve (not a horizontal line) to interpolate the points.
     Figure: residuals (Y) versus estimated (X) values
  18. Model Checking using estimated residuals - Homoscedasticity
     An example of departure from the homoscedasticity assumption (the spread of the estimated residuals increases as the predicted values increase).
  19. Model Checking using estimated residuals - Normality
     An example of departure from the normality assumption: the qq-points do not follow the qq-line.
     Figure: residuals (Y) versus estimated (X) values
  20. Model Checking using estimated residuals - Serial correlation
     An example of departure from the assumption of no serial correlation: the residual at i depends on the value at i - 1.
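One way to quantify this dependence is the Durbin-Watson statistic, DW = Σ (e_i - e_{i-1})² / Σ e_i², with the numerator summed from i = 2 to n. The Python sketch below computes only the statistic, not the test's critical values, and both residual series are invented for illustration.

```python
def durbin_watson(residuals):
    """Durbin-Watson statistic: sum of squared successive differences over sum of squares."""
    num = sum((residuals[i] - residuals[i - 1]) ** 2 for i in range(1, len(residuals)))
    den = sum(e ** 2 for e in residuals)
    return num / den

# Invented residual series, for illustration only.
trending = [-0.4, -0.3, -0.1, 0.1, 0.3, 0.4]      # each residual is close to the previous one
alternating = [0.3, -0.3, 0.3, -0.3, 0.3, -0.3]   # each residual flips sign

print("DW, trending residuals:   ", round(durbin_watson(trending), 3))
print("DW, alternating residuals:", round(durbin_watson(alternating), 3))
```

The statistic lies between 0 and 4: values near 2 suggest no first-order serial correlation, values near 0 suggest positive correlation, and values near 4 suggest negative correlation.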
  21. Homework
     1. Using the cement data (n = 13), determine the $\beta_0$ and $\beta_1$ coefficients manually, using the OLS formulas at page 11, for the model $y = \beta_0 + \beta_1 x_1$.
     2. Using the cement data, estimate the $R^2$ index of the model $y = \beta_0 + \beta_1 x_1$, using the formula at page 13.
  22. Charts - 1
     Figure: Slope coefficient in the linear model
  23. Charts - 2
     Figure: Fitted (line) versus real (points) values