# Simple regression model

Published in: Technology, Economy & Finance

### Simple regression model

1. KTEE 309 Econometrics. The Simple Regression Model (Chapter 4, S & W). Dr. TU Thuy Anh, Faculty of International Economics.
2. Output and labor use. (Time-series chart of labor and output over 24 periods.)
3. Output and labor use. The scatter diagram shows output q plotted against labor use l for a sample of 24 observations.
4. Output and labor use. Economic theory (the theory of the firm) predicts that an increase in labor use leads to an increase in output in the short run, as long as MPL > 0, other things being equal. The data are consistent with this, and the relationship looks roughly linear. Setting up: we want to know the impact of labor use on output, so Y is output and X is labor use.
5. SIMPLE LINEAR REGRESSION MODEL. Suppose that a variable Y is a linear function of another variable X, Y = β1 + β2X, with unknown parameters β1 and β2 that we wish to estimate.
6. Suppose that we have a sample of 4 observations with X values X1, X2, X3, X4 as shown.
7. If the relationship were an exact one, the observations would lie on a straight line (points Q1–Q4) and we would have no trouble obtaining accurate estimates of β1 and β2.
8. In practice, most economic relationships are not exact, and the actual values of Y (points P1–P4) are different from those corresponding to the straight line.
9. To allow for such divergences, we will write the model as Y = β1 + β2X + u, where u is a disturbance term.
10. Each value of Y thus has a nonrandom component, β1 + β2X, and a random component, u. The first observation has been decomposed into these two components.
11. In practice we can see only the P points.
12. 12. P4Obviously, we can use the P points to draw a line which is an approximationto the lineY = b1 + b2X. If we write this line Y = b1 + b2X, b1 is an estimate of b1 and b2is an estimate of b2.P3P2P1^SIMPLE LINEAR REGRESSION MODEL8XbbY 21ˆ b1YXX1 X2 X3 X4
13. The line is called the fitted model, and the values of Y predicted by it, Ŷ = b1 + b2X, are called the fitted values of Y. They are given by the heights of the R points.
14. The discrepancies between the actual and fitted values of Y are known as the residuals: e = Y − Ŷ.
15. Note that the values of the residuals are not the same as the values of the disturbance term. The diagram now shows the true unknown relationship (the population regression function, PRF) as well as the estimated fitted line (the sample regression function, SRF). The disturbance term in each observation is responsible for the divergence between the nonrandom component of the true relationship and the actual observation.
16. The residuals are the discrepancies between the actual and the fitted values. If the fit is a good one, the residuals and the values of the disturbance term will be similar, but they must be kept apart conceptually.
17. Least squares criterion. You must prevent negative residuals from cancelling positive ones, and one way to do this is to use the squares of the residuals. We will determine the values of b1 and b2 that minimize RSS, the sum of the squares of the residuals: min over b1, b2 of RSS = Σ ei² = Σ (Yi − b1 − b2Xi)².
18. DERIVING LINEAR REGRESSION COEFFICIENTS. This sequence shows how the regression coefficients for a simple regression model are derived, using the least squares criterion (OLS, for ordinary least squares). True model: Y = β1 + β2X + u; fitted line: Ŷ = b1 + b2X. We will start with a numerical example with just three observations: (1, 3), (2, 5), and (3, 6).
19. Writing the fitted regression as Ŷ = b1 + b2X, the fitted values are Ŷ1 = b1 + b2, Ŷ2 = b1 + 2b2, Ŷ3 = b1 + 3b2. We will determine the values of b1 and b2 that minimize RSS, the sum of the squares of the residuals.
20. Given our choice of b1 and b2, the residuals are e1 = 3 − b1 − b2, e2 = 5 − b1 − 2b2, e3 = 6 − b1 − 3b2, so RSS = e1² + e2² + e3² = (3 − b1 − b2)² + (5 − b1 − 2b2)² + (6 − b1 − 3b2)².
21. The first-order conditions, ∂RSS/∂b1 = −28 + 6b1 + 12b2 = 0 and ∂RSS/∂b2 = −62 + 12b1 + 28b2 = 0, give us two equations in two unknowns. Solving them, we find that RSS is minimized when b1 and b2 are equal to 1.67 and 1.50, respectively.
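The two first-order conditions above form a small linear system that is easy to check numerically. A minimal sketch in plain Python (Cramer's rule; nothing here comes from the slides beyond the two equations):

```python
# Solving the first-order conditions from the slide:
#   dRSS/db1 = -28 +  6*b1 + 12*b2 = 0
#   dRSS/db2 = -62 + 12*b1 + 28*b2 = 0
# i.e. the linear system  6*b1 + 12*b2 = 28,  12*b1 + 28*b2 = 62.
a11, a12, c1 = 6.0, 12.0, 28.0
a21, a22, c2 = 12.0, 28.0, 62.0

det = a11 * a22 - a12 * a21          # 6*28 - 12*12 = 24
b1 = (c1 * a22 - a12 * c2) / det     # Cramer's rule
b2 = (a11 * c2 - c1 * a21) / det

# RSS at the minimum, using the three observations (1,3), (2,5), (3,6)
rss = (3 - b1 - b2) ** 2 + (5 - b1 - 2 * b2) ** 2 + (6 - b1 - 3 * b2) ** 2
print(round(b1, 2), round(b2, 2), round(rss, 3))  # 1.67 1.5 0.167
```

Any other choice of b1 and b2 gives a strictly larger RSS, which is exactly what the least squares criterion demands.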
22. The fitted line, Ŷ = 1.67 + 1.50X, and the fitted values of Y, Ŷ1 = 3.17, Ŷ2 = 4.67, Ŷ3 = 6.17, are as shown.
23. Now we will do the same thing for the general case with n observations X1, …, Xn.
24. Given our choice of b1 and b2, we will obtain a fitted line as shown, with Ŷ1 = b1 + b2X1, …, Ŷn = b1 + b2Xn.
25. The residual for the first observation is defined: e1 = Y1 − Ŷ1 = Y1 − b1 − b2X1. Similarly we define the residuals for the remaining observations; that for the last one, en = Yn − Ŷn = Yn − b1 − b2Xn, is marked.
26. Least squares criterion (as before): we will determine the values of b1 and b2 that minimize RSS = Σ ei² = Σ (Yi − b1 − b2Xi)².
27. We chose the parameters of the fitted line so as to minimize the sum of the squares of the residuals. As a result, using the first-order conditions, we derived the expressions b2 = Σ(Xi − X̄)(Yi − Ȳ) / Σ(Xi − X̄)² and b1 = Ȳ − b2X̄.
28. Practice: calculate b1 and b2.

    | Year | Output (Y) | Labor (X) |
    |------|------------|-----------|
    | 1899 | 100        | 100       |
    | 1900 | 101        | 105       |
    | 1901 | 112        | 110       |
    | 1902 | 122        | 118       |
    | 1903 | 124        | 123       |
    | 1904 | 122        | 116       |
    | 1905 | 143        | 125       |
    | 1906 | 152        | 133       |
    | 1907 | 151        | 138       |
    | 1908 | 126        | 121       |
    | 1909 | 155        | 140       |
    | 1910 | 159        | 144       |
    | 1911 | 153        | 145       |
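One way to work the practice exercise is to apply the closed-form formulas directly. This is a sketch (the helper name `fit_ols` is mine, and it uses only the 13 years listed in the table, not the full 1899–1922 sample used in the gretl output, so the coefficients differ slightly):

```python
# Output (Y) and labor (X), 1899-1911, from the practice table.
Y = [100, 101, 112, 122, 124, 122, 143, 152, 151, 126, 155, 159, 153]
X = [100, 105, 110, 118, 123, 116, 125, 133, 138, 121, 140, 144, 145]

def fit_ols(X, Y):
    """Closed-form simple OLS: returns (b1, b2)."""
    n = len(X)
    x_bar, y_bar = sum(X) / n, sum(Y) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(X, Y))
    sxx = sum((x - x_bar) ** 2 for x in X)
    b2 = sxy / sxx               # slope: Sxy / Sxx
    return y_bar - b2 * x_bar, b2

b1, b2 = fit_ols(X, Y)
print(round(b1, 2), round(b2, 4))  # approximately -39.36 and 1.3793
```

The answer is close to the full-sample gretl estimates (−38.73 and 1.404), as one would hope.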
29. INTERPRETATION OF A REGRESSION EQUATION. This is the output from a regression of output q on labor use l, using gretl:

    ```
    Model 1: OLS, using observations 1899-1922 (T = 24)
    Dependent variable: q

                 coefficient   std. error   t-ratio   p-value
      -------------------------------------------------------------
      const      -38,7267      14,5994      -2,653    0,0145    **
      l            1,40367      0,0982155   14,29     1,29e-012 ***

    Mean dependent var   165,9167   S.D. dependent var   43,75318
    Sum squared resid    4281,287   S.E. of regression   13,95005
    R-squared            0,902764   Adjusted R-squared   0,898344
    F(1, 22)             204,2536   P-value(F)           1,29e-12
    Log-likelihood      -96,26199   Akaike criterion     196,5240
    Schwarz criterion    198,8801   Hannan-Quinn         197,1490
    rho                  0,836471   Durbin-Watson        0,763565
    ```
30. Here is the scatter diagram again, with the regression line shown: q versus l (with least-squares fit), Ŷ = −38.7 + 1.40X.
31. THE COEFFICIENT OF DETERMINATION. Question: how well does the sample regression line fit the data at hand? How much of the variation in the dependent variable (in the sample) does the independent variable explain? We have the decomposition Σ(Yi − Ȳ)² = Σ(Ŷi − Ȳ)² + Σ ei², where Σ(Yi − Ȳ)² is the total variation of Y, and R² = Σ(Ŷi − Ȳ)² / Σ(Yi − Ȳ)² is the share explained by the model.
32. GOODNESS OF FIT. TSS = ESS + RSS, where TSS = Σ(Yi − Ȳ)² is the total sum of squares and ESS = Σ(Ŷi − Ȳ)² the explained sum of squares. The main criterion of goodness of fit, formally described as the coefficient of determination but usually referred to as R², is defined to be the ratio of ESS to TSS, that is, the proportion of the variance of Y explained by the regression equation: R² = ESS / TSS.
33. GOODNESS OF FIT. Equivalently, R² = 1 − RSS/TSS = 1 − Σ ei² / Σ(Yi − Ȳ)². The OLS regression coefficients are chosen in such a way as to minimize the sum of the squares of the residuals. Thus it automatically follows that they maximize R².
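The decomposition TSS = ESS + RSS and the equivalence of the two formulas for R² can be verified on the three-observation example from earlier (a sketch; the fitted line Ŷ = 1.67 + 1.50X is the one derived above):

```python
X = [1, 2, 3]
Y = [3, 5, 6]
b1, b2 = 5 / 3, 1.5                          # OLS estimates from the derivation above
y_bar = sum(Y) / len(Y)
fitted = [b1 + b2 * x for x in X]

tss = sum((y - y_bar) ** 2 for y in Y)               # total sum of squares
ess = sum((f - y_bar) ** 2 for f in fitted)          # explained sum of squares
rss = sum((y - f) ** 2 for y, f in zip(Y, fitted))   # residual sum of squares

print(abs(tss - (ess + rss)) < 1e-9)          # True: TSS = ESS + RSS
r2 = ess / tss
print(round(r2, 4), round(1 - rss / tss, 4))  # 0.9643 0.9643: both formulas agree
```

Even with only three points the decomposition holds exactly; it is an algebraic identity of OLS, not an approximation.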
34. INTERPRETATION OF A REGRESSION EQUATION. (The gretl output from slide 29 again.) R-squared is 0,902764: labor use explains about 90% of the sample variation in output.
35. BASIC (GAUSS–MARKOV) ASSUMPTIONS OF OLS: 1. Zero systematic error: E(ui) = 0. 2. Homoscedasticity: var(ui) = σ² for all i. 3. No autocorrelation: cov(ui, uj) = 0 for all i ≠ j. 4. X is non-stochastic. 5. u ~ N(0, σ²).
36. BASIC ASSUMPTIONS OF OLS. Homoscedasticity: var(ui) = σ² for all i. (Diagram: the probability density of ui has the same spread at every Xi around the PRF Yi = β1 + β2Xi.)
37. BASIC ASSUMPTIONS OF OLS. Heteroscedasticity: var(ui) = σi². (Diagram: the spread of the probability density of ui varies with Xi around the PRF Yi = β1 + β2Xi.)
38. BASIC ASSUMPTIONS OF OLS. No autocorrelation: cov(ui, uj) = 0 for all i ≠ j. (Diagrams (a), (b), (c): residual patterns illustrating the presence or absence of autocorrelation.)
39. UNBIASEDNESS OF THE REGRESSION COEFFICIENTS. Simple regression model: Y = β1 + β2X + u. We saw in a previous slideshow that the slope coefficient may be decomposed into the true value and a weighted sum of the values of the disturbance term: b2 = Σ(Xi − X̄)(Yi − Ȳ) / Σ(Xi − X̄)² = β2 + Σ aiui, where ai = (Xi − X̄) / Σ(Xj − X̄)².
40. β2 is fixed, so it is unaffected by taking expectations. The first expectation rule states that the expectation of a sum of several quantities is equal to the sum of their expectations: E(b2) = E(β2 + Σ aiui) = β2 + E(Σ aiui) = β2 + Σ E(aiui).
41. Now, for each i, E(aiui) = aiE(ui), since X (and hence ai) is non-stochastic, and E(ui) = 0 by assumption. Therefore E(b2) = β2: b2 is an unbiased estimator of β2.
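Unbiasedness can also be seen by simulation. The sketch below uses assumed illustrative values (β1 = 3, β2 = 2, standard normal disturbances; none of these numbers come from the slides), regresses Y on a fixed non-stochastic X in 5,000 replicated samples, and averages the slope estimates:

```python
import numpy as np

rng = np.random.default_rng(0)
beta1, beta2 = 3.0, 2.0              # assumed "true" parameters for the simulation
X = np.arange(20, dtype=float)       # non-stochastic X, held fixed across samples

slopes = []
for _ in range(5000):
    u = rng.normal(0.0, 1.0, size=X.size)   # disturbances with E(u) = 0
    Y = beta1 + beta2 * X + u
    b2 = ((X - X.mean()) * (Y - Y.mean())).sum() / ((X - X.mean()) ** 2).sum()
    slopes.append(b2)

print(np.mean(slopes))  # close to the true slope 2.0
```

Each individual b2 misses β2 because of the disturbances, but the misses average out: that is exactly what E(b2) = β2 asserts.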
42. PRECISION OF THE REGRESSION COEFFICIENTS: efficiency. The Gauss–Markov theorem states that, provided that the regression model assumptions are valid, the OLS estimators are BLUE: Best (minimum variance in the class of all linear unbiased estimators), Linear, Unbiased Estimators. (Diagram: the probability density function of the OLS b2 is more concentrated around β2 than that of any other linear unbiased estimator.)
43. PRECISION OF THE REGRESSION COEFFICIENTS. In this sequence we will see that we can also obtain estimates of the standard deviations of the distributions of b1 and b2. These will give some idea of their likely reliability and will provide a basis for tests of hypotheses.
44. Expressions (which will not be derived) for the variances of their distributions are: σ²(b1) = σu² · (1/n + X̄² / Σ(Xi − X̄)²) and σ²(b2) = σu² / Σ(Xi − X̄)² = σu² / (n · MSD(X)). We will focus on the implications of the expression for the variance of b2. Looking at the numerator, we see that the variance of b2 is proportional to σu². This is as we would expect: the more noise there is in the model, the less precise will be our estimates.
45. However, the size of the sum of the squared deviations depends on two factors: the number of observations, and the size of the deviations of Xi around its sample mean. To discriminate between them, it is convenient to define the mean square deviation of X: MSD(X) = (1/n) Σ(Xi − X̄)².
46. This is illustrated by the diagrams above. The nonstochastic component of the relationship, Y = 3.0 + 0.8X, represented by the dotted line, is the same in both diagrams. However, in the right-hand diagram the random numbers have been multiplied by a factor of 5. As a consequence, the regression line, the solid line, is a much poorer approximation to the nonstochastic relationship.
47. Looking at the denominator of the expression for the variance of b2: the larger is the sum of the squared deviations of X, the smaller is the variance of b2.
48. A third implication of the expression is that the variance is inversely proportional to the mean square deviation of X.
49. In the diagrams above, the nonstochastic component of the relationship is the same, and the same random numbers have been used for the 20 values of the disturbance term.
50. However, MSD(X) is much smaller in the right-hand diagram because the values of X are much closer together.
51. Hence in that diagram the position of the regression line is more sensitive to the values of the disturbance term, and as a consequence the regression line is likely to be relatively inaccurate.
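The point about MSD(X) can be illustrated by simulation. This sketch uses assumed values (Y = 3.0 + 0.8X as in the diagrams, 20 observations, and the same disturbance draws applied to a widely spread and a tightly clustered set of X values):

```python
import numpy as np

rng = np.random.default_rng(1)
beta1, beta2 = 3.0, 0.8                  # the nonstochastic relationship in the diagrams
X_spread = np.linspace(0.0, 19.0, 20)    # large MSD(X)
X_tight = np.linspace(9.0, 10.0, 20)     # same n, much smaller MSD(X)

def ols_slope(X, u):
    """Slope of the least-squares fit when Y = beta1 + beta2*X + u."""
    Y = beta1 + beta2 * X + u
    return ((X - X.mean()) * (Y - Y.mean())).sum() / ((X - X.mean()) ** 2).sum()

draws = [rng.normal(0.0, 1.0, 20) for _ in range(2000)]   # same u for both designs
var_spread = np.var([ols_slope(X_spread, u) for u in draws])
var_tight = np.var([ols_slope(X_tight, u) for u in draws])
print(var_spread < var_tight)  # True: smaller MSD(X) means a less precise slope
```

With the disturbances held identical, the only difference between the two designs is the spread of X, so the larger sampling variance of the slope in the clustered design is attributable to MSD(X) alone.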
52. We cannot calculate the variances exactly because we do not know the variance of the disturbance term. However, we can derive an estimator of σu² from the residuals.
53. Clearly the scatter of the residuals around the regression line will reflect the unseen scatter of u about the line Yi = β1 + β2Xi, although in general the residual and the value of the disturbance term in any given observation are not equal to one another. One measure of the scatter of the residuals is their mean square error, MSD(e) = (1/n) Σ(ei − ē)² = (1/n) Σ ei² (since ē = 0).
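To turn this into a usable estimator, the standard textbook choice (assumed here, since the slides do not derive it) is su² = RSS/(n − 2); plugging it into the variance formulas gives the estimated standard errors. A sketch on the three-observation example:

```python
import math

X = [1, 2, 3]
Y = [3, 5, 6]
n = len(X)
b1, b2 = 5 / 3, 1.5                  # OLS estimates from the earlier example
x_bar = sum(X) / n

rss = sum((y - b1 - b2 * x) ** 2 for x, y in zip(X, Y))
s2 = rss / (n - 2)                   # su^2 = RSS/(n-2): assumed textbook estimator
sxx = sum((x - x_bar) ** 2 for x in X)

se_b2 = math.sqrt(s2 / sxx)                                # se(b2) = s / sqrt(Sxx)
se_b1 = math.sqrt(s2 * sum(x * x for x in X) / (n * sxx))  # se(b1)
print(round(se_b1, 3), round(se_b2, 3))  # 0.624 0.289
```

The division by n − 2 rather than n reflects the two degrees of freedom used up in estimating b1 and b2.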
54. PRECISION OF THE REGRESSION COEFFICIENTS. The standard errors of the coefficients always appear as part of the output of a regression, in a column to the right of the coefficients: in the gretl output of slide 29, the standard errors of the constant and of the coefficient on l are 14,5994 and 0,0982155, respectively.
55. Summing up. The simple linear regression model: identify the dependent variable, the independent variable, the parameters, and the error term. Interpret the estimated parameters b1 and b2, as they show the relationship between X and Y. OLS provides BLUE estimators of the parameters under the five Gauss–Markov assumptions. What next: estimation of the multiple regression model.