Multiple regression analysis allows modeling of relationships between a dependent variable and multiple independent variables. The model takes the form of Y = β0 + β1X1 + β2X2 + ... + βkXk + ε, where Y is the dependent variable, the X's are independent variables, the β's are coefficients, and ε is the error term. Regression coefficients are estimated to predict Y values and are interpreted as the expected change in Y from a one-unit change in the corresponding X, holding other X's constant. The overall model, individual coefficients, and goodness of fit can be evaluated statistically. Nonlinear relationships may require transforming variables before applying regression.
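The coefficients b0, b1, ..., bk are usually estimated by ordinary least squares (OLS). The Python sketch below solves the normal equations (X'X)b = X'y in pure Python so that it runs without dependencies. The helper names `solve` and `fit_ols` and the data are hypothetical. For real work, a numerically sturdier routine such as `numpy.linalg.lstsq` or a statistics package is the usual choice.

```python
def solve(a, b):
    """Solve the square system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def fit_ols(xs, ys):
    """OLS estimates [b0, b1, ..., bk] from the normal equations (X'X) b = X'y.
    xs holds one list of k regressor values per observation."""
    rows = [[1.0] + list(x) for x in xs]  # leading 1 gives the intercept b0
    p = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * y for r, y in zip(rows, ys)) for i in range(p)]
    return solve(xtx, xty)


# Hypothetical noise-free data generated from Y = 1 + 2*X1 + 3*X2
xs = [[0, 1], [1, 0], [2, 3], [3, 1], [4, 5], [5, 2]]
ys = [1 + 2 * x1 + 3 * x2 for x1, x2 in xs]
print([round(b, 6) for b in fit_ols(xs, ys)])  # → [1.0, 2.0, 3.0]
```

Because the data contain no error term, OLS recovers the generating coefficients exactly. With real data, the estimates would only approximate them.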
4. Interpreting the example: 10-year real earnings growth of the S&P 500 (EG10)
Intercept term: If the dividend payout ratio (PR) is zero and the slope of the yield curve (YC) is zero, we would expect the subsequent 10-year real earnings growth rate to be -11.6%.
Slope coefficient of PR: If the payout ratio increases by 1%, we would expect the subsequent 10-year earnings growth rate to increase by 0.25%, holding YC constant.
Slope coefficient of YC: If the yield curve slope increases by 1%, we would expect the subsequent 10-year earnings growth rate to increase by 0.14%, holding PR constant.
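The three interpretations correspond to the fitted equation EG10 = -11.6 + 0.25 PR + 0.14 YC, with all quantities in percent. The sketch below evaluates that equation and shows that a one-unit change in PR, with YC fixed, moves the prediction by the PR slope. The function name and the input values (PR = 50, YC = 2) are hypothetical and were chosen only for illustration.

```python
# Coefficients from the EG10 example (percent units)
B0, B_PR, B_YC = -11.6, 0.25, 0.14


def predict_eg10(pr, yc):
    """Predicted 10-year real earnings growth (%) for payout ratio pr (%) and yield-curve slope yc (%)."""
    return B0 + B_PR * pr + B_YC * yc


print(round(predict_eg10(50, 2), 2))                        # hypothetical inputs → 1.18
print(round(predict_eg10(51, 2) - predict_eg10(50, 2), 2))  # PR +1, YC held constant → 0.25
```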
5. Hypothesis testing of regression coefficients
The t-statistic is used to test the significance of an individual coefficient in a multiple regression:
$t = \frac{\hat{b}_j - b_j}{s_{\hat{b}_j}}$
where $\hat{b}_j$ is the estimated regression coefficient, $b_j$ is its hypothesized value, and $s_{\hat{b}_j}$ is the coefficient standard error of $\hat{b}_j$. The t-statistic has n - k - 1 degrees of freedom.
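The formula translates directly into code. The sketch below uses hypothetical helper names, and the coefficient and standard error in the usage lines are invented inputs. It is not the PR example's actual output.

```python
def t_statistic(b_hat, b_hyp, se_b):
    """t = (estimated coefficient - hypothesized value) / coefficient standard error."""
    return (b_hat - b_hyp) / se_b


def t_df(n, k):
    """Degrees of freedom for a coefficient t-test: n observations, k independent variables."""
    return n - k - 1


print(t_df(46, 2))                    # → 43
print(t_statistic(0.25, 0.0, 0.125))  # hypothetical standard error → 2.0
```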
6. Ex: testing the statistical significance of a regression coefficient
Test the statistical significance of the independent variable PR in the real earnings growth example at the 10% significance level. The data are based on 46 observations.
7. Ex: testing the statistical significance of a regression coefficient
We are testing the following hypotheses: $H_0: b_{PR} = 0$ versus $H_a: b_{PR} \neq 0$.
The 10% two-tailed critical t-value with 43 degrees of freedom (46 - 2 - 1) is approximately 1.68. We should reject the null hypothesis if the t-statistic is greater than 1.68 or less than -1.68.
Because the t-statistic for PR is greater than 1.68, we can reject the null hypothesis and conclude that the PR regression coefficient is statistically significant at the 10% significance level.
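The critical value of about 1.68 can be checked numerically. The sketch below uses only the standard library. It integrates the Student's t density with Simpson's rule and then bisects to find the two-tailed cutoff. The quadrature and bisection approach and all function names are assumptions made for illustration. In practice a statistics library's t quantile function would be used. The t-statistics passed to `reject_h0` are hypothetical.

```python
import math


def t_pdf(x, df):
    """Student's t density; lgamma avoids overflow in the gamma functions."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c) * (1 + x * x / df) ** (-(df + 1) / 2)


def t_cdf(x, df, steps=2000):
    """P(T <= x): Simpson's rule on [0, |x|] plus symmetry about zero (steps is even)."""
    a = abs(x)
    h = a / steps
    total = t_pdf(0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3  # P(0 <= T <= |x|)
    return 0.5 + area if x >= 0 else 0.5 - area


def t_critical_two_tailed(alpha, df):
    """Critical value c such that P(|T| > c) = alpha, found by bisection."""
    lo, hi = 0.0, 100.0
    for _ in range(50):
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < 1 - alpha / 2:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def reject_h0(t_stat, critical):
    """Two-tailed rule: reject if t > critical or t < -critical."""
    return abs(t_stat) > critical


crit = t_critical_two_tailed(0.10, 46 - 2 - 1)
print(round(crit, 2))         # → 1.68
print(reject_h0(2.5, crit))   # hypothetical t-statistic → True
print(reject_h0(-1.2, crit))  # hypothetical t-statistic → False
```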
21. F-statistic
The F-test assesses how well the set of independent variables, as a group, explains the variation in the dependent variable. The F-statistic is used to test whether at least one of the independent variables explains a significant portion of the variation in the dependent variable.
22. F-statistic
The F-statistic is calculated as
$F = \frac{MSR}{MSE} = \frac{SSR/k}{SSE/(n-k-1)}$
where SSR = regression sum of squares, SSE = sum of squared errors, MSR = mean regression sum of squares, and MSE = mean squared error.
Reject H0 if the F-statistic > Fc (the critical value).
23. Ex: calculating and interpreting the F-statistic
An analyst runs a regression of monthly value-stock returns on five independent variables over 60 months. The total sum of squares is 460, and the sum of squared errors is 170. Test the null hypothesis at the 5% significance level that the coefficients on all five independent variables are equal to zero.
The critical F-value for 5 and 54 degrees of freedom at the 5% significance level is approximately 2.40.
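The tabled critical value can be reproduced numerically. The sketch below integrates the F(d1, d2) density with Simpson's rule and bisects for the upper 5% point, using only the standard library. This quadrature approach and the function names are illustrative assumptions, not a standard API. The sketch also assumes d1 > 2 so that the density is finite at zero, which holds here since d1 = 5. The result should land close to the approximately 2.40 quoted above.

```python
import math


def f_pdf(x, d1, d2):
    """Density of the F(d1, d2) distribution, evaluated on the log scale."""
    if x <= 0:
        return 0.0  # the density tends to 0 at x = 0 when d1 > 2
    log_beta = math.lgamma(d1 / 2) + math.lgamma(d2 / 2) - math.lgamma((d1 + d2) / 2)
    log_num = 0.5 * (d1 * math.log(d1 * x) + d2 * math.log(d2)
                     - (d1 + d2) * math.log(d1 * x + d2))
    return math.exp(log_num - math.log(x) - log_beta)


def f_cdf(x, d1, d2, steps=2000):
    """P(F <= x) by Simpson's rule on [0, x] (steps is even)."""
    if d1 <= 2:
        raise ValueError("this sketch assumes d1 > 2")
    if x <= 0:
        return 0.0
    h = x / steps
    total = f_pdf(0, d1, d2) + f_pdf(x, d1, d2)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * f_pdf(i * h, d1, d2)
    return total * h / 3


def f_critical(alpha, d1, d2):
    """Upper-tail critical value c with P(F > c) = alpha, found by bisection."""
    lo, hi = 0.0, 50.0
    for _ in range(50):
        mid = (lo + hi) / 2
        if f_cdf(mid, d1, d2) < 1 - alpha:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


k, n = 5, 60
fc = f_critical(0.05, k, n - k - 1)
print(round(fc, 3))
```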
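24. Ex: calculating and interpreting the F-statistic
The null and alternative hypotheses are $H_0: b_1 = b_2 = b_3 = b_4 = b_5 = 0$ versus $H_a$: at least one $b_j \neq 0$.
Calculations: SSR = SST - SSE = 460 - 170 = 290; MSR = 290/5 = 58; MSE = 170/54 ≈ 3.148; F = 58/3.148 ≈ 18.42.
Since the F-statistic (18.42) is greater than F-critical (2.40), we reject the null hypothesis: at least one of the independent variables' coefficients is significantly different from zero.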
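The same arithmetic as a small function, given here as a sketch with a hypothetical name, `f_test`. It derives SSR from SST and SSE, forms the two mean squares, and compares F with the critical value of 2.40 given in the example.

```python
def f_test(sst, sse, n, k):
    """Return (MSR, MSE, F) for a regression with k regressors and n observations."""
    ssr = sst - sse          # regression (explained) sum of squares
    msr = ssr / k            # mean regression sum of squares
    mse = sse / (n - k - 1)  # mean squared error
    return msr, mse, msr / mse


msr, mse, f = f_test(sst=460, sse=170, n=60, k=5)
print(round(msr, 2), round(mse, 2), round(f, 2))  # → 58.0 3.15 18.42

F_CRITICAL = 2.40  # 5 and 54 degrees of freedom, 5% level (from the example)
print(f > F_CRITICAL)  # → True, so reject H0
```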
27. r² = 0.6719, so the model explains about 67% of the variation in selling price (Y).
28. But the F-test is for the entire model, so it cannot tell us whether one or both of the independent variables are significant.
29. By calculating the p-value of each variable's t-test, we can assess the significance of the individual variables.
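Each variable's two-tailed p-value is the probability, under H0, of a t-statistic at least as extreme as the one observed. The stdlib-only sketch below computes it by integrating the t density with Simpson's rule. The numerical method and the function names are assumptions made for illustration, and the inputs are hypothetical. The value 1.6811 with 43 degrees of freedom was chosen because it sits at the 10% two-tailed cutoff, so the p-value should be about 0.10.

```python
import math


def t_pdf(x, df):
    """Student's t density; lgamma avoids overflow in the gamma functions."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c) * (1 + x * x / df) ** (-(df + 1) / 2)


def t_cdf(x, df, steps=2000):
    """P(T <= x): Simpson's rule on [0, |x|] plus symmetry about zero (steps is even)."""
    a = abs(x)
    h = a / steps
    total = t_pdf(0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3  # P(0 <= T <= |x|)
    return 0.5 + area if x >= 0 else 0.5 - area


def two_tailed_p_value(t_stat, df):
    """P(|T| >= |t_stat|) under H0; clamped at 0 to absorb tiny quadrature error."""
    return max(0.0, 2 * (1 - t_cdf(abs(t_stat), df)))


p = two_tailed_p_value(1.6811, 43)
print(round(p, 3))  # → 0.1
```

A variable whose p-value falls below the chosen significance level is judged significant.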
31. Adjusted R²
Unfortunately, R² by itself may not be a reliable measure of the multiple regression model. R² almost always increases (and never decreases) as variables are added to the model, so we need a measure that takes the number of independent variables into account:
$R_a^2 = 1 - \left(\frac{n-1}{n-k-1}\right)(1 - R^2)$
where n = number of observations, k = number of independent variables, and $R_a^2$ = adjusted R².
32. Adjusted R²
Whenever there is more than one independent variable, $R_a^2$ is less than or equal to R². Adding new variables to the model will increase R², but may either increase or decrease $R_a^2$. $R_a^2$ may be less than 0 if R² is low enough.
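Both measures take only a line each. The sketch below uses hypothetical helper names. For illustration it borrows SST = 460, SSE = 170, n = 60, and k = 5 from the value-stock F-test example, even though that slide does not report R² itself. The last line uses invented values to show that a low R² with many variables can push the adjusted value below zero.

```python
def r_squared(sst, sse):
    """Share of total variation explained: 1 - SSE/SST (equivalently SSR/SST)."""
    return 1 - sse / sst


def adjusted_r_squared(r2, n, k):
    """R_a^2 = 1 - (n - 1)/(n - k - 1) * (1 - R^2)."""
    return 1 - (n - 1) / (n - k - 1) * (1 - r2)


r2 = r_squared(460, 170)
print(round(r2, 4), round(adjusted_r_squared(r2, 60, 5), 4))  # → 0.6304 0.5962
print(adjusted_r_squared(0.01, 20, 5) < 0)                    # hypothetical → True
```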
36. A dummy variable is assigned a value of 1 if a particular condition is met and a value of 0 otherwise
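Coding a dummy variable amounts to a 1/0 indicator per observation. In the sketch below, the condition labels and the helper name `dummy` are hypothetical.

```python
def dummy(values, category):
    """1 if the observation meets the condition (equals category), else 0."""
    return [1 if v == category else 0 for v in values]


# Hypothetical condition labels for five houses
conditions = ["excellent", "good", "excellent", "fair", "good"]
print(dummy(conditions, "excellent"))  # → [1, 0, 1, 0, 0]
```

The resulting 0/1 column can then enter the regression like any other independent variable.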
42. The model explains about 90% of the variation in selling price. The F-value indicates that the model is significant, and the low p-values indicate that each variable is significant. (Jenny Wilson Realty, Program 4.3)
44. As more variables are added to the model, the r²-value usually increases.
45. For this reason, the adjusted r² value is often used to determine the usefulness of an additional variable.
57. The easiest way to work with this model is to develop a new variable
58. This gives us a model that can be solved with linear regression software. (Colonel Motors)
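The slides do not show the nonlinear model itself. Assuming, purely for illustration, a quadratic model Y = b0 + b1X1 + b2X1², the "new variable" would be X2 = X1². Once that column is created, the model is linear in its coefficients and an ordinary least-squares solve applies. The data and helper names in the sketch below are hypothetical, and the pure-Python normal-equations solve is used only to keep the example dependency-free.

```python
def solve(a, b):
    """Solve the square system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


# Hypothetical data lying exactly on Y = 50 - 10*X1 + X1**2
x1 = [1, 2, 3, 4, 5, 6]
y = [50 - 10 * v + v ** 2 for v in x1]

# New variable X2 = X1**2: each row is [1, X1, X2], so the model is linear in b0, b1, b2
rows = [[1.0, v, v ** 2] for v in x1]
xtx = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(3)]
b0, b1, b2 = solve(xtx, xty)
print(round(b0, 6), round(b1, 6), round(b2, 6))  # → 50.0 -10.0 1.0
```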
60. Cautions and Pitfalls
t-tests for the intercept (b0) may be ignored, as this point (all independent variables equal to zero) is often outside the range of the observed data.
A linear relationship may not be the best relationship, even if the F-test returns an acceptable value.
A nonlinear relationship can exist even if a linear relationship does not.
Just because a relationship is statistically significant doesn't mean it has any practical value.