# Chapter 15

## by matthewlevy

ISDS 2001 Chapter 15

## Chapter 15 Webinar Transcript

• Chapter 15 ISDS 2001 Matt Levy
• Multiple Regression Model and Equation: What we learned in SLR also applies to multiple regression. The multiple regression model simply extends SLR to include more than one independent variable, so we augment our simple linear model to accommodate this: $y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \dots + \beta_p x_p + \varepsilon$. Since we still assume the expected value of $\varepsilon$ to be zero, the multiple regression equation is: $E(y) = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \dots + \beta_p x_p$.
• Estimated Multiple Regression Equation: If $\beta_0, \beta_1, \dots, \beta_p$ were known, the equation on the previous slide could be used to compute the mean value of $y$ at given values of $x_1, x_2, \dots, x_p$. But we don't know them, so we need estimates $b_0, b_1, \dots, b_p$. Thus we arrive at the estimated multiple regression equation: $\hat{y} = b_0 + b_1 x_1 + b_2 x_2 + \dots + b_p x_p$.
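
Plugging numbers into the estimated regression equation can be sketched in a few lines of Python. The function name `predict` and the coefficient values are made up for illustration; they are not the estimates from the Ch. 15 example.

```python
def predict(b, x):
    """Return y-hat = b0 + b1*x1 + ... + bp*xp.

    b: estimates [b0, b1, ..., bp] (hypothetical values below)
    x: values [x1, ..., xp] of the independent variables
    """
    if len(b) != len(x) + 1:
        raise ValueError("need one intercept plus one coefficient per x")
    return b[0] + sum(bi * xi for bi, xi in zip(b[1:], x))


# Hypothetical estimates b0 = 1.0, b1 = 0.5, b2 = 2.0 at x1 = 4, x2 = 3
y_hat = predict([1.0, 0.5, 2.0], [4, 3])
print(y_hat)  # 1.0 + 0.5*4 + 2.0*3 = 9.0
```
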
• The Estimation Process for Multiple Regression
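• Least Squares Method: To estimate our betas, the objective is the same as in SLR. We seek to minimize the sum of squared differences between the actual values of the dependent variable ($y_i$) and the predicted values ($\hat{y}_i$). Least squares criterion: $\min \sum (y_i - \hat{y}_i)^2$. In SLR, we had a relatively easy way to obtain our estimates. In multiple regression, this is not so easy, because the estimates come from the matrix formula $b = (X'X)^{-1}X'y$, where $X$ holds the values of the independent variables (plus a column of ones for the intercept) and $y$ holds the observed values of the dependent variable. So we rely on statistical computing packages to do this for us.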
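
To see what a computing package is doing with the matrix formula, here is a minimal NumPy sketch. The data are made up, and `y` is deliberately built without an error term so that the known coefficients (2, 3, -1) come back out; real data would not behave this neatly.

```python
import numpy as np

# Made-up sample: n = 5 observations, p = 2 independent variables.
x1 = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
x2 = np.array([2.0, 1.0, 4.0, 3.0, 6.0])
# No error term here, so least squares should recover 2, 3 and -1.
y = 2.0 + 3.0 * x1 - 1.0 * x2

# Design matrix X: a leading column of ones for the intercept b0.
X = np.column_stack([np.ones_like(x1), x1, x2])

# Solve (X'X) b = X'y rather than forming the inverse explicitly.
b = np.linalg.solve(X.T @ X, X.T @ y)

# Value of the least squares criterion at the solution.
sse = np.sum((y - X @ b) ** 2)
print(b, sse)
```

Packages usually rely on more numerically stable routines than the normal equations (NumPy's `np.linalg.lstsq` is one), but for well-behaved data the estimates agree.
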
• Interpretation of Coefficients in Multiple Regression: Now that we have more than one independent variable, we must be aware of the consequences of adding independent variables. Notice from the example in Ch. 15 that a $b_1$ estimate computed with one independent variable (SLR) will generally NOT be the same once additional independent variables are added. In SLR we interpreted $b_1$ as an estimate of the change in $y$ for a 1-unit change in $x$. In multiple regression, $b_i$ is an estimate of the change in $y$ for a 1-unit change in $x_i$ when all other independent variables are held constant (at whatever values they take, not necessarily 0). Take note also that we can now easily throw in as many independent variables as we want. Adding a variable never decreases our explained variance or our $R^2$... so this is good, right? Wrong. While this may improve the fit to our sample, it also makes our model increasingly complex. A good model predicts accurately with the fewest variables. In the coming sections we will look at additional measures of 'model parsimony', that is, models that 'do the most with the least'.
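
Both points on this slide (the $b_1$ estimate shifts when a variable is added, and $R^2$ does not go down) can be checked on a small example. This is only a sketch: the data are made up, with `x1` and `x2` chosen to be strongly correlated so the shift in $b_1$ is easy to see, and the helper name `fit` is hypothetical.

```python
import numpy as np

def fit(X, y):
    """Least squares fit; returns (coefficients, R^2)."""
    b, *_ = np.linalg.lstsq(X, y, rcond=None)
    sse = np.sum((y - X @ b) ** 2)
    sst = np.sum((y - y.mean()) ** 2)
    return b, 1.0 - sse / sst

# Made-up data in which x1 and x2 are correlated.
x1 = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
x2 = np.array([1.0, 1.0, 2.0, 3.0, 3.0, 5.0])
noise = np.array([0.1, -0.2, 0.1, 0.0, -0.1, 0.1])
y = 1.0 + 2.0 * x1 + 3.0 * x2 + noise

ones = np.ones_like(x1)
b_slr, r2_slr = fit(np.column_stack([ones, x1]), y)      # SLR: x1 only
b_mr, r2_mr = fit(np.column_stack([ones, x1, x2]), y)    # x1 and x2

print("b1 with x1 only:  ", b_slr[1])
print("b1 with x1 and x2:", b_mr[1])
print("R^2:", r2_slr, "->", r2_mr)
```

With only `x1` in the model, $b_1$ also soaks up part of the effect of the omitted `x2`, which is why it comes out much larger than in the two-variable fit.
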
• Model Assumptions: Our assumptions in multiple regression parallel those in SLR. For emphasis, let's briefly review (also look at 15.11).
  1. $E(\varepsilon) = 0$; therefore $E(y) = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \dots + \beta_p x_p$.
  2. $\mathrm{Var}(\varepsilon) = \sigma^2$ and is the same for all values of $x_1, \dots, x_p$; therefore the variance of $y$ about the regression line also equals $\sigma^2$ and is the same for all values of the independent variables.
  3. The values of $\varepsilon$ are independent; therefore the value of $\varepsilon$ for one set of $x$ values is not related to the value of $\varepsilon$ for any other set of $x$ values.
  4. $\varepsilon$ is a normally distributed random variable; therefore, for given values of the independent variables, $y$ is also normally distributed.
• Testing for Significance: In multiple regression, significance testing carries a slightly different meaning than in SLR.
  1. F-test: tests for a significant relationship between the dependent variable and the set of all independent variables. We refer to this as the test for overall significance.
  2. t-test: if the F-test shows overall significance, then we use the t-test to check the significance of each independent variable, conducting a separate t-test for each one. We refer to the t-test as the test for individual significance.
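• F-Test: In multiple regression, we test whether all of the slope parameters are equal to zero (the intercept $\beta_0$ is not part of the test): $H_0: \beta_1 = \beta_2 = \dots = \beta_p = 0$; $H_a$: one or more of the parameters is not equal to zero. Remember that $F = \mathrm{MSR}/\mathrm{MSE}$. In multiple regression, $\mathrm{MSR} = \mathrm{SSR}/p$ and $\mathrm{MSE} = \mathrm{SSE}/(n - p - 1)$. We reject $H_0$ if our p-value $< \alpha$.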
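
The arithmetic behind the F statistic is short enough to sketch directly. The function name `f_statistic` and the sums of squares below are hypothetical, chosen only so the numbers are easy to follow.

```python
def f_statistic(ssr, sse, n, p):
    """Return (MSR, MSE, F) for n observations and p independent variables."""
    if n - p - 1 <= 0:
        raise ValueError("need n > p + 1 observations")
    msr = ssr / p              # mean square due to regression
    mse = sse / (n - p - 1)    # mean square error
    return msr, mse, msr / mse


# Hypothetical values: SSR = 20, SSE = 4, n = 10 observations, p = 2 variables
msr, mse, f = f_statistic(ssr=20.0, sse=4.0, n=10, p=2)
print(msr, mse, f)  # MSR = 10, MSE = 4/7, F = 17.5 (up to rounding)
```

The p-value would then come from the F distribution with $p$ and $n - p - 1$ degrees of freedom, which is the part we normally leave to the statistics package.
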
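• t-Test: Remember that we test each parameter separately. For any $\beta_i$: $H_0: \beta_i = 0$; $H_a: \beta_i \neq 0$. The test statistic is $t = b_i / s_{b_i}$, where $b_i$ is the estimate of $\beta_i$ and $s_{b_i}$ is the estimated standard deviation of $b_i$. We reject $H_0$ if our p-value $< \alpha$.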
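
The slides do not show where $s_{b_i}$ comes from, so the sketch below assumes the standard formula $s_{b_i} = \sqrt{\mathrm{MSE} \cdot [(X'X)^{-1}]_{ii}}$. The data and the function name `t_statistics` are made up for illustration.

```python
import numpy as np

def t_statistics(X, y):
    """Return (b, s_b, t) for a design matrix X whose first column is ones."""
    n, k = X.shape                       # k = p + 1 estimated parameters
    xtx_inv = np.linalg.inv(X.T @ X)
    b = xtx_inv @ X.T @ y                # least squares estimates
    residuals = y - X @ b
    mse = residuals @ residuals / (n - k)        # SSE / (n - p - 1)
    s_b = np.sqrt(mse * np.diag(xtx_inv))        # assumed standard formula
    return b, s_b, b / s_b

# Made-up data: n = 6 observations, p = 2 independent variables.
x1 = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
x2 = np.array([1.0, 1.0, 2.0, 3.0, 3.0, 5.0])
noise = np.array([0.1, -0.2, 0.1, 0.0, -0.1, 0.1])
y = 1.0 + 2.0 * x1 + 3.0 * x2 + noise

X = np.column_stack([np.ones_like(x1), x1, x2])
b, s_b, t = t_statistics(X, y)
print("estimates:  ", b)
print("std. errors:", s_b)
print("t values:   ", t)
```

Each t value is compared against a t distribution with $n - p - 1$ degrees of freedom to get its p-value.
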
• Multicollinearity: This is essentially the correlation among the independent variables. We care about it because we want our independent variables to measure distinct things when predicting our dependent variable. In practice there is always some multicollinearity, but we should try to reduce it as much as we can. A simple check is the sample correlation coefficient ($r_{x_1 x_2}$) for any two independent variables. As a rule of thumb, if the sample correlation is greater than 0.7 or less than -0.7 for any two independent variables, we should take measures to reduce multicollinearity, for example by removing one of the two highly correlated variables from the model.
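
The correlation check described on this slide can be sketched as follows. The two variables are made-up numbers, `sample_correlation` is a hypothetical helper name, and the 0.7 cutoff is the rule of thumb from the slide, applied to the absolute value so that strong negative correlation is caught too.

```python
import math

def sample_correlation(x, y):
    """Sample correlation coefficient between two equal-length lists."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Made-up values for two independent variables
x1 = [1, 2, 3, 4, 5]
x2 = [2, 4, 5, 4, 5]

r = sample_correlation(x1, x2)
flagged = abs(r) > 0.7   # rule-of-thumb warning for multicollinearity
print(round(r, 3), flagged)  # 0.775 True
```
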
• The End: That's it for Ch. 15. Hope you have recovered from Mardi Gras by the next time I see you!