# Multiple Regression


1. **Multiple Regression**
    - Goals
    - Implementation
    - Assumptions
2. **Goals of Regression**
    - Description
    - Inference
    - Prediction (forecasting)
3. **Examples**
4. **Why is there a need for more than one predictor variable?**
    - As the examples above show, more than one variable influences a response variable.
    - The predictors may themselves be correlated.
    - We want the independent contribution of each variable to explaining the variation in the response variable.
5. **Three fundamental aspects of linear regression**
    - Model selection: what is the most parsimonious set of predictors that explains the most variation in the response variable?
    - Evaluation of assumptions: have we met the assumptions of the regression model?
    - Model validation
6. **The multiple regression model**
    - Express a p-variable regression model as a series of equations.
    - The p equations condense into matrix form, giving the familiar general linear model.
    - The β coefficients are known as partial regression coefficients.
7. **The p-variable regression model**

    $$Y_i = \beta_1 + \beta_2 X_{2i} + \beta_3 X_{3i} + \cdots + \beta_p X_{pi} + \varepsilon_i$$

    - $\beta_1$: intercept
    - $\beta_2, \ldots, \beta_p$: partial regression (slope) coefficients
    - $\varepsilon_i$: residual term associated with the ith observation
    - This model gives the expected value of Y conditional on the fixed values of $X_2, X_3, \ldots, X_p$, plus error.
8. **Matrix representation.** The regression model is best described as a system of equations:

    $$\begin{aligned}
    Y_1 &= \beta_1 + \beta_2 X_{21} + \beta_3 X_{31} + \cdots + \beta_p X_{p1} + \varepsilon_1 \\
    Y_2 &= \beta_1 + \beta_2 X_{22} + \beta_3 X_{32} + \cdots + \beta_p X_{p2} + \varepsilon_2 \\
    &\;\;\vdots \\
    Y_n &= \beta_1 + \beta_2 X_{2n} + \beta_3 X_{3n} + \cdots + \beta_p X_{pn} + \varepsilon_n
    \end{aligned}$$
9. **Compact matrix form.** We can re-write these equations as

    $$\begin{pmatrix} Y_1 \\ Y_2 \\ \vdots \\ Y_n \end{pmatrix} =
    \begin{pmatrix} 1 & X_{21} & X_{31} & \cdots & X_{p1} \\
                    1 & X_{22} & X_{32} & \cdots & X_{p2} \\
                    \vdots & \vdots & \vdots & & \vdots \\
                    1 & X_{2n} & X_{3n} & \cdots & X_{pn} \end{pmatrix}
    \begin{pmatrix} \beta_1 \\ \beta_2 \\ \vdots \\ \beta_p \end{pmatrix} +
    \begin{pmatrix} \varepsilon_1 \\ \varepsilon_2 \\ \vdots \\ \varepsilon_n \end{pmatrix}$$

    or simply $Y = X\beta + \varepsilon$, with dimensions $(n \times 1) = (n \times p)(p \times 1) + (n \times 1)$.
10. **Summary of terms**
    - $Y$: $n \times 1$ column vector of observations on the response variable.
    - $X$: $n \times p$ matrix of the n observations on the p − 1 independent variables $X_2, \ldots, X_p$; the first column of 1's represents the intercept term, $\beta_1$.
    - $\beta$: $p \times 1$ column vector of unknown parameters $\beta_1, \beta_2, \ldots, \beta_p$, where $\beta_1$ is the intercept term and $\beta_2, \ldots, \beta_p$ are partial regression coefficients.
    - $\varepsilon$: $n \times 1$ column vector of residuals $\varepsilon_i$.
11. **A partial regression model**

    Burst = 1.21 + 2.1 (Femur Length) − 0.25 (Tail Length) + 1.0 (Toe Velocity)

    Here Burst is the response variable, 1.21 is the intercept, 2.1, −0.25, and 1.0 are partial regression coefficients, and Femur Length, Tail Length, and Toe Velocity are predictor variables.
12. **Assumption 1.** The expected value of the residual vector is 0:

    $$E(\varepsilon) = E\begin{pmatrix} \varepsilon_1 \\ \varepsilon_2 \\ \vdots \\ \varepsilon_n \end{pmatrix} = \begin{pmatrix} 0 \\ 0 \\ \vdots \\ 0 \end{pmatrix}$$
13. **Assumption 2.** There is no correlation between the ith and jth residual terms:

    $$E(\varepsilon_i \varepsilon_j) = 0 \quad (i \neq j)$$
14. **Assumption 3.** The residuals exhibit constant variance:

    $$E(\varepsilon\varepsilon') = \sigma^2 I$$
15. **Assumption 4.** The covariance between the X's and the residual terms is 0; this is usually satisfied if the predictor variables are fixed and non-stochastic:

    $$\mathrm{cov}(\varepsilon, X) = 0$$
16. **Assumption 5.** The rank of the data matrix X is p, the number of columns, and p < n, the number of observations; there are no exact linear relationships among the X variables (the assumption of no multicollinearity):

    $$r(X) = p$$
17. **If these assumptions hold…** then the OLS estimators are in the class of unbiased linear estimators, and are also the minimum-variance estimators in that class.
18. **What does it mean to be BLUE?** Being a Best Linear Unbiased Estimator is what justifies OLS estimation and allows us to compute a number of statistics.
19. An estimator $\hat{\theta}$ is the best linear unbiased estimator (BLUE) of $\theta$ iff it is:
    - Linear
    - Unbiased, i.e., $E(\hat{\theta}) = \theta$
    - Minimum variance in the class of all linear unbiased estimators

    The unbiased and minimum-variance properties mean that the OLS estimators are efficient estimators. If one or more of the conditions are not met, then the OLS estimators are no longer BLUE.
20. **Does it matter?** Yes: it means we require an alternative method for characterizing the association between our Y and X variables.
21. **OLS estimation.** The sample-based counterpart to the population regression model is $Y = Xb + e$. OLS requires choosing the values of b such that the error sum-of-squares (SSE) is as small as possible.
22. **The normal equations.** We need to differentiate the error sum-of-squares with respect to the unknowns (b):

    $$SSE = e'e = (Y - Xb)'(Y - Xb)$$

    This yields p simultaneous equations in p unknowns, also known as the normal equations.
23. **Matrix form of the normal equations**

    $$(X'X)b = X'Y$$
24. **The solution for the b's.** It should be apparent how to solve for the unknown parameters: pre-multiply by the inverse of $X'X$:

    $$(X'X)^{-1}(X'X)b = (X'X)^{-1}X'Y$$
25. **Solution continued.** From the properties of inverses we note that $(X'X)^{-1}(X'X) = I$, so

    $$Ib = (X'X)^{-1}X'Y \quad\Rightarrow\quad b = (X'X)^{-1}X'Y$$

    This is the fundamental outcome of OLS theory.
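As a quick illustration, the closed-form solution $b = (X'X)^{-1}X'Y$ can be computed directly with NumPy. This is a minimal sketch on made-up data; the simulated `X`, `Y`, and coefficient values are illustrative assumptions, not from the slides.

```python
# Sketch of the OLS solution b = (X'X)^{-1} X'Y on simulated data.
import numpy as np

np.random.seed(0)
n, p = 6, 3                                   # n observations, p columns
X = np.column_stack([np.ones(n),              # column of 1's -> intercept term
                     np.random.rand(n, p - 1)])
beta_true = np.array([1.0, 2.0, -0.5])        # hypothetical true coefficients
Y = X @ beta_true + 0.01 * np.random.randn(n)

# Solve the normal equations (X'X) b = X'Y; np.linalg.solve avoids
# forming the inverse explicitly, which is numerically preferable.
b = np.linalg.solve(X.T @ X, X.T @ Y)
print(b)                                      # close to beta_true
```

In practice one would use a least-squares routine such as `np.linalg.lstsq` rather than solving the normal equations directly, but the normal-equations form matches the derivation above.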
26. **Assessment of goodness-of-fit.** Use the $R^2$ statistic:
    - It represents the proportion of variability in the response variable that is accounted for by the regression model.
    - $0 \leq R^2 \leq 1$
    - A good fit of the model means $R^2$ will be close to one; a poor fit means $R^2$ will be near 0.
27. **$R^2$ – multiple coefficient of determination**

    $$R^2 = 1 - \frac{(Y - \hat{Y})'(Y - \hat{Y})}{(Y - \bar{Y})'(Y - \bar{Y})}$$

    Alternative expressions:

    $$R^2 = 1 - \frac{SSE}{SST} \qquad\qquad R^2 = \frac{SSR}{SST}$$
28. **Critique of $R^2$ in multiple regression**
    - $R^2$ is inflated by increasing the number of parameters in the model.
    - One should also analyze the residual values from the model (MSE).
    - Alternatively, use the adjusted $R^2$.
29. **Adjusted $R^2$**

    $$\bar{R}^2 = 1 - \frac{(Y - \hat{Y})'(Y - \hat{Y})\,/\,(n - p)}{(Y - \bar{Y})'(Y - \bar{Y})\,/\,(n - 1)}$$

    For $p > 1$, $\bar{R}^2 < R^2$.
30. **How does adjusted $R^2$ work?**
    - The total sum-of-squares is fixed, because it is independent of the number of variables.
    - The numerator, SSE, decreases as the number of variables increases, so $R^2$ is artificially inflated by adding explanatory variables to the model.
    - Use the adjusted $R^2$ to compare different regressions; it takes into account the number of predictors in the model.
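The $R^2$ and adjusted $R^2$ formulas above can be sketched in a few lines of NumPy. The data are simulated for illustration only.

```python
# Sketch: R^2 and adjusted R^2 from the sums-of-squares (simulated data).
import numpy as np

np.random.seed(1)
n, p = 30, 3
X = np.column_stack([np.ones(n), np.random.rand(n, p - 1)])
Y = X @ np.array([1.0, 2.0, -0.5]) + 0.1 * np.random.randn(n)

b = np.linalg.solve(X.T @ X, X.T @ Y)
Y_hat = X @ b
SSE = np.sum((Y - Y_hat) ** 2)             # unexplained (residual) SS
SST = np.sum((Y - Y.mean()) ** 2)          # total SS about the mean

R2 = 1 - SSE / SST
R2_adj = 1 - (SSE / (n - p)) / (SST / (n - 1))   # penalizes extra predictors
print(R2, R2_adj)                          # adjusted R^2 < R^2 when p > 1
```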
31. **Statistical inference and hypothesis testing**
    - Our goal may be 1) hypothesis testing and 2) interval estimation.
    - Hence we will need to impose distributional assumptions on the residuals.
    - It turns out that the probability distribution of the OLS estimators depends on the probability distribution of the residuals, ε.
32. **Recount of assumptions**
    - Normality: the elements of b are normally distributed.
    - The b's are unbiased.
    - If these hold, then we can perform several hypothesis tests.
33. **ANOVA approach.** Decompose the total sum-of-squares into components relating to:
    - explained variance (regression)
    - unexplained variance (error)
34. **ANOVA table**

    | Source of Variation | Sums-of-Squares | df | Mean Square | F-ratio |
    |---|---|---|---|---|
    | Regression | $b'X'Y - n\bar{Y}^2$ | $p - 1$ | $\dfrac{b'X'Y - n\bar{Y}^2}{p - 1}$ | MSR/MSE |
    | Residual | $Y'Y - b'X'Y$ | $n - p$ | $\dfrac{Y'Y - b'X'Y}{n - p}$ | |
    | Total | $Y'Y - n\bar{Y}^2$ | $n - 1$ | | |
35. **Test of the null hypothesis**
    - Tests the null hypothesis $H_0: \beta_2 = \beta_3 = \cdots = \beta_p = 0$.
    - The null hypothesis is known as a joint or simultaneous hypothesis, because it compares the values of all $\beta_i$ simultaneously.
    - This tests the overall significance of the regression model.
36. **The F-test statistic and $R^2$ vary directly**

    $$F = \frac{(b'X'Y - n\bar{Y}^2)/(p-1)}{(Y'Y - b'X'Y)/(n-p)}
        = \frac{SSR/(p-1)}{SSE/(n-p)}
        = \frac{SSR/(p-1)}{(SST - SSR)/(n-p)}
        = \frac{SSR/SST}{1 - SSR/SST} \cdot \frac{n-p}{p-1}
        = \frac{R^2}{1 - R^2} \cdot \frac{n-p}{p-1}$$
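The chain of equalities above says the F statistic can be computed either from the sums-of-squares or from $R^2$ alone. A small sketch on simulated data confirms the two forms agree:

```python
# Sketch: the sums-of-squares form and the R^2 form of F coincide.
import numpy as np

np.random.seed(2)
n, p = 25, 3
X = np.column_stack([np.ones(n), np.random.rand(n, p - 1)])
Y = X @ np.array([0.5, 1.5, -1.0]) + 0.2 * np.random.randn(n)

b = np.linalg.solve(X.T @ X, X.T @ Y)
SSE = np.sum((Y - X @ b) ** 2)
SST = np.sum((Y - Y.mean()) ** 2)
SSR = SST - SSE
R2 = SSR / SST

F_ss = (SSR / (p - 1)) / (SSE / (n - p))              # SSR/SSE form
F_r2 = (R2 / (1 - R2)) * ((n - p) / (p - 1))          # R^2 form
print(F_ss, F_r2)                                     # identical up to rounding
```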
37. **Tests of hypotheses about the true β.** Assume the regression coefficients are normally distributed:

    $$b \sim N\!\left(\beta,\; \sigma^2 [X'X]^{-1}\right), \qquad \mathrm{cov}(b) = E(b - \beta)(b - \beta)' = \sigma^2 [X'X]^{-1}$$

    The estimate of $\sigma^2$ is $s^2$:

    $$s^2 = \frac{(Y - Xb)'(Y - Xb)}{n - p}$$
38. **Test statistic**

    $$t = \frac{b_i - \beta_i}{s\sqrt{c_{ii}}}$$

    This follows a t distribution with $n - p$ df, where $c_{ii}$ is the element in the ith row and ith column of $[X'X]^{-1}$. The 100(1 − α)% confidence interval is obtained from

    $$b_i \pm t\!\left(\tfrac{\alpha}{2};\, n - p\right) s\sqrt{c_{ii}}$$
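These t statistics and confidence intervals follow directly from $s^2[X'X]^{-1}$. Below is a hedged sketch on simulated data, using `scipy.stats.t` for the critical value; the data and coefficient values are illustrative assumptions.

```python
# Sketch: t statistics and 95% confidence intervals for the b's.
import numpy as np
from scipy import stats

np.random.seed(3)
n, p = 40, 3
X = np.column_stack([np.ones(n), np.random.rand(n, p - 1)])
Y = X @ np.array([1.0, 3.0, 0.0]) + 0.3 * np.random.randn(n)

XtX_inv = np.linalg.inv(X.T @ X)
b = XtX_inv @ X.T @ Y
resid = Y - X @ b
s2 = resid @ resid / (n - p)               # s^2 = (Y-Xb)'(Y-Xb)/(n-p)
se = np.sqrt(s2 * np.diag(XtX_inv))        # s * sqrt(c_ii)

t_stats = b / se                           # tests H0: beta_i = 0
t_crit = stats.t.ppf(1 - 0.05 / 2, df=n - p)
ci_low, ci_high = b - t_crit * se, b + t_crit * se
print(t_stats)
```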
39. **Model comparisons**
    - Our interest is in parsimonious modeling: we seek a minimum set of X variables to predict variation in the Y response variable.
    - The goal is to reduce the number of predictor variables to arrive at a more parsimonious description of the data.
    - Does leaving out one of the b's significantly diminish the variance explained by the model?
    - Compare a saturated to an unsaturated model; note that there are many possible unsaturated models.
40. **General philosophy**
    - Let SSE(r) designate the error sum-of-squares for the reduced model; SSE(r) ≥ SSE(f).
    - The saturated model will contain p parameters; the reduced model will contain k < p parameters.
    - If we assume the errors are normally distributed with mean 0 and variance $\sigma^2$, then we can compare the two models.
41. **Model comparison.** Compare the saturated model with the reduced model, using the SSE terms as the basis for comparison:

    $$F = \frac{[SSE(r) - SSE(f)]\,/\,(p - k)}{SSE(f)\,/\,(n - p)}$$

    This follows an F-distribution with (p − k), (n − p) df. If $F_{obs} > F_{critical}$, we reject the reduced model as a parsimonious model: the omitted $b_i$ must be included in the model.
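This partial F-test can be sketched directly from the two SSE terms. The data below are simulated so that the last two predictors are irrelevant; all names and values are illustrative assumptions.

```python
# Sketch: partial F-test comparing a saturated (full) model with p
# parameters against a reduced model with k < p parameters.
import numpy as np

def sse(X, Y):
    """Residual sum-of-squares from an OLS fit of Y on X."""
    b = np.linalg.solve(X.T @ X, X.T @ Y)
    r = Y - X @ b
    return r @ r

np.random.seed(4)
n, p, k = 50, 4, 2
X_full = np.column_stack([np.ones(n), np.random.rand(n, p - 1)])
Y = X_full @ np.array([1.0, 2.0, 0.0, 0.0]) + 0.2 * np.random.randn(n)
X_red = X_full[:, :k]                      # drop the last p - k predictors

F = ((sse(X_red, Y) - sse(X_full, Y)) / (p - k)) / (sse(X_full, Y) / (n - p))
print(F)    # compare to the F critical value with (p - k, n - p) df
```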
42. **How many predictors to retain? A short course in model selection.** Several options:
    - Sequential selection
        - Backward selection
        - Forward selection
        - Stepwise selection
    - All possible subsets
        - MAXR
        - MINR
        - RSQUARE
        - ADJUSTED RSQUARE
        - CP
43. **Sequential methods.** The forward, stepwise, and backward selection procedures entail "partialling out" the predictor variables, based on the partial correlation coefficient:

    $$r_{12.3} = \frac{r_{12} - r_{13} r_{23}}{\sqrt{1 - r_{13}^2}\,\sqrt{1 - r_{23}^2}}$$
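The partial correlation formula above is easy to evaluate directly. A minimal sketch (the numeric inputs are made up for illustration):

```python
# Sketch: the partial correlation r_{12.3} from pairwise correlations.
import numpy as np

def partial_corr(r12, r13, r23):
    """r_{12.3}: correlation of variables 1 and 2, controlling for 3."""
    return (r12 - r13 * r23) / (np.sqrt(1 - r13**2) * np.sqrt(1 - r23**2))

print(partial_corr(0.6, 0.4, 0.5))
```

Note that when variable 3 is uncorrelated with both 1 and 2 ($r_{13} = r_{23} = 0$), the partial correlation reduces to the ordinary correlation $r_{12}$.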
44. **Forward selection.** A "build-up" procedure: add predictors until the "best" regression model is obtained.
45. **Outline of forward selection**
    1. No variables are included in the regression equation.
    2. Calculate the correlations of all predictors with the dependent variable.
    3. Enter the predictor variable with the highest correlation into the regression model if its corresponding partial F-value exceeds a predetermined threshold.
    4. Calculate the regression equation with the predictor.
    5. Select the predictor variable with the highest partial correlation to enter next.
46. **Forward selection continued**
    6. Compare the partial F-test value (called $F_H$, also known as "F-to-enter") to a predetermined tabulated F-value (called $F_C$).
    7. If $F_H > F_C$, include the variable with the highest partial correlation and return to step 5. If $F_H < F_C$, stop and retain the regression equation as calculated.
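The forward-selection loop above can be sketched in pure NumPy, using the partial F ("F-to-enter") against a fixed threshold $F_C$. This is an illustrative toy, not a production routine: the data, the threshold value, and the helper `sse` are all assumptions made for the example.

```python
# Sketch of forward selection with an F-to-enter threshold (F_C).
import numpy as np

def sse(X, Y):
    """Residual sum-of-squares from an OLS fit of Y on X."""
    b = np.linalg.solve(X.T @ X, X.T @ Y)
    r = Y - X @ b
    return r @ r

np.random.seed(5)
n = 60
Z = np.random.rand(n, 4)                  # four candidate predictors
Y = 2.0 * Z[:, 0] - 1.5 * Z[:, 2] + 0.2 * np.random.randn(n)

F_C = 4.0                                 # predetermined threshold (assumed)
selected, remaining = [], list(range(Z.shape[1]))
while remaining:
    X_cur = np.column_stack([np.ones(n)] + [Z[:, j] for j in selected])
    best_F, best_j = -np.inf, None
    for j in remaining:                   # partial F for each candidate
        X_try = np.column_stack([X_cur, Z[:, j]])
        df_resid = n - X_try.shape[1]
        F = (sse(X_cur, Y) - sse(X_try, Y)) / (sse(X_try, Y) / df_resid)
        if F > best_F:
            best_F, best_j = F, j
    if best_F < F_C:                      # F_H < F_C: stop
        break
    selected.append(best_j)               # F_H > F_C: enter the variable
    remaining.remove(best_j)

print(selected)   # predictors 0 and 2 should enter
```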
47. **Backward selection.** A "deconstruction" approach:
    1. Begin with the saturated (full) regression model.
    2. Compute the drop in $R^2$ as a consequence of eliminating each predictor variable, and the partial F-test value, treating each variable as if it were the last to enter the regression equation.
    3. Compare the lowest partial F-test value (designated $F_L$) to the critical value of F (designated $F_C$):
        a. If $F_L < F_C$, remove the variable, recompute the regression equation using the remaining predictor variables, and return to step 2.
        b. If $F_L > F_C$, adopt the regression equation as calculated.
48. **Stepwise selection**
    1. Calculate the correlations of all predictors with the response variable.
    2. Select the predictor variable with the highest correlation. Regress Y on $X_i$. Retain the predictor if there is a significant F-test value.
    3. Calculate the partial correlations of all variables not in the equation with the response variable. Select the next predictor to enter, the one with the highest partial correlation; call this predictor $X_j$.
    4. Compute the regression equation with both $X_i$ and $X_j$ entered. Retain $X_j$ if its partial F-value exceeds the tabulated F with (1, n − 2 − 1) df.
    5. Now determine whether $X_i$ warrants retention: compare its partial F-value as if $X_j$ had entered the equation first.
49. **Stepwise continued**
    - Retain $X_i$ if its F-value exceeds the tabulated F value.
    - Enter a new variable $X_k$. Compute the regression with three predictors. Compute partial F-values for $X_i$, $X_j$, and $X_k$, and determine whether any should be retained by comparing each observed partial F with the critical F.
    - Retain the regression equation when no other predictor can be entered into or removed from the model.
50. **All possible subsets.** This requires an optimality criterion, e.g., Mallow's $C_p$:

    $$C_p = p + \frac{(s^2 - \hat{\sigma}^2)(n - p)}{\hat{\sigma}^2} \qquad (p = k + 1)$$

    where $s^2$ is the residual variance for the reduced model and $\hat{\sigma}^2$ is the residual variance for the full model. All-subsets regression computes the possible 1, 2, 3, … variable models given some optimality criterion.
51. **Mallow's $C_p$**
    - Measures total squared error.
    - Choose the model where $C_p \approx p$.
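The $C_p$ formula above can be sketched as follows. The simulated data (where only one predictor actually matters) and all variable names are assumptions for illustration; note that for the full model itself, $s^2 = \hat{\sigma}^2$ and so $C_p = p$ exactly.

```python
# Sketch: Mallow's C_p = p + (s^2 - sigma_hat^2)(n - p) / sigma_hat^2,
# comparing a reduced model against the full model on simulated data.
import numpy as np

def resid_var(X, Y):
    """Residual variance for an OLS fit (df = n minus number of columns)."""
    b = np.linalg.solve(X.T @ X, X.T @ Y)
    r = Y - X @ b
    return r @ r / (len(Y) - X.shape[1])

def mallows_cp(X_red, X_full, Y):
    n = len(Y)
    p_red = X_red.shape[1]                 # p = k + 1 (intercept + k predictors)
    s2 = resid_var(X_red, Y)               # reduced-model residual variance
    sigma2_hat = resid_var(X_full, Y)      # full-model residual variance
    return p_red + (s2 - sigma2_hat) * (n - p_red) / sigma2_hat

np.random.seed(6)
n = 40
Z = np.random.rand(n, 3)
Y = 1.0 + 2.0 * Z[:, 0] + 0.15 * np.random.randn(n)   # only Z0 matters
X_full = np.column_stack([np.ones(n), Z])

X_red = np.column_stack([np.ones(n), Z[:, 0]])        # the Z0-only model
cp_reduced = mallows_cp(X_red, X_full, Y)
print(cp_reduced)          # a good reduced model has C_p close to p (here p = 2)
```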