

- 1. Bivariate analysis
- 2. The Multiple Regression Model. Idea: examine the linear relationship between one dependent variable (Y) and two or more independent variables (Xi). The multiple regression model with k independent variables is Yi = β0 + β1X1i + β2X2i + … + βkXki + εi, where β0 is the Y-intercept, β1, …, βk are the population slopes, and εi is the random error.
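The model above can be illustrated with an ordinary least squares fit. A minimal sketch in Python using NumPy, with made-up noise-free data so the fitted coefficients are exact (the variable names and values are illustrative assumptions, not from the slides):

```python
import numpy as np

# Hypothetical noise-free data; "true" model: b0 = 1, b1 = 2, b2 = 3
x1 = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
x2 = np.array([2.0, 1.0, 4.0, 3.0, 6.0, 5.0])
y = 1.0 + 2.0 * x1 + 3.0 * x2

# Design matrix: a leading column of ones estimates the Y-intercept b0
X = np.column_stack([np.ones_like(x1), x1, x2])

# Ordinary least squares minimizes the sum of squared errors
b, *_ = np.linalg.lstsq(X, y, rcond=None)
# b recovers [1, 2, 3]: the intercept and the two slopes
```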
- 3. Assumptions of Regression. Use the acronym LINE:
  • Linearity – the underlying relationship between X and Y is linear
  • Independence of errors – error values are statistically independent
  • Normality of error – error values (ε) are normally distributed for any given value of X
  • Equal variance (homoscedasticity) – the probability distribution of the errors has constant variance
- 4. Regression Statistics (Excel output):
  Multiple R: 0.998368; R Square: 0.996739; Adjusted R Square: 0.995808; Standard Error: 1.350151; Observations: 28.
  r² = SSR/SST = 11701.72/11740 = 0.996739, i.e. 99.674% of the variation in Y is explained by the independent variables.
  ANOVA table:
  Source     | df | SS       | MS       | F        | Significance F
  Regression | 6  | 11701.72 | 1950.286 | 1069.876 | 5.54E-25
  Residual   | 21 | 38.28108 | 1.822908 |          |
  Total      | 27 | 11740    |          |          |
- 5. Adjusted r²:
  • r² never decreases when a new X variable is added to the model; this can be a disadvantage when comparing models.
  • What is the net effect of adding a new variable? We lose a degree of freedom when a new X variable is added. Did the new X variable add enough explanatory power to offset the loss of one degree of freedom?
- 6. Adjusted r². Shows the proportion of variation in Y explained by all X variables, adjusted for the number of X variables used: r²adj = 1 − (1 − r²)(n − 1)/(n − k − 1), where n = sample size and k = number of independent variables. It penalizes excessive use of unimportant independent variables, is smaller than r², and is useful for comparing models.
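The adjusted-r² formula translates directly into code; a sketch (the function name is ours) using the values from the regression output earlier in the deck:

```python
def adjusted_r2(r2, n, k):
    """r2_adj = 1 - (1 - r2) * (n - 1) / (n - k - 1)."""
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)

# Values from the output shown earlier: r2 = 0.996739, n = 28, k = 6
r2_adj = adjusted_r2(0.996739, 28, 6)
# r2_adj ≈ 0.995808, matching the Adjusted R Square shown, and smaller than r2
```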
- 7. Error and coefficients relationship. For each predictor, b1 = Cov(y, x) / Varp(x), using the population covariance and variance. From the worked output, per predictor:
  Varp(x):      1103.4439 | 115902.4 | 1630165.82 | 36245060.6 | 706538.59 | 195.9184
  Cov(y, x):    662.14286 | 6862.5   | 25621.4286 | 120976.786 | 16061.643 | 257.1429
  b1 = Cov/Var: 0.6000694 | 0.059209 | 0.01571707 | 0.00333775 | 0.0227329 | 1.3125
  (The leading value 419.28571 in the first row of the original output appears to be the population variance of y itself.)
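The covariance-over-variance formula for a slope can be checked numerically. A sketch with made-up data; only the formula b1 = Cov(y, x)/Varp(x) comes from the slide, and population moments (ddof = 0) are used to match the Covar/Varp convention above:

```python
import numpy as np

# Hypothetical data (illustrative only)
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.0, 9.9])

cov_xy = np.mean((x - x.mean()) * (y - y.mean()))  # population covariance
var_x = np.var(x)                                  # population variance (ddof=0)
b1 = cov_xy / var_x                                # slope
b0 = y.mean() - b1 * x.mean()                      # intercept follows from the means
```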
- 8. Is the Model Significant? The F test for overall significance shows whether there is a linear relationship between all of the X variables considered together and Y. Use the F-test statistic. Hypotheses: H0: β1 = β2 = … = βk = 0 (no linear relationship); H1: at least one βi ≠ 0 (at least one independent variable affects Y).
- 9. F Test for Overall Significance. Test statistic: F = MSR/MSE = (SSR/k) / (SSE/(n − k − 1)), where F has k numerator and (n − k − 1) denominator degrees of freedom.
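The F statistic can be reproduced from the ANOVA output earlier in the deck; a quick check in plain Python:

```python
# Values from the ANOVA output shown earlier; note SSR + SSE ≈ SST (11740)
SSR, SSE = 11701.72, 38.28108
n, k = 28, 6

MSR = SSR / k            # mean square regression, df = k
MSE = SSE / (n - k - 1)  # mean square error, df = n - k - 1
F = MSR / MSE
# F ≈ 1069.876 with (6, 21) degrees of freedom, matching the output
```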
- 10. Case discussion
- 11. Multiple Regression Assumptions. Errors (residuals) from the regression model: ei = Yi − Ŷi. Assumptions: the errors are normally distributed, the errors have a constant variance, and the model errors are independent.
- 12. Error terms and coefficient estimates• Once we think of the Error term as a random variable, it becomes clear that the estimates of b1, b2, … (as distinguished from their true values) will also be random variables, because the estimates generated by the SSE criterion will depend upon the particular value of e drawn by nature for each individual in the data set.
- 13. Statistical Inference and Goodness of Fit:
  • The parameter estimates are themselves random variables, dependent upon the random errors e.
  • Thus, each estimate can be thought of as a draw from some underlying probability distribution, the nature of that distribution as yet unspecified.
  • If we assume that the error terms e are all drawn from the same normal distribution, it is possible to show that the parameter estimates have a normal distribution as well.
- 14. T Statistic and P Value. t = (b1 − β1)/Sb1, where Sb1 is the standard deviation (standard error) of b1. Discussion: can we hypothesize a value for β1 and test it against the estimate b1 with a t test?
- 15. Are Individual Variables Significant?
  • Use t tests of the individual variable slopes.
  • A t test shows whether there is a linear relationship between the variable Xj and Y.
  • Hypotheses: H0: βj = 0 (no linear relationship); H1: βj ≠ 0 (a linear relationship does exist between Xj and Y).
- 16. Are Individual Variables Significant? H0: βj = 0 (no linear relationship); H1: βj ≠ 0 (a linear relationship does exist between Xj and Y). Test statistic: t = (bj − 0)/Sbj, with df = n − k − 1.
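The t statistic for one slope can be recomputed from the coefficient table that follows; a check using the BAR row (the inputs are rounded as displayed, so the result differs from the table's 7.966651 only in the last digits):

```python
# BAR row of the coefficient table: b_j = 0.041988, S_bj = 0.005271
b_j, s_bj = 0.041988, 0.005271

t = (b_j - 0) / s_bj  # test statistic for H0: beta_j = 0
# t ≈ 7.966, far beyond typical critical values at df = 21, so BAR is significant
```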
- 17. Coefficient table (with n − (k + 1) degrees of freedom):
  Variable  | Coefficients | Standard Error | t Stat   | P-value  | Lower 95% | Upper 95%
  Intercept | -59.0661     | 11.28404       | -5.23448 | 3.45E-05 | -82.5325  | -35.5996
  OFF       | -0.00696     | 0.04619        | -0.15068 | 0.881663 | -0.10302  | 0.089097
  BAR       | 0.041988     | 0.005271       | 7.966651 | 8.81E-08 | 0.031028  | 0.052949
  YNG       | 0.002716     | 0.000999       | 2.717326 | 0.012904 | 0.000637  | 0.004794
  VEH       | 0.00147      | 0.000265       | 5.540878 | 1.69E-05 | 0.000918  | 0.002021
  INV       | -0.00274     | 0.001336       | -2.05135 | 0.052914 | -0.00552  | 3.78E-05
  SPD       | -0.2682      | 0.068418       | -3.92009 | 0.000786 | -0.41049  | -0.12592
- 18. Confidence Interval Estimate for the Slope. Confidence interval for the population slope βj: bj ± t(n−k−1) · Sbj, where t has (n − k − 1) d.f. Example: form a 95% confidence interval for the effect of changes in bars (BAR) on fatal accidents: 0.041988 ± (2.079614)(0.005271), so the interval is (0.031028, 0.052949). This interval does not contain zero, so bars have a significant effect on accidents.
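The 95% interval on this slide can be verified directly (the inputs are the rounded displayed values, so the endpoints differ from the table in the last digit):

```python
# Values from the slide: BAR slope, its standard error, and t with 21 d.f.
b_j, s_bj, t_crit = 0.041988, 0.005271, 2.079614

margin = t_crit * s_bj
lower, upper = b_j - margin, b_j + margin
# (lower, upper) ≈ (0.031026, 0.052950): zero is excluded, so BAR is significant
```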
- 19. Coefficient table (repeated):
  Variable  | Coefficients | Standard Error | t Stat   | P-value  | Lower 95% | Upper 95%
  Intercept | -59.0661     | 11.28404       | -5.23448 | 3.45E-05 | -82.5325  | -35.5996
  OFF       | -0.00696     | 0.04619        | -0.15068 | 0.881663 | -0.10302  | 0.089097
  BAR       | 0.041988     | 0.005271       | 7.966651 | 8.81E-08 | 0.031028  | 0.052949
  YNG       | 0.002716     | 0.000999       | 2.717326 | 0.012904 | 0.000637  | 0.004794
  VEH       | 0.00147      | 0.000265       | 5.540878 | 1.69E-05 | 0.000918  | 0.002021
  INV       | -0.00274     | 0.001336       | -2.05135 | 0.052914 | -0.00552  | 3.78E-05
  SPD       | -0.2682      | 0.068418       | -3.92009 | 0.000786 | -0.41049  | -0.12592
- 20. Using Dummy Variables:
  • A dummy variable is a categorical explanatory variable with two levels (yes/no, on/off, male/female), coded as 0 or 1.
  • The regression intercepts differ if the variable is significant.
  • This assumes equal slopes for the other variables.
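The intercept-shift behaviour of a dummy variable can be seen in a small fit; a sketch with hypothetical data (names and numbers are ours, not from the slides):

```python
import numpy as np

# Hypothetical: outcome depends on numeric x1 and a 0/1 dummy, same slope in both groups
x1 = np.array([1.0, 2.0, 3.0, 4.0, 1.0, 2.0, 3.0, 4.0])
dummy = np.array([0.0, 0.0, 0.0, 0.0, 1.0, 1.0, 1.0, 1.0])
y = 30 + 5 * x1 + 10 * dummy  # the dummy=1 group is shifted up by 10

X = np.column_stack([np.ones_like(x1), x1, dummy])
b, *_ = np.linalg.lstsq(X, y, rcond=None)
# b ≈ [30, 5, 10]: two parallel lines, intercepts 30 and 40, common slope 5
```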
- 21. Interaction Between Independent Variables. Hypothesizes interaction between pairs of X variables: the response to one X variable may vary at different levels of another X variable. The model contains a cross-product term: Ŷ = b0 + b1X1 + b2X2 + b3(X1X2).
- 22. Effect of Interaction. Given Y = β0 + β1X1 + β2X2 + β3X1X2 + ε: without the interaction term, the effect of X1 on Y is measured by β1; with the interaction term, the effect of X1 on Y is measured by β1 + β3X2, so the effect changes as X2 changes.
- 23. Interaction Example. Suppose X2 is a dummy variable and the estimated regression equation is Ŷ = 1 + 2X1 + 3X2 + 4X1X2.
  When X2 = 1: Ŷ = 1 + 2X1 + 3(1) + 4X1(1) = 4 + 6X1.
  When X2 = 0: Ŷ = 1 + 2X1 + 3(0) + 4X1(0) = 1 + 2X1.
  The slopes are different: the effect of X1 on Y depends on the value of X2.
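The two lines in this example can be recomputed directly from the estimated equation:

```python
def y_hat(x1, x2):
    """Estimated equation from the example: Y-hat = 1 + 2*X1 + 3*X2 + 4*X1*X2."""
    return 1 + 2 * x1 + 3 * x2 + 4 * x1 * x2

# Slope with respect to X1 at each level of the dummy X2
slope_at_x2_0 = y_hat(1, 0) - y_hat(0, 0)  # 2, from Y-hat = 1 + 2*X1
slope_at_x2_1 = y_hat(1, 1) - y_hat(0, 1)  # 6, from Y-hat = 4 + 6*X1
# The slopes differ, so the effect of X1 on Y depends on X2 (interaction)
```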
- 24. Residual Analysis. The residual for observation i is ei = Yi − Ŷi, the difference between its observed and predicted value. Check the assumptions of regression by examining the residuals:
  • Examine for the linearity assumption
  • Evaluate the independence assumption
  • Evaluate the normal-distribution assumption
  • Examine for constant variance at all levels of X (homoscedasticity)
  Graphical analysis of residuals: plot residuals vs. X.
- 25. Residual Analysis for Independence. (Plots of residuals vs. X: a systematic pattern indicates errors that are not independent; a patternless scatter indicates independent errors.)
- 26. Residual Analysis for Equal Variance. (Plots of residuals vs. x contrasting non-constant variance, where the spread of the residuals changes with x, against constant variance, where the spread is even.)
- 27. Linear vs. Nonlinear Fit. (Plots show a case where a linear fit does not give random residuals while a nonlinear fit does give random residuals.)
- 28. Quadratic Regression Model: Yi = β0 + β1X1i + β2X1i² + εi. Quadratic models may be considered when the scatter diagram takes one of four shapes, depending on the signs of the coefficients: (β1 < 0, β2 > 0), (β1 > 0, β2 > 0), (β1 < 0, β2 < 0), or (β1 > 0, β2 < 0). Here β1 is the coefficient of the linear term and β2 is the coefficient of the squared term.
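A quadratic model is just a linear regression on X and X²; a sketch with made-up noise-free data (the coefficient values are illustrative):

```python
import numpy as np

# Hypothetical curved data generated from beta0 = 2, beta1 = 3, beta2 = -0.5
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0, 5.0])
y = 2 + 3 * x - 0.5 * x**2

# Fit Y = b0 + b1*X + b2*X^2 by least squares on the columns [1, X, X^2]
X = np.column_stack([np.ones_like(x), x, x**2])
b, *_ = np.linalg.lstsq(X, y, rcond=None)
# b recovers [2, 3, -0.5]; the negative b2 gives the downward-curving shape
```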
