Business Quantitative Lecture 3

  1. Quantitative Analysis for Business
     Lecture 3
     July 12th, 2010
     Saksarun (Jay) Mativachranon
  2. Error in the Regression Model
  3. Assumptions of the Regression Model
     If we make certain assumptions about the errors in a regression model, we can perform statistical tests to determine whether the model is useful:
     • Errors are independent
     • Errors are normally distributed
     • Errors have a mean of zero
     • Errors have a constant variance
     A plot of the residuals (errors) will often highlight any glaring violations of these assumptions.
  4. Residual Plots
     [Scatter of residuals (Error) vs. X: a random pattern of residuals (Figure 4.4A)]
  5. Residual Plots
     [Scatter of residuals (Error) vs. X: nonconstant error variance (Figure 4.4B)]
  6. Residual Plots
     [Scatter of residuals (Error) vs. X: a nonlinear relationship (Figure 4.4C)]
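The patterns these figures show can also be checked numerically. A minimal sketch in plain Python, using made-up data (not the lecture's Company A figures), that fits a least-squares line and computes the residuals such a plot would display:

```python
# Fit a least-squares line and inspect its residuals.
# The data below are invented for illustration only.

def fit_line(xs, ys):
    """Return (b0, b1) for the least-squares line y = b0 + b1*x."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    b1 = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
          / sum((x - mx) ** 2 for x in xs))
    b0 = my - b1 * mx
    return b0, b1

xs = [1, 2, 3, 4, 5, 6]
ys = [2.1, 3.9, 6.2, 7.8, 10.1, 11.9]

b0, b1 = fit_line(xs, ys)
residuals = [y - (b0 + b1 * x) for x, y in zip(xs, ys)]

# Plotting these residuals against X gives a chart like Figures 4.4A-C;
# with an intercept in the model, least-squares residuals sum to ~0.
print(round(sum(residuals), 6))
```

Checking that the residuals have no visible pattern against X is the numerical counterpart of the visual inspection the slides describe.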
  7. Analysis of Variance
  8. Analysis of Variance (ANOVA)
     A statistical procedure for analyzing the total variability of a data set
  9. Analysis of Variance (ANOVA)
     • Sum of squares total (SST): measures the total variation in the dependent variable
     • Sum of squares regression (SSR): measures the variation in the dependent variable explained by the independent variable
     • Sum of squares error (SSE): measures the unexplained variation
     These satisfy the decomposition SST = SSR + SSE.
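The three sums of squares can be computed directly, and they always satisfy SST = SSR + SSE. A sketch in plain Python with made-up data (not the lecture's Company A data):

```python
from statistics import mean

# Invented data for illustration only
xs = [1, 2, 3, 4, 5, 6]
ys = [2.1, 3.9, 6.2, 7.8, 10.1, 11.9]

# Least-squares fit: y_hat = b0 + b1*x
mx, my = mean(xs), mean(ys)
b1 = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
      / sum((x - mx) ** 2 for x in xs))
b0 = my - b1 * mx
preds = [b0 + b1 * x for x in xs]

sst = sum((y - my) ** 2 for y in ys)                # total variation
ssr = sum((p - my) ** 2 for p in preds)             # explained variation
sse = sum((y - p) ** 2 for y, p in zip(ys, preds))  # unexplained variation

assert abs(sst - (ssr + sse)) < 1e-9                # SST = SSR + SSE
print(round(sst, 3), round(ssr, 3), round(sse, 3))
```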
  10. Estimating the Variance
      Errors are assumed to have a constant variance (σ²), but we usually don't know it
      It can be estimated using the mean squared error (MSE), s²:
      s² = MSE = SSE / (n − k − 1)
      where
      n = number of observations in the sample
      k = number of independent variables
  11. The Standard Error of Estimate
      The standard error of estimate (SEE), also called the standard error of the regression:
      SEE = s = √MSE = √(SSE / (n − k − 1))
      Measures the typical distance between the actual values of the dependent variable and the values predicted by the regression
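Given SSE, both estimates are one line each. A sketch with hypothetical numbers (the SSE, n, and k values are invented for illustration):

```python
from math import sqrt

# Hypothetical values, not the lecture's data
sse = 12.5    # sum of squared errors
n, k = 20, 1  # observations, independent variables

mse = sse / (n - k - 1)  # s^2: estimate of the error variance sigma^2
see = sqrt(mse)          # standard error of estimate, s

print(round(mse, 4), round(see, 4))  # 0.6944 0.8333
```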
  12. The F Statistic
      The F-test assesses how well a set of independent variables, as a group, explains the variation in the dependent variable:
      F = MSR / MSE, where MSR = SSR / k and MSE = SSE / (n − k − 1)
      where
      MSR = mean regression sum of squares
      MSE = mean squared error
      k = number of slope parameters (k = 1 for simple linear regression)
      n = number of observations
  13. F-Statistic: Linear Regression
      For linear regression, the hypotheses for the validity of the model are:
      H0: β1 = 0
      Ha: β1 ≠ 0
      To determine whether β1 is statistically significant, the calculated F-statistic is compared with the critical F-value, Fc, at the appropriate level of significance.
  14. F-Statistic: Linear Regression
      The degrees of freedom (df) for the numerator and denominator with one independent variable are:
      df numerator = k = 1
      df denominator = n − k − 1 = n − 2
      Decision rule for the F-test: reject H0 if F > Fc
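Slides 12 to 14 combine into a few lines of code. A sketch with hypothetical sums of squares; the only outside input is the table value F(0.05, 1, 18) = 4.41, the standard critical value for these degrees of freedom:

```python
def f_statistic(ssr, sse, n, k):
    """F = MSR / MSE for a regression with k slope parameters."""
    msr = ssr / k            # mean regression sum of squares
    mse = sse / (n - k - 1)  # mean squared error
    return msr / mse

# Hypothetical sums of squares, invented for illustration
ssr, sse, n, k = 40.0, 17.6, 20, 1

f = f_statistic(ssr, sse, n, k)     # df1 = k = 1, df2 = n - k - 1 = 18
f_critical = 4.41                   # F(0.05, 1, 18) from a standard table
print(round(f, 2), f > f_critical)  # reject H0 when F > Fc
```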
  15. Company A Data
      [Data table not captured in the transcript]
  16. Company A Example
      [Worked regression calculations for Company A (equations not captured in the transcript)]
  17. Estimating the Variance
      For Company A we can estimate the standard deviation, s
      This is also called the standard error of the estimate or the standard deviation of the regression
  18. Testing the Model for Significance
      When the sample size is too small, you can get good values for MSE and r² even if there is no relationship between the variables
      Testing the model for significance helps determine whether the values are meaningful
      We do this by performing a statistical hypothesis test
  19. Testing the Model for Significance
      We start with the general linear model: Y = β0 + β1X + ε
      • If β1 = 0, the null hypothesis is that there is no relationship between X and Y
      • The alternative hypothesis is that there is a linear relationship (β1 ≠ 0)
      • If the null hypothesis can be rejected, we have shown there is a relationship
      • We use the F statistic for this test
  20. Testing the Model for Significance
      The F statistic is based on the MSE and MSR:
      MSR = SSR / k and MSE = SSE / (n − k − 1)
      where k = number of independent variables in the model
      The F statistic is F = MSR / MSE
      This follows an F distribution with
      degrees of freedom for the numerator = df1 = k
      degrees of freedom for the denominator = df2 = n − k − 1
  24. Testing the Model for Significance
      If there is very little error, the MSE will be small and the F-statistic will be large, indicating the model is useful
      If the F-statistic is large, the significance level (p-value) will be low, indicating it is unlikely this result would have occurred by chance
      So when the F-value is large, we can reject the null hypothesis, conclude there is a linear relationship between X and Y, and treat the values of the MSE and r² as meaningful
  25. Steps in a Hypothesis Test
      Step 1. Specify the null and alternative hypotheses
      Step 2. Select the level of significance (α); common values are 0.01 and 0.05
      Step 3. Calculate the value of the test statistic using the formula F = MSR / MSE
  26. Steps in a Hypothesis Test
      Step 4. Make a decision using one of the following methods:
      • Reject the null hypothesis if the test statistic is greater than the F-value from the table; otherwise, do not reject the null hypothesis
      • Reject the null hypothesis if the observed significance level, or p-value, is less than the level of significance (α); otherwise, do not reject the null hypothesis
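The two equivalent decision methods can be captured in one small helper (a sketch; the function name is mine, and the critical value or p-value would come from an F table or the software's ANOVA output):

```python
def f_test_decision(f_stat, f_critical=None, p_value=None, alpha=0.05):
    """Decide an F-test by either of the two methods above.

    Method 1: compare the test statistic with the table value.
    Method 2: compare the p-value with the level of significance alpha.
    """
    if f_critical is not None:
        return "reject H0" if f_stat > f_critical else "do not reject H0"
    if p_value is not None:
        return "reject H0" if p_value < alpha else "do not reject H0"
    raise ValueError("supply f_critical or p_value")

# Numbers from the Company A example later in the deck:
print(f_test_decision(9.09, f_critical=7.71))  # reject H0
print(f_test_decision(9.09, p_value=0.0394))   # reject H0
```

Both methods always agree, because F > Fc exactly when the p-value falls below α.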
  27. Company A
      Step 1. H0: β1 = 0 (no linear relationship between X and Y)
              H1: β1 ≠ 0 (linear relationship exists between X and Y)
      Step 2. Select α = 0.05
      Step 3. Calculate the value of the test statistic: F = MSR / MSE
  28. Company A
      Step 4. Reject the null hypothesis if the test statistic is greater than the critical F-value
      df1 = k = 1
      df2 = n − k − 1 = 6 − 1 − 1 = 4
      The value of F associated with a 5% level of significance and degrees of freedom 1 and 4 is
      F0.05,1,4 = 7.71
      Fcalculated = 9.09
      Reject H0 because 9.09 > 7.71
  29. Company A
      [Plot of the F distribution showing the rejection region: α = 0.05, Fc = 7.71, Fcalculated = 9.09]
      We can conclude there is a statistically significant relationship between X and Y
      The r² value of 0.69 means about 69% of the variability in sales (Y) is explained by man-hours (X)
  30. Limitations of Regression Analysis
      Linear relationships can change over time; this is referred to as parameter instability
      Even if the model is accurate, its usefulness will be limited if other market participants are also aware of and act on it
      If the assumptions do not hold, the interpretations and tests of hypotheses may not be valid
  31. Using Software for Regression
  32. Using Software for Regression
      [Regression output screenshot (not captured in the transcript)]
  33. Using Software for Regression
      The correlation coefficient is called "Multiple R" in Excel
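Excel's "Multiple R" is the correlation coefficient; for a simple linear regression with a positive slope it equals the Pearson correlation between X and Y, and squaring it gives r². A sketch in plain Python with made-up data:

```python
from math import sqrt
from statistics import mean

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two samples."""
    mx, my = mean(xs), mean(ys)
    num = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    den = sqrt(sum((x - mx) ** 2 for x in xs)
               * sum((y - my) ** 2 for y in ys))
    return num / den

# Invented data for illustration only
xs = [1, 2, 3, 4, 5, 6]
ys = [2.1, 3.9, 6.2, 7.8, 10.1, 11.9]

r = pearson_r(xs, ys)
print(round(r, 4))      # what Excel labels "Multiple R"
print(round(r * r, 4))  # r^2, the coefficient of determination
```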
  34. Analysis of Variance (ANOVA) Table
      When software is used to develop a regression model, an ANOVA table is typically created that shows the observed significance level (p-value) for the calculated F-value
      This can be compared to the level of significance (α) to make a decision
      Table 4.4
  36. ANOVA for Company A
      P(F > 9.0909) = 0.0394
      Because this probability is less than 0.05, we reject the null hypothesis of no linear relationship and conclude there is a linear relationship between X and Y
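This p-value, P(F > 9.0909) with 1 and 4 degrees of freedom, can be reproduced without tables by numerically integrating the F-distribution density (a rough sketch; in practice the ANOVA table supplies the value directly):

```python
from math import gamma, sqrt

def f_sf(f_obs, d1, d2, upper=5000.0, steps=500_000):
    """P(F > f_obs): trapezoidal integration of the F(d1, d2) density."""
    beta = gamma(d1 / 2) * gamma(d2 / 2) / gamma((d1 + d2) / 2)

    def pdf(x):
        # Standard F-distribution density with d1, d2 degrees of freedom
        return sqrt((d1 * x) ** d1 * d2 ** d2
                    / (d1 * x + d2) ** (d1 + d2)) / (x * beta)

    h = (upper - f_obs) / steps
    total = 0.5 * (pdf(f_obs) + pdf(upper))
    total += sum(pdf(f_obs + i * h) for i in range(1, steps))
    return total * h

p = f_sf(9.0909, 1, 4)
print(round(p, 4))  # 0.0394, matching the slide
```

The truncation at `upper=5000` is safe here because the F(1, 4) density decays like x⁻³, so the omitted tail is far below the fourth decimal place.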
