Presentation2 stats
  1. REGRESSION MODELS
     By: Ayush Sharma (09), Mickey Haldia (19), Prerna Makhijani (29), Sanoj George (39), Sushant Jaggi (49), Nitish Dorle (59)
  2. Example
     Year   Population on Farm (in millions)
     1935   32.1
     1940   30.5
     1945   24.4
     1950   23.0
     1955   19.1
     1960   15.6
     1965   12.5
  3. Scatter Plot
     [Figure: scatter plot of population on farm (in millions) against year, 1930 to 1970]
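  4. Correlation Coefficient (r)
     The correlation coefficient measures the strength of the linear relationship between two variables. It is calculated using the following formula:
     $r = \frac{n\sum XY - \sum X \sum Y}{\sqrt{\left[n\sum X^2 - \left(\sum X\right)^2\right]\left[n\sum Y^2 - \left(\sum Y\right)^2\right]}}$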
  5. Interpretation
     Applying the formula to the example gives r = -0.993. This indicates a strong negative correlation.
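The following minimal Python sketch applies the sum-based formula from slide 4 to the farm data from slide 2 and reproduces the reported value. The list and function names are our own.

```python
import math

# Farm population example (slide 2): year and population on farms in millions.
YEARS = [1935, 1940, 1945, 1950, 1955, 1960, 1965]
POPULATION = [32.1, 30.5, 24.4, 23.0, 19.1, 15.6, 12.5]


def pearson_r(xs, ys):
    """Pearson correlation coefficient computed from raw sums (slide 4 formula)."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = math.sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
    return numerator / denominator


r = pearson_r(YEARS, POPULATION)
print(round(r, 3))  # -0.993
```

On Python 3.10 or later, `statistics.correlation(YEARS, POPULATION)` from the standard library should return the same value.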
  6. Coefficient of Determination
     Squaring the correlation coefficient (r) gives the proportion (usually stated as a percentage) of the variation in the y-variable that is explained by the variation in the x-variable.
     To relate x and y, the regression equation is calculated using the least squares technique.
     Regression equation: $Y' = a + bX$
     Slope of the regression line: $b = \frac{n\sum XY - \sum X \sum Y}{n\sum X^2 - \left(\sum X\right)^2}$, with intercept $a = \bar{Y} - b\bar{X}$
  7. To continue with the example
     We found r = -0.993. Squaring it (using the unrounded value of r) gives the coefficient of determination $R^2 = 0.987$.
     The least squares regression line is $y = -0.671x + 1{,}330.350$.
     [Figure: population on farm (in millions) against year, 1930 to 1970, with the fitted regression line; R² = 0.987]
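This short sketch fits the least squares line to the farm data and computes R² as 1 - SSE/SST. It is a hedged illustration with our own function names. For the slope it uses the mean-centered form $b = S_{xy}/S_{xx}$. This form is algebraically equal to the sum-based slope formula on slide 6 but less sensitive to rounding.

```python
# Farm population example (slide 2).
YEARS = [1935, 1940, 1945, 1950, 1955, 1960, 1965]
POPULATION = [32.1, 30.5, 24.4, 23.0, 19.1, 15.6, 12.5]


def least_squares_line(xs, ys):
    """Return (a, b) for Y' = a + bX fitted by ordinary least squares."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    b = sxy / sxx            # slope
    a = mean_y - b * mean_x  # intercept: the line passes through (mean_x, mean_y)
    return a, b


def r_squared(xs, ys, a, b):
    """Coefficient of determination, 1 - SSE/SST."""
    mean_y = sum(ys) / len(ys)
    sse = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
    sst = sum((y - mean_y) ** 2 for y in ys)
    return 1 - sse / sst


a, b = least_squares_line(YEARS, POPULATION)
print(f"y = {b:.3f}x + {a:.3f}")                       # y = -0.671x + 1330.350
print(round(r_squared(YEARS, POPULATION, a, b), 3))    # 0.987
```

On Python 3.10 or later, `statistics.linear_regression(YEARS, POPULATION)` returns the same slope and intercept.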
  8. Interpretation
     We conclude that 98.7% of the variation in farm population is explained by its linear relationship with time (year). Here, population is the dependent variable (y-axis) and year is the independent variable (x-axis).
  9. Assumptions of the Regression Model
     The following assumptions are made about the errors:
     a) The errors are independent.
     b) The errors are normally distributed.
     c) The errors have a mean of zero.
     d) The errors have a constant variance (regardless of the value of X).
  10. Patterns Indicating Errors
     [Figure: residual plots of error against X]
  11. Estimating the Variance
     The error variance is estimated by the mean squared error (MSE):
     $s^2 = MSE = \frac{SSE}{n - k - 1}$
     where n = number of observations in the sample and k = number of independent variables.
     Therefore, the standard deviation of the errors (the standard error of the estimate) is $s = \sqrt{MSE}$.
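The following is a minimal sketch of these formulas applied to the farm example. The example has one independent variable, so k = 1 and n - k - 1 = 5. The resulting SSE, MSE, and s are computed here for illustration and do not appear on the slides. The function name is our own.

```python
import math

# Farm population example (slide 2).
YEARS = [1935, 1940, 1945, 1950, 1955, 1960, 1965]
POPULATION = [32.1, 30.5, 24.4, 23.0, 19.1, 15.6, 12.5]


def error_variance(xs, ys, k=1):
    """Fit Y' = a + bX by least squares, then return (SSE, MSE, s).

    MSE = SSE / (n - k - 1) and s = sqrt(MSE), as on slide 11.
    """
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    b = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
         / sum((x - mean_x) ** 2 for x in xs))
    a = mean_y - b * mean_x
    sse = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))  # sum of squared residuals
    mse = sse / (n - k - 1)
    return sse, mse, math.sqrt(mse)


sse, mse, s = error_variance(YEARS, POPULATION)
print(round(sse, 3), round(mse, 3), round(s, 3))  # 4.277 0.855 0.925
```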
  12. Multiple Regression Analysis
     Multiple regression uses more than one independent variable:
     $Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \dots + \beta_k X_k + \epsilon$
     where
     Y = dependent variable (response variable)
     $X_i$ = ith independent variable (predictor variable or explanatory variable)
     $\beta_0$ = intercept (value of Y when all $X_i = 0$)
     $\beta_i$ = coefficient of the ith independent variable
     k = number of independent variables
     $\epsilon$ = random error
     To estimate the values of these coefficients, a sample is taken and the following equation is developed:
     $\hat{Y} = b_0 + b_1 X_1 + b_2 X_2 + \dots + b_k X_k$
     where
     $\hat{Y}$ = predicted value of Y
     $b_0$ = sample intercept (an estimate of $\beta_0$)
     $b_i$ = sample coefficient of the ith variable (an estimate of $\beta_i$)
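The sketch below shows one way to estimate $b_0, \dots, b_k$. It solves the least squares normal equations $(X^\top X)b = X^\top y$ with plain Gaussian elimination. This is an illustrative method, not necessarily how Excel computes its output. The helper names `solve` and `fit_multiple_regression` are our own. The data are transcribed from the Jenny Wilson Realty table on slide 18, and the fitted values should agree with the Intercept, SF, and AGE coefficients in that Excel output.

```python
# Jenny Wilson Realty data (slide 18): selling price ($), square footage, age (years).
PRICE = [95000, 119000, 124800, 135000, 142800, 145000, 159000,
         165000, 182000, 183000, 200000, 211000, 215000, 219000]
SF = [1926, 2069, 1720, 1396, 1706, 1847, 1950, 2323, 2285, 3752, 2300, 2525, 3800, 1740]
AGE = [30, 40, 30, 15, 32, 38, 27, 30, 26, 35, 18, 17, 40, 12]


def solve(matrix, rhs):
    """Solve a small square system A x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [[float(v) for v in row] + [float(rhs[i])] for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        tail = sum(aug[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (aug[r][n] - tail) / aug[r][r]
    return x


def fit_multiple_regression(columns, ys):
    """Return [b0, b1, ..., bk] by solving the normal equations (X^T X) b = X^T y."""
    rows = [[1.0] + [col[i] for col in columns] for i in range(len(ys))]  # leading 1 = intercept
    p = len(rows[0])
    xtx = [[sum(row[i] * row[j] for row in rows) for j in range(p)] for i in range(p)]
    xty = [sum(row[i] * y for row, y in zip(rows, ys)) for i in range(p)]
    return solve(xtx, xty)


b0, b1, b2 = fit_multiple_regression([SF, AGE], PRICE)
print(f"Y-hat = {b0:.2f} + {b1:.4f} SF + ({b2:.3f}) AGE")
```

The normal equations are fine for a small, well-scaled example like this one. For larger or nearly collinear data, a QR-based least squares solver is the usual, more stable choice.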
  13. Testing the Model for Significance
     • The MSE and the coefficient of determination (r²) do not provide a good measure of accuracy when the sample size is small.
     • In this case, it is necessary to test the model for significance.
     • The linear model is given by $Y = \beta_0 + \beta_1 X + \epsilon$.
     Null hypothesis: $\beta_1 = 0$, i.e., there is no linear relationship between X and Y.
     Alternative hypothesis: $\beta_1 \neq 0$, i.e., there is a linear relationship between X and Y.
  14. Steps in a Hypothesis Test for a Significant Regression Model
     1. Specify the null and alternative hypotheses.
     2. Select the level of significance (α). Common values are between 0.01 and 0.05.
     3. Calculate the value of the test statistic using the formula $F = \frac{MSR}{MSE}$, where $MSR = SSR/k$.
     4. Make a decision using one of the following methods:
        a) Reject $H_0$ if $F_{calculated} > F_{table}$
        b) Reject $H_0$ if p-value < α
  15. Triple A Construction Example
     Step 1:
     $H_0: \beta_1 = 0$ (no linear relationship between X and Y)
     $H_1: \beta_1 \neq 0$ (linear relationship between X and Y)
     Step 2: Select α = 0.05.
  16. Triple A Construction Example
     Step 3: Calculate the value of the test statistic.
     $MSR = SSR/k = 15.6250/1 = 15.6250$
     $F = MSR/MSE = 15.6250/1.7188 = 9.09$
  17. Triple A Construction Example
     Step 4: Reject the null hypothesis if the test statistic is greater than the F value from the table. To find the table value, we need:
     Level of significance (α) = 0.05
     $df_1 = k = 1$
     $df_2 = n - k - 1 = 4$
     where k = number of independent variables and n = sample size.
     Using these values, we find $F_{table} = 7.71$. Hence, we reject $H_0$ because 9.09 > 7.71, and conclude that there is a statistically significant linear relationship between X and Y.
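Steps 3 and 4 of the Triple A test can be sketched as follows. The function name is our own. The critical value 7.71 is the tabled value quoted on slide 17 and is passed in directly, because Python's standard library has no F distribution.

```python
def regression_f_test(ssr, mse, k, f_table):
    """Return (F, reject_h0) for the overall regression F test.

    F = MSR / MSE with MSR = SSR / k; H0 (beta_1 = 0) is rejected when F > f_table.
    """
    msr = ssr / k
    f_stat = msr / mse
    return f_stat, f_stat > f_table


# Triple A Construction values from slides 16-17
# (alpha = 0.05, df1 = 1, df2 = 4, tabled critical value 7.71).
f_stat, reject = regression_f_test(ssr=15.6250, mse=1.7188, k=1, f_table=7.71)
print(round(f_stat, 2), reject)  # 9.09 True
```

If SciPy is available, `scipy.stats.f.ppf(0.95, 1, 4)` returns the critical value (about 7.71). `scipy.stats.f.sf(f_stat, 1, 4)` gives the p-value for decision method (b) on slide 14.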
  18. Jenny Wilson Realty: regression of selling price on square footage (SF) and age
     Selling Price ($)   Square Footage   Age   Condition
     95000               1926             30    Good
     119000              2069             40    Excellent
     124800              1720             30    Excellent
     135000              1396             15    Good
     142800              1706             32    Mint
     145000              1847             38    Mint
     159000              1950             27    Mint
     165000              2323             30    Excellent
     182000              2285             26    Mint
     183000              3752             35    Good
     200000              2300             18    Good
     211000              2525             17    Good
     215000              3800             40    Excellent
     219000              1740             12    Mint

     SUMMARY OUTPUT
     Regression Statistics
     Multiple R           0.819680305
     R Square             0.671875802   (the coefficient of determination, r²)
     Adjusted R Square    0.612216857
     Standard Error       24312.60729
     Observations         14

     ANOVA
                  df   SS            MS        F        Significance F
     Regression   2    13313936968   6.7E+09   11.262   0.002178765
     Residual     11   6502131603    5.9E+08
     Total        13   19816068571

                  Coefficients   Standard Error   t Stat    P-value   Lower 95%      Upper 95%
     Intercept    146630.89      25482.08287      5.75427   0.0001    90545.20735    202717
     SF           43.819366      10.28096507      4.26218   0.0013    21.19111495    66.448
     AGE          -2898.686      796.5649421      -3.639    0.0039    -4651.91386    -1145.5

     The Coefficients column gives the regression coefficients. The p-values are used to test the individual variables for significance.
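As a consistency check on this output, the following sketch recomputes r², F, and the standard error from the ANOVA sums of squares. The Standard Error row corresponds to $s = \sqrt{MSE}$ from slide 11. The function name is our own.

```python
import math


def anova_summary(ssr, sse, n, k):
    """Return (r2, F, s) from the ANOVA sums of squares of a regression with k predictors."""
    sst = ssr + sse                  # total sum of squares
    mse = sse / (n - k - 1)          # residual mean square
    r2 = ssr / sst                   # coefficient of determination
    f_stat = (ssr / k) / mse         # MSR / MSE
    return r2, f_stat, math.sqrt(mse)


# SS values from the ANOVA table above (n = 14 houses, k = 2: SF and AGE).
r2, f_stat, s = anova_summary(ssr=13313936968, sse=6502131603, n=14, k=2)
print(round(r2, 4), round(f_stat, 3), round(s, 1))  # 0.6719 11.262 24312.6
```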
  19. Binary or Dummy Variables
     An indicator (dummy) variable is assigned a value of 1 if a particular condition is met and 0 otherwise.
     The number of dummy variables must equal one less than the number of categories of the qualitative variable.
     The Jenny Wilson Realty example has three condition categories (Good, Excellent, Mint), so two dummy variables are used, with Good as the baseline:
     – X3 = 1 for excellent condition, 0 otherwise
     – X4 = 1 for mint condition, 0 otherwise
  20. Jenny Wilson Realty: regression with dummy variables for condition
     Selling Price ($)   Square Footage   Age   X3 (Exc.)   X4 (Mint)   Condition
     95000               1926             30    0           0           Good
     119000              2069             40    1           0           Excellent
     124800              1720             30    1           0           Excellent
     135000              1396             15    0           0           Good
     142800              1706             32    0           1           Mint
     145000              1847             38    0           1           Mint
     159000              1950             27    0           1           Mint
     165000              2323             30    1           0           Excellent
     182000              2285             26    0           1           Mint
     183000              3752             35    0           0           Good
     200000              2300             18    0           0           Good
     211000              2525             17    0           0           Good
     215000              3800             40    1           0           Excellent
     219000              1740             12    0           1           Mint

     SUMMARY OUTPUT
     Regression Statistics
     Multiple R           0.94762
     R Square             0.89798
     Adjusted R Square    0.85264
     Standard Error       14987.6
     Observations         14

     ANOVA
                  df   SS            MS      F         Significance F
     Regression   4    17794427451   4E+09   19.8044   0.000174421
     Residual     9    2021641120    2E+08
     Total        13   19816068571

                  Coefficients   Standard Error   t Stat    P-value   Lower 95%      Upper 95%
     Intercept    121658         17426.61432      6.9812    6.5E-05   82236.71393    161080
     SF           56.4276        6.947516792      8.122     2E-05     40.71122594    72.144
     AGE          -3962.82       596.0278736      -6.6487   9.4E-05   -5311.12866    -2614.5
     X3 (Exc.)    33162.6        12179.62073      2.7228    0.0235    5610.432651    60714.9
     X4 (Mint)    47369.2        10649.26942      4.4481    0.0016    23278.92699    71459.6

     The coefficient of age is negative, indicating that the price decreases as a house gets older.
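The following sketch builds the X3 and X4 dummy columns from the Condition labels, with Good as the all-zero baseline. It then refits the model by least squares and should reproduce the coefficients in the output above. The helpers `dummy_columns`, `solve`, and `fit_multiple_regression` are our own illustrative names. The fit uses the normal equations rather than whatever method Excel uses internally.

```python
# Jenny Wilson Realty data (slides 18 and 20).
PRICE = [95000, 119000, 124800, 135000, 142800, 145000, 159000,
         165000, 182000, 183000, 200000, 211000, 215000, 219000]
SF = [1926, 2069, 1720, 1396, 1706, 1847, 1950, 2323, 2285, 3752, 2300, 2525, 3800, 1740]
AGE = [30, 40, 30, 15, 32, 38, 27, 30, 26, 35, 18, 17, 40, 12]
CONDITION = ["Good", "Excellent", "Excellent", "Good", "Mint", "Mint", "Mint",
             "Excellent", "Mint", "Good", "Good", "Good", "Excellent", "Mint"]


def dummy_columns(labels, categories):
    """One 0/1 column per listed category; labels matching none (the baseline) get all zeros."""
    return [[1 if label == cat else 0 for label in labels] for cat in categories]


def solve(matrix, rhs):
    """Gaussian elimination with partial pivoting for a small square system."""
    n = len(rhs)
    aug = [[float(v) for v in row] + [float(rhs[i])] for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(aug[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (aug[r][n] - tail) / aug[r][r]
    return x


def fit_multiple_regression(columns, ys):
    """Least squares coefficients [b0, ..., bk] via the normal equations."""
    rows = [[1.0] + [col[i] for col in columns] for i in range(len(ys))]
    p = len(rows[0])
    xtx = [[sum(row[i] * row[j] for row in rows) for j in range(p)] for i in range(p)]
    xty = [sum(row[i] * y for row, y in zip(rows, ys)) for i in range(p)]
    return solve(xtx, xty)


X3, X4 = dummy_columns(CONDITION, ["Excellent", "Mint"])  # "Good" is the baseline
coefficients = fit_multiple_regression([SF, AGE, X3, X4], PRICE)
for name, value in zip(["Intercept", "SF", "AGE", "X3 (Exc.)", "X4 (Mint)"], coefficients):
    print(f"{name:10s} {value:12.2f}")
```

Each dummy coefficient is read relative to the baseline. For example, holding SF and age fixed, a house in mint condition is estimated to sell for about $47,369 more than one in good condition.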
  21. Model Building
     The value of r² can never decrease when more variables are added to the model.
     The adjusted r² is often used to determine whether an additional independent variable is beneficial. The adjusted r² is
     $r^2_{adj} = 1 - \frac{SSE/(n - k - 1)}{SST/(n - 1)}$
     A variable should not be added to the model if it causes the adjusted r² to decrease.
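The following is a small sketch of the adjusted r² rule, applied to the r² values reported in the two Jenny Wilson Realty outputs (slides 18 and 20, n = 14). It uses the equivalent form $1 - (1 - r^2)\frac{n-1}{n-k-1}$, which follows from dividing SSE and SST in the formula above. The function name is our own.

```python
def adjusted_r2(r2, n, k):
    """Adjusted r^2 = 1 - (1 - r^2)(n - 1)/(n - k - 1), equal to the SSE/SST form."""
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)


# r^2 and number of predictors k for each model (Excel outputs on slides 18 and 20).
models = {
    "SF, AGE": (0.671875802, 2),
    "SF, AGE, X3, X4": (0.89798, 4),
}
for name, (r2, k) in models.items():
    print(f"{name:16s} r2 = {r2:.4f}  adjusted r2 = {adjusted_r2(r2, 14, k):.4f}")
```

Adding X3 and X4 raises the adjusted r² from about 0.612 to about 0.853. Under the rule above, the two dummy variables are therefore worth keeping.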
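  22. Multiple Regression with Transformed Variables
     Sales/Decision to buy $= B_0 + B_1 \cdot \text{Price}$
     Sales/Decision to buy $= B_0 + B_1 (\text{Price})^3 + B_2 (\text{Design})^2 + B_3 (\text{Performance})$
     Let $L = (\text{Price})^3$, $M = (\text{Design})^2$, and $N = \text{Performance}$. The model is then linear in the new variables and can be estimated with ordinary multiple regression:
     Sales/Decision to buy $= B_0 + B_1 L + B_2 M + B_3 N$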
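The following sketch illustrates the substitution under clearly stated assumptions. The data are hypothetical and noise-free, generated from an assumed equation (sales = 10 - 2·Price³ + 1.5·Design² + 3·Performance). No real sales data appear on the slide. After transforming to L, M, and N, an ordinary linear least squares fit should recover the generating coefficients. All names, values, and helper functions are our own.

```python
# HYPOTHETICAL data: generated from an assumed model so the recovered coefficients can be checked.
PRICE = [1.0, 1.2, 1.5, 1.8, 2.0, 2.2, 2.5, 3.0]
DESIGN = [3, 5, 2, 4, 1, 5, 3, 2]
PERFORMANCE = [7, 6, 8, 5, 9, 4, 6, 8]
SALES = [10 - 2 * p ** 3 + 1.5 * d ** 2 + 3 * q
         for p, d, q in zip(PRICE, DESIGN, PERFORMANCE)]


def solve(matrix, rhs):
    """Gaussian elimination with partial pivoting for a small square system."""
    n = len(rhs)
    aug = [[float(v) for v in row] + [float(rhs[i])] for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(aug[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (aug[r][n] - tail) / aug[r][r]
    return x


def fit_multiple_regression(columns, ys):
    """Least squares coefficients [b0, ..., bk] via the normal equations."""
    rows = [[1.0] + [col[i] for col in columns] for i in range(len(ys))]
    p = len(rows[0])
    xtx = [[sum(row[i] * row[j] for row in rows) for j in range(p)] for i in range(p)]
    xty = [sum(row[i] * y for row, y in zip(rows, ys)) for i in range(p)]
    return solve(xtx, xty)


# Transform as on the slide: L = Price^3, M = Design^2, N = Performance.
L = [p ** 3 for p in PRICE]
M = [d ** 2 for d in DESIGN]
N = list(PERFORMANCE)
b0, b1, b2, b3 = fit_multiple_regression([L, M, N], SALES)
print([round(v, 6) for v in (b0, b1, b2, b3)])
```

The key point is that "linear" regression means linear in the coefficients. Nonlinear functions of the original variables are allowed once they are computed as new columns.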
  23. Pitfalls in Regression
     • A high correlation does not mean that one variable is causing a change in another. (Some regressions have shown a significantly positive relationship between individuals' college GPA and future salary.)
     • Values of the independent variable that are above or below the range in the sample should not be used for prediction (extrapolation).
     • The number of independent variables that can be used in the model is limited by the number of observations.
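To make the extrapolation pitfall concrete, the following sketch refits the farm population line from slides 2 and 7. It then guards predictions so that years outside the sampled range (1935 to 1965) are rejected by default. The `predict` helper and its `allow_extrapolation` flag are hypothetical names used only for illustration. Forcing a prediction for the year 2000 yields a negative population, which is impossible.

```python
# Farm population example (slide 2).
YEARS = [1935, 1940, 1945, 1950, 1955, 1960, 1965]
POPULATION = [32.1, 30.5, 24.4, 23.0, 19.1, 15.6, 12.5]


def fit_line(xs, ys):
    """Least squares intercept and slope for Y' = a + bX."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    b = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
         / sum((x - mean_x) ** 2 for x in xs))
    return mean_y - b * mean_x, b


A, B = fit_line(YEARS, POPULATION)


def predict(year, allow_extrapolation=False):
    """Predicted farm population (millions); refuses years outside the sample unless forced."""
    lo, hi = min(YEARS), max(YEARS)
    if not allow_extrapolation and not lo <= year <= hi:
        raise ValueError(f"year {year} is outside the sampled range {lo}-{hi}")
    return A + B * year


print(round(predict(1950), 2))                            # 22.46, inside the sample range
print(round(predict(2000, allow_extrapolation=True), 2))  # -11.08, an impossible population
```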
