Lesson07_new

Published in: Education, Technology, Business


Transcript

  • 1. Statistics for Management, Lesson 6: The Linear Regression Model and Correlation
  • 2. Lesson Topics
    • Types of Regression Models
    • Determining the Simple Linear Regression Equation
    • Measures of Variation in Regression and Correlation
    • Estimation of Predicted Values
    • The Multiple Regression Model
  • 3. 1. Purpose of Regression and Correlation Analysis
    • Regression Analysis is Used Primarily for Prediction
    • A statistical model used to predict the values of a dependent or response variable based on values of at least one independent or explanatory variable
    • Correlation Analysis is Used to Measure Strength of the Association Between Numerical Variables
  • 4. The Scatter Diagram: plot of all $(X_i, Y_i)$ pairs
  • 5. Types of Regression Models (four scatter-plot panels): positive linear relationship; negative linear relationship; relationship not linear; no relationship
  • 6. 2. Simple Linear Regression Model
    • The Straight Line That Best Fits the Data
    • Relationship Between Variables Is a Linear Function: $Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i$
    • $Y_i$ = dependent (response) variable; $X_i$ = independent (explanatory) variable; $\beta_0$ = Y intercept; $\beta_1$ = slope; $\varepsilon_i$ = random error
  • 7. Population Linear Regression Model: each observed value is $Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i$, where $\varepsilon_i$ is the random error; the population regression line is $\mu_{YX} = \beta_0 + \beta_1 X_i$
  • 8. Sample Linear Regression Model: $\hat{Y}_i = b_0 + b_1 X_i$
    • $\hat{Y}_i$ = predicted value of Y for observation i
    • $X_i$ = value of X for observation i
    • $b_0$ = sample Y intercept, used as an estimate of the population $\beta_0$
    • $b_1$ = sample slope, used as an estimate of the population $\beta_1$
  • 9. Simple Linear Regression Equation: Example. You wish to examine the relationship between the square footage of produce stores and their annual sales. Sample data for 7 stores were obtained. Find the equation of the straight line that best fits the data.
    Store   Square Feet   Annual Sales ($000)
    1       1,726         3,681
    2       1,542         3,395
    3       2,816         6,653
    4       5,555         9,543
    5       1,292         3,318
    6       2,208         5,563
    7       1,313         3,760
  • 10. Scatter Diagram Example (Excel output)
  • 11. Equation for the Best Straight Line (from SPSS printout): $\hat{Y}_i = 1636.415 + 1.487 X_i$
  • 12. Graph of the Best Straight Line: $\hat{Y}_i = 1636.415 + 1.487 X_i$
  • 13. Interpreting the Results: $\hat{Y}_i = 1636.415 + 1.487 X_i$. The slope of 1.487 means that for each one-unit increase in X, Y is estimated to increase by 1.487 units. Because sales are recorded in $000, for each additional square foot of store size the model predicts that expected annual sales increase by $1,487.
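The fitted line quoted on the slides can be reproduced from the seven data points with the usual least-squares formulas, $b_1 = \sum (X_i - \bar{X})(Y_i - \bar{Y}) / \sum (X_i - \bar{X})^2$ and $b_0 = \bar{Y} - b_1 \bar{X}$. The following is a minimal sketch rather than the Excel or SPSS procedure used in the lesson; the function and variable names are illustrative assumptions.

```python
# Produce-store data as listed on slide 9:
# square feet (X) and annual sales in $000 (Y).
square_feet = [1726, 1542, 2816, 5555, 1292, 2208, 1313]
annual_sales = [3681, 3395, 6653, 9543, 3318, 5563, 3760]


def fit_simple_regression(x, y):
    """Return (b0, b1) for the least-squares line y = b0 + b1 * x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    # Sum of cross-products and sum of squares of X around its mean.
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    return b0, b1


b0, b1 = fit_simple_regression(square_feet, annual_sales)
print(f"b0 = {b0:.3f}, b1 = {b1:.3f}")
```

Rounded to three decimals, the slope agrees with the 1.487 reported on the slides, and the intercept lands close to 1636.415.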
  • 14. Standard Error of Estimate: $S_{YX} = \sqrt{\frac{SSE}{n-2}} = \sqrt{\frac{\sum (Y_i - \hat{Y}_i)^2}{n-2}}$, the standard deviation of the variation of observations around the regression line
  • 15. Inferences about the Slope: t Test
    • t Test for a Population Slope: Is There a Linear Relationship Between X and Y?
    • Null and Alternative Hypotheses
    • $H_0: \beta_1 = 0$ (no linear relationship); $H_1: \beta_1 \neq 0$ (linear relationship)
    • Test Statistic: $t = \frac{b_1 - \beta_1}{S_{b_1}}$, where $S_{b_1} = \frac{S_{YX}}{\sqrt{\sum (X_i - \bar{X})^2}}$, and df = n - 2
  • 16. Example: Produce Stores. Data for 7 stores: as listed on slide 9. Regression model obtained: $\hat{Y}_i = 1636.415 + 1.487 X_i$. The slope of this model is 1.487. Is there a linear relationship between the square footage of a store and its annual sales?
  • 17. Inferences about the Slope: t Test Example
    • $H_0: \beta_1 = 0$
    • $H_1: \beta_1 \neq 0$
    • $\alpha = .05$
    • df = 7 - 2 = 5
    • Critical Value(s): $t = \pm 2.5706$ (.025 in each rejection tail)
    • Test Statistic: from Excel printout
    • Decision: Reject $H_0$
    • Conclusion: There is evidence of a relationship.
  • 18. Inferences about the Slope: Confidence Interval Example. Confidence interval estimate of the slope: $b_1 \pm t_{n-2} S_{b_1}$. From the Excel printout for produce stores, at the 95% level of confidence the interval for the slope is (1.062, 1.911), which does not include 0. Conclusion: There is a significant linear relationship between annual sales and the size of the store.
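The t test and the confidence interval for the slope both rest on the standard error $S_{b_1} = S_{YX} / \sqrt{\sum (X_i - \bar{X})^2}$. The sketch below recomputes these quantities from the seven data points, assuming the critical value 2.5706 (df = 5) quoted on the slides instead of looking it up in a statistics library. Variable names are illustrative.

```python
import math

# Produce-store data from slide 9 (square feet, annual sales in $000).
square_feet = [1726, 1542, 2816, 5555, 1292, 2208, 1313]
annual_sales = [3681, 3395, 6653, 9543, 3318, 5563, 3760]
T_CRIT = 2.5706  # two-tailed critical t for alpha = .05, df = 5 (from the slides)

n = len(square_feet)
x_bar = sum(square_feet) / n
y_bar = sum(annual_sales) / n
sxx = sum((x - x_bar) ** 2 for x in square_feet)
sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(square_feet, annual_sales))
b1 = sxy / sxx
b0 = y_bar - b1 * x_bar

# Standard error of the estimate and standard error of the slope.
sse = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(square_feet, annual_sales))
s_yx = math.sqrt(sse / (n - 2))
s_b1 = s_yx / math.sqrt(sxx)

# t statistic for H0: beta1 = 0, and the 95% confidence interval for beta1.
t_stat = b1 / s_b1
ci_low = b1 - T_CRIT * s_b1
ci_high = b1 + T_CRIT * s_b1
reject_h0 = abs(t_stat) > T_CRIT

print(f"t = {t_stat:.2f}, reject H0: {reject_h0}")
print(f"95% CI for slope: ({ci_low:.3f}, {ci_high:.3f})")
```

The two routes agree, as they must: the interval excludes 0 exactly when the t statistic falls in a rejection region.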
  • 19. 3. Measures of Variation: The Sum of Squares
    • SST = Total Sum of Squares
      • measures the variation of the $Y_i$ values around their mean $\bar{Y}$
    • SSR = Regression Sum of Squares
      • explained variation attributable to the relationship between X and Y
    • SSE = Error Sum of Squares
      • variation attributable to factors other than the relationship between X and Y
  • 20. Measures of Variation: The Sum of Squares (diagram of the fitted line $\hat{Y}_i = b_0 + b_1 X_i$ with the three deviations at a point $X_i$): $SST = \sum (Y_i - \bar{Y})^2$, $SSR = \sum (\hat{Y}_i - \bar{Y})^2$, $SSE = \sum (Y_i - \hat{Y}_i)^2$
  • 21. Measures of Variation, the Sum of Squares: Example (SPSS output for produce stores, with SSR, SSE, and SST marked)
  • 22. The Coefficient of Determination: $r^2 = \frac{SSR}{SST} = \frac{\text{regression sum of squares}}{\text{total sum of squares}}$. Measures the proportion of variation in Y that is explained by the independent variable X in the regression model
  • 23. Coefficients of Determination ($r^2$) and Correlation ($r$): four panels, each showing the line $\hat{Y}_i = b_0 + b_1 X_i$: $r^2 = 1$, $r = +1$; $r^2 = 1$, $r = -1$; $r^2 = .8$, $r = +0.9$; $r^2 = 0$, $r = 0$
  • 24. Correlation: Measuring the Strength of Association
    • Answers 'How Strong Is the Linear Relationship Between 2 Variables?'
    • Coefficient of Correlation Used
      • Population correlation coefficient denoted $\rho$ ('rho')
      • Values range from -1 to +1
      • Measures degree of association
    • Is the Square Root of the Coefficient of Determination
  • 25. Measures of Variation: Example (Excel output for produce stores, with $r^2$ and $S_{YX}$ marked). $r^2 = .94$: 94% of the variation in annual sales can be explained by the variability in the size of the store as measured by square footage
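The decomposition SST = SSR + SSE and the resulting $r^2 = SSR / SST$ can be checked directly on the produce-store data. This is a small sketch under the same assumptions as the slides (a simple linear fit to the seven stores); the variable names are illustrative and not part of the lesson.

```python
import math

# Produce-store data from slide 9 (square feet, annual sales in $000).
square_feet = [1726, 1542, 2816, 5555, 1292, 2208, 1313]
annual_sales = [3681, 3395, 6653, 9543, 3318, 5563, 3760]

n = len(square_feet)
x_bar = sum(square_feet) / n
y_bar = sum(annual_sales) / n
sxx = sum((x - x_bar) ** 2 for x in square_feet)
sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(square_feet, annual_sales))
b1 = sxy / sxx
b0 = y_bar - b1 * x_bar
fitted = [b0 + b1 * x for x in square_feet]

# The three sums of squares defined on slides 19 and 20.
sst = sum((y - y_bar) ** 2 for y in annual_sales)
ssr = sum((y_hat - y_bar) ** 2 for y_hat in fitted)
sse = sum((y - y_hat) ** 2 for y, y_hat in zip(annual_sales, fitted))

r_squared = ssr / sst
# r carries the sign of the slope.
r = math.copysign(math.sqrt(r_squared), b1)
s_yx = math.sqrt(sse / (n - 2))

print(f"SST = {sst:.1f}, SSR = {ssr:.1f}, SSE = {sse:.1f}")
print(f"r^2 = {r_squared:.2f}, r = {r:.2f}, S_YX = {s_yx:.2f}")
```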
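  • 26. 4. Estimation of Predicted Values. Confidence interval estimate for $\mu_{YX}$, the mean of Y given a particular $X_i$: $\hat{Y}_i \pm t_{n-2} S_{YX} \sqrt{\frac{1}{n} + \frac{(X_i - \bar{X})^2}{\sum (X_i - \bar{X})^2}}$, where $t_{n-2}$ is the t value from the table with df = n - 2 and $S_{YX}$ is the standard error of the estimate. The size of the interval varies with the distance of $X_i$ from the mean $\bar{X}$.
  • 27. Estimation of Predicted Values. Confidence interval estimate for an individual response $Y_i$ at a particular $X_i$: $\hat{Y}_i \pm t_{n-2} S_{YX} \sqrt{1 + \frac{1}{n} + \frac{(X_i - \bar{X})^2}{\sum (X_i - \bar{X})^2}}$. The added 1 under the square root makes this interval wider than the one for the mean of Y.
  • 28. Interval Estimates for Different Values of X (plot of the line $\hat{Y}_i = b_0 + b_1 X_i$ with two bands around it): confidence interval for the mean of Y and confidence interval for an individual $Y_i$, at a given X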
  • 29. Example: Produce Stores. Data for 7 stores: as listed on slide 9. Regression model obtained: $\hat{Y}_i = 1636.415 + 1.487 X_i$. Predict the annual sales for a store with 2,000 square feet.
  • 30. Estimation of Predicted Values: Example. Confidence interval estimate for $\mu_{YX}$. Find the 95% confidence interval for the average annual sales for stores of 2,000 square feet. Predicted sales: $\hat{Y}_i = 1636.415 + 1.487 X_i = 4610.45$ ($000). With $\bar{X} = 2350.29$, $S_{YX} = 611.75$, and $t_{n-2} = t_5 = 2.5706$, the confidence interval for the mean of Y is $4610.45 \pm 980.97$.
  • 31. Estimation of Predicted Values: Example. Confidence interval estimate for an individual Y. Find the 95% confidence interval for annual sales of one particular store of 2,000 square feet. Predicted sales: $\hat{Y}_i = 1636.415 + 1.487 X_i = 4610.45$ ($000). With $\bar{X} = 2350.29$, $S_{YX} = 611.75$, and $t_{n-2} = t_5 = 2.5706$, the confidence interval for the individual Y is $4610.45 \pm 1853.45$.
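The two interval formulas differ only by the extra 1 under the square root, which a short computation makes concrete. The sketch below applies both to a 2,000-square-foot store, assuming the slide's critical value 2.5706 for df = 5. The half-widths it prints are whatever the seven listed data points produce, so they may not match the figures quoted on the slides, which come from a printout that is not part of this transcript.

```python
import math

# Produce-store data from slide 9 (square feet, annual sales in $000).
square_feet = [1726, 1542, 2816, 5555, 1292, 2208, 1313]
annual_sales = [3681, 3395, 6653, 9543, 3318, 5563, 3760]
T_CRIT = 2.5706  # t value for 95% confidence, df = 5 (from the slides)
X_NEW = 2000     # store size at which to predict

n = len(square_feet)
x_bar = sum(square_feet) / n
y_bar = sum(annual_sales) / n
sxx = sum((x - x_bar) ** 2 for x in square_feet)
sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(square_feet, annual_sales))
b1 = sxy / sxx
b0 = y_bar - b1 * x_bar

sse = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(square_feet, annual_sales))
s_yx = math.sqrt(sse / (n - 2))

y_hat = b0 + b1 * X_NEW
# Term shared by both intervals: grows as X_NEW moves away from x_bar.
distance_term = 1 / n + (X_NEW - x_bar) ** 2 / sxx

half_width_mean = T_CRIT * s_yx * math.sqrt(distance_term)
half_width_individual = T_CRIT * s_yx * math.sqrt(1 + distance_term)

print(f"predicted sales: {y_hat:.2f} ($000)")
print(f"mean of Y:    +/- {half_width_mean:.2f}")
print(f"individual Y: +/- {half_width_individual:.2f}")
```

Whatever the data, the interval for an individual store is always the wider of the two, because it adds the store-to-store scatter to the uncertainty in the fitted line.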
  • 32. 5. Multiple Regression Model
    • The Multiple Regression Model
    • Coefficient of Determination
    • Model Building
  • 33. The Multiple Regression Model. The relationship between 1 dependent and 2 or more independent variables is a linear function. Population model: $Y_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + \dots + \beta_p X_{pi} + \varepsilon_i$, where $\beta_0$ is the population Y-intercept, $\beta_1, \dots, \beta_p$ are the population slopes, and $\varepsilon_i$ is the random error. Sample model: $Y_i = b_0 + b_1 X_{1i} + b_2 X_{2i} + \dots + b_p X_{pi} + e_i$, where $Y_i$ is the dependent (response) variable and $X_{1i}, \dots, X_{pi}$ are the independent (explanatory) variables for the sample
  • 34. Sample Multiple Regression Model (three-dimensional plot with axes $X_1$, $X_2$, and $Y$, showing a residual $e_i$)
  • 35. Multiple Regression Model: Example. Develop a model for estimating heating oil used for a single-family home in the month of January based on average temperature (°F) and amount of insulation in inches.
  • 36. Sample Regression Model: Example (Excel output). For each one-degree increase in temperature, the estimated average amount of heating oil used decreases by 5.437 gallons, holding insulation constant. For each one-inch increase in insulation, the estimated average use of heating oil decreases by 20.012 gallons, holding temperature constant.
  • 37. Using the Model to Make Predictions. Estimate the average amount of heating oil used for a home if the average temperature is 30 °F and the insulation is 6 inches. The estimated heating oil used is 278.97 gallons.
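A prediction from the sample multiple regression model is just the intercept plus each slope times its variable. The transcript gives the two slopes and one prediction but not the intercept, so the sketch below backs the intercept out of the stated prediction; that value is an inference from the slide figures, not a number read from the Excel output. The function name is illustrative.

```python
# Slopes stated on slide 36 (gallons per unit change).
B1_TEMPERATURE = -5.437   # per degree F, holding insulation constant
B2_INSULATION = -20.012   # per inch, holding temperature constant

# Prediction stated on slide 37: 278.97 gallons at 30 degrees F and 6 inches.
STATED_PREDICTION = 278.97

# ASSUMPTION: the intercept is not in the transcript, so it is inferred here
# from the stated prediction and the two slopes.
b0_implied = STATED_PREDICTION - B1_TEMPERATURE * 30 - B2_INSULATION * 6


def predict_oil(temp_f, insulation_in):
    """Estimated January heating oil use (gallons) under the inferred model."""
    return b0_implied + B1_TEMPERATURE * temp_f + B2_INSULATION * insulation_in


print(f"implied intercept: {b0_implied:.3f}")
print(f"30 F, 6 in: {predict_oil(30, 6):.2f} gallons")
print(f"31 F, 6 in: {predict_oil(31, 6):.2f} gallons")
```

Comparing the last two lines shows the slope interpretation from slide 36: one extra degree, with insulation held at 6 inches, lowers the estimate by 5.437 gallons.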
  • 38. Coefficient of Multiple Determination (SPSS output)
    • Adjusted $r^2$
    • reflects the number of explanatory variables and sample size
    • is smaller than $r^2$
  • 39. Testing for Overall Significance
    • Shows if there is a linear relationship between all of the X variables together and Y
    • Use F test Statistic
    • Hypotheses:
      • $H_0: \beta_1 = \beta_2 = \dots = \beta_p = 0$ (No linear relationship)
      • $H_1$: At least one $\beta_i \neq 0$ (At least one independent variable affects Y)
  • 40. Test for Overall Significance: Example (Excel output). p = 2, the number of explanatory variables; total degrees of freedom = n - 1; F test statistic = MSR / MSE; the p value is read from the same output
  • 41. Test for Overall Significance: Example Solution
    • $H_0: \beta_1 = \beta_2 = \dots = \beta_p = 0$
    • $H_1$: At least one $\beta_i \neq 0$
    • $\alpha = .05$
    • df = 2 and 12
    • Critical Value(s): F = 3.89 (upper-tail area $\alpha = 0.05$)
    • Test Statistic: F = 168.47 (Excel output)
    • Decision: Reject $H_0$ at $\alpha = 0.05$
    • Conclusion: There is evidence that at least one independent variable affects Y.
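The overall F test reduces to comparing the statistic with the critical value, and the same F value also pins down $r^2$ through the identity $F = \frac{r^2 / p}{(1 - r^2)/(n - p - 1)}$. The sketch below uses only the figures on the slides (F = 168.47, critical value 3.89, df = 2 and 12); the sample size n = 15 and the $r^2$ values are derived from those figures under that identity, not read from the SPSS output, which is missing from the transcript.

```python
F_STAT = 168.47   # F test statistic from the Excel output (slide 41)
F_CRIT = 3.89     # critical value for alpha = .05 with df = 2 and 12
P = 2             # number of explanatory variables
DF_ERROR = 12     # error degrees of freedom, n - p - 1

# ASSUMPTION: sample size inferred from the degrees of freedom.
n = DF_ERROR + P + 1

# Decision rule for H0: all slopes are zero.
reject_h0 = F_STAT > F_CRIT

# r^2 recovered from F, then adjusted for the number of explanatory
# variables and the sample size.
r_squared = (P * F_STAT) / (P * F_STAT + DF_ERROR)
adjusted_r_squared = 1 - (1 - r_squared) * (n - 1) / (n - P - 1)

print(f"n = {n}, reject H0: {reject_h0}")
print(f"r^2 = {r_squared:.4f}, adjusted r^2 = {adjusted_r_squared:.4f}")
```

The adjusted value comes out below $r^2$, as slide 38 states.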
  • 42. Test for Significance: Individual Variables
    • Shows if there is a linear relationship between the variable $X_i$ and Y
    • Use t test Statistic
    • Hypotheses:
      • $H_0: \beta_i = 0$ (No linear relationship)
      • $H_1: \beta_i \neq 0$ (Linear relationship between $X_i$ and Y)
  • 43. t Test Statistic: Example (Excel output, with the t test statistic for $X_1$ (temperature) and the t test statistic for $X_2$ (insulation) marked)
  • 44. t Test: Example Solution. Does temperature have a significant effect on monthly consumption of heating oil? Test at $\alpha = 0.05$.
    • $H_0: \beta_1 = 0$; $H_1: \beta_1 \neq 0$
    • df = 12; Critical Value(s): $t = \pm 2.1788$ (.025 in each rejection tail)
    • Test Statistic: t = -16.1699
    • Decision: Reject $H_0$ at $\alpha = 0.05$
    • Conclusion: There is evidence of a significant effect of temperature on oil consumption.
  • 45. Confidence Interval Estimate for the Slope. Provide the 95% confidence interval for the population slope $\beta_1$ (the effect of temperature on oil consumption): $-6.169 \leq \beta_1 \leq -4.704$. The average consumption of oil is reduced by between 4.70 and 6.17 gallons for each increase of 1 °F.
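The interval for the temperature slope follows from $b_1 \pm t_{n-p-1} S_{b_1}$. The standard error $S_{b_1}$ is not shown in the transcript, so this sketch recovers it from the stated slope and t statistic ($S_{b_1} = b_1 / t$), which is an inference from the slide figures, not a value read from the Excel output.

```python
B1_TEMPERATURE = -5.437   # sample slope for temperature (slide 36)
T_STAT = -16.1699         # t test statistic for temperature (slide 44)
T_CRIT = 2.1788           # two-tailed critical t, alpha = .05, df = 12 (slide 44)

# ASSUMPTION: standard error of the slope backed out of t = b1 / S_b1.
s_b1 = B1_TEMPERATURE / T_STAT

margin = T_CRIT * s_b1
ci_low = B1_TEMPERATURE - margin
ci_high = B1_TEMPERATURE + margin

# The test rejects H0: beta1 = 0 exactly when the interval excludes zero.
reject_h0 = abs(T_STAT) > T_CRIT
excludes_zero = ci_low > 0 or ci_high < 0

print(f"S_b1 = {s_b1:.4f}")
print(f"95% CI for beta1: ({ci_low:.3f}, {ci_high:.3f})")
print(f"reject H0: {reject_h0}, interval excludes 0: {excludes_zero}")
```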
  • 46. Model Building
    • Goal is to Develop Model with Fewest Explanatory Variables
      • Easier to interpret
      • Lower probability of collinearity
    • Stepwise Regression Procedure
      • Provides limited evaluation of alternative models
    • Best-Subset Approach
  • 47. Lesson Summary
    • Described Types of Regression Models
    • Determined the Simple Linear Regression Equation
    • Provided Measures of Variation in Regression and Correlation
    • Provided Estimation of Predicted Values
    • Determined the Multiple Regression equation