Presented by Harsh Upadhyay
Simple linear regression (final)
Published in: Education
Transcript of "Simple linear regression (final)"

  1. Presented by Harsh Upadhyay
  2. Module Objective
     Agenda
     • Introduce the concept of Simple Linear Regression
     • Walk through the process of plotting our data
     • Apply regression techniques
     • Evaluate our model
     • Interpret the results
     Expected Learning
     • Understand key Simple Linear Regression terminology
     • Evaluate the relationship between a continuous X and a continuous Y
     • Use regression analysis to make predictions about a process
  3. Historical Note
     • Sir Francis Galton (1822-1911) used the term "regression" to explain the relationship between the heights (in inches) of fathers and their sons.
  4. Simple Linear Regression
     Regression analysis is used to predict the value of one variable (the dependent variable) on the basis of other variables (the independent variables).
     Dependent variable: denoted Y
     Independent variables: denoted X1, X2, …, Xk
     If we have only ONE independent variable, the model is
         Y = β0 + β1 X + ε
  5. Simple Linear Regression Analysis
     Variables:
       X = Independent variable (we provide this)
       Y = Dependent variable (we observe this)
     Parameters:
       β0 = Intercept: the point at which the line crosses the y axis (i.e. where x equals 0)
       β1 = Slope: the change in the mean of Y for a unit change in X
       ε ~ Normal random variable (με = 0, σε = ???) [Noise]
  6. Simple Linear Regression Analysis
     Meaning of β0 and β1:
       β1 > 0 [positive slope]
       β1 < 0 [negative slope]
       β1 = slope (= rise/run)
       β0 = y-intercept
     [Figure: a line on x-y axes with the rise and run marked]
  7. Effect of Larger Values of σε
     Lower vs. higher variability around the line Y = β0 + β1 X
     [Figure: lower vs. higher variability]
  8. The Least Squares Method
     This line minimizes the sum of the squared differences between the points and the line. These differences are called residuals or errors.
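The least-squares idea on this slide can be sketched in a few lines of Python. This is a minimal illustration, not the deck's method: the data points are made up (the call-center data sheet is not part of the transcript), and the names `fit_line` and `sum_squared_residuals` are hypothetical.

```python
def fit_line(xs, ys):
    """Return (intercept, slope) of the line that minimizes the sum of squared residuals."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    # Closed-form least-squares solution for a single predictor
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope


def sum_squared_residuals(xs, ys, intercept, slope):
    """Sum of squared vertical distances between the points and the line."""
    return sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))


# Made-up illustration data (assumption, not from the slides)
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]

b0, b1 = fit_line(xs, ys)
best = sum_squared_residuals(xs, ys, b0, b1)
# Any other line, e.g. one with a slightly different slope, has a larger sum
worse = sum_squared_residuals(xs, ys, b0, b1 + 0.1)
print(f"intercept={b0:.2f}, slope={b1:.2f}, SSE={best:.3f}, SSE(other line)={worse:.3f}")
```

Comparing `best` with `worse` shows the defining property: moving the line away from the fitted one increases the sum of squared residuals.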
  9. Exercise
     A Black Belt is tasked with optimizing a call center in a retail bank where clients place inquiries and orders over the phone from 6 am to 6 pm (Monday to Friday). The current staffing plan begins with about 4 associates at 6 am and increases to about 35 associates by 9 am. At 3:30 pm the number of associates begins to drop, reaching about 7 by 6 pm. The Black Belt is anticipating an increase in call volume and wants to know how many calls can be answered in a 30-minute interval at various staffing levels, in order to staff the call center better.
     The Black Belt obtains data on the number of associates and the number of calls answered for each 30-minute interval over the last two weeks. You have a total of 240 samples.
     [Data sheet]
  10. Scatter Plot
  11. Fitted Line Plot
      Regression Analysis: CallsAnswd versus Staff

      The regression equation is
      CallsAnswd = - 8.844 + 3.099 Staff

      S = 11.7311   R-Sq = 83.4%   R-Sq(adj) = 83.3%

      Analysis of Variance
      Source       DF      SS      MS        F      P
      Regression    1  164091  164091  1192.37  0.000
      Error       238   32753     138
      Total       239  196844

      Fitted Line: CallsAnswd versus Staff
  12. Minitab Output: Session Window
      CallsAnswd = - 8.844 + 3.099 Staff
      CallsAnswd = - 8.844 + 3.099 (17) = 43.839
      CallsAnswd = - 8.844 + 3.099 (22) = 59.334
      A staff of 22 gives about 59 calls. How confident are you in this model?
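The substitutions on this slide can be reproduced with a small helper. This is only a sketch: the coefficients are the ones printed on the slide, while the function name `predict_calls` is a hypothetical choice.

```python
# Coefficients as printed in the Minitab session window on the slide
INTERCEPT = -8.844
SLOPE = 3.099


def predict_calls(staff):
    """Point prediction of calls answered per 30-minute interval."""
    return INTERCEPT + SLOPE * staff


for staff in (17, 22):
    print(f"Staff = {staff}: predicted CallsAnswd = {predict_calls(staff):.3f}")
```

A point prediction like this says nothing about its own uncertainty, which is the question the slide raises.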
  13. Assessing the Model
      Evaluate the strength of the regression model:
      1. Determine how much variation in our calls-answered data is actually explained by staff.
      2. Determine the strength of the relationship between Calls Answered and Staff.

      Regression Analysis: CallsAnswd versus Staff

      The regression equation is
      CallsAnswd = - 8.844 + 3.099 Staff

      S = 11.7311   R-Sq = 83.4%   R-Sq(adj) = 83.3%

      Analysis of Variance
      Source       DF      SS      MS        F      P
      Regression    1  164091  164091  1192.37  0.000
      Error       238   32753     138
      Total       239  196844

      Fitted Line: CallsAnswd versus Staff

      R-Sq: the % of variation in the Y values explained by the linear relationship with X. Here it is the % of variation in Calls Answered explained by Staff.
  14. % of Variation: Explanation
      R Squared = Explained Variation / Total Variation
      R Squared lies between 0 and 1 (0% to 100%).
      [Figure: explained variation in Y compared with total variation in Y]
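As a quick check of this ratio, the sums of squares from the Analysis of Variance table on slide 11 can be plugged in directly. The variable names below are illustrative.

```python
# Sums of squares from the Analysis of Variance table (slide 11)
ss_regression = 164091  # explained variation
ss_error = 32753        # unexplained variation
ss_total = 196844       # total variation

r_squared = ss_regression / ss_total
unexplained = ss_error / ss_total

print(f"R-squared   = {r_squared:.1%}")
print(f"Unexplained = {unexplained:.1%}")
```

The explained and unexplained sums of squares add up to the total, so the two percentages add up to 100%.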
  15. Properties of R: The Correlation Coefficient
      √R² = |R|; R takes the sign of the slope.
      R measures the strength of the linear relationship (Pearson's correlation coefficient).

      R range          Relationship
      -1.0 to -0.8     Strong
      -0.8 to -0.5     Moderate
      -0.5 to  0       Low
       0   to  0.5     Low
       0.5 to  0.8     Moderate
       0.8 to +1.0     Strong
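For the call-center model, R can be recovered from R-squared and the sign of the slope. The sketch below uses the sums of squares and slope from the Minitab output; the helper `strength` is hypothetical, and how it treats values exactly at 0.5 and 0.8 is an assumption, since the slide's scale does not say which band the boundaries belong to.

```python
import math

r_squared = 164091 / 196844  # from the ANOVA table (slide 11)
slope = 3.099                # from the regression equation

# In simple linear regression, R has the same sign as the slope
r = math.copysign(math.sqrt(r_squared), slope)


def strength(r_value):
    """Map a correlation coefficient to the slide's strength bands.

    Boundary handling (>= 0.8, >= 0.5) is an assumption.
    """
    magnitude = abs(r_value)
    if magnitude >= 0.8:
        return "strong"
    if magnitude >= 0.5:
        return "moderate"
    return "low"


print(f"R = {r:.3f} ({strength(r)} relationship)")
```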
  16. Caution: Correlation and Causation
      Calls Answered and Staff are correlated.
      Correlation: two things vary together.
      Causation: a change in one variable causes a change in another.
  17. Regression Interpretation: Graphical
      • The residuals are not normally distributed
      • Residuals vs. fits: the spread is not constant
  18. Regression: Session Window
      Regression Analysis: CallsAnswd versus Staff

      The regression equation is
      CallsAnswd = - 8.84 + 3.10 Staff

      Predictor      Coef  SE Coef      T      P
      Constant     -8.844    2.247  -3.94  0.000
      Staff       3.09947  0.08976  34.53  0.000

      S = 11.7311   R-Sq = 83.4%   R-Sq(adj) = 83.3%

      Analysis of Variance
      Source           DF      SS      MS        F      P
      Regression        1  164091  164091  1192.37  0.000
      Residual Error  238   32753     138
      Total           239  196844

      Unusual Observations
      Obs  Staff  CallsAnswd     Fit  SE Fit  Residual  St Resid
        1    3.0       6.000   0.454   1.995     5.546      0.48 X
        3    4.0       7.000   3.554   1.913     3.446      0.30 X
        6    4.0      11.000   3.554   1.913     7.446      0.64 X
        7    4.0      12.000   3.554   1.913     8.446      0.73 X
        8    3.0       5.000   0.454   1.995     4.546      0.39 X
        9    3.0       7.000   0.454   1.995     6.546      0.57 X
       67   29.0     109.000  81.040   0.901    27.960      2.39R
       78   32.0     127.000  90.339   1.071    36.661      3.14R
      141   30.0     134.000  84.140   0.952    49.860      4.26R
      151   30.0     114.000  84.140   0.952    29.860      2.55R
      161   33.0     128.000  93.438   1.136    34.562      2.96R
      171   33.0     126.000  93.438   1.136    32.562      2.79R
      183   32.0      59.000  90.339   1.071   -31.339     -2.68R
      199   29.0      48.000  81.040   0.901   -33.040     -2.82R
      202   25.0      35.000  68.643   0.768   -33.643     -2.87R
      235    4.0      14.000   3.554   1.913    10.446      0.90 X
      236    4.0       7.000   3.554   1.913     3.446      0.30 X

      R denotes an observation with a large standardized residual (>2)
      X denotes an observation whose X value gives it large leverage
  19. Unusual Observations
      Obs  Staff  CallsAnswd     Fit  SE Fit  Residual  St Resid
        1    3.0       6.000   0.454   1.995     5.546      0.48 X
        3    4.0       7.000   3.554   1.913     3.446      0.30 X
        6    4.0      11.000   3.554   1.913     7.446      0.64 X
        7    4.0      12.000   3.554   1.913     8.446      0.73 X
        8    3.0       5.000   0.454   1.995     4.546      0.39 X
        9    3.0       7.000   0.454   1.995     6.546      0.57 X
       67   29.0     109.000  81.040   0.901    27.960      2.39R
       78   32.0     127.000  90.339   1.071    36.661      3.14R
      141   30.0     134.000  84.140   0.952    49.860      4.26R
      151   30.0     114.000  84.140   0.952    29.860      2.55R
      161   33.0     128.000  93.438   1.136    34.562      2.96R
      171   33.0     126.000  93.438   1.136    32.562      2.79R
      183   32.0      59.000  90.339   1.071   -31.339     -2.68R
      199   29.0      48.000  81.040   0.901   -33.040     -2.82R
      202   25.0      35.000  68.643   0.768   -33.643     -2.87R
      235    4.0      14.000   3.554   1.913    10.446      0.90 X
      236    4.0       7.000   3.554   1.913     3.446      0.30 X

      R denotes an observation with a large standardized residual (>2)
      X denotes an observation whose X value gives it large leverage

      • 240 data points
      • Unusual observations with large residuals (R): 9 points, about 4%
      • Observations flagged R are ≤ 5% of total observations
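The St Resid column can be reproduced from the Residual and SE Fit columns. The sketch below relies on a standard least-squares relationship that the slides do not state: an observation's leverage equals (SE Fit / S)², and the standardized residual is the residual divided by S·√(1 − leverage). The function name and the choice of sample rows are illustrative.

```python
import math

S = 11.7311  # standard error of the regression, from the session window


def standardized_residual(residual, se_fit, s=S):
    """Residual scaled by its estimated standard deviation.

    Assumes leverage = (se_fit / s) ** 2, a standard least-squares identity.
    """
    leverage = (se_fit / s) ** 2
    return residual / (s * math.sqrt(1.0 - leverage))


# (Obs, Residual, SE Fit) for a few rows of the Unusual Observations table
rows = [(1, 5.546, 1.995), (67, 27.960, 0.901), (141, 49.860, 0.952), (183, -31.339, 1.071)]

for obs, residual, se_fit in rows:
    st_resid = standardized_residual(residual, se_fit)
    flag = "R" if abs(st_resid) > 2 else ""
    print(f"Obs {obs:>3}: St Resid = {st_resid:6.2f} {flag}")

# Share of observations flagged R: 9 of the 240 data points
print(f"Flagged R: {9 / 240:.2%}")
```

Rows flagged X on the slide (such as observation 1) have small standardized residuals; they are listed because of their leverage, not because the line fits them badly.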
  20. Original vs. Altered
      [Figures: original graph and altered graph]
  21. Interpreting the Model
      Regression Analysis: CallsAnswd versus Staff

      The regression equation is
      CallsAnswd = - 8.84 + 3.10 Staff

      Predictor Table
      Predictor      Coef  SE Coef      T      P
      Constant     -8.844    2.247  -3.94  0.000
      Staff       3.09947  0.08976  34.53  0.000

      Additional Statistics
      S = 11.7311   R-Sq = 83.4%   R-Sq(adj) = 83.3%

      ANOVA Table
      Analysis of Variance
      Source           DF      SS      MS        F      P
      Regression        1  164091  164091  1192.37  0.000
      Residual Error  238   32753     138
      Total           239  196844
  22. Interpreting the Model: Predictor Table
      Regression Analysis: CallsAnswd versus Staff

      The regression equation is
      CallsAnswd = - 8.84 + 3.10 Staff

      Predictor      Coef  SE Coef      T      P
      Constant     -8.844    2.247  -3.94  0.000
      Staff       3.09947  0.08976  34.53  0.000

      S = 11.7311   R-Sq = 83.4%   R-Sq(adj) = 83.3%

      Analysis of Variance
      Source           DF      SS      MS        F      P
      Regression        1  164091  164091  1192.37  0.000
      Residual Error  238   32753     138
      Total           239  196844

      Ho: Slope = 0
      • No difference in Y when X changes
      • X has no impact on Y
      Ha: Slope ≠ 0
      • Y changes as X changes
      • X has an impact on Y
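The T column in the predictor table is each coefficient divided by its standard error, which is the statistic used to test Ho: Slope = 0. A small sketch using the values printed on the slide (the dictionary layout and function name are illustrative):

```python
# (Coef, SE Coef) pairs from the predictor table
predictors = {
    "Constant": (-8.844, 2.247),
    "Staff": (3.09947, 0.08976),
}


def t_statistic(coef, se_coef):
    """Number of standard errors the coefficient lies from zero."""
    return coef / se_coef


for name, (coef, se_coef) in predictors.items():
    print(f"{name:<8} T = {t_statistic(coef, se_coef):.2f}")
```

A slope more than 34 standard errors away from zero is consistent with the P value of 0.000 shown for Staff; computing the P value itself would need the t distribution with 238 degrees of freedom, which this sketch leaves out.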
  23. Interpreting the Model: Additional Statistics
      Regression Analysis: CallsAnswd versus Staff

      The regression equation is
      CallsAnswd = - 8.84 + 3.10 Staff

      Predictor      Coef  SE Coef      T      P
      Constant     -8.844    2.247  -3.94  0.000
      Staff       3.09947  0.08976  34.53  0.000

      S = 11.7311   R-Sq = 83.4%   R-Sq(adj) = 83.3%

      Analysis of Variance
      Source           DF      SS      MS        F      P
      Regression        1  164091  164091  1192.37  0.000
      Residual Error  238   32753     138
      Total           239  196844

      R-squared
      • The amount of variation in Y explained by the model
      R-squared (Adjusted)
      • Accounts for the number of X's used in the model (used in multiple regression)
      S
      • Standard deviation of the leftover or unexplained variation. It is used to construct the confidence intervals.
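These three statistics can be recomputed from the ANOVA table using the standard formulas. This is a sketch under the assumption that the rounded sums of squares on the slide are accurate enough; results may differ from Minitab's in the last digit.

```python
import math

n = 240            # number of observations
ss_error = 32753   # Residual Error SS
ss_total = 196844  # Total SS

df_error = n - 2   # one slope and one intercept estimated
df_total = n - 1

ms_error = ss_error / df_error
s = math.sqrt(ms_error)                       # standard deviation of the unexplained variation
r_sq = 1 - ss_error / ss_total                # R-squared
r_sq_adj = 1 - ms_error / (ss_total / df_total)  # adjusted for degrees of freedom

print(f"S = {s:.4f}   R-Sq = {r_sq:.1%}   R-Sq(adj) = {r_sq_adj:.1%}")
```

With a single predictor the adjustment is tiny (83.4% vs. 83.3%); it matters more when many X's are added to a model.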
  24. Interpreting the Model: ANOVA Table
      Regression Analysis: CallsAnswd versus Staff

      The regression equation is
      CallsAnswd = - 8.84 + 3.10 Staff

      Predictor      Coef  SE Coef      T      P
      Constant     -8.844    2.247  -3.94  0.000
      Staff       3.09947  0.08976  34.53  0.000

      S = 11.7311   R-Sq = 83.4%   R-Sq(adj) = 83.3%

      Analysis of Variance
      Source           DF      SS      MS        F      P
      Regression        1  164091  164091  1192.37  0.000
      Residual Error  238   32753     138
      Total           239  196844

      Regression
      • The explained variation
      Residual Error
      • The unexplained variation
      Ho:
      • The model does not explain the observed variation
      Ha:
      • The model does explain the observed variation
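The F value in the ANOVA table is the ratio of the two mean squares, each being a sum of squares divided by its degrees of freedom. A short sketch with the table's numbers (variable names are illustrative):

```python
# Sums of squares and degrees of freedom from the ANOVA table
ss_regression, df_regression = 164091, 1
ss_error, df_error = 32753, 238

ms_regression = ss_regression / df_regression  # explained variation per degree of freedom
ms_error = ss_error / df_error                 # unexplained variation per degree of freedom
f_statistic = ms_regression / ms_error

# With one predictor, F equals the square of the slope's t statistic
t_staff = 3.09947 / 0.08976

print(f"MS Regression = {ms_regression:.0f}, MS Error = {ms_error:.0f}")
print(f"F = {f_statistic:.2f}, t^2 = {t_staff ** 2:.2f}")
```

Because F is the square of the slope's t statistic here, the ANOVA test and the slope test in the predictor table lead to the same conclusion for a single X.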
  25. Confidence on Output
      CallsAnswd = -8.844 + 3.099 Staff
      If Staff = 10, then -8.844 + 3.099 (10) = 22
      If Staff = 20, then -8.844 + 3.099 (20) = 53
      If Staff = 30, then -8.844 + 3.099 (30) = 84
      Confidence = ???%
      R-squared = 83.4%; unexplained variation = 16.6%
      Report results using confidence intervals.
  26. Confidence on Output
      • 95% confidence interval: we are 95% confident that the true regression line lies within it.
      • 95% prediction interval: we expect 95% of the data points to fall within it.
  27. Confidence on Output
      • Confidence intervals report average performance.
      • Prediction intervals predict actual values.
  28. Predicted Value for Y
      Suppose management decides on a maximum of 20 staff. How many calls can these 20 associates answer?

      Predicted Values for New Observations
      New Obs     Fit  SE Fit            95% CI            95% PI
            1  53.145   0.822  (51.526, 54.765)  (29.979, 76.312)

      Values of Predictors for New Observations
      New Obs  Staff
            1   20.0
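Both intervals on this slide can be approximated from Fit, SE Fit, and S. The sketch below uses the standard formulas Fit ± t·SE Fit for the confidence interval and Fit ± t·√(S² + SE Fit²) for the prediction interval. The critical value `T_CRIT = 1.970` is an assumption: it is the approximate two-sided 95% t value for 238 degrees of freedom, hard-coded here rather than looked up, so the results match Minitab only to about two decimals.

```python
import math

S = 11.7311     # from the session window
FIT = 53.145    # predicted calls for Staff = 20
SE_FIT = 0.822  # standard error of the fitted mean
T_CRIT = 1.970  # assumed: approximate 95% two-sided t value, 238 df

# Confidence interval: uncertainty in the average calls answered at Staff = 20
ci_half = T_CRIT * SE_FIT
ci = (FIT - ci_half, FIT + ci_half)

# Prediction interval: adds the scatter of individual intervals around the line
se_pred = math.sqrt(S ** 2 + SE_FIT ** 2)
pi_half = T_CRIT * se_pred
pi = (FIT - pi_half, FIT + pi_half)

print(f"95% CI: ({ci[0]:.2f}, {ci[1]:.2f})")
print(f"95% PI: ({pi[0]:.2f}, {pi[1]:.2f})")
```

The prediction interval is much wider because it has to cover individual 30-minute intervals, not just the average performance at that staffing level.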
  29. Review Methodology
      1. Get familiar with the data
         • Identify the output or Y variable
         • Identify the X variable or predictor
         • Look at the time series, dot plots, histogram and scatter plot
         • Run descriptive statistics
           - Check for outliers
           - Check for gaps in the data
      2. Fit the model to the data
         • Run regression
         • Use the regression
      3. Check the model and the assumptions
         • Look at the residual plots
           - Are the assumptions valid?
           - Do the residuals look OK?
           - Are there unusual observations? If yes, address them, if possible, and rerun the analysis
         • What is R-squared?
         • How much variation can be explained by the model?
         • Look at P-values
           - Does X have a significant impact on Y?
      4. Report results and use the equation
         • Summarize the results for your stakeholders
         • If you have excluded data, explain why
         • If you kept outliers in the data, explain why
         • How much variation in Y can be explained by X?
         • Remember causation vs. correlation
         • Make predictions using confidence and prediction intervals
