# Simple linear regression (final)

Published in: Education
### Simple linear regression (final)

1. Presented by Harsh Upadhyay
2. Module Objective

    Agenda:
    - Introduce the concept of simple linear regression
    - Walk through the process of plotting our data
    - Apply regression techniques
    - Evaluate our model
    - Interpret the results

    Expected learning:
    - Understand key simple linear regression terminology
    - Evaluate the relationship between a continuous X and a continuous Y
    - Use regression analysis to make predictions about a process
3. Historical Note. Sir Francis Galton (1822-1911) used the term "regression" to explain the relationship between the heights (in inches) of fathers and their sons.
4. Simple Linear Regression. Regression analysis is used to predict the value of one variable (the dependent variable) on the basis of other variables (the independent variables). Dependent variable: denoted Y. Independent variables: denoted X1, X2, …, Xk. If we only have ONE independent variable, the model is $Y = \beta_0 + \beta_1 X + \varepsilon$.
5. Simple Linear Regression Analysis

    Variables:
    - X = independent variable (we provide this)
    - Y = dependent variable (we observe this)

    Parameters:
    - $\beta_0$ = intercept: the point at which the line crosses the y axis (i.e. where the x value equals 0)
    - $\beta_1$ = slope: the change in the mean of Y for a unit change in X
    - $\varepsilon$ = noise: a normal random variable ($\mu_\varepsilon = 0$, $\sigma_\varepsilon$ = ???)
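6. Simple Linear Regression Analysis. Meaning of $\beta_0$ and $\beta_1$: $\beta_1 > 0$ [positive slope], $\beta_1 < 0$ [negative slope]; $\beta_1$ = slope (= rise/run); $\beta_0$ = y-intercept. [Figure: a line on x and y axes with rise and run marked]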
7. Effect of Larger Values of $\sigma_\varepsilon$: lower vs. higher variability of Y around the line $Y = \beta_0 + \beta_1 X$.
8. The Least Squares Method. This line minimizes the sum of the squared differences between the points and the line. These differences are called residuals or errors.
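As a hedged illustration of the least squares idea on this slide, the sketch below computes the intercept and slope that minimize the sum of squared residuals, using the standard closed-form formulas for one predictor. The function names and the small staff/calls sample are made up for illustration; they are not the exercise data.

```python
def fit_line(xs, ys):
    """Return (intercept, slope) of the least squares line through the points."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    # Slope = sum of cross-deviations / sum of squared x-deviations.
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope


def sum_squared_residuals(xs, ys, intercept, slope):
    """Sum of squared vertical distances between the points and the line."""
    return sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))


# Hypothetical sample, NOT the 240 observations from the exercise.
staff = [4, 10, 15, 20, 25, 30, 35]
calls = [5, 20, 40, 52, 70, 85, 98]

b0, b1 = fit_line(staff, calls)
sse_best = sum_squared_residuals(staff, calls, b0, b1)
print(f"intercept = {b0:.3f}, slope = {b1:.3f}, SSE = {sse_best:.3f}")
```

Nudging either coefficient away from the fitted values and recomputing `sum_squared_residuals` gives a larger total, which is what "least squares" means.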
9. Exercise. A Black Belt is working to optimize a call center in a retail bank where clients place inquiries and orders over the phone from 6 am to 6 pm (Monday to Friday). The current staffing plan begins with about 4 associates at 6 am and increases to about 35 associates by 9 am. At 3:30 pm the number of associates begins to drop, reaching about 7 by 6 pm. The Black Belt anticipates an increase in call volume and wants to know how many calls can be answered in a 30-minute interval at various staffing levels, in order to staff the call center better. The Black Belt obtains data on the number of associates and the number of calls answered for each 30-minute interval over the last two weeks, for a total of 240 samples.
10. Scatter Plot
11. Fitted Line Plot

    Regression Analysis: CallsAnswd versus Staff

    The regression equation is CallsAnswd = - 8.844 + 3.099 Staff

    S = 11.7311, R-Sq = 83.4%, R-Sq(adj) = 83.3%

    Analysis of Variance

    | Source | DF | SS | MS | F | P |
    |---|---|---|---|---|---|
    | Regression | 1 | 164091 | 164091 | 1192.37 | 0.000 |
    | Error | 238 | 32753 | 138 | | |
    | Total | 239 | 196844 | | | |

    [Figure: fitted line plot, CallsAnswd versus Staff]
12. Minitab Output: Session Window

    CallsAnswd = - 8.844 + 3.099 Staff

    - Staff of 17: CallsAnswd = - 8.844 + 3.099 (17) = 43.839
    - Staff of 22: CallsAnswd = - 8.844 + 3.099 (22) = 59.334, i.e. about 59 calls

    How confident are you with this model?
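The two predictions on this slide can be reproduced with a few lines of Python. This is only a sketch: the coefficients are copied from the regression equation above, and the function name `predict_calls` is made up for illustration.

```python
# Coefficients copied from the fitted equation on the slide.
INTERCEPT = -8.844
SLOPE = 3.099


def predict_calls(staff):
    """Predicted calls answered in a 30-minute interval for a staffing level."""
    return INTERCEPT + SLOPE * staff


for staff in (17, 22):
    print(f"Staff = {staff}: predicted calls = {predict_calls(staff):.3f}")
```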
13. Assessing the Model. Evaluate the strength of the regression model:

    1. Determine how much variation in our calls answered data is actually explained by staff.
    2. Determine the strength of the relationship between calls answered and staff.

    Regression Analysis: CallsAnswd versus Staff

    The regression equation is CallsAnswd = - 8.844 + 3.099 Staff

    S = 11.7311, R-Sq = 83.4%, R-Sq(adj) = 83.3%

    R-Sq is the percentage of variation in the Y values explained by the linear relationship with X; here, the percentage of variation in calls answered explained by staff.

    Analysis of Variance

    | Source | DF | SS | MS | F | P |
    |---|---|---|---|---|---|
    | Regression | 1 | 164091 | 164091 | 1192.37 | 0.000 |
    | Error | 238 | 32753 | 138 | | |
    | Total | 239 | 196844 | | | |

    [Figure: fitted line plot, CallsAnswd versus Staff]
14. % of Variation: Explanation. $R^2 = \dfrac{\text{Explained Variation}}{\text{Total Variation (Y)}}$, which lies between 0 and 1 (0% to 100%).
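To connect this ratio to the numbers in the Minitab output, the sketch below recomputes R-squared from the sums of squares in the Analysis of Variance table (Regression SS as explained variation, Total SS as total variation). The variable names are illustrative.

```python
# Sums of squares copied from the Analysis of Variance table.
ss_regression = 164091  # explained variation
ss_error = 32753        # unexplained variation
ss_total = 196844       # total variation in Y

r_squared = ss_regression / ss_total
unexplained = ss_error / ss_total

print(f"R-squared   = {100 * r_squared:.1f}%")
print(f"Unexplained = {100 * unexplained:.1f}%")
```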
15. Properties of R, the Correlation Coefficient. $\sqrt{R^2} = |R|$, and R takes the sign of the slope. R (Pearson's correlation coefficient) measures the strength of the linear relationship:

    | R | Relationship |
    |---|---|
    | -1 to -0.8 | Strong |
    | -0.8 to -0.5 | Moderate |
    | -0.5 to 0 | Low |
    | 0 to 0.5 | Low |
    | 0.5 to 0.8 | Moderate |
    | 0.8 to +1 | Strong |
16. Caution: Correlation and Causation. Calls Answered and Staff are correlated. Correlation: two things vary together. Causation: a change in one variable causes a change in another.
17. Regression Interpretation: Graphical
    - The residuals are not normally distributed
    - The spread of the residuals versus fits is not constant
18. Regression: Session Window

    Regression Analysis: CallsAnswd versus Staff

    The regression equation is CallsAnswd = - 8.84 + 3.10 Staff

    | Predictor | Coef | SE Coef | T | P |
    |---|---|---|---|---|
    | Constant | -8.844 | 2.247 | -3.94 | 0.000 |
    | Staff | 3.09947 | 0.08976 | 34.53 | 0.000 |

    S = 11.7311, R-Sq = 83.4%, R-Sq(adj) = 83.3%

    Analysis of Variance

    | Source | DF | SS | MS | F | P |
    |---|---|---|---|---|---|
    | Regression | 1 | 164091 | 164091 | 1192.37 | 0.000 |
    | Residual Error | 238 | 32753 | 138 | | |
    | Total | 239 | 196844 | | | |

    Unusual Observations

    | Obs | Staff | CallsAnswd | Fit | SE Fit | Residual | St Resid | Flag |
    |---|---|---|---|---|---|---|---|
    | 1 | 3.0 | 6.000 | 0.454 | 1.995 | 5.546 | 0.48 | X |
    | 3 | 4.0 | 7.000 | 3.554 | 1.913 | 3.446 | 0.30 | X |
    | 6 | 4.0 | 11.000 | 3.554 | 1.913 | 7.446 | 0.64 | X |
    | 7 | 4.0 | 12.000 | 3.554 | 1.913 | 8.446 | 0.73 | X |
    | 8 | 3.0 | 5.000 | 0.454 | 1.995 | 4.546 | 0.39 | X |
    | 9 | 3.0 | 7.000 | 0.454 | 1.995 | 6.546 | 0.57 | X |
    | 67 | 29.0 | 109.000 | 81.040 | 0.901 | 27.960 | 2.39 | R |
    | 78 | 32.0 | 127.000 | 90.339 | 1.071 | 36.661 | 3.14 | R |
    | 141 | 30.0 | 134.000 | 84.140 | 0.952 | 49.860 | 4.26 | R |
    | 151 | 30.0 | 114.000 | 84.140 | 0.952 | 29.860 | 2.55 | R |
    | 161 | 33.0 | 128.000 | 93.438 | 1.136 | 34.562 | 2.96 | R |
    | 171 | 33.0 | 126.000 | 93.438 | 1.136 | 32.562 | 2.79 | R |
    | 183 | 32.0 | 59.000 | 90.339 | 1.071 | -31.339 | -2.68 | R |
    | 199 | 29.0 | 48.000 | 81.040 | 0.901 | -33.040 | -2.82 | R |
    | 202 | 25.0 | 35.000 | 68.643 | 0.768 | -33.643 | -2.87 | R |
    | 235 | 4.0 | 14.000 | 3.554 | 1.913 | 10.446 | 0.90 | X |
    | 236 | 4.0 | 7.000 | 3.554 | 1.913 | 3.446 | 0.30 | X |

    R denotes an observation with a large standardized residual (>2). X denotes an observation whose X value gives it large leverage.
19. Unusual Observations

    | Obs | Staff | CallsAnswd | Fit | SE Fit | Residual | St Resid | Flag |
    |---|---|---|---|---|---|---|---|
    | 1 | 3.0 | 6.000 | 0.454 | 1.995 | 5.546 | 0.48 | X |
    | 3 | 4.0 | 7.000 | 3.554 | 1.913 | 3.446 | 0.30 | X |
    | 6 | 4.0 | 11.000 | 3.554 | 1.913 | 7.446 | 0.64 | X |
    | 7 | 4.0 | 12.000 | 3.554 | 1.913 | 8.446 | 0.73 | X |
    | 8 | 3.0 | 5.000 | 0.454 | 1.995 | 4.546 | 0.39 | X |
    | 9 | 3.0 | 7.000 | 0.454 | 1.995 | 6.546 | 0.57 | X |
    | 67 | 29.0 | 109.000 | 81.040 | 0.901 | 27.960 | 2.39 | R |
    | 78 | 32.0 | 127.000 | 90.339 | 1.071 | 36.661 | 3.14 | R |
    | 141 | 30.0 | 134.000 | 84.140 | 0.952 | 49.860 | 4.26 | R |
    | 151 | 30.0 | 114.000 | 84.140 | 0.952 | 29.860 | 2.55 | R |
    | 161 | 33.0 | 128.000 | 93.438 | 1.136 | 34.562 | 2.96 | R |
    | 171 | 33.0 | 126.000 | 93.438 | 1.136 | 32.562 | 2.79 | R |
    | 183 | 32.0 | 59.000 | 90.339 | 1.071 | -31.339 | -2.68 | R |
    | 199 | 29.0 | 48.000 | 81.040 | 0.901 | -33.040 | -2.82 | R |
    | 202 | 25.0 | 35.000 | 68.643 | 0.768 | -33.643 | -2.87 | R |
    | 235 | 4.0 | 14.000 | 3.554 | 1.913 | 10.446 | 0.90 | X |
    | 236 | 4.0 | 7.000 | 3.554 | 1.913 | 3.446 | 0.30 | X |

    R denotes an observation with a large standardized residual (>2). X denotes an observation whose X value gives it large leverage.

    Of the 240 data points, 9 (about 4%) are flagged R for a large standardized residual, which is ≤ 5% of the total observations.
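The count on this slide can be checked with a short sketch that applies the ">2" rule to the standardized residuals listed in the Unusual Observations table. The values are copied from that table; the variable names and the dictionary layout are illustrative choices, not part of the Minitab output.

```python
# Standardized residuals from the Unusual Observations table, keyed by Obs.
st_resid = {
    1: 0.48, 3: 0.30, 6: 0.64, 7: 0.73, 8: 0.39, 9: 0.57,
    67: 2.39, 78: 3.14, 141: 4.26, 151: 2.55, 161: 2.96, 171: 2.79,
    183: -2.68, 199: -2.82, 202: -2.87, 235: 0.90, 236: 0.30,
}
N_OBSERVATIONS = 240  # total sample size from the exercise

# Flag "R" when the absolute standardized residual exceeds 2.
large_residuals = sorted(obs for obs, r in st_resid.items() if abs(r) > 2)
share = len(large_residuals) / N_OBSERVATIONS

print(f"Large-residual observations: {large_residuals}")
print(f"{len(large_residuals)} of {N_OBSERVATIONS} = {100 * share:.2f}%")
```

The remaining rows in the table carry the X flag (high leverage) rather than R, so they do not count toward this share.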
20. Original vs. Altered. [Figures: original graph, altered graph]
21. Interpreting the Model

    Regression Analysis: CallsAnswd versus Staff

    The regression equation is CallsAnswd = - 8.84 + 3.10 Staff

    Predictor Table:

    | Predictor | Coef | SE Coef | T | P |
    |---|---|---|---|---|
    | Constant | -8.844 | 2.247 | -3.94 | 0.000 |
    | Staff | 3.09947 | 0.08976 | 34.53 | 0.000 |

    Additional Statistics: S = 11.7311, R-Sq = 83.4%, R-Sq(adj) = 83.3%

    ANOVA Table (Analysis of Variance):

    | Source | DF | SS | MS | F | P |
    |---|---|---|---|---|---|
    | Regression | 1 | 164091 | 164091 | 1192.37 | 0.000 |
    | Residual Error | 238 | 32753 | 138 | | |
    | Total | 239 | 196844 | | | |
22. Interpreting the Model: Predictor Table

    Regression Analysis: CallsAnswd versus Staff

    The regression equation is CallsAnswd = - 8.84 + 3.10 Staff

    | Predictor | Coef | SE Coef | T | P |
    |---|---|---|---|---|
    | Constant | -8.844 | 2.247 | -3.94 | 0.000 |
    | Staff | 3.09947 | 0.08976 | 34.53 | 0.000 |

    S = 11.7311, R-Sq = 83.4%, R-Sq(adj) = 83.3%

    Analysis of Variance

    | Source | DF | SS | MS | F | P |
    |---|---|---|---|---|---|
    | Regression | 1 | 164091 | 164091 | 1192.37 | 0.000 |
    | Residual Error | 238 | 32753 | 138 | | |
    | Total | 239 | 196844 | | | |

    Ho: Slope = 0
    - No difference in Y when X changes
    - X has no impact on Y

    Ha: Slope ≠ 0
    - Y changes as X changes
    - X has an impact on Y
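The T column of the predictor table is each coefficient divided by its standard error, and it is the statistic used to test Ho: Slope = 0. The sketch below recomputes it from the Coef and SE Coef values above. The critical value of about 1.97 (two-sided 5% level, 238 degrees of freedom) is an assumption added here for illustration and does not appear in the slides.

```python
# Coefficients and standard errors copied from the predictor table.
coef = {"Constant": -8.844, "Staff": 3.09947}
se_coef = {"Constant": 2.247, "Staff": 0.08976}

# Assumed approximate two-sided 5% critical value for 238 degrees of freedom.
T_CRITICAL = 1.97

# T = Coef / SE Coef for each predictor.
t_stats = {name: coef[name] / se_coef[name] for name in coef}

for name, t in t_stats.items():
    verdict = "reject Ho" if abs(t) > T_CRITICAL else "fail to reject Ho"
    print(f"{name}: T = {t:.2f} -> {verdict}")
```

A T value of 34.53 for Staff is far beyond the assumed critical value, which is consistent with the reported P of 0.000.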
23. Interpreting the Model: Additional Statistics

    Regression Analysis: CallsAnswd versus Staff

    The regression equation is CallsAnswd = - 8.84 + 3.10 Staff

    | Predictor | Coef | SE Coef | T | P |
    |---|---|---|---|---|
    | Constant | -8.844 | 2.247 | -3.94 | 0.000 |
    | Staff | 3.09947 | 0.08976 | 34.53 | 0.000 |

    S = 11.7311, R-Sq = 83.4%, R-Sq(adj) = 83.3%

    Analysis of Variance

    | Source | DF | SS | MS | F | P |
    |---|---|---|---|---|---|
    | Regression | 1 | 164091 | 164091 | 1192.37 | 0.000 |
    | Residual Error | 238 | 32753 | 138 | | |
    | Total | 239 | 196844 | | | |

    R-squared
    - The amount of variation explained by the model

    R-squared (adjusted)
    - Accounts for the number of X's used in the model (used in multiple regression)

    S
    - Standard deviation of the leftover or unexplained variation. It is used for constructing the confidence interval.
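As a sketch of how these two statistics relate to the ANOVA table, the code below recomputes S and adjusted R-squared from the sums of squares, using the standard formulas S = sqrt(SS Error / DF Error) and R-Sq(adj) = 1 - (SS Error / DF Error) / (SS Total / DF Total). The formulas are standard textbook definitions rather than something stated in the slides, and the variable names are illustrative.

```python
import math

# Values copied from the Analysis of Variance table.
ss_error, df_error = 32753, 238
ss_total, df_total = 196844, 239

# S: standard deviation of the unexplained (residual) variation.
s = math.sqrt(ss_error / df_error)

# Adjusted R-squared penalizes for the number of predictors via the DF.
r_sq_adj = 1 - (ss_error / df_error) / (ss_total / df_total)

print(f"S = {s:.4f}")
print(f"R-Sq(adj) = {100 * r_sq_adj:.1f}%")
```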
24. Interpreting the Model: ANOVA Table

    Regression Analysis: CallsAnswd versus Staff

    The regression equation is CallsAnswd = - 8.84 + 3.10 Staff

    | Predictor | Coef | SE Coef | T | P |
    |---|---|---|---|---|
    | Constant | -8.844 | 2.247 | -3.94 | 0.000 |
    | Staff | 3.09947 | 0.08976 | 34.53 | 0.000 |

    S = 11.7311, R-Sq = 83.4%, R-Sq(adj) = 83.3%

    Analysis of Variance

    | Source | DF | SS | MS | F | P |
    |---|---|---|---|---|---|
    | Regression | 1 | 164091 | 164091 | 1192.37 | 0.000 |
    | Residual Error | 238 | 32753 | 138 | | |
    | Total | 239 | 196844 | | | |

    Regression
    - The explained variation

    Residual Error
    - The unexplained variation

    Ho:
    - The model does not explain the observed variation

    Ha:
    - The model does explain the observed variation
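The F value in the ANOVA table is the ratio of the two mean squares, each being a sum of squares divided by its degrees of freedom. The sketch below recomputes it from the table and also checks a standard property of simple linear regression that the slides do not state: with one predictor, F equals the square of the slope's T statistic (up to rounding). Variable names are illustrative.

```python
# Values copied from the Analysis of Variance table.
ss_regression, df_regression = 164091, 1
ss_error, df_error = 32753, 238

ms_regression = ss_regression / df_regression  # explained variation per DF
ms_error = ss_error / df_error                 # unexplained variation per DF
f_stat = ms_regression / ms_error

# Slope T statistic from the predictor table (Coef / SE Coef).
t_slope = 3.09947 / 0.08976

print(f"MS Error = {ms_error:.1f}")
print(f"F = {f_stat:.2f}")
print(f"T^2 = {t_slope ** 2:.2f}")
```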
25. Confidence on Output

    CallsAnswd = -8.844 + 3.099 Staff

    - If Staff = 10, then -8.844 + 3.099 (10) ≈ 22
    - If Staff = 20, then -8.844 + 3.099 (20) ≈ 53
    - If Staff = 30, then -8.844 + 3.099 (30) ≈ 84

    Confidence = ???%. R-squared = 83.4%, so the unexplained variation is 16.6%. Report results using confidence intervals.
26. Confidence on Output. 95% confidence interval: we are 95% confident that the true regression line lies within it. 95% prediction interval: we expect 95% of the data points to fall within it.
27. Confidence on Output. Confidence intervals report average performance; prediction intervals predict actual values.
28. Predicted Value for Y. Suppose management decides on a maximum of 20 staff. How many calls will these 20 associates answer?

    Predicted Values for New Observations

    | New Obs | Fit | SE Fit | 95% CI | 95% PI |
    |---|---|---|---|---|
    | 1 | 53.145 | 0.822 | (51.526, 54.765) | (29.979, 76.312) |

    Values of Predictors for New Observations

    | New Obs | Staff |
    |---|---|
    | 1 | 20.0 |
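The two intervals in this output can be approximately reproduced from the Fit, SE Fit, and S values, as a sketch. It uses the standard formulas CI = Fit ± t × SE Fit and PI = Fit ± t × sqrt(S² + SE Fit²). The critical value t ≈ 1.970 (97.5th percentile of the t distribution with 238 degrees of freedom) is an assumed approximation that is not given in the slides, so the results match Minitab only to about two decimals.

```python
import math

# Values copied from the Minitab output.
fit = 53.145      # predicted calls for Staff = 20
se_fit = 0.822    # standard error of the fitted mean
s = 11.7311       # residual standard deviation

# Assumed approximate t critical value for 95% intervals with 238 DF.
t_crit = 1.970

# Confidence interval: uncertainty in the average calls at Staff = 20.
ci_half = t_crit * se_fit
ci = (fit - ci_half, fit + ci_half)

# Prediction interval: adds the scatter of individual observations.
pi_half = t_crit * math.sqrt(s ** 2 + se_fit ** 2)
pi = (fit - pi_half, fit + pi_half)

print(f"95% CI: ({ci[0]:.2f}, {ci[1]:.2f})")
print(f"95% PI: ({pi[0]:.2f}, {pi[1]:.2f})")
```

The prediction interval is much wider because it has to cover individual 30-minute intervals, not just the average at that staffing level.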
29. Review Methodology

    1. Get familiar with the data
        - Identify the output or Y variable
        - Identify the X variable or predictor
        - Look at the time series, dot plots, histogram and scatter plot
        - Run descriptive statistics
            - Check for outliers
            - Check for gaps in the data
    2. Fit the model to the data
        - Run regression
        - Use the regression
    3. Check the model and the assumptions
        - Look at the residual plot
            - Are the assumptions valid?
            - Do the residuals look ok?
            - Are there unusual observations? If yes, address them if possible and rerun the analysis
        - What is R-squared?
        - How much variation can be explained by the model?
        - Look at P-values
            - Does X have a significant impact on Y?
    4. Report results and use the equation
        - Summarize the results for your stakeholders
        - If you have excluded data, explain why
        - If you kept outliers in the data, explain why
        - How much variation in Y can be explained by X?
        - Remember causation vs. correlation
        - Make predictions using confidence and prediction intervals