NG BB 37 Multiple Regression


  1. UNCLASSIFIED / FOUO
     National Guard Black Belt Training, Module 37: Multiple Regression
  2. CPI Roadmap – Analyze
     8-Step Process: 1. Validate the Problem; 2. Identify Performance Gaps; 3. Set Improvement Targets; 4. Determine Root Cause; 5. Develop Counter-Measures; 6. See Counter-Measures Through; 7. Confirm Results & Process; 8. Standardize Successful Processes
     Phases: Define, Measure, Analyze, Improve, Control
     Activities:
     • Identify Potential Root Causes
     • Reduce List of Potential Root Causes
     • Confirm Root Cause to Output Relationship
     • Estimate Impact of Root Causes on Key Outputs
     • Prioritize Root Causes
     • Complete Analyze Tollgate
     Tools:
     • Value Stream Analysis
     • Process Constraint ID
     • Takt Time Analysis
     • Cause and Effect Analysis
     • Brainstorming
     • 5 Whys
     • Affinity Diagram
     • Pareto
     • Cause and Effect Matrix
     • FMEA
     • Hypothesis Tests
     • ANOVA
     • Chi Square
     • Simple and Multiple Regression
     Note: Activities and tools vary by project. The lists provided here are not necessarily all-inclusive.
  3. Learning Objectives
     • Understand how to identify correlation among multiple variables
     • Learn how to create a mathematical model for the effect of multiple inputs on an output variable
     • Understand and identify multicollinearity
     • Understand how to use Best Subsets to identify the best model
     • Examine unusual observations to learn more about the data
  4. Multiple Regression
     • In Simple Linear Regression, we had Y = f(X), fit as $Y = B_0 + B_1X$
     • In Multiple Linear Regression, we have $Y = B_0 + B_1X_1 + B_2X_2 + B_3X_3 + \cdots$
     • [Figure: several inputs, X1 through X5, all feeding one output Y]
     • We'd like to identify which, if any, of the predictor variables are useful in predicting Y
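To make the model concrete, here is a minimal pure-Python sketch of the method of least squares for $Y = B_0 + B_1X_1 + \cdots + B_kX_k$: it builds the normal equations $(X^TX)B = X^TY$ and solves them by Gaussian elimination. The data are hypothetical and generated without noise from $Y = 2 + 0.5X_1 - 1.5X_2$, so the fit should recover exactly those coefficients. The helpers `solve` and `fit_ols` are illustrative names, not Minitab functions; for real work, use Minitab or a statistics library.

```python
def solve(a, b):
    """Solve the square system a x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]  # augmented matrix [a | b]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def fit_ols(rows, y):
    """Least-squares coefficients [B0, B1, ..., Bk]; each row holds one observation's X values."""
    design = [[1.0] + list(row) for row in rows]  # leading 1 estimates the constant B0
    p = len(design[0])
    xtx = [[sum(d[i] * d[j] for d in design) for j in range(p)] for i in range(p)]
    xty = [sum(d[i] * yi for d, yi in zip(design, y)) for i in range(p)]
    return solve(xtx, xty)


# Hypothetical data: Y = 2 + 0.5*X1 - 1.5*X2 exactly (no noise)
rows = [(1, 2), (2, 1), (3, 4), (4, 3), (5, 5), (6, 2)]
y = [2 + 0.5 * x1 - 1.5 * x2 for x1, x2 in rows]
coefs = fit_ols(rows, y)
print([round(b, 6) for b in coefs])  # [2.0, 0.5, -1.5]
```

With real, noisy data the recovered coefficients would only approximate the true ones, which is why the slides that follow examine P-values, R-Sq, and residuals.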
  5. When Should I Use Multiple Regression?
     The tool depends on the data type:

                                    Independent Variable (X)
                                    Continuous             Attribute
       Dependent     Continuous     Regression             ANOVA
       Variable (Y)  Attribute      Logistic Regression    Chi-Square (χ²) Test

     Regression is typically used with a continuous input and a continuous response, but it may also be used with count or categorical inputs and outputs.
  6. Basic Steps for Regression Modeling
     Step 1. Process Flowchart, SIPOC. Objective: identify KPIVs and KPOVs. Key question: Which KPIVs will significantly improve which KPOVs?
     Step 2. Scatter Plot, Histogram. Objective: visualize the data. Key question: Does it look like there is a C&E relationship?
     Step 3. Correlation, Hypothesis Test. Objective: qualify the C&E relationship (strength, % variability, P-value). Key question: How strong is the C&E relationship?
     Step 4. Regression Analysis (Method of Least Squares). Objective: quantify the C&E relationship. Key question: What is the prediction equation?
     Step 5. Residual Analysis. Objective: validate the model selected. Key question: Is there anything suspicious with the model selected?
     KPIV = Key Process Input Variable; KPOV = Key Process Output Variable; C&E = cause and effect
  7. Example: Production Plant
     A chemical engineer is investigating the amount of silver required in the high-volume production of contact switches for a new Army radio. Although only a small amount of silver is deposited on the switches, a larger amount is wasted through a multiple-step process. She has collected data (A-06 Production Plant) and would like to develop a prediction model.
     Step 1: The variables identified as KPIVs are:
     • X1 = Average temperature of rinse bath (degrees C)
     • X2 = Speed of reel that feeds the switches through the line (inches/min)
     • X3 = Thickness of silver deposit (angstroms)
     • X4 = Water consumed (gallons per day)
     The output is:
     • Y = Amount of silver consumed (pounds/day)
     What questions would you ask about this data?
     Source: Draper and Smith, Applied Regression Analysis
  8. Visualize the Data!
     Step 2: Visualize the Data
     Data file: A-06 Production Plant.mtw
     Select Graph > Matrix Plot
  9. Step 2: Visualize the Data!
     Looking for relationships between variables...
     • This dialog box comes up first
     • Select Matrix of Plots – Simple, since we have only one (Y) variable and no groups
     • Click OK to go to the next dialog box
  10. Step 2: Visualize the Data!
     • Double-click each variable you want to include in the matrix to place it in the Graph variables box
     • Select Matrix Options to move on to the next dialog box
  11. Step 2: Visualize the Data!
     • Select Lower left to place all the graph labels to the lower left of the boxes
     • Click OK here and in the previous dialog box to get the matrix
  12. Correlation Table
     There appear to be some relationships between certain variables and the response.
     [Figure: Matrix Plot of Temp, Speed, Thickness, Water, Amt of Ag, with Amt of Ag as the response variable (Y). Callout: "Is this good or bad?"]
  13. Quantify the Relationships Between Variables
     Step 3: Quantify the relationship
     Select Stat > Basic Statistics > Correlation
  14. Correlation Matrix
     Evaluating coefficients of correlation among predictors...
     • Double-click each variable you want to include to place it in the Variables box
     • Check Display p-values (the default setting)
     • Click OK to get the correlation matrix in your Session Window
  15. Correlation Matrix
     • The TOP number in each pair is the Pearson coefficient of correlation (r-value); the BOTTOM number is the p-value
     • Pairwise correlations between predictor variables larger than about 0.5 to 0.7 are signs of trouble: multicollinearity. We will explain more shortly.
  16. Finding the Regression Equation...
     Step 4: Develop a prediction model
     Select Stat > Regression > Regression
  17. Finding the Regression Equation... (Cont.)
     Double-click C5 Amt of Ag to place it in the Response box, then double-click all the variables you want to place in the Predictors box. Select Options to go to the next dialog box.
  18. Finding the Regression Equation... (Cont.)
     In this dialog box, the only thing you have to do is check Variance inflation factors. Click OK here and in the previous dialog box to get the regression analysis in your Session Window.
  19. Regression Equation
     Minitab displays the following regression equation:
     Amt of Ag = 5.7 - 0.0156 Temp + 0.239 Speed + 0.44 Thickness + 0.0449 Water

       Predictor   Coef       SE Coef   T       P       VIF
       Constant    5.72       10.83     0.53    0.607
       Temp        -0.01558   0.02616   -0.60   0.563   1.276
       Speed       0.2393     0.2644    0.90    0.383   10.997
       Thickness   0.443      1.033     0.43    0.675   11.671
       Water       0.04495    0.01481   3.04    0.010   1.731

       S = 0.412748   R-Sq = 80.9%   R-Sq(adj) = 74.5%

     • The P-values indicate whether a particular predictor is significant in the presence of the other predictors in the model
     • This model explains 80.9% of the variability in the response
     • R-Sq(adj) adjusts for the degrees of freedom used by predictors that have no real value; it should be used when comparing models
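The adjustment is $R^2_{adj} = 1 - (1 - R^2)\,\dfrac{n - 1}{n - k - 1}$ for k predictors and n observations. The sketch below assumes n = 17 (the N shown on the residual normal probability plot) and n = 16 once observation 3 is removed later in the module; it reproduces the R-Sq(adj) values printed on these slides from their rounded R-Sq values.

```python
def adjusted_r_squared(r_sq, n, k):
    """R-Sq(adj) for a model with k predictors plus a constant, fit to n observations."""
    return 1 - (1 - r_sq) * (n - 1) / (n - k - 1)


# R-Sq values as printed on these slides (rounded to 0.1%)
print(round(100 * adjusted_r_squared(0.809, n=17, k=4), 1))  # full model: 74.5
print(round(100 * adjusted_r_squared(0.800, n=17, k=2), 1))  # Speed + Water: 77.1
print(round(100 * adjusted_r_squared(0.850, n=16, k=2), 1))  # observation 3 removed: 82.7
```

The Speed + Water result comes out at 77.1% rather than the printed 77.2%, presumably because Minitab starts from the unrounded R-Sq. Note how the four-input model has the highest R-Sq but the lowest R-Sq(adj) of the three.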
  20. Interpreting P-values
     • The P column gives the significance level for each term in the model
     • Typically, if a P-value is less than or equal to 0.05, the variable is considered significant (i.e., the null hypothesis that its coefficient is zero is rejected)
     • If a P-value is greater than 0.10, the term is removed from the model. A practitioner might leave a term in the model if its P-value falls in the gray region between these two levels
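The rule of thumb can be written as a small decision function and applied to the full-model P-values from slide 19. The function name `classify_term` and its labels are illustrative, and the 0.05 and 0.10 cutoffs are the slide's conventions, not fixed statistical laws.

```python
def classify_term(p_value, alpha=0.05, remove_above=0.10):
    """Apply the slide's rule of thumb to a single term's P value."""
    if p_value <= alpha:
        return "significant"
    if p_value > remove_above:
        return "candidate for removal"
    return "gray region: use judgment"


# P values from the full Production Plant model (slide 19)
full_model_p = {"Temp": 0.563, "Speed": 0.383, "Thickness": 0.675, "Water": 0.010}
for term, p in full_model_p.items():
    print(f"{term:<10} P = {p:.3f}  ->  {classify_term(p)}")
```

Only Water passes here. Speed and Thickness are highly correlated (VIFs near 11), so their large P-values may partly reflect multicollinearity rather than a lack of effect. That is one reason not to drop every term with P > 0.10 at once, and in the reduced model later in this module Speed becomes significant.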
  21. Regression Equation
     Regression output in Minitab's Session Window (same as slide 19), now focusing on the Variance Inflation Factor column:
     Amt of Ag = 5.7 - 0.0156 Temp + 0.239 Speed + 0.44 Thickness + 0.0449 Water
     VIF: Temp 1.276, Speed 10.997, Thickness 11.671, Water 1.731
     S = 0.412748   R-Sq = 80.9%   R-Sq(adj) = 74.5%
     High VIF values (VIF > 10) are signs of trouble.
  22. Problems with Several Predictor Variables
     • Sometimes the Xs are correlated with one another (not independent). This condition is known as multicollinearity
     • Multicollinearity can cause problems (sometimes severe):
       - Estimates of the coefficients are affected (unstable, with inflated variances)
       - It is difficult to isolate the effect of each X
       - The coefficients depend on which Xs are included in the model
       - High multicollinearity inflates the standard errors of the coefficients, which increases the P-values
     • In cases of extreme multicollinearity, Minitab will drop one term and notify you
  23. Graphical Representation of Multicollinearity
     [Figure: Venn diagram of Total Variation in Y, overlapped by Variation Explained by X1 and Variation Explained by X2]
     • Overlap represents correlation
     • X1 and X2 are both correlated with Y
     • X1 and X2 are highly correlated with each other
     • If X1 is in the model, we don't need X2, and vice versa
  24. Assessing the Degree of Multicollinearity
     We use a metric called the Variance Inflation Factor (VIF):
     $\mathrm{VIF}_i = \dfrac{1}{1 - R_i^2}$
     In Minitab, select Stat > Regression > Regression > Options > Display variance inflation factors.
     Where:
     • $R_i^2$ is the R² value you get when you regress $X_i$ against the other Xs
     • A large $R_i^2$ suggests that the variable is redundant
     Rule of thumb:
     • $R_i^2 > 0.9$ (VIF > 10) is a cause for concern: a high degree of collinearity
     • $0.8 < R_i^2 < 0.9$ (5 < VIF < 10): a moderate degree of collinearity
     For the Production Plant data, Minitab gives us these VIFs: Temp 1.276, Speed 10.997, Thickness 11.671, Water 1.731.
     Two VIFs are a bit large, but in this case, with an R-Sq of 80.9%, some multicollinearity can be tolerated.
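The VIF formula is easy to evaluate in both directions. This sketch confirms that the rule-of-thumb boundaries $R_i^2 = 0.8$ and $0.9$ correspond to VIF = 5 and 10, then inverts the formula ($R_i^2 = 1 - 1/\mathrm{VIF}_i$) to show how much of each Production Plant input is explained by the other inputs. The function names are illustrative.

```python
def vif(r_sq_i):
    """Variance inflation factor for X_i, given R_i^2 from regressing X_i on the other Xs."""
    return 1.0 / (1.0 - r_sq_i)


def r_sq_from_vif(vif_value):
    """Invert the VIF formula: the share of X_i's variation explained by the other inputs."""
    return 1.0 - 1.0 / vif_value


print(vif(0.8), vif(0.9))  # rule-of-thumb boundaries: about 5 and 10

# VIFs reported by Minitab for the Production Plant model (slide 19)
production_plant_vifs = {"Temp": 1.276, "Speed": 10.997, "Thickness": 11.671, "Water": 1.731}
for name, v in production_plant_vifs.items():
    print(name, round(r_sq_from_vif(v), 3))
```

Speed (about 0.909) and Thickness (about 0.914) are each more than 90% explained by the other inputs, which is why they fall in the "cause for concern" band.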
  25. Some Cautions About the Coefficients
     • Remember the prediction equation obtained earlier: Amt of Ag = 5.7 - 0.0156 Temp + 0.239 Speed + 0.44 Thickness + 0.0449 Water
     • The relative importance of the predictors cannot be determined from the size of their coefficients:
       - The coefficients are scale dependent
       - The coefficients are influenced by correlation among the predictor variables
     • If a high degree of multicollinearity exists, even the signs of the coefficients may be misleading
  26. Residual Analysis
     Step 5: Validate the selected model
     Select Stat > Regression > Regression
     Is there anything suspicious about this model?
  27. Residual Analysis (Cont.)
     Double-click C5 Amt of Ag to place it in the Response box, then double-click all the variables you want to place in the Predictors box. Select Graphs to go to the next dialog box.
  28. Residual Analysis (Cont.)
     Select Four in one to get all four residual plots on one graph, or pick and choose the plots you want. Click OK here and in the previous dialog box to get the residual plots.
  29. Residual Analysis (Cont.)
     Not too bad overall…
     [Figure: Residual Plots for Amt of Ag, with four panels: Normal Probability Plot (N = 17, AD = 0.249, P-Value = 0.705), Versus Fits, Histogram, and Versus Order]
     If you want to see the value for any observation, just hold your cursor over that point.
  30. How to Address Multicollinearity
     • Eliminate one or more input variables
       - We'll look at a technique called Best Subsets Regression
     • Collect additional data
     • Use process knowledge to determine the principal relationship
     • Use DOE to further assess the multicollinearity
       - If neither of the correlated inputs is significant, eliminate both from the analysis
  31. 31. UNCLASSIFIED / FOUO Best Subsets Regression  Rather than relying on the p-values alone, the computer looks at all possible combinations of variables and prints the resulting model characteristics  Statistics like adjusted R-Sq and MSError will improve as important model terms are added, then worsen as “junk” terms are added to the model Multiple Regression UNCLASSIFIED / FOUO 31
  32. Best Subsets Regression Considerations
     • Objective: we want to select a model with predictive accuracy and minimum multicollinearity
     • Seek a compromise between:
       - Overfitting (including model terms with only marginal, or no, contribution)
       - Underfitting (ignoring or deleting relatively important model terms)
     • What are some problems with overfitting?
     • What are some problems with underfitting?
     [Figure: sketches labeled "overfit" and "underfit"]
  33. Best Subsets Regression: Evaluating Candidate Models
     Four things to look at when evaluating candidate models:
     1. R² (large R² is desired; however, R² never decreases as we add predictors to the model, so it should only be used to compare models with the same number of terms)
     2. Adjusted R² (large is desired)
     3. Mallows' Cp statistic (small Cp is desired, close to the number of terms in the model, including the constant)
     4. S, the estimate of the standard deviation around the regression (small is desired)
     Generally, the best three models are selected and checked for significance of all factors and for the residual assumptions.
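Best subsets is an exhaustive search, so the idea is easy to sketch: fit every non-empty combination of inputs and report the four statistics listed above. This pure-Python version uses hypothetical data in which Y depends on X1 and X2 while X3 is "junk". It computes Mallows' Cp as $SSE_p / MSE_{full} - (n - 2p)$, with p counting the constant. It illustrates the technique, not Minitab's implementation; the Minitab output on these slides, for example, lists only the two best models of each size.

```python
from itertools import combinations
from math import sqrt


def solve(a, b):
    """Solve the square system a x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def sse(columns, y):
    """Residual sum of squares after a least-squares fit of y on the columns plus a constant."""
    design = [[1.0] + [col[i] for col in columns] for i in range(len(y))]
    p = len(design[0])
    xtx = [[sum(d[i] * d[j] for d in design) for j in range(p)] for i in range(p)]
    xty = [sum(d[i] * yi for d, yi in zip(design, y)) for i in range(p)]
    b = solve(xtx, xty)
    return sum((yi - sum(bj * dj for bj, dj in zip(b, d))) ** 2 for d, yi in zip(design, y))


def best_subsets(predictors, y):
    """Fit every non-empty subset of predictors; return (subset, R-Sq, R-Sq(adj), Cp, S) rows."""
    n = len(y)
    y_bar = sum(y) / n
    sst = sum((yi - y_bar) ** 2 for yi in y)
    names = list(predictors)
    mse_full = sse([predictors[k] for k in names], y) / (n - len(names) - 1)
    results = []
    for size in range(1, len(names) + 1):
        for subset in combinations(names, size):
            p = size + 1  # terms in the model, including the constant
            err = sse([predictors[k] for k in subset], y)
            r_sq = 1 - err / sst
            r_sq_adj = 1 - (1 - r_sq) * (n - 1) / (n - p)
            cp = err / mse_full - (n - 2 * p)  # Mallows' Cp
            s = sqrt(err / (n - p))
            results.append((subset, r_sq, r_sq_adj, cp, s))
    return results


# Hypothetical data (NOT the Production Plant data)
predictors = {
    "X1": [1, 2, 3, 4, 5, 6, 7, 8],
    "X2": [3, 1, 4, 1, 5, 9, 2, 6],
    "X3": [2, 7, 1, 8, 2, 8, 1, 8],
}
y = [4.8, 5.3, 9.1, 9.1, 13.7, 17.5, 15.9, 20.1]

for subset, r_sq, r_sq_adj, cp, s in best_subsets(predictors, y):
    print(f"{len(subset)}  {100 * r_sq:5.1f}  {100 * r_sq_adj:5.1f}  {cp:6.1f}  {s:.4f}  {', '.join(subset)}")
```

With k candidate inputs there are $2^k - 1$ subsets (7 here, 15 for the four Production Plant inputs), so exhaustive search is practical only for modest k.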
  34. More on the Mallows' Cp Statistic
     • In practice, the minimum number of parameters needed in the model is indicated where Mallows' Cp reaches its minimum
     • Rule of thumb: we want Cp to be small and close to p, the number of terms in the model (the input variables plus the constant)
  35. Best Subsets Regression
     Minitab data set: Production Plant
     Select Stat > Regression > Best Subsets
  36. Best Subsets Regression (Cont.)
     • Enter the Response variable
     • Enter the Predictor variables (input variables)
     • Click OK to get the analysis in the Session Window
  37. Best Subsets Regression (Cont.)
     Best Subsets Regression: Amt of Ag versus Temp, Speed, Thickness, Water
     Response is Amt of Ag

                               Mallows
       Vars  R-Sq  R-Sq(adj)   Cp      S         Predictors included
       1     64.4  62.0         9.4    0.50387   (not legible)
       1     62.3  59.8        10.7    0.51836   (not legible)
       2     80.0  77.2         1.5    0.39047   Speed, Water
       2     78.8  75.8         2.3    0.40200   (not legible)
       3     80.6  76.1         3.2    0.39959   (not legible)
       3     80.3  75.8         3.4    0.40237   (not legible)
       4     80.9  74.5         5.0    0.41275   Temp, Speed, Thickness, Water

     Which model(s) are the best candidates?
  38. Best Subsets Regression (Cont.)
     Using the Best Subsets output from slide 37:
     R-Sq: look for the highest value when comparing models with the same number of input variables.
  39. Best Subsets Regression (Cont.)
     Using the Best Subsets output from slide 37:
     R-Sq(adj): look for the highest value when comparing models with different numbers of input variables.
  40. Best Subsets Regression (Cont.)
     Using the Best Subsets output from slide 37:
     Cp: look for models where Cp is small and close to the number of terms in the model (the input variables plus the constant).
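Mallows' Cp can be recomputed from the S column alone. For a model with p terms (inputs plus the constant), $S^2 = SSE_p / (n - p)$, and the full model's $S^2$ estimates the error variance, so $C_p = SSE_p / S_{full}^2 - (n - 2p)$. This sketch assumes n = 17 observations (the N shown on the residual normal probability plot) and reproduces the Cp values in the table on slide 37.

```python
def mallows_cp(s_subset, p, s_full, n):
    """Mallows' Cp from the S values printed in a best subsets table.

    s_subset: S for the candidate model; p: its number of terms (inputs + constant)
    s_full:   S for the model containing all candidate inputs; n: number of observations
    """
    sse_subset = s_subset ** 2 * (n - p)  # since S^2 = SSE / (n - p)
    mse_full = s_full ** 2
    return sse_subset / mse_full - (n - 2 * p)


N, S_FULL = 17, 0.41275
print(round(mallows_cp(0.50387, p=2, s_full=S_FULL, n=N), 1))  # best 1-input model: 9.4
print(round(mallows_cp(0.39047, p=3, s_full=S_FULL, n=N), 1))  # Speed + Water: 1.5
print(round(mallows_cp(S_FULL, p=5, s_full=S_FULL, n=N), 1))   # all four inputs: 5.0
```

The full model's Cp always equals its own p (here 5, for four inputs plus the constant), which is why the rule of thumb compares Cp with the number of terms including the constant.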
  41. Best Subsets Regression (Cont.)
     Using the Best Subsets output from slide 37:
     S: we want S, the estimate of the standard deviation about the regression, to be as small as possible.
  42. Once the Candidate Models Are Identified
     • Evaluate the candidate models under a "microscope":
       - Outliers
       - High leverage
       - Influential observations
       - Residuals
       - Prediction quality
     • Once a model has been selected, find the new regression equation
     • Test its predictive capability on observations NOT originally used in the modeling
  43. Regression with Reduced Model
     We select the best model with two variables, Speed and Water, and run Minitab again to obtain the new regression equation:
     Select Stat > Regression > Regression
  44. Regression with Reduced Model (Cont.)
     • Enter Amt of Ag as the Response
     • Enter only Speed and Water as Predictors
     • Click OK to get the analysis in the Session Window
  45. Regression with Reduced Model (Cont.)
     Minitab's Session Window yields the following regression equation for the reduced model:
     Amt of Ag = 9.92 + 0.357 Speed + 0.0425 Water

       Predictor   Coef      SE Coef   T      P
       Constant    9.919     1.694     5.86   0.000
       Speed       0.35689   0.08544   4.18   0.001
       Water       0.04253   0.01206   3.53   0.003

       S = 0.3905   R-Sq = 80.0%   R-Sq(adj) = 77.2%

     ...to compare with the previous (full) model:
     Amt of Ag = 5.7 - 0.0156 Temp + 0.239 Speed + 0.44 Thickness + 0.0449 Water

       Predictor   Coef       SE Coef   T       P
       Constant    5.72       10.83     0.53    0.607
       Temp        -0.01558   0.02616   -0.60   0.563
       Speed       0.2393     0.2644    0.90    0.383
       Thickness   0.443      1.033     0.43    0.675
       Water       0.04495    0.01481   3.04    0.010

       S = 0.4127   R-Sq = 80.9%   R-Sq(adj) = 74.5%
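To test predictive capability on observations not used in the modeling (slide 42), the reduced-model equation can be applied to new data and the prediction errors compared with S. In this sketch, `predict_amt_ag` is an illustrative helper that simply encodes the reduced-model coefficients printed above. The hold-out observations are hypothetical, with Speed and Water chosen inside the ranges suggested by the matrix plot axes.

```python
from math import sqrt


def predict_amt_ag(speed, water):
    """Reduced-model prediction: Amt of Ag = 9.919 + 0.35689 Speed + 0.04253 Water."""
    return 9.919 + 0.35689 * speed + 0.04253 * water


# Hypothetical hold-out observations: (speed, water, measured Amt of Ag)
holdout = [(9.0, 155.0, 19.9), (11.0, 165.0, 20.9), (10.0, 160.0, 20.1)]
errors = [y - predict_amt_ag(speed, water) for speed, water, y in holdout]
rmse = sqrt(sum(e * e for e in errors) / len(errors))
print(round(rmse, 3), "vs. S = 0.3905 for the fitted model")
```

A hold-out RMSE comparable to S suggests the model predicts new data about as well as its fit implies; a much larger value would be a warning sign. In practice, use more than three hold-out points.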
  46. Unusual Observations
     Minitab's Session Window also gives us the following output:

       Unusual Observations
       Obs   Speed   Amt of A   Fit       SE Fit   Residual   St Resid
       3     11.5    21.0000    20.3784   0.2477   0.6216     2.06R

     R denotes an observation with a large standardized residual.
     Here, an unusual observation means one with a large standardized residual. Let's see what would happen if we eliminated such an observation from our collected data!
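The St Resid value can be checked by hand. Assuming the standard relationships SE Fit $= S\sqrt{h_{ii}}$ and St Resid $= e_i / (S\sqrt{1 - h_{ii}})$, where $h_{ii}$ is the observation's leverage, this sketch recovers observation 3's leverage and standardized residual from the printed values. It then applies the two screening rules from the Regression Checklist (slide 50): |St Resid| > 2 and leverage > 2p/n, with p = 3 terms and n = 17.

```python
from math import sqrt


def leverage_from_se_fit(se_fit, s):
    """Leverage h_ii, using SE Fit = S * sqrt(h_ii)."""
    return (se_fit / s) ** 2


def standardized_residual(residual, s, h):
    """Residual divided by its estimated standard deviation, S * sqrt(1 - h_ii)."""
    return residual / (s * sqrt(1 - h))


s, n, p = 0.3905, 17, 3  # reduced model: Speed + Water + constant
h = leverage_from_se_fit(0.2477, s)
st_resid = standardized_residual(0.6216, s, h)
print(round(h, 3), round(st_resid, 2))      # about 0.402 and 2.06
print(abs(st_resid) > 2, h > 2 * p / n)     # large residual? high leverage by the 2p/n rule?
```

Both checks come out True. By the checklist's 2p/n rule (about 0.35), observation 3 also has fairly high leverage, so it merits a closer look before it is dropped.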
  47. Impact of the Unusual Observation
     Without the unusual observation, Minitab's Session Window yields the following regression equation:
     Amt of Ag = 8.61 + 0.237 Speed + 0.0577 Water

       Predictor   Coef      SE Coef   T      P
       Constant    8.610     1.567     5.49   0.000
       Speed       0.23698   0.08960   2.64   0.020
       Water       0.05775   0.01226   4.71   0.000

       S = 0.3383   R-Sq = 85.0%   R-Sq(adj) = 82.7%

     R-Sq goes up a little because we've gotten rid of "noise" in the model.
     ...to compare with the regression equation of our previous reduced model:
     Amt of Ag = 9.92 + 0.357 Speed + 0.0425 Water

       Predictor   Coef      SE Coef   T      P
       Constant    9.919     1.694     5.86   0.000
       Speed       0.35689   0.08544   4.18   0.001
       Water       0.04253   0.01206   3.53   0.003

       S = 0.3905   R-Sq = 80.0%   R-Sq(adj) = 77.2%
  48. Takeaways
     • Regression analysis can be used with historical data as well as data from designed experiments to build prediction models
     • Care must be exercised when using historical data:
       - Correlation does not imply a cause-and-effect relationship
       - There may be serious problems with multicollinearity and high-leverage observations
     • Several diagnostic tools are available to evaluate regression models:
       - Fit: R², adjusted R², Cp, S
       - Unusual observations: residual plots, leverage, Cook's D
       - Multicollinearity: VIFs (Variance Inflation Factors)
  49. Considerations in Regression
     • Set goals before doing the analysis (what do you want to learn, how well do you need to predict, etc.)
     • Gather enough observations to adequately measure error and check the model assumptions
     • Make sure that the sample of data is representative of the population
     • Excessive measurement error in the inputs (Xs) creates uncertainty in the estimated coefficients, predictions, etc.
     • Be sure to collect data on all potentially important explanatory variables
  50. Regression Checklist
     • Scatterplots (Y vs. X)
     • Histograms and/or boxplots of Ys and Xs
     • Coefficients
     • Significance (p < 0.05 to 0.10)
     • R² and adjusted R²
     • S
     • Residuals (no obvious pattern)
     • Unusual Y values (standardized residuals larger than 2 in absolute value)
     • Unusual X values (leverage > 2p/n)
     • Overfitting vs. underfitting (Cp close to p, the number of terms in the model including the constant)
     • Multicollinearity (VIF > 5 to 10)
  51. What other comments or questions do you have?
  52. References
     • Neter, Wasserman, and Kutner, Applied Linear Regression Models, Irwin, 1989
     • Draper and Smith, Applied Regression Analysis, Wiley, 1981
     • Schulman, Robert S., Statistics in Plain English, Chapman and Hall, 1992
     • Gunst and Mason, Regression Analysis and its Application, Marcel Dekker, 1980
     • Myers, Raymond H., Classical and Modern Regression with Applications, Duxbury, 1990
     • Dielman, Applied Regression Analysis for Business and Economics, Duxbury, 1991
     • Hosmer and Lemeshow, Applied Logistic Regression, Wiley, 1989
     • Iglewicz and Hoaglin, How to Detect and Handle Outliers, ASQ Press
     • Crocker, Douglas C., How to Use Regression Analysis in Quality Control, ASQ Press
  53. National Guard Black Belt Training: APPENDIX
     Additional Exercises:
     • Anthony's Pizza
     • Customer Satisfaction
     • A Study of Supervisor Performance
  54. Additional Practice Example: Anthony's Pizza
     • We have received Voice of the Customer feedback telling us that customers are dissatisfied if we cannot accurately predict the time of their pizza delivery when it is beyond the 30-minute target
     • We would like to develop a model so that, when a customer calls, we can accurately predict the delivery time
  55. Additional Practice Example: Six Sigma Pizza
     • Our Minitab data can be found in the file Multiple Regression - Pizza.mpj
     • Based on the data that we have collected, we are going to study the effects of total pizzas ordered, defects, and incorrect orders on delivery time
  56. Additional Practice Exercise: Customer Satisfaction
     • Bob Black Belt would like to get a better understanding of the customer satisfaction data
     • Use the data provided in the Minitab file A-06 Customer Satisfaction Data.mtw to create a regression model to predict Overall Satisfaction
     Each row of data is a monthly average of how customers rated the services on a scale of 1-10. For example, in January, the average customer rating for Staff Responsiveness was 7.9.
  57. Additional Practice Exercise: Customer Satisfaction (Cont.)
     • Consider Staff Responsiveness, Check-out Speed, Frequent Guest Program, and Problems Resolved as possible inputs that could be used to predict Overall Satisfaction
     • First, study correlation with a Matrix Plot and Correlation Table
     • Next, create the initial regression model
     • Find the best combination of inputs with Best Subsets
     • Finally, run the reduced regression model
  58. Additional Practice Exercise: A Study of Supervisor Performance
     A recent survey of clerical employees in a large financial organization included questions related to employee satisfaction with their supervisors. The company was interested in any relationships between specific supervisor characteristics and overall satisfaction with supervisors as perceived by the employees.
     • Y = Overall rating of the job being done by the supervisor
     • X1 = Handles employee complaints
     • X2 = Does not allow special privileges
     • X3 = Provides opportunity to learn new things
     • X4 = Raises based on performance
     • X5 = Too critical of poor performance
     • X6 = Rate of advancing to better jobs (employee's perception of their own advancement rate)
     Source: Chatterjee and Price, Regression Analysis by Example
  59. Additional Practice Exercise: A Study of Supervisor Performance (Cont.)
     • The survey responses were on a scale of 1-5
     • For purposes of analysis, a score of 1 or 2 was considered "favorable", while a score of 3, 4, or 5 was considered "unfavorable"
     • Data were collected from 30 departments, selected randomly from the organization. Each department had approximately 35 employees and one supervisor
     • For each department, the data were aggregated, and the percent favorable was recorded for each item
     • Data file: A-06 Attitude.mtw
     • Questions:
       - Can we predict the overall supervisor rating using this data?
       - Which variable(s) have the strongest correlation with the supervisor rating?
       - Are there any unusual observations?
       - Comments on the data?