1.
UNCLASSIFIED / FOUO National Guard Black Belt Training, Module 36: Simple Linear Regression. This material is not for general distribution, and its contents should not be quoted, extracted for publication, or otherwise copied or distributed without prior coordination with the Department of the Army, ATTN: ETF.
2.
CPI Roadmap – Analyze
8-STEP PROCESS: 1. Validate the Problem; 2. Identify Performance Gaps; 3. Set Improvement Targets; 4. Determine Root Cause; 5. Develop Counter-Measures; 6. See Counter-Measures Through; 7. Confirm Results & Process; 8. Standardize Successful Processes
Define, Measure, Analyze, Improve, Control
ACTIVITIES
• Identify Potential Root Causes
• Reduce List of Potential Root Causes
• Confirm Root Cause to Output Relationship
• Estimate Impact of Root Causes on Key Outputs
• Prioritize Root Causes
• Complete Analyze Tollgate
TOOLS
• Value Stream Analysis
• Process Constraint ID
• Takt Time Analysis
• Cause and Effect Analysis
• Brainstorming
• 5 Whys
• Affinity Diagram
• Pareto
• Cause and Effect Matrix
• FMEA
• Hypothesis Tests
• ANOVA
• Chi Square
• Simple and Multiple Regression
Note: Activities and tools vary by project. Lists provided here are not necessarily all-inclusive.
3.
Learning Objectives
• Terminology and data requirements for conducting a regression analysis
• Interpretation and use of scatter plots
• Interpretation and use of correlation coefficients
• The difference between correlation and causation
• How to generate, interpret, and use regression equations
4.
Application Examples
• Administrative – A financial analyst wants to predict the cash needed to support growth and increases in training
• Market/Customer Research – The main exchange wants to determine how to predict a customer’s buying decision from demographics and product characteristics
• Hospitality – The MWR Guest House wants to see if there is a relationship between room service delays and order size
5.
When Should I Use Regression?
Dependent Variable (Y) | Independent Variable (X): Continuous | Independent Variable (X): Attribute
Continuous | Regression | ANOVA
Attribute | Logistic Regression | Chi-Square (χ²) Test
The tool depends on the data type. Regression is typically used with a continuous input and a continuous response but can also be used with count or categorical inputs and outputs.
6.
General Strategy for Regression Modeling
Planning and Data Collection
• What variables?
• How will I get the data?
• How much data do I need?
Initial Analysis and Reduction of Variables
• What input variables have the biggest effect on the response variable?
• What are some candidate prediction models?
Select and Refine Models
• What is the best model?
Validate Model
• How well does the model predict new observations?
7.
Regression Terminology: Types of Variables
Input Variables (Xs)
• These are also called predictor variables or independent variables
• Best if the variables are continuous, but can be count or categorical
Output Variables (Ys)
• These are also called response variables or dependent variables (what we’re trying to predict)
• Best if the variables are continuous, but can be count or categorical
[Diagram: inputs X1, X2, X3 and Error feed a Process or Product, which produces the output Y]
8.
Visualize the Data – A Good Start!
Scatter Plot: A graph showing a relationship (or correlation) between two factors or variables
• Lets you “see” patterns in data
• Supports or refutes theories about the data
• Helps create or refine hypotheses
• Predicts effects under other circumstances (be careful extending predictions beyond the range of data used)
Be Careful: Correlation does not guarantee causation!
9.
Correlation vs. Causation
Correlation by itself does not imply a cause and effect relationship!
Other examples?
[Example plots; axis labels: Average life expectancy, Gas mileage, # divorces/10,000, Price of automobiles]
Lurking variables!
When is it correct to infer causation?
10.
Example: Mortgage Estimates
• A Belt is trying to reduce the call length for military clients calling for a good faith estimate on a VA loan.
• The Belt thinks that there is a relationship between broker experience and call length, and creates a scatter plot to visualize the relationship.
11.
Example: Mortgage Estimate Scatter Plot
Hypothesis: Brokers with more experience can provide estimates in a shorter time.
[Scatter plot: Call Length (Y axis) vs. Broker Experience (X axis)]
Does it look like a relationship exists between Broker Experience and Call Length?
12.
Scatter Plot - Structure
[Scatter plot of paired data: Call Length on the Y axis (Result?) vs. Broker Experience on the X axis (Suspected Influence)]
Paired Data? To use a scatter plot, you must have measured two factors for a single observation or item (e.g., for a given call, you need to know both the call length and the broker’s experience). You have to make sure that the data “pair up” properly in Minitab, or the diagram will be meaningless.
13.
Input, Process, Output Context
PREDICTOR MEASURES (X): plotted on the X axis as the independent variable
• Input: Arrival Time, Accuracy, Cost, Key Specs
• Process: Time Per Task, In-Process Errors, Labor Hours, Exceptions
RESULTS MEASURES (Y): plotted on the Y axis as the dependent variable
• Output: Customer Satisfaction, Total Defects, Cycle Time, Cost
14.
Scatter Plots
[Four example panels: No Correlation, Negative, Curvilinear, Positive]
• See how one factor relates to changes in another
• Develop and/or verify hypotheses
• Judge strength of relationship by width or tightness of scatter
• Don’t assume a causal relationship!
15.
Exercise: Interpreting Scatter Plots
1. As a team, review assigned Scatter Plots (see next pages)
2. What kind of correlation do you see? (Name)
3. What does it mean?
4. What can you conclude?
5. What data might this represent? (Example)
16.
Example One
17.
Example Two
18.
Example Three
19.
Minitab Example: Scatter Plot
• Next, we will work through a Minitab example using data collected at Anthony’s Pizza.
• The Belt suspects that customers have to wait too long on days when Anthony’s Pizza has many deliveries to make.
20.
Minitab Example: Pizza Scatter Plot
A month of data was collected and stored in the Minitab file Regression-Pizza.mtw
22.
Pizza Scatter Plot (Cont.)
When you click on Scatterplots, this is the first dialog box that comes up
3. Select the Simple Scatterplot
4. Click on OK to move to the next dialog box
23.
Pizza Scatter Plot (Cont.)
5. Double click on C5 Wait Time to enter it as the Y variable, then double click on C6 Deliveries to enter it as the X variable
6. Edit dialog box options (Optional)
7. Click OK
24.
Pizza Scatter Plot (Cont.)
Does it look like the number of Deliveries influences the customer’s Wait Time?
[Scatterplot of Wait Time vs Deliveries: Wait Time on the Y axis, Deliveries on the X axis]
25.
Pizza Scatter Plot (Cont.)
Note: Hold your cursor over any point on the Scatterplot and Minitab will identify the Row, X-Value and Y-Value for that point
26.
Correlation Coefficients (r & r²)
Numbers that indicate the strength of the correlation between two factors
• r: the strength and direction of the linear relationship; also called Pearson’s Correlation Coefficient
• r²: the percentage of variation in Y attributable to the independent variable X
Adds precision to a person’s visual judgment about correlation
Test the power of your hypothesis
• How much influence does this factor have?
• Are there other, more important, “vital few” causes?
27.
Interpreting Correlation Coefficients
• r falls on or between -1 and 1
• Calculate in Minitab
• As a rule of thumb, values below -0.65 or above 0.65 indicate a meaningful correlation
• 1 = “Perfect” positive correlation
• -1 = “Perfect” negative correlation
• Use r to calculate r²
[Example plots: r = 0 and r = -0.8]
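Outside Minitab, r can also be computed directly from paired data. The following is a minimal Python sketch and is not part of the original module: the function names `pearson_r` and `is_meaningful` and the four data points are made up for illustration, and the 0.65 cutoff is simply the rule of thumb quoted on this slide.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient for paired observations."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need at least two paired observations")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sums of cross-products and squared deviations from the means
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("r is undefined when a variable does not vary")
    return sxy / math.sqrt(sxx * syy)

def is_meaningful(r, cutoff=0.65):
    """Apply the slide's rule of thumb (cutoff is an assumption, not a law)."""
    return abs(r) > cutoff

# Made-up illustration data: one perfectly rising and one perfectly falling line
xs = [1, 2, 3, 4]
rising = [2, 4, 6, 8]
falling = [8, 6, 4, 2]
print(pearson_r(xs, rising))   # 1.0
print(pearson_r(xs, falling))  # -1.0
```

The length check mirrors the paired-data requirement: every X value needs its own Y value before r means anything.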
28.
Pearson Correlation Coefficient (r) – Mortgage
• Betty Black Belt used the scatter plot to get a visual picture of the relationship between broker experience and call length
• Now she uses the Pearson Correlation Coefficient, r, to quantify the strength of the relationship
[Scatter plot: Call Length vs. Broker Experience]
r = -0.896 (a strong negative correlation)
29.
Exercise: Correlation
• The scatter plot shows that customers wait longer when Anthony’s Pizza has to make more deliveries
• Next, the Belt wants to quantify the strength of that relationship
• To do that, we will calculate the Pearson Correlation Coefficient, r
31.
Correlation Input Window
2. Double click on C5 Wait Time and C6 Deliveries to add them to the Variables box
3. Uncheck the Display p-values box
4. Click OK
32.
Correlation Coefficient
Since r, the Pearson correlation, is 0.970, there is a meaningful correlation between wait time and the number of deliveries
33.
Interpreting Coefficients – r²
• First, we obtained r from the Correlation analysis
• Next, in Regression, we will look at r² to see how good our model (regression equation) is
• r²: compute by multiplying r x r (Pearson correlation squared)
• Example: With an r value of .970 in the Pizza example, the team computed r²: .970 x .970 = .941, or 94.1%
• So, about 94% of the variation in wait time is explained by the variation in deliveries
34.
Regression Analysis
• Regression Analysis is used in conjunction with Correlation and Scatter Plots to predict future performance using past results
• Correlation shows how strong the linear relationship between two variables is; Regression defines that relationship more precisely
• Use this tool when there is existing data over a defined range
• Regression analysis is a tool that uses data on relevant variables to develop a prediction equation, or model
35.
Linear Regression
In Simple Linear Regression, a single variable “X” is used to define/predict “Y”
e.g., Wait Time = B1 + (B2) x (Deliveries) + (error)
Simple Regression Equation: Y = B1 + (B2) x (X) + (error)
• B1 = Intercept (the value of Y where the line crosses X = 0)
• B2 = Slope (Δy / Δx, the change in Y per unit change in X)
[Diagram: fitted line on X-Y axes with the slope marked as Δy over Δx]
36.
Exercise: Regression
• Since the Pearson Correlation (r) was .970, we know that there is a strong positive correlation between the number of deliveries and the wait time
• Next, the Belt would like to get an equation to predict how long the customers will be waiting
37.
Regression (Cont.)
1. Choose Stat > Regression > Fitted Line Plot
38.
Fitted Line Input Window
2. Double click on C5 Wait Time to enter it as the Response (Y) variable
3. Double click on C6 Deliveries to enter it as the Predictor (X) variable
4. Make sure Linear is checked for the type of Regression
5. Edit dialog box options (Optional)
6. Click OK
39.
Pizza Regression Plot
[Fitted Line Plot: Wait Time (Y axis) vs. Deliveries (X axis)]
Wait Time = 32.05 + 0.5825 Deliveries
S = 1.11885, R-Sq = 94.1%, R-Sq(adj) = 93.9%
40.
Regression Analysis Results – Session Window
• Prediction Equation (Regression Model)
• R-Sq is the percentage of variation in the response explained by the model
• Notice that 94.1% = .941 ≈ .970 x .970: R-Sq is the square of the Pearson correlation from the previous analysis
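For reference, the link between R-Sq and the Pearson correlation can be written out. This is the standard textbook definition rather than anything shown in the Minitab output; SSE, SST, the fitted value \hat{Y}_i and the mean \bar{Y} are notation introduced here.

```latex
R^2 = 1 - \frac{SSE}{SST}
    = 1 - \frac{\sum_i (Y_i - \hat{Y}_i)^2}{\sum_i (Y_i - \bar{Y})^2},
\qquad
R^2 = r^2 \quad \text{(simple linear regression, one predictor)}
```

For the Pizza example, 0.970 x 0.970 = 0.9409, which agrees with the reported R-Sq of 94.1% once the rounding of r to three decimals is taken into account.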
41.
Using the Prediction Equation
• If we have 20 deliveries to make, how long will the customer have to wait for their order?
• Based on our 30 minute guarantee, how acceptable is our performance?
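These questions can be worked by plugging the number of deliveries into the fitted equation from the Pizza Regression Plot slide (Wait Time = 32.05 + 0.5825 Deliveries). A minimal Python sketch follows; the function and constant names are illustrative, the coefficients are the ones Minitab reported, and the 30-minute figure is the guarantee mentioned on this slide.

```python
INTERCEPT = 32.05        # B1 from the fitted line plot
SLOPE = 0.5825           # B2 from the fitted line plot
GUARANTEE_MINUTES = 30   # delivery guarantee quoted on the slide

def predicted_wait_time(deliveries):
    """Predicted wait time in minutes for a given number of deliveries."""
    return INTERCEPT + SLOPE * deliveries

wait = predicted_wait_time(20)
print(round(wait, 2))             # 43.7
print(wait <= GUARANTEE_MINUTES)  # False
```

With 20 deliveries the model predicts a wait of about 43.7 minutes, well over the 30-minute guarantee. As the scatter plot slide cautions, such predictions should only be trusted within the range of deliveries covered by the collected data.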
42.
Method of “Least Squares” Regression – Technical Note
[Fitted Line Plot: Wait Time = 32.05 + 0.5825 Deliveries. Ŷ marks the “fitted” observation (the line); Y marks the true observation (the data point)]
Minitab will find the “best fitting” line for us. How does it do that?
• We want to have as little difference as possible between the true observations and the fitted line
• Minitab minimizes the sum of the squared vertical distances between the fitted and true observations
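In symbols, the least-squares criterion and its well-known closed-form solution for a single predictor look like this. B1 and B2 follow the notation of the Linear Regression slide; n, the index i, and the bar notation for sample means are introduced here. This is the textbook result; the module does not describe how Minitab computes it internally.

```latex
\min_{B_1, B_2} \; \sum_{i=1}^{n} \left( Y_i - \hat{Y}_i \right)^2,
\qquad \hat{Y}_i = B_1 + B_2 X_i

B_2 = \frac{\sum_{i=1}^{n} (X_i - \bar{X})(Y_i - \bar{Y})}{\sum_{i=1}^{n} (X_i - \bar{X})^2},
\qquad
B_1 = \bar{Y} - B_2 \bar{X}
```

Squaring the distances keeps positive and negative misses from cancelling and penalizes large misses more heavily than small ones.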
43.
Multiple Regression
• Use this when you want to consider more than one predictor variable
• The benefit is that you might need more predictors to create an accurate model
• In the case of our Anthony’s Pizza example, we may want to look at the impact that incorrect orders, damaged pizzas, and cold pizzas have on wait time
44.
Individual Exercise: Pizza
As an Anthony’s Pizza Belt, you suspect that the number of pizza defects increases when more pizzas are ordered. You want to visualize the data and quantify the relationship.
Use the data in the Minitab file Pizza Exercise.mtw to investigate the relationship between “Total Pizzas” and “Defects”
• Create a scatter plot
• Determine correlation
• Create a fitted line plot
• Determine the prediction equation
• How many defects do we usually have when 50 pizzas are on order?
• What do you think of this model?
45.
Another Exercise: Absentee Rate
• The human resources director of a chain of fast-food restaurants studied the absentee rate of employees. Whenever employees called in sick, or simply did not show up, the restaurant manager had to find replacements in a hurry, or else work short-handed.
• The director had data on the number of absences per 100 employees per week (Y) and the average number of months’ experience at the restaurant (X) for 10 restaurants in the chain. The director expected that long-term employees would be more reliable and absent less often.
46.
Absentee Rate
1. Open a blank Minitab worksheet and input the data
2. Create a scatter plot and decide whether a straight line is a reasonable model
3. Conduct a regression analysis and get the linear prediction equation
4. Predict the number of absences for employees with 19.5 months of experience
Experience | Absences
18.1 | 31.5
20.0 | 33.1
20.8 | 27.4
21.5 | 24.5
22.0 | 27.0
22.4 | 27.8
22.9 | 23.3
24.0 | 24.7
25.4 | 16.9
27.3 | 18.1
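For anyone who wants to cross-check their Minitab results for steps 3 and 4, the fit can be reproduced with a few lines of Python. This is a sketch outside the module's Minitab workflow: the function names `fit_line` and `r_squared` are illustrative, the data are the ten pairs from this slide, and the scatter plot of step 2 is not covered.

```python
# The ten (Experience, Absences) pairs from the Absentee Rate slide
experience = [18.1, 20.0, 20.8, 21.5, 22.0, 22.4, 22.9, 24.0, 25.4, 27.3]
absences = [31.5, 33.1, 27.4, 24.5, 27.0, 27.8, 23.3, 24.7, 16.9, 18.1]

def fit_line(xs, ys):
    """Least-squares intercept (B1) and slope (B2) for one predictor."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope

def r_squared(xs, ys, intercept, slope):
    """Share of the variation in ys explained by the fitted line."""
    mean_y = sum(ys) / len(ys)
    sse = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    sst = sum((y - mean_y) ** 2 for y in ys)
    return 1 - sse / sst

intercept, slope = fit_line(experience, absences)
print(f"Absences = {intercept:.2f} + ({slope:.4f}) x Experience")
print(f"R-Sq = {r_squared(experience, absences, intercept, slope):.1%}")
print(f"Predicted absences at 19.5 months: {intercept + slope * 19.5:.1f}")
```

A negative slope would be consistent with the director's expectation that more experienced staff are absent less often; 19.5 months lies inside the observed range of 18.1 to 27.3, so the prediction is not an extrapolation.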
47.
Takeaways
• Start with a visual tool: create a scatter plot
• Calculate the Pearson correlation coefficient, r, to determine the strength of the relationship
• Remember that correlation does not guarantee causation!
• Create and interpret the Regression Plot
• Use the prediction equation
• Validate the prediction model’s r-squared using new data (not part of the data set used in creating the prediction equation)
48.
What other comments or questions do you have?