Testing for Multicollinearity
Complex Tools + Clear Teaching = Powerful Results
Section & Lesson #: Six Sigma-Improve – Lesson 3
Pre-Requisite Lessons: Six Sigma-Analyze #09 through #28 – Hypothesis Testing
A review of how we can assess if the factors tested in the hypothesis tests in the Analyze phase have multicollinearity (i.e., interdependency).
Copyright © 2011-2019 by Matthew J. Hansen. All Rights Reserved. No part of this publication may be reproduced, stored in a retrieval system, or transmitted by any means
(electronic, mechanical, photographic, photocopying, recording or otherwise) without prior permission in writing by the author and/or publisher.
Why do we need hypothesis testing?
o Remember, our project goal is to resolve a problem by first building a transfer function.
• We don’t want to just alleviate symptoms, we want to resolve the root cause.
 Remember Hannah? We don’t want to alleviate the arthritis pain in her leg, but heal the strep throat.
• If we don’t know what the root cause is, then we need to build a transfer function.
 By building a transfer function, we can know what changes (improvements) should fix the root cause.
o Remember, the Transfer Function is defined as Y = f(X).
• This is described as “output response Y is a function of one or more input X’s”.
• It’s part of the IPO flow model, in which one or more inputs feed into a process that transforms them to create a new output.
o How does a transfer function fit with hypothesis testing?
• Hypothesis testing tells us which X’s (inputs) are independently influencing the Y (output).
 When we reject a null hypothesis, we’re building statistical evidence for which X’s are “guilty” of driving the Y.
 We’ll compile all the evidence in the Improve phase of DMAIC and begin to fix those root causes.
Y = f(X)
Input (X) > Process > Output (Y)
Multicollinearity Defined
o What is multicollinearity?
• When building the transfer function, we expect each X (input) to be independent.
 If they’re not, then it could impede accurate control of the inputs to create the desired Y (output).
• Multicollinearity is when two or more X’s that are treated as independent variables are in fact correlated with (inter-dependent on) each other.
o Multicollinearity Example:
• Transfer Function: Fuel Efficiency (Y) = f(fuel price, speed, engine size, vehicle weight, etc.)
 Engine size and vehicle weight are each considered independent factors that influence fuel efficiency.
 But the bigger the engine, the more it adds to the vehicle weight too.
o How do we test for multicollinearity?
• Use Multiple Regression Procedures.
 These consist of 7 general steps using correlation testing, matrix plots, regressions, etc.
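The engine size and vehicle weight example can be sketched with simulated data. This is only an illustration: the engine sizes (in liters), the 900 kg base weight, and the 250 kg per liter are made-up assumptions, not values from the lesson. The point is that two inputs treated as independent can turn out to be strongly correlated.

```python
import numpy as np

# Simulated data; every number here is an illustrative assumption.
rng = np.random.default_rng(0)
engine_size = rng.uniform(1.0, 6.0, 200)  # liters

# Assume a bigger engine adds weight: base weight + weight per liter + noise.
vehicle_weight = 900 + 250 * engine_size + rng.normal(0, 100, 200)  # kg

# Pearson correlation between the two supposedly independent inputs.
r = np.corrcoef(engine_size, vehicle_weight)[0, 1]
print(f"Correlation between engine size and vehicle weight: {r:.3f}")
```

A correlation this close to 1 between two X's is the kind of inter-dependency the following steps are meant to detect.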
Multiple Regression Procedures (1 of 7)
o The process for Multiple Regressions depends on the following assumptions:
• The output (Y) is normal.
 Validate with the P value of a normality test or a probability plot. If P > 0.05, there is no evidence against normality, so treat the data as normal.
 If the data is not normal, consider transforming it to make it normal, or accept the risk of non-normality.
• The correct sample size is used.
 Run a sample size calculation on the data to ensure there are enough data points.
• Each X being tested is continuous.
 You can try converting your X values to numeric values to “fool” Minitab into treating them as continuous. This only makes sense when the numeric order of the values is meaningful.
o Step 1 – Test each X against the Y to ensure there is a relationship.
• You should have already done this in the Analyze phase; check for a high R²(adj) value.
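As a rough sketch of the Step 1 check outside Minitab, the snippet below fits a simple straight-line regression of Y on a single X and reports R²(adj). The function name `adjusted_r2` and the simulated data are assumptions made for illustration; they are not part of the lesson.

```python
import numpy as np

def adjusted_r2(x, y):
    """R-squared (adjusted) for a straight-line fit of y on one predictor x."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = len(y)
    slope, intercept = np.polyfit(x, y, 1)
    residuals = y - (intercept + slope * x)
    ss_res = float(np.sum(residuals ** 2))
    ss_tot = float(np.sum((y - y.mean()) ** 2))
    r2 = 1.0 - ss_res / ss_tot
    # One predictor, so the residual degrees of freedom are n - 2.
    return 1.0 - (1.0 - r2) * (n - 1) / (n - 2)

# Simulated example: one X that drives Y and one X that does not.
rng = np.random.default_rng(42)
x_related = rng.uniform(0, 100, 100)
x_unrelated = rng.uniform(0, 100, 100)
y = 5.0 + 0.5 * x_related + rng.normal(0, 2, 100)

strong_adj = adjusted_r2(x_related, y)
weak_adj = adjusted_r2(x_unrelated, y)
print(f"Related X:   R-Sq(adj) = {strong_adj:.3f}")
print(f"Unrelated X: R-Sq(adj) = {weak_adj:.3f}")
```

An X with an R²(adj) near zero shows little relationship to the Y and is a weak candidate for the transfer function.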
Multiple Regression Procedures (2 of 7)
o Step 2 – Run a Matrix Plot for all Xs and the Y
• In Minitab, go to Graph > Matrix Plot; check for any linear relationships of possible collinearity.
• Based on the above example Matrix Plot, Metrics C, D & E should be further examined to see if
one or more of them need to be excluded.
[Matrix Plot callout] Despite the correlation of MetricA (Y) to Metrics C, D & E (Xs), there appears to be multicollinearity between Metrics C, D & E.
Multiple Regression Procedures (3 of 7)
o Step 3 – Run a Correlation Matrix of all Xs and the Y
• In Minitab, go to Stat > Basic Statistics > Correlation; look for pairs with a low P value and a high Pearson correlation (close to +1 or −1).
• Again, based on the above Correlation example, Metrics C, D & E should be further examined to
see if one or more of them need to be excluded.
Correlations: MetricA, MetricB, MetricC, MetricD, MetricE

           MetricA   MetricB   MetricC   MetricD
MetricB      0.115
             0.255
MetricC      0.798     0.079
             0.000     0.434
MetricD      0.999     0.119     0.797
             0.000     0.237     0.000
MetricE      1.000     0.132     0.797     0.999
             0.000     0.191     0.000     0.000

Cell Contents: Pearson correlation (upper value), P-Value (lower value)
[Correlation callout] Despite the correlation of MetricA (Y) to Metrics C, D & E (Xs), there appears to be multicollinearity between Metrics C, D & E.
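The screening in Step 3 can be sketched in a few lines of Python using the X-to-X Pearson correlations reported in the Correlation output. The 0.7 cut-off is an assumption chosen for illustration; the lesson does not state a numeric threshold, so agree on one with the team.

```python
from itertools import combinations

# Pearson correlations between the X factors, copied from the Correlation output.
r = {
    ("MetricB", "MetricC"): 0.079,
    ("MetricB", "MetricD"): 0.119,
    ("MetricB", "MetricE"): 0.132,
    ("MetricC", "MetricD"): 0.797,
    ("MetricC", "MetricE"): 0.797,
    ("MetricD", "MetricE"): 0.999,
}

THRESHOLD = 0.7  # assumed cut-off for "highly correlated"; not from the lesson

xs = ["MetricB", "MetricC", "MetricD", "MetricE"]
flagged = [pair for pair in combinations(xs, 2) if abs(r[pair]) >= THRESHOLD]

for a, b in flagged:
    print(f"{a} and {b} are highly correlated (r = {r[(a, b)]:.3f})")
```

With these values, only the pairs among Metrics C, D & E are flagged, which agrees with the callout.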
Multiple Regression Procedures (4 of 7)
o Step 4 – Run a multiple regression generating Variance Inflation Factors (VIFs)
• VIF measures how much the variance of a factor’s coefficient is inflated by that factor’s correlation with the other Xs in the model:
 VIF > 10 = HIGH multicollinearity
 VIF between 5 and 10 = moderate degree of multicollinearity
 VIF < 5 = little or no multicollinearity
• In Minitab, go to Stat > Regression, then select “Variance Inflation Factors” in the Options box
• Again, based on the above Regression example, Metrics C, D & E should be further examined to
see if one or more of them need to be excluded.
 Just because MetricC has a low VIF doesn’t mean we keep it and exclude Metrics D & E. We already know
from the other tests that MetricC may also need to be excluded. The team can validate this.
Regression Analysis: MetricA versus MetricB, MetricC, MetricD, MetricE

The regression equation is
MetricA = 8.29 - 15.1 MetricB + 0.000008 MetricC + 0.281 MetricD + 0.453 MetricE

Predictor        Coef     SE Coef       T      P      VIF
Constant       8.2921      0.3549   23.37  0.000
MetricB      -15.0965      0.5397  -27.97  0.000    1.076
MetricC    0.00000771  0.00000377    2.04  0.044    2.750
MetricD       0.28085     0.03124    8.99  0.000  364.714
MetricE      0.452872    0.005237   86.48  0.000  366.737

S = 1.29942   R-Sq = 100.0%   R-Sq(adj) = 100.0%

[Regression callout] Very high VIF (anything > 10) indicates multicollinearity.
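For readers without Minitab, here is a minimal sketch of how a VIF can be computed. It uses the standard definition VIF = 1 / (1 − R²), where R² comes from regressing one X on all the other Xs. The function name `vif` and the simulated factors are assumptions for illustration, and the sketch does not reproduce the lesson's data.

```python
import numpy as np

def vif(X):
    """Variance Inflation Factor for each column of X (rows = observations)."""
    X = np.asarray(X, dtype=float)
    n, p = X.shape
    result = []
    for j in range(p):
        target = X[:, j]
        others = np.delete(X, j, axis=1)
        # Regress column j on the remaining columns (with an intercept).
        A = np.column_stack([np.ones(n), others])
        coef, *_ = np.linalg.lstsq(A, target, rcond=None)
        residuals = target - A @ coef
        ss_res = float(residuals @ residuals)
        ss_tot = float(np.sum((target - target.mean()) ** 2))
        r2 = 1.0 - ss_res / ss_tot
        result.append(1.0 / (1.0 - r2))
    return np.array(result)

# Simulated factors: b is independent; e is almost a copy of d.
rng = np.random.default_rng(1)
n = 100
b = rng.normal(size=n)
d = rng.normal(size=n)
e = d + rng.normal(scale=0.05, size=n)

vifs = vif(np.column_stack([b, d, e]))
for name, value in zip(["b", "d", "e"], vifs):
    print(f"VIF({name}) = {value:.1f}")
```

The two nearly identical factors get very large VIFs while the independent one stays near 1, the same pattern as Metrics D & E versus MetricB in the Regression output.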
Multiple Regression Procedures (5 of 7)
o Step 5 – Reduce the factors to the most critical (with no multicollinearity). There are two ways:
• Method A – Manually
 If relationships exist between X factors from step 2 (Matrix Plot) and step 3 (Correlation), then re-run step 4 (Regression with VIF) and exclude one of these related factors. Repeat until the VIFs are < 5.
• Method B – Stepwise
 In Minitab, go to Stat > Regression > Stepwise…
 This test attempts to automatically determine the critical X factors. As such, it is only as effective as the data you input to the test, so ensure the output results are logical and validated by the team.
Stepwise Regression: MetricA vs MetricB, MetricC, MetricD, MetricE
Response is MetricA on 4 predictors, with N = 100

Step               1         2         3         4
Constant       4.697     9.254     8.403     8.292

MetricE      0.49945   0.50057   0.45343   0.45287
T-Value       535.59   1329.52     85.30     86.48
P-Value        0.000     0.000     0.000     0.000

MetricB                 -16.26    -15.14    -15.10
T-Value                 -22.69    -27.62    -27.97
P-Value                  0.000     0.000     0.000

MetricD                            0.282     0.281
T-Value                             8.88      8.99
P-Value                            0.000     0.000

MetricC                                    0.00001
T-Value                                       2.04
P-Value                                      0.044

S               4.43      1.77      1.32      1.30
R-Sq           99.97     99.99    100.00    100.00
R-Sq(adj)      99.97     99.99    100.00    100.00

How to read this output:
1. Ensure the factors have a low P value (the P-Value rows).
2. Ensure R²(adj) is high.
3. Ensure S (the standard deviation of the residuals) is low.
4. The values on each factor’s row are the factor coefficients for the Transfer Function. If they are near 0 and don’t affect the R²(adj), then they can probably be excluded.
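Method A can be sketched as a simple loop. The rule used here (drop the factor with the highest VIF, then re-check) is an assumption made for the sketch; the lesson only says to exclude one of the related factors, and the team should decide which one. The VIFs are taken from the diagonal of the inverse correlation matrix, a standard identity, and the data and names (`vifs_from_corr`, `prune_by_vif`) are hypothetical.

```python
import numpy as np

def vifs_from_corr(X):
    """VIFs as the diagonal of the inverse correlation matrix of the Xs."""
    corr = np.corrcoef(X, rowvar=False)
    return np.diag(np.linalg.inv(corr))

def prune_by_vif(X, names, limit=5.0):
    """Drop the highest-VIF factor until every remaining VIF is below limit."""
    X = np.asarray(X, dtype=float)
    names = list(names)
    while X.shape[1] > 1:
        v = vifs_from_corr(X)
        worst = int(np.argmax(v))
        if v[worst] < limit:
            break
        # Assumed rule: remove the single worst factor, then re-check.
        X = np.delete(X, worst, axis=1)
        names.pop(worst)
    return names

# Simulated factors: MetricE is almost a copy of MetricD.
rng = np.random.default_rng(1)
n = 100
metric_b = rng.normal(size=n)
metric_d = rng.normal(size=n)
metric_e = metric_d + rng.normal(scale=0.05, size=n)

kept = prune_by_vif(
    np.column_stack([metric_b, metric_d, metric_e]),
    ["MetricB", "MetricD", "MetricE"],
)
print("Factors kept:", kept)
```

An automated rule like this is only a starting point; as with the Stepwise method, the result should be checked for logic and validated by the team.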
Multiple Regression Procedures (6 of 7)
o Step 6 - Evaluate the quality of the final transfer function
• Build the transfer function and validate with team what factors to include/exclude
 Ensure all VIFs are < 10 and preferably < 5
 Ensure the residuals are independent and normally distributed
 Ensure any outliers or unusual observations are validated by the team
• Use the Stepwise Regression to get the Constant & Coefficients for the Transfer Function
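In the Stepwise Regression output from Step 5, step 2 (using Metrics B & E) appears to have a high R²(adj), a low S and a low VIF (Metrics B & E are only weakly correlated, r = 0.132).
Build the Transfer Function excluding Metrics C & D, if the team agrees:
MetricA = 9.25 + 0.5(MetricE) + (-16.26)(MetricB)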
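The transfer function can be written as a small Python function. This sketch uses the unrounded step 2 values from the Stepwise Regression output (Constant 9.254, MetricE 0.50057, MetricB -16.26); the function name `metric_a` and the input values are chosen for illustration.

```python
def metric_a(metric_e, metric_b):
    """Transfer function from step 2 of the Stepwise Regression output."""
    return 9.254 + 0.50057 * metric_e + (-16.26) * metric_b

# Example inputs: MetricE = 500 and MetricB = 6% (entered as 0.06).
fit = metric_a(500, 0.06)
print(f"Predicted MetricA = {fit:.2f}")

# The same inputs with the rounded coefficients shown on the slide.
rounded_fit = 9.25 + 0.5 * 500 + (-16.26) * 0.06
print(f"With rounded coefficients = {rounded_fit:.2f}")
```

The rounded coefficients give a result about 0.29 lower, so when MetricE is large it may be worth carrying the extra decimal places from the regression output.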
Multiple Regression Procedures (7 of 7)
o Step 7 – Assess predictive capability of the transfer function
• Go to Stat > Regression > Regression > Options; add prediction intervals to test
 Helps answer the question “What would the output Y be when the input Xs have these values?”
 To do this, type in values for the X factors (“predictors”)
 Examine the PI (Prediction Interval) in the Predicted Values section to see the predictive capability
Predicted Values for New Observations

New Obs     Fit  SE Fit            95% CI            95% PI
      1  258.57    0.32  (257.94, 259.19)  (254.99, 262.14)

Values of Predictors for New Observations

New Obs  MetricE  MetricB
      1      500   0.0600

Interpreted as:
“We are 95% confident that a new observation of MetricA will be between 254.99 and 262.14 when MetricE = 500 and MetricB = 6%”
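The two intervals can be approximately reproduced from the reported values using the standard formulas CI = Fit ± t × SE Fit and PI = Fit ± t × √(S² + SE Fit²). This sketch rests on assumptions: S = 1.77 is taken from step 2 of the Stepwise Regression output (the model with Metrics E & B), the degrees of freedom are taken as 100 − 3 = 97, and the t critical value of about 1.985 is hard-coded from a t table rather than computed.

```python
import math

fit = 258.57    # Fit from the Predicted Values output
se_fit = 0.32   # SE Fit from the Predicted Values output
s = 1.77        # S for step 2 of the stepwise output (assumed to be this model)
t_crit = 1.985  # approx. two-sided 95% t value for 97 degrees of freedom (assumed)

# Confidence interval: uncertainty in the mean response at these X values.
ci_half = t_crit * se_fit
ci = (fit - ci_half, fit + ci_half)

# Prediction interval: also includes the scatter of a single new observation.
pi_half = t_crit * math.sqrt(s ** 2 + se_fit ** 2)
pi = (fit - pi_half, fit + pi_half)

print(f"95% CI: ({ci[0]:.2f}, {ci[1]:.2f})")
print(f"95% PI: ({pi[0]:.2f}, {pi[1]:.2f})")
```

The results land within a few hundredths of the Minitab output; the small differences come from rounding in the reported Fit, SE Fit and S. The PI is wider than the CI because it covers a single new observation rather than the average response.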
Validate Results with the Team
o Always validate your findings and interpreted results with your team.
• The team (or any other supporting SME) is your BEST validation of practical significance.
• It’s helpful to keep the team updated regularly with findings to help steer your analysis.
 Example: set up recurring meetings to review the findings. If they challenge a finding that you believe is significant, then explore other ways you can substantiate either your finding or their dispute.
o Your goal of analysis is not to be the mastermind who finds the silver bullet.
• A silver bullet is only a paperweight unless you can shoot it at the right target.
• You must work with your team to accurately find, aim at, and hit your target root cause.
Practical Application
o Refer to the critical metric (output Y) and at least 5 factors (input X’s) you identified in
the Analyze phase for which you did hypothesis testing.
• Run through the Multiple Regression Procedures to test for multicollinearity in your data.
 Does any multicollinearity exist? If so, how do you know it?
– Is the multicollinearity logical? (That is, does it make sense? Did you know the interdependency existed before running
this test?)
 How would you modify your transfer function to account for this multicollinearity?
 
Socio-economic-Impact-of-business-consumers-suppliers-and.pptx
Socio-economic-Impact-of-business-consumers-suppliers-and.pptxSocio-economic-Impact-of-business-consumers-suppliers-and.pptx
Socio-economic-Impact-of-business-consumers-suppliers-and.pptxtrishalcan8
 
Vip Dewas Call Girls #9907093804 Contact Number Escorts Service Dewas
Vip Dewas Call Girls #9907093804 Contact Number Escorts Service DewasVip Dewas Call Girls #9907093804 Contact Number Escorts Service Dewas
Vip Dewas Call Girls #9907093804 Contact Number Escorts Service Dewasmakika9823
 
VIP Call Girl Jamshedpur Aashi 8250192130 Independent Escort Service Jamshedpur
VIP Call Girl Jamshedpur Aashi 8250192130 Independent Escort Service JamshedpurVIP Call Girl Jamshedpur Aashi 8250192130 Independent Escort Service Jamshedpur
VIP Call Girl Jamshedpur Aashi 8250192130 Independent Escort Service JamshedpurSuhani Kapoor
 
Monthly Social Media Update April 2024 pptx.pptx
Monthly Social Media Update April 2024 pptx.pptxMonthly Social Media Update April 2024 pptx.pptx
Monthly Social Media Update April 2024 pptx.pptxAndy Lambert
 
Sales & Marketing Alignment: How to Synergize for Success
Sales & Marketing Alignment: How to Synergize for SuccessSales & Marketing Alignment: How to Synergize for Success
Sales & Marketing Alignment: How to Synergize for SuccessAggregage
 
A DAY IN THE LIFE OF A SALESMAN / WOMAN
A DAY IN THE LIFE OF A  SALESMAN / WOMANA DAY IN THE LIFE OF A  SALESMAN / WOMAN
A DAY IN THE LIFE OF A SALESMAN / WOMANIlamathiKannappan
 
VIP Kolkata Call Girl Howrah 👉 8250192130 Available With Room
VIP Kolkata Call Girl Howrah 👉 8250192130  Available With RoomVIP Kolkata Call Girl Howrah 👉 8250192130  Available With Room
VIP Kolkata Call Girl Howrah 👉 8250192130 Available With Roomdivyansh0kumar0
 
RE Capital's Visionary Leadership under Newman Leech
RE Capital's Visionary Leadership under Newman LeechRE Capital's Visionary Leadership under Newman Leech
RE Capital's Visionary Leadership under Newman LeechNewman George Leech
 
7.pdf This presentation captures many uses and the significance of the number...
7.pdf This presentation captures many uses and the significance of the number...7.pdf This presentation captures many uses and the significance of the number...
7.pdf This presentation captures many uses and the significance of the number...Paul Menig
 

Recently uploaded (20)

Regression analysis: Simple Linear Regression Multiple Linear Regression
Regression analysis:  Simple Linear Regression Multiple Linear RegressionRegression analysis:  Simple Linear Regression Multiple Linear Regression
Regression analysis: Simple Linear Regression Multiple Linear Regression
 
Tech Startup Growth Hacking 101 - Basics on Growth Marketing
Tech Startup Growth Hacking 101  - Basics on Growth MarketingTech Startup Growth Hacking 101  - Basics on Growth Marketing
Tech Startup Growth Hacking 101 - Basics on Growth Marketing
 
The Coffee Bean & Tea Leaf(CBTL), Business strategy case study
The Coffee Bean & Tea Leaf(CBTL), Business strategy case studyThe Coffee Bean & Tea Leaf(CBTL), Business strategy case study
The Coffee Bean & Tea Leaf(CBTL), Business strategy case study
 
Keppel Ltd. 1Q 2024 Business Update Presentation Slides
Keppel Ltd. 1Q 2024 Business Update  Presentation SlidesKeppel Ltd. 1Q 2024 Business Update  Presentation Slides
Keppel Ltd. 1Q 2024 Business Update Presentation Slides
 
VIP Call Girls In Saharaganj ( Lucknow ) 🔝 8923113531 🔝 Cash Payment (COD) 👒
VIP Call Girls In Saharaganj ( Lucknow  ) 🔝 8923113531 🔝  Cash Payment (COD) 👒VIP Call Girls In Saharaganj ( Lucknow  ) 🔝 8923113531 🔝  Cash Payment (COD) 👒
VIP Call Girls In Saharaganj ( Lucknow ) 🔝 8923113531 🔝 Cash Payment (COD) 👒
 
Progress Report - Oracle Database Analyst Summit
Progress  Report - Oracle Database Analyst SummitProgress  Report - Oracle Database Analyst Summit
Progress Report - Oracle Database Analyst Summit
 
Best Practices for Implementing an External Recruiting Partnership
Best Practices for Implementing an External Recruiting PartnershipBest Practices for Implementing an External Recruiting Partnership
Best Practices for Implementing an External Recruiting Partnership
 
Call Girls in Gomti Nagar - 7388211116 - With room Service
Call Girls in Gomti Nagar - 7388211116  - With room ServiceCall Girls in Gomti Nagar - 7388211116  - With room Service
Call Girls in Gomti Nagar - 7388211116 - With room Service
 
It will be International Nurses' Day on 12 May
It will be International Nurses' Day on 12 MayIt will be International Nurses' Day on 12 May
It will be International Nurses' Day on 12 May
 
Yaroslav Rozhankivskyy: Три складові і три передумови максимальної продуктивн...
Yaroslav Rozhankivskyy: Три складові і три передумови максимальної продуктивн...Yaroslav Rozhankivskyy: Три складові і три передумови максимальної продуктивн...
Yaroslav Rozhankivskyy: Три складові і три передумови максимальної продуктивн...
 
MONA 98765-12871 CALL GIRLS IN LUDHIANA LUDHIANA CALL GIRL
MONA 98765-12871 CALL GIRLS IN LUDHIANA LUDHIANA CALL GIRLMONA 98765-12871 CALL GIRLS IN LUDHIANA LUDHIANA CALL GIRL
MONA 98765-12871 CALL GIRLS IN LUDHIANA LUDHIANA CALL GIRL
 
Socio-economic-Impact-of-business-consumers-suppliers-and.pptx
Socio-economic-Impact-of-business-consumers-suppliers-and.pptxSocio-economic-Impact-of-business-consumers-suppliers-and.pptx
Socio-economic-Impact-of-business-consumers-suppliers-and.pptx
 
Vip Dewas Call Girls #9907093804 Contact Number Escorts Service Dewas
Vip Dewas Call Girls #9907093804 Contact Number Escorts Service DewasVip Dewas Call Girls #9907093804 Contact Number Escorts Service Dewas
Vip Dewas Call Girls #9907093804 Contact Number Escorts Service Dewas
 
VIP Call Girl Jamshedpur Aashi 8250192130 Independent Escort Service Jamshedpur
VIP Call Girl Jamshedpur Aashi 8250192130 Independent Escort Service JamshedpurVIP Call Girl Jamshedpur Aashi 8250192130 Independent Escort Service Jamshedpur
VIP Call Girl Jamshedpur Aashi 8250192130 Independent Escort Service Jamshedpur
 
Monthly Social Media Update April 2024 pptx.pptx
Monthly Social Media Update April 2024 pptx.pptxMonthly Social Media Update April 2024 pptx.pptx
Monthly Social Media Update April 2024 pptx.pptx
 
Sales & Marketing Alignment: How to Synergize for Success
Sales & Marketing Alignment: How to Synergize for SuccessSales & Marketing Alignment: How to Synergize for Success
Sales & Marketing Alignment: How to Synergize for Success
 
A DAY IN THE LIFE OF A SALESMAN / WOMAN
A DAY IN THE LIFE OF A  SALESMAN / WOMANA DAY IN THE LIFE OF A  SALESMAN / WOMAN
A DAY IN THE LIFE OF A SALESMAN / WOMAN
 
VIP Kolkata Call Girl Howrah 👉 8250192130 Available With Room
VIP Kolkata Call Girl Howrah 👉 8250192130  Available With RoomVIP Kolkata Call Girl Howrah 👉 8250192130  Available With Room
VIP Kolkata Call Girl Howrah 👉 8250192130 Available With Room
 
RE Capital's Visionary Leadership under Newman Leech
RE Capital's Visionary Leadership under Newman LeechRE Capital's Visionary Leadership under Newman Leech
RE Capital's Visionary Leadership under Newman Leech
 
7.pdf This presentation captures many uses and the significance of the number...
7.pdf This presentation captures many uses and the significance of the number...7.pdf This presentation captures many uses and the significance of the number...
7.pdf This presentation captures many uses and the significance of the number...
 

Testing for Multicollinearity

  • 1. Testing for Multicollinearity
      Complex Tools + Clear Teaching = Powerful Results
      Section & Lesson #: Six Sigma-Improve – Lesson 3
      Pre-Requisite Lessons: Six Sigma-Analyze #09 through #28 – Hypothesis Testing
      A review of how we can assess whether the factors tested in the hypothesis tests in the Analyze phase have multicollinearity (i.e., interdependency).
      Copyright © 2011-2019 by Matthew J. Hansen. All Rights Reserved. No part of this publication may be reproduced, stored in a retrieval system, or transmitted by any means (electronic, mechanical, photographic, photocopying, recording or otherwise) without prior permission in writing by the author and/or publisher.
  • 2. Why do we need hypothesis testing?
      o Remember, our project goal is to resolve a problem by first building a transfer function.
          • We don’t want to just alleviate symptoms; we want to resolve the root cause.
              ▪ Remember Hannah? We don’t want to alleviate the arthritis pain in her leg, but heal the strep throat.
          • If we don’t know what the root cause is, then we need to build a transfer function.
              ▪ By building a transfer function, we can know what changes (improvements) should fix the root cause.
      o Remember, the Transfer Function is defined as Y = f(X).
          • This is described as “output response Y is a function of one or more input X’s”.
          • It’s part of the IPO flow model, in which one or more inputs feed into a process that transforms them to create a new output: Input (X) > Process > Output (Y).
      o How does a transfer function fit with hypothesis testing?
          • Hypothesis testing tells us which X’s (inputs) are independently influencing the Y (output).
              ▪ When we reject a null hypothesis, we’re building evidence for which X’s are “guilty” of driving the Y.
              ▪ We’ll compile all the evidence in the Improve phase of DMAIC and begin to fix those root causes.
  • 3. Multicollinearity Defined
      o What is multicollinearity?
          • When building the transfer function, we expect each X (input) to be independent of the other X’s.
              ▪ If they’re not, we cannot adjust one input without also changing another, which makes it harder to control the inputs accurately to create the desired Y (output).
          • Multicollinearity is when two or more supposedly independent variables are found to be interdependent.
      o Multicollinearity Example:
          • Transfer Function: Fuel Efficiency (Y) = f(fuel price, speed, engine size, vehicle weight, etc.)
              ▪ Engine size and vehicle weight are each considered independent factors that influence fuel efficiency.
              ▪ But the bigger the engine, the more it adds to the vehicle weight too.
      o How do we test for multicollinearity?
          • Use Multiple Regression Procedures.
              ▪ These consist of 7 general steps using correlation testing, matrix plots, regressions, etc.
  • 4. Multiple Regression Procedures (1 of 7)
      o The process for Multiple Regressions depends on the following assumptions:
          • The output (Y) is normal.
              ▪ Validate by the P value of a normality test or probability plot. If P > 0.05, we fail to reject normality and can treat the data as normal.
              ▪ If the data is not normal, consider transforming it to make it normal, or accept the risk of non-normality.
          • An adequate sample size is used.
              ▪ Run a sample size calculation on the data to ensure there are enough data points.
          • Each X being tested is continuous.
              ▪ You can try converting your X values to numeric values to “fool” Minitab into treating them as continuous.
      o Step 1 – Test each X against the Y to ensure there is a relationship.
          • You should have already done this in the Analyze phase; check for a high R²(adj) value.
  • 5. Multiple Regression Procedures (2 of 7)
      o Step 2 – Run a Matrix Plot for all Xs and the Y
          • In Minitab, go to Graph > Matrix Plot; check for any linear relationships between Xs that suggest possible collinearity.
          • Based on the example Matrix Plot, Metrics C, D & E should be further examined to see if one or more of them need to be excluded.
      [Figure: example Matrix Plot of MetricA (Y) and the X metrics]
      Despite the correlation of MetricA (Y) to Metrics C, D & E (Xs), there appears to be multicollinearity between Metrics C, D & E.
  • 6. Multiple Regression Procedures (3 of 7)
      o Step 3 – Run a Correlation Matrix of all Xs and the Y
          • In Minitab, go to Stat > Basic Statistics > Correlation; look for a low P value and a high Pearson correlation between pairs of Xs.
          • Again, based on the Correlation example below, Metrics C, D & E should be further examined to see if one or more of them need to be excluded.

      Correlations: MetricA, MetricB, MetricC, MetricD, MetricE

                 MetricA  MetricB  MetricC  MetricD
      MetricB      0.115
                   0.255
      MetricC      0.798    0.079
                   0.000    0.434
      MetricD      0.999    0.119    0.797
                   0.000    0.237    0.000
      MetricE      1.000    0.132    0.797    0.999
                   0.000    0.191    0.000    0.000

      Cell Contents: Pearson correlation
                     P-Value

      Despite the correlation of MetricA (Y) to Metrics C, D & E (Xs), there appears to be multicollinearity between Metrics C, D & E.
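For readers working outside Minitab, the pairwise check in Step 3 can be sketched in plain Python. This is only an illustration: the column names are borrowed from the fuel efficiency example earlier in the deck, the data values are made up, and the 0.9 flagging threshold is an assumption rather than a rule from the slides. It also computes only the Pearson correlation, not the P-value that Minitab reports.

```python
import math
from itertools import combinations


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical X factors (illustrative values only, not from the slides).
data = {
    "engine_size": [1.6, 2.0, 2.4, 3.0, 3.6, 5.0],
    "vehicle_weight": [1200, 1350, 1500, 1700, 1900, 2300],
    "speed": [60, 45, 70, 50, 65, 55],
}

# Full correlation matrix as a nested dict: corr[a][b] is r between a and b.
corr = {a: {b: pearson_r(data[a], data[b]) for b in data} for a in data}

# Flag pairs of X factors whose correlation is strong enough to investigate.
THRESHOLD = 0.9  # assumed cut-off for this sketch
flagged = [
    (a, b, round(corr[a][b], 3))
    for a, b in combinations(data, 2)
    if abs(corr[a][b]) > THRESHOLD
]
print(flagged)
```

Only pairs of X factors are compared here; a flagged pair is a prompt to look closer in Step 4, not a decision to drop a factor.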
  • 7. Multiple Regression Procedures (4 of 7)
      o Step 4 – Run a multiple regression generating Variance Inflation Factors (VIFs)
          • A VIF quantifies the degree of multicollinearity affecting each tested factor:
              ▪ VIF > 10 = HIGH multicollinearity
              ▪ VIF between 5 and 10 = moderate degree of multicollinearity
              ▪ VIF < 5 = little or no multicollinearity
          • In Minitab, go to Stat > Regression, then select “Variance Inflation Factors” in the Options box.
          • Again, based on the Regression example below, Metrics C, D & E should be further examined to see if one or more of them need to be excluded.
              ▪ Just because MetricC has a low VIF doesn’t mean we keep it and exclude Metrics D & E. We already know from the other tests that MetricC may also need to be excluded. The team can validate this.

      Regression Analysis: MetricA versus MetricB, MetricC, MetricD, MetricE

      The regression equation is
      MetricA = 8.29 - 15.1 MetricB + 0.000008 MetricC + 0.281 MetricD + 0.453 MetricE

      Predictor        Coef     SE Coef       T      P      VIF
      Constant       8.2921      0.3549   23.37  0.000
      MetricB      -15.0965      0.5397  -27.97  0.000    1.076
      MetricC    0.00000771  0.00000377    2.04  0.044    2.750
      MetricD       0.28085     0.03124    8.99  0.000  364.714
      MetricE      0.452872    0.005237   86.48  0.000  366.737

      S = 1.29942   R-Sq = 100.0%   R-Sq(adj) = 100.0%

      Very high VIF (anything > 10) indicates multicollinearity.
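To make the VIF less of a black box, here is a minimal sketch in plain Python. It relies on the standard identity that the VIF of predictor j equals the j-th diagonal entry of the inverse of the predictors' correlation matrix. The inputs are the rounded X-to-X correlations from the Step 3 output, so the results will not reproduce Minitab's VIFs exactly. The function names are hypothetical, and the handling of a VIF of exactly 5 or 10 is an assumption, since the slides leave those boundaries open.

```python
def invert(matrix):
    """Invert a square matrix using Gauss-Jordan elimination with partial pivoting."""
    n = len(matrix)
    # Augment the matrix with the identity; the right half becomes the inverse.
    aug = [
        [float(v) for v in row] + [1.0 if i == j else 0.0 for j in range(n)]
        for i, row in enumerate(matrix)
    ]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular (perfect collinearity)")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [v / scale for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [v - factor * w for v, w in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def vifs_from_correlations(names, corr):
    """Return {name: VIF}, reading VIFs off the diagonal of the inverse matrix."""
    inverse = invert(corr)
    return {name: inverse[i][i] for i, name in enumerate(names)}


def classify(vif):
    """Apply the slide's thresholds; exactly 5 or 10 counts as moderate (assumption)."""
    if vif > 10:
        return "high"
    if vif >= 5:
        return "moderate"
    return "little or none"


# Rounded correlations among the X factors, taken from the Step 3 output.
names = ["MetricB", "MetricC", "MetricD", "MetricE"]
corr = [
    [1.000, 0.079, 0.119, 0.132],
    [0.079, 1.000, 0.797, 0.797],
    [0.119, 0.797, 1.000, 0.999],
    [0.132, 0.797, 0.999, 1.000],
]

vifs = vifs_from_correlations(names, corr)
for name in names:
    print(f"{name}: VIF = {vifs[name]:.1f} ({classify(vifs[name])})")
```

Even with rounded inputs, the pattern matches the Minitab output: MetricB and MetricC land in the low band, while MetricD and MetricE are far above 10.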
  • 8. Multiple Regression Procedures (5 of 7)
      o Step 5 – Reduce the factors to the most critical (with no multicollinearity). There are 2 ways:
          • Method A – Manually
              ▪ If relationships exist between X factors from step 2 (Matrix Plot) and step 3 (Correlation), then re-run step 4 (Regression with VIF) and exclude one of these related factors. Repeat until the VIFs are < 5.
          • Method B – Stepwise
              ▪ In Minitab, go to Stat > Regression > Stepwise…
              ▪ This test attempts to automatically determine the critical X factors. As such, it is only as effective as the data you input to the test, so ensure the output results are logical and validated by the team.

      Stepwise Regression: MetricA vs MetricB, MetricC, MetricD, MetricE

      Response is MetricA on 4 predictors, with N = 100

      Step             1        2        3        4
      Constant     4.697    9.254    8.403    8.292

      MetricE    0.49945  0.50057  0.45343  0.45287
      T-Value     535.59  1329.52    85.30    86.48
      P-Value      0.000    0.000    0.000    0.000

      MetricB              -16.26   -15.14   -15.10
      T-Value              -22.69   -27.62   -27.97
      P-Value               0.000    0.000    0.000

      MetricD                        0.282    0.281
      T-Value                         8.88     8.99
      P-Value                        0.000    0.000

      MetricC                               0.00001
      T-Value                                  2.04
      P-Value                                 0.044

      S             4.43     1.77     1.32     1.30
      R-Sq         99.97    99.99   100.00   100.00
      R-Sq(adj)    99.97    99.99   100.00   100.00

      How to read the output:
      1. Ensure factors have a low P value.
      2. Ensure R²(adj) is high.
      3. Ensure the standard deviation (S) is low.
      4. The values listed beside each metric are the factor coefficients for the Transfer Function. If they are near 0 and don’t affect the R²(adj), then they can probably be excluded.
  • 9. Multiple Regression Procedures (6 of 7)
      o Step 6 – Evaluate the quality of the final transfer function
          • Build the transfer function and validate with the team which factors to include or exclude.
              ▪ Ensure all VIFs are < 10 and preferably < 5.
              ▪ Ensure the residuals are independent and normally distributed.
              ▪ Ensure any outliers or unusual observations are validated by the team.
          • Use the Stepwise Regression output from Step 5 to get the Constant & Coefficients for the Transfer Function.
      Step 2 of the stepwise output, which uses only Metrics B & E, appears to have a high R²(adj), low S and low VIF. Build the Transfer Function excluding Metrics C & D, if the team agrees:
      MetricA = 9.25 + 0.5(MetricE) + (-16.26)(MetricB)
  • 10. Multiple Regression Procedures (7 of 7)
      o Step 7 – Assess predictive capability of the transfer function
          • Go to Stat > Regression > Regression > Options; add prediction intervals to the test.
              ▪ Helps answer the question “What would the output Y be when the input Xs have these values?”
              ▪ To do this, type in values for the X factors (“predictors”).
              ▪ Examine the PI (Prediction Interval) in the Predicted Values section to see the predictive capability.

      Predicted Values for New Observations

      New Obs     Fit  SE Fit            95% CI            95% PI
            1  258.57    0.32  (257.94, 259.19)  (254.99, 262.14)

      Values of Predictors for New Observations

      New Obs  MetricE  MetricB
            1      500   0.0600

      Interpreted as: “We are 95% certain that MetricA will be between 254.99 and 262.14 when MetricE = 500 and MetricB = 6%.”
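As a cross-check on the Minitab output, the fit and the 95% prediction interval can be approximated by hand. The sketch below plugs the Step 2 stepwise coefficients into the transfer function and applies the standard formula fit ± t × sqrt(S² + SE Fit²). The t critical value of 1.985 is an assumption (an approximate two-sided 95% value for 97 degrees of freedom, i.e. N = 100 minus 3 estimated coefficients), and small differences from Minitab come from rounded inputs.

```python
import math

# Constant and coefficients from step 2 of the stepwise regression output.
CONSTANT = 9.254
COEF_METRIC_E = 0.50057
COEF_METRIC_B = -16.26

S = 1.77        # standard deviation of the residuals at stepwise step 2
SE_FIT = 0.32   # "SE Fit" reported by Minitab for the new observation
T_CRIT = 1.985  # assumption: approx. 95% two-sided t value for 97 df


def predict(metric_e, metric_b):
    """Evaluate the transfer function MetricA = f(MetricE, MetricB)."""
    return CONSTANT + COEF_METRIC_E * metric_e + COEF_METRIC_B * metric_b


def prediction_interval(fit, s, se_fit, t_crit):
    """Prediction interval for a single new observation."""
    half_width = t_crit * math.sqrt(s ** 2 + se_fit ** 2)
    return fit - half_width, fit + half_width


fit = predict(500, 0.06)
low, high = prediction_interval(fit, S, SE_FIT, T_CRIT)
print(f"fit = {fit:.2f}, 95% PI = ({low:.2f}, {high:.2f})")
```

The result lands within rounding distance of Minitab's fit of 258.57 and PI of (254.99, 262.14), which suggests the interval is driven almost entirely by the residual standard deviation S rather than by uncertainty in the fitted line.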
  • 11. Validate Results with the Team
      o Always validate your findings and interpreted results with your team.
          • The team (or any other supporting SME) is your BEST validation of practical significance.
          • It’s helpful to keep the team updated regularly with findings to help steer your analysis.
              ▪ Example: set up recurring meetings to review the findings. If they challenge a finding that you believe is significant, then explore other ways you can substantiate either your finding or their dispute.
      o Your goal of analysis is not to be the mastermind who finds the silver bullet.
          • A silver bullet is only a paperweight unless you can shoot it at the right target.
          • You must work with your team to accurately find, aim at, and hit your target root cause.
  • 12. Practical Application
      o Refer to the critical metric (output Y) and at least 5 factors (input X’s) you identified in the Analyze phase for which you did hypothesis testing.
          • Run through the Multiple Regression Procedures to test for multicollinearity in your data.
              ▪ Does any multicollinearity exist? If so, how do you know it?
                  – Is the multicollinearity logical? (That is, does it make sense? Did you know the interdependency existed before running this test?)
              ▪ How would you modify your transfer function to account for this multicollinearity?