What is Hypothesis Testing?
- In hypothesis testing, relatively small samples are used to answer questions about population parameters (inferential statistics).
- There is always a chance that the selected sample is not representative of the population; therefore, there is always a chance that the conclusion obtained is wrong.
- Under certain assumptions, inferential statistics lets us estimate the probability of drawing an “odd” sample. The p-value quantifies how likely the observed result (or a more extreme one) would be if the null hypothesis were true.
Hypothesis Testing–Introduction
- Refers to the use of statistical analysis to determine whether observed differences between two or more data samples are due to random chance or reflect true differences in the underlying populations or processes
- Increases your confidence that probable X’s are statistically significant
- Used when you need to be confident that a statistical difference exists
Hypothesis Testing For Equal Means
- The histograms below show the heights of inhabitants of countries A and B.
- Both samples are of size 100, the scale is the same, and the unit of measurement is inches.
- Question: Is the population of country B, on average, taller than that of country A?
[Figure: histograms of height for Country A and Country B; horizontal axis in inches, 60.0 to 80.0]
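A question like this can be approached numerically with a two-sample comparison of means. The sketch below is only an illustration: the heights are simulated (the means of 68 and 69 inches and the standard deviation of 3 are assumptions, not the data behind the histograms), and because each sample has 100 values the p-value uses a normal approximation rather than the exact t-distribution.

```python
import random
from statistics import NormalDist, mean, variance

def welch_t(sample_a, sample_b):
    """Welch t statistic for mean(sample_b) - mean(sample_a)."""
    se = (variance(sample_a) / len(sample_a)
          + variance(sample_b) / len(sample_b)) ** 0.5
    return (mean(sample_b) - mean(sample_a)) / se

def one_sided_p(t_stat):
    """Approximate P(T >= t_stat) under Ho using a standard normal
    (large-sample assumption, not the exact t-distribution)."""
    return 1.0 - NormalDist().cdf(t_stat)

# Simulated heights in inches; the parameters are assumptions for illustration.
rng = random.Random(1)
country_a = [rng.gauss(68.0, 3.0) for _ in range(100)]
country_b = [rng.gauss(69.0, 3.0) for _ in range(100)]

# Ho: mean(B) = mean(A)   Ha: mean(B) > mean(A)
t_stat = welch_t(country_a, country_b)
p_value = one_sided_p(t_stat)
print(f"t = {t_stat:.2f}, one-sided p = {p_value:.4f}")
```

A small p-value would indicate that a difference this large is unlikely if both countries had the same mean height.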
Concepts Of Hypothesis Testing
1. All processes have variation.
2. Samples from one given process may vary.
3. How can we differentiate between sample-based “chance” variation and a true process difference?
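Point 2 can be seen in a short simulation. In this sketch, five samples are drawn from one unchanged process (the process mean of 10.0, standard deviation of 2.0, and sample size of 30 are made-up values), yet each sample mean comes out different.

```python
import random
from statistics import mean

def sample_means(n_samples, sample_size, mu, sigma, seed=0):
    """Draw several samples from the SAME normal process and return their means."""
    rng = random.Random(seed)
    return [
        mean(rng.gauss(mu, sigma) for _ in range(sample_size))
        for _ in range(n_samples)
    ]

# One process, five samples: the means differ purely by chance.
means = sample_means(n_samples=5, sample_size=30, mu=10.0, sigma=2.0)
for i, m in enumerate(means, start=1):
    print(f"sample {i}: mean = {m:.3f}")
```

A hypothesis test asks whether an observed difference is larger than this kind of chance variation would explain.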
Kinds Of Differences
Continuous data:
- Differences in averages
- Differences in variation
- Differences in distribution
“shape” of values
Discrete data:
- Differences in proportions
Hypothesis Testing
Guilty vs. Innocent Example
The American justice system can be used to illustrate the
concept of hypothesis testing.
In America, we assume innocence until proven guilty.
This corresponds to the null hypothesis.
It requires strong evidence “beyond a reasonable doubt”
to convict the defendant. This corresponds to rejecting the
null hypothesis and accepting the alternate hypothesis.
Ho: person is innocent
Ha: person is guilty
Nature Of Hypothesis
Null Hypothesis (Ho):
- Usually describes a status quo
- The one you assume unless otherwise shown
- The one you reject or fail to reject based upon evidence
- Signs used in Minitab: = or ≥ or ≤
Alternative Hypothesis (Ha):
- Usually describes a difference
- Signs used in Minitab: ≠ or < or >
Activity–Hypothesis Statements (10 minutes)
Write the null and alternate hypothesis testing statements for each scenario below:
Scenario 1: You have collected delivery times for supplier A and supplier B. You wish to test whether or not
there is a difference in delivery time between suppliers A and B.
Null hypothesis statement :
Alternate hypothesis statement:
Scenario 2: You suspect that there is a difference in cycle time to process purchase orders in site 1 of
your company compared to site 2. You are going to perform a hypothesis test to verify your hypothesis.
Null hypothesis statement :
Alternate hypothesis statement:
Scenario 3: You have implemented process improvements to reduce the cycle time to process purchase
orders in your company. You have collected cycle time before the process improvements and after the
process improvement was implemented. You are going to perform a hypothesis test to verify that the
process improvements have resulted in a reduction in cycle time.
Null hypothesis statement :
Alternate hypothesis statement:
Hypothesis Testing
Guilty vs. Innocent Example
The only four possible outcomes:
1. An innocent person is set free. Correct decision.
2. An innocent person is jailed. Type I error.
The probability of this type of error occurring we represent as α.
3. A guilty person is set free. Type II error.
The probability of this type of error occurring we represent as β.
4. A guilty person is jailed. Correct decision.
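The two error probabilities can be estimated by simulation. The sketch below is a minimal illustration under stated assumptions: it uses a two-sided z-test with a known standard deviation (a simplification, not a test named on the slides), and the sample size, true means, and number of trials are made up.

```python
import random
from statistics import NormalDist, mean

def rejects_h0(sample, mu0, sigma, alpha=0.05):
    """Two-sided z-test of Ho: mean == mu0, assuming sigma is known."""
    z = (mean(sample) - mu0) / (sigma / len(sample) ** 0.5)
    p_value = 2.0 * (1.0 - NormalDist().cdf(abs(z)))
    return p_value <= alpha

def rejection_rate(true_mu, mu0=0.0, sigma=1.0, n=25, trials=4000, seed=42):
    """Fraction of simulated samples for which Ho is rejected."""
    rng = random.Random(seed)
    rejections = sum(
        rejects_h0([rng.gauss(true_mu, sigma) for _ in range(n)], mu0, sigma)
        for _ in range(trials)
    )
    return rejections / trials

# Ho true (innocent): every rejection is a Type I error, so the rate estimates alpha.
type_1_rate = rejection_rate(true_mu=0.0)
# Ho false (guilty): every failure to reject is a Type II error, so this estimates beta.
type_2_rate = 1.0 - rejection_rate(true_mu=0.5)

print(f"estimated alpha = {type_1_rate:.3f}, estimated beta = {type_2_rate:.3f}")
```

The estimated α should land near the chosen 0.05, while β depends on how far the true mean is from the hypothesized one and on the sample size.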
Hypothesis Testing–Another View
Ho: Person is innocent.
Ha: Person is guilty.

                          Truth: Ho (innocent)     Truth: Ha (guilty)
Verdict: Ho (set free)    Correct decision         Type II error (β)
Verdict: Ha (jailed)      Type I error (α)         Correct decision

Type I error (α): innocent, jailed. Ho is incorrectly REJECTED.
Type II error (β): guilty, set free. Ho is incorrectly ACCEPTED.
Hypothesis Testing
- The P-value is calculated by Minitab
  - The probability of getting the observed difference or greater when Ho is true. If p > α (typically 0.05), the data do not provide sufficient evidence that a difference exists.
  - Ranges from 0.0 to 1.0
- The alpha (α) level is usually set at 0.05. Alpha is the probability of making a Type I error (concluding there is a statistical difference between samples when there really is no difference).
P > α: Fail to reject Ho (often loosely stated as “accept Ho”)
P ≤ α: Reject Ho
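The decision rule above can be written as a small function. This is a sketch only; the function name `decide` is hypothetical, and the wording “fail to reject” is used in place of “accept” because a large p-value does not prove Ho.

```python
def decide(p_value, alpha=0.05):
    """Apply the rule: p <= alpha -> reject Ho, otherwise fail to reject Ho."""
    if not 0.0 <= p_value <= 1.0:
        raise ValueError("p-value must lie between 0.0 and 1.0")
    return "reject Ho" if p_value <= alpha else "fail to reject Ho"

print(decide(0.03))              # p below the default alpha of 0.05
print(decide(0.20))              # p above alpha
print(decide(0.03, alpha=0.01))  # the same p with a stricter alpha
```

The same p-value can lead to different decisions depending on the α chosen before the test, which is why α should be fixed in advance.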
Statistical Tests In Minitab
Some basic statistical tests are shown below with the command for running each test in Minitab.
Columns: What The Tool Tests / Statistical Test / Minitab Command / Graphical Test
- Mean of population data is different from an established target.
  Statistical test: 1-Sample t-test. Minitab: Stat > Basic Statistics > 1-Sample t. Graphical test: Histogram.
- Mean of population 1 is different from mean of population 2.
  Statistical test: 2-Sample t-test. Minitab: Stat > Basic Statistics > 2-Sample t. Graphical test: Histogram.
- The means of two or more populations are different.
  Statistical test: 1-Way ANOVA. Minitab: Stat > ANOVA > One-Way. Graphical test: Histogram.
- Variance among two or more populations is different.
  Statistical test: Homogeneity of Variance. Minitab: Stat > ANOVA > Homogeneity of Variance. Graphical test: Box Plots.
- Output (Y) changes as the input (X) changes.
  Statistical test: Linear Regression. Minitab: Stat > Regression > Fitted Line Plot. Graphical test: Scatter Plots.
- Output counts from two or more subgroups differ.
  Statistical test: Chi-Square Test of Independence. Minitab: Stat > Tables > Cross Tabulation OR Chi-Square Test. Graphical test: Pareto.
- Data is normally distributed.
  Statistical test: Normality Test. Minitab: Stat > Basic Statistics.
Select A Statistical Test
- Hypothesis tests to find relationships between project Y and potential X’s
Continuous Y, continuous X: Simple Linear Regression
Continuous Y, discrete X: 2 Sample t-Test (compare means of two samples), ANOVA (compare means of multiple samples), Homogeneity of Variance (compare variances)
Discrete Y, discrete X: Chi-Square Test
Discrete Y, continuous X: Logistic Regression
Hypothesis Test Summary
Normal Data
Variance Tests (Continuous Y)
- F-test: Compares two sample variances.
- Homogeneity of Variance: Compares two or more sample variances (use Levene’s Test).
Mean Tests (Continuous Y)
- t-test, One-sample: Tests if sample mean is equal to a known mean or target.
- t-test, Two-sample: Tests if two sample means are equal.
- ANOVA, One-Way: Tests if two or more sample means are equal.
- ANOVA, Two-Way: Tests if means from samples classified by two categories are equal.
- Correlation: Tests linear relationship between two variables.
- Regression: Defines the linear relationship between a dependent and an independent variable. (Here, “Normality” applies to the residuals of the regression.)
Non-Normal Data
Variance Tests (Continuous Y)
- Homogeneity of Variance: Compares two or more sample variances (use Levene’s Test).
Median Tests (Continuous Y)
- Mood’s Median Test: Another test for two or more medians. More robust to outliers in data.
- Correlation: Tests linear relationship between two variables.
Proportion Tests (Discrete Y)
- P-Test: Tests if two population proportions are equal.
- Chi-Square Test: Tests if three or more relative counts are equal.
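As an illustration of a proportion test on discrete data, the sketch below implements a pooled two-proportion z-test, which is one common way to test whether two population proportions are equal. The defect counts are hypothetical, and the normal approximation assumes reasonably large samples.

```python
from statistics import NormalDist

def two_proportion_z_test(count_1, n_1, count_2, n_2):
    """Pooled two-proportion z-test of Ho: p1 == p2 (two-sided)."""
    p1, p2 = count_1 / n_1, count_2 / n_2
    pooled = (count_1 + count_2) / (n_1 + n_2)
    se = (pooled * (1.0 - pooled) * (1.0 / n_1 + 1.0 / n_2)) ** 0.5
    z = (p1 - p2) / se
    p_value = 2.0 * (1.0 - NormalDist().cdf(abs(z)))
    return z, p_value

# Hypothetical data: 30 defects in 200 units vs. 15 defects in 200 units.
z, p_value = two_proportion_z_test(30, 200, 15, 200)
print(f"z = {z:.2f}, p = {p_value:.4f}")
```

With three or more groups of counts, the Chi-Square test listed above is the usual choice instead.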
Studying and Analyzing Variation
Choosing The Correct Hypothesis Test
START
Are X’s discrete?
  NO: Is Y continuous?
    YES: *Linear Regression
    NO: Logistic Regression
  YES: Are Y’s continuous?
    NO: *Chi Square (χ²)
    YES: Is the data normal?
      NO: *HOV (test spread) and *Mood’s Median (test center)
      YES: Comparing 2 or fewer groups?
        NO (multiple groups): *HOV (test spread) and *ANOVA (test center)
        YES: Can I match X’s with X’s?
          YES: Paired t (pre and post improvement, center)
          NO: Are we comparing the mean to a standard?
            YES: 1 Sample t
            NO: *HOV (spread) and *2 Sample t-Test (center)
                (Note: Do HOV first and use results to refine 2 Sample t)
* Instructions for these tests are on the following pages
Choosing the Appropriate Test
There are four items that we need to consider before we select the
right statistical test:
1. Is the Y continuous or discrete?
2. Is (are) the X(s) continuous or discrete?
3. Are we trying to compare the variation or the centering?
4. Is Y normal or non-normal?
Note: Not all four questions are used for the selection of
the proper test...
Statistical Test Flow Chart
Is Y continuous or discrete?
  Discrete: Is X continuous or discrete?
    Discrete: Chi-Square
    Continuous: Binomial Logistic Regression
  Continuous: Is X continuous or discrete?
    Continuous: Regression, Correlation
    Discrete: Variation or centering?
      Centering: Normal or non-normal data?
        Non-normal: Non-parametric tests (Mood's Median, Mann-Whitney)
        Normal: Comparing relative to a target?
          Yes: 1 Sample t-Test
          No: Comparing only two groups?
            Yes: 2 Sample t-Test
            No: ANOVA
      Variation: Normal or non-normal data?
        Normal: Homogeneity of Variance (Bartlett's), Homogeneity of Variance (F-test)
        Non-normal: Homogeneity of Variance (Levene's)
Note: Even though the tests are broken down by whether the dependent variable (Y) is normal or not, you may still perform the test as long as you know the limitations of the test.
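The flow chart above can be expressed as a small lookup function. This is a sketch of one reading of the chart, not a definitive rule set; the function name `choose_test` and its parameter names are hypothetical.

```python
def choose_test(y, x, compare=None, normal=True, target=False, groups=2):
    """Suggest a test following the flow chart.

    y, x:     'continuous' or 'discrete'
    compare:  'centering' or 'variation' (needed for continuous Y, discrete X)
    normal:   whether Y is treated as normally distributed
    target:   True when comparing a mean to a target value
    groups:   number of groups being compared
    """
    if y == "discrete":
        return "Chi-Square" if x == "discrete" else "Binomial Logistic Regression"
    if x == "continuous":
        return "Regression / Correlation"
    if compare == "variation":
        return "Bartlett's or F-test" if normal else "Levene's"
    if compare == "centering":
        if not normal:
            return "Mood's Median or Mann-Whitney"
        if target:
            return "1-Sample t-Test"
        return "2-Sample t-Test" if groups == 2 else "ANOVA"
    raise ValueError("compare must be 'centering' or 'variation'")

# Example: continuous Y, discrete X with four groups, normal data, comparing centers.
print(choose_test("continuous", "discrete", compare="centering", groups=4))
```

As the note on the chart says, the normal/non-normal split is a guide, so the suggestion should still be checked against the limitations of each test.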
Which Hypothesis Testing Tool Would You Use?
For each scenario described below, which hypothesis testing tool would
you use? Assume normal distribution, where appropriate
1. A six-sigma project is being conducted in the field to improve the cycle time for warranty
repair returns. The warranty return cycle time was measured for a period of 6 weeks for 4
regions. The Green Belt suspects that there is a difference in average warranty repair cycle
time among each of the regions. How would you test whether there is a statistically
significant difference in mean cycle time for the different regions?
2. Tungsten steel erosion shields are fitted to the low pressure blading in steam turbines.
The most important feature of a shield is its resistance to wear. Resistance to wear can be
measured by abrasion loss, which is thought to be associated with the hardness of steel.
How would you test whether there is a statistically significant relationship between
resistance to wear (abrasion loss) and the hardness of the steel?
3. Your business purchases sheet stock from two different suppliers. It has found an
unacceptably large number of defects caused by thickness beyond tolerance levels.
Data on overall mean thickness were analyzed and found to be on target. Data were
collected to identify a potential difference in the variation of the thickness of the
material by supplier. How would you test whether the variation in thickness differs between the two suppliers?
4. Checks Are Us is a payroll processing firm. Timecard errors are routinely monitored and
recorded. A Black Belt investigating the errors wishes to determine if there are any
differences in the number of errors among five of its major customers. The number of errors
contained in a sample of 150 employees was recorded for five weeks. How would you test if
there is a statistically significant difference in the number of errors among the customers?

Hypothesis Testing in Six Sigma

Editor's Notes

  • #3 A hypothesis is an assertion or conjecture about one or more parameters of a population (or populations). To determine with certainty whether it is true or false, we would have to examine the entire population, which is usually impossible. Instead, we use a random sample to provide evidence that either supports or does not support the hypothesis. The conclusion is then based upon statistical significance. It is important to remember that this conclusion is an inference about the population determined from the sample data.
  • #4 Issue: how conclusive is the evidence that the sample results indicate a real, more-than-random effect in the underlying population or process? In the Analyze phase we will try to determine which X’s have an effect on the Y. We can compare two sets of data, with X set at different values, thereby determining if that X has an effect. Examples: Does a process perform better using machine/material/fixture/tool A or B? Does the purchased material conform to the desired specifications? Is there a difference in performance between vendor A or B? Is there a difference in your process after you make a change? Is the process on target?
  • #5 To improve processes, we need to identify factors which impact the mean or standard deviation. Once we have identified these factors and made adjustments for improvement, we need to validate actual improvements in our processes. Sometimes we cannot decide graphically or by using calculated statistics (sample mean and standard deviation) if there is a statistically significant difference between processes. In such cases the decision will be subjective. We perform a formal statistical hypothesis test to decide objectively whether there is a difference.
  • #6 There are a variety of different hypothesis tests. Each one tests for a different kind of “difference.”
  • #8 Note that we are not proving the hypothesis to be true or false. We will reject or fail to reject the null hypothesis based on the evidence from our samples. Failing to reject the null hypothesis implies that the data does not provide sufficient evidence to conclude that a difference exists. On the other hand, rejection of the null hypothesis implies that the sample data provides sufficient evidence to say that a difference exists.
  • #10 Why is it important to minimize the chance of making a Type I error in a six-sigma project?
  • #11 This is a visual way of looking at the four possible outcomes of hypothesis testing.
  • #12 The p (probability) value measures how compatible the sample data are with H0 and ranges between 0.0 and 1.0. The higher the p-value, the less evidence we have against H0 (that there is no difference). Think of the null hypothesis as a jury trial: the accused is innocent until proven guilty. In hypothesis testing, the samples are assumed equal until proven otherwise. Since we are usually doing a hypothesis test to show that there is a difference, we are looking for p-values less than 0.05. By convention, if p > 0.05, fail to reject H0 (no difference shown). If p ≤ 0.05, reject H0 (difference exists).
  • #15 There are a number of hypothesis tests for both normal and non-normal data. You should consult a Black Belt or Master Black Belt if you are not sure which test to use for your project, or if your project involves non-normal data. The next page shows a summary of the tests that we will look at in detail during this training. We have already looked at the Normality Test in previous modules.
  • #17 Note: In order to use this chart, we are assuming our X’s are discrete. Otherwise, use Regression. Tests referenced: Mood’s Median Test; Homogeneity of Variance (HOV), Levene’s Test; Chi-Square; Analysis of Variance (ANOVA).