Javier Garcia - Verdugo Sanchez - Six Sigma Training - W2 Hypothesis Test

Hypothesis Testing
Statistical Test Procedures
Week 2
Knorr-Bremse Group
Introduction
This module introduces the statistical test methods, all of which are based on hypothesis testing.
With statistical tests we want to prove whether assumptions, statements or hypotheses about unknown populations are valid or not.
Before we discuss the test methods in detail, it is important to understand the fundamentals. Every statistical decision carries risks.
Finally, we will also determine how many samples are required to decide whether differences are significant.
Knorr-Bremse Group 02 BB W2 Hypothesis test 08, D. Szemkus/H. Winkler Page 2/36

Content
• Overview of hypothesis testing
• Definitions and their meaning
• The procedure for hypothesis testing
• The practical meaning of hypothesis testing
• Sample sizes
The question is not whether we draw conclusions or not; the question is whether we are aware of the conclusions we draw.
- S. I. Hayakawa

The desire for certainty lies in human nature, and yet it is an intellectual vice.
- Bertrand Russell

But as long as people are not educated to withhold their judgment for lack of evidence, they will be disoriented… uncertainty is difficult to bear, like all the great virtues.
- Bertrand Russell
The DMAIC Cycle
• Define: Project charter (SMART), Business Score Card, QFD + VOC, Strategic Goals, Project strategy
• Measure: Baseline Analysis, Process Map, C + E Matrix, Measurement System, Process Capability
• Analyze: Definition of critical Inputs, FMEA, Statistical Tests, Multi-Vari Studies, Regression
• Improve: Adjustment to the Optimum, FMEA, Statistical Tests, Simulation, Tolerancing
• Control: Maintain Improvements, SPC, Control Plans, Documentation

The Statistical Methods
• Usually we face three pitfalls during our investigation:
• Experimentation error or noise factors
- Driving route to work vs. traffic conditions
• Confusing correlation with causality
- Speed vs. tachometer
• Complexity of effects and interactions
- Alcohol and coffee
• The correct application of statistical methods helps to protect against these pitfalls:
• Experimentation error → Exact estimation of the results (ANOVA)
• Correlation/causality mix → Randomized experimental design
• Complexity of effects → Appropriately planned experiment
The Next Steps?
• We believe that we have found the true causes of the variation with the tools we already know (C&E, FMEA, process capability).
• Excited, we ask for approval to replace the current process parameters with the new (better) ones, to show that we can achieve a significant performance increase.
• From the statistical point of view we have established a hypothesis.
• But are we really sure that the new process is better? Would you bet your salary on it?
Now we have to prove the significance of our hypothesis!

The Null Hypothesis and the Alternative
• We will always assume that the Null Hypothesis (H0) is true unless we find strong evidence for the contrary, which we call the Alternative Hypothesis (Ha).
• Everybody in a court is not guilty unless the contrary is proven.
• You, as the public prosecutor, will have to show evidence that the Null Hypothesis is probably wrong.
Example: A Trial
                     Judgment: Not Guilty      Judgment: Guilty
Truth: Not Guilty    Correct                   Type 1 Error (α-Risk)
Truth: Guilty        Type 2 Error (β-Risk)     Correct
Result of a Type 1 Error: An innocent person goes to prison.
Result of a Type 2 Error: A criminal goes free.

Example: Supplier Quality
H0: "Quality from supplier A and B is comparable"
Decision of the Quality Assurance Department compared with the truth:
Truth: Q-SA = Q-SB
• Decision Q-SA = Q-SB ("Don't reject H0"): No action. (Correct)
• Decision Q-SA ≠ Q-SB ("Reject H0, Ha is true"): Actions for the supposedly worse supplier are wrongly defined. (α-Risk)
Truth: Q-SA ≠ Q-SB
• Decision Q-SA = Q-SB ("Don't reject H0"): No improvement action, although one supplier is statistically verifiably worse. (β-Risk)
• Decision Q-SA ≠ Q-SB ("Reject H0, Ha is true"): Improvement actions are correctly required for one supplier. (Correct)
Hypothesis Testing
Real-life hypothesis: The modified process improves the yield. This is what we call the alternative hypothesis (Ha).
The statistical hypothesis: The yield will not change. This is what we call the null hypothesis (Ho).
Ho: µa ≈ µb
Ha: µa ≠ µb
We have to prove that the measured values are too different to belong to the same process, which means that Ho has to be wrong.

Hypothesis Testing Procedure
Let's compare situation A with situation B (2 suppliers). B should have a higher average and a lower StDev.
• Formulate the "null hypothesis" (Ho) and the "alternative hypothesis" (Ha)
Hypothesis of averages:
H0: µA ≈ µB
Ha: µA < µB
Hypothesis of standard deviations:
H0: σA ≈ σB
Ha: σA > σB
• Collect evidence (a sample from reality)
• Decide based on our evidence:
Rejection of Ho?
Acceptance of Ha?
Increase the sample size?
Formulation of a Problem as a Hypothesis
Problem associated with the location of the average
[Figure: current situation and desired state between LSL and USL, separated by the shift δ]
Hypothesis of the Average Values
H0: µ0 ≈ µ1
H1: µ0 > µ1
H2: µ0 < µ1
H3: µ0 ≠ µ1
Problem associated with the process variation
[Figure: current situation and desired state between LSL and USL, differing in spread]
Hypothesis of the Standard Deviations
H0: σ0 ≈ σ1
H1: σ0 > σ1
H2: σ0 < σ1
H3: σ0 ≠ σ1
What are the alternative hypotheses?

Hypothesis Testing: How Does It Work?
After the collection of the data we calculate a test statistic (a kind of signal-to-noise ratio [SNR], such as a Z-, t- or F-value).
We compare this calculated value to a critical value listed in an appropriate table (several tables are available).
If the calculated value < critical value, we don't reject Ho.
Minitab delivers a p-value, which makes life easier.
The p-value (probability) is the probability of obtaining a result at least as extreme as the observed one if Ho is true. The p-value lies between 0 and 1; e.g. a threshold of 0.05 corresponds to a significance level of 5%, i.e. a confidence level of 95%.
The p-value is based on an assumed or an actual reference distribution (Normal, t, Chi-square, F distribution and others).
Small p-value → high SNR → Ho will be rejected
High p-value → small SNR → Ho will not be rejected
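The link between test statistic, p-value and decision can be sketched in a few lines. This is only an illustration, assuming a two-sided one-sample Z-test with known σ; the function names and the numbers (reference mean 50, sample mean 52, σ = 4, n = 16) are made up for the sketch and do not come from the training material.

```python
from statistics import NormalDist

def z_test_p_value(sample_mean, ref_mean, sigma, n):
    """Two-sided one-sample Z-test with known sigma; returns (z, p)."""
    # Signal-to-noise ratio: observed shift divided by the standard error.
    z = (sample_mean - ref_mean) / (sigma / n ** 0.5)
    # Probability of a result at least this extreme if Ho is true.
    p = 2.0 * (1.0 - NormalDist().cdf(abs(z)))
    return z, p

def decide(p, alpha=0.05):
    """Decision rule: a small p-value (high SNR) leads to rejecting Ho."""
    return "reject Ho" if p < alpha else "do not reject Ho"

# Hypothetical numbers, chosen only to illustrate the calculation.
z, p = z_test_p_value(sample_mean=52.0, ref_mean=50.0, sigma=4.0, n=16)
print(f"z = {z:.2f}, p = {p:.4f}: {decide(p)}")
```

With a larger σ or a smaller n the same shift yields a smaller Z-value and a larger p-value, so Ho would no longer be rejected.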
Application of the Hypothesis Test
[Figure: Statistical Process Control Chart (SPC), Xbar and S chart for C1, plotted by subgroup. Means: MU = 71.61, UCL = 78.60, LCL = 64.62. Standard deviations: S = 4.897, UCL = 10.23, LCL = 0.000.]
Is this point really out of control, or is it part of the natural process variation?

Application of the Hypothesis Test
[Figure: Test of differences between group average values. C1 plotted by production line (C2), lines 1 to 6.]
Is this particular product line really different compared to the others, or is this part of the natural process variation?
Estimation of the Decision Error
                                  Reality: Ho is true     Reality: Ha is true
Decision: Don't reject Ho         Correct assumption      Type 2 Error (β)
Decision: Reject Ho, accept Ha    Type 1 Error (α)        Correct assumption
α = the probability of error (level of significance): the risk that we decide an effect is present although there is none
1 - β = the probability of detecting an effect that is really present (discriminatory power of the statistical test)

Probability for an Error Type 1 (α-Risk)
• α is the risk we accept of wrongly rejecting the null hypothesis (error type 1).
• We use α as a threshold value (also called significance level) in order to decide whether we reject or don't reject Ho.
– If P < α, reject the null hypothesis (a change)
– If P > α, don't reject the null hypothesis (no change)
• In real life: we take actions without improvements.
• Practical considerations like financial risks, safety risks and risks which affect the customer should be included in the selection of an α-value.
• A typical value for α is 5 - 10%.
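What the α-risk means in practice can be made visible with a small simulation. The sketch below is an illustration under assumptions that are not in the slides: a two-sided Z-test with known σ = 1, samples of size 10, 2000 repetitions and a fixed random seed. Ho is true in every run, so each rejection is a Type 1 error.

```python
import random
from statistics import NormalDist, mean

def false_positive_rate(alpha=0.05, n=10, trials=2000, seed=1):
    """Share of Z-tests that reject Ho although Ho is true."""
    rng = random.Random(seed)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided critical value
    rejections = 0
    for _ in range(trials):
        # Ho is true by construction: the population mean is 0, sigma is 1.
        sample = [rng.gauss(0.0, 1.0) for _ in range(n)]
        z = mean(sample) / (1.0 / n ** 0.5)
        if abs(z) > z_crit:
            rejections += 1  # Type 1 error: we "see" a change that is not there
    return rejections / trials

rate = false_positive_rate()
print(f"observed false-positive rate: {rate:.3f}")
```

The observed rate should land close to the chosen α, which is exactly the share of unnecessary actions we agree to tolerate.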
Significance Level
Not probable… How probable…
With which certainty do you want (or have) to decide? This is the significance level (α).
We would like a probability of less than 10% that the events occurred just by chance (α = 0.10).
5% would be much better (α = 0.05) (recommendation).
1% would be ideal (α = 0.01).
The alpha value is applied under the assumption that there is no difference between the observed sample and the reference distribution.

Probability for an Error Type 2 (β-Risk)
• 1-β = the probability of detecting a certain change in the population if it really exists.
• Also called the power of the test!
• Connected with the error type 2 (β), the risk of failing to reject the null hypothesis although it is false.
• In real life: An opportunity for improvement remains unused.
• An error type 2 is usually linked with less cost than an error type 1.
• Typical values for β in industrial experiments are 10 to 20%.
Micro Perspective of the Decision Risk
[Figure: control distribution and comparison distribution, each with its center line (CL), separated by the difference δ. Marked areas: 1 − α, α (or α/2 in each tail for the two-sided case), β and 1 − β.]

Which Difference do We Want to See?
Delta to Sigma (δ/σ)
• The delta of the test is the magnitude of the effect that has to be present for the results to be practically significant.
• Delta therefore represents the minimal effect which we want to detect with a given certainty (the certainty is defined by the power of the test, 1-β).
• It is expressed in units of standard deviations, "δ/σ".
• The smaller the delta, the more sensitive the test has to be in order to draw conclusions with a high level of confidence.
Question: What effect does σ have on the calculation of the test delta (δ/σ)?
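One way to explore the question is to compute the power for the same δ with different σ. The sketch below assumes a two-sided comparison of two groups with n values each, uses the normal approximation, and neglects the very small chance of rejecting in the wrong tail. The function name and the example numbers are illustrative and not part of the course material.

```python
from statistics import NormalDist

def power_two_sample(delta, sigma, n, alpha=0.05):
    """Approximate power (1 - beta) for comparing two means, n values per group."""
    nd = NormalDist()
    z_alpha = nd.inv_cdf(1 - alpha / 2)
    effect = delta / sigma  # difference expressed in units of standard deviations
    # Normal approximation: the further the shifted distribution lies
    # beyond the critical value, the higher the power.
    return nd.cdf(effect * (n / 2) ** 0.5 - z_alpha)

power_sigma_1 = power_two_sample(delta=1.0, sigma=1.0, n=21)
power_sigma_2 = power_two_sample(delta=1.0, sigma=2.0, n=21)
print(f"sigma = 1: power = {power_sigma_1:.2f}")
print(f"sigma = 2: power = {power_sigma_2:.2f}")
```

Doubling σ halves δ/σ, so the same absolute difference becomes much harder to detect with an unchanged sample size.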
For Clarification
[Figure: control distribution and comparison distribution with their center lines (CL), separated by δ, which is expressed as δ/σ. Marked areas: 1 − α, α/2 in each tail, β and 1 − β.]
Differences are measured in the number of StDev.

Calculation of the Sample Size
$$N = \frac{2\,(Z_{\alpha/2} + Z_{\beta})^{2}}{(\delta/\sigma)^{2}}$$
The sample size can be calculated from:
• the Z-value of half the significance level (α error)
• the Z-value of the test power (β error)
• the difference, measured in units of StDev (δ/σ)
File: Sample.XLS
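The sample size formula can be evaluated directly. The following is a minimal sketch, assuming that N is rounded to the nearest integer (spot checks against the table for sample sizes agree with this; single entries may differ by one because of rounded Z-values). The function name is chosen for illustration only.

```python
from statistics import NormalDist

def sample_size(delta_over_sigma, alpha=0.05, beta=0.10):
    """N = 2 * (Z_alpha/2 + Z_beta)^2 / (delta/sigma)^2, rounded to the nearest integer."""
    nd = NormalDist()
    z_alpha_half = nd.inv_cdf(1 - alpha / 2)  # Z-value of half the significance level
    z_beta = nd.inv_cdf(1 - beta)             # Z-value of the test power
    n = 2 * (z_alpha_half + z_beta) ** 2 / delta_over_sigma ** 2
    return round(n)

for ratio in (0.2, 0.5, 1.0, 2.0):
    print(f"delta/sigma = {ratio}: N = {sample_size(ratio)}")
```

For a conservative plan one would round up instead of rounding to the nearest integer; that is a design choice and not something the slides specify.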
Table for Sample Sizes
α 20% 20% 20% 20% 10% 10% 10% 10% 5% 5% 5% 5% 1% 1% 1% 1%
β 20% 10% 5% 1% 20% 10% 5% 1% 20% 10% 5% 1% 20% 10% 5% 1%
δ/σ (each row: δ/σ followed by the sample size N for the 16 combinations of α and β above)
0,2 225 328 428 651 309 428 541 789 392 525 650 919 584 744 891 1202
0,3 100 146 190 289 137 190 241 350 174 234 289 408 260 331 396 534
0,4 56 82 107 163 77 107 135 197 98 131 162 230 146 186 223 300
0,5 36 53 69 104 49 69 87 126 63 84 104 147 93 119 143 192
0,6 25 36 48 72 34 48 60 88 44 58 72 102 65 83 99 134
0,7 18 27 35 53 25 35 44 64 32 43 53 75 48 61 73 98
0,8 14 21 27 41 19 27 34 49 25 33 41 57 36 46 56 75
0,9 11 16 21 32 15 21 27 39 19 26 32 45 29 37 44 59
1,0 9 13 17 26 12 17 22 32 16 21 26 37 23 30 36 48
1,1 7 11 14 22 10 14 18 26 13 17 21 30 19 25 29 40
1,2 6 9 12 18 9 12 15 22 11 15 18 26 16 21 25 33
1,3 5 8 10 15 7 10 13 19 9 12 15 22 14 18 21 28
1,4 5 7 9 13 6 9 11 16 8 11 13 19 12 15 18 25
1,5 4 6 8 12 5 8 10 14 7 9 12 16 10 13 16 21
1,6 4 5 7 10 5 7 8 12 6 8 10 14 9 12 14 19
1,7 3 5 6 9 4 6 7 11 5 7 9 13 8 10 12 17
1,8 3 4 5 8 4 5 7 10 5 6 8 11 7 9 11 15
1,9 2 4 5 7 3 5 6 9 4 6 7 10 6 8 10 13
2,0 2 3 4 7 3 4 5 8 4 5 6 9 6 7 9 12
2,1 2 3 4 6 3 4 5 7 4 5 6 8 5 7 8 11
2,2 2 3 4 5 3 4 4 7 3 4 5 8 5 6 7 10
2,3 2 2 3 5 2 3 4 6 3 4 5 7 4 6 7 9
2,4 2 2 3 5 2 3 4 5 3 4 5 6 4 5 6 8
2,5 1 2 3 4 2 3 3 5 3 3 4 6 4 5 6 8
2,6 1 2 3 4 2 3 3 5 2 3 4 5 3 4 5 7
2,7 1 2 2 4 2 2 3 4 2 3 4 5 3 4 5 7
2,8 1 2 2 3 2 2 3 4 2 3 3 5 3 4 5 6
2,9 1 2 2 3 1 2 3 4 2 2 3 4 3 4 4 6
3,0 1 1 2 3 1 2 2 4 2 2 3 4 3 3 4 5
3,1 1 1 2 3 1 2 2 3 2 2 3 4 2 3 4 5
3,2 1 1 2 3 1 2 2 3 2 2 3 4 2 3 3 5
3,3 1 1 2 2 1 2 2 3 1 2 2 3 2 3 3 4
3,4 1 1 1 2 1 1 2 3 1 2 2 3 2 3 3 4
3,5 1 1 1 2 1 1 2 3 1 2 2 3 2 2 3 4
3,6 1 1 1 2 1 1 2 2 1 2 2 3 2 2 3 4
3,7 1 1 1 2 1 1 2 2 1 2 2 3 2 2 3 4
3,8 1 1 1 2 1 1 1 2 1 1 2 3 2 2 2 3
3,9 1 1 1 2 1 1 1 2 1 1 2 2 2 2 2 3
4,0 1 1 1 2 1 1 1 2 1 1 2 2 1 2 2 3

An Example
Let's assume the output (Y) we measure is a metric for the surface quality of laminate. We want to figure out if the yield of the modified (New) process has been significantly improved compared to the current (Old) process.
The data of the investigation are shown below. The values (in %) are the results of 48 sheets cut into 288 panels per experimental run.
"Old" "New"
89.7 84.7
81.4 86.1
84.5 83.2
84.8 91.9
87.3 86.3
79.7 79.3
85.1 82.6
81.7 89.1
83.7 83.7
84.5 88.5
How would you formulate Ho and Ha for this example?
File: Yield Laminat.MTW
An Example
Question: Does the "New" process improve the yield compared to the current "Old" process?
Descriptive Statistics
Variable N Mean Median Tr Mean StDev SE Mean
New 10 85.54 85.40 85.52 3.65 1.15
Old 10 84.24 84.50 84.125 2.902 0.918
The statistical question is:
Is the difference between the mean of "New" (85.54) and the mean of "Old" (84.24) significant, so that it can be described as real?
Or are the means so close together that this is day-to-day variation just by chance (random)?
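The statistical question can be checked numerically from the ten "Old" and ten "New" yield values listed in the example. The sketch below assumes a pooled two-sample t-test and a one-sided critical value of 1.734 (α = 5%, 18 degrees of freedom, taken from a standard t-table). The slides do not state which test Minitab was asked to run, so the choice of test is an assumption.

```python
from statistics import mean, variance

# Yield values (%) from the example data.
old = [89.7, 81.4, 84.5, 84.8, 87.3, 79.7, 85.1, 81.7, 83.7, 84.5]
new = [84.7, 86.1, 83.2, 91.9, 86.3, 79.3, 82.6, 89.1, 83.7, 88.5]

n_old, n_new = len(old), len(new)

# Pooled variance: weighted average of the two sample variances.
pooled_var = ((n_old - 1) * variance(old) + (n_new - 1) * variance(new)) / (n_old + n_new - 2)
std_err = (pooled_var * (1 / n_old + 1 / n_new)) ** 0.5

# Signal (difference of the means) divided by noise (standard error).
t_stat = (mean(new) - mean(old)) / std_err

T_CRIT = 1.734  # assumed: one-sided, alpha = 5 %, 18 degrees of freedom
decision = "reject Ho" if t_stat > T_CRIT else "do not reject Ho"

print(f"mean Old = {mean(old):.2f}, mean New = {mean(new):.2f}")
print(f"t = {t_stat:.2f}, critical value = {T_CRIT}: {decision}")
```

Under these assumptions the calculated value stays below the critical value, so the difference of 1.3 percentage points cannot be separated from random variation with only ten runs per process.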

What is True?
[Figure: dot plots of the "Old" (A) and "New" (B) values on a common axis from 80.0 to 92.5, shown once as two separate groups and once as a single combined group]
Do the values represent two different processes?
Do the values represent the same process?
Hypothesis Testing - Procedure
1. Define the problem
2. Define the goals
3. Establish the hypotheses
- Null hypothesis (Ho)
- Alternative hypothesis (Ha)
4. Select the applicable test statistic (assumed probability distribution Z, t, or F)
5. Define the probability for the error type 1 (Alpha), usually 5%
6. Define the probability for the error type 2 (Beta), usually 10-20%
7. Define the effect (Delta)

Hypothesis Testing – Procedure, continued
8. Define the sample size
9. Define a sampling plan
10. Take the samples and collect the data
11. Calculate the test statistic based on the data (Z, t, or F)
12. Determine the probability that the test statistic occurs just by chance
13. If this probability is smaller than α, reject Ho and accept Ha. If this probability is bigger than α, don't reject Ho
14. Replicate the results and transfer the statistical conclusion into a practical solution
Hypothesis Testing – Definitions
1. Null Hypothesis (Ho) - statement of no change or difference. This statement is assumed true until sufficient evidence for the opposite is presented.
2. Error Type 1 - The error of rejecting Ho although Ho is true, i.e. saying there is a difference although no difference exists. Chance of a "false positive".
3. Alpha Risk - The maximum risk or probability of finding a false positive (Error Type 1). This probability is always greater than zero and is usually set at 5%. It is set to the greatest level of risk that is still acceptable when rejecting Ho (costs or risks of change).
4. Significance Level – Probability of error (same as Alpha Risk).
5. Alternative Hypothesis (Ha) - statement of change or difference. This statement is considered true if Ho is rejected.
6. Error Type 2 - The error of not rejecting Ho although it is not true, i.e. saying there is no difference although a difference exists. Chance of a "false negative"; it represents a missed opportunity.

Hypothesis Testing – Definitions, continued
7. Beta Risk - The risk or probability of making an Error Type 2, i.e. of overlooking an effective treatment or solution to the problem.
8. Significant Difference - A term used to describe the result of a statistical hypothesis test where a difference is too large to be reasonably attributed to chance.
9. Power - The ability of a statistical test to detect a real difference when there really is one, or the probability of being correct in rejecting Ho. Commonly used to determine if sample sizes are sufficient to detect a difference in treatments if one exists.
10. Test Statistic - A standardized value (Z, t, F, etc.) which represents the plausibility of Ho and is distributed in a known manner, such that a probability for the observed value can be determined. Usually, the more plausible Ho is, the smaller the absolute value of the test statistic and the greater the probability of observing this value within its distribution.
Confirmation of an Effect
• Whenever we conduct an experiment or modify something, we want to know whether what we have done has a real effect.
• Because every process displays variation, it is difficult to recognize a true change within this variation or noise.
• Example:
Assume you stand on one leg and flip a coin ten times, with the result of seven heads. Could you conclude from this result that standing on one leg has an effect, or was that just by chance?
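The coin example can be worked out exactly. The sketch below assumes a fair coin as the null hypothesis and asks how likely seven or more heads in ten flips are purely by chance (a one-sided view; the helper name is made up for the illustration).

```python
from math import comb

def prob_at_least(k, n, p=0.5):
    """Probability of at least k successes in n independent trials (binomial)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))

# Ho: the coin is fair, standing on one leg has no effect.
p_value = prob_at_least(7, 10)
print(f"P(at least 7 heads in 10 flips) = {p_value:.3f}")
```

The result is 176/1024, roughly 17%, which is far above a 5% significance level. Seven heads out of ten is therefore no evidence that standing on one leg changes anything.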

Validation of Factors Y = f(x)
                                   Factor X = Input
Result Y = Output                  Discrete / Attributive       Continuous / Variable
Discrete / Attributive             Chi-Square                   Logistic Regression
Continuous / Variable              t-Test                       Regression
                                   ANOVA (F-Test)
                                   Variance Test
(The slide marks part of this matrix as "Part of the Green Belt Training".)
Statistical techniques for all combinations of data types are available.
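The matrix can also be written as a simple lookup, which is handy when the choice of test is part of a script. The sketch below only restates the four cells of the matrix; the dictionary, the function name and the lower-case keys "discrete" and "continuous" are illustrative choices.

```python
# Keys: (type of input X, type of output Y); values: tests named in the matrix.
TESTS_BY_DATA_TYPE = {
    ("discrete", "discrete"): ["Chi-Square"],
    ("continuous", "discrete"): ["Logistic Regression"],
    ("discrete", "continuous"): ["t-Test", "ANOVA (F-Test)", "Variance Test"],
    ("continuous", "continuous"): ["Regression"],
}

def suggest_tests(input_type, output_type):
    """Return the candidate tests for the given input and output data types."""
    key = (input_type.lower(), output_type.lower())
    if key not in TESTS_BY_DATA_TYPE:
        raise ValueError("data types must be 'discrete' or 'continuous'")
    return TESTS_BY_DATA_TYPE[key]

# Example: production line (discrete X) versus yield in % (continuous Y).
print(suggest_tests("discrete", "continuous"))
```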
Summary
• Overview of hypothesis testing
• Definitions and their meaning
• The procedure for hypothesis testing
• The practical meaning of hypothesis testing
• Sample sizes
