Design Of Experiments For Dummies – A
By Willy Vandenbrande | Published: 21 Feb 07
Despite all the efforts by specialists in quality and statistics, Design Of Experiments (DOE) is
still not applied as widely as it could and should be. When I talk to people who had some kind
of introduction to DOE, I notice that they are reluctant or even scared to apply it. To them,
DOE deals with very complex tests with many factors at various levels and it requires in depth
knowledge of at least two textbooks full of statistics to understand it and to be able to use it. In
addition to that, the continuous fight between classicists, Taguchi adepts and Shainin
enthusiasts does not make it easier for the interested newcomer to the field. As far as this
ridiculous fighting is concerned I refer to reference (1).
More interesting is the comment about the complexity of the subject. I dare say that 90% of all
tests performed in industry are one factor two level experiments. For the specialists among
you: 21 designs. We just want to know how the system, product or process will react if one
factor is changed from one level to another level. In most cases the first level is the original,
presently used level of the factor, so we already have some knowledge about the behavior of
the system at this setting.
Simple, you say? Do it all the time? Undoubtedly, but how often is it done correctly? I know
from experience that in industrial experimentation it does not need to be complex to be messed
up! And there are two ways of messing it up: the first one is not being able to draw any
conclusion from the test, the second one, probably even worse, is drawing wrong conclusions
from the test.
Where do we go wrong?
If we divide the experimentation process in the following four phases: setting-up the
experiment, executing the tests, analyzing the results and drawing conclusions, we see things
going wrong in all phases, but generally most damage is done before the tests start. A wrongly
set-up experiment cannot be saved, not even by the most advanced of statistical software
programs. We need to think more up front and use basic rules from DOE to avoid problems.
We are talking here about experiments that are so simple that the calculations and the analyses
to be made will be very straightforward. It will never be a problem for anyone to make these
calculations and to understand the results. What we need to be sure of is that these results
contain the answers to our questions!
To illustrate the many ways to mess up an experiment, we will use a simple example. I call it:
quot;the hunt for the red tomatoquot; and it stars Sam, the friendly amateur gardener who is a keen
tomato grower. His local supplier tells him he now has a great new fertilizer called quot;the red
tomatoquot; that is 20% more expensive than the current one, quot;tomato loverquot;, but that is guaranteed
to give him a lot more tomatoes by the end of the season. The offer is tempting, but Sam
decides to test the product this season, before eventually changing over for his entire crop the
next season. He knows a few things about experimentation and will follow some basic DOE
Rule number one: write down the questions you would like to see answered by
In Six Sigma terminology: define the problem and one aspect of problem definition is
quantification. If you want to compare results, you have to know on what bases the results will
be compared, so you need to quantify the output. In this case the question Sam wants to see
answered by his experiment is: does the quot;red tomatoquot; fertilizer increase my tomato harvest by
at least 20% in weight?
Rule number two: don’t forget that characteristics that are not part of the
study also need to fulfill requirements.
It is not very interesting to solve one problem or improve on one aspect and at the same create
three new problems or deteriorate on several other characteristics. Sam would not really be
happy if as a result of changing fertilizer he would now have 20 % more, but badly tasting,
tomatoes. So he would certainly evaluate the taste, but might also look at the size of the
tomatoes, or the size distribution, the color, etc. Although we do not perform the experiment to
improve on these characteristics, we at least have to make sure that they stay at an acceptable
level. So at the end of the experiment we need to measure and evaluate them!
Rule number three: make sure to have a reliable measurement system.
It is not the purpose of this article to give a course on Measurement Systems Analysis (MSA),
but you must be aware of the importance of the variation introduced by the measurement
system and have to keep it at a minimum. Weighing systems are generally pretty good, but if
Sam wants to be sure he should perform an analysis of the measurement system to see if it is
adequate. For more info on MSA, see reference (2).
Rule number four: use statistics and statistical principles upfront.
One question that always needs to be answered before starting a test is: quot;how big should the
sample size be?quot; Don’t forget that we will use a sample result to draw conclusions for the
behavior of the population. Table 1 shows the two types of wrong conclusions we may draw
when comparing two means, and the associated risks. In order to determine the appropriate
sample size we need to determine the α- and β- risk (or their counterparts: the significance
level (1- α) and the power (1- β) of the test), the distribution and the standard deviation of the
population and the minimum difference we want to be able to detect between the two
populations. This requires several assumptions.
Decision taken based on the experiment Real (but unknown) situation
There is a difference There is no difference
There is a difference between the means. OK Type I error (α-risk)
significance level = 1- α
There is no difference between the means Type II error (β-risk)
Power = 1- β OK
Table 1: The two types of errors
If we want to compare the means of normally distributed populations with equal variance (we
can use the historical standard deviation of the current process as an approximation) Table 2
can be used as a guideline for determining sample sizes. It is based on a two sample t-test and
contains the values for one sided comparison (is the new setting better?) and two sided
comparison (is there any difference between old and new setting?). Despite its limiting
assumptions it is useful in many cases. Note that if you want to detect small differences the
sample sizes increase drastically. For other cases (different significance and power levels, other
comparisons, etc) we refer to (3) and (4).
(in standard deviation units) One sided comparison Two sided comparison
0.5 70 86
1.0 18 23
1.5 9 11
2.0 6 7
2.5 4 5
3.0 3 4
Table 2: Selecting sample sizes when comparing two means of normal distributions.
α-risk = 0.05 (significance level= 95%)
β- risk = maximum 0.10 (power = minimum 90 %)
Based on t-test for two means with equal variance.
The data from last year's harvest are shown in table 3. Sam only wants to change over to the
new fertilizer if it gives him an average yield of at least 20% extra (one sided comparison).
This is 0.95 kg or 2.3 standard deviations. Looking at table 2 he decides to treat 6 plants with
quot;tomato loverquot; and 6 plants with quot;the red tomatoquot;. This should put him on the safe side when
he draws his conclusions.
Tomato plant number Yield (kg)
Standard Deviation 0.41
Table 3: Results of last year’s harvest
Rule number five: beware of known enemies.
Figure 1 gives you an idea of Sam’s garden and the location he has reserved for his tomato
plants. One thing is immediately obvious: because of the tree, some of the plants will receive a
lot more sunshine than others. Now suppose Sam treats 6 plants in the shade with quot;tomato
loverquot; and 6 plants in the sun with quot;the red tomatoquot; and that at the end of the season he sees
that the quot;red tomatoquot; plants clearly yield more tomatoes. What has he proven now? That the
fertilizer is superior or that tomato plants in the sun yield more tomatoes? No one knows,
because the effects of the two factors have been mixed.
Sam has three options. He can place all test plants in the sun, he can place all test plants in the
shade or he can place half of them in the sun and half of them in the shade. He chooses this
third option because in this way the conclusions he can take from the test have a bigger
validity. It allows him to compare the effect of the fertilizer over various conditions of
sunshine hours. In DOE this is called quot;blockingquot;. For every known enemy we have to develop
a strategy: will we keep it constant for the test or use it as a block factor?
Figure 1: Schematic representation of Sam's garden
Figure 2: A known enemy: amount of sunshine
As a result of this rule Sam has marked twelve test spots, six in the shady area, six in the sunny
region (see figure 2). In both areas he now has to select which three plants he will treat with
quot;the red tomatoquot;.
Rule number six: beware of unknown enemies.
Gardens are mysterious places. They may hold all sorts of differences that we are not aware of:
small changes in soil composition, effect of wind, ground water levels, etc. All these factors
may or may not influence the result of our test. The only way Sam can defend himself against
these enemies is by setting up his experiment in such a way that these factors are distributed
randomly, by chance, over his experiment.
Because of the blocking (see rule number four) he has to randomize within each block. This
can be done in a very simple way because there are only two levels to consider. Sam takes
three black and three red playing cards, shuffles them and at each test location within the block
he has his daughter pick one card. If it is a black card, he will treat that plant with quot;tomato
loverquot;, if it is a red card he will treat it with quot;the red tomatoquot;. The two of them have a lot of fun
and figure 3 shows the result of their work.
Figure 3: selecting test plants with playing cards
Note that this is a randomization in location, in many industrial tests, randomization in time is
needed. This means that the sequence of executing the tests has to be decided by chance within
Rule number seven: beware of what goes on during testing.
Sam will not have too much problems with this, he does it all himself and has full control of
what goes on during testing. He may want to instruct his daughter not to touch anything in the
tomato garden. Or not to be overenthusiastic and to start watering some of the plants.
With industrial experiments this is a different story. There is no end to what can go wrong
during testing. In many cases the people performing the tests have not been part of the team
that designed it, they have no idea what it is about or sometimes even why it is done. So keep
these two golden rules in mind:
1. He who communicates is king
2. Be where it happens when it happens.
Rule number eight: analyze the results statistically.
The tomato season has come to an end; Sam has weighted all tomatoes and has put the test
results together in table 4.
Test number quot;Tomato loverquot; treated (kg) quot;The red tomatoquot; treated (kg)
1 4.46 6.17
2 4.65 5.11
3 5.19 4.76
4 5.97 5.21
5 4.23 4.62
6 4.49 5.43
Average 4.83 5.22
Standard Deviation0.64 0.55
Table 4: results of the tests
In rule number four we saw that he only wants to change if the new fertilizer brings him at
least 20% or 0.97 kg more tomatoes. In other words: is the observed average difference greater
than 0.97kg? From the data it is clear that this is not the case (the observed difference is only
Statistically we test the null hypothesis that the means are equal versus the alternative that the
difference between the means is larger than 0.97 kg. This is done with a t-test. We refer to (5)
for an overall discussion on the t-test. If we use the assumption of unknown but equal
variances, the formulas to be used in this case are:
Click to enlarge
Using the data of table 4 we find s = 0.60 and t0=-1.69 with 10 degrees of freedom.
The critical value at the 0.05 level is 1.812, so only if the calculated t-statistic were > 1.812
could we say with 95% confidence that the new fertilizer gives more than 20% increase in
yield. This is clearly not the case!
If the result was positive, Sam would still have to analyze all the other characteristics that need
to fulfill minimum requirements.
Rule number nine: present the results graphically.
All this is correct and for those who know statistics it shows that there is not a significant
difference between the two fertilizers. However, not all people involved in the experiment are
knowledgeable of statistics. Sam’s daughter for instance! That’s why graphical presentation of
results is so important in communicating.
Both figures 4 and 5 show that there is definitely not a 20% higher tomato harvest with the new
fertilizer. Moreover, if you look at figure 5 it is also clear that there is a lot of overlap on the
individual test results. This is an indication that the difference in mean that can be observed,
could be due to chance and has no great significance.
Figure 4: boxplot of the yield of tomatoes per fertilizer
Figure 5: Individual value plot of the yield of tomatoes per fertilizer
Actually, in most cases the graphical output will tell the whole story. Only when there is some
doubt left, the correct numbers may be needed to take a final decision.
There is no such thing as a quot;simplequot; experiment. No matter how simple it may look, you need
to take several rules into account if you want to be able to draw correct conclusions out of your
tests. Don’t forget that it is equally expensive to run a bad or a good experiment. The only
difference is that the good experiment has a return on investment.