1. by
Dr. Kannan A.
Department of Chemical Engineering
Indian Institute of Technology Madras
Chennai 600036
kannan@iitm.ac.in
Phone: 044 - 22574170 (Office)
CH5020: Statistical Design and Analysis of
Experiments
2. Simple experiments involving ONE factor
The effect of changing the settings (or levels) of (only) one factor
on the desired response is investigated.
There may be many levels of this factor as well as many replicates
(repeats) at each level.
The variability in response of replicated measurements at a given
level is due to random error.
Hence, replicates are important in an experiment because they
give estimates of the experimental error.
3. Sl. No.   T (°C)   Rxn rate, mol/(m3.s)   Average
   1         25       1
   2         25       1.1                    1.02
   3         25       0.95
   1         30       1.5
   2         30       1.3                    1.3
   3         30       1.1
The spread among the three rates measured at each temperature is the variability within a treatment.
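The averages and the within-treatment spread in the table above can be reproduced with a short sketch. The variable names are illustrative, and the sample variance is used here only as one possible measure of variability within a treatment.

```python
from statistics import mean, variance

# Reaction rates, mol/(m3.s), from the table above, keyed by temperature in °C
rates = {25: [1.0, 1.1, 0.95], 30: [1.5, 1.3, 1.1]}

# Treatment averages (one per temperature level)
averages = {temp: mean(values) for temp, values in rates.items()}

# Sample variance of the replicates at each level: a measure of random error
within_variance = {temp: variance(values) for temp, values in rates.items()}

for temp in rates:
    print(f"T = {temp} °C: average = {averages[temp]:.2f}, "
          f"within-treatment variance = {within_variance[temp]:.4f}")
```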
4. Simple experiments involving ONE factor
When the level of a factor is changed, the response will also vary.
The important question is whether this change is genuinely due to the
effect of changing the level of the factor or is merely due to noise.
Hence compare the variation due to the change in treatment level
(the mean square treatment) with the variation due to noise
(the mean square error).
In other words, compare the variation between treatments to the
variation within treatments.
5. Simple experiments involving ONE factor
Terminology:
Factor: Variable whose effect on the outcome is being investigated
Level: Value at which the factor is set; many levels of the same
factor may be tested
Treatment: Each level or setting of a factor is called a treatment
a: number of treatments that are carried out
n: number of repeats of each treatment
Response: The outcome after the treatment. The response
after each of the a treatments is a random variable.
6. Tabulation of Data from a Single Factor
Experiment
Treatment   Observations             Totals   Averages
1           y11   y12   …   y1n      y1.      ȳ1.
2           y21   y22   …   y2n      y2.      ȳ2.
.           …     …     …   …        …        …
.           …     …     …   …        …        …
.           …     …     …   …        …        …
a           ya1   …     …   yan      ya.      ȳa.
            y.1   y.2       y.n      y..      ȳ..
8. Simple experiments involving ONE factor
Terminology:
Yij: Response to the ith treatment (i = 1, 2, …, a) and jth repeat
(j = 1, 2, …, n)
µ: Overall mean, a parameter common to all treatments
µi: ith treatment mean (µi = µ + τi)
τi: ith treatment effect
εij: random error, N(0, σ²)
$$Y_{ij} = \mu + \tau_i + \varepsilon_{ij} \quad \text{or} \quad Y_{ij} = \mu_i + \varepsilon_{ij}$$
9. Simple experiments involving ONE factor
Interpretation:
Different treatments may produce different treatment means. There is a
spread about these means µi (i = 1, 2, …, a) due to the random error
component, which has mean 0 and variance σ².
When this error distribution is superimposed on each of the treatment
means, we get the normal distributions N(µi, σ²).
$$Y_{ij} = \mu + \tau_i + \varepsilon_{ij} \quad \text{or} \quad Y_{ij} = \mu_i + \varepsilon_{ij}$$
10. Simple experiments involving ONE factor
Interpretation:
The individual treatment means µi are defined as deviations about an
overall mean µ, and hence the sum of the treatment effects (τi)
becomes zero.
Alternatively, the average of the individual treatment means is the
overall mean µ, i.e.
$$\frac{1}{a}\sum_{i=1}^{a}\mu_i = \mu \quad \text{and, as } \mu_i = \mu + \tau_i, \quad \sum_{i=1}^{a}\tau_i = 0$$
12. Definition of Summation Conventions
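$$y_{i\cdot} = \sum_{j=1}^{n} y_{ij}, \qquad \bar{y}_{i\cdot} = \frac{y_{i\cdot}}{n}, \qquad i = 1, 2, \ldots, a$$
$$y_{\cdot\cdot} = \sum_{i=1}^{a}\sum_{j=1}^{n} y_{ij}, \qquad \bar{y}_{\cdot\cdot} = \frac{y_{\cdot\cdot}}{N}, \qquad \text{where } N = an$$
Here N is the product of the number of treatments (a) and the number of
repeats per treatment (n). The dot (·) represents summation over
the index it replaces.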
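As an illustration of the dot notation, the following sketch computes the treatment totals, treatment averages, grand total and grand average. It assumes the two-temperature reaction-rate data from the earlier table, with rows as treatments and columns as repeats; the variable names are chosen only for this example.

```python
# Rows are treatments (a = 2), columns are repeats (n = 3)
y = [[1.0, 1.1, 0.95],
     [1.5, 1.3, 1.1]]

a, n = len(y), len(y[0])
N = a * n                                  # total number of observations

y_i_dot = [sum(row) for row in y]          # y_i. : sum over j for each treatment
ybar_i_dot = [t / n for t in y_i_dot]      # treatment averages
y_dot_dot = sum(y_i_dot)                   # y.. : sum over both i and j
ybar_dot_dot = y_dot_dot / N               # grand average

print(y_i_dot, ybar_i_dot, y_dot_dot, ybar_dot_dot)
```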
14. Null and Alternate Hypothesis
$$H_0: \tau_1 = \tau_2 = \cdots = \tau_a = 0 \qquad H_1: \tau_i \neq 0 \text{ for at least one } i$$
Here the null hypothesis states that none of the treatments has an
effect on the outcome, so the response is an average value on which the
variability due to the random error component is superimposed. Hence
all N observations are taken from a normal distribution with mean µ and
variance σ².
If the null hypothesis is true, then changing the factor level has no
effect on the mean response.
15. Analysis of Variance (ANOVA)
Let us find a measure of the overall variability in the experiments. It
may be calculated from the following total sum of squares:
$$SS_T = \sum_{i=1}^{a}\sum_{j=1}^{n} (y_{ij} - \bar{y}_{\cdot\cdot})^2$$
This may be resolved into two meaningful entities, viz. the
treatment sum of squares and the error sum of squares:
$$\sum_{i=1}^{a}\sum_{j=1}^{n} (y_{ij} - \bar{y}_{\cdot\cdot})^2 = n\sum_{i=1}^{a} (\bar{y}_{i\cdot} - \bar{y}_{\cdot\cdot})^2 + \sum_{i=1}^{a}\sum_{j=1}^{n} (y_{ij} - \bar{y}_{i\cdot})^2$$
17. Resolution and Interpretation of SST
The sum of squares of the differences between the individual
responses and the overall mean is resolved into two parts:
the sum of squares of deviations within each treatment,
summed across all treatments (error sum of squares),
and the sum of squares of deviations between each treatment
mean and the overall mean, summed across all treatment
means (treatment sum of squares).
$$\text{Error Sum of Squares: } SS_E = \sum_{i=1}^{a}\sum_{j=1}^{n} (y_{ij} - \bar{y}_{i\cdot})^2$$
$$\text{Treatment Sum of Squares: } SS_{Treatments} = n\sum_{i=1}^{a} (\bar{y}_{i\cdot} - \bar{y}_{\cdot\cdot})^2$$
18. Resolution and Interpretation of SST
$$\sum_{i=1}^{a}\sum_{j=1}^{n} (y_{ij} - \bar{y}_{\cdot\cdot})^2 = n\sum_{i=1}^{a} (\bar{y}_{i\cdot} - \bar{y}_{\cdot\cdot})^2 + \sum_{i=1}^{a}\sum_{j=1}^{n} (y_{ij} - \bar{y}_{i\cdot})^2$$
SST = SSE + SSTreatment
Total Sum of Squares = Sum of Squares due to Error + Sum
of Squares due to Treatment Effect
The query is:
“Are the two sums of squares different or comparable?”
If comparable, it implies that the treatment effect is negligible and
the variations are only due to random error.
If much different, there is a distinct effect of at least one treatment.
However, before we jump to conclusions we need to put the two
sums of squares on an equal footing.
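The identity SST = SSE + SSTreatment can be checked numerically from the defining formulas. The sketch below assumes balanced data (the same n for every treatment) and reuses the two-temperature reaction-rate data from the earlier table; the function name is illustrative.

```python
def sums_of_squares(y):
    """Return (SS_T, SS_Treatments, SS_E) for balanced data, rows = treatments."""
    a, n = len(y), len(y[0])
    grand_mean = sum(sum(row) for row in y) / (a * n)
    treatment_means = [sum(row) / n for row in y]

    # Total: deviations of every observation from the grand mean
    ss_total = sum((v - grand_mean) ** 2 for row in y for v in row)
    # Treatments: deviations of each treatment mean from the grand mean
    ss_treat = n * sum((m - grand_mean) ** 2 for m in treatment_means)
    # Error: deviations of each observation from its own treatment mean
    ss_error = sum((v - m) ** 2
                   for row, m in zip(y, treatment_means) for v in row)
    return ss_total, ss_treat, ss_error


y = [[1.0, 1.1, 0.95],
     [1.5, 1.3, 1.1]]
ss_t, ss_tr, ss_e = sums_of_squares(y)
print(ss_t, ss_tr + ss_e)   # the two numbers agree up to rounding error
```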
19. Degrees of Freedom Analysis
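$$\sum_{i=1}^{a}\sum_{j=1}^{n} (y_{ij} - \bar{y}_{\cdot\cdot})^2 = n\sum_{i=1}^{a} (\bar{y}_{i\cdot} - \bar{y}_{\cdot\cdot})^2 + \sum_{i=1}^{a}\sum_{j=1}^{n} (y_{ij} - \bar{y}_{i\cdot})^2$$
Degrees of Freedom of Total Sum of Squares:
an - 1 = N - 1
Degrees of Freedom of Error Sum of Squares:
a(n - 1)
Degrees of Freedom of Treatment Sum of Squares:
(a - 1)
The degrees of freedom add up: a(n - 1) + (a - 1) = an - a + a - 1 = an - 1 = N - 1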
21. Mean Squares
$$E(MS_{Treatments}) = \sigma^2 + \frac{n\sum_{i=1}^{a}\tau_i^2}{a-1}$$
$$E(MS_{Error}) = \sigma^2$$
From the above it may be seen that the mean square error is an
unbiased estimator of σ², while the mean square treatments is also
an unbiased estimator of σ² if the null hypothesis is true.
If the null hypothesis is not true, then the expected value of the
mean square treatments will exceed the expected value of the
mean square error because of the treatment effects.
22. Mean Squares
$$E(MS_{Treatments}) = \sigma^2 + \frac{n\sum_{i=1}^{a}\tau_i^2}{a-1}$$
$$E(MS_{Error}) = \sigma^2$$
$$F_0 = \frac{MS_{Treatments}}{MS_{Error}}$$
Hence, when the null hypothesis is false, the expected value of the
numerator in the test statistic F0 is greater than the expected value
of the denominator.
We should reject H0 if the computed value of the above statistic
is sufficiently large. This implies a one-tailed, upper-tail critical
region.
Hence we should reject H0 when
$$f_0 > f_{1-\alpha,\,a-1,\,a(n-1)}$$
23. Short Cut Formulae for Computing Mean
Squares
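$$SS_T = \sum_{i=1}^{a}\sum_{j=1}^{n} y_{ij}^2 - \frac{y_{\cdot\cdot}^2}{N}$$
$$SS_{Treatments} = \sum_{i=1}^{a} \frac{y_{i\cdot}^2}{n} - \frac{y_{\cdot\cdot}^2}{N}$$
$$SS_E = SS_T - SS_{Treatments}$$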
25. Example of a Fixed Effects Model Analysis
A product development engineer is investigating the tensile
strength of a new synthetic fiber that will be used to make
cloth for men’s shirts. The strength is affected by the wt.% of
cotton used in the blend of materials for the fiber. She
suspects that increasing the wt.% of cotton will increase
the strength. She knows that the cotton wt.% should be
between 10 and 40 if the final product is to have the other
required quality characteristics. The engineer decides to test
specimens at five levels of cotton wt.%.
36. Difference Between Treatment Means
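$$\bar{y}_{i\cdot} - \bar{y}_{j\cdot} - t_{\alpha/2,\,a(n-1)}\sqrt{\frac{2MS_E}{n}} \;\le\; \mu_i - \mu_j \;\le\; \bar{y}_{i\cdot} - \bar{y}_{j\cdot} + t_{\alpha/2,\,a(n-1)}\sqrt{\frac{2MS_E}{n}}$$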
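37. Confidence Intervals for Difference Between Treatment Means
$$(\bar{y}_{i\cdot} - \bar{y}_{j\cdot}) - t_{\alpha/2,\,a(n-1)}\sqrt{\frac{2MS_E}{n}} \;\le\; \mu_i - \mu_j \;\le\; (\bar{y}_{i\cdot} - \bar{y}_{j\cdot}) + t_{\alpha/2,\,a(n-1)}\sqrt{\frac{2MS_E}{n}}$$
If the CI on a difference between two treatment means includes zero,
then there is no significant difference between those two treatments.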
43. What do ALL these mean anyway?
We have carried out a fixed effects model experiment involving ‘a’
treatments and ‘n’ repeats. What are the point estimates for µ, µi and
τi?
µ: $\hat{\mu} = \bar{y}_{\cdot\cdot}$
µi: $\hat{\mu}_i = \bar{y}_{i\cdot}$
τi = µi - µ and $\hat{\tau}_i = \bar{y}_{i\cdot} - \bar{y}_{\cdot\cdot}$
44. Further Analysis on Treatment Means
A T-test may be performed by defining the T random variable as
follows, with the degrees of freedom associated with the error sum of
squares, viz. a(n-1) and not (a-1). This test helps to perform
hypothesis testing on µi and also to construct a 100(1-α)% CI around it.
$$T = \frac{\bar{y}_{i\cdot} - \mu_i}{\sqrt{MS_E/n}}$$
45. Confidence Intervals for Treatment Means
$$\bar{y}_{i\cdot} - t_{\alpha/2,\,a(n-1)}\sqrt{\frac{MS_E}{n}} \;\le\; \mu_i \;\le\; \bar{y}_{i\cdot} + t_{\alpha/2,\,a(n-1)}\sqrt{\frac{MS_E}{n}}$$
46. Further Analysis on Difference Between
Treatment Means
A T-test may be performed on the difference in individual means using
the following relation, with the hypotheses usually being
H0: µi - µj = 0, i.e. there is no difference between the means
H1: µi - µj ≠ 0, i.e. there is a difference between the means
$$T_0 = \frac{(\bar{y}_{i\cdot} - \bar{y}_{j\cdot}) - (0)}{\sqrt{\dfrac{\sigma^2}{n} + \dfrac{\sigma^2}{n}}} \quad \text{estimated by} \quad T_0 = \frac{(\bar{y}_{i\cdot} - \bar{y}_{j\cdot}) - (0)}{\sqrt{\dfrac{2MS_E}{n}}}$$
This assumes an equal number of repeats in each treatment.
47. Fisher’s Least Significant Difference (LSD)
The least significant difference is defined as
$$LSD = t_{\alpha/2,\,a(n-1)}\sqrt{\frac{2MS_E}{n}}$$
If the absolute difference between two treatment means exceeds the LSD,
then that pair of treatments differ from one another:
$$|\bar{y}_{i\cdot} - \bar{y}_{j\cdot}| > LSD$$
Equivalently, the pair differ when the signed difference falls below -LSD
or above +LSD. On the other hand, if the absolute difference falls within
the LSD, then there is no significant difference between that particular
pair of treatments.
48. Pair   Mean (1st)   Mean (2nd)   abs(diff)   Criterion   Different?
    2&1    15.400       9.800        5.600       3.746       YES
    3&1    17.600       9.800        7.800       3.746       YES
    4&1    21.600       9.800        11.800      3.746       YES
    5&1    10.800       9.800        1.000       3.746       NO
    3&2    17.6         15.4         2.200       3.746       NO
    4&2    21.6         15.4         6.200       3.746       YES
    5&2    10.8         15.4         4.600       3.746       YES
    4&3    21.6         17.6         4.000       3.746       YES
    5&3    10.8         17.6         6.800       3.746       YES
    5&4    10.8         21.6         10.800      3.746       YES
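The pairwise comparisons in the table can be reproduced with a few lines. This is a minimal sketch that takes the five treatment means and the criterion 3.746 directly from the table; it does not recompute the LSD itself, since the mean square error and t value behind it are not restated here.

```python
from itertools import combinations

# Treatment means and LSD criterion as tabulated above
means = {1: 9.8, 2: 15.4, 3: 17.6, 4: 21.6, 5: 10.8}
lsd = 3.746

# For each pair (i, j), flag the pair as different when |mean_i - mean_j| > LSD
different = {}
for i, j in combinations(means, 2):
    abs_diff = abs(means[i] - means[j])
    different[(i, j)] = abs_diff > lsd
    print(f"{j}&{i}: abs(diff) = {abs_diff:.3f}, different = {different[(i, j)]}")
```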
53. Short Cut Formulae for Computing Mean Squares

Observations:
        type 1    type 2    type 3    type 4    mean
sp1     9.3       9.4       9.2       9.7       9.4
sp2     9.4       9.3       9.4       9.6       9.425
sp3     9.6       9.8       9.5       10        9.725
sp4     10        9.9       9.7       10.2      9.95
mean    9.575     9.6       9.45      9.875     9.625
yi.     38.3      38.4      37.8      39.5
yi.²    1466.89   1474.56   1428.84   1560.25

Squared observations, yij²:
        type 1    type 2    type 3    type 4
sp1     86.49     88.36     84.64     94.09
sp2     88.36     86.49     88.36     92.16
sp3     92.16     96.04     90.25     100
sp4     100       98.01     94.09     104.04

SST            1.29
SSTreatments   0.385
SSE            0.905
54. type 1 type 2 type 3 type 4
sp1 9.3 9.4 9.2 9.7
sp2 9.4 9.3 9.4 9.6
sp3 9.6 9.8 9.5 10.0
sp4 10.0 9.9 9.7 10.2
55. One-way ANOVA:
Source DF SS MS F P
Factor 3 0.385 0.1283 1.7 0.22
Error 12 0.905 0.0754
Total 15 1.29
MINITAB OUTPUT
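The Minitab table above can be cross-checked by hand with the short cut formulae. The sketch below assumes that each "type" column of the data table is one treatment and each "sp" row is one repeat; the variable names are illustrative. The p-value is left to Minitab or Excel.

```python
# One list per treatment (type), one entry per repeat (sp1..sp4)
data = {
    "type 1": [9.3, 9.4, 9.6, 10.0],
    "type 2": [9.4, 9.3, 9.8, 9.9],
    "type 3": [9.2, 9.4, 9.5, 9.7],
    "type 4": [9.7, 9.6, 10.0, 10.2],
}

a = len(data)                              # number of treatments
n = len(next(iter(data.values())))         # repeats per treatment
N = a * n

grand_total = sum(sum(obs) for obs in data.values())      # y..
correction = grand_total ** 2 / N                          # y..^2 / N

ss_total = sum(y ** 2 for obs in data.values() for y in obs) - correction
ss_treat = sum(sum(obs) ** 2 for obs in data.values()) / n - correction
ss_error = ss_total - ss_treat

ms_treat = ss_treat / (a - 1)              # df = a - 1 = 3
ms_error = ss_error / (a * (n - 1))        # df = a(n - 1) = 12
f0 = ms_treat / ms_error

print(f"SST = {ss_total:.3f}, SSTreatments = {ss_treat:.3f}, SSE = {ss_error:.3f}")
print(f"MSTreatments = {ms_treat:.4f}, MSE = {ms_error:.4f}, F0 = {f0:.2f}")
```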
57. Interpretation of Graphs
Residuals:
Broadly, a residual is defined as the difference between the experimental
observation and its predicted value:
$$e_{ij} = y_{ij} - \hat{y}_{ij}$$
For the single-factor fixed effects model, the predicted value is the
treatment mean, $\hat{y}_{ij} = \bar{y}_{i\cdot}$, so that
$$e_{ij} = y_{ij} - \bar{y}_{i\cdot}$$
The residual eij has information on the unexplained variability.
Plotting the residuals on a normal probability plot leads to a straight
line if they are normally distributed. Watch out for any outliers in the data,
which may point to additional variability that cannot be dismissed as random
error.
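A minimal sketch of the residual calculation follows. It assumes the four-type data set tabulated earlier, with each type treated as one treatment, and checks the property that the residuals within each treatment sum to zero.

```python
# One list per treatment (type), one entry per repeat
data = {
    "type 1": [9.3, 9.4, 9.6, 10.0],
    "type 2": [9.4, 9.3, 9.8, 9.9],
    "type 3": [9.2, 9.4, 9.5, 9.7],
    "type 4": [9.7, 9.6, 10.0, 10.2],
}

# Fitted value for every observation in a treatment is that treatment's mean
fitted = {name: sum(obs) / len(obs) for name, obs in data.items()}

# Residual e_ij = y_ij - ybar_i.
residuals = {name: [y - fitted[name] for y in obs] for name, obs in data.items()}

for name, res in residuals.items():
    print(name, [round(e, 3) for e in res], "sum =", round(sum(res), 10))
```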
58. Interpretation of Graphs
Plots of Residuals versus fitted values
When the residuals are plotted against fitted values, the
pattern should not expand depending upon the value of the
treatment mean, i.e. you should not see a systematic
increase in the value of the residual with increase in the fitted
value.
In the graph seen we find that the residuals do not show a
funnel type increase with increasing treatment means.
59. Further Analysis of Treatment Means
Pooled Standard Deviation:
Use MSError as the estimate of the error variance:
$$\hat{\sigma}^2 = MS_{Error}$$
Minitab also presents a 95% CI on each treatment mean
µi = µ + τi, i = 1, 2, …, a
The point estimator of each µi is given by
$$\hat{\mu}_i = \bar{y}_{i\cdot}$$
60. Further Analysis of Treatment Means
Now the situation becomes interesting.
The individual observations in treatment i are expected to come from a
population with mean µi and variance σ².
However, the distribution of the treatment mean has mean µi and
variance σ²/n.
A T-test may be performed by defining the T random variable as
follows, with σ replaced by its estimate when it is unknown:
$$T = \frac{\bar{y}_{i\cdot} - \mu_i}{\sigma/\sqrt{n}}$$
61. Further Minitab Analysis
A T-test may be performed by defining the T random variable as
follows
$$T = \frac{\bar{y}_{i\cdot} - \mu_i}{\hat{\sigma}/\sqrt{n}} = \frac{\bar{y}_{i\cdot} - \mu_i}{\sqrt{MS_E/n}}$$
62. Grouping Information Using Fisher Method
N Mean Grouping
type 4 4 9.8750 A
type 2 4 9.6000 A B
type 1 4 9.5750 A B
type 3 4 9.4500 B
Means that do not share a letter are significantly different.
Fisher 95% Individual Confidence Intervals
All Pairwise Comparisons
Simultaneous confidence level = 81.57%
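Example (upper limit for type 2 minus type 1, with n = 4 and 12 error degrees of freedom):
$$UL = (\bar{y}_{2\cdot} - \bar{y}_{1\cdot}) + t_{\alpha/2,\,a(n-1)}\sqrt{\frac{2MS_E}{n}} = (9.6 - 9.575) + 2.178813\sqrt{\frac{2 \times 0.0754}{4}} = 0.448049$$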
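The Fisher interval for type 2 minus type 1 can be verified numerically. The sketch below takes the t value 2.178813 and the mean square error 0.0754 as given in the example and Minitab output rather than computing them, and uses n = 4 from the N column of the grouping table.

```python
from math import sqrt

t_crit = 2.178813      # t value for alpha/2 = 0.025 and 12 error df, as quoted
ms_error = 0.0754      # mean square error from the ANOVA table
n = 4                  # repeats per treatment

center = 9.6 - 9.575                           # type 2 mean minus type 1 mean
half_width = t_crit * sqrt(2 * ms_error / n)   # same quantity as the LSD

lower = center - half_width
upper = center + half_width
print(f"Lower = {lower:.4f}, Center = {center:.4f}, Upper = {upper:.4f}")
```

The printed limits can be compared with the "type 1 subtracted from" row for type 2 in the Minitab output.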
63. type 1 subtracted from:
Lower Center Upper -------+---------+---------+---------+--
type 2 -0.3981 0.0250 0.4481 (--------*-------)
type 3 -0.5481 -0.1250 0.2981 (-------*--------)
type 4 -0.1231 0.3000 0.7231 (-------*-------)
-------+---------+---------+---------+--
-0.50 0.00 0.50 1.00
type 2 subtracted from:
Lower Center Upper -------+---------+---------+---------+--
type 3 -0.5731 -0.1500 0.2731 (-------*-------)
type 4 -0.1481 0.2750 0.6981 (-------*--------)
-------+---------+---------+---------+--
-0.50 0.00 0.50 1.00
type 3 subtracted from:
Lower Center Upper -------+---------+---------+---------+--
type 4 0.0019 0.4250 0.8481 (--------*-------)
-------+---------+---------+---------+--
-0.50 0.00 0.50 1.00
64. Source DF SS MS F P
Factor 3 0.3850 0.1283 1.70 0.220
Error 12 0.9050 0.0754
Total 15 1.2900
S = 0.2746 R-Sq = 29.84% R-Sq(adj) = 12.31%
Individual 95% CIs For Mean Based on
Pooled StDev
Level N Mean StDev -----+---------+---------+---------+----
type 1 4 9.575 0.310 (---------*---------)
type 2 4 9.600 0.294 (---------*---------)
type 3 4 9.450 0.208 (---------*---------)
type 4 4 9.875 0.275 (---------*---------)
-----+---------+---------+---------+----
9.30 9.60 9.90 10.20
Pooled StDev = 0.275
65. Tukey 95% Simultaneous Confidence Intervals
All Pair wise Comparisons
Individual confidence level = 98.83%
type 1 subtracted from:
Lower Center Upper
type 2 -0.5517 0.0250 0.6017
type 3 -0.7017 -0.1250 0.4517
type 4 -0.2767 0.3000 0.8767
66. ILLUSTRATION
An R&D facility has 15 test motors. Three different brands of
petrol are tested with each brand of petrol being assigned to
exactly 5 of the motors chosen at random.
The following data represents the mileages obtained from the
different motors.
Test the null hypothesis H0 that the average mileage obtained is
not affected by the type of petrol used.
Use the 5% level of significance.
67. Petrol 1 220 251 226 246 260
Petrol 2 244 235 232 242 225
Petrol 3 252 272 250 238 256
68. Mean Squares and their Ratios
Source DF SS MS F P
Factor 2 863 432 2.60 0.115
Error 12 1992 166
Total 14 2855
In Excel, =FDIST(2.60, 2, 12) returns the p-value (the upper-tail probability of F).
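The ANOVA table for the petrol data can be reproduced with the short cut formulae. In this sketch the p-value uses the closed form P(F > f) = (1 + 2f/ν2)^(-ν2/2), which holds only when the numerator has 2 degrees of freedom (three treatments); for other cases a statistics library or Excel would be needed. The variable names are illustrative.

```python
# Mileages for the three brands of petrol, five motors each
petrol = {
    "Petrol 1": [220, 251, 226, 246, 260],
    "Petrol 2": [244, 235, 232, 242, 225],
    "Petrol 3": [252, 272, 250, 238, 256],
}

a = len(petrol)
n = len(next(iter(petrol.values())))
N = a * n

correction = sum(sum(obs) for obs in petrol.values()) ** 2 / N
ss_total = sum(y ** 2 for obs in petrol.values() for y in obs) - correction
ss_treat = sum(sum(obs) ** 2 for obs in petrol.values()) / n - correction
ss_error = ss_total - ss_treat

df_treat, df_error = a - 1, a * (n - 1)
f0 = (ss_treat / df_treat) / (ss_error / df_error)

# Closed-form upper-tail probability, valid only because df_treat == 2
p_value = (1 + 2 * f0 / df_error) ** (-df_error / 2)

alpha = 0.05
print(f"F0 = {f0:.2f}, p = {p_value:.3f}, reject H0: {p_value < alpha}")
```

With p above 0.05, the null hypothesis is not rejected at the 5% level, in line with the table.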
69. ILLUSTRATION
Effect of air voids on % retained strength of asphalt:
In an experiment, asphalt with low (2-4%), medium (4-6%) and
high (6-8%) levels of air voids is tested.
a. Do the different levels of air voids significantly affect the mean
retained strength? Use α = 0.01
b. Find the P-value of the F-statistic in part (a)
c. Find the 95% CI on mean retained strength where there is a
high level of the air voids.
d. Find a 95% CI on the difference in mean retained strength at
the low and high levels of air voids.
70. ILLUSTRATION
Use MINITAB and Excel. Verify your calculations.
Air Voids Retained Strength (%)
Low 106 90 103 90 79 88 92 95
Medium 80 69 94 91 70 83 87 83
High 78 80 62 69 76 85 69 85
To transpose an array of size m × n in Excel, enter =TRANSPOSE(array of interest).
Then select a range of size n × m in the worksheet, including the cell holding the
formula, press F2, and then press Ctrl+Shift+Enter.
71. ————— 29-09-2012 17:45:42 ————————————————————
One-way ANOVA: Low, Medium, High
Source DF SS MS F P
Factor 2 1230.3 615.1 8.30 0.002
Error 21 1555.7 74.1
Total 23 2786.0
S = 8.607 R-Sq = 44.16% R-Sq(adj) = 38.84%
Individual 99% CIs For Mean Based on
Pooled StDev
Level N Mean StDev ---+---------+---------+---------+------
Low 8 92.88 8.56 (--------*-------)
Medium 8 82.13 9.01 (-------*--------)
High 8 75.50 8.23 (--------*-------)
---+---------+---------+---------+------
70 80 90 100
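Parts (a) and (b) of the asphalt illustration can be checked against the Minitab output as follows. As before, this is a sketch using the short cut formulae, and the p-value relies on the closed form for 2 numerator degrees of freedom, P(F > f) = (1 + 2f/ν2)^(-ν2/2), which is not general.

```python
# Retained strength (%) at each level of air voids, eight specimens each
strength = {
    "Low":    [106, 90, 103, 90, 79, 88, 92, 95],
    "Medium": [80, 69, 94, 91, 70, 83, 87, 83],
    "High":   [78, 80, 62, 69, 76, 85, 69, 85],
}

a = len(strength)
n = len(next(iter(strength.values())))
N = a * n

correction = sum(sum(obs) for obs in strength.values()) ** 2 / N
ss_total = sum(y ** 2 for obs in strength.values() for y in obs) - correction
ss_treat = sum(sum(obs) ** 2 for obs in strength.values()) / n - correction
ss_error = ss_total - ss_treat

df_treat, df_error = a - 1, a * (n - 1)
ms_error = ss_error / df_error
f0 = (ss_treat / df_treat) / ms_error

# Upper-tail F probability; this closed form requires df_treat == 2
p_value = (1 + 2 * f0 / df_error) ** (-df_error / 2)

alpha = 0.01
print(f"F0 = {f0:.2f}, p = {p_value:.4f}, reject H0 at alpha = 0.01: {p_value < alpha}")
```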
73. Short Cut Formulae
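$$SS_T = \sum_{i=1}^{a}\sum_{j=1}^{n} y_{ij}^2 - \frac{y_{\cdot\cdot}^2}{N}$$
$$SS_{Treatments} = \sum_{i=1}^{a} \frac{y_{i\cdot}^2}{n} - \frac{y_{\cdot\cdot}^2}{N}$$
$$SS_E = SS_T - SS_{Treatments}$$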
76. Confidence Intervals for Means and Difference in Means

95% CI on treatment means
Mean         t(α/2)      LL          UL
92.875       2.079614    86.54654    99.20346
82.125       2.079614    75.79654    88.45346
75.5         2.079614    69.17154    81.82846

95% CI on difference in means
Difference   t(α/2)      LL          UL
17.375       2.079614    8.425208    26.32479
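The tabulated limits follow from the CI formulas for a treatment mean and for a difference in means. The sketch below assumes the asphalt data from the illustration, takes the t value 2.079614 from the table rather than computing it, and obtains the mean square error from the raw data with 21 error degrees of freedom.

```python
from math import sqrt

strength = {
    "Low":    [106, 90, 103, 90, 79, 88, 92, 95],
    "Medium": [80, 69, 94, 91, 70, 83, 87, 83],
    "High":   [78, 80, 62, 69, 76, 85, 69, 85],
}
t_crit = 2.079614                       # t value for alpha/2 = 0.025, 21 df (tabulated)

a = len(strength)
n = len(next(iter(strength.values())))
means = {k: sum(v) / n for k, v in strength.items()}

# Error sum of squares from deviations about each treatment mean
ss_error = sum((y - means[k]) ** 2 for k, v in strength.items() for y in v)
ms_error = ss_error / (a * (n - 1))

# CI on each treatment mean: ybar_i. +/- t * sqrt(MSE / n)
half_mean = t_crit * sqrt(ms_error / n)
ci_means = {k: (m - half_mean, m + half_mean) for k, m in means.items()}

# CI on the Low - High difference: difference +/- t * sqrt(2 * MSE / n)
diff = means["Low"] - means["High"]
half_diff = t_crit * sqrt(2 * ms_error / n)
ci_diff = (diff - half_diff, diff + half_diff)

print(ci_means)
print(diff, ci_diff)
```

Since the interval on the Low minus High difference excludes zero, the two levels differ in mean retained strength at the 95% confidence level.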