3. “If your experiment needs a statistician,
you need a better experiment”.
Ernest Rutherford
4. COURSE CONTENTS:
1. DESIGNING OF AN EXPERIMENT
1.1 Introduction
1.2 Terminologies in Experimental designs
1.3 Basic Principles
1.4 Some basic designs
2. ANALYSIS OF VARIANCE (ANOVA)
2.1 Introduction
2.2 One way classifications
2.3 Two way classifications
2.4 Three way classification
6. 5. DATA TRANSFORMATIONS
5.1 Introduction
5.2 Data transformation techniques/families
5.3 Practical problems
6. SIMPLE LINEAR REGRESSION
6.1 Tests of significance of regression parameters (Intercept and Slope)
6.2 ANOVA to test for significance of slope parameters
6.3 Confidence intervals for regression parameters
6.4 Using the model for prediction
6.5 Introduction to Multiple linear regression analysis
7. ANALYSIS OF FREQUENCY DATA
7.1 Contingency tables
7. 1.1 INTRODUCTION TO DESIGN AND
ANALYSIS OF EXPERIMENTS
Questions:
What is the main purpose of running an experiment?
What does one hope to be able to show?
Typically, an experiment may be run for one or more of the following reasons:
1. To determine the principal causes of variation in a measured
response
2. To find conditions that give rise to a maximum or minimum
response
3. To compare the response achieved at different settings of
controllable variables
4. To obtain a mathematical model in order to predict future
responses
8. Biometrics: is the application of statistics and
mathematics to problems with a
biological component, including the
problems in agricultural,
environmental, and biological sciences
as well as medical science.
Biometry: is a subject that is concerned with the
application of statistics and mathematics
to problems in the agricultural,
environmental, and biological sciences.
The Greek roots of biometry are bios (“life”) and metron (“measure”);
Hence biometry literally means “the measurement of life”.
1.2 Terminologies in Experimental designs
9. An Experiment involves the manipulation of one
or more experimental condition(s) by an
experimenter in order to determine the effects of
this manipulation on the response.
Much research departs from this pattern in that nature
rather than the experimenter manipulates the variables.
Such research is referred to as Observational studies.
This course is concerned with COMPARATIVE
EXPERIMENTS.
These allow conclusions to be drawn about cause
and effect (Causal relationships).
10. Experiment vs. Observational
OBSERVATIONAL STUDY
Researcher observes the response of interest under
natural conditions
EX: Surveys, weather patterns
DESIGNED EXPERIMENT
Researcher controls variables that have a potential
effect on the response of interest
Qn. Which one helps establish cause-and-effect
relationships better?
12. A treatment is something controlled and
administered by the researcher to an experimental
unit (EU)
– An experimental unit can also be thought of as the
physical entity assigned to receive a treatment from
which we measure the response
Essentially a design is the proposed allocation of
treatments to experimental units (or vice-versa)
13. Experimental Units (EUs)
We now introduce the term “Experimental Unit” (EU);
-EU is the “material” to which treatment factors
(treatments) are assigned
This is different from an “Observational Unit” (OU);
- OU is part of an EU that is measured
14. A source of variation is anything that could
cause an observation to be different from
another observation
Sources of Variation
Sources of Variation are of two types:
Those that can be controlled and are of interest are
called treatments or treatment factors
Those that are not of interest but are difficult to
control are nuisance factors
15. Dependent variable
The dependent variable (response) reflects
any effects associated with manipulation
of the independent variable
Independent Variables
The variable that is under the control of the
experimenter.
The terms independent variables, treatments,
experimental conditions, controllable variables
can be used interchangeably
16. [Figure: general model of a process. INPUTS enter the PROCESS and leave as the OUTPUT (Response); controllable factors X1, X2, …, XP and uncontrollable factors Z1, Z2, …, ZP act on the process. Adapted from Montgomery (2013).]
The primary goal of an experiment is to determine the amount of
variation caused by the treatment factors in the presence of other
sources of variation
17. Factor
A variable that is believed to affect the
outcome of an experiment, e.g. humidity,
pressure, time, concentration, fertilizer, grazing
period, sunlight, etc.
Level
The various values or classifications of the
factors are known as the levels of the factor(s).
For example, suppose we wish to compare the
efficacy of three medications (M1, M2, and M3)
for lowering blood pressure among middle aged
women; there are then three levels of the factor
Medication.
18. Experimental error
A measure of the variation among experimental
units, reflecting mainly the inherent variation
among them.
Thus, experimental error is a technical term and
does not mean a mistake, but includes all types of
extraneous variation due to:
-Inherent variability in the experimental units
-Error associated with the measurements made
-Lack of representativeness of the sample to the population
under study
19. The objective of the experiment may include the following;
Determine which conditions are most influential on the response
Determine where to set the influential conditions so that the
response is always near the desired nominal value
Determine where to set the influential conditions so that variability
in the response is small
Determine where to set the influential conditions so that the effects
of the uncontrollable Variables are minimized
20. EXAMPLE;
Researchers were
interested to see the
food consumption of
albino rats when
exposed to microwave
radiation
“If albino rats are subjected to microwave radiation,
then their food consumption will decrease”
22. Design example:
Your child comes home from school and shows you what they
learned in class.
He/she asks for a film canister and an Alka-Seltzer tablet. They
fill the canister with a little water, put the tablet in the water,
close the canister and turn it upside down.
After a few seconds, the canister flies in the air! Your child
wants to know how to make the canister fly as high as possible
= BOOM
23. Design example :
Question: Does the amount of alka-seltzer affect flight
time? Which amount gives the best time?
The different amounts of alka-seltzer are:
– 1 1/2 tablets
– 1 tablet
– 1/2 of a tablet
For now, we will reuse the same film canister
The response is the amount of time from liftoff to
landing in seconds
24. Design example :
What are some sources of variation?
– Amount of alka-seltzer (we control this)
– Amount of water
– Film canister seal
– Time Measurement
– Angle of liftoff
There may be more, let's choose the ones that we think
will be most significant and easiest to control
25. There have been four eras in the modern development
of statistical Experimental design
Agricultural era led by Ronald Fisher
Industrial era led by Box and Wilson
Quality improvement era led by Taguchi
Modern era
26. There are three fundamental concepts to any design:
– Replication of treatment
– Randomization of treatment assignment
– Local error control:
• Analysis of Covariance (ANCOVA)
• Blocking of EU's
Neglecting to acknowledge these will result in
unreliable results and immediate skepticism.
Fundamental Principles:
27. Treatments: Three different amounts of Alka-
Seltzer
EU's: Assume we have 9 nearly identical film
canisters.
How do we use the fundamental principles to
design this experiment?
Film Canister Experiment
The three basic principles were developed by Sir Ronald A. Fisher
during his time at Rothamsted Experimental Station.
28. Replicating a treatment means assigning that
treatment to multiple EU's.
Will reduce variance of estimates of that treatment's
effect.
If we have equal interest in all the treatments, we
want to try to equally replicate the number of
treatment assignments.
FC Example: There are three treatments (tablet
size) and say we use 9 canisters. So 9/3=3 reps
Replication
29. Independent repeat run of each factor combination
Replication
Number of Experimental Units to which a treatment
is assigned
Advantages
It allows the experimenter to obtain an estimate of the
experimental error
It permits the experimenter to obtain more precise estimates
30. Replication Extension to EU
Thus, a treatment is only replicated if it is
assigned to a new EU.
Taking multiple observations on one EU (i.e.
creating more OUs) does not count as
replication – this is known as subsampling.
31. Note that treating subsampling as
replication increases the chance of
incorrect conclusions
(pseudoreplication)
Variability in multiple measurements on the same EU is
measurement error, rather than
experimental error
32. Randomly assign which EU gets a treatment
How we randomize depends on the type of design.
Clearly we must randomize before measurements are
taken.
Reduces possibility of most types of bias caused by
unaccountable sources of variation
FC Example: Perhaps all film canisters have a chance of having a
small, undetectable hole. This will affect the pressure necessary to
launch the canister. Randomizing will give every treatment the same
chance of being affected by this.
Randomization
33. The allocation of experimental material and the order
in which the individual runs of the experiment are to
be performed are randomly determined.
Advantages
Allows the observations (or errors) to be independently distributed
random variables (it ensures random samples).
Proper randomization assists in “averaging out” the effects of
extraneous factors that may be present.
Randomization cont.
It involves the assignment of treatments to the
experimental units, based on the chosen design, by
some chance mechanism or probabilistic procedures,
e.g. Random numbers
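The chance mechanism described above can be sketched in a few lines of Python. This is only an illustration for the film canister experiment (9 canisters, 3 tablet amounts, 3 replicates each); the function name `randomize_crd`, the canister labels, and the seed are assumptions made for the example, not part of the course material.

```python
import random

def randomize_crd(units, treatments, reps, seed=None):
    """Assign each treatment to `reps` experimental units completely at random."""
    if len(units) != len(treatments) * reps:
        raise ValueError("need exactly len(treatments) * reps experimental units")
    rng = random.Random(seed)          # seeded only so the plan can be reproduced
    labels = [t for t in treatments for _ in range(reps)]
    rng.shuffle(labels)                # the chance mechanism
    return dict(zip(units, labels))

# Hypothetical labels for the 9 nearly identical film canisters
canisters = [f"canister_{i}" for i in range(1, 10)]
tablets = ["1/2 tablet", "1 tablet", "1 1/2 tablets"]

plan = randomize_crd(canisters, tablets, reps=3, seed=2024)
for unit, treatment in plan.items():
    print(unit, "->", treatment)
```

Shuffling a list that already contains each treatment three times keeps the replication equal while leaving the allocation to chance.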
34. There may exist other factors affecting the response
that we can't control or measure until we perform
an experiment. These are called covariates.
We don't necessarily care about the covariate effect,
but by taking it into account we can better detect
treatment differences
Covariate accounts for unexplained experimental
error
FC example: Varying wind speeds during launch
Local Error Control: Analysis of
Covariance
35. A block is a set of experimental units sharing a
common characteristic thought to affect the
response, and to which a separate random
assignment is made
Blocking is used to reduce or eliminate the
variability transmitted from a controllable
nuisance factor
Local Error Control: Blocking
36. Use this when there are factors we are aware of prior
to the experiment, but we cannot control them.
Group EU's so that each block contains EU's that
are more “homogeneous”.
Compare treatments within a block, which can
account for variance that would otherwise be
considered as “noise” or “error” (coming from
differences in block effects)
Local Error Control: Blocking
37. FC Example: Maybe we want to use three different
types of film canisters which we feel may be
significantly different from each other.
Local Error Control: Blocking
[Figure: the EU's drawn as boxes and grouped into blocks. Each box represents an EU with the block trait; there are 9 EU's in each block, and this number is called the “block size”.]
38. Covariates and block effects are referred to as nuisance parameters
because they are “getting in the way” of the estimation of
treatment effects.
Detecting treatment differences is the goal! We mainly include blocks
and/or covariates to reduce experimental error.
39. 1.4 SOME STANDARD EXPERIMENTAL DESIGNS
The term experimental
design refers to a plan of
assigning experimental
conditions to subjects and
the statistical analysis
associated with the plan.
OR
An experimental design is a
rule that determines the
assignment of the
experimental units to the
treatments.
40. Some standard designs that are used frequently include;
Completely Randomized design
A completely randomized design (CRD) refers to a design
in which the experimenter assigns the EU’s to the
treatments completely at random, subject only to the
number of observations to be taken on each treatment.
The model is of the form;
Response = constant + effect of a treatment + error
41. The simplest design assumes all the EU's to be
similar and the primary source of variation is
the different treatments.
A completely randomized design (CRD) will
randomize all treatment-EU assignments for the
specified number of treatment replications
Result: If equally interested in comparisons of all
treatments get as close as possible to equally
replicating the treatments
One Source of Variation: The CRD
42. CRD Example: FC Experiment
These are similar EUs
The design plan:
Before randomization
½ tablet 1 tablet 1 ½ tablet
44. Perhaps a single treatment is actually composed of a
combination of multiple factors with different levels.
Example: For the FC experiment we may also vary
water amount (low/medium/high). In this case one
“treatment” is actually a combination of tablet
and water amount.
The specific tablet and water amounts are referred
to as the levels of the tablet factor and water
factor, respectively.
CRD Extension: Factorial Experiments
45. Factorial Example: FC Experiment
½ tablet low water
1 ½ tablet high water
1 tablet medium
water
47. The valuable approach to dealing with
several factors is to conduct a
FACTORIAL EXPERIMENT
This is an experimental strategy in which
factors are varied together, instead of one
at a time
48. In a factorial design, in each complete trial
or replicate of the experiment, all possible
combinations of the levels of the factors
are investigated.
e.g.
If there are a levels of factor A and b levels of factor B, each replicate
contains all ab treatment combinations
The model is of the form
Response = Constant + Effect of factor A + Effect of factor B
+ Interaction effect + Error term
49. Block designs
This is a design in which the experimenter partitions the EU’s
into blocks, determines the allocation of treatments to
blocks, and assigns the EU’s within each block to the
treatments completely at random
The model is of the form
Response = Constant + effect of a block
+ effect of treatment + error
50. If the block size equals the number of treatments we
call this a randomized complete block design.
You can think of this as separate CRD's for each
block. By that I mean we know we want all the
treatments once in each block and we
RANDOMIZE TREATMENTS IN EACH BLOCK
Block Design: RCBD
51. RCBD Analysis: FC Example
[Figure: three blocks, each containing the nine treatment combinations 1 1, 1 2, 1 3, 2 1, 2 2, 2 3, 3 1, 3 2, 3 3, one per EU]
Recall, the EU's in the blocks are the time order of reuses of the same canister.
1 1 means 1/2 tablet, low water; 3 3 means 1 1/2 tablet, high water.
Recall, we randomize within each block (3 total randomizations).
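A minimal sketch of the within-block randomization for this example, assuming three blocks and the nine coded treatment combinations (tablet level, water level). The block labels, the function name and the seed are illustrative assumptions.

```python
import random
from itertools import product

def randomize_within_blocks(blocks, treatments, seed=None):
    """Return, for each block, its own random run order of all treatments."""
    rng = random.Random(seed)
    plan = {}
    for block in blocks:
        order = list(treatments)
        rng.shuffle(order)             # one separate randomization per block
        plan[block] = order
    return plan

# (tablet, water) codes: (1, 1) = 1/2 tablet, low water ... (3, 3) = 1 1/2 tablet, high water
combos = list(product([1, 2, 3], repeat=2))
blocks = ["block 1", "block 2", "block 3"]   # hypothetical block labels

plan = randomize_within_blocks(blocks, combos, seed=7)
for block, order in plan.items():
    print(block, order)
```

Every block receives every treatment combination exactly once; only the order inside a block is left to chance, which is what distinguishes this plan from a CRD over all 27 runs.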
53. Designs with two blocking factors
These involve two major sources of variation that
have been designated as blocking factors.
The model is of the form
Response = Constant + effect of row block
+ effect of column block
+ effect of treatment + error
54. All complex designs can be
constructed from and understood
in terms of the three basic designs
mentioned above
56. Example (CRD)
A pharmaceutical manufacturer wants to investigate the
bioactivity of a new drug. A completely randomized single
factor experiment was conducted with three dosage levels,
and the following results were obtained.
Dosage   Observations
20 g     24  28  37  30
30 g     37  44  31  35
40 g     42  47  52  38
Is there evidence to indicate that dosage level affects
bioactivity? Use alpha of 0.05
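One way to check the hand calculation for this example is a short one-way ANOVA in plain Python. It is a sketch that follows the CRD model Response = constant + effect of a treatment + error; the function name `one_way_anova` is an assumption for the example.

```python
def one_way_anova(groups):
    """Return (SS_treat, SS_error, df_treat, df_error, F) for a one-way layout."""
    all_obs = [y for obs in groups.values() for y in obs]
    n_total = len(all_obs)
    grand_mean = sum(all_obs) / n_total
    # Between-treatment variation: treatment means around the grand mean
    ss_treat = sum(len(obs) * (sum(obs) / len(obs) - grand_mean) ** 2
                   for obs in groups.values())
    # Within-treatment variation: observations around their own treatment mean
    ss_error = sum((y - sum(obs) / len(obs)) ** 2
                   for obs in groups.values() for y in obs)
    df_treat = len(groups) - 1
    df_error = n_total - len(groups)
    f_stat = (ss_treat / df_treat) / (ss_error / df_error)
    return ss_treat, ss_error, df_treat, df_error, f_stat

dosage = {
    "20 g": [24, 28, 37, 30],
    "30 g": [37, 44, 31, 35],
    "40 g": [42, 47, 52, 38],
}
ss_treat, ss_error, df_treat, df_error, f_stat = one_way_anova(dosage)
print(f"SS_treat = {ss_treat:.2f}, SS_error = {ss_error:.2f}")
print(f"F({df_treat}, {df_error}) = {f_stat:.2f}")
```

The statistic comes out at about 7.04 on (2, 9) degrees of freedom. The tabulated 5% critical value of F(2, 9) is about 4.26, so the data suggest that dosage level affects bioactivity.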
57. Example (CRD)
A civil engineer is interested in determining whether four
different methods of estimating flood flow frequency produce
equivalent estimates of peak discharge when applied to the same
watershed. The resulting discharge data (in cubic feet per second)
are shown below.
Estimation
Method      Observations
1           0.34   0.12   1.23   0.70   1.75   0.12
2           0.91   2.94   2.14   2.36   2.86   4.55
3           6.31   8.37   9.75   6.09   9.82   7.24
4           17.15  11.82  10.95  17.20  14.35  16.82
Is there a significant difference between the methods? Use alpha = 0.05
58. Example (RCBD)
A medical device manufacturer produces vascular grafts (artificial veins). These
artificial veins are produced using resin. Frequently the grafts contain
defects known as flicks, which are a main cause of rejection. The manufacturer
suspects that extrusion pressure affects the occurrence of flicks and therefore
intends to conduct an experiment to investigate this hypothesis. However, the
resin is manufactured by an external supplier and delivered to the manufacturer
in batches. The manufacturer suspects that there will be batch to batch variation
and decided to conduct a blocked design.
Extrusion        Batches of Resin
Pressure (PSI)   1     2     3     4     5     6
8500             90.3  89.2  98.2  93.9  87.4  97.6
8700             92.5  89.5  90.6  94.7  87.0  95.8
8900             85.5  90.8  89.6  86.2  88.0  93.4
9100             82.5  89.5  85.6  87.4  78.9  90.7
Is there evidence at the 5% level that extrusion pressure affects the response?
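A sketch of the RCBD analysis for this example, following the model Response = Constant + effect of a block + effect of treatment + error. The data are entered exactly as printed on the slide (rows are extrusion pressures, columns are resin batches); the function name `rcbd_anova` is an assumption for the example.

```python
def rcbd_anova(rows):
    """rows[i][j] = response for treatment i in block j (one observation per cell)."""
    a, b = len(rows), len(rows[0])
    grand = sum(map(sum, rows)) / (a * b)
    ss_total = sum((y - grand) ** 2 for row in rows for y in row)
    ss_treat = b * sum((sum(row) / b - grand) ** 2 for row in rows)
    ss_block = a * sum((sum(col) / a - grand) ** 2 for col in zip(*rows))
    ss_error = ss_total - ss_treat - ss_block      # what is left after blocks and treatments
    df_treat, df_block = a - 1, b - 1
    df_error = df_treat * df_block
    f_stat = (ss_treat / df_treat) / (ss_error / df_error)
    return ss_treat, ss_block, ss_error, df_treat, df_error, f_stat

grafts = [
    [90.3, 89.2, 98.2, 93.9, 87.4, 97.6],   # 8500 PSI
    [92.5, 89.5, 90.6, 94.7, 87.0, 95.8],   # 8700 PSI
    [85.5, 90.8, 89.6, 86.2, 88.0, 93.4],   # 8900 PSI
    [82.5, 89.5, 85.6, 87.4, 78.9, 90.7],   # 9100 PSI
]
ss_treat, ss_block, ss_error, df_treat, df_error, f_stat = rcbd_anova(grafts)
print(f"SS_pressure = {ss_treat:.2f}, SS_batch = {ss_block:.2f}, SS_error = {ss_error:.2f}")
print(f"F({df_treat}, {df_error}) = {f_stat:.2f}")
```

The pressure F statistic is compared with the tabulated 5% value of F(3, 15), which is about 3.29. The batch sum of squares is removed from the error term, which is the purpose of blocking here.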
59. Example (LSD)
An experimenter is studying the effects of five different formulations of a
chemical product on the burning rate. Each formulation is mixed from a batch
of raw materials that is only large enough for five formulations to be tested.
Furthermore, the formulations are prepared by different operators and there
may be a substantial difference in skill and experience. This tells us that
there are two nuisance factors.
Batches of       Operators
Raw Materials    1     2     3     4     5
1                A=24  B=20  C=19  D=24  E=24
2                B=17  C=24  D=30  E=27  A=36
3                C=18  D=38  E=26  A=27  B=21
4                D=26  E=31  A=26  B=23  C=22
5                E=22  A=30  B=20  C=29  D=31
Is there a significant difference between the formulations at the 5% level of significance?
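A possible way to check the Latin square analysis, following the model Response = Constant + effect of row block + effect of column block + effect of treatment + error. Rows are taken to be batches of raw material and columns operators, as in the table; the variable names are assumptions for the example.

```python
layout = [                      # formulation (Latin letter) in each cell
    ["A", "B", "C", "D", "E"],
    ["B", "C", "D", "E", "A"],
    ["C", "D", "E", "A", "B"],
    ["D", "E", "A", "B", "C"],
    ["E", "A", "B", "C", "D"],
]
rates = [                       # burning rate in each cell
    [24, 20, 19, 24, 24],
    [17, 24, 30, 27, 36],
    [18, 38, 26, 27, 21],
    [26, 31, 26, 23, 22],
    [22, 30, 20, 29, 31],
]
p = len(rates)

total = sum(map(sum, rates))
correction = total ** 2 / p ** 2
ss_total = sum(y * y for row in rates for y in row) - correction
ss_rows = sum(sum(row) ** 2 for row in rates) / p - correction          # batches
ss_cols = sum(sum(col) ** 2 for col in zip(*rates)) / p - correction    # operators

treat_totals = {}
for letters, values in zip(layout, rates):
    for letter, y in zip(letters, values):
        treat_totals[letter] = treat_totals.get(letter, 0) + y
ss_treat = sum(t ** 2 for t in treat_totals.values()) / p - correction  # formulations

ss_error = ss_total - ss_rows - ss_cols - ss_treat
df_error = (p - 2) * (p - 1)
f_stat = (ss_treat / (p - 1)) / (ss_error / df_error)
print(treat_totals)
print(f"SS_formulation = {ss_treat:.0f}, SS_batch = {ss_rows:.0f}, "
      f"SS_operator = {ss_cols:.0f}, SS_error = {ss_error:.0f}")
print(f"F({p - 1}, {df_error}) = {f_stat:.2f}")
```

The formulation F statistic is about 7.73 on (4, 12) degrees of freedom, against a tabulated 5% value of about 3.26, so the formulations appear to differ in burning rate.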
60. INTRODUCTION TO FACTORIAL DESIGNS
Experiments often involve several factors, and usually
the objective of the experimenter is to determine the
influence these factors have on the response.
Several approaches can be employed when
faced with more than one treatment factor
Best – guess Approach
The experimenter selects an arbitrary combination of
treatments, tests it and sees what happens
61. One - Factor - at - a - time (OFAT)
Consists of selecting a starting point, or baseline set of
levels, for each factor, and then successively varying
each factor over its range with the other factors held
constant at the baseline level.
62. The valuable approach to dealing with
several factors is to conduct a
FACTORIAL EXPERIMENT
This is an experimental strategy in which
factors are varied together, instead of one
at a time
63. In a factorial design, in each complete trial
or replicate of the experiment, all possible
combinations of the levels of the factors
are investigated.
e.g.
If there are a levels of factor A and b levels of factor B, each replicate
contains all ab treatment combinations
The model is of the form
Response = Constant + Effect of factor A + Effect of factor B
+ Interaction effect + Error term
64. Consider the following example (adapted from Montgomery, 2013)
of a two-factor (A and B) factorial
experiment with both design factors at two levels (High and Low)
[Figure: square of responses at the four treatment combinations: A Low, B Low = 20; A High, B Low = 40; A Low, B High = 30; A High, B High = 52]
65. Main effect : Change in response produced by a
change in the level of a factor
Factor A
Main Effect = (40 + 52)/2 – (20 + 30)/2
= 21
Increasing factor A from the low level to the high level
causes an average response increase of 21 units
Factor B
Main Effect = ?
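The main-effect arithmetic can be sketched in Python using the four responses of this example (20, 40, 30, 52). The dictionary layout and the function name `main_effect` are assumptions for the illustration; the same function also gives the Factor B effect asked for above.

```python
from statistics import mean

# Keys are (level of A, level of B); values are the observed responses
responses = {
    ("low", "low"): 20,
    ("high", "low"): 40,
    ("low", "high"): 30,
    ("high", "high"): 52,
}

def main_effect(responses, position):
    """Average response at the high level minus average at the low level.

    position 0 selects factor A, position 1 selects factor B.
    """
    high = mean(y for levels, y in responses.items() if levels[position] == "high")
    low = mean(y for levels, y in responses.items() if levels[position] == "low")
    return high - low

effect_a = main_effect(responses, 0)
effect_b = main_effect(responses, 1)
print("Main effect of A:", effect_a)
print("Main effect of B:", effect_b)
```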
67. At low level of factor B
The A effect = 50 – 20
= 30
At high level of factor B
The A effect = 12 - 40
= -28
The effect of A depends on the level chosen for factor B
68. “If the difference in response between the levels of one
factor is not the same at all levels of the other factors then
we say there is an interaction between the factors”
(Montgomery 2013)
The magnitude of the
interaction effect is the
average difference in
the two factor A effects
AB = (-28 – 30)/2
= -29
In this case, factor A has an effect, but it depends on the
level of factor B chosen
A main effect = (50 + 12)/2 – (20 + 40)/2 = 1
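The interaction arithmetic of this example can be reproduced with a few lines of Python. The responses (20, 50, 40, 12) are the ones used in the calculations above; the variable names are assumptions for the illustration.

```python
# Keys are (level of A, level of B); values are the observed responses
responses = {
    ("low", "low"): 20,
    ("high", "low"): 50,
    ("low", "high"): 40,
    ("high", "high"): 12,
}

# Effect of A at each level of B (simple effects)
a_at_low_b = responses[("high", "low")] - responses[("low", "low")]
a_at_high_b = responses[("high", "high")] - responses[("low", "high")]

# Interaction: half the difference between the two simple effects of A
interaction_ab = (a_at_high_b - a_at_low_b) / 2

# Main effect of A: the average of the two simple effects
main_a = (a_at_low_b + a_at_high_b) / 2

print("A effect at low B: ", a_at_low_b)
print("A effect at high B:", a_at_high_b)
print("AB interaction:    ", interaction_ab)
print("Main effect of A:  ", main_a)
```

The main effect of A is close to zero only because the two simple effects have opposite signs; looking at the main effect alone would hide the fact that A matters at both levels of B.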
70. Factorial designs have
several advantages;
They are more efficient than One Factor at a Time
A factorial design is necessary when interactions
may be present to avoid misleading conclusions
Factorial designs allow the effect of a factor to be
estimated at several levels of the other factors,
yielding conclusions that are valid over a range
of experimental conditions
71. The two factor Factorial Design
The simplest type of factorial design involves
only two factors.
There are a levels of factor A and b levels of
factor B, and these are arranged in a factorial
design.
There are n replicates, and each replicate of the
experiment contains all the ab combinations.
72. Example
An engineer is designing a battery for use in a device that will be
subjected to some extreme variations in temperature. The only design
parameter that he can select is the plate material for the battery.
For the purpose of testing temperature can be controlled in the product
development laboratory (Montgomery, 2013)
Life (in hours) Data
Material   Temperature (°F)
Type       15                 70                 125
1          130 155  74 180     34  40  80  75     20  70  82  58
2          150 188 159 126    136 122 106 115     25  70  58  45
3          138 110 168 160    174 120 150 139     96 104  82  60
73. The design has two factors each at three levels and is
then regarded as a 3^2
factorial design.
The engineer wants to answer the following questions;
1. What effects do material type and temperature have on the life
of the battery?
2. Is there a choice of material that would give uniformly long life
regardless of temperature?
Both factors are assumed to be fixed,
hence we have a fixed effect model
The design is a completely randomized design
74. Analysis of Variance for Battery life (in hours)
Source DF Seq SS Adj SS Adj MS F P-value
Material Type 2 10683.7 10683.7 5341.9 7.91 0.002
Temperature 2 39118.7 39118.7 19559.4 28.97 0.000
Material Type*Temperature 4 9613.8 9613.8 2403.4 3.56 0.019
Error 27 18230.7 18230.7 675.2
Total 35 77647.0
We have a significant interaction between temperature
and material type.
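The table above can be reproduced from the battery life data with a short two-factor ANOVA in plain Python. This is a sketch for a balanced, fixed effects, completely randomized design; the function name `two_way_anova` and the dictionary layout keyed by (material type, temperature) are assumptions for the example.

```python
def two_way_anova(cells):
    """cells[(i, j)] = replicate observations at level i of A and level j of B."""
    a_levels = sorted({i for i, _ in cells})
    b_levels = sorted({j for _, j in cells})
    a, b = len(a_levels), len(b_levels)
    n = len(next(iter(cells.values())))            # replicates per cell (balanced)
    total = sum(sum(obs) for obs in cells.values())
    correction = total ** 2 / (a * b * n)
    ss_total = sum(y ** 2 for obs in cells.values() for y in obs) - correction
    ss_a = sum(sum(sum(cells[i, j]) for j in b_levels) ** 2
               for i in a_levels) / (b * n) - correction
    ss_b = sum(sum(sum(cells[i, j]) for i in a_levels) ** 2
               for j in b_levels) / (a * n) - correction
    ss_cells = sum(sum(obs) ** 2 for obs in cells.values()) / n - correction
    ss = {"A": ss_a, "B": ss_b, "AB": ss_cells - ss_a - ss_b,
          "Error": ss_total - ss_cells}
    df = {"A": a - 1, "B": b - 1, "AB": (a - 1) * (b - 1), "Error": a * b * (n - 1)}
    ms_error = ss["Error"] / df["Error"]
    f = {k: (ss[k] / df[k]) / ms_error for k in ("A", "B", "AB")}
    return ss, df, f

battery = {   # (material type, temperature): life in hours
    (1, 15): [130, 155, 74, 180], (1, 70): [34, 40, 80, 75],     (1, 125): [20, 70, 82, 58],
    (2, 15): [150, 188, 159, 126], (2, 70): [136, 122, 106, 115], (2, 125): [25, 70, 58, 45],
    (3, 15): [138, 110, 168, 160], (3, 70): [174, 120, 150, 139], (3, 125): [96, 104, 82, 60],
}
ss, df, f = two_way_anova(battery)
for source, label in [("A", "Material Type"), ("B", "Temperature"),
                      ("AB", "Material Type*Temperature")]:
    print(f"{label:28s} DF={df[source]}  SS={ss[source]:9.1f}  F={f[source]:6.2f}")
print(f"{'Error':28s} DF={df['Error']} SS={ss['Error']:9.1f}")
```

The sums of squares and F ratios agree with the table (7.91, 28.97 and 3.56); the P-values would additionally need the F distribution, which is not computed here.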
75. Interaction plot
Significant interaction is indicated by the lack of parallelism of the
lines. Longer life is attained at low temperature, regardless
of material type
76. The General Factorial Design
The results for the two – factor factorial
design may be extended to the general
case where there are a levels of factor A,
b levels of factor B, c levels of factor C,
and so on, arranged in a factorial
experiment.
77. Sometimes, it is not feasible or practical
to completely randomize all of the runs
in a factorial.
The presence of a nuisance factor may
require that experiment be run in blocks.
The model is of the form
Response = Constant + Effect of factor A + Effect of factor B
+ interaction effect + Block Effect + Error term
78. The 2^K Factorial designs
This is a case of a factorial design with K factors, each
at only two levels.
These levels may be quantitative or qualitative.
A complete replicate of this design requires
2^K observations and is called a 2^K factorial design.
Assumptions
1. The factors are fixed.
2. The designs are completely randomized.
3. The usual normality assumptions are satisfied.
79. The design with only two factors each at two levels is
called a 2^2 factorial design
The levels of the factors may be arbitrarily called
“Low” and “High”
Factor
A   B   Treatment Combination
-   -   A Low, B Low     (1)
+   -   A High, B Low    a
-   +   A Low, B High    b
+   +   A High, B High   ab
The order in which the runs are made is random, so this is a completely
randomized experiment
80. The four treatment combinations in the design can be
represented by lower case letters
The high level factor in any treatment combination is
denoted by the corresponding lower case letter
The low level of a factor in a treatment combination is
represented by the absence of the corresponding letter
The average effect of a factor is the change in the
response produced by a change in the level of that
factor averaged over the levels of the other factor
81. The symbols (1), a, b, ab represent the totals
of the observations over all n replicates
taken at each treatment combination
A main effect = 1/(2n) [ab + a – b – (1)]
B main effect = 1/(2n) [ab + b – a – (1)]
AB effect = 1/(2n) [ab + (1) – a – b]
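The three effect formulas can be written as a small Python function. The treatment totals and the number of replicates used below are hypothetical values chosen only to exercise the formulas; they do not come from the course data.

```python
def effects_2x2(one, a, b, ab, n):
    """Effects in a 2^2 design from the treatment totals (1), a, b, ab over n replicates."""
    return {
        "A": (ab + a - b - one) / (2 * n),
        "B": (ab + b - a - one) / (2 * n),
        "AB": (ab + one - a - b) / (2 * n),
    }

# Hypothetical totals for illustration only: (1) = 80, a = 100, b = 60, ab = 90, n = 3
totals = dict(one=80, a=100, b=60, ab=90)
effects = effects_2x2(**totals, n=3)
for name, value in effects.items():
    print(f"{name} effect = {value:.2f}")
```

With these made-up totals the A effect is positive and the B effect negative, which illustrates why both the magnitude and the direction of each effect need to be read off.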
82. In experiments involving 2^K designs, it is
always important to examine the magnitude
and direction of the factor effects to determine
which factors are likely to be important.
Effect magnitude and direction should always
be considered along with ANOVA, because the
ANOVA alone does not convey this information
83. We define;
Contrast A = ab + a – b – (1) = Total effect of A
We can write the treatment combinations in the order
(1), a, b, ab. Also called the standard order (or Yates order)
Treatment      Factorial Effect
Combination    I   A   B   AB
(1)            +   -   -   +
a              +   +   -   -
b              +   -   +   -
ab             +   +   +   +
The above is also called the table of plus and minus signs
84. Suppose that three factors, A, B and C, each at two levels
are of interest. The design is referred to as a 2^3 factorial design
Treatment      Factorial Effects
Combination    I   A   B   AB  C   AC  BC  ABC
(1)            +   -   -   +   -   +   +   -
a              +   +   -   -   -   -   +   +
b              +   -   +   -   -   +   -   +
ab             +   +   +   +   -   -   -   -
c              +   -   -   +   +   -   -   +
ac             +   +   -   -   +   +   -   -
bc             +   -   +   -   +   -   +   -
abc            +   +   +   +   +   +   +   +
A contrast = [ab + a + ac + abc – (1) – b – c – bc]
B contrast = ?
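The table of plus and minus signs can also be generated rather than written out by hand. The sketch below builds it for any number of two-level factors in standard (Yates) order and reads a contrast off one column; the function names `sign_table` and `contrast_terms` are assumptions for the example.

```python
def sign_table(k):
    """Table of plus and minus signs for a 2^k design in standard (Yates) order."""
    letters = [chr(ord("A") + j) for j in range(k)]

    def label(index, empty):
        # Bit j of the index says whether factor j appears in the label
        name = "".join(letters[j] for j in range(k) if index >> j & 1)
        return name or empty

    effects = [label(e, "I") for e in range(2 ** k)]
    table = {}
    for run in range(2 ** k):
        signs = []
        for effect in range(2 ** k):
            sign = 1
            for j in range(k):
                if effect >> j & 1:                    # factor j is part of this effect
                    sign *= 1 if run >> j & 1 else -1  # +1 at high level, -1 at low level
            signs.append(sign)
        table[label(run, "(1)").lower()] = signs
    return effects, table

def contrast_terms(effect, effects, table):
    """Treatment combinations entering a contrast with a plus and with a minus sign."""
    col = effects.index(effect)
    plus = [run for run, signs in table.items() if signs[col] > 0]
    minus = [run for run, signs in table.items() if signs[col] < 0]
    return plus, minus

effects, table = sign_table(3)
print("run".ljust(5), *[e.rjust(3) for e in effects])
for run, signs in table.items():
    print(run.ljust(5), *[("+" if s > 0 else "-").rjust(3) for s in signs])

plus, minus = contrast_terms("B", effects, table)
print("B contrast =", " + ".join(plus), "-", " - ".join(minus))
```

Each interaction column is the product of the columns of the factors it contains, which is why the whole table follows from the factor levels alone.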
85. The design with K factors each at two levels is
called a 2^K factorial design
The treatment combinations are written in
standard order using the notation introduced
for the 2^2 and 2^3 designs
In General;