Biometry BOOK by mbazi E

ABOUT THE INSTRUCTOR
 Emanuel Mbazi (Mr.)
Office # 9, Kepa Building, Mazimbu

“If your experiment needs a statistician,
you need a better experiment”.
Ernest Rutherford

COURSE CONTENTS:
1. DESIGNING OF AN EXPERIMENT
1.1 Introduction
1.2 Terminologies in Experimental designs
1.3 Basic Principles
1.4 Some basic designs
2. ANALYSIS OF VARIANCE (ANOVA)
2.1 Introduction
2.2 One way classifications
2.3 Two way classifications
2.4 Three way classification
3. EXPERIMENTAL DESIGNS
3.1 The completely randomized design (CRD)
3.1.1 Equal replication
3.1.2 Unequal replication
3.2 The randomized complete block design
3.3 The Latin square design
3.4 Factorial experiments
4. MULTIPLE COMPARISONS
4.1 Introduction
4.2 Multiple comparisons procedures
4.3 Practical problems
5. DATA TRANSFORMATIONS
5.1 Introduction
5.2 Data transformation techniques/families
5.3 Practical problems
6. SIMPLE LINEAR REGRESSION
6.1 Tests of significance of regression parameters (Intercept and Slope)
6.2 ANOVA to test for significance of slope parameters
6.3 Confidence intervals for regression parameters
6.4 Using the model for prediction
6.5 Introduction to Multiple linear regression analysis
7. ANALYSIS OF FREQUENCY DATA
7.1 Contingency tables

1.1 INTRODUCTION TO DESIGN AND ANALYSIS OF EXPERIMENTS
Questions:
What is the main purpose of running an experiment?
What does one hope to be able to show?
Typically, an experiment may be run for one or more of the following reasons:
1. To determine the principal causes of variation in a measured
response
2. To find conditions that give rise to a maximum or minimum
response
3. To compare the response achieved at different settings of
controllable variables
4. To obtain a mathematical model in order to predict future
responses

Biometrics: the application of statistics and mathematics to problems with a biological component, including problems in the agricultural, environmental and biological sciences as well as medical science.
Biometry: a subject concerned with the application of statistics and mathematics to problems in the agricultural, environmental and biological sciences.
The Greek roots of biometry are bios (“life”) and metron (“measure”); hence biometry literally means “the measurement of life”.
1.2 Terminologies in Experimental designs

An experiment involves the manipulation of one or more experimental conditions by an experimenter in order to determine the effects of this manipulation on the response.
Much research departs from this pattern in that nature, rather than the experimenter, manipulates the variables. Such research is referred to as observational studies.
This course is concerned with COMPARATIVE EXPERIMENTS.
These allow conclusions to be drawn about cause and effect (causal relationships).

Experiment vs. Observational Study
OBSERVATIONAL STUDY
Researcher observes the response of interest under natural conditions.
Example: surveys, weather patterns
DESIGNED EXPERIMENT
Researcher controls variables that have a potential effect on the response of interest.
Qn. Which one helps establish cause-and-effect relationships better?

• A treatment is something controlled and administered by the researcher to an experimental unit (EU).
– An experimental unit can also be thought of as the physical entity assigned to receive a treatment and from which we measure the response.
• Essentially, a design is the proposed allocation of treatments to experimental units (or vice versa).

Experimental Units (EUs)
We now introduce the term “Experimental Unit” (EU):
– An EU is the “material” to which treatment factors (treatments) are assigned.
This is different from an “Observational Unit” (OU):
– An OU is the part of an EU that is measured.

Sources of Variation
• A source of variation is anything that could cause an observation to be different from another observation.
Sources of variation are of two types:
• Those that can be controlled and are of interest are called treatments or treatment factors.
• Those that are not of interest but are difficult to control are called nuisance factors.
Dependent variable
• The dependent variable (response) reflects any effects associated with manipulation of the independent variable.
Independent variables
• An independent variable is a variable that is under the control of the experimenter.
The terms independent variables, treatments, experimental conditions and controllable variables can be used interchangeably.

[Figure: general model of a process. INPUTS enter the PROCESS and leave as the OUTPUT (response). Controllable factors X1, X2, ..., Xp and uncontrollable factors Z1, Z2, ..., Zp act on the process. Adapted from Montgomery (2013).]
The primary goal of an experiment is to determine the amount of variation caused by the treatment factors in the presence of other sources of variation.

Factor
• A factor is a variable that is believed to affect the outcome of an experiment, e.g. humidity, pressure, time, concentration, fertilizer, grazing period, sunlight, etc.
Level
• The various values or classifications of a factor are known as the levels of the factor. For example, suppose we wish to compare the efficacy of three medications (M1, M2 and M3) for lowering blood pressure among middle-aged women. There are then three levels of the factor Medication.

Experimental error
• Experimental error is a measure of the variation among experimental units; it reflects mainly the inherent variation among them.
• Thus, experimental error is a technical term and does not mean a mistake. It includes all types of extraneous variation due to:
– Inherent variability in the experimental units
– Error associated with the measurements made
– Lack of representativeness of the sample to the population under study

The objectives of the experiment may include the following:
• Determine which conditions are most influential on the response
• Determine where to set the influential conditions so that the response is always near the desired nominal value
• Determine where to set the influential conditions so that variability in the response is small
• Determine where to set the influential conditions so that the effects of the uncontrollable variables are minimized

EXAMPLE:
Researchers were interested in the food consumption of albino rats exposed to microwave radiation.
“If albino rats are subjected to microwave radiation, then their food consumption will decrease”

TRY!
Independent variable? ……………………….
Dependent variable? ……………………….
Nuisance factor(s)? ……………………….

Design example:
• Your child comes home from school and shows you what they learned in class.
• They ask for a film canister and an Alka-Seltzer tablet. They fill the canister with a little water, put the tablet in the water, close the canister and turn it upside down.
• After a few seconds, the canister flies into the air! Your child wants to know how to make the canister fly as high as possible.
= BOOM

Design example:
Question: Does the amount of Alka-Seltzer affect flight time? Which amount gives the best time?
• The different amounts of Alka-Seltzer are:
– 1/2 of a tablet
– 1 tablet
– 1 1/2 tablets
• For now, we will reuse the same film canister.
• The response is the time from liftoff to landing, in seconds.

Design example:
What are some sources of variation?
– Amount of Alka-Seltzer (we control this)
– Amount of water
– Film canister seal
– Time measurement
– Angle of liftoff
There may be more; let's choose the ones that we think will be most significant and easiest to control.

There have been four eras in the modern development
of statistical Experimental design
Agricultural era led by Ronald Fisher
Industrial era led by Box and Wilson
Quality improvement era led by Taguchi
Modern era

Fundamental Principles:
There are three fundamental concepts in any design:
– Replication of treatments
– Randomization of treatment assignment
– Local error control:
• Analysis of Covariance (ANCOVA)
• Blocking of EUs
Neglecting these will lead to unreliable results and immediate skepticism.

Film Canister Experiment
Treatments: Three different amounts of Alka-Seltzer
EUs: Assume we have 9 nearly identical film canisters.
How do we use the fundamental principles to design this experiment?
The three basic principles were developed by Sir Ronald A. Fisher during his time at the Rothamsted Agricultural Experimental Station.

Replication
• Replicating a treatment means assigning that treatment to multiple EUs.
• Replication reduces the variance of the estimate of that treatment's effect.
• If we have equal interest in all the treatments, we should try to replicate every treatment equally.
FC Example: There are three treatments (tablet amounts) and, say, 9 canisters, so 9/3 = 3 replicates per treatment.

Replication
An independent repeat run of each factor combination; equivalently, the number of experimental units to which a treatment is assigned.
Advantages
• It allows the experimenter to obtain an estimate of the experimental error.
• It permits the experimenter to obtain more precise estimates of treatment effects.
Replication Extension to EU
• Thus, a treatment is only replicated if it is assigned to a new EU.
• Taking multiple observations on one EU (i.e. creating more OUs) does not count as replication; this is known as subsampling.

• Note that treating subsampling as replication increases the chance of incorrect conclusions (pseudoreplication).
• Variability among multiple measurements on the same EU is measurement error, rather than experimental error.

Randomization
Randomly assign which EU gets which treatment.
• How we randomize depends on the type of design.
• Clearly, we must randomize before measurements are taken.
• Randomization reduces the possibility of most types of bias caused by unaccounted-for sources of variation.
FC Example: Perhaps every film canister has some chance of having a small, undetectable hole. This would affect the pressure necessary to launch the canister. Randomizing gives every treatment the same chance of being affected by this.

Randomization cont.
The allocation of experimental material and the order in which the individual runs of the experiment are performed are randomly determined.
It involves assigning treatments to the experimental units, based on the chosen design, by some chance mechanism or probabilistic procedure, e.g. random numbers.
Advantages
• It allows the observations (or errors) to be treated as independently distributed random variables (it ensures random samples).
• Proper randomization assists in “averaging out” the effects of extraneous factors that may be present.

Local Error Control: Analysis of Covariance
• There may exist other factors affecting the response that we cannot control or measure until we perform the experiment. These are called covariates.
• We don't necessarily care about the covariate effect, but by taking it into account we can better detect treatment differences.
• The covariate accounts for part of the otherwise unexplained experimental error.
FC example: Varying wind speeds during launch.

Local Error Control: Blocking
A block is a set of experimental units sharing a common characteristic thought to affect the response, and within which a separate random assignment is made.
• Blocking is used to reduce or eliminate the variability transmitted from a controllable nuisance factor.

• Use blocking when there are factors we are aware of prior to the experiment, but we cannot control them.
• Group EUs so that each block contains EUs that are more “homogeneous”.
• Compare treatments within a block; this accounts for variance (coming from differences in block effects) that would otherwise be considered “noise” or “error”.

FC Example: Maybe we want to use three different types of film canisters, which we feel may be significantly different from each other.
[Figure: the blocks, one per canister type. Each box represents an EU with the block trait. There are 9 EUs in each block; call this the “block size”.]

• Covariates and block effects are referred to as nuisance parameters because they are “getting in the way” of the estimation of treatment effects.
• Detecting treatment differences is the goal! We mainly include blocks and/or covariates to reduce experimental error.

1.4 SOME STANDARD EXPERIMENTAL DESIGNS
The term experimental design refers to a plan of assigning experimental conditions to subjects and the statistical analysis associated with the plan.
OR
An experimental design is a rule that determines the assignment of the experimental units to the treatments.

Some standard designs that are used frequently include:
Completely Randomized design
A completely randomized design (CRD) refers to a design in which the experimenter assigns the EUs to the treatments completely at random, subject only to the number of observations to be taken on each treatment.
The model is of the form:
Response = constant + effect of a treatment + error

One Source of Variation: The CRD
• The simplest design assumes all the EUs to be similar, so that the primary source of variation is the different treatments.
• A completely randomized design (CRD) randomizes all treatment-EU assignments for the specified number of treatment replications.
Result: If we are equally interested in comparisons of all treatments, we should get as close as possible to equally replicating the treatments.
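The randomization step of a CRD can be sketched in a few lines of Python. This is only an illustration under assumptions: the function name `randomize_crd`, the canister labels and the seed are hypothetical and are not part of the course material. It builds a list containing each treatment label `reps` times, shuffles it, and pairs the shuffled labels with the experimental units.

```python
import random


def randomize_crd(units, treatments, reps, seed=None):
    """Assign each treatment to `reps` experimental units completely at random."""
    if len(units) != len(treatments) * reps:
        raise ValueError("need exactly len(treatments) * reps experimental units")
    rng = random.Random(seed)  # a fixed seed makes the plan reproducible
    labels = [t for t in treatments for _ in range(reps)]
    rng.shuffle(labels)        # the chance mechanism
    return dict(zip(units, labels))


# Film canister experiment: 9 canisters, 3 tablet amounts, 3 replicates each.
canisters = [f"canister_{i}" for i in range(1, 10)]
treatments = ["1/2 tablet", "1 tablet", "1 1/2 tablets"]
plan = randomize_crd(canisters, treatments, reps=3, seed=2013)

for unit, treatment in plan.items():
    print(unit, "->", treatment)
```

Shuffling a list that already holds the right number of copies of each treatment guarantees equal replication, which drawing a treatment independently for every canister would not.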

CRD Example: FC Experiment
[Figure: the design plan before randomization. The EUs are similar; the treatments are ½ tablet, 1 tablet and 1 ½ tablets.]

CRD Example: FC Experiment
[Figure: the implemented design.]

CRD Extension: Factorial Experiments
• Perhaps a single treatment is actually composed of a combination of multiple factors with different levels.
• Example: For the FC experiment we may also vary the water amount (low/medium/high). In this case one “treatment” is actually a combination of tablet amount and water amount.
• The specific tablet and water amounts are referred to as the levels of the tablet factor and water factor, respectively.

Factorial Example: FC Experiment
[Figure: EUs labelled with treatment combinations such as ½ tablet with low water, 1 tablet with medium water and 1 ½ tablets with high water.]

• The valuable approach to dealing with several factors is to conduct a FACTORIAL EXPERIMENT.
• This is an experimental strategy in which factors are varied together, instead of one at a time.

In a factorial design, in each complete trial or replicate of the experiment, all possible combinations of the levels of the factors are investigated.
E.g. if there are a levels of factor A and b levels of factor B, each replicate contains all ab treatment combinations.
The model is of the form
Response = Constant + Effect of factor A + Effect of factor B
+ Interaction effect + Error term

Block designs
This is a design in which the experimenter partitions the EUs into blocks, determines the allocation of treatments to blocks, and assigns the EUs within each block to the treatments completely at random.
Response = Constant + effect of a block
+ effect of treatment + error

Block Design: RCBD
• If the block size equals the number of treatments, we call this a randomized complete block design (RCBD).
• You can think of this as a separate CRD for each block: we want every treatment once in each block, and we RANDOMIZE TREATMENTS WITHIN EACH BLOCK.

RCBD Analysis: FC Example
[Figure: before randomization, each of the three blocks lists the nine treatment combinations in order: 1 1, 1 2, 1 3, 2 1, 2 2, 2 3, 3 1, 3 2, 3 3.]
• Recall, the EUs in the blocks are the time order of reuses of the same canister.
• “1 1” means 1/2 tablet, low water; “3 3” means 1 1/2 tablets, high water.
• Recall, we randomize within each block (3 total randomizations).

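RCBD Analysis: FC Example
After randomizing within each block (blocks numbered here in the order listed):
Block 1: 1 2, 2 3, 3 1, 2 1, 1 1, 2 2, 3 3, 1 3, 3 2
Block 2: 2 1, 3 1, 3 2, 2 3, 1 3, 1 2, 1 1, 2 2, 3 3
Block 3: 1 1, 3 1, 3 3, 3 2, 2 1, 1 3, 2 3, 1 2, 2 2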

Designs with two blocking factors
These involve two major sources of variation that have been designated as blocking factors.
Response = Constant + effect of row block
+ effect of column block
+ effect of treatment + error

• All complex designs can be constructed from, and understood in terms of, the three basic designs mentioned above.

Example (CRD)
A pharmaceutical manufacturer wants to investigate the bioactivity of a new drug. A completely randomized single-factor experiment was conducted with three dosage levels, and the following results were obtained.
Dosage    Observations
20 g      24   28   37   30
30 g      37   44   31   35
40 g      42   47   52   38
Is there evidence to indicate that dosage level affects bioactivity? Use alpha = 0.05.
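One way to check a hand calculation for this example is a short one-way ANOVA sketch in plain Python. The function name `one_way_anova` and the dictionary layout are assumptions made for illustration; the sums of squares use the usual correction-factor formulas, which also work with unequal replication.

```python
def one_way_anova(groups):
    """Return (F, df_treatment, df_error) for a dict {label: [observations]}."""
    all_obs = [y for obs in groups.values() for y in obs]
    n_total = len(all_obs)
    correction = sum(all_obs) ** 2 / n_total            # (grand total)^2 / N
    ss_total = sum(y * y for y in all_obs) - correction
    ss_treat = sum(sum(obs) ** 2 / len(obs) for obs in groups.values()) - correction
    ss_error = ss_total - ss_treat
    df_treat = len(groups) - 1
    df_error = n_total - len(groups)
    f_stat = (ss_treat / df_treat) / (ss_error / df_error)
    return f_stat, df_treat, df_error


# Dosage data from the example above.
dosage = {
    "20 g": [24, 28, 37, 30],
    "30 g": [37, 44, 31, 35],
    "40 g": [42, 47, 52, 38],
}
f_stat, df_treat, df_error = one_way_anova(dosage)
print(f"F = {f_stat:.2f} on ({df_treat}, {df_error}) degrees of freedom")
```

The computed F is then compared with the tabulated 5% point of the F distribution on (2, 9) degrees of freedom, which is about 4.26; a larger computed F is evidence that dosage level affects bioactivity.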

Example (CRD)
A civil engineer is interested in determining whether four different methods of estimating flood flow frequency produce equivalent estimates of peak discharge when applied to the same watershed. The resulting discharge data (in cubic feet per second) are shown below.
Estimation method    Observations
1                     0.34    0.12    1.23    0.70    1.75    0.12
2                     0.91    2.94    2.14    2.36    2.86    4.55
3                     6.31    8.37    9.75    6.09    9.82    7.24
4                    17.15   11.82   10.95   17.20   14.35   16.82
Is there a significant difference between the methods? Use alpha = 0.05.

Example (RCBD)
A medical device manufacturer produces vascular grafts (artificial veins). These artificial veins are produced using resin. Frequently the grafts contain defects known as flicks, which are a main cause for rejection. The manufacturer suspects that extrusion pressure affects the occurrence of flicks and therefore intends to conduct an experiment to investigate this hypothesis. However, the resin is manufactured by an external supplier and delivered to the manufacturer in batches. The manufacturer suspects that there will be batch-to-batch variation and has decided to conduct a blocked design.
Extrusion         Batches of Resin
Pressure (PSI)    1      2      3      4      5      6
8500              90.3   89.2   98.2   93.9   87.4   97.6
8700              92.5   89.5   90.6   94.7   87.0   95.8
8900              85.5   90.8   89.6   86.2   88.0   93.4
9100              82.5   89.5   85.6   87.4   78.9   90.7
Is there evidence at the 5% level that extrusion pressure affects the response?
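The RCBD analysis for this example can be sketched as follows. The function name `rcbd_anova` and the row-per-pressure layout are illustrative assumptions; the data are the values listed in the example, with rows for the extrusion pressures 8500, 8700, 8900 and 9100 PSI and columns for resin batches 1 to 6.

```python
def rcbd_anova(table):
    """ANOVA for an RCBD; table[i][j] is the response for treatment i in block j."""
    a, b = len(table), len(table[0])                 # treatments, blocks
    correction = sum(map(sum, table)) ** 2 / (a * b)
    ss_total = sum(y * y for row in table for y in row) - correction
    ss_treat = sum(sum(row) ** 2 for row in table) / b - correction
    ss_block = sum(sum(col) ** 2 for col in zip(*table)) / a - correction
    ss_error = ss_total - ss_treat - ss_block
    df_treat, df_error = a - 1, (a - 1) * (b - 1)
    f_treat = (ss_treat / df_treat) / (ss_error / df_error)
    return {"ss_treat": ss_treat, "ss_block": ss_block, "ss_error": ss_error,
            "ss_total": ss_total, "df": (df_treat, df_error), "F": f_treat}


grafts = [
    [90.3, 89.2, 98.2, 93.9, 87.4, 97.6],  # 8500 PSI
    [92.5, 89.5, 90.6, 94.7, 87.0, 95.8],  # 8700 PSI
    [85.5, 90.8, 89.6, 86.2, 88.0, 93.4],  # 8900 PSI
    [82.5, 89.5, 85.6, 87.4, 78.9, 90.7],  # 9100 PSI
]
result = rcbd_anova(grafts)
print(f"SS pressure = {result['ss_treat']:.2f}, SS batch = {result['ss_block']:.2f}")
print(f"F = {result['F']:.2f} on {result['df']} degrees of freedom")
```

The F ratio for pressure is compared with the tabulated 5% point on (3, 15) degrees of freedom, about 3.29. Removing the batch sum of squares from the error term is what blocking buys: without it, batch-to-batch variation would inflate the error mean square.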

Example (LSD)
An experimenter is studying the effects of five different formulations of a chemical product on the burning rate. Each formulation is mixed from a batch of raw materials that is only large enough for five formulations to be tested. Furthermore, the formulations are prepared by different operators, and there may be a substantial difference in their skill and experience. This tells us that there are two nuisance factors: batches of raw material and operators.
Batches of       Operators
Raw Material     1       2       3       4       5
1                A=24    B=20    C=19    D=24    E=24
2                B=17    C=24    D=30    E=27    A=36
3                C=18    D=38    E=26    A=27    B=21
4                D=26    E=31    A=26    B=23    C=22
5                E=22    A=30    B=20    C=29    D=31
Is there a significant difference between the formulations at the 5% level of significance?
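A Latin square analysis of this table can be sketched in Python as below. The function name `latin_square_anova` and the two-array layout (one grid of treatment letters, one grid of responses) are assumptions for illustration; rows are batches of raw material and columns are operators, as in the table.

```python
def latin_square_anova(letters, values):
    """ANOVA for a p x p Latin square given treatment letters and responses."""
    p = len(values)
    correction = sum(map(sum, values)) ** 2 / (p * p)
    ss_total = sum(y * y for row in values for y in row) - correction
    ss_rows = sum(sum(row) ** 2 for row in values) / p - correction
    ss_cols = sum(sum(col) ** 2 for col in zip(*values)) / p - correction
    totals = {}                                   # treatment (letter) totals
    for letter_row, value_row in zip(letters, values):
        for letter, y in zip(letter_row, value_row):
            totals[letter] = totals.get(letter, 0) + y
    ss_treat = sum(t * t for t in totals.values()) / p - correction
    ss_error = ss_total - ss_rows - ss_cols - ss_treat
    df_treat, df_error = p - 1, (p - 1) * (p - 2)
    f_treat = (ss_treat / df_treat) / (ss_error / df_error)
    return {"rows": ss_rows, "cols": ss_cols, "treat": ss_treat,
            "error": ss_error, "df": (df_treat, df_error), "F": f_treat}


letters = ["ABCDE", "BCDEA", "CDEAB", "DEABC", "EABCD"]
burning_rate = [
    [24, 20, 19, 24, 24],
    [17, 24, 30, 27, 36],
    [18, 38, 26, 27, 21],
    [26, 31, 26, 23, 22],
    [22, 30, 20, 29, 31],
]
result = latin_square_anova(letters, burning_rate)
print(result)
```

The F ratio for formulations is compared with the tabulated 5% point on (4, 12) degrees of freedom, about 3.26.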

INTRODUCTION TO FACTORIAL DESIGNS
Experiments often involve several factors, and usually the objective of the experimenter is to determine the influence these factors have on the response.
Several approaches can be employed when there is more than one treatment factor.
Best-guess Approach
The experimenter selects an arbitrary combination of treatments, tests it and sees what happens.

One - Factor - at - a - time (OFAT)
Consists of selecting a starting point, or baseline set of
levels, for each factor, and then successively varying
each factor over its range with the other factors held
constant at the baseline level.

The valuable approach to dealing with
several factors is to conduct a
FACTORIAL EXPERIMENT
This is an experimental strategy in which
factors are varied together, instead of one
at a time

Consider the following example (adapted from Montgomery, 2013) of a two-factor (A and B) factorial experiment with both design factors at two levels (High and Low). The responses at the four treatment combinations are:
           A Low    A High
B High     30       52
B Low      20       40
Main effect: the change in response produced by a change in the level of a factor.
Factor A
Main effect = (40 + 52)/2 - (20 + 30)/2 = 21
Increasing factor A from the low level to the high level causes an average response increase of 21 units.
Factor B
Main effect = ?

Interaction
           A Low    A High
B High     40       12
B Low      20       50
At low level of factor B
The A effect = 50 – 20
= 30
At high level of factor B
The A effect = 12 - 40
= -28
The effect of A depends on the level chosen for factor B

“If the difference in response between the levels of one
factor is not the same at all levels of the other factors then
we say there is an interaction between the factors”
(Montgomery 2013)
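The magnitude of the interaction effect is the average difference in the two factor A effects:
AB = (-28 - 30)/2 = -29
In this case, factor A has an effect, but it depends on the level of factor B chosen.
Averaged over the levels of B, the A effect = (30 + (-28))/2 = 1, which on its own would hide the large but opposite effects of A at each level of B.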
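The arithmetic for both two-level examples can be reproduced with a small Python sketch. The function name `two_by_two_effects` and the ('low'/'high') keys are assumptions made for illustration; the responses are the values from the two examples above.

```python
def two_by_two_effects(y):
    """y maps (level of A, level of B) to the response; levels are 'low' or 'high'."""
    a_at_low_b = y["high", "low"] - y["low", "low"]     # effect of A when B is low
    a_at_high_b = y["high", "high"] - y["low", "high"]  # effect of A when B is high
    b_at_low_a = y["low", "high"] - y["low", "low"]
    b_at_high_a = y["high", "high"] - y["high", "low"]
    return {
        "A": (a_at_low_b + a_at_high_b) / 2,    # main effect of A
        "B": (b_at_low_a + b_at_high_a) / 2,    # main effect of B
        "AB": (a_at_high_b - a_at_low_b) / 2,   # interaction effect
    }


first_example = {("low", "low"): 20, ("high", "low"): 40,
                 ("low", "high"): 30, ("high", "high"): 52}
second_example = {("low", "low"): 20, ("high", "low"): 50,
                  ("low", "high"): 40, ("high", "high"): 12}

effects_1 = two_by_two_effects(first_example)
effects_2 = two_by_two_effects(second_example)
print("first example: ", effects_1)   # A main effect is 21
print("second example:", effects_2)   # A main effect is 1, AB is -29
```

In the first example the two simple effects of A (20 and 22) are nearly equal, so the interaction is small; in the second they are 30 and -28, so the main effect of A alone is misleading.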

Interaction Graphically
[Figure: two panels plotting Response against Factor A (Low, High), with one line for B Low and one line for B High. Panel titles: “A factorial experiment without interaction” and “A factorial experiment with interaction”.]

Factorial designs have several advantages:
• They are more efficient than one-factor-at-a-time experiments.
• A factorial design is necessary when interactions may be present, to avoid misleading conclusions.
• Factorial designs allow the effect of a factor to be estimated at several levels of the other factors, yielding conclusions that are valid over a range of experimental conditions.

The Two-Factor Factorial Design
The simplest type of factorial design involves only two factors.
There are a levels of factor A and b levels of factor B, and these are arranged in a factorial design.
There are n replicates, and each replicate of the experiment contains all ab treatment combinations.

Example
An engineer is designing a battery for use in a device that will be subjected to some extreme variations in temperature. The only design parameter that he can select is the plate material for the battery. For the purpose of testing, temperature can be controlled in the product development laboratory (Montgomery, 2013).
Life (in hours) Data
Material    Temperature
Type        15                    70                    125
1           130, 155, 74, 180     34, 40, 80, 75        20, 70, 82, 58
2           150, 188, 159, 126    136, 122, 106, 115    25, 70, 58, 45
3           138, 110, 168, 160    174, 120, 150, 139    96, 104, 82, 60

The design has two factors, each at three levels, and is therefore regarded as a 3^2 factorial design.
The engineer wants to answer the following questions:
1. What effects do material type and temperature have on the life of the battery?
2. Is there a choice of material that would give uniformly long life regardless of temperature?
Both factors are assumed to be fixed, hence we have a fixed effects model.
The design is a completely randomized design.

Analysis of Variance for Battery life (in hours)
Source                       DF   Seq SS    Adj SS    Adj MS    F       P-value
Material Type                 2   10683.7   10683.7    5341.9    7.91   0.002
Temperature                   2   39118.7   39118.7   19559.4   28.97   0.000
Material Type*Temperature     4    9613.8    9613.8    2403.4    3.56   0.019
Error                        27   18230.7   18230.7     675.2
Total                        35   77647.0
We have a significant interaction between temperature and material type.
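The sums of squares in the table above can be reproduced from the battery life data with a short Python sketch. The function name `two_way_anova` and the nested-list layout (material type, then temperature, then replicates) are assumptions for illustration, and the sketch covers only the balanced fixed-effects case with n replicates per cell.

```python
def two_way_anova(cells):
    """cells[i][j] holds the n replicates for level i of factor A and level j of B."""
    a, b, n = len(cells), len(cells[0]), len(cells[0][0])
    obs = [y for row in cells for cell in row for y in cell]
    correction = sum(obs) ** 2 / (a * b * n)
    ss_total = sum(y * y for y in obs) - correction
    ss_a = sum(sum(sum(c) for c in row) ** 2 for row in cells) / (b * n) - correction
    ss_b = sum(sum(sum(c) for c in col) ** 2 for col in zip(*cells)) / (a * n) - correction
    ss_cells = sum(sum(c) ** 2 for row in cells for c in row) / n - correction
    ss_ab = ss_cells - ss_a - ss_b            # interaction sum of squares
    ss_error = ss_total - ss_cells
    ms_error = ss_error / (a * b * (n - 1))
    df = {"A": a - 1, "B": b - 1, "AB": (a - 1) * (b - 1)}
    ss = {"A": ss_a, "B": ss_b, "AB": ss_ab}
    f = {k: (ss[k] / df[k]) / ms_error for k in ss}
    return ss, ss_error, ss_total, f


# Rows: material types 1-3; columns: temperatures 15, 70, 125.
battery = [
    [[130, 155, 74, 180], [34, 40, 80, 75], [20, 70, 82, 58]],
    [[150, 188, 159, 126], [136, 122, 106, 115], [25, 70, 58, 45]],
    [[138, 110, 168, 160], [174, 120, 150, 139], [96, 104, 82, 60]],
]
ss, ss_error, ss_total, f = two_way_anova(battery)
for name, label in [("A", "Material Type"), ("B", "Temperature"), ("AB", "Interaction")]:
    print(f"{label:14s} SS = {ss[name]:9.1f}  F = {f[name]:6.2f}")
print(f"{'Error':14s} SS = {ss_error:9.1f}")
```

The p-values in the table come from the F distribution and are not computed here; only the sums of squares and F ratios are reproduced.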

Interaction plot
Significant interaction is indicated by the lack of parallelism of the lines. Longer life is attained at low temperature, regardless of material type.

The General Factorial Design
The results for the two – factor factorial
design may be extended to the general
case where there are a levels of factor A,
b levels of factor B, c levels of factor C,
and so on, arranged in a factorial
experiment.

Sometimes, it is not feasible or practical
to completely randomize all of the runs
in a factorial.
The presence of a nuisance factor may
require that experiment be run in blocks.
Response = Constant + Effect of factor A + Effect of factor B
+ interaction effect + Block Effect + Error term

The 2^K Factorial Designs
This is the case of a factorial design with K factors, each at only two levels.
These levels may be quantitative or qualitative.
A complete replicate of this design requires 2^K observations and is called a 2^K factorial design.
Assumptions
1. The factors are fixed.
2. The designs are completely randomized.
3. The usual normality assumptions are satisfied.

The design with only two factors, each at two levels, is called a 2^2 factorial design.
The levels of the factors may be arbitrarily called “Low” and “High”.
Factor
A    B    Treatment Combination    Label
-    -    A Low, B Low             (1)
+    -    A High, B Low            a
-    +    A Low, B High            b
+    +    A High, B High           ab
The order in which the runs are made is random; this is a completely randomized experiment.

The four treatment combinations in the design can be represented by lower case letters.
The high level of a factor in any treatment combination is denoted by the corresponding lower case letter.
The low level of a factor in a treatment combination is represented by the absence of the corresponding letter.
The average effect of a factor is the change in the response produced by a change in the level of that factor, averaged over the levels of the other factor.

The symbols (1), a, b, ab represent the totals of the observations over all n replicates taken at each treatment combination.
A main effect = [ab + a - b - (1)] / (2n)
B main effect = [ab + b - a - (1)] / (2n)
AB effect = [ab + (1) - a - b] / (2n)

In experiments involving 2^K designs, it is always important to examine the magnitude and direction of the factor effects to determine which factors are likely to be important.
Effect magnitude and direction should always be considered along with the ANOVA, because the ANOVA alone does not convey this information.

We define:
Contrast A = ab + a - b - (1) = total effect of A
We can write the treatment combinations in the order (1), a, b, ab, also called the standard order (or Yates order).
Treatment      Factorial Effect
Combination    I    A    B    AB
(1)            +    -    -    +
a              +    +    -    -
b              +    -    +    -
ab             +    +    +    +
The above is also called the table of plus and minus signs.

Suppose that three factors, A, B and C, each at two levels, are of interest. The design is referred to as a 2^3 factorial design.
Treatment      Factorial Effects
Combination    I    A    B    AB   C    AC   BC   ABC
(1)            +    -    -    +    -    +    +    -
a              +    +    -    -    -    -    +    +
b              +    -    +    -    -    +    -    +
ab             +    +    +    +    -    -    -    -
c              +    -    -    +    +    -    -    +
ac             +    +    -    -    +    +    -    -
bc             +    -    +    -    +    -    +    -
abc            +    +    +    +    +    +    +    +
A contrast = [ab + a + ac + abc - (1) - b - c - bc]
B contrast = ?
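The table of plus and minus signs can be generated for any number of two-level factors. The sketch below is illustrative only: the function name `sign_table` is an assumption, and it relies on the rule that the sign of an effect at a treatment combination is the product of the signs of the factors in that effect (+ if the factor's lower case letter appears in the combination, - otherwise).

```python
def sign_table(factors):
    """Return (effect names, {treatment combination: signs}) in standard order."""
    effects, runs = [""], [""]
    for f in factors:
        # Doubling the lists in this way yields Yates order:
        # I, A, B, AB, C, ... and (1), a, b, ab, c, ...
        effects += [e + f for e in effects]
        runs += [r + f.lower() for r in runs]
    table = {}
    for run in runs:
        signs = []
        for effect in effects:
            sign = 1
            for f in effect:                     # product of the factor signs
                sign *= 1 if f.lower() in run else -1
            signs.append(sign)
        table[run or "(1)"] = signs
    return ["I" if e == "" else e for e in effects], table


headers, table = sign_table(["A", "B", "C"])
print("      " + " ".join(f"{h:>4s}" for h in headers))
for run, signs in table.items():
    print(f"{run:6s}" + " ".join(f"{'+' if s > 0 else '-':>4s}" for s in signs))

# Treatment combinations entering the A contrast with a plus sign.
a_col = headers.index("A")
plus_runs = [run for run, signs in table.items() if signs[a_col] > 0]
print("A contrast adds:", plus_runs)
```

Reading a column of the table gives the contrast for that effect: the combinations with a plus sign are added and those with a minus sign are subtracted, which for A reproduces the contrast written above.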

In general:
The design with K factors, each at two levels, is called a 2^K factorial design.
The treatment combinations are written in standard order using the notation introduced for the 2^2 and 2^3 designs.
