BIOMETRY BOOK

Pendael Zephania Machafuko
Department of Biometry and Mathematics
Sokoine University of Agriculture
Mobile phone: +255655397495
:+255688397495
Email address: p_zephania@yahoo.com
“not ability to reproduce but ability to produce”
Design and Analysis of Experiments
(MTH201 Lecture Notes)

Course objective
01/11/2013, Design and Analysis of Experiments
 Students should be able to design an experiment in the context of their
specialization using statistical concepts
 Students should be able to differentiate between types of
experimental designs
 Students should be able to allocate treatments appropriately to
experimental units and identify possible confounders
 Students should be able to perform analysis of variance to determine
treatment effects and examine the internal and external validity of an
experiment

Mode of teaching and assessment
 Lectures, seminars and presentations
 Final examination will contribute 60% of the end of semester
marks
 Seminar reports and presentations will contribute 20% of the
end of semester marks
 Tests will contribute 20% of the end of semester marks

Scientific studies
 Simple and effective statistical analysis
 Understanding of subject matter
 Provide precise parameter estimates
 Improved statistical power

Overview of Experimental Design
Experimental study
 Establishes a cause-effect relationship between
response and explanatory variables
 Is comparative in nature
 Explanatory factor levels are referred
to as treatments
 The unit of analysis is referred to as the
experimental unit
 Randomization: treatment levels are assigned to
experimental units at random
 Predictor variables can be
controlled
Observational study
 Establishes an association between explanatory
and response variables
 Is not comparative
 No randomization
 Predictor variables cannot be
controlled by the investigator

Application of Experimental Design
 Improve performance of a process or system
 Reduced variability and closer conformance to nominal or target
requirements
 Reduced development time
 Reduced overall cost

Treatment
 Complete description of what will be applied to the experimental
unit
 Treatments are applications that can stimulate response e.g. wheat
varieties, diets, fertilizers, nutrients
 The treatments considered in an experiment are
combinations of the levels of the factors, e.g. fertilizer (nitrogen,
phosphate, potassium) and soil type (loam, clay, sand)

Factor
 Explanatory variable(s) manipulated by the experimenter
 Levels of a factor: the values of a specific factor, e.g. cattle breed
with levels Boran, N'Dama, Friesian

Examples of experimental units
 Plots in agricultural experiments
 Pots in greenhouse experiments
 Pens or individual animals in animal experiments
 Farms or farmers in on-farm surveys/trials
 Patients in medical trials
 Farms in disease survey/trials

Examples of experimental units(1)

Examples of experimental units(2)

Response variable
 Measured as the outcome of interest in the experiment. E.g.
weight gained by calves after diet use
 In many agricultural experiments the measurement of interest is
the yield of the experimental units under the treatments, e.g. yield of
wheat, milk yield.

Response variable(1)
 Differences in the response variable between experimental
units subjected to the same treatment may be due to a number of
small uncontrollable differences, such as slight differences in:
Environment: temperature, soil conditions (fertility, acidity,
human), pests, diseases
Raw materials: slight differences in seed condition
Management regimes

Experimental error
 All variations that can be attributed to the effects of all non-
treatment factors and other unidentified disturbance factor(s)

Contribution of statistics to
experimentation
 Planning the experiment so that appropriate data can be
generated
 Knowing the mechanism that generated the data helps to identify
appropriate statistical methods
 Attain valid and objective conclusions

Principles of Experimental Design
Replication
Randomization
Blocking

Replication
 Number of times each treatment is repeated
 Instead of having a single large plot of each treatment, there are
several smaller ones known as replicates
 The difference in responses for the same treatment is due to
experimental error
 Experimental error must be small for a well designed study

Why replicates?
 Replication is desirable because it
Enlarges scope of investigation
Enhances precision and overall efficiency
Minimizes experimental error because it reduces plot size to a
precision-enhancing form
Permits determination of experimental error

Properties of replication
Replication provides the basic unit of measurement for determining
whether the observed differences in the data are statistically
different
Permits precise estimation of a treatment effect when the sample mean
is used to estimate the effect of a factor, e.g., if $\sigma^2$ is the
variance of an individual observation and there are $n$
replicates, the variance of the sample mean is
$\sigma_{\bar{y}}^2 = \sigma^2 / n$
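The effect of replication on precision can be illustrated with a small sketch. The function name `variance_of_mean` and the plot variance of 16 are assumptions chosen for illustration only; they do not come from the notes.

```python
import math


def variance_of_mean(sigma2: float, n: int) -> float:
    """Variance of the mean of n independent observations, each with variance sigma2."""
    if n < 1:
        raise ValueError("n must be at least 1")
    return sigma2 / n


sigma2 = 16.0  # hypothetical variance of a single observation
for n in (1, 4, 16):
    var_mean = variance_of_mean(sigma2, n)
    # The standard error is the square root of the variance of the mean.
    print(f"n={n:2d}  variance of mean={var_mean:5.2f}  standard error={math.sqrt(var_mean):4.2f}")
```

Under these assumed numbers, quadrupling the number of replicates halves the standard error of the treatment mean.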

Randomization
 Act of assigning treatments to the experimental units purely on
the basis of chance, i.e. every treatment has an equal chance of being
allocated to any given plot
 Statistical methods require that the observations be
independently distributed random variables
 Averages out the effects of extraneous factors that may be present, i.e.
systematic effects that are not under the control of the investigator
 Makes statistical estimation and tests of hypotheses on effects
theoretically valid

Why randomize?
 Overcome systematic effects
 Avoid selection bias
 Minimize accidental bias
 Stop experimental cheating (for good or bad)
 Ensure no particular patterns in treatment allocation

How to randomize
 Table of random numbers
 Computer package
 Randomization schemes, such as simple and permuted blocks

Blocking
 Heterogeneous experimental units are divided into
homogeneous subgroups called blocks, so that block variation
that could distort treatment effects can be isolated
 Heterogeneity may be due to soil fertility, land gradient, animal
weights, age, etc.
 Used to improve precision when comparisons among the
factors of interest are made
 Reduces or eliminates the variability transmitted from nuisance
factors, i.e. factors that influence the experimental response but
are not of interest

Blocking variables (1)
 In agricultural experiments;
Soil type or fertility level
Extent and nature of previous cropping
Degree of pest infestation
Direction of wind in wind-control pest disease trial
Moisture level

Blocking variables(2)
 In livestock experiments, animals of similar:
Weight
Age
Previous milk yield
Lactation

Why blocking?
 Blocking is an error-control strategy that, when used effectively,
reduces error variance
increases precision
improves the reliability of estimates of effects

Advantages of blocking
 Guarantees that each treatment is applied to the same number of
experimental units within each homogeneous group
 Increases the range of validity of the conclusions from the
experiment, i.e. the variability between blocks of experimental
units allows the results to be generalized more widely
 High precision because of small experimental errors within
blocks

Experimental validity
 Assessment of the quality of an experimental design requires
knowledge of the factors that influence or cause variation in the
measured outcomes
 Two concepts to consider
Internal validity: conclusions concern only the relationship between
the dependent and independent variables within the experiment
External validity: conclusions from the experiment can be
appropriately generalized to a wider situation of interest

Assignment
 With respect to your profession, design an experiment and
describe the following:
 the experimental units
 the treatments
 the response variable
 how the three principles of experimental design are used
 whether the experiment is externally valid
 the assumptions of your experiment
 an appropriate statistical methodology

Types of experimental design
 Some basic designs commonly used in field experiments;
Single level experimental units designs
Completely randomized designs
Randomized complete block designs
Latin square designs
Multiple level experimental units designs
Split-plot Designs
On-farm experiments
Inter-cropping
Repeated measures experiments

Single level experimental units designs
 Treatments applied to the plots and measurements taken on the
plots

Completely Randomized Design
 Levels of treatment are randomly assigned to the experimental
units (no allocation restrictions)
 Variation is expected only from between-treatment and
within-treatment differences
 Within-treatment variation is due to experimental units behaving
differently under the same treatment
 Experimental units are assumed to be homogeneous, i.e. similar in their
reaction to the same treatment stimulus
 The basic CRD has one treatment factor with L levels and n replicates

CRD Example
 Suppose that a study involves three varieties of wheat and there
are 27 plots available
 With equal replication, the three wheat varieties are randomly
allocated to the plots, 9 for each: $N = nL = 9 \times 3 = 27$ (balanced design)
 With unequal replication we may have 11 plots of variety 1, 7 plots
of variety 2 and 9 plots of variety 3:
$N = \sum_{i=1}^{L} n_i = 11 + 7 + 9 = 27$ (unbalanced
design)
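The allocation in this example can be sketched with Python's standard `random` module. The function name `allocate_crd`, the labels V1 to V3 and the seed are assumptions made for illustration; the plot counts are those of the example.

```python
import random
from collections import Counter


def allocate_crd(replicates, seed=None):
    """Assign treatments to plots completely at random.

    replicates maps each treatment label to its number of plots.
    The k-th entry of the returned list is the treatment for plot k + 1.
    """
    labels = [t for t, n in replicates.items() for _ in range(n)]
    random.Random(seed).shuffle(labels)  # unrestricted random permutation
    return labels


balanced = allocate_crd({"V1": 9, "V2": 9, "V3": 9}, seed=1)
unbalanced = allocate_crd({"V1": 11, "V2": 7, "V3": 9}, seed=1)

print(balanced)
print(Counter(unbalanced))
```

Fixing the seed makes the layout reproducible, which is useful when the field plan has to be documented.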

Prospects and problems of CRD
Advantages
 Easy to set up and analyze
 Provides the maximum number of
degrees of freedom for
estimation of error variation
 Missing values cause no
difficulty
Disadvantages
 Suitable only for
homogeneous experimental
material
 Suitable only for small
numbers of treatments

CRD Model
 Model
Yield = overall mean + treatment effect + experimental error, i.e.
$y_{ij} = \mu + \tau_i + \varepsilon_{ij}$, where $i = 1, 2, \dots, L$ and $j = 1, 2, \dots, n_i$
 Assumptions
Additive effects
Independent error terms
Constant variance of error terms
Normal error terms
 Analysis to obtain
Treatment effects
Experimental error variance
Test of treatment effects

CRD Outcome measurements
                 Treatment levels
                 1           2           ...   L
                 y_{11}      y_{21}      ...   y_{L1}
                 y_{12}      y_{22}      ...   y_{L2}
                 ...         ...               ...
                 y_{1n_1}    y_{2n_2}    ...   y_{Ln_L}
Sample mean      \bar{y}_1   \bar{y}_2   ...   \bar{y}_L
Sample SD        s_1         s_2         ...   s_L

CRD Analysis of Variance
ANOVA table
Source of variation   Degrees of freedom (d.f.)   Sum of squares (SS)   Mean square (MS)         F-ratio
Treatments            L - 1                       SSTR                  MSTR = SSTR / (L - 1)    F = MSTR / MSE
Error                 N - L                       SSE                   MSE = SSE / (N - L)
Total                 N - 1                       SST
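The entries of the ANOVA table can be computed directly from the data. The sketch below uses only the standard library; the function name `one_way_anova` and the yields are hypothetical and are not the data of the course example.

```python
def one_way_anova(groups):
    """One-way (CRD) ANOVA for a list of samples, one list per treatment level."""
    observations = [y for group in groups for y in group]
    N = len(observations)  # total number of experimental units
    L = len(groups)        # number of treatment levels
    grand_mean = sum(observations) / N
    means = [sum(group) / len(group) for group in groups]

    # Between-treatment, within-treatment and total sums of squares.
    sstr = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    sse = sum((y - m) ** 2 for g, m in zip(groups, means) for y in g)
    sst = sum((y - grand_mean) ** 2 for y in observations)

    mstr = sstr / (L - 1)
    mse = sse / (N - L)
    return {"SSTR": sstr, "SSE": sse, "SST": sst,
            "MSTR": mstr, "MSE": mse, "F": mstr / mse}


# Hypothetical yields for three treatments with three replicates each.
yields = [[20, 22, 24], [25, 27, 29], [30, 32, 34]]
table = one_way_anova(yields)
for name, value in table.items():
    print(f"{name}: {value:.2f}")
```

The computed F is then compared with the tabulated F value on L - 1 and N - L degrees of freedom.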

CRD-Sum of squares
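For reference, the standard one-way definitions of the three sums of squares are written out below, using the notation of the ANOVA table (SST, SSTR, SSE), with $\bar{y}$ for the grand mean and $\bar{y}_i$ for the mean of treatment $i$. This is the textbook form and is assumed to match what the slide showed.

```latex
\begin{aligned}
SST  &= \sum_{i=1}^{L} \sum_{j=1}^{n_i} \left( y_{ij} - \bar{y} \right)^2 \\
SSTR &= \sum_{i=1}^{L} n_i \left( \bar{y}_i - \bar{y} \right)^2 \\
SSE  &= \sum_{i=1}^{L} \sum_{j=1}^{n_i} \left( y_{ij} - \bar{y}_i \right)^2 \\
SST  &= SSTR + SSE
\end{aligned}
```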

CRD Example

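CRD calculation
 Grand mean:
$\bar{y} = \frac{\sum_i \sum_j y_{ij}}{N} = \frac{74 + 54 + 32 + 74 + 60 + \dots + 54}{15} = 57.4$
 Squared deviations for each observation, with treatment means
68.33, 42, 53, 72.667 and 51:
(y_{ij} - \bar{y})^2    (y_{ij} - \bar{y}_i)^2    (\bar{y}_i - \bar{y})^2
(74 - 57.4)^2           (74 - 68.33)^2            (68.33 - 57.4)^2
(54 - 57.4)^2           (54 - 42)^2               (42 - 57.4)^2
(32 - 57.4)^2           (32 - 53)^2               (53 - 57.4)^2
...                     ...                       (72.667 - 57.4)^2
...                     ...                       (51 - 57.4)^2
...                     ...                       ...
(54 - 57.4)^2           (54 - 51)^2               (51 - 57.4)^2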
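The treatment sum of squares for this example can be checked from the treatment means alone. The sketch assumes that 68.33, 42, 53, 72.667 and 51 are the five treatment means and that each treatment has three replicates (15 observations in five groups); both are readings of the calculation table, not values stated explicitly in the notes.

```python
# Treatment means as read from the calculation table (assumed interpretation).
treatment_means = [68.33, 42.0, 53.0, 72.667, 51.0]
n = 3  # assumed replicates per treatment: 15 observations / 5 treatments

# In a balanced design the grand mean equals the mean of the treatment means.
grand_mean = sum(treatment_means) / len(treatment_means)

# SSTR = n * sum over treatments of (treatment mean - grand mean)^2
sstr = n * sum((m - grand_mean) ** 2 for m in treatment_means)
mstr = sstr / (len(treatment_means) - 1)

print(f"grand mean = {grand_mean:.1f}")
print(f"SSTR = {sstr:.1f}, MSTR = {mstr:.1f}")
```

The grand mean recovered this way agrees with the 57.4 computed from the individual observations, which supports the reading of the table.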

Decomposition of the SST

CRD Example(1)

CRD Hypothesis testing for effects
model

CRD hypothesis for cell means model
 $H_0: \mu_1 = \mu_2 = \mu_3 = \dots = \mu_L$
 Treatment means are the same
 $H_1$: not all $\mu_i$ are equal
 Treatment means are not all the same
 Significance level $\alpha = 5\%$
 Test statistic is the ratio of two variances, $F_c = \frac{MSTR}{MSE}$,
which under $H_0$ follows $F(f_1, f_2)$ with $f_1 = L - 1$ and $f_2 = N - L$
 Decision: if $F_c > F(f_1, f_2)$, reject $H_0$ at the
$\alpha$ significance level
 If $F_c < F(f_1, f_2)$, do not reject $H_0$
 Conclusion when $H_0$ is rejected: there is statistical evidence that
the treatment means are not all equal

CRD hypothesis for cell means model
 $F_c = 2.199$, $F_{4,10} = 3.48$
 Since $F_c < F_{4,10}$, we do not reject $H_0$ that the treatment
means are the same at the 5% level of significance.
 Conclusion: there is no statistical evidence that the
treatment means are different.

Comparison of individual treatment
means(1)

Comparison of individual treatment
means(2)

Estimation

Randomized Complete Block Design
(RCBD)
 The RCBD is the standard design for agricultural experiments
 Goal is to improve the experiment by reducing the amount of
variability affecting the treatments
 Field is divided into units to account for any variation in the field
 Treatments are assigned at random within blocks of adjacent
plots, each treatment once per block
 Number of blocks is the number of replications
 Very important in improving experiments as it allows some
control of uncontrolled variation
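Randomization within blocks can be sketched as follows: each block receives its own independent random ordering of the full treatment set, so every treatment appears exactly once per block. The function name `randomize_rcbd`, the treatment labels and the seed are illustrative assumptions.

```python
import random


def randomize_rcbd(treatments, n_blocks, seed=None):
    """Return one independently shuffled copy of the treatments per block."""
    rng = random.Random(seed)
    layout = []
    for _ in range(n_blocks):
        order = list(treatments)  # every block contains all treatments
        rng.shuffle(order)        # separate randomization for each block
        layout.append(order)
    return layout


layout = randomize_rcbd(["A", "B", "C", "D"], n_blocks=3, seed=42)
for block_number, order in enumerate(layout, start=1):
    print(f"Block {block_number}: {' '.join(order)}")
```

With three blocks, each treatment is replicated three times, in line with the number of blocks being the number of replications.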

RCBD (1)
 Any treatment can be adjacent to any other treatment, but not to
the same treatment within the block
 Used to control variation in an experiment by accounting for
spatial effects.

RCBD (2)
 "Complete": each block contains all the treatments
 Variability arising from a nuisance factor can affect the results
 A nuisance factor has an effect on the response but is not of interest
 Unknown and uncontrolled: randomization can help to eliminate
its effect
 Known but uncontrollable: analysis of covariance
 Known and controllable: blocking systematically eliminates its
effect

RCBD Example
 Experiment was planned for execution in three batches to
accommodate goats that kidded at different times
 Each batch on its own can be considered as a completely
randomized design
 Together they form a randomized block design with batch taking
the role of block

RCBD Model
 Model
Yield = mean + treatment + block + error, i.e.
$y_{ij} = \mu + \tau_i + \beta_j + \varepsilon_{ij}$, $i = 1, 2, \dots, L$, $j = 1, 2, \dots, b$
 Assumptions
Additive effects
Independent error terms
Constant variance of error terms
Normal distribution of error terms
No block-treatment interactions
 Analysis to obtain
Treatment effects
Experimental error variance
Tests of treatment and block effects

Decomposition of SST in RBD

RBD Analysis of Variance
ANOVA table
Source of variation   Degrees of freedom   Sum of squares   Mean square   F-ratio
Blocks                b - 1                SSB              MSB           F_B = MSB / MSE
Treatments            L - 1                SSA              MSA           F_T = MSA / MSE
Error                 (b - 1)(L - 1)       SSE              MSE
Total                 bL - 1               SSG
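A minimal sketch of how the entries of this table can be computed, using the table's own labels (SSA, SSB, SSE, SSG). The function name `rcbd_anova` and the data are hypothetical and unrelated to the course examples.

```python
def rcbd_anova(data):
    """ANOVA for a randomized block design; data[i][j] is treatment i in block j."""
    L = len(data)     # number of treatments
    b = len(data[0])  # number of blocks
    if any(len(row) != b for row in data):
        raise ValueError("every treatment must appear once in every block")

    grand_mean = sum(sum(row) for row in data) / (L * b)
    treatment_means = [sum(row) / b for row in data]
    block_means = [sum(data[i][j] for i in range(L)) / L for j in range(b)]

    ssa = b * sum((m - grand_mean) ** 2 for m in treatment_means)  # treatments
    ssb = L * sum((m - grand_mean) ** 2 for m in block_means)      # blocks
    ssg = sum((y - grand_mean) ** 2 for row in data for y in row)  # total
    sse = ssg - ssa - ssb                                          # error

    mse = sse / ((b - 1) * (L - 1))
    return {"SSA": ssa, "SSB": ssb, "SSE": sse, "SSG": ssg,
            "F_T": (ssa / (L - 1)) / mse, "F_B": (ssb / (b - 1)) / mse}


# Hypothetical responses: three treatments (rows) in three blocks (columns).
responses = [[10, 12, 14],
             [13, 15, 20],
             [16, 21, 23]]
result = rcbd_anova(responses)
for name, value in result.items():
    print(f"{name}: {value:.2f}")
```

Obtaining SSE by subtraction relies on the additive model with no block-treatment interaction, as listed in the model assumptions.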

RCBD Hypothesis testing

Hypothesis testing(1)

Prospects and problems of RBD
Advantages
 Controls local variability
 Accommodates any number of
replications
 Different experimental
techniques can be used in
different blocks
 Simple analysis
Disadvantages
 Not feasible for a large number
of treatments, as block size is
increased, thus reducing plot
homogeneity
 Invalid results if the assumed
block homogeneity is violated

Statistical assumptions
 The variance of the error term is constant, regardless of factor level,
i.e. $\sigma^2\{Y_{ij}\} = \sigma^2\{\varepsilon_{ij}\} = \sigma^2$
 Error terms are normally distributed; because each observation is a
linear function of its error term, the observations are normally
distributed as well
 Error terms are independent, i.e. the error term of the outcome of
any trial has no effect on the error term of any other trial for the same
factor level
 The ANOVA model is $Y_{ij} \sim N(\mu_i, \sigma^2)$

RBD example
 An experiment was designed to study the performance of four
different detergents for cleaning clothes. The following
"cleanliness" readings (higher = cleaner) were obtained using a
special device for three different types of common stains. Is there
a significant difference among the detergents?

Why blocking?
 Homogeneous experimental units
 Experimental error as small as possible
 Improves the accuracy of the comparisons among treatments

Latin Square Design
 The randomized block design uses only one blocking variable
 It is not appropriate when more than one blocking
variable needs to be controlled
 When there are two blocking variables and one set of treatments, the
design that can handle such a case is the Latin square design
 In a Latin square design each treatment occurs once, and only
once, in each row and each column

Building Latin Square Design
 For $p$ treatments, there are $p^2$
observations
 Observations are placed in $p$ rows and $p$ columns, which form a
$p \times p$ grid, in such a way that each treatment occurs once, and
only once, in each row and column.
 For 4 treatments $A, B, C, D$ and two factors to control, the Latin
square design is a $4 \times 4$ grid of the letters $A$ to $D$
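One common way to build such a square is to start from a cyclic arrangement and then permute rows and columns at random. The sketch below follows that approach; it is not necessarily the layout shown in the lecture, and the function names `latin_square` and `is_latin` are assumptions.

```python
import random


def latin_square(treatments, seed=None):
    """Build a p x p Latin square by shuffling rows and columns of a cyclic square."""
    p = len(treatments)
    rng = random.Random(seed)
    # Cyclic square: row r is the treatment list shifted left by r positions.
    square = [[treatments[(r + c) % p] for c in range(p)] for r in range(p)]
    rng.shuffle(square)       # random permutation of the rows
    columns = list(range(p))
    rng.shuffle(columns)      # random permutation of the columns
    return [[row[c] for c in columns] for row in square]


def is_latin(square):
    """Check that every symbol occurs exactly once in each row and each column."""
    p = len(square)
    symbols = set(square[0])
    rows_ok = all(len(row) == p and set(row) == symbols for row in square)
    cols_ok = all({square[r][c] for r in range(p)} == symbols for c in range(p))
    return len(symbols) == p and rows_ok and cols_ok


square = latin_square(["A", "B", "C", "D"], seed=0)
for row in square:
    print(" ".join(row))
print(is_latin(square))
```

Permuting whole rows and whole columns preserves the Latin property, so the check at the end holds for any seed.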

Latin Square Model

Latin Square-Statistical analysis
 The total sum of squares ($SS_T$) partitions into sums of squares due
to columns, rows, treatments and error.
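Written out for a $p \times p$ square, the partition and the matching split of the degrees of freedom take the standard form below. The subscripted names are assumed labels, chosen to agree with the mean squares MSRow, MSCol, MSTr and MSE used in the hypothesis tests.

```latex
\begin{aligned}
SS_T &= SS_{Row} + SS_{Col} + SS_{Tr} + SS_E \\
p^2 - 1 &= (p - 1) + (p - 1) + (p - 1) + (p - 1)(p - 2) \\
MS_{Row} &= \frac{SS_{Row}}{p - 1}, \quad
MS_{Col} = \frac{SS_{Col}}{p - 1}, \quad
MS_{Tr} = \frac{SS_{Tr}}{p - 1}, \quad
MS_E = \frac{SS_E}{(p - 1)(p - 2)}
\end{aligned}
```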

Latin square-ANOVA Table

Latin Square-Hypothesis testing
 Treatment effects
$H_0: \tau_{.1.} = \tau_{.2.} = \dots = \tau_{.p.}$ vs $H_1$: not all equal
Test statistic $F_{tr} = \frac{MSTr}{MSE}$
 Column effects
$H_0: \tau_{..1} = \tau_{..2} = \dots = \tau_{..p}$ vs $H_1$: not all equal
Test statistic $F_{col} = \frac{MSCol}{MSE}$
 Row effects
$H_0: \tau_{1..} = \tau_{2..} = \dots = \tau_{p..}$ vs $H_1$: not all equal
Test statistic $F_{row} = \frac{MSRow}{MSE}$

Example -LSD
 Consider an experiment to investigate the effect of four different
diets on milk production of cows. There are four cows in the
study. During each lactation period the cows receive a different
diet. Assume that there is a washout period between diets so that
the previous diet does not affect future results. Lactation period and
cow are used as blocking variables

Factorial Design
 Two or more factors can be studied simultaneously
 Every combination of the factor levels is studied in every complete
trial (replicate)
 Given two factors $A$ and $B$ with $a$ and $b$ levels respectively, each
replicate contains all the $a \times b$ treatment combinations
 The effect of factor $A$ is the change in response due to a change
in the level of $A$
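The treatment combinations of a two-factor design can be listed with `itertools.product`. The sketch reuses the fertilizer and soil-type levels mentioned earlier in these notes; the function name `factorial_treatments` is an assumption.

```python
from itertools import product


def factorial_treatments(levels_a, levels_b):
    """Return every combination of one level of factor A with one level of factor B."""
    return list(product(levels_a, levels_b))


fertilizer = ["nitrogen", "phosphate", "potassium"]  # factor A, a = 3 levels
soil_type = ["loam", "clay", "sand"]                 # factor B, b = 3 levels

combinations = factorial_treatments(fertilizer, soil_type)
print(len(combinations))  # a * b treatment combinations per replicate
for a_level, b_level in combinations:
    print(a_level, "+", b_level)
```

Each replicate of the experiment would contain all nine of these combinations.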
