FEDERAL UNIVERSITY OF AGRICULTURE ABEOKUTA (FUNAAB)
P.M.B. 2240, ABEOKUTA, OGUN STATE.
EXPERIMENTAL DESIGN AND STATISTICAL POWER IN SWINE
EXPERIMENTATION: A REVIEW
KAREEM, Damilola Uthman
DEPARTMENT OF ANIMAL NUTRITION
COLLEGE OF ANIMAL SCIENCE AND LIVESTOCK PRODUCTION
APRIL 2019
Introduction
Animal studies continue to play a vital role in the development of science (Aguilar-Nascimento, 2005). Experimental research is the key to developing new feeds, feeding regimes and feeding standards that bring about improvement in animal nutrition. Animal experiments should inform decisions about which treatments are taken forward into trials only if their results are valid and precise. Biased or imprecise results from animal experiments may result in the testing of biologically inert or other substances in animal trials, thus wasting time and the limited available resources without obtaining favourable results (Roberts et al., 2002). Unfortunately, some researchers ignore the principles of experimental design, generating incorrect data and thus reaching wrong conclusions. Sometimes these experiments are unnecessarily repetitive and unethical and, as a result, waste both money and resources (Festing, 2003; Pound and Ebrahim, 2004).
All research arises from the need to obtain new information. First, the design should define clearly what information is sought; in other words, the researcher should state the question to be answered by the experiment. Once the question is stated, the method is delineated step by step and executed, and the data are collected. Data analysis is the next step, and finally the written report should raise new questions to be answered in further experiments. This cycle is vital for lines of research. According to Aguilar-Nascimento (2005), there are two types of experiments: confirmatory experiments, which aim at testing one or more hypotheses (for example, an experiment may be set up to investigate whether diet A is associated with greater performance than diet B), and exploratory experiments, which look at producing data that may be important for the generation of hypotheses to be tested. However, confirmatory and exploratory experiments often overlap in the same study (Johnson and Besselsen, 2002). As a rule, all experiments should be presented in a way that allows other
researchers to repeat them elsewhere. To that end, all experiments should clearly state the aim, the reason for choosing the animal model, and the species, strain and source of the animals. Every detail of the method should be stated, including the number of animals, the method of randomization and the statistical methods used (Aguilar-Nascimento, 2005). This knowledge provides researchers in swine nutrition with the means to determine the information needed for an experiment of known power and sensitivity. Such an a priori, or prospective, power analysis, conducted as part of a pre-experiment protocol, ensures that a researcher does not waste time and resources carrying out an experiment that has little chance of finding a significant effect, if one truly exists (Aaron and Hays, 2004). It also ensures that resources are not wasted by including more replicates than are necessary to detect an effect.
Experimental designs
The primary designs used in swine production and nutrition research include the completely
randomized design (CRD) and the randomized complete block design (RCBD). Modifications or
additions to these designs can be performed to generate more complex designs, such as factorial designs and Latin squares, which typically are used in specific instances when experimental units are limited. One of the main functions of the experimental design is to dictate the process of allotting treatments to experimental units (EU). No matter what design is used, it is important to balance studies by having equal replication of each treatment factor to maximize the power available to detect treatment differences. The CRD is the simplest of all designs; treatments are allotted to EU independently of any factors. This design allows the most degrees of freedom (DF) for the error term in the model to test for treatment differences. However, the CRD can be unreliable if the EU are not homogeneous. Non-homogeneity of EU can inflate the error variance components and increase the chance of a Type II error.
In the RCBD, treatments are allotted to EU on the basis of some factor, commonly referred to as
the blocking factor, which should reduce the error variance if the blocking factor is important. The
blocking factor groups EU based on that particular factor into a block, with each treatment having
a minimum of one EU in each block. The primary function of blocking is to obtain groups of homogeneous EU. Blocking factors vary according to the type of trial and may differ depending on the desired treatment structures. One of the assumptions of this design is that treatments respond similarly in each block, i.e. that there is no true block × treatment interaction, because the mean square calculated from the block × treatment source estimates the error variance for the model. One way to examine the blocking factor's effectiveness is to
determine its relative efficiency (RE). Relative efficiency is a calculation performed after the trial
is completed to show the ratio between an estimated error term if the study were conducted as a
CRD and the error term for the RCBD. It also describes the increased number of experimental
units that are needed in a CRD to achieve the same error variance component term as in a RCBD.
For example, if the RE for a particular response variable is calculated to be 2.00, one can assume that the estimate of the error variance component is 2.00 times greater in the CRD than in the RCBD; theoretically, the CRD would need twice as many experimental units to achieve the same estimated error variance component as the RCBD.
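The RE calculation can be illustrated with a short sketch. The mean squares, block count and treatment count below are hypothetical, and the pooled estimate of the CRD error variance follows the standard textbook decomposition (e.g. Kuehl, 2000):

```python
def relative_efficiency(ms_block, ms_error, b, t):
    """Estimate the relative efficiency of an RCBD versus a CRD.

    ms_block, ms_error: block and error mean squares from the RCBD ANOVA.
    b = number of blocks, t = number of treatments.
    """
    f_b = b - 1          # block degrees of freedom
    f_t = t - 1          # treatment degrees of freedom
    f_e = f_b * f_t      # RCBD error degrees of freedom
    # Error variance the same data would have shown under a CRD,
    # pooled from the block and error mean squares
    ms_crd = (f_b * ms_block + (f_t + f_e) * ms_error) / (f_b + f_t + f_e)
    return ms_crd / ms_error

# Hypothetical mean squares from a nursery trial with 6 blocks and 4 diets
rel_eff = relative_efficiency(ms_block=8.0, ms_error=2.0, b=6, t=4)
# rel_eff > 1 means blocking paid off; rel_eff near 2 means a CRD would need
# roughly twice as many experimental units for the same precision
```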
It has been a common practice to block nursery studies to achieve a reduced estimate for the error
component of an experiment. Often, these studies are blocked simultaneously by location in the
barn and initial weight. Both of these factors could affect performance and affect the interpretation
of results if not equalized across treatments.
Factors to consider in designing an animal experiment
During the preparation of the study design, important factors to consider include the number of animals to be used, pilot studies, randomization, blinding, control groups, the type of variables collected, and the statistical methods (Festing, 2003; Johnson and Besselsen, 2002).
Number of Animals
The number of animals assigned to the experimental groups and sub-groups is vital. The calculations can easily be done using online calculators; examples of such websites are http://www.biomath.info and http://www.stat.uiowa.edu/~rlenth/Power. According to Aguilar-Nascimento (2005), computer software such as SPSS, SAS and Epi-6 may also be used to calculate the figures; most of the time, mathematical formulas are used. Identifying the number of animals to be used is fundamental to avoid the β error (1 − power). The β error is the chance of obtaining a false negative result, i.e. the experiment fails to reject a false null hypothesis and so misses a real treatment effect (Dell et al., 2002). Where possible, a power analysis should be performed to ascertain the number of animals per group.
Pilot studies
This is an important step to help ensure that the entire experiment will work out well. Frequently, only a few animals are required, though a larger number may be necessary (Festing and Altman, 2003). Pilot studies are also important for estimating the size of the experiment, i.e. the number of animals that may be necessary (Johnson and Besselsen, 2002; Festing and Altman, 2003; Dell et al., 2002). Sometimes the original design is changed during the course of the pilot study because of its outcomes. Therefore, researchers should regard pilot studies as a useful tool in planning the project (Festing and Altman, 2003).
Randomization
Randomization is another valuable topic in experimental design (Altman and Dore, 1990). The allocation of animals to the different treatment groups should be at random for the following reasons:
i. to avoid biases,
ii. to guarantee that groups have the same probability of receiving a treatment, and
iii. to control variation.
The method used to randomize should be clearly stated. Dice, envelopes containing pieces of paper with codes, and tables of random numbers are examples of frequently used methods of randomization (Johnson and Besselsen, 2002). Experiments with either completely randomized groups or randomized blocking designs are both correct (Festing, 2003). Blocking refers to the direct manipulation of one or more independent variables to control variation (Johnson and Besselsen, 2002; Shaw et al., 2002). Ancillary variables such as sex and weight may first be manipulated to confer minimal variation between the groups (Das, 2002). The researcher may place comparable animals in cages and then randomize them to the groups.
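As an illustration, the within-block randomization described above can be sketched as follows. The pig identifiers, weights and diet names are hypothetical; blocks of comparable weight are formed first, and treatments are then shuffled independently within each block so that every block contains each diet once:

```python
import random

def allot_to_treatments(pigs, treatments, seed=42):
    """Randomly allot pigs to treatments within weight blocks.

    `pigs` is a list of (id, weight) tuples; blocks of len(treatments)
    pigs are formed from heaviest to lightest, and the treatment list is
    shuffled independently within each block.
    """
    random.seed(seed)  # record the seed so the randomization is reproducible
    ranked = sorted(pigs, key=lambda p: p[1], reverse=True)
    allotment = {}
    for i in range(0, len(ranked), len(treatments)):
        block = ranked[i:i + len(treatments)]
        trts = list(treatments)
        random.shuffle(trts)  # random assignment within the block
        for pig, trt in zip(block, trts):
            allotment[pig[0]] = trt
    return allotment

# Eight pigs with hypothetical starting weights, allotted to two diets
pigs = [(f"pig{i}", w) for i, w in
        enumerate([20.1, 18.4, 22.0, 19.5, 21.2, 17.9, 20.8, 18.8])]
print(allot_to_treatments(pigs, ["diet A", "diet B"]))
```

Because each block contains each diet exactly once, the design stays balanced: four pigs per diet, with diets spread evenly across the weight range.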
Blinding
This is a procedure in which one or more parties in a trial are kept unaware of which treatment arms participants have been assigned to. It is an important aspect of any trial, done to prevent conscious or unconscious bias in the design and execution of the trial. When two or more treatments are being compared, the researcher must diminish the occurrence of bias. Thus, the experiment should be done "blind" to diminish the possibility of a subjective effect in data collection (Johnson and Besselsen, 2002; Festing and Altman, 2003).
Control groups
Control groups should be planned with care. They are fundamental in experimental designs and should be preferred to historical comparisons (Johnson and Besselsen, 2002). There are many types of control groups, although the most important are the positive, negative, sham and comparative controls (Festing and Altman, 2003). Positive controls are those in which an effect or change is most expected. They are necessary to estimate the alterations that a condition may cause and then detect what the investigated treatment may modify. For example, the effect of two different diets on recovery from malnutrition in swine must be compared with a positive control group studied during malnutrition. In a negative control no changes are expected; it is like a mirror of the positive control. Sham controls are used to mimic a procedure or treatment, while a comparative control is a type of positive control in which a known treatment is used as a contrast with the newly investigated treatment (Johnson and Besselsen, 2002; Festing and Altman, 2003).
Types of variables
A variable can be continuous, ordinal or categorical (Altman and Dore, 1990; Kinnear, 1994; Johnson and Besselsen, 2002). Continuous variables are those expressed by numbers (serum glucose level, anastomotic bursting pressure, heart rate, etc.). When a score with a limited range, such as 0, +, ++ and +++, is used, the variable is termed ordinal. When an effect that may or may not occur, such as death or infection, is considered, the variable is said to be categorical. Whenever possible, the researcher should use continuous variables because with them an effect may be noticed earlier and with fewer animals (Johnson and Besselsen, 2002; Festing and Altman, 2003; Shaw et al., 2002).
The choice of experimental units in swine experiments
Experiments can be conducted on individual pigs, meaning measurements are made on each
individual randomly assigned to the treatments (Festing and Altman, 2002). Each pig has unique
housing and microclimate conditions and represents one degree of freedom in the analysis of
variance. Housing pigs individually in production settings is rare, so using observations on
individuals is not always appropriate. For instance, the heat production (metabolic rate) of
individually and colony-housed pigs may be different due to huddling, particularly under cool or
cold conditions. Thus, housing pigs individually has the potential to compromise the application
of results to field conditions where the pigs are raised together in a house.
Replication
Replication, according to Aaron and Hays (2004), refers to the assignment of more than one experimental unit to the same treatment. Each replication is an independent observation; thus, each replication involves a different experimental unit. Correct definition of the experimental unit determines the entity to be replicated. Consider, for example, an experiment conducted by a researcher to compare the effects of four different diets on the performance of growing-finishing pigs. Four pens of the same size are available, and each will house eight pigs of the desired age and weight. The researcher randomly assigns eight pigs to each pen and then randomly assigns diets to pens. The researcher believes "pig" is the experimental unit and that there are eight replications. However, because diets were assigned to pens, and all pigs in the same pen receive the same diet, "pen" constitutes the experimental unit. As a result, the experiment has no replication, and further assumptions are needed before valid conclusions can be drawn (Aaron and Hays, 2004).
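The consequence of "pen" being the experimental unit can be sketched with hypothetical daily-gain records: pig-level values are collapsed to pen means, so that the pen, the entity to which diet was assigned, supplies the observations for analysis, and replication requires at least two pens per diet.

```python
from statistics import mean

# Hypothetical average-daily-gain records; diets were assigned to whole
# pens, so the pen, not the pig, is the experimental unit.
records = [
    # (pen, diet, pig ADG in kg/d)
    ("pen1", "A", 0.82), ("pen1", "A", 0.79), ("pen1", "A", 0.85),
    ("pen2", "A", 0.77), ("pen2", "A", 0.80), ("pen2", "A", 0.81),
    ("pen3", "B", 0.88), ("pen3", "B", 0.84), ("pen3", "B", 0.86),
    ("pen4", "B", 0.90), ("pen4", "B", 0.83), ("pen4", "B", 0.87),
]

# Collapse pig records to one observation per pen before any treatment
# comparison; pigs within a pen are subsamples, not replicates.
pens = {}
for pen, diet, adg in records:
    pens.setdefault((pen, diet), []).append(adg)
pen_means = {key: mean(vals) for key, vals in pens.items()}

# With two pens per diet there is genuine replication (n = 2 per treatment);
# with one pen per diet, as in the example above, there would be none.
print(pen_means)
```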
Replication versus repeated measures
It is of utmost importance to distinguish between a repeated measure and a replication. If a treatment is assigned at random to a particular entity or experimental unit at a given moment and location, then this is a genuine replication. However, if the same animal is measured several times, either at different locations or at different moments in time, while the treatment is assigned to the animal as a whole, then these measurements are repeated measures and not replications. Repeated measures allow a more precise assessment of the response of the particular animal but do not give any additional information on the variability between animals against which the treatment effect has to be tested. The statistical analysis should therefore be based on genuine replications and not on repeated measures; the distinction between repeated measures and replication is essential for a correct statistical analysis (Duchateau, 2009).
Experimental/statistical power in swine experimentation
Most researchers are primarily concerned with Type I error (α), the probability that they will
declare a significant difference when none really exists (reject the null hypothesis when it is true).
By tradition, according to Bedford et al. (2016), the chance of declaring differences to be
significant when they are not, is 1 in 20, or P < 0.05. As opined by the same authors, researchers
should more often be concerned with another type of error; Type II error (β). This error occurs
when something is not declared different when it really is (fail to reject the null hypothesis when
it is false). Answers to typical questions that swine nutritionists ask are more dependent on Type
II error than Type I (Bedford et al., 2016). Questions like "How much of an additive can be added before there is no longer any significant increase in response?" or "How much of an alternative ingredient can be fed before there is no significant decrease in response?" require more powerful experiments to find differences of importance to producers. The convention is to be content with failing to declare something different that really is only one time out of five, i.e. β = 0.20, or a power of 0.80. Unfortunately, if the chance of committing a Type I error (declaring something different when it is not) is decreased by lowering the critical probability value (e.g. from 0.05 to 0.01), the chance of committing a Type II error (not declaring a real difference) is increased for a given sample size, n. To decrease both, the sample size (n) must be increased (Bedford et al., 2016).
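This trade-off can be made concrete with the usual approximate sample-size formula for a two-sided Z test of a mean, n = ((Zα/2 + Zβ)σ/δ)², where δ is the difference to detect and σ the known standard deviation. The function name and the numbers below are illustrative; note how raising the target power from 0.80 to 0.95 increases the required n:

```python
from math import ceil
from statistics import NormalDist

def n_per_group(delta, sigma, alpha=0.05, power=0.80):
    """Approximate n for a two-sided Z test to detect a shift of `delta`
    in a mean with known standard deviation `sigma`.
    Hypothetical helper: n = ((Z_{alpha/2} + Z_beta) * sigma / delta)**2.
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # 1.96 for alpha = 0.05
    z_beta = NormalDist().inv_cdf(power)           # 0.84 for power = 0.80
    return ceil(((z_alpha + z_beta) * sigma / delta) ** 2)

print(n_per_group(delta=5, sigma=10, alpha=0.05, power=0.80))  # 32
print(n_per_group(delta=5, sigma=10, alpha=0.05, power=0.95))  # 52: higher power, larger n
```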
Statistical power can thereby be defined as the probability of rejecting the null hypothesis when the alternative hypothesis is true (Gayla and Yong, year unknown). Factors that affect statistical power include the sample size; the specification of the parameter(s) in the null and alternative hypotheses, i.e. how far apart they are; the precision or uncertainty the researcher allows for the study (generally the confidence or significance level); and the distribution of the parameter to be estimated. For example, if a researcher knows that the statistics in the study follow a Z or standard normal distribution, there are two parameters to estimate: the population mean (μ) and the population variance (σ²). Most of the time, the researcher knows one of the parameters and needs to estimate the other. If that is not the case, some other distribution may be used; for example, if the researcher does not know the population variance, he/she can estimate it using the sample variance, which leads to using a t distribution.
In research, statistical power is generally calculated for two purposes.
1. It can be calculated before data collection, based on information from previous research, to decide the sample size needed for the study.
2. It can also be calculated after data analysis, usually when the result turns out to be non-significant. In this case, statistical power is calculated to verify whether the non-significant result is due to there truly being no effect in the sample or to a lack of statistical power.
Statistical power is positively correlated with sample size: holding the other factors constant, a larger sample size gives greater power. However, researchers must also distinguish between a statistical difference and a scientific difference. Although a larger sample size enables researchers to find smaller differences statistically significant, those differences may not be large enough to be scientifically meaningful. Therefore, it is pertinent to establish what constitutes a scientifically meaningful difference before doing a power analysis to determine the actual sample size needed.
Statistical power calculation in swine experiments
Power calculations can be made during either the planning or the analysis stage of an experiment.
In either stage, essential information includes 1) significance level, 2) size of the difference or
effect to be detected, 3) power to detect the effect, 4) variation in response, and 5) number of
replications or sample size.
In determining the appropriate power, the idea is to have a reasonable chance of detecting the
stated minimum difference. A target power of 80% is common and can be used as a minimal value.
Some statisticians argue for higher powers, such as 85, 90, or even 95%. As power increases,
however, the required number of replications increases. Therefore, it is rare with animal
experiments to set power at values larger than 80% (Aaron and Hays, 2004).
In swine nutrition experiments, guidelines for determining expected differences may be obtained
from previous work. For example, in sow reproduction studies, estimated average litter size is 12
to 14 pigs at birth and 10 to 12 pigs at weaning (personal communication, G. L. Cromwell,
University of Kentucky).
Researchers must thus have some information before they can do the power and sample size
calculation. This information includes previous knowledge about the parameters (their means and
variances) and what confidence or significance level is needed in the study.
Methods employed in performing power analysis
i Manual method
This is synonymous with manual calculation. Consider a researcher who wants to calculate the sample size needed for a study. Suppose the researcher has the null hypothesis that μ = μ0 and the alternative hypothesis that μ = μ1 ≠ μ0, and that the population variance is known to be σ². The researcher also wants to reject the null hypothesis at a significance level of α, which gives a corresponding Z score, call it Zα/2. The power function is then:
P{Z > Zα/2 or Z < −Zα/2 | μ1} = 1 − Φ[Zα/2 − (μ1 − μ0)/(σ/√n)] + Φ[−Zα/2 − (μ1 − μ0)/(σ/√n)]
This expresses power as a function of sample size, given the other known quantities, and the researcher can obtain the corresponding sample size for each power level.
For example, suppose the researcher learns from the literature that the population follows a normal distribution with mean 100 and variance 100 under the null hypothesis, expects the mean to be greater than 105 or less than 95 under the alternative hypothesis, and wants to test at the 5% significance level. Taking μ1 = 105 (by symmetry, μ1 = 95 gives the same power), the resulting power function is:
Power = 1 − Φ[1.96 − (105 − 100)/(10/√n)] + Φ[−1.96 − (105 − 100)/(10/√n)], which is,
Power = 1 − Φ[1.96 − √n/2] + Φ[−1.96 − √n/2]
This function shows the relationship between power and sample size: for each sample size there is a corresponding power. For example, if n = 20, the corresponding power is about 0.61, and a sample size of about 52 is needed to reach a power of 0.95.
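The manual calculation can be checked with a few lines of code. This sketch implements the two-sided Z-test power function for the example values (μ0 = 100, μ1 = 105, σ = 10, α = 0.05), using σ/√n as the standard error of the mean:

```python
from math import sqrt
from statistics import NormalDist

def power(n, mu0=100, mu1=105, sigma=10, alpha=0.05):
    """Power of the two-sided Z test of H0: mu = mu0 when the true mean
    is mu1, with known sigma and standard error sigma / sqrt(n)."""
    Phi = NormalDist().cdf
    z = NormalDist().inv_cdf(1 - alpha / 2)   # 1.96 for alpha = 0.05
    shift = (mu1 - mu0) / (sigma / sqrt(n))   # here: sqrt(n) / 2
    return 1 - Phi(z - shift) + Phi(-z - shift)

print(round(power(20), 2))   # 0.61
print(round(power(52), 2))   # 0.95
```

Evaluating the function over a grid of n values reproduces the power-versus-sample-size relationship described above.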
ii Computer method
On computers, statistical power can be calculated with statistical packages such as SAS. The PowerAndSampleSize.com website, which contains (at last count) 19 interactive calculators for power or required sample size covering many types of statistical tests (testing 1 mean, comparing 2 or more means, testing 1 proportion, comparing 2 or more proportions, testing odds ratios, and two 1-sample tests, normal and binomial-based), can also be used. The site also provides calculators for non-inferiority and equivalence studies, and its web pages display graphs that dynamically show how power varies with one design parameter as you change the others.
The power and sample-size calculators by Russell Lenth (University of Iowa) are also up to this task. They handle tests of means (one or two samples), tests of proportions (one or two samples), linear regression, generic chi-square and Poisson tests, and a wide variety of ANOVAs: 1-, 2-, and 3-way; randomized complete block; Latin and Graeco-Latin squares; 1-stage, 2-stage, and factorial nested designs; crossover; split-plot; split-split-plot; and more.
Conclusion
There are many possible interpretations of experimental designs, but it is the inference from statistical analyses that really matters for researchers. The researcher's goals, and especially the degree of precision deemed necessary, are particularly important when choosing how many animals should be used, whether more than one should be put into each pen, how many pens should be used for each treatment, and so on. In this context, planning the experimental design via a power analysis is vital, as this will help curb unnecessary replication and resource wastage in swine experimentation.
REFERENCES
Aaron D. K. and Hays V. W. 2004. How many pigs? Statistical power considerations in swine nutrition experiments. J. Anim. Sci. 82(E. Suppl.): E245–E254.
Aguilar-Nascimento J.E. 2005. Fundamental steps in experimental design for animal studies. Acta
Cirúrgica Brasileira - Vol 20 (1)
Altman DG, Dore CJ. Randomisation and baseline comparisons in clinical trials. Lancet. 1990;335:149-53.
Das REG. The role of ancillary variables in the design, analysis, and interpretation of animal experiments. ILAR J. 2002;43:214-22.
Dell RB, Holleran S, Ramakrishnan R. Sample size determination. ILAR J. 2002;43:207-13.
Duchateau L. 2009. Design and analysis of animal experiments.
Festing MFW, Altman DG. Guidelines for the design and statistical analysis for experiments using
laboratory animals. ILAR J. 2002;43:244-58.
Festing MFW. Principles: the need for better experimental design. Trends Pharmacol Sci. 2003;
24:341-5.
Hoenig, John M. and Heisey, Dennis M. (2001), “The Abuse of Power: The Pervasive Fallacy of
Power Calculations for Data Analysis,” The American Statistician, 55, 19-24.
Johnson PD, Besselsen DG. Practical aspects of experimental designs in animal research. ILAR J.
2002;43:202-6
Kinnear PR, Gray CD. SPSS for Windows. London: Psychology Press; 1994.
Kuehl, R. O. 2000. Design of Experiments: Statistical Principles of Research Design and Analysis.
Duxbury Press, Pacific Grove, CA. pp. 272-275.
Lenth, R. V. (2001), “Some Practical Guidelines for Effective Sample Size Determination,” The
American Statistician, 55, 187-193.
Lenth, R. V. (2006). Java Applets for Power and Sample Size [Computer software]. Retrieved on
29th April, 2019 from http://www.stat.uiowa.edu/~rlenth/Power.
Pound P, Ebrahim S, Sandercock P, Bracken MB, Roberts I; Reviewing Animal Trials
Systematically (RATS) Group. Where is the evidence that animal research benefits humans? Br
Med J. 2004;328:514-7.
Roberts I, Kwan I, Evans P, Haig S. Does animal experimentation inform human healthcare?
Observations from a systematic review of international animal experiments on fluid resuscitation.
Br Med J. 2002;324:474-6.
Shaw R, Festing MFW, Peers I, Furlong L. The use of factorial designs to optimize animal
experiments and reduce animal use. ILAR J. 2002;43:223-32.