Advanced Biometrics Course on Plant Breeding and Biotechnology

Advanced Biometrics
Techale Birhan (PhD)

Course
• Biometry – Plant Breeding
• Biostatistics – Plant Biotechnology

Course content…
• Introduction to basic principles of AE
• Experimental Design and Field Management
• Multivariate analysis
• Incomplete Block design
• Practical data analysis with R, SAS & Tassel
• Data transformation

What is research?
• Research means an organized and systematic
way of finding solution to a question
• Is a planned inquiry to obtain new facts or to
confirm or deny the results of the previous
experiments

Experimental research or non-experimental
research
• One may simply observe a scenario and
decide based on own subjective judgment or
• Require other tools and methods to assist in
the process of decision making

Development of Quantitative Genetics
• Johannsen (1903) studied seed weight of inbred lines
– Variation among lines was heritable
– Parents from heavy lines gave heavy offspring
– Variation within lines was not heritable; it was environmental
• Nilsson-Ehle (1909) studied kernel color in wheat
– Crossed red lines to white lines
– F1 was red, intermediate between the two parents
– F2 ranged from red to white
– Some crosses segregated 3:1 (red:white) in the F2,
– whereas some segregated 15:1 and some 63:1
– Kernel color is controlled by three genes
– These three genes act additively and independently, which gives a
continuous distribution

Field experimentation is used to obtain
new information or to improve on the results of
previous findings
• It helps to answer questions such as:
– Which fertilizer level gives optimum yield?
– Which insecticide is the most effective?
– Is the improved variety higher yielding than the
local varieties?

Development of Quantitative Genetics cont.
• Fisher (1918) introduced statistics
into Mendelian genetics, where
variance (σ²) was used to measure
differences in a population.
• This analysis involves population,
not an individual.
• Population - Group of individuals
belonging to a certain class.

Development of Quantitative Genetics cont.
• Wright (1932) studied coat color in
guinea pigs and recognized the
importance of gene interaction
• He also considered the effects of inbreeding, non-random mating and
selection on the genetic composition of
a population.
• The two most important contributions of
Wright are the concepts of the inbreeding
coefficient and effective population size.

Steps in experimental methods:
• Define and state the problem
• State objectives
• Develop hypothesis
• Implement the experiment
• Data collection and analysis
• Interpretation of results
• Preparation of complete & precise report

To produce an acceptable result
• The trials must be designed properly
• Data must be collected properly
• Correct analytical method must be used
NB: It is very difficult to compensate at the analysis stage
if the design was wrong from the start

Basic Statistical Terms and Concepts

Categories of genetic studies
1. Qualitative characters
• Characters that can be grouped into
kinds, types or classes.

Statistics…
Statistics
Inferential
(Test hypothesis, make
conclusion )
(Making decision about
population based on
sample)
Descriptive
(Describe characteristics,
organize and summarize)
(mean, mode, median)

3. Quantitative Genetics
• Quantitative genetics is a field of science
involving transmission, inheritance or
heredity of variation among quantitative
traits of individuals, i.e. variation in traits
that can only be differentiated using
measurements.

Characteristics of QT cont.
• Although there are 27 genotypes,
many of them have the same
phenotype and hence there are
only seven phenotypes (0, 1, 2, 3,
4, 5 and 6).
• Therefore there is no strict one-to-one
relationship between
genotype and phenotype.

Experimental Error
In the process of experimentation there are
several sources of errors that may be encountered
at all stages of the work.
• Inaccurate equipment
• Personal bias
• Inadequate replication
• Lack of uniformity in soil fertility
• Topography or drainage
• Damage by rodents, birds, insects & diseases

Experimental Error
Precision
• Precision is the closeness of repeated
measurements
Accuracy
• Accuracy is the closeness of a measured or
computed value to its true value

Hypothesis
• A proposed explanation made on the basis of
limited evidence
• A starting point for further investigation

Hypothesis
Null hypothesis
• Any hypothesis to be tested; it is denoted by H0
• There is no difference between treatments
Alternate hypothesis
• Denoted by H1 or HA
• There is at least one treatment different from the others
Example of a null hypothesis: if one plant is watered with distilled water and
the other with mineral water, there is no
difference in the growth of these two plants

• Many different genotypes can have the same
phenotype. Consider k genes, all
having an equal effect on a trait. If there are two
alleles at each locus and they exhibit co-dominance
(neither allele is dominant), then
there will be a total of 3^k genotypes. For
example, with k = 3 the following genotypes and
phenotypes can be shown, assuming each A, B
and C allele adds one unit to the phenotype:

• Type-I error
• Rejection of the null hypothesis when it is true
• If you get significance and you’re wrong, it’s a
false-positive
• The probability of finding a difference with our
sample compared to population, and there
really isn’t one

• Type-II error
• Acceptance of the null hypothesis when it is false
• If you get non-significance and you’re wrong, it’s
a false negative
• The probability of not finding a difference that
actually exists between our sample compared to
the population.

No. Genotype Phenotype
1. AABBCC 6 units
2. AABBCc 5 units
3. AABBcc 4 units
4. AABbCC 5 units
5. AABbCc 4 units
6. AAbbCc 3 units
7. AAbbCC 4 units
. .
. .
. .
27. aabbcc 0 units

1 gene → 3 genotypes = 3 phenotypes
2 genes → 9 genotypes = 5 phenotypes
3 genes → 27 genotypes = 7 phenotypes
n genes → 3^n genotypes = 2n+1 phenotypes
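The counts above can be checked by brute-force enumeration. The following is a minimal sketch, assuming two alleles per locus with each "capital" allele adding one unit, as in the example; the function name `phenotype_classes` is illustrative only.

```python
from collections import Counter
from itertools import product


def phenotype_classes(n_loci):
    """Count genotypes and phenotype classes for n_loci additive loci."""
    # At each locus a genotype carries 0, 1 or 2 copies of the
    # unit-adding allele (aa, Aa, AA), giving 3**n_loci genotypes.
    genotypes = list(product((0, 1, 2), repeat=n_loci))
    # The phenotype is the total number of unit-adding alleles.
    classes = Counter(sum(g) for g in genotypes)
    return len(genotypes), dict(sorted(classes.items()))


for n in (1, 2, 3):
    n_genotypes, classes = phenotype_classes(n)
    print(n, n_genotypes, len(classes), classes)
```

The dictionary returned for each n shows how many distinct genotypes share each phenotype value, which is why the genotype-to-phenotype mapping is not one-to-one.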

Genotypic & Metric values
• Consider two loci. The A allele gives 4 units while the a
allele gives 2 units. At the other
locus, the B allele contributes 2 units
while the b allele contributes 1 unit. With
two genes controlling a trait, nine different
genotypes are possible. Below are the
genotypes and their associated metric
values:

Genotype   Ratio in F2   Metric value
AABB       1             12
AABb       2             11
AAbb       1             10
AaBB       2             10
AaBb       4             9
Aabb       2             8
aaBB       1             8
aaBb       2             7
aabb       1             6
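The metric values follow directly from summing the allele contributions. A short sketch, using the allele values quoted in the text; the ninth genotype (aabb, ratio 1) is included because the text states that nine genotypes are possible, and the names `ALLELE_VALUE` and `F2_RATIO` are illustrative.

```python
# Allele contributions quoted in the text.
ALLELE_VALUE = {"A": 4, "a": 2, "B": 2, "b": 1}

# F2 ratios out of 16; aabb is the ninth genotype implied by the text.
F2_RATIO = {
    "AABB": 1, "AABb": 2, "AAbb": 1,
    "AaBB": 2, "AaBb": 4, "Aabb": 2,
    "aaBB": 1, "aaBb": 2, "aabb": 1,
}


def metric_value(genotype):
    """Sum the contribution of every allele in the genotype string."""
    return sum(ALLELE_VALUE[allele] for allele in genotype)


for genotype, ratio in F2_RATIO.items():
    print(genotype, ratio, metric_value(genotype))

# Weighted mean of the F2 population.
f2_mean = sum(r * metric_value(g) for g, r in F2_RATIO.items()) / sum(F2_RATIO.values())
print("F2 mean:", f2_mean)
```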

• A factor is a procedure or condition whose effect is to
be measured.
• Treatment
• Is the level or rate of a certain experimental factor
• a treatment may be a standard ration, inoculation,
and a spraying rate/spraying schedule

2. Dominance (allelic-interaction) can
obscure the true genotype effects.
3. Environmental variation and the
interaction of genotype with environment
obscure genetical effects.
4. Epistasis (non-allelic interaction) would
impose limitation to make prediction, for
example, predicted response to
selection.

Organization and Description of
Data

2. Molecular
Molecular genetics on the other hand, deals
with biochemical and molecular mechanisms
by which hereditary information is stored in
DNA (deoxyribonucleic acid) and
subsequently transmitted to proteins.
DNA is the molecule that stores genetic
information within the cell.

• Continuous Vs Discrete variables
• Continuous
– Infinite values in between
– eg. height of students, GPA etc
• Discrete
– separate categories
– eg. letter grade

2. Gene and Genotype Frequencies
Assuming that, in a population of diploid
organisms, the composition of a population, in
terms of gene A and a is as follows:
AA Aa aa Total
Number 2 12 26 40
Proportion 2/40 12/40 26/40
0.05 0.30 0.65 1.0
No. (A) 2(2) = 4 1(12) = 12 0(26) = 0
No. (a) 0(2) = 0 1(12) = 12 2(26) = 52
Total Alleles 4 24 52 80

$$q = \text{Freq. } a = \frac{2 \times \text{No. } aa + 1 \times \text{No. } Aa}{\text{Total No. Alleles}} = \frac{(2 \times 26) + (1 \times 12)}{80} = \frac{52 + 12}{80} = \frac{64}{80} = 0.8$$
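The same allele counting can be written as a small function. This is a sketch using the genotype counts from the table (2 AA, 12 Aa, 26 aa); the function name is illustrative.

```python
def allele_frequencies(n_AA, n_Aa, n_aa):
    """Return (p, q), the frequencies of A and a, from genotype counts."""
    total_alleles = 2 * (n_AA + n_Aa + n_aa)  # diploid: two alleles each
    p = (2 * n_AA + n_Aa) / total_alleles
    q = (2 * n_aa + n_Aa) / total_alleles
    return p, q


p, q = allele_frequencies(2, 12, 26)
print(p, q)
```

Because every allele is either A or a, p + q must equal 1, which is a quick check on the arithmetic.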

Random Mating
• Random mating occurs when every individual in
the population has the same probability (chance)
to mate with every other individual in the
population.
• Random mating is also called panmixia, while
the population involved is called a panmictic
population.
• Panmixia usually occurs only in large populations, with hundreds or
thousands of individuals.

Measure of central tendency
• The three most common measures of central tendency are:
– Mean
– Median
– Mode

Mean
• Mean is the arithmetic average of the values.
• To calculate the mean, all measurements are added and
the sum is divided by the number of observations.

Median
• Is the value that exactly separates the upper half of the
distribution from the lower half.
• Median is the point located in such a way that 50% of the
scores are lower than the median and the other 50% are
greater than the median.

Mode
• Mode is the most frequent value.
• It is categorized as a measure of central tendency,
because a glance at a graph of the frequency distribution
shows the grouping about a central point
• Mode is the highest point in the hump or it is the most
frequent score.

Measure of dispersion
• Range
• Standard deviation
• Variance
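The measures of central tendency and dispersion listed above can be computed with Python's standard `statistics` module. The plant heights below are made-up values used only for illustration.

```python
import statistics

# Hypothetical plant heights (cm); not data from the course.
heights = [12, 15, 15, 18, 20]

mean = statistics.mean(heights)
median = statistics.median(heights)
mode = statistics.mode(heights)

value_range = max(heights) - min(heights)
variance = statistics.variance(heights)  # sample variance (divisor n - 1)
std_dev = statistics.stdev(heights)      # sample standard deviation

print(mean, median, mode, value_range, variance, std_dev)
```

The sample variance and standard deviation are used here; `statistics.pvariance` and `statistics.pstdev` give the population versions.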

Methods of Data Collection
• Observation
• Interview
• Questionnaire

Importance of Quantitative Genetics
1. Most economically important traits are
quantitative traits. These include products of:
• Crops
• Livestock
• Micro-organisms
2. These traits are controlled by many
genes and are greatly influenced by
environmental factors. Therefore, it is
important to know how much (what percentage)
of the variation is heritable and how much
is not. This information is important when choosing
traits in a breeding and selection program.
3. Important in evolution studies.
4. Important in population studies.

• Sampling techniques
– Probability (Random) Sampling
– Non-probability (Non-random) Sampling

• Probability (Random) Sampling
– Simple random sampling
– Systematic sampling
– Stratified sampling
– Clustered sampling
– Multistage random sampling
– Stratified multistage random sampling

• Non-probability (Non-random) Sampling
– Quota sampling
– Purposive sampling
– Convenience sampling

Sampling methods
•Probability Sample
• Every unit in the population has a chance (greater
than zero) of being selected in the sample
• Probability samples are the best to ensure
representativeness and precision

Simple random sampling
• Applicable when population is small, homogeneous
& readily available
• This is done by assigning a number to each unit in
the sampling frame.
• A table of random number or lottery system is used
to determine which units are to be selected.

• Systematic sampling
• Relies on arranging the target population according to
some ordering scheme and then selecting elements at
regular intervals through that ordered list.
• Involves a random start and then proceeds with the
selection of every kth element from then onwards.
• A simple example would be to select every 10th name
from the telephone directory

• Stratified sampling
• Where population embraces a number of distinct
categories, the frame can be organized into separate
"strata".
• Each stratum is then sampled as an independent
sub- population, out of which individual elements
can be randomly selected.
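The three probability schemes described so far (simple random, systematic and stratified sampling) can be sketched as follows. The sampling frame, stratum names and function names are hypothetical, and a seeded generator is used only to make the example repeatable.

```python
import random


def simple_random_sample(frame, n, rng):
    """Draw n units without replacement, each unit equally likely."""
    return rng.sample(frame, n)


def systematic_sample(frame, k, rng):
    """Random start within the first k units, then every k-th unit."""
    start = rng.randrange(k)
    return frame[start::k]


def stratified_sample(strata, n_per_stratum, rng):
    """Sample each stratum independently as its own sub-population."""
    return {name: rng.sample(units, n_per_stratum)
            for name, units in strata.items()}


rng = random.Random(42)            # fixed seed for a repeatable example
frame = list(range(1, 101))        # hypothetical frame: units numbered 1..100
strata = {"lowland": frame[:50], "highland": frame[50:]}

srs = simple_random_sample(frame, 10, rng)
systematic = systematic_sample(frame, 10, rng)
stratified = stratified_sample(strata, 5, rng)
print(srs, systematic, stratified, sep="\n")
```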

• Cluster sampling
• An example of 'two-stage sampling'
• First stage a sample of areas is chosen;
• Second stage a sample of respondents within those
areas is selected.
• Population divided into clusters of homogeneous units,
usually based on geographical contiguity
• The most common variables used in the clustering
population are the geographical area, buildings, school,
etc

• Non- probability samples
– Probability of being chosen is unknown; cheaper, but
unable to generalize; potential for bias
• Convenience samples (ease of access)
– Sample is selected from elements of a population that
are easily accessible

• Purposive sampling (judgemental)
• You choose who you think should be in the study
• This is used primarily when there is a limited
number of people that have expertise in the area
being researched

• Quota sample
• The selection is non-random
• For example, interviewers might be tempted to
interview those people in the street who look most
helpful, or may choose to use accidental sampling
to question those closest to them, to save time.

With random mating, the frequencies of matings between genotypes are:
              AA (P)   Aa (H)   aa (Q)
AA (P)        P^2      PH       PQ
Aa (H)        PH       H^2      HQ
aa (Q)        PQ       HQ       Q^2

As a result of panmixia, progenies with the
following proportions are obtained:
Mating     Frequency   Progeny genotype frequency
                       AA              Aa                       aa
AA x AA    P^2         P^2             -                        -
AA x Aa    2PH         PH              PH                       -
AA x aa    2PQ         -               2PQ                      -
Aa x Aa    H^2         1/4 H^2         1/2 H^2                  1/4 H^2
Aa x aa    2HQ         -               HQ                       HQ
aa x aa    Q^2         -               -                        Q^2
Total      1           (P + 1/2H)^2    2(P + 1/2H)(Q + 1/2H)    (Q + 1/2H)^2
                       = p^2           = 2pq                    = q^2

In random mating, the union of gametes is also random. Therefore, in a
population with genotypes AA, Aa, aa, gamete frequencies are A = p
and a = q, and fusion between the two gametes will produce:
                 Male A (p)   Male a (q)
Female A (p)     AA  p^2      Aa  pq
Female a (q)     Aa  pq       aa  q^2
i.e.
Genotype    AA    Aa    aa
Frequency   p^2   2pq   q^2

The formation of this new progeny population shows that the
composition of the succeeding generation depends on the gene
frequencies of the initial population.
Parental genotypes AA, Aa and aa produce gametes A (p) and a (q), which unite at random:
            A (p)   a (q)
A (p)       p^2     pq
a (q)       pq      q^2

The gene frequencies in this population are:
$$p = P + \tfrac{1}{2}H = p^2 + \tfrac{1}{2}(2pq) = p^2 + pq = p(p + q) = p$$
$$q = Q + \tfrac{1}{2}H = q^2 + \tfrac{1}{2}(2pq) = q^2 + pq = q(p + q) = q$$
• This shows that, in a panmictic population, gene and
genotype frequencies remain constant.

Hardy-Weinberg Law of Equilibrium
• In a large and panmictic population,
considering one locus (unlinked gene), in
the absence of migration, mutation and
selection, gene and genotype frequencies
in the population remain constant from one
generation to another.

The relationship between gene and genotype
frequencies in the population in Hardy-Weinberg
equilibrium is:
Gene          Genotype
A     a       AA     Aa     aa
p     q       p^2    2pq    q^2
1     0       1      0      0
0.8   0.2     0.64   0.32   0.04
0.5   0.5     0.25   0.5    0.25
0.2   0.8     0.04   0.32   0.64
0     1       0      0      1
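A quick numerical check of the equilibrium: starting from genotype frequencies that are not in Hardy-Weinberg proportions, one round of random mating gives p^2, 2pq, q^2, and a second round leaves them unchanged. This is a sketch; the starting frequencies (0.05, 0.30, 0.65) are the proportions from the earlier gene-frequency table, and the function name is illustrative.

```python
def random_mating(P, H, Q):
    """Genotype frequencies (AA, Aa, aa) after one round of random mating."""
    p = P + H / 2  # frequency of A
    q = Q + H / 2  # frequency of a
    return p * p, 2 * p * q, q * q


generation_1 = random_mating(0.05, 0.30, 0.65)
generation_2 = random_mating(*generation_1)
print(generation_1)
print(generation_2)
```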

For the Hardy-Weinberg Law of Equilibrium
to hold, four stages are involved, each with its
conditions:
1. Gene frequency of parents
   Conditions: normal gene segregation; normal parents/gametes;
   random mating of gametes (large population)
2. Zygote genotype frequency
   Condition: equal viability
3. Progeny genotype frequency
4. Progeny gene frequency

Multiple Alleles
• In some situations, there are more than two alleles at a locus. In
this case, the population will reach equilibrium after one generation
of random mating. This can be shown either by
- random mating of gametes, or
- random mating of genotypes
• Assume three alleles at one locus: A, a' and a
            Gene              Genotype
            A    a'   a       AA    Aa'   Aa    a'a'   a'a   aa
Frequency   p    q    r       p^2   2pq   2pr   q^2    2qr   r^2

The proof, after random mating of
gametes:
            A (p)      a' (q)      a (r)
A (p)       AA p^2     Aa' pq      Aa pr
a' (q)      Aa' pq     a'a' q^2    a'a qr
a (r)       Aa pr      a'a qr      aa r^2
Inference:
Genotype    AA    Aa'   Aa    a'a'   a'a   aa
Frequency   p^2   2pq   2pr   q^2    2qr   r^2
Symbol      P     Q     R     S      T     U

After random mating of gametes:
$$p_A = \frac{2P + Q + R}{2(P + Q + R + S + T + U)} = \frac{2P + Q + R}{2} = P + \tfrac{1}{2}Q + \tfrac{1}{2}R$$
$$= p^2 + \tfrac{1}{2}(2pq) + \tfrac{1}{2}(2pr) = p^2 + pq + pr = p(p + q + r) = p$$

After random mating of gametes (cont.):
$$q_{a'} = S + \tfrac{1}{2}Q + \tfrac{1}{2}T = q^2 + \tfrac{1}{2}(2pq) + \tfrac{1}{2}(2qr) = q^2 + pq + qr = q(q + p + r) = q$$
$$r_a = U + \tfrac{1}{2}R + \tfrac{1}{2}T = r^2 + \tfrac{1}{2}(2pr) + \tfrac{1}{2}(2qr) = r^2 + pr + qr = r(r + p + q) = r$$

Multiple Alleles
• However, sometimes the genotypes
cannot all be told apart by phenotype, for example,
Genotype      Aa'    AA, Aa       a'a', a'a    aa
Blood group   AB     A            B            O
Frequency     2pq    p^2 + 2pr    q^2 + 2qr    r^2
The easiest way to calculate the gene frequencies is by the
reverse method, as follows:
$$r_a = \sqrt{r^2} = \sqrt{O}$$
p_A = ?

The reverse method, cont.
$$B + O = q^2 + 2qr + r^2 = (q + r)^2$$
but q + r = 1 - p, therefore
$$(1 - p)^2 = B + O, \qquad 1 - p = \sqrt{B + O}, \qquad p = 1 - \sqrt{B + O}$$
q_a' = ?
$$A + O = p^2 + 2pr + r^2 = (p + r)^2 = (1 - q)^2$$
$$\sqrt{A + O} = 1 - q, \qquad q = 1 - \sqrt{A + O}$$
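The reverse method translates directly into code. The sketch below assumes the population is in equilibrium and takes the phenotype frequencies of groups A, B and O as inputs (the AB frequency is not needed). The phenotype frequencies used are hypothetical, constructed from p = 0.3, q = 0.1, r = 0.6 so that the result can be verified.

```python
from math import sqrt


def abo_gene_frequencies(freq_A, freq_B, freq_O):
    """Estimate (p, q, r) from blood group frequencies by the reverse method."""
    r = sqrt(freq_O)               # O = r^2
    p = 1 - sqrt(freq_B + freq_O)  # B + O = (q + r)^2 = (1 - p)^2
    q = 1 - sqrt(freq_A + freq_O)  # A + O = (p + r)^2 = (1 - q)^2
    return p, q, r


# Hypothetical phenotype frequencies built from p = 0.3, q = 0.1, r = 0.6:
# A = p^2 + 2pr = 0.45, B = q^2 + 2qr = 0.13, O = r^2 = 0.36
p, q, r = abo_gene_frequencies(0.45, 0.13, 0.36)
print(p, q, r)
```

With real survey data the three estimates rarely sum exactly to 1, so the sum p + q + r is a useful check on how well the equilibrium assumption fits.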

Factors affecting Equilibrium
1. Sex Linkage
• Some genes are located on the sex chromosomes,
i.e. these genes are inherited together with sex.
There are two combinations of sex
chromosomes: homogametic (XX - female) and
heterogametic (XY or XO - male). Therefore, there are
more possible genotypes.

Sex Linkage
For one locus, A/a, the possible genotypes
are:
Male (XY)            Female (XX)
A       a            AA        Aa        aa
X^A Y   X^a Y        X^A X^A   X^A X^a   X^a X^a

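Sex Linkage
• Assuming that the gene frequencies in the
female and male populations are equal,
A = p, a = q, the panmictic population will
reach equilibrium.
Mating frequencies (female genotype x male genotype):
                Female AA (p^2)   Female Aa (2pq)   Female aa (q^2)
Male A (p)      p^3               2p^2q             pq^2
Male a (q)      p^2q              2pq^2             q^3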

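Progenies
Mating            Freq      Female progeny             Male progeny
(female x male)             AA      Aa      aa         A       a
AA x A            p^3       p^3     -       -          p^3     -
Aa x A            2p^2q     p^2q    p^2q    -          p^2q    p^2q
aa x A            pq^2      -       pq^2    -          -       pq^2
AA x a            p^2q      -       p^2q    -          p^2q    -
Aa x a            2pq^2     -       pq^2    pq^2       pq^2    pq^2
aa x a            q^3       -       -       q^3        -       q^3
Totals:
Female AA = p^3 + p^2q = p^2(p + q) = p^2
Female Aa = 2pq^2 + 2p^2q = 2pq(q + p) = 2pq
Female aa = pq^2 + q^3 = q^2(p + q) = q^2
Male A = p^3 + 2p^2q + pq^2 = p(p^2 + 2pq + q^2) = p
Male a = q^3 + 2pq^2 + p^2q = q(q^2 + 2pq + p^2) = q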

Sex Linkage
Equilibrium will only be reached if the gene
frequencies in the males and females are the
same, i.e.,
pf = pm
Example:
Let pf = pm = 0.4; qf = qm = 0.6,
Male           Female
A     a        AA     Aa     aa
0.4   0.6      0.16   0.48   0.36

Sex Linkage
• If the gene frequencies in the males and females
are not equal, equilibrium will not be reached
after one generation of panmixia. This is shown
below:
Female Male
AA Aa aa A a
P H Q R S
pf = P + 1/2H pm = R
p = 1/3 pm + 2/3 pf

Sex Linkage
• After panmixia, male progenies receive these genes from the
female parents only, while female progenies receive half of their genes
from the female parents and the other half from the male parents. Writing
pf' and pm' for the frequencies in the parental generation, the
gene frequencies after one generation of panmixia are:
pm = pf'
pf = 1/2 (pf' + pm')
pf - pm = 1/2 (pf' + pm') - pf'
= -1/2pf' + 1/2pm'
= -1/2(pf' - pm')
i.e.;
1. the difference in gene frequencies between the males and females
is halved after every generation of panmixia,
2. the direction of the difference reverses every generation.

Example:
Initial population:
Male Female
A a AA Aa aa
0.2 0.8 0.2 0.6 0.2
pm = 0.2 pf = 0.2 + 1/2 (0.6)
= 0.5
pm = pf'
pf = 1/2(pf' + pm')
p = 1/3(0.2) + 2/3(0.5)
= 0.4

Generation pm pf pf - pm
________________________________________________________________
0 0.2 0.5 +0.3
1 0.5 0.35 -0.15
2 0.35 0.425 +0.075
3 0.425 0.3875 -0.0375
4 0.3875 0.40625 +0.01875
5 0.40625 0.396875 -0.009375
6 0.396875 0.4015625 +0.0046875
.
.
.
n 0.40000 0.40000 0.00000
____________________________________________________________
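The table can be reproduced by iterating the two recurrences pm = pf' and pf = 1/2 (pf' + pm'). A minimal sketch, starting from the example values pm = 0.2 and pf = 0.5; the function name is illustrative.

```python
def sex_linked_generations(pm, pf, n_generations):
    """Track male and female gene frequencies over generations of panmixia."""
    rows = [(0, pm, pf, pf - pm)]
    for generation in range(1, n_generations + 1):
        # Males take their X from the mothers; females average both parents.
        pm, pf = pf, (pf + pm) / 2
        rows.append((generation, pm, pf, pf - pm))
    return rows


rows = sex_linked_generations(0.2, 0.5, 6)
for generation, pm, pf, diff in rows:
    # The weighted mean (1/3 pm + 2/3 pf) stays constant at 0.4.
    print(generation, pm, pf, diff, pm / 3 + 2 * pf / 3)
```

Each generation the difference pf - pm is halved and changes sign, so both frequencies oscillate towards the overall value p = 0.4.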

[Figure: pm and pf plotted against generation (0 to n); frequency axis from 0 to 0.6]

2. Two (or more) Linked Loci
• Equilibrium in the population is reached after one
generation of random mating if all loci are considered
separately.
• Equilibrium is not reached if the loci are considered
together. The rate in achieving equilibrium will be slower
if the loci are more tightly linked.
Assuming 2 loci A/a and B/b, with the gene frequency of:
A a B b
p q r s

At equilibrium, the genotype frequencies are:
AABB      AABb     AAbb      AaBB     AaBb    Aabb     aaBB      aaBb     aabb
p^2r^2    2p^2rs   p^2s^2    2pqr^2   4pqrs   2pqs^2   q^2r^2    2q^2rs   q^2s^2
Whether equilibrium is reached depends on the gamete frequencies:
Gamete:      AB   Ab   aB   ab
Frequency:   pr   ps   qr   qs

2. Two (or more) Linked Loci
• Equilibrium will be reached after one generation of random mating if
all the gene frequencies are the same, i.e. p=q=r=s=0.5; or
pr=ps=qr=qs=0.25.
• At equilibrium, it is expected that the product of the frequencies of the
coupling phase gametes equals the product of the frequencies of the
repulsion phase gametes.
[Diagram: a pair of homologous chromosomes, one carrying A and B, the other a and b, with a crossover (X) between the two loci]
AB, ab = coupling phase gametes
Ab, aB = repulsion phase gametes

At equilibrium,
(AB)(ab) = (Ab)(aB)
for example: A = B = 0.6, a = b = 0.4;
AB Ab aB ab
0.36 0.24 0.24 0.16
(0.36 x 0.16) = (0.24 x 0.24)
0.0576 = 0.0576
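The equilibrium condition (AB)(ab) = (Ab)(aB) can be checked numerically. In the sketch below the difference between the two products is called `imbalance`; that name, and the second set of gamete frequencies, are illustrative assumptions rather than values from the course.

```python
from math import isclose


def gamete_imbalance(AB, Ab, aB, ab):
    """Coupling product minus repulsion product; zero at equilibrium."""
    return AB * ab - Ab * aB


# Example from the text: A = B = 0.6, a = b = 0.4.
at_equilibrium = gamete_imbalance(0.36, 0.24, 0.24, 0.16)

# Hypothetical population with an excess of coupling gametes.
not_at_equilibrium = gamete_imbalance(0.46, 0.14, 0.14, 0.26)

print(isclose(at_equilibrium, 0.0, abs_tol=1e-12), not_at_equilibrium)
```

Both sets of gametes give the same gene frequencies (A = B = 0.6), which shows that equal gene frequencies alone do not guarantee equilibrium when two loci are considered together.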

3. Changes in Gene Frequencies in
Populations
• According to Hardy-Weinberg Law of Equilibrium,
considering only one locus (gene), a population will
be at equilibrium after one generation of random
mating, in the absence of migration, mutation and
selection.

Migration
Let, in a large population:
m = proportion of new immigrants
1-m = proportion of natives.
Let the frequency of a certain gene among the
immigrants = q_m and among the natives = q_0. Then, the
gene frequency in the combined population is:
$$q_1 = m q_m + (1 - m) q_0 = m(q_m - q_0) + q_0$$

Change in gene frequency as a
result of immigration:
$$\Delta q = q_1 - q_0 = m(q_m - q_0)$$
• It can therefore be concluded that the
change in gene frequency in the new
population depends on:
– migration rate, and
– the difference in gene frequencies between
the immigrants and the natives.
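A small numerical illustration of the migration formulas. The migration rate and gene frequencies used here are hypothetical values chosen for the example.

```python
def frequency_after_migration(m, q_migrants, q_natives):
    """Gene frequency in the mixed population: q1 = m*qm + (1 - m)*q0."""
    return m * q_migrants + (1 - m) * q_natives


# Hypothetical values: 10% immigrants with q = 0.8, natives with q = 0.3.
m, q_m, q_0 = 0.10, 0.80, 0.30
q_1 = frequency_after_migration(m, q_m, q_0)
delta_q = q_1 - q_0

print(q_1, delta_q, m * (q_m - q_0))
```

The change in frequency equals m(qm - q0), so it is zero either when there is no migration or when immigrants and natives have the same gene frequency.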

Mutation
• Mutation is the sudden change of a gene
(allele) in a population to a different form.
The effect on the population depends on
the kinds of mutation.
2 kinds of mutation:

a. Non-Recurrent mutation
AA → Aa
• This kind involves only a small change in a
large population. It is not important and not
effective, because its product has only a small
chance of surviving in a large population.
It is normally lost and does not show changes in the
succeeding generation, as it is usually in the
form of a heterozygote.

b. Recurrent Mutation
This kind affects the gene frequency. It
recurs, with a certain frequency of
occurrence in the population.
i. Unidirectional mutation
A → a
Let the mutation rate per generation = μ (μ = rate of
gene A changing to a per generation).
If the frequency of A in a population = p_0,
the frequency of new a genes in the next generation = μp_0.

At equilibrium, with reverse mutation (a → A) at rate ν per generation,
$$\mu p_0 = \nu q_0, \quad \text{or } \Delta q = 0$$
$$q_0 = \frac{\mu p_0}{\nu} = \frac{\mu(1 - q_0)}{\nu} = \frac{\mu - \mu q_0}{\nu}$$
$$\nu q_0 = \mu - \mu q_0$$
$$q_0(\mu + \nu) = \mu$$
$$q_0 = \frac{\mu}{\mu + \nu}$$
$$\therefore\ q = \frac{\mu}{\mu + \nu}$$
(not influenced by the initial gene frequency, but influenced by the rates of
mutation).

The effect of mutation on gene frequency:
1. Normally low; 10^-5 to 10^-6 per generation (1 in 100,000 or
1,000,000 gametes carries a new allele mutated at any one locus)
2. Mutations are more frequent from the wild type to mutant type,
rather than the reverse.
Example:
μ = 0.00003, ν = 0.00002. Gene frequency at equilibrium:
$$q = \frac{0.00003}{0.00003 + 0.00002} = 0.6$$
[Figure: gene frequency scale from 0 to 0.9, with arrows for μ = 0.00003 and ν = 0.00002]

Number of generations needed for a certain frequency to be reached:
$$n = \frac{\ln\left(\dfrac{q_0 - q}{q_n - q}\right)}{\mu + \nu}$$
Example:
μ = 0.00003
ν = 0.00002
q = 0.6,
q_0 = 0.10,
q_n = 0.20
$$n = \frac{\ln\left(\dfrac{0.1 - 0.6}{0.2 - 0.6}\right)}{0.00003 + 0.00002} = \frac{\ln 1.25}{0.00005} \approx 4463 \text{ generations}$$
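The equilibrium frequency and the number of generations can be computed as below. This is a sketch in which mu denotes the rate A → a and nu the reverse rate a → A, matching the worked example; the function names are illustrative.

```python
from math import log


def equilibrium_q(mu, nu):
    """Equilibrium frequency of a under two-way recurrent mutation."""
    return mu / (mu + nu)


def generations_needed(q_start, q_target, mu, nu):
    """Generations for q to move from q_start to q_target by mutation alone."""
    q_hat = equilibrium_q(mu, nu)
    return log((q_start - q_hat) / (q_target - q_hat)) / (mu + nu)


mu, nu = 0.00003, 0.00002
q_hat = equilibrium_q(mu, nu)
n = generations_needed(0.10, 0.20, mu, nu)
print(q_hat, round(n))
```

The large number of generations shows how slowly mutation alone changes gene frequency.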

Two factors determine fitness:
a. Long life span.
b. Number of offspring produced within a period.
These two factors lead to a higher contribution to the
succeeding generation. If the difference in fitness is
associated with the presence or absence of a gene in the
genotype of an individual, selection is said to act
on the gene.
• The gene frequency in the offspring will not be the
same as it was in the parents, because individuals in the
parental generation contribute genes to the next
generation at different rates.
• Selection results in changes in gene frequencies,
and hence genotype frequencies.

Kinds of Selection
• The kinds of selection consider degree/rate of
dominance for the gene involved.
1. Selection Against Recessives
Selection depends on the degree of dominance of the gene
involved.
s = coefficient of selection;
1 = fitness : contribution of the favoured genotype;
1-s : contribution of the genotypes selected against.

Degree of dominance vs. fitness:
a. No dominance (selection against a)
Genotype   aa     Aa        AA
Fitness    1-s    1-1/2s    1
b. Partial dominance (selection against a; h = degree of dominance)
Genotype   aa     Aa      AA
Fitness    1-s    1-hs    1

c. Complete dominance (selection against a)
Genotype   aa     Aa; AA
Fitness    1-s    1
d. Over dominance (selection against the homozygotes AA and aa)
Genotype   aa      AA      Aa
Fitness    1-s2    1-s1    1

Selection Against Recessives - Complete
Dominance
(Partial Elimination of Recessives)
                       AA     Aa     aa          Total
Initial freq.          p^2    2pq    q^2         1
Sel. coef.             0      0      s
Fitness                1      1      1-s
Gametic contribution   p^2    2pq    q^2(1-s)    1-sq^2

q1 = freq. of gene 'a' in the following generation (writing q for q0)
$$q_1 = \frac{q^2(1-s) + pq}{1 - sq^2}$$
$$\Delta q = q_1 - q_0 = \frac{pq + (1-s)q^2}{1 - sq^2} - q = \frac{pq + (1-s)q^2 - q(1 - sq^2)}{1 - sq^2}$$
$$= \frac{pq + q^2 - sq^2 - q + sq^3}{1 - sq^2} = \frac{q - q^2 + q^2 - sq^2 - q + sq^3}{1 - sq^2}$$
$$\Delta q = \frac{-sq^2(1 - q)}{1 - sq^2}$$
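The recurrence for q1 and the closed form for Δq can be cross-checked numerically. A minimal sketch; the values q = 0.4 and s = 0.2 are hypothetical and the function names are illustrative.

```python
def next_q(q, s):
    """Frequency of a after one generation of selection against aa."""
    p = 1 - q
    return (q * q * (1 - s) + p * q) / (1 - s * q * q)


def delta_q(q, s):
    """Closed-form change in q: -s q^2 (1 - q) / (1 - s q^2)."""
    return -s * q * q * (1 - q) / (1 - s * q * q)


q, s = 0.4, 0.2  # hypothetical starting frequency and selection coefficient
print(next_q(q, s) - q, delta_q(q, s))

# With s = 1 (all recessives eliminated) the recurrence reduces to q / (1 + q).
print(next_q(0.2, 1.0), 0.2 / 1.2)
```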

Partial Elimination of Recessives
• Determining factors:
1. initial gene freq.
2. selection coefficient.

2. Complete Elimination of Recessives
                       AA     Aa     aa
Initial freq.          p^2    2pq    q^2
Fitness                1      1      0
Gametic contribution   p^2    2pq    0
$$q_1 = \frac{q}{1 + q}$$

$$\Delta q = q_1 - q_0 = \frac{q}{1 + q} - q = \frac{-q^2}{1 + q}$$
Δq depends on the initial gene frequency.
The frequency decreases at a higher rate if the initial frequency is high.
The frequency decreases at a lower rate as the gene frequency is
gradually reduced.

$$q_2 = \frac{q_1}{1 + q_1} = \frac{q/(1+q)}{1 + q/(1+q)} = \frac{\dfrac{q}{1+q}}{\dfrac{1+q+q}{1+q}} = \frac{q}{1+q} \times \frac{1+q}{1+2q}$$
$$q_2 = \frac{q}{1 + 2q}, \qquad q_3 = \frac{q}{1 + 3q}, \ \ldots$$

$$q_n = \frac{q}{1 + nq}$$
$$q_n(1 + nq) = q$$
$$q_n + n q_n q = q$$
$$n = \frac{q - q_n}{q\,q_n} = \frac{1}{q_n} - \frac{1}{q}$$

Example:
q0 = 0.2
qn = 0.1
How many generations are required to reduce the frequency of
the recessive gene from q0 to qn through selection by
elimination of all recessives?
$$n = \frac{1}{0.1} - \frac{1}{0.2} = 5 \text{ generations}$$
If q0 = 0.02, qn = 0.01,
$$n = \frac{1}{0.01} - \frac{1}{0.02} = 50 \text{ generations}$$
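Both examples can be verified with the formula n = 1/qn - 1/q0, and the first can also be confirmed by applying q → q/(1 + q) five times. A short sketch; the function name is illustrative.

```python
def generations_to_reduce(q_start, q_target):
    """Generations of complete elimination of recessives: n = 1/qn - 1/q0."""
    return 1 / q_target - 1 / q_start


print(generations_to_reduce(0.2, 0.1))    # first example
print(generations_to_reduce(0.02, 0.01))  # second example

# Cross-check the first example by iterating the recurrence directly.
q = 0.2
for _ in range(5):
    q = q / (1 + q)
print(q)
```

Halving a rare allele's frequency takes ten times longer in the second example, because most copies of a rare recessive are hidden in heterozygotes.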

Selection Against Heterozygotes
                       AA     Aa          aa     Total
Initial freq.          p^2    2pq         q^2    1
Fitness                1      1-s         1      -
Gametic contribution   p^2    2(1-s)pq    q^2    1-2pqs
$$q_1 = \frac{pq(1 - s) + q^2}{1 - 2pqs} = \frac{q - spq}{1 - 2pqs}$$

$$\Delta q = q_1 - q = \frac{q - spq}{1 - 2pqs} - q = \frac{q - spq - q + 2pq^2 s}{1 - 2pqs} = \frac{spq(2q - 1)}{1 - 2pqs}$$
If s is very small, 2pqs approaches 0 and
$$\Delta q \approx spq(2q - 1) = 2spq\left(q - \tfrac{1}{2}\right)$$

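If q = 1/2, Δq = 0;
If q > 1/2, Δq is positive, and q increases
every generation.
If q < 1/2, Δq is negative, and q decreases
every generation.
[Figure: Δq plotted against q from 0 to 1.0; Δq is negative below q = 1/2, zero at q = 1/2 and positive above it]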

Selection For Heterozygotes
• The normal case in natural situations
• Both alleles will be maintained in the population
and will not be lost.
• With random mating, equilibrium is reached.
                       AA           Aa     aa           Total
Initial freq.          p^2          2pq    q^2          1
Fitness                1-s1         1      1-s2         -
Gametic contribution   p^2(1-s1)    2pq    q^2(1-s2)    1-s1p^2-s2q^2

$$q_1 = \frac{pq + q^2(1 - s_2)}{1 - s_1p^2 - s_2q^2} = \frac{q - s_2q^2}{1 - s_1p^2 - s_2q^2} \qquad (p + q = 1)$$
$$\Delta q = \frac{q - s_2q^2}{1 - s_1p^2 - s_2q^2} - q = \frac{q - s_2q^2 - q + s_1p^2q + s_2q^3}{1 - s_1p^2 - s_2q^2}$$
When s1 and s2 are small, the denominator is close to 1, so
$$\Delta q \approx s_1p^2q - s_2q^2(1 - q) = s_1p^2q - s_2pq^2 = pq(s_1p - s_2q)$$

Selection For Heterozygotes
• If Δq = 0,
$$s_1p = s_2q$$
$$s_1(1 - q) = s_2q$$
$$s_1 - s_1q = s_2q$$
$$q \text{ (at equilibrium)} = \frac{s_1}{s_1 + s_2}$$
This equilibrium is a balanced polymorphism.
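The balanced polymorphism can be illustrated by iterating the exact recurrence for q1 from two different starting frequencies and comparing the result with s1/(s1 + s2). The selection coefficients below are hypothetical values chosen for the example.

```python
def next_q_overdominance(q, s1, s2):
    """One generation of selection favouring the heterozygote.

    Fitnesses: AA = 1 - s1, Aa = 1, aa = 1 - s2.
    """
    p = 1 - q
    return (p * q + q * q * (1 - s2)) / (1 - s1 * p * p - s2 * q * q)


s1, s2 = 0.1, 0.3           # hypothetical selection coefficients
q_equilibrium = s1 / (s1 + s2)

final_values = []
for q in (0.9, 0.05):       # start above and below the equilibrium
    for _ in range(2000):
        q = next_q_overdominance(q, s1, s2)
    final_values.append(q)

print(q_equilibrium, final_values)
```

Both starting points converge to the same value, which is why neither allele is lost under selection for heterozygotes.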

Conclusion
1. In natural selection, if selection is against the
heterozygote (H), then q will increase or decrease
depending on q and s, and q will remain
constant only at q = ½
2. In selection for H, no gene will be lost or
eliminated, and the rate of change in gene frequency
depends on the initial gene frequency and
the selection coefficients
3. For selection against recessives, the frequency of the recessive
falls quickly if the initial frequency is
high, and slowly if it is low.

4. SMALL POPULATION SIZE
• Introduction
– In previous lectures, we discussed agents
of change in gene and genotype frequencies
where the population size is large, i.e. in the
absence of migration, mutation or selection,
gene and genotype frequencies remain
constant from one generation to another, in a
random mating situation, in a large population
(systematic process).

SMALL POPULATION SIZE
• These features are not true in small populations.
The gene frequencies are exposed to random
increase and decrease which occur from gamete
sampling, because small populations can be
considered as samples of large populations. If
the sample size is not large enough, it will not
represent the large population, and thus
changes of gene frequencies occur. The
process of change in gene frequencies at
random in a small population is called a
dispersive process.

Prevailing Situations in a Dispersive Process:
1. Random Drift (Wright's Effect)
- Changes in gene frequencies at random.
- Frequency changes irregularly from one generation
to another, and normally does not return to its initial value.
2. Differentiation among sub-populations
Drifts occur independently within the small populations which are
contained in the large population. Matings are only confined within
the sub-populations. No random mixing of the large population.

Prevailing Situations in a Dispersive Process
[Figure: a large population containing small sub-populations]

Prevailing Situations in a Dispersive Process
3. Uniformity in small populations
– Genetic variations within small populations become
small.
– Because of inbreeding, etc., many unfavourable effects
are seen.
4. Homozygosity increases among individuals
within small population.
- many unfavourable effects to population.
- fertility
- viability, etc.

Incomplete Block Designs
• Large number of treatments to be tested
• It is difficult to get uniform blocks large enough to
accommodate a complete replication of all the
treatments
• Precision increases as the block size decreases
• Smaller blocks are preferred to larger ones

Idealised Population
• A large population where mating is at random,
and population then sub-divided into many sub-
populations. This is due to geographical or
ecological factors (natural), or controlled mating
(laboratory or controlled environment).
• Initial population, which undergoes random
mating is called base population, and the sub-
populations called lines.

Lines
Base populations
• Characteristics of lines can be combined to form
the characteristics of the base population.

Balanced Incomplete Block Design (BIBD)
• Every pair of treatments occurs once in the same
incomplete block
• All pairs of treatments are compared with the same degree
of precision
• Each treatment occurs together with every other treatment
in the same block equal number of times

• Each block contains the same number of units
• Each treatment occurs the same number of times in total
• Each pair of treatments occurs together the same number
of times in total
• A design that satisfies these conditions is called a Balanced
Incomplete Block Design

Characteristics of Idealised Population
1. Mating only occurs among individuals within a line.
= No migration between lines.
2. Generations do not overlap among each other.
3. Number of individuals in each line is the same, = N
4. Random mating among individuals within lines.
5. No selection or mutation at any level.

Base population (N = ∞)
Individuals   N    N    N    N
Gametes       2N   2N   2N   2N
Individuals   N    N    N    N

Sampling in Idealised Population
• For an idealised population, q = q0
If sampling error occurs,
$$\sigma^2_{\Delta q} = \frac{p_0 q_0}{2N}$$
= variance of the change in gene frequency.
This difference occurs when sampling is done from each of the lines.
It causes the final gene frequency to differ from the initial gene
frequency,
i.e. q ≠ q0
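The sampling variance p0q0/2N can be compared with a simulation in which each line draws 2N gametes at random from the base population. This is a sketch under the idealised-population assumptions; the values q0 = 0.5, N = 20 and 4000 lines are hypothetical, and the seed is fixed only for repeatability.

```python
import random
from statistics import pvariance


def drift_variance(q0, n_individuals):
    """Expected variance of the change in gene frequency: p0*q0 / (2N)."""
    return q0 * (1 - q0) / (2 * n_individuals)


def simulate_lines(q0, n_individuals, n_lines, seed=1):
    """Gene frequency in each line after sampling 2N gametes at random."""
    rng = random.Random(seed)
    n_gametes = 2 * n_individuals
    frequencies = []
    for _ in range(n_lines):
        copies = sum(rng.random() < q0 for _ in range(n_gametes))
        frequencies.append(copies / n_gametes)
    return frequencies


expected = drift_variance(0.5, 20)
observed = pvariance(simulate_lines(0.5, 20, 4000))
print(expected, observed)
```

Repeating the comparison with a smaller N gives a larger variance, which is the dispersive effect of small population size.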

Sampling in Idealised Population
• sub-populations have different
characteristics
– random drift
– some genes will be lost, while others fixed
in the population

The sums of squares for total, replication, treatment and error are computed as in any other
design. The sum of squares due to blocks is a new statistic to be computed in lattice
designs.
1. Correction factor: $C.F. = \dfrac{(GT)^2}{rq^2}$
2. Total SS $= \sum\sum X_{ij(l)}^2 - C.F.$
3. SSR $= \dfrac{\sum R_j^2}{q^2} - C.F.$
4. SSB $= \dfrac{\sum C_{ij}^2}{qr(r-1)} - \dfrac{\sum C_i^2}{q^2 r(r-1)}$
5. SSt $= \dfrac{\sum T_i^2}{r} - C.F.$
6. SSE = Total SS – SSR – SSB – SSt

Practical Example……..
1. A breeder would like to evaluate 16 highly advanced hybrids in a balanced lattice
design because the experimental area varies in soil acidity, with the
direction of the gradient unknown. He conducted the experiment and obtained the
following measurements. The statistical objective of this example is to become
familiar with the analysis of variance for a balanced lattice design.

The stepwise analysis is as follows:
1. Compute Bj, the sum of the totals of all blocks in which treatment j appears (Bi1 to Bi20 are the totals of blocks 1 to 20)
a. B1 = Bi1 + Bi5 + Bi9 + Bi13 + Bi17
= 62.4 + 63.2 + 65.1 + 81.1 + 58.1
= 329.9
b. B2 = Bi1 + Bi6 + Bi10 + Bi14 + Bi18
= 62.4 + 58.9 + 65.7 + 74.2 + 75.0
= 336.2
c. B3 = Bi1 + Bi7 + Bi11 + Bi15 + Bi19
= 62.4 + 72.5 + 63.7 + 69.7 + 77.9
= 346.2
d. B4 = Bi1 + Bi8 + Bi12 + Bi16 + Bi20
= 62.4 + 76.8 + 69.0 + 69.5 + 83.5
= 361.2
e. B5 = Bi2 + Bi5 + Bi10 + Bi15 + Bi20
= 61.6 + 63.2 + 65.7 + 69.7 + 83.5
= 343.7
f. B6 = Bi2 + Bi6 + Bi9 + Bi16 + Bi19
= 61.6 + 58.9 + 65.1 + 69.5 + 77.9
= 333.0
g. B7 = Bi2 + Bi7 + Bi12 + Bi13 + Bi18
= 61.6 + 72.5 + 69.0 + 81.1 + 75.0
= 359.2
h. B8 = Bi2 + Bi8 + Bi11 + Bi14 + Bi17
= 61.6 + 76.8 + 63.7 + 74.2 + 58.1
= 334.4
i. B9 = Bi3 + Bi5 + Bi11 + Bi16 + Bi18
= 60.9 + 63.2 + 63.7 + 69.5 + 75.0
= 332.3
j. B10 = Bi3 + Bi6 + Bi12 + Bi15 + Bi17
= 60.9 + 58.9 + 69.0 + 69.7 + 58.1
= 316.6
k. B11 = Bi3 + Bi7 + Bi9 + Bi14 + Bi20
= 60.9 + 72.5 + 65.1 + 74.2 + 83.5
= 356.2
l. B12 = Bi3 + Bi8 + Bi10 + Bi13 + Bi19
= 60.9 + 76.8 + 65.7 + 81.1 + 77.9
= 362.4
m. B13 = Bi4 + Bi5 + Bi12 + Bi14 + Bi19
= 73.7 + 63.2 + 69.0 + 74.2 + 77.9
= 358.0
n. B14 = Bi4 + Bi6 + Bi11 + Bi13 + Bi20
= 73.7 + 58.9 + 63.7 + 81.1 + 83.5
= 360.9
o. B15 = Bi4 + Bi7 + Bi10 + Bi16 + Bi17
= 73.7 + 72.5 + 65.7 + 69.5 + 58.1
= 339.5
p. B16 = Bi4 + Bi8 + Bi9 + Bi15 + Bi18
= 73.7 + 76.8 + 65.1 + 69.7 + 75.0
= 360.3

1. Compute Wj
a. W1 = qT1 – (q+1)B1 + GT
= (4x80.7) – (4+1) x 329.9 + 1382.5
= 55.8
b. W2 = qT2 – (q+1)B2 + GT
= (4x80.4) – (4+1) x 336.2 + 1382.5
= 23.1
c. W3 = qT3 – (q+1)B3 + GT
= (4x91.3) – (4+1) x 346.2 + 1382.5
= 16.7
d. W4 = qT4 – (q+1)B4 + GT
= (4x91.5) – (4+1) x 361.2 + 1382.5
= -57.5
e. W5 = qT5 – (q+1)B5 + GT
= (4x81.3) – (4+1) x 343.7 + 1382.5
= -10.8
f. W6 = qT6 – (q+1)B6 + GT
= (4x85.3) – (4+1) x 333 + 1382.5
= 58.7
g. W7 = qT7 – (q+1)B7 + GT
= (4x87.7) – (4+1) x 359.2 + 1382.5
= -62.7
h. W8 = qT8 – (q+1)B8 + GT
= (4x87.7) – (4+1) x 334.4 + 1382.5
= 61.3
i. W9 = qT9 – (q+1)B9 + GT
= (4x80.2) – (4+1) x 332.3 + 1382.5
= 41.8

a. W10 = qT10 – (q+1)B10 + GT
= (4x59.4) – (4+1) x 316.6 + 1382.5
= 37.1
b. W11 = qT11 – (q+1)B11 + GT
= (4x95.8) – (4+1) x 356.2 + 1382.5
= -15.3
c. W12 = qT12 – (q+1)B12 + GT
= (4x95.1) – (4+1) x 362.4 + 1382.5
= -49.1
d. W13 = qT13 – (q+1)B13 + GT
= (4x87.5) – (4+1) x 358 + 1382.5
= -57.5
e. W14 = qT14 – (q+1)B14 + GT
= (4x99.9) – (4+1) x 360.9 + 1382.5
= -22.4
f. W15 = qT15 – (q+1)B15 + GT
= (4x90.8) – (4+1) x 339.5 + 1382.5
= 48.2
g. W16 = qT16 – (q+1)B16 + GT
= (4 x 87.9) – (4+1) x 360.3 + 1382.5
= -67.4
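The Wj step above is mechanical and easy to check by machine. The sketch below recomputes Wj = qTj – (q+1)Bj + GT from the treatment totals and Bj values of this example; the variable names are assumptions. A useful check is that the Wj values sum to zero.

```python
q = 4          # block size (and number of blocks per replication)
GT = 1382.5    # grand total from the example

# Unadjusted treatment totals T1..T16 and block-sum totals B1..B16 from the example
T = [80.7, 80.4, 91.3, 91.5, 81.3, 85.3, 87.7, 87.7,
     80.2, 59.4, 95.8, 95.1, 87.5, 99.9, 90.8, 87.9]
B = [329.9, 336.2, 346.2, 361.2, 343.7, 333.0, 359.2, 334.4,
     332.3, 316.6, 356.2, 362.4, 358.0, 360.9, 339.5, 360.3]

# W_j = q*T_j - (q + 1)*B_j + GT for every treatment
W = [q * t - (q + 1) * b + GT for t, b in zip(T, B)]

for j, w in enumerate(W, start=1):
    print(f"W{j} = {w:.1f}")
print("sum of W:", round(sum(W), 6))
```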

1. Compute Sum of squares for the different components. The best way is to start with the
adjusted block sum of squares because the mean square of the block is an important
component for making decision whether we continue the analysis as lattice or as RCBD
after comparing it with MS of error.
a. SS block (adjusted) = $\dfrac{\sum W_j^2}{q^3 (q+1)} = \dfrac{(55.8)^2 + (23.1)^2 + \dots + (-67.4)^2}{4^3 (4+1)} = 109.14$
b. MS block = $\dfrac{\text{SS block (adj)}}{q^2 - 1} = \dfrac{109.14}{4^2 - 1} = 7.28 = E_b$
c. Correction factor (C.F.) = $\dfrac{(GT)^2}{r q^2} = \dfrac{(1382.5)^2}{5 \times 4^2} = 23891.33$
d. Total SS = $\sum X_{ijk}^2 - \text{C.F.} = [(14.9)^2 + (15.2)^2 + \dots + (22.5)^2] - 23891.33 = 566.4$

a. SS treatment (unadj.)
SSt (unadj.) = $\dfrac{\sum T_i^2}{r} - \text{C.F.} = \dfrac{(80.7)^2 + (80.4)^2 + \dots + (87.9)^2}{5} - 23891.33 = 257.13$
b. SS replication (SSR)
SSR = $\dfrac{\sum R_j^2}{q^2} - \text{C.F.} = \dfrac{(258.6)^2 + (271.4)^2 + \dots + (294.5)^2}{4^2} - 23891.33 = 72.71$
c. SS due to error (SSE)
SSE = Total SS – SSt(unadj) – SS block(adj) – SSR
= 566.4 – 257.13 – 109.14 – 72.71
= 127.42
d. Degrees of freedom for error = $(q-1)(q^2-1) = (4-1)(4^2-1) = 45$
e. MSE = SSE/d.f. for error = 127.42/45 = 2.83 = Ee. Once the two statistics Eb and Ee are obtained, it is possible to check whether μ is positive or not. If it is positive we continue the analysis as a lattice; if not, we analyse the experiment as an RCBD.

a. μ = $\dfrac{E_b - E_e}{q^2 E_b} = \dfrac{7.28 - 2.83}{4^2 \times 7.28} = 0.04$. Since μ is positive, we proceed to adjust the treatment means as in Table 4.72.
Let T’j = Tj + μWj where Tj is unadjusted treatment total
Table 4.72. Computing adjusted treatment means
Treatment   Tj     Bj      Wj      T’j = Tj + μWj   Adjusted mean (T’j/r)
T1          80.7   329.9    55.8   82.93            16.59
T2          80.4   336.2    23.1   81.32            16.26
T3          91.3   346.2    16.7   91.97            18.39
T4          91.5   361.2   -57.5   89.20            17.84
T5          81.3   343.7   -10.8   80.87            16.17
T6          85.3   333.0    58.7   87.65            17.53
T7          87.7   359.2   -62.7   85.19            17.04
T8          87.7   334.4    61.3   90.15            18.03
T9          80.2   332.3    41.8   81.87            16.37
T10         59.4   316.6    37.1   60.88            12.18
T11         95.8   356.2   -15.3   95.19            19.04
T12         95.1   362.4   -49.1   93.14            18.63
T13         87.5   358.0   -57.5   85.20            17.04
T14         99.9   360.9   -22.4   99.00            19.80
T15         90.8   339.5    48.2   92.73            18.55
T16         87.9   360.3   -67.4   85.20            17.04
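The adjustment in Table 4.72 can be reproduced with a few lines of code. This is a sketch under the assumption that μ is rounded to two decimals before use, as the worked example does; the variable names are illustrative.

```python
q, r = 4, 5
ss_block_adj = 109.14   # adjusted block SS from the example
sse = 127.42            # intra-block error SS from the example

Eb = ss_block_adj / (q**2 - 1)            # block mean square
Ee = sse / ((q - 1) * (q**2 - 1))         # intra-block error mean square
mu = round((Eb - Ee) / (q**2 * Eb), 2)    # rounded to 0.04 as in the example

T = [80.7, 80.4, 91.3, 91.5, 81.3, 85.3, 87.7, 87.7,
     80.2, 59.4, 95.8, 95.1, 87.5, 99.9, 90.8, 87.9]
W = [55.8, 23.1, 16.7, -57.5, -10.8, 58.7, -62.7, 61.3,
     41.8, 37.1, -15.3, -49.1, -57.5, -22.4, 48.2, -67.4]

# T'_j = T_j + mu * W_j, and the adjusted mean is T'_j / r
adjusted_totals = [t + mu * w for t, w in zip(T, W)]
adjusted_means = [tp / r for tp in adjusted_totals]

for j, (tp, m) in enumerate(zip(adjusted_totals, adjusted_means), start=1):
    print(f"T{j}: adjusted total {tp:.2f}, adjusted mean {m:.2f}")
```

Because the Wj sum to zero, the adjustment redistributes yield among treatments without changing the grand total.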

1. SSt (adjusted) = $\dfrac{\sum T_i'^2}{q+1} - \text{C.F.} = \dfrac{(82.93)^2 + (81.32)^2 + \dots + (85.20)^2}{4+1} - 23891.33 = 223.77$
2. MSt = SSt(adj)/(q² – 1) = 223.77/(4² – 1) = 14.92
3. Effective error MS = Ee (1 + qµ) = 2.83(1 + 4 x 0.04) = 3.2828
4. F-calculated = MSt/Effective error MS = 14.92/3.2828 = 4.54
5. Finally, the ANOVA table can be constructed as in Table 4.73
Table 4.73. ANOVA for balanced lattice design
Source of variation   d.f.   SS       MS       F-cal   F-tabulated (0.05)   F-tabulated (0.01)
Replication           4      72.71    18.18
Block (adj)           15     109.14   7.28     2.57*   1.90
Treatment (adj)       15     223.77   14.92    4.54*   1.90
Intra-block error     45     127.42   2.83
Effective error       45              3.2828
*, significant at 0.05 level of probability

CV = $\dfrac{\sqrt{\text{Effective error MS}}}{GM} \times 100 = \dfrac{\sqrt{3.28}}{17.28} \times 100 = 10.48\%$
1. Compute standard errors
SE(m) = $\sqrt{\dfrac{E_e (1 + q\mu)}{r}} = \sqrt{\dfrac{2.83 (1 + 4 \times 0.04)}{5}} = 0.81$
SE(d) = $\sqrt{\dfrac{2 E_e (1 + q\mu)}{r}} = \sqrt{\dfrac{2 \times 2.83 (1 + 4 \times 0.04)}{5}} = 1.15$
CD/LSD = SE(d) x t0.05 at error degrees of freedom
= 1.15 x 2.019
= 2.32
2. Compute relative efficiency of lattice over RCBD
MSE(RCBD) = $\dfrac{\text{SS block (adj)} + SSE}{\text{d.f. block} + \text{d.f. error}} = \dfrac{109.14 + 127.42}{15 + 45} = 3.94$
Effective error MS = 3.2828
RE = MSE(RCBD)/Effective error MS = 3.94/3.2828 = 1.20
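The effective error, standard errors, LSD and relative efficiency of this example can be recomputed together. The sketch below uses the rounded values Ee = 2.83 and μ = 0.04 and the tabulated t of 2.019 quoted in the example; rounding SE(d) before multiplying is an assumption made to mirror the worked figures.

```python
import math

q, r = 4, 5
Ee, mu = 2.83, 0.04                 # rounded values used in the worked example
ss_block_adj, sse = 109.14, 127.42
df_block, df_error = 15, 45
t_value = 2.019                     # tabulated t quoted in the example

effective_error_ms = Ee * (1 + q * mu)
se_mean = math.sqrt(effective_error_ms / r)
se_diff = math.sqrt(2 * effective_error_ms / r)
lsd = round(se_diff, 2) * t_value   # the example rounds SE(d) to 1.15 first

# Error mean square the trial would have had if analysed as an RCBD
mse_rcbd = (ss_block_adj + sse) / (df_block + df_error)
relative_efficiency = mse_rcbd / effective_error_ms

print(f"Effective error MS = {effective_error_ms:.4f}")
print(f"SE(m) = {se_mean:.2f}, SE(d) = {se_diff:.2f}, LSD = {lsd:.2f}")
print(f"RE = {relative_efficiency:.2f}")
```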

Interpretation
 Treatment effect was found to be significant
 T14 had the highest grain yield (19.80 t/ha) followed by
T11 and T12 (but statistically these three treatments
did not show differences among themselves)
 There was also a significant block effect implying that
blocking helped in reducing experimental error
 The relative efficiency of 1.20 indicates that the use of
lattice design instead of RCBD improved precision by
20%

Partially Balanced Design
 Characteristics of partially balanced design:
 All treatments do not occur together in the same block
 The number of replications is not restricted
 Number of treatment must be a perfect square

Differences in genotype frequencies
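• Therefore, the genotype frequencies over all lines differ from the initial frequencies:
AA : $\overline{p^2} = \bar{p}^2 + \sigma_q^2$
Aa : $\bar{H} = 2\bar{p}\bar{q} - 2\sigma_q^2$
aa : $\overline{q^2} = \bar{q}^2 + \sigma_q^2$
= Wahlund’s Formula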

Inbreeding
• Inbreeding-breeding together of individuals
more closely related than mates chosen at
random from a population (mating of
relatives)
• Inbreeding coefficient – probability of any
individual (diploid individual) being an
identical homozygote.
• probability that the 2 genes of a random
member of a pop. are identical by descent

Inbreeding
a a - allele
-identical by descent
• When we have more genes/genotypes that are identical
by descent, then the higher is the chance of inbreeding
incidence in the population. The incidence of inbreeding
is measured by coefficient of inbreeding (F)
aa

Inbreeding
$F = \dfrac{1}{2} \sum_{i=1}^{m} \left(\dfrac{1}{2}\right)^{n} (1 + F_A)$
F = inbreeding coefficient of an individual or a population of individuals
n = number of generations separating the male and female parents through the common ancestor
m = number of pathways leading to the individual through a common ancestor
FA = coefficient of inbreeding of the common ancestor of the seed and pollen parents
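The path formula above can be sketched as a small function. The function name and the two pedigrees used for illustration (full-sib and half-sib matings with non-inbred common ancestors) are assumptions chosen because their results are well known.

```python
def inbreeding_coefficient(paths):
    """F = 1/2 * sum over pathways of (1/2)**n * (1 + F_A).

    paths: list of (n, F_A) pairs, one per pathway through a common ancestor,
    where n is the number of generations separating the two parents through
    that ancestor and F_A is the ancestor's own inbreeding coefficient.
    """
    return 0.5 * sum((0.5 ** n) * (1.0 + f_a) for n, f_a in paths)

# Progeny of a full-sib mating: two common ancestors, n = 2 on each pathway
f_full_sib = inbreeding_coefficient([(2, 0.0), (2, 0.0)])
# Progeny of a half-sib mating: one common ancestor, n = 2
f_half_sib = inbreeding_coefficient([(2, 0.0)])

print(f_full_sib, f_half_sib)
```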

In any breeding program,
population mean refers to both phenotypic and
genotypic values, because we consider the
environmental deviation = 0 in the population.
To assume this, we have to have a good control
of the environment, i.e. by growing or raising all
individuals in the population in the area with no
difference in the environmental influence.

Alpha Lattice Design
 Alpha-lattice design is replicated designs that divide the
replicate into incomplete blocks that contain a fraction of the
total number of entries
 It bridges the gap between RCBD and lattice designs
 The number of treatments should not necessarily be a
perfect square
 Genotypes are distributed among the reps so that all pairs
occur in the same replication in nearly equal frequency

Alpha lattice design
• Suppose a researcher is interested in evaluating 20 genotypes in
alpha (0,1)-lattice design. Then, there will be t = 20, q = 4 and
number of treatments in each block = 5

Alpha Lattice Design
Used when there is a large number of genotypes and a small area
There are no check varieties for estimating error
It reduces the effect of within-complete-block variation
It can also provide repeatability, particularly in trials
It maximizes the use of comparisons between genotypes in the same incomplete block

Advantages of alpha lattice design
 It allows the adjustment of treatment means for block effects
 The small incomplete blocks create homogeneous comparisons
 It provides effective control within replicate variability

Augmented Design
 Augmented designs also use grids or incomplete blocks to
remove some field variation from the plot residuals
 In an augmented design, a large set of experimental lines is
divided into small incomplete blocks
 In each incomplete block, a set of checks is included; every
check occurs in each incomplete block

Augmented Design
 Because the design is unreplicated, the repeated checks are
used to estimate the error mean square and the block effect
 The block effect is estimated from the repeated check means
and then removed from the means of the test varieties
 This reduces error and increases precision somewhat

Augmented Designs
Developed by Federer (1956)
Used to test a large number of lines in a limited area
Used when other designs are not appropriate due to large
number of entries
In augmented designs the goal is to compare existing (control)
treatments with new treatments under the experimental
constraint of limited replication and resources

Augmented Design
Experimental lines replicated once
Checks occur in each block
Checks used to estimate block effects
Checks provide error term
Difficult to maintain homogeneous blocks when comparing
Flexible – blocks can be of unequal size

Disadvantage of Augmented Design
Considerable resources are spent on production and
processing of control plots
Relatively few degrees of freedom for experimental
error, which reduces the power to detect differences
among treatments
Un-replicated experiments are inherently imprecise, no
matter how sophisticated the design

Analysis of Quantitative Traits
Consider one locus with two alleles, A1 and A2
Genotypes: A2A2 A1A2 A1A1
Value: -a 0 d +a
(d is the value of the heterozygote, measured from the mid-point 0 between the two homozygotes)
Allele A1 is the allele that increases the value.
d : depends on the degree of dominance
d = 0, no dominance
d positive, A1 dominant over A2
Complete dominance, d = +a or d = -a
Over dominance, d > +a or d < -a
Degree of dominance = d/a

Population Mean
In a population, the population mean is obtained by multiplying each genotypic
value by its frequency and summing over the three genotypes; the effects of all
loci controlling the trait are then combined.
Genotype   Frequency   Value   Frequency x Value
A1A1       p²          +a      p²a
A1A2       2pq         d       2pqd
A2A2       q²          -a      -q²a
Population Mean = a(p² - q²) + 2dpq = a(p - q) + 2dpq, because p² - q² = (p + q)(p - q) and p + q = 1
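As a quick check of the table above, the frequency-weighted sum and the closed form a(p – q) + 2dpq can be compared numerically. The values p = 0.6, a = 10 and d = 4 are hypothetical and only serve as an illustration.

```python
def population_mean(p: float, a: float, d: float):
    """Return the mean from the genotype table and from the closed form."""
    q = 1.0 - p
    # Frequency x value summed over A1A1, A1A2 and A2A2
    from_table = p * p * a + 2 * p * q * d + q * q * (-a)
    closed_form = a * (p - q) + 2 * d * p * q
    return from_table, closed_form

# Hypothetical values for illustration only
table_mean, formula_mean = population_mean(p=0.6, a=10.0, d=4.0)
print(table_mean, formula_mean)
```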

Population Mean
Population Mean:
M = a(p – q) + 2dpq
The term a(p – q) is produced by homozygotes; the term 2dpq is produced by heterozygotes

Population Mean
• The value is the product of gene
contribution from all/many loci, effect of
combination = population mean.
• Assume here that they all combine
additively,
M = a(p - q) + 2dpq

Average Effect
• Average effect of a gene is the average deviation from the
population mean, of individuals receiving one gene from one parent,
while the other is received at random from the population.
• Average effect of a gene substitution - effect on the population
mean, when a gene is substituted with another with a different form.
ie. A1 → A2 A1A1 → A1A2
A2 → A1 A1A2 → A1A1
1 locus, 2 alleles,
Frequency of A1 = p
Frequency of A2 = q

Average Effect
• Average effect of gene A1 = α1
• If gametes carrying A1 combine with gametes taken at random from the
population, the resulting genotype frequencies would be
A1A1 = p
A1A2 = q
Genotypic value for A1A1 = a
Genotypic value for A1A2 = d
Mean for both = pa + qd

Average Effect
• The difference between this mean value and the population mean is the
average effect of gene A1.
α1 = pa + qd - [a(p - q) + 2dpq]
α1 = q[a + d(q - p)]
For gene A2,
α2 = -p[a + d(q - p)]
If A2 is taken at random, it is found in genotypes with frequencies
A1A2 = p
A2A2 = q
Changing A1A2 → A1A1 changes the value from d to +a
→ effect = (a - d)

Average Effect
• Changing A2A2 → A1A2 changes the value from -a to d
→ effect = (d + a)
Average effect of gene substitution:
α = p(a - d) + q(d + a)
Computation: pa - pd + qd + qa
= a(p + q) + d(q - p)
= a + d(q - p)
Relation with α1 and α2:
α = α1 - α2

Breeding Value
• The value of an individual as judged by the
mean value of the progenies is called the
breeding value.
• Breeding value can be measured by the
value of deviation from the population
mean.

Breeding Value
population
X
A
• If a certain individual is mated with a group of individuals at random
from a population,
• Breeding value = 2 x the average deviation of the progenies from
the population mean.
• In the context of average effects, the breeding value of an individual
= total average effects of the genes it carries, summed up for all
pairs of genes (alleles) at every locus, for all loci involved.
XXXXXXXX
XXXXX

Breeding Value
• Considering one locus, the breeding values
of the genotypes are:
B.V. A1A1 = 2α1 = 2qα
B.V. A1A2 = α1 + α2 = (q - p)α
B.V. A2A2 = 2α2 = -2pα
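The average effects and breeding values for one locus can be computed together. In this sketch α = a + d(q – p), α1 = qα and α2 = –pα; the function name and the values p = 0.6, a = 10, d = 4 are hypothetical. Two checks follow from the definitions: α = α1 – α2, and the frequency-weighted mean breeding value is zero.

```python
def average_effects_and_breeding_values(p: float, a: float, d: float):
    q = 1.0 - p
    alpha = a + d * (q - p)          # average effect of gene substitution
    alpha1 = q * alpha               # average effect of A1
    alpha2 = -p * alpha              # average effect of A2
    breeding_values = {
        "A1A1": 2 * alpha1,          # = 2q*alpha
        "A1A2": alpha1 + alpha2,     # = (q - p)*alpha
        "A2A2": 2 * alpha2,          # = -2p*alpha
    }
    frequencies = {"A1A1": p * p, "A1A2": 2 * p * q, "A2A2": q * q}
    mean_bv = sum(frequencies[g] * breeding_values[g] for g in breeding_values)
    return alpha, alpha1, alpha2, breeding_values, mean_bv

# Hypothetical values for illustration only
alpha, alpha1, alpha2, bv, mean_bv = average_effects_and_breeding_values(0.6, 10.0, 4.0)
print(alpha, alpha1, alpha2, bv, mean_bv)
```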
Values of progenies

Breeding Value
Genotype   Frequency   No. of A1 genes   Arbitrary value   Breeding value
A2A2       q²          0                 -a                -2pα
A1A2       2pq         1                 d                 (q - p)α
A1A1       p²          2                 +a                2qα

• So far we have discussed only one component of the
genotypic value, the additive effects, i.e. the breeding value:
G = A + D + I
G = genotypic value
A = breeding value
D = dominance deviation
I = interaction deviation
• For one locus only:-
G = A + D
Breeding Value

Dominance Deviation
 Dominance Deviation is a function of d
d = 0, DD = 0
 all genes are additive in nature
Interaction Deviation (I)
G = A + D + I
• When more than one locus is involved and I ≠ 0, there is interaction between
loci contributing to the genotypic value. This is called epistasis, and I
is also called the epistatic deviation.
• If I = 0, the genes are said to act additively among loci.
• If 1 locus involved, additive action means the absence of
dominance.
• If more than 1 locus involved, additive action means absence of
epistasis.

6. MATING DESIGNS AND ESTIMATION
OF GENETIC PARAMETERS
• Heterosis = hybrid vigor
– the superiority of F1 over its parents
– Positive traits
– Negative traits

Heterosis cont.
• Superiority of the cross over its
parents
– A/a locus, AA x aa = Aa, position
– Dominance √
– Epistasis √
– Additive X

Heterosis cont.
• Average parent,
MPH = 100(F1-MP)/MP
• Better parent,
BPH = 100(F1-BP)/BP
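The two heterosis measures above can be sketched as follows. The function name, the `higher_is_better` switch (for positive versus negative traits) and the example values are assumptions for illustration.

```python
def heterosis(f1: float, p1: float, p2: float, higher_is_better: bool = True):
    """Return (MPH, BPH) in percent for an F1 and its two parents."""
    mid_parent = (p1 + p2) / 2
    # For traits where small values are desirable, the better parent is the lower one
    better_parent = max(p1, p2) if higher_is_better else min(p1, p2)
    mph = 100 * (f1 - mid_parent) / mid_parent
    bph = 100 * (f1 - better_parent) / better_parent
    return mph, bph

# Hypothetical yields: parents 4.0 and 6.0, F1 hybrid 7.0
mph, bph = heterosis(f1=7.0, p1=4.0, p2=6.0)
print(f"MPH = {mph:.1f}%, BPH = {bph:.1f}%")
```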

Heterosis cont.
• What is better parent?
• For which traits
• Positive traits:
• Negative traits:
• Measurement and score:

Heterosis cont.
• What is the basis for heterosis?
– Theory of dominance
– Theory of over-dominance
– Theory of physiological enhancement

Heterosis cont.
• Heterotic patterns
• Theory of testers
– Broad-base
– Narrow-base

Combining Ability
• General combining ability
• Average performance,
• Additive genetic variance, σ²A
• Specific combining ability
• Specific combinations,
• Non-additive genetic variance, σ²I

DIALLEL MATING DESIGN
Introduction
Diallel cross - mating design where all possible
crosses are made on an individual or
population (inbred, variety) to obtain all
possible combinations.
- Complete diallel
- Partial diallel (half-diallel)

DIALLEL MATING DESIGN
Example: n inbred lines give
n x n = n² combinations. Components (here n = 7):
n parents = 7
½ n (n-1) crosses = 21
½ n (n-1) reciprocals = 21

DIALLEL ANALYSIS
• Diallel is difficult to construct but useful
to obtain genetic information from
populations.
• Started by Sprague and Tatum (1942)
• Uses of diallel (Hayward, 1979):
Application of the diallel cross to
outbreeding crop species.

USES OF DIALLEL ANALYSIS
1. Strategic survey on populations
as initial breeding materials in
breeding programmes.
– observe variance components
– observe genetic variability
– estimate heritability
- Use top–cross

2. Tactical assessment on genetic
relationships among selected elite
genotypes.
– Selection can be done to select parents
that have good combining ability.
– Inbred lines have to be first developed and
then tested.

Examples: - Evaluate SCA, GCA on hybrid
combinations among inbred lines.
Methods of estimating GCA and SCA:
1. Diallel Analysis
2. Mating Designs I, II, III
3. Test cross performance
- top cross, inbreds, hybrids, full/half-sibs.
4. Self-progeny performance.

Methods of Diallel Analysis
Many methods have been proposed
including,
a) Hayman's (1954)
b) Griffing's (1956) - 4 Methods
c) Gardner and Eberhart's (1966)
- Analysis II, Analysis III.
- Focus on Griffing's Methods
(Ref: Issues in Diallel Analysis by Baker, 1978)

Assumptions used in diallel analysis
a. Diploid segregation of the individuals involved.
b. Homozygous parents.
c. No reciprocal differences.
d. No epistasis.
e. No multiple alleles.
f. Uncorrelated gene distribution in the two parents

a) Example of a strategic survey on populations: to determine the features of a trait in
terms of its genetic components, the regression of Wr on Vr is used.
Let us say we test five parents, A, B, C, D and E, which were entered into a diallel
to form progenies.
[Figure: graph of Wr (vertical axis) against Vr (horizontal axis) showing the limiting parabola Wr² = Vr·Vp, the points for parents A to E, and regression lines labelled partial dominance, complete dominance and over dominance.]
Wr = covariance of progenies on parents
Vr = variance of progenies on parents.

Diallel Analysis
• From the Wr - Vr graph above:
- The line that passes through the origin (0)
shows complete
dominance as the main feature of the
control of the trait concerned.

Diallel Analysis cont.
a. Above origin - partial dominance.
b. Below origin - over dominance.
c. The larger the Vr value, the
higher is the interloci interaction.

d. E.g. A and E are more different
from each other genetically
because their points are far apart
on the graph, as compared to e.g.
D and E.
e. All points within the parabola -
parabola limits the values of the
coordinates.

f. If points close to the origin
- more dominant genes.
g. If far from origin
- more recessive genes.

Example, 7 x 7 half diallel:
1 2 3 4 5 6 7
1 37.250 38.500 38.375 39.500 37.375 38.125 38.375
2 30.500 32.125 32.750 34.875 38.750 32.625
3 31.000 32.625 34.875 39.000 35.125
4 32.250 36.375 37.500 35.375
5 35.250 38.875 35.625
6 38.500 38.625
7 34.250

Diallel Analysis
Correction factor = $\dfrac{(\sum \text{one-parent cross value})^2}{n} = \dfrac{(37.250 + 30.500 + \dots + 34.250)^2}{7} = 8160.143$
Variance:
Vp (phenotypic variance of population):
$V_p = \dfrac{1}{n-1}\left[\sum (\text{one-parent cross value})^2 - \text{C.F.}\right] = \dfrac{1}{6}\left[(37.250^2 + 30.500^2 + \dots + 34.250^2) - 8160.143\right] = 9.4345$

Correction factor = (Grand total)²/Total number of observations
= (23231.82)²/(4 x 64)
= 2108271.3301
Total S.S. = (104.86)² + (88.66)² + .......... + (81.48)² - C.F.
= 127712.5000
Treatments S.S. = [(342.58)² + (348.05)² + .......... + (328.00)²]/4 - C.F.
= 104924.1604
Replication S.S. = [(5811.48)² + .......... + (5951.34)²]/64 - C.F.
= 1037.0241
Error S.S. = Total S.S. - Treatment S.S. - Replication S.S.
= 21751.3155

Diallel Analysis
$V_i = \dfrac{1}{n-1}\left\{\sum (\text{value of each cross to } i)^2 - \dfrac{\left[\sum (\text{value of each cross to } i)\right]^2}{n}\right\}$
V1 = 1/6 {(37.250² + 38.500² + .... + 38.375²) -
(37.250 + 38.500 + .... + 38.375)²/7}
= 0.57143
V2 = 1/6 {(38.500² + 30.500² + .... + 32.625²) -
(38.500 + 30.500 + .... + 32.625)²/7}
= 10.35863
V3 = 9.47098
V4 = 7.75446
V5 = 2.21801
V6 = 0.26786
V7 = 4.61830.

Diallel Analysis
Covariance:
$W_i = \dfrac{1}{n-1}\left\{\sum \left[(\text{cross of parent } j \text{ with } i) \times (\text{one-parent cross value of parent } j)\right] - \dfrac{(\text{total of all crosses to } i)(\text{total of all one-parent cross values})}{n}\right\}$
W1 = 1/6 x {[(37.250 x 37.250) + (38.500 x 30.500) + ..... + (38.375 x
34.250)] - [(37.250 + 38.500 + .... + 38.375)(37.250 +
30.500 + ........ + 34.250)]/7}
= -1.37946
W2 = 9.41815
W3 = 9.22173
W4 = 7.88373
W5 = 3.30878
W6 = 0.22098
W7 = 5.74033
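The Vp, Vi and Wi values listed above can be recomputed from the 7 x 7 half-diallel table of this example. The sketch below fills the symmetric table from its upper triangle and applies the variance and covariance formulas; the helper names are assumptions.

```python
# Upper triangle of the 7 x 7 half-diallel table (row i starts at the diagonal)
upper = [
    [37.250, 38.500, 38.375, 39.500, 37.375, 38.125, 38.375],
    [30.500, 32.125, 32.750, 34.875, 38.750, 32.625],
    [31.000, 32.625, 34.875, 39.000, 35.125],
    [32.250, 36.375, 37.500, 35.375],
    [35.250, 38.875, 35.625],
    [38.500, 38.625],
    [34.250],
]
n = len(upper)

# Fill the full symmetric table (no reciprocal differences in a half diallel)
x = [[0.0] * n for _ in range(n)]
for i, row in enumerate(upper):
    for k, value in enumerate(row):
        j = i + k
        x[i][j] = x[j][i] = value

parents = [x[i][i] for i in range(n)]   # diagonal: one-parent cross values

def variance(values):
    total = sum(values)
    return (sum(v * v for v in values) - total * total / len(values)) / (len(values) - 1)

def covariance(a, b):
    return (sum(u * v for u, v in zip(a, b)) - sum(a) * sum(b) / len(a)) / (len(a) - 1)

Vp = variance(parents)
Vr = [variance(x[i]) for i in range(n)]              # V1..V7
Wr = [covariance(x[i], parents) for i in range(n)]   # W1..W7

print(f"Vp = {Vp:.4f}")
for i in range(n):
    print(f"parent {i + 1}: Vr = {Vr[i]:.5f}, Wr = {Wr[i]:.5f}")
```

The (Vr, Wr) pairs printed here are the coordinates used for the Wr against Vr graph.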

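Finally, the graph of Wr vs. Vr can be constructed:
[Figure: scatter plot of Wr (vertical axis, 0 to 10) against Vr (horizontal axis, 0 to 10) with one point for each of parents 1 to 7, plotted at the (Vi, Wi) values computed above; parent 1 falls below the Vr axis because W1 is negative.]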

Diallel Analysis
• Deductions from the graph:
– Points close to each other, parents are similar.
– Points far from each other, parents are different.
– Generally, this trait is controlled by genes with complete
dominance.
– Example, 1, 6 carry more dominant genes, 2, 3, 4 carry
more recessive genes.
– This analysis is called graphical analysis of a diallel cross.
– Convenient with the use of computers.

Diallel Analysis
b. Example of the Use of Diallel Analysis for Tactical Assessment
• To test the GCA and SCA for certain hybrid combinations.
– GCA – to determine the average performance of lines/inbreds in
hybrid combinations.
– SCA – to compare performance of one cross with the other crosses.
i.e. is it better or worse than the average performance
of all crosses.
• Example: A x B; A x C; A x D; A x E; A x F.
Average for A crosses = ?
Compare with A x F, for example to determine SCA(A x F)
• Griffing’s Method (1956) : For n2 diallel table
‘v’ genotype ‘b’ block ‘c’ individuals/plot

Diallel Analysis
• Observation on performance:
$X_{ijkl} = \mu + \nu_{ij} + b_k + (b\nu)_{ijk} + e_{ijkl}$
μ = overall population mean
νij = effect of genotype ij
bk = k-th block effect
(bν)ijk = block and genotype interaction effect
eijkl = experimental error
• Analysis of variance is then used to test the significance of differences.
• Genotypes are normally chosen for specific goals, e.g. hybrids.
F = MSv / MSe
• If the effect of genotypes is significant, look at the components of the M.S. to
determine GCA, SCA and other effects.

Diallel Analysis
• Values are given for each effect. The genotype effects are broken down
as follows:
$X_{ij} = \mu + g_i + g_j + s_{ij} + r_{ij} + \dfrac{1}{bc}\sum\sum e_{ijkl}$
g = GCA; i and j = parents
s = SCA
r = reciprocal effects
b = no. of blocks
c = no. of individuals
e = effects of environmental factors
μ = overall mean
• The analysis is subject to the following conditions:
– sij = sji, Σ gi = 0
– rij = -rji, Σ sij = 0

Diallel Analysis
ANOVA table
Source: GCA; df = n-1; SS = Sg; MS = Mg
  EMS (fixed model): $\sigma^2 + \dfrac{2n}{n-1}\sum g_i^2$
  EMS (random model): $\sigma^2 + \dfrac{2(n-1)}{n}\sigma_s^2 + 2n\sigma_g^2$
Source: SCA; df = n(n-1)/2; SS = Ss; MS = Ms
  EMS (fixed model): $\sigma^2 + \dfrac{2}{n(n-1)}\sum\sum s_{ij}^2$
  EMS (random model): $\sigma^2 + \dfrac{2(n^2-n+1)}{n^2}\sigma_s^2$
Source: Reciprocal; df = n(n-1)/2; SS = Sr; MS = Mr
  EMS (fixed model): $\sigma^2 + \dfrac{4}{n(n-1)}\sum\sum r_{ij}^2$
  EMS (random model): $\sigma^2 + 2\sigma_r^2$
Source: Error; SS = Se; MS = Me’
  EMS (fixed model): σ²
  EMS (random model): σ²
Me’ = Me/r, where Me = MSe and r = number of replications (observations)

Diallel Analysis
• To get a more detailed breakdown of the combinations:
$g_i = \dfrac{1}{2n}(X_{i.} + X_{.i}) - \dfrac{X_{..}}{n^2}$
$s_{ij} = \dfrac{1}{2}(X_{ij} + X_{ji}) - \dfrac{1}{2n}(X_{i.} + X_{.i} + X_{j.} + X_{.j}) + \dfrac{X_{..}}{n^2}$
$r_{ij} = \dfrac{1}{2}(X_{ij} - X_{ji})$
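The GCA, SCA and reciprocal effects of a complete diallel table can be sketched in code. The reciprocal effect is taken here as half the difference between a cross and its reciprocal, which is what the condition rij = -rji requires. The function name and the 3 x 3 table of means are hypothetical and only illustrate the arithmetic.

```python
def griffing_effects(x):
    """GCA (g), SCA (s) and reciprocal (r) effects from an n x n table of means."""
    n = len(x)
    row = [sum(x[i]) for i in range(n)]                        # X_i.
    col = [sum(x[i][j] for i in range(n)) for j in range(n)]   # X_.j
    grand = sum(row)                                           # X_..
    g = [(row[i] + col[i]) / (2 * n) - grand / n**2 for i in range(n)]
    s = [[0.5 * (x[i][j] + x[j][i])
          - (row[i] + col[i] + row[j] + col[j]) / (2 * n)
          + grand / n**2
          for j in range(n)] for i in range(n)]
    r = [[0.5 * (x[i][j] - x[j][i]) for j in range(n)] for i in range(n)]
    return g, s, r

# Hypothetical 3 x 3 table: rows are female parents, columns are male parents
table = [[10.0, 12.0, 14.0],
         [11.0,  9.0, 13.0],
         [15.0, 12.0,  8.0]]
g, s, r = griffing_effects(table)
print("GCA:", [round(v, 4) for v in g])
print("sum of GCA effects:", round(sum(g), 10))
```

The GCA effects sum to zero, the SCA matrix is symmetric and the reciprocal matrix is antisymmetric, in line with the restrictions of the model.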
• References:
– Biometrical Genetics (Mather and Jinks)- Diallel.

Example of a complete 7 x 7 diallel cross, in a tactical assessment involving
4 replications in RCBD:
1 2 3 4 5 6 7
1 37.25 38.50 38.25 40.00 35.75 38.75 38.25
2 39.00 30.50 32.75 32.00 35.50 39.25 32.73
3 38.50 31.50 31.00 32.75 34.75 38.50 34.75
4 39.00 33.50 32.50 32.25 36.25 35.75 35.25
5 39.00 34.25 35.00 36.50 35.25 39.25 36.25
6 37.50 38.25 39.50 39.25 38.50 38.50 39.50
7 38.50 32.50 35.50 35.50 35.20 37.75 34.25

Example of a complete 7 x 7 diallel cross
ANOVA table:
_______________________________________________________
Source df SS MS F
_______________________________________________________
Reps (Blks) 3 19.1875 6.3958 2.6846
Genotypes 48 1380.1250 28.7526 12.0689**
Error 144 343.0625 2.3824
_______________________________________________________
Total 195 1742.3750

1 2 3 4 5 6 7 Yi.
1
2
3
4
5
6
7
Y.i GT

Yi. Y.i (Yi. + Y.i)
1 Y1
2 Y2
3 Y3
4 Y4
5 Y5
6 Y6
7 Y7
GT
SSGCA = (1/2n)(Y1² + Y2² + … + Y7²) – (2/n²)(GT)², where Yi = (Yi. + Y.i)

Yij (Yij + Yji) Yij(Yij + Yji)
1
2
3
4
5
6
7
GT
SSSCA = ½ ΣΣ Yij(Yij + Yji) – (1/2n) Σ(Yi. + Y.i)² + (1/n²)(GT)²

Testing for significance
MSe for testing GCA & SCA= MSE/r
MSGCA ** MSGCA
ns
MSSCA ** MSSCA **
MSGCA**
MSSCA
ns

Breakdown of Genotype effects:
______________________________________________________
Source             d.f.   MS        F
______________________________________________________
Genotypes (48)
  GCA              6      37.8310   63.518*
  SCA              21     4.6701    7.841*
  Residual         21     0.9509    1.597
Error              144    MSe’ = MSe/4 = 2.3824/4 = 0.5956
______________________________________________________

• The significant variation among genotypes is caused by GCA and SCA
effects. GCA makes the larger contribution to the genotype differences, so the
genotypic variance is mainly due to additive gene action, with a small
amount of non-additive gene action.
When further subdivided:
g1 = 2.0969
g2 = -1.8138
g3 = -1.3852
g4 = -0.9209
g5 = 0.0612
g6 = 2.3648
g7 = -0.4031
Σ gi = 0

1 2 3 4 5 6 7
1 -3.062
2 2.0995 -1.9898
3 1.5459 -0.7934 -2.3469
4 2.2066 -0.6327 -1.8620 2.0255
5 -0.9005 0.5102 0.0816 1.1173 -0.9898
6 -2.4541 2.0816 1.9031 -0.0612 0.3316 -2.3469
7 0.5638 -1.2755 0.7959 0.5816 -0.1505 0.5459 -1.0612

Conclusion in Diallel Analysis
Therefore, when selecting for traits where
small values are desired, for example earliness,
choose parents with large negative values;
when large values are favoured, e.g. yield,
choose parents with large positive
values.

North Carolina Mating Designs
• Mating designs are normally termed as the
North Carolina Design, because they were
first introduced by the North Carolina State
University, USA (by Comstock,
Cockerham and Robinson).

• Mating designs are designs used in
cyclic selection schemes, where
progenies and families are created, and
then used for the purpose of:
– estimation of genetic components in the
control of a trait, calculation of gain from
selection, and development of new
populations.

• There are many kinds and variations as
well as modifications of the designs, as
proposed. However, in principle, they
are categorised as follows:

Design I
Uses:
• to estimate genetic components of variance
• to estimate degree of dominance
• to calculate gain from selection.

Design-I
• Design I is a nested design, where every
male is mated to a number of females in a
set. This is done in Season I, i.e. at the
mating nursery stage.

Season I
From the base population:
Male Female  4 half-sib families (HS)
1 x x x x (HS) formed
2 x x x x
 4 HS families
3 x x x x x 4=16 HS families
4 x x x x
____________________________________________
. . . . .
. . . . .
n . . . .

Season 2:
After the half-sib and full-sib families are formed from
the crosses in Season I, the progenies are
evaluated for performance in Season 2, following the
sib identities.
Example:
Set-I: 16 HS families + 2 check varieties
= 18 x 2 rep
9 entries/block:
Example: Block No. 1 Block No. 2 ............., n
II 36 -------------- 28
19 -------------- 27
I 18 -------------- 10
1 -------------- 9

Stages in the cyclic selection schemes:
Yield Trial on Progenies
From Crosses
(Season 2)
(Season 5) Data Collection
Estimation of Predicted Gain from
Selection, from Yield Trial Data Analysis
h2 estimation
Estimation of
Variance components Selection
Estimation of
Mating Nursery degree of dominance
- Formation of families
(Season I) Recombination
(Season 4) (Season 3)

ANOVA – DESIGN I
For 1 Block:
__________________________________________________________
Source         d.f.                 EMS                        MS
__________________________________________________________
Rep (2)        r-1 = 1
Male (4)       m-1 = 3              σ²e + rσ²f/m + rfσ²m       M1
Female/Male    m(f-1) = 12          σ²e + rσ²f/m               M2
M x F          (m-1)(r-1) = 3
B/M x F        (f)(m)(r-1) = 12     σ²e                        M3
___________________________________________________________
Total          n-1 = 31 (rmf-1)

DESIGN I
Calculation of Heritability:
$M_3 = \sigma^2_e$
$\sigma^2_{f/m} = (M_2 - M_3)/r$
$\sigma^2_m = (M_1 - M_2)/rf$
$\sigma^2_T = \sigma^2_m + \sigma^2_{f/m} + \sigma^2_e$
$\sigma^2_m$ = covariance of half-sibs
= 1/4 VA (Falconer)

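DESIGN I
$\sigma^2_m = \tfrac{1}{4}\sigma^2_A$
$\sigma^2_A = 4\sigma^2_m$
$\sigma^2_{f/m} = (M_2 - M_3)/r$
= 1/4 VT
$\sigma^2_{f/m}$ = Cov. FS - Cov. paternal half sibs
$= \tfrac{1}{2}\sigma^2_A + \tfrac{1}{4}\sigma^2_D - \tfrac{1}{4}\sigma^2_A$
$= \tfrac{1}{4}\sigma^2_A + \tfrac{1}{4}\sigma^2_D$
$= \tfrac{1}{4}(\sigma^2_A + \sigma^2_D)$
$4\sigma^2_{f/m} = \sigma^2_A + \sigma^2_D$
$\sigma^2_D = 4\sigma^2_{f/m} - 4\sigma^2_m$
$= 4(\sigma^2_{f/m} - \sigma^2_m)$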

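DESIGN I
$h^2_{(m)} = \sigma^2_A / \sigma^2_T = 4\sigma^2_m / \sigma^2_T = h^2_N$
$h^2_{(f)} = (\sigma^2_A + \sigma^2_D) / \sigma^2_T = 4\sigma^2_{f/m} / \sigma^2_T = h^2_B$
$h^2_{(m+f)} = 2(\sigma^2_m + \sigma^2_{f/m}) / \sigma^2_T$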
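The Design I estimates above follow directly from the three mean squares. The sketch below works through them; the mean squares M1 = 84, M2 = 52 and M3 = 40 are hypothetical values chosen for illustration, with r = 2 replications and f = 4 females per male as in the ANOVA layout.

```python
def design_one_components(M1, M2, M3, r, f):
    """Variance components and heritabilities from the Design I mean squares."""
    var_e = M3                              # error variance
    var_fm = (M2 - M3) / r                  # females within males
    var_m = (M1 - M2) / (r * f)             # males (covariance of half-sibs)
    var_A = 4 * var_m                       # additive variance
    var_D = 4 * (var_fm - var_m)            # dominance variance
    var_T = var_m + var_fm + var_e          # total variance
    return {
        "var_e": var_e, "var_fm": var_fm, "var_m": var_m,
        "var_A": var_A, "var_D": var_D, "var_T": var_T,
        "h2_m": var_A / var_T,                   # narrow sense, from males
        "h2_f": 4 * var_fm / var_T,              # broad sense, from females
        "h2_mf": 2 * (var_m + var_fm) / var_T,   # from males and females
    }

# Hypothetical mean squares for illustration only
est = design_one_components(M1=84.0, M2=52.0, M3=40.0, r=2, f=4)
for name, value in est.items():
    print(f"{name} = {value:.3f}")
```

With real data a negative estimate of the dominance variance can occur through sampling error; it is usually reported as zero.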

DESIGN I
Selection Phase:
HS Family Selection
– based on performance of HS families in
Season 2 Yield Test. For Recombination
phase, use remnant self seeds from males in
Season 1.
• The phases involving mating, testing,
selection and recombination of selected
families are conducted in a cyclic manner.

Design II
• Uses:
– to estimate genetic components of variance
– to estimate degree of dominance
– to estimate epistatic variance
– to calculate progress from selection
• Also called Factorial Mating Design, where
each male is crossed to every female in a
factorial manner.

Design II
Example:
Male inbred = 4
1 2 3 4
5
Female inbred= 4 6
7
8  produce 16 FS families
- Population size to be tested is bigger – about twice the size that
of Design I
Example: with 4 males, 4 females, 16 crosses:

Design II
• Requires bigger population size in order to
obtain information with the same precision
as Design I,
• Although the population to be used is
much larger, the advantages of Design II
are that:
– it can estimate epistasis
– suitable to be used in situation where some
degree of inbreeding occurs in the population.

ANOVA – Design II
__________________________________________________________
Source         d.f.                    EMS
_________________________________________________________
Rep (2)        r-1 = 1
Males (4)      m-1 = 3                 σ²e + rσ²MF + rfσ²M
Females (4)    f-1 = 3                 σ²e + rσ²MF + rmσ²F
M x F          (m-1)(f-1) = 9          σ²e + rσ²MF
Error          (mf-1)(r-1) = 15        σ²e
____________________________________________________________
Total          31

• From here, the calculations for genetic
components of variance and heritability
can be computed.
Given:
σ² = σ²e
σ²F = (MSf - MSmxf)/rm
σ²M = (MSm - MSmxf)/rf
= 1/4 VA
Design II

Design III
• Uses:
1. more powerful in estimating the
degree of dominance.
i.e. with a lesser amount of data, it
gives a stronger estimate of the degree
of dominance.
Design-I = 10-12 times
Design-II = 3-4 times
Design-III = 1 time

Design-III: Uses
2. In determining which generation
to use, i.e F2, F4, etc, depends on
the presence or absence of linkage
– the stronger the linkage, the more
advance is the generation required.

♀ ♀ (Original stock P1 – e.g. Inbred line A)
Female ♀ ♀
♀ ♀
♀ ♀
Male ♂ ♂ ♂ ♂ ♂ ♂ ♂ ♂...... (any generation from the cross between the
2 parental stocks:)
♀ ♀
Female ♀ ♀
♀ ♀
♀ ♀ (Original Stock P2 – e.g. Inbred line B)

• The source populations for this design are normally the
product of a certain programme with specific objectives
• Therefore, the evaluation of the progenies of the 16
male parents
(e.g. F2) in Season 2 will involve:
16 x2 (parents) = 32 FS families/ block + 2 checks/rep
x 2 reps/block
____________
72 plots/block
Design III

ANOVA – Design-III (1 block)
_______________________________________________
Source df EMS
_______________________________________________
Rep r-1
Female parent   p-1                σ²e + rσ²MF + rmσ²F
Male parent     n-1                σ²e + rσ²MF + rpσ²M
M x F           (n-1)(p-1)         σ²e + rσ²MF
Error           (n-1)(p-1)(r-1)    σ²e
_______________________________________________
Design III

Pascal’s triangle.
1 no segregating alleles
1 1
1 2 1 two alleles,
1 3 3 1
1 4 6 4 1 four alleles,
1 5 10 10 5 1
1 6 15 20 15 6 1 six alleles,

Line x Tester Analysis
• Kempthorne (1957)
• Broad-based Tester
• Narrow-based Testers
• Why L x T
– Cost

• Uses
– Information on GCA
– Information on SCA
– Information on gene effects
– Male female relationship
– Grouping

T1 T2 T3
L1
L2
L3
.
.

based on performance of hybrid, in
Season 2 Yield Test.
Select good combinations
The phases involving crossing and testing.

ANOVA – Line x Tester
_____________________________________________
Source df MS
______________________________________________
Rep r-1
Genotypes g-1
Parents p-1
P vs. C 1
Crosses c-1
Lines l-1
Testers t-1
L x T (l-1)(t-1)
Error (r-1)(g-1)
_______________________________________________

SSc = ΣCi²/r – C.F.(crosses), where C.F.(crosses) = (GTc)²/rc
SSp = Σpi²/r – C.F.(parents), where C.F.(parents) = (GTp)²/rp
SSpvs.c = SSg – SSc – SSp

T1 T2 T3 Total
1 C1 C2 L1
2 L2
3 L3
4 L4
5 L5
T1 T2 T3 GT
SSL = ΣLi²/tr – C.F.(crosses)
SST = ΣTi²/lr – C.F.(crosses)
SSLxT = SSc – SSL – SST

SSc =SSg-SSp-SS p vs. c
SSp =SSg-SSc-SS p vs. c

5 lines, 3 testers, 4 reps.
Blocks df
Genotypes
Parents, P
P vs. C
Crosses
Lines
Testers
L x T
Error

5 lines, 3 testers, 4 reps.
Sources d.f. MS
Blocks 3 27.66ns
Genotypes 22 1479**
Parents, P 7 899**
P vs. C 1 53ns
Crosses 14 1871**
Lines 4 2579ns
Testers 2 859ns
L x T 8 1770**
Error 66 91
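The degrees of freedom in the table above follow from the number of lines, testers and replications. The sketch below reproduces the partition for 5 lines, 3 testers and 4 replications; the function name is an assumption.

```python
def line_tester_df(lines: int, testers: int, reps: int) -> dict:
    """Degrees-of-freedom partition for a line x tester ANOVA with parents included."""
    crosses = lines * testers
    parents = lines + testers
    genotypes = crosses + parents
    return {
        "Blocks": reps - 1,
        "Genotypes": genotypes - 1,
        "Parents": parents - 1,
        "P vs. C": 1,
        "Crosses": crosses - 1,
        "Lines": lines - 1,
        "Testers": testers - 1,
        "L x T": (lines - 1) * (testers - 1),
        "Error": (reps - 1) * (genotypes - 1),
    }

df = line_tester_df(lines=5, testers=3, reps=4)
for source, value in df.items():
    print(f"{source:10s} {value}")
```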
