1.
PCB5065 Advanced Genetics Population Genetics and Quantitative Genetics Instructor: Rongling Wu, Department of Statistics, 420A Genetics Institute Tel: 2-3806, Email: [email_address] Tues Nov 14 Population genetics - population structure Wed Nov 15 Population genetics - Hardy-Weinberg equilibrium Thurs Nov 16 Population genetics - effective population size Mon Nov 20 Population genetics - linkage disequilibrium Tues Nov 21 Population genetics - evolutionary forces Wed Nov 22 Genetic Parameters: Means Thurs Nov 24 Thanksgiving Holiday - No Class Mon Nov 27 Genetic Parameters: (Co)Variances Tues Nov 28 Mating Designs for Parameter Estimation Wed Nov 29 Experimental Designs for Parameter Estimation Thurs Nov 30 Heritability, Genetic Correlation and Gain from Selection Mon Dec4 Molecular Dissection of Quant. Variation – Linkage Analysis Tues Dec 5 Molecular Dissection of Quant. Variation – Association Studies Wed Dec 6 Take-home exam on population and quantitative genetics given- due in electronic format ( [email_address] ) by 5 PM Tuesday Dec 12
2.
Teosinte and Maize Teosinte branched 1( tb1 ) is found to affect the differentiation in branch architecture from teosinte to maize (John Doebley 2001)
Identify the genetic architecture of the differences in morphology between maize and teosinte
Estimate the number of genes required for the evolution of a new morphological trait from teosinte to maize: few genes of large effect or many genes of small effect?
Doebley pioneered the use of quantitative trait locus (QTL) mapping approaches to successfully identify genomic regions that are responsible for the separation of maize from its undomesticated relatives.
Doebley has cloned genes identified through QTL mapping, teosinte branched1 ( tb1 ), which governs kernel structure and plant architecture.
Ancient Mexicans used several thousand years ago to transform the wild grass teosinte into modern maize through rounds of selective breeding for large ears of corn.
With genetic information, ‘‘I think in as few as 25 years I can move teosinte fairly far along the road to becoming maize,’’ Doebley predicts (Brownlee, 2004 PNAS vol. 101: 697–699)
6.
Toward biomedical breakthroughs? Single Nucleotide Polymorphisms (SNPs) no cancer cancer
According to The International HapMap Consortium (2003), the statistical analysis and modeling of the links between DNA sequence variants and phenotypes will play a pivotal role in the characterization of specific genes for various diseases and, ultimately, the design of personalized medications that are optimal for individual patients.
What knowledge is needed to perform such statistical analyses?
Population genetics and quantitative genetics, and others…
The International HapMap Consortium, 2003 The International HapMap Project. Nature 426: 789-94.
Liu, T., J. A. Johnson, G. Casella and R. L. Wu, 2004 Sequencing complex diseases with HapMap. Genetics 168: 503-511.
In the Hardy-Weinberg equilibrium (HWE), the relative frequencies of the genotypes will remain unchanged from generation to generation;
As long as a population is randomly mating, the population can reach HWE from the second generation;
The deviation from HWE, called Hardy-Weinberg disequilibrium (HWD), results from many factors, such as selection, mutation, admixture and population structure…
(1) Genotype (and allele) frequencies are constant from generation to generation,
(2) Genotype frequencies = the product of the allele frequencies, i.e., P AA = p A 2 , P Aa = 2p A p a , P aa = p a 2
For a population at Hardy-Weinberg disequilibrium (HWD), we have
P AA = p A 2 + D
P Aa = 2p A p a – 2D
P aa = p a 2 + D
The magnitude of D determines the degree of HWD.
D = 0 means that there is no HWD.
D has a range of max(-p A 2 , -p a 2 ) D p A p a
Concluding remarks A population with [P Aa (t+1)] 2 = 4P AA (t+1)P aa (t+1) is said to be in Hardy-Weinberg equilibrium (HWE). The HWE population has the following properties:
Whether or not the population deviates from HWE at a particular locus can be tested using a chi-square test.
If the population deviates from HWE (i.e., Hardy-Weinberg disequilibrium, HWD), this implies that the population is not randomly mating. Many evolutionary forces, such as mutation, genetic drift and population structure, may operate.
Therefore, the population does not deviate from HWE at this locus.
Why the degree of freedom = 1? Degree of freedom = the number of parameters contained in the alternative hypothesis – the number of parameters contained in the null hypothesis. In this case, df = 2 (p A or p a and D) – 1 (p A or p a ) = 1
But the population is at Linkage Disequilibrium (for a pair of loci). Then we have
Two-gene haplotype AB: p AB = p A p B + D AB
Two-gene haplotype Ab: p Ab = p A p b + D Ab
Two-gene haplotype aB: p aB = p a p B + D aB
Two-gene haplotype ab: p ab = p a p b + D ab
p AB +p Ab +p aB +p ab = 1
D ij is the coefficient of linkage disequilibrium (LD) between the two genes in the population. The magnitude of D reflects the degree of LD. The larger D, the stronger LD.
How does D transmit from one generation (1) to the next (2)?
D(2) = (1-r) 1 D(1)
…
D(t+1) = (1-r) t D(1)
t , D(t+1) r
26.
Conclusions: - D tends to be zero at the rate depending on the recombination fraction. - Linkage equilibrium PAB = pApB is approached gradually and without oscillation. - The larger r, the faster is the rate of convergence, the most rapid being (½)t for unlinked loci (r=0.5).
The ratio D(t)/D(0) describes the degree with which LD decays with generation.
28.
The plot of the ratio D(t)/D(0) against r tells us the evolutionary history of a population – implications for population and evolutionary genetics.
29.
The plot of the ratio D(t)/D(0) against t tells us the degree of linkage – Implications for high- resolution mapping of human diseases and other complex traits
The four gametes randomly unite to form a zygote. The proportion 1-r of the gametes produced by this zygote are parental (or nonrecombinant) gametes and fraction r are nonparental (or recombinant) gametes. A particular gamete, say AB, has a proportion (1-r) in generation t+1 produced without recombination. The frequency with which this gamete is produced in this way is (1-r)p AB (t).
Also this gamete is generated as a recombinant from the genotypes formed by the gametes containing allele A and the gametes containing allele B. The frequencies of the gametes containing alleles A or B are p A (t) and p B (t), respectively. So the frequency with which AB arises in this way is rp A (t)p B (t).
Therefore the frequency of AB in the generation t+1 is
p AB (t+1) = (1-r)p AB (t) + rp A (t)p B (t)
By subtracting is p A (t)p B (t) from both sides of the above equation, we have
(1) Two genes A with allele A and a, B with alleles B and b, whose population frequencies are denoted by p A , p a (=1- p A ) and p B , p b (=1- p b ), respectively
(2) These two genes are associated with each other, having the coefficient of linkage disequilibrium D
Four gametes are observed as follows:
Gamete AB Ab aB ab Total
Obs 474 611 142 773 2n=2000
Gamete frequency p AB p Ab p aB p ab =474/2000 =611/2000 =142/2000 =773/2000
This means that when the population undergoes random mating, the LD decays exponentially in a proportion related to the recombination fraction.
(1) Population structure and evolution
Estimating D, D' and R the mating history of
population
The larger the D’ and R estimates, the more likely the population in nonrandom mating, the more likely the population to have a small size, the more likely the population to be affected by evolutionary forces.
Reich, D. E., M. Cargill, S. Bolk, J. Ireland, P. C. Sabeti, D. J. Richter, T. Lavery, R. Kouyoumjian, S. F. Farhadian, R. Ward and E. S. Lander, 2001 Linkage disequilibrium in the human genome. Nature 411: 199-204.
Dawson, E., G. R. Abecasis, S. Bumpstead, Y. Chen et al. 2002 A first-generation linkage disequilibrium map of human chromosome 22. Nature 418: 544-548.
41.
LD curve for Swedish and Yoruban samples. To minimize ascertainment bias, data are only shown for marker comparisons involving the core SNP. Alleles are paired such that D' > 0 in the Utah population. D' > 0 in the other populations indicates the same direction of allelic association and D' < 0 indicates the opposite association. a , In Sweden, average D' is nearly identical to the average |D'| values up to 40-kb distances, and the overall curve has a similar shape to that of the Utah population (thin line in a and b ). b , LD extends less far in the Yoruban sample, with most of the long-range LD coming from a single region, HCF2 . Even at 5 kb, the average values of |D'| and D' diverge substantially. To make the comparisons between populations appropriate, the Utah LD curves are calculated solely on the basis of SNPs that had been successfully genotyped and met the minimum frequency criterion in both populations (Swedish and Yoruban) (Reich,te al. 2001)
Individuals that are related to each other by ancestry are called relatives;
Mating between relatives is called inbreeding;
The consequence of inbreeding is to increase the frequency of homozygous genotypes in a population, relative to the frequency that would be expected with random mating (Hartl 1999).
as the inbreeding coefficient. Biologically, F measures the degree with which heterozygosity is reduced due to inbreeding, measured as a fraction relative to heterozygosity expected in a random-mating population.
Controlled cross (self fertilization): F = (P Aa – P 0Aa )/ P 0Aa = (1/2 – 1/4)/½ = 1
Natural population: AA Aa aa Total
Example 1 Obs 224 64 6 294
Freq .762 .218 .020 p A =.871, pa=.129
Exp (HWE) .769 .224 .017
F = (P 0Aa – P Aa )/ P 0Aa = (.224-.218)/.224=.027
Example 2 Obs 234 36 6 276
Freq .848 .130 .022 p A =.913, pa=..087
Exp (HWE) .719 .159 0
F = (P 0Aa – P Aa )/ P 0Aa = (.159-.130)/.159=.101
The common ancestor A generates two gametes G1 and G2 during meiosis, but only transmits one gamete for its first offspring B and one gamete for its second offspring C.
A pair of gametes contributed to offspring B and C by A may be G1G1, G1G2, G2G1, G2G2, each with a probability of 1/4 because of Mendelian segregation.
For G1G1 and G2G2, the alleles are clearly IBD,
For G1G2 and G2G1, the alleles are IBD only if G1 and
G2 are IBD, and G1 and G2 are IBD only if individual A is
autozygous, which has probability F A (the inbreeding
coefficient of A)
The probability for A to generate IBD alleles for B and D is therefore 1/4 + 1/4 + 1/4F A + 1/4F A = 1/2(1 + F A ).
For a Hardy-Weinberg equilibrium (HWE) population, the genotype frequencies will remain unchanged from generation to generation. Two questions may arise that concern HWE.
First of all , no HWE population exists in nature because many evolutionary forces may operate in a population, which cause the genotype frequencies in the population to change.
Secondly , even if a population is at HWE, this equilibrium may be quickly violated because of some particular evolutionary forces.
These so-called evolutionary forces that cause the structure and organization of a population to change include mutation , selection , admixture , division , migration , genetic drift … Next, we will talk about the roles of some of these evolutionary forces in shaping a population.
With the definition of mutation rate u ( a fraction u of A alleles undergo mutation and become a alleles, whereas a fraction 1-u of A alleles escape mutation and remain A ), we have allele frequency in the next generation t+1
Assuming that the initial population is nearly fixed for A, i.e., p A (0) ≈ 1 , and that t+1 is not too large relative to 1/u, we can approximate the allele frequencies by
p A (t+1) ≈ p A (0) – (t+1)u,
p a (t+1) ≈ p a (0) + (t+1)u .
The frequency of the mutant a allele increases linearly with time and the slope of the line equals u.
Because u is small, the linear increase in p a is difficult to detect unless a very large population size is used.
Admixture is an evolutionary process in which two or more HWE populations with differing allele frequencies are mixed to produce a new population.
The consequence of admixture is the deficiency of heterozygous genotypes relative to the frequency expected with HWE for the average allele frequencies
It can be seen that genotype frequencies are not equal to the products of the allele frequencies for the admixed population so that the mixed population is not in HWE.
Discovery 2
Relative to an HWE population, the aggregate population contains too few heterozygous genotypes and too many homozygous genotypes.
2 is actually the difference between the genotype frequencies ( R S ) in the metapopulation (equal to the average genotype frequencies among the subpopulations) and the genotype frequencies ( R T ) that would be expected in a total population in HWE., i.e.,
The average frequency of homozygous recessive genotypes among a group of subpopulations is always greater than the frequency of homozygous recessive genotypes that would be expected with random mating, and excess is numerically equal to the variance in the recessive allele frequency .
The relationship R S = R T + 2 R T is called Wahlund’s principle
This is a fundamental relation in population genetics that connects the fixation index in a metapopulation with the variance in allele frequencies among the subpopulations . The fixation index can be interpreted in terms of the inbreeding coefficient. Thus, the genotype frequencies in a metapopulation are expressed as
AA: p - A 2 + p - A p - a F ST = p - A 2 (1-F ST ) + p - A F ST
Aa: 2p - A p - a - 2p - A p - a F ST = 2p - A p - a (1-F ST )
aa: p - a 2 + p - A p - a F ST = p - a 2 (1-F ST ) + p - a F ST
Even though each subpopulation itself is undergoing random mating and is in HWE, there is inbreeding in the metapopulation composed of the aggregate of subpopulations.
A metapopulation may be composed of many smaller subpopulations each of which may be in HWE (theory for population structure).
Viability or survivorship : the probability of survival, which is also called fitness .
Fitness has two types: Absolute fitness separately for each genotype and relative fitness (the ability of one genotype to survive relative to another genotype taken as a standard)
It is impossible to measure absolute fitness because it requires knowing the absolute number of each genotype, whereas relative fitness can be measured by the sampling approach
New frequencies p’ A = (2 5+8)/[2(5+8+3)]=18/32 p’ a =(3 2+8)/[2(5+8+3)]=14/32
Random mating with HWE leads to
AA: P AA = (18/32) 2 20 = 6
Aa: P Aa = 2(18/32)(14/32) 20 = 10
Aa: P aa = (14/32) 2 20 = 4
85.
Define h = hs/s as the degree of dominance of allele a. We have
h = 0 means that a is recessive to A,
h = ½ means that the heterozygous fitness is the arithmetic average of the homozygous fitnesses; in this case, the effects of the alleles are said to be additive effects
h = 1 means that allele a is dominant to allele A.
It is possible that h < 0 or h > 1.
86.
In general, the allele frequencies in the next generation after diploid selection are expressed as
where the dominator is the average fitness in the population, symbolized by
87.
This equation has no analytical solution, and for this reason it is more useful to calculate the difference
In the initial population, P AA = 0, P Aa = 2/3, P aa = 1/3, so we have p A = 1/3 and p a = 2/3. The fitness is measured, w AA = 0, w Aa = 0.50 and w aa = 1.
With the selection coefficient (s), the degree of dominance (h) and 1 (if selection is weak), the difference in allele frequency can be expressed as
p A = p A p a s[p A h + p a (1-h)].
90.
The time t required for the allele frequency of A to change from p A (0) to p A (t) can be determined in each of the three following special cases:
1. Allele A is a favored dominant , in which case h = 0 and p A = p A p a 2 s, i.e.,
,
In the special case, p a (0) = p a (t) = 1, we have
Migration : The movement of individuals among subpopulations
Random genetic drift : Fluctuations in allele frequency that happen by chance, particularly in small populations, as a result of random sampling among gametes
Mutation-selection balance: Selection and mutation affect a population at the same time
Be the first to comment