8. S = Number of nucleotide sites that differ
among the aligned sequences (segregating
sites)
Π = the average number of nucleotide
mismatches over every possible pairwise
comparison among the aligned sequences
Nucleotide diversity
$\theta = 4N_e\mu$, where μ = the mutation rate per generation,
across the entire nucleotide sequence
$\theta_S = S / a_n$ with $a_n = \sum_{i=1}^{n-1} 1/i$, where n = the number of aligned sequences
(Tajima 1989)
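The definitions of S and Π above can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the toy alignment and the function names are hypothetical, and θS is computed with the usual Watterson form S / a_n, which is assumed here rather than taken from the slide.

```python
from itertools import combinations

def segregating_sites(seqs):
    """S: number of alignment columns containing more than one nucleotide."""
    return sum(1 for column in zip(*seqs) if len(set(column)) > 1)

def mean_pairwise_differences(seqs):
    """Pi: mismatches averaged over every possible pair of sequences."""
    pairs = list(combinations(seqs, 2))
    total = sum(sum(a != b for a, b in zip(s1, s2)) for s1, s2 in pairs)
    return total / len(pairs)

def watterson_theta(seqs):
    """theta_S = S / a_n, with a_n = 1 + 1/2 + ... + 1/(n-1) (assumed form)."""
    n = len(seqs)
    a_n = sum(1 / i for i in range(1, n))
    return segregating_sites(seqs) / a_n

# Hypothetical toy alignment of four sequences, five sites each.
alignment = ["ATGCA", "ATGCT", "ACGCA", "ATGCA"]
print(segregating_sites(alignment))
print(mean_pairwise_differences(alignment))
print(watterson_theta(alignment))
```

Under neutrality the two estimates, Π and θS, are expected to be similar, which is the comparison used later in the deck.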
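9. θ and heterozygosity
$H = \frac{4N_e\mu}{4N_e\mu + 1} = \frac{\theta}{\theta + 1}$
This is the equilibrium
heterozygosity under the
infinite-alleles model with
neutral evolution
$H_{g+1} = \left(1 - \frac{1}{2N_e}\right) H_g$
where $H_g$ is the heterozygosity of
the current generation and $H_{g+1}$ is
the heterozygosity in the next
generation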
As the effective population size (Ne) goes down, the
heterozygosity in the next generation decreases
Heterozygosity (H)
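A small numerical sketch of the two statements on this slide. It assumes the standard drift recursion H_{g+1} = (1 - 1/(2Ne)) H_g and the infinite-alleles equilibrium H = θ/(1 + θ) with θ = 4Neμ. The function names and parameter values are illustrative only.

```python
def next_heterozygosity(h, ne):
    """Heterozygosity one generation later under drift alone (assumed recursion)."""
    return (1 - 1 / (2 * ne)) * h

def heterozygosity_after(h0, ne, generations):
    """Apply the one-generation recursion repeatedly."""
    h = h0
    for _ in range(generations):
        h = next_heterozygosity(h, ne)
    return h

def equilibrium_heterozygosity(ne, mu):
    """Mutation-drift equilibrium under the infinite-alleles model (assumed form)."""
    theta = 4 * ne * mu
    return theta / (1 + theta)

# Smaller Ne loses heterozygosity faster (hypothetical values).
for ne in (10, 100, 1000):
    print(ne, round(heterozygosity_after(0.5, ne, 20), 4))
```

The loop shows the point made on the slide: the smaller the effective population size, the lower the heterozygosity in later generations.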
10. Natural Selection
• Differing viability and/or fertility of different
genotypes
• Accounts for ‘Adaptive’ Evolution
• Certain traits in a population are under
selection
• Individuals with an ‘adaptive’ trait are more
successful than others in passing on their
genes (higher fitness)
• Offspring inherit that ‘adaptive’ trait
• Under strong selection pressure, adaptive traits
become universal => populations evolve
11. Types of Natural Selection
Natural selection is all about increasing fitness
Balancing selection
12. Tempo of Natural Selection
• Negative (Purifying) Selection eliminates
disadvantageous mutations from the population
• Positive Selection favors beneficial mutations and
increases their frequency in the population
• Hitchhiking can occur when positive selection strongly
favors certain mutations (selective sweep)
- Along with that ‘beneficial’ site, the closely
linked neutral (or even slightly deleterious) sites increase
in frequency
• Background selection: strong negative selection against
deleterious substitutions, which also removes variation at closely linked sites
14. Population size is a factor!
• For selection to operate, we need a large, diverse population, where the
individuals differ with respect to a certain trait
• In the absence of individual variation, selection cannot occur
(Neutrality)
• In a small population, changes in allele frequencies mostly take place
by chance alone (Genetic Drift)
15. Selection is NOT the only factor that causes evolution:
Genetic Drift
• Accounts for Non-adaptive Evolution
• Change in allele frequency in a population due
to chance events/sampling error
16. Assumes two alleles: P(A) = p; P(a) = q
Assumes non-overlapping generations
Probability of exactly i A alleles in the next generation:
$P(i) = \frac{(2N)!}{(2N-i)!\,i!}\,p^{i}q^{2N-i}$
https://www.radford.edu/~rsheehy/Gen_flash/popgen/
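The binomial sampling probability on this slide can be evaluated directly. The sketch below assumes a diploid population of N individuals (2N gene copies) with non-overlapping generations. The function name and the example numbers are hypothetical.

```python
from math import comb

def prob_i_copies(i, n_individuals, p):
    """Probability of exactly i copies of allele A in the next generation.

    Binomial sampling of 2N gene copies with P(A) = p and P(a) = q = 1 - p.
    """
    two_n = 2 * n_individuals
    q = 1 - p
    return comb(two_n, i) * p**i * q**(two_n - i)

# Hypothetical example: N = 10 diploid individuals, p = 0.5.
n_individuals, p = 10, 0.5
for i in (0, 5, 10, 15, 20):
    print(i, prob_i_copies(i, n_individuals, p))
```

Repeating the calculation with a smaller N shows how much more probability mass falls far from 2Np, which is the sampling error behind genetic drift.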
17. Kimura’s Neutral theory: Null model for
selection
• Most mutations are neutral:
selection does not influence
evolution much at the
molecular level.
• Such mutations therefore do not affect fitness.
• The majority of evolution at the
molecular level is caused by
random genetic drift.
18. Neutral Theory and Nearly Neutral Theory
• Neutrality of mutations means that they are selectively
equivalent to each other
• Calling a mutation ‘neutral’ does not imply that it is functionless,
evolutionary ‘noise’, or a loss of genetic information
• The neutral theory assumes that mutations are either neutral
or deleterious.
• Deleterious mutations are eliminated quickly by natural
selection and all remaining mutations are selectively neutral
25. Where n = the number of aligned sequences
(Excess of rare alleles)
Recent decrease in effective population size
(Very few rare alleles)
(Tajima 1989)
26. Example: for DIY
S = ?
Π = ?
θΠ = ?
θS = ?
Are these sequences evolving Neutrally?
For Π => Pairwise comparisons should be considered one
nucleotide site at a time, calculating the total number of
mismatches at each site
For example, there are 6 sites (doubletons) where 3
nucleotides are of one type and 2 nucleotides are of the other
type. So, here the mismatch total will be: (2x3) x 6 = 36
What about this site?
(2x3)+(1x1) = 7
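The site-by-site counting can be sketched as follows. The example columns are hypothetical, because the alignment from the slide is not reproduced here. A site with allele counts 3 and 2 gives 6 mismatches. Reading the second example as a site with allele counts 3, 1 and 1 (an assumption) gives (3x1) + (3x1) + (1x1) = 7.

```python
from collections import Counter
from itertools import combinations
from math import comb

def site_mismatches(column):
    """Pairwise mismatches at one site: sum of products of allele counts."""
    counts = Counter(column).values()
    return sum(a * b for a, b in combinations(counts, 2))

def pi_from_columns(columns):
    """Pi: total mismatches over all sites divided by the number of sequence pairs."""
    n = len(columns[0])
    return sum(site_mismatches(col) for col in columns) / comb(n, 2)

print(site_mismatches("AAATT"))        # allele counts 3 and 2
print(site_mismatches("AAATC"))        # allele counts 3, 1 and 1
print(pi_from_columns(["AAATT"] * 6))  # six such doubleton sites, five sequences
```

With five sequences there are 10 pairwise comparisons, so the six doubleton sites alone contribute 36 / 10 = 3.6 to Π.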
27. Given the small sample size, there is good agreement between θΠ and θS. We
can consider the sequences to be evolving neutrally here
Excess of singletons: common among natural populations. It indicates either recent
population growth or deleterious substitutions maintained at a low frequency
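The agreement between θΠ and θS can be put on a common scale with Tajima's D (Tajima 1989). The sketch below uses the standard published constants and takes n, S and Π as inputs. The numbers in the example call are hypothetical and are not the values from the exercise above.

```python
from math import sqrt

def tajimas_d(n, s, pi):
    """Tajima's D from n sequences, S segregating sites and Pi."""
    if s == 0:
        return 0.0  # no variation, so D is undefined; 0.0 is a convenience choice
    a1 = sum(1 / i for i in range(1, n))
    a2 = sum(1 / i**2 for i in range(1, n))
    b1 = (n + 1) / (3 * (n - 1))
    b2 = 2 * (n**2 + n + 3) / (9 * n * (n - 1))
    c1 = b1 - 1 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1**2
    e1 = c1 / a1
    e2 = c2 / (a1**2 + a2)
    # Numerator: theta_Pi minus theta_S; denominator: its estimated standard deviation.
    return (pi - s / a1) / sqrt(e1 * s + e2 * s * (s - 1))

# Hypothetical input: 10 sequences, 16 segregating sites, Pi = 3.888889.
print(round(tajimas_d(10, 16, 3.888889), 3))
```

A negative D means θΠ < θS (excess of rare alleles), a positive D means θΠ > θS (very few rare alleles), and a value near zero is consistent with neutrality.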
28.
                 Between species    Within species
                 diversity          Polymorphism
Non Synonymous        A                  B
Synonymous            C                  D
If, A/C = B/D
- We cannot reject the null hypothesis
of neutral evolution
If, A/C > B/D
- Increased non-synonymous changes
between species = Positive Selection
If, A/C < B/D
- Decreased non-synonymous changes
between species = Negative or
Purifying Selection
Fisher’s Exact Test (χ2)
John H. McDonald and Martin Kreitman (1991)
29. Example
For ant:
                 Between species    Within species
                 diversity          Polymorphism
Non Synonymous        3                  2
Synonymous            2                 34
3/2 (1.5) >> 2/34 (0.06)
- indicates increased non-synonymous changes between
species, i.e. operation of positive selection at this locus
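The significance of such a 2x2 table can be checked with Fisher's exact test. The sketch below is a plain standard-library implementation of the two-sided test, written for illustration. The function name is hypothetical, and the cell labels a, b, c, d follow the A, B, C, D layout used above.

```python
from math import comb

def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test p-value for the table [[a, b], [c, d]]."""
    row1, row2 = a + b, c + d
    col1 = a + c
    n = row1 + row2

    def prob(k):
        # Hypergeometric probability of k in the top-left cell, margins fixed.
        return comb(row1, k) * comb(row2, col1 - k) / comb(n, col1)

    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    p_obs = prob(a)
    # Sum every table that is no more probable than the observed one.
    return sum(prob(k) for k in range(lowest, highest + 1)
               if prob(k) <= p_obs * (1 + 1e-9))

# Counts from the ant example: A = 3, B = 2, C = 2, D = 34.
a, b, c, d = 3, 2, 2, 34
print(a / c, b / d)                        # the two ratios being compared
print(fisher_exact_two_sided(a, b, c, d))  # p-value
```

For these counts the p-value is below 0.05, so the difference between A/C and B/D is unlikely to be due to chance alone.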
31. When the difference in θ between the two loci can be explained by a difference
in μ while N is the same for both loci => Neutral evolution
When θ differs although μ is the same for the two loci, so that N must
differ between the two loci => Natural selection
32. The difference between
the HKA test and the MK test:
The HKA test uses two loci to
compare the influence of
demographic changes
The similarity between
the HKA test and the MK test:
Both are polymorphism-divergence
tests, i.e. both tests
compare two organisms
The HKA test was the
precursor of the MK test
33. (dN/dS test)
$\omega = d_N / d_S$
Advantages:
1. Quick, computationally
less intensive
2. Comparatively easier to
interpret
Disadvantages:
1. ~80 variant sites needed
for reliability – lots of
variation needed
2. Codon dependency -
susceptible to codon usage
bias
3. Synonymous substitutions
can also influence
protein function (e.g. protein
secondary structure)
34. Empirical example: My ‘PhD gene’ – CRTAC1
The pairwise ω (Ka/Ks) values
ω << 1 indicates that strong purifying selection is operating on the
coding region of CRTAC1
          Human     Chimp     Gorilla
Chimp     0.0936
Gorilla   0.0701    0.1227
Orang     0.0580    0.0408    0.0546
35. Linkage Disequilibrium (LD)
• Also known as gametic disequilibrium
• Non random association between alleles
• Physical closeness of the loci not required
• A population characteristic
36. Measures of Linkage Disequilibrium (LD)
At linkage equilibrium
$D = P_{AB}P_{ab} - P_{Ab}P_{aB} = 0$
or, $P_{AB}P_{ab} = P_{Ab}P_{aB}$
At LD, $P_{AB}P_{ab} \neq P_{Ab}P_{aB}$
If D is negative (excess of Ab and aB gametes),
$D' = D/D_{min}$ [$D_{min}$ = the larger of $-p_A p_B$ and $-q_a q_b$]
If D is positive (excess of AB and ab gametes),
$D' = D/D_{max}$ [$D_{max}$ = the smaller of $p_A q_b$ and $q_a p_B$]
(Correlation between the loci)
(D’ = recombination-dependent measurement)
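D and D’ can be computed from the four gamete (haplotype) frequencies. This is a minimal sketch under the definitions above, with allele frequencies derived from the gamete frequencies. The function name and the example frequencies are hypothetical.

```python
def linkage_disequilibrium(p_AB, p_Ab, p_aB, p_ab):
    """Return (D, D') from the four gamete frequencies, which should sum to 1."""
    d = p_AB * p_ab - p_Ab * p_aB
    p_A = p_AB + p_Ab          # frequency of allele A
    p_B = p_AB + p_aB          # frequency of allele B
    q_a, q_b = 1 - p_A, 1 - p_B
    if d < 0:
        bound = max(-p_A * p_B, -q_a * q_b)   # D_min
    else:
        bound = min(p_A * q_b, q_a * p_B)     # D_max
    d_prime = d / bound if bound != 0 else 0.0
    return d, d_prime

print(linkage_disequilibrium(0.25, 0.25, 0.25, 0.25))  # linkage equilibrium
print(linkage_disequilibrium(0.5, 0.0, 0.0, 0.5))      # only AB and ab gametes
print(linkage_disequilibrium(0.4, 0.1, 0.1, 0.4))      # partial LD
```

Because D is divided by a bound of the same sign, D’ computed this way lies between 0 and 1 whatever the allele frequencies are.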
37. Difference between Linkage and Linkage
Disequilibrium (LD)
Linkage: a physical state of two alleles being linked due to the chromosomal organization of the genome
LD: the presence of a statistical non-random association between alleles

Linkage: character of a family (pedigree)
LD: population trait affected by evolutionary forces such as mutation, migration and selection within the population

Linkage: physical association between alleles is a must
LD: physical association is not necessary. Two alleles can even be on two different chromosomes and still be in LD due to long-range and/or epistatic interactions

Linkage: all linkage can lead to LD
LD: not all LD requires physical linkage

In the absence of selection, recombination will eventually break down the association between the two alleles at adjacent loci.
The more tightly the two loci are physically linked, the slower this process will be.
Linkage and LD are not the same
38. EHH (Extended Haplotype Homozygosity)
• Under neutral evolution, a new variant takes a long time to reach high
frequency in the population; LD around this variant decays during this
period due to recombination
• Under positive selection (selective sweep), a new variant increases its
frequency rapidly in the population over a short period of time; LD
around this variant does not have enough time to decay because too few
recombination events occur
• Signatures of positive selection at a given locus can be detected using
the breakdown of LD as a clock for estimating the age of the alleles
39. Approach of EHH (Sabeti 2002, 2007)
• First identify the core region
• Core region: areas of high-density SNPs. Recombination between
these SNPs is extremely rare (|D’| = 1)
• Next, measure LD at a distance x from the core region by calculating
EHH
• EHH between two SNPs, A and B, can be defined as the probability
that two randomly chosen chromosomes are autozygous (identical by
descent) at all SNPs between A and B
$EHH = \frac{\sum_{i=1}^{G} \binom{n_i}{2}}{\binom{N}{2}}$
where G = number of homozygous groups; N = number of samples for a
particular core haplotype; n_i = total number of elements in
each group
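The EHH calculation can be sketched as below. It assumes the form given by Sabeti et al. (2002): the number of chromosome pairs that are identical across the extended region, summed over the G groups of identical haplotypes, divided by the number of all pairs among the N chromosomes carrying the core haplotype. The haplotype strings are hypothetical.

```python
from collections import Counter
from math import comb

def ehh(extended_haplotypes):
    """EHH for chromosomes that all carry the same core haplotype.

    Each string is one chromosome's haplotype from the core out to distance x.
    """
    n = len(extended_haplotypes)
    if n < 2:
        return 0.0
    groups = Counter(extended_haplotypes)   # identical haplotypes form one group
    identical_pairs = sum(comb(size, 2) for size in groups.values())
    return identical_pairs / comb(n, 2)

print(ehh(["ACGT", "ACGT", "ACGT", "ACGT"]))  # no breakdown of LD
print(ehh(["ACGT", "ACGT", "ACGA", "ACGA"]))  # two groups of two
print(ehh(["ACGT", "ACGA", "ACTT", "AGGT"]))  # every chromosome differs
```

Evaluating this function at increasing distances from the core gives the decay curve that EHH-based tests compare between core haplotypes.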
40. EHH detects transmission of an extended haplotype
without recombination
EHH ranges between 0 (no homozygosity) and 1 (complete
homozygosity)
Bifurcation diagram showing the breakdown of LD at increasing
distances from the selected core region. The root of each
diagram is the core haplotype. The thickness of the lines
corresponds to the number of samples with the indicated long-
distance haplotype
41. Cross-population Extended Haplotype
Homozygosity (XP-EHH)
• XP-EHH tests whether a given site is
homozygous in one population but
polymorphic in another (Ma et al. 2014)
• Compares the EHH scores of two populations on
one core haplotype
• At the C21orf34 locus, extreme XP-EHH scores are seen in
non-Africans (most extreme in Europeans)
(Pickrell et al. 2009)
• A haplotype in this region swept to near
fixation at some point since the out-of-
Africa migration