Neutral theory
Ranajit Das, PhD
Assistant Professor
Yenepoya Research Centre
Yenepoya (Deemed to be University)
Mangalore, Karnataka
Brief Background
Organization of Eukaryotic Genome
Gene
Mutation: The main source of variation
Synonymous Non-Synonymous
Mutation rate (µ)
= P0
m
Farlow et al. (2015)
Mutations and alleles
S = the number of nucleotide sites that differ among the aligned sequences (segregating sites)
Π = the average number of nucleotide mismatches over every possible pairwise comparison among the aligned sequences
Nucleotide diversity
$\theta = 4 N_e \mu$
where μ = the mutation rate per generation, across the entire nucleotide sequence
$\theta_\Pi = \Pi$ and $\theta_S = S / \sum_{i=1}^{n-1} \frac{1}{i}$
where n = the number of aligned sequences
(Tajima 1989)
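The definitions of S and Π can be made concrete with a short sketch. The alignment below is a made-up toy example, not the data from the slides, and the function names are hypothetical. θS is assumed to take the standard form S divided by the sum of 1/i for i = 1 to n-1.

```python
from itertools import combinations

def segregating_sites(seqs):
    """Count alignment columns that contain more than one nucleotide (S)."""
    return sum(1 for column in zip(*seqs) if len(set(column)) > 1)

def mean_pairwise_mismatches(seqs):
    """Average number of mismatches over every pair of sequences (Pi)."""
    pairs = list(combinations(seqs, 2))
    total = sum(sum(a != b for a, b in zip(s1, s2)) for s1, s2 in pairs)
    return total / len(pairs)

def theta_s(seqs):
    """Estimate theta from S: S / (1 + 1/2 + ... + 1/(n-1))."""
    n = len(seqs)
    a1 = sum(1 / i for i in range(1, n))
    return segregating_sites(seqs) / a1

# Hypothetical toy alignment: 4 sequences, 10 sites
toy_alignment = [
    "ATGCATGCAT",
    "ATGCATGCAT",
    "ATGTATGCAT",
    "ATGTATGAAT",
]

s = segregating_sites(toy_alignment)
pi = mean_pairwise_mismatches(toy_alignment)
theta_from_s = theta_s(toy_alignment)
```

Here Π itself serves as the estimate θΠ, so comparing `pi` with `theta_from_s` is the comparison the later neutrality tests build on.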
𝛉 and heterozygosity
$H = \frac{\theta}{1 + \theta} = \frac{4 N_e \mu}{1 + 4 N_e \mu}$
This is the equilibrium heterozygosity under the infinite-allele model and neutral evolution
Heterozygosity (H)
$H_{g+1} = \left(1 - \frac{1}{2 N_e}\right) H_g$
where H_g is the heterozygosity of the current generation and H_{g+1} is the heterozygosity in the next generation
As the effective population size (Ne) goes down, the heterozygosity in the next generation decreases
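A small sketch of how heterozygosity responds to Ne, assuming the standard drift recursion H_{g+1} = (1 - 1/(2Ne)) H_g and the infinite-allele equilibrium H = θ/(1 + θ) with θ = 4Neμ. The function names and the numbers used are illustrative assumptions only.

```python
def next_heterozygosity(h, ne):
    """Heterozygosity one generation later under drift alone."""
    return (1 - 1 / (2 * ne)) * h

def heterozygosity_after(h0, ne, generations):
    """Heterozygosity after several generations of drift alone."""
    return h0 * (1 - 1 / (2 * ne)) ** generations

def equilibrium_heterozygosity(ne, mu):
    """Mutation-drift equilibrium under the infinite-allele model."""
    theta = 4 * ne * mu
    return theta / (1 + theta)

# Hypothetical comparison: the same starting H in a small and a large population
h_small = next_heterozygosity(0.5, ne=10)
h_large = next_heterozygosity(0.5, ne=1000)
```

With the same starting value, the smaller population loses heterozygosity faster in each generation, which is the point made on the slide.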
Natural Selection
• Differing viability and/or fertility of different
genotypes
• Accounts for ‘Adaptive’ Evolution
• Certain traits in a population are under
selection
• Individuals with an ‘adaptive’ trait are more successful than others in passing on their genes (Higher Fitness)
• Offspring inherit that ‘adaptive’ trait
• Under strong selection pressure adaptive traits
become universal => Populations evolve
Types of Natural Selection
Natural selection is all about increasing fitness
Balancing selection
Tempo of Natural Selection
• Negative (Purifying) Selection eliminates disadvantageous mutations from the population
• Positive Selection favors beneficial mutations and increases their frequency in the population
• Hitchhiking can occur when positive selection strongly favors certain mutations (selective sweep)
- Along with the ‘beneficial’ site, closely linked neutral (or even slightly deleterious) sites increase in frequency
• Background selection: strong negative selection against deleterious mutations, which also removes variation at closely linked neutral sites
Hard and soft selective sweep
Population size is a factor!
• For selection to operate, we need a large, diverse population in which individuals differ with respect to a certain trait
• In the absence of individual variation, selection cannot occur (Neutrality)
• In a small population, changes in allele frequencies mostly take place by chance alone (Genetic Drift)
Selection is NOT the only factor that causes evolution:
Genetic Drift
• Accounts for Non-adaptive Evolution
• Change in allele frequency in a population due to chance events/sampling error
Assumes two alleles: P(A) = p; P(a) = q
Assumes non-overlapping generations
Probability of exactly i A alleles in the next generation:
$P(i) = \frac{(2N)!}{(2N - i)!\, i!}\, p^{i} q^{2N - i}$
https://www.radford.edu/~rsheehy/Gen_flash/popgen/
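The binomial sampling probability for drift can be evaluated directly. This is a minimal sketch with a hypothetical function name, assuming a diploid population of N individuals (2N gene copies) and q = 1 - p.

```python
from math import comb

def prob_i_copies(i, n_diploid, p):
    """Probability of exactly i copies of allele A in the next generation."""
    two_n = 2 * n_diploid
    q = 1 - p
    return comb(two_n, i) * p**i * q**(two_n - i)

# Hypothetical example: N = 5 diploid individuals, allele A at frequency 0.1
n_individuals = 5
freq_a = 0.1
distribution = [prob_i_copies(i, n_individuals, freq_a)
                for i in range(2 * n_individuals + 1)]
prob_loss = distribution[0]  # chance that A is lost in a single generation
```

In a population this small, the chance of losing a rare allele in one generation is substantial, which illustrates why drift dominates when N is small.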
Kimura’s Neutral theory: Null model for
selection
• Most mutations are neutral:
selection does not influence
evolution much at the
molecular level.
• Thus, they do not affect fitness.
• The majority of evolution at the molecular level is caused by random genetic drift.
Neutral Theory and Nearly Neutral Theory
• Neutrality of mutations means that they are selectively equivalent to each other
• A ‘neutral’ mutation does not imply that it is functionless or
evolutionary ‘noise’ or loss of genetic information
• The neutral theory assumes that mutations are either neutral
or deleterious.
• Deleterious mutations are eliminated quickly by natural
selection and all remaining mutations are selectively neutral
The Neutralist-Selectionist debate
Which one predominates? Drift or Selection?
Tests for selection considering Neutral theory
as the null hypothesis
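Tajima's D:
$D = \frac{\theta_\Pi - \theta_S}{\sqrt{\mathrm{Var}(\theta_\Pi - \theta_S)}}$, with $\theta_S = S / \sum_{i=1}^{n-1} \frac{1}{i}$
where n = the number of aligned sequences
D < 0 (excess of rare alleles)
D > 0 (very few rare alleles): recent decrease in effective population size
(Tajima 1989)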
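A sketch of the test statistic, assuming the standard variance constants from Tajima (1989) for the denominator. The function name and the input values are illustrative assumptions. The statistic is undefined when there are no segregating sites, and with fewer than four sequences the variance term is zero, so the sketch rejects those inputs.

```python
from math import sqrt

def tajimas_d(n, s, pi):
    """Tajima's D from n sequences, S segregating sites and Pi."""
    if n < 4 or s < 1:
        raise ValueError("need at least 4 sequences and 1 segregating site")
    a1 = sum(1 / i for i in range(1, n))
    a2 = sum(1 / i**2 for i in range(1, n))
    b1 = (n + 1) / (3 * (n - 1))
    b2 = 2 * (n**2 + n + 3) / (9 * n * (n - 1))
    c1 = b1 - 1 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1**2
    e1 = c1 / a1
    e2 = c2 / (a1**2 + a2)
    theta_s = s / a1  # estimate of theta from segregating sites
    return (pi - theta_s) / sqrt(e1 * s + e2 * s * (s - 1))

# Hypothetical inputs: 10 sequences, 16 segregating sites, Pi = 3.89
d_value = tajimas_d(n=10, s=16, pi=3.888889)
```

A negative value such as this one means θΠ is smaller than θS, which is the "excess of rare alleles" case.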
Example: for DIY 
S = ?
Π = ?
θΠ = ?
θS = ?
Are these sequences evolving Neutrally?
For Π => pairwise comparisons should be considered one nucleotide site at a time, summing the total number of mismatches
For example, there are 6 sites (doubletons) where 3 nucleotides are of one type and 2 nucleotides are of the other type. So, here the mismatch total will be: (2x3) x 6 = 36
What about this site?
(2x3)+(1x1) = 7
• Given the small sample size, there is good agreement between θΠ and θS. We can consider neutrality here
• Excess of singletons: common among natural populations. It indicates either recent population growth or deleterious substitutions maintained at a lower frequency
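The per-site mismatch counting used in the DIY example can be checked with a few lines. This is only a sketch: the function name is hypothetical, the sample is assumed to contain 5 sequences, and the last site is assumed to carry three alleles with counts 3, 1 and 1, which is one configuration that reproduces the 7 on the slide.

```python
from itertools import combinations
from math import comb

def site_mismatches(allele_counts):
    """Number of mismatching sequence pairs at one site, given allele counts."""
    return sum(a * b for a, b in combinations(allele_counts, 2))

n_sequences = 5                      # assumed sample size
n_pairs = comb(n_sequences, 2)       # number of pairwise comparisons

doubleton_total = 6 * site_mismatches([3, 2])   # six sites with a 3:2 split
three_allele_site = site_mismatches([3, 1, 1])  # assumed counts for the last site
```

Dividing the grand total of mismatches across all sites by `n_pairs` gives Π.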
MK test: John H. McDonald and Martin Kreitman (1991)

                  Between-species divergence    Within-species polymorphism
Non-synonymous    A                             B
Synonymous        C                             D

If A/C = B/D
- We cannot reject the null hypothesis of neutral evolution
If A/C > B/D
- Increased non-synonymous changes between species = Positive Selection
If A/C < B/D
- Decreased non-synonymous changes between species = Negative or Purifying Selection
Significance: Fisher's exact test (or χ2 test)
Example
For ant:

                  Between-species divergence    Within-species polymorphism
Non-synonymous    3                             2
Synonymous        2                             34

3/2 (1.5) >> 2/34 (0.06)
- indicates increased non-synonymous changes between species, i.e. operation of positive selection at this locus
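The 2x2 comparison in this example can be tested with Fisher's exact test. The sketch below implements a two-sided test with the standard library only. The function name is hypothetical, and the "sum all tables no more likely than the observed one" rule is one common two-sided convention, assumed here.

```python
from math import comb

def mk_fisher_exact(a, b, c, d):
    """Two-sided Fisher's exact test on the 2x2 table [[a, b], [c, d]].

    a, c = non-synonymous, synonymous differences between species
    b, d = non-synonymous, synonymous polymorphisms within species
    """
    row1, row2 = a + b, c + d
    col1 = a + c
    total = row1 + row2

    def weight(k):
        # Number of ways to get k in the top-left cell with fixed margins
        return comb(row1, k) * comb(row2, col1 - k)

    k_min = max(0, col1 - row2)
    k_max = min(row1, col1)
    observed = weight(a)
    # Sum every table that is no more likely than the observed one
    extreme = sum(weight(k) for k in range(k_min, k_max + 1)
                  if weight(k) <= observed)
    return extreme / comb(total, col1)

ratio_between = 3 / 2    # A/C
ratio_within = 2 / 34    # B/D
p_value = mk_fisher_exact(3, 2, 2, 34)
```

For these counts the p-value comes out at roughly 0.009, so the excess of non-synonymous differences between species is unlikely under neutrality.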
HKA test: Richard R. Hudson, Martin Kreitman and Montserrat Aguadé (1987)
• When the difference in θ between the two loci can be explained by a difference in μ, with N the same for both loci => Neutral evolution
• When θ differs but μ is the same for the two loci, so that N differs between the two loci => Natural selection
The difference between the HKA test and the MK test:
The HKA test uses two loci, so the influence of demographic changes can be compared across them
The similarity between the HKA test and the MK test:
Both are polymorphism-divergence tests, i.e. both tests compare two organisms
The HKA test was the precursor of the MK test
Dn/Ds test
$\omega = \frac{D_n}{D_s}$
Advantages:
1. Quick, computationally less intensive
2. Comparatively easier to interpret
Disadvantages:
1. ~80 variant sites needed for reliability: lots of variation needed
2. Codon dependency: susceptible to codon usage bias
3. Synonymous substitutions can also influence protein function, e.g. protein secondary structure
Empirical example: My ‘PhD gene’ – CRTAC1
The pair-wise ω (Ka/Ks) values

           Human     Chimp     Gorilla
Chimp      0.0936
Gorilla    0.0701    0.1227
Orang      0.0580    0.0408    0.0546

ω << 1 indicates strong Purifying selection is operating at the coding region of CRTAC1
Linkage Disequilibrium (LD)
• Also known as gametic disequilibrium
• Non random association between alleles
• Physical closeness of the loci not required
• A population characteristic
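Measures of Linkage Disequilibrium (LD)
(pA, qa = frequencies of alleles A and a; pB, qb = frequencies of alleles B and b)
At linkage equilibrium
$D = P_{AB} P_{ab} - P_{Ab} P_{aB} = 0$
or, $P_{AB} P_{ab} = P_{Ab} P_{aB}$
At LD, $P_{AB} P_{ab} \neq P_{Ab} P_{aB}$
If D is negative (excess of Ab and aB gametes),
$D' = D / D_{min}$, [$D_{min}$ = the larger of $-p_A p_B$ and $-q_a q_b$]
If D is positive (excess of AB and ab gametes),
$D' = D / D_{max}$, [$D_{max}$ = the smaller of $p_A q_b$ and $q_a p_B$]
(D’ = recombination-dependent measurement)
$r^2 = \frac{D^2}{p_A q_a p_B q_b}$ (correlation between the loci)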
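The LD measures can be computed from the four gamete (haplotype) frequencies. This is a minimal sketch with hypothetical names. It assumes the bounds for D' are built from the allele frequencies of A, a, B and b, and that the correlation measure is r² = D² / (pA qa pB qb).

```python
def ld_measures(p_AB, p_Ab, p_aB, p_ab):
    """Return (D, D', r^2) from the four gamete frequencies."""
    p_A = p_AB + p_Ab          # frequency of allele A
    q_a = p_aB + p_ab          # frequency of allele a
    p_B = p_AB + p_aB          # frequency of allele B
    q_b = p_Ab + p_ab          # frequency of allele b
    if min(p_A, q_a, p_B, q_b) <= 0:
        raise ValueError("both loci must be polymorphic")

    d = p_AB * p_ab - p_Ab * p_aB
    if d < 0:
        bound = max(-p_A * p_B, -q_a * q_b)   # D_min (a negative number)
    else:
        bound = min(p_A * q_b, q_a * p_B)     # D_max
    d_prime = d / bound
    r_squared = d**2 / (p_A * q_a * p_B * q_b)
    return d, d_prime, r_squared

# Hypothetical gamete frequencies with an excess of AB and ab
d, d_prime, r_squared = ld_measures(0.4, 0.1, 0.1, 0.4)
```

Because D is divided by a bound of the same sign, D' lies between 0 and 1 in this sketch, and equals 1 when one of the four gametes is absent.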
Difference between Linkage and Linkage Disequilibrium (LD)
Linkage | LD
Linkage is a physical state of two alleles being linked due to the chromosomal organization of the genome | LD refers to the presence of a statistical non-random association between alleles
Character of a family (pedigree) | Population trait affected by evolutionary forces such as mutation, migration and selection within the population
Physical association between alleles is a must | Physical association is not necessary. Two alleles can even be on two different chromosomes and still be in LD due to long-range and/or epistatic interactions
All linkage can lead to LD | Not all LD requires physical linkage
In the absence of selection, recombination will eventually break down the association between the two alleles at adjacent loci.
The more tightly the two loci are physically linked, the slower this process will be.
Linkage and LD are not the same
EHH (Extended Haplotype Homozygosity)
• Under neutral evolution, a new variant takes a long time to reach high frequency in the population; LD around this variant decays during this period due to recombination
• Under positive selection (selective sweep), a new variant increases its frequency rapidly in the population over a short period of time; LD around this variant does not have enough time to decay because too few recombination events occur
• Signatures of positive selection at a given locus can be detected using the breakdown of LD as a clock for estimating the age of the alleles
Approach of EHH (Sabeti et al. 2002, 2007)
• First identify the core region
• Core region: an area of high SNP density in which recombination between the SNPs is extremely rare (|D’| = 1)
• Next, measure LD at a distance x from the core region by calculating EHH
• EHH between two SNPs, A and B, can be defined as the probability that two randomly chosen chromosomes are autozygous (identical by descent) at all SNPs between A and B
$EHH = \frac{\sum_{i=1}^{G} \binom{n_i}{2}}{\binom{N}{2}}$
where G = the number of homozygous groups; N = the number of samples carrying a particular core haplotype; n_i = the number of elements in each group
EHH detects transmission of an extended haplotype
without recombination
EHH ranges between 0 (no homozygosity) and 1 (complete
homozygosity)
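A sketch of the EHH calculation, assuming EHH is the number of identical pairs within the homozygous groups divided by the number of all pairs of chromosomes carrying the core haplotype. The function name and the haplotype strings are made up for illustration.

```python
from collections import Counter
from math import comb

def ehh(extended_haplotypes):
    """EHH for chromosomes that all carry the same core haplotype.

    Each string is the haplotype from the core out to distance x.
    Identical strings form one homozygous group.
    """
    n = len(extended_haplotypes)
    if n < 2:
        raise ValueError("need at least two chromosomes")
    group_sizes = Counter(extended_haplotypes).values()
    identical_pairs = sum(comb(size, 2) for size in group_sizes)
    return identical_pairs / comb(n, 2)

# Hypothetical sample: 4 chromosomes with the same core, one has recombined
sample = ["AAT", "AAT", "AAT", "AAC"]
ehh_value = ehh(sample)
```

Repeating the calculation at increasing distances from the core traces how quickly EHH decays.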
Bifurcation diagram showing the breakdown of LD at increasing distances from the selected core region. The root of each diagram is the core haplotype. The thickness of the lines corresponds to the number of samples with the indicated long-distance haplotype
Cross-population Extended Haplotype Homozygosity (XP-EHH)
• XP-EHH tests whether a given site is homozygous in one population but polymorphic in another (Ma et al. 2014)
• Compares the EHH scores of two populations on one core haplotype
• At the C21orf34 locus, extreme XP-EHH scores are seen in non-Africans (most extreme in Europeans) (Pickrell et al. 2009)
• A haplotype in this region swept to near fixation at some point since the out-of-Africa migration
Re-think

Neutral theory 2019
