2. Evolution
• Evolution is a gradual change in genetic makeup
from one generation to the next
• Evolution:
• Natural Selection
• Mutation
• Genetic Drift
…
• Natural selection and genetic drift are the two most
important causes of allele substitution in populations
• Mutation and genetic drift are random processes;
natural selection is a nonrandom process
3. Evolution
• Evolution creates species-specific and
population-specific differences
• Are they all selected for advantages to the
species or population?
4. Some definitions
• Locus: position on
chromosome where a
sequence or a gene is located
• Allele: alternative form of the
DNA sequence at a locus
• Written as A vs a, or A vs B
5. Phenotypic vs Molecular Evolution
• Phenotypic evolution is controlled by
natural selection
• Molecular mutations are selectively
neutral in the strict sense that their
fate in evolution is largely determined by
random genetic drift
• Genetic drift is due to
sampling error between generations
Motoo Kimura
6. Random Fluctuation in Allele Frequencies
[Figure: a metapopulation divided into demes; the frequencies p and q of neutral alleles drift to p' and then p_t over time]
• Analogy: a drunk traveler staggering on a train
platform with tracks on both sides
will eventually fall off the edge of
the platform onto one or the other
track
7. Genetic Drift
• Over time, the allele frequency in each sub-population
fluctuates and diversity in each sub-population
decreases until an allele is fixed (100%) or lost
(0%)
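The fluctuation described above can be sketched with a small simulation. This is a minimal sketch assuming the Wright-Fisher sampling model (each generation's 2N gene copies are drawn at random from the previous generation); the model, the function name `wright_fisher`, and its parameters are illustrative assumptions, not something the slides specify.

```python
import random


def wright_fisher(p0, n_diploid, seed=0, max_generations=100_000):
    """Simulate neutral drift of one allele in a single deme.

    Assumed model: Wright-Fisher sampling. Each generation, the 2N gene
    copies of the next generation are drawn at random with probability p
    of carrying the allele. Returns the allele frequency per generation.
    """
    rng = random.Random(seed)
    copies = 2 * n_diploid
    p = p0
    trajectory = [p]
    for _ in range(max_generations):
        if p == 0.0 or p == 1.0:
            break  # allele lost (0%) or fixed (100%)
        count = sum(1 for _k in range(copies) if rng.random() < p)
        p = count / copies
        trajectory.append(p)
    return trajectory


traj = wright_fisher(p0=0.5, n_diploid=20, seed=1)
print(f"generations simulated: {len(traj) - 1}")
print(f"final allele frequency: {traj[-1]}")
```

Running it with different seeds gives different paths, but each run should end at a frequency of 0 or 1.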
8. Factors Influencing Genetic Drift
• Deme: a local population of individuals of the same
species that typically interbreed with one another
• Initial mutation (allele) occurs in a deme of N
individuals (effective population size)
• Assuming neutral evolution, the new allele is one copy among 2N,
so its probability of being sampled into an offspring gene copy is 1/(2N)
• The probability that a neutral mutation is fixed equals its
initial frequency, 1/(2N): it is more likely to be fixed in a
smaller population and more likely to be lost in a larger one
• Founder effect: a new colony starts from a few
members (small N) of the initial population
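The 1/(2N) fixation probability can be checked numerically. The sketch below again assumes Wright-Fisher sampling (an assumption, not stated on the slides) and starts every replicate from a single new copy; `fixation_fraction` is a hypothetical helper name.

```python
import random


def fixation_fraction(n_diploid, replicates, seed=0):
    """Fraction of replicates in which one new neutral copy becomes fixed."""
    rng = random.Random(seed)
    copies = 2 * n_diploid
    fixed = 0
    for _ in range(replicates):
        count = 1  # a new mutation is one copy among 2N
        while 0 < count < copies:
            p = count / copies
            # Draw the next generation's 2N copies at random
            count = sum(1 for _k in range(copies) if rng.random() < p)
        if count == copies:
            fixed += 1
    return fixed / replicates


for n in (5, 20):
    estimate = fixation_fraction(n, replicates=2000, seed=42)
    print(f"N={n}: simulated {estimate:.3f}, expected 1/(2N) = {1 / (2 * n):.3f}")
```

The simulated fractions are estimates, so they should only land near 1/(2N), not match it exactly.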
9. Factors Influencing Genetic Drift
• An allele’s probability of fixation equals its
frequency at that time and is not affected by its
previous history
• In a diploid population, the average time to
fixation of a newly arisen neutral allele that does
become fixed is 4N generations: evolution by
genetic drift proceeds faster in small than in large
populations
• Bottleneck: a drastic population
decrease for at least one generation
accelerates fixation
10. Factors Influencing Genetic Drift
• Initially genetically identical demes can evolve by
chance to have different genetic constitutions
• P(mutation X will fix) = its current allele frequency
• Among initially genetically identical demes in a
metapopulation, the average allele frequency does not
change but heterozygosity within each deme declines to 0
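The last bullet can be stated as a worked formula. This is the standard result for an idealized randomly mating diploid deme of constant size N under pure drift; the symbols $p_0$, $p_t$, $H_0$ and $H_t$ (allele frequency and expected heterozygosity at generations 0 and t) are notation introduced here, not taken from the slides.

```latex
% Average allele frequency across demes is unchanged by drift:
\mathbb{E}[p_t] = p_0
% Expected heterozygosity within a deme decays geometrically:
H_t = H_0 \left(1 - \frac{1}{2N}\right)^{t} \longrightarrow 0
\quad \text{as } t \to \infty
```

For example, with N = 50 each generation retains 99% of the previous heterozygosity, so smaller demes lose variation faster.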
11. The Neutral Theory of Molecular Evolution
• Most mutations (genetic variants) that become fixed do so by
genetic drift: they are selectively neutral and lack adaptive
significance
• Some mutations are disadvantageous and are eliminated
• Only a minority of mutations are advantageous and
fixed by natural selection
Break
12. Population 1: A T G T A A C G T T A T A
Population 2: A C G T A A C G T T A T A
Population 3: A C G A A A C G T T A T A
Population 4: A C G A A A C C T T A T A
By comparing DNA changes among
populations we can trace their history
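One simple way to compare the four sequences is to count the positions at which each pair differs. The sketch below uses the sequences from this slide; the choice of a plain pairwise difference count (Hamming distance) is an illustrative assumption, not necessarily the method intended by the slide.

```python
from itertools import combinations

# Sequences copied from the slide, spaces removed
populations = {
    "Population 1": "ATGTAACGTTATA",
    "Population 2": "ACGTAACGTTATA",
    "Population 3": "ACGAAACGTTATA",
    "Population 4": "ACGAAACCTTATA",
}


def hamming(a, b):
    """Number of aligned positions at which two sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(x != y for x, y in zip(a, b))


distances = {
    (a, b): hamming(populations[a], populations[b])
    for a, b in combinations(populations, 2)
}
for (a, b), d in distances.items():
    print(f"{a} vs {b}: {d} difference(s)")
```

Neighbouring populations (1 and 2, 2 and 3, 3 and 4) differ at one site each, while populations 1 and 4 differ at three, which is consistent with a stepwise history 1 → 2 → 3 → 4 or the reverse.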
13. From Phylogeny to Selection
• The protein-coding portion of DNA
has synonymous and nonsynonymous
substitutions. Thus, some DNA changes do not
have corresponding protein changes.
• If the synonymous substitution rate (dS) is greater
than the nonsynonymous substitution rate (dN),
the DNA sequence is under negative (purifying)
selection.
• If dS < dN, positive selection occurs. E.g. a
duplicated gene may evolve rapidly to assume
new functions.
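The distinction between synonymous and nonsynonymous changes, and the dN versus dS comparison, can be sketched as follows. The helper names and the example sequences are hypothetical. Note that the raw counts returned by `count_codon_changes` are not dN and dS themselves; real estimates also normalize by the number of synonymous and nonsynonymous sites, which this sketch does not attempt.

```python
# Standard genetic code, codons ordered with bases T, C, A, G
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def count_codon_changes(seq1, seq2):
    """Count differing codons that keep (synonymous) or change
    (nonsynonymous) the encoded amino acid in two aligned coding sequences."""
    if len(seq1) != len(seq2) or len(seq1) % 3 != 0:
        raise ValueError("sequences must be aligned and a multiple of 3 long")
    synonymous = nonsynonymous = 0
    for i in range(0, len(seq1), 3):
        c1, c2 = seq1[i:i + 3], seq2[i:i + 3]
        if c1 == c2:
            continue
        if CODON_TABLE[c1] == CODON_TABLE[c2]:
            synonymous += 1
        else:
            nonsynonymous += 1
    return synonymous, nonsynonymous


def classify_selection(dn, ds):
    """Interpret already-estimated dN and dS rates as on the slide."""
    if ds <= 0:
        raise ValueError("dS must be positive")
    omega = dn / ds
    if omega < 1:
        return "negative (purifying) selection"
    if omega > 1:
        return "positive selection"
    return "neutral"


# Made-up example: GCT -> GCC keeps Ala; AAA -> AGA changes Lys to Arg
print(count_codon_changes("ATGGCTAAA", "ATGGCCAGA"))
print(classify_selection(dn=0.1, ds=0.5))
```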
14. Molecular Clock
• Molecular substitutions accumulate at an
approximately constant rate, so the sequence difference between
species acts as a MOLECULAR CLOCK
• If sequences evolve at constant rates (big if), they
can be used to estimate the times at which sequences
diverged, much like dating fossils by radioactive decay.
15. Molecular Clock
• L = number of nucleotides compared between two
sequences
• N = total number of substitutions
• K = N / L, number of substitutions per nucleotide
• E.g. K = 0.093 for rat versus human
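• r = rate of substitution (mutations) = 0.56 x 10^-9
per site per year
• r = K / (2T), where the factor 2 counts both lineages since divergence,
so T = K / (2r) = 0.093 / (2 x 0.56 x 10^-9)
≈ 83 million years (roughly 80 million years)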
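The arithmetic above can be reproduced directly. This is a minimal sketch using the slide's values for K and r; the function name `divergence_time` is an illustrative choice.

```python
def divergence_time(k, rate):
    """Divergence time T = K / (2r).

    k    : substitutions per nucleotide between the two sequences
    rate : substitutions per site per year along one lineage
    The factor 2 accounts for substitutions accumulating on both lineages.
    """
    if rate <= 0:
        raise ValueError("rate must be positive")
    return k / (2 * rate)


t_years = divergence_time(k=0.093, rate=0.56e-9)
print(f"rat vs human: about {t_years / 1e6:.0f} million years")
```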
Graur and Li (1999)
16. Factors Influencing Mutation Rate /
Molecular Clock
• Generation time (age to reproduction)
• Population size (stronger drift in small
populations)
• Intensity of natural selection
• Species-specific differences
When two species have diverged for a
sufficiently long time, some sites experience
repeated base substitutions, so
the observed number of
differences plateaus (saturation).
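One common way to compensate for repeated substitutions at the same site is a distance correction. The sketch below uses the Jukes-Cantor model, which is an assumption introduced here (the slides do not name a correction); it assumes all four bases are equally frequent and all substitutions equally likely.

```python
import math


def jukes_cantor_distance(p):
    """Estimated substitutions per site from the observed proportion p
    of differing sites, under the Jukes-Cantor model (an assumed model)."""
    if not 0 <= p < 0.75:
        raise ValueError("p must be in [0, 0.75); beyond that the sites are saturated")
    return -0.75 * math.log(1 - 4 * p / 3)


for p in (0.05, 0.093, 0.3, 0.6):
    print(f"observed {p:.3f} -> corrected {jukes_cantor_distance(p):.3f}")
```

For small observed differences the correction is slight, but it grows quickly as the observed proportion approaches the plateau at 0.75.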
17. Factors Influencing Mutation Rate /
Molecular Clock
• Generation time (age to reproduction)
• Population size (stronger drift in small
populations)
• Intensity of natural selection
• Species-specific differences
• Change in protein function
19. Where did we come from?
• Two competing hypotheses
– Multiregional evolution: 1 million years ago, Homo erectus
left Africa and evolved into modern humans in different parts
of the Old World
– The Out of Africa hypothesis: Homo erectus populations were displaced
by new populations of modern humans that left Africa 100K
to 50K years ago
20. • National Geographic Story Jan 2014
• If a fragment of DNA is shared by Neanderthals
and non-Africans, but not Africans or other
primates, it is likely to be a Neanderthal heirloom.
• People living outside Africa carry 1-4%
Neanderthal DNA (affecting skin, hair, etc.).
Break
21. Polymorphism
• Polymorphism: a site/gene with “common”
variation, where the less common allele has frequency >= 1%;
otherwise it is called a rare variant and is not polymorphic
• Single Nucleotide Polymorphism
– Arises from a DNA-replication mistake in an
individual germ-line cell, then is transmitted to offspring
– ~90% of human genetic variation
• Copy number variations
– May or may not be genetic
STAT115
22. Why Should We Care
• Disease gene discovery
– Association studies, e.g. certain SNPs are
associated with susceptibility to diabetes
– Chromosome aberrations, duplication / deletion
might cause cancer
• Personalized Medicine
– A drug may be effective only if you carry a particular allele
23. SNP Distribution
• Most common, 1 SNP / 100-300 bp
– Balance between the rate at which mutations are introduced and
the rate at which polymorphisms are lost
– Most mutations are lost within a few generations
• 2/3 are C/T differences
• In non-coding regions, often fewer SNPs in
more conserved regions
• In coding regions, often more synonymous
than non-synonymous SNPs
25. SNP Characteristics:
Linkage Disequilibrium
• Hardy-Weinberg equilibrium
– In a population with genotypes AA, aa, and Aa, if p =
freq(A) and q = freq(a), the frequencies of AA, aa and Aa
will be p^2, q^2, and 2pq respectively at equilibrium.
– Similarly for two loci, each with two alleles (A/a and B/b)
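The single-locus Hardy-Weinberg proportions can be computed directly from p. A minimal sketch; the function name `hardy_weinberg` and the example value p = 0.7 are illustrative.

```python
def hardy_weinberg(p):
    """Expected genotype frequencies at equilibrium for freq(A) = p."""
    if not 0 <= p <= 1:
        raise ValueError("p must be between 0 and 1")
    q = 1 - p  # freq(a)
    return {"AA": p * p, "Aa": 2 * p * q, "aa": q * q}


frequencies = hardy_weinberg(0.7)
for genotype, freq in frequencies.items():
    print(f"{genotype}: {freq:.2f}")
```

The three frequencies always sum to 1 because p^2 + 2pq + q^2 = (p + q)^2.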
26. SNP Characteristics:
Linkage Disequilibrium
[Figure panels: Equilibrium, Disequilibrium]
• LD: if alleles at two loci occur together more often than can
be accounted for by chance, this indicates that the two
loci are physically close on the DNA
– In mammals, LD is often lost at ~100 kb
– In fly, LD often decays within a few hundred
bases
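"More often than can be accounted for by chance" can be quantified with the standard coefficients D and r^2, computed from haplotype frequencies. The slides do not name these measures, so treat them, the function name, and the example frequencies below as assumptions made for illustration.

```python
def linkage_disequilibrium(freq_AB, freq_Ab, freq_aB, freq_ab):
    """Return (D, r^2) for two biallelic loci from haplotype frequencies."""
    total = freq_AB + freq_Ab + freq_aB + freq_ab
    if abs(total - 1.0) > 1e-9:
        raise ValueError("haplotype frequencies must sum to 1")
    p_a = freq_AB + freq_Ab  # frequency of allele A
    p_b = freq_AB + freq_aB  # frequency of allele B
    if p_a in (0.0, 1.0) or p_b in (0.0, 1.0):
        raise ValueError("both loci must be polymorphic")
    # D: observed AB frequency minus the frequency expected by chance
    d = freq_AB - p_a * p_b
    r_squared = d * d / (p_a * (1 - p_a) * p_b * (1 - p_b))
    return d, r_squared


# Made-up frequencies: A tends to travel with B, and a with b
print(linkage_disequilibrium(0.4, 0.1, 0.1, 0.4))
# Equilibrium: every haplotype at its chance frequency, so D = 0
print(linkage_disequilibrium(0.25, 0.25, 0.25, 0.25))
```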
27. SNP Characteristics:
Linkage Disequilibrium
• Statistical Significance of LD
– Chi-square test (or Fisher’s exact test)
– Expected counts: $e_{ij} = n_{i\cdot}\, n_{\cdot j} / n_T$
– Test statistic: $\chi^2 = \sum_{i,j} \frac{(n_{ij} - e_{ij})^2}{e_{ij}}$
        B1     B2     Total
A1      n11    n12    n1.
A2      n21    n22    n2.
Total   n.1    n.2    nT
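The chi-square test on the 2x2 haplotype table can be sketched in plain Python. The counts in the example are made up for illustration. The p-value uses the identity that, for 1 degree of freedom, the upper tail of the chi-square distribution equals erfc(sqrt(x / 2)).

```python
import math


def chi_square_2x2(table):
    """Chi-square statistic and p-value (1 d.f.) for a 2x2 count table.

    table[i][j] is n_ij: the count of haplotypes carrying A(i+1) and B(j+1).
    """
    row_totals = [sum(row) for row in table]        # n_i.
    col_totals = [sum(col) for col in zip(*table)]  # n_.j
    n_total = sum(row_totals)                       # n_T
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / n_total  # e_ij
            chi2 += (table[i][j] - expected) ** 2 / expected
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# Hypothetical counts with a strong A1-B1 and A2-B2 association
chi2, p_value = chi_square_2x2([[40, 10], [10, 40]])
print(f"chi-square = {chi2:.1f}, p = {p_value:.2e}")
```

For small expected counts, Fisher's exact test (mentioned on the slide) is generally the safer choice; it is not implemented here.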
28. SNP Characteristics:
Linkage Disequilibrium
• Haplotype block: a cluster of linked SNPs
• Haplotype boundary: sequence falls into blocks
with strong LD within blocks and little or no LD
between blocks; the boundaries reflect recombination
hotspots
29. SNP Characteristics:
Linkage Disequilibrium
• Haplotype block: a cluster of linked SNPs
• Haplotype boundary: sequence falls into blocks
with strong LD within blocks and little or no LD
between blocks; the boundaries reflect recombination
hotspots
• Haplotype size
distribution
30. Summary
• Phenotype evolution (natural selection) vs
molecular evolution (neutral theory)
• Decrease of genetic variation over time
• Fixation: population size, probability
• Positive and negative selection (dN / dS ratio)
• Molecular clock and migration patterns
• Genome variations: SNP and CNV
• Linkage disequilibrium from recombination