Unit 2: Phylogeny
LECTURE LEARNING GOALS
1. Define phylogeny, and describe what a phylogenetic tree can reveal about the species it models.
2. Describe how to construct a phylogenetic tree, and the complexities that create mistakes.
3. Explain how to root a tree, and contrast how to root the tree of life.
Lecture 02 (2.04.2021): Phylogeny
1. PHYLOGENY
Unit 02, 2.04.2021
Reading for today: Brown Ch. 4, 5 & 6
Reading for next class: Brown Ch. 24 & 7
Dr. Kristen DeAngelis
Student Hours by appointment
deangelis@microbio.umass.edu
4. Phylogeny
• Phylogeny is a model of evolutionary relationships among species based on sequence similarities.
• Phylogeny may also refer to a phylogenetic tree, the illustration of these relationships.
Woesian ToL: Pace NR, Science 1997
5. Phylogeny
• Last time we looked at some "wrong" trees, including Haeckel's 3 kingdom tree and Whittaker's 5 kingdom tree. Why are these problematic?
– Subjective & qualitative.
• PCR and sequencing make it possible to understand how organisms are related in an objective and quantitative way.
8. Descent with modification = evolution
• Individual species "split" into two or more daughter species
– concept of vertical inheritance
– common ancestor at basal nodes
– Molecular clock
• Evolution only occurs when there is a change in gene frequency within a population over time
10. Read trees like mobiles
In a tree like this, the blue branches have lengths that are meaningful.
Their lengths are read against the scale bar, which gives the amount of change.
In a tree like this, the red distances have lengths that are NOT meaningful.
They are spacers whose lengths are only meant to make room for labels or pictures, as seen at the left.
12. Reading a tree
• The tips are the extant organisms whose relationship you are trying to discern.
• Branch lengths correspond to sequence divergence, which expresses how much DNA sequence has changed over time.
– ALL good trees have scale bars in units of change per unit branch length
• Nodes connect the tips; each node represents a hypothetical common ancestor of the organisms in that clade.
• The distance between tips (not along the branches) has no meaning.
• The tree's branches can rotate freely around the nodes.
13. Unequal rates of evolution
1, 2 and 3 are extant organisms
B is a theoretical common ancestor ("node")
Scale bar: 1.0
14. Unequal rates of evolution
Similarity between organisms is not necessarily equal to evolutionary relationship.
• Which one evolved faster?
– "3" evolved faster than "2"
• Which is most similar to 2? Why?
– "2" is more "similar" to "1" than to "3"
• However, "2" and "3" share a common ancestor "B"
• Scale bar tells you the number of substitutions per unit branch length
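The point of this slide can be checked with a small sketch. The branch lengths and the root name "A" below are made up for illustration (the slide's figure is not reproduced here); only the tips 1, 2, 3 and the node B come from the slide. Summing branch lengths along the path between two tips shows "2" closer to "1" than to "3", even though "2" and "3" share the more recent ancestor.

```python
# Hypothetical tree: each entry maps a node to (parent, branch length).
# "A" is an assumed root name; all lengths are invented for illustration.
TREE = {
    "1": ("A", 1.0),
    "B": ("A", 0.5),
    "2": ("B", 0.5),
    "3": ("B", 3.0),  # long branch: "3" evolved faster than "2"
}


def path_to_root(node):
    """Return {ancestor: distance from node} for the node and all its ancestors."""
    distance = 0.0
    seen = {node: 0.0}
    while node in TREE:
        parent, length = TREE[node]
        distance += length
        seen[parent] = distance
        node = parent
    return seen


def last_common_ancestor(x, y):
    """Shared ancestor that minimises the combined distance to both tips."""
    px, py = path_to_root(x), path_to_root(y)
    return min(set(px) & set(py), key=lambda n: px[n] + py[n])


def path_distance(x, y):
    """Sum of branch lengths along the path between tips x and y."""
    px, py = path_to_root(x), path_to_root(y)
    ancestor = last_common_ancestor(x, y)
    return px[ancestor] + py[ancestor]


if __name__ == "__main__":
    for pair in [("1", "2"), ("2", "3")]:
        print(pair, path_distance(*pair), last_common_ancestor(*pair))
```

With these assumed lengths, similarity (short path) and relationship (shared node) point in different directions, which is exactly the caution on the slide.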
16. Derived vs Ancestral Trait
• A derived trait is one that was NOT present in the common ancestor.
• Ancestral (or primitive) traits are characters that WERE present in a common ancestor.
• These terms are relative because they depend on which common ancestor you are referring to; every node is the last common ancestor for all descendants of that group.
• The green circle on the left denotes a monophyletic group (a common ancestor and all of its descendants), where all organisms share a common ancestor
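One way to make "monophyletic" concrete is to treat a tree as nested tuples and ask whether a group of tips is exactly the set of tips below some node. This is a minimal sketch; the nested-tuple representation and the function names are assumptions made for illustration, and the example tree reuses the tip labels 1, 2 and 3 rather than any real organisms.

```python
def tips(tree):
    """Return the set of tip labels under a nested-tuple tree."""
    if isinstance(tree, str):
        return frozenset([tree])
    return frozenset().union(*(tips(child) for child in tree))


def clades(tree):
    """Return every clade: the set of tips below each node, including single tips."""
    if isinstance(tree, str):
        return {frozenset([tree])}
    found = {tips(tree)}
    for child in tree:
        found |= clades(child)
    return found


def is_monophyletic(tree, group):
    """True if the group is exactly the set of tips descending from one node."""
    return frozenset(group) in clades(tree)


if __name__ == "__main__":
    example = ("1", ("2", "3"))  # hypothetical tree: 2 and 3 share a node
    print(is_monophyletic(example, {"2", "3"}))
    print(is_monophyletic(example, {"1", "2"}))
```

A group that leaves out some descendants of its common ancestor fails the check, because no single node has exactly that set of tips below it.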
17. Activity for Review of Unit 02.1 Defining Trees
1. Label the root, a tip, a node and a branch.
2. Circle the Domains included in the Prokaryotes.
3. Is the group Prokaryote monophyletic?
19. So you're making a phylogenetic tree…
• Assume you have chosen which species to analyze
• (1) Decide which gene to use …
– Ribosomal RNA genes
– A concatenation of single copy housekeeping genes
20. So you're making a phylogenetic tree…
• (1) Decide which gene to use
21. So you need to make a phylogenetic tree…
• SSU ribosomal RNA gene
+ Short, only about 1,500 base pairs
+ Information-dense because it is a non-coding, structural RNA
+ Essential for life, so probably not horizontally transferred
- Multiple copies per genome
- Cannot resolve close relationships
22. So you're making a phylogenetic tree…
• (2) Sequence the gene from each species and align the sequences
23. So you're making a phylogenetic tree…
• (2) Sequence the gene from each species and align the sequences
• We want evolutionary distance, but it cannot be directly measured, so it must be estimated
• Each vertical column in the alignment is a "trait" in calculating the distance matrix
• The distance matrix is based on observed (measurable) differences, so it assumes parsimony (one change per observed difference), even though:
– There can be more than one evolutionary change at a single position (e.g., A → G → U)
– Positions can change and change back (A → G → A)
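As a rough illustration of "each column is a trait", the sketch below counts the fraction of aligned columns at which two sequences differ and collects those observed differences into a pairwise matrix. The sequences, the function names, and the choice to skip columns containing a gap ("-") are all assumptions for the example, not part of the lecture.

```python
from itertools import combinations


def observed_difference(seq_a, seq_b):
    """Fraction of aligned columns where two sequences differ (gap columns skipped)."""
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    differing = 0
    for a, b in zip(seq_a, seq_b):
        if a == "-" or b == "-":
            continue  # assumption: ignore columns with a gap in either sequence
        compared += 1
        if a != b:
            differing += 1
    if compared == 0:
        raise ValueError("no comparable columns")
    return differing / compared


def difference_matrix(alignment):
    """Map each unordered pair of names to its observed difference."""
    return {
        frozenset((x, y)): observed_difference(alignment[x], alignment[y])
        for x, y in combinations(alignment, 2)
    }


if __name__ == "__main__":
    # Hypothetical aligned RNA fragments.
    alignment = {
        "a": "ACGUACGUAC",
        "b": "ACGUACGAAC",
        "c": "ACGAAC-UUC",
    }
    for pair, value in difference_matrix(alignment).items():
        print(sorted(pair), round(value, 3))
```

These are observed differences only. A position that changed twice, or changed and changed back, still counts as at most one difference here, which is why a correction is applied before the values are treated as evolutionary distances.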
24. So you're making a phylogenetic tree…
• (3) Make an evolutionary distance matrix based on sequence similarity, using the Jukes-Cantor method.
25. So you're making a phylogenetic tree…
• The Jukes-Cantor method relates sequence similarity to evolutionary distance
– If all sequences are the same, distance is zero
– Distance increases nonlinearly as sequence similarity decreases; for nearly identical sequences, a difference of one or two bases does not change the distance much
– The lowest sequence similarity is about 0.25 because any two sequences are about 25% similar by chance; there are 4 possible bases at each position, so the chance that one base will match another is 1 in 4
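The relationship described above can be written down directly. The standard Jukes-Cantor correction is d = -(3/4) ln(1 - (4/3) p), where p is the fraction of differing positions (1 minus similarity). The sketch below is one way to compute it; the function name and the choice to return infinity once similarity falls to the chance level of 0.25 are assumptions for illustration.

```python
import math


def jukes_cantor_distance(similarity):
    """Jukes-Cantor distance (substitutions per site) from fractional similarity."""
    if not 0.0 <= similarity <= 1.0:
        raise ValueError("similarity must be between 0 and 1")
    p = 1.0 - similarity  # observed fraction of differing positions
    if p == 0.0:
        return 0.0
    if p >= 0.75:
        # At or below chance-level similarity the correction is undefined;
        # treating it as infinite is an assumption of this sketch.
        return math.inf
    return -0.75 * math.log(1.0 - (4.0 / 3.0) * p)


if __name__ == "__main__":
    for s in (1.0, 0.99, 0.9, 0.7, 0.5, 0.3):
        print(s, round(jukes_cantor_distance(s), 4))
```

Near 100% similarity the corrected distance is almost the same as the observed difference, and it climbs steeply as similarity approaches 25%.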
26. So you're making a phylogenetic tree…
• (4) Perform phylogenetic analysis, and optionally construct a phylogenetic tree
• This is an example of the neighbor joining method
Figure: Distance Matrix (%)
27. So you're making a phylogenetic tree…
• How can you determine the branch lengths?
– In other words, you need to place the node "u", which defines a common ancestor
– You know how far apart a & b are from each other
– You also know how far a is from something else, say c, so measure b from c as well; the three distances let you estimate where node u should be
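The reasoning on this slide can be sketched with three distances. If the distances fit a tree exactly, the length of the branch from a to u is (d_ab + d_ac - d_bc) / 2, and the other two branches follow by subtraction. The numbers below are hypothetical, and real neighbor joining averages over all other tips rather than using a single reference c, so treat this as a simplified illustration.

```python
def place_node(d_ab, d_ac, d_bc):
    """Branch lengths from a, b and c to the node u that joins them.

    Assumes the three distances fit a tree exactly (are additive).
    """
    a_to_u = (d_ab + d_ac - d_bc) / 2
    b_to_u = d_ab - a_to_u
    c_to_u = d_ac - a_to_u
    return a_to_u, b_to_u, c_to_u


if __name__ == "__main__":
    # Hypothetical distances between three tips.
    a_u, b_u, c_u = place_node(d_ab=5, d_ac=9, d_bc=10)
    print(a_u, b_u, c_u)
    # The branches must add back up to the pairwise distances.
    print(a_u + b_u, a_u + c_u, b_u + c_u)
```

The second print line is a consistency check: each pair of branch lengths should sum to the distance that was measured between those two tips.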
29. Tree Construction Complexities
1. Choice of substitution model
2. GC bias
3. Choice of tree-making algorithm
4. Long-branch attraction
5. Bootstrapping
30. Substitution models
• The Jukes-Cantor model is a one-parameter model
• Two-parameter models only care about whether a substitution is a transition or a transversion
• Six-parameter models weight each change differently
31. Substitution models
• Transitions are much more common than transversions, so these are weighted differently in deciding what distance to assign to a mismatch
• Six-parameter models consider different types of transitions and transversions, weighting each change differently
• Gaps are also tricky… for example, adjacent gaps are not independent of each other
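To show what "weighting transitions and transversions differently" can look like, here is a sketch of one well-known two-parameter correction, the Kimura model: d = -(1/2) ln((1 - 2P - Q) sqrt(1 - 2Q)), where P is the fraction of transitions and Q the fraction of transversions. The lecture does not name a specific two-parameter model, so the choice of Kimura's, the function names, and the example sequences are assumptions.

```python
import math

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T", "U"}


def classify(a, b):
    """Label an aligned pair of bases as match, transition or transversion."""
    if a == b:
        return "match"
    pair = {a, b}
    # Purine<->purine or pyrimidine<->pyrimidine changes are transitions.
    if pair <= PURINES or pair <= PYRIMIDINES:
        return "transition"
    return "transversion"


def kimura_two_parameter_distance(seq_a, seq_b):
    """Kimura two-parameter distance; columns with a gap are skipped (assumption)."""
    pairs = [(a, b) for a, b in zip(seq_a, seq_b) if a != "-" and b != "-"]
    if not pairs:
        raise ValueError("no comparable columns")
    kinds = [classify(a, b) for a, b in pairs]
    p = kinds.count("transition") / len(pairs)
    q = kinds.count("transversion") / len(pairs)
    return -0.5 * math.log((1 - 2 * p - q) * math.sqrt(1 - 2 * q))


if __name__ == "__main__":
    # Hypothetical DNA fragments: two transitions and one transversion in 20 sites.
    x = "ACGTACGTACGTACGTACGT"
    y = "GTCTACGTACGTACGTACGT"
    print(round(kimura_two_parameter_distance(x, y), 4))
```

The sketch assumes both sequences use the same alphabet (all DNA or all RNA) and that divergence is low enough for the logarithm to be defined.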
33. GC bias
• The more GC-rich a region is, the higher its recombination rate
• That means that GC-rich regions, or GC-rich genomes, naturally evolve faster
• Including High GC gram positives (like Actinobacteria) in the same tree as Low GC gram positives (like Firmicutes) can be misleading
34. Choice of tree algorithm can affect tree structure
• Neighbor-joining starts with a radial (star) tree and joins neighbors
• Parsimony builds many candidate trees and finds the simplest one, usually the tree requiring the fewest mutations
• Maximum likelihood trees are based on probability
– often considered the best, and the most computationally intensive
• Bayesian inference starts with a random tree structure & random parameters, then iterates until an "optimal" tree is found
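To give a feel for how neighbor-joining "joins neighbors", the sketch below performs a single joining step using the standard formulation: compute Q(i, j) = (n - 2) d(i, j) - r_i - r_j, where r_i is the sum of distances from i to every other tip, join the pair with the smallest Q, and compute distances from the new node to the remaining tips. The five-tip distance matrix and the function name are made up for illustration; a full implementation would repeat this step until the tree is resolved.

```python
from itertools import combinations


def neighbor_joining_step(names, dist):
    """One neighbor-joining step.

    names: list of tip labels (at least three).
    dist:  dict mapping frozenset({x, y}) to the distance between x and y.
    Returns the joined pair, their branch lengths to the new node, and the
    distances from the new node to each remaining tip.
    """
    n = len(names)

    def d(x, y):
        return dist[frozenset((x, y))]

    # r[x] is the total distance from x to every other tip.
    r = {x: sum(d(x, y) for y in names if y != x) for x in names}

    # Pick the pair that minimises the Q criterion.
    i, j = min(
        combinations(names, 2),
        key=lambda pair: (n - 2) * d(*pair) - r[pair[0]] - r[pair[1]],
    )

    length_i = d(i, j) / 2 + (r[i] - r[j]) / (2 * (n - 2))
    length_j = d(i, j) - length_i
    to_new_node = {
        k: (d(i, k) + d(j, k) - d(i, j)) / 2 for k in names if k not in (i, j)
    }
    return (i, j), (length_i, length_j), to_new_node


if __name__ == "__main__":
    # Hypothetical distances among five tips.
    raw = {
        ("a", "b"): 5, ("a", "c"): 9, ("a", "d"): 9, ("a", "e"): 8,
        ("b", "c"): 10, ("b", "d"): 10, ("b", "e"): 9,
        ("c", "d"): 8, ("c", "e"): 7, ("d", "e"): 3,
    }
    dist = {frozenset(pair): value for pair, value in raw.items()}
    print(neighbor_joining_step(["a", "b", "c", "d", "e"], dist))
```

Note that the joined pair is not simply the closest pair: in this example d and e are only 3 apart, yet the Q criterion favors joining a and b first because it also accounts for how far each tip is from everything else.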
35. Long-branch attraction
• Very long branches can sometimes cluster artificially
• Usually due to bad sequence, poor alignment, or not enough tips
• The erroneous phylogeny implies a common ancestor that the long-branched organisms do not actually share, and it can distort the inferred rates of evolution
36. Bootstrapping
• Random sampling of alignment columns with replacement to create new trees
• A measure of how consistently your sequence alignment supports each branch
• Values range from 0 to 100, with 100 meaning the branch appeared in every resampled tree
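The resampling idea can be sketched in a few lines. Each replicate draws alignment columns at random, with replacement, until it is as long as the original; a tree is then built from each replicate and the support value is the percentage of replicates containing a given branch. In the sketch below, the tree-building step is replaced by a toy stand-in (`a_and_b_are_closest`), and all names and sequences are assumptions for illustration.

```python
import random


def bootstrap_alignment(alignment, rng):
    """Resample alignment columns with replacement, keeping the original length."""
    length = len(next(iter(alignment.values())))
    columns = [rng.randrange(length) for _ in range(length)]
    return {
        name: "".join(seq[c] for c in columns) for name, seq in alignment.items()
    }


def bootstrap_support(alignment, has_clade, replicates=100, seed=0):
    """Percentage of replicates for which has_clade(replicate) is true."""
    rng = random.Random(seed)
    hits = sum(
        1 for _ in range(replicates)
        if has_clade(bootstrap_alignment(alignment, rng))
    )
    return round(100 * hits / replicates)


def mismatches(x, y):
    """Number of aligned positions at which two sequences differ."""
    return sum(1 for a, b in zip(x, y) if a != b)


def a_and_b_are_closest(aln):
    """Toy stand-in for 'build a tree and check that a and b form a clade'."""
    ab = mismatches(aln["a"], aln["b"])
    return ab < min(mismatches(aln["a"], aln["c"]), mismatches(aln["b"], aln["c"]))


# Hypothetical alignment: a and b are identical, c differs at every position.
EXAMPLE = {"a": "ACGUACGUAC", "b": "ACGUACGUAC", "c": "UGCAUGCAUG"}

if __name__ == "__main__":
    print(bootstrap_support(EXAMPLE, a_and_b_are_closest))
```

Because a and b are identical at every column in this made-up alignment, every replicate groups them together; with real data, columns that disagree about a grouping pull its support below 100.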
37. Activity for Review of Unit 02.2
Examine the two trees at right, made with two different genes. Bootstrap values for maximum likelihood (above branches) and parsimony (below branches) are shown.
1. Which tree is a more likely representation of Methanopyrus kandleri? Why?
2. What could explain the differences between the two trees?
40. How to root a tree
• This is optional: one can infer evolutionary relationships without a root
• To root a tree, pick an "outgroup", an organism known to lie outside the group being studied
• The root identifies the last common ancestor of the organisms in the tree, which is different from the LUCA
42. The root of the ToL is the Last Universal Common Ancestor
• One cannot rely on nucleotide gene sequences alone because these would have mutated beyond recognition
• Amino acid sequences change more slowly because silent (synonymous) mutations leave the amino acid sequence unchanged
• The tertiary folded structure of a protein is even more strongly conserved than the secondary structure
43. Sequence homology
• Homologous genes have a shared ancestry.
– Orthologs arise because of a speciation event.
– Paralogs arise because of a duplication event.
44. Paralogs are used to root the ToL
• Elongation Factors duplicated prior to divergence of the three Domains
• One gene tree can be rooted with the other gene
• Both trees yield the same relationship and are rooted in the same location.
45. Sequence homology
• Homologous genes have a shared ancestry.
– Orthologs arise from a speciation event: multiple organisms, one gene.
– Paralogs arise from a duplication event: the same organism, two different (homologous) genes.
46. Root the tree of life using paralogs
• The genes for the protein synthesis elongation factors Tu (EF-Tu) and G (EF-G) are the products of an ancient gene duplication, which appears to predate the divergence of all extant organismal lineages.
• Most phylogenetic methods place the root of the ToL in the Bacteria
• A combined data set of EF-Tu and EF-G sequences favors placement of the eukaryotes within the Archaea, as the sister group to the Crenarchaeota
– http://www.ncbi.nlm.nih.gov/pmc/articles/PMC38819/
48. Protein-based models of evolution
• Traits here are proteins, NOT DNA sequence
– Based on 420 modern organisms, looking for structures that were common to all.
– 5 to 11 per cent were universal: conserved enough to have originated in LUCA
• This perspective gives us new information about LUCA
– LUCA had enzymes to break down and extract energy from nutrients, and some protein-making equipment
– LUCA lacked the enzymes for making and reading DNA molecules
49. The root moves depending on whether you use nucleic acids or protein!
[Figure: two trees of the Bacteria, Archaea and Eukaryotes]
50. The root moves depending on whether you use nucleic acids or protein!
• RNA sequence-based rooting of the tree of life puts the root within the Bacteria.
– usually derived from analyses of the sequences of ancient gene paralogs, e.g., ATPases, elongation factors
• Proteomic analyses of many proteins put the root of the tree of life within the Archaea.
– Archaeal rooting has been observed for phylogenetic analyses of tRNA, 5S rRNA, & RNase P
51. Activity for Review of Unit 02.3
• Answer on your own, then discuss in groups.
• What can we infer about the biology of the Last Universal Common Ancestor based on the fact that different genes place the root in different Domains?
52. Unit 2: Phylogeny
Next class is Unit 3: Microbiology of early Earth
Reading for next class: Brown Ch. 24 & 7