MIB200A at UCDavis Module: Microbial Phylogeny; Class 2

Class 2:
MIB200
Biology of Organisms without Nuclei
Class #2:
Phylogeny
UC Davis, Fall 2019
Instructor: Jonathan Eisen

Some Questions
• What is a phylogenetic tree?
• What can be shown in a phylogenetic tree?
• How does one infer a phylogenetic tree?
• How does one know if a tree is correct?
• How can one use phylogenetic trees?
• What is the difference between a gene tree and a species tree?

Raff J. How to Read and Understand a Scientific Article
https://violentmetaphors.files.wordpress.com/2018/01/how-to-read-and-understand-a-scientific-article.pdf
1. Begin by reading the introduction, not the abstract.
2. Identify the big question.
3. Summarize the background in five sentences or less.
4. Identify the specific question(s).
5. Identify the approach.
6. Read the methods section.
7. Read the results section.
8. Determine whether the results answer the specific question(s).
9. Read the conclusion/discussion/interpretation section.
10. Go back to the beginning and read the abstract.
11. Find out what other researchers say about the paper.

Baldauf Main Topics
• Terminology
• Groups
• Trees
• Roots
• Homology
• Inferring Trees
  • Step 1. Assembling a dataset
  • Step 2. Multiple sequence alignment – the heart of the matter
  • Step 3. Trees – methods, models and madness
  • Step 4. Tests – telling the forest from the trees
  • Step 5. Data presentation

A phylogenetic tree is composed of branches (edges) and nodes.
Branches connect nodes; a node is the point at which two (or more)
branches diverge. Branches and nodes can be internal or external
(terminal). An internal node corresponds to the hypothetical last
common ancestor (LCA) of everything arising from it. Terminal
nodes correspond to the sequences from which the tree was
derived (also referred to as operational taxonomic units or ‘OTUs’).
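
As a rough illustration of this terminology, the sketch below stores a small rooted tree as nested Python tuples and counts its parts. The taxon names and the tree shape are invented for the example; they are not taken from the slides.

```python
# A tiny rooted tree as nested tuples: a string is a terminal node (OTU),
# a tuple is an internal node whose elements are its descendants.
# The topology and the names are hypothetical.
tree = ((("a", "b"), "c"), ("d", "e"))


def terminal_nodes(node):
    """Return the terminal nodes (tips/OTUs) below a node, left to right."""
    if isinstance(node, str):
        return [node]
    tips = []
    for child in node:
        tips.extend(terminal_nodes(child))
    return tips


def internal_nodes(node):
    """Return each internal node as the sorted tuple of tips arising from it."""
    if isinstance(node, str):
        return []
    found = [tuple(sorted(terminal_nodes(node)))]
    for child in node:
        found.extend(internal_nodes(child))
    return found


def count_branches(node):
    """Count branches (edges): one from each internal node to each child."""
    if isinstance(node, str):
        return 0
    return sum(1 + count_branches(child) for child in node)


tips = terminal_nodes(tree)
clades = internal_nodes(tree)
n_branches = count_branches(tree)
print(tips, len(clades), n_branches)
```

Each entry in `clades` names an internal node by "everything arising from it", which is how the last-common-ancestor reading of an internal node is usually expressed in practice.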

Parts of a phylogenetic tree
[Figure: a rooted tree with terminal (or tip) taxa a–h and internal nodes t–z. Labels: root (root node), internal nodes, internal branches, terminal branches, terminal (or tip) taxa.]
Internal nodes represent hypothetical ancestral taxa.

Tree Roots
At the base of a phylogenetic tree is its ‘root’. This is the oldest point
in the tree, and it, in turn, implies the order of branching in the rest
of the tree; that is, who shares a more recent common ancestor with
whom. The only way to root a tree is with an ‘outgroup’, an external
point of reference. An outgroup is anything that is not a natural
member of the group of interest (i.e. the ‘ingroup’).
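
A minimal sketch of outgroup rooting, assuming the unrooted tree is stored as an adjacency map. The tip names, the internal node names `n1` to `n3`, and the topology are all hypothetical.

```python
# Unrooted tree as an adjacency map: each node lists its neighbours.
# "A".."D" and "Out" are tips; "n1".."n3" are internal nodes.
# Names and topology are invented for illustration.
unrooted = {
    "A": ["n1"],
    "B": ["n1"],
    "C": ["n2"],
    "D": ["n3"],
    "Out": ["n3"],
    "n1": ["A", "B", "n2"],
    "n2": ["n1", "C", "n3"],
    "n3": ["n2", "D", "Out"],
}


def subtree(tree, node, parent):
    """Nested-tuple view of `node`, looking away from `parent`."""
    children = [n for n in tree[node] if n != parent]
    if not children:  # a tip has no neighbour other than its parent
        return node
    return tuple(subtree(tree, child, node) for child in children)


def root_with_outgroup(tree, outgroup):
    """Place the root on the branch joining the outgroup tip to the rest."""
    (attachment,) = tree[outgroup]  # a tip has exactly one neighbour
    return (outgroup, subtree(tree, attachment, outgroup))


rooted = root_with_outgroup(unrooted, "Out")
rerooted = root_with_outgroup(unrooted, "A")
print(rooted)
print(rerooted)
```

The same unrooted tree gives a different branching order depending on which tip is treated as the outgroup, which is why the choice of outgroup matters for any statement about who shares a more recent common ancestor with whom.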

Slides by Jonathan Eisen for BIS2C at UC Davis Spring 2016
Unrooted Tree of Life from Woese
ROOT

Unrooted Tree of Life from Woese
ROOT
MAJOR DEBATE/AMBIGUITIES

Alternative Position of Eukaryote Branch
ROOT

Orthology vs. Paralogy
Evolution is about homology; that is, similarity due to common ancestry.

The methods for calculating phylogenetic trees fall into two general
categories. These are distance-matrix methods, also known as
clustering or algorithmic methods (e.g. UPGMA, neighbour-joining,
Fitch–Margoliash), and discrete data methods, also known as tree
searching methods (e.g. parsimony, maximum likelihood, Bayesian
methods).
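
To make the distance-matrix category concrete, here is a minimal sketch of UPGMA. The four aligned sequences are invented, and the uncorrected proportion of differing sites (p-distance) is a simplifying choice for the example rather than anything the slides prescribe.

```python
from itertools import combinations

# Hypothetical aligned sequences (equal length, no gaps).
seqs = {
    "A": "ACGTACGTAC",
    "B": "ACGTACGTAA",
    "C": "ACGAACGTTA",
    "D": "TCGAACCTTA",
}


def p_distance(s, t):
    """Proportion of aligned sites at which two sequences differ."""
    return sum(a != b for a, b in zip(s, t)) / len(s)


def upgma(names, dist):
    """Repeatedly join the closest pair of clusters (UPGMA).

    `dist` maps frozenset({x, y}) to a distance. Returns nested tuples.
    """
    size = {n: 1 for n in names}  # number of tips in each cluster
    dist = dict(dist)  # work on a copy
    while len(size) > 1:
        x, y = min(combinations(size, 2), key=lambda p: dist[frozenset(p)])
        merged = (x, y)
        for other in size:
            if other not in (x, y):
                # Average distance to the new cluster, weighted by size.
                dist[frozenset((merged, other))] = (
                    size[x] * dist[frozenset((x, other))]
                    + size[y] * dist[frozenset((y, other))]
                ) / (size[x] + size[y])
        size[merged] = size[x] + size[y]
        del size[x], size[y]
    return next(iter(size))


dist = {
    frozenset((a, b)): p_distance(seqs[a], seqs[b])
    for a, b in combinations(seqs, 2)
}
tree = upgma(list(seqs), dist)
print(tree)
```

UPGMA builds the tree directly from the matrix without comparing alternative trees, which is what separates it from the tree-searching methods. It also implicitly assumes that all lineages evolve at the same rate.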

Eisen 1998 Major Topics
• Sequence Similarity, Homology, and Functional Predictions
• Identification of Homologs
• Alignment and Masking
• Phylogenetic Trees
• Functional Predictions

… evolutionary information can be used to improve functional
predictions. Below, I present an outline of one such phylogenomic
method (see Fig. 1), and I compare this method to nonevolutionary
functional prediction methods. This method is based on a relatively
simple assumption: because gene functions change as a result of
evolution, reconstructing the evolutionary history of genes should
help predict the functions of uncharacterized genes. The first step is
the generation of a phylogenetic tree representing the evolutionary
history of the gene of interest and its homologs. Such trees are
distinct from clusters and other means of characterizing sequence
similarity because they are inferred by special techniques that help
convert patterns of similarity into evolutionary relationships (see
Swofford et al. 1996). After the gene tree is inferred, biologically
determined functions of the various homologs are overlaid onto the
tree. Finally, the structure of the tree and the relative phylogenetic
positions of genes of different functions are used to trace the history
of functional changes, which is then used to predict functions of
uncharacterized genes. More detail of this method is provided below.

Identification of Homologs
The first step in studying the evolution of a particular gene is the
identification of homologs. As with similarity-based functional
prediction methods, likely homologs of a particular gene are iden-
[The rest of this page is cropped on the slide; only fragments of the sections headed "Alignment" and "Phylogene" are visible.]
Table 1. Methods of Predicting Gene Function When Homologs Have Multiple Functions
Highest Hit: The uncharacterized gene is assigned the function (or frequently, the annotated function) of the gene that is identified as the highest hit by a similarity search program (e.g., Tomb et al. 1997).
Top Hits: Identify top 10+ hits for the uncharacterized gene. Depending on the degree of consensus of the functions of the top hits, the query sequence is assigned a specific function, a general activity with unknown specificity, or no function (e.g., Blattner et al. 1997).
Clusters of Orthologous Groups: Genes are divided into groups of orthologs based on a cluster analysis of pairwise similarity scores between genes from different species. Uncharacterized genes are assigned the function of characterized orthologs (Tatusov et al. 1997).
Phylogenomics: Known functions are overlaid onto an evolutionary tree of all homologs. Functions of uncharacterized genes are predicted by their phylogenetic position relative to characterized genes (e.g., Eisen et al. 1995, 1997).
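
The contrast between the first and last rows of Table 1 can be sketched on a toy case. Everything below is hypothetical: the gene names, scores, function labels and tree are invented, and `predict_by_tree_position` is a deliberately simplified stand-in for the phylogenomic approach, not the published procedure.

```python
# Hypothetical similarity-search hits for an uncharacterized gene "query":
# (homolog, similarity score, experimentally determined function).
hits = [
    ("g3", 300, "function_2"),
    ("g1", 250, "function_1"),
    ("g2", 240, "function_1"),
    ("g4", 200, "function_2"),
]

# Hypothetical gene tree of the query and its homologs (nested tuples).
gene_tree = ((("query", "g1"), "g2"), ("g3", "g4"))
known = {name: function for name, _, function in hits}


def highest_hit(hits):
    """'Highest Hit': copy the function of the best-scoring homolog."""
    return max(hits, key=lambda h: h[1])[2]


def tips(node):
    """List the genes at the tips below a node."""
    if isinstance(node, str):
        return [node]
    return [t for child in node for t in tips(child)]


def predict_by_tree_position(tree, query, known):
    """Use the characterized genes in the smallest clade around `query`.

    Returns their shared function, or None if they disagree.
    """
    node, functions = tree, set()
    while not isinstance(node, str):
        here = {known[t] for t in tips(node) if t in known}
        if here:
            functions = here  # a smaller clade overrides a larger one
        node = next(child for child in node if query in tips(child))
    return functions.pop() if len(functions) == 1 else None


print(highest_hit(hits))
print(predict_by_tree_position(gene_tree, "query", known))
```

In this invented case the best-scoring homolog belongs to a different part of the tree than the query, so the two approaches disagree. Rate variation among lineages is one way such a mismatch between similarity rank and tree position can arise.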
Insight/Outlook

… greatly from more data, it is useful to augment this initial list by using identified homologs as queries for further …
… commonly used: parsimony, distance, and maximum likelihood (Table 3), and each has its advantages and disadvantages. I …
Table 2. Types of Molecular Homology
Homolog: Genes that are descended from a common ancestor (e.g., all globins)
Ortholog: Homologous genes that have diverged from each other after speciation events (e.g., human β- and chimp β-globin)
Paralog: Homologous genes that have diverged from each other after gene duplication events (e.g., β- and γ-globin)
Xenolog: Homologous genes that have diverged from each other after lateral gene transfer events (e.g., antibiotic resistance genes in bacteria)
Positional homology: Common ancestry of specific amino acid or nucleotide positions in different genes
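
The ortholog and paralog definitions in Table 2 can be read as a rule about the event at the last common ancestor of two genes. The sketch below assumes a gene tree whose internal nodes are already labelled as speciation or duplication events; the four-gene globin tree is a simplified, hypothetical example.

```python
# Gene tree with labelled events: an internal node is
# (event, left, right) and a tip is a gene name.
# This simplified globin tree is an assumption made for illustration.
globins = (
    "duplication",
    ("speciation", "human_beta", "chimp_beta"),
    ("speciation", "human_gamma", "chimp_gamma"),
)


def tips(node):
    """Set of gene names below a node."""
    if isinstance(node, str):
        return {node}
    _, left, right = node
    return tips(left) | tips(right)


def relationship(node, a, b):
    """Classify two distinct genes by the event at their last common ancestor."""
    event, left, right = node
    for child in (left, right):
        if {a, b} <= tips(child):
            return relationship(child, a, b)  # the ancestor lies deeper
    return "orthologs" if event == "speciation" else "paralogs"


print(relationship(globins, "human_beta", "chimp_beta"))
print(relationship(globins, "human_beta", "human_gamma"))
print(relationship(globins, "human_beta", "chimp_gamma"))
```

Note that the last pair comes from different species yet is still paralogous, because the two genes trace back to the duplication rather than to the speciation.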

… al. 1989). However, examination of the percent similarity between
mycoplasmal genes and their homologs in bacteria does not clearly
show this relationship. This is because mycoplasmas have undergone
an accelerated rate of molecular evolution relative to other bacteria.
Thus, a BLAST search with a gene from Bacillus subtilis (a low GC
Gram-positive species) will result in a list in which the mycoplasma
homologs (if they exist) score lower than genes from many spe-
Table 3. Molecular Phylogenetic Methods
Method: Description
Parsimony: Possible trees are compared and each is given a score that is a reflection of the minimum number of character state changes (e.g., amino acid substitutions) that would be required over evolutionary time to fit the sequences into that tree. The optimal tree is considered to be the one requiring the fewest changes (the most parsimonious tree).
Distance: The optimal tree is generated by first calculating the estimated evolutionary distance between all pairs of sequences. Then these distances are used to generate a tree in which the branch patterns and lengths best represent the distance matrix.
Maximum likelihood: Maximum likelihood is similar to parsimony methods in that possible trees are compared and given a score. The score is based on how likely the given sequences are to have evolved in a particular tree given a model of amino acid or nucleotide substitution probabilities. The optimal tree is considered to be the one that has the highest probability.
Bootstrapping: Alignment positions within the original multiple sequence alignment are resampled and new data sets are made. Each bootstrapped data set is used to generate a separate phylogenetic tree and the trees are compared. Each node of the tree can be given a bootstrap percentage indicating how frequently those species joined by that node group together in different trees. Bootstrap percentage does not correspond directly to a confidence limit.
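
The parsimony and bootstrapping rows of Table 3 can be combined in one small sketch. It scores the three possible groupings of four taxa with Fitch's counting procedure, then resamples alignment columns with replacement. The alignment, the replicate count and the seed are invented, and counting ties as "no support" is a choice made for this example.

```python
import random

# Hypothetical alignment: one row per taxon, one column per site.
alignment = {
    "A": "ACGTACGTAC",
    "B": "ACGTACGTAA",
    "C": "ACGAACGTTA",
    "D": "TCGAACCTTA",
}

# The three possible groupings of four taxa, written as nested tuples.
topologies = [
    (("A", "B"), ("C", "D")),
    (("A", "C"), ("B", "D")),
    (("A", "D"), ("B", "C")),
]


def fitch(node, column):
    """Return (candidate ancestral states, minimum changes) for one site."""
    if isinstance(node, str):
        return {column[node]}, 0
    left, right = node
    left_states, left_cost = fitch(left, column)
    right_states, right_cost = fitch(right, column)
    shared = left_states & right_states
    if shared:
        return shared, left_cost + right_cost
    return left_states | right_states, left_cost + right_cost + 1


def parsimony_score(tree, aln):
    """Minimum number of changes needed to fit all sites onto `tree`."""
    n_sites = len(next(iter(aln.values())))
    return sum(
        fitch(tree, {taxon: seq[i] for taxon, seq in aln.items()})[1]
        for i in range(n_sites)
    )


def best_trees(aln):
    """All topologies that share the lowest parsimony score."""
    scores = [parsimony_score(t, aln) for t in topologies]
    return [t for t, s in zip(topologies, scores) if s == min(scores)]


def bootstrap_support(aln, replicates=200, seed=1):
    """Percentage of resampled data sets that uniquely recover the original tree."""
    rng = random.Random(seed)
    n_sites = len(next(iter(aln.values())))
    original = best_trees(aln)
    supported = 0
    for _ in range(replicates):
        cols = [rng.randrange(n_sites) for _ in range(n_sites)]
        resampled = {t: "".join(s[i] for i in cols) for t, s in aln.items()}
        supported += best_trees(resampled) == original
    return 100 * supported / replicates


scores = [parsimony_score(t, alignment) for t in topologies]
support = bootstrap_support(alignment)
print(scores, best_trees(alignment), support)
```

With only three candidate trees, every tree can be scored exhaustively; real data sets need heuristic searches. As the table notes, the resulting percentage is a measure of how consistently the data recover a grouping, not a direct confidence limit.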