Phylogeny
MICROBIO 590B Bioinformatics Lab: Bacterial Genomics
Professor Kristen DeAngelis
UMass Amherst
Fall 2022
Lecture Learning Goals
• Define phylogeny, and describe what a phylogenetic tree can reveal
about the taxa that it models.
• Explain how phylogenetic methods can allow us to make inferences
about groups of organisms and ancestors
• Describe how to construct a phylogenetic tree, and the complexities
that create mistakes.
• Contrast the different phylogenetic marker genes or concatenations
of genes that are available depending on the sequencing technology.
• Define the species concept for microbes.
• Make a phylogenetic tree.
Phylogeny
• Phylogeny is a model of
evolutionary relationships
among species based on
sequence similarities.
• Phylogeny may also refer to a
phylogenetic tree, the
illustration of these
relationships.
Woesian ToL: Pace NR, Science 1997
Read trees like mobiles
In a tree like this, the blue branches have lengths that are
meaningful. Each length represents an amount of change, which
the scale bar quantifies.
In a tree like this, the red distances are NOT meaningful.
They are spacers whose lengths are only
meant to make room for labels or pictures.
Descent with modification
Who was the last universal common ancestor?
The root of the ToL represents the
last universal common ancestor
• One cannot rely on nucleotide gene sequences alone because these
would have mutated beyond recognition
• Amino acid sequences change more slowly because synonymous (silent)
mutations leave the amino acid sequence unchanged
• The tertiary folded structure of a protein is even more strongly
conserved than the secondary structure
Sequence homology
• Homologous genes have a shared ancestry.
• Orthologs arise because of a speciation event.
• Paralogs arise because of a duplication event.
Paralogs are used to root the ToL
• Elongation Factors duplicated prior to divergence of the three
Domains
• One gene tree can be rooted with the other gene
• Both trees yield the same relationship and are rooted in the
same location.
Root the tree of life using paralogs
• The genes for the protein synthesis elongation factors Tu
(EF-Tu) and G (EF-G) are the products of an ancient gene
duplication, which appears to predate the divergence of all
extant organismal lineages.
• Most phylogenetic methods place the root of the ToL in the
Bacteria
• A combined data set of EF-Tu and EF-G sequences favors
placement of the eukaryotes within the Archaea, as the
sister group to the Crenarchaeota
Baldauf, Palmer, & Doolittle, 1996
Protein-based models of evolution
Kim and Caetano-Anollés BMC Evolutionary Biology 2011
Protein-based models of evolution
• Traits here are proteins, NOT DNA sequence
• Based on 420 modern organisms, looking for structures
that were common to all.
• 5 to 11 per cent were universal, that is, conserved enough to have
originated in LUCA
• This perspective gives us new information about LUCA
• LUCA had enzymes to break down and extract energy from
nutrients, and some protein-making equipment
• LUCA lacked the enzymes for making and reading DNA
molecules
[Figure labels: Bacteria, Archaea, Eukaryotes]
The root moves depending on what trait you use!
The root moves depending on whether you use
nucleic acids or protein!
• RNA sequence-based rooting of the
tree of life puts the root within the
Bacteria.
• Usually derived from analyses of the
sequences of ancient gene paralogs, e.g.,
ATPases, elongation factors
• Proteomic analyses of many proteins
put the root of the tree of life within
the Archaea.
• Archaeal rooting has been observed for
phylogenetic analyses of tRNA, 5S rRNA, &
RNase P
The last universal common ancestor, aka LUCA
• 4–3.5 Ga (Ga = 10^9 years ago)
• Almost certainly a dispersed
population of variable cells
• Features
• DNA, the universal code, and most genes
• Transcription and RNA polymerase
• RNAs of all kinds
• Translation and translational machinery
• Most proteins and metabolisms
• Membrane and cellular structure
[Figure labels: Bacteria, Archaea, Eukaryotes; "LUCA" and "also LUCA!"]
So you’re making a phylogenetic tree…
• Assume you have chosen which species to analyze
• (1) Decide which gene to use …
• Ribosomal RNA genes
• A concatenation of single-copy housekeeping genes
SSU ribosomal RNA
gene is a common
phylogenetic marker
+ Short, only 1500 base pairs
+ Information-dense because it
is a non-coding, structural RNA
+ Essential for life so probably
not horizontally transferred
- Multiple copies per genome
- Cannot resolve close
relationships
Xie, Tian, Qin, Bu, 2008
Sensitivity and correlation of hypervariable regions
in 16S rRNA genes in phylogenetic analysis
• Distance between trees based
on sub-regions (V2 through
V8) and trees based on all the
sub-regions (VT)
• Sequence analyses including
V4 are favored because of this
Yang, Wang, Qian. BMC Bioinformatics, 2016
So you’re making a phylogenetic tree…
• (2) Align the gene sequences
• We want evolutionary distance but it cannot be directly
measured, so it must be estimated
• Each vertical column in the alignment is a “trait” in
calculating the distance matrix
• Distance matrix is based on observed (measurable)
differences, but we assume parsimony
• There can be more than one evolutionary change at a single position
(e.g., A → G → U)
• Positions can change and change back (A → G → A)
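To make the "each column is a trait" idea concrete, here is a minimal sketch of an observed-distance matrix: the fraction of alignment columns at which two sequences differ. The function names `p_distance` and `distance_matrix` and the toy alignment are assumptions made for illustration, and skipping any column with a gap is only one of several possible conventions.

```python
from itertools import combinations


def p_distance(seq_a, seq_b, gap="-"):
    """Fraction of compared alignment columns where the two sequences differ."""
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    differences = 0
    for base_a, base_b in zip(seq_a, seq_b):
        if base_a == gap or base_b == gap:
            continue  # assumption: ignore columns with a gap in either sequence
        compared += 1
        if base_a != base_b:
            differences += 1
    if compared == 0:
        raise ValueError("no comparable columns")
    return differences / compared


def distance_matrix(alignment):
    """Symmetric dict-of-dicts of pairwise observed distances."""
    names = list(alignment)
    matrix = {name: {other: 0.0 for other in names} for name in names}
    for a, b in combinations(names, 2):
        d = p_distance(alignment[a], alignment[b])
        matrix[a][b] = matrix[b][a] = d
    return matrix


# Hypothetical toy alignment (not real data).
toy_alignment = {
    "taxon_a": "ACGUACGUAC",
    "taxon_b": "ACGUACGAAC",
    "taxon_c": "ACGAACGAA-",
}
toy_matrix = distance_matrix(toy_alignment)
```

Because multiple and reverse changes at one position are invisible here, these observed values tend to underestimate the true evolutionary distance.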
So you’re making a phylogenetic tree…
• (3) Make an evolutionary distance matrix based on sequence
similarity, using the Jukes-Cantor method.
So you’re making a phylogenetic tree…
• The Jukes-Cantor method relates sequence similarity to
evolutionary distance
• If two sequences are the same, the distance is zero
• Distance increases nonlinearly as sequence similarity decreases:
one or two base differences barely change the distance, but the
correction grows as sequences diverge
• The lowest sequence similarity is about 0.25 because all
sequences are about 25% similar by chance; there are 4
bases in the genetic code so the chance that one base will
match another is 1 in 4
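The relationship described above can be sketched with the standard Jukes-Cantor formula, d = -(3/4) ln(1 - (4/3) p), where p is the observed fraction of differing sites. The function name `jukes_cantor_distance` and the similarity values in the loop are illustrative assumptions, not part of the lecture.

```python
import math


def jukes_cantor_distance(p):
    """Jukes-Cantor corrected distance from the observed fraction of differing sites p."""
    if not 0.0 <= p < 0.75:
        # At p = 0.75 (25% similarity) sequences are indistinguishable from random.
        raise ValueError("p must be in [0, 0.75)")
    return -0.75 * math.log(1.0 - (4.0 / 3.0) * p)


# Print the corrected distance for a few illustrative similarity values.
for similarity in (1.0, 0.99, 0.9, 0.7, 0.5, 0.3):
    p = 1.0 - similarity
    d = jukes_cantor_distance(p)
    print(f"similarity={similarity:.2f}  observed={p:.2f}  JC distance={d:.3f}")
```

The corrected distance stays close to the observed difference for nearly identical sequences and grows without bound as similarity approaches 0.25.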
So you’re making a phylogenetic tree…
• (4) Perform phylogenetic analysis.
• This is an example of the neighbor
joining method
[Figure: Distance Matrix (%)]
So you’re making a phylogenetic tree…
• How can you determine the branch lengths?
• In other words, you need to place the node “u”, which defines
a common ancestor
• You know how far apart a & b are from each other
• You know how far apart a is from something else, say c, so
measure b from c and you can estimate where node u should
be
• (5, optional) Create a visualization of the tree.
• Let’s look at some nice trees …
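Placing node "u" from the three pairwise distances can be sketched with the three-point formula. This is a simplified illustration: the function name and the example distances are assumptions, and a full neighbor-joining run with more than three taxa uses the average distance to all other taxa instead of a single reference taxon c.

```python
def branch_lengths_to_node(d_ab, d_ac, d_bc):
    """Three-point formula: branch lengths from a, b and c to their shared node u."""
    a_to_u = (d_ab + d_ac - d_bc) / 2.0
    b_to_u = (d_ab + d_bc - d_ac) / 2.0
    c_to_u = (d_ac + d_bc - d_ab) / 2.0
    return a_to_u, b_to_u, c_to_u


# Hypothetical distances: a-b = 5, a-c = 9, b-c = 8.
a_to_u, b_to_u, c_to_u = branch_lengths_to_node(5.0, 9.0, 8.0)

# The branch lengths add back up to the pairwise distances they came from.
assert a_to_u + b_to_u == 5.0
assert a_to_u + c_to_u == 9.0
assert b_to_u + c_to_u == 8.0
```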
So you’re making a phylogenetic tree…
• (4) Perform phylogenetic analysis.
Yang & Rannala, Nat Rev Gen, 2012
Some nice trees: Metatranscriptomic reconstruction reveals
RNA viruses with the potential to shape carbon cycling in soil
Starr et al., 2019
A nice tree: bacterial
isolates in our lab
culture collection
The colored branches are unique
to each taxonomic family, and the
colored labels mark strains that
belong to the same genus.
The outer blue/red color indicates whether
each strain is from the heated or
control plots.
The stars mark strains with a
sequenced genome.
Choudoir, unpublished
So you’re making a phylogenetic tree…
• There are many (free) programs to make trees…
https://evolution.genetics.washington.edu/phylip/software.html
Yang & Rannala, Nat Rev Gen, 2012
Tree Construction Complexities
1. Choice of substitution model
2. GC bias
3. Choice of tree-making algorithm
4. Long-branch attraction
5. Bootstrapping
Choice of substitution model
• Pairwise sequence distances are calculated assuming a Markov chain model of
nucleotide substitution. Several commonly used models are illustrated in FIG. 1.
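As one example of how the model choice changes the distance, here is a minimal sketch of the Kimura two-parameter (K80) distance, which treats transitions and transversions separately: d = -(1/2) ln(1 - 2P - Q) - (1/4) ln(1 - 2Q), with P and Q the observed fractions of transitions and transversions. Choosing K80 as the example, the function name, and the DNA alphabet (T rather than U) are assumptions made for illustration.

```python
import math

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def k80_distance(seq_a, seq_b):
    """Kimura two-parameter distance between two aligned DNA sequences."""
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have equal length")
    compared = transitions = transversions = 0
    for x, y in zip(seq_a.upper(), seq_b.upper()):
        if x not in "ACGT" or y not in "ACGT":
            continue  # assumption: skip gaps and ambiguous bases
        compared += 1
        if x == y:
            continue
        if {x, y} <= PURINES or {x, y} <= PYRIMIDINES:
            transitions += 1      # purine<->purine or pyrimidine<->pyrimidine
        else:
            transversions += 1    # purine<->pyrimidine
    if compared == 0:
        raise ValueError("no comparable columns")
    P = transitions / compared
    Q = transversions / compared
    # math.log raises ValueError if the sequences are too divergent for the model.
    return -0.5 * math.log(1.0 - 2.0 * P - Q) - 0.25 * math.log(1.0 - 2.0 * Q)
```

Under this model a transition and a transversion at the same observed frequency give different distances, which a one-parameter model such as Jukes-Cantor cannot express.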
Yang & Rannala, Nature Reviews Genetics, 2012
“GC bias”
• The more GC-rich a
region is, the higher its
recombination rate.
• That means that GC-rich
regions, or GC-rich
genomes, naturally evolve
faster.
• Including high-GC Gram-positives
(like Actinobacteria) in the
same tree as low-GC
Gram-positives (like
Firmicutes) can be
misleading.
Choice of tree algorithm can affect tree structure
• Neighbor-joining starts with a radial (star) tree and joins
neighbors
• Parsimony makes many candidate trees and finds the one
that is the simplest, usually the one requiring the fewest
mutations
• Maximum likelihood trees are based on probability
• the best & most computationally intensive
• Bayesian inference starts with a random tree structure
& random parameters, then iterates until an
“optimal” tree is found
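The parsimony idea can be sketched with Fitch's algorithm, which counts the minimum number of changes a given tree needs at each alignment column. This is a toy illustration, not how production software searches tree space: the nested-tuple tree encoding, the function names, and the four-taxon alignment are all assumptions.

```python
def fitch_score(tree, states):
    """Return (possible ancestral states, minimum changes) for one alignment column.

    tree is a leaf name (str) or a pair (left_subtree, right_subtree).
    states maps each leaf name to its base at this column.
    """
    if isinstance(tree, str):
        return {states[tree]}, 0          # leaf: observed base, no changes
    left, right = tree
    left_set, left_changes = fitch_score(left, states)
    right_set, right_changes = fitch_score(right, states)
    shared = left_set & right_set
    if shared:                            # children agree: no extra change
        return shared, left_changes + right_changes
    return left_set | right_set, left_changes + right_changes + 1


def parsimony_length(tree, alignment):
    """Total minimum number of changes over all columns of the alignment."""
    n_sites = len(next(iter(alignment.values())))
    return sum(
        fitch_score(tree, {name: seq[i] for name, seq in alignment.items()})[1]
        for i in range(n_sites)
    )


# Hypothetical four-taxon alignment and the three possible groupings.
toy_alignment = {"a": "AAGC", "b": "AAGC", "c": "GAGT", "d": "GAAT"}
candidate_trees = {
    "((a,b),(c,d))": (("a", "b"), ("c", "d")),
    "((a,c),(b,d))": (("a", "c"), ("b", "d")),
    "((a,d),(b,c))": (("a", "d"), ("b", "c")),
}
scores = {label: parsimony_length(tree, toy_alignment)
          for label, tree in candidate_trees.items()}
best = min(scores, key=scores.get)  # tree needing the fewest changes
```

With only four taxa all three groupings can be scored exhaustively; for realistic numbers of taxa the number of candidate trees explodes, which is why real programs rely on heuristic searches.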
Long-branch attraction
• Very long branches can sometimes cluster artificially
• Usually due to bad sequence, poor alignment, or not enough tips
• The erroneous new phylogeny implies a common ancestor and
can result in different rates of evolution
Long-branch attraction
• …
Yang & Rannala, Nature Reviews Genetics, 2012
Long-branch attraction in theory and in practice
• Panels a and b show the four-species case by Felsenstein. If the correct tree (T in a) has
two long branches separated by a short internal branch, parsimony (as well as model-
based methods such as likelihood and Bayesian methods under simplistic models) tends
to recover a wrong tree (T2 in b), in which the two long branches are grouped together.
• Panels c and d show a similar phenomenon in a real data set, concerning the phylogeny
of seed plants. The Gnetales is a morphologically and ecologically diverse group of
Gymnosperms including three genera (Ephedra, Gnetum and Welwitschia), but its
phylogenetic position has been controversial.
• Maximum likelihood analysis of 56 chloroplast proteins produced the GneCup tree (d), in
which the Gnetales are grouped with Cupressophyta, apparently owing to a long-branch
attraction artefact.
• However, the Gnepine tree (c), in which the Gnetales joins the Pinaceae, was inferred by
excluding the fastest-evolving 18 proteins as well as three proteins (namely, psbC, rpl2
and rps7) that had experienced many parallel substitutions between the Cryptomeria
branch and the branch ancestral to the Gnetales. The Gnepine tree (c) is also supported
by two proteins from the nuclear genome and appears to be the correct tree.
• Branch lengths and bootstrap proportions are all calculated using RAxML.
Yang & Rannala, Nature Reviews Genetics, 2012
Bootstrapping
• Random sampling of alignment columns with
replacement to create new
trees
• A measure of confidence in
each branch of your tree, given your sequence alignment
• Numbers are from 0-100, with
100 meaning the branch appears in every resampled tree
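A minimal sketch of the resampling step: alignment columns are drawn with replacement to build each replicate, a tree is made from every replicate, and support for a grouping is the percentage of replicate trees that contain it. The `build_tree` argument is a hypothetical caller-supplied function (any tree-building method that returns the clades it found as frozensets of names); it is not a real library call.

```python
import random


def bootstrap_alignment(alignment, rng):
    """Resample alignment columns with replacement, keeping the original length."""
    n_sites = len(next(iter(alignment.values())))
    columns = [rng.randrange(n_sites) for _ in range(n_sites)]
    return {name: "".join(seq[i] for i in columns)
            for name, seq in alignment.items()}


def bootstrap_support(alignment, build_tree, clade, replicates=100, seed=0):
    """Percentage (0-100) of replicate trees that contain the given clade.

    build_tree is a hypothetical function: alignment -> collection of clades,
    each clade a frozenset of taxon names.
    """
    rng = random.Random(seed)
    target = frozenset(clade)
    hits = 0
    for _ in range(replicates):
        replicate = bootstrap_alignment(alignment, rng)
        if target in build_tree(replicate):
            hits += 1
    return 100.0 * hits / replicates
```

The same column indices are applied to every sequence, so each replicate is still a valid alignment of the same taxa.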
Bootstrapping
• …
What is a species?
The following terms represent similar concepts and are sometimes used
interchangeably.
• Species = related organisms that share common characteristics and are capable of
interbreeding
• Taxon (plural taxa) = a group of one or more populations of an organism, usually with a name and rank,
and seen by taxonomists to form a unit
• Operational taxonomic unit = usually defined as a cluster of 16S ribosomal
RNA sequences (or other phylogenetic marker genes or concatenations) grouped at a certain
cut-off level of sequence similarity.
• Lineage = temporal series of populations, organisms, cells, or genes connected by a
continuous line of descent from ancestor to descendant, determined by the techniques
of molecular systematics.
• Strain = a genetic variant, a subtype or a culture within a biological species
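To illustrate how operational taxonomic units follow from a similarity cut-off, here is a minimal sketch of greedy clustering on pre-aligned sequences. The 97% default, the function names, and the toy sequences are assumptions for illustration; real OTU-picking tools use faster and more careful methods.

```python
def percent_identity(seq_a, seq_b):
    """Percent of positions that match between two aligned, equal-length sequences."""
    if len(seq_a) != len(seq_b) or not seq_a:
        raise ValueError("sequences must be aligned, non-empty and equal length")
    matches = sum(1 for x, y in zip(seq_a, seq_b) if x == y)
    return 100.0 * matches / len(seq_a)


def greedy_otus(sequences, cutoff=97.0):
    """Assign each sequence to the first OTU whose centroid it matches at >= cutoff."""
    otus = []  # list of (centroid name, [member names])
    for name, seq in sequences.items():
        for centroid, members in otus:
            if percent_identity(sequences[centroid], seq) >= cutoff:
                members.append(name)
                break
        else:
            otus.append((name, [name]))  # no centroid close enough: start a new OTU
    return otus


# Hypothetical 100-base sequences: s2 differs from s1 at 1 site, s3 at 4 sites.
base = "ACGT" * 25
toy_sequences = {"s1": base, "s2": "T" + base[1:], "s3": "TTTTT" + base[5:]}
otus_97 = greedy_otus(toy_sequences, cutoff=97.0)
otus_95 = greedy_otus(toy_sequences, cutoff=95.0)
```

Note that the result depends on the cut-off and on the order in which sequences are processed, which is one reason an OTU is an operational unit rather than a species.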
What is a species?
The species concept in microbes is hotly debated.
• “A species could be described as a monophyletic and genomically coherent
cluster of individual organisms that show a high degree of overall similarity in
many independent characteristics, and is diagnosable by a discriminative
phenotypic property.” (Ref. 9)
• “Species are considered to be an irreducible cluster of organisms diagnosably
different from other such clusters and within which there is a parental pattern of
ancestry and descent.” (Ref. 82)
• “A species is a group of individuals where the observed lateral gene transfer
within the group is much greater than the transfer between groups.” (Ref. 83)
• “Microbes ... do not form natural clusters to which the term ‘species’ can be
universally and sensibly applied.” (Ref. 84)
• “Species are (segments of) metapopulation lineages.” (Ref. 7)
Achtman & Wagner, Nat. Rev. Micro. 2008
Achtman & Wagner, Nat. Rev. Micro. 2008
Species definition should be guided by a method-free
species concept based on cohesive evolutionary forces
Species definitions
• Five types of ecotype models have been described in detail. E1 and E2 represent ecotypes; G1 and G2
represent genotypes. Colours reflect genetic ancestry. Solid lines indicate extant lineages that exist today,
whereas dotted lines indicate extinct lineages that have disappeared owing to overgrowth during episodes
of periodic selection.
Achtman & Wagner, Nat. Rev. Micro. 2008
Species definitions
• …
Achtman & Wagner, Nat. Rev. Micro. 2008
[Figure panels: Salmonella enterica subsp. enterica serovar Typhi; Yersinia pestis; Neisseria meningitidis serogroup A subgroup III]
Operational species definitions
• pairwise DNA re-association values are ≥70% in DNA–DNA
hybridization experiments under standardized conditions and their
∆Tm (melting temperature) is ≤5°C
• 16S ribosomal RNAs (rRNAs) that are ≤98.7% identical are always
members of different species
• strong differences in rRNA correlate with <70% DNA–DNA similarity
• distinct species have been occasionally described with 16S rRNAs that are
>98.7% identical
• multilocus sequence analysis (MLSA) based on multiple (typically 6–8)
protein-coding core genes
• average nucleotide identity (ANI) of all orthologous genes
• …
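The thresholds above can be read as a simple decision rule, sketched below. The function name, argument names, and the order of the checks are assumptions; treating a failed DNA–DNA hybridization criterion as "different species" is also an assumption, since the slide only states the criterion for belonging to the same species.

```python
def species_call(identity_16s=None, ddh_percent=None, delta_tm=None):
    """Apply the operational thresholds to one pair of strains.

    identity_16s: percent identity of the 16S rRNA genes
    ddh_percent:  DNA-DNA re-association value (%)
    delta_tm:     difference in melting temperature (degrees C)
    """
    # 16S rRNAs that are <= 98.7% identical always indicate different species.
    if identity_16s is not None and identity_16s <= 98.7:
        return "different species"
    # Above 98.7% the 16S gene alone cannot decide; use DDH if available.
    if ddh_percent is not None and delta_tm is not None:
        if ddh_percent >= 70.0 and delta_tm <= 5.0:
            return "same species"
        return "different species"  # assumption: failing DDH means different species
    return "undetermined"
```

A pair with highly similar 16S genes and no hybridization data stays "undetermined", which reflects the caveat that distinct species are occasionally described with 16S rRNAs more than 98.7% identical.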
NCBI BLAST 16S ribosomal RNA genes
>GP101
CGGCAGCGGGGGTAGCTTGCTACTTGCCGGCGAGTGGCGAACGGGTGAGTAATACATCGGAACGTGCCCTGTAGTGGGGG
ATAACTAGTCGAAAGACTGGCTAATACCGCATACGACCTGAGGGTGAAAGTGGGGGACCGCAAGGCCTCATGCTATAGGAG
CGGCCGATGTCTGATTAGCTAGTTGGTGGGGTAAAGGCCCACCAAGGCGACGATCAGTAGCTGGTCTGAGAGGACGATCAG
CCACACTGGGACTGAGACACGGCCCAGACTCCTACGGGAGGCAGCAGTGGGGAATTTTGGACAATGGGGGCAACCCTGAT
CCAGCAATGCCGCGTGTGTGAAGAAGGCCTTCGGGTTGTAAAGCACTTTTGTCCGGAAAGAAATCGCTTCGGTTAATACCTG
GAGTGGATGACGGTACCGGAAGAATAAGGACCGGCTAACTACGTGCCAGCAGCCGCGGTAATACGTAGGGTCCAAGCGTTA
ATCGGAATTACTGGGCGTAAAGCGTGCGCAGGCGGTTGTGCAAGACCGATGTGAAATCCCCGGGCTTAACCTGGGAATTGC
ATTGGTGACTGCACGGCTAGAGTGTGTCAGAGGGGGGTAGAATTCCACGTGTAGCAGTGAAATGCGTAGAGATGTGGAGG
AATACCGATGGCGAAGGCAGCCCCCTGGGATAACACTGACGCTCATGCACGAAAGCGTGGGGAGCAAACAGGATTAGATAC
CCTGGTAGTCCACGCCCTAAACGATGTCAACTAGTTGTTGGGGATTCATTTTCTTAGTAACGTAGCTAACGCGTGAAGTTGAC
CGCCTGGGGAGTACGGTCGCAAGATTAAAACTCAAAGGAATTGACGGGGACCCGCACAAGCGGTGGATGATGTGGATTAA
TTCGATGCAACGCGAAAAACCTTACCTACCCTTGACATGCCACTAACGAAGCAGAGATGCATTAGGTGCTCGAAAGAGAAA
GTGGACACAGGTGCTGCATGGCTGTCGTCAGCTCGTGTCGTGAGATGTTGGGTTAAGTCCCGCAACGAGCGCAACCCTTGT
CTCTAGTTGCTACGAAAGGGCACTCTAGAGAGACTGCCGGTGACAAACCGGAGGAAGGTGGGGATGACGTCAAGTCCTCA
TGGCCCTTATGGGTAGGGCTTCACACGTCATACAATGGTGCATACAGAGGGTTGCCAAGCCGCGAGGTGGAGCTAATCCCA
GAAAATGCATCGTAGTCCGGATCGTAGTCTGCAACTCGACTACGTGAAGCTGGAATCGCTAGTAATCGCGGATCAGCATGCC
GCGGTGAATACGTTCCCGGGTCTTGTACACACCGCCCGTCACACCTTGGGAGTGGGCTTTACCAGAAGTAGTTAGCCTAACC
GCAAGGAGGGCGATACCACGTAGT
NCBI BLAST 16S ribosomal RNA genes
• The Basic Local
Alignment Search Tool (BLAST)
finds regions of local similarity
between sequences.
• Default database is ‘nr/nt’, the
non-redundant nucleotide
collection
• Update date: 2021/08/01
• Number of sequences:
72,191,653
• For phylogeny & taxonomy, we
want to use the ribosomal RNA
(rRNA) / internal transcribed
spacer (ITS) database
• 21,856 sequences
What if we cannot detect the usual phylogenetic
marker genes?
• Inferring phylogeny for genomes newly discovered from
metagenomes is useful for identification (aka genotyping)
• 16S ribosomal RNA genes are the “gold standard,” but sometimes
resist assembly due to high degrees of sequence similarity across
lineages
• Any shared genomic trait is a candidate for a phylogenetic marker
• Single copy marker genes
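One way to picture the single-copy marker approach: keep only gene families found exactly once in every genome, then join their alignments end to end. This is a minimal sketch under stated assumptions; the function names, the copy-number table, and the short alignments are hypothetical, and it does not reproduce how any particular tool works.

```python
def single_copy_core_genes(gene_counts):
    """Gene families present exactly once in every genome.

    gene_counts maps genome name -> {gene family -> copy number}.
    """
    genomes = list(gene_counts.values())
    if not genomes:
        return []
    shared = set(genomes[0])
    for counts in genomes[1:]:
        shared &= set(counts)            # must be present in every genome
    return sorted(g for g in shared if all(c[g] == 1 for c in genomes))


def concatenate_markers(alignments, genes):
    """Join per-gene alignments end to end, in the given gene order, per genome."""
    genomes = sorted(alignments[genes[0]])
    return {g: "".join(alignments[gene][g] for gene in genes) for g in genomes}


# Hypothetical copy numbers for three genomes.
toy_counts = {
    "genome1": {"rpoB": 1, "gyrB": 1, "16S": 3, "recA": 1},
    "genome2": {"rpoB": 1, "gyrB": 1, "16S": 1, "recA": 2},
    "genome3": {"rpoB": 1, "gyrB": 1, "16S": 2},
}
markers = single_copy_core_genes(toy_counts)

# Hypothetical per-gene alignments for two of the genomes.
toy_alignments = {
    "gyrB": {"genome1": "AC", "genome2": "AT"},
    "rpoB": {"genome1": "GG", "genome2": "GA"},
}
supermatrix = concatenate_markers(toy_alignments, markers)
```

In this toy table the multi-copy 16S gene is excluded, which mirrors the "multiple copies per genome" drawback of ribosomal RNA genes.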
Single copy marker genes
• ezTree is a program that can extract single
copy genes for phylogenetic analysis
Wu, BMC Genomics. 2018
Taxonomy versus phylogeny
• Taxonomy bins organisms into ranked levels of classification
• Linnaean classification is still used
• Organisms with 97% or greater identity of the 16S rRNA gene are considered the same species
• Organisms with 95% or greater identity of the 16S rRNA gene are considered the same genus
• THERE ARE MANY EXCEPTIONS!
• Linnaean classification
• Kingdom
• Phylum
• Class
• Order
• Family
• Genus
• Species
Linnaeus and Race
• Linnaeus’ work forms one of the 18th-century roots of modern scientific racism.
• Linnaeus was the first naturalist to classify man as an animal in Systema naturae in 1735
• ‘man’ was divided into four “varieties” (he did not use the word “race”)
• based on the then known four continents of the world: Europe, America, Asia and Africa
• By the 10th edition, he expanded this idea to add the four ‘humours’ or temperaments, as
well as a hierarchy of the ‘varieties’
https://www.linnean.org/learning/who-was-linnaeus/linnaeus-and-race
Taxonomy via NCBI
• Order
• Family
• Genus
• Species
• Subspecies
Genome Taxonomy Database (GTDB)
• Phylogenomic classification based on a set of conserved proteins
Insert Genome into Species Tree
• species tree using a set of 49 core, universal genes defined by COG
(Clusters of Orthologous Groups) gene families
• COGs domains used in the estimate of relatedness are listed on the
website. For example:
• GTPase, tRNA synthetases, Ribosomal proteins, and other proteins involved in
Translation, ribosomal structure and biogenesis
• Nucleotide transport and metabolism
• 3-phosphoglycerate kinase [Carbohydrate transport and metabolism]
Observation of Io’s Resurfacing via Plume Deposition Using Ground-based Adapt...
Sérgio Sacani
 

Recently uploaded (20)

general properties of oerganologametal.ppt
general properties of oerganologametal.pptgeneral properties of oerganologametal.ppt
general properties of oerganologametal.ppt
 
erythropoiesis-I_mechanism& clinical significance.pptx
erythropoiesis-I_mechanism& clinical significance.pptxerythropoiesis-I_mechanism& clinical significance.pptx
erythropoiesis-I_mechanism& clinical significance.pptx
 
GBSN- Microbiology (Lab 3) Gram Staining
GBSN- Microbiology (Lab 3) Gram StainingGBSN- Microbiology (Lab 3) Gram Staining
GBSN- Microbiology (Lab 3) Gram Staining
 
Seminar of U.V. Spectroscopy by SAMIR PANDA
 Seminar of U.V. Spectroscopy by SAMIR PANDA Seminar of U.V. Spectroscopy by SAMIR PANDA
Seminar of U.V. Spectroscopy by SAMIR PANDA
 
Nucleic Acid-its structural and functional complexity.
Nucleic Acid-its structural and functional complexity.Nucleic Acid-its structural and functional complexity.
Nucleic Acid-its structural and functional complexity.
 
RNA INTERFERENCE: UNRAVELING GENETIC SILENCING
RNA INTERFERENCE: UNRAVELING GENETIC SILENCINGRNA INTERFERENCE: UNRAVELING GENETIC SILENCING
RNA INTERFERENCE: UNRAVELING GENETIC SILENCING
 
filosofia boliviana introducción jsjdjd.pptx
filosofia boliviana introducción jsjdjd.pptxfilosofia boliviana introducción jsjdjd.pptx
filosofia boliviana introducción jsjdjd.pptx
 
extra-chromosomal-inheritance[1].pptx.pdfpdf
extra-chromosomal-inheritance[1].pptx.pdfpdfextra-chromosomal-inheritance[1].pptx.pdfpdf
extra-chromosomal-inheritance[1].pptx.pdfpdf
 
in vitro propagation of plants lecture note.pptx
in vitro propagation of plants lecture note.pptxin vitro propagation of plants lecture note.pptx
in vitro propagation of plants lecture note.pptx
 
THE IMPORTANCE OF MARTIAN ATMOSPHERE SAMPLE RETURN.
THE IMPORTANCE OF MARTIAN ATMOSPHERE SAMPLE RETURN.THE IMPORTANCE OF MARTIAN ATMOSPHERE SAMPLE RETURN.
THE IMPORTANCE OF MARTIAN ATMOSPHERE SAMPLE RETURN.
 
Unveiling the Energy Potential of Marshmallow Deposits.pdf
Unveiling the Energy Potential of Marshmallow Deposits.pdfUnveiling the Energy Potential of Marshmallow Deposits.pdf
Unveiling the Energy Potential of Marshmallow Deposits.pdf
 
Earliest Galaxies in the JADES Origins Field: Luminosity Function and Cosmic ...
Earliest Galaxies in the JADES Origins Field: Luminosity Function and Cosmic ...Earliest Galaxies in the JADES Origins Field: Luminosity Function and Cosmic ...
Earliest Galaxies in the JADES Origins Field: Luminosity Function and Cosmic ...
 
Structures and textures of metamorphic rocks
Structures and textures of metamorphic rocksStructures and textures of metamorphic rocks
Structures and textures of metamorphic rocks
 
Cancer cell metabolism: special Reference to Lactate Pathway
Cancer cell metabolism: special Reference to Lactate PathwayCancer cell metabolism: special Reference to Lactate Pathway
Cancer cell metabolism: special Reference to Lactate Pathway
 
role of pramana in research.pptx in science
role of pramana in research.pptx in sciencerole of pramana in research.pptx in science
role of pramana in research.pptx in science
 
In silico drugs analogue design: novobiocin analogues.pptx
In silico drugs analogue design: novobiocin analogues.pptxIn silico drugs analogue design: novobiocin analogues.pptx
In silico drugs analogue design: novobiocin analogues.pptx
 
Multi-source connectivity as the driver of solar wind variability in the heli...
Multi-source connectivity as the driver of solar wind variability in the heli...Multi-source connectivity as the driver of solar wind variability in the heli...
Multi-source connectivity as the driver of solar wind variability in the heli...
 
Comparative structure of adrenal gland in vertebrates
Comparative structure of adrenal gland in vertebratesComparative structure of adrenal gland in vertebrates
Comparative structure of adrenal gland in vertebrates
 
Astronomy Update- Curiosity’s exploration of Mars _ Local Briefs _ leadertele...
Astronomy Update- Curiosity’s exploration of Mars _ Local Briefs _ leadertele...Astronomy Update- Curiosity’s exploration of Mars _ Local Briefs _ leadertele...
Astronomy Update- Curiosity’s exploration of Mars _ Local Briefs _ leadertele...
 
Observation of Io’s Resurfacing via Plume Deposition Using Ground-based Adapt...
Observation of Io’s Resurfacing via Plume Deposition Using Ground-based Adapt...Observation of Io’s Resurfacing via Plume Deposition Using Ground-based Adapt...
Observation of Io’s Resurfacing via Plume Deposition Using Ground-based Adapt...
 

07_Phylogeny_2022.pdf

  • 1. Phylogeny MICROBIO 590B Bioinformatics Lab: Bacterial Genomics Professor Kristen DeAngelis UMass Amherst Fall 2022 1
  • 2. Lecture Learning Goals • Define phylogeny, and describe what a phylogenetic tree can reveal about the taxa that they model. • Explain how phylogenetic methods can allow us to make inferences about groups of organisms and ancestors • Describe how to construct a phylogenetic tree, and the complexities that create mistakes. • Contrast the different phylogenetic marker genes or concatenations of genes that are available depending on the sequencing technology. • Define the species concept for microbes. • Make a phylogenetic tree. 2
  • 3. Phylogeny • Phylogeny is a model of evolutionary relationships among species based on sequence similarities. • Phylogeny may also refer to a phylogenetic tree, the illustration of these relationships. Woesian ToL: Pace NR, Science 1997 3
  • 4. Read trees like mobiles 4
  • 5. Read trees like mobiles. In a tree like this, the blue branches have lengths that are meaningful: each length corresponds to an amount of change, which is read against the scale bar. The red distances have lengths that are NOT meaningful. They are spacers whose lengths are only meant to make room for labels or pictures.
  • 7. Who was the last universal common ancestor?
  • 8. The root of the ToL represents the last universal common ancestor
  • 9. The root of the ToL represents the last universal common ancestor • One cannot rely on nucleotide gene sequences alone because these would have mutated beyond recognition • Amino acid sequences change more slowly because synonymous (silent) nucleotide mutations leave the amino acid sequence unchanged • The tertiary folded structure of a protein is even more strongly conserved than the secondary structure
  • 10. Sequence homology • Homologous genes have a shared ancestry. • Orthologs arise because of a speciation event. • Paralogs arise because of a duplication event.
  • 11. Paralogs are used to root the ToL • Elongation Factors duplicated prior to divergence of the three Domains • One gene tree can be rooted with the other gene • Both trees yield the same relationship and are rooted in the same location. 11
  • 12. Root the tree of life using paralogs • The genes for the protein synthesis elongation factors Tu (EF-Tu) and G (EF-G) are the products of an ancient gene duplication, which appears to predate the divergence of all extant organismal lineages. • Most phylogenetic methods place the root of the ToL in the Bacteria • A combined data set of EF-Tu and EF-G sequences favors placement of the eukaryotes within the Archaea, as the sister group to the Crenarchaeota. Baldauf, Palmer, & Doolittle, 1996
  • 13. Protein-based models of evolution 13 Kim and Caetano-Anollés BMC Evolutionary Biology 2011
  • 14. Protein-based models of evolution • Traits here are proteins, NOT DNA sequence • Based on 420 modern organisms, looking for structures that were common to all. • 5 to 11 per cent were universal, conserved enough to have originated in LUCA • This perspective gives us new information about LUCA • LUCA had enzymes to break down and extract energy from nutrients, and some protein-making equipment • LUCA lacked the enzymes for making and reading DNA molecules
  • 15. The root moves depending on what trait you use! [Figure: two trees of Bacteria, Archaea, and Eukaryotes with the root in different positions]
  • 16. The root moves depending on whether you use nucleic acids or protein! • RNA sequence-based rooting of the tree of life puts the root within the Bacteria. • usually derived from analyses of the sequence of ancient gene paralogs, e.g., ATPases, elongation factors • Proteomic analyses of many proteins put the root of the tree of life within the Archaea. • Archaeal rooting has been observed for phylogenetic analyses of tRNA, 5S rRNA, & RNase P [Figure labels: Bacteria, Archaea, Eukaryotes]
  • 17. The last universal common ancestor, aka LUCA • 4 – 3.5 Ga (Ga = 10^9 years ago) • Almost certainly a dispersed population of variable cells • Features • DNA, the universal code, and most genes • Transcription and RNA polymerase • RNAs of all kinds • Translation and translational machinery • Most proteins and metabolisms • Membrane and cellular structure [Figure labels: Bacteria, Archaea, Eukaryotes; root positions marked "LUCA" and "also LUCA!"]
  • 18. So you’re making a phylogenetic tree… • Assume you have chosen which species to analyze • (1) Decide which gene to use … • Ribosomal RNA genes • A concatenation of single copy housekeeping genes
  • 19. SSU ribosomal RNA gene is a common phylogenetic marker + Short, only 1500 base pairs + Information-dense because it is a non-coding, structural RNA + Essential for life so probably not horizontally transferred - Multiple copies per genome - Cannot resolve close relationships. Xie, Tian, Qin, Bu, 2008
  • 20. 20
  • 21. Sensitivity and correlation of hypervariable regions in 16S rRNA genes in phylogenetic analysis • Distance between trees based on sub-regions (V2 through V8) and trees based on all the sub-regions (VT) • Sequence analyses including V4 are favored because of this 21 Yang, Wang, Qian. BMC Bioinformatics, 2016
  • 22. So you’re making a phylogenetic tree… • (2) Align the gene sequences
  • 23. So you’re making a phylogenetic tree… • (2) Align the gene sequences • We want evolutionary distance but it cannot be directly measured, so it must be estimated • Each vertical column in the alignment is a “trait” in calculating the distance matrix • Distance matrix is based on observed (measurable) differences, but we assume parsimony • There can be more than one evolutionary change at a single position (e.g., A → G → U) • Positions can change and change back (A → G → A)
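The idea that each alignment column is a "trait" can be sketched in a few lines of Python. This is only an illustration, not the method used in the lecture: the function name `p_distance` and the choice to skip gapped columns are assumptions. It computes the observed proportion of differing columns, which underestimates true change whenever a position has mutated more than once or mutated back.

```python
def p_distance(seq_a, seq_b, gap="-"):
    """Observed proportion of differing columns between two aligned sequences.

    Columns where either sequence has a gap are skipped (an assumption;
    other tools treat gaps differently).
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    differences = 0
    for x, y in zip(seq_a.upper(), seq_b.upper()):
        if x == gap or y == gap:
            continue  # a gapped column carries no comparable trait here
        compared += 1
        if x != y:
            differences += 1
    if compared == 0:
        raise ValueError("no comparable columns")
    return differences / compared


# Five comparable columns, one mismatch (U vs A); the gapped column is ignored.
print(p_distance("ACGU-A", "ACGAUA"))
```

Because hidden changes such as A → G → A leave no observable difference, this observed value is usually corrected with a substitution model before building a tree.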
  • 24. So you’re making a phylogenetic tree… • (3) Make an evolutionary distance matrix based on sequence similarity, using Jukes-Cantor Method. 24
  • 25. So you’re making a phylogenetic tree… • The Jukes-Cantor method relates sequence similarity to evolutionary distance • If all sequences are the same, distance is zero • Distances increase as sequence similarity decreases; for highly similar sequences, a difference of one or two bases does not change the distance much • The lowest sequence similarity is about 0.25 because unrelated sequences are about 25% similar by chance; there are 4 bases in the genetic code so the chance that one base will match another is 1 in 4
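The Jukes-Cantor relationship can be written as d = -(3/4) ln(1 - (4/3) p), where p is the observed fraction of differing sites. A minimal Python sketch follows; the function name is an assumption for illustration. Note that the distance is undefined once p reaches 0.75, which corresponds to the 25% chance similarity mentioned above.

```python
import math


def jukes_cantor_distance(p):
    """Estimated substitutions per site from the observed fraction of differences p."""
    if not 0 <= p < 0.75:
        # At p >= 0.75 the sequences look no more similar than random ones,
        # so the correction has no finite value.
        raise ValueError("p must satisfy 0 <= p < 0.75")
    return -0.75 * math.log(1 - (4.0 / 3.0) * p)


for p in (0.0, 0.01, 0.1, 0.3, 0.6):
    print(p, round(jukes_cantor_distance(p), 4))
```

For small p the corrected distance is close to p itself, and the correction grows quickly as p approaches 0.75.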
  • 26. So you’re making a phylogenetic tree… • (4) Perform phylogenetic analysis. • This is an example of the neighbor joining method [Figure: distance matrix (%)]
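As a rough sketch of the first step of neighbor joining, the Python below picks which pair of taxa to join from a distance matrix using the standard criterion Q(i, j) = (n - 2) d(i, j) - sum_k d(i, k) - sum_k d(j, k). The function name and the five-taxon matrix are made up for illustration and are not the matrix shown on the slide.

```python
def closest_pair_nj(names, dist):
    """Return the pair of taxa that neighbor joining would join first.

    dist is a symmetric matrix (list of lists) with zeros on the diagonal.
    """
    n = len(names)
    if n < 3:
        raise ValueError("need at least three taxa")
    totals = [sum(row) for row in dist]  # summed distance from each taxon to all others
    best = None
    for i in range(n):
        for j in range(i + 1, n):
            q = (n - 2) * dist[i][j] - totals[i] - totals[j]
            if best is None or q < best[0]:
                best = (q, names[i], names[j])
    return best[1], best[2]


# Hypothetical distance matrix for five taxa.
names = ["a", "b", "c", "d", "e"]
dist = [
    [0, 5, 9, 9, 8],
    [5, 0, 10, 10, 9],
    [9, 10, 0, 8, 7],
    [9, 10, 8, 0, 3],
    [8, 9, 7, 3, 0],
]
print(closest_pair_nj(names, dist))
```

In this example d and e have the smallest raw distance, but the criterion also accounts for how far each taxon is from everything else, so a and b are joined first.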
  • 27. So you’re making a phylogenetic tree… • How can you determine the branch lengths? • In other words, you need to place the node “u”, which defines a common ancestor • You know how far apart a & b are from each other • You know how far apart a is from something else, say c, so measure b from c and you can estimate where node u should be • (5, optional) Create a visualization of the tree. • Let’s look at some nice trees …
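Placing node u from the three pairwise distances can be sketched with the three-point formula. This is a simplified illustration, assuming the distances are additive; full neighbor joining uses averages over all other taxa in place of the single reference taxon c. The function name and example numbers are hypothetical.

```python
def three_point_branch_lengths(d_ab, d_ac, d_bc):
    """Branch lengths from a, b and c to their shared internal node u."""
    a_to_u = (d_ab + d_ac - d_bc) / 2
    b_to_u = (d_ab + d_bc - d_ac) / 2
    c_to_u = (d_ac + d_bc - d_ab) / 2
    return a_to_u, b_to_u, c_to_u


# Hypothetical distances: a-b = 0.3, a-c = 0.5, b-c = 0.6
a_u, b_u, c_u = three_point_branch_lengths(0.3, 0.5, 0.6)
print(a_u, b_u, c_u)
```

The lengths from a and b to u add up to the a-b distance, so u sits on the path between a and b at a position fixed by how far each is from c.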
  • 28. So you’re making a phylogenetic tree… • (4) Perform phylogenetic analysis. Yang & Rannala, Nat Rev Gen, 2012
  • 29. So you’re making a phylogenetic tree… • (4) Perform phylogenetic analysis. Yang & Rannala, Nat Rev Gen, 2012
  • 30. Some nice trees: Metatranscriptomic reconstruction reveals RNA viruses with the potential to shape carbon cycling in soil. Starr et al., 2019
  • 31. A nice tree: bacterial isolates in our lab culture collection. The colored branches are unique for each taxonomic Family, and the colored labels refer to strains that belong to the same Genus. The outer blue/red indicates if each strain is from the heated or control plots. And the stars mean we have a genome sequenced. Choudoir, unpublished
  • 32. So you’re making a phylogenetic tree… • There are many (free) programs to make trees… https://evolution.genetics.washington.edu/phylip/software.html Yang & Rannala, Nat Rev Gen, 2012
  • 33. Tree Construction Complexities 1. Choice of substitution model 2. GC bias 3. Choice of tree-making algorithm 4. Long-branch attraction 5. Bootstrapping
  • 34. Choice of substitution model • Pairwise sequence distances are calculated assuming a Markov chain model of nucleotide substitution. Several commonly used models are illustrated in FIG. 1. Yang & Rannala, Nature Reviews Genetics, 2012
  • 35. “GC bias” • The more GC-rich a region is, the higher the recombination rates. • That means that GC-rich regions, or GC-rich genomes, evolve faster naturally. • Including High GC gram positives (like Actinobacteria) in the same tree as Low GC gram positives (like Firmicutes) can be misleading.
  • 36. Choice of tree algorithm can affect tree structure • Neighbor-joining starts with a radial tree and joins neighbors • Parsimony makes a bunch of trees and finds the one that is the most simple, usually based on the fewest mutations • Maximum likelihood trees are based on probability • the best & most computationally intensive • Bayesian inference starts with random tree structure & random parameters, then iterates until an “optimal” tree is found
  • 37. Long-branch attraction • Very long branches can sometimes cluster artificially • Usually due to bad sequence, poor alignment, or not enough tips • The erroneous new phylogeny implies a common ancestor and can result in different rates of evolution
  • 38. Long-branch attraction • … Yang & Rannala, Nature Reviews Genetics, 2012
  • 39. Long-branch attraction in theory and in practice • Panels a and b show the four-species case by Felsenstein. If the correct tree (T in a) has two long branches separated by a short internal branch, parsimony (as well as model- based methods such as likelihood and Bayesian methods under simplistic models) tends to recover a wrong tree (T2 in b), in which the two long branches are grouped together. • Panels c and d show a similar phenomenon in a real data set, concerning the phylogeny of seed plants. The Gnetales is a morphologically and ecologically diverse group of Gymnosperms including three genera (Ephedra, Gnetum and Welwitschia), but its phylogenetic position has been controversial. • Maximum likelihood analysis of 56 chloroplast proteins produced the GneCup tree (d), in which the Gnetales are grouped with Cupressophyta, apparently owing to a long-branch attraction artefact. • However, the Gnepine tree (c), in which the Gnetales joins the Pinaceae, was inferred by excluding the fastest-evolving 18 proteins as well as three proteins (namely, psbC, rpl2 and rps7) that had experienced many parallel substitutions between the Cryptomeria branch and the branch ancestral to the Gnetales. The Gnepine tree (c) is also supported by two proteins from the nuclear genome and appears to be the correct tree. • Branch lengths and bootstrap proportions are all calculated using RAxML. 39 Yang & Rannala, Nature Reviews Genetics, 2012
  • 40. Bootstrapping • Random sampling of alignment columns with replacement to create new trees • A measure of confidence in each branch of the tree, given your sequence alignment • Numbers are from 0-100, with 100 meaning the branch appeared in every resampled tree
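The resampling step can be sketched as follows. This Python example only builds one bootstrap replicate of an alignment by drawing columns with replacement; the function name and the toy alignment are assumptions, and the tree-building step that would be run on each replicate is left out.

```python
import random


def bootstrap_alignment(alignment, rng):
    """Return one bootstrap replicate: columns drawn with replacement.

    alignment maps sequence names to aligned sequences of equal length.
    """
    lengths = {len(seq) for seq in alignment.values()}
    if len(lengths) != 1:
        raise ValueError("all aligned sequences must have the same length")
    n_cols = lengths.pop()
    # The same column indices are used for every sequence so that each
    # resampled column is still a real column from the original alignment.
    cols = [rng.randrange(n_cols) for _ in range(n_cols)]
    return {name: "".join(seq[i] for i in cols) for name, seq in alignment.items()}


toy = {"a": "ACGU", "b": "ACGA", "c": "UCGA"}  # hypothetical alignment
replicate = bootstrap_alignment(toy, random.Random(0))
print(replicate)
```

In practice one would build a tree from each of many replicates (often 100 or 1000) and report, for each branch of the original tree, the percentage of replicate trees that contain it.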
  • 42. What is a species? The following terms represent similar concepts and are sometimes used interchangeably. • Species = related organisms that share common characteristics and are capable of interbreeding • Taxa = a group of one or more populations of an organism, usually with a name and rank, and seen by taxonomists to form a unit • Operational taxonomic unit = Usually defined as the number of distinct 16S ribosomal RNA sequences (or distinct phylogenetic marker genes or concatenations) at a certain cut-off level of sequence diversity. • Lineage = temporal series of populations, organisms, cells, or genes connected by a continuous line of descent from ancestor to descendant, determined by the techniques of molecular systematics. • Strain = a genetic variant, a subtype or a culture within a biological species 42
  • 43. What is a species? The species concept in microbes is hotly debated. • “A species could be described as a monophyletic and genomically coherent cluster of individual organisms that show a high degree of overall similarity in many independent characteristics, and is diagnosable by a discriminative phenotypic property.” (Ref. 9) • “Species are considered to be an irreducible cluster of organisms diagnosably different from other such clusters and within which there is a parental pattern of ancestry and descent.” (Ref. 82) • “A species is a group of individuals where the observed lateral gene transfer within the group is much greater than the transfer between groups.” (Ref. 83) • “Microbes ... do not form natural clusters to which the term ‘species’ can be universally and sensibly applied.” (Ref. 84) • “Species are (segments of) metapopulation lineages.” (Ref. 7) Achtman & Wagner, Nat. Rev. Micro. 2008
  • 44. Species definition should be guided by a method-free species concept based on cohesive evolutionary forces. Achtman & Wagner, Nat. Rev. Micro. 2008
  • 45. Species definitions • Five types of ecotype models have been described in detail. E1 and E2 represent ecotypes; G1 and G2 represent genotypes. Colours reflect genetic ancestry. Solid lines indicate extant lineages that exist today, whereas dotted lines indicate extinct lineages that have disappeared owing to overgrowth during episodes of periodic selection. Achtman & Wagner, Nat. Rev. Micro. 2008
  • 46. Species definitions • … 46 Achtman & Wagner, Nat. Rev. Micro. 2008 Salmonella enterica subsp. enterica serovar Typhi Yersinia pestis Neisseria meningitidis serogroup A subgroup III
  • 47. Operational species definitions • pairwise DNA re-association values are ≥70% in DNA–DNA hybridization experiments under standardized conditions and their ∆Tm (melting temperature) is ≤5°C • 16S ribosomal RNAs (rRNAs) that are ≤98.7% identical are always members of different species • strong differences in rRNA correlate with <70% DNA–DNA similarity • distinct species have been occasionally described with 16S rRNAs that are >98.7% identical • multilocus sequence analysis (MLSA) based on multiple (typically 6–8) protein-coding core genes • average nucleotide identity (ANI) of all orthologous genes • …
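The 16S rRNA rule above is one-directional, which a small Python sketch can make explicit. The function name and return strings are hypothetical; the 98.7% threshold is the one stated on the slide. Identity at or below the threshold indicates different species, while identity above it is not sufficient on its own to call two strains the same species.

```python
def interpret_16s_identity(percent_identity, threshold=98.7):
    """Interpret a pairwise 16S rRNA percent identity under the slide's rule."""
    if not 0 <= percent_identity <= 100:
        raise ValueError("percent identity must be between 0 and 100")
    if percent_identity <= threshold:
        return "different species"
    # Distinct species have occasionally been described above the threshold,
    # so other evidence (DNA-DNA hybridization, MLSA, ANI) is still needed.
    return "inconclusive"


for value in (95.0, 98.7, 99.5):
    print(value, interpret_16s_identity(value))
```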
  • 48. NCBI BLAST 16S ribosomal RNA genes >GP101 CGGCAGCGGGGGTAGCTTGCTACTTGCCGGCGAGTGGCGAACGGGTGAGTAATACATCGGAACGTGCCCTGTAGTGGGGG ATAACTAGTCGAAAGACTGGCTAATACCGCATACGACCTGAGGGTGAAAGTGGGGGACCGCAAGGCCTCATGCTATAGGAG CGGCCGATGTCTGATTAGCTAGTTGGTGGGGTAAAGGCCCACCAAGGCGACGATCAGTAGCTGGTCTGAGAGGACGATCAG CCACACTGGGACTGAGACACGGCCCAGACTCCTACGGGAGGCAGCAGTGGGGAATTTTGGACAATGGGGGCAACCCTGAT CCAGCAATGCCGCGTGTGTGAAGAAGGCCTTCGGGTTGTAAAGCACTTTTGTCCGGAAAGAAATCGCTTCGGTTAATACCTG GAGTGGATGACGGTACCGGAAGAATAAGGACCGGCTAACTACGTGCCAGCAGCCGCGGTAATACGTAGGGTCCAAGCGTTA ATCGGAATTACTGGGCGTAAAGCGTGCGCAGGCGGTTGTGCAAGACCGATGTGAAATCCCCGGGCTTAACCTGGGAATTGC ATTGGTGACTGCACGGCTAGAGTGTGTCAGAGGGGGGTAGAATTCCACGTGTAGCAGTGAAATGCGTAGAGATGTGGAGG AATACCGATGGCGAAGGCAGCCCCCTGGGATAACACTGACGCTCATGCACGAAAGCGTGGGGAGCAAACAGGATTAGATAC CCTGGTAGTCCACGCCCTAAACGATGTCAACTAGTTGTTGGGGATTCATTTTCTTAGTAACGTAGCTAACGCGTGAAGTTGAC CGCCTGGGGAGTACGGTCGCAAGATTAAAACTCAAAGGAATTGACGGGGACCCGCACAAGCGGTGGATGATGTGGATTAA TTCGATGCAACGCGAAAAACCTTACCTACCCTTGACATGCCACTAACGAAGCAGAGATGCATTAGGTGCTCGAAAGAGAAA GTGGACACAGGTGCTGCATGGCTGTCGTCAGCTCGTGTCGTGAGATGTTGGGTTAAGTCCCGCAACGAGCGCAACCCTTGT CTCTAGTTGCTACGAAAGGGCACTCTAGAGAGACTGCCGGTGACAAACCGGAGGAAGGTGGGGATGACGTCAAGTCCTCA TGGCCCTTATGGGTAGGGCTTCACACGTCATACAATGGTGCATACAGAGGGTTGCCAAGCCGCGAGGTGGAGCTAATCCCA GAAAATGCATCGTAGTCCGGATCGTAGTCTGCAACTCGACTACGTGAAGCTGGAATCGCTAGTAATCGCGGATCAGCATGCC GCGGTGAATACGTTCCCGGGTCTTGTACACACCGCCCGTCACACCTTGGGAGTGGGCTTTACCAGAAGTAGTTAGCCTAACC GCAAGGAGGGCGATACCACGTAGT 48
  • 49. NCBI BLAST 16S ribosomal RNA genes • The Basic Local Alignment Search Tool (BLAST) finds regions of local similarity between sequences. • Default database is ‘nr/nt’, the non-redundant nucleotide collection • Update date: 2021/08/01 • Number of sequences: 72,191,653 • For phylogeny & taxonomy, we want to use the ribosomal RNA (rRNA) internal transcribed spacer (ITS) database • 21,856 sequences
  • 50. What if we cannot detect the usual phylogenetic marker genes? • Inferring phylogeny for genomes newly discovered from metagenomes is useful for identification (aka genotyping) • 16S ribosomal RNA genes are the “gold standard,” but sometimes resist assembly due to high degrees of sequence similarity across lineages • Any shared genomic trait is a candidate for a phylogenetic marker • Single copy marker genes 50
  • 51. Single copy marker genes • ezTree is a program that can extract single copy genes for phylogenetic analysis. Wu, BMC Genomics. 2018
  • 52. Taxonomy versus phylogeny • Taxonomy bins organisms into classified levels • Linnaean classification is still used • Organisms with 97% or greater identity of the 16S rRNA gene are the same species • Organisms with 95% or greater identity of the 16S rRNA gene are the same genus • THERE ARE MANY EXCEPTIONS! • Linnaean classification • Kingdom • Phylum • Class • Order • Family • Genus • Species
  • 53. Linnaeus and Race • Linnaeus’ work forms one of the 18th-century roots of modern scientific racism. • Linnaeus was the first naturalist to classify man as an animal in Systema naturae in 1735 • ‘man’ was divided into four “varieties” (he did not use the word “race”) • based on the then known four continents of the world: Europe, America, Asia and Africa • By the 10th edition, he expanded this idea to add the four ‘humours’ or temperaments, as well as a hierarchy of the ‘varieties’ https://www.linnean.org/learning/who-was-linnaeus/linnaeus-and-race
  • 54. Taxonomy via NCBI • Order • Family • Genus • Species • Subspecies 54
  • 55. Genome Taxonomy Database (GTDB) • Phylogenomic classification based on a set of conserved proteins
  • 56. Insert Genome into Species Tree • species tree using a set of 49 core, universal genes defined by COG (Clusters of Orthologous Groups) gene families • COGs domains used in the estimate of relatedness are listed on the website. For example: • GTPase, tRNA synthetases, Ribosomal proteins, and other proteins involved in Translation, ribosomal structure and biogenesis • Nucleotide transport and metabolism • 3-phosphoglycerate kinase [Carbohydrate transport and metabolism] 56
  • 57. Lecture Learning Goals • Define phylogeny, and describe what a phylogenetic tree can reveal about the taxa that they model. • Explain how phylogenetic methods can allow us to make inferences about groups of organisms and ancestors • Describe how to construct a phylogenetic tree, and the complexities that create mistakes. • Contrast the different phylogenetic marker genes or concatenations of genes that are available depending on the sequencing technology. • Define the species concept for microbes. • Make a phylogenetic tree. 57
  • 58. 58