EVE 161: Microbial Phylogenomics

Lecture #6:
Era II: rRNA PCR and major groups
UC Davis, Winter 2014
Instructor: Jonathan Eisen

Slides for UC Davis EVE161 Course Taught by Jonathan Eisen Winter 2014

Where we are going and where we have been

• Previous lecture:
  5: Era II: rRNA from environment
• Current lecture:
  6: Era II: PCR and major groups
• Next lecture:
  7: Era II: rRNA ecology

FIG. 1.
Evolutionary distance tree of the bacterial
domain showing currently recognized
divisions and putative (candidate) divisions.
The tree was constructed using the ARB
software package (with the Lane mask and
Olsen rate-corrected neighbor-joining
options) and a sequence database modified
from the March 1997 ARB database release
(43). Division-level groupings of two or more
sequences are depicted as wedges. The
depth of the wedge reflects the branching
depth of the representatives selected for a
particular division. Divisions which have
cultivated representatives are shown in black;
divisions represented only by environmental
sequences are shown in outline. The scale
bar indicates 0.1 change per nucleotide. The
aligned, unmasked data sets used for this
figure and Fig. 3 through 6 are available from
http://crab2.berkeley.edu/pacelab/176.htm.

FIG. 2.
Relative representation in selected
cosmopolitan bacterial divisions of 16S rRNA
sequences from cultivated and uncultivated
organisms. Results were compiled from 5,224
and 2,918 sequences from cultivated and
uncultivated organisms, respectively.

FIG. 3.
Phylogenetic dendrogram of the Acidobacterium division. Names of
cultivated organisms are shown in bold. The habitat source of each
environmental sequence is indicated before the clone name. GenBank
accession numbers are listed parenthetically. Subdivisions (see the text)
are indicated by brackets at the right of the tree. Construction of the tree
was as described for Fig. 1. The robustness of the topology presented
was estimated by bootstrap resampling of independent distance,
parsimony, and rate-corrected maximum-likelihood analyses as
previously described (2). Distance and parsimony analyses were
conducted using test version 4.0d61 of PAUP*, written by David L.
Swofford. Branch points supported (bootstrap values of >75%) by most
or all phylogenetic analyses are indicated by filled circles; open circles
indicate branch points marginally supported (bootstrap values of 50 to
74%) by most or all analyses. Branch points without circles are not
resolved (bootstrap values of <50%) as specific groups in different
analyses. The scale bar indicates 0.1 change per nucleotide.
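
A minimal sketch of the circle convention described in the caption above, assuming one bootstrap percentage per analysis (distance, parsimony, maximum likelihood). The function name `support_symbol`, the dictionary layout, reading "most" as a strict majority, and treating exactly 75% as supported are all assumptions made for illustration; the caption itself leaves 75% unassigned.

```python
def support_symbol(bootstrap_by_method):
    """Map per-analysis bootstrap percentages to a caption-style symbol.

    bootstrap_by_method: dict such as {"distance": 98, "parsimony": 80}.
    Returns "filled" (supported), "open" (marginally supported),
    or "none" (not resolved).
    """
    values = list(bootstrap_by_method.values())
    if not values:
        raise ValueError("need at least one bootstrap value")
    if any(v < 0 or v > 100 for v in values):
        raise ValueError("bootstrap values must be percentages in [0, 100]")

    n = len(values)
    # Assumption: "most analyses" means a strict majority of them.
    strong = sum(v >= 75 for v in values)            # 75% counted as supported
    at_least_marginal = sum(v >= 50 for v in values)

    if strong > n / 2:
        return "filled"
    if at_least_marginal > n / 2:
        return "open"
    return "none"


if __name__ == "__main__":
    # Hypothetical values, not taken from the figure.
    node = {"distance": 98, "parsimony": 80, "likelihood": 60}
    print(support_symbol(node))
```

The example values are invented; the real figure reports only the resulting symbols, not the underlying percentages.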

FIG. 4.
Phylogenetic dendrogram of the Verrucomicrobia division.
Names of cultivated organisms are shown in bold. The
habitat source of each environmental sequence is
indicated before the clone name. GenBank accession
numbers are listed parenthetically. Subdivisions (see the
text) are indicated by brackets at the right of the tree.
Tree construction and support for branch points was as
described for Fig. 1 and 3, respectively. The scale bar
indicates 0.1 change per nucleotide.

FIG. 5.
Phylogenetic dendrogram of the GNS division. Names of
cultivated organisms are shown in bold. The habitat source of
each environmental sequence is indicated before the clone
name. GenBank accession numbers are listed parenthetically.
Subdivisions (see the text) are indicated by brackets at the
right of the tree. Tree construction and support for branch
points was as described for Fig. 1 and 3, respectively. The scale
bar indicates 0.1 change per nucleotide.

FIG. 6.
Phylogenetic dendrogram of the OP11
division. The habitat source of each
environmental sequence is indicated before
the clone name. GenBank accession numbers
are listed parenthetically. Subdivisions (see the
text) are indicated by brackets at the right of
the tree. Tree construction and support for
branch points was as described for Fig. 1 and
3, respectively. The four MIM clones and F78
clone are unreleased sequences generously
made available to us by Pascale Durand (10)
and Floyd Dewhirst (8). The scale bar
indicates 0.1 change per nucleotide.

Summary: The intent of this article is to provide a critical assessment of our current
understanding of life's phylogenetic diversity. Phylogenetic comparison of gene
sequences is a natural way to identify microorganisms and can also be used to infer
the course of evolution. Three decades of molecular phylogenetic studies with
various molecular markers have provided the outlines of a universal tree of life
(ToL), the three-domain pattern of archaea, bacteria, and eucarya. The sequence-based
perspective on microbial identification additionally opened the way to the
identification of environmental microbes without the requirement for culture,
particularly through analysis of rRNA gene sequences. Environmental rRNA
sequences, which now far outnumber those from cultivars, expand our knowledge
of the extent of microbial diversity and contribute increasingly heavily to the
emerging ToL. Although the three-domain structure of the ToL is established, the
deep phylogenetic structure of each of the domains remains murky and sometimes
controversial. Obstacles to accurate inference of deep phylogenetic relationships
are both systematic, in molecular phylogenetic calculations, and practical, due to a
paucity of sequence representation for many groups of organisms.

Sequence uncertainty with depth in a phylogenetic tree.

Dashed line, not corrected for
unseen changes; solid line,
corrected for unseen changes
using the following estimation:
inferred sequence change
(Knuc) = −3/4 ln[1 − (4/3)D],
where D is the number of
changes counted (31).

Pace N R Microbiol. Mol. Biol. Rev. 2009;73:565-576
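
The correction in the caption above has the form of the Jukes-Cantor estimate. A minimal sketch follows, assuming D is expressed as the fraction of compared positions that differ (between 0 and 0.75) rather than a raw count; the function name `corrected_change` is hypothetical.

```python
import math


def corrected_change(d):
    """Estimate changes per site (Knuc) from the observed fraction of differences.

    Implements Knuc = -3/4 * ln[1 - (4/3) * D].
    Assumption: d is a proportion of differing sites, 0 <= d < 0.75.
    At d = 0.75 the logarithm is undefined (the sequences look random).
    """
    if not 0.0 <= d < 0.75:
        raise ValueError("d must satisfy 0 <= d < 0.75")
    return -0.75 * math.log(1.0 - (4.0 / 3.0) * d)


if __name__ == "__main__":
    # The gap between observed and corrected change widens with depth.
    for d in (0.05, 0.1, 0.3, 0.5, 0.7):
        print(f"observed {d:.2f} -> corrected {corrected_change(d):.3f}")
```

The corrected value always exceeds the observed one for D > 0 and grows without bound as D approaches 0.75, which is why the solid and dashed lines diverge at depth.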

Chronological accumulation of SSU rRNA sequences.

The
data are derived from the
SILVA 98 SSU Parc database
(52) using the EMBL
taxonomic designations for the
sequences (66). The SILVA
SSU Parc database contains
rRNA sequences that are 300
or more nucleotides in length
and validated as rRNA with
RNAmmer (43). (A)
Accumulation of total,
archaeal, bacterial, and
eucaryal SSU sequences. (B)
Accumulation of rRNA
sequences from cultured and
environmental bacteria. (C)
Accumulation of rRNA
sequences from cultured and
environmental archaea.
Pace N R Microbiol. Mol. Biol. Rev. 2009;73:565-576

A molecular ToL based on rRNA sequence comparisons.

The
diagram compiles the
results of many rRNA
sequence comparisons.
Only a few of the known
lines of descent are
shown.

Pace N R Microbiol. Mol. Biol. Rev. 2009;73:565-576

Distribution of SSU rRNA sequences among the top 12 bacterial phyla.

Shown is
the SSU rRNA sequence
distribution in the SILVA 98
SSU Parc database (52)
among the bacterial phyla
(Ribosomal Database
Project taxonomy) (10)
containing the most rRNA
sequences.

Pace N R Microbiol. Mol. Biol. Rev. 2009;73:565-576

Archaeal rRNA trees with sequences available in 1993 and 2008.

Archaeal SSU rRNA sequences
available in 1993 (classic archaeal tree) (A) and in
2008 (B) were used in maximum likelihood
bootstrap analysis with RAxML (64) as described
previously (56, 57). The boxes represent radiations
within the groups, with the long and short
dimensions reflecting the line segment lengths
within the groups. The sizes of the boxes reflect
sequence representation for the groups. The
numbers at the base of the boxes are bootstrap
percentages. The box labeled Environmental
“Euryarchaeota” is not a phylogenetically coherent
group.

Pace N R Microbiol. Mol. Biol. Rev. 2009;73:565-576

Distribution of SSU rRNA sequences among the top 12 eucaryal phyla.

Shown
is SSU rRNA sequence
distribution in the SILVA 98
SSU Parc database (52)
among the eucaryotic
phyla (EMBL taxonomy
[66]) containing the most
rRNA sequences.

Pace N R Microbiol. Mol. Biol. Rev. 2009;73:565-576

Side issues
• Orthologs and Paralogs
• Unseen changes
• Testing trees
• What we do not know
