Phylogeny driven approaches
to the study of microbial diversity
September 3, 2015
Queenstown Computational Genomics
Conference
Jonathan A. Eisen
@phylogenomics
University of California, Davis
[Figure: PubMed “Microbiome” hits per year, 2000 to 2013; y-axis from 0 to 4,000]
The Rise of the Microbiome
microBIOME or microbiOME
• microbi-OME
• collection of genomes of microbes from a
community (emphasis on OME)
• micro-BIOME
• a community of microbes (emphasis on
BIOME)
• see http://tinyurl.com/definemicrobiome
Not Just About Humans or Hosts
Why Now?
Why Now I: Appreciation of Microbial Diversity
Functional Diversity
Diversity of Form
Phylogenetic Diversity
MICROBES
RUN THE
PLANET
Why Now II: Post Genome Blues
The Microbiome
Transcriptome
Variome
Epigenome
Overselling the Human Genome?
Culturing Count <<<< Observation Count
DNA
Why Now III: CSI-Microbiology Advances
Why Now IV: Sequencing Has Gone Crazy
Sequencing Revolution
•More genes and genomes
•Deeper sequencing
• The rare biosphere
• Relative abundance estimates
•More samples (with barcoding)
• Time series
• Spatially diverse sampling
• Fine scale sampling
Turnbaugh et al Nature. 2006 444(7122):1027-31.
Why Now V: Microbiome Functions
Uses of Phylogeny 1: Species Phylogeny
Woese: Classification of Cultured Taxa by rRNA
rRNA rRNA rRNA
ACUGC
ACCUAU
CGUUCG
ACUCC
AGCUAU
CGAUCG
ACCCC
AGCUCU
CGCUCG
Taxa Characters
S ACUGCACCUAUCGUUCG
R ACUCCACCUAUCGUUCG
E ACUCCAGCUAUCGAUCG
F ACUCCAGGUAUCGAUCG
C ACCCCAGCUCUCGCUCG
W ACCCCAGCUCUGGCUCG
Taxa Characters
S ACUGCACCUAUCGUUCG
E ACUCCAGCUAUCGAUCG
C ACCCCAGCUCUCGCUCG
Eukaryotes Bacteria ????? Archaebacteria Archaea
Isolate Ribosomes
Archaea
Woese: Classification of Cultured Taxa by rRNA PCR
rRNA
rRNA
PCR
rRNA
PCR
ACUGC
ACCUAU
CGUUCG
ACUCC
AGCUAU
CGAUCG
ACCCC
AGCUCU
CGCUCG
Taxa Characters
S ACUGCACCUAUCGUUCG
R ACUCCACCUAUCGUUCG
E ACUCCAGCUAUCGAUCG
F ACUCCAGGUAUCGAUCG
C ACCCCAGCUCUCGCUCG
W ACCCCAGCUCUGGCUCG
Taxa Characters
S ACUGCACCUAUCGUUCG
E ACUCCAGCUAUCGAUCG
C ACCCCAGCUCUCGCUCG
Eukaryotes Bacteria
Isolate DNA
Archaea
rRNA
rRNA
PCR
rRNA
PCR
Eukaryotes Bacteria
Isolate DNA
ACTGC
ACCTAT
CGTTCG
ACTGC
ACCTAT
CGTTCG
ACTGC
ACCTAT
CGTTCG
Taxa Characters
B1 ACTGCACCTATCGTTCG
B2 ACTCCACCTATCGTTCG
E1 ACTCCAGCTATCGATCG
E2 ACTCCAGGTATCGATCG
A1 ACCCCAGCTCTCGCTCG
A2 ACCCCAGCTCTGGCTCG
New1 ACTGCACCTATCGTTCG
Phylotyping via rRNA PCR: One Taxon
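The toy table above can be turned into a small worked example. The sketch below is only an illustration, not the method used in the talk: it assigns a new sequence to the closest reference by Hamming distance over the aligned characters, which is a similarity shortcut rather than a tree-based placement. The function names are hypothetical; the sequences are copied from the slide.

```python
# Toy reference alignment copied from the slide.
# B = Bacteria, E = Eukaryotes, A = Archaea.
REFERENCES = {
    "B1": "ACTGCACCTATCGTTCG",
    "B2": "ACTCCACCTATCGTTCG",
    "E1": "ACTCCAGCTATCGATCG",
    "E2": "ACTCCAGGTATCGATCG",
    "A1": "ACCCCAGCTCTCGCTCG",
    "A2": "ACCCCAGCTCTGGCTCG",
}


def hamming(a, b):
    """Count mismatched positions between two aligned, equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x != y for x, y in zip(a, b))


def nearest_reference(query, references=REFERENCES):
    """Return (name, distance) of the reference closest to the query.

    Ties go to the reference listed first.
    """
    name = min(references, key=lambda ref: hamming(query, references[ref]))
    return name, hamming(query, references[name])


if __name__ == "__main__":
    # New1 from the "One Taxon" slide is identical to B1.
    print(nearest_reference("ACTGCACCTATCGTTCG"))
```

A nearest-match rule like this only says which reference is most similar; it says nothing about where the query branches in a tree, which is the distinction the "Similarity vs. Phylogeny" slide draws.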
Chemosymbiont rRNA Phylotyping
Eisen et al. 1992. J. Bact. 174: 3416
Colleen Cavanaugh
Taxa Characters
B1 ACTGCACCTATCGTTCG
B2 ACTCCACCTATCGTTCG
E1 ACTCCAGCTATCGATCG
E2 ACTCCAGGTATCGATCG
A1 ACCCCAGCTCTCGCTCG
A2 ACCCCAGCTCTGGCTCG
New1 ACCCCAGCTCTGCCTCG
New2 ACTGCACCTATCGTTCG
Archaea Eukaryotes Bacteria
ACTGC
ACCTAT
CGTTCG
ACTGC
ACCTAT
CGTTCG
ACCCC
AGCTCT
CGCTCG
rRNA
rRNA
PCR
rRNA
PCR
Isolate DNA
Phylotyping via rRNA PCR: Two Taxa
ACTGC
ACCTAT
CGTTCG
ACTCC
AGCTAT
CGATCG
ACCCC
AGCTCT
CGCTCG
AGGGG
AGCTCT
CGCTCG
AGGGG
AGCTCT
CGCTCG
ACTGC
ACCTAT
CGTTCG
Taxa Characters
B1 ACTGCACCTATCGTTCG
B2 ACTCCACCTATCGTTCG
E1 ACTCCAGCTATCGATCG
E2 ACTCCAGGTATCGATCG
A1 ACCCCAGCTCTCGCTCG
A2 ACCCCAGCTCTGGCTCG
New1 ACCCCAGCTCTGCCTCG
New2 ACTGCACCTATCGTTCG
New3 ACCCCAGCTCTCGCTCG
New4 AGGGGAGCTCTCGCTCG
Archaea Eukaryotes Bacteria
rRNA
rRNA
PCR
rRNA
PCR
Isolate DNA
Phylotyping via rRNA PCR: Four Taxa
Similarity vs. Phylogeny
Approaching to NGS
Discovery of DNA structure
(Cold Spring Harb. Symp. Quant. Biol. 1953;18:123-31)
1953
Sanger sequencing method by F. Sanger
(PNAS ,1977, 74: 560-564)
1977
PCR by K. Mullis
(Cold Spring Harb Symp Quant Biol. 1986;51 Pt 1:263-73)
1983
Development of pyrosequencing
(Anal. Biochem., 1993, 208: 171-175; Science ,1998, 281: 363-365)
1993
Single molecule emulsion PCR 1998
Human Genome Project
(Nature , 2001, 409: 860–92; Science, 2001, 291: 1304–1351)
Founded 454 Life Science 2000
454 GS20 sequencer
(First NGS sequencer)
2005
Founded Solexa 1998
Solexa Genome Analyzer
(First short-read NGS sequencer)
2006
GS FLX sequencer
(NGS with 400-500 bp read length)
2008
Hi-Seq2000
(200Gbp per Flow Cell)
2010
Illumina acquires Solexa
(Illumina enters the NGS business)
2006
ABI SOLiD
(Short-read sequencer based upon ligation)
2007
Roche acquires 454 Life Sciences
(Roche enters the NGS business)
2007
NGS Human Genome sequencing
(First Human Genome sequencing based upon NGS technology)
2008
From Slideshare presentation of Cosentino Cristian
http://www.slideshare.net/cosentia/high-throughput-equencing
Miseq
Roche Jr
Ion Torrent
PacBio
Oxford
Automation is Critical
AAATCGCTAGCGC
CGGCGAGCTAGC
CGAGCGATCGAGC
CGAGCATCGAGTA
STAP (for rRNA)
An Automated Phylogenetic Tree-Based Small Subunit
rRNA Taxonomy and Alignment Pipeline (STAP)
Dongying Wu1*, Amber Hartman1,6, Naomi Ward4,5, Jonathan A. Eisen1,2,3
1 UC Davis Genome Center, University of California Davis, Davis, California, United States of America, 2 Section of Evolution and Ecology, College of Biological Sciences,
University of California Davis, Davis, California, United States of America, 3 Department of Medical Microbiology and Immunology, School of Medicine, University of
California Davis, Davis, California, United States of America, 4 Department of Molecular Biology, University of Wyoming, Laramie, Wyoming, United States of America,
5 Center of Marine Biotechnology, Baltimore, Maryland, United States of America, 6 The Johns Hopkins University, Department of Biology, Baltimore, Maryland, United
States of America
Abstract
Comparative analysis of small-subunit ribosomal RNA (ss-rRNA) gene sequences forms the basis for much of what we know
about the phylogenetic diversity of both cultured and uncultured microorganisms. As sequencing costs continue to decline
and throughput increases, sequences of ss-rRNA genes are being obtained at an ever-increasing rate. This increasing flow of
data has opened many new windows into microbial diversity and evolution, and at the same time has created significant
methodological challenges. Those processes which commonly require time-consuming human intervention, such as the
preparation of multiple sequence alignments, simply cannot keep up with the flood of incoming data. Fully automated
methods of analysis are needed. Notably, existing automated methods avoid one or more steps that, though
computationally costly or difficult, we consider to be important. In particular, we regard both the building of multiple
sequence alignments and the performance of high quality phylogenetic analysis to be necessary. We describe here our fully-
automated ss-rRNA taxonomy and alignment pipeline (STAP). It generates both high-quality multiple sequence alignments
and phylogenetic trees, and thus can be used for multiple purposes including phylogenetically-based taxonomic
assignments and analysis of species diversity in environmental samples. The pipeline combines publicly-available packages
(PHYML, BLASTN and CLUSTALW) with our automatic alignment, masking, and tree-parsing programs. Most importantly,
this automated process yields results comparable to those achievable by manual analysis, yet offers speed and capacity that
are unattainable by manual efforts.
Citation: Wu D, Hartman A, Ward N, Eisen JA (2008) An Automated Phylogenetic Tree-Based Small Subunit rRNA Taxonomy and Alignment Pipeline (STAP). PLoS
ONE 3(7): e2566. doi:10.1371/journal.pone.0002566
multiple alignment and phylogeny was deemed unfeasible.
However, we believe this can compromise the value of the results.
For example, the delineation of OTUs has also been automated
via tools that do not make use of alignments or phylogenetic trees
(e.g., Greengenes). This is usually done by carrying out pairwise
comparisons of sequences and then clustering sequences that
have better than some cutoff threshold of similarity with each
other. This approach can be powerful (and reasonably efficient)
but it too has limitations. In particular, since multiple sequence
alignments are not used, one cannot carry out standard
phylogenetic analyses. In addition, without multiple sequence
alignments one might end up comparing and contrasting different
regions of a sequence depending on what it is paired with.
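The pairwise-cutoff clustering described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the procedure of any named tool: sequences are assumed to be pre-trimmed to equal length so that identity is a simple position-by-position comparison, and the greedy first-seed rule and the function names are hypothetical.

```python
def identity(a, b):
    """Fraction of identical positions between two equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("this sketch assumes equal-length sequences")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def greedy_otus(seqs, cutoff=0.97):
    """Greedy clustering: each sequence joins the first OTU whose seed it
    matches at >= cutoff identity, otherwise it seeds a new OTU.

    seqs: dict mapping sequence name to sequence string.
    Returns a list of (seed_name, [member names]) in order of creation.
    """
    otus = []
    for name, seq in seqs.items():
        for seed, members in otus:
            if identity(seq, seqs[seed]) >= cutoff:
                members.append(name)
                break
        else:
            # No existing seed was similar enough: start a new OTU.
            otus.append((name, [name]))
    return otus


if __name__ == "__main__":
    toy = {"s1": "ACGTACGTAC", "s2": "ACGTACGTAT", "s3": "TTTTTTTTTT"}
    print(greedy_otus(toy, cutoff=0.9))
```

Note that the result depends on input order and on the cutoff, and that no multiple sequence alignment is produced, so nothing here could feed a standard phylogenetic analysis.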
The limitations of avoiding multiple sequence alignments and
phylogenetic analysis are readily apparent in tools to classify
sequences. For example, the Ribosomal Database Project’s
Classifier program [29] focuses on composition characteristics of
each sequence (e.g., oligonucleotide frequency) and assigns
taxonomy based upon clustering genes by their composition.
Though this is fast and completely automatable, it can be misled in
cases where distantly related sequences have converged on similar
composition, something known to be a major problem in ss-rRNA
sequences [30]. Other taxonomy assignment systems focus
primarily on the similarity of sequences. The simplest of these is
classification tools it does have some limitations. For example,
the generation of new alignments for each sequence is
computationally costly and does not take advantage of available
curated alignments that make use of ss-rRNA secondary structure
to guide the primary sequence alignment. Perhaps most
importantly, however, the tool is not fully automated. In
addition, it does not generate multiple sequence alignments for all
sequences in a dataset which would be necessary for doing many
analyses.
Automated methods for analyzing rRNA sequences are also
available at the web sites for multiple rRNA centric databases,
such as Greengenes and the Ribosomal Database Project (RDPII).
Though these and other web sites offer diverse powerful tools, they
do have some limitations. For example, not all provide multiple
sequence alignments as output and few use phylogenetic
approaches for taxonomy assignments or other analyses. More
importantly, all provide only web-based interfaces and their
integrated software, (e.g., alignment and taxonomy assignment),
cannot be locally installed by the user. Therefore, the user cannot
take advantage of the speed and computing power of parallel
processing such as is available on linux clusters, or locally alter and
potentially tailor these programs to their individual computing
needs (Table 1).
Given the limited automated tools that are available for
Table 1. Comparison of STAP’s computational abilities relative to existing commonly-used ss-rRNA analysis tools.
Feature                                   STAP          ARB      Greengenes  RDP
Installed where?                          Locally       Locally  Web only    Web only
User interface                            Command line  GUI      Web portal  Web portal
Parallel processing                       YES           NO       NO          NO
Manual curation for taxonomy assignment   NO            YES      NO          NO
Manual curation for alignment             NO            YES      NO*         NO
Open source                               YES**         NO       NO          NO
Processing speed                          Fast          Slow     Medium      Medium
It is important to note that STAP is the only software that runs on the command line and can take advantage of parallel processing on linux clusters and, further, is
more amenable to downstream code manipulation.
* Note: Greengenes alignment output is compatible with upload into ARB and downstream manual alignment.
** The STAP program itself is open source; the programs it depends on are freely available but not open source.
doi:10.1371/journal.pone.0002566.t001
ss-rRNA Taxonomy Pipeline
Figure 1. A flow chart of the STAP pipeline.
doi:10.1371/journal.pone.0002566.g001
STAP database, and the query sequence is aligned to them using
the CLUSTALW profile alignment algorithm [40] as described
above for domain assignment. By adapting the profile alignment
algorithm, the alignments from the STAP database remain intact,
while gaps are inserted and nucleotides are trimmed for the query
sequence according to the profile defined by the previous
alignments from the databases. Thus the accuracy and quality of
the alignment generated at this step depends heavily on the quality
of the Bacterial/Archaeal ss-rRNA alignments from the
Greengenes project or the Eukaryotic ss-rRNA alignments from
the RDPII project.
Phylogenetic analysis using multiple sequence alignments rests on
the assumption that the residues (nucleotides or amino acids) at the
same position in every sequence in the alignment are homologous.
Thus, columns in the alignment for which ‘‘positional homology’’
cannot be robustly determined must be excluded from subsequent
analyses. This process of evaluating homology and eliminating
questionable columns, known as masking, typically requires
time-consuming, skillful, human intervention. We designed an
automated masking method for ss-rRNA alignments, thus eliminating this
bottleneck in high-throughput processing.
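To make the idea of masking concrete, here is a minimal sketch. It is a deliberately simple stand-in, not the STAP scoring scheme: a column is kept only if it is not mostly gaps and its most common residue reaches a minimum frequency. The thresholds and function names are assumptions chosen for illustration.

```python
from collections import Counter


def mask_columns(alignment, max_gap_fraction=0.5, min_major_fraction=0.5):
    """Return the indices of alignment columns to keep.

    alignment: list of equal-length strings, with '-' marking gaps.
    A column is dropped if its gap fraction exceeds max_gap_fraction or if
    its most common residue covers less than min_major_fraction of the
    non-gap characters.
    """
    if not alignment:
        return []
    length = len(alignment[0])
    if any(len(row) != length for row in alignment):
        raise ValueError("all rows must have the same length")
    keep = []
    for i in range(length):
        column = [row[i] for row in alignment]
        residues = [c for c in column if c != "-"]
        gap_fraction = 1 - len(residues) / len(column)
        if not residues or gap_fraction > max_gap_fraction:
            continue
        major_count = Counter(residues).most_common(1)[0][1]
        if major_count / len(residues) >= min_major_fraction:
            keep.append(i)
    return keep


def apply_mask(alignment, keep):
    """Build the masked alignment from the retained column indices."""
    return ["".join(row[i] for i in keep) for row in alignment]


if __name__ == "__main__":
    toy = ["ACG-T", "ACGAT", "AC--T", "ATC-A"]
    kept = mask_columns(toy)
    print(kept, apply_mask(toy, kept))
```

The same mask is applied to every sequence, so the retained columns stay positionally comparable across the whole alignment.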
First, an alignment score is calculated for each aligned column
by a method similar to that used in the CLUSTALX package [42].
Specifically, an R-dimensional sequence space representing all the
possible nucleotide character states is defined. Then for each
aligned column, the nucleotide populating that column in each of
the aligned sequences is assigned a score in each of the R
dimensions (Sr) according to the IUB matrix [42]. The consensus
‘‘nucleotide’’ for each column (X) also has R dimensions, with the
Figure 2. Domain assignment. In Step 1, STAP assigns a domain to
each query sequence based on its position in a maximum likelihood
tree of representative ss-rRNA sequences. Because the tree illustrated
here is not rooted, domain assignment would not be accurate and
Dongying Wu
Amber Hartman
Naomi Ward
alignment used to build the profile, resulting in a multiple PD versus PID clustering, 2) to explore overlap between PhylOT
Figure 1. PhylOTU Workflow. Computational processes are represented as squares and databases are represented as cylinders in this generalized
workflow of PhylOTU. See Results section for details.
doi:10.1371/journal.pcbi.1001061.g001
Finding Metagenomic OTU
Sharpton TJ, Riesenfeld SJ, Kembel SW, Ladau J, O'Dwyer JP, Green JL, Eisen JA, Pollard
KS. (2011) PhylOTU: A High-Throughput Procedure Quantifies Microbial Community Diversity
and Resolves Novel Taxa from Metagenomic Data. PLoS Comput Biol 7(1): e1001061.
doi:10.1371/journal.pcbi.1001061
PhylOTU
Tom Sharpton
Katie Pollard
Jessica Green
rRNA PCR: Community Comparisons
Taxa Characters
B1 ACTGCACCTATCGTTCG
B2 ACTCCACCTATCGTTCG
E1 ACTCCAGCTATCGATCG
E2 ACTCCAGGTATCGATCG
A1 ACCCCAGCTCTCGCTCG
A2 ACCCCAGCTCTGGCTCG
New1 ACCCCAGCTCTGCCTCG
New2 ACTGCACCTATCGTTCG
New3 ACCCCAGCTCTCGCTCG
New4 AGGGGAGCTCTCGCTCG
Archaea Eukaryotes Bacteria
rRNA
rRNA
PCR
rRNA
PCR
Isolate DNA
rRNA PCR: Community Comparisons
Taxa Characters
B1 ACTGCACCTATCGTTCG
B2 ACTCCACCTATCGTTCG
E1 ACTCCAGCTATCGATCG
E2 ACTCCAGGTATCGATCG
A1 ACCCCAGCTCTCGCTCG
A2 ACCCCAGCTCTGGCTCG
New1 ACCCCAGCTCTGCCTCG
New2 ACTGCACCTATCGTTCG
New3 ACCCCAGCTCTCGCTCG
New4 AGGGGAGCTCTCGCTCG
rRNA
rRNA
PCR
rRNA
PCR
Isolate DNA
rRNA PCR: Community Comparisons
Hartman et al. BMC Bioinformatics 2010, 11:317
http://www.biomedcentral.com/1471-2105/11/317
Open Access
SOFTWARE
Introducing W.A.T.E.R.S.: a Workflow for the
Alignment, Taxonomy, and Ecology of Ribosomal
Sequences
Amber L Hartman†1,3, Sean Riddle†2, Timothy McPhillips2, Bertram Ludäscher2 and Jonathan A Eisen*1
Abstract
Background: For more than two decades microbiologists have used a highly conserved microbial gene as a
phylogenetic marker for bacteria and archaea. The small-subunit ribosomal RNA gene, also known as 16S rRNA, is
encoded by ribosomal DNA, 16S rDNA, and has provided a powerful comparative tool to microbial ecologists. Over
time, the microbial ecology field has matured from small-scale studies in a select number of environments to massive
collections of sequence data that are paired with dozens of corresponding collection variables. As the complexity of
data and tool sets has grown, the need for flexible automation and maintenance of the core processes of 16S rDNA
sequence analysis has increased correspondingly.
Results: We present WATERS, an integrated approach for 16S rDNA analysis that bundles a suite of publicly available
16S rDNA analysis software tools into a single software package. The "toolkit" includes sequence alignment, chimera
removal, OTU determination, taxonomy assignment, phylogenetic tree construction as well as a host of ecological
analysis and visualization tools. WATERS employs a flexible, collection-oriented 'workflow' approach using the
open-source Kepler system as a platform.
Conclusions: By packaging available software tools into a single automated workflow, WATERS simplifies 16S rDNA
analyses, especially for those without specialized bioinformatics or programming expertise. In addition, WATERS, like
some of the newer comprehensive rRNA analysis tools, allows researchers to minimize the time dedicated to carrying
out tedious informatics steps and to focus their attention instead on the biological interpretation of the results. One
advantage of WATERS over other comprehensive tools is that the use of the Kepler workflow system facilitates result
interpretation and reproducibility via a data provenance sub-system. Furthermore, new "actors" can be added to the
workflow as desired and we see WATERS as an initial seed for a sizeable and growing repository of interoperable,
easy-to-combine tools for asking increasingly complex microbial ecology questions.
Background
Microbial communities and how they are surveyed
Microbial communities abound in nature and are crucial
for the success and diversity of ecosystems. There is no
end in sight to the number of biological questions that
can be asked about microbial diversity on earth. From
animal and human guts to open ocean surfaces and deep
sea hydrothermal vents, to anaerobic mud swamps or
boiling thermal pools, to the tops of the rainforest canopy
and the frozen Antarctic tundra, the composition of
microbial communities is a source of natural history,
intellectual curiosity, and reservoir of environmental
health [1]. Microbial communities are also mediators of
insight into global warming processes [2,3], agricultural
success [4], pathogenicity [5,6], and even human obesity
[7,8].
In the mid-1980s, researchers began to sequence
ribosomal RNAs from environmental samples in order to
characterize the types of microbes present in those
samples (e.g., [9,10]). This general approach was
revolutionized by the invention of the polymerase chain reaction
(PCR), which made it relatively easy to clone and then
* Correspondence: jaeisen@ucdavis.edu
1 Department of Medical Microbiology and Immunology and the Department
of Evolution and Ecology, Genome Center, University of California Davis, One
Shields Avenue, Davis, CA, 95616, USA
† Contributed equally
Full list of author information is available at the end of the article
WATERS - Kepler Workflow for rRNA
genes for ribosomal RNA) in partic-
ubunit ribosomal RNA (ss-rRNA).
ed a large amount of previously
l diversity [1,11-13]. Researchers
all subunit rRNA gene not only
ith which it can be PCR amplified,
has variable and highly conserved
to be universally distributed among
nd it is useful for inferring phyloge-
4,15]. Since then, "cultivation-inde-
" have brought a revolution to the
by allowing scientists to study a
mount of diversity in many different
ments [16-18]. The general premise
Figure 1 Overview of WATERS. Schema of WATERS where white
boxes indicate "behind the scenes" analyses that are performed in WA-
Align
Check chimeras
Cluster
Build Tree
Assign Taxonomy
Tree w/ Taxonomy
Diversity statistics & graphs
Unifrac files
Cytoscape network
OTU table
Motivations
As outlined above, successfully processing microbial
sequence collections is far from trivial. Each step is
complex and usually requires significant bioinformatics
expertise and time investment prior to the biological
interpretation. In order to both increase efficiency and
ensure that all best-practice tools are easily usable, we
sought to create an "all-inclusive" method for performing
all of these bioinformatics steps together in one package.
To this end, we have built an automated, user-friendly,
workflow-based system called WATERS: a Workflow for
the Alignment, Taxonomy, and Ecology of Ribosomal
Sequences (Fig. 1). In addition to being automated and
simple to use, because WATERS is executed in the Kepler
scientific workflow system (Fig. 2) it also has the
advantage that it keeps track of the data lineage and provenance
of data products [23,24].
Automation
The primary motivation in building WATERS was to
minimize the technical, bioinformatics challenges that
arise when performing DNA sequence clustering,
phylogenetic tree, and statistical analyses by automating the
16S rDNA analysis workflow. We also hoped to exploit
additional features that workflow-based approaches
entail, such as optimized execution and data lineage
tracking and browsing [23,25-27]. In the earlier days of
16S rDNA analysis, simply knowing which microbes were
present and whether they were biologically novel was a
noteworthy achievement. It was reasonable and expected,
therefore, to invest a large amount of time and effort to
get to that list of microbes. But now that current efforts
are significantly more advanced and often require
comparison of dozens of factors and variables with datasets of
thousands of sequences, it is not practically feasible to
process these large collections "by hand", and hugely
inefficient if instead automated methods can be successfully
employed.
Broadening the user base
A second motivation and perspective is that by
minimizing the technical difficulty of 16S rDNA analysis through
the use of WATERS, we aim to make the analysis of these
datasets more widely available and allow individuals with
Figure 2 Screenshot of WATERS in Kepler software. Key features: the library of actors un-collapsed and displayed on the left-hand side, the input
and output paths where the user declares the location of their input files and desired location for the results files. Each green box is an individual Kepler
actor that performs a single action on the data stream. The connectors (black arrows) direct and hook up the actors in a defined sequence. Double-
clicking on any actor or connector allows it to be manipulated and re-arranged.
default is 97% and 99%), and they are also generated for
every metadata variable comparison that the user
includes.
Data pruning
To assist in troubleshooting and quality contro
WATERS returns to the user three fasta files of sequenc
Figure 3 Biologically similar results automatically produced by WATERS on published colonic microbiota samples. (A) Rarefaction curves
similar to curves shown in Eckburg et al. Fig. 2; 70-72, indicate patient numbers, i.e., 3 different individuals. (B) Weighted Unifrac analysis based on
phylogenetic tree and OTU data produced by WATERS very similar to Eckburg et al. Fig. 3B. (C) Neighbor-joining phylogenetic tree (Quicktree) representing
the sequences analyzed by WATERS, which is clearly similar to Fig. S1 in Eckburg et al.
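As a reminder of what a rarefaction curve such as panel (A) computes, the sketch below gives the expected number of OTUs observed when a sample is subsampled to a given depth, using the standard analytic (hypergeometric) expectation. It is an illustration only and not the code WATERS runs; the function name and toy counts are hypothetical, and `math.comb` requires Python 3.8 or later.

```python
from math import comb


def rarefaction_curve(otu_counts, depths):
    """Expected number of OTUs seen in random subsamples without replacement.

    otu_counts: number of sequences assigned to each OTU in one sample.
    depths: subsample sizes, each between 0 and the total sequence count.
    An OTU with c sequences is missed by a subsample of size n with
    probability C(N - c, n) / C(N, n), where N is the total count.
    """
    total = sum(otu_counts)
    curve = []
    for n in depths:
        if not 0 <= n <= total:
            raise ValueError("depth must lie between 0 and the total count")
        denom = comb(total, n)
        expected = sum(1 - comb(total - c, n) / denom for c in otu_counts)
        curve.append(expected)
    return curve


if __name__ == "__main__":
    # Hypothetical sample: one OTU with 2 sequences, two singletons.
    print(rarefaction_curve([2, 1, 1], depths=[0, 1, 2, 3, 4]))
```

Plotting the returned values against depth for each sample gives curves that can be compared across individuals at equal sequencing effort.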
[Figure 3 panels: (B) PCA plot with axes labeled "Percent variation explained" and points labeled A to F and S; (C) tree with clades labeled Bacteroidetes, Bacteroidales, Deltaproteobacteria, Actinobacteria, Verrucomicrobia, Epsilonproteobacteria, Firmicutes, Clostridia, Clostridiales, Gammaproteobacteria, Cyanobacteria, Alphaproteobacteria, Fusobacteria, Bacilli, Mollicutes]
Amber Hartman
Tree from Woese. 1987.
Microbiological Reviews 51:221
rRNA Not Perfect
Nothing is Perfect
rRNA Phylogeny Copy # Correction
Kembel SW, Wu M, Eisen JA, Green JL (2012) Incorporating 16S Gene Copy Number Information
Improves Estimates of Microbial Diversity and Abundance. PLoS Comput Biol 8(10): e1002743.
doi:10.1371/journal.pcbi.1002743
Steven Kembel
Jessica Green
Martin Wu
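The basic arithmetic behind a copy number correction can be sketched as follows. This is a simplified illustration, not the procedure from the paper cited above: it assumes a 16S copy number is already known for every taxon (how those numbers are obtained is outside this sketch), and the function name and toy values are hypothetical.

```python
def correct_for_copy_number(read_counts, copy_numbers):
    """Convert 16S read counts into copy-number-corrected relative abundances.

    read_counts: dict mapping taxon to number of 16S reads.
    copy_numbers: dict mapping taxon to 16S gene copies per genome.
    Each count is divided by the taxon's copy number, then the results are
    renormalized so that they sum to 1.
    """
    adjusted = {taxon: read_counts[taxon] / copy_numbers[taxon]
                for taxon in read_counts}
    total = sum(adjusted.values())
    if total == 0:
        raise ValueError("no reads to normalize")
    return {taxon: value / total for taxon, value in adjusted.items()}


if __name__ == "__main__":
    # Hypothetical example: equal read counts, unequal copy numbers.
    reads = {"taxonA": 100, "taxonB": 100}
    copies = {"taxonA": 1, "taxonB": 4}
    print(correct_for_copy_number(reads, copies))
```

In this toy case the two taxa look equally abundant by raw reads, but after correction the single-copy taxon accounts for four fifths of the estimated cells.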
Tree Complications 1
rRNA rRNA rRNA
ACUGC
ACCUAU
CGUUCG
ACUCC
AGCUAU
CGAUCG
ACCCC
AGCUCU
CGCUCG
Taxa Characters
S ACUGCACCUAUCGUUCG
R ACUCCACCUAUCGUUCG
E ACUCCAGCUAUCGAUCG
F ACUCCAGGUAUCGAUCG
C ACCCCAGCUCUCGCUCG
W ACCCCAGCUCUGGCUCG
Taxa Characters
S ACUGCACCUAUCGUUCG
E ACUCCAGCUAUCGAUCG
C ACCCCAGCUCUCGCUCG
Euks Bacteria Arch
Isolate Ribosomes
Arch
Tree Complications 2
rRNA rRNA rRNA
ACUGC
ACCUAU
CGUUCG
ACUCC
AGCUAU
CGAUCG
ACCCC
AGCUCU
CGCUCG
Taxa Characters
S ACUGCACCUAUCGUUCG
R ACUCCACCUAUCGUUCG
E ACUCCAGCUAUCGAUCG
F ACUCCAGGUAUCGAUCG
C ACCCCAGCUCUCGCUCG
W ACCCCAGCUCUGGCUCG
Taxa Characters
S ACUGCACCUAUCGUUCG
E ACUCCAGCUAUCGAUCG
C ACCCCAGCUCUCGCUCG
Euks Bacteria Arch
Isolate Ribosomes
Arch
Tree Complications 3
rRNA rRNA rRNA
ACUGC
ACCUAU
CGUUCG
ACUCC
AGCUAU
CGAUCG
ACCCC
AGCUCU
CGCUCG
Taxa Characters
S ACUGCACCUAUCGUUCG
R ACUCCACCUAUCGUUCG
E ACUCCAGCUAUCGAUCG
F ACUCCAGGUAUCGAUCG
C ACCCCAGCUCUCGCUCG
W ACCCCAGCUCUGGCUCG
Taxa Characters
S ACUGCACCUAUCGUUCG
E ACUCCAGCUAUCGAUCG
C ACCCCAGCUCUCGCUCG
Euks Bacteria Arch
Isolate Ribosomes
Arch
Automated Accurate Genome Tree
Lang JM, Darling AE, Eisen JA (2013) Phylogeny of
Bacterial and Archaeal Genomes Using Conserved
Genes: Supertrees and Supermatrices. PLoS ONE
8(4): e62510. doi:10.1371/journal.pone.0062510
Jenna Lang
Aaron Darling
AMPHORA
Martin Wu
Metagenomics
metagenomics
ACUGC
ACCUAU
CGUUCG
ACUCC
AGCUAU
CGAUCG
ACCCC
AGCUCU
CGCUCG
Taxa Characters
S ACUGCACCUAUCGUUCG
R ACUCCACCUAUCGUUCG
E ACUCCAGCUAUCGAUCG
F ACUCCAGGUAUCGAUCG
C ACCCCAGCUCUCGCUCG
W ACCCCAGCUCUGGCUCG
Taxa Characters
S ACUGCACCUAUCGUUCG
E ACUCCAGCUAUCGAUCG
C ACCCCAGCUCUCGCUCG
Eukaryotes Bacteria Archaea
inputs of fixed carbon or nitrogen from external sources. As with
Leptospirillum group I, both Leptospirillum group II and III have the
genes needed to fix carbon by means of the Calvin–Benson–Bassham
cycle (using type II ribulose 1,5-bisphosphate
carboxylase–oxygenase). All genomes recovered from the AMD system
contain formate hydrogenlyase complexes. These, in combination
with carbon monoxide dehydrogenase, may be used for carbon
fixation via the reductive acetyl coenzyme A (acetyl-CoA) pathway
by some, or all, organisms. Given the large number of ABC-type
sugar and amino acid transporters encoded in the Ferroplasma type
Figure 4 Cell metabolic cartoons constructed from the annotation of 2,180 ORFs
identified in the Leptospirillum group II genome (63% with putative assigned function) and
1,931 ORFs in the Ferroplasma type II genome (58% with assigned function). The cell
cartoons are shown within a biofilm that is attached to the surface of an acid mine
drainage stream (viewed in cross-section). Tight coupling between ferrous iron oxidation,
pyrite dissolution and acid generation is indicated. Rubisco, ribulose 1,5-bisphosphate
carboxylase–oxygenase. THF, tetrahydrofolate.
articles
NATURE | doi:10.1038/nature02340 | www.nature.com/nature
© 2004 Nature Publishing Group
Culture Independent “Metagenomics”
DNA DNA DNA
Taxa Characters
B1 ACTGCACCTATCGTTCG
B2 ACTCCACCTATCGTTCG
E1 ACTCCAGCTATCGATCG
E2 ACTCCAGGTATCGATCG
A1 ACCCCAGCTCTCGCTCG
A2 ACCCCAGCTCTGGCTCG
New1 ACCCCAGCTCTGCCTCG
New2 AGGGGAGCTCTGCCTCG
New3 ACTCCAGCTATCGATCG
New4 ACTGCACCTATCGTTCG
RecA RecA RecA
http://genomebiology.com/2008/9/10/R151 Genome Biology 2008, Volume 9, Issue 10, Article R151 Wu and Eisen R151.7
Genome Biology 2008, 9:R151
sequences are not conserved at the nucleotide level [29]. As a
result, the nr database does not actually contain many more
protein marker sequences that can be used as references than
those available from complete genome sequences.
Comparison of phylogeny-based and similarity-based phylotyping
Although our phylogeny-based phylotyping is fully
automated, it still requires many more steps than, and is slower
than, similarity-based phylotyping methods such as
MEGAN [30]. Is it worth the trouble? Similarity-based
phylotyping works by searching a query sequence against a
reference database such as NCBI nr and deriving taxonomic
information from the best matches or 'hits'. When species
that are closely related to the query sequence exist in the
reference database, similarity-based phylotyping can work well.
However, if the reference database is a biased sample or if it
contains no closely related species to the query, then the top
hits returned could be misleading [31]. Furthermore,
similarity-based methods require an arbitrary similarity cut-off
value to define the top hits. Because individual bacterial
genomes and proteins can evolve at very different rates, a
universal cut-off that works under all conditions does not exist.
As a result, the final results can be very subjective.
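The dependence on an arbitrary cut-off is easy to see in a small sketch. The code below implements a plain best-hit rule, which is a deliberate simplification and not the algorithm of MEGAN or any other named tool; the function name, taxon names in the example, and percent identities are hypothetical.

```python
def best_hit_taxonomy(hits, min_identity):
    """Assign the taxon of the top-scoring database hit, if it passes a cutoff.

    hits: list of (taxon, percent_identity) pairs from a similarity search.
    min_identity: the cutoff; hits below it are treated as uninformative.
    Returns the taxon name, or None when no hit passes the cutoff.
    """
    if not hits:
        return None
    taxon, percent_identity = max(hits, key=lambda hit: hit[1])
    return taxon if percent_identity >= min_identity else None


if __name__ == "__main__":
    # Hypothetical search results for one query sequence.
    hits = [("Prochlorococcus marinus", 92.0), ("Synechococcus sp.", 88.5)]
    print(best_hit_taxonomy(hits, min_identity=90.0))
    print(best_hit_taxonomy(hits, min_identity=95.0))
```

The same hits give a species-level call under one cutoff and no call under the other, and neither choice accounts for how densely the surrounding lineage is sampled in the reference database.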
In contrast, our tree-based bracketing algorithm places the
query sequence within the context of a phylogenetic tree and
only assigns it to a taxonomic level if that level has adequate
sampling (see Materials and methods [below] for details of
the algorithm). With the well sampled species
Prochlorococcus marinus, for example, our method can distinguish closely
related organisms and make taxonomic identifications at the
species level. Our reanalysis of the Sargasso Sea data placed
672 sequences (3.6% of the total) within a P. marinus clade.
On the other hand, for sparsely sampled clades such as
Aquifex, assignments will be made only at the phylum level.
Thus, our phylogeny-based analysis is less susceptible to data
sampling bias than a similarity based approach, and it makes
Figure 3. Major phylotypes identified in Sargasso Sea metagenomic data. The metagenomic data previously obtained from the Sargasso Sea was reanalyzed using
AMPHORA and the 31 protein phylogenetic markers. The microbial diversity profiles obtained from individual markers are remarkably consistent. The
breakdown of the phylotyping assignments by markers and major taxonomic groups is listed in Additional data file 5.
[Figure 3 plot: y-axis, relative abundance (0 to 0.8); x-axis, taxonomic groups: Alphaproteobacteria, Betaproteobacteria, Gammaproteobacteria, Deltaproteobacteria, Epsilonproteobacteria, Unclassified proteobacteria, Bacteroidetes, Chlamydiae, Cyanobacteria, Acidobacteria, Thermotogae, Fusobacteria, Actinobacteria, Aquificae, Planctomycetes, Spirochaetes, Firmicutes, Chloroflexi, Chlorobi, Unclassified bacteria; one series per marker gene: dnaG, frr, infC, nusA, pgk, pyrG, rplA, rplB, rplC, rplD, rplE, rplF, rplK, rplL, rplM, rplN, rplP, rplS, rplT, rpmA, rpoB, rpsB, rpsC, rpsE, rpsI, rpsJ, rpsK, rpsM, rpsS, smpB, tsf]
RpoB RpoB RpoB
Rpl4 Rpl4 Rpl4
rRNA rRNA rRNA
Hsp70 Hsp70 Hsp70
EFTu EFTu EFTu
Many other genes
better than rRNA
AMPHORA
AMPHORA
Phylotyping w/ Protein Markers
AMPHORA
http://genomebiology.com/2008/9/10/R151 Genome Biology 2008, Volume 9, Issue 10, Article R151 Wu and Eisen R151.7
Martin Wu
GOS 1
GOS 2
GOS 3
GOS 4
GOS 5
Phylogenetic ID of Novel Lineages
Dongying Wu
Wu D, Wu M, Halpern A, Rusch DB,
Yooseph S, Frazier M, et al. (2011)
Stalking the Fourth Domain in
Metagenomic Data: Searching for,
Discovering, and Interpreting Novel, Deep
Branches in Marker Gene Phylogenetic
Trees. PLoS ONE 6(3): e18011. doi:
10.1371/journal.pone.0018011
Phylogenetic Diversity of Metagenomes
typically used as a qualitative measure because duplicate sequences
are usually removed from the tree. However, the
test may be used in a semiquantitative manner if all clones,
even those with identical or near-identical sequences, are included
in the tree (13).
Here we describe a quantitative version of UniFrac that we
call “weighted UniFrac.” We show that weighted UniFrac behaves
similarly to the FST test in situations where both a
FIG. 1. Calculation of the unweighted and the weighted UniFrac
measures. Squares and circles represent sequences from two different
environments. (a) In unweighted UniFrac, the distance between the
circle and square communities is calculated as the fraction of the
branch length that has descendants from either the square or the circle
environment (black) but not both (gray). (b) In weighted UniFrac,
branch lengths are weighted by the relative abundance of sequences in
the square and circle communities; square sequences are weighted
twice as much as circle sequences because there are twice as many total
circle sequences in the data set. The width of branches is proportional
to the degree to which each branch is weighted in the calculations, and
gray branches have no weight. Branches 1 and 2 have heavy weights
since the descendants are biased toward the square and circles, respectively.
Branch 3 contributes no value since it has an equal contribution
from circle and square sequences after normalization.
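The two measures described in the caption can be sketched in a few lines of Python. This is a minimal illustration, not the published implementation: the `Branch` record, the toy tree, and the function names are all hypothetical, and the weighted form shown is assumed to be the raw (unnormalized) sum of branch length times the absolute difference in relative abundance. The toy tree has 2 square and 4 circle sequences, so each square counts twice as much as each circle, as in the caption.

```python
from dataclasses import dataclass


@dataclass
class Branch:
    length: float  # branch length
    n_square: int  # square-environment sequences descending from this branch
    n_circle: int  # circle-environment sequences descending from this branch


def unweighted_unifrac(branches):
    """Fraction of total branch length whose descendants come from one environment only."""
    unique = sum(b.length for b in branches
                 if (b.n_square > 0) != (b.n_circle > 0))
    total = sum(b.length for b in branches
                if b.n_square > 0 or b.n_circle > 0)
    return unique / total


def weighted_unifrac(branches, total_square, total_circle):
    """Branch lengths weighted by the difference in relative abundance (raw form, assumed)."""
    return sum(
        b.length * abs(b.n_square / total_square - b.n_circle / total_circle)
        for b in branches
    )


# Hypothetical toy tree: two identical subtrees, each holding 1 square and 2 circles.
half = [
    Branch(1.0, 1, 2),  # mixed internal branch: equal after normalization, weight 0
    Branch(1.0, 1, 0),  # leaf: one square sequence
    Branch(0.5, 0, 2),  # internal branch above two circle sequences
    Branch(0.5, 0, 1),  # leaf: circle
    Branch(0.5, 0, 1),  # leaf: circle
]
toy_tree = half + half

unweighted = unweighted_unifrac(toy_tree)
weighted = weighted_unifrac(toy_tree, total_square=2, total_circle=4)
print(unweighted, weighted)
```

The mixed branch plays the role of "Branch 3" in the caption: it carries half of the squares and half of the circles, so it adds nothing to the weighted measure even though it still counts as shared length in the unweighted one.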
Kembel SW, Eisen JA, Pollard KS, Green JL (2011) The Phylogenetic Diversity of
Metagenomes. PLoS ONE 6(8): e23214. doi:10.1371/journal.pone.0023214
Jessica
Green
Steven
Kembel
Katie
Pollard
Phylosift
[Figure: PhyloSift workflow. Each input sequence is scanned against both workflows (rRNA workflow and protein workflow).]
Input Sequences
LAST: fast candidate search (search input against references)
hmmalign: multiple alignment (profile HMMs used to align candidates to reference alignment; parallel option)
Infernal: multiple alignment (the figure also shows branches labeled <600 bp and >600 bp)
pplacer: phylogenetic placement
Taxonomic Summaries: Krona plots, number of reads placed for each marker gene
Sample Analysis & Comparison: Edge PCA, tree visualization, Bayes factor tests
Aaron Darling
@koadman
Erik Matsen
@ematsen
Holly Bik
@hollybik
Guillaume Jospin
@guillaumejospin
Darling AE, Jospin G, Lowe E,
Matsen FA IV, Bik HM, Eisen JA.
(2014) PhyloSift: phylogenetic
analysis of genomes and
metagenomes. PeerJ 2:e243
http://dx.doi.org/10.7717/peerj.243
Erik Lowe
Edge PCA: Identify
lineages that explain most
variation among samples
Edge PCA - Matsen and Evans 2013
Output: Edge PCA
Using Phylogeny 2: Functional Prediction
PHYLOGENETIC PREDICTION OF GENE FUNCTION
[Figure: flowchart of the method, worked through for EXAMPLE A (genes 1 to 6) and EXAMPLE B (genes 1A, 2A, 3A, 1B, 2B, 3B) across Species 1, Species 2, and Species 3]
CHOOSE GENE(S) OF INTEREST
IDENTIFY HOMOLOGS
ALIGN SEQUENCES
CALCULATE GENE TREE
OVERLAY KNOWN FUNCTIONS ONTO TREE
INFER LIKELY FUNCTION OF GENE(S) OF INTEREST
ACTUAL EVOLUTION (ASSUMED TO BE UNKNOWN)
Figure annotations: Duplication, Duplication?, Ambiguous
Based on Eisen, 1998 Genome Res 8: 163-167.
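The last two steps of the flowchart (overlay known functions onto the tree, then infer the likely function of the gene of interest) can be sketched as below. This is a simplified illustration under stated assumptions, not the method from the cited paper: the tree is a nested tuple, the function labels are invented, and the rule used is "take the smallest clade around the query that contains annotated genes; predict their function if they agree, otherwise report the call as ambiguous".

```python
def leaves(tree):
    """Return all leaf names under a nested-tuple tree (a leaf is a string)."""
    if isinstance(tree, str):
        return [tree]
    names = []
    for child in tree:
        names.extend(leaves(child))
    return names


def clades_containing(tree, query):
    """Return the clades that contain the query, ordered from smallest to largest."""
    if isinstance(tree, str):
        return [tree] if tree == query else []
    for child in tree:
        found = clades_containing(child, query)
        if found:
            return found + [tree]
    return []


def predict_function(tree, query, known):
    """Predict the query's function from annotated genes in its closest clade."""
    for clade in clades_containing(tree, query):
        functions = {known[g] for g in leaves(clade) if g != query and g in known}
        if len(functions) == 1:
            return functions.pop()
        if len(functions) > 1:
            return "ambiguous"
    return "unknown"


# Gene names follow the slide (paralog groups A and B in species 1 to 3);
# the function labels are hypothetical.
gene_tree = ((("1A", "2A"), "3A"), (("1B", "2B"), "3B"))
known = {"1A": "function A", "3A": "function A",
         "1B": "function B", "2B": "function B", "3B": "function B"}

prediction = predict_function(gene_tree, "2A", known)
ambiguous_case = predict_function(("q", ("a", "b")), "q", {"a": "F1", "b": "F2"})
print(prediction, ambiguous_case)
```

Walking outward from the query is what separates this from a best-hit similarity search: the prediction comes from the query's position in the tree, so a duplication that splits the family into A and B groups does not mislead it.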
Phylogenomics
Overlaying Functions onto Tree
[Figure: gene tree of MutS homologs. Labeled subfamilies: MSH1, MSH2, MSH3, MSH4, MSH5, MSH6, MutS1, MutS2. Tip labels are abbreviated species names: Aquae, Trepa, Rat, Fly, Xenla, Mouse, Human, Yeast, Neucr, Arath, Borbu, Synsp, Neigo, Thema, Strpy, Bacsu, Ecoli, Theaq, Deira, Chltr, Spombe, Celeg, Metth, Helpy, mSaco.]
Based on Eisen, 1998 Nucl Acids Res 26: 4291-4300.
Phylogenomics ~~ Phylotyping
Eisen et al. 1992. J. Bact. 174: 3416
Proteorhodopsin Functional Diversity
Venter et al., Science 304: 66. 2004
Shotmap
[Figure: Shotmap evaluation workflow; steps are numbered 1 to 9 in the original figure]
Inputs: Taxonomic profiles from real metagenomes; IMG/ER reference genomes; Protein family database
Construct mock community
Annotate genes in genomes
Expected abundance of gene families
Simulate metagenomic library
Translate metagenomic reads
Search metagenomic peptides
Classify metagenomic peptides
Estimate protein family abundance
Evaluate estimation accuracy
Tom Sharpton
Katie Pollard
https://github.com/sharpton/shotmap
Functional Prediction from Metagenomes
DNA
Taxa Characters
B1 ACTGCACCTATCGTTCG
B2 ACTCCACCTATCGTTCG
E1 ACTCCAGCTATCGATCG
E2 ACTCCAGGTATCGATCG
A1 ACCCCAGCTCTCGCTCG
A2 ACCCCAGCTCTGGCTCG
New1 ACCCCAGCTCTGCCTCG
New2 AGGGGAGCTCTGCCTCG
New3 ACTCCAGCTATCGATCG
New4 ACTGCACCTATCGTTCG
inputs of fixed carbon or nitrogen from external sources. As with
Leptospirillum group I, both Leptospirillum group II and III have the
genes needed to fix carbon by means of the Calvin–Benson–Bassham
cycle (using type II ribulose 1,5-bisphosphate carboxylase–oxygenase).
All genomes recovered from the AMD system
contain formate hydrogenlyase complexes. These, in combination
with carbon monoxide dehydrogenase, may be used for carbon
fixation via the reductive acetyl coenzyme A (acetyl-CoA) pathway
by some, or all, organisms. Given the large number of ABC-type
sugar and amino acid transporters encoded in the Ferroplasma type
Figure 4 Cell metabolic cartoons constructed from the annotation of 2,180 ORFs
identified in the Leptospirillum group II genome (63% with putative assigned function) and
1,931 ORFs in the Ferroplasma type II genome (58% with assigned function). The cell
cartoons are shown within a biofilm that is attached to the surface of an acid mine
drainage stream (viewed in cross-section). Tight coupling between ferrous iron oxidation,
pyrite dissolution and acid generation is indicated. Rubisco, ribulose 1,5-bisphosphate
carboxylase–oxygenase. THF, tetrahydrofolate.
NATURE | doi:10.1038/nature02340 | www.nature.com/nature © 2004 Nature Publishing Group
Phylogenetic Prediction of Function
• Many powerful and automated similarity based
methods for assigning genes to protein families
• COGs
• PFAM HMM searches
• Some limitations of similarity based methods can be
overcome by phylogenetic approaches
• Automated methods now available
• Sean Eddy
• Steven Brenner
• Kimmen Sjölander
• But …
Carboxydothermus hydrogenoformans
• Isolated from a Russian hotspring
• Thermophile (grows at 80°C)
• Anaerobic
• Grows very efficiently on CO (Carbon
Monoxide)
• Produces hydrogen gas
• Low GC Gram positive (Firmicute)
• Genome Determined (Wu et al. 2005
PLoS Genetics 1: e65. )
Homologs of Sporulation Genes
Wu et al. 2005 PLoS
Genetics 1: e65.
Carboxydothermus sporulates
Wu et al. 2005 PLoS Genetics 1: e65.
Non-Homology Predictions:
Phylogenetic Profiling
• Step 1: Search all genes in
organisms of interest against all
other genomes
• Ask: Yes or No, is each gene
found in each other species
• Cluster genes by distribution
patterns (profiles)
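The profiling steps above can be sketched as follows. This is a minimal illustration that assumes the homology search has already been run and summarized as a mapping from each gene to the genomes in which a homolog was found; the gene names, genome names, and function names are placeholders, and genes are grouped only when their profiles are identical (a real analysis might cluster by profile distance instead).

```python
from collections import defaultdict


def build_profiles(hits, genomes):
    """Turn search hits into yes/no (1/0) presence profiles, one entry per genome."""
    return {
        gene: tuple(int(genome in present) for genome in genomes)
        for gene, present in hits.items()
    }


def cluster_by_profile(profiles):
    """Group genes that share exactly the same distribution pattern."""
    clusters = defaultdict(list)
    for gene, profile in profiles.items():
        clusters[profile].append(gene)
    return dict(clusters)


# Hypothetical search results: gene -> genomes containing a homolog.
genomes = ["genome1", "genome2", "genome3"]
hits = {
    "geneA": {"genome1", "genome2"},
    "geneB": {"genome1", "genome2"},
    "geneC": {"genome3"},
    "geneD": {"genome1", "genome2", "genome3"},
}

profiles = build_profiles(hits, genomes)
clusters = cluster_by_profile(profiles)
for profile, genes in clusters.items():
    print(profile, genes)
```

Genes that land in the same cluster (here `geneA` and `geneB`) are candidates for a shared role even when they show no sequence similarity to each other, which is the point of a non-homology prediction.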
Sporulation Gene Profile
Wu et al. 2005 PLoS Genetics 1: e65.
B. subtilis new sporulation genes
J Bacteriol. 2013 Jan;195(2):253-60. doi: 10.1128/JB.01778-12
Bjorn Traag
Richard Losick
Phylogenetic Profiling for Metagenomics?
Using Phylogeny 3: Linking Function and Phylogeny
HiC Crosslinking & Sequencing
Beitel CW, Froenicke L, Lang JM, Korf IF, Michelmore
RW, Eisen JA, Darling AE. (2014) Strain- and plasmid-
level deconvolution of a synthetic metagenome by
sequencing proximity ligation products. PeerJ 2:e415
http://dx.doi.org/10.7717/peerj.415
Table 1 Species alignment fractions. The number of reads aligning to each replicon present in the
synthetic microbial community are shown before and after filtering, along with the percent of total
constituted by each species. The GC content (“GC”) and restriction site counts (“#R.S.”) of each replicon,
species, and strain are shown. Bur1: B. thailandensis chromosome 1. Bur2: B. thailandensis chromosome
2. Lac0: L. brevis chromosome, Lac1: L. brevis plasmid 1, Lac2: L. brevis plasmid 2, Ped: P. pentosaceus,
K12: E. coli K12 DH10B, BL21: E. coli BL21. An expanded version of this table can be found in Table S2.
Sequence  Alignment   % of Total  Filtered    % of aligned  Length     GC     #R.S.
Lac0      10,603,204  26.17%      10,269,562  96.85%        2,291,220  0.462  629
Lac1      145,718     0.36%       145,478     99.84%        13,413     0.386  3
Lac2      691,723     1.71%       665,825     96.26%        35,595     0.385  16
Lac       11,440,645  28.23%      11,080,865  96.86%        2,340,228  0.46   648
Ped       2,084,595   5.14%       2,022,870   97.04%        1,832,387  0.373  863
BL21      12,882,177  31.79%      2,676,458   20.78%        4,558,953  0.508  508
K12       9,693,726   23.92%      1,218,281   12.57%        4,686,137  0.507  568
E. coli   22,575,903  55.71%      3,894,739   17.25%        9,245,090  0.51   1076
Bur1      1,886,054   4.65%       1,797,745   95.32%        2,914,771  0.68   144
Bur2      2,536,569   6.26%       2,464,534   97.16%        3,809,201  0.672  225
Bur       4,422,623   10.91%      4,262,279   96.37%        6,723,972  0.68   369
Figure 1 Hi-C insert distribution. The distribution of genomic distances between Hi-C read pairs is
shown for read pairs mapping to each chromosome. For each read pair the minimum path length on
the circular chromosome was calculated and read pairs separated by less than 1000 bp were discarded.
The 2.5 Mb range was divided into 100 bins of equal size and the number of read pairs in each bin
was recorded for each chromosome. Bin values for each chromosome were normalized to sum to 1 and
plotted.
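The procedure in the Figure 1 caption (minimum path length on a circular chromosome, a 1000 bp cutoff, 100 equal bins over 2.5 Mb, normalization to sum to 1) can be sketched as below. The function names and the example read pairs are hypothetical; the chromosome length 4,686,137 is the E. coli K12 length from Table 1. Dropping pairs at or beyond the 2.5 Mb range is an assumption of this sketch, since the caption does not say how such pairs were handled.

```python
def circular_distance(pos_a, pos_b, chrom_len):
    """Minimum path length between two positions on a circular chromosome."""
    d = abs(pos_a - pos_b) % chrom_len
    return min(d, chrom_len - d)


def insert_distribution(pairs, chrom_len, min_sep=1000,
                        max_range=2_500_000, n_bins=100):
    """Bin read-pair distances and normalize the bin counts to sum to 1."""
    counts = [0] * n_bins
    bin_width = max_range / n_bins
    for pos_a, pos_b in pairs:
        d = circular_distance(pos_a, pos_b, chrom_len)
        if d < min_sep or d >= max_range:
            continue  # too close, or outside the binned range (assumed handling)
        counts[int(d // bin_width)] += 1
    total = sum(counts)
    return [c / total for c in counts] if total else counts


K12_LENGTH = 4_686_137  # E. coli K12 DH10B length, from Table 1

# Hypothetical mapped read pairs (coordinates of the two ends).
pairs = [
    (100, 600),              # 500 bp apart: discarded (< 1000 bp)
    (0, 4_686_037),          # 100 bp apart across the origin: discarded
    (10, 30_010),            # 30,000 bp apart
    (1_000_000, 1_010_000),  # 10,000 bp apart
    (0, 4_676_137),          # 10,000 bp apart across the origin
]

distribution = insert_distribution(pairs, K12_LENGTH)
print(distribution[:3])
```

Using the circular distance matters for pairs that straddle coordinate 0: treated linearly, the last pair would look like a 4.7 Mb insert rather than a 10 kb one.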
E. coli K12 genome were distributed in a similar manner as previously reported (Fig. 1;
(Lieberman-Aiden et al., 2009)). We observed a minor depletion of alignments spanning
the linearization point of the E. coli K12 assembly (e.g., near coordinates 0 and 4686137)
due to edge effects induced by BWA treating the sequence as a linear chromosome rather
than circular.
Figure 2 Metagenomic Hi-C associations. The log-scaled, normalized number of Hi-C read pairs
associating each genomic replicon in the synthetic community is shown as a heat map (see color scale,
blue to yellow: low to high normalized, log scaled association rates). Bur1: B. thailandensis chromosome
1. Bur2: B. thailandensis chromosome 2. Lac0: L. brevis chromosome, Lac1: L. brevis plasmid 1, Lac2:
L. brevis plasmid 2, Ped: P. pentosaceus, K12: E. coli K12 DH10B, BL21: E. coli BL21.
reference assemblies of the members of our synthetic microbial community with the same
alignment parameters as were used in the top ranked clustering (described above). We first
Figure 3 Contigs associated by Hi-C reads. A graph is drawn with nodes depicting contigs and edges
depicting associations between contigs as indicated by aligned Hi-C read pairs, with the count thereof
depicted by the weight of edges. Nodes are colored to reflect the species to which they belong (see legend)
with node size reflecting contig size. Contigs below 5 kb and edges with weights less than 5 were excluded.
Contig associations were normalized for variation in contig size.
typically represent the reads and variant sites as a variant graph wherein variant sites are
represented as nodes, and sequence reads define edges between variant sites observed in
the same read (or read pair). We reasoned that variant graphs constructed from Hi-C
data would have much greater connectivity (where connectivity is defined as the mean
path length between randomly sampled variant positions) than graphs constructed from
mate-pair sequencing data, simply because Hi-C inserts span megabase distances. Such
Figure 4 Hi-C contact maps for replicons of Lactobacillus brevis. Contact maps show the number of
Hi-C read pairs associating each region of the L. brevis genome. The L. brevis chromosome (Lac0, (A),
Chris Beitel
@datscimed
Aaron Darling
@koadman
Pink Berries
PB-PSB1
(Purple sulfur bacteria)
PB-SRB1
(Sulfate reducing bacteria)
(sulfate)
(sulfide)
Wilbanks, E.G. et al (2014). Environmental Microbiology
Lizzy Wilbanks
@lizzywilbanks
Long Reads Help, A Lot
Hiseq & Miseq
100-250 bp
Moleculo
2-20 kb
Pacbio RSII
2-20kb
Micky Kertesz,
Tim Blauwcamp
Meredith Ashby
Cheryl Heiner
Illumina-based
“synthetic long
reads”
Real-time single
molecule
sequencing
(p4-c2, p5-c3)
295 Megabases 474 Megabases 61 Gigabases
Using Phylogeny 4: Better Reference Data
PhyEco Markers
Phylogenetic group     Genome Number  Gene Number  Marker Candidates
Archaea                62             145415       106
Actinobacteria         63             267783       136
Alphaproteobacteria    94             347287       121
Betaproteobacteria     56             266362       311
Gammaproteobacteria    126            483632       118
Deltaproteobacteria    25             102115       206
Epsilonproteobacteria  18             33416        455
Bacteroides            25             71531        286
Chlamydiae             13             13823        560
Chloroflexi            10             33577        323
Cyanobacteria          36             124080       590
Firmicutes             106            312309       87
Spirochaetes           18             38832        176
Thermi                 5              14160        974
Thermotogae            9              17037        684
Wu D, Jospin G, Eisen JA (2013) Systematic Identification of Gene Families
for Use as “Markers” for Phylogenetic and Phylogeny-Driven Ecological
Studies of Bacteria and Archaea and Their Major Subgroups. PLoS ONE
8(10): e77033. doi:10.1371/journal.pone.0077033
Better Protein Families
[Figure 1: SFams construction workflow, panels A to C]
Representative Genomes → Extract Protein Annotation → All v. All BLAST → Homology Clustering (MCL) → SFams
SFams → Align & Build HMMs → HMMs
New Genomes → Extract Protein Annotation → Screen for Homologs (with the HMMs)
Sharpton et al. 2012. BMC Bioinformatics, 13(1), 264.
Diverse Reference Genomes
Microbial Dark Matter Part 2
• Ramunas Stepanauskas
• Tanja Woyke
• Jonathan Eisen
• Duane Moser
• Tullis Onstott
Phylogeny Isn’t Everything .. Model Systems
Wu et al. 2006 PLoS Biology 4: e188.
Baumannia makes vitamins and cofactors
Sulcia makes amino acids
Simple Symbioses
Phylogenetic Binning
Nancy Moran
Dongying Wu
Drosophila microbiome w/ Kopp Lab
Both natural surveys and laboratory
experiments indicate that host diet
plays a major role in shaping the
Drosophila bacterial microbiome.
Laboratory strains provide only a
limited model of natural host–microbe
interactions
Jenna Lang Angus Chandler
Rice Microbiome w/ Sundar Lab
Edwards et al. 2015. Structure, variation,
and assembly of the root-associated
microbiomes of rice. PNAS
Supplementary Figures
Fig. S1. Map depicting soil collection locations for greenhouse experiment.
Fig. S2. Sampling and collection of the rhizocompartments. Roots are collected from rice
plants and soil is shaken off the roots to leave ~1 mm of soil around the roots. The ~1 mm of soil
three separate rhizocompartments: the rhizosphere, rhizoplane,
and endosphere (Fig. 1A). Because the root microbiome has
been shown to correlate with the developmental stage of the
plant (10), the root-associated microbial communities were
sampled at 42 d (6 wk), when rice plants from all genotypes were
well-established in the soil but still in their vegetative phase of
growth. For our study, the rhizosphere compartment was com-
Fig. 1. Root-associated microbial communities are separable by rhizocompartment and soil type. (A) A representation of a rice root cross-section depicting the locations of the microbial communities sampled. (B) Within-sample diversity (α-diversity) measurements between rhizospheric compartments indicate a decreasing gradient in microbial diversity from the rhizosphere to the endosphere independent of soil type. Estimated species richness was calculated as $e^{\text{Shannon entropy}}$. The horizontal bars within boxes represent median. The tops and bottoms of boxes represent 75th and 25th quartiles, respectively. The upper and lower whiskers extend 1.5× the interquartile range from the upper edge and lower edge of the box, respectively. All outliers are plotted as individual points. (C) PCoA using the WUF metric indicates that the largest separation between microbial communities is spatial proximity to the root (PCo 1) and the second largest source of variation is soil type (PCo 2). (D) Histograms of phyla abundances in each compartment and soil. B, bulk soil; E, endosphere; P, rhizoplane; S, rhizosphere; Sac, Sacramento.
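The "estimated species richness" in panel B is the exponential of the Shannon entropy of a sample's OTU counts. A minimal sketch, assuming natural logarithms (implied by the base e) and using made-up OTU counts and hypothetical function names:

```python
import math


def shannon_entropy(counts):
    """Shannon entropy (natural log) of a vector of OTU counts."""
    total = sum(counts)
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)


def estimated_richness(counts):
    """Estimated species richness, e raised to the Shannon entropy."""
    return math.exp(shannon_entropy(counts))


# Hypothetical samples: one perfectly even, one dominated by a single OTU.
even_sample = [10, 10, 10, 10]
skewed_sample = [37, 1, 1, 1]

even_richness = estimated_richness(even_sample)
skewed_richness = estimated_richness(skewed_sample)
print(even_richness, skewed_richness)
```

For a perfectly even sample the value equals the number of OTUs; as one OTU comes to dominate it falls toward 1, so it reads as an "effective" number of equally abundant OTUs rather than a raw count.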
2 of 10 | www.pnas.org/cgi/doi/10.1073/pnas.1414592112
gate the relationship between rice ge-
icrobiome, domesticated rice varieties
rated growing regions were tested. Six
spanning two species within the Oryza
2 d in the greenhouse before sampling.
a) cultivars M104, Nipponbare (both
ties), IR50, and 93-11 (both indica va-
gside two cultivars of African cultivated
g7102 (Glab B) and TOg7267 (Glab E).
ed that rice genotype accounted for
ariation between microbial communities
% of the variance, P < 0.001; Dataset
f the variance, P < 0.066; Dataset S5H);
ntations for clustering patterns of the
nt on the first two axes of unconstrained
ppendix, Fig. S10). We then used CAP
ffect of rice genotype on the microbial
g on rice cultivar and controlling for
and technical factors, we found that ge-
ice have a significant effect on root-
mmunities (5.1%, P = 0.005, WUF, Fig.
, UUF, SI Appendix, Fig. S11A). Ordi-
AP analysis revealed clustering patterns
only partially consistent with genetic
F and UUF metrics. The two japonica
her and the two O. glaberrima cultivars
ver, the indica cultivars were split, with
O. glaberrima cultivars and IR50 clus-
cultivars.
enotypic effect manifests in individual
eparated the whole dataset to focus on
vidually and conducted CAP analysis
and technical factors. The rhizosphere
eight sites were operated under two cultivation practices: organic
cultivation and a more conventional cultivation practice termed
“ecofarming” (see below). Because genotype explained the least
variance in the greenhouse data, we limited the analysis to one
cultivar, S102, a California temperate japonica variety that is
widely cultivated by commercial growers and is closely related to
M104 (26). Field samples were collected from vegetatively
growing rice plants in flooded fields and the previously defined
rhizocompartments were analyzed as before. Unfortunately,
collection of bulk soil controls for the field experiment was not
Fig. 3. Host plant genotype significantly affects microbial communities in the rhizospheric compartments. (A) Ordination of CAP analysis using the WUF metric constrained to rice genotype. (B) Within-sample diversity measurements of rhizosphere samples of each cultivar grown in each soil. Estimated species richness was calculated as $e^{\text{Shannon entropy}}$. The horizontal bars within boxes represent median. The tops and bottoms of boxes represent 75th and 25th quartiles, respectively. The upper and lower whiskers extend 1.5× the interquartile range from the upper edge and lower edge of the box, respectively. All outliers are plotted as individual points.
fields are too high to find representative soil that is unlikely to be affected by nearby plants. Amplification and sequencing of the field microbiome samples yielded 13,349,538 high-quality sequences (median: 54,069 reads per sample; range: 12,535–148,233 reads per sample; Dataset S13). The sequences were clustered into OTUs using the same criteria as the greenhouse experiment, yielding 222,691 microbial OTUs and 47,983 OTUs with counts >5 across the field dataset.
We found that the microbial diversity of field rice plants is significantly influenced by the field site. α-Diversity measurements of the field rhizospheres indicated that the cultivation site significantly impacts microbial diversity (SI Appendix, Fig. S14A, P = 2.00E-16, ANOVA and Dataset S14). Unconstrained PCoA using both the WUF and UUF metrics showed that microbial communities separated by field site across the first axis (Fig. 4B, WUF and SI Appendix, Fig. S14B, UUF). PERMANOVA agreed with the unconstrained PCoA in that field site explained the largest proportion of variance between the microbial communities for field plants (30.4% of variance, P < 0.001, WUF, Dataset S5O and 26.6% of variance, P < 0.001, UUF, Dataset S5P). CAP analysis constrained to field site and controlled for rhizocompartment, cultivation practice, and technical factors (sequencing batch and biological replicate) agreed with the PERMANOVA results in that the field site explains the largest proportion of variance between the root-associated microbial communities in field plants (27.3%, P = 0.005, WUF, SI Appendix, Fig. S15A and 28.9%, P = 0.005, UUF, SI Appendix, Fig. S15E), suggesting that geographical factors may shape root-associated microbial communities.
Rhizospheric Compartmentalization Is Retained in Field Plants. Similar to the greenhouse plants, the rhizospheric microbiomes of field plants are distinguishable by compartment. α-Diversity of the field plants again showed that the rhizosphere had the highest microbial diversity, whereas the endosphere had the least
S15). PCoA
the WUF a
compartmen
Appendix, F
separation i
ond largest
(20.76%, P
UUF, Data
biomes cons
trolled for f
agreed with
variance bet
compartmen
and 10.9%,
Taxonomi
overall sim
Chloroflexi,
microbiota.
endosphere
Proteobacteri
and Plancto
distribution
trend from t
Appendix, Fi
We again
OTUs in the
S16). We fo
endosphere c
representing
Fig. S17). Th
the genus A
and Alphap
terestingly, 1
found to b
greenhouse
OTUs were
sisted of tax
and Myxoco
bidopsis roo
Cultivation Pr
The rice fiel
practices, org
tion called
farming in th
are all perm
harvest fumi
itself does si
partments ov
a significant
the rhizocom
indicating th
affected diffe
the rhizosph
practice, with
zospheres th
Dataset S14)
crobial comm
tests; Datase
practices are
the WUF m
S14D). PER
Fig. 4. Root-associated microbiomes from field-grown plants are separable by cultivation site, rhizospheric compartment, and cultivation practice. (A)
Variation w/in Plant
Cultivation Site Effects
Rice Genotype Effects
and mitochondrial) reads to analyze microbial abundance in the endosphere over time (Fig. 6A). Using this technique, we confirmed the sterility of seedling roots before transplantation. We found that microbial penetrance into the endosphere occurred at or before 24 h after transplantation and that the proportion of microbial reads to organellar reads increased over the first 2 wk after transplantation (Fig. 6A). To further support the evidence for microbiome acquisition within the first 24 h, we sampled root endospheric microbiomes from sterilely germinated seedlings before transplanting into Davis field soil as well as immediately after transplantation and 24 h after transplantation (SI Appendix, Fig. S24). The root endospheres of sterilely germinated seedlings, as well as seedlings transplanted into Davis field soil for 1 min, both had a very low percentage of microbial reads compared with organellar reads (0.22% and 0.71%), with the differences not statistically significant (P = 0.1, Wilcoxon test). As before, endospheric microbial abundance increased significantly, by >10-fold after 24 h in field soil (3.95%, P = 0.05, Wilcoxon test). We conclude that brief soil contact does not strongly increase the proportion of microbial reads, and therefore the increase in microbial reads at 24 h is indicative of endophyte acquisition within 1 d after transplantation.
α-Diversity significantly varied by rhizocompartment (P < 2E-16; Dataset S23) and there was a significant interaction between rhizocompartment and collection time (P = 0.042; Dataset S23); however, when each rhizocompartment was analyzed individually, the bulk soil was the only compartment that showed
(13 d) approach the endosphere and rhizoplane microbiome compositions for plants that have been grown in the greenhouse for 42 d.
There are slight shifts in the distribution of phyla over time; however, there are significant distinctions between the compartments starting as early as 24 h after transplantation into soil (Fig. 6D, SI Appendix, Figs. S24B and S26, and Dataset S24). Because each phylum consists of diverse OTUs that could exhibit very different behaviors during acquisition, we next examined the dynamics and colonization patterns of specific OTUs within the time-course experiment. The core set of 92 endosphere-enriched OTUs obtained from the previous greenhouse experiment (SI Appendix, Fig. S9C) was analyzed for relative abundances at different time points (Fig. 6E). Of the 92 core endosphere-enriched microbes present in the greenhouse experiment, 53 OTUs were detectable in the endosphere in the time-course experiment. The average abundance profile over time revealed a colonization pattern for the core endospheric microbiome. Relative abundance of the core endosphere-enriched microbiome peaks early (3 d) in the rhizosphere and then decreases back to a steady, low level for the remainder of the time points. Similarly, the rhizoplane profile shows an increase after 3 d with a peak at 8 d with a decline at 13 d. The endosphere generally follows the rhizoplane profile, except that relative abundance is still increasing at 13 d. These results suggest that the core endospheric microbes are first attracted to the rhizosphere and then locate to the rhizoplane, where they attach
Fig. 5. OTU coabundance network reveals modules of OTUs associated with methane cycling. (A) Subset of the entire network corresponding to 11
modules with methane cycling potential. Each node represents one OTU and an edge is drawn between OTUs if they share a Pearson correlation of
greater than or equal to 0.6. (B) Depiction of module 119 showing the relationship between methanogens, syntrophs, methanotrophs, and other
methane cycling taxonomies. Each node represents one OTU and is labeled by the presumed function of that OTU’s taxonomy in methane cycling. An
edge is drawn between two OTUs if they have a Pearson correlation of greater than or equal to 0.6. (C) Mean abundance profile for OTUs in module 119
across all rhizocompartments and field sites. The position along the x axis corresponds to a different field site. Error bars represent SE. The x and y axes
represent no particular scale.
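The edge rule in the Fig. 5 caption (connect two OTUs when their Pearson correlation is at least 0.6) can be sketched as below. The OTU names and abundance values are invented, the helper names are hypothetical, and the sketch stops at the edge list; grouping edges into modules is not shown.

```python
import math
from itertools import combinations


def pearson(x, y):
    """Pearson correlation of two equal-length abundance vectors."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    if sd_x == 0 or sd_y == 0:
        return 0.0  # a constant profile has no defined correlation
    return cov / (sd_x * sd_y)


def coabundance_edges(abundance, threshold=0.6):
    """Return (otu_a, otu_b, r) for every OTU pair with r >= threshold."""
    edges = []
    for (name_a, x), (name_b, y) in combinations(sorted(abundance.items()), 2):
        r = pearson(x, y)
        if r >= threshold:
            edges.append((name_a, name_b, r))
    return edges


# Hypothetical relative abundances of three OTUs across four samples.
abundance = {
    "otu1": [1.0, 2.0, 3.0, 4.0],
    "otu2": [2.0, 4.1, 5.9, 8.0],  # rises with otu1
    "otu3": [4.0, 3.0, 2.0, 1.0],  # falls as otu1 rises
}

edges = coabundance_edges(abundance)
for name_a, name_b, r in edges:
    print(name_a, name_b, round(r, 3))
```

Because the threshold applies to the signed correlation, strongly anti-correlated pairs such as `otu1` and `otu3` get no edge; only OTUs that rise and fall together are linked.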
Function x Genotype
of magnitude greater than in any single plant species to date. Under controlled greenhouse conditions, the rhizocompartments described the largest source of variation in the microbial communities sampled (Dataset S5A). The pattern of separation between the microbial communities in each compartment is consistent with a spatial gradient from the bulk soil across the rhizosphere and rhizoplane into the endosphere (Fig. 1C). Similarly, microbial diversity patterns within samples hold the same pattern where there is a gradient in α-diversity from the rhizosphere to the endosphere (Fig. 1B). Enrichment and depletion of certain microbes across the rhizocompartments indicates that microbial colonization of rice roots is not a passive process and that plants have the ability to select for certain microbial consortia or that some microbes are better at filling the root colonizing niche. Similar to studies in Arabidopsis, we found that the relative abundance of Proteobacteria is increased in the endosphere compared with soil, and that the relative abundances of Acidobacteria and Gemmatimonadetes decrease from the soil to the endosphere (9–11), suggesting that the distribution of different bacterial phyla inside the roots might be similar for all land plants (Fig. 1D and Dataset S6). Under controlled greenhouse conditions, soil type described the second largest source of variation within the microbial communities of each sample. However, the soil source did not affect the pattern of separation between the rhizospheric compartments, suggesting that the rhizocompartments exert a recruitment effect on microbial consortia independent of the microbiome source.
By using differential OTU abundance analysis in the compartments, we observed that the rhizosphere serves an enrichment role for a subset of microbial OTUs relative to bulk soil (Fig. 2). Further, the majority of the OTUs enriched in the rhizosphere are simultaneously enriched in the rhizoplane and/or endosphere of rice roots (Fig. 2B and SI Appendix, Fig. S16B), consistent with a recruitment model in which factors produced by the root attract taxa that can colonize the endosphere. We found that the rhizoplane, although enriched for OTUs that are also
Time Series
Acknowledgements
DOE JGI Sloan GBMF NSF
DHS DARPA
Aaron Darling
Lizzy Wilbanks
Jenna Lang
Russell Neches
Rob Knight
Jack Gilbert
Tanja Woyke
Rob Dunn
Katie Pollard
Jessica Green
Darlene Cavalier
Eddy Rubin
Wendy Brown
Dongying Wu
Phil Hugenholtz
DSMZ
Sundar
Srijak Bhatnagar
David Coil
Alex Alexiev
Hannah Holland-Moritz
Holly Bik
John Zhang
Holly Menninger
Guillaume Jospin
David Lang
Cassie Ettinger
Tim Harkins
Jennifer Gardy
Holly Ganz

More Related Content

What's hot

Introduction to 16S Microbiome Analysis
Introduction to 16S Microbiome AnalysisIntroduction to 16S Microbiome Analysis
Recently uploaded (20)

bordetella pertussis.................................ppt
bordetella pertussis.................................pptbordetella pertussis.................................ppt
bordetella pertussis.................................ppt
 
Nucleic Acid-its structural and functional complexity.
Nucleic Acid-its structural and functional complexity.Nucleic Acid-its structural and functional complexity.
Nucleic Acid-its structural and functional complexity.
 
ESR spectroscopy in liquid food and beverages.pptx
ESR spectroscopy in liquid food and beverages.pptxESR spectroscopy in liquid food and beverages.pptx
ESR spectroscopy in liquid food and beverages.pptx
 
ANAMOLOUS SECONDARY GROWTH IN DICOT ROOTS.pptx
ANAMOLOUS SECONDARY GROWTH IN DICOT ROOTS.pptxANAMOLOUS SECONDARY GROWTH IN DICOT ROOTS.pptx
ANAMOLOUS SECONDARY GROWTH IN DICOT ROOTS.pptx
 
如何办理(uvic毕业证书)维多利亚大学毕业证本科学位证书原版一模一样
如何办理(uvic毕业证书)维多利亚大学毕业证本科学位证书原版一模一样如何办理(uvic毕业证书)维多利亚大学毕业证本科学位证书原版一模一样
如何办理(uvic毕业证书)维多利亚大学毕业证本科学位证书原版一模一样
 
aziz sancar nobel prize winner: from mardin to nobel
aziz sancar nobel prize winner: from mardin to nobelaziz sancar nobel prize winner: from mardin to nobel
aziz sancar nobel prize winner: from mardin to nobel
 
BREEDING METHODS FOR DISEASE RESISTANCE.pptx
BREEDING METHODS FOR DISEASE RESISTANCE.pptxBREEDING METHODS FOR DISEASE RESISTANCE.pptx
BREEDING METHODS FOR DISEASE RESISTANCE.pptx
 
Nucleophilic Addition of carbonyl compounds.pptx
Nucleophilic Addition of carbonyl  compounds.pptxNucleophilic Addition of carbonyl  compounds.pptx
Nucleophilic Addition of carbonyl compounds.pptx
 
The use of Nauplii and metanauplii artemia in aquaculture (brine shrimp).pptx
The use of Nauplii and metanauplii artemia in aquaculture (brine shrimp).pptxThe use of Nauplii and metanauplii artemia in aquaculture (brine shrimp).pptx
The use of Nauplii and metanauplii artemia in aquaculture (brine shrimp).pptx
 
mô tả các thí nghiệm về đánh giá tác động dòng khí hóa sau đốt
mô tả các thí nghiệm về đánh giá tác động dòng khí hóa sau đốtmô tả các thí nghiệm về đánh giá tác động dòng khí hóa sau đốt
mô tả các thí nghiệm về đánh giá tác động dòng khí hóa sau đốt
 
Shallowest Oil Discovery of Turkiye.pptx
Shallowest Oil Discovery of Turkiye.pptxShallowest Oil Discovery of Turkiye.pptx
Shallowest Oil Discovery of Turkiye.pptx
 
Eukaryotic Transcription Presentation.pptx
Eukaryotic Transcription Presentation.pptxEukaryotic Transcription Presentation.pptx
Eukaryotic Transcription Presentation.pptx
 
NuGOweek 2024 Ghent programme overview flyer
NuGOweek 2024 Ghent programme overview flyerNuGOweek 2024 Ghent programme overview flyer
NuGOweek 2024 Ghent programme overview flyer
 
The debris of the ‘last major merger’ is dynamically young
The debris of the ‘last major merger’ is dynamically youngThe debris of the ‘last major merger’ is dynamically young
The debris of the ‘last major merger’ is dynamically young
 
Comparing Evolved Extractive Text Summary Scores of Bidirectional Encoder Rep...
Comparing Evolved Extractive Text Summary Scores of Bidirectional Encoder Rep...Comparing Evolved Extractive Text Summary Scores of Bidirectional Encoder Rep...
Comparing Evolved Extractive Text Summary Scores of Bidirectional Encoder Rep...
 
Applied Science: Thermodynamics, Laws & Methodology.pdf
Applied Science: Thermodynamics, Laws & Methodology.pdfApplied Science: Thermodynamics, Laws & Methodology.pdf
Applied Science: Thermodynamics, Laws & Methodology.pdf
 
Medical Orthopedic PowerPoint Templates.pptx
Medical Orthopedic PowerPoint Templates.pptxMedical Orthopedic PowerPoint Templates.pptx
Medical Orthopedic PowerPoint Templates.pptx
 
Oedema_types_causes_pathophysiology.pptx
Oedema_types_causes_pathophysiology.pptxOedema_types_causes_pathophysiology.pptx
Oedema_types_causes_pathophysiology.pptx
 
Micronuclei test.M.sc.zoology.fisheries.
Micronuclei test.M.sc.zoology.fisheries.Micronuclei test.M.sc.zoology.fisheries.
Micronuclei test.M.sc.zoology.fisheries.
 
Deep Behavioral Phenotyping in Systems Neuroscience for Functional Atlasing a...
Deep Behavioral Phenotyping in Systems Neuroscience for Functional Atlasing a...Deep Behavioral Phenotyping in Systems Neuroscience for Functional Atlasing a...
Deep Behavioral Phenotyping in Systems Neuroscience for Functional Atlasing a...
 

Talk by J. Eisen for NZ Computational Genomics meeting

  • 1. Phylogeny driven approaches to the study of microbial diversity September 3, 2015 Queenstown Computational Genomics Conference Jonathan A. Eisen @phylogenomics University of California, Davis
  • 2. The Rise of the Microbiome [Figure: Pubmed “Microbiome” hits per year, 2000 to 2013; y-axis from 0 to 4000]
  • 3. microBIOME or microbiOME • microbi-OME • collection of genomes of microbes from a community (emphasis on OME) • micro-BIOME • a community of microbes (emphasis on BIOME) • see http://tinyurl.com/definemicrobiome
  • 4. Not Just About Humans or Hosts
  • 6. Why Now I: Appreciation of Microbial Diversity Functional Diversity Diversity of Form Phylogenetic Diversity
  • 7. Why Now I: Appreciation of Microbial Diversity Functional Diversity Diversity of Form Phylogenetic Diversity MICROBES RUN THE PLANET
  • 8. Why Now II: Post Genome Blues The Microbiome Transcriptome Variome Epigenome Overselling the Human Genome?
  • 10. Why Now IV: Sequencing Has Gone Crazy
  • 11. Sequencing Revolution • More genes and genomes • Deeper sequencing • The rare biosphere • Relative abundance estimates • More samples (with barcoding) • Time series • Spatially diverse sampling • Fine scale sampling
  • 12. Turnbaugh et al Nature. 2006 444(7122):1027-31. Why Now V: Microbiome Functions
  • 13. Uses of Phylogeny 1: Species Phylogeny
  • 14. Woese: Classification of Cultured Taxa by rRNA. Workflow labels: Isolate Ribosomes; rRNA. Fragments: ACUGC ACCUAU CGUUCG; ACUCC AGCUAU CGAUCG; ACCCC AGCUCU CGCUCG. Taxa Characters: S ACUGCACCUAUCGUUCG; R ACUCCACCUAUCGUUCG; E ACUCCAGCUAUCGAUCG; F ACUCCAGGUAUCGAUCG; C ACCCCAGCUCUCGCUCG; W ACCCCAGCUCUGGCUCG. Taxa Characters (subset): S ACUGCACCUAUCGUUCG; E ACUCCAGCUAUCGAUCG; C ACCCCAGCUCUCGCUCG. Tree labels: Eukaryotes, Bacteria, ?????, Archaebacteria, Archaea
  • 15. Woese: Classification of Cultured Taxa by rRNA PCR. Workflow labels: Isolate DNA; PCR; rRNA. Fragments: ACUGC ACCUAU CGUUCG; ACUCC AGCUAU CGAUCG; ACCCC AGCUCU CGCUCG. Taxa Characters: S ACUGCACCUAUCGUUCG; R ACUCCACCUAUCGUUCG; E ACUCCAGCUAUCGAUCG; F ACUCCAGGUAUCGAUCG; C ACCCCAGCUCUCGCUCG; W ACCCCAGCUCUGGCUCG. Taxa Characters (subset): S ACUGCACCUAUCGUUCG; E ACUCCAGCUAUCGAUCG; C ACCCCAGCUCUCGCUCG. Tree labels: Eukaryotes, Bacteria, Archaea
  • 16. Phylotyping via rRNA PCR: One Taxon. Workflow labels: Isolate DNA; PCR; rRNA. Fragments: ACTGC ACCTAT CGTTCG; ACTGC ACCTAT CGTTCG; ACTGC ACCTAT CGTTCG. Taxa Characters: B1 ACTGCACCTATCGTTCG; B2 ACTCCACCTATCGTTCG; E1 ACTCCAGCTATCGATCG; E2 ACTCCAGGTATCGATCG; A1 ACCCCAGCTCTCGCTCG; A2 ACCCCAGCTCTGGCTCG; New1 ACTGCACCTATCGTTCG. Tree labels: Eukaryotes, Bacteria, Archaea
  • 17. Chemosymbiont rRNA Phylotyping. Eisen et al. 1992. J. Bact. 174: 3416. Colleen Cavanaugh
  • 18. Phylotyping via rRNA PCR: Two Taxa. Workflow labels: Isolate DNA; PCR; rRNA. Fragments: ACTGC ACCTAT CGTTCG; ACTGC ACCTAT CGTTCG; ACCCC AGCTCT CGCTCG. Taxa Characters: B1 ACTGCACCTATCGTTCG; B2 ACTCCACCTATCGTTCG; E1 ACTCCAGCTATCGATCG; E2 ACTCCAGGTATCGATCG; A1 ACCCCAGCTCTCGCTCG; A2 ACCCCAGCTCTGGCTCG; New1 ACCCCAGCTCTGCCTCG; New2 ACTGCACCTATCGTTCG. Tree labels: Archaea, Eukaryotes, Bacteria
  • 19. Phylotyping via rRNA PCR: Four Taxa. Workflow labels: Isolate DNA; PCR; rRNA. Fragments: ACTGC ACCTAT CGTTCG; ACTCC AGCTAT CGATCG; ACCCC AGCTCT CGCTCG; AGGGG AGCTCT CGCTCG; AGGGG AGCTCT CGCTCG; ACTGC ACCTAT CGTTCG. Taxa Characters: B1 ACTGCACCTATCGTTCG; B2 ACTCCACCTATCGTTCG; E1 ACTCCAGCTATCGATCG; E2 ACTCCAGGTATCGATCG; A1 ACCCCAGCTCTCGCTCG; A2 ACCCCAGCTCTGGCTCG; New1 ACCCCAGCTCTGCCTCG; New2 ACTGCACCTATCGTTCG; New3 ACCCCAGCTCTCGCTCG; New4 AGGGGAGCTCTCGCTCG. Tree labels: Archaea, Eukaryotes, Bacteria
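The phylotyping slides place each new PCR-derived sequence among reference taxa. A minimal sketch of that idea, assuming a plain nearest-reference rule based on Hamming distance, is given here. This rule is a stand-in chosen for brevity; the talk itself argues for tree-based placement, and the function names `hamming` and `phylotype` are hypothetical. The sequences are copied from the Taxa/Characters table on the "Four Taxa" slide.

```python
# Reference and query sequences copied from the slide's Taxa/Characters table.
REFERENCES = {
    "B1": "ACTGCACCTATCGTTCG",
    "B2": "ACTCCACCTATCGTTCG",
    "E1": "ACTCCAGCTATCGATCG",
    "E2": "ACTCCAGGTATCGATCG",
    "A1": "ACCCCAGCTCTCGCTCG",
    "A2": "ACCCCAGCTCTGGCTCG",
}
QUERIES = {
    "New1": "ACCCCAGCTCTGCCTCG",
    "New2": "ACTGCACCTATCGTTCG",
    "New3": "ACCCCAGCTCTCGCTCG",
    "New4": "AGGGGAGCTCTCGCTCG",
}
# Assumed reading of the slide's label prefixes.
DOMAIN = {"B": "Bacteria", "E": "Eukaryotes", "A": "Archaea"}


def hamming(a, b):
    """Count mismatched positions between two aligned, equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x != y for x, y in zip(a, b))


def phylotype(query, references):
    """Return (name, distance) of the reference closest to the query.

    Illustrative nearest-neighbor rule only, not a phylogenetic placement.
    """
    name = min(references, key=lambda n: hamming(query, references[n]))
    return name, hamming(query, references[name])


if __name__ == "__main__":
    for label, seq in QUERIES.items():
        ref, dist = phylotype(seq, REFERENCES)
        print(label, "->", ref, DOMAIN[ref[0]], "mismatches:", dist)
```

With this rule New2 and New3 match a reference exactly, while New4 sits four mismatches from its nearest reference (A1). A sequence that far from every reference is the kind of case where a distance cutoff alone says little and a tree is more informative.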
  • 21. Approaching to NGS. Timeline (axis: 1980, 1990, 2000, 2010): 1953, Discovery of DNA structure (Cold Spring Harb. Symp. Quant. Biol. 1953;18:123-31); 1977, Sanger sequencing method by F. Sanger (PNAS, 1977, 74: 560-564); 1983, PCR by K. Mullis (Cold Spring Harb Symp Quant Biol. 1986;51 Pt 1:263-73); 1993, Development of pyrosequencing (Anal. Biochem., 1993, 208: 171-175; Science, 1998, 281: 363-365); 1998, Single molecule emulsion PCR; Human Genome Project (Nature, 2001, 409: 860–92; Science, 2001, 291: 1304–1351); 2000, Founded 454 Life Science; 2005, 454 GS20 sequencer (First NGS sequencer); 1998, Founded Solexa; 2006, Solexa Genome Analyzer (First short-read NGS sequencer); 2008, GS FLX sequencer (NGS with 400-500 bp read length); 2010, Hi-Seq2000 (200Gbp per Flow Cell); 2006, Illumina acquires Solexa (Illumina enters the NGS business); 2007, ABI SOLiD (Short-read sequencer based upon ligation); 2007, Roche acquires 454 Life Sciences (Roche enters the NGS business); 2008, NGS Human Genome sequencing (First Human Genome sequencing based upon NGS technology). From Slideshare presentation of Cosentino Cristian http://www.slideshare.net/cosentia/high-throughput-equencing Platforms listed: Miseq, Roche Jr, Ion Torrent, PacBio, Oxford. Automation is Critical: AAATCGCTAGCGC CGGCGAGCTAGC CGAGCGATCGAGC CGAGCATCGAGTA
  • 22. STAP (for rRNA) An Automated Phylogenetic Tree-Based Small Subunit rRNA Taxonomy and Alignment Pipeline (STAP) Dongying Wu1 *, Amber Hartman1,6 , Naomi Ward4,5 , Jonathan A. Eisen1,2,3 1 UC Davis Genome Center, University of California Davis, Davis, California, United States of America, 2 Section of Evolution and Ecology, College of Biological Sciences, University of California Davis, Davis, California, United States of America, 3 Department of Medical Microbiology and Immunology, School of Medicine, University of California Davis, Davis, California, United States of America, 4 Department of Molecular Biology, University of Wyoming, Laramie, Wyoming, United States of America, 5 Center of Marine Biotechnology, Baltimore, Maryland, United States of America, 6 The Johns Hopkins University, Department of Biology, Baltimore, Maryland, United States of America Abstract Comparative analysis of small-subunit ribosomal RNA (ss-rRNA) gene sequences forms the basis for much of what we know about the phylogenetic diversity of both cultured and uncultured microorganisms. As sequencing costs continue to decline and throughput increases, sequences of ss-rRNA genes are being obtained at an ever-increasing rate. This increasing flow of data has opened many new windows into microbial diversity and evolution, and at the same time has created significant methodological challenges. Those processes which commonly require time-consuming human intervention, such as the preparation of multiple sequence alignments, simply cannot keep up with the flood of incoming data. Fully automated methods of analysis are needed. Notably, existing automated methods avoid one or more steps that, though computationally costly or difficult, we consider to be important. In particular, we regard both the building of multiple sequence alignments and the performance of high quality phylogenetic analysis to be necessary. 
We describe here our fully- automated ss-rRNA taxonomy and alignment pipeline (STAP). It generates both high-quality multiple sequence alignments and phylogenetic trees, and thus can be used for multiple purposes including phylogenetically-based taxonomic assignments and analysis of species diversity in environmental samples. The pipeline combines publicly-available packages (PHYML, BLASTN and CLUSTALW) with our automatic alignment, masking, and tree-parsing programs. Most importantly, this automated process yields results comparable to those achievable by manual analysis, yet offers speed and capacity that are unattainable by manual efforts. Citation: Wu D, Hartman A, Ward N, Eisen JA (2008) An Automated Phylogenetic Tree-Based Small Subunit rRNA Taxonomy and Alignment Pipeline (STAP). PLoS ONE 3(7): e2566. doi:10.1371/journal.pone.0002566 multiple alignment and phylogeny was deemed unfeasible. However, this we believe can compromise the value of the results. For example, the delineation of OTUs has also been automated via tools that do not make use of alignments or phylogenetic trees (e.g., Greengenes). This is usually done by carrying out pairwise comparisons of sequences and then clustering of sequences that have better than some cutoff threshold of similarity with each other). This approach can be powerful (and reasonably efficient) but it too has limitations. In particular, since multiple sequence alignments are not used, one cannot carry out standard phylogenetic analyses. In addition, without multiple sequence alignments one might end up comparing and contrasting different regions of a sequence depending on what it is paired with. The limitations of avoiding multiple sequence alignments and phylogenetic analysis are readily apparent in tools to classify sequences. 
For example, the Ribosomal Database Project’s Classifier program [29] focuses on composition characteristics of each sequence (e.g., oligonucleotide frequency) and assigns taxonomy based upon clustering genes by their composition. Though this is fast and completely automatable, it can be misled in cases where distantly related sequences have converged on similar composition, something known to be a major problem in ss-rRNA sequences [30]. Other taxonomy assignment systems focus primarily on the similarity of sequences. The simplest of these is classification tools it does have some limitations. For example, the generation of new alignments for each sequence is both computational costly, and does not take advantage of available curated alignments that make use of ss-RNA secondary structure to guide the primary sequence alignment. Perhaps most importantly however is that the tool is not fully automated. In addition, it does not generate multiple sequence alignments for all sequences in a dataset which would be necessary for doing many analyses. Automated methods for analyzing rRNA sequences are also available at the web sites for multiple rRNA centric databases, such as Greengenes and the Ribosomal Database Project (RDPII). Though these and other web sites offer diverse powerful tools, they do have some limitations. For example, not all provide multiple sequence alignments as output and few use phylogenetic approaches for taxonomy assignments or other analyses. More importantly, all provide only web-based interfaces and their integrated software, (e.g., alignment and taxonomy assignment), cannot be locally installed by the user. Therefore, the user cannot take advantage of the speed and computing power of parallel processing such as is available on linux clusters, or locally alter and potentially tailor these programs to their individual computing needs (Table 1). Given the limited automated tools that are available for Table 1. 
Comparison of STAP’s computational abilities relative to existing commonly-used ss-RNA analysis tools.

                                          STAP          ARB      Greengenes  RDP
Installed where?                          Locally       Locally  Web only    Web only
User interface                            Command line  GUI      Web portal  Web portal
Parallel processing                       YES           NO       NO          NO
Manual curation for taxonomy assignment   NO            YES      NO          NO
Manual curation for alignment             NO            YES      NO*         NO
Open source                               YES**         NO       NO          NO
Processing speed                          Fast          Slow     Medium      Medium

It is important to note that STAP is the only software that runs on the command line and can take advantage of parallel processing on linux clusters and, further, is more amenable to downstream code manipulation. * Note: Greengenes alignment output is compatible with upload into ARB and downstream manual alignment. ** The STAP program itself is open source, the programs it depends on are freely available but not open source. doi:10.1371/journal.pone.0002566.t001
Figure 1. A flow chart of the STAP pipeline. doi:10.1371/journal.pone.0002566.g001
... STAP database, and the query sequence is aligned to them using the CLUSTALW profile alignment algorithm [40] as described above for domain assignment. By adapting the profile alignment algorithm, the alignments from the STAP database remain intact, while gaps are inserted and nucleotides are trimmed for the query sequence according to the profile defined by the previous alignments from the databases. Thus the accuracy and quality of the alignment generated at this step depends heavily on the quality of the Bacterial/Archaeal ss-rRNA alignments from the Greengenes project or the Eukaryotic ss-rRNA alignments from the RDPII project.
Phylogenetic analysis using multiple sequence alignments rests on the assumption that the residues (nucleotides or amino acids) at the same position in every sequence in the alignment are homologous. Thus, columns in the alignment for which “positional homology” cannot be robustly determined must be excluded from subsequent analyses. This process of evaluating homology and eliminating questionable columns, known as masking, typically requires time-consuming, skillful, human intervention. We designed an automated masking method for ss-rRNA alignments, thus eliminating this bottleneck in high-throughput processing. First, an alignment score is calculated for each aligned column by a method similar to that used in the CLUSTALX package [42]. Specifically, an R-dimensional sequence space representing all the possible nucleotide character states is defined. Then for each aligned column, the nucleotide populating that column in each of the aligned sequences is assigned a score in each of the R dimensions (Sr) according to the IUB matrix [42]. The consensus “nucleotide” for each column (X) also has R dimensions, with the ...
Figure 2. Domain assignment. In Step 1, STAP assigns a domain to each query sequence based on its position in a maximum likelihood tree of representative ss-rRNA sequences. Because the tree illustrated here is not rooted, domain assignment would not be accurate and ...
Dongying Wu, Amber Hartman, Naomi Ward
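The masking step described above scores each alignment column and discards columns whose positional homology is doubtful. A rough sketch of that idea follows, under a simplifying assumption: the score here is just the fraction of sequences that share the most common non-gap character, not the CLUSTALX-style score over the IUB matrix that the STAP paper describes. The function names `column_scores` and `mask_alignment` and the `min_score` parameter are hypothetical.

```python
from collections import Counter


def column_scores(alignment):
    """Score each column as the fraction of sequences sharing its most common residue.

    Gaps ('-') never count as the shared residue, so gap-rich columns score low.
    This is a simplified conservation score, assumed here for illustration.
    """
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("all sequences must have the same aligned length")
    scores = []
    for column in zip(*alignment):
        residues = [c for c in column if c != "-"]
        if not residues:
            scores.append(0.0)
            continue
        top_count = Counter(residues).most_common(1)[0][1]
        scores.append(top_count / len(column))
    return scores


def mask_alignment(alignment, min_score=0.6):
    """Drop columns scoring below min_score; return (masked sequences, kept column indices)."""
    kept = [i for i, s in enumerate(column_scores(alignment)) if s >= min_score]
    masked = ["".join(seq[i] for i in kept) for seq in alignment]
    return masked, kept


if __name__ == "__main__":
    toy = ["ACGT-A", "ACGA-A", "ACCT-T", "AC-TGA"]
    masked, kept = mask_alignment(toy)
    print(kept)
    print(masked)
```

The threshold is a tunable assumption; a real pipeline would calibrate it against curated masks rather than pick a fixed value.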
  • 23. PhylOTU. Figure 1. PhylOTU Workflow. Computational processes are represented as squares and databases are represented as cylinders in this generalized workflow of PhylOTU. See Results section for details. doi:10.1371/journal.pcbi.1001061.g001 Sharpton TJ, Riesenfeld SJ, Kembel SW, Ladau J, O'Dwyer JP, Green JL, Eisen JA, Pollard KS. (2011) PhylOTU: A High-Throughput Procedure Quantifies Microbial Community Diversity and Resolves Novel Taxa from Metagenomic Data. PLoS Comput Biol 7(1): e1001061. doi:10.1371/journal.pcbi.1001061 Tom Sharpton, Katie Pollard, Jessica Green
  • 24. rRNA PCR: Community Comparisons
  • 25. rRNA PCR: Community Comparisons. Workflow labels: Isolate DNA; PCR; rRNA. Taxa Characters: B1 ACTGCACCTATCGTTCG; B2 ACTCCACCTATCGTTCG; E1 ACTCCAGCTATCGATCG; E2 ACTCCAGGTATCGATCG; A1 ACCCCAGCTCTCGCTCG; A2 ACCCCAGCTCTGGCTCG; New1 ACCCCAGCTCTGCCTCG; New2 ACTGCACCTATCGTTCG; New3 ACCCCAGCTCTCGCTCG; New4 AGGGGAGCTCTCGCTCG. Tree labels: Archaea, Eukaryotes, Bacteria [Figure: repeated “A” markers]
  • 26. rRNA PCR: Community Comparisons. Workflow labels: Isolate DNA; PCR; rRNA. Taxa Characters: B1 ACTGCACCTATCGTTCG; B2 ACTCCACCTATCGTTCG; E1 ACTCCAGCTATCGATCG; E2 ACTCCAGGTATCGATCG; A1 ACCCCAGCTCTCGCTCG; A2 ACCCCAGCTCTGGCTCG; New1 ACCCCAGCTCTGCCTCG; New2 ACTGCACCTATCGTTCG; New3 ACCCCAGCTCTCGCTCG; New4 AGGGGAGCTCTCGCTCG [Figure: repeated “A” markers]
  • 27. Hartman et al. BMC Bioinformatics 2010, 11:317 http://www.biomedcentral.com/1471-2105/11/317 Open AccessSOFTWARE Software Introducing W.A.T.E.R.S.: a Workflow for the Alignment, Taxonomy, and Ecology of Ribosomal Sequences Amber L Hartman†1,3, Sean Riddle†2, Timothy McPhillips2, Bertram Ludäscher2 and Jonathan A Eisen*1 Abstract Background: For more than two decades microbiologists have used a highly conserved microbial gene as a phylogenetic marker for bacteria and archaea. The small-subunit ribosomal RNA gene, also known as 16 S rRNA, is encoded by ribosomal DNA, 16 S rDNA, and has provided a powerful comparative tool to microbial ecologists. Over time, the microbial ecology field has matured from small-scale studies in a select number of environments to massive collections of sequence data that are paired with dozens of corresponding collection variables. As the complexity of data and tool sets have grown, the need for flexible automation and maintenance of the core processes of 16 S rDNA sequence analysis has increased correspondingly. Results: We present WATERS, an integrated approach for 16 S rDNA analysis that bundles a suite of publicly available 16 S rDNA analysis software tools into a single software package. The "toolkit" includes sequence alignment, chimera removal, OTU determination, taxonomy assignment, phylogentic tree construction as well as a host of ecological analysis and visualization tools. WATERS employs a flexible, collection-oriented 'workflow' approach using the open- source Kepler system as a platform. Conclusions: By packaging available software tools into a single automated workflow, WATERS simplifies 16 S rDNA analyses, especially for those without specialized bioinformatics, programming expertise. 
In addition, WATERS, like some of the newer comprehensive rRNA analysis tools, allows researchers to minimize the time dedicated to carrying out tedious informatics steps and to focus their attention instead on the biological interpretation of the results. One advantage of WATERS over other comprehensive tools is that the use of the Kepler workflow system facilitates result interpretation and reproducibility via a data provenance sub-system. Furthermore, new "actors" can be added to the workflow as desired and we see WATERS as an initial seed for a sizeable and growing repository of interoperable, easy- to-combine tools for asking increasingly complex microbial ecology questions. Background Microbial communities and how they are surveyed Microbial communities abound in nature and are crucial for the success and diversity of ecosystems. There is no end in sight to the number of biological questions that can be asked about microbial diversity on earth. From animal and human guts to open ocean surfaces and deep sea hydrothermal vents, to anaerobic mud swamps or boiling thermal pools, to the tops of the rainforest canopy and the frozen Antarctic tundra, the composition of microbial communities is a source of natural history, intellectual curiosity, and reservoir of environmental health [1]. Microbial communities are also mediators of insight into global warming processes [2,3], agricultural success [4], pathogenicity [5,6], and even human obesity [7,8]. In the mid-1980 s, researchers began to sequence ribo- somal RNAs from environmental samples in order to characterize the types of microbes present in those sam- ples, (e.g., [9,10]). 
This general approach was revolution- ized by the invention of the polymerase chain reaction (PCR), which made it relatively easy to clone and then * Correspondence: jaeisen@ucdavis.edu 1 Department of Medical Microbiology and Immunology and the Department of Evolution and Ecology, Genome Center, University of California Davis, One Shields Avenue, Davis, CA, 95616, USA † Contributed equally Full list of author information is available at the end of the article WATERS - Kepler Workflow for rRNA matics 2010, 11:317 .com/1471-2105/11/317 Page 2 of 14 genes for ribosomal RNA) in partic- ubunit ribosomal RNA (ss-rRNA). ed a large amount of previously l diversity [1,11-13]. Researchers all subunit rRNA gene not only ith which it can be PCR amplified, has variable and highly conserved to be universally distributed among nd it is useful for inferring phyloge- 4,15]. Since then, "cultivation-inde- " have brought a revolution to the by allowing scientists to study a mount of diversity in many different ments [16-18]. The general premise Figure 1 Overview of WATERS. Schema of WATERS where white boxes indicate "behind the scenes" analyses that are performed in WA- Align Check chimeras Cluster Build Tree Assign Taxonomy Tree w/ Taxonomy Diversity statistics & graphs Unifrac files Cytoscape network OTU table Hartman et al. BMC Bioinformatics 2010, 11:317 http://www.biomedcentral.com/1471-2105/11/317 Page 3 of 14 Motivations As outlined above, successfully processing microbial sequence collections is far from trivial. Each step is com- plex and usually requires significant bioinformatics expertise and time investment prior to the biological interpretation. In order to both increase efficiency and ensure that all best-practice tools are easily usable, we sought to create an "all-inclusive" method for performing all of these bioinformatics steps together in one package. 
To this end, we have built an automated, user-friendly, workflow-based system called WATERS: a Workflow for the Alignment, Taxonomy, and Ecology of Ribosomal Sequences (Fig. 1). In addition to being automated and simple to use, because WATERS is executed in the Kepler scientific workflow system (Fig. 2) it also has the advan- tage that it keeps track of the data lineage and provenance of data products [23,24]. Automation The primary motivation in building WATERS was to minimize the technical, bioinformatics challenges that arise when performing DNA sequence clustering, phylo- genetic tree, and statistical analyses by automating the 16 S rDNA analysis workflow. We also hoped to exploit additional features that workflow-based approaches entail, such as optimized execution and data lineage tracking and browsing [23,25-27]. In the earlier days of 16 S rDNA analysis, simply knowing which microbes were present and whether they were biologically novel was a noteworthy achievement. It was reasonable and expected, therefore, to invest a large amount of time and effort to get to that list of microbes. But now that current efforts are significantly more advanced and often require com- parison of dozens of factors and variables with datasets of thousands of sequences, it is not practically feasible to process these large collections "by hand", and hugely inef- ficient if instead automated methods can be successfully employed. Broadening the user base A second motivation and perspective is that by minimiz- ing the technical difficulty of 16 S rDNA analysis through the use of WATERS, we aim to make the analysis of these datasets more widely available and allow individuals with Figure 2 Screenshot of WATERS in Kepler software. Key features: the library of actors un-collapsed and displayed on the left-hand side, the input and output paths where the user declares the location of their input files and desired location for the results files. 
Each green box is an individual Kepler actor that performs a single action on the data stream. The connectors (black arrows) direct and hook up the actors in a defined sequence. Double-clicking on any actor or connector allows it to be manipulated and re-arranged. ... default is 97% and 99%), and they are also generated for every metadata variable comparison that the user includes. Data pruning: To assist in troubleshooting and quality control, WATERS returns to the user three fasta files of sequences ... Figure 3 Biologically similar results automatically produced by WATERS on published colonic microbiota samples. (A) Rarefaction curves similar to curves shown in Eckburg et al. Fig. 2; 70-72, indicate patient numbers, i.e., 3 different individuals. (B) Weighted Unifrac analysis based on phylogenetic tree and OTU data produced by WATERS very similar to Eckburg et al. Fig. 3B. (C) Neighbor-joining phylogenetic tree (Quicktree) representing the sequences analyzed by WATERS, which is clearly similar to Fig. S1 in Eckburg et al. [Figure 3 labels: panels A, B, C; PCA axes “Percent variation explained”; tree labels: Bacteroidetes Bacteroidales; Deltaproteobacteria; Actinobacteria; Verrucomicrobia; Epsilonproteobacteria; Firmicutes Clostridia Clostridiales; Gammaproteobacteria; Cyanobacteria; Alphaproteobacteria; Fusobacteria; Firmicutes Bacilli; Firmicutes Mollicutes] Amber Hartman
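The WATERS slide lists OTU determination as one workflow step and mentions 97% and 99% defaults. As a sketch of what a percent-identity OTU cutoff does, a greedy clustering over pre-aligned sequences could look like the following. This is an assumed, simplified procedure for illustration and not the tool WATERS actually wraps; `identity`, `greedy_otus` and `EXAMPLE` are hypothetical names, and the example sequences are the reference taxa from the earlier phylotyping slides.

```python
# Reference taxa from the phylotyping slides, reused as toy input.
EXAMPLE = {
    "B1": "ACTGCACCTATCGTTCG",
    "B2": "ACTCCACCTATCGTTCG",
    "E1": "ACTCCAGCTATCGATCG",
    "E2": "ACTCCAGGTATCGATCG",
    "A1": "ACCCCAGCTCTCGCTCG",
    "A2": "ACCCCAGCTCTGGCTCG",
}


def identity(a, b):
    """Fraction of identical positions between two aligned, equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def greedy_otus(seqs, threshold=0.97):
    """Greedy OTU picking: each sequence joins the first OTU whose
    representative it matches at >= threshold identity, else starts a new OTU.

    The result depends on input order, a known property of greedy schemes.
    """
    otus = []  # list of (representative sequence, member names)
    for name, seq in seqs.items():
        for rep_seq, members in otus:
            if identity(seq, rep_seq) >= threshold:
                members.append(name)
                break
        else:
            otus.append((seq, [name]))
    return [members for _, members in otus]


if __name__ == "__main__":
    print(greedy_otus(EXAMPLE, threshold=0.90))
    print(len(greedy_otus(EXAMPLE, threshold=0.97)))
```

On 17-nucleotide toy sequences a single mismatch already drops identity to about 94%, so a 0.90 cutoff is used in the demo call to show grouping; with full-length 16S sequences the 97% and 99% cutoffs behave very differently.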
  • 28. Tree from Woese. 1987. Microbiological Reviews 51:221 rRNA Not Perfect Nothing is Perfect
  • 29. rRNA Phylogeny Copy # Correction. Kembel SW, Wu M, Eisen JA, Green JL (2012) Incorporating 16S Gene Copy Number Information Improves Estimates of Microbial Diversity and Abundance. PLoS Comput Biol 8(10): e1002743. doi:10.1371/journal.pcbi.1002743 Steven Kembel, Jessica Green, Martin Wu
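The copy number correction named on this slide rests on simple arithmetic: a taxon with more 16S gene copies per genome contributes more reads per cell, so read counts are divided by copy number before relative abundances are computed. A minimal sketch of that arithmetic follows. The taxon names and copy numbers are invented for illustration, and `correct_for_copy_number` is a hypothetical helper; the cited paper estimates copy numbers for uncharacterized taxa phylogenetically, which this sketch does not attempt.

```python
def correct_for_copy_number(read_counts, copy_numbers):
    """Divide each taxon's read count by its 16S copy number, then renormalize
    so the corrected values sum to 1 (relative abundance of organisms)."""
    adjusted = {}
    for taxon, reads in read_counts.items():
        copies = copy_numbers[taxon]
        if copies <= 0:
            raise ValueError(f"copy number for {taxon} must be positive")
        adjusted[taxon] = reads / copies
    total = sum(adjusted.values())
    if total == 0:
        raise ValueError("no reads to normalize")
    return {taxon: value / total for taxon, value in adjusted.items()}


if __name__ == "__main__":
    # Hypothetical values: taxon_A carries 7 rRNA operons, taxon_B carries 1.
    reads = {"taxon_A": 700, "taxon_B": 300}
    copies = {"taxon_A": 7, "taxon_B": 1}
    print(correct_for_copy_number(reads, copies))
```

In this invented case taxon_A supplies 70% of the reads but only a quarter of the estimated cells once its seven copies are accounted for, which shows how uncorrected counts can invert the apparent ranking of taxa.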
  • 30. Tree Complications 1. Workflow labels: Isolate Ribosomes; rRNA. Fragments: ACUGC ACCUAU CGUUCG; ACUCC AGCUAU CGAUCG; ACCCC AGCUCU CGCUCG. Taxa Characters: S ACUGCACCUAUCGUUCG; R ACUCCACCUAUCGUUCG; E ACUCCAGCUAUCGAUCG; F ACUCCAGGUAUCGAUCG; C ACCCCAGCUCUCGCUCG; W ACCCCAGCUCUGGCUCG. Taxa Characters (subset): S ACUGCACCUAUCGUUCG; E ACUCCAGCUAUCGAUCG; C ACCCCAGCUCUCGCUCG. Tree labels: Euks, Bacteria, Arch, Arch
  • 31. Tree Complications 2. Workflow labels: Isolate Ribosomes; rRNA. Fragments: ACUGC ACCUAU CGUUCG; ACUCC AGCUAU CGAUCG; ACCCC AGCUCU CGCUCG. Taxa Characters: S ACUGCACCUAUCGUUCG; R ACUCCACCUAUCGUUCG; E ACUCCAGCUAUCGAUCG; F ACUCCAGGUAUCGAUCG; C ACCCCAGCUCUCGCUCG; W ACCCCAGCUCUGGCUCG. Taxa Characters (subset): S ACUGCACCUAUCGUUCG; E ACUCCAGCUAUCGAUCG; C ACCCCAGCUCUCGCUCG. Tree labels: Euks, Bacteria, Arch, Arch
  • 32. Tree Complications 3. Workflow labels: Isolate Ribosomes; rRNA. Fragments: ACUGC ACCUAU CGUUCG; ACUCC AGCUAU CGAUCG; ACCCC AGCUCU CGCUCG. Taxa Characters: S ACUGCACCUAUCGUUCG; R ACUCCACCUAUCGUUCG; E ACUCCAGCUAUCGAUCG; F ACUCCAGGUAUCGAUCG; C ACCCCAGCUCUCGCUCG; W ACCCCAGCUCUGGCUCG. Taxa Characters (subset): S ACUGCACCUAUCGUUCG; E ACUCCAGCUAUCGAUCG; C ACCCCAGCUCUCGCUCG. Tree labels: Euks, Bacteria, Arch, Arch
  • 33. Automated Accurate Genome Tree Lang JM, Darling AE, Eisen JA (2013) Phylogeny of Bacterial and Archaeal Genomes Using Conserved Genes: Supertrees and Supermatrices. PLoS ONE 8(4): e62510. doi:10.1371/journal.pone.0062510 Jenna Lang Aaron Darling
  • 35. Metagenomics. Fragments: ACUGC ACCUAU CGUUCG; ACUCC AGCUAU CGAUCG; ACCCC AGCUCU CGCUCG. Taxa Characters: S ACUGCACCUAUCGUUCG; R ACUCCACCUAUCGUUCG; E ACUCCAGCUAUCGAUCG; F ACUCCAGGUAUCGAUCG; C ACCCCAGCUCUCGCUCG; W ACCCCAGCUCUGGCUCG. Taxa Characters (subset): S ACUGCACCUAUCGUUCG; E ACUCCAGCUAUCGAUCG; C ACCCCAGCUCUCGCUCG. Tree labels: Eukaryotes, Bacteria, Archaea
  • 36. ... inputs of fixed carbon or nitrogen from external sources. As with Leptospirillum group I, both Leptospirillum group II and III have the genes needed to fix carbon by means of the Calvin–Benson–Bassham cycle (using type II ribulose 1,5-bisphosphate carboxylase–oxygenase). All genomes recovered from the AMD system contain formate hydrogenlyase complexes. These, in combination with carbon monoxide dehydrogenase, may be used for carbon fixation via the reductive acetyl coenzyme A (acetyl-CoA) pathway by some, or all, organisms. Given the large number of ABC-type sugar and amino acid transporters encoded in the Ferroplasma type ... Figure 4 Cell metabolic cartoons constructed from the annotation of 2,180 ORFs identified in the Leptospirillum group II genome (63% with putative assigned function) and 1,931 ORFs in the Ferroplasma type II genome (58% with assigned function). The cell cartoons are shown within a biofilm that is attached to the surface of an acid mine drainage stream (viewed in cross-section). Tight coupling between ferrous iron oxidation, pyrite dissolution and acid generation is indicated. Rubisco, ribulose 1,5-bisphosphate carboxylase–oxygenase. THF, tetrahydrofolate. NATURE | doi:10.1038/nature02340 | www.nature.com/nature ©2004 Nature Publishing Group. Metagenomics. Fragments: ACUGC ACCUAU CGUUCG; ACUCC AGCUAU CGAUCG; ACCCC AGCUCU CGCUCG. Taxa Characters: S ACUGCACCUAUCGUUCG; R ACUCCACCUAUCGUUCG; E ACUCCAGCUAUCGAUCG; F ACUCCAGGUAUCGAUCG; C ACCCCAGCUCUCGCUCG; W ACCCCAGCUCUGGCUCG. Taxa Characters (subset): S ACUGCACCUAUCGUUCG; E ACUCCAGCUAUCGAUCG; C ACCCCAGCUCUCGCUCG
  • 37. Metagenomics [Figure: taxa-by-characters sequence alignment (taxa S, R, E, F, C, W)]
  • 38. Culture Independent “Metagenomics” [Figure: DNA from samples; taxa-by-characters alignment (taxa B1, B2, E1, E2, A1, A2, New1 to New4); marker genes shown: rRNA, RecA, RpoB, Rpl4, Hsp70, EFTu] Many other genes better than rRNA
Excerpt from Wu and Eisen, Genome Biology 2008, 9:R151, page R151.7 (http://genomebiology.com/2008/9/10/R151): […] sequences are not conserved at the nucleotide level [29]. As a result, the nr database does not actually contain many more protein marker sequences that can be used as references than those available from complete genome sequences.
Comparison of phylogeny-based and similarity-based phylotyping. Although our phylogeny-based phylotyping is fully automated, it still requires many more steps than, and is slower than, similarity based phylotyping methods such as MEGAN [30]. Is it worth the trouble? Similarity based phylotyping works by searching a query sequence against a reference database such as NCBI nr and deriving taxonomic information from the best matches or 'hits'. When species that are closely related to the query sequence exist in the reference database, similarity-based phylotyping can work well. However, if the reference database is a biased sample or if it contains no closely related species to the query, then the top hits returned could be misleading [31]. Furthermore, similarity-based methods require an arbitrary similarity cut-off value to define the top hits. Because individual bacterial genomes and proteins can evolve at very different rates, a universal cut-off that works under all conditions does not exist. As a result, the final results can be very subjective.
In contrast, our tree-based bracketing algorithm places the query sequence within the context of a phylogenetic tree and only assigns it to a taxonomic level if that level has adequate sampling (see Materials and methods [below] for details of the algorithm). With the well sampled species Prochlorococcus marinus, for example, our method can distinguish closely related organisms and make taxonomic identifications at the species level. Our reanalysis of the Sargasso Sea data placed 672 sequences (3.6% of the total) within a P. marinus clade. On the other hand, for sparsely sampled clades such as Aquifex, assignments will be made only at the phylum level. Thus, our phylogeny-based analysis is less susceptible to data sampling bias than a similarity based approach, and it makes […]
Figure 3 Major phylotypes identified in Sargasso Sea metagenomic data. The metagenomic data previously obtained from the Sargasso Sea was reanalyzed using AMPHORA and the 31 protein phylogenetic markers. The microbial diversity profiles obtained from individual markers are remarkably consistent. The breakdown of the phylotyping assignments by markers and major taxonomic groups is listed in Additional data file 5.
[Figure 3 plot: y-axis, relative abundance (0 to 0.8); x-axis, Alphaproteobacteria, Betaproteobacteria, Gammaproteobacteria, Deltaproteobacteria, Epsilonproteobacteria, Unclassified proteobacteria, Bacteroidetes, Chlamydiae, Cyanobacteria, Acidobacteria, Thermotogae, Fusobacteria, Actinobacteria, Aquificae, Planctomycetes, Spirochaetes, Firmicutes, Chloroflexi, Chlorobi, Unclassified bacteria; one series per marker: dnaG, frr, infC, nusA, pgk, pyrG, rplA, rplB, rplC, rplD, rplE, rplF, rplK, rplL, rplM, rplN, rplP, rplS, rplT, rpmA, rpoB, rpsB, rpsC, rpsE, rpsI, rpsJ, rpsK, rpsM, rpsS, smpB, tsf]
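The bracketing idea in the excerpt above (assign a query only as deep as the reference sampling around it supports) can be illustrated with a toy sketch. This is not AMPHORA's actual algorithm: the function name `bracket_assign`, the seven-rank lineage tuples, and the placeholder lineage labels are all assumptions made for illustration. The sketch simply reports the deepest rank on which all reference sequences bracketing the query agree.

```python
# Toy illustration only; not the AMPHORA implementation.
RANKS = ["domain", "phylum", "class", "order", "family", "genus", "species"]


def bracket_assign(neighbor_lineages):
    """Return (rank, name) for the deepest rank shared by all bracketing
    reference lineages, or None if they share nothing (or none are given)."""
    assigned = None
    for i, rank in enumerate(RANKS):
        names = {lin[i] if i < len(lin) else None for lin in neighbor_lineages}
        if len(names) != 1 or None in names:
            break  # references disagree (or run out) at this rank
        assigned = (rank, names.pop())
    return assigned


# Hypothetical lineages; the intermediate rank labels are placeholders.
well_sampled = [
    ("Bacteria", "Cyanobacteria", "c1", "o1", "f1",
     "Prochlorococcus", "Prochlorococcus marinus"),
    ("Bacteria", "Cyanobacteria", "c1", "o1", "f1",
     "Prochlorococcus", "Prochlorococcus marinus"),
]
sparsely_sampled = [
    ("Bacteria", "Aquificae", "cA", "oA", "fA", "Aquifex", "sp1"),
    ("Bacteria", "Aquificae", "cB", "oB", "fB", "gB", "sp2"),
]

print(bracket_assign(well_sampled))      # ('species', 'Prochlorococcus marinus')
print(bracket_assign(sparsely_sampled))  # ('phylum', 'Aquificae')
```

A similarity-based method would instead report the single best hit above a fixed cut-off, which is exactly where the excerpt says sampling bias enters.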
  • 40. Phylotyping w/ Protein Markers: AMPHORA. Wu and Eisen, Genome Biology 2008, 9:R151, page R151.7, http://genomebiology.com/2008/9/10/R151 [Figure: relative abundance (0 to 0.8) of major taxonomic groups, Alphaproteobacteria through Unclassified bacteria, estimated separately from each of 31 protein markers (dnaG, frr, infC, nusA, pgk, pyrG, rplA, rplB, rplC, rplD, rplE, rplF, rplK, rplL, rplM, rplN, rplP, rplS, rplT, rpmA, rpoB, rpsB, rpsC, rpsE, rpsI, rpsJ, rpsK, rpsM, rpsS, smpB, tsf)] Martin Wu
  • 41. Phylogenetic ID of Novel Lineages [Figure panels: GOS 1, GOS 2, GOS 3, GOS 4, GOS 5] Dongying Wu. Wu D, Wu M, Halpern A, Rusch DB, Yooseph S, Frazier M, et al. (2011) Stalking the Fourth Domain in Metagenomic Data: Searching for, Discovering, and Interpreting Novel, Deep Branches in Marker Gene Phylogenetic Trees. PLoS ONE 6(3): e18011. doi:10.1371/journal.pone.0018011
  • 42. Phylogenetic Diversity of Metagenomes
Excerpt: […] typically used as a qualitative measure because duplicate sequences are usually removed from the tree. However, the test may be used in a semiquantitative manner if all clones, even those with identical or near-identical sequences, are included in the tree (13). Here we describe a quantitative version of UniFrac that we call “weighted UniFrac.” We show that weighted UniFrac behaves similarly to the FST test in situations where both a […]
FIG. 1. Calculation of the unweighted and the weighted UniFrac measures. Squares and circles represent sequences from two different environments. (a) In unweighted UniFrac, the distance between the circle and square communities is calculated as the fraction of the branch length that has descendants from either the square or the circle environment (black) but not both (gray). (b) In weighted UniFrac, branch lengths are weighted by the relative abundance of sequences in the square and circle communities; square sequences are weighted twice as much as circle sequences because there are twice as many total circle sequences in the data set. The width of branches is proportional to the degree to which each branch is weighted in the calculations, and gray branches have no weight. Branches 1 and 2 have heavy weights since the descendants are biased toward the square and circles, respectively. Branch 3 contributes no value since it has an equal contribution from circle and square sequences after normalization.
Kembel SW, Eisen JA, Pollard KS, Green JL (2011) The Phylogenetic Diversity of Metagenomes. PLoS ONE 6(8): e23214. doi:10.1371/journal.pone.0023214 Jessica Green Steven Kembel Katie Pollard
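The two measures described in the figure caption above can be sketched in a few lines. This is a minimal illustration, not the UniFrac software: the tree is assumed to be pre-flattened into a list of `(branch_length, descendant_leaves)` pairs, and the function names and the four-leaf example tree are made up for the example. The weighted version shown is the raw (un-normalized) sum of branch length times the absolute difference in relative abundance.

```python
def unweighted_unifrac(branches, env_a, env_b):
    """Fraction of branch length leading to descendants from only one environment.

    branches: list of (length, set_of_leaf_names)
    env_a, env_b: sets of leaf names observed in each environment
    """
    unique = shared = 0.0
    for length, leaves in branches:
        in_a = bool(leaves & env_a)
        in_b = bool(leaves & env_b)
        if in_a and in_b:
            shared += length
        elif in_a or in_b:
            unique += length
    total = unique + shared
    return unique / total if total else 0.0


def weighted_unifrac(branches, counts_a, counts_b):
    """Raw weighted UniFrac: sum of length * |A_i/A_T - B_i/B_T| over branches.

    counts_a, counts_b: dicts mapping leaf name -> sequence count (non-empty).
    """
    total_a = sum(counts_a.values())
    total_b = sum(counts_b.values())
    score = 0.0
    for length, leaves in branches:
        frac_a = sum(counts_a.get(leaf, 0) for leaf in leaves) / total_a
        frac_b = sum(counts_b.get(leaf, 0) for leaf in leaves) / total_b
        score += length * abs(frac_a - frac_b)
    return score


# Hypothetical tree ((s1,s2),(c1,c2)) with every branch of length 1.
branches = [
    (1.0, {"s1"}), (1.0, {"s2"}), (1.0, {"c1"}), (1.0, {"c2"}),
    (1.0, {"s1", "s2"}), (1.0, {"c1", "c2"}),
]

# Communities that occupy separate clades share no branch length.
print(unweighted_unifrac(branches, {"s1", "s2"}, {"c1", "c2"}))  # 1.0
# Communities interleaved across both clades share the internal branches.
print(unweighted_unifrac(branches, {"s1", "c1"}, {"s2", "c2"}))
print(weighted_unifrac(branches, {"s1": 1, "c1": 1}, {"s2": 1, "c2": 1}))  # 2.0
```

Flattening the tree into per-branch leaf sets keeps the sketch independent of any tree library; a real analysis would read the branches from a parsed phylogeny.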
  • 43. PhyloSift [Workflow figure: Input Sequences → LAST fast candidate search (search input against references; each input sequence scanned against both the rRNA workflow and the protein workflow; parallel option) → multiple alignment (hmmalign; Infernal in the rRNA workflow, which splits candidates at <600 bp and >600 bp; profile HMMs used to align candidates to reference alignment) → pplacer phylogenetic placement → Taxonomic Summaries → Sample Analysis & Comparison (Krona plots, number of reads placed for each marker gene, Edge PCA, tree visualization, Bayes factor tests)] Aaron Darling @koadman Erik Matsen @ematsen Holly Bik @hollybik Guillaume Jospin @guillaumejospin Erik Lowe. Darling AE, Jospin G, Lowe E, Matsen FA IV, Bik HM, Eisen JA. (2014) PhyloSift: phylogenetic analysis of genomes and metagenomes. PeerJ 2:e243 http://dx.doi.org/10.7717/peerj.243
  • 44. Edge PCA: Identify lineages that explain most variation among samples Edge PCA - Matsen and Evans 2013 Output: Edge PCA
  • 45. Using Phylogeny 2: Functional Prediction
  • 46. PHYLOGENETIC PREDICTION OF GENE FUNCTION. METHOD: CHOOSE GENE(S) OF INTEREST → IDENTIFY HOMOLOGS → ALIGN SEQUENCES → CALCULATE GENE TREE → OVERLAY KNOWN FUNCTIONS ONTO TREE → INFER LIKELY FUNCTION OF GENE(S) OF INTEREST. [Figure: genes 1 to 6 from Species 1, Species 2, and Species 3, in subfamilies A and B (1A, 2A, 3A, 1B, 2B, 3B); EXAMPLE A and EXAMPLE B are compared with the ACTUAL EVOLUTION (ASSUMED TO BE UNKNOWN); candidate duplication events are marked "Duplication?" and some inferences are marked "Ambiguous"] Based on Eisen, 1998 Genome Res 8: 163-167. Phylogenomics
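The last two steps of the method on this slide (overlay known functions onto the tree, then infer the likely function of the gene of interest) can be sketched as follows. This is a deliberate simplification and not the procedure from Eisen 1998, which also reasons about where duplication events fall: here the query simply inherits the function of the smallest clade that contains it together with at least one annotated gene, and is called "ambiguous" if that clade mixes functions. The nested-tuple tree encoding, the function names, and the annotations are assumptions for illustration.

```python
def leaves(tree):
    """List the leaf names of a nested-tuple tree."""
    if isinstance(tree, str):
        return [tree]
    return [leaf for child in tree for leaf in leaves(child)]


def predict_function(tree, query, known):
    """Predict the query's function from the smallest clade that contains
    the query plus at least one gene with a known function."""
    if query not in leaves(tree):
        raise ValueError(f"{query!r} is not a leaf of the tree")
    prediction = "unknown"
    node = tree
    while not isinstance(node, str):
        functions = {known[leaf] for leaf in leaves(node)
                     if leaf in known and leaf != query}
        if functions:
            # Smaller clades visited later overwrite this prediction.
            prediction = functions.pop() if len(functions) == 1 else "ambiguous"
        # Descend into the child clade that contains the query.
        node = next(child for child in node if query in leaves(child))
    return prediction


# Hypothetical gene tree with two subfamilies, as in the slide's labels.
gene_tree = ((("1A", "2A"), "3A"), (("1B", "2B"), "3B"))
known_functions = {"1A": "function A", "3A": "function A", "1B": "function B"}

print(predict_function(gene_tree, "2A", known_functions))  # function A
print(predict_function(gene_tree, "2B", known_functions))  # function B
```

A best-hit similarity search would give the same answers here; the two approaches diverge when rates differ between lineages, so that the most similar sequence is not the closest relative in the tree.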
  • 47. Overlaying Functions onto Tree [Figure: gene tree of MutS family homologs (species labels include Ecoli, Bacsu, Yeast, Spombe, Human, Mouse, Arath, Celeg), with subfamilies labeled MSH1, MSH2, MSH3, MSH4, MSH5, MSH6, MutS1, and MutS2] Based on Eisen, 1998 Nucl Acids Res 26: 4291-4300.
  • 48. Phylogenomics ~~ Phylotyping Eisen et al. 1992. J. Bact. 174: 3416
  • 49. Proteorhodopsin Functional Diversity Venter et al., Science 304: 66. 2004
  • 51. Functional Prediction from Metagenomes [Figure: DNA from samples; taxa-by-characters alignment (taxa B1, B2, E1, E2, A1, A2, New1 to New4); the text excerpt and Figure 4 caption from NATURE | doi:10.1038/nature02340 shown on slide 36 appear again on this slide]
  • 52. Phylogenetic Prediction of Function • Many powerful and automated similarity based methods for assigning genes to protein families • COGs • PFAM HMM searches • Some limitations of similarity based methods can be overcome by phylogenetic approaches • Automated methods now available • Sean Eddy • Steven Brenner • Kimmen Sjölander
  • 53. Phylogenetic Prediction of Function • Many powerful and automated similarity based methods for assigning genes to protein families • COGs • PFAM HMM searches • Some limitations of similarity based methods can be overcome by phylogenetic approaches • Automated methods now available • Sean Eddy • Steven Brenner • Kimmen Sjölander • But …
  • 54. Carboxydothermus hydrogenoformans • Isolated from a Russian hotspring • Thermophile (grows at 80°C) • Anaerobic • Grows very efficiently on CO (Carbon Monoxide) • Produces hydrogen gas • Low GC Gram positive (Firmicute) • Genome Determined (Wu et al. 2005 PLoS Genetics 1: e65. )
  • 55. Homologs of Sporulation Genes Wu et al. 2005 PLoS Genetics 1: e65.
  • 56. Carboxydothermus sporulates Wu et al. 2005 PLoS Genetics 1: e65.
  • 57. Non-Homology Predictions: Phylogenetic Profiling • Step 1: Search all genes in organisms of interest against all other genomes • Ask: Yes or No, is each gene found in each other species • Cluster genes by distribution patterns (profiles)
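The steps on this slide (search every gene against the other genomes, record yes or no per genome, then cluster genes by their distribution patterns) can be sketched as below. It is a minimal illustration that assumes the homology searches have already been run: `hits` is a hypothetical mapping from gene to the set of genomes where a homolog was found, and genes are grouped only when their presence/absence profiles are identical, which is the simplest possible form of clustering.

```python
from collections import defaultdict


def build_profiles(hits, genomes):
    """Turn search results into presence/absence profiles.

    hits: dict mapping gene -> set of genomes in which a homolog was found
    genomes: ordered list of genome names (defines the profile columns)
    """
    return {gene: tuple(int(g in found) for g in genomes)
            for gene, found in hits.items()}


def cluster_by_profile(profiles):
    """Group genes that share exactly the same presence/absence profile."""
    clusters = defaultdict(list)
    for gene, profile in profiles.items():
        clusters[profile].append(gene)
    return dict(clusters)


# Hypothetical genomes and search results.
genomes = ["sporeformer_1", "sporeformer_2", "nonsporeformer_1"]
hits = {
    "gene1": {"sporeformer_1", "sporeformer_2"},
    "gene2": {"sporeformer_1", "sporeformer_2"},
    "gene3": {"sporeformer_1", "sporeformer_2", "nonsporeformer_1"},
}

profiles = build_profiles(hits, genomes)
for profile, genes in cluster_by_profile(profiles).items():
    print(profile, genes)
```

In this toy input, gene1 and gene2 occur only in the sporeforming genomes, so an uncharacterized gene with that profile becomes a candidate for the same process. Real analyses usually cluster by profile similarity rather than exact identity.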
  • 58. Sporulation Gene Profile Wu et al. 2005 PLoS Genetics 1: e65.
  • 59. B. subtilis new sporulation genes J Bacteriol. 2013 Jan;195(2):253-60. doi: 10.1128/JB.01778-12 Bjorn Traag Richard Losick
  • 60. Phylogenetic Profiling for Metagenomics?
  • 61. Using Phylogeny 3: Linking Function and Phylogeny
  • 62. HiC Crosslinking & Sequencing Beitel CW, Froenicke L, Lang JM, Korf IF, Michelmore RW, Eisen JA, Darling AE. (2014) Strain- and plasmid-level deconvolution of a synthetic metagenome by sequencing proximity ligation products. PeerJ 2:e415 http://dx.doi.org/10.7717/peerj.415
Table 1 Species alignment fractions. The number of reads aligning to each replicon present in the synthetic microbial community are shown before and after filtering, along with the percent of total constituted by each species. The GC content (“GC”) and restriction site counts (“#R.S.”) of each replicon, species, and strain are shown. Bur1: B. thailandensis chromosome 1. Bur2: B. thailandensis chromosome 2. Lac0: L. brevis chromosome, Lac1: L. brevis plasmid 1, Lac2: L. brevis plasmid 2, Ped: P. pentosaceus, K12: E. coli K12 DH10B, BL21: E. coli BL21. An expanded version of this table can be found in Table S2.
Sequence   Alignment    % of Total   Filtered     % of aligned   Length      GC      #R.S.
Lac0       10,603,204   26.17%       10,269,562   96.85%         2,291,220   0.462   629
Lac1       145,718      0.36%        145,478      99.84%         13,413      0.386   3
Lac2       691,723      1.71%        665,825      96.26%         35,595      0.385   16
Lac        11,440,645   28.23%       11,080,865   96.86%         2,340,228   0.46    648
Ped        2,084,595    5.14%        2,022,870    97.04%         1,832,387   0.373   863
BL21       12,882,177   31.79%       2,676,458    20.78%         4,558,953   0.508   508
K12        9,693,726    23.92%       1,218,281    12.57%         4,686,137   0.507   568
E. coli    22,575,903   55.71%       3,894,739    17.25%         9,245,090   0.51    1076
Bur1       1,886,054    4.65%        1,797,745    95.32%         2,914,771   0.68    144
Bur2       2,536,569    6.26%        2,464,534    97.16%         3,809,201   0.672   225
Bur        4,422,623    10.91%       4,262,279    96.37%         6,723,972   0.68    369
Figure 1 Hi-C insert distribution. The distribution of genomic distances between Hi-C read pairs is shown for read pairs mapping to each chromosome. For each read pair the minimum path length on the circular chromosome was calculated and read pairs separated by less than 1000 bp were discarded. The 2.5 Mb range was divided into 100 bins of equal size and the number of read pairs in each bin was recorded for each chromosome. Bin values for each chromosome were normalized to sum to 1 and plotted.
[…] E. coli K12 genome were distributed in a similar manner as previously reported (Fig. 1; (Lieberman-Aiden et al., 2009)). We observed a minor depletion of alignments spanning the linearization point of the E. coli K12 assembly (e.g., near coordinates 0 and 4686137) due to edge effects induced by BWA treating the sequence as a linear chromosome rather than circular.
Figure 2 Metagenomic Hi-C associations. The log-scaled, normalized number of Hi-C read pairs associating each genomic replicon in the synthetic community is shown as a heat map (see color scale, blue to yellow: low to high normalized, log scaled association rates). Bur1: B. thailandensis chromosome 1. Bur2: B. thailandensis chromosome 2. Lac0: L. brevis chromosome, Lac1: L. brevis plasmid 1, Lac2: L. brevis plasmid 2, Ped: P. pentosaceus, K12: E. coli K12 DH10B, BL21: E. coli BL21.
[…] reference assemblies of the members of our synthetic microbial community with the same alignment parameters as were used in the top ranked clustering (described above). We first […]
Figure 3 Contigs associated by Hi-C reads. A graph is drawn with nodes depicting contigs and edges depicting associations between contigs as indicated by aligned Hi-C read pairs, with the count thereof depicted by the weight of edges. Nodes are colored to reflect the species to which they belong (see legend) with node size reflecting contig size. Contigs below 5 kb and edges with weights less than 5 were excluded. Contig associations were normalized for variation in contig size.
[…] typically represent the reads and variant sites as a variant graph wherein variant sites are represented as nodes, and sequence reads define edges between variant sites observed in the same read (or read pair). We reasoned that variant graphs constructed from Hi-C data would have much greater connectivity (where connectivity is defined as the mean path length between randomly sampled variant positions) than graphs constructed from mate-pair sequencing data, simply because Hi-C inserts span megabase distances. Such […]
Figure 4 Hi-C contact maps for replicons of Lactobacillus brevis. Contact maps show the number of Hi-C read pairs associating each region of the L. brevis genome. The L. brevis chromosome (Lac0, (A), […]
Chris Beitel @datscimed Aaron Darling @koadman
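The Figure 1 caption on this slide describes a concrete calculation: take the minimum path length between read-pair positions on a circular chromosome, discard pairs closer than 1000 bp, bin the 2.5 Mb range into 100 equal bins, and normalize the bins to sum to 1. A minimal sketch of that calculation follows. The function names and the example read pairs are made up, and skipping pairs at or beyond 2.5 Mb is an assumption, since the caption does not say how such pairs were handled.

```python
def circular_distance(pos_a, pos_b, chrom_len):
    """Minimum path length between two positions on a circular chromosome."""
    d = abs(pos_a - pos_b)
    return min(d, chrom_len - d)


def insert_distribution(pairs, chrom_len, max_dist=2_500_000, n_bins=100,
                        min_dist=1000):
    """Normalized histogram of Hi-C read-pair distances for one chromosome.

    pairs: iterable of (pos_a, pos_b) integer alignment coordinates
    Pairs closer than min_dist are discarded; pairs at or beyond max_dist
    are skipped (an assumption, not stated in the caption).
    """
    counts = [0] * n_bins
    for pos_a, pos_b in pairs:
        d = circular_distance(pos_a, pos_b, chrom_len)
        if d < min_dist or d >= max_dist:
            continue
        counts[d * n_bins // max_dist] += 1
    total = sum(counts)
    return [c / total for c in counts] if total else [0.0] * n_bins


# Hypothetical read pairs on a chromosome the length of E. coli K12 DH10B.
K12_LENGTH = 4_686_137
example_pairs = [
    (100, 600),                  # 500 bp apart: discarded
    (0, 4_686_000),              # 137 bp apart across the origin: discarded
    (0, 30_000),                 # falls in bin 1
    (1_000_000, 1_010_000),      # falls in bin 0
    (10, 2_000_010),             # falls in bin 80
]

dist = insert_distribution(example_pairs, K12_LENGTH)
print([i for i, v in enumerate(dist) if v > 0])  # [0, 1, 80]
```

Treating the chromosome as circular is what avoids the edge effect the excerpt attributes to BWA's linear treatment of the sequence: the pair at coordinates 0 and 4,686,000 is 137 bp apart, not 4.7 Mb.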
  • 63. Pink Berries PB-PSB1 (Purple sulfur bacteria) PB-SRB1 (Sulfate reducing bacteria) (sulfate) (sulfide) Wilbanks, E.G. et al (2014). Environmental Microbiology Lizzy Wilbanks @lizzywilbanks
  • 64. Long Reads Help, A Lot Hiseq & Miseq 100-250 bp Moleculo 2-20 kb Pacbio RSII 2-20 kb Micky Kertesz, Tim Blauwcamp Meredith Ashby Cheryl Heiner Illumina-based “synthetic long reads” Real-time single molecule sequencing (p4-c2, p5-c3) 295 Megabases 474 Megabases 61 Gigabases
  • 65. Using Phylogeny 4: Better Reference Data
  • 66. PhyEco Markers
Phylogenetic group      Genome Number   Gene Number   Marker Candidates
Archaea                 62              145415        106
Actinobacteria          63              267783        136
Alphaproteobacteria     94              347287        121
Betaproteobacteria      56              266362        311
Gammaproteobacteria     126             483632        118
Deltaproteobacteria     25              102115        206
Epsilonproteobacteria   18              33416         455
Bacteroides             25              71531         286
Chlamydiae              13              13823         560
Chloroflexi             10              33577         323
Cyanobacteria           36              124080        590
Firmicutes              106             312309        87
Spirochaetes            18              38832         176
Thermi                  5               14160         974
Thermotogae             9               17037         684
Wu D, Jospin G, Eisen JA (2013) Systematic Identification of Gene Families for Use as “Markers” for Phylogenetic and Phylogeny-Driven Ecological Studies of Bacteria and Archaea and Their Major Subgroups. PLoS ONE 8(10): e77033. doi:10.1371/journal.pone.0077033
  • 67. Better Protein Families [Workflow, Figure 1 panels A, B, C: Representative Genomes → Extract Protein Annotation → All v. All BLAST → Homology Clustering (MCL) → SFams → Align & Build HMMs → HMMs; New Genomes → Extract Protein Annotation → Screen for Homologs] Sharpton et al. 2012. BMC Bioinformatics, 13(1), 264.
  • 69. Microbial Dark Matter Part 2 • Ramunas Stepanauskas • Tanja Woyke • Jonathan Eisen • Duane Moser • Tullis Onstott
  • 70.
  • 71. Phylogeny Isn’t Everything .. Model Systems
  • 72. Simple Symbioses: Baumannia makes vitamins and cofactors; Sulcia makes amino acids. Phylogenetic Binning. Wu et al. 2006 PLoS Biology 4: e188. Nancy Moran Dongying Wu
  • 73. Drosophila microbiome w/ Kopp Lab Both natural surveys and laboratory experiments indicate that host diet plays a major role in shaping the Drosophila bacterial microbiome. Laboratory strains provide only a limited model of natural host–microbe interactions Jenna Lang Angus Chandler
  • 74. Rice Microbiome w/ Sundar Lab Edwards et al. 2015. Structure, variation, and assembly of the root-associated microbiomes of rice. PNAS. www.pnas.org/cgi/doi/10.1073/pnas.1414592112
Supplementary Figures. Fig. S1. Map depicting soil collection locations for greenhouse experiment. Fig. S2. Sampling and collection of the rhizocompartments. Roots are collected from rice plants and soil is shaken off the roots to leave ~1 mm of soil around the roots. The ~1 mm of soil […]
[…] three separate rhizocompartments: the rhizosphere, rhizoplane, and endosphere (Fig. 1A). Because the root microbiome has been shown to correlate with the developmental stage of the plant (10), the root-associated microbial communities were sampled at 42 d (6 wk), when rice plants from all genotypes were well-established in the soil but still in their vegetative phase of growth. For our study, the rhizosphere compartment was […]
Fig. 1. Root-associated microbial communities are separable by rhizocompartment and soil type. (A) A representation of a rice root cross-section depicting the locations of the microbial communities sampled. (B) Within-sample diversity (α-diversity) measurements between rhizospheric compartments indicate a decreasing gradient in microbial diversity from the rhizosphere to the endosphere independent of soil type. Estimated species richness was calculated as e^(Shannon entropy). The horizontal bars within boxes represent median. The tops and bottoms of boxes represent 75th and 25th quartiles, respectively. The upper and lower whiskers extend 1.5× the interquartile range from the upper edge and lower edge of the box, respectively. All outliers are plotted as individual points. (C) PCoA using the WUF metric indicates that the largest separation between microbial communities is spatial proximity to the root (PCo 1) and the second largest source of variation is soil type (PCo 2). (D) Histograms of phyla abundances in each compartment and soil. B, bulk soil; E, endosphere; P, rhizoplane; S, rhizosphere; Sac, Sacramento.
[Passage cropped at the left margin; each […] marks a cut line start:] […] gate the relationship between rice ge […] icrobiome, domesticated rice varieties […] rated growing regions were tested. Six […] spanning two species within the Oryza […] 2 d in the greenhouse before sampling. […] a) cultivars M104, Nipponbare (both […] ties), IR50, and 93-11 (both indica va […] gside two cultivars of African cultivated […] g7102 (Glab B) and TOg7267 (Glab E). […] ed that rice genotype accounted for […] ariation between microbial communities […] % of the variance, P < 0.001; Dataset […] f the variance, P < 0.066; Dataset S5H); […] ntations for clustering patterns of the […] nt on the first two axes of unconstrained […] ppendix, Fig. S10). We then used CAP […] ffect of rice genotype on the microbial […] g on rice cultivar and controlling for […] and technical factors, we found that ge […] ice have a significant effect on root- […] mmunities (5.1%, P = 0.005, WUF, Fig. […] , UUF, SI Appendix, Fig. S11A). Ordi […] AP analysis revealed clustering patterns […] only partially consistent with genetic […] F and UUF metrics. The two japonica […] her and the two O. glaberrima cultivars […] ver, the indica cultivars were split, with […] O. glaberrima cultivars and IR50 clus […] cultivars. […] enotypic effect manifests in individual […] eparated the whole dataset to focus on […] vidually and conducted CAP analysis […] and technical factors. The rhizosphere […]
[…] eight sites were operated under two cultivation practices: organic cultivation and a more conventional cultivation practice termed “ecofarming” (see below). Because genotype explained the least variance in the greenhouse data, we limited the analysis to one cultivar, S102, a California temperate japonica variety that is widely cultivated by commercial growers and is closely related to M104 (26).
Field samples were collected from vegetatively growing rice plants in flooded fields and the previously defined rhizocompartments were analyzed as before. Unfortunately, collection of bulk soil controls for the field experiment was not […]
Fig. 3. Host plant genotype significantly affects microbial communities in the rhizospheric compartments. (A) Ordination of CAP analysis using the WUF metric constrained to rice genotype. (B) Within-sample diversity measurements of rhizosphere samples of each cultivar grown in each soil. Estimated species richness was calculated as e^(Shannon entropy). The horizontal bars within boxes represent median. The tops and bottoms of boxes represent 75th and 25th quartiles, respectively. The upper and lower whiskers extend 1.5× the interquartile range from the upper edge and lower edge of the box, respectively. All outliers are plotted as individual points.
[…] fields are too high to find representative soil that is unlikely to be affected by nearby plants. Amplification and sequencing of the field microbiome samples yielded 13,349,538 high-quality sequences (median: 54,069 reads per sample; range: 12,535–148,233 reads per sample; Dataset S13). The sequences were clustered into OTUs using the same criteria as the greenhouse experiment, yielding 222,691 microbial OTUs and 47,983 OTUs with counts >5 across the field dataset.
We found that the microbial diversity of field rice plants is significantly influenced by the field site. α-Diversity measurements of the field rhizospheres indicated that the cultivation site significantly impacts microbial diversity (SI Appendix, Fig. S14A, P = 2.00E-16, ANOVA and Dataset S14). Unconstrained PCoA using both the WUF and UUF metrics showed that microbial communities separated by field site across the first axis (Fig. 4B, WUF and SI Appendix, Fig. S14B, UUF). PERMANOVA agreed with the unconstrained PCoA in that field site explained the largest proportion of variance between the microbial communities for field plants (30.4% of variance, P < 0.001, WUF, Dataset S5O and 26.6% of variance, P < 0.001, UUF, Dataset S5P). CAP analysis constrained to field site and controlled for rhizocompartment, cultivation practice, and technical factors (sequencing batch and biological replicate) agreed with the PERMANOVA results in that the field site explains the largest proportion of variance between the root-associated microbial communities in field plants (27.3%, P = 0.005, WUF, SI Appendix, Fig. S15A and 28.9%, P = 0.005, UUF, SI Appendix, Fig. S15E), suggesting that geographical factors may shape root-associated microbial communities.
Rhizospheric Compartmentalization Is Retained in Field Plants. Similar to the greenhouse plants, the rhizospheric microbiomes of field plants are distinguishable by compartment. α-Diversity of the field plants again showed that the rhizosphere had the highest microbial diversity, whereas the endosphere had the least […]
Fig. 4. Root-associated microbiomes from field-grown plants are separable by cultivation site, rhizospheric compartment, and cultivation practice. (A) […]
[Panel labels: Variation w/in Plant; Cultivation Site Effects; Rice Genotype Effects]
[…] and mitochondrial) reads to analyze microbial abundance in the endosphere over time (Fig. 6A). Using this technique, we confirmed the sterility of seedling roots before transplantation. We found that microbial penetrance into the endosphere occurred at or before 24 h after transplantation and that the proportion of microbial reads to organellar reads increased over the first 2 wk after transplantation (Fig. 6A). To further support the evidence for microbiome acquisition within the first 24 h, we sampled root endospheric microbiomes from sterilely germinated seedlings before transplanting into Davis field soil as well as immediately after transplantation and 24 h after transplantation (SI Appendix, Fig. S24). The root endospheres of sterilely germinated seedlings, as well as seedlings transplanted into Davis field soil for 1 min, both had a very low percentage of microbial reads compared with organellar reads (0.22% and 0.71%), with the differences not statistically significant (P = 0.1, Wilcoxon test). As before, endospheric microbial abundance increased significantly, by >10-fold after 24 h in field soil (3.95%, P = 0.05, Wilcoxon test). We conclude that brief soil contact does not strongly increase the proportion of microbial reads, and therefore the increase in microbial reads at 24 h is indicative of endophyte acquisition within 1 d after transplantation.
α-Diversity significantly varied by rhizocompartment (P < 2E-16; Dataset S23) and there was a significant interaction between rhizocompartment and collection time (P = 0.042; Dataset S23); however, when each rhizocompartment was analyzed individually, the bulk soil was the only compartment that showed […] (13 d) approach the endosphere and rhizoplane microbiome compositions for plants that have been grown in the greenhouse for 42 d.
There are slight shifts in the distribution of phyla over time; however, there are significant distinctions between the compartments starting as early as 24 h after transplantation into soil (Fig. 6D, SI Appendix, Figs. S24B and S26, and Dataset S24). Because each phylum consists of diverse OTUs that could exhibit very different behaviors during acquisition, we next examined the dynamics and colonization patterns of specific OTUs within the time-course experiment. The core set of 92 endosphere-enriched OTUs obtained from the previous greenhouse experiment (SI Appendix, Fig. S9C) was analyzed for relative abundances at different time points (Fig. 6E). Of the 92 core endosphere-enriched microbes present in the greenhouse experiment, 53 OTUs were detectable in the endosphere in the time-course experiment. The average abundance profile over time revealed a colonization pattern for the core endospheric microbiome. Relative abundance of the core endosphere-enriched microbiome peaks early (3 d) in the rhizosphere and then decreases back to a steady, low level for the remainder of the time points. Similarly, the rhizoplane profile shows an increase after 3 d with a peak at 8 d with a decline at 13 d. The endosphere generally follows the rhizoplane profile, except that relative abundance is still increasing at 13 d. These results suggest that the core endospheric microbes are first attracted to the rhizosphere and then locate to the rhizoplane, where they attach […]
Fig. 5. OTU coabundance network reveals modules of OTUs associated with methane cycling. (A) Subset of the entire network corresponding to 11 modules with methane cycling potential. Each node represents one OTU and an edge is drawn between OTUs if they share a Pearson correlation of greater than or equal to 0.6. (B) Depiction of module 119 showing the relationship between methanogens, syntrophs, methanotrophs, and other methane cycling taxonomies. Each node represents one OTU and is labeled by the presumed function of that OTU’s taxonomy in methane cycling. An edge is drawn between two OTUs if they have a Pearson correlation of greater than or equal to 0.6. (C) Mean abundance profile for OTUs in module 119 across all rhizocompartments and field sites. The position along the x axis corresponds to a different field site. Error bars represent SE. The x and y axes represent no particular scale.
[Panel label: Function x Genotype]
[…] of magnitude greater than in any single plant species to date. Under controlled greenhouse conditions, the rhizocompartments described the largest source of variation in the microbial communities sampled (Dataset S5A). The pattern of separation between the microbial communities in each compartment is consistent with a spatial gradient from the bulk soil across the rhizosphere and rhizoplane into the endosphere (Fig. 1C). Similarly, microbial diversity patterns within samples hold the same pattern where there is a gradient in α-diversity from the rhizosphere to the endosphere (Fig. 1B). Enrichment and depletion of certain microbes across the rhizocompartments indicates that microbial colonization of rice roots is not a passive process and that plants have the ability to select for certain microbial consortia or that some microbes are better at filling the root colonizing niche. Similar to studies in Arabidopsis, we found that the relative abundance of Proteobacteria is increased in the endosphere compared with soil, and that the relative abundances of Acidobacteria and Gemmatimonadetes decrease from the soil to the endosphere (9–11), suggesting that the distribution of different bacterial phyla inside the roots might be similar for all land plants (Fig. 1D and Dataset S6). Under controlled greenhouse conditions, soil type described the second largest source of variation within the microbial communities of each sample. However, the soil source did not affect the pattern of separation between the rhizospheric compartments, suggesting that the rhizocompartments exert a recruitment effect on microbial consortia independent of the microbiome source.
By using differential OTU abundance analysis in the compartments, we observed that the rhizosphere serves an enrichment role for a subset of microbial OTUs relative to bulk soil (Fig. 2). Further, the majority of the OTUs enriched in the rhizosphere are simultaneously enriched in the rhizoplane and/or endosphere of rice roots (Fig. 2B and SI Appendix, Fig. S16B), consistent with a recruitment model in which factors produced by the root attract taxa that can colonize the endosphere. We found that the rhizoplane, although enriched for OTUs that are also […]
[Panel label: Time Series]
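The figure captions excerpted on this slide state that estimated species richness was calculated as e raised to the Shannon entropy. A minimal sketch of that calculation from a vector of OTU counts is shown below. The function names and the example counts are illustrative, and the natural logarithm is assumed because the result is exponentiated with base e.

```python
import math


def shannon_entropy(counts):
    """Shannon entropy (natural log) of a non-empty list of OTU counts."""
    total = sum(counts)
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)


def estimated_richness(counts):
    """Estimated species richness as e^(Shannon entropy)."""
    return math.exp(shannon_entropy(counts))


# Hypothetical samples: four equally abundant OTUs versus one dominant OTU.
even_sample = [25, 25, 25, 25]
skewed_sample = [97, 1, 1, 1]

print(round(estimated_richness(even_sample), 3))  # 4.0
print(estimated_richness(skewed_sample) < estimated_richness(even_sample))  # True
```

For a perfectly even community the value equals the number of OTUs, and it shrinks as a few OTUs come to dominate, which makes it a convenient scale for comparing compartments such as rhizosphere and endosphere.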
  • 75. Acknowledgements DOE JGI, Sloan, GBMF, NSF, DHS, DARPA. Aaron Darling, Lizzy Wilbanks, Jenna Lang, Russell Neches, Rob Knight, Jack Gilbert, Tanja Woyke, Rob Dunn, Katie Pollard, Jessica Green, Darlene Cavalier, Eddy Rubin, Wendy Brown, Dongying Wu, Phil Hugenholtz, DSMZ, Sundar, Srijak Bhatnagar, David Coil, Alex Alexiev, Hannah Holland-Moritz, Holly Bik, John Zhang, Holly Menninger, Guillaume Jospin, David Lang, Cassie Ettinger, Tim Harkins, Jennifer Gardy, Holly Ganz