Phylogenomic Case Studies:


The Benefits (and Occasional Drawbacks)


of Integrating


Evolutionary and Genomic Studies


BIATA 2021


Jonathan A. Eisen


University of California, Davis


@phylogenomics


http://phylogenomics.me
Eisen Lab
• Rules
Phylogenomics and Evolvability
•Mutation


•Duplication


•Deletion


•Rearrangement


•Recombination
Intrinsic
Novelty Origin


Evolvability: variation in these
processes w/in & between taxa


Phylogenomics: integrating
genomics & evolution, helps
interpret / predict evolvability
•Mutation


•Duplication


•Deletion


•Rearrangement


•Recombination
Intrinsic
Extrinsic
Novelty Origin


Evolvability &
Phylogenomics of
Extrinsic Novelties
Phylogenomics and Evolvability
•Recombination


•Gene transfer
•Mutation


•Duplication


•Deletion


•Rearrangement


•Recombination
Intrinsic
•Symbiosis


•Symbioses


•Microbiomes
Extrinsic
Novelty Origin


Evolvability &
Phylogenomics of
Extrinsic Novelties
Phylogenomics and Evolvability
•Recombination


•Gene transfer
Eisen Lab Funding
• NSF


• DOE


• Gordon and Betty Moore Foundation


• Alfred P. Sloan Foundation


• NIH


• UC Davis


• DARPA


• DHS
Eisen Lab “Topics”
Phylogenomic


Methods


& Tools
Microbial
Phylogenomics


&


Evolvability
Phylogenomic


Resources


&


Reference Data
Communication


&


Participation


In Microbiology


& Science
Research


Projects
Eisen Lab “Topics”
Phylogenomic


Projects
Microbial
Phylogenomics


&


Evolvability
A Brief Tour
Phylogenomic


Novelty:


Recombination
Phylogenomic


Projects
Area 1: Intrinsic Novelty I
Intrinsic
RecA Structure & Function I
Intrinsic
Liu SK, Eisen JA, Hanawalt PC, Tessman IW. 1993. recA mutations that reduce the constitutive coprotease activity of the RecA1202(PrtC) protein: possible involvement of interfilament
association in proteolytic and recombination activities. Journal of Bacteriology 175: 6518-6529. PMID: 8407828. PMCID: PMC206762.
RecA vs. rRNA
Eisen 1995 Journal of Molecular Evolution 41: 1105-1123.
More on this later …
I
Intrinsic
RecA From Other Species I
Intrinsic
RecA Missing From Some Taxa
Those taxa without RecA
homologs have no
homologous recombination
which has major impacts on
tempo and modes of
evolution


I
Intrinsic
Moran NA, Mira A. The process of genome shrinkage in the obligate symbiont Buchnera aphidicola. Genome Biol. 2001;2(12):RESEARCH0054. doi:10.1186/gb-2001-2-12-research0054
Phylogenomic


Novelty:


Rearrangements
Phylogenomic


Projects
Area 1: Intrinsic Novelty I
Intrinsic
[Genome dot plots: Streps; B. subt vs. Staph; M. tb vs. M. leprae (Mycobacterium tuberculosis vs. Mycobacterium leprae); Pyrococcus; Thermoplasmas; Pseudomonas]
The X-Files I
Intrinsic
[Figure: circular gene-order diagrams (genes 1-32) for the common ancestor of A and B and for genomes A1, A2, A3, B1, B2, and B3, connected by inversions around the origin (*) and inversions around the terminus (*)]
Symmetric Inversion Model
Eisen JA, Heidelberg JF, White O, Salzberg SL. Evidence for symmetric chromosomal inversions around the replication origin in bacteria. Genome Biol. 2000;1(6):RESEARCH0011. doi:10.1186/gb-2000-1-6-research0011
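A minimal sketch of the symmetric inversion idea, assuming a circular genome stored as a Python list with the replication origin at the boundary between the last and first elements. The function names `symmetric_inversion` and `dot_plot` and the toy eight-gene genome are hypothetical and chosen only for illustration. An inversion whose endpoints are equidistant from the origin keeps every gene at the same distance from the origin, so plotting gene positions in one genome against the other puts the inverted genes on the anti-diagonal, which is the X-shaped pattern.

```python
def symmetric_inversion(genome, k):
    """Invert the 2k genes spanning the origin (k genes on each side).

    The origin is assumed to sit between genome[-1] and genome[0].
    """
    n = len(genome)
    if not 0 < k <= n // 2:
        raise ValueError("k must be between 1 and half the genome length")
    head, tail = genome[:k], genome[n - k:]
    # The segment tail + head crosses the origin; reversing it swaps the
    # two flanks and flips the order of the genes inside each flank.
    return tail[::-1] + genome[k:n - k] + head[::-1]


def dot_plot(a, b):
    """Return (position in a, position in b) for every gene shared by a and b."""
    position_in_b = {gene: j for j, gene in enumerate(b)}
    return [(i, position_in_b[gene]) for i, gene in enumerate(a)]


ancestor = list(range(1, 9))                    # toy genome: genes 1..8
descendant = symmetric_inversion(ancestor, 2)   # invert 2 genes on each side
points = dot_plot(ancestor, descendant)
```

In this toy case the four inverted genes satisfy i + j = 7 (the anti-diagonal) while the untouched genes stay on the diagonal i = j.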


I
Intrinsic
Phylogenomic


Novelty:


Duplications
Phylogenomic


Projects
Area 1: Intrinsic Novelty I
Intrinsic
Tetrahymena Duplication Suppression I
Intrinsic
Phylogenomic


Novelty


Generation
Phylogenomic


Projects
Area 2: Extrinsic Novelty
Phylogenomic


Gene Transfer
Phylogenomic


Projects
Area 2: Extrinsic Novelty E1
Extrinsic
Wu et al., 2004. Collaboration between Jonathan Eisen and Scott
O’Neill (Yale, U. Queensland).
Wolbachia pipientis wMel E1
Extrinsic
Collaboration with Scott O’Neill and others
Wu M, Sun LV, Vamathevan J, et al. Phylogenomics of the reproductive parasite Wolbachia pipientis wMel: a streamlined genome overrun by mobile genetic elements.
PLoS Biol. 2004;2(3):E69. doi:10.1371/journal.pbio.0020069
Wolbachia Mobile/Repetitive DNA
Repeat Class | Size (Median) | Copies | Protein motifs/families | IS Family | Possible Terminal Inverted Repeat Sequence
1 | 1512 | 3 | Transposase | IS4 | 5’ ATACGCGTCAAGTTAAG 3’
2 | 360 | 12 | - | New | 5’ GGCTTTGTTGCATCGCTA 3’
3 | 858 | 9 | Transposase | IS492/IS110 | 5’ GGCTTTGTTGCAT 3’
4 | 1404.5 | 4 | Conserved hypothetical, phage terminase | New | 5’ ATACCGCGAWTSAWTCGCGGTAT 3’
5 | 1212 | 15 | Transposase | IS3 | 5’ TGACCTTACCCAGAAAAAGTGGAGAGAAAG 3’
6 | 948 | 13 | Transposase | IS5 | 5’ AGAGGTTGTCCGGAAACAAGTAAA 3’
7 | 2405.5 | 8 | RT/maturase | - |
8 | 468 | 45 | - | - |
9 | 817 | 3 | Conserved hypothetical, transposase | ISBt12 |
10 | 238 | 2 | ExoD | - |
11 | 225 | 2 | RT/maturase | - |
12 | 1263 | 4 | Transposase | ??? |
13 | 572.5 | 2 | Transposase | ??? | None detected
14 | 433 | 2 | Ankyrin | - |
15 | 201 | 2 | - | - |
16 | 1400 | 6 | RT/maturase | - |
17 | 721 | 2 | Transposase | IS630 |
18 | 1191.5 | 2 | EF-Tu | - |
19 | 230 | 2 | Hypothetical | - |
E1
Extrinsic
T. roseum mobile motility element
Wu et al doi:10.1371/journal.pone.0004207
E1
Extrinsic
Phylogenomic


Symbiosis


Symbioses


Communities
Phylogenomic


Projects
Area 2: Extrinsic Novelty
E2
Extrinsic
Host Microbe Stress (HMS) Triangle
Host
Microbe Stress
E2
Extrinsic
Host
Microbiome Stress
Host Microbe Stress (HMS) Triangle
E2
Extrinsic
Symbiosis Under Stress
When organisms are placed under selective
pressure or stress where novelty would be
beneficial, can we predict which pathway
they will use?


What leads to interactions / symbioses
being a potential solution?


Can we manipulate interactions and/or force
new ones upon systems?
Extrinsic


Novelty
Extrinsic Novelty: HMS
Phylogenomic





HMS
Phylogenomic


Projects
E2
Extrinsic
HMS Type 1: Nutrient Acquisition
Host
Microbiome Nutrients
E2
Extrinsic
HMS Type 1: Chemosymbioses
Marine Invertebrates
Endosymbionts Carbon
Colleen
 

Cavanaugh
E2
Extrinsic
HMS Type 1: Xylem Feeders
Glassy Winged Sharpshooter
Gut


Endosymbionts
Trying to


Live on


Xylem Fluid
Nancy Moran
Dongying Wu
E2
Extrinsic
HMS Type 1: Nitrogen Acquisition
Oloton


Corn
Mucilage


Microbiome
Low


N
Van Deynze A, Zamora P, Delaux PM, Heitmann C, Jayaraman D, Rajasekar S, Graham D, Maeda J, Gibson D, Schwartz KD, Berry AM, Bhatnagar S, Jospin G, Darling A, Jeannotte R, Lopez J, Weimer BC, Eisen JA, Shapiro HY, Ané JM, Bennett AB. 2018. Nitrogen fixation in a landrace of maize is supported by a mucilage-associated diazotrophic microbiota. PLoS Biology 16(8):e2006352. doi:10.1371/journal.pbio.2006352. PMID: 30086128. PMCID: PMC6080747.
E2
Extrinsic
HMS Type 2: Pathogens
Host
Microbiome Pathogen
E2
Extrinsic
HMS Type 2: Flu & Ducks
Ducks
Gut


Microbiome
Flu
Walter 

Boyce
Holly


Ganz
Sarah


Hird
Ladan


Doroud
Alana

Firl
Hird SM, Ganz H, Eisen JA, Boyce WM. 2018. The cloacal microbiome of five wild duck species varies by species and influenza A virus infection status. mSphere 3:e00382-18. https://doi.org/10.1128/mSphere.00382-18
Ganz, H.H., Doroud, L., Firl, A.J., Hird, S.M., Eisen, J.A. and Boyce, W.M., 2017. Community-level differences in the microbiome of healthy wild mallards and those infected by influenza A viruses. mSystems, 2(1), e00188-16.
E2
Extrinsic
HMS Type 2: Koalas & Chlamydia
Koala
Gut


Microbiome
Chlamydia


&


Antibiotics
Katherine


Dahlhausen
E2
Extrinsic
Dahlhausen KE, Jospin G, Coil DA, Eisen JA, Wilkins LGE. Isolation and sequence-based characterization of a koala symbiont: Lonepinella koalarum. PeerJ. 2020;8:e10177.
Published 2020 Oct 20. doi:10.7717/peerj.10177


Dahlhausen KE, Doroud L, Firl AJ, Polkinghorne A, Eisen JA. Characterization of shifts of koala (Phascolarctos cinereus) intestinal microbial communities associated with
antibiotic treatment. PeerJ. 2018;6:e4452. Published 2018 Mar 12. doi:10.7717/peerj.4452
Frogs
Skin


Microbiome
Chytrid
Sonia Ghose
Marina De León
HMS Type 2: Frogs and Chytrids
E2
Extrinsic
Host
Microbiome Changing


Environment
HMS Type 3: Environmental Change
E2
Extrinsic
HMS Type 3: Rice Microbiome
Rice
Root


Microbiome Domestication
E2
Extrinsic
Sundar Lab Srijak


Bhatnagar
Edwards J, Johnson C, Santos-Medellín C, et al. Structure, variation, and assembly of the root-associated microbiomes of rice. Proc Natl Acad Sci U S A. 2015;112(8):E911-E920. doi:10.1073/pnas.1414592112
Seagrass
Microbiome Returning to


The Sea
HMS Type 3: Seagrass Land to Sea
Jenna

Lang
Jessica 

Green
Jay 

Stachowicz David Coil
E2
Extrinsic
HMS Type 3: Panamanian Isthmus
1000s of Species
Microbiome
Rise of


Wilkins
Bill


Wcislo
Matt


Leray
E2
Extrinsic
Eisen Lab “Topics”
Phylogenomic


Projects
Microbial
Phylogenomics


&


Evolvability
End of Tour
Eisen Lab “Topics”
Phylogenomic


Methods


& Tools
Phylogenomic


Resources


&


Reference Data
Communication


&


Participation


In Microbiology


& Science
Research


&


Evolvability
Eisen Lab “Topics”
Phylogenomic


Methods


& Tools
Phylogenomic


Resources


&


Reference Data
Communication


&


Evolvability
Phylogenomics Methods and Tools
Phylogenomic


&


Evolvability
Phylogenomic
&


Evolvability
Phylogenomic
≠


Relatedness
STAP
An Automated Phylogenetic Tree-Based Small Subunit
rRNA Taxonomy and Alignment Pipeline (STAP)
Dongying Wu1
*, Amber Hartman1,6
, Naomi Ward4,5
, Jonathan A. Eisen1,2,3
1 UC Davis Genome Center, University of California Davis, Davis, California, United States of America, 2 Section of Evolution and Ecology, College of Biological Sciences,
University of California Davis, Davis, California, United States of America, 3 Department of Medical Microbiology and Immunology, School of Medicine, University of
California Davis, Davis, California, United States of America, 4 Department of Molecular Biology, University of Wyoming, Laramie, Wyoming, United States of America,
5 Center of Marine Biotechnology, Baltimore, Maryland, United States of America, 6 The Johns Hopkins University, Department of Biology, Baltimore, Maryland, United
States of America
Abstract
Comparative analysis of small-subunit ribosomal RNA (ss-rRNA) gene sequences forms the basis for much of what we know
about the phylogenetic diversity of both cultured and uncultured microorganisms. As sequencing costs continue to decline
and throughput increases, sequences of ss-rRNA genes are being obtained at an ever-increasing rate. This increasing flow of
data has opened many new windows into microbial diversity and evolution, and at the same time has created significant
methodological challenges. Those processes which commonly require time-consuming human intervention, such as the
preparation of multiple sequence alignments, simply cannot keep up with the flood of incoming data. Fully automated
methods of analysis are needed. Notably, existing automated methods avoid one or more steps that, though
computationally costly or difficult, we consider to be important. In particular, we regard both the building of multiple
sequence alignments and the performance of high quality phylogenetic analysis to be necessary. We describe here our fully-
automated ss-rRNA taxonomy and alignment pipeline (STAP). It generates both high-quality multiple sequence alignments
and phylogenetic trees, and thus can be used for multiple purposes including phylogenetically-based taxonomic
assignments and analysis of species diversity in environmental samples. The pipeline combines publicly-available packages
(PHYML, BLASTN and CLUSTALW) with our automatic alignment, masking, and tree-parsing programs. Most importantly,
this automated process yields results comparable to those achievable by manual analysis, yet offers speed and capacity that
are unattainable by manual efforts.
Citation: Wu D, Hartman A, Ward N, Eisen JA (2008) An Automated Phylogenetic Tree-Based Small Subunit rRNA Taxonomy and Alignment Pipeline (STAP). PLoS
ONE 3(7): e2566. doi:10.1371/journal.pone.0002566
multiple alignment and phylogeny was deemed unfeasible.
However, this we believe can compromise the value of the results.
For example, the delineation of OTUs has also been automated
via tools that do not make use of alignments or phylogenetic trees
(e.g., Greengenes). This is usually done by carrying out pairwise
comparisons of sequences and then clustering sequences that
have better than some cutoff threshold of similarity with each
other. This approach can be powerful (and reasonably efficient)
but it too has limitations. In particular, since multiple sequence
alignments are not used, one cannot carry out standard
phylogenetic analyses. In addition, without multiple sequence
alignments one might end up comparing and contrasting different
regions of a sequence depending on what it is paired with.
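The pairwise, alignment-free OTU delineation described above can be sketched roughly as follows. This is an illustration only, not the method of any named tool: the function names, the greedy first-match rule, the equal-length requirement, and the 0.9 cutoff are all assumptions made for the example.

```python
def percent_identity(a, b):
    """Fraction of matching positions; assumes equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("this toy example requires equal-length sequences")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def greedy_otus(seqs, cutoff=0.97):
    """Assign each sequence to the first cluster whose representative
    (its first member) is at least `cutoff` identical to it; otherwise
    start a new cluster."""
    clusters = []
    for s in seqs:
        for cluster in clusters:
            if percent_identity(s, cluster[0]) >= cutoff:
                cluster.append(s)
                break
        else:
            clusters.append([s])
    return clusters


reads = ["ACGTACGTAC", "ACGTACGTAT", "TTGTACGGAC", "ACGTACGTAC"]
otus = greedy_otus(reads, cutoff=0.9)
```

Because each comparison is made in isolation, nothing here yields a multiple sequence alignment, so no standard phylogenetic analysis can follow from the clusters.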
The limitations of avoiding multiple sequence alignments and
phylogenetic analysis are readily apparent in tools to classify
sequences. For example, the Ribosomal Database Project’s
Classifier program [29] focuses on composition characteristics of
each sequence (e.g., oligonucleotide frequency) and assigns
taxonomy based upon clustering genes by their composition.
Though this is fast and completely automatable, it can be misled in
cases where distantly related sequences have converged on similar
composition, something known to be a major problem in ss-rRNA
sequences [30]. Other taxonomy assignment systems focus
primarily on the similarity of sequences. The simplest of these is
to use BLASTN to search a sequence database (e.g., Genbank) and
to then use information about the top match to assign some sort of
classification tools it does have some limitations. For example,
the generation of new alignments for each sequence is both
computationally costly and does not take advantage of available
curated alignments that make use of ss-rRNA secondary structure
to guide the primary sequence alignment. Perhaps most
important, however, is that the tool is not fully automated. In
addition, it does not generate multiple sequence alignments for all
sequences in a dataset which would be necessary for doing many
analyses.
Automated methods for analyzing rRNA sequences are also
available at the web sites for multiple rRNA centric databases,
such as Greengenes and the Ribosomal Database Project (RDPII).
Though these and other web sites offer diverse powerful tools, they
do have some limitations. For example, not all provide multiple
sequence alignments as output and few use phylogenetic
approaches for taxonomy assignments or other analyses. More
importantly, all provide only web-based interfaces and their
integrated software, (e.g., alignment and taxonomy assignment),
cannot be locally installed by the user. Therefore, the user cannot
take advantage of the speed and computing power of parallel
processing such as is available on linux clusters, or locally alter and
potentially tailor these programs to their individual computing
needs (Table 1).
Given the limited automated tools that are available for
researchers have had to choose between two non-ideal options:
manually generating and/or curating alignments (an expensive
Table 1. Comparison of STAP’s computational abilities relative to existing commonly-used ss-rRNA analysis tools.
Feature | STAP | ARB | Greengenes | RDP
Installed where? | Locally | Locally | Web only | Web only
User interface | Command line | GUI | Web portal | Web portal
Parallel processing | YES | NO | NO | NO
Manual curation for taxonomy assignment | NO | YES | NO | NO
Manual curation for alignment | NO | YES | NO* | NO
Open source | YES** | NO | NO | NO
Processing speed | Fast | Slow | Medium | Medium
It is important to note that STAP is the only software that runs on the command line and can take advantage of parallel processing on linux clusters and, further, is more amenable to downstream code manipulation.
* Note: Greengenes alignment output is compatible with upload into ARB and downstream manual alignment.
** The STAP program itself is open source; the programs it depends on are freely available but not open source.
doi:10.1371/journal.pone.0002566.t001
ss-rRNA Taxonomy Pipeline
Figure 1. A flow chart of the STAP pipeline.
doi:10.1371/journal.pone.0002566.g001
STAP database, and the query sequence is aligned to them using
the CLUSTALW profile alignment algorithm [40] as described
above for domain assignment. By adapting the profile alignment
algorithm, the alignments from the STAP database remain intact,
while gaps are inserted and nucleotides are trimmed for the query
sequence according to the profile defined by the previous
alignments from the databases. Thus the accuracy and quality of
the alignment generated at this step depends heavily on the quality
of the Bacterial/Archaeal ss-rRNA alignments from the
Greengenes project or the Eukaryotic ss-rRNA alignments from
the RDPII project.
Phylogenetic analysis using multiple sequence alignments rests on
the assumption that the residues (nucleotides or amino acids) at the
same position in every sequence in the alignment are homologous.
Thus, columns in the alignment for which ‘‘positional homology’’
cannot be robustly determined must be excluded from subsequent
analyses. This process of evaluating homology and eliminating
questionable columns, known as masking, typically requires time-
consuming, skillful, human intervention. We designed an automat-
ed masking method for ss-rRNA alignments, thus eliminating this
bottleneck in high-throughput processing.
First, an alignment score is calculated for each aligned column
by a method similar to that used in the CLUSTALX package [42].
Specifically, an R-dimensional sequence space representing all the
possible nucleotide character states is defined. Then for each
aligned column, the nucleotide populating that column in each of
the aligned sequences is assigned a score in each of the R
dimensions (Sr) according to the IUB matrix [42]. The consensus
‘‘nucleotide’’ for each column (X) also has R dimensions, with the
score for each dimension (Xr) calculated as the average of the
scores for that column in that dimension (average of Sr). Thus the
score of the consensus nucleotide is a mathematical expression
describing the average ‘‘nucleotide’’ in that column for that
Figure 2. Domain assignment. In Step 1, STAP assigns a domain to
each query sequence based on its position in a maximum likelihood
tree of representative ss-rRNA sequences. Because the tree illustrated
here is not rooted, domain assignment would not be accurate and
reliable (sequence similarity based methods cannot make an accurate
assignment in this case either). However the figure illustrates an
important role of the tree-based domain assignment step, namely
automatic identification of deep-branching environmental ss-rRNAs.
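The column-scoring step of the automated masking method can be sketched as below. This is a minimal illustration under stated assumptions: R is taken to be 4 (A, C, G, T), and a simple one-hot score (1.0 for a match, 0.0 otherwise) stands in for the IUB matrix named in the text. The function names are hypothetical and are not taken from STAP.

```python
NUCLEOTIDES = "ACGT"  # R = 4 character states in this toy example


def score_vector(nucleotide):
    """Score one nucleotide in each of the R dimensions (Sr).

    One-hot scoring is an assumption standing in for the IUB matrix;
    gaps and unknown characters score 0.0 in every dimension.
    """
    return [1.0 if nucleotide == r else 0.0 for r in NUCLEOTIDES]


def consensus_vector(column):
    """Consensus 'nucleotide' X for a column: Xr is the average of Sr."""
    if not column:
        raise ValueError("column must contain at least one residue")
    vectors = [score_vector(n) for n in column]
    return [
        sum(v[r] for v in vectors) / len(vectors)
        for r in range(len(NUCLEOTIDES))
    ]


column = "AAAC"                       # one aligned column from four sequences
consensus = consensus_vector(column)  # averages in the order A, C, G, T
```

A well-conserved column gives a consensus vector close to a single axis of the sequence space, while a variable column gives a vector spread across several dimensions.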
Wu D, Hartman A, Ward N, Eisen JA. An automated phylogenetic tree-based small subunit rRNA taxonomy and alignment pipeline (STAP) [published correction appears in PLoS ONE. 2008;3(7). doi:10.1371/annotation/c1aa88dd-4360-4902-8599-4d7edca79817]. PLoS One. 2008;3(7):e2566. Published 2008 Jul 2. doi:10.1371/journal.pone.0002566
Venter et al., Science 304: 66. 2004
STAP for Sargasso Metagenome
WATERS
Hartman et al. BMC Bioinformatics 2010, 11:317
http://www.biomedcentral.com/1471-2105/11/317
Open Access
SOFTWARE
© 2010 Hartman et al; licensee BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons
Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in
Software
Introducing W.A.T.E.R.S.: a Workflow for the
Alignment, Taxonomy, and Ecology of Ribosomal
Sequences
Amber L Hartman†1,3, Sean Riddle†2, Timothy McPhillips2, Bertram Ludäscher2 and Jonathan A Eisen*1
Abstract
Background: For more than two decades microbiologists have used a highly conserved microbial gene as a phylogenetic marker for bacteria and archaea. The small-subunit ribosomal RNA gene, also known as 16S rRNA, is encoded by ribosomal DNA, 16S rDNA, and has provided a powerful comparative tool to microbial ecologists. Over time, the microbial ecology field has matured from small-scale studies in a select number of environments to massive collections of sequence data that are paired with dozens of corresponding collection variables. As the complexity of data and tool sets has grown, the need for flexible automation and maintenance of the core processes of 16S rDNA sequence analysis has increased correspondingly.
Results: We present WATERS, an integrated approach for 16S rDNA analysis that bundles a suite of publicly available 16S rDNA analysis software tools into a single software package. The "toolkit" includes sequence alignment, chimera removal, OTU determination, taxonomy assignment, and phylogenetic tree construction, as well as a host of ecological analysis and visualization tools. WATERS employs a flexible, collection-oriented 'workflow' approach using the open-source Kepler system as a platform.
Conclusions: By packaging available software tools into a single automated workflow, WATERS simplifies 16S rDNA analyses, especially for those without specialized bioinformatics or programming expertise. In addition, WATERS, like some of the newer comprehensive rRNA analysis tools, allows researchers to minimize the time dedicated to carrying out tedious informatics steps and to focus their attention instead on the biological interpretation of the results. One advantage of WATERS over other comprehensive tools is that the use of the Kepler workflow system facilitates result interpretation and reproducibility via a data provenance sub-system. Furthermore, new "actors" can be added to the workflow as desired and we see WATERS as an initial seed for a sizeable and growing repository of interoperable, easy-to-combine tools for asking increasingly complex microbial ecology questions.
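The idea of chaining interchangeable "actors" while recording provenance can be sketched in a few lines. This is a hypothetical illustration of the concept only: it does not use Kepler or reproduce its API, and the names `run_workflow`, `drop_short`, and `deduplicate` are invented for the example.

```python
def run_workflow(data, actors):
    """Pass data through each actor in turn and log what each one did."""
    provenance = []
    for actor in actors:
        result = actor(data)
        provenance.append(
            {"actor": actor.__name__, "n_in": len(data), "n_out": len(result)}
        )
        data = result
    return data, provenance


def drop_short(seqs, min_length=5):
    """Toy actor: discard sequences shorter than min_length."""
    return [s for s in seqs if len(s) >= min_length]


def deduplicate(seqs):
    """Toy actor: remove exact duplicates, keeping first occurrences."""
    return list(dict.fromkeys(seqs))


seqs = ["ACGTACGT", "ACG", "ACGTACGT", "TTGACCA"]
final, log = run_workflow(seqs, [drop_short, deduplicate])
```

Adding a new step in this sketch means appending another function to the list, and the log records how many sequences entered and left each step.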
Background
Microbial communities and how they are surveyed
Microbial communities abound in nature and are crucial
for the success and diversity of ecosystems. There is no
end in sight to the number of biological questions that
can be asked about microbial diversity on earth. From
animal and human guts to open ocean surfaces and deep
sea hydrothermal vents, to anaerobic mud swamps or
boiling thermal pools, to the tops of the rainforest canopy
and the frozen Antarctic tundra, the composition of
microbial communities is a source of natural history,
intellectual curiosity, and reservoir of environmental
health [1]. Microbial communities are also mediators of
insight into global warming processes [2,3], agricultural
success [4], pathogenicity [5,6], and even human obesity
[7,8].
In the mid-1980s, researchers began to sequence ribosomal RNAs from environmental samples in order to characterize the types of microbes present in those samples (e.g., [9,10]). This general approach was revolutionized by the invention of the polymerase chain reaction (PCR), which made it relatively easy to clone and then
* Correspondence: jaeisen@ucdavis.edu
1 Department of Medical Microbiology and Immunology and the Department
of Evolution and Ecology, Genome Center, University of California Davis, One
Shields Avenue, Davis, CA, 95616, USA
† Contributed equally
Full list of author information is available at the end of the article
Figure 1 Overview of WATERS. Schema of WATERS where white
boxes indicate "behind the scenes" analyses that are performed in WA-
[Figure 1 box labels: Align; Check chimeras; Cluster; Build Tree; Assign Taxonomy; Tree w/ Taxonomy; Diversity statistics & graphs; Unifrac files; Cytoscape network; OTU table]
Motivations
As outlined above, successfully processing microbial sequence collections is far from trivial. Each step is complex and usually requires significant bioinformatics expertise and time investment prior to the biological interpretation. In order to both increase efficiency and ensure that all best-practice tools are easily usable, we sought to create an "all-inclusive" method for performing all of these bioinformatics steps together in one package. To this end, we have built an automated, user-friendly, workflow-based system called WATERS: a Workflow for the Alignment, Taxonomy, and Ecology of Ribosomal Sequences (Fig. 1). In addition to being automated and simple to use, because WATERS is executed in the Kepler scientific workflow system (Fig. 2) it also has the advantage that it keeps track of the data lineage and provenance of data products [23,24].
Automation
The primary motivation in building WATERS was to minimize the technical, bioinformatics challenges that arise when performing DNA sequence clustering, phylogenetic tree, and statistical analyses by automating the 16S rDNA analysis workflow. We also hoped to exploit additional features that workflow-based approaches entail, such as optimized execution and data lineage tracking and browsing [23,25-27]. In the earlier days of 16S rDNA analysis, simply knowing which microbes were present and whether they were biologically novel was a noteworthy achievement. It was reasonable and expected, therefore, to invest a large amount of time and effort to get to that list of microbes. But now that current efforts are significantly more advanced and often require comparison of dozens of factors and variables with datasets of thousands of sequences, it is not practically feasible to process these large collections "by hand", and hugely inefficient if instead automated methods can be successfully employed.
Broadening the user base
A second motivation and perspective is that by minimizing the technical difficulty of 16S rDNA analysis through the use of WATERS, we aim to make the analysis of these datasets more widely available and allow individuals with
Figure 2 Screenshot of WATERS in Kepler software. Key features: the library of actors un-collapsed and displayed on the left-hand side, the input
and output paths where the user declares the location of their input files and desired location for the results files. Each green box is an individual Kepler
actor that performs a single action on the data stream. The connectors (black arrows) direct and hook up the actors in a defined sequence. Double-
clicking on any actor or connector allows it to be manipulated and re-arranged.
default is 97% and 99%), and they are also generated for
every metadata variable comparison that the user
includes.
Data pruning
To assist in troubleshooting and quality control,
WATERS returns to the user three fasta files of sequences
that were removed at various steps in the workflow. A
short_sequences.fas file is created that contains all
Figure 3 Biologically similar results automatically produced by WATERS on published colonic microbiota samples. (A) Rarefaction curves similar to curves shown in Eckburg et al. Fig. 2; 70-72 indicate patient numbers, i.e., 3 different individuals. (B) Weighted Unifrac analysis based on phylogenetic tree and OTU data produced by WATERS, very similar to Eckburg et al. Fig. 3B. (C) Neighbor-joining phylogenetic tree (Quicktree) representing the sequences analyzed by WATERS, which is clearly similar to Fig. S1 in Eckburg et al.
Hartman AL, Riddle S, McPhillips T, Ludäscher B, Eisen JA. Introducing W.A.T.E.R.S.: a workflow for the alignment, taxonomy, and ecology of ribosomal sequences. BMC
Bioinformatics. 2010;11:317. Published 2010 Jun 12. doi:10.1186/1471-2105-11-317
alignment used to build the profile, resulting in a multiple
sequence alignment of full-length reference sequences and
metagenomic reads. The final step of the alignment process is a
quality control filter that 1) ensures that only homologous SSU-
rRNA sequences from the appropriate phylogenetic domain are
included in the final alignment, and 2) masks highly gapped
alignment columns (see Text S1).
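Masking highly gapped alignment columns, the second part of the quality control filter described above, might look roughly like the sketch below. The 0.5 gap-fraction threshold, the `-` gap character, and the function name `mask_gapped_columns` are assumptions made for illustration; the actual criteria are in the paper's Text S1, which is not reproduced here.

```python
def mask_gapped_columns(alignment, max_gap_fraction=0.5):
    """Drop every column whose fraction of gap characters exceeds the threshold.

    `alignment` is a list of equal-length strings using '-' for gaps.
    """
    if not alignment:
        return []
    length = len(alignment[0])
    if any(len(row) != length for row in alignment):
        raise ValueError("all aligned sequences must have the same length")
    n_rows = len(alignment)
    keep = [
        col
        for col in range(length)
        if sum(row[col] == "-" for row in alignment) / n_rows <= max_gap_fraction
    ]
    return ["".join(row[col] for col in keep) for row in alignment]


alignment = ["AC-GT", "A--GT", "ACTG-", "A--GT"]
masked = mask_gapped_columns(alignment)
```

Here only the third column, which is a gap in three of the four sequences, is removed.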
We use this high quality alignment of metagenomic reads and
references sequences to construct a fully-resolved, phylogenetic
tree and hence determine the evolutionary relationships between
the reads. Reference sequences are included in this stage of the
analysis to guide the phylogenetic assignment of the relatively
short metagenomic reads. While the software can be easily
extended to incorporate a number of different phylogenetic tools
capable of analyzing metagenomic data (e.g., RAxML [27],
pplacer [28], etc.), PhylOTU currently employs FastTree as a
default method due to its relatively high speed-to-performance
PD versus PID clustering, 2) to explore overlap between PhylOTU
clusters and recognized taxonomic designations, and 3) to quantify
the accuracy of PhylOTU clusters from shotgun reads relative to
those obtained from full-length sequences.
PhylOTU Clusters Recapitulate PID Clusters
We sought to identify how PD-based clustering compares to
commonly employed PID-based clustering methods by applying
the two methods to the same set of sequences. Both PID-based
clustering and PhylOTU may be used to identify OTUs from
overlapping sequences. Therefore we applied both methods to a
dataset of 508 full-length bacterial SSU-rRNA sequences (refer-
ence sequences; see above) obtained from the Ribosomal Database
Project (RDP) [25]. Recent work has demonstrated that PID is
more accurately calculated from pairwise alignments than multiple
sequence alignments [32–33], so we used ESPRIT, which
Figure 1. PhylOTU Workflow. Computational processes are represented as squares and databases are represented as cylinders in this generalized workflow of PhylOTU. See Results section for details.
doi:10.1371/journal.pcbi.1001061.g001
Finding Metagenomic OTUs
Sharpton TJ, Riesenfeld SJ, Kembel SW, Ladau J, O'Dwyer JP, Green JL, Eisen JA, Pollard KS. (2011) PhylOTU: A High-Throughput Procedure Quantifies Microbial Community Diversity and Resolves Novel Taxa from Metagenomic Data. PLoS Comput Biol 7(1): e1001061. doi:10.1371/journal.pcbi.1001061
OTUs via Phylogeny (PhylOTU)
Tom Sharpton
Katie Pollard
Jessica Green
Finding Metagenomic OTUs
rRNA Copy # vs. Phylogeny
Steven Kembel
Jessica Green
Martin Wu
Kembel SW, Wu M, Eisen JA, Green JL (2012)
Incorporating 16S Gene Copy Number
Information Improves Estimates of Microbial
Diversity and Abundance. PLoS Comput Biol
8(10): e1002743. doi:10.1371/
journal.pcbi.1002743
Other


&


Evolvability
Phylogenomic
RecA vs. rRNA
Eisen 1995 Journal of Molecular Evolution 41: 1105-1123.
RecA From Other Species
RecA from Environment?
Eisen 1995 Journal of Molecular Evolution 41: 1105-1123.
Metagenomics
DNA
Venter et al., Science 304: 66. 2004
RecA Phylotyping - Sargasso Metagenome
GOS 1
GOS 2
GOS 3
GOS 4
GOS 5
Phylogenetic ID of Novel Lineages
Wu et al PLoS One 2011
Metagenomics
DNA
RecA, RpoB, Rpl4, rRNA, Hsp70, EFTu
http://genomebiology.com/2008/9/10/R151 Genome Biology 2008, Volume 9, Issue 10, Article R151 Wu and Eisen R151.7
Genome Biology 2008, 9:R151
sequences are not conserved at the nucleotide level [29]. As a
result, the nr database does not actually contain many more
protein marker sequences that can be used as references than
those available from complete genome sequences.
Comparison of phylogeny-based and similarity-based phylotyping
Although our phylogeny-based phylotyping is fully automated, it still
requires many more steps than, and is slower than, similarity-based
phylotyping methods such as MEGAN [30]. Is it worth the trouble?
Similarity-based phylotyping works by searching a query sequence against
a reference database such as NCBI nr and deriving taxonomic
information from the best matches or 'hits'. When species
that are closely related to the query sequence exist in the ref-
erence database, similarity-based phylotyping can work well.
However, if the reference database is a biased sample or if it
contains no closely related species to the query, then the top
hits returned could be misleading [31]. Furthermore, similar-
ity-based methods require an arbitrary similarity cut-off
value to define the top hits. Because individual bacterial
genomes and proteins can evolve at very different rates, a uni-
versal cut-off that works under all conditions does not exist.
As a result, the final results can be very subjective.
In contrast, our tree-based bracketing algorithm places the
query sequence within the context of a phylogenetic tree and
only assigns it to a taxonomic level if that level has adequate
sampling (see Materials and methods [below] for details of
the algorithm). With the well sampled species Prochlorococ-
cus marinus, for example, our method can distinguish closely
related organisms and make taxonomic identifications at the
species level. Our reanalysis of the Sargasso Sea data placed
672 sequences (3.6% of the total) within a P. marinus clade.
On the other hand, for sparsely sampled clades such as
Aquifex, assignments will be made only at the phylum level.
Thus, our phylogeny-based analysis is less susceptible to data
sampling bias than a similarity based approach, and it makes
Figure 3. Major phylotypes identified in Sargasso Sea metagenomic data. The metagenomic data previously obtained from the Sargasso Sea was reanalyzed using
AMPHORA and the 31 protein phylogenetic markers. The microbial diversity profiles obtained from individual markers are remarkably consistent. The
breakdown of the phylotyping assignments by markers and major taxonomic groups is listed in Additional data file 5.
[Bar chart. Y-axis: relative abundance (0 to 0.8). X-axis: Alphaproteobacteria, Betaproteobacteria, Gammaproteobacteria, Deltaproteobacteria, Epsilonproteobacteria, Unclassified proteobacteria, Bacteroidetes, Chlamydiae, Cyanobacteria, Acidobacteria, Thermotogae, Fusobacteria, Actinobacteria, Aquificae, Planctomycetes, Spirochaetes, Firmicutes, Chloroflexi, Chlorobi, Unclassified bacteria. One series per marker: dnaG, frr, infC, nusA, pgk, pyrG, rplA, rplB, rplC, rplD, rplE, rplF, rplK, rplL, rplM, rplN, rplP, rplS, rplT, rpmA, rpoB, rpsB, rpsC, rpsE, rpsI, rpsJ, rpsK, rpsM, rpsS, smpB, tsf.]
Many other genes
better than rRNA
Sargasso Phylotypes
[Bar chart. Y-axis: weighted % of clones (0.000 to 0.500). X-axis, major phylogenetic group: Alphaproteobacteria, Betaproteobacteria, Gammaproteobacteria, Epsilonproteobacteria, Deltaproteobacteria, Cyanobacteria, Firmicutes, Actinobacteria, Chlorobi, CFB, Chloroflexi, Spirochaetes, Fusobacteria, Deinococcus-Thermus, Euryarchaeota, Crenarchaeota. One series per marker: EFG, EFTu, HSP70, RecA, RpoB, rRNA.]
Venter et al., Science 304: 66. 2004
RecA Phylotyping - Sargasso Metagenome
Amphora
Martin Wu
Wu M, Eisen JA. A simple, fast, and accurate method of phylogenomic inference. Genome Biol. 2008;9(10):R151.
Published 2008 Oct 13. doi:10.1186/gb-2008-9-10-r151
AMPHORA
Figure 3. Major phylotypes identified in Sargasso Sea metagenomic data. The metagenomic data previously obtained from the Sargasso Sea was reanalyzed using
AMPHORA and the 31 protein phylogenetic markers. The microbial diversity profiles obtained from individual markers are remarkably consistent.
[Bar chart. Y-axis: relative abundance (0 to 0.8). X-axis: major taxonomic groups, from Alphaproteobacteria to Unclassified bacteria. One series per protein marker, from dnaG to tsf.]
AMPHORA Phylotyping w/ Protein Markers
Martin Wu
Wu M, Eisen JA. A simple, fast, and accurate method of phylogenomic inference. Genome Biol. 2008;9(10):R151.
Published 2008 Oct 13. doi:10.1186/gb-2008-9-10-r151
Phylosift - Bayesian Phylotyping
[Workflow diagram]
Input sequences: each input sequence is scanned against both workflows (rRNA workflow and protein workflow); parallel option
LAST: fast candidate search (search input against references)
hmmalign / Infernal: multiple alignment (profile HMMs used to align candidates to reference alignment); the diagram shows a length split at 600 bp (<600 bp, >600 bp)
pplacer: phylogenetic placement
Outputs: taxonomic summaries; sample analysis & comparison (Krona plots, number of reads placed for each marker gene, Edge PCA, tree visualization, Bayes factor tests)
Aaron Darling
Erik Matsen
Holly Bik
Guillaume Jospin
Erik Lowe
Darling AE, Jospin G, Lowe E, Matsen FA IV, Bik HM, Eisen JA. (2014) PhyloSift: phylogenetic analysis of genomes and metagenomes. PeerJ 2:e243 http://dx.doi.org/10.7717/peerj.243
PD from Metagenomes
typically used as a qualitative measure because duplicate sequences
are usually removed from the tree. However, the
test may be used in a semiquantitative manner if all clones,
even those with identical or near-identical sequences, are included
in the tree (13).
Here we describe a quantitative version of UniFrac that we
call “weighted UniFrac.” We show that weighted UniFrac behaves
similarly to the FST test in situations where both a
FIG. 1. Calculation of the unweighted and the weighted UniFrac
measures. Squares and circles represent sequences from two different
environments. (a) In unweighted UniFrac, the distance between the
circle and square communities is calculated as the fraction of the
branch length that has descendants from either the square or the circle
environment (black) but not both (gray). (b) In weighted UniFrac,
branch lengths are weighted by the relative abundance of sequences in
the square and circle communities; square sequences are weighted
twice as much as circle sequences because there are twice as many total
circle sequences in the data set. The width of branches is proportional
to the degree to which each branch is weighted in the calculations, and
gray branches have no weight. Branches 1 and 2 have heavy weights
since the descendants are biased toward the square and circles, respectively.
Branch 3 contributes no value since it has an equal contribution
from circle and square sequences after normalization.
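The two measures described in the caption can be sketched in a few lines of Python. This is a minimal illustration, not the UniFrac software itself: the flat `(length, n_square, n_circle)` branch layout, the function names, and the toy numbers are all assumptions made for the example, and the weighted value is the raw (unnormalized) form, the sum over branches of length × |square fraction − circle fraction|.

```python
from typing import List, Tuple

# One tree branch: (branch length, number of descendant "square" sequences,
# number of descendant "circle" sequences). This flat layout is an assumption
# made for brevity; a real tool would derive it from a tree file.
Branch = Tuple[float, int, int]


def unweighted_unifrac(branches: List[Branch]) -> float:
    """Fraction of branch length whose descendants come from one community only."""
    unique = 0.0
    total = 0.0
    for length, n_square, n_circle in branches:
        if n_square == 0 and n_circle == 0:
            continue  # branch has no sampled descendants
        total += length
        if n_square == 0 or n_circle == 0:
            unique += length
    return unique / total if total else 0.0


def weighted_unifrac(branches: List[Branch], total_square: int, total_circle: int) -> float:
    """Raw weighted UniFrac: each branch length is weighted by the difference
    in relative abundance of the two communities below that branch."""
    return sum(
        length * abs(n_square / total_square - n_circle / total_circle)
        for length, n_square, n_circle in branches
    )


# Toy data (hypothetical): 3 square sequences and 6 circle sequences in total.
toy_branches = [
    (1.0, 2, 0),  # descendants only from the square community
    (1.0, 0, 4),  # descendants only from the circle community
    (2.0, 1, 2),  # shared; equal proportions after normalization (1/3 vs 2/6)
]
u = unweighted_unifrac(toy_branches)
w = weighted_unifrac(toy_branches, total_square=3, total_circle=6)
print(u, w)
```

As in the caption, the third toy branch adds nothing to the weighted value because the two communities are equally represented below it once counts are divided by the community totals.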
Kembel SW, Eisen JA, Pollard KS, Green JL (2011) The Phylogenetic Diversity of
Metagenomes. PLoS ONE 6(8): e23214. doi:10.1371/journal.pone.0023214
Jessica Green
Steven Kembel
Katie Pollard
Zorro - Automated Masking
[Figure, panels A to D: distance to true tree (y-axis) versus sequence length (x-axis: 200, 400, 800, 1600, 3200) for NJ and ML trees, comparing no masking, zorro, and gblocks.]
Wu M, Chatterji S, Eisen JA (2012) Accounting For Alignment Uncertainty
in Phylogenomics. PLoS ONE 7(1): e30288. doi:10.1371/
journal.pone.0030288
Tools: Phylogenomic Functional Prediction
Phylogenomic


&


Evolvability
Phylogenomic
We need to be able to predict functions well from sequence data.
Tools: Phylogenomic Functional Prediction
Helicobacter pylori genome 1997
“The ability of H. pylori to
perform mismatch repair is
suggested by the presence of
methyl transferases, mutS
and uvrD. However,
orthologues of MutH and
MutL were not identified.”
MutL?
From http://asajj.roswellpark.org/huberman/dna_repair/mmr.html
Blast Search of H. pylori “MutS”
Sequences producing significant alignments:           Score (bits)   E value
sp|P73625|MUTS_SYNY3  DNA MISMATCH REPAIR PROTEIN     117            3e-25
sp|P74926|MUTS_THEMA  DNA MISMATCH REPAIR PROTEIN      69            1e-10
sp|P44834|MUTS_HAEIN  DNA MISMATCH REPAIR PROTEIN      64            3e-09
sp|P10339|MUTS_SALTY  DNA MISMATCH REPAIR PROTEIN      62            2e-08
sp|O66652|MUTS_AQUAE  DNA MISMATCH REPAIR PROTEIN      57            4e-07
sp|P23909|MUTS_ECOLI  DNA MISMATCH REPAIR PROTEIN      57            4e-07
BLAST search pulls up Syn. sp. MutS#2 with a much higher score (and a much lower E value) than the other MutS homologs
Based on this, TIGR predicted this species had mismatch repair
Based on Eisen et al. 1997 Nature Medicine 3: 1076-1078.
PSA: Similarity ≠ Relatedness
Overlaying Functions onto Tree
[Gene tree of MutS homologs with known functions overlaid. Labeled subfamilies: MSH1, MSH2, MSH3, MSH4, MSH5, MSH6, MutS1, MutS2. Tip labels are species codes (e.g., Ecoli, Bacsu, Synsp, Aquae, Borbu, Deira, Helpy, Yeast, Human, Mouse, Arath, Celeg).]
Based on Eisen, 1998 Nucl Acids Res 26: 4291-4300.
High Mutation Rate in H. pylori
BLAST search pulls up Syn. sp. MutS#2 with a much higher score (and a much lower E value) than the other MutS homologs
Based on this, TIGR predicted this species had mismatch repair
Based on Eisen et al. 1997 Nature Medicine 3: 1076-1078.
PHYLOGENETIC PREDICTION OF GENE FUNCTION
METHOD
CHOOSE GENE(S) OF INTEREST
IDENTIFY HOMOLOGS
ALIGN SEQUENCES
CALCULATE GENE TREE
OVERLAY KNOWN FUNCTIONS ONTO TREE
INFER LIKELY FUNCTION OF GENE(S) OF INTEREST
[Diagram: the method is illustrated with EXAMPLE A (genes 1 to 6) and EXAMPLE B (genes 1A, 2A, 3A, 1B, 2B, 3B from Species 1, Species 2, and Species 3), next to the ACTUAL EVOLUTION (ASSUMED TO BE UNKNOWN). Labels on the trees include "Duplication?", "Duplication", and "Ambiguous".]
Based on Eisen, 1998 Genome Res 8: 163-167.
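The last two steps of the method (overlay known functions onto the tree, then infer the likely function of the gene of interest) can be sketched as follows. This is only an illustrative simplification, not the procedure from the cited paper: the nested-tuple tree, the function name `predict_function`, the rule "use the smallest clade that contains a characterized gene", and the gene and function labels are all assumptions made for the example.

```python
from typing import Dict, List, Optional, Tuple, Union

# A gene tree as nested tuples of leaf names (hypothetical representation).
Tree = Union[str, Tuple["Tree", ...]]


def leaves(tree: Tree) -> List[str]:
    """Return all leaf names below a node."""
    if isinstance(tree, str):
        return [tree]
    names: List[str] = []
    for child in tree:
        names.extend(leaves(child))
    return names


def predict_function(tree: Tree, query: str, known: Dict[str, str]) -> Optional[str]:
    """Climb from the query toward the root until a clade contains at least one
    characterized gene. Return the shared function, "ambiguous" if the
    characterized genes in that clade disagree, or None if nothing is known."""

    def walk(node: Tree) -> Tuple[bool, Optional[str]]:
        if isinstance(node, str):
            return node == query, None
        for child in node:
            found, result = walk(child)
            if not found:
                continue
            if result is not None:
                return True, result  # already decided in a smaller clade
            functions = {known[name] for name in leaves(node)
                         if name in known and name != query}
            if not functions:
                return True, None  # keep climbing
            return True, functions.pop() if len(functions) == 1 else "ambiguous"
        return False, None

    return walk(tree)[1]


# Toy gene family with two subfamilies, A and B (labels are hypothetical).
tree = ((("1A", "2A"), "3A"), (("1B", "2B"), "3B"))
known = {"1A": "function X", "3A": "function X",
         "1B": "function Y", "3B": "function Y"}

print(predict_function(tree, "2A", known))
print(predict_function(tree, "2B", known))
# A query whose nearest characterized relatives disagree:
print(predict_function((("1A", "1B"), "q"), "q", known))
```

The design choice here is deliberately conservative: a prediction is made only from the closest clade that contains experimental information, and disagreement within that clade is reported rather than resolved.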
Phylogenomics
Phylotyping
Eisen et al. 1992. J. Bact. 174: 3416
Proteorhodopsin Phylogenomics
Venter et al., Science 304: 66. 2004
Shotmap
[Workflow diagram with steps numbered 1 to 9. Box labels:]
Simulate metagenomic library
Translate metagenomic reads
Search metagenomic peptides
Classify metagenomic peptides
Estimate protein family abundance
Taxonomic profiles from real metagenomes
Protein family database
IMG/ER reference genomes
Construct mock community
Annotate genes in genomes
Expected abundance of gene families
Evaluate estimation accuracy
Tom Sharpton
Katie Pollard
https://github.com/sharpton/shotmap
Shotmap
Nayfach S, Bradley PH, Wyman SK, et al. Automated and Accurate Estimation of Gene Family Abundance from
Shotgun Metagenomes. PLoS Comput Biol. 2015;11(11):e1004573. Published 2015 Nov 13. doi:10.1371/
journal.pcbi.1004573
Stephen Nayfach
Limitations of Phylogenetic Prediction of Function
• Still imperfectly automated


• Each gene family different


• Each function different


• In some cases, function does not track with phylogeny well


• Does not work when NO members of a gene family have
been characterized
Tools: Phylogenetic Profiling
Phylogenetic


&


Evolvability
Phylogenomic
• Thermophile (grows at 80°C)


• Anaerobic


• Grows very efficiently on CO (Carbon
Monoxide)


• Produces hydrogen gas


• Low GC Gram positive (Firmicute)


• Genome Determined (Wu et al. 2005
PLoS Genetics 1: e65. )
Martin Wu Frank Robb
Homologs of Sporulation Genes
Wu et al. 2005 PLoS
Genetics 1: e65.
Carboxydothermus sporulates
Wu et al. 2005 PLoS Genetics 1: e65.
Non-Homology Predictions: Phylogenetic Profiling
• Step 1: Search all genes in organisms of
interest against all other genomes


• Ask: Yes or No, is each gene found in each
other species


• Cluster genes by distribution patterns
(profiles)
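The three steps above can be sketched in Python. This is a toy illustration under stated assumptions: the search step is taken as already done (its result is the hypothetical `hits` dictionary), the genome and gene names are invented, and "clustering" is reduced to grouping genes with identical presence/absence profiles rather than a distance-based clustering.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple

Profile = Tuple[int, ...]


def build_profiles(hits: Dict[str, Set[str]], genomes: List[str]) -> Dict[str, Profile]:
    """Yes/No step: for each gene, record 1 if a homolog was found in a genome, else 0."""
    return {gene: tuple(int(genome in found) for genome in genomes)
            for gene, found in hits.items()}


def cluster_by_profile(profiles: Dict[str, Profile]) -> Dict[Profile, List[str]]:
    """Group genes that share exactly the same distribution pattern."""
    clusters: Dict[Profile, List[str]] = defaultdict(list)
    for gene, profile in profiles.items():
        clusters[profile].append(gene)
    return dict(clusters)


# Hypothetical search results: gene -> genomes in which a homolog was found.
genomes = ["sporeformer_1", "sporeformer_2", "nonsporeformer_1", "nonsporeformer_2"]
hits = {
    "known_spo_gene_1": {"sporeformer_1", "sporeformer_2"},
    "known_spo_gene_2": {"sporeformer_1", "sporeformer_2"},
    "unknown_gene": {"sporeformer_1", "sporeformer_2"},
    "housekeeping_gene": set(genomes),
}

profiles = build_profiles(hits, genomes)
clusters = cluster_by_profile(profiles)
for profile, genes in clusters.items():
    print(profile, genes)
```

In this toy case the uncharacterized gene falls in the same profile group as the two known sporulation genes, which is the kind of non-homology hint that profiling is meant to provide.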
Sporulation Gene Profile
Wu et al. 2005 PLoS Genetics 1: e65.
B. subtilis new sporulation genes
Bjorn Traag
Richard Losick
Antonia Pugliese
J Bacteriol. 2013 Jan;195(2):253-60. doi: 10.1128/JB.01778-12
PG Profiling Works Better with Orthology
Martin Wu
Eisen JA, Wu M. 2002. Phylogenetic analysis and gene functional predictions: phylogenomics in action. Theoretical and Population Biology
61: 481-487. PMID: 12167367.
PG Profiling for Metagenomes
Jiang X, Langille MGI, Neches RY, Elliot M, Levin SA, Eisen JA, et al. (2012) Functional Biogeography of Ocean Microbes Revealed
through Non-Negative Matrix Factorization. PLoS ONE 7(9): e43866. doi:10.1371/journal.pone.0043866
Unidentified Pfams with high association to Components 1, 2
and 5 may have similar functional themes to other Pfams seen in
these components, or they may have functions that are ecologically
linked to the identified theme, or they may be associated
taxonomically rather than functionally (i.e., they may be expressed
by the same taxa that express the identified Pfams). In the future,
Additionally, we inspected the Pfams that were associated with
the ‘‘ubiquitous’’ cluster previously identified in Figure 2. Many of
these Pfams are associated with bacterial primary metabolism and
only 1% of these had unknown functions (Table S6). This is a
striking difference compared to the 15–54% proportion of
unknown Pfams seen in the five NMF components.
Figure 3. Components across sites. a) Weight for each of the five components at each of the 45 sites ($H^{T}$); b) the site-similarity matrix ($\hat{H}^{T}\hat{H}$); c)
environmental variables for the sites. The matrices are aligned so that the same row corresponds to the same site in each matrix. Sites are ordered by
applying spectral reordering to the similarity matrix (see Materials and Methods). Rows are aligned across the three matrices.
doi:10.1371/journal.pone.0043866.g003
Tools: Whole Genome Phylogeny
Whole


&


Evolvability
Phylogenomic
We need to know how organisms are related to each other
Tools: Whole Genome Phylogeny
16S Says Hyphomonas is in Rhodobacterales
Badger et al. 2005 Int J System Evol Microbiol 55: 1021-1026.
Naomi Ward
Jonathan Badger
WGT & gene trees: Related to Caulobacterales
Badger et al. 2005 Int J System Evol Microbiol 55: 1021-1026.
Naomi Ward
Jonathan Badger
HMS Type 1: Xylem Feeders
Glassy Winged Sharpshooter
Gut Endosymbionts
Trying to Live on Xylem Fluid
Nancy Moran
Dongying Wu
E2
Extrinsic
WGT: Higher Evolutionary Rates in Endosymbionts
Wu et al. 2006 PLoS Biology 4: e188. Collaboration with Nancy Moran's Lab
Higher Evolutionary Rates in Endosymbionts
Wu et al. 2006 PLoS Biology 4: e188. Collaboration with Nancy Moran's Lab
MutS MutL
+ +
+ +
+ +
+ +
_ _
_ _
Variation in Evolution Rates Correlated with Repair Gene Presence
Highest Rates In Those Missing Mismatch Repair Genes
Wu et al. 2006 PLoS Biology 4: e188. Collaboration with Nancy Moran's Lab
MutS MutL
+ +
+ +
+ +
+ +
_ _
_ _
Variation in Evolution Rates Correlated with Repair Gene Presence
Important Use of Whole Genome Trees
Whole Genome Trees: Many Possible Methods
Lang JM, Darling AE, Eisen JA (2013) Phylogeny of
Bacterial and Archaeal Genomes Using Conserved
Genes: Supertrees and Supermatrices. PLoS ONE
8(4): e62510. doi:10.1371/journal.pone.0062510
Jenna Lang
Automated WGT: Amphora
Martin Wu
Automated WGT: Phylosift
[Workflow diagram]
Input sequences: each input sequence is scanned against both workflows (rRNA workflow and protein workflow); parallel option
LAST: fast candidate search (search input against references)
hmmalign / Infernal: multiple alignment (profile HMMs used to align candidates to reference alignment); the diagram shows a length split at 600 bp (<600 bp, >600 bp)
pplacer: phylogenetic placement
Outputs: taxonomic summaries; sample analysis & comparison (Krona plots, number of reads placed for each marker gene, Edge PCA, tree visualization, Bayes factor tests)
Aaron Darling
Erik Matsen
Holly Bik
Guillaume Jospin
Erik Lowe
Darling AE, Jospin G, Lowe E, Matsen FA IV, Bik HM, Eisen JA. (2014) PhyloSift: phylogenetic analysis of genomes and metagenomes. PeerJ 2:e243 http://dx.doi.org/10.7717/peerj.243
Normalizing Across Genes Tree OTU
Wu, D., Doroud, L, Eisen, JA 2013. arXiv. TreeOTU: Operational Taxonomic Unit Classification Based on Phylogenetic
Dongying Wu
Tools: Linking Phylogeny and Function
Linking


&


Evolvability
Phylogenomic
flow. PeerJ 3: e960. PMID: 26020012. PMCID: PMC4435499.
Binning & Assembly
DNA
inputs of fixed carbon or nitrogen from external sources. As with
Leptospirillum group I, both Leptospirillum group II and III have the
genes needed to fix carbon by means of the Calvin–Benson–
Bassham cycle (using type II ribulose 1,5-bisphosphate carboxy-
lase–oxygenase). All genomes recovered from the AMD system
contain formate hydrogenlyase complexes. These, in combination
with carbon monoxide dehydrogenase, may be used for carbon
fixation via the reductive acetyl coenzyme A (acetyl-CoA) pathway
by some, or all, organisms. Given the large number of ABC-type
sugar and amino acid transporters encoded in the Ferroplasma type
Figure 4 Cell metabolic cartoons constructed from the annotation of 2,180 ORFs
identified in the Leptospirillum group II genome (63% with putative assigned function) and
1,931 ORFs in the Ferroplasma type II genome (58% with assigned function). The cell
cartoons are shown within a biofilm that is attached to the surface of an acid mine
drainage stream (viewed in cross-section). Tight coupling between ferrous iron oxidation,
pyrite dissolution and acid generation is indicated. Rubisco, ribulose 1,5-bisphosphate
carboxylase–oxygenase. THF, tetrahydrofolate.
NATURE | doi:10.1038/nature02340 | www.nature.com/nature
©2004 Nature Publishing Group
HiC Metagenomic Binning
Beitel CW, Froenicke L, Lang JM, Korf IF, Michelmore RW, Eisen JA,
Darling AE. (2014) Strain- and plasmid-level deconvolution of a
synthetic metagenome by sequencing proximity ligation products.
PeerJ 2:e415 http://dx.doi.org/10.7717/peerj.415
Table 1 Species alignment fractions. The number of reads aligning to each replicon present in the
synthetic microbial community are shown before and after filtering, along with the percent of total
constituted by each species. The GC content (“GC”) and restriction site counts (“#R.S.”) of each replicon,
species, and strain are shown. Bur1: B. thailandensis chromosome 1. Bur2: B. thailandensis chromosome
2. Lac0: L. brevis chromosome, Lac1: L. brevis plasmid 1, Lac2: L. brevis plasmid 2, Ped: P. pentosaceus,
K12: E. coli K12 DH10B, BL21: E. coli BL21. An expanded version of this table can be found in Table S2.
Sequence   Alignment    % of Total   Filtered     % of aligned   Length      GC      #R.S.
Lac0       10,603,204   26.17%       10,269,562   96.85%         2,291,220   0.462   629
Lac1       145,718      0.36%        145,478      99.84%         13,413      0.386   3
Lac2       691,723      1.71%        665,825      96.26%         35,595      0.385   16
Lac        11,440,645   28.23%       11,080,865   96.86%         2,340,228   0.46    648
Ped        2,084,595    5.14%        2,022,870    97.04%         1,832,387   0.373   863
BL21       12,882,177   31.79%       2,676,458    20.78%         4,558,953   0.508   508
K12        9,693,726    23.92%       1,218,281    12.57%         4,686,137   0.507   568
E. coli    22,575,903   55.71%       3,894,739    17.25%         9,245,090   0.51    1076
Bur1       1,886,054    4.65%        1,797,745    95.32%         2,914,771   0.68    144
Bur2       2,536,569    6.26%        2,464,534    97.16%         3,809,201   0.672   225
Bur        4,422,623    10.91%       4,262,279    96.37%         6,723,972   0.68    369
Figure 1 Hi-C insert distribution. The distribution of genomic distances between Hi-C read pairs is
shown for read pairs mapping to each chromosome. For each read pair the minimum path length on
the circular chromosome was calculated and read pairs separated by less than 1000 bp were discarded.
The 2.5 Mb range was divided into 100 bins of equal size and the number of read pairs in each bin
was recorded for each chromosome. Bin values for each chromosome were normalized to sum to 1 and
plotted.
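The procedure in this caption (minimum path length on a circular chromosome, a 1000 bp cutoff, 100 equal bins over 2.5 Mb, normalization to sum to 1) can be sketched as below. This is a minimal illustration, not the authors' code: the function names, the handling of distances at or beyond the 2.5 Mb range (dropped here), and the toy read-pair coordinates are assumptions; the chromosome length is the E. coli K12 value from Table 1.

```python
from typing import List, Tuple


def circular_distance(a: int, b: int, chrom_len: int) -> int:
    """Minimum path length between two positions on a circular chromosome."""
    d = abs(a - b)
    return min(d, chrom_len - d)


def insert_distribution(
    pairs: List[Tuple[int, int]],
    chrom_len: int,
    min_sep: int = 1000,
    max_range: int = 2_500_000,
    n_bins: int = 100,
) -> List[float]:
    """Histogram of read-pair separations, normalized so the bins sum to 1."""
    bin_width = max_range / n_bins
    counts = [0] * n_bins
    for a, b in pairs:
        d = circular_distance(a, b, chrom_len)
        # Discard close pairs; dropping pairs beyond the range is an assumption.
        if d < min_sep or d >= max_range:
            continue
        counts[int(d // bin_width)] += 1
    total = sum(counts)
    if total == 0:
        return [0.0] * n_bins
    return [c / total for c in counts]


K12_LENGTH = 4_686_137  # E. coli K12 DH10B length from Table 1

# Hypothetical mapped positions of Hi-C read pairs on one chromosome.
toy_pairs = [
    (100, 600),              # 500 bp apart: discarded (< 1000 bp)
    (0, 4_686_000),          # spans the linearization point: 137 bp, discarded
    (1_000_000, 1_010_000),  # 10 kb apart: first bin
    (10_000, 40_000),        # 30 kb apart: second bin
    (0, 2_000_000),          # 2 Mb apart
]
dist = insert_distribution(toy_pairs, K12_LENGTH)
print(dist[0], dist[1], dist[80])
```

Using the circular distance rather than the linear coordinate difference is what keeps pairs that span the linearization point from being counted as megabase-scale contacts.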
E. coli K12 genome were distributed in a similar manner as previously reported (Fig. 1;
(Lieberman-Aiden et al., 2009)). We observed a minor depletion of alignments spanning
the linearization point of the E. coli K12 assembly (e.g., near coordinates 0 and 4686137)
due to edge effects induced by BWA treating the sequence as a linear chromosome rather
than circular.
Figure 2 Metagenomic Hi-C associations. The log-scaled, normalized number of Hi-C read pairs
associating each genomic replicon in the synthetic community is shown as a heat map (see color scale,
blue to yellow: low to high normalized, log scaled association rates). Bur1: B. thailandensis chromosome
1. Bur2: B. thailandensis chromosome 2. Lac0: L. brevis chromosome, Lac1: L. brevis plasmid 1, Lac2:
L. brevis plasmid 2, Ped: P. pentosaceus, K12: E. coli K12 DH10B, BL21: E. coli BL21.
reference assemblies of the members of our synthetic microbial community with the same
alignment parameters as were used in the top ranked clustering (described above). We first
counted the number of Hi-C reads associating each reference assembly replicon (Fig. 2;
Figure 3 Contigs associated by Hi-C reads. A graph is drawn with nodes depicting contigs and edges
depicting associations between contigs as indicated by aligned Hi-C read pairs, with the count thereof
depicted by the weight of edges. Nodes are colored to reflect the species to which they belong (see legend)
with node size reflecting contig size. Contigs below 5 kb and edges with weights less than 5 were excluded.
Contig associations were normalized for variation in contig size.
typically represent the reads and variant sites as a variant graph wherein variant sit
represented as nodes, and sequence reads define edges between variant sites observ
the same read (or read pair). We reasoned that variant graphs constructed from H
data would have much greater connectivity (where connectivity is defined as the m
path length between randomly sampled variant positions) than graphs constructed
Chris Beitel @datscimed
Aaron Darling @koadman
Long Reads Help, A Lot
Hiseq & Miseq
100-250 bp
Moleculo
2-20 kb
Pacbio RSII
2-20kb
Micky Kertesz, Tim Blauwcamp
Meredith Ashby
Cheryl Heiner
Illumina-based “synthetic long reads”
Real-time single molecule sequencing (p4-c2, p5-c3)
295 Megabases 474 Megabases
61 Gigabases
Metagenomic Binning
Phylogeny is an
important tool in
binning
Sharpshooter Symbionts
Wu et al. 2006 PLoS Biology 4: e188.
Phylogenetic Binning: CFB Phyla
Sharpshooter Symbiont Binning
Wu et al. 2006 PLoS Biology 4: e188.
Baumannia makes vitamins and cofactors
Sulcia makes amino acids
Wu et al. 2006 PLoS Biology 4: e188.
Baumannia makes vitamins and cofactors
Sulcia makes amino acids
Phylogenetic Binning
Nancy Moran
Dongying Wu
Resources and Reference Data
Phylogenomic


&


Evolvability
Phylogenomic


Resources


&


Reference Data
Communication
Genomes Poorly Sampled
Figure from Barton, Eisen et al. “Evolution”, CSHL Press based on Baldauf et al Tree
2002-2007: TIGR Tree of Life Project
Figure from Barton, Eisen et al. “Evolution”, CSHL Press based on Baldauf et al Tree


Naomi Ward
Karen Nelson
2007-2014: GEBA
Figure from Barton, Eisen et al. “Evolution”, CSHL Press based on Baldauf et al Tree


Dongying Wu
Phil Hugenholtz
Nikos Kyrpides
Hans-Peter Klenk
Alla Lapidus
Synapomorphies Exist
Wu et al. 2009. Nature 462: 1056-1060.
Missing Microbes?
GEBA Cyanobacteria
Shih et al. 2013. PNAS 10.1073/pnas.1217107110
[Tree figure, scale bar 0.3, panels A and B. Cyanobacterial subclades: A, B1, B2, B3, C1, C2, C3, D, E, F, G. Plastid lineages: Paulinella, Glaucophyte, Green, Red, Chromalveolates.]
Fig. 2. Implications on plastid evolution. (A) Maximum-likelihood phylogenetic tree of plastids and cyanobacteria, grouped by subclades (Fig. 1). The red dot
Cheryl Kerfeld
Haloarchaeal GEBA-like
Lynch et al. (2012) PLoS ONE 7(7): e41389. doi:10.1371/journal.pone.0041389
Erin Lynch
The Dark Matter of Biology
From Wu et al. 2009 Nature 462, 1056-1060
JGI Dark Matter Project
environmental samples (n=9)
isolation of single cells (n=9,600)
whole genome amplification (n=3,300)
SSU rRNA gene based identification (n=2,000)
genome sequencing, assembly and QC (n=201)
draft genomes (n=201)
[Sample labels: SAK, HSM, ETL, TG, HOT, GOM, GBS, EPR, TA. Habitat categories: seawater, brackish/freshwater, hydrothermal, sediment, bioreactor.]
[Tree figure. BACTERIA labels include: GN04, WS3 (Latescibacteria), GN01, LD1, WS1, Poribacteria, BRC1, Lentisphaerae, Verrucomicrobia, OP3 (Omnitrophica), Chlamydiae, Planctomycetes, NKB19 (Hydrogenedentes), WYO, Armatimonadetes, WS4, Actinobacteria, Gemmatimonadetes, NC10, SC4, WS2, Cyanobacteria, Deltaproteobacteria, EM19 (Calescamantes), GAL35, Aquificae, EM3, Thermotogae, Dictyoglomi, SPAM, GAL15, CD12 (Aerophobetes), OP8 (Aminicenantes), AC1, SBR1093, Thermodesulfobacteria, Deferribacteres, Synergistetes, OP9 (Atribacteria), Caldiserica, AD3, Chloroflexi, Acidobacteria, Elusimicrobia, Nitrospirae, 49S1 2B, Caldithrix, GOUTA4, Chlorobi, Tenericutes, Chrysiogenetes, Proteobacteria, TG3, Spirochaetes, WWE1 (Cloacamonetes), ZB3, OP1 (Acetothermia), Bacteroidetes, TM7, GN02 (Gracilibacteria), SR1, BH1, OD1 (Parcubacteria), OP11 (Microgenomates). ARCHAEA labels include: Euryarchaeota, Micrarchaea, DSEG (Aenigmarchaea), Nanohaloarchaea, Nanoarchaea, Cren MCG, Thaumarchaeota, Cren C2, Aigarchaeota, Cren pISA7, Cren Thermoprotei, Korarchaeota, pMC2A384 (Diapherotrites).]
[Figure panels:]
archaeal toxins (Nanoarchaea)
lytic murein transglycosylase; murein (peptidoglycan)
stringent response (Diapherotrites, Nanoarchaea)
sigma factor (Diapherotrites, Nanoarchaea)
oxidoreductase
HGT from Eukaryotes (Nanoarchaea)
archaeal type purine synthesis (Microgenomates)
UGA recoded for Gly (Gracilibacteria)
Woyke et al. Nature 2013.
Tanja Woyke
Microbial Dark Matter Part 2
• Ramunas Stepanauskas


• Tanja Woyke


• Jonathan Eisen


• Duane Moser


• Tullis Onstott
MAGs
SFAMs (Sifting Families)
[Workflow diagram, Figure 1, panels A to C. Box labels:]
Representative Genomes
Extract Protein Annotation
All v. All BLAST
Homology Clustering (MCL)
SFams
Align & Build HMMs
HMMs
Screen for Homologs
New Genomes
Extract Protein Annotation
Sharpton et al. 2012. BMC Bioinformatics, 13(1), 264.
PhyEco Markers
Phylogenetic group      Genome Number   Gene Number   Marker Candidates
Archaea                 62              145415        106
Actinobacteria          63              267783        136
Alphaproteobacteria     94              347287        121
Betaproteobacteria      56              266362        311
Gammaproteobacteria     126             483632        118
Deltaproteobacteria     25              102115        206
Epsilonproteobacteria   18              33416         455
Bacteroides             25              71531         286
Chlamydiae              13              13823         560
Chloroflexi             10              33577         323
Cyanobacteria           36              124080        590
Firmicutes              106             312309        87
Spirochaetes            18              38832         176
Thermi                  5               14160         974
Thermotogae             9               17037         684
Wu D, Jospin G, Eisen JA (2013) Systematic Identification of Gene Families
for Use as “Markers” for Phylogenetic and Phylogeny-Driven Ecological
Studies of Bacteria and Archaea and Their Major Subgroups. PLoS ONE
8(10): e77033. doi:10.1371/journal.pone.0077033
Eisen Lab “Topics”
Phylogenomic Methods & Tools
Microbial Phylogenomics & Evolvability
Phylogenomic Resources & Reference Data
Communication & Participation In Microbiology & Science
Research Projects
Eisen Lab
• Rules

Phylogenomic Case Studies: The Benefits (and Occasional Drawbacks) of Integrating Evolutionary and Genomic Studies. Talk by J. Eisen for BIATA 2021

  • 1. Phylogenomic Case Studies: The Benefits (and Occasional Drawbacks) of Integrating Evolutionary and Genomic Studies BIATA 2021 Jonathan A. Eisen University of California, Davis @phylogenomics http://phylogenomics.me
  • 3. Phylogenomics and Evolvability •Mutation •Duplication •Deletion •Rearrangement •Recombination Intrinsic Novelty Origin Evolvability: variation in these processes w/in & between taxa Phylogenomics: integrating genomics & evolution, helps interpret / predict evolvability
  • 6. Eisen Lab Funding • NSF • DOE • Gordon and Betty Moore Foundation • Alfred P. Sloan Foundation • NIH • UC Davis • DARPA • DHS
  • 7. Eisen Lab “Topics” Phylogenomic Methods & Tools Microbial Phylogenomics & Evolvability Phylogenomic Resources & Reference Data Communication & Participation In Microbiology & Science Research Projects
  • 10. RecA Structure & Function I Intrinsic Liu SK, Eisen JA, Hanawalt PC, Tessman IW. 1993. recA mutations that reduce the constitutive coprotease activity of the RecA1202(PrtC) protein: possible involvement of interfilament association in proteolytic and recombination activities. Journal of Bacteriology 175: 6518-6529. PMID: 8407828. PMCID: PMC206762.
  • 11. RecA vs. rRNA Eisen 1995 Journal of Molecular Evolution 41: 1105-1123. More on this later … I Intrinsic
  • 12. RecA From Other Species I Intrinsic
  • 13. RecA Missing From Some Taxa Those taxa without RecA homologs have no homologous recombination, which has major impacts on tempo and modes of evolution I Intrinsic Moran NA, Mira A. The process of genome shrinkage in the obligate symbiont Buchnera aphidicola. Genome Biol. 2001;2(12):RESEARCH0054. doi:10.1186/gb-2001-2-12-research0054
  • 15. The X-Files [Figure panels: Streps; B. subt vs. Staph; M. tb vs. M. leprae (axes: Mycobacterium tuberculosis, Mycobacterium leprae); Pyrococcus; Thermoplasmas; Pseudomonas] I Intrinsic
  • 16. Symmetric Inversion Model [Figure: gene order (genes 1-32) in lineages A1, A2, A3 and B1, B2, B3 descending from the Common Ancestor of A and B, with steps labeled Inversion Around Origin (*) and Inversion Around Terminus (*)] Eisen JA, Heidelberg JF, White O, Salzberg SL. Evidence for symmetric chromosomal inversions around the replication origin in bacteria. Genome Biol. 2000;1(6):RESEARCH0011. doi:10.1186/gb-2000-1-6-research0011 I Intrinsic
  • 21. Wu et al., 2004. Collaboration between Jonathan Eisen and Scott O’Neill (Yale, U. Queensland). Wolbachia pipientis wMel E1 Extrinsic Collaboration with Scott O’Neill and others Wu M, Sun LV, Vamathevan J, et al. Phylogenomics of the reproductive parasite Wolbachia pipientis wMel: a streamlined genome overrun by mobile genetic elements. PLoS Biol. 2004;2(3):E69. doi:10.1371/journal.pbio.0020069
  • 22. Wolbachia Mobile/Repetitive DNA
    Repeat Class | Size (Median) | Copies | Protein motifs/families | IS Family | Possible Terminal Inverted Repeat Sequence
    1 | 1512 | 3 | Transposase | IS4 | 5’ ATACGCGTCAAGTTAAG 3’
    2 | 360 | 12 | - | New | 5’ GGCTTTGTTGCATCGCTA 3’
    3 | 858 | 9 | Transposase | IS492/IS110 | 5’ GGCTTTGTTGCAT 3’
    4 | 1404.5 | 4 | Conserved hypothetical, phage terminase | New | 5’ ATACCGCGAWTSAWTCGCGGTAT 3’
    5 | 1212 | 15 | Transposase | IS3 | 5’ TGACCTTACCCAGAAAAAGTGGAGAGAAAG 3’
    6 | 948 | 13 | Transposase | IS5 | 5’ AGAGGTTGTCCGGAAACAAGTAAA 3’
    7 | 2405.5 | 8 | RT/maturase | - |
    8 | 468 | 45 | - | - |
    9 | 817 | 3 | conserved hypothetical, transposase | ISBt12 |
    10 | 238 | 2 | ExoD | - |
    11 | 225 | 2 | RT/maturase | - |
    12 | 1263 | 4 | Transposase | ??? |
    13 | 572.5 | 2 | Transposase | ??? | None detected
    14 | 433 | 2 | Ankyrin | - |
    15 | 201 | 2 | - | - |
    16 | 1400 | 6 | RT/maturase | - |
    17 | 721 | 2 | transposase | IS630 |
    18 | 1191.5 | 2 | EF-Tu | - |
    19 | 230 | 2 | hypothetical | - |
    E1 Extrinsic Wu M, Sun LV, Vamathevan J, et al. Phylogenomics of the reproductive parasite Wolbachia pipientis wMel: a streamlined genome overrun by mobile genetic elements. PLoS Biol. 2004;2(3):E69. doi:10.1371/journal.pbio.0020069
  • 23. T. roseum mobile motility element Wu et al doi:10.1371/journal.pone.0004207 E1 Extrinsic
  • 25. Host Microbe Stress (HMS) Triangle Host Microbe Stress E2 Extrinsic
  • 26. Host Microbiome Stress Host Microbe Stress (HMS) Triangle E2 Extrinsic
  • 27. Symbiosis Under Stress When organisms are placed under selective pressure or stress where novelty would be beneficial, can we predict which pathway they will use? What leads to interactions / symbioses being a potential solution? Can we manipulate interactions and/or force new ones upon systems? Extrinsic Novelty
  • 29. HMS Type 1: Nutrient Acquisition Host Microbiome Nutrients E2 Extrinsic
  • 30. HMS Type 1: Chemosymbioses Marine Invertebrates Endosymbionts Carbon Colleen Cavanaugh E2 Extrinsic
  • 31. HMS Type 1: Xylem Feeders Glassy Winged Sharpshooter Gut Endosymbionts Trying to Live on Xylem Fluid Nancy Moran Dongying Wu E2 Extrinsic
  • 32. HMS Type 1: Nitrogen Acquisition Oloton Corn Mucilage Microbiome Low N Van Deynze A, Zamora P, Delaux PM, Heitmann C, Jayaraman D, Rajasekar S, Graham D, Maeda J, Gibson D, Schwartz KD, Berry AM, Bhatnagar S, Jospin G, Darling A, Jeannotte R, Lopez J, Weimer BC, Eisen JA, Shapiro HY, Ané JM, Bennett AB. 2018. Nitrogen fixation in a landrace of maize is supported by a mucilage-associated diazotrophic microbiota. PLoS Biology 16(8):e2006352. doi:10.1371/journal.pbio.2006352. PMID: 30086128. PMCID: PMC6080747. E2 Extrinsic
  • 33. HMS Type 2: Pathogens Host Microbiome Pathogen E2 Extrinsic
  • 34. HMS Type 2: Flu & Ducks Ducks Gut Microbiome Flu Walter Boyce Holly Ganz Sarah Hird Ladan Doroud Alana Firl Hird SM, Ganz H, Eisen JA, Boyce WM. 2018. The cloacal microbiome of five wild duck species varies by species and influenza A virus infection status. mSphere 3:e00382-18. https://doi.org/10.1128/mSphere.00382-18 Ganz, H.H., Doroud, L., Firl, A.J., Hird, S.M., Eisen, J.A. and Boyce, W.M., 2017. Community-level differences in the microbiome of healthy wild mallards and those infected by influenza A viruses. mSystems, 2(1), e00188-16. E2 Extrinsic
  • 35. HMS Type 2: Koalas & Chlamydia Koala Gut Microbiome Chlamydia & Antibiotics Katherine Dahlhausen E2 Extrinsic Dahlhausen KE, Jospin G, Coil DA, Eisen JA, Wilkins LGE. Isolation and sequence-based characterization of a koala symbiont: Lonepinella koalarum. PeerJ. 2020;8:e10177. Published 2020 Oct 20. doi:10.7717/peerj.10177 Dahlhausen KE, Doroud L, Firl AJ, Polkinghorne A, Eisen JA. Characterization of shifts of koala (Phascolarctos cinereus) intestinal microbial communities associated with antibiotic treatment. PeerJ. 2018;6:e4452. Published 2018 Mar 12. doi:10.7717/peerj.4452
  • 36. Frogs Skin Microbiome Chytrid Sonia Ghose Marina De León HMS Type 2: Frogs and Chytrids E2 Extrinsic
  • 37. Host Microbiome Changing Environment HMS Type 3: Environmental Change E2 Extrinsic
  • 38. HMS Type 3: Rice Microbiome Rice Root Microbiome Domestication E2 Extrinsic Sundar Lab Srijak Bhatnagar Edwards J, Johnson C, Santos-Medellín C, et al. Structure, variation, and assembly of the root-associated microbiomes of rice. Proc Natl Acad Sci U S A. 2015;112(8):E911-E920. doi:10.1073/pnas.1414592112
  • 39. Seagrass Microbiome Returning to The Sea HMS Type 3: Seagrass Land to Sea Jenna Lang Jessica Green Jay Stachowicz David Coil E2 Extrinsic
  • 40. HMS Type 3: Panamanian Isthmus 1000s of Species Microbiome Rise of Wilkins Bill Wcislo Matt Leray E2 Extrinsic
  • 42. Eisen Lab “Topics” Phylogenomic Methods & Tools Phylogenomic Resources & Reference Data Communication & Participation In Microbiology & Science Research & Evolvability
  • 43. Eisen Lab “Topics” Phylogenomic Methods & Tools Phylogenomic Resources & Reference Data Communication & Evolvability
  • 44. Phylogenomics Methods and Tools Phylogenomic & Evolvability Phylogenomic
  • 46.
  • 48. STAP An Automated Phylogenetic Tree-Based Small Subunit rRNA Taxonomy and Alignment Pipeline (STAP) Dongying Wu1*, Amber Hartman1,6, Naomi Ward4,5, Jonathan A. Eisen1,2,3 1 UC Davis Genome Center, University of California Davis, Davis, California, United States of America, 2 Section of Evolution and Ecology, College of Biological Sciences, University of California Davis, Davis, California, United States of America, 3 Department of Medical Microbiology and Immunology, School of Medicine, University of California Davis, Davis, California, United States of America, 4 Department of Molecular Biology, University of Wyoming, Laramie, Wyoming, United States of America, 5 Center of Marine Biotechnology, Baltimore, Maryland, United States of America, 6 The Johns Hopkins University, Department of Biology, Baltimore, Maryland, United States of America Abstract Comparative analysis of small-subunit ribosomal RNA (ss-rRNA) gene sequences forms the basis for much of what we know about the phylogenetic diversity of both cultured and uncultured microorganisms. As sequencing costs continue to decline and throughput increases, sequences of ss-rRNA genes are being obtained at an ever-increasing rate. This increasing flow of data has opened many new windows into microbial diversity and evolution, and at the same time has created significant methodological challenges. Those processes which commonly require time-consuming human intervention, such as the preparation of multiple sequence alignments, simply cannot keep up with the flood of incoming data. Fully automated methods of analysis are needed. Notably, existing automated methods avoid one or more steps that, though computationally costly or difficult, we consider to be important. In particular, we regard both the building of multiple sequence alignments and the performance of high quality phylogenetic analysis to be necessary. We describe here our fully-automated ss-rRNA taxonomy and alignment pipeline (STAP). 
It generates both high-quality multiple sequence alignments and phylogenetic trees, and thus can be used for multiple purposes including phylogenetically-based taxonomic assignments and analysis of species diversity in environmental samples. The pipeline combines publicly-available packages (PHYML, BLASTN and CLUSTALW) with our automatic alignment, masking, and tree-parsing programs. Most importantly, this automated process yields results comparable to those achievable by manual analysis, yet offers speed and capacity that are unattainable by manual efforts. Citation: Wu D, Hartman A, Ward N, Eisen JA (2008) An Automated Phylogenetic Tree-Based Small Subunit rRNA Taxonomy and Alignment Pipeline (STAP). PLoS ONE 3(7): e2566. doi:10.1371/journal.pone.0002566 [...] multiple alignment and phylogeny was deemed unfeasible. However, this we believe can compromise the value of the results. For example, the delineation of OTUs has also been automated via tools that do not make use of alignments or phylogenetic trees (e.g., Greengenes). This is usually done by carrying out pairwise comparisons of sequences and then clustering of sequences that have better than some cutoff threshold of similarity with each other. This approach can be powerful (and reasonably efficient) but it too has limitations. In particular, since multiple sequence alignments are not used, one cannot carry out standard phylogenetic analyses. In addition, without multiple sequence alignments one might end up comparing and contrasting different regions of a sequence depending on what it is paired with. The limitations of avoiding multiple sequence alignments and phylogenetic analysis are readily apparent in tools to classify sequences. For example, the Ribosomal Database Project’s Classifier program [29] focuses on composition characteristics of each sequence (e.g., oligonucleotide frequency) and assigns taxonomy based upon clustering genes by their composition. 
Though this is fast and completely automatable, it can be misled in cases where distantly related sequences have converged on similar composition, something known to be a major problem in ss-rRNA sequences [30]. Other taxonomy assignment systems focus primarily on the similarity of sequences. The simplest of these is to use BLASTN to search a sequence database (e.g., Genbank) and to then use information about the top match to assign some sort of classification [...] tools it does have some limitations. For example, the generation of new alignments for each sequence is both computationally costly, and does not take advantage of available curated alignments that make use of ss-rRNA secondary structure to guide the primary sequence alignment. Perhaps most importantly however is that the tool is not fully automated. In addition, it does not generate multiple sequence alignments for all sequences in a dataset which would be necessary for doing many analyses. Automated methods for analyzing rRNA sequences are also available at the web sites for multiple rRNA centric databases, such as Greengenes and the Ribosomal Database Project (RDPII). Though these and other web sites offer diverse powerful tools, they do have some limitations. For example, not all provide multiple sequence alignments as output and few use phylogenetic approaches for taxonomy assignments or other analyses. More importantly, all provide only web-based interfaces and their integrated software (e.g., alignment and taxonomy assignment) cannot be locally installed by the user. Therefore, the user cannot take advantage of the speed and computing power of parallel processing such as is available on linux clusters, or locally alter and potentially tailor these programs to their individual computing needs (Table 1). Given the limited automated tools that are available, researchers have had to choose between two non-ideal options: manually generating and/or curating alignments (an expensive [...]
    Table 1. Comparison of STAP’s computational abilities relative to existing commonly-used ss-rRNA analysis tools.
    Feature | STAP | ARB | Greengenes | RDP
    Installed where? | Locally | Locally | Web only | Web only
    User interface | Command line | GUI | Web portal | Web portal
    Parallel processing | YES | NO | NO | NO
    Manual curation for taxonomy assignment | NO | YES | NO | NO
    Manual curation for alignment | NO | YES | NO* | NO
    Open source | YES** | NO | NO | NO
    Processing speed | Fast | Slow | Medium | Medium
    It is important to note that STAP is the only software that runs on the command line and can take advantage of parallel processing on linux clusters and, further, is more amenable to downstream code manipulation. * Note: Greengenes alignment output is compatible with upload into ARB and downstream manual alignment. ** The STAP program itself is open source, the programs it depends on are freely available but not open source. doi:10.1371/journal.pone.0002566.t001
    Figure 1. A flow chart of the STAP pipeline. doi:10.1371/journal.pone.0002566.g001
ss-rRNA Taxonomy Pipeline [...] STAP database, and the query sequence is aligned to them using the CLUSTALW profile alignment algorithm [40] as described above for domain assignment. By adapting the profile alignment algorithm, the alignments from the STAP database remain intact, while gaps are inserted and nucleotides are trimmed for the query sequence according to the profile defined by the previous alignments from the databases. Thus the accuracy and quality of the alignment generated at this step depends heavily on the quality of the Bacterial/Archaeal ss-rRNA alignments from the Greengenes project or the Eukaryotic ss-rRNA alignments from the RDPII project. 
Phylogenetic analysis using multiple sequence alignments rests on the assumption that the residues (nucleotides or amino acids) at the same position in every sequence in the alignment are homologous. Thus, columns in the alignment for which “positional homology” cannot be robustly determined must be excluded from subsequent analyses. This process of evaluating homology and eliminating questionable columns, known as masking, typically requires time-consuming, skillful, human intervention. We designed an automated masking method for ss-rRNA alignments, thus eliminating this bottleneck in high-throughput processing. First, an alignment score is calculated for each aligned column by a method similar to that used in the CLUSTALX package [42]. Specifically, an R-dimensional sequence space representing all the possible nucleotide character states is defined. Then for each aligned column, the nucleotide populating that column in each of the aligned sequences is assigned a score in each of the R dimensions (Sr) according to the IUB matrix [42]. The consensus “nucleotide” for each column (X) also has R dimensions, with the score for each dimension (Xr) calculated as the average of the scores for that column in that dimension (average of Sr). Thus the score of the consensus nucleotide is a mathematical expression describing the average “nucleotide” in that column for that [...] Figure 2. Domain assignment. In Step 1, STAP assigns a domain to each query sequence based on its position in a maximum likelihood tree of representative ss-rRNA sequences. Because the tree illustrated here is not rooted, domain assignment would not be accurate and reliable (sequence similarity based methods cannot make an accurate assignment in this case either). However the figure illustrates an important role of the tree-based domain assignment step, namely automatic identification of deep-branching environmental ss-rRNAs.
Wu D, Hartman A, Ward N, Eisen JA. An automated phylogenetic tree-based small subunit rRNA taxonomy and alignment pipeline (STAP) [published correction appears in PLoS ONE. 2008;3(7). doi:10.1371/annotation/c1aa88dd-4360-4902-8599-4d7edca79817]. PLoS One. 2008;3(7):e2566. Published 2008 Jul 2. doi:10.1371/journal.pone.0002566
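The column-scoring idea in the STAP masking excerpt above can be pictured with a small sketch. This is an illustration only, not STAP's code: the 4-dimensional one-hot `ENCODING` is an assumed stand-in for the IUB matrix, and both the use of mean Euclidean distance from the consensus and the `max_dist` cutoff are hypothetical choices, since the excerpt breaks off before the actual scoring formula.

```python
import math

# Assumed stand-in for the IUB matrix: each base maps to a point in a
# 4-dimensional (A, C, G, T) space. Gaps and ambiguity codes are not
# modeled; they fall back to an uninformative vector.
ENCODING = {
    "A": (1.0, 0.0, 0.0, 0.0),
    "C": (0.0, 1.0, 0.0, 0.0),
    "G": (0.0, 0.0, 1.0, 0.0),
    "T": (0.0, 0.0, 0.0, 1.0),
}
UNKNOWN = (0.25, 0.25, 0.25, 0.25)


def column_consensus(column):
    """Average the per-dimension scores (Sr) to get the consensus (Xr)."""
    vectors = [ENCODING.get(base.upper(), UNKNOWN) for base in column]
    n = len(vectors)
    return tuple(sum(v[r] for v in vectors) / n for r in range(4))


def column_distance(column):
    """Mean Euclidean distance of each residue from the column consensus."""
    consensus = column_consensus(column)
    vectors = [ENCODING.get(base.upper(), UNKNOWN) for base in column]
    return sum(math.dist(v, consensus) for v in vectors) / len(vectors)


def mask_alignment(sequences, max_dist=0.5):
    """Keep only the columns whose mean distance to consensus is <= max_dist."""
    columns = list(zip(*sequences))
    keep = [i for i, col in enumerate(columns) if column_distance(col) <= max_dist]
    return ["".join(seq[i] for i in keep) for seq in sequences]
```

For example, `mask_alignment(["ACGT", "ACGA", "ACGC", "ACGG"])` drops the fourth column, where every sequence differs, and keeps the three conserved ones.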
  • 49. Venter et al., Science 304: 66. 2004 STAP for Sargasso Metagenome
  • 50. WATERS Hartman et al. BMC Bioinformatics 2010, 11:317 http://www.biomedcentral.com/1471-2105/11/317 Open Access SOFTWARE © 2010 Hartman et al; licensee BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in [...] Software Introducing W.A.T.E.R.S.: a Workflow for the Alignment, Taxonomy, and Ecology of Ribosomal Sequences Amber L Hartman†1,3, Sean Riddle†2, Timothy McPhillips2, Bertram Ludäscher2 and Jonathan A Eisen*1 Abstract Background: For more than two decades microbiologists have used a highly conserved microbial gene as a phylogenetic marker for bacteria and archaea. The small-subunit ribosomal RNA gene, also known as 16S rRNA, is encoded by ribosomal DNA, 16S rDNA, and has provided a powerful comparative tool to microbial ecologists. Over time, the microbial ecology field has matured from small-scale studies in a select number of environments to massive collections of sequence data that are paired with dozens of corresponding collection variables. As the complexity of data and tool sets have grown, the need for flexible automation and maintenance of the core processes of 16S rDNA sequence analysis has increased correspondingly. Results: We present WATERS, an integrated approach for 16S rDNA analysis that bundles a suite of publicly available 16S rDNA analysis software tools into a single software package. The "toolkit" includes sequence alignment, chimera removal, OTU determination, taxonomy assignment, phylogenetic tree construction as well as a host of ecological analysis and visualization tools. WATERS employs a flexible, collection-oriented 'workflow' approach using the open-source Kepler system as a platform. 
Conclusions: By packaging available software tools into a single automated workflow, WATERS simplifies 16S rDNA analyses, especially for those without specialized bioinformatics, programming expertise. In addition, WATERS, like some of the newer comprehensive rRNA analysis tools, allows researchers to minimize the time dedicated to carrying out tedious informatics steps and to focus their attention instead on the biological interpretation of the results. One advantage of WATERS over other comprehensive tools is that the use of the Kepler workflow system facilitates result interpretation and reproducibility via a data provenance sub-system. Furthermore, new "actors" can be added to the workflow as desired and we see WATERS as an initial seed for a sizeable and growing repository of interoperable, easy-to-combine tools for asking increasingly complex microbial ecology questions. Background Microbial communities and how they are surveyed Microbial communities abound in nature and are crucial for the success and diversity of ecosystems. There is no end in sight to the number of biological questions that can be asked about microbial diversity on earth. From animal and human guts to open ocean surfaces and deep sea hydrothermal vents, to anaerobic mud swamps or boiling thermal pools, to the tops of the rainforest canopy and the frozen Antarctic tundra, the composition of microbial communities is a source of natural history, intellectual curiosity, and reservoir of environmental health [1]. Microbial communities are also mediators of insight into global warming processes [2,3], agricultural success [4], pathogenicity [5,6], and even human obesity [7,8]. In the mid-1980s, researchers began to sequence ribosomal RNAs from environmental samples in order to characterize the types of microbes present in those samples, (e.g., [9,10]). 
This general approach was revolutionized by the invention of the polymerase chain reaction (PCR), which made it relatively easy to clone and then [...] * Correspondence: jaeisen@ucdavis.edu 1 Department of Medical Microbiology and Immunology and the Department of Evolution and Ecology, Genome Center, University of California Davis, One Shields Avenue, Davis, CA, 95616, USA † Contributed equally Full list of author information is available at the end of the article [...] Figure 1 Overview of WATERS. Schema of WATERS where white boxes indicate "behind the scenes" analyses that are performed in [...] [Figure labels: Align; Check chimeras; Cluster; Build Tree; Assign Taxonomy; Tree w/ Taxonomy; Diversity statistics & graphs; Unifrac files; Cytoscape network; OTU table] Motivations As outlined above, successfully processing microbial sequence collections is far from trivial. Each step is complex and usually requires significant bioinformatics expertise and time investment prior to the biological interpretation. In order to both increase efficiency and ensure that all best-practice tools are easily usable, we sought to create an "all-inclusive" method for performing all of these bioinformatics steps together in one package. To this end, we have built an automated, user-friendly, workflow-based system called WATERS: a Workflow for the Alignment, Taxonomy, and Ecology of Ribosomal Sequences (Fig. 1). In addition to being automated and simple to use, because WATERS is executed in the Kepler scientific workflow system (Fig. 2) it also has the advantage that it keeps track of the data lineage and provenance of data products [23,24].
Automation The primary motivation in building WATERS was to minimize the technical, bioinformatics challenges that arise when performing DNA sequence clustering, phylogenetic tree, and statistical analyses by automating the 16S rDNA analysis workflow. We also hoped to exploit additional features that workflow-based approaches entail, such as optimized execution and data lineage tracking and browsing [23,25-27]. In the earlier days of 16S rDNA analysis, simply knowing which microbes were present and whether they were biologically novel was a noteworthy achievement. It was reasonable and expected, therefore, to invest a large amount of time and effort to get to that list of microbes. But now that current efforts are significantly more advanced and often require comparison of dozens of factors and variables with datasets of thousands of sequences, it is not practically feasible to process these large collections "by hand", and hugely inefficient if instead automated methods can be successfully employed. Broadening the user base A second motivation and perspective is that by minimizing the technical difficulty of 16S rDNA analysis through the use of WATERS, we aim to make the analysis of these datasets more widely available and allow individuals with [...] Figure 2 Screenshot of WATERS in Kepler software. Key features: the library of actors un-collapsed and displayed on the left-hand side, the input and output paths where the user declares the location of their input files and desired location for the results files. Each green box is an individual Kepler actor that performs a single action on the data stream. The connectors (black arrows) direct and hook up the actors in a defined sequence. Double-clicking on any actor or connector allows it to be manipulated and re-arranged.
[...] default is 97% and 99%), and they are also generated for every metadata variable comparison that the user includes. Data pruning To assist in troubleshooting and quality control, WATERS returns to the user three fasta files of sequences that were removed at various steps in the workflow. A short_sequences.fas file is created that contains all [...] Figure 3 Biologically similar results automatically produced by WATERS on published colonic microbiota samples. (A) Rarefaction curves similar to curves shown in Eckburg et al. Fig. 2; 70-72, indicate patient numbers, i.e., 3 different individuals. (B) Weighted Unifrac analysis based on phylogenetic tree and OTU data produced by WATERS very similar to Eckburg et al. Fig. 3B. (C) Neighbor-joining phylogenetic tree (Quicktree) representing the sequences analyzed by WATERS, which is clearly similar to Fig. S1 in Eckburg et al. Hartman AL, Riddle S, McPhillips T, Ludäscher B, Eisen JA. Introducing W.A.T.E.R.S.: a workflow for the alignment, taxonomy, and ecology of ribosomal sequences. BMC Bioinformatics. 2010;11:317. Published 2010 Jun 12. doi:10.1186/1471-2105-11-317
  • 51. [...] alignment used to build the profile, resulting in a multiple sequence alignment of full-length reference sequences and metagenomic reads. The final step of the alignment process is a quality control filter that 1) ensures that only homologous SSU-rRNA sequences from the appropriate phylogenetic domain are included in the final alignment, and 2) masks highly gapped alignment columns (see Text S1). We use this high quality alignment of metagenomic reads and references sequences to construct a fully-resolved, phylogenetic tree and hence determine the evolutionary relationships between the reads. Reference sequences are included in this stage of the analysis to guide the phylogenetic assignment of the relatively short metagenomic reads. While the software can be easily extended to incorporate a number of different phylogenetic tools capable of analyzing metagenomic data (e.g., RAxML [27], pplacer [28], etc.), PhylOTU currently employs FastTree as a default method due to its relatively high speed-to-performance [...] PD versus PID clustering, 2) to explore overlap between PhylOTU clusters and recognized taxonomic designations, and 3) to quantify the accuracy of PhylOTU clusters from shotgun reads relative to those obtained from full-length sequences. PhylOTU Clusters Recapitulate PID Clusters We sought to identify how PD-based clustering compares to commonly employed PID-based clustering methods by applying the two methods to the same set of sequences. Both PID-based clustering and PhylOTU may be used to identify OTUs from overlapping sequences. Therefore we applied both methods to a dataset of 508 full-length bacterial SSU-rRNA sequences (reference sequences; see above) obtained from the Ribosomal Database Project (RDP) [25]. Recent work has demonstrated that PID is more accurately calculated from pairwise alignments than multiple sequence alignments [32–33], so we used ESPRIT, which [...] Figure 1. PhylOTU Workflow. Computational processes are represented as squares and databases are represented as cylinders in this generalized workflow of PhylOTU. See Results section for details. doi:10.1371/journal.pcbi.1001061.g001 Finding Metagenomic OTUs Sharpton TJ, Riesenfeld SJ, Kembel SW, Ladau J, O'Dwyer JP, Green JL, Eisen JA, Pollard KS. (2011) PhylOTU: A High-Throughput Procedure Quantifies Microbial Community Diversity and Resolves Novel Taxa from Metagenomic Data. PLoS Comput Biol 7(1): e1001061. doi:10.1371/journal.pcbi.1001061 OTUs via Phylogeny (PhylOTU) Tom Sharpton Katie Pollard Jessica Green
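To make the percent-identity (PID) side of the PD versus PID comparison concrete, here is a minimal sketch of greedy PID clustering. It is not ESPRIT or PhylOTU: the function names are hypothetical, the sequences are assumed to be already aligned to equal length (with "-" for gaps), and the greedy seed-based strategy and 97% default cutoff are illustrative choices.

```python
def percent_identity(a, b):
    """Percent identity over aligned columns where neither sequence has a gap."""
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not pairs:
        return 0.0
    matches = sum(1 for x, y in pairs if x == y)
    return 100.0 * matches / len(pairs)


def greedy_pid_clusters(seqs, cutoff=97.0):
    """Group sequences into OTUs by similarity to the first member of each cluster.

    seqs: dict mapping sequence name -> aligned sequence string.
    Returns a list of clusters, each a list of sequence names.
    """
    clusters = []  # list of (seed_sequence, member_names)
    for name, seq in seqs.items():
        for seed, members in clusters:
            if percent_identity(seq, seed) >= cutoff:
                members.append(name)
                break
        else:
            # No existing seed is similar enough, so this sequence seeds a new OTU.
            clusters.append((seq, [name]))
    return [members for _, members in clusters]
```

A phylogenetic-distance (PD) approach would instead cluster on distances measured along the branches of a tree built from the alignment; that step is not sketched here.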
  • 52. rRNA Copy # vs. Phylogeny Steven Kembel Jessica Green Martin Wu Kembel SW, Wu M, Eisen JA, Green JL (2012) Incorporating 16S Gene Copy Number Information Improves Estimates of Microbial Diversity and Abundance. PLoS Comput Biol 8(10): e1002743. doi:10.1371/journal.pcbi.1002743
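The general idea named in the title of Kembel et al. can be pictured with a small sketch: divide each taxon's 16S read count by its estimated gene copy number, then renormalize. The function name and inputs below are hypothetical, and how copy numbers are estimated (the slide title points to phylogeny) is not reproduced here.

```python
def copy_number_corrected_abundance(read_counts, copy_numbers):
    """Convert 16S read counts into estimated relative cell abundances.

    read_counts: dict mapping taxon -> number of 16S reads observed.
    copy_numbers: dict mapping taxon -> estimated 16S gene copies per genome.
    """
    # Reads per gene copy approximates the number of cells sampled per taxon.
    cells = {taxon: count / copy_numbers[taxon] for taxon, count in read_counts.items()}
    total = sum(cells.values())
    return {taxon: value / total for taxon, value in cells.items()}
```

With equal read counts for two taxa, a taxon carrying four 16S copies per genome ends up with a quarter of the estimated cell count of a taxon carrying one copy.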
  • 54.
  • 55. RecA vs. rRNA Eisen 1995 Journal of Molecular Evolution 41: 1105-1123.
  • 56. RecA From Other Species
  • 57. RecA from Environment? Eisen 1995 Journal of Molecular Evolution 41: 1105-1123.
  • 59. Venter et al., Science 304: 66. 2004 RecA Phylotyping - Sargasso Metagenome
  • 60. GOS 1 GOS 2 GOS 3 GOS 4 GOS 5 Phylogenetic ID of Novel Lineages Wu et al PLoS One 2011
  • 61. [Figure labels: Metagenomics; DNA; RecA; RpoB; Rpl4; rRNA; Hsp70; EFTu] http://genomebiology.com/2008/9/10/R151 Genome Biology 2008, Volume 9, Issue 10, Article R151 Wu and Eisen R151.7 Genome Biology 2008, 9:R151 [...] sequences are not conserved at the nucleotide level [29]. As a result, the nr database does not actually contain many more protein marker sequences that can be used as references than those available from complete genome sequences. Comparison of phylogeny-based and similarity-based phylotyping Although our phylogeny-based phylotyping is fully automated, it still requires many more steps than, and is slower than, similarity based phylotyping methods such as MEGAN [30]. Is it worth the trouble? Similarity based phylotyping works by searching a query sequence against a reference database such as NCBI nr and deriving taxonomic information from the best matches or 'hits'. When species that are closely related to the query sequence exist in the reference database, similarity-based phylotyping can work well. However, if the reference database is a biased sample or if it contains no closely related species to the query, then the top hits returned could be misleading [31]. Furthermore, similarity-based methods require an arbitrary similarity cut-off value to define the top hits. Because individual bacterial genomes and proteins can evolve at very different rates, a universal cut-off that works under all conditions does not exist. As a result, the final results can be very subjective. In contrast, our tree-based bracketing algorithm places the query sequence within the context of a phylogenetic tree and only assigns it to a taxonomic level if that level has adequate sampling (see Materials and methods [below] for details of the algorithm). 
With the well sampled species Prochlorococ- cus marinus, for example, our method can distinguish closely related organisms and make taxonomic identifications at the species level. Our reanalysis of the Sargasso Sea data placed 672 sequences (3.6% of the total) within a P. marinus clade. On the other hand, for sparsely sampled clades such as Aquifex, assignments will be made only at the phylum level. Thus, our phylogeny-based analysis is less susceptible to data sampling bias than a similarity based approach, and it makes Major phylotypes identified in Sargasso Sea metagenomic data Figure 3 Major phylotypes identified in Sargasso Sea metagenomic data. The metagenomic data previously obtained from the Sargasso Sea was reanalyzed using AMPHORA and the 31 protein phylogenetic markers. The microbial diversity profiles obtained from individual markers are remarkably consistent. The breakdown of the phylotyping assignments by markers and major taxonomic groups is listed in Additional data file 5. 0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 A l p h a p r o t e o b a c t e r i a B e t a p r o t e o b a c t e r i a G a m m a p r o t e o b a c t e r i a D e l t a p r o t e o b a c t e r i a E p s i l o n p r o t e o b a c t e r i a U n c l a s s i f i e d p r o t e o b a c t e r i a B a c t e r o i d e t e s C h l a m y d i a e C y a n o b a c t e r i a A c i d o b a c t e r i a T h e r m o t o g a e F u s o b a c t e r i a A c t i n o b a c t e r i a A q u i f i c a e P l a n c t o m y c e t e s S p i r o c h a e t e s F i r m i c u t e s C h l o r o f l e x i C h l o r o b i U n c l a s s i f i e d b a c t e r i a dnaG frr infC nusA pgk pyrG rplA rplB rplC rplD rplE rplF rplK rplL rplM rplN rplP rplS rplT rpmA rpoB rpsB rpsC rpsE rpsI rpsJ rpsK rpsM rpsS smpB tsf Relative abundance Many other genes better than rRNA
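The excerpt's point about sampling density can be illustrated with a toy sketch. This is not AMPHORA's tree-based bracketing algorithm; it only assumes that the query's placement is "bracketed" by reference lineages on either side, and it assigns the most specific rank those lineages share. The three-rank ladder, the function names, and the example lineages are simplifications chosen for illustration.

```python
# Toy illustration only: NOT the AMPHORA algorithm, just the idea that a
# query is assigned the most specific rank shared by the references that
# bracket its position in the tree.
RANKS = ["phylum", "genus", "species"]  # simplified rank ladder (assumption)


def shared_lineage(lineages):
    """Return the longest common prefix of several rank-ordered lineages."""
    shared = []
    for names in zip(*lineages):
        if len(set(names)) != 1:
            break
        shared.append(names[0])
    return shared


def bracket_assign(sister_lineages, outgroup_lineages):
    """Assign (rank, taxon) from the lineages bracketing the query, or None."""
    shared = shared_lineage(sister_lineages + outgroup_lineages)
    if not shared:
        return None
    return RANKS[len(shared) - 1], shared[-1]


# Densely sampled clade: both brackets are Prochlorococcus marinus references.
pro = ("Cyanobacteria", "Prochlorococcus", "Prochlorococcus marinus")
dense = bracket_assign([pro], [pro])

# Sparsely sampled clade: the brackets only agree at the phylum level.
aquifex = ("Aquificae", "Aquifex", "Aquifex aeolicus")
other = ("Aquificae", "Sulfurihydrogenibium", "Sulfurihydrogenibium azorense")
sparse = bracket_assign([aquifex], [other])

print(dense)
print(sparse)
```

With dense sampling the brackets agree down to species, while sparse sampling pushes the assignment up to phylum, which mirrors the Prochlorococcus and Aquifex cases in the excerpt.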
  • 62. Sargasso Phylotypes. RecA Phylotyping - Sargasso Metagenome. [Figure: weighted % of clones (0 to 0.5) by major phylogenetic group (Alphaproteobacteria, Betaproteobacteria, Gammaproteobacteria, Epsilonproteobacteria, Deltaproteobacteria, Cyanobacteria, Firmicutes, Actinobacteria, Chlorobi, CFB, Chloroflexi, Spirochaetes, Fusobacteria, Deinococcus-Thermus, Euryarchaeota, Crenarchaeota) for markers EFG, EFTu, HSP70, RecA, RpoB, rRNA.] Venter et al., Science 304: 66. 2004
  • 63. AMPHORA. Martin Wu. Wu M, Eisen JA. A simple, fast, and accurate method of phylogenomic inference. Genome Biol. 2008;9(10):R151. Published 2008 Oct 13. doi:10.1186/gb-2008-9-10-r151
  • 64. AMPHORA Phylotyping w/ Protein Markers. [Figure 3 of Wu and Eisen 2008 (Genome Biology 9:R151, p. R151.7; http://genomebiology.com/2008/9/10/R151): major phylotypes identified in Sargasso Sea metagenomic data, reanalyzed using AMPHORA and the 31 protein phylogenetic markers; relative abundance (0 to 0.8) by major taxonomic group for each marker. The microbial diversity profiles obtained from individual markers are remarkably consistent.] Martin Wu. Wu M, Eisen JA. A simple, fast, and accurate method of phylogenomic inference. Genome Biol. 2008;9(10):R151. Published 2008 Oct 13. doi:10.1186/gb-2008-9-10-r151
  • 65. Phylosift - Bayesian Phylotyping. [Workflow diagram labels: Input Sequences (each input sequence scanned against both workflows); rRNA workflow; protein workflow; LAST fast candidate search (search input against references); <600 bp / >600 bp; hmmalign multiple alignment; Infernal multiple alignment; profile HMMs used to align candidates to reference alignment; pplacer phylogenetic placement; parallel option; Taxonomic Summaries; Sample Analysis & Comparison; Krona plots, number of reads placed for each marker gene; Edge PCA, tree visualization, Bayes factor tests.] Aaron Darling, Erik Matsen, Holly Bik, Guillaume Jospin, Erik Lowe. Darling AE, Jospin G, Lowe E, Matsen FA IV, Bik HM, Eisen JA. (2014) PhyloSift: phylogenetic analysis of genomes and metagenomes. PeerJ 2:e243 http://dx.doi.org/10.7717/peerj.243
  • 66. PD from Metagenomes. Kembel SW, Eisen JA, Pollard KS, Green JL (2011) The Phylogenetic Diversity of Metagenomes. PLoS ONE 6(8): e23214. doi:10.1371/journal.pone.0023214. Jessica Green, Steven Kembel, Katie Pollard
 Excerpt shown on slide (cropped at the right margin): [...] typically used as a qualitative measure because duplicate sequences are usually removed from the tree. However, the test may be used in a semiquantitative manner if all clones, even those with identical or near-identical sequences, are included in the tree (13). Here we describe a quantitative version of UniFrac that we call “weighted UniFrac.” We show that weighted UniFrac behaves similarly to the FST test in situations where both a[...]
 FIG. 1. Calculation of the unweighted and the weighted UniFrac measures. Squares and circles represent sequences from two different environments. (a) In unweighted UniFrac, the distance between the circle and square communities is calculated as the fraction of the branch length that has descendants from either the square or the circle environment (black) but not both (gray). (b) In weighted UniFrac, branch lengths are weighted by the relative abundance of sequences in the square and circle communities; square sequences are weighted twice as much as circle sequences because there are twice as many total circle sequences in the data set. The width of branches is proportional to the degree to which each branch is weighted in the calculations, and gray branches have no weight. Branches 1 and 2 have heavy weights since the descendants are biased toward the square and circles, respectively. Branch 3 contributes no value since it has an equal contribution from circle and square sequences after normalization.
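The two measures described in the FIG. 1 caption can be sketched in a few lines. This is a minimal illustration, assuming each branch is already summarized as a hypothetical tuple `(length, n_square, n_circle)` giving its length and the number of descendant sequences from each environment; the weighted version here is the raw (unnormalized) sum, and the toy tree is invented for the example.

```python
def unweighted_unifrac(branches):
    """Fraction of observed branch length leading to only one environment."""
    unique = sum(length for length, a, b in branches if (a > 0) != (b > 0))
    observed = sum(length for length, a, b in branches if a > 0 or b > 0)
    return unique / observed


def weighted_unifrac(branches, total_a, total_b):
    """Raw weighted UniFrac: branch length times the difference in the
    relative abundance of descendants from each environment."""
    return sum(length * abs(a / total_a - b / total_b)
               for length, a, b in branches)


# Toy tree with 2 square and 1 circle sequences (hypothetical data):
#   internal branch X (length 1.0) -> one square leaf (0.5), one circle leaf (0.5)
#   a second square leaf attached at the root (length 2.0)
branches = [
    (1.0, 1, 1),  # X: descendants from both environments
    (0.5, 1, 0),  # square leaf under X
    (0.5, 0, 1),  # circle leaf under X
    (2.0, 1, 0),  # square leaf at the root
]

unweighted = unweighted_unifrac(branches)                      # 3.0 / 4.0
weighted = weighted_unifrac(branches, total_a=2, total_b=1)    # 0.5+0.25+0.5+1.0

print(unweighted, weighted)
```

A branch whose descendants are split in proportion to the community sizes (for example 2 squares and 1 circle when the totals are 2 and 1) contributes nothing to the weighted sum, which is the "Branch 3" case in the caption.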
  • 67. Zorro - Automated Masking. [Figure, panels A-D: distance to true tree (NJ and ML) versus sequence length (200, 400, 800, 1600, 3200) for no masking, zorro, and gblocks.] Wu M, Chatterji S, Eisen JA (2012) Accounting For Alignment Uncertainty in Phylogenomics. PLoS ONE 7(1): e30288. doi:10.1371/journal.pone.0030288
  • 68. Tools: Phylogenomic Functional Prediction
  • 69. Tools: Phylogenomic Functional Prediction. We need to be able to predict functions well from sequence data.
  • 71. Helicobacter pylori genome 1997 “The ability of H. pylori to perform mismatch repair is suggested by the presence of methyl transferases, mutS and uvrD. However, orthologues of MutH and MutL were not identified.”
  • 73. Blast Search of H. pylori “MutS”
 Sequences producing significant alignments          Score (bits)   E Value
 sp|P73625|MUTS_SYNY3 DNA MISMATCH REPAIR PROTEIN    117            3e-25
 sp|P74926|MUTS_THEMA DNA MISMATCH REPAIR PROTEIN     69            1e-10
 sp|P44834|MUTS_HAEIN DNA MISMATCH REPAIR PROTEIN     64            3e-09
 sp|P10339|MUTS_SALTY DNA MISMATCH REPAIR PROTEIN     62            2e-08
 sp|O66652|MUTS_AQUAE DNA MISMATCH REPAIR PROTEIN     57            4e-07
 sp|P23909|MUTS_ECOLI DNA MISMATCH REPAIR PROTEIN     57            4e-07
 Blast search pulls up Syn. sp MutS#2 with much higher p value than other MutS homologs. Based on this TIGR predicted this species had mismatch repair. Based on Eisen et al. 1997 Nature Medicine 3: 1076-1078.
  • 75. Overlaying Functions onto Tree. [Figure: MutS family gene tree with clades labeled MSH4, MSH5, MutS2, MutS1, MSH1, MSH3, MSH6, MSH2; leaves are species abbreviations including Aquae, Trepa, Rat, Fly, Xenla, Mouse, Human, Yeast, Neucr, Arath, Borbu, Synsp, Neigo, Thema, Strpy, Bacsu, Ecoli, Theaq, Deira, Chltr, Spombe, Celeg, Metth, Helpy, mSaco.] Based on Eisen, 1998 Nucl Acids Res 26: 4291-4300.
  • 76. High Mutation Rate in H. pylori Blast search pulls up Syn. sp MutS#2 with much higher p value than other MutS homologs Based on this TIGR predicted this species had mismatch repair Based on Eisen et al. 1997 Nature Medicine 3: 1076-1078.
  • 77. PHYLOGENETIC PREDICTION OF GENE FUNCTION. METHOD: CHOOSE GENE(S) OF INTEREST; IDENTIFY HOMOLOGS; ALIGN SEQUENCES; CALCULATE GENE TREE; OVERLAY KNOWN FUNCTIONS ONTO TREE; INFER LIKELY FUNCTION OF GENE(S) OF INTEREST. [Figure: EXAMPLE A and EXAMPLE B, each with genes 1A, 2A, 3A, 1B, 2B, 3B (and genes 1-6) from Species 1, Species 2, and Species 3; ACTUAL EVOLUTION (ASSUMED TO BE UNKNOWN) shown with duplication events; inferred nodes marked "Duplication?" and one prediction marked "Ambiguous".] Phylogenomics. Based on Eisen, 1998 Genome Res 8: 163-167.
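The last two steps of the method (overlay known functions onto the tree, then infer the likely function of the gene of interest) can be sketched as follows. This is a deliberately simplified stand-in, not the procedure from Eisen 1998: it assumes the gene tree is given as nested tuples of leaf names, climbs outward from the query, and stops at the first clade that contains characterized genes. The tree, gene names, and function labels are hypothetical.

```python
def leaves(tree):
    """List the leaf names of a nested-tuple tree."""
    if isinstance(tree, str):
        return [tree]
    names = []
    for child in tree:
        names.extend(leaves(child))
    return names


def clades_containing(tree, query):
    """Return the clades that contain `query`, innermost first."""
    path = []
    node = tree
    while not isinstance(node, str):
        path.append(node)
        node = next(child for child in node if query in leaves(child))
    return list(reversed(path))


def predict_function(tree, query, known):
    """Predict from the smallest enclosing clade with characterized genes."""
    for clade in clades_containing(tree, query):
        functions = {known[name] for name in leaves(clade)
                     if name in known and name != query}
        if len(functions) == 1:
            return functions.pop()
        if len(functions) > 1:
            return "ambiguous"
    return "unknown"


# Two paralogous subfamilies (A and B) across three species, as on the slide.
gene_tree = ((("1A", "2A"), "3A"), (("1B", "2B"), "3B"))
known = {"1A": "function A", "3A": "function A", "1B": "function B"}

print(predict_function(gene_tree, "2A", known))
print(predict_function(gene_tree, "2B", known))
print(predict_function(gene_tree, "2A", {}))
```

Because this sketch keeps climbing until it finds any characterized gene, it can carry a function across a duplication node; a fuller treatment would stop at inferred duplications rather than borrow a paralog's function.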
  • 78. Phylotyping. Eisen et al. 1992. J. Bact. 174: 3416
  • 79. Proteorhodopsin Phylogenomics Venter et al., Science 304: 66. 2004
  • 81. Limitations of Phylogenetic Prediction of Function • Still imperfectly automated • Each gene family different • Each function different • In some cases, function does not track with phylogeny well • Does not work when NO members of a gene family have been characterized
  • 83. • Thermophile (grows at 80°C) • Anaerobic • Grows very efficiently on CO (Carbon Monoxide) • Produces hydrogen gas • Low GC Gram positive (Firmicute) • Genome Determined (Wu et al. 2005 PLoS Genetics 1: e65). Martin Wu, Frank Robb
  • 84. Homologs of Sporulation Genes Wu et al. 2005 PLoS Genetics 1: e65.
  • 85. Carboxydothermus sporulates Wu et al. 2005 PLoS Genetics 1: e65.
  • 86. Non-Homology Predictions: Phylogenetic Profiling • Step 1: Search all genes in organisms of interest against all other genomes • Ask: Yes or No, is each gene found in each other species • Cluster genes by distribution patterns (profiles)
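The profiling steps above can be sketched as follows, assuming the search step has already produced, for each gene, the set of genomes in which a homolog was found. The gene and genome names are hypothetical, and grouping by exactly identical profiles is the simplest possible clustering choice; real analyses typically tolerate some mismatches.

```python
from collections import defaultdict


def build_profiles(hits, genomes):
    """Turn {gene: set of genomes with a hit} into presence/absence tuples."""
    return {gene: tuple(int(genome in found) for genome in genomes)
            for gene, found in hits.items()}


def cluster_by_profile(profiles):
    """Group genes that share an identical presence/absence profile."""
    clusters = defaultdict(list)
    for gene, profile in profiles.items():
        clusters[profile].append(gene)
    return dict(clusters)


# Hypothetical search results: two spore-forming genomes and one that is not.
genomes = ["sporeformer_1", "sporeformer_2", "non_sporeformer"]
hits = {
    "geneA": {"sporeformer_1", "sporeformer_2"},
    "geneB": {"sporeformer_1", "sporeformer_2"},
    "geneC": {"sporeformer_1", "sporeformer_2", "non_sporeformer"},
}

profiles = build_profiles(hits, genomes)
clusters = cluster_by_profile(profiles)
for profile, genes in clusters.items():
    print(profile, genes)
```

Here geneA and geneB share the profile (1, 1, 0), so an uncharacterized gene in that cluster becomes a candidate for involvement in the trait that tracks the same distribution.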
  • 87. Sporulation Gene Profile Wu et al. 2005 PLoS Genetics 1: e65.
  • 88. B. subtilis new sporulation genes. Bjorn Traag, Richard Losick, Antonia Pugliese. J Bacteriol. 2013 Jan;195(2):253-60. doi: 10.1128/JB.01778-12
  • 89. PG Profiling Works Better with Orthology Martin Wu Eisen JA, Wu M. 2002. Phylogenetic analysis and gene functional predictions: phylogenomics in action. Theoretical and Population Biology 61: 481-487. PMID: 12167367.
  • 90. PG Profiling for Metagenomes. Jiang X, Langille MGI, Neches RY, Elliot M, Levin SA, Eisen JA, et al. (2012) Functional Biogeography of Ocean Microbes Revealed through Non-Negative Matrix Factorization. PLoS ONE 7(9): e43866. doi:10.1371/journal.pone.0043866
 Excerpt: Unidentified Pfams with high association to Components 1, 2 and 5 may have similar functional themes to other Pfams seen in these components, or they may have functions that are ecologically linked to the identified theme, or they may be associated taxonomically rather than functionally (i.e., they may be expressed by the same taxa that express the identified Pfams). In the future, [...] Additionally, we inspected the Pfams that were associated with the ‘‘ubiquitous’’ cluster previously identified in Figure 2. Many of these Pfams are associated with bacterial primary metabolism and only 1% of these had unknown functions (Table S6). This is a striking difference compared to the 15–54% proportion of unknown Pfams seen in the five NMF components.
 Figure 3. Components across sites. a) Weight for each of the five components at each of the 45 sites (H^T); b) the site-similarity matrix (Ĥ^T Ĥ); c) environmental variables for the sites. The matrices are aligned so that the same row corresponds to the same site in each matrix. Sites are ordered by applying spectral reordering to the similarity matrix (see Materials and Methods). Rows are aligned across the three matrices. doi:10.1371/journal.pone.0043866.g003
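For readers unfamiliar with non-negative matrix factorization, the following is a minimal pure-Python sketch of the classic multiplicative-update rule, which approximates a non-negative matrix V as W times H with both factors non-negative. It is a generic illustration and makes no claim about the specific NMF variant, initialization, or normalization used by Jiang et al.; the toy matrix and all parameter values are assumptions.

```python
import random


def matmul(A, B):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)]
            for row in A]


def transpose(A):
    return [list(col) for col in zip(*A)]


def squared_error(V, W, H):
    """Squared Frobenius norm of V - W H."""
    WH = matmul(W, H)
    return sum((v - x) ** 2 for rv, rx in zip(V, WH) for v, x in zip(rv, rx))


def nmf(V, k, iterations=200, seed=0, eps=1e-9):
    """Factor V (n x m) into W (n x k) and H (k x m), all entries >= 0."""
    rng = random.Random(seed)
    n, m = len(V), len(V[0])
    W = [[rng.random() + 0.1 for _ in range(k)] for _ in range(n)]
    H = [[rng.random() + 0.1 for _ in range(m)] for _ in range(k)]
    for _ in range(iterations):
        # H <- H * (W^T V) / (W^T W H), element-wise
        Wt = transpose(W)
        num, den = matmul(Wt, V), matmul(matmul(Wt, W), H)
        H = [[h * a / (b + eps) for h, a, b in zip(rh, ra, rb)]
             for rh, ra, rb in zip(H, num, den)]
        # W <- W * (V H^T) / (W H H^T), element-wise
        Ht = transpose(H)
        num, den = matmul(V, Ht), matmul(W, matmul(H, Ht))
        W = [[w * a / (b + eps) for w, a, b in zip(rw, ra, rb)]
             for rw, ra, rb in zip(W, num, den)]
    return W, H


# Toy abundance matrix (rows and columns are hypothetical features and sites).
V = [[5.0, 4.0, 0.0, 1.0],
     [4.0, 5.0, 1.0, 0.0],
     [0.0, 1.0, 5.0, 4.0],
     [1.0, 0.0, 4.0, 5.0]]

W0, H0 = nmf(V, k=2, iterations=0)   # random starting point
W, H = nmf(V, k=2, iterations=200)   # same start, after the updates
print(squared_error(V, W0, H0), squared_error(V, W, H))
```

Each row of H gives one component's weight at every column of V, so the transpose of H has one row per column of V, matching the layout described for panel (a) of the figure caption.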
  • 91. Tools: Whole Genome Phylogeny
  • 92. Tools: Whole Genome Phylogeny. We need to know how organisms are related to each other.
  • 93. 16S Says Hyphomonas is in Rhodobacterales. Badger et al. 2005 Int J System Evol Microbiol 55: 1021-1026. Naomi Ward, Jonathan Badger
  • 94. WGT & gene trees: Related to Caulobacterales. Badger et al. 2005 Int J System Evol Microbiol 55: 1021-1026. Naomi Ward, Jonathan Badger
  • 95. HMS Type 1: Xylem Feeders Glassy Winged Sharpshooter Gut Endosymbionts Trying to Live on Xylem Fluid Nancy Moran Dongying Wu E2 Extrinsic
  • 96. WGT: Higher Evolutionary Rates in Endosymbionts. Wu et al. 2006 PLoS Biology 4: e188. Collaboration with Nancy Moran's Lab
  • 97. Variation in Evolution Rates Correlated with Repair Gene Presence. Highest Rates In Those Missing Mismatch Repair Genes. [Figure: tree annotated with presence (+) or absence (_) of MutS and MutL.] Wu et al. 2006 PLoS Biology 4: e188. Collaboration with Nancy Moran's Lab
  • 98. Variation in Evolution Rates Correlated with Repair Gene Presence. Important Use of Whole Genome Trees. [Figure: tree annotated with presence (+) or absence (_) of MutS and MutL.] Wu et al. 2006 PLoS Biology 4: e188. Collaboration with Nancy Moran's Lab
  • 99. Whole Genome Trees: Many Possible Methods Lang JM, Darling AE, Eisen JA (2013) Phylogeny of Bacterial and Archaeal Genomes Using Conserved Genes: Supertrees and Supermatrices. PLoS ONE 8(4): e62510. doi:10.1371/journal.pone.0062510 Jenna Lang
  • 101. Automated WGT: Phylosift. [Workflow diagram as on slide 65: input sequences are scanned against rRNA and protein workflows (LAST fast candidate search, hmmalign or Infernal multiple alignment, pplacer phylogenetic placement), leading to Taxonomic Summaries and Sample Analysis & Comparison.] Aaron Darling, Erik Matsen, Holly Bik, Guillaume Jospin, Erik Lowe. Darling AE, Jospin G, Lowe E, Matsen FA IV, Bik HM, Eisen JA. (2014) PhyloSift: phylogenetic analysis of genomes and metagenomes. PeerJ 2:e243 http://dx.doi.org/10.7717/peerj.243
  • 102. Normalizing Across Genes: TreeOTU. Wu, D., Doroud, L, Eisen, JA 2013. arXiv. TreeOTU: Operational Taxonomic Unit Classification Based on Phylogenetic [...] Dongying Wu
  • 103. Tools: Linking Phylogeny and Function
  • 104. [...] flow. PeerJ 3: e960. PMID: 26020012. PMCID: PMC4435499.
  • 105. Binning & Assembly. [Diagram label: DNA.]
 Excerpt: [...] inputs of fixed carbon or nitrogen from external sources. As with Leptospirillum group I, both Leptospirillum group II and III have the genes needed to fix carbon by means of the Calvin–Benson–Bassham cycle (using type II ribulose 1,5-bisphosphate carboxylase–oxygenase). All genomes recovered from the AMD system contain formate hydrogenlyase complexes. These, in combination with carbon monoxide dehydrogenase, may be used for carbon fixation via the reductive acetyl coenzyme A (acetyl-CoA) pathway by some, or all, organisms. Given the large number of ABC-type sugar and amino acid transporters encoded in the Ferroplasma type [...]
 Figure 4. Cell metabolic cartoons constructed from the annotation of 2,180 ORFs identified in the Leptospirillum group II genome (63% with putative assigned function) and 1,931 ORFs in the Ferroplasma type II genome (58% with assigned function). The cell cartoons are shown within a biofilm that is attached to the surface of an acid mine drainage stream (viewed in cross-section). Tight coupling between ferrous iron oxidation, pyrite dissolution and acid generation is indicated. Rubisco, ribulose 1,5-bisphosphate carboxylase–oxygenase. THF, tetrahydrofolate.
 Source: Nature, doi:10.1038/nature02340, ©2004 Nature Publishing Group
  • 106. HiC Metagenomic Binning. Beitel CW, Froenicke L, Lang JM, Korf IF, Michelmore RW, Eisen JA, Darling AE. (2014) Strain- and plasmid-level deconvolution of a synthetic metagenome by sequencing proximity ligation products. PeerJ 2:e415 http://dx.doi.org/10.7717/peerj.415. Chris Beitel (@datscimed), Aaron Darling (@koadman)
 Table 1. Species alignment fractions. The number of reads aligning to each replicon present in the synthetic microbial community are shown before and after filtering, along with the percent of total constituted by each species. The GC content (“GC”) and restriction site counts (“#R.S.”) of each replicon, species, and strain are shown. Bur1: B. thailandensis chromosome 1. Bur2: B. thailandensis chromosome 2. Lac0: L. brevis chromosome, Lac1: L. brevis plasmid 1, Lac2: L. brevis plasmid 2, Ped: P. pentosaceus, K12: E. coli K12 DH10B, BL21: E. coli BL21. An expanded version of this table can be found in Table S2.
 Sequence | Alignment | % of Total | Filtered | % of aligned | Length | GC | #R.S.
 Lac0 | 10,603,204 | 26.17% | 10,269,562 | 96.85% | 2,291,220 | 0.462 | 629
 Lac1 | 145,718 | 0.36% | 145,478 | 99.84% | 13,413 | 0.386 | 3
 Lac2 | 691,723 | 1.71% | 665,825 | 96.26% | 35,595 | 0.385 | 16
 Lac | 11,440,645 | 28.23% | 11,080,865 | 96.86% | 2,340,228 | 0.46 | 648
 Ped | 2,084,595 | 5.14% | 2,022,870 | 97.04% | 1,832,387 | 0.373 | 863
 BL21 | 12,882,177 | 31.79% | 2,676,458 | 20.78% | 4,558,953 | 0.508 | 508
 K12 | 9,693,726 | 23.92% | 1,218,281 | 12.57% | 4,686,137 | 0.507 | 568
 E. coli | 22,575,903 | 55.71% | 3,894,739 | 17.25% | 9,245,090 | 0.51 | 1076
 Bur1 | 1,886,054 | 4.65% | 1,797,745 | 95.32% | 2,914,771 | 0.68 | 144
 Bur2 | 2,536,569 | 6.26% | 2,464,534 | 97.16% | 3,809,201 | 0.672 | 225
 Bur | 4,422,623 | 10.91% | 4,262,279 | 96.37% | 6,723,972 | 0.68 | 369
 Figure 1. Hi-C insert distribution. The distribution of genomic distances between Hi-C read pairs is shown for read pairs mapping to each chromosome. For each read pair the minimum path length on the circular chromosome was calculated and read pairs separated by less than 1000 bp were discarded. The 2.5 Mb range was divided into 100 bins of equal size and the number of read pairs in each bin was recorded for each chromosome. Bin values for each chromosome were normalized to sum to 1 and plotted.
 Excerpt: [...] E. coli K12 genome were distributed in a similar manner as previously reported (Fig. 1; (Lieberman-Aiden et al., 2009)). We observed a minor depletion of alignments spanning the linearization point of the E. coli K12 assembly (e.g., near coordinates 0 and 4686137) due to edge effects induced by BWA treating the sequence as a linear chromosome rather than circular.
 Figure 2. Metagenomic Hi-C associations. The log-scaled, normalized number of Hi-C read pairs associating each genomic replicon in the synthetic community is shown as a heat map (see color scale, blue to yellow: low to high normalized, log scaled association rates). Replicon abbreviations as in Table 1.
 Excerpt: [...] reference assemblies of the members of our synthetic microbial community with the same alignment parameters as were used in the top ranked clustering (described above). We first counted the number of Hi-C reads associating each reference assembly replicon (Fig. 2; [...]
 Figure 3 (cropped at the right margin). Contigs associated by Hi-C reads. A graph is drawn with nodes depicting contigs and [edges] depicting associations between contigs as indicated by aligned Hi-C read pairs, with the count t[...] depicted by the weight of edges. Nodes are colored to reflect the species to which they belong (see legend) with node size reflecting contig size. Contigs below 5 kb and edges with weights less than 5 were excluded. Contig associations were normalized for variation in contig size.
 Excerpt (cropped at the right margin): [...] typically represent the reads and variant sites as a variant graph wherein variant sites are represented as nodes, and sequence reads define edges between variant sites observed in the same read (or read pair). We reasoned that variant graphs constructed from Hi-C data would have much greater connectivity (where connectivity is defined as the m[...] path length between randomly sampled variant positions) than graphs constructed [...]
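The procedure in the Figure 1 caption (minimum path length on a circular chromosome, discard pairs closer than 1000 bp, 100 equal bins over 2.5 Mb, normalize to sum to 1) can be sketched as below. This is an illustrative reimplementation, not the authors' code: the function names and the toy read pairs are hypothetical, and dropping any pair at or beyond the 2.5 Mb range is an added assumption.

```python
def circular_distance(pos_a, pos_b, chromosome_length):
    """Minimum path length between two positions on a circular chromosome."""
    direct = abs(pos_a - pos_b)
    return min(direct, chromosome_length - direct)


def insert_distribution(read_pairs, chromosome_length,
                        min_separation=1000, max_range=2_500_000, n_bins=100):
    """Bin read-pair distances and normalize the bin counts to sum to 1."""
    bin_width = max_range / n_bins
    counts = [0] * n_bins
    for pos_a, pos_b in read_pairs:
        distance = circular_distance(pos_a, pos_b, chromosome_length)
        # Assumption: pairs at or beyond max_range are dropped as well.
        if distance < min_separation or distance >= max_range:
            continue
        counts[int(distance // bin_width)] += 1
    total = sum(counts)
    return [count / total for count in counts] if total else counts


# Toy read pairs on a chromosome the length of the E. coli K12 assembly.
K12_LENGTH = 4_686_137
pairs = [
    (0, 4_686_000),          # 137 bp apart across the origin: discarded
    (1_000_000, 1_010_000),  # 10 kb apart: first bin
    (100, 30_100),           # 30 kb apart: second bin
    (0, 2_000_000),          # 2 Mb apart: bin 80
]
distribution = insert_distribution(pairs, K12_LENGTH)
print(distribution[0], distribution[1], distribution[80])
```

Measuring distance around the circle is what keeps the pair spanning coordinates 0 and 4,686,000 from being counted as a 4.7 Mb insert, which is the same linear-versus-circular issue the excerpt attributes to BWA near the linearization point.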
  • 107. Long Reads Help, A Lot. Platforms: Hiseq & Miseq (100-250 bp); Moleculo (2-20 kb), Illumina-based “synthetic long reads”; Pacbio RSII (2-20 kb), real-time single molecule sequencing (p4-c2, p5-c3). Yields shown: 295 Megabases, 474 Megabases, 61 Gigabases. Micky Kertesz, Tim Blauwcamp, Meredith Ashby, Cheryl Heiner
  • 108. Metagenomic Binning Phylogeny is an important tool in binning
  • 109. Sharpshooter Symbionts Wu et al. 2006 PLoS Biology 4: e188.
  • 111. Sharpshooter Symbiont Binning. Phylogenetic Binning: Baumannia makes vitamins and cofactors; Sulcia makes amino acids. Wu et al. 2006 PLoS Biology 4: e188. Nancy Moran, Dongying Wu
  • 112. Resources and Reference Data
  • 115. Genomes Poorly Sampled Figure from Barton, Eisen et al. “Evolution”, CSHL Press based on Baldauf et al Tree
  • 116. 2002-2007: TIGR Tree of Life Project. Figure from Barton, Eisen et al. “Evolution”, CSHL Press based on Baldauf et al Tree. Naomi Ward, Karen Nelson
  • 117. 2007-2014: GEBA. Figure from Barton, Eisen et al. “Evolution”, CSHL Press based on Baldauf et al Tree. Dongying Wu, Phil Hugenholtz, Nikos Kyrpides, Hans-Peter Klenk, Alla Lapidus
  • 118. Synapomorphies Exist. Wu et al. 2009. Nature 462: 1056-1060.
  • 120. GEBA Cyanobacteria. Shih et al. 2013. PNAS 10.1073/pnas.1217107110. Fig. 2. Implications on plastid evolution. (A) Maximum-likelihood phylogenetic tree of plastids and cyanobacteria, grouped by subclades (Fig. 1). The red dot [...] [Figure labels, panels A and B: subclades A, B1, B2, B3, C1, C2, C3, D, E, F, G; Paulinella, Glaucophyte, Green, Red, Chromalveolates; scale bar 0.3.] Cheryl Kerfeld
  • 121. Haloarchaeal GEBA-like. Lynch et al. (2012) PLoS ONE 7(7): e41389. doi:10.1371/journal.pone.0041389. Erin Lynch
  • 122. The Dark Matter of Biology From Wu et al. 2009 Nature 462, 1056-1060
  • 123. JGI Dark Matter Project. Pipeline: environmental samples (n=9); isolation of single cells (n=9,600); whole genome amplification (n=3,300); SSU rRNA gene based identification (n=2,000); genome sequencing, assembly and QC (n=201); draft genomes (n=201). Sites: SAK, HSM, ETL, TG, HOT, GOM, GBS, EPR, TA. Sample types: seawater, brackish/freshwater, hydrothermal, sediment, bioreactor. [Figure: tree of BACTERIA and ARCHAEA lineages, including candidate phyla such as OP11 (Microgenomates), OD1 (Parcubacteria), GN02 (Gracilibacteria), DSEG (Aenigmarchaea), and pMC2A384 (Diapherotrites), with insets: archaeal toxins (Nanoarchaea); lytic murein transglycosylase; stringent response (Diapherotrites, Nanoarchaea); sigma factor (Diapherotrites, Nanoarchaea); oxidoreductase; HGT from Eukaryotes (Nanoarchaea); archaeal type purine synthesis (Microgenomates); UGA recoded for Gly (Gracilibacteria).] Woyke et al. Nature 2013. Tanja Woyke
  • 124. Microbial Dark Matter Part 2 • Ramunas Stepanauskas • Tanja Woyke • Jonathan Eisen • Duane Moser • Tullis Onstott
  • 125. MAGs
  • 126. SFAMs (Sifting Families). [Figure 1, panels A-C, pipeline labels: Representative Genomes; Extract Protein Annotation; All v. All BLAST; Homology Clustering (MCL); SFams; Align & Build HMMs; HMMs; Screen for Homologs; New Genomes; Extract Protein Annotation.] Sharpton et al. 2012. BMC Bioinformatics, 13(1), 264.
  • 127. PhyEco Markers. Wu D, Jospin G, Eisen JA (2013) Systematic Identification of Gene Families for Use as “Markers” for Phylogenetic and Phylogeny-Driven Ecological Studies of Bacteria and Archaea and Their Major Subgroups. PLoS ONE 8(10): e77033. doi:10.1371/journal.pone.0077033
 Phylogenetic group | Genome Number | Gene Number | Marker Candidates
 Archaea | 62 | 145415 | 106
 Actinobacteria | 63 | 267783 | 136
 Alphaproteobacteria | 94 | 347287 | 121
 Betaproteobacteria | 56 | 266362 | 311
 Gammaproteobacteria | 126 | 483632 | 118
 Deltaproteobacteria | 25 | 102115 | 206
 Epsilonproteobacteria | 18 | 33416 | 455
 Bacteroides | 25 | 71531 | 286
 Chlamydiae | 13 | 13823 | 560
 Chloroflexi | 10 | 33577 | 323
 Cyanobacteria | 36 | 124080 | 590
 Firmicutes | 106 | 312309 | 87
 Spirochaetes | 18 | 38832 | 176
 Thermi | 5 | 14160 | 974
 Thermotogae | 9 | 17037 | 684