Phylogeny-Driven Approaches to
Studies of Microbial and Microbiome
Diversity
Jonathan A. Eisen
University of California, D...
Phylogenetic Diversity of Metagenomes
typically used as a qualitative measure because duplicate sequences are usually remo...
Phylosift/ pplacer Workflow
Input Sequences
rRNA workflow
protein workflow
profile HMMs used to align
candidates to refere...
Whole Genome Tree of 2000 Taxa
Lang JM, Darling AE, Eisen JA (2013)
Phylogeny of Bacterial and Archaeal
Genomes Using Cons...
Phylosift Markers
• PMPROK – Dongying Wu’s Bac/Arch
markers
• Eukaryotic Orthologs – Parfrey 2011 paper
• 16S/18S rRNA
• M...
PhyEco Markers
Phylogenetic group Genome Number Gene Number Marker Candidates
Archaea 62 145415 106
Actinobacteria 63 26778...
Edge PCA: Identify
lineages that explain most
variation among samples
Edge PCA - Matsen and Evans 2013
Output: Edge PCA
QIIME Phylotyping and Phylogenetic Ecology
Fig. S6. A set of 96 OTUs mainly consisting of Proteobacteria is
compart...
Example IV: Functional Evolution
My Study Organisms
Tree from Woese. 1987.
Microbiological Reviews 51:221
1st Genome Sequence
Fleischmann et al.
1995
TIGR Genome Projects
Tree from Woese. 1987.
Microbiological Reviews 51:221
1st Genome Sequence
Fleischmann et al.
1995
Lesson 8:
If you can’t beat
them, critique
them or join them
• Leveraging an understanding of the
evolution of function to better predict
functions
Function & Phylogeny
PHYLOGENETIC PREDICTION OF GENE FUNCTION
IDENTIFY HOMOLOGS
OVERLAY KNOWN
FUNCTIONS ONTO TREE
INFER LIKELY FUNCTION
OF GE...
Phylogenomics ~~ Phylotyping
Eisen et al. 1992. J. Bact. 174: 3416
Lesson 10:
Stealing (with
acknowledgeme...
Proteorhodopsin Functional Diversity
Venter et al., Science 304: 66. 2004
• Leveraging understanding of gene gain
and loss to better predict genome
functions
Lesson 11:
Who you hang out
with matte...
Carboxydothermus hydrogenoformans
• Isolated from a Russian hotspring
• Thermophile (grows at 80°C)
• Anaerobic
• Grows ve...
Homologs of Sporulation Genes
Wu et al. 2005 PLoS
Genetics 1: e65.
Carboxydothermus sporulates
Wu et al. 2005 PLoS Genetics 1: e65.
Non-Homology Predictions:
Phylogenetic Profiling
• Step 1: Search all genes in
organisms of interest against all
other gen...
Sporulation Gene Profile
Wu et al. 2005 PLoS Genetics 1: e65.
B. subtilis new sporulation genes
J Bacteriol. 2013 Jan;195(2):253-60. doi: 10.1128/JB.01778-12
Bjorn Traag
Richard Losick
Tree from Woese. 1987.
Microbiological Reviews 51:221
Example V: More Gaps
Lesson 12:
Keep Returning to
the Same Theme
Ove...
Yet Another Map
Figure from Barton, Eisen et al. “Evolution”, CSHL Press based on Baldauf et al Tree
Genomes Poorly Sampled
Figure from Barton, Eisen et al. “Evolution”, CSHL Press based on Baldauf et al Tree
TIGR Tree of Life Project
Figure from Barton, Eisen et al. “Evolution”, CSHL Press based on Baldauf et al Tree
Genomic Encyclopedia of Bacteria & Archaea
Wu et al. 2009 Nature 462, 1056-1060
Figure from Barton, Eisen et al. “Evolutio...
Family Diversity vs. PD
Wu et al. 2009 Nature 462, 1056-1060
GEBA Cyanobacteria
Shih et al. 2013. PNAS 10.1073/pnas.1217107110
0.3
B1
B2
C1
Paulinella
Glaucophyte
Green
Red
Chromalveo...
Haloarchaeal GEBA-like
Lynch et al. (2012) PLoS ONE 7(7): e41389. doi:10.1371/journal.pone.0041389
The Dark Matter of Biology
From Wu et al. 2009 Nature 462, 1056-1060
75
Number of SAGs from Candidate Phyla
OD1
OP11
OP3
SAR406
Site A: Hydrothermal vent 4 1 - -
Site B: Gold Mine 6 13 2 -
Si...
JGI Dark Matter Project
environmental
samples (n=9)
isolation of single
cells (n=9,600)
whole genome
amplification (n=3,30...
GAL35
Aquificae
EM3
Thermotogae
Dictyoglomi
SPAM
GAL15
CD12 (Aerophobetes)
OP8 (Aminicenantes)
AC1
SBR1093
Thermodesulfoba...
Chlorobi
Firmicutes
Tenericutes
Fusobacteria
Chrysiogenetes
Proteobacteria
Fibrobacteres
TG3
Spirochaetes
WWE1 (Cloacamone...
recognizes
UGA
mRNA
UGA recoded for Gly (Gracilibacteria)
ribosome
Woyke et al. Nature 2013.
A Genomic Encyclopedia of Microbes (GEM)
Figure from Barton, Eisen et al. “Evolution”, CSHL Press based on Baldauf et al T...
Tetrahymena Genome Project
A Genomic Encyclopedia of Microbes (GEM)
Figure from Barton, Eisen et al. “Evolution”, CSHL Press based on Baldauf et al T...
Tree from Woese. 1987.
Microbiological Reviews 51:221
Example VI: Beyond Sequence
Lesson 13:
Don’t Overdo It
With That The...
DNA
extraction
PCR
Sequence
all genes
Shotgun
Shotgun Metagenomics
Wu et al. 2006 PLoS Biology 4: e188.
Baumannia makes vitamins and cofactors
Sulcia makes amino acids
Phylogenetic Binning
HiC Crosslinking  Sequencing
Beitel CW, Froenicke L, Lang JM, Korf IF, Michelmore
RW, Eisen JA, Darling AE. (2014) Strain-...
Sequence Isn’t Everything
PB-PSB1
(Purple sulfur bacteria)
PB-SRB1
(Sulfate reducing bacteria)
(sulfate)
(sulfide)
Wilbank...
12C, 12C14N, 32S
Biomass
(RGB composite)
0.044 0.080
34S-incorporation
(34S/32S ratio)
Wilbanks, E.G. et al (2014). En...
Long Reads Help, A Lot
Hiseq  Miseq
100-250 bp
Moleculo
2-20 kb
Pacbio RSII
2-20kb
Micky Kertesz,
Tim Blauwkamp
Meredith A...
Light-responsive sulfate reducer?
rhodopsin
w/ Susumu Yoshizawa
Lesson 14:
Asking for, and
getting, help, is a
good thing
Seagrass Microbiome
1000 samples collected.
Not a blade of seagrass touched.
YEAR ONE


ZEN (Zostera Experimental Network)

25 partner sites
leaves, roots, sediment, and water samples
MICROBES
Acknowledgements
• GEBA:
• $$: DOE-JGI, DSMZ
• Eddy Rubin, Phil Hugenholtz, Hans-Peter Klenk, Nikos Kyrpides, Tanya Woyke,...

Phylogeny-driven approaches to microbial & microbiome studies: talk by Jonathan Eisen at UCSB Feb 2015

Talk by Jonathan Eisen at UCSB February 2015 for the EEMB Graduate Student Symposium

Published in: Science

  1. 1. Phylogeny-Driven Approaches to Studies of Microbial and Microbiome Diversity Jonathan A. Eisen University of California, Davis @phylogenomics February 7, 2015 UCSB EEMB Graduate Student Symposium
  2. 2. Phylogeny-Driven Approaches to Studies of Microbial and Microbiome Diversity Jonathan A. Eisen University of California, Davis @phylogenomics February 7, 2015 UCSB EEMB Graduate Student Symposium Some Lessons I Think I Have Learned
  3. 3. Phylogeny-Driven Approaches to Studies of Microbial and Microbiome Diversity Jonathan A. Eisen University of California, Davis @phylogenomics February 7, 2015 UCSB EEMB Graduate Student Symposium Lesson 1: Go With Your Obsessions
  4. 4. Open Science
  5. 5. Open Science X
  6. 6. Social Media & Science
  7. 7. Social Media & Science X
  8. 8. • RedSox RedSox
  9. 9. • RedSox RedSox X
  10. 10. Microbial Evolution
  11. 11. Microbial Evolution Lesson 2: History Matters
  12. 12. Microbial Evolution Lesson 2: History (of species, genes, people, science) Matters
  13. 13. Example I: Lost in Graduate School?
  14. 14. Lost in Graduate School? Get A Map
  15. 15. Tree from Woese. 1987. Microbiological Reviews 51:221 Map for Graduate School Carl Woese
  16. 16. Limited Sampling of RRR Studies Tree from Woese. 1987. Microbiological Reviews 51:221
  17. 17. My Study Organisms Tree from Woese. 1987. Microbiological Reviews 51:221
  18. 18. H. volcanii Excision Repair [Figure: H. volcanii UV Repair; x-axis: Avg. Mol. Wt. (Base Pairs); series: 45 J/m2 Dark 24 Hours, 45 J/m2 Photoreac., 45 J/m2 t0, 0 J/m2 t0. Image by Grombo, from Wikipedia.] [Figure: UV Survival, E. coli vs H. volcanii; x-axis: UV J/m2; y-axis: Relative Survival; series: H. volcanii WFD11, E. coli NR10125 mfd+, E. coli NR10121 mfd-.] From Eisen 1998. PhD Thesis.
  19. 19. Tree from Woese. 1987. Microbiological Reviews 51:221 Map for Graduate School Lesson 3: Go Fishing Where Nobody Else Has
  20. 20. Example II: Rice Microbiomes and Phylogeny Joseph Edwards @Bulk_Soil Sundar @sundarlab Cameron Johnson Srijak Bhatnagar @srijakbhatnagar Edwards et al. 2015. Structure, variation, and assembly of the root-associated microbiomes of rice. PNAS Supplementary Figures. Fig. S1. Map depicting soil collection locations for greenhouse experiment. Fig. S2. Sampling and collection of the rhizocompartments. Roots are collected from rice plants and soil is shaken off the roots to leave ~1 mm of soil around the roots. The ~1 mm of soil
  21. 21. DNA extraction PCR Sequence rRNA genes Sequence alignment = Data matrixPhylogenetic tree PCR rRNA1 rRNA2 Makes lots of copies of the rRNA genes in sample rRNA1 5’...ACACACATAGGTGGAGCTA GCGATCGATCGA... 3’ E. coli Humans A T T A G A A C A T C A C A A C A G G A G T T C rRNA1 E. coli Humans rRNA2 rRNA2 5’..TACAGTATAGGTGGAGCTAG CGACGATCGA... 3’ rRNA3 5’...ACGGCAAAATAGGTGGATT CTAGCGATATAGA... 3’ rRNA4 5’...ACGGCCCGATAGGTGGATT CTAGCGCCATAGA... 3’ rRNA3 C A C T G T rRNA4 C A C A G T Yeast T A C A G T Yeast rRNA3 rRNA4 Phylogeny PCR and phylogenetic analysis of rRNA genes
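The slide above treats a sequence alignment as a data matrix from which a tree is inferred. A minimal sketch of the first step in a distance-based approach follows: computing pairwise differences between aligned sequences. The three toy sequences and the names `seqA`, `seqB`, `seqC` are hypothetical and are not taken from the slide, and the plain p-distance used here is only one of many possible distance measures.

```python
from itertools import combinations


def p_distance(a, b):
    """Fraction of aligned, non-gap positions at which two sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not pairs:
        raise ValueError("no comparable positions")
    return sum(x != y for x, y in pairs) / len(pairs)


def distance_matrix(alignment):
    """Map each unordered pair of sequence names to its p-distance."""
    return {
        (m, n): p_distance(alignment[m], alignment[n])
        for m, n in combinations(sorted(alignment), 2)
    }


# Hypothetical toy alignment (rows = sequences, columns = aligned positions).
toy_alignment = {
    "seqA": "ATTAGA",
    "seqB": "ATTAGC",
    "seqC": "CACTGT",
}

for pair, dist in distance_matrix(toy_alignment).items():
    print(pair, round(dist, 3))
```

A distance matrix like this could then be handed to a clustering method such as neighbor joining; likelihood-based methods work on the alignment columns directly instead.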
  22. 22. STAP An Automated Phylogenetic Tree-Based Small Subunit rRNA Taxonomy and Alignment Pipeline (STAP) Dongying Wu1 *, Amber Hartman1,6 , Naomi Ward4,5 , Jonathan A. Eisen1,2,3 1 UC Davis Genome Center, University of California Davis, Davis, California, United States of America, 2 Section of Evolution and Ecology, College of Biological Sciences, University of California Davis, Davis, California, United States of America, 3 Department of Medical Microbiology and Immunology, School of Medicine, University of California Davis, Davis, California, United States of America, 4 Department of Molecular Biology, University of Wyoming, Laramie, Wyoming, United States of America, 5 Center of Marine Biotechnology, Baltimore, Maryland, United States of America, 6 The Johns Hopkins University, Department of Biology, Baltimore, Maryland, United States of America Abstract Comparative analysis of small-subunit ribosomal RNA (ss-rRNA) gene sequences forms the basis for much of what we know about the phylogenetic diversity of both cultured and uncultured microorganisms. As sequencing costs continue to decline and throughput increases, sequences of ss-rRNA genes are being obtained at an ever-increasing rate. This increasing flow of data has opened many new windows into microbial diversity and evolution, and at the same time has created significant methodological challenges. Those processes which commonly require time-consuming human intervention, such as the preparation of multiple sequence alignments, simply cannot keep up with the flood of incoming data. Fully automated methods of analysis are needed. Notably, existing automated methods avoid one or more steps that, though computationally costly or difficult, we consider to be important. In particular, we regard both the building of multiple sequence alignments and the performance of high quality phylogenetic analysis to be necessary. We describe here our fully- automated ss-rRNA taxonomy and alignment pipeline (STAP). 
It generates both high-quality multiple sequence alignments and phylogenetic trees, and thus can be used for multiple purposes including phylogenetically-based taxonomic assignments and analysis of species diversity in environmental samples. The pipeline combines publicly-available packages (PHYML, BLASTN and CLUSTALW) with our automatic alignment, masking, and tree-parsing programs. Most importantly, this automated process yields results comparable to those achievable by manual analysis, yet offers speed and capacity that are unattainable by manual efforts. Citation: Wu D, Hartman A, Ward N, Eisen JA (2008) An Automated Phylogenetic Tree-Based Small Subunit rRNA Taxonomy and Alignment Pipeline (STAP). PLoS ONE 3(7): e2566. doi:10.1371/journal.pone.0002566 multiple alignment and phylogeny was deemed unfeasible. However, this we believe can compromise the value of the results. For example, the delineation of OTUs has also been automated via tools that do not make use of alignments or phylogenetic trees (e.g., Greengenes). This is usually done by carrying out pairwise comparisons of sequences and then clustering of sequences that have better than some cutoff threshold of similarity with each other). This approach can be powerful (and reasonably efficient) but it too has limitations. In particular, since multiple sequence alignments are not used, one cannot carry out standard phylogenetic analyses. In addition, without multiple sequence alignments one might end up comparing and contrasting different regions of a sequence depending on what it is paired with. The limitations of avoiding multiple sequence alignments and phylogenetic analysis are readily apparent in tools to classify sequences. For example, the Ribosomal Database Project’s Classifier program [29] focuses on composition characteristics of each sequence (e.g., oligonucleotide frequency) and assigns taxonomy based upon clustering genes by their composition. 
Though this is fast and completely automatable, it can be misled in cases where distantly related sequences have converged on similar composition, something known to be a major problem in ss-rRNA sequences [30]. Other taxonomy assignment systems focus primarily on the similarity of sequences. The simplest of these is classification tools it does have some limitations. For example, the generation of new alignments for each sequence is both computationally costly, and does not take advantage of available curated alignments that make use of ss-RNA secondary structure to guide the primary sequence alignment. Perhaps most importantly however is that the tool is not fully automated. In addition, it does not generate multiple sequence alignments for all sequences in a dataset which would be necessary for doing many analyses. Automated methods for analyzing rRNA sequences are also available at the web sites for multiple rRNA centric databases, such as Greengenes and the Ribosomal Database Project (RDPII). Though these and other web sites offer diverse powerful tools, they do have some limitations. For example, not all provide multiple sequence alignments as output and few use phylogenetic approaches for taxonomy assignments or other analyses. More importantly, all provide only web-based interfaces and their integrated software, (e.g., alignment and taxonomy assignment), cannot be locally installed by the user. Therefore, the user cannot take advantage of the speed and computing power of parallel processing such as is available on linux clusters, or locally alter and potentially tailor these programs to their individual computing needs (Table 1). Given the limited automated tools that are available for
      Table 1. Comparison of STAP’s computational abilities relative to existing commonly-used ss-RNA analysis tools.
      Feature | STAP | ARB | Greengenes | RDP
      Installed where? | Locally | Locally | Web only | Web only
      User interface | Command line | GUI | Web portal | Web portal
      Parallel processing | YES | NO | NO | NO
      Manual curation for taxonomy assignment | NO | YES | NO | NO
      Manual curation for alignment | NO | YES | NO* | NO
      Open source | YES** | NO | NO | NO
      Processing speed | Fast | Slow | Medium | Medium
      It is important to note that STAP is the only software that runs on the command line and can take advantage of parallel processing on linux clusters and, further, is more amenable to downstream code manipulation. * Note: Greengenes alignment output is compatible with upload into ARB and downstream manual alignment. ** The STAP program itself is open source, the programs it depends on are freely available but not open source. doi:10.1371/journal.pone.0002566.t001
      ss-rRNA Taxonomy Pipeline Figure 1. A flow chart of the STAP pipeline. doi:10.1371/journal.pone.0002566.g001 STAP database, and the query sequence is aligned to them using the CLUSTALW profile alignment algorithm [40] as described above for domain assignment. By adapting the profile alignment algorithm, the alignments from the STAP database remain intact, while gaps are inserted and nucleotides are trimmed for the query sequence according to the profile defined by the previous alignments from the databases. Thus the accuracy and quality of the alignment generated at this step depends heavily on the quality of the Bacterial/Archaeal ss-rRNA alignments from the Greengenes project or the Eukaryotic ss-rRNA alignments from the RDPII project. Phylogenetic analysis using multiple sequence alignments rests on the assumption that the residues (nucleotides or amino acids) at the same position in every sequence in the alignment are homologous. 
Thus, columns in the alignment for which “positional homology” cannot be robustly determined must be excluded from subsequent analyses. This process of evaluating homology and eliminating questionable columns, known as masking, typically requires time-consuming, skillful, human intervention. We designed an automated masking method for ss-rRNA alignments, thus eliminating this bottleneck in high-throughput processing. First, an alignment score is calculated for each aligned column by a method similar to that used in the CLUSTALX package [42]. Specifically, an R-dimensional sequence space representing all the possible nucleotide character states is defined. Then for each aligned column, the nucleotide populating that column in each of the aligned sequences is assigned a score in each of the R dimensions (Sr) according to the IUB matrix [42]. The consensus “nucleotide” for each column (X) also has R dimensions, with the Figure 2. Domain assignment. In Step 1, STAP assigns a domain to each query sequence based on its position in a maximum likelihood tree of representative ss-rRNA sequences. Because the tree illustrated here is not rooted, domain assignment would not be accurate and Dongying Wu Amber Hartman Naomi Ward
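The excerpt above describes masking: scoring each alignment column and dropping the ones whose homology is doubtful. The sketch below illustrates the general idea only. It swaps in a much simpler score (gap fraction plus the frequency of the most common residue) in place of the CLUSTALX-style scoring the paper describes, and the function names and thresholds are assumptions chosen for illustration.

```python
from collections import Counter


def mask_columns(seqs, min_major_fraction=0.5, max_gap_fraction=0.5):
    """Return indices of alignment columns that pass a simple quality filter.

    A column is kept when it is not mostly gaps and its most common
    residue accounts for at least `min_major_fraction` of the non-gap
    characters. This is a stand-in score, not the STAP scoring scheme.
    """
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("all sequences must have the same aligned length")
    keep = []
    for i in range(length):
        column = [s[i] for s in seqs]
        residues = [c for c in column if c != "-"]
        gap_fraction = (len(column) - len(residues)) / len(column)
        if not residues or gap_fraction > max_gap_fraction:
            continue
        major_count = Counter(residues).most_common(1)[0][1]
        if major_count / len(residues) >= min_major_fraction:
            keep.append(i)
    return keep


def apply_mask(seqs, keep):
    """Rebuild each sequence from the retained columns only."""
    return ["".join(s[i] for i in keep) for s in seqs]


# Hypothetical toy alignment with one noisy column and one all-gap column.
toy = ["ACG-T", "ACT-T", "AGA-T", "ATC-A"]
kept = mask_columns(toy)
print(kept, apply_mask(toy, kept))
```

Because the mask is a list of column indices, the same mask can be applied to any sequence later added to that alignment.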
  23. 23. WATERs chimeric sequences generated during PCR identifying closely related sets of sequences (also known as operational taxonomic units or OTUs), removing redundant sequences above a certain percent identity cutoff, assigning putative taxonomic identifiers to each sequence or representative of a group, inferring a phylogenetic tree of the sequences, and comparing the phylogenetic structure Figure 1 Overview of WATERS. Schema of WATERS where white boxes indicate "behind the scenes" analyses that are performed in WATERS. Quality control files are generated for white boxes, but not otherwise routinely analyzed. Black arrows indicate that metadata (e.g., sample type) has been overlaid on the data for downstream interpretation. Colored boxes indicate different types of results files that are generated for the user for further use and biological interpretation. Colors indicate different types of WATERS actors from Fig. 2 which were used: green, Diversity metrics, WriteGraphCoordinates, Diversity graphs; blue, Taxonomy, BuildTree, Rename Trees, Save Trees; CreateUnifrac; yellow, CreateOtuTable, CreateCytoscape, CreateOTUFile; white, remaining unnamed actors. [Workflow boxes: Align; Check chimeras; Cluster; Build Tree; Assign Taxonomy; Tree w/ Taxonomy; Diversity statistics & graphs; Unifrac files; Cytoscape network; OTU table] Hartman et al 2010. W.A.T.E.R.S.: a Workflow for the Alignment, Taxonomy, and Ecology of Ribosomal Sequences. BMC Bioinformatics 2010, 11:317 doi:10.1186/1471-2105-11-317 http://www.biomedcentral.com/1471-2105/11/317 default is 97% and 99%), and they are also generated for every metadata variable comparison that the user includes. Data pruning To assist in troubleshooting and quality control, WATERS returns to the user three fasta files of sequences that were removed at various steps in the workflow. 
A short_sequences.fas file is created that contains all Figure 3 Biologically similar results automatically produced by WATERS on published colonic microbiota samples. (A) Rarefaction curves similar to curves shown in Eckburg et al. Fig. 2; 70-72, indicate patient numbers, i.e., 3 different individuals. (B) Weighted Unifrac analysis based on phylogenetic tree and OTU data produced by WATERS very similar to Eckburg et al. Fig. 3B. (C) Neighbor-joining phylogenetic tree (Quicktree) representing the sequences analyzed by WATERS, which is clearly similar to Fig. S1 in Eckburg et al. [Figure labels: axes "Percent variation explained"; samples A, B, C, D, E, F, S; plot title "PCA P vs P"; taxa: BACTEROIDETES BACTEROIDALES, DELTAPROTEOBACTERIA, ACTINOBACTERIA, VERRUCOMICROBIA, EPSILONPROTEOBACTERIA, FIRMICUTES CLOSTRIDIA CLOSTRIDIALES, GAMMAPROTEOBACTERIA, CYANOBACTERIA, ALPHAPROTEOBACTERIA, FUSOBACTERIA, FIRMICUTES BACILLI, FIRMICUTES MOLLICUTES] Amber Hartman Bertram Ludaescher
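The WATERS excerpt mentions grouping sequences into OTUs at a percent identity cutoff (97% and 99% are named as defaults). The sketch below shows one common simplification of that idea, greedy centroid clustering over pre-aligned sequences. It is not the clustering code WATERS itself uses, and the function names and toy sequences are hypothetical.

```python
def percent_identity(a, b):
    """Percent of aligned positions at which two equal-length sequences match."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return 100.0 * sum(x == y for x, y in zip(a, b)) / len(a)


def greedy_otus(seqs, cutoff=97.0):
    """Assign each sequence to the first centroid it matches at >= cutoff.

    `seqs` maps names to aligned sequences and is processed in insertion
    order; a sequence that matches no existing centroid starts a new OTU.
    """
    centroids = []  # (name, sequence) of each OTU's founding member
    otus = {}       # centroid name -> list of member names
    for name, seq in seqs.items():
        for cname, cseq in centroids:
            if percent_identity(seq, cseq) >= cutoff:
                otus[cname].append(name)
                break
        else:
            centroids.append((name, seq))
            otus[name] = [name]
    return otus


# Hypothetical 100-column toy sequences.
toy_seqs = {
    "ref": "A" * 100,
    "near": "A" * 98 + "CC",      # 98% identical to ref
    "far": "A" * 90 + "C" * 10,   # 90% identical to ref
}
print(greedy_otus(toy_seqs, cutoff=97.0))
```

Greedy clustering depends on input order, which is one reason different OTU-picking tools can disagree on the same data.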
  24. 24. alignment used to build the profile, resulting in a multiple sequence alignment of full-length reference sequences and PD versus PID clustering, 2) to explore overlap between PhylOTU clusters and recognized taxonomic designations, and 3) to quantify Figure 1. PhylOTU Workflow. Computational processes are represented as squares and databases are represented as cylinders in this generalized workflow of PhylOTU. See Results section for details. doi:10.1371/journal.pcbi.1001061.g001 Finding Metagenomic OTUs Sharpton TJ, Riesenfeld SJ, Kembel SW, Ladau J, O'Dwyer JP, Green JL, Eisen JA, Pollard KS. (2011) PhylOTU: A High-Throughput Procedure Quantifies Microbial Community Diversity and Resolves Novel Taxa from Metagenomic Data. PLoS Comput Biol 7(1): e1001061. doi:10.1371/journal.pcbi.1001061 PhylOTU Tom Sharpton @tjsharpton
  25. 25. QIIME Phylotyping and Phylogenetic Ecology Fig. S6. A set of 96 OTUs mainly consisting of Proteobacteria is compartment in the greenhouse experiment. (A) Number of OTU they belong to that are enriched across all rhizocompartments in the A subset of the Proteobacteria and the classes and families they belo enriched across all rhizocompartments in the greenhouse. https://evomics.org/2014/01/the-glories-of-the-gut-ask-a-fat-mouse/
  26. 26. QIIME Phylotyping and Phylogenetic Ecology Fig. S6. A set of 96 OTUs mainly consisting of Proteobacteria is compartment in the greenhouse experiment. (A) Number of OTU they belong to that are enriched across all rhizocompartments in the A subset of the Proteobacteria and the classes and families they belo enriched across all rhizocompartments in the greenhouse. https://evomics.org/2014/01/the-glories-of-the-gut-ask-a-fat-mouse/ Lesson 4: Accept When You Are Defeated
  27. 27. Rice Microbiome: Variation w/in Plant Joseph Edwards @Bulk_Soil Sundar @sundarlab Cameron Johnson Srijak Bhatnagar @srijakbhatnagar growth. For our study, the rhizosphere compartment was com- Fig. 1. Root-associated microbial communities are separable by rhizocompartment and soil type. (A) A representation of a rice root cross-section depicting the locations of the microbial communities sampled. (B) Within-sample diversity (α-diversity) measurements between rhizospheric compartments indicate a decreasing gradient in microbial diversity from the rhizosphere to the endosphere independent of soil type. Estimated species Edwards et al. 2015. Structure, variation, and assembly of the root-associated microbiomes of rice. PNAS
  28. 28. Rice Genotype Affects Microbiome rhizocompartments were analyzed as before. Unfortunately, collection of bulk soil controls for the field experiment was not Fig. 3. Host plant genotype significantly affects microbial communities in the rhizospheric compartments. (A) Ordination of CAP analysis using the WUF metric constrained to rice genotype. (B) Within-sample diversity measurements of rhizosphere samples of each cultivar grown in each soil. Estimated species richness was calculated as e^(Shannon entropy). The horizontal Edwards et al. 2015. Structure, variation, and assembly of the root-associated microbiomes of rice. PNAS
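The figure caption above defines estimated species richness as e raised to the Shannon entropy. A minimal sketch of that calculation from per-OTU read counts follows, assuming the natural logarithm is used for the entropy; the function names and the example counts are hypothetical.

```python
import math


def shannon_entropy(counts):
    """Shannon entropy (natural log) of a list of non-negative counts."""
    total = sum(counts)
    if total <= 0:
        raise ValueError("counts must sum to a positive number")
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)


def effective_richness(counts):
    """exp(Shannon entropy): the number of equally abundant taxa that
    would give the same entropy as the observed community."""
    return math.exp(shannon_entropy(counts))


# Hypothetical communities: one perfectly even, one dominated by a single OTU.
print(effective_richness([10, 10, 10, 10]))
print(effective_richness([97, 1, 1, 1]))
```

An even community of four OTUs gives a value of about four, while a community dominated by one OTU gives a value much closer to one, so this measure reflects evenness as well as raw OTU count.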
  29. 29. Rice: Cultivation Site Effects Edwards et al. 2015. Structure, variation, and assembly of the root- associated microbiomes of rice. PNAS the field plants again showed that the rhizosphere had the highest microbial diversity, whereas the endosphere had the least found to be enriche greenhouse plants (S OTUs were classifiabl sisted of taxa in the fa and Myxococcaceae, al bidopsis root endosphe Cultivation Practice Result The rice fields that we practices, organic farmi tion called ecofarming farming in that chemica are all permitted but g harvest fumigants are n itself does significantly partments overall (P = a significant interaction the rhizocompartments indicating that the α-d affected differentially by the rhizosphere compa practice, with the mean zospheres than organic Dataset S14), whereas crobial communities (P tests; Dataset S14). Un practices are separable a the WUF metric (Fig.
  30. 30. Rice: Functional Enrichment x Genotype and mitochondrial) reads to analyze microbial abundance in the endosphere over time (Fig. 6A). Using this technique, we confirmed the sterility of seedling roots before transplantation. (13 d) approach the endosphere and rhizoplane microbiome compositions for plants that have been grown in the greenhouse for 42 d. Fig. 5. OTU coabundance network reveals modules of OTUs associated with methane cycling. (A) Subset of the entire network corresponding to 11 modules with methane cycling potential. Each node represents one OTU and an edge is drawn between OTUs if they share a Pearson correlation of greater than or equal to 0.6. (B) Depiction of module 119 showing the relationship between methanogens, syntrophs, methanotrophs, and other methane cycling taxonomies. Each node represents one OTU and is labeled by the presumed function of that OTU’s taxonomy in methane cycling. An edge is drawn between two OTUs if they have a Pearson correlation of greater than or equal to 0.6. (C) Mean abundance profile for OTUs in module 119 across all rhizocompartments and field sites. The position along the x axis corresponds to a different field site. Error bars represent SE. The x and y axes represent no particular scale. Edwards et al. 2015. Structure, variation, and assembly of the root-associated microbiomes of rice. PNAS
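The caption above states the edge rule for the coabundance network: two OTUs are connected when their Pearson correlation is at least 0.6. The sketch below applies that rule to a small table of abundances. The OTU names and values are hypothetical, and the module detection step mentioned in the caption is not attempted here.

```python
import math
from itertools import combinations


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # a constant profile is treated as uncorrelated
    return sxy / math.sqrt(sxx * syy)


def coabundance_edges(abundance, threshold=0.6):
    """Return OTU pairs whose abundance profiles correlate at >= threshold."""
    return [
        (a, b)
        for a, b in combinations(sorted(abundance), 2)
        if pearson(abundance[a], abundance[b]) >= threshold
    ]


# Hypothetical abundances of three OTUs across four samples.
toy_abundance = {
    "otu1": [1, 2, 3, 4],
    "otu2": [2, 4, 6, 8],
    "otu3": [4, 3, 2, 1],
}
print(coabundance_edges(toy_abundance))
```

Only positive correlations at or above the threshold produce edges in this sketch, so strongly anti-correlated OTUs stay unconnected.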
  31. 31. Rice Developmental Time Series [...] of magnitude greater than in any single plant species [...] Under controlled greenhouse conditions, the rhizocomp[artments] described the largest source of variation in the microb[ial com]munities sampled (Dataset S5A). The pattern of separ[ation be]tween the microbial communities in each compar[tment is] consistent with a spatial gradient from the bulk soil a[cross the] rhizosphere and rhizoplane into the endosphere (F[ig. ...]). Similarly, microbial diversity patterns within samples [...] same pattern where there is a gradient in α-diversity [from the] rhizosphere to the endosphere (Fig. 1B). Enrichment [and de]pletion of certain microbes across the rhizocompartme[nts indi]cates that microbial colonization of rice roots is not a [passive] process and that plants have the ability to select for ce[rtain mi]crobial consortia or that some microbes are better at f[illing the] root colonizing niche. Similar to studies in Arabidopsis, w[e found] that the relative abundance of Proteobacteria is increas[ed in the] endosphere compared with soil, and that the relative abu[ndances] of Acidobacteria and Gemmatimonadetes decrease from [the soil] to the endosphere (9–11), suggesting that the distrib[ution of] different bacterial phyla inside the roots might be simil[ar for all] land plants (Fig. 1D and Dataset S6). Under controlle[d green]house conditions, soil type described the second large[st source] of variation within the microbial communities of each [...]. However, the soil source did not affect the pattern of se[paration] between the rhizospheric compartments, suggesting [that the] rhizocompartments exert a recruitment effect on micro[bial con]sortia independent of the microbiome source. By using differential OTU abundance analysis in t[he com]partments, we observed that the rhizosphere serves an [enrich]ment role for a subset of microbial OTUs relative to [...] (Fig. 2). Further, the majority of the OTUs enriche[d in the] rhizosphere are simultaneously enriched in the rhizoplan[e and] endosphere of rice roots (Fig. 2B and SI Appendix, Fig[. ...]), consistent with a recruitment model in which factors pro[duced by] the root attract taxa that can colonize the endosphere. W[e ...] that the rhizoplane, although enriched for OTUs that [are also] enriched in the endosphere, is also uniquely enriched for [a subset] of OTUs, suggesting that the rhizoplane serves as a sp[...] Edwards et al. 2015. Structure, variation, and assembly of the root-associated microbiomes of rice. PNAS.
  32. 32. Tree from Woese. 1987. Microbiological Reviews 51:221 Example III: rRNA Not Perfect Lesson 5: Nothing is Perfect
  33. 33. Tree from Woese. 1987. Microbiological Reviews 51:221 Taxa Phylogeny III: rRNA Not Perfect
  34. 34. rRNA Copy # Correction by Phylogeny Kembel SW, Wu M, Eisen JA, Green JL (2012) Incorporating 16S Gene Copy Number Information Improves Estimates of Microbial Diversity and Abundance. PLoS Comput Biol 8(10): e1002743. doi:10.1371/journal.pcbi.1002743 Jessica Green @jessicaleegreen Steven Kembel @stevenkembel Martin Wu
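The idea behind copy-number correction can be illustrated with a small sketch: a taxon with many 16S copies per genome is over-represented in read counts, so each count is divided by an estimated copy number before relative abundances are computed. Everything named here is an assumption for illustration (the function `correct_for_copy_number`, the taxon labels, and the copy numbers); in Kembel et al. the copy numbers are estimated from phylogeny, a step this sketch takes as given.

```python
def correct_for_copy_number(read_counts, copy_numbers):
    """Convert 16S read counts to copy-number-corrected relative abundances.

    read_counts:  dict taxon -> number of 16S reads
    copy_numbers: dict taxon -> estimated 16S copies per genome (assumed known)
    """
    # Reads divided by copies per genome approximates the number of genomes (cells).
    corrected = {t: read_counts[t] / copy_numbers[t] for t in read_counts}
    total = sum(corrected.values())
    return {t: value / total for t, value in corrected.items()}

# Hypothetical example: taxon_A dominates the reads only because it has 7 copies.
reads = {"taxon_A": 700, "taxon_B": 300}
copies = {"taxon_A": 7, "taxon_B": 1}
relative = correct_for_copy_number(reads, copies)
print(relative)
```

In this toy case the uncorrected read fractions are 0.70 and 0.30, while the corrected abundances swap the ranking, which is the kind of shift the slide title refers to.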
  35. 35. DNA extraction PCR Sequence all genes Phylogenetic tree Shotgun GeneX E. coli Humans GeneX Yeast GeneX GeneX Phylotyping Phylogeny in Shotgun Metagenomics
  36. 36. RecA vs. rRNA Eisen 1995 Journal of Molecular Evolution 41: 1105-1123.
  37. 37. RecA vs. rRNA Eisen 1995 Journal of Molecular Evolution 41: 1105-1123. Lesson 6: Keep Going Back to Your Past
  38. 38. Phylotyping w/ Protein Markers AMPHORA Wu and Eisen. Genome Biology 2008, Volume 9, Issue 10, Article R151. http://genomebiology.com/2008/9/10/R151 [Figure: relative abundance (y axis, 0 to 0.8) of each taxonomic group as estimated from each marker gene. Taxonomic groups: Alphaproteobacteria, Betaproteobacteria, Gammaproteobacteria, Deltaproteobacteria, Epsilonproteobacteria, Unclassified proteobacteria, Bacteroidetes, Chlamydiae, Cyanobacteria, Acidobacteria, Thermotogae, Fusobacteria, Actinobacteria, Aquificae, Planctomycetes, Spirochaetes, Firmicutes, Chloroflexi, Chlorobi, Unclassified bacteria. Marker genes: dnaG, frr, infC, nusA, pgk, pyrG, rplA, rplB, rplC, rplD, rplE, rplF, rplK, rplL, rplM, rplN, rplP, rplS, rplT, rpmA, rpoB, rpsB, rpsC, rpsE, rpsI, rpsJ, rpsK, rpsM, rpsS, smpB, tsf.] Martin Wu
  39. 39. GOS 1 GOS 2 GOS 3 GOS 4 GOS 5 Phylogenetic ID of Novel Lineages Wu et al PLoS One 2011 Dongying Wu
  40. 40. Phylogenetic Diversity of Metagenomes [...] typically used as a qualitative measure because duplicate sequences are usually removed from the tree. However, the test may be used in a semiquantitative manner if all clones, even those with identical or near-identical sequences, are included in the tree (13). Here we describe a quantitative version of UniFrac that we call “weighted UniFrac.” We show that weighted UniFrac behaves similarly to the FST test in situations where both a[...] FIG. 1. Calculation of the unweighted and the weighted UniFrac measures. Squares and circles represent sequences from two different environments. (a) In unweighted UniFrac, the distance between the circle and square communities is calculated as the fraction of the branch length that has descendants from either the square or the circle environment (black) but not both (gray). (b) In weighted UniFrac, branch lengths are weighted by the relative abundance of sequences in the square and circle communities; square sequences are weighted twice as much as circle sequences because there are twice as many total circle sequences in the data set. The width of branches is proportional to the degree to which each branch is weighted in the calculations, and gray branches have no weight. Branches 1 and 2 have heavy weights since the descendants are biased toward the square and circles, respectively. Branch 3 contributes no value since it has an equal contribution from circle and square sequences after normalization. Kembel SW, Eisen JA, Pollard KS, Green JL (2011) The Phylogenetic Diversity of Metagenomes. PLoS ONE 6(8): e23214. doi:10.1371/journal.pone.0023214 Jessica Green Steven Kembel Katie Pollard
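The two measures described in the figure caption can be sketched on a toy tree. This is a minimal illustration, not the published implementation: the tree is reduced to a list of branches, each a hypothetical tuple of (branch length, number of square descendants, number of circle descendants), and the weighted value is the raw, unnormalized sum.

```python
def unweighted_unifrac(branches):
    """Fraction of branch length leading to descendants from only one community."""
    unique = sum(length for length, sq, ci in branches if (sq > 0) != (ci > 0))
    total = sum(length for length, sq, ci in branches if sq > 0 or ci > 0)
    return unique / total

def weighted_unifrac(branches, square_total, circle_total):
    """Raw weighted UniFrac: branch length times the difference in the
    fraction of each community's sequences descending from that branch."""
    return sum(
        length * abs(sq / square_total - ci / circle_total)
        for length, sq, ci in branches
    )

# Hypothetical four-tip tree: (length, square descendants, circle descendants).
branches = [
    (1.0, 3, 0),  # tip holding three square sequences
    (1.0, 0, 1),  # tip holding one circle sequence
    (1.0, 1, 0),  # tip holding one square sequence
    (1.0, 0, 1),  # tip holding one circle sequence
    (0.5, 3, 1),  # internal branch above tips 1 and 2
    (0.5, 1, 1),  # internal branch above tips 3 and 4
]
u = unweighted_unifrac(branches)
w = weighted_unifrac(branches, square_total=4, circle_total=2)
print(u, w)
```

The internal branches are shared by both communities, so they add nothing to the unweighted numerator, but they still contribute to the weighted sum because the two communities are represented on them in unequal proportions.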
  41. 41. PhyloSift/pplacer Workflow [Workflow diagram. Input sequences: each input sequence is scanned against both workflows (rRNA workflow and protein workflow; parallel option). Steps shown: LAST fast candidate search (search input against references), with separate branches for sequences <600 bp and >600 bp; hmmalign or Infernal multiple alignment (profile HMMs used to align candidates to the reference alignment); pplacer phylogenetic placement; taxonomic summaries; sample analysis & comparison (Krona plots, number of reads placed for each marker gene, Edge PCA, tree visualization, Bayes factor tests).] Aaron Darling @koadman Erik Matsen @ematsen Holly Bik @hollybik Guillaume Jospin @guillaumejospin Erik Lowe. Darling AE, Jospin G, Lowe E, Matsen FA IV, Bik HM, Eisen JA. (2014) PhyloSift: phylogenetic analysis of genomes and metagenomes. PeerJ 2:e243 http://dx.doi.org/10.7717/peerj.243
  42. 42. Whole Genome Tree of 2000 Taxa Lang JM, Darling AE, Eisen JA (2013) Phylogeny of Bacterial and Archaeal Genomes Using Conserved Genes: Supertrees and Supermatrices. PLoS ONE 8(4): e62510. doi:10.1371/journal.pone.0062510 Jenna Lang @jennnomics Aaron Darling @koadman
  43. 43. Phylosift Markers • PMPROK – Dongying Wu’s Bac/Arch markers • Eukaryotic Orthologs – Parfrey 2011 paper • 16S/18S rRNA • Mitochondria - protein-coding genes • Viral Markers – Markov clustering on genomes • Codon Subtrees – finer scale taxonomy • Extended Markers – plastids, gene families
  44. 44. PhyEco Markers
      Phylogenetic group       Genome Number   Gene Number   Marker Candidates
      Archaea                  62              145415        106
      Actinobacteria           63              267783        136
      Alphaproteobacteria      94              347287        121
      Betaproteobacteria       56              266362        311
      Gammaproteobacteria      126             483632        118
      Deltaproteobacteria      25              102115        206
      Epsilonproteobacteria    18              33416         455
      Bacteroides              25              71531         286
      Chlamydiae               13              13823         560
      Chloroflexi              10              33577         323
      Cyanobacteria            36              124080        590
      Firmicutes               106             312309        87
      Spirochaetes             18              38832         176
      Thermi                   5               14160         974
      Thermotogae              9               17037         684
      Wu D, Jospin G, Eisen JA (2013) Systematic Identification of Gene Families for Use as “Markers” for Phylogenetic and Phylogeny-Driven Ecological Studies of Bacteria and Archaea and Their Major Subgroups. PLoS ONE 8(10): e77033. doi:10.1371/journal.pone.0077033
  45. 45. Edge PCA: Identify lineages that explain most variation among samples Edge PCA - Matsen and Evans 2013 Output: Edge PCA
  46. 46. QIIME Phylotyping and Phylogenetic Ecology Fig. S6. A set of 96 OTUs mainly consisting of Proteobacteria is [...] compartment in the greenhouse experiment. (A) Number of OTU[s ...] they belong to that are enriched across all rhizocompartments in the [...] A subset of the Proteobacteria and the classes and families they belo[ng to ...] enriched across all rhizocompartments in the greenhouse. https://evomics.org/2014/01/the-glories-of-the-gut-ask-a-fat-mouse/ Lesson 7: Don’t Accept When You Are Defeated
  47. 47. Example IV: Functional Evolution
  48. 48. My Study Organisms Tree from Woese. 1987. Microbiological Reviews 51:221
  49. 49. 1st Genome Sequence Fleischmann et al. 1995
  50. 50. TIGR Genome Projects Tree from Woese. 1987. Microbiological Reviews 51:221
  51. 51. 1st Genome Sequence Fleischmann et al. 1995 Lesson 8: If you can’t beat them, critique them or join them
  52. 52. • Leveraging an understanding of the evolution of function to better predict functions Function & Phylogeny
  53. 53. PHYLOGENETIC PREDICTION OF GENE FUNCTION [Diagram. METHOD steps: CHOOSE GENE(S) OF INTEREST; IDENTIFY HOMOLOGS; ALIGN SEQUENCES; CALCULATE GENE TREE; OVERLAY KNOWN FUNCTIONS ONTO TREE; INFER LIKELY FUNCTION OF GENE(S) OF INTEREST. Panels: EXAMPLE A, EXAMPLE B, and ACTUAL EVOLUTION (ASSUMED TO BE UNKNOWN), drawn for genes 1 to 6 and paralogs 1A, 2A, 3A, 1B, 2B, 3B from Species 1, Species 2, and Species 3, with branches labeled "Duplication", "Duplication?", and "Ambiguous".] Based on Eisen, 1998 Genome Res 8: 163-167. Phylogenomics
  54. 54. PHYLOGENETIC PREDICTION OF GENE FUNCTION [Diagram. METHOD steps: CHOOSE GENE(S) OF INTEREST; IDENTIFY HOMOLOGS; ALIGN SEQUENCES; CALCULATE GENE TREE; OVERLAY KNOWN FUNCTIONS ONTO TREE; INFER LIKELY FUNCTION OF GENE(S) OF INTEREST. Panels: EXAMPLE A, EXAMPLE B, and ACTUAL EVOLUTION (ASSUMED TO BE UNKNOWN), drawn for genes 1 to 6 and paralogs 1A, 2A, 3A, 1B, 2B, 3B from Species 1, Species 2, and Species 3, with branches labeled "Duplication", "Duplication?", and "Ambiguous".] Based on Eisen, 1998 Genome Res 8: 163-167. Phylogenomics Lesson 9: If you invent your own omics word, you are stuck with it so use it for branding
  55. 55. Phylogenomics ~~ Phylotyping Eisen et al. 1992. J. Bact. 174: 3416
  56. 56. Phylogenomics ~~ Phylotyping Eisen et al. 1992. J. Bact. 174: 3416 Lesson 10: Stealing (with acknowledgement) is OK
  57. 57. Proteorhodopsin Functional Diversity Venter et al., Science 304: 66. 2004
  58. 58. • Leveraging understanding of gene gain and loss to better predict genome functions Lesson 11: Who you hang out with matters
  59. 59. Carboxydothermus hydrogenoformans • Isolated from a Russian hot spring • Thermophile (grows at 80°C) • Anaerobic • Grows very efficiently on CO (carbon monoxide) • Produces hydrogen gas • Low GC Gram positive (Firmicute) • Genome determined (Wu et al. 2005 PLoS Genetics 1: e65.)
  60. 60. Homologs of Sporulation Genes Wu et al. 2005 PLoS Genetics 1: e65.
  61. 61. Carboxydothermus sporulates Wu et al. 2005 PLoS Genetics 1: e65.
  62. 62. Non-Homology Predictions: Phylogenetic Profiling • Step 1: Search all genes in the organisms of interest against all other genomes • Step 2: Ask, yes or no, whether each gene is found in each other species • Step 3: Cluster genes by distribution patterns (profiles)
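The profiling steps on this slide can be sketched with presence/absence vectors. The example below is deliberately simplified and rests on assumptions: gene names and genomes are hypothetical, the search step is taken as already done (each gene arrives with a yes/no vector across genomes), and "clustering" is reduced to grouping genes with identical profiles, whereas a real analysis would typically cluster on a distance between profiles.

```python
from collections import defaultdict

def cluster_by_profile(presence):
    """Group genes whose presence/absence profiles across genomes are identical.

    presence: dict gene -> list of 0/1 flags, one per genome searched.
    """
    clusters = defaultdict(list)
    for gene, profile in presence.items():
        clusters[tuple(profile)].append(gene)
    return dict(clusters)

# Hypothetical profiles across four genomes; suppose only the first two sporulate.
presence = {
    "gene1": [1, 1, 0, 0],
    "gene2": [1, 1, 0, 0],
    "gene3": [1, 1, 1, 1],
}
clusters = cluster_by_profile(presence)
for profile, genes in clusters.items():
    print(profile, genes)
```

Genes that share a restricted profile with known members of a pathway (here gene1 and gene2) become candidates for that pathway even without sequence similarity to any characterized gene, which is the point of a non-homology prediction.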
  63. 63. Sporulation Gene Profile Wu et al. 2005 PLoS Genetics 1: e65.
  64. 64. B. subtilis new sporulation genes J Bacteriol. 2013 Jan;195(2):253-60. doi: 10.1128/JB.01778-12 Bjorn Traag Richard Losick
  65. 65. Tree from Woese. 1987. Microbiological Reviews 51:221 Example V: More Gaps Lesson 12: Keep Returning to the Same Theme Over and Over and Over
  66. 66. Yet Another Map Figure from Barton, Eisen et al. “Evolution”, CSHL Press based on Baldauf et al Tree
  67. 67. Genomes Poorly Sampled Figure from Barton, Eisen et al. “Evolution”, CSHL Press based on Baldauf et al Tree
  68. 68. TIGR Tree of Life Project Figure from Barton, Eisen et al. “Evolution”, CSHL Press based on Baldauf et al Tree
  69. 69. Genomic Encyclopedia of Bacteria & Archaea Wu et al. 2009 Nature 462, 1056-1060 Figure from Barton, Eisen et al. “Evolution”, CSHL Press based on Baldauf et al Tree
  70. 70. Genomic Encyclopedia of Bacteria & Archaea Wu et al. 2009 Nature 462, 1056-1060 Figure from Barton, Eisen et al. “Evolution”, CSHL Press based on Baldauf et al Tree
  71. 71. Family Diversity vs. PD Wu et al. 2009 Nature 462, 1056-1060
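  72. 72. GEBA Cyanobacteria Shih et al. 2013. PNAS 10.1073/pnas.1217107110 [Figure: phylogenetic tree (scale bar 0.3) with clades labeled A, B1, B2, B3, C1, C2, C3, D, E, F, G and plastid lineages labeled Paulinella, Glaucophyte, Green, Red, Chromalveolates; panels A and B. Caption truncated.]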
  73. 73. Haloarchaeal GEBA-like Lynch et al. (2012) PLoS ONE 7(7): e41389. doi:10.1371/journal.pone.0041389
  74. 74. The Dark Matter of Biology From Wu et al. 2009 Nature 462, 1056-1060
  75. 75. Number of SAGs from Candidate Phyla
      Site                                     OD1   OP11   OP3   SAR406
      Site A: Hydrothermal vent                4     1      -     -
      Site B: Gold Mine                        6     13     2     -
      Site C: Tropical gyres (Mesopelagic)     -     -      -     2
      Site D: Tropical gyres (Photic zone)     1     -      -     -
      Sample collections at 4 additional sites are underway. Phil Hugenholtz GEBA Uncultured
  76. 76. JGI Dark Matter Project [Figure: pipeline: environmental samples (n=9); isolation of single cells (n=9,600); whole genome amplification (n=3,300); SSU rRNA gene based identification (n=2,000); genome sequencing, assembly and QC (n=201); draft genomes (n=201). Sampling-site codes: SAK, HSM, ETL, TG, HOT, GOM, GBS, EPR, TA. Habitat legend: seawater, brackish/freshwater, hydrothermal, sediment, bioreactor. Tree labels: GN04, WS3 (Latescibacteria), GN01, [label illegible], LD1, WS1, Poribacteria, BRC1, Lentisphaerae, Verrucomicrobia, OP3 (Omnitrophica), Chlamydiae, Planctomycetes, NKB19 (Hydrogenedentes), WYO, Armatimonadetes, WS4, Actinobacteria, Gemmatimonadetes, NC10, SC4, WS2, Cyanobacteria, WPS-2, Deltaproteobacteria, EM19 (Calescamantes), OctSpA1-106 (Fervidibacteria)]
  77. 77. [Tree labels, continued: GAL35, Aquificae, EM3, Thermotogae, Dictyoglomi, SPAM, GAL15, CD12 (Aerophobetes), OP8 (Aminicenantes), AC1, SBR1093, Thermodesulfobacteria, Deferribacteres, Synergistetes, OP9 (Atribacteria), WPS-2, Caldiserica, AD3, Chloroflexi, Acidobacteria, Elusimicrobia, Nitrospirae, 49S1 2B, Caldithrix, GOUTA4, SAR406 (Marinimicrobia)]
  78. 78. [Tree labels, continued: Chlorobi, Firmicutes, Tenericutes, Fusobacteria, Chrysiogenetes, Proteobacteria, Fibrobacteres, TG3, Spirochaetes, WWE1 (Cloacamonetes), TM[...], ZB3, MVP-[...], Deinococcus-Thermus, OP1 (Acetothermia), Bacteroidetes, TM7, GN02 (Gracilibacteria), SR1, BH1, OD1 (Parcubacteria), WS[...], OP11 (Microgenomates), Euryarchaeota, Micrarchaea, DSEG (Aenigmarchaea), Nanohaloarchaea, Nanoarchaea, Cren MCG, Thaumarchaeota, Cren C2, Aigarchaeota, Cren pISA7, Cren Thermoprotei, Korarchaeota, pMC2A384 (Diapherotrites). Domain labels: BACTERIA, ARCHAEA. Panel labels: archaeal toxins (Nanoarchaea); lytic murein transglycosylase; murein (peptidoglycan) with tetrapeptide; stringent response (Diapherotrites, Nanoarchaea): ppGpp, SpoT, RelA, DksA, GTP or GDP + ATP, (GTP or GDP) + PPi, limiting amino acids, limiting phosphate, fatty acids, carbon, iron, expression of components for stress response; sigma factor (Diapherotrites, Nanoarchaea): σ1 to σ4, β, β′, αNTD, αCTD, -35, -10, RNA polymerase; oxidoreductase: e- donor, e- acceptor, reduction, oxidation, NAD; HGT from Eukaryotes (Nanoarchaea); archaeal type purine synthesis (Microgenomates): PurF, PurD, PurN, PurL/Q, PurM, PurK, PurE, Pur[...], PurB, PurP, PRPP, FAICAR, AICAR, IMP, adenine, guanine; growing AA chain, tRNAGly]
  79. 79. [Panel labels, continued: recognizes UGA; mRNA; UGA recoded for Gly (Gracilibacteria); ribosome] Woyke et al. Nature 2013.
  80. 80. A Genomic Encyclopedia of Microbes (GEM) Figure from Barton, Eisen et al. “Evolution”, CSHL Press based on Baldauf et al Tree
  81. 81. Tetrahymena Genome Project
  82. 82. A Genomic Encyclopedia of Microbes (GEM) Figure from Barton, Eisen et al. “Evolution”, CSHL Press based on Baldauf et al Tree
  83. 83. Tree from Woese. 1987. Microbiological Reviews 51:221 Example VI: Beyond Sequence Lesson 13: Don’t Overdo It With That Theme
  84. 84. DNA extraction PCR Sequence all genes Shotgun Shotgun Metagenomics
  85. 85. Wu et al. 2006 PLoS Biology 4: e188. Baumannia makes vitamins and cofactors Sulcia makes amino acids Phylogenetic Binning
  86. 86. HiC Crosslinking Sequencing Beitel CW, Froenicke L, Lang JM, Korf IF, Michelmore RW, Eisen JA, Darling AE. (2014) Strain- and plasmid-level deconvolution of a synthetic metagenome by sequencing proximity ligation products. PeerJ 2:e415 http://dx.doi.org/10.7717/peerj.415
      Table 1 Species alignment fractions. The number of reads aligning to each replicon present in the synthetic microbial community are shown before and after filtering, along with the percent of total constituted by each species. The GC content (“GC”) and restriction site counts (“#R.S.”) of each replicon, species, and strain are shown. Bur1: B. thailandensis chromosome 1. Bur2: B. thailandensis chromosome 2. Lac0: L. brevis chromosome, Lac1: L. brevis plasmid 1, Lac2: L. brevis plasmid 2, Ped: P. pentosaceus, K12: E. coli K12 DH10B, BL21: E. coli BL21. An expanded version of this table can be found in Table S2.
      Sequence   Alignment    % of Total   Filtered     % of aligned   Length      GC      #R.S.
      Lac0       10,603,204   26.17%       10,269,562   96.85%         2,291,220   0.462   629
      Lac1       145,718      0.36%        145,478      99.84%         13,413      0.386   3
      Lac2       691,723      1.71%        665,825      96.26%         35,595      0.385   16
      Lac        11,440,645   28.23%       11,080,865   96.86%         2,340,228   0.46    648
      Ped        2,084,595    5.14%        2,022,870    97.04%         1,832,387   0.373   863
      BL21       12,882,177   31.79%       2,676,458    20.78%         4,558,953   0.508   508
      K12        9,693,726    23.92%       1,218,281    12.57%         4,686,137   0.507   568
      E. coli    22,575,903   55.71%       3,894,739    17.25%         9,245,090   0.51    1076
      Bur1       1,886,054    4.65%        1,797,745    95.32%         2,914,771   0.68    144
      Bur2       2,536,569    6.26%        2,464,534    97.16%         3,809,201   0.672   225
      Bur        4,422,623    10.91%       4,262,279    96.37%         6,723,972   0.68    369
      Figure 1 Hi-C insert distribution. The distribution of genomic distances between Hi-C read pairs is shown for read pairs mapping to each chromosome. For each read pair the minimum path length on the circular chromosome was calculated and read pairs separated by less than 1000 bp were discarded. The 2.5 Mb range was divided into 100 bins of equal size and the number of read pairs in each bin was recorded for each chromosome. Bin values for each chromosome were normalized to sum to 1 and plotted.
      [...] E. coli K12 genome were distributed in a similar manner as previously reported (Fig. 1; (Lieberman-Aiden et al., 2009)). We observed a minor depletion of alignments spanning the linearization point of the E. coli K12 assembly (e.g., near coordinates 0 and 4686137) due to edge effects induced by BWA treating the sequence as a linear chromosome rather than circular.
      Figure 2 Metagenomic Hi-C associations. The log-scaled, normalized number of Hi-C read pairs associating each genomic replicon in the synthetic community is shown as a heat map (see color scale, blue to yellow: low to high normalized, log scaled association rates). Bur1: B. thailandensis chromosome 1. Bur2: B. thailandensis chromosome 2. Lac0: L. brevis chromosome, Lac1: L. brevis plasmid 1, Lac2: L. brevis plasmid 2, Ped: P. pentosaceus, K12: E. coli K12 DH10B, BL21: E. coli BL21.
      [...] reference assemblies of the members of our synthetic microbial community with the same alignment parameters as were used in the top ranked clustering (described above). We first [...]
      Figure 3 Contigs associated by Hi-C reads. A graph is drawn with nodes depicting contigs and edges depicting associations between contigs as indicated by aligned Hi-C read pairs, with the count thereof depicted by the weight of edges. Nodes are colored to reflect the species to which they belong (see legend) with node size reflecting contig size. Contigs below 5 kb and edges with weights less than 5 were excluded. Contig associations were normalized for variation in contig size.
      [...] typically represent the reads and variant sites as a variant graph wherein variant sites are represented as nodes, and sequence reads define edges between variant sites observed in the same read (or read pair). We reasoned that variant graphs constructed from Hi-C data would have much greater connectivity (where connectivity is defined as the mean path length between randomly sampled variant positions) than graphs constructed from mate-pair sequencing data, simply because Hi-C inserts span megabase distances. Such [...]
      Figure 4 Hi-C contact maps for replicons of Lactobacillus brevis. Contact maps show the number of Hi-C read pairs associating each region of the L. brevis genome. The L. brevis chromosome (Lac0, (A), [...]
      Chris Beitel @datscimed Aaron Darling @koadman
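The procedure in the Figure 1 caption (minimum path length on a circular chromosome, discard pairs closer than 1000 bp, split the 2.5 Mb range into 100 equal bins, normalize to sum to 1) can be sketched as follows. The function names and the toy read-pair coordinates are hypothetical, and dropping pairs at or beyond the 2.5 Mb limit is an assumption of this sketch; only the chromosome length (4,686,137, the E. coli K12 value from Table 1) and the thresholds come from the slide.

```python
def circular_distance(pos_a, pos_b, chrom_len):
    """Minimum path length between two positions on a circular chromosome."""
    d = abs(pos_a - pos_b) % chrom_len
    return min(d, chrom_len - d)

def insert_distribution(pairs, chrom_len, min_dist=1000,
                        max_dist=2_500_000, n_bins=100):
    """Bin read-pair distances into equal-width bins and normalize to sum to 1."""
    counts = [0] * n_bins
    bin_width = max_dist / n_bins
    for pos_a, pos_b in pairs:
        d = circular_distance(pos_a, pos_b, chrom_len)
        if d < min_dist or d >= max_dist:
            continue  # too close to be informative, or outside the binned range
        counts[int(d // bin_width)] += 1
    total = sum(counts)
    return [c / total for c in counts] if total else counts

# Hypothetical read pairs on a chromosome of E. coli K12 length.
K12_LEN = 4_686_137
pairs = [
    (100, 600),          # 500 bp apart: discarded
    (1000, 4_685_137),   # wraps around the origin: 2000 bp apart
    (0, 30_000),         # 30 kb apart
    (0, 2_000_000),      # 2 Mb apart
]
wrap = circular_distance(1000, 4_685_137, K12_LEN)
dist = insert_distribution(pairs, K12_LEN)
print(wrap, [round(x, 3) for x in dist if x > 0])
```

Measuring distance around the circle is what keeps pairs that straddle the linearization point from being counted as multi-megabase inserts; the depletion near coordinates 0 and 4686137 mentioned in the excerpt arises earlier, at the alignment step, and this sketch does not address it.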
  87. 87. Sequence Isn’t Everything PB-PSB1 (Purple sulfur bacteria) PB-SRB1 (Sulfate reducing bacteria) (sulfate) (sulfide) Wilbanks, E.G. et al (2014). Environmental Microbiology Lizzy Wilbanks @lizzywilbanks
  88. 88. 12C, 12C14N, 32S Biomass (RGB composite) 34S-incorporation (34S/32S ratio; scale 0.044 to 0.080) Wilbanks, E.G. et al (2014). Environmental Microbiology Transfer of 34S from SRB to PSB
  89. 89. Long Reads Help, A Lot HiSeq, MiSeq: 100-250 bp. Moleculo: 2-20 kb (“Illumina-based synthetic long reads”). PacBio RS II: 2-20 kb (real-time single molecule sequencing; P4-C2, P5-C3). Micky Kertesz, Tim Blauwcamp, Meredith Ashby, Cheryl Heiner. 295 Megabases, 474 Megabases, 61 Gigabases
  90. 90. Light-responsive sulfate reducer? rhodopsin w/ Susumu Yoshizawa
  91. 91. Lesson 14: Asking for, and getting, help, is a good thing
  92. 92. Seagrass Microbiome 1000 samples collected. Not a blade of seagrass touched. YEAR ONE
  93. 93. ZEN (Zostera Experimental Network): 25 partner sites; leaves, roots, sediment, and water samples
  94. 94. MICROBES
  95. 95. Acknowledgements
      • GEBA ($$: DOE-JGI, DSMZ): Eddy Rubin, Phil Hugenholtz, Hans-Peter Klenk, Nikos Kyrpides, Tanya Woyke, Dongying Wu, Aaron Darling, Jenna Lang
      • GEBA Cyanobacteria ($$: DOE-JGI): Cheryl Kerfeld, Dongying Wu, Patrick Shih
      • Haloarchaea ($$$: NSF): Marc Facciotti, Aaron Darling, Erin Lynch
      • Phylosift ($$$: DHS): Aaron Darling, Erik Matsen, Holly Bik, Guillaume Jospin
      • iSEEM ($$: GBMF): Katie Pollard, Jessica Green, Martin Wu, Steven Kembel, Tom Sharpton, Morgan Langille, Guillaume Jospin, Dongying Wu
      • aTOL ($$: NSF): Naomi Ward, Jonathan Badger, Frank Robb, Martin Wu, Dongying Wu
      • Others (not mentioned in detail) ($$: NSF, NIH, DOE, GBMF, DARPA, Sloan): Frank Robb, Craig Venter, Doug Rusch, Shibu Yooseph, Nancy Moran, Colleen Cavanaugh, Josh Weitz
      • EisenLab: Srijak Bhatnagar, Russell Neches, Lizzy Wilbanks, Holly Bik
