Phylogenetic approaches to metagenomic analysis #KSMicro talk by Jonathan Eisen

Slides from talk by Jonathan Eisen at the Keystone meeting on "microbial communities as drivers of ecosystem complexity"

Published in: Health & Medicine
  • Phylogenetic analysis of rRNAs led to the discovery of archaea.
  • This is a tree of an rRNA gene that was found on a large DNA fragment isolated from Monterey Bay. This rRNA gene groups in a tree with genes from members of the gamma Proteobacteria, a group that includes E. coli as well as many environmental bacteria. This rRNA phylotype has been found to be a dominant species in many ocean ecosystems. ... clone from the Sargasso Sea. This shows that this ...
  • Gets better with more markers, but we do not have lots of sequences for these markers. We can get them from genomes. The more diverse the genomes, the better the marker set will be.
  • Sites of imaginative potential – iconic locations – Iconic Sampling. Life is strange – microbes are stranger – how do we capitalize on this?
  • Phylogenetic approaches to metagenomic analysis #KSMicro talk by Jonathan Eisen

    1. 1. Phylogenetic & Phylogenomic Approaches to Metagenomic Analysis Jonathan A. Eisen UC Davis Keystone Meeting #KSMicro March 26, 2011
    2. 2. Outline • Introduction to phylogeny • Phylogeny and metagenomics – Phylotyping – Phylogenetic Binning – Functional diversity and prediction – Phylogenetic ecology – Selecting species or genes for study
    3. 3. T. H. Dobzhansky (1973): “Nothing in biology makes sense except in the light of evolution.”
    4. 4. Evolutionary Perspective and Comparative Biology • Comparative biology is the analysis of differences and similarities between species. • An evolutionary perspective is useful in such studies because it allows one to focus on how and why similarities and differences came to be. • In other words, biological objects have a history, and understanding that history is important.
    5. 5. Phylogeny • Phylogeny is a description of the evolutionary history of relationships among organisms (or their parts). • This is frequently portrayed in a diagram called a phylogenetic tree. • Phylogenies can be more complex than a bifurcating tree (e.g., lateral gene transfer, recombination, hybridization). • History allows one to distinguish homology from convergence and to tease apart issues with rate variation.
    6. 6. Uses of Phylogeny in Metagenomics Example 1: Phylotyping
    7. 7. rRNA survey • Sequence rRNAs • Cluster into OTUs
    8. 8. rRNA survey • Sequence rRNAs • Cluster into OTUs (OTU1, OTU2, OTU3, OTU4, OTU5, OTU6, OTU7, OTU8, OTU9, OTU10)
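The "cluster into OTUs" step of the rRNA survey slides can be illustrated with a minimal sketch. It assumes the rRNA sequences are already aligned to equal length and uses a greedy, seed-based clustering at an identity threshold; the function names, the toy reads and the thresholds are hypothetical and are not taken from the talk. Fragmentary shotgun reads generally do not overlap like this, which is why a dedicated workflow is needed for metagenomic data.

```python
from typing import Dict, List


def identity(a: str, b: str) -> float:
    """Fraction of identical positions between two aligned, equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    # Ignore columns where either sequence has a gap.
    compared = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not compared:
        return 0.0
    return sum(x == y for x, y in compared) / len(compared)


def cluster_otus(seqs: Dict[str, str], threshold: float = 0.97) -> List[List[str]]:
    """Greedy clustering: each sequence joins the first OTU whose seed it matches."""
    otus: List[List[str]] = []  # each OTU is a list of read names; the first is the seed
    for name, seq in seqs.items():
        for otu in otus:
            if identity(seq, seqs[otu[0]]) >= threshold:
                otu.append(name)
                break
        else:
            otus.append([name])  # no seed was close enough, so start a new OTU
    return otus


# Hypothetical toy reads (not real rRNA sequences).
reads = {
    "r1": "ACGTACGTAC",
    "r2": "ACGTACGTAC",
    "r3": "ACGTACGTAT",  # one mismatch in ten positions relative to r1
    "r4": "TTTTACGGGG",
}
otus_90 = cluster_otus(reads, threshold=0.90)
otus_97 = cluster_otus(reads, threshold=0.97)
```

Lowering the threshold merges r3 into the OTU seeded by r1, which shows how the number of OTUs depends on the chosen cutoff.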
    9. 9. OTUs on Tree OTU1 OTU5 OTU4 OTU6 OTU2 OTU3 OTU7 OTU9 OTU8 OTU10
    10. 10. Uses of Tree • Clades • Rates of change • LGT • Convergence • Character history
    11. 11. rRNA Phylotyping
    12. 12. rRNA Phylotyping • Note - using a tree does not mean phylogeny always matters per se • But allows one to test whether and how it impacts biology, ecology, etc • When it does = homology • When it does not = convergence, HGT, etc
    13. 13. rRNA Phylotyping in Sargasso Sea Metagenomic Data Venter et al., Science 304: 66. 2004
    14. 14. Metagenomic Phylogenetic challenge A single tree with everything
    15. 15. Finding Metagenomic OTUs. PhylOTU: A High-Throughput Procedure Quantifies Microbial Community Diversity and Resolves Novel Taxa from Metagenomic Data. Thomas J. Sharpton (1), Samantha J. Riesenfeld (1), Steven W. Kembel (2), Joshua Ladau (1), James P. O’Dwyer (2,3), Jessica L. Green (2), Jonathan A. Eisen (4), Katherine S. Pollard (1,5). (1) The J. David Gladstone Institutes, University of California San Francisco, San Francisco, California, United States of America; (2) Center for Ecology and Evolutionary Biology, University of Oregon, Eugene, Oregon, United States of America; (3) Institute of Integrative and Comparative Biology, University of Leeds, Leeds, United Kingdom; (4) Department of Evolution and Ecology, University of California Davis, Davis, California, United States of America; (5) Institute for Human Genetics & Division of Biostatistics, University of California San Francisco, San Francisco, California, United States of America.
    Abstract: Microbial diversity is typically characterized by clustering ribosomal RNA (SSU-rRNA) sequences into operational taxonomic units (OTUs). Targeted sequencing of environmental SSU-rRNA markers via PCR may fail to detect OTUs due to biases in priming and amplification. Analysis of shotgun sequenced environmental DNA, known as metagenomics, avoids amplification bias but generates fragmentary, non-overlapping sequence reads that cannot be clustered by existing OTU-finding methods. To circumvent these limitations, we developed PhylOTU, a computational workflow that identifies OTUs from metagenomic SSU-rRNA sequence data through the use of phylogenetic principles and probabilistic sequence profiles. Using simulated metagenomic data, we quantified the accuracy with which PhylOTU clusters reads into OTUs. Comparisons of PCR and shotgun sequenced SSU-rRNA markers derived from the global open ocean revealed that while PCR libraries identify more OTUs per sequenced residue, metagenomic libraries recover a greater taxonomic diversity of OTUs. In addition, we discover novel species, genera and families in the metagenomic libraries, including OTUs from phyla missed by analysis of PCR sequences. Taken together, these results suggest that PhylOTU enables characterization of part of the biosphere currently hidden from PCR-based surveys of diversity.
    Citation: Sharpton TJ, Riesenfeld SJ, Kembel SW, Ladau J, O’Dwyer JP, et al. (2011) PhylOTU: A High-Throughput Procedure Quantifies Microbial Community Diversity and Resolves Novel Taxa from Metagenomic Data. PLoS Comput Biol 7(1): e1001061. doi:10.1371/journal.pcbi.1001061. Editor: Oded Béjà, Technion-Israel Institute of Technology, Israel. Received July 22, 2010; Accepted December 17, 2010; Published January 20, 2011. Copyright: © 2011 Sharpton et al. This is an open-access article distributed under the terms of the Creative Commons Attribution License.
    16. 16. Shotgun Sequencing Allows Use of Alternative Anchors (e.g., RecA) Venter et al., Science 304: 66. 2004
    17. 17. Shotgun Sequencing Allows Use of Other Markers. [Figure: Sargasso phylotypes; bar chart of weighted % of clones (0 to 0.5) by major phylogenetic group (Alphaproteobacteria, Betaproteobacteria, Gammaproteobacteria, Epsilonproteobacteria, Deltaproteobacteria, Cyanobacteria, Firmicutes, Actinobacteria, Chlorobi, CFB, Chloroflexi, Spirochaetes, Fusobacteria, Deinococcus-Thermus, Euryarchaeota, Crenarchaeota) for the markers EFG, EFTu, HSP70, RecA, RpoB and rRNA.] Venter et al., Science 304: 66-74. 2004
    18. 18. AMPHORA Guide tree
    19. 19. [Figure: bar chart (0 to 0.7) by phylogenetic group (Alphaproteobacteria, Betaproteobacteria, Gammaproteobacteria, Deltaproteobacteria, Epsilonproteobacteria, Unclassified Proteobacteria, Cyanobacteria, Chlamydiae, Acidobacteria, Bacteroidetes, Actinobacteria, Aquificae, Planctomycetes, Spirochaetes, Firmicutes, Chloroflexi, Chlorobi, Unclassified Bacteria) for the marker genes frr, tsf, pgk, rpsI, rplL, rplT, rplF, rplE, rplS, infC, rplP, rplA, rplK, rplB, rplN, rplD, rplC, rpsJ, rplM, rpsE, rpsS, rpsK, rpsB, rpsC, pyrG, rpoB, nusA, rpsM, rpmA, dnaG, smpB.]
    20. 20. rRNA Tree of Life: Bacteria, Archaea, Eukaryotes. Figure from Barton, Eisen et al. “Evolution”, CSHL Press. 2007. Based on tree from Pace 1997 Science 276:734-740
    21. 21. rRNA Tree of Life: Bacteria, Archaea, ??????, Eukaryotes. Wu et al. (2011) PLoS ONE 6(3): e18011. doi:10.1371/journal.pone.0018011. Figure from Barton, Eisen et al. “Evolution”, CSHL Press. 2007. Based on tree from Pace 1997 Science 276:734-740
    22. 22. rRNA Tree of Life: Bacteria, Archaea, Eukaryotes. Scanned through GOS data for rRNAs that fit this pattern. Figure from Barton, Eisen et al. “Evolution”, CSHL Press. 2007. Based on tree from Pace 1997 Science 276:734-740
    23. 23. rRNA Tree of Life: Bacteria, Archaea, Eukaryotes. Found many, but closer examination revealed all to have issues. Figure from Barton, Eisen et al. “Evolution”, CSHL Press. 2007. Based on tree from Pace 1997 Science 276:734-740
    24. 24. RecA. J Mol Evol (1995) 41:1105-1123. The RecA Protein as a Model Molecule for Molecular Systematic Studies of Bacteria: Comparison of Trees of RecAs and 16S rRNAs from the Same Species. Jonathan A. Eisen, Department of Biological Sciences, Stanford University, Stanford, CA 94305-5020, USA. Received: 1 July 1995 / Accepted: 25 July 1995.
    Abstract. The evolution of the RecA protein was analyzed using molecular phylogenetic techniques. Phylogenetic trees of all currently available complete RecA proteins were inferred using multiple maximum parsimony and distance matrix methods. Comparison and analysis of the trees reveal that the inferred relationships among these proteins are highly robust. The RecA trees show consistent subdivisions corresponding to many of ...
    Introduction. Molecular systematics has become the primary way to determine evolutionary relationships among microorganisms because morphological and other phenotypic characters are either absent or change too rapidly to be useful for phylogenetic inference (Woese 1987). Not all molecules are equally useful for molecular systematic studies and the molecule of choice for most such studies ...
    25. 25. Homologs in GOS Data. Analysis 1st Done in 2004
    26. 26. GOS 1, GOS 2, GOS 3, GOS 4, GOS 5
    27. 27. GOS 1, GOS 2, GOS 3, GOS 4, Thaumarchaeota
    28. 28. GOS 1, Phage?, Phage?, GOS 4, Thaumarchaeota
    29. 29. ????, Phage?, Phage?, ????, Thaumarchaeota
    30. 30. Uses of Phylogeny in Metagenomics Example 2: Binning
    31. 31. Binning challenge: No reference genome? What do you do?
    32. 32. Binning challenge: No reference genome? What do you do? Phylogeny
    33. 33. CFB Phyla
    34. 34. Sulcia makes amino acids. Baumannia makes vitamins and cofactors. Wu et al. 2006 PLoS Biology 4: e188.
    35. 35. AMPHORA Guide tree
    36. 36. Uses of Phylogeny in Metagenomics Example 3: Functional Diversity and Functional Predictions
    37. 37. Predicting Function • Identification of motifs – Short regions of sequence similarity that are indicative of general activity – e.g., ATP binding • Homology/similarity based methods – Gene sequence is searched against a database of other sequences – If significantly similar genes are found, their functional information is used • Problem – Genes frequently have similarity to hundreds of motifs and multiple genes, not all with the same function
    38. 38. PHYLOGENETIC PREDICTION OF GENE FUNCTION. [Figure: the method shown side by side for Example A (genes 1A, 2A, 3A, 1B, 2B, 3B from Species 1, 2 and 3, with a possible duplication; gene of interest 2A) and Example B (genes 1 to 6; gene of interest 5).] Method: (1) choose gene(s) of interest; (2) identify homologs; (3) align sequences; (4) calculate gene tree; (5) overlay known functions onto tree; (6) infer likely function of gene(s) of interest, which the figure marks as ambiguous in one case. The actual evolution is assumed to be unknown. Based on Eisen, 1998 Genome Res 8: 163-167.
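The last two steps of the method on this slide (overlay known functions onto the tree, then infer the function of the gene of interest) can be sketched in code. The sketch below uses a simplified Fitch-style parsimony pass as a stand-in for the inference step; that choice, the function name `predict_functions`, the tree shapes and the function labels "X" and "Y" are all assumptions for illustration, not the procedure from the talk or the cited paper.

```python
from typing import Dict, Optional, Set, Tuple, Union

Tree = Union[str, Tuple["Tree", "Tree"]]  # a leaf name, or a pair of subtrees


def predict_functions(tree: Tree, known: Dict[str, Optional[str]]) -> Dict[str, Set[str]]:
    """Return the candidate function set for every leaf of a rooted binary gene tree."""
    all_functions = {f for f in known.values() if f is not None}

    def annotate(node):
        # Bottom-up pass: an uncharacterized leaf may carry any function.
        if isinstance(node, str):
            f = known.get(node)
            states = {f} if f is not None else set(all_functions)
            return {"name": node, "states": states, "kids": []}
        kids = [annotate(child) for child in node]
        left, right = kids[0]["states"], kids[1]["states"]
        # Intersection if the children agree on something, otherwise union.
        return {"name": None, "states": (left & right) or (left | right), "kids": kids}

    result: Dict[str, Set[str]] = {}

    def assign(node, parent_states):
        # Top-down pass: keep only states compatible with the parent when possible.
        node["states"] = (node["states"] & parent_states) or node["states"]
        if node["name"] is not None:
            result[node["name"]] = node["states"]
        for kid in node["kids"]:
            assign(kid, node["states"])

    root = annotate(tree)
    assign(root, root["states"])
    return result


# Hypothetical example: two paralog groups (A and B) separated by a duplication.
tree_a = ((("1A", "2A"), "3A"), (("1B", "2B"), "3B"))
known_a = {"1A": "X", "2A": None, "3A": "X", "1B": "Y", "2B": "Y", "3B": "Y"}
prediction_a = predict_functions(tree_a, known_a)["2A"]

# Hypothetical ambiguous case: gene Q branches off before the duplication.
tree_b = ("Q", tree_a)
known_b = {**known_a, "2A": "X", "Q": None}
prediction_b = predict_functions(tree_b, known_b)["Q"]
```

In the first case gene 2A sits inside the clade whose characterized members all carry function X, so only X survives. In the second case Q is sister to both paralog groups, and the tree alone cannot choose between X and Y.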
    39. 39. Massive Diversity of Proteorhodopsins Venter et al., 2004
    40. 40. Uses of Phylogeny in Metagenomics Example 4: Phylogenetic Ecology
    41. 41. Uses of Phylogeny in Metagenomics Example 5: Selecting Organisms for Study
    42. 42. rRNA Tree of Life: Bacteria, Archaea, Eukaryotes. Figure from Barton, Eisen et al. “Evolution”, CSHL Press. 2007. Based on tree from Pace 1997 Science 276:734-740
    43. 43. http://www.jgi.doe.gov/programs/GEBA/pilot.html
    44. 44. GEBA Phylogenomic Lesson 1 Phylogeny-driven genome selection helps discover new genetic diversity
    45. 45. Protein Family Rarefaction Curves • Take data set of multiple complete genomes • Identify all protein families using MCL • Plot # of genomes vs. # of protein families
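The plotting step of the rarefaction recipe above can be sketched as follows. The sketch assumes the protein families have already been identified (for example by MCL) and are available as a mapping from genome name to a set of family identifiers; averaging over random genome orderings is an assumption made here to smooth the curve, and the function name and toy data are hypothetical.

```python
import random
from typing import Dict, List, Set


def rarefaction_curve(
    families_by_genome: Dict[str, Set[str]],
    n_permutations: int = 100,
    seed: int = 0,
) -> List[float]:
    """Mean number of distinct protein families after adding 1, 2, ..., N genomes."""
    rng = random.Random(seed)
    genomes = sorted(families_by_genome)
    totals = [0.0] * len(genomes)
    for _ in range(n_permutations):
        order = genomes[:]
        rng.shuffle(order)  # random order in which genomes are added
        seen: Set[str] = set()
        for i, genome in enumerate(order):
            seen |= families_by_genome[genome]
            totals[i] += len(seen)
    return [total / n_permutations for total in totals]


# Hypothetical toy input: three genomes and four protein families.
toy = {
    "genome1": {"famA", "famB"},
    "genome2": {"famB", "famC"},
    "genome3": {"famA", "famB", "famC", "famD"},
}
curve = rarefaction_curve(toy)
```

`curve[k - 1]` is the y value to plot against `k` genomes. A curve that keeps climbing as genomes are added indicates that new protein families are still being discovered.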
    46. 46. Wu et al. 2009 Nature 462, 1056-1060
    47. 47. Wu et al. 2009 Nature 462, 1056-1060
    48. 48. Wu et al. 2009 Nature 462, 1056-1060
    49. 49. Wu et al. 2009 Nature 462, 1056-1060
    50. 50. Wu et al. 2009 Nature 462, 1056-1060
    51. 51. Synapomorphies exist. Wu et al. 2009 Nature 462, 1056-1060
    52. 52. Families/PD not uniform
    53. 53. GEBA Phylogenomic Lesson 2 Improves analysis of genome data from uncultured organisms
    54. 54. Shotgun Sequencing Allows Use of Other Markers. [Figure: Sargasso phylotypes; weighted % of clones by major phylogenetic group for the markers EFG, EFTu, HSP70, RecA, RpoB and rRNA.] Venter et al., Science 304: 66-74. 2004
    55. 55. Shotgun Sequencing Allows Use of Other Markers. [Same figure, with overlay text: Cannot be done without good sampling of genomes.] Venter et al., Science 304: 66-74. 2004
    56. 56. Phylogenetic Binning. [Same figure.] Venter et al., Science 304: 66-74. 2004
    57. 57. Shotgun Sequencing Allows Use of Other Markers. [Same figure, with overlay text: Cannot be done without good sampling of genomes.] Venter et al., Science 304: 66-74. 2004
    58. 58. Shotgun Sequencing Allows Use of Other Markers. [Same figure, with overlay text: GEBA Project improves metagenomic analysis.] Venter et al., Science 304: 66-74. 2004
    59. 59. Shotgun Sequencing Allows Use of Other Markers. [Same figure, with overlay text: But not a lot.] Venter et al., Science 304: 66-74. 2004
    60. 60. Phylogeny and Metagenomics Future 1 Need to adapt genomic and metagenomic methods to make better use of data
    61. 61. iSEEM Project
    62. 62. AMPHORA 2 Coming w/ More Markers
        Phylogenetic group       Genome Number   Gene Number   Marker Candidates
        Archaea                  62              145415        106
        Actinobacteria           63              267783        136
        Alphaproteobacteria      94              347287        121
        Betaproteobacteria       56              266362        311
        Gammaproteobacteria      126             483632        118
        Deltaproteobacteria      25              102115        206
        Epsilonproteobacteria    18              33416         455
        Bacteroides              25              71531         286
        Chlamydiae               13              13823         560
        Chloroflexi              10              33577         323
        Cyanobacteria            36              124080        590
        Firmicutes               106             312309        87
        Spirochaetes             18              38832         176
        Thermi                   5               14160         974
        Thermotogae              9               17037         684
        See posters by Dongying Wu and Guillaume Jospin
    63. 63. • Build AMPHORA ALL reference tree with concatenated alignment • Align reads that match any of the HMMs to concatenated alignment • Place reads into reference tree one at a time
    64. 64. Phylogeny and Metagenomics Future 2 We have still only scratched the surface of microbial diversity
    65. 65. rRNA Tree of Life: Bacteria, Archaea, Eukaryotes. Figure from Barton, Eisen et al. “Evolution”, CSHL Press. 2007. Based on tree from Pace 1997 Science 276:734-740
    66. 66. Phylogenetic Diversity: Genomes From Wu et al. 2009 Nature 462, 1056-1060
    67. 67. Phylogenetic Diversity with GEBA From Wu et al. 2009 Nature 462, 1056-1060
    68. 68. Phylogenetic Diversity: Isolates From Wu et al. 2009 Nature 462, 1056-1060
    69. 69. Phylogenetic Diversity: All From Wu et al. 2009 Nature 462, 1056-1060
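The phylogenetic diversity (PD) compared across these slides is commonly computed as the total branch length of the tree spanned by a set of taxa. The following minimal sketch assumes that definition in its rooted form (the path back to the root is included); the tree, its branch lengths and the function name are hypothetical and are not data from Wu et al. 2009.

```python
from typing import Dict, Iterable, Optional, Tuple

# Each node maps to (parent, length of the branch leading up to that parent).
ParentMap = Dict[str, Tuple[Optional[str], float]]


def phylogenetic_diversity(parent: ParentMap, taxa: Iterable[str]) -> float:
    """Sum the lengths of all branches on the paths from the given taxa to the root."""
    counted = set()
    total = 0.0
    for leaf in taxa:
        node: Optional[str] = leaf
        # Walk towards the root, stopping once we reach an already counted branch.
        while node is not None and node not in counted:
            up, length = parent[node]
            counted.add(node)
            total += length
            node = up
    return total


# Hypothetical rooted tree: root -> n1 (1.0) -> a (2.0), b (3.0); root -> c (4.0).
toy_tree: ParentMap = {
    "root": (None, 0.0),
    "n1": ("root", 1.0),
    "a": ("n1", 2.0),
    "b": ("n1", 3.0),
    "c": ("root", 4.0),
}
pd_ab = phylogenetic_diversity(toy_tree, ["a", "b"])
pd_ac = phylogenetic_diversity(toy_tree, ["a", "c"])
pd_all = phylogenetic_diversity(toy_tree, ["a", "b", "c"])
```

Sampling the distant pair (a, c) covers more branch length than the close pair (a, b), which is the intuition behind selecting genomes by phylogeny.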
    70. 70. GEBA uncultured: Number of SAGs from Candidate Phyla
        Site                                    OD1   OP11   OP3   SAR406
        Site A: Hydrothermal vent               4     1      -     -
        Site B: Gold Mine                       6     13     2     -
        Site C: Tropical gyres (Mesopelagic)    -     -      -     2
        Site D: Tropical gyres (Photic zone)    1     -      -     -
        Sample collections at 4 additional sites are underway. Phil Hugenholtz
    71. 71. Earth Microbiome Project • Goal – to systematically approach the problem of characterizing microbial life on earth • Strategy: – Explore microbes in environmental parameter space – Design ‘ideal’ strategy to interrogate these biomes – Acquire samples and sequence broad and deep both DNA, mRNA and rRNA – Define microbial community structure and the protein universe • Gilbert et al., 2010a,b SIGS
    72. 72. Phylogenomics Future 3: Need Experiments from Across the Tree of Life too
    73. 73. A Happy Tree of Life
