
Phylogenetic approaches to metagenomic analysis #KSMicro talk by Jonathan Eisen

Slides from talk by Jonathan Eisen at the Keystone meeting on "microbial communities as drivers of ecosystem complexity"

Published in: Health & Medicine


  1. 1. Phylogenetic & Phylogenomic Approaches to Metagenomic Analysis Jonathan A. Eisen UC Davis Keystone Meeting #KSMicro March 26, 2011
  2. 2. Outline • Introduction to phylogeny • Phylogeny and metagenomics – Phylotyping – Phylogenetic Binning – Functional diversity and prediction – Phylogenetic ecology – Selecting species or genes for study
  3. 3. T. H. Dobzhansky (1973): “Nothing in biology makes sense except in the light of evolution.”
  4. 4. Evolutionary Perspective and Comparative Biology • Comparative biology is the analysis of differences and similarities between species. • An evolutionary perspective is useful in such studies because it allows one to focus on how and why similarities and differences came to be. • In other words, biological objects have a history, and understanding that history is important.
  5. 5. Phylogeny • Phylogeny is a description of the evolutionary history of relationships among organisms (or their parts). • This is frequently portrayed in a diagram called a phylogenetic tree. • Phylogenies can be more complex than a bifurcating tree (e.g., lateral gene transfer, recombination, hybridization). • Knowing the history allows one to distinguish homology from convergence and to tease apart issues with rate variation.
  6. 6. Uses of Phylogeny in Metagenomics Example 1: Phylotyping
  7. 7. rRNA survey • Sequence rRNAs • Cluster into OTUs
  8. 8. rRNA survey • Sequence rRNAs • Cluster into OTUs [Figure: sequences grouped into OTU1 through OTU10]
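The "cluster into OTUs" step above can be illustrated with a minimal sketch. It assumes the rRNA sequences are already aligned to equal length and uses a simple greedy, seed-based rule with an identity threshold; the function names and the toy reads are hypothetical, and this is not the procedure used in the talk.

```python
def identity(a, b):
    """Fraction of matching columns between two pre-aligned, equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    # Ignore columns where both sequences carry a gap.
    pairs = [(x, y) for x, y in zip(a, b) if not (x == "-" and y == "-")]
    if not pairs:
        return 0.0
    return sum(x == y for x, y in pairs) / len(pairs)


def cluster_otus(seqs, threshold=0.97):
    """Greedy clustering: each sequence joins the first OTU whose seed it matches."""
    otus = []  # list of (seed_sequence, [member names])
    for name, seq in seqs.items():
        for seed, members in otus:
            if identity(seq, seed) >= threshold:
                members.append(name)
                break
        else:
            # No existing seed is similar enough, so this read seeds a new OTU.
            otus.append((seq, [name]))
    return [members for _, members in otus]


# Toy reads (hypothetical); a 90% threshold is used because the reads are only 10 bases long.
reads = {"r1": "ACGTACGTAC", "r2": "ACGTACGTAT", "r3": "TTTTACGTAC"}
print(cluster_otus(reads, threshold=0.9))  # [['r1', 'r2'], ['r3']]
```

Greedy clustering depends on input order, which is one reason the later slides place OTUs on a tree rather than relying on pairwise identity alone.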
  9. 9. OTUs on Tree OTU1 OTU5 OTU4 OTU6 OTU2 OTU3 OTU7 OTU9 OTU8 OTU10
  10. 10. Uses of Tree • Clades • Rates of change • LGT • Convergence • Character history
  11. 11. rRNA Phylotyping
  12. 12. rRNA Phylotyping • Note: using a tree does not mean phylogeny always matters per se • But it allows one to test whether and how phylogeny impacts biology, ecology, etc. • When it does = homology • When it does not = convergence, HGT, etc.
  13. 13. rRNA Phylotyping in Sargasso Sea Metagenomic Data Venter et al., Science 304: 66. 2004
  14. 14. Metagenomic Phylogenetic challenge A single tree with everything
  15. 15. Finding Metagenomic OTUs. PhylOTU: A High-Throughput Procedure Quantifies Microbial Community Diversity and Resolves Novel Taxa from Metagenomic Data. Thomas J. Sharpton (1), Samantha J. Riesenfeld (1), Steven W. Kembel (2), Joshua Ladau (1), James P. O’Dwyer (2,3), Jessica L. Green (2), Jonathan A. Eisen (4), Katherine S. Pollard (1,5).
      Affiliations: (1) The J. David Gladstone Institutes, University of California San Francisco, San Francisco, California, United States of America; (2) Center for Ecology and Evolutionary Biology, University of Oregon, Eugene, Oregon, United States of America; (3) Institute of Integrative and Comparative Biology, University of Leeds, Leeds, United Kingdom; (4) Department of Evolution and Ecology, University of California Davis, Davis, California, United States of America; (5) Institute for Human Genetics & Division of Biostatistics, University of California San Francisco, San Francisco, California, United States of America.
      Abstract: Microbial diversity is typically characterized by clustering ribosomal RNA (SSU-rRNA) sequences into operational taxonomic units (OTUs). Targeted sequencing of environmental SSU-rRNA markers via PCR may fail to detect OTUs due to biases in priming and amplification. Analysis of shotgun sequenced environmental DNA, known as metagenomics, avoids amplification bias but generates fragmentary, non-overlapping sequence reads that cannot be clustered by existing OTU-finding methods. To circumvent these limitations, we developed PhylOTU, a computational workflow that identifies OTUs from metagenomic SSU-rRNA sequence data through the use of phylogenetic principles and probabilistic sequence profiles. Using simulated metagenomic data, we quantified the accuracy with which PhylOTU clusters reads into OTUs. Comparisons of PCR and shotgun sequenced SSU-rRNA markers derived from the global open ocean revealed that while PCR libraries identify more OTUs per sequenced residue, metagenomic libraries recover a greater taxonomic diversity of OTUs. In addition, we discover novel species, genera and families in the metagenomic libraries, including OTUs from phyla missed by analysis of PCR sequences. Taken together, these results suggest that PhylOTU enables characterization of part of the biosphere currently hidden from PCR-based surveys of diversity.
      Citation: Sharpton TJ, Riesenfeld SJ, Kembel SW, Ladau J, O’Dwyer JP, et al. (2011) PhylOTU: A High-Throughput Procedure Quantifies Microbial Community Diversity and Resolves Novel Taxa from Metagenomic Data. PLoS Comput Biol 7(1): e1001061. doi:10.1371/journal.pcbi.1001061. Editor: Oded Béjà, Technion-Israel Institute of Technology, Israel. Received July 22, 2010; Accepted December 17, 2010; Published January 20, 2011. Copyright: © 2011 Sharpton et al. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits
  16. 16. Shotgun Sequencing Allows Use of Alternative Anchors (e.g., RecA) Venter et al., Science 304: 66. 2004
  17. 17. Shotgun Sequencing Allows Use of Other Markers. [Bar chart, Sargasso phylotypes: weighted % of clones (0 to 0.5) by major phylogenetic group (Alphaproteobacteria, Betaproteobacteria, Gammaproteobacteria, Epsilonproteobacteria, Deltaproteobacteria, Cyanobacteria, Firmicutes, Actinobacteria, Chlorobi, CFB, Chloroflexi, Spirochaetes, Fusobacteria, Deinococcus-Thermus, Euryarchaeota, Crenarchaeota), one series per marker: EFG, EFTu, rRNA, RecA, RpoB, HSP70.] Venter et al., Science 304: 66-74. 2004
  18. 18. AMPHORA Guide tree
  19. 19. [Bar chart, y-axis 0 to 0.7, by phylogenetic group (Alphaproteobacteria, Betaproteobacteria, Gammaproteobacteria, Deltaproteobacteria, Epsilonproteobacteria, Unclassified Proteobacteria, Cyanobacteria, Chlamydiae, Acidobacteria, Bacteroidetes, Actinobacteria, Aquificae, Planctomycetes, Spirochaetes, Firmicutes, Chloroflexi, Chlorobi, Unclassified Bacteria), one series per marker gene: frr, tsf, pgk, rpsI, rplL, rplT, rplF, rplE, rplS, infC, rplP, rplA, rplK, rplB, rplN, rplD, rplC, rpsJ, rplM, rpsE, rpsS, rpsK, rpsB, rpsC, pyrG, rpoB, nusA, rpsM, rpmA, dnaG, smpB.]
  20. 20. rRNA Tree of Life (Bacteria, Archaea, Eukaryotes). Figure from Barton, Eisen et al. “Evolution”, CSHL Press. 2007. Based on tree from Pace 1997 Science 276:734-740
  21. 21. rRNA Tree of Life (Bacteria, Archaea, ??????, Eukaryotes). Wu et al. (2011) PLoS ONE 6(3): e18011. doi:10.1371/journal.pone.0018011. Figure from Barton, Eisen et al. “Evolution”, CSHL Press. 2007. Based on tree from Pace 1997 Science 276:734-740
  22. 22. rRNA Tree of Life (Bacteria, Archaea, Eukaryotes). Scanned through GOS data for rRNAs that fit this pattern. Figure from Barton, Eisen et al. “Evolution”, CSHL Press. 2007. Based on tree from Pace 1997 Science 276:734-740
  23. 23. rRNA Tree of Life (Bacteria, Archaea, Eukaryotes). Found many, but closer examination revealed all to have issues. Figure from Barton, Eisen et al. “Evolution”, CSHL Press. 2007. Based on tree from Pace 1997 Science 276:734-740
  24. 24. RecA. J Mol Evol (1995) 41:1105-1123, Journal of Molecular Evolution, © Springer-Verlag New York Inc. 1995. The RecA Protein as a Model Molecule for Molecular Systematic Studies of Bacteria: Comparison of Trees of RecAs and 16S rRNAs from the Same Species. Jonathan A. Eisen, Department of Biological Sciences, Stanford University, Stanford, CA 94305-5020, USA (email: jeisen@leland.stanford.edu). Received: 1 July 1995 / Accepted: 25 July 1995.
      Abstract. The evolution of the RecA protein was analyzed using molecular phylogenetic techniques. Phylogenetic trees of all currently available complete RecA proteins were inferred using multiple maximum parsimony and distance matrix methods. Comparison and analysis of the trees reveal that the inferred relationships among these proteins are highly robust. The RecA trees show consistent subdivisions corresponding to many of ...
      Introduction. Molecular systematics has become the primary way to determine evolutionary relationships among microorganisms because morphological and other phenotypic characters are either absent or change too rapidly to be useful for phylogenetic inference (Woese 1987). Not all molecules are equally useful for molecular systematic studies and the molecule of choice for most such studies ...
  25. 25. Homologs in GOS Data. Analysis 1st Done in 2004
  26. 26. GOS 1, GOS 2, GOS 3, GOS 4, GOS 5
  27. 27. GOS 1, GOS 2, GOS 3, GOS 4, Thaumarchaeota
  28. 28. GOS 1, Phage?, Phage?, GOS 4, Thaumarchaeota
  29. 29. ????, Phage?, Phage?, ????, Thaumarchaeota
  30. 30. Uses of Phylogeny in Metagenomics Example 2: Binning
  31. 31. Binning challenge: No reference genome? What do you do?
  32. 32. Binning challenge: No reference genome? What do you do? Phylogeny
  33. 33. CFB Phyla
  34. 34. Sulcia makes amino acids. Baumannia makes vitamins and cofactors. Wu et al. 2006 PLoS Biology 4: e188.
  35. 35. AMPHORA Guide tree
  36. 36. Uses of Phylogeny in Metagenomics Example 3: Functional Diversity and Functional Predictions
  37. 37. Predicting Function • Identification of motifs – Short regions of sequence similarity that are indicative of general activity – e.g., ATP binding • Homology/similarity based methods – Gene sequence is searched against a database of other sequences – If significantly similar genes are found, their functional information is used • Problem – Genes frequently have similarity to hundreds of motifs and multiple genes, not all with the same function
  38. 38. Phylogenetic Prediction of Gene Function. Method: (1) choose gene(s) of interest; (2) identify homologs; (3) align sequences; (4) calculate gene tree; (5) overlay known functions onto tree; (6) infer likely function of gene(s) of interest. [Figure: the method applied to two examples, one with genes 1-6 and one with genes 1A, 2A, 3A, 1B, 2B, 3B from Species 1, 2 and 3 separated by an inferred duplication; some inferences are marked ambiguous, and the actual evolution is assumed to be unknown.] Based on Eisen, 1998 Genome Res 8: 163-167.
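The last two steps of the method above (overlay known functions onto the tree, then infer the function of the gene of interest) can be sketched with a simplified two-pass, Fitch-style parsimony. This is an illustrative assumption, not the procedure from Eisen 1998: the tree is given as nested tuples of gene names, and the function labels "function_A" and "function_B" are hypothetical.

```python
def _up(node, known):
    """Bottom-up pass: compute the candidate function set for each node."""
    if isinstance(node, str):
        func = known.get(node)
        # Genes with no known function carry None and act as wildcards.
        return {"name": node, "set": {func} if func is not None else None, "kids": []}
    kids = [_up(child, known) for child in node]
    sets = [k["set"] for k in kids if k["set"] is not None]
    if not sets:
        state = None
    else:
        common = set.intersection(*sets)
        state = common if common else set.union(*sets)
    return {"name": None, "set": state, "kids": kids}


def _down(node, parent_set, out):
    """Top-down pass: resolve each node against its parent's final set."""
    state = node["set"]
    if state is None:
        final = parent_set          # unknown gene inherits from its ancestor
    elif parent_set is None:
        final = state
    else:
        common = state & parent_set
        final = common if common else state
    if not node["kids"]:
        out[node["name"]] = final
    for kid in node["kids"]:
        _down(kid, final, out)


def predict_functions(tree, known):
    """Return a dict mapping every gene in the tree to its inferred function set."""
    out = {}
    _down(_up(tree, known), None, out)
    return out


# Two paralogous clades (A and B) separated by a duplication; gene 2A is uncharacterized.
tree = ((("1A", "2A"), "3A"), (("1B", "2B"), "3B"))
known = {"1A": "function_A", "3A": "function_A",
         "1B": "function_B", "2B": "function_B", "3B": "function_B"}
print(predict_functions(tree, known)["2A"])  # {'function_A'}
```

A returned set with more than one member corresponds to the "ambiguous" case in the figure, where the tree alone does not settle the function.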
  39. 39. Massive Diversity of Proteorhodopsins Venter et al., 2004
  40. 40. Uses of Phylogeny in Metagenomics Example 4: Phylogenetic Ecology
  41. 41. Uses of Phylogeny in Metagenomics Example 5: Selecting Organisms for Study
  42. 42. rRNA Tree of Life (Bacteria, Archaea, Eukaryotes). Figure from Barton, Eisen et al. “Evolution”, CSHL Press. 2007. Based on tree from Pace 1997 Science 276:734-740
  43. 43. http://www.jgi.doe.gov/programs/GEBA/pilot.html
  44. 44. GEBA Phylogenomic Lesson 1 Phylogeny-driven genome selection helps discover new genetic diversity
  45. 45. Protein Family Rarefaction Curves • Take data set of multiple complete genomes • Identify all protein families using MCL • Plot # of genomes vs. # of protein families
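The rarefaction recipe above can be sketched as follows. The sketch assumes the protein families have already been identified (for example by MCL) and are supplied as a set of family identifiers per genome; the genome and family names are hypothetical, and averaging over random genome orderings is one common way to smooth the curve rather than necessarily the one used for the slides.

```python
import random


def rarefaction_curve(families_by_genome, n_permutations=100, seed=0):
    """Mean number of distinct protein families seen after sampling 1..N genomes."""
    rng = random.Random(seed)
    genomes = list(families_by_genome)
    totals = [0] * len(genomes)
    for _ in range(n_permutations):
        rng.shuffle(genomes)  # a new random order in which genomes are added
        seen = set()
        for i, genome in enumerate(genomes):
            seen |= families_by_genome[genome]
            totals[i] += len(seen)
    return [total / n_permutations for total in totals]


# Toy input (hypothetical): family IDs per genome, as they might come out of a clustering step.
toy = {
    "genome1": {"fam1", "fam2"},
    "genome2": {"fam2", "fam3"},
    "genome3": {"fam1", "fam2", "fam3", "fam4"},
}
curve = rarefaction_curve(toy)
for n_genomes, n_families in enumerate(curve, start=1):
    print(n_genomes, n_families)
```

Plotting the number of genomes against the returned values gives the curve; a curve that keeps climbing indicates that additional genomes are still contributing new families.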
  46. 46. Wu et al. 2009 Nature 462, 1056-1060
  47. 47. Wu et al. 2009 Nature 462, 1056-1060
  48. 48. Wu et al. 2009 Nature 462, 1056-1060
  49. 49. Wu et al. 2009 Nature 462, 1056-1060
  50. 50. Wu et al. 2009 Nature 462, 1056-1060
  51. 51. Synapomorphies exist. Wu et al. 2009 Nature 462, 1056-1060
  52. 52. Families/PD not uniform
  53. 53. GEBA Phylogenomic Lesson 2 Improves analysis of genome data from uncultured organisms
  54. 54. Shotgun Sequencing Allows Use of Other Markers. [Bar chart, Sargasso phylotypes: weighted % of clones (0 to 0.5) by major phylogenetic group (Alphaproteobacteria, Betaproteobacteria, Gammaproteobacteria, Epsilonproteobacteria, Deltaproteobacteria, Cyanobacteria, Firmicutes, Actinobacteria, Chlorobi, CFB, Chloroflexi, Spirochaetes, Fusobacteria, Deinococcus-Thermus, Euryarchaeota, Crenarchaeota), one series per marker: EFG, EFTu, rRNA, RecA, RpoB, HSP70.] Venter et al., Science 304: 66-74. 2004
  55. 55. Shotgun Sequencing Allows Use of Other Markers. [Same bar chart, annotated: Cannot be done without good sampling of genomes.] Venter et al., Science 304: 66-74. 2004
  56. 56. Phylogenetic Binning. [Same bar chart.] Venter et al., Science 304: 66-74. 2004
  57. 57. Shotgun Sequencing Allows Use of Other Markers. [Same bar chart, annotated: Cannot be done without good sampling of genomes.] Venter et al., Science 304: 66-74. 2004
  58. 58. Shotgun Sequencing Allows Use of Other Markers. [Same bar chart, annotated: GEBA Project improves metagenomic analysis.] Venter et al., Science 304: 66-74. 2004
  59. 59. Shotgun Sequencing Allows Use of Other Markers. [Same bar chart, annotated: But not a lot.] Venter et al., Science 304: 66-74. 2004
  60. 60. Phylogeny and Metagenomics Future 1 Need to adapt genomic and metagenomic methods to make better use of data
  61. 61. iSEEM Project
  62. 62. AMPHORA 2 Coming w/ More Markers
      Phylogenetic group       Genome Number   Gene Number   Marker Candidates
      Archaea                  62              145415        106
      Actinobacteria           63              267783        136
      Alphaproteobacteria      94              347287        121
      Betaproteobacteria       56              266362        311
      Gammaproteobacteria      126             483632        118
      Deltaproteobacteria      25              102115        206
      Epsilonproteobacteria    18              33416         455
      Bacteroidetes            25              71531         286
      Chlamydiae               13              13823         560
      Chloroflexi              10              33577         323
      Cyanobacteria            36              124080        590
      Firmicutes               106             312309        87
      Spirochaetes             18              38832         176
      Thermi                   5               14160         974
      Thermotogae              9               17037         684
      See posters by Dongying Wu and Guillaume Jospin
  63. 63. • Build AMPHORA ALL reference tree with concatenated alignment • Align reads that match any of the HMMs to concatenated alignment • Place reads into reference tree one at a time
  64. 64. Phylogeny and Metagenomics Future 2 We have still only scratched the surface of microbial diversity
  65. 65. rRNA Tree of Life (Bacteria, Archaea, Eukaryotes). Figure from Barton, Eisen et al. “Evolution”, CSHL Press. 2007. Based on tree from Pace 1997 Science 276:734-740
  66. 66. Phylogenetic Diversity: Genomes From Wu et al. 2009 Nature 462, 1056-1060
  67. 67. Phylogenetic Diversity with GEBA From Wu et al. 2009 Nature 462, 1056-1060
  68. 68. Phylogenetic Diversity: Isolates From Wu et al. 2009 Nature 462, 1056-1060
  69. 69. Phylogenetic Diversity: All From Wu et al. 2009 Nature 462, 1056-1060
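For readers unfamiliar with the quantity compared on the four slides above, phylogenetic diversity (PD) is commonly computed as the total branch length of the part of a tree spanned by a set of taxa. The sketch below assumes the rooted, Faith-style definition (branches are counted from each taxon back to the root) and a hypothetical toy tree stored as a child-to-parent map; it is not the code behind the Wu et al. 2009 figures.

```python
def phylogenetic_diversity(parent, taxa):
    """Sum of branch lengths on the paths from the given taxa to the root.

    `parent` maps each non-root node to (parent_node, branch_length).
    """
    visited = set()  # nodes whose branch to the parent has already been counted
    for taxon in taxa:
        node = taxon
        # Walk toward the root, stopping once we hit a branch already counted.
        while node in parent and node not in visited:
            visited.add(node)
            node = parent[node][0]
    return sum(parent[node][1] for node in visited)


# Hypothetical toy tree: root -> n1 (0.5), n1 -> A (1.0), n1 -> B (2.0), root -> C (3.0)
toy_tree = {
    "A": ("n1", 1.0),
    "B": ("n1", 2.0),
    "n1": ("root", 0.5),
    "C": ("root", 3.0),
}
print(phylogenetic_diversity(toy_tree, ["A", "B"]))       # 3.5
print(phylogenetic_diversity(toy_tree, ["A", "B", "C"]))  # 6.5
```

In this toy tree, adding the distant taxon C raises PD far more than adding B alongside its close relative A, which is the logic behind phylogeny-driven genome selection.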
  70. 70. GEBA uncultured: Number of SAGs from Candidate Phyla
      Site                                    OD1   OP11   OP3   SAR406
      Site A: Hydrothermal vent               4     1      -     -
      Site B: Gold Mine                       6     13     2     -
      Site C: Tropical gyres (Mesopelagic)    -     -      -     2
      Site D: Tropical gyres (Photic zone)    1     -      -     -
      Sample collections at 4 additional sites are underway. Phil Hugenholtz
  71. 71. Earth Microbiome Project • Goal – to systematically approach the problem of characterizing microbial life on earth • Strategy: – Explore microbes in environmental parameter space – Design ‘ideal’ strategy to interrogate these biomes – Acquire samples and sequence broad and deep both DNA, mRNA and rRNA – Define microbial community structure and the protein universe • Gilbert et al., 2010a,b SIGS
  72. 72. Phylogenomics Future 3: Need Experiments from Across the Tree of Life too
  73. 73. A Happy Tree of Life
