Jonathan Eisen talk at ASM General Meeting 2010

Talk by Jonathan Eisen at ASM General Meeting

Published in: Technology

  1. 1. A phylogeny driven genomic encyclopedia of bacteria and archaea Jonathan A. Eisen Talk at ASMGM May 25, 2010 Tuesday, May 25, 2010
  2. 2. Fleischmann et al. 1995
  3. 3. Microbial genomes From http://genomesonline.org
  4. 4. rRNA Tree of Life Bacteria Archaea Eukaryotes Figure from Barton, Eisen et al. “Evolution”, CSHL Press. Based on tree from Pace NR, 2003.
  5. 5. [Figure: tree of bacterial phyla as of 2002, based on Hugenholtz, 2002. Labeled phyla: Proteobacteria, TM6, OS-K, Acidobacteria, Termite Group, OP8, Nitrospira, Bacteroides, Chlorobi, Fibrobacteres, Marine GroupA, WS3, Gemmimonas, Firmicutes, Fusobacteria, Actinobacteria, OP9, Cyanobacteria, Synergistes, Deferribacteres, Chrysiogenetes, NKB19, Verrucomicrobia, Chlamydia, OP3, Planctomycetes, Spirochaetes, Coprothermobacter, OP10, Thermomicrobia, Chloroflexi, TM7, Deinococcus-Thermus, Dictyoglomus, Aquificae, Thermodesulfobacteria, Thermotogae, OP1, OP11] • At least 40 phyla of bacteria
  6. 6. [Same tree figure as slide 5] • At least 40 phyla of bacteria • Genome sequences are mostly from three phyla
  7. 7. [Same tree figure as slide 5] • At least 40 phyla of bacteria • Genome sequences are mostly from three phyla • Some other phyla are only sparsely sampled
  8. 8. [Same tree figure as slide 5] • At least 40 phyla of bacteria • Genome sequences are mostly from three phyla • Some other phyla are only sparsely sampled • Same trend in Archaea
  9. 9. [Same tree figure as slide 5] • At least 40 phyla of bacteria • Genome sequences are mostly from three phyla • Some other phyla are only sparsely sampled • Same trend in Eukaryotes
  10. 10. The Tree is not Happy Bacteria Archaea Eukaryotes Figure from Barton, Eisen et al. “Evolution”, CSHL Press. Based on tree from Pace NR, 2003.
  11. 11. Why Increase Phylogenetic Coverage? • Common approach within some eukaryotic groups • Many small projects to fill in bacterial or archaeal gaps • Phylogenetic gaps in bacterial and archaeal projects commonly lamented in literature • Many potential benefits
  12. 12. [Tree figure of bacterial phyla, as on slide 5] • At least 40 phyla of bacteria • Genome sequences are mostly from three phyla • Some other phyla are only sparsely sampled • Solution I: sequence more phyla • NSF-funded Tree of Life Project (Eisen & Ward, PIs) • A genome from each of eight phyla
  13. 13. (no text on slide)
  14. 14. The Tree of Life is Still Angry Bacteria Archaea Eukaryotes Figure from Barton, Eisen et al. “Evolution”, CSHL Press. Based on tree from Pace NR, 2003.
  15. 15. Major Lineages of Actinobacteria
      2.5 Actinobacteria
      2.5.1 Acidimicrobidae: 2.5.1.1 Unclassified; 2.5.1.2 "Microthrixineae"; 2.5.1.3 Acidimicrobineae (2.5.1.3.1 Unclassified, 2.5.1.3.2 Acidimicrobiaceae); 2.5.1.4 BD2-10; 2.5.1.5 EB1017
      2.5.2 Actinobacteridae: 2.5.2.1 Unclassified; 2.5.2.10 Ellin306/WR160; 2.5.2.11 Ellin5012; 2.5.2.12 Ellin5034; 2.5.2.13 Frankineae (2.5.2.13.1 Unclassified, 2.5.2.13.2 Acidothermaceae, 2.5.2.13.3 Ellin6090, 2.5.2.13.4 Frankiaceae, 2.5.2.13.5 Geodermatophilaceae, 2.5.2.13.6 Microsphaeraceae, 2.5.2.13.7 Sporichthyaceae); 2.5.2.14 Glycomyces; 2.5.2.15 Intrasporangiaceae (2.5.2.15.1 Unclassified, 2.5.2.15.2 Dermacoccus, 2.5.2.15.3 Intrasporangiaceae); 2.5.2.16 Kineosporiaceae; 2.5.2.17 Microbacteriaceae (2.5.2.17.1 Unclassified, 2.5.2.17.2 Agrococcus, 2.5.2.17.3 Agromyces); 2.5.2.18 Micrococcaceae; 2.5.2.19 Micromonosporaceae; 2.5.2.2 Actinomyces; 2.5.2.20 Propionibacterineae (2.5.2.20.1 Unclassified, 2.5.2.20.2 Kribbella, 2.5.2.20.3 Nocardioidaceae, 2.5.2.20.4 Propionibacteriaceae); 2.5.2.21 Pseudonocardiaceae; 2.5.2.22 Streptomycineae (2.5.2.22.1 Unclassified, 2.5.2.22.2 Kitasatospora, 2.5.2.22.3 Streptacidiphilus); 2.5.2.23 Streptosporangineae (2.5.2.23.1 Unclassified, 2.5.2.23.2 Ellin5129, 2.5.2.23.3 Nocardiopsaceae, 2.5.2.23.4 Streptosporangiaceae, 2.5.2.23.5 Thermomonosporaceae); 2.5.2.3 Actinomycineae; 2.5.2.4 Actinosynnemataceae; 2.5.2.5 Bifidobacteriaceae; 2.5.2.6 Brevibacteriaceae; 2.5.2.7 Cellulomonadaceae; 2.5.2.8 Corynebacterineae (2.5.2.8.1 Unclassified, 2.5.2.8.2 Corynebacteriaceae, 2.5.2.8.3 Dietziaceae, 2.5.2.8.4 Gordoniaceae, 2.5.2.8.5 Mycobacteriaceae, 2.5.2.8.6 Rhodococcus, 2.5.2.8.7 Rhodococcus, 2.5.2.8.8 Rhodococcus); 2.5.2.9 Dermabacteraceae (2.5.2.9.1 Unclassified, 2.5.2.9.2 Brachybacterium, 2.5.2.9.3 Dermabacter)
      2.5.3 Coriobacteridae: 2.5.3.1 Unclassified; 2.5.3.2 Atopobiales; 2.5.3.3 Coriobacteriales; 2.5.3.4 Eggerthellales
      2.5.4 OPB41
      2.5.5 PK1
      2.5.6 Rubrobacteridae: 2.5.6.1 Unclassified; 2.5.6.2 "Thermoleiphilaceae" (2.5.6.2.1 Unclassified, 2.5.6.2.2 Conexibacter, 2.5.6.2.3 XGE514); 2.5.6.3 MC47; 2.5.6.4 Rubrobacteraceae
  16. 16. [Tree figure of bacterial phyla, as on slide 5, with a legend marking "Well sampled phyla"] • At least 100 phyla of bacteria • Genome sequences are mostly from three phyla • Most phyla with cultured species are sparsely sampled • Lineages with no cultured taxa even more poorly sampled • Solution - use tree to really fill gaps
  17. 17. http://www.jgi.doe.gov/programs/GEBA/pilot.html
  18. 18. A Genomic Encyclopedia of Bacteria and Archaea (GEBA)
  19. 19. GEBA Pilot Project Overview • Identify major branches in rRNA tree for which no genomes are available • Identify branches with a cultured representative in DSMZ • Grow > 200 of these and prep. DNA • Sequence and finish 100 (covering breadth of bacterial/archaeal diversity) • Annotate, analyze, release data • Assess benefits of tree-guided sequencing
  20. 20. GEBA and Openness • All data released as quickly as possible w/ no restrictions to IMG-GEBA, GenBank, etc. • Data also available in Biotorrents (http://biotorrents.net) • Individual genome reports published in OA “Standards in Genomic Sciences (SIGS)” • 1st GEBA paper in Nature freely available and published using Creative Commons License
  21. 21. GEBA Lesson 1 rRNA Tree is Useful for Identifying Phylogenetically Novel Genomes
  22. 22. rRNA Tree of Life Bacteria Archaea Eukaryotes Figure from Barton, Eisen et al. “Evolution”, CSHL Press. Based on tree from Pace NR, 2003.
  23. 23. Network of Life Bacteria Archaea Eukaryotes Figure from Barton, Eisen et al. “Evolution”, CSHL Press. Based on tree from Pace NR, 2003.
  24. 24. Whole Genome Tree w/ AMPHORA See Wu and Eisen, Genome Biology 2008 9: R151 http://bobcat.genomecenter.ucdavis.edu/AMPHORA/
  25. 25. Compare PD in Trees
  26. 26. PD of rRNA, Genome Trees Similar From Wu et al. 2009 Nature 462, 1056-1060
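
Slides 25 and 26 compare phylogenetic diversity (PD) between trees. As a rough illustration only, the sketch below computes a rooted PD (the sum of branch lengths on the paths from a set of taxa to the root) on a small made-up tree. The tree, the taxon names, and the choice of rooted PD are assumptions for illustration; this is not the procedure from Wu et al. 2009.

```python
def phylogenetic_diversity(parent, length, taxa):
    """Sum the lengths of all branches on the paths from `taxa` up to the root.

    parent: dict mapping each node to its parent (the root maps to None)
    length: dict mapping each non-root node to the length of the branch above it
    taxa:   iterable of leaf names to include
    """
    seen = set()
    total = 0.0
    for node in taxa:
        # Walk toward the root, counting each branch only once.
        while node is not None and node not in seen:
            seen.add(node)
            total += length.get(node, 0.0)
            node = parent.get(node)
    return total


# Hypothetical toy tree: ((A:1, B:1)X:2, C:5)root
parent = {"A": "X", "B": "X", "X": "root", "C": "root", "root": None}
length = {"A": 1.0, "B": 1.0, "X": 2.0, "C": 5.0}

pd_all = phylogenetic_diversity(parent, length, ["A", "B", "C"])
pd_close = phylogenetic_diversity(parent, length, ["A", "B"])
pd_spread = phylogenetic_diversity(parent, length, ["A", "C"])

print(f"PD of all taxa: {pd_all}")
print(f"PD of A+B (close relatives): {pd_close}")
print(f"PD of A+C (distant relatives): {pd_spread}")
```

In this toy case two distantly related taxa capture more of the tree's total branch length than two close relatives, which is the intuition behind choosing genomes from unsampled branches.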
  27. 27. GEBA Lesson 1B rRNA Tree topology is not perfect; Genome-based trees better
  28. 28. 16S Says Hyphomonas is in Rhodobacterales Badger et al. 2005
  29. 29. WGT and individual gene trees: It's Related to Caulobacterales Badger et al. 2005
  30. 30. Concatenated alignment “whole genome tree” built using AMPHORA
  31. 31. Whole genome phylogeny? • Many approaches – Gene presence/absence – Concatenation of phylogenetic markers – Separate phylogeny of genes and then integration of results (e.g., networks) – Models that incorporate gain/loss as well as gene phylogeny • No new results from us – However ... see Eric Alm talk Ballroom A - “Microbes in a changing world” session tomorrow AM
  32. 32. GEBA Lesson 2 Phylogeny-driven genome selection helps discover new genetic diversity
  33. 33. Network of Life Bacteria Archaea Eukaryotes Figure from Barton, Eisen et al. “Evolution”, CSHL Press. Based on tree from Pace NR, 2003.
  34. 34. Protein Family Rarefaction Curves • Take data set of multiple complete genomes • Identify all protein families using MCL • Plot # of genomes vs. # of protein families
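
The three steps on slide 34 can be sketched roughly as follows. This is a minimal illustration that assumes the clustering step (MCL on the slide) has already been run elsewhere and produced a set of family IDs per genome. The genome names, family IDs, and the averaging over random genome orderings are assumptions, not details taken from the talk.

```python
import random


def rarefaction_curve(genome_families, n_permutations=100, seed=0):
    """Mean number of distinct protein families after adding 1..N genomes.

    genome_families: dict mapping genome name -> set of protein family IDs
    Returns a list whose i-th entry is the mean family count for i+1 genomes,
    averaged over random genome orderings.
    """
    rng = random.Random(seed)
    genomes = list(genome_families)
    totals = [0] * len(genomes)
    for _ in range(n_permutations):
        rng.shuffle(genomes)
        seen = set()
        for i, genome in enumerate(genomes):
            seen |= genome_families[genome]  # accumulate families seen so far
            totals[i] += len(seen)
    return [t / n_permutations for t in totals]


# Hypothetical input: family IDs per genome, as a clustering step might yield.
genome_families = {
    "genome_1": {"fam1", "fam2"},
    "genome_2": {"fam2", "fam3"},
    "genome_3": {"fam1", "fam2", "fam3", "fam4"},
}

curve = rarefaction_curve(genome_families)
for n_genomes, n_families in enumerate(curve, start=1):
    print(n_genomes, round(n_families, 2))
```

Plotting the returned values against the number of genomes gives the curve; a curve that keeps rising steeply suggests that each added genome is still contributing new protein families.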
  35. 35. (no text on slide)
  36. 36. (no text on slide)
  37. 37. (no text on slide)
  38. 38. (no text on slide)
  39. 39. (no text on slide)
  40. 40. Synapomorphies exist
  41. 41. GEBA Lesson 3 Phylogeny-driven genome selection improves genome annotation
  42. 42. Predicting Function • Key step in genome projects • More accurate predictions help guide experimental and computational analyses • Many diverse approaches • Comparative and evolutionary analysis greatly improves most predictions
  43. 43. Most/All Functional Prediction Improves w/ Better Phylogenetic Sampling • Better definition of protein family sequence “patterns” (e.g., improved HMMs) • Conversion of hypothetical into conserved hypotheticals • Greatly improves “comparative” and “evolutionary” based predictions • Linking distantly related members of protein families • Improved non-homology prediction
  44. 44. From Wu et al. 2009.
  45. 45. GEBA Lesson 4 Phylogeny-driven genome selection improves analysis of genome data from uncultured organisms
  46. 46. Metagenomics Challenge
  47. 47. Metagenomics Challenge 1. Who is out there? 2. What are they doing?
  48. 48. Who is out there? • Mimic rRNA PCR based studies • But can now do these with other genes
  49. 49. rRNA phylotyping from metagenomics Venter et al., 2004
  50. 50. Shotgun Sequencing Allows Use of Alternative Anchors (e.g., RecA) Venter et al., 2004
  51. 51. Shotgun Sequencing Allows Use of Other Markers Venter et al., 2004 [Bar chart: Sargasso Phylotypes. x-axis: Major Phylogenetic Group (Alphaproteobacteria, Betaproteobacteria, Gammaproteobacteria, Epsilonproteobacteria, Deltaproteobacteria, Cyanobacteria, Firmicutes, Actinobacteria, Chlorobi, CFB, Chloroflexi, Spirochaetes, Fusobacteria, Deinococcus-Thermus, Euryarchaeota, Crenarchaeota). y-axis: Weighted % of Clones, 0 to 0.5. Series: EFG, EFTu, rRNA, RecA, RpoB, HSP70]
  52. 52. Shotgun Sequencing Allows Use of Other Markers Venter et al., 2004 [Same bar chart as slide 51] Should improve with better genomic sampling
  53. 53. Functional Inference from Metagenomics • Can work well for individual genes • Predicting “community” function is challenging because treating community as a bag of genes does not work well • Better to “compartmentalize” data ...
  54. 54. Binning challenge A T B U C V D W E X F Y G Z
  55. 55. Binning challenge A T B U C V D W E X F Y G Z Best binning method: reference genomes
  56. 56. Reference Genomes Coming from Select Environment
  57. 57. Binning challenge A T B U C V D W E X F Y G Z No reference genome? What do you do?
  58. 58. Binning challenge A T B U C V D W E X F Y G Z No reference genome? What do you do? Phylogeny ....
  59. 59. AMPHORA Guide tree
  60. 60. Phylogenetic Binning Using AMPHORA AMPHORA - each read on its own tree [Bar chart. x-axis groups: Alphaproteobacteria, Betaproteobacteria, Gammaproteobacteria, Deltaproteobacteria, Epsilonproteobacteria, Unclassified Proteobacteria, Cyanobacteria, Chlamydiae, Acidobacteria, Bacteroidetes, Actinobacteria, Aquificae, Planctomycetes, Spirochaetes, Firmicutes, Chloroflexi, Chlorobi, Unclassified Bacteria. y-axis: 0 to 0.7. Series (marker genes): dnaG, frr, infC, nusA, pgk, pyrG, rplA, rplB, rplC, rplD, rplE, rplF, rplK, rplL, rplM, rplN, rplP, rplS, rplT, rpmA, rpoB, rpsB, rpsC, rpsE, rpsI, rpsJ, rpsK, rpsM, rpsS, smpB, tsf]
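
A chart like the one on slide 60 summarizes, for each marker gene, what fraction of classified reads fall into each phylogenetic group. The sketch below shows only that tallying step on made-up data. It assumes each read has already been placed on a tree and assigned a group upstream; the input format is hypothetical, it is not AMPHORA's output format or API, and it applies no weighting.

```python
from collections import Counter, defaultdict


def group_fractions(assignments):
    """Per-marker fraction of reads assigned to each phylogenetic group.

    assignments: iterable of (marker_gene, group) pairs, one per classified read
    Returns {marker_gene: {group: fraction_of_that_marker's_reads}}.
    """
    counts = defaultdict(Counter)
    for marker, group in assignments:
        counts[marker][group] += 1
    return {
        marker: {g: c / sum(groups.values()) for g, c in groups.items()}
        for marker, groups in counts.items()
    }


# Hypothetical read assignments (marker gene, assigned group).
assignments = [
    ("rpoB", "Alphaproteobacteria"),
    ("rpoB", "Alphaproteobacteria"),
    ("rpoB", "Cyanobacteria"),
    ("rpoB", "Firmicutes"),
    ("dnaG", "Cyanobacteria"),
    ("dnaG", "Alphaproteobacteria"),
]

fractions = group_fractions(assignments)
for marker, groups in sorted(fractions.items()):
    print(marker, dict(sorted(groups.items())))
```

Comparing these fractions across marker genes gives a sense of how consistently different markers describe the same community.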
  61. 61. Phylogenetic Binning Using AMPHORA AMPHORA - each read on its own tree [Bar chart of per-marker-gene fractions by phylogenetic group, y-axis 0 to 0.7, as on slide 60] Should improve with better genomic sampling
  62. 62. Metagenomic Analysis Improves w/ Phylogenetic Sampling • Small but real improvements in – Gene identification / confirmation – Functional prediction – Binning – Phylogenetic classification
  63. 63. Metagenomic Analysis Improves w/ Phylogenetic Sampling • Small but real improvements in – Gene identification / confirmation – Functional prediction – Binning – Phylogenetic classification • But not a lot ...
  64. 64. How to improve phylogenetic analysis of metagenomic data • Fragmented data • Which genes to use? • More automation
  65. 65. iSEEM Project
  66. 66. Phylogenetic challenge A single tree with everything
  67. 67. Phylogenetic Binning Using AMPHORA AMPHORA - each read on its own tree [Bar chart of per-marker-gene fractions by phylogenetic group, y-axis 0 to 0.7, as on slide 60] Improves with better phylogenetic methods