Jonathan Eisen slides for #HMP2010

Professor; Studying Evolution, Ecology and Genomics of Microbes & Microbiomes; Open Science; Science communicator
Sep. 2, 2010

  1. A phylogeny-driven genomic encyclopedia of bacteria and archaea. Jonathan A. Eisen, UC Davis. Talk for HMP2010, September 2, 2010
  2. Social Networking in Science
  3. Bacteria evolve
  4. Progress in Genome Sequencing From http://genomesonline.org
  5. Progress in Genome Sequencing From http://genomesonline.org
  6. Progress in Genome Sequencing From http://genomesonline.org
  7. Way Back Machine - 2002
  8. Way Back Machine - 2002 454
  9. Way Back Machine - 2002 454
  10. Way Back Machine - 2002 454 Illumina
  11. Way Back Machine - 2002 454 Illumina
  12. Way Back Machine - 2002 454 Illumina Solid
  13. Way Back Machine - 2002 454 Illumina Solid
  14. 2002 • At least 40 phyla of bacteria [Figure: phylum-level tree of Bacteria with tips Proteobacteria, TM6, OS-K, Acidobacteria, Termite Group, OP8, Nitrospira, Bacteroides, Chlorobi, Fibrobacteres, Marine Group A, WS3, Gemmimonas, Firmicutes, Fusobacteria, Actinobacteria, OP9, Cyanobacteria, Synergistes, Deferribacteres, Chrysiogenetes, NKB19, Verrucomicrobia, Chlamydia, OP3, Planctomycetes, Spirochaetes, Coprothermobacter, OP10, Thermomicrobia, Chloroflexi, TM7, Deinococcus-Thermus, Dictyoglomus, Aquificae, Thermodesulfobacteria, Thermotogae, OP1, OP11. Based on Hugenholtz, 2002]
  15. 2002 • At least 40 phyla of bacteria • Genome sequences are mostly from three phyla [Figure: same phylum-level tree of Bacteria as slide 14. Based on Hugenholtz, 2002]
  16. 2002 • At least 40 phyla of bacteria • Genome sequences are mostly from three phyla • Some other phyla are only sparsely sampled [Figure: same phylum-level tree of Bacteria as slide 14. Based on Hugenholtz, 2002]
  17. 2002 • At least 40 phyla of bacteria • Genome sequences are mostly from three phyla • Some other phyla are only sparsely sampled [Figure: same phylum-level tree of Bacteria as slide 14. Based on Hugenholtz, 2002]
  18. Why Increase Phylogenetic Coverage? • Common approach within some eukaryotic groups (FGP, NHGRI, etc) • Many successful small projects to fill in bacterial or archaeal gaps • Phylogenetic gaps in bacterial and archaeal projects commonly lamented in literature • Many potential benefits
  19. • NSF-funded Tree of Life Project • A genome from each of eight phyla • At least 40 phyla of bacteria • Genome sequences are mostly from three phyla • Some other phyla are only sparsely sampled • Solution I: sequence more phyla. Eisen & Ward, PIs [Figure: same phylum-level tree of Bacteria as slide 14]
  20. • NSF-funded Tree of Life Project • A genome from each of eight phyla • At least 40 phyla of bacteria • Genome sequences are mostly from three phyla • Some other phyla are only sparsely sampled • Still highly biased in terms of the tree. Eisen & Ward, PIs [Figure: same phylum-level tree of Bacteria as slide 14]
  21. Major Lineages of Actinobacteria. 2.5 Actinobacteria • 2.5.1 Acidimicrobidae: 2.5.1.1 Unclassified; 2.5.1.2 "Microthrixineae"; 2.5.1.3 Acidimicrobineae (2.5.1.3.1 Unclassified, 2.5.1.3.2 Acidimicrobiaceae); 2.5.1.4 BD2-10; 2.5.1.5 EB1017 • 2.5.2 Actinobacteridae: 2.5.2.1 Unclassified; 2.5.2.2 Actinomyces; 2.5.2.3 Actinomycineae; 2.5.2.4 Actinosynnemataceae; 2.5.2.5 Bifidobacteriaceae; 2.5.2.6 Brevibacteriaceae; 2.5.2.7 Cellulomonadaceae; 2.5.2.8 Corynebacterineae (2.5.2.8.1 Unclassified, 2.5.2.8.2 Corynebacteriaceae, 2.5.2.8.3 Dietziaceae, 2.5.2.8.4 Gordoniaceae, 2.5.2.8.5 Mycobacteriaceae, 2.5.2.8.6 Rhodococcus, 2.5.2.8.7 Rhodococcus, 2.5.2.8.8 Rhodococcus); 2.5.2.9 Dermabacteraceae (2.5.2.9.1 Unclassified, 2.5.2.9.2 Brachybacterium, 2.5.2.9.3 Dermabacter); 2.5.2.10 Ellin306/WR160; 2.5.2.11 Ellin5012; 2.5.2.12 Ellin5034; 2.5.2.13 Frankineae (2.5.2.13.1 Unclassified, 2.5.2.13.2 Acidothermaceae, 2.5.2.13.3 Ellin6090, 2.5.2.13.4 Frankiaceae, 2.5.2.13.5 Geodermatophilaceae, 2.5.2.13.6 Microsphaeraceae, 2.5.2.13.7 Sporichthyaceae); 2.5.2.14 Glycomyces; 2.5.2.15 Intrasporangiaceae (2.5.2.15.1 Unclassified, 2.5.2.15.2 Dermacoccus, 2.5.2.15.3 Intrasporangiaceae); 2.5.2.16 Kineosporiaceae; 2.5.2.17 Microbacteriaceae (2.5.2.17.1 Unclassified, 2.5.2.17.2 Agrococcus, 2.5.2.17.3 Agromyces); 2.5.2.18 Micrococcaceae; 2.5.2.19 Micromonosporaceae; 2.5.2.20 Propionibacterineae (2.5.2.20.1 Unclassified, 2.5.2.20.2 Kribbella, 2.5.2.20.3 Nocardioidaceae, 2.5.2.20.4 Propionibacteriaceae); 2.5.2.21 Pseudonocardiaceae; 2.5.2.22 Streptomycineae (2.5.2.22.1 Unclassified, 2.5.2.22.2 Kitasatospora, 2.5.2.22.3 Streptacidiphilus); 2.5.2.23 Streptosporangineae (2.5.2.23.1 Unclassified, 2.5.2.23.2 Ellin5129, 2.5.2.23.3 Nocardiopsaceae, 2.5.2.23.4 Streptosporangiaceae, 2.5.2.23.5 Thermomonosporaceae) • 2.5.3 Coriobacteridae: 2.5.3.1 Unclassified; 2.5.3.2 Atopobiales; 2.5.3.3 Coriobacteriales; 2.5.3.4 Eggerthellales • 2.5.4 OPB41 • 2.5.5 PK1 • 2.5.6 Rubrobacteridae: 2.5.6.1 Unclassified; 2.5.6.2 "Thermoleiphilaceae" (2.5.6.2.1 Unclassified, 2.5.6.2.2 Conexibacter, 2.5.6.2.3 XGE514); 2.5.6.3 MC47; 2.5.6.4 Rubrobacteraceae
  22. • NSF-funded Tree of Life Project • A genome from each of eight phyla • At least 40 phyla of bacteria • Genome sequences are mostly from three phyla • Some other phyla are only sparsely sampled • Same trend in Archaea. Eisen & Ward, PIs [Figure: same phylum-level tree of Bacteria as slide 14]
  23. • NSF-funded Tree of Life Project • A genome from each of eight phyla • At least 40 phyla of bacteria • Genome sequences are mostly from three phyla • Some other phyla are only sparsely sampled • Same trend in Eukaryotes. Eisen & Ward, PIs [Figure: same phylum-level tree of Bacteria as slide 14]
  24. • NSF-funded Tree of Life Project • A genome from each of eight phyla • At least 40 phyla of bacteria • Genome sequences are mostly from three phyla • Some other phyla are only sparsely sampled • Same trend in Viruses. Eisen & Ward, PIs [Figure: same phylum-level tree of Bacteria as slide 14]
  25. Progress in Genome Sequencing From http://genomesonline.org
  26. • GEBA • A genomic encyclopedia of bacteria and archaea • At least 40 phyla of bacteria • Genome sequences are mostly from three phyla • Some other phyla are only sparsely sampled • Solution: Really Fill in the Tree. Eisen & Ward, PIs [Figure: same phylum-level tree of Bacteria as slide 14]
  27. GEBA Pilot Project Overview • Identify major branches in rRNA tree for which no genomes are available • Identify branches with a cultured representative in DSMZ • DSMZ grew > 200 of these and prepped DNA • Sequence and finish 100 (covering breadth of bacterial/archaeal diversity) • Annotate, analyze, release data • Assess benefits of tree-guided sequencing • 1st paper: Wu et al. in Nature, Dec 2009
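
The selection steps on slide 27 can be illustrated with a small sketch. The slides do not state how branches were ranked, so the greedy rule below (repeatedly pick the candidate that adds the most branch length to the set already sequenced) is an assumption for illustration only. The tree, the tip names, and the function names are all hypothetical.

```python
from typing import Dict, Iterable, List, Optional, Set, Tuple

# Hypothetical toy tree: node -> (parent, branch length to parent).
# The root has parent None. Real rRNA trees would be parsed from a file.
Tree = Dict[str, Tuple[Optional[str], float]]

TOY_TREE: Tree = {
    "root": (None, 0.0),
    "n1": ("root", 0.1),
    "n2": ("root", 0.3),
    "A": ("n1", 0.2),
    "B": ("n1", 0.25),
    "C": ("n2", 0.5),
    "D": ("n2", 0.05),
}


def path_to_root(tree: Tree, tip: str) -> Set[str]:
    """Nodes whose parent edge lies on the path from tip to the root."""
    nodes: Set[str] = set()
    node = tip
    while tree[node][0] is not None:
        nodes.add(node)
        node = tree[node][0]
    return nodes


def phylogenetic_diversity(tree: Tree, tips: Iterable[str]) -> float:
    """Sum of branch lengths on the union of root-to-tip paths."""
    covered: Set[str] = set()
    for tip in tips:
        covered |= path_to_root(tree, tip)
    return sum(tree[n][1] for n in covered)


def greedy_select(tree: Tree, sequenced: Iterable[str],
                  candidates: Iterable[str], k: int) -> List[str]:
    """Pick up to k candidates, each time taking the largest PD gain."""
    current = set(sequenced)
    pool = set(candidates) - current
    chosen: List[str] = []
    for _ in range(min(k, len(pool))):
        base = phylogenetic_diversity(tree, current)
        # sorted() makes tie-breaking deterministic (alphabetical).
        best = max(sorted(pool),
                   key=lambda t: phylogenetic_diversity(tree, current | {t}) - base)
        chosen.append(best)
        current.add(best)
        pool.remove(best)
    return chosen
```

With the toy tree, `greedy_select(TOY_TREE, {"A"}, {"B", "C", "D"}, 2)` picks `C` first, because it sits on a long unsampled branch, and then `B`.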
  28. GEBA Pilot Project: Components • Project overview (Phil Hugenholtz, Nikos Kyrpides, Jonathan Eisen, Eddy Rubin, Jim Bristow) • Project management (David Bruce, Eileen Dalin, Lynne Goodwin) • Culture collection and DNA prep (DSMZ, Hans-Peter Klenk) • Sequencing and closure (Eileen Dalin, Susan Lucas, Alla Lapidus, Mat Nolan, Alex Copeland, Cliff Han, Feng Chen, Jan-Fang Cheng) • Annotation and data release (Nikos Kyrpides, Victor Markowitz, et al) • Analysis (Dongying Wu, Kostas Mavrommatis, Martin Wu, Victor Kunin, Neil Rawlings, Ian Paulsen, Patrick Chain, Patrik D’Haeseleer, Sean Hooper, Iain Anderson, Amrita Pati, Natalia N. Ivanova, Athanasios Lykidis, Adam Zemla) • Adopt a microbe education project (Cheryl Kerfeld) • Outreach (David Gilbert) • $$$ (DOE, DSMZ, GBMF)
  29. GEBA Lesson 1 rRNA Tree is Useful for Identifying Phylogenetically Novel Organisms
  30. rRNA Tree of Life: Bacteria, Archaea, Eukaryotes. Figure from Barton, Eisen et al. “Evolution”, CSHL Press. Based on tree from Pace NR, 2003.
  31. Network of Life Bacteria Archaea Eukaryotes Figure from Barton, Eisen et al. “Evolution”, CSHL Press. Based on tree from Pace NR, 2003.
  32. “Whole Genome” Tree w/ AMPHORA http://itol.embl.de/ Analogous to method of Ciccarelli et al. See Wu and Eisen, Genome Biology 2008 9: R151 http://bobcat.genomecenter.ucdavis.edu/AMPHORA/
  33. Compare phylogenetic diversity (PD) in rRNA and whole-genome trees (WGT)
  34. PD of rRNA and Genome Trees Is Similar. From Wu et al. 2009, Nature 462, 1056-1060
  35. GEBA Lesson 2 Phylogeny-driven genome selection helps discover new genetic diversity
  36. Network of Life: Bacteria, Archaea, Eukaryotes. Figure from Barton, Eisen et al. “Evolution”, CSHL Press. Based on tree from Pace NR, 2003.
  37. Protein Family Rarefaction Curves • Take data set of multiple complete genomes • Identify all protein families using MCL • Plot # of genomes vs. # of protein families
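
The three steps on slide 37 can be sketched as follows. The sketch assumes the clustering step (MCL in the talk) has already been run and that each genome is reduced to a set of protein-family identifiers. Averaging over random genome orderings is a common way to draw such curves, but it is an assumption here, as are the function and parameter names.

```python
import random
from typing import Dict, List, Set


def rarefaction_curve(families_by_genome: Dict[str, Set[str]],
                      permutations: int = 100,
                      seed: int = 0) -> List[float]:
    """Mean number of distinct protein families after adding 1..N genomes.

    families_by_genome maps a genome name to the set of family IDs it
    contains (assumed to come from a prior clustering step).
    """
    rng = random.Random(seed)  # fixed seed so the curve is reproducible
    genomes = sorted(families_by_genome)
    totals = [0.0] * len(genomes)
    for _ in range(permutations):
        order = genomes[:]
        rng.shuffle(order)
        seen: Set[str] = set()
        for i, genome in enumerate(order):
            seen |= families_by_genome[genome]
            totals[i] += len(seen)
    return [total / permutations for total in totals]
```

Plotting the returned list against 1..N gives the curve of # of genomes vs. # of protein families. A curve that keeps climbing suggests that newly added genomes still contribute families not seen before.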
  38. Synapomorphies exist
  39. Phylogenetic Distribution Novelty: Bacterial Actin Related Protein [Figure: phylogenetic tree of actin-related proteins including Haliangium ochraceum DSM 14365; tip labels and support values not recoverable from the extracted text] Patrik D’haeseleer, Adam Zemla, Victor Kunin. See also Guljamow et al. 2007 Current Biology.
  40. GEBA Lesson 3 Phylogeny-driven genome selection improves genome annotation
  41. Most/All Functional Prediction Improves w/ Better Phylogenetic Sampling • Better definition of protein family sequence “patterns” • Greatly improves “comparative” and “evolutionary” based predictions • Conversion of hypotheticals into conserved hypotheticals • Linking distantly related members of protein families • Improved non-homology prediction. Kostas Mavrommatis, Natalia Ivanova, Thanos Lykidis, Nikos Kyrpides, Iain Anderson
  42. GEBA Lesson 4 Metadata and individual genome papers important
  43. SIGS http://standardsingenomics.org/
  44. GEBA Lesson 5 Phylogeny-driven genome selection improves analysis of metagenome data
  45. Who is out there?
  46. rRNA phylotyping from metagenomics Venter et al., 2004
  47. Shotgun Sequencing Allows Use of Alternative Anchors (e.g., RecA) Venter et al., 2004
  48. Shotgun Sequencing Allows Use of Other Markers [Figure: bar chart, Sargasso Phylotypes; y-axis: Weighted % of Clones (0 to 0.5); x-axis: Major Phylogenetic Group (Alphaproteobacteria, Betaproteobacteria, Gammaproteobacteria, Epsilonproteobacteria, Deltaproteobacteria, Cyanobacteria, Firmicutes, Actinobacteria, Chlorobi, CFB, Chloroflexi, Spirochaetes, Fusobacteria, Deinococcus-Thermus, Euryarchaeota, Crenarchaeota); series: EFG, EFTu, rRNA, RecA, RpoB, HSP70] Venter et al., 2004
  49. Binning challenge [Figure: sequence fragments labeled A to G and T to Z]
  50. Binning challenge • Best binning method: reference genomes [Figure: sequence fragments labeled A to G and T to Z]
  51. Reference Genomes Coming from Select Environment
  52. Binning challenge • No reference genome? What do you do? [Figure: sequence fragments labeled A to G and T to Z]
  53. Binning challenge • No reference genome? What do you do? • Phylogeny .... [Figure: sequence fragments labeled A to G and T to Z]
  54. Phylogenetic Binning Using AMPHORA • AMPHORA - each read on its own tree [Figure: bar chart, axis values 0 to 0.7, one bar group per phylogenetic group (Alphaproteobacteria, Betaproteobacteria, Gammaproteobacteria, Deltaproteobacteria, Epsilonproteobacteria, Unclassified Proteobacteria, Cyanobacteria, Chlamydiae, Acidobacteria, Bacteroidetes, Actinobacteria, Aquificae, Planctomycetes, Spirochaetes, Firmicutes, Chloroflexi, Chlorobi, Unclassified Bacteria); one series per marker gene: frr, tsf, pgk, rplL, rplF, rplP, rplT, rplE, infC, rpsI, rplS, rplA, rplB, rplK, rplC, rpsJ, rplN, rplD, rplM, rpsE, rpsS, rpsB, rpsK, rpsC, rpoB, rpsM, pyrG, nusA, dnaG, rpmA, smpB]
  55. Phylogenetic Binning Using AMPHORA • AMPHORA - each read on its own tree • Limited in past by poor genomic sampling [Figure: same bar chart as slide 54]
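
The bar chart on slides 54 and 55 summarizes, for each marker gene, what fraction of reads fell into each phylogenetic group. A minimal sketch of that tally is below. It assumes each read has already been placed on a reference tree and labeled with a group, which is the part AMPHORA does and which is not reproduced here. The function name and the input format are hypothetical.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, Tuple


def group_fractions(assignments: Iterable[Tuple[str, str]]) -> Dict[str, Dict[str, float]]:
    """Per-marker fraction of reads assigned to each phylogenetic group.

    assignments yields (marker_gene, phylogenetic_group) pairs, one per
    classified read.
    """
    counts: Dict[str, Counter] = defaultdict(Counter)
    for marker, group in assignments:
        counts[marker][group] += 1
    fractions: Dict[str, Dict[str, float]] = {}
    for marker, per_group in counts.items():
        total = sum(per_group.values())
        fractions[marker] = {g: n / total for g, n in per_group.items()}
    return fractions
```

Comparing the per-marker profiles side by side, as the slide does, shows how consistently different markers report the same community composition.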
  56. Metagenomic Analysis Improves w/ Phylogenetic Sampling • Small but real improvements in –Gene identification / confirmation –Functional prediction –Binning –Phylogenetic classification
  57. Metagenomic Analysis Improves w/ Phylogenetic Sampling • Small but real improvements in –Gene identification / confirmation –Functional prediction –Binning –Phylogenetic classification • But not a lot ...
  58. GEBA Future 1 Need to adapt genomic and metagenomic methods to make use of GEBA data
  59. Phylogenetic Binning Using AMPHORA • AMPHORA - each read on its own tree • Improves with better phylogenetic methods [Figure: same bar chart as slide 54]
  60. Improving Phylogeny for Metagenomic Reads • Examples using reference trees – AMPHORA (Wu and Eisen) – pplacer (Erick Matsen) – FastTree (Morgan Price) • Variants – Use concatenated alignment of markers, not just individual genes (Steven Kembel) – Apply to OTU identification, not just classification (Thomas Sharpton) – CoBinning: look for linkage among fragments/genes (Aaron Darling)
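
The "concatenated alignment of markers" variant on slide 60 can be illustrated with a short sketch. It assumes each marker has already been aligned separately and that a taxon missing from a marker is padded with gap characters. That padding convention, the function name, and the input layout are assumptions for illustration, not a description of any specific tool named on the slide.

```python
from typing import Dict, List


def concatenate_alignments(alignments: Dict[str, Dict[str, str]],
                           marker_order: List[str]) -> Dict[str, str]:
    """Join per-marker alignments into one supermatrix row per taxon.

    alignments maps marker name -> {taxon: aligned sequence}. All
    sequences within one marker must have the same length. Taxa that
    lack a marker are filled with '-' for that marker's width.
    """
    taxa = sorted({taxon for aln in alignments.values() for taxon in aln})
    parts: Dict[str, List[str]] = {taxon: [] for taxon in taxa}
    for marker in marker_order:
        aln = alignments[marker]
        widths = {len(seq) for seq in aln.values()}
        if len(widths) != 1:
            raise ValueError(f"marker {marker!r} is not a rectangular alignment")
        width = widths.pop()
        for taxon in taxa:
            parts[taxon].append(aln.get(taxon, "-" * width))
    return {taxon: "".join(chunks) for taxon, chunks in parts.items()}
```

The resulting rows can be handed to a tree-building program as a single alignment, so the signal from several markers is pooled rather than read one gene at a time.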
  61. Phylogenetic Binning Using AMPHORA • AMPHORA - each read on its own tree • Improves with more gene families [Figure: same bar chart as slide 54]
  62. Keep only the families with: Universality * Evenness * Monophyly >= 90*90*90
      Phylogenetic group | Genome number | Gene number | Marker candidates
      Archaea | 62 | 145415 | 102
      Actinobacteria | 63 | 267783 | 136
      Alphaproteobacteria | 94 | 347287 | 142
      Betaproteobacteria | 56 | 266362 | 294
      Gammaproteobacteria | 126 | 483632 | 141
      Deltaproteobacteria | 25 | 102115 | 44
      Epsilonproteobacteria | 18 | 33416 | 446
      Bacteroides | 25 | 71531 | 179
      Chlamydiae | 13 | 13823 | 561
      Chloroflexi | 10 | 33577 | 140
      Cyanobacteria | 36 | 124080 | 532
      Firmicutes | 106 | 312309 | 80
      Spirochaetes | 18 | 38832 | 72
      Thermi | 5 | 14160 | 727
      Thermotogae | 9 | 17037 | 646
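
The filter on slide 62 can be written down directly. The slide does not define how universality, evenness, or monophyly are scored, so the sketch below takes them as precomputed percentages (0 to 100) and applies only the stated cutoff, read literally as a product. The class name, field names, and example families are hypothetical.

```python
from dataclasses import dataclass
from typing import Iterable, List

# Cutoff as written on the slide: product of the three scores >= 90*90*90.
THRESHOLD = 90 * 90 * 90


@dataclass
class FamilyScores:
    name: str
    universality: float  # percent, 0-100 (definition assumed, not from the slide)
    evenness: float      # percent, 0-100 (definition assumed)
    monophyly: float     # percent, 0-100 (definition assumed)


def is_marker_candidate(family: FamilyScores) -> bool:
    """True if the product of the three scores meets the slide's cutoff."""
    product = family.universality * family.evenness * family.monophyly
    return product >= THRESHOLD


def marker_candidates(families: Iterable[FamilyScores]) -> List[str]:
    """Names of the families that pass the filter, in input order."""
    return [f.name for f in families if is_marker_candidate(f)]
```

Because the cutoff is a product, a family can fall slightly below 90 on one score and still pass if the other two are high enough. Requiring each score to be at least 90 individually would be a stricter reading of the same slide.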
  63. Phylogenetic Binning Using AMPHORA • AMPHORA - each read on its own tree • Improves with rebuilding gene family models [Figure: same bar chart as slide 54]
  64. Other Ways to Make Better Use of the Data • Rebuild protein family models • Experiments from across the tree needed • Need better phylogenies, including HGT • Improved tools for using distantly related genomes in metagenomic analysis • Better recording and sharing of metadata about organisms
  65. GEBA Future 2 The dark matter of the biological universe
  66. rRNA Tree of Life: Bacteria, Archaea, Eukaryotes. Figure from Barton, Eisen et al. “Evolution”, CSHL Press. Based on tree from Pace NR, 2003.
  67. Phylogenetic Diversity: Sequenced Bacteria & Archaea From Wu et al. 2009
  68. Phylogenetic Diversity with GEBA From Wu et al. 2009
  69. Phylogenetic Diversity: Isolates From Wu et al. 2009
  70. Phylogenetic Diversity: All From Wu et al. 2009
  71. • At least 40 phyla of bacteria • Genome sequences are mostly from three phyla • Most phyla with cultured species are sparsely sampled • Lineages with no cultured taxa even more poorly sampled [Figure: same phylum-level tree of Bacteria as slide 14; legend: Well sampled phyla, Poorly sampled, No cultured taxa]
  72. • At least 40 phyla of bacteria • Genome sequences are mostly from three phyla • Most phyla with cultured species are sparsely sampled • Lineages with no cultured taxa even more poorly sampled [Figure: same phylum-level tree of Bacteria as slide 14; legend: Well sampled phyla, Poorly sampled, No cultured taxa]
  73. Uncultured Lineages: Technical Approaches • Get into culture • Enrichment cultures • If abundant in low diversity ecosystems • Flow sorting • Microbeads • Microfluidic sorting • Single cell amplification
  74. MICROBES
  75. • GEBA • A genomic encyclopedia of bacteria and archaea • At least 40 phyla of bacteria • Genome sequences are mostly from three phyla • Some other phyla are only sparsely sampled • Solution: Really Fill in the Tree. Eisen & Ward, PIs [Figure: same phylum-level tree of Bacteria as slide 14]
  76. GEBA Pilot Project: Components • Project overview (Phil Hugenholtz, Nikos Kyrpides, Jonathan Eisen, Eddy Rubin, Jim Bristow) • Project management (David Bruce, Eileen Dalin, Lynne Goodwin) • Culture collection and DNA prep (DSMZ, Hans-Peter Klenk) • Sequencing and closure (Eileen Dalin, Susan Lucas, Alla Lapidus, Mat Nolan, Alex Copeland, Cliff Han, Feng Chen, Jan-Fang Cheng) • Annotation and data release (Nikos Kyrpides, Victor Markowitz, et al) • Analysis (Dongying Wu, Kostas Mavrommatis, Martin Wu, Victor Kunin, Neil Rawlings, Ian Paulsen, Patrick Chain, Patrik D’Haeseleer, Sean Hooper, Iain Anderson, Amrita Pati, Natalia N. Ivanova, Athanasios Lykidis, Adam Zemla) • Adopt a microbe education project (Cheryl Kerfeld) • Outreach (David Gilbert) • $$$ (DOE, DSMZ, GBMF)

Editor's Notes

  1. This is a tree of an rRNA gene that was found on a large DNA fragment isolated from Monterey Bay. This rRNA gene groups in a tree with genes from members of the gamma Proteobacteria, a group that includes E. coli as well as many environmental bacteria. This rRNA phylotype has been found to be a dominant species in many ocean ecosystems. clone from the Sargasso Sea. This shows that this
  2. Gets better with more markers, but we do not have lots of sequences for these markers. We can get them from genomes. The more diverse the genomes, the better the marker set will be.