
Eisen.Geba.Jgi2009b

Talk I gave at the JGI User Meeting 2009.

Published in: Business, Technology

  1. 1. GEBA: A Genomic Encyclopedia of Bacteria and Archaea. Jonathan A. Eisen, JGI User Meeting 2009
  2. 2. “Nothing in biology makes sense except in the light of evolution.” T. Dobzhansky (1973)
  3. 4. rRNA Tree of Life
  4. 5. The Tree is not Happy
  5. 6. From http://genomesonline.org
  6. 7. [Figure: tree of bacterial phyla, as of 2002, based on Hugenholtz, 2002. Phylum labels: Acidobacteria, Bacteroides, Fibrobacteres, Gemmimonas, Verrucomicrobia, Planctomycetes, Chloroflexi, Proteobacteria, Chlorobi, Firmicutes, Fusobacteria, Actinobacteria, Cyanobacteria, Chlamydia, Spirochaetes, Deinococcus-Thermus, Aquificae, Thermotogae, TM6, OS-K, Termite Group, OP8, Marine Group A, WS3, OP9, NKB19, OP3, OP10, TM7, OP1, OP11, Nitrospira, Synergistes, Deferribacteres, Thermodesulfobacteria, Chrysiogenetes, Thermomicrobia, Dictyoglomus, Coprothermobacter]
      • At least 40 phyla of bacteria
  7. 8. [Figure: tree of bacterial phyla, as of 2002, based on Hugenholtz, 2002; same phylum labels as slide 7]
      • At least 40 phyla of bacteria
      • Genome sequences are mostly from three phyla
  8. 9. [Figure: tree of bacterial phyla, as of 2002, based on Hugenholtz, 2002; same phylum labels as slide 7]
      • At least 40 phyla of bacteria
      • Genome sequences are mostly from three phyla
      • Some other phyla are only sparsely sampled
  9. 10. [Figure: tree of bacterial phyla, as of 2002, based on Hugenholtz, 2002; same phylum labels as slide 7]
      • At least 40 phyla of bacteria
      • Genome sequences are mostly from three phyla
      • Some other phyla are only sparsely sampled
      • Same trend in Archaea
  10. 11. Need for Tree Guidance Well Established
      • Common approach within some eukaryotic groups
      • Many small projects funded to fill in some bacterial or archaeal gaps
      • Phylogenetic gaps in bacterial and archaeal projects commonly lamented in literature
  11. 12. [Figure: tree of bacterial phyla; same phylum labels as slide 7. Credit: Eisen, Ward, Badger, Wu, Wu, et al.]
      • At least 40 phyla of bacteria
      • Genome sequences are mostly from three phyla
      • Some other phyla are only sparsely sampled
      • Solution I: sequence more phyla
      • NSF-funded Tree of Life Project
      • A genome from each of eight phyla
  12. 13. Bacterial aTOL Project Aims
      • Improve resolution of deep branches in the bacterial tree
      • Launch biological studies of these phyla and discover functional novelty
      • Leverage data for interpreting environmental surveys
  13. 14. T. roseum genome
  14. 15. The Tree of Life is Still Angry
  15. 16. Within-Phyla Diversity Immense
      • Each phylum represents billions of years of evolution
      • Some have hundreds of major lineages
      • New lineages are being discovered all the time
      • Most branches within most phyla have few or no genomes
  16. 17. Major Lineages of Actinobacteria
  17. 18. Additional Impetus for Tree-Guided Projects
      • Suggestion to sequence all bacteria and archaea in Bergey’s Manual (Stevens et al.)
      • Success in sequencing genomes from across the tree in animals
      • Multiple government reports suggest a more systematic approach to sequencing is needed
  18. 19. [Figure: tree of bacterial phyla with "Well sampled phyla" highlighted; same phylum labels as slide 7]
      • At least 100 phyla of bacteria
      • Genome sequences are mostly from three phyla
      • Most phyla with cultured species are sparsely sampled
      • Lineages with no cultured taxa even more poorly sampled
      • Solution - use tree to really fill gaps
  19. 20. http://www.jgi.doe.gov/programs/GEBA/pilot.html
  20. 21. GEBA Pilot Project Overview
      • Select 200 organisms using tree
      • Develop high throughput pipeline for strain growth and DNA preparation
      • Sequence and finish 100
      • Annotate, analyze, release data
      • Assess benefits of tree guided sequencing
  21. 22. GEBA Pilot I: Selecting Targets
  22. 28. GEBA Pilot II: The Importance of Project Management
  23. 29. GEBA Project Flowchart (David Bruce, Lynne Goodwin et al.)
      • Site key: 1 = PGF, 2 = LANL, 3 = ORNL
      • Stages: Project Initiation, Sequencing, Annotation
      • Boxes (in transcript order, site number in parentheses): GEBA Proposal; Scientific and Technical Review (1); Negotiate Scope of Work; Receive Starting Material (1); OK?; Draft Sequencing and Assembly (1); Finish Sequencing and Assembly (2); IMG (1); Finish Annotation (3); Complete Genome GenBank Submission (1); Draft Annotation (3); Shotgun Genome GenBank Submission (1); IMG-ER (1); OK?; OK?; IMG-ER (1); Gene-QA (1)
  24. 30. GEBA Pilot III: Partnership with DSMZ
  25. 31. GEBA Biggest Challenge: Getting DNA
      • Getting quality DNA is the biggest bottleneck
      • Solution: Beg, Borrow and Steal
      • DSMZ offered to do it for free
      • ATCC is doing a small number for a fee
      • In discussions with PCC and other collections
  26. 33. Microorganisms
      Quantification gel of the genomic DNA isolated from Conexibacter woesei (DSM 14684T)
      Conexibacter woesei (DSM 14684T) was taken from the German Collection of Microorganisms and Cell Cultures (DSMZ). The genomic DNA was isolated using the Qiagen Genomic 500 DNA Kit (Qiagen 10262). The genomic DNA was 10-250 kb in size as determined by Pulsed Field Gel Electrophoresis (PFGE). The bulk of the DNA had a size of 50-250 kb (see attached PFGE image). The DNA concentration is 500 ng/µl as estimated from the gel. Spectrophotometric measurements yielded a DNA concentration of 450 µg/ml; 300 µl of genomic DNA are shipped (150 µg).
      • Lane 1: c(λ-Marker) = 15 ng
      • Lane 2: c(λ-Marker) = 30 ng
      • Lane 3: c(λ-Marker) = 50 ng
      • Lane 4: DNA Molecular Weight Marker II (Roche 236250)
      • Lane 5: DSM 13279, Collinsella stercoris
      • Lane 6: DSM 43043, Intrasporangium calvum
      • Lane 7: DSM 18053, Dyadobacter fermentans
      • Lane 8: DSM 20476, Slackia heliotrinireducens
      • Lane 9: DSM 18081, Patulibacter minatonensis
      • Lane 10: DSM 14684, Conexibacter woesei
      • Lane 11: DSM 11002, Dethiosulfovibrio peptidovorans
      • Lane 12: DSM 11551, Halogeometricum borinquense
      • Lane 13: DNA Molecular Weight Marker II (Roche 236250)
      • Lane 14: c(λ-Marker) = 125 ng
      • Lane 15: c(λ-Marker) = 250 ng
      • Lane 16: c(λ-Marker) = 500 ng
  27. 34. GEBA Pilot IV: Sequencing, Annotation, Data Release
  28. 35. Current Status
      • >100 in progress
      • GEBA 56 (focus of first paper)
          • 34 finished genomes
          • 55 submitted to GenBank
          • Released to IMG-GEBA page and JGI-FTP site
      • All data are completely open for anyone to use
  29. 36. IMG/GEBA http://img.jgi.doe.gov/cgi-bin/geba/main.cgi
  30. 37. Adopt a Microbe
  31. 38. GEBA Pilot IV: Assess Benefits of GEBA56
      All genomes have some value. But what, if any, is the benefit of tree-guided sequencing over other selection methods?
  32. 39. Why Increase Taxonomic Coverage II?
      • Gene discovery
      • Annotation, functional prediction
      • Metagenomic analysis
      • Mechanisms of diversification
      • Species phylogeny and classification
  33. 41. Value of diverse genomes I: Gene discovery
      • Premise:
          • New genomes frequently contain genetic novelty
          • Phylogenetic diversity of a genome should be correlated to novelty
      • Caveat:
          • Does lateral gene transfer wipe out contribution of phylogenetic diversity to novelty?
  34. 42. Protein Family Rarefaction Curves
      • Take data set of multiple complete genomes
      • Identify all protein families using MCL
      • Plot # of genomes vs. # of protein families
  35. 44. [Figure. Axis labels: Genome Number; Total Gene Number; Number of proteins. Series: S. agalactiae; Enterobacteriaceae; Actinobacteria; Bacteria from GEBA project]
  36. 45. Novelty 2 - Structural Novelty
      • Of the 17,000 protein families in the GEBA56, 1,800 are novel in sequence (Wu)
      • Structural modeling suggests many are structurally novel too (D'haeseleer)
      • 372 being crystallized by the PSI (Kerfeld)
  37. 46. Novelty 3 - Diversity within known families
  38. 47. Transporter Profiles: Sebaldella termitidis ATCC 33386 has twice the number of sugar PTS transporters of any other genome
  39. 48. Novelty 4 - Unusual distribution patterns
  40. 49. Shotgun Sequencing Detects More Diversity than PCR-methods
  41. 50. First Bacterial Actin-Related Protein. First found by V. Kunin; structure analysis by Patrik D. et al.
  42. 51. Most Closely Related to ARP8
  43. 52. Value of 100 diverse genomes II: Annotation
      • Premise:
          • Increased phylogenetic coverage should improve our ability to annotate genes in other genomes (e.g., reference/model genomes)
  44. 53. Annotation Improves
      • Conversion of hypotheticals into conserved hypotheticals
      • Linking distantly related members of protein families
      • Non-homology functional prediction methods
  45. 54. Linking Protein Families Improved
  46. 55. Fusion Based Predictions Improved
  47. 56. Improving Rosetta Stone Predictions
  48. 57. Value of 100 diverse genomes III: Metagenomics
      • Premise:
          • Increased sampling of diverse genomes should improve many aspects of metagenomic analysis
      • To test:
          • Annotation
          • Binning
  49. 58. Metagenomic Annotation Improves (Slightly)
  50. 59. Compositional Binning Improves (Slightly)
  51. 60. Phylogenetic Binning Improves Slightly
  52. 61. Value of 100 diverse genomes V: Phylogeny
  53. 62. 16S Says Hyphomonas Is in Rhodobacterales (Badger et al. 2005)
  54. 63. WGT Says It's Related to Caulobacterales (Badger et al. 2005)
  55. 66. GEBA - After the Pilot
  56. 67. PD of sequenced organisms
  57. 68. PD with GEBA
  58. 70. [Figure: tree of bacterial phyla; same phylum labels as slide 7. Legend: Well sampled phyla; Poorly sampled; No cultured taxa]
      • At least 40 phyla of bacteria
      • Genome sequences are mostly from three phyla
      • Most phyla with cultured species are sparsely sampled
      • Lineages with no cultured taxa even more poorly sampled
  59. 71. [Figure: tree of bacterial phyla, as of 2002, based on Hugenholtz, 2002; same phylum labels as slide 7]
      • At least 40 phyla of bacteria
      • Genome sequences are mostly from three phyla
      • Some other phyla are only sparsely sampled
      • Same trend in Viruses
  60. 72. [Figure: tree of bacterial phyla, as of 2002, based on Hugenholtz, 2002; same phylum labels as slide 7]
      • At least 40 phyla of bacteria
      • Genome sequences are mostly from three phyla
      • Some other phyla are only sparsely sampled
      • Same trend in Microbial Eukaryotes
  61. 73. [Figure: tree of bacterial phyla, scale bar 0.1; same phylum labels as slide 7. Tree based on Hugenholtz (2002) with some modifications.]
      • Need experimental studies from across the tree too
  62. 75. MICROBES
  63. 76. A Happy Tree of Life
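
To illustrate the procedure outlined on slide 42 (Protein Family Rarefaction Curves), here is a minimal Python sketch. It assumes the protein families have already been identified (for example with MCL, which is not run here) and that each genome is given as a set of family identifiers. The function name, the toy data, and the averaging over random genome orderings are assumptions made for illustration, not details taken from the talk.

```python
import random
from typing import Dict, List, Set


def rarefaction_curve(
    genome_families: Dict[str, Set[str]],
    n_permutations: int = 100,
    seed: int = 0,
) -> List[float]:
    """Mean number of distinct protein families after adding 1..N genomes.

    genome_families maps a genome name to the set of protein-family IDs
    found in it (the clustering step is assumed to be done already).
    Averaging over random genome orders is an assumption of this sketch.
    """
    rng = random.Random(seed)
    genomes = sorted(genome_families)
    totals = [0] * len(genomes)
    for _ in range(n_permutations):
        order = genomes[:]
        rng.shuffle(order)
        seen: Set[str] = set()
        for i, genome in enumerate(order):
            seen |= genome_families[genome]
            totals[i] += len(seen)
    return [total / n_permutations for total in totals]


# Hypothetical toy data: two similar genomes and one distant genome.
toy = {
    "genome_A": {"fam1", "fam2", "fam3"},
    "genome_B": {"fam2", "fam3", "fam4"},
    "genome_C": {"fam5", "fam6", "fam7"},
}
curve = rarefaction_curve(toy, n_permutations=200)
for n_genomes, n_families in enumerate(curve, start=1):
    print(n_genomes, round(n_families, 2))
```

A curve that keeps climbing as genomes are added suggests that each new genome still contributes new families; a curve that flattens suggests the sampled group is close to saturated.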
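
Slides 67 and 68 compare the PD of sequenced organisms with and without GEBA. The transcript does not define PD; the sketch below assumes it means phylogenetic diversity, measured as the total branch length of the rooted subtree that connects a set of taxa to the root. The tree encoding, the function name, and the toy branch lengths are hypothetical and serve only to show how adding a distant lineage raises the total.

```python
from typing import Dict, Iterable, Optional, Tuple

# Each node maps to (parent, branch length to parent); the root has parent None.
Tree = Dict[str, Tuple[Optional[str], float]]


def phylogenetic_diversity(tree: Tree, taxa: Iterable[str]) -> float:
    """Sum of branch lengths on the paths from the given taxa to the root.

    Shared branches are counted once. This rooted definition is an
    assumption of the sketch, not something stated in the slides.
    """
    counted = set()
    total = 0.0
    for taxon in taxa:
        node: Optional[str] = taxon
        while node is not None and node not in counted:
            parent, length = tree[node]
            counted.add(node)
            total += length
            node = parent
    return total


# Hypothetical toy tree: A and B are close relatives, C is a deep branch.
toy_tree: Tree = {
    "root": (None, 0.0),
    "n1": ("root", 1.0),
    "A": ("n1", 1.0),
    "B": ("n1", 2.0),
    "C": ("root", 5.0),
}
pd_before = phylogenetic_diversity(toy_tree, ["A", "B"])
pd_after = phylogenetic_diversity(toy_tree, ["A", "B", "C"])
print(pd_before, pd_after)
```

In this toy case, adding the single deep-branching taxon C contributes more branch length than A and B together, which is the intuition behind choosing sequencing targets from the tree.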
