Jonathan Eisen talk on 1$ Genome

Talk given by Jonathan Eisen at ASM General Meeting 2009 in session on "The 1$ Bacterial Genome"

  • Gets better with more markers, but we do not have lots of sequences for these markers. We can get them from genomes. The more diverse the genomes, the better the marker set will be.
  • Selecting phylogenetically diverse genomes increases the probability that one will find new protein families

    1. 1. The 1$ Bacterial Genome: Advances in Bioinformatics Jonathan A. Eisen U. C. Davis Genome Center
    2. 2. The 1$ Bacterial Genome: Oh $^#^ - We’re $&#$ Jonathan A. Eisen U. C. Davis Genome Center
    3. 3. The 1$ Bacterial Genome: Informatics, GEBA and me Jonathan A. Eisen U. C. Davis Genome Center
    4. 4. Outline • GEBA - The JGI Genomic Encyclopedia of Bacteria and Archaea • Insights into the 1$ genome from the GEBA project • Additional insights into the 1$ genome
    5. 5. GEBA: The Genomic Encyclopedia of Bacteria and Archaea • Run by JGI • $$ from DOE • Work by many
    6. 6. As of 2002 • At least 40 phyla of bacteria [Figure: tree of bacterial phyla: Proteobacteria, TM6, OS-K, Acidobacteria, Termite Group, OP8, Nitrospira, Bacteroides, Chlorobi, Fibrobacteres, Marine Group A, WS3, Gemmimonas, Firmicutes, Fusobacteria, Actinobacteria, OP9, Cyanobacteria, Synergistes, Deferribacteres, Chrysiogenetes, NKB19, Verrucomicrobia, Chlamydia, OP3, Planctomycetes, Spirochaetes, Coprothermobacter, OP10, Thermomicrobia, Chloroflexi, TM7, Deinococcus-Thermus, Dictyoglomus, Aquificae, Thermodesulfobacteria, Thermotogae, OP1, OP11. Based on Hugenholtz, 2002]
    7. 7. As of 2002 • At least 40 phyla of bacteria • Genome sequences are mostly from three phyla [Figure: same tree of bacterial phyla, based on Hugenholtz, 2002]
    8. 8. As of 2002 • At least 40 phyla of bacteria • Genome sequences are mostly from three phyla • Some other phyla are only sparsely sampled [Figure: same tree of bacterial phyla, based on Hugenholtz, 2002]
    9. 9. As of 2002 • At least 40 phyla of bacteria • Genome sequences are mostly from three phyla • Some other phyla are only sparsely sampled • Same trend in Archaea [Figure: same tree of bacterial phyla, based on Hugenholtz, 2002]
    10. 10. Need for Tree Guidance Well Established • Common approach within some eukaryotic groups – NHGRI animal projects – FGI at Whitehead – Plant LSP • Phylogenetic gaps in bacterial and archaeal projects commonly lamented in literature, conversations, etc • Many small projects funded to fill in some gaps – DOE/TIGR Sequencing – Multiple CSP projects – Multiple NSF/USDA projects – Private projects (e.g., Integrated Genomics, Diversa)
    11. 11. • At least 40 phyla of bacteria • Genome sequences are mostly from three phyla • Some other phyla are only sparsely sampled • Solution I: sequence more phyla • NSF-funded Tree of Life Project: a genome from each of eight phyla (Eisen, Ward, Badger, Wu, Wu, et al.) [Figure: tree of bacterial phyla as on slide 6]
    12. 12. • At least 100 phyla of bacteria • Genome sequences are mostly from three phyla • Most phyla with cultured species are sparsely sampled • Lineages with no cultured taxa even more poorly sampled • Solution: use tree to really fill gaps [Figure: tree of bacterial phyla as on slide 6, with well sampled phyla marked]
    13. 13. http://www.jgi.doe.gov/programs/GEBA/pilot.html
    14. 14. GEBA Pilot Project Overview • Select 200 organisms using rRNA tree as a guide • Develop high throughput pipeline for strain growth and DNA preparation • Sequence and finish 100 genomes • Annotate, analyze, release data • Assess benefits of tree guided sequencing
    15. 15. GEBA Pilot Target List [Figure: bar chart of the number of genomes (y-axis, 0 to 35) per phylum (x-axis); bacterial phyla are prefixed "B:" and archaeal phyla "A:"]
    16. 16. IMG/GEBA http://img.jgi.doe.gov/cgi-bin/geba/main.cgi
    17. 17. Why Increase Taxonomic Coverage? • Gene discovery • Annotation, functional prediction • Metagenomic analysis • Mechanisms of diversification • Species phylogeny and classification
    18. 18. Phylogenetic Metagenomics
    19. 19. Non-Homology Predictions: Phylogenetic Profiling • Step 1: Search all genes in organisms of interest against all other genomes • Ask: Yes or No, is each gene found in each other species • Cluster genes by distribution patterns (profiles)
    20. 20. GEBA Lesson 1 Tree of Life is a Useful Guide
    21. 21. rRNA Tree of Life
    22. 22. GEBA Lesson 2 We have still only scratched the surface of microbial diversity
    23. 23. Phylogenetic Diversity: Sequenced Bacteria & Archaea
    24. 24. Phylogenetic Diversity with GEBA
    25. 25. Phylogenetic Diversity: GreenGenes
    26. 26. Viruses Too
    27. 27. First Bacterial Actin Related Protein: Haliangium ochraceum DSM 14365 • First found by V. Kunin • Structure analysis by Patrik D. et al
    28. 28. GEBA Lesson 3 Need Experiments from Across the Tree of Life too
    29. 29. As of 2002 • At least 40 phyla of bacteria [Figure: tree of bacterial phyla as on slide 6, based on Hugenholtz, 2002]
    30. 30. As of 2002 • At least 40 phyla of bacteria • Experimental studies are mostly from three phyla [Figure: tree of bacterial phyla as on slide 6, based on Hugenholtz, 2002]
    31. 31. As of 2002 • At least 40 phyla of bacteria • Experimental studies are mostly from three phyla • Some studies in other phyla [Figure: tree of bacterial phyla as on slide 6, based on Hugenholtz, 2002]
    32. 32. Need experimental studies from across the tree too [Figure: tree of bacterial phyla as on slide 6]
    33. 33. GEBA Lesson 4 The Importance of Project Management
    34. 34. GEBA Project Flowchart [Figure: flowchart; box labels include Project Initiation, Sequencing Proposal, Scientific and Technical Review (1), Negotiate Scope of Work, Receive Starting Material (1), Draft Sequencing and Assembly (1), Gene-QA (1), Shotgun Genome GenBank Submission (1), IMG-ER (1), Finish Sequencing and Draft Assembly (2), Annotation (3), Complete Genome GenBank Submission (1), and several "OK?" decision points] Footnotes: 1 PGF; 2 LANL (David Bruce, Lynne Goodwin et al); 3 ORNL
    35. 35. GEBA Lesson 5 The Importance of Culture (Collections that is)
    36. 36. GEBA Biggest Challenge: Getting DNA • Getting quality DNA is the biggest bottleneck • Solution: Beg, Borrow and Steal • DSMZ offered to do this for free • ATCC is doing a small number for a fee • In discussions with PCC and other collections
    37. 37. Quantification gel of the genomic DNA isolated from Conexibacter woesei (DSM 14684T) [Figure: gel image, lanes 1 to 16] • Lane 1: c(λ-Marker) = 15 ng • Lane 2: c(λ-Marker) = 30 ng • Lane 3: c(λ-Marker) = 50 ng • Lane 4: DNA Molecular Weight Marker II (Roche 236250) • Lane 5: DSM 13279, Collinsella stercoris • Lane 6: DSM 43043, Intrasporangium calvum • Lane 7: DSM 18053, Dyadobacter fermentans • Lane 8: DSM 20476, Slackia heliotrinireducens • Lane 9: DSM 18081, Patulibacter minatonensis • Lane 10: DSM 14684, Conexibacter woesei • Lane 11: DSM 11002, Dethiosulfovibrio peptidovorans • Lane 12: DSM 11551, Halogeometricum borinquense • Lane 13: DNA Molecular Weight Marker II (Roche 236250) • Lane 14: c(λ-Marker) = 125 ng • Lane 15: c(λ-Marker) = 250 ng • Lane 16: c(λ-Marker) = 500 ng. Conexibacter woesei (DSM 14684T) was taken from the German Collection of Microorganisms and Cell Cultures (DSMZ). The genomic DNA was isolated using the Qiagen Genomic 500 DNA Kit (Qiagen 10262). The genomic DNA was 10-250 kb in size as determined by Pulsed Field Gel Electrophoresis (PFGE). The bulk of DNA had a size of 50-250 kb (see attached PFGE image). The DNA concentration is 500 ng/µl as estimated from the gel. Spectrophotometric measurements yielded a DNA concentration of 450 µg/ml; 300 µl of genomic DNA are shipped (150 µg).
    38. 38. Related Lesson 1 METADATA ROCKS
    39. 39. SIGS • The Genomic Standards Consortium • The GSC is an open-membership working body which formed in September 2005. • The goal of this international community is to promote mechanisms that standardize the description of genomes and the exchange and integration of genomic data. • See http://gensc.org/gc_wiki/index.php/Main_Page
    40. 40. Related Lesson 2 Completeness Matters
    41. 41. Completeness • Final quality of genome sequence influences what one can do with the data • Why completeness (closed, high quality) is important – Gene presence/absence – Gene order – Genome rearrangements – Identifying islands • See “The Value of Complete Microbial Genome Sequencing (You Get What You Pay For).” Fraser et al. J. Bact. 2002.
    42. 42. StrpB vs. StrpA [Figure: plot of a single data series (Series1); one axis runs from 0 to 2500, the other from 13621300 to 13623100]
    43. 43. Mauve, Artemis
    44. 44. Additional Lessons • Computational methods need to be more automated • Need to limit analyses to subsets of all available data • Need for people to help interpret and study data is increasing not decreasing • Sequence is just the beginning • Need to train more students
    45. 45. MICROBES
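
Slide 14 proposes selecting organisms using the rRNA tree as a guide, and the slide notes say that phylogenetically diverse genomes are the goal. One way to picture that idea is a greedy pick that maximizes rooted phylogenetic diversity (the total branch length covered by the chosen taxa). This is only an illustrative sketch and not the GEBA selection procedure: the toy tree, its branch lengths, and the function names are all hypothetical.

```python
# Hypothetical toy tree: node -> (parent, branch length to parent).
# The root has no entry. Real work would start from an rRNA tree.
tree = {
    "A": ("n1", 1.0),
    "B": ("n1", 1.0),
    "n1": ("root", 0.5),
    "C": ("n2", 2.0),
    "D": ("n2", 0.2),
    "n2": ("root", 3.0),
}


def path_to_root(tree, leaf):
    """Branches (named by their child node) on the path from leaf to root."""
    branches = set()
    node = leaf
    while node in tree:
        branches.add(node)
        node = tree[node][0]
    return branches


def phylogenetic_diversity(tree, taxa):
    """Sum of branch lengths covered by the root-to-leaf paths of taxa."""
    covered = set()
    for taxon in taxa:
        covered |= path_to_root(tree, taxon)
    return sum(tree[branch][1] for branch in covered)


def greedy_select(tree, candidates, k):
    """Repeatedly add the candidate that yields the largest total diversity."""
    chosen = []
    remaining = sorted(candidates)
    for _ in range(min(k, len(remaining))):
        best = max(remaining, key=lambda t: phylogenetic_diversity(tree, chosen + [t]))
        chosen.append(best)
        remaining.remove(best)
    return chosen


picked = greedy_select(tree, ["A", "B", "C", "D"], 2)
print(picked, phylogenetic_diversity(tree, picked))
```

In this toy tree the second pick comes from the other side of the root rather than from the sister of the first pick, which is the intuition behind tree-guided sequencing.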
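
The phylogenetic profiling steps on slide 19 (search each gene against other genomes, record yes or no, cluster genes by distribution pattern) can be sketched roughly as below. The gene names, genome names, and hit sets are made-up toy data, and grouping by exactly identical profiles is the simplest possible clustering, assumed here for illustration; real analyses typically use a similarity measure between profiles.

```python
from collections import defaultdict

# Hypothetical toy input: for each gene in the organism of interest, the set
# of other genomes in which a match was found (the "yes" answers).
hits = {
    "geneA": {"genome1", "genome3"},
    "geneB": {"genome1", "genome3"},
    "geneC": {"genome2"},
    "geneD": {"genome1", "genome2", "genome3"},
}
genomes = ["genome1", "genome2", "genome3"]


def build_profiles(hits, genomes):
    """Turn hit sets into presence/absence tuples, one per gene."""
    return {
        gene: tuple(int(genome in found) for genome in genomes)
        for gene, found in hits.items()
    }


def cluster_by_profile(profiles):
    """Group genes whose presence/absence profiles are identical."""
    clusters = defaultdict(list)
    for gene, profile in sorted(profiles.items()):
        clusters[profile].append(gene)
    return dict(clusters)


profiles = build_profiles(hits, genomes)
clusters = cluster_by_profile(profiles)
for profile, genes in clusters.items():
    print(profile, genes)
```

Genes that share a profile, such as geneA and geneB here, are candidates for a functional link even without sequence similarity to each other, which is why the slide files this under non-homology predictions.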
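
Slide 37 reports two concentration estimates for the shipped DNA (500 ng/µl from the gel, 450 µg/ml from spectrophotometry) and a shipped volume of 300 µl. The short calculation below converts each to a total mass; the function name is hypothetical. It suggests that the quoted 150 µg corresponds to the gel estimate.

```python
def dna_mass_ug(concentration_ng_per_ul, volume_ul):
    """Total DNA mass in micrograms from concentration (ng/µl) and volume (µl)."""
    return concentration_ng_per_ul * volume_ul / 1000.0


volume_ul = 300
gel_estimate_ug = dna_mass_ug(500, volume_ul)      # gel estimate: 500 ng/µl
spectro_estimate_ug = dna_mass_ug(450, volume_ul)  # 450 µg/ml equals 450 ng/µl
print(gel_estimate_ug, spectro_estimate_ug)
```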
