
Phylogenomics, Microbes, Yada Yada Yada - Talk by Jeisen at JCVI 1/18/11

Talk by Jonathan Eisen of UC Davis at the J. Craig Venter Institute 1/18/11.

Published in: Education

  1. 1. Phylogenomics and the Diversity and Diversification of Microbes. Jonathan A. Eisen, UC Davis. JCVI West, January 18, 2010
  2. 2. My Obsessions. Jonathan A. Eisen, UC Davis. JCVI West, January 18, 2010
  3. 3. Social Networking in Science [New York Times web page, Health section, Sunday, April 1, 2007] Scientist Reveals Secret of the Ocean: It’s Him. By NICHOLAS WADE. Published: April 1, 2007. Maverick scientist J. Craig Venter has done it again. It was just a few years ago that Dr. Venter announced that the human genome sequenced by Celera Genomics was in fact, mostly his own. And now, Venter has revealed a second twist in his genomic self-examination. Venter was discussing his Global Ocean Voyage, in which he used his personal yacht to collect ocean water samples from around the world. He then used large filtration units to collect microbes from the water samples which were then brought back to his high tech lab in Rockville, MD where he used the same methods that were used to sequence the human genome to study the genomes of the 1000s of ocean dwelling microbes found in each sample. In discussing the sampling methods, Venter let slip his latest attack on the standards of science: some of the samples were in fact not from the ocean, but were from microbial habitats in and on his body. “The human microbiome is the next frontier,” Dr. Venter said. “The ocean voyage was just a cover. My main goal has always been to work on the microbes that live in and on people. And now that my genome is nearly complete, why not use myself as the model for human microbiome studies as well.” It is certainly true that in the last few years, the microbes that live in and on people have become a hot research topic. So hot that the same people who were involved in the race to sequence the human
  4. 4. Bacteria evolve
  5. 5. Phylogenomics of Novelty • Origin of New Functions and Processes: from within (new genes, changes in old genes, changes in pathways); from outside (lateral transfer, symbioses, communities) • Causes and Effects of Variation in Processes: causes (mutation rates, repair and recombination processes, recombination rates); effects (evolvability, ecology) • Species Evolution: genome evolution; phylogenetic history; vertical vs. horizontal descent; needed to track gain/loss of processes, infer convergence
  6. 6. Simpler Description • How do new functions originate in microbes? • How do these processes vary both within and between species? • What are the effects of this variation in evolvability on biology, ecology, etc.?
  7. 7. Examples (Blasts from the TIGR past)
  8. 8. Wolbachia pipientis wMel • Wolbachia are obligate, maternally transmitted intracellular symbionts • Wolbachia infect many invertebrate species – Many cause male specific deleterious effects – Model system for studying sex ratio changes in hosts – Some are mutualistic (e.g., in filarial nematodes) • wMel selected as model system because it infects Drosophila melanogaster
  9. 9. Genome Completed Wu et al., PLoS Biology 2004
  10. 10. Wolbachia Overrun by Mobile Elements (Wu et al., PLoS Biology 2004)
      Repeat class | Size (median) | Copies | Protein motifs/families | IS family | Possible terminal inverted repeat sequence
      1 | 1512 | 3 | Transposase | IS4 | 5’ ATACGCGTCAAGTTAAG 3’
      2 | 360 | 12 | - | New | 5’ GGCTTTGTTGCATCGCTA 3’
      3 | 858 | 9 | Transposase | IS492/IS110 | 5’ GGCTTTGTTGCAT 3’
      4 | 1404.5 | 4 | Conserved hypothetical, phage terminase | New | 5’ ATACCGCGAWTSAWTCGCGGTAT 3’
      5 | 1212 | 15 | Transposase | IS3 | 5’ TGACCTTACCCAGAAAAAGTGGAGAGAAAG 3’
      6 | 948 | 13 | Transposase | IS5 | 5’ AGAGGTTGTCCGGAAACAAGTAAA 3’
      7 | 2405.5 | 8 | RT/maturase | - |
      8 | 468 | 45 | - | - |
      9 | 817 | 3 | Conserved hypothetical, transposase | ISBt12 |
      10 | 238 | 2 | ExoD | - |
      11 | 225 | 2 | RT/maturase | - |
      12 | 1263 | 4 | Transposase | ??? |
      13 | 572.5 | 2 | Transposase | ??? | None detected
      14 | 433 | 2 | Ankyrin | - |
      15 | 201 | 2 | - | - |
      16 | 1400 | 6 | RT/maturase | - |
      17 | 721 | 2 | Transposase | IS630 |
      18 | 1191.5 | 2 | EF-Tu | - |
      19 | 230 | 2 | Hypothetical | - |
  11. 11. Glassy Winged Sharpshooter • Obligate xylem feeder • Can transmit Pierce’s Disease agent • Potential bioterror agent • Needs to get amino acids and other nutrients from symbionts, as aphids do
  12. 12. Sulcia makes amino acids. Baumannia makes vitamins and cofactors. Wu et al. 2006 PLoS Biology 4: e188. Collaboration with Nancy Moran’s Lab
  13. 13. Higher Evolutionary Rates in Clade. Wu et al. 2006 PLoS Biology 4: e188. Collaboration with Nancy Moran’s Lab
  14. 14. Variation in Evolution Rates [Figure: MutS and MutL marked + or _ across lineages.] Wu et al. 2006 PLoS Biology 4: e188. Collaboration with Nancy Moran’s Lab
  15. 15. Evolution and Genome Processing • Probably exists as a defense mechanism • Analogous to RIPPING and heterochromatin silencing • Presence of repetitive DNA in MAC but not TEs suggests the mechanism involves targeting foreign DNA • Thus, unlike RIPPING, ciliate processing does not limit diversification by duplication. Eisen et al. 2006. PLoS Biology.
  16. 16. The X-Files [Six plot panels; titles: Pseudomonas; Streps; B. subt vs. Staph; M. tb vs. M. leprae (axes: Mycobacterium tuberculosis, Mycobacterium leprae); Pyrococcus; Thermoplasmas]
  17. 17. Data for “Phylogenomics of Novelty” studies?
  18. 18. Fleischmann et al. 1995
  19. 19. From http://genomesonline.org
  20. 20. From http://genomesonline.org
  21. 21. As of 2002
  22. 22. As of 2002 • At least 40 phyla of bacteria [Tree figure; phylum labels: Proteobacteria, TM6, OS-K, Acidobacteria, Termite Group, OP8, Nitrospira, Bacteroides, Chlorobi, Fibrobacteres, Marine GroupA, WS3, Gemmimonas, Firmicutes, Fusobacteria, Actinobacteria, OP9, Cyanobacteria, Synergistes, Deferribacteres, Chrysiogenetes, NKB19, Verrucomicrobia, Chlamydia, OP3, Planctomycetes, Spirochaetes, Coprothermobacter, OP10, Thermomicrobia, Chloroflexi, TM7, Deinococcus-Thermus, Dictyoglomus, Aquificae, Thermodesulfobacteria, Thermotogae, OP1, OP11.] Based on Hugenholtz, 2002
  23. 23. As of 2002 • At least 40 phyla of bacteria • Genome sequences are mostly from three phyla [Phylum tree as in slide 22.] Based on Hugenholtz, 2002
  24. 24. As of 2002 • At least 40 phyla of bacteria • Genome sequences are mostly from three phyla • Some other phyla are only sparsely sampled [Phylum tree as in slide 22.] Based on Hugenholtz, 2002
  25. 25. As of 2002 • At least 40 phyla of bacteria • Genome sequences are mostly from three phyla • Some other phyla are only sparsely sampled [Phylum tree as in slide 22.] Based on Hugenholtz, 2002
  26. 26. Need for Tree Guidance Well Established • Common approach within some eukaryotic groups • Many small projects funded to fill in some bacterial or archaeal gaps • Phylogenetic gaps in bacterial and archaeal projects commonly lamented in literature
  27. 27. • NSF-funded Tree of Life Project • A genome from each of eight phyla (Eisen, Ward, Robb, Nelson, et al) • At least 40 phyla of bacteria • Genome sequences are mostly from three phyla • Some other phyla are only sparsely sampled • Solution I: sequence more phyla [Phylum tree as in slide 22.]
  28. 28. Organisms Selected
      Phylum | Species selected
      Chrysiogenes | Chrysiogenes arsenatis (GCA)
      Coprothermobacter | Coprothermobacter proteolyticus (GCBP)
      Dictyoglomi | Dictyoglomus thermophilum (GDT)
      Thermodesulfobacteria | Thermodesulfobacterium commune (GTC)
      Nitrospirae | Thermodesulfovibrio yellowstonii (GTY)
      Thermomicrobia | Thermomicrobium roseum (GTR)
      Deferribacteres | Geovibrio thiophilus (GGT)
      Synergistes | Synergistes jonesii (GSJ)
  29. 29. Major Lineages of Actinobacteria
      2.5 Actinobacteria
      2.5.1 Acidimicrobidae
        2.5.1.1 Unclassified
        2.5.1.2 “Microthrixineae”
        2.5.1.3 Acidimicrobineae (2.5.1.3.1 Unclassified; 2.5.1.3.2 Acidimicrobiaceae)
        2.5.1.4 BD2-10
        2.5.1.5 EB1017
      2.5.2 Actinobacteridae
        2.5.2.1 Unclassified
        2.5.2.10 Ellin306/WR160
        2.5.2.11 Ellin5012
        2.5.2.12 Ellin5034
        2.5.2.13 Frankineae (2.5.2.13.1 Unclassified; 2.5.2.13.2 Acidothermaceae; 2.5.2.13.3 Ellin6090; 2.5.2.13.4 Frankiaceae; 2.5.2.13.5 Geodermatophilaceae; 2.5.2.13.6 Microsphaeraceae; 2.5.2.13.7 Sporichthyaceae)
        2.5.2.14 Glycomyces
        2.5.2.15 Intrasporangiaceae (2.5.2.15.1 Unclassified; 2.5.2.15.2 Dermacoccus; 2.5.2.15.3 Intrasporangiaceae)
        2.5.2.16 Kineosporiaceae
        2.5.2.17 Microbacteriaceae (2.5.2.17.1 Unclassified; 2.5.2.17.2 Agrococcus; 2.5.2.17.3 Agromyces)
        2.5.2.18 Micrococcaceae
        2.5.2.19 Micromonosporaceae
        2.5.2.2 Actinomyces
        2.5.2.20 Propionibacterineae (2.5.2.20.1 Unclassified; 2.5.2.20.2 Kribbella; 2.5.2.20.3 Nocardioidaceae; 2.5.2.20.4 Propionibacteriaceae)
        2.5.2.21 Pseudonocardiaceae
        2.5.2.22 Streptomycineae (2.5.2.22.1 Unclassified; 2.5.2.22.2 Kitasatospora; 2.5.2.22.3 Streptacidiphilus)
        2.5.2.23 Streptosporangineae (2.5.2.23.1 Unclassified; 2.5.2.23.2 Ellin5129; 2.5.2.23.3 Nocardiopsaceae; 2.5.2.23.4 Streptosporangiaceae; 2.5.2.23.5 Thermomonosporaceae)
        2.5.2.3 Actinomycineae
        2.5.2.4 Actinosynnemataceae
        2.5.2.5 Bifidobacteriaceae
        2.5.2.6 Brevibacteriaceae
        2.5.2.7 Cellulomonadaceae
        2.5.2.8 Corynebacterineae (2.5.2.8.1 Unclassified; 2.5.2.8.2 Corynebacteriaceae; 2.5.2.8.3 Dietziaceae; 2.5.2.8.4 Gordoniaceae; 2.5.2.8.5 Mycobacteriaceae; 2.5.2.8.6 Rhodococcus; 2.5.2.8.7 Rhodococcus; 2.5.2.8.8 Rhodococcus)
        2.5.2.9 Dermabacteraceae (2.5.2.9.1 Unclassified; 2.5.2.9.2 Brachybacterium; 2.5.2.9.3 Dermabacter)
      2.5.3 Coriobacteridae
        2.5.3.1 Unclassified
        2.5.3.2 Atopobiales
        2.5.3.3 Coriobacteriales
        2.5.3.4 Eggerthellales
      2.5.4 OPB41
      2.5.5 PK1
      2.5.6 Rubrobacteridae
        2.5.6.1 Unclassified
        2.5.6.2 “Thermoleiphilaceae” (2.5.6.2.1 Unclassified; 2.5.6.2.2 Conexibacter; 2.5.6.2.3 XGE514)
        2.5.6.3 MC47
        2.5.6.4 Rubrobacteraceae
  30. 30. • NSF-funded Tree of Life Project • A genome from each of eight phyla • At least 40 phyla of bacteria • Genome sequences are mostly from three phyla • Some other phyla are only sparsely sampled • Still highly biased in terms of the tree [Phylum tree as in slide 22.] Eisen & Ward, PIs
  31. 31. • NSF-funded Tree of Life Project • A genome from each of eight phyla • At least 40 phyla of bacteria • Genome sequences are mostly from three phyla • Some other phyla are only sparsely sampled • Same trend in Archaea [Phylum tree as in slide 22.] Eisen & Ward, PIs
  32. 32. • NSF-funded Tree of Life Project • A genome from each of eight phyla • At least 40 phyla of bacteria • Genome sequences are mostly from three phyla • Some other phyla are only sparsely sampled • Same trend in Eukaryotes [Phylum tree as in slide 22.] Eisen & Ward, PIs
  33. 33. • NSF-funded Tree of Life Project • A genome from each of eight phyla • At least 40 phyla of bacteria • Genome sequences are mostly from three phyla • Some other phyla are only sparsely sampled • Same trend in Viruses [Phylum tree as in slide 22.] Eisen & Ward, PIs
  34. 34. Phylogenomics of Novelty [Repeat of slide 5.]
  35. 35. Phylogenomics of Novelty [Repeat of slide 5.]
  36. 36. • GEBA: A genomic encyclopedia of bacteria and archaea • At least 40 phyla of bacteria • Genome sequences are mostly from three phyla • Some other phyla are only sparsely sampled • Solution: Really Fill in the Tree [Phylum tree as in slide 22.] Eisen & Ward, PIs
  37. 37. http://www.jgi.doe.gov/programs/GEBA/pilot.html
  38. 38. GEBA Pilot Project Overview • Identify major branches in rRNA tree for which no genomes are available • Identify those with a cultured representative in DSMZ • DSMZ grew > 200 of these and prepped DNA • Sequence and finish 100+ (covering breadth of bacterial/archaeal diversity) • Annotate, analyze, release data • Assess benefits of tree guided sequencing • 1st paper Wu et al in Nature Dec 2009
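
The first two steps of the overview above (find branches with no genome, then keep those with a cultured representative) can be sketched as a simple filter. This is only an illustrative sketch: the `Branch` record, its field names, and the `lineage_*` labels are hypothetical and do not come from the GEBA project itself.

```python
from dataclasses import dataclass


@dataclass
class Branch:
    name: str                  # hypothetical label for a major rRNA-tree branch
    has_genome: bool           # is any genome already available for this branch?
    cultured_available: bool   # is a cultured representative held in a collection?


def select_candidates(branches):
    """Return names of branches that lack a genome but have a cultured representative."""
    return [b.name for b in branches
            if not b.has_genome and b.cultured_available]


# Toy input (made-up lineages, for illustration only).
branches = [
    Branch("lineage_A", has_genome=True, cultured_available=True),
    Branch("lineage_B", has_genome=False, cultured_available=True),
    Branch("lineage_C", has_genome=False, cultured_available=False),
    Branch("lineage_D", has_genome=False, cultured_available=True),
]
candidates = select_candidates(branches)
print(candidates)
```

In this toy input only the branches that are both unsequenced and cultured survive the filter; a real selection would presumably also weigh how much new tree each candidate adds.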
  39. 39. GEBA Pilot Project: Components • Project overview (Phil Hugenholtz, Nikos Kyrpides, Jonathan Eisen, Eddy Rubin, Jim Bristow) • Project management (David Bruce, Eileen Dalin, Lynne Goodwin) • Culture collection and DNA prep (DSMZ, Hans-Peter Klenk) • Sequencing and closure (Eileen Dalin, Susan Lucas, Alla Lapidus, Mat Nolan, Alex Copeland, Cliff Han, Feng Chen, Jan-Fang Cheng) • Annotation and data release (Nikos Kyrpides, Victor Markowitz, et al) • Analysis (Dongying Wu, Kostas Mavrommatis, Martin Wu, Victor Kunin, Neil Rawlings, Ian Paulsen, Patrick Chain, Patrik D’Haeseleer, Sean Hooper, Iain Anderson, Amrita Pati, Natalia N. Ivanova, Athanasios Lykidis, Adam Zemla) • Adopt a microbe education project (Cheryl Kerfeld) • Outreach (David Gilbert) • $$$ (DOE, Eddy Rubin, Jim Bristow)
  40. 40. GEBA Phylogenomic Lesson 1: The rRNA Tree of Life is a Useful Tool for Identifying Phylogenetically Novel Genomes
  41. 41. rRNA Tree of Life: Bacteria, Archaea, Eukaryotes. Figure from Barton, Eisen et al. “Evolution”, CSHL Press. Based on tree from Pace NR, 2003.
  42. 42. Network of Life: Bacteria, Archaea, Eukaryotes. Figure from Barton, Eisen et al. “Evolution”, CSHL Press. Based on tree from Pace NR, 2003.
  43. 43. Network of Life: Bacteria, Archaea, Eukaryotes. Figure from Barton, Eisen et al. “Evolution”, CSHL Press. Based on tree from Pace NR, 2003.
  44. 44. “Whole Genome” Concatenation Tree w/ AMPHORA. See Wu and Eisen, Genome Biology 2008 9: R151. http://bobcat.genomecenter.ucdavis.edu/AMPHORA/
  45. 45. Wanted: Good Visualization Experts. Zimmer. New York Times. 2009
  46. 46. Compare PD in Trees. From Wu et al. 2009 Nature 462, 1056-1060
  47. 47. PD of rRNA, Genome Trees Similar. From Wu et al. 2009 Nature 462, 1056-1060
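
The slides do not define PD. Assuming it stands for phylogenetic diversity, measured as the total branch length of the rooted subtree connecting a set of taxa, a minimal sketch looks like this. The toy tree, its branch lengths, and the function name are made up for illustration.

```python
def phylogenetic_diversity(parent, tips):
    """Sum branch lengths over the union of tip-to-root paths for the given tips.

    `parent` maps each non-root node to a (parent_node, branch_length) pair.
    """
    seen = set()   # nodes whose branch to their parent has already been counted
    total = 0.0
    for tip in tips:
        node = tip
        # Walk toward the root, stopping once we reach an already-counted path.
        while node in parent and node not in seen:
            seen.add(node)
            up, length = parent[node]
            total += length
            node = up
    return total


# Toy rooted tree: root -> X (1.0), root -> C (3.0), X -> A (1.0), X -> B (2.0)
parent = {
    "X": ("root", 1.0),
    "C": ("root", 3.0),
    "A": ("X", 1.0),
    "B": ("X", 2.0),
}
pd_ab = phylogenetic_diversity(parent, ["A", "B"])
pd_abc = phylogenetic_diversity(parent, ["A", "B", "C"])
print(pd_ab, pd_abc)
```

Under this definition, adding a taxon on a long, otherwise unsampled branch (here `C`) raises PD more than adding a close relative of taxa already in the set, which is the intuition behind tree-guided genome selection.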
  48. 48. 16S Says Hyphomonas is in Rhodobacterales. Badger et al. 2005 Int J Syst Evol Microbiol 55:1021-1026.
  49. 49. WGT and individual gene trees: It’s Related to Caulobacterales. Badger et al. 2005 Int J Syst Evol Microbiol 55:1021-1026.
  50. 50. GEBA Phylogenomic Lesson 2 Phylogeny-driven genome selection helps discover new genetic diversity
  51. 51. Network of Life: Bacteria, Archaea, Eukaryotes. Figure from Barton, Eisen et al. “Evolution”, CSHL Press. Based on tree from Pace NR, 2003.
  52. 52. Protein Family Rarefaction Curves • Take data set of multiple complete genomes • Identify all protein families using MCL • Plot # of genomes vs. # of protein families
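
The three steps above can be sketched as follows, assuming the MCL clustering has already been run and each genome is reduced to the set of protein family IDs it contains. The genome names, family IDs, and the choice to average over random genome orderings are illustrative assumptions, not details taken from the talk.

```python
import random


def rarefaction_curve(genome_families, n_permutations=100, seed=0):
    """Mean number of distinct protein families after adding 1, 2, ..., N genomes.

    `genome_families` maps a genome name to the set of family IDs it contains.
    Genomes are added in random order; counts are averaged over permutations.
    """
    rng = random.Random(seed)
    genomes = list(genome_families)
    totals = [0.0] * len(genomes)
    for _ in range(n_permutations):
        order = genomes[:]
        rng.shuffle(order)
        seen = set()
        for i, genome in enumerate(order):
            seen |= genome_families[genome]   # accumulate families seen so far
            totals[i] += len(seen)
    return [t / n_permutations for t in totals]


# Toy data (made-up family IDs): two similar genomes and one distant genome.
genome_families = {
    "genome_1": {"fam1", "fam2", "fam3"},
    "genome_2": {"fam2", "fam3", "fam4"},
    "genome_3": {"fam5", "fam6", "fam7"},
}
curve = rarefaction_curve(genome_families)
print(curve)
```

Plotting `curve` against the number of genomes gives the rarefaction curve; a set of phylogenetically diverse genomes would be expected to climb faster than a set of close relatives.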
  53. 53. Wu et al. 2009 Nature 462, 1056-1060
  54. 54. Wu et al. 2009 Nature 462, 1056-1060
  55. 55. Wu et al. 2009 Nature 462, 1056-1060
  56. 56. Wu et al. 2009 Nature 462, 1056-1060
  57. 57. Wu et al. 2009 Nature 462, 1056-1060
  58. 58. Synapomorphies exist. Wu et al. 2009 Nature 462, 1056-1060
  59. 59. [Figure 11 from Yooseph et al. 2007 PLoS Biology, doi:10.1371/journal.pbio.0050016.g011: Rate of Cluster Discovery for Mammals Compared to That for Microbes. The x-axis denotes the number of sequences (in thousands), and the y-axis denotes the number of clusters.]
  60. 60. Structural Novelty • Of the 17000 protein families in the GEBA56, 1800 are novel in sequence (Wu) • Structural modeling suggests many are structurally novel too (D’haeseleer) • 372 being crystallized by the PSI (Kerfeld)
  61. 61. Phylogenetic Distribution Novelty: Bacterial Actin Related Protein [Tree figure; highlighted organism: Haliangium ochraceum DSM 14365.] Patrik D’haeseleer, Adam Zemla, Victor Kunin. Wu et al. 2009 Nature 462, 1056-1060. See also Guljamow et al. 2007 Current Biology.
  62. 62. GEBA Phylogenomic Lesson 3: Phylogeny driven genome selection (and phylogenetics in general) improves genome annotation
  63. 63. Predicting Function • Key step in genome projects • More accurate predictions help guide experimental and computational analyses • Many diverse approaches • All are improved by “phylogenomic” type analyses that integrate both evolutionary reconstructions and an understanding of how new functions evolve
  64. 64. Most/All Functional Prediction Improves w/ Better Phylogenetic Sampling • Took 56 GEBA genomes and compared results vs. 56 randomly sampled new genomes • Better definition of protein family sequence “patterns” • Greatly improves “comparative” and “evolutionary” based predictions • Conversion of hypothetical into conserved hypotheticals • Linking distantly related members of protein families • Improved non-homology prediction. Kostas Mavrommatis, Natalia Ivanova, Thanos Lykidis, Nikos Kyrpides, Iain Anderson
  65. 65. GEBA Phylogenomic Lesson 4 Metadata and individual genome papers important
  66. 66. Genome Marker Papers w/ metadata
  67. 67. GEBA Phylogenomic Lesson 5 Improves analysis of genome data from uncultured organisms
  68. 68. Metagenomics shotgun clone
  69. 69. Rusch et al. PLoS Biology 2007
  70. 70. Example I: Phylotyping with rRNA and other genes
  71. 71. Uses of rRNA sequences: The Hidden Majority (Hugenholtz 2002); Richness estimates (Bohannan and Hughes 2003)
  72. 72. Sargasso Phylotypes: Shotgun Sequencing Allows Use of Other Markers [Bar chart; y-axis: weighted % of clones (0 to 0.5); x-axis: major phylogenetic group (Alphaproteobacteria, Betaproteobacteria, Gammaproteobacteria, Epsilonproteobacteria, Deltaproteobacteria, Cyanobacteria, Firmicutes, Actinobacteria, Chlorobi, CFB, Chloroflexi, Spirochaetes, Fusobacteria, Deinococcus-Thermus, Euryarchaeota, Crenarchaeota); markers: EFG, EFTu, rRNA, RecA, RpoB, HSP70.] Venter et al., Science 304: 66-74. 2004
  73. 73. Sargasso Phylotypes: Shotgun Sequencing Allows Use of Other Markers • Cannot be done without good sampling of genomes [Bar chart as in slide 72.] Venter et al., Science 304: 66-74. 2004
  74. 74. Example II: Binning
  75. 75. Metagenomics Challenge
  76. 76. Binning challenge [Figure labels: A, B, C, D, E, F, G; T, U, V, W, X, Y, Z]
  77. 77. Binning challenge [Figure labels: A, B, C, D, E, F, G; T, U, V, W, X, Y, Z] Best binning method: reference genomes
  78. 78. Binning challenge [Figure labels: A, B, C, D, E, F, G; T, U, V, W, X, Y, Z] Best binning method: reference genomes
  79. 79. Binning challenge [Figure labels: A, B, C, D, E, F, G; T, U, V, W, X, Y, Z] No reference genome? What do you do?
  80. 80. Sargasso Phylotypes: Phylogenetic Binning [Bar chart as in slide 72: weighted % of clones by major phylogenetic group; markers: EFG, EFTu, rRNA, RecA, RpoB, HSP70.] Venter et al., Science 304: 66-74. 2004
  81. 81. Sargasso Phylotypes: Shotgun Sequencing Allows Use of Other Markers • Cannot be done without good sampling of genomes [Bar chart as in slide 72.] Venter et al., Science 304: 66-74. 2004
