Eisen Talk for MBL Microbial Diversity Course

Talk by Jonathan Eisen for MBL Microbial Diversity Course

Usage Rights

© All Rights Reserved

  • It has been less than 10 years since the first genome was determined
  • Phylogenetic analysis of rRNAs led to the discovery of archaea
  • Extension of rRNA analysis to uncultured organisms using PCR
  • This is a tree of an rRNA gene that was found on a large DNA fragment isolated from Monterey Bay. This rRNA gene groups in a tree with genes from members of the gamma Proteobacteria, a group that includes E. coli as well as many environmental bacteria. This rRNA phylotype has been found to be a dominant species in many ocean ecosystems. … clone from the Sargasso Sea. This shows that this …

Eisen Talk for MBL Microbial Diversity Course: Presentation Transcript

  • Phylogenomics and the Origin of Novelty in Microbes. Jonathan A. Eisen, UC Davis. MBL Microbial Diversity Course, July 9, 2011
  • My Obsessions. Jonathan A. Eisen, UC Davis. MBL Microbial Diversity Course, July 9, 2011
  • Social Networking in Science
    [Screenshot of a New York Times Health page dated Sunday, April 1, 2007]
    Scientist Reveals Secret of the Ocean: It's Him
    By NICHOLAS WADE. Published: April 1, 2007
    Maverick scientist J. Craig Venter has done it again. It was just a few years ago that Dr. Venter announced that the human genome sequenced by Celera Genomics was in fact, mostly his own. And now, Venter has revealed a second twist in his genomic self-examination. Venter was discussing his Global Ocean Voyage, in which he used his personal yacht to collect ocean water samples from around the world. He then used large filtration units to collect microbes from the water samples which were then brought back to his high tech lab in Rockville, MD where he used the same methods that were used to sequence the human genome to study the genomes of the 1000s of ocean dwelling microbes found in each sample. In discussing the sampling methods, Venter let slip his latest attack on the standards of science – some of the samples were in fact not from the ocean, but were from microbial habitats in and on his body.
    “The human microbiome is the next frontier,” Dr. Venter said. “The ocean voyage was just a cover. My main goal has always been to work on the microbes that live in and on people. And now that my genome is nearly complete, why not use myself as the model for human microbiome studies as well.”
    It is certainly true that in the last few years, the microbes that live in and on people have become a hot research topic. So hot that the same people who were involved in the race to sequence the human
  • Bacteria evolve
  • T. H. Dobzhansky (1973): “Nothing in biology makes sense except in the light of evolution.”
  • Evolutionary Perspective and Comparative Biology
      • Comparative biology is the analysis of differences and similarities between species.
      • An evolutionary perspective is useful in such studies because it allows one to focus on how and why similarities and differences came to be.
      • In other words, biological objects have a history and understanding that history is important
  • Phylogenomic Analysis
      • Evolutionary reconstructions greatly improve genome analyses
      • Genome analysis greatly improves evolutionary reconstructions
      • There is a feedback loop such that these should be integrated
  • Phylogenomics of Novelty: Mechanisms of Origin of New Functions; Variation in Mechanisms: Patterns, Causes and Effects; Species Evolution
  • rRNA Tree of Life. Figure from Barton, Eisen et al. “Evolution”, CSHL Press. 2007. Based on tree from Pace 1997 Science 276:734-740
  • Limited Sampling of RRR Studies. Figure from Barton, Eisen et al. “Evolution”, CSHL Press. 2007. Based on tree from Pace 1997 Science 276:734-740
  • Limited Sampling of RRR Studies: Haloferax, Methanococcus, Chlorobium, Deinococcus, Thermotoga. Figure from Barton, Eisen et al. “Evolution”, CSHL Press. 2007. Based on tree from Pace 1997 Science 276:734-740
  • Fleischmann et al. 1995 Science 269:496-512
  • TIGR Genome Projects: Methanococcus, Chlorobium, Deinococcus, Thermotoga. Figure from Barton, Eisen et al. “Evolution”, CSHL Press. 2007. Based on tree from Pace 1997 Science 276:734-740
  • Fleischmann et al. 1995 Science 269:496-512
  • Whole Genome Shotgun Sequencing (steps shown: shotgun, sequence; image: Warner Brothers, Inc.)
  • Assemble Fragments (steps shown: sequencer output, assemble fragments, Closure & Annotation)
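The shotgun and assembly steps named on these slides can be illustrated with a toy sketch. This is not the assembler used for any genome in the talk. It is a minimal greedy overlap-merge in Python, and the function names (`shotgun`, `overlap`, `greedy_assemble`), the read length, the step size, and the example sequence are all assumptions made for illustration. Real reads contain errors and repeats, which this sketch ignores.

```python
import random


def shotgun(genome, read_len=10, step=4, seed=0):
    """Cut overlapping reads at a fixed step and return them in random order."""
    reads = [genome[i:i + read_len]
             for i in range(0, len(genome) - read_len + 1, step)]
    random.Random(seed).shuffle(reads)
    return reads


def overlap(a, b, min_len=3):
    """Length of the longest suffix of a that is a prefix of b (0 if below min_len)."""
    for k in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def greedy_assemble(reads, min_len=3):
    """Repeatedly merge the two contigs that share the longest overlap."""
    contigs = list(dict.fromkeys(reads))  # drop exact duplicate reads
    while len(contigs) > 1:
        # Discard contigs that are fully contained in another contig.
        contigs = [c for c in contigs
                   if not any(c != other and c in other for other in contigs)]
        best_len, best_i, best_j = 0, None, None
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    k = overlap(a, b, min_len)
                    if k > best_len:
                        best_len, best_i, best_j = k, i, j
        if best_len == 0:
            break  # no overlaps left: the remaining contigs stay separate
        merged = contigs[best_i] + contigs[best_j][best_len:]
        contigs = [c for idx, c in enumerate(contigs)
                   if idx not in (best_i, best_j)] + [merged]
    return contigs


# Hypothetical 30-base "genome" with no repeated 6-mers.
toy_genome = "ATGGCGTACGCTTGAGCTAGGATCCATTCA"
print(greedy_assemble(shotgun(toy_genome)))
```

Greedy merging is chosen here only because it is short; when no overlaps remain the function returns several contigs, which is the situation that the "Closure" step on the slide addresses.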
  • From http://genomesonline.org
  • General Steps in Analysis of Complete Genomes
      • Identification/prediction of genes
      • Characterization of gene features
      • Characterization of genome features
      • Prediction of gene function
      • Prediction of pathways
      • Integration with known biological data
      • Comparative genomics
  • Genome Sequences Have Revolutionized Microbiology
      • Predictions of metabolic processes
      • Better vaccine and drug design
      • New insights into mechanisms of evolution
      • Genomes serve as template for functional studies
      • New enzymes and materials for engineering and synthetic biology
  • From http://genomesonline.org
  • Outline
      • Phylogenomic Tales
          – Selecting genomes for sequencing
          – Species evolution
          – Predicting functions of genes
          – Uncultured microbes
          – Searching for novel organisms and genes
      • All of these are going to be told in the context of a recent project, “A Genomic Encyclopedia of Bacteria and Archaea” (aka GEBA)
  • GEBA Introduction: Knowing What We Don’t Know
  • Major Microbial Sequencing Efforts
      • Coordinated, top-down efforts
          – Fungal Genome Initiative (Broad/Whitehead)
          – Gordon and Betty Moore Foundation Marine Microbial Genome Sequencing Project
          – Sanger Center Pathogen Sequencing Unit
          – NHGRI Human Gut Microbiome Project
          – NIH Human Microbiome Program
      • White paper or grant systems
          – NIAID Microbial Sequencing Centers
          – DOE/JGI Community Sequencing Program
          – DOE/JGI BER Sequencing Program
          – NSF/USDA Microbial Genome Sequencing
      • Covers lots of ground and biological diversity
  • As of 2002 (based on Hugenholtz, 2002)
      • At least 40 phyla of bacteria
      • Genome sequences are mostly from three phyla
      • Some other phyla are only sparsely sampled
      [Figure: tree of bacterial phyla. Labels: Proteobacteria, TM6, OS-K, Acidobacteria, Termite Group, OP8, Nitrospira, Bacteroides, Chlorobi, Fibrobacteres, Marine GroupA, WS3, Gemmimonas, Firmicutes, Fusobacteria, Actinobacteria, OP9, Cyanobacteria, Synergistes, Deferribacteres, Chrysiogenetes, NKB19, Verrucomicrobia, Chlamydia, OP3, Planctomycetes, Spirochaetes, Coprothermobacter, OP10, Thermomicrobia, Chloroflexi, TM7, Deinococcus-Thermus, Dictyoglomus, Aquificae, Thermodesulfobacteria, Thermotogae, OP1, OP11]
  • Need for Tree Guidance Well Established
      • Common approach within some eukaryotic groups
      • Many small projects funded to fill in some bacterial or archaeal gaps
      • Phylogenetic gaps in bacterial and archaeal projects commonly lamented in literature
  • NSF-funded Tree of Life Project (Eisen, Ward, Robb, Nelson, et al.)
      • A genome from each of eight phyla
      • At least 40 phyla of bacteria
      • Genome sequences are mostly from three phyla
      • Some other phyla are only sparsely sampled
      • Solution I: sequence more phyla
      [Figure: tree of bacterial phyla, same labels as on the “As of 2002” slide]
  • Organisms Selected
      Phylum                   Species selected
      Chrysiogenes             Chrysiogenes arsenatis (GCA)
      Coprothermobacter        Coprothermobacter proteolyticus (GCBP)
      Dictyoglomi              Dictyoglomus thermophilum (GDT)
      Thermodesulfobacteria    Thermodesulfobacterium commune (GTC)
      Nitrospirae              Thermodesulfovibrio yellowstonii (GTY)
      Thermomicrobia           Thermomicrobium roseum (GTR)
      Deferribacteres          Geovibrio thiophilus (GGT)
      Synergistes              Synergistes jonesii (GSJ)
  • NSF-funded Tree of Life Project (Eisen & Ward, PIs)
      • A genome from each of eight phyla
      • At least 40 phyla of bacteria
      • Genome sequences are mostly from three phyla
      • Some other phyla are only sparsely sampled
      • Still highly biased in terms of the tree
      [Figure: tree of bacterial phyla, same labels as on the “As of 2002” slide]
  • Major Lineages of Actinobacteria
      2.5 Actinobacteria
        2.5.1 Acidimicrobidae
          2.5.1.1 Unclassified
          2.5.1.2 "Microthrixineae"
          2.5.1.3 Acidimicrobineae
            2.5.1.3.1 Unclassified
            2.5.1.3.2 Acidimicrobiaceae
          2.5.1.4 BD2-10
          2.5.1.5 EB1017
        2.5.2 Actinobacteridae
          2.5.2.1 Unclassified
          2.5.2.10 Ellin306/WR160
          2.5.2.11 Ellin5012
          2.5.2.12 Ellin5034
          2.5.2.13 Frankineae
            2.5.2.13.1 Unclassified
            2.5.2.13.2 Acidothermaceae
            2.5.2.13.3 Ellin6090
            2.5.2.13.4 Frankiaceae
            2.5.2.13.5 Geodermatophilaceae
            2.5.2.13.6 Microsphaeraceae
            2.5.2.13.7 Sporichthyaceae
          2.5.2.14 Glycomyces
          2.5.2.15 Intrasporangiaceae
            2.5.2.15.1 Unclassified
            2.5.2.15.2 Dermacoccus
            2.5.2.15.3 Intrasporangiaceae
          2.5.2.16 Kineosporiaceae
          2.5.2.17 Microbacteriaceae
            2.5.2.17.1 Unclassified
            2.5.2.17.2 Agrococcus
            2.5.2.17.3 Agromyces
          2.5.2.18 Micrococcaceae
          2.5.2.19 Micromonosporaceae
          2.5.2.2 Actinomyces
          2.5.2.20 Propionibacterineae
            2.5.2.20.1 Unclassified
            2.5.2.20.2 Kribbella
            2.5.2.20.3 Nocardioidaceae
            2.5.2.20.4 Propionibacteriaceae
          2.5.2.21 Pseudonocardiaceae
          2.5.2.22 Streptomycineae
            2.5.2.22.1 Unclassified
            2.5.2.22.2 Kitasatospora
            2.5.2.22.3 Streptacidiphilus
          2.5.2.23 Streptosporangineae
            2.5.2.23.1 Unclassified
            2.5.2.23.2 Ellin5129
            2.5.2.23.3 Nocardiopsaceae
            2.5.2.23.4 Streptosporangiaceae
            2.5.2.23.5 Thermomonosporaceae
          2.5.2.3 Actinomycineae
          2.5.2.4 Actinosynnemataceae
          2.5.2.5 Bifidobacteriaceae
          2.5.2.6 Brevibacteriaceae
          2.5.2.7 Cellulomonadaceae
          2.5.2.8 Corynebacterineae
            2.5.2.8.1 Unclassified
            2.5.2.8.2 Corynebacteriaceae
            2.5.2.8.3 Dietziaceae
            2.5.2.8.4 Gordoniaceae
            2.5.2.8.5 Mycobacteriaceae
            2.5.2.8.6 Rhodococcus
            2.5.2.8.7 Rhodococcus
            2.5.2.8.8 Rhodococcus
          2.5.2.9 Dermabacteraceae
            2.5.2.9.1 Unclassified
            2.5.2.9.2 Brachybacterium
            2.5.2.9.3 Dermabacter
        2.5.3 Coriobacteridae
          2.5.3.1 Unclassified
          2.5.3.2 Atopobiales
          2.5.3.3 Coriobacteriales
          2.5.3.4 Eggerthellales
        2.5.4 OPB41
        2.5.5 PK1
        2.5.6 Rubrobacteridae
          2.5.6.1 Unclassified
          2.5.6.2 "Thermoleiphilaceae"
            2.5.6.2.1 Unclassified
            2.5.6.2.2 Conexibacter
            2.5.6.2.3 XGE514
          2.5.6.3 MC47
          2.5.6.4 Rubrobacteraceae
  • NSF-funded Tree of Life Project (Eisen & Ward, PIs)
      • A genome from each of eight phyla
      • At least 40 phyla of bacteria
      • Genome sequences are mostly from three phyla
      • Some other phyla are only sparsely sampled
      • Same trend in Archaea
      • Same trend in Eukaryotes
      • Same trend in Viruses
      [Figure: tree of bacterial phyla, same labels as on the “As of 2002” slide]
  • GEBA: A genomic encyclopedia of bacteria and archaea (Eisen & Ward, PIs)
      • At least 40 phyla of bacteria
      • Genome sequences are mostly from three phyla
      • Some other phyla are only sparsely sampled
      • Solution: Really Fill in the Tree
      [Figure: tree of bacterial phyla, same labels as on the “As of 2002” slide]
  • http://www.jgi.doe.gov/programs/GEBA/pilot.html
  • GEBA Pilot Project: Components
      • Project overview (Phil Hugenholtz, Nikos Kyrpides, Jonathan Eisen, Eddy Rubin, Jim Bristow)
      • Project management (David Bruce, Eileen Dalin, Lynne Goodwin)
      • Culture collection and DNA prep (DSMZ, Hans-Peter Klenk)
      • Sequencing and closure (Eileen Dalin, Susan Lucas, Alla Lapidus, Mat Nolan, Alex Copeland, Cliff Han, Feng Chen, Jan-Fang Cheng)
      • Annotation and data release (Nikos Kyrpides, Victor Markowitz, et al.)
      • Analysis (Dongying Wu, Kostas Mavrommatis, Martin Wu, Victor Kunin, Neil Rawlings, Ian Paulsen, Patrick Chain, Patrik D’Haeseleer, Sean Hooper, Iain Anderson, Amrita Pati, Natalia N. Ivanova, Athanasios Lykidis, Adam Zemla)
      • Adopt a microbe education project (Cheryl Kerfeld)
      • Outreach (David Gilbert)
      • $$$ (DOE, Eddy Rubin, Jim Bristow)
  • rRNA Tree of Life. Figure from Barton, Eisen et al. “Evolution”, CSHL Press. Based on tree from Pace NR, 2003.
  • GEBA Pilot Target List [Figure: bar chart of # of Genomes (axis from 0 to 35) by Phyla, with bacterial (B:) and archaeal (A:) groups on the category axis]
  • GEBA Pilot Project Overview
      • Identify major branches in rRNA tree for which no genomes are available
      • Identify those with a cultured representative in DSMZ
      • DSMZ grew > 200 of these and prepped DNA
      • Sequence and finish 200+
      • Annotate, analyze, release data
      • Assess benefits of tree-guided sequencing
      • 1st paper: Wu et al. in Nature, Dec 2009
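The first two steps of the overview (find branches with no genome, then keep those with a cultured representative) can be sketched as a greedy pick by phylogenetic diversity: choose the candidate whose path to the root adds the most branch length not already covered by a sequenced genome. This is an assumption about how such a selection could be coded, not the procedure the GEBA pilot followed. The tree, the taxon names, the branch lengths, and the function names are all hypothetical.

```python
def path_to_root(tree, leaf):
    """Yield each node from leaf upward, stopping before the root."""
    node = leaf
    while node in tree:
        yield node
        node = tree[node][0]


def pd_gain(tree, covered, leaf):
    """Branch length this leaf would add beyond branches already covered."""
    return sum(tree[n][1] for n in path_to_root(tree, leaf) if n not in covered)


def select_genomes(tree, sequenced, candidates, n_picks):
    """Greedily pick candidates that add the most uncovered branch length."""
    covered = set()
    for leaf in sequenced:
        covered.update(path_to_root(tree, leaf))
    remaining = set(candidates)
    picks = []
    for _ in range(min(n_picks, len(remaining))):
        # sorted() makes tie-breaking deterministic (alphabetical).
        best = max(sorted(remaining), key=lambda leaf: pd_gain(tree, covered, leaf))
        picks.append(best)
        covered.update(path_to_root(tree, best))
        remaining.remove(best)
    return picks


# Hypothetical tree: node -> (parent, branch length). "root" has no entry.
toy_tree = {
    "clade_a": ("root", 1.0),
    "sequenced_a1": ("clade_a", 0.2),
    "cultured_a2": ("clade_a", 0.3),
    "clade_b": ("root", 2.0),
    "cultured_b1": ("clade_b", 0.5),
    "cultured_b2": ("clade_b", 0.4),
    "cultured_c": ("root", 1.5),
}
cultured = ["cultured_a2", "cultured_b1", "cultured_b2", "cultured_c"]
print(select_genomes(toy_tree, ["sequenced_a1"], cultured, n_picks=2))
```

In this toy case the first pick comes from the clade with no sequenced member, and the second pick skips its close relative because most of that branch length is already covered.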
  • Assess Benefits of GEBA
      • All genomes have some value
      • But what, if any, is the benefit of tree-guided sequencing over other selection methods?
      • Lessons for other large scale microbial genome projects?
  • GEBA Phylogenomic Lesson 1: The rRNA Tree of Life is a Useful Tool for Identifying Phylogenetically Novel Genomes
  • rRNA Tree of Life: Bacteria, Archaea, Eukaryotes. Figure from Barton, Eisen et al. “Evolution”, CSHL Press. 2007. Based on tree from Pace 1997 Science 276:734-740
  • The Core Gets Small ...
  • The Pangenome
  • Islands Among Synteny
  • Network of Life: Bacteria, Archaea, Eukaryotes. Figure from Barton, Eisen et al. “Evolution”, CSHL Press. Based on tree from Pace NR, 2003.
  • T. roseum mobile motility element Wu et al doi:10.1371/journal.pone.0004207
  • Phylogenetic Distribution Novelty: Bacterial Actin Related Protein [Figure: tree including Haliangium ochraceum DSM 14365]. Patrik D’haeseleer, Adam Zemla, Victor Kunin. Wu et al. 2009 Nature 462, 1056-1060. See also Guljamow et al. 2007 Current Biology.
  • [Screenshot of a journal article] Analysis of the genome sequence of the flowering plant Arabidopsis thaliana. The Arabidopsis Genome Initiative. Authorship of this paper should be cited as “The Arabidopsis Genome Initiative”. A full list of contributors appears at the end of this paper.
    The flowering plant Arabidopsis thaliana is an important model system for identifying genes and determining their functions. Here we report the analysis of the genomic sequence of Arabidopsis. The sequenced regions cover 115.4 megabases of the 125-megabase genome and extend into centromeric regions. The evolution of Arabidopsis involved a whole-genome duplication, followed by subsequent gene loss and extensive local gene duplications, giving rise to a dynamic genome enriched by lateral gene transfer from a cyanobacterial-like ancestor of the plastid. The genome contains 25,498 genes encoding proteins from 11,000 families, similar to the functional diversity of Drosophila and C. elegans, the other sequenced multicellular eukaryotes. Arabidopsis has many families of new proteins but also lacks several common protein families, indicating that the sets of common proteins have undergone differential expansion and contraction in the three multicellular eukaryotes. This is the first complete genome sequence of a plant and provides the foundations for more comprehensive comparison of conserved processes in all eukaryotes, identifying a wide range of plant-specific gene functions and establishing rapid systematic ways to identify genes for crop improvement.
  • Using the Core
  • Whole genome tree built using AMPHORA by Martin Wu and Dongying Wu
  • GEBA Phylogenomic Lesson 2: rRNA Tree is good but not perfect, and better genomic sampling improves phylogenetic inference
  • 16S Says Hyphomonas is in Rhodobacterales (Badger et al. 2005)
  • WGT and individual gene trees: It's Related to Caulobacterales (Badger et al. 2005)
  • 16S; WGT, 23S. Badger et al. 2005 Int J System Evol Microbiol 55: 1021-1026.
  • Zimmer. New York Times. 2009
  • GEBA Phylogenomic Lesson 3: Phylogenetics guided genome selection (and phylogenetics in general) improves genome annotation
  • Predicting Function
      • Key step in genome projects
      • More accurate predictions help guide experimental and computational analyses
      • Many diverse approaches
      • All improved both by “phylogenomic” type analyses that integrate evolutionary reconstructions and understanding of how new functions evolve
  • From Eisen et al. 1997 Nature Medicine 3:1076-1078.
  • Blast Search of H. pylori “MutS”
      • Blast search pulls up Syn. sp MutS#2 with much higher p value than other MutS homologs
      • Based on this TIGR predicted this species had mismatch repair
      • Assumes functional constancy
      Based on Eisen et al. 1997 Nature Medicine 3: 1076-1078.
  • MutL?? From http://asajj.roswellpark.org/huberman/dna_repair/mmr.html
  • Phylogenetic Tree of MutS Family [Figure: gene tree with tips labeled by species abbreviation: Aquae, Strpy, Bacsu, Synsp, Deira, Helpy, Yeast, Human, Borbu, Metth, Celeg, mSaco, Mouse, Arath, Spombe, Fly, Xenla, Rat, Neucr, Trepa, Chltr, Theaq, Thema, Ecoli, Neigo]. Based on Eisen, 1998 Nucl Acids Res 26: 4291-4300.
  • MutS Subfamilies [Figure: same tree with subfamilies labeled: MSH1, MSH2, MSH3, MSH4, MSH5, MSH6, MutS1, MutS2]. Based on Eisen, 1998 Nucl Acids Res 26: 4291-4300.
  • Overlaying Functions onto Tree [Figure: same tree with subfamilies MSH1, MSH2, MSH3, MSH4, MSH5, MSH6, MutS1, MutS2]. Based on Eisen, 1998 Nucl Acids Res 26: 4291-4300.
  • Functional Prediction Using Tree. Based on Eisen, 1998 Nucl Acids Res 26: 4291-4300.
      • MSH1 - Mitochondrial Repair
      • MSH2 - Eukaryotic Nuclear Mismatch and Loop Repair
      • MSH3 - Nuclear Repair Of Loops
      • MSH4 - Meiotic Crossing Over
      • MSH5 - Meiotic Crossing Over
      • MSH6 - Nuclear Repair Of Mismatches
      • MutS1 - Bacterial Mismatch and Loop Repair
      • MutS2 - Unknown Functions
  • PHYLOGENETIC PREDICTION OF GENE FUNCTION. Based on Eisen, 1998 Genome Res 8: 163-167.
      METHOD (illustrated with EXAMPLE A, genes 1A, 2A, 3A, 1B, 2B, 3B from Species 1, 2, 3 with a possible duplication, and EXAMPLE B, genes 1 to 6):
      • CHOOSE GENE(S) OF INTEREST
      • IDENTIFY HOMOLOGS
      • ALIGN SEQUENCES
      • CALCULATE GENE TREE
      • OVERLAY KNOWN FUNCTIONS ONTO TREE
      • INFER LIKELY FUNCTION OF GENE(S) OF INTEREST (can be ambiguous)
      • ACTUAL EVOLUTION (ASSUMED TO BE UNKNOWN)
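The last two steps of the method (overlay known functions onto the tree, then infer the function of the gene of interest) can be sketched in a few lines. This is a simplified illustration, not the published procedure: it assumes the gene tree is already computed and given as nested tuples, and it uses a plain rule of thumb (take the smallest clade around the query that has any characterized gene; report a function only if those genes agree). The gene labels follow the slide's Example A; the function names and the labels "function X" and "function Y" are hypothetical.

```python
def leaves(node):
    """Return all leaf names under a node (a leaf is a string, a clade a tuple)."""
    if isinstance(node, str):
        return [node]
    return [leaf for child in node for leaf in leaves(child)]


def clades_containing(node, query):
    """List the clades that contain query, from the smallest to the whole tree."""
    if isinstance(node, str):
        return []
    for child in node:
        if query in leaves(child):
            return clades_containing(child, query) + [node]
    return []


def predict_function(tree, query, known):
    """Infer a function from the nearest clade that has characterized genes."""
    for clade in clades_containing(tree, query):
        functions = {known[g] for g in leaves(clade) if g != query and g in known}
        if len(functions) == 1:
            return functions.pop()
        if len(functions) > 1:
            return "ambiguous"
    return "unknown"


# Gene tree for Example A: a duplication produced subfamilies A and B.
gene_tree = ((("1A", "2A"), "3A"), (("1B", "2B"), "3B"))
# Hypothetical experimental annotations for three of the six genes.
known_functions = {"1A": "function X", "3A": "function X", "1B": "function Y"}

for gene in ("2A", "2B", "3B"):
    print(gene, predict_function(gene_tree, gene, known_functions))
```

The design point is that the prediction follows subfamily membership on the tree rather than the single most similar sequence, which is the contrast the MutS slides draw with the Blast-based call.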
  • Evolutionary Rate Variation [Figure: tree with tips 1 to 6]
  • Phylogenetic Prediction of Function
      • Greatly improves accuracy of functional predictions compared to similarity alone (e.g., blast)
      • Many surrogate methods (e.g., COGs)
      • Automated phylogenetic methods now available
          – Sean Eddy, Steven Brenner, Kimmen Sjölander, etc.
      • But …
  • Example 2: Recent Changes
      • Phylogenomic functional prediction may not work well for very newly evolved functions
      • Can use understanding of origin of novelty to better interpret these cases?
      • Screen genomes for genes that have changed recently
          – Pseudogenes and gene loss
          – Contingency Loci
          – Acquisition (e.g., LGT)
          – Unusual dS/dN ratios
          – Rapid evolutionary rates
          – Recent duplications
      [Figure: NJ tree of a gene family with many V. cholerae members and homologs from B. subtilis, Synechocystis sp., H. pylori, C. jejuni, A. fulgidus, E. coli, T. maritima, T. pallidum, B. burgdorferi, P. abyssi, P. horikoshii, D. radiodurans, and M. tuberculosis]
  • Example 3: Non homology methods• Many genes have homologs in other species but no homologs have ever been studied experimentally• Non-homology methods can make functional predictions for these• Example: phylogenetic profiling (extension of prior work of Koonin, Tatusov, Ragan, et al.)
  • Phylogenetic profiling basis • Microbial genes are lost rapidly when not maintained by selection • Genes can be acquired by lateral transfer • Gain and loss frequently occur for entire pathways/processes • Thus might be able to use correlated presence/absence information to identify genes with similar functions
  • Non-Homology Predictions: Phylogenetic Profiling • Step 1: Search all genes in organisms of interest against all other genomes • Ask: Yes or No, is each gene found in each other species? • Cluster genes by distribution patterns (profiles)
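The profiling steps above can be sketched in a few lines of Python. This is a minimal illustration, not the method used in the talk: the gene names, genome names, and the `hits` table are made up, and the homology search itself (Step 1) is assumed to have been run already, so the sketch starts from its yes/no results.

```python
from collections import defaultdict

def build_profiles(hits, genomes):
    """Turn search results into presence/absence profiles.

    hits: dict mapping gene -> set of genomes where a homolog was found
          (hypothetical input; the search itself is not shown).
    Returns dict mapping gene -> tuple of 0/1, one entry per genome.
    """
    return {gene: tuple(int(g in found) for g in genomes)
            for gene, found in hits.items()}

def cluster_by_profile(profiles):
    """Group genes that share an identical presence/absence profile."""
    groups = defaultdict(list)
    for gene, profile in profiles.items():
        groups[profile].append(gene)
    return dict(groups)

def hamming(a, b):
    """Number of genomes in which two profiles disagree."""
    return sum(x != y for x, y in zip(a, b))

# Toy data: four genes searched against four genomes (all names invented).
genomes = ["genomeA", "genomeB", "genomeC", "genomeD"]
hits = {
    "gene1": {"genomeA", "genomeC"},
    "gene2": {"genomeA", "genomeC"},
    "gene3": {"genomeB", "genomeD"},
    "gene4": {"genomeA", "genomeB", "genomeC", "genomeD"},
}

profiles = build_profiles(hits, genomes)
clusters = cluster_by_profile(profiles)
for profile, genes in clusters.items():
    print(profile, genes)
```

Grouping by identical profile is the simplest possible clustering; in practice one would probably cluster on a distance such as `hamming` so that near-identical profiles also end up together.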
  • Carboxydothermus hydrogenoformans• Isolated from a Russian hotspring• Thermophile (grows at 80°C)• Anaerobic• Grows very efficiently on CO (Carbon Monoxide)• Produces hydrogen gas• Low GC Gram positive (Firmicute)• Genome Determined (Wu et al. 2005 PLoS Genetics 1: e65. )
  • Homologs of Sporulation Genes Wu et al. 2005 PLoS Genetics 1: e65.
  • Carboxydothermus sporulates Wu et al. 2005 PLoS Genetics 1: e65.
  • Wu et al. 2005 PLoS Genetics 1: e65.
  • PG Profiling Works Better Using Orthology
  • GEBA Lesson 3: Phylogeny driven genome selection (and phylogenetics) improves genome annotation• Took 56 GEBA genomes and compared results vs. 56 randomly sampled new genomes• Better definition of protein family sequence “patterns”• Greatly improves “comparative” and “evolutionary” based predictions• Conversion of hypothetical into conserved hypotheticals• Linking distantly related members of protein families• Improved non-homology prediction
  • GEBA Lesson 4: Metadata Important
  • GEBA Phylogenomic Lesson 5 Phylogeny-driven genome selection helps discover new genetic diversity
  • Network of Life • Bacteria Archaea Eukaryotes • Figure from Barton, Eisen et al. “Evolution”, CSHL Press. Based on tree from Pace NR, 2003.
  • Protein Family Rarefaction• Take data set of multiple complete genomes• Identify all protein families using MCL• Plot # of genomes vs. # of protein families
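A rarefaction curve of this kind can be sketched as follows. This is an illustration under stated assumptions: the protein families are taken as already assigned (for example by MCL, which is not run here), the genome and family names are invented, and averaging over random genome orderings is one common way to smooth the curve, not necessarily the procedure used in the talk.

```python
import random

def rarefaction_curve(genome_families, n_permutations=100, seed=0):
    """Average number of distinct protein families seen after sampling
    1, 2, ..., N genomes, over random genome orderings.

    genome_families: dict mapping genome -> set of protein family IDs
                     (hypothetical input, e.g. from a prior MCL run).
    Returns a list of N floats; entry i is the mean family count after
    i + 1 genomes.
    """
    rng = random.Random(seed)
    genomes = list(genome_families)
    totals = [0] * len(genomes)
    for _ in range(n_permutations):
        rng.shuffle(genomes)
        seen = set()
        for i, genome in enumerate(genomes):
            seen |= genome_families[genome]
            totals[i] += len(seen)
    return [t / n_permutations for t in totals]

# Toy data: three genomes with partly overlapping families.
toy = {
    "g1": {"F1", "F2", "F3"},
    "g2": {"F1", "F2", "F4"},
    "g3": {"F1", "F5", "F6"},
}
curve = rarefaction_curve(toy)
for n, families in enumerate(curve, start=1):
    print(n, families)
```

Plotting the genome count against these values gives the curve; a curve that keeps climbing suggests that additional genomes are still contributing new families.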
  • Wu et al. 2009 Nature 462, 1056-1060
  • Synapomorphies exist • Wu et al. 2009 Nature 462, 1056-1060
  • GEBA Phylogenomic Lesson 6 Improves analysis of genome data from uncultured organisms
  • rRNA Phylotyping • Collect DNA from environment • PCR amplify rRNA genes using broad (so-called universal) primers • Sequence • Align to others • Infer evolutionary tree • Unknowns “identified” by placement on tree • Some use BLAST, but not as good as phylogeny
  • rRNA PCR • The Hidden Majority • Richness estimates • Hugenholtz 2002 • Bohannan and Hughes 2003
  • Metagenomics shotgun sequence
  • Example I: Phylotyping w/ many genes
  • rRNA Phylotyping in Sargasso Sea Venter et al., Science 304: 66. 2004
  • Shotgun Sequencing Allows Use of Alternative Anchors (e.g., RecA) Venter et al., Science 304: 66. 2004
  • Shotgun Sequencing Allows Use of Other Markers • [Figure: Sargasso Phylotypes; weighted % of clones (axis 0 to 0.5) by major phylogenetic group (Alphaproteobacteria, Betaproteobacteria, Gammaproteobacteria, Epsilonproteobacteria, Deltaproteobacteria, Cyanobacteria, Firmicutes, Actinobacteria, Chlorobi, CFB, Chloroflexi, Spirochaetes, Fusobacteria, Deinococcus-Thermus, Euryarchaeota, Crenarchaeota) for markers EFG, EFTu, rRNA, RecA, RpoB, HSP70] • Venter et al., Science 304: 66-74. 2004
  • Shotgun Sequencing Allows Use of Other Markers • Cannot be done without good sampling of genomes • [Figure: Sargasso Phylotypes; weighted % of clones by major phylogenetic group for markers EFG, EFTu, rRNA, RecA, RpoB, HSP70] • Venter et al., Science 304: 66-74. 2004
  • Example II: Binning
  • Metagenomics Challenge
  • Binning challenge
  • Binning challenge • Best binning method: reference genomes
  • Binning challenge • No reference genome? What do you do?
  • Glassy-winged Sharpshooter • Obligate xylem feeder • Can transmit Pierce’s Disease agent • Potential bioterror agent • Needs to get amino acids and other nutrients from symbionts, as aphids do
  • Wu et al. 2006 PLoS Biology 4: e188.
  • CFB Phyla
  • Phylogenetic Binning • [Figure: Sargasso Phylotypes; weighted % of clones by major phylogenetic group for markers EFG, EFTu, rRNA, RecA, RpoB, HSP70] • Venter et al., Science 304: 66-74. 2004
  • Shotgun Sequencing Allows Use of Other Markers • Cannot be done without good sampling of genomes • [Figure: Sargasso Phylotypes; weighted % of clones by major phylogenetic group for markers EFG, EFTu, rRNA, RecA, RpoB, HSP70] • Venter et al., Science 304: 66-74. 2004
  • Shotgun Sequencing Allows Use of Other Markers • GEBA Project improves metagenomic analysis • [Figure: Sargasso Phylotypes; weighted % of clones by major phylogenetic group for markers EFG, EFTu, rRNA, RecA, RpoB, HSP70] • Venter et al., Science 304: 66-74. 2004
  • GEBA Cyano • Sequencing status (as of 01/14): Awaiting Material 11; Library 12; Production 22; Finishing 5; Grand Total 50 • Ongoing/Planned Activities: – Building Cyanobacterial Metadatabase (IMG-GOLD) – 10th Cyanobacterial Molecular Biology Workshop, Lake Arrowhead, CA (06/10) --> Cheryl will host: Workshop training as prep for virtual Jamboree
  • GEBA RNB • Plan: Sequence multiple Root Nodule Bacteria (RNBs) across the planet. Pilot: 100 RNBs. • Goal: Understand biogeographical effects on species evolution and understand host-specificity. • Rationale: – N2 fixation by legume pastures and crops provides 65% of the N currently utilized in agricultural production. – Contributes 25 to 90 million metric tonnes N pa. – Symbioses save $US 6-10 billion annually on N fertilizer. – Grain and animal production enhanced by fixed nitrogen supplied by the symbiosis. • [Figure labels: Beta RNB: Cupriavidus, Burkholderia; Alpha RNB: Azorhizobium, Allorhizobium, Bradyrhizobium, Mesorhizobium, Rhizobium, Sinorhizobium, Devosia, Ochrobactrum, Phyllobacterium, Balneimonas-like] • Nikos Kyrpides
  • Haloarchaeal GEBA-like
  • NSF-funded Tree of Life Project • A genome from each of eight phyla • At least 40 phyla of bacteria • Genome sequences are mostly from three phyla • Some other phyla are only sparsely sampled • Still not happy • [Figure: tree of bacterial phyla, Proteobacteria through OP11] • Eisen & Ward, PIs
  • Shotgun Sequencing Allows Use of Other Markers • GEBA Project improves metagenomic analysis, but only a little • [Figure: Sargasso Phylotypes; weighted % of clones by major phylogenetic group for markers EFG, EFTu, rRNA, RecA, RpoB, HSP70] • Venter et al., Science 304: 66-74. 2004
  • Phylogenomics Future 1 • Need to adapt genomic and metagenomic methods to make better use of data
  • Improving Metagenomic Analysis• Methods – More automation – Better phylogenetic methods for short reads – Improved tools for using distantly related genomes in metagenomic analysis• Data sets – Rebuild protein family models – New phylogenetic markers – Need better reference phylogenies, including HGT• More simulations
  • AMPHORA Guide tree
  • AMPHORA 2 Coming w/ More Markers
    Phylogenetic group       Genome Number   Gene Number   Marker Candidates
    Archaea                  62              145415        106
    Actinobacteria           63              267783        136
    Alphaproteobacteria      94              347287        121
    Betaproteobacteria       56              266362        311
    Gammaproteobacteria      126             483632        118
    Deltaproteobacteria      25              102115        206
    Epsilonproteobacteria    18              33416         455
    Bacteroides              25              71531         286
    Chlamydiae               13              13823         560
    Chloroflexi              10              33577         323
    Cyanobacteria            36              124080        590
    Firmicutes               106             312309        87
    Spirochaetes             18              38832         176
    Thermi                   5               14160         974
    Thermotogae              9               17037         684
  • Phylogenetic challenge A single tree with everything
  • Finding Metagenomic OTUs • PhylOTU: A High-Throughput Procedure Quantifies Microbial Community Diversity and Resolves Novel Taxa from Metagenomic Data • Thomas J. Sharpton 1*, Samantha J. Riesenfeld 1, Steven W. Kembel 2, Joshua Ladau 1, James P. O’Dwyer 2,3, Jessica L. Green 2, Jonathan A. Eisen 4, Katherine S. Pollard 1,5 • 1 The J. David Gladstone Institutes, University of California San Francisco, San Francisco, California, United States of America, 2 Center for Ecology and Evolutionary Biology, University of Oregon, Eugene, Oregon, United States of America, 3 Institute of Integrative and Comparative Biology, University of Leeds, Leeds, United Kingdom, 4 Department of Evolution and Ecology, University of California Davis, Davis, California, United States of America, 5 Institute for Human Genetics & Division of Biostatistics, University of California San Francisco, San Francisco, California, United States of America • Abstract: Microbial diversity is typically characterized by clustering ribosomal RNA (SSU-rRNA) sequences into operational taxonomic units (OTUs). Targeted sequencing of environmental SSU-rRNA markers via PCR may fail to detect OTUs due to biases in priming and amplification. Analysis of shotgun sequenced environmental DNA, known as metagenomics, avoids amplification bias but generates fragmentary, non-overlapping sequence reads that cannot be clustered by existing OTU-finding methods. To circumvent these limitations, we developed PhylOTU, a computational workflow that identifies OTUs from metagenomic SSU-rRNA sequence data through the use of phylogenetic principles and probabilistic sequence profiles. Using simulated metagenomic data, we quantified the accuracy with which PhylOTU clusters reads into OTUs. Comparisons of PCR and shotgun sequenced SSU-rRNA markers derived from the global open ocean revealed that while PCR libraries identify more OTUs per sequenced residue, metagenomic libraries recover a greater taxonomic diversity of OTUs. In addition, we discover novel species, genera and families in the metagenomic libraries, including OTUs from phyla missed by analysis of PCR sequences. Taken together, these results suggest that PhylOTU enables characterization of part of the biosphere currently hidden from PCR-based surveys of diversity. • Citation: Sharpton TJ, Riesenfeld SJ, Kembel SW, Ladau J, O’Dwyer JP, et al. (2011) PhylOTU: A High-Throughput Procedure Quantifies Microbial Community Diversity and Resolves Novel Taxa from Metagenomic Data. PLoS Comput Biol 7(1): e1001061. doi:10.1371/journal.pcbi.1001061 • Editor: Oded Béjà, Technion-Israel Institute of Technology, Israel • Received July 22, 2010; Accepted December 17, 2010; Published January 20, 2011 • Copyright: 2011 Sharpton et al. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits
  • Build AMPHORA ALL reference tree with concatenated alignment • Align reads that match any of the HMMs to concatenated alignment • Place reads into reference tree one at a time
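The bookkeeping behind a concatenated alignment can be sketched as below. This is only an illustration of the general idea, not AMPHORA's implementation: the marker names, taxa, and sequences are invented, the function names are hypothetical, and the HMM search, read alignment, and tree placement steps are assumed to happen elsewhere. The point is that a taxon (or a read) lacking a marker is padded with gaps so every row has the same length.

```python
def concatenate_alignments(marker_alignments, marker_order):
    """Join per-marker alignments into one concatenated alignment.

    marker_alignments: dict marker -> dict taxon -> aligned sequence
                       (all sequences within a marker share one length).
    Taxa missing a marker are filled with gap characters.
    Returns (dict taxon -> concatenated row, dict marker -> column count).
    """
    lengths = {}
    for marker in marker_order:
        seq_lengths = {len(s) for s in marker_alignments[marker].values()}
        if len(seq_lengths) != 1:
            raise ValueError(f"unequal sequence lengths in marker {marker}")
        lengths[marker] = seq_lengths.pop()

    taxa = sorted({t for m in marker_order for t in marker_alignments[m]})
    concatenated = {
        taxon: "".join(
            marker_alignments[m].get(taxon, "-" * lengths[m])
            for m in marker_order
        )
        for taxon in taxa
    }
    return concatenated, lengths

def pad_read(aligned_read, marker, marker_order, lengths):
    """Place a read, already aligned to one marker, into the
    concatenated coordinate system, with gaps for all other markers."""
    if len(aligned_read) != lengths[marker]:
        raise ValueError("read is not aligned to the marker's columns")
    return "".join(
        aligned_read if m == marker else "-" * lengths[m]
        for m in marker_order
    )

# Toy data (invented sequences; taxon2 lacks rpoB, taxon3 lacks recA).
order = ["recA", "rpoB"]
alignments = {
    "recA": {"taxon1": "MKV-", "taxon2": "MRVL"},
    "rpoB": {"taxon1": "GGA", "taxon3": "GCA"},
}
reference, lengths = concatenate_alignments(alignments, order)
read_row = pad_read("GTA", "rpoB", order, lengths)
print(reference)
print(read_row)
```

Each padded read row could then be handed, one at a time, to whatever placement tool is used to insert it into the fixed reference tree.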
  • Phylogenomics Future 2 • We have still only scratched the surface of microbial diversity
  • rRNA Tree of Life • Bacteria Archaea Eukaryotes • Figure from Barton, Eisen et al. “Evolution”, CSHL Press. 2007. Based on tree from Pace 1997 Science 276:734-740
  • Phylogenetic Diversity: Genomes From Wu et al. 2009 Nature 462, 1056-1060
  • Phylogenetic Diversity with GEBA From Wu et al. 2009 Nature 462, 1056-1060
  • Phylogenetic Diversity: Isolates From Wu et al. 2009 Nature 462, 1056-1060
  • Phylogenetic Diversity: All From Wu et al. 2009 Nature 462, 1056-1060
  • Uncultured Lineages:• Get into culture• Enrichment cultures• If abundant in low diversity ecosystems• Flow sorting• Microbeads• Microfluidic sorting• Single cell amplification
  • GEBA uncultured Number of SAGs from Candidate Phyla 406 1 OD1 SAR OP3 OP1 Site A: Hydrothermal vent 4 1 - - Site B: Gold Mine 6 13 2 - Site C: Tropical gyres (Mesopelagic) - - - 2 Site D: Tropical gyres (Photic zone) 1 - - -Sample collections at 4 additional sites are underway. Phil Hugenholtz 142
  • Phylogenomics Future 3 • Need Experiments from Across the Tree of Life too
  • As of 2002 • At least 40 phyla of bacteria • [Figure: tree of bacterial phyla: Proteobacteria, TM6, OS-K, Acidobacteria, Termite Group, OP8, Nitrospira, Bacteroides, Chlorobi, Fibrobacteres, Marine GroupA, WS3, Gemmimonas, Firmicutes, Fusobacteria, Actinobacteria, OP9, Cyanobacteria, Synergistes, Deferribacteres, Chrysiogenetes, NKB19, Verrucomicrobia, Chlamydia, OP3, Planctomycetes, Spirochaetes, Coprothermobacter, OP10, Thermomicrobia, Chloroflexi, TM7, Deinococcus-Thermus, Dictyoglomus, Aquificae, Thermodesulfobacteria, Thermotogae, OP1, OP11] • Based on Hugenholtz, 2002
  • As of 2002 • At least 40 phyla of bacteria • Experimental studies are mostly from three phyla • [Figure: tree of bacterial phyla, Proteobacteria through OP11] • Based on Hugenholtz, 2002
  • As of 2002 • At least 40 phyla of bacteria • Experimental studies are mostly from three phyla • Some studies in other phyla • [Figure: tree of bacterial phyla, Proteobacteria through OP11] • Based on Hugenholtz, 2002
  • As of 2002 • At least 40 phyla of bacteria • Genome sequences are mostly from three phyla • Some other phyla are only sparsely sampled • Same trend in Eukaryotes • [Figure: tree of bacterial phyla, Proteobacteria through OP11] • Based on Hugenholtz, 2002
  • As of 2002 • At least 40 phyla of bacteria • Genome sequences are mostly from three phyla • Some other phyla are only sparsely sampled • Same trend in Viruses • [Figure: tree of bacterial phyla, Proteobacteria through OP11] • Based on Hugenholtz, 2002
  • Need experimental studies from across the tree too • [Figure: tree of bacterial phyla, Proteobacteria through OP11; scale bar 0.1] • Tree based on Hugenholtz (2002) with some modifications.
  • Adopt a Microbe • [Figure: tree of bacterial phyla, Proteobacteria through OP11; scale bar 0.1] • Tree based on Hugenholtz (2002) with some modifications.
  • Conclusion• Phylogenetic sampling of genomes improves our understanding of microbial diversity in many ways• Still need – More biogeography – More phenotypic/experimental data – Deeper phylogenetic sampling
  • MICROBES
  • A Happy Tree of Life
  • Acknowledgements• GEBA: – $$: DOE-JGI, DSMZ – Eddy Rubin, Dongying Wu, Phil Hugenholtz, Hans-Peter Klenk, Nikos Kyrpides, Tanya Woyke – Aaron Darling, Jenna Morgan• iSEEM: – $$: GBMF – Katie Pollard, Jessica Green, Martin Wu, Steven Kembel, Tom Sharpton, Morgan Langille, Guillaume Jospin• aTOL – $$: NSF – Naomi Ward, Jonathan Badger, Frank Robb, Martin Wu, Dongying Wu• Others (not mentioned in detail) – $$: NSF, NIH, DOE, GBMF, DARPA, Sloan – Frank Robb, Craig Venter, Doug Rusch, Shibu Yooseph, Nancy Moran, Colleen Cavanaugh, Josh Weitz, Srijak Bhatnagar, Russell Neches, Lizzy Wilbanks, Marc Facciotti,