Eisen.lake arrowhead2010c
 

  • Gets better with more markers - but we do not have lots of sequences for these markers. We can get them from genomes. The more diverse the genomes, the better the marker set will be.

Eisen.lake arrowhead2010c Presentation Transcript

  • The Importance of History (and other obsessions) • Jonathan A. Eisen, UC Davis • Talk for Lake Arrowhead Microbial Genomes 2010 (#LAMG10)
  • Social Networking in Science
  • Bacteria evolve
  • Evolution of Lake Arrowhead
  • Blast Peptide LAKEARROWHEAD
  • Homework • Do blastp search with other famous people associated with Lake Arrowhead Meeting • JEFFREYHMILLER • SARAHPALIN and her relationship to the fungus B. fuckeliana • see http://phylogenomics.blogspot.com/2008/09/tracing-evolutionary-history-of-sarah.html
  • 2010
  • 2008
  • 2006
  • 2004
  • No 2002
  • Wayback Machine
  • 2002
  • Quotes 2004 • Space-time continuum of genes and genomes • Gene sequences are the wormhole that allows one to tunnel into the past • The human mind can conceive of things with no basis in physical reality • Thoughts can go faster than the speed of light
  • Quotes 2006 • The human guts are a real milieu of stuff • You better kiss everybody • Microbes not only have a lot of sex, they have a lot of weird sex • This is how you do metagenomics on 50 dollars, and that’s Canadian dollars
  • Quotes 2008 • Antibiotics do not kill things, they corrupt them • There comes a point in life when you have to bring chemists into the picture • The rectal swabs are here in tan color • And there's Jeffrey Dahmer • We are the environment. We live the phenotype. • If I have time I will tell you about a dream • A paper came out next year
  • Quotes 2010 • We have been using this word for many years without actually realizing it was correct • "Another thing you need to know" (pause) "Actually you don't NEED to know any of this" • I have been influenced by Fisher Price throughout my life • Don't take that away from us • It takes 1000 nanobiologists to make one microbiologist • I am going to wrap up as I hear the crickets chirping • And we will bring out the unused cheese from yesterday • In an engineering sense, the vagina is a simple plug flow reactor • This is going to be ironic coming from someone who studies circumcision • A little bit about time, but I am going to spend a lot less time on time than on space
  • Keywords I remember from 2010 • Penis • Vagina • Anthrax • Acne • Ulcer (multiple kinds) • Global warming • Antibiotic resistance • Virulence
  • rRNA Tree of Life • Bacteria • Archaea • Eukaryotes • Figure from Barton, Eisen et al. “Evolution”, CSHL Press. Based on tree from Pace NR, 2003.
  • 2002 • At least 40 phyla of bacteria • (Figure: tree of bacterial phyla. Leaf labels: Proteobacteria, TM6, OS-K, Acidobacteria, Termite Group, OP8, Nitrospira, Bacteroides, Chlorobi, Fibrobacteres, Marine Group A, WS3, Gemmimonas, Firmicutes, Fusobacteria, Actinobacteria, OP9, Cyanobacteria, Synergistes, Deferribacteres, Chrysiogenetes, NKB19, Verrucomicrobia, Chlamydia, OP3, Planctomycetes, Spirochaetes, Coprothermobacter, OP10, Thermomicrobia, Chloroflexi, TM7, Deinococcus-Thermus, Dictyoglomus, Aquificae, Thermodesulfobacteria, Thermotogae, OP1, OP11. Based on Hugenholtz, 2002.)
  • 2002 • At least 40 phyla of bacteria • Genome sequences are mostly from three phyla • (Figure: tree of bacterial phyla, Proteobacteria through OP11. Based on Hugenholtz, 2002.)
  • 2002 • At least 40 phyla of bacteria • Genome sequences are mostly from three phyla • Some other phyla are only sparsely sampled • (Figure: tree of bacterial phyla, Proteobacteria through OP11. Based on Hugenholtz, 2002.)
  • Why Increase Phylogenetic Coverage? • Common approach within some eukaryotic groups (FGP, NHGRI, etc) • Many successful small projects to fill in bacterial or archaeal gaps • Phylogenetic gaps in bacterial and archaeal projects commonly lamented in literature • Many potential benefits
  • At least 40 phyla of bacteria • Genome sequences are mostly from three phyla • Some other phyla are only sparsely sampled • Solution I: sequence more phyla • NSF-funded Tree of Life Project • A genome from each of eight phyla • Eisen & Ward, PIs • (Figure: tree of bacterial phyla, Proteobacteria through OP11.)
  • At least 40 phyla of bacteria • Genome sequences are mostly from three phyla • Some other phyla are only sparsely sampled • Still highly biased in terms of the tree • NSF-funded Tree of Life Project • A genome from each of eight phyla • Eisen & Ward, PIs • (Figure: tree of bacterial phyla, Proteobacteria through OP11.)
  • Major Lineages of Actinobacteria • 2.5 Actinobacteria • 2.5.1 Acidimicrobidae: 2.5.1.1 Unclassified, 2.5.1.2 "Microthrixineae", 2.5.1.3 Acidimicrobineae, 2.5.1.4 BD2-10, 2.5.1.5 EB1017 • 2.5.2 Actinobacteridae: 2.5.2.1 Unclassified, 2.5.2.10 Ellin306/WR160, 2.5.2.11 Ellin5012, 2.5.2.12 Ellin5034, 2.5.2.13 Frankineae, 2.5.2.14 Glycomyces, 2.5.2.15 Intrasporangiaceae, 2.5.2.16 Kineosporiaceae, 2.5.2.17 Microbacteriaceae, 2.5.2.18 Micrococcaceae, 2.5.2.19 Micromonosporaceae, 2.5.2.2 Actinomyces, 2.5.2.20 Propionibacterineae, 2.5.2.21 Pseudonocardiaceae, 2.5.2.22 Streptomycineae, 2.5.2.23 Streptosporangineae, 2.5.2.3 Actinomycineae, 2.5.2.4 Actinosynnemataceae, 2.5.2.5 Bifidobacteriaceae, 2.5.2.6 Brevibacteriaceae, 2.5.2.7 Cellulomonadaceae, 2.5.2.8 Corynebacterineae, 2.5.2.9 Dermabacteraceae • 2.5.3 Coriobacteridae: 2.5.3.1 Unclassified, 2.5.3.2 Atopobiales, 2.5.3.3 Coriobacteriales, 2.5.3.4 Eggerthellales • 2.5.4 OPB41 • 2.5.5 PK1 • 2.5.6 Rubrobacteridae: 2.5.6.1 Unclassified, 2.5.6.2 "Thermoleiphilaceae", 2.5.6.3 MC47, 2.5.6.4 Rubrobacteraceae
  • At least 40 phyla of bacteria • Genome sequences are mostly from three phyla • Some other phyla are only sparsely sampled • Same trend in Archaea • NSF-funded Tree of Life Project • A genome from each of eight phyla • Eisen & Ward, PIs • (Figure: tree of bacterial phyla, Proteobacteria through OP11.)
  • At least 40 phyla of bacteria • Genome sequences are mostly from three phyla • Some other phyla are only sparsely sampled • Same trend in Eukaryotes • NSF-funded Tree of Life Project • A genome from each of eight phyla • Eisen & Ward, PIs • (Figure: tree of bacterial phyla, Proteobacteria through OP11.)
  • At least 40 phyla of bacteria • Genome sequences are mostly from three phyla • Some other phyla are only sparsely sampled • Same trend in Viruses • NSF-funded Tree of Life Project • A genome from each of eight phyla • Eisen & Ward, PIs • (Figure: tree of bacterial phyla, Proteobacteria through OP11.)
  • At least 40 phyla of bacteria • Genome sequences are mostly from three phyla • Some other phyla are only sparsely sampled • Solution: Really Fill in the Tree • GEBA • A genomic encyclopedia of bacteria and archaea • Eisen & Ward, PIs • (Figure: tree of bacterial phyla, Proteobacteria through OP11.)
  • GEBA Pilot Project Overview • Identify major branches in rRNA tree for which no genomes are available • Identify those with a cultured representative in DSMZ • DSMZ grew > 200 of these and prepped DNA • Sequence and finish 100+ (covering breadth of bacterial/archaeal diversity) • Annotate, analyze, release data • Assess benefits of tree guided sequencing • 1st paper Wu et al in Nature Dec 2009
  • GEBA Pilot Project: Components • Project overview (Phil Hugenholtz, Nikos Kyrpides, Jonathan Eisen, Eddy Rubin, Jim Bristow, Tanya Woyke) • Project management (David Bruce, Eileen Dalin, Lynne Goodwin) • Culture collection and DNA prep (DSMZ, Hans-Peter Klenk) • Sequencing and closure (Eileen Dalin, Susan Lucas, Alla Lapidus, Mat Nolan, Alex Copeland, Cliff Han, Feng Chen, Jan-Fang Cheng) • Annotation and data release (Nikos Kyrpides, Victor Markowitz, et al) • Analysis (Dongying Wu, Kostas Mavrommatis, Martin Wu, Victor Kunin, Neil Rawlings, Ian Paulsen, Patrick Chain, Patrik D’Haeseleer, Sean Hooper, Iain Anderson, Amrita Pati, Natalia N. Ivanova, Athanasios Lykidis, Adam Zemla) • Adopt a microbe education project (Cheryl Kerfeld) • Outreach (David Gilbert) • $$$ (DOE, DSMZ, GBMF)
  • GEBA and Openness • All data released as quickly as possible w/ no restrictions to IMG-GEBA, Genbank, etc • Data also available in Biotorrents (http://biotorrents.net) • Individual genome reports published in OA “Standards in Genomic Sciences (SIGS)” • 1st GEBA paper in Nature freely available and published using Creative Commons License
  • GEBA Lesson 1 rRNA Tree is Useful for Identifying Phylogenetically Novel Organisms
  • rRNA Tree of Life • Bacteria • Archaea • Eukaryotes • Figure from Barton, Eisen et al. “Evolution”, CSHL Press. Based on tree from Pace NR, 2003.
  • Network of Life? Bacteria Archaea Eukaryotes Figure from Barton, Eisen et al. “Evolution”, CSHL Press. Based on tree from Pace NR, 2003.
  • Compare PD in rRNA and WGT
  • PD of rRNA, Genome Trees Similar • From Wu et al. 2009 Nature 462, 1056-1060
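PD on these slides is read here as phylogenetic diversity, taken to mean the total branch length of the tree spanned by a set of organisms. The sketch below assumes that reading. The toy tree, its branch lengths, and the function name are invented for illustration and are not taken from Wu et al. 2009.

```python
# Toy rooted tree (hypothetical): node -> (parent, length of branch to parent).
TOY_TREE = {
    "A": ("n1", 0.2),
    "B": ("n1", 0.3),
    "C": ("root", 0.5),
    "n1": ("root", 0.1),
    "root": (None, 0.0),
}


def phylogenetic_diversity(taxa, tree=TOY_TREE):
    """Sum the branch lengths on the paths from the chosen taxa up to the root.

    Each branch is counted once, even when several taxa share it.
    """
    visited = set()
    total = 0.0
    for taxon in taxa:
        node = taxon
        while node is not None and node not in visited:
            visited.add(node)
            parent, length = tree[node]
            total += length
            node = parent
    return total


pd_one = phylogenetic_diversity(["A"])
pd_all = phylogenetic_diversity(["A", "B", "C"])
```

Comparing PD between an rRNA tree and a whole-genome tree would then amount to running the same sum over the same taxa on each tree, under this assumed definition.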
  • GEBA Lesson 2 Phylogeny-driven genome selection helps discover new genetic diversity
  • Network of Life? • Bacteria • Archaea • Eukaryotes • Figure from Barton, Eisen et al. “Evolution”, CSHL Press. Based on tree from Pace NR, 2003.
  • Protein Family Rarefaction Curves • Take data set of multiple complete genomes • Identify all protein families using MCL • Plot # of genomes vs. # of protein families
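The three steps on this slide can be sketched as follows, assuming the protein families have already been assigned to each genome (for example by MCL, which is not run here). The genome names, family labels, and `rarefaction_curve` are hypothetical, and averaging over random genome orders is one common way to smooth such a curve rather than a method stated in the talk.

```python
import random


def rarefaction_curve(genome_families, n_permutations=100, seed=0):
    """Average number of distinct families seen after adding 1..N genomes.

    genome_families maps a genome name to the set of family ids it contains.
    The genome order is shuffled n_permutations times and the cumulative
    family counts are averaged position by position.
    """
    rng = random.Random(seed)
    genomes = list(genome_families)
    totals = [0] * len(genomes)
    for _ in range(n_permutations):
        rng.shuffle(genomes)
        seen = set()
        for i, genome in enumerate(genomes):
            seen |= genome_families[genome]
            totals[i] += len(seen)
    return [t / n_permutations for t in totals]


# Invented toy data: three genomes, four families in total.
toy_genomes = {
    "g1": {"f1", "f2"},
    "g2": {"f2", "f3"},
    "g3": {"f1", "f2", "f3", "f4"},
}
curve = rarefaction_curve(toy_genomes)
```

Each entry `curve[i]` is the y value for `i + 1` genomes on the x axis, so the list can be handed directly to a plotting library.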
  • Synapomorphies exist
  • Phylogenetic Distribution Novelty: Bacterial Actin Related Protein • (Figure: phylogenetic tree with bootstrap values; highlighted entry Haliangium ochraceum DSM 14365) • Patrik D’haeseleer, Adam Zemla, Victor Kunin • See also Guljamow et al. 2007 Current Biology.
  • GEBA Lesson 3 Phylogeny-driven genome selection improves genome annotation
  • Most/All Functional Prediction Improves w/ Better Phylogenetic Sampling • Took 56 GEBA genomes and compared results vs. 56 randomly sampled new genomes • Better definition of protein family sequence “patterns” • Greatly improves “comparative” and “evolutionary” based predictions • Conversion of hypothetical into conserved hypotheticals • Linking distantly related members of protein families • Improved non-homology prediction • Kostas Mavrommatis, Natalia Ivanova, Thanos Lykidis, Nikos Kyrpides, Iain Anderson
  • GEBA Lesson 4 Metadata and individual genome papers important
  • SIGS http://standardsingenomics.org/
  • GEBA Lesson 5 Phylogeny-driven genome selection improves analysis of metagenome data
  • Uses of phylogenetic classification in metagenomics • Assigning reads to phylogenetic groups using multiple genes • Phylogenetic binning • Phylogenetic ecology - especially important if no reference genomes • (Figure: Sargasso Phylotypes. Bar charts of Weighted % of Clones by Major Phylogenetic Group; marker genes in the legend include frr, tsf, pgk, rplL, rplF, rplP, rplT, rplE, infC, rpsI, rplS, rplA, rplB, rplK, rplC, rpsJ, rplN, rplD, rplM, rpsE, rpsS, rpsB, rpsK, rpsC, rpoB, rpsM, pyrG, nusA, dnaG, rpmA, smpB.)
  • Uses of phylogenetic classification in metagenomics • Assigning reads to phylogenetic groups using multiple genes • Phylogenetic binning • Phylogenetic ecology - especially important if no reference genomes • Limited by poor genomic sampling in past • (Figure: Sargasso Phylotypes. Bar charts of Weighted % of Clones by Major Phylogenetic Group, one series per marker gene.)
  • Metagenomic Analysis Improves w/ Phylogenetic Sampling • Small but real improvements in –Gene identification / confirmation –Functional prediction –Binning –Phylogenetic classification
  • Metagenomic Analysis Improves w/ Phylogenetic Sampling • Small but real improvements in –Gene identification / confirmation –Functional prediction –Binning –Phylogenetic classification • But not a lot ...
  • GEBA Future 1 Need to adapt genomic and metagenomic methods to make use of GEBA data
  • Phylogenetic Binning Using AMPHORA • AMPHORA - each read on its own tree • Improves with better phylogenetic methods • (Figure: bar chart, axis 0 to 0.7, by group: Alphaproteobacteria, Betaproteobacteria, Gammaproteobacteria, Deltaproteobacteria, Epsilonproteobacteria, Unclassified Proteobacteria, Cyanobacteria, Chlamydiae, Acidobacteria, Bacteroidetes, Actinobacteria, Aquificae, Planctomycetes, Spirochaetes, Firmicutes, Chloroflexi, Chlorobi, Unclassified Bacteria. One series per marker gene: frr, tsf, pgk, rplL, rplF, rplP, rplT, rplE, infC, rpsI, rplS, rplA, rplB, rplK, rplC, rpsJ, rplN, rplD, rplM, rpsE, rpsS, rpsB, rpsK, rpsC, rpoB, rpsM, pyrG, nusA, dnaG, rpmA, smpB.)
  • Improving Phylogeny for Metagenomic Reads • Examples using reference trees – AMPHORA (Wu and Eisen) – pplacer (Erick Matsen) – FastTree (Morgan Price) • Variants – Use concatenated alignment of markers not just individual genes (Steven Kembel) – Apply to OTU identification not just classification (Thomas Sharpton) – CoBinning: look for linkage among fragments/genes (Aaron Darling)
  • Phylogenetic Binning Using AMPHORA • AMPHORA - each read on its own tree • Improves with more gene families • (Figure: AMPHORA bar chart by major phylogenetic group, axis 0 to 0.7, one series per marker gene.)
  • Identifying new markers • Take all genomes • All vs. all search • Identify protein families • For each family measure –Evenness in copy number –Universality –Phylogenetic congruence with WGT –Monophyly for superfamilies
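The slide names the per-family measures but does not define them, so the sketch below uses assumed definitions: universality as the percent of genomes carrying at least one member of the family, and evenness as the percent of carrying genomes that share the most common copy number. The function names and the toy copy counts are hypothetical. Phylogenetic congruence and monophyly need trees and are left out.

```python
from collections import Counter


def universality(copy_counts):
    """Percent of genomes with at least one member of the family (assumed definition)."""
    present = sum(1 for count in copy_counts.values() if count > 0)
    return 100.0 * present / len(copy_counts)


def evenness(copy_counts):
    """Percent of carrying genomes that have the modal copy number (assumed definition)."""
    counts = [count for count in copy_counts.values() if count > 0]
    if not counts:
        return 0.0
    modal_size = Counter(counts).most_common(1)[0][1]
    return 100.0 * modal_size / len(counts)


# Invented copy numbers of one protein family across four genomes.
toy_family = {"g1": 1, "g2": 1, "g3": 2, "g4": 0}
family_universality = universality(toy_family)  # 3 of 4 genomes carry it
family_evenness = evenness(toy_family)          # 2 of 3 carriers have one copy
```

Scores on a 0 to 100 scale are chosen here so they can be compared against a percent-style cutoff.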
  • Distances between gene trees and the AMPHORA concatenated genome tree • (Figure: two panels ranking gene trees, including rRNA16S, by distance from the genome tree: NODAL distance, axis 0 to 6, and SPLIT distance, axis 0 to 0.9. Legend: AMPHORA marker; Ribosomal protein; Transcription/translation related protein; DNA repair protein; Protein of other function. Reference value: distance between the genome tree and 100 random trees, average ± standard deviation.)
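The slide does not say how SPLIT distance is computed. One common reading is a normalized Robinson-Foulds style distance: the fraction of non-trivial bipartitions that two trees on the same leaves do not share. The sketch below assumes that reading, with trees written as nested tuples of leaf names. The function names and toy trees are hypothetical, and NODAL distance is not covered.

```python
def splits(tree):
    """Return the non-trivial bipartitions of a tree given as nested tuples.

    Each bipartition is stored as the side that does not contain the
    alphabetically first leaf, so the result does not depend on rooting.
    """
    def leaves_of(node):
        if isinstance(node, str):
            return frozenset([node])
        return frozenset().union(*(leaves_of(child) for child in node))

    all_leaves = leaves_of(tree)
    anchor = min(all_leaves)
    found = set()

    def visit(node):
        if isinstance(node, str):
            return frozenset([node])
        below = frozenset().union(*(visit(child) for child in node))
        side = all_leaves - below if anchor in below else below
        if 1 < len(side) < len(all_leaves) - 1:
            found.add(side)
        return below

    visit(tree)
    return found


def split_distance(tree_a, tree_b):
    """Fraction of splits found in only one of the two trees (0 = same topology)."""
    a, b = splits(tree_a), splits(tree_b)
    if not a and not b:
        return 0.0
    return len(a ^ b) / (len(a) + len(b))


genome_tree = ((("A", "B"), "C"), ("D", "E"))
gene_tree = ((("A", "C"), "B"), ("D", "E"))
distance = split_distance(genome_tree, gene_tree)
```

Under this definition a gene tree that matches the genome tree scores 0 and one sharing no internal branches scores 1, which is consistent with the 0 to 0.9 axis on the slide but is not confirmed by it.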
  • Identifying new phylogenetic markers within phyla • Take all genomes within a phylum • All vs. all search • Identify protein families • For each family measure –Evenness in copy number –Universality –Phylogenetic congruence with WGT –Monophyly for superfamilies
  • Keep only the families with: Universality * Evenness * Monophyly >= 90*90*90

    Phylogenetic group       Genome Number   Gene Number   Marker Candidates
    Archaea                  62              145415        102
    Actinobacteria           63              267783        136
    Alphaproteobacteria      94              347287        142
    Betaproteobacteria       56              266362        294
    Gammaproteobacteria      126             483632        141
    Deltaproteobacteria      25              102115        44
    Epsilonproteobacteria    18              33416         446
    Bacteroides              25              71531         179
    Chlamydiae               13              13823         561
    Chloroflexi              10              33577         140
    Cyanobacteria            36              124080        532
    Firmicutes               106             312309        80
    Spirochaetes             18              38832         72
    Thermi                   5               14160         727
    Thermotogae              9               17037         646
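A minimal sketch of the cutoff on this slide, read literally as a product of three percent scores compared against 90*90*90. The function name and the example scores are hypothetical. Whether the intent was the product or each score separately reaching 90 is not stated on the slide, and the two readings differ, as the last example shows.

```python
THRESHOLD = 90 * 90 * 90  # 729000, taken literally from the slide


def is_marker_candidate(universality, evenness, monophyly):
    """Apply the product cutoff to three scores on a 0-100 scale."""
    return universality * evenness * monophyly >= THRESHOLD


# Invented example scores (percent).
borderline = is_marker_candidate(90, 90, 90)      # exactly at the cutoff
rejected = is_marker_candidate(100, 100, 70)      # 700000 < 729000
compensated = is_marker_candidate(100, 100, 80)   # 800000 passes although one score is below 90
```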
  • Phylogenetic Binning Using AMPHORA • AMPHORA - each read on its own tree • Other needs? • (Figure: AMPHORA bar chart by major phylogenetic group, axis 0 to 0.7, one series per marker gene.)
  • Other Ways to Make Better Use of GEBA Data • Rebuild protein family models • Experiments from across the tree needed • Need better phylogenies, including HGT • Improved tools for using distantly related genomes in metagenomic analysis • Better recording and sharing of metadata about organisms
  • GEBA Future 2 The dark matter of the biological universe
  • rRNA Tree of Life • Bacteria • Archaea • Eukaryotes • Figure from Barton, Eisen et al. “Evolution”, CSHL Press. Based on tree from Pace NR, 2003.
  • Phylogenetic Diversity: Sequenced Bacteria & Archaea From Wu et al. 2009
  • Phylogenetic Diversity with GEBA From Wu et al. 2009
  • Phylogenetic Diversity: Isolates From Wu et al. 2009
  • Phylogenetic Diversity: All From Wu et al. 2009
  • Fantasy analysis of # PFAMs • GEBA Genomes • PD/Genome ~0.1 • PFAMs/Genome ~1000 • PFAMs/PD ~10000 • Total PFAMs ~10,000,000 • From Wu et al. 2009
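The back-of-envelope arithmetic on this slide can be checked in a few lines. The variable names are hypothetical, and the implied total PD of about 1000 is an inference from the slide's own numbers, not a figure the slide states.

```python
# Rounded values as given on the slide.
pd_per_genome = 0.1
pfams_per_genome = 1000
total_pfams = 10_000_000

# PFAMs per unit of phylogenetic diversity: 1000 / 0.1 = ~10000.
pfams_per_pd = pfams_per_genome / pd_per_genome

# Total PD that would be needed to reach the slide's total (inferred): ~1000.
implied_total_pd = total_pfams / pfams_per_pd
```

In other words, the total of about 10,000,000 follows if the per-PD rate seen in the GEBA genomes held across roughly 1000 units of phylogenetic diversity.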
  • Conclusions • Sequencing phylogenetically novel genomes has many benefits • To obtain the most benefits, we need to change and adapt: computationally and experimentally • Most of the phylogenetic diversity of microbes remains to be sampled • Long live the Lake Arrowhead Microbial Genomes meeting
  • MICROBES
  • Thanks • Institutions / $$$$: JGI, DOE, UC Davis, NSF, DSMZ, GBMF, TIGR, etc • People: Dongying Wu, Phil Hugenholtz, Nikos Kyrpides, Hans-Peter Klenk, Eddy Rubin • Figure from Barton, Eisen et al. “Evolution”, CSHL Press. Based on tree from Pace NR, 2003.