Jonathan Eisen talk on "The Importance of History" at Lake Arrowhead Small Genomes Meeting 2010

Talk by Jonathan Eisen on "The Importance of History" at Lake Arrowhead Small Genomes meeting in 2010.

Speaker note (repeated on several slides): Gets better with more markers, but we do not have many sequences for these markers. We can get them from genomes. The more diverse the genomes, the better the marker set will be.

    1. The Importance of History (and other obsessions). Jonathan A. Eisen, UC Davis. Talk for Lake Arrowhead Microbial Genomes 2010 (#LAMG10)
    2. Social Networking in Science
    3. Bacteria evolve
    4. Evolution of Lake Arrowhead
    5. BLAST peptide: LAKEARROWHEAD
    6. Homework
       • Do a blastp search with other famous people associated with the Lake Arrowhead Meeting
       • JEFFREYHMILLER
       • SARAHPALIN and her relationship to the fungus B. fuckeliana
       • See http://phylogenomics.blogspot.com/2008/09/tracing-evolutionary-history-of-sarah.html
    7. 2010
    8. 2008
    9. 2006
    10. 2004
    11. No 2002
    12. Wayback Machine
    13. 2002
    14. Quotes 2004
        • Space-time continuum of genes and genomes
        • Gene sequences are the wormhole that allows one to tunnel into the past
        • The human mind can conceive of things with no basis in physical reality
        • Thoughts can go faster than the speed of light
    15. Quotes 2006
        • The human guts are a real milieu of stuff
        • You better kiss everybody
        • Microbes not only have a lot of sex, they have a lot of weird sex
        • This is how you do metagenomics on 50 dollars, and that’s Canadian dollars
    16. Quotes 2008
        • Antibiotics do not kill things, they corrupt them
        • There comes a point in life when you have to bring chemists into the picture
        • The rectal swabs are here in tan color
        • And there's Jeffrey Dahmer
        • We are the environment. We live the phenotype.
        • If I have time I will tell you about a dream
        • A paper came out next year
    17. Quotes 2010
        • We have been using this word for many years without actually realizing it was correct
        • "Another thing you need to know" (pause) "Actually, you don't NEED to know any of this"
        • I have been influenced by Fisher Price throughout my life
        • Don't take that away from us
        • It takes 1000 nanobiologists to make one microbiologist
        • I am going to wrap up as I hear the crickets chirping
        • And we will bring out the unused cheese from yesterday
        • In an engineering sense, the vagina is a simple plug flow reactor
        • This is going to be ironic coming from someone who studies circumcision
        • A little bit about time, but I am going to spend a lot less time on time than on space
    18. Keywords I remember from 2010
        • Penis
        • Vagina
        • Anthrax
        • Acne
        • Ulcer (multiple kinds)
        • Global warming
        • Antibiotic resistance
        • Virulence
    19. rRNA Tree of Life: Bacteria, Archaea, Eukaryotes. Figure from Barton, Eisen et al. "Evolution", CSHL Press. Based on tree from Pace NR, 2003.
    20. 2002
        • At least 40 phyla of bacteria
        [Figure: tree of bacterial phyla, based on Hugenholtz 2002: Proteobacteria, TM6, OS-K, Acidobacteria, Termite Group, OP8, Nitrospira, Bacteroides, Chlorobi, Fibrobacteres, Marine Group A, WS3, Gemmimonas, Firmicutes, Fusobacteria, Actinobacteria, OP9, Cyanobacteria, Synergistes, Deferribacteres, Chrysiogenetes, NKB19, Verrucomicrobia, Chlamydia, OP3, Planctomycetes, Spirochaetes, Coprothermobacter, OP10, Thermomicrobia, Chloroflexi, TM7, Deinococcus-Thermus, Dictyoglomus, Aquificae, Thermodesulfobacteria, Thermotogae, OP1, OP11]
    21. 2002 (same phylum tree as slide 20)
        • At least 40 phyla of bacteria
        • Genome sequences are mostly from three phyla
    22. 2002 (same phylum tree as slide 20)
        • At least 40 phyla of bacteria
        • Genome sequences are mostly from three phyla
        • Some other phyla are only sparsely sampled
    23. (Same as slide 22)
    24. Why Increase Phylogenetic Coverage?
        • Common approach within some eukaryotic groups (FGP, NHGRI, etc.)
        • Many successful small projects to fill in bacterial or archaeal gaps
        • Phylogenetic gaps in bacterial and archaeal projects commonly lamented in literature
        • Many potential benefits
    25. NSF-funded Tree of Life Project (Eisen & Ward, PIs)
        • A genome from each of eight phyla
        • At least 40 phyla of bacteria
        • Genome sequences are mostly from three phyla
        • Some other phyla are only sparsely sampled
        • Solution I: sequence more phyla
        [Figure: same phylum tree as slide 20]
    26. As slide 25, with the last bullet replaced by: Still highly biased in terms of the tree
    27. Major Lineages of Actinobacteria
        [Figure: hierarchical classification of the major lineages of Actinobacteria (group 2.5), covering the subclasses Acidimicrobidae, Actinobacteridae, Coriobacteridae and Rubrobacteridae, the groups OPB41 and PK1, and their suborders, families and unclassified or environmental lineages]
    28. As slide 25, with the last bullet replaced by: Same trend in Archaea
    29. As slide 25, with the last bullet replaced by: Same trend in Eukaryotes
    30. As slide 25, with the last bullet replaced by: Same trend in Viruses
    31. GEBA: A genomic encyclopedia of bacteria and archaea
        • At least 40 phyla of bacteria
        • Genome sequences are mostly from three phyla
        • Some other phyla are only sparsely sampled
        • Solution: Really Fill in the Tree
        [Figure: same phylum tree as slide 20]
        Eisen & Ward, PIs
    32. GEBA Pilot Project Overview
        • Identify major branches in rRNA tree for which no genomes are available
        • Identify those with a cultured representative in DSMZ
        • DSMZ grew > 200 of these and prepped DNA
        • Sequence and finish 100+ (covering breadth of bacterial/archaeal diversity)
        • Annotate, analyze, release data
        • Assess benefits of tree-guided sequencing
        • 1st paper: Wu et al. in Nature, Dec 2009
    33. GEBA Pilot Project: Components
        • Project overview (Phil Hugenholtz, Nikos Kyrpides, Jonathan Eisen, Eddy Rubin, Jim Bristow, Tanya Woyke)
        • Project management (David Bruce, Eileen Dalin, Lynne Goodwin)
        • Culture collection and DNA prep (DSMZ, Hans-Peter Klenk)
        • Sequencing and closure (Eileen Dalin, Susan Lucas, Alla Lapidus, Mat Nolan, Alex Copeland, Cliff Han, Feng Chen, Jan-Fang Cheng)
        • Annotation and data release (Nikos Kyrpides, Victor Markowitz, et al.)
        • Analysis (Dongying Wu, Kostas Mavrommatis, Martin Wu, Victor Kunin, Neil Rawlings, Ian Paulsen, Patrick Chain, Patrik D’Haeseleer, Sean Hooper, Iain Anderson, Amrita Pati, Natalia N. Ivanova, Athanasios Lykidis, Adam Zemla)
        • Adopt a microbe education project (Cheryl Kerfeld)
        • Outreach (David Gilbert)
        • $$$ (DOE, DSMZ, GBMF)
    34. GEBA and Openness
        • All data released as quickly as possible w/ no restrictions to IMG-GEBA, GenBank, etc.
        • Data also available in BioTorrents (http://biotorrents.net)
        • Individual genome reports published in OA "Standards in Genomic Sciences (SIGS)"
        • 1st GEBA paper in Nature freely available and published using a Creative Commons License
    35. GEBA Lesson 1: rRNA Tree is Useful for Identifying Phylogenetically Novel Organisms
    36. rRNA Tree of Life: Bacteria, Archaea, Eukaryotes. Figure from Barton, Eisen et al. "Evolution", CSHL Press. Based on tree from Pace NR, 2003.
    37. Network of Life? Bacteria, Archaea, Eukaryotes. Figure from Barton, Eisen et al. "Evolution", CSHL Press. Based on tree from Pace NR, 2003.
    38. Compare PD in rRNA and WGT
    39. PD of rRNA, Genome Trees Similar. From Wu et al. 2009 Nature 462, 1056-1060
    40. GEBA Lesson 2: Phylogeny-driven genome selection helps discover new genetic diversity
    41. Network of Life? Bacteria, Archaea, Eukaryotes. Figure from Barton, Eisen et al. "Evolution", CSHL Press. Based on tree from Pace NR, 2003.
    42. Protein Family Rarefaction Curves
        • Take data set of multiple complete genomes
        • Identify all protein families using MCL
        • Plot # of genomes vs. # of protein families
    43. Synapomorphies exist
    44. Phylogenetic Distribution Novelty: Bacterial Actin-Related Protein
        [Figure: phylogenetic tree of actin (ACTIN) and actin-related proteins ARP1 to ARP7 and ARP10 from eukaryotes such as S. cerevisiae, H. sapiens, D. melanogaster and A. thaliana, with a bacterial actin-related protein (BARP) from Haliangium ochraceum DSM 14365; scale bar 0.5]
        Patrik D’haeseleer, Adam Zemla, Victor Kunin. See also Guljamow et al. 2007 Current Biology.
    45. GEBA Lesson 3: Phylogeny-driven genome selection improves genome annotation
    46. Most/All Functional Prediction Improves w/ Better Phylogenetic Sampling
        • Took 56 GEBA genomes and compared results vs. 56 randomly sampled new genomes
        • Better definition of protein family sequence "patterns"
        • Greatly improves "comparative" and "evolutionary" based predictions
        • Conversion of hypotheticals into conserved hypotheticals
        • Linking distantly related members of protein families
        • Improved non-homology prediction
        Kostas Mavrommatis, Natalia Ivanova, Thanos Lykidis, Nikos Kyrpides, Iain Anderson
    47. GEBA Lesson 4: Metadata and individual genome papers important
    48. SIGS: http://standardsingenomics.org/
    49. GEBA Lesson 5: Phylogeny-driven genome selection improves analysis of metagenome data
    50. Uses of phylogenetic classification in metagenomics
        • Assigning reads to phylogenetic groups using multiple genes
        • Phylogenetic binning
        • Phylogenetic ecology
        Especially important if no reference genomes
        [Figure: Sargasso Phylotypes, weighted % of clones per major phylogenetic group (Alpha-, Beta-, Gamma-, Delta- and Epsilonproteobacteria, Cyanobacteria, Chlamydiae, Acidobacteria, Firmicutes, Actinobacteria, Bacteroidetes, Chlorobi, Aquificae, Planctomycetes, Spirochaetes, Chloroflexi, Fusobacteria, Deinococcus-Thermus, Euryarchaeota, Crenarchaeota and unclassified groups), estimated separately from each marker gene (frr, tsf, pgk, rplL, ..., smpB)]
    51. Same as slide 50, with an added note: Limited by poor genomic sampling in past
    52. Metagenomic Analysis Improves w/ Phylogenetic Sampling
        • Small but real improvements in
           – Gene identification / confirmation
           – Functional prediction
           – Binning
           – Phylogenetic classification
    53. Same as slide 52, plus: But not a lot ...
    54. GEBA Future 1: Need to adapt genomic and metagenomic methods to make use of GEBA data
    55. Phylogenetic Binning Using AMPHORA
        • AMPHORA: each read on its own tree
        • Improves with better phylogenetic methods
        [Figure: phylotype distribution (0 to 0.7) across major phylogenetic groups, estimated from each AMPHORA marker gene: frr, tsf, pgk, rplL, rplF, rplP, rplT, rplE, infC, rpsI, rplS, rplA, rplB, rplK, rplC, rpsJ, rplN, rplD, rplM, rpsE, rpsS, rpsB, rpsK, rpsC, rpoB, rpsM, pyrG, nusA, dnaG, rpmA, smpB]
    56. Improving Phylogeny for Metagenomic Reads
        • Examples using reference trees
           – AMPHORA (Wu and Eisen)
           – pplacer (Erick Matsen)
           – FastTree (Morgan Price)
        • Variants
           – Use concatenated alignment of markers, not just individual genes (Steven Kembel)
           – Apply to OTU identification, not just classification (Thomas Sharpton)
           – CoBinning: look for linkage among fragments/genes (Aaron Darling)
    57. Phylogenetic Binning Using AMPHORA
        • AMPHORA: each read on its own tree
        • Improves with more gene families
        [Figure: same AMPHORA marker-gene chart as slide 55]
    58. Identifying new markers
        • Take all genomes
        • All vs. all search
        • Identify protein families
        • For each family measure
           – Evenness in copy number
           – Universality
           – Phylogenetic congruence with WGT
           – Monophyly for superfamilies
    59. Distances between gene trees and the AMPHORA concatenated genome tree
        [Figure: NODAL distance (0 to 6) and SPLIT distance (0 to 0.9) between each gene tree (including rpmA, coaE, trmD, rplL, rpsS, radA, secY, nusA, ruvB and 16S rRNA) and the AMPHORA concatenated genome tree. Genes are coded as AMPHORA marker, ribosomal protein, transcription/translation related protein, DNA repair protein, or protein of other function. Reference: distance between the genome tree and 100 random trees (average ± standard deviation)]
    60. Identifying new phylogenetic markers within phyla
        • Take all genomes within a phylum
        • All vs. all search
        • Identify protein families
        • For each family measure
           – Evenness in copy number
           – Universality
           – Phylogenetic congruence with WGT
           – Monophyly for superfamilies
    61. Keep only the families with: Universality × Evenness × Monophyly >= 90 × 90 × 90
        Phylogenetic group      Genomes  Genes   Marker candidates
        Archaea                 62       145415  102
        Actinobacteria          63       267783  136
        Alphaproteobacteria     94       347287  142
        Betaproteobacteria      56       266362  294
        Gammaproteobacteria     126      483632  141
        Deltaproteobacteria     25       102115  44
        Epsilonproteobacteria   18       33416   446
        Bacteroides             25       71531   179
        Chlamydiae              13       13823   561
        Chloroflexi             10       33577   140
        Cyanobacteria           36       124080  532
        Firmicutes              106      312309  80
        Spirochaetes            18       38832   72
        Thermi                  5        14160   727
        Thermotogae             9        17037   646
    62. Phylogenetic Binning Using AMPHORA
        • AMPHORA: each read on its own tree
        • Other needs?
        [Figure: same AMPHORA marker-gene chart as slide 55]
    63. Other Ways to Make Better Use of GEBA Data
        • Rebuild protein family models
        • Experiments from across the tree needed
        • Need better phylogenies, including HGT
        • Improved tools for using distantly related genomes in metagenomic analysis
        • Better recording and sharing of metadata about organisms
    64. GEBA Future 2: The dark matter of the biological universe
    65. rRNA Tree of Life: Bacteria, Archaea, Eukaryotes. Figure from Barton, Eisen et al. "Evolution", CSHL Press. Based on tree from Pace NR, 2003.
    66. Phylogenetic Diversity: Sequenced Bacteria & Archaea. From Wu et al. 2009
    67. Phylogenetic Diversity with GEBA. From Wu et al. 2009
    68. Phylogenetic Diversity: Isolates. From Wu et al. 2009
    69. Phylogenetic Diversity: All. From Wu et al. 2009
    70. Fantasy analysis of # PFAMs (GEBA genomes)
        • PD/genome: ~0.1
        • PFAMs/genome: ~1000
        • PFAMs/PD: ~10,000
        • Total PFAMs: ~10,000,000
        From Wu et al. 2009
    71. Conclusions
        • Sequencing phylogenetically novel genomes has many benefits
        • To obtain the most benefits, we need to change and adapt, both computationally and experimentally
        • Most of the phylogenetic diversity of microbes remains to be sampled
        • Long live the Lake Arrowhead Microbial Genomes meeting
    72. MICROBES
    73. (Repeat of slide 31) GEBA: A genomic encyclopedia of bacteria and archaea. Solution: Really Fill in the Tree
    74. Thanks
        • Institutions: JGI etc., UC Davis, DSMZ, TIGR
        • $$$$: DOE, NSF, GBMF
        • People: Dongying Wu, Phil Hugenholtz, Nikos Kyrpides, Hans-Peter Klenk, Eddy Rubin
        Figure from Barton, Eisen et al. "Evolution", CSHL Press. Based on tree from Pace NR, 2003.
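Slides 38, 39 and 66 to 69 compare phylogenetic diversity (PD) across trees and sets of genomes. As a minimal sketch, assuming PD here means Faith's phylogenetic diversity in its rooted form (total branch length joining a set of tips to the root), it can be computed from a parent-pointer tree as below. The function name and the toy tree are hypothetical illustrations, not data or code from Wu et al. 2009.

```python
# Sketch only: rooted Faith's PD on a hypothetical parent-pointer tree.

def faith_pd(tree, tips):
    """Return the total branch length of the subtree joining `tips` to the root.

    `tree` maps each node name to (parent_name, branch_length); the root's
    parent is None. Each branch is counted once even if shared by several tips.
    """
    counted = set()
    total = 0.0
    for tip in tips:
        node = tip
        # Climb toward the root until reaching a node whose branch is already counted.
        while node is not None and node not in counted:
            parent, length = tree[node]
            if parent is not None:  # the root has no branch above it
                total += length
            counted.add(node)
            node = parent
    return total


# Hypothetical toy tree: root -> A (1.0) -> a1, a2 (0.5 each); root -> b (2.0)
toy_tree = {
    "root": (None, 0.0),
    "A": ("root", 1.0),
    "a1": ("A", 0.5),
    "a2": ("A", 0.5),
    "b": ("root", 2.0),
}

print(faith_pd(toy_tree, ["a1", "a2"]))       # 2.0
print(faith_pd(toy_tree, ["a1", "a2", "b"]))  # 4.0
```

The PD a new genome adds is `faith_pd(tree, old + [new]) - faith_pd(tree, old)`, which is one way to express how much a candidate genome would "fill in the tree".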
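The protein family rarefaction curve on slide 42 can be sketched as follows. This assumes protein families have already been assigned (for example by MCL clustering of an all-vs-all search, which is not implemented here). The function name, the input format (a dict from genome ID to its set of family IDs) and the averaging over random genome orders are illustrative choices, not the GEBA pipeline.

```python
import random


def rarefaction_curve(genome_families, permutations=100, seed=0):
    """Mean number of distinct protein families after adding 1..n genomes.

    `genome_families` maps a genome ID to the set of family IDs it contains.
    Genome order is shuffled `permutations` times and the counts averaged,
    so index k of the result corresponds to k + 1 genomes.
    """
    rng = random.Random(seed)
    genomes = list(genome_families)
    totals = [0.0] * len(genomes)
    for _ in range(permutations):
        order = genomes[:]
        rng.shuffle(order)
        seen = set()
        for k, genome in enumerate(order):
            seen |= genome_families[genome]  # accumulate families seen so far
            totals[k] += len(seen)
    return [t / permutations for t in totals]


# Hypothetical family assignments for three genomes.
families = {
    "g1": {"F1", "F2"},
    "g2": {"F2", "F3"},
    "g3": {"F1", "F2", "F3", "F4"},
}
curve = rarefaction_curve(families)
print(curve[-1])  # 4.0: every family is seen once all genomes are included
```

A curve that is still rising steeply at its right-hand end suggests that each added genome keeps contributing new families.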
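Slide 56 lists using a concatenated alignment of markers, rather than individual genes, as one variant for placing metagenomic reads. A minimal sketch of the concatenation step follows. Padding taxa that lack a marker with gap characters is an assumption of this sketch, and the function name and input alignments are hypothetical.

```python
def concatenate_alignments(alignments, gap="-"):
    """Join per-marker alignments into one supermatrix row per taxon.

    `alignments` is a list of dicts mapping taxon -> aligned sequence, one dict
    per marker gene; sequences within a marker must share the same length.
    Taxa missing from a marker get a run of gap characters of that length.
    """
    taxa = sorted(set().union(*(aln.keys() for aln in alignments)))
    rows = {taxon: [] for taxon in taxa}
    for aln in alignments:
        lengths = {len(seq) for seq in aln.values()}
        if len(lengths) != 1:
            raise ValueError("sequences within one marker alignment differ in length")
        width = lengths.pop()
        for taxon in taxa:
            rows[taxon].append(aln.get(taxon, gap * width))
    return {taxon: "".join(parts) for taxon, parts in rows.items()}


# Hypothetical aligned fragments for two marker genes.
rplB = {"taxonA": "MK-L", "taxonB": "MKVL"}
rpsC = {"taxonA": "GGR", "taxonC": "GAR"}
print(concatenate_alignments([rplB, rpsC]))
# {'taxonA': 'MK-LGGR', 'taxonB': 'MKVL---', 'taxonC': '----GAR'}
```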
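Slide 59 plots a SPLIT distance between each gene tree and the concatenated genome tree. Assuming this is the Robinson-Foulds (symmetric difference of bipartitions) distance, normalized by the total number of splits in both trees, a minimal pure-Python sketch for unrooted trees written as nested tuples over the same leaf set is below. The tuple encoding, function names and normalization are choices made for this sketch; real analyses would typically use a phylogenetics library.

```python
def leaf_set(tree):
    """All leaf names under a nested-tuple tree (leaves are strings)."""
    if isinstance(tree, str):
        return frozenset([tree])
    return frozenset().union(*(leaf_set(child) for child in tree))


def splits(tree):
    """Non-trivial bipartitions of the unrooted tree, one side per split.

    Each split is stored as the side that excludes a fixed reference leaf, so
    the same bipartition gets the same key no matter where the tuple is rooted.
    """
    leaves = leaf_set(tree)
    reference = min(leaves)
    found = set()

    def walk(node):
        if isinstance(node, str):
            return frozenset([node])
        clade = frozenset().union(*(walk(child) for child in node))
        # Trivial splits (a side with fewer than 2 leaves) carry no information.
        if 2 <= len(clade) <= len(leaves) - 2:
            found.add(clade if reference not in clade else leaves - clade)
        return clade

    walk(tree)
    return found


def split_distance(tree1, tree2):
    """Robinson-Foulds distance divided by the total number of splits (0 to 1)."""
    s1, s2 = splits(tree1), splits(tree2)
    total = len(s1) + len(s2)
    return len(s1 ^ s2) / total if total else 0.0


gene_tree = ((("A", "B"), "C"), ("D", "E"))
genome_tree = ((("A", "C"), "B"), ("D", "E"))
print(split_distance(gene_tree, genome_tree))  # 0.5
print(split_distance(gene_tree, gene_tree))    # 0.0
```

Storing each split by the side without a reference leaf makes the comparison independent of where the nested tuple happens to be rooted.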
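Slides 58, 60 and 61 score each protein family on universality, evenness in copy number and monophyly, then keep families that clear "90 × 90 × 90". The sketch below uses hypothetical definitions: universality is the fraction of genomes carrying the family, evenness is the fraction of carrying genomes with exactly one copy, and monophyly is supplied as a precomputed 0 to 1 score. It reads the threshold as a product of fractions, 0.9³ ≈ 0.729; the slide may instead mean that each score must reach 90% on its own.

```python
def universality(copy_numbers, n_genomes):
    """Fraction of all genomes that carry at least one copy of the family."""
    present = sum(1 for copies in copy_numbers.values() if copies > 0)
    return present / n_genomes


def evenness(copy_numbers):
    """Fraction of carrying genomes in which the family is single-copy."""
    present = [copies for copies in copy_numbers.values() if copies > 0]
    if not present:
        return 0.0
    return sum(1 for copies in present if copies == 1) / len(present)


def is_marker_candidate(copy_numbers, n_genomes, monophyly, threshold=0.9 ** 3):
    """Keep a family if universality * evenness * monophyly >= threshold.

    `copy_numbers` maps genome ID -> copy number of this family (genomes
    lacking it may be omitted); `monophyly` is an assumed precomputed score.
    """
    score = universality(copy_numbers, n_genomes) * evenness(copy_numbers) * monophyly
    return score >= threshold


# Hypothetical families scored against 10 genomes.
single_copy_everywhere = {f"g{i}": 1 for i in range(10)}
duplicated_in_two = {**single_copy_everywhere, "g0": 2, "g1": 2}
patchy = {f"g{i}": 1 for i in range(5)}

print(is_marker_candidate(single_copy_everywhere, 10, monophyly=1.0))  # True
print(is_marker_candidate(duplicated_in_two, 10, monophyly=1.0))       # True (score 0.8)
print(is_marker_candidate(patchy, 10, monophyly=1.0))                  # False (score 0.5)
```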
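The numbers on slide 70 chain together as shown below. The slide does not state a total PD; the value of roughly 1,000 is only what its other figures imply, not a result reported by Wu et al. 2009.

```latex
\frac{\text{PFAMs/genome}}{\text{PD/genome}} \approx \frac{1000}{0.1} = 10{,}000 \ \text{PFAMs per unit PD}

\text{Total PFAMs} \approx 10{,}000 \times \mathrm{PD}_{\text{total}} \approx 10{,}000{,}000
\quad\Longrightarrow\quad \mathrm{PD}_{\text{total}} \approx 1{,}000
```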
