Phylogenomics and the diversity and diversification of microbes

Talk by Jonathan Eisen for seminar series / class at UC Davis.

Published in: Education, Technology

  1. 1. Phylogenomics and the Diversity and Diversification of Microbes Jonathan A. Eisen UC Davis UC Davis Talk February 11, 2011
  2. 2. Phylogenomics of Novelty: Mechanisms of Origin of New Functions • Variation in Mechanisms: Patterns, Causes and Effects • Species Evolution
  3. 3. Why do this? • Discover causes and effects of differences in evolvability • Improve predictions from genome analysis • Guide interpretation of biological data
  4. 4. Outline • Introduction • Phylogenomic Stories – Within genome invention of novelty – Stealing novelty – Communities of microbes – Community service and knowing what we don’t know
  5. 5. Introduction: Genome Sequencing
  6. 6. rRNA Tree of Life. Figure from Barton, Eisen et al. “Evolution”, CSHL Press. Based on tree from Pace NR, 2003.
  7. 7. Limited Sampling of RRR Studies. Figure from Barton, Eisen et al. “Evolution”, CSHL Press. Based on tree from Pace NR, 2003.
  8. 8. Limited Sampling of RRR Studies: Haloferax, Methanococcus, Chlorobium, Deinococcus, Thermotoga. Figure from Barton, Eisen et al. “Evolution”, CSHL Press. Based on tree from Pace NR, 2003.
  9. 9. UV Survival: E. coli vs. H. volcanii [Figure: relative survival (log scale, 1 to 1E-07) vs. UV dose (0 to 400 J/m2) for E. coli NR10121 mfd-, E. coli NR10125 mfd+, and H. volcanii WFD11]
  10. 10. H. volcanii UV Repair [Figure: label vs. average molecular weight (0 to 18,000 base pairs) for 0 J/m2 t0, 45 J/m2 t0, 45 J/m2 Photoreac., and 45 J/m2 Dark 24 Hours]
  11. 11. Fleischmann et al. 1995
  12. 12. Limited Sampling of RRR Studies: Haloferax, Methanococcus, Chlorobium, Deinococcus, Thermotoga. Figure from Barton, Eisen et al. “Evolution”, CSHL Press. Based on tree from Pace NR, 2003.
  13. 13. From http://genomesonline.org
  14. 14. Human commensals
  15. 15. From http://genomesonline.org
  16. 16. Phylogenomics of Novelty I Origin of Functions from Within
  17. 17. Phylogenomics of Novelty • How does novelty originate? • Major categories of processes • From within • De novo invention • Simple substitutions • Duplication and divergence • Domain shuffling • Small & large rearrangements • Regulatory changes • From outside • Lateral gene transfer • Symbioses
  18. 18. Phylogenomics of Novelty • How does novelty originate? • Major categories of processes • From within • De novo invention • Simple substitutions • Duplication and divergence • Domain shuffling • Small & large rearrangements • Regulatory changes • From outside • Lateral gene transfer • Symbioses [Callout: Mechanisms of Origin of New Functions]
  19. 19. From Eisen et al. 1997 Nature Medicine 3: 1076-1078.
  20. 20. Blast Search of H. pylori “MutS” • Blast search pulls up Syn. sp MutS#2 with much higher p value than other MutS homologs • Based on this TIGR predicted this species had mismatch repair • Assumes functional constancy. Based on Eisen et al. 1997 Nature Medicine 3: 1076-1078.
  21. 21. Predicting Function • Identification of motifs – Short regions of sequence similarity that are indicative of general activity – e.g., ATP binding • Homology/similarity based methods – Gene sequence is searched against a database of other sequences – If significantly similar genes are found, their functional information is used • Problem – Genes frequently have similarity to hundreds of motifs and multiple genes, not all with the same function
  22. 22. MutL?? Based on Eisen et al. 1997 Nature Medicine 3: 1076-1078.
  23. 23. Overlaying Functions onto Tree [Figure: MutS family gene tree with known functions overlaid; labeled subfamilies are MutS1, MutS2, MSH1, MSH2, MSH3, MSH4, MSH5, and MSH6] Based on Eisen, 1998 Nucl Acids Res 26: 4291-4300.
  24. 24. Evolutionary Functional Prediction [Figure: flowchart of the method, illustrated with Example A (genes 1 to 6) and Example B (genes 1A, 2A, 3A, 1B, 2B, 3B from Species 1 to 3, with a possible duplication): choose gene(s) of interest; identify homologs; align sequences; calculate gene tree; overlay known functions onto tree; infer likely function of gene(s) of interest (may be ambiguous); the actual evolution is assumed to be unknown] Based on Eisen, 1998 Genome Res 8: 163-167.
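The last two steps on slide 24 (overlay known functions onto the tree, then infer the likely function of the gene of interest) can be sketched in a few lines. This is only an illustrative sketch, not the method from the cited paper: the nested-tuple tree, the gene names borrowed from the slide's Example B, the function labels, and the "smallest clade with annotated members" rule are all assumptions made for the example.

```python
# Toy gene tree as nested tuples; leaves are gene names (hypothetical data).
GENE_TREE = (("1A", ("2A", "3A")), ("1B", ("2B", "3B")))
# Known functions overlaid onto some leaves (hypothetical labels).
KNOWN = {"1A": "function X", "3A": "function X",
         "1B": "function Y", "2B": "function Y"}


def leaves(node):
    """Return all leaf names under a node."""
    if isinstance(node, str):
        return [node]
    found = []
    for child in node:
        found.extend(leaves(child))
    return found


def path_to(node, target):
    """Return the clades from the root down to the target leaf, or None."""
    if isinstance(node, str):
        return [node] if node == target else None
    for child in node:
        sub = path_to(child, target)
        if sub is not None:
            return [node] + sub
    return None


def infer_function(tree, query, known):
    """Walk outward from the query and use the first clade with annotations."""
    path = path_to(tree, query)
    if path is None:
        raise ValueError(f"{query!r} is not a leaf of the tree")
    for clade in reversed(path[:-1]):  # smallest enclosing clade first
        functions = {known[g] for g in leaves(clade)
                     if g != query and g in known}
        if len(functions) == 1:
            return next(iter(functions))
        if len(functions) > 1:
            return "ambiguous"  # annotated relatives disagree
    return "unknown"


if __name__ == "__main__":
    for gene in ("2A", "3B"):
        print(gene, "->", infer_function(GENE_TREE, gene, KNOWN))
```

A real analysis would also weigh branch lengths, support values, and inferred duplication events, which this sketch ignores.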
  25. 25. Example 2: Recent Changes • Phylogenomic functional prediction may not work well for very newly evolved functions • Can use understanding of origin of novelty to better interpret these cases? • Screen genomes for genes that have changed recently – Pseudogenes and gene loss – Contingency Loci – Acquisition (e.g., LGT) – Unusual dS/dN ratios – Rapid evolutionary rates – Recent duplications [Figure: NJ gene tree of homologs from V. cholerae, B. subtilis, Synechocystis sp., H. pylori, C. jejuni, A. fulgidus, E. coli, T. maritima, T. pallidum, B. burgdorferi, P. abyssi, P. horikoshii, D. radiodurans, and M. tuberculosis]
  26. 26. Tetrahymena Genome Processing • Probably exists as a defense mechanism • Analogous to RIPPING and heterochromatin silencing • Presence of repetitive DNA in MAC but not TEs suggests the mechanism involves targeting foreign DNA • Thus, unlike RIPPING, ciliate processing does not limit diversification by duplication. Eisen et al. 2006. PLoS Biology.
  27. 27. Phylogenomics of Novelty II: Sometimes, it is easier to steal, borrow, or coopt functions rather than evolve them anew
  28. 28. Stealing DNA
  29. 29. rRNA Tree of Life: Bacteria, Archaea, Eukaryotes. Figure from Barton, Eisen et al. “Evolution”, CSHL Press. Based on tree from Pace NR, 2003.
  30. 30. Perna et al. 2003
  31. 31. Network of Life: Bacteria, Archaea, Eukaryotes. Figure from Barton, Eisen et al. “Evolution”, CSHL Press. Based on tree from Pace NR, 2003.
  32. 32. Correlated gain/loss of genes • Microbial genes are lost rapidly when not maintained by selection • Genes can be acquired by lateral transfer • Gain and loss frequently occur for entire pathways/processes • Thus we might be able to use correlated presence/absence information to identify genes with similar functions
  33. 33. Non-Homology Predictions: Phylogenetic Profiling • Step 1: Search all genes in organisms of interest against all other genomes • Ask: Yes or No, is each gene found in each other species? • Cluster genes by distribution patterns (profiles)
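The profiling steps on slide 33 can be illustrated with a small sketch. It assumes the homology search has already been run and summarized as a table of which genomes contain a homolog of each gene; the gene names, genome names, and the choice to cluster only identical profiles are assumptions for illustration (real analyses typically cluster by profile similarity rather than exact identity).

```python
from collections import defaultdict

# Hypothetical search results: gene -> genomes in which a homolog was found.
GENOMES = ["sp1", "sp2", "sp3", "sp4"]
PRESENCE = {
    "geneA": {"sp1", "sp3"},
    "geneB": {"sp1", "sp3"},
    "geneC": {"sp1", "sp2", "sp3", "sp4"},
    "geneD": {"sp2"},
}


def build_profiles(presence, genomes):
    """Turn each gene's hits into a yes/no (1/0) vector over the genomes."""
    return {gene: tuple(int(g in found) for g in genomes)
            for gene, found in presence.items()}


def cluster_by_profile(profiles):
    """Group genes that share exactly the same presence/absence profile."""
    clusters = defaultdict(list)
    for gene, profile in sorted(profiles.items()):
        clusters[profile].append(gene)
    return dict(clusters)


if __name__ == "__main__":
    profiles = build_profiles(PRESENCE, GENOMES)
    for profile, genes in cluster_by_profile(profiles).items():
        print(profile, genes)
```

Genes that end up in the same cluster (here geneA and geneB) are candidates for involvement in the same pathway or process, which is the idea behind the correlated gain/loss argument on the preceding slide.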
  34. 34. Carboxydothermus hydrogenoformans • Isolated from a Russian hot spring • Thermophile (grows at 80°C) • Anaerobic • Grows very efficiently on CO (carbon monoxide) • Produces hydrogen gas • Low GC Gram positive (Firmicute) • Genome determined (Wu et al. 2005 PLoS Genetics 1: e65)
  35. 35. Homologs of Sporulation Genes Wu et al. 2005 PLoS Genetics 1: e65.
  36. 36. Carboxydothermus sporulates Wu et al. 2005 PLoS Genetics 1: e65.
  37. 37. Wu et al. 2005 PLoS Genetics 1: e65.
  38. 38. Stealing Organisms (Symbioses)
  39. 39. Mutualistic Genome Evolution • Compare and contrast different types of mutualistic symbioses • Diverse hosts, symbionts, biology, ages • Organelles, chemosymbioses, photosynthetic symbioses, nutritional symbioses • What are the rules & patterns?
  40. 40. Glassy Winged Sharpshooter • Feeds on xylem sap • Vector for Pierce’s Disease • Potential bioterror agent
  41. 41. Sharpshooter Shotgun Sequencing. Wu et al. 2006 PLoS Biology 4: e188. Collaboration with Nancy Moran’s Lab
  42. 42. Higher Evolutionary Rates in Endosymbionts. Wu et al. 2006 PLoS Biology 4: e188. Collaboration with Nancy Moran’s Lab
  43. 43. Variation in Evolution Rates [Figure: MutS and MutL presence/absence marks (+ + + + + + + + _ _ _ _)] Wu et al. 2006 PLoS Biology 4: e188. Collaboration with Nancy Moran’s Lab
  44. 44. Polymorphisms in Metapopulation • Data from ~200 hosts – 104 SNPs – 2 indels • PCR surveys show that this is between-host variation • Much lower ratio of transitions:transversions than in Blochmannia • Consistent with absence of MMR from Blochmannia
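The transitions:transversions ratio mentioned on slide 44 is simple to compute once SNPs are listed as pairs of alleles. The sketch below is only an illustration: the SNP list is made up and is not data from the talk or from Wu et al. 2006.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def is_transition(ref, alt):
    """True for purine<->purine or pyrimidine<->pyrimidine changes."""
    ref, alt = ref.upper(), alt.upper()
    valid = PURINES | PYRIMIDINES
    if ref not in valid or alt not in valid or ref == alt:
        raise ValueError(f"not a valid SNP: {ref}>{alt}")
    pair = {ref, alt}
    return pair <= PURINES or pair <= PYRIMIDINES


def ts_tv_ratio(snps):
    """Ratio of transitions to transversions for (ref, alt) pairs."""
    transitions = sum(1 for ref, alt in snps if is_transition(ref, alt))
    transversions = len(snps) - transitions
    if transversions == 0:
        raise ValueError("no transversions; ratio is undefined")
    return transitions / transversions


# Made-up example: two transitions (A>G, C>T) and two transversions.
EXAMPLE_SNPS = [("A", "G"), ("C", "T"), ("A", "C"), ("G", "T")]

if __name__ == "__main__":
    print(ts_tv_ratio(EXAMPLE_SNPS))
```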
  45. 45. Baumannia is a Vitamin and Cofactor Producing Machine. Wu et al. 2006 PLoS Biology 4: e188.
  46. 46. No Amino-Acid Synthesis
  47. 47. The Uncultured Majority
  48. 48. Great Plate Count Anomaly: Culturing Count vs. Microscope Count
  49. 49. Great Plate Count Anomaly: Culturing Count <<<< Microscope Count
  50. 50. Great Plate Count Anomaly: Culturing Count <<<< Microscope Count; DNA
  51. 51. rRNA PCR: The Hidden Majority. Richness estimates: Hugenholtz 2002; Bohannan and Hughes 2003
  52. 52. rRNA data increasing exponentially too
  53. 53. Perna et al. 2003
  54. 54. Metagenomics shotgun clone
  55. 55. How can we best use metagenomic data? • Many possible uses including: – Improvements on rRNA based phylotyping and species diversity measurements – Adding functional information on top of phylogenetic/species diversity information • Most/all possible uses either require or are improved with phylogenetic analysis
  56. 56. Example I: Phylotyping with rRNA and other genes
  57. 57. Functional Diversity of Proteorhodopsins? Venter et al., 2004
  58. 58. Shotgun Sequencing Allows Use of Other Markers [Figure: Sargasso phylotypes; weighted % of clones (0 to 0.5) by major phylogenetic group (Alphaproteobacteria, Betaproteobacteria, Gammaproteobacteria, Epsilonproteobacteria, Deltaproteobacteria, Cyanobacteria, Firmicutes, Actinobacteria, Chlorobi, CFB, Chloroflexi, Spirochaetes, Fusobacteria, Deinococcus-Thermus, Euryarchaeota, Crenarchaeota) for the markers EFG, EFTu, rRNA, RecA, RpoB, and HSP70] Venter et al., Science 304: 66-74. 2004
  59. 59. Example II: Binning
  60. 60. Metagenomics Challenge
  61. 61. Binning challenge [Figure: labels A to G and T to Z]
  62. 62. Binning challenge [Figure: labels A to G and T to Z] Best binning method: reference genomes
  63. 63. Binning challenge [Figure: labels A to G and T to Z] Best binning method: reference genomes
  64. 64. Binning challenge [Figure: labels A to G and T to Z] No reference genome? What do you do?
  65. 65. Binning challenge [Figure: labels A to G and T to Z] No reference genome? What do you do? Phylogeny ....
  66. 66. No Amino-Acid Synthesis
  67. 67. ???????
  68. 68. CFB Phyla
  69. 69. Sulcia makes amino acids. Baumannia makes vitamins and cofactors. Wu et al. 2006 PLoS Biology 4: e188.
  70. 70. Phylogenomics of Novelty III Knowing What We Don’t Know
  71. 71. Research Topics: Mechanisms of Origin of New Functions • Variation in Mechanisms: Patterns, Causes and Effects • Species Evolution
  72. 72. Research Topics: Mechanisms of Origin of New Functions • Variation in Mechanisms: Patterns, Causes and Effects • Species Evolution
  73. 73. As of 2002
  74. 74. As of 2002 • At least 40 phyla of bacteria [Figure: tree of bacterial phyla: Proteobacteria, TM6, OS-K, Acidobacteria, Termite Group, OP8, Nitrospira, Bacteroides, Chlorobi, Fibrobacteres, Marine Group A, WS3, Gemmimonas, Firmicutes, Fusobacteria, Actinobacteria, OP9, Cyanobacteria, Synergistes, Deferribacteres, Chrysiogenetes, NKB19, Verrucomicrobia, Chlamydia, OP3, Planctomycetes, Spirochaetes, Coprothermobacter, OP10, Thermomicrobia, Chloroflexi, TM7, Deinococcus-Thermus, Dictyoglomus, Aquificae, Thermodesulfobacteria, Thermotogae, OP1, OP11] Based on Hugenholtz, 2002
  75. 75. As of 2002 • At least 40 phyla of bacteria • Genome sequences are mostly from three phyla [Figure: tree of bacterial phyla, as on slide 74] Based on Hugenholtz, 2002
  76. 76. As of 2002 • At least 40 phyla of bacteria • Genome sequences are mostly from three phyla • Some other phyla are only sparsely sampled [Figure: tree of bacterial phyla, as on slide 74] Based on Hugenholtz, 2002
  77. 77. As of 2002 • At least 40 phyla of bacteria • Genome sequences are mostly from three phyla • Some other phyla are only sparsely sampled [Figure: tree of bacterial phyla, as on slide 74] Based on Hugenholtz, 2002
  78. 78. NSF-funded Tree of Life Project • A genome from each of eight phyla • At least 40 phyla of bacteria • Genome sequences are mostly from three phyla • Some other phyla are only sparsely sampled • Solution I: sequence more phyla [Figure: tree of bacterial phyla, as on slide 74] Eisen, Ward, Robb, Nelson, et al
  79. 79. NSF-funded Tree of Life Project • A genome from each of eight phyla • At least 40 phyla of bacteria • Genome sequences are mostly from three phyla • Some other phyla are only sparsely sampled • Still highly biased in terms of the tree [Figure: tree of bacterial phyla, as on slide 74] Eisen & Ward, PIs
  80. 80. GEBA • A genomic encyclopedia of bacteria and archaea • At least 40 phyla of bacteria • Genome sequences are mostly from three phyla • Some other phyla are only sparsely sampled • Solution: Really Fill in the Tree [Figure: tree of bacterial phyla, as on slide 74] Eisen & Ward, PIs
  81. 81. http://www.jgi.doe.gov/programs/GEBA/pilot.html
  82. 82. GEBA Pilot Project: Components • Project overview (Phil Hugenholtz, Nikos Kyrpides, Jonathan Eisen, Eddy Rubin, Jim Bristow) • Project management (David Bruce, Eileen Dalin, Lynne Goodwin) • Culture collection and DNA prep (DSMZ, Hans-Peter Klenk) • Sequencing and closure (Eileen Dalin, Susan Lucas, Alla Lapidus, Mat Nolan, Alex Copeland, Cliff Han, Feng Chen, Jan-Fang Cheng) • Annotation and data release (Nikos Kyrpides, Victor Markowitz, et al) • Analysis (Dongying Wu, Kostas Mavrommatis, Martin Wu, Victor Kunin, Neil Rawlings, Ian Paulsen, Patrick Chain, Patrik D’Haeseleer, Sean Hooper, Iain Anderson, Amrita Pati, Natalia N. Ivanova, Athanasios Lykidis, Adam Zemla) • Adopt a microbe education project (Cheryl Kerfeld) • Outreach (David Gilbert) • $$$ (DOE, Eddy Rubin, Jim Bristow)
  83. 83. GEBA Pilot Project Overview • Identify major branches in rRNA tree for which no genomes are available • Identify those with a cultured representative in DSMZ • DSMZ grew > 200 of these and prepped DNA • Sequence and finish 100+ (covering breadth of bacterial/archaeal diversity) • Annotate, analyze, release data • Assess benefits of tree guided sequencing • 1st paper Wu et al in Nature Dec 2009
  84. 84. GEBA Phylogenomic Lesson 1: The rRNA Tree of Life is a Useful Tool for Identifying Phylogenetically Novel Genomes
  85. 85. Compare PD in Trees. From Wu et al. 2009 Nature 462, 1056-1060
  86. 86. GEBA Phylogenomic Lesson 2: The rRNA Tree of Life is not perfect ...
  87. 87. 16S Says Hyphomonas is in Rhodobacterales. Badger et al. 2005 Int J Syst Evol Microbiol 55: 1021-1026.
  88. 88. WGT and individual gene trees: It’s Related to Caulobacterales. Badger et al. 2005 Int J Syst Evol Microbiol 55: 1021-1026.
  89. 89. GEBA Phylogenomic Lesson 3 Phylogeny-driven genome selection helps discover new genetic diversity
  90. 90. Network of Life: Bacteria, Archaea, Eukaryotes. Figure from Barton, Eisen et al. “Evolution”, CSHL Press. Based on tree from Pace NR, 2003.
  91. 91. Protein Family Rarefaction Curves • Take data set of multiple complete genomes • Identify all protein families using MCL • Plot # of genomes vs. # of protein families
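The curve described on slide 91 can be sketched as follows. The sketch assumes the protein families have already been identified (for example with MCL) and are available as a set of family IDs per genome; the genome and family names, the number of permutations, and the averaging over random genome orders are assumptions for illustration rather than details taken from the talk.

```python
import random

# Hypothetical clustering output: genome -> set of protein family IDs.
FAMILIES = {
    "g1": {"f1", "f2"},
    "g2": {"f2", "f3"},
    "g3": {"f1", "f2", "f3", "f4"},
}


def accumulation_curve(genome_families, order):
    """Cumulative count of distinct families as genomes are added in order."""
    seen = set()
    curve = []
    for genome in order:
        seen |= genome_families[genome]
        curve.append(len(seen))
    return curve


def rarefaction_curve(genome_families, n_permutations=100, seed=0):
    """Average the accumulation curve over random genome orderings."""
    rng = random.Random(seed)
    genomes = sorted(genome_families)
    totals = [0] * len(genomes)
    for _ in range(n_permutations):
        order = genomes[:]
        rng.shuffle(order)
        for i, count in enumerate(accumulation_curve(genome_families, order)):
            totals[i] += count
    return [total / n_permutations for total in totals]


if __name__ == "__main__":
    # x axis: number of genomes sampled; y axis: mean number of families.
    for n, mean_families in enumerate(rarefaction_curve(FAMILIES), start=1):
        print(n, mean_families)
```

A curve that keeps rising as genomes are added suggests that new protein families are still being discovered, which is the comparison the following slides draw on.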
  92. 92. Wu et al. 2009 Nature 462, 1056-1060
  93. 93. Wu et al. 2009 Nature 462, 1056-1060
  94. 94. Wu et al. 2009 Nature 462, 1056-1060
  95. 95. Wu et al. 2009 Nature 462, 1056-1060
  96. 96. Wu et al. 2009 Nature 462, 1056-1060
  97. 97. Synapomorphies exist. Wu et al. 2009 Nature 462, 1056-1060
  99. 99. Phylogenetic Distribution Novelty: Bacterial Actin Related Protein [Figure: gene tree of actin-related proteins, including Haliangium ochraceum DSM 14365] Patrik D’haeseleer, Adam Zemla, Victor Kunin. Wu et al. 2009 Nature 462, 1056-1060. See also Guljamow et al. 2007 Current Biology.
  100. 100. GEBA Phylogenomic Lesson 4: Phylogeny driven genome selection (and phylogenetics in general) improves genome annotation
  101. 101. Most/All Functional Prediction Improves w/ Better Phylogenetic Sampling • Took 56 GEBA genomes and compared results vs. 56 randomly sampled new genomes • Better definition of protein family sequence “patterns” • Greatly improves “comparative” and “evolutionary” based predictions • Conversion of hypothetical into conserved hypotheticals • Linking distantly related members of protein families • Improved non-homology prediction. Kostas Mavrommatis, Natalia Ivanova, Thanos Lykidis, Nikos Kyrpides, Iain Anderson
  102. 102. GEBA Phylogenomic Lesson 5 Improves analysis of genome data from uncultured organisms
  103. 103. Shotgun Sequencing Allows Use of Other Markers [Figure: Sargasso phylotypes; weighted % of clones by major phylogenetic group for the markers EFG, EFTu, rRNA, RecA, RpoB, and HSP70, as on slide 58] Venter et al., Science 304: 66-74. 2004
  104. 104. Shotgun Sequencing Allows Use of Other Markers • Cannot be done without good sampling of genomes [Figure: Sargasso phylotypes; weighted % of clones by major phylogenetic group for the markers EFG, EFTu, rRNA, RecA, RpoB, and HSP70, as on slide 58] Venter et al., Science 304: 66-74. 2004
