Chlamydia-Verrucomicrobia
Proteobacteria
Zimmer. New York Times. 2009
GEBA Phylogenomic Lesson 3: Phylogenetics guided genome selection (and phylogeneti...
Predicting Function
  • Key step in genome projects
  • More accurate predictions help guide experim...
From Eisen et al. 1997 Nature Medicine 3: 1076-107...
Blast Search of H. pylori “MutS”
  • Blast search pulls up Syn. sp MutS#2 with much higher p value than ot...
Predicting Function
  • Identification of motifs
    – Short regions of sequence similarity that are indic...
MutL?? From http://asajj.roswellpark.org/huberman/dna_repair/mmr.html
Phylogenetic Tree of MutS Family [figure; labels: Aquae ...]
MutS Subfamilies [figure; labels: MSH5, MutS2 ...]
Overlaying Functions onto Tree [figure; labels: MutS2 ...]
Functional Prediction Using Tree
  MSH5 - Meiotic Crossing Over
  MutS2 - Unknown Functions
  ...
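The idea of reading a function off the tree, rather than off the top Blast hit, can be sketched in a few lines. This is only an illustration under stated assumptions: the nested-tuple tree, the leaf names, the "function X" label, and the rule "take the smallest clade around the query whose annotated members all agree" are hypothetical choices made for this sketch, not the method used in the talk.

```python
def leaves(tree):
    """All leaf names in a nested-tuple tree (a leaf is a plain string)."""
    if isinstance(tree, str):
        return [tree]
    return [leaf for child in tree for leaf in leaves(child)]

def clades_containing(tree, query):
    """Clades that contain the query, ordered from smallest to largest."""
    if isinstance(tree, str):
        return []
    for child in tree:
        if query in leaves(child):
            return clades_containing(child, query) + [tree]
    return []

def predict_function(tree, query, known):
    """Predict from the smallest clade that has annotated members.

    Returns None when the query is absent, nothing is annotated,
    or the annotated members of that clade disagree.
    """
    for clade in clades_containing(tree, query):
        functions = {known[l] for l in leaves(clade) if l != query and l in known}
        if len(functions) == 1:
            return functions.pop()
        if len(functions) > 1:
            return None  # conflicting annotations: leave unpredicted
    return None

# Hypothetical toy tree with two subfamilies and two unannotated genes.
toy_tree = ((("msh5_sp1", "msh5_sp2"), "query1"),
            (("famX_sp1", "famX_sp2"), "query2"))
toy_known = {"msh5_sp1": "meiotic crossing over",
             "msh5_sp2": "meiotic crossing over",
             "famX_sp1": "function X",
             "famX_sp2": "function X"}
print(predict_function(toy_tree, "query1", toy_known))
```

A design note: returning None on conflict is deliberately conservative, since a clade with mixed annotations may mark a change of function within the subfamily.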
PHYLOGENETIC PREDICTION OF GENE FUNCTION: EXAMPLE A ...
Phylogenetic Prediction of
  • Termed phylogenomics (Eisen et al. 1997)
  • Greatly improves accuracy of funct...
Example 2: Recent Changes
  • Phylogenomic functional prediction
  NJ ...
Example 3: Non homology methods
  • Many genes have homologs in other species ...
Phylogenetic profiling basis
  • Microbial genes are lost rapidly when not maintained by selection ...
Non-Homology Predictions: Phylogenetic Profiling
  • Step 1: Search all genes in organisms ...
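Phylogenetic profiling can be illustrated with a small presence/absence comparison. The sketch below is hedged: the genome names, gene names, and the Hamming-distance cutoff are all hypothetical, and the remaining steps of the slide are truncated in the source, so this shows only the general technique of comparing profiles across genomes.

```python
def build_profiles(gene_presence, genomes):
    """Turn {gene: set of genomes containing it} into 0/1 profile tuples."""
    return {gene: tuple(int(g in present) for g in genomes)
            for gene, present in gene_presence.items()}

def hamming(p, q):
    """Number of genomes in which two profiles disagree."""
    return sum(a != b for a, b in zip(p, q))

def similar_profiles(profiles, query, max_distance=0):
    """Other genes whose profile differs from the query's in at most max_distance genomes."""
    hits = [(hamming(profiles[query], p), g)
            for g, p in profiles.items() if g != query]
    return sorted((d, g) for d, g in hits if d <= max_distance)

# Hypothetical toy data: which genomes carry which gene.
toy_genomes = ["sp1", "sp2", "sp3", "sp4", "sp5"]
toy_presence = {
    "geneA": {"sp1", "sp3", "sp4"},
    "geneB": {"sp1", "sp3", "sp4"},
    "unknown1": {"sp1", "sp3", "sp4"},
    "housekeeping": {"sp1", "sp2", "sp3", "sp4", "sp5"},
    "other": {"sp2", "sp5"},
}
toy_profiles = build_profiles(toy_presence, toy_genomes)
print(similar_profiles(toy_profiles, "unknown1"))
```

Genes that share the query's distribution become candidates for a shared process; a gene present everywhere, like the toy "housekeeping" entry, carries no such signal.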
Carboxydothermus hydrogenoformans
  • Isolated from a Russian hot spring
  • Thermophile (grows at 80°C)
  • Anaerobic
  • ...
Homologs of Sporulation Genes. Wu et al. 2005 PLoS Ge...
Carboxydothermus sporulates. Wu et al. 2005 PLoS Genetics 1: e65.
Wu et al. 2005 PLoS Genetics 1: e65.
PG Profiling Works Better Using Orthology
GEBA Lesson 3: Phylogeny driven genome selection (and phylogenetics) improves genome annotation ...
GEBA Lesson 4: Metadata Important
GEBA Phylogenomic Lesson 5: Phylogeny-driven genome selection helps discover new gene...
Network of Life: Bacteria, Archaea ...
Protein Family Rarefaction
  • Take data set of multiple complete genomes
  • Identify all protein families us...
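A rarefaction curve of this kind can be sketched as follows. The slide text is truncated in the source, so this is a generic illustration rather than the procedure from the talk: genomes are represented as sets of hypothetical family identifiers, and the curve is the mean number of distinct families seen as genomes are added in random order.

```python
import random

def rarefaction_curve(genomes, n_permutations=100, seed=0):
    """Mean count of distinct protein families after adding 1..N genomes.

    genomes: list of sets of family identifiers (hypothetical input format).
    """
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    totals = [0] * len(genomes)
    for _ in range(n_permutations):
        order = genomes[:]
        rng.shuffle(order)
        seen = set()
        for i, families in enumerate(order):
            seen |= families
            totals[i] += len(seen)
    return [t / n_permutations for t in totals]

# Hypothetical toy data: three genomes with overlapping family content.
toy_genomes = [{"a", "b"}, {"b", "c"}, {"c", "d", "e"}]
print(rarefaction_curve(toy_genomes))
```

A curve that keeps climbing as genomes are added suggests that further sampling would still uncover new families.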
Wu et al. 2009 Nature 462, 1056-1060
Synapomorphies exist. Wu et al. 2009 Nature 462, 1056-1060
Families/PD not uniform [figure; axis labels not recoverable]
Structural Novelty
  • Of the 17000 protein families in the GEBA56, 1800 are novel in sequence (Wu) ...
GEBA Phylogenomic Lesson 6: Improves analysis of genome data from uncult...
Great Plate Count Anomaly: Culturing Count, Microscope Count
Great Plate Count Anomaly: Culturing Count <<<< Microscope Count ...
Environmental DNA Analysis: DNA, Culturing ...
Talk by Jonathan Eisen for UC Davis Applied Phylogenetics Course at Bodega Bay

Talk for UC Davis Applied Phylogenetics Course at Bodega Bay

  1. Phylogenomics. Jonathan A. Eisen, UC Davis. Bodega Applied Phylogenetics Workshop, March 7, 2011. (Slide footer date: Tuesday, March 8, 2011)
  2. Fleischmann et al. 1995 Science 269:496-512
  3-4. Whole Genome Shotgun Sequencing
  5. Whole Genome Shotgun Sequencing. Warner Brothers, Inc.
  6-7. Whole Genome Shotgun Sequencing: shotgun. Warner Brothers, Inc.
  8-9. Whole Genome Shotgun Sequencing: shotgun, sequence. Warner Brothers, Inc.
  10. Assemble Fragments
  11-12. Assemble Fragments: sequencer output
  13. Assemble Fragments: sequencer output, assemble fragments
  14. Assemble Fragments: sequencer output, assemble fragments, Closure & Annotation
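The "assemble fragments" step can be illustrated with a toy greedy overlap merger. This is a minimal sketch under stated assumptions, not the assembler used for any genome mentioned in the talk: reads are assumed error-free, all from one strand, and none contained inside another, and the function names and `min_len` parameter are hypothetical.

```python
def overlap(a, b, min_len=3):
    """Length of the longest suffix of a that equals a prefix of b (0 if shorter than min_len)."""
    for k in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:k]):
            return k
    return 0

def greedy_assemble(reads, min_len=3):
    """Repeatedly merge the pair of sequences with the largest overlap."""
    contigs = list(dict.fromkeys(reads))  # drop exact duplicate reads, keep order
    while True:
        best_k, best_i, best_j = 0, None, None
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    k = overlap(a, b, min_len)
                    if k > best_k:
                        best_k, best_i, best_j = k, i, j
        if best_k == 0:
            return contigs  # nothing left to merge
        merged = contigs[best_i] + contigs[best_j][best_k:]
        contigs = [c for idx, c in enumerate(contigs)
                   if idx not in (best_i, best_j)] + [merged]

# Hypothetical toy reads sampled from the sequence ATGCGTACGTTAGC.
toy_reads = ["ATGCGTAC", "GTACGTTA", "GTTAGC"]
print(greedy_assemble(toy_reads))
```

Real shotgun data adds sequencing errors, both strands, and repeats, which is why the slides list closure as a separate step after assembly.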
  15. From http://genomesonline.org
  16-19. [Figures; no slide text]
  20. Genome Sequences Have Revolutionized Microbiology
      • Predictions of metabolic processes
      • Better vaccine and drug design
      • New insights into mechanisms of evolution
      • Genomes serve as template for functional studies
      • New enzymes and materials for engineering and synthetic biology
  21. General Steps in Analysis of Complete Genomes
      • Identification/prediction of genes
      • Characterization of gene features
      • Characterization of genome features
      • Prediction of gene function
      • Prediction of pathways
      • Integration with known biological data
      • Comparative genomics
  22. Genome Size
  23. Genome Structure: More Variable than Once Thought
  24. [Figure; no slide text]
  25. Why Completeness is [Important]
      • Improves characterization of genome features
        – Gene order, replication origins
      • Better comparative genomics
        – Genome duplications, inversions
      • Presence and absence of particular genes can be very important
      • Missing sequence might be important (e.g., centromere)
      • Allows researchers to focus on biology not sequencing
  26. Vibrio cholerae Metabolism
  27. [Figure; no slide text]
  28. From http://genomesonline.org
  29. Phylogenomic Analysis
      • Evolutionary reconstructions greatly improve genome analyses
      • Genome analysis greatly improves evolutionary reconstructions
      • There is a feedback loop such that these should be integrated
  30-31. Outline
      • Phylogenomic Tales
        – Selecting genomes for sequencing
        – Species evolution
        – Predicting functions of genes
        – Uncultured microbes
        – Searching for novel organisms and genes
      • All of these are going to be told in the context of a recent project, “A Genomic Encyclopedia of Bacteria and Archaea” (aka GEBA) (added on slide 31)
  32. GEBA Introduction: Knowing What We Don’t Know
  33. Major Microbial Sequencing Efforts
      • Coordinated, top-down efforts
        – Fungal Genome Initiative (Broad/Whitehead)
        – Gordon and Betty Moore Foundation Marine Microbial Genome Sequencing Project
        – Sanger Center Pathogen Sequencing Unit
        – NHGRI Human Gut Microbiome Project
        – NIH Human Microbiome Program
      • White paper or grant systems
        – NIAID Microbial Sequencing Centers
        – DOE/JGI Community Sequencing Program
        – DOE/JGI BER Sequencing Program
        – NSF/USDA Microbial Genome Sequencing
      • Covers lots of ground and biological diversity
  34. As of 2002
  35-38. As of 2002 (bullets added one per slide)
      • At least 40 phyla of bacteria
      • Genome sequences are mostly from three phyla
      • Some other phyla are only sparsely sampled
      Based on Hugenholtz, 2002
      [Figure: tree of bacterial phyla. Labels: Proteobacteria, TM6, OS-K, Acidobacteria, Termite Group, OP8, Nitrospira, Bacteroides, Chlorobi, Fibrobacteres, Marine Group A, WS3, Gemmimonas, Firmicutes, Fusobacteria, Actinobacteria, OP9, Cyanobacteria, Synergistes, Deferribacteres, Chrysiogenetes, NKB19, Verrucomicrobia, Chlamydia, OP3, Planctomycetes, Spirochaetes, Coprothermobacter, OP10, Thermomicrobia, Chloroflexi, TM7, Deinococcus-Thermus, Dictyoglomus, Aquificae, Thermodesulfobacteria, Thermotogae, OP1, OP11]
  39. Need for Tree Guidance Well Established
      • Common approach within some eukaryotic groups
      • Many small projects funded to fill in some bacterial or archaeal gaps
      • Phylogenetic gaps in bacterial and archaeal projects commonly lamented in literature
  40. [Figure: tree of bacterial phyla, as on slides 35-38]
      • NSF-funded Tree of Life Project
      • A genome from each of eight phyla
      Eisen, Ward, Robb, Nelson, et al
      • At least 40 phyla of bacteria
      • Genome sequences are mostly from three phyla
      • Some other phyla are only sparsely sampled
      • Solution I: sequence more phyla
  41. Organisms Selected
      Phylum                   Species selected
      Chrysiogenes             Chrysiogenes arsenatis (GCA)
      Coprothermobacter        Coprothermobacter proteolyticus (GCBP)
      Dictyoglomi              Dictyoglomus thermophilum (GDT)
      Thermodesulfobacteria    Thermodesulfobacterium commune (GTC)
      Nitrospirae              Thermodesulfovibrio yellowstonii (GTY)
      Thermomicrobia           Thermomicrobium roseum (GTR)
      Deferribacteres          Geovibrio thiophilus (GGT)
      Synergistes              Synergistes jonesii (GSJ)
  42. [Figure: tree of bacterial phyla, as on slides 35-38]
      • NSF-funded Tree of Life Project
      • A genome from each of eight phyla
      Eisen & Ward, PIs
      • At least 40 phyla of bacteria
      • Genome sequences are mostly from three phyla
      • Some other phyla are only sparsely sampled
      • Still highly biased in terms of the tree
  43. Major Lineages of Actinobacteria
      2.5 Actinobacteria
      2.5.1 Acidimicrobidae
        2.5.1.1 Unclassified; 2.5.1.2 "Microthrixineae"; 2.5.1.3 Acidimicrobineae (2.5.1.3.1 Unclassified, 2.5.1.3.2 Acidimicrobiaceae); 2.5.1.4 BD2-10; 2.5.1.5 EB1017
      2.5.2 Actinobacteridae
        2.5.2.1 Unclassified; 2.5.2.10 Ellin306/WR160; 2.5.2.11 Ellin5012; 2.5.2.12 Ellin5034
        2.5.2.13 Frankineae (2.5.2.13.1 Unclassified, 2.5.2.13.2 Acidothermaceae, 2.5.2.13.3 Ellin6090, 2.5.2.13.4 Frankiaceae, 2.5.2.13.5 Geodermatophilaceae, 2.5.2.13.6 Microsphaeraceae, 2.5.2.13.7 Sporichthyaceae)
        2.5.2.14 Glycomyces
        2.5.2.15 Intrasporangiaceae (2.5.2.15.1 Unclassified, 2.5.2.15.2 Dermacoccus, 2.5.2.15.3 Intrasporangiaceae)
        2.5.2.16 Kineosporiaceae
        2.5.2.17 Microbacteriaceae (2.5.2.17.1 Unclassified, 2.5.2.17.2 Agrococcus, 2.5.2.17.3 Agromyces)
        2.5.2.18 Micrococcaceae; 2.5.2.19 Micromonosporaceae; 2.5.2.2 Actinomyces
        2.5.2.20 Propionibacterineae (2.5.2.20.1 Unclassified, 2.5.2.20.2 Kribbella, 2.5.2.20.3 Nocardioidaceae, 2.5.2.20.4 Propionibacteriaceae)
        2.5.2.21 Pseudonocardiaceae
        2.5.2.22 Streptomycineae (2.5.2.22.1 Unclassified, 2.5.2.22.2 Kitasatospora, 2.5.2.22.3 Streptacidiphilus)
        2.5.2.23 Streptosporangineae (2.5.2.23.1 Unclassified, 2.5.2.23.2 Ellin5129, 2.5.2.23.3 Nocardiopsaceae, 2.5.2.23.4 Streptosporangiaceae, 2.5.2.23.5 Thermomonosporaceae)
        2.5.2.3 Actinomycineae; 2.5.2.4 Actinosynnemataceae; 2.5.2.5 Bifidobacteriaceae; 2.5.2.6 Brevibacteriaceae; 2.5.2.7 Cellulomonadaceae
        2.5.2.8 Corynebacterineae (2.5.2.8.1 Unclassified, 2.5.2.8.2 Corynebacteriaceae, 2.5.2.8.3 Dietziaceae, 2.5.2.8.4 Gordoniaceae, 2.5.2.8.5 Mycobacteriaceae, 2.5.2.8.6 Rhodococcus, 2.5.2.8.7 Rhodococcus, 2.5.2.8.8 Rhodococcus)
        2.5.2.9 Dermabacteraceae (2.5.2.9.1 Unclassified, 2.5.2.9.2 Brachybacterium, 2.5.2.9.3 Dermabacter)
      2.5.3 Coriobacteridae
        2.5.3.1 Unclassified; 2.5.3.2 Atopobiales; 2.5.3.3 Coriobacteriales; 2.5.3.4 Eggerthellales
      2.5.4 OPB41
      2.5.5 PK1
      2.5.6 Rubrobacteridae
        2.5.6.1 Unclassified; 2.5.6.2 "Thermoleiphilaceae" (2.5.6.2.1 Unclassified, 2.5.6.2.2 Conexibacter, 2.5.6.2.3 XGE514); 2.5.6.3 MC47; 2.5.6.4 Rubrobacteraceae
  44-46. Same slide as 42, with the final bullet changed in turn:
      • Same trend in Archaea (slide 44)
      • Same trend in Eukaryotes (slide 45)
      • Same trend in Viruses (slide 46)
  47. [Figure: tree of bacterial phyla, as on slides 35-38]
      • GEBA
      • A genomic encyclopedia of bacteria and archaea
      Eisen & Ward, PIs
      • At least 40 phyla of bacteria
      • Genome sequences are mostly from three phyla
      • Some other phyla are only sparsely sampled
      • Solution: Really Fill in the Tree
  48. http://www.jgi.doe.gov/programs/GEBA/pilot.html
  49. 49. GEBA Pilot Project: Components • Project overview (Phil Hugenholtz, Nikos Kyrpides, Jonathan Eisen, Eddy Rubin, Jim Bristow) • Project management (David Bruce, Eileen Dalin, Lynne Goodwin) • Culture collection and DNA prep (DSMZ, Hans-Peter Klenk) • Sequencing and closure (Eileen Dalin, Susan Lucas, Alla Lapidus, Mat Nolan, Alex Copeland, Cliff Han, Feng Chen, Jan-Fang Cheng) • Annotation and data release (Nikos Kyrpides, Victor Markowitz, et al) • Analysis (Dongying Wu, Kostas Mavrommatis, Martin Wu, Victor Kunin, Neil Rawlings, Ian Paulsen, Patrick Chain, Patrik D’Haeseleer, Sean Hooper, Iain Anderson, Amrita Pati, Natalia N. Ivanova, Athanasios Lykidis, Adam Zemla) • Adopt a microbe education project (Cheryl Kerfeld) • Outreach (David Gilbert) • $$$ (DOE, Eddy Rubin, Jim Bristow)Tuesday, March 8, 2011
  50. 50. rRNA Tree of Life FIgure from Barton, Eisen et al. “Evolution”, CSHL Press. Based on tree from Pace NR, 2003.Tuesday, March 8, 2011
  51. 51. Tuesday, March 8, 2011
  52. 52. Tuesday, March 8, 2011
  53. 53. Tuesday, March 8, 2011
  54. 54. B: Ac in t ob ac te B: ria # of Genomes Am (HTuesday, March 8, 2011 in igh 10 15 20 25 30 35 0 5 an G a C B: B: er ) Ba Aq ob ct uif ia B: ero ica B: e D Ch ide B: e ef lo te r s D rri ofl ef ba e B: e c xi B: De B rrib ter Ep lta : D act es si Pr ei er lo o n es n te oc Pr ob oc ot a ci B: e ct G B: oba eri am B F ct a : ir e B: m Fu mi ria a G P so cut em ro ba e t c s B: ma eo te ba ri H tim c a a t B: loa ona eri a B: Pl nae de an r te Th c o s Phyla er B: to bia m S m le y s B: od piro ce es c te T u h B: he lfo ae s rm b te GEBA Pilot Target List Th o a s er de cte m s ri u a A: ove lfo H n bi A: alo abu a A: A b la M rc ac e A: et ha te M han eo ria et g ha ob lob ac i A: no te m r A: The icr ia Th rm obi er oc a m oc op ci ro te i
  55. GEBA Pilot Project Overview • Identify major branches in rRNA tree for which no genomes are available • Identify those with a cultured representative in DSMZ • DSMZ grew > 200 of these and prepped DNA • Sequence and finish 200+ • Annotate, analyze, release data • Assess benefits of tree-guided sequencing • 1st paper: Wu et al., Nature, Dec 2009
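As a rough illustration of tree-guided target selection, the sketch below ranks candidate organisms by how much unsampled branch length each would add to a tree. It is not the procedure used by the GEBA project: the toy tree, the branch lengths, and the names `path_to_root` and `rank_candidates` are all assumptions made for this example.

```python
# Hypothetical sketch: rank candidate genomes by the new branch length
# (phylogenetic diversity) each one would add to an already-sampled tree.

def path_to_root(parent, node):
    """Return the nodes from `node` up to, but not including, the root."""
    path = []
    while parent[node] is not None:
        path.append(node)
        node = parent[node]
    return path


def rank_candidates(parent, length, sequenced, candidates):
    """Greedily order candidates by uncovered branch length.

    parent:     node -> parent node (the root maps to None)
    length:     node -> length of the branch joining node to its parent
    sequenced:  leaves that already have genomes
    candidates: leaves that could be sequenced (e.g. available in culture)
    """
    # Branches already covered by existing genomes.
    covered = set()
    for leaf in sequenced:
        covered.update(path_to_root(parent, leaf))

    remaining = set(candidates)
    ranking = []
    while remaining:
        # Gain = total length of branches not yet covered on the way to the root.
        gains = {
            c: sum(length[n] for n in path_to_root(parent, c) if n not in covered)
            for c in remaining
        }
        # sorted() makes tie-breaking deterministic (alphabetical).
        best = max(sorted(gains), key=gains.get)
        ranking.append((best, gains[best]))
        covered.update(path_to_root(parent, best))
        remaining.remove(best)
    return ranking


# Toy tree (assumed): root -> A -> {a1, a2}, root -> B -> {b1, b2}
parent = {"root": None, "A": "root", "B": "root",
          "a1": "A", "a2": "A", "b1": "B", "b2": "B"}
length = {"A": 1.0, "B": 2.0, "a1": 0.2, "a2": 0.3, "b1": 0.5, "b2": 0.4}

ranking = rank_candidates(parent, length, sequenced=["a1"],
                          candidates=["a2", "b1", "b2"])
print(ranking)
```

In this toy case the candidates on the unsampled branch B rank ahead of the candidate that sits next to an already sequenced genome, which is the intuition behind sequencing from under-sampled parts of the tree.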
  56. GEBA Phylogenomic Lesson 1: The rRNA Tree of Life is a Useful Tool for Identifying Phylogenetically Novel Genomes
  57. rRNA Tree of Life: Bacteria, Archaea, Eukaryotes. Figure from Barton, Eisen et al. “Evolution”, CSHL Press. 2007. Based on tree from Pace 1997 Science 276:734-740
  58. The Core Gets Small ...
  59. The Pangenome
  60. Islands Among Synteny
  61. The Pangenome
  62. Network of Life: Bacteria, Archaea, Eukaryotes. Figure from Barton, Eisen et al. “Evolution”, CSHL Press. Based on tree from Pace NR, 2003.
  63. Using the Core
  64. Whole genome tree built using AMPHORA by Martin Wu and Dongying Wu
  66. Four Models for Rooting TOL from Lake et al. doi: 10.1098/rstb.2009.0035
  67. GEBA Phylogenomic Lesson 2: The rRNA tree is good but not perfect, and better genomic sampling improves phylogenetic inference
  68. 16S says Hyphomonas is in Rhodobacterales (Badger et al. 2005)
  69. WGT and individual gene trees: it's related to Caulobacterales (Badger et al. 2005)
  70. 16S vs. WGT, 23S. Badger et al. 2005 Int J System Evol Microbiol 55: 1021-1026.
  71. Caveats: ignoring LGT and using concatenated alignments
  72. Concatenated Alignment ML Tree
  73. Green Non-Sulfur Bacteria
  74. Chlamydia-Verrucomicrobia
  75. Proteobacteria
  76. Zimmer. New York Times. 2009
  77. GEBA Phylogenomic Lesson 3: Phylogenetics-guided genome selection (and phylogenetics in general) improves genome annotation
  78. Predicting Function • Key step in genome projects • More accurate predictions help guide experimental and computational analyses • Many diverse approaches • All are improved by “phylogenomic” analyses that integrate evolutionary reconstructions with an understanding of how new functions evolve
  79. From Eisen et al. 1997 Nature Medicine 3: 1076-1078.
  80. Blast Search of H. pylori “MutS” • Blast search pulls up Syn. sp MutS#2 with a much more significant p value than other MutS homologs • Based on this, TIGR predicted this species had mismatch repair • Assumes functional constancy. Based on Eisen et al. 1997 Nature Medicine 3: 1076-1078.
  81. Predicting Function • Identification of motifs – Short regions of sequence similarity that are indicative of general activity – e.g., ATP binding • Homology/similarity based methods – Gene sequence is searched against a database of other sequences – If significantly similar genes are found, their functional information is used • Problem – Genes frequently have similarity to hundreds of motifs and multiple genes, not all with the same function
  82. MutL?? From http://asajj.roswellpark.org/huberman/dna_repair/mmr.html
  83. Phylogenetic Tree of MutS Family [Figure: tree of MutS homologs, tips labeled with species abbreviations such as Aquae, Strpy, Bacsu, Synsp, Deira, Helpy, Borbu, Metth, Celeg, Yeast, Human, Mouse, Arath, Spombe, Ecoli, Neigo] Based on Eisen, 1998 Nucl Acids Res 26: 4291-4300.
  84. MutS Subfamilies [Figure: the same tree with subfamilies labeled: MutS1, MutS2, MSH1, MSH2, MSH3, MSH4, MSH5, MSH6] Based on Eisen, 1998 Nucl Acids Res 26: 4291-4300.
  85. Overlaying Functions onto Tree [Figure: the same tree with subfamily labels MutS1, MutS2, MSH1, MSH2, MSH3, MSH4, MSH5, MSH6] Based on Eisen, 1998 Nucl Acids Res 26: 4291-4300.
  86. Functional Prediction Using Tree • MutS1: Bacterial Mismatch and Loop Repair • MutS2: Unknown Functions • MSH1: Mitochondrial Repair • MSH2: Eukaryotic Nuclear Mismatch and Loop Repair • MSH3: Nuclear Repair of Loops • MSH4: Meiotic Crossing Over • MSH5: Meiotic Crossing Over • MSH6: Nuclear Repair of Mismatches. Based on Eisen, 1998 Nucl Acids Res 26: 4291-4300.
  88. Phylogenetic Prediction of Gene Function [Figure: flowchart of the method, illustrated side by side with Example A (genes 1A, 2A, 3A, 1B, 2B, 3B from Species 1, 2, 3) and Example B (genes 1 to 6). Steps: choose gene(s) of interest • identify homologs • align sequences • calculate gene tree • overlay known functions onto tree • infer likely function of gene(s) of interest. Panels carry “Duplication?” and “Ambiguous” annotations; a final panel shows the actual evolution, assumed to be unknown.] Based on Eisen, 1998 Genome Res 8: 163-167.
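The last two steps of the method on this slide (overlay known functions onto the tree, then infer the function of the gene of interest) can be sketched roughly as follows. This is a simplified stand-in, not the published procedure: the tree is assumed to be already computed and is given as nested tuples, and the rule used (take the smallest clade around the query that contains characterized genes, and call the result ambiguous if they disagree) is an assumption chosen for illustration. The function names and the labels "function X" and "function Y" are hypothetical.

```python
# Hypothetical sketch: infer a gene's function from its position in a gene tree.
# A tree is a nested tuple of leaf names, e.g. (("1A", "2A"), "3A").

def leaves(tree):
    """Return all leaf names under `tree`."""
    if isinstance(tree, str):
        return [tree]
    out = []
    for child in tree:
        out.extend(leaves(child))
    return out


def predict_function(tree, known, query):
    """Predict the function of `query` from characterized relatives.

    known: leaf name -> experimentally known function.
    Returns a function label, or None if the evidence is ambiguous or absent.
    Assumes `query` is a leaf of `tree`.
    """
    # Collect every clade that contains the query, from the root downwards.
    clades = []
    node = tree
    while not isinstance(node, str):
        clades.append(node)
        node = next(child for child in node if query in leaves(child))

    # Walk back up from the smallest clade to the largest.
    for clade in reversed(clades):
        labels = {known[leaf] for leaf in leaves(clade)
                  if leaf in known and leaf != query}
        if len(labels) == 1:
            return labels.pop()      # all characterized relatives agree
        if len(labels) > 1:
            return None              # conflicting functions: ambiguous
    return None                      # no characterized relatives at all


# Toy gene tree with two subfamilies (A and B) that arose by duplication.
tree = ((("1A", "2A"), "3A"), (("1B", "2B"), "3B"))
known = {"1A": "function X", "3A": "function X",
         "1B": "function Y", "2B": "function Y"}

print(predict_function(tree, known, "2A"))
print(predict_function(tree, known, "3B"))
```

Unlike a best-hit search, the prediction here depends on which subfamily the query falls into, so a highly similar gene from the other subfamily does not decide the annotation.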
  89. Phylogenetic Prediction of Function • Termed phylogenomics (Eisen et al. 1997) • Greatly improves accuracy of functional predictions compared to similarity based methods (e.g., BLAST) • Automated methods now available – Sean Eddy, Steven Brenner, Kimmen Sjölander, etc. • But …
  90. Example 2: Recent Changes • Phylogenomic functional prediction may not work well for very newly evolved functions • Can use understanding of origin of novelty to better interpret these cases? • Screen genomes for genes that have changed recently – Pseudogenes and gene loss – Contingency Loci – Acquisition (e.g., LGT) – Unusual dS/dN ratios – Rapid evolutionary rates – Recent duplications [Figure: NJ tree of homologous genes, including many from V. cholerae, plus B. subtilis, Synechocystis sp., H. pylori, C. jejuni, A. fulgidus, E. coli, T. maritima, T. pallidum, B. burgdorferi, P. abyssi, P. horikoshii, D. radiodurans and M. tuberculosis]
  91. Example 3: Non-homology methods • Many genes have homologs in other species but no homologs have ever been studied experimentally • Non-homology methods can make functional predictions for these • Example: phylogenetic profiling
  92. Phylogenetic profiling basis • Microbial genes are lost rapidly when not maintained by selection • Genes can be acquired by lateral transfer • Gain and loss frequently occur for entire pathways/processes • Thus might be able to use correlated presence/absence information to identify genes with similar functions
  93. Non-Homology Predictions: Phylogenetic Profiling • Step 1: Search all genes in organisms of interest against all other genomes • Ask: Yes or No, is each gene found in each other species? • Cluster genes by distribution patterns (profiles)
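The profiling steps above can be sketched in a few lines. This is a minimal illustration under stated assumptions: the homology search is taken as already done, so the input is simply a table of which genomes contain a homolog of each gene; genes are grouped only when their profiles match exactly; and the gene names, genome names, and function names are made up for the example.

```python
# Hypothetical sketch of phylogenetic profiling: turn presence/absence calls
# into binary profiles and group genes that share a profile.
from collections import defaultdict


def build_profiles(presence, genomes):
    """Map each gene to a tuple of 0/1 values, one per genome in `genomes`.

    presence: gene -> set of genomes in which a homolog was found.
    """
    return {gene: tuple(int(g in found) for g in genomes)
            for gene, found in presence.items()}


def cluster_by_profile(profiles):
    """Group genes with identical profiles; singletons are dropped."""
    groups = defaultdict(list)
    for gene, profile in profiles.items():
        groups[profile].append(gene)
    return [sorted(genes) for genes in groups.values() if len(genes) > 1]


def hamming(p, q):
    """Number of genomes in which two profiles disagree."""
    return sum(a != b for a, b in zip(p, q))


# Toy input (assumed): results of searching four genes against four genomes.
genomes = ["G1", "G2", "G3", "G4"]
presence = {
    "geneA": {"G1", "G3"},
    "geneB": {"G1", "G3"},
    "geneC": {"G2"},
    "geneD": {"G1", "G2", "G3", "G4"},
}

profiles = build_profiles(presence, genomes)
print(cluster_by_profile(profiles))
print(hamming(profiles["geneA"], profiles["geneC"]))
```

Exact matching is the simplest possible clustering rule; in practice a distance such as the Hamming count shown here would be used to group profiles that are similar but not identical.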
  94. Carboxydothermus hydrogenoformans • Isolated from a Russian hot spring • Thermophile (grows at 80°C) • Anaerobic • Grows very efficiently on CO (Carbon Monoxide) • Produces hydrogen gas • Low GC Gram positive (Firmicute) • Genome Determined (Wu et al. 2005 PLoS Genetics 1: e65.)
  95. Homologs of Sporulation Genes. Wu et al. 2005 PLoS Genetics 1: e65.
  96. Carboxydothermus sporulates. Wu et al. 2005 PLoS Genetics 1: e65.
  97. Wu et al. 2005 PLoS Genetics 1: e65.
  98. PG Profiling Works Better Using Orthology
  99. GEBA Lesson 3: Phylogeny-driven genome selection (and phylogenetics) improves genome annotation • Took 56 GEBA genomes and compared results vs. 56 randomly sampled new genomes • Better definition of protein family sequence “patterns” • Greatly improves “comparative” and “evolutionary” based predictions • Conversion of hypotheticals into conserved hypotheticals • Linking distantly related members of protein families • Improved non-homology prediction
  100. GEBA Lesson 4: Metadata Important
  101. GEBA Phylogenomic Lesson 5: Phylogeny-driven genome selection helps discover new genetic diversity
  102. Network of Life: Bacteria, Archaea, Eukaryotes. Figure from Barton, Eisen et al. “Evolution”, CSHL Press. Based on tree from Pace NR, 2003.
  103. Protein Family Rarefaction • Take data set of multiple complete genomes • Identify all protein families using MCL • Plot # of genomes vs. # of protein families
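A rarefaction curve of this kind can be computed roughly as below. The sketch assumes the clustering step is already done, so each genome is represented only by the set of protein family identifiers it contains (MCL itself is not run here). Averaging over random genome orders, the function name `rarefaction_curve`, and the toy data are assumptions for illustration.

```python
# Hypothetical sketch: protein family rarefaction (families found vs. genomes sampled).
import random


def rarefaction_curve(genome_families, n_shuffles=100, seed=0):
    """Average number of distinct families after sampling 1, 2, ... genomes.

    genome_families: genome name -> set of protein family identifiers.
    The genome order is shuffled `n_shuffles` times and the cumulative
    family counts are averaged, so the curve does not depend on input order.
    """
    rng = random.Random(seed)          # seeded for reproducibility
    names = sorted(genome_families)
    totals = [0] * len(names)
    for _ in range(n_shuffles):
        order = names[:]
        rng.shuffle(order)
        seen = set()
        for i, name in enumerate(order):
            seen |= genome_families[name]   # add this genome's families
            totals[i] += len(seen)
    return [t / n_shuffles for t in totals]


# Toy data (assumed): three genomes and the families each one contains.
example = {
    "genome1": {"fam1", "fam2"},
    "genome2": {"fam2", "fam3"},
    "genome3": {"fam4"},
}

curve = rarefaction_curve(example, n_shuffles=50, seed=1)
for n_genomes, n_families in enumerate(curve, start=1):
    print(n_genomes, n_families)
```

Plotting the list index (number of genomes) against the values gives the curve; a curve that keeps climbing indicates that additional genomes are still contributing new protein families.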
  104-108. Wu et al. 2009 Nature 462, 1056-1060
  109. Synapomorphies exist. Wu et al. 2009 Nature 462, 1056-1060
  110. Families/PD not uniform
  111. Structural Novelty • Of the 17,000 protein families in the GEBA56, 1,800 are novel in sequence (Wu) • Structural modeling suggests many are structurally novel too (D’Haeseleer) • 372 being crystallized by the PSI (Kerfeld)
  112. GEBA Phylogenomic Lesson 6: Improves analysis of genome data from uncultured organisms
  113. Great Plate Count Anomaly: Culturing Count, Microscope Count
  114. Great Plate Count Anomaly: Culturing Count <<<< Microscope Count
  115. Environmental DNA Analysis: DNA • Culturing Count <<<< Microscope Count
