Jonathan Eisen talk "Phylogenomic approaches to functional prediction" at #AFP2012 #ISMB

  1. Phylogenomic Approaches to Functional Prediction. Automated Function Prediction SIG, ISMB 2012. July 13, 2012. Jonathan A. Eisen, University of California, Davis. @phylogenomics. Saturday, July 14, 12
  5. Acknowledgements
     • $$$: DOE, NSF, GBMF, Sloan, DARPA, DSMZ, DHS
     • People, places
       • DOE JGI: Eddy Rubin, Phil Hugenholtz, Nikos Kyrpides
       • UC Davis: Aaron Darling, Dongying Wu, Holly Bik, Russell Neches, Jenna Morgan-Lang
       • Other: Jessica Green, Katie Pollard, Martin Wu, Tom Slezak, Jack Gilbert, Steven Kembel, J. Craig Venter, Naomi Ward, Hans-Peter Klenk, Phil Hanawalt
  6. Phylogenomics of Novelty (diagram labels: Variation in Mechanisms: Patterns, Causes and Effects; Mechanisms of Origin of New Functions; Species Evolution)
  7. Origin of Novelty
     • How does novelty originate?
     • What are the constraints on evolvability?
     • What leads to variation in evolvability within the genome and within and between species?
     • This information helps interpret the past, understand the present and (maybe) predict the future
  8. History
  9. Whatever the History: Trying to Incorporate it is Critical (from Lake et al. doi: 10.1098/rstb.2009.0035)
  10. PAFP AFP SIG ISMB 2012 I: Predicting Functions with Evolutionary Trees
  11. SNF2 Family of Proteins (1995)
     • SNF2 family defined by presence of conserved DNA-dependent ATPase domain (Bork and Koonin 1993)
     • 100s of proteins
     • Diversity of functions:
       • transcriptional activation (SNF2)
       • transcriptional repression (MOT1)
       • recombination (RAD54)
       • transcription-coupled repair (CSB)
       • post-replication repair (RAD5)
       • chromosome segregation (lodestar)
     • Many with unknown functions
     • Some species have 15+ representatives
  12. SNF2 Alignment (figure: proteins grouped by subfamily, with helicase motifs I, Ia, Ib, II, III, IV, V and VI marked; scale 0 to 500 aa)
     • Labels shown: BRM, hBRM, hBRG1, mBRG1, STH1, SNF2, YB95, F37A4, ISWI, SNF2L, CHD1, SYGP, ETL1, FUN30, MOT1, ERCC6, RAD26, YB53, RAD54, DNRPPX, hNUCP, mNUCP, RAD5, spRAD8, RAD16, HIP116, LODE, NPH42, HepA, B. cereus ORF
  14. SNF2 Subfamilies
     Subfamily   Function
     SNF2        Transcription activation (Swi/Snf complex)
     SNF2L       Transcription activation (NURF complex)
     CHD1        Chromatin remodelling
     ETL1        Unknown
     MOT1        Transcription repression
     CSB         Transcription-coupled repair
     Rad54       Recombinational repair
     Rad16       Chromatin access for DNA repair
     HepA        Bacterial RNA polymerase subunit
  15. SNF2 Tree and F(x) Prediction
     • Function conserved within but not between subfamilies/orthology groups
     • Therefore, assignment of genes to subfamilies can be used to predict functions of unknowns
     • Grouping into subfamilies helps identify motifs conserved within groups
     • Phylogeny recovers subfamilies better than similarity searches
  16. From Eisen et al. 1997 Nature Medicine 3: 1076-1078.
  17. Blast Search of H. pylori “MutS” (based on Eisen et al. 1997 Nature Medicine 3: 1076-1078)
     • Blast search pulls up Syn. sp MutS#2 with much higher p value than other MutS homologs
     • Based on this TIGR predicted this species had mismatch repair
     • Assumes functional constancy
  18. MutL?? From http://asajj.roswellpark.org/huberman/dna_repair/mmr.html
  19. Phylogenetic Tree of MutS Family (figure, based on Eisen, 1998 Nucl Acids Res 26: 4291-4300). Tip labels: Aquae, Strpy, Bacsu, Synsp, Deira, Helpy, Yeast, Human, Borbu, Metth, Celeg, mSaco, Mouse, Arath, Spombe, Fly, Xenla, Rat, Neucr, Trepa, Chltr, Theaq, Thema, Ecoli, Neigo
  20. MutS Subfamilies (figure: the MutS family tree with clades labelled MutS1, MutS2, MSH1, MSH2, MSH3, MSH4, MSH5 and MSH6; based on Eisen, 1998 Nucl Acids Res 26: 4291-4300)
  21. Overlaying Functions onto Tree (figure: the MutS family tree with the same subfamily labels; based on Eisen, 1998 Nucl Acids Res 26: 4291-4300)
  22. MutS Subfamilies
     • MutS1: Bacterial MMR
     • MSH1: Euk - mitochondrial MMR
     • MSH2: Euk - all MMR in nucleus
     • MSH3: Euk - loop MMR in nucleus
     • MSH6: Euk - base:base MMR in nucleus
     • MutS2: Bacterial - function unknown
     • MSH4: Euk - meiotic crossing-over
     • MSH5: Euk - meiotic crossing-over
  23. Functional Prediction Using Tree (figure, based on Eisen, 1998 Nucl Acids Res 26: 4291-4300). Clade labels:
     • MSH5 - Meiotic Crossing Over
     • MutS2 - Unknown Functions
     • MSH6 - Nuclear Repair Of Mismatches
     • MSH4 - Meiotic Crossing Over
     • MSH3 - Nuclear Repair Of Loops
     • MSH2 - Eukaryotic Nuclear Mismatch and Loop Repair
     • MSH1 - Mitochondrial Repair
     • MutS1 - Bacterial Mismatch and Loop Repair
  24. Ancient MutS Duplication
  25. MutS1,2 vs MutL
     Table 3. Presence of MutS Homologs in Complete Genome Sequences
     Species                                     # of MutS Homologs   Which Subfamilies?   MutL Homologs
     Bacteria
     Escherichia coli K12                        1                    MutS1                1
     Haemophilus influenzae Rd KW20              1                    MutS1                1
     Neisseria gonorrhoeae                       1                    MutS1                1
     Helicobacter pylori 26695                   1                    MutS2                -
     Mycoplasma genitalium G-37                  -                    -                    -
     Mycoplasma pneumoniae M129                  -                    -                    -
     Bacillus subtilis 169                       2                    MutS1,MutS2          1
     Streptococcus pyogenes                      2                    MutS1,MutS2          1
     Mycobacterium tuberculosis                  -                    -                    -
     Synechocystis sp. PCC6803                   2                    MutS1,MutS2          1
     Treponema pallidum Nichols                  1                    MutS1                1
     Borrelia burgdorferi B31                    2                    MutS1,MutS2          1
     Aquifex aeolicus                            2                    MutS1,MutS2          1
     Deinococcus radiodurans R1                  2                    MutS1,MutS2          1
     Archaea
     Archaeoglobus fulgidus VC-16, DSM4304       -                    -                    -
     Methanococcus jannaschii DSM 2661           -                    -                    -
     Methanobacterium thermoautotrophicum ΔH     1                    MutS2                -
     Eukaryotes
     Saccharomyces cerevisiae                    6                    MSH1-6               3+
     Homo sapiens                                5                    MSH2-6               3+
  27. PHYLOGENETIC PREDICTION OF GENE FUNCTION (figure, based on Eisen, 1998 Genome Res 8: 163-167). The method is shown step by step for EXAMPLE A (genes 1A, 2A, 3A, 1B, 2B, 3B) and EXAMPLE B (genes 1 to 6):
     • CHOOSE GENE(S) OF INTEREST
     • IDENTIFY HOMOLOGS
     • ALIGN SEQUENCES
     • CALCULATE GENE TREE
     • OVERLAY KNOWN FUNCTIONS ONTO TREE
     • INFER LIKELY FUNCTION OF GENE(S) OF INTEREST
     • ACTUAL EVOLUTION (ASSUMED TO BE UNKNOWN), shown for Species 1, Species 2 and Species 3
     Tree annotations: "Duplication?", "Ambiguous"
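The overlay-and-infer steps of this method can be sketched in a few lines. This is a minimal illustration rather than the published procedure: it assumes the gene tree has already been calculated and split into subfamilies, represents each subfamily as nested tuples, and uses a toy topology with hypothetical gene labels loosely modelled on the MutS example. The function name `predict_function` and all annotations are assumptions made for this sketch.

```python
def leaves(node):
    """Return the gene names under a node (a leaf is a str, a clade is a tuple)."""
    if isinstance(node, str):
        return [node]
    return [name for child in node for name in leaves(child)]


def predict_function(subfamilies, query, known):
    """Overlay known functions onto the subfamily that contains `query`
    and return the inferred function, "ambiguous" or "unknown"."""
    for clade in subfamilies.values():
        members = leaves(clade)
        if query not in members:
            continue
        functions = {known[gene] for gene in members if gene in known}
        if not functions:
            return "unknown"      # no characterized gene in this subfamily
        if len(functions) > 1:
            return "ambiguous"    # characterized members disagree
        return functions.pop()
    raise ValueError(f"{query!r} is not in the tree")


# Toy tree (hypothetical topology): two subfamilies separated by an ancient duplication.
subfamilies = {
    "MutS1": (("Ecoli_MutS", "Bacsu_MutS1"), "Synsp_MutS1"),
    "MutS2": (("Helpy_MutS", "Synsp_MutS2"), "Bacsu_MutS2"),
}
# Experimentally characterized genes (illustrative annotations only).
known = {"Ecoli_MutS": "mismatch repair", "Bacsu_MutS1": "mismatch repair"}

for gene in ("Synsp_MutS1", "Helpy_MutS"):
    print(gene, "->", predict_function(subfamilies, gene, known))
```

Under these assumptions the H. pylori gene is reported as unknown, because no member of its subfamily is characterized, instead of inheriting "mismatch repair" from a similar sequence in a different subfamily.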
  28. Evolutionary Rate Variation (figure: tree with tips labelled 1 to 6)
  29. Functional Diversity of Proteorhodopsins? (Venter et al., Science 304: 66. 2004)
  30. Phylogenetic Challenge: A single tree with everything?
  31. Phylosift/pplacer
  32. rRNA Phylotyping (figure)
     • Workflow: DNA extraction → PCR (makes lots of copies of the rRNA genes in sample) → sequence rRNA genes → sequence alignment = data matrix → phylogenetic tree
     • Example sequences:
       rRNA1 5’...ACACACATAGGTGGAGCTAGCGATCGATCGA... 3’
       rRNA2 5’..TACAGTATAGGTGGAGCTAGCGACGATCGA... 3’
       rRNA3 5’...ACGGCAAAATAGGTGGATTCTAGCGATATAGA... 3’
       rRNA4 5’...ACGGCCCGATAGGTGGATTCTAGCGCCATAGA... 3’
     • Data matrix:
       rRNA1    A C A C A C
       rRNA2    T A C A G T
       rRNA3    C A C T G T
       rRNA4    C A C A G T
       E. coli  A G A C A G
       Humans   T A T A G T
       Yeast    T A C A G T
     • Tree tips: rRNA1, rRNA2, rRNA3, rRNA4, E. coli, Humans, Yeast
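To make the "alignment = data matrix" step concrete, the sketch below computes a simple p-distance (fraction of differing columns) between each sample sequence and each reference, using the six-column matrix as best recoverable from the slide. Picking the closest reference is a deliberately crude stand-in for building a phylogenetic tree, and the names `p_distance` and `nearest_reference` are assumptions for this example, not part of any tool named in the talk.

```python
def p_distance(a, b):
    """Proportion of aligned positions at which two sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x != y for x, y in zip(a, b)) / len(a)


def nearest_reference(seq, references):
    """Name of the reference with the smallest p-distance to `seq`."""
    return min(references, key=lambda name: p_distance(seq, references[name]))


# Rows of the data matrix (alignment columns only).
environmental = {
    "rRNA1": "ACACAC",
    "rRNA2": "TACAGT",
    "rRNA3": "CACTGT",
    "rRNA4": "CACAGT",
}
references = {"E. coli": "AGACAG", "Humans": "TATAGT", "Yeast": "TACAGT"}

nearest = {name: nearest_reference(seq, references)
           for name, seq in environmental.items()}
for name, ref in nearest.items():
    print(f"{name}: closest reference is {ref} "
          f"(p-distance {p_distance(environmental[name], references[ref]):.2f})")
```

A real analysis would feed the full alignment to a tree-building or placement method; the distance calculation only shows how the matrix carries the phylogenetic signal.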
  33. Eisen et al. 2002; Eisen et al. 1992
  34. PAFP AFP SIG ISMB 2012 II: Every gene family is unique ...
  36. Steps in Phylogenomics
     • Create database of genes of interest
     • Presence/absence of homologs in complete genomes
     • Phylogenetic trees of each gene family
     • Infer evolutionary events (gene origin, duplication, loss and transfer)
     • Refine presence/absence (orthologs, paralogs, subfamilies)
     • Functional predictions and functional evolution
     • Analysis of pathways
  37. Photoreactivation/Photolyases
     • All photoreactivation is carried out by enzymes in the photolyase family
     • Two main classes of photolyases (class I and class II) are distantly related to each other and likely the result of an ancient duplication
     • PhrI and PhrII are missing from most species for which complete genomes are available
     • Many cases of functional change (e.g., CPD -> 6-4), and some members are not even involved in DNA repair
     • Many of the eukaryotic proteins appear to be of organellar ancestry
  38. Photoreactivation
     • All known enzymes that perform photoreactivation are part of a single large photolyase gene family
     • Some members of the family do not function as photolyases, but instead work as blue-light receptors
     • If a species does not encode a member of the photolyase gene family, it likely does not have photoreactivation capability
     • If a species encodes a photolyase homolog, one cannot conclude it has photolyase activity
     • Position of photolyase homologs within the photolyase tree helps predict what activities they have
  39. Alkyltransferases
     • All known alkyltransferases are members of a single gene family
     • Found in most but not all species
     • Likely present in LUCA
     • Ada protein in E. coli originated by fusion between an alkyltransferase and a transcription-regulatory domain
     • Gram-positive bacteria have the Ada domain fused to an alkylation glycosylase instead of an alkyltransferase
  40. BER Glycosylases
     • Distribution patterns highly uneven but some glycosylases have been found in all species
     • Some are ancient enzymes, probably present in LUCA (e.g., MutY-Nth), others more recent (e.g., TagI)
     • Many families are distantly related to each other (e.g., Ogg, AlkA, MutY-Nth)
     • Many cases of gene duplication, loss and possibly transfer, especially from organellar genomes to nucleus
     • Orthologs frequently have different specificity
  41. AP Endonucleases
     • All species encode either Nfo or Xth homologs. Some encode both.
     • Only Nfo: mycoplasmas, Aquifex, M. jannaschii, yeast
     • Only Xth: many bacteria, A. fulgidus, humans (so far)
     • Both: E. coli, B. subtilis, M. tuberculosis, M. thermoautotrophicum
     • Both Nfo and Xth are likely ancient
     • Many cases of gene loss of one or the other, but never both
  42. Uracil Glycosylase
     • Many non-homologous proteins have uracil-DNA glycosylase activity (Ung, GAPDH, MUG, cyclin)
     • Therefore, absence of homologs of these genes should not be used to infer likely absence of activity
     • However, presence of homologs of Ung and MUG genes can be used to indicate presence of activity because all homologs of these genes have this activity
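The photoreactivation and uracil glycosylase slides make opposite points about what presence or absence of a homolog allows one to conclude. A minimal sketch of that asymmetry follows; the rule table, the `Rule` type and the `infer_activity` function are hypothetical constructs for illustration, encoding only the two cases stated on the slides.

```python
from typing import NamedTuple


class Rule(NamedTuple):
    families: frozenset        # gene families whose homologs are searched for
    presence_informative: bool # does finding a homolog imply the activity?
    absence_informative: bool  # does finding none imply lack of the activity?


# Encodes only what the slides state; other activities would need their own rules.
RULES = {
    "photoreactivation": Rule(frozenset({"photolyase"}), False, True),
    "uracil-DNA glycosylase": Rule(frozenset({"Ung", "MUG"}), True, False),
}


def infer_activity(activity, genome_families):
    """Return a hedged call for `activity` given the families found in a genome."""
    rule = RULES[activity]
    if rule.families & set(genome_families):
        return "likely present" if rule.presence_informative else "cannot conclude"
    return "likely absent" if rule.absence_informative else "cannot conclude"


print(infer_activity("photoreactivation", {"photolyase"}))   # homolog may be a blue-light receptor
print(infer_activity("photoreactivation", {"Ung"}))          # no photolyase family member
print(infer_activity("uracil-DNA glycosylase", {"MUG"}))     # all Ung/MUG homologs have the activity
print(infer_activity("uracil-DNA glycosylase", set()))       # non-homologous enzymes may exist
```

Keeping the two flags separate per family is the point of the design: a single "homolog present means function present" rule would be wrong in both directions for these two examples.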
  43. Not Open Access
  45. PAFP AFP SIG ISMB 2012 III: When phylogeny is not enough ...
  46. But ...
     • Many powerful and automated similarity based methods for assigning genes to protein families
       • COGs
       • PFAM HMM searches
     • Some limitations of similarity based methods can be overcome by phylogenetic approaches
     • Automated methods now available
       • Sean Eddy
       • Steven Brenner
       • Kimmen Sjölander
     • But …
  47. Example: Recent Changes (figure: NJ tree of homologs from V. cholerae, B. subtilis, Synechocystis sp., H. pylori, C. jejuni, A. fulgidus, E. coli, T. maritima, T. pallidum, B. burgdorferi, P. abyssi, P. horikoshii, D. radiodurans and M. tuberculosis)
     • Phylogenomic functional prediction may not work well for very newly evolved functions
     • Can use understanding of origin of novelty to better interpret these cases?
     • Screen genomes for genes that have changed recently
       • Pseudogenes and gene loss
       • Contingency Loci
       • Acquisition (e.g., LGT)
       • Unusual dS/dN ratios
       • Rapid evolutionary rates
       • Recent duplications
  48. Non-Homology Predictions: Phylogenetic Profiling (Pellegrini et al. 1999. PNAS 96: 4285.)
     • Step 1: Search all genes in organisms of interest against all other genomes
     • Ask: Yes or No, is each gene found in each other species?
     • Cluster genes by distribution patterns (profiles)
  49. Correlated gain/loss of genes
     • Microbial genes are lost rapidly when not maintained by selection
     • Genes can be acquired by lateral transfer
     • Frequently gain and loss occur for entire pathways/processes
     • Thus might be able to use correlated presence/absence information to identify genes with similar functions
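The profiling steps above can be illustrated with a small sketch: turn search results into yes/no profiles across genomes, then group genes whose profiles match. The gene and genome names are invented for the example, and grouping only identical profiles is a simplifying assumption; published methods typically cluster on a similarity measure instead.

```python
from collections import defaultdict


def build_profiles(hits, genomes):
    """Map each gene to a tuple of 1/0 flags, one per genome, in `genomes` order."""
    return {gene: tuple(int(g in found) for g in genomes)
            for gene, found in hits.items()}


def cluster_identical(profiles):
    """Group genes that share exactly the same presence/absence profile."""
    groups = defaultdict(list)
    for gene, profile in profiles.items():
        groups[profile].append(gene)
    return [sorted(genes) for genes in groups.values() if len(genes) > 1]


# Hypothetical search results: gene -> genomes in which a homolog was found.
genomes = ["G1", "G2", "G3", "G4"]
hits = {
    "spoX": {"G1", "G3"},
    "spoY": {"G1", "G3"},
    "rpoZ": {"G1", "G2", "G3", "G4"},
    "hypW": {"G2"},
}

profiles = build_profiles(hits, genomes)
clusters = cluster_identical(profiles)
print(profiles)
print(clusters)
```

Here the two genes that are gained and lost together end up in one cluster, which is the signal the slide proposes to use for linking genes of similar function.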
  50. Carboxydothermus hydrogenoformans (Wu et al. 2005 PLoS Genetics 1: e65.)
     • Isolated from a Russian hot spring
     • Thermophile (grows at 80°C)
     • Anaerobic
     • Grows very efficiently on CO
     • Produces hydrogen gas
     • Low GC Gram + (Firmicute)
     • Genome Determined
  51. Homologs of Sporulation Genes (Wu et al. 2005 PLoS Genetics 1: e65.)
  52. Carboxydothermus sporulates (Wu et al. 2005 PLoS Genetics 1: e65.)
  53. Wu et al. 2005 PLoS Genetics 1: e65.
  54. PG Profiling Works Better with Families
  55. PAFP AFP SIG ISMB 2012 IV: Knowing What You Don’t Know
  56. As of 2002 (figure: tree of bacterial phyla, based on Hugenholtz, 2002. Phyla shown: Proteobacteria, TM6, OS-K, Acidobacteria, Termite Group, OP8, Nitrospira, Bacteroides, Chlorobi, Fibrobacteres, Marine Group A, WS3, Gemmimonas, Firmicutes, Fusobacteria, Actinobacteria, OP9, Cyanobacteria, Synergistes, Deferribacteres, Chrysiogenetes, NKB19, Verrucomicrobia, Chlamydia, OP3, Planctomycetes, Spirochaetes, Coprothermobacter, OP10, Thermomicrobia, Chloroflexi, TM7, Deinococcus-Thermus, Dictyoglomus, Aquificae, Thermodesulfobacteria, Thermotogae, OP1, OP11)
     • At least 40 phyla of bacteria
  57. As of 2002 (tree of bacterial phyla, based on Hugenholtz, 2002)
     • At least 40 phyla of bacteria
     • Most genomes from three phyla
  58. As of 2002 (tree of bacterial phyla, based on Hugenholtz, 2002)
     • At least 40 phyla of bacteria
     • Most genomes from three phyla
     • Some studies in other phyla
  59. As of 2002 (tree of bacterial phyla, based on Hugenholtz, 2002)
     • At least 40 phyla of bacteria
     • Most genomes from three phyla
     • Some other phyla are only sparsely sampled
     • Same trend in Eukaryotes
  60. As of 2002 (tree of bacterial phyla, based on Hugenholtz, 2002)
     • At least 40 phyla of bacteria
     • Most genomes from three phyla
     • Some other phyla are only sparsely sampled
     • Same trend in Viruses
  61. TIGR TOL 2002
  62. GEBA
  63. GEBA Lesson 1: Improves genome annotation
     • Took 56 GEBA genomes and compared results vs. 56 randomly sampled new genomes
     • Better definition of protein family sequence “patterns”
     • Greatly improves “comparative” and “evolutionary” based predictions
     • Conversion of hypothetical into conserved hypotheticals
     • Linking distantly related members of protein families
     • Improved non-homology prediction
  64. GEBA Lesson 2: Metadata Important
  65. GEBA Lesson 3: Improves discovering new genetic diversity
  66. Protein Family Rarefaction
     • Take data set of multiple complete genomes
     • Identify all protein families using MCL
     • Plot # of genomes vs. # of protein families
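A rarefaction curve of this kind can be sketched as follows. The sketch assumes the MCL clustering has already been run and that each genome is summarized as the set of protein-family identifiers it contains; it then averages the cumulative number of distinct families over random genome orderings. The function name `rarefaction_curve`, the permutation count and the toy data are assumptions for illustration, and plotting is left out.

```python
import random


def rarefaction_curve(genome_families, n_permutations=100, seed=0):
    """Mean number of distinct protein families after adding 1..N genomes,
    averaged over random orderings of the genomes."""
    rng = random.Random(seed)
    genomes = list(genome_families)
    totals = [0] * len(genomes)
    for _ in range(n_permutations):
        rng.shuffle(genomes)
        seen = set()
        for i, genome in enumerate(genomes):
            seen |= genome_families[genome]   # families discovered so far
            totals[i] += len(seen)
    return [total / n_permutations for total in totals]


# Toy input: genome -> set of family IDs (as would come from a clustering step).
toy = {
    "genome1": {1, 2, 3},
    "genome2": {2, 3, 4},
    "genome3": {3, 4, 5},
}

curve = rarefaction_curve(toy)
for n, families in enumerate(curve, start=1):
    print(n, "genomes:", round(families, 2), "families")
```

Comparing curves for a phylogenetically selected genome set and a randomly selected one is then a matter of calling the function on each set and plotting both.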
  67. Wu et al. 2009 Nature 462, 1056-1060
  68. Wu et al. 2009 Nature 462, 1056-1060
  69. Wu et al. 2009 Nature 462, 1056-1060
  70. Wu et al. 2009 Nature 462, 1056-1060
  71. Wu et al. 2009 Nature 462, 1056-1060
  72. Synapomorphies exist (Wu et al. 2009 Nature 462, 1056-1060)
  73. Families/PD not uniform (figure; values shown: 31, 6)
  74. Structural Novelty
     • Of the 17000 protein families in the GEBA56, 1800 are novel in sequence (Wu)
     • Structural modeling suggests many are structurally novel too (D'haeseleer)
     • 372 being crystallized by the PSI (Kerfeld)
  75. Needed Reference Tree
  76. GEBA Lesson 4: Much diversity untouched
  77. 77. rRNA Tree of Life FIgure from Barton, Eisen et al. “Evolution”, CSHL Press. Based on tree from Pace NR, 2003.Saturday, July 14, 12
  78. 78. Phylogenetic Diversity:From Wu etal. 2009Nature 462,1056-1060Saturday, July 14, 12
  79. 79. Phylogenetic Diversity withFrom Wu etal. 2009Nature 462,1056-1060Saturday, July 14, 12
  80. Phylogenetic Diversity: Isolates. From Wu et al. 2009 Nature 462, 1056-1060
  81. Haloarchaeal GEBA-like
  82. Phylogenetic Diversity: All. From Wu et al. 2009 Nature 462, 1056-1060
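The phylogenetic diversity (PD) comparisons on these slides rest on a simple quantity: the total branch length of the tree spanned by a chosen set of taxa. The sketch below is an illustration only and is not taken from Wu et al. 2009. It assumes a rooted tree stored as a hypothetical dict from each node to its `(parent, branch_length)` pair, and it counts every branch on the path from each chosen taxon up to the root exactly once (a rooted, Faith-style PD). The function name and toy tree are made up for the example.

```python
def phylogenetic_diversity(parent, taxa):
    """Sum the branch lengths on the paths from the given taxa to the root,
    counting each branch once.

    parent: dict mapping node -> (parent_node, branch_length);
            the root maps to (None, 0.0). This layout is an assumption.
    taxa:   iterable of leaf names to include.
    """
    counted = set()
    total = 0.0
    for taxon in taxa:
        node = taxon
        # Walk toward the root, stopping once we reach an already counted branch.
        while node is not None and node not in counted:
            par, length = parent[node]
            counted.add(node)
            total += length
            node = par
    return total


if __name__ == "__main__":
    # Toy rooted tree (hypothetical): ((A:0.5, B:0.3)X:1.0, C:2.0)root
    tree = {
        "root": (None, 0.0),
        "X": ("root", 1.0),
        "A": ("X", 0.5),
        "B": ("X", 0.3),
        "C": ("root", 2.0),
    }
    sequenced = phylogenetic_diversity(tree, ["A", "B"])
    everything = phylogenetic_diversity(tree, ["A", "B", "C"])
    print(f"fraction of PD covered by sequenced taxa: {sequenced / everything:.2f}")
```

Dividing the PD of the sequenced taxa by the PD of all taxa in the tree gives the kind of coverage fraction that the "Isolates" and "All" comparisons point to.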
  83. Uncultured Lineages: • Get into culture • Enrichment cultures • If abundant in low diversity ecosystems • Flow sorting • Microbeads • Microfluidic sorting • Single cell amplification
  84. GEBA uncultured. Number of SAGs from Candidate Phyla (OD1 / OP1 / OP3 / SAR406) • Site A: Hydrothermal vent: 4 / 1 / - / - • Site B: Gold Mine: 6 / 13 / 2 / - • Site C: Tropical gyres (Mesopelagic): - / - / - / 2 • Site D: Tropical gyres (Photic zone): 1 / - / - / - • Sample collections at 4 additional sites are underway. Phil Hugenholtz
  85. RecA, RpoB in GOS: GOS 1, GOS 2, GOS 3, GOS 4, GOS 5. Wu et al. PLoS One 2011
  86. GEBA Lesson 6: Experimental diversity
  87. As of 2002 • At least 40 phyla of bacteria [Figure: tree of bacterial phyla, based on Hugenholtz, 2002. Labeled lineages: Proteobacteria, TM6, OS-K, Acidobacteria, Termite Group, OP8, Nitrospira, Bacteroides, Chlorobi, Fibrobacteres, Marine Group A, WS3, Gemmimonas, Firmicutes, Fusobacteria, Actinobacteria, OP9, Cyanobacteria, Synergistes, Deferribacteres, Chrysiogenetes, NKB19, Verrucomicrobia, Chlamydia, OP3, Planctomycetes, Spirochaetes, Coprothermobacter, OP10, Thermomicrobia, Chloroflexi, TM7, Deinococcus-Thermus, Dictyoglomus, Aquificae, Thermodesulfobacteria, Thermotogae, OP1, OP11]
  88. As of 2002 • At least 40 phyla of bacteria • Experimental studies are mostly from three phyla [Figure: tree of bacterial phyla, based on Hugenholtz, 2002]
  89. As of 2002 • At least 40 phyla of bacteria • Experimental studies are mostly from three phyla • Some studies in other phyla [Figure: tree of bacterial phyla, based on Hugenholtz, 2002]
  90. As of 2002 • At least 40 phyla of bacteria • Genome sequences are mostly from three phyla • Some other phyla are only sparsely sampled • Same trend in Eukaryotes [Figure: tree of bacterial phyla, based on Hugenholtz, 2002]
  91. As of 2002 • At least 40 phyla of bacteria • Genome sequences are mostly from three phyla • Some other phyla are only sparsely sampled • Same trend in Viruses [Figure: tree of bacterial phyla, based on Hugenholtz, 2002]
  92. Need experimental studies from across the tree too [Figure: tree of bacterial phyla, scale bar 0.1, based on Hugenholtz, 2002]
  93. Adopt a Microbe [Figure: tree of bacterial phyla, scale bar 0.1, based on Hugenholtz, 2002]
  94. Acknowledgements • $$$ • DOE • NSF • GBMF • Sloan • DARPA • DSMZ • DHS • People, places • DOE JGI: Eddy Rubin, Phil Hugenholtz, Nikos Kyrpides • UC Davis: Aaron Darling, Dongying Wu, Holly Bik, Russell Neches, Jenna Morgan-Lang • Other: Jessica Green, Katie Pollard, Martin Wu, Tom Slezak, Jack Gilbert, Steven Kembel, J. Craig Venter, Naomi Ward, Hans-Peter Klenk
