Phylogenomics and the Diversity and Diversification of Microbes. Jonathan A. Eisen, UC Davis. UC Davis Talk, February 11, 2011
GEBA Pilot Project Overview • Identify major branches in rRNA tree for which no genomes are available • Identify those with...
GEBA Phylogenomic Lesson 1: The rRNA Tree of Life is a Useful Tool for Identifying Phylogenetically Novel Genomes
Compare PD in Trees. From Wu et al. 2009 Nature 462, 1056-1060
GEBA Phylogenomic Lesson 2: The rRNA Tree of Life is not perfect ...
16S Says Hyphomonas is in Rhodobacterales. Badger et al. 2005 Int J System Evol Microbiol 55: 1021-1026.
WGT and individual gene trees: It's Related to Caulobacterales. Badger et al. 2005 Int J System Evol Microbiol 55...
GEBA Phylogenomic Lesson 3: Phylogeny-driven genome selection helps discover new genetic diversity
Network of Life: Bacteria, Archaea, Eukaryotes. Figure from Barton, Eisen et al. ...
Protein Family Rarefaction Curves • Take data set of multiple complete genomes • Identify all protein families using MCL • Pl...
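A rarefaction curve of this kind can be sketched in Python. The slide text is cut off after the second step, so the plotting step here is an assumption: the sketch averages, over random genome orderings, the number of distinct protein families seen after adding each genome. The genome names and family IDs are invented, and the MCL clustering itself is not performed; family assignments are taken as given.

```python
import random

def rarefaction_curve(genome_families, n_permutations=100, seed=0):
    """Mean number of distinct families seen after sampling 1..N genomes.

    genome_families maps a genome name to the set of protein family IDs
    found in that genome (assumed to come from a prior clustering step).
    """
    rng = random.Random(seed)
    genomes = list(genome_families.values())
    totals = [0] * len(genomes)
    for _ in range(n_permutations):
        order = genomes[:]
        rng.shuffle(order)          # random order in which genomes are added
        seen = set()
        for i, families in enumerate(order):
            seen |= families        # accumulate families discovered so far
            totals[i] += len(seen)
    return [t / n_permutations for t in totals]

# Hypothetical input: three genomes with overlapping family content.
example = {
    "genome_1": {1, 2, 3},
    "genome_2": {2, 3, 4},
    "genome_3": {5},
}
curve = rarefaction_curve(example)
print(curve)
```

A curve that keeps climbing as genomes are added suggests that further sampling would still uncover new families; a curve that flattens suggests saturation.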
Wu et al. 2009 Nature 462, 1056-1060
Wu et al. 2009 Nature 462, 1056-1060
Wu et al. 2009 Nature 462, 1056-1060
Wu et al. 2009 Nature 462, 1056-1060
Wu et al. 2009 Nature 462, 1056-1060
Synapomorphies exist. Wu et al. 2009 Nature 462, 1056-1060
Phylogenetic Distribution Novelty: Bacterial Actin Related Protein ...
GEBA Phylogenomic Lesson 4: Phylogeny driven genome selection (and phylogenetics in general) improves genome annotation
Most/All Functional Prediction Improves w/ Better Phylogenetic Sampling • Took 56 GEBA genomes and compared results ...
GEBA Phylogenomic Lesson 5: Improves analysis of genome data from uncultured organisms
[Figure: Weighted % of Clones ...]
[Figure: Weighted % of Clones ...]
Talk by Jonathan Eisen for seminar series / class at UC Davis.

Published in: Education, Technology
  • It has been less than 10 years since the first genome was determined
  • Functional prediction using a gene tree is just like predicting the biology of a species using a species tree
  • Extension of rRNA analysis to uncultured organisms using PCR
  • This is a tree of an rRNA gene that was found on a large DNA fragment isolated from Monterey Bay. This rRNA gene groups in a tree with genes from members of the gamma Proteobacteria, a group that includes E. coli as well as many environmental bacteria. This rRNA phylotype has been found to be a dominant species in many ocean ecosystems.
  • Phylogenetic analysis of rRNAs led to the discovery of archaea
  • ... clone from the Sargasso Sea. This shows that this ...

    1. 1. Phylogenomics and the Diversity and Diversification of Microbes Jonathan A. Eisen UC Davis UC Davis Talk February 11, 2011
    2. 2. Phylogenomics of Novelty • Mechanisms of Origin of New Functions • Variation in Mechanisms: Patterns, Causes and Effects • Species Evolution
    3. 3. Why do this? • Discover causes and effects of differences in evolvability • Improve predictions from genome analysis • Guide interpretation of biological data
    4. 4. Outline • Introduction • Phylogenomic Stories – Within genome invention of novelty – Stealing novelty – Communities of microbes – Community service and knowing what we don’t know
    5. 5. Introduction: Genome Sequencing
    6. 6. rRNA Tree of Life. Figure from Barton, Eisen et al. “Evolution”, CSHL Press. Based on tree from Pace NR, 2003.
    7. 7. Limited Sampling of RRR Studies. Figure from Barton, Eisen et al. “Evolution”, CSHL Press. Based on tree from Pace NR, 2003.
    8. 8. Limited Sampling of RRR Studies: Haloferax, Methanococcus, Chlorobium, Deinococcus, Thermotoga. Figure from Barton, Eisen et al. “Evolution”, CSHL Press. Based on tree from Pace NR, 2003.
    9. 9. UV Survival: E. coli vs. H. volcanii [Figure: relative survival (log scale, 1 to 1E-07) versus UV dose (0 to 400 J/m2) for E. coli NR10121 mfd-, E. coli NR10125 mfd+, and H. volcanii WFD11. TIGR]
    10. 10. H. volcanii UV Repair [Figure: x-axis Avg. Mol. Wt. (Base Pairs), 0 to 18000; series: 0 J/m2 t0, 45 J/m2 t0, 45 J/m2 Photoreac., 45 J/m2 Dark 24 Hours]
    11. 11. Fleischmann et al. 1995
    12. 12. Limited Sampling of RRR Studies: Haloferax, Methanococcus, Chlorobium, Deinococcus, Thermotoga. Figure from Barton, Eisen et al. “Evolution”, CSHL Press. Based on tree from Pace NR, 2003.
    13. 13. From http://genomesonline.org
    14. 14. Human commensals
    15. 15. From http://genomesonline.org
    16. 16. Phylogenomics of Novelty I Origin of Functions from Within
    17. 17. Phylogenomics of Novelty • How does novelty originate? • Major categories of processes • From within • De novo invention • Simple substitutions • Duplication and divergence • Domain shuffling • Small & large rearrangements • Regulatory changes • From outside • Lateral gene transfer • Symbioses
    18. 18. Phylogenomics of Novelty • How does novelty originate? • Major categories of processes • From within • De novo invention • Simple substitutions • Duplication and divergence • Domain shuffling • Small & large rearrangements • Regulatory changes • From outside • Lateral gene transfer • Symbioses (callout: Mechanisms of Origin of New Functions)
    19. 19. From Eisen et al. 1997 Nature Medicine 3: 1076-1078.
    20. 20. Blast Search of H. pylori “MutS” • Blast search pulls up Syn. sp MutS#2 with much higher p value than other MutS homologs • Based on this, TIGR predicted this species had mismatch repair • Assumes functional constancy. Based on Eisen et al. 1997 Nature Medicine 3: 1076-1078.
    21. 21. Predicting Function • Identification of motifs – Short regions of sequence similarity that are indicative of general activity – e.g., ATP binding • Homology/similarity based methods – Gene sequence is searched against a database of other sequences – If significantly similar genes are found, their functional information is used • Problem – Genes frequently have similarity to hundreds of motifs and multiple genes, not all with the same function
    22. 22. MutL?? Based on Eisen et al. 1997 Nature Medicine 3: 1076-1078.
    23. 23. Overlaying Functions onto Tree [Figure: gene tree of the MutS family with subfamilies labeled MutS1, MutS2, MSH1, MSH2, MSH3, MSH4, MSH5, and MSH6] Based on Eisen, 1998 Nucl Acids Res 26: 4291-4300.
    24. 24. Evolutionary Functional Prediction [Figure: method flowchart applied to Example A (genes 1 to 6) and Example B (genes 1A, 2A, 3A, 1B, 2B, 3B, with a possible duplication)] Method: Choose gene(s) of interest • Identify homologs • Align sequences • Calculate gene tree • Overlay known functions onto tree • Infer likely function of gene(s) of interest (may be ambiguous). Final panel: actual evolution (assumed to be unknown) across Species 1, Species 2, and Species 3, with a duplication. Based on Eisen, 1998 Genome Res 8: 163-167.
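The last two steps of the method (overlay known functions onto the tree, then infer the likely function) can be illustrated with a small Python sketch. It is a simplified nearest-clade heuristic chosen for illustration, not the procedure from Eisen 1998: it ignores branch lengths and does not detect duplication events. The tree topology, gene labels, and function names are hypothetical, loosely following Example B on the slide.

```python
def leaves(tree):
    """Return all leaf names under a nested-tuple tree."""
    if isinstance(tree, str):
        return [tree]
    out = []
    for child in tree:
        out.extend(leaves(child))
    return out

def clades_containing(tree, query):
    """Return the clades that contain `query`, smallest (the leaf) first."""
    if isinstance(tree, str):
        return [tree] if tree == query else []
    for child in tree:
        path = clades_containing(child, query)
        if path:
            return path + [tree]
    return []

def infer_function(tree, query, known):
    """Assign `query` the function shared by annotated genes in the
    smallest clade that contains it and at least one annotated gene."""
    for clade in clades_containing(tree, query)[1:]:
        functions = {known[g] for g in leaves(clade) if g != query and g in known}
        if len(functions) == 1:
            return functions.pop()
        if len(functions) > 1:
            return "ambiguous"
    return None  # no annotated relatives anywhere in the tree

# Hypothetical gene tree: an A subfamily and a B subfamily.
gene_tree = ((("1A", "2A"), "3A"), (("1B", "2B"), "3B"))
known_functions = {"1A": "function A", "2A": "function A",
                   "1B": "function B", "2B": "function B"}

print(infer_function(gene_tree, "3A", known_functions))
print(infer_function(gene_tree, "3B", known_functions))
```

Because the prediction comes from the closest annotated clade rather than from the single best database hit, genes 3A and 3B receive different functions even though both are homologs of every annotated gene.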
    25. 25. Example 2: Recent Changes • Phylogenomic functional prediction may not work well for very newly evolved functions • Can use understanding of origin of novelty to better interpret these cases? • Screen genomes for genes that have changed recently – Pseudogenes and gene loss – Contingency Loci – Acquisition (e.g., LGT) – Unusual dS/dN ratios – Rapid evolutionary rates – Recent duplications [Figure: NJ tree of homologous genes from V. cholerae, B. subtilis, Synechocystis sp., H. pylori, C. jejuni, A. fulgidus, E. coli, T. maritima, T. pallidum, B. burgdorferi, P. abyssi, P. horikoshii, D. radiodurans, and M. tuberculosis]
    26. 26. Tetrahymena Genome Processing • Probably exists as a defense mechanism • Analogous to RIPPING and heterochromatin silencing • Presence of repetitive DNA in MAC but not TEs suggests the mechanism involves targeting foreign DNA • Thus, unlike RIPPING, ciliate processing does not limit diversification by duplication. Eisen et al. 2006. PLoS Biology.
    27. 27. Phylogenomics of Novelty II: Sometimes, it is easier to steal, borrow, or coopt functions rather than evolve them anew
    28. 28. Stealing DNA
    29. 29. rRNA Tree of Life: Bacteria, Archaea, Eukaryotes. Figure from Barton, Eisen et al. “Evolution”, CSHL Press. Based on tree from Pace NR, 2003.
    30. 30. Perna et al. 2003
    31. 31. Network of Life: Bacteria, Archaea, Eukaryotes. Figure from Barton, Eisen et al. “Evolution”, CSHL Press. Based on tree from Pace NR, 2003.
    32. 32. Correlated gain/loss of genes • Microbial genes are lost rapidly when not maintained by selection • Genes can be acquired by lateral transfer • Gain and loss frequently occur for entire pathways/processes • Thus might be able to use correlated presence/absence information to identify genes with similar functions
    33. 33. Non-Homology Predictions: Phylogenetic Profiling • Step 1: Search all genes in organisms of interest against all other genomes • Ask: Yes or No, is each gene found in each other species? • Cluster genes by distribution patterns (profiles)
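The profiling steps on this slide can be sketched in Python. This is a minimal illustration rather than the pipeline used in the talk: the gene names, genome names, and presence/absence calls are invented, the homology search is assumed to have been run already, and grouping genes with identical profiles is the simplest possible clustering rule (a distance such as the Hamming distance below could be used for looser grouping).

```python
from collections import defaultdict

# Hypothetical search results: for each gene, the genomes with a homolog.
GENOMES = ["genome_A", "genome_B", "genome_C", "genome_D"]
HITS = {
    "geneW": {"genome_A", "genome_C"},
    "geneX": {"genome_A", "genome_C"},
    "geneY": {"genome_B", "genome_D"},
    "geneZ": {"genome_A", "genome_B", "genome_C", "genome_D"},
}

def build_profile(hit_genomes, genomes):
    """Yes/No per genome, encoded as a tuple of 1/0 in a fixed genome order."""
    return tuple(1 if g in hit_genomes else 0 for g in genomes)

def cluster_by_profile(hits, genomes):
    """Group genes that share exactly the same presence/absence profile."""
    clusters = defaultdict(list)
    for gene in sorted(hits):
        clusters[build_profile(hits[gene], genomes)].append(gene)
    return dict(clusters)

def hamming(p, q):
    """Number of genomes at which two profiles disagree."""
    return sum(a != b for a, b in zip(p, q))

clusters = cluster_by_profile(HITS, GENOMES)
for profile, genes in clusters.items():
    print(profile, genes)
```

Genes that land in the same cluster (here geneW and geneX) are candidates for a shared pathway or process, which is the reasoning behind the sporulation-gene analysis in Carboxydothermus that follows.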
    34. 34. Carboxydothermus hydrogenoformans • Isolated from a Russian hot spring • Thermophile (grows at 80°C) • Anaerobic • Grows very efficiently on CO (Carbon Monoxide) • Produces hydrogen gas • Low GC Gram positive (Firmicute) • Genome Determined (Wu et al. 2005 PLoS Genetics 1: e65.)
    35. 35. Homologs of Sporulation Genes Wu et al. 2005 PLoS Genetics 1: e65.
    36. 36. Carboxydothermus sporulates Wu et al. 2005 PLoS Genetics 1: e65.
    37. 37. Wu et al. 2005 PLoS Genetics 1: e65.
    38. 38. Stealing Organisms (Symbioses)
    39. 39. Mutualistic Genome Evolution • Compare and contrast different types of mutualistic symbioses • Diverse hosts, symbionts, biology, ages • Organelles, chemosymbioses, photosynthetic symbioses, nutritional symbioses • What are the rules & patterns?
    40. 40. Glassy Winged Sharpshooter • Feeds on xylem sap • Vector for Pierce’s Disease • Potential bioterror agent
    41. 41. Sharpshooter Shotgun Sequencing (shotgun). Collaboration with Nancy Moran’s lab. Wu et al. 2006 PLoS Biology 4: e188.
    42. 42. Higher Evolutionary Rates in Endosymbionts. Wu et al. 2006 PLoS Biology 4: e188. Collaboration with Nancy Moran’s Lab
    43. 43. Variation in Evolution Rates [Figure: MutS and MutL columns marked + or _ for each lineage] Wu et al. 2006 PLoS Biology 4: e188. Collaboration with Nancy Moran’s Lab
    44. 44. Polymorphisms in Metapopulation • Data from ~200 hosts – 104 SNPs – 2 indels • PCR surveys show that this is between-host variation • Much lower ratio of transitions:transversions than in Blochmannia • Consistent with absence of MMR from Blochmannia
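The transitions:transversions ratio mentioned on this slide can be computed from a list of SNPs as in the following Python sketch. The SNP list is made up for illustration and is not the Baumannia data; the function names are hypothetical.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}
BASES = PURINES | PYRIMIDINES

def classify_snp(ref, alt):
    """Label a substitution as a transition or a transversion."""
    ref, alt = ref.upper(), alt.upper()
    if ref == alt or ref not in BASES or alt not in BASES:
        raise ValueError(f"not a valid substitution: {ref}>{alt}")
    # Transition: purine<->purine or pyrimidine<->pyrimidine.
    same_class = {ref, alt} <= PURINES or {ref, alt} <= PYRIMIDINES
    return "transition" if same_class else "transversion"

def ts_tv_ratio(snps):
    """Ratio of transitions to transversions for (ref, alt) pairs."""
    counts = {"transition": 0, "transversion": 0}
    for ref, alt in snps:
        counts[classify_snp(ref, alt)] += 1
    if counts["transversion"] == 0:
        return float("inf")
    return counts["transition"] / counts["transversion"]

# Invented example: two transitions (A>G, C>T), two transversions (A>C, G>T).
example_snps = [("A", "G"), ("C", "T"), ("A", "C"), ("G", "T")]
print(ts_tv_ratio(example_snps))
```

Comparing this ratio between symbiont lineages is one way to look for the mutational signature of a missing mismatch repair system, as the slide suggests.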
    45. 45. Baumannia is a Vitamin and Cofactor Producing Machine. Wu et al. 2006 PLoS Biology 4: e188.
    46. 46. No Amino-Acid Synthesis
    47. 47. The Uncultured Majority
    48. 48. Great Plate Count Anomaly: Culturing Count vs. Microscope Count
    49. 49. Great Plate Count Anomaly: Culturing Count <<<< Microscope Count
    50. 50. Great Plate Count Anomaly: Culturing Count <<<< Microscope Count; DNA
    51. 51. rRNA PCR: The Hidden Majority. Richness estimates. Hugenholtz 2002; Bohannan and Hughes 2003
    52. 52. rRNA data increasing exponentially too
    53. 53. Perna et al. 2003
    54. 54. Metagenomics shotgun clone
    55. 55. How can we best use metagenomic data? • Many possible uses including: – Improvements on rRNA based phylotyping and species diversity measurements – Adding functional information on top of phylogenetic/species diversity information • Most/all possible uses either require or are improved with phylogenetic analysis
    56. 56. Example I: Phylotyping with rRNA and other genes
    57. 57. Functional Diversity of Proteorhodopsins? Venter et al., 2004
    58. 58. Shotgun Sequencing Allows Use of Other Markers [Figure: Sargasso Phylotypes; y-axis Weighted % of Clones (0 to 0.5); x-axis Major Phylogenetic Group (Alphaproteobacteria, Betaproteobacteria, Gammaproteobacteria, Epsilonproteobacteria, Deltaproteobacteria, Cyanobacteria, Firmicutes, Actinobacteria, Chlorobi, CFB, Chloroflexi, Spirochaetes, Fusobacteria, Deinococcus-Thermus, Euryarchaeota, Crenarchaeota); markers: EFG, EFTu, rRNA, RecA, RpoB, HSP70] Venter et al., Science 304: 66-74. 2004
    59. 59. Example II: Binning
    60. 60. Metagenomics Challenge
    61. 61. Binning challenge [Figure: two columns of labels, A to G and T to Z]
    62. 62. Binning challenge [Figure: two columns of labels, A to G and T to Z] Best binning method: reference genomes
    63. 63. Binning challenge [Figure: two columns of labels, A to G and T to Z] Best binning method: reference genomes
    64. 64. Binning challenge [Figure: two columns of labels, A to G and T to Z] No reference genome? What do you do?
    65. 65. Binning challenge [Figure: two columns of labels, A to G and T to Z] No reference genome? What do you do? Phylogeny ....
    66. 66. No Amino-Acid Synthesis
    67. 67. ???????
    68. 68. CFB Phyla
    69. 69. Sulcia makes amino acids. Baumannia makes vitamins and cofactors. Wu et al. 2006 PLoS Biology 4: e188.
    70. 70. Phylogenomics of Novelty III: Knowing What We Don’t Know
    71. 71. Research Topics • Mechanisms of Origin of New Functions • Variation in Mechanisms: Patterns, Causes and Effects • Species Evolution
    72. 72. Research Topics • Mechanisms of Origin of New Functions • Variation in Mechanisms: Patterns, Causes and Effects • Species Evolution
    73. 73. As of 2002
    74. 74. As of 2002 Proteobacteria TM6 OS-K • At least 40 Acidobacteria Termite Group OP8 phyla of Nitrospira Bacteroides bacteria Chlorobi Fibrobacteres Marine GroupA WS3 Gemmimonas Firmicutes Fusobacteria Actinobacteria OP9 Cyanobacteria Synergistes Deferribacteres Chrysiogenetes NKB19 Verrucomicrobia Chlamydia OP3 Planctomycetes Spriochaetes Coprothmermobacter OP10 Thermomicrobia Chloroflexi TM7 Deinococcus-Thermus Dictyoglomus Aquificae Thermudesulfobacteria Thermotogae OP1 Based on OP11 Hugenholtz, 2002
    75. 75. As of 2002 Proteobacteria TM6 OS-K • At least 40 Acidobacteria Termite Group OP8 phyla of Nitrospira Bacteroides bacteria Chlorobi Fibrobacteres Marine GroupA • Genome WS3 Gemmimonas Firmicutes sequences are Fusobacteria Actinobacteria mostly from OP9 Cyanobacteria Synergistes three phyla Deferribacteres Chrysiogenetes NKB19 Verrucomicrobia Chlamydia OP3 Planctomycetes Spriochaetes Coprothmermobacter OP10 Thermomicrobia Chloroflexi TM7 Deinococcus-Thermus Dictyoglomus Aquificae Thermudesulfobacteria Thermotogae OP1 Based on OP11 Hugenholtz, 2002
    76. 76. As of 2002 Proteobacteria TM6 OS-K • At least 40 Acidobacteria Termite Group OP8 phyla of Nitrospira Bacteroides bacteria Chlorobi Fibrobacteres Marine GroupA • Genome WS3 Gemmimonas Firmicutes sequences are Fusobacteria Actinobacteria mostly from OP9 Cyanobacteria Synergistes three phyla Deferribacteres Chrysiogenetes NKB19 • Some other Verrucomicrobia Chlamydia OP3 phyla are Planctomycetes Spriochaetes only sparsely Coprothmermobacter OP10 Thermomicrobia sampled Chloroflexi TM7 Deinococcus-Thermus Dictyoglomus Aquificae Thermudesulfobacteria Thermotogae OP1 Based on OP11 Hugenholtz, 2002
    77. 77. As of 2002 Proteobacteria TM6 OS-K • At least 40 Acidobacteria Termite Group OP8 phyla of Nitrospira Bacteroides bacteria Chlorobi Fibrobacteres Marine GroupA • Genome WS3 Gemmimonas Firmicutes sequences are Fusobacteria Actinobacteria mostly from OP9 Cyanobacteria Synergistes three phyla Deferribacteres Chrysiogenetes NKB19 • Some other Verrucomicrobia Chlamydia OP3 phyla are Planctomycetes Spriochaetes only sparsely Coprothmermobacter OP10 Thermomicrobia sampled Chloroflexi TM7 Deinococcus-Thermus Dictyoglomus Aquificae Thermudesulfobacteria Thermotogae OP1 Based on OP11 Hugenholtz, 2002
    78. 78. Proteobacteria• NSF-funded TM6 OS-K • At least 40 Tree of Life Acidobacteria Termite Group phyla of OP8 Project Nitrospira Bacteroides bacteria Chlorobi• A genome Fibrobacteres Marine GroupA • Genome WS3 from each of Gemmimonas sequences are Firmicutes eight phyla Fusobacteria mostly from Actinobacteria OP9 Cyanobacteria Synergistes three phyla Deferribacteres Chrysiogenetes NKB19 • Some other Verrucomicrobia Chlamydia OP3 phyla are only Planctomycetes Spriochaetes sparsely Coprothmermobacter OP10 Thermomicrobia sampled Chloroflexi TM7 Deinococcus-Thermus • Solution I: DictyoglomusEisen, Ward, Aquificae Thermudesulfobacteria sequence moreRobb, Nelson, et Thermotogae phyla OP1al OP11
    79. 79. • NSF-funded Tree of Life Project • A genome from each of eight phyla • At least 40 phyla of bacteria • Genome sequences are mostly from three phyla • Some other phyla are only sparsely sampled • Still highly biased in terms of the tree [Figure: tree of bacterial phyla] Eisen & Ward, PIs
    80. 80. • GEBA • A genomic encyclopedia of bacteria and archaea • At least 40 phyla of bacteria • Genome sequences are mostly from three phyla • Some other phyla are only sparsely sampled • Solution: Really Fill in the Tree [Figure: tree of bacterial phyla] Eisen & Ward, PIs
    81. 81. http://www.jgi.doe.gov/programs/GEBA/pilot.html
    82. 82. GEBA Pilot Project: Components • Project overview (Phil Hugenholtz, Nikos Kyrpides, Jonathan Eisen, Eddy Rubin, Jim Bristow) • Project management (David Bruce, Eileen Dalin, Lynne Goodwin) • Culture collection and DNA prep (DSMZ, Hans-Peter Klenk) • Sequencing and closure (Eileen Dalin, Susan Lucas, Alla Lapidus, Mat Nolan, Alex Copeland, Cliff Han, Feng Chen, Jan-Fang Cheng) • Annotation and data release (Nikos Kyrpides, Victor Markowitz, et al.) • Analysis (Dongying Wu, Kostas Mavrommatis, Martin Wu, Victor Kunin, Neil Rawlings, Ian Paulsen, Patrick Chain, Patrik D’Haeseleer, Sean Hooper, Iain Anderson, Amrita Pati, Natalia N. Ivanova, Athanasios Lykidis, Adam Zemla) • Adopt a microbe education project (Cheryl Kerfeld) • Outreach (David Gilbert) • $$$ (DOE, Eddy Rubin, Jim Bristow)
    83. 83. GEBA Pilot Project Overview • Identify major branches in rRNA tree for which no genomes are available • Identify those with a cultured representative in DSMZ • DSMZ grew > 200 of these and prepped DNA • Sequence and finish 100+ (covering breadth of bacterial/archaeal diversity) • Annotate, analyze, release data • Assess benefits of tree-guided sequencing • 1st paper: Wu et al. in Nature, Dec 2009
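The first two selection steps on this slide amount to a filter over lineages. The following is only a minimal sketch of that idea, not the project's actual pipeline: the `Lineage` record, its fields, and the example lineage names are all hypothetical and invented for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Lineage:
    """Hypothetical record for one major branch of the rRNA tree."""
    name: str
    has_genome: bool   # assumption: True if any genome is already available
    has_culture: bool  # assumption: True if a cultured representative exists


def sequencing_candidates(lineages):
    """Return names of lineages that lack a genome but are available in culture."""
    return [lin.name for lin in lineages if not lin.has_genome and lin.has_culture]


# Invented example data, for illustration only.
example = [
    Lineage("lineage_A", has_genome=True, has_culture=True),
    Lineage("lineage_B", has_genome=False, has_culture=True),
    Lineage("lineage_C", has_genome=False, has_culture=False),
]

print(sequencing_candidates(example))
```

Only `lineage_B` passes both conditions here: it has no genome yet and can be grown for DNA prep.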
    84. 84. GEBA Phylogenomic Lesson 1: The rRNA Tree of Life is a Useful Tool for Identifying Phylogenetically Novel Genomes
    85. 85. Compare PD (phylogenetic diversity) in Trees. From Wu et al. 2009 Nature 462, 1056-1060
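For readers unfamiliar with PD, it can be computed as the total branch length of the subtree that connects a set of taxa to the root. The sketch below assumes that rooted definition and a toy tree stored as child-to-parent and child-to-branch-length dictionaries; the tree, the names, and the function `phylogenetic_diversity` are hypothetical and are not taken from Wu et al.

```python
def phylogenetic_diversity(parent, length, taxa):
    """Sum the branch lengths on the paths from each taxon up to the root.

    parent: dict mapping each non-root node to its parent node
    length: dict mapping each non-root node to the length of the branch above it
    taxa:   iterable of tip names to include
    """
    counted = set()  # each branch is identified by the node below it
    for tip in taxa:
        node = tip
        # Stop at the root, or as soon as we reach a branch already counted
        # (everything above it has been counted too).
        while node in parent and node not in counted:
            counted.add(node)
            node = parent[node]
    return sum(length[node] for node in counted)


# Toy rooted tree (invented): root -> A, root -> X, X -> B, X -> C
toy_parent = {"A": "root", "X": "root", "B": "X", "C": "X"}
toy_length = {"A": 1.0, "X": 2.0, "B": 1.0, "C": 3.0}

print(phylogenetic_diversity(toy_parent, toy_length, ["B", "C"]))
print(phylogenetic_diversity(toy_parent, toy_length, ["A", "B"]))
```

Comparing PD for two genome sets drawn from the same tree shows which set covers more of the tree's total branch length.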
    86. 86. GEBA Phylogenomic Lesson 2: The rRNA Tree of Life is not perfect ...
    87. 87. 16S says Hyphomonas is in Rhodobacterales. Badger et al. 2005 Int J Syst Evol Microbiol 55: 1021-1026.
    88. 88. WGT and individual gene trees: it's related to Caulobacterales. Badger et al. 2005 Int J Syst Evol Microbiol 55: 1021-1026.
    89. 89. GEBA Phylogenomic Lesson 3 Phylogeny-driven genome selection helps discover new genetic diversity
    90. 90. Network of Life: Bacteria, Archaea, Eukaryotes. Figure from Barton, Eisen et al. “Evolution”, CSHL Press. Based on tree from Pace NR, 2003.
    91. 91. Protein Family Rarefaction Curves • Take data set of multiple complete genomes • Identify all protein families using MCL • Plot # of genomes vs. # of protein families
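The plotting step above can be sketched as follows. This is a minimal illustration, not the analysis code behind the slides: it assumes the protein families have already been assigned (for example by a clustering tool such as MCL) and are given as a set of family IDs per genome. The function name, the averaging over random genome orders, and the toy data are assumptions made for the example.

```python
import random


def rarefaction_curve(genome_families, n_permutations=100, seed=0):
    """Average number of distinct protein families seen after adding 1..N genomes.

    genome_families: dict mapping genome name -> set of protein family IDs
    Returns a list whose i-th entry is the mean family count for i + 1 genomes,
    averaged over random orderings of the genomes.
    """
    rng = random.Random(seed)
    genomes = list(genome_families)
    totals = [0.0] * len(genomes)
    for _ in range(n_permutations):
        rng.shuffle(genomes)
        seen = set()
        for i, genome in enumerate(genomes):
            seen |= genome_families[genome]  # accumulate families found so far
            totals[i] += len(seen)
    return [total / n_permutations for total in totals]


# Invented toy data: three genomes sharing some families.
toy_genomes = {
    "g1": {"fam1", "fam2"},
    "g2": {"fam2", "fam3"},
    "g3": {"fam1", "fam2", "fam3"},
}

curve = rarefaction_curve(toy_genomes)
for n_genomes, n_families in enumerate(curve, start=1):
    print(n_genomes, n_families)
```

A curve that keeps rising as genomes are added indicates that new genomes are still contributing protein families not seen before.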
    92. 92. Wu et al. 2009 Nature 462, 1056-1060
    93. 93. Wu et al. 2009 Nature 462, 1056-1060
    94. 94. Wu et al. 2009 Nature 462, 1056-1060
    95. 95. Wu et al. 2009 Nature 462, 1056-1060
    96. 96. Wu et al. 2009 Nature 462, 1056-1060
    97. 97. Synapomorphies exist. Wu et al. 2009 Nature 462, 1056-1060
    99. 99. Phylogenetic Distribution Novelty: Bacterial Actin Related Protein [Figure: phylogenetic tree of actin-related proteins; tip labels and support values not recoverable] Haliangium ochraceum DSM 14365. Patrik D’haeseleer, Adam Zemla, Victor Kunin. Wu et al. 2009 Nature 462, 1056-1060. See also Guljamow et al. 2007 Current Biology.
    100. 100. GEBA Phylogenomic Lesson 4: Phylogeny-driven genome selection (and phylogenetics in general) improves genome annotation
    101. 101. Most/All Functional Prediction Improves w/ Better Phylogenetic Sampling • Took 56 GEBA genomes and compared results vs. 56 randomly sampled new genomes • Better definition of protein family sequence “patterns” • Greatly improves “comparative” and “evolutionary” based predictions • Conversion of hypotheticals into conserved hypotheticals • Linking distantly related members of protein families • Improved non-homology prediction. Kostas Mavrommatis, Natalia Ivanova, Thanos Lykidis, Nikos Kyrpides, Iain Anderson
    102. 102. GEBA Phylogenomic Lesson 5 Improves analysis of genome data from uncultured organisms
    103. 103. Shotgun Sequencing Allows Use of Other Markers [Figure: bar chart of Sargasso phylotypes. X-axis: major phylogenetic group (Alphaproteobacteria, Betaproteobacteria, Gammaproteobacteria, Epsilonproteobacteria, Deltaproteobacteria, Cyanobacteria, Firmicutes, Actinobacteria, Chlorobi, CFB, Chloroflexi, Spirochaetes, Fusobacteria, Deinococcus-Thermus, Euryarchaeota, Crenarchaeota). Y-axis: weighted % of clones, 0 to 0.5. Series: EFG, EFTu, rRNA, RecA, RpoB, HSP70] Venter et al., Science 304: 66-74. 2004
    104. 104. Shotgun Sequencing Allows Use of Other Markers • Cannot be done without good sampling of genomes [Figure: bar chart of Sargasso phylotypes. X-axis: major phylogenetic group. Y-axis: weighted % of clones, 0 to 0.5. Series: EFG, EFTu, rRNA, RecA, RpoB, HSP70] Venter et al., Science 304: 66-74. 2004
