Phylogenomics and the Diversity and Diversification of Microbes         Jonathan A. Eisen            UC Davis            UC...
Example I: Phylotyping with   rRNA and other genes
Functional Diversity of Proteorhodopsins?                                 Venter et al., 2004
Weighted % of Clones                                                                                                      ...
Example II: Binning
Metagenomics Challenge
Binning challengeA                       TB                       UC                       VD                       WE    ...
Binning challengeA                                            TB                                            UC            ...
Binning challengeA                                            TB                                            UC            ...
Binning challengeA                                          TB                                          UC                ...
No Amino-Acid Synthesis
???????
CFB Phyla
Sulcia makes amino acids; Baumannia makes vitamins and cofactors. Wu et al. 2006 PLoS Biology 4: e188.
Phylogenomics of Novelty III  Knowing What We Don’t Know
Research Topics                                        Variation inMechanisms of                                       Mec...
Research Topics                                        Variation inMechanisms of                                       Mec...
As of 2002
As of 2002   Proteobacteria             TM6             OS-K                    • At least 40             Acidobacteria   ...
As of 2002   Proteobacteria             TM6             OS-K                                     • At least 40            ...
As of 2002   Proteobacteria             TM6             OS-K                                     • At least 40            ...
As of 2002   Proteobacteria             TM6             OS-K                                     • At least 40            ...
Need for Tree Guidance Well Established• Common approach within some eukaryotic  groups• Many small projects funded to fill...
Proteobacteria• NSF-funded       TM6                   OS-K                                           • At least 40  Tree ...
Proteobacteria• NSF-funded        TM6                    OS-K                                            • At least 40  Tr...
Proteobacteria• NSF-funded        TM6                    OS-K                                            • At least 40  Tr...
Proteobacteria• NSF-funded        TM6                    OS-K                                            • At least 40  Tr...
Proteobacteria• NSF-funded        TM6                    OS-K                                            • At least 40  Tr...
Proteobacteria• GEBA              TM6                    OS-K                    • At least 40                    Acidobac...
http://www.jgi.doe.gov/programs/GEBA/pilot.html
GEBA Pilot Project: Components• Project overview (Phil Hugenholtz, Nikos Kyrpides, Jonathan  Eisen, Eddy Rubin, Jim Bristo...
GEBA Pilot Project Overview• Identify major branches in rRNA tree for  which no genomes are available• Identify those with...
Network of LifeBacteria                                       Archaea Eukaryotes    Figure from Barton, Eisen et al.      ...
GEBA Lesson 1:          The rRNA Tree of Life is a Useful Tool          for Identifying Phylogenetically NovelFrom Wu et a...
GEBA Lesson 2:           The rRNA Tree of Life is not perfect ...               16s                                       ...
GEBA Lesson 3:  Phylogeny driven genome selection (and phylogenetics) improves genome annotation• Took 56 GEBA genomes and...
GEBA Lesson 4: Metadata Important
GEBA Phylogenomic Lesson 5  Phylogeny-driven genome selection  helps discover new genetic diversity
Phylogenetic Distribution Novelty:                Bacterial Actin Related Protein                                         ...
Network of LifeBacteria                                       Archaea Eukaryotes    FIgure from Barton, Eisen et al.      ...
Protein Family Rarefaction              Curves• Take data set of multiple complete genomes• Identify all protein families ...
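A rarefaction curve of this kind can be sketched as follows. This is a minimal illustration, not the analysis behind the slide: it assumes protein families have already been assigned to each genome, and the function name `rarefaction_curve`, the genome labels, and the family labels are all hypothetical. For each random ordering of the genomes it counts how many distinct families have been seen after adding each genome, then averages over orderings.

```python
import random
from typing import Dict, List, Set


def rarefaction_curve(
    genome_families: Dict[str, Set[str]],
    n_permutations: int = 100,
    seed: int = 0,
) -> List[float]:
    """Mean number of distinct protein families after sampling 1..N genomes.

    genome_families maps a genome name to the set of family identifiers
    found in that genome (family assignment is assumed to be done already).
    """
    rng = random.Random(seed)
    genomes = sorted(genome_families)  # sorted for reproducibility
    totals = [0.0] * len(genomes)
    for _ in range(n_permutations):
        rng.shuffle(genomes)
        seen: Set[str] = set()
        for i, genome in enumerate(genomes):
            seen |= genome_families[genome]
            totals[i] += len(seen)
    return [total / n_permutations for total in totals]


# Hypothetical toy data: three genomes, four families in total.
toy = {
    "g1": {"f1", "f2"},
    "g2": {"f2", "f3"},
    "g3": {"f1", "f2", "f3", "f4"},
}
curve = rarefaction_curve(toy, n_permutations=50)
print(curve)
```

A curve that keeps climbing as genomes are added suggests that further sampling would still uncover new families; a curve that flattens suggests the sampled group is close to saturated.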
Wu et al. 2009 Nature 462, 1056-1060
Wu et al. 2009 Nature 462, 1056-1060
Wu et al. 2009 Nature 462, 1056-1060
Wu et al. 2009 Nature 462, 1056-1060
Wu et al. 2009 Nature 462, 1056-1060
Synapomorphies exist. Wu et al. 2009 Nature 462, 1056-1060
Families/PD not uniform
Structural Novelty• Of the 17000 protein families in the GEBA56, 1800  are novel in sequence (Wu)• Structural modeling sug...
GEBA Phylogenomic Lesson 6  Improves analysis of genome data     from uncultured organisms
Weighted % of Clones                                                                                                      ...
Phylogenomics and the diversification of microbes: JA Eisen at UCSF 2/17/11
Talk by Jonathan A. Eisen at UCSF Mission Bay Feb 17, 2011.

Published in: Education, Technology
  • It has been less than 10 years since the first genome was determined
  • Genome sizes estimated from careful cytospectrophotometry in the 1970s. 180 Mb = Drosophila size. MAC chromosome copy number exception: rDNA at ~9,000 copies per MAC (by quantitative DNA hybridization). Chromosome numbers: MIC: direct microscopic observations (1950s); quantitative measurements in stained pulsed-field gels (1980s).
  • Cbs = chromosome breakage site; IES = internally eliminated segment
  • Functional prediction using a gene tree is just like predicting the biology of a species using a species tree
  • Extension of rRNA analysis to uncultured organisms using PCR
  • This is a tree of an rRNA gene that was found on a large DNA fragment isolated from Monterey Bay. This rRNA gene groups in a tree with genes from members of the gamma Proteobacteria, a group that includes E. coli as well as many environmental bacteria. This rRNA phylotype has been found to be a dominant species in many ocean ecosystems.
  • Phylogenetic analysis of rRNAs led to the discovery of archaea
  • This is a tree of an rRNA gene that was found on a large DNA fragment isolated from Monterey Bay. This rRNA gene groups in a tree with genes from members of the gamma Proteobacteria, a group that includes E. coli as well as many environmental bacteria. This rRNA phylotype has been found to be a dominant species in many ocean ecosystems. clone from the Sargasso Sea. This shows that this

    1. Phylogenomics and the Diversity and Diversification of Microbes. Jonathan A. Eisen, UC Davis. UCSF Talk, February 17, 2011
    2. Phylogenomics of Novelty
    3. Phylogenomics of Novelty: Mechanisms of Origin of New Functions
    4. Phylogenomics of Novelty: Mechanisms of Origin of New Functions; Variation in Mechanisms: Patterns, Causes and Effects
    5. Phylogenomics of Novelty: Mechanisms of Origin of New Functions; Variation in Mechanisms: Patterns, Causes and Effects; Species Evolution
    6. Phylogenomics of Novelty: Mechanisms of Origin of New Functions; Variation in Mechanisms: Patterns, Causes and Effects; Species Evolution
    7. Outline
        • Introduction
        • Phylogenomic Stories
          – Within genome invention of novelty
          – Stealing novelty
          – Communities of microbes
          – Community service and knowing what we don’t know
    8. Introduction
    9. rRNA Tree of Life. Figure from Barton, Eisen et al. “Evolution”, CSHL Press. Based on tree from Pace NR, 2003.
    10. Limited Sampling of RRR Studies. Figure from Barton, Eisen et al. “Evolution”, CSHL Press. Based on tree from Pace NR, 2003.
    11. Limited Sampling of RRR Studies. Labeled taxa: Haloferax, Methanococcus, Chlorobium, Deinococcus, Thermotoga. Figure from Barton, Eisen et al. “Evolution”, CSHL Press. Based on tree from Pace NR, 2003.
    12. UV Survival E. coli vs H. volcanii. [Figure: relative survival (log scale, 1 to 1E-07) against UV dose (0 to 400 J/m2) for E. coli NR10121 mfd-, E. coli NR10125 mfd+, and H. volcanii WFD11. TIGR.]
    13. H. volcanii UV Repair. [Figure: signal against average molecular weight (base pairs, 0 to 18000) for four conditions: 0 J/m2 t0; 45 J/m2 t0; 45 J/m2 photoreactivation; 45 J/m2 dark, 24 hours.]
    14. Fleischmann et al. 1995
    15. TIGR Genome Projects. Labeled taxa: Haloferax, Methanococcus, Chlorobium, Deinococcus, Thermotoga. Figure from Barton, Eisen et al. “Evolution”, CSHL Press. Based on tree from Pace NR, 2003.
    16. From http://genomesonline.org
    17. Human commensals
    18. From http://genomesonline.org
    19. Phylogenomics of Novelty I: Origin of Functions from Within
    20. From Eisen et al. 1997 Nature Medicine 3: 1076-1078.
    21. Blast Search of H. pylori “MutS”
        • Blast search pulls up Syn. sp MutS#2 with much higher p value than other MutS homologs
        • Based on this, TIGR predicted this species had mismatch repair
        • Assumes functional constancy
        Based on Eisen et al. 1997 Nature Medicine 3: 1076-1078.
    22. Predicting Function
        • Identification of motifs
          – Short regions of sequence similarity that are indicative of general activity
          – e.g., ATP binding
        • Homology/similarity based methods
          – Gene sequence is searched against a database of other sequences
          – If significantly similar genes are found, their functional information is used
        • Problem
          – Genes frequently have similarity to hundreds of motifs and multiple genes, not all with the same function
    23. MutL?? Based on Eisen et al. 1997 Nature Medicine 3: 1076-1078.
    24. Overlaying Functions onto Tree. [Figure: gene tree of the MutS family with known functions overlaid; labeled clades are MutS1, MutS2, MSH1, MSH2, MSH3, MSH4, MSH5, and MSH6, with sequences from bacteria, archaea, and eukaryotes (e.g., Ecoli, Bacsu, Helpy, Synsp, Metth, Yeast, Human, Mouse, Arath).] Based on Eisen, 1998 Nucl Acids Res 26: 4291-4300.
    25. Evolutionary Functional Prediction. [Figure: the method applied to two examples. Example A uses genes 1A, 2A, 3A, 1B, 2B, 3B from Species 1, 2, and 3, related by a duplication; Example B uses genes 1 to 6.] Steps of the method:
        • Choose gene(s) of interest
        • Identify homologs
        • Align sequences
        • Calculate gene tree
        • Overlay known functions onto tree
        • Infer likely function of gene(s) of interest (may be ambiguous)
        The last panel shows the actual evolution (assumed to be unknown). Based on Eisen, 1998 Genome Res 8: 163-167.
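The "overlay known functions, then infer" steps can be sketched in a few lines. This is a simplified illustration under stated assumptions, not the method of Eisen 1998: the gene tree is assumed to be already computed and is written as nested tuples, the function labels (`func_A`, `func_B`) are hypothetical, and the rule used here (take the smallest clade around the query that contains annotated genes, and report "ambiguous" if they disagree) is only one simple way to read a function off a tree.

```python
from typing import Dict, List, Optional, Set, Tuple, Union

# A tree is either a leaf name (str) or a tuple of child subtrees.
Tree = Union[str, Tuple["Tree", ...]]


def leaves(tree: Tree) -> Set[str]:
    """Return the set of leaf names under a node."""
    if isinstance(tree, str):
        return {tree}
    result: Set[str] = set()
    for child in tree:
        result |= leaves(child)
    return result


def clades_containing(tree: Tree, query: str) -> List[Tree]:
    """Internal nodes on the path from the root down to the query leaf."""
    path: List[Tree] = []
    node = tree
    while not isinstance(node, str):
        path.append(node)
        matches = [child for child in node if query in leaves(child)]
        if not matches:
            raise KeyError(query)
        node = matches[0]
    if node != query:
        raise KeyError(query)
    return path


def predict_function(tree: Tree, query: str, known: Dict[str, str]) -> Optional[str]:
    """Predict a function from the smallest clade with annotated relatives."""
    for clade in reversed(clades_containing(tree, query)):
        functions = {known[g] for g in leaves(clade) if g in known and g != query}
        if len(functions) == 1:
            return functions.pop()
        if len(functions) > 1:
            return "ambiguous"
    return None  # no annotated gene anywhere in the tree


# Toy tree mirroring Example A: a duplication gives the A and B subfamilies.
gene_tree: Tree = ((("1A", "2A"), "3A"), (("1B", "2B"), "3B"))
annotations = {"1A": "func_A", "3A": "func_A", "1B": "func_B"}
print(predict_function(gene_tree, "2A", annotations))
print(predict_function(gene_tree, "3B", annotations))
```

Because the prediction follows the tree rather than the top similarity hit, gene 2A inherits the annotation of its own subfamily even if a member of the B subfamily happened to score higher in a database search.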
    26. Example 2: Recent Changes
        • Phylogenomic functional prediction may not work well for very newly evolved functions
        • Can use understanding of origin of novelty to better interpret these cases?
        • Screen genomes for genes that have changed recently
          – Pseudogenes and gene loss
          – Contingency Loci
          – Acquisition (e.g., LGT)
          – Unusual dS/dN ratios
          – Rapid evolutionary rates
          – Recent duplications
        [Figure: neighbor-joining (NJ) tree of a gene family with many V. cholerae paralogs alongside homologs from B. subtilis, Synechocystis sp., H. pylori, C. jejuni, A. fulgidus, E. coli, T. maritima, T. pallidum, B. burgdorferi, P. abyssi, P. horikoshii, D. radiodurans, and M. tuberculosis.]
    27. RIPPING. Galagan et al.: Genome sequence reveals significant underrepresentation of recently duplicated genes. [Figure: a duplicated sequence (CATGTACAGCA / GTACATGTCGT) before and after RIP, with mutated copies such as TATGTATAGTA / ATACATATCAT and TATATATAGCA / ATATATATCGT carrying CH3 (methyl) marks.] FIGURE 12.30. RIPPING. “The repeat-induced point mutation (RIP) process in Neurospora crassa. Duplications that occur during the vegetative phase are detected by RIP during the sexual cycle after fertilization but before the DNA synthesis and nuclear fusion (karyogamy). Duplicated sequences that are longer than ~400 bp (or ~1 kb for unlinked duplications as shown) and sharing greater than ~80% nucleotide identity are detected. Numerous C-G to T-A point mutations are introduced into both copies (unmutated C-G pairs are shown in blue; mutations are shown in red letters; only a small number of base pairs are shown for clarity). RIP-mutated sequences are frequent targets for methylation, which results in transcriptional silencing in Neurospora. In contrast to mammals and plants, methylation is not limited to symmetric sites.”
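The detection rule and mutation pattern quoted in the legend can be mimicked with a toy simulation. This is a rough sketch under several assumptions that are not from the source: the two copies are treated as already aligned and of equal length, every C or G mutates independently with one uniform probability, and the function names (`percent_identity`, `is_rip_target`, `rip_mutate`) are invented for illustration. Only the thresholds (~400 bp linked, ~1 kb unlinked, ~80% identity) and the C-G to T-A direction come from the legend.

```python
import random


def percent_identity(copy_a: str, copy_b: str) -> float:
    """Percent of matching positions between two aligned, equal-length copies."""
    if not copy_a or len(copy_a) != len(copy_b):
        raise ValueError("copies must be non-empty and of equal length")
    matches = sum(x == y for x, y in zip(copy_a, copy_b))
    return 100.0 * matches / len(copy_a)


def is_rip_target(copy_a: str, copy_b: str, linked: bool = True) -> bool:
    """Apply the approximate thresholds from the figure legend."""
    min_length = 400 if linked else 1000  # ~400 bp linked, ~1 kb unlinked
    return len(copy_a) > min_length and percent_identity(copy_a, copy_b) > 80.0


# On a single strand, a C-G to T-A change reads as C->T, or as G->A when the
# mutated C sits on the complementary strand.
RIP_SUBSTITUTION = {"C": "T", "G": "A"}


def rip_mutate(seq: str, rate: float, rng: random.Random) -> str:
    """Mutate each C or G with probability `rate` (uniform rate is an assumption)."""
    out = []
    for base in seq:
        if base in RIP_SUBSTITUTION and rng.random() < rate:
            out.append(RIP_SUBSTITUTION[base])
        else:
            out.append(base)
    return "".join(out)


duplicate = "ACGT" * 150  # 600 bp toy duplication
if is_rip_target(duplicate, duplicate, linked=True):
    print(rip_mutate(duplicate[:11], rate=0.5, rng=random.Random(1)))
```

Running the mutation on both copies drives them toward AT-rich, diverged sequences, which is why genomes subject to RIP show few recently duplicated genes.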
    28. Tetrahymena thermophila macronuclear genome project
    29. Tetrahymena’s two nuclear genomes
        • Micronucleus (MIC): germline genome (silent); 5 pairs of chromosomes
        • Macronucleus (MAC): somatic genome (expressed); 250-300 chromosomes @ ~45 copies each
    30. Macronuclear Differentiation
    31. Tetrahymena Genome Processing
        • Analogous to RIPPING and heterochromatin silencing
        • Targets new/foreign DNA, not duplicated DNA
        • Does not limit diversification by duplication
        Eisen et al. 2006. PLoS Biology.
    32. Phylogenomics of Novelty II: Sometimes, it is easier to steal, borrow, or coopt functions than to evolve them anew
    33. rRNA Tree of Life: Bacteria, Archaea, Eukaryotes. Figure from Barton, Eisen et al. “Evolution”, CSHL Press. Based on tree from Pace NR, 2003.
    34. Perna et al. 2003
    35. Network of Life: Bacteria, Archaea, Eukaryotes. Figure from Barton, Eisen et al. “Evolution”, CSHL Press. Based on tree from Pace NR, 2003.
    36. [Slide shows the first page of a journal article on Arabidopsis thaliana.] Footnote: Authorship of this paper should be cited as ‘The Arabidopsis Genome Initiative’. A full list of contributors appears at the end of this paper.
        Abstract: The flowering plant Arabidopsis thaliana is an important model system for identifying genes and determining their functions. Here we report the analysis of the genomic sequence of Arabidopsis. The sequenced regions cover 115.4 megabases of the 125-megabase genome and extend into centromeric regions. The evolution of Arabidopsis involved a whole-genome duplication, followed by subsequent gene loss and extensive local gene duplications, giving rise to a dynamic genome enriched by lateral gene transfer from a cyanobacterial-like ancestor of the plastid. The genome contains 25,498 genes encoding proteins from 11,000 families, similar to the functional diversity of Drosophila and Caenorhabditis elegans, the other sequenced multicellular eukaryotes. Arabidopsis has many families of new proteins but also lacks several common protein families, indicating that the sets of common proteins have undergone differential expansion and contraction in the three multicellular eukaryotes. This is the first complete genome sequence of a plant and provides the foundations for more comprehensive comparison of conserved processes in all eukaryotes, identifying a wide range of plant-specific gene functions and establishing rapid systematic ways to identify genes for crop improvement.
        Left column: The plant and animal kingdoms evolved independently from unicellular eukaryotes and represent highly contrasting life forms. The genome sequences of C. elegans [1] and Drosophila [2] reveal that metazoans share a great deal of genetic information required for developmental and physiological processes, but these genome sequences represent a limited survey of multicellular organisms. Flowering plants have unique organizational and physiological properties in addition to ancestral features conserved between plants and animals. The genome sequence of a plant provides a means for understanding the genetic basis of differences between plants and other eukaryotes, and provides the foundation for detailed functional characterization of plant genes. Arabidopsis thaliana has many advantages for genome analysis, including a short generation time, small size, large number of offspring, and a relatively small nuclear genome. These advantages promoted the growth of a scientific community that has investigated the biological processes of Arabidopsis and has characterized many genes [3]. To support these activities, an international collaboration (the Arabidopsis Genome Initiative, AGI) began sequencing the genome in 1996. The sequences of chromosomes 2 and 4 have been reported [4,5], and the accompanying Letters describe the sequences of chromosomes 1 (ref. 6), 3 (ref. 7) and 5 (ref. 8). Here we report analysis of the completed Arabidopsis genome
        Right column (begins mid-sentence): biologists, but will also affect agricultural science, evolutionary biology, bioinformatics, combinatorial chemistry, functional and comparative genomics, and molecular medicine.
        Overview of sequencing strategy: We used large-insert bacterial artificial chromosome (BAC), phage (P1) and transformation-competent artificial chromosome (TAC) libraries [9-12] as the primary substrates for sequencing. Early stages of genome sequencing used 79 cosmid clones. Physical maps of the genome of accession Columbia were assembled by restriction fragment ‘fingerprint’ analysis of BAC clones [13], by hybridization [14] or polymerase chain reaction (PCR) [15] of sequence-tagged sites and by hybridization and Southern blotting [16]. The resulting maps were integrated (http://nucleus/cshl.org/arabmaps/) with the genetic map and provided a foundation for assembling sets of contigs into sequence-ready tiling paths. End sequence (http://www.tigr.org/tdb/at/abe/bac_end_search.html) of 47,788 BAC clones was used to extend contigs from BACS anchored by marker content and to integrate contigs. Ten contigs representing the chromosome arms and centromeric heterochromatin were assembled from 1,569 BAC, TAC, cosmid and P1 clones (average insert size 100 kilobases (kb)). Twenty-two PCR products were amplified directly from genomic DNA and
    37. Correlated gain/loss of genes
        • Microbial genes are lost rapidly when not maintained by selection
        • Genes can be acquired by lateral transfer
        • Gain and loss frequently occur for entire pathways/processes
        • Thus it might be possible to use correlated presence/absence information to identify genes with similar functions
    38. Non-Homology Predictions: Phylogenetic Profiling
        • Step 1: Search all genes in organisms of interest against all other genomes
        • Ask: yes or no, is each gene found in each other species?
        • Cluster genes by distribution patterns (profiles)
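The profiling steps can be sketched as below. This is a minimal illustration that assumes the homology searches are already done and summarized as, for each gene, the set of genomes with a hit; the gene and genome names and the helper functions (`build_profiles`, `cluster_by_profile`, `hamming`) are hypothetical. Genes are grouped when their presence/absence profiles are identical, and a Hamming distance is included for comparing profiles that differ only slightly.

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Set, Tuple

Profile = Tuple[int, ...]


def build_profiles(hits: Dict[str, Set[str]], genomes: Sequence[str]) -> Dict[str, Profile]:
    """Turn 'gene -> genomes with a homolog' into 0/1 presence/absence vectors."""
    return {
        gene: tuple(int(genome in present) for genome in genomes)
        for gene, present in hits.items()
    }


def cluster_by_profile(profiles: Dict[str, Profile]) -> Dict[Profile, List[str]]:
    """Group genes that share exactly the same distribution pattern."""
    groups: Dict[Profile, List[str]] = defaultdict(list)
    for gene, profile in profiles.items():
        groups[profile].append(gene)
    return {profile: sorted(genes) for profile, genes in groups.items()}


def hamming(p: Profile, q: Profile) -> int:
    """Number of genomes in which two profiles disagree."""
    return sum(a != b for a, b in zip(p, q))


# Hypothetical search results for four genes across four genomes.
genomes = ["sp1", "sp2", "sp3", "sp4"]
hits = {
    "geneA": {"sp1", "sp3"},
    "geneB": {"sp1", "sp3"},
    "geneC": {"sp2", "sp4"},
    "geneD": {"sp1", "sp2", "sp3", "sp4"},
}
profiles = build_profiles(hits, genomes)
for profile, genes in cluster_by_profile(profiles).items():
    print(profile, genes)
```

In this toy input geneA and geneB share a profile, which is the kind of correlated presence/absence that suggests a shared pathway or process. Universally present genes such as geneD carry little signal, since their profile matches every other universal gene.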
    39. 39. Carboxydothermus hydrogenoformans• Isolated from a Russian hotspring• Thermophile (grows at 80°C)• Anaerobic• Grows very efficiently on CO (Carbon Monoxide)• Produces hydrogen gas• Low GC Gram positive (Firmicute)• Genome Determined (Wu et al. 2005 PLoS Genetics 1: e65. )
    40. 40. Homologs of Sporulation Genes Wu et al. 2005 PLoS Genetics 1: e65.
    41. 41. Carboxydothermus sporulates Wu et al. 2005 PLoS Genetics 1: e65.
    42. 42. Wu et al. 2005 PLoS Genetics 1: e65.
    43. 43. Stealing Organisms (Symbioses)
    44. 44. Mutualistic Genome Evolution • Compare and contrast different types of mutualistic symbioses • Diverse hosts, symbionts, biology, ages • Organelles, chemosymbioses, photosynthetic symbioses, nutritional symbioses • What are the rules & patterns?
    45. 45. Glassy-Winged Sharpshooter • Obligate xylem feeder • Can transmit Pierce’s Disease agent • Potential bioterror agent • Needs to get amino acids and other nutrients from symbionts, as aphids do
    46. 46. Sharpshooter Shotgun Sequencing. Collaboration with Nancy Moran’s lab. Wu et al. 2006 PLoS Biology 4: e188.
    47. 47. Higher Evolutionary Rates in Endosymbionts. Wu et al. 2006 PLoS Biology 4: e188. Collaboration with Nancy Moran’s lab
    48. 48. Variation in Evolution Rates [Figure: presence (+) or absence (-) of MutS and MutL across lineages] Wu et al. 2006 PLoS Biology 4: e188. Collaboration with Nancy Moran’s lab
    49. 49. Baumannia is a Vitamin and Cofactor Producing Machine. Wu et al. 2006 PLoS Biology 4: e188.
    50. 50. No Amino-Acid Synthesis
    51. 51. The Uncultured Majority
    52. 52. Great Plate Count Anomaly: Culturing Count, Microscope Count
    53. 53. Great Plate Count Anomaly: Culturing Count <<<< Microscope Count
    54. 54. Great Plate Count Anomaly: Culturing Count <<<< Microscope Count; DNA
    55. 55. rRNA PCR: The Hidden Majority. Richness estimates. Hugenholtz 2002; Bohannan and Hughes 2003
    56. 56. rRNA data increasing exponentially too
    57. 57. Perna et al. 2003
    58. 58. Metagenomics shotgun clone
    59. 59. How can we best use metagenomic data? • Many possible uses including: – Improvements on rRNA-based phylotyping and species diversity measurements – Adding functional information on top of phylogenetic/species diversity information • Most/all possible uses either require or are improved with phylogenetic analysis
    60. 60. Example I: Phylotyping with rRNA and other genes
    61. 61. Functional Diversity of Proteorhodopsins? Venter et al., 2004
    62. 62. Shotgun Sequencing Allows Use of Other Markers [Figure: Sargasso phylotypes; weighted % of clones (0 to 0.5) by major phylogenetic group (Alphaproteobacteria, Betaproteobacteria, Gammaproteobacteria, Epsilonproteobacteria, Deltaproteobacteria, Cyanobacteria, Firmicutes, Actinobacteria, Chlorobi, CFB, Chloroflexi, Spirochaetes, Fusobacteria, Deinococcus-Thermus, Euryarchaeota, Crenarchaeota) for the markers EFG, EFTu, rRNA, RecA, RpoB and HSP70] Venter et al., Science 304: 66-74. 2004
    63. 63. Example II: Binning
    64. 64. Metagenomics Challenge
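    65. 65. Binning challenge [Figure labels: A, B, C, D, E, F, G and T, U, V, W, X, Y, Z]
    66. 66. Binning challenge [Figure labels: A, B, C, D, E, F, G and T, U, V, W, X, Y, Z] Best binning method: reference genomes
    67. 67. Binning challenge [Figure labels: A, B, C, D, E, F, G and T, U, V, W, X, Y, Z] Best binning method: reference genomes
    68. 68. Binning challenge [Figure labels: A, B, C, D, E, F, G and T, U, V, W, X, Y, Z] No reference genome? What do you do?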
    69. 69. No Amino-Acid Synthesis
    70. 70. ???????
    71. 71. CFB Phyla
    72. 72. Sulcia makes amino acids. Baumannia makes vitamins and cofactors. Wu et al. 2006 PLoS Biology 4: e188.
    73. 73. Phylogenomics of Novelty III Knowing What We Don’t Know
    74. 74. Research Topics: Mechanisms of Origin of New Functions; Variation in Mechanisms: Patterns, Causes and Effects; Species Evolution
    75. 75. Research Topics: Mechanisms of Origin of New Functions; Variation in Mechanisms: Patterns, Causes and Effects; Species Evolution
    76. 76. As of 2002
    77. 77. As of 2002 • At least 40 phyla of bacteria [Figure: tree of bacterial phyla: Proteobacteria, TM6, OS-K, Acidobacteria, Termite Group, OP8, Nitrospira, Bacteroides, Chlorobi, Fibrobacteres, Marine Group A, WS3, Gemmimonas, Firmicutes, Fusobacteria, Actinobacteria, OP9, Cyanobacteria, Synergistes, Deferribacteres, Chrysiogenetes, NKB19, Verrucomicrobia, Chlamydia, OP3, Planctomycetes, Spirochaetes, Coprothermobacter, OP10, Thermomicrobia, Chloroflexi, TM7, Deinococcus-Thermus, Dictyoglomus, Aquificae, Thermodesulfobacteria, Thermotogae, OP1, OP11] Based on Hugenholtz, 2002
    78. 78. As of 2002 • At least 40 phyla of bacteria • Genome sequences are mostly from three phyla [Figure: same tree of bacterial phyla as on slide 77] Based on Hugenholtz, 2002
    79. 79. As of 2002 • At least 40 phyla of bacteria • Genome sequences are mostly from three phyla • Some other phyla are only sparsely sampled [Figure: same tree of bacterial phyla as on slide 77] Based on Hugenholtz, 2002
    80. 80. As of 2002 • At least 40 phyla of bacteria • Genome sequences are mostly from three phyla • Some other phyla are only sparsely sampled [Figure: same tree of bacterial phyla as on slide 77] Based on Hugenholtz, 2002
    81. 81. Need for Tree Guidance Well Established • Common approach within some eukaryotic groups • Many small projects funded to fill in some bacterial or archaeal gaps • Phylogenetic gaps in bacterial and archaeal projects commonly lamented in literature
    82. 82. • NSF-funded Tree of Life Project • A genome from each of eight phyla • At least 40 phyla of bacteria • Genome sequences are mostly from three phyla • Some other phyla are only sparsely sampled • Solution I: sequence more phyla [Figure: same tree of bacterial phyla as on slide 77] Eisen, Ward, Robb, Nelson, et al.
    83. 83. • NSF-funded Tree of Life Project • A genome from each of eight phyla • At least 40 phyla of bacteria • Genome sequences are mostly from three phyla • Some other phyla are only sparsely sampled • Still highly biased in terms of the tree [Figure: same tree of bacterial phyla as on slide 77] Eisen & Ward, PIs
    84. 84. • NSF-funded Tree of Life Project • A genome from each of eight phyla • At least 40 phyla of bacteria • Genome sequences are mostly from three phyla • Some other phyla are only sparsely sampled • Same trend in Archaea [Figure: same tree of bacterial phyla as on slide 77] Eisen & Ward, PIs
    85. 85. • NSF-funded Tree of Life Project • A genome from each of eight phyla • At least 40 phyla of bacteria • Genome sequences are mostly from three phyla • Some other phyla are only sparsely sampled • Same trend in Eukaryotes [Figure: same tree of bacterial phyla as on slide 77] Eisen & Ward, PIs
    86. 86. • NSF-funded Tree of Life Project • A genome from each of eight phyla • At least 40 phyla of bacteria • Genome sequences are mostly from three phyla • Some other phyla are only sparsely sampled • Same trend in Viruses [Figure: same tree of bacterial phyla as on slide 77] Eisen & Ward, PIs
    87. 87. • GEBA • A genomic encyclopedia of bacteria and archaea • At least 40 phyla of bacteria • Genome sequences are mostly from three phyla • Some other phyla are only sparsely sampled • Solution: Really Fill in the Tree [Figure: same tree of bacterial phyla as on slide 77] Eisen & Ward, PIs
    88. 88. http://www.jgi.doe.gov/programs/GEBA/pilot.html
    89. 89. GEBA Pilot Project: Components • Project overview (Phil Hugenholtz, Nikos Kyrpides, Jonathan Eisen, Eddy Rubin, Jim Bristow) • Project management (David Bruce, Eileen Dalin, Lynne Goodwin) • Culture collection and DNA prep (DSMZ, Hans-Peter Klenk) • Sequencing and closure (Eileen Dalin, Susan Lucas, Alla Lapidus, Mat Nolan, Alex Copeland, Cliff Han, Feng Chen, Jan-Fang Cheng) • Annotation and data release (Nikos Kyrpides, Victor Markowitz, et al.) • Analysis (Dongying Wu, Kostas Mavrommatis, Martin Wu, Victor Kunin, Neil Rawlings, Ian Paulsen, Patrick Chain, Patrik D’Haeseleer, Sean Hooper, Iain Anderson, Amrita Pati, Natalia N. Ivanova, Athanasios Lykidis, Adam Zemla) • Adopt a microbe education project (Cheryl Kerfeld) • Outreach (David Gilbert) • $$$ (DOE, Eddy Rubin, Jim Bristow)
    90. 90. GEBA Pilot Project Overview • Identify major branches in rRNA tree for which no genomes are available • Identify those with a cultured representative in DSMZ • DSMZ grew > 200 of these and prepped DNA • Sequence and finish 100+ (covering breadth of bacterial/archaeal diversity) • Annotate, analyze, release data • Assess benefits of tree-guided sequencing • First paper: Wu et al., Nature, Dec 2009
    91. 91. Network of Life: Bacteria, Archaea, Eukaryotes. Figure from Barton, Eisen et al. “Evolution”, CSHL Press. Based on tree from Pace NR, 2003.
    92. 92. GEBA Lesson 1: The rRNA Tree of Life is a Useful Tool for Identifying Phylogenetically Novel (from Wu et al. 2009 Nature 462, 1056-1060)
    93. 93. GEBA Lesson 2: The rRNA Tree of Life is not perfect ... [Figure panels: 16S, WGT, 23S] Badger et al. 2005 Int J Syst Evol Microbiol 55: 1021-1026.
    94. 94. GEBA Lesson 3: Phylogeny-driven genome selection (and phylogenetics) improves genome annotation • Took 56 GEBA genomes and compared results vs. 56 randomly sampled new genomes • Better definition of protein family sequence “patterns” • Greatly improves “comparative” and “evolutionary” based predictions • Conversion of hypotheticals into conserved hypotheticals • Linking distantly related members of protein families • Improved non-homology prediction
    95. 95. GEBA Lesson 4: Metadata Important
    96. 96. GEBA Phylogenomic Lesson 5 Phylogeny-driven genome selection helps discover new genetic diversity
    97. 97. Phylogenetic Distribution Novelty: Bacterial Actin Related Protein [Figure: phylogenetic tree of actin-related proteins, including Haliangium ochraceum DSM 14365] Patrik D’haeseleer, Adam Zemla, Victor Kunin. Wu et al. 2009 Nature 462, 1056-1060. See also Guljamow et al. 2007 Current Biology.
    98. 98. Network of Life: Bacteria, Archaea, Eukaryotes. Figure from Barton, Eisen et al. “Evolution”, CSHL Press. Based on tree from Pace NR, 2003.
    99. 99. Protein Family Rarefaction Curves • Take data set of multiple complete genomes • Identify all protein families using MCL • Plot # of genomes vs. # of protein families
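The rarefaction procedure on this slide can be sketched as follows. This is only an illustration under stated assumptions: the family assignments are invented, the MCL clustering step is assumed to have been run beforehand, and averaging over random genome orders is one common way to smooth such a curve rather than a detail taken from the talk.

```python
import random

# Hypothetical output of a protein-family clustering step (e.g. MCL):
# each genome maps to the set of family IDs it contains.
FAMILIES_BY_GENOME = {
    "genomeA": {"fam1", "fam2", "fam3"},
    "genomeB": {"fam2", "fam3", "fam4"},
    "genomeC": {"fam1", "fam5"},
    "genomeD": {"fam2", "fam6", "fam7", "fam8"},
}


def accumulation(order, families_by_genome):
    """Cumulative number of distinct families as genomes are added in order."""
    seen = set()
    counts = []
    for genome in order:
        seen |= families_by_genome[genome]
        counts.append(len(seen))
    return counts


def rarefaction_curve(families_by_genome, n_permutations=100, seed=0):
    """Mean family count for 1..N genomes, averaged over random genome orders."""
    rng = random.Random(seed)
    genomes = sorted(families_by_genome)
    totals = [0] * len(genomes)
    for _ in range(n_permutations):
        order = genomes[:]
        rng.shuffle(order)
        for i, count in enumerate(accumulation(order, families_by_genome)):
            totals[i] += count
    return [total / n_permutations for total in totals]


if __name__ == "__main__":
    curve = rarefaction_curve(FAMILIES_BY_GENOME)
    for n_genomes, n_families in enumerate(curve, start=1):
        print(n_genomes, n_families)
```

Plotting the returned values against the number of genomes gives the curve described on the slide. A curve that keeps climbing suggests that additional genomes are still contributing new protein families.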
    100. 100. Wu et al. 2009 Nature 462, 1056-1060
    101. 101. Wu et al. 2009 Nature 462, 1056-1060
    102. 102. Wu et al. 2009 Nature 462, 1056-1060
    103. 103. Wu et al. 2009 Nature 462, 1056-1060
    104. 104. Wu et al. 2009 Nature 462, 1056-1060
    105. 105. Synapomorphies exist. Wu et al. 2009 Nature 462, 1056-1060
    106. 106. Families/PD not uniform
    107. 107. Structural Novelty • Of the 17,000 protein families in the GEBA56, 1,800 are novel in sequence (Wu) • Structural modeling suggests many are structurally novel too (D’haeseleer) • 372 being crystallized by the PSI (Kerfeld)
    108. 108. GEBA Phylogenomic Lesson 6 Improves analysis of genome data from uncultured organisms
    109. 109. Shotgun Sequencing Allows Use of Other Markers [Figure: Sargasso phylotypes; weighted % of clones (0 to 0.5) by major phylogenetic group (Alphaproteobacteria, Betaproteobacteria, Gammaproteobacteria, Epsilonproteobacteria, Deltaproteobacteria, Cyanobacteria, Firmicutes, Actinobacteria, Chlorobi, CFB, Chloroflexi, Spirochaetes, Fusobacteria, Deinococcus-Thermus, Euryarchaeota, Crenarchaeota) for the markers EFG, EFTu, rRNA, RecA, RpoB and HSP70] Venter et al., Science 304: 66-74. 2004
