Jonathan Eisen: Phylogenetic approaches to the analysis of genomes and metagenomes
Talk by Jonathan Eisen March 7, 2012 at the National Academy of Sciences Institute of Medicine "Forum on Microbial Threats" meeting on the "Social Biology of Microbes"


  1. Phylogenetic and Phylogenomic Approaches to the Study of Microbial Communities. March 7, 2012. IOM Forum on Microbial Threats: Social Biology of Microbes. Jonathan A. Eisen, University of California, Davis
  2. Acknowledgements • $$$: DOE, NSF, GBMF, Sloan, DARPA, DSMZ, DHS • People, places: DOE JGI: Eddy Rubin, Phil Hugenholtz, Nikos Kyrpides; UC Davis: Aaron Darling, Dongying Wu, Holly Bik, Russell Neches, Jenna Morgan-Lang; Other: Jessica Green, Katie Pollard, Martin Wu, Tom Slezak, Jack Gilbert, Steven Kembel, J. Craig Venter, Naomi Ward, Hans-Peter Klenk
  3. Outline • Introduction • Phylotyping and phylogenetic ecology • Functional prediction • Selecting organisms • Future needs
  4. Phylogeny • Phylogeny is a description of the evolutionary history of relationships among organisms (or their parts). • This is frequently portrayed in a diagram called a phylogenetic tree. • Phylogenies can be more complex than a bifurcating tree (e.g., lateral gene transfer, recombination, hybridization).
  5. Whatever the History: Trying to Incorporate it is Critical. Four Models for Rooting TOL, from Lake et al. doi: 10.1098/rstb.2009.0035
  6. Uses of Phylogeny in Genomics and Metagenomics. Example 1: Phylotyping and Phylogenetic Ecology
  7. rRNA Phylotyping • Collect DNA from environment • PCR amplify rRNA genes using broad (so-called universal) primers • Sequence • Align to others • Infer evolutionary tree • Unknowns “identified” by placement on tree
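The final step on this slide, identifying an unknown by where it falls relative to known sequences, can be illustrated with a much simpler stand-in. The sketch below does not build a tree: it assumes the sequences are already aligned and assigns the query to the reference with the smallest p-distance (fraction of differing sites). The reference sequences, the query, and the function names are all made up for illustration and are not taken from the talk.

```python
def p_distance(a: str, b: str) -> float:
    """Fraction of differing sites over columns where neither sequence has a gap."""
    if len(a) != len(b):
        raise ValueError("sequences must come from the same alignment")
    compared = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not compared:
        raise ValueError("no overlapping ungapped columns")
    return sum(x != y for x, y in compared) / len(compared)


def nearest_reference(query: str, references: dict) -> tuple:
    """Return (name, distance) of the reference closest to the query."""
    name = min(references, key=lambda n: p_distance(query, references[n]))
    return name, p_distance(query, references[name])


# Hypothetical aligned rRNA fragments (toy data, 10 columns each).
references = {
    "Alphaproteobacteria": "ACGTACGTAC",
    "Firmicutes": "ACGTTTGTCC",
    "Cyanobacteria": "TCGAACGGAC",
}
query = "ACGTACGTAA"

label, dist = nearest_reference(query, references)
print(label, dist)
```

A real phylotyping run would infer a tree from the alignment and read the query's placement from it. Nearest-neighbor distance is used here only because it fits in a few lines.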
  8. rRNA Phylotyping
  9. Three Major Issues in Phylotyping • Beyond Moore’s Law • Metagenomics • Short reads
  10. rRNA Phylotyping in Sargasso Sea Metagenomic Data. Venter et al., Science 304: 66. 2004
  11. RecA Phylotyping in Sargasso Data. Venter et al., Science 304: 66. 2004
  12. RecA Phylotyping in Sargasso Data. Venter et al., Science 304: 66. 2004
  13. Sargasso Phylotypes. [Figure: bar chart of weighted % of clones (y axis, 0 to 0.500) by major phylogenetic group (x axis) for six markers: EFG, EFTu, HSP70, RecA, RpoB, rRNA.] Venter et al., Science 304: 66-74. 2004
  14. Solution: More Automation • BLAST???? • Composition/word frequencies • Automation of trees
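As an illustration of the "composition/word frequencies" option, the sketch below turns a DNA sequence into a vector of k-mer (word) frequencies and compares two such vectors by cosine similarity. The choice of k = 2, the cosine measure, and the example sequences are assumptions made for this example. The slide does not say which composition method was meant.

```python
import math
from collections import Counter
from itertools import product


def kmer_profile(seq: str, k: int = 2) -> list:
    """Relative frequency of every DNA k-mer, in a fixed alphabetical order."""
    seq = seq.upper()
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    total = sum(counts[m] for m in kmers)  # ignores words containing N, gaps, etc.
    if total == 0:
        raise ValueError("sequence has no complete A/C/G/T k-mers")
    return [counts[m] / total for m in kmers]


def cosine_similarity(u: list, v: list) -> float:
    """Cosine of the angle between two frequency vectors (1.0 means same composition)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm


# Toy sequences: one AT-rich, one GC-rich.
at_rich = kmer_profile("ATATATATAT")
gc_rich = kmer_profile("GCGCGCGCGC")
print(cosine_similarity(at_rich, at_rich))  # identical composition
print(cosine_similarity(at_rich, gc_rich))  # no shared dinucleotides
```

Composition profiles need no alignment, which is why they are attractive for short reads, but they carry no explicit evolutionary model.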
  15. AutoPhylotyping 1: Each Sequence is an Island
  16. STAP. Figure 1. A flow chart of the STAP pipeline. Wu et al. 2008 PLoS One
  17. STAP: each sequence analyzed separately. Figure 1. A flow chart of the STAP pipeline. doi:10.1371/journal.pone.0002566.g001. Excerpt: “[…] STAP database, and the query sequence is aligned to them using the CLUSTALW profile alignment algorithm [40] as described above for domain assignment. By adapting the profile alignment […]” Figure 2. Domain assignment. In Step 1, STAP assigns a domain to each query sequence based on its position in a maximum likelihood tree of representative ss-rRNA sequences. Because the tree illustrated here is not rooted, domain assignment would not be accurate and reliable (sequence similarity based methods cannot make an accurate assignment in this case either). However the figure illustrates an important role of the tree-based domain assignment step, namely automatic identification of deep-branching environmental ss-rRNAs. doi:10.1371/journal.pone.0002566.g002. Wu et al. 2008 PLoS One
  18. AMPHORA. Wu and Eisen Genome Biology 2008 9:R151 doi:10.1186/gb-2008-9-10-r151
  19. WGT. Wu and Eisen Genome Biology 2008 9:R151 doi:10.1186/gb-2008-9-10-r151
  20. AMPHORA: guide tree. Wu and Eisen Genome Biology 2008 9:R151 doi:10.1186/gb-2008-9-10-r151
  21. Wu and Eisen Genome Biology 2008 9:R151 doi:10.1186/gb-2008-9-10-r151
  22. Comparison of the phylotyping performance by AMPHORA and MEGAN. The sensitivity and specificity of the phylotyping methods were measured across taxonomic ranks using simulated Sanger shotgun sequences of 31 genes from 100 representative bacterial genomes. The figure shows that AMPHORA significantly outperforms MEGAN in sensitivity without sacrificing specificity. Wu and Eisen Genome Biology 2008 9:R151 doi:10.1186/gb-2008-9-10-r151
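To make the two measures concrete, here is a minimal sketch of how sensitivity and specificity could be scored for simulated reads whose true taxon is known. The definitions used (sensitivity = correct assignments / all queries; specificity = correct assignments / all assignments made) are an assumption for this example and may differ from the exact definitions in Wu and Eisen 2008. The labels are toy data.

```python
def phylotyping_scores(true_labels: list, predicted_labels: list) -> tuple:
    """Return (sensitivity, specificity) under the assumed definitions.

    A prediction of None means the method declined to assign the query.
    """
    if len(true_labels) != len(predicted_labels):
        raise ValueError("need one prediction per query")
    total = len(true_labels)
    assigned = sum(p is not None for p in predicted_labels)
    correct = sum(
        p is not None and p == t for t, p in zip(true_labels, predicted_labels)
    )
    sensitivity = correct / total if total else 0.0
    specificity = correct / assigned if assigned else 0.0
    return sensitivity, specificity


# Five simulated reads: three assigned correctly, one wrongly, one left unassigned.
truth = ["Firmicutes", "Firmicutes", "Chlorobi", "Chlorobi", "Chloroflexi"]
calls = ["Firmicutes", None, "Chlorobi", "Chloroflexi", "Chloroflexi"]

sens, spec = phylotyping_scores(truth, calls)
print(sens, spec)
```

Under these definitions a cautious method that leaves uncertain reads unassigned keeps specificity high at the cost of sensitivity, which is the trade-off the comparison on the slide addresses.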
  23. AutoPhylotyping 2: Most in the Family
  24. Metagenomic Phylogenetic challenge. [Diagram: sequence reads of differing lengths, drawn as rows of x.] A single tree with everything
  25. Metagenomic Phylogenetic challenge. [Diagram: sequence reads of differing lengths, drawn as rows of x.] A single tree with everything
  26. rRNA Phylotyping in Sargasso Sea Metagenomic Data. Venter et al., Science 304: 66. 2004
  27. Combine all into one alignment. Figure 1. A flow chart of the STAP pipeline.
  28. Kevin Penn, Dongying Wu, Jonathan A. Eisen, and Naomi Ward. Characterization of Bacterial Communities Associated with Deep-Sea Corals on Gulf of Alaska Seamounts. Applied and Environmental Microbiology, Feb. 2006, p. 1680–1683, Vol. 72, No. 2. doi:10.1128/AEM.72.2.1680–1683.2006. The Institute for Genomic Research, 9712 Medical Center Drive, Rockville, Maryland 20850; Johns Hopkins University, Charles and 34th Streets, Baltimore, Maryland 21218; and Center of Marine Biotechnology, 701 East Pratt Street, Baltimore, Maryland 21202. Received 22 June 2005/Accepted 8 November 2005.
      Abstract: Although microbes associated with shallow-water corals have been reported, deepwater coral microbes are poorly characterized. A cultivation-independent analysis of Alaskan seamount octocoral microflora showed that Proteobacteria (classes Alphaproteobacteria and Gammaproteobacteria), Firmicutes, Bacteroidetes, and Acidobacteria dominate and vary in abundance. More sampling is needed to understand the basis and significance of this variation.
      Excerpt (introduction and sampling): The most abundant corals on Gulf of Alaska seamounts are octocorals (9), which create a habitat structure for mobile fauna (4). Concerns about the benthic impacts of commercial fishing have renewed interest in habitat-forming deep-sea corals (4). Studies of shallow-water scleractinian corals (12) have revealed a diverse microflora and evidence of host-microbe interactions. Although studies of the deep-sea octocoral microflora are under way (10), there have been no published reports describing the microbial community composition. Three Gulf of Alaska seamounts were visited during research cruise AT7-15/16 aboard the R/V Atlantis. The biological objectives of the cruise included sampling of deep-sea octocorals for studies of their dispersal and reproductive strategies, with a particular focus on the abundant bamboo corals (Isididae). We took advantage of available coral specimens to examine their associated microflora. Coral, rock, and water column samples (Table 1) were collected from the Warwick, Murray, and Chirikof seamounts using the deep-submergence vehicle Alvin. Corals and rocks were harvested using the submersible’s manipulators and stored in a closed box during ascent to minimize physical disturbance by surface waters. The water adjacent to coral colonies was sampled using a Niskin bottle fired at depth. After submersible recovery, freshly extruded coral exopolysaccharide and scrapings of coral and rock surfaces were transferred to sterile cryovials. Water samples were prefiltered through 20-µm-pore-size Nitex, concentrated using a TFF apparatus (Millipore), and vacuum filtered (1.0-µm and 0.2-µm pore size). The 0.2-µm filter retentate was resuspended in sterile saline […]
      Excerpt (methods and results): […] Amplifications were performed with an initial denaturation of 2 min at 94°C, followed by 29 cycles of 30 s at 94°C, 30 s at 55°C, and 2 min at 72°C, with a final extension of 5 min at 72°C. PCR products were cloned using a TOPO TA cloning kit (Invitrogen), and primers M13F and M13R were used to sequence positions 9 to 1545 of the 16S rRNA gene. BLASTN (1) was used to compare our query sequences with reference sequences from the RDP2 (3) database. Representative sequences from the BLASTN output were aligned with our query sequences, using an RDP2-provided profile alignment. Neighbor-joining trees were created using PHYLIP (6) and used to assign putative taxonomy down to the family level. Detailed phylogenetic trees were constructed using the relevant sequences from each clone library, two reference sequences most closely related to the query sequence, and additional reference sequences. Alignments were generated using the RDP2 profile alignment, and bootstrapped neighbor-joining trees were reconstructed using PHYLIP (6). The clones sequenced comprised 19 phyla (see Table S1 in the supplemental material), dominated by Proteobacteria (classes Alphaproteobacteria and Gammaproteobacteria), Firmicutes, Bacteroidetes, and Acidobacteria (Fig. 1). The relative proportions of these groups varied widely across the five coral samples, as did the degree to which a given library was dominated by a single group (Fig. 1; see Table S1 in the supplemental material). At the subphylum level, families occurring in major proportions included Rhizobiaceae, Rhodobacteraceae, and Sphingomonadaceae (Alphaproteobacteria); Pseudomona[…]
      Excerpt (results, later page): […] sequences obtained in this and previous (12, 13) studies fall within the marine roseobacters (see Fig. S1 in the supplemental material), a major clade of culturable marine heterotrophs (7), many of which play a role in sulfur cycles (e.g., see reference 8). One clade of six CGOF sequences is most closely related to NAC11-6 from a dimethylsulfoniopropionate-producing algal bloom (8), while CGOCA38 groups closely with NAC11-7 (from the same algal bloom study [8]) and an uncultivated marine bacterium, ZD0207, associated with dimethylsulfoniopropionate uptake (15). CGOAB33 is most similar to one (slope strain EI1*) of a group of thiosulfate-oxidizing bacteria from marine sediments and hydrothermal vents (14). Members of the family Pseudomonadaceae comprised 23 to 69% of the gammaproteobacterial sequences in those samples […]
      Figure text: Cluster, cluster of more than three identical sequences. FIG. 1. Histogram showing percentages of composition (by taxon) for 16S rRNA gene libraries generated for this study, showing only taxa comprising at least 20% of sequences in at least one clone library.
  29. RecA Phylotyping in Sargasso Data. Venter et al., Science 304: 66. 2004
  30. RecA Phylotyping in Sargasso Data. Venter et al., Science 304: 66. 2004
  31. Sargasso Phylotypes. [Figure: bar chart of weighted % of clones (y axis, 0 to 0.500) by major phylogenetic group (x axis) for six markers: EFG, EFTu, HSP70, RecA, RpoB, rRNA.] Venter et al., Science 304: 66-74. 2004
  32. AutoPhylotyping 3: All in the Family
  33. Metagenomic Phylogenetic challenge. [Diagram: sequence reads of differing lengths, drawn as rows of x.] A single tree with everything
  34. Metagenomic Phylogenetic challenge. A single tree with everything
  35. PhylOTU. Sharpton et al. PLoS Comp. Bio 2011. Figure 1. PhylOTU Workflow. Computational processes are represented as squares and databases are represented as cylin[ders] […] workflow of PhylOTU. See Results section for details. doi:10.1371/journal.pcbi.1001061.g001
  36. [No transcribed text]
  37. AutoPhylotyping 4: All in the Genome
  38. Challenge • Each gene poorly sampled in metagenomes • Can we combine all into a single tree?
  39. AMPHORA ALL. Kembel et al. The phylogenetic diversity of metagenomes. PLoS One 2011
  40. [No transcribed text]
  41. Excerpt: “[…] the communities combined (18), is a quantitative measure that accounts for different levels of divergence between sequences. The phylogenetic test (P test), which measures the significance of the association between environment and phylogeny (18), is typically used as a qualitative measure because duplicate sequences are usually removed from the tree. However, the P test may be used in a semiquantitative manner if all clones, even those with identical or near-identical sequences, are included in the tree (13). Here we describe a quantitative version of UniFrac that we call “weighted UniFrac.” We show that weighted UniFrac behaves similarly to the FST test in situations where both are […]”
      FIG. 1. Calculation of the unweighted and the weighted UniFrac measures. Squares and circles represent sequences from two different environments. (a) In unweighted UniFrac, the distance between the circle and square communities is calculated as the fraction of the branch length that has descendants from either the square or the circle environment (black) but not both (gray). (b) In weighted UniFrac, branch lengths are weighted by the relative abundance of sequences in the square and circle communities; square sequences are weighted twice as much as circle sequences because there are twice as many total circle sequences in the data set. The width of branches is proportional to the degree to which each branch is weighted in the calculations, and gray branches have no weight. Branches 1 and 2 have heavy weights since the descendants are biased toward the square and circles, respectively. Branch 3 contributes no value since it has an equal contribution from circle and square sequences after normalization.
      Figure 3. Taxonomic diversity and standardized phylogenetic diversity versus depth in environmental samples along an oceanic depth gradient at the HOT ALO[HA] site.
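The two calculations described in the FIG. 1 caption can be sketched on a toy tree. In the sketch below each branch is stored as a (length, set of descendant leaves) pair. Unweighted UniFrac is the fraction of branch length leading to descendants from only one community, and weighted UniFrac sums each branch length times the absolute difference in the two communities' relative abundances below that branch. The tree, the counts, and the use of the raw (unnormalized) weighted form are assumptions made for illustration.

```python
def unweighted_unifrac(branches, counts_a, counts_b):
    """Fraction of branch length whose descendants come from exactly one community."""
    unique = total = 0.0
    for length, leaves in branches:
        in_a = any(counts_a.get(leaf, 0) > 0 for leaf in leaves)
        in_b = any(counts_b.get(leaf, 0) > 0 for leaf in leaves)
        if in_a or in_b:
            total += length
            if in_a != in_b:
                unique += length
    return unique / total if total else 0.0


def weighted_unifrac(branches, counts_a, counts_b):
    """Raw weighted UniFrac: sum of length * |share of A below - share of B below|."""
    total_a, total_b = sum(counts_a.values()), sum(counts_b.values())
    if total_a == 0 or total_b == 0:
        raise ValueError("each community needs at least one sequence")
    score = 0.0
    for length, leaves in branches:
        share_a = sum(counts_a.get(leaf, 0) for leaf in leaves) / total_a
        share_b = sum(counts_b.get(leaf, 0) for leaf in leaves) / total_b
        score += length * abs(share_a - share_b)
    return score


# Toy tree ((a:1,b:1):1,(c:1,d:1):1), one entry per branch.
branches = [
    (1.0, {"a"}), (1.0, {"b"}), (1.0, {"a", "b"}),
    (1.0, {"c"}), (1.0, {"d"}), (1.0, {"c", "d"}),
]
circles = {"a": 2, "c": 1}          # sequence counts per leaf, community 1
squares = {"b": 1, "c": 1, "d": 1}  # sequence counts per leaf, community 2

print(unweighted_unifrac(branches, circles, squares))
print(weighted_unifrac(branches, circles, squares))
```

The unweighted measure only asks whether a lineage is present in each community, while the weighted one also responds to how abundant it is, which is the distinction the caption draws between panels (a) and (b).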
  42. AutoPhylotyping 5: Novel lineages and decluttering
  43. RecA Tree of Life: Bacteria, Archaea, Eukaryotes. Other lineages? Figure from Barton, Eisen et al. “Evolution”, CSHL Press. 2007. Based on tree from Pace 1997 Science 276:734-740
  44. Lek Clustering. [Diagram with pairwise values 0.75, 0.75, 0.33, 1, 1, 0.75, 0.75]
  45. Lek Clustering, cutoff of 0.5. [Diagram with pairwise values 0.75, 0.75, 0.33, 1, 1, 0.75, 0.75]
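The slides give only pairwise values and a cutoff of 0.5, not the Lek similarity measure itself. As a rough illustration of what applying such a cutoff does, the sketch below keeps every pair whose score is at or above the cutoff and reports the resulting connected groups. This is generic threshold (single-linkage) clustering with a union-find structure, not necessarily the Lek algorithm, and the sequence names and pairings are invented.

```python
def clusters_above_cutoff(similarities: dict, cutoff: float = 0.5) -> list:
    """Group items linked, directly or through a chain, by scores >= cutoff.

    similarities maps (item_a, item_b) pairs to a similarity score.
    """
    parent = {}

    def find(x):
        # Union-find lookup with path compression.
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for (a, b), score in similarities.items():
        root_a, root_b = find(a), find(b)
        if score >= cutoff and root_a != root_b:
            parent[root_a] = root_b

    groups = {}
    for item in parent:
        groups.setdefault(find(item), set()).add(item)
    return sorted(groups.values(), key=sorted)


# Hypothetical scores echoing the values on the slide.
scores = {
    ("s1", "s2"): 0.75,
    ("s2", "s3"): 1.0,
    ("s3", "s4"): 0.33,  # below the cutoff, so this link is dropped
    ("s4", "s5"): 0.75,
}
print(clusters_above_cutoff(scores, cutoff=0.5))
```

With these toy scores the single 0.33 link is the only thing separating two groups, so moving the cutoff below 0.33 would merge them into one cluster.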
  46. [Tree labels:] GOS 1, RecA, GOS 2, RecA, GOS 3, GOS 4, GOS 5
  47. RpoB Too
  48. Side benefit: binning
  49. Sulcia makes amino acids. Baumannia makes vitamins and cofactors. Wu et al. 2006 PLoS Biology 4: e188.
  50. Uses of Phylogeny in Genomics and Metagenomics. Example 2: Functional Diversity and Functional Predictions
  51. Predicting Function • Key step in genome projects • More accurate predictions help guide experimental and computational analyses • Many diverse approaches • All are improved by “phylogenomic” type analyses that integrate both evolutionary reconstructions and an understanding of how new functions evolve
  52. PHYLOGENETIC PREDICTION OF GENE FUNCTION. [Figure: the method worked through for EXAMPLE A (genes 1A, 2A, 3A, 1B, 2B, 3B from Species 1, 2, and 3) and EXAMPLE B (genes 1 to 6). Steps: CHOOSE GENE(S) OF INTEREST; IDENTIFY HOMOLOGS; ALIGN SEQUENCES; CALCULATE GENE TREE (Duplication?); OVERLAY KNOWN FUNCTIONS ONTO TREE; INFER LIKELY FUNCTION OF GENE(S) OF INTEREST (Ambiguous). Final panel: ACTUAL EVOLUTION (ASSUMED TO BE UNKNOWN), with a duplication marked.] Based on Eisen, 1998 Genome Res 8: 163-167.
  53. PHYLOGENETIC PREDICTION OF GENE FUNCTION. [Figure: the method worked through for EXAMPLE A (genes 1A, 2A, 3A, 1B, 2B, 3B from Species 1, 2, and 3) and EXAMPLE B (genes 1 to 6). Steps: CHOOSE GENE(S) OF INTEREST; IDENTIFY HOMOLOGS; ALIGN SEQUENCES; CALCULATE GENE TREE (Duplication?); OVERLAY KNOWN FUNCTIONS ONTO TREE; INFER LIKELY FUNCTION OF GENE(S) OF INTEREST (Ambiguous). Final panel: ACTUAL EVOLUTION (ASSUMED TO BE UNKNOWN), with a duplication marked.] Based on Eisen, 1998 Genome Res 8: 163-167.
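The last two steps of the method, overlaying known functions onto the gene tree and inferring the function of the gene of interest, can be sketched as follows. The sketch takes the gene tree as given (nested tuples of gene names), walks to the smallest clade that contains the query plus at least one annotated gene, and returns the set of functions found among the query's relatives in that clade. One function means a prediction and several mean the call is ambiguous. This nearest-annotated-relatives rule, the gene names, and the function labels are simplifying assumptions for illustration, not the full procedure from Eisen 1998.

```python
def leaf_names(tree) -> list:
    """All gene names under a node; a leaf is a plain string."""
    if isinstance(tree, str):
        return [tree]
    return [name for child in tree for name in leaf_names(child)]


def predict_function(tree, query: str, known: dict) -> set:
    """Functions of the annotated genes closest to the query in the tree.

    Assumes gene names are unique. Returns an empty set if nothing is annotated.
    """
    for i, child in enumerate(tree):
        if query not in leaf_names(child):
            continue
        if not isinstance(child, str):
            deeper = predict_function(child, query, known)
            if deeper:
                return deeper  # annotation found in a smaller clade
        sisters = [
            name
            for j, sib in enumerate(tree) if j != i
            for name in leaf_names(sib)
        ]
        return {known[name] for name in sisters if name in known}
    raise KeyError(f"{query!r} is not in the tree")


# Toy version of Example A: a duplication produced paralog groups A and B.
tree_a = ((("1A", "2A"), "3A"), (("1B", "2B"), "3B"))
known_a = {"1A": "function A", "3A": "function A",
           "1B": "function B", "2B": "function B", "3B": "function B"}
print(predict_function(tree_a, "2A", known_a))

# Toy ambiguous case: the query's closest annotated relatives disagree.
tree_b = (("X", ("g1", "g2")), "g3")
known_b = {"g1": "kinase", "g2": "phosphatase"}
print(predict_function(tree_b, "X", known_b))
```

In the first case the query groups with its orthologs rather than with the paralogs from the other side of the duplication, which is the situation where a tree-based call can differ from a best-hit call.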
  54. [Figure: phylogenetic tree of haloarchaeal genomes (scale bar 0.01, support values on branches). Taxa: Halorubrum lacusprofundi, Haloquadratum walsbyi, Halogeometricum borinquense, Haloferax mediterranei, Haloferax mucosum, Haloferax volcanii, Haloferax sulfurifontis, Haloferax denitrificans, Halalkalicoccus jeotgali, Halopiger xanaduensis, Natrialba magadii, Haloterrigena turkmenica, Halobacterium sp. NRC 1, Halobacterium salinarum R1, Natronomonas pharaonis, Halorhabdus utahensis, Halomicrobium mukohataei, Haloarcula vallismortis, Haloarcula marismortui, Haloarcula sinaiiensis, Haloarcula californiae. Legend, dataset genes: MA ammonia-lyase, MA mutase S subunit, MA mutase E subunit, PHA synthase, cellulase, CRISPRs, CAS. Color ranges: New Genomes.]
  55. Haloarchaea TBPs. [Figure: TBP gene tree with support values.] Figure 8. Independent expansion of the TATA-binding protein family in two haloarchaeal genera. Phylogeny of TATA-binding protein (TBP) homologs identified by RAST with bootstrap values shown. Colored branches represent duplication events (with the dark blue branch representing four duplications). Ancestral TBP (found in all genomes) is shown on the purple branch. Successive duplications are shown in darkening shades of green (Halobacterium) or blue (Haloferax). Lynch et al. in preparation
  56. Massive Diversity of Proteorhodopsins. Venter et al., 2004
  57. Characterizing the niche-space distributions of components. Metagenomics DARPA. Figure 3: a) Niche-space distributions for our five components ($H^T$); b) the site-similarity matrix ($\hat{H}^T \hat{H}$); c) environmental variables for the sites (chlorophyll, water depth, salinity, temperature, sample depth, insolation). The matrices are aligned so that the same row corresponds to the same site in each matrix. Sites are ordered by applying spectral reordering to the similarity matrix (see Materials and Methods). [Figure: rows are sampling sites labelled by region, GS identifier, and habitat, e.g. Sargasso Sea_GS001c_Open Ocean; columns in panel a are Components 1 to 5; the legend bins general variables as High, Medium, Low, or NA and water depth as 0-20m, 20-100m, 100-200m, 900-2000m, 2000-4000m, and >4000m.]
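The site-similarity matrix in panel b can be illustrated with a small sketch. It assumes that H is a components-by-sites matrix of non-negative weights and that the hat denotes scaling each site's column to unit length; both are assumptions made here for illustration, not details given on the slide. The toy values and the function names are hypothetical, and the spectral reordering step is not shown.

```python
import math


def normalize_columns(H):
    """Scale each column (one column per site) of H to unit Euclidean length."""
    n_rows, n_cols = len(H), len(H[0])
    norms = [
        math.sqrt(sum(H[k][j] ** 2 for k in range(n_rows)))
        for j in range(n_cols)
    ]
    return [
        [H[k][j] / norms[j] if norms[j] > 0 else 0.0 for j in range(n_cols)]
        for k in range(n_rows)
    ]


def site_similarity(H):
    """Return the sites-by-sites matrix H_hat^T H_hat."""
    H_hat = normalize_columns(H)
    n_rows, n_cols = len(H_hat), len(H_hat[0])
    return [
        [
            sum(H_hat[k][i] * H_hat[k][j] for k in range(n_rows))
            for j in range(n_cols)
        ]
        for i in range(n_cols)
    ]


# Toy example: 2 components (rows) by 3 sites (columns); values are made up.
H = [
    [0.9, 0.8, 0.0],
    [0.1, 0.2, 1.0],
]
S = site_similarity(H)
```

With this normalization each entry of `S` is the cosine similarity between two sites' component profiles, so sites dominated by the same component score close to 1.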
  58. Uses of Phylogeny in Genomics and Metagenomics. Example 3: Selecting Organisms for Study
  59. rRNA Tree of Life: Bacteria, Archaea, Eukaryotes. Figure from Barton, Eisen et al. “Evolution”, CSHL Press. 2007. Based on tree from Pace 1997 Science 276:734-740
  60. As of 2002 • At least 40 phyla of bacteria. [Figure: tree of bacterial phyla: Proteobacteria, TM6, OS-K, Acidobacteria, Termite Group, OP8, Nitrospira, Bacteroides, Chlorobi, Fibrobacteres, Marine Group A, WS3, Gemmimonas, Firmicutes, Fusobacteria, Actinobacteria, OP9, Cyanobacteria, Synergistes, Deferribacteres, Chrysiogenetes, NKB19, Verrucomicrobia, Chlamydia, OP3, Planctomycetes, Spirochaetes, Coprothermobacter, OP10, Thermomicrobia, Chloroflexi, TM7, Deinococcus-Thermus, Dictyoglomus, Aquificae, Thermodesulfobacteria, Thermotogae, OP1, OP11.] Based on Hugenholtz, 2002
  61. As of 2002 • At least 40 phyla of bacteria • Most genomes from three phyla. [Figure: same tree of bacterial phyla as slide 60.] Based on Hugenholtz, 2002
  62. As of 2002 • At least 40 phyla of bacteria • Most genomes from three phyla • Some studies in other phyla. [Figure: same tree of bacterial phyla as slide 60.] Based on Hugenholtz, 2002
  63. As of 2002 • At least 40 phyla of bacteria • Most genomes from three phyla • Some other phyla are only sparsely sampled • Same trend in Eukaryotes. [Figure: same tree of bacterial phyla as slide 60.] Based on Hugenholtz, 2002
  64. As of 2002 • At least 40 phyla of bacteria • Most genomes from three phyla • Some other phyla are only sparsely sampled • Same trend in Viruses. [Figure: same tree of bacterial phyla as slide 60.] Based on Hugenholtz, 2002
  66. http://www.jgi.doe.gov/programs/GEBA/pilot.html
  67. GEBA Pilot Project: Components
      • Project overview (Phil Hugenholtz, Nikos Kyrpides, Jonathan Eisen, Eddy Rubin, Jim Bristow)
      • Project management (David Bruce, Eileen Dalin, Lynne Goodwin)
      • Culture collection and DNA prep (DSMZ, Hans-Peter Klenk)
      • Sequencing and closure (Eileen Dalin, Susan Lucas, Alla Lapidus, Mat Nolan, Alex Copeland, Cliff Han, Feng Chen, Jan-Fang Cheng)
      • Annotation and data release (Nikos Kyrpides, Victor Markowitz, et al.)
      • Analysis (Dongying Wu, Kostas Mavrommatis, Martin Wu, Victor Kunin, Neil Rawlings, Ian Paulsen, Patrick Chain, Patrik D’Haeseleer, Sean Hooper, Iain Anderson, Amrita Pati, Natalia N. Ivanova, Athanasios Lykidis, Adam Zemla)
      • Adopt a microbe education project (Cheryl Kerfeld)
      • Outreach (David Gilbert)
      • $$$ (DOE, Eddy Rubin, Jim Bristow)
  68. GEBA Lesson 1: Phylogeny-driven genome selection (and phylogenetics) improves genome annotation
      • Took 56 GEBA genomes and compared results vs. 56 randomly sampled new genomes
      • Better definition of protein family sequence “patterns”
      • Greatly improves “comparative” and “evolutionary” based predictions
      • Conversion of hypotheticals into conserved hypotheticals
      • Linking distantly related members of protein families
      • Improved non-homology prediction
  69. GEBA Lesson 2: Phylogeny-driven genome selection helps discover new genetic diversity
  70. Protein Family Rarefaction Curves
      • Take data set of multiple complete genomes
      • Identify all protein families using MCL
      • Plot # of genomes vs. # of protein families
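The three steps above can be sketched in a few lines. This is a minimal illustration that assumes the clustering step (MCL on the slide) has already been run and its output reduced to a mapping from each genome to the set of protein family identifiers it contains. The genome names, family names, the function `rarefaction_curve`, and the choice to average over random genome orderings are all assumptions for illustration, not the method used in the talk.

```python
import random


def rarefaction_curve(genome_families, n_permutations=100, seed=0):
    """Mean number of distinct protein families seen after adding
    1, 2, ..., N genomes, averaged over random genome orderings."""
    rng = random.Random(seed)
    genomes = list(genome_families)
    totals = [0] * len(genomes)
    for _ in range(n_permutations):
        rng.shuffle(genomes)
        seen = set()
        for i, genome in enumerate(genomes):
            seen |= genome_families[genome]  # families found so far
            totals[i] += len(seen)
    return [total / n_permutations for total in totals]


# Hypothetical input: genome -> set of protein family IDs (e.g. from clustering).
genome_families = {
    "genome_A": {"fam1", "fam2", "fam3"},
    "genome_B": {"fam2", "fam3", "fam4"},
    "genome_C": {"fam1", "fam5"},
}
curve = rarefaction_curve(genome_families)
# curve[i] is the y-value (# of protein families) for x = i + 1 genomes.
```

A curve that keeps climbing as genomes are added suggests that new genomes are still contributing previously unseen families; a curve that flattens suggests the sampled diversity is close to saturated.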
  71. Wu et al. 2009 Nature 462, 1056-1060
  76. Synapomorphies exist. Wu et al. 2009 Nature 462, 1056-1060
  77. Families/PD (protein families per unit of phylogenetic diversity) not uniform. [Figure labels: 31, 6]
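As a rough illustration of the ratio on this slide, the sketch below assumes that PD stands for phylogenetic diversity, computed here as the total branch length on the rooted subtree connecting a set of taxa to the root. The tree encoding, branch lengths, taxon names, family sets, and function names are all hypothetical.

```python
def phylogenetic_diversity(tree, taxa):
    """Sum branch lengths over the union of tip-to-root paths for the taxa.

    `tree` maps each node to (parent, length of the branch above the node);
    the root has parent None and branch length 0.
    """
    visited = set()
    total = 0.0
    for taxon in taxa:
        node = taxon
        # Walk toward the root, counting each branch only once.
        while node is not None and node not in visited:
            visited.add(node)
            parent, length = tree[node]
            total += length
            node = parent
    return total


def families_per_pd(tree, families, taxa):
    """Number of distinct protein families divided by PD (taxa must be non-empty)."""
    n_families = len(set().union(*(families[t] for t in taxa)))
    return n_families / phylogenetic_diversity(tree, taxa)


# Hypothetical toy tree: ((A:0.25, B:0.25):0.5, C:1.0)
tree = {
    "root": (None, 0.0),
    "n1": ("root", 0.5),
    "A": ("n1", 0.25),
    "B": ("n1", 0.25),
    "C": ("root", 1.0),
}
families = {
    "A": {"f1", "f2"},
    "B": {"f2", "f3"},
    "C": {"f4", "f5", "f6"},
}
ratio_ab = families_per_pd(tree, families, ["A", "B"])
ratio_ac = families_per_pd(tree, families, ["A", "C"])
```

In this toy case the two taxon sets give different ratios, which is the kind of non-uniformity the slide title refers to.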
  78. GEBA Lesson 3: Improves analysis of genome data from uncultured organisms
  79. Shotgun Sequencing Allows Use of Other Markers • GEBA Project improves metagenomic analysis. [Figure: "Sargasso Phylotypes", weighted % of clones (y-axis, 0 to 0.500) by major phylogenetic group (x-axis) for the markers EFG, EFTu, HSP70, RecA, RpoB, and rRNA.] Venter et al., Science 304: 66-74. 2004
