Phylogeny-Driven Approaches to Genomics and Metagenomics - talk by Jonathan Eisen at Fresno State May 6, 2013

Published in: Science, Technology

  1. 1. Phylogeny-Driven Approaches to Genomics and Metagenomics. Jonathan A. Eisen. May 6, 2013. Talk at Fresno State University.
  2. 2. Phylogeny
  3. 3. Whatever the History: Try to Incorporate It from Lake et al. doi: 10.1098/rstb.2009.0035
  4. 4. DNA sequencing
  5. 5. My Obsessions
  6. 6. My Obsessions
  7. 7. My Obsessions
  8. 8. My Obsessions
  9. 9. The Importance of History. Jonathan A. Eisen. May 6, 2013. Talk at Fresno State University.
  10. 10. Era I: The Tree of Life
  11. 11. Ernst Haeckel 1866: Plantae, Protista, Animalia (image: www.mblwhoilibrary.org)
  12. 12. Whittaker – Five Kingdoms 1969: Monera, Protista, Plantae, Fungi, Animalia
  13. 13. Tree from Woese. 1987. Microbiological Reviews 51:221 Woese - Three Domains 1977
  14. 14. My Obsessions in Graduate School Tree from Woese. 1987. Microbiological Reviews 51:221
  15. 15. Figure from Barton, Eisen et al. “Evolution”, CSHL Press. 2007. Based on tree from Pace 1997 Science 276:734-740 Tree Updated
  16. 16. adapted from Baldauf, et al., in Assembling the Tree of Life, 2004 Tree Updated
  17. 17. My Obsessions Stayed Tree from Woese. 1987. Microbiological Reviews 51:221
  18. 18. Limited Sampling of RRR Studies Tree from Woese. 1987. Microbiological Reviews 51:221
  19. 19. My Study Organisms Tree from Woese. 1987. Microbiological Reviews 51:221
  20. 20. Halophiles
  21. 21. E. coli vs. H. volcanii UV survival [Plot: relative survival (log scale, 1E-07 to 1) versus UV dose (0 to 400 J/m2) for H. volcanii WFD11, E. coli NR10125 mfd+, and E. coli NR10121 mfd-]
  22. 22. RecA vs. rRNA Eisen 1995 Journal of Molecular Evolution 41: 1105-1123.
  23. 23. Era II: rRNA in the Environment
  24. 24. PCR and phylogenetic analysis of rRNA genes [Workflow figure: DNA extraction → PCR (makes lots of copies of the rRNA genes in sample) → sequence rRNA genes → sequence alignment = data matrix → phylogenetic tree. Example environmental sequences: rRNA1 5’...ACACACATAGGTGGAGCTAGCGATCGATCGA... 3’; rRNA2 5’..TACAGTATAGGTGGAGCTAGCGACGATCGA... 3’; rRNA3 5’...ACGGCAAAATAGGTGGATTCTAGCGATATAGA... 3’; rRNA4 5’...ACGGCCCGATAGGTGGATTCTAGCGCCATAGA... 3’. Data matrix rows (others not recoverable): rRNA3 CACTGT; rRNA4 CACAGT; Yeast TACAGT. Tree leaves: E. coli, Humans, Yeast, rRNA1, rRNA2, rRNA3, rRNA4]
  25. 25. Phylotyping [Same figure as slide 24: PCR and phylogenetic analysis of rRNA genes, now labeled "Phylotyping"]
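The "sequence alignment = data matrix" step on these slides can be illustrated with a minimal sketch. It assumes the three aligned rows that are recoverable from the slide (rRNA3, rRNA4, Yeast) and computes pairwise p-distances (the fraction of differing columns), one simple input for distance-based tree building. The function names are hypothetical, and this is not necessarily the method used in the talk.

```python
from itertools import combinations


def p_distance(seq_a, seq_b):
    """Fraction of aligned positions at which two sequences differ."""
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have equal length")
    mismatches = sum(1 for a, b in zip(seq_a, seq_b) if a != b)
    return mismatches / len(seq_a)


def distance_matrix(alignment):
    """Pairwise p-distances for every pair of rows in the data matrix."""
    return {
        (x, y): p_distance(alignment[x], alignment[y])
        for x, y in combinations(alignment, 2)
    }


# Rows taken from the slide's data matrix (only these three are recoverable).
alignment = {"rRNA3": "CACTGT", "rRNA4": "CACAGT", "Yeast": "TACAGT"}
distances = distance_matrix(alignment)

for pair, d in distances.items():
    print(pair, round(d, 3))
```

In this toy matrix rRNA4 is equally close to rRNA3 and to Yeast, while rRNA3 and Yeast are the most distant pair; real analyses would use full-length alignments and a substitution model rather than raw p-distances.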
  26. 26. rRNA Phylotyping • OTUs • Taxonomic lists • Relative abundance of taxa • Ecological metrics (alpha / beta diversity) • Phylogenetic metrics • Binning • Identification of novel groups • Clades • Rates of change • LGT • Convergence • PD • Phylogenetic ecology (e.g., UniFrac)
  27. 27. Chemosynthetic Symbionts Eisen et al. 1992. J. Bact. 174: 3416
  28. 28. RecA from Environment? Eisen 1995 Journal of Molecular Evolution 41: 1105-1123.
  29. 29. Sequencing Has Gone Crazy [Timeline figure "Approaching to NGS", from Slideshare presentation of Cosentino Cristian http://www.slideshare.net/cosentia/high-throughput-equencing] • 1953: Discovery of DNA structure (Cold Spring Harb. Symp. Quant. Biol. 1953;18:123-31) • 1977: Sanger sequencing method by F. Sanger (PNAS, 1977, 74: 560-564) • 1983: PCR by K. Mullis (Cold Spring Harb Symp Quant Biol. 1986;51 Pt 1:263-73) • 1993: Development of pyrosequencing (Anal. Biochem., 1993, 208: 171-175; Science, 1998, 281: 363-365) • 1998: Single molecule emulsion PCR • Human Genome Project (Nature, 2001, 409: 860–92; Science, 2001, 291: 1304–1351) • 2000: Founded 454 Life Science • 2005: 454 GS20 sequencer (first NGS sequencer) • 1998: Founded Solexa • 2006: Solexa Genome Analyzer (first short-read NGS sequencer) • 2008: GS FLX sequencer (NGS with 400-500 bp read length) • 2010: Hi-Seq2000 (200 Gbp per flow cell) • 2006: Illumina acquires Solexa (Illumina enters the NGS business) • 2007: ABI SOLiD (short-read sequencer based upon ligation) • 2007: Roche acquires 454 Life Sciences (Roche enters the NGS business) • 2008: NGS Human Genome sequencing (first human genome sequencing based upon NGS technology) • Newer platforms named on the slide: Miseq, Roche Jr, Ion Torrent, PacBio, Oxford
  30. 30. Phylotyping Revolution • More PCR products • Deeper sequencing • The rare biosphere • Relative abundance estimates • More samples (with barcoding) • Time series • Spatially diverse sampling • Fine scale sampling
  31. 31. Beta-Diversity [Slide shows pages from: Jennifer B. H. Martiny, Jonathan A. Eisen, Kevin Penn, Steven D. Allison, and M. Claire Horner-Devine. Drivers of bacterial β-diversity depend on spatial scale. PNAS, May 10, 2011, vol. 108, no. 19, 7850–7854. www.pnas.org. Section: Ecology. Text cropped on the slide is marked [...].]
Affiliations: Department of Ecology and Evolutionary Biology, and Department of Earth System Science, University of California, Irvine, CA 92697; Department of Evolution and Ecology, University of California Davis Genome Center, Davis, CA 95616; Center for Marine Biotechnology and Biomedicine, The Scripps Institution of Oceanography, University of California at San Diego, La Jolla, CA 92093; and School of Aquatic and Fishery Sciences, University of Washington, [...]
Introduction excerpt: [...] community composition) yield insights into the maintenance of biodiversity. These studies are still relatively rare for microorganisms, however, and thus our understanding of the mechanisms underlying microbial diversity (most of the tree of life) remains limited. β-Diversity, and therefore distance-decay patterns, could be driven solely by differences in environmental conditions across space, a hypothesis summed up by microbiologists as, “everything is everywhere; the environment selects” (10). Under this model, a distance-decay curve is observed because environmental variables tend to be spatially autocorrelated, and organisms with differing niche preferences are selected from the available pool of taxa as the environment changes with distance. Dispersal limitation can also give rise to β-diversity, as it permits historical contingencies to influence present-day biogeographic patterns. For example, neutral niche models, in which an organism’s abundance is not influenced by its environmental preferences, predict a distance-decay curve (8, 11). On relatively short time scales, stochastic births and deaths contribute to a heterogeneous distribution of taxa (ecological drift). On longer time scales, stochastic genetic processes allow for taxon diversification across the landscape (evolutionary drift). If dispersal is limiting, then current environmental or biotic conditions will not fully explain the distance-decay curve, and thus geographic distance will be correlated with community similarity even after controlling for other factors (2). For macroorganisms, the relative contribution of environmental factors or dispersal limitation to β-diversity depends on [...] vary by spatial scale? Because most bac[...] and hardy, we predicted that dispers[...] primarily across continents, resulting [...] microbial “provinces” (15). At the sam[...] environmental factors would contrib[...] decay at all scales, resulting in the steep[...] scale as reported in plant and animal c[...]
Results and Discussion excerpt: We characterized AOB community co[...] Sanger sequencing of 16S rRNA gene[...] primer sets. Here we focus on the resu[...] sequences from the order Nitrosomo[...] primers specific for AOB within the β-[...] The second primer set (18) generate[...] a broader range of Proteobacteria, but yielded similar results (Fig. S1 and Tables S2 and S3). Across all samples, we identified 4,931 quality Nitrosomonadales sequences, which grouped into 176 OTUs (operational taxonomic units) using an arbitrary 99% sequence similarity cutoff. This cutoff retained a high amount of sequence diversity, but minimized the chance of including diversity because of sequencing or PCR errors. Most (95%) of the sequences appear closely related either to the marine Nitrosospira-like clade, known to be abundant in estuarine sediments (e.g., ref. 19) or to marine bacterium C-17, classified as Nitrosomonas (20) (Fig. S2). Pairwise community similarity between the samples was calculated based on the presence or absence of each OTU using a rarefied Sørensen’s index (4). Community similarity using this incidence index was highly correlated with the abundance-based Sørensen index (Mantel test: ρ = 0.9239; P = 0.0001) (21). A plot of community similarity versus geographic distance for [...] Nitrosomonadales community similarity. Geographic distance contributed the largest partial regression coefficient (b = 0.40, P < 0.0001), with sediment moisture, nitrate concentration, plant cover, salinity, and air and water temperature contributing to smaller, but significant, partial regression coefficients (b = 0.09–0.17, P < 0.05) (Table 1). Because salt marsh bacteria may be dispersing through ocean currents, we also used a global ocean circulation model (23), as applied previously (24), to estimate relative dispersal times of hypothetical microbial cells between [...]
Fig. 1. The 13 marshes sampled (see Table S1 for details). Marshes compared with one another within regions are circled. (Inset) The arrangement of sampling points within marshes. Six points were sampled along a 100-m transect, and a seventh point was sampled ∼1 km away. Two marshes in the Northeast United States (outlined stars) were sampled more intensively, along four 100-m transects in a grid pattern.
Fig. 2. Distance-decay curves for the Nitrosomonadales communities. The dashed, blue line denotes the least-squares linear regression across all spatial scales. The solid lines denote separate regressions within each of the three spatial scales: within marshes, regional (across marshes within regions circled in Fig. 1), and continental (across regions). The slopes of all lines (except the solid light blue line) are significantly less than zero. The slopes of the solid red lines are significantly different from the slope of the all scale (blue dashed) line.
Article notes: Author contributions: J.B.H.M. and M.C.H.-D. designe[...] M.C.H.-D. performed research; J.B.H.M., S.D.A., and M[...] and M.C.H.-D. wrote the paper. The authors declare no conflict of interest. This article is a PNAS Direct Submission. Freely available online through the PNAS open acces[...] Data deposition: The sequences reported in this pap[...] Bank database (accession nos. HQ271472–HQ276885[...] This article contains supporting information online at [...]1073/pnas.1016308108/-/DCSupplemental.
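The excerpt above mentions an incidence-based Sørensen index and a least-squares distance-decay regression. The following is a minimal sketch of both ideas, not the paper's pipeline: it omits rarefaction and any data transformation, and the sample positions and OTU sets are hypothetical illustration data.

```python
from itertools import combinations


def sorensen(otus_a, otus_b):
    """Incidence-based Sørensen similarity: 2 * shared / (size_a + size_b)."""
    a, b = set(otus_a), set(otus_b)
    if not a and not b:
        raise ValueError("at least one sample must contain an OTU")
    return 2 * len(a & b) / (len(a) + len(b))


def least_squares_slope(xs, ys):
    """Slope of the ordinary least-squares line through the points (x, y)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx


# Hypothetical samples: position along a transect (km) and the OTUs observed.
samples = {
    "A": (0.0, {"otu1", "otu2", "otu3", "otu4"}),
    "B": (1.0, {"otu2", "otu3", "otu4", "otu5"}),
    "C": (10.0, {"otu4", "otu5", "otu6", "otu7"}),
}

distances, similarities = [], []
for s, t in combinations(samples, 2):
    distances.append(abs(samples[s][0] - samples[t][0]))
    similarities.append(sorensen(samples[s][1], samples[t][1]))

slope = least_squares_slope(distances, similarities)
print("distance-decay slope:", round(slope, 4))
```

A negative slope means community similarity falls as geographic distance grows, which is the pattern a distance-decay curve summarizes.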
  32. 32. Drosophila microbiome
  33. 33. The Built Environment [Slide shows pages from two papers and a news article. Text cropped on the slide is marked [...].]
Paper 1: Steven W Kembel, Evan Jones, Jeff Kline, Dale Northcutt, Jason Stenson, Ann M Womack, Brendan JM Bohannan, G Z Brown and Jessica L Green. Architectural design influences the diversity and structure of the built environment microbiome. Original article, The ISME Journal (2012), 1–11; advance online publication, 26 January 2012; doi:10.1038/ismej.2011.211. www.nature.com/ismej. Affiliations: Biology and the Built Environment Center, Institute of Ecology and Evolution, Department of Biology, University of Oregon, Eugene, OR, USA; Energy Studies in Buildings Laboratory, Department of Architecture, University of Oregon, Eugene, OR, USA; Santa Fe Institute, Santa Fe, NM, USA.
Abstract: Buildings are complex ecosystems that house trillions of microorganisms interacting with each other, with humans and with their environment. Understanding the ecological and evolutionary processes that determine the diversity and composition of the built environment microbiome (the community of microorganisms that live indoors) is important for understanding the relationship between building design, biodiversity and human health. In this study, we used high-throughput sequencing of the bacterial 16S rRNA gene to quantify relationships between building attributes and airborne bacterial communities at a health-care facility. We quantified airborne bacterial community structure and environmental conditions in patient rooms exposed to mechanical or window ventilation and in outdoor air. The phylogenetic diversity of airborne bacterial communities was lower indoors than outdoors, and mechanically ventilated rooms contained less diverse microbial communities than did window-ventilated rooms. Bacterial communities in indoor environments contained many taxa that are absent or rare outdoors, including taxa closely related to potential human pathogens. Building attributes, specifically the source of ventilation air, airflow rates, relative humidity and temperature, were correlated with the diversity and composition of indoor bacterial communities. The relative abundance of bacteria closely related to human pathogens was higher indoors than outdoors, and higher in rooms with lower airflow rates and lower relative humidity. The observed relationship between building design and airborne bacterial diversity suggests that we can manage indoor environments, altering through building design and operation the community of microbial species that potentially colonize the human microbiome during our time indoors.
Subject Category: microbial population and community ecology. Keywords: aeromicrobiology; bacteria; built environment microbiome; community ecology; dispersal; environmental filtering. Introduction: [...] microbiome [...] includes human pathogens and commensals interacting with each other and with their [...]
Paper 2: Flores GE, Bates ST, Knights D, Lauber CL, Stombaugh J, et al. (2011) Microbial Biogeography of Public Restroom Surfaces. PLoS ONE 6(11): e28132. doi:10.1371/journal.pone.0028132. Authors: Gilberto E. Flores, Scott T. Bates, Dan Knights, Christian L. Lauber, Jesse Stombaugh, Rob Knight, Noah Fierer (noah.fierer@colorado.edu). Affiliations: Cooperative Institute for Research in Environmental Science; Department of Computer Science; Department of Chemistry and Biochemistry; Howard Hughes Medical Institute; Department of Ecology and Evolutionary Biology; all University of Colorado, Boulder, Colorado, United States of America. Editor: Mark R. Liles, Auburn University, United States of America. Received September 12, 2011; Accepted November 1, 2011; Published November 23, 2011. Copyright: © 2011 Flores et al. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited. Funding: This work was supported with funding from the Alfred P. Sloan Foundation and their Indoor Environment program, and in part by the National Institutes of Health and the Howard Hughes Medical Institute. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript. Competing Interests: The authors have declared that no competing interests exist.
Abstract: We spend the majority of our lives indoors where we are constantly exposed to bacteria residing on surfaces. However, the diversity of these surface-associated communities is largely unknown. We explored the biogeographical patterns exhibited by bacteria across ten surfaces within each of twelve public restrooms. Using high-throughput barcoded pyrosequencing of the 16S rRNA gene, we identified 19 bacterial phyla across all surfaces. Most sequences belonged to four phyla: Actinobacteria, Bacteroidetes, Firmicutes and Proteobacteria. The communities clustered into three general categories: those found on surfaces associated with toilets, those on the restroom floor, and those found on surfaces routinely touched with hands. On toilet surfaces, gut-associated taxa were more prevalent, suggesting fecal contamination of these surfaces. Floor surfaces were the most diverse of all communities and contained several taxa commonly found in soils. Skin-associated bacteria, especially the Propionibacteriaceae, dominated surfaces routinely touched with our hands. Certain taxa were more common in female than in male restrooms as vagina-associated Lactobacillaceae were widely distributed in female restrooms, likely from urine contamination. Use of the SourceTracker algorithm confirmed many of our taxonomic observations as human skin was the primary source of bacteria on restroom surfaces. Overall, these results demonstrate that restroom surfaces host relatively diverse microbial communities dominated by human-associated bacteria with clear linkages between communities on or in different body sites and those communities found on restroom surfaces. More generally, this work is relevant to the public health field as we show that human-associated microbes are commonly found on restroom surfaces suggesting that bacterial pathogens could readily be transmitted between individuals by the touching of surfaces. Furthermore, we demonstrate that we can use high-throughput analyses of bacterial communities to determine sources of bacteria on indoor surfaces, an approach which could be used to track pathogen transmission and test the efficacy of hygiene practices.
Introduction excerpt: More than ever, individuals across the globe spend a large portion of their lives indoors, yet relatively little is known about the microbial diversity of indoor environments. Of the studies that have examined microorganisms associated with indoor environments, most have relied upon cultivation-based techniques to detect organisms residing on a variety of household surfaces [1–5]. Not surprisingly, these studies have identified surfaces in kitchens and restrooms as being hot spots of bacterial contamination. Because several pathogenic bacteria are known to survive on surfaces for extended periods of time [6–8], these studies are of obvious importance in preventing the spread of human disease. However, it is now widely recognized that the majority of [...] communities and revealed a greater diversity of bacteria on indoor surfaces than captured using cultivation-based techniques [10–13]. Most of the organisms identified in these studies are related to human commensals suggesting that the organisms are not actively growing on the surfaces but rather were deposited directly (i.e. touching) or indirectly (e.g. shedding of skin cells) by humans. Despite these efforts, we still have an incomplete understanding of bacterial communities associated with indoor environments because limitations of traditional 16S rRNA gene cloning and sequencing techniques have made replicate sampling and in-depth characterizations of the communities prohibitive. With the advent of high-throughput sequencing techniques, we can now investigate indoor microbial communities at an unprecedented depth and begin to understand the relationship [...]
Results excerpts: [...] high diversity of floor communities is likely due to the frequency of contact with the bottom of shoes, which would track in a diversity of microorganisms from a variety of sources including soil, which is known to be a highly-diverse microbial habitat [27,39]. Indeed, bacteria commonly associated with soil (e.g. Rhodobacteraceae, Rhizobiales, Microbacteriaceae and Nocardioidaceae) were, on average, [...] related differences in the relative abundances of s[...] some surfaces (Figure 1B, Table S2). Most notably [...] were clearly more abundant on certain surfaces [...] restrooms than male restrooms (Figure 1B). Some family [...] are the most common, and often most abun[...] found in the vagina of healthy reproductive age w[...] the stall in), they were likely dispersed manually after women used the toilet. Coupling these observations with those of the distribution of gut-associated bacteria indicate that routine use of toilets results in the dispersal of urine- and fecal-associated bacteria throughout the restroom. While these results are not unexpected, they do highlight the importance of hand-hygiene when using public restrooms since these surfaces could also be potential vehicles for the transmission of human pathogens. Unfortunately, previous studies have documented that college students (who are likely the most frequent users of the studied restrooms) are not always the most diligent of hand-washers [42,43]. Results of SourceTracker analysis support the taxonomic patterns highlighted above, indicating that human skin was the primary source of bacteria on all public restroom surfaces examined, while the human gut was an important source on or around the toilet, and urine was an important source in women’s restrooms (Figure 4, Table S4). Contrary to expectations (see above), soil was not identified by the SourceTracker algorithm as being a major source of bacteria on any of the surfaces, including floors (Figure 4). Although the floor samples contained family-level taxa that are common in soil, the SourceTracker algorithm probably underestimates the relative importance of sources, like [...]
Figure 2. Relationship between bacterial communities associated with ten public restroom surfaces. Communities were [...] PCoA of the unweighted UniFrac distance matrix. Each point represents a single sample. Note that the floor (triangles) and toilet (as[...] form clusters distinct from surfaces touched with hands. doi:10.1371/journal.pone.0028132.g002
Figure 3. Cartoon illustrations of the relative abundance of discriminating taxa on public restroom surfaces. Light blue indicates low abundance while dark blue indicates high abundance of taxa. (A) Although skin-associated taxa (Propionibacteriaceae, Corynebacteriaceae, Staphylococcaceae and Streptococcaceae) were abundant on all surfaces, they were relatively more abundant on surfaces routinely touched with hands. (B) Gut-associated taxa (Clostridiales, Clostridiales group XI, Ruminococcaceae, Lachnospiraceae, Prevotellaceae and Bacteroidaceae) were most abundant on toilet surfaces. (C) Although soil-associated taxa (Rhodobacteraceae, Rhizobiales, Microbacteriaceae and Nocardioidaceae) were in low abundance on all restroom surfaces, they were relatively more abundant on the floor of the restrooms we surveyed. Figure not drawn to scale. doi:10.1371/journal.pone.0028132.g003
News article excerpt (dated February 9, 2012; the opening column is cropped beyond recovery): [...] they move around. But to quantify those contributions, Peccia’s team has had to develop new methods to collect airborne bacteria and extract their DNA, as the microbes are much less abundant in air than on surfaces. In one recent study, they used air filters to sample airborne particles and microbes in a classroom during 4 days during which [...] pant in indoor microbial ecology research, Peccia thinks that the field has yet to gel. And the Sloan Foundation’s Olsiewski shares some of his concern. “Everybody’s generating vast amounts of data,” she says, but looking across data sets can be difficult because groups choose different analytical tools. With Sloan support, though, a data archive and integrated analytical tools are in the works. To foster collaborations between microbiologists, architects, and building scientists, the foundation also sponsored a symposium [...]
[Chart: average contribution (%) of each source (soil, water, mouth, urine, gut, skin) to the bacteria on each surface: door in, door out, stall in, stall out, faucet handles, soap dispenser, toilet seat, toilet flush handle, toilet floor, sink floor.] Bathroom biogeography. By swabbing different surfaces in public restrooms, researchers determined that microbes vary in where they come from depending on the surface (chart).
  34. 34. Earth Microbiome Project
  35. 35. Era III: Genomics
  36. 36. 1995: 1st Genome Sequence Fleischmann et al. 1995
  37. 37. My Study Organisms Tree from Woese. 1987. Microbiological Reviews 51:221
  38. 38. TIGR Genome Projects Tree from Woese. 1987. Microbiological Reviews 51:221
  39. 39. TIGR Genome Projects Tree from Woese. 1987. Microbiological Reviews 51:221
  40. 40. If you can’t beat them, critique them ... Fleischmann et al. 1995
  41. 41. Helicobacter pylori genome 1997
  42. 42. Helicobacter pylori genome sequenced 1997 “The ability of H. pylori to perform mismatch repair is suggested by the presence of methyl transferases, mutS and uvrD. However, orthologues of MutH and MutL were not identified.”
  43. 43. MutL?? From http://asajj.roswellpark.org/huberman/dna_repair/mmr.html
  44. 44. Blast Search of H. pylori “MutS”
      Sequences producing significant alignments:        Score (bits)   E Value
      sp|P73625|MUTS_SYNY3 DNA MISMATCH REPAIR PROTEIN   117            3e-25
      sp|P74926|MUTS_THEMA DNA MISMATCH REPAIR PROTEIN   69             1e-10
      sp|P44834|MUTS_HAEIN DNA MISMATCH REPAIR PROTEIN   64             3e-09
      sp|P10339|MUTS_SALTY DNA MISMATCH REPAIR PROTEIN   62             2e-08
      sp|O66652|MUTS_AQUAE DNA MISMATCH REPAIR PROTEIN   57             4e-07
      sp|P23909|MUTS_ECOLI DNA MISMATCH REPAIR PROTEIN   57             4e-07
      • Blast search pulls up Syn. sp MutS#2 with a much higher score (and much lower E value) than other MutS homologs • Based on this TIGR predicted this species had mismatch repair
      Based on Eisen et al. 1997 Nature Medicine 3: 1076-1078.
  45. 45. Tree of MutS Family [Tree figure; leaf labels include Aquae, Trepa, Fly, Xenla, Rat, Mouse, Human, Yeast, Neucr, Arath, Borbu, Strpy, Bacsu, Synsp, Ecoli, Neigo, Thema, Theaq, Deira, Chltr, Spombe, Celeg, Metth, Helpy, mSaco; several species appear more than once] Based on Eisen, 1998 Nucl Acids Res 26: 4291-4300.
  46. 46. MutS Subfamilies [Same tree as slide 45, with clades labeled MSH4, MSH5, MutS2, MutS1, MSH1, MSH3, MSH6, MSH2] Based on Eisen, 1998 Nucl Acids Res 26: 4291-4300.
  47. 47. Overlaying Functions onto Tree [Same tree as slide 45, with clades labeled MSH4, MSH5, MutS2, MutS1, MSH1, MSH3, MSH6, MSH2] Based on Eisen, 1998 Nucl Acids Res 26: 4291-4300.
  48. 48. Functional Prediction Using Tree [Same tree as slide 45, with a function assigned to each subfamily] • MSH1 - Mitochondrial Repair • MSH2 - Eukaryotic Nuclear Mismatch and Loop Repair • MSH3 - Nuclear Repair Of Loops • MSH6 - Nuclear Repair Of Mismatches • MutS1 - Bacterial Mismatch and Loop Repair • MSH4 - Meiotic Crossing Over • MSH5 - Meiotic Crossing Over • MutS2 - Unknown Functions. Based on Eisen, 1998 Nucl Acids Res 26: 4291-4300.
  49. 49. Phylogenomic Functional Prediction. PHYLOGENETIC PREDICTION OF GENE FUNCTION. METHOD: CHOOSE GENE(S) OF INTEREST → IDENTIFY HOMOLOGS → ALIGN SEQUENCES → CALCULATE GENE TREE → OVERLAY KNOWN FUNCTIONS ONTO TREE → INFER LIKELY FUNCTION OF GENE(S) OF INTEREST [Figure: EXAMPLE A and EXAMPLE B follow genes 1-6 and paralogs 1A-3A and 1B-3B across Species 1, Species 2, and Species 3; a panel labeled ACTUAL EVOLUTION (ASSUMED TO BE UNKNOWN) shows a duplication; inferred trees are marked "Duplication?" and one inference is marked "Ambiguous"] Based on Eisen, 1998 Genome Res 8: 163-167.
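The "overlay known functions onto tree, then infer" step can be sketched in a few lines. The sketch below assumes a gene tree already exists and encodes it as nested tuples; the toy topology, the leaf names, and the rule used (take the function of the smallest clade around the query that contains annotated genes, and report "ambiguous" if those genes disagree) are illustrative assumptions, not the published method.

```python
def leaves(node):
    """All leaf names under a node; a leaf is a plain string."""
    if isinstance(node, str):
        return [node]
    found = []
    for child in node:
        found.extend(leaves(child))
    return found


def predict_function(tree, query, known):
    """Infer a function for `query` (which must be a leaf of `tree`)."""
    # Collect the nested clades that contain the query, from root to tip.
    path, node = [], tree
    while not isinstance(node, str):
        path.append(node)
        node = next(child for child in node if query in leaves(child))
    # Walk back from the smallest clade outward until annotations appear.
    for clade in reversed(path):
        functions = {known[leaf] for leaf in leaves(clade)
                     if leaf in known and leaf != query}
        if len(functions) == 1:
            return functions.pop()
        if len(functions) > 1:
            return "ambiguous"
    return "unknown"


# Hypothetical toy tree with two subfamilies plus one deep-branching gene.
muts1 = (("Ecoli_MutS1", "Bacsu_MutS1"), "Synsp_MutS1")
muts2 = (("Synsp_MutS2", "Helpy_MutS"), "Bacsu_MutS2")
tree = (muts1, muts2, "Orphan_MutS")

known = {
    "Ecoli_MutS1": "mismatch and loop repair",
    "Bacsu_MutS1": "mismatch and loop repair",
    "Synsp_MutS2": "unknown function",
    "Bacsu_MutS2": "unknown function",
}

for gene in ("Synsp_MutS1", "Helpy_MutS", "Orphan_MutS"):
    print(gene, "->", predict_function(tree, gene, known))
```

In this toy setup the H. pylori gene falls inside the MutS2 clade, so the sketch reports an unknown function rather than copying "mismatch repair" from the best-scoring database hit, which mirrors the point made on the preceding MutS slides.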
  50. 50. If you can’t beat them, use their data Fleischmann et al. 1995
  51. 51. Gain and Loss of Repair Genes [Figure: tree of BACTERIA (Ecoli, Haein, Neigo, Helpy, Bacsu, Strpy, Mycge, Mycpn, Borbu, Trepa, Synsp), ARCHAEA (Metjn, Arcfu, Metth), and EUKARYOTES (Human, Yeast), with a branch marked "from mitochondria". Branches are annotated with repair-gene events written as a prefix plus gene name, e.g., +RecA, +UvrABCD, +MutLS?, -MutLS, -LexA, -UmuC, dMutS, dMutL, tRad25; a "?" marks uncertain events. The assignment of individual events to branches is not recoverable from the transcript.] Eisen and Hanawalt, 1999 Mut Res 435: 171-213
  52. 52. Why critique them when you can join them ... Fleischmann et al. 1995
  53. 53. Whole Genome Shotgun Sequencing
  54. 54. Whole Genome Shotgun Sequencing
  55. 55. Whole Genome Shotgun Sequencing Warner Brothers, Inc.
  56. 56. Whole Genome Shotgun Sequencing shotgun Warner Brothers, Inc.
  57. 57. Whole Genome Shotgun Sequencing shotgun Warner Brothers, Inc.
  58. 58. Whole Genome Shotgun Sequencing shotgun sequence Warner Brothers, Inc.
  59. 59. Whole Genome Shotgun Sequencing shotgun sequence Warner Brothers, Inc.
  60. 60. Assemble Fragments
  61. 61. Assemble Fragments sequencer output
  62. 62. Assemble Fragments sequencer output
  63. 63. Assemble Fragments sequencer output assemble fragments
  64. 64. Assemble Fragments sequencer output assemble fragments Closure & Annotation
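The "assemble fragments" step in these slides can be illustrated with a toy greedy overlap merger. This is only a sketch under strong assumptions (error-free reads, one strand, no repeats); the function names and the example reads are hypothetical, and real assemblers are far more involved.

```python
def overlap(a, b, min_len=3):
    """Length of the longest suffix of `a` that equals a prefix of `b`."""
    for n in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:n]):
            return n
    return 0


def greedy_assemble(reads, min_len=3):
    """Repeatedly merge the pair of reads with the longest overlap."""
    reads = list(dict.fromkeys(reads))  # drop exact duplicates, keep order
    while len(reads) > 1:
        best_n, best_i, best_j = 0, None, None
        for i, a in enumerate(reads):
            for j, b in enumerate(reads):
                if i != j:
                    n = overlap(a, b, min_len)
                    if n > best_n:
                        best_n, best_i, best_j = n, i, j
        if best_n == 0:
            break  # no remaining overlaps: leave the rest as separate contigs
        merged = reads[best_i] + reads[best_j][best_n:]
        reads = [r for k, r in enumerate(reads) if k not in (best_i, best_j)]
        reads.append(merged)
    return reads


# Hypothetical shotgun reads from a 10-base "genome".
READS = ["ATGCGT", "GCGTAC", "GTACCA"]
```

Here `greedy_assemble(READS)` merges the three reads into a single contig; reads that share no overlap of at least `min_len` bases are left as separate contigs, which is roughly the situation that "closure" then has to resolve.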
  65. 65. Genome Sequences Have Revolutionized Microbiology • Predictions of metabolic processes • Better vaccine and drug design • New insights into mechanisms of evolution • Genomes serve as template for functional studies • New enzymes and materials for engineering and synthetic biology
  66. 66. From http://genomesonline.org
  67. 67. Phylogenetic Prediction of Function • Many powerful and automated similarity based methods for assigning genes to protein families • COGs • PFAM HMM searches • Some limitations of similarity based methods can be overcome by phylogenetic approaches • Automated methods now available • Sean Eddy • Steven Brenner • Kimmen Sjölander • But …
  68. 68. Carboxydothermus hydrogenoformans • Isolated from a Russian hotspring • Thermophile (grows at 80°C) • Anaerobic • Grows very efficiently on CO (Carbon Monoxide) • Produces hydrogen gas • Low GC Gram positive (Firmicute) • Genome Determined (Wu et al. 2005 PLoS Genetics 1: e65. )
  69. 69. Homologs of Sporulation Genes Wu et al. 2005 PLoS Genetics 1: e65.
  70. 70. Carboxydothermus sporulates Wu et al. 2005 PLoS Genetics 1: e65.
  71. 71. Non-Homology Predictions: Phylogenetic Profiling • Step 1: Search all genes in organisms of interest against all other genomes • Ask: Yes or No, is each gene found in each other species • Cluster genes by distribution patterns (profiles)
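The three bullets above can be sketched in a few lines. This is a minimal illustration, assuming the all-against-all search has already been run and summarized as a mapping from each gene to the set of genomes where a homolog was found. The gene and genome names are made up, and grouping only identical profiles is the simplest possible clustering, not necessarily what was used in the work cited on the following slides.

```python
from collections import defaultdict


def profile_clusters(presence):
    """Group genes whose yes/no distribution across genomes is identical."""
    genomes = sorted({g for hits in presence.values() for g in hits})
    clusters = defaultdict(list)
    for gene, hits in presence.items():
        # The profile is a tuple of 1 (found) or 0 (not found) per genome.
        profile = tuple(int(g in hits) for g in genomes)
        clusters[profile].append(gene)
    return genomes, dict(clusters)


# Hypothetical search results: gene -> genomes with a detectable homolog.
PRESENCE = {
    "geneA": {"genome1", "genome2"},
    "geneB": {"genome1", "genome2"},
    "geneC": {"genome1", "genome2", "genome3"},
    "geneD": {"genome1", "genome2"},
}
```

Genes A, B, and D share the profile (1, 1, 0) and fall into one cluster, which is the kind of grouping that lets an uncharacterized gene be associated with a process by its distribution alone.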
  72. 72. Sporulation Gene Profile Wu et al. 2005 PLoS Genetics 1: e65.
  73. 73. B. subtilis new sporulation genes Traag et al. 2013. J. Bact. 195: 253.
  74. 74. Era IV: Genomes in the Environment
  75. 75. Ed Delong on SAR86 [Slide shows an excerpt of the article text.] From Beja et al. Science 289: 1902–1906. doi:
  76. 76. Proteorhodopsin Fig. 1. (A) Phylogenetic tree of bacterial 16S rRNA gene sequences, including that encoded on the 130-kb bacterioplankton BAC clone (EBAC31A08) (16). (B) Phylogenetic analysis of proteorhodopsin with archaeal (BR, HR, and SR prefixes) and Neurospora crassa (NOP1 prefix) rhodopsins (16). Nomenclature: Name_Species.abbreviation_Genbank.gi (HR, halorhodopsin; SR, sensory rhodopsin; BR, bacteriorhodopsin). Halsod, Halorubrum sodomense; Halhal, Halobacterium salinarum (halobium); Halval, Haloarcula vallismortis; Natpha, Natronomonas pharaonis; Halsp, Halobacterium sp; Neucra, Neurospora crassa. From Beja et al. Science 289: 1902–1906. doi:
  77. 77. BAC-Based Metagenomics
  78. 78. Whole Genome Shotgun Sequencing
  79. 79. Whole Genome Shotgun Sequencing Warner Brothers, Inc.
  80. 80. Whole Genome Shotgun Sequencing shotgun Warner Brothers, Inc.
  81. 81. Whole Genome Shotgun Sequencing shotgun Warner Brothers, Inc.
  82. 82. Whole Genome Shotgun Sequencing shotgun sequence Warner Brothers, Inc.
  83. 83. Whole Genome Shotgun Sequencing shotgun sequence Warner Brothers, Inc.
  84. 84. Whole Genome Shotgun Sequencing
  85. 85. Whole Genome Shotgun Sequencing Warner Brothers, Inc.
  86. 86. Whole Genome Shotgun Sequencing shotgun Warner Brothers, Inc.
  87. 87. Whole Genome Shotgun Sequencing shotgun Warner Brothers, Inc.
  88. 88. Whole Genome Shotgun Sequencing shotgun sequence Warner Brothers, Inc.
  89. 89. Whole Genome Shotgun Sequencing shotgun sequence Warner Brothers, Inc.
  90. 90. Baumannia is a Vitamin and Cofactor Producing Machine Wu et al. 2006 PLoS Biology 4: e188.
  91. 91. No Amino-Acid Synthesis
  92. 92. ???????
  93. 93. Commonly Used Binning Methods Did Not Work Well • Assembly – Only Baumannia generated good contigs • Depth of coverage – Everything else 0-1X coverage • Nucleotide composition – No detectable peaks in any vector we looked at
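For readers unfamiliar with composition-based binning, the "vector" in the last bullet is typically a k-mer frequency profile of each contig or read. The sketch below shows one way such a vector could be computed; the function name, the choice of k, and the example sequence are assumptions for illustration, and nothing here reflects the specific vectors examined in the study.

```python
from collections import Counter
from itertools import product


def kmer_profile(seq, k=4):
    """Return the relative frequency of every A/C/G/T k-mer in `seq`."""
    seq = seq.upper()
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    # k-mers containing other characters (for example N) are ignored.
    total = sum(counts[m] for m in kmers)
    return [counts[m] / total if total else 0.0 for m in kmers]


# Hypothetical fragment; with k=2 the profile has 16 entries.
PROFILE = kmer_profile("ACGTACGT", k=2)
```

Fragments from the same genome tend to have similar profiles, so clusters ("peaks") in this space can separate organisms. The slide's point is that no such peaks were visible in this data set.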
  94. 94. CFB Phyla Wu et al. 2006 PLoS Biology 4: e188.
  95. 95. Wu et al. 2006 PLoS Biology 4: e188. Baumannia makes vitamins and cofactors Sulcia makes amino acids
  96. 96. Whole Genome Shotgun Sequencing
  97. 97. Whole Genome Shotgun Sequencing Warner Brothers, Inc.
  98. 98. Whole Genome Shotgun Sequencing shotgun Warner Brothers, Inc.
  99. 99. Whole Genome Shotgun Sequencing shotgun Warner Brothers, Inc.
  100. 100. Whole Genome Shotgun Sequencing shotgun sequence Warner Brothers, Inc.
  101. 101. Whole Genome Shotgun Sequencing shotgun sequence Warner Brothers, Inc.
  102. 102. Shotgun Metagenomics [Slide shows the first pages of two articles.] Community structure and metabolism through reconstruction of microbial genomes from the environment. Gene W. Tyson1, Jarrod Chapman3,4, Philip Hugenholtz1, Eric E. Allen1, Rachna J. Ram1, Paul M. Richardson4, Victor V. Solovyev4, Edward M. Rubin4, Daniel S. Rokhsar3,4 & Jillian F. Banfield1,2. 1 Department of Environmental Science, Policy and Management, 2 Department of Earth and Planetary Sciences, and 3 Department of Physics, University of California, Berkeley, California 94720, USA; 4 Joint Genome Institute, Walnut Creek, California 94598, USA. Microbial communities are vital in the functioning of all ecosystems; however, most microorganisms are uncultivated, and their roles in natural systems are unclear. Here, using random shotgun sequencing of DNA from a natural acidophilic biofilm, we report reconstruction of near-complete genomes of Leptospirillum group II and Ferroplasma type II, and partial recovery of three other genomes. This was possible because the biofilm was dominated by a small number of species populations and the frequency of genomic rearrangements and gene insertions or deletions was relatively low. Because each sequence read came from a different individual, we could determine that single-nucleotide polymorphisms are the predominant form of heterogeneity at the strain level. The Leptospirillum group II genome had remarkably few nucleotide polymorphisms, despite the existence of low-abundance variants. The Ferroplasma type II genome seems to be a composite from three ancestral strains that have undergone homologous recombination to form a large population of mosaic genomes. Analysis of the gene complement for each organism revealed the pathways for carbon and nitrogen fixation and energy generation, and provided insights into survival strategies in an extreme environment. RESEARCH ARTICLE: Environmental Genome Shotgun Sequencing of the Sargasso Sea. J. Craig Venter,1* Karin Remington,1 John F. Heidelberg,3 Aaron L. Halpern,2 Doug Rusch,2 Jonathan A. Eisen,3 Dongying Wu,3 Ian Paulsen,3 Karen E. Nelson,3 William Nelson,3 Derrick E. Fouts,3 Samuel Levy,2 Anthony H. Knap,6 Michael W. Lomas,6 Ken Nealson,5 Owen White,3 Jeremy Peterson,3 Jeff Hoffman,1 Rachel Parsons,6 Holly Baden-Tillson,1 Cynthia Pfannkoch,1 Yu-Hui Rogers,4 Hamilton O. Smith1
  103. 103. Venter et al., Science 304: 66. 2004 rRNA Phylotyping in Sargasso
  104. 104. RecA Phylotyping in Sargasso Data Venter et al., Science 304: 66. 2004
  105. 105. Sargasso Phylotypes [Figure: bar chart. Y-axis: Weighted % of Clones (0.000 to 0.500). X-axis: Major Phylogenetic Group (Alphaproteobacteria, Betaproteobacteria, Gammaproteobacteria, Epsilonproteobacteria, Deltaproteobacteria, Cyanobacteria, Firmicutes, Actinobacteria, Chlorobi, CFB, Chloroflexi, Spirochaetes, Fusobacteria, Deinococcus-Thermus, Euryarchaeota, Crenarchaeota). Series: EFG, EFTu, HSP70, RecA, RpoB, rRNA.] Phylotyping in Sargasso Data Venter et al., Science 304: 66. 2004
  106. 106. Diversity of Proteorhodopsins Venter et al., Science 304: 66. 2004
  107. 107. GOS 1 GOS 2 GOS 3 GOS 4 GOS 5 RecA, RpoB in GOS Wu et al PLoS One 2011
  108. 108. Merging Eras
  109. 109. As of 2002
  110. 110. Acidobacteria Bacteroides Fibrobacteres Gemmimonas Verrucomicrobia Planctomycetes Chloroflexi Proteobacteria Chlorobi Firmicutes Fusobacteria Actinobacteria Cyanobacteria Chlamydia Spirochaetes Deinococcus-Thermus Aquificae Thermotogae TM6 OS-K Termite Group OP8 Marine GroupA WS3 OP9 NKB19 OP3 OP10 TM7 OP1 OP11 Nitrospira Synergistes Deferribacteres Thermodesulfobacteria Chrysiogenetes Thermomicrobia Dictyoglomus Coprothermobacter • At least 40 phyla of bacteria As of 2002 Based on Hugenholtz, 2002
  111. 111. Acidobacteria Bacteroides Fibrobacteres Gemmimonas Verrucomicrobia Planctomycetes Chloroflexi Proteobacteria Chlorobi Firmicutes Fusobacteria Actinobacteria Cyanobacteria Chlamydia Spirochaetes Deinococcus-Thermus Aquificae Thermotogae TM6 OS-K Termite Group OP8 Marine GroupA WS3 OP9 NKB19 OP3 OP10 TM7 OP1 OP11 Nitrospira Synergistes Deferribacteres Thermodesulfobacteria Chrysiogenetes Thermomicrobia Dictyoglomus Coprothermobacter • At least 40 phyla of bacteria • Genome sequences are mostly from three phyla As of 2002 Based on Hugenholtz, 2002
  112. 112. Acidobacteria Bacteroides Fibrobacteres Gemmimonas Verrucomicrobia Planctomycetes Chloroflexi Proteobacteria Chlorobi Firmicutes Fusobacteria Actinobacteria Cyanobacteria Chlamydia Spirochaetes Deinococcus-Thermus Aquificae Thermotogae TM6 OS-K Termite Group OP8 Marine GroupA WS3 OP9 NKB19 OP3 OP10 TM7 OP1 OP11 Nitrospira Synergistes Deferribacteres Thermodesulfobacteria Chrysiogenetes Thermomicrobia Dictyoglomus Coprothermobacter • At least 40 phyla of bacteria • Genome sequences are mostly from three phyla • Some other phyla are only sparsely sampled As of 2002 Based on Hugenholtz, 2002
  113. 113. Acidobacteria Bacteroides Fibrobacteres Gemmimonas Verrucomicrobia Planctomycetes Chloroflexi Proteobacteria Chlorobi Firmicutes Fusobacteria Actinobacteria Cyanobacteria Chlamydia Spirochaetes Deinococcus-Thermus Aquificae Thermotogae TM6 OS-K Termite Group OP8 Marine GroupA WS3 OP9 NKB19 OP3 OP10 TM7 OP1 OP11 Nitrospira Synergistes Deferribacteres Thermodesulfobacteria Chrysiogenetes Thermomicrobia Dictyoglomus Coprothermobacter • At least 40 phyla of bacteria • Genome sequences are mostly from three phyla • Some other phyla are only sparsely sampled As of 2002 Based on Hugenholtz, 2002
  114. 114. GEBA
  115. 115. GEBA Pilot Project Overview • Identify major branches in rRNA tree for which no genomes are available • Identify those with a cultured representative in DSMZ • DSMZ grew > 200 of these and prepped DNA • Sequence and finish 200+ • Annotate, analyze, release data • Assess benefits of tree guided sequencing • 1st paper Wu et al in Nature Dec 2009
  116. 116. GEBA Pilot Project: Components • Project overview (Phil Hugenholtz, Nikos Kyrpides, Jonathan Eisen, Eddy Rubin, Jim Bristow) • Project management (David Bruce, Eileen Dalin, Lynne Goodwin) • Culture collection and DNA prep (DSMZ, Hans-Peter Klenk) • Sequencing and closure (Eileen Dalin, Susan Lucas, Alla Lapidus, Mat Nolan, Alex Copeland, Cliff Han, Feng Chen, Jan-Fang Cheng) • Annotation and data release (Nikos Kyrpides, Victor Markowitz, et al) • Analysis (Dongying Wu, Kostas Mavrommatis, Martin Wu, Victor Kunin, Neil Rawlings, Ian Paulsen, Patrick Chain, Patrik D’Haeseleer, Sean Hooper, Iain Anderson, Amrita Pati, Natalia N. Ivanova, Athanasios Lykidis, Adam Zemla) • Adopt a microbe education project (Cheryl Kerfeld) • Outreach (David Gilbert) • $$$ (DOE, Eddy Rubin, Jim Bristow)
  117. 117. Lessons from GEBA
  118. 118. Lesson 1: rRNA PD IDs novel lineages From Wu et al. 2009 Nature 462, 1056-1060
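Assuming "PD" on this slide stands for phylogenetic diversity measured as summed branch length on a rooted tree, the idea of ranking candidate genomes by how much new branch length they add can be sketched as follows. The tree encoding (child mapped to parent and branch length), the taxon names, and the function name are hypothetical.

```python
def phylogenetic_diversity(parent, taxa):
    """Sum the lengths of all branches on the paths from `taxa` to the root.

    `parent` maps each non-root node to a (parent_node, branch_length) pair.
    Each branch is counted once even if several taxa share it.
    """
    seen = set()
    total = 0.0
    for node in taxa:
        while node in parent and node not in seen:
            seen.add(node)
            up, length = parent[node]
            total += length
            node = up
    return total


# Hypothetical rooted tree: A and B are close relatives, C is a deep lineage.
PARENT = {
    "A": ("n1", 1.0),
    "B": ("n1", 2.0),
    "n1": ("root", 0.5),
    "C": ("root", 3.0),
}
```

In this toy tree, adding C to a set that already holds A and B raises PD from 3.5 to 6.5, a larger gain than adding B to A alone, so C would be the more "novel" lineage to sequence next.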
  119. 119. Lesson 2: rRNA Tree is not perfect Badger et al. 2005 Int J System Evol Microbiol 55: 1021-1026. 16S, WGT, 23S
  120. 120. Lesson 3: Improves annotation • Took 56 GEBA genomes and compared results vs. 56 randomly sampled new genomes • Better definition of protein family sequence “patterns” • Greatly improves “comparative” and “evolutionary” based predictions • Conversion of hypothetical into conserved hypotheticals • Linking distantly related members of protein families • Improved non-homology prediction
  121. 121. Lesson 4: Diversity Discovery • Phylogeny-driven genome selection helps discover new genetic diversity
  122. 122. Wu et al. 2009 Nature 462, 1056-1060
  123. 123. Wu et al. 2009 Nature 462, 1056-1060
  124. 124. Wu et al. 2009 Nature 462, 1056-1060
  125. 125. Wu et al. 2009 Nature 462, 1056-1060
  126. 126. Wu et al. 2009 Nature 462, 1056-1060
  127. 127. Synapomorphies exist Wu et al. 2009 Nature 462, 1056-1060
  128. 128. Lesson 5: Improves metagenomics Sargasso Phylotypes [Figure: bar chart of Weighted % of Clones (0.000 to 0.500) by Major Phylogenetic Group, with one series per marker: EFG, EFTu, HSP70, RecA, RpoB, rRNA.] Venter et al., Science 304: 66-74. 2004 GEBA Project improves metagenomic analysis
  129. 129. GEBA Cyanobacteria www.pnas.org/cgi/doi/10.1073/pnas.1217107110 0.3 B1 B2 C1 Paulinella Glaucophyte Green Red Chromalveolates C2 C3 A E F G B3 D A B
  130. 130. Haloarchaeal GEBA-like Lynch EA, Langille MGI, Darling A, Wilbanks EG, Haltiner C, et al. (2012) Sequencing of Seven Haloarchaeal Genomes Reveals Patterns of Genomic Flux. PLoS ONE 7(7): e41389. doi:10.1371/journal.pone.0041389
  131. 131. But ...
  132. 132. Phylotyping Sargasso Phylotypes [Figure: bar chart of Weighted % of Clones (0.000 to 0.500) by Major Phylogenetic Group, with one series per marker: EFG, EFTu, HSP70, RecA, RpoB, rRNA.] Venter et al., Science 304: 66-74. 2004 GEBA Project improves metagenomic analysis
  133. 133. Phylotyping Sargasso Phylotypes [Figure: bar chart of Weighted % of Clones (0.000 to 0.500) by Major Phylogenetic Group, with one series per marker: EFG, EFTu, HSP70, RecA, RpoB, rRNA.] But not a lot Venter et al., Science 304: 66-74. 2004
  134. 134. Phylotyping Sargasso Phylotypes [Figure: bar chart of Weighted % of Clones (0.000 to 0.500) by Major Phylogenetic Group, with one series per marker: EFG, EFTu, HSP70, RecA, RpoB, rRNA.] Venter et al., Science 304: 66-74. 2004 GEBA Project improves phylogenomics analysis
  135. 135. Phylotyping Sargasso Phylotypes [Figure: bar chart of Weighted % of Clones (0.000 to 0.500) by Major Phylogenetic Group, with one series per marker: EFG, EFTu, HSP70, RecA, RpoB, rRNA.] But not a lot Venter et al., Science 304: 66-74. 2004
  136. 136. Future Needs I: • Need to adapt genomic and metagenomic methods to make better use of data
  137. 137. Improving Metagenomic Analysis • Methods • More automation • Better phylogenetic methods for short reads and large data sets • Improved tools for using distantly related genomes in metagenomic analysis • Data sets • Rebuild protein family models • New phylogenetic markers • Need better reference phylogenies, including HGT • More simulations
  138. 138. WATERs [Slide shows a page of the article with two figures.] Figure 1 Overview of WATERS. Schema of WATERS where white boxes indicate "behind the scenes" analyses that are performed in WATERS. Quality control files are generated for white boxes, but not otherwise routinely analyzed. Black arrows indicate that metadata (e.g., sample type) has been overlaid on the data for downstream interpretation. Colored boxes indicate different types of results files that are generated for the user for further use and biological interpretation. Colors indicate different types of WATERS actors from Fig. 2 which were used: green, Diversity metrics, WriteGraphCoordinates, Diversity graphs; blue, Taxonomy, BuildTree, Rename Trees, Save Trees; CreateUnifrac; yellow, CreateOtuTable, CreateCytoscape, CreateOTUFile; white, remaining unnamed actors. [Workflow boxes: Align; Check chimeras; Cluster; Build Tree; Assign Taxonomy; Tree w/ Taxonomy; Diversity statistics & graphs; Unifrac files; Cytoscape network; OTU table.] Excerpt: To this end, we have built an automated, user-friendly, workflow-based system called WATERS: a Workflow for the Alignment, Taxonomy, and Ecology of Ribosomal Sequences (Fig. 1). In addition to being automated and simple to use, because WATERS is executed in the Kepler scientific workflow system (Fig. 2) it also has the advantage that it keeps track of the data lineage and provenance of data products [23,24]. But now that current efforts are significantly more advanced and often require comparison of dozens of factors and variables with datasets of thousands of sequences, it is not practically feasible to process these large collections "by hand", and hugely inefficient if instead automated methods can be successfully employed. Figure 2 Screenshot of WATERS in Kepler software. Key features: the library of actors un-collapsed and displayed on the left-hand side, the input and output paths where the user declares the location of their input files and desired location for the results files. Each green box is an individual Kepler actor that performs a single action on the data stream. The connectors (black arrows) direct and hook up the actors in a defined sequence. Double-clicking on any actor or connector allows it to be manipulated and re-arranged. Hartman et al 2010. W.A.T.E.R.S.: a Workflow for the Alignment, Taxonomy, and Ecology of Ribosomal Sequences. BMC Bioinformatics 2010, 11:317 doi: 10.1186/1471-2105-11-317
  139. 139. Zorro - Automated Masking [Figure: Distance to True Tree (y-axis, 0.0 to 9.0) versus Sequence Length (x-axis: 200, 400, 800, 1600, 3200) for three treatments: no masking, zorro, gblocks.] Wu M, Chatterji S, Eisen JA (2012) Accounting For Alignment Uncertainty in Phylogenomics. PLoS ONE 7(1): e30288. doi:10.1371/journal.pone.0030288
  140. 140. Kembel Correction Kembel SW, Wu M, Eisen JA, Green JL (2012) Incorporating 16S Gene Copy Number Information Improves Estimates of Microbial Diversity and Abundance. PLoS Comput Biol 8(10): e1002743. doi:10.1371/journal.pcbi.1002743
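The basic arithmetic behind a 16S copy number correction can be sketched as below: read counts are divided by each taxon's gene copy number before relative abundances are computed. This is a simplified illustration, assuming the copy numbers are already known. The taxon names, counts, and function name are hypothetical, and how copy numbers are estimated for organisms without sequenced genomes is outside this sketch.

```python
def correct_abundance(read_counts, copy_number):
    """Convert 16S read counts into copy-number-corrected relative abundances."""
    # Dividing by copies per genome turns gene counts into approximate cell counts.
    cells = {taxon: n / copy_number[taxon] for taxon, n in read_counts.items()}
    total = sum(cells.values())
    return {taxon: c / total for taxon, c in cells.items()}


# Hypothetical community: taxonA carries 7 rRNA operons, taxonB carries 1.
READS_16S = {"taxonA": 700, "taxonB": 100}
COPIES = {"taxonA": 7, "taxonB": 1}
CORRECTED = correct_abundance(READS_16S, COPIES)
```

Uncorrected, taxonA would appear to make up 87.5% of the community; after correction the two taxa are equally abundant.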
  141. 141. PhylOTU Figure 1. PhylOTU Workflow. Computational processes are represented as squares and databases are represented as cylinders in workflow of PhylOTU. See Results section for details. doi:10.1371/journal.pcbi.1001061.g001 Sharpton TJ, Riesenfeld SJ, Kembel SW, Ladau J, O'Dwyer JP, Green JL, Eisen JA, Pollard KS. (2011) PhylOTU: A High-Throughput Procedure Quantifies Microbial Community Diversity and Resolves Novel Taxa from Metagenomic Data. PLoS Comput Biol 7(1): e1001061. doi:10.1371/journal.pcbi.1001061
  142. 142. Phylosift/ pplacer Aaron Darling, Guillaume Jospin, Holly Bik, Erik Matsen, Eric Lowe, and others
  143. 143. Kembel Combiner [Slide shows an excerpt and figure on UniFrac.] However, the test may be used in a semiquantitative manner if all clones, even those with identical or near-identical sequences, are included in the tree (13). Here we describe a quantitative version of UniFrac that we call "weighted UniFrac." FIG. 1. Calculation of the unweighted and the weighted UniFrac measures. Squares and circles represent sequences from two different environments. (a) In unweighted UniFrac, the distance between the circle and square communities is calculated as the fraction of the branch length that has descendants from either the square or the circle environment (black) but not both (gray). (b) In weighted UniFrac, branch lengths are weighted by the relative abundance of sequences in the square and circle communities; square sequences are weighted twice as much as circle sequences because there are twice as many total circle sequences in the data set. The width of branches is proportional to the degree to which each branch is weighted in the calculations, and gray branches have no weight. Branches 1 and 2 have heavy weights since the descendants are biased toward the square and circles, respectively. Branch 3 contributes no value since it has an equal contribution from circle and square sequences after normalization. Kembel SW, Eisen JA, Pollard KS, Green JL (2011) The Phylogenetic Diversity of Metagenomes. PLoS ONE 6(8): e23214. doi:10.1371/journal.pone.0023214
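Panel (a) of the figure on this slide describes unweighted UniFrac as the fraction of branch length leading to descendants from only one environment. A minimal sketch of that calculation follows, assuming a rooted tree encoded as child mapped to (parent, branch length) and exactly one environment label per leaf. The tree, the labels, and the function name are hypothetical, and the weighted variant in panel (b) is not covered.

```python
def unweighted_unifrac(parent, leaf_env):
    """Fraction of branch length whose descendants come from one environment."""
    envs_below = {}
    for leaf, env in leaf_env.items():
        node = leaf
        # Record this leaf's environment on every branch between it and the root.
        while node in parent:
            envs_below.setdefault(node, set()).add(env)
            node = parent[node][0]
    unique = sum(parent[n][1] for n, envs in envs_below.items() if len(envs) == 1)
    total = sum(parent[n][1] for n in envs_below)
    return unique / total


# Hypothetical tree and sample labels.
TREE = {
    "A": ("n1", 1.0),
    "B": ("n1", 2.0),
    "n1": ("root", 0.5),
    "C": ("root", 3.0),
}
LEAF_ENV = {"A": "square", "B": "circle", "C": "circle"}
```

Only the internal branch above A and B is shared between environments here, so the distance is 6.0 / 6.5. Two communities with identical membership would score 0.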
  144. 144. NMF in Metagenomes Characterizing the niche-space distributions of components [Figure: three aligned matrices with one row per sampling site (for example North American East Coast_GS005_Embayment, Sargasso Sea_GS001c_Open Ocean, Indian Ocean_GS149_Harbor, Polynesia Archipelagos_GS048a_Coral Reef). Panel (a) columns: Component 1 to Component 5. Panel (c) columns: Salinity, Sample Depth, Chlorophyll, Temperature, Insolation, Water Depth, with legend levels High, Medium, Low, NA and water depth classes from 0-20m to >4000m.] Figure 3: a) Niche-space distributions for our five components ($H^T$); b) the site-similarity matrix ($\hat{H}^T \hat{H}$); c) environmental variables for the sites. The matrices are aligned so that the same row corresponds to the same site in each matrix. Sites are ordered by applying spectral reordering to the similarity matrix (see Materials and Methods). Rows are aligned across the three matrices. Functional biogeography of ocean microbes revealed through non-negative matrix factorization Jiang et al. PLoS One. w/ Weitz, Dushoff, Langille, Neches, Levin, etc
  145. 145. More Markers
  Phylogenetic group | Genome Number | Gene Number | Marker Candidates
  Archaea | 62 | 145415 | 106
  Actinobacteria | 63 | 267783 | 136
  Alphaproteobacteria | 94 | 347287 | 121
  Betaproteobacteria | 56 | 266362 | 311
  Gammaproteobacteria | 126 | 483632 | 118
  Deltaproteobacteria | 25 | 102115 | 206
  Epsilonproteobacteria | 18 | 33416 | 455
  Bacteroides | 25 | 71531 | 286
  Chlamydiae | 13 | 13823 | 560
  Chloroflexi | 10 | 33577 | 323
  Cyanobacteria | 36 | 124080 | 590
  Firmicutes | 106 | 312309 | 87
  Spirochaetes | 18 | 38832 | 176
  Thermi | 5 | 14160 | 974
  Thermotogae | 9 | 17037 | 684
  146. 146. Better Reference Tree Lang et al. 2013
  147. 147. Sifting Families Representative Genomes Extract Protein Annotation All v. All BLAST Homology Clustering (MCL) SFams Align & Build HMMs HMMs Screen for Homologs New Genomes Extract Protein Annotation Figure 1 Sharpton et al. 2013 A B C
  148. 148. Future Needs II: • We have still only scratched the surface of microbial diversity
  149. 149. rRNA Tree of Life Figure from Barton, Eisen et al. “Evolution”, CSHL Press. 2007. Based on tree from Pace 1997 Science 276:734-740 Archaea Eukaryotes Bacteria
  150. 150. PD: All From Wu et al. 2009 Nature 462, 1056-1060
  151. 151. Uncultured Lineages: Methods • Get into culture • Enrichment cultures • If abundant in low diversity ecosystems • Flow sorting • Microbeads • Microfluidic sorting • Single cell amplification
  152. 152. GEBA Uncultured Number of SAGs from Candidate Phyla
  Site | OD1 | OP11 | OP3 | SAR406
  Site A: Hydrothermal vent | 4 | 1 | - | -
  Site B: Gold Mine | 6 | 13 | 2 | -
  Site C: Tropical gyres (Mesopelagic) | - | - | - | 2
  Site D: Tropical gyres (Photic zone) | 1 | - | - | -
  Sample collections at 4 additional sites are underway. Phil Hugenholtz
  153. 153. Future Needs III: • Need Experiments from Across the Tree of Life too
  154. 154. Acidobacteria Bacteroides Fibrobacteres Gemmimonas Verrucomicrobia Planctomycetes Chloroflexi Proteobacteria Chlorobi Firmicutes Fusobacteria Actinobacteria Cyanobacteria Chlamydia Spriochaetes Deinococcus-Thermus Aquificae Thermotogae TM6 OS-K Termite Group OP8 Marine GroupA WS3 OP9 NKB19 OP3 OP10 TM7 OP1 OP11 Nitrospira Synergistes Deferribacteres Thermudesulfobacteria Chrysiogenetes Thermomicrobia Dictyoglomus Coprothmermobacter • At least 40 phyla of bacteria As of 2002 Tree Based on Hugenholtz, 2002. http://genomebiology.com/ 2002/3/2/reviews/0003
  155. 155. Acidobacteria Bacteroides Fibrobacteres Gemmimonas Verrucomicrobia Planctomycetes Chloroflexi Proteobacteria Chlorobi Firmicutes Fusobacteria Actinobacteria Cyanobacteria Chlamydia Spriochaetes Deinococcus-Thermus Aquificae Thermotogae TM6 OS-K Termite Group OP8 Marine GroupA WS3 OP9 NKB19 OP3 OP10 TM7 OP1 OP11 Nitrospira Synergistes Deferribacteres Thermudesulfobacteria Chrysiogenetes Thermomicrobia Dictyoglomus Coprothmermobacter • At least 40 phyla of bacteria • Experimental studies are mostly from three phyla As of 2002 Tree Based on Hugenholtz, 2002. http://genomebiology.com/ 2002/3/2/reviews/0003
  156. 156. Acidobacteria Bacteroides Fibrobacteres Gemmimonas Verrucomicrobia Planctomycetes Chloroflexi Proteobacteria Chlorobi Firmicutes Fusobacteria Actinobacteria Cyanobacteria Chlamydia Spriochaetes Deinococcus-Thermus Aquificae Thermotogae TM6 OS-K Termite Group OP8 Marine GroupA WS3 OP9 NKB19 OP3 OP10 TM7 OP1 OP11 Nitrospira Synergistes Deferribacteres Thermudesulfobacteria Chrysiogenetes Thermomicrobia Dictyoglomus Coprothmermobacter • At least 40 phyla of bacteria • Experimental studies are mostly from three phyla • Some studies in other phyla As of 2002 Tree Based on Hugenholtz, 2002. http://genomebiology.com/ 2002/3/2/reviews/0003
  157. 157. Acidobacteria Bacteroides Fibrobacteres Gemmimonas Verrucomicrobia Planctomycetes Chloroflexi Proteobacteria Chlorobi Firmicutes Fusobacteria Actinobacteria Cyanobacteria Chlamydia Spriochaetes Deinococcus-Thermus Aquificae Thermotogae TM6 OS-K Termite Group OP8 Marine GroupA WS3 OP9 NKB19 OP3 OP10 TM7 OP1 OP11 Nitrospira Synergistes Deferribacteres Thermudesulfobacteria Chrysiogenetes Thermomicrobia Dictyoglomus Coprothmermobacter • At least 40 phyla of bacteria • Genome sequences are mostly from three phyla • Some other phyla are only sparsely sampled • Same trend in Eukaryotes As of 2002 Tree Based on Hugenholtz, 2002. http://genomebiology.com/ 2002/3/2/reviews/0003
  158. 158. Acidobacteria Bacteroides Fibrobacteres Gemmimonas Verrucomicrobia Planctomycetes Chloroflexi Proteobacteria Chlorobi Firmicutes Fusobacteria Actinobacteria Cyanobacteria Chlamydia Spriochaetes Deinococcus-Thermus Aquificae Thermotogae TM6 OS-K Termite Group OP8 Marine GroupA WS3 OP9 NKB19 OP3 OP10 TM7 OP1 OP11 Nitrospira Synergistes Deferribacteres Thermudesulfobacteria Chrysiogenetes Thermomicrobia Dictyoglomus Coprothmermobacter • At least 40 phyla of bacteria • Genome sequences are mostly from three phyla • Some other phyla are only sparsely sampled • Same trend in Viruses As of 2002 Tree Based on Hugenholtz, 2002. http://genomebiology.com/ 2002/3/2/reviews/0003
  159. 159. 0.1 Acidobacteria Bacteroides Fibrobacteres Gemmimonas Verrucomicrobia Planctomycetes Chloroflexi Proteobacteria Chlorobi Firmicutes Fusobacteria Actinobacteria Cyanobacteria Chlamydia Spirochaetes Deinococcus-Thermus Aquificae Thermotogae TM6 OS-K Termite Group OP8 Marine Group A WS3 OP9 NKB19 OP3 OP10 TM7 OP1 OP11 Nitrospira Synergistes Deferribacteres Thermodesulfobacteria Chrysiogenetes Thermomicrobia Dictyoglomus Coprothermobacter Tree based on Hugenholtz (2002) with some modifications. Need experimental studies from across the tree too Tree Based on Hugenholtz, 2002. http://genomebiology.com/2002/3/2/reviews/0003
  160. 160. 0.1 Acidobacteria Bacteroides Fibrobacteres Gemmimonas Verrucomicrobia Planctomycetes Chloroflexi Proteobacteria Chlorobi Firmicutes Fusobacteria Actinobacteria Cyanobacteria Chlamydia Spirochaetes Deinococcus-Thermus Aquificae Thermotogae TM6 OS-K Termite Group OP8 Marine Group A WS3 OP9 NKB19 OP3 OP10 TM7 OP1 OP11 Nitrospira Synergistes Deferribacteres Thermodesulfobacteria Chrysiogenetes Thermomicrobia Dictyoglomus Coprothermobacter Tree based on Hugenholtz (2002) with some modifications. Adopt a Microbe Tree Based on Hugenholtz, 2002. http://genomebiology.com/2002/3/2/reviews/0003
  161. 161. MICROBES
  162. 162. A Happy Tree of Life
  163. 163. Acknowledgements • GEBA: • $$: DOE-JGI, DSMZ • Eddy Rubin, Phil Hugenholtz, Hans-Peter Klenk, Nikos Kyrpides, Tanya Woyke, Dongying Wu, Aaron Darling, Jenna Lang • GEBA Cyanobacteria: • $$: DOE-JGI • Cheryl Kerfeld, Dongying Wu, Patrick Shih • Haloarchaea: • $$: NSF • Marc Facciotti, Aaron Darling, Erin Lynch • iSEEM: • $$: GBMF • Katie Pollard, Jessica Green, Martin Wu, Steven Kembel, Tom Sharpton, Morgan Langille, Guillaume Jospin, Dongying Wu • aTOL: • $$: NSF • Naomi Ward, Jonathan Badger, Frank Robb, Martin Wu, Dongying Wu • Others: • $$: NSF, NIH, DOE, GBMF, DARPA, Sloan • Frank Robb, Craig Venter, Doug Rusch, Shibu Yooseph, Nancy Moran, Colleen Cavanaugh, Josh Weitz • EisenLab: Srijak Bhatnagar, Russell Neches, Lizzy Wilbanks, Holly Bik
