EssayEnvironmental Shotgun Sequencing:Its Potential and Challenges for Studyingthe Hidden World of MicrobesJonathan A. Eis...
Table 1. Some Major Methods for Studying Individual Microbes Found in the EnvironmentMethod                       Summary ...
Table 2. Methods of BinningMethod                                  Description                                            ...
few different phylotypes present, all       Similarly, the initial comparisons of       References                        ...
23. Perna NT, Plunkett G 3rd, Burland V, Mau B,           II Gobal Ocean Sampling expedition:                    metagenom...
Upcoming SlideShare
Loading in …5

Environmental Shotgun Sequencing


Published on

Eisen JA (2007) Environmental Shotgun Sequencing: Its Potential and Challenges for Studying the Hidden World of Microbes. PLoS Biol 5(3): e82. doi:10.1371/journal.pbio.0050082

Experimenting with posting OpenAccess papers on Slideshare

  • Be the first to comment

  • Be the first to like this

No Downloads
Total views
On SlideShare
From Embeds
Number of Embeds
Embeds 0
No embeds

No notes for slide

Environmental Shotgun Sequencing

  1. 1. EssayEnvironmental Shotgun Sequencing:Its Potential and Challenges for Studyingthe Hidden World of MicrobesJonathan A. Eisen without culturing them [5–7], and taxa may look the same. This vexing the use of high-throughput “shotgun” problem was partially overcome in methods to sequence the genomes of the 1980s through the use of rRNA- cultured species [8]. We are now in the PCR (Table 1). This method allows midst of another such revolution—this microorganisms in a sample to be one driven by the use of genome phylogenetically typed and counted sequencing methods to study microbes based on the sequence of their rRNA directly in their natural habitats, an genes, genes that are present in all approach known as metagenomics, cell-based organisms. In essence, a environmental genomics, or database of rRNA sequences [14,15]S ince their discovery in the 1670s community genomics [9]. from known organisms functions by Anton van Leeuwenhoek, In this essay I focus on one like a bird field guide, and finding a an incredible amount has been particularly promising area of rRNA-PCR product is akin to seeing alearned about microorganisms and metagenomics—the use of shotgun bird through binoculars. Rather thantheir importance to human health, genome methods to sequence random counting species, this approach focusesagriculture, industry, ecosystem fragments of DNA from microbes on “phylotypes,” which are defined asfunctioning, global biogeochemical in an environmental sample. The organisms whose rRNA sequences arecycles, and the origin and evolution randomness and breadth of this very similar to each other (a cutoff ofof life. Nevertheless, it is what is not environmental shotgun sequencing >97% or >99% identical is frequentlyknown that is most astonishing. For (ESS)—first used only a few years ago used). The ability to use phylotypingexample, though there are certainly [10,11] and now being used to assay to determine who was out there in anyat least 10 million species of bacteria, every microbial system imaginable microbial sample has revolutionizedonly a few thousand have been formally from the human gut [12] to waste environmental microbiology [16],described [1]. This contrasts with the water sludge [13]—has the potential to led to many discoveries [e.g.,17],more than 350,000 described species reveal novel and fundamental insights and convinced many people (myselfof beetles [2]. This is one of many into the hidden world of microbes and included) to become microbiologists.examples indicative of the general their impact on our world. However,difficulties encountered in studying the complexity of analysis required Citation: Eisen JA (2007) Environmental shotgunorganisms that we cannot readily see to realize this potential poses unique sequencing: Its potential and challenges for studyingor collect in large samples for future interdisciplinary challenges, challenges the hidden world of microbes. PLoS Biol 5(3): e82. doi:10.1371/journal.pbio.0050082analyses. It is thus not surprising that that make the approach bothmost major advances in microbiology fascinating and frustrating in equal Series Editor: Simon Levin, Princeton University,can be traced to methodological measure. United States of Americaadvances rather than scientific Copyright: © 2007 Jonathan A. Eisen. This is andiscoveries per se. Who Is Out There? Typing open-access article distributed under the terms Examples of these key revolutionary and Counting Microbes in of the Creative Commons Attribution License, which permits unrestricted use, distribution, andmethods (Table 1) include the use of the Environment reproduction in any medium, provided the originalmicroscopes to view microbial cells, author and source are credited. One of the most important andthe growth of single types of organisms conceptually straightforward steps Abbreviations: ESS, environmental shotgunin the lab in isolation from other in studying any ecosystem involves sequencing; PCR, polymerase chain reaction; rRNA,types (culturing), the comparison ribosomal RNA cataloging the types of organisms andof ribosomal RNA (rRNA) genes to the numbers of each type. For a long Jonathan A. Eisen is at the University of Californiaconstruct the first tree of life that time, such typing and counting was Davis Genome Center, with joint appointments in the Section of Evolution and Ecology andincluded microbes [3], the use of the an almost insurmountable problem in the Department of Medical Microbiology andpolymerase chain reaction (PCR) [4] microbiology. This is largely because Immunology, Davis, California, United States ofto clone rRNA genes from organisms America. Web site: http://phylogenomics.blogspot. physical appearance does not provide com. E-mail: a valid taxonomic picture in microbes. Appearance evolves so rapidly that two This article is part of the Oceanic MetagenomicsEssays articulate a specific perspective on a topic of collection in PLoS Biology. The full collection isbroad interest to scientists. closely related taxa may look wildly available online at different and two distantly related plosbiology/gos-2007.php. PLoS Biology | 0001 March 2007 | Volume 5 | Issue 3 | e82
  2. 2. Table 1. Some Major Methods for Studying Individual Microbes Found in the EnvironmentMethod Summary CommentsMicroscopy Microbial phenotypes can be studied by making them more visible. In conjunction The appearance of microbes is not a reliable indicator of with other methods, such as staining, microscopy can also be used to count taxa what type of microbe one is looking at. and make inferences about biological processes.Culturing Single cells of a particular microbial type are grown in isolation from other This is the best way to learn about the biology of a organisms. This can be done in liquid or solid growth media. particular organism. However, many microbes are uncultured (i.e., have never been grown in the lab in isolation from other organisms) and may be unculturable (i.e., may not be able to grow without other organisms).rRNA-PCR The key aspects of this method are the following: (a) all cell-based organisms This method revolutionized microbiology in the 1980s by possess the same rRNA genes (albeit with different underlying sequences); (b) PCR allowing the types and numbers of microbes present in is used to make billions of copies of basically each and every rRNA gene present in a sample to be rapidly characterized. However, there are a sample; this amplifies the rRNA signal relative to the noise of thousands of other some biases in the process that make it not perfect for all genes present in each organism’s DNA; (c) sequencing and phylogenetic analysis aspects of typing and counting. places rRNA genes on the rRNA tree of life; the position on the tree is used to infer what type of organism (a.k.a. phylotype) the gene came from; and (d) the numbers of each microbe type are estimated from the number of times the same rRNA gene is seen.Shotgun genome The DNA from an organism is isolated and broken into small fragments, and then This has now been applied to over 1,000 microbes, as wellsequencing of cultured portions of these fragments are sequenced, usually with the aid of sequencing as some multicellular species, and has provided a muchspecies machines. The fragments are then assembled into larger pieces by looking deeper understanding of the biology and evolution of life. for overlaps in the sequence each possesses. The complete genome can be One limitation is that each genome sequence is usually a determined by filling in gaps between the larger pieces. snapshot of one or a few individuals.Metagenomics DNA is directly isolated from an environmental sample and then sequenced. This method allows one to sample the genomes of One approach to doing this is to select particular pieces of interest (e.g., those microbes without culturing them. It can be used both for containing interesting rRNA genes) and sequence them. An alternative is ESS, typing and counting taxa and for making predictions of which is shotgun genome sequencing as described above, but applied to an their biological functions. environmental sample with multiple organisms, rather than to a single cultured organism.doi:10.1371/journal.pbio.0050082.t001 The selective targeting of a single Certainly, many challenges remain the phylotypes and study its propertiesgene makes rRNA-PCR an efficient before we can fully realize the potential in the lab. Unfortunately, many, ifmethod for deep community sampling of ESS for the typing and counting of not most, key microbes have not yet[18]. However, this efficiency comes species, including making automated been cultured [22]. Thus, for manywith limitations, most of which are yet accurate phylogenetic trees of every years, the only alternative was tocomplemented or circumvented by the gene, determining which genes are make predictions about the biology ofrandomness and breadth of ESS. For most useful for which taxa, combining particular phylotypes based on whatexample, examination of the random data from different genes even when was known about related organisms.samples of rRNA sequences obtained we do not know if they come from Unfortunately, this too does not workthrough ESS has already led to the the same organisms, building up well for microbes since very closelydiscovery of new taxa—taxa that were databases of genes other than rRNA, related organisms frequently havecompletely missed by PCR because of and making up for the lack of depth of major biological differences. Forits inability to sample all taxa equally sampling. If these challenges are met, example, Escherichia coli K12 and E.well (e.g., [19]). In addition, ESS ESS has the potential to rewrite much coli O157:H7 are strains of the sameprovides the first robust sampling of of what we thought we knew about the species (and considered to be the samegenes other than rRNA, and many of phylogenetic diversity of microbial life. phylotype), with genomes containingthese genes can be more useful for only about 4,000 genes, yet eachsome aspects of typing and counting. What Are They Doing? Top Down possesses hundreds of functionallySome universal protein coding and Bottom Up Approaches to important genes not seen in thegenes are better than rRNA both for Understanding Functions in other strain [23]. Such differencesdistinguishing closely related strains are routine in microbes, and thus one Communities(because of third position variation in cannot make any useful inferencescodons) and for estimating numbers A community is, of course, more about what particular phylotypes areof individuals (because they vary less than a list of types of organisms. doing (e.g., type of metabolism, growthin copy number between species One approach to understanding properties, role in nutrient cycling, orthan do rRNA genes) [10]. Perhaps the properties and functioning of pathogenicity) based on the activities ofmost significantly, ESS is providing a microbial community is to start their relatives.groundbreaking insights into the with studies of the different types of These difficulties—the inabilitydiversity of viruses [20,21], which lack organisms and build up from these to culture most microbes and therRNA genes and thus were left out of individuals to the community. Ideally, functional disparities between closethe previous revolution. to do this one would culture each of relatives—led to one of the first kinds PLoS Biology | 0002 March 2007 | Volume 5 | Issue 3 | e82
  3. 3. Table 2. Methods of BinningMethod Description CommentsGenome assembly Identify regions of overlap between different fragments Getting deep enough sampling for this to work is very expensive from the same organism to build larger contiguous pieces except for low diversity systems or for very abundant taxa. (contigs).Reference genome alignment Identify ESS fragments or contigs that are very similar (a) One of the most effective ways to sort through ESS data, if the to already assembled sections of the genome of single reference genome is very closely related to an organism in the sample; microbial types. (b) the reason why more reference genomes are needed; (c) does not handle regions present in uncultured organisms but not in the reference.Phylogenetic analysis Build evolutionary trees of genes encoded by ESS fragments (a) Very powerful, but level of resolution depends on whether or contigs. Assign fragments or contigs to taxonomic fragments encode useful phylogenetic markers and on how well groups based on nearest neighbor(s) in trees. sampled the database is for the neighbor analysis; (b) would work much better if more genomes were available from across the tree of life.Word frequency and nucleotide Measure word frequency and composition of each (a) Has the potential to work because organisms sometimes havecomposition analysis fragment. Group by clustering algorithms or principal “signatures” of word frequencies that are found throughout the component analysis. genome and are different between species; (b) very challenging for small fragments.Population genetics Build alignments of fragments or contigs with similarity May be most useful as a way of subdividing bins created by other to each other (but not as much as needed for assembly). methods. Examine haplotype structure, predicted effective population size, and synonymous and non synonymous substitution patterns.Note that some methods can be applied to ESS fragments or to bins identified by other methods.doi:10.1371/journal.pbio.0050082.t002of metagenomic analyses, wherein and species), and these compartments trying to bin? Is it fragments from thepredictions of function were made matter. The key challenge in analyzing same chromosome from a single cell,from analysis of the sequence of large ESS data is to sort the DNA fragments which would be useful for studyingDNA fragments from representatives (which are usually less than 1,000 base chromosome structure? If so, thenof known phylotypes. This approach pairs long relative to genome sizes of perhaps genome assembly methodshas provided some stunning insights, millions or billions of bases) into bins are the best. What if instead, as in thesuch as the discovery of a novel form that correspond to compartments in the sharpshooter example, we are trying toof phototrophy in the oceans [24]. system being studied. have each bin include every fragmentHowever, this large insert approach A recent study by myself and that came from a particular species,has the same limitation as predicting colleagues illustrates the importance knowledge which may be useful forproperties from characterized of compartments when interpreting predicting community metabolicrelatives—a single cell cannot possibly ESS data. When we analyzed ESS data potential? If the level of geneticrepresent the biological functions of all from symbionts living inside the gut polymorphism among individualmembers of a phylotype. of the glassy-winged sharpshooter (an cells from the same species is high, ESS provides an alternative, more insect that has a nutrient-limited diet), then genome assembly methods mayglobal way of assessing biological we were able to bin the data to two not work well (the polymorphismsfunctions in microbial communities. As distinct symbionts [26]. We then could will break up assemblies). A betterwhen using the large insert approach, infer from those data that one of the approach might be to look for species-functions can be predicted from symbionts synthesizes amino acids for specific “word” frequencies in thesequences. However, in this case the the host while the other synthesizes DNA, such as ones created by patternspredicted functions represent a random the needed vitamins and cofactors. in codon usage. The challenge is, howsampling of those encoded in the Modeling and understanding of this do we tune the methods to find thegenomes of all the organisms present. ecosystem are greatly enhanced by the right target level of resolution? If weThis approach has unquestionably demonstration of this complementary are too stringent, most bins will includebeen wildly successful in terms of gene division of labor, in comparison to only a few fragments. But if we arediscovery. For example, analysis of simply knowing that amino acids, too relaxed, we will create artificialESS data has revealed novel forms of vitamins, and cofactors are made by constructs that may prove biologicallyevery type of gene family examined, as “symbionts.” misleading, such as grouping togetherwell as a great number of completely How does one go about binning sequences from different species. Tonovel families (e.g., [25]). However, ESS data? A variety of approaches have make matters more complex, mostthere is a major caveat when using been developed, some of which are likely the stringency needed will varyESS data to make community-level described in Table 2. In considering for different taxa present in the sample.inferences. Ecosystems are more than the different binning methods and Another critical issue is the diversityjust a bag of genes—they are made up of their limitations, the first question of the system under study. Generally,compartments (e.g., cells, chromosomes, one needs to ask is, what are we binning works better when there are PLoS Biology | 0003 March 2007 | Volume 5 | Issue 3 | e82
  4. 4. few different phylotypes present, all Similarly, the initial comparisons of References 1. Gould SJ (1996) Full house: The spread ofof which are distantly related and ESS data involved comparisons of wildly excellence from Plato to Darwin. New York:form discrete populations. This is why different environments [32], yielding Harmony Books. 244 p.binning works well for the sharpshooter insights into the general structure of 2. Evans AV, Bellamy CL (1996) An inordinate fondness for beetles. New York: Holt. 208 p.system and other relatively isolated, communities. But as more comparisons 3. Woese C, Fox G (1977) Phylogenetic structurelow diversity environments. Binning are made between similar communities of the prokaryotic domain: The primaryincreases in difficulty exponentially kingdoms. Proc Natl Acad Sci U S A 74: 5088– [33,34], such as those sampled during the number of species increases: vertical and horizontal ocean transects 4. Mullis K, Faloona F (1987) Specific synthesis ofthe populations and species start to [27,35–37], we will begin to learn DNA in vitro via a polymerase-catalyzed chain reaction. Methods Enzymol 155: 335–350.merge together, and the populations about shorter time scale processes such 5. Reysenbach AL, Giver LJ, Wickham GS, Paceget more and more polymorphic and as migration, speciation, extinction, NR (1992) Differential amplification of rRNAvariable in relative abundance (such as genes by polymerase chain reaction. Appl responses to disturbance, and Environ Microbiol 58: 3417– the paper about the Global Ocean succession. It is from a combination 6. Medlin L, Elwood HJ, Stickel S, Sogin MLSampling expedition in this issue [27]). of both approaches—comparing (1988) The characterization of enzymaticallyFurther complicating binning is the amplified eukaryotic 16S-like ribosomal RNA- both similar and very divergent coding regions. Gene 71: 491–500.phenomenon of lateral gene transfer, communities—that we will be able to 7. Weisburg W, Barns S, Pelletier D, Lane Dwhere genes are exchanged between understand the fundamental rules of (1991) 16S ribosomal DNA amplification for phylogenetic study. J Bacteriol 173: 697–703.distantly related lineages at rates that microbial ecology and how they relate 8. Fleischmann RD, Adams MD, White O,are high enough that random sampling to ecological principles seen in macro- Clayton RA, Kirkness EF, et al. (1995) Whole-of a genome will frequently include genome random sequencing and assembly organisms. of Haemophilus influenzae Rd. Science 269:genes with multiple histories. 496–512. Despite these challenges, I believe we Conclusions 9. Handelsman J (2004) Metagenomics:can develop effective binning methods Application of genomics to uncultured In promoting some of the exciting microorganisms. Microbiol Mol Biol Rev 68:for complex communities. First, we 669–685. opportunities with ESS, I do notcan combine different approaches 10. Venter JC, Remington K, Heidelberg want to give the impression that it is JF, Halpern AL, Rusch D, et al. (2004)together, such as using one method flawless. It is helpful in this respect to Environmental genome shotgun sequencing ofto sort in a relaxed manner and then the Sargasso Sea. Science 304: 66–74. compare ESS to the Internet. As with 11. Tyson GW, Chapman J, Hugenholtz P, Allenusing another to subdivide the bins the Internet, ESS is a global portal for EE, Ram RJ, et al. (2004) Community structureprovided by the first method. Second, and metabolism through reconstruction of looking at what occurs in a previouslywe can incorporate new approaches microbial genomes from the environment. hidden world. Making sense of it Nature 428: 37–43.such as population genetics into requires one to sort through massive, 12. Gill SR, Pop M, Deboy RT, Eckburg PB,the analysis [28]. In addition, the Turnbaugh PJ, et al. (2006) Metagenomiclessons learned here can be applied to random, fragmented collections of bits analysis of the human distal gut microbiome.other aspects of metagenomics (e.g., of information. Such searches need Science 312: 1355–1359. to be done with caution because any 13. Garcia Martin H, Ivanova N, Kunin V,the counting and typing discussed Warnecke F, Barry KW, et al. (2006)above) and provide insights into the time you analyze such a large amount Metagenomic analysis of two enhancednature of microbial genomes and the of data patterns can be found. In biological phosphorus removal (EBPR) sludge communities. Nat Biotechnol 24: 1263–1269.structure of microbial populations and addition, as with the Internet, there 14. Olsen GJ, Larsen N, Woese CR (1991) Thecommunities. is certainly some hype associated with ribosomal RNA database project. Nucleic Acids ESS that gives relatively trivial findings Res 19: 2017–2021. 15. Cole JR, Chai B, Farris RJ, Wang Q, Kulam-Comparative Metagenomics more attention than they deserve. Syed-Mohideen AS, et al. (2007) The ribosomalSo far, I have discussed issues relating Overall, though, I believe the hype database project (RDP-II): Introducing myRDP space and quality controlled public data.mostly to intrasample analysis of is deserved. As long as we treat ESS Nucleic Acids Res 35: D169–D172.ESS data. However, the area with as a strong complement to existing 16. Pace NR (1997) A molecular view of microbial methods, and we build the tools and diversity and the biosphere. Science 276: 734–perhaps the most promise involves 740.the comparative analysis of different databases necessary for people to use 17. Hugenholtz P, Pitulle C, Hershberger KL,samples. This work parallels the the information, it will live up to its Pace NR (1998) Novel division level bacterial diversity in a Yellowstone hot spring. J Bacteriolcomparative analysis of genomes of revolutionary potential. 180: 366–376.cultured species. Initial studies of 18. Sogin ML, Morrison HG, Huber JA, Welchthat type compared distantly related Acknowledgments DM, Huse SM, et al. (2006) Microbial diversity in the deep sea and the underexplored “raretaxa with enormous biological I thank Simon Levin, Joshua Weitz, biosphere”. Proc Natl Acad Sci U S A 103:differences. What has been learned Jonathan Dushoff, Maria-Inés Benito, 12115–12120. Doug Rusch, Aaron Halpern, and Shibu 19. Baker BJ, Tyson GW, Webb RI, Flanaganfrom these studies pertains mostly to J, Hugenholtz P, et al. (2006) Lineages ofcore housekeeping functions, such Yooseph for helpful discussions, and acidophilic archaea revealed by community Melinda Simmons, Merry Youle, and three genomic analysis. Science 314: 1933– translation and DNA metabolism, anonymous reviewers for helpful comments 20. Angly FE, Felts B, Breitbart M, Salamon P,and to other very ancient processes Edwards RA, et al. (2006) The marine viromes on the manuscript. The writing of this[29,30]. It was not until comparisons paper was supported by National Science of four oceanic regions. PLoS Biol 4: e368. doi:10.1371/journal.pbio.0040368were made between closely related Foundation Assembling the Tree of Life 21. Edwards RA, Rohwer F (2005) Viralorganisms that we began to understand Grant 0228651 to Jonathan A. Eisen and by metagenomics. Nat Rev Microbiol 3: 504– that occurred on shorter 22. Leadbetter JR (2003) Cultivation of recalcitrant the Defense Advanced Research Projects microbes: Cells are alive, well and revealingtime scales, such as selection, gene Agency under grants HR0011-05-1-0057 their secrets in the 21st century laboratory.transfer, and mutation processes [31]. and FA9550-06-1-0478. Curr Opin Microbiol 6: 274–281. PLoS Biology | 0004 March 2007 | Volume 5 | Issue 3 | e82
  5. 5. 23. Perna NT, Plunkett G 3rd, Burland V, Mau B, II Gobal Ocean Sampling expedition: metagenomics of microbial communities. Glasner JD, et al. (2001) Genome sequence of Northwest Atlantic through Eastern Tropical Science 308: 554–557. enterohaemorrhagic Escherichia coli O157:H7. Pacific. PLoS Biol 5: e77. doi:10.1371/journal. 33. Edwards RA, Rodriguez-Brito B, Wegley L, Nature 409: 529–533. pbio.0050077 Haynes M, Breitbart M, et al. (2006) Using24. Beja O, Aravind L, Koonin EV, Suzuki MT, 28. Johnson PL, Slatkin M (2006) Inference pyrosequencing to shed light on deep mine Hadd A, et al. (2000) Bacterial rhodopsin: of population genetic parameters in microbial ecology. BMC Genomics 7: 57. Evidence for a new type of phototrophy in the metagenomics: A clean look at messy data. 34. Rodriguez-Brito B, Rohwer F, Edwards RA sea. Science 289: 1902–1906. Genome Res 16: 1320–1327. (2006) An application of statistics to comparative25. Yooseph S, Sutton G, Rusch DB, Halpern AL, 29. Koonin EV, Mushegian AR (1996) Complete metagenomics. BMC Bioinformatics 7: 162. Williamson SJ, et al. (2007) The Sorcerer II genome sequences of cellular life forms: 35. DeLong EF (2005) Microbial community Global Ocean Sampling expedition: Expanding Glimpses of theoretical evolutionary genomics. genomics in the ocean. Nat Rev Microbiol 3: the universe of protein families. PLoS Biol 5: Curr Opin Genet Dev 6: 757–762. 459–469. e16. DOI: 10.1371/journal.pbio.0050016 30. Mushegian AR, Koonin EV (1996) A minimal 36. DeLong EF, Preston CM, Mincer T, Rich V,26. Wu D, Daugherty SC, Van Aken SE, Pai gene set for cellular life derived by comparison Hallam SJ, et al. (2006) Community genomics GH, Watkins KL, et al. (2006) Metabolic of complete bacterial genomes. Proc Natl Acad among stratified microbial assemblages in the complementarity and genomics of the dual Sci U S A 93: 10268–10273. ocean’s interior. Science 311: 496–503. bacterial symbiosis of sharpshooters. PLoS Biol 31. Eisen JA (2001) Gastrogenomics. Nature 409: 37. Worden AZ, Cuvelier ML, Bartlett DH 4: e188. doi:10.1371/journal.pbio.0040188 463, 465–466. (2006) In-depth analyses of marine microbial27. Rusch DB, Halpern AL, Sutton G, Heidelberg 32. Tringe SG, von Mering C, Kobayashi A, community genomics. Trends Microbiol 14: KB, Williamson S, et al. (2007) The Sorcerer Salamov AA, Chen K, et al. (2005) Comparative 331–336. PLoS Biology | 0005 March 2007 | Volume 5 | Issue 3 | e82