Eisen JA (2007) Environmental Shotgun Sequencing: Its Potential and Challenges for Studying the Hidden World of Microbes. PLoS Biol 5(3): e82. doi:10.1371/journal.pbio.0050082
Experimenting with posting OpenAccess papers on Slideshare
2. Table 1. Some Major Methods for Studying Individual Microbes Found in the Environment
Method Summary Comments
Microscopy Microbial phenotypes can be studied by making them more visible. In conjunction The appearance of microbes is not a reliable indicator of
with other methods, such as staining, microscopy can also be used to count taxa what type of microbe one is looking at.
and make inferences about biological processes.
Culturing Single cells of a particular microbial type are grown in isolation from other This is the best way to learn about the biology of a
organisms. This can be done in liquid or solid growth media. particular organism. However, many microbes are
uncultured (i.e., have never been grown in the lab in
isolation from other organisms) and may be unculturable
(i.e., may not be able to grow without other organisms).
rRNA-PCR The key aspects of this method are the following: (a) all cell-based organisms This method revolutionized microbiology in the 1980s by
possess the same rRNA genes (albeit with different underlying sequences); (b) PCR allowing the types and numbers of microbes present in
is used to make billions of copies of basically each and every rRNA gene present in a sample to be rapidly characterized. However, there are
a sample; this amplifies the rRNA signal relative to the noise of thousands of other some biases in the process that make it not perfect for all
genes present in each organism’s DNA; (c) sequencing and phylogenetic analysis aspects of typing and counting.
places rRNA genes on the rRNA tree of life; the position on the tree is used to infer
what type of organism (a.k.a. phylotype) the gene came from; and (d) the numbers
of each microbe type are estimated from the number of times the same rRNA gene
is seen.
Shotgun genome The DNA from an organism is isolated and broken into small fragments, and then This has now been applied to over 1,000 microbes, as well
sequencing of cultured portions of these fragments are sequenced, usually with the aid of sequencing as some multicellular species, and has provided a much
species machines. The fragments are then assembled into larger pieces by looking deeper understanding of the biology and evolution of life.
for overlaps in the sequence each possesses. The complete genome can be One limitation is that each genome sequence is usually a
determined by filling in gaps between the larger pieces. snapshot of one or a few individuals.
Metagenomics DNA is directly isolated from an environmental sample and then sequenced. This method allows one to sample the genomes of
One approach to doing this is to select particular pieces of interest (e.g., those microbes without culturing them. It can be used both for
containing interesting rRNA genes) and sequence them. An alternative is ESS, typing and counting taxa and for making predictions of
which is shotgun genome sequencing as described above, but applied to an their biological functions.
environmental sample with multiple organisms, rather than to a single cultured
organism.
doi:10.1371/journal.pbio.0050082.t001
The selective targeting of a single Certainly, many challenges remain the phylotypes and study its properties
gene makes rRNA-PCR an efficient before we can fully realize the potential in the lab. Unfortunately, many, if
method for deep community sampling of ESS for the typing and counting of not most, key microbes have not yet
[18]. However, this efficiency comes species, including making automated been cultured [22]. Thus, for many
with limitations, most of which are yet accurate phylogenetic trees of every years, the only alternative was to
complemented or circumvented by the gene, determining which genes are make predictions about the biology of
randomness and breadth of ESS. For most useful for which taxa, combining particular phylotypes based on what
example, examination of the random data from different genes even when was known about related organisms.
samples of rRNA sequences obtained we do not know if they come from Unfortunately, this too does not work
through ESS has already led to the the same organisms, building up well for microbes since very closely
discovery of new taxa—taxa that were databases of genes other than rRNA, related organisms frequently have
completely missed by PCR because of and making up for the lack of depth of major biological differences. For
its inability to sample all taxa equally sampling. If these challenges are met, example, Escherichia coli K12 and E.
well (e.g., [19]). In addition, ESS ESS has the potential to rewrite much coli O157:H7 are strains of the same
provides the first robust sampling of of what we thought we knew about the species (and considered to be the same
genes other than rRNA, and many of phylogenetic diversity of microbial life. phylotype), with genomes containing
these genes can be more useful for only about 4,000 genes, yet each
some aspects of typing and counting. What Are They Doing? Top Down possesses hundreds of functionally
Some universal protein coding and Bottom Up Approaches to important genes not seen in the
genes are better than rRNA both for Understanding Functions in other strain [23]. Such differences
distinguishing closely related strains are routine in microbes, and thus one
Communities
(because of third position variation in cannot make any useful inferences
codons) and for estimating numbers A community is, of course, more about what particular phylotypes are
of individuals (because they vary less than a list of types of organisms. doing (e.g., type of metabolism, growth
in copy number between species One approach to understanding properties, role in nutrient cycling, or
than do rRNA genes) [10]. Perhaps the properties and functioning of pathogenicity) based on the activities of
most significantly, ESS is providing a microbial community is to start their relatives.
groundbreaking insights into the with studies of the different types of These difficulties—the inability
diversity of viruses [20,21], which lack organisms and build up from these to culture most microbes and the
rRNA genes and thus were left out of individuals to the community. Ideally, functional disparities between close
the previous revolution. to do this one would culture each of relatives—led to one of the first kinds
PLoS Biology | www.plosbiology.org 0002 March 2007 | Volume 5 | Issue 3 | e82
3. Table 2. Methods of Binning
Method Description Comments
Genome assembly Identify regions of overlap between different fragments Getting deep enough sampling for this to work is very expensive
from the same organism to build larger contiguous pieces except for low diversity systems or for very abundant taxa.
(contigs).
Reference genome alignment Identify ESS fragments or contigs that are very similar (a) One of the most effective ways to sort through ESS data, if the
to already assembled sections of the genome of single reference genome is very closely related to an organism in the sample;
microbial types. (b) the reason why more reference genomes are needed; (c) does
not handle regions present in uncultured organisms but not in the
reference.
Phylogenetic analysis Build evolutionary trees of genes encoded by ESS fragments (a) Very powerful, but level of resolution depends on whether
or contigs. Assign fragments or contigs to taxonomic fragments encode useful phylogenetic markers and on how well
groups based on nearest neighbor(s) in trees. sampled the database is for the neighbor analysis; (b) would work
much better if more genomes were available from across the tree of
life.
Word frequency and nucleotide Measure word frequency and composition of each (a) Has the potential to work because organisms sometimes have
composition analysis fragment. Group by clustering algorithms or principal “signatures” of word frequencies that are found throughout the
component analysis. genome and are different between species; (b) very challenging for
small fragments.
Population genetics Build alignments of fragments or contigs with similarity May be most useful as a way of subdividing bins created by other
to each other (but not as much as needed for assembly). methods.
Examine haplotype structure, predicted effective
population size, and synonymous and non synonymous
substitution patterns.
Note that some methods can be applied to ESS fragments or to bins identified by other methods.
doi:10.1371/journal.pbio.0050082.t002
of metagenomic analyses, wherein and species), and these compartments trying to bin? Is it fragments from the
predictions of function were made matter. The key challenge in analyzing same chromosome from a single cell,
from analysis of the sequence of large ESS data is to sort the DNA fragments which would be useful for studying
DNA fragments from representatives (which are usually less than 1,000 base chromosome structure? If so, then
of known phylotypes. This approach pairs long relative to genome sizes of perhaps genome assembly methods
has provided some stunning insights, millions or billions of bases) into bins are the best. What if instead, as in the
such as the discovery of a novel form that correspond to compartments in the sharpshooter example, we are trying to
of phototrophy in the oceans [24]. system being studied. have each bin include every fragment
However, this large insert approach A recent study by myself and that came from a particular species,
has the same limitation as predicting colleagues illustrates the importance knowledge which may be useful for
properties from characterized of compartments when interpreting predicting community metabolic
relatives—a single cell cannot possibly ESS data. When we analyzed ESS data potential? If the level of genetic
represent the biological functions of all from symbionts living inside the gut polymorphism among individual
members of a phylotype. of the glassy-winged sharpshooter (an cells from the same species is high,
ESS provides an alternative, more insect that has a nutrient-limited diet), then genome assembly methods may
global way of assessing biological we were able to bin the data to two not work well (the polymorphisms
functions in microbial communities. As distinct symbionts [26]. We then could will break up assemblies). A better
when using the large insert approach, infer from those data that one of the approach might be to look for species-
functions can be predicted from symbionts synthesizes amino acids for specific “word” frequencies in the
sequences. However, in this case the the host while the other synthesizes DNA, such as ones created by patterns
predicted functions represent a random the needed vitamins and cofactors. in codon usage. The challenge is, how
sampling of those encoded in the Modeling and understanding of this do we tune the methods to find the
genomes of all the organisms present. ecosystem are greatly enhanced by the right target level of resolution? If we
This approach has unquestionably demonstration of this complementary are too stringent, most bins will include
been wildly successful in terms of gene division of labor, in comparison to only a few fragments. But if we are
discovery. For example, analysis of simply knowing that amino acids, too relaxed, we will create artificial
ESS data has revealed novel forms of vitamins, and cofactors are made by constructs that may prove biologically
every type of gene family examined, as “symbionts.” misleading, such as grouping together
well as a great number of completely How does one go about binning sequences from different species. To
novel families (e.g., [25]). However, ESS data? A variety of approaches have make matters more complex, most
there is a major caveat when using been developed, some of which are likely the stringency needed will vary
ESS data to make community-level described in Table 2. In considering for different taxa present in the sample.
inferences. Ecosystems are more than the different binning methods and Another critical issue is the diversity
just a bag of genes—they are made up of their limitations, the first question of the system under study. Generally,
compartments (e.g., cells, chromosomes, one needs to ask is, what are we binning works better when there are
PLoS Biology | www.plosbiology.org 0003 March 2007 | Volume 5 | Issue 3 | e82
4. few different phylotypes present, all Similarly, the initial comparisons of References
1. Gould SJ (1996) Full house: The spread of
of which are distantly related and ESS data involved comparisons of wildly excellence from Plato to Darwin. New York:
form discrete populations. This is why different environments [32], yielding Harmony Books. 244 p.
binning works well for the sharpshooter insights into the general structure of 2. Evans AV, Bellamy CL (1996) An inordinate
fondness for beetles. New York: Holt. 208 p.
system and other relatively isolated, communities. But as more comparisons 3. Woese C, Fox G (1977) Phylogenetic structure
low diversity environments. Binning are made between similar communities of the prokaryotic domain: The primary
increases in difficulty exponentially kingdoms. Proc Natl Acad Sci U S A 74: 5088–
[33,34], such as those sampled during 5090.
as the number of species increases: vertical and horizontal ocean transects 4. Mullis K, Faloona F (1987) Specific synthesis of
the populations and species start to [27,35–37], we will begin to learn DNA in vitro via a polymerase-catalyzed chain
reaction. Methods Enzymol 155: 335–350.
merge together, and the populations about shorter time scale processes such 5. Reysenbach AL, Giver LJ, Wickham GS, Pace
get more and more polymorphic and as migration, speciation, extinction, NR (1992) Differential amplification of rRNA
variable in relative abundance (such as genes by polymerase chain reaction. Appl
responses to disturbance, and Environ Microbiol 58: 3417–3418.
in the paper about the Global Ocean succession. It is from a combination 6. Medlin L, Elwood HJ, Stickel S, Sogin ML
Sampling expedition in this issue [27]). of both approaches—comparing (1988) The characterization of enzymatically
Further complicating binning is the amplified eukaryotic 16S-like ribosomal RNA-
both similar and very divergent coding regions. Gene 71: 491–500.
phenomenon of lateral gene transfer, communities—that we will be able to 7. Weisburg W, Barns S, Pelletier D, Lane D
where genes are exchanged between understand the fundamental rules of (1991) 16S ribosomal DNA amplification for
phylogenetic study. J Bacteriol 173: 697–703.
distantly related lineages at rates that microbial ecology and how they relate 8. Fleischmann RD, Adams MD, White O,
are high enough that random sampling to ecological principles seen in macro- Clayton RA, Kirkness EF, et al. (1995) Whole-
of a genome will frequently include genome random sequencing and assembly
organisms. of Haemophilus influenzae Rd. Science 269:
genes with multiple histories. 496–512.
Despite these challenges, I believe we Conclusions 9. Handelsman J (2004) Metagenomics:
can develop effective binning methods Application of genomics to uncultured
In promoting some of the exciting microorganisms. Microbiol Mol Biol Rev 68:
for complex communities. First, we 669–685.
opportunities with ESS, I do not
can combine different approaches 10. Venter JC, Remington K, Heidelberg
want to give the impression that it is JF, Halpern AL, Rusch D, et al. (2004)
together, such as using one method
flawless. It is helpful in this respect to Environmental genome shotgun sequencing of
to sort in a relaxed manner and then the Sargasso Sea. Science 304: 66–74.
compare ESS to the Internet. As with 11. Tyson GW, Chapman J, Hugenholtz P, Allen
using another to subdivide the bins
the Internet, ESS is a global portal for EE, Ram RJ, et al. (2004) Community structure
provided by the first method. Second, and metabolism through reconstruction of
looking at what occurs in a previously
we can incorporate new approaches microbial genomes from the environment.
hidden world. Making sense of it Nature 428: 37–43.
such as population genetics into
requires one to sort through massive, 12. Gill SR, Pop M, Deboy RT, Eckburg PB,
the analysis [28]. In addition, the Turnbaugh PJ, et al. (2006) Metagenomic
lessons learned here can be applied to random, fragmented collections of bits analysis of the human distal gut microbiome.
other aspects of metagenomics (e.g., of information. Such searches need Science 312: 1355–1359.
to be done with caution because any 13. Garcia Martin H, Ivanova N, Kunin V,
the counting and typing discussed Warnecke F, Barry KW, et al. (2006)
above) and provide insights into the time you analyze such a large amount Metagenomic analysis of two enhanced
nature of microbial genomes and the of data patterns can be found. In biological phosphorus removal (EBPR) sludge
communities. Nat Biotechnol 24: 1263–1269.
structure of microbial populations and addition, as with the Internet, there 14. Olsen GJ, Larsen N, Woese CR (1991) The
communities. is certainly some hype associated with ribosomal RNA database project. Nucleic Acids
ESS that gives relatively trivial findings Res 19: 2017–2021.
15. Cole JR, Chai B, Farris RJ, Wang Q, Kulam-
Comparative Metagenomics more attention than they deserve. Syed-Mohideen AS, et al. (2007) The ribosomal
So far, I have discussed issues relating Overall, though, I believe the hype database project (RDP-II): Introducing myRDP
space and quality controlled public data.
mostly to intrasample analysis of is deserved. As long as we treat ESS Nucleic Acids Res 35: D169–D172.
ESS data. However, the area with as a strong complement to existing 16. Pace NR (1997) A molecular view of microbial
methods, and we build the tools and diversity and the biosphere. Science 276: 734–
perhaps the most promise involves 740.
the comparative analysis of different databases necessary for people to use 17. Hugenholtz P, Pitulle C, Hershberger KL,
samples. This work parallels the the information, it will live up to its Pace NR (1998) Novel division level bacterial
diversity in a Yellowstone hot spring. J Bacteriol
comparative analysis of genomes of revolutionary potential. 180: 366–376.
cultured species. Initial studies of 18. Sogin ML, Morrison HG, Huber JA, Welch
that type compared distantly related Acknowledgments DM, Huse SM, et al. (2006) Microbial diversity
in the deep sea and the underexplored “rare
taxa with enormous biological I thank Simon Levin, Joshua Weitz, biosphere”. Proc Natl Acad Sci U S A 103:
differences. What has been learned Jonathan Dushoff, Maria-Inés Benito, 12115–12120.
Doug Rusch, Aaron Halpern, and Shibu 19. Baker BJ, Tyson GW, Webb RI, Flanagan
from these studies pertains mostly to J, Hugenholtz P, et al. (2006) Lineages of
core housekeeping functions, such Yooseph for helpful discussions, and acidophilic archaea revealed by community
Melinda Simmons, Merry Youle, and three genomic analysis. Science 314: 1933–1935.
as translation and DNA metabolism,
anonymous reviewers for helpful comments 20. Angly FE, Felts B, Breitbart M, Salamon P,
and to other very ancient processes Edwards RA, et al. (2006) The marine viromes
on the manuscript. The writing of this
[29,30]. It was not until comparisons paper was supported by National Science
of four oceanic regions. PLoS Biol 4: e368.
doi:10.1371/journal.pbio.0040368
were made between closely related Foundation Assembling the Tree of Life 21. Edwards RA, Rohwer F (2005) Viral
organisms that we began to understand Grant 0228651 to Jonathan A. Eisen and by metagenomics. Nat Rev Microbiol 3: 504–510.
events that occurred on shorter 22. Leadbetter JR (2003) Cultivation of recalcitrant
the Defense Advanced Research Projects
microbes: Cells are alive, well and revealing
time scales, such as selection, gene Agency under grants HR0011-05-1-0057 their secrets in the 21st century laboratory.
transfer, and mutation processes [31]. and FA9550-06-1-0478. Curr Opin Microbiol 6: 274–281.
PLoS Biology | www.plosbiology.org 0004 March 2007 | Volume 5 | Issue 3 | e82
5. 23. Perna NT, Plunkett G 3rd, Burland V, Mau B, II Gobal Ocean Sampling expedition: metagenomics of microbial communities.
Glasner JD, et al. (2001) Genome sequence of Northwest Atlantic through Eastern Tropical Science 308: 554–557.
enterohaemorrhagic Escherichia coli O157:H7. Pacific. PLoS Biol 5: e77. doi:10.1371/journal. 33. Edwards RA, Rodriguez-Brito B, Wegley L,
Nature 409: 529–533. pbio.0050077 Haynes M, Breitbart M, et al. (2006) Using
24. Beja O, Aravind L, Koonin EV, Suzuki MT, 28. Johnson PL, Slatkin M (2006) Inference pyrosequencing to shed light on deep mine
Hadd A, et al. (2000) Bacterial rhodopsin: of population genetic parameters in microbial ecology. BMC Genomics 7: 57.
Evidence for a new type of phototrophy in the metagenomics: A clean look at messy data. 34. Rodriguez-Brito B, Rohwer F, Edwards RA
sea. Science 289: 1902–1906. Genome Res 16: 1320–1327. (2006) An application of statistics to comparative
25. Yooseph S, Sutton G, Rusch DB, Halpern AL, 29. Koonin EV, Mushegian AR (1996) Complete metagenomics. BMC Bioinformatics 7: 162.
Williamson SJ, et al. (2007) The Sorcerer II genome sequences of cellular life forms: 35. DeLong EF (2005) Microbial community
Global Ocean Sampling expedition: Expanding Glimpses of theoretical evolutionary genomics. genomics in the ocean. Nat Rev Microbiol 3:
the universe of protein families. PLoS Biol 5: Curr Opin Genet Dev 6: 757–762. 459–469.
e16. DOI: 10.1371/journal.pbio.0050016 30. Mushegian AR, Koonin EV (1996) A minimal 36. DeLong EF, Preston CM, Mincer T, Rich V,
26. Wu D, Daugherty SC, Van Aken SE, Pai gene set for cellular life derived by comparison Hallam SJ, et al. (2006) Community genomics
GH, Watkins KL, et al. (2006) Metabolic of complete bacterial genomes. Proc Natl Acad among stratified microbial assemblages in the
complementarity and genomics of the dual Sci U S A 93: 10268–10273. ocean’s interior. Science 311: 496–503.
bacterial symbiosis of sharpshooters. PLoS Biol 31. Eisen JA (2001) Gastrogenomics. Nature 409: 37. Worden AZ, Cuvelier ML, Bartlett DH
4: e188. doi:10.1371/journal.pbio.0040188 463, 465–466. (2006) In-depth analyses of marine microbial
27. Rusch DB, Halpern AL, Sutton G, Heidelberg 32. Tringe SG, von Mering C, Kobayashi A, community genomics. Trends Microbiol 14:
KB, Williamson S, et al. (2007) The Sorcerer Salamov AA, Chen K, et al. (2005) Comparative 331–336.
PLoS Biology | www.plosbiology.org 0005 March 2007 | Volume 5 | Issue 3 | e82