• Total number of prokaryotic cells on Earth: 4–6 × 10^30
• Less than 0.1% are culturable
• The correct culture conditions for the remaining 99.9% have yet to be discovered
• Metagenomics presently offers a way to access unculturable microorganisms
because it is a culture-independent way to study them.
• It involves extracting DNA directly from an environmental sample (e.g.
seawater, soil, the human gut) and then studying the DNA sample.
“The application of modern genomics techniques to the study of communities of
microbial organisms directly in their natural environments, bypassing the need for
isolation and lab cultivation of individual species”
- Kevin Chen and Lior Pachter
• Study of metagenomes, genetic material recovered directly
from environmental samples.
• Also referred to as environmental genomics, ecogenomics, or community genomics.
• The term "metagenomics" was first used by Jo Handelsman,
Jon Clardy, Robert M. Goodman, and others,
and first appeared in publication in 1998
• Late 17th century, Anton van Leeuwenhoek:
• First metagenomicist, who directly studied organisms from pond water and his own teeth.
• Cell culture evolved; 16S rRNA sequencing of culturable microbes
• If an organism could not be cultured, it could not be classified.
• Discrepancies observed:
• (1) Number of organisms under microscope in conflict with amount on plates.
• Ex: Aquatic culture differed by 4-6 orders of magnitude from direct observation.
• (2) Cellular activities in situ conflicted with activities in culture.
• Ex: Sulfolobus acidocaldarius in hot springs grew at lower temperatures than required
• (3) Cells are viable but unculturable.
• Norman Pace proposed the idea of cloning DNA directly from environmental samples in 1985
• The first report, published by Pace and colleagues in 1991, described non-functional genes.
Healy reported the metagenomic isolation of functional genes from "zoolibraries"
constructed from a complex culture of environmental organisms grown in the
laboratory on dried grasses in 1995
After leaving the Pace laboratory, Edward DeLong continued in the field and has
published work that has largely laid the groundwork for environmental phylogenies based
on signature 16S sequences, beginning with his group's construction of libraries
from marine samples
In 2002, Mya Breitbart & Forest Rohwer, and colleagues used shotgun sequencing to show that
200 liters of seawater contains over 5000 different viruses.
In 2003, Craig Venter led the Global Ocean Sampling Expedition (GOS), circumnavigating the
globe and collecting metagenomic samples throughout the journey. All of these samples were
sequenced using shotgun sequencing, in the hope that new genomes (and therefore new organisms)
would be identified.
The pilot project, conducted in the Sargasso Sea, found DNA from nearly 2000 different species,
including 148 types of bacteria never before seen.
Venter has circumnavigated the globe and thoroughly explored the West Coast of the
United States, and completed a two-year expedition to explore
the Baltic, Mediterranean and Black Seas. Analysis of the metagenomic data collected
during this journey revealed two groups of organisms, one composed of taxa adapted to
environmental conditions of 'feast or famine', and a second composed of relatively fewer
but more abundantly and widely distributed taxa primarily composed of plankton
In 2004, Gene Tyson, Jill Banfield, and colleagues at the University of California,
Berkeley and the Joint Genome Institute sequenced DNA extracted from an acid mine
drainage system.
In 2005 Stephan C. Schuster at Penn State University and colleagues published the first
sequences of an environmental sample generated with high-throughput sequencing, in this
case massively parallel pyrosequencing developed by 454 Life Sciences
METAGENOMICS AND SYMBIOSIS
Many microorganisms that live in symbiotic relationships with their hosts are difficult to
culture away from the host and are therefore prime candidates for metagenomics.
• E.g. the aphid and Buchnera
• First example of genomics on an uncultured microorganism.
• Buchnera has lost almost 2000 genes since it entered the symbiotic relationship 200–250 million
years ago
• It contains only 564 genes
• It does not conduct many of the life functions
The deep-sea tube worm, Riftia pachyptila, and a bacterium (Boetius, 2005).
• These creatures live in harsh environments near thermal vents 2600 m below the
ocean surface.
• The tube worm provides the bacterium with carbon dioxide, hydrogen sulfide and
oxygen, which it accumulates from the seawater.
• The bacterium converts the carbon dioxide to amino acids and sugars needed by
the tube worm, using the hydrogen sulfide for energy
METAGENOME OF EXTREME HABITATS
• Halophilic environments
• Glacial
• Deep sea
• Desert
• Metagenomic analyses of seawater revealed some interesting aspects
of ocean-dwelling microorganisms.
• More than one million genes were sequenced and deposited in the
• Groups of bacteria that were not previously known to transduce light
energy appear to contain genes for such a function, e.g. rhodopsin.
• Metagenomic analysis of the biofilm led to the computer-based
reconstruction of the genomes of some of the community members.
• A model for the cycling of carbon, nitrogen and metals in the acid
mine drainage environment was developed.
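As a rough illustration of how sequenced fragments can be assigned to community members during genome reconstruction, the sketch below groups contigs by GC content, one signal commonly used for this kind of binning. The contig names, the sequences, and the 0.5 threshold are invented for illustration and are not values from the acid mine drainage study.

```python
def gc_content(seq):
    """Fraction of G and C bases in a DNA sequence."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return (seq.count("G") + seq.count("C")) / len(seq)


def bin_by_gc(contigs, threshold=0.5):
    """Split contigs into a low-GC and a high-GC bin (hypothetical threshold)."""
    bins = {"low_gc": [], "high_gc": []}
    for name, seq in contigs.items():
        key = "high_gc" if gc_content(seq) >= threshold else "low_gc"
        bins[key].append(name)
    return bins


# Hypothetical contigs, invented for this example.
contigs = {
    "contig_1": "ATATTTAAGCATTA",
    "contig_2": "GCGCGGCCATGCGC",
    "contig_3": "ATTTAAATGCAATT",
}
bins = bin_by_gc(contigs)
print(bins)
```

In practice a single signal is rarely enough; GC content is typically combined with other evidence such as read depth, so this should be read as a toy version of the idea only.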
• The human intestinal microbiota is composed
of 10^13 to 10^14 microorganisms
• Collective genome (‘‘microbiome’’) contains
at least 100 times as many genes as our own
• About 10 to 100 trillion microbes inhabit our
• The greatest number residing in the distal gut.
• They synthesize essential amino acids and
vitamins and process components of
otherwise indigestible contributions to our
• 70 divisions of Bacteria and 13 divisions of Archaea described to date
• The distal gut and fecal microbiota was dominated by just two bacterial divisions,
the Bacteroidetes and the Firmicutes, which made up >99% of the identified
phylogenetic types, and by one prominent methanogenic archaeon,
• The human distal gut microbiome is estimated to contain >100 times as many
genes as our 2.85–billion base pair (bp) human genome.
• The oral metagenome has also been studied
Metagenomic studies have revealed that each person carries a unique microbial community in his
or her gastrointestinal tract; in fact these communities have been called a ‘second fingerprint’
because they provide a personal signature for each of us.
ACID MINE DRAINAGE METAGENOME
6 species identified with 16S rRNA
10X coverage of dominant species
• carbon fixation
N2-fixation genes found only in a minor community member
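The "10X coverage" figure can be related to sequencing effort with the standard expected-coverage formula c = N × L / G (number of reads × read length / genome size). A minimal sketch follows. The read count, read length, and genome size are hypothetical numbers chosen only so that the result is 10×; they are not taken from the study. The covered-fraction estimate additionally assumes that reads land uniformly at random on the genome (the Lander-Waterman model).

```python
import math


def expected_coverage(num_reads, read_length, genome_size):
    """Average number of times each base is sequenced: c = N * L / G."""
    return num_reads * read_length / genome_size


def fraction_covered(coverage):
    """Expected fraction of the genome hit by at least one read,
    assuming reads are placed uniformly at random (Poisson approximation)."""
    return 1.0 - math.exp(-coverage)


# Hypothetical sequencing run for one dominant community member.
cov = expected_coverage(num_reads=25_000, read_length=800, genome_size=2_000_000)
print(f"coverage: {cov:.1f}x")
print(f"expected fraction of genome covered: {fraction_covered(cov):.5f}")
```

The same formula explains why minor community members are hard to reconstruct: their share of the reads, and hence their coverage, scales with their abundance in the sample.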
• Scope of diversity: Sargasso Sea
– Oligotrophic environment
– More diverse than expected
• Sequenced 1 × 10^9 bases
• Found 1.2 million new genes
• 794,061 open reading frames with no known function
• 69,718 open reading frames for energy transduction
– 782 rhodopsin-like photoreceptors
• 1412 rRNA genes, 148 previously unknown phylotypes
(97% similarity cut off)
– α- and γ- Proteobacteria dominant groups
Venter, J.C. 2004. Science 304:66
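To make the 97% similarity cutoff concrete, the following sketch clusters rRNA sequences into phylotypes greedily: a sequence joins the first cluster whose representative it matches at 97% identity or more, and otherwise founds a new phylotype. It assumes the sequences are already aligned to equal length and uses a simple per-position identity; the toy sequences are invented, and real analyses rely on proper alignment and dedicated clustering tools.

```python
def identity(a, b):
    """Fraction of matching positions between two aligned, equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    matches = sum(1 for x, y in zip(a, b) if x == y)
    return matches / len(a)


def cluster_phylotypes(seqs, cutoff=0.97):
    """Greedy clustering: each sequence joins the first cluster whose
    representative it matches at >= cutoff, else it starts a new cluster."""
    clusters = []  # list of (representative, members)
    for seq in seqs:
        for rep, members in clusters:
            if identity(seq, rep) >= cutoff:
                members.append(seq)
                break
        else:
            clusters.append((seq, [seq]))
    return clusters


# Invented 100-base toy sequences.
base = "ACGT" * 25
near = "T" + base[1:]      # 1 mismatch  -> 99% identity to base
far = "TTTTT" + base[5:]   # 4 mismatches -> 96% identity to base

clusters = cluster_phylotypes([base, near, far])
print(len(clusters), "phylotypes")
```

Here `near` falls into the same phylotype as `base`, while `far` drops just below the cutoff and is counted as a separate phylotype.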
• Data Storage:
– Metagenomic Library – 2 Approaches
• Function-Driven: Focuses on activity of target protein and clones that express a given
• Sequence-Driven: Relies on conserved DNA to design PCR primers and hybridization
probes; gives functional information about the organism.
–“Evolutionary Chronometer:” Very slow mutation rate.
–Universal and functionally similar
–16S rRNA sequences used.
•Data Collection Methods:
–Initially, direct sequencing of RNA and sequencing of DNA generated by reverse transcription.
–Progressed to PCR
TWO APPROACHES FOR METAGENOMICS
• In the first approach, known as
‘sequence-driven metagenomics’,
DNA from the environment of
interest is sequenced and
subjected to computational analysis.
• The metagenomic sequences are
compared to sequences deposited
in publicly available databases such
• The genes are then collected into
groups of similar predicted
function, and the distribution of
various functions and types of
proteins that conduct those
functions can be assessed.
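The grouping step of the sequence-driven approach can be sketched as a simple tally. The snippet assumes each metagenomic gene has already been assigned a predicted function from its best database match (or none, if nothing similar was found). The gene identifiers and function labels are hypothetical.

```python
from collections import Counter

# Hypothetical best-hit annotations: gene id -> predicted function
# (None means no similar sequence was found in the database).
annotations = {
    "gene_001": "energy transduction",
    "gene_002": "carbon fixation",
    "gene_003": None,
    "gene_004": "energy transduction",
    "gene_005": None,
}


def function_distribution(annotations):
    """Return the fraction of genes falling into each predicted-function group."""
    counts = Counter(func or "unknown function" for func in annotations.values())
    total = sum(counts.values())
    return {func: n / total for func, n in counts.items()}


dist = function_distribution(annotations)
for func, frac in sorted(dist.items(), key=lambda kv: -kv[1]):
    print(f"{func}: {frac:.0%}")
```

Genes without a database match end up in the "unknown function" group, which is where this approach stops being informative.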
• In the second approach, ‘function-driven
metagenomics’, the DNA
extracted from the environment is
also captured and stored in a
surrogate host, but instead of
sequencing it, scientists screen the
captured fragments of DNA, or
‘clones’, for a certain function.
• The function must be absent in the
surrogate host so that acquisition
of the function can be attributed to
the metagenomic DNA.
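The logic of a function-driven screen, including the requirement that the surrogate host lacks the function, can be sketched as follows. The clone names, the assay results, and the `screen_clones` helper are hypothetical and exist only for this illustration.

```python
def screen_clones(assay_results, host_control_active):
    """Return the clones that show the screened function.

    assay_results: dict mapping clone id -> True if the clone shows activity.
    host_control_active: True if the host without any insert shows activity.
    """
    if host_control_active:
        # If the host already has the function, activity in a clone
        # cannot be attributed to the metagenomic DNA.
        raise ValueError("host already has the function; screen is uninformative")
    return sorted(clone for clone, active in assay_results.items() if active)


# Hypothetical plate read-out.
assay_results = {
    "clone_A1": False,
    "clone_A2": True,
    "clone_B7": False,
    "clone_C3": True,
}
hits = screen_clones(assay_results, host_control_active=False)
print(hits)
```

The negative control on the empty host is what allows a positive clone to be credited to its insert rather than to the host background.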
LIMITATIONS OF TWO APPROACHES
• The sequence driven approach
• limited existing knowledge: if a metagenomic gene does not look like a gene
of known function deposited in the databases, then little can be learned
about the gene or its product from sequence alone.
• The function driven approach
• most genes from organisms in wild communities cannot be expressed easily
by a given surrogate host
Therefore, the two approaches are complementary and should be pursued in
Nucleic Acid Extraction:
Cell Extraction and Direct Lysis
Cell lysis (chemical, enzymatic or mechanical) followed by removal of cell
fragments and nucleic acid precipitation and purification.
• Genome enrichment:
• Sample enrichment enhances the screening of metagenomic libraries for a
particular gene of interest, the proportion of which is generally smaller than
the total nucleic acid content.
• Stable isotope probing (SIP) and 5-Bromo-2-deoxyuridine labeling of DNA or
RNA, followed by density-gradient centrifugal separation.
• Suppressive subtractive hybridization (SSH)
• Phage display
• DNA microarray
• Nucleic acid extraction and enrichment technologies
• Genome and gene enrichment
• Metagenomic libraries
• Transcriptome libraries
• Metagenome sequencing
PCR is used to probe genomes for specific metabolic or biodegradative
•Primer design based on known sequence information
•Amplification limited mainly to gene fragments rather than full-length
genes, requiring additional procedures to attain the full-length genes
•RT-PCR has been used to recover genes from environmental samples since
RNA is a more sensitive biomarker than DNA
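Primer-based recovery can be illustrated with a toy in-silico PCR: the forward primer is located on the template, the reverse primer is located through its reverse complement, and the stretch between them is the amplicon. The sketch uses exact string matching on one strand only, and the template and primer sequences are invented.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Reverse complement of an uppercase DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def in_silico_pcr(template, forward, reverse):
    """Return the amplicon (primers included), or None if a primer has no exact match."""
    start = template.find(forward)
    if start == -1:
        return None
    # The reverse primer anneals to the opposite strand, so search for
    # its reverse complement downstream of the forward primer.
    rc = reverse_complement(reverse)
    end = template.find(rc, start + len(forward))
    if end == -1:
        return None
    return template[start:end + len(rc)]


# Invented template and primers.
template = "TTGACCATGGCTAAGCTTGGATCCGTACCAGT"
forward = "ATGGCTAAG"
reverse = "GGTACGGATCC"

amplicon = in_silico_pcr(template, forward, reverse)
print(amplicon)
```

Only the region bounded by the two primers is returned, which mirrors the point above that PCR tends to yield gene fragments and that flanking sequence must be recovered by other means.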
• Metagenome sequencing:
• Complete metagenome sequencing using large fragments of genomic DNA
from uncultured microorganisms.
• The objectives have been to sequence and identify the thousands of viral
and prokaryotic genomes as well as lower eukaryotic species present in
small environmental samples such as a gram of soil or liter of seawater.
– Too much data?
• Most genes are not identifiable
– Contamination, chimeric clone sequences
– Extraction problems
– Requires proteomics or expression studies to demonstrate phenotypic
– Need a standard method for annotating genomes
– Requires high throughput instrumentation – not readily available to most
• Can only progress as library technology progresses, including sequencing
FUTURE OF METAGENOMICS
• To identify new enzymes & antibiotics
• To assess the effects of age, diet, and pathologic states (e.g.,
inflammatory bowel diseases, obesity, and cancer) on the distal gut
microbiome of humans living in different environments
• Study of more exotic habitats
• Study antibiotic resistance in soil microbes
• Improved bioinformatics will quicken analysis for library profiling
• Investigating ancient DNA remnants
• Discoveries such as phylogenetic tags (rRNA genes, etc.) will give
momentum to the growing field
• Learning about novel pathways will yield knowledge about currently
nonculturable bacteria that may then make it possible to culture them