Metagenomics is the study of genetic material recovered
directly from environmental samples.
While traditional microbiology, microbial genome
sequencing, and genomics rely upon cultivated clonal
cultures, early environmental gene sequencing cloned specific
genes to produce a profile of diversity in a natural sample.
TWO APPROACHES FOR METAGENOMICS
In the first approach, known as ‘sequence-driven metagenomics’, DNA from the
environment of interest is sequenced and subjected to computational analysis.
The metagenomic sequences are compared to sequences deposited
in publicly available databases such as GenBank.
The genes are then collected into groups of similar predicted
function, and the distribution of various functions and types of
proteins that conduct those functions can be assessed.
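To make the grouping step concrete, here is a minimal Python sketch. It assumes the similarity search against a database has already assigned each gene a predicted function; the gene IDs and function labels are invented for illustration, and the "unknown" entry stands in for genes with no useful database match.

```python
from collections import Counter

# Hypothetical best database hits: metagenomic gene ID -> predicted function.
# IDs and labels are invented; real ones would come from a similarity search.
predicted_function = {
    "gene_001": "nitrogen fixation",
    "gene_002": "ABC transporter",
    "gene_003": "ABC transporter",
    "gene_004": "unknown",
    "gene_005": "methanogenesis",
    "gene_006": "ABC transporter",
}

def function_profile(annotations: dict[str, str]) -> dict[str, float]:
    """Group genes by predicted function and return each group's share of all genes."""
    counts = Counter(annotations.values())
    total = sum(counts.values())
    # most_common() orders groups from largest to smallest
    return {func: n / total for func, n in counts.most_common()}

profile = function_profile(predicted_function)
for func, share in profile.items():
    print(f"{func:20s} {share:.2f}")
```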
In the second approach, known as ‘function-driven metagenomics’, the DNA extracted from the
environment is also captured and stored in a surrogate host, but
instead of sequencing it, scientists screen the captured fragments of
DNA, or ‘clones’, for a certain function.
The function must be absent in the surrogate host so that acquisition
of the function can be attributed to the metagenomic DNA.
LIMITATIONS OF TWO APPROACHES
The sequence-driven approach is limited by existing knowledge: if a metagenomic gene does not look like
a gene of known function deposited in the databases, then little can be
learned about the gene or its product from sequence alone.
The function-driven approach is limited because most genes from organisms in wild communities
cannot be expressed easily by a given surrogate host.
How it is used in bioinformatics:
The first step of metagenomic data analysis requires the execution of
certain pre-filtering steps, including the removal of redundant, low-quality
sequences and sequences of probable eukaryotic origin.
The methods available for the removal of contaminating eukaryotic
genomic DNA sequences include Eu-Detect and DeconSeq.
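A minimal sketch of the first two filters (redundancy and quality), assuming reads arrive as (sequence, quality) pairs with Phred+33 quality strings, as in most modern FASTQ files. The mean-quality threshold of 20 is an illustrative choice, not a recommended value, and screening for eukaryotic contamination is left to dedicated tools such as those named above.

```python
def mean_phred(quality: str, offset: int = 33) -> float:
    """Mean Phred score of a FASTQ quality string (Phred+33 encoding assumed)."""
    return sum(ord(ch) - offset for ch in quality) / len(quality)

def prefilter(reads, min_mean_q: float = 20.0):
    """Yield (seq, qual) pairs that are not exact duplicates of a kept read
    and whose mean quality reaches min_mean_q (threshold is an assumption)."""
    kept_seqs = set()
    for seq, qual in reads:
        if seq in kept_seqs:
            continue  # redundant: an identical sequence was already kept
        if mean_phred(qual) < min_mean_q:
            continue  # low quality
        kept_seqs.add(seq)
        yield seq, qual

# Toy input: one duplicated read and one low-quality read ('#' = Phred 2).
reads = [("ACGT", "IIII"), ("ACGT", "IIII"), ("TTGA", "####")]
kept = list(prefilter(reads))
print(kept)
```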
Comparative analyses between metagenomes can provide
additional insight into the function of complex microbial
communities and their role in host health.
Pairwise or multiple comparisons between metagenomes can be
made at the level of sequence composition (comparing GC-content
or genome size), taxonomic diversity, or functional complement.
Consequently, metadata on the environmental context of the
metagenomic sample is especially important in comparative
analyses, as it provides researchers with the ability to study the
effect of habitat upon community structure and function.
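As a small illustration of a comparison at the level of sequence composition, the sketch below computes the GC-content of two toy metagenomes. The read sets are invented, and ambiguous bases such as N are simply ignored.

```python
def gc_content(seqs) -> float:
    """Fraction of G and C among unambiguous A/C/G/T bases across all reads."""
    gc = at = 0
    for s in seqs:
        s = s.upper()
        gc += s.count("G") + s.count("C")
        at += s.count("A") + s.count("T")
    return gc / (gc + at) if gc + at else 0.0

# Two toy "metagenomes" (invented reads) compared by composition.
sample_a = ["GGCCGCAT", "GCGCGCTA"]
sample_b = ["ATATTAGC", "AATTACGT"]
print(f"A: {gc_content(sample_a):.2f}  B: {gc_content(sample_b):.2f}")
```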
1. KEVIN CHEN
2. LIOR PACHTER
PUBLISHED: JULY 12, 2005
Shotgun sequencing involves randomly breaking up DNA sequences into lots of
small pieces and then reassembling the sequence by looking for regions of overlap.
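The fragment-and-reassemble idea can be sketched in Python with a greedy overlap merger. This is a teaching toy under strong assumptions (error-free reads, a single strand, no long repeats); production assemblers typically use overlap or de Bruijn graphs and must handle sequencing errors and repeats.

```python
import random

def shotgun(genome: str, n_reads: int, read_len: int, seed: int = 0) -> list[str]:
    """Randomly 'break' a genome into fixed-length reads (error-free, one strand)."""
    rng = random.Random(seed)
    starts = [rng.randrange(len(genome) - read_len + 1) for _ in range(n_reads)]
    return [genome[s:s + read_len] for s in starts]

def overlap(a: str, b: str, min_len: int) -> int:
    """Longest k >= min_len such that the last k bases of a equal the first k of b."""
    for k in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:k]):
            return k
    return 0

def greedy_assemble(reads: list[str], min_len: int = 3) -> list[str]:
    """Repeatedly merge the pair of fragments with the longest overlap."""
    frags = list(dict.fromkeys(reads))  # drop exact duplicates, keep order
    # drop reads fully contained in another read
    frags = [r for r in frags if not any(r != o and r in o for o in frags)]
    while True:
        best_k, best_i, best_j = 0, -1, -1
        for i, a in enumerate(frags):
            for j, b in enumerate(frags):
                if i != j:
                    k = overlap(a, b, min_len)
                    if k > best_k:
                        best_k, best_i, best_j = k, i, j
        if best_k == 0:
            return frags  # no overlaps left: each remaining fragment is a contig
        merged = frags[best_i] + frags[best_j][best_k:]
        frags = [f for idx, f in enumerate(frags) if idx not in (best_i, best_j)] + [merged]

genome = "ATGGCGTGCAATGCCTAG"
reads = ["ATGGCGTG", "CGTGCAAT", "CAATGCCT", "TGCCTAG"]  # overlapping pieces of genome
print(greedy_assemble(reads))
```

Greedy merging is the simplest overlap strategy to follow by hand; its weakness is that an early wrong merge across a repeat cannot be undone, which is one reason repetitive sequence is hard to assemble.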
Large, complex mammalian genomes are difficult to clone.
Clone-by-clone sequencing is reliable and methodical, but time-consuming.
Shotgun sequencing was used by Fred Sanger and his colleagues to sequence small genomes such as those of viruses and bacteria.
The fragments are often of varying sizes, ranging from 2-20 kilobases to 200-300 kilobases.
Advantages of shotgun sequencing:
Much faster process than clone-by-clone sequencing, because the mapping stages are removed.
Uses a fraction of the DNA that clone-by-clone sequencing needs.
Efficient if there is an existing reference sequence.
Easier to assemble the genome sequence by aligning it to an
existing reference genome.
Faster and less expensive than methods requiring a genetic map.
Disadvantages of shotgun sequencing:
Vast amounts of computing power and sophisticated software are
required to assemble shotgun sequences together.
Errors in assembly are more likely to be made because a genetic
map is not used.
Such errors are easier to resolve than in other methods, and are minimized if a
reference genome can be used.
Best carried out when a reference genome is already available; otherwise,
assembly is very difficult without an existing genome to match it against.
Repetitive genomes and sequences can be more difficult to assemble.
The assembly of communities has strong similarities to the assembly of highly
polymorphic diploid eukaryotes, such as Ciona savignyi and Candida albicans,
if we view prokaryotic strains as analogous to eukaryotic haplotypes.
The main difference is that in a microbial community, the number of strains is unknown
and potentially large, and their relative abundance is also unknown and potentially
skewed, while in most eukaryotes we know a priori the number of haplotypes and their
relative abundance.
This disadvantage is mitigated somewhat by the small size and relative lack of repetitive
sequence in prokaryotic and viral genomes, so that the issue of distinguishing alleles from
paralogs and polymorphism from repetitive sequence is less acute.
We performed similar calculations for the three whale fall communities.
In addition, we considered the problem of assembling all genomes in these communities.
Since the 16S survey indicated that three dominant species constitute approximately half
the total abundance and all other species have roughly equal abundance, the Lander–
Waterman model implies that the expected coverage should be distributed as the mixture
of two Poissons with equal weight.
The results of these calculations are summarized. Similar results were obtained by Venter
et al. and Breitbart et al., and bioinformaticians use different software packages.
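A rough sketch of the two-Poisson coverage model described above. Under the Lander–Waterman model, per-base coverage at mean depth c is approximately Poisson(c), so the expected uncovered fraction of a genome is exp(-c). The sequencing effort, genome size, and number of rare species below are hypothetical values, not figures from the whale fall study, and all genomes are assumed to be the same size.

```python
import math

def poisson_pmf(k: int, lam: float) -> float:
    """P(K = k) for K ~ Poisson(lam), computed in log space to avoid overflow."""
    return math.exp(k * math.log(lam) - lam - math.lgamma(k + 1))

def coverage_mixture(k: int, lam_dom: float, lam_rare: float, w: float = 0.5) -> float:
    """P(coverage = k) under a w : (1 - w) mixture of two Poissons."""
    return w * poisson_pmf(k, lam_dom) + (1 - w) * poisson_pmf(k, lam_rare)

# Hypothetical numbers, chosen only for illustration:
total_bp = 100e6   # bases sequenced
genome_bp = 3e6    # assumed genome size, same for every species
n_rare = 50        # assumed number of equally abundant low-abundance species

# Half the sequence comes from 3 dominant species, half from the n_rare others.
lam_dom = (total_bp / 2) / (3 * genome_bp)
lam_rare = (total_bp / 2) / (n_rare * genome_bp)

# Lander–Waterman: expected fraction of a genome left uncovered is exp(-c).
print(f"dominant: {lam_dom:.1f}x coverage, {math.exp(-lam_dom):.2%} uncovered")
print(f"rare:     {lam_rare:.2f}x coverage, {math.exp(-lam_rare):.2%} uncovered")
```

The contrast between the two lines is the point: the same sequencing effort leaves dominant genomes nearly complete while most of each rare genome goes unsampled.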
Whole genome shotgun sequencing guided by bioinformatics pipelines: an optimized approach for an
Shotgun metagenomics sequencing allows researchers to comprehensively sample all genes in all organisms present in a given
complex sample. The method enables microbiologists to evaluate bacterial diversity and detect the abundance of microbes in
various environments. Shotgun metagenomics also provides a means to study unculturable microorganisms that are otherwise
difficult or impossible to analyze.
Phylogeny and Community Diversity
With regard to community diversity, one of the advantages of the WGS
approach is that it is less biased than PCR, which is known to suffer
from a host of problems.
Community modeling based on analysis of assembly data within
the Lander–Waterman model is beginning to show that species
abundance curves are not lognormal as previously thought.
New methods that take into account these naturally occurring
distributions are needed.
The number of new community shotgun sequencing projects continues to grow, promising
to provide vast quantities of sequence data for analysis.
Samples are being drawn from macroscopic environments such as the sea and air, as well
as from more contained communities such as the human mouth.
Exciting advances in our understanding of ecosystems, environments, and communities
will require creative solutions to numerous new bioinformatics problems.
We have briefly mentioned some of these: assembly (can co-assembly techniques be used
to assemble polymorphic genomes and complex communities?), binning (what is the best
way to combine diverse sources of information to bin scaffolds?), gene finding (how
should gene finding programs, which were designed for complete genes and genomes, be
adapted for low-coverage sequence?), fingerprinting (which clustering techniques are best
suited for discovering novel pathways and functional groups that allow communities to
adapt to their environments?), and MSA and phylogeny (how can we best construct trees
and alignments from fragmented data?).
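One simple form of binning groups scaffolds by sequence composition. The sketch below uses tetranucleotide frequencies with a greedy, threshold-based assignment; the threshold and toy scaffolds are illustrative assumptions. It uses composition alone, whereas the open question above is precisely how to combine composition with other signals such as coverage.

```python
import math
from itertools import product

KMERS = ["".join(p) for p in product("ACGT", repeat=4)]  # all 256 tetranucleotides

def tetra_profile(seq: str) -> list[float]:
    """Normalized tetranucleotide frequency vector (k-mers with non-ACGT bases skipped)."""
    counts = dict.fromkeys(KMERS, 0)
    for i in range(len(seq) - 3):
        kmer = seq[i:i + 4]
        if kmer in counts:
            counts[kmer] += 1
    total = sum(counts.values()) or 1
    return [counts[k] / total for k in KMERS]

def distance(p: list[float], q: list[float]) -> float:
    """Euclidean distance between two frequency vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

def bin_scaffolds(scaffolds: dict[str, str], threshold: float) -> list[list[str]]:
    """Single pass: put each scaffold in the first bin whose seed profile is close enough."""
    bins = []  # list of (seed_profile, member_names)
    for name, seq in scaffolds.items():
        prof = tetra_profile(seq)
        for seed, members in bins:
            if distance(prof, seed) < threshold:
                members.append(name)
                break
        else:
            bins.append((prof, [name]))  # no close bin: start a new one
    return [members for _, members in bins]

# Toy scaffolds with deliberately extreme composition (invented).
scaffolds = {"s1": "ATATATAT" * 10, "s2": "ATATATAT" * 12, "s3": "GCGCGCGC" * 10}
print(bin_scaffolds(scaffolds, threshold=0.3))
```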
Countless more challenges will likely emerge as WGS approaches are used to
tackle increasingly complex communities.
The reward for computational biologists who work on these problems will be the
satisfaction of contributing to the grand enterprise of understanding the total diversity of
life on our planet.
A5-miseq produces high-quality microbial genome assemblies on a laptop
computer without any parameter tuning. It does this by
automating the process of adapter trimming, quality filtering
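Adapter trimming, the first automated step listed for A5-miseq, can be sketched as exact matching of a 3' adapter, including a partial adapter cut off by the end of the read. This is not A5-miseq's implementation; the adapter string in the example is assumed to be the common Illumina adapter prefix, and real trimmers also tolerate mismatches.

```python
def trim_adapter(read: str, adapter: str, min_overlap: int = 3) -> str:
    """Remove a 3' adapter by exact matching (no mismatches tolerated)."""
    pos = read.find(adapter)
    if pos != -1:
        return read[:pos]  # full adapter present: drop it and everything after
    # The read may end partway through the adapter: test adapter prefixes
    # against the read's end, longest first, down to min_overlap bases.
    for k in range(min(len(adapter) - 1, len(read)), min_overlap - 1, -1):
        if read.endswith(adapter[:k]):
            return read[:-k]
    return read  # no adapter found

ADAPTER = "AGATCGGAAGAGC"  # assumed Illumina adapter prefix, for illustration
print(trim_adapter("ACGTACGTAGATCGGAAGAGCTTT", ADAPTER))
print(trim_adapter("ACGTACGTAGATC", ADAPTER))
```

The minimum overlap keeps the trimmer from clipping one or two bases that match the adapter's start purely by chance.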
A Galaxy-based framework consisting of publicly available
research software and specifically designed pipelines to build
complex, reproducible workflows for next-generation sequencing
microbiology data analysis.
Enabling microbiology researchers to conduct their own custom
analysis and data manipulation without software installation or
programming, Orione provides new opportunities for data-intensive
computational analyses in microbiology and metagenomics.
• Transport proteins
• Ecology and Environment
● Global Impacts.
Microbes play a critical role in
maintaining atmospheric balance: they
are major photosynthetic agents,
generate and consume greenhouse
gases, and are involved at all levels
of ecosystems and trophic chains.
● the waste from water treatment
● gasoline leaks on land or oil spills
in the oceans
● toxic chemicals
We are harnessing microbial power in
order to produce
● ethanol (from cellulose), hydrogen,
● Smart Farming. Microbes help our
● the “suppressive soil” phenomenon
(buffer effect against disease-
● soil enrichment and regeneration
The World Within.
Studying the human microbiome
may lead to valuable new tools
and guidelines in
● Human and animal nutrition
● Better understanding of complex
diseases (obesity, cancer,
● Drug discovery
● Preventative medicine
QIIME is an open-source bioinformatics pipeline
for performing microbiome analysis from raw
DNA sequencing data.
QIIME is designed to take users from raw
sequencing data generated on the Illumina or
other platforms through publication-quality
graphics and statistics.
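As an example of the kind of statistic such pipelines report, here is the Shannon diversity index computed from per-taxon counts for one sample. This is a plain Python sketch, not QIIME's implementation; it uses the natural logarithm, while some tools default to base 2.

```python
import math

def shannon(counts: list[int]) -> float:
    """Shannon index H' = -sum(p_i * ln p_i) over taxa with nonzero counts."""
    total = sum(counts)
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)

# Invented per-taxon read counts for a single sample.
sample_counts = [25, 10, 5, 0]
print(f"H' = {shannon(sample_counts):.3f}")
```

Taxa with zero counts are skipped, since p ln p tends to 0 as p tends to 0; an evenly spread sample gives the maximum value, ln of the number of taxa.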
QIIME has been applied to studies based on
billions of sequences from tens of thousands of samples.
The goal of the mothur project is to develop a single piece of open-source, expandable software to fill the
bioinformatics needs of the microbial ecology community.
screening, processing, aligning & clustering of Sanger, 454 or Illumina (16S
generating a high-quality, effectively ‘normalized’ shared file (i.e. counts of
OTUs per sample)
gaining general taxonomic information about the OTUs in your study system
(RDP Taxonomic Classifier)
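One common way to "normalize" a shared file is to rarefy every sample to the same read depth by subsampling without replacement. The sketch below shows that idea on a toy OTU table; the sample names and counts are invented, and this is not the procedure of any particular tool.

```python
import random

def rarefy(counts: dict[str, int], depth: int, seed: int = 0) -> dict[str, int]:
    """Subsample reads without replacement so the sample has exactly `depth` reads."""
    pool = [otu for otu, n in counts.items() for _ in range(n)]  # one entry per read
    if depth > len(pool):
        raise ValueError("depth exceeds the number of reads in the sample")
    rng = random.Random(seed)
    picked = rng.sample(pool, depth)
    return {otu: picked.count(otu) for otu in counts}

# Toy "shared" table: counts of OTUs per sample (invented).
shared = {
    "sample1": {"OTU1": 120, "OTU2": 30, "OTU3": 50},
    "sample2": {"OTU1": 40, "OTU2": 20, "OTU3": 0},
}
depth = min(sum(c.values()) for c in shared.values())  # smallest library size
normalized = {s: rarefy(c, depth) for s, c in shared.items()}
print(depth, normalized)
```

Rarefying to the smallest library discards reads from larger samples, a trade-off that makes per-sample OTU counts directly comparable at the cost of some information.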
In metagenomics, the aim is to understand the composition and
operation of complex microbial consortia in environmental
samples through sequencing and analysis of their DNA.
FUTURE OF METAGENOMICS
• To identify new enzymes & antibiotics
• To assess the effects of age, diet, and pathologic states
(e.g., inflammatory bowel diseases, obesity, and cancer)
on the distal gut microbiome of humans living in
• Study of more exotic habitats
• Study antibiotic resistance in soil microbes
• Improved bioinformatics will quicken analysis for library
• Investigating ancient DNA remnants
• Discoveries such as phylogenetic tags (rRNA genes, etc.) will give
momentum to the growing field
• Learning novel pathways will provide knowledge about currently
nonculturable bacteria, which may make it possible to culture them