Presented By : Shalini Sharma
Edited by- HIMANSHU JAIN
15-07-2020JUNIOR1
METAGENOMICS
INTRODUCTION TO
METAGENOMICS
 The term "metagenomics" was first used by Jo Handelsman, Jon Clardy, Robert M. Goodman, Sean F. Brady, and others, and first appeared in publication in 1998.
 The term refers to the idea that a collection of genes sequenced from the environment can be analyzed in a way analogous to the study of a single genome.
 The broad field is also referred to as environmental genomics, ecogenomics, or community genomics.
HISTORY OF METAGENOMICS
 Early molecular work in the field was conducted by Norman R. Pace and colleagues, who used PCR to explore the diversity of ribosomal RNA sequences. The insights gained from these breakthrough studies led Pace to propose, as early as 1985, the idea of cloning DNA directly from environmental samples. This led to the first report of isolating and cloning bulk DNA from an environmental sample, published by Pace and colleagues in 1991.
 These early studies focused on 16S ribosomal RNA (rRNA) sequences, which are relatively short, often conserved within a species, and generally different between species. Many 16S rRNA sequences have been found that do not belong to any known cultured species, indicating that there are numerous non-isolated organisms.
 In 2002, Mya Breitbart, Forest Rohwer, and colleagues used environmental shotgun sequencing to show that 200 liters of seawater contain over 5000 different viruses.
CONT.
 Subsequent studies showed that there are more than a thousand viral species in human stool and possibly a million different viruses per kilogram of marine sediment, including many bacteriophages. Essentially all of the viruses in these studies were new species.
 Beginning in 2003, Craig Venter led the Global Ocean Sampling Expedition (GOS), circumnavigating the globe and collecting metagenomic samples throughout the journey. All of these samples were sequenced using shotgun sequencing, in the hope that new genomes (and therefore new organisms) would be identified. The pilot project, conducted in the Sargasso Sea, found DNA from nearly 2000 different species, including 148 types of bacteria never seen before. Venter has since thoroughly explored the West Coast of the United States and completed a two-year expedition to explore the Baltic, Mediterranean and Black Seas.
 In 2005, Stephan C. Schuster at Penn State University and colleagues published the first sequences of an environmental sample generated with high-throughput sequencing.
FLOWCHART OF TYPICAL METAGENOMIC
PROJECT
SAMPLING AND PROCESSING
 Sample processing is the first and most crucial step in metagenomics.
 The extracted DNA should be representative of all cells present in the sample, and a sufficient amount of high-quality nucleic acid must be obtained for subsequent library production and sequencing.
 Sample fractionation steps should be checked to ensure that sufficient enrichment of the target is achieved and that minimal contamination by non-target material occurs.
 Physical separation and isolation of cells from the sample might also be important to maximize DNA yield and the resulting sequence fragment length.
 Some types of sample, such as biopsies or ground water, often yield very small amounts of DNA, whereas library production for most sequencing technologies requires large amounts of DNA; amplification of the starting material might therefore be required.
 Multiple displacement amplification (MDA) using random hexamers and phi29 polymerase is one option for increasing DNA yield. This method has been widely used in single-cell genomics and, to a certain extent, in metagenomics.
METAGENOMIC HOTSPOTS
1. EXTREMELY LOW AND HIGH TEMPERATURE ENVIRONMENTS
2. VOLCANOES
3. SOIL
4. WASTE WATER
5. ACIDIC ENVIRONMENTS
6. ALKALINE ENVIRONMENTS
7. ENVIRONMENTS WITH HEAVY METALS
16S rRNA SEQUENCING
 The 16S rRNA gene is a taxonomic genomic marker that is common to almost all bacteria and archaea.
 16S rRNA sequencing is accomplished by designing primers to the entire 16S locus or by targeting multiple hypervariable domains within the gene.
 These hypervariable regions provide the species-specific signature necessary for identification.
 After these domains have been amplified, sequencing-related primers are either ligated or added by a second PCR step.
ADVANTAGES OF 16S rRNA SEQUENCING
 The gene is universally distributed.
 16S rRNA gene sequences are more abundant than those of other bacterial genes.
 Phylogenetic relationships across different taxa are easy to measure.
 Horizontal gene transfer is rarely a problem for this gene.
 The cost of 16S rRNA amplification and sequencing is typically $47-$60 per sample.
DISADVANTAGES OF 16S rRNA SEQUENCING
 Copy numbers per genome can vary. While they tend to be taxon specific, variation among strains is possible.
 Relative abundance measurements are unreliable because of amplification biases.
 Sequence diversity within the gene tends to inflate diversity estimates.
 The resolution of the 16S gene is often too low to differentiate between closely related species.
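The copy-number point above can be illustrated with a minimal sketch. It assumes that the 16S copy number of each taxon is already known; the taxon names, read counts and copy numbers below are hypothetical, and the function name is ours rather than part of any tool. Dividing read counts by copy number before normalising removes one source of bias, although amplification bias remains.

```python
def copy_number_corrected_abundance(read_counts, copy_numbers):
    """Divide each taxon's 16S read count by its 16S copy number,
    then renormalise so the corrected values sum to 1."""
    corrected = {
        taxon: read_counts[taxon] / copy_numbers[taxon]
        for taxon in read_counts
    }
    total = sum(corrected.values())
    return {taxon: value / total for taxon, value in corrected.items()}


# Hypothetical example: taxon_A carries 7 copies of the gene, taxon_B only 1.
counts = {"taxon_A": 700, "taxon_B": 100}
copies = {"taxon_A": 7, "taxon_B": 1}

# Raw reads suggest taxon_A is 7 times more abundant; after correction
# both taxa contribute equally.
print(copy_number_corrected_abundance(counts, copies))
```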
SEQUENCING
 Recovery of DNA sequences longer than a few thousand base pairs from environmental samples was very difficult until advances in molecular biological techniques allowed the construction of libraries in bacterial artificial chromosomes (BACs), which provided better vectors for molecular cloning.
 Sequencing can be done by three methods:
1. SHOTGUN SEQUENCING
2. HIGH-THROUGHPUT SEQUENCING
3. CLONE-BY-CLONE SEQUENCING
SHOTGUN SEQUENCING
 This approach, used to sequence many cultured microorganisms and the human genome, randomly shears DNA, sequences many short fragments, and reconstructs them into a consensus sequence.
 Shotgun metagenomics provides information both about which organisms are present and about what metabolic processes are possible in the community.
 To achieve the high coverage needed to fully resolve the genomes of under-represented community members, large samples are needed, often prohibitively large ones.
 The random nature of shotgun sequencing ensures that many of these organisms, which would otherwise go unnoticed using traditional culturing techniques, will be represented by at least some small sequence segments.
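The coverage requirement above can be made concrete with a rough sketch. It assumes the usual approximation that expected coverage equals (number of reads x read length) / genome size, scaled by the fraction of the sampled DNA that belongs to the organism of interest. The function names and all numbers are illustrative assumptions, not values from the slides.

```python
import math


def expected_coverage(num_reads, read_length, genome_size, dna_fraction=1.0):
    """Average depth for one genome, given the fraction of the sampled DNA
    that belongs to it."""
    return num_reads * dna_fraction * read_length / genome_size


def reads_needed(target_coverage, read_length, genome_size, dna_fraction=1.0):
    """Total reads required so that one genome reaches the target depth."""
    return math.ceil(target_coverage * genome_size / (read_length * dna_fraction))


# Hypothetical 5 Mb genome sequenced with 100 bp reads.
print(expected_coverage(1_000_000, 100, 5_000_000))       # single genome
print(reads_needed(20, 100, 5_000_000, dna_fraction=0.5))  # half of the DNA
print(reads_needed(20, 100, 5_000_000, dna_fraction=0.001))  # rare member
```

The last call shows why rare community members are expensive to resolve: the required read count grows in inverse proportion to the organism's share of the sample.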
HIGH-THROUGHPUT
SEQUENCING
 The first metagenomic studies conducted using high-throughput sequencing used massively parallel 454 pyrosequencing.
 Three other technologies commonly applied to environmental sampling are the Ion Torrent Personal Genome Machine, the Illumina MiSeq or HiSeq, and the Applied Biosystems SOLiD system.
 These techniques generate shorter fragments than Sanger sequencing: the Ion Torrent PGM System and 454 pyrosequencing typically produce ~400 bp reads, Illumina MiSeq produces 400-700 bp reads (depending on whether paired-end options are used), and SOLiD produces 25-75 bp reads.
 Historically, these read lengths were significantly shorter than the typical Sanger sequencing read length of ~750 bp, although Illumina technology is quickly approaching this benchmark. The limitation is compensated for by the much larger number of sequence reads.
CLONE-BY-CLONE
SEQUENCING
 In this method, the fragments are first aligned into contigs.
 It is also called directed sequencing of BAC contigs.
 In the first step, BAC clones are arranged in contigs. Each BAC clone has an 80-100 kb long DNA fragment cloned into it.
 This fragment is then used to create cosmid clones and plasmid clones, which carry progressively smaller DNA fragments.
 These clones are also arranged in contigs. Each clone of the contigs is then sequenced.
 Thus the nucleotide sequence is determined on a clone-by-clone basis until the entire genome is sequenced.
 This method was chosen for the publicly funded Human Genome Project, sponsored by the National Institutes of Health and the Department of Energy.
ASSEMBLY OF DNA SEQUENCE
 Sequence assembly refers to aligning and merging fragments from a longer DNA sequence in order to reconstruct the original sequence.
 This is needed because DNA sequencing technology cannot read whole genomes in one go, but rather reads small pieces of between 20 and 30,000 bases, depending on the technology used. Typically the short fragments, called reads, result from shotgun sequencing of genomic DNA or of gene transcripts (ESTs).
 Three topics in DNA sequence assembly are:
1. Genome assemblers,
2. EST assemblers,
3. De-novo vs mapping assembly
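The idea of "aligning and merging" reads can be shown with a toy sketch. It is a greedy overlap merge on error-free reads, assumed here purely for illustration; production assemblers use far more elaborate algorithms and must cope with sequencing errors and repeats. The function names and the example reads are hypothetical.

```python
def overlap_length(left, right, min_overlap=3):
    """Length of the longest suffix of `left` that equals a prefix of `right`
    (0 if shorter than `min_overlap`)."""
    for size in range(min(len(left), len(right)), min_overlap - 1, -1):
        if left.endswith(right[:size]):
            return size
    return 0


def greedy_assemble(reads, min_overlap=3):
    """Repeatedly merge the pair of reads with the longest overlap."""
    reads = list(reads)
    while len(reads) > 1:
        best_size, best_i, best_j = 0, None, None
        for i, left in enumerate(reads):
            for j, right in enumerate(reads):
                if i != j:
                    size = overlap_length(left, right, min_overlap)
                    if size > best_size:
                        best_size, best_i, best_j = size, i, j
        if best_size == 0:
            break  # no remaining overlaps: leave the contigs separate
        merged = reads[best_i] + reads[best_j][best_size:]
        reads = [r for k, r in enumerate(reads) if k not in (best_i, best_j)]
        reads.append(merged)
    return reads


# Three overlapping toy reads reconstruct a single 13-base sequence.
print(greedy_assemble(["ATGGCGT", "GCGTACC", "TACCTGA"]))  # ['ATGGCGTACCTGA']
```

Note that every read is compared with every other read in each round, which is one reason such all-against-all approaches scale poorly.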
 GENOME ASSEMBLERS: The first sequence assemblers began to appear in the late 1980s and early 1990s as variants of simpler sequence alignment programs, built to piece together the vast quantities of fragments generated by automated sequencing instruments called DNA sequencers. Faced with the challenge of assembling the first larger eukaryotic genomes (the fruit fly Drosophila melanogaster in 2000 and the human genome just a year later), scientists developed assemblers such as Celera Assembler and Arachne, able to handle genomes of 130 million (fruit fly) to 3 billion (human) base pairs.
 EST ASSEMBLERS: Expressed sequence tag (EST) assembly was an early strategy, dating from the mid-1990s to the mid-2000s, to assemble individual genes rather than whole genomes. The input sequences for EST assembly are fragments of the transcribed mRNA of a cell and represent only a subset of the whole genome. Transcribed genes contain many fewer repeats, making assembly somewhat easier. On the other hand, some genes are expressed (transcribed) in very high numbers (e.g., housekeeping genes), which means that, unlike in whole-genome shotgun sequencing, the reads are not uniformly sampled across the genome. EST assembly is made much more complicated by features such as (cis-) alternative splicing, trans-splicing, single-nucleotide polymorphism, and post-transcriptional modification.
• DE-NOVO vs MAPPING ASSEMBLY
DE-NOVO ASSEMBLY
 De-novo: assembling short reads to create full-length (sometimes novel) sequences, without using a template (see de novo sequence assemblers, de novo transcriptome assembly).
 De-novo assemblies are orders of magnitude slower and more memory intensive than mapping assemblies. This is mostly because the assembly algorithm needs to compare every read with every other read.
 Handling repeats in de-novo assembly requires the construction of a graph representing neighboring repeats. Such information can be derived from reading a long fragment covering the repeats in full, or only its two ends.
MAPPING ASSEMBLY
 Mapping: assembling reads against an existing backbone sequence, building a sequence that is similar but not necessarily identical to the backbone sequence.
 For mapping assemblies, a very similar sequence is available as a template, much like reconstructing a book with a very similar book at hand.
 On the other hand, in a mapping assembly, parts with multiple or no matches are usually left for another assembling technique to look into.
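The slides do not say which graph de-novo assemblers build. One common choice, and the one used by SOAPdenovo, which this deck mentions later, is a de Bruijn graph. The sketch below is a minimal illustration under that assumption: each k-mer in a read becomes an edge from its first k-1 bases to its last k-1 bases. The function name and the example read are hypothetical.

```python
from collections import defaultdict


def de_bruijn_graph(reads, k):
    """Map each (k-1)-mer to the list of (k-1)-mers that follow it
    in at least one read."""
    graph = defaultdict(list)
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            graph[kmer[:-1]].append(kmer[1:])
    return dict(graph)


# The 3-mers ATG, TGG and GGC chain into a single path AT -> TG -> GG -> GC.
print(de_bruijn_graph(["ATGGC"], k=3))
```

A repeat shows up in such a graph as a node with several outgoing edges, which is why resolving repeats needs extra information such as long or paired reads.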
BINNING
 Binning is the process of grouping reads or contigs and assigning them to operational taxonomic units (OTUs). Binning methods can be based on compositional features, on alignment (similarity), or on both.
 The incomplete nature of the obtained sequences makes it hard to assemble individual genes, much less recover the full genome of each organism. Binning techniques therefore represent a "best effort" to associate reads or contigs with certain groups of organisms designated as OTUs.
 Modern binning techniques use both previously available information independent of the sample and intrinsic information present in the sample.
 Depending on the diversity and complexity of the sample, their degree of success varies: in some cases they can resolve the sequences down to individual species, while in others the sequences are identified at best with very broad taxonomic groups.
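A composition-based binning step can be sketched as follows. This is a simplified illustration, not the method of any specific tool: it summarises each sequence by its dinucleotide frequencies and assigns a contig to the reference bin with the closest profile. Real binners typically use tetranucleotide frequencies and clustering. The bin names and sequences are hypothetical.

```python
import math
from collections import Counter
from itertools import product


def kmer_profile(sequence, k=2):
    """Relative frequency of every possible k-mer over the alphabet ACGT."""
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    counts = Counter(sequence[i:i + k] for i in range(len(sequence) - k + 1))
    total = sum(counts[m] for m in kmers) or 1  # avoid division by zero
    return [counts[m] / total for m in kmers]


def distance(profile_a, profile_b):
    """Euclidean distance between two frequency profiles."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(profile_a, profile_b)))


def nearest_bin(contig, reference_bins, k=2):
    """Name of the reference bin whose composition is closest to the contig."""
    profile = kmer_profile(contig, k)
    return min(
        reference_bins,
        key=lambda name: distance(profile, kmer_profile(reference_bins[name], k)),
    )


# Hypothetical reference sequences with very different base composition.
bins = {"gc_rich": "GCGCGGCCGCGGCGCC", "at_rich": "ATATTAAATATTTAAT"}
print(nearest_bin("GGCCGCGCGG", bins))  # gc_rich
print(nearest_bin("TTAATATAAT", bins))  # at_rich
```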
ANNOTATION
 Annotation is the process that identifies genes, their regulatory sequences and their functions.
 Annotation also identifies non-protein-coding genes, including those that code for ribosomal RNA, transfer RNA and small nuclear RNAs.
 In addition, mobile genetic elements and repetitive sequence families present in the genome are identified and characterized.
 Protein-coding genes are located by inspecting the sequence with computer software.
 Protein-coding genes are composed of open reading frames (ORFs).
 An ORF is a series of codons that specifies an amino acid sequence. It begins with an initiation codon (usually ATG) and ends with a stop codon (TAA, TAG or TGA); both are usually identified by the computer.
 This method is effective for bacterial genomes.
 Genes in eukaryotic genomes have a pattern of exons (coding regions) alternating with introns (non-coding regions).
 Genes in humans and other eukaryotes are often widely spaced, increasing the chances of finding false genes, but newer versions of ORF-scanning software for eukaryotic genomes make scanning more efficient.
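The ORF scan described above can be sketched in a few lines. This minimal version, an illustration rather than the algorithm of any named gene finder, checks the three forward reading frames only and reports every stretch that starts at ATG and ends at the first in-frame stop codon. The `min_codons` threshold, which counts the stop codon, and the example sequence are hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(sequence, min_codons=3):
    """Return (start, end, orf) tuples for forward-strand ORFs.
    `start` is 0-based and `end` is exclusive; the ORF includes its stop codon."""
    sequence = sequence.upper()
    orfs = []
    for frame in range(3):
        start = None
        for pos in range(frame, len(sequence) - 2, 3):
            codon = sequence[pos:pos + 3]
            if start is None and codon == "ATG":
                start = pos  # first initiation codon opens an ORF
            elif start is not None and codon in STOP_CODONS:
                if (pos + 3 - start) // 3 >= min_codons:
                    orfs.append((start, pos + 3, sequence[start:pos + 3]))
                start = None  # look for the next ORF in this frame
    return orfs


# One ORF in the third reading frame: ATG AAA TTT TAG
print(find_orfs("CCATGAAATTTTAGGG"))  # [(2, 14, 'ATGAAATTTTAG')]
```

A complete scanner would also search the reverse complement, and eukaryotic gene finders must additionally model introns, which this sketch ignores.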
STATISTICAL ANALYSIS
 Metagenomic analysis involves the application of bioinformatics tools to study the genetic material from environmental, uncultured microorganisms.
 Analysis of metagenomic data involves three major steps: 1) assembly, 2) annotation, and 3) statistical analysis.
 Gene finding is relatively simple in prokaryotes, but the search for eukaryotic genes is considerably more difficult.
BIOINFORMATIC TOOLS USED FOR ANALYSIS AND INTERPRETATION OF GENOME SEQUENCE DATA
FUNCTION | SOFTWARE
1. Detection of genes from genome sequence | GENSCAN, Glimmer, GENIE, GRAIL, GeneMark, GeneFinder, HMMgene, etc.
2. Prediction of a new gene | tBLASTx, FASTA, HMMER, etc.
3. Identification of functional domains/motifs of proteins | PRINTS, PROSITE, SMART, BLOCKS, etc.
4. Detection of tRNA | tRNAscan-SE
5. Genome annotation:
 Description of experimental evidence | GAME
 Indexing and visualization | DAS
 Storage, manipulation and visualization | BioJava 2001, BioPerl 2001, etc.
STORAGE AND SHARING OF DATA
 NCBI is mandated to store all metagenomic data; however, the sheer volume of data being generated means there is an urgent need for appropriate ways of storing vast amounts of sequence.
 Tools such as IMG/MER, CAMERA, MG-RAST, and EBI Metagenomics provide an integrated environment for the analysis, management, storage and sharing of metagenome projects.
 A suite of standard languages for metadata is currently provided by the Minimum Information about any (x) Sequence (MIxS) checklists.
 MIxS is an umbrella term covering MIMS (Minimum Information about a Metagenome Sequence) and MIMARKS (Minimum Information about a MARKer Sequence), which provide a scheme of standard language for metadata annotation.
DATABASES
Databases | Information available | Sources
Nucleotide sequence databases | Contain nucleotide sequences | GenBank by NCBI, USA; DDBJ, Japan; Nucleotide sequence database by EMBL
GenBank | Contains nucleotide sequences of genomic DNA | NCBI, USA
dbEST | EST sequences (redundant nucleotide sequences) | NCBI, USA; EBI, UK
DDBJ (DNA Data Bank of Japan) | Nucleotide sequences | GenomeNet, Japan
EMBL Nucleotide Sequence Databank (European Molecular Biology Laboratory) | DNA sequences | EBI, UK
PDB (Protein Data Bank) | Sequences of those proteins whose 3-D structure is known | NCBI, USA; EBI, UK
SOME INDIAN DATABASES
1. GM CROPS DATABASE: This database was developed by the NRC on Plant Biotechnology, New Delhi. It stores information on the biosafety of transgenics (released in India), including over 800 publications on the subject. For example, a total of 139 transgenic lines using 4 genes (cry1Ab, cry1Ab, cry2Ab and vip3A) and a single promoter (CaMV 35S) have been developed.
2. VANSHANUDHAN: This database was developed by scientists of the NRC on Plant Biotechnology, New Delhi, as an outcome of the Indian rice genome initiative. It contains information on 56,298 rice genes.
SOME IMPORTANT DATABASE SEARCH TOOLS
SEARCH TOOL | FUNCTION PROVIDED
 BLAST (Basic Local Alignment Search Tool) (NCBI, USA) | A group of tools used to analyse sequence information and detect homologous sequences.
 DNAPLOT (EBI, UK) | Sequence alignment tool.
 LIGAND (GenomeNet, Japan) | A chemical database that allows searches for a combination of enzyme and metabolic enzymes; linked to all other publicly accessible databases.
 PROSITE | A collection of functional sites and sequence patterns found in many proteins; has a search tool for matching patterns.
APPLICATIONS OF METAGENOMICS
 Metagenomics has the potential to advance knowledge in a wide variety of fields. It can also be applied to solve practical challenges in:
a) Agriculture
b) Biofuels
c) Biotechnology
d) Ecology
e) Environmental remediation
f) Gut microbe characterization
g) Infectious disease diagnosis
 AGRICULTURE:
 The soils in which plants grow are inhabited by microbial communities, with one gram of soil containing around 10^9-10^10 microbial cells, which comprise about one gigabase of sequence information.
 Microbial consortia perform a wide variety of ecosystem services necessary for plant growth, including fixing atmospheric nitrogen, nutrient cycling, disease suppression, and sequestering iron and other metals.
 Functional metagenomics strategies are being used to explore the interactions between plants and microbes through cultivation-independent study of these microbial communities.
 By allowing insights into the role of previously uncultivated or rare community members in nutrient cycling and the promotion of plant growth, metagenomic approaches can contribute to improved disease detection in crops and livestock and to the adoption of enhanced farming practices that improve crop health by harnessing the relationship between microbes and plants.
 BIOFUEL:
 Biofuels are fuels derived from biomass conversion, as in the conversion of cellulose contained in corn stalks, switchgrass, and other biomass into cellulosic ethanol.
 This process depends on microbial consortia (associations) that transform the cellulose into sugars, followed by the fermentation of the sugars into ethanol. Microbes also produce a variety of other sources of bioenergy, including methane and hydrogen.
 The efficient industrial-scale deconstruction of biomass requires novel enzymes with higher productivity and lower cost.
 Metagenomic approaches to the analysis of complex microbial communities allow the targeted screening of enzymes with industrial applications in biofuel production, such as glycoside hydrolases.
 Metagenomic approaches also allow comparative analysis between convergent microbial systems, such as biogas fermenters or the microbial communities of insect herbivores like the fungus garden of leafcutter ants.
 BIOTECHNOLOGY:
 The application of metagenomics has allowed the development
of commodity and fine
chemicals, agrochemicals and pharmaceuticals where the benefit
of enzyme-catalyzed chiral synthesis is increasingly recognized.
 Two types of analysis are used in the bioprospecting of metagenomic data:
function-driven screening for an expressed trait, and sequence-driven
screening for DNA sequences of interest.
 Function-driven analysis seeks to identify clones expressing a desired trait
or useful activity, followed by biochemical characterization and sequence
analysis. This approach is limited by availability of a suitable screen and the
requirement that the desired trait be expressed in the host cell. Moreover,
the low rate of discovery (less than one per 1,000 clones screened) and its
labor-intensive nature further limit this approach.
 In contrast, sequence-driven analysis uses conserved DNA
sequences to design PCR primers to screen clones for the sequence of
interest.
 The sequence-driven approach to screening is limited by the breadth and
accuracy of gene functions present in public sequence databases.
 In practice, experiments make use of a combination of both functional and
sequence-based approaches based upon the function of interest, the
complexity of the sample to be screened, and other factors.
 An example of success using metagenomics as a biotechnology for drug
discovery is illustrated with the malacidin antibiotics.
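The sequence-driven screening described above relies on primers designed from conserved sequences, which often contain degenerate positions. As a small illustration, assuming the standard IUPAC nucleotide ambiguity codes, the sketch below finds candidate binding sites for a degenerate primer in a clone sequence. The primer and the sequence are hypothetical, and a real screen would also consider the reverse strand and mismatches.

```python
import re

# IUPAC nucleotide ambiguity codes expressed as regular-expression classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}


def primer_sites(primer, sequence):
    """0-based start positions where the degenerate primer matches the sequence."""
    pattern = "".join(IUPAC[base] for base in primer.upper())
    # A lookahead is used so that overlapping sites are also reported.
    return [m.start() for m in re.finditer(f"(?=({pattern}))", sequence.upper())]


# Hypothetical primer with one fully degenerate position (N).
print(primer_sites("GCNAT", "TTGCAATGGCTATAA"))  # [2, 8]
```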
 ECOLOGY:
 Metagenomic analysis of the bacterial consortia found in the
defecations of Australian sea lions suggests that nutrient-rich sea
lion faeces may be an important nutrient source for coastal
ecosystems. This is because the bacteria that are expelled
simultaneously with the defecations are adept at breaking down
the nutrients in the faeces into a bioavailable form that can be
taken up into the food chain.
 DNA sequencing can also be used more broadly to identify
species present in a body of water, debris filtered from the air, or a
sample of dirt. This can establish the range of invasive
species and endangered species, and track seasonal
populations.
 ENVIRONMENTAL REMEDIATION:
 Metagenomics can improve strategies for monitoring the impact
of pollutants on ecosystems and for cleaning up contaminated
environments. Increased understanding of how microbial
communities cope with pollutants improves assessments of the
potential of contaminated sites to recover from pollution and
increases the chances of bioaugmentation or biostimulation trials
to succeed.
 GUT MICROBE CHARACTERIZATION:
 Metagenomic sequencing is being used to characterize the microbial communities from 15-18 body sites in at least 250 individuals. This is part of the Human Microbiome initiative, whose primary goals are to determine whether there is a core human microbiome, to understand the changes in the human microbiome that can be correlated with human health, and to develop new technological and bioinformatics tools to support these goals.
 Another medical study, part of the MetaHit (Metagenomics of the Human Intestinal Tract) project, covered 124 individuals from Denmark and Spain, comprising healthy, overweight, and inflammatory bowel disease patients. The study attempted to categorize the depth and phylogenetic diversity of gastrointestinal bacteria. Using Illumina GA sequence data and SOAPdenovo, a de Bruijn graph-based tool specifically designed for assembling short reads, the researchers were able to generate 6.58 million contigs greater than 500 bp, for a total contig length of 10.3 Gb and an N50 length of 2.2 kb.
 The study demonstrated that two bacterial divisions, Bacteroidetes and Firmicutes, constitute over 90% of the known phylogenetic categories that dominate distal gut bacteria.
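The N50 statistic quoted above is the contig length at which contigs of that length or longer contain at least half of the total assembled bases. A minimal sketch of the calculation follows; the contig lengths are made up for illustration and are not the MetaHit data.

```python
def n50(contig_lengths):
    """Length L such that contigs of length >= L hold at least half
    of all assembled bases. Returns 0 for an empty assembly."""
    lengths = sorted(contig_lengths, reverse=True)
    half_total = sum(lengths) / 2
    running = 0
    for length in lengths:
        running += length
        if running >= half_total:
            return length
    return 0


# Total is 54, half is 27; the three longest contigs (10 + 9 + 8) reach 27.
print(n50([2, 3, 4, 5, 6, 7, 8, 9, 10]))  # 8
```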
 Using the relative gene frequencies found within the gut, these researchers identified 1,244 metagenomic clusters that are critically important for the health of the intestinal tract.
 There are two types of functions in these clusters: housekeeping and those specific to the intestine.
 The housekeeping gene clusters are required in all bacteria and are often
major players in the main metabolic pathways including central carbon
metabolism and amino acid synthesis. The gut-specific functions include
adhesion to host proteins and the harvesting of sugars from globoseries
glycolipids.
 Patients with inflammatory bowel disease were shown to exhibit 25% fewer genes and lower bacterial diversity than individuals without the disease, indicating that changes in patients' gut biome diversity may be associated with this condition.
 While these studies highlight some potentially valuable medical applications,
only 31–48.8% of the reads could be aligned to 194 public human gut
bacterial genomes and 7.6–21.2% to bacterial genomes available in
GenBank which indicates that there is still far more research necessary to
capture novel bacterial genomes.
 INFECTIOUS DISEASE DIAGNOSIS:
 Differentiating between infectious and non-infectious
illness, and identifying the underlying etiology of infection,
can be quite challenging. For example, more than half of
cases of encephalitis remain undiagnosed, despite
extensive testing using state-of-the-art clinical laboratory
methods. Metagenomic sequencing shows promise as a
sensitive and rapid method to diagnose infection by
comparing genetic material found in a patient's sample to
a database of thousands of bacteria, viruses, and other
pathogens.
GENOME SEQUENCING PROJECT
 A model organism is an organism about which a large amount of scientific knowledge is already available. Model organisms include both prokaryotes and eukaryotes, including animals.
 E. coli genome sequencing was completed in 1997. The genome size is over 4.64 x 10^6 bp and it contains 4,408 genes.
 A. fulgidus is a strictly anaerobic archaebacterium; its genome was published in 1997. The genome size is 2.17 x 10^6 bp and it contains 2,493 genes.
 Arabidopsis genome sequencing began in 1990 and was completed in 2000. The genome has 130 x 10^6 bp and an estimated 26,000 genes.
 The Human Genome Project was taken up by the US government in 1984, when planning started. The project was formally launched in 1990 and was declared complete on April 14, 2003.
 Several conclusions were drawn from the human genome draft sequence. Some of the important features are:
1. It contains over 3.2 billion base pairs.
2. Less than 5% of the genome encodes proteins (current estimates are around 1-2%).
3. At least 50% of the genome is derived from transposable elements.
4. The genome has gene-rich regions separated by gene-poor regions, often called gene deserts.
5. The human genome is estimated to have about 35,000 genes.
6. The largest gene is the gene encoding dystrophin; it is 2.5 x 10^6 bp long.
7. The genome sequences of different individuals differ at less than 0.2% of the base pairs. Most of the differences occur as single-base differences in the sequence; such a difference is called a single nucleotide polymorphism (SNP). One SNP occurs about every 1,000 bp of the human genome. About 85% of all differences in human DNA are due to SNPs.
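The SNP figures above translate into rough absolute numbers. The arithmetic below is a back-of-the-envelope sketch that assumes a genome of about 3.2 billion base pairs, one SNP per 1,000 bp, and an upper bound of 0.2% differing positions; the function names are ours.

```python
def expected_snp_count(genome_size_bp, bp_per_snp=1_000):
    """Approximate number of SNPs at one SNP per `bp_per_snp` base pairs."""
    return genome_size_bp // bp_per_snp


def max_differing_bases(genome_size_bp, max_difference_per_mille=2):
    """Upper bound on differing positions; 2 per mille corresponds to 0.2%."""
    return genome_size_bp * max_difference_per_mille // 1_000


HUMAN_GENOME_BP = 3_200_000_000  # assumed genome size, about 3.2 billion bp

print(expected_snp_count(HUMAN_GENOME_BP))   # 3200000
print(max_differing_bases(HUMAN_GENOME_BP))  # 6400000
```

So two individuals would be expected to differ at roughly three million single-base positions, within an upper bound of about six million differing bases.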
BENEFITS FROM GENOME SEQUENCING
PROJECTS
 It enables the determination of the complete genetic information present in the genomes of various organisms.
 The relationships between genes can be deduced with confidence.
 It provides insights into genome organization and evolution and the mechanisms involved therein.
 It has opened up exciting areas for future research, e.g. functional genomics.
 Genome sequences will allow biologists to work out the various molecular interactions that lead to the normal development of an organism.
 Information such as SNPs has become available; this may be useful in several ways.
 A variety of tools and techniques were developed for the genome sequencing projects.
 A better understanding of human genetic diseases should facilitate their cure.
 It may provide an understanding of why different individuals respond differently to the same drugs (pharmacogenomics).
 The pathogenicity of microorganisms would be better understood. This should facilitate protection from the diseases they cause.
LIMITATIONS
 Too much data.
 Most genes are not identifiable.
 Contamination and chimeric clone sequences.
 Extraction problems.
 Requires proteomics or expression studies to demonstrate phenotypic characteristics.
 Needs a standard method for annotating genomes.
 Can only progress as library technology, including sequencing technology, progresses.
 Requires high-throughput instrumentation not readily available to most institutions.
CONCLUSION
 Metagenomics has benefited in the past few years from many visionary investments, in both financial and intellectual terms.
 The science of metagenomics is currently in its pioneering stages of development as a field, and many tools and technologies are undergoing rapid evolution.
 Metagenomics is best used as a tool to address fundamental questions of microbial ecology, evolution and diversity, and to derive and test new hypotheses.
 As datasets become increasingly complex and comprehensive, novel tools for analysis, storage and visualization will be required.
 Metagenomics allows us to discover new genes and proteins, or even the complete genomes of non-cultivable organisms, in less time and with better accuracy than classical microbiology or molecular methods.
 In addition to the phenotypic dimensions of human biology, such as gene expression profiling, proteomics and metabolomics, perhaps we need to extend our concept of the human genome to include the more comprehensive and plastic human metagenome in laboratory medicine.

Metagenomics

  • 1.
    Presented By :Shalini Sharma Edited by- HIMANSHU JAIN 15-07-2020JUNIOR1 METAGENOMICS
  • 2.
    INTRODUCTION TO METAGENOMICS 15-07-2020JUNIOR2  Theterm “Metagenomics" was first used by Jo Handelsman, Jon Clardy, Robert M. Goodman, Sean F. Brady, and others, and first appeared in publication in 1998.  Metagenomics referenced the idea that a collection of genes sequenced from the environment could be analyzed in a way analogous to the study of a single genome.  The broad field may also be referred to as environmental genomics, ecogenomics or community genomics.
  • 3.
    HISTORY OF METAGENOMICS 15-07-2020JUNIOR3 These early studies focused on 16S ribosomal RNA (rRNA) sequences which are relatively short, often conserved within a species, and generally different between species. Many 16S rRNA sequences have been found which do not belong to any known cultured species, indicating that there are numerous non- isolated organisms.  Norman R. Pace and colleagues, who used PCR to explore the diversity of ribosomal RNA sequences.The insights gained from these breakthrough studies led Pace to propose the idea of cloning DNA directly from environmental samples as early as 1985.This led to the first report of isolating and cloning bulk DNA from an environmental sample, published by Pace and colleagues in 1991.  In 2002, Mya Breitbart, Forest Rohwer, and colleagues used environmental shotgun sequencing to show that 200 liters of seawater contains over 5000 different viruses.
  • 4.
    CONT. 15-07-2020JUNIOR4  Subsequent studiesshowed that there are more than a thousand viral species in human stool and possibly a million different viruses per kilogram of marine sediment, including many bacteriophages. Essentially all of the viruses in these studies were new species.  Beginning in 2003, Craig Venter, has led the Global Ocean Sampling Expedition (GOS), circumnavigating the globe and collecting metagenomic samples throughout the journey. All of these samples are sequenced using shotgun sequencing, in hopes that new genomes (and therefore new organisms) would be identified. The pilot project, conducted in the Sargasso Sea, found DNA from nearly 2000 different species, including 148 types of bacteria never before seen. Venter has circumnavigated the globe and thoroughly explored the West Coast of the United States, and completed a two-year expedition to explore the Baltic, Mediterranean and Black Seas.  In 2005 Stephan C. Schuster at Penn State University and colleagues published the first sequences of an environmental sample generated with high-throughput sequencing.
  • 5.
    FLOWCHART OF TYPICALMETAGENOMIC PROJECT 15-07-2020JUNIOR5
  • 6.
    SAMPLING AND PROCESSING
    • Sample processing is the first and most crucial step in metagenomics.
    • The DNA extracted should be representative of all cells present in the sample, and a sufficient amount of high-quality nucleic acid must be obtained for subsequent library production and sequencing.
    • Sample fractionation steps should be checked to ensure that sufficient enrichment of the target is achieved and that contamination by non-target material is minimal.
    • Physical separation and isolation of cells from the sample might also be important to maximize DNA yield and the resulting sequence fragment length.
    • Some types of sample, such as biopsies or ground water, often yield very small amounts of DNA, whereas library production for most sequencing technologies requires large amounts of DNA. Amplification of the starting material might therefore be required.
    • Multiple displacement amplification (MDA) using random hexamers and phi29 polymerase is one option for increasing DNA yield. This method has been widely used in single-cell genomics and, to a certain extent, in metagenomics.
  • 7.
    METAGENOMIC HOTSPOTS
    1. EXTREME LOW AND HIGH TEMPERATURE
    2. VOLCANO
    3. SOIL
    4. WASTE WATER
    5. ACIDIC
    6. ALKALINE
    7. HEAVY METAL COMPOSITION
  • 8.
    16S rRNA SEQUENCING
    • The 16S rRNA gene is a taxonomic genomic marker that is common to almost all bacteria and archaea.
    • 16S rRNA sequencing is accomplished by designing primers to the entire 16S locus or by targeting multiple hypervariable domains within the gene.
    • These hypervariable regions provide the species-specific signature necessary for identification.
    • After these domains have been amplified, sequencing adapters are either ligated or added by a second PCR step.
  • 9.
  • 10.
    ADVANTAGES OF 16S rRNA SEQUENCING
    • The gene is universally distributed.
    • The number of available 16S rRNA gene sequences exceeds that of other bacterial genes.
    • Phylogenetic relationships across different taxa are easy to measure.
    • Horizontal gene transfer is not a big problem.
    • Costs to perform 16S rRNA amplification and sequencing are typically between $47 and $60 per sample.
  • 11.
    DISADVANTAGES OF 16S rRNA SEQUENCING
    • Copy numbers per genome can vary. While they tend to be taxon specific, variation among strains is possible.
    • Relative abundance measurements are unreliable because of amplification biases.
    • Diversity of the gene tends to inflate diversity estimates.
    • Resolution of the 16S gene is often too low to differentiate between closely related species.
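The copy-number problem listed above can be illustrated with a small sketch. It divides each taxon's 16S read count by an assumed number of 16S gene copies per genome and renormalizes. The taxon names, read counts and copy numbers are hypothetical values chosen for illustration, and the function name `copy_number_corrected_abundance` is not from any library. The correction addresses copy-number variation only, not amplification bias.

```python
def copy_number_corrected_abundance(read_counts, copies_per_genome):
    """Divide 16S read counts by gene copy number, then renormalize to fractions."""
    corrected = {
        taxon: read_counts[taxon] / copies_per_genome[taxon]
        for taxon in read_counts
    }
    total = sum(corrected.values())
    return {taxon: value / total for taxon, value in corrected.items()}


# Hypothetical inputs: taxon_A carries 7 copies of the 16S gene, taxon_B only 1.
reads = {"taxon_A": 700, "taxon_B": 300}
copies = {"taxon_A": 7, "taxon_B": 1}

abundance = copy_number_corrected_abundance(reads, copies)
for taxon, fraction in abundance.items():
    print(f"{taxon}: {fraction:.2f}")
```

In this made-up case taxon_A supplies most of the reads, yet after correction it accounts for the smaller share of genomes, which is why raw read proportions can mislead.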
  • 12.
    SEQUENCING
    • Recovery of DNA sequences longer than a few thousand base pairs from environmental samples was very difficult until advances in molecular biological techniques allowed the construction of libraries in bacterial artificial chromosomes (BACs), which provided better vectors for molecular cloning.
    • Sequencing can be done through three methods:
    1. SHOTGUN SEQUENCING
    2. HIGH-THROUGHPUT SEQUENCING
    3. CLONE-BY-CLONE SEQUENCING
  • 13.
    SHOTGUN SEQUENCING
    • The approach, used to sequence many cultured microorganisms and the human genome, randomly shears DNA, sequences many short fragments, and reconstructs them into a consensus sequence.
    • Shotgun metagenomics provides information both about which organisms are present and about what metabolic processes are possible in the community.
    • To achieve the high coverage needed to fully resolve the genomes of under-represented community members, large samples, often prohibitively large, are needed.
    • The random nature of shotgun sequencing ensures that many of these organisms, which would otherwise go unnoticed using traditional culturing techniques, will be represented by at least some small sequence segments.
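The point about coverage for under-represented members can be made concrete with rough arithmetic. The sketch below uses the standard expected-coverage relation, coverage = (number of reads x read length) / genome size, and assumes that a known fraction of all reads comes from the organism of interest. The genome size, read length, target coverage and read fraction are hypothetical, and `reads_needed` is an illustrative helper, not part of any tool.

```python
import math


def reads_needed(genome_size_bp, read_length_bp, target_coverage, read_fraction):
    """Total reads to sequence so that one community member reaches target coverage.

    read_fraction is the assumed share of all reads that derive from that member.
    """
    reads_from_member = target_coverage * genome_size_bp / read_length_bp
    return math.ceil(reads_from_member / read_fraction)


# Hypothetical case: a 4 Mb genome, 400 bp reads, 10x coverage,
# and a rare member contributing 1 read in every 1000.
total_reads = reads_needed(4_000_000, 400, 10, 0.001)
print(f"about {total_reads:,} reads in total")
```

Under these assumptions the total is on the order of a hundred million reads, compared with a hundred thousand if the organism were sequenced in isolation.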
  • 14.
    HIGH-THROUGHPUT SEQUENCING
    • The first metagenomic studies conducted using high-throughput sequencing used massively parallel 454 pyrosequencing.
    • Three other technologies commonly applied to environmental sampling are the Ion Torrent Personal Genome Machine, the Illumina MiSeq or HiSeq, and the Applied Biosystems SOLiD system.
    • These techniques for sequencing DNA generate shorter fragments than Sanger sequencing: the Ion Torrent PGM System and 454 pyrosequencing typically produce ~400 bp reads, Illumina MiSeq produces 400-700 bp reads (depending on whether paired-end options are used), and SOLiD produces 25–75 bp reads.
    • Historically, these read lengths were significantly shorter than the typical Sanger sequencing read length of ~750 bp, although Illumina technology is quickly coming close to this benchmark. The shorter read length is compensated for by the much larger number of sequence reads.
  • 15.
    CLONE-BY-CLONE SEQUENCING
    • In this method, the fragments are first aligned into contigs.
    • It is also called directed sequencing of BAC contigs.
    • In the first step, BAC clones are arranged in contigs. Each BAC clone has an 80-100 kb long DNA fragment cloned into it.
    • This fragment is then used to create cosmid clones and plasmid clones; these have progressively smaller DNA fragments.
    • These clones are also arranged in contigs. Each clone of a contig is then sequenced.
    • Thus the nucleotide sequence is determined on a clone-by-clone basis until the entire genome is sequenced.
    • This method was chosen for the publicly funded Human Genome Project sponsored by the National Institutes of Health and the Department of Energy.
  • 16.
    ASSEMBLY OF DNA SEQUENCE
    • Sequence assembly refers to aligning and merging fragments from a longer DNA sequence in order to reconstruct the original sequence.
    • This is needed because DNA sequencing technology cannot read whole genomes in one go, but rather reads small pieces of between 20 and 30,000 bases, depending on the technology used. Typically the short fragments, called reads, result from shotgun sequencing of genomic DNA or of gene transcripts (ESTs).
    • Three topics are covered for DNA sequence assembly:
    1. Genome assemblers
    2. EST assemblers
    3. De-novo vs mapping assembly
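The idea of aligning and merging reads can be sketched with a toy greedy overlap assembler. This is a teaching sketch, not how production assemblers work: it assumes error-free reads from one strand, uses exact suffix-prefix matches, and repeatedly merges the pair with the longest overlap. The reads and the `min_len` threshold are made up for illustration.

```python
def overlap(a, b, min_len=3):
    """Length of the longest suffix of a that equals a prefix of b (0 if shorter than min_len)."""
    for n in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:n]):
            return n
    return 0


def greedy_assemble(reads, min_len=3):
    """Repeatedly merge the pair of reads with the longest exact overlap."""
    reads = list(reads)
    while len(reads) > 1:
        best_len, best_i, best_j = 0, None, None
        for i, a in enumerate(reads):
            for j, b in enumerate(reads):
                if i != j:
                    n = overlap(a, b, min_len)
                    if n > best_len:
                        best_len, best_i, best_j = n, i, j
        if best_len == 0:
            break  # no remaining overlaps: the leftovers stay as separate contigs
        merged = reads[best_i] + reads[best_j][best_len:]
        reads = [r for k, r in enumerate(reads) if k not in (best_i, best_j)]
        reads.append(merged)
    return reads


# Three hypothetical overlapping reads from one short sequence.
contigs = greedy_assemble(["ATGGCGT", "GCGTACG", "TACGTTA"])
print(contigs)
```

Every pass compares each read with every other read, which hints at why assembly without a reference becomes expensive as the number of reads grows.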
  • 17.
    • GENOME ASSEMBLERS: The first sequence assemblers began to appear in the late 1980s and early 1990s as variants of simpler sequence alignment programs, built to piece together the vast quantities of fragments generated by automated sequencing instruments called DNA sequencers. Faced with the challenge of assembling the first larger eukaryotic genomes (the fruit fly Drosophila melanogaster in 2000 and the human genome just a year later), scientists developed assemblers such as Celera Assembler and Arachne, able to handle genomes of 130 million (fruit fly) to 3 billion (human) base pairs.
    • EST ASSEMBLERS: Expressed sequence tag (EST) assembly was an early strategy, dating from the mid-1990s to the mid-2000s, to assemble individual genes rather than whole genomes. The input sequences for EST assembly are fragments of the transcribed mRNA of a cell and represent only a subset of the whole genome. Transcribed genes contain many fewer repeats, making assembly somewhat easier. On the other hand, some genes are expressed (transcribed) in very high numbers (e.g., housekeeping genes), which means that, unlike in whole-genome shotgun sequencing, the reads are not uniformly sampled across the genome. EST assembly is made much more complicated by features like (cis-) alternative splicing, trans-splicing, single-nucleotide polymorphism, and post-transcriptional
  • 18.
    DE-NOVO vs MAPPING ASSEMBLY
    DE-NOVO ASSEMBLY
    • De-novo: assembling short reads to create full-length (sometimes novel) sequences, without using a template (see de novo sequence assemblers, de novo transcriptome assembly).
    • De-novo assemblies are orders of magnitude slower and more memory intensive than mapping assemblies. This is mostly because the assembly algorithm needs to compare every read with every other read.
    • De-novo assembly requires the construction of a graph representing neighboring repeats. Such information can be derived from reading a long fragment covering the repeats in full, or only its two ends.
    MAPPING ASSEMBLY
    • Mapping: assembling reads against an existing backbone sequence, building a sequence that is similar but not necessarily identical to the backbone sequence.
    • By analogy with reassembling a shredded book, a mapping assembly has a very similar book available as a template.
    • On the other hand, in a mapping assembly, parts with multiple or no matches are usually left for another
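The mapping side of this comparison can be sketched as follows. Each read is placed at the backbone position with the fewest mismatches, the backbone is overwritten with the read to give a similar-but-not-identical consensus, and reads without an acceptable match are set aside. The backbone, the reads and the `max_mismatches` threshold are hypothetical. Real mappers use indexes rather than this exhaustive scan, and they also deal with reads that match in several places, which this sketch does not.

```python
def best_position(reference, read, max_mismatches=1):
    """Return (start, mismatches) of the best ungapped placement, or None if none qualifies."""
    best = None
    for start in range(len(reference) - len(read) + 1):
        window = reference[start:start + len(read)]
        mismatches = sum(1 for x, y in zip(window, read) if x != y)
        if mismatches <= max_mismatches and (best is None or mismatches < best[1]):
            best = (start, mismatches)
    return best


def map_reads(reference, reads, max_mismatches=1):
    """Build a consensus by laying reads over the backbone; collect reads that do not map."""
    consensus = list(reference)
    unplaced = []
    for read in reads:
        hit = best_position(reference, read, max_mismatches)
        if hit is None:
            unplaced.append(read)
            continue
        start, _ = hit
        consensus[start:start + len(read)] = read
    return "".join(consensus), unplaced


# Hypothetical backbone and reads; the first read differs from the backbone at one base.
reference = "ACGTTGCAAGGCTTAC"
reads = ["ACGTAGCA", "GGCTTAC", "TTTTTTT"]
consensus, unplaced = map_reads(reference, reads)
print(consensus, unplaced)
```

Each read is compared only with the backbone, never with the other reads, which is the main reason mapping is so much cheaper than de-novo assembly.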
  • 19.
    BINNING
    • Binning is the process of grouping reads or contigs and assigning them to operational taxonomic units. Binning methods can be based on compositional features, on alignment (similarity), or on both.
    • The incomplete nature of the obtained sequences makes it hard to assemble individual genes, much less to recover the full genome of each organism. Thus, binning techniques represent a "best effort" to associate reads or contigs with certain groups of organisms designated as operational taxonomic units (OTUs).
    • Modern binning techniques use both previously available information independent of the sample and intrinsic information present in the sample.
    • Their degree of success varies with the diversity and complexity of the sample: in some cases they can resolve the sequences down to individual species, while in others the sequences are identified at best with very broad taxonomic groups.
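Composition-based binning can be illustrated with the simplest compositional feature, GC content. The sketch below groups contigs whose GC percentage falls in the same fixed-width interval. The contig names, sequences and 20-percent bin width are made up, and real binners generally combine richer features (for example k-mer frequencies and coverage) instead of a single GC cut-off.

```python
def gc_content(sequence):
    """Fraction of bases in the sequence that are G or C."""
    sequence = sequence.upper()
    return (sequence.count("G") + sequence.count("C")) / len(sequence)


def bin_by_gc(contigs, bin_width_percent=20):
    """Group contig names by the GC-percentage interval they fall into."""
    bins = {}
    for name, sequence in contigs.items():
        index = int(gc_content(sequence) * 100) // bin_width_percent
        bins.setdefault(index, []).append(name)
    return bins


# Hypothetical contigs: two AT-rich and two GC-rich.
contigs = {
    "contig_1": "ATATATGC",
    "contig_2": "ATTTAAGC",
    "contig_3": "GCGCGCAT",
    "contig_4": "GGCCGCTA",
}
bins = bin_by_gc(contigs)
for index, names in sorted(bins.items()):
    low = index * 20
    print(f"GC {low}-{low + 20}%: {names}")
```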
  • 20.
    ANNOTATION
    • Annotation is the process that identifies genes, their regulatory sequences and their functions.
    • Annotation also identifies non-protein-coding genes, including those that code for ribosomal RNA, transfer RNA and small nuclear RNAs.
    • In addition, mobile genetic elements and repetitive sequence families present in the genome are identified and characterized.
  • 21.
    • Protein-coding genes are located by inspecting the sequence with computer software.
    • Protein-encoding genes are composed of open reading frames (ORFs).
    • An ORF is a series of codons that specifies an amino acid sequence. It begins with an initiation codon (usually ATG) and ends with a stop codon (TAA, TAG or TGA), both of which are usually identified by the computer.
    • This method is effective for bacterial genomes.
    • Genes in eukaryotic genomes have a pattern of exons (coding regions) alternating with introns (non-coding regions).
    • Genes in humans and other eukaryotes are often widely spaced, increasing the chances of finding false genes. However, newer versions of ORF-scanning software for eukaryotic genomes make scanning more efficient.
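The ORF scan described above can be sketched in a few lines. This version reads only the forward strand in its three frames, starts at ATG, stops at TAA, TAG or TGA, and ignores introns, so it suits the bacterial case only. The example sequence and the `min_codons` threshold are hypothetical, and `find_orfs` is an illustrative name, not a function from an existing gene finder.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(sequence, min_codons=3):
    """Return (start, end, frame) for forward-strand ORFs.

    start is the index of the ATG; end is exclusive and includes the stop codon.
    min_codons counts the codons before the stop codon.
    """
    sequence = sequence.upper()
    orfs = []
    for frame in range(3):
        start = None
        for pos in range(frame, len(sequence) - 2, 3):
            codon = sequence[pos:pos + 3]
            if start is None and codon == "ATG":
                start = pos
            elif start is not None and codon in STOP_CODONS:
                if (pos - start) // 3 >= min_codons:
                    orfs.append((start, pos + 3, frame))
                start = None
    return orfs


# Hypothetical sequence containing one short ORF: ATG GCT GAA TTT TAG.
dna = "CCATGGCTGAATTTTAGGC"
for start, end, frame in find_orfs(dna):
    print(frame, dna[start:end])
```

A fuller scanner would also examine the reverse complement and apply a much larger minimum length to cut down on chance ORFs.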
  • 22.
    STATISTICAL ANALYSIS
    • Metagenomic analysis involves the application of bioinformatics tools to study the genetic material from environmental, uncultured microorganisms.
    • Analysis of metagenomic data involves three major steps: 1) assembly, 2) annotation, and 3) statistical analysis.
    • Gene identification is relatively simple in the case of prokaryotes, but the search for eukaryotic genes is quite difficult.
  • 23.
    BIOINFORMATIC TOOLS USED FOR ANALYSIS AND INTERPRETATION OF GENOME SEQUENCE DATA
    FUNCTION: SOFTWARE
    1. Detection of genes from genome sequence: GeneScan, Glimmer, GENIE, GRAIL, GENEMARK, GeneFinder, HMMGene, etc.
    2. Prediction of a new gene: tBLASTx, FASTA, HMMER, etc.
    3. Identification of functional domains/motifs of proteins: PRINTS, PROSITE, SMART, BLOCKS, etc.
    4. Detection of tRNA: tRNAscan-SE.
    5. Genome annotation:
    • Description of experimental evidence: GAME
    • Indexing and visualization: DAS
    • Storage, manipulation and visualization: BioJava 2001, BioPerl 2001, etc.
  • 24.
    STORAGE AND SHARING OF DATA
    • NCBI is mandated to store all metagenomic data; however, the sheer volume of data being generated means there is an urgent need for appropriate ways of storing vast amounts of sequence.
    • Tools such as IMG/M ER, CAMERA, MG-RAST, and EBI Metagenomics provide an integrated environment for the analysis, management, storage and sharing of metagenome projects.
    • A suite of standard languages for metadata is currently provided by the Minimum Information about any (x) Sequence checklists (MIxS).
    • MIxS is an umbrella term covering MIMS (Minimum Information about a Metagenome Sequence) and MIMARKS (Minimum Information about a MARKer Sequence), which were devised to provide a scheme of standard language for metadata annotation.
  • 25.
    DATABASES
    Database: information available (source)
    • Nucleotide sequence databases: contain nucleotide sequences (GenBank by NCBI, USA; DDBJ, Japan; Nucleotide sequence database by EMBL)
    • GenBank: contains nucleotide sequences of genomic DNA (NCBI, USA)
    • dbEST: EST sequences, i.e. redundant nucleotide sequences (NCBI, USA; EBI, UK)
    • DDBJ (DNA Database of Japan): nucleotide sequences (GenomeNet, Japan)
    • EMBL Nucleotide Sequences Databank (European Molecular Biology Laboratory): DNA sequences (EBI, UK)
    • PDB (Protein DataBank): sequences of those proteins whose 3-D structure is known (NCBI, USA; EBI, UK)
  • 26.
    SOME INDIAN DATABASES
    1. GM CROPS DATABASE: This database was developed by the NRC on Plant Biotechnology, New Delhi. It stores information on the biosafety of transgenics (released in India), including over 800 publications on the subject. For example, a total of 139 transgenic lines using 4 genes (cry1Ab, cry1Ab, cry2Ab and vip3A) and a single promoter (CaMV 35S) have been developed.
    2. VANSHANUDHAN: This database was developed by scientists of the NRC on Plant Biotechnology, New Delhi, as an outcome of the Indian rice genome initiative. It contains information on 56,298 rice genes.
  • 27.
    SOME IMPORTANT DATABASE SEARCH TOOLS
    SEARCH TOOL: FUNCTION PROVIDED
    • BLAST (Basic Local Alignment Search Tool) (NCBI, USA): a group of tools used to analyse sequence information and detect homologous sequences.
    • DNAPLOT (EBI, UK): a sequence alignment tool.
    • LIGAND (GenomeNet, Japan): a chemical database that allows searching for a combination of enzyme and metabolic enzymes; linked to all other publicly accessible databases.
    • PROSITE: a collection of functional sites and sequence patterns found in many proteins; has a search tool for matching patterns.
  • 28.
    APPLICATIONS OF METAGENOMICS
    Metagenomics has the potential to advance knowledge in a wide variety of fields. It can also be applied to solve practical challenges in:
    a) Agriculture
    b) Biofuels
    c) Biotechnology
    d) Ecology
    e) Environmental remediation
    f) Gut microbe characterization
    g) Infectious disease diagnosis
  • 29.
    AGRICULTURE:
    • The soils in which plants grow are inhabited by microbial communities, with one gram of soil containing around 10^9-10^10 microbial cells, which comprise about one gigabase of sequence information.
    • Microbial consortia perform a wide variety of ecosystem services necessary for plant growth, including fixing atmospheric nitrogen, nutrient cycling, disease suppression, and sequestering iron and other metals.
    • Functional metagenomics strategies are being used to explore the interactions between plants and microbes through cultivation-independent study of these microbial communities.
    • By allowing insights into the role of previously uncultivated or rare community members in nutrient cycling and the promotion of plant growth, metagenomic approaches can contribute to improved disease detection in crops and livestock and to the adaptation of enhanced farming practices that improve crop health by harnessing the relationship between microbes and plants.
  • 30.
    BIOFUEL:
    • Biofuels are fuels derived from biomass conversion, as in the conversion of cellulose contained in corn stalks, switchgrass, and other biomass into cellulosic ethanol.
    • This process is dependent upon microbial consortia (associations) that transform the cellulose into sugars, followed by the fermentation of the sugars into ethanol. Microbes also produce a variety of sources of bioenergy, including methane and hydrogen.
    • The efficient industrial-scale deconstruction of biomass requires novel enzymes with higher productivity and lower cost.
    • Metagenomic approaches to the analysis of complex microbial communities allow the targeted screening of enzymes with industrial applications in biofuel production, such as glycoside hydrolases.
    • Metagenomic approaches allow comparative analysis between convergent microbial systems like biogas fermenters or insect herbivores such as the fungus garden of the leafcutter ants.
  • 31.
    BIOTECHNOLOGY:
    • The application of metagenomics has allowed the development of commodity and fine chemicals, agrochemicals and pharmaceuticals where the benefit of enzyme-catalyzed chiral synthesis is increasingly recognized.
    • Two types of analysis are used in the bioprospecting of metagenomic data: function-driven screening for an expressed trait, and sequence-driven screening for DNA sequences of interest.
    • Function-driven analysis seeks to identify clones expressing a desired trait or useful activity, followed by biochemical characterization and sequence analysis. This approach is limited by the availability of a suitable screen and the requirement that the desired trait be expressed in the host cell. Moreover, the low rate of discovery (less than one per 1,000 clones screened) and its labor-intensive nature further limit this approach.
    • In contrast, sequence-driven analysis uses conserved DNA sequences to design PCR primers to screen clones for the sequence of interest.
    • The sequence-driven approach to screening is limited by the breadth and accuracy of gene functions present in public sequence databases.
    • In practice, experiments make use of a combination of both functional and sequence-based approaches, depending on the function of interest, the complexity of the sample to be screened, and other factors.
    • An example of success using metagenomics as a biotechnology for drug discovery is the malacidin antibiotics.
  • 32.
    ECOLOGY:
    • Metagenomic analysis of the bacterial consortia found in the defecations of Australian sea lions suggests that nutrient-rich sea lion faeces may be an important nutrient source for coastal ecosystems. This is because the bacteria that are expelled simultaneously with the defecations are adept at breaking down the nutrients in the faeces into a bioavailable form that can be taken up into the food chain.
    • DNA sequencing can also be used more broadly to identify species present in a body of water, debris filtered from the air, or a sample of dirt. This can establish the range of invasive species and endangered species, and track seasonal populations.
    ENVIRONMENTAL REMEDIATION:
    • Metagenomics can improve strategies for monitoring the impact of pollutants on ecosystems and for cleaning up contaminated environments. Increased understanding of how microbial communities cope with pollutants improves assessments of the potential of contaminated sites to recover from pollution and increases the chances that bioaugmentation or biostimulation trials will succeed.
  • 33.
    GUT MICROBE CHARACTERIZATION:
    • Metagenomic sequencing is being used to characterize the microbial communities from 15–18 body sites from at least 250 individuals. This is part of the Human Microbiome initiative, whose primary goals are to determine whether there is a core human microbiome, to understand the changes in the human microbiome that can be correlated with human health, and to develop new technological and bioinformatics tools to support these goals.
    • Another medical study, part of the MetaHIT (Metagenomics of the Human Intestinal Tract) project, covered 124 individuals from Denmark and Spain, comprising healthy, overweight, and inflammatory bowel disease patients. The study attempted to categorize the depth and phylogenetic diversity of gastrointestinal bacteria. Using Illumina GA sequence data and SOAPdenovo, a de Bruijn graph-based tool specifically designed for assembling short reads, the researchers generated 6.58 million contigs greater than 500 bp, for a total contig length of 10.3 Gb and an N50 length of 2.2 kb.
    • The study demonstrated that two bacterial divisions, Bacteroidetes and Firmicutes, constitute over 90% of the known phylogenetic categories that dominate distal gut bacteria.
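The N50 length quoted for the assembly is a summary statistic: it is the contig length such that contigs of that length or longer together contain at least half of all assembled bases. A minimal sketch of the calculation follows. The contig lengths are hypothetical and are not data from the MetaHIT study.

```python
def n50(contig_lengths):
    """Length L such that contigs of length >= L hold at least half of the total bases."""
    lengths = sorted(contig_lengths, reverse=True)
    half_total = sum(lengths) / 2
    running = 0
    for length in lengths:
        running += length
        if running >= half_total:
            return length
    raise ValueError("no contigs given")


# Hypothetical contig lengths in base pairs.
lengths = [500, 800, 1200, 2200, 3000]
print(n50(lengths))
```

Because a few long contigs can dominate the running total, N50 is usually read alongside the contig count and the total assembled length, as the study reports them.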
  • 34.
    • Using the relative gene frequencies found within the gut, these researchers identified 1,244 metagenomic clusters that are critically important for the health of the intestinal tract.
    • There are two types of functions in these clusters: housekeeping functions and those specific to the intestine.
    • The housekeeping gene clusters are required in all bacteria and are often major players in the main metabolic pathways, including central carbon metabolism and amino acid synthesis. The gut-specific functions include adhesion to host proteins and the harvesting of sugars from globoseries glycolipids.
    • Patients with inflammatory bowel disease were shown to exhibit 25% fewer genes and lower bacterial diversity than individuals not suffering from the disease, indicating that changes in patients' gut biome diversity may be associated with this condition.
    • While these studies highlight some potentially valuable medical applications, only 31–48.8% of the reads could be aligned to 194 public human gut bacterial genomes and 7.6–21.2% to bacterial genomes available in GenBank, which indicates that far more research is still necessary to capture novel bacterial genomes.
  • 35.
    INFECTIOUS DISEASE DIAGNOSIS:
    • Differentiating between infectious and non-infectious illness, and identifying the underlying etiology of infection, can be quite challenging. For example, more than half of cases of encephalitis remain undiagnosed, despite extensive testing using state-of-the-art clinical laboratory methods. Metagenomic sequencing shows promise as a sensitive and rapid method to diagnose infection by comparing genetic material found in a patient's sample to a database of thousands of bacteria, viruses, and other pathogens.
  • 36.
    GENOME SEQUENCING PROJECT
    A model organism is an organism about which a large amount of scientific knowledge is already available. These organisms include both prokaryotes and eukaryotes, including animals.
    • E. coli genome sequencing was completed in 1997. The genome size is over 4.64 x 10^6 bp and it contains 4,408 genes.
    • A. fulgidus is a strictly anaerobic archaebacterium; its genome was published in 1997. The genome size is 2.17 x 10^6 bp and it contains 2,493 genes.
    • Arabidopsis genome sequencing began in 1990 and was completed in 2000. The genome has 130 x 10^6 bp and an estimated 26,000 genes.
    • The Human Genome Project was picked up by the US government in 1984, when planning started; the project formally launched in 1990 and was declared complete on April 14, 2003.
  • 37.
    • Several conclusions were drawn from the human genome draft sequence. Some of the important features are:
    1. It contains over 3.2 billion base pairs.
    2. Only ~5% of the genome encodes proteins.
    3. At least 50% of the genome is derived from transposable elements.
    4. The genome has gene-rich regions separated by gene-poor regions, often called gene deserts.
    5. The human genome is estimated to have about 35,000 genes.
    6. The largest gene is the gene encoding dystrophin; it is 2.5 x 10^6 bp long.
    7. The genome sequences of different individuals differ in less than 0.2% of the base pairs. Most of the differences occur as single-base differences in the sequence. A single-base difference is called a single nucleotide polymorphism (SNP). One SNP occurs about every 1,000 bp of the human genome. About 85% of all differences in human DNA are due to SNPs.
  • 38.
    BENEFITS FROM GENOME SEQUENCING PROJECTS
    • They enable the determination of the complete genetic information present in the genomes of various organisms.
    • The relationships between genes can be deduced with confidence.
    • They provide insights into genome organization and evolution and the mechanisms involved therein.
    • They have opened up exciting areas for future research, e.g. functional genomics.
    • Genome sequences will allow biologists to work out the various molecular interactions that lead to the normal development of an organism.
    • Information such as SNPs has become available; this may be useful in several ways.
    • A variety of tools and techniques were developed for the genome sequencing projects.
    • A better understanding of human genetic diseases should facilitate their cure.
    • They may provide an understanding of why different individuals respond differently to the same drugs (pharmacogenomics).
    • The pathogenicity of microorganisms would be better understood. This should facilitate protection from the diseases they cause.
  • 39.
    LIMITATIONS
    • Too much data.
    • Most genes are not identifiable.
    • Contamination and chimeric clone sequences.
    • Extraction problems.
    • Requires proteomics or expression studies to demonstrate phenotypic characteristics.
    • Needs a standard method for annotating genomes.
    • Can only progress as library technology, including sequencing technology, progresses.
    • Requires high-throughput instrumentation not readily available to most institutions.
  • 40.
    CONCLUSION
    • Metagenomics has benefited in the past few years from many visionary investments in both financial and intellectual terms.
    • The science of metagenomics is currently in its pioneering stages of development as a field, and many tools and technologies are undergoing rapid evolution.
    • Metagenomics is best used as a tool to address fundamental questions of microbial ecology, evolution and diversity, and to derive and test new hypotheses.
    • As datasets become increasingly complex and comprehensive, novel tools for analysis, storage and visualization will be required.
    • Metagenomics allows us to discover new genes and proteins, or even the complete genomes of non-cultivable organisms, in less time and with better accuracy than classical microbiology or molecular methods.
    • In addition to the phenotypic dimensions of human biology, such as gene expression profiling, proteomics and metabolomics, perhaps we need to extend our concept of the human genome to include the more comprehensive and plastic human metagenome in laboratory medicine.
  • 41.
    REFERENCES
    • https://en.wikipedia.org/wiki/Metagenomics
    • https://en.wikipedia.org/wiki/Sequence_assembly
    • https://www.slideshare.net/PradeepBadal/metagenomics-ppt
    • https://genohub.com/shotgun-metagenomics-sequencing/
    • GENOMICS AND BIOINFORMATICS from B.D. SINGH
    • 19.4. SEQUENCING OF GENOMES (pg-704)
    • 19.5. GENOME SEQUENCING PROJECT (pg-706-708)
    • 19.14. DATABASE AND SEARCH TOOLS (pg-737)
    • 19.15. SOME INDIAN DATABASE (pg-741)
    • 19.16. ANALYSIS USING BIOINFORMATICS TOOLS (pg-741)