Introduction to Metagenomics Data Analysis - UEB-VHIR - 2013
1. An Introduction to Metagenomics Data Analysis
Metagenomics Training
Ferran Briansó
VHIR - 26/08/2013
ferran.brianso@vhir.org
4. Introduction
First use of the term metagenome, referencing the idea that a collection of
genes sequenced from the environment could be analyzed in a way analogous
to the study of a single genome.
Handelsman, J.; Rondon, M. R.; Brady, S. F.; Clardy, J.; Goodman, R. M. (1998).
"Molecular biological access to the chemistry of unknown soil microbes: A new
frontier for natural products".
Chemistry & Biology 5 (10): R245–R249. doi:10.1016/S1074-5521(98)90108-9.
PMID 9818143
5. Introduction
“The application of modern genomics techniques to the study of communities
of microbial organisms directly in their natural environments, bypassing the
need for isolation and lab cultivation of individual species.”
Chen, K.; Pachter, L. (2005).
"Bioinformatics for Whole-Genome Shotgun Sequencing of Microbial Communities".
PLoS Computational Biology 1 (2): e24. doi:10.1371/journal.pcbi.0010024
6. Introduction
Source: US Division of Earth & Life Studies of the National Academies
http://dels-old.nas.edu/metagenomics/overview.shtml
11. Terminology
Trimming: the pre-processing step that cleans raw reads from automated DNA sequencers by removing
non-biological sequence (primers, multiplexing barcodes...) prior to sequence assembly and other downstream uses.
Binning: the process of grouping reads or contigs and assigning them to operational taxonomic units (OTUs).
OTU (Operational Taxonomic Unit): the taxonomic level of sampling selected by the user for a study,
typically defined by a percent sequence similarity threshold that decides whether sequences are classified
within the same OTU or in different OTUs.
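The trimming and OTU definitions above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names (trim_read, identity, greedy_otus) are hypothetical, the barcode and primer are assumed to sit at the start of each read, sequence identity is computed position by position without alignment, and the 97% default threshold is a commonly used value rather than one given in these slides. Real pipelines rely on dedicated tools for both steps.

```python
from typing import List, Optional


def trim_read(read: str, barcode: str, primer: str) -> Optional[str]:
    """Strip a leading barcode + primer; return None if the read lacks them."""
    prefix = barcode + primer
    if not read.startswith(prefix):
        return None
    return read[len(prefix):]


def identity(a: str, b: str) -> float:
    """Fraction of matching positions over the shorter sequence (no alignment)."""
    n = min(len(a), len(b))
    if n == 0:
        return 0.0
    matches = sum(1 for x, y in zip(a, b) if x == y)
    return matches / n


def greedy_otus(seqs: List[str], threshold: float = 0.97) -> List[List[str]]:
    """Put each sequence in the first OTU whose seed it matches at >= threshold.

    The first member of every OTU acts as its seed; a sequence that matches
    no existing seed starts a new OTU. The threshold is an assumption.
    """
    otus: List[List[str]] = []
    for s in seqs:
        for otu in otus:
            if identity(s, otu[0]) >= threshold:
                otu.append(s)
                break
        else:
            otus.append([s])
    return otus
```

Lowering the threshold merges more sequences into the same OTU; raising it splits them, which is why the threshold is described as a choice made by the user.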
12. Terminology
Chimeras: artificial sequences formed during PCR amplification. Most are believed to arise from incomplete
extension: during subsequent PCR cycles, a partially extended strand can bind to a template derived from a
different but similar sequence, where it acts as a primer and is extended to form a chimeric sequence
(Smith et al., 2010; Thompson et al., 2002; Meyerhans et al., 1990; Judo et al., 1998; Odelberg, 1995).
A chimeric template is created during one round and then amplified in subsequent rounds, producing chimeric
amplicons that are difficult to distinguish from amplicons derived from a single biological sequence.
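To make the chimera mechanism concrete, the following toy check asks whether a query sequence can be written as the start of one parent followed by the rest of another. It is a minimal sketch, not a real chimera detector: the function name chimera_breakpoint is hypothetical, and it assumes equal-length, already aligned sequences with exact matches and a single crossover point.

```python
from typing import Optional


def chimera_breakpoint(query: str, parent_a: str, parent_b: str) -> Optional[int]:
    """Return k such that query == parent_a[:k] + parent_b[k:], else None.

    Assumptions: all three sequences have the same length and are already
    aligned; a query identical to either parent is not reported as chimeric.
    Only the order "parent_a first, parent_b second" is tested.
    """
    if not (len(query) == len(parent_a) == len(parent_b)):
        return None
    if query in (parent_a, parent_b):
        return None
    for k in range(1, len(query)):
        # Prefix must come from parent_a, suffix from parent_b.
        if query[:k] == parent_a[:k] and query[k:] == parent_b[k:]:
            return k
    return None
```

Calling the function a second time with the parents swapped covers the opposite orientation. When the parents are very similar, many breakpoints are equally valid, which illustrates why chimeric amplicons are hard to tell apart from genuine sequences.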
13. Terminology
Alpha diversity: the diversity within a particular area or ecosystem, expressed as the number of species (i.e.,
species richness) in that ecosystem or by one or more diversity indices.
Beta diversity: a comparison of diversity between ecosystems, usually measured as the amount of species
change between the ecosystems.
Gamma diversity: a measure of the overall diversity within a large region; geographic-scale species diversity
according to Hunter (2002:448).
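The three diversity levels can be illustrated with simple count tables. The sketch below is one possible reading, with assumptions stated plainly: alpha diversity is shown as species richness and as the Shannon index (one of many diversity indices), beta diversity as the Jaccard distance on presence/absence (one of several ways to measure species change), and gamma diversity as the richness of the pooled communities. All function names are hypothetical.

```python
import math
from typing import Dict

Counts = Dict[str, int]  # taxon name -> number of observations


def richness(counts: Counts) -> int:
    """Alpha diversity as the number of taxa observed at least once."""
    return sum(1 for n in counts.values() if n > 0)


def shannon(counts: Counts) -> float:
    """Alpha diversity as the Shannon index H = -sum(p * ln p)."""
    total = sum(counts.values())
    if total == 0:
        return 0.0
    h = 0.0
    for n in counts.values():
        if n > 0:
            p = n / total
            h -= p * math.log(p)
    return h


def jaccard_distance(a: Counts, b: Counts) -> float:
    """Beta diversity as 1 - |shared taxa| / |all taxa| (presence/absence)."""
    taxa_a = {k for k, n in a.items() if n > 0}
    taxa_b = {k for k, n in b.items() if n > 0}
    union = taxa_a | taxa_b
    if not union:
        return 0.0
    return 1 - len(taxa_a & taxa_b) / len(union)


def gamma_richness(*communities: Counts) -> int:
    """Gamma diversity as the richness of all communities pooled together."""
    pooled = set()
    for c in communities:
        pooled.update(k for k, n in c.items() if n > 0)
    return len(pooled)
```

For two hypothetical communities {"A": 10, "B": 10} and {"B": 5, "C": 5}, each has a richness of 2, they share one taxon out of three, and the pooled richness is 3.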
14. Terminology
Rarefaction: a technique that allows the calculation of species richness for a given number of individual
samples, based on the construction of so-called rarefaction curves. A rarefaction curve plots the number of
species as a function of the number of samples taken (for sequencing data, typically the number of reads drawn).
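A rarefaction curve can be computed exactly from a table of per-species counts. The sketch below uses the classical hypergeometric expectation for individual-based rarefaction: when n individuals are drawn without replacement from N, a species with N_i individuals is missed with probability C(N - N_i, n) / C(N, n). The function names are hypothetical, and the code assumes that counts are individuals (or reads) per species.

```python
from math import comb
from typing import List, Tuple


def expected_richness(counts: List[int], n: int) -> float:
    """Expected number of species seen when n individuals are drawn at random.

    counts holds the number of individuals per species; sampling is without
    replacement from the pooled total.
    """
    total = sum(counts)
    if not 0 <= n <= total:
        raise ValueError("n must be between 0 and the total number of individuals")
    denom = comb(total, n)
    # comb(total - c, n) is 0 when n > total - c, i.e. the species cannot be missed.
    return sum(1 - comb(total - c, n) / denom for c in counts if c > 0)


def rarefaction_curve(counts: List[int], step: int = 1) -> List[Tuple[int, float]]:
    """(sample size, expected richness) pairs from 0 up to the full sample."""
    total = sum(counts)
    return [(n, expected_richness(counts, n)) for n in range(0, total + 1, step)]
```

The curve starts at zero, rises steeply while common species are being discovered, and flattens as it approaches the observed richness. A curve that is still climbing at the full sample size suggests that more sampling would reveal additional species.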
54. Ferran Briansó
MGTraining 26/08/2013
Thanks for your attention
ferran.brianso@vhir.org
An Introduction to Metagenomics Data Analysis
more info at
http://ueb.vhir.org/MGT