Metagenomics Annotation Algorithms Genomic Signatures Anammox Retrieval
Metagenomic Data Analysis:
Computational Methods and Applications
Fabio Gori
Intelligent Systems, Institute for Computing and Information Sciences
in collaboration with
Department of Microbiology
Radboud University Nijmegen
The Netherlands
Table of Contents
Introduction to Metagenomics
Taxonomic-annotation Algorithms
Genomic Signatures for Metagenomics
Metagenomics to Retrieve Anammox Bacteria
What is Metagenomics?
Metagenomics: study of genomic information obtained directly from microbial communities
Why?
• 99% of microbes cannot be sequenced with traditional culture-based methods
• Understand interactions between organisms
What kind of data? A meta... jigsaw puzzle
DNA sequences (reads)
• Original pictures are unknown
• Pieces are similar
• Pieces have errors
Annotation: discovering the original pictures of the puzzles
Assign each read to an organism or to a taxonomic identifier
Taxonomy: a biological classification
Linnaean taxonomy:
• Formal system for classifying and naming living things
• Based on a simple hierarchical structure
• Similar elements are grouped together
Rank: level in the hierarchy (e.g. Kingdom, Phylum, Class, Order, Family, Genus)
Taxon: unit of the hierarchy (group of similar living things)
Similarity-based methods
Algorithm scheme
1 Compare reads to reference sequences
2 Assign each read to a taxon of one of its best-matching sequences
Comparison is performed with sequence alignment or composition profiles
Problems (Lowest Common Ancestor algorithm):
• Few reads at low ranks
• Many unassigned reads
How can we improve it?
Idea: assignments of reads are dependent on each other
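The problems listed for the Lowest Common Ancestor (LCA) algorithm follow from how it resolves ambiguity. A minimal sketch, assuming each best-matching reference sequence comes with a root-first lineage: a read is assigned to the deepest taxon shared by all of its best hits, so reads that hit several related taxa end up at high ranks or stay unassigned. The function name `lca_assign` and the toy lineages are hypothetical and do not come from the slides.

```python
def lca_assign(lineages):
    """Return the deepest taxon shared by all lineages (root first), or None."""
    if not lineages:
        return None  # no hits: the read stays unassigned
    common = None
    # zip stops at the shortest lineage; walk down the hierarchy rank by rank
    for taxa_at_rank in zip(*lineages):
        if len(set(taxa_at_rank)) != 1:
            break  # the hits disagree from this rank downwards
        common = taxa_at_rank[0]
    return common


# Hypothetical best hits of a single read
hits = [
    ("Bacteria", "Proteobacteria", "Alphaproteobacteria"),
    ("Bacteria", "Proteobacteria", "Betaproteobacteria"),
]
print(lca_assign(hits))  # Proteobacteria
```

Because each read is treated on its own, two hits that disagree at the class level push the read up to the phylum, which is consistent with the "few reads at low ranks" problem noted above.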
MTR: Annotation via combinatorial optimization
For each rank $j$:
    For each taxon $t_i$ of rank $j$:
        Create cluster $C_i$ of sequences similar to taxon $t_i$
Set Covering Problem
Select a collection of clusters (taxa) such that
• No sequence is left outside
• The number of selected clusters is minimal
If $C_i$ is selected, the sequences of $C_i$ will be assigned to $t_i$
Example:
[Figure: membership matrix of sequences s1, ..., s10 (rows) against clusters C1, ..., C6 (columns), where • marks that a sequence belongs to a cluster, followed by the same matrix with the clustering solution highlighted]
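The set covering step can be illustrated with the classic greedy approximation. This is only a sketch: the slides do not say which solver MTR uses, so the greedy heuristic is a stand-in, and the cluster memberships below are made up for illustration. `greedy_set_cover` is an illustrative name.

```python
def greedy_set_cover(universe, clusters):
    """Greedy approximation to the set covering problem.

    universe: set of sequence ids that must be covered
    clusters: dict mapping cluster name -> set of sequence ids
    Returns the selected cluster names, in selection order.
    """
    uncovered = set(universe)
    selected = []
    while uncovered:
        # pick the cluster covering the most still-uncovered sequences
        best = max(clusters, key=lambda c: len(clusters[c] & uncovered))
        gained = clusters[best] & uncovered
        if not gained:
            raise ValueError("some sequences belong to no cluster")
        selected.append(best)
        uncovered -= gained
    return selected


# Hypothetical memberships: cluster C_i holds the sequences similar to taxon t_i
clusters = {
    "C1": {"s1", "s2", "s3", "s4"},
    "C2": {"s1", "s5"},
    "C3": {"s4", "s6", "s7"},
    "C4": {"s5", "s6", "s7", "s8"},
    "C5": {"s9", "s10"},
    "C6": {"s3", "s9"},
}
universe = {f"s{i}" for i in range(1, 11)}
cover = greedy_set_cover(universe, clusters)
print(cover)  # ['C1', 'C4', 'C5']
```

The greedy choice does not guarantee a minimum cover in general; an exact formulation would need an integer-programming solver.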
Results
Rank      MTR (# of reads)    LCA (# of reads)
Kingdom   95.07 (88,537)      94.66 (73,176)
Phylum    93.21 (88,537)      92.57 (73,169)
Class     89.25 (87,635)      88.98 (60,294)
Order     89.24 (85,657)      88.44 (57,373)
Family    77.35 (81,366)      81.84 (48,760)
Genus     61.36 (77,307)      74.60 (40,823)
Table: dataset M2, coverage 1X, total reads: 288,730
[Figure: population distributions (rank Genus) of M2, coverage 0.1X]
• More sequences annotated at low ranks
• Better estimate of population distribution
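Metagenomic annotation in two steps
1 Representation: DNA sequences (strings of A, C, G, T) are mapped to vectors, $\rho : \{A, C, G, T\}^* \to \mathbb{R}^n$
2 Classification or clustering of the vectors in $\mathbb{R}^n$
In this study: focus on $\rho$, the data representation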
Typical $\rho$'s used in binning
$\rho^T(s)$ := frequencies in $s$ of all the $k$-mers
$k$-mer := sequence of $k$ nucleotides, i.e. an element of $\{A, C, G, T\}^k$
$\rho^T_i(s) := \#w_i$, where $w_i$ is a $k$-mer, $i = 1, \dots, 4^k$
Usually $k = 4 \implies 4^k = 256$ features: $\rho^T(s) \in \mathbb{N}^{256}$
[ Mohammed et al., Bioinformatics, 2011], [ Diaz et al., BMC Bioinformatics, 2009]
[ Chan et al., J. Biomed. Biotech., 2008], [ Teeling et al., Environ. Microb., 2004]
Example:
s = AGCATGCAGCATATGTGGAGCA
$\rho^T(s) = (\#AAAA = 0, \dots, \#AGCA = 3, \dots, \#ATAT = 1, \dots, \#GCAT = 2, \dots)$
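The counts in the example can be reproduced with a few lines of code. The sketch below assumes overlapping k-mers counted on the given strand only, and a lexicographic ordering of the features; both choices, and the name `kmer_counts`, are assumptions rather than details stated on the slide.

```python
from collections import Counter
from itertools import product


def kmer_counts(s, k=4):
    """Count every overlapping k-mer of s.

    Returns a dict with one entry per k-mer over {A, C, G, T},
    i.e. 4**k entries, in lexicographic order.
    """
    counts = Counter(s[i:i + k] for i in range(len(s) - k + 1))
    kmers = ("".join(p) for p in product("ACGT", repeat=k))
    return {w: counts[w] for w in kmers}


s = "AGCATGCAGCATATGTGGAGCA"
rho = kmer_counts(s)
print(len(rho), rho["AAAA"], rho["AGCA"], rho["ATAT"], rho["GCAT"])  # 256 0 3 1 2
```

A sequence of length 22 contains 19 overlapping 4-mers, so most of the 256 features are zero for a single short read.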
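What $\rho$ should do
$\rho$ maps the sequences $s$, $z$ and $r$ to the points $\rho(s)$, $\rho(z)$, $\rho(r)$ in $\mathbb{R}^n$
$\rho$ needs to be a genomic signature [ Karlin et al., Trends. Genet., 1995 ]:
• $\rho(s) \approx \rho(z)$ for sequences $s$ and $z$ from the same genome
• $\rho(s) \neq \rho(r)$ for a sequence $r$ from a different genome
but few connections with metagenomics in the literature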
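One hedged way to check the signature property numerically is to compare k-mer profiles with a distance. The sketch below assumes relative tetranucleotide frequencies and the Euclidean distance; neither choice is taken from the slides, and the three toy sequences are artificial stand-ins for $s$, $z$ and $r$.

```python
from collections import Counter
from math import sqrt


def profile(s, k=4):
    """Relative k-mer frequencies of s as a sparse dict (values sum to 1)."""
    counts = Counter(s[i:i + k] for i in range(len(s) - k + 1))
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()}


def distance(p, q):
    """Euclidean distance between two sparse profiles."""
    keys = set(p) | set(q)
    return sqrt(sum((p.get(w, 0.0) - q.get(w, 0.0)) ** 2 for w in keys))


# Toy sequences: s and z share a composition, r does not
s, z, r = "ACGT" * 6, "ACGT" * 5, "AATT" * 6
d_sz = distance(profile(s), profile(z))
d_sr = distance(profile(s), profile(r))
print(d_sz < d_sr)  # True
```

Normalizing by the number of k-mers makes sequences of different lengths comparable, which raw counts would not.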
Results
• Proposed signatures outperformed the standard signature $\rho^T$ used in metagenomics
• Best signatures had fewer features (half the number of dimensions)
Sequencing communities containing anammox bacteria
Anammox: ANaerobic AMMonium OXidation
Why anammox bacteria are important:
• Fixed nitrogen loss
• Wastewater-treatment plants
Metagenomics: only way to retrieve anammox bacteria
• Difficult to culture
• Cannot be isolated
Why don't FISH analysis and BlastX annotation agree?
Different point of view: GC content
[ Bernaola-Galvan et al., Gene, 2004 ]
• Different organisms can have different GC content (16.6% - 74.9%)
• If a genome is partitioned into equally sized, non-overlapping sequences:
  • GC content has (approximately) a normal distribution
  • Distribution is centered on the organism's GC content
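The partitioning described above is straightforward to compute. A minimal sketch, assuming the genome is available as a plain string of A, C, G, T; the names `gc_content` and `window_gc`, the window size and the toy string are illustrative only.

```python
def gc_content(seq):
    """Fraction of G and C nucleotides in seq."""
    return (seq.count("G") + seq.count("C")) / len(seq)


def window_gc(genome, size):
    """GC content of consecutive, equally sized, non-overlapping windows.

    A trailing fragment shorter than `size` is dropped.
    """
    return [
        gc_content(genome[i:i + size])
        for i in range(0, len(genome) - size + 1, size)
    ]


genome = "AGCATGCAGCATATGTGGAGCA"  # toy string, far shorter than a real genome
values = window_gc(genome, size=10)
mean_gc = sum(values) / len(values)
print(values, mean_gc)
```

On a real genome, a histogram of `values` would be expected to look roughly bell-shaped around the organism's overall GC content, which is what allows organisms with different GC content to show up as separate peaks.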
Bias toward high GC-content organisms
[Figure: histograms of GC-content (x-axis, 0% to 100%) against Frequency (y-axis) in three panels: 454, Fosmid, Shotgun. Legend: Raw, Annotated, Brocadia, Alphaproteobacteria, Betaproteobacteria]
Combining technologies improved protein retrieval
Extended Venn diagram of proteins retrieved for 80% of their length
• Retrieve anammox core genes
Technologies: Shotgun (Sanger), Fosmid (Sanger), 454
Conclusions
• Proposed new effective methods for improving metagenomic data analysis
• Studied in detail real-life data of anammox bacteria
 
Traditional Agroforestry System in India- Shifting Cultivation, Taungya, Home...
Traditional Agroforestry System in India- Shifting Cultivation, Taungya, Home...Traditional Agroforestry System in India- Shifting Cultivation, Taungya, Home...
Traditional Agroforestry System in India- Shifting Cultivation, Taungya, Home...
 
Animal Communication- Auditory and Visual.pptx
Animal Communication- Auditory and Visual.pptxAnimal Communication- Auditory and Visual.pptx
Animal Communication- Auditory and Visual.pptx
 
Scheme-of-Work-Science-Stage-4 cambridge science.docx
Scheme-of-Work-Science-Stage-4 cambridge science.docxScheme-of-Work-Science-Stage-4 cambridge science.docx
Scheme-of-Work-Science-Stage-4 cambridge science.docx
 
CALL ON ➥8923113531 🔝Call Girls Kesar Bagh Lucknow best Night Fun service 🪡
CALL ON ➥8923113531 🔝Call Girls Kesar Bagh Lucknow best Night Fun service  🪡CALL ON ➥8923113531 🔝Call Girls Kesar Bagh Lucknow best Night Fun service  🪡
CALL ON ➥8923113531 🔝Call Girls Kesar Bagh Lucknow best Night Fun service 🪡
 
Hubble Asteroid Hunter III. Physical properties of newly found asteroids
Hubble Asteroid Hunter III. Physical properties of newly found asteroidsHubble Asteroid Hunter III. Physical properties of newly found asteroids
Hubble Asteroid Hunter III. Physical properties of newly found asteroids
 
Nanoparticles synthesis and characterization​ ​
Nanoparticles synthesis and characterization​  ​Nanoparticles synthesis and characterization​  ​
Nanoparticles synthesis and characterization​ ​
 
TOPIC 8 Temperature and Heat.pdf physics
TOPIC 8 Temperature and Heat.pdf physicsTOPIC 8 Temperature and Heat.pdf physics
TOPIC 8 Temperature and Heat.pdf physics
 
Lucknow 💋 Russian Call Girls Lucknow Finest Escorts Service 8923113531 Availa...
Lucknow 💋 Russian Call Girls Lucknow Finest Escorts Service 8923113531 Availa...Lucknow 💋 Russian Call Girls Lucknow Finest Escorts Service 8923113531 Availa...
Lucknow 💋 Russian Call Girls Lucknow Finest Escorts Service 8923113531 Availa...
 
Analytical Profile of Coleus Forskohlii | Forskolin .pdf
Analytical Profile of Coleus Forskohlii | Forskolin .pdfAnalytical Profile of Coleus Forskohlii | Forskolin .pdf
Analytical Profile of Coleus Forskohlii | Forskolin .pdf
 
PossibleEoarcheanRecordsoftheGeomagneticFieldPreservedintheIsuaSupracrustalBe...
PossibleEoarcheanRecordsoftheGeomagneticFieldPreservedintheIsuaSupracrustalBe...PossibleEoarcheanRecordsoftheGeomagneticFieldPreservedintheIsuaSupracrustalBe...
PossibleEoarcheanRecordsoftheGeomagneticFieldPreservedintheIsuaSupracrustalBe...
 
Grafana in space: Monitoring Japan's SLIM moon lander in real time
Grafana in space: Monitoring Japan's SLIM moon lander  in real timeGrafana in space: Monitoring Japan's SLIM moon lander  in real time
Grafana in space: Monitoring Japan's SLIM moon lander in real time
 
Neurodevelopmental disorders according to the dsm 5 tr
Neurodevelopmental disorders according to the dsm 5 trNeurodevelopmental disorders according to the dsm 5 tr
Neurodevelopmental disorders according to the dsm 5 tr
 
Recombination DNA Technology (Microinjection)
Recombination DNA Technology (Microinjection)Recombination DNA Technology (Microinjection)
Recombination DNA Technology (Microinjection)
 
Isotopic evidence of long-lived volcanism on Io
Isotopic evidence of long-lived volcanism on IoIsotopic evidence of long-lived volcanism on Io
Isotopic evidence of long-lived volcanism on Io
 
STERILITY TESTING OF PHARMACEUTICALS ppt by DR.C.P.PRINCE
STERILITY TESTING OF PHARMACEUTICALS ppt by DR.C.P.PRINCESTERILITY TESTING OF PHARMACEUTICALS ppt by DR.C.P.PRINCE
STERILITY TESTING OF PHARMACEUTICALS ppt by DR.C.P.PRINCE
 
Module 4: Mendelian Genetics and Punnett Square
Module 4:  Mendelian Genetics and Punnett SquareModule 4:  Mendelian Genetics and Punnett Square
Module 4: Mendelian Genetics and Punnett Square
 
Dashanga agada a formulation of Agada tantra dealt in 3 Rd year bams agada tanta
Dashanga agada a formulation of Agada tantra dealt in 3 Rd year bams agada tantaDashanga agada a formulation of Agada tantra dealt in 3 Rd year bams agada tanta
Dashanga agada a formulation of Agada tantra dealt in 3 Rd year bams agada tanta
 
Analytical Profile of Coleus Forskohlii | Forskolin .pptx
Analytical Profile of Coleus Forskohlii | Forskolin .pptxAnalytical Profile of Coleus Forskohlii | Forskolin .pptx
Analytical Profile of Coleus Forskohlii | Forskolin .pptx
 

Metagenomic Data Analysis: Computational Methods and Applications

  • 1. Metagenomics Annotation Algorithms Genomic Signatures Anammox Retrieval Metagenomic Data Analysis: Computational Methods and Applications Fabio Gori Intelligent Systems, Institute for Computing and Information Sciences in collaboration with Department of Microbiology Radboud University Nijmegen The Netherlands
  • 2-3. Table of Contents: Introduction to Metagenomics; Taxonomic-annotation Algorithms; Genomic Signatures for Metagenomics; Metagenomics to Retrieve Anammox Bacteria
  • 4-6. What is Metagenomics?
      Metagenomics: the study of genomic information obtained directly from microbial communities.
      Why?
      • 99% of microbes cannot be sequenced with standard, culture-based methods
      • Understand interactions between organisms
  • 7. What kind of data? A meta... jigsaw puzzle
      DNA sequences (reads):
      • Original pictures are unknown
      • Pieces are similar
      • Pieces have errors
  • 8. Annotation: discovering the original pictures of the puzzles
      Assign each read to an organism or to a taxonomic identifier.
  • 9. Taxonomy: a biological classification
      Linnean taxonomy:
      • Formal system for classifying and naming living things
      • Based on a simple hierarchical structure
      • Similar elements are grouped together
      Rank: level in the hierarchy (shown at left in the slide's figure)
      Taxon: unit of the hierarchy (group of similar living things)
  • 11-12. Similarity-based methods
      Algorithm scheme:
      1. Compare reads to reference sequences
      2. Assign each read to a taxon of one of its best-matching sequences
      Comparison is performed with sequence alignment or a composition profile.
      Problems (Lowest Common Ancestor algorithm):
      • Few reads at low ranks
      • Many unassigned reads
      How can we improve it? Idea: assignments of reads are dependent on each other.
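The Lowest Common Ancestor rule mentioned above can be sketched in a few lines. This is only an illustration under stated assumptions: the `PARENT` taxonomy is a hypothetical toy tree, and the function names are invented for this example rather than taken from any tool. It shows why ambiguous best matches push a read up to a higher rank, and why a read with no matches stays unassigned.

```python
from typing import Dict, List, Optional

# Hypothetical toy taxonomy (assumption): child taxon -> parent taxon.
PARENT: Dict[str, Optional[str]] = {
    "root": None,
    "Bacteria": "root",
    "Proteobacteria": "Bacteria",
    "Alphaproteobacteria": "Proteobacteria",
    "Betaproteobacteria": "Proteobacteria",
    "Planctomycetes": "Bacteria",
}


def lineage(taxon: str) -> List[str]:
    """Return the path from the root down to `taxon`."""
    path: List[str] = []
    node: Optional[str] = taxon
    while node is not None:
        path.append(node)
        node = PARENT[node]
    return path[::-1]


def lowest_common_ancestor(taxa: List[str]) -> Optional[str]:
    """Deepest taxon shared by all lineages; None when the read has no matches."""
    if not taxa:
        return None  # the read stays unassigned
    lca: Optional[str] = None
    # Walk the lineages level by level from the root until they diverge.
    for level in zip(*(lineage(t) for t in taxa)):
        if all(t == level[0] for t in level):
            lca = level[0]
        else:
            break
    return lca


# Taxa of the best-matching reference sequences of one read.
print(lowest_common_ancestor(["Alphaproteobacteria", "Betaproteobacteria"]))
print(lowest_common_ancestor(["Alphaproteobacteria", "Planctomycetes"]))
```

In the second call the matches disagree already below Bacteria, so the read can only be placed at that high rank, which is the "few reads at low ranks" problem listed on the slide.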
  • 13-14. MTR: Annotation via combinatorial optimization
      For each rank j:
        For each taxon $t_i$ of rank j:
          Create cluster $C_i$ of sequences similar to taxon $t_i$
      Set Covering Problem: select a collection of clusters (taxa) such that
      • No sequence is left outside
      • The number of selected clusters is minimal
      If $C_i$ is selected, the sequences of $C_i$ will be assigned to $t_i$.
      [Figure: example membership matrix of sequences s1 to s10 versus clusters C1 to C6, and the subset of clusters selected as the clustering solution]
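The set-covering step can be illustrated with the classic greedy heuristic. This is a minimal sketch, not the method used in MTR: the slides do not say how the covering problem is solved, the greedy rule only approximates the minimal cover, and the cluster contents below are hypothetical toy data rather than the matrix from the slide.

```python
from typing import Dict, List, Set


def greedy_set_cover(clusters: Dict[str, Set[str]]) -> List[str]:
    """Pick clusters until every sequence is covered (greedy approximation)."""
    uncovered: Set[str] = set().union(*clusters.values())
    chosen: List[str] = []
    while uncovered:
        # Take the cluster covering the most still-uncovered sequences;
        # ties are broken by cluster name for reproducibility.
        best = max(sorted(clusters), key=lambda c: len(clusters[c] & uncovered))
        chosen.append(best)
        uncovered -= clusters[best]
    return chosen


def assign(clusters: Dict[str, Set[str]], chosen: List[str]) -> Dict[str, str]:
    """Assign each sequence to the first selected cluster (taxon) containing it."""
    assignment: Dict[str, str] = {}
    for cluster in chosen:
        for seq in sorted(clusters[cluster]):
            assignment.setdefault(seq, cluster)
    return assignment


# Hypothetical toy clusters: cluster C_i holds the sequences similar to taxon t_i.
toy_clusters = {
    "C1": {"s1", "s2", "s3"},
    "C2": {"s3", "s4"},
    "C3": {"s4", "s5", "s6"},
    "C4": {"s1", "s6"},
}
selected = greedy_set_cover(toy_clusters)
print(selected)
print(assign(toy_clusters, selected))
```

Because every sequence belongs to at least one cluster, the loop always terminates, and sequences shared by several clusters end up in one of the few selected taxa instead of being pushed to a higher rank.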
  • 15-16. Results
      Rank      MTR (# of reads)   LCA (# of reads)
      Kingdom   95.07 (88,537)     94.66 (73,176)
      Phylum    93.21 (88,537)     92.57 (73,169)
      Class     89.25 (87,635)     88.98 (60,294)
      Order     89.24 (85,657)     88.44 (57,373)
      Family    77.35 (81,366)     81.84 (48,760)
      Genus     61.36 (77,307)     74.60 (40,823)
      Table: data name M2, coverage 1X, total reads 288,730
      [Figure: population distributions (rank Genus) of M2, coverage 0.1X]
      • More sequences annotated at low ranks
      • Better estimate of population distribution
  • 18-21. Metagenomic annotation in two steps
      1. Map DNA sequences (strings of A, C, G, T) to vectors: $\rho: \text{sequences} \to \mathbb{R}^n$
      2. Classification or clustering
      In this study: focus on $\rho$, the data representation
  • 22-26. Typical ρ's used in binning
      $\rho^T(s)$ := frequencies in $s$ of all the k-mers
      k-mer := sequence of $k$ nucleotides, i.e. an element of $\{A, C, G, T\}^k$
      $\rho^T_i(s) := \#w_i$, where $w_i$ is a k-mer, $i = 1, \dots, 4^k$
      Usually $k = 4 \Rightarrow 4^k = 256$ features: $\rho^T(s) \in \mathbb{N}^{256}$
      [Mohammed et al., Bioinformatics, 2011], [Diaz et al., BMC Bioinformatics, 2009], [Chan et al., J. Biomed. Biotech., 2008], [Teeling et al., Environ. Microb., 2004]
      Example: s = AGCATGCAGCATATGTGGAGCA
      $\rho^T(s) = (\#AAAA = 0, \dots, \#AGCA = 3, \dots, \#ATAT = 1, \dots, \#GCAT = 2, \dots)$
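The k-mer count vector defined above is straightforward to compute. The sketch below is one possible implementation, with assumptions stated plainly: the function names `kmer_index` and `kmer_counts` are made up for this example, k-mers are ordered lexicographically, overlapping windows are counted, and windows containing symbols other than A, C, G, T are skipped.

```python
from itertools import product
from typing import Dict, List


def kmer_index(k: int = 4) -> Dict[str, int]:
    """Map every k-mer over {A, C, G, T} to its position in lexicographic order."""
    return {"".join(p): i for i, p in enumerate(product("ACGT", repeat=k))}


def kmer_counts(s: str, k: int = 4) -> List[int]:
    """Count overlapping k-mers of s; windows with other symbols (e.g. N) are skipped."""
    index = kmer_index(k)
    counts = [0] * len(index)
    for start in range(len(s) - k + 1):
        position = index.get(s[start:start + k].upper())
        if position is not None:
            counts[position] += 1
    return counts


# The example sequence from the slide.
s = "AGCATGCAGCATATGTGGAGCA"
index = kmer_index(4)
rho = kmer_counts(s, 4)
for word in ("AAAA", "AGCA", "ATAT", "GCAT"):
    print(word, rho[index[word]])
# prints: AAAA 0, AGCA 3, ATAT 1, GCAT 2 (one pair per line)
```

For k = 4 the vector has 256 entries, and for this 22-nucleotide sequence they sum to 19, the number of overlapping windows.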
  • 27-30. What ρ should do
      [Figure: sequences z, s and r mapped by $\rho$ to points $\rho(z)$, $\rho(s)$, $\rho(r)$ in $\mathbb{R}^n$]
      ρ needs to be a genomic signature [Karlin et al., Trends Genet., 1995]:
      $\rho(s) \approx \rho(z)$ and $\rho(s) \neq \rho(r)$ (s and z presumably come from the same genome, r from a different one),
      but there are few connections with metagenomics in the literature.
  • 31-32. Results
      • Proposed signatures outperformed the standard signature $\rho^T$ used in metagenomics
      • Best signatures had fewer features (half the number of dimensions)
  • 34-37. Sequencing communities containing anammox bacteria
      Anammox: ANaerobic AMMonium OXidation
      Why anammox bacteria are important:
      • Fixed nitrogen loss
      • Wastewater-treatment plants
      Metagenomics: the only way to retrieve anammox bacteria
      • Difficult to culture
      • Not isolable
  • 38. Why don't FISH analysis and BlastX annotation agree?
  • 39. Different point of view: GC content [Bernaola-Galvan et al., Gene, 2004]
      • Different organisms can have different GC content (16.6% - 74.9%)
      • If a genome is partitioned into equally sized, non-overlapping sequences:
        • GC content is approximately normally distributed
        • The distribution is centered on the organism's GC content
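The partition described above can be sketched as follows. The helper names `gc_content` and `window_gc` and the short input string are assumptions made for illustration; a real analysis would use whole genomes or read sets and far larger windows.

```python
from typing import List


def gc_content(seq: str) -> float:
    """Fraction of G and C nucleotides in a non-empty sequence."""
    if not seq:
        raise ValueError("empty sequence")
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)


def window_gc(genome: str, window: int) -> List[float]:
    """GC content of equally sized, non-overlapping windows.

    A trailing piece shorter than `window` is dropped so all pieces have equal size.
    """
    return [
        gc_content(genome[start:start + window])
        for start in range(0, len(genome) - window + 1, window)
    ]


# Hypothetical toy input, far shorter than a real genome.
toy_genome = "ATGCGGCCATATGCGC"
values = window_gc(toy_genome, 4)
print(values)
print(sum(values) / len(values), gc_content(toy_genome))
```

When the windows tile the sequence exactly, as here, the mean of the window values equals the overall GC content, which is consistent with the distribution being centered on the organism's GC content.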
  • 40. Bias toward high GC-content organisms
      [Figure: GC-content histograms (x axis: GC-content, 0% to 100%; y axis: frequency) in three panels, 454, Fosmid and Shotgun; legend: Raw, Annotated, Brocadia, Alphaproteobacteria, Betaproteobacteria]
  • 41. Combining technologies improved protein retrieval
      [Figure: extended Venn diagram of proteins retrieved for 80% of their length, by technology: Shotgun (Sanger), Fosmid (Sanger), 454]
      • Retrieve anammox core genes
  • 42-43. Conclusions
      • Proposed new, effective methods for improving metagenomic data analysis
      • Studied in detail real-life data of anammox bacteria