Metagenomics Annotation Algorithms Metagenomics to acquire a single genome Antibiotic Resistance Tailored refere
Metagenomic Data Analysis and Microbial
Genomics
Fabio Gori
Intelligent Systems, Institute for Computing and Information Sciences
in collaboration with
Department of Microbiology
Radboud University Nijmegen
The Netherlands
22nd May 2015
Table of Contents
1 Introduction to Metagenomics
2 Taxonomic-annotation Algorithms
3 Metagenomics to Retrieve a Bacterium
4 Comparative Genomics for Antibiotic Resistance
5 Appendix: construction of a tailored reference
What is Metagenomics?
Metagenomics: the study of genomic information obtained directly from microbial communities
Why?
About 99% of microbes cannot be cultured in isolation, so they cannot be sequenced individually
Understand interactions between organisms
Human microbiota
What kind of data? A meta... jigsaw puzzle
Reads of multiple microbes
Original pictures are unknown
Pieces are similar
Biased abundance of pieces
Annotation: discovering the original pictures of the puzzles
Assign each read to an organism or to a taxonomic identifier
Taxonomy: a biological classification
Linnaean taxonomy:
Formal system for classifying and naming living things
Based on a simple hierarchical structure
Similar elements are grouped together
Rank: level in the hierarchy (left)
Taxon: unit of the hierarchy (group of similar living things)
2 Taxonomic-annotation Algorithms
Lowest Common Ancestor (LCA) Algorithm
For each read r of the metagenome:
1 Compare r with reference sequences (e.g. with BLASTX)
2 Assign r to the lowest common taxonomic ancestor of the matching species H_i
Example: [Figure: taxonomy tree with the matching species H1 to H12 as leaves and their LCA marked]
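The LCA step above can be sketched in a few lines. This is a minimal illustration, assuming each matching species is given as a root-to-leaf list of taxon names; the function name and the three example lineages are hypothetical and skip several ranks for brevity.

```python
def lowest_common_ancestor(lineages):
    """Return the deepest taxon shared by all root-to-leaf lineages."""
    common = None
    # zip(*lineages) walks the lineages rank by rank, from the root down,
    # and stops at the end of the shortest lineage.
    for taxa_at_rank in zip(*lineages):
        if all(taxon == taxa_at_rank[0] for taxon in taxa_at_rank):
            common = taxa_at_rank[0]
        else:
            break  # lineages diverge here; keep the last shared taxon
    return common


# Hypothetical lineages of the species matched by one read
hits = [
    ["Bacteria", "Proteobacteria", "Gammaproteobacteria", "Escherichia coli"],
    ["Bacteria", "Proteobacteria", "Gammaproteobacteria", "Salmonella enterica"],
    ["Bacteria", "Proteobacteria", "Betaproteobacteria", "Neisseria meningitidis"],
]
print(lowest_common_ancestor(hits))
```

With these hits the read is assigned at the phylum level: the more distant the matching species, the higher the rank of the assignment.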
LCA: Pros and Cons
Pros:
Higher accuracy than BLASTX best hit
Assigning reads to taxa rather than to single species is more realistic (with short reads)
Cons:
Few reads assigned at low ranks
Many unassigned reads
How can we improve it?
MTR: Multiple Taxonomic Rank based clustering
Goal: taxonomic annotation of short metagenomic reads (rank-level)
Assign from the highest rank to the lowest feasible rank
Assignments of reads are dependent on each other
MTR Algorithm scheme: top-down strategy
1 Compare reads R with reference proteins (we used BLASTX and the NCBI-NR database)
2 For each rank j (from the highest to the lowest):
    1 T ← {taxa at rank j of proteins matching R}
    2 Annotate by clustering R in clusters C_i; each C_i corresponds to a taxon t_i ∈ T
    3 Remove from R the reads with incoherent classification (w.r.t. higher-rank classifications)
3 For each rank j (from the lowest to the highest):
    1 Majority vote on clusters' intersections at rank j
    2 Make higher-rank classifications coherent with the majority-vote results
MTR: Annotation via combinatorial optimization
For each rank j: for each taxon t_i of rank j:
Create cluster C_i ⊆ R of reads similar to taxon t_i
Set Covering Problem
Select a collection of clusters (taxa) s.t.
No sequence is left outside
The number of selected clusters is minimal
If C_i is selected, sequences of C_i will be assigned to t_i
Example: [Figure: membership matrix of sequences s1 to s10 (rows) in clusters C1 to C6 (columns), followed by the clustering solution on the same matrix]
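The set covering formulation can be illustrated with the classic greedy heuristic, which repeatedly picks the cluster covering the most still-uncovered sequences. This is only a sketch: the slides do not say which solver MTR uses, the greedy rule is an approximation that does not guarantee a minimal cover, and the cluster memberships below are made up for illustration rather than taken from the slide.

```python
def greedy_set_cover(clusters):
    """Pick clusters until every sequence is covered (greedy approximation).

    clusters: dict mapping a cluster name to the set of sequences it contains.
    Returns the names of the selected clusters, in selection order.
    """
    uncovered = set().union(*clusters.values())
    selected = []
    while uncovered:
        # Cluster covering the most uncovered sequences; ties go to the
        # alphabetically first name so the result is deterministic.
        best = max(sorted(clusters), key=lambda c: len(clusters[c] & uncovered))
        selected.append(best)
        uncovered -= clusters[best]
    return selected


# Hypothetical memberships (not the matrix shown on the slide)
clusters = {
    "C1": {"s1", "s2", "s3", "s4"},
    "C2": {"s3", "s5"},
    "C3": {"s4", "s5", "s6", "s7"},
    "C4": {"s1", "s6"},
    "C5": {"s2", "s7", "s8"},
}
print(greedy_set_cover(clusters))
```

Sequences that fall in more than one selected cluster still need a tie-breaking rule; in the scheme above that role is played by the majority vote on cluster intersections.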
MTR vs LCA: MTR better w.r.t. quantity
MTR annotates more reads than LCA
Simulated data: MTR annotates 8% to 37% more reads
At rank Genus: 28% to 89%
Real-life data: MTR annotates 15% to 30% more reads
At rank Species: 120% to 208%
Experiments: 12 simulated datasets and 3 real-life datasets (100 bp reads)
MTR vs LCA: LCA better accuracy
Accuracy (%) and number of reads assigned (for each rank)

Rank      MTR accuracy (# of reads)   LCA accuracy (# of reads)
Kingdom   100.00 (166,948)            99.99 (155,263)
Phylum    99.86 (166,948)             99.93 (155,258)
Class     99.73 (166,936)             99.81 (141,829)
Order     97.67 (166,148)             98.14 (115,732)
Family    97.62 (165,231)             98.04 (110,488)
Genus     97.42 (140,476)             98.35 (110,139)
Table: Data name: M3, Coverage 4X, Tot reads: 166,978

Rank      MTR accuracy (# of reads)   LCA accuracy (# of reads)
Kingdom   95.07 (88,537)              94.66 (73,176)
Phylum    93.21 (88,537)              92.57 (73,169)
Class     89.25 (87,635)              88.98 (60,294)
Order     89.24 (85,657)              88.44 (57,373)
Family    77.35 (81,366)              81.84 (48,760)
Genus     61.36 (77,307)              74.60 (40,823)
Table: Data name: M2, Coverage 1X, Tot reads: 288,730
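The read counts in the two tables can be turned into the "more reads" figures by a one-line calculation. The sketch below uses the Genus rows of M3 and M2; the helper name `percent_more` is an assumption introduced here.

```python
def percent_more(mtr_reads, lca_reads):
    """Relative gain of MTR over LCA in assigned reads, as a percentage."""
    return 100.0 * (mtr_reads - lca_reads) / lca_reads


# Genus-level read counts (MTR, LCA) taken from the two tables
genus_counts = {
    "M3 (4X)": (140476, 110139),
    "M2 (1X)": (77307, 40823),
}
for dataset, (mtr, lca) in genus_counts.items():
    print(f"{dataset}: MTR assigns {percent_more(mtr, lca):.1f}% more reads at rank Genus")
```

The two values come out at roughly 28% and 89%, which matches the range quoted for rank Genus on simulated data.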
MTR vs LCA: MTR better population distribution
Population distributions (rank Genus) of M2, coverage 0.1X
Population distributions (rank Genus) of Coral dataset
[Figure: two pie charts, MTR and LCA, with the number of reads per genus and its share]

Genus             MTR reads (share)   LCA reads (share)
Acinetobacter     1031 (9.03%)        944 (20.15%)
Aspergillus       279 (2.44%)         80 (1.71%)
Gibberella        3492 (30.57%)       1804 (38.51%)
Neurospora        133 (1.16%)         76 (1.62%)
Podospora         80 (0.70%)          76 (1.62%)
Chaetomium        146 (1.28%)         105 (2.24%)
T4-like viruses   57 (0.50%)          57 (1.22%)
Porites           4540 (39.75%)       643 (13.72%)
Phaeosphaeria     90 (0.79%)          51 (1.09%)
Magnaporthe       313 (2.74%)         169 (3.61%)
Nitrosopumilus    128 (1.12%)         76 (1.62%)
Others            1133 (9.92%)        604 (12.89%)
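The shares in the pie charts are simply each genus count divided by the total of assigned reads. The sketch below recomputes them; it assumes the counts pair with the genus labels in the order listed, and that the fused label 14657 in the MTR chart stands for 146 and 57 (an assumption that the printed percentages support).

```python
GENERA = [
    "Acinetobacter", "Aspergillus", "Gibberella", "Neurospora", "Podospora",
    "Chaetomium", "T4-like viruses", "Porites", "Phaeosphaeria",
    "Magnaporthe", "Nitrosopumilus", "Others",
]
MTR = [1031, 279, 3492, 133, 80, 146, 57, 4540, 90, 313, 128, 1133]
LCA = [944, 80, 1804, 76, 76, 105, 57, 643, 51, 169, 76, 604]


def shares(counts):
    """Convert read counts into percentages of the total."""
    total = sum(counts)
    return [100.0 * count / total for count in counts]


for genus, mtr_share, lca_share in zip(GENERA, shares(MTR), shares(LCA)):
    print(f"{genus:16s} MTR {mtr_share:6.2f}%   LCA {lca_share:6.2f}%")
```

The largest shift is for Porites, the coral host genus: about 40% of the reads assigned by MTR against about 14% of those assigned by LCA.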
Conclusions
MTR outperforms LCA in two ways:
More sequences annotated
especially at low ranks
Better estimate of
population distribution
LCA tends to be more accurate
Future Developments
Replace BLASTX with a composition-based similarity measure
Additional constraints on cluster selection, e.g. consistent coverage depth on proteins or constraints on genome location coverage
3 Metagenomics to Retrieve a Bacterium
Metagenomic sequencing to acquire an organism
Candidatus Brocadia fulgida
The Brocadia genome had not been previously sequenced
Sequencing platforms (mean read length):
Sanger Shotgun (800 bp)
Sanger Fosmid (800 bp)
454 GS20 (200 bp)
First standard annotation:
Reads are assigned to their BLASTX best hit
Reads are assigned to Brocadia if the best hit is Kuenenia (Kuenenia is a close relative of Brocadia)
Why do FISH analysis and BLASTX annotation not agree?
80% of the cells are Brocadia, but...
Brocadia seems underrepresented
Are we sure?
Can we still extract significant information?

                 Shotgun   Fosmid    454
Brocadia reads   9.68%     13.76%    12.92%
Brocadia bp      9.76%     14.33%    11.34%

Let's do some composition-based analyses...
Different point of view: GC content
[Bernaola-Galvan et al., Gene, 2004]
Different organisms can have different GC content (16.6% - 74.9%)
If a genome is partitioned into equally sized, non-overlapping sequences:
GC content has an approximately normal distribution
The distribution is centered on the organism's GC content
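The windowed GC content described above is straightforward to compute. A minimal sketch, assuming a plain string of A/C/G/T; the function names, the toy sequence, and the window size are illustrative only (real analyses would use reads or much longer windows).

```python
def gc_content(seq):
    """Fraction of G and C bases in a sequence (0.0 for an empty one)."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return sum(1 for base in seq if base in "GC") / len(seq)


def window_gc(genome, size):
    """GC content of equally sized, non-overlapping windows.

    A trailing window shorter than `size` is dropped.
    """
    return [
        gc_content(genome[start:start + size])
        for start in range(0, len(genome) - size + 1, size)
    ]


toy_genome = "ATGCGCGCATATATGCGGCC"  # made-up 20 bp sequence
print(window_gc(toy_genome, 5))
print(gc_content(toy_genome))
```

A histogram of the per-window values is what lets reads from organisms with different GC content show up as separate peaks.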
Bias toward high GC-content organisms
[Figure: three histograms of read GC content (x-axis: GC-content, 0% to 100%; y-axis: Frequency), one each for the 454, Fosmid, and Shotgun data; series: Raw, Annotated, Brocadia, Alphaproteobacteria, Betaproteobacteria]
We saw that Brocadia is underrepresented...
How can we cope with that?
Sets of well-recovered Kuenenia ORFs differ
[Figure: extended Venn diagram of Brocadia Open Reading Frames retrieved for 80% of their length; sets by technology: Shotgun (Sanger), Fosmid (Sanger), 454]
Depth of coverage: correlation on the same ORF
[Figure: bars from 0% to 100% for the technology pairs Shotgun/Fosmid, Shotgun/454, and Fosmid/454; legend: correlation intervals (-1, -0.7), (-0.7, -0.3), (-0.3, 0), (0, 0.3), (0.3, 0.7), (0.7, 1), ranging from different to similar]
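One way to obtain such a comparison is to correlate, ORF by ORF, the depth-of-coverage profiles produced by two technologies and then count how many ORFs fall in each correlation interval. The sketch below assumes per-position depth vectors and Pearson correlation; both are assumptions, since the slide does not state the exact measure, and the depth values are invented.

```python
from math import sqrt

# Correlation intervals as listed in the figure legend
INTERVALS = [(-1.0, -0.7), (-0.7, -0.3), (-0.3, 0.0),
             (0.0, 0.3), (0.3, 0.7), (0.7, 1.0)]


def pearson(x, y):
    """Pearson correlation of two equally long depth profiles.

    Returns None when one profile is constant (correlation undefined).
    """
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return None
    # Clamp to [-1, 1] to absorb floating-point rounding
    return max(-1.0, min(1.0, cov / sqrt(var_x * var_y)))


def interval_of(r):
    """Interval containing r; a boundary value goes to the lower interval."""
    for low, high in INTERVALS:
        if low <= r <= high:
            return (low, high)
    return None


# Invented depth profiles along one ORF
shotgun = [3, 4, 5, 6, 5, 4]
fosmid = [1, 2, 3, 4, 3, 2]
reads_454 = [6, 5, 4, 3, 4, 5]
print(interval_of(pearson(shotgun, fosmid)))
print(interval_of(pearson(shotgun, reads_454)))
```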
Future Developments
Tuning sequencing for the specific community
Integration of composition-based analysis and BLASTX annotation
4 Comparative Genomics for Antibiotic Resistance
The antibiotic alarm, Nature, 14 March 2013
Rise of
resistance
(inevitable)
Decline of
development
(economics)
Waiting for new drugs... How can we cope with it?
Multi-drug
treatments
New therapies
(dosage, duration)
Personalised medicine
(e.g. infecting strain,
patient PK/PD,
patient genotype)
Idea: Drug Switching
Experiments:
Treatments:
Sequential: switch drug
50%-50%: cocktail
Control: no drugs
Protocol:
For each season, bacteria grow in liquid medium with drug
1% of the bacteria are transferred to the next season
3 replicates
Duration: 96 hours (8 seasons of 12 hours)
Drugs: Doxycycline, Erythromycin
Sequencing: after 24h and 96h
18 datasets (red border)
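The numbers in the protocol can be cross-checked by enumerating the design. The sketch below is one reading that is consistent with the 18 datasets: three treatments, three replicates, and two sequencing time points. The treatment labels are shorthand introduced here.

```python
from itertools import product

treatments = ["Sequential", "50%-50%", "Control"]
replicates = [1, 2, 3]
timepoints_h = [24, 96]

# One sequenced dataset per (treatment, replicate, time point) combination
datasets = list(product(treatments, replicates, timepoints_h))

seasons, season_hours = 8, 12

print(len(datasets))           # number of sequenced datasets
print(seasons * season_hours)  # total duration in hours
```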
My Role
Construct annotated reference genome [custom pipeline]
For each replicate, identify:
Structural Variations (SVs)
[Pindel]
Copy Number Variations (CNVs)
[CNVnator]
Single Nucleotide Polymorphisms (SNPs)
[VarScan]
Results CNV: 412kb duplicated region at 96 hours
[Figure: normalised coverage (1000 bins, reference line "Mean +1") along the genome (0 to 4.5 Mb) at 96 hours for Control, ERY/DOX, and 50-50; labelled genes: ampE, rrnH, paoC, tauA, ybbJ, mdtG, atoB, rrnG, yqhC, rng, rrnD, rrnC, ubiD, rrnA, rhaM, rrnB, rrnE, slt]
Efflux-pump duplications (!)
This region includes the multidrug efflux pump AcrRAB-TolC
[Peña-Miller et al, PLOS Biology, 2013]
[Figure: coverage ratio inside/outside the duplicated region (y-axis ticks: 1, 2) at 24 and 96 hours, one panel per treatment]

Treatment   p-value (t-test)
Dox/Ery     p < 0.0001
50%-50%     p < 0.0001
Control     p = 0.303
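The coverage ratio plotted for each treatment can be computed from binned coverage as the mean depth inside the candidate region divided by the mean depth outside it. A minimal sketch with invented numbers; the bin indices and the function name are assumptions, and the t-test itself is not reproduced here.

```python
def coverage_ratio(bin_coverage, region_start, region_end):
    """Mean coverage inside bins [region_start, region_end) over mean outside."""
    inside = bin_coverage[region_start:region_end]
    outside = bin_coverage[:region_start] + bin_coverage[region_end:]
    mean_inside = sum(inside) / len(inside)
    mean_outside = sum(outside) / len(outside)
    return mean_inside / mean_outside


# Invented example: 10 bins at depth 100, with bins 3 to 5 at doubled depth
coverage = [100.0] * 10
coverage[3:6] = [200.0] * 3
print(coverage_ratio(coverage, 3, 6))
```

A ratio near 1 means the region is covered like the rest of the genome, while a ratio near 2 is what a duplication carried by most of the population would produce.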
Conclusion
Sequential treatments work well in vitro when cocktails fail
Genomics: antibiotics prevent mutations
Further developments (omics):
Phage role in region duplication
Timing of region duplication
NGS of additional treatments
Transcriptomics
5 Appendix: construction of a tailored reference
Standard approach: de novo assembly & annotation
Solve the jigsaw puzzle
Functional annotation
Done with software and manual work
Problems (common)
Errors:
Repetitive regions misassembled
Wrong order/orientation
Annotation quality
Fragmentation
Quality depends on time & money
2014: Automated genome assembly for less than $1,000
[Koren & Phillippy, Curr Opin Microbiol, 2015]
The alternative: tailored reference
Take the reference genome
of a close relative
Modify it according to
sequencing data
Import annotation from
reference
Pros
Less fragmentation
Higher quality
Better annotation
Cons
You need a close relative
Visually check steps
Ad hoc scripting
Conservative approach
Our case
Sequenced organism:
E. coli K-12 AG100 growing 24h in M9 medium
Reference genome:
E. coli K-12 MG1655 (available online)
Data (preprocessed):
Reads mapping to reference MG1655: 95.84%
Mean coverage depth: 88.19x (based on MG1655)
Read min/max/mean length (bp): 15 / 99 / 72.17
The pipeline
Read preprocessing (standard)
Mapping to reference MG1655 (standard)
Call Structural Variations (SVs)
Assemble unmapped and mapped data
Make intermediate reference
Check SVs and call SNPs
Functional annotation
Clean & align reads
Reads preprocessing [fastq-mcf, samtools]
Mapping to reference [BWA, IGV]
Structural Variations (SVs)
Use Pindel to call SVs
Deletions, Insertions,
Inversions, Translocations
Indels
Break points
Visually checked [IGV]:
Deletions: 5 (total 47kbp)
Indels: 9
Break points: 9
Applying SVs and assembling unmapped reads
Take the close relative genome
Break it into sequences by applying the SVs
Extract reads around removed regions
Extract reads not mapped to reference
Assemble the union (∪) of the two extracted read sets
Scaffold the union (∪) of the resulting sequences
[Python & Bash scripting, Samtools, Velvet, SSPACE, GapFiller]
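The step of breaking the reference into sequences can be sketched for the simplest case, deletions only. This is an illustration, not the scripts used for the talk: the function name is hypothetical, coordinates are assumed to be 0-based and half-open, and other SV types would need their own handling.

```python
def break_reference(sequence, deletions):
    """Split a reference sequence at deleted regions.

    deletions: list of (start, end) pairs, 0-based, end exclusive.
    Returns the remaining pieces in genome order.
    """
    pieces, cursor = [], 0
    for start, end in sorted(deletions):
        if start < cursor:
            raise ValueError("overlapping deletions")
        if start > cursor:
            pieces.append(sequence[cursor:start])
        cursor = end  # skip the deleted region
    if cursor < len(sequence):
        pieces.append(sequence[cursor:])
    return pieces


toy_reference = "AAAACCCCGGGGTTTT"  # made-up 16 bp reference
print(break_reference(toy_reference, [(4, 8)]))
```

The pieces are later joined again through the contigs assembled from the reads around the removed regions and the unmapped reads.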
Making intermediate and tailored references
Making intermediate reference
Order scaffolds w.r.t. reference [Mauve]
Concatenate the 13 aligned scaffolds [Bash one-liner]
Making tailored reference
Look for SVs (none should be present)
Call SNPs [VarScan, vcftools]
Annotation
Export annotation from reference [RATT]
Adjust and annotate missing parts [RAST, manual editing]
Make file ENA compatible [Python script]
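The concatenation step was done with a Bash one-liner in the original work; an equivalent can be sketched in Python. It assumes the scaffolds are already ordered in a multi-FASTA text; the record names and sequences below are invented.

```python
def read_fasta(text):
    """Parse multi-FASTA text into a list of (name, sequence) tuples."""
    records, name, chunks = [], None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                records.append((name, "".join(chunks)))
            name, chunks = line[1:], []
        else:
            chunks.append(line)
    if name is not None:
        records.append((name, "".join(chunks)))
    return records


def concatenate(records, new_name):
    """Join the sequences, in the given order, into a single record."""
    return new_name, "".join(sequence for _, sequence in records)


# Invented, already ordered scaffolds
ordered_scaffolds = """>scaffold_1
ACGTAC
GT
>scaffold_2
TTGGCC
"""
name, sequence = concatenate(read_fasta(ordered_scaffolds), "intermediate_reference")
print(f">{name}\n{sequence}")
```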
In my experience, people do not look at assemblies critically enough [Nature Methods, 2012]
Clean results need designed protocols, time, and money
Great progress has been made recently, but the sequencing cost is still not very low
[Nature Methods, June 2013; Koren & Phillippy, Curr Opin Microbiol, 2015]
Combining technologies improved Kuenenia ORFs retrieval
[Figure: number of ORFs (y-axis, 0 to 2000) against threshold of mapping percentage (x-axis, 0% to 100%); series: Shotgun, Fosmid, 454, Shotgun + Fosmid, Shotgun + 454, Fosmid + 454, All]
SNPs & ribosomal: mutations in the control
Hypothesis: antibiotics slow down adaptation for optimal growth in culture
Heightened ribosomal demand due to rapid growth [Condon et al., J Bacteriol 1995]
Table: % mean variant frequency (replicates, if not all); columns in the original: 50%-50%, Dox/Ery, Control, each at 24h and 96h (the column placement of the values was lost in extraction)

operon   position    relative posn   values
rrnH     226,521     595             5(2)
         227,791     1,865           3(1), 17
rrnG     2,723,624   1,865           3(1), 9
         2,724,894   595             8
rrnD     3,421,431   1,865           4(1), 13
         3,422,701   595             8
rrnC     3,940,810   595             4(1), 17
rrnA     4,034,586   555             7
rrnB     4,165,708   595             4(1), 8
         4,166,978   1,865           10
rrnE     4,207,110   595             3(1), 9
         4,208,380   1,865           5(1), 7

SNPs significantly different in frequency (ANOVA)
Maybe these ribosomal mutations help with α-amino acid starvation, because...
tauA is expressed only under conditions of sulfate or cysteine (α-amino acid) starvation [Eichhorn et al, J Bacteriol, 2000]
yqhC regulates a scavenger of toxic aldehydes produced by lipid peroxidation [Jarboe et al, Appl Microbiol Biotechnol, 2011]
Table: % mean variant frequency (replicates, if not all); columns in the original: 50%-50%, Dox/Ery, Control, each at 24h and 96h (the column placement of the values was lost in extraction)

gene   position    values      annotation
DUPLICATED REGION
tauA   384,897     19(1), 68   taurine transport system
yqhC   3,151,384   45          putative ARAC-type regulatory protein

More Related Content

What's hot

Introduction to 16S Analysis with NGS - BMR Genomics
Introduction to 16S Analysis with NGS - BMR GenomicsIntroduction to 16S Analysis with NGS - BMR Genomics
Introduction to 16S Analysis with NGS - BMR GenomicsAndrea Telatin
 
BM405 Lecture Slides 21/11/2014 University of Strathclyde
BM405 Lecture Slides 21/11/2014 University of StrathclydeBM405 Lecture Slides 21/11/2014 University of Strathclyde
BM405 Lecture Slides 21/11/2014 University of StrathclydeLeighton Pritchard
 
Formal languages to map Genotype to Phenotype in Natural Genomes
Formal languages to map Genotype to Phenotype in Natural GenomesFormal languages to map Genotype to Phenotype in Natural Genomes
Formal languages to map Genotype to Phenotype in Natural Genomesmadalladam
 
DNA Sequencing in Phylogeny
DNA Sequencing in PhylogenyDNA Sequencing in Phylogeny
DNA Sequencing in PhylogenyBikash1489
 
Human genetic variation and its contribution to complex traits
Human genetic variation and its contribution to complex traitsHuman genetic variation and its contribution to complex traits
Human genetic variation and its contribution to complex traitsgroovescience
 
2016 bioinformatics i_wim_vancriekinge_vupload
2016 bioinformatics i_wim_vancriekinge_vupload2016 bioinformatics i_wim_vancriekinge_vupload
2016 bioinformatics i_wim_vancriekinge_vuploadProf. Wim Van Criekinge
 
2015 beacon-metagenome-tutorial
2015 beacon-metagenome-tutorial2015 beacon-metagenome-tutorial
2015 beacon-metagenome-tutorialc.titus.brown
 
Reconstructing paleoenvironments using metagenomics
Reconstructing paleoenvironments using metagenomicsReconstructing paleoenvironments using metagenomics
Reconstructing paleoenvironments using metagenomicsRutger Vos
 
Bioinformatics t9-t10-biocheminformatics v2014
Bioinformatics t9-t10-biocheminformatics v2014Bioinformatics t9-t10-biocheminformatics v2014
Bioinformatics t9-t10-biocheminformatics v2014Prof. Wim Van Criekinge
 
Tools for Metagenomics with 16S/ITS and Whole Genome Shotgun Sequences
Tools for Metagenomics with 16S/ITS and Whole Genome Shotgun SequencesTools for Metagenomics with 16S/ITS and Whole Genome Shotgun Sequences
Tools for Metagenomics with 16S/ITS and Whole Genome Shotgun SequencesSurya Saha
 
Genevestigator
GenevestigatorGenevestigator
GenevestigatorBITS
 
[2013.10.29] albertsen genomics metagenomics
[2013.10.29] albertsen genomics metagenomics[2013.10.29] albertsen genomics metagenomics
[2013.10.29] albertsen genomics metagenomicsMads Albertsen
 
Microbial Metagenomics Drives a New Cyberinfrastructure
Microbial Metagenomics Drives a New CyberinfrastructureMicrobial Metagenomics Drives a New Cyberinfrastructure
Microbial Metagenomics Drives a New CyberinfrastructureLarry Smarr
 
Whole genome taxonomic classi cation for prokaryotic plant pathogens
Whole genome taxonomic classication for prokaryotic plant pathogensWhole genome taxonomic classication for prokaryotic plant pathogens
Whole genome taxonomic classi cation for prokaryotic plant pathogensLeighton Pritchard
 
GLBIO/CCBC Metagenomics Workshop
GLBIO/CCBC Metagenomics WorkshopGLBIO/CCBC Metagenomics Workshop
GLBIO/CCBC Metagenomics WorkshopMorgan Langille
 

What's hot (20)

Introduction to 16S Analysis with NGS - BMR Genomics
Introduction to 16S Analysis with NGS - BMR GenomicsIntroduction to 16S Analysis with NGS - BMR Genomics
Introduction to 16S Analysis with NGS - BMR Genomics
 
BM405 Lecture Slides 21/11/2014 University of Strathclyde
BM405 Lecture Slides 21/11/2014 University of StrathclydeBM405 Lecture Slides 21/11/2014 University of Strathclyde
BM405 Lecture Slides 21/11/2014 University of Strathclyde
 
Formal languages to map Genotype to Phenotype in Natural Genomes
Formal languages to map Genotype to Phenotype in Natural GenomesFormal languages to map Genotype to Phenotype in Natural Genomes
Formal languages to map Genotype to Phenotype in Natural Genomes
 
DNA Sequencing in Phylogeny
DNA Sequencing in PhylogenyDNA Sequencing in Phylogeny
DNA Sequencing in Phylogeny
 
Machine Learning
Machine LearningMachine Learning
Machine Learning
 
Pathogen Genome Data
Pathogen Genome DataPathogen Genome Data
Pathogen Genome Data
 
Human genetic variation and its contribution to complex traits
Human genetic variation and its contribution to complex traitsHuman genetic variation and its contribution to complex traits
Human genetic variation and its contribution to complex traits
 
2016 bioinformatics i_wim_vancriekinge_vupload
Metagenomic Data Analysis and Microbial Genomics

  • 1. Metagenomic Data Analysis and Microbial Genomics. Fabio Gori, Intelligent Systems, Institute for Computing and Information Sciences, in collaboration with the Department of Microbiology, Radboud University Nijmegen, The Netherlands. 22nd May 2015.
  • 2. Table of Contents: 1. Introduction to Metagenomics; 2. Taxonomic-annotation Algorithms; 3. Metagenomics to Retrieve a Bacterium; 4. Comparative Genomics for Antibiotic Resistance; 5. Appendix: construction of a tailored reference.
  • 4. What is Metagenomics? Metagenomics: the study of genomic information obtained directly from microbial communities. Why? 99% of microbes cannot be sequenced; understand interactions between organisms; human microbiota.
  • 8. What kind of data? A meta... jigsaw puzzle: reads of multiple microbes; original pictures are unknown; pieces are similar; biased abundance of pieces.
  • 9. Annotation: discovering the original pictures of the puzzles. Assign each read to an organism or to a taxonomic identifier.
  • 10. Taxonomy: a biological classification. Linnaean taxonomy: a formal system for classifying and naming living things; based on a simple hierarchical structure; similar elements are grouped together. Rank: level in the hierarchy (left, in the slide's figure). Taxon: unit of the hierarchy (group of similar living things).
  • 11. Table of Contents: 1. Introduction to Metagenomics; 2. Taxonomic-annotation Algorithms; 3. Metagenomics to Retrieve a Bacterium; 4. Comparative Genomics for Antibiotic Resistance; 5. Appendix: construction of a tailored reference.
  • 12. Lowest Common Ancestor (LCA) Algorithm. For each read r of the metagenome: 1. Compare r with reference sequences (e.g. with BLASTX); 2. Assign r to the lowest common taxonomic ancestor of the matching species Hi. Example (figure): taxonomy tree with the matching species H1 to H12 and the node labelled LCA.
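The second step of the LCA algorithm above can be sketched in a few lines. This is a minimal illustration, not the implementation used in the talk: the `parent` dictionary is a hypothetical toy taxonomy with most ranks omitted, the function names are invented for this example, and the BLASTX comparison is assumed to have already produced the list of matching species for a read.

```python
# Toy taxonomy as child -> parent links (an assumption for illustration;
# a real run would load a full taxonomy, and most ranks are omitted here).
parent = {
    "Escherichia coli": "Escherichia",
    "Escherichia": "Enterobacteriaceae",
    "Salmonella enterica": "Salmonella",
    "Salmonella": "Enterobacteriaceae",
    "Enterobacteriaceae": "Bacteria",
    "Bacillus subtilis": "Bacillus",
    "Bacillus": "Bacteria",
}


def lineage(taxon, parent):
    """Return the path from the root down to `taxon`."""
    path = [taxon]
    while taxon in parent:
        taxon = parent[taxon]
        path.append(taxon)
    return path[::-1]


def lowest_common_ancestor(hits, parent):
    """Deepest taxon shared by the lineages of all hits (None if no hits)."""
    if not hits:
        return None
    paths = [lineage(h, parent) for h in hits]
    lca = None
    # Walk the lineages in parallel from the root until they diverge.
    for nodes in zip(*paths):
        if len(set(nodes)) != 1:
            break
        lca = nodes[0]
    return lca


# A read matching two species of the same family is assigned to that family.
print(lowest_common_ancestor(["Escherichia coli", "Salmonella enterica"], parent))
```

A read whose hits span distant taxa ends up at a high rank, which is consistent with the "few reads at low ranks" drawback listed for LCA.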
  • 13. LCA: Pros and Cons. Pros: higher accuracy than the BLASTX best hit; assigning to taxa is more realistic (with short reads). Cons: few reads at low ranks; many unassigned reads. How can we improve it?
  • 15. MTR: Multiple Taxonomic Rank based clustering. Goal: taxonomic annotation of short metagenomic reads (rank-level). Assign from the highest rank to the lowest feasible rank. Assignments of reads are dependent on each other.
  • 16. MTR Algorithm scheme: top-down strategy. 1. Compare reads R with reference proteins (we used BLASTX and the NCBI-NR database). 2. For each rank j (from the highest to the lowest): 2.1 T ← {taxa at rank j of proteins matching R}; 2.2 annotate by clustering R into clusters Ci, where each Ci corresponds to a taxon ti ∈ T; 2.3 remove from R the reads with incoherent classification (w.r.t. higher-rank classifications). 3. For each rank j (from the lowest to the highest): 3.1 Majority Vote on clusters' intersections at rank j; 3.2 make higher-rank classifications coherent with the Majority Vote results.
  • 19. MTR: Annotation via combinatorial optimization. For each rank j: for each taxon ti of rank j, create a cluster Ci ⊆ R of reads similar to taxon ti. Set Covering Problem: select a collection of clusters (taxa) such that no sequence is left outside and the number of selected clusters is minimal. If Ci is selected, the sequences of Ci will be assigned to ti. Example (figure): membership matrix of sequences s1 to s10 against clusters C1 to C6, followed by the clustering solution selected from it.
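The set covering step above can be illustrated with the classic greedy heuristic. This is only a sketch under stated assumptions: the slides do not say how MTR solves the covering problem, the greedy rule gives an approximate rather than a guaranteed minimal cover, and the cluster contents and the name `greedy_set_cover` are made up for this example.

```python
def greedy_set_cover(clusters):
    """Select taxa whose clusters together cover every read.

    clusters: dict mapping a taxon name to the set of read ids similar to it.
    Returns the selected taxa in the order they were picked.
    """
    uncovered = set().union(*clusters.values())
    selected = []
    while uncovered:
        # Pick the cluster covering the most still-uncovered reads;
        # ties are broken alphabetically by taxon name.
        best = max(sorted(clusters), key=lambda t: len(clusters[t] & uncovered))
        selected.append(best)
        uncovered -= clusters[best]
    return selected


# Hypothetical clusters at one rank (reads s1..s6, taxa t1..t4).
clusters = {
    "t1": {"s1", "s2", "s3"},
    "t2": {"s3", "s4"},
    "t3": {"s4", "s5", "s6"},
    "t4": {"s1", "s6"},
}
print(greedy_set_cover(clusters))
```

In this toy instance two clusters are enough to cover all six reads, so the reads would be assigned to the two corresponding taxa.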
  • 21. MTR vs LCA: MTR better with respect to quantity. MTR annotates more reads than LCA. Simulated data: MTR assigns 8% to 37% more reads; at rank Genus: 28% to 89%. Real-life data: MTR assigns 15% to 30% more reads; at rank Species: 120% to 208%. Experiments: 12 simulated datasets and 3 real-life datasets (100 bp reads).
  • 22. MTR vs LCA: LCA better accuracy. Accuracy and number of reads assigned (for each rank).
      Table: dataset M3, coverage 4X, total reads: 166,978
      Rank      MTR accuracy (# of reads)   LCA accuracy (# of reads)
      Kingdom   100.00 (166,948)            99.99 (155,263)
      Phylum    99.86 (166,948)             99.93 (155,258)
      Class     99.73 (166,936)             99.81 (141,829)
      Order     97.67 (166,148)             98.14 (115,732)
      Family    97.62 (165,231)             98.04 (110,488)
      Genus     97.42 (140,476)             98.35 (110,139)
      Table: dataset M2, coverage 1X, total reads: 288,730
      Rank      MTR accuracy (# of reads)   LCA accuracy (# of reads)
      Kingdom   95.07 (88,537)              94.66 (73,176)
      Phylum    93.21 (88,537)              92.57 (73,169)
      Class     89.25 (87,635)              88.98 (60,294)
      Order     89.24 (85,657)              88.44 (57,373)
      Family    77.35 (81,366)              81.84 (48,760)
      Genus     61.36 (77,307)              74.60 (40,823)
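As a worked check on the trade-off, the relative gain in assigned reads can be computed directly from the Genus rows of the two tables above. The helper name `relative_gain` is introduced here for illustration only; the read counts are the ones reported on the slide.

```python
def relative_gain(mtr_reads, lca_reads):
    """Percentage of extra reads assigned by MTR relative to LCA."""
    return 100.0 * (mtr_reads - lca_reads) / lca_reads


# (MTR reads, LCA reads) at rank Genus, taken from the slide's tables.
genus_counts = {
    "M3 (4X)": (140_476, 110_139),
    "M2 (1X)": (77_307, 40_823),
}

gains = {name: relative_gain(m, l) for name, (m, l) in genus_counts.items()}
for name, gain in gains.items():
    print(f"{name}: MTR assigns {gain:.1f}% more reads than LCA at rank Genus")
```

The two values come out at roughly 27.5% and 89.4%, close to the 28% and 89% quoted for rank Genus on the previous slide, while MTR's Genus accuracy is lower than LCA's in both tables.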
  • 23. MTR vs LCA: MTR better population distribution. Population distributions (rank Genus) of M2, coverage 0.1X (figure).
  • 24. Population distributions (rank Genus) of the Coral dataset (pie charts; number of reads and share per genus).
      Genus             MTR reads (share)   LCA reads (share)
      Acinetobacter     1031 (9.03%)        944 (20.15%)
      Aspergillus       279 (2.44%)         80 (1.71%)
      Gibberella        3492 (30.57%)       1804 (38.51%)
      Neurospora        133 (1.16%)         76 (1.62%)
      Podospora         80 (0.70%)          76 (1.62%)
      Chaetomium        146 (1.28%)         105 (2.24%)
      T4-like viruses   57 (0.50%)          57 (1.22%)
      Porites           4540 (39.75%)       643 (13.72%)
      Phaeosphaeria     90 (0.79%)          51 (1.09%)
      Magnaporthe       313 (2.74%)         169 (3.61%)
      Nitrosopumilus    128 (1.12%)         76 (1.62%)
      Others            1133 (9.92%)        604 (12.89%)
  • 25. Conclusions. MTR outperforms LCA in two ways: more sequences annotated, especially at low ranks; better estimate of the population distribution. LCA tends to be more accurate.
  • 27. Future Developments. Replace BLASTX with a composition-based similarity measure. Additional constraints on cluster selection, e.g. consistent coverage depth on proteins, or constraints on genome location coverage.
  • 28. Table of Contents: 1. Introduction to Metagenomics; 2. Taxonomic-annotation Algorithms; 3. Metagenomics to Retrieve a Bacterium; 4. Comparative Genomics for Antibiotic Resistance; 5. Appendix: construction of a tailored reference.
  • 29. Metagenomic sequencing to acquire an organism: Candidatus Brocadia fulgida. The Brocadia genome had not been previously sequenced. Sequencing platforms (mean read length): Sanger Shotgun (800 bp), Sanger Fosmid (800 bp), 454 GS20 (200 bp). First standard annotation: reads are assigned to the BLASTX best hit; reads are assigned to Brocadia if the best hit is Kuenenia (Kuenenia is a close relative of Brocadia).
  • 33. Why do FISH analysis and BLASTX annotation not agree?
  • 34. 80% of the cells are Brocadia, but... Brocadia seems underrepresented. Are we sure? Can we still extract significant information?
                        Shotgun   Fosmid   454
      Brocadia reads    9.68%     13.76%   12.92%
      Brocadia bp       9.76%     14.33%   11.34%
      Let's do some composition-based analyses...
  • 36. Different point of view: GC content [Bernaola-Galvan et al., Gene, 2004]. Different organisms can have different GC content (16.6% to 74.9%). If a genome is partitioned into equally sized, non-overlapping sequences: GC content is approximately normally distributed; the distribution is centered on the organism's GC content.
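The partitioning described above can be sketched as follows: cut a sequence into equally sized, non-overlapping windows and compute the GC fraction of each. The function names, the window size and the toy sequence are assumptions made for this illustration; the slides do not specify how the GC histograms were produced.

```python
def gc_content(seq):
    """Fraction of G and C among the A, C, G, T bases of `seq`."""
    seq = seq.upper()
    gc = seq.count("G") + seq.count("C")
    acgt = gc + seq.count("A") + seq.count("T")
    # Ambiguous bases such as N are ignored; an empty window yields 0.0.
    return gc / acgt if acgt else 0.0


def windowed_gc(genome, window):
    """GC content of consecutive non-overlapping windows of equal size.

    A trailing fragment shorter than `window` is dropped.
    """
    return [
        gc_content(genome[i:i + window])
        for i in range(0, len(genome) - window + 1, window)
    ]


# Toy sequence: one GC-rich window followed by one AT-rich window.
values = windowed_gc("GGCCAATTGC", 4)
print(values)
```

Applied to reads or contigs of a metagenome, a histogram of such values would show one peak per organism when their GC contents differ enough, which is the idea behind the composition-based analysis on the following slides.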
  • 37. Bias toward high GC-content organisms (figure): histograms of frequency against GC content (0% to 100%), one panel each for 454, Fosmid and Shotgun; series: Raw, Annotated, Brocadia, Alphaproteobacteria, Betaproteobacteria.
  • 38. We saw that Brocadia is underrepresented... How can we cope with that?
  • 39. Sets of well-recovered Kuenenia ORFs differ. Technologies: Shotgun (Sanger), Fosmid (Sanger), 454. Extended Venn diagram of Brocadia Open Reading Frames retrieved for 80% of their length (figure).
  • 40. Depth of coverage: correlation on the same ORF (figure): bar chart with a 0% to 100% axis over the correlation bins [−1, −0.7], [−0.7, −0.3], [−0.3, 0], [0, 0.3], [0.3, 0.7], [0.7, 1], for the platform pairs Shotgun/Fosmid, Shotgun/454 and Fosmid/454; annotations: Similar, Different.
  • 41. Metagenomics Annotation Algorithms Metagenomics to acquire a single genome Antibiotic Resistance Tailored refere Future Developments Tuning sequencing for the specic community Integration of composition-based analysis and BLASTX annotation
  • 44. The antibiotic alarm (Nature, 14 March 2013). Rise of resistance (inevitable). Decline of development (economics).
  • 45. Waiting for new drugs... How can we cope with it? Multi-drug treatments; new therapies (dosage, duration); personalised medicine (e.g. infecting strain, patient PK/PD, patient genotype).
  • 46. Idea: Drug Switching. Experiments. Treatments: Sequential (switch drug); 50%-50% cocktail; Control (no drugs). Protocol: in each season bacteria grow in liquid medium with drug; 1% bacteria transfer; 3 replicates. Duration: 96 hours (8 seasons of 12 hours). Drugs: Doxycycline, Erythromycin. Sequencing: after 24h and 96h; 18 datasets (red border).
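The numbers on this slide can be cross-checked with a short enumeration. This is only one reading of the design: it assumes that the 18 sequenced datasets are the three treatments times three replicates times the two sequencing time points, which is consistent with the slide but not stated on it. The treatment labels are shorthand chosen for the example.

```python
from itertools import product

# Labels are shorthand for the three treatments named on the slide.
treatments = ["sequential", "cocktail 50%-50%", "control"]
replicates = [1, 2, 3]
timepoints_h = [24, 96]

# Assumed layout: one sequenced dataset per (treatment, replicate, time point).
datasets = list(product(treatments, replicates, timepoints_h))

seasons = 8
season_hours = 12
total_hours = seasons * season_hours

# Season (1-based) that ends at each sequencing time point.
season_at_timepoint = {t: t // season_hours for t in timepoints_h}

print(len(datasets), total_hours, season_at_timepoint)
```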
  • 47. My Role. Construct annotated reference genome [custom pipeline]. For each replicate, identify: Structural Variations (SVs) [Pindel]; Copy Number Variations (CNVs) [CNVnator]; Single Nucleotide Polymorphisms (SNPs) [VarScan].
  • 48. Results. CNV: 412kb duplicated region at 96 hours. [Figure: normalised coverage (1000 bins) along the genome (0 to 4.5 Mb) at 96 hours for Control, ERY/DOX and 50−50, with the level Mean +1 marked; labelled genes: ampE, rrnH, paoC, tauA, ybbJ, mdtG, atoB, rrnG, yqhC, rng, rrnD, rrnC, ubiD, rrnA, rhaM, rrnB, rrnE, slt] Efflux-pump duplications (!): this region includes the multidrug efflux pump AcrRAB-TolC [Peña-Miller et al, PLOS Biology, 2013]. [Figure: coverage ratio inside/outside the duplicated region at 24 and 96 hours; Dox/Ery: p < 0.0001; 50%-50%: p < 0.0001; Control: p = 0.303 (p-value is for t-test)]
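The inside/outside coverage ratio plotted on this slide can be illustrated with a small sketch. It assumes per-bin mean depths are already available and that the duplicated region is given as a half-open range of bin indices; the function name and the toy depths are invented, and the t-test itself is not reproduced here.

```python
def coverage_ratio(bin_depths, start_bin, end_bin):
    """Mean depth inside [start_bin, end_bin) divided by mean depth outside."""
    inside = bin_depths[start_bin:end_bin]
    outside = bin_depths[:start_bin] + bin_depths[end_bin:]
    if not inside or not outside:
        raise ValueError("region must be non-empty and smaller than the genome")
    mean_outside = sum(outside) / len(outside)
    if mean_outside == 0:
        raise ValueError("no coverage outside the region")
    return (sum(inside) / len(inside)) / mean_outside


# Invented depths for 10 bins; bins 3 and 4 mimic a duplicated region.
toy_depths = [100.0] * 10
toy_depths[3:5] = [200.0, 200.0]

ratio = coverage_ratio(toy_depths, 3, 5)
print(ratio)
```

A ratio close to 1 indicates no copy-number change, while a ratio close to 2 is what a single duplication present in the whole population would produce.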
  • 49. Conclusion. Sequential treatments work well in vitro when cocktails fail. Genomics: antibiotics prevent mutations. Further developments (omics): phage role in region duplication; timing of region duplication; NGS of additional treatments; transcriptomics.
  • 51. Standard approach: de novo assembly and annotation. Solve the jigsaw puzzle; functional annotation; done with software and manual work. Problems (common): errors (repetitive regions misassembled, wrong order/orientation); annotation quality; fragmentation; quality depends on time and money. 2014: automated genome assembly for less than $1,000 [Koren & Phillippy, Curr Opin Microbiol, 2015].
  • 52. The alternative: tailored reference. Take the reference genome of a close relative; modify it according to sequencing data; import annotation from reference. Pros: less fragmentation; higher quality; better annotation. Cons: you need a close relative; visually check steps; ad hoc scripting; conservative approach.
  • 53. Our case. Sequenced organism: E. coli K-12 AG100 growing 24h in M9 medium. Reference genome: E. coli K-12 MG1655 (available online). Data (preprocessed): reads mapping to reference MG1655: 95.84%; mean coverage depth: 88.19x (based on MG1655); read min/max/mean length (bp): 15 / 99 / 72.17.
  • 54. The pipeline: read preprocessing (standard); mapping to reference MG1655 (standard); call Structural Variations (SVs); assemble unmapped and mapped data; make intermediate reference; check SVs and call SNPs; functional annotation.
  • 55. Clean and align reads. Reads preprocessing [fastq-mcf, samtools]. Mapping to reference [BWA, IGV].
  • 57. Structural Variations (SVs). Use Pindel to call SVs: deletions, insertions, inversions, translocations; indels; break points. Visually checked [IGV]: deletions: 5 (total 47kbp); indels: 9; break points: 9.
  • 59. Applying SVs and assembling unmapped reads. Take the close relative genome; break it into sequences by applying SVs; extract reads around removed regions; extract reads not mapped to reference. Assemble ∪ −→ Scaffold ∪ [Python and Bash scripting, Samtools, Velvet, SSPACE, GapFiller]
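As an illustration of the step "extract reads not mapped to reference", the sketch below filters SAM records on the unmapped bit of the FLAG field (0x4 in the SAM specification). It is a plain-Python stand-in written for this example, not the scripts used in the talk, where Samtools is listed for this kind of work; the function name and the toy records are invented.

```python
UNMAPPED = 0x4  # SAM FLAG bit: segment unmapped


def unmapped_read_names(sam_lines):
    """Names of reads whose FLAG has the unmapped bit set.

    `sam_lines` is any iterable of SAM text lines; header lines
    (starting with '@') and blank lines are skipped.
    """
    names = []
    for line in sam_lines:
        if not line.strip() or line.startswith("@"):
            continue
        fields = line.rstrip("\n").split("\t")
        flag = int(fields[1])  # second column of a SAM record is FLAG
        if flag & UNMAPPED:
            names.append(fields[0])
    return names


# Invented SAM records: read2 is unmapped (FLAG 4), the others are mapped.
toy_sam = [
    "@SQ\tSN:ref\tLN:1000",
    "read1\t0\tref\t10\t60\t4M\t*\t0\t0\tACGT\tIIII",
    "read2\t4\t*\t0\t0\t*\t*\t0\t0\tTTTT\tIIII",
    "read3\t16\tref\t50\t60\t4M\t*\t0\t0\tGGCC\tIIII",
]
print(unmapped_read_names(toy_sam))
```

The selected reads would then be passed to the assembler together with the reads extracted around the removed regions.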
  • 61. Making intermediate and tailored references. Making intermediate reference: order scaffolds w.r.t. reference [Mauve]; concatenate the 13 aligned scaffolds [Bash one-liner]. Making tailored reference: look for SVs (none should be present); call SNPs [VarScan, vcftools]. Annotation: export annotation from reference [RATT]; adjust and annotate missing parts [RAST, manual editing]; make file ENA compatible [Python script].
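The concatenation step was done with a Bash one-liner in the talk; the Python sketch below shows one way the same idea could look. It assumes the scaffolds are already ordered and oriented, and that they are joined directly with no spacer between them, which the slide does not specify. The function names, the record name and the toy FASTA text are invented.

```python
def parse_fasta(text):
    """Parse FASTA text into a list of (name, sequence) pairs, in file order."""
    records = []
    name = None
    chunks = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                records.append((name, "".join(chunks)))
            name = line[1:].strip()
            chunks = []
        elif name is None:
            raise ValueError("sequence data before the first FASTA header")
        else:
            chunks.append(line)
    if name is not None:
        records.append((name, "".join(chunks)))
    return records


def concatenate(records, new_name, width=60):
    """Join sequences in the given order into a single FASTA record.

    No spacer is inserted between scaffolds (an assumption of this sketch).
    """
    sequence = "".join(seq for _, seq in records)
    lines = [sequence[i:i + width] for i in range(0, len(sequence), width)]
    return ">" + new_name + "\n" + "\n".join(lines) + "\n"


# Invented two-scaffold input, already in reference order.
toy_fasta = ">scaffold1\nACGT\nAC\n>scaffold2\nGG\n"
merged = concatenate(parse_fasta(toy_fasta), "intermediate_reference", width=4)
print(merged)
```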
  • 62. In my experience, people do not look at assemblies critically enough [Nature Methods, 2012]. Clean results need designed protocols, time, and money. Leaps forward have been made recently, but the sequencing cost is still not very low [Nature Methods, June 2013; Koren & Phillippy, Curr Opin Microbiol, 2015].
  • 63. Combining technologies improved Kuenenia ORFs retrieval. [Figure: number of ORFs (0 to 2000) against threshold of mapping percentage (0% to 100%); series: Shotgun, Fosmid, 454, Shotgun + Fosmid, Shotgun + 454, Fosmid + 454, All]
  • 64. SNPs ribosomal: mutations in the control. Hypothesis: antibiotics slow down adaptation for optimal growth in culture. Heightened ribosomal demand due to rapid growth [Condon et al., J Bacteriol 1995]. [Table: % mean variant frequency (replicates, if not all) for 50%-50%, Dox/Ery and Control at 24h and 96h, by operon, position and relative position. Non-empty entries per row: rrnH, 226,521 (relative 595): 5(2); rrnH, 227,791 (relative 1,865): 3(1), 17; rrnG, 2,723,624 (relative 1,865): 3(1), 9; rrnG, 2,724,894 (relative 595): 8; rrnD, 3,421,431 (relative 1,865): 4(1), 13; rrnD, 3,422,701 (relative 595): 8; rrnC, 3,940,810 (relative 595): 4(1), 17; rrnA, 4,034,586 (relative 555): 7; rrnB, 4,165,708 (relative 595): 4(1), 8; rrnB, 4,166,978 (relative 1,865): 10; rrnE, 4,207,110 (relative 595): 3(1), 9; rrnE, 4,208,380 (relative 1,865): 5(1), 7] SNPs significantly different in frequency (ANOVA). Maybe these ribosomal mutations help with α-amino acid starvation, because...
  • 65. tauA is expressed only under conditions of sulfate or cysteine (α-amino acid) starvation [Eichhorn et al, J Bacteriol, 2000]. yqhC regulates a scavenger of toxic aldehydes produced by lipid peroxidation [Jarobe et al, Appl Microbiol Biotechnol, 2011]. [Table: % mean variant frequency (replicates, if not all) for 50%-50%, Dox/Ery and Control at 24h and 96h, by gene, position and annotation. Non-empty entries per row: DUPLICATED REGION; tauA, 384,897: 19(1), 68; taurine transport system; yqhC, 3,151,384: 45; putative ARAC-type regulatory protein]