Metagenomics Annotation Algorithms Metagenomics to acquire a single genome Antibiotic Resistance Tailored refere
Metagenomic Data Analysis and Microbial Genomics
Fabio Gori
Intelligent Systems, Institute for Computing and Information Sciences
in collaboration with
Department of Microbiology
Radboud University Nijmegen
The Netherlands
22nd May 2015
Table of Contents
1 Introduction to Metagenomics
2 Taxonomic-annotation Algorithms
3 Metagenomics to Retrieve a Bacterium
4 Comparative Genomics for Antibiotic Resistance
5 Appendix: construction of a tailored reference
What is Metagenomics?
Metagenomics: study of genomic information obtained directly from microbial communities
Why?
About 99% of microbes cannot be cultured, so they cannot be sequenced in isolation
Understand interactions between organisms
Human microbiota
What kind of data? A meta... jigsaw puzzle
Reads of multiple microbes
Original pictures are unknown
Pieces are similar
Biased abundance of pieces
Annotation: discovering the original pictures of the puzzles
Assign each read to an organism or to a taxonomic identifier
Taxonomy: a biological classification
Linnean taxonomy:
Formal system for classifying and naming living things
Based on a simple hierarchical structure
Similar elements are grouped together
Rank: level in the hierarchy (left)
Taxon: unit of the hierarchy (group of similar living things)
Lowest Common Ancestor (LCA) Algorithm
For each read r of the metagenome:
1 Compare r with reference sequences (e.g. with BLASTX)
2 Assign r to the lowest common taxonomic ancestor of the matching species Hi's
Example: [figure: taxonomy tree with the matching species H1, ..., H12 and their LCA]
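The LCA assignment step can be sketched in a few lines. This is a minimal illustration, not the implementation used in the talk: it assumes the taxonomy is available as a child-to-parent dictionary and that the BLASTX hits of a read have already been reduced to a list of matching taxa. The toy taxonomy below is hypothetical, and a full implementation would normally filter hits by alignment score before computing the LCA.

```python
from typing import Dict, List, Optional

# Hypothetical toy taxonomy: child taxon -> parent taxon.
TAXONOMY: Dict[str, str] = {
    "Bacteria": "root",
    "Proteobacteria": "Bacteria",
    "Gammaproteobacteria": "Proteobacteria",
    "Escherichia": "Gammaproteobacteria",
    "Salmonella": "Gammaproteobacteria",
    "Firmicutes": "Bacteria",
    "Bacillus": "Firmicutes",
}


def lineage(taxon: str, parent: Dict[str, str]) -> List[str]:
    """Path from the root of the taxonomy down to `taxon`."""
    path = [taxon]
    while taxon in parent:
        taxon = parent[taxon]
        path.append(taxon)
    return path[::-1]


def lowest_common_ancestor(hits: List[str], parent: Dict[str, str]) -> Optional[str]:
    """Deepest taxon shared by the lineages of all matching taxa H_i of a read."""
    if not hits:
        return None  # no hits: the read stays unassigned
    lineages = [lineage(h, parent) for h in hits]
    lca = None
    # Walk down all lineages in parallel, one taxonomic level at a time.
    for level in zip(*lineages):
        if all(t == level[0] for t in level):
            lca = level[0]
        else:
            break
    return lca


print(lowest_common_ancestor(["Escherichia", "Salmonella"], TAXONOMY))  # Gammaproteobacteria
print(lowest_common_ancestor(["Escherichia", "Bacillus"], TAXONOMY))    # Bacteria
```

Hits spread over distant species push the read up to a high rank, which is why LCA leaves few reads at low ranks.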
LCA: Pros and Cons
Pros:
Higher accuracy than the BLASTX best hit
Assigning short reads to higher-level taxa is more realistic
Cons:
Few reads assigned at low ranks
Many unassigned reads
How can we improve it?
MTR: Multiple Taxonomic Rank based clustering
Goal: taxonomic annotation of short metagenomic reads (rank-level)
Assign from the highest rank to the lowest feasible rank
Assignments of reads are dependent on each other
MTR Algorithm scheme: top-down strategy
1 Compare reads R with reference proteins (we used BLASTX and the NCBI-NR database)
2 For each rank j (from the highest to the lowest):
2.1 T ← {taxa at rank j of proteins matching R}
2.2 Annotate by clustering R into clusters Ci, where each Ci corresponds to a taxon ti ∈ T
2.3 Remove from R the reads whose classification is incoherent with their classifications at higher ranks
3 For each rank j (from the lowest to the highest):
3.1 Majority vote on the clusters' intersections at rank j
3.2 Make the classifications at higher ranks coherent with the majority-vote results
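Step 2.3 is the part of the top-down pass that keeps the per-rank annotations consistent. The sketch below covers only that step and is not the MTR code: it assumes the clustering of step 2.2 has already produced a read-to-taxon map at rank j, and that the taxonomy provides, for each taxon at rank j, its ancestor at rank j-1. All read and taxon names are hypothetical.

```python
from typing import Dict


def remove_incoherent(
    higher: Dict[str, str],
    current: Dict[str, str],
    parent: Dict[str, str],
) -> Dict[str, str]:
    """Keep a read's rank-j taxon only if it lies below the taxon the read
    received at rank j-1; all other reads are dropped from R.

    higher:  read -> taxon assigned at rank j-1
    current: read -> taxon assigned at rank j by the clustering step
    parent:  taxon at rank j -> its ancestor at rank j-1
    """
    return {
        read: taxon
        for read, taxon in current.items()
        if read in higher and parent.get(taxon) == higher[read]
    }


higher = {"r1": "Proteobacteria", "r2": "Firmicutes", "r3": "Proteobacteria"}
current = {"r1": "Gammaproteobacteria", "r2": "Gammaproteobacteria", "r3": "Bacilli"}
parent = {"Gammaproteobacteria": "Proteobacteria", "Bacilli": "Firmicutes"}
print(remove_incoherent(higher, current, parent))  # {'r1': 'Gammaproteobacteria'}
```

At the highest rank there is no rank above, so under this sketch the step would simply be skipped there.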
MTR: Annotation via combinatorial optimization
For each rank j, and for each taxon ti of rank j:
Create a cluster Ci ⊆ R of the reads similar to taxon ti
Set Covering Problem
Select a collection of clusters (taxa) such that:
no sequence is left outside, and
the number of selected clusters is minimal
If Ci is selected, the sequences of Ci are assigned to ti
Example: [figure: incidence matrix of reads s1, ..., s10 (rows) against clusters C1, ..., C6 (columns), and the clustering solution, i.e. the selected clusters covering every read]
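The set covering formulation can be illustrated with the classic greedy heuristic. This is a swapped-in technique, not necessarily how MTR solves the problem: exact set cover is NP-hard, and greedy selection does not guarantee the minimal number of clusters, only a good approximation. The reads and clusters below are hypothetical.

```python
from typing import Dict, List, Set


def greedy_set_cover(reads: Set[str], clusters: Dict[str, Set[str]]) -> List[str]:
    """Pick clusters until every read is covered, each time taking the cluster
    that covers the most still-uncovered reads (ties: first name in sorted order)."""
    uncovered = set(reads)
    chosen: List[str] = []
    while uncovered:
        best = max(sorted(clusters), key=lambda c: len(clusters[c] & uncovered))
        gain = clusters[best] & uncovered
        if not gain:
            raise ValueError(f"reads not covered by any cluster: {sorted(uncovered)}")
        chosen.append(best)
        uncovered -= gain
    return chosen


reads = {f"s{i}" for i in range(1, 7)}
clusters = {
    "C1": {"s1", "s2", "s3"},
    "C2": {"s3", "s4"},
    "C3": {"s4", "s5", "s6"},
    "C4": {"s1", "s6"},
}
print(greedy_set_cover(reads, clusters))  # ['C1', 'C3']
```

An exact minimum could instead be obtained with an integer linear program (one binary variable per cluster, one covering constraint per read), at a higher computational cost.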
MTR vs LCA: MTR is better with respect to quantity
MTR annotates more reads than LCA
Simulated data: MTR annotates 8% to 37% more reads
At rank Genus: 28% to 89% more
Real-life data: MTR annotates 15% to 30% more reads
At rank Species: 120% to 208% more
Experiments: 12 simulated datasets and 3 real-life datasets (100 bp reads)
MTR vs LCA: LCA tends to have better accuracy
Accuracy (%) and number of reads assigned, for each rank
Rank      MTR accuracy (# reads)   LCA accuracy (# reads)
Kingdom   100.00 (166,948)         99.99 (155,263)
Phylum    99.86 (166,948)          99.93 (155,258)
Class     99.73 (166,936)          99.81 (141,829)
Order     97.67 (166,148)          98.14 (115,732)
Family    97.62 (165,231)          98.04 (110,488)
Genus     97.42 (140,476)          98.35 (110,139)
Table: dataset M3, coverage 4X, 166,978 reads in total
Rank      MTR accuracy (# reads)   LCA accuracy (# reads)
Kingdom   95.07 (88,537)           94.66 (73,176)
Phylum    93.21 (88,537)           92.57 (73,169)
Class     89.25 (87,635)           88.98 (60,294)
Order     89.24 (85,657)           88.44 (57,373)
Family    77.35 (81,366)           81.84 (48,760)
Genus     61.36 (77,307)           74.60 (40,823)
Table: dataset M2, coverage 1X, 288,730 reads in total
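The read counts in these tables can be turned into the relative gains quoted for MTR. A short sketch, assuming the gain is defined as (MTR - LCA) / LCA: at rank Genus it gives about 27.5% for M3 and about 89.4% for M2, which is consistent with the 28% to 89% range reported for simulated data at that rank.

```python
def percent_more(mtr_reads: int, lca_reads: int) -> float:
    """Relative gain (%) in reads annotated by MTR with respect to LCA."""
    return 100.0 * (mtr_reads - lca_reads) / lca_reads


# Rank Genus, read counts taken from the M3 and M2 tables.
print(f"M3: {percent_more(140_476, 110_139):.1f}% more reads")  # M3: 27.5% more reads
print(f"M2: {percent_more(77_307, 40_823):.1f}% more reads")    # M2: 89.4% more reads
```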
MTR vs LCA: MTR gives a better population distribution
Population distributions (rank Genus) of M2, coverage 0.1X [figure]
Population distributions (rank Genus) of the Coral dataset: number of reads (percentage)
Genus              MTR              LCA
Acinetobacter      1,031 (9.03%)    944 (20.15%)
Aspergillus        279 (2.44%)      80 (1.71%)
Gibberella         3,492 (30.57%)   1,804 (38.51%)
Neurospora         133 (1.16%)      76 (1.62%)
Podospora          80 (0.70%)       76 (1.62%)
Chaetomium         146 (1.28%)      105 (2.24%)
T4-like viruses    57 (0.50%)       57 (1.22%)
Porites            4,540 (39.75%)   643 (13.72%)
Phaeosphaeria      90 (0.79%)       51 (1.09%)
Magnaporthe        313 (2.74%)      169 (3.61%)
Nitrosopumilus     128 (1.12%)      76 (1.62%)
Others             1,133 (9.92%)    604 (12.89%)
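The percentages in the Coral table are each genus's share of the reads assigned at rank Genus by the same method. A minimal sketch of that computation, using the LCA counts from the table (they sum to 4,685 reads):

```python
from typing import Dict


def relative_abundance(counts: Dict[str, int]) -> Dict[str, float]:
    """Percentage of the reads assigned at one rank that fall in each taxon."""
    total = sum(counts.values())
    return {taxon: 100.0 * n / total for taxon, n in counts.items()}


# LCA read counts at rank Genus for the Coral dataset.
lca_counts = {
    "Acinetobacter": 944, "Aspergillus": 80, "Gibberella": 1804,
    "Neurospora": 76, "Podospora": 76, "Chaetomium": 105,
    "T4-like viruses": 57, "Porites": 643, "Phaeosphaeria": 51,
    "Magnaporthe": 169, "Nitrosopumilus": 76, "Others": 604,
}
for genus, pct in relative_abundance(lca_counts).items():
    print(f"{genus}: {pct:.2f}%")
```

Because the denominator is only the reads each method manages to assign, a method that leaves many reads unassigned can distort the estimated distribution, which is the comparison the table makes.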
Conclusions
MTR outperforms LCA in two ways:
More sequences annotated, especially at low ranks
Better estimate of the population distribution
LCA tends to be more accurate
Future Developments
Replace BLASTX with a composition-based similarity measure
Additional constraints on cluster selection, e.g. consistent coverage depth on proteins or constraints on genome location coverage
Metagenomic sequencing to acquire an organism
Candidatus Brocadia fulgida
The Brocadia genome had not been previously sequenced
Sequencing platforms (mean read length):
Sanger Shotgun (800 bp)
Sanger Fosmid (800 bp)
454 GS20 (200 bp)
First standard annotation:
Reads are assigned to their BLASTX best hit
Reads are assigned to Brocadia if the best hit is Kuenenia (Kuenenia is a close relative of Brocadia)
Why do FISH analysis and BLASTX annotation not agree?
80% of the cells are Brocadia, but...
Brocadia seems underrepresented
Are we sure?
Can we still extract significant information?
                 Shotgun   Fosmid   454
Brocadia reads   9.68%     13.76%   12.92%
Brocadia bp      9.76%     14.33%   11.34%
Let's do some composition-based analyses...
Different point of view: GC content
[Bernaola-Galvan et al., Gene, 2004]
Different organisms can have different GC content (from 16.6% to 74.9%)
If a genome is partitioned into equally sized, non-overlapping sequences:
the GC content has an (approximately) normal distribution
the distribution is centered on the organism's GC content
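The partitioning described above can be sketched as follows: cut a sequence into equally sized, non-overlapping windows and compute the GC content of each, so that the distribution of these values can be inspected. The window size is a free parameter chosen here for illustration, and the decision to ignore ambiguous bases (e.g. N) is an assumption of this sketch.

```python
from typing import List


def gc_content(seq: str) -> float:
    """Fraction of G and C among the unambiguous bases (A, C, G, T) of seq."""
    seq = seq.upper()
    acgt = sum(seq.count(base) for base in "ACGT")
    if acgt == 0:
        return float("nan")  # window made only of ambiguous bases
    return (seq.count("G") + seq.count("C")) / acgt


def window_gc(genome: str, window: int) -> List[float]:
    """GC content of equally sized, non-overlapping windows of the genome.
    A trailing window shorter than `window` is dropped."""
    if window <= 0:
        raise ValueError("window must be positive")
    return [gc_content(genome[i:i + window])
            for i in range(0, len(genome) - window + 1, window)]


print(window_gc("GGGGAAAAGCGCATAT", 4))  # [1.0, 0.0, 1.0, 0.0]
```

For a real genome and a window comparable to the read length, a histogram of these values is what the approximately normal distribution refers to; reads of a mixture then appear as a superposition of such distributions.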
Bias toward high GC-content organisms
[figure: histograms of read GC content (x-axis, 0% to 100%) against frequency (y-axis) for the 454, Fosmid and Shotgun datasets; series: Raw, Annotated, Brocadia, Alphaproteobacteria, Betaproteobacteria]
We saw that Brocadia is underrepresented...
How can we cope with that?
Sets of well-recovered Kuenenia ORFs differ
Technologies: Shotgun (Sanger), Fosmid (Sanger), 454
[figure: extended Venn diagram of Brocadia Open Reading Frames retrieved for 80% of their length]
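The Venn diagram compares, per technology, the set of ORFs retrieved for at least 80% of their length. A minimal sketch of how such sets could be built, assuming read alignments have already been projected onto ORF coordinates as 0-based, half-open intervals; ORF names, lengths and alignments are hypothetical.

```python
from typing import Dict, List, Set, Tuple

Interval = Tuple[int, int]  # 0-based, half-open [start, end) in ORF coordinates


def covered_fraction(orf_length: int, alignments: List[Interval]) -> float:
    """Fraction of ORF positions covered by at least one aligned read."""
    covered = [False] * orf_length
    for start, end in alignments:
        for pos in range(max(0, start), min(orf_length, end)):
            covered[pos] = True
    return sum(covered) / orf_length


def well_recovered(orfs: Dict[str, int], hits: Dict[str, List[Interval]],
                   threshold: float = 0.8) -> Set[str]:
    """ORFs retrieved for at least `threshold` of their length."""
    return {name for name, length in orfs.items()
            if covered_fraction(length, hits.get(name, [])) >= threshold}


# Hypothetical ORF lengths and read alignments for two technologies.
orfs = {"orf1": 100, "orf2": 200}
shotgun = well_recovered(orfs, {"orf1": [(0, 50), (40, 90)], "orf2": [(0, 100)]})
fosmid = well_recovered(orfs, {"orf2": [(0, 180)]})
print(shotgun & fosmid, shotgun - fosmid, fosmid - shotgun)
```

The regions of the Venn diagram are then plain set operations (intersections and differences) on the per-technology sets.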
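Depth of coverage: correlation on the same ORF
[figure: for each pair of technologies (Shotgun vs Fosmid, Shotgun vs 454, Fosmid vs 454), percentage of ORFs (0% to 100%) whose depth-of-coverage correlation falls in each interval: -1 to -0.7, -0.7 to -0.3, -0.3 to 0, 0 to 0.3, 0.3 to 0.7, 0.7 to 1; legend: Similar, Different]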
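The per-ORF comparison can be sketched as a correlation between two depth-of-coverage profiles over the same ORF, followed by binning into the intervals of the figure. The slide does not say which coefficient was used; this sketch assumes Pearson correlation on per-base depths and assumes each interval is closed on the left (the last one also includes 1). The depth values are hypothetical.

```python
from math import sqrt
from typing import List, Optional

# Interval edges from the figure; bin i is [BIN_EDGES[i], BIN_EDGES[i + 1]).
BIN_EDGES = [-1.0, -0.7, -0.3, 0.0, 0.3, 0.7, 1.0]


def pearson(x: List[float], y: List[float]) -> Optional[float]:
    """Pearson correlation of two depth-of-coverage profiles over the same ORF."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two profiles of equal length >= 2")
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return None  # constant depth in one profile: correlation undefined
    return sxy / sqrt(sxx * syy)


def correlation_bin(r: float) -> int:
    """Index (0-5) of the interval that contains r."""
    for i in range(len(BIN_EDGES) - 1):
        if r < BIN_EDGES[i + 1]:
            return i
    return len(BIN_EDGES) - 2


# Hypothetical per-base depths of one ORF in two datasets.
shotgun_depth = [3, 5, 8, 6, 4]
depth_454 = [2, 4, 9, 7, 3]
r = pearson(shotgun_depth, depth_454)
print(round(r, 3), correlation_bin(r))
```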
Future Developments
Tuning sequencing for the specific community
Integration of composition-based analysis and BLASTX annotation
The antibiotic alarm, Nature, 14 March 2013
Rise of resistance (inevitable)
Decline of development (economics)
Waiting for new drugs... How can we cope with it?
Multi-drug treatments
New therapies (dosage, duration)
Personalised medicine (e.g. infecting strain, patient PK/PD, patient genotype)
Idea: Drug Switching
Experiments:
Treatments:
Sequential: switch drug
50%-50%: cocktail
Control: no drugs
Protocol:
For each season, bacteria grow in liquid medium with the drug
1% of the bacteria are transferred between seasons
3 replicates
Duration: 96 hours (8 seasons of 12 hours)
Drugs: Doxycycline, Erythromycin
Sequencing: after 24h and 96h
18 datasets (red border)
My Role
Construct an annotated reference genome [custom pipeline]
For each replicate, identify:
Structural Variations (SVs) [Pindel]
Copy Number Variations (CNVs) [CNVnator]
Single Nucleotide Polymorphisms (SNPs) [VarScan]
Results CNV: 412 kb duplicated region at 96 hours
[figure: normalised coverage (1000 bins) along the genome (0 to 4.5 Mb) at 96 hours for Control, ERY/DOX and 50-50, with the Mean + 1 level marked; labelled loci: ampE, rrnH, paoC, tauA, ybbJ, mdtG, atoB, rrnG, yqhC, rng, rrnD, rrnC, ubiD, rrnA, rhaM, rrnB, rrnE, slt]
Efflux-pump duplications (!)
This region includes the multidrug efflux pump AcrRAB-TolC [Peña-Miller et al, PLOS Biology, 2013]
[figure: coverage ratio inside/outside the duplicated region at 24 and 96 hours, per treatment]
Dox/Ery: p < 0.0001
50%-50%: p < 0.0001
Control: p = 0.303
(p-values from a t-test)
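The coverage-ratio panel can be sketched in three steps: average per-base depth into equal bins, normalise by the overall mean, and divide the mean coverage inside the duplicated region by the mean outside it. The slide only says the p-values come from a t-test; this sketch assumes one ratio per replicate and a Welch t statistic comparing the 24h and 96h ratios, and all bin indices and values are hypothetical.

```python
from math import sqrt
from statistics import mean, variance
from typing import List


def normalised_bins(depth: List[float], n_bins: int) -> List[float]:
    """Mean per-base depth in n_bins equal bins, divided by the mean over all bins.
    Trailing positions that do not fill a whole bin are ignored."""
    step = len(depth) // n_bins
    if step == 0:
        raise ValueError("fewer positions than bins")
    bins = [mean(depth[i * step:(i + 1) * step]) for i in range(n_bins)]
    overall = mean(bins)
    return [b / overall for b in bins]


def coverage_ratio(bins: List[float], start: int, end: int) -> float:
    """Mean coverage of bins[start:end] (duplicated region) over the mean of the other bins."""
    inside = bins[start:end]
    outside = bins[:start] + bins[end:]
    return mean(inside) / mean(outside)


def welch_t(a: List[float], b: List[float]) -> float:
    """Welch's t statistic for the difference between two sample means."""
    return (mean(a) - mean(b)) / sqrt(variance(a) / len(a) + variance(b) / len(b))


print(normalised_bins([1, 1, 1, 1, 3, 3, 3, 3], 4))  # [0.5, 0.5, 1.5, 1.5]
print(coverage_ratio([1.0, 1.0, 2.0, 2.0, 1.0, 1.0], 2, 4))  # 2.0
```

Turning the t statistic into a p-value needs the t distribution; for example, `scipy.stats.ttest_ind(a, b, equal_var=False)` returns both for Welch's test.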
Conclusion
Sequential treatments work well in vitro when cocktails fail
Genomics: antibiotics seem to prevent mutations that arise in the control
Further developments (omics):
Phage role in region duplication
Timing of region duplication
NGS of additional treatments
Transcriptomics
Standard approach: de novo assembly and annotation
Solve the jigsaw puzzle
Functional annotation
Done with software and manual work
Problems (common)
Errors:
repetitive regions misassembled
wrong order/orientation
annotation quality
Fragmentation
Quality depends on time and money
2014: automated genome assembly for less than $1,000 [Koren and Phillippy, Curr Opin Microbiol, 2015]
The alternative: tailored reference
Take the reference genome of a close relative
Modify it according to the sequencing data
Import the annotation from the reference
Pros
Less fragmentation
Higher quality
Better annotation
Cons
You need a close relative
Visually check steps
Ad hoc scripting
Conservative approach
Our case
Sequenced organism:
E. coli K-12 AG100 growing 24h in M9 medium
Reference genome:
E. coli K-12 MG1655 (available online)
Data (preprocessed):
Reads mapping to reference MG1655: 95.84%
Mean coverage depth: 88.19x (based on MG1655)
Read min/max/mean length (bp): 15 / 99 / 72.17
The pipeline
Read preprocessing (standard)
Mapping to reference MG1655 (standard)
Call Structural Variations (SVs)
Assemble unmapped and mapped data
Make intermediate reference
Check SVs and call SNPs
Functional annotation
Clean and align reads
Read preprocessing [fastq-mcf, samtools]
Mapping to reference [BWA, IGV]
Structural Variations (SVs)
Use Pindel to call SVs: deletions, insertions, inversions, translocations, indels, break points
Visually checked [IGV]:
Deletions: 5 (total 47 kbp)
Indels: 9
Break points: 9
Applying SVs and assembling unmapped reads
Take the close relative's genome
Break it into sequences by applying the SVs
Extract reads around the removed regions
Extract reads not mapped to the reference
Assemble the union of the extracted reads, then scaffold [figure]
[Python and Bash scripting, Samtools, Velvet, SSPACE, GapFiller]
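The first two steps of this stage can be sketched in Python. This is an illustration under stated assumptions, not the original scripts: only deletions are applied (the SV type summarised as 5 deletions, 47 kbp), intervals are 0-based and half-open, and the contig name and flank size are hypothetical. The region strings use the 1-based, inclusive `name:beg-end` form accepted by samtools, so they could be passed to `samtools view` to pull out the reads around each removed region.

```python
from typing import List, Tuple

Deletion = Tuple[int, int]  # 0-based, half-open [start, end) on the reference


def apply_deletions(reference: str, deletions: List[Deletion]) -> List[str]:
    """Remove deleted intervals and return the remaining reference pieces."""
    pieces: List[str] = []
    pos = 0
    for start, end in sorted(deletions):
        if start < pos:
            raise ValueError("overlapping deletions")
        if start > pos:
            pieces.append(reference[pos:start])
        pos = end
    if pos < len(reference):
        pieces.append(reference[pos:])
    return pieces


def flank_regions(contig: str, deletions: List[Deletion], flank: int, length: int) -> List[str]:
    """1-based, inclusive region strings covering each deletion plus `flank` bp on each side."""
    return [f"{contig}:{max(1, start + 1 - flank)}-{min(length, end + flank)}"
            for start, end in sorted(deletions)]


print(apply_deletions("AAAACCCCGGGG", [(4, 8)]))  # ['AAAA', 'GGGG']
print(flank_regions("ref", [(100, 200)], 50, 1000))  # ['ref:51-250']
```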
Making intermediate and tailored references
Making the intermediate reference:
Order scaffolds w.r.t. the reference [Mauve]
Concatenate the 13 aligned scaffolds [Bash one-liner]
Making the tailored reference:
Look for SVs (none should be present)
Call SNPs [VarScan, vcftools]
Annotation:
Export annotation from the reference [RATT]
Adjust and annotate missing parts [RAST, manual editing]
Make the file ENA compatible [Python script]
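The concatenation step was done with a Bash one-liner; the Python sketch below shows the same idea. It assumes the scaffolds are already in the order produced by Mauve, in a single FASTA file. The optional `spacer` between scaffolds is a hypothetical parameter (the slide does not say whether anything was inserted between them), and the record names are illustrative.

```python
from typing import List, Tuple


def read_fasta(text: str) -> List[Tuple[str, str]]:
    """Parse FASTA text into (identifier, sequence) pairs, keeping file order."""
    records: List[Tuple[str, str]] = []
    name, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                records.append((name, "".join(chunks)))
            name, chunks = (line[1:].split() or [""])[0], []
        else:
            chunks.append(line)
    if name is not None:
        records.append((name, "".join(chunks)))
    return records


def concatenate(records: List[Tuple[str, str]], name: str,
                spacer: str = "", width: int = 60) -> str:
    """Join ordered scaffolds into one FASTA record, wrapped at `width` columns."""
    seq = spacer.join(s for _, s in records)
    lines = [f">{name}"] + [seq[i:i + width] for i in range(0, len(seq), width)]
    return "\n".join(lines) + "\n"


ordered = ">scaffold_1 len=6\nACGT\nAC\n>scaffold_2\nGGTT\n"
print(concatenate(read_fasta(ordered), "intermediate_reference"), end="")
```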
"In my experience, people do not look at assemblies critically enough" [Nature Methods, 2012]
Clean results need designed protocols, time, and money
Leaps forward have been made recently, but the sequencing cost is still not very low [Nature Methods, June 2013; Koren and Phillippy, Curr Opin Microbiol, 2015]
Combining technologies improved Kuenenia ORFs retrieval
[figure: number of ORFs (y-axis, 0 to 2000) against threshold of mapping percentage (x-axis, 0% to 100%); series: Shotgun, Fosmid, 454, Shotgun + Fosmid, Shotgun + 454, Fosmid + 454, All]
SNPs and ribosomal operons: mutations in the control
Hypothesis: antibiotics slow down adaptation for optimal growth in culture
Heightened ribosomal demand due to rapid growth [Condon et al., J Bacteriol 1995]
% mean variant frequency (number of replicates, if not all)
Treatment columns: 50%-50% (24h, 96h), Dox/Ery (24h, 96h), Control (24h, 96h)
operon   position    relative position   frequencies
rrnH     226,521     595                 5(2)
         227,791     1,865               3(1), 17
rrnG     2,723,624   1,865               3(1), 9
         2,724,894   595                 8
rrnD     3,421,431   1,865               4(1), 13
         3,422,701   595                 8
rrnC     3,940,810   595                 4(1), 17
rrnA     4,034,586   555                 7
rrnB     4,165,708   595                 4(1), 8
         4,166,978   1,865               10
rrnE     4,207,110   595                 3(1), 9
         4,208,380   1,865               5(1), 7
SNPs significantly different in frequency (ANOVA)
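The significance of the frequency differences was assessed with ANOVA. A minimal sketch of the one-way ANOVA F statistic, assuming the groups are treatments and the values are the variant frequencies of one SNP in each replicate; the slide does not specify the exact design, and the numbers below are hypothetical.

```python
from statistics import mean
from typing import Sequence


def one_way_anova_f(groups: Sequence[Sequence[float]]) -> float:
    """F statistic of a one-way ANOVA: between-group over within-group mean square."""
    k = len(groups)
    n = sum(len(g) for g in groups)
    if k < 2 or n <= k:
        raise ValueError("need at least two groups and more values than groups")
    means = [mean(g) for g in groups]
    grand = mean([x for g in groups for x in g])
    ss_between = sum(len(g) * (m - grand) ** 2 for g, m in zip(groups, means))
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    if ss_within == 0:
        raise ValueError("no within-group variation")
    return (ss_between / (k - 1)) / (ss_within / (n - k))


# Hypothetical variant frequencies (%) of one SNP in three replicates per treatment.
print(one_way_anova_f([[1, 2, 3], [4, 5, 6], [7, 8, 9]]))  # 27.0
```

The p-value follows from the F distribution with (k - 1, n - k) degrees of freedom; `scipy.stats.f_oneway` computes both the statistic and the p-value.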
Maybe these ribosomal mutations help with α-amino acid starvation, because...
tauA is expressed only under conditions of sulfate or cysteine (α-amino acid) starvation [Eichhorn et al, J Bacteriol, 2000]
yqhC regulates a scavenger of toxic aldehydes produced by lipid peroxidation [Jarobe et al, Appl Microbiol Biotechnol, 2011]
% mean variant frequency (number of replicates, if not all)
Treatment columns: 50%-50% (24h, 96h), Dox/Ery (24h, 96h), Control (24h, 96h)
gene   position    frequencies   annotation
DUPLICATED REGION
tauA   384,897     19(1), 68     taurine transport system
yqhC   3,151,384   45            putative ARAC-type regulatory protein

Nils Gehlenborg
 
LTR-Retrotransposons of Chimpanzee genome
LTR-Retrotransposons of Chimpanzee genomeLTR-Retrotransposons of Chimpanzee genome
LTR-Retrotransposons of Chimpanzee genome
Abhishek Dabral
 
TCRpower
TCRpowerTCRpower
TCRpower
Hoffman Lab
 
Sequence Analysis
Sequence AnalysisSequence Analysis
Sequence Analysis
DEBPRASAD DUTTA
 
Paper presentation @DILS'07
Paper presentation @DILS'07Paper presentation @DILS'07
Paper presentation @DILS'07
Paolo Missier
 
Rapid and accurate Cancer somatic mutation profiling with the qBiomarker Soma...
Rapid and accurate Cancer somatic mutation profiling with the qBiomarker Soma...Rapid and accurate Cancer somatic mutation profiling with the qBiomarker Soma...
Rapid and accurate Cancer somatic mutation profiling with the qBiomarker Soma...
QIAGEN
 
Introduction to Metagenomics Data Analysis - UEB-VHIR - 2013
Introduction to Metagenomics Data Analysis - UEB-VHIR - 2013Introduction to Metagenomics Data Analysis - UEB-VHIR - 2013
Introduction to Metagenomics Data Analysis - UEB-VHIR - 2013
VHIR Vall d’Hebron Institut de Recerca
 
Integrative analysis of transcriptomics and proteomics data with ArrayMining ...
Integrative analysis of transcriptomics and proteomics data with ArrayMining ...Integrative analysis of transcriptomics and proteomics data with ArrayMining ...
Integrative analysis of transcriptomics and proteomics data with ArrayMining ...
Natalio Krasnogor
 
Predictive Features of TCR Repertoire
Predictive Features of TCR RepertoirePredictive Features of TCR Repertoire
Predictive Features of TCR Repertoire
Thi K. Tran-Nguyen, PhD
 
Usual Questions with Unusual Answers: Application of Multi-class Supervised A...
Usual Questions with Unusual Answers: Application of Multi-class Supervised A...Usual Questions with Unusual Answers: Application of Multi-class Supervised A...
Usual Questions with Unusual Answers: Application of Multi-class Supervised A...
Data Con LA
 
A computational framework for large-scale analysis of TCRβ immune repertoire ...
A computational framework for large-scale analysis of TCRβ immune repertoire ...A computational framework for large-scale analysis of TCRβ immune repertoire ...
A computational framework for large-scale analysis of TCRβ immune repertoire ...
Thermo Fisher Scientific
 
Microarray Analysis
Microarray AnalysisMicroarray Analysis
Microarray Analysis
James McInerney
 
RSEM and DE packages
RSEM and DE packagesRSEM and DE packages
RSEM and DE packages
Ravi Gandham
 

Similar to Metagenomic Data Analysis and Microbial Genomics (20)

4. sequence alignment.pptx
4. sequence alignment.pptx4. sequence alignment.pptx
4. sequence alignment.pptx
 
Basics of bioinformatics
Basics of bioinformaticsBasics of bioinformatics
Basics of bioinformatics
 
9739142.ppt
9739142.ppt9739142.ppt
9739142.ppt
 
Prediction of protein function
Prediction of protein functionPrediction of protein function
Prediction of protein function
 
Bioinformatics
BioinformaticsBioinformatics
Bioinformatics
 
Parks kmer metagenomics
Parks kmer metagenomicsParks kmer metagenomics
Parks kmer metagenomics
 
Correcting bias and variation in small RNA sequencing for optimal (microRNA) ...
Correcting bias and variation in small RNA sequencing for optimal (microRNA) ...Correcting bias and variation in small RNA sequencing for optimal (microRNA) ...
Correcting bias and variation in small RNA sequencing for optimal (microRNA) ...
 
Visual Exploration of Clinical and Genomic Data for Patient Stratification
Visual Exploration of Clinical and Genomic Data for Patient StratificationVisual Exploration of Clinical and Genomic Data for Patient Stratification
Visual Exploration of Clinical and Genomic Data for Patient Stratification
 
LTR-Retrotransposons of Chimpanzee genome
LTR-Retrotransposons of Chimpanzee genomeLTR-Retrotransposons of Chimpanzee genome
LTR-Retrotransposons of Chimpanzee genome
 
TCRpower
TCRpowerTCRpower
TCRpower
 
Sequence Analysis
Sequence AnalysisSequence Analysis
Sequence Analysis
 
Paper presentation @DILS'07
Paper presentation @DILS'07Paper presentation @DILS'07
Paper presentation @DILS'07
 
Rapid and accurate Cancer somatic mutation profiling with the qBiomarker Soma...
Rapid and accurate Cancer somatic mutation profiling with the qBiomarker Soma...Rapid and accurate Cancer somatic mutation profiling with the qBiomarker Soma...
Rapid and accurate Cancer somatic mutation profiling with the qBiomarker Soma...
 
Introduction to Metagenomics Data Analysis - UEB-VHIR - 2013
Introduction to Metagenomics Data Analysis - UEB-VHIR - 2013Introduction to Metagenomics Data Analysis - UEB-VHIR - 2013
Introduction to Metagenomics Data Analysis - UEB-VHIR - 2013
 
Integrative analysis of transcriptomics and proteomics data with ArrayMining ...
Integrative analysis of transcriptomics and proteomics data with ArrayMining ...Integrative analysis of transcriptomics and proteomics data with ArrayMining ...
Integrative analysis of transcriptomics and proteomics data with ArrayMining ...
 
Predictive Features of TCR Repertoire
Predictive Features of TCR RepertoirePredictive Features of TCR Repertoire
Predictive Features of TCR Repertoire
 
Usual Questions with Unusual Answers: Application of Multi-class Supervised A...
Usual Questions with Unusual Answers: Application of Multi-class Supervised A...Usual Questions with Unusual Answers: Application of Multi-class Supervised A...
Usual Questions with Unusual Answers: Application of Multi-class Supervised A...
 
A computational framework for large-scale analysis of TCRβ immune repertoire ...
A computational framework for large-scale analysis of TCRβ immune repertoire ...A computational framework for large-scale analysis of TCRβ immune repertoire ...
A computational framework for large-scale analysis of TCRβ immune repertoire ...
 
Microarray Analysis
Microarray AnalysisMicroarray Analysis
Microarray Analysis
 
RSEM and DE packages
RSEM and DE packagesRSEM and DE packages
RSEM and DE packages
 

Recently uploaded

Topic: SICKLE CELL DISEASE IN CHILDREN-3.pdf
Topic: SICKLE CELL DISEASE IN CHILDREN-3.pdfTopic: SICKLE CELL DISEASE IN CHILDREN-3.pdf
Topic: SICKLE CELL DISEASE IN CHILDREN-3.pdf
TinyAnderson
 
Sharlene Leurig - Enabling Onsite Water Use with Net Zero Water
Sharlene Leurig - Enabling Onsite Water Use with Net Zero WaterSharlene Leurig - Enabling Onsite Water Use with Net Zero Water
Sharlene Leurig - Enabling Onsite Water Use with Net Zero Water
Texas Alliance of Groundwater Districts
 
Applied Science: Thermodynamics, Laws & Methodology.pdf
Applied Science: Thermodynamics, Laws & Methodology.pdfApplied Science: Thermodynamics, Laws & Methodology.pdf
Applied Science: Thermodynamics, Laws & Methodology.pdf
University of Hertfordshire
 
Eukaryotic Transcription Presentation.pptx
Eukaryotic Transcription Presentation.pptxEukaryotic Transcription Presentation.pptx
Eukaryotic Transcription Presentation.pptx
RitabrataSarkar3
 
waterlessdyeingtechnolgyusing carbon dioxide chemicalspdf
waterlessdyeingtechnolgyusing carbon dioxide chemicalspdfwaterlessdyeingtechnolgyusing carbon dioxide chemicalspdf
waterlessdyeingtechnolgyusing carbon dioxide chemicalspdf
LengamoLAppostilic
 
Immersive Learning That Works: Research Grounding and Paths Forward
Immersive Learning That Works: Research Grounding and Paths ForwardImmersive Learning That Works: Research Grounding and Paths Forward
Immersive Learning That Works: Research Grounding and Paths Forward
Leonel Morgado
 
原版制作(carleton毕业证书)卡尔顿大学毕业证硕士文凭原版一模一样
原版制作(carleton毕业证书)卡尔顿大学毕业证硕士文凭原版一模一样原版制作(carleton毕业证书)卡尔顿大学毕业证硕士文凭原版一模一样
原版制作(carleton毕业证书)卡尔顿大学毕业证硕士文凭原版一模一样
yqqaatn0
 
Cytokines and their role in immune regulation.pptx
Cytokines and their role in immune regulation.pptxCytokines and their role in immune regulation.pptx
Cytokines and their role in immune regulation.pptx
Hitesh Sikarwar
 
Unlocking the mysteries of reproduction: Exploring fecundity and gonadosomati...
Unlocking the mysteries of reproduction: Exploring fecundity and gonadosomati...Unlocking the mysteries of reproduction: Exploring fecundity and gonadosomati...
Unlocking the mysteries of reproduction: Exploring fecundity and gonadosomati...
AbdullaAlAsif1
 
Remote Sensing and Computational, Evolutionary, Supercomputing, and Intellige...
Remote Sensing and Computational, Evolutionary, Supercomputing, and Intellige...Remote Sensing and Computational, Evolutionary, Supercomputing, and Intellige...
Remote Sensing and Computational, Evolutionary, Supercomputing, and Intellige...
University of Maribor
 
Oedema_types_causes_pathophysiology.pptx
Oedema_types_causes_pathophysiology.pptxOedema_types_causes_pathophysiology.pptx
Oedema_types_causes_pathophysiology.pptx
muralinath2
 
如何办理(uvic毕业证书)维多利亚大学毕业证本科学位证书原版一模一样
如何办理(uvic毕业证书)维多利亚大学毕业证本科学位证书原版一模一样如何办理(uvic毕业证书)维多利亚大学毕业证本科学位证书原版一模一样
如何办理(uvic毕业证书)维多利亚大学毕业证本科学位证书原版一模一样
yqqaatn0
 
Bob Reedy - Nitrate in Texas Groundwater.pdf
Bob Reedy - Nitrate in Texas Groundwater.pdfBob Reedy - Nitrate in Texas Groundwater.pdf
Bob Reedy - Nitrate in Texas Groundwater.pdf
Texas Alliance of Groundwater Districts
 
Equivariant neural networks and representation theory
Equivariant neural networks and representation theoryEquivariant neural networks and representation theory
Equivariant neural networks and representation theory
Daniel Tubbenhauer
 
SAR of Medicinal Chemistry 1st by dk.pdf
SAR of Medicinal Chemistry 1st by dk.pdfSAR of Medicinal Chemistry 1st by dk.pdf
SAR of Medicinal Chemistry 1st by dk.pdf
KrushnaDarade1
 
3D Hybrid PIC simulation of the plasma expansion (ISSS-14)
3D Hybrid PIC simulation of the plasma expansion (ISSS-14)3D Hybrid PIC simulation of the plasma expansion (ISSS-14)
3D Hybrid PIC simulation of the plasma expansion (ISSS-14)
David Osipyan
 
8.Isolation of pure cultures and preservation of cultures.pdf
8.Isolation of pure cultures and preservation of cultures.pdf8.Isolation of pure cultures and preservation of cultures.pdf
8.Isolation of pure cultures and preservation of cultures.pdf
by6843629
 
Phenomics assisted breeding in crop improvement
Phenomics assisted breeding in crop improvementPhenomics assisted breeding in crop improvement
Phenomics assisted breeding in crop improvement
IshaGoswami9
 
Describing and Interpreting an Immersive Learning Case with the Immersion Cub...
Describing and Interpreting an Immersive Learning Case with the Immersion Cub...Describing and Interpreting an Immersive Learning Case with the Immersion Cub...
Describing and Interpreting an Immersive Learning Case with the Immersion Cub...
Leonel Morgado
 
aziz sancar nobel prize winner: from mardin to nobel
aziz sancar nobel prize winner: from mardin to nobelaziz sancar nobel prize winner: from mardin to nobel
aziz sancar nobel prize winner: from mardin to nobel
İsa Badur
 

Recently uploaded (20)

Topic: SICKLE CELL DISEASE IN CHILDREN-3.pdf
Topic: SICKLE CELL DISEASE IN CHILDREN-3.pdfTopic: SICKLE CELL DISEASE IN CHILDREN-3.pdf
Topic: SICKLE CELL DISEASE IN CHILDREN-3.pdf
 
Sharlene Leurig - Enabling Onsite Water Use with Net Zero Water
Sharlene Leurig - Enabling Onsite Water Use with Net Zero WaterSharlene Leurig - Enabling Onsite Water Use with Net Zero Water
Sharlene Leurig - Enabling Onsite Water Use with Net Zero Water
 
Applied Science: Thermodynamics, Laws & Methodology.pdf
Applied Science: Thermodynamics, Laws & Methodology.pdfApplied Science: Thermodynamics, Laws & Methodology.pdf
Applied Science: Thermodynamics, Laws & Methodology.pdf
 
Eukaryotic Transcription Presentation.pptx
Eukaryotic Transcription Presentation.pptxEukaryotic Transcription Presentation.pptx
Eukaryotic Transcription Presentation.pptx
 
waterlessdyeingtechnolgyusing carbon dioxide chemicalspdf
waterlessdyeingtechnolgyusing carbon dioxide chemicalspdfwaterlessdyeingtechnolgyusing carbon dioxide chemicalspdf
waterlessdyeingtechnolgyusing carbon dioxide chemicalspdf
 
Immersive Learning That Works: Research Grounding and Paths Forward
Immersive Learning That Works: Research Grounding and Paths ForwardImmersive Learning That Works: Research Grounding and Paths Forward
Immersive Learning That Works: Research Grounding and Paths Forward
 
原版制作(carleton毕业证书)卡尔顿大学毕业证硕士文凭原版一模一样
原版制作(carleton毕业证书)卡尔顿大学毕业证硕士文凭原版一模一样原版制作(carleton毕业证书)卡尔顿大学毕业证硕士文凭原版一模一样
原版制作(carleton毕业证书)卡尔顿大学毕业证硕士文凭原版一模一样
 
Cytokines and their role in immune regulation.pptx
Cytokines and their role in immune regulation.pptxCytokines and their role in immune regulation.pptx
Cytokines and their role in immune regulation.pptx
 
Unlocking the mysteries of reproduction: Exploring fecundity and gonadosomati...
Unlocking the mysteries of reproduction: Exploring fecundity and gonadosomati...Unlocking the mysteries of reproduction: Exploring fecundity and gonadosomati...
Unlocking the mysteries of reproduction: Exploring fecundity and gonadosomati...
 
Remote Sensing and Computational, Evolutionary, Supercomputing, and Intellige...
Remote Sensing and Computational, Evolutionary, Supercomputing, and Intellige...Remote Sensing and Computational, Evolutionary, Supercomputing, and Intellige...
Remote Sensing and Computational, Evolutionary, Supercomputing, and Intellige...
 
Oedema_types_causes_pathophysiology.pptx
Oedema_types_causes_pathophysiology.pptxOedema_types_causes_pathophysiology.pptx
Oedema_types_causes_pathophysiology.pptx
 
如何办理(uvic毕业证书)维多利亚大学毕业证本科学位证书原版一模一样
如何办理(uvic毕业证书)维多利亚大学毕业证本科学位证书原版一模一样如何办理(uvic毕业证书)维多利亚大学毕业证本科学位证书原版一模一样
如何办理(uvic毕业证书)维多利亚大学毕业证本科学位证书原版一模一样
 
Bob Reedy - Nitrate in Texas Groundwater.pdf
Bob Reedy - Nitrate in Texas Groundwater.pdfBob Reedy - Nitrate in Texas Groundwater.pdf
Bob Reedy - Nitrate in Texas Groundwater.pdf
 
Equivariant neural networks and representation theory
Equivariant neural networks and representation theoryEquivariant neural networks and representation theory
Equivariant neural networks and representation theory
 
SAR of Medicinal Chemistry 1st by dk.pdf
SAR of Medicinal Chemistry 1st by dk.pdfSAR of Medicinal Chemistry 1st by dk.pdf
SAR of Medicinal Chemistry 1st by dk.pdf
 
3D Hybrid PIC simulation of the plasma expansion (ISSS-14)
3D Hybrid PIC simulation of the plasma expansion (ISSS-14)3D Hybrid PIC simulation of the plasma expansion (ISSS-14)
3D Hybrid PIC simulation of the plasma expansion (ISSS-14)
 
8.Isolation of pure cultures and preservation of cultures.pdf
8.Isolation of pure cultures and preservation of cultures.pdf8.Isolation of pure cultures and preservation of cultures.pdf
8.Isolation of pure cultures and preservation of cultures.pdf
 
Phenomics assisted breeding in crop improvement
Phenomics assisted breeding in crop improvementPhenomics assisted breeding in crop improvement
Phenomics assisted breeding in crop improvement
 
Describing and Interpreting an Immersive Learning Case with the Immersion Cub...
Describing and Interpreting an Immersive Learning Case with the Immersion Cub...Describing and Interpreting an Immersive Learning Case with the Immersion Cub...
Describing and Interpreting an Immersive Learning Case with the Immersion Cub...
 
aziz sancar nobel prize winner: from mardin to nobel
aziz sancar nobel prize winner: from mardin to nobelaziz sancar nobel prize winner: from mardin to nobel
aziz sancar nobel prize winner: from mardin to nobel
 

Metagenomic Data Analysis and Microbial Genomics

  • 1. Metagenomic Data Analysis and Microbial Genomics. Fabio Gori, Intelligent Systems, Institute for Computing and Information Sciences, in collaboration with the Department of Microbiology, Radboud University Nijmegen, The Netherlands. 22nd May 2015
  • 2. Table of Contents: 1 Introduction to Metagenomics; 2 Taxonomic-annotation Algorithms; 3 Metagenomics to Retrieve a Bacterium; 4 Comparative Genomics for Antibiotic Resistance; 5 Appendix: construction of a tailored reference
  • 4. What is Metagenomics? Metagenomics: the study of genomic information obtained directly from microbial communities. Why? (i) About 99% of microbes cannot be cultured in the laboratory, so they cannot be sequenced individually; (ii) to understand interactions between organisms; (iii) to study the human microbiota.
  • 8. What kind of data? A meta... jigsaw puzzle: reads from multiple microbes; the original pictures are unknown; the pieces are similar to each other; the abundance of the pieces is biased.
  • 9. Annotation: discovering the original pictures of the puzzle. Assign each read to an organism or to a taxonomic identifier.
  • 10. Taxonomy: a biological classification. Linnaean taxonomy: a formal system for classifying and naming living things, based on a simple hierarchical structure in which similar elements are grouped together. Rank: a level in the hierarchy (e.g. kingdom, phylum, class, order, family, genus, species). Taxon: a unit of the hierarchy (a group of similar living things).
  • 11. Section 2: Taxonomic-annotation Algorithms
  • 12. Lowest Common Ancestor (LCA) algorithm. For each read r of the metagenome: (1) compare r with reference sequences (e.g. with BLASTX); (2) assign r to the lowest common taxonomic ancestor of the species of its matching hits H_i. (Figure: example taxonomy tree with hits H1 to H12 and their LCA.)
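The LCA step can be sketched in Python as follows. This is a simplified sketch, assuming each BLASTX hit of a read has already been mapped to a lineage ordered from the highest to the lowest rank; the function name `lowest_common_ancestor` and the example lineages are illustrative, and real LCA tools typically also filter hits by score before taking the LCA.

```python
from typing import Optional, Sequence


def lowest_common_ancestor(lineages: Sequence[Sequence[str]]) -> Optional[str]:
    """Return the deepest taxon shared by all hit lineages, or None if there is none.

    Each lineage is ordered from the highest rank (e.g. kingdom) to the lowest (e.g. species).
    """
    if not lineages:
        return None  # no hits: the read stays unassigned
    lca: Optional[str] = None
    # zip() walks the lineages rank by rank and stops at the shortest one
    for taxa_at_rank in zip(*lineages):
        if all(taxon == taxa_at_rank[0] for taxon in taxa_at_rank):
            lca = taxa_at_rank[0]
        else:
            break  # lineages diverge: the previous rank holds the LCA
    return lca


# Illustrative hits of one read to two species of the same family
ecoli = ["Bacteria", "Proteobacteria", "Gammaproteobacteria", "Enterobacterales",
         "Enterobacteriaceae", "Escherichia", "Escherichia coli"]
salmonella = ["Bacteria", "Proteobacteria", "Gammaproteobacteria", "Enterobacterales",
              "Enterobacteriaceae", "Salmonella", "Salmonella enterica"]
print(lowest_common_ancestor([ecoli, salmonella]))  # Enterobacteriaceae
```

A read whose hits fall in two different genera is therefore annotated only at family level, which is why LCA tends to leave few reads at low ranks.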
  • 13. LCA: pros and cons. Pros: higher accuracy than the BLASTX best hit; assigning a read to a taxon (rather than to a single organism) is more realistic with short reads. Cons: few reads are assigned at low ranks; many reads remain unassigned. How can we improve it?
  • 15. MTR: Multiple Taxonomic Rank based clustering. Goal: rank-level taxonomic annotation of short metagenomic reads. Each read is assigned from the highest rank down to the lowest feasible rank, and the assignments of different reads depend on each other.
  • 16. MTR algorithm scheme (top-down strategy): 1. Compare the reads R with reference proteins (we used BLASTX and the NCBI-NR database). 2. For each rank j, from the highest to the lowest: (a) T ← {taxa at rank j of the proteins matching R}; (b) annotate by clustering R into clusters C_i, each C_i corresponding to a taxon t_i ∈ T; (c) remove from R the reads whose classification is incoherent with the classifications at higher ranks. 3. For each rank j, from the lowest to the highest: (a) take a majority vote on the clusters' intersections at rank j; (b) make the higher-rank classifications coherent with the majority-vote results.
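Step 2(c) can be illustrated with a small Python sketch. It assumes, as a simplification, that every read has a single taxon at rank j and at the rank above, and that a hypothetical `parent_of` dictionary maps each rank-j taxon to its parent taxon; the MTR implementation itself may represent reads and the taxonomy differently.

```python
from typing import Dict


def remove_incoherent(assign_j: Dict[str, str],
                      assign_above: Dict[str, str],
                      parent_of: Dict[str, str]) -> Dict[str, str]:
    """Keep only reads whose rank-j taxon is a child of the taxon they received one rank higher."""
    coherent = {}
    for read, taxon in assign_j.items():
        # A read is coherent if its rank-j taxon sits under its higher-rank taxon
        if read in assign_above and parent_of.get(taxon) == assign_above[read]:
            coherent[read] = taxon
    return coherent


family_of_read = {"r1": "Enterobacteriaceae", "r2": "Vibrionaceae"}
genus_of_read = {"r1": "Escherichia", "r2": "Escherichia"}
parent_of = {"Escherichia": "Enterobacteriaceae"}
print(remove_incoherent(genus_of_read, family_of_read, parent_of))  # {'r1': 'Escherichia'}
```

Read r2 is dropped because its genus-level cluster (Escherichia) does not belong to the family it was assigned at the rank above.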
  • 19. MTR: annotation via combinatorial optimization. For each rank j: for each taxon t_i of rank j, create a cluster C_i ⊆ R of the reads similar to taxon t_i. Set Covering Problem: select a collection of clusters (taxa) such that no sequence is left outside and the number of selected clusters is minimal. If C_i is selected, the sequences of C_i are assigned to t_i. (Example: an incidence matrix of reads s1 to s10 against clusters C1 to C6, and the clustering solution selected from it.)
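To make the set-covering formulation concrete, here is a brute-force Python sketch that returns a minimum-size collection of clusters covering every read. It is exponential in the number of clusters and only suitable for toy examples like the one on the slide; the slides do not say which solver MTR uses, and for realistic inputs an integer-programming solver or the classic greedy approximation would be the usual choice. The cluster contents below are made up.

```python
from itertools import combinations
from typing import Dict, List, Set


def minimum_set_cover(clusters: Dict[str, Set[str]]) -> List[str]:
    """Return a smallest list of cluster names whose union covers every read in any cluster."""
    universe: Set[str] = set().union(*clusters.values())
    names = sorted(clusters)
    # Try collections of increasing size; the first one that covers everything is minimal
    for k in range(len(names) + 1):
        for combo in combinations(names, k):
            covered = set().union(*(clusters[name] for name in combo))
            if covered == universe:
                return list(combo)
    return names  # unreachable: all clusters together always cover the universe


clusters = {
    "C1": {"s1", "s2", "s3"},
    "C2": {"s3", "s4"},
    "C3": {"s4", "s5", "s6"},
    "C4": {"s1", "s4"},
}
selected = minimum_set_cover(clusters)
print(selected)  # ['C1', 'C3']
# The reads of each selected cluster C_i would then be assigned to its taxon t_i
```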
  • 21. MTR vs LCA: MTR is better w.r.t. quantity. MTR annotates more reads than LCA. Simulated data: MTR annotates 8% to 37% more reads (28% to 89% more at rank Genus). Real-life data: 15% to 30% more reads (120% to 208% more at rank Species). Experiments: 12 simulated datasets and 3 real-life datasets (100 bp reads).
  • 22. MTR vs LCA: LCA has better accuracy. Accuracy (%) and number of reads assigned, for each rank.
    Data M3 (coverage 4X, total reads 166,978):
    Rank       MTR (# of reads)     LCA (# of reads)
    Kingdom    100.00 (166,948)     99.99 (155,263)
    Phylum      99.86 (166,948)     99.93 (155,258)
    Class       99.73 (166,936)     99.81 (141,829)
    Order       97.67 (166,148)     98.14 (115,732)
    Family      97.62 (165,231)     98.04 (110,488)
    Genus       97.42 (140,476)     98.35 (110,139)
    Data M2 (coverage 1X, total reads 288,730):
    Rank       MTR (# of reads)     LCA (# of reads)
    Kingdom     95.07 (88,537)      94.66 (73,176)
    Phylum      93.21 (88,537)      92.57 (73,169)
    Class       89.25 (87,635)      88.98 (60,294)
    Order       89.24 (85,657)      88.44 (57,373)
    Family      77.35 (81,366)      81.84 (48,760)
    Genus       61.36 (77,307)      74.60 (40,823)
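The read-count gain can be checked against the rank Genus rows of these tables with a few lines of Python: MTR annotates about 28% more reads than LCA on M3 and about 89% more on M2, consistent with the 28% to 89% range reported for rank Genus. The sketch also multiplies accuracy by the number of assigned reads to estimate how many reads each method assigns correctly; this assumes the accuracy column is the percentage of assigned reads that are correct, which the slide does not state explicitly.

```python
def relative_increase(mtr_reads: int, lca_reads: int) -> float:
    """Percentage of extra reads MTR annotates compared with LCA."""
    return (mtr_reads - lca_reads) / lca_reads * 100


def estimated_correct(accuracy_pct: float, assigned_reads: int) -> float:
    """Expected correctly assigned reads, assuming accuracy = % of assigned reads that are correct."""
    return accuracy_pct / 100 * assigned_reads


# Rank Genus values from the M3 and M2 tables
print(round(relative_increase(140_476, 110_139)))  # 28
print(round(relative_increase(77_307, 40_823)))    # 89
print(round(estimated_correct(61.36, 77_307)), round(estimated_correct(74.60, 40_823)))  # 47436 30454
```

Under that assumption, MTR's lower genus-level accuracy on M2 still corresponds to roughly 47,000 correctly assigned reads, against roughly 30,000 for LCA.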
  • 23. MTR vs LCA: MTR gives a better population distribution. (Figure: population distributions at rank Genus for dataset M2, coverage 0.1X.)
  • 24. Population distributions (rank Genus) of the Coral dataset, as number of reads (percentage):
    Genus              MTR              LCA
    Acinetobacter      1031 (9.03%)     944 (20.15%)
    Aspergillus        279 (2.44%)      80 (1.71%)
    Gibberella         3492 (30.57%)    1804 (38.51%)
    Neurospora         133 (1.16%)      76 (1.62%)
    Podospora          80 (0.70%)       76 (1.62%)
    Chaetomium         146 (1.28%)      105 (2.24%)
    T4-like viruses    57 (0.50%)       57 (1.22%)
    Porites            4540 (39.75%)    643 (13.72%)
    Phaeosphaeria      90 (0.79%)       51 (1.09%)
    Magnaporthe        313 (2.74%)      169 (3.61%)
    Nitrosopumilus     128 (1.12%)      76 (1.62%)
    Others             1133 (9.92%)     604 (12.89%)
  • 25. Conclusions. MTR outperforms LCA in two ways: more sequences are annotated, especially at low ranks, and the population distribution is estimated better. LCA tends to be more accurate.
  • 27. Future developments: replace BLASTX with a composition-based similarity measure; add constraints on cluster selection, e.g. consistent coverage depth on proteins, or constraints on the coverage of genome locations.
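As one illustration of what a composition-based similarity measure could look like, the Python sketch below compares sequences by the cosine similarity of their k-mer frequency profiles (tetranucleotides by default). The choice of k-mer profiles, k = 4 and cosine similarity are assumptions for illustration, not the measure proposed in the slides.

```python
import math
from collections import Counter
from itertools import product
from typing import List


def kmer_profile(seq: str, k: int = 4) -> List[float]:
    """Relative frequencies of all 4**k DNA k-mers in seq (k-mers with non-ACGT symbols are ignored)."""
    seq = seq.upper()
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    total = sum(counts[kmer] for kmer in kmers)
    return [counts[kmer] / total if total else 0.0 for kmer in kmers]


def cosine_similarity(u: List[float], v: List[float]) -> float:
    """Cosine of the angle between two profiles; 0.0 if either profile is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


read = "ATGCGCGATTACGCGGCTAGCATGC"
print(round(cosine_similarity(kmer_profile(read), kmer_profile(read)), 6))  # 1.0
```

Unlike a BLASTX search, this needs no reference protein database, but short reads give sparse profiles, so in practice such a measure would likely be combined with other evidence.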
  • 28. Section 3: Metagenomics to Retrieve a Bacterium
  • 31. Metagenomic sequencing to acquire an organism: Candidatus Brocadia fulgida. The Brocadia genome had not been sequenced before. Sequencing platforms (mean read length): Sanger Shotgun (800 bp), Sanger Fosmid (800 bp), 454 GS20 (200 bp). First standard annotation: each read is assigned to its BLASTX best hit, and a read is assigned to Brocadia if its best hit is Kuenenia, a close relative of Brocadia (since no Brocadia genome was available as a reference).
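The best-hit annotation described above, including the rule that a Kuenenia best hit counts as Brocadia, can be sketched in Python. The input format (tuples of read ID, hit taxon and bit score), the function name and the scores below are made up for illustration; in practice these values would come from parsed BLASTX output.

```python
from typing import Dict, Iterable, Tuple


def best_hit_assignment(hits: Iterable[Tuple[str, str, float]],
                        proxy: Dict[str, str]) -> Dict[str, str]:
    """Assign each read to the taxon of its highest-scoring hit, then apply proxy renaming."""
    best: Dict[str, Tuple[str, float]] = {}
    for read_id, taxon, bitscore in hits:
        if read_id not in best or bitscore > best[read_id][1]:
            best[read_id] = (taxon, bitscore)
    # e.g. a best hit to Kuenenia is counted as Brocadia, its close relative
    return {read_id: proxy.get(taxon, taxon) for read_id, (taxon, _) in best.items()}


hits = [
    ("r1", "Kuenenia", 120.0), ("r1", "Planctomyces", 80.0),
    ("r2", "Nitrosomonas", 95.0), ("r2", "Kuenenia", 60.0),
]
print(best_hit_assignment(hits, proxy={"Kuenenia": "Brocadia"}))
# {'r1': 'Brocadia', 'r2': 'Nitrosomonas'}
```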
  • 33. Why do FISH (fluorescence in situ hybridization) analysis and BLASTX annotation not agree?
  • 34. 80% of the cells are Brocadia, but... Brocadia seems underrepresented. Are we sure? Can we still extract significant information?
    Assigned to Brocadia    Shotgun    Fosmid    454
    Reads                   9.68%      13.76%    12.92%
    Base pairs              9.76%      14.33%    11.34%
    Let's do some composition-based analyses...
  • 36. A different point of view: GC content [Bernaola-Galvan et al., Gene, 2004]. Different organisms can have very different GC content (16.6% to 74.9%). If a genome is partitioned into equally sized, non-overlapping sequences, the GC content of these sequences is approximately normally distributed, and the distribution is centered on the organism's GC content.
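The windowing idea can be sketched in Python: split a genome into equally sized, non-overlapping windows, compute the GC content of each, and summarise the resulting distribution. The window size, dropping a trailing partial window, and ignoring non-ACGT symbols are assumptions of this sketch; the toy genome is made up.

```python
import statistics
from typing import List


def gc_fraction(seq: str) -> float:
    """Fraction of G and C among the unambiguous bases (A, C, G, T) of seq."""
    seq = seq.upper()
    acgt = sum(seq.count(base) for base in "ACGT")
    return (seq.count("G") + seq.count("C")) / acgt if acgt else 0.0


def window_gc(genome: str, window: int) -> List[float]:
    """GC content of consecutive non-overlapping windows; a trailing partial window is dropped."""
    return [gc_fraction(genome[i:i + window])
            for i in range(0, len(genome) - window + 1, window)]


genome = "GGCCAATT" * 2 + "GGGCCTAT" * 2
values = window_gc(genome, window=8)
print(values)                   # [0.5, 0.5, 0.625, 0.625]
print(statistics.mean(values))  # 0.5625
```

For a real genome and a reasonable window size, a histogram of these values is what the slide describes as approximately normal around the genome-wide GC content.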
  • 37. Bias toward high-GC-content organisms. (Figure: GC-content histograms for the 454, Fosmid and Shotgun datasets, with GC content (0% to 100%) on the x-axis and frequency on the y-axis; series: raw reads, annotated reads, Brocadia, Alphaproteobacteria, Betaproteobacteria.)
  • 38. We saw that Brocadia is underrepresented... How can we cope with that?
  • 39. The sets of well-recovered Kuenenia ORFs differ between technologies: Shotgun (Sanger), Fosmid (Sanger), 454. [Figure: extended Venn diagram of the Brocadia Open Reading Frames retrieved for 80% of their length, per technology.]
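The comparison behind this Venn diagram has two steps. First, decide whether an ORF is "retrieved", meaning the union of aligned reads covers at least 80% of its length. Then, count the ORFs in each exclusive region of the diagram.

This is a minimal sketch under stated assumptions. Alignments are given as hypothetical half-open `(start, end)` intervals on the reference, and ORFs as a hypothetical `{name: (start, end)}` dict.

```python
from itertools import combinations


def covered_fraction(orf_start, orf_end, alignments):
    """Fraction of the ORF [orf_start, orf_end) covered by at least one alignment."""
    clipped = sorted((max(s, orf_start), min(e, orf_end))
                     for s, e in alignments if s < orf_end and e > orf_start)
    covered, cur_start, cur_end = 0, None, None
    for s, e in clipped:
        if cur_end is None or s > cur_end:
            # A gap: close the current merged interval and start a new one.
            if cur_end is not None:
                covered += cur_end - cur_start
            cur_start, cur_end = s, e
        else:
            # Overlapping or adjacent: extend the current merged interval.
            cur_end = max(cur_end, e)
    if cur_end is not None:
        covered += cur_end - cur_start
    return covered / (orf_end - orf_start)


def recovered_orfs(orfs, alignments, threshold=0.8):
    """Names of ORFs covered for at least `threshold` of their length."""
    return {name for name, (s, e) in orfs.items()
            if covered_fraction(s, e, alignments) >= threshold}


def venn_regions(sets_by_tech):
    """Number of ORFs in each exclusive Venn region, keyed by technology combination."""
    techs = list(sets_by_tech)
    regions = {}
    for k in range(1, len(techs) + 1):
        for combo in combinations(techs, k):
            inside = set.intersection(*(sets_by_tech[t] for t in combo))
            outside = set().union(*(sets_by_tech[t] for t in techs if t not in combo))
            regions[combo] = len(inside - outside)
    return regions
```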
  • 40. Depth of coverage: correlation on the same ORF. [Figure: percentage of ORFs (0% to 100%) per correlation bin (-1 to -0.7, -0.7 to -0.3, -0.3 to 0, 0 to 0.3, 0.3 to 0.7, 0.7 to 1) for the pairs Shotgun vs. Fosmid, Shotgun vs. 454 and Fosmid vs. 454; legend: similar, different.]
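The figure compares, ORF by ORF, how depth of coverage varies along the same ORF in two technologies. A minimal sketch of that computation is below. It computes the Pearson correlation of the two per-base depth profiles and then the share of ORFs in each of the slide's correlation bins.

Two choices here are assumptions, not details from the study: the input format (hypothetical `{orf: [per-base depth]}` dicts) and skipping ORFs whose depth is constant in either technology.

```python
import statistics

# Bin edges taken from the figure's x-axis; upper edges are inclusive.
BIN_EDGES = [-1.0, -0.7, -0.3, 0.0, 0.3, 0.7, 1.0]


def pearson(x, y):
    """Pearson correlation of two equal-length profiles (None if either is constant)."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return None
    return sxy / (sxx * syy) ** 0.5


def correlation_bin(r):
    """Index of the BIN_EDGES interval containing r."""
    for i in range(1, len(BIN_EDGES)):
        if r <= BIN_EDGES[i]:
            return i - 1
    return len(BIN_EDGES) - 2  # guards against rounding slightly above 1


def binned_correlations(depth_a, depth_b):
    """Fraction of shared ORFs in each correlation bin for two technologies."""
    counts = [0] * (len(BIN_EDGES) - 1)
    n = 0
    for orf in depth_a.keys() & depth_b.keys():
        r = pearson(depth_a[orf], depth_b[orf])
        if r is not None:
            counts[correlation_bin(r)] += 1
            n += 1
    return [c / n for c in counts] if n else counts
```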
  • 41. Future developments: tuning sequencing to the specific community; integrating composition-based analysis with BLASTX annotation.
  • 44. "The antibiotic alarm", Nature, 14 March 2013: the rise of resistance (inevitable) and the decline of drug development (economics).
  • 45. Waiting for new drugs... How can we cope with it? Multi-drug treatments; new therapies (dosage, duration); personalised medicine (e.g. infecting strain, patient PK/PD, patient genotype).
  • 46. Idea: drug switching. Experiments. Treatments: sequential (switching drug), 50%-50% cocktail, control (no drugs). Protocol: in each season, bacteria grow in liquid medium with drug; 1% of the bacteria are transferred to the next season; 3 replicates. Duration: 96 hours (8 seasons of 12 hours). Drugs: doxycycline, erythromycin. Sequencing after 24 h and 96 h: 18 datasets (red border).
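The numbers on this slide follow from the design. The sketch below assumes that each sequenced dataset is one (treatment, replicate, time point) combination; the treatment labels are illustrative.

```python
from itertools import product

TREATMENTS = ["sequential", "50%-50% cocktail", "control"]
REPLICATES = [1, 2, 3]
SEASON_HOURS = 12
DURATION_HOURS = 96
SEQUENCED_AT = [24, 96]  # hours after the start of the experiment

# 96 h split into 12 h seasons.
n_seasons = DURATION_HOURS // SEASON_HOURS

# Season that has just ended at each sequencing time point.
seasons_at_sequencing = [h // SEASON_HOURS for h in SEQUENCED_AT]

# One dataset per treatment x replicate x time point.
datasets = list(product(TREATMENTS, REPLICATES, SEQUENCED_AT))

if __name__ == "__main__":
    print(n_seasons, seasons_at_sequencing, len(datasets))
```

Under this assumption, 3 treatments × 3 replicates × 2 time points gives the 18 datasets on the slide.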
  • 47. My role: construct an annotated reference genome [custom pipeline]; for each replicate, identify Structural Variations (SVs) [Pindel], Copy Number Variations (CNVs) [CNVnator] and Single Nucleotide Polymorphisms (SNPs) [VarScan].
  • 48. Results (CNV): a 412 kb duplicated region at 96 hours. [Figure: normalised coverage (1000 bins) along the genome (0 to 4.5 Mb) at 96 hours for Control and ERY/DOX 50-50, with Mean and +1 reference lines; labelled loci: ampE, rrnH, paoC, tauA, ybbJ, mdtG, atoB, rrnG, yqhC, rng, rrnD, rrnC, ubiD, rrnA, rhaM, rrnB, rrnE, slt.] Efflux-pump duplications (!): this region includes the multidrug efflux pump AcrRAB-TolC [Peña-Miller et al., PLOS Biology, 2013]. [Figure: coverage ratio inside/outside the duplicated region at 24 and 96 hours; Dox/Ery: p < 0.0001; 50%-50%: p < 0.0001; Control: p = 0.303 (t-test).]
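Both panels rest on simple coverage arithmetic. The functions below sketch three pieces: per-base depth averaged into 1000 bins and normalised so the mean bin is 1; the mean depth inside a candidate region divided by the mean depth outside it (about 2 for a full duplication); and a two-sample t statistic.

The slide does not say which t-test variant was used. The Welch statistic here is an assumption; `scipy.stats.ttest_ind(a, b, equal_var=False)` would give the same statistic together with a p-value. This is not CNVnator's algorithm, only the arithmetic behind the plots.

```python
import statistics


def binned_coverage(depth, n_bins=1000):
    """Mean depth per bin, normalised so that the mean bin value is 1."""
    n = len(depth)
    if n < n_bins:
        raise ValueError("need at least one position per bin")
    bins = [statistics.fmean(depth[i * n // n_bins:(i + 1) * n // n_bins])
            for i in range(n_bins)]
    mean = statistics.fmean(bins)
    return [b / mean for b in bins]


def inside_outside_ratio(depth, start, end):
    """Mean depth inside [start, end) divided by mean depth outside it."""
    inside = depth[start:end]
    outside = depth[:start] + depth[end:]
    return statistics.fmean(inside) / statistics.fmean(outside)


def welch_t(a, b):
    """Welch's t statistic for mean(b) - mean(a); each sample needs >= 2 values."""
    va, vb = statistics.variance(a), statistics.variance(b)
    return (statistics.fmean(b) - statistics.fmean(a)) / (va / len(a) + vb / len(b)) ** 0.5


if __name__ == "__main__":
    # Toy depth profile with a doubled 412-position stretch (illustration only).
    depth = [10] * 1000 + [20] * 412 + [10] * 588
    print(inside_outside_ratio(depth, 1000, 1412))
```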
  • 49. Conclusion: sequential treatments work well in vitro where cocktails fail. Genomics: antibiotics prevent mutations. Further developments (omics): role of phages in the region duplication; timing of the region duplication; NGS of additional treatments; transcriptomics.
  • 51. Standard approach: de novo assembly & annotation. Solve the jigsaw puzzle, then do functional annotation, using software and manual work. Common problems: errors (misassembled repetitive regions, wrong order/orientation), annotation quality, fragmentation. Quality depends on time & money. 2014: automated genome assembly for less than $1,000 [Koren & Phillippy, Curr Opin Microbiol, 2015].
  • 52. The alternative: a tailored reference. Take the reference genome of a close relative, modify it according to the sequencing data, and import the annotation from the reference. Pros: less fragmentation, higher quality, better annotation. Cons: you need a close relative; steps must be checked visually; ad hoc scripting; conservative approach.
  • 53. Our case. Sequenced organism: E. coli K-12 AG100, grown for 24h in M9 medium. Reference genome: E. coli K-12 MG1655 (available online). Data (preprocessed): reads mapping to the MG1655 reference: 95.84%; mean depth of coverage: 88.19x (based on MG1655); read length min/max/mean (bp): 15 / 99 / 72.17.
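The summary figures on this slide can be computed from per-read lengths and mapping status. This is a minimal sketch with hypothetical inputs (a list of read lengths and a parallel list of mapped flags). Mean depth is approximated as mapped bases divided by genome length, which ignores soft-clipped or unaligned read ends.

```python
def mapping_stats(read_lengths, mapped_flags, genome_length):
    """Return (% reads mapped, mean depth of coverage, (min, max, mean) read length)."""
    n = len(read_lengths)
    mapped_bp = sum(length for length, m in zip(read_lengths, mapped_flags) if m)
    pct_mapped = 100 * sum(mapped_flags) / n
    depth = mapped_bp / genome_length
    length_stats = (min(read_lengths), max(read_lengths), sum(read_lengths) / n)
    return pct_mapped, depth, length_stats


if __name__ == "__main__":
    print(mapping_stats([100, 50, 50, 100], [True, True, False, True], 1000))
```

As a rough check, assume the MG1655 chromosome length of 4,641,652 bp. A mean depth of 88.19x then corresponds to about 409 Mbp of mapped sequence.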
  • 54. The pipeline: read preprocessing (standard); mapping to the MG1655 reference (standard); calling Structural Variations (SVs); assembling unmapped and mapped data; making an intermediate reference; checking SVs and calling SNPs; functional annotation.
  • 55. Clean & align reads: read preprocessing [fastq-mcf, samtools]; mapping to the reference [BWA, IGV].
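fastq-mcf mainly clips adapters and can also trim by quality. The sketch below shows only a generic 3'-end quality trim to illustrate the idea; it is not fastq-mcf's algorithm.

Several values are assumptions: the Phred+33 encoding, the Q20 cutoff, and the 15 bp minimum length. The minimum length is borrowed from the shortest read reported for this dataset.

```python
def phred_scores(qual, offset=33):
    """Decode a FASTQ quality string (Phred+33 by default)."""
    return [ord(c) - offset for c in qual]


def trim_3prime(seq, qual, min_q=20, min_len=15):
    """Trim bases below `min_q` from the 3' end; drop the read if shorter than `min_len`.

    Returns (seq, qual) or None when the read is discarded.
    """
    scores = phred_scores(qual)
    end = len(seq)
    while end > 0 and scores[end - 1] < min_q:
        end -= 1
    if end < min_len:
        return None
    return seq[:end], qual[:end]


if __name__ == "__main__":
    # 'I' encodes Q40 and '#' encodes Q2 in Phred+33.
    print(trim_3prime("A" * 20, "I" * 16 + "#" * 4))
```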
  • 57. Structural Variations (SVs): use Pindel to call SVs (deletions, insertions, inversions, translocations, indels, break points). After visual checking [IGV]: 5 deletions (47 kbp in total), 9 indels, 9 break points.
  • 59. Applying SVs and assembling unmapped reads: take the close relative's genome; break it into sequences by applying the SVs; extract the reads around the removed regions; extract the reads not mapped to the reference; assemble (∪) → scaffold (∪) [Python & Bash scripting, Samtools, Velvet, SSPACE, GapFiller].
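The read-selection and genome-splitting steps can be sketched as below. This is a hypothetical illustration, not the author's scripts. Deletions are 0-based half-open intervals, mapped reads are given as `{read_id: (start, end)}`, and the 500 bp flank is an assumed parameter. Unmapped reads are taken from SAM records whose FLAG has bit 0x4 ("segment unmapped") set.

```python
def apply_deletions(reference, deletions):
    """Split `reference` into the pieces left after removing deleted intervals.

    Returns a list of (start, sequence) for the retained pieces.
    """
    pieces, pos = [], 0
    for start, end in sorted(deletions):
        if start > pos:
            pieces.append((pos, reference[pos:start]))
        pos = max(pos, end)
    if pos < len(reference):
        pieces.append((pos, reference[pos:]))
    return pieces


def reads_near_deletions(alignments, deletions, flank=500):
    """IDs of mapped reads overlapping a window of `flank` bp around any deletion."""
    windows = [(max(0, s - flank), e + flank) for s, e in deletions]
    return {r for r, (a, b) in alignments.items()
            if any(a < we and b > ws for ws, we in windows)}


def unmapped_read_ids(sam_lines):
    """Read IDs whose SAM FLAG (column 2) has bit 0x4 set."""
    ids = set()
    for line in sam_lines:
        if line.startswith("@"):  # header lines
            continue
        fields = line.split("\t")
        if int(fields[1]) & 0x4:
            ids.add(fields[0])
    return ids
```

The union of the two read sets is what would go into the assembler.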
  • 61. Making the intermediate and tailored references. Making the intermediate reference: order the scaffolds w.r.t. the reference [Mauve]; concatenate the 13 aligned scaffolds [Bash one-liner]. Making the tailored reference: look for SVs (none should be present); call SNPs [VarScan, vcftools]. Annotation: export the annotation from the reference [RATT]; adjust and annotate missing parts [RAST, manual editing]; make the file ENA-compatible [Python script].
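The concatenation step (done with a Bash one-liner in the original work) amounts to joining the scaffolds in the order produced by Mauve into a single FASTA record. Below is a minimal Python equivalent.

The record name, the 60-column line width and the optional `spacer` are hypothetical choices. With the default empty spacer, scaffolds are joined directly.

```python
def read_fasta(lines):
    """Parse FASTA lines into {name: sequence}, keeping file order."""
    records, name = {}, None
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            name = line[1:].split()[0]  # ID up to the first whitespace
            records[name] = []
        else:
            records[name].append(line)
    return {n: "".join(parts) for n, parts in records.items()}


def concatenate_scaffolds(records, order, name="intermediate_reference", width=60, spacer=""):
    """Join scaffolds in `order` into one FASTA record wrapped at `width` columns."""
    seq = spacer.join(records[s] for s in order)
    body = [seq[i:i + width] for i in range(0, len(seq), width)]
    return "\n".join([">" + name] + body) + "\n"


if __name__ == "__main__":
    recs = read_fasta([">s1 desc", "ACGT", "AC", ">s2", "GGG"])
    print(concatenate_scaffolds(recs, ["s2", "s1"], name="x", width=4), end="")
```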
  • 62. "In my experience, people do not look at assemblies critically enough" [Nature Methods, 2012]. Clean results need designed protocols, time, and money. Leaps forward have been made recently, but the sequencing cost is still not very low [Nature Methods, June 2013; Koren & Phillippy, Curr Opin Microbiol, 2015].
  • 63. Combining technologies improved Kuenenia ORF retrieval. [Figure: number of ORFs retrieved (0 to 2000) vs. threshold of mapping percentage (0% to 100%) for Shotgun, Fosmid, 454, the pairs Shotgun+Fosmid, Shotgun+454 and Fosmid+454, and all three combined.]
  • 64. Ribosomal SNPs: mutations in the control. Hypothesis: antibiotics slow down adaptation for optimal growth in culture; rapid growth heightens ribosomal demand [Condon et al., J Bacteriol 1995]. Table: % mean variant frequency (number of replicates, if not all) for the 50%-50%, Dox/Ery and Control treatments at 24h and 96h; per operon: position / relative position / frequencies. rrnH: 226,521 / 595 / 5(2); 227,791 / 1,865 / 3(1), 17. rrnG: 2,723,624 / 1,865 / 3(1), 9; 2,724,894 / 595 / 8. rrnD: 3,421,431 / 1,865 / 4(1), 13; 3,422,701 / 595 / 8. rrnC: 3,940,810 / 595 / 4(1), 17. rrnA: 4,034,586 / 555 / 7. rrnB: 4,165,708 / 595 / 4(1), 8; 4,166,978 / 1,865 / 10. rrnE: 4,207,110 / 595 / 3(1), 9; 4,208,380 / 1,865 / 5(1), 7. The SNPs differ significantly in frequency (ANOVA). Maybe these ribosomal mutations help with α-amino acid starvation, because...
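The significance test mentioned on this slide is a one-way ANOVA across treatments. A minimal sketch of the F statistic is below, assuming the groups are per-replicate variant frequencies, one group per treatment. The numbers in the example are toy values, not the study's data. `scipy.stats.f_oneway(*groups)` returns the same statistic together with its p-value.

```python
import statistics


def one_way_anova_f(*groups):
    """F statistic of a one-way ANOVA: between-group over within-group mean square."""
    all_values = [x for g in groups for x in g]
    grand_mean = statistics.fmean(all_values)
    k, n = len(groups), len(all_values)
    ss_between = sum(len(g) * (statistics.fmean(g) - grand_mean) ** 2 for g in groups)
    ss_within = sum((x - statistics.fmean(g)) ** 2 for g in groups for x in g)
    return (ss_between / (k - 1)) / (ss_within / (n - k))


if __name__ == "__main__":
    # Toy frequencies (%) for three treatments with three replicates each.
    print(one_way_anova_f([1, 2, 3], [4, 5, 6], [7, 8, 9]))
```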
  • 65. tauA is expressed only under conditions of sulfate or cysteine (α-amino acid) starvation [Eichhorn et al., J Bacteriol, 2000]. yqhC regulates a scavenger of toxic aldehydes produced by lipid peroxidation [Jarboe et al., Appl Microbiol Biotechnol, 2011]. Table: % mean variant frequency (number of replicates, if not all) for the 50%-50%, Dox/Ery and Control treatments at 24h and 96h; genes in the DUPLICATED REGION: tauA, position 384,897, frequencies 19(1), 68, annotation: taurine transport system; yqhC, position 3,151,384, frequency 45, annotation: putative AraC-type regulatory protein.