SnapperDB ASM NGS conference
1. SnapperDB: A Scalable Database for Routine Sequencing of Bacterial Isolates
Philip Ashton
Bioinformatician
Gastrointestinal Bacteria Reference Unit
2. Github + CLIMB image
• Download the code from https://github.com/PHE-GIDIS/SnapperDB.git
• Spin up an image (if you have a CLIMB account) at http://birmingham.climb.ac.uk/ (image: ashton-phe-snpdb-client)
• And, if you have no idea what I mean by spin up an image:
• http://bitsandbugs.org/2015/05/13/climb-hackathon-outcome/
• Google – ‘bits and bugs + climb’
• CLIMB google group
2 SnapperDB
3. SnapperDB
Challenges:
• Many different eBURST groups (of STs) – each has to be analysed separately
• Hundreds of strains a week
• Rapid, hands-off analysis
Solution – SnapperDB:
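[Diagram: sample FASTQs (with ST) go into SnapperDB, which sends each sample to the database (db) for its eBURST group: EBG 1 - Typhimurium, EBG 4 - Enteritidis, EBG 13 - Typhi, EBG 3 - Newport, EBG 11 – Paratyphi A, … Timing labels on the diagram: 30 mins - parallel; 5 min – 1 hour]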
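The routing step in the diagram can be sketched roughly as below. This is only an illustration of the idea, not SnapperDB's own code: the function name `route_samples` and the `ST_TO_EBG` lookup are hypothetical, and the ST numbers in the lookup are placeholders rather than a curated ST-to-EBG table.

```python
from collections import defaultdict

# Hypothetical ST -> eBURST group lookup. The entries are placeholders for
# illustration only; a real table would come from a curated MLST/eBG resource.
ST_TO_EBG = {19: "EBG 1", 11: "EBG 4", 1: "EBG 13"}


def route_samples(samples, st_to_ebg=ST_TO_EBG):
    """Group (sample_name, st) pairs by the eBURST group database they belong to.

    Returns (batches, unassigned): batches maps an EBG label to the list of
    sample names destined for that database; unassigned lists samples whose
    ST is not in the lookup.
    """
    batches = defaultdict(list)
    unassigned = []
    for name, st in samples:
        ebg = st_to_ebg.get(st)
        if ebg is None:
            unassigned.append(name)
        else:
            batches[ebg].append(name)
    return dict(batches), unassigned
```

Because each eBURST group has its own database, the resulting batches are independent of one another, which is what lets the per-group analyses run in parallel.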
5. SNPdb Schema
Parse VCF into:
• Ignored positions (ambiguous mapping, low coverage)
• Variants

Strain SNPs
id  name        variants_id      ignored_pos
1   H123456789  [1,2,3,4,5,…]    [9985, 856142, …]

Variants
id  position  ref_base  var_base
1   235214    A         T
2   455544    T         C
…
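One way to see why this schema is convenient: with each strain stored as a list of variant ids plus a list of ignored positions, a pairwise SNP distance can be computed with set operations. The sketch below makes that concrete under stated assumptions. The function `snp_distance`, the in-memory dict layout, and variant id 3 are hypothetical, and the rule of skipping any position ignored in either strain is an assumption, not something the slides confirm.

```python
def snp_distance(strain_a, strain_b, variants):
    """Count positions at which two strains differ.

    Assumption: a position that is ignored in either strain (ambiguous
    mapping or low coverage) is excluded from the comparison.
    """
    ignored = set(strain_a["ignored_pos"]) | set(strain_b["ignored_pos"])
    # Variant ids carried by exactly one of the two strains.
    differing_ids = set(strain_a["variants_id"]) ^ set(strain_b["variants_id"])
    # Collapse to positions, so two different alleles at one site count once.
    positions = {variants[v]["position"] for v in differing_ids}
    return len(positions - ignored)


# Toy data shaped like the two tables above (variant 3 is made up).
variants = {
    1: {"position": 235214, "ref_base": "A", "var_base": "T"},
    2: {"position": 455544, "ref_base": "T", "var_base": "C"},
    3: {"position": 9985, "ref_base": "G", "var_base": "A"},
}
strain_a = {"name": "A", "variants_id": [1, 2], "ignored_pos": []}
strain_b = {"name": "B", "variants_id": [1, 3], "ignored_pos": [455544]}
distance = snp_distance(strain_a, strain_b, variants)
```

Here the two strains differ at positions 455544 and 9985, but 455544 is ignored in strain B, so only one position counts towards the distance.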
14. Update SNP address & detect mixed
As you add strains to a cluster, SnapperDB calculates the mean pairwise distance and standard deviation within that cluster.
If a strain has a z-score > 1.75, the strain is quality assessed.
Look at the alignment of that strain and its near neighbours (with Ns), and exclude the strain if it introduces a large number of unique Ns to the alignment.
Otherwise, these Ns can blunt the tips of the tree and reduce resolution.
SnapperDB.py update_clusters
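The z-score check might look something like the sketch below. The slides do not say exactly which quantity is z-scored, so this assumes one plausible reading: each strain's mean distance to the other cluster members is compared against the mean and standard deviation of all pairwise distances in the cluster. The function name `flag_outliers` and the `dist` layout are hypothetical, and this is not the code behind `update_clusters`.

```python
from itertools import combinations
from statistics import mean, pstdev


def flag_outliers(names, dist, threshold=1.75):
    """Return {strain: z_score} for strains whose z-score exceeds threshold.

    names: list of strain names in one cluster (at least two).
    dist:  dict mapping frozenset({a, b}) -> SNP distance between a and b.
    """
    pairwise = [dist[frozenset(pair)] for pair in combinations(names, 2)]
    mu, sigma = mean(pairwise), pstdev(pairwise)
    if sigma == 0:
        # All pairwise distances are identical, so nothing stands out.
        return {}
    flagged = {}
    for n in names:
        # Mean distance from this strain to every other cluster member.
        own = mean(dist[frozenset((n, m))] for m in names if m != n)
        z = (own - mu) / sigma
        if z > threshold:
            flagged[n] = z
    return flagged
```

Strains returned by a check like this would then go to the manual step described above: inspecting the alignment with near neighbours for unique Ns before deciding whether to exclude them.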