SnapperDB ASM NGS conference


Philip Ashton

My talk at #ASMNGS on using snapperdb to do bacterial phylogenetics.


SnapperDB: A Scalable Database for Routine Sequencing of Bacterial Isolates
Philip Ashton
Bioinformatician
Gastrointestinal Bacteria Reference Unit

GitHub + CLIMB image
• Download the code from https://github.com/PHE-GIDIS/SnapperDB.git
• Spin up an image (if you have a CLIMB account): http://birmingham.climb.ac.uk/ (image: ashton-phe-snpdb-client)
• If you have no idea what "spin up an image" means:
  • http://bitsandbugs.org/2015/05/13/climb-hackathon-outcome/
  • Google 'bits and bugs + climb'
  • CLIMB Google group

Challenges:
• Many different eBURST groups (EBGs, groups of STs), which have to be analysed separately
• Hundreds of strains a week
• Rapid, hands-off analysis
Solution – SnapperDB:
[Diagram: sample FASTQs (with ST) go into SnapperDB, which sends each sample to the db for its eBURST group: EBG 1 - Typhimurium, EBG 4 - Enteritidis, EBG 13 - Typhi, EBG 3 - Newport, EBG 11 - Paratyphi A, … Timings marked on the diagram: 30 mins - parallel; 5 min – 1 hour.]

[Diagram: FASTQs -> SNPdb (PostgreSQL database) -> SNP alignments -> Trees]
• FASTQs to SNPdb: SnapperDB.py fastq_to_db (fastq_to_vcf, vcf_to_db)
• SNPdb to SNP alignments: SnapperDB.py get_the_snps
• SNP alignments to trees: RAxML, FastTree

SNPdb Schema
Parsing the VCF yields:
• Ignored positions (ambiguous mapping, low coverage)
• Variants

Strain SNPs
id  name        variants_id     ignored_pos
1   H123456789  [1,2,3,4,5,…]   [9985, 856142, …]

Variants
id  position  ref_base  var_base
1   235214    A         T
2   455544    T         C
…
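
The two tables above can be mirrored in a few lines of Python to show how a strain's base call at any position might be recovered from them. This is only a sketch, not SnapperDB's own code: the class names, the `base_at` helper and the reference bases passed in are hypothetical, and the strain is given just the two variants shown on the slide.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Variant:
    id: int
    position: int
    ref_base: str
    var_base: str


@dataclass
class StrainSNPs:
    id: int
    name: str
    variants_id: list   # ids pointing into the variants table
    ignored_pos: set    # positions with ambiguous mapping or low coverage


def base_at(strain, variants_by_id, position, ref_base):
    """Return the strain's base at `position` (hypothetical helper)."""
    if position in strain.ignored_pos:
        return "N"                      # no reliable call at this position
    for vid in strain.variants_id:
        variant = variants_by_id[vid]
        if variant.position == position:
            return variant.var_base     # strain carries this variant
    return ref_base                     # otherwise same as the reference


# Rows taken from the slide; only the two variants shown there are used.
variants_by_id = {
    1: Variant(1, 235214, "A", "T"),
    2: Variant(2, 455544, "T", "C"),
}
strain = StrainSNPs(1, "H123456789", [1, 2], {9985, 856142})

print(base_at(strain, variants_by_id, 235214, "A"))  # variant position
print(base_at(strain, variants_by_id, 9985, "G"))    # ignored position
```

Storing only variant ids and ignored positions per strain keeps each strain row small, because everything not listed is taken to match the reference.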

Running SnapperDB
FASTQs go in; the outputs are:
• Trees
• Pairwise distance matrix [slide placeholder: "Put picture of SNP distance matrix"]
• SNP address, which looks something like 1.1.24.36.48.128.2013
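
A pairwise distance matrix of the kind listed above can be sketched as follows. The talk does not say how SnapperDB computes distances, so this assumes a simple rule: count positions where two strains differ, skipping any position that is ignored in either strain. The strain names and the dict layout are made up for illustration.

```python
from itertools import combinations


def snp_distance(a, b):
    """Count differing positions, skipping positions ignored in either strain."""
    ignored = a["ignored"] | b["ignored"]
    positions = (set(a["variants"]) | set(b["variants"])) - ignored
    # A missing key means the strain matches the reference at that position.
    return sum(1 for p in positions if a["variants"].get(p) != b["variants"].get(p))


def distance_matrix(strains):
    """Build a symmetric matrix as nested dicts keyed by strain name."""
    names = sorted(strains)
    matrix = {n: {m: 0 for m in names} for n in names}
    for n, m in combinations(names, 2):
        matrix[n][m] = matrix[m][n] = snp_distance(strains[n], strains[m])
    return matrix


# Hypothetical strains: position -> variant base, plus ignored positions.
strains = {
    "s1": {"variants": {235214: "T", 455544: "C"}, "ignored": set()},
    "s2": {"variants": {235214: "T"}, "ignored": {9985}},
    "s3": {"variants": {455544: "C", 9985: "G"}, "ignored": set()},
}

matrix = distance_matrix(strains)
for name in sorted(matrix):
    print(name, [matrix[name][other] for other in sorted(matrix)])
```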

Revolutionising Salmonella reference microbiology
http://benfry.com/zipdecode/


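SNP address
[Diagram: a three-level cluster hierarchy. Level 1 has clusters 1 and 2, level 2 has clusters 1 to 3, and level 3 has clusters 1 to 6. The resulting SNP addresses are 1.1.1, 1.2.2, 1.2.4, 1.2.3, 2.3.5 and 2.3.6.]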
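
One way to produce hierarchical addresses like those in the diagram is to cluster the strains at a series of decreasing SNP thresholds and join the cluster numbers with dots. The sketch below assumes single-linkage clustering and uses made-up thresholds (100, 10, 0) and distances. None of these values come from the talk, and the function names are hypothetical.

```python
def single_linkage(names, dist, threshold):
    """Cluster strains so any pair within `threshold` SNPs ends up together."""
    parent = {n: n for n in names}

    def find(n):
        while parent[n] != n:
            parent[n] = parent[parent[n]]   # path compression
            n = parent[n]
        return n

    for i, a in enumerate(names):
        for b in names[i + 1:]:
            if dist[a][b] <= threshold:
                parent[find(a)] = find(b)

    # Number clusters in order of first appearance.
    labels, assigned = {}, {}
    for n in names:
        root = find(n)
        labels.setdefault(root, len(labels) + 1)
        assigned[n] = labels[root]
    return assigned


def snp_addresses(names, dist, thresholds):
    """Join the cluster number at each threshold into a dotted address."""
    levels = [single_linkage(names, dist, t) for t in thresholds]
    return {n: ".".join(str(level[n]) for level in levels) for n in names}


# Hypothetical pairwise SNP distances between four strains.
names = ["A", "B", "C", "D"]
pairs = {("A", "B"): 3, ("A", "C"): 40, ("A", "D"): 200,
         ("B", "C"): 42, ("B", "D"): 201, ("C", "D"): 205}
dist = {n: {} for n in names}
for (a, b), d in pairs.items():
    dist[a][b] = dist[b][a] = d

addresses = snp_addresses(names, dist, thresholds=[100, 10, 0])
print(addresses)
```

Because the thresholds only get tighter, each cluster at one level sits entirely inside a single cluster at the level before it, so strains that share a longer address prefix are more closely related.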

Update SNP address & detect mixed
As you add strains to a cluster, SnapperDB calculates the mean pairwise distance and standard deviation within that cluster. If a strain has a z-score of >1.75, it is flagged for quality assessment.
To assess it, look at the alignment of that strain and its near neighbours (with Ns), and exclude the strain if it introduces a large number of unique Ns to the alignment. Otherwise, these Ns can blunt the tips of the tree and reduce resolution.
Command: SnapperDB.py update_clusters
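
The z-score check described above might look roughly like this. The slide does not spell out which quantity is z-scored, so this sketch assumes it is each strain's mean distance to the other members of its cluster, compared against the spread of those means across the cluster. The function name, strain names and distances are hypothetical; only the 1.75 cutoff is from the slide.

```python
from statistics import mean, pstdev


def flag_outliers(names, dist, cutoff=1.75):
    """Return strains whose mean within-cluster distance has z-score > cutoff."""
    avg = {n: mean(dist[n][m] for m in names if m != n) for n in names}
    mu = mean(avg.values())
    sigma = pstdev(avg.values())
    if sigma == 0:
        return []                       # all strains equally distant
    return [n for n in names if (avg[n] - mu) / sigma > cutoff]


# Hypothetical cluster: five strains 2 SNPs apart, one strain 30 SNPs from all.
names = ["s1", "s2", "s3", "s4", "s5", "mixed"]
dist = {a: {b: (30 if "mixed" in (a, b) else 2) for b in names if b != a}
        for a in names}

print(flag_outliers(names, dist))
```

Flagged strains would then go through the manual check of unique Ns in the alignment; the z-score only decides which strains get looked at.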

Acknowledgements
Tim Dallman
Anthony Underwood
Aleksy Jironkin
Jon Green
Ali Al-Shabib

