Robertson immemxi final March 2016

Utility of the Salmonella In Silico Typing Resource (SISTR) to Outbreak Investigations

Published in: Science

  1. Utility of the Salmonella in Silico Typing Resource (SISTR) to outbreak investigations
     James Robertson¹, Catherine Yoshida¹, Peter Kruczkiewicz², Eduardo N. Taboada² and John H. E. Nash³
     ¹ National Microbiology Laboratory @ Guelph, Public Health Agency of Canada
     ² National Microbiology Laboratory @ Lethbridge, Public Health Agency of Canada
     ³ National Microbiology Laboratory @ Toronto, Public Health Agency of Canada
  2. Salmonella is a leading public health concern
     • Salmonella is a leading food-borne pathogen both in Canada and around the world
     • Globally, there are an estimated 94 million Salmonella infections every year
     • Human costs:
       – acute illness
       – loss of life (155,000 deaths)
     • Societal costs:
       – health care costs
       – lost productivity
       – legal costs
       – impact to food industry
  3. Potential Sources
  4. Challenges in Salmonella typing and epidemiology
     • A small number of highly prevalent, globally distributed serovars account for most outbreaks (e.g. Enteritidis, Typhimurium)
     • Epidemiologically unrelated isolates within the same serovar → difficult to investigate
     • Additional subtyping resolution within a serovar is needed (e.g. phage typing)
     • Increasing use of genotypic methods (i.e. molecular typing) → driven by the need for methods with higher discriminatory power
     • A number of different approaches have been applied to molecular typing of Salmonella
  5. Discriminatory Power
     [Figure: example sequences GATCGATCGATCG and GATCAATCGATCG; typing methods arranged by discriminatory power]
     • Serotyping (Low)
       – Based on reaction of antibodies to surface antigens
       – Broad usage and common nomenclature in use since the 1930s
     • MLST (Low-Mid)
       – Multi-Locus Sequence Typing: developed by Maiden et al. (1998)
       – Indexes genetic variation in 7 core (i.e. “housekeeping”) genes
     • cgMLST (Mid-High)
       – Extends this principle to hundreds to thousands of loci
       – Provides a portable naming scheme which correlates with historical serotypes
     • wgSNPs (High)
       – Utilizes individual SNPs and gives very high resolution
       – Results are not portable to other public health professionals
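To make the contrast between SNP-based and allele-based typing concrete, here is a minimal Python sketch. It is not taken from the slides or from SISTR: the function names, the locus names and the allele numbers are hypothetical, and the sequences are assumed to be already aligned. A SNP distance counts differing positions, while an MLST/cgMLST-style distance counts loci whose allele identifiers differ.

```python
def snp_distance(seq_a, seq_b):
    """Count positions at which two aligned, equal-length sequences differ."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(1 for a, b in zip(seq_a, seq_b) if a != b)


def allele_distance(profile_a, profile_b):
    """Count loci, called in both profiles, whose allele identifiers differ.

    Profiles map a locus name to an allele identifier, or to None when the
    locus could not be called (hypothetical representation).
    """
    shared = profile_a.keys() & profile_b.keys()
    return sum(
        1
        for locus in shared
        if profile_a[locus] is not None
        and profile_b[locus] is not None
        and profile_a[locus] != profile_b[locus]
    )


# The two example sequences from the slide differ at one position.
slide_snps = snp_distance("GATCGATCGATCG", "GATCAATCGATCG")

# Hypothetical allele profiles over three made-up loci.
isolate_1 = {"locus_1": 4, "locus_2": 7, "locus_3": 2}
isolate_2 = {"locus_1": 4, "locus_2": 9, "locus_3": None}
profile_diff = allele_distance(isolate_1, isolate_2)

print(slide_snps, profile_diff)
```

Any number of SNPs inside one locus collapses into a single allele difference, which is one way to see why allele-based schemes tend to have lower resolution than whole-genome SNP analysis.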
  6. • Initial dataset of 4330 genomes
     • 94.6% concordance between predicted and reported serovar
     • in silico serovar predictions based on O and H antigens
     • cgMLST refinement of serovar assignment and analysis
     • Uses minimally processed genome assemblies
     • Very fast: ~30 seconds to process a genome
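As a rough illustration of what prediction "based on O and H antigens" can mean, the sketch below looks up a serovar from a simplified antigenic formula (O antigen, phase 1 H antigen, phase 2 H antigen). This is an assumption-laden toy, not SISTR's implementation: the table holds only two simplified formulae, the function name and the "unresolved" label are hypothetical, and the cgMLST refinement step mentioned on the slide is not modelled.

```python
# Illustrative subset of simplified antigenic formulae (O, H1, H2).
# A real scheme covers far more serovars and antigen variants.
ANTIGEN_TABLE = {
    ("9", "g,m", "-"): "Enteritidis",
    ("4", "i", "1,2"): "Typhimurium",
}


def predict_serovar(o_antigen, h1, h2):
    """Return the serovar matching the antigen triple, or 'unresolved'."""
    return ANTIGEN_TABLE.get((o_antigen, h1, h2), "unresolved")


print(predict_serovar("9", "g,m", "-"))
print(predict_serovar("4", "i", "1,2"))
print(predict_serovar("4", "z", "-"))
```

A triple that is not in the table is returned as unresolved rather than guessed; in this toy, that is the point at which an additional signal such as cgMLST would be needed.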
  7. What does SISTR do?
     • In silico analysis of WGS data
       – assembly statistics
       – serovar prediction
       – in silico typing (MLST, cgMLST)
       – AMR prediction
     • Comparative genomic analyses
       – cgMLST
       – accessory gene content
       – core SNPs
     • Epidemiologic analysis
       – geospatial distribution
       – temporal distribution
       – source association
     https://lfz.corefacility.ca/sistr-app/
  8. SISTR cgMLST
     • Current cgMLST scheme in SISTR based on 330 core genes with high “assignability” (i.e. very low levels of “missing” data)
     • Will include international Salmonella cgMLST scheme (i.e. once it is developed!)
     • cgMLST information is used to:
       – Assess quality of WGS data: complete, partial, missing loci
       – Supplement genoserotyping predictions
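The quality check described above (tallying complete, partial and missing loci) can be sketched as follows. This is a minimal illustration, not SISTR code: the function names, the locus names and the dictionary representation of locus calls are assumptions; only the scheme size of 330 and the three status labels come from the slide.

```python
from collections import Counter

SCHEME_SIZE = 330  # number of core genes in the scheme described on the slide
STATUSES = ("complete", "partial", "missing")


def locus_status_counts(calls):
    """Tally locus calls; `calls` maps a locus name to one of STATUSES."""
    unknown = set(calls.values()) - set(STATUSES)
    if unknown:
        raise ValueError(f"unexpected locus status: {sorted(unknown)}")
    counts = Counter(calls.values())
    return {status: counts.get(status, 0) for status in STATUSES}


def fraction_complete(calls, scheme_size=SCHEME_SIZE):
    """Fraction of the scheme recovered as complete loci."""
    return locus_status_counts(calls)["complete"] / scheme_size


# Hypothetical genome: 328 complete loci, one partial, one missing.
example_calls = {f"locus_{i:03d}": "complete" for i in range(328)}
example_calls["locus_328"] = "partial"
example_calls["locus_329"] = "missing"

print(locus_status_counts(example_calls))
print(round(fraction_complete(example_calls), 3))
```

A low fraction of complete loci would flag an assembly for closer inspection; the threshold to use is a policy choice that the slides do not specify.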
  9. Testing the accuracy of SISTR
     • ~45,000 Salmonella genomes were downloaded from the SRA
     • Raw reads were assembled using FLASH and SPAdes
     • Assemblies were loaded into SISTR and the predicted serovars were compared with the reported serovars (where available)
     • Assemblies were checked for contamination using Kraken
     • Quality was assessed using QUAST
  10. Recovery rates of 330 cgMLST genes from assembled SRA genomes (N=45,079)
      • Number of genomes with complete 330: 41781
      • Number of genomes with >300 genes: 1393
      • Number of genomes with <300 genes: 1905
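The counts on this slide can be turned into proportions with a few lines of Python. The counts are copied from the slide; the variable names are illustrative.

```python
# Counts as reported on the slide.
recovery_counts = {
    "complete 330": 41781,
    ">300 genes": 1393,
    "<300 genes": 1905,
}

total = sum(recovery_counts.values())
percentages = {
    label: round(100 * n / total, 1) for label, n in recovery_counts.items()
}

print(total)
for label, pct in percentages.items():
    print(f"{label}: {pct}%")
```

The three bins sum to the stated N of 45,079, and roughly 93% of assemblies yield the complete 330-gene set.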
  11. SISTR Accuracy (N=32,321)
      • 93.7% overall concordance with serovar specified
      • Concordant: 29884
      • Discordant: 2347
  12. • Two outbreaks of Salmonella Enteritidis were retrospectively sequenced
      • Examined the feasibility of applying WGS to outbreak investigations
      • Compared results of traditional molecular and microbial tests to WGS
  17. [Figure: trees from SISTR (cgMLST), PARSNP (core SNP), and the SNP tree of Wuyts et al. (2015)]
      • All three methods produce concordant trees.
      • cgMLST has a tendency to over-group
  18. Outbreak Clustering Categories
      [Figure: example trees for outbreaks A, B and C illustrating four categories: Correct; Incorrectly split; Over-grouped (e.g. A+C); Incorrectly split and grouped (e.g. A+B, A+C)]
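The four categories can be expressed as a small decision rule. The sketch below is one possible reading of the slide rather than the authors' procedure: it assumes a reference set of isolates for each outbreak (e.g. from the SNP tree) and a list of cgMLST clusters, and it treats any cluster that mixes outbreak members with other isolates as over-grouped. The function name and the isolate labels are hypothetical; the category labels follow the column headers of the concordance table.

```python
def categorize_outbreak(outbreak_members, clusters):
    """Classify how a set of clusters recovers one reference outbreak.

    outbreak_members: set of isolate IDs belonging to the outbreak.
    clusters: iterable of sets of isolate IDs (e.g. cgMLST clusters).
    """
    outbreak_members = set(outbreak_members)
    # Clusters that contain at least one isolate from this outbreak.
    hits = [set(c) for c in clusters if set(c) & outbreak_members]
    if not hits:
        raise ValueError("no cluster contains any outbreak member")

    split = len(hits) > 1                                  # spread over clusters
    over_grouped = any(c - outbreak_members for c in hits)  # mixed with others

    if split and over_grouped:
        return "Combination"
    if split:
        return "Split"
    if over_grouped:
        return "Over-grouped"
    return "Correct"


outbreak_a = {"a1", "a2"}
print(categorize_outbreak(outbreak_a, [{"a1", "a2"}, {"b1"}]))
print(categorize_outbreak(outbreak_a, [{"a1", "a2", "c1"}]))
print(categorize_outbreak(outbreak_a, [{"a1"}, {"a2"}]))
print(categorize_outbreak(outbreak_a, [{"a1", "b1"}, {"a2"}]))
```

Running the rule once per outbreak and counting the labels would produce a tally of the same shape as a per-study concordance table.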
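  19. Concordance between cgMLST and SNP trees

      | Study | Correct | Over-grouped | Split | Combination | Serovar(s)                      |
      |-------|---------|--------------|-------|-------------|---------------------------------|
      | 1     | 1       | 1            | 0     | 0           | Enteritidis                     |
      | 2     | 2       | 3            | 0     | 0           | Enteritidis                     |
      | 3     | 5       | 1            | 0     | 0           | Enteritidis, Typhimurium, Derby |
      | 4     | 2       | 7            | 0     | 0           | Enteritidis                     |
      | 5     | 2       | 0            | 0     | 0           | Enteritidis                     |
      | 6     | 5       | 2            | 0     | 0           | Enteritidis                     |
      | Total | 18      | 13           | 0     | 0           |                                 |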
  20. Conclusions
      • SISTR is a robust and accurate platform for Salmonella in silico typing, with 93.7% concordance between specified serovar and predicted serovar
      • The prototype 330-gene cgMLST scheme is readily retrievable from HTS assemblies of varying quality levels.
      • The current scheme provides coarse-grained separation of Salmonella genetic lineages that will be useful in outbreak analysis
  21. Acknowledgements
      • Team:
        – Ed Taboada, Peter Kruczkiewicz, Catherine Yoshida, John Nash
      • Research partners:
        – Public Health Agency of Canada: OIE Laboratory for Salmonellosis, National Microbiology Lab (NML) @ Guelph; Genomics Core and Bioinformatics Core, NML @ Winnipeg; Public Health Genomics team, NML @ Winnipeg
        – IRIDA project team
        – Animal Health Veterinary Laboratory Agency, UK
        – Austrian Institute of Technology, Austria
      • Funding:
        – Genomics Research and Development Initiative
        – Genome Canada (IRIDA project)
        – Public Health Agency of Canada
