Robertson immemxi final March 2016

  1. 1. Utility of the Salmonella In Silico Typing Resource (SISTR) to outbreak investigations
     James Robertson (1), Catherine Yoshida (1), Peter Kruczkiewicz (2), Eduardo N. Taboada (2) and John H. E. Nash (3)
     (1) National Microbiology Laboratory @ Guelph, Public Health Agency of Canada
     (2) National Microbiology Laboratory @ Lethbridge, Public Health Agency of Canada
     (3) National Microbiology Laboratory @ Toronto, Public Health Agency of Canada
  2. 2. Salmonella is a leading public health concern
     • Salmonella is a leading food-borne pathogen both in Canada and around the world
     • Globally, there are an estimated 94 million Salmonella infections every year
     • Human costs:
       – acute illness
       – loss of life (155,000 deaths)
     • Societal costs:
       – health care costs
       – lost productivity
       – legal costs
       – impact on the food industry
  3. 3. Potential Sources
  4. 4. Challenges in Salmonella typing and epidemiology
     • A small number of highly prevalent, globally distributed serovars account for most outbreaks (e.g. Enteritidis, Typhimurium)
     • Epidemiologically unrelated isolates within the same serovar → difficult to investigate
     • Additional subtyping resolution within a serovar needed (e.g. phage typing)
     • Increasing use of genotypic methods (i.e. molecular typing)
     • Driven by need for methods with higher discriminatory power
     • A number of different approaches have been applied to molecular typing of Salmonella
  5. 5. Discriminatory Power: Serotyping (Low), MLST (Low-Mid), cgMLST (Mid-High), wgSNPs (High)
     Example sequences: GATCGATCGATCG / GATCAATCGATCG
     • Based on reaction of antibodies to surface antigens
     • Broad usage and common nomenclature in use since the 1930s
     • Multi-Locus Sequence Typing: developed by Maiden et al. (1998)
     • Indexes genetic variation in 7 core (i.e. “housekeeping”) genes
     • cgMLST extends this principle to hundreds to thousands of loci
     • Provides a portable naming scheme which correlates with historical serotypes
     • Utilizes individual SNPs and gives very high resolution
     • Results are not portable to other public health professionals
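
The two example sequences on this slide can be compared position by position to show what a SNP-based method counts. This is a minimal sketch that assumes the sequences are already aligned and of equal length; the function name `snp_positions` is illustrative and is not part of SISTR or any other tool.

```python
def snp_positions(seq_a, seq_b):
    """Return the 0-based positions at which two aligned sequences differ."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    return [i for i, (a, b) in enumerate(zip(seq_a, seq_b)) if a != b]


# The two sequences shown on the slide.
first = "GATCGATCGATCG"
second = "GATCAATCGATCG"

diffs = snp_positions(first, second)
print(diffs)       # [4]  (G vs. A at the fifth base)
print(len(diffs))  # 1
```

A whole-genome SNP method applies the same idea across the genome, which is where its high resolution comes from.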
  6. 6. SISTR
     • Initial dataset of 4330 genomes
     • 94.6% concordance between predicted and reported serovar
     • In silico serovar predictions based on O and H antigens
     • cgMLST refinement of serovar assignment and analysis
     • Uses minimally processed genome assemblies
     • Very fast: ~30 seconds to process a genome
  7. 7. What does SISTR do?
     • In silico analysis of WGS data
       – assembly statistics
       – serovar prediction
       – in silico typing (MLST, cgMLST)
       – AMR prediction
     • Comparative genomic analyses
       – cgMLST
       – accessory gene content
       – core SNPs
     • Epidemiologic analysis
       – geospatial distribution
       – temporal distribution
       – source association
     https://lfz.corefacility.ca/sistr-app/
  8. 8. SISTR cgMLST
     • Current cgMLST scheme in SISTR based on 330 core genes with high “assignability” (i.e. very low levels of “missing” data)
     • Will include international Salmonella cgMLST scheme (i.e. once it is developed!)
     • cgMLST information is used to:
       – Assess quality of WGS data → complete, partial, missing loci
       – Supplement genoserotyping predictions
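
The quality check described on this slide amounts to counting how many scheme loci are complete, partial or missing in an assembly. The sketch below assumes a hypothetical representation in which each locus maps to an allele number, the string "partial", or None for a missing locus; this is an illustration, not SISTR's actual output format.

```python
def summarize_loci(calls):
    """Count complete, partial and missing loci in one genome's cgMLST calls.

    calls: dict mapping locus name -> allele number (int),
           "partial", or None (missing). Hypothetical representation.
    """
    counts = {"complete": 0, "partial": 0, "missing": 0}
    for call in calls.values():
        if call is None:
            counts["missing"] += 1
        elif call == "partial":
            counts["partial"] += 1
        else:
            counts["complete"] += 1
    return counts


# Toy genome with four loci (locus names are made up).
example_calls = {"locus_1": 12, "locus_2": "partial", "locus_3": None, "locus_4": 5}
print(summarize_loci(example_calls))
# {'complete': 2, 'partial': 1, 'missing': 1}
```

With the 330-gene scheme, a genome with many partial or missing loci would be a candidate for closer inspection of its sequencing or assembly quality.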
  9. 9. Testing the accuracy of SISTR
     • ~45,000 Salmonella genomes were downloaded from the SRA
     • Raw reads were assembled using FLASH and SPAdes
     • Assemblies were loaded into SISTR and the predicted serovars were compared with the reported serovars (where available)
     • Assemblies were checked for contamination using Kraken
     • Quality was assessed using QUAST
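
The comparison step above can be sketched as a simple concordance count over genomes that have a reported serovar. The genome IDs and serovar values below are made up for illustration, and the case-insensitive exact match is a simplifying assumption; the slides do not describe how name variants were reconciled.

```python
def serovar_concordance(predicted, reported):
    """Return (concordant, compared) for genomes that have a reported serovar.

    predicted: dict genome id -> predicted serovar name
    reported:  dict genome id -> reported serovar name (may lack some genomes)
    """
    concordant = 0
    compared = 0
    for genome_id, predicted_name in predicted.items():
        reported_name = reported.get(genome_id)
        if not reported_name:
            continue  # no reported serovar available, so nothing to compare
        compared += 1
        if predicted_name.strip().lower() == reported_name.strip().lower():
            concordant += 1
    return concordant, compared


# Hypothetical toy data.
predicted = {"g1": "Enteritidis", "g2": "Typhimurium", "g3": "Derby", "g4": "Heidelberg"}
reported = {"g1": "enteritidis", "g2": "Typhimurium", "g3": "Infantis"}

concordant, compared = serovar_concordance(predicted, reported)
print(concordant, compared)  # 2 3
```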
  10. 10. Recovery rates of 330 cgMLST genes from assembled SRA genomes (N = 45,079)
      • Number of genomes with complete 330: 41,781
      • Number of genomes with >300 genes: 1,393
      • Number of genomes with <300 genes: 1,905
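
The three counts on this slide can be turned into proportions of the 45,079 assembled genomes. The pairing of each count with its category follows the order in which the numbers and labels appear on the slide, which is an assumption about the original chart.

```python
def recovery_percentages(counts):
    """Convert category counts to percentages of the total, rounded to 0.1."""
    total = sum(counts.values())
    return {label: round(100 * n / total, 1) for label, n in counts.items()}


# Counts as listed on the slide (category pairing assumed from their order).
recovery_counts = {
    "complete 330 genes": 41781,
    ">300 genes": 1393,
    "<300 genes": 1905,
}

print(sum(recovery_counts.values()))  # 45079
print(recovery_percentages(recovery_counts))
# {'complete 330 genes': 92.7, '>300 genes': 3.1, '<300 genes': 4.2}
```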
  11. 11. SISTR Accuracy (N = 32,321)
      • 93.7% overall concordance with serovar specified
      • Concordant: 29,884
      • Discordant: 2,347
  12. 12. Retrospective outbreak sequencing
      • Two outbreaks of Salmonella Enteritidis were retrospectively sequenced
      • Examined the feasibility of applying WGS to outbreak investigations
      • Compared results of traditional molecular and microbial tests to WGS
  17. 17. Trees shown: SISTR (cgMLST), PARSNP (core SNP), SNP Tree (Wuyts et al. 2015)
      • All three methods produce concordant trees.
      • cgMLST has a tendency to over-group
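
One possible intuition for the over-grouping noted above, offered as an assumption rather than something the slides state, is that a cgMLST comparison only sees differences inside the scheme's loci. The sketch below counts differing allele calls between two profiles; the profile layout, locus names and the function `allele_distance` are hypothetical.

```python
def allele_distance(profile_a, profile_b):
    """Count loci called in both profiles whose allele numbers differ.

    Loci that are absent or None (missing) in either profile are ignored.
    """
    shared = profile_a.keys() & profile_b.keys()
    return sum(
        1
        for locus in shared
        if profile_a[locus] is not None
        and profile_b[locus] is not None
        and profile_a[locus] != profile_b[locus]
    )


# Toy three-locus profiles (a real scheme would have 330 loci).
isolate_1 = {"locus_1": 3, "locus_2": 7, "locus_3": 1}
isolate_2 = {"locus_1": 3, "locus_2": 7, "locus_3": 1}
isolate_3 = {"locus_1": 3, "locus_2": 8, "locus_3": None}

print(allele_distance(isolate_1, isolate_2))  # 0
print(allele_distance(isolate_1, isolate_3))  # 1
```

Isolates 1 and 2 have distance 0 here even if they differ by SNPs elsewhere in the genome, so a cgMLST-based tree could place them together where a SNP-based tree separates them.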
  18. 18. Outbreak Clustering Categories (diagram with example outbreaks A, B and C)
      • Correct
      • Incorrectly split
      • Over-grouped (e.g. A+C)
      • Incorrectly split and grouped (e.g. A+B, A+C)
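
The four categories on this slide can be expressed as a small classification rule. The definitions used below are an interpretation of the category names, not the authors' stated criteria: an outbreak is "split" if its isolates fall into more than one predicted cluster, "over-grouped" if a cluster holding its isolates also holds isolates from another outbreak, and "split and grouped" if both apply. All names in the code are illustrative.

```python
from collections import defaultdict


def classify_outbreaks(true_outbreak, predicted_cluster):
    """Label each outbreak by how the predicted clusters treat its isolates.

    true_outbreak:     dict isolate -> outbreak label
    predicted_cluster: dict isolate -> predicted cluster id
    Both dicts must contain the same isolates.
    """
    outbreaks_in_cluster = defaultdict(set)
    clusters_of_outbreak = defaultdict(set)
    for isolate, outbreak in true_outbreak.items():
        cluster = predicted_cluster[isolate]
        outbreaks_in_cluster[cluster].add(outbreak)
        clusters_of_outbreak[outbreak].add(cluster)

    labels = {}
    for outbreak, clusters in clusters_of_outbreak.items():
        split = len(clusters) > 1
        grouped = any(len(outbreaks_in_cluster[c]) > 1 for c in clusters)
        if split and grouped:
            labels[outbreak] = "split and grouped"
        elif split:
            labels[outbreak] = "split"
        elif grouped:
            labels[outbreak] = "over-grouped"
        else:
            labels[outbreak] = "correct"
    return labels


# Toy example: A is recovered, B is split, C and D are merged into one cluster.
truth = {"a1": "A", "a2": "A", "b1": "B", "b2": "B",
         "c1": "C", "c2": "C", "d1": "D", "d2": "D"}
predicted = {"a1": 1, "a2": 1, "b1": 2, "b2": 3,
             "c1": 4, "c2": 4, "d1": 4, "d2": 4}

print(classify_outbreaks(truth, predicted))
# {'A': 'correct', 'B': 'split', 'C': 'over-grouped', 'D': 'over-grouped'}
```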
  19. 19. Concordance between cgMLST and SNP trees
      Study  Correct  Over-grouped  Split  Combination  Serovar(s)
      1      1        1             0      0            Enteritidis
      2      2        3             0      0            Enteritidis
      3      5        1             0      0            Enteritidis, Typhimurium, Derby
      4      2        7             0      0            Enteritidis
      5      2        0             0      0            Enteritidis
      6      5        2             0      0            Enteritidis
      Total  18       13            0      0
  20. 20. Conclusions
      • SISTR is a robust and accurate platform for Salmonella in silico typing, with 93.7% concordance between specified serovar and predicted serovar
      • The prototype 330-gene cgMLST scheme is readily retrievable from HTS assemblies of varying quality levels.
      • The current scheme provides coarse-grained separation of Salmonella genetic lineages that will be useful in outbreak analysis
  21. 21. Acknowledgements
      Team:
      • Ed Taboada, Peter Kruczkiewicz, Catherine Yoshida, John Nash
      Research partners:
      • Public Health Agency of Canada:
      • OIE Laboratory for Salmonellosis – National Microbiology Lab (NML) @ Guelph
      • Genomics Core and Bioinformatics Core – NML @ Winnipeg
      • Public Health Genomics team – NML @ Winnipeg
      • IRIDA project team
      • Animal Health Veterinary Laboratory Agency – UK
      • Austrian Institute of Technology – Austria
      Funding:
      • Genomics Research and Development Initiative
      • Genome Canada (IRIDA project)
      • Public Health Agency of Canada

Editor's Notes

  • Serovar prediction provides antigenic formula and serovar name – compatible with historical data