Utility of the Salmonella In Silico Typing
Resource (SISTR) to outbreak investigations
James Robertson1, Catherine Yoshida1, Peter Kruczkiewicz2, Eduardo N.
Taboada2 and John H. E. Nash3
1 National Microbiology Laboratory @Guelph, Public Health Agency of Canada
2 National Microbiology Laboratory @Lethbridge, Public Health Agency of Canada
3 National Microbiology Laboratory @Toronto, Public Health Agency of Canada
Salmonella is a leading public health concern
Salmonella is a leading food-borne pathogen both in Canada and around the world
Globally, there are an estimated 94 million Salmonella infections every year
Human costs:
• acute illness
• loss of life (155,000 deaths)
Societal costs:
• health care costs
• lost productivity
• legal costs
• impact to food industry
Challenges in Salmonella typing and epidemiology
Small number of highly prevalent/globally distributed serovars account for most
outbreaks (e.g. Enteritidis, Typhimurium)
Epidemiologically unrelated isolates within the same serovar are difficult to
distinguish
Additional subtyping resolution within a serovar needed (e.g. phage typing)
Increasing use of genotypic methods (i.e. molecular typing)
Driven by need for methods with higher discriminatory power
A number of different approaches have been applied to molecular typing of
Salmonella
Serotyping (discriminatory power: Low):
• Based on reaction of antibodies to surface antigens
• Broad usage and common nomenclature in use since the 1930s
MLST (discriminatory power: Low-Mid) and cgMLST (discriminatory power: Mid-High):
• Multi-Locus Sequence Typing: developed by Maiden et al. (1998)
• Indexes genetic variation in 7 core (i.e. “housekeeping”) genes
• cgMLST extends this principle to hundreds to thousands of loci
• Provides a portable naming scheme which correlates with historical serotypes
wgSNPs (discriminatory power: High):
• Utilizes individual SNPs and gives very high resolution
• Results are not portable to other public health professionals
• Example of two sequences differing by a single SNP:
GATCGATCGATCG
GATCAATCGATCG
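The difference between SNP-level and allele-level comparison can be sketched in a few lines of Python. This is a minimal illustration only, not how any particular typing pipeline works: the two short sequences are the ones shown on the slide, while the locus names and allele numbers in the profiles are hypothetical.

```python
def snp_positions(seq_a, seq_b):
    """Return 1-based positions where two aligned, equal-length sequences differ."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    return [i + 1 for i, (a, b) in enumerate(zip(seq_a, seq_b)) if a != b]


def allelic_distance(profile_a, profile_b):
    """Count loci whose allele numbers differ.

    Loci that are absent or None (not called) in either profile are skipped.
    """
    distance = 0
    for locus, allele_a in profile_a.items():
        allele_b = profile_b.get(locus)
        if allele_a is None or allele_b is None:
            continue
        if allele_a != allele_b:
            distance += 1
    return distance


# Sequences from the slide: they differ at a single base.
print(snp_positions("GATCGATCGATCG", "GATCAATCGATCG"))

# Hypothetical allelic profiles (locus name -> allele number).
isolate_1 = {"locus_1": 1, "locus_2": 4, "locus_3": 7, "locus_4": None}
isolate_2 = {"locus_1": 1, "locus_2": 5, "locus_3": 7, "locus_4": 2}
print(allelic_distance(isolate_1, isolate_2))
```

In an allele-based scheme, any number of SNPs within one locus collapses into a single allele difference, which is one reason allele numbers are easier to share as a common nomenclature than raw SNP calls.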
• Initial dataset of 4330 genomes
• 94.6% concordance between predicted and reported serovar
• In silico serovar predictions based on O and H antigens
• cgMLST refinement of serovar assignment and analysis
• Uses minimally processed genome assemblies
• Very fast: ~30 seconds to process a genome
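As a rough illustration of what an antigen-based serovar prediction means, the sketch below builds an antigenic formula from O, H1 and H2 antigen calls and looks up the serovar name. It is not SISTR's implementation: the function name and the two-entry table are hypothetical, and only the formulae for Enteritidis and Typhimurium are included.

```python
# Hypothetical two-entry lookup table: antigenic formula (O:H1:H2) -> serovar name.
SEROVAR_BY_FORMULA = {
    "1,9,12:g,m:-": "Enteritidis",
    "1,4,[5],12:i:1,2": "Typhimurium",
}


def predict_serovar(o_antigen, h1, h2):
    """Return (antigenic formula, serovar name or None if the formula is not in the table)."""
    formula = f"{o_antigen}:{h1}:{h2}"
    return formula, SEROVAR_BY_FORMULA.get(formula)


print(predict_serovar("1,9,12", "g,m", "-"))
print(predict_serovar("1,4,[5],12", "i", "1,2"))
```

A real lookup would need to handle optional antigen factors and formulae shared by more than one serovar, which is where additional evidence such as cgMLST becomes useful.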
What does SISTR do?
In silico analysis of WGS data:
• assembly statistics
• serovar prediction
• in silico typing (MLST, cgMLST)
• AMR prediction
Comparative genomic analyses:
• cgMLST
• accessory gene content
• core SNPs
Epidemiologic analysis:
• geospatial distribution
• temporal distribution
• source association
https://lfz.corefacility.ca/sistr-app/
SISTR cgMLST
• Current cgMLST scheme in SISTR is based on 330 core genes with high “assignability” (i.e. very low levels of “missing” data)
• Will include the international Salmonella cgMLST scheme once it is developed
• cgMLST information is used to:
– Assess quality of WGS data (complete, partial, and missing loci)
– Supplement genoserotyping predictions
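One way to picture the quality assessment is as a tally of how many scheme loci were recovered completely, partially, or not at all in an assembly. The sketch below is an assumption-laden illustration rather than SISTR's own code: the function name, status labels, and locus names are hypothetical, and a real scheme would list all 330 loci.

```python
from collections import Counter

VALID_STATUSES = ("complete", "partial", "missing")


def summarize_loci(calls, scheme_loci):
    """Tally locus statuses for one genome.

    calls: dict mapping locus name -> status string.
    scheme_loci: list of all loci in the scheme; loci with no call count as missing.
    """
    if not scheme_loci:
        raise ValueError("scheme_loci must not be empty")
    counts = Counter({status: 0 for status in VALID_STATUSES})
    for locus in scheme_loci:
        status = calls.get(locus, "missing")
        if status not in VALID_STATUSES:
            raise ValueError(f"unknown status {status!r} for {locus}")
        counts[status] += 1
    summary = dict(counts)
    summary["fraction_complete"] = counts["complete"] / len(scheme_loci)
    return summary


# Hypothetical five-locus scheme for illustration.
scheme = ["locus_1", "locus_2", "locus_3", "locus_4", "locus_5"]
calls = {
    "locus_1": "complete",
    "locus_2": "complete",
    "locus_3": "partial",
    "locus_4": "complete",
}
print(summarize_loci(calls, scheme))
```

A low fraction of complete loci would flag an assembly as poor quality before its typing results are interpreted.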
Testing the accuracy of SISTR
• ~45,000 Salmonella genomes were downloaded from the SRA
• Raw reads were merged using FLASH and assembled using SPAdes
• Assemblies were loaded into SISTR and the predicted serovars were compared with the reported serovars (where available)
• Assemblies were checked for contamination using Kraken
• Quality was assessed using QUAST
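The comparison step amounts to a concordance calculation over genomes that have a reported serovar. The sketch below shows one simple way to compute it, assuming exact name matching after trimming whitespace and ignoring case; the function names and example records are hypothetical, and the actual comparison may have handled synonyms and formula-only reports differently.

```python
def normalize(name):
    """Trim whitespace and lower-case a serovar name for comparison."""
    return name.strip().lower()


def serovar_concordance(records):
    """Return (matched, compared, fraction) for (predicted, reported) pairs.

    Records with no reported serovar (None or empty string) are excluded.
    fraction is None when nothing could be compared.
    """
    matched = 0
    compared = 0
    for predicted, reported in records:
        if not reported:
            continue
        compared += 1
        if normalize(predicted) == normalize(reported):
            matched += 1
    fraction = matched / compared if compared else None
    return matched, compared, fraction


# Hypothetical example records: (predicted serovar, reported serovar).
records = [
    ("Enteritidis", "Enteritidis"),
    ("Typhimurium", "typhimurium "),
    ("Derby", "Enteritidis"),
    ("Enteritidis", None),  # no reported serovar: excluded from the comparison
]
print(serovar_concordance(records))
```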
Recovery rates of 330 cgMLST genes from assembled SRA genomes (N = 45,079)
Category                                Genomes
Number of genomes with complete 330     41,781
Number of genomes with >300 genes        1,393
Number of genomes with <300 genes        1,905
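The counts in this chart can be turned into proportions with a short calculation. The sketch assumes the three counts correspond to the three categories in the order they are listed (complete 330, >300 genes, <300 genes); that pairing is inferred from the chart legend rather than stated explicitly.

```python
# Counts from the chart; the pairing with categories is assumed from listing order.
recovery = {
    "complete 330": 41781,
    ">300 genes": 1393,
    "<300 genes": 1905,
}

total = sum(recovery.values())
# The three categories should account for every genome (N = 45,079).
assert total == 45079

percentages = {label: 100 * n / total for label, n in recovery.items()}
for label, pct in percentages.items():
    print(f"{label}: {recovery[label]} genomes ({pct:.1f}%)")
```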
• Two outbreaks of Salmonella Enteritidis were retrospectively sequenced
• Examined the feasibility of applying WGS to outbreak investigations
• Compared results of traditional molecular and microbiological tests to WGS
Concordance between cgMLST and SNP trees
Study  Correct  Over-grouped  Split  Combination  Serovar(s)
1      1        1             0      0            Enteritidis
2      2        3             0      0            Enteritidis
3      5        1             0      0            Enteritidis, Typhimurium, Derby
4      2        7             0      0            Enteritidis
5      2        0             0      0            Enteritidis
6      5        2             0      0            Enteritidis
Total  18       13            0      0
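The slides do not define the four outcome categories, so the sketch below rests on assumed definitions: "correct" means the cgMLST cluster contains exactly the isolates grouped by the SNP tree, "over-grouped" means it also contains other isolates, "split" means the SNP-defined group is spread across several cgMLST clusters, and "combination" means both happen at once. The function name, isolate IDs, and cluster labels are hypothetical.

```python
def classify_outbreak(outbreak_isolates, cgmlst_clusters):
    """Classify how cgMLST clustering recovers one SNP-defined outbreak group.

    outbreak_isolates: set of isolate IDs grouped together by the SNP tree.
    cgmlst_clusters: dict mapping every isolate ID -> cgMLST cluster label.
    Category definitions are assumptions made for this sketch.
    """
    # cgMLST clusters that contain at least one outbreak isolate.
    labels = {cgmlst_clusters[i] for i in outbreak_isolates}
    # Every isolate that falls in any of those clusters.
    members = {i for i, label in cgmlst_clusters.items() if label in labels}

    split = len(labels) > 1
    over_grouped = bool(members - outbreak_isolates)

    if split and over_grouped:
        return "combination"
    if split:
        return "split"
    if over_grouped:
        return "over-grouped"
    return "correct"


# Hypothetical isolates: A and B are one outbreak; C and D are another.
clusters = {"A": "c1", "B": "c1", "C": "c1", "D": "c2"}
print(classify_outbreak({"A", "B"}, clusters))  # C shares cluster c1
print(classify_outbreak({"D"}, clusters))
print(classify_outbreak({"C", "D"}, clusters))
```

Under these definitions, a coarse-grained scheme would be expected to over-group more often than it splits.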
Conclusions
• SISTR is a robust and accurate platform for Salmonella in silico
typing, with 93.7% concordance between reported serovar and
predicted serovar
• The prototype 330 gene cgMLST scheme is readily retrievable from
HTS assemblies of varying quality levels.
• The current scheme provides coarse-grained separation of Salmonella
genetic lineages that will be useful in outbreak analysis
Acknowledgements
Team:
Ed Taboada, Peter Kruczkiewicz, Catherine Yoshida, John Nash
Research partners:
Public Health Agency of Canada:
OIE Laboratory for Salmonellosis – National Microbiology Lab (NML) @ Guelph
Genomics Core and Bioinformatics Core – NML @ Winnipeg
Public Health Genomics team – NML @ Winnipeg
IRIDA project team
Animal Health Veterinary Laboratory Agency – UK
Austrian Institute of Technology – Austria
Funding:
Genomics Research and Development Initiative
Genome Canada (IRIDA project)
Public Health Agency of Canada
Editor's Notes
Serovar prediction provides antigenic formula and serovar name – compatible with historical data