Next Generation Sequencing for
Identification and Subtyping
of Foodborne Pathogens
National Center for Emerging and Zoonotic Infectious Diseases
Division of Foodborne, Waterborne, and Environmental Diseases
Rebecca Lindsey, PhD
Enteric Diseases Laboratory Branch
NIST Workshop October 20, 2014
Advanced Molecular Detection (AMD) Initiative
http://www.cdc.gov/amd/
• Projects to transform Networks, programs and
systems – 8 CDC projects
• EDLB- Transforming public health microbiology with whole genome
sequencing for foodborne diseases (Salmonella, Shiga toxin-
producing Escherichia coli (STEC), and Campylobacter)
• Projects Using AMD for Specific Pathogens – 15
CDC projects
• EDLB- Maximizing the potential of real-time whole genome sequence-
based Listeria surveillance to solve outbreaks and improve food safety
No CDC consensus on how to use
WGS for identification
Collaborating Partners
• Collaboration among the public health departments
in the states, FDA, USDA, and NCBI
• International component: Developing and refining
bioinformatics ‘pipelines’ with partners
in Belgium, Canada, Denmark, England, and France
Public Health Agency of Canada
Vision
for the use of WGS in the surveillance of foodborne illness
WGS is used to characterize foodborne pathogens in
public health laboratories, replacing multiple
workflows with one single efficient workflow
TAT (turnaround time): (2-) 3-4 days
Current Methods of Characterizing Foodborne
Pathogens in a Public Health Laboratory
• Growth characteristics
• Phenotypic panels
• Agglutination reactions
• Enzyme immunoassays (EIAs)
• PCR
• DNA arrays (hybridization)
• Sanger sequencing
• DNA restriction
• Electrophoresis (PFGE, capillary)
• Each pathogen is characterized by methods that are specific to
that pathogen in multiple workflows
- Separate workflows for each pathogen
- TAT: 5 min – weeks (months)
Why Move Public Health
Microbiology to WGS?
Besides consolidation of workflows in the labs:
• More efficient outbreak detection, investigation & control
• Precise and flexible case definition
– More outbreaks will be detected and solved when they are
small
– Scarce epi-resources may be focused
• More efficient surveillance of sporadic infections
• Source attribution analysis of sporadic disease
• Focus on pathogens of particular public health
importance:
– Virulence
– Resistance
– Emerging pathogens
– Rapidly spreading clones/traits
– Vaccine-preventable diseases
WGS in Public Health:
The tools must be
• Simple
• Public health microbiologists are NOT
bioinformaticians
• Standard desktop software
• Comprehensive
• All characterization in one workflow
• Work in a network of laboratories
• Free sharing and comparison of data between labs
• Central and local databases
To SNP or Not to SNP?
in public health
• Single Nucleotide Polymorphism (SNP) approaches
• Default for phylogenetic analyses of sequence data
• Comparative subtyping by nature
• Results difficult to communicate
• Computationally intensive = SLOW
• Gene-gene approach (wgMLST)
• Definitive subtyping
• Leads to naming, tracking over time, easy communication
• Computationally simpler = FAST but…
• Sufficiently discriminatory?
• YES!
Standardization of WGS
Public Health Microbiology
• Methods
• Analysis
• Nomenclature
Standardization of
Methods
• Standard Operating Procedures- CLIA
certification- in EDLB
• Recommended protocols in state labs
• Sequencing quality metrics
– Q-values (base quality scores) – vary by machine
– Coverage – for upload to NCBI
• 20X Listeria, Campylobacter
• 30X Salmonella
• 40X STEC/Shigella
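The coverage minimums above reduce to simple arithmetic. The following is a minimal sketch, assuming average coverage is estimated as total sequenced bases divided by an expected genome size. The function names are illustrative and are not part of any CDC or PulseNet protocol; the Q-value helper uses the standard Phred relation.

```python
# Minimum average coverage by organism group, as listed on the slide.
MIN_COVERAGE = {
    "Listeria": 20,
    "Campylobacter": 20,
    "Salmonella": 30,
    "STEC/Shigella": 40,
}


def estimate_coverage(read_lengths, genome_size_bp):
    """Average depth = total sequenced bases / expected genome size."""
    if genome_size_bp <= 0:
        raise ValueError("genome_size_bp must be positive")
    return sum(read_lengths) / genome_size_bp


def meets_minimum(organism, coverage):
    """True if the estimated coverage reaches the listed minimum."""
    return coverage >= MIN_COVERAGE[organism]


def phred_to_error_probability(q):
    """Standard Phred relation: P(error) = 10 ** (-Q / 10)."""
    return 10 ** (-q / 10)
```

The genome size is supplied by the caller because it differs by organism; the sketch deliberately hard-codes only the thresholds that appear on the slide.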
NGS Standards in Progress for Clinical Labs
• The College of American Pathologists (CAP) –NGS
molecular pathology
- includes 18 laboratory accreditation checklist requirements for
the analytic “wet bench” process and “dry lab” bioinformatics
analysis processes (Aziz et al 2014).
• National Next-generation Sequencing Standardization
of Clinical Testing (Nex-StoCT) workgroup.
- developed guidelines to ensure that results from tests based
on NGS are reliable and useful for clinical decision making
(Gargis et al 2013).
• All labs submitting NGS to CLIA labs will have to
follow CLIA protocols
Standardization of
Analysis
• Quality metrics
• Pipelines
– Primary analysis: whole genome multi-locus
sequence typing (wgMLST)
– Secondary analysis: high quality SNP (hqSNP)
analysis
• References
• Algorithms
• Masking
• Database structure
BioNumerics
• A powerful combined database and analytical
software package
– A ‘one tool fits all’ application for public health
• Highly customizable
• Used by PulseNet, CaliciNet and CryptoNet
– The public health labs are familiar with it
Gene – Gene Approach
• Fixed set of genes (‘loci’) leading to typing schemes
on different levels
• Concept of allelic variation, not only point mutations
• Evolutionary distance: events such as recombination
and simultaneous close-range mutations are each counted as
one event
• Definitive subtyping
• Leads to nomenclature
• Requires curation
[Figure: typing schemes at different levels (MLST, eMLST, cMLST, wgMLST), with genus/species, serotype, and antimicrobial resistance (AR) targets]
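The "counted as one event" idea can be illustrated with a small allelic-distance calculation. This is a minimal sketch, assuming each isolate is represented as a mapping from locus name to allele number; the locus names used with it are hypothetical, and this is not the BioNumerics implementation.

```python
def allelic_distance(profile_a, profile_b):
    """Count loci whose allele numbers differ between two isolates.

    Each profile maps locus name -> allele number. A locus counts once
    no matter how many nucleotides differ inside it, so a recombination
    event that changes many bases in one gene is still one difference.
    Loci missing from either profile are ignored.
    """
    shared = profile_a.keys() & profile_b.keys()
    return sum(1 for locus in shared if profile_a[locus] != profile_b[locus])
```

Ignoring loci absent from one profile is one possible design choice; a production scheme would need an explicit rule for missing data.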
Genes That May Be Targeted In a
Gene-Gene Analytical Approach
• Core (c) genes (‘present in all strains in a species’)
• Housekeeping genes for MLST & eMLST
• Serotyping genes
• Genes for genus/species/subspecies identification
• Virulence genes
• Antimicrobial resistance genes
• Pan-genome (wg) (‘all genes in the whole population of a species’)
Public Health WGS Workflow
[Diagram. Components: sequencer; external storage (NCBI, ENA, BaseSpace); calculation engine (trimming, mapping, de novo assembly, SNP detection, allele detection); nomenclature server; allele databases; SQL databases; LIMS; end users at CDC and in the States.]
[Data levels shown: raw sequences, allele, sequence type, clone, lineage, AST, virulence profile, pathotype, serotype, genus/species.]
The Nomenclatural Server in
the WGS Workflow
• A database with all genes and gene variants (‘alleles’)
• Function of most genes not known
but
• Genes used for reference characterization are also included
• E.g., genus/species identification, serotyping, pathotyping, virulence
characterization, antimicrobial resistance, MLST
• Alleles detected by the calculation engine are identified and NAMED
• New alleles are added to the database automatically
• Ambiguous alleles are forwarded to database managers and organism-
specific subject matter experts (SMEs) for curation/confirmation before being added
• Building the nomenclatural
database is an international
collaborative effort
• Should ultimately be placed in
public domain
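The naming rules on this slide (known alleles are identified, new alleles are added automatically, ambiguous alleles go to curation) can be sketched as a small lookup table. This is a toy illustration under stated assumptions: the class name, the use of a SHA-256 hash as the lookup key, and the rule that any non-ACGT character marks an ambiguous call are all hypothetical, not a description of the actual nomenclature server.

```python
import hashlib


class AlleleDatabase:
    """Toy stand-in for a nomenclature server's allele table."""

    def __init__(self):
        self._ids = {}            # (locus, sequence hash) -> allele number
        self._next = {}           # locus -> next free allele number
        self.curation_queue = []  # ambiguous calls awaiting expert review

    def name_allele(self, locus, sequence):
        """Return the allele number, or None if sent to curation."""
        seq = sequence.upper()
        # Assumption: anything other than A/C/G/T marks an ambiguous call.
        if not seq or set(seq) - set("ACGT"):
            self.curation_queue.append((locus, sequence))
            return None
        key = (locus, hashlib.sha256(seq.encode()).hexdigest())
        if key not in self._ids:
            # New allele: assign the next number for this locus.
            number = self._next.get(locus, 1)
            self._ids[key] = number
            self._next[locus] = number + 1
        return self._ids[key]
```

Numbering alleles per locus keeps the names short and stable: the same sequence always returns the same number, which is what makes names comparable between laboratories.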
Building species-specific allele
databases - wgMLST
• Listeria
– 200 annotated reference genomes
– 5800 unique loci
• Campylobacteraceae
– 100 annotated reference genomes
– current BIGSdb
• Shiga toxin-producing E. coli
– 60 annotated reference genomes
– E. coli databases
– ResFinder
– VirulenceFinder
– SerotypeFinder
O target = wzy, wzx, wzm and wzt
H target = fliC, flkA, fllA, flmA and flnA
Zankari E, et al. J Antimicrob Chemother. 2012. 67(11): 2640-4.
Joensen KG, et al. J Clin Microbiol. 2014. 52(5): 1501-1510.
Escherichia and Shigella Reference Unit
O serology workflow
The Calculation Engine in the
WGS Workflow
• Current: Closed - OID
Bioinformatics Core
• Potential: Public - In ‘the
cloud’ for the global public
health community
• Computationally intensive
sequence trimming,
mapping, de novo assembly,
SNP detection, allele
detection
• Slow - but a ‘one-time’
process
Allele data in BioNumerics for
wgMLST analysis
Standardization of WGS
Public Health Microbiology
• Methods
• Analysis
• Nomenclature
Standardization of
Nomenclature
• Naming wgMLST patterns
• Still need epidemiology data
– To detect outbreaks
Gene – Gene Approach for Naming
Subtyping in Keeping with Phylogeny
(concept to be developed)
            7-gene MLST   eMLST   cMLST   wgMLST
Isolate A   ST24          e12     c48     w214
Isolate B   ST24          e12     c48     w352
Isolate C   ST24          e12     c45     w132
Isolate D   ST31          e15     c60     w582
Isolate A and B closely related
Isolate C related to A and B but not as closely as A is to B
Isolate D unrelated to all the other isolates
Providing phylogenetic information in the name is important because isolates from the
same source are more likely to be related than isolates from different sources
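The relatedness reading of these names can be sketched in code. This is a minimal illustration, assuming names are written as four hyphen-separated codes in coarse-to-fine order (7-gene MLST, eMLST, cMLST, wgMLST); since the slide marks the naming concept as still to be developed, the format and the function names are assumptions.

```python
LEVELS = ("MLST", "eMLST", "cMLST", "wgMLST")  # coarse to fine


def parse_name(name):
    """Split 'ST24 - e12 - c48 - w214' into its four level codes."""
    parts = tuple(p.strip() for p in name.split("-"))
    if len(parts) != len(LEVELS):
        raise ValueError(f"expected {len(LEVELS)} levels in {name!r}")
    return parts


def shared_levels(name_a, name_b):
    """Return the typing levels on which two names agree, coarse to fine.

    Comparison stops at the first mismatch, because a finer level is
    only meaningful inside the same coarser type.
    """
    agreed = []
    for level, a, b in zip(LEVELS, parse_name(name_a), parse_name(name_b)):
        if a != b:
            break
        agreed.append(level)
    return agreed
```

Applied to the four isolates, A and B agree through cMLST, A and C agree only through eMLST, and A and D agree on nothing, which matches the interpretation given on the slide.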
PATHOTYPE: Shiga toxin-producing and Enteroaggregative E. coli (STEC & EaggEC)
VIRULENCE PROFILE: stx2a, aggR, aggA, sigA, sepA, pic, aatA, aaiC, aap
SEQUENCE TYPE: ST34
ANTIMICROBIAL RESISTANCE GENES: blaTEM-1, blaCTX-M-15
The strain contains Shiga toxin subtype 2a typically associated with virulent STEC
It does not contain adherence and virulence factors (eae, ehxA) typically associated with virulent STEC
It contains adherence and virulence factors typically associated with virulent EaggEC (aggR, aggA, sigA, sepA,
pic, aatA, aaiC, aap)
This genotype is associated with extremely high (>10%) rates of hemolytic uremic syndrome (HUS)
All characteristics have been determined by whole genome sequencing (WGS)
GENUS/SPECIES:
Conclusion: Standardization of WGS
Public Health Microbiology
• No CDC consensus among the many
different organisms
• Standardization of NGS following
CAP/CLIA guidelines.
• Standardization among collaborators
-- Methods
-- Analysis
-- Nomenclature
Acknowledgements
National Center for Emerging and Zoonotic Infectious Diseases
Division of Foodborne, Waterborne, and Environmental Diseases
Disclaimers:
“The findings and conclusions in this presentation are those of the author and do not necessarily
represent the official position of the Centers for Disease Control and Prevention”
“Use of trade names is for identification only and does not imply endorsement by the Centers for
Disease Control and Prevention or by the U.S. Department of Health and Human Services.”
Public Health Agency of Canada
CDC: Heather Carleton, Eija Trees, Peter Gerner-Smidt, Collette Leaumont, Efrain
Ribot, Lee Katz, Nancy Strockbine
Questions?
For more information please contact Centers for Disease Control and Prevention
Enteric Diseases Laboratory Branch
1600 Clifton Road NE, Atlanta, GA 30333
The findings and conclusions in this report are those of the authors and do not necessarily represent the
official position of the Centers for Disease Control and Prevention.

Editor's Notes

  • #11 There is no consensus at CDC for any of the above.
  • #12 All labs submitting to the reference labs will have to be CLIA certified. In the past, state labs could have separate molecular PulseNet and reference labs; now all will have to be CLIA certified, so if they are all conducting NGS under CLIA they may streamline the labs into one lab. Working on standardized EDLB and PulseNet protocols, which are recommended to the states. Working towards CLIA certification of all steps in the process. Sequencing quality metrics are needed: Q-values vary between MiSeq, NextSeq, and PGM. For PulseNet and EDLB reference labs we have coverage recommendations.
  • #14 High-quality, annotated standard reference genomes would be helpful for hqSNP analysis as well as for building databases for wgMLST. Working towards CLIA certification of all steps in the process.
  • #21 Well-characterized annotated reference genomes; PacBio sequencing. Still working on the STEC database. More high-quality genomes would be useful.
  • #27 There is no consensus at CDC for any of the above.
  • #28 Want to be able to automatically name a pattern
  • #31 There is no consensus at CDC for any of the above.