The National Center for Biotechnology Information (NCBI) Pathogen Analysis Pipeline to Support Real Time Sequencing of Foodborne Pathogens

http://www.fao.org/about/meetings/wgs-on-food-safety-management/en/

Real-time sequencing of foodborne pathogens: Pathogen Analysis Pipeline at the National Center for Biotechnology Information (NCBI). Presentation from the Technical Meeting on the impact of Whole Genome Sequencing (WGS) on food safety management, 23-25 May 2016, Rome, Italy.

  1. The NCBI Pathogen Analysis Pipeline to Support Real Time Sequencing of Foodborne Pathogens. William Klimke, GMI9
  2. NCBI Pathogen Detection Pipeline
     NCBI Submission Portal: BioSamples, SRA, GenBank, BioProject
     NCBI Pathogen Pipeline: Kmer analysis, Genome Assembly, Genome Annotation, Genome Placement, Clustering, SNP analysis, Tree Construction, Reports, QC
  3. NCBI BioSample – Pathogen Template (Foodborne Outbreaks): minimal metadata (Where, When, Who, What)
     • sample_name
     • organism
     • strain/isolate
     • Category (attribute_package):
       1a) Clinical/Host-associated: 1a1) specific_host, 1a2) isolation_source, 1a3) host-disease
       OR
       1b) Environmental/Food/Other: 1b1) isolation_source
     • collection_date
     • Geographic location: 6a) geo_loc_name OR 6b) lat_lon
     • collected by
     https://submit.ncbi.nlm.nih.gov/subs/biosample/
     https://www.ncbi.nlm.nih.gov/biosample/docs/
     http://www.ncbi.nlm.nih.gov/projects/biosample/validate/
  4. NCBI Pathogen Detection Pipeline Submissions (Jan – May, 2016)
  5. Automated Bacterial Assembly
     • SRA reads (sample 1)
     • Trim reads (Ns, adaptor)
     • Reference distance tree: find closest reference genome(s)
     • Argo (reference-assisted assembly)
     • De novo assembly panel: SOAPdenovo, GS-assembler (Newbler), MaSuRCA, Celera Assembler, SPAdes
     • ArgoCA (combined assembly)
     • Reads remapped to combined assembly
     • Contig FASTA, read placements (BAM), quality profile
  6. SNP pipeline
     1. Initial partition of isolates within each species by kmer distances
     2. Within each partition, BLAST comparison of all pairs of genomes
     3. Single linkage clusters with at most 50 SNPs
     4. Within clusters, SNPs with respect to one reference
     5. Generate final SNP list and phylogenetic trees
     Filtering:
     • Base level
     • Repeat
     • Density
     Problematic genomes are eliminated at various points along the way.
  7. High SNP density: cumulative count of differences. Iterative density filtering (Richa Agarwala modification of Science. 2011 Jan 28;331(6016):430-4).
  8. Total targets (May 2016)
     Type                 Total targets in k-mer tree   Targets in clusters (single linkage <= 50 SNPs)
     Salmonella           45297                         38794
     Listeria             9621                          8135
     E. coli & Shigella   13144                         6046
     Campylobacter        2234                          1569
     Acinetobacter        2179                          1299
     Elizabethkingia      89                            74
     Serratia             336                           227
     Klebsiella           1194                          677
  9. Results Available Now: http://www.ncbi.nlm.nih.gov/pathogens/
  10. • Several rows are NULL: this means the target either is not in a cluster (check last column) or is in a cluster without any other isolate of the opposite type.
      • Rows with low SNP count are significant.
      • These isolates are all <10 SNPs, and they all are in the same cluster.
  11. NCBI Pathogen Detection SNP Pipeline: example 1 – stone fruit outbreak
  12. Similar results to CDC wgMLST: http://www.cdc.gov/mmwr/preview/mmwrhtml/mm6410a6.htm?s_cid=mm6410a6_e#Fig
  13. NCBI Pathogen Detection SNP Pipeline: example 2 – MN chicken Kiev outbreak
  14. NCBI Pathogen Detection SNP Pipeline Web viewer (coming soon): example 3 – Elizabethkingia outbreak
  15. R&D: wgMLST. wgMLST approach
      • Complementary to SNP analysis, e.g. consistency check
      • Efficient for initial clustering of all isolates in species
      • Generate loci using “essentially complete” RefSeq genomes
      Organism        Number of loci   Genome in loci   Number of genomes   Major species
      Acinetobacter   2420             58.25%           43/47               baumannii
      Campylobacter   1257             68.36%           90/132              jejuni
      Escherichia     2896             52.97%           159/165             coli
      Klebsiella      4004             82.54%           67/82               pneumoniae
      Listeria        2364             73.88%           73/81               monocytogenes
      Salmonella      3469             66.98%           137/147             enterica
  16. R&D: wgMLST. wgMLST – a complementary method
      • Fast & relatively simple
      • Epidemiologists are familiar with it
      • Good for initial clustering
      • Different heuristics
      • Can use special markers for e.g. serovars
      • Still need to deal with assembly errors
      • Recombination can still be a problem…
      Loci are not independent
  17. NCBI’s Role in Combating Antibiotic Resistant Bacteria
      “Create a repository of resistant bacterial strains (an “isolate bank”) and maintain a well-curated reference database that describes the characteristics of these strains.”
      “Develop and maintain a national sequence database of resistant pathogens.”
  18. AMR efforts at NCBI
      • With collaborators, build database of sequenced isolates with standardized AMR metadata (i.e. accept antibiograms) (2019 samples as of May 16: http://www.ncbi.nlm.nih.gov/biosample/?term=antibiogram[filter])
      • Collaborators include: CDC, WRAIR, FDA, B&W
      • Stable, up-to-date database of AMR genes with standardized nomenclature
      • Collaborators (CARD)
      • RefSeq set released by June 2016
      • Implement and validate tools for identifying AMR genes in new isolates
  19. Antibiogram Fields
      • Fields designed to find balance between comprehensiveness and ease of submission
      • Data dictionaries based on outside expertise (ASM, CLSI) standardize input and minimize ‘data drift’
  20. Antibiotic resistance
      mcr-1 encoding organisms   Total
      E. coli                    11
      Salmonella                 10
  21. NCBI Outputs
      • Kmer tree
      ftp://ftp.ncbi.nlm.nih.gov/pathogen/Results/
      • Genome Workbench
      • full SNP reports
      • Integrated web-based interactive system*
      • AMR reports*
      • wgMLST*
  22. Acknowledgements
      Richa Agarwala, Azat Badretdin, Slava Brover, Joshua Cherry, Vyacheslav Chetvernin, Robert Cohen, Michael DiCuccio, Mike Feldgarden, Dan Haft, William Klimke, Alex Kotliarov, Arjun Prasad, Edward Rice, Kirill Rotmistrovskyy, Stephen Sherry, Sergey Shiryev, Martin Shumway, Tatiana Tatusova, Igor Tolstoy, Chunlin Xiao, Leonid Zaslavsky, Alexander Zasypkin, Alejandro A. Schaffer, Lukas Wagner, Aleksandr Morgulis, David Lipman, James Ostell
      CDC, FDA/CFSAN, USDA-FSIS, PHE/FERA, NHGRI, NIAID, WRAIR, Broad, Wadsworth/MDH
      Vendors: PacBio, Illumina, Roche
      This research was supported by the Intramural Research Program of the NIH, National Library of Medicine.
      http://www.ncbi.nlm.nih.gov
      National Center for Biotechnology Information – National Library of Medicine – Bethesda MD 20892 USA
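
The minimal-metadata rules on slide 3 lend themselves to a simple pre-submission check. The Python sketch below is illustrative only: the field names come from the slide, but the dictionary keys `collected_by` and `host_disease`, the package labels `"clinical"` and `"environmental"`, and the function `missing_fields` are hypothetical and are not part of NCBI's own validator.

```python
from typing import Dict, List

# Fields the slide lists without alternatives (key spelling is an assumption).
ALWAYS_REQUIRED = ["sample_name", "organism", "collection_date", "collected_by"]


def missing_fields(record: Dict[str, str]) -> List[str]:
    """Return the minimal-metadata items that are absent from one sample record."""

    def has(key: str) -> bool:
        return bool(record.get(key, "").strip())

    problems = [field for field in ALWAYS_REQUIRED if not has(field)]

    # "strain/isolate": either one is treated as sufficient here.
    if not (has("strain") or has("isolate")):
        problems.append("strain/isolate")

    # Geographic location: geo_loc_name OR lat_lon.
    if not (has("geo_loc_name") or has("lat_lon")):
        problems.append("geo_loc_name/lat_lon")

    # Category decides which extra attributes are expected.
    package = record.get("attribute_package", "")
    if package == "clinical":  # hypothetical label for Clinical/Host-associated
        for field in ("specific_host", "isolation_source", "host_disease"):
            if not has(field):
                problems.append(field)
    elif package == "environmental":  # hypothetical label for Environmental/Food/Other
        if not has("isolation_source"):
            problems.append("isolation_source")
    else:
        problems.append("attribute_package")

    return problems


example = {
    "sample_name": "S1",
    "organism": "Listeria monocytogenes",
    "strain": "X",
    "attribute_package": "environmental",
    "isolation_source": "peach",
    "collected_by": "FDA",
}
print(missing_fields(example))  # ['collection_date', 'geo_loc_name/lat_lon']
```

For real submissions, the validation service linked on slide 3 remains the authority; a local check like this only catches obvious gaps early.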
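
Step 3 of the SNP pipeline on slide 6 (single linkage clusters with at most 50 SNPs) can be illustrated with a small union-find sketch in Python. This is not NCBI's implementation: the function name, the input format (a dictionary of pairwise SNP distances), and the choice to omit isolates that link to nothing are assumptions made for the example.

```python
from typing import Dict, List, Tuple


def single_linkage_clusters(
    isolates: List[str],
    snp_distances: Dict[Tuple[str, str], int],
    max_snps: int = 50,
) -> List[List[str]]:
    """Group isolates so that any pair within max_snps SNPs shares a cluster.

    Single linkage is transitive: A and C end up together if both are
    close to B, even when A and C themselves differ by more than max_snps.
    """
    parent = {name: name for name in isolates}

    def find(name: str) -> str:
        # Follow parent links to the cluster root, compressing the path.
        while parent[name] != name:
            parent[name] = parent[parent[name]]
            name = parent[name]
        return name

    for (a, b), distance in snp_distances.items():
        if distance <= max_snps:
            root_a, root_b = find(a), find(b)
            if root_a != root_b:
                parent[root_b] = root_a

    groups: Dict[str, List[str]] = {}
    for name in isolates:
        groups.setdefault(find(name), []).append(name)

    # Isolates with no neighbour inside the threshold are not reported (assumption).
    return sorted(sorted(members) for members in groups.values() if len(members) > 1)


distances = {
    ("A", "B"): 12,
    ("B", "C"): 45,
    ("A", "C"): 57,   # above the threshold, but A and C are chained through B
    ("A", "D"): 300,
    ("B", "D"): 310,
    ("C", "D"): 290,
}
print(single_linkage_clusters(["A", "B", "C", "D"], distances))  # [['A', 'B', 'C']]
```

The example shows the chaining behaviour of single linkage: isolate D stays outside any cluster, which corresponds to the gap between total targets and targets in clusters on slide 8.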
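
The wgMLST approach on slides 15 and 16 compares isolates locus by locus rather than SNP by SNP. A minimal Python sketch of that comparison follows, assuming each isolate is represented as a mapping from locus name to an integer allele identifier, with `None` for loci that could not be called (for example because of assembly errors). The profile format and the function `allele_distance` are hypothetical, not the method used at NCBI.

```python
from typing import Dict, Optional, Tuple

# locus name -> allele identifier, or None when the locus was not called
AlleleProfile = Dict[str, Optional[int]]


def allele_distance(p: AlleleProfile, q: AlleleProfile) -> Tuple[int, int]:
    """Return (differing loci, loci compared) for two wgMLST profiles.

    Only loci called in both isolates are compared, so a missing locus
    does not count as a difference.
    """
    shared = [
        locus
        for locus in p
        if locus in q and p[locus] is not None and q[locus] is not None
    ]
    differing = sum(1 for locus in shared if p[locus] != q[locus])
    return differing, len(shared)


isolate_1 = {"locus1": 1, "locus2": 4, "locus3": None, "locus4": 7}
isolate_2 = {"locus1": 1, "locus2": 5, "locus3": 2, "locus4": 7}
print(allele_distance(isolate_1, isolate_2))  # (1, 3)
```

Reporting the number of loci compared alongside the difference count makes it visible when two isolates look close only because few loci were callable in both.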
