Bacterial Pathogen Genomics at
NCBI
FDA, USDA, CDC; State, Local and
Foreign Public Health Agencies;
Industry/Academia
Additional DATA ANALYSIS
DATA ASSEMBLY AND
STORAGE and Analysis
DATA ACQUISITION
NCBI, EMBL, DDBJ (INSDC)
(Public Access Database)
Our Current Model – Publicly available data
National Network of Sequencers / International Network of Sequencers
Automated Bacterial Assembly
SRA Reads (sample 1)
Trim reads (Ns, adaptor)
Reference
Distance tree
Find closest reference genome(s)
ArgoCA (Combined Assembly)
De novo assembly panel: SOAPdenovo, GS-assembler (newbler), MaSuRCA, Celera Assembler, SPAdes
Argo (Reference assisted assembly)
Reads remapped to combined assembly
Contig fasta
Read placements (bam)
Quality profile
WGS & Epidemiologically Relevant Distance (ERD)
• WGS allows high resolution genotypic comparison of
pathogen isolates
• What is the epidemiological relevance of genotypic
distance?
• Many methods to compute – we need some common
principles…
Since all approaches start with sequence reads, we must
retain the reads for independent confirmation
[Figure: SRA format bytes per sequenced base versus number of bases (millions) in MiSeq runs, with and without qualities. FDA-CFSAN: microbial foodborne pathogen research]
[Figure: SRA format bytes per sequenced base versus number of bases (millions) in MiSeq and HiSeq runs, with and without qualities. OXFORD University: Population Genomics of Mycobacterium tuberculosis]
Storage is manageable…
Reliable, transparent, high-throughput, high-resolution ERDs?
The major challenge is to distinguish independent
events (SNPs) from single events that generate
multiple nucleotide differences,
i.e. collapsed repeats and other artifacts,
alignment errors (reference-based alignments),
sequence quality, and recombination
Fairly uniform distribution
of differences along the
two genomes…?
Cumulative count of differences
Iterative density filtering
(Richa Agarwala modification of Science. 2011 Jan 28;331(6016):430-4.)
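The slide names iterative density filtering but gives no algorithmic detail. The sketch below is not the cited method; it is only one simple way to illustrate the idea of repeatedly discarding differences that sit in unusually dense windows. The function name, the window size, the `factor` multiplier, and the `min_count` floor are all assumptions made for illustration.

```python
from bisect import bisect_left


def density_filter(positions, genome_length, window=1000, factor=5.0, min_count=3):
    """Iteratively drop differences that fall in unusually dense windows.

    Hypothetical sketch: the limit is recomputed from the surviving
    positions on each pass, so it tightens as dense clusters are removed.
    """
    kept = sorted(set(positions))
    while True:
        # Expected number of differences per window if they were spread evenly.
        expected = len(kept) * window / genome_length
        limit = max(min_count, factor * expected)
        dense = set()
        for i, start in enumerate(kept):
            # Index one past the last position inside [start, start + window).
            j = bisect_left(kept, start + window)
            if j - i > limit:
                dense.update(kept[i:j])
        if not dense:
            return kept
        kept = [p for p in kept if p not in dense]


# Four scattered differences plus a tight run of 20 (e.g. a recombined region).
scattered = [100, 50_000, 100_000, 150_000]
clustered = [200_000 + 10 * i for i in range(20)]
filtered = density_filter(scattered + clustered, genome_length=3_000_000)
```

In this toy input the run of 20 closely spaced differences is treated as a single event and removed, while the four scattered differences survive as candidate independent SNPs.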
Table: Samples currently processed (as of Sept 5, 2014) in NCBI Pathogen Pipeline
Center | Listeria | Salmonella | E. coli | Total
CDC | 903 | - | - | 903
FDA + State Partners* | 858 | 6129 | 307 | 7294
100K | - | 565 | 34 | 599
FERA | 14 | - | - | 14
Total | 1775 | 6694 | 341 | 8810
Processing Status
How to measure the system?
need the raw data (sequence reads) in unprocessed form
any read trimming/filtering along with the assembly can be regenerated
Assembly metrics
map the reads back to the assembly and generate a profile of each position
(coverage, alleles, qualities)
compare the assembly against other assemblies of the same organism (genus,
species) and check the expected genome size, or similarity to related genomes
annotation metrics such as frameshifted proteins
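Two of the assembly metrics above, expected genome size and read coverage, can be sketched as a small check. This is a minimal illustration, not the pipeline's implementation: the function name, the expected size range, and the example numbers are hypothetical, and the 5X floor is borrowed from the "Less than 5X coverage" problem category in this deck.

```python
def assembly_checks(contig_lengths, aligned_bases, expected_size_range, min_coverage=5.0):
    """Return simple size and coverage flags for one assembly (hypothetical helper)."""
    assembly_size = sum(contig_lengths)
    # Mean coverage: bases in reads placed on the assembly per assembled base.
    mean_coverage = aligned_bases / assembly_size if assembly_size else 0.0
    low, high = expected_size_range
    return {
        "assembly_size": assembly_size,
        "mean_coverage": mean_coverage,
        "size_ok": low <= assembly_size <= high,
        "coverage_ok": mean_coverage >= min_coverage,
    }


# Toy assembly of 3 Mb with 90 Mb of aligned read bases; the size range is an assumption.
report = assembly_checks(
    contig_lengths=[2_000_000, 900_000, 100_000],
    aligned_bases=90_000_000,
    expected_size_range=(2_700_000, 3_300_000),
)
```

An assembly much larger than the expected range is one hint of a mixed sample, which is why the size check is kept separate from the coverage check.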
What is the actual measurement for sequence
similarity?
the number of pairwise SNPs between two genomes
What is the threshold?
a pairwise distance (an observationally determined cutoff below which a cluster of 2
or more isolates is considered close enough to warrant further investigation)
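The measurement and threshold described above can be sketched as follows. This is a simplified illustration that assumes the genomes are already aligned to the same coordinates and uses single-linkage grouping; the function names, the toy sequences, and the cutoff of 2 are assumptions, not values from the pipeline.

```python
from itertools import combinations


def snp_distance(a, b):
    """Count differing positions between two aligned sequences, ignoring N and gaps."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    skip = {"N", "-"}
    return sum(
        1
        for x, y in zip(a.upper(), b.upper())
        if x != y and x not in skip and y not in skip
    )


def cluster_by_threshold(isolates, threshold):
    """Group isolates whose pairwise SNP distance is below the cutoff (single linkage)."""
    parent = {name: name for name in isolates}

    def find(x):
        # Follow parent links to the cluster representative, compressing the path.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for p, q in combinations(isolates, 2):
        if snp_distance(isolates[p], isolates[q]) < threshold:
            parent[find(p)] = find(q)

    groups = {}
    for name in isolates:
        groups.setdefault(find(name), []).append(name)
    # Only clusters of 2 or more isolates are reported.
    return sorted(sorted(g) for g in groups.values() if len(g) >= 2)


isolates = {
    "A": "ACGTACGTAC",
    "B": "ACGTACGTAT",
    "C": "ACGAACGTAT",
    "D": "TTTTTCGTAC",
}
clusters = cluster_by_threshold(isolates, threshold=2)
```

Single linkage means A and C end up in the same cluster through B even though they are 2 SNPs apart; a stricter rule (every pair below the cutoff) would be an equally defensible choice.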
Sensitivity vs. Specificity
sequence clustering
sensitivity: fraction of isolates that truly belong to the cluster (within the epidemiologically
relevant distance) and are placed in it
true positives / (true positives + false negatives), where false negatives are members not correctly identified
specificity: fraction of isolates that truly do not belong to the cluster (outside the
epidemiologically relevant distance) and are excluded from it
true negatives / (true negatives + false positives)
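The two ratios can be computed directly from set membership. The sketch below assumes a known set of true outbreak isolates is available for comparison (for example from an epidemiological investigation); the isolate names and function name are hypothetical.

```python
def cluster_performance(predicted, truth, all_isolates):
    """Sensitivity and specificity of a predicted cluster against known membership."""
    predicted, truth, all_isolates = set(predicted), set(truth), set(all_isolates)
    tp = len(predicted & truth)                 # in the cluster and truly related
    fn = len(truth - predicted)                 # truly related but missed
    fp = len(predicted - truth)                 # in the cluster but unrelated
    tn = len(all_isolates - predicted - truth)  # unrelated and correctly excluded
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
    }


all_isolates = [f"i{n}" for n in range(10)]
truth = {"i0", "i1", "i2", "i3"}
predicted = {"i0", "i1", "i2", "i4"}
perf = cluster_performance(predicted, truth, all_isolates)
```

Here 3 of the 4 true members are recovered (sensitivity 0.75) and 5 of the 6 unrelated isolates are excluded (specificity 5/6). Raising the distance cutoff generally trades specificity for sensitivity.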
Organism | Total Samples | Not expected species (1) | Mixed organisms | Less than 5X coverage | Duplicates | PacBio | Poor 2nd read | Failed assembly stage
Listeria 1775 20 2 (?) 1 5 1
Salmonella 6694 35 5 9 12
E. coli 341 8 1
1. not L. monocytogenes, S. enterica, or E. coli
Processing Problems
PROBLEMS!
Reference Materials
Streptococcus massiliensis 4401825 - CANO - GCA_000341525.1
Streptococcus massiliensis DSM 18628 - ARCE - GCA_000380065.1
Streptococcus intermedius BA1 - ANFT - GCA_000313655.1
Streptococcus intermedius B196 - - GCA_000463355.1
Streptococcus intermedius C270 - - GCA_000463385.1
Streptococcus intermedius F0413 - AFXO - GCA_000234035.1
Streptococcus intermedius SK54 - AJKN - GCA_000258445.1
Streptococcus intermedius JTH08 - - GCA_000306805.1
Streptococcus intermedius ATCC 27335 - ATFK - GCA_000413475.1
Streptococcus intermedius F0395 - AFXN - GCA_000234015.1
Streptococcus sp. AS20 - JANS - GCA_000524255.1
Streptococcus constellatus subsp. constellatus SK53 - AICQ - GCA_000257785.1
Streptococcus constellatus subsp. constellatus SK53 - BASU - GCA_000474075.1
Streptococcus constellatus subsp. pharyngis C1050 - - GCA_000463425.1
Streptococcus constellatus subsp. pharyngis SK1060 = CCUG 46377 - AFUP - GCA_000223295.2
Streptococcus constellatus subsp. pharyngis SK1060 = CCUG 46377 - BASX - GCA_000474135.1
Streptococcus constellatus subsp. pharyngis C232 - - GCA_000463395.1
Streptococcus constellatus subsp. pharyngis C818 - - GCA_000463445.1
Streptococcus anginosus SK1138 - ALJO - GCA_000287595.1
Streptococcus sp. CM7 - JATP - GCA_000526035.1
Streptococcus sp. OBRC6 - JACR - GCA_000517685.1
Streptococcus anginosus F0211 - AECT - GCA_000184365.2
Streptococcus anginosus 1505 - BASW - GCA_000474115.1
Streptococcus sp. ACC21 - JAQU - GCA_000524375.1
Streptococcus sp. AC15 - JDFJ - GCA_000565055.1
Streptococcus anginosus subsp. whileyi MAS624 - - GCA_000478925.1
Streptococcus anginosus subsp. whileyi CCUG 39159 - AICP - GCA_000257765.1
Streptococcus anginosus C238 - - GCA_000463505.1
Streptococcus anginosus DORA_7 - AZMF - GCA_000508545.1
Streptococcus anginosus 1_2_62CV - ADME - GCA_000186545.1
Streptococcus anginosus C1051 - - GCA_000463465.1
Streptococcus anginosus T5 - BASY - GCA_000474155.1
Streptococcus anginosus SK52 = DSM 20563 - AFIM - GCA_000214555.2
Streptococcus anginosus SK52 = DSM 20563 - AREF - GCA_000373605.1
Streptococcus anginosus SK52 = DSM 20563 - BAST - GCA_000474055.1
Streptococcus intermedius SK54 - BASV - GCA_000474095.1
(tree scale bar: 0.05)
Escherichia coli KTE179 - ANYQ - GCA_000326485.1
Escherichia coli KTE229 - ANXK - GCA_000353165.1
Escherichia coli H252 - AEFI - GCA_000190895.1
Escherichia coli HVH 180 (4-3051617) - AVYH - GCA_000458685.1
Escherichia coli HVH 73 (4-2393174) - AVUX - GCA_000457025.1
Escherichia coli HVH 104 (4-6977960) - AVVT - GCA_000457455.1
Escherichia coli HVH 19 (4-7154984) - AVTL - GCA_000456265.1
Escherichia coli 908675 - AXTY - GCA_000488755.1
Escherichia coli HVH 127 (4-7303629) - AVWO - GCA_000457855.1
Escherichia coli HVH 12 (4-7653042) - AVTG - GCA_000494955.1
Escherichia coli KOEGE 32 (66a) - AWAD - GCA_000459635.1
Escherichia coli UMEA 3041-1 - AWAW - GCA_000460015.1
Escherichia coli HVH 148 (4-3192490) - AVXH - GCA_000495015.1
Escherichia coli HVH 59 (4-1119338) - AVUQ - GCA_000456885.1
Escherichia coli HVH 222 (4-2977443) - AVZU - GCA_000459455.1
Escherichia coli UMEA 3140-1 - AWBK - GCA_000460295.1
Escherichia coli HVH 178 (4-3189163) - AVYG - GCA_000495055.1
Escherichia coli KTE4 - ANSO - GCA_000350645.1
Escherichia coli KTE3 - ASTO - GCA_000407685.1
Escherichia coli KTE240 - ASUS - GCA_000408305.1
Escherichia coli BIDMC 49b - JAPT - GCA_000522365.1
Escherichia coli BIDMC 49a - JAPU - GCA_000522385.1
Escherichia coli APEC O1 - - GCA_000014845.1
Escherichia coli DSM 30083 = JCM 1649 = ATCC 11775 - BAIM - GCA_000613265.1
Escherichia coli JCM 20135 - BAKV - GCA_000614505.1
Escherichia coli DSM 30083 = JCM 1649 = ATCC 11775 - AGSE - GCA_000690815.1
Escherichia coli DSM 30083 = JCM 1649 = ATCC 11775 - JMST - GCA_000734955.1
Escherichia coli HVH 214 (4-3062198) - AZJN - GCA_000507665.1
Escherichia coli UMEA 3162-1 - AWBU - GCA_000460475.1
Escherichia coli HVH 191 (3-9341900) - AVYR - GCA_000458875.1
Escherichia coli HVH 170 (4-3026949) - AVYA - GCA_000458555.1
Escherichia coli S88 - - GCA_000026285.1
Escherichia coli UMEA 3893-1 - AWEI - GCA_000461775.1
Escherichia coli HVH 217 (4-1022806) - AVZQ - GCA_000459375.1
Escherichia coli KTE5 - ANSP - GCA_000350665.1
Escherichia coli KTE7 - ASTP - GCA_000407705.1
Escherichia coli HVH 32 (4-3773988) - AVTX - GCA_000456505.1
Escherichia coli UMEA 3206-1 - AWCK - GCA_000460795.1
Escherichia coli UMEA 3203-1 - AWCJ - GCA_000460775.1
Escherichia coli KTE62 - ANUK - GCA_000351605.1
Escherichia coli KTE27 - ASTY - GCA_000407885.1
Escherichia coli cloneA_i1 - AEYT - GCA_000233675.2
Escherichia coli 597 - AYQU - GCA_000503475.1
Escherichia coli HVH 203 (4-3126218) - AVZD - GCA_000459115.1
Escherichia coli UMEA 3702-1 - AWDZ - GCA_000461595.1
Escherichia coli UMEA 3662-1 - AWDU - GCA_000461495.1
Escherichia coli HVH 5 (4-7148410) - AVTB - GCA_000456085.1
Escherichia coli HVH 102 (4-6906788) - AVVR - GCA_000465155.1
Escherichia coli HVH 201 (4-4459431) - AVZB - GCA_000459075.1
Escherichia coli HM605 - AJWU - GCA_000264175.1
Escherichia coli HM605 - CADZ - GCA_000285375.1
(tree scale bar: 0.01)
http://www.ncbi.nlm.nih.gov/assembly/?term=%22anomalous%22[Properties]
Contamination (multiple organisms)
Assembly for sample SAMN02727350
Type | Number of contigs | Sum of contig lengths
Full assembly | 667 | 5251272
contigs with Listeria hits | 37 | 3031650
contigs with Staphylococcus hits | 630 | 2203573
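A breakdown like the one above can be produced by tallying contigs by the genus of their best hit. The sketch below assumes each contig has already been assigned a genus by some similarity search; the function names, the toy contigs, and the 5% reporting fraction are assumptions for illustration.

```python
from collections import defaultdict


def summarize_by_genus(contigs):
    """contigs: iterable of (length, genus) pairs -> {genus: (n_contigs, total_length)}."""
    tally = defaultdict(lambda: [0, 0])
    for length, genus in contigs:
        tally[genus][0] += 1
        tally[genus][1] += length
    return {genus: tuple(values) for genus, values in tally.items()}


def is_mixed(summary, min_fraction=0.05):
    """Flag an assembly when more than one genus holds a sizeable share of the bases."""
    total = sum(length for _, length in summary.values())
    major = [g for g, (_, length) in summary.items() if length / total >= min_fraction]
    return len(major) > 1


# Toy contigs, not the real SAMN02727350 data.
contigs = [
    (300_000, "Listeria"),
    (250_000, "Listeria"),
    (40_000, "Staphylococcus"),
    (35_000, "Staphylococcus"),
    (30_000, "Staphylococcus"),
]
summary = summarize_by_genus(contigs)
mixed = is_mixed(summary)
```

Counting bases rather than contigs matters: in the table above the second organism contributes most of the contigs but less than half of the assembled bases.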
Contamination (carryover contamination)
Contamination (multiple strains)
Table: Assembly stats for SAMN02693748
measurement result
num_input_reads 4212706
aligned_reads 4040070
assembly_num_bases 3180478
assembly_num_contigs 50
assembly_N50 2817733
poor_quality_support_bases 132321
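For readers unfamiliar with the metrics in this table, the sketch below shows how N50 is conventionally defined and how two ratios can be derived from the reported counts. The contig lengths in the N50 example are made up; only the four counts in `stats` come from the table.

```python
def n50(contig_lengths):
    """Length L such that contigs of length >= L hold at least half of the assembly."""
    total = sum(contig_lengths)
    running = 0
    for length in sorted(contig_lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    return 0


# Toy contig set: total 280, so the 100 + 80 prefix is the first to reach half.
toy_n50 = n50([100, 80, 50, 30, 20])

# Values copied from the SAMN02693748 table.
stats = {
    "num_input_reads": 4212706,
    "aligned_reads": 4040070,
    "assembly_num_bases": 3180478,
    "poor_quality_support_bases": 132321,
}
aligned_fraction = stats["aligned_reads"] / stats["num_input_reads"]
poor_fraction = stats["poor_quality_support_bases"] / stats["assembly_num_bases"]
```

About 96% of the input reads align back to the assembly, while roughly 4% of assembled bases have poor quality support.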
Organism | Biosample | SRA Run | Similarity to:
Listeria monocytogenes IEH-NGS-LIS-00100 | SAMN02567873 | SRR1207486 | Listeria SLCC7179
 | | SRR1220750 | Listeria J0161
Salmonella enterica Enteritidis MDH-2014-00798 | SAMN02741943 | SRR1553852 | Schwarzengrund str. CVM19633
 | | SRR1272871 | Enteritidis str. P125109
Salmonella enterica Fluntern MDH-2013-00153 | SAMN02378158 | SRR1067624 | Javiana and Schwarzengrund
 | | SRR1395304 | Cubana and Agona
Proficiency Testing
• Replicate results (phylogeny, SNPs) from published studies
• Resequencing
  - same isolate on multiple platforms
  - same isolate in multiple libraries
  - same isolate in multiple labs
• Blinded submissions
  - already-characterized isolates
  - mixed sample isolates
  - metagenomic isolates
• Corner cases
  - Extreme coverage
  - Duplicates
  - Sample mixups
Acknowledgements
National Center for Biotechnology Information – National Library of Medicine – Bethesda MD 20892 USA
Richa Agarwala
Azat Badretdin
Slava Brover
Joshua Cherry
Vyacheslav Chetvernin
Robert Cohen
Michael DiCuccio
Mike Feldgarden
Dan Haft
William Klimke
Arjun Prasad
Edward Rice
Kirill Rotmistrovskyy
Stephen Sherry
Sergey Shiryev
Martin Shumway
Tatiana Tatusova
Igor Tolstoy
Chunlin Xiao
Leonid Zaslavsky
Alexander Zasypkin
Alejandro A. Schaffer
Lukas Wagner
Aleksandr Morgulis
David Lipman
James Ostell
NCBI
• This research was supported by the Intramural
Research Program of the NIH, National Library of
Medicine. http://www.ncbi.nlm.nih.gov
CDC
FDA/CFSAN
NIHGRI
UC-Davis
USDA
Vendors: PacBio, Illumina, Roche

Nistarini College, Purulia (W.B) India
 
The ASGCT Annual Meeting was packed with exciting progress in the field advan...
The ASGCT Annual Meeting was packed with exciting progress in the field advan...The ASGCT Annual Meeting was packed with exciting progress in the field advan...
The ASGCT Annual Meeting was packed with exciting progress in the field advan...
Health Advances
 
4. An Overview of Sugarcane White Leaf Disease in Vietnam.pdf
4. An Overview of Sugarcane White Leaf Disease in Vietnam.pdf4. An Overview of Sugarcane White Leaf Disease in Vietnam.pdf
4. An Overview of Sugarcane White Leaf Disease in Vietnam.pdf
ssuserbfdca9
 
in vitro propagation of plants lecture note.pptx
in vitro propagation of plants lecture note.pptxin vitro propagation of plants lecture note.pptx
in vitro propagation of plants lecture note.pptx
yusufzako14
 
Hemostasis_importance& clinical significance.pptx
Hemostasis_importance& clinical significance.pptxHemostasis_importance& clinical significance.pptx
Hemostasis_importance& clinical significance.pptx
muralinath2
 
NuGOweek 2024 Ghent - programme - final version
NuGOweek 2024 Ghent - programme - final versionNuGOweek 2024 Ghent - programme - final version
NuGOweek 2024 Ghent - programme - final version
pablovgd
 
Earliest Galaxies in the JADES Origins Field: Luminosity Function and Cosmic ...
Earliest Galaxies in the JADES Origins Field: Luminosity Function and Cosmic ...Earliest Galaxies in the JADES Origins Field: Luminosity Function and Cosmic ...
Earliest Galaxies in the JADES Origins Field: Luminosity Function and Cosmic ...
Sérgio Sacani
 
Leaf Initiation, Growth and Differentiation.pdf
Leaf Initiation, Growth and Differentiation.pdfLeaf Initiation, Growth and Differentiation.pdf
Leaf Initiation, Growth and Differentiation.pdf
RenuJangid3
 
erythropoiesis-I_mechanism& clinical significance.pptx
erythropoiesis-I_mechanism& clinical significance.pptxerythropoiesis-I_mechanism& clinical significance.pptx
erythropoiesis-I_mechanism& clinical significance.pptx
muralinath2
 
Structural Classification Of Protein (SCOP)
Structural Classification Of Protein  (SCOP)Structural Classification Of Protein  (SCOP)
Structural Classification Of Protein (SCOP)
aishnasrivastava
 

Recently uploaded (20)

platelets_clotting_biogenesis.clot retractionpptx
platelets_clotting_biogenesis.clot retractionpptxplatelets_clotting_biogenesis.clot retractionpptx
platelets_clotting_biogenesis.clot retractionpptx
 
GBSN- Microbiology (Lab 3) Gram Staining
GBSN- Microbiology (Lab 3) Gram StainingGBSN- Microbiology (Lab 3) Gram Staining
GBSN- Microbiology (Lab 3) Gram Staining
 
Orion Air Quality Monitoring Systems - CWS
Orion Air Quality Monitoring Systems - CWSOrion Air Quality Monitoring Systems - CWS
Orion Air Quality Monitoring Systems - CWS
 
filosofia boliviana introducción jsjdjd.pptx
filosofia boliviana introducción jsjdjd.pptxfilosofia boliviana introducción jsjdjd.pptx
filosofia boliviana introducción jsjdjd.pptx
 
Comparative structure of adrenal gland in vertebrates
Comparative structure of adrenal gland in vertebratesComparative structure of adrenal gland in vertebrates
Comparative structure of adrenal gland in vertebrates
 
platelets- lifespan -Clot retraction-disorders.pptx
platelets- lifespan -Clot retraction-disorders.pptxplatelets- lifespan -Clot retraction-disorders.pptx
platelets- lifespan -Clot retraction-disorders.pptx
 
Observation of Io’s Resurfacing via Plume Deposition Using Ground-based Adapt...
Observation of Io’s Resurfacing via Plume Deposition Using Ground-based Adapt...Observation of Io’s Resurfacing via Plume Deposition Using Ground-based Adapt...
Observation of Io’s Resurfacing via Plume Deposition Using Ground-based Adapt...
 
(May 29th, 2024) Advancements in Intravital Microscopy- Insights for Preclini...
(May 29th, 2024) Advancements in Intravital Microscopy- Insights for Preclini...(May 29th, 2024) Advancements in Intravital Microscopy- Insights for Preclini...
(May 29th, 2024) Advancements in Intravital Microscopy- Insights for Preclini...
 
general properties of oerganologametal.ppt
general properties of oerganologametal.pptgeneral properties of oerganologametal.ppt
general properties of oerganologametal.ppt
 
Body fluids_tonicity_dehydration_hypovolemia_hypervolemia.pptx
Body fluids_tonicity_dehydration_hypovolemia_hypervolemia.pptxBody fluids_tonicity_dehydration_hypovolemia_hypervolemia.pptx
Body fluids_tonicity_dehydration_hypovolemia_hypervolemia.pptx
 
Nucleic Acid-its structural and functional complexity.
Nucleic Acid-its structural and functional complexity.Nucleic Acid-its structural and functional complexity.
Nucleic Acid-its structural and functional complexity.
 
The ASGCT Annual Meeting was packed with exciting progress in the field advan...
The ASGCT Annual Meeting was packed with exciting progress in the field advan...The ASGCT Annual Meeting was packed with exciting progress in the field advan...
The ASGCT Annual Meeting was packed with exciting progress in the field advan...
 
4. An Overview of Sugarcane White Leaf Disease in Vietnam.pdf
4. An Overview of Sugarcane White Leaf Disease in Vietnam.pdf4. An Overview of Sugarcane White Leaf Disease in Vietnam.pdf
4. An Overview of Sugarcane White Leaf Disease in Vietnam.pdf
 
in vitro propagation of plants lecture note.pptx
in vitro propagation of plants lecture note.pptxin vitro propagation of plants lecture note.pptx
in vitro propagation of plants lecture note.pptx
 
Hemostasis_importance& clinical significance.pptx
Hemostasis_importance& clinical significance.pptxHemostasis_importance& clinical significance.pptx
Hemostasis_importance& clinical significance.pptx
 
NuGOweek 2024 Ghent - programme - final version
NuGOweek 2024 Ghent - programme - final versionNuGOweek 2024 Ghent - programme - final version
NuGOweek 2024 Ghent - programme - final version
 
Earliest Galaxies in the JADES Origins Field: Luminosity Function and Cosmic ...
Earliest Galaxies in the JADES Origins Field: Luminosity Function and Cosmic ...Earliest Galaxies in the JADES Origins Field: Luminosity Function and Cosmic ...
Earliest Galaxies in the JADES Origins Field: Luminosity Function and Cosmic ...
 
Leaf Initiation, Growth and Differentiation.pdf
Leaf Initiation, Growth and Differentiation.pdfLeaf Initiation, Growth and Differentiation.pdf
Leaf Initiation, Growth and Differentiation.pdf
 
erythropoiesis-I_mechanism& clinical significance.pptx
erythropoiesis-I_mechanism& clinical significance.pptxerythropoiesis-I_mechanism& clinical significance.pptx
erythropoiesis-I_mechanism& clinical significance.pptx
 
Structural Classification Of Protein (SCOP)
Structural Classification Of Protein  (SCOP)Structural Classification Of Protein  (SCOP)
Structural Classification Of Protein (SCOP)
 

Bacterial Pathogen Genomics at NCBI

  • 2.
  • 3. Our Current Model – Publicly available data. DATA ACQUISITION: National Network of Sequencers; International Network of Sequencers. DATA ASSEMBLY AND STORAGE and Analysis: NCBI, EMBL, DDBJ (INSDC) (Public Access Database). Additional DATA ANALYSIS: FDA, USDA, CDC; State, Local and Foreign Public Health Agencies; Industry/Academia
  • 4. Automated Bacterial Assembly. Input: SRA reads (sample 1). Steps: trim reads (Ns, adaptor); find closest reference genome(s) in the reference distance tree; Argo (reference-assisted assembly); de novo assembly panel (SOAPdenovo, GS-assembler (newbler), MaSuRCA, Celera Assembler, SPAdes); ArgoCA (combined assembly); reads remapped to combined assembly. Outputs: contig fasta; read placements (bam); quality profile
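The "trim reads (Ns, adaptor)" step could look roughly like the Python sketch below. It is an illustration only, not the pipeline's code: the function name `trim_read`, the `min_overlap` parameter, and the default adaptor string (the common start of Illumina adaptors, used here only as an example) are assumptions.

```python
def trim_read(seq, adaptor="AGATCGGAAGAGC", min_overlap=5):
    """Remove adaptor sequence from the 3' end, then strip flanking Ns.

    Hypothetical helper: the adaptor default and min_overlap are
    illustrative values, not settings taken from the slides.
    """
    seq = seq.upper()
    idx = seq.find(adaptor)
    if idx != -1:
        # A full adaptor copy was found: keep only what precedes it.
        seq = seq[:idx]
    else:
        # Otherwise look for a partial adaptor (a prefix of it) at the read end,
        # trying the longest possible overlap first.
        longest = min(len(adaptor) - 1, len(seq))
        for k in range(longest, min_overlap - 1, -1):
            if seq.endswith(adaptor[:k]):
                seq = seq[:-k]
                break
    # Drop uncalled bases (N) from both ends.
    return seq.strip("N")
```

Production trimmers also use base qualities and tolerate mismatches in the adaptor match; this sketch does exact matching only.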
  • 5. WGS & Epidemiologically Relevant Distance (ERD) • WGS allows high resolution genotypic comparison of pathogen isolates • What is the epidemiological relevance of genotypic distance? • Many methods to compute – we need some common principles…
  • 6. Since all approaches start with sequence reads, we must retain them for independent confirmation. [Two plots: SRA format bytes per sequenced base versus number of bases (x axis in millions); series: With Quality, Without Quality. Panel 1: FDA-CFSAN, microbial foodborne pathogen research, MiSeq runs. Panel 2: Oxford University, Population Genomics of Mycobacterium tuberculosis, MiSeq and HiSeq runs.] Storage is manageable…
  • 7. Reliable, transparent, high throughput, high resolution ERDs? The major challenge is to distinguish independent events (SNPs) from single events that generate multiple nucleotide differences, e.g. collapsed repeats and other artifacts, alignment errors (reference-based alignments), sequence quality, & recombination
  • 8. Fairly uniform distribution of differences along the two genomes…? Cumulative count of differences
  • 9. Iterative density filtering (Richa Agarwala modification of Science. 2011 Jan 28;331(6016):430-4)
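The slides do not spell out the filtering procedure, so the Python sketch below is only one plausible reading of "iterative density filtering", not the NCBI method. The assumption is that differences caused by a single event (recombination, a collapsed repeat) pile up locally, so positions in windows holding far more differences than the genome-wide background are dropped. The background is then re-estimated from what remains, and the pass repeats until nothing more is removed. The function name and the `window`, `fold`, and `min_count` parameters are hypothetical.

```python
from bisect import bisect_left

def density_filter(positions, genome_length, window=1000, fold=5.0, min_count=3):
    """Return difference positions that survive iterative density filtering.

    Hypothetical sketch: window, fold and min_count are illustrative values.
    """
    kept = sorted(set(positions))
    while True:
        # Differences expected per window if the kept ones were spread uniformly.
        expected = len(kept) * window / genome_length
        cutoff = max(min_count, fold * expected)
        dense = set()
        for i, pos in enumerate(kept):
            # Index one past the last kept difference inside [pos, pos + window).
            j = bisect_left(kept, pos + window, lo=i)
            if j - i >= cutoff:
                dense.update(kept[i:j])
        if not dense:
            return kept
        # Remove the dense positions and re-estimate the background.
        kept = [p for p in kept if p not in dense]
```

Because the cutoff is recomputed from the surviving positions, a later pass can catch a moderately dense region that the first pass masked with an inflated background estimate.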
  • 10.
  • 11. Table: Samples currently processed (as of Sept 5, 2014) in NCBI Pathogen Pipeline. Processing Status
        Center                   Listeria   Salmonella   E. coli   Total
        CDC                      903        -            -         903
        FDA + State Partners*    858        6129         307       7294
        100K                     -          565          34        599
        FERA                     14         -            -         14
        Total                    1775       6694         341       8810
  • 12. How to measure the system? We need the raw data (sequence reads) in unprocessed form, so that any read trimming/filtering, along with the assembly, can be regenerated
  • 13. Assembly metrics: map the reads back to the assembly and generate a profile of each position (coverage, alleles, qualities); compare the assembly against other assemblies of the same organism (genus, species) and check the expected genome size, or similarity to related genomes; annotation metrics such as frameshifted proteins
  • 14. What is the actual measurement for sequence similarity? The number of pairwise SNPs between two genomes. What is the threshold? A pairwise distance: an observationally determined cutoff below which a cluster of 2 or more isolates is considered close enough to warrant further investigation
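A minimal Python sketch of the measurement and threshold described above: it counts pairwise SNPs between aligned sequences and groups isolates whose distance falls at or under a cutoff. Several choices are assumptions for illustration and are not taken from the slides: the input is pre-aligned, equal-length strings; positions with N or a gap are ignored; clusters are built by single linkage; the cutoff is inclusive. The function names are hypothetical.

```python
from itertools import combinations

def snp_distance(a, b):
    """Count aligned positions where both sequences have a base and differ."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    ignore = {"N", "-"}
    return sum(
        1
        for x, y in zip(a.upper(), b.upper())
        if x != y and x not in ignore and y not in ignore
    )

def cluster_isolates(seqs, threshold):
    """Single-linkage clusters of 2+ isolates with pairwise distance <= threshold."""
    parent = {name: name for name in seqs}

    def find(x):
        # Union-find lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in combinations(seqs, 2):
        if snp_distance(seqs[a], seqs[b]) <= threshold:
            parent[find(a)] = find(b)

    groups = {}
    for name in seqs:
        groups.setdefault(find(name), []).append(name)
    # Report only clusters with at least two members, in a stable order.
    return sorted(sorted(g) for g in groups.values() if len(g) >= 2)
```

With single linkage, two isolates can share a cluster even when their own distance exceeds the cutoff, as long as a chain of close isolates connects them.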
  • 15. Sensitivity vs. Specificity (sequence clustering). Sensitivity: of the isolates that belong to the cluster (within the epidemiologically relevant distance), the fraction placed in it = true positives / (true positives + false negatives, i.e. members not correctly identified). Specificity: of the isolates that do not belong to the cluster, the fraction excluded from it = true negatives / (true negatives + false positives)
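The two ratios can be computed directly from sets of isolate names, as in the Python sketch below. The function name and the set-based inputs (`predicted` for isolates the pipeline placed in the cluster, `truth` for isolates that belong there, `all_isolates` for everything tested) are assumptions made for illustration.

```python
def sensitivity_specificity(predicted, truth, all_isolates):
    """Return (sensitivity, specificity) for one cluster call."""
    predicted, truth, all_isolates = set(predicted), set(truth), set(all_isolates)
    tp = len(predicted & truth)                  # in the cluster, correctly
    fn = len(truth - predicted)                  # belong, but were missed
    fp = len(predicted - truth)                  # placed in the cluster wrongly
    tn = len(all_isolates - predicted - truth)   # excluded, correctly
    sensitivity = tp / (tp + fn) if tp + fn else float("nan")
    specificity = tn / (tn + fp) if tn + fp else float("nan")
    return sensitivity, specificity
```

For example, with ten isolates, four true members, and a call that finds three of them plus one outsider, sensitivity is 3/4 and specificity is 5/6.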
  • 16. Processing Problems. Columns: Organism; Total Samples; Not expected species [1]; Mixed organisms; Less than 5X coverage; Duplicates; PacBio; Poor 2nd read; Failed assembly stage. Listeria 1775 20 2 (?) 1 5 1. Salmonella 6694 35 5 9 12. E. coli 341 8 1. [1] Not L. monocytogenes, S. enterica, or E. coli
  • 19.
  • 20. Tree labels (scale 0.05): Streptococcus massiliensis 4401825 - CANO - GCA_000341525.1; Streptococcus massiliensis DSM 18628 - ARCE - GCA_000380065.1; Streptococcus intermedius BA1 - ANFT - GCA_000313655.1; Streptococcus intermedius B196 - - GCA_000463355.1; Streptococcus intermedius C270 - - GCA_000463385.1; Streptococcus intermedius F0413 - AFXO - GCA_000234035.1; Streptococcus intermedius SK54 - AJKN - GCA_000258445.1; Streptococcus intermedius JTH08 - - GCA_000306805.1; Streptococcus intermedius ATCC 27335 - ATFK - GCA_000413475.1; Streptococcus intermedius F0395 - AFXN - GCA_000234015.1; Streptococcus sp. AS20 - JANS - GCA_000524255.1; Streptococcus constellatus subsp. constellatus SK53 - AICQ - GCA_000257785.1; Streptococcus constellatus subsp. constellatus SK53 - BASU - GCA_000474075.1; Streptococcus constellatus subsp. pharyngis C1050 - - GCA_000463425.1; Streptococcus constellatus subsp. pharyngis SK1060 = CCUG 46377 - AFUP - GCA_000223295.2; Streptococcus constellatus subsp. pharyngis SK1060 = CCUG 46377 - BASX - GCA_000474135.1; Streptococcus constellatus subsp. pharyngis C232 - - GCA_000463395.1; Streptococcus constellatus subsp. pharyngis C818 - - GCA_000463445.1; Streptococcus anginosus SK1138 - ALJO - GCA_000287595.1; Streptococcus sp. CM7 - JATP - GCA_000526035.1; Streptococcus sp. OBRC6 - JACR - GCA_000517685.1; Streptococcus anginosus F0211 - AECT - GCA_000184365.2; Streptococcus anginosus 1505 - BASW - GCA_000474115.1; Streptococcus sp. ACC21 - JAQU - GCA_000524375.1; Streptococcus sp. AC15 - JDFJ - GCA_000565055.1; Streptococcus anginosus subsp. whileyi MAS624 - - GCA_000478925.1; Streptococcus anginosus subsp. whileyi CCUG 39159 - AICP - GCA_000257765.1; Streptococcus anginosus C238 - - GCA_000463505.1; Streptococcus anginosus DORA_7 - AZMF - GCA_000508545.1; Streptococcus anginosus 1_2_62CV - ADME - GCA_000186545.1; Streptococcus anginosus C1051 - - GCA_000463465.1; Streptococcus anginosus T5 - BASY - GCA_000474155.1; Streptococcus anginosus SK52 = DSM 20563 - AFIM - GCA_000214555.2; Streptococcus anginosus SK52 = DSM 20563 - AREF - GCA_000373605.1; Streptococcus anginosus SK52 = DSM 20563 - BAST - GCA_000474055.1; Streptococcus intermedius SK54 - BASV - GCA_000474095.1
  • 21.
  • 22. Tree labels (scale 0.01): Escherichia coli KTE179 - ANYQ - GCA_000326485.1; Escherichia coli KTE229 - ANXK - GCA_000353165.1; Escherichia coli H252 - AEFI - GCA_000190895.1; Escherichia coli HVH 180 (4-3051617) - AVYH - GCA_000458685.1; Escherichia coli HVH 73 (4-2393174) - AVUX - GCA_000457025.1; Escherichia coli HVH 104 (4-6977960) - AVVT - GCA_000457455.1; Escherichia coli HVH 19 (4-7154984) - AVTL - GCA_000456265.1; Escherichia coli 908675 - AXTY - GCA_000488755.1; Escherichia coli HVH 127 (4-7303629) - AVWO - GCA_000457855.1; Escherichia coli HVH 12 (4-7653042) - AVTG - GCA_000494955.1; Escherichia coli KOEGE 32 (66a) - AWAD - GCA_000459635.1; Escherichia coli UMEA 3041-1 - AWAW - GCA_000460015.1; Escherichia coli HVH 148 (4-3192490) - AVXH - GCA_000495015.1; Escherichia coli HVH 59 (4-1119338) - AVUQ - GCA_000456885.1; Escherichia coli HVH 222 (4-2977443) - AVZU - GCA_000459455.1; Escherichia coli UMEA 3140-1 - AWBK - GCA_000460295.1; Escherichia coli HVH 178 (4-3189163) - AVYG - GCA_000495055.1; Escherichia coli KTE4 - ANSO - GCA_000350645.1; Escherichia coli KTE3 - ASTO - GCA_000407685.1; Escherichia coli KTE240 - ASUS - GCA_000408305.1; Escherichia coli BIDMC 49b - JAPT - GCA_000522365.1; Escherichia coli BIDMC 49a - JAPU - GCA_000522385.1; Escherichia coli APEC O1 - - GCA_000014845.1; Escherichia coli DSM 30083 = JCM 1649 = ATCC 11775 - BAIM - GCA_000613265.1; Escherichia coli JCM 20135 - BAKV - GCA_000614505.1; Escherichia coli DSM 30083 = JCM 1649 = ATCC 11775 - AGSE - GCA_000690815.1; Escherichia coli DSM 30083 = JCM 1649 = ATCC 11775 - JMST - GCA_000734955.1; Escherichia coli HVH 214 (4-3062198) - AZJN - GCA_000507665.1; Escherichia coli UMEA 3162-1 - AWBU - GCA_000460475.1; Escherichia coli HVH 191 (3-9341900) - AVYR - GCA_000458875.1; Escherichia coli HVH 170 (4-3026949) - AVYA - GCA_000458555.1; Escherichia coli S88 - - GCA_000026285.1; Escherichia coli UMEA 3893-1 - AWEI - GCA_000461775.1; Escherichia coli HVH 217 (4-1022806) - AVZQ - GCA_000459375.1; Escherichia coli KTE5 - ANSP - GCA_000350665.1; Escherichia coli KTE7 - ASTP - GCA_000407705.1; Escherichia coli HVH 32 (4-3773988) - AVTX - GCA_000456505.1; Escherichia coli UMEA 3206-1 - AWCK - GCA_000460795.1; Escherichia coli UMEA 3203-1 - AWCJ - GCA_000460775.1; Escherichia coli KTE62 - ANUK - GCA_000351605.1; Escherichia coli KTE27 - ASTY - GCA_000407885.1; Escherichia coli cloneA_i1 - AEYT - GCA_000233675.2; Escherichia coli 597 - AYQU - GCA_000503475.1; Escherichia coli HVH 203 (4-3126218) - AVZD - GCA_000459115.1; Escherichia coli UMEA 3702-1 - AWDZ - GCA_000461595.1; Escherichia coli UMEA 3662-1 - AWDU - GCA_000461495.1; Escherichia coli HVH 5 (4-7148410) - AVTB - GCA_000456085.1; Escherichia coli HVH 102 (4-6906788) - AVVR - GCA_000465155.1; Escherichia coli HVH 201 (4-4459431) - AVZB - GCA_000459075.1; Escherichia coli HM605 - AJWU - GCA_000264175.1; Escherichia coli HM605 - CADZ - GCA_000285375.1
  • 23.
  • 24.
  • 26.
  • 28.
  • 29. Assembly for sample SAMN02727350
        Type                                Number of contigs   Sum of contig lengths
        Full assembly                       667                 5251272
        contigs with Listeria hits          37                  3031650
        contigs with Staphylococcus hits    630                 2203573
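A mixed sample such as this one could be flagged by checking how assembled bases split across genera. The Python sketch below uses the contig counts and lengths from the table above. The function names and the 5% minor-genus cutoff are hypothetical, and the slides do not say how the pipeline makes this call. Fractions are taken relative to bases on contigs that had a genus hit.

```python
def genus_base_fractions(contig_hits):
    """Map genus -> share of assembled bases, from genus -> (contigs, bases)."""
    total = sum(bases for _, bases in contig_hits.values())
    return {genus: bases / total for genus, (_, bases) in contig_hits.items()}

def looks_mixed(contig_hits, min_minor_fraction=0.05):
    """True when the second most abundant genus holds a notable share of bases.

    min_minor_fraction is an illustrative cutoff, not a pipeline setting.
    """
    fractions = sorted(genus_base_fractions(contig_hits).values(), reverse=True)
    return len(fractions) > 1 and fractions[1] >= min_minor_fraction

# Values from the table for sample SAMN02727350: genus -> (contigs, summed length).
sample = {"Listeria": (37, 3031650), "Staphylococcus": (630, 2203573)}
mixed = looks_mixed(sample)
```

Here the Staphylococcus contigs are numerous and short, yet they carry roughly two fifths of the assigned bases, so the sample is flagged.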
  • 31.
  • 33.
  • 34. Table: Assembly stats for SAMN02693748
        measurement                   result
        num_input_reads               4212706
        aligned_reads                 4040070
        assembly_num_bases            3180478
        assembly_num_contigs          50
        assembly_N50                  2817733
        poor_quality_support_bases    132321
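For readers unfamiliar with these measurements, the Python sketch below shows how N50 is conventionally computed from contig lengths and derives two ratios from the table values. The `n50` helper and the two derived ratios are illustrative; the slides report only the raw counts and do not say how the pipeline summarizes them.

```python
def n50(contig_lengths):
    """Length L such that contigs of length >= L cover at least half the assembly."""
    lengths = sorted(contig_lengths, reverse=True)
    if not lengths:
        raise ValueError("no contigs")
    half = sum(lengths) / 2
    running = 0
    for length in lengths:
        running += length
        if running >= half:
            return length

# Values from the table for SAMN02693748.
stats = {
    "num_input_reads": 4212706,
    "aligned_reads": 4040070,
    "assembly_num_bases": 3180478,
    "poor_quality_support_bases": 132321,
}
# Share of input reads that map back to the assembly.
aligned_fraction = stats["aligned_reads"] / stats["num_input_reads"]
# Share of assembled bases with poor quality support.
poor_fraction = stats["poor_quality_support_bases"] / stats["assembly_num_bases"]
```

With these numbers about 96% of reads map back and about 4% of assembled bases have poor support. The reported N50 of 2817733 exceeds half of the 3180478 assembled bases, which means a single contig covers most of the genome.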
  • 35.
  • 36.
  • 37.
  • 38.
  • 39.
  • 40. Organism                                             Biosample       SRA Run       Similarity to:
        Listeria monocytogenes IEH-NGS-LIS-00100             SAMN02567873    SRR1207486    Listeria SLCC7179
                                                                             SRR1220750    Listeria J0161
        Salmonella enterica Enteritidis MDH-2014-00798       SAMN02741943    SRR1553852    Schwarzengrund str. CVM19633
                                                                             SRR1272871    Enteritidis str. P125109
        Salmonella enterica Fluntern MDH-2013-00153          SAMN02378158    SRR1067624    Javiana and Schwarzengrund
                                                                             SRR1395304    Cubana and Agona
  • 41.
  • 42. Proficiency Testing • Replicate results (phylogeny, SNPs) from published studies • Resequencing: same isolate on multiple platforms; same isolate in multiple libraries; same isolate in multiple labs • Blinded submissions: already-characterized isolates; mixed sample isolates; metagenomic isolates • Corner cases: extreme coverage; duplicates; sample mixups
  • 43.
  • 44.
  • 45.
  • 46.
  • 47. Acknowledgements. National Center for Biotechnology Information, National Library of Medicine, Bethesda MD 20892 USA: Richa Agarwala, Azat Badretdin, Slava Brover, Joshua Cherry, Vyacheslav Chetvernin, Robert Cohen, Michael DiCuccio, Mike Feldgarden, Dan Haft, William Klimke, Arjun Prasad, Edward Rice, Kirill Rotmistrovskyy, Stephen Sherry, Sergey Shiryev, Martin Shumway, Tatiana Tatusova, Igor Tolstoy, Chunlin Xiao, Leonid Zaslavsky, Alexander Zasypkin, Alejandro A. Schaffer, Lukas Wagner, Aleksandr Morgulis, David Lipman, James Ostell. NCBI • This research was supported by the Intramural Research Program of the NIH, National Library of Medicine. http://www.ncbi.nlm.nih.gov • Partners: CDC, FDA/CFSAN, NHGRI, UC-Davis, USDA • Vendors: PacBio, Illumina, Roche