Development of FDA MicroDB:
A Regulatory-Grade
Microbial Reference Database
Heike Sichtig, Ph.D.
Division of Microbiology Devices
OIR/CDRH/FDA/HHS
Heike.Sichtig@fda.hhs.gov
U.S. Food and Drug Administration
Institute for Genome Sciences
October 21-22, 2014
NIST Workshop to Identify Standards Needed to Support Pathogen Identification
via Next-Generation Sequencing, NIST, MD
Luke Tallon
Genomics Resource Center
Institute for Genome Sciences
UMSOM
ljtallon@som.umaryland.edu
2
Microbial NGS-Based Diagnostic Devices
• OIR/DMD working on a fast-tracked Draft Guidance
• Held a public workshop on April 1, 2014:
“Advancing Regulatory Science for High Throughput Sequencing
Devices for Microbial Identification and Detection of Antimicrobial
Resistance Markers” [FR Doc No: 2014-04940]
• Workshop agenda, discussion paper and webcast online:
http://www.fda.gov/MedicalDevices/NewsEvents/WorkshopsConferences/ucm386967.htm
Objectives:
1. Streamline/shorten clinical trials for microbial diagnosis/identification
2. Establish a new comparator algorithm for assays developed using this
new technology
3. Develop regulatory science standards for microbial genome sequencing
4. Investigate the regulatory science required for antimicrobial resistance
determination through microbial genome sequence information.
3
Inter-Agency Working Group on Feasibility
Approach:
• Formed a diverse working group: FDA, NIH-NCBI, NIAID, DTRA,
LLNL, and CDC
• Conducted a small pilot study to generate information for evaluating the
quality of existing sequences in the public domain (in progress)
• Identify the pre-existing high-quality deposits, and build from
there
• Will use information to set quality bar for sequence outputs for
our ongoing sequencing efforts
• Utilized existing standards (where available) for technical and isolate
metadata – no need to reinvent
• Attention given to connecting antimicrobial resistance
phenotype to genomic deposits – clinical collection site
– Multiple levels of Reference DBs likely
• “High quality” genomes only
– For validation and clinical use
• “High quality” + other available genomes
– For testing and development
• Requires definition of “high quality” that must include
some draft genomes
– Extensive screening required
• Human and other hosts; chimeras
• Artificial constructs
– Separate bacterial, viral, fungal reference DBs
– Publicly available (NCBI/EMBL/DDBJ)
4
Looking ahead: Predictions for Reference Databases
Courtesy of Tom Slezak
5
Robust, Standardized, and High Quality Microbial
Sequence Database in the Public Sector
Current Need
Cover illustration
(Copyright © 2009, American Society
for Microbiology. All Rights Reserved.)
• Representative Samples
• Metadata
• High quality raw sequences
• Assemblies
• Annotation
• Public Domain
6
Latest NCBI Genbank Report on Bacterial Genome Growth
[Figure: Bacterial Genomes Report, counts of #Genomes and #Real Species by date, Jul-98 to Dec-14 (y-axis: count, 0 to 25,000)]
Courtesy of NCBI
7
Microbial Reference Database (MicroDB) ($1.67M)
• Identify “gaps” and target sequencing efforts (funding awarded by FDA/OCET)
• All raw reads, assemblies, annotations, metadata sent to NCBI and
accessible to the PUBLIC
• Traceable results that could be reevaluated as necessary
>600 clinically relevant and MCM microorganisms
Highly controlled and documented approach
Collaborations with Clinical Labs and Repositories
• Children’s National Hospital
• DoD Critical Reagents Program (CRP, USAMRIID)
• FDA-CFSAN, FDA-CBER, FDA-CDER
• DHS National Biodefense Analysis and
Countermeasures Center (NBACC)
• The Rockefeller University
• Culture Collections: ATCC, DSMZ
Sequencing Center (UMD IGS)
• Hybrid Approach (PacBio and Illumina)
• Deposit of Raw Reads at NCBI (SRA)
• Deposit of Assemblies at NCBI
• Deposit of Annotations at NCBI
• FDA Interface to Access Data
MicroDB Requirements
A. Extracted Genomic DNA (gDNA)
– Extracted gDNA should be of high quality and purity, and at sufficient concentration to
achieve a suitable yield to assure adequate depth and breadth of genomic coverage for
the type of sequencing method employed.
B. BioSample Metadata
– A minimal description of the isolate source material is necessary for traceability. We are
using 14 descriptors as outlined below. (Note: Minimal metadata is modeled in part after
NCBI’s minimal pathogen template)
– Unique ID, organism, strain/isolate, sample site, specimen type, host disease, collection
date, collected by, patient age, gender, geographic location, AST method*, AST method
manufacturer*, Antimicrobial Susceptibilities*
C. Sequencing Data
– The minimum requirement for sequencing data is that the generated raw reads should be
deposited in NCBI’s Sequence Read Archive (SRA) and assemblies should be deposited
at NCBI’s Assembly division. The availability of raw reads and assemblies will provide a
pathway to re-analyze the data as newer technologies emerge. Furthermore, annotation
data should be deposited when available.
– Raw reads, assemblies, annotations*
8
*not used as a criterion for exclusion
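The 14 BioSample descriptors in section B lend themselves to a mechanical pre-submission check. The sketch below is a hypothetical illustration, not MicroDB's actual tooling: the snake_case field names are our own renderings of the descriptors listed above, and it assumes that every unstarred descriptor is required, while (per the footnote) the three starred AST fields are reported but never exclude a record.

```python
from typing import Dict, List, Optional

# Hypothetical snake_case names for the 11 unstarred BioSample descriptors.
REQUIRED_FIELDS: List[str] = [
    "unique_id", "organism", "strain_isolate", "sample_site", "specimen_type",
    "host_disease", "collection_date", "collected_by", "patient_age",
    "gender", "geographic_location",
]
# Starred (*) descriptors: recorded when available, never a reason to exclude.
OPTIONAL_FIELDS: List[str] = [
    "ast_method", "ast_method_manufacturer", "antimicrobial_susceptibilities",
]


def _missing(record: Dict[str, Optional[str]], fields: List[str]) -> List[str]:
    # A field counts as missing if it is absent, None, or only whitespace.
    return [f for f in fields if not (record.get(f) or "").strip()]


def check_biosample(record: Dict[str, Optional[str]]) -> Dict[str, List[str]]:
    """Split missing descriptors into blocking (required) and informational (starred)."""
    return {
        "missing_required": _missing(record, REQUIRED_FIELDS),
        "missing_optional": _missing(record, OPTIONAL_FIELDS),
    }


def is_eligible(record: Dict[str, Optional[str]]) -> bool:
    """Only the unstarred descriptors decide eligibility (assumption)."""
    return not check_biosample(record)["missing_required"]
```

A record lacking only the AST fields is therefore still eligible, with those fields listed under `missing_optional` so they can be filled in later.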
MicroDB Requirements
D. Sequencing Metadata
– A minimal description of the sequencing process is necessary for traceability. We are
using 7 descriptors as outlined below including bioinformatics tool information for
assembly and annotation, and genomic coverage information.
– Library, platform, submitted by, fold coverage, pipeline, assembler, annotation tool*
E. Suggested phenotypic metadata*
– A description of the phenotypic information is suggested to create a link between the
phenotypic traits of particular organisms and their genomic sequence. We are
recommending 5 descriptors as outlined below (1-4 are also included in sections B and C).
– Annotation, AST method, AST method manufacturer, antimicrobial susceptibilities,
additional phenotypic data
9
*not used as a criterion for exclusion
NCBI Submission Cases
1. Children’s National Medical Center
– Submit all data when available
– Register sample metadata via BioSample
– Submit raw reads and assemblies generated by IGS when available
2. FDA/CFSAN
– Collaborative agreement: Wait for genome announcements
– Follow the same procedures as for 1, but place a 6-month hold on data
release; lift the hold once the genome announcements are out
3. Rockefeller University
– Collaborative agreement: Wait for publication
– Follow the same procedures as for 1, but place a 6-month hold on data
release; lift the hold once the publication is out
Similar agreements in place with other collaborators
depending on their needs
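The submission cases above differ only in when the data become public. A minimal sketch of that release rule follows. It assumes, which the slides do not state, that a hold with no announcement or publication simply expires after six calendar months; the function names are hypothetical, and NCBI's own hold mechanics (for example, extending a hold) may differ.

```python
import calendar
from datetime import date
from typing import Optional


def add_months(d: date, months: int) -> date:
    """Add calendar months, clamping the day to the end of the target month."""
    month_index = d.month - 1 + months
    year = d.year + month_index // 12
    month = month_index % 12 + 1
    last_day = calendar.monthrange(year, month)[1]
    return date(year, month, min(d.day, last_day))


def release_date(submitted: date, published: Optional[date] = None,
                 hold_months: int = 6) -> date:
    """Release at the announcement/publication date or at hold expiry, whichever is first."""
    expiry = add_months(submitted, hold_months)  # assumed hard expiry
    if published is None:
        return expiry
    # Lift the hold early once the collaborator's paper or announcement is out,
    # but never release before the data were submitted.
    return max(submitted, min(published, expiry))
```

Case 1 (Children's National) corresponds to no hold at all, i.e. releasing on the submission date.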
10
Project Approach
• Sequencing in large batches
– Illumina HiSeq paired-end sequencing: >200x
– PacBio long-insert SMRT P4-C2 sequencing: >80-100x
• Assembly
– PacBio only (HGAP, PBcR CA)
– Illumina only (CA, MaSuRCA)
– PacBio/Illumina hybrid (CA)
– Minimal manual QA/QC & curation
• Automated Annotation
• Base modification detection
• Raw reads -> NCBI SRA
• Assembled & annotated genomes -> Genbank
– NCBI BioProject ID: PRJNA231221
• FDA Web interface to aggregate data
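The coverage targets above translate directly into sequencing yield, because average fold coverage is total sequenced bases divided by genome size. For the 2.8 Mbp S. aureus genomes in batch 1, for example, the >200x Illumina target requires at least 200 × 2.8 Mbp = 560 Mbp of reads. The helper names below are hypothetical, and treating the targets as minimums (with 80x as the PacBio floor) is an assumption.

```python
import math

# Slide targets, treated here as minimum average fold coverage (assumption;
# the PacBio range ">80-100x" is reduced to its lower bound).
TARGETS = {"illumina": 200, "pacbio": 80}


def fold_coverage(total_bases: int, genome_size_bp: int) -> float:
    """Average depth: total sequenced bases divided by genome length."""
    if genome_size_bp <= 0:
        raise ValueError("genome size must be positive")
    return total_bases / genome_size_bp


def bases_needed(target_fold: float, genome_size_bp: int) -> int:
    """Minimum yield in bp to reach target_fold average coverage."""
    return math.ceil(target_fold * genome_size_bp)


def meets_target(platform: str, total_bases: int, genome_size_bp: int) -> bool:
    """True if the run's average coverage reaches the platform's target."""
    return fold_coverage(total_bases, genome_size_bp) >= TARGETS[platform]
```

This is an average only; local depth varies along the genome, which is one reason the targets sit well above the depth needed for a consensus call.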
Progress - Batch 1
Rockefeller (50)
• Uniform sample set
– Staphylococcus aureus
– 2.8Mbp genome size
– 32.8 %GC
– Significant metadata
CNH/CFSAN (41)
• Diverse sample set
– 18 genera represented
– 2 – 8 Mbp genome size range
– 38 – 67 %GC range
(Images: Wikimedia Commons)
NCBI BioProject: PRJNA231221
Rockefeller Samples
• Sequencing
– Avg Illumina cvg: 578x
– Avg PacBio cvg: 185x
– 1 or 2 SMRT cells each
• Assembly:
– 32 of 50 with the chromosome in a single contig
– Average contig count = 5
– “Best” assembly:
• HGAP = 29
• CA hybrid = 21
• Most differences subtle
• Annotation complete
• Final QC & data submissions underway
CNH/CFSAN Samples
• Sequencing
– Avg Illumina cvg: 315x
– Avg PacBio cvg: 167x
• 2 SMRT cells each
• Assembly
– 12 of 41 with the chromosome in a single contig
• 29 in <= 5 contigs
– Avg contig count = 4.5
– Median contig count = 3
– “Best” assembly (of 41):
• HGAP = 24
• PBcR CA = 14
• CA hybrid = 3
• Annotation underway
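The batch statistics on the Rockefeller and CNH/CFSAN slides (single-contig chromosomes, samples in ≤5 contigs, mean and median contig count, tally of the “best” assembler) can be reproduced from a per-sample table. A minimal sketch follows. The `SampleAssembly` fields are hypothetical, and it assumes that “single contig chromosome” and “≤5 contigs” refer to the chromosome while the mean and median are taken over all contigs in the assembly, which the slides do not spell out.

```python
from collections import Counter
from dataclasses import dataclass
from statistics import mean, median
from typing import Dict, List


@dataclass
class SampleAssembly:
    """Hypothetical per-sample record; field names are illustrative only."""
    sample_id: str
    chromosome_contigs: int  # contigs spanning the chromosome
    total_contigs: int       # chromosome plus plasmid/unplaced contigs
    best_assembler: str      # e.g. "HGAP", "PBcR CA", "CA hybrid"


def summarize_batch(samples: List[SampleAssembly]) -> Dict[str, object]:
    """Batch-level assembly summary in the style of the progress slides."""
    if not samples:
        raise ValueError("empty batch")
    totals = [s.total_contigs for s in samples]
    return {
        "samples": len(samples),
        "single_contig_chromosome": sum(s.chromosome_contigs == 1 for s in samples),
        "chromosome_le_5_contigs": sum(s.chromosome_contigs <= 5 for s in samples),
        "mean_contigs": mean(totals),
        "median_contigs": median(totals),
        "best_assembly": Counter(s.best_assembler for s in samples),
    }
```

The `best_assembly` counter should sum to the batch size, which gives a quick consistency check (24 + 14 + 3 = 41 for CNH/CFSAN, 29 + 21 = 50 for Rockefeller).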
Assembly QC & Curation
[Figure: dot plots (%similarity) of ROCK_290 assemblies vs. reference gi|374362062|gb|CP003033.1|]
• CA8 – Illumina/PacBio hybrid: largest contig 2,759,091 bp; total assembly 2,770,822 bp
• HGAP2 (five Quiver-polished scaffolds): largest contig 2,128,476 bp; total assembly 2,802,621 bp
[Figure: dot plot (%similarity) of a single-scaffold HGAP2 assembly vs. reference gi|595636499|gb|CP007454.1|]
• HGAP2: largest contig 2,764,709 bp; total assembly 2,764,709 bp
• 4 bp overlap? 1X coverage: TAAC vs. TAGC
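The last panel flags a possible 4 bp overlap at the contig ends, with only 1X coverage behind the competing TAAC and TAGC sequences. A common curation step for a circular chromosome is to check whether the end of the contig repeats its start and, if so, trim the duplicate. The sketch below is a simplified, hypothetical version of that check using exact string matching; it is not the pipeline used in this project, and real curation aligns the ends with mismatch tolerance and confirms with read evidence.

```python
def terminal_overlap(seq: str, min_len: int = 4, max_len: int = 10_000) -> int:
    """Length of the longest exact overlap where the contig's end repeats its start.

    Returns 0 if no overlap of at least min_len bases exists. Short overlaps
    can arise by chance, so any hit should be confirmed with read evidence.
    """
    s = seq.upper()
    longest = min(max_len, len(s) - 1)  # a proper overlap is shorter than the contig
    for k in range(longest, max(min_len, 1) - 1, -1):
        if s[-k:] == s[:k]:
            return k
    return 0


def trim_circular(seq: str, min_len: int = 4, max_len: int = 10_000) -> str:
    """Drop the duplicated end so each base of the circle appears once."""
    k = terminal_overlap(seq, min_len, max_len)
    return seq[:-k] if k else seq
```

With ends reading TAAC and TAAC, the 4 bp overlap is found and trimmed; with TAGC at the start and TAAC at the end, a single base difference hides the overlap from an exact-match check, which is why a 1X-coverage call like the one above needs manual review.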
Challenges & Opportunities
• Sample acquisition & quality
• Efficiency/throughput vs. accuracy/quality
– Sequencing strategy
– Assembly QA/QC & curation
• Ever longer reads!
– Reduced coverage -> higher efficiency sequencing
– More “closed” genomes!
• Small plasmids
– SageELF & Illumina
18
Thank You
LLNL
Tom Slezak
NIH-NCBI
Bill Klimke, Martin Shumway, David Lipman
NIH-NIAID
Vivien Dugan, Maria Giovani
DTRA
Matt Tobelmann, Chris Detter, Eric VanGieson, Nels Olsen
CDC
Duncan MacCannell
FDA-CFSAN
Maria Hoffmann, Cary Pirone, Andrea Ottessen, Marc Allard, Eric Brown
NMRC
Kim Bishop-Lilly, Ken Frey
FDA Micro Team
Peyton Hobson, Brittany Goldberg, Kevin Snyder, Tamara Feldblyum, Uwe Scherf, Sally Hojvat
Collaborators
IGS@UMD
Lisa Sadzewicz, Luke Tallon, Naomi Sengamalay, Al Godinez, Sandy Ott, Sushma Nagaraj, Claire Fraser
Rockefeller University
Bryan Utter, Douglas Deutsch
Children’s National Medical Center
Brittany Goldberg, Joseph Campos
DOD-CRP
Shanmuga Sozhamannan, Mike Smith
DOD-USAMRIID
Tim Minogue
NBACC
Adam Phillippy, Nick Bergman
ATCC
Liz Kerrigan
DSMZ
Cathrin Sproer

Physiochemical properties of nanomaterials and its nanotoxicity.pptxAArockiyaNisha
 
SOLUBLE PATTERN RECOGNITION RECEPTORS.pptx
SOLUBLE PATTERN RECOGNITION RECEPTORS.pptxSOLUBLE PATTERN RECOGNITION RECEPTORS.pptx
SOLUBLE PATTERN RECOGNITION RECEPTORS.pptxkessiyaTpeter
 
Hubble Asteroid Hunter III. Physical properties of newly found asteroids
Hubble Asteroid Hunter III. Physical properties of newly found asteroidsHubble Asteroid Hunter III. Physical properties of newly found asteroids
Hubble Asteroid Hunter III. Physical properties of newly found asteroidsSérgio Sacani
 

Recently uploaded (20)

Call Girls in Munirka Delhi 💯Call Us 🔝9953322196🔝 💯Escort.
Call Girls in Munirka Delhi 💯Call Us 🔝9953322196🔝 💯Escort.Call Girls in Munirka Delhi 💯Call Us 🔝9953322196🔝 💯Escort.
Call Girls in Munirka Delhi 💯Call Us 🔝9953322196🔝 💯Escort.
 
G9 Science Q4- Week 1-2 Projectile Motion.ppt
G9 Science Q4- Week 1-2 Projectile Motion.pptG9 Science Q4- Week 1-2 Projectile Motion.ppt
G9 Science Q4- Week 1-2 Projectile Motion.ppt
 
Spermiogenesis or Spermateleosis or metamorphosis of spermatid
Spermiogenesis or Spermateleosis or metamorphosis of spermatidSpermiogenesis or Spermateleosis or metamorphosis of spermatid
Spermiogenesis or Spermateleosis or metamorphosis of spermatid
 
GFP in rDNA Technology (Biotechnology).pptx
GFP in rDNA Technology (Biotechnology).pptxGFP in rDNA Technology (Biotechnology).pptx
GFP in rDNA Technology (Biotechnology).pptx
 
Traditional Agroforestry System in India- Shifting Cultivation, Taungya, Home...
Traditional Agroforestry System in India- Shifting Cultivation, Taungya, Home...Traditional Agroforestry System in India- Shifting Cultivation, Taungya, Home...
Traditional Agroforestry System in India- Shifting Cultivation, Taungya, Home...
 
Biological Classification BioHack (3).pdf
Biological Classification BioHack (3).pdfBiological Classification BioHack (3).pdf
Biological Classification BioHack (3).pdf
 
Nightside clouds and disequilibrium chemistry on the hot Jupiter WASP-43b
Nightside clouds and disequilibrium chemistry on the hot Jupiter WASP-43bNightside clouds and disequilibrium chemistry on the hot Jupiter WASP-43b
Nightside clouds and disequilibrium chemistry on the hot Jupiter WASP-43b
 
Engler and Prantl system of classification in plant taxonomy
Engler and Prantl system of classification in plant taxonomyEngler and Prantl system of classification in plant taxonomy
Engler and Prantl system of classification in plant taxonomy
 
Luciferase in rDNA technology (biotechnology).pptx
Luciferase in rDNA technology (biotechnology).pptxLuciferase in rDNA technology (biotechnology).pptx
Luciferase in rDNA technology (biotechnology).pptx
 
NAVSEA PEO USC - Unmanned & Small Combatants 26Oct23.pdf
NAVSEA PEO USC - Unmanned & Small Combatants 26Oct23.pdfNAVSEA PEO USC - Unmanned & Small Combatants 26Oct23.pdf
NAVSEA PEO USC - Unmanned & Small Combatants 26Oct23.pdf
 
Animal Communication- Auditory and Visual.pptx
Animal Communication- Auditory and Visual.pptxAnimal Communication- Auditory and Visual.pptx
Animal Communication- Auditory and Visual.pptx
 
Disentangling the origin of chemical differences using GHOST
Disentangling the origin of chemical differences using GHOSTDisentangling the origin of chemical differences using GHOST
Disentangling the origin of chemical differences using GHOST
 
Analytical Profile of Coleus Forskohlii | Forskolin .pdf
Analytical Profile of Coleus Forskohlii | Forskolin .pdfAnalytical Profile of Coleus Forskohlii | Forskolin .pdf
Analytical Profile of Coleus Forskohlii | Forskolin .pdf
 
Call Us ≽ 9953322196 ≼ Call Girls In Mukherjee Nagar(Delhi) |
Call Us ≽ 9953322196 ≼ Call Girls In Mukherjee Nagar(Delhi) |Call Us ≽ 9953322196 ≼ Call Girls In Mukherjee Nagar(Delhi) |
Call Us ≽ 9953322196 ≼ Call Girls In Mukherjee Nagar(Delhi) |
 
Nanoparticles synthesis and characterization​ ​
Nanoparticles synthesis and characterization​  ​Nanoparticles synthesis and characterization​  ​
Nanoparticles synthesis and characterization​ ​
 
Orientation, design and principles of polyhouse
Orientation, design and principles of polyhouseOrientation, design and principles of polyhouse
Orientation, design and principles of polyhouse
 
VIRUSES structure and classification ppt by Dr.Prince C P
VIRUSES structure and classification ppt by Dr.Prince C PVIRUSES structure and classification ppt by Dr.Prince C P
VIRUSES structure and classification ppt by Dr.Prince C P
 
Physiochemical properties of nanomaterials and its nanotoxicity.pptx
Physiochemical properties of nanomaterials and its nanotoxicity.pptxPhysiochemical properties of nanomaterials and its nanotoxicity.pptx
Physiochemical properties of nanomaterials and its nanotoxicity.pptx
 
SOLUBLE PATTERN RECOGNITION RECEPTORS.pptx
SOLUBLE PATTERN RECOGNITION RECEPTORS.pptxSOLUBLE PATTERN RECOGNITION RECEPTORS.pptx
SOLUBLE PATTERN RECOGNITION RECEPTORS.pptx
 
Hubble Asteroid Hunter III. Physical properties of newly found asteroids
Hubble Asteroid Hunter III. Physical properties of newly found asteroidsHubble Asteroid Hunter III. Physical properties of newly found asteroids
Hubble Asteroid Hunter III. Physical properties of newly found asteroids
 

Development of FDA MicroDB: A Regulatory-Grade Microbial Reference Database

1. Development of FDA MicroDB: A Regulatory-Grade Microbial Reference Database
  Heike Sichtig, Ph.D., Division of Microbiology Devices, OIR/CDRH/FDA/HHS, U.S. Food and Drug Administration (Heike.Sichtig@fda.hhs.gov)
  Luke Tallon, Genomics Resource Center, Institute for Genome Sciences, UMSOM (ljtallon@som.umaryland.edu)
  NIST Workshop to Identify Standards Needed to Support Pathogen Identification via Next-Generation Sequencing, NIST, MD, October 21-22, 2014
2. Microbial NGS-Based Diagnostic Devices
  • OIR/DMD is working on a fast-tracked Draft Guidance
  • Public Workshop held on April 1, 2014: “Advancing Regulatory Science for High Throughput Sequencing Devices for Microbial Identification and Detection of Antimicrobial Resistance Markers” [FR Doc No: 2014-04940]
  • Workshop agenda, discussion paper, and webcast online: http://www.fda.gov/MedicalDevices/NewsEvents/WorkshopsConferences/ucm386967.htm
  Objectives:
  1. Streamline/shorten clinical trials for microbial diagnosis/identification
  2. Establish a new comparator algorithm for assays developed using this new technology
  3. Develop regulatory science standards for microbial genome sequencing
  4. Investigate the regulatory science required for antimicrobial resistance determination through microbial genome sequence information
3. Inter-Agency Working Group on Feasibility
  Approach:
  • Formed a diverse working group: FDA, NIH-NCBI, NIAID, DTRA, LLNL, and CDC
  • Conducting a small pilot study to generate information for evaluating the quality of existing sequences in the public domain (in progress)
  • Identify the pre-existing high-quality deposits and build from there
  • Will use this information to set a quality bar for the sequence outputs of our ongoing sequencing efforts
  • Used existing standards (where available) for technical and isolate metadata; no need to reinvent
  • Attention given to connecting antimicrobial resistance phenotype to genomic deposits
    – Clinical collection site
4. Looking ahead: Predictions for Reference Databases (courtesy of Tom Slezak)
  – Multiple levels of reference DBs likely
    • “High quality” genomes only: for validation and clinical use
    • “High quality” + other available genomes: for testing and development
  – Requires a definition of “high quality” that must include some draft genomes
  – Extensive screening required
    • Human and other hosts; chimeras
    • Artificial constructs
  – Separate bacterial, viral, and fungal reference DBs
  – Publicly available (NCBI/EMBL/DDBJ)
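A minimal Python sketch of the tiered layout predicted above, assuming each genome record already carries a pass/fail “high quality” verdict (the slide notes that this definition is still open) and a set of screening flags. The `GenomeRecord` class, the flag names, and the tier labels are hypothetical, not part of any NCBI or FDA interface.

```python
from dataclasses import dataclass, field

# Hypothetical flag names for the screening the slide calls for:
# host (human and other) sequence, chimeras, and artificial constructs.
SCREEN_FLAGS = {"host_contamination", "chimera", "artificial_construct"}


@dataclass
class GenomeRecord:
    accession: str            # placeholder identifier
    kingdom: str              # "bacterial", "viral" or "fungal"
    high_quality: bool        # verdict of a quality test still to be defined
    flags: set = field(default_factory=set)


def build_reference_dbs(records):
    """Group accessions into (kingdom, tier) reference databases."""
    dbs = {}
    for rec in records:
        if rec.flags & SCREEN_FLAGS:
            continue  # failed screening: excluded from every tier
        # Testing/development tier: "high quality" plus other available genomes.
        dbs.setdefault((rec.kingdom, "development"), []).append(rec.accession)
        # Validation/clinical tier: "high quality" genomes only.
        if rec.high_quality:
            dbs.setdefault((rec.kingdom, "clinical"), []).append(rec.accession)
    return dbs
```

Records that fail screening are dropped from every tier, and keying on kingdom keeps the bacterial, viral, and fungal databases separate, as the slide predicts.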
5. Current Need: Robust, Standardized, and High-Quality Microbial Sequence Database in the Public Sector
  • Representative samples
  • Metadata
  • High-quality raw sequences
  • Assemblies
  • Annotation
  • Public domain
  (Cover illustration: Copyright © 2009, American Society for Microbiology. All Rights Reserved.)
6. Latest NCBI GenBank Report on Bacterial Genome Growth
  [Figure: Bacterial Genomes Report; count (0 to 25,000) of #Genomes and #Real Species by date, Jul-98 to Dec-14. Courtesy of NCBI.]
7. Microbial Reference Database (MicroDB) ($1.67M)
  • Identify “gaps” and target sequencing efforts (funding awarded by FDA/OCET)
  • All raw reads, assemblies, annotations, and metadata sent to NCBI and accessible to the public
  • Traceable results that can be reevaluated as necessary
  >600 clinically relevant and MCM microorganisms
  Highly controlled and documented approach
  Collaborations with clinical labs and repositories:
  • Children’s National Hospital
  • DoD Critical Reagents Program (CRP, USAMRIID)
  • FDA-CFSAN, FDA-CBER, FDA-CDER
  • DHS National Biodefense Analysis and Countermeasures Center (NBACC)
  • The Rockefeller University
  • Culture collections: ATCC, DSMZ
  Sequencing center (UMD IGS):
  • Hybrid approach (PacBio and Illumina)
  • Deposit of raw reads at NCBI (SRA)
  • Deposit of assemblies at NCBI
  • Deposit of annotations at NCBI
  • FDA interface to access data
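Identifying “gaps” amounts to comparing a target organism list against what is already well covered in public databases. The sketch below assumes a hypothetical per-organism count of existing high-quality deposits and a hypothetical minimum threshold; the slides do not describe how MicroDB actually scored gaps.

```python
def find_gaps(targets, public_counts, min_genomes=1):
    """Return target organisms with fewer than `min_genomes` good public genomes.

    targets: organism names on a (hypothetical) target list.
    public_counts: organism name -> count of existing high-quality deposits.
    min_genomes: hypothetical threshold for "adequately covered".
    """
    gaps = [org for org in targets if public_counts.get(org, 0) < min_genomes]
    # Least-covered organisms first, ties broken alphabetically.
    return sorted(gaps, key=lambda org: (public_counts.get(org, 0), org))


# Toy example with placeholder names and counts.
print(find_gaps(["org_a", "org_b", "org_c"], {"org_a": 5, "org_b": 1}, min_genomes=2))
# ['org_c', 'org_b']
```

Sorting by existing coverage is one simple way to turn the gap list into a sequencing priority order.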
8. MicroDB Requirements
  A. Extracted Genomic DNA (gDNA)
    – Extracted gDNA should be of high quality and purity, and at a concentration sufficient to give a yield that assures adequate depth and breadth of genomic coverage for the sequencing method employed.
  B. BioSample Metadata
    – A minimal description of the isolate source material is necessary for traceability. We use the 14 descriptors below. (Note: the minimal metadata are modeled in part after NCBI’s minimal pathogen template.)
    – Unique ID, organism, strain/isolate, sample site, specimen type, host disease, collection date, collected by, patient age, gender, geographic location, AST method*, AST method manufacturer*, antimicrobial susceptibilities*
  C. Sequencing Data
    – At a minimum, the generated raw reads should be deposited in NCBI’s Sequence Read Archive (SRA) and the assemblies in NCBI’s Assembly division. Having both raw reads and assemblies available provides a pathway to reanalyze the data as newer technologies emerge. Annotation data should also be deposited when available.
    – Raw reads, assemblies, annotations*
  *Not used as a criterion for exclusion
9. MicroDB Requirements
  D. Sequencing Metadata
    – A minimal description of the sequencing process is necessary for traceability. We use the 7 descriptors below, which include bioinformatics tool information for assembly and annotation as well as genomic coverage information.
    – Library, platform, submitted by, fold coverage, pipeline, assembler, annotation tool*
  E. Suggested Phenotypic Metadata*
    – A description of the phenotypic information is suggested to link the phenotypic traits of particular organisms to their genomic sequence. We recommend the 5 descriptors below (1-4 are also included in sections B and C).
    – Annotation, AST method, AST method manufacturer, antimicrobial susceptibilities, additional phenotypic data
  *Not used as a criterion for exclusion
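The requirement lists in sections B, C, and D can be read as an inclusion check in which only the starred items may be missing. This is a minimal sketch under that reading; the snake_case field names are hypothetical stand-ins for the descriptors, not NCBI BioSample attribute names.

```python
# Hypothetical keys paraphrasing the descriptors; starred items are optional.
BIOSAMPLE_REQUIRED = [
    "unique_id", "organism", "strain_isolate", "sample_site", "specimen_type",
    "host_disease", "collection_date", "collected_by", "patient_age",
    "gender", "geographic_location",
]
BIOSAMPLE_OPTIONAL = ["ast_method", "ast_method_manufacturer",
                      "antimicrobial_susceptibilities"]
SEQ_DATA_REQUIRED = ["raw_reads", "assemblies"]
SEQ_DATA_OPTIONAL = ["annotations"]
SEQ_META_REQUIRED = ["library", "platform", "submitted_by",
                     "fold_coverage", "pipeline", "assembler"]
SEQ_META_OPTIONAL = ["annotation_tool"]

REQUIRED = BIOSAMPLE_REQUIRED + SEQ_DATA_REQUIRED + SEQ_META_REQUIRED
OPTIONAL = BIOSAMPLE_OPTIONAL + SEQ_DATA_OPTIONAL + SEQ_META_OPTIONAL


def check_record(record):
    """Return (include, missing_required, missing_optional) for one isolate.

    A field counts as missing if it is absent, None, or an empty string.
    Only required fields can exclude a record; optional gaps are just reported.
    """
    def missing(keys):
        return [k for k in keys if record.get(k) in (None, "")]

    missing_required = missing(REQUIRED)
    return not missing_required, missing_required, missing(OPTIONAL)
```

Reporting missing optional fields separately keeps the starred items visible for follow-up without letting them block inclusion.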
10. NCBI Submission Cases
  1. Children’s National Medical Center
    – Submit all data when available
    – Register sample metadata via BioSample
    – Submit raw reads and assemblies generated by IGS when available
  2. FDA/CFSAN
    – Collaborative agreement: wait for genome announcements
    – Follow the same procedures as for case 1, place a 6-month hold on data release, and lift the hold when the genome announcements are out
  3. Rockefeller University
    – Collaborative agreement: wait for publication
    – Follow the same procedures as for case 1, place a 6-month hold on data release, and lift the hold when the publication is out
  Similar agreements are in place with other collaborators, depending on their needs.
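A sketch of the release timing in cases 2 and 3, assuming the data become public at whichever comes first: expiry of the 6-month hold or the genome announcement/publication. The function names and that “whichever comes first” rule are assumptions for illustration, not NCBI’s documented hold behavior.

```python
import calendar
from datetime import date


def add_months(d, months):
    """Shift a date by whole months, clamping to the last day of the month."""
    month_index = d.month - 1 + months
    year, month = d.year + month_index // 12, month_index % 12 + 1
    day = min(d.day, calendar.monthrange(year, month)[1])
    return date(year, month, day)


def release_date(submitted, publication=None, hold_months=6):
    """Date a held submission becomes public under the assumed rule."""
    hold_expires = add_months(submitted, hold_months)
    if publication is None:
        return hold_expires
    # The hold is lifted early once the announcement/publication is out.
    return min(hold_expires, publication)
```

The month arithmetic clamps the day, so a hold placed on August 31 expires on the last day of February rather than raising an error.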
11. Project Approach
  • Sequencing in large batches
    – Illumina HiSeq paired-end sequencing: >200x
    – PacBio long-insert SMRT P4-C2 sequencing: >80-100x
  • Assembly
    – PacBio only (HGAP, PBcR CA)
    – Illumina only (CA, MaSuRCA)
    – PacBio/Illumina hybrid (CA)
    – Minimal manual QA/QC & curation
  • Automated annotation
  • Base modification detection
  • Raw reads -> NCBI SRA
  • Assembled & annotated genomes -> GenBank
    – NCBI BioProject ID: PRJNA231221
  • FDA web interface to aggregate data
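The coverage targets above are average fold coverage: total sequenced bases divided by genome size. The following worked sketch assumes idealized coverage (no duplicate or low-quality reads discarded) and a hypothetical 100 bp read length, since the slide does not state one:

```python
import math


def fold_coverage(total_bases, genome_size):
    """Average depth: total sequenced bases divided by genome length."""
    return total_bases / genome_size


def read_pairs_needed(target_coverage, genome_size, read_length):
    """Paired-end read pairs (2 x read_length bases each) for a target depth."""
    bases_needed = target_coverage * genome_size
    return math.ceil(bases_needed / (2 * read_length))


# 2.8 Mbp genome (S. aureus, batch 1) at the >200x Illumina target,
# assuming hypothetical 2 x 100 bp reads.
print(read_pairs_needed(200, 2_800_000, 100))  # 2800000
```

That is 560 Mbp of sequence for one genome. Real runs need a margin on top, because filtering and duplicate removal lower the usable depth.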
12. Progress: Batch 1
  Rockefeller (50)
  • Uniform sample set
    – Staphylococcus aureus
    – 2.8 Mbp genome size
    – 32.8% GC
    – Significant metadata
  CNH/CFSAN (41)
  • Diverse sample set
    – 18 genera represented
    – 2-8 Mbp genome size range
    – 38-67% GC range
  NCBI BioProject: PRJNA231221
  (Images: Wikimedia Commons)
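The %GC figures above are the share of G and C among the called bases of each genome. Here is a minimal sketch, assuming ambiguous bases such as N are left out of the denominator (other tools may count them differently):

```python
def gc_percent(sequence):
    """Percentage of G and C among unambiguous A/C/G/T bases."""
    seq = sequence.upper()
    acgt = sum(seq.count(base) for base in "ACGT")
    if acgt == 0:
        raise ValueError("no A/C/G/T bases in sequence")
    return 100.0 * (seq.count("G") + seq.count("C")) / acgt
```

Applied to a whole assembly (all contigs concatenated), this gives a single genome-level figure comparable to the ones quoted for each sample set.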
13. Rockefeller Samples
  • Sequencing
    – Avg Illumina coverage: 578x
    – Avg PacBio coverage: 185x
    – 1 or 2 SMRT cells each
  • Assembly
    – 32 of 50 with a single-contig chromosome
    – Average contig count = 5
    – “Best” assembly: HGAP = 29, CA hybrid = 21
    – Most differences subtle
  • Annotation complete
  • Final QC & data submissions underway
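The slides do not say how the “best” assembly was chosen, and curation was partly manual. Purely as an illustration, the sketch below picks one assembler per sample with a hypothetical rule (fewest contigs, then longest largest contig) and tallies the winners the way the slide reports them:

```python
from collections import Counter


def pick_best(assemblies):
    """Choose one assembler's result for a sample (hypothetical heuristic).

    assemblies: assembler name -> (contig_count, largest_contig_bp).
    Prefers the fewest contigs, then the longest largest contig.
    """
    return min(assemblies,
               key=lambda name: (assemblies[name][0], -assemblies[name][1]))


def tally_best(samples):
    """Count how often each assembler was picked across samples."""
    return Counter(pick_best(asms) for asms in samples.values())


# Toy values, not real assembly results.
toy = {
    "s1": {"HGAP": (1, 2_800_000), "CA hybrid": (3, 2_700_000)},
    "s2": {"HGAP": (2, 2_500_000), "CA hybrid": (2, 2_600_000)},
}
print(tally_best(toy))
```

A count-based rule like this cannot see misassemblies, which is why comparison against a reference (as in the QC dot plots) still matters.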
14. CNH/CFSAN Samples
  • Sequencing
    – Avg Illumina coverage: 315x
    – Avg PacBio coverage: 167x
    – 2 SMRT cells each
  • Assembly
    – 12 of 41 with a single-contig chromosome
    – 29 in <= 5 contigs
    – Avg contig count = 4.5
    – Median contig count = 3
    – “Best” assembly (of 41): HGAP = 24, PBcR CA = 14, CA hybrid = 3
  • Annotation underway
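The summary figures on this slide (single-contig chromosomes, samples in <= 5 contigs, mean and median contig count) can be computed from per-sample counts. In this sketch each sample is a hypothetical `(chromosome_contigs, total_contigs)` pair, so plasmid contigs count toward the total but not toward the single-contig-chromosome tally:

```python
from statistics import mean, median


def summarize_contig_counts(samples):
    """Batch summary from (chromosome_contigs, total_contigs) pairs."""
    totals = [total for _, total in samples]
    return {
        "samples": len(samples),
        "single_contig_chromosome": sum(1 for chrom, _ in samples if chrom == 1),
        "at_most_5_contigs": sum(1 for t in totals if t <= 5),
        "mean_contigs": mean(totals),
        "median_contigs": median(totals),
    }
```

Reporting the median next to the mean, as the slide does, shows when a few fragmented assemblies pull the average up.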
15. Assembly QC & Curation
  [Figure: dot plots of sample ROCK_290 assemblies against the reference gi|374362062|gb|CP003033.1| (reference coordinates 0-2,500,000 bp; color scale: % similarity, 0-100).]
  • ROCK_290 Celera8 ctg vs. ref (CA8, Illumina/PacBio hybrid; query contig ctg7180000000002): largest contig 2,759,091 bp; total assembly contig length 2,770,822 bp
  • ROCK_290 HGAP2 ctg vs. ref (HGAP2; query scaffolds scf7180000000010 to scf7180000000014|quiver): largest contig 2,128,476 bp; total assembly contig length 2,802,621 bp
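The two statistics quoted for each assembly, largest contig and total contig length, are usually reported alongside N50. Here is a minimal sketch that computes all three from a list of contig lengths (the test input is toy data, not the ROCK_290 contigs):

```python
def assembly_stats(contig_lengths):
    """Largest contig, total length, N50, and share of bases in the largest contig."""
    lengths = sorted(contig_lengths, reverse=True)
    total = sum(lengths)
    if total <= 0:
        raise ValueError("empty assembly")
    running = 0
    for length in lengths:
        running += length
        if 2 * running >= total:  # contigs this long or longer hold >= 50% of bases
            n50 = length
            break
    return {"largest": lengths[0], "total": total, "n50": n50,
            "largest_fraction": lengths[0] / total}
```

Using the slide’s own totals, the largest contig holds about 99.6% of the CA8 hybrid assembly (2,759,091 of 2,770,822 bp) but about 75.9% of the HGAP2 assembly (2,128,476 of 2,802,621 bp).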
17. Challenges & Opportunities
  • Sample acquisition & quality
  • Efficiency/throughput vs. accuracy/quality
    – Sequencing strategy
    – Assembly QA/QC & curation
  • Ever longer reads!
    – Reduced coverage -> higher-efficiency sequencing
    – More “closed” genomes!
  • Small plasmids
    – SageELF & Illumina
18. Thank You
  LLNL: Tom Slezak
  NIH-NCBI: Bill Klimke, Martin Shumway, David Lipman
  NIH-NIAID: Vivien Dugan, Maria Giovani
  DTRA: Matt Tobelmann, Chris Detter, Eric VanGieson, Nels Olsen
  CDC: Duncan MacCannell
  FDA-CFSAN: Maria Hoffmann, Cary Pirone, Andrea Ottessen, Marc Allard, Eric Brown
  NMRC: Kim Bishop-Lilly, Ken Frey
  FDA Micro Team: Peyton Hobson, Brittany Goldberg, Kevin Snyder, Tamara Feldblyum, Uwe Scherf, Sally Hojvat
  Collaborators:
  IGS@UMD: Lisa Sadzewicz, Luke Tallon, Naomi Sengamalay, Al Godinez, Sandy Ott, Sushma Nagaraj, Claire Fraser
  Rockefeller University: Bryan Utter, Douglas Deutsch
  Children’s National Medical Center: Brittany Goldberg, Joseph Campos
  DOD-CRP: Shanmuga Sozhamannan, Mike Smith
  DOD-USAMRIID: Tim Minogue
  NBACC: Adam Phillippy, Nick Bergman
  ATCC: Liz Kerrigan
  DSMZ: Cathrin Sproer