"Development of FDA MicroDB: A Regulatory-Grade Microbial Reference Database" presentation at the Standards for Pathogen Identification via NGS (SPIN) workshop hosted by the National Institute of Standards and Technology (NIST) in October 2014, by Heike Sichtig, PhD, from the FDA and Luke Tallon from IGS UMSOM.
1. Development of FDA MicroDB: A Regulatory-Grade Microbial Reference Database
NIST Workshop to Identify Standards Needed to Support Pathogen Identification via Next-Generation Sequencing, NIST, MD
October 21-22, 2014
Heike Sichtig, Ph.D.
Division of Microbiology Devices
OIR/CDRH/FDA/HHS
U.S. Food and Drug Administration
Heike.Sichtig@fda.hhs.gov
Luke Tallon
Genomics Resource Center
Institute for Genome Sciences
UMSOM
ljtallon@som.umaryland.edu
2. Microbial NGS-Based Diagnostic Devices
• OIR/DMD is working on a fast-tracked draft guidance
• Held a public workshop on April 1, 2014:
“Advancing Regulatory Science for High Throughput Sequencing
Devices for Microbial Identification and Detection of Antimicrobial
Resistance Markers” [FR Doc No: 2014-04940]
• Workshop agenda, discussion paper, and webcast are available online:
http://www.fda.gov/MedicalDevices/NewsEvents/WorkshopsConferences/ucm386967.htm
Objectives:
1. Streamline/shorten clinical trials for microbial diagnosis/identification
2. Establish a new comparator algorithm for assays developed using this
new technology
3. Develop regulatory science standards for microbial genome sequencing
4. Investigate the regulatory science required for antimicrobial resistance
determination through microbial genome sequence information.
3. Inter-Agency Working Group on Feasibility
Approach:
• Formed a diverse working group: FDA, NIH-NCBI, NIAID, DTRA, LLNL, and CDC
• Conducting a small pilot study to evaluate the quality of existing sequences in the
public domain (in progress)
• Identify pre-existing high-quality deposits and build from there
• Will use this information to set the quality bar for sequence outputs from our
ongoing sequencing efforts
• Utilized existing standards (where available) for technical and isolate metadata;
no need to reinvent them
• Attention given to connecting antimicrobial resistance phenotypes to genomic
deposits, including the clinical collection site
4. Looking ahead: Predictions for Reference Databases
Courtesy of Tom Slezak
– Multiple levels of Reference DBs likely
• “High quality” genomes only
– For validation and clinical use
• “High quality” + other available genomes
– For testing and development
• Requires definition of “high quality” that must include some draft genomes
– Extensive screening required
• Human and other hosts; chimeras
• Artificial constructs
– Separate bacterial, viral, fungal reference DBs
– Publicly available (NCBI/EMBL/DDBJ)
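A minimal sketch of the predicted multi-level layout, for illustration only: the `Genome` record, its field names, and the boolean `high_quality` flag (a stand-in for the still-undefined "high quality" criteria, which may admit draft genomes) are hypothetical, and the sketch assumes screening applies to every level.

```python
from dataclasses import dataclass


@dataclass
class Genome:
    accession: str                      # hypothetical identifier
    kingdom: str                        # "bacterial", "viral" or "fungal": one DB each
    high_quality: bool                  # stand-in for a future "high quality" definition
    host_contamination: bool = False    # human or other host sequence detected
    chimeric: bool = False
    artificial_construct: bool = False

    def passes_screening(self) -> bool:
        """Screening rejects host contamination, chimeras and artificial constructs."""
        return not (self.host_contamination or self.chimeric or self.artificial_construct)


def build_tiers(genomes: list[Genome]) -> dict[str, dict[str, list[str]]]:
    """Group screened genomes per kingdom into two reference DB levels.

    "validation": high-quality genomes only (validation and clinical use)
    "development": high-quality plus other available genomes (testing and development)
    """
    tiers: dict[str, dict[str, list[str]]] = {}
    for g in genomes:
        if not g.passes_screening():
            continue  # assumption: screening applies to every level
        tier = tiers.setdefault(g.kingdom, {"validation": [], "development": []})
        tier["development"].append(g.accession)
        if g.high_quality:
            tier["validation"].append(g.accession)
    return tiers
```

Keeping one tier dictionary per kingdom mirrors the separate bacterial, viral, and fungal reference DBs; any genome in the validation level also appears in the development level.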
7. Microbial Reference Database (MicroDB) ($1.67M)
• Identify “gaps” and target sequencing efforts (Funding awarded by FDA/OCET)
• All raw reads, assemblies, annotations, metadata sent to NCBI and
accessible to the PUBLIC
• Traceable results that could be reevaluated as necessary
>600 clinically relevant and MCM (medical countermeasure) microorganisms
Highly controlled and documented approach
Collaborations with Clinical Labs and Repositories
• Children’s National Medical Center
• DoD Critical Reagents Program (CRP, USAMRIID)
• FDA-CFSAN, FDA-CBER, FDA-CDER
• DHS National Biodefense Analysis and Countermeasures Center (NBACC)
• The Rockefeller University
• Culture Collections: ATCC, DSMZ
Sequencing Center (UMD IGS)
• Hybrid Approach (PacBio and Illumina)
• Deposit of Raw Reads at NCBI (SRA)
• Deposit of Assemblies at NCBI
• Deposit of Annotations at NCBI
• FDA Interface to Access Data
8. MicroDB Requirements
A. Extracted Genomic DNA (gDNA)
– Extracted gDNA should be of high quality and purity, and at sufficient concentration to
achieve a suitable yield to assure adequate depth and breadth of genomic coverage for
the type of sequencing method employed.
B. BioSample Metadata
– A minimal description of the isolate source material is necessary for traceability. We are
using the 14 descriptors outlined below. (Note: the minimal metadata set is modeled in part on
NCBI’s minimal pathogen template.)
– Unique ID, organism, strain/isolate, sample site, specimen type, host disease, collection
date, collected by, patient age, gender, geographic location, AST method*, AST method
manufacturer*, antimicrobial susceptibilities*
C. Sequencing Data
– The minimum requirement for sequencing data is that raw reads be deposited in NCBI’s
Sequence Read Archive (SRA) and assemblies in NCBI’s Assembly database. Making raw reads
and assemblies available provides a pathway to re-analyze the data as newer technologies
emerge. Annotation data should also be deposited when available.
– Raw reads, assemblies, annotations*
*Not used as a criterion for exclusion
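The minimal requirements in sections B and C can be read as an inclusion check. The sketch below is hypothetical: the snake_case keys are illustrative names, not NCBI BioSample attribute names, and it treats an empty value the same as a missing one.

```python
from collections.abc import Mapping

# Section B: the 14 minimal BioSample descriptors (keys are illustrative names).
BIOSAMPLE_DESCRIPTORS = [
    "unique_id", "organism", "strain_isolate", "sample_site", "specimen_type",
    "host_disease", "collection_date", "collected_by", "patient_age", "gender",
    "geographic_location", "ast_method", "ast_method_manufacturer",
    "antimicrobial_susceptibilities",
]
# Starred (*) items are collected when available but are never grounds for exclusion.
NON_EXCLUDING = {"ast_method", "ast_method_manufacturer", "antimicrobial_susceptibilities"}

# Section C: raw reads (SRA) and assemblies are required; annotations are starred.
REQUIRED_DEPOSITS = {"raw_reads", "assembly"}


def exclusion_reasons(metadata: Mapping[str, str], deposits: set[str]) -> list[str]:
    """List why an isolate fails the minimal requirements; an empty list means it passes."""
    reasons = [
        f"missing metadata: {name}"
        for name in BIOSAMPLE_DESCRIPTORS
        if name not in NON_EXCLUDING and not metadata.get(name)
    ]
    reasons += [f"missing deposit: {d}" for d in sorted(REQUIRED_DEPOSITS - deposits)]
    return reasons
```

For example, an isolate with all 11 non-starred descriptors filled in but only raw reads deposited returns `["missing deposit: assembly"]`; missing AST data alone never excludes it.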
9. MicroDB Requirements
D. Sequencing Metadata
– A minimal description of the sequencing process is necessary for traceability. We are
using the 7 descriptors outlined below, which cover the bioinformatics tools used for
assembly and annotation as well as the genomic coverage achieved.
– Library, platform, submitted by, fold coverage, pipeline, assembler, annotation tool*
E. Suggested phenotypic metadata*
– Phenotypic information is suggested in order to link the phenotypic traits of particular
organisms to their genomic sequences. We recommend the 5 descriptors outlined below
(descriptors 1-4 are also included in sections B and C).
– Annotation, AST method, AST method manufacturer, antimicrobial susceptibilities,
additional phenotypic data
*Not used as a criterion for exclusion
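Sections D and E follow the same pattern. The sketch below uses illustrative field names (assumptions, not an NCBI template) and also makes the overlap noted in section E explicit: of the 5 phenotypic descriptors, only the last is not already collected in sections B and C.

```python
# Section D: the 7 sequencing descriptors; only the annotation tool is starred.
SEQUENCING_DESCRIPTORS = [
    "library", "platform", "submitted_by", "fold_coverage",
    "pipeline", "assembler", "annotation_tool",
]
SEQUENCING_OPTIONAL = {"annotation_tool"}

# Section E: the 5 suggested (all optional) phenotypic descriptors.
PHENOTYPIC_DESCRIPTORS = [
    "annotation", "ast_method", "ast_method_manufacturer",
    "antimicrobial_susceptibilities", "additional_phenotypic_data",
]
# Items already captured elsewhere: AST fields in section B, annotations in section C.
ALREADY_COLLECTED = {
    "ast_method", "ast_method_manufacturer", "antimicrobial_susceptibilities", "annotation",
}


def missing_sequencing_fields(record: dict) -> list[str]:
    """Required sequencing descriptors that are absent or empty in a record."""
    return [
        name for name in SEQUENCING_DESCRIPTORS
        if name not in SEQUENCING_OPTIONAL and record.get(name) in (None, "")
    ]


def new_phenotypic_fields() -> list[str]:
    """Phenotypic descriptors not already collected in sections B or C."""
    return [name for name in PHENOTYPIC_DESCRIPTORS if name not in ALREADY_COLLECTED]
```

A numeric coverage of 0 is treated as present rather than missing, since only `None` and the empty string count as absent.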
10. NCBI Submission Cases
1. Children’s National Medical Center
– Submit all data when available
– Register sample metadata via BioSample
– Submit raw reads and assemblies generated by IGS when available
2. FDA/CFSAN
– Collaborative agreement: wait for genome announcements
– Follow the same procedures as for case 1, but place a 6-month hold on data release;
lift the hold once the genome announcements are out
3. Rockefeller University
– Collaborative agreement: wait for publication
– Follow the same procedures as for case 1, but place a 6-month hold on data release;
lift the hold once the publication is out
Similar agreements are in place with other collaborators, depending on their needs
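The hold-and-release rule in cases 2 and 3 can be sketched as a date calculation. This assumes the hold is lifted as soon as the announcement or publication appears and otherwise expires after six months; the function names are hypothetical and the sketch does not model NCBI's submission system.

```python
import calendar
from datetime import date
from typing import Optional


def add_months(d: date, months: int) -> date:
    """Shift a date by whole calendar months, clamping the day to the target month's length."""
    index = d.month - 1 + months
    year, month = d.year + index // 12, index % 12 + 1
    return date(year, month, min(d.day, calendar.monthrange(year, month)[1]))


def release_date(submitted: date, announced: Optional[date], hold_months: int = 6) -> date:
    """Public release date under a submission hold.

    Assumption: the hold is lifted early when the genome announcement or
    publication appears, and otherwise simply expires after `hold_months`.
    Case 1 (no hold) corresponds to hold_months=0.
    """
    hold_expires = add_months(submitted, hold_months)
    if announced is None:
        return hold_expires
    # Data cannot go public before they are submitted.
    return max(submitted, min(announced, hold_expires))
```

For a submission on October 1, 2014 with no announcement yet, the hold would expire on April 1, 2015.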
11. Project Approach
• Sequencing in large batches
– Illumina HiSeq paired-end sequencing: >200x
– PacBio long-insert SMRT P4-C2 sequencing: >80-100x
• Assembly (CA = Celera Assembler)
– PacBio only (HGAP, PBcR CA)
– Illumina only (CA, MaSuRCA)
– PacBio/Illumina hybrid (CA)
– Minimal manual QA/QC & curation
• Automated Annotation
• Base modification detection
• Raw reads -> NCBI SRA
• Assembled & annotated genomes -> GenBank
– NCBI BioProject ID: PRJNA231221
• FDA Web interface to aggregate data
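The coverage targets above translate into sequencing yield through the usual fold-coverage relation, C = total sequenced bases / genome size. The sketch below inverts it; the 2 x 100 bp read length in the example is an assumption for illustration, not a parameter stated in the slides.

```python
import math


def fold_coverage(total_bases: int, genome_size_bp: int) -> float:
    """Mean fold coverage: total sequenced bases divided by genome size."""
    return total_bases / genome_size_bp


def bases_needed(target_coverage: float, genome_size_bp: int) -> int:
    """Total bases required to reach a target mean fold coverage."""
    return math.ceil(target_coverage * genome_size_bp)


def read_pairs_needed(target_coverage: float, genome_size_bp: int, read_length_bp: int) -> int:
    """Paired-end read pairs required; each pair contributes 2 * read_length_bp bases."""
    return math.ceil(bases_needed(target_coverage, genome_size_bp) / (2 * read_length_bp))


# Illustration: 200x Illumina on a 2.8 Mbp S. aureus genome with assumed 2 x 100 bp reads.
print(bases_needed(200, 2_800_000))            # 560000000 bases (560 Mbp)
print(read_pairs_needed(200, 2_800_000, 100))  # 2800000 read pairs
```

The same relation gives the PacBio side: 80x on 2.8 Mbp is 224 Mbp of subreads, regardless of read length.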
12. Progress - Batch 1
Rockefeller (50)
• Uniform sample set
– Staphylococcus aureus
– 2.8 Mbp genome size
– 32.8% GC
– Significant metadata
CNH/CFSAN (41)
• Diverse sample set
– 18 genera represented
– 2-8 Mbp genome size range
– 38-67% GC range
NCBI BioProject: PRJNA231221
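Per-sample figures such as genome size and %GC are simple summaries of an assembly. A minimal standard-library sketch over FASTA text (a production pipeline would more likely use a dedicated parser; the toy sequence below is made up):

```python
def genome_stats(fasta_text: str) -> tuple[int, float]:
    """Return (assembly length in bp, %GC) summed over all records in FASTA text.

    Ambiguous bases (e.g. N) count toward length but are excluded from the %GC denominator.
    """
    seq = "".join(
        line.strip() for line in fasta_text.splitlines() if not line.startswith(">")
    ).upper()
    gc = seq.count("G") + seq.count("C")
    acgt = gc + seq.count("A") + seq.count("T")
    return len(seq), (100.0 * gc / acgt if acgt else 0.0)


length, gc_percent = genome_stats(">contig_1\nACGTGG\nCCAT\n>contig_2\nNN\n")
print(length, round(gc_percent, 1))  # 12 60.0
```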
13. Rockefeller Samples
• Sequencing
– Avg Illumina cvg: 578x
– Avg PacBio cvg: 185x
– 1 or 2 SMRT cells each
• Assembly:
– 32 of 50 with the chromosome in a single contig
– Average contig count = 5
– “Best” assembly:
• HGAP = 29
• CA hybrid = 21
• Most differences subtle
• Annotation complete
• Final QC & data submissions underway
14. CNH/CFSAN Samples
• Sequencing
– Avg Illumina cvg: 315x
– Avg PacBio cvg: 167x
• 2 SMRT cells each
• Assembly
– 12 of 41 with the chromosome in a single contig
• 29 in 5 or fewer contigs
– Avg contig count = 4.5
– Median contig count = 3
– “Best” assembly (of 41):
• HGAP = 24
• PBcR CA = 14
• CA hybrid = 3
• Annotation underway
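Both batches report which assembler produced the "best" assembly per sample. The selection rule is not given, so the sketch below assumes a hypothetical one (fewest contigs, ties broken by larger N50) purely to show how per-sample choices roll up into counts like "HGAP = 24, PBcR CA = 14, CA hybrid = 3".

```python
from collections import Counter


def n50(contig_lengths: list[int]) -> int:
    """Largest length L such that contigs of length >= L cover at least half the assembly."""
    lengths = sorted(contig_lengths, reverse=True)
    half, running = sum(lengths) / 2, 0
    for length in lengths:
        running += length
        if running >= half:
            return length
    return 0  # empty assembly


def best_assembly(candidates: dict[str, list[int]]) -> str:
    """Pick the assembler with the fewest contigs, breaking ties by the larger N50.

    Hypothetical criteria: the slides do not state how the "best" assembly was chosen.
    """
    return min(candidates, key=lambda name: (len(candidates[name]), -n50(candidates[name])))


def tally_best(samples: dict[str, dict[str, list[int]]]) -> Counter:
    """Count how often each assembler wins across samples."""
    return Counter(best_assembly(candidates) for candidates in samples.values())
```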
18. Thank You
LLNL
Tom Slezak
NIH-NCBI
Bill Klimke, Martin Shumway, David Lipman
NIH-NIAID
Vivien Dugan, Maria Giovanni
DTRA
Matt Tobelmann, Chris Detter, Eric VanGieson, Nels Olsen
CDC
Duncan MacCannell
FDA-CFSAN
Maria Hoffmann, Cary Pirone, Andrea Ottesen, Marc Allard, Eric Brown
NMRC
Kim Bishop-Lilly, Ken Frey
FDA Micro Team
Peyton Hobson, Brittany Goldberg, Kevin Snyder, Tamara Feldblyum, Uwe Scherf, Sally Hojvat
Collaborators
IGS@UMD
Lisa Sadzewicz, Luke Tallon, Naomi Sengamalay, Al Godinez, Sandy Ott, Sushma Nagaraj, Claire Fraser
Rockefeller University
Bryan Utter, Douglas Deutsch
Children’s National Medical Center
Brittany Goldberg, Joseph Campos
DOD-CRP
Shanmuga Sozhamannan, Mike Smith
DOD-USAMRIID
Tim Minogue
NBACC
Adam Phillippy, Nick Bergman
ATCC
Liz Kerrigan
DSMZ
Cathrin Sproer