Development of FDA MicroDB: A Regulatory-Grade Microbial Reference Database

Heike Sichtig, Ph.D.
Division of Microbiology Devices
OIR/CDRH/FDA/HHS
U.S. Food and Drug Administration
Heike.Sichtig@fda.hhs.gov
Luke Tallon
Genomics Resource Center
Institute for Genome Sciences
UMSOM
ljtallon@som.umaryland.edu
NIST Workshop to Identify Standards Needed to Support Pathogen Identification
via Next-Generation Sequencing, NIST, MD
October 21-22, 2014

Microbial NGS-Based Diagnostic Devices
• OIR/DMD is working on a fast-tracked Draft Guidance
• Held a Public Workshop on April 1, 2014:
“Advancing Regulatory Science for High Throughput Sequencing
Devices for Microbial Identification and Detection of Antimicrobial
Resistance Markers” [FR Doc No: 2014-04940]
• Workshop agenda, discussion paper and webcast online:
http://www.fda.gov/MedicalDevices/NewsEvents/WorkshopsConferences/ucm386967.htm
Objectives:
1. Streamline/shorten clinical trials for microbial diagnosis/identification
2. Establish a new comparator algorithm for assays developed using this
new technology
3. Develop regulatory science standards for microbial genome sequencing
4. Investigate the regulatory science required for antimicrobial resistance
determination through microbial genome sequence information.

Inter-Agency Working Group on Feasibility
Approach:
• Formed a diverse working group: FDA, NIH-NCBI, NIAID, DTRA,
LLNL, and CDC
• Conducting a small pilot study to evaluate the quality of
existing sequences in the public domain (in progress)
• Identify the pre-existing high-quality deposits, and build from
there
• Will use information to set quality bar for sequence outputs for
our ongoing sequencing efforts
• Utilized existing standards (if available) for technical and isolate
metadata – no need to re-invent
• Attention given to connecting antimicrobial resistance
phenotype to genomic deposits – clinical collection site

Looking ahead: Predictions for Reference Databases
– Multiple levels of Reference DBs likely
• “High quality” genomes only
– For validation and clinical use
• “High quality” + other available genomes
– For testing and development
• Requires definition of “high quality” that must include
some draft genomes
– Extensive screening required
• Human and other hosts; chimeras
• Artificial constructs
– Separate bacterial, viral, fungal reference DBs
– Publicly available (NCBI/EMBL/DDBJ)
Courtesy of Tom Slezak

Robust, Standardized, and High Quality Microbial
Sequence Database in the Public Sector
Current Need
[Figure: cover illustration. Copyright © 2009, American Society for Microbiology. All Rights Reserved.]
• Representative Samples
• Metadata
• High quality raw sequences
• Assemblies
• Annotation
• Public Domain

Latest NCBI GenBank Report on Bacterial Genome Growth
[Figure: Bacterial Genomes Report. Count (0 to 25,000) by date (Jul-98 to Dec-14); series: #Genomes and #Real Species. Courtesy of NCBI.]

Microbial Reference Database (MicroDB) ($1.67M)
• Identify “gaps” and target sequencing efforts (Funding awarded by FDA/OCET)
• All raw reads, assemblies, annotations, metadata sent to NCBI and
accessible to the PUBLIC
• Traceable results that could be reevaluated as necessary
>600 Clinically Relevant and MCM Microorganisms
Highly Controlled and Documented Approach
Collaborations with Clinical Labs and Repositories
• Children’s National Hospital
• DoD Critical Reagents Program (CRP, USAMRIID)
• FDA-CFSAN, FDA-CBER, FDA-CDER
• DHS National Biodefense Analysis and
Countermeasures Center (NBACC)
• The Rockefeller University
• Culture Collections: ATCC, DSMZ
Sequencing Center (UMD IGS)
• Hybrid Approach (PacBio and Illumina)
• Deposit of Raw Reads at NCBI (SRA)
• Deposit of Assemblies at NCBI
• Deposit of Annotations at NCBI
• FDA Interface to Access Data

MicroDB Requirements
A. Extracted Genomic DNA (gDNA)
– Extracted gDNA should be of high quality and purity, and at sufficient concentration to
achieve a suitable yield to assure adequate depth and breadth of genomic coverage for
the type of sequencing method employed.
B. BioSample Metadata
– A minimal description of the isolate source material is necessary for traceability. We are
using 14 descriptors as outlined below. (Note: Minimal metadata is modeled in part after
NCBI’s minimal pathogen template)
– Unique ID, organism, strain/isolate, sample site, specimen type, host disease, collection
date, collected by, patient age, gender, geographic location, AST method*, AST method
manufacturer*, Antimicrobial Susceptibilities*
C. Sequencing Data
– The minimum requirement for sequencing data is that the generated raw reads should be
deposited in NCBI’s Sequence Read Archive (SRA) and assemblies should be deposited
at NCBI’s Assembly division. The availability of raw reads and assemblies will provide a
pathway to re-analyze the data as newer technologies emerge. Furthermore, annotation
data should be deposited when available.
– Raw reads, assemblies, annotations*
*not used as a criterion for exclusion
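The section B descriptor list can be sketched as a simple completeness check. This is only an illustration: the dictionary keys, the `check_biosample` helper, and the example record are hypothetical spellings chosen here, not an FDA or NCBI schema. The one rule taken from the slide is that starred descriptors are never grounds for exclusion.

```python
# The 14 descriptors from section B. Key spellings are hypothetical.
REQUIRED = [
    "unique_id", "organism", "strain_isolate", "sample_site", "specimen_type",
    "host_disease", "collection_date", "collected_by", "patient_age", "gender",
    "geographic_location",
]
# Starred descriptors: recorded when available, not used for exclusion.
OPTIONAL = ["ast_method", "ast_method_manufacturer", "antimicrobial_susceptibilities"]


def check_biosample(record):
    """Return (missing_required, missing_optional) for one isolate record."""
    def absent(key):
        value = record.get(key)
        return value is None or str(value).strip() == ""

    missing_required = [key for key in REQUIRED if absent(key)]
    missing_optional = [key for key in OPTIONAL if absent(key)]
    return missing_required, missing_optional


# Hypothetical record with every required descriptor and no AST data.
example = {
    "unique_id": "EXAMPLE-0001",
    "organism": "Staphylococcus aureus",
    "strain_isolate": "example-isolate",
    "sample_site": "example site",
    "specimen_type": "example specimen",
    "host_disease": "example disease",
    "collection_date": "2014-01-01",
    "collected_by": "example lab",
    "patient_age": "not collected",
    "gender": "not collected",
    "geographic_location": "USA",
}

missing_required, missing_optional = check_biosample(example)
excluded = bool(missing_required)  # only non-starred gaps would exclude a record
```

Here the record passes (`excluded` is `False`) even though all three AST descriptors are reported as missing.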

MicroDB Requirements
D. Sequencing Metadata
– A minimal description of the sequencing process is necessary for traceability. We are
using 7 descriptors as outlined below including bioinformatics tool information for
assembly and annotation, and genomic coverage information.
– Library, platform, submitted by, fold coverage, pipeline, assembler, annotation tool*
E. Suggested phenotypic metadata*
– A description of the phenotypic information is suggested to create a link between the
phenotypic traits of particular organisms and their genomic sequence. We are
recommending 5 descriptors as outlined below (1-4 are also included in sections B and C).
– Annotation, AST method, AST method manufacturer, antimicrobial susceptibilities,
additional phenotypic data
*not used as a criterion for exclusion

NCBI Submission Cases
1. Children’s National Medical Center
– Submit all data when available
– Register sample metadata via BioSample
– Submit raw reads and assemblies generated by IGS when available
2. FDA/CFSAN
– Collaborative agreement: Wait for genome announcements
– Follow same procedures as for 1 and put a ‘6 month hold’ to release
data, lift hold when genome announcements are out
3. Rockefeller University
– Collaborative agreement: Wait for publication
– Follow same procedures as for 1 and put a ‘6 month hold’ to release
data, lift hold when publication is out
Similar agreements in place with other collaborators
depending on their needs
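The hold logic in cases 2 and 3 might be sketched as a small date calculation. The slide does not say what happens if a genome announcement or publication has not appeared when the six months run out. The sketch assumes the hold simply expires at that point, and that is an assumption of this example, not a stated policy. The function names are hypothetical.

```python
import calendar
from datetime import date

HOLD_MONTHS = 6  # the '6 month hold' from the slide


def add_months(start, months):
    """Add calendar months, clamping the day for shorter months."""
    month_index = start.month - 1 + months
    year = start.year + month_index // 12
    month = month_index % 12 + 1
    day = min(start.day, calendar.monthrange(year, month)[1])
    return date(year, month, day)


def release_date(submitted, lifted_on=None, hold=True):
    """Date the data become public.

    hold=False models case 1 (submit and release when available).
    lifted_on is the genome announcement or publication date, if known.
    ASSUMPTION: an unlifted hold expires HOLD_MONTHS after submission.
    """
    if not hold:
        return submitted
    expiry = add_months(submitted, HOLD_MONTHS)
    if lifted_on is not None and lifted_on < expiry:
        return max(lifted_on, submitted)
    return expiry
```

For example, `release_date(date(2014, 10, 1), lifted_on=date(2014, 12, 15))` returns the announcement date, while the same submission with no announcement is released on 2015-04-01 under the stated assumption.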

Project Approach
• Sequencing in large batches
– Illumina HiSeq paired-end sequencing: >200x
– PacBio long-insert SMRT P4-C2 sequencing: >80-100x
• Assembly
– PacBio only (HGAP, PBcR CA)
– Illumina only (CA, MaSuRCA)
– PacBio/Illumina hybrid (CA)
– Minimal manual QA/QC & curation
• Automated Annotation
• Base modification detection
• Raw reads -> NCBI SRA
• Assembled & annotated genomes -> GenBank
– NCBI BIOPROJECT ID: PRJNA231221
• FDA Web interface to aggregate data
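The coverage targets above can be checked with a short calculation. The sketch below assumes fold coverage is defined as total sequenced bases divided by genome size, and it treats the lower end of the PacBio ">80-100x" range as the threshold. Both the function names and the base counts in the usage note are hypothetical.

```python
ILLUMINA_MIN_FOLD = 200.0  # slide: Illumina HiSeq paired-end > 200x
PACBIO_MIN_FOLD = 80.0     # assumed: lower end of the slide's ">80-100x"


def fold_coverage(total_bases, genome_size_bp):
    """Average depth: sequenced bases divided by genome length."""
    if genome_size_bp <= 0:
        raise ValueError("genome_size_bp must be positive")
    return total_bases / genome_size_bp


def meets_targets(illumina_bases, pacbio_bases, genome_size_bp):
    """True when both platforms exceed their coverage targets."""
    illumina_ok = fold_coverage(illumina_bases, genome_size_bp) > ILLUMINA_MIN_FOLD
    pacbio_ok = fold_coverage(pacbio_bases, genome_size_bp) > PACBIO_MIN_FOLD
    return illumina_ok and pacbio_ok
```

With a 2.8 Mbp genome, 1,618,400,000 Illumina bases give `fold_coverage(1_618_400_000, 2_800_000) == 578.0`, and 518,000,000 PacBio bases give 185x, so `meets_targets` returns `True` for that pair.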

Progress - Batch 1
Rockefeller (50)
• Uniform sample set
– Staphylococcus aureus
– 2.8 Mbp genome size
– 32.8 %GC
– Significant metadata
CNH/CFSAN (41)
• Diverse sample set
– 18 genera represented
– 2 – 8 Mbp genome size range
– 38 – 67 %GC range
NCBI BioProject: PRJNA231221

Rockefeller Samples
• Sequencing
– Avg Illumina cvg: 578x
– Avg PacBio cvg: 185x
– 1 or 2 SMRT cells each
• Assembly:
– 32 of 50 in single contig chromosome
– Average contig count = 5
– “Best” assembly:
• HGAP = 29
• CA hybrid = 21
• Most differences subtle
• Annotation complete
• Final QC & data submissions underway

CNH/CFSAN Samples
• Sequencing
– Avg Illumina cvg: 315x
– Avg PacBio cvg: 167x
• 2 SMRT cells each
• Assembly
– 12 of 41 in single contig chromosome
• 29 in <= 5 contigs
– Avg contig count = 4.5
– Median contig count = 3
– “Best” assembly (of 41):
• HGAP = 24
• PBcR CA = 14
• CA hybrid = 3
• Annotation underway

Assembly QC & Curation
[Figure: two dot plots of ROCK_290 assemblies (QRY) against reference gi|374362062|gb|CP003033.1|; both axes 0 to 2,500,000 bp, color scale 0 to 100 %similarity]
CA8 – Ill/PB hybrid (ROCK_290 Celera8 ctg vs. ref)
• Contig shown: ctg7180000000002
• Largest Ctg Len: 2,759,091 bp
• Total asm Ctg Len: 2,770,822 bp
HGAP2 (ROCK_290 HGAP2 ctg vs. ref)
• Contigs shown: scf7180000000011|quiver, scf7180000000010|quiver, scf7180000000012|quiver, scf7180000000013|quiver, scf7180000000014|quiver
• Total asm Ctg Len: 2,802,621 bp
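The contig-length figures on this slide can be summarized with a short helper. The sketch below computes the total length, the share held by the largest contig, and N50. N50 is a standard assembly metric but is not reported on the slides. The `assembly_summary` name is hypothetical, and because the slide gives only the largest and total lengths for the CA8 hybrid, the usage note lumps the remaining 11,731 bp into a single placeholder entry.

```python
def assembly_summary(contig_lengths):
    """Summarize an assembly from its contig lengths (in bp)."""
    if not contig_lengths:
        raise ValueError("contig_lengths must not be empty")
    lengths = sorted(contig_lengths, reverse=True)
    total = sum(lengths)
    # N50: length of the contig at which the running sum of
    # length-sorted contigs first reaches half the total.
    running = 0
    n50 = lengths[-1]
    for length in lengths:
        running += length
        if running * 2 >= total:
            n50 = length
            break
    return {
        "contigs": len(lengths),
        "total_bp": total,
        "largest_bp": lengths[0],
        "largest_fraction": lengths[0] / total,
        "n50_bp": n50,
    }


# CA8 hybrid: largest contig from the slide plus a placeholder remainder.
ca8 = assembly_summary([2_759_091, 2_770_822 - 2_759_091])
```

For these inputs `ca8["total_bp"]` is 2,770,822 and `ca8["largest_fraction"]` is about 0.9958, so nearly the whole assembly sits in one contig.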

Assembly QC & Curation
[Figure: dot plot of the HGAP2 assembly (scf7180000000002|quiver) against reference gi|595636499|gb|CP007454.1|; both axes 0 to 2,500,000 bp, color scale 0 to 100 %similarity]
HGAP2
• Total asm Ctg Len: 2,764,709 bp
• 4 bp overlap?
– 1X coverage: TAAC
– 1X coverage: TAGC

Challenges & Opportunities
• Sample acquisition & quality
• Efficiency/throughput vs. accuracy/quality
– Sequencing strategy
– Assembly QA/QC & curation
• Ever longer reads!
– Reduced coverage -> higher efficiency sequencing
– More “closed” genomes!
• Small plasmids
– SageELF & Illumina

Thank You
FDA Micro Team
Peyton Hobson, Brittany Goldberg, Kevin Snyder, Tamara Feldblyum, Uwe Scherf, Sally Hojvat
Collaborators
• LLNL: Tom Slezak
• NIH-NCBI: Bill Klimke, Martin Shumway, David Lipman
• NIH-NIAID: Vivien Dugan, Maria Giovani
• DTRA: Matt Tobelmann, Chris Detter, Eric VanGieson, Nels Olsen
• CDC: Duncan MacCannell
• FDA-CFSAN: Maria Hoffmann, Cary Pirone, Andrea Ottessen, Marc Allard, Eric Brown
• NMRC: Kim Bishop-Lilly, Ken Frey
• IGS@UMD: Lisa Sadzewicz, Luke Tallon, Naomi Sengamalay, Al Godinez, Sandy Ott, Sushma Nagaraj, Claire Fraser
• Rockefeller University: Bryan Utter, Douglas Deutsch
• Children’s National Medical Center: Brittany Goldberg, Joseph Campos
• DOD-CRP: Shanmuga Sozhamannan, Mike Smith
• DOD-USAMRIID: Tim Minogue
• NBACC: Adam Phillippy, Nick Bergman
• ATCC: Liz Kerrigan
• DSMZ: Cathrin Sproer
