
Food Safety (Bio-)Informatics


2018 Annual Meeting of the Council of Sponsoring Institutions
Henk C. den Bakker
Assistant Professor in Bioinformatics and Epidemiology
Center for Food Safety
University of Georgia

Published in: Government & Nonprofit

  1. Food Safety (Bio-)Informatics
     Henk C. den Bakker, Assistant Professor in Bioinformatics and Epidemiology, Center for Food Safety, University of Georgia, hcd82599@uga.edu
     H.C. den Bakker, Oak Ridge, March 6-8, 2018
  2. Overview
       • Short introduction of Food Safety Informatics
       • The digital immune system
  3. Food Safety Informatics?
       • The use of information and computer science to advance food safety
       • A combination of different individual disciplines:
           • Statistics
           • Computer science
           • Epidemiology
           • Bioinformatics
       • Using Big Data approaches and the Internet of Things
  4. The rise of a digital immune system (DIS)
       • Coined by David Lipman
       • Further worked out by Michael Schatz and Adam Phillippy in 2012*
       • Would work in much the same way as an adaptive, biological immune system:
           • Observe the microbial landscape
           • Detect potential threats
           • Neutralize threats before they can cause widespread harm
       • ‘Distributed “sensor” sequencing and bioinformatics, where a network of mobile sequencing devices serves a real-time stream of microbial genomes to a global compute cloud for analysis.’
     *Schatz, M.C., & A. Phillippy. 2012. GigaScience 1 (1): 4. doi:10.1186/2047-217X-1-4.
  5. What is necessary for a digital immune system?
       • A catalogue of microbial diversity, so we can tell the normal from the abnormal (a potential threat)
       • Centralized (genome) databases, such as NCBI, EMBL and DDBJ
       • Rapid bioinformatics tools to deal with the growing amount of (real-time) data
       • Sequencing devices (preferably inexpensive and portable) that can act as the sensors in a distributed, real-time sequencing network
  6. The digital immune system
     http://hint.fm/wind/
  7. Applying the digital immune system to food safety: The GenomeTrakr project
       • Project spearheaded by the FDA*
       • GenomeTrakr is the first distributed network of labs to utilize whole genome sequencing for pathogen identification
       • Consists of 15 federal labs, 25 state health and university labs, 1 U.S. hospital lab, 2 other labs located in the U.S., 20 labs located outside of the U.S., and collaborations with independent academic researchers
       • Data curation, bioinformatic analyses and support are provided by the National Center for Biotechnology Information (NCBI) at the National Institutes of Health
       • The GenomeTrakr network has sequenced more than 167,000 isolates and closed more than 175 genomes. The network regularly sequences over 5,000 isolates each month.
     *https://www.fda.gov/Food/FoodScienceResearch/WholeGenomeSequencingProgramWGS/default.htm
  8. The ‘sensors’ and the network
       • Illumina® short read sequencers, in particular the MiSeq
       • Generate genome sequences as ‘short’ reads, typically far more than 200,000 reads per bacterial genome
     https://www.illumina.com
  9. The ‘sensors’ and the network
  10. Using whole genome sequencing (WGS) data in outbreak investigations
       • WGS data give unprecedented resolution
       • Genomic changes (Single Nucleotide Polymorphisms) can be used to infer relatedness with past and present strains
       • After ~2 years of using WGS for outbreak investigations*:
           • WGS aids in finding the food vehicle for ‘cold’ cases and sporadic cases, as it can phylogenetically link isolates from human cases and food
           • By sequencing both food-product and patient-derived isolates, outbreaks can be confirmed following product testing, allowing for an early association of an outbreak with a contaminated food
           • WGS can help in a rapid and precise outbreak case definition, and thus productively redirect epidemiological resources
     *Jackson et al. 2016. Clin Infect Dis. 63(3):380-6
  11. NCBI Pathogen detection
  12. The database is growing…
  13. How close is the GenomeTrakr network to a digital immune system?
       • Close, but far from real-time:
           • Still dependent on classical microbiology to isolate pathogens, which adds days to weeks to the protocol
           • Sequencers are state of the art, but the sequencing procedure takes 2 to 3 days
           • The database is growing to a size that is prohibitive for real-time searches
  14. New sequencing technologies and (quasi-)metagenome sequencing
       • Novel sequencing protocols that need either no or very limited steps for enrichment of target organisms
       • Novel sequencing technologies, e.g., Oxford Nanopore
     https://nanoporetech.com
  15. The databases are getting larger and larger
  16. Fortunately we can surf the Big Data wave
     Source: http://www.tech-dynamics.com/wp-content/uploads/2014/02/BigDataChart.png
  17. A rediscovery of ‘old’ data structures/algorithms
       • ‘Big Data’ is years ahead of the big increase in genomic data
       • In an effort to speed up analyses and searches of genomic data, old data structures and algorithms are being rediscovered and/or re-implemented:
           • De Bruijn graph (De Bruijn, 1946): genome assembly
           • Bloom filter (Bloom, 1970)
           • MinHash (Broder, 1997): efficient comparison of datasets
  18. MinHash: comparing large datasets with smaller ‘sketches’
       • Originally developed to compare large electronic documents (Broder, 1998)
       • Summarizes each document as a fixed-size subset (sketch) of its information, using a specific criterion to select the members of the subset
       • Example: a sketch of a thousand words is approximately large enough to infer the similarity of documents with millions of words
       • Translated to bacterial genomes, we can use the same strategy to divide genomes up into words (k-mers) and use a MinHash approach to estimate the relatedness of these genomes
     Ondov, Brian D. et al. 2016. Genome Biology 17 (1): 132.
  19. BIGSI: Searching microbial big data
       • BLAST has been the traditional search algorithm for genetic and genomic database centers such as NCBI (US) and EBI (Europe)
       • However, the majority of genomic datasets (by now hundreds of thousands) are stored as unassembled genomes, each consisting of hundreds of thousands to millions of small reads
       • BLAST is generally not fast enough to search these databases in real time
  20. From Bloom filters to BIGSI
       • Advantages:
           • Small storage for large sets of elements
           • Fast search
       • Disadvantage:
           • False positives
     Image: By David Eppstein, self-made, originally for a talk at WADS 2007, Public Domain, https://commons.wikimedia.org/w/index.php?curid=2609777
  21. BIGSI: extension of the Bloom filter
       • Bitsliced genomic signature index (BIGSI)
       • Allows for superfast search of large sequence databases
       • 3 antibiotic resistance genes (MCR-1, MCR-2, MCR-3) could be searched in 1.73 seconds in a database of 447,833 viral and bacterial genomes
     P. Bradley, H.C. den Bakker, E. Rocha, G. McVean, Z. Iqbal. 2017. bioRxiv 234955; doi: https://doi.org/10.1101/234955
  22. Summary
       • In food safety, the GenomeTrakr network is the closest thing we have to a ‘digital immune system’
       • In order to use this network to detect threats early, we need further improvements in:
           • Sample preparation methods/culture-free methods
           • Sequencing technology (faster, easier, smaller)
           • Bioinformatics
       • These improvements are coming fast
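
The MinHash idea from slide 18 can be illustrated with a small Python sketch. This is not the Mash implementation cited on that slide; it is a toy bottom-s MinHash, and the function names, the SHA-1 hash, the k-mer length of 5 and the sketch size of 50 are all assumptions chosen for readability.

```python
import hashlib


def kmers(seq, k):
    """Return the set of all overlapping words of length k (k-mers) in seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def hash_kmer(kmer):
    """Map a k-mer to a 64-bit integer (SHA-1 is an illustrative choice)."""
    return int.from_bytes(hashlib.sha1(kmer.encode()).digest()[:8], "big")


def sketch(seq, k=5, size=50):
    """Bottom-s sketch: keep only the `size` smallest distinct hash values."""
    return sorted({hash_kmer(km) for km in kmers(seq, k)})[:size]


def jaccard_estimate(sketch_a, sketch_b, size=50):
    """Estimate the Jaccard similarity of two k-mer sets from their sketches.

    Take the `size` smallest hashes of the merged sketches and count how
    many of them occur in both sketches.
    """
    merged = sorted(set(sketch_a) | set(sketch_b))[:size]
    if not merged:  # both sequences were shorter than k
        return 0.0
    shared = set(sketch_a) & set(sketch_b)
    return sum(1 for h in merged if h in shared) / len(merged)


# Toy example: a made-up sequence and a copy with one substitution.
genome_a = "ACGTTGCAAGGCTTAACCGGTATCGATCGGATTACAGGCTA"
genome_b = genome_a[:20] + "G" + genome_a[21:]
print(jaccard_estimate(sketch(genome_a), sketch(genome_b)))
```

Only the sketches need to be stored and compared, so the cost of a comparison depends on the sketch size rather than on the size of the genomes.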
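
The trade-off listed on slide 20 (small storage and fast search, at the price of false positives) can be seen in a minimal Bloom filter. The sketch below is an illustration only: the class name, the use of a Python integer as the bit array, and the seeded SHA-256 hashes are assumptions, not part of the talk.

```python
import hashlib


class BloomFilter:
    """Toy Bloom filter: a bit array plus several hash functions."""

    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = 0  # a Python int serves as the bit array

    def _positions(self, item):
        """Yield one bit position per hash function (seeded SHA-256)."""
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, item):
        """Set every bit that the item hashes to."""
        for pos in self._positions(item):
            self.bits |= 1 << pos

    def __contains__(self, item):
        """True if all of the item's bits are set (may be a false positive)."""
        return all((self.bits >> pos) & 1 for pos in self._positions(item))


# Store the 5-mers of a made-up read, then query one of them.
read = "ACGTTGCAAGGCTTAACC"
bf = BloomFilter()
for i in range(len(read) - 5 + 1):
    bf.add(read[i:i + 5])
print("GCAAG" in bf)
```

An item that was added is always reported as present. An item that was never added is usually reported as absent, but it is reported as present whenever all of its bit positions happen to have been set by other items; a smaller bit array makes this more likely.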
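
How a bitsliced index such as the one on slide 21 extends the Bloom filter can be sketched as follows. This is a toy model written for this transcript, not the BIGSI software: the class and method names, the parameters and the sequences are hypothetical. The assumed layout is one Bloom filter per dataset, all sharing the same size and hash functions, stored as the columns of a bit matrix so that a query only reads the few rows its k-mers hash to.

```python
import hashlib


class ToyBitslicedIndex:
    """Bit matrix: rows are Bloom filter positions, columns are datasets."""

    def __init__(self, num_bits=4096, num_hashes=3, k=7):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.k = k
        self.rows = [0] * num_bits  # each row holds one bit per dataset
        self.names = []             # column number -> dataset name

    def _kmers(self, seq):
        return {seq[i:i + self.k] for i in range(len(seq) - self.k + 1)}

    def _positions(self, kmer):
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{kmer}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def insert(self, name, seq):
        """Add a dataset as a new column holding its Bloom filter."""
        column = len(self.names)
        self.names.append(name)
        for kmer in self._kmers(seq):
            for row in self._positions(kmer):
                self.rows[row] |= 1 << column

    def search(self, query):
        """Return the datasets whose filters contain every k-mer of query."""
        query_kmers = self._kmers(query)
        if not query_kmers:  # query shorter than k
            return []
        hits = (1 << len(self.names)) - 1  # start with all columns set
        for kmer in query_kmers:
            for row in self._positions(kmer):
                hits &= self.rows[row]  # bitwise AND across all datasets
        return [n for i, n in enumerate(self.names) if (hits >> i) & 1]


index = ToyBitslicedIndex()
index.insert("sample_A", "ACGTACGTTAGCCGATAGGCTTAACG")
index.insert("sample_B", "TTTTTTTTTTGGGGGGGGGGCCCCCCCCCC")
print(index.search("GTTAGCCGATAG"))
```

In this layout each bitwise AND tests one hash position for every dataset at once, so the number of row operations depends on the length of the query rather than on how many datasets are indexed. As with a single Bloom filter, a reported hit can be a false positive.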
