Building bioinformatics resources
for the global community
James Pettengill
james.pettengill@fda.hhs.gov
Biostatistics and Bioinformatics Staff
Office of Analytics and Outreach
FDA Center for Food Safety and Applied Nutrition
GMI9
May 24, 2016
Rome, Italy
CFSAN’s open-access, peer-reviewed methods for analyzing and differentiating
among samples based on WGS data.
Submitted 16 April 2014
Accepted 23 September 2014
Published 14 October 2014
Corresponding author
Errol Strain,
Errol.Strain@fda.hhs.gov
Academic editor
Keith Crandall
Additional Information and
Declarations can be found on
page 21
DOI 10.7717/peerj.620
An evaluation of alternative methods for
constructing phylogenies from whole
genome sequence data: a case study with
Salmonella
James B. Pettengill, Yan Luo, Steven Davis, Yi Chen,
Narjol Gonzalez-Escalona, Andrea Ottesen, Hugh Rand,
Marc W. Allard and Errol Strain
Center for Food Safety & Applied Nutrition, U.S. Food & Drug Administration, College Park,
MD, USA
ABSTRACT
Comparative genomics based on whole genome sequencing (WGS) is increasingly
being applied to investigate questions within evolutionary and molecular biology,
as well as questions concerning public health (e.g., pathogen outbreaks). Given the
impact that conclusions derived from such analyses may have, we have evaluated
the robustness of clustering individuals based on WGS data to three key factors: (1)
next-generation sequencing (NGS) platform (HiSeq, MiSeq, IonTorrent, 454, and
SOLiD), (2) algorithms used to construct a SNP (single nucleotide polymorphism)
matrix (reference-based and reference-free), and (3) phylogenetic inference method
(FastTreeMP, GARLI, and RAxML). We carried out these analyses on 194 whole
genome sequences representing 107 unique Salmonella enterica subsp. enterica ser.
Montevideo strains. Reference-based approaches for identifying SNPs produced trees
that were significantly more similar to one another than those produced under the
reference-free approach. Topologies inferred using a core matrix (i.e., no missing
data) were significantly more discordant than those inferred using a non-core matrix
that allows for some missing data. However, allowing for too much missing data likely
results in a high false discovery rate of SNPs. When analyzing the same SNP matrix,
we observed that the more thorough inference methods implemented in GARLI and
RAxML produced more similar topologies than FastTreeMP. Our results also confirm
that reproducibility varies among NGS platforms where the MiSeq had the lowest
number of pairwise differences among replicate runs. Our investigation into the robustness
of clustering patterns illustrates the importance of carefully considering how
data from different platforms are combined and analyzed. We found clear differences
in the topologies inferred, and certain methods performed significantly better than
others for discriminating between the highly clonal organisms investigated here. The
methods supported by our results represent a preliminary set of guidelines and a
step towards developing validated standards for clustering based on whole genome
sequence data.
Real-time pathogen detection in the era of whole-genome
sequencing and big data: K-mer and site-based methods for
inferring the distances among tens of thousands of
Salmonella samples
Premise/Background of the project
•  The adoption of whole-genome sequencing within the public health realm has
resulted in large databases being populated in real-time.
•  These databases contain 60,000+ samples and are expected to grow to hundreds of
thousands within a few years.
•  For these databases to be of optimal use, one must be able to quickly interrogate them
to accurately determine the genomic distances among a set of samples.
•  Doing so is challenging because of both biological (evolutionarily diverse
samples) and computational (petabytes of sequence data) issues.
•  Evaluated 7 measures of genetic distance based on k-mer profiles (Jaccard, Euclidean,
Manhattan, Mash Jaccard, and Mash distances) and nucleotide sites (NUCmer and
whole-genome multi-locus sequence typing (wgMLST)).
•  Empirical data: whole-genome sequence data from 18,997 Salmonella isolates.
Nut butter outbreak (http://www.cdc.gov/salmonella/braenderup-08-14/index.html)
[Figure: NCBI GenomeTrakr tree]
Experimental design: based on a classification scheme, determine how
well each distance measure performs
[Figure: number (#) of inter-category and intra-category comparisons plotted against genetic distance, for an efficient method and an inefficient method]
Experimental design:
Simulated data:
Experimental design:
Empirical data:
•  Analyze different distance methods on de novo assemblies of all Salmonella
samples in GenomeTrakr
•  Use serovar as the classification scheme
[Figure: number (#) of inter-Enteritidis and intra-Enteritidis comparisons plotted against genetic distance, for an efficient method]
Experimental design:
Empirical data: using cloud computing to perform assemblies on GenomeTrakr data
Assembly workflow:
1.  Obtain latest metadata file from NCBI pathogen database
2.  Parse metadata and download raw data
3.  Quality filter using fastx toolkit
4.  Taxonomic/contamination filtering using Kraken with custom db
5.  Assembly using SPAdes
Experimental design:
Empirical data: using cloud computing to perform assemblies on GenomeTrakr data
1.  Obtain an assembly for each sample within GenomeTrakr
•  Use pilot of cloud computing to accomplish assemblies – “cloudbursting”
Summary – We have successfully completed running Use Cases 2 and 3 on AWS
servers via the CycleCloud platform. Even without time for extensive optimization of
the clusters, we were able to complete the Use Cases rapidly and inexpensively.

Use Case 2 – Listeria Isolates

A workflow was designed to analyze sequencing data from all of the publicly
available Listeria isolates (3645) collected by the GenomeTrakr network. This
workflow involves downloading data from the NCBI servers, trimming the
sequencing reads based on quality scores, filtering the reads based on quality and
taxonomy, and assembling the reads into contiguous genome segments. The results
of this workflow will allow us to improve our methods of identifying outbreak
isolates.

1. Cluster Specs –
   Max cores                4000
   Max parallel jobs        1000
   Master node              i2.4xlarge
   Compute nodes            r3.2xlarge, r3.4xlarge

2. Results –
   Jobs                     3645
   Run time                 8 hours
   Job completion rate      99.8%
   Approximate cost         $1800.00

3. Additional Notes –
   Local runtime            3.5 days
   Feasible to run locally  YES
   Anticipated frequency    once/quarter
   Estimated yearly cost    $9000.00
* Assuming the current growth rate of this dataset, we estimate 900 additional samples per
quarter for the next year.
3,645 Listeria assemblies
Use Case 3 – Salmonella Isolates

Our revised Use Case 3 applies the workflow described [...]
publicly available Salmonella isolates (25765) collected by [...]
network. The analysis of this dataset is much more difficult due [...]
size and a much larger number of isolates and is not feasible o[...]
resources.

1. Cluster Specs –
   Max cores                12000
   Max parallel jobs        3000
   Master node              i2.4xlarge
   Compute nodes            r3.2xlarge, r3.4xlarge, r3.8xlarge

2. Results –
   Jobs                     25765
   Run time                 20 hours
   Job completion rate      99.1%
   Approximate cost         $8000.00

3. Additional Notes –
   Estimated local runtime  23 days
   Feasible to run locally  NO
   Anticipated frequency    once/quarter
   Estimated yearly cost    $56000.00
* Assuming the current growth rate of this dataset, we estimate 12,50[...]
per quarter for the next year.
25,765 Salmonella assemblies
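The figures reported for the two use cases can be turned into a cost per assembly and a cloud-versus-local speedup. The sketch below only restates the numbers from the slides; the `UseCase` class and its property names are illustrative, and the Salmonella local runtime is itself an estimate on the slide, so the second speedup is approximate.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class UseCase:
    """Figures copied from the use-case summaries (hypothetical container)."""
    name: str
    jobs: int            # number of assemblies attempted
    cloud_hours: float   # reported cloud run time
    local_days: float    # reported (or estimated) local run time
    cost_usd: float      # approximate cloud cost

    @property
    def cost_per_assembly(self) -> float:
        return self.cost_usd / self.jobs

    @property
    def speedup(self) -> float:
        # Local wall-clock time divided by cloud wall-clock time.
        return self.local_days * 24 / self.cloud_hours


listeria = UseCase("Listeria", jobs=3645, cloud_hours=8, local_days=3.5, cost_usd=1800.0)
salmonella = UseCase("Salmonella", jobs=25765, cloud_hours=20, local_days=23, cost_usd=8000.0)

for uc in (listeria, salmonella):
    print(f"{uc.name}: ${uc.cost_per_assembly:.2f} per assembly, "
          f"{uc.speedup:.1f}x faster than the local runtime")
# Listeria: $0.49 per assembly, 10.5x faster than the local runtime
# Salmonella: $0.31 per assembly, 27.6x faster than the local runtime
```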
Site-based:
Sample1: ACCTAGTACC
Sample2: ACGTACTACC
Similarity = 0.8
Requires statements about homology/sequence alignment
Kmer-based (L = 9):
Sample1: ACCTAGTACC
kmer1: ACCTAGTAC
kmer2: CCTAGTACC
Sample2: ACGTACTACC
kmer1: ACGTACTAC
kmer2: CGTACTACC
Similarity = 0
Fast, but with loss/oversimplification of information
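The two similarity values in this toy comparison can be reproduced in a few lines. This is a minimal sketch, not the code used in the study: the function names are illustrative, the site-based score assumes the two sequences are already aligned and of equal length, and the k-mer score is taken to be the Jaccard index of the two k-mer sets.

```python
def site_similarity(seq_a: str, seq_b: str) -> float:
    """Fraction of aligned positions at which two equal-length sequences agree."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    matches = sum(a == b for a, b in zip(seq_a, seq_b))
    return matches / len(seq_a)


def kmers(seq: str, k: int) -> set:
    """All distinct substrings of length k."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def kmer_jaccard(seq_a: str, seq_b: str, k: int) -> float:
    """Shared k-mers divided by all k-mers seen in either sequence."""
    a, b = kmers(seq_a, k), kmers(seq_b, k)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


sample1 = "ACCTAGTACC"
sample2 = "ACGTACTACC"

print(site_similarity(sample1, sample2))    # 0.8 (8 of 10 sites match)
print(kmer_jaccard(sample1, sample2, k=9))  # 0.0 (no 9-mer is shared)
```

Two substitutions out of ten sites are enough to alter every 9-mer, which is why the k-mer score drops to zero while the site-based score stays at 0.8.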
Experimental design:
Distance measures
Summary of methods used to infer the relationships among samples.

Class        Method                 Exec. time (s)  Description
Site-based   NUCmer§                11.9            Pairwise genome alignment using suffix arrays
Site-based   wgMLST¶                46.95           Gene-based approach
K-mer based  Jaccard Index§         9.4             The intersection divided by the union of all k-mers found between two samples
K-mer based  Manhattan Distance§    45.1            Sum of the absolute differences between the abundance of each k-mer present between two samples
K-mer based  Euclidean Distance§    44.2            The square root of the sum of the squared pairwise differences in k-mer abundance
K-mer based  Mash Distance          1.2             MinHash (Broder 1998) technique that reduces genomes to sketches and estimates a novel evolutionary distance metric among them
K-mer based  Mash Jaccard Distance  1.2             The Jaccard Distance (as described above) but based on the sketch size (e.g., the number of hashes)

§ Performed using de novo assemblies and requires k-mer indexing, which with Jellyfish takes 7.4 s (0.8) per sample (2.1 days for 25,000 samples)
¶ Requires a reference genome
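The k-mer based measures in the table can be sketched on small strings as follows. This is an illustration under stated assumptions rather than the implementation that was benchmarked: k-mers are counted with a plain `Counter` instead of Jellyfish, the sketch uses SHA-1 and ignores reverse complements (Mash itself uses a different hash and canonical k-mers), and all function names are hypothetical. The Mash distance uses the published formula D = -(1/k) ln(2j / (1 + j)), where j is the Jaccard estimate.

```python
import hashlib
import math
from collections import Counter


def kmer_counts(seq: str, k: int) -> Counter:
    """Abundance of every k-mer in a sequence."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def jaccard_index(counts_a, counts_b) -> float:
    """Intersection over union of the k-mers present in each sample."""
    a, b = set(counts_a), set(counts_b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def manhattan_distance(counts_a, counts_b) -> int:
    """Sum of absolute differences in k-mer abundance."""
    keys = set(counts_a) | set(counts_b)
    return sum(abs(counts_a.get(key, 0) - counts_b.get(key, 0)) for key in keys)


def euclidean_distance(counts_a, counts_b) -> float:
    """Square root of the sum of squared differences in k-mer abundance."""
    keys = set(counts_a) | set(counts_b)
    return math.sqrt(sum((counts_a.get(key, 0) - counts_b.get(key, 0)) ** 2
                         for key in keys))


def sketch(seq: str, k: int, size: int) -> list:
    """Bottom-`size` MinHash sketch: the smallest `size` k-mer hash values."""
    hashes = {int.from_bytes(hashlib.sha1(kmer.encode()).digest()[:8], "big")
              for kmer in kmer_counts(seq, k)}
    return sorted(hashes)[:size]


def sketch_jaccard(sketch_a, sketch_b, size: int) -> float:
    """Jaccard estimate from the `size` smallest hashes of the merged sketches."""
    merged = sorted(set(sketch_a) | set(sketch_b))[:size]
    shared = set(sketch_a) & set(sketch_b)
    if not merged:
        return 0.0
    return sum(1 for h in merged if h in shared) / len(merged)


def mash_distance(jaccard: float, k: int) -> float:
    """D = -(1/k) * ln(2j / (1 + j)); 1 when nothing is shared, 0 when identical."""
    if jaccard <= 0:
        return 1.0
    if jaccard >= 1:
        return 0.0
    return -math.log(2 * jaccard / (1 + jaccard)) / k


a = kmer_counts("AAAC", 2)   # AA x2, AC x1
b = kmer_counts("AACC", 2)   # AA x1, AC x1, CC x1
print(jaccard_index(a, b), manhattan_distance(a, b), euclidean_distance(a, b))
```

The sketch-based measures only ever compare `size` hashes per sample, which is consistent with the much shorter execution times reported for the two Mash rows.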
Classification of simulated data:
ROC curves identical across
different distance methods
* Simulated data is not complex/
noisy enough
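One way to score a distance measure against a classification scheme is to treat each pairwise distance as a classifier output ("a distance at or below the threshold predicts the same category") and trace the ROC curve over all thresholds. The sketch below assumes that framing; the function names and the example distances are made up for illustration and are not values from the study.

```python
def roc_points(intra, inter):
    """ROC curve for the rule 'distance <= threshold means same category'.

    intra: distances between samples in the same category (positives)
    inter: distances between samples in different categories (negatives)
    """
    if not intra or not inter:
        raise ValueError("both distance lists must be non-empty")
    points = [(0.0, 0.0)]
    for threshold in sorted(set(intra) | set(inter)):
        tpr = sum(d <= threshold for d in intra) / len(intra)
        fpr = sum(d <= threshold for d in inter) / len(inter)
        points.append((fpr, tpr))
    return points


def auc(points):
    """Area under the ROC curve using the trapezoidal rule."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2
    return area


# Hypothetical distances: the first pair of lists separates cleanly,
# the second pair overlaps.
clean = auc(roc_points([0.01, 0.02, 0.03], [0.2, 0.3, 0.4]))
noisy = auc(roc_points([1, 3], [2, 4]))
print(clean, noisy)
```

When intra- and inter-category distances never overlap, every measure reaches the same maximal area, which is one way identical ROC curves can arise on data that is not noisy enough.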
Summary/Implications:
•  There are features (e.g., genomic, assembly, and contamination) that cause
k-mer based methods to fail to accurately capture the distance between
samples.
•  Treating absent data as informative may be problematic.
•  Site-based methods, like NUCmer and MLST, tended to be superior in
performance.
•  Accessing the computing resources necessary to perform site-based
methods may be challenging when analyzing large databases.
•  If working with k-mer distances, err on the side of false positives
and have high-quality assemblies.
Acknowledgements
FDA
•  Center for Food Safety and Applied Nutrition
•  Biostats/Bioinformatics staff – J. Baugher, H. Rand,
J. Miller, Y. Luo, S. Davis, E. Strain
•  Center for Veterinary Medicine
•  Office of Regulatory Affairs
National Institutes of Health
•  National Center for Biotechnology Information
State Health and University Labs
•  Alaska
•  Arizona
•  California
•  Florida
•  Hawaii
•  Maryland
•  Minnesota
•  New Mexico
•  New York
•  South Dakota
•  Texas
•  Virginia
•  Washington
USDA/FSIS
•  Eastern Laboratory
CDC
•  Enteric Diseases Laboratory
•  INEI-ANLIS “Carlos Malbrán Institute,”
Argentina
•  Centre for Food Safety, University College
Dublin, Ireland
•  Food and Environment Research Agency, UK
•  Public Health England, UK
•  WHO
•  Illumina
•  Pac Bio
•  CLC Bio
•  Other independent collaborators
Validation exercise key findings:
•  False negatives are primarily due to failure to meet the consensus frequency
threshold
•  False negatives are not random across the genome
[Figure: number of false negatives (axis 0 to 5000) by filter (ConsensusFrequency<0.9, Coverage<8) for the X20x_Coverage and X100x_Coverage datasets]
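The two filters named in the figure labels can be illustrated with a toy site caller. This is a hedged sketch only: the thresholds are read off the labels (`Coverage<8`, `ConsensusFrequency<0.9`), while the function `call_site`, its input format (a string of the bases observed at one position) and its return values are assumptions made for illustration, not the CFSAN SNP Pipeline's actual code.

```python
from collections import Counter

MIN_COVERAGE = 8           # taken from the "Coverage<8" label
MIN_CONSENSUS_FREQ = 0.9   # taken from the "ConsensusFrequency<0.9" label


def call_site(bases: str):
    """Return (consensus_base, status) for the reads covering one position.

    The base is None when the site fails a filter; status names the filter.
    """
    depth = len(bases)
    if depth < MIN_COVERAGE:
        return None, "Coverage<8"
    base, count = Counter(bases).most_common(1)[0]
    if count / depth < MIN_CONSENSUS_FREQ:
        return None, "ConsensusFrequency<0.9"
    return base, "pass"


print(call_site("AAAAAAA"))      # (None, 'Coverage<8')
print(call_site("AAAAAAAACC"))   # (None, 'ConsensusFrequency<0.9')
print(call_site("AAAAAAAAAC"))   # ('A', 'pass')
```

A true SNP that is supported by only 80% of the reads is dropped by the second filter, which is the kind of false negative the slide attributes to the consensus frequency threshold.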
Validation exercise of CFSAN SNP Pipeline key findings:
•  100× dataset
•  Recovered 98.9% of the introduced SNPs
•  False positive rate of 1.04 × 10⁻⁶
•  20× dataset
•  Recovered 98.8% of SNPs
•  False positive rate of 8.34 × 10⁻⁷
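The two reported metrics can be computed from a set of introduced SNP positions and a set of called positions. The sketch below assumes that the recovery rate is the fraction of introduced SNPs that were called and that the false positive rate is the number of false calls divided by the number of positions without an introduced SNP; the slides do not state the exact definitions, and the function name and example positions are hypothetical.

```python
def evaluate_calls(truth: set, called: set, genome_length: int):
    """Return (recovery_rate, false_positive_rate) for SNP calls.

    truth: positions where SNPs were introduced
    called: positions reported as SNPs
    genome_length: total number of positions evaluated
    """
    true_positives = len(truth & called)
    false_positives = len(called - truth)
    recovery_rate = true_positives / len(truth)
    # Assumed denominator: every position that carries no introduced SNP.
    negatives = genome_length - len(truth)
    return recovery_rate, false_positives / negatives


# Hypothetical toy genome of 1,004 positions with 4 introduced SNPs.
recovery, fpr = evaluate_calls({10, 20, 30, 40}, {10, 20, 30, 55}, 1004)
print(recovery, fpr)  # 0.75 0.001
```

Because the denominator of the false positive rate is close to the genome length, even a handful of false calls in a genome of several million bases yields rates on the order of those reported above.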

More Related Content

What's hot

Next Generation Sequencing
Next Generation SequencingNext Generation Sequencing
Next Generation SequencingShelomi Karoon
 
Next Generation Sequencing for Identification and Subtyping of Foodborne Pat...
Next Generation Sequencing for Identification and Subtyping of Foodborne Pat...Next Generation Sequencing for Identification and Subtyping of Foodborne Pat...
Next Generation Sequencing for Identification and Subtyping of Foodborne Pat...Nathan Olson
 
20170209 ngs for_cancer_genomics_101
20170209 ngs for_cancer_genomics_10120170209 ngs for_cancer_genomics_101
20170209 ngs for_cancer_genomics_101Ino de Bruijn
 
Next generation sequencing in preimplantation genetic screening (NGS in PGS)
Next generation sequencing in preimplantation genetic screening (NGS in PGS)Next generation sequencing in preimplantation genetic screening (NGS in PGS)
Next generation sequencing in preimplantation genetic screening (NGS in PGS)Mahidol University, Thailand
 
Application of Whole Genome Sequencing in the infectious disease’ in vitro di...
Application of Whole Genome Sequencing in the infectious disease’ in vitro di...Application of Whole Genome Sequencing in the infectious disease’ in vitro di...
Application of Whole Genome Sequencing in the infectious disease’ in vitro di...ExternalEvents
 
Bioinformatics as a tool for understanding carcinogenesis
Bioinformatics as a tool for understanding carcinogenesisBioinformatics as a tool for understanding carcinogenesis
Bioinformatics as a tool for understanding carcinogenesisDespoina Kalfakakou
 
Errors and Limitaions of Next Generation Sequencing
Errors and Limitaions of Next Generation SequencingErrors and Limitaions of Next Generation Sequencing
Errors and Limitaions of Next Generation SequencingNixon Mendez
 
Building a Community Cyberinfrastructure to Support Marine Microbial Ecology ...
Building a Community Cyberinfrastructure to Support Marine Microbial Ecology ...Building a Community Cyberinfrastructure to Support Marine Microbial Ecology ...
Building a Community Cyberinfrastructure to Support Marine Microbial Ecology ...Larry Smarr
 
Haendel clingenetics.3.14.14
Haendel clingenetics.3.14.14Haendel clingenetics.3.14.14
Haendel clingenetics.3.14.14mhaendel
 
Advancing the Metagenomics Revolution
Advancing the Metagenomics RevolutionAdvancing the Metagenomics Revolution
Advancing the Metagenomics RevolutionLarry Smarr
 
CAMERA Presentation at KNAW ICoMM Colloquium May 2008
CAMERA Presentation at KNAW ICoMM Colloquium May 2008CAMERA Presentation at KNAW ICoMM Colloquium May 2008
CAMERA Presentation at KNAW ICoMM Colloquium May 2008Saul Kravitz
 
'Novel technologies to study the resistome'
'Novel technologies to study the resistome''Novel technologies to study the resistome'
'Novel technologies to study the resistome'Willem van Schaik
 
The OptIPlanet Collaboratory Supporting Microbial Metagenomics Researchers Wo...
The OptIPlanet Collaboratory Supporting Microbial Metagenomics Researchers Wo...The OptIPlanet Collaboratory Supporting Microbial Metagenomics Researchers Wo...
The OptIPlanet Collaboratory Supporting Microbial Metagenomics Researchers Wo...Larry Smarr
 
High-Throughput Sequencing
High-Throughput SequencingHigh-Throughput Sequencing
High-Throughput SequencingMark Pallen
 
NGS Applications I (UEB-UAT Bioinformatics Course - Session 2.1.2 - VHIR, Bar...
NGS Applications I (UEB-UAT Bioinformatics Course - Session 2.1.2 - VHIR, Bar...NGS Applications I (UEB-UAT Bioinformatics Course - Session 2.1.2 - VHIR, Bar...
NGS Applications I (UEB-UAT Bioinformatics Course - Session 2.1.2 - VHIR, Bar...VHIR Vall d’Hebron Institut de Recerca
 
WGS in public health microbiology - MDU/VIDRL Seminar - wed 17 jun 2015
WGS in public health microbiology - MDU/VIDRL Seminar - wed 17 jun 2015WGS in public health microbiology - MDU/VIDRL Seminar - wed 17 jun 2015
WGS in public health microbiology - MDU/VIDRL Seminar - wed 17 jun 2015Torsten Seemann
 
2015 beacon-metagenome-tutorial
2015 beacon-metagenome-tutorial2015 beacon-metagenome-tutorial
2015 beacon-metagenome-tutorialc.titus.brown
 
Making Use of NGS Data: From Reads to Trees and Annotations
Making Use of NGS Data: From Reads to Trees and AnnotationsMaking Use of NGS Data: From Reads to Trees and Annotations
Making Use of NGS Data: From Reads to Trees and AnnotationsJoão André Carriço
 
Viral Metagenomics (CABBIO 20150629 Buenos Aires)
Viral Metagenomics (CABBIO 20150629 Buenos Aires)Viral Metagenomics (CABBIO 20150629 Buenos Aires)
Viral Metagenomics (CABBIO 20150629 Buenos Aires)bedutilh
 

What's hot (20)

Next Generation Sequencing
Next Generation SequencingNext Generation Sequencing
Next Generation Sequencing
 
Next Generation Sequencing for Identification and Subtyping of Foodborne Pat...
Next Generation Sequencing for Identification and Subtyping of Foodborne Pat...Next Generation Sequencing for Identification and Subtyping of Foodborne Pat...
Next Generation Sequencing for Identification and Subtyping of Foodborne Pat...
 
20170209 ngs for_cancer_genomics_101
20170209 ngs for_cancer_genomics_10120170209 ngs for_cancer_genomics_101
20170209 ngs for_cancer_genomics_101
 
Next generation sequencing in preimplantation genetic screening (NGS in PGS)
Next generation sequencing in preimplantation genetic screening (NGS in PGS)Next generation sequencing in preimplantation genetic screening (NGS in PGS)
Next generation sequencing in preimplantation genetic screening (NGS in PGS)
 
Application of Whole Genome Sequencing in the infectious disease’ in vitro di...
Application of Whole Genome Sequencing in the infectious disease’ in vitro di...Application of Whole Genome Sequencing in the infectious disease’ in vitro di...
Application of Whole Genome Sequencing in the infectious disease’ in vitro di...
 
Bioinformatics as a tool for understanding carcinogenesis
Bioinformatics as a tool for understanding carcinogenesisBioinformatics as a tool for understanding carcinogenesis
Bioinformatics as a tool for understanding carcinogenesis
 
Errors and Limitaions of Next Generation Sequencing
Errors and Limitaions of Next Generation SequencingErrors and Limitaions of Next Generation Sequencing
Errors and Limitaions of Next Generation Sequencing
 
Building a Community Cyberinfrastructure to Support Marine Microbial Ecology ...
Building a Community Cyberinfrastructure to Support Marine Microbial Ecology ...Building a Community Cyberinfrastructure to Support Marine Microbial Ecology ...
Building a Community Cyberinfrastructure to Support Marine Microbial Ecology ...
 
Haendel clingenetics.3.14.14
Haendel clingenetics.3.14.14Haendel clingenetics.3.14.14
Haendel clingenetics.3.14.14
 
Pattemore 2015
Pattemore 2015Pattemore 2015
Pattemore 2015
 
Advancing the Metagenomics Revolution
Advancing the Metagenomics RevolutionAdvancing the Metagenomics Revolution
Advancing the Metagenomics Revolution
 
CAMERA Presentation at KNAW ICoMM Colloquium May 2008
CAMERA Presentation at KNAW ICoMM Colloquium May 2008CAMERA Presentation at KNAW ICoMM Colloquium May 2008
CAMERA Presentation at KNAW ICoMM Colloquium May 2008
 
'Novel technologies to study the resistome'
'Novel technologies to study the resistome''Novel technologies to study the resistome'
'Novel technologies to study the resistome'
 
The OptIPlanet Collaboratory Supporting Microbial Metagenomics Researchers Wo...
The OptIPlanet Collaboratory Supporting Microbial Metagenomics Researchers Wo...The OptIPlanet Collaboratory Supporting Microbial Metagenomics Researchers Wo...
The OptIPlanet Collaboratory Supporting Microbial Metagenomics Researchers Wo...
 
High-Throughput Sequencing
High-Throughput SequencingHigh-Throughput Sequencing
High-Throughput Sequencing
 
NGS Applications I (UEB-UAT Bioinformatics Course - Session 2.1.2 - VHIR, Bar...
NGS Applications I (UEB-UAT Bioinformatics Course - Session 2.1.2 - VHIR, Bar...NGS Applications I (UEB-UAT Bioinformatics Course - Session 2.1.2 - VHIR, Bar...
NGS Applications I (UEB-UAT Bioinformatics Course - Session 2.1.2 - VHIR, Bar...
 
WGS in public health microbiology - MDU/VIDRL Seminar - wed 17 jun 2015
WGS in public health microbiology - MDU/VIDRL Seminar - wed 17 jun 2015WGS in public health microbiology - MDU/VIDRL Seminar - wed 17 jun 2015
WGS in public health microbiology - MDU/VIDRL Seminar - wed 17 jun 2015
 
2015 beacon-metagenome-tutorial
2015 beacon-metagenome-tutorial2015 beacon-metagenome-tutorial
2015 beacon-metagenome-tutorial
 
Making Use of NGS Data: From Reads to Trees and Annotations
Making Use of NGS Data: From Reads to Trees and AnnotationsMaking Use of NGS Data: From Reads to Trees and Annotations
Making Use of NGS Data: From Reads to Trees and Annotations
 
Viral Metagenomics (CABBIO 20150629 Buenos Aires)
Viral Metagenomics (CABBIO 20150629 Buenos Aires)Viral Metagenomics (CABBIO 20150629 Buenos Aires)
Viral Metagenomics (CABBIO 20150629 Buenos Aires)
 

Viewers also liked

Aug2013 Heidi Rehm integrating large scale sequencing into clinical practice
Aug2013 Heidi Rehm integrating large scale sequencing into clinical practiceAug2013 Heidi Rehm integrating large scale sequencing into clinical practice
Aug2013 Heidi Rehm integrating large scale sequencing into clinical practiceGenomeInABottle
 
Tools for Metagenomics with 16S/ITS and Whole Genome Shotgun Sequences
Tools for Metagenomics with 16S/ITS and Whole Genome Shotgun SequencesTools for Metagenomics with 16S/ITS and Whole Genome Shotgun Sequences
Tools for Metagenomics with 16S/ITS and Whole Genome Shotgun SequencesSurya Saha
 
Plant genome sequencing and crop improvement
Plant genome sequencing and crop improvementPlant genome sequencing and crop improvement
Plant genome sequencing and crop improvementRagavendran Abbai
 
Bioinformática Introdução (Basic NGS)
Bioinformática Introdução (Basic NGS)Bioinformática Introdução (Basic NGS)
Bioinformática Introdução (Basic NGS)Renato Puga
 
Speeding up sequencing: Sequencing in an hour enables sample to answer in a w...
Speeding up sequencing: Sequencing in an hour enables sample to answer in a w...Speeding up sequencing: Sequencing in an hour enables sample to answer in a w...
Speeding up sequencing: Sequencing in an hour enables sample to answer in a w...Thermo Fisher Scientific
 
transforming clinical microbiology by next generation sequencing
transforming clinical microbiology by next generation sequencingtransforming clinical microbiology by next generation sequencing
transforming clinical microbiology by next generation sequencingPathKind Labs
 
NGS overview
NGS overviewNGS overview
NGS overviewAllSeq
 
QIAseq Technologies for Metagenomics and Microbiome NGS Library Prep
QIAseq Technologies for Metagenomics and Microbiome NGS Library PrepQIAseq Technologies for Metagenomics and Microbiome NGS Library Prep
QIAseq Technologies for Metagenomics and Microbiome NGS Library PrepQIAGEN
 
NGx Sequencing 101-platforms
NGx Sequencing 101-platformsNGx Sequencing 101-platforms
NGx Sequencing 101-platformsAllSeq
 
Next-generation sequencing from 2005 to 2020
Next-generation sequencing from 2005 to 2020Next-generation sequencing from 2005 to 2020
Next-generation sequencing from 2005 to 2020Christian Frech
 
Exploring Spark for Scalable Metagenomics Analysis: Spark Summit East talk by...
Exploring Spark for Scalable Metagenomics Analysis: Spark Summit East talk by...Exploring Spark for Scalable Metagenomics Analysis: Spark Summit East talk by...
Exploring Spark for Scalable Metagenomics Analysis: Spark Summit East talk by...Spark Summit
 
NGS - Basic principles and sequencing platforms
NGS - Basic principles and sequencing platformsNGS - Basic principles and sequencing platforms
NGS - Basic principles and sequencing platformsAnnelies Haegeman
 
Next Generation Sequencing Informatics - Challenges and Opportunities
Next Generation Sequencing Informatics - Challenges and OpportunitiesNext Generation Sequencing Informatics - Challenges and Opportunities
Next Generation Sequencing Informatics - Challenges and OpportunitiesChung-Tsai Su
 

Viewers also liked (20)

Aug2013 Heidi Rehm integrating large scale sequencing into clinical practice
Aug2013 Heidi Rehm integrating large scale sequencing into clinical practiceAug2013 Heidi Rehm integrating large scale sequencing into clinical practice
Aug2013 Heidi Rehm integrating large scale sequencing into clinical practice
 
Tools for Metagenomics with 16S/ITS and Whole Genome Shotgun Sequences
Tools for Metagenomics with 16S/ITS and Whole Genome Shotgun SequencesTools for Metagenomics with 16S/ITS and Whole Genome Shotgun Sequences
Tools for Metagenomics with 16S/ITS and Whole Genome Shotgun Sequences
 
Plant genome sequencing and crop improvement
Plant genome sequencing and crop improvementPlant genome sequencing and crop improvement
Plant genome sequencing and crop improvement
 
Bioinformática Introdução (Basic NGS)
Bioinformática Introdução (Basic NGS)Bioinformática Introdução (Basic NGS)
Bioinformática Introdução (Basic NGS)
 
Rossen eccmid2015v1.5
Rossen eccmid2015v1.5Rossen eccmid2015v1.5
Rossen eccmid2015v1.5
 
Speeding up sequencing: Sequencing in an hour enables sample to answer in a w...
Speeding up sequencing: Sequencing in an hour enables sample to answer in a w...Speeding up sequencing: Sequencing in an hour enables sample to answer in a w...
Speeding up sequencing: Sequencing in an hour enables sample to answer in a w...
 
transforming clinical microbiology by next generation sequencing
transforming clinical microbiology by next generation sequencingtransforming clinical microbiology by next generation sequencing
transforming clinical microbiology by next generation sequencing
 
Introduction to Metagenomics Data Analysis - UEB-VHIR - 2013
Introduction to Metagenomics Data Analysis - UEB-VHIR - 2013Introduction to Metagenomics Data Analysis - UEB-VHIR - 2013
Introduction to Metagenomics Data Analysis - UEB-VHIR - 2013
 
Ngs presentation
Ngs presentationNgs presentation
Ngs presentation
 
NGS overview
NGS overviewNGS overview
NGS overview
 
Metagenomics
MetagenomicsMetagenomics
Metagenomics
 
Ngs intro_v6_public
 Ngs intro_v6_public Ngs intro_v6_public
Ngs intro_v6_public
 
QIAseq Technologies for Metagenomics and Microbiome NGS Library Prep
QIAseq Technologies for Metagenomics and Microbiome NGS Library PrepQIAseq Technologies for Metagenomics and Microbiome NGS Library Prep
QIAseq Technologies for Metagenomics and Microbiome NGS Library Prep
 
NGx Sequencing 101-platforms
NGx Sequencing 101-platformsNGx Sequencing 101-platforms
NGx Sequencing 101-platforms
 
Next-generation sequencing from 2005 to 2020
Next-generation sequencing from 2005 to 2020Next-generation sequencing from 2005 to 2020
Next-generation sequencing from 2005 to 2020
 
Introduction to next generation sequencing
Introduction to next generation sequencingIntroduction to next generation sequencing
Introduction to next generation sequencing
 
Exploring Spark for Scalable Metagenomics Analysis: Spark Summit East talk by...
Exploring Spark for Scalable Metagenomics Analysis: Spark Summit East talk by...Exploring Spark for Scalable Metagenomics Analysis: Spark Summit East talk by...
Exploring Spark for Scalable Metagenomics Analysis: Spark Summit East talk by...
 
NGS - Basic principles and sequencing platforms
NGS - Basic principles and sequencing platformsNGS - Basic principles and sequencing platforms
NGS - Basic principles and sequencing platforms
 
Next Generation Sequencing Informatics - Challenges and Opportunities
Next Generation Sequencing Informatics - Challenges and OpportunitiesNext Generation Sequencing Informatics - Challenges and Opportunities
Next Generation Sequencing Informatics - Challenges and Opportunities
 
Poster ESHG
Poster ESHGPoster ESHG
Poster ESHG
 

Building bioinformatics resources for global community

  • 1. Building bioinformatics resources for the global community James Pettengill james.pettengill@fda.hhs.gov Biostatistics and Bioinformatics Staff Office of Analytics and Outreach FDA Center for Food Safety and Applied Nutrition GMI9 May 24, 2016 Rome, Italy
  • 2. CFSAN’s open-access peer reviewed methods for analyzing and differentiating among samples based on WGS data. Submitted 16 April 2014 Accepted 23 September 2014 Published 14 October 2014 Corresponding author Errol Strain, Errol.Strain@fda.hhs.gov Academic editor Keith Crandall Additional Information and Declarations can be found on page 21 DOI 10.7717/peerj.620 An evaluation of alternative methods for constructing phylogenies from whole genome sequence data: a case study with Salmonella James B. Pettengill, Yan Luo, Steven Davis, Yi Chen, Narjol Gonzalez-Escalona, Andrea Ottesen, Hugh Rand, Marc W. Allard and Errol Strain Center for Food Safety & Applied Nutrition, U.S. Food & Drug Administration, College Park, MD, USA ABSTRACT Comparative genomics based on whole genome sequencing (WGS) is increasingly being applied to investigate questions within evolutionary and molecular biology, as well as questions concerning public health (e.g., pathogen outbreaks). Given the impact that conclusions derived from such analyses may have, we have evaluated the robustness of clustering individuals based on WGS data to three key factors: (1) next-generation sequencing (NGS) platform (HiSeq, MiSeq, IonTorrent, 454, and SOLiD), (2) algorithms used to construct a SNP (single nucleotide polymorphism) matrix (reference-based and reference-free), and (3) phylogenetic inference method (FastTreeMP, GARLI, and RAxML). We carried out these analyses on 194 whole genome sequences representing 107 unique Salmonella enterica subsp. enterica ser. Montevideo strains. Reference-based approaches for identifying SNPs produced trees that were significantly more similar to one another than those produced under the reference-free approach. Topologies inferred using a core matrix (i.e., no missing data) were significantly more discordant than those inferred using a non-core matrix that allows for some missing data. 
However, allowing for too much missing data likely results in a high false discovery rate of SNPs. When analyzing the same SNP matrix, we observed that the more thorough inference methods implemented in GARLI and RAxML produced more similar topologies than FastTreeMP. Our results also confirm that reproducibility varies among NGS platforms where the MiSeq had the lowest number of pairwise differences among replicate runs. Our investigation into the robustness of clustering patterns illustrates the importance of carefully considering how data from different platforms are combined and analyzed. We found clear differences in the topologies inferred, and certain methods performed significantly better than others for discriminating between the highly clonal organisms investigated here. The methods supported by our results represent a preliminary set of guidelines and a step towards developing validated standards for clustering based on whole genome sequence data.
  • 3. Real-time pathogen detection in the era of whole-genome sequencing and big data: K-mer and site-based methods for inferring the distances among tens of thousands of Salmonella samples
  • 9. Premise/Background of the project
      •  The adoption of whole-genome sequencing within the public health realm has resulted in large databases being populated in real-time.
      •  These databases contain 60,000+ samples and are expected to grow to hundreds of thousands within a few years.
      •  For these databases to be of optimal use, one must be able to quickly interrogate them to accurately determine the genomic distances among a set of samples.
      •  Being able to do so is challenging due to both biological (evolutionarily diverse samples) and computational (petabytes of sequence data) issues.
      •  Evaluated 7 measures of genetic distance based on k-mer profiles (Jaccard, Euclidean, Manhattan, Mash Jaccard, and Mash distances) and nucleotide sites (NUCmer and multi-locus sequence typing (MLST))
      •  Empirical data: whole-genome sequence data from 18,997 Salmonella isolates
  • 11. Experimental design: based on a classification scheme, determine how well each distance measure performs. [Figure: counts (#) of genetic distances for intra-category and inter-category comparisons, shown for an efficient method and an inefficient method]
  • 13. Experimental design: Empirical data:
      •  Analyze different distance methods on de novo assemblies of all Salmonella samples in GenomeTrakr
      •  Use serovar as the classification scheme
      [Figure: efficient method, counts (#) of genetic distances for intra-enteritidis and inter-enteritidis comparisons]
  • 18. Experimental design: Empirical data: using cloud computing to perform assemblies on GenomeTrakr data
      Assembly workflow:
      1. Obtain latest metadata file from NCBI pathogen database
      2. Parse metadata and download raw data
      3. Quality filter using fastx toolkit
      4. Taxonomic/contamination filtering using Kraken with custom db
      5. Assembly using SPAdes
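The download, filtering, and assembly steps of this workflow can be sketched as a function that lists the command lines for one sample without running them. This is an illustrative sketch only, not the pipeline used for the talk: the function name, directory layout, run accession, and the quality thresholds (`-q 20 -p 80`) are assumptions, and it starts from a run accession that has already been parsed out of the metadata file. Re-pairing mates after quality filtering and removing reads flagged by Kraken are left out.

```python
from pathlib import Path


def assembly_commands(run_id: str, workdir: str, kraken_db: str) -> list[list[str]]:
    """Return the commands for one sample in workflow order (nothing is executed)."""
    wd = Path(workdir) / run_id
    raw = [wd / f"{run_id}_{mate}.fastq" for mate in (1, 2)]
    filtered = [wd / f"{run_id}_{mate}.filtered.fastq" for mate in (1, 2)]

    # Download the raw paired-end reads from the SRA (SRA Toolkit).
    cmds = [["fastq-dump", "--split-files", "--outdir", str(wd), run_id]]

    # Quality filter each mate file with the FASTX-Toolkit.
    # The thresholds are hypothetical: keep reads with >= 80% of bases at Q >= 20.
    for src, dst in zip(raw, filtered):
        cmds.append(["fastq_quality_filter", "-q", "20", "-p", "80",
                     "-i", str(src), "-o", str(dst)])

    # Classify reads against a custom Kraken database to flag contamination.
    cmds.append(["kraken", "--db", kraken_db, "--fastq-input", "--paired",
                 str(filtered[0]), str(filtered[1]),
                 "--output", str(wd / "kraken.out")])

    # Assemble the filtered reads with SPAdes.
    cmds.append(["spades.py", "-1", str(filtered[0]), "-2", str(filtered[1]),
                 "-o", str(wd / "spades")])
    return cmds


if __name__ == "__main__":
    # "SRR0000001" and both paths are placeholders, not real inputs.
    for cmd in assembly_commands("SRR0000001", "work", "custom_kraken_db"):
        print(" ".join(cmd))
```

Returning the commands as lists keeps the sketch side-effect free; a scheduler on a cloud cluster could submit each sample's list as one job.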
  • 19. 1. Obtain an assembly for each sample within GenomeTrakr
      •  Use pilot of cloud computing to accomplish assemblies ("cloudbursting")
      Summary: We have successfully completed running Use Cases 2 and 3 on AWS servers via the CycleCloud platform. Even without time for extensive optimization of the clusters, we were able to complete the Use Cases rapidly and inexpensively.
      Use Case 2: Listeria Isolates (3,645 Listeria assemblies)
      A workflow was designed to analyze sequencing data from all of the publicly available Listeria isolates (3645) collected by the GenomeTrakr network. This workflow involves downloading data from the NCBI servers, trimming the sequencing reads based on quality scores, filtering the reads based on quality and taxonomy, and assembling the reads into contiguous genome segments. The results of this workflow will allow us to improve our methods of identifying outbreak isolates.
      1. Cluster Specs
         Max cores: 4000
         Max parallel jobs: 1000
         Master node: i2.4xlarge
         Compute nodes: r3.2xlarge, r3.4xlarge
      2. Results
         Jobs: 3645
         Run time: 8 hours
         Job completion rate: 99.8%
         Approximate cost: $1800.00
      3. Additional Notes
         Local runtime: 3.5 days
         Feasible to run locally: YES
         Anticipated frequency: once/quarter
         Estimated yearly cost: $9000.00 *
         * Assuming the current growth rate of this dataset, we estimate 900 additional samples per quarter for the next year.
      Use Case 3: Salmonella Isolates (25,765 Salmonella assemblies)
      Our revised Use Case 3 applies the workflow described […] publicly available Salmonella isolates (25765) collected by […] network. The analysis of this dataset is much more difficult due […] size and a much larger number of isolates and is not feasible o[…] resources.
      1. Cluster Specs
         Max cores: 12000
         Max parallel jobs: 3000
         Master node: i2.4xlarge
         Compute nodes: r3.2xlarge, r3.4xlarge, r3.8xlarge
      2. Results
         Jobs: 25765
         Run time: 20 hours
         Job completion rate: 99.1%
         Approximate cost: $8000.00
      3. Additional Notes
         Estimated local runtime: 23 days
         Feasible to run locally: NO
         Anticipated frequency: once/quarter
         Estimated yearly cost: $56000.00 *
         * Assuming the current growth rate of this dataset, we estimate 12,50[…] per quarter for the next year.
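The figures reported for the two use cases allow a quick back-of-the-envelope comparison. The sketch below takes only the job counts, run times, and approximate costs from the slide and derives the cost per assembly and the speedup over the (estimated) local runtime; the dictionary layout and function name are illustrative.

```python
# Figures copied from the Use Case 2 and Use Case 3 summaries on the slide.
use_cases = {
    "Listeria": {"jobs": 3645, "cloud_hours": 8, "local_days": 3.5, "cost_usd": 1800.00},
    "Salmonella": {"jobs": 25765, "cloud_hours": 20, "local_days": 23, "cost_usd": 8000.00},
}


def summarize(case: dict) -> tuple[float, float]:
    """Return (cost per assembly in USD, speedup of cloud run over local run)."""
    cost_per_assembly = case["cost_usd"] / case["jobs"]
    speedup = case["local_days"] * 24 / case["cloud_hours"]
    return round(cost_per_assembly, 2), round(speedup, 1)


if __name__ == "__main__":
    for name, case in use_cases.items():
        cost, speedup = summarize(case)
        print(f"{name}: ${cost:.2f} per assembly, {speedup}x faster than local")
```

On these numbers, each assembly costs roughly $0.49 (Listeria) and $0.31 (Salmonella), and the cloud runs finish about 10.5 and 27.6 times faster than the local estimates.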
  • 20. Experimental design: Distance measures
      Site-based:
        Sample1: ACCTAGTACC
        Sample2: ACGTACTACC
        Similarity = 0.8
        Requires statements about homology/sequence alignment
      Kmer-based (L = 9):
        Sample1: ACCTAGTACC
          kmer1: ACCTAGTAC
          kmer2: CCTAGTACC
        Sample2: ACGTACTACC
          kmer1: ACGTACTAC
          kmer2: CGTACTACC
        Similarity = 0
        Fast but loss/oversimplification of information
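The two similarity values on this slide can be reproduced with a few lines of Python. This is a minimal sketch: the function names are illustrative, the site-based comparison assumes the two sequences are already aligned and of equal length, and the k-mer similarity is taken to be the Jaccard index of the two k-mer sets.

```python
def site_similarity(a: str, b: str) -> float:
    """Fraction of aligned positions at which two equal-length sequences agree."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def kmers(seq: str, k: int) -> set[str]:
    """All distinct substrings of length k in seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def kmer_similarity(a: str, b: str, k: int) -> float:
    """Jaccard index of the k-mer sets: shared k-mers divided by all distinct k-mers."""
    ka, kb = kmers(a, k), kmers(b, k)
    union = ka | kb
    return len(ka & kb) / len(union) if union else 0.0


if __name__ == "__main__":
    s1, s2 = "ACCTAGTACC", "ACGTACTACC"
    print(site_similarity(s1, s2))       # 0.8
    print(kmer_similarity(s1, s2, k=9))  # 0.0
```

The two mismatches fall inside every 9-mer, so no k-mer is shared even though 8 of 10 sites agree. With k = 3 the same pair shares 4 of 10 distinct k-mers, which shows how strongly the k-mer result depends on the chosen length.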
  • 21. Summary of methods used to infer the relationships among samples.

      | Class | Method | Description | Exec. time (s) |
      |---|---|---|---|
      | Site-based | Nucmer§ | Pairwise genome alignment using suffix arrays | 11.9 |
      | Site-based | wgMLST¶ | Gene-based approach | 46.95 |
      | K-mer based | Jaccard Index§ | The intersection divided by the union of all K-mers found between two samples | 9.4 |
      | K-mer based | Manhattan Distance§ | Sum of the absolute differences between the abundance of each K-mer present between two samples | 45.1 |
      | K-mer based | Euclidean Distance§ | The square root of the sum of squares of all pairwise differences in K-mer abundance | 44.2 |
      | K-mer based | Mash Distance | MinHash (Broder 1998) technique that reduces genomes to sketches and estimates a novel evolutionary distance metric among them | 1.2 |
      | K-mer based | Mash Jaccard Distance | The Jaccard Distance (as described above) but based on the sketch size (e.g., the number of hashes) | 1.2 |

      § Performed using de novo assemblies and requires k-mer indexing, which with jellyfish takes 7.4 s (0.8) per sample (2.1 days for 25,000 samples)
      ¶ Requires a reference genome
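The k-mer based measures in this summary can be written down directly from their descriptions. The sketch below is illustrative rather than the code used for the talk: it counts k-mers in plain Python instead of using jellyfish, it computes the Jaccard index on full k-mer sets rather than on MinHash sketches, and the Mash distance is given as the formula published for Mash, D = -(1/k) ln(2j / (1 + j)), applied to a Jaccard index j.

```python
import math
from collections import Counter


def kmer_counts(seq: str, k: int) -> Counter:
    """Abundance of every k-mer in seq."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def jaccard_index(a: Counter, b: Counter) -> float:
    """Intersection divided by union of the k-mers found in the two samples."""
    union = a.keys() | b.keys()
    return len(a.keys() & b.keys()) / len(union) if union else 0.0


def manhattan(a: Counter, b: Counter) -> int:
    """Sum of absolute differences in abundance over all k-mers in either sample."""
    return sum(abs(a[x] - b[x]) for x in a.keys() | b.keys())


def euclidean(a: Counter, b: Counter) -> float:
    """Square root of the sum of squared differences in k-mer abundance."""
    return math.sqrt(sum((a[x] - b[x]) ** 2 for x in a.keys() | b.keys()))


def mash_distance(j: float, k: int) -> float:
    """Mash distance for a Jaccard index j and k-mer length k (1.0 if nothing is shared)."""
    if j <= 0:
        return 1.0
    return -math.log(2 * j / (1 + j)) / k


if __name__ == "__main__":
    a = kmer_counts("ACGTACGT", 3)
    b = kmer_counts("ACGTTCGT", 3)
    j = jaccard_index(a, b)
    print(j, manhattan(a, b), euclidean(a, b), mash_distance(j, 3))
```

The footnoted indexing cost is also easy to check: 7.4 s per sample over 25,000 samples is 185,000 s, or about 2.1 days.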
  • 22. Classification of simulated data: ROC curves identical across different distance methods
      * Simulated data is not complex/noisy enough
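One way to score how well a distance measure separates intra-category from inter-category comparisons is the area under the ROC curve. The sketch below uses the rank-based equivalent of that area, namely the probability that a randomly chosen intra-category distance is smaller than a randomly chosen inter-category distance. The function name and the example distances are made up for illustration and are not results from the talk.

```python
def auc(intra: list[float], inter: list[float]) -> float:
    """Area under the ROC curve for classifying pairs as same-category by distance.

    Equals the fraction of (intra, inter) pairs in which the intra-category
    distance is the smaller one, with ties counted as half.
    """
    if not intra or not inter:
        raise ValueError("both groups need at least one distance")
    wins = 0.0
    for d_in in intra:
        for d_out in inter:
            if d_in < d_out:
                wins += 1.0
            elif d_in == d_out:
                wins += 0.5
    return wins / (len(intra) * len(inter))


if __name__ == "__main__":
    # Hypothetical distances: same-serovar pairs vs. different-serovar pairs.
    print(auc([0.1, 0.2], [0.8, 0.9]))  # perfectly separated
    print(auc([0.1, 0.9], [0.5, 0.5]))  # no better than chance
```

An AUC of 1.0 corresponds to the "efficient method" case in the experimental design, where the two distributions do not overlap; values near 0.5 indicate a measure that cannot tell the categories apart.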
  • 27. Summary/Implications:
      •  There are features (e.g., genomic, assembly, and contamination) that cause k-mer based methods to fail to accurately capture the distance between samples.
      •  Treating absent data as informative may be problematic.
      •  Site-based methods, like NUCmer and MLST, tended to be superior in performance.
      •  Accessing the computing resources necessary to perform site-based methods may be challenging when analyzing large databases.
      •  If working with k-mer distances, err on the side of false positives.
      •  And have high-quality assemblies.
  • 28. Acknowledgements FDA •  Center for Food Safety and Applied Nutrition •  Biostats/Bioinformatics staff – J. Baugher, H. Rand, J. Miller, Y. Luo, S. Davis, E. Strain •  Center for Veterinary Medicine •  Office of Regulatory Affairs National Institutes of Health •  National Center for Biotechnology Information State Health and University Labs •  Alaska •  Arizona •  California •  Florida •  Hawaii •  Maryland •  Minnesota •  New Mexico •  New York •  South Dakota •  Texas •  Virginia •  Washington USDA/FSIS •  Eastern Laboratory CDC •  Enteric Diseases Laboratory •  INEI-ANLIS “Carlos Malbrán Institute,” Argentina •  Centre for Food Safety, University College Dublin, Ireland •  Food Environmental Research Agency, UK •  Public Health England, UK •  WHO •  Illumina •  Pac Bio •  CLC Bio •  Other independent collaborators
  • 31. Validation exercise key findings:
      •  False negatives are primarily due to failure to meet consensus frequency threshold
      •  False negatives are not random across the genome
      [Figure: number of false negatives (axis 0 to 5000) in the 20× and 100× coverage datasets, grouped by cause: ConsensusFrequency<0.9 and Coverage<8]
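The two causes of false negatives named in the figure correspond to two per-site filters. As a rough illustration, the sketch below reports which filters a candidate SNP site would fail; the thresholds (coverage of at least 8, consensus frequency of at least 0.9) are read off the figure labels, while the function name and return format are assumptions and this is not code from the CFSAN SNP Pipeline.

```python
def failed_filters(coverage: int, consensus_freq: float,
                   min_coverage: int = 8, min_consensus_freq: float = 0.9) -> list[str]:
    """Return the names of the filters a candidate SNP site fails (empty list = passes)."""
    failures = []
    if coverage < min_coverage:
        failures.append("Coverage<8")
    if consensus_freq < min_consensus_freq:
        failures.append("ConsensusFrequency<0.9")
    return failures


if __name__ == "__main__":
    # Hypothetical sites: (read depth, fraction of reads supporting the consensus base).
    for depth, freq in [(20, 0.95), (20, 0.85), (5, 0.95), (5, 0.5)]:
        print(depth, freq, failed_filters(depth, freq) or "pass")
```

A true SNP that fails either filter is not called and so is counted as a false negative in a validation run of this kind.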
  • 32. Validation exercise of CFSAN SNP Pipeline key findings:
      •  100× dataset
         •  Recovered 98.9% of the introduced SNPs
         •  False positive rate of 1.04 × 10^-6
      •  20× dataset
         •  Recovered 98.8% of SNPs
         •  False positive rate of 8.34 × 10^-7