NGS data processing
with GENALICE MAP
Remco Ursem
Rijk Zwaan Biotechnology
Research Facilities
Fijnaart, The Netherlands
Outline
•  Rijk Zwaan
•  Sequencing data challenge
•  Comparing GENALICE MAP with
BWA/GATK
– Storage footprint
– Analysis speed
– First in-house experiences
•  Wrap-up
•  Since 1924
•  Independent
•  2,500 colleagues
Family company
Our organization
> 25 crops
Data deluge?
•  Re-sequencing a larger number of
accessions/lines
•  References available for a growing
number of crops
•  Multiple references available per
(sub)species
•  Fast-evolving versions of
references
Yes!
•  Storage footprint grows exponentially
•  Computational demand as well
•  Looking for a solution, we came into
contact with GENALICE
Storage footprint
Cucumber alignment file size
BAM: 27.9 GB
GAR: 0.99 GB
GAR is 28X smaller
GAR file
Lossless compression?
•  No, but does it matter?
•  Probably not when we have clean,
high-quality data.
Analysis
Variant calling pipeline
•  Based on BWA / GATK (Broad)
Conclusions
•  This variant calling pipeline does
need serious hardware!
•  Not all steps can be easily
parallelized.
•  Does not scale in our setup.
GENALICE Server
•  24 cores
•  128 GB RAM
Two test cases
•  Cucumber sample
– 300 Mb genome
– No problems with old pipeline
•  Brassica sample
– 600 Mb genome
– Unreliable calls detected
Elapsed time to Map/Call cucumber
            BWA/GATK    GENALICE
mapping     21:02       16
calling     2:24        1
total       23:26       17
Speed-up: 83X
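The 83X figure can be reproduced from the totals in the table. A minimal sketch, assuming the BWA/GATK column is hours:minutes and the GENALICE column is minutes (the slide does not label the units; the ratio is the same if they are read as minutes:seconds and seconds). The function names are illustrative only:

```python
def colon_time_to_units(value: str) -> int:
    """Convert a 'major:minor' time such as '23:26' to minor units (60 per major)."""
    major, minor = value.split(":")
    return int(major) * 60 + int(minor)


def speedup(baseline: str, genalice: int) -> float:
    """Ratio of baseline elapsed time to GENALICE elapsed time."""
    return colon_time_to_units(baseline) / genalice


# Cucumber totals from the table above.
cucumber = speedup("23:26", 17)
print(f"cucumber speed-up: {cucumber:.0f}X")
```

The same function applied to the mapping and calling rows gives the per-step ratios, which shows where most of the gain comes from.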
Comparing Cucumber SNP calling
[Bar chart: total SNPs and overlapping genotypes, BWA/GATK vs GENALICE; axis 0 to 700,000]
Higher-quality SNPs overlap.
GENALICE settings probably too stringent?
BWA/GATK too loose?
Both?
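One way to compute such an overlap is to key every call by site and alternate allele, then compare genotypes at the sites both pipelines report. A minimal sketch, assuming single-sample VCF text that carries a GT entry in its FORMAT column; the helper names and the toy records are made up for illustration and are not the comparison actually used for the slides:

```python
def read_genotypes(vcf_text: str) -> dict:
    """Map (chrom, pos, ref, alt) -> normalised genotype for a single-sample VCF."""
    calls = {}
    for line in vcf_text.splitlines():
        if not line or line.startswith("#"):
            continue  # skip header and blank lines
        fields = line.split("\t")
        chrom, pos, ref, alt = fields[0], int(fields[1]), fields[3], fields[4]
        format_keys = fields[8].split(":")
        sample_values = fields[9].split(":")
        gt = sample_values[format_keys.index("GT")]
        # Treat phased and unphased calls alike and ignore allele order.
        calls[(chrom, pos, ref, alt)] = tuple(sorted(gt.replace("|", "/").split("/")))
    return calls


def overlap(calls_a: dict, calls_b: dict) -> dict:
    """Count shared sites and how many of them carry the same genotype."""
    shared = calls_a.keys() & calls_b.keys()
    same = sum(1 for key in shared if calls_a[key] == calls_b[key])
    return {
        "total_a": len(calls_a),
        "total_b": len(calls_b),
        "shared_sites": len(shared),
        "same_genotype": same,
    }


def record(chrom, pos, ref, alt, gt):
    """Build one toy VCF data line (illustrative values only)."""
    return "\t".join([chrom, str(pos), ".", ref, alt, "50", "PASS", ".", "GT", gt])


bwa_gatk_vcf = "\n".join([
    record("chr1", 100, "A", "G", "1/1"),
    record("chr1", 250, "C", "T", "0/1"),
    record("chr2", 40, "G", "A", "1/1"),
])
genalice_vcf = "\n".join([
    record("chr1", 100, "A", "G", "1/1"),
    record("chr1", 250, "C", "T", "1/1"),
])

summary = overlap(read_genotypes(bwa_gatk_vcf), read_genotypes(genalice_vcf))
print(summary)
```

Counting shared sites separately from matching genotypes helps tell "one caller is more stringent" (fewer sites) apart from "the callers disagree" (same site, different genotype).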
Comparing SNP zygosity
[Stacked bar chart: percentage of homozygous and heterozygous SNPs, BWA/GATK vs GENALICE; axis 0% to 100%]
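The zygosity split can be derived from the genotype strings of the called SNPs. A minimal sketch, assuming VCF-style genotypes such as `1/1` or `0/1`; the function names and the toy list are illustrative only:

```python
from collections import Counter


def zygosity(gt: str) -> str:
    """Classify a VCF-style genotype string."""
    alleles = gt.replace("|", "/").split("/")
    if "." in alleles:
        return "missing"
    return "homozygous" if len(set(alleles)) == 1 else "heterozygous"


def zygosity_percentages(genotypes) -> dict:
    """Percentage of homozygous and heterozygous calls, ignoring missing ones."""
    counts = Counter(zygosity(gt) for gt in genotypes)
    called = counts["homozygous"] + counts["heterozygous"]
    if called == 0:
        return {"homozygous": 0.0, "heterozygous": 0.0}
    return {key: 100.0 * counts[key] / called for key in ("homozygous", "heterozygous")}


# Toy genotypes; a variants-only call set is assumed, so 0/0 does not occur.
toy_genotypes = ["1/1", "0/1", "1/1", "1|1", "./."]
percentages = zygosity_percentages(toy_genotypes)
print(percentages)
```

For an inbred line a high heterozygous fraction is itself a warning sign, so comparing this split between pipelines is a quick sanity check.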
Elapsed time to Map/Call Brassica
            BWA/GATK    GENALICE
mapping     20:03       13
calling     1:18        1
total       21:21       14
Speed-up: 92X
Comparing Brassica SNP calling
[Bar chart: total SNPs and overlapping genotypes, BWA/GATK vs GENALICE; axis 0 to 4,500,000]
Too stringent, but is something else going on?
Whole genome duplications
Comparing SNP zygosity
[Stacked bar chart: percentage of homozygous and heterozygous SNPs, BWA/GATK vs GENALICE; axis 0% to 100%]
Comparing with marker assay results
[Bar chart: failed and passed markers, BWA/GATK vs GENALICE; axis 0 to 800]
Conclusions
•  GENALICE makes it possible to do
wide parameter sweeps.
•  This enables crop/assembly-specific
pipeline fine-tuning.
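A parameter sweep in this sense reruns the same map/call step over a grid of settings and keeps the combination that agrees best with trusted data. A minimal sketch, assuming marker assay genotypes serve as the reference; `fake_pipeline`, the parameter names `min_quality` and `min_depth`, and the scoring are hypothetical placeholders, not GENALICE options:

```python
from itertools import product


def sweep(run_pipeline, grid: dict, truth: dict):
    """Try every combination in `grid`; return (best_params, best_score)."""
    names = list(grid)
    best_params, best_score = None, -1.0
    for values in product(*(grid[name] for name in names)):
        params = dict(zip(names, values))
        calls = run_pipeline(**params)
        # Score: fraction of reference genotypes reproduced by this run.
        matches = sum(1 for site, gt in truth.items() if calls.get(site) == gt)
        score = matches / len(truth)
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score


def fake_pipeline(min_quality, min_depth):
    """Stand-in for a real map/call run: filters fixed toy candidates."""
    candidates = {
        "m1": (40, 12, "1/1"),  # (quality, depth, genotype)
        "m2": (25, 8, "0/1"),
        "m3": (15, 20, "1/1"),
    }
    return {
        site: gt
        for site, (quality, depth, gt) in candidates.items()
        if quality >= min_quality and depth >= min_depth
    }


marker_genotypes = {"m1": "1/1", "m2": "0/1", "m3": "0/1"}
grid = {"min_quality": [10, 20, 30], "min_depth": [5, 10]}
best_params, best_score = sweep(fake_pipeline, grid, marker_genotypes)
print(best_params, round(best_score, 2))
```

An exhaustive grid like this is only practical when a single run takes minutes rather than most of a day, which is the point the slide makes.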
Integration with other tools
•  BAM and VCF export functionality
and GAR-API suite available.
•  IGV plugin works with GAR files.
More than SNPs
EliaStupka_etal_NGS-meeting 2012
More than SNPs
•  GENALICE pipeline calls INDELs up to
126 base pairs.
•  SV calling under development.
Wrap-up
•  GENALICE pipeline is very fast!
•  Feasible to rerun entire collection of
resequenced lines (e.g. new assembly)
•  The higher-quality variants found
overlap with the BWA/GATK results.
•  Fast mapping and calling makes it
possible to do parameter sweeps.
•  This enables assembly- or project-specific
pipeline fine-tuning.
Wrap-up (continued)
•  GAR file is a very efficient format.
•  GAR plugin for IGV and GAR-API
available.
•  INDELs called up to 126 bases.
•  Structural Variation detection is under
development.
•  RNA-Seq mapping functionality
available, not yet tested at Rijk Zwaan.
Acknowledgements
Martijn van Elk
Bioinformatics group
Tim Karten
Bas Tolhuis
