Population Calling
a powerful tool for novel mutation detection in larger
sample pools
January 12, 2016
Antoine Janssen
antoine.janssen@keygene.com
June 2015 / Antoine Janssen
The crop innovation company
Overview
• Genotype calling strategies
• Infrastructure setup
• Show case 3,000 rice genomes
• Genalice stepwise vs Population calling
• BWA/GATK joint genotyping vs Population calling
• Conclusions
Genotype calling strategies
Overview
• Single sample calling
• Batch calling
• Joint calling
Pros of joint calling
• Distinction between homozygous reference and missing data
• Greater sensitivity for low-frequency variants
• Greater ability to filter out false positives
Cons of joint calling
• Scaling and infrastructure
• Incremental analysis (adding samples requires re-running the joint call)
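The first advantage listed above can be illustrated with a minimal sketch. It assumes a hypothetical toy representation (per-sample variant calls plus the set of positions each sample covers); the data layout and the name `merge_calls` are illustrative and do not come from the slides or from any tool mentioned in them.

```python
# Hypothetical toy data: each sample has variant calls keyed by (chrom, pos)
# and a set of positions for which it actually has read coverage.
def merge_calls(samples):
    """Merge per-sample calls into one genotype matrix.

    A site missing from a sample's variant list is reported as 0/0 only
    when that sample is covered there; otherwise it stays unknown (./.).
    """
    sites = sorted({site for s in samples.values() for site in s["variants"]})
    matrix = {}
    for site in sites:
        row = {}
        for name, s in samples.items():
            if site in s["variants"]:
                row[name] = s["variants"][site]
            elif site in s["covered"]:
                row[name] = "0/0"  # covered, no variant: homozygous reference
            else:
                row[name] = "./."  # no coverage: genotype unknown
        matrix[site] = row
    return matrix


samples = {
    "A": {"variants": {("chr1", 100): "1/1"},
          "covered": {("chr1", 100), ("chr1", 200)}},
    "B": {"variants": {("chr1", 200): "0/1"},
          "covered": {("chr1", 200)}},
}
matrix = merge_calls(samples)
```

Without the coverage information, both absent genotypes would have to be reported the same way, which is the ambiguity that single sample calling leaves open.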
Single sample calling
The classic route: BWA/GATK
Stages: preprocessing, read mapping, genotyping, postprocessing
Tools, in the order shown:
• python script
• sickle
• BWA
• samtools
• Picard MarkDuplicates
• GATK IndelRealigner
• GATK UnifiedGenotyper
• several python scripts
Stepwise approach (1)
• Map per sample: mapping of all samples against the reference
• Genotype per sample: genotyping of all samples
• Store variant positions: create a BED file of all variation over all samples
Stepwise approach (2)
• Genotype per sample: genotyping of all samples with the BED file, forcing calls on all positions
• Adjust VCF per sample: adjust the VCF files so that every sample has the same number of positions, forcing unknown (./.)
• Merge VCFs & post-process: merge the VCFs and post-process all samples
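The bookkeeping behind the stepwise approach can be sketched as follows. This is a simplified illustration under stated assumptions: calls are held in plain dictionaries keyed by (chrom, 1-based position), the helper names `union_bed` and `pad_calls` are hypothetical, and the second genotyping pass that forces calls at the BED positions is not modelled; positions a sample lacks are simply set to ./.

```python
def union_bed(per_sample_calls):
    """Return BED-style (chrom, start, end) records, 0-based half-open,
    for every position that is variant in at least one sample."""
    positions = sorted({site for calls in per_sample_calls.values()
                        for site in calls})
    return [(chrom, pos - 1, pos) for chrom, pos in positions]


def pad_calls(calls, bed):
    """Give one sample a genotype at every BED position, using ./. where absent."""
    return {(chrom, end): calls.get((chrom, end), "./.")
            for chrom, _start, end in bed}


per_sample_calls = {
    "s1": {("chr1", 10): "0/1"},
    "s2": {("chr1", 25): "1/1", ("chr2", 5): "0/1"},
}
bed = union_bed(per_sample_calls)
padded = {name: pad_calls(calls, bed) for name, calls in per_sample_calls.items()}

# Merge: one row per position, one column per sample.
merged = {site: {name: padded[name][site] for name in padded}
          for site in padded["s1"]}
```

Because every padded sample carries the same set of positions, the final merge reduces to a lookup per site.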
Genalice Population Calling
• Map per sample: map all samples against the reference
• Genotype per sample: genotype all samples
• Pop call & post-process: population calling and post-processing of all samples
Genalice Population Calling: Process
[Diagram: four fastq inputs, each mapped with gaMap to a GAR file; the GAR files pass through gaPopulation add, gaPopulation merge and gaPopulation commit to produce a GVM; gaPopulation extract produces the VCF. An XML label also appears in the diagram.]
Infrastructure setup
• Data: 3,000 rice accessions, ~17 Tb, fastq.gz format (Illumina HiSeq2500)
• Streaming fastq.gz over NFS
• Compute: Intel Xeon 2620 (12 cores, 2 GHz), 96 GB RAM
• Desktop client (VM), SSH connection
Show case: 3,000 rice genome project
• Rice (Oryza sativa L.)
• 3,000 rice accessions
• 89 countries
• Avg seq depth of 14X
• Mapped to Nipponbare (IRGSP-1.0)
o 374 Mbp
o 12 chromosomes
• BWA and GATK (DNAnexus)
• Data on
o GigaScience
o EBI
o Amazon
Test set 1: 131 rice accessions
• Subset of 131 accessions (out of 3,000) selected
• All major rice types are represented
• Mapped to Nipponbare (IRGSP-1.0)

Type                  #
Aus/boro              5
Basmati/sadri         1
Indica               19
Intermediate type     7
Japonica             11
Temperate japonica   71
Tropical japonica    17
Grand Total         131
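As a quick consistency check on the table above, the per-type counts can be summed and expressed as shares of the test set. The variable names are illustrative; the counts are taken directly from the table.

```python
# Accession counts per rice type, as listed in the table.
accessions_by_type = {
    "Aus/boro": 5,
    "Basmati/sadri": 1,
    "Indica": 19,
    "Intermediate type": 7,
    "Japonica": 11,
    "Temperate japonica": 71,
    "Tropical japonica": 17,
}

total = sum(accessions_by_type.values())

# Share of each type in the test set, in percent.
shares = {rice_type: 100 * n / total
          for rice_type, n in accessions_by_type.items()}

for rice_type, pct in sorted(shares.items(), key=lambda kv: -kv[1]):
    print(f"{rice_type:<20}{pct:5.1f}%")
```

The counts add up to the stated grand total of 131, with temperate japonica making up a little over half of the subset.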
Genalice ‘stepwise’ vs. Population calling
[Figure: timing comparison of the two routes on the 131-accession test set]
Stepwise (total time: 136:30h, 8,227,780 variants)
• Steps: Map per sample, Genotype per sample, Store variant positions, Genotype per sample, Adjust VCF per sample, Merge VCFs & post-process
• Step times shown in the figure: 9h, 1h, 0.5h, 3h, 7h, 116h
Pop call (total time: 4:05h, 8,137,366 variants)
• Map per sample: 3h
• Genotype per sample: 1h
• Pop call & post-process: 5m
Per-sample times shown: Map 1:22m / sample, Total 1:43m / sample
Overlap: 6,147,072 positions shared (75% / 76%)
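The two totals can be re-derived from the step times in the figure. The sketch below only redoes that arithmetic; the step durations are entered as plain values because the extracted figure does not make clear which stepwise step each duration belongs to.

```python
from datetime import timedelta


def fmt(td):
    """Format a timedelta as H:MM (hours may exceed 24)."""
    minutes = int(td.total_seconds() // 60)
    return f"{minutes // 60}:{minutes % 60:02d}"


# Step durations as shown in the figure.
stepwise_steps = [timedelta(hours=h) for h in (9, 1, 0.5, 3, 7, 116)]
popcall_steps = [timedelta(hours=3), timedelta(hours=1), timedelta(minutes=5)]

stepwise_total = sum(stepwise_steps, timedelta())
popcall_total = sum(popcall_steps, timedelta())

# Dividing two timedeltas gives a plain float ratio.
speedup = stepwise_total / popcall_total

print(fmt(stepwise_total), fmt(popcall_total), round(speedup, 1))
```

The sums reproduce the reported 136:30h and 4:05h, and their ratio comes out at a little over 33, in line with the claim of a more than 30-fold speedup.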
Genalice ‘stepwise’ vs. Population calling
• The Genalice population calling route is straightforward and does not require external tools
• Major performance increase from the stepwise to the population calling approach (factor 34)
• Overlap between the approaches in called positions is ~75%
• Further qualitative research required
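The ~75% overlap is the number of shared positions expressed relative to each call set. A minimal sketch, using the variant counts reported for the 131-accession test set (the function name `overlap_pct` is illustrative):

```python
def overlap_pct(n_shared, n_total):
    """Shared positions as a percentage of one call set."""
    return 100 * n_shared / n_total


shared = 6_147_072            # positions called by both routes
stepwise_variants = 8_227_780
popcall_variants = 8_137_366

pct_of_stepwise = overlap_pct(shared, stepwise_variants)
pct_of_popcall = overlap_pct(shared, popcall_variants)

print(f"{pct_of_stepwise:.1f}% of stepwise, {pct_of_popcall:.1f}% of pop call")
```

Rounded to whole percentages these give the 75% and 76% reported for the two routes; the remaining quarter of each call set is unique to that route, which is why further qualitative research is listed as required.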
BWA/GATK vs. Genalice Population calling
Analysis of 3,000 rice accessions
BWA/GATK
• Results from https://aws.amazon.com/public-data-sets/3000-rice-genome
• No details on compute available
Pop call (stages: map per sample, genotype per sample, pop call & post-process)
• gaMap: 75h
• gaPopulation add: 25h
• gaPopulation merge: 1.5h
• gaPopulation commit: 0.5h
• gaPopulation extract: 1h
• Map: 1:30m / sample
• Total: 2:04m / sample
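The per-sample figures follow from dividing the stage times by the 3,000 accessions. The sketch below redoes that arithmetic; the stage names are used only as dictionary labels and nothing here invokes the Genalice tools.

```python
SAMPLES = 3000

# Wall-clock hours per stage, as reported on the slide.
stage_hours = {
    "gaMap": 75,
    "gaPopulation add": 25,
    "gaPopulation merge": 1.5,
    "gaPopulation commit": 0.5,
    "gaPopulation extract": 1,
}


def per_sample(hours, n=SAMPLES):
    """Average time per sample as M:SS, rounded to the nearest second."""
    seconds = round(hours * 3600 / n)
    return f"{seconds // 60}:{seconds % 60:02d}"


total_hours = sum(stage_hours.values())
map_per_sample = per_sample(stage_hours["gaMap"])
total_per_sample = per_sample(total_hours)

print(total_hours, map_per_sample, total_per_sample)
```

The stages sum to 103 hours, which matches the reported 1:30m per sample for mapping and 2:04m per sample overall.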
BWA/GATK vs. Genalice Population calling (2)
• Format of the VCF on Amazon was different from Genalice MAP
o --output_mode EMIT_ALL_SITES
o One gVCF file per accession (~2 GB each)
o Calls for all positions of the reference
o Multiple calls for the same genomic position
• Datasets were cleaned and merged
• Variant count for the 131 samples: 14,838,819 (BWA/GATK) vs 8,137,366 (Genalice)
• Minimum allele depth: 2 (BWA/GATK) vs 5 (Genalice)
• 6,031,002 positions shared (74% of the Genalice calls), so 26% are novel
• 83.5% of shared positions have identical genotypes
• Further analysis required
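A comparison of this kind can be sketched in a few lines. The code below is an illustration under assumptions, not the procedure used for the slides: records are hypothetical (chrom, pos, genotype, depth) tuples, duplicate calls at one position are resolved by keeping the deepest one, and the names `clean` and `compare` are invented for the example.

```python
def clean(records, min_depth):
    """Drop calls below min_depth and keep one call per position.

    Assumption: when a position has several calls, the deepest one wins.
    """
    best = {}
    for chrom, pos, genotype, depth in records:
        if depth < min_depth:
            continue
        site = (chrom, pos)
        if site not in best or depth > best[site][1]:
            best[site] = (genotype, depth)
    return {site: genotype for site, (genotype, _depth) in best.items()}


def compare(calls_a, calls_b):
    """Count shared positions and how many of them carry the same genotype."""
    shared = calls_a.keys() & calls_b.keys()
    identical = sum(1 for site in shared if calls_a[site] == calls_b[site])
    return {
        "shared": len(shared),
        "only_a": len(calls_a) - len(shared),
        "only_b": len(calls_b) - len(shared),
        "identical": identical,
    }


# Toy records, not real data.
set_a = [("chr1", 10, "0/1", 8), ("chr1", 10, "1/1", 3),
         ("chr1", 20, "1/1", 6), ("chr1", 30, "0/1", 1)]
set_b = [("chr1", 10, "0/1", 9), ("chr1", 20, "0/1", 7),
         ("chr1", 40, "1/1", 12)]

result = compare(clean(set_a, min_depth=2), clean(set_b, min_depth=5))

# Same ratio for the reported counts: shared positions over Genalice calls.
shared_pct = 100 * 6_031_002 / 8_137_366
```

With the reported counts, `shared_pct` rounds to 74, leaving the 26% of Genalice positions that the slide labels as novel.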
Conclusions
• The new Population Calling module is extremely fast
• > 30 times faster than the stepwise approach
• Analysis time scales linearly with the number of samples
• Differences in content should be further analyzed
• The module is highly flexible
o Incremental addition of samples
• Extracting variants from the GVM is very efficient
• The VCF output conforms to the standard but lacks details required in follow-up research (depth / quality)
• This feature is requested and will be added soon
Thank you
Cueleneare
Tim
Koen Nijbroek
Hans Karten
Rudie Antonise
Bas Tolhuis

More Related Content

What's hot

Bioo Scientific - Improving the Performance of SureSelectXT2 Target Capture
Bioo Scientific - Improving the Performance of SureSelectXT2 Target CaptureBioo Scientific - Improving the Performance of SureSelectXT2 Target Capture
Bioo Scientific - Improving the Performance of SureSelectXT2 Target CaptureBioo Scientific
 
A field-deployable automatic nucleic acid extraction and insulated isothermal...
A field-deployable automatic nucleic acid extraction and insulated isothermal...A field-deployable automatic nucleic acid extraction and insulated isothermal...
A field-deployable automatic nucleic acid extraction and insulated isothermal...Simon Chung - genereach
 
Evaluation of an easy and rapid detection of avian leukosis virus subgroup j ...
Evaluation of an easy and rapid detection of avian leukosis virus subgroup j ...Evaluation of an easy and rapid detection of avian leukosis virus subgroup j ...
Evaluation of an easy and rapid detection of avian leukosis virus subgroup j ...Simon Chung - genereach
 
Gene disc® rapid microbiology system
Gene disc® rapid microbiology systemGene disc® rapid microbiology system
Gene disc® rapid microbiology systemdanisandominguez
 
Genetic engineering
Genetic  engineeringGenetic  engineering
Genetic engineeringArunima Sur
 
A Fully Automated POCKIT Central PCR System for Evaluation of the Infectious ...
A Fully Automated POCKIT Central PCR System for Evaluation of the Infectious ...A Fully Automated POCKIT Central PCR System for Evaluation of the Infectious ...
A Fully Automated POCKIT Central PCR System for Evaluation of the Infectious ...Simon Chung - genereach
 
Creating Reference-Grade Human Genome Assemblies
Creating Reference-Grade Human Genome AssembliesCreating Reference-Grade Human Genome Assemblies
Creating Reference-Grade Human Genome AssembliesGenome Reference Consortium
 
RT2 Profiler PCR Arrays: Pathway-focused Gene Expression Profiling with qRT-P...
RT2 Profiler PCR Arrays: Pathway-focused Gene Expression Profiling with qRT-P...RT2 Profiler PCR Arrays: Pathway-focused Gene Expression Profiling with qRT-P...
RT2 Profiler PCR Arrays: Pathway-focused Gene Expression Profiling with qRT-P...QIAGEN
 
Custom Enrichment Panels for Targeted Next Generation Sequencing
Custom Enrichment Panels for Targeted Next Generation SequencingCustom Enrichment Panels for Targeted Next Generation Sequencing
Custom Enrichment Panels for Targeted Next Generation SequencingIntegrated DNA Technologies
 
Developing Reliable QC at the Swedish National Genomics Infrastructure
Developing Reliable QC at the Swedish National Genomics InfrastructureDeveloping Reliable QC at the Swedish National Genomics Infrastructure
Developing Reliable QC at the Swedish National Genomics InfrastructurePhil Ewels
 
Creating Reference-Grade Human Genome Assemblies
Creating Reference-Grade Human Genome AssembliesCreating Reference-Grade Human Genome Assemblies
Creating Reference-Grade Human Genome AssembliesGenome Reference Consortium
 
A field-deployable RT-PCR system performs equivalently to real-time RT-PCR in...
A field-deployable RT-PCR system performs equivalently to real-time RT-PCR in...A field-deployable RT-PCR system performs equivalently to real-time RT-PCR in...
A field-deployable RT-PCR system performs equivalently to real-time RT-PCR in...Simon Chung - genereach
 

What's hot (14)

Bioo Scientific - Improving the Performance of SureSelectXT2 Target Capture
Bioo Scientific - Improving the Performance of SureSelectXT2 Target CaptureBioo Scientific - Improving the Performance of SureSelectXT2 Target Capture
Bioo Scientific - Improving the Performance of SureSelectXT2 Target Capture
 
A field-deployable automatic nucleic acid extraction and insulated isothermal...
A field-deployable automatic nucleic acid extraction and insulated isothermal...A field-deployable automatic nucleic acid extraction and insulated isothermal...
A field-deployable automatic nucleic acid extraction and insulated isothermal...
 
PoemTapp16
PoemTapp16PoemTapp16
PoemTapp16
 
Evaluation of an easy and rapid detection of avian leukosis virus subgroup j ...
Evaluation of an easy and rapid detection of avian leukosis virus subgroup j ...Evaluation of an easy and rapid detection of avian leukosis virus subgroup j ...
Evaluation of an easy and rapid detection of avian leukosis virus subgroup j ...
 
Gene disc® rapid microbiology system
Gene disc® rapid microbiology systemGene disc® rapid microbiology system
Gene disc® rapid microbiology system
 
Getting the most from the reference assembly
Getting the most from the reference assemblyGetting the most from the reference assembly
Getting the most from the reference assembly
 
Genetic engineering
Genetic  engineeringGenetic  engineering
Genetic engineering
 
A Fully Automated POCKIT Central PCR System for Evaluation of the Infectious ...
A Fully Automated POCKIT Central PCR System for Evaluation of the Infectious ...A Fully Automated POCKIT Central PCR System for Evaluation of the Infectious ...
A Fully Automated POCKIT Central PCR System for Evaluation of the Infectious ...
 
Creating Reference-Grade Human Genome Assemblies
Creating Reference-Grade Human Genome AssembliesCreating Reference-Grade Human Genome Assemblies
Creating Reference-Grade Human Genome Assemblies
 
RT2 Profiler PCR Arrays: Pathway-focused Gene Expression Profiling with qRT-P...
RT2 Profiler PCR Arrays: Pathway-focused Gene Expression Profiling with qRT-P...RT2 Profiler PCR Arrays: Pathway-focused Gene Expression Profiling with qRT-P...
RT2 Profiler PCR Arrays: Pathway-focused Gene Expression Profiling with qRT-P...
 
Custom Enrichment Panels for Targeted Next Generation Sequencing
Custom Enrichment Panels for Targeted Next Generation SequencingCustom Enrichment Panels for Targeted Next Generation Sequencing
Custom Enrichment Panels for Targeted Next Generation Sequencing
 
Developing Reliable QC at the Swedish National Genomics Infrastructure
Developing Reliable QC at the Swedish National Genomics InfrastructureDeveloping Reliable QC at the Swedish National Genomics Infrastructure
Developing Reliable QC at the Swedish National Genomics Infrastructure
 
Creating Reference-Grade Human Genome Assemblies
Creating Reference-Grade Human Genome AssembliesCreating Reference-Grade Human Genome Assemblies
Creating Reference-Grade Human Genome Assemblies
 
A field-deployable RT-PCR system performs equivalently to real-time RT-PCR in...
A field-deployable RT-PCR system performs equivalently to real-time RT-PCR in...A field-deployable RT-PCR system performs equivalently to real-time RT-PCR in...
A field-deployable RT-PCR system performs equivalently to real-time RT-PCR in...
 

Similar to Novel mutation detection in large sample pools using Population Calling

Research Program Genetic Gains (RPGG) Review Meeting 2021: Forward Breeding: ...
Research Program Genetic Gains (RPGG) Review Meeting 2021: Forward Breeding: ...Research Program Genetic Gains (RPGG) Review Meeting 2021: Forward Breeding: ...
Research Program Genetic Gains (RPGG) Review Meeting 2021: Forward Breeding: ...ICRISAT
 
3b. Biotechnolgies & Genomics - Jane Theaker
3b. Biotechnolgies & Genomics - Jane Theaker3b. Biotechnolgies & Genomics - Jane Theaker
3b. Biotechnolgies & Genomics - Jane TheakerIventus
 
Genomic and enabling technologies in maize breeding for enhanced genetic gain...
Genomic and enabling technologies in maize breeding for enhanced genetic gain...Genomic and enabling technologies in maize breeding for enhanced genetic gain...
Genomic and enabling technologies in maize breeding for enhanced genetic gain...CIMMYT
 
Spark Summit EU talk by Erwin Datema and Roeland van Ham
Spark Summit EU talk by Erwin Datema and Roeland van HamSpark Summit EU talk by Erwin Datema and Roeland van Ham
Spark Summit EU talk by Erwin Datema and Roeland van HamSpark Summit
 
VariantSpark: applying Spark-based machine learning methods to genomic inform...
VariantSpark: applying Spark-based machine learning methods to genomic inform...VariantSpark: applying Spark-based machine learning methods to genomic inform...
VariantSpark: applying Spark-based machine learning methods to genomic inform...Denis C. Bauer
 
Metadata for the Global Evaluation Trial Repository of CCAFS and interoperabi...
Metadata for the Global Evaluation Trial Repository of CCAFS and interoperabi...Metadata for the Global Evaluation Trial Repository of CCAFS and interoperabi...
Metadata for the Global Evaluation Trial Repository of CCAFS and interoperabi...Decision and Policy Analysis Program
 
2013 GRM: Improve chickpea productivity for marginal environments in sub-Sah...
2013 GRM: Improve chickpea productivity for marginal environments in  sub-Sah...2013 GRM: Improve chickpea productivity for marginal environments in  sub-Sah...
2013 GRM: Improve chickpea productivity for marginal environments in sub-Sah...CGIAR Generation Challenge Programme
 
ICRISAT Global Planning Meeting 2019:Research Program - Genetic Gains by Dr R...
ICRISAT Global Planning Meeting 2019:Research Program - Genetic Gains by Dr R...ICRISAT Global Planning Meeting 2019:Research Program - Genetic Gains by Dr R...
ICRISAT Global Planning Meeting 2019:Research Program - Genetic Gains by Dr R...ICRISAT
 
Expanding Your Research Capabilities Using Targeted NGS
Expanding Your Research Capabilities Using Targeted NGSExpanding Your Research Capabilities Using Targeted NGS
Expanding Your Research Capabilities Using Targeted NGSIntegrated DNA Technologies
 
PAG 2015 - Accessing genotyping services for crop improvement in developing c...
PAG 2015 - Accessing genotyping services for crop improvement in developing c...PAG 2015 - Accessing genotyping services for crop improvement in developing c...
PAG 2015 - Accessing genotyping services for crop improvement in developing c...Integrated Breeding Platform
 
Research Program Genetic Gains (RPGG) Review Meeting 2021: Groundnut genomic ...
Research Program Genetic Gains (RPGG) Review Meeting 2021: Groundnut genomic ...Research Program Genetic Gains (RPGG) Review Meeting 2021: Groundnut genomic ...
Research Program Genetic Gains (RPGG) Review Meeting 2021: Groundnut genomic ...ICRISAT
 
Apac distributor training series 3 swift product for cancer study
Apac distributor training series 3  swift product for cancer studyApac distributor training series 3  swift product for cancer study
Apac distributor training series 3 swift product for cancer studySwift Biosciences
 
A Step to the Clouded Solution of Scalable Clinical Genome Sequencing (BDT308...
A Step to the Clouded Solution of Scalable Clinical Genome Sequencing (BDT308...A Step to the Clouded Solution of Scalable Clinical Genome Sequencing (BDT308...
A Step to the Clouded Solution of Scalable Clinical Genome Sequencing (BDT308...Amazon Web Services
 
VarSeq 2.6.0: Advancing Pharmacogenomics and Genomic Analysis
VarSeq 2.6.0: Advancing Pharmacogenomics and Genomic AnalysisVarSeq 2.6.0: Advancing Pharmacogenomics and Genomic Analysis
VarSeq 2.6.0: Advancing Pharmacogenomics and Genomic AnalysisGolden Helix
 
Cassavabase general presentation PAG 2016
Cassavabase general presentation PAG 2016Cassavabase general presentation PAG 2016
Cassavabase general presentation PAG 2016solgenomics
 
Genome in a bottle april 30 2015 hvp Leiden
Genome in a bottle april 30 2015 hvp LeidenGenome in a bottle april 30 2015 hvp Leiden
Genome in a bottle april 30 2015 hvp LeidenGenomeInABottle
 

Similar to Novel mutation detection in large sample pools using Population Calling (20)

Research Program Genetic Gains (RPGG) Review Meeting 2021: Forward Breeding: ...
Research Program Genetic Gains (RPGG) Review Meeting 2021: Forward Breeding: ...Research Program Genetic Gains (RPGG) Review Meeting 2021: Forward Breeding: ...
Research Program Genetic Gains (RPGG) Review Meeting 2021: Forward Breeding: ...
 
3b. Biotechnolgies & Genomics - Jane Theaker
3b. Biotechnolgies & Genomics - Jane Theaker3b. Biotechnolgies & Genomics - Jane Theaker
3b. Biotechnolgies & Genomics - Jane Theaker
 
Genomic and enabling technologies in maize breeding for enhanced genetic gain...
Genomic and enabling technologies in maize breeding for enhanced genetic gain...Genomic and enabling technologies in maize breeding for enhanced genetic gain...
Genomic and enabling technologies in maize breeding for enhanced genetic gain...
 
Spark Summit EU talk by Erwin Datema and Roeland van Ham
Spark Summit EU talk by Erwin Datema and Roeland van HamSpark Summit EU talk by Erwin Datema and Roeland van Ham
Spark Summit EU talk by Erwin Datema and Roeland van Ham
 
VariantSpark: applying Spark-based machine learning methods to genomic inform...
VariantSpark: applying Spark-based machine learning methods to genomic inform...VariantSpark: applying Spark-based machine learning methods to genomic inform...
VariantSpark: applying Spark-based machine learning methods to genomic inform...
 
Metadata for the Global Evaluation Trial Repository of CCAFS and interoperabi...
Metadata for the Global Evaluation Trial Repository of CCAFS and interoperabi...Metadata for the Global Evaluation Trial Repository of CCAFS and interoperabi...
Metadata for the Global Evaluation Trial Repository of CCAFS and interoperabi...
 
2013 GRM: Improve chickpea productivity for marginal environments in sub-Sah...
2013 GRM: Improve chickpea productivity for marginal environments in  sub-Sah...2013 GRM: Improve chickpea productivity for marginal environments in  sub-Sah...
2013 GRM: Improve chickpea productivity for marginal environments in sub-Sah...
 
ICRISAT Global Planning Meeting 2019:Research Program - Genetic Gains by Dr R...
ICRISAT Global Planning Meeting 2019:Research Program - Genetic Gains by Dr R...ICRISAT Global Planning Meeting 2019:Research Program - Genetic Gains by Dr R...
ICRISAT Global Planning Meeting 2019:Research Program - Genetic Gains by Dr R...
 
Expanding Your Research Capabilities Using Targeted NGS
Expanding Your Research Capabilities Using Targeted NGSExpanding Your Research Capabilities Using Targeted NGS
Expanding Your Research Capabilities Using Targeted NGS
 
PAG 2015 - Accessing genotyping services for crop improvement in developing c...
PAG 2015 - Accessing genotyping services for crop improvement in developing c...PAG 2015 - Accessing genotyping services for crop improvement in developing c...
PAG 2015 - Accessing genotyping services for crop improvement in developing c...
 
Research Program Genetic Gains (RPGG) Review Meeting 2021: Groundnut genomic ...
Research Program Genetic Gains (RPGG) Review Meeting 2021: Groundnut genomic ...Research Program Genetic Gains (RPGG) Review Meeting 2021: Groundnut genomic ...
Research Program Genetic Gains (RPGG) Review Meeting 2021: Groundnut genomic ...
 
GRM 2011: The Integrated Breeding Platform tools and services
GRM 2011: The Integrated Breeding Platform tools and servicesGRM 2011: The Integrated Breeding Platform tools and services
GRM 2011: The Integrated Breeding Platform tools and services
 
Apac distributor training series 3 swift product for cancer study
Apac distributor training series 3  swift product for cancer studyApac distributor training series 3  swift product for cancer study
Apac distributor training series 3 swift product for cancer study
 
A Step to the Clouded Solution of Scalable Clinical Genome Sequencing (BDT308...
A Step to the Clouded Solution of Scalable Clinical Genome Sequencing (BDT308...A Step to the Clouded Solution of Scalable Clinical Genome Sequencing (BDT308...
A Step to the Clouded Solution of Scalable Clinical Genome Sequencing (BDT308...
 
VarSeq 2.6.0: Advancing Pharmacogenomics and Genomic Analysis
VarSeq 2.6.0: Advancing Pharmacogenomics and Genomic AnalysisVarSeq 2.6.0: Advancing Pharmacogenomics and Genomic Analysis
VarSeq 2.6.0: Advancing Pharmacogenomics and Genomic Analysis
 
2023 GIAB AMP Update
2023 GIAB AMP Update2023 GIAB AMP Update
2023 GIAB AMP Update
 
GRM 2013: Integrated Breeding Workflow System update -- M Sawkins
GRM 2013: Integrated Breeding Workflow System update -- M SawkinsGRM 2013: Integrated Breeding Workflow System update -- M Sawkins
GRM 2013: Integrated Breeding Workflow System update -- M Sawkins
 
Cassavabase general presentation PAG 2016
Cassavabase general presentation PAG 2016Cassavabase general presentation PAG 2016
Cassavabase general presentation PAG 2016
 
Genome in a bottle april 30 2015 hvp Leiden
Genome in a bottle april 30 2015 hvp LeidenGenome in a bottle april 30 2015 hvp Leiden
Genome in a bottle april 30 2015 hvp Leiden
 
HUG @ NGCLE@e-Novia 15.11.2017
HUG @ NGCLE@e-Novia 15.11.2017HUG @ NGCLE@e-Novia 15.11.2017
HUG @ NGCLE@e-Novia 15.11.2017
 

Recently uploaded

Nightside clouds and disequilibrium chemistry on the hot Jupiter WASP-43b
Nightside clouds and disequilibrium chemistry on the hot Jupiter WASP-43bNightside clouds and disequilibrium chemistry on the hot Jupiter WASP-43b
Nightside clouds and disequilibrium chemistry on the hot Jupiter WASP-43bSérgio Sacani
 
Spermiogenesis or Spermateleosis or metamorphosis of spermatid
Spermiogenesis or Spermateleosis or metamorphosis of spermatidSpermiogenesis or Spermateleosis or metamorphosis of spermatid
Spermiogenesis or Spermateleosis or metamorphosis of spermatidSarthak Sekhar Mondal
 
Analytical Profile of Coleus Forskohlii | Forskolin .pptx
Analytical Profile of Coleus Forskohlii | Forskolin .pptxAnalytical Profile of Coleus Forskohlii | Forskolin .pptx
Analytical Profile of Coleus Forskohlii | Forskolin .pptxSwapnil Therkar
 
STERILITY TESTING OF PHARMACEUTICALS ppt by DR.C.P.PRINCE
STERILITY TESTING OF PHARMACEUTICALS ppt by DR.C.P.PRINCESTERILITY TESTING OF PHARMACEUTICALS ppt by DR.C.P.PRINCE
STERILITY TESTING OF PHARMACEUTICALS ppt by DR.C.P.PRINCEPRINCE C P
 
Is RISC-V ready for HPC workload? Maybe?
Is RISC-V ready for HPC workload? Maybe?Is RISC-V ready for HPC workload? Maybe?
Is RISC-V ready for HPC workload? Maybe?Patrick Diehl
 
SOLUBLE PATTERN RECOGNITION RECEPTORS.pptx
SOLUBLE PATTERN RECOGNITION RECEPTORS.pptxSOLUBLE PATTERN RECOGNITION RECEPTORS.pptx
SOLUBLE PATTERN RECOGNITION RECEPTORS.pptxkessiyaTpeter
 
Hubble Asteroid Hunter III. Physical properties of newly found asteroids
Hubble Asteroid Hunter III. Physical properties of newly found asteroidsHubble Asteroid Hunter III. Physical properties of newly found asteroids
Hubble Asteroid Hunter III. Physical properties of newly found asteroidsSérgio Sacani
 
Bentham & Hooker's Classification. along with the merits and demerits of the ...
Bentham & Hooker's Classification. along with the merits and demerits of the ...Bentham & Hooker's Classification. along with the merits and demerits of the ...
Bentham & Hooker's Classification. along with the merits and demerits of the ...Nistarini College, Purulia (W.B) India
 
Discovery of an Accretion Streamer and a Slow Wide-angle Outflow around FUOri...
Discovery of an Accretion Streamer and a Slow Wide-angle Outflow around FUOri...Discovery of an Accretion Streamer and a Slow Wide-angle Outflow around FUOri...
Discovery of an Accretion Streamer and a Slow Wide-angle Outflow around FUOri...Sérgio Sacani
 
Presentation Vikram Lander by Vedansh Gupta.pptx
Presentation Vikram Lander by Vedansh Gupta.pptxPresentation Vikram Lander by Vedansh Gupta.pptx
Presentation Vikram Lander by Vedansh Gupta.pptxgindu3009
 
A relative description on Sonoporation.pdf
A relative description on Sonoporation.pdfA relative description on Sonoporation.pdf
A relative description on Sonoporation.pdfnehabiju2046
 
Orientation, design and principles of polyhouse
Orientation, design and principles of polyhouseOrientation, design and principles of polyhouse
Orientation, design and principles of polyhousejana861314
 
Natural Polymer Based Nanomaterials
Natural Polymer Based NanomaterialsNatural Polymer Based Nanomaterials
Natural Polymer Based NanomaterialsAArockiyaNisha
 
Labelling Requirements and Label Claims for Dietary Supplements and Recommend...
Labelling Requirements and Label Claims for Dietary Supplements and Recommend...Labelling Requirements and Label Claims for Dietary Supplements and Recommend...
Labelling Requirements and Label Claims for Dietary Supplements and Recommend...Lokesh Kothari
 
Recombination DNA Technology (Nucleic Acid Hybridization )
Recombination DNA Technology (Nucleic Acid Hybridization )Recombination DNA Technology (Nucleic Acid Hybridization )
Recombination DNA Technology (Nucleic Acid Hybridization )aarthirajkumar25
 
Call Us ≽ 9953322196 ≼ Call Girls In Mukherjee Nagar(Delhi) |
Call Us ≽ 9953322196 ≼ Call Girls In Mukherjee Nagar(Delhi) |Call Us ≽ 9953322196 ≼ Call Girls In Mukherjee Nagar(Delhi) |
Call Us ≽ 9953322196 ≼ Call Girls In Mukherjee Nagar(Delhi) |aasikanpl
 
Raman spectroscopy.pptx M Pharm, M Sc, Advanced Spectral Analysis
Raman spectroscopy.pptx M Pharm, M Sc, Advanced Spectral AnalysisRaman spectroscopy.pptx M Pharm, M Sc, Advanced Spectral Analysis
Raman spectroscopy.pptx M Pharm, M Sc, Advanced Spectral AnalysisDiwakar Mishra
 

Recently uploaded (20)

Engler and Prantl system of classification in plant taxonomy
Engler and Prantl system of classification in plant taxonomyEngler and Prantl system of classification in plant taxonomy
Engler and Prantl system of classification in plant taxonomy
 
Nightside clouds and disequilibrium chemistry on the hot Jupiter WASP-43b
Nightside clouds and disequilibrium chemistry on the hot Jupiter WASP-43bNightside clouds and disequilibrium chemistry on the hot Jupiter WASP-43b
Nightside clouds and disequilibrium chemistry on the hot Jupiter WASP-43b
 
Spermiogenesis or Spermateleosis or metamorphosis of spermatid
Spermiogenesis or Spermateleosis or metamorphosis of spermatidSpermiogenesis or Spermateleosis or metamorphosis of spermatid
Spermiogenesis or Spermateleosis or metamorphosis of spermatid
 
Analytical Profile of Coleus Forskohlii | Forskolin .pptx
Analytical Profile of Coleus Forskohlii | Forskolin .pptxAnalytical Profile of Coleus Forskohlii | Forskolin .pptx
Analytical Profile of Coleus Forskohlii | Forskolin .pptx
 
STERILITY TESTING OF PHARMACEUTICALS ppt by DR.C.P.PRINCE
STERILITY TESTING OF PHARMACEUTICALS ppt by DR.C.P.PRINCESTERILITY TESTING OF PHARMACEUTICALS ppt by DR.C.P.PRINCE
STERILITY TESTING OF PHARMACEUTICALS ppt by DR.C.P.PRINCE
 
9953056974 Young Call Girls In Mahavir enclave Indian Quality Escort service
9953056974 Young Call Girls In Mahavir enclave Indian Quality Escort service9953056974 Young Call Girls In Mahavir enclave Indian Quality Escort service
9953056974 Young Call Girls In Mahavir enclave Indian Quality Escort service
 
Is RISC-V ready for HPC workload? Maybe?
Is RISC-V ready for HPC workload? Maybe?Is RISC-V ready for HPC workload? Maybe?
Is RISC-V ready for HPC workload? Maybe?
 
SOLUBLE PATTERN RECOGNITION RECEPTORS.pptx
SOLUBLE PATTERN RECOGNITION RECEPTORS.pptxSOLUBLE PATTERN RECOGNITION RECEPTORS.pptx
SOLUBLE PATTERN RECOGNITION RECEPTORS.pptx
 
Hubble Asteroid Hunter III. Physical properties of newly found asteroids
Hubble Asteroid Hunter III. Physical properties of newly found asteroidsHubble Asteroid Hunter III. Physical properties of newly found asteroids
Hubble Asteroid Hunter III. Physical properties of newly found asteroids
 
Bentham & Hooker's Classification. along with the merits and demerits of the ...
Bentham & Hooker's Classification. along with the merits and demerits of the ...Bentham & Hooker's Classification. along with the merits and demerits of the ...
Bentham & Hooker's Classification. along with the merits and demerits of the ...
 
Discovery of an Accretion Streamer and a Slow Wide-angle Outflow around FUOri...
Discovery of an Accretion Streamer and a Slow Wide-angle Outflow around FUOri...Discovery of an Accretion Streamer and a Slow Wide-angle Outflow around FUOri...
Discovery of an Accretion Streamer and a Slow Wide-angle Outflow around FUOri...
 
Presentation Vikram Lander by Vedansh Gupta.pptx
Presentation Vikram Lander by Vedansh Gupta.pptxPresentation Vikram Lander by Vedansh Gupta.pptx
Presentation Vikram Lander by Vedansh Gupta.pptx
 
A relative description on Sonoporation.pdf
A relative description on Sonoporation.pdfA relative description on Sonoporation.pdf
A relative description on Sonoporation.pdf
 
Orientation, design and principles of polyhouse
Orientation, design and principles of polyhouseOrientation, design and principles of polyhouse
Orientation, design and principles of polyhouse
 
Natural Polymer Based Nanomaterials
Natural Polymer Based NanomaterialsNatural Polymer Based Nanomaterials
Natural Polymer Based Nanomaterials
 
Labelling Requirements and Label Claims for Dietary Supplements and Recommend...
Labelling Requirements and Label Claims for Dietary Supplements and Recommend...Labelling Requirements and Label Claims for Dietary Supplements and Recommend...
Labelling Requirements and Label Claims for Dietary Supplements and Recommend...
 
Recombination DNA Technology (Nucleic Acid Hybridization )
Recombination DNA Technology (Nucleic Acid Hybridization )Recombination DNA Technology (Nucleic Acid Hybridization )
Recombination DNA Technology (Nucleic Acid Hybridization )
 
Call Us ≽ 9953322196 ≼ Call Girls In Mukherjee Nagar(Delhi) |
Call Us ≽ 9953322196 ≼ Call Girls In Mukherjee Nagar(Delhi) |Call Us ≽ 9953322196 ≼ Call Girls In Mukherjee Nagar(Delhi) |
Call Us ≽ 9953322196 ≼ Call Girls In Mukherjee Nagar(Delhi) |
 
The Philosophy of Science
The Philosophy of ScienceThe Philosophy of Science
The Philosophy of Science
 
Raman spectroscopy.pptx M Pharm, M Sc, Advanced Spectral Analysis
Raman spectroscopy.pptx M Pharm, M Sc, Advanced Spectral AnalysisRaman spectroscopy.pptx M Pharm, M Sc, Advanced Spectral Analysis
Raman spectroscopy.pptx M Pharm, M Sc, Advanced Spectral Analysis
 

Novel mutation detection in large sample pools using Population Calling

  • 1. Population Calling a powerful tool for novel mutation detection in larger sample pools January 12, 2016 Antoine Janssen antoine.janssen@keygene.com June 2015/ Antoine JanssenThe crop innovation company
  • 2. The crop innovation company 2 Overview • Genotype calling strategies • Infrastructure setup • Show case 3,000 rice genomes • Genalice stepwise vs Population calling • BWA/GATK joint genotyping vs Population calling • Conclusions
  • 3. The crop innovation company Overview Genotype calling strategies 3 • Single sample calling • Batch calling • Joint calling Pros • Distinction between homozygous reference vs. missing data • Greater sensitivity for low-frequency variants • Greater ability to filter out false positives Cons • Scaling and infrastructure • Incremental analysis
  • 4. Genotype calling strategies: Single sample calling
  • 5. Genotype calling strategies: The classic route, BWA/GATK. Stages: preprocessing, read mapping, genotyping, postprocessing. Tools: • python script • sickle • BWA • samtools • Picard MarkDuplicates • GATK IndelRealigner • GATK UnifiedGenotyper • several python scripts
  • 6. Genotype calling strategies: Stepwise approach (1) • Map per sample: mapping of all samples against the reference • Genotype per sample: genotyping of all samples • Store variant positions: create a BED file of all variation over all samples
  • 7. Genotype calling strategies: Stepwise approach (2) • Genotype per sample: genotyping of all samples with a BED file, forcing calls on all positions • Adjust VCF per sample: adjust the VCF files so that all samples have an equal number of positions, forcing unknown (./.) • Merge VCFs & post-process: merge VCFs and post-process all samples
  • 8. Genotype calling strategies: Genalice Population Calling • Map per sample: map all samples against the reference • Genotype per sample: genotype all samples • Pop call & post-process: population calling and post-processing of all samples
  • 9. Genotype calling strategies: Genalice Population Calling, process (diagram). Labels in the diagram: fastq (×4), gaMap, GAR (×4), gaPopulation add, gaPopulation merge, gaPopulation commit, gaPopulation extract, XML, GVM, VCF
  • 10. Infrastructure setup • 3,000 rice accessions, ~17 TB in fastq.gz format • Streaming fastq.gz over NFS • Desktop client (VM), SSH connection • Illumina HiSeq2500 • Intel Xeon 2620 (12 cores, 2 GHz), 96 GB RAM
  • 11. Show case: 3,000 rice genome project • Rice (Oryza sativa L.) • 3,000 rice accessions • 89 countries • Average sequencing depth of 14X • Mapped to Nipponbare (IRGSP-1.0): 374 Mbp, 12 chromosomes • BWA and GATK (DNAnexus) • Data on GigaScience, EBI, Amazon
  • 12. Show case: Test set 1, 131 rice accessions • Subset of 131 accessions (out of 3,000) selected • All major rice types are represented • Mapped to Nipponbare (IRGSP-1.0) • Accessions per type: Aus/boro 5; Basmati/sadri 1; Indica 19; Intermediate type 7; Japonica 11; Temperate japonica 71; Tropical japonica 17; grand total 131
  • 13. Show case: Genalice ‘stepwise’ vs. Population calling • Stepwise: Map per sample, Genotype per sample, Store variant positions, Genotype per sample, Adjust VCF per sample, Merge VCFs & post-process; step times as listed on the slide: 9h, 1h, 0.5h, 3h, 7h, 116h; total time 136:30h; 8,227,780 variants • Pop call: Map per sample 3h, Genotype per sample 1h, Pop call & post-process 5m; total time 4:05h; 8,137,366 variants • 6,147,072 positions shared (75% / 76%) • Map: 1:22m / sample; total: 1:43m / sample
  • 14. Show case: Genalice ‘stepwise’ vs. Population calling • The Genalice population calling route is straightforward and does not require external tools • Major performance increase from the stepwise to the population calling approach (factor 34) • The two approaches overlap on ~75% of positions • Further qualitative research required
  • 15. Show case: BWA/GATK vs. Genalice Population calling, analysis of 3,000 rice accessions • BWA/GATK: results from https://aws.amazon.com/public-data-sets/3000-rice-genome; no details on compute available • Pop call (map per sample, genotype per sample, pop call & post-process): gaMap 75h, gaPopulation add 25h, gaPopulation merge 1.5h, gaPopulation commit 0.5h, gaPopulation extract 1h • Map: 1:30m / sample; total: 2:04m / sample
  • 16. Show case: BWA/GATK vs. Genalice Population calling (2) • The format of the VCF on Amazon differed from that of Genalice MAP: --output_mode EMIT_ALL_SITES; one gVCF file per accession (~2 GB each); a call for every position of the reference; multiple calls for the same genomic position • Datasets were cleaned and merged • Variant count for 131 samples: 14,838,819 vs. 8,137,366 • Minimum allele depth: 2 vs. 5 • 6,031,002 positions shared (74%), so 26% are novel • 83.5% of shared positions have identical genotypes • Further analysis required
  • 17. Conclusions • The new Population Calling module is extremely fast: more than 30 times faster than the stepwise approach, and analysis time scales linearly with the number of samples • Differences in content should be analyzed further • The module is highly flexible: samples can be added incrementally • Extracting variants from the GVM is very efficient • The VCF output conforms to the standard but lacks details required in follow-up research (depth / quality) • This feature has been requested and will be added soon
  • 18. Thank you: Cueleneare Koen Nijbroek Hans Karten Tim Rudie Antonise Bas Tolhuis
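
The overlap percentages and timings quoted on slides 13 to 16 can be re-derived with a few lines of arithmetic. The sketch below is not part of the original deck: the function and variable names are hypothetical, and it only assumes that each percentage is the shared position count divided by the size of one call set, which is the reading that reproduces the slide figures.

```python
"""Re-derive the percentages and timings quoted on the slides (illustrative sketch)."""


def shared_fraction(n_shared: int, n_total: int) -> float:
    """Fraction of one call set's positions that also occur in the other set."""
    return n_shared / n_total


def hours(h: int, m: int = 0) -> float:
    """Convert an h:mm duration to decimal hours."""
    return h + m / 60


# Test set 1 (131 accessions): counts copied from the slides.
stepwise_variants = 8_227_780
popcall_variants = 8_137_366
shared_stepwise_popcall = 6_147_072
shared_gatk_popcall = 6_031_002

# Shared positions relative to each call set (slide 13: "75%/76%").
pct_of_stepwise = 100 * shared_fraction(shared_stepwise_popcall, stepwise_variants)
pct_of_popcall = 100 * shared_fraction(shared_stepwise_popcall, popcall_variants)

# Shared positions relative to the population-calling set (slide 16: "74%").
pct_gatk_shared = 100 * shared_fraction(shared_gatk_popcall, popcall_variants)

# Speed-up of population calling over the stepwise route (136:30h vs. 4:05h).
speedup = hours(136, 30) / hours(4, 5)

# 3,000 accessions: per-step wall-clock hours from slide 15.
step_hours = {
    "gaMap": 75,
    "gaPopulation add": 25,
    "gaPopulation merge": 1.5,
    "gaPopulation commit": 0.5,
    "gaPopulation extract": 1,
}
total_hours = sum(step_hours.values())
minutes_per_sample = total_hours * 60 / 3000

print(f"shared vs stepwise: {pct_of_stepwise:.1f}%")
print(f"shared vs pop call: {pct_of_popcall:.1f}%")
print(f"shared vs pop call (BWA/GATK comparison): {pct_gatk_shared:.1f}%")
print(f"speed-up: {speedup:.1f}x")
print(f"3,000 accessions: {total_hours:.1f} h, {minutes_per_sample:.2f} min/sample")
```

Under these assumptions the computed speed-up comes out at roughly 33, in line with the "more than 30 times faster" conclusion, and the per-sample time for the 3,000-accession run lands close to the 2:04m quoted on slide 15.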