Validating INDELs in the Genome In A Bottle
Reference Genome Standard
Cameron Locker, Sarah Imawalle, Natalie Bir, Vijay Nadella, Ben Kelly, Harkness Kuck, James Fitch, Peter White
The Biomedical Genomics Core & The Center for Microbial Pathogenesis, The Research Institute at Nationwide Children's Hospital, Columbus, OH
Methods
The National Institute of Standards and Technology (NIST) produced the GIAB reference standard from its
own analysis of the NA12878 genome. The variant callers FreeBayes and GATK were run on a
30X-coverage NA12878 sequencing data set using our in-house analysis pipeline, "Churchill". The three
variant lists were then compared to identify the variants common to all of them and those unique to
each data set (Figure 6).
Introduction
Goal: Validate the NIST Genome in a Bottle (GIAB) standard against two different
variant callers:
• FreeBayes (FB)
• Genome Analysis Toolkit (GATK)
Approach:
• Identify the variants unique to each call set
• Develop a high-throughput sequencing approach using the Illumina MiSeq
• Use the sequencing results to determine whether each variant exists
Figure 2: Sample variant in which all groups correctly identified a change but reported it
in different ways. FreeBayes describes the change as a block substitution; GATK as an SNP
plus an insertion and a deletion; and GIAB as four single nucleotide polymorphisms (SNPs).
Figure 1: Output of the Broad Institute's Integrative Genomics Viewer (IGV), which displays multiple variant
calls at the same location. All three call sets are shown; their differences are described in Figure 2 and Table 1.
Table 1: Further explanation of the differences in reporting among the groups, listing the
individual base changes. All groups correctly identified the change and its location but
differed in how they described it.
Figure 3: Workflow for the identification, isolation and laboratory preparation of variants. RTG Tools
was used to parse out the differences between the variant lists. Because it matches variants over a range of
positions rather than a single coordinate, it can filter out cases like the one in Figure 1, where the callers
describe the same variant in different ways. Primer3 was used to design primers, which were ordered for MiSeq
analysis. The target sequences were amplified by PCR and loaded into pools for sequencing on the MiSeq.
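The primer-design step can be pictured as generating one Primer3 input record per variant. This is a minimal sketch, not the workflow used here: it assumes Primer3's Boulder-IO input format (TAG=value lines, each record closed by a line containing only "="), Primer3's default 0-based SEQUENCE_TARGET coordinates, and a hypothetical product-size range meant to keep amplicons short enough for paired MiSeq reads to overlap. The function name, template and default values are illustrative only.

```python
def primer3_record(seq_id, template, target_start, target_length,
                   product_range=(150, 400)):
    """Build one Primer3 Boulder-IO record requesting primers that flank a variant.

    target_start is 0-based, matching Primer3's default PRIMER_FIRST_BASE_INDEX=0.
    product_range is a hypothetical choice, not a value taken from the poster.
    """
    if target_start < 0 or target_length <= 0 or target_start + target_length > len(template):
        raise ValueError("target must lie inside the template")
    lo, hi = product_range
    lines = [
        f"SEQUENCE_ID={seq_id}",
        f"SEQUENCE_TEMPLATE={template}",
        # Primer3 must place both primers outside this start,length region.
        f"SEQUENCE_TARGET={target_start},{target_length}",
        "PRIMER_TASK=generic",
        "PRIMER_PICK_LEFT_PRIMER=1",
        "PRIMER_PICK_INTERNAL_OLIGO=0",
        "PRIMER_PICK_RIGHT_PRIMER=1",
        f"PRIMER_PRODUCT_SIZE_RANGE={lo}-{hi}",
        "PRIMER_NUM_RETURN=1",
        "=",  # end-of-record marker
    ]
    return "\n".join(lines) + "\n"


# Hypothetical 300 bp template with a 14 bp variant region starting at offset 143.
# The text could be saved to a file and passed to Primer3, e.g. `primer3_core < primer3_input.txt`.
record = primer3_record("variant_001", "ACGT" * 75, 143, 14)
print(record)
```

Concatenating one record per selected variant gives a single batch file, which keeps primer design scriptable across the hundreds of variants sampled for validation.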
Figure 5: FASTQ analysis of the MiSeq data. FLASH
was used to merge the paired-end reads, and cutadapt
was used to remove the primer sequences from the read
ends. Finally, each sequence was compared with the expected
variant its amplicon was designed to capture, to determine
whether the variant was real.
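The read-processing steps in Figure 5 can be illustrated with a simplified stand-in. This sketch does not reproduce FLASH or cutadapt: the merge picks the end overlap with the lowest mismatch fraction (similar in spirit to FLASH's overlap scoring, but ignoring base qualities and reads that run past the end of the amplicon), and primer removal is exact-match only, whereas cutadapt tolerates errors. The thresholds `min_overlap` and `max_mismatch_frac` and the toy amplicon are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq):
    """Reverse-complement a DNA string (A/C/G/T/N)."""
    return seq.translate(COMPLEMENT)[::-1]


def merge_pair(fwd, rev, min_overlap=10, max_mismatch_frac=0.25):
    """Merge a read pair at the end overlap with the lowest mismatch fraction.

    Returns the merged sequence, or None if no acceptable overlap exists.
    Inside the overlap the forward-read bases are kept (no quality-based choice).
    """
    rev_rc = reverse_complement(rev)
    best_len, best_frac = None, None
    # Longest overlaps are tried first, so ties favour the longer overlap.
    for length in range(min(len(fwd), len(rev_rc)), max(min_overlap, 1) - 1, -1):
        mismatches = sum(a != b for a, b in zip(fwd[-length:], rev_rc[:length]))
        frac = mismatches / length
        if frac <= max_mismatch_frac and (best_frac is None or frac < best_frac):
            best_len, best_frac = length, frac
    if best_len is None:
        return None
    return fwd + rev_rc[best_len:]


def trim_primers(seq, fwd_primer, rev_primer):
    """Strip the forward primer from the 5' end and the reverse primer
    (as its reverse complement) from the 3' end, when matched exactly."""
    if seq.startswith(fwd_primer):
        seq = seq[len(fwd_primer):]
    rev_rc = reverse_complement(rev_primer)
    if rev_rc and seq.endswith(rev_rc):
        seq = seq[:-len(rev_rc)]
    return seq


# Toy 60 bp amplicon (hypothetical), read as a 40 bp forward and a 35 bp reverse read.
amplicon = "ACGTTGCAAGCTTCGATCGG" "ATCCATGCAGTCAGGCTAAC" "TGGTACCGTAGCTTAGGCAT"
fwd_read = amplicon[:40]
rev_read = reverse_complement(amplicon[25:])
merged = merge_pair(fwd_read, rev_read)
insert = trim_primers(merged, amplicon[:10], reverse_complement(amplicon[-10:]))
print(merged == amplicon, insert)
```

Merging first means the variant is read from one continuous sequence spanning the whole amplicon, which matters for INDELs that sit near the middle, where neither read alone may cover both flanks.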
Results
RTG Tools was used to parse out the differences between the variant callers, producing seven
variant lists. These include both single nucleotide polymorphisms (SNPs) and insertions and
deletions (INDELs) (Table 2).
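The number seven follows from comparing three call sets: every variant falls into exactly one of the 2**3 - 1 = 7 non-empty combinations of GIAB, FB and GATK. Below is a minimal sketch of that partition, assuming variants have already been reduced to directly comparable keys (the harder step, which RTG Tools handles here). The function name and toy data are hypothetical.

```python
from itertools import combinations


def venn_regions(call_sets):
    """Split named call sets into disjoint Venn regions.

    call_sets maps a name to a set of variant keys. The result maps each
    non-empty combination of names to the variants found in exactly those sets,
    so three call sets give 2**3 - 1 = 7 regions (empty ones included).
    """
    names = sorted(call_sets)
    regions = {frozenset(combo): set()
               for size in range(1, len(names) + 1)
               for combo in combinations(names, size)}
    for variant in set().union(*call_sets.values()):
        members = frozenset(name for name in names if variant in call_sets[name])
        regions[members].add(variant)
    return regions


# Toy (chrom, pos, ref, alt) keys, not poster data. Exact-key matching assumes every
# call set describes a variant the same way, which Figure 1 shows is not guaranteed.
giab = {("chr1", 100, "A", "G"), ("chr1", 200, "AT", "A"), ("chr2", 50, "C", "CT")}
fb = {("chr1", 100, "A", "G"), ("chr2", 50, "C", "CT"), ("chr3", 10, "G", "T")}
gatk = {("chr1", 100, "A", "G"), ("chr1", 200, "AT", "A"), ("chr3", 10, "G", "T"),
        ("chr4", 5, "T", "TA")}
regions = venn_regions({"GIAB": giab, "FB": fb, "GATK": gatk})
for members, variants in sorted(regions.items(), key=lambda kv: (-len(kv[0]), sorted(kv[0]))):
    print(" & ".join(sorted(members)), len(variants))
```

Pre-creating all seven regions means a list that turns out empty (here, GIAB-only and FB-only) is still reported rather than silently missing.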
Figure 6: Distribution of variants across the intersections of the GIAB call set and the FreeBayes (FB) and GATK call sets.
Table 2: INDEL counts for each of the intersection lists. From each file large enough to allow it, 150 variants
were randomly selected: 70 insertions, 70 deletions and 10 multiallelic variants.
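The 70/70/10 selection described for Table 2 amounts to stratified random sampling. A minimal sketch, assuming records are (chrom, pos, ref, alt) tuples with VCF-style alleles, that "multiallelic" means several comma-separated ALT alleles, and that a list with fewer variants than a quota simply contributes all of them (the poster does not say how short lists were handled). The fixed seed and the function names are hypothetical.

```python
import random


def classify(ref, alt_field):
    """Classify a VCF REF/ALT pair; several comma-separated ALT alleles count as multiallelic."""
    alts = alt_field.split(",")
    if len(alts) > 1:
        return "multiallelic"
    if len(alts[0]) > len(ref):
        return "insertion"
    if len(alts[0]) < len(ref):
        return "deletion"
    return "other"  # SNPs and same-length substitutions are not sampled


def sample_indels(records, quotas=None, seed=0):
    """Randomly pick up to quotas[kind] records of each kind from (chrom, pos, ref, alt) tuples."""
    if quotas is None:
        quotas = {"insertion": 70, "deletion": 70, "multiallelic": 10}
    rng = random.Random(seed)  # fixed seed so the selection can be reproduced
    pools = {kind: [] for kind in quotas}
    for record in records:
        kind = classify(record[2], record[3])
        if kind in pools:
            pools[kind].append(record)
    # If a list holds fewer variants than its quota, every one of them is kept.
    return {kind: rng.sample(pool, min(quotas[kind], len(pool)))
            for kind, pool in pools.items()}


# Toy call set (not poster data): 100 insertions, 100 deletions, 5 multiallelic sites, 50 SNPs.
toy = ([("chr1", i, "A", "AT") for i in range(1, 101)]
       + [("chr1", 1000 + i, "AT", "A") for i in range(100)]
       + [("chr2", i, "A", "AT,ATT") for i in range(1, 6)]
       + [("chr3", i, "A", "G") for i in range(1, 51)])
picked = sample_indels(toy)
print({kind: len(chosen) for kind, chosen in picked.items()})
```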
Figure 4: Gel electrophoresis during the 'pool' phase. Each
band represents the DNA of the variant amplicon loaded into
that well. The farther a band migrates, the smaller the DNA
fragment. Multiple bands can indicate that the variant is
heterozygous, with distinct sequences amplified from the
two chromosomes.
Validation of the GIAB variant chr2:79133830 CACACACACCCTAT>C
GGTGGCACAGATAAGGACACAGTAGTCATGAGCTTTTGCCCACAGTAAACTGGATGATTACTGAAAGAAAGGGAGGCTGACAAGGAGAG
CCTGTGATTAAGGTAGAAAAGGTTTTCAACCAGGGCCCTTTCAAGCAGCACTGAGAACATTTCAGCTTCTTCCTTCCAGCCTTGGAGAG
GAAAGTACACACACACACACACACACACACACACACACACACACACACACACACACCCTATCTTTTTTTTTCTTTTGACTAAAGACAGA
TGATGACATGGTTGACCAGTATTCACACACACTCAAAGAAGTTAAATGCTTTTTAGCTGACAGTCATCTCAAATCCTTCTAGAAAACAA
CACAAAATACTTTATGTGATTTGCTGGTCACTTCACTGTTTAGCCC
Figure 7: Validating MiSeq results for a deletion on chromosome 2. The forward read is shown in yellow and the
reverse read in blue; their overlap is green. The deleted bases from this region of chromosome 2 are shown in
purple.
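The final comparison step can be sketched for this locus by building short reference and alternate windows around the deletion and checking which one each processed read contains. This assumes the third line of the Figure 7 sequence is reference-strand sequence containing the CACACACACCCTAT allele; the 12-base flank, the exact-substring rule and the toy reads are hypothetical simplifications (real reads carry sequencing errors, and in a repeat like this CA run a flank that is too short could match both alleles).

```python
from collections import Counter

# Third line of the Figure 7 sequence, assumed to be the reference strand.
LOCAL_REF = "GAAAGTACACACACACACACACACACACACACACACACACACACACACACACACACACACCCTATCTTTTTTTTTCTTTTGACTAAAGACAGA"
REF_ALLELE = "CACACACACCCTAT"
ALT_ALLELE = "C"


def allele_windows(local_ref, ref_allele, alt_allele, flank=12):
    """Return (ref_window, alt_window): each allele with `flank` bases of context per side."""
    start = local_ref.find(ref_allele)
    if start < 0:
        raise ValueError("reference allele not found in the local sequence")
    end = start + len(ref_allele)
    left = local_ref[max(0, start - flank):start]
    right = local_ref[end:end + flank]
    return left + ref_allele + right, left + alt_allele + right


def classify_read(read, ref_window, alt_window):
    """Label a merged, primer-trimmed read as 'ref', 'alt' or 'unknown'."""
    has_ref, has_alt = ref_window in read, alt_window in read
    if has_ref and not has_alt:
        return "ref"
    if has_alt and not has_ref:
        return "alt"
    return "unknown"


ref_win, alt_win = allele_windows(LOCAL_REF, REF_ALLELE, ALT_ALLELE)
start = LOCAL_REF.find(REF_ALLELE)
alt_haplotype = LOCAL_REF[:start] + ALT_ALLELE + LOCAL_REF[start + len(REF_ALLELE):]
# Toy reads (not poster data): two reference-like and three deletion-carrying sequences.
reads = [LOCAL_REF, LOCAL_REF, alt_haplotype, alt_haplotype, alt_haplotype]
counts = Counter(classify_read(r, ref_win, alt_win) for r in reads)
print(counts)
```

Counting reads per allele, rather than checking a single read, is what lets a mixture of reference and deletion reads be read as a heterozygous call.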
[Figure 1 image: IGV tracks for GIAB, FreeBayes and GATK, annotated as SNPs, block substitution, and insertion and deletion]
Table 1 (reference GTACCA > alternate GCGGCG)
Variant list   Ref > Alt                        Type of variant
FreeBayes      TACCA > CGGCG                    Block substitution
GATK           G > GCGG; GTAC > G; A > G        Insertion, deletion and SNP
GIAB           T > C; A > G; C > G; A > G       SNPs (four)
All three lists correctly report the same result.
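The claim in Table 1 can be checked mechanically: applying each caller's records to the reference window should yield the same alternate sequence. This is a minimal illustration, not RTG Tools' algorithm (its haplotype-aware comparison is far more general). It assumes coordinates relative to the GTACCA window, with position 1 on the leading G, since the genomic position is not given; records use VCF-style 1-based positions with padding bases, and overlapping edits are rejected. The function names are hypothetical.

```python
def to_edit(pos, ref, alt):
    """Turn a 1-based (pos, ref, alt) record into a 0-based (offset, deleted_len, inserted) edit."""
    offset = pos - 1
    # Drop bases shared at the start, such as the VCF padding base of an INDEL.
    while ref and alt and ref[0] == alt[0]:
        ref, alt = ref[1:], alt[1:]
        offset += 1
    # Drop bases shared at the end.
    while ref and alt and ref[-1] == alt[-1]:
        ref, alt = ref[:-1], alt[:-1]
    return offset, len(ref), alt


def apply_records(reference, records):
    """Return the alternate sequence implied by a list of (pos, ref, alt) records."""
    for pos, ref, _ in records:
        if reference[pos - 1:pos - 1 + len(ref)] != ref:
            raise ValueError(f"REF {ref} does not match the reference at position {pos}")
    # Sorting puts a pure insertion (0 bases deleted) before a deletion at the same offset.
    edits = sorted(to_edit(*record) for record in records)
    pieces, cursor = [], 0
    for offset, deleted, inserted in edits:
        if offset < cursor:
            raise ValueError(f"overlapping records at offset {offset}")
        pieces.append(reference[cursor:offset])  # unchanged bases before this edit
        pieces.append(inserted)
        cursor = offset + deleted  # skip over the deleted reference bases
    pieces.append(reference[cursor:])
    return "".join(pieces)


window = "GTACCA"
calls = {
    "FreeBayes": [(2, "TACCA", "CGGCG")],
    "GATK": [(1, "G", "GCGG"), (1, "GTAC", "G"), (6, "A", "G")],
    "GIAB": [(2, "T", "C"), (3, "A", "G"), (4, "C", "G"), (6, "A", "G")],
}
results = {name: apply_records(window, recs) for name, recs in calls.items()}
print(results)
```

A record-by-record comparison would find no shared entries among these three lists, while the reconstructed sequences are identical; that gap is why the call sets were compared with a tool that tolerates differing representations.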
Acknowledgements
This work was supported by the Nationwide Children's Hospital Biomedical Genomics Core.
PCR, gel data collection and MiSeq operation were performed by Sarah Imawalle.
Discussion
Validating the existence of these variants will help us:
• Better understand the variant callers used
• Improve the GIAB standard
Moving Forward:
• Continue to validate INDELs among the seven variant lists
• Test new variant callers
• Improve the MiSeq INDEL validation pipeline
