This document describes a three-part webinar series on next-generation sequencing (NGS) and its role in cancer biology. The webinars cover an introduction to NGS technology and applications, NGS for cancer research, and NGS data analysis for genetic profiling. The third webinar, presented by Ravi Vijaya Satya, will cover read mapping, variant calling, variant annotation, targeted enrichment using GeneRead Gene Panels, and the GeneRead Data Analysis Portal.
New methods: DeepVariant evaluation of draft v4alpha (GenomeInABottle)
DeepVariant uses a 26-layer convolutional neural network to call variants from read data encoded as tensors, producing a probability distribution over candidate genotypes. The document evaluates DeepVariant's performance on different datasets: indel accuracy improved when measured against the v4 truth set, whereas SNP accuracy was mixed because of errors in the v4 labels. Error analysis found that DeepVariant was usually correct where it disagreed with v4, and that v4 errors dominated the remaining discrepancies, suggesting that label errors limit training. Once these label errors are addressed, DeepVariant may achieve very high accuracy.
This document provides an overview of essential bioinformatics resources for designing PCR primers and oligos for various applications. It begins by outlining general rules for PCR primer design, including recommendations for primer length, melting temperature, specificity, secondary structures, and other factors. It then describes several online tools and databases for designing primers for general-purpose PCR, real-time qPCR, methylation studies, and other applications. These resources include Primer3, Primer3Plus, PrimerZ, and Vector NTI. Databases such as NCBI Probe and RTPrimerDB provide validated primer sequences. The document emphasizes consulting multiple design tools and validating primers.
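Two of the design rules mentioned above, GC content and melting temperature, are simple enough to check by hand. The sketch below is illustrative only: it uses the Wallace rule of thumb, Tm = 2(A+T) + 4(G+C), which is a rough guide for short oligos, whereas tools such as Primer3 use nearest-neighbor thermodynamics. The primer sequence and function names are hypothetical.

```python
def gc_content(primer: str) -> float:
    """Fraction of G and C bases in a primer sequence (case-insensitive)."""
    seq = primer.upper()
    if not seq:
        raise ValueError("primer sequence is empty")
    return (seq.count("G") + seq.count("C")) / len(seq)


def wallace_tm(primer: str) -> int:
    """Rule-of-thumb melting temperature (deg C): 2*(A+T) + 4*(G+C).

    Only a rough guide for short oligos; design tools such as Primer3
    use nearest-neighbor thermodynamic models instead.
    """
    seq = primer.upper()
    at = seq.count("A") + seq.count("T")
    gc = seq.count("G") + seq.count("C")
    return 2 * at + 4 * gc


def has_gc_clamp(primer: str) -> bool:
    """True if the 3' terminal base is G or C (a common design heuristic)."""
    return primer.upper()[-1:] in ("G", "C")


primer = "ATGCGTACGTTAGCCTAGGC"  # made-up 20-mer for illustration
print(len(primer), gc_content(primer), wallace_tm(primer), has_gc_clamp(primer))
# prints: 20 0.55 62 True
```

A real design workflow would also screen for self-complementarity, primer dimers, and off-target binding, which this sketch does not attempt.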
This document summarizes work on using long read sequencing to assemble structural variants with haplotype resolution. Two methods are described: Falcon Unzip, which performs de novo diploid assembly, and guided assembly using long reads partitioned by 10X Genomics haplotypes. Comparing calls from these methods to other callers reveals missed calls due to length discrepancies and shifted tandem repeat calls. Next steps include improving resolution in low-coverage regions and filling reference gaps. Challenges integrating calls include repeats, multi-allelic events, and complex variants. The richness of assemblies provides opportunities to leverage sequence context and quality.
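The comparison problems described above (missed matches due to length discrepancies and shifted tandem repeat calls) come down to how two SV calls are declared "the same". The following minimal sketch assumes matching by SV type, breakpoint distance, and size similarity. The `SVCall` record, the function names, and the thresholds (500 bp, 0.7) are hypothetical illustrations, not the criteria used in the original work.

```python
from dataclasses import dataclass


@dataclass
class SVCall:
    chrom: str
    pos: int      # start coordinate of the event
    svtype: str   # e.g. "DEL" or "INS"
    length: int   # absolute event length in bp (must be > 0)


def size_similarity(a: SVCall, b: SVCall) -> float:
    """Smaller length divided by larger length; 1.0 means identical sizes."""
    if a.length <= 0 or b.length <= 0:
        raise ValueError("SV lengths must be positive")
    return min(a.length, b.length) / max(a.length, b.length)


def calls_match(a: SVCall, b: SVCall, max_dist: int = 500,
                min_size_sim: float = 0.7) -> bool:
    """Match two calls by type, breakpoint proximity and size similarity.

    max_dist and min_size_sim are illustrative thresholds, not values
    taken from the original work.
    """
    return (a.chrom == b.chrom
            and a.svtype == b.svtype
            and abs(a.pos - b.pos) <= max_dist
            and size_similarity(a, b) >= min_size_sim)


assembly = SVCall("chr1", 10_000, "INS", 300)
shifted = SVCall("chr1", 10_120, "INS", 320)     # same repeat, shifted position
too_short = SVCall("chr1", 10_120, "INS", 150)   # length discrepancy
print(calls_match(assembly, shifted), calls_match(assembly, too_short))
# prints: True False
```

Loosening `max_dist` tolerates shifted calls in tandem repeats, but it also raises the risk of matching unrelated nearby events, so thresholds are a trade-off rather than fixed constants.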
The document summarizes discussions from a data analysis meeting on small and structural genetic variants. For small variants, updates were provided on improving variant calls, transferring variant phase between datasets, and developing more stringent filtering methods. Next steps include sharing new pedigree-based calls, integrating calls from different analyses, and defining high-confidence regions. For structural variants, updates were given on more sensitive variant detection from long-read sequencing and optical mapping data, and on validating variants using multiple technologies.
This document summarizes Illumina's efforts to generate a population-based structural variant callset using whole genome sequencing data from 3,000-4,000 samples. Key points include using multiple variant callers and population genetics analysis to generate hypotheses about common structural variants, assembling putative deletions to refine breakpoints, developing a graph-based genotyping tool, and validating variants using depth analysis and Mendelian inheritance checks. The goal is to call common structural variants more consistently and accurately in any sample.
GIAB GRC Workshop ASHG 2019: Billy Rowell, Evaluation of v4 with CCS GATK (GenomeInABottle)
This document summarizes the evaluation of the Genome in a Bottle (GIAB) HG002 v4 draft benchmark variant calls against calls made by GATK on PacBio HiFi reads. It finds that the v4 draft benchmark increases the number of true positive variants called and improves precision compared to the v3 benchmark. However, there are still some false positive and false negative variant calls made by GATK, including in homopolymer stretches and repetitive regions, presenting opportunities for improving both the variant calling and benchmark.
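Because many remaining errors fall in homopolymer stretches, benchmarking results are often stratified by homopolymer regions. The sketch below finds single-base runs in a sequence and reports them as 0-based, half-open (BED-style) intervals that could feed such a stratification. The function name and the default minimum run length of 7 are hypothetical choices for illustration.

```python
import re


def homopolymer_runs(seq: str, min_len: int = 7):
    """Return (start, end, base) for single-base runs of at least min_len.

    Coordinates are 0-based and half-open, as in BED files.
    """
    if min_len < 1:
        raise ValueError("min_len must be >= 1")
    # ([ACGT]) captures one base; \1{n,} requires it to repeat n more times.
    pattern = re.compile(r"([ACGT])\1{%d,}" % (min_len - 1))
    return [(m.start(), m.end(), m.group(1)) for m in pattern.finditer(seq.upper())]


seq = "ACG" + "T" * 8 + "GC" + "A" * 7 + "C"
print(homopolymer_runs(seq))
# prints: [(3, 11, 'T'), (13, 20, 'A')]
```

Running this over a reference FASTA, chromosome by chromosome, would yield a BED file that benchmarking tools can use to report precision and recall inside versus outside homopolymers.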
This document summarizes data from the Genome in a Bottle Consortium SV Data Jamboree. It discusses several approaches to integrating structural variant calls from multiple technologies to establish a benchmark set of high-confidence SVs:
1) Finding deletions supported by multiple technologies with concordant breakpoints and filtering questionable calls. This approach identified 524 deletions supported by 2+ technologies, ranging in size from 20 bp to over 3 kb.
2) A method called svcompare that compares SV calls across technologies and outputs a multi-sample VCF with variant details from each caller. This identified over 2,000 regions with structural variants called by multiple technologies.
3) A method called svviz that analyzes read support for
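Approaches 1 and 2 both require deciding whether calls from different technologies describe the same deletion. A common criterion is reciprocal overlap: each interval must cover at least some fraction of the other. The following minimal sketch counts supporting technologies for a candidate deletion on a single chromosome. The 50% threshold, the technology labels, and the coordinates are hypothetical, not the settings used at the Jamboree.

```python
from typing import Dict, List, Tuple

Interval = Tuple[int, int]  # (start, end), 0-based half-open, one chromosome


def reciprocal_overlap(a: Interval, b: Interval) -> float:
    """Overlap length divided by the longer interval's length.

    A value >= f means each interval covers at least fraction f of the other.
    """
    overlap = min(a[1], b[1]) - max(a[0], b[0])
    if overlap <= 0:
        return 0.0
    return overlap / max(a[1] - a[0], b[1] - b[0])


def supporting_technologies(candidate: Interval,
                            calls_by_tech: Dict[str, List[Interval]],
                            min_ro: float = 0.5) -> List[str]:
    """Names of technologies with at least one call overlapping the candidate."""
    return sorted(tech for tech, calls in calls_by_tech.items()
                  if any(reciprocal_overlap(candidate, c) >= min_ro for c in calls))


calls = {
    "short_reads": [(1010, 1490)],
    "long_reads": [(995, 1505)],
    "optical_map": [(1400, 2000)],
}
support = supporting_technologies((1000, 1500), calls)
print(support, len(support) >= 2)
# prints: ['long_reads', 'short_reads'] True
```

Reciprocal overlap works well for larger deletions, but for short events near the 20 bp end of the range a breakpoint-distance criterion is often more informative.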
This document discusses RNA-Seq data analysis using the Babelomics 5 software. It describes the typical RNA-Seq analysis pipeline, which includes sequence preprocessing, mapping, quantification, normalization, differential expression analysis, and functional profiling. It also covers common file formats such as FASTQ, BAM, and count matrices, and explains normalization methods such as RPKM and TMM. The document provides an overview of the RNA-Seq data analysis capabilities in Babelomics 5.
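RPKM normalizes a gene's read count by both gene length and sequencing depth: RPKM = count × 10^9 / (gene length in bp × total mapped reads). The sketch below applies that formula to a small dictionary of counts. The gene names and numbers are hypothetical, and this is not Babelomics code. TMM, which estimates library-level scaling factors, is more involved; edgeR's `calcNormFactors` is one well-known implementation.

```python
def rpkm(counts, lengths_bp, total_mapped_reads=None):
    """Reads Per Kilobase of transcript per Million mapped reads.

    RPKM = count * 1e9 / (gene_length_bp * total_mapped_reads)
    If total_mapped_reads is omitted, the sum of the given counts is used
    (a simplification; normally all mapped reads in the library are used).
    """
    total = total_mapped_reads if total_mapped_reads is not None else sum(counts.values())
    if total <= 0:
        raise ValueError("total mapped reads must be positive")
    return {gene: counts[gene] * 1e9 / (lengths_bp[gene] * total) for gene in counts}


counts = {"geneA": 500, "geneB": 1000}
lengths = {"geneA": 2000, "geneB": 500}
print(rpkm(counts, lengths, total_mapped_reads=1_000_000))
# prints: {'geneA': 250.0, 'geneB': 2000.0}
```

Although geneB has only twice as many reads as geneA, it is four times shorter, so its length-normalized expression is eight times higher.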
The document discusses Nabsys's single-molecule electronic mapping technology, which can detect structural variants such as deletions at sub-diffraction-limit resolution, higher than the resolution of optical mapping. DNA molecules move through nanodetectors that record a signal as each molecule passes, and these signals are converted into spatial maps. The document then shows examples of detecting heterozygous deletions in sample NA24385 and compares the detected deletion sizes with reference sizes. It also describes training a support vector machine model, SV Verify, on sample NA12878 to characterize sensitivity and specificity for deletions of different sizes. SV Verify is then applied to putative deletions in NA24385 and confirms a high percentage of them, demonstrating the technology's accuracy.
This document provides an overview of next generation sequencing technologies and applications. It summarizes an upcoming webinar series on next generation sequencing and its role in cancer biology. The first webinar will provide an introduction to next generation sequencing technologies and applications and be presented by Quan Peng on April 4, 2013. The following two webinars will focus on next generation sequencing for cancer research and data analysis and be presented on April 11 and 18, 2013 respectively.
Genome in a Bottle - Towards new benchmarks for the “dark matter” of the huma... (GenomeInABottle)
The document discusses Genome in a Bottle (GIAB) and its efforts to characterize human genomes and provide reference materials and benchmarks to evaluate genome sequencing and variant calling. Specifically, it summarizes how GIAB has characterized 7 human genomes, provides extensive public sequencing data for benchmarking, and is now using linked and long reads to expand the small variant benchmark set, develop a structural variant benchmark, and perform diploid assembly of difficult regions. It also shows how new benchmarks that include more difficult regions have revealed errors in previous benchmarks and reduced performance metrics for variant calling tools.
1) Targeted re-sequencing is used to detect sequence differences between an individual and a reference genome in order to identify genetic variants associated with diseases. It can be done using microarrays or next-generation sequencing (NGS).
2) Microarrays have simpler workflows but struggle with insertions/deletions and repeats, while NGS can detect more variant types but has more complex data analysis.
3) Both methods require validation before clinical use, but NGS typically has higher data quality, reproducibility, and ability to detect pathogenic mutations while reducing incidental findings compared to microarrays.
NGS Targeted Enrichment Technology in Cancer Research: NGS Tech Overview Webi... (QIAGEN)
This slide deck discusses the most biologically efficient, cost-effective method for successful NGS. The GeneRead DNA QuantiMIZE Kits help determine the optimum conditions for targeted enrichment of DNA isolated from biological samples, while the GeneRead DNAseq Panels V2 allow you to quickly and reliably deep-sequence your genes of interest. Applications in translational and clinical research are highlighted.
PerkinElmer provides end-to-end next-generation sequencing (NGS) services from sample intake to data analysis. Their CLIA-certified sequencing laboratory is staffed by expert scientists with decades of experience in genomics who deliver consistently high-quality sequencing results. PerkinElmer offers sequencing, library preparation, capture, bioinformatics analysis, and professional consulting services to build customized NGS solutions that meet customers' specific needs and requirements.
The document summarizes research on generating high-quality human reference genomes using PromethION nanopore sequencing. Key points:
- 11 human reference genomes were sequenced in 9 days using PromethION nanopore sequencing and assembly tools, achieving finished assemblies.
- The sequencing strategy included enriching for ultra-long reads over 100 kb using a short read eliminator kit to boost overall coverage of long reads.
- Evaluation of one genome assembly showed over 99% consensus base accuracy when aligned to the human reference genome and over 99.76% accuracy for alignments of complete BAC sequences.
- The research aims to further improve assembly quality and reduce costs while increasing throughput using PromethION sequencing and optimized assembly tools.
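Assembly accuracy figures like those above are often also reported as Phred-scaled quality values, QV = -10 log10(1 - accuracy). The short sketch below converts the two percentages from the summary into QV. It is a worked illustration of the standard conversion, not a calculation taken from the original research.

```python
import math


def phred_qv(accuracy: float) -> float:
    """Convert a per-base accuracy (0 < accuracy < 1) to a Phred-scaled QV."""
    if not 0.0 < accuracy < 1.0:
        raise ValueError("accuracy must be strictly between 0 and 1")
    return -10.0 * math.log10(1.0 - accuracy)


for acc in (0.99, 0.9976):
    print(f"{acc:.2%} -> QV {phred_qv(acc):.1f}")
# prints:
# 99.00% -> QV 20.0
# 99.76% -> QV 26.2
```

Each additional 10 QV points corresponds to a tenfold drop in error rate, so small changes in percentage accuracy near 100% reflect large differences in error counts.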
The ChIP-qPCR assays provide pre-designed and validated real-time PCR primer assays that measure genomic DNA enrichment within chromatin immunoprecipitation samples. They save researchers time and money by eliminating the need to design their own assays. The assays provide quick, easy, and quantitative analysis of multiple promoter regions from a single ChIP sample using real-time PCR, addressing challenges of traditional ChIP workflows. The assays offer comprehensive coverage of human, mouse, and rat genomes and can be customized, allowing researchers to expand their ChIP experiments.
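A common way to express ChIP-qPCR enrichment is "percent input": the input sample's Ct is adjusted to 100% by subtracting log2 of its dilution factor, and the IP signal is expressed relative to it. The following is a generic sketch of that calculation. It assumes roughly 100% amplification efficiency, and the Ct values are hypothetical. The vendor's own data analysis workflow for these assays may differ.

```python
import math


def percent_input(ct_ip: float, ct_input: float, input_fraction: float = 0.01) -> float:
    """Percent-input enrichment for one ChIP-qPCR primer pair.

    The input Ct is first adjusted to 100% by subtracting log2 of the
    dilution factor (e.g. a 1% input sample is a 100-fold dilution).
    Assumes ~100% amplification efficiency (a doubling per cycle).
    """
    if not 0.0 < input_fraction <= 1.0:
        raise ValueError("input_fraction must be in (0, 1]")
    adjusted_input_ct = ct_input - math.log2(1.0 / input_fraction)
    return 100.0 * 2.0 ** (adjusted_input_ct - ct_ip)


# Hypothetical Cts: IP sample at 28.0, 1% input at 25.0 -> 0.125% of input.
print(round(percent_input(ct_ip=28.0, ct_input=25.0, input_fraction=0.01), 4))
# prints: 0.125
```

Computing the same value for a non-specific IgG control and taking the ratio gives a fold enrichment over background for each promoter region.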
This document summarizes work presented at the Genome in a Bottle Workshop on integrating small variant calls from multiple methods. It shows that the new v3.3 calls match the Platinum Genomes reference calls more closely than the older v2.19 calls do, with around 80% fewer differences in high-confidence regions. It also outlines the principles and criteria for the integration process, including forming sensitive calls from each dataset and filtering calls that differ from concordant calls. Ongoing work on further improving complex variant and indel calling is also discussed.
Part 2 of RNA-seq for DE analysis: Investigating raw data (Joachim Jacob)
Second part of the training session 'RNA-seq for differential expression analysis'. We explain the characteristics of RNA-seq data that allow us to detect differential expression. Interested in following this session? Please get in touch via http://www.jakonix.be/contact.html
This document summarizes benchmarking of germline small variant calling using Genome in a Bottle (GIAB) reference materials. It highlights best practices for benchmarking, including using benchmarking tools like hap.py and stratified performance metrics. It demonstrates benchmarking an Illumina HiSeq dataset aligned and called against GRCh37 using hap.py and stratifications from the GA4GH benchmarking tool. The results show precision and recall metrics with confidence intervals to evaluate performance across variant classes and difficulty levels. Ongoing work includes developing GIAB resources for GRCh38 and structural variants.
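The precision and recall metrics mentioned above follow directly from counts of true positives, false positives, and false negatives. Confidence intervals can be attached by treating each metric as a binomial proportion. The sketch below uses the Wilson score interval, which is one common choice and is not necessarily the method used in the presentation. hap.py reports separate truth-side and query-side true-positive counts, while this sketch uses a single TP for simplicity. All counts are hypothetical.

```python
import math


def benchmark_metrics(tp: int, fp: int, fn: int) -> dict:
    """Precision, recall and F1 from TP/FP/FN counts (assumes non-zero denominators)."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return {"precision": precision, "recall": recall, "f1": f1}


def wilson_interval(successes: int, n: int, z: float = 1.96):
    """Wilson score interval for a binomial proportion (about 95% for z=1.96)."""
    if n <= 0:
        raise ValueError("n must be positive")
    p = successes / n
    denom = 1 + z * z / n
    center = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return max(0.0, center - half), min(1.0, center + half)


m = benchmark_metrics(tp=990, fp=10, fn=20)   # hypothetical counts
lo, hi = wilson_interval(990, 990 + 20)       # interval on recall
print({k: round(v, 4) for k, v in m.items()}, (round(lo, 4), round(hi, 4)))
```

Intervals widen sharply in small strata (for example, rare variant classes in difficult regions), which is why stratified results should be read together with their intervals.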
The SlipChip is a simple lab-on-a-chip system that can perform many reactions in parallel in a small volume. It is used to demonstrate digital PCR, which can quantify DNA over a large concentration range by loading samples into wells and counting the number of wells that test positive. The SlipChip allows running multiple digital PCR experiments simultaneously on different sections of the chip, improving quantification accuracy and dynamic range. Examples show detecting pathogens in blood and simultaneously measuring viral loads of HCV and HIV. The system is flexible and its applications are limited only by the assays designed for it. Feedback is sought on potential clinical and research uses.
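Digital PCR quantification relies on Poisson statistics. Because a positive well may contain more than one molecule, the mean number of copies per partition is λ = -ln(1 - p), where p is the fraction of positive partitions, not p itself. The sketch below converts a positive-well count into a concentration. The partition count and volume are hypothetical, not SlipChip specifications.

```python
import math


def dpcr_concentration(positive: int, total: int, partition_volume_nl: float) -> float:
    """Estimate target copies per microliter from a digital PCR readout.

    Assumes molecules are Poisson-distributed across equal-volume partitions,
    so the mean copies per partition is lambda = -ln(1 - positive/total).
    """
    if total <= 0 or not 0 <= positive < total:
        raise ValueError("need 0 <= positive < total (all-positive = saturated)")
    lam = -math.log(1.0 - positive / total)        # mean copies per partition
    return lam / (partition_volume_nl * 1e-3)      # nL -> uL


# Hypothetical readout: 632 of 1,000 partitions positive, 5 nL each.
print(round(dpcr_concentration(632, 1000, 5.0), 1))
# prints: 199.9
```

The formula breaks down as nearly all partitions turn positive, which is one reason running several experiments on different sections of a chip can help extend the dynamic range.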
Neuroscience core lecture given at the Icahn School of Medicine at Mount Sinai. This is version 2 of the lecture on this topic; I have modified it to give a gentler introduction and added a new example for ngs.plot.
- PacBio HiFi reads are long (>10 kb) and accurate (>99%). HiFi reads are available now for HG002 and soon for HG001 and HG005.
- HiFi reads will be useful for comprehensive variant detection and phasing. Plans are outlined to apply HiFi reads to structural variant benchmarking and expand small variant calling to difficult regions.
This presentation gives an introduction to analysing ChIP-seq data and is part of a bioinformatics workshop. The accompanying websites are available at http://sschmeier.github.io/bioinf-workshop/#!galaxy-chipseq/
The field of next-generation sequencing (NGS) has been experiencing explosive growth over the past several years and shows little sign of slowing down. The increasing capabilities and dramatically lowered costs have expanded NGS's reach beyond that of the human genome into nearly every corner of biological research. An overview of the platforms on the market today, including an assessment of their relative strengths and weaknesses, will be presented. The presentation will conclude with a peek into where the technology is going and what will be available in the future.
Rfam is an open-access database (hosted at the Wellcome Trust Sanger Institute) containing information on RNA families and annotations for millions of RNA genes. Designed to work in a similar way to the Pfam database of protein families, Rfam uses a similar model for annotation and display and is built on the same principle of open access to the data. Each entry in the Rfam database includes multiple sequence alignments, a secondary structure, and probabilistic models known as covariance models (CMs), which can model an RNA sequence and its structure simultaneously. In conjunction with the Infernal software package, Rfam CMs can be used to search genomes or other DNA sequence databases for homologs of known structural RNA families. You can find more about Rfam at http://rfam.janelia.org/
Forensics: Human Identity Testing in the Applied Genetics Group (Nathan Olson)
"Forensics: Human Identity Testing in the Applied Genetics Group", presented by Peter Vallone, PhD, of NIST at the Standards for Pathogen Identification via NGS (SPIN) workshop hosted by the National Institute of Standards and Technology in October 2014.
[2017-05-29] DNASmartTagger: Development of DNA sequence tagging tools based on machine learning using public sequence annotation data, NIG International Symposium 2017.
Massively Parallel Sequencing - integrating the Ion PGM™ sequencer into your ... (Thermo Fisher Scientific)
This document summarizes the integration of massively parallel sequencing (MPS) using the Ion PGM™ sequencer into a forensic laboratory. The project aims to begin moving STR profiling onto genomic technologies, add SNP markers in a single workflow, and enable non-human DNA testing. Initial results show that sequencing amplified STR products is possible but that alignment is challenging. A custom panel of 280 targets, including STRs, SNPs, and amelogenin, was also tested, and most targets were detected across samples. Ongoing work focuses on improving sensitivity and reproducibility and on analyzing mixed samples. Implementation of MPS as a routine forensic service is estimated within 3-5 years.
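One thing sequencing STR amplicons makes possible is counting repeat units directly in the read, including interruptions that length-based capillary sizing cannot see. The sketch below finds the longest uninterrupted run of a repeat motif in a sequence. The motif "AGAT" and the flanking sequences are hypothetical examples, and production STR analysis (allele naming, stutter filtering, flank alignment) is considerably more involved.

```python
def longest_repeat_run(seq: str, motif: str) -> int:
    """Largest number of consecutive, uninterrupted copies of motif in seq."""
    seq, motif = seq.upper(), motif.upper()
    k = len(motif)
    if k == 0:
        raise ValueError("motif must be non-empty")
    best = 0
    for frame in range(k):                    # a run can start at any offset
        run = 0
        for i in range(frame, len(seq) - k + 1, k):
            if seq[i:i + k] == motif:
                run += 1
                best = max(best, run)
            else:
                run = 0
    return best


# Hypothetical reads: 11 clean AGAT repeats, then an interrupted allele.
clean = "GGC" + "AGAT" * 11 + "TTG"
interrupted = "GGC" + "AGAT" * 5 + "AGAC" + "AGAT" * 6 + "TTG"
print(longest_repeat_run(clean, "AGAT"), longest_repeat_run(interrupted, "AGAT"))
# prints: 11 6
```

The interrupted read contains 12 four-base units, just as a 12-repeat allele would by length, but its longest clean AGAT run is only 6. Sequence-level differences of this kind are invisible to size-based typing.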
The document describes Phase II of the ABRF Next Generation Sequencing Study which aims to establish reference data sets for evaluating DNA sequencing performance across multiple platforms and laboratories. Phase II will sequence various human and bacterial genomic samples to assess accuracy, coverage, and limits of detection using different platforms and library preparation methods. A collaboration with NIST Genome in a Bottle will provide standardized samples to the participating laboratories. The study aims to provide a resource for ongoing method development and evaluation of sequencing performance.
The document describes QIAGEN's GeneRead DNAseq Targeted Exon Enrichment and GeneRead Library Quantification System for next generation sequencing. It discusses targeted enrichment workflow and principles, data analysis, pathway content of panels, performance data and application examples. It also covers the library quantification workflow, using qPCR to quantify sequencing libraries, and a DNAseq library quantification array to assess sample quality. The document is aimed at promoting these NGS sample preparation and analysis solutions to potential customers.
This document describes SureFIND Transcriptome PCR Arrays, which are ready-to-use cDNA panels that can identify the miRNAs, pathways, or transcription factors that regulate gene expression. Each array contains cDNA from cells treated with different factors, such as miRNA mimics or pathway inhibitors. The document outlines an example where a Transcriptome PCR Array identified three miRNAs - miR-193b, miR-138, and miR-373 - that regulate the INPPL1 gene. Users are encouraged to validate top hits from the arrays.
Overview of methods for variant calling from next-generation sequence dataThomas Keane
This document provides an overview of methods for variant calling from next-generation sequencing data. It describes the SAM/BAM format for storing read alignments and details various tools for manipulating and visualizing BAM files. It also discusses approaches for calling SNPs, small indels, and structural variants from aligned sequencing data, highlighting factors to consider for accurate variant detection.
The document discusses GeneRead DNAseq Targeted Exon Enrichment and GeneRead Library Quantification System for Next Generation Sequencing. It provides an overview of the targeted enrichment workflow and principles, pathway-focused analysis tools, library quantification workflow, and performance data. The targeted enrichment panels allow users to focus sequencing on genes of interest, improve detection of low prevalence mutations from poor quality samples. The library quantification system uses qPCR to accurately quantify sequencing libraries and assess sample quality before NGS runs.
This document summarizes the Genome in a Bottle (GIAB) Consortium's efforts to characterize structural variants in human genomes to serve as benchmarks. The GIAB Consortium has generated structural variant calls for 7 human genomes using diverse data types and analysis methods. The document describes the GIAB Consortium's process for integrating these data to identify high-confidence structural variant calls to include in version 0.6 of the structural variant benchmark set. It provides examples of different types of structural variants characterized and evaluates the trustworthiness of the benchmark calls based on independent validation. The document also discusses ongoing efforts to further improve structural variant characterization using emerging long-read technologies.
The document describes the ABRF Next Generation Sequencing Study which aims to produce reference data sets to establish baseline performance of sequencing platforms and methods. Phase I focused on RNA-Seq and produced major conclusions about intraplatform and interplatform concordance. Phase II will focus on DNA sequencing including performance using different platforms/protocols, damaged DNA, small genomes, and oncogenic mutations. Samples and sequencing plans are described for three projects. The study is a collaboration between ABRF and Genome in a Bottle to generate standardized reference data.
This slidedeck details two comprehensive informatics solutions — the Biomedical Genomics Workbench and Ingenuity Knowledge Base Variant Analysis platforms. We show the intuitive user interface of CLC Cancer Research Workbench and demonstrate how the rich biological content from Ingenuity Knowledge Base helps you rapidly identify critical variants in your samples.
This document discusses GeneRead DNAseq Targeted Exon Enrichment and the GeneRead Library Quantification System for next generation sequencing. It begins with an introduction and agenda, then discusses targeted enrichment including the workflow, principles, data analysis, pathway content, performance data, and an application example. It also discusses library quantification including the workflow and an application example. In summary, the document presents Qiagen's GeneRead DNAseq and Library Quant systems as targeted enrichment and library quantification solutions for next generation sequencing applications.
Towards Ultra-Large-Scale System: Design of Scalable Software and Next-Gen H...Arghya Kusum Das
Recent advances in large-scale experimental facilities ushered in an era of data-driven science. These large-scale data increase the opportunity to answer many fundamental questions in basic science. However, these data pose new challenges to the scientific community in terms of their optimal processing. Consequently, scientists are in dire need of robust high-performance computing (HPC) solutions that can scale with terabytes of data.
In this talk, I will address the challenges of two major aspects of scientific big data processing: 1) Developing scalable software and algorithms for data- and compute-intensive scientific applications. 2) Proposing new cluster architectures that these applications and software tools need for good performance. In this talk, I will mainly address the challenges involved in large-scale genome analysis applications such as, genomic error correction and genome assembly which made their way to the forefront of big data challenges recently as the sequencing machines outperformed Moore's law by several magnitudes.
In the first part, I will address the challenges involved in developing scalable algorithms to process huge amounts of genomic big data using the power of recent analytic tools such as, Hadoop, Giraph, distributed NoSQL, etc. The algorithms are carefully tailored to scale over terabytes of data over hundreds of computing nodes. At a border level, these algorithms take advantage of locality-based computing for their scalability. In this aspect, I will briefly talk about my general-purpose, analytic framework for easy and rapid designing of embarrassingly parallel algorithms for massive-scale scientific data.
In the second part, I will address the challenges in designing the hardware environment that these data- and compute-intensive applications require for good performance. I will pinpoint the limitations in a traditional HPC cluster (supercomputer) to process this huge amount of big genomic data with respect to these applications and propose a solution to those limitations by balancing the storage (both I/O and memory) bandwidth, with the computational speed of high-performance CPUs. I will briefly discuss my theoretical model that can help the HPC system designers who are striving for system balance.
Many of these observations and developments are used by different hardware vendors such as, Samsung and IBM to develop or improve the configuration of their next-gen HPC clusters (e.g., Samsung’s hyper-scale computing cluster, IBM’s Power8-based supercomputer) with high-speed storage and processing power
High data quality and accuracy are recognized characteristics of Sanger re-sequencing projects and are primary reasons that next generation sequencing projects compliment their results by capillary electrophoresis data validation. We have developed an on-line tool called Primer Designer™ to streamline the NGS-to-Sanger sequencing workflow by taking the laborious task of PCR primer design out of the hands of the researcher by providing pre-designed assays for the human exome. The primer design tool has been created to enable scientists using next generation sequencing to quickly confirm variants discovered in their work by providing the means to quickly search, order and receive suitable pre-designed PCR primers for Sanger sequencing. Using the Primer Designer™ tool to design M13-tailed and non-tailed PCR primers for Sanger sequencing we will demonstrate validation of 28-variants across 24-amplicons and 19-genes using the BDD, BDTv1.1 and BDTv3.1 sequencing chemistries on the 3500xl Genetic Analyzer capillary electrophoresis platform.
This document discusses copy number variation analysis and qBiomarker Copy Number PCR Arrays. It begins with defining copy number variation and describing current methods to analyze copy number, including array CGH, SNP chips, NGS, qPCR and FISH. It then discusses issues with using single gene references and introduces the concept of a multicopy reference assay as a better reference. The remainder focuses on qBiomarker Copy Number PCR Arrays, which allow profiling copy number variation across curated gene sets or custom arrays. The arrays utilize a multicopy reference assay and are compatible with most qPCR instruments. Data analysis is performed using an online portal.
The Polymerase Chain Reaction (PCR) revolutionized life sciences by providing an efficient means of amplifying DNA. Invented in 1983 by Kary Mullis, PCR uses the DNA polymerase from Thermus aquaticus bacteria and cycles of heating and cooling to amplify a targeted region of DNA. The primary materials needed for PCR are DNA nucleotides, template DNA, primers that are complementary to the template DNA, and a heat-stable DNA polymerase. PCR changed molecular biology techniques and applications.
The document discusses copy number variation and strategies for analyzing copy number alterations. It describes what copy number is and how copy number variations occur frequently in the human genome. Several techniques are used for copy number analysis including array comparative genomic hybridization, single nucleotide polymorphism chips, and next-generation sequencing for discovery, and fluorescence in situ hybridization and quantitative PCR for validation. Quantitative PCR is a common method for copy number validation due to its reliability and ease of use. Important considerations for quantitative PCR include the choice of reference gene, as single-copy genes can be affected by copy number changes and genomic variations. A multicopy reference assay is recommended as it is less influenced by local genomic changes and provides more accurate copy number measurements.
A Novel Framework for Short Tandem Repeats (STRs) Using Parallel String MatchingIJERA Editor
This document presents a novel framework for identifying Short Tandem Repeats (STRs) using parallel string matching. It begins with background on STRs and challenges with existing sequential algorithms. It then describes a two-phase methodology - first applying a basic improved right prefix algorithm sequentially, then applying it in parallel using multi-threading on multicore processors. Results show the basic algorithm outperforms Boyer-Moore, Knuth-Morris-Pratt and brute force algorithms sequentially. When applied in parallel, processing time is reduced from 80ms sequentially to 40ms in parallel on multicore systems. The parallel STR identification framework allows efficient searching of repeats in large genomes.
Genome in a bottle for ashg grc giab workshop 181016GenomeInABottle
Genome in a Bottle (GIAB) provides benchmark genomes to evaluate the accuracy of variant calling from whole genome sequencing. GIAB has characterized 7 human genomes to date, including difficult variants. The benchmark calls continue to evolve as new data and methods are integrated. While current benchmarks enable validation of "easier" variants, GIAB is working to characterize more difficult variants and regions. This will allow validation of clinical tests focused on difficult sites. GIAB data and analyses are openly available to support method development and technology optimization.
This document discusses oncogenomics and cancer genomics technologies. It provides an overview and agenda for oncogenomics, describing various types of DNA biomarkers and somatic mutations that can be detected. It then discusses experimental strategies and techniques used for cancer genomics, including discovery and validation methods. The document reviews QIAGEN's qBiomarker somatic mutation assay pipeline and array products. It compares different mutation detection technologies and their sensitivity. Finally, it discusses disease-focused mutational profiling and data analysis.
1. Sample & Assay Technologies
Next Generation Sequencing:
Data analysis for genetic profiling
Ravi Vijaya Satya, Ph.D.
Senior Scientist, R&D
Ravi.VijayaSatya@QIAGEN.com
2. Welcome to the three-part webinar series
Next Generation Sequencing and its role in cancer biology
Webinar 1: Next-generation sequencing, an introduction to technology and applications
Date: April 4, 2013
Speaker: Quan Peng, Ph.D.
Webinar 2: Next-generation sequencing for cancer research
Date: April 11, 2013
Speaker: Vikram Devgan, Ph.D., MBA
Webinar 3: Next-generation sequencing data analysis for genetic profiling
Date: April 18, 2013
Speaker: Ravi Vijaya Satya, Ph.D.
3. Agenda
NGS Data Analysis
Read Mapping
Variant Calling
Variant Annotation
Targeted Enrichment
GeneRead Gene Panels
GeneRead Data Analysis Portal
Background
Workflow
Data Interpretation
4. Read Mapping
Reads are mapped to a reference genome
Millions of reads from a single run
Alignment
Mapping quality
Programs for read mapping
Hash-based: MAQ, ELAND, SOAP, Novoalign
Suffix array/Burrows-Wheeler Transform based: BWA, Bowtie, Bowtie2, SOAP2
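To make the hash-based approach concrete, here is a minimal Python sketch that indexes every k-mer of the reference, seeds each read with its first k-mer, and verifies each candidate position by counting mismatches. It assumes ungapped alignment and exact-match seeding, and `build_kmer_index` and `map_read` are hypothetical helpers, not the internals of MAQ, ELAND or any other mapper listed above. The reference and read are taken from the variant-calling example on the next slide.

```python
from collections import defaultdict


def build_kmer_index(reference, k):
    """Hash table: every k-mer in the reference -> list of 0-based start positions."""
    index = defaultdict(list)
    for i in range(len(reference) - k + 1):
        index[reference[i:i + k]].append(i)
    return index


def map_read(read, reference, index, k, max_mismatches=2):
    """Seed with the read's first k-mer, then verify each candidate position
    by counting mismatches over the whole read (ungapped alignment only)."""
    hits = []
    for pos in index.get(read[:k], []):
        window = reference[pos:pos + len(read)]
        if len(window) < len(read):
            continue  # the read would run off the end of the reference
        mismatches = sum(a != b for a, b in zip(read, window))
        if mismatches <= max_mismatches:
            hits.append((pos, mismatches))
    # Best hit first; several equally good hits would imply a low mapping quality.
    return sorted(hits, key=lambda hit: hit[1])


reference = "ACAGTTAAGCCTGAACTAGACTAGGATCGTCCTAGATAGTCTCGATAGCTCGATATC"
index = build_kmer_index(reference, k=8)
print(map_read("AACTAGACTAGGATCGTCCTACATAGTCTCG", reference, index, k=8))  # [(13, 1)]
```

The single mismatch reported for this read is the candidate variant that the variant caller then has to evaluate. BWT-based mappers such as BWA and Bowtie search a compressed index of the reference instead of a hash table, which needs far less memory for a human-sized genome.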
5. Variant Calling
Determine if there is enough statistical support to call a variant
Reference sequence
ACAGTTAAGCCTGAACTAGACTAGGATCGTCCTAGATAGTCTCGATAGCTCGATATC
Aligned reads
AACTAGACTAGGATCGTCCTAGATAGTCTCG
AACTAGACTAGGATCGTCCTACATAGTCTCG
AACTAGACTAGGATCGTCCTACATAGTCTCG
GATCGTCCTAGATAGTCTCGATAGCTCGAT
GATCGTCCTAGATAGTCTCGATAGCTCGAT
GATCGTCCTAGATAGTCTCGATAGCTCGAT
Multiple factors are considered in calling variants
No. of reads with the variant
Mapping qualities of the reads
Base qualities at the variant position
Strand bias (variant is seen in only one of the strands)
Variant calling software
GATK Unified Genotyper, Torrent Variant Caller, SAMtools, MuTect, …
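The factors listed above can be combined into a toy decision rule. The sketch below filters observations by base quality and mapping quality, counts reads carrying the most common non-reference base, checks that both strands support it, and uses a simple binomial test against an assumed per-base error rate. The thresholds, the error rate and the `evaluate_site` helper are hypothetical, and the binomial test is a stand-in chosen for clarity: it is not the statistical model used by GATK Unified Genotyper, Torrent Variant Caller, SAMtools or MuTect.

```python
from collections import Counter
from math import comb


def binomial_tail(n, k, p):
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def evaluate_site(observations, ref_base, min_base_q=20, min_map_q=20,
                  error_rate=0.01, alpha=0.01, min_alt_reads=2):
    """observations: list of (base, base_quality, mapping_quality, strand)
    for every read covering one reference position."""
    # Ignore low-quality bases and reads with low mapping quality.
    usable = [o for o in observations if o[1] >= min_base_q and o[2] >= min_map_q]
    alt_counts = Counter(o[0] for o in usable if o[0] != ref_base)
    if not alt_counts:
        return None  # no non-reference evidence left after filtering
    alt_base, alt_reads = alt_counts.most_common(1)[0]
    depth = len(usable)
    # Chance of seeing this many non-reference reads from sequencing errors alone
    # (conservatively treats every error as producing this particular alt base).
    p_value = binomial_tail(depth, alt_reads, error_rate)
    alt_strands = {o[3] for o in usable if o[0] == alt_base}
    strand_bias = len(alt_strands) < 2  # variant seen on only one strand
    return {
        "alt": alt_base,
        "depth": depth,
        "alt_reads": alt_reads,
        "frequency": alt_reads / depth,
        "p_value": p_value,
        "strand_bias": strand_bias,
        "call": alt_reads >= min_alt_reads and p_value < alpha and not strand_bias,
    }


# Variant column of the aligned reads above: 6 reads, 2 show C where the reference
# has G. Qualities and strands are hypothetical.
site = [("G", 30, 60, "+"), ("G", 30, 60, "-"), ("C", 30, 60, "+"),
        ("C", 30, 60, "-"), ("G", 30, 60, "+"), ("G", 30, 60, "-")]
print(evaluate_site(site, ref_base="G"))
```

If both C reads came from the same strand, the same counts would be rejected because of strand bias, which is why strand is listed as a separate factor.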
6. Variant Representation
VCF – Variant Call Format
http://www.1000genomes.org/wiki/Analysis/Variant%20Call%20Format/vcf-variant-call-format-version-41
Header lines:
##fileformat=VCFv4.1
##FORMAT=<ID=GT,Number=1,Type=String,Description="Genotype">
##INFO=<ID=DP,Number=1,Type=Integer,Description="Approximate read depth; some reads may have been filtered">
##INFO=<ID=FS,Number=1,Type=Float,Description="Phred-scaled p-value using Fisher's exact test to detect strand bias">
##INFO=<ID=MQ,Number=1,Type=Float,Description="RMS Mapping Quality">
##INFO=<ID=OND,Number=1,Type=Float,Description="Overall non-diploid ratio (alleles/(alleles+nonalleles))">
##INFO=<ID=QD,Number=1,Type=Float,Description="Variant Confidence/Quality by Depth">
##contig=<ID=chrM,length=16571,assembly=hg19>
##contig=<ID=chr1,length=249250621,assembly=hg19>
##contig=<ID=chr2,length=243199373,assembly=hg19>
##contig=<ID=chr3,length=198022430,assembly=hg19>
##contig=<ID=chr4,length=191154276,assembly=hg19>
Column labels:
#CHROM	POS	ID	REF	ALT	QUAL	FILTER	INFO	FORMAT	Sample
Variant calls:
chr1	11181327	rs11121691	C	T	100.0	PASS	DP=1000;MQ=87.67	GT:AD:DP	0/1:146,45:191
chr1	11190646	rs2275527	G	A	100.0	PASS	DP=1000;MQ=67.38	GT:AD:DP	0/1:462,121:583
chr1	11205058	rs1057079	C	T	100.0	PASS	DP=1000;MQ=79.57	GT:AD:DP	0/1:49,143:192
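Because each VCF data line is tab-delimited, with semicolon-separated INFO entries and colon-separated FORMAT/sample fields, a single-sample record can be unpacked with plain string operations. The sketch below is a hypothetical helper for illustration (a dedicated VCF library would handle multi-sample files, header typing and edge cases); it reads the first variant call above and derives the alternate-allele fraction from the AD (allelic depths) values.

```python
def parse_vcf_record(line):
    """Split one tab-delimited VCF data line (single sample) into a dictionary."""
    chrom, pos, vid, ref, alt, qual, flt, info, fmt, sample = line.rstrip("\n").split("\t")[:10]
    info_dict = {}
    for entry in info.split(";"):
        key, _, value = entry.partition("=")
        info_dict[key] = value if value else True  # INFO flags carry no value
    return {
        "chrom": chrom,
        "pos": int(pos),                  # VCF positions are 1-based
        "id": vid,
        "ref": ref,
        "alt": alt.split(","),            # multi-allelic sites list several ALTs
        "qual": None if qual == "." else float(qual),
        "filter": flt,
        "info": info_dict,
        "sample": dict(zip(fmt.split(":"), sample.split(":"))),
    }


def allele_fraction(record):
    """Fraction of reads supporting ALT alleles, from the per-sample AD field."""
    depths = [int(x) for x in record["sample"]["AD"].split(",")]
    return sum(depths[1:]) / sum(depths)


line = "chr1\t11181327\trs11121691\tC\tT\t100.0\tPASS\tDP=1000;MQ=87.67\tGT:AD:DP\t0/1:146,45:191"
record = parse_vcf_record(line)
print(record["sample"]["GT"], round(allele_fraction(record), 3))  # 0/1 0.236
```

Here 45 of the 191 reads (146 + 45) support T, which is consistent with the heterozygous genotype 0/1.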
7. Variant Annotation
Chrom | Pos | dbSNP/COSMIC ID | Ref | Alt | Gene name | Mutation type | Codon change | AA change | Allele frequency | snpEff effect | Filtered coverage | Variant frequency
chr1 | 11181327 | rs11121691 | C | T | MTOR | SNP | c.6909C>T | p.L2303 | C=0.761 T=0.239 | SYNONYMOUS_CODING | 1,924 | 0.239
chr1 | 11190646 | rs2275527 | G | A | MTOR | SNP | c.5553G>A | p.S1851 | G=0.791 A=0.208 | SYNONYMOUS_CODING | 5,842 | 0.208
chr1 | 11205058 | rs1057079 | C | T | MTOR | SNP | c.4731C>T | p.A1577 | C=0.254 T=0.746 | SYNONYMOUS_CODING | 1,928 | 0.746
chr1 | 11288758 | rs1064261 | G | A | MTOR | SNP | c.2997G>A | p.N999 | G=0.212 A=0.788 | SYNONYMOUS_CODING | 5,186 | 0.788
chr1 | 11300344 | rs191073707 | C | T | MTOR | SNP | - | - | C=0.924 T=0.076 | INTRON | 210 | 0.076
chr1 | 11301714 | rs1135172 | A | G | MTOR | SNP | c.1437A>G | p.D479 | A=0.248 G=0.752 | SYNONYMOUS_CODING | 3,965 | 0.752
chr1 | 11322628 | rs2295080 | G | T | MTOR | SNP | - | - | G=0.239 T=0.755 | UPSTREAM | 339 | 0.755
chr1 | 186641626 | rs2853805 | G | A | PTGS2 | SNP | - | - | G=0.0 A=1.0 | UTR_3_PRIME | 97 | 1
chr1 | 186642429 | rs2206593 | A | G | PTGS2 | SNP | - | - | A=0.167 G=0.833 | UTR_3_PRIME | 3,552 | 0.833
chr1 | 186643058 | rs5275 | A | G | PTGS2 | SNP | - | - | A=0.759 G=0.241 | UTR_3_PRIME | 237 | 0.241
chr1 | 186645927 | rs2066826 | C | T | PTGS2 | SNP | - | - | C=0.88 T=0.12 | INTRON | 209 | 0.12
chr2 | 29415792 | rs1728828 | G | A | ALK | SNP | - | - | G=0.0 A=1.0 | UTR_3_PRIME | 2,520 | 1
chr2 | 29416366 | rs1881421 | G | C | ALK | SNP | c.4587G>C | p.D1529E | G=0.907 C=0.093 | NON_SYNONYMOUS_CODING | 4,361 | 0.093
chr2 | 29416481 | rs1881420 | T | C | ALK | SNP | c.4472T>C | p.K1491R | T=0.954 C=0.045 | NON_SYNONYMOUS_CODING | 3,061 | 0.045
chr2 | 29416572 | rs1670283 | T | C | ALK | SNP | c.4381T>C | p.I1461V | T=0.0 C=0.999 | NON_SYNONYMOUS_CODING | 5,834 | 0.999
chr2 | 29419591 | rs1670284 | G | T | ALK | SNP | - | - | G=0.093 T=0.907 | INTRON | 739 | 0.907
chr2 | 29445458 | rs3795850 | G | T | ALK | SNP | c.3375G>T | p.G1125 | G=0.917 T=0.082 | SYNONYMOUS_CODING | 1,776 | 0.082
chr2 | 29446184 | rs2276550 | C | G | ALK | SNP | - | - | C=0.895 G=0.105 | INTRON | 475 | 0.105
Codon change / AA change: the actual change and its position within the codon or amino acid sequence
snpEff effect: the effect of the variant on protein coding
SIFT score: predicts whether an amino acid change is deleterious, based on how conserved the sequence is among related species
PolyPhen score: predicts the impact of the variant on protein structure
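The distinction between SYNONYMOUS_CODING and NON_SYNONYMOUS_CODING in the table comes from translating the codon before and after the change. The minimal sketch below shows that step with the standard genetic code; it is not snpEff's implementation, the effect labels simply mirror the ones used in the table, and it assumes the codon and alternate base are already expressed on the gene's coding strand (for genes on the minus strand the genomic bases must be reverse-complemented first).

```python
# Standard genetic code in TCAG order; '*' marks a stop codon.
BASES = "TCAG"
CODONS = [a + b + c for a in BASES for b in BASES for c in BASES]
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = dict(zip(CODONS, AMINO_ACIDS))


def classify_coding_snv(ref_codon, offset, alt_base):
    """Classify a single-base substitution at 0-based `offset` in `ref_codon`.
    Codon and alt base must already be on the gene's coding strand."""
    alt_codon = ref_codon[:offset] + alt_base + ref_codon[offset + 1:]
    ref_aa, alt_aa = CODON_TABLE[ref_codon], CODON_TABLE[alt_codon]
    if ref_aa == alt_aa:
        effect = "SYNONYMOUS_CODING"
    elif alt_aa == "*":
        effect = "STOP_GAINED"
    else:
        effect = "NON_SYNONYMOUS_CODING"
    return ref_aa, alt_aa, effect


print(classify_coding_snv("GAT", 2, "C"))  # ('D', 'D', 'SYNONYMOUS_CODING')
print(classify_coding_snv("GAT", 2, "A"))  # ('D', 'E', 'NON_SYNONYMOUS_CODING')
```

A D-to-E change of this kind is what the table reports as p.D1529E. Scores such as SIFT and PolyPhen only become relevant for the non-synonymous cases, where the amino acid actually changes.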
8. GeneRead DNAseq Gene Panel: Targeted Sequencing
What is targeted sequencing?
Sequencing a subset of regions of the whole genome
Why do we need targeted sequencing?
Not all regions in the genome are of interest or relevant to a specific study
Exome Sequencing: sequencing most of the exonic regions of the genome (exome).
Protein-coding regions constitute less than 2% of the entire genome
Focused panel/hot spot sequencing: focused on the genes or regions of interest
What are the advantages of focused panel sequencing?
More coverage per sample, more sensitive mutation detection
More samples per run, lower cost per sample
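The coverage and cost advantages follow from simple arithmetic: the bases a run produces are spread over the target region, so a smaller target gives more depth per sample or room for more barcoded samples. The sketch below works through that calculation. The run size (10 million 150-bp reads) and the target sizes are illustrative assumptions, not GeneRead or instrument specifications, and the model ignores duplicates and uneven coverage.

```python
def mean_coverage(reads, read_length, target_bp, on_target_fraction=1.0):
    """Expected mean depth = on-target bases sequenced / size of the target region."""
    return reads * read_length * on_target_fraction / target_bp


def samples_per_run(run_reads, read_length, target_bp, required_depth,
                    on_target_fraction=1.0):
    """Number of barcoded samples that can share a run at the required mean depth."""
    depth_if_one_sample = mean_coverage(run_reads, read_length, target_bp,
                                        on_target_fraction)
    return int(depth_if_one_sample // required_depth)


# Hypothetical run: 10 million 150-bp reads.
panel_bp, exome_bp = 500_000, 50_000_000  # illustrative target sizes, not product specs
print(mean_coverage(10_000_000, 150, panel_bp))   # 3000.0
print(mean_coverage(10_000_000, 150, exome_bp))   # 30.0
print(samples_per_run(10_000_000, 150, panel_bp, required_depth=1000))  # 3
```

With these assumed numbers, a 100-fold smaller target gives 100-fold more depth, which can be spent either on sensitivity or on multiplexing more samples.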
9. Target Enrichment - Methodology
Multiplex PCR
Small DNA input (< 100 ng)
Short processing time (several hours)
Relatively small throughput (kb to Mb target region)
Workflow: Sample preparation (DNA isolation) → PCR target enrichment (2 hours) → Library construction → Sequencing → Data analysis
10. Variants Identifiable through Multiplex PCR
Callable variants:
SNPs (single nucleotide polymorphisms)
Indels < 20 bp in length
Variants not callable:
Structural variants: large indels, inversions, copy number variants (CNVs)
11. GeneRead DNAseq Gene Panel
Multiplex PCR-based targeted enrichment for DNA sequencing
Covers all human exons (coding regions + UTRs)
Gene primer sets are divided into 4 tubes, with up to 1,200-plex in each tube
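The slide does not say how primer pairs are allocated to the four tubes. One constraint commonly applied in tiled multiplex PCR designs is that overlapping amplicons go into different pools, because primers from overlapping amplicons in the same reaction can amplify a short, unwanted product between them. The sketch below assigns amplicons greedily under that rule and a per-tube plex limit; it is a hypothetical illustration (single chromosome, half-open coordinates), not QIAGEN's design algorithm.

```python
def assign_to_pools(amplicons, n_pools=4, max_plex=1200):
    """Greedily assign (start, end) amplicons, half-open and on one chromosome,
    to pools so that no two amplicons in the same pool overlap.
    Returns the pool index of each amplicon, in input order."""
    order = sorted(range(len(amplicons)), key=lambda i: amplicons[i][0])
    pool_end = [float("-inf")] * n_pools  # rightmost end already placed in each pool
    pool_size = [0] * n_pools
    assignment = [None] * len(amplicons)
    for i in order:
        start, end = amplicons[i]
        for p in range(n_pools):
            # Amplicons are visited by start position, so no overlap with
            # anything already in pool p means pool_end[p] <= start.
            if pool_end[p] <= start and pool_size[p] < max_plex:
                assignment[i] = p
                pool_end[p] = max(pool_end[p], end)
                pool_size[p] += 1
                break
        else:
            raise ValueError(f"no pool can take amplicon {amplicons[i]}")
    return assignment


print(assign_to_pools([(0, 200), (150, 350), (300, 500), (450, 650)], n_pools=2))
# [0, 1, 0, 1]
```

Alternating overlapping amplicons between pools lets the panel tile a gene without gaps while keeping each reaction free of overlapping primer pairs.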
12. GeneRead DNAseq Gene Panel
Focus on your Disease of Interest
Comprehensive Cancer Panel (124 genes)
Disease-Focused Gene Panels (20 genes): breast cancer, colon cancer, gastric cancer, leukemia, liver cancer, lung cancer, ovarian cancer, prostate cancer
Figure labels: Genes Involved in Disease; Genes with High Relevance
14. Data Analysis for Targeted Sequencing
GeneRead data analysis workflow: Read mapping → Primer trimming → Variant calling → Variant annotation
Read mapping
Identify the possible position of the read within the reference
Align the read sequence to reference sequences
Primer trimming
Remove primer sequences from the reads
Variant calling
Identify differences between the reference and reads
Variant filtering and annotation
Functional information about the variant
15. Reads from Targeted Sequencing
Typical NGS raw read from targeted sequencing:
5'-Adapter | Barcode | Primer | Insert sequence | Primer | Adapter-3'
After removal of adapters and de-multiplexing:
5'-Primer | Insert sequence | Primer-3'
Read length can vary: a read may cover only part of the insert, so only the 5' or the 3' primer may be present.
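Adapter removal and de-multiplexing can be sketched in a few lines: check for the 5' adapter, identify the sample from the barcode that follows (tolerating one mismatch), and cut the read where the 3' adapter begins. The adapter and barcode sequences below are hypothetical, and the 3' adapter is matched exactly for simplicity; production trimmers also tolerate sequencing errors and partial adapters at the read end.

```python
def hamming(a, b):
    """Number of mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def demultiplex(read, adapter5, barcodes, adapter3, max_mismatches=1):
    """Strip the 5' adapter, identify the sample by its barcode, and cut the read
    at the 3' adapter if present. Returns (sample, primer + insert + primer) or
    (None, read) when the read cannot be assigned unambiguously."""
    if not read.startswith(adapter5):
        return None, read
    rest = read[len(adapter5):]
    matches = [
        (sample, bc) for sample, bc in barcodes.items()
        if len(rest) >= len(bc) and hamming(rest[:len(bc)], bc) <= max_mismatches
    ]
    if len(matches) != 1:
        return None, read  # no barcode, or ambiguous between samples
    sample, bc = matches[0]
    rest = rest[len(bc):]
    cut = rest.find(adapter3)
    if cut != -1:
        rest = rest[:cut]  # short insert: the read ran into the 3' adapter
    return sample, rest


barcodes = {"S1": "AAAA", "S2": "CCCC"}  # hypothetical barcodes
print(demultiplex("ACGT" + "AAAT" + "GGGCCC" + "TTAG", "ACGT", barcodes, "TTAG"))
# ('S1', 'GGGCCC')
```

What remains still carries the primer sequences at one or both ends, which is why primer trimming is a separate step in the workflow.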
16. Read Mapping
Align reads to the reference genome
Figure: reads from Amplicon 1 and Amplicon 2 aligned to the reference sequence
17. Primer Trimming
Primer sequences must be trimmed for accurate variant calling
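Figure: reads from Amplicon 1 and Amplicon 2 aligned to the reference sequence; 4 reads carry a C at the variant position
Frequency of C without primer trimming = 4/13 = 31%
Frequency of C after primer trimming = 4/7 = 57%
Primer bases are copied from the synthetic primer rather than from the sample, so they always show the reference base. In 6 of the 13 reads the variant position is covered only by primer sequence; counting those reads dilutes the variant frequency.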
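The 4/13 versus 4/7 arithmetic can be reproduced with a small sketch: each read is an ungapped alignment tagged with the amplicon it came from, and trimming discards bases that fall inside that amplicon's primers before bases are counted at the variant position. The amplicon coordinates, primer lengths and read sequences are hypothetical, chosen so that the forward primer of Amplicon 2 overlaps the variant.

```python
from collections import Counter


def trim_primers(read_start, seq, amplicon):
    """Keep only (ref_pos, base) pairs between the amplicon's primers.
    amplicon = (start, end, fwd_primer_len, rev_primer_len), half-open coordinates."""
    start, end, fwd_len, rev_len = amplicon
    insert_start, insert_end = start + fwd_len, end - rev_len
    return [(read_start + i, base) for i, base in enumerate(seq)
            if insert_start <= read_start + i < insert_end]


def base_counts(reads, pos, trim=True):
    """Count the bases observed at reference position `pos`."""
    counts = Counter()
    for read_start, seq, amplicon in reads:
        if trim:
            pairs = trim_primers(read_start, seq, amplicon)
        else:
            pairs = [(read_start + i, base) for i, base in enumerate(seq)]
        counts.update(base for p, base in pairs if p == pos)
    return counts


# Hypothetical layout: Amplicon 2's forward primer covers the variant at position 24.
amp1 = (0, 30, 5, 5)
amp2 = (22, 50, 5, 5)
reads = (
    [(0, "A" * 24 + "C" + "A" * 5, amp1)] * 4     # 4 reads carry the variant C
    + [(0, "A" * 24 + "T" + "A" * 5, amp1)] * 3   # 3 reads carry the reference T
    + [(22, "AA" + "T" + "A" * 25, amp2)] * 6     # primer bases copy the reference T
)
for trim in (False, True):
    c = base_counts(reads, pos=24, trim=trim)
    print(trim, f"{c['C']}/{sum(c.values())}")
# False 4/13
# True 4/7
```

Only the reads from Amplicon 2 lose their coverage of position 24, because for them that position lies in primer sequence, not in sequence copied from the sample.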
18. GeneRead Variant Calling Overview
Inputs: BAM file (with flow-space info), BED file for the amplicons used, run parameters
Variant calling and filtering (branch chosen by sequencing platform):
Ion Torrent: Torrent Variant Caller (TVC) → vcf → GATK Variant Annotator → vcf
MiSeq: GATK Indel Realigner → bam → GATK Base Quality Recalibrator → bam → GATK Unified Genotyper → vcf → GATK Variant Filtration → vcf
Additional filtering (based on frequency and coverage) → vcf
Annotation: snpEff (basic annotation) → SnpSift (links to dbSNP, COSMIC and dbNSFP; computation of SIFT scores, etc.) → VCF to Excel → variants in Excel format
19. Indel Realignment
DePristo MA, et al. A framework for variation discovery and genotyping using next-generation DNA sequencing data. Nat Genet. 2011 May;43(5):491-8. PMID: 21478889
Eliminates some false-positive variant calls around indels
Read aligners cannot eliminate these alignment errors because they align each read independently
Multiple sequence alignment of the reads at a locus can identify and correct these errors
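A toy version of the idea: a read that spans a deletion, aligned on its own without gaps, shows a cluster of mismatches that looks like several SNPs; comparing it against an alternate haplotype that carries the candidate deletion explains it with no mismatches at all. The sketch handles one read and one candidate deletion with hypothetical sequences; GATK's realigner works on all reads at a locus together and scores candidate consensus haplotypes, which this does not attempt.

```python
def best_ungapped(read, haplotype):
    """Lowest mismatch count over all ungapped placements of the read."""
    return min(
        sum(a != b for a, b in zip(read, haplotype[s:s + len(read)]))
        for s in range(len(haplotype) - len(read) + 1)
    )


def realign(read, reference, del_start, del_len):
    """Compare the read against the reference and against an alternate haplotype
    carrying a candidate deletion; keep whichever explains the read better."""
    alt_haplotype = reference[:del_start] + reference[del_start + del_len:]
    ref_mm = best_ungapped(read, reference)
    alt_mm = best_ungapped(read, alt_haplotype)
    return ("alt" if alt_mm < ref_mm else "ref"), ref_mm, alt_mm


reference = "GATTACAGGCCTTAAGCT"
# Read drawn from a haplotype lacking "GGC" at 0-based positions 7-9.
print(realign("TTACACTTAA", reference, del_start=7, del_len=3))  # ('alt', 5, 0)
```

The five mismatches in the best gapless placement sit next to each other at the end of the read, the typical signature of false-positive SNP calls around an unrecognized indel.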
20. Base Quality Recalibration
DePristo MA, et al. A framework for variation discovery and genotyping using next-generation DNA sequencing data. Nat Genet. 2011 May;43(5):491-8. PMID: 21478889
Corrects sequencer-specific biases:
Lane-specific/sample-specific biases
Instrument-specific under- or over-reporting of quality scores
Systematic errors that depend on position in the read
Dinucleotide-specific sequencing errors
Recalibration leads to improved variant calls
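The core of recalibration is to compare the quality a base was reported with against how often bases in the same bin actually disagree with the reference at sites assumed not to be variant. The sketch below bins observations by (reported quality, covariate) and converts each bin's observed mismatch rate back to a Phred score. The covariate label and the add-one smoothing are assumptions made for the example; GATK's recalibrator uses its own covariates and statistical model.

```python
import math
from collections import defaultdict


def empirical_qualities(observations):
    """observations: iterable of (reported_q, covariate, is_mismatch) collected at
    positions assumed to be non-variant (e.g. excluding known dbSNP sites).
    Returns {(reported_q, covariate): empirical Phred quality}."""
    counts = defaultdict(lambda: [0, 0])  # bin -> [mismatches, total]
    for reported_q, covariate, is_mismatch in observations:
        counts[(reported_q, covariate)][0] += int(is_mismatch)
        counts[(reported_q, covariate)][1] += 1
    table = {}
    for key, (mismatches, total) in counts.items():
        # Add-one smoothing (an assumption here) avoids log(0) for error-free bins.
        error_rate = (mismatches + 1) / (total + 2)
        table[key] = -10 * math.log10(error_rate)
    return table


# Bases reported as Q30 (1 error in 1,000 expected) but mismatching 10 times in 1,000.
obs = [(30, "cycles 1-50", i < 10) for i in range(1000)]
print(round(empirical_qualities(obs)[(30, "cycles 1-50")], 1))  # 19.6
```

In this example the instrument over-reports quality: bases labeled Q30 behave like roughly Q20 bases, and recalibrated scores would reflect that.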
21. Variant Filtration
Variant frequency (variants below these frequencies are filtered out)
Somatic mode: SNPs with frequency < 4% and indels with frequency < 20%
Germline mode: SNPs with frequency < 20% and indels with frequency < 25%
Strand bias (FS; variants are retained if)
SNPs: FS ≤ 60
Indels: FS ≤ 200
Mapping quality (MQ; variants are retained if)
SNPs: MQ ≥ 40.0
Haplotype score (variants are retained if)
SNPs: HaplotypeScore ≤ 13.0 (not applicable for pooled samples)
Strand bias: variants that are present in reads from only one of the two strands
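The filters above can be expressed as one function. FS is defined in the VCF header as the Phred-scaled p-value of Fisher's exact test for strand bias, so the sketch computes it from a 2x2 table of reference/alternate reads on the forward/reverse strands, then applies the frequency, FS, MQ and HaplotypeScore thresholds. It interprets the MQ criterion as filtering SNPs whose mapping quality is below 40, as in GATK's commonly used hard filters. The helper names are hypothetical, and GATK's own FS computation includes refinements not reproduced here.

```python
import math
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    total = comb(a + b + c + d, col1)
    # Hypergeometric probability of every table with the same margins.
    probs = {x: comb(row1, x) * comb(row2, col1 - x) / total
             for x in range(max(0, col1 - row2), min(row1, col1) + 1)}
    p_observed = probs[a]
    # Sum all tables that are no more probable than the observed one.
    p = sum(q for q in probs.values() if q <= p_observed * (1 + 1e-7))
    return min(p, 1.0)


def fisher_strand(ref_fwd, ref_rev, alt_fwd, alt_rev):
    """FS: Phred-scaled Fisher's exact test p-value for strand bias."""
    p = fisher_exact_two_sided(ref_fwd, ref_rev, alt_fwd, alt_rev)
    return max(0.0, -10 * math.log10(p))


MIN_FREQUENCY = {  # variants below these allele frequencies are filtered out
    "somatic": {"snp": 0.04, "indel": 0.20},
    "germline": {"snp": 0.20, "indel": 0.25},
}


def passes_filters(variant, mode="somatic", pooled=False):
    """variant: dict with 'type' ('snp' or 'indel'), 'frequency' and 'FS';
    SNPs also need 'MQ' and 'HaplotypeScore'."""
    kind = variant["type"]
    if variant["frequency"] < MIN_FREQUENCY[mode][kind]:
        return False
    if variant["FS"] > (60.0 if kind == "snp" else 200.0):
        return False
    if kind == "snp":
        if variant["MQ"] < 40.0:
            return False
        if not pooled and variant["HaplotypeScore"] > 13.0:
            return False  # HaplotypeScore is skipped for pooled samples
    return True


snp = {"type": "snp", "frequency": 0.05, "FS": fisher_strand(10, 10, 5, 5),
       "MQ": 60.0, "HaplotypeScore": 5.0}
print(passes_filters(snp, "somatic"), passes_filters(snp, "germline"))  # True False
```

The same 5% SNP passes in somatic mode but fails in germline mode, where a heterozygous variant is expected near 50% and low frequencies are more likely to be artifacts.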
22. Specificity Analysis
Specificity: the percentage of reads that map to the intended target regions (regions of interest, ROI)
Specificity = number of on-target reads / total number of reads
Figure: NGS reads aligned to a reference sequence; reads that fall in ROI 1 or ROI 2 are on-target, all other reads are off-target
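The specificity ratio can be computed directly from read and ROI intervals. The sketch below counts a read as on-target if it overlaps any region of interest by at least 1 bp; that overlap rule, the single-contig coordinates and the example intervals are assumptions, and some pipelines instead require most of the read to lie within a target.

```python
def overlaps(read, roi):
    """Half-open intervals (start, end) on the same contig overlap by >= 1 bp."""
    return read[0] < roi[1] and roi[0] < read[1]


def specificity(reads, rois):
    """Fraction of reads that overlap at least one region of interest."""
    on_target = sum(1 for r in reads if any(overlaps(r, roi) for roi in rois))
    return on_target / len(reads)


rois = [(100, 200), (300, 400)]  # hypothetical ROI 1 and ROI 2
reads = [(90, 140), (150, 190), (250, 290), (390, 450), (500, 550)]
print(specificity(reads, rois))  # 0.6
```

Three of the five reads touch a target region, so specificity is 60%; the reads at 250-290 and 500-550 are off-target.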
23. Sequencing Depth
Coverage depth (or depth of coverage): how many times each base has been sequenced
Unlike Sanger sequencing, in which each sample is sequenced 1-3 times to be confident of its nucleotide identity, NGS generally needs to cover each position many times to make a confident base call, because of its relatively high error rate (0.1-1% for NGS vs. 0.001-0.01% for Sanger sequencing)
Increasing coverage depth also helps to identify low-frequency mutations in heterogeneous samples such as cancer samples
Figure: NGS reads aligned to a reference sequence, with coverage depths of 4, 3 and 2 at different positions
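Why depth matters for low-frequency mutations can be shown with a binomial model: if each read independently carries the variant with probability equal to its allele fraction, the chance of seeing enough variant reads to make a call rises steeply with depth. The 5% allele fraction and the 5-read calling threshold below are hypothetical, and the model ignores sequencing errors and amplification bias.

```python
from math import comb


def detection_probability(depth, vaf, min_alt_reads):
    """P(at least `min_alt_reads` variant reads) when each of `depth` reads
    independently carries the variant with probability `vaf`."""
    return 1 - sum(comb(depth, i) * vaf**i * (1 - vaf) ** (depth - i)
                   for i in range(min_alt_reads))


for depth in (50, 200, 1000):
    print(depth, round(detection_probability(depth, vaf=0.05, min_alt_reads=5), 3))
```

At 50x, a 5% variant is expected in only about 2.5 reads, so it is usually missed; at several hundred-fold depth it is detected almost every time.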
24. NGS Data Analysis: Uniformity
Coverage uniformity: a measure of how evenly coverage depth is distributed across target positions
Figure: NGS reads aligned to a reference sequence, with uneven coverage depths of 10, 3 and 2 at different positions
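The slides do not give a formula for uniformity. One commonly used definition is the fraction of target positions whose depth is at least 20% of the mean depth; the sketch below computes per-base depth over a target with a difference array and then applies that definition. The 0.2 × mean threshold and the example reads are assumptions, not necessarily the metric reported by the GeneRead portal.

```python
def per_base_depth(reads, region_start, region_end):
    """Depth at each position of [region_start, region_end) from (start, end) reads."""
    diff = [0] * (region_end - region_start + 1)
    for start, end in reads:
        s, e = max(start, region_start), min(end, region_end)
        if s < e:  # read overlaps the region
            diff[s - region_start] += 1
            diff[e - region_start] -= 1
    depths, running = [], 0
    for delta in diff[:-1]:
        running += delta
        depths.append(running)
    return depths


def uniformity(depths, fraction_of_mean=0.2):
    """Fraction of target positions with depth >= fraction_of_mean * mean depth."""
    mean = sum(depths) / len(depths)
    return sum(d >= fraction_of_mean * mean for d in depths) / len(depths)


uneven = per_base_depth([(0, 5)] * 10 + [(5, 10)], 0, 10)
print(uneven, uniformity(uneven))  # [10, 10, 10, 10, 10, 1, 1, 1, 1, 1] 0.5
```

Half of this hypothetical target falls below 20% of the mean depth of 5.5, so the uniformity is 50% even though the mean depth looks adequate.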
25. GeneRead Data Analysis Web Portal
Free, complete and easy-to-use data analysis with web-based software
26. GeneRead Data Analysis Web Portal
29. Sample & Assay Technologies
Note: Runtimes depend on the number of reads in the input files. Typical runtimes are 20-60 minutes.
33. Summary
Run summary: specificity, coverage, uniformity, numbers of SNPs and indels
Summary by gene: specificity, coverage, uniformity, numbers of SNPs and indels
34. Features of Variant Report
SNP detection
Indel detection
35. QIAGEN’s GeneRead DNAseq Gene Panel System
FOCUS ON YOUR RELEVANT GENES
Focused: biologically relevant content selection enables deep sequencing of relevant genes and identification of rare mutations
Flexible: mix and match any gene of interest
NGS platform independent: functionally validated for PGM and MiSeq/HiSeq
Integrated controls: enable quality control of the prepared library before sequencing
Free, complete and easy-to-use data analysis tool
36. Upcoming Webinars
Next Generation Sequencing and its role in cancer biology
Webinar 3: Next-generation sequencing data analysis for genetic profiling
Date: April 18, 2013
Speaker: Ravi Vijaya Satya, Ph.D.
Webinar 1: Next-generation sequencing, an introduction to technology and applications
Date: May 3, 2013
Speaker: Quan Peng, Ph.D.
Webinar 2: Next-generation sequencing for cancer research
Date: May 10, 2013
Speaker: Vikram Devgan, Ph.D., MBA