The quality of sequencing data is critical for downstream analyses such as sequence assembly and single nucleotide polymorphism (SNP) identification. This presentation covers parameters for NGS data quality checking and the data formats produced by the major sequencing platforms.
AGRF in conjunction with EMBL Australia recently organised a workshop at Monash University Clayton. This workshop was targeted at beginners and biologists who are new to analysing Next-Gen Sequencing data. The workshop also aimed to provide users with a snapshot of bioinformatics and data analysis tips on how to begin to analyse project data. An introduction to RNA-seq data analysis was presented by AGRF Senior Bioinformatician Dr. Sonika Tyagi.
Presented: 1st August 2012
This document provides an overview of downstream analyses that can be performed after variant identification and filtering in a typical variant calling pipeline. It discusses visualization of variant data in each gene to identify potential causative variants. It also mentions association studies as another type of downstream analysis where variants are tested for association with disease phenotypes. The goal of downstream analyses is to help prioritize variants for further investigation.
This document provides an overview of next generation sequencing (NGS) analysis. It discusses various NGS platforms such as Illumina, Roche 454, PacBio, and Ion Torrent. It also covers common file formats for sequencing data like FASTQ, quality control measures to assess data quality, and applications of NGS such as RNA-seq and ChIP-seq. The document aims to introduce researchers to basic concepts in NGS analysis and highlights available resources for storing and analyzing large sequencing datasets.
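The FASTQ format mentioned above stores each read as four lines: an `@` header, the base sequence, a `+` separator, and a quality string of the same length as the sequence. A minimal parsing sketch (the read names and sequences here are made-up illustrations, and a plain, uncompressed file is assumed):

```python
def parse_fastq(lines):
    """Yield (read_id, sequence, quality) tuples from FASTQ lines.

    FASTQ stores each read as four lines: an '@' header, the
    sequence, a '+' separator, and a quality string whose length
    matches the sequence."""
    it = iter(lines)
    for header in it:
        seq = next(it).strip()
        next(it)                  # '+' separator line, ignored
        qual = next(it).strip()
        yield header.strip().lstrip('@'), seq, qual

# Hypothetical four-line FASTQ record
example = [
    "@read1",
    "GATTACA",
    "+",
    "IIIIIII",
]
records = list(parse_fastq(example))
print(records)  # [('read1', 'GATTACA', 'IIIIIII')]
```

Real FASTQ files are usually gzip-compressed and parsed with dedicated libraries, but the four-line record structure is the same.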
This workshop is intended for those who are interested in, and are in the planning stages of, conducting an RNA-Seq experiment. Topics to be discussed include:
* Experimental Design of RNA-Seq experiment
* Sample preparation, best practices
* High throughput sequencing basics and choices
* Cost estimation
* Differential Gene Expression Analysis
* Data cleanup and quality assurance
* Mapping your data
* Assigning reads to genes and counting
* Analysis of differentially expressed genes
* Downstream analysis/visualizations and tables
Next-generation sequencing and quality control: An Introduction (2016), by Sebastian Schmeier
This lecture is part of an introductory bioinformatics workshop. It gives background on what sequencing is, what the results of a sequencing experiment look like, how to assess the quality of a sequencing run, what error sources exist, and how to deal with errors. The accompanying website is available at http://sschmeier.com/bioinf-workshop/
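Assessing the quality of a sequencing run rests on Phred quality scores. In the standard Sanger/Phred+33 encoding used by modern Illumina FASTQ files, each quality character encodes a score Q equal to its ASCII code minus 33, and Q corresponds to an error probability of 10^(-Q/10). A small sketch of the conversion:

```python
def phred_quality(qual_string, offset=33):
    """Convert a FASTQ quality string to Phred scores.

    With the Phred+33 encoding, the score is the ASCII code of
    each character minus 33."""
    return [ord(c) - offset for c in qual_string]

def error_probability(q):
    """Phred score Q corresponds to an error probability of 10**(-Q/10)."""
    return 10 ** (-q / 10)

scores = phred_quality("II5!")
print(scores)                        # [40, 40, 20, 0]
print(error_probability(scores[0]))  # 0.0001, i.e. 1 error in 10,000 bases
```

Older Illumina pipelines used a Phred+64 offset, which is why QC tools first detect the encoding before interpreting scores.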
The document describes an RNA-seq analysis workflow that includes:
1. Preprocessing raw reads including quality control, filtering, and alignment to a reference genome using tools like FastQC, Bowtie2, and TopHat.
2. Assembling transcripts and estimating abundance using Cufflinks and HTseq-count.
3. Identifying differentially expressed genes between samples using DESeq and Cuffdiff.
4. Providing gene annotations and visualizing results using tools like GO, KEGG, and CummeRbund.
The workflow follows a typical reference-based analysis approach and uses various open source tools for read mapping, assembly, quantification, and differential expression.
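The quantification step in a workflow like the one above produces raw read counts per gene, which must be normalized for sequencing depth before samples can be compared. A minimal counts-per-million (CPM) sketch, with made-up counts for illustration (tools such as DESeq use more robust size-factor estimates, but the depth-correction idea is the same):

```python
def counts_per_million(counts):
    """Scale raw read counts by library size to counts-per-million.

    CPM corrects for sequencing depth only: divide each gene's
    count by the library total and rescale to one million."""
    total = sum(counts)
    return [c / total * 1_000_000 for c in counts]

# Hypothetical raw counts for three genes in one sample
raw = [100, 300, 600]
print(counts_per_million(raw))  # [100000.0, 300000.0, 600000.0]
```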
The document discusses RNA-seq analysis. It begins with an introduction to Mikael Huss, a bioinformatics scientist, and provides an overview of how genomics, RNA profiles, protein profiles, and interactomics relate within systems biology. The document then discusses how gene expression analysis can provide insights into basic research questions regarding tissue and cell identity, as well as insights into diseases by identifying genes that are over- or under-expressed in patients. Finally, it provides a brief overview of the typical workflow for RNA-seq analysis, which involves mapping RNA sequencing reads to a reference genome or transcriptome.
RNA-Seq: Analysis Pipeline for Differential Expression, by Jatinder Singh
RNA-Seq is a technique that uses next generation sequencing to sequence RNA transcripts and quantify gene expression levels. It can be used to estimate transcript abundance, detect alternative splicing, and compare gene expression profiles between healthy and diseased tissue. Computational challenges include read mapping due to exon-exon junctions and normalization of read counts. Key steps in RNA-Seq analysis include read mapping, transcript assembly, counting and normalizing reads, and detecting differentially expressed genes.
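One normalization challenge noted above is that longer transcripts attract more reads at equal expression, so depth correction alone is not enough. Transcripts-per-million (TPM) additionally normalizes by transcript length; a sketch with made-up counts and lengths:

```python
def tpm(counts, lengths_kb):
    """Transcripts-per-million: divide each count by transcript
    length, then rescale the length-normalized rates so they sum
    to one million."""
    rate = [c / l for c, l in zip(counts, lengths_kb)]
    total = sum(rate)
    return [r / total * 1_000_000 for r in rate]

# Hypothetical: two transcripts with equal counts but different lengths
vals = tpm([100, 100], [1.0, 2.0])
print(vals)  # the shorter transcript gets the higher TPM
```

TPM values always sum to one million within a sample, which makes within-sample comparisons straightforward; between-sample comparisons still require the statistical normalization performed by differential-expression tools.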
1. The document describes the main aspects of designing an RNA-seq experiment, including the experimental design, the complexity of the transcriptome, and the applications of next-generation transcriptomics.
2. Factors such as heterozygosity, polyploidy, alternative splicing isoforms, developmental stages, and parts of the organism affect the complexity of the project.
3. The experimental design must take these factors into account to obtain reliable RNA-seq data.
This document provides an introduction to next generation sequencing (NGS) technologies. It begins with an outline of topics to be covered, including the evolution of NGS technologies, their descriptions and comparisons, bioinformatics challenges of NGS data analysis, and some aspects of NGS data analysis workflows and tools. The document then delves into explanations of specific NGS platforms, their performance characteristics, and the sequencing processes. It discusses the large computational infrastructure and data management needs of NGS, as well as quality control, preprocessing of NGS data, and popular analysis tools and workflows.
This document discusses next generation sequencing technologies. It provides details on several massively parallel sequencing platforms and describes their advantages over traditional Sanger sequencing such as higher throughput, lower costs, and ability to process millions of reads in parallel. It then outlines several applications of next generation sequencing like mutation discovery, transcriptome analysis, metagenomics, epigenetics research and discovery of non-coding RNAs.
The document provides an overview of ChIP-seq data analysis. It discusses ChIP-seq technology, visualization of genomic data, and command-line analysis including quality checking, alignment, peak calling, annotation, and motif finding. It also covers downstream analysis such as comparing samples, analyzing region occupancy, and web resources for ChIP-seq analysis.
This document provides an introduction and overview of common methods for processing and analyzing next generation sequencing (NGS) data, including mapping NGS reads and de novo assembly of NGS reads. It discusses various NGS applications such as RNA-Seq, epigenetics, structural variation detection, and metagenomics. Key steps in read alignment such as choosing an alignment program and viewing alignments are outlined. Considerations for choosing an alignment program based on library type, read type, and platform are also reviewed. Popular alignment programs including Bowtie, BWA, TopHat, and Novoalign are mentioned.
This document outlines exercises for quality control of NGS data from an Illumina sequencing experiment on tomato ripening stages. The exercises include: 1) evaluating raw fastq files for format and number of sequences; 2) using FastQC to analyze read quality scores, lengths, duplication levels, and k-mer content; and 3) preprocessing the reads using fastq-mcf to trim low quality ends and remove short reads before reanalyzing with FastQC. The goal is to learn how to evaluate NGS read quality and preprocess data prior to downstream analysis.
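The trimming step in the exercises above removes low-quality 3' ends before downstream analysis. Real trimmers such as fastq-mcf use windowed algorithms; the following is a simplified sketch of the idea, assuming Phred+33 quality encoding:

```python
def trim_3prime(seq, qual, min_q=20, offset=33):
    """Trim low-quality bases from the 3' end of a read.

    Scans from the right and cuts before the first position
    (from the end) whose Phred score reaches min_q."""
    end = len(seq)
    while end > 0 and ord(qual[end - 1]) - offset < min_q:
        end -= 1
    return seq[:end], qual[:end]

# 'I' encodes Q40, '#' encodes Q2 under Phred+33
print(trim_3prime("ACGTACGT", "IIIIII##"))  # ('ACGTAC', 'IIIIII')
```

After trimming, reads shorter than a length threshold are typically discarded, as in the fastq-mcf step of the exercise.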
Introduction to Next-Generation Sequencing (NGS) Technology, by QIAGEN
The continuous evolution of NGS technology has led to an enormous diversification in NGS applications and dramatically decreased the costs to sequence a complete human genome.
In this presentation, we will discuss the following major topics:
• Basic overview of NGS sequencing technologies
• Next-generation sequencing workflow
• Spectrum of NGS applications
• QIAGEN universal NGS solutions
Galaxy DNA-seq variant calling: presentation and practical, Gent, April 2016, by Prof. Wim Van Criekinge
This document provides an overview of variant analysis from next-generation sequencing data. It begins with introductions to the CCA-Drylab@VUmc, TraIT, and Galaxy projects. The focus of the lecture is explained to be variant analysis from NGS data using interactive demos in Galaxy. Background is provided on Illumina sequencing technology and properties of sequencing reads. Key steps in variant analysis are outlined, including quality control and read mapping, variant calling and annotation using tools like FastQC, BWA, FreeBayes, and SnpEff. Formats for storing sequencing data and variants are also introduced, such as FASTQ, SAM/BAM, and VCF.
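The VCF format introduced above stores one variant per tab-separated line with fixed fields CHROM, POS, ID, REF, ALT, QUAL, FILTER, and INFO (plus optional genotype columns). A minimal parsing sketch; the example record is made up for illustration:

```python
def parse_vcf_line(line):
    """Parse the fixed fields of a single VCF data line."""
    fields = line.rstrip("\n").split("\t")
    return {
        "chrom": fields[0],
        "pos": int(fields[1]),
        "id": fields[2],
        "ref": fields[3],
        "alt": fields[4].split(","),   # ALT may list several alleles
        "qual": fields[5],
        "filter": fields[6],
        "info": fields[7],
    }

rec = parse_vcf_line("chr1\t12345\trs99\tA\tG\t50\tPASS\tDP=100")
print(rec["pos"], rec["ref"], rec["alt"])  # 12345 A ['G']
```

Real VCF files begin with `##` meta-information lines and a `#CHROM` header line, which a full parser would skip or interpret before reading data lines.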
This document discusses the use of single-cell RNA sequencing (scRNA-seq) to study brain tumors. It compares different scRNA-seq approaches and discusses challenges of working with clinical tumor samples. The document outlines how scRNA-seq can be used to answer questions about tumor cell subtypes and immune cell profiles in brain tumors. Specifically, it describes a study that used scRNA-seq to identify gene signatures that distinguish tumor-associated macrophages by their origin in glioma samples.
The document discusses RNA-Seq data analysis. Some key points:
- RNA-Seq involves sequencing steady-state RNA in a sample without prior knowledge of the organism. It can uncover novel transcripts and isoforms.
- Making sense of the large and complex RNA-Seq data depends on the scientific question, such as finding transcribed SNPs for allele-specific expression or novel transcripts in cancer samples.
- Common applications of RNA-Seq include abundance estimation, alternative splicing detection, RNA editing discovery, and finding novel transcripts and isoforms.
- Analysis steps include mapping reads to a reference genome/transcriptome, generating mapping statistics and quality metrics, differential expression analysis, clustering, and pathway analysis.
This document provides an overview of Illumina sequencing, including:
- Illumina sequencing uses a sequencing by synthesis (SBS) approach with reversible terminator chemistry. All four fluorescently labeled bases are present in each sequencing cycle.
- Key steps include library construction, cluster generation, bridge amplification on the flow cell, and single-base sequencing imaging.
- Multiplexing allows indexing of multiple samples by attaching barcodes during library preparation. This enables pooled sequencing of many samples.
- Run statistics like number of reads, percentage of high-quality bases, and alignment rates provide information about run quality and performance.
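One of the run statistics mentioned above, the percentage of high-quality bases, is conventionally reported as the fraction of bases at or above Q30. A sketch of the computation, assuming Phred+33 encoding:

```python
def fraction_q30(quality_strings, offset=33):
    """Fraction of all bases with Phred quality >= 30 (%Q30),
    a common Illumina run-quality statistic."""
    total = 0
    high = 0
    for qual in quality_strings:
        for c in qual:
            total += 1
            if ord(c) - offset >= 30:
                high += 1
    return high / total if total else 0.0

# 'I' encodes Q40, '#' encodes Q2 under Phred+33
print(fraction_q30(["IIII", "II##"]))  # 0.75
```

A Q30 base has an error probability of 0.001, so runs with a high %Q30 yield base calls that are overwhelmingly correct.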
This document provides an overview of exome sequence analysis. It begins with definitions of key terms like genome, genetic variants, and exome sequencing. It then describes the exome sequencing workflow, which involves fragmentation, hybridization to capture exonic regions, sequencing, mapping reads to reference genome, variant calling, and variant annotation. Challenges of finding causal variants are discussed. The document also compares benefits and challenges of exome sequencing versus whole genome sequencing or traditional methods. Finally, it discusses how exome sequencing has helped identify novel disease genes and expand knowledge of known disease genes.
The document discusses next generation sequencing (NGS) data analysis workflows and RNA sequencing data analysis. It provides an overview of primary, secondary, and tertiary analysis steps in NGS data analysis including quality control, mapping, assembly, and differential expression analysis. It also describes common file formats, tools for mapping, counting, and identifying differentially expressed genes from RNA-Seq data using either a reference genome or de novo assembly. Finally, it lists several pathways identified from comparative temporal analysis of differentially expressed genes.
This document provides an overview of next generation sequencing (NGS) technologies. It discusses the history and evolution of DNA sequencing, from early manual methods developed by Sanger to modern high-throughput NGS approaches. Key NGS methods described include Illumina sequencing by synthesis, Ion Torrent semiconductor sequencing, 454 pyrosequencing, and SOLiD ligation sequencing. Compared to Sanger, NGS allows massively parallel sequencing of many samples at lower cost and higher throughput. While NGS has advanced biological research, each method still has advantages and limitations related to read length, accuracy, and cost.
This document provides an overview of RNA-Seq analysis. It begins with considerations for RNA-Seq experiments such as computational requirements. It then describes the general RNA-Seq analysis workflow including short-read alignment, transcript reconstruction, abundance estimation, visualization, and statistics. The document focuses on explaining the "Tuxedo" analysis pipeline which includes Bowtie, Tophat, Cufflinks, Cuffmerge, Cuffdiff and CummeRbund. It provides examples of commands for each step and discusses alternative tools. The document concludes with resources for further information on RNA-Seq analysis.
The document describes ISMU, a pipeline for NGS data analysis and facilitating molecular breeding. ISMU version 1 focuses on SNP discovery between genotypes through mapping, assembly, and visualization. Version 2 applies identified SNPs to breeding through assay design, genotype calling, and analysis. The pipeline benchmarks open source programs, performs assembly and polymorphism detection between genotypes, and identifies parental lines for molecular breeding applications. It provides user-friendly interfaces for uploading data and visualizing results. Future plans include updating tools, extending pipeline capabilities, and linking with other databases and analysis systems.
- ISMU 1.0 and 2.0 are pipelines for genome-wide selection that allow for automated SNP detection, genotyping, and genomic selection in plant breeding.
- ISMU 1.0 featured a graphical user interface and automated data cleaning. ISMU 2.0 adds capabilities for genomic selection like multiple GS methods, cross validation, and output in HTML and PDF.
- Factors affecting genomic selection models include marker density, population size, trait heritability, and relationship between training and selection populations. Validation studies are needed to select the best model.
The Advanced Data Analysis Centre (ADAC) at the University of Nottingham provides bioinformatics and data analysis support for complex genomic and transcriptomic datasets. It offers a range of services including obtaining and processing next-generation sequencing data, quality control, mapping, variant calling, and specialized analysis. ADAC has expertise in many areas relevant to NGS analysis and is able to provide flexible consultancy, collaboration, and bespoke analysis support for high-quality research.
This document proposes a novel high performance flow matching architecture for OpenFlow data planes. It introduces an integrated approach using a customized RISC network processor and dedicated parallel logic. The processor provides flexibility and programmability while the dedicated logic handles performance-intensive flow matching tasks with reduced TCAM usage. An FPGA implementation of this architecture achieves high performance while minimizing resource utilization.
This document describes a science gateway developed using the Airavata middleware to enable complex industrial flow simulations through a high-performance computing workflow. The gateway lowers barriers to running workflows by automating tasks like file management, scheduling, scripting, and compiling across different HPC systems. It allows users to specify simulation parameters through a web interface and executes workflow steps on HPC resources, returning output upon completion. The document outlines the key components of the gateway, including the PHASTA application integration and deployment on TACC Stampede and CCI IBM Blue Gene resources.
How good is your SPARQL endpoint? A QoS-Aware SPARQL Endpoint Monitoring and...Ali Intizar
Due to the decentralised and autonomous architecture of the
Web of Data, data replication and local deployment of SPARQL endpoints
is inevitable. Nowadays, it is common to have multiple copies of
the same dataset accessible by various SPARQL endpoints, thus leading
to the problem of selecting optimal data source for a user query based on
data properties and requirements of the user or the application. Quality
of Service (QoS) parameters can play a pivotal role for the selection of
optimal data sources according to the user's requirements. QoS parameters
have been widely studied in the context of web service selection.
However, to the best of our knowledge, the potential of associating QoS
parameters to SPARQL endpoints for optimal data source selection has
not been investigated.
In this paper, we dene various QoS parameters associated with the
SPARQL endpoints and represent a semantic model for QoS parameters
and their evaluation. We present a monitoring service for the SPARQL
endpoint which automatically evaluates the QoS metrics of any given
SPARQL endpoint. We demonstrate the utility of our monitoring service
by implementing an extension of the SPARQL query language, which
caters for user requirements based on QoS parameters and selects the
optimal data source for a particular user query over federated sources.
This research study is presented at the 7th International Workshop on Cooperative and Human Aspects of Software Engineering (CHASE2014)
Abstract: Effectively performing code review increases the quality of software and reduces occurrence of defects. However, this requires reviewers with experiences and deep understandings of system code. Manual selection of such reviewers can be a costly and time-consuming task. To reduce this cost, we propose a reviewer recommendation algorithm determining file path similarity called FPS algorithm. Using three OSS projects as case studies, FPS algorithm was accurate up to 77.97%, which significantly outperformed the previous
approach.
Find more information and preprint at patanamon.com
The document discusses a lecture on sequence alignment, data formats, quality control, and data processing for next-generation sequencing data. It covers common file formats like FASTQ, SAM/BAM, and CRAM. It also describes algorithms for sequence alignment like hash table based methods and suffix/prefix tree based aligners. The lecture discusses scaling alignment to large datasets using parallel computing approaches.
BioWeka is an extension of the Weka data mining framework for bioinformatics applications. It provides additional tools for tasks like sequence analysis, gene expression analysis, and protein structure prediction. BioWeka implements these tools as extendable components within Weka's framework using common data formats and interfaces in order to improve interoperability and allow easy comparison of different methods. It has been used in applications like predicting the coding frame of sequences and distinguishing plant and pathogen genes.
This document outlines the 8 steps in the Broad Institute's human whole genome sequencing analysis pipeline. It uses Picard tools, BWA, and GATK. The pipeline takes raw Illumina BCL data and demultiplexes it per sample and lane. BWA is used to align reads to a reference genome. Picard tools generate QC metrics, mark duplicates, and aggregate alignments into sorted BAM files per sample. These BAM files are then input into GATK for germline variant calling and Firehose for somatic variant calling. Key steps include demultiplexing, alignment, merging alignments with metadata, duplicate marking, aggregation, indel realignment and base recalibration.
This webinar explains why PISA chips are inevitable, provides overview of machine architecture of such switches, presents a brief primer on the P4 language with sample programs for a variety of networks and demonstrates a powerful network diagnostics application implemented in P4.
Programmability in SDNs is confined to the network control plane. The forwarding plane is still largely dictated by fixed-function switching chips. Our goal is to change that, and to allow programmers to define how packets are to be processed all the way down to the wire.
This is made possible by a new generation of high-performance forwarding chips. At the high-end, PISA (Protocol-Independent Switch Architecture) chips promise multi-Tb/s of packet processing. At the mid- and low-end of the performance spectrum, CPUs, GPUs, FPGAs, and NPUs already offer great flexibility with performance of a few tens to hundreds of Gb/s.
In addition to programmable forwarding chips, we also need a high-level language to dictate the forwarding behavior in a target independent fashion. "P4" (www.p4.org) is such a language. In P4, the programer declares how packets are to be processed, and a compiler generates a configuration for a PISA chip, or a programmable target in general. For example, the programmer might program the switch to be a top-of-rack switch, a firewall, or a load-balancer; and might add features to run automatic diagnostics and novel congestion control algorithms.
Primer design is a key step in PCR that requires considering various factors to optimize the reaction. These include primer length, melting temperature, GC content, specificity, and potential for secondary structures. Well-designed primers are unique in targeting a single region, have compatible melting temperatures, and do not form hairpins or primer dimers that could inhibit the reaction. Choosing appropriate primers is essential for successful PCR amplification.
The yak is one of the most enduring symbols of the high Himalayas. Whether you visit Tibet, Bhutan, India or Nepal, you will inevitably find tourist places with yaks for picture clicking and ride.
As the largest animal on the Tibetan plateau and its surrounding regions, the yak is a “flagship species”, and indicates the health of the ecosystem within which it lives.
DNA barcoding uses a short, standardized gene sequence from a uniform region of mitochondrial or nuclear DNA to identify species. It has potential to identify the estimated 10 million eukaryotic species on Earth. The cytochrome c oxidase I (COI) gene region is commonly used for animals. DNA barcoding can identify specimens at all life stages, resolve taxonomic ambiguities, and enable development of electronic field guides. It involves tissue sampling, DNA extraction and amplification, sequencing, and comparing sequences to reference databases. Costs and time for the process are decreasing with new technologies. Global efforts aim to compile barcodes for all known eukaryotes.
Microsatellite are powerful DNA markers for quantifying genetic variations within & between populations of a species, also called as STR, SSR, VNTR. Tandemly repeated DNA sequences with the repeat/size of 1 – 6 bases repeated several times
This document provides a guide for identifying different tick species. It begins with an overview of tick taxonomy, then describes the key morphological features used to identify ticks, including their life stages, geographical distribution, hosts, life cycle, and pathogenesis. The guide highlights characteristics such as scutum pattern, festoons, capitulum length, and genital aperture location that can be used to distinguish among common tick genera like Ixodes, Dermacentor, and Amblyomma. Even engorged ticks can often still be identified by these visual features along with context clues about location and time of collection.
Access and Benefit sharing from Genetic ResourcesKaran Veer Singh
Millions of people depend on biological (genetic) resources and traditional knowledge for their livelihoods. While the concept of an access and benefit sharing (ABS) regime is new, access to biological resources and transfer of associated traditional knowledge is centuries old.
The document discusses various types of intellectual property rights (IPRs) including patents, geographical indications, copyrights, trademarks, industrial designs, trade secrets, and layout designs of integrated circuits (ICs). It provides details on the scope, subject matter, criteria and duration of protection for each type of IPR. Key points covered include what constitutes an invention for patents, ownership of geographical indications, rights conferred by copyrights, purpose of trademarks, registrability of designs, and protection of trade secrets and IC layouts as undisclosed information.
Indian act on IPRs, CBD, Copyright Act, 1957
The Patents Act, 1970
The Geographical Indications of Goods (Registration and Protection) Act, 1999
The Trade Marks Act, 1999
The Designs Act, 2000
The Semiconductor Integrated Circuits Layout-Design Act, 2000
Protection of Plant Varieties and Farmers' Rights Act, 2001
Biological Diversity Act, 2002
Ip protected invention in the field of biotechnologyKaran Veer Singh
This document discusses patenting of microorganisms, recombinant DNA, plant processes, gene patents, and applications of DNA sequences in India. Key points:
1) Isolated, mutated, adapted, and recombinant microorganisms can be patented under Indian law, and must be deposited in an IDA with disclosed source and geography.
2) Recombinant DNA techniques allow for patenting of processes, products, microorganisms and their variants, and proteins.
3) Plant processes involving increasing yield, genetic transformation, tissue culture, micropropagation, and somatic embryogenesis are patentable.
4) Gene patents can claim DNA sequences, proteins, recombinant plasmids, GM organisms, and production processes
Genome annotation, NGS sequence data, decoding sequence information, The genome contains all the biological information required to build and maintain any given living organism.
MICROSATELITE Markers for LIVESTOCK Genetic DIVERSITY ANALYSESKaran Veer Singh
This document discusses the use of microsatellite markers for analyzing genetic diversity in livestock. It begins by providing background on livestock diversity and the threats to many breeds. It then describes microsatellites and how they are useful genetic markers for studies of diversity and relatedness. The document gives examples of how microsatellite data can be collected and analyzed to assess diversity within and among populations/breeds. It discusses applications such as conservation prioritization, phylogenetics, and management of genetic resources.
Semen Banking for conservation of livestock biodiversityKaran Veer Singh
1) Semen banking is an important method for the conservation of livestock biodiversity and genetic resources in India. It involves the collection, evaluation, processing, freezing and storage of semen from genetically important breeding males.
2) For long-term conservation, a minimum of 30,000 semen doses from 15 unrelated bulls is preserved for cattle and buffalo breeds. Quality control measures ensure high post-thaw motility and integrity of stored semen doses.
3) Twenty percent of preserved semen doses for each breed are stored at the National Gene Bank in Karnal, while the remaining eighty percent are stored at regional centers, allowing for distribution and utilization of genetic material.
The document discusses 2D gel electrophoresis and the limitations of conventional 2D gels. It introduces Difference Gel Electrophoresis (DIGE), which uses spectrally distinct fluorescent dyes to label protein samples before running multiple samples on the same 2D gel. This allows direct comparison of protein abundance levels between samples and eliminates gel-to-gel variation. The document outlines the experimental design, statistical analysis software, and advantages of DIGE over conventional 2D gels such as increased accuracy, reduced variation, and ability to detect small protein differences.
T-TECTO 3.0 is an integrated software program for structural analysis of fault-slip data. It enables paleostress and kinematic analysis of heterogeneous and homogeneous fault-slip data using the Gauss method. Version 3.0 provides additional functions for detailed analysis including determination of maximum/minimum horizontal stresses and extensions, analysis of relative vertical deformations, optimization of parameters, and visualization of multiple slip mechanisms. The software allows analysis of fault-slip data, earthquake focal mechanisms, fractures, and other structures under stress.
This document summarizes an experiment with monkeys where scientists placed bananas above a ladder in a cage. When one monkey climbed the ladder, the other monkeys were sprayed with cold water. Over time, any monkey that tried to climb the ladder would be beaten by the others, even if the punishments stopped. When monkeys were replaced in the group, the new monkeys learned not to climb the ladder without experiencing the cold water themselves, showing how behaviors can be passed down within groups without understanding the original reasons.
Electrophoresis is a technique used to separate charged molecules like proteins and DNA. It works by applying an electric current which causes the molecules to migrate through a buffer or gel at different rates depending on their size and charge. The document discusses the principles of electrophoresis, the different types of electrophoresis like agarose gel electrophoresis and polyacrylamide gel electrophoresis, and factors that influence molecule migration like pH, molecular weight, and net charge.
Electrophoresis is a technique used to separate charged molecules like proteins and DNA. It works by applying an electric current which causes the molecules to migrate through a buffer or gel at different rates depending on their size and charge. The document discusses the principles of electrophoresis, different types of electrophoresis like agarose gel electrophoresis and polyacrylamide gel electrophoresis (PAGE), and factors that influence molecule migration like pH, molecular weight, and net charge.
How to Manage Your Lost Opportunities in Odoo 17 CRMCeline George
Odoo 17 CRM allows us to track why we lose sales opportunities with "Lost Reasons." This helps analyze our sales process and identify areas for improvement. Here's how to configure lost reasons in Odoo 17 CRM
Strategies for Effective Upskilling is a presentation by Chinwendu Peace in a Your Skill Boost Masterclass organisation by the Excellence Foundation for South Sudan on 08th and 09th June 2024 from 1 PM to 3 PM on each day.
This presentation was provided by Steph Pollock of The American Psychological Association’s Journals Program, and Damita Snow, of The American Society of Civil Engineers (ASCE), for the initial session of NISO's 2024 Training Series "DEIA in the Scholarly Landscape." Session One: 'Setting Expectations: a DEIA Primer,' was held June 6, 2024.
বাংলাদেশের অর্থনৈতিক সমীক্ষা ২০২৪ [Bangladesh Economic Review 2024 Bangla.pdf] কম্পিউটার , ট্যাব ও স্মার্ট ফোন ভার্সন সহ সম্পূর্ণ বাংলা ই-বুক বা pdf বই " সুচিপত্র ...বুকমার্ক মেনু 🔖 ও হাইপার লিংক মেনু 📝👆 যুক্ত ..
আমাদের সবার জন্য খুব খুব গুরুত্বপূর্ণ একটি বই ..বিসিএস, ব্যাংক, ইউনিভার্সিটি ভর্তি ও যে কোন প্রতিযোগিতা মূলক পরীক্ষার জন্য এর খুব ইম্পরট্যান্ট একটি বিষয় ...তাছাড়া বাংলাদেশের সাম্প্রতিক যে কোন ডাটা বা তথ্য এই বইতে পাবেন ...
তাই একজন নাগরিক হিসাবে এই তথ্য গুলো আপনার জানা প্রয়োজন ...।
বিসিএস ও ব্যাংক এর লিখিত পরীক্ষা ...+এছাড়া মাধ্যমিক ও উচ্চমাধ্যমিকের স্টুডেন্টদের জন্য অনেক কাজে আসবে ...
How to Setup Warehouse & Location in Odoo 17 InventoryCeline George
In this slide, we'll explore how to set up warehouses and locations in Odoo 17 Inventory. This will help us manage our stock effectively, track inventory levels, and streamline warehouse operations.
Main Java[All of the Base Concepts}.docxadhitya5119
This is part 1 of my Java Learning Journey. This Contains Custom methods, classes, constructors, packages, multithreading , try- catch block, finally block and more.
Walmart Business+ and Spark Good for Nonprofits.pdfTechSoup
"Learn about all the ways Walmart supports nonprofit organizations.
You will hear from Liz Willett, the Head of Nonprofits, and hear about what Walmart is doing to help nonprofits, including Walmart Business and Spark Good. Walmart Business+ is a new offer for nonprofits that offers discounts and also streamlines nonprofits order and expense tracking, saving time and money.
The webinar may also give some examples on how nonprofits can best leverage Walmart Business+.
The event will cover the following::
Walmart Business + (https://business.walmart.com/plus) is a new shopping experience for nonprofits, schools, and local business customers that connects an exclusive online shopping experience to stores. Benefits include free delivery and shipping, a 'Spend Analytics” feature, special discounts, deals and tax-exempt shopping.
Special TechSoup offer for a free 180 days membership, and up to $150 in discounts on eligible orders.
Spark Good (walmart.com/sparkgood) is a charitable platform that enables nonprofits to receive donations directly from customers and associates.
Answers about how you can do more with Walmart!"
A review of the growth of the Israel Genealogy Research Association Database Collection for the last 12 months. Our collection is now passed the 3 million mark and still growing. See which archives have contributed the most. See the different types of records we have, and which years have had records added. You can also see what we have for the future.
Reimagining Your Library Space: How to Increase the Vibes in Your Library No ...Diana Rendina
Librarians are leading the way in creating future-ready citizens – now we need to update our spaces to match. In this session, attendees will get inspiration for transforming their library spaces. You’ll learn how to survey students and patrons, create a focus group, and use design thinking to brainstorm ideas for your space. We’ll discuss budget friendly ways to change your space as well as how to find funding. No matter where you’re at, you’ll find ideas for reimagining your space in this session.
LAND USE LAND COVER AND NDVI OF MIRZAPUR DISTRICT, UPRAHUL
This Dissertation explores the particular circumstances of Mirzapur, a region located in the
core of India. Mirzapur, with its varied terrains and abundant biodiversity, offers an optimal
environment for investigating the changes in vegetation cover dynamics. Our study utilizes
advanced technologies such as GIS (Geographic Information Systems) and Remote sensing to
analyze the transformations that have taken place over the course of a decade.
The complex relationship between human activities and the environment has been the focus
of extensive research and worry. As the global community grapples with swift urbanization,
population expansion, and economic progress, the effects on natural ecosystems are becoming
more evident. A crucial element of this impact is the alteration of vegetation cover, which plays a
significant role in maintaining the ecological equilibrium of our planet.Land serves as the foundation for all human activities and provides the necessary materials for
these activities. As the most crucial natural resource, its utilization by humans results in different
'Land uses,' which are determined by both human activities and the physical characteristics of the
land.
The utilization of land is impacted by human needs and environmental factors. In countries
like India, rapid population growth and the emphasis on extensive resource exploitation can lead
to significant land degradation, adversely affecting the region's land cover.
Therefore, human intervention has significantly influenced land use patterns over many
centuries, evolving its structure over time and space. In the present era, these changes have
accelerated due to factors such as agriculture and urbanization. Information regarding land use and
cover is essential for various planning and management tasks related to the Earth's surface,
providing crucial environmental data for scientific, resource management, policy purposes, and
diverse human activities.
Accurate understanding of land use and cover is imperative for the development planning
of any area. Consequently, a wide range of professionals, including earth system scientists, land
and water managers, and urban planners, are interested in obtaining data on land use and cover
changes, conversion trends, and other related patterns. The spatial dimensions of land use and
cover support policymakers and scientists in making well-informed decisions, as alterations in
these patterns indicate shifts in economic and social conditions. Monitoring such changes with the
help of Advanced technologies like Remote Sensing and Geographic Information Systems is
crucial for coordinated efforts across different administrative levels. Advanced technologies like
Remote Sensing and Geographic Information Systems
9
Changes in vegetation cover refer to variations in the distribution, composition, and overall
structure of plant communities across different temporal and spatial scales. These changes can
occur natural.
ISO/IEC 27001, ISO/IEC 42001, and GDPR: Best Practices for Implementation and...PECB
Denis is a dynamic and results-driven Chief Information Officer (CIO) with a distinguished career spanning information systems analysis and technical project management. With a proven track record of spearheading the design and delivery of cutting-edge Information Management solutions, he has consistently elevated business operations, streamlined reporting functions, and maximized process efficiency.
Certified as an ISO/IEC 27001: Information Security Management Systems (ISMS) Lead Implementer, Data Protection Officer, and Cyber Risks Analyst, Denis brings a heightened focus on data security, privacy, and cyber resilience to every endeavor.
His expertise extends across a diverse spectrum of reporting, database, and web development applications, underpinned by an exceptional grasp of data storage and virtualization technologies. His proficiency in application testing, database administration, and data cleansing ensures seamless execution of complex projects.
What sets Denis apart is his comprehensive understanding of Business and Systems Analysis technologies, honed through involvement in all phases of the Software Development Lifecycle (SDLC). From meticulous requirements gathering to precise analysis, innovative design, rigorous development, thorough testing, and successful implementation, he has consistently delivered exceptional results.
Throughout his career, he has taken on multifaceted roles, from leading technical project management teams to owning solutions that drive operational excellence. His conscientious and proactive approach is unwavering, whether he is working independently or collaboratively within a team. His ability to connect with colleagues on a personal level underscores his commitment to fostering a harmonious and productive workplace environment.
Date: May 29, 2024
Tags: Information Security, ISO/IEC 27001, ISO/IEC 42001, Artificial Intelligence, GDPR
-------------------------------------------------------------------------------
Find out more about ISO training and certification services
Training: ISO/IEC 27001 Information Security Management System - EN | PECB
ISO/IEC 42001 Artificial Intelligence Management System - EN | PECB
General Data Protection Regulation (GDPR) - Training Courses - EN | PECB
Webinars: https://pecb.com/webinars
Article: https://pecb.com/article
-------------------------------------------------------------------------------
For more information about PECB:
Website: https://pecb.com/
LinkedIn: https://www.linkedin.com/company/pecb/
Facebook: https://www.facebook.com/PECBInternational/
Slideshare: http://www.slideshare.net/PECBCERTIFICATION
2. Sequence Formats (1/26/2014)
All sequence formats are ASCII text containing the sequence ID, quality scores, annotation details, comments, and other descriptions of the sequence.
Formats are designed to hold sequence data and other information about the sequence.
3. Why so many formats?
Formats were created based on the information required at each step of analysis, and for efficient data and time management.
Types of sequence file formats:
• Raw sequence files
• Co-ordinate files
• Parameter files
• Annotation files
• Metadata files
Each data format varies in the information it contains.
8. SOLiD output format(s)
CSFASTA: color-space sequence reads in a FASTA-like format.
These reads can be retained and analyzed in color space by downstream software.
The Format Conversion Tool offers options for cleaning CSFASTA files.
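To make the color-space idea concrete, here is a minimal Python sketch of the commonly described SOLiD two-base encoding, where each color digit encodes the transition between adjacent bases and decoding starts from a known primer base. The helper name and transition table are illustrative, not part of any vendor tool:

```python
# Standard SOLiD color transitions: (previous base, color digit) -> next base.
# Color 0 means "same base"; 1, 2, 3 encode the other three transitions.
NEXT_BASE = {
    ('A', '0'): 'A', ('A', '1'): 'C', ('A', '2'): 'G', ('A', '3'): 'T',
    ('C', '0'): 'C', ('C', '1'): 'A', ('C', '2'): 'T', ('C', '3'): 'G',
    ('G', '0'): 'G', ('G', '1'): 'T', ('G', '2'): 'A', ('G', '3'): 'C',
    ('T', '0'): 'T', ('T', '1'): 'G', ('T', '2'): 'C', ('T', '3'): 'A',
}

def decode_colorspace(read: str) -> str:
    """Translate a CSFASTA read (primer base followed by color digits)
    into base space by walking the transition table."""
    base, decoded = read[0], []
    for color in read[1:]:
        base = NEXT_BASE[(base, color)]
        decoded.append(base)
    return ''.join(decoded)

decode_colorspace('A123')  # -> 'CTA'
```

One reason CSFASTA reads are often kept in color space for alignment is that a single sequencing error changes one color, while a true SNP changes two adjacent colors, which helps distinguish errors from variants.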
9. Read Length
• Sanger read lengths: ~800-2000 bp
• Generally, we define short reads as anything below 200 bp:
  − Illumina (100-250 bp)
  − SOLiD (75 bp max)
  − Ion Torrent (200-300 bp max, currently)
  − Roche 454 (400-800 bp)
• Even with these platforms it is cheaper to produce short reads (e.g. 50 bp) rather than 100 or 200 bp reads.
• Diminishing returns: for some applications 50 bp is more than sufficient:
  − Resequencing of smaller organisms
  − Bacterial de novo assembly
  − ChIP-Seq
  − Digital gene expression profiling
  − Bacterial RNA-seq
12. Formats for Genome/Gene annotation
BED format (genome-browser tracks)
GFF format (gene/genome features)
BioXSD (XML; any annotation; under development)
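As a small illustration of the BED track format, here is a sketch of parsing one tab-separated BED line. The helper name is hypothetical; note that BED uses 0-based, half-open coordinates, whereas GFF uses 1-based, inclusive coordinates:

```python
def parse_bed_line(line: str) -> dict:
    """Parse a minimal BED line: chrom, start, end, and optional name.
    BED coordinates are 0-based and half-open, so length = end - start."""
    fields = line.rstrip('\n').split('\t')
    return {
        'chrom': fields[0],
        'start': int(fields[1]),
        'end': int(fields[2]),
        'name': fields[3] if len(fields) > 3 else None,
    }

rec = parse_bed_line('chr1\t1000\t5000\tmy_feature')
length = rec['end'] - rec['start']  # 4000, no off-by-one correction needed
```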
14. Points to remember on Data Formats
For base-call data: "standard" FASTQ (Sanger/Phred encoding)
For read alignments: SAM/BAM/MAQ format
For annotation results: e.g. GFF or BED format
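A FASTQ record is four lines: `@id`, sequence, `+` separator, and a quality string in which each character encodes a Phred score (Sanger encoding: score = ASCII value − 33). A minimal sketch with illustrative helper names:

```python
import io

def read_fastq(handle):
    """Yield (read_id, sequence, quality) tuples from a FASTQ handle,
    assuming the common 4-lines-per-record layout (no wrapped sequences)."""
    while True:
        header = handle.readline().rstrip()
        if not header:
            return
        seq = handle.readline().rstrip()
        handle.readline()                     # '+' separator line, ignored
        qual = handle.readline().rstrip()
        yield header[1:], seq, qual           # strip leading '@'

def phred_scores(qual: str, offset: int = 33) -> list:
    """Decode a Sanger/Phred+33 quality string into integer scores."""
    return [ord(c) - offset for c in qual]

record = next(read_fastq(io.StringIO('@r1\nACGT\n+\nII!I\n')))
# 'I' is ASCII 73 -> Phred 40; '!' is ASCII 33 -> Phred 0
```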
16. All platforms have errors (Illumina, SOLiD/ABI-Life, Roche 454, Ion Torrent)
1. Removal of low-quality bases / low-complexity regions
2. Removal of adaptor sequences
3. Homopolymer-associated base-call errors (3 or more identical DNA bases) cause a higher number of (artificial) frameshifts
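Since homopolymer runs of 3 or more identical bases are the error-prone spots mentioned above, a simple sketch for locating them in a read (the helper name is illustrative):

```python
import re

def homopolymer_runs(seq: str, min_len: int = 3):
    """Return (position, run) pairs for runs of >= min_len identical
    bases, the regions prone to artificial indels/frameshifts."""
    return [(m.start(), m.group())
            for m in re.finditer(r'(A+|C+|G+|T+)', seq)
            if len(m.group()) >= min_len]

homopolymer_runs('ACGGGGTTA')  # [(2, 'GGGG')]
```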
17. Illumina artefacts
• Under-represented GC-rich regions (introduced at the PCR and sequencing steps)
• The GGC/GCC motif is associated with low quality and mismatches
• Low-quality reads: Phred score < 20
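Screening for these artefacts only needs two per-read numbers: the GC fraction and whether the error-associated motif is present. A minimal sketch (function names are illustrative):

```python
def gc_content(seq: str) -> float:
    """Fraction of G and C bases; useful for spotting GC-coverage bias."""
    s = seq.upper()
    return (s.count('G') + s.count('C')) / len(s)

def has_ggc_motif(seq: str) -> bool:
    """Flag reads carrying the GGC/GCC motif noted above as error-prone."""
    s = seq.upper()
    return 'GGC' in s or 'GCC' in s
```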
18. Need for QC & Preprocessing
QC analysis of sequence data is extremely important for meaningful downstream analysis:
• To analyze problems in quality scores / statistics of the sequencing data
• To check whether further analysis of the sequence is possible
• To remove redundancy (filtering)
• To remove low-quality reads from analysis
• To remove adapter contamination
Highly efficient and fast processing tools are required to handle large volumes of data.
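The low-quality-read filter described above can be sketched in a few lines. This is a generic illustration of the idea, not the NGS QC Toolkit's implementation (which is in Perl); thresholds and helper names are assumptions:

```python
def mean_quality(qual: str, offset: int = 33) -> float:
    """Mean Phred score of a read's quality string (Sanger encoding)."""
    return sum(ord(c) - offset for c in qual) / len(qual)

def passes_qc(seq: str, qual: str,
              min_mean_q: float = 20.0, max_n_frac: float = 0.1) -> bool:
    """Keep a read only if its mean quality clears the threshold and
    it does not contain too many ambiguous N bases."""
    if mean_quality(qual) < min_mean_q:
        return False
    return seq.upper().count('N') / len(seq) <= max_n_frac
```

In practice such a predicate is applied while streaming the FASTQ file, so the full dataset never has to be held in memory.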
19. Need for QC & Preprocessing (contd.)
The quality of data is very important for various downstream analyses, such as sequence assembly and single-nucleotide polymorphism identification.
Most of the programs available for downstream analyses do not provide a utility for quality checking and filtering of NGS data before processing.
20. NGS QC Toolkit & FastQC
NGS QC Toolkit performs quality checking and filtering of high-quality reads.
The toolkit is a standalone, open-source application freely available at http://www.nipgr.res.in/ngsqctoolkit.html
It is implemented in the Perl programming language.
It supports QC of sequencing data generated on the Roche 454 and Illumina platforms.
Additional tools aid QC (a sequence format converter and trimming tools) and analysis (statistics tools).
FastQC can be used only for preliminary analysis.
26. FastQC
• Basic statistics
• Quality per base position
• Per-sequence quality distribution
• Nucleotide content per position
• Per-sequence GC distribution
• Per-base GC distribution
• Per-base N content
• Length distribution
• Overrepresented / duplicated sequences
• K-mer content
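The per-base quality module boils down to averaging Phred scores column-wise across reads. A minimal sketch for equal-length reads (the helper name is illustrative, and real FastQC reports quartiles and medians as well as means):

```python
def per_base_mean_quality(qual_strings, offset: int = 33):
    """Mean Phred score at each read position across equal-length reads,
    the statistic behind a per-base quality plot."""
    n = len(qual_strings)
    length = len(qual_strings[0])
    return [sum(ord(q[i]) - offset for q in qual_strings) / n
            for i in range(length)]

per_base_mean_quality(['II!!', 'II!!'])  # [40.0, 40.0, 0.0, 0.0]
```

A sharp drop in this curve toward the 3' end is the classic signature that motivates quality trimming.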
40. 8. K-mer content
Any k-mer showing more than a 3-fold overall enrichment, or a 5-fold enrichment at any given base position, will be reported by this module.
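The overall-enrichment idea can be sketched as counting all k-mers and flagging those far above the mean count. This is a simplified stand-in for FastQC's statistic (which models expected counts per position, not just the mean), and the function names are illustrative:

```python
from collections import Counter

def kmer_counts(reads, k: int = 5) -> Counter:
    """Count every overlapping k-mer across a collection of reads."""
    counts = Counter()
    for seq in reads:
        for i in range(len(seq) - k + 1):
            counts[seq[i:i + k]] += 1
    return counts

def enriched_kmers(reads, k: int = 5, fold: float = 3.0) -> dict:
    """Flag k-mers whose count exceeds `fold` times the mean k-mer count,
    a rough analogue of the 3-fold overall-enrichment rule."""
    counts = kmer_counts(reads, k)
    mean = sum(counts.values()) / len(counts)
    return {km: c for km, c in counts.items() if c > fold * mean}
```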
41. 9. Overrepresented / duplicate sequences
The analysis of overrepresented sequences will spot an increase in any exactly duplicated sequences.
Too many duplicated sequences are usually due to sequencing problems (e.g. PCR over-amplification).
This module will issue a warning if any sequence is found to represent more than 0.1% of the total.
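The 0.1% rule above translates directly into a frequency check over exact-duplicate counts. A minimal sketch (the function name is illustrative):

```python
from collections import Counter

def overrepresented(reads, threshold: float = 0.001) -> dict:
    """Return sequences whose exact copies make up more than `threshold`
    (default 0.1%) of all reads, mirroring the warning rule above."""
    counts = Counter(reads)
    total = len(reads)
    return {seq: n for seq, n in counts.items() if n / total > threshold}
```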
42. QC Report

Sequence Statistics
Total No. of Sequences     6970943
Avg. Sequence Length       54
Max Sequence Length        54
Min Sequence Length        54
Total Sequence Length      376430922
Total N bases              14254521
% N bases                  3.78676
No. of Sequences with Ns   278635
% Sequences with Ns        3.99709

Quality Statistics
Total HQ bases             334195496
% HQ bases                 88.78
Total HQ reads             6350256
% HQ reads                 91.0961
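Each percentage in the report is just the corresponding count divided by its total. A quick sketch recomputing them from the figures above, as a sanity check on how the report is derived:

```python
def pct(part: int, whole: int) -> float:
    """Percentage of `part` relative to `whole`."""
    return 100.0 * part / whole

# Counts taken from the QC report above
total_reads, hq_reads = 6_970_943, 6_350_256
total_bases, hq_bases = 376_430_922, 334_195_496
n_bases, reads_with_n = 14_254_521, 278_635

assert round(pct(hq_reads, total_reads), 4) == 91.0961   # % HQ reads
assert round(pct(hq_bases, total_bases), 2) == 88.78     # % HQ bases
assert round(pct(n_bases, total_bases), 5) == 3.78676    # % N bases
assert round(pct(reads_with_n, total_reads), 5) == 3.99709  # % reads with Ns
```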
Alignment statistics