NGS
PHARMACOGENOMICS
By:
Ebbali Harshitha
2021502009
CONTENTS:
● Introduction to NGS
● NGS output file formats
● SRA
● Applications of NGS
● Drawbacks of NGS
● References
Introduction to NGS:
● Next Generation Sequencing (NGS) is a powerful platform that has
enabled the sequencing of thousands to millions of DNA and RNA
molecules simultaneously.
● Next-generation sequencing (NGS), also known as high-throughput
sequencing, is the catch-all term used to describe a number of
different modern sequencing technologies.
● These recent technologies allow us to sequence DNA and RNA much
more quickly and cheaply than the previously used Sanger sequencing,
and as such have revolutionized the study of genomics and molecular
biology.
STEPS INVOLVED IN NGS: [figure not reproduced]
NGS output file formats
Sequence file formats differ in the information they store about each
sequence. The most common file formats in the NGS world are:
1. SFF
2. FASTA
3. FASTQ
SFF file format:
● SFF (Standard Flowgram Format) is a binary file format used to encode the
results of sequencing from the 454 Life Sciences platform for
high-throughput sequencing.
● SFF files can be converted to FASTQ format with sff2fastq or
seq_crumbs.
Fasta file format:
● The FASTA format is a simple text-based format. Each sequence record starts
with a “>” followed by the sequence name, a space and, optionally, a description.
The sequence itself follows on the next lines.
● If quality information is available, it is usually provided in a second
FASTA-style file. In this case both the sequence file and the quality file
should list the sequences in the same order.
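The header rule described above can be sketched in a few lines of Python. This is a minimal illustration, not a full parser: the function name parse_fasta and the example records are hypothetical, and real projects would normally use an established library instead.

```python
def parse_fasta(lines):
    """Yield (name, description, sequence) for each FASTA record."""
    name = None
    description = ""
    chunks = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if name is not None:
                yield name, description, "".join(chunks)
            # Name is everything up to the first space; the rest is the description.
            name, _, description = line[1:].partition(" ")
            chunks = []
        elif name is not None:
            chunks.append(line)  # sequence may span several lines
    if name is not None:
        yield name, description, "".join(chunks)


# Hypothetical example data, made up for illustration only.
example = """>seq1 hypothetical example read
ACGTACGT
TTGA
>seq2
GGCC
""".splitlines()

records = list(parse_fasta(example))
for name, description, sequence in records:
    print(name, repr(description), sequence)
```

Splitting the header at the first space keeps the optional description separate from the sequence name, which matters when the name is used to match entries against a separate quality file.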
Fastq file format:
● The FASTQ format was developed to provide a convenient way of storing the
sequence and the quality scores in the same file. FASTQ files are text files
in which each record has four lines:
1. A sequence identifier line, starting with “@”, with information about the
sequencing run and the cluster.
2. The sequence (the base calls).
3. A separator, which is simply a plus (+) sign.
4. The base call quality scores. These are Phred+33 encoded, using ASCII
characters to represent the numerical quality scores.
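The four-line record structure and the Phred+33 encoding can be illustrated with a short Python sketch. The function names and the example read are hypothetical; the sketch assumes each record occupies exactly four lines, which is the common layout but not something every FASTQ file guarantees.

```python
def parse_fastq(lines):
    """Yield (identifier, sequence, scores) from four-line FASTQ records."""
    stripped = [line.rstrip("\n") for line in lines if line.strip()]
    for i in range(0, len(stripped) - 3, 4):
        header, sequence, separator, quality = stripped[i:i + 4]
        if not header.startswith("@") or not separator.startswith("+"):
            raise ValueError(f"malformed FASTQ record at line {i + 1}")
        if len(sequence) != len(quality):
            raise ValueError("sequence and quality lengths differ")
        # Phred+33: subtract 33 from each character's ASCII code.
        scores = [ord(ch) - 33 for ch in quality]
        yield header[1:], sequence, scores


def phred_to_error_probability(score):
    """Convert a Phred score Q to an error probability, P = 10 ** (-Q / 10)."""
    return 10 ** (-score / 10)


# Hypothetical record, made up for illustration only.
example = [
    "@read1 hypothetical run and cluster info",
    "GATTACA",
    "+",
    "IIII5+!",
]

fastq_records = list(parse_fastq(example))
for identifier, sequence, scores in fastq_records:
    print(identifier, sequence, scores)
```

In this example the character "I" (ASCII 73) decodes to a score of 40 and "!" (ASCII 33) decodes to 0, the lowest possible score.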
What does a FASTQ file look like? [figure not reproduced]
ASCII CODES: [table not reproduced]
SRA
● SRA (Sequence Read Archive) is NCBI's archive of sequencing data; the SRA
format is the file format in which all NCBI SRA content is provided.
● SRA files are binary files, so specific tools are needed to extract the information.
● NCBI provides a toolkit (SRA Toolkit) for working with these binary files.
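As a rough sketch of how the SRA Toolkit might be driven from a script, the Python below builds and runs a fasterq-dump command, the toolkit's command for extracting FASTQ files. It assumes the toolkit is installed and on the PATH; the helper names are hypothetical, and the accession SRR000001 and output directory are placeholders to be replaced with real values.

```python
import shutil
import subprocess


def build_fasterq_dump_command(accession, outdir="."):
    """Return the argument list for extracting FASTQ files from one SRA run."""
    return ["fasterq-dump", accession, "--outdir", outdir, "--split-files"]


def download_reads(accession, outdir="."):
    """Run fasterq-dump; assumes the SRA Toolkit is installed locally."""
    if shutil.which("fasterq-dump") is None:
        raise RuntimeError("fasterq-dump not found on PATH; install the SRA Toolkit first")
    subprocess.run(build_fasterq_dump_command(accession, outdir), check=True)


# Placeholder accession and directory; nothing is downloaded until
# download_reads() is called explicitly.
command = build_fasterq_dump_command("SRR000001", "reads")
print(" ".join(command))
```

Keeping the command construction separate from the call that runs it makes it easy to inspect what would be executed before any data is downloaded.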
Compressed files
● Sequence text files are sometimes distributed in compressed form to save
disk space.
● The most common compression formats are gzip and bgzip.
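Compressed sequence files can usually be read without unpacking them first. The sketch below uses Python's standard gzip module to count records in a FASTQ file; the function name and the temporary example file are hypothetical, and it assumes four lines per record. Files written by bgzip are gzip-compatible, so the same approach should work for them.

```python
import gzip
import os
import tempfile


def count_fastq_records(path):
    """Count four-line FASTQ records in a plain or gzip-compressed file."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as handle:
        return sum(1 for _ in handle) // 4


# Write a small made-up compressed file, then read it back.
with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, "example.fastq.gz")
    with gzip.open(path, "wt") as handle:
        handle.write("@read1\nACGT\n+\nIIII\n@read2\nTTGA\n+\nIIII\n")
    n_records = count_fastq_records(path)

print(n_records)
```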
Applications of NGS:
NGS allows labs to:
● Rapidly sequence whole genomes.
● Deeply sequence target regions.
● Utilize RNA sequencing (RNA-Seq) to discover novel RNA variants and splice sites, or
quantify mRNAs for gene expression analysis.
● Analyze epigenetic factors such as genome-wide DNA methylation and DNA-protein
interactions.
● Sequence cancer samples to study rare somatic variants.
● Study the human microbiome.
● Identify novel pathogens.
● Support individualised medicine.
● The cost of sequencing a human genome fell from $20-$25 million to under
$1,000 in the decade leading up to 2016.
● The time required to carry out NGS and receive results has also dropped over
recent years. Today, NGS platforms can sequence millions of DNA fragments
simultaneously.
● On top of this, NGS makes it possible for researchers to identify abnormalities
across the whole genome. This means they can detect insertions, deletions,
substitutions, duplications, chromosomal inversions and translocations, and
copy number changes (at the gene and exon level).
Cons of Next-Generation Sequencing:
Despite all the benefits of NGS, the technique does have some cons.
● First, although NGS provides information on many molecular aberrations, the clinical
significance of many identified abnormalities is still unknown.
● Second, NGS requires large data storage capabilities, sophisticated bioinformatics
systems, and fast data processing infrastructures, each of which can be costly.
● Third, while researchers can use NGS to sequence a whole genome, they can only
use data from approximately 3% of the genome in clinical practice.
References:
1. Next Generation Sequencing (NGS) (microbenotes.com)
2. Exploring the Pros and Cons of Next-Generation Sequencing
(marketbusinessnews.com)
3. FASTQ format - Wikipedia
4. Next-Generation Sequencing (NGS) | Explore the technology (illumina.com)
5. What is Next-Generation Sequencing? | Thermo Fisher Scientific - IN
THANK YOU!!

NGS File formats
