RNA SEQUENCING AND CHIP
SEQUENCING
Presented by,
Jyoti Kumari
B.Tech(Bioinformatics)
What is RNA?
 Within a multicellular organism and throughout its
life, its genome stays mostly unchanged. Its cells,
however, can have very distinct appearances
and functions, and can respond differently to
extracellular stimuli.
 These differences are possible because cells make
different use of stretches of the DNA, called genes,
as templates to build functional cellular products in a
process called gene expression. In the first step of
gene expression, known as transcription, the
information in the DNA is used to create ribonucleic
acid molecules (RNA).
 RNA is synthesised using one of the DNA strands
as a template and has a similar chemical
structure, except that its sugar is ribose rather
than deoxyribose and thymine is replaced by
uracil (U). Some RNA molecules are end
products in themselves; others are in turn
used as templates for the creation of other
molecules, proteins, in a process called
translation.
 RNAs that are used as templates for proteins
are called messenger RNAs (mRNAs), and the
ones that are not are non-coding RNAs
(ncRNAs).
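The two steps described above, transcription and translation, can be sketched in a few lines of Python. This is only an illustration: the input sequence is made up, the codon table is a deliberately tiny subset of the genetic code, and the DNA is given as the coding strand (the strand complementary to the template), so the mRNA is obtained simply by replacing T with U.

```python
# Toy codon table: a small subset of the standard genetic code, for illustration only.
CODON_TABLE = {
    "AUG": "M", "UUU": "F", "GGC": "G", "GCU": "A",
    "UAA": "*", "UAG": "*", "UGA": "*",
}

def transcribe(coding_strand):
    """Return the mRNA for a DNA coding strand (every T becomes U)."""
    return coding_strand.upper().replace("T", "U")

def translate(mrna):
    """Translate codon by codon from the first AUG until a stop codon."""
    start = mrna.find("AUG")
    if start == -1:
        return ""
    protein = []
    for i in range(start, len(mrna) - 2, 3):
        amino_acid = CODON_TABLE.get(mrna[i:i + 3], "X")  # X = codon missing from the toy table
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)

mrna = transcribe("ATGTTTGGCGCTTAA")  # hypothetical example sequence
protein = translate(mrna)
print(mrna, protein)
```

A non-coding RNA would simply stop after the `transcribe` step, since it is never used as a template for a protein.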
Transcriptome: RNA WORLD!!
 The transcriptome is the complete set of
transcripts in a cell, and their quantity, for a
specific developmental stage or physiological
condition.
 Understanding the transcriptome is essential
for interpreting the functional elements of
the genome and revealing the molecular
constituents of cells and tissues, and also for
understanding development and disease.
Technologies to deduce and
quantify the transcriptome
 Hybridization-based: Microarrays
Advantages:
1. High throughput
2. Relatively inexpensive
Limitations:
1. Reliance upon existing knowledge about genome
sequence.
2. High background levels owing to cross-hybridization.
3. Limited dynamic range of detection
 Sequence Based: Sanger sequencing of cDNA
or EST libraries
Limitations
1. Low throughput
2. Expensive
 Tag Based: Serial analysis of gene expression
(SAGE), cap analysis of gene expression
(CAGE) and massively parallel signature
sequencing (MPSS)
Advantages
1. High throughput
2. Provide precise, ‘digital’ gene expression
levels.
Limitations
1. Expensive
2. Only a portion of the transcript is analysed.
3. Isoforms are generally indistinguishable
from each other.
 RNA-Sequencing
Advantages
1. It is not limited to detecting transcripts that
correspond to existing genomic sequence.
2. It is helpful for studying complex genomes.
3. High throughput
4. It can also reveal sequence variations (for
example, SNPs) in the transcribed regions.
What is RNA Sequencing?
 RNA sequencing, or RNA-Seq, is a recently
developed approach to transcriptome profiling
that uses deep-sequencing technologies.
 It involves high-throughput shotgun
sequencing of cDNA molecules obtained by
reverse transcription from RNA, using next-
generation sequencing technologies to profile
the RNA molecules within a biological sample in
an effort to determine the primary sequence and
relative abundance of each RNA molecule.
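To make "relative abundance" concrete, here is a minimal sketch of one common normalisation, transcripts per million (TPM). TPM is not named in the slides; it is used here as an assumed example of how raw read counts can be turned into comparable abundances. The gene names, counts and lengths are invented.

```python
def tpm(counts, lengths_bp):
    """Transcripts per million: divide counts by transcript length, then scale to 1e6."""
    rate = {t: counts[t] / lengths_bp[t] for t in counts}
    total = sum(rate.values())
    return {t: r / total * 1_000_000 for t, r in rate.items()}

# Hypothetical read counts and transcript lengths (bp).
counts = {"geneA": 100, "geneB": 300, "geneC": 100}
lengths_bp = {"geneA": 1000, "geneB": 3000, "geneC": 500}

abundance = tpm(counts, lengths_bp)
for gene, value in abundance.items():
    print(gene, round(value))
```

Note that geneB has three times the reads of geneA but the same TPM, because it is also three times as long; longer transcripts yield more fragments at the same abundance.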
A typical RNA-Seq Experiment
Uses of RNA-Seq
1. Identify and quantify both rare and common
transcripts, with over six orders of
magnitude of dynamic range
2. Align sequencing reads across splice
junctions, and detect isoforms, novel
transcripts and gene fusions
3. Perform robust whole-transcriptome
analysis on a wide range of samples,
including low-quality samples
4. Identification of exons and introns and mapping
of their boundaries.
5. Identification of the 5’ and 3’ ends of genes and
identification of transcription start sites.
6. Quantification of exon expression and splicing
variants.
RNA-Sequencing WORKFLOW:
A typical RNA-Seq experiment follows these
steps:
RNA Preparation
 Since the goal of RNA-seq is to characterize the transcriptome,
the first step naturally involves isolating and purifying cellular
RNAs.
 Isolation and purification of RNA typically involve disrupting
cells in the presence of detergents and chaotropic agents.
 After homogenization, RNA can be recovered and purified from
the total cell lysate using either liquid-liquid partitioning or solid-
phase extraction.
 Typically the total RNA is then enriched for messenger RNA
(mRNA). This can be done either by directly selecting mRNA or
by selectively removing ribosomal RNA (rRNA).
 To make the RNA suitable for RNA-seq, it is typically fragmented,
and then its quality and fragment sizes are assessed.
Library Preparation
 In all cases an RNA-seq experiment involves
making a collection of cDNA fragments which
are flanked by specific constant sequences
(known as adapters) that are necessary for
sequencing. This collection (referred to as a
library) is then sequenced using short-read
sequencing, which produces millions of short
sequence reads that correspond to individual
cDNA fragments.
 After obtaining an RNA preparation that is suitable
for RNA-seq the RNA must be converted to double-
stranded complementary DNA (cDNA).
 This comprises two steps:
1. First strand synthesis
In order to convert RNA to DNA, the RNA
must be used as a template for a DNA
polymerase. Most DNA polymerases cannot
use RNA as a template. However,
retroviruses encode a unique type of
polymerase known as reverse transcriptase (RT),
which is able to synthesize DNA using an
RNA template.
2. Second strand synthesis
The second cDNA strand is synthesized by a
DNA polymerase using the RT-synthesized DNA
strand as a template.
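At the sequence level, the two synthesis steps amount to taking reverse complements. The sketch below is only a model of the base pairing, not of the enzymology; the function names and the example RNA are hypothetical. All sequences are written 5' to 3'.

```python
DNA_COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}
RNA_TO_DNA_COMPLEMENT = {"A": "T", "U": "A", "G": "C", "C": "G"}

def first_strand(rna):
    """First cDNA strand: DNA reverse complement of the RNA template."""
    return "".join(RNA_TO_DNA_COMPLEMENT[base] for base in reversed(rna))

def second_strand(first):
    """Second cDNA strand: reverse complement of the first strand."""
    return "".join(DNA_COMPLEMENT[base] for base in reversed(first))

rna = "AUGGCUUAA"            # hypothetical RNA fragment
cdna_1 = first_strand(rna)   # what reverse transcriptase would produce
cdna_2 = second_strand(cdna_1)
print(cdna_1, cdna_2)
```

The second strand carries the same sequence as the original RNA, with T in place of U, which is why reads from the cDNA library can be interpreted as reads of the transcript.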
 To achieve the highest quality of data, library
quality is validated and the cDNA libraries are
accurately quantified before sequencing.
 Assessing the fragment size distribution of
final RNA-seq libraries is also important. The
sizes of the molecules should fall into the
expected size range.
 Fragment sizes can be evaluated via
electrophoresis, preferably using a sensitive
instrument such as an Agilent Bioanalyzer.
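Once fragment sizes have been measured, checking them against the expected range is simple arithmetic. The sketch below assumes an expected range of 200 to 500 bp purely for illustration; the real range depends on the library preparation protocol, and the sizes listed are invented.

```python
from statistics import median

def size_summary(fragment_sizes_bp, expected_min=200, expected_max=500):
    """Summarise a fragment size distribution against an assumed expected range."""
    in_range = [s for s in fragment_sizes_bp if expected_min <= s <= expected_max]
    return {
        "median_bp": median(fragment_sizes_bp),
        "fraction_in_range": len(in_range) / len(fragment_sizes_bp),
    }

sizes = [180, 250, 300, 320, 350, 400, 410, 650]  # hypothetical fragment sizes (bp)
summary = size_summary(sizes)
print(summary)
```

A low fraction in range would suggest over- or under-fragmentation and a library worth re-examining before sequencing.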
Sequencing
 For sequencing, a sequencing platform is
required.
 The current leading platform for RNA-seq is
Illumina. This platform enables deep
sequencing, which is generally important for
RNA-seq, and provides sufficiently long, low-
error reads that are suitable for mapping to
reference genomes and transcriptome
assembly.
 Other platforms include PacBio.
Analysis
 Sequence reads are mapped with a combination of
SOAP and BLAT. SOAP is a very fast mapping
program, and BLAT contains powerful options for
mapping gapped reads.
 Most tags will map back to a unique place in the
genome.
 There are two special types of tags:
1. Gapped alignments: These are reads which
putatively span an intron.
2. 3’ end tags: These are sequence reads with a non-
genomic run of “A” or “T” bases, indicating that
they are the site of a polyadenylation event.
These are useful in determining the 3’ ends of
genes.
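A rough way to spot candidate 3’ end tags is to look for a run of A bases at the end of a read, or a run of T bases at its start when the read comes from the opposite strand. The sketch below does only that; the minimum run length of 8 is an arbitrary assumption, and a real pipeline would also have to confirm against the reference that the run is non-genomic, which is not shown here.

```python
import re

def classify_three_prime_tag(read, min_run=8):
    """Return (kind, remaining_sequence) for a candidate 3' end tag, else None."""
    tail = re.search(r"A+$", read)
    if tail and len(tail.group()) >= min_run:
        return "polyA_tail", read[:tail.start()]
    head = re.match(r"T+", read)
    if head and len(head.group()) >= min_run:
        return "polyT_head", read[head.end():]
    return None

# Hypothetical reads.
print(classify_three_prime_tag("GATTACAGGCAAAAAAAAAA"))
print(classify_three_prime_tag("TTTTTTTTTTGCCTGTAATC"))
print(classify_three_prime_tag("GATTACA"))
```

The remaining sequence, with the run removed, is the part that would be mapped to the genome to locate the polyadenylation site.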
 With the help of Bioinformatics we can
calculate the expression level for each base
pair of the genome.
 It is then possible to annotate the genome with
information on:
1. where introns are located (via gapped
alignments)
2. 5’ ends (via sudden expression drops)
3. 3’ ends (via sudden expression drops and 3’
end tags).
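The idea of a per-base expression level and a "sudden expression drop" can be illustrated as follows. This is a simplified sketch: alignments are given as hypothetical half-open, 0-based (start, end) intervals on a tiny 12 bp toy genome, and the drop threshold of 3 reads is an arbitrary assumption.

```python
def per_base_coverage(genome_length, alignments):
    """Count how many aligned reads cover each position of the genome."""
    coverage = [0] * genome_length
    for start, end in alignments:
        for pos in range(start, end):
            coverage[pos] += 1
    return coverage

def sudden_drops(coverage, min_drop=3):
    """Positions where coverage falls by at least min_drop relative to the previous base."""
    return [i for i in range(1, len(coverage))
            if coverage[i - 1] - coverage[i] >= min_drop]

alignments = [(2, 8), (2, 8), (2, 8), (3, 8), (4, 10)]  # hypothetical mapped reads
coverage = per_base_coverage(12, alignments)
drops = sudden_drops(coverage)
print(coverage)
print(drops)
```

Here coverage falls from 5 to 1 at position 8, which would mark a candidate transcript boundary in this toy example.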
ChIP Sequencing
 ChIP sequencing, also known as ChIP-Seq, is a method
used to analyze protein interactions with DNA.
 ChIP stands for Chromatin Immuno-Precipitation and seq
refers to the high throughput sequencing to detect bound
genomic locations.
 ChIP-Seq combines chromatin immunoprecipitation (ChIP)
with massively parallel DNA sequencing to identify the
binding sites of DNA-associated proteins.
 It can be used to map global binding sites precisely for any
protein of interest.
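Mapping binding sites from ChIP-Seq reads comes down to finding regions where reads pile up. The sketch below compares read counts per genomic window in a ChIP sample against a control sample and reports windows with a high fold enrichment. The control sample, the fold threshold of 4 and the pseudocount are all assumptions made for illustration (the slides do not describe a specific peak-calling method), and both samples are assumed to have comparable sequencing depth.

```python
def enriched_windows(chip_counts, control_counts, min_fold=4.0, pseudocount=1.0):
    """Return indices of windows whose ChIP/control ratio is at least min_fold."""
    hits = []
    for i, (chip, control) in enumerate(zip(chip_counts, control_counts)):
        fold = (chip + pseudocount) / (control + pseudocount)
        if fold >= min_fold:
            hits.append(i)
    return hits

# Hypothetical read counts per fixed-size genomic window.
chip = [3, 4, 40, 55, 5, 2]
control = [4, 3, 5, 6, 4, 3]
peaks = enriched_windows(chip, control)
print(peaks)
```

Dedicated peak callers use statistical models rather than a fixed fold cut-off, but the underlying comparison of signal against background is the same.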
USES
 ChIP-seq is used primarily to determine how transcription
factors and other chromatin-associated proteins
influence phenotype-affecting mechanisms.
 Determining how proteins interact with DNA to
regulate gene expression is essential for fully
understanding many biological processes and disease
states.
 Specific DNA sites in direct physical interaction with
transcription factors and other proteins can be isolated
by chromatin immunoprecipitation.
 ChIP produces a library of target DNA sites bound to a
protein of interest in vivo.
Overview of ChIP sequencing
Why ChIP-Seq is a
better approach
 High Quality Data: Positional precision of
mapped binding sites ± 50 bp
 Wide Dynamic Range: Robust quantification for
determining binding specificities of varying
strengths.
 High Signal-to-Noise-Ratio: Lower background
than ChIP-chip, no cross-hybridization.
 Genome-Wide Analysis: Identifies any binding
sites, not limited to array features.
 Low Starting Material Requirement: Robust
output from as little as 10 ng of precious input.
THANK YOU!
