Transcriptomics approaches

ADVANCES IN TRANSCRIPTOMICS
AND IT’S APPROACHES
CHARUPRIYA CHAUHAN
ID- 52616
DOCTORAL SEMINAR I
Setia Pramana 1

 Transcriptome
 Approaches for transcriptome analysis
 Candidate gene approach
 Hybridisation based approaches
 Sequencing based approaches
 Emerging sequencing approaches
 Applications of transcriptomics
 Case studies
Overview

CENTRAL DOGMA OF MOLECULAR BIOLOGY
GENOME TRANSCRIPTOME PROTEOME
Complete set of transcripts
and relative levels of
expression in a particular
cell or tissue under defined
conditions at a given time
Complete DNA content
of an organism with all
its gene and regulatory
sequences
Complete collection of
proteins and their relative
levels in each cell

TRANSCRIPTOME
The ‘transcriptome’ is defined as the complete
complement of RNA molecules generated by a cell
or population of cells.
The term was first proposed by Charles Auffray
in 1996.
Charles Auffray

The study of the complete set of RNAs (transcriptome) encoded by the genome of a
specific cell or organism at a specific time or under a specific set of conditions is called
Transcriptomics.
In transcriptomics, the expression of genes by a genomeis studied :
 Qualitatively (identifying which genes are expressed and which are not)
 Quantitatively (measuring varying levels of expression for different genes)
TRANSCRIPTOMICS

Milestones in transcriptome analysis
YEAR MILESTONE
1965 Sequence of the first RNA molecule determined
1977 Development of the Northern blot technique and the Sanger sequencing method
1989 Reports of RT-PCR experiments for transcriptome analysis
1991 First high-throughput EST sequencing study
1992 Introduction of Differential Display (DD) for the discovery of differentially expressed genes
1995 Reports of the microarray and Serial Analysis of Gene Expression (SAGE) methods
2001 Draft of the Human Genome completed
2005 First next-generation sequencing technology (454/Roche) introduced to the market
2006 First transcriptome sequencing studies using a next-generation technology (454/Roche)

Why transcriptomics is important?
Transcriptome profiling provides clues to:
Expressed sequences and genes of genome
 Gene regulation and regulatory sequences
 Gene function annotation
 Functional differences between tissue and cell types
 Identification of candidate genes for any given process or disease

I. To catalogue all species of transcripts, including mRNAs, noncoding RNAs and
small RNAs.
II. To determine the transcriptional structure of genes, in terms of their start sites,
5′ and 3′ ends, splicing patterns and other post-transcriptional modifications.
III. Toquantify the changing expression levels of each transcript during development
and under different conditions.
Transcriptomics aims:

• Northern Blotting Analysis
• quantitative real-time RT-PCR
Candidate gene approaches
• Microarray expression profiling
Hybridization based approaches
• EST libraries
• Serial analysis of gene expression (SAGE)
• CAGE
• RNA sequencing
Sequencing based approaches
Approaches for transcriptome mining

Northern Blotting Analysis
The northern blot, or RNA blot, is a technique used in molecular biology research to study
gene expression by detection of RNA (or isolated mRNA) in a sample.
The northern blot technique was developed in 1977 by James Alwine, David Kemp, and
George Stark at Stanford University.

Reverse transcription polymerase chain reaction
(RT-PCR)
Application
• RT-PCR is used to clone expressed genes by reverse transcribing the RNA of interest
into its DNA complement through the use of reverse transcriptase.
• Utilized for quantification of RNA, by incorporating qPCR into the technique. The
combined technique, described as quantitative real-time RT-PCR ( RT-qPCR)
A variant of polymerase chain reaction (PCR), commonly used in molecular biology to
detect RNA expression.

Comparision between RT-PCR and PCR

Probes for qRT PCR
SYBR Green
Taqman Probe

Fluorescence emission is measured continuously during the PCR reaction and ▵Rn (increase in fluorescence emission,
from which the background fluorescence signal is subtracted) is plotted against cycle number. The threshold cycle (Ct) is
the cycle at which the fluorescence exceeds a chosen threshold.
Fig: PCR amplification plot
Intensity
of
fluoroscence
Cycle number
PCR amplification plot

One-step vs. Two-step RT-qPCR
One-step assays combine reverse transcription and PCR in a single tube and buffer, using
a reverse transcriptase along with a DNA polymerase. One-step RT-qPCR only utilizes
sequence-specific primers.
In two-step assays, the reverse transcription and PCR steps are performed in separate
tubes, with different optimized buffers, reaction conditions, and priming strategies.

MICROARRAY
 Microarray is a nucleic acid hybridization based, high throughput technique
developed to quantitate gene expression levels at the whole genome scale.
 A microarray is a pattern of ssDNA probes which are immobilized on a surface
(called a chip or a slide) in a regular pattern of spots and each spot containing
millions of copies of a unique DNA probe.
 Principle: base-pairing hybridization
 The technique was first time used for “Quantitative Monitoring of Gene
Expression Patterns with a complementary DNA microarray” by
Patrick Brown, Mark Schena and colleague published in Science (1995). Mark Schena
“Father of Microarray Technology”

Methodology
Microarray analysis can be divided into
1. Probe production
Specific sequences are immobilized to a surface
and reacted with labelled cDNA targets.
2. Target (cDNA) production
mRNA is extracted from the sample, converted to
cDNA and labeled using fluorescent dyes (usually
Cy3 or Cy5).
The array

3. Hybridisation
 The chip is exposed to a solution
containing extracted labelled cDNA
 Complementary nucleic acid sequences get
pair via hydrogen bonds.
 Washing off of non-specific bonding
sequences .

4. Scanning
 The array is scanned to measure fluorescent
label.
 Fluorescently labelled target sequences that
bind to a probe sequence generate a signal.
 The signal depends on.
 The hybridization conditions,
ex: temperature
 washing after hybridization
 Total strength of the signal, depends upon the
amount of target sample.

Microarray
Low density
spotted array
High density
short probe array
 prepared by using cDNA known as cDNA
chips , cDNAs are amplified using PCR than
Spotted by inkjet technology
 Probes are >1000 nucleotides in length
 Two different fluorophores can be used to
label the test and control sample (two-channel
detection)
 Cannot identify alternative splicing events
 short DNA oligonucleotides are directly
synthesised on solid microarray substrate
using photolithgraphy (Affymatrix) or spotted by
ink-jet printing.
 Probes are 70-80 nucleotides in length
 Uses a single fluorescent lable (single channel)
 Can identify alternative splicing events
 Commercial oligonucleotide chips are available
(e.g. Affymetrix, Inc. GeneChip system)

The Colours of a Microarray
cDNA chip Affymatrix gene chip

Applications of Microarray
Gene expression profiling
Gene fusions detection
Alternative splicing detection
SNP detection
TILING array
Chip

ADVANTAGES
High-throughput
Fast
Relatively inexpensive
LIMITATIONS
Reliance upon existing knowledge about genomesequence
High background levels owing to cross-hybridization
A limited dynamic range of detection due to both background and saturation of signals
Comparing expression levels across different experiments is often difficult and can require
complicated normalization methods

SAGE - Serial Analysis of Gene Expression
 SAGE invented at Johns Hopkins University in USA
(Oncology Center) by Dr. Victor Velculescu in 1995.
• SAGE is an approach that allows rapid and detailed
analysis of overall gene expression patterns.
• SAGE provides quantitative and comprehensive
expression profiling in a given cell population.
Dr. Victor Velculescu

Principle Underlining SAGE methodology
 A short sequence tag (10-14bp) contains sufficient information to uniquely identify a transcript provided that
tag is obtained from a unique position within each transcript.
 Sequence tag can be linked together to form long serial molecules that can be cloned and sequenced.
 Quantitation of the number of times a particular tag is observed provides the expression level of the
corresponding transcript.

Advantages
 mRNA sequence does not need to be known prior, so genes of variants which are
not known can be discovered.
 Its more accurate as it involves direct counting of the number of transcripts.
Disadvantages
 Length of gene tag is extremely short (13 or 14bp), so if the tag is derived from an
unknown gene, it is difficult to analyze with such a short sequence.
 Type II restriction enzyme does not yield same length fragments.

CAGE
 Cap analysis gene expression (CAGE) is a gene expression technique used in molecular biology to
produce a snapshot of the 5′ end of the messenger RNA population in a biological sample.
 First published by Hayashizaki, Carninci and coworkers in 2003.
 Aims to identify TIS and promoters.
 Collects 21 bp from 5’ ends of cap purified cDNA.
 The method essentially uses full-length cDNAs , to the 5’ ends of which linkers are attached.
 This is followed by the cleavage of the first 20 base pairs by class II restriction enzymes, PCR,
concatamerization, and cloning of the CAGE tags

ESTs
 In 1983, SD Putney for the first time
demonstrated the use of cDNA in identification of
genome.
 In 1991 Adams and co-workers coined the term
EST.
 They are the tiny sequences of cistron randomly
selected from genome library and can be used to
identify and map the whole genome of any
particular species. ESTs are usually 200 to 500
nucleotides long and are generated by sequencing
the ends of DNA.
Figure: Method of construction of ESTs from
nascent DNA.
 Use of EST
 Identify unknown gene and map their position in a
genome.
 Provide simple and inexpensive path for
discovering new gene.
 Genome map construction.
 Characterization of expressed gene

RNA-Sequencing
 RNA-seq, also called whole-transcriptome shotgun sequencing, refers to the use of high-throughput
sequencing technologies to reveal the presence and quantity of RNA in a biological sample at a given
moment.
Types of RNA sequencing
1.Direct sequencing
2.Indirect sequencing (next-generation sequencing )

Direct RNA sequencing
 Direct single molecule RNA sequencing without prior cDNA conversion
 Sequencing by synthesis
 Polyadenylation by poly (A) polymerase I (PAPI) from E. coli
 Blocking by 3′ deoxyATP
 Poly (A) tail ~ 150 nucleotides
 Each RNA molecule is filled in with dTTP and polymerase and than
locked in position VT-A, VT-C, VT-G, stopping the further nucleotide
addition.
 Unincorporated dye labelled nucleotides are washed away, images are
taken.
 Flourescent dye and inhibitor are cleaved off from the incorporated
nucleotide, making it suitable for additional rounds of incorporation.
 Repeating this cycle of rinsing, imaging and cleaving provides a set of
images that are aligned and are used to generate the sequence information
for each individual RNA molecule with real time image processing.
 Requirement of minor RNA quantities
 No biases due to cDNA synthesis, end repair, ligation and amplification
procedures
 Potentially useful to study short RNA species

Sequencing using NGS
•Next generation sequencing is also known as massively-parallel sequencing
•It enables large amount sequencing to be performed in a single assay
•Mostly produce short reads <400bp
•Read numbers vary from ~ 1 million to ~ 1 billion per run

Next-generation sequencing- Workflow
cDNA fragmentation and
in vitro adaptor ligation
emulsion PCR bridge PCR
Pyrosequencing Sequencing-by-ligation Sequencing-by-synthesis
1
2
3
1
2
3
SOLiD platform
ROCHE/454 sequencing ILLUMINA/Solexa technology
Library preparation
Clonal amplification
Cyclic array sequencing

Data analysis for mRNA-seq: key steps
 Mapping reads to the reference genome
 Read mapping of 454 sequencers can be done by conventional
sequence aligners.
 Short read aligner needed for Illumina or SOLiD reads
 Quantifying the known genes
 Prediction of novel transcripts
 Assembly of short reads: comparative vs. de novo

Next generation sequencing (NGS) techniques
Perticulars 454 Sequencing Illumina/Solexa ABI SOLiD
Sequencing Chemistry Pyrosequencing
Polymerase-based
sequence-by-synthesis
Ligation-based
sequencing
Amplification approach Emulsion PCR Bridge amplification Emulsion PCR
Paired end (PED)
separation
3 kb 200-500 bp 3 kb
Mb per run 100 Mb 1300 Mb 3000 Mb
Time per PED run <0.5 day 4 days 5 days
Read length (update) 250-400 bp 35, 75 and 100 bp 35 and 50 bp
Cost per run $ 8,438 USD $ 8,950 USD $ 17,447 USD
Cost per Mb $ 84.39 USD $ 5.97 USD $ 5.81 USD

 Do not use PCR amplification for template preparation
 Sequences single RNA molecule
 Faster, Cheaper, much higher throughput than NGS.
 Higher error rate.
 can produce long reads averaging between 5,000bp to 15,000bp.
 also known as long-read sequencing
The two commercially available third-generation DNA sequencing technologies are
 Pacific Biosciences (PacBio) Single Molecule Real Time (SMRT) sequencing,
 the Oxford Nanopore Technologies sequencing platform.
THIRD GENERATION SEQUENCING

 Commercially introduced in 2010.
 Sequences DNA using sequencing-by-synthesis,
 Based on monitoring polymerase activity while
incorporating differently labelled nucleotides into the
DNA strand
 Read lengths of up to ~100,000 bp
 Greatest throughput (~8GB / day).
 Error rate -10% to 15%
 Cost ~ $500-2000/Gb ($100/run)
PacBio SMRT technology
Limitation
 PacBio sequencing is the cost relative to second-
generation approaches, which has limited its
application.
 For analyzing large numbers of genomes.

Fig. In Single Molecule Real-Time (SMRT) sequencing the emission spectra of fluorescent
labelled nucleotides are detected while being incorporated by the polymerase.
Source: Pacific Biosciences.

 Released in 2014.
 Sequences RNA by electronically measuring the minute disruptions
to electric current as RNA molecules pass through a nanopore.
 150 bases/sec/pore
 125 Gb/ day
 20-100.000 bases reads
 4% error rate
 Cost $10/Gb
Oxford Nanopore Technologies
Fig. ONTs MinION sequencing device
attached to a laptop computer.

Some mRNA-Seq Applications
Differential gene expression analysis
Transcriptional profiling
Identification of splice variants
Novel gene identification
Transcriptome assembly
SNP finding
RNA editing

Some questions that can be addressed by transcriptomic methods:
a) How much transcript is there from each gene (expression level)?
b) How does expression level change over development (expression profile)?
c) How does expression differ among different tissues ?
d) How does environment/treatment affect gene expression?
e) How much variation is there in gene expression levels within natural populations?

TRANSCRIPTOMICS
Application
Protein Coding
Gene Annotation
Gene
Expression
Profiling
Noncoding
RNA
Discovery and
Detection
Transcript
Rearrangement
Discovery
Single
Nucleotide
Variation
Profiling
eQTL

Molecular Markers Developed by Means of High-throughput Transcriptomics
Techniques for the Breeding of Important Crops

 In this study, it is revealed that PB could significantly enhance rice seedling survival by retaining a higher level of
chlorophyll content and alcohol dehydrogenase activity.
 Transcriptomic analysis identified 3936 differentially expressed genes (DEGs) among the GA- and PB-treated
samples and control, which are extensively involved in the submergence and other abiotic stress responses,
phytohormone biosynthesis and signaling, photosynthesis, and nutrient metabolism.
 The results suggested that PB enhances rice survival under submergence through maintaining the photosynthesis
capacity and reducing nutrient metabolism.

 Comparative analysis of wheat anther transcriptomes for male fertile wheat and SQ-1–induced male sterile wheat was carried
out using next-generation sequencing technology
 In all, 42,634,12 sequence reads were generated and were assembled into 82,356 high quality unigenes with an average length
of 724 bp.
 1,088 unigenes were significantly differentially expressed in the fertile and sterile wheat anthers, including 643 up-regulated
unigenes and 445 down-regulated unigenes.
 This study is the first to provide a systematic overview comparing wheat anther transcriptomes of male fertile wheat with
those of SQ-1–induced male sterile wheat and is a valuable source of data for future research in SQ-1–induced wheat male
sterility.

Transcriptomics approaches

Recommended

Recommended

More Related Content

What's hot

What's hot (20)

Similar to Transcriptomics approaches

Similar to Transcriptomics approaches (20)

Recently uploaded

Recently uploaded (20)

Transcriptomics approaches