The study of the complete set of RNAs (transcriptome) encoded by the genome of a specific cell or organism at a specific time or under a specific set of conditions is called Transcriptomics.
Transcriptomics aims:
I. To catalogue all species of transcripts, including mRNAs, noncoding RNAs and small RNAs.
II. To determine the transcriptional structure of genes, in terms of their start sites, 5′ and 3′ ends, splicing patterns and other post-transcriptional modifications.
III. To quantify the changing expression levels of each transcript during development and under different conditions.
2. Transcriptome
Approaches for transcriptome analysis
Candidate gene approach
Hybridisation based approaches
Sequencing based approaches
Emerging sequencing approaches
Applications of transcriptomics
Case studies
Overview
3. CENTRAL DOGMA OF MOLECULAR BIOLOGY
GENOME TRANSCRIPTOME PROTEOME
Complete set of transcripts
and relative levels of
expression in a particular
cell or tissue under defined
conditions at a given time
Complete DNA content
of an organism with all
its gene and regulatory
sequences
Complete collection of
proteins and their relative
levels in each cell
4. TRANSCRIPTOME
The ‘transcriptome’ is defined as the complete
complement of RNA molecules generated by a cell
or population of cells.
The term was first proposed by Charles Auffray
in 1996.
Charles Auffray
5. The study of the complete set of RNAs (transcriptome) encoded by the genome of a
specific cell or organism at a specific time or under a specific set of conditions is called
Transcriptomics.
In transcriptomics, the expression of genes by a genomeis studied :
Qualitatively (identifying which genes are expressed and which are not)
Quantitatively (measuring varying levels of expression for different genes)
TRANSCRIPTOMICS
6. Milestones in transcriptome analysis
YEAR MILESTONE
1965 Sequence of the first RNA molecule determined
1977 Development of the Northern blot technique and the Sanger sequencing method
1989 Reports of RT-PCR experiments for transcriptome analysis
1991 First high-throughput EST sequencing study
1992 Introduction of Differential Display (DD) for the discovery of differentially expressed genes
1995 Reports of the microarray and Serial Analysis of Gene Expression (SAGE) methods
2001 Draft of the Human Genome completed
2005 First next-generation sequencing technology (454/Roche) introduced to the market
2006 First transcriptome sequencing studies using a next-generation technology (454/Roche)
7. Why transcriptomics is important?
Transcriptome profiling provides clues to:
Expressed sequences and genes of genome
Gene regulation and regulatory sequences
Gene function annotation
Functional differences between tissue and cell types
Identification of candidate genes for any given process or disease
8. I. To catalogue all species of transcripts, including mRNAs, noncoding RNAs and
small RNAs.
II. To determine the transcriptional structure of genes, in terms of their start sites,
5′ and 3′ ends, splicing patterns and other post-transcriptional modifications.
III. Toquantify the changing expression levels of each transcript during development
and under different conditions.
Transcriptomics aims:
9. • Northern Blotting Analysis
• quantitative real-time RT-PCR
Candidate gene approaches
• Microarray expression profiling
Hybridization based approaches
• EST libraries
• Serial analysis of gene expression (SAGE)
• CAGE
• RNA sequencing
Sequencing based approaches
Approaches for transcriptome mining
10. Northern Blotting Analysis
The northern blot, or RNA blot, is a technique used in molecular biology research to study
gene expression by detection of RNA (or isolated mRNA) in a sample.
The northern blot technique was developed in 1977 by James Alwine, David Kemp, and
George Stark at Stanford University.
11. Reverse transcription polymerase chain reaction
(RT-PCR)
Application
• RT-PCR is used to clone expressed genes by reverse transcribing the RNA of interest
into its DNA complement through the use of reverse transcriptase.
• Utilized for quantification of RNA, by incorporating qPCR into the technique. The
combined technique, described as quantitative real-time RT-PCR ( RT-qPCR)
A variant of polymerase chain reaction (PCR), commonly used in molecular biology to
detect RNA expression.
14. Fluorescence emission is measured continuously during the PCR reaction and ▵Rn (increase in fluorescence emission,
from which the background fluorescence signal is subtracted) is plotted against cycle number. The threshold cycle (Ct) is
the cycle at which the fluorescence exceeds a chosen threshold.
Fig: PCR amplification plot
Intensity
of
fluoroscence
Cycle number
PCR amplification plot
15. One-step vs. Two-step RT-qPCR
One-step assays combine reverse transcription and PCR in a single tube and buffer, using
a reverse transcriptase along with a DNA polymerase. One-step RT-qPCR only utilizes
sequence-specific primers.
In two-step assays, the reverse transcription and PCR steps are performed in separate
tubes, with different optimized buffers, reaction conditions, and priming strategies.
16. MICROARRAY
Microarray is a nucleic acid hybridization based, high throughput technique
developed to quantitate gene expression levels at the whole genome scale.
A microarray is a pattern of ssDNA probes which are immobilized on a surface
(called a chip or a slide) in a regular pattern of spots and each spot containing
millions of copies of a unique DNA probe.
Principle: base-pairing hybridization
The technique was first time used for “Quantitative Monitoring of Gene
Expression Patterns with a complementary DNA microarray” by
Patrick Brown, Mark Schena and colleague published in Science (1995). Mark Schena
“Father of Microarray Technology”
17. Methodology
Microarray analysis can be divided into
1. Probe production
Specific sequences are immobilized to a surface
and reacted with labelled cDNA targets.
2. Target (cDNA) production
mRNA is extracted from the sample, converted to
cDNA and labeled using fluorescent dyes (usually
Cy3 or Cy5).
The array
18. 3. Hybridisation
The chip is exposed to a solution
containing extracted labelled cDNA
Complementary nucleic acid sequences get
pair via hydrogen bonds.
Washing off of non-specific bonding
sequences .
19. 4. Scanning
The array is scanned to measure fluorescent
label.
Fluorescently labelled target sequences that
bind to a probe sequence generate a signal.
The signal depends on.
The hybridization conditions,
ex: temperature
washing after hybridization
Total strength of the signal, depends upon the
amount of target sample.
20. Microarray
Low density
spotted array
High density
short probe array
prepared by using cDNA known as cDNA
chips , cDNAs are amplified using PCR than
Spotted by inkjet technology
Probes are >1000 nucleotides in length
Two different fluorophores can be used to
label the test and control sample (two-channel
detection)
Cannot identify alternative splicing events
short DNA oligonucleotides are directly
synthesised on solid microarray substrate
using photolithgraphy (Affymatrix) or spotted by
ink-jet printing.
Probes are 70-80 nucleotides in length
Uses a single fluorescent lable (single channel)
Can identify alternative splicing events
Commercial oligonucleotide chips are available
(e.g. Affymetrix, Inc. GeneChip system)
21. The Colours of a Microarray
cDNA chip Affymatrix gene chip
22. Applications of Microarray
Gene expression profiling
Gene fusions detection
Alternative splicing detection
SNP detection
TILING array
Chip
23. ADVANTAGES
High-throughput
Fast
Relatively inexpensive
LIMITATIONS
Reliance upon existing knowledge about genomesequence
High background levels owing to cross-hybridization
A limited dynamic range of detection due to both background and saturation of signals
Comparing expression levels across different experiments is often difficult and can require
complicated normalization methods
24. SAGE - Serial Analysis of Gene Expression
SAGE invented at Johns Hopkins University in USA
(Oncology Center) by Dr. Victor Velculescu in 1995.
• SAGE is an approach that allows rapid and detailed
analysis of overall gene expression patterns.
• SAGE provides quantitative and comprehensive
expression profiling in a given cell population.
Dr. Victor Velculescu
25. Principle Underlining SAGE methodology
A short sequence tag (10-14bp) contains sufficient information to uniquely identify a transcript provided that
tag is obtained from a unique position within each transcript.
Sequence tag can be linked together to form long serial molecules that can be cloned and sequenced.
Quantitation of the number of times a particular tag is observed provides the expression level of the
corresponding transcript.
28. Advantages
mRNA sequence does not need to be known prior, so genes of variants which are
not known can be discovered.
Its more accurate as it involves direct counting of the number of transcripts.
Disadvantages
Length of gene tag is extremely short (13 or 14bp), so if the tag is derived from an
unknown gene, it is difficult to analyze with such a short sequence.
Type II restriction enzyme does not yield same length fragments.
29. CAGE
Cap analysis gene expression (CAGE) is a gene expression technique used in molecular biology to
produce a snapshot of the 5′ end of the messenger RNA population in a biological sample.
First published by Hayashizaki, Carninci and coworkers in 2003.
Aims to identify TIS and promoters.
Collects 21 bp from 5’ ends of cap purified cDNA.
The method essentially uses full-length cDNAs , to the 5’ ends of which linkers are attached.
This is followed by the cleavage of the first 20 base pairs by class II restriction enzymes, PCR,
concatamerization, and cloning of the CAGE tags
30. ESTs
In 1983, SD Putney for the first time
demonstrated the use of cDNA in identification of
genome.
In 1991 Adams and co-workers coined the term
EST.
They are the tiny sequences of cistron randomly
selected from genome library and can be used to
identify and map the whole genome of any
particular species. ESTs are usually 200 to 500
nucleotides long and are generated by sequencing
the ends of DNA.
Figure: Method of construction of ESTs from
nascent DNA.
Use of EST
Identify unknown gene and map their position in a
genome.
Provide simple and inexpensive path for
discovering new gene.
Genome map construction.
Characterization of expressed gene
31. RNA-Sequencing
RNA-seq, also called whole-transcriptome shotgun sequencing, refers to the use of high-throughput
sequencing technologies to reveal the presence and quantity of RNA in a biological sample at a given
moment.
Types of RNA sequencing
1.Direct sequencing
2.Indirect sequencing (next-generation sequencing )
32. Direct RNA sequencing
Direct single molecule RNA sequencing without prior cDNA conversion
Sequencing by synthesis
Polyadenylation by poly (A) polymerase I (PAPI) from E. coli
Blocking by 3′ deoxyATP
Poly (A) tail ~ 150 nucleotides
Each RNA molecule is filled in with dTTP and polymerase and than
locked in position VT-A, VT-C, VT-G, stopping the further nucleotide
addition.
Unincorporated dye labelled nucleotides are washed away, images are
taken.
Flourescent dye and inhibitor are cleaved off from the incorporated
nucleotide, making it suitable for additional rounds of incorporation.
Repeating this cycle of rinsing, imaging and cleaving provides a set of
images that are aligned and are used to generate the sequence information
for each individual RNA molecule with real time image processing.
Requirement of minor RNA quantities
No biases due to cDNA synthesis, end repair, ligation and amplification
procedures
Potentially useful to study short RNA species
33. Sequencing using NGS
•Next generation sequencing is also known as massively-parallel sequencing
•It enables large amount sequencing to be performed in a single assay
•Mostly produce short reads <400bp
•Read numbers vary from ~ 1 million to ~ 1 billion per run
35. Data analysis for mRNA-seq: key steps
Mapping reads to the reference genome
Read mapping of 454 sequencers can be done by conventional
sequence aligners.
Short read aligner needed for Illumina or SOLiD reads
Quantifying the known genes
Prediction of novel transcripts
Assembly of short reads: comparative vs. de novo
36. Next generation sequencing (NGS) techniques
Perticulars 454 Sequencing Illumina/Solexa ABI SOLiD
Sequencing Chemistry Pyrosequencing
Polymerase-based
sequence-by-synthesis
Ligation-based
sequencing
Amplification approach Emulsion PCR Bridge amplification Emulsion PCR
Paired end (PED)
separation
3 kb 200-500 bp 3 kb
Mb per run 100 Mb 1300 Mb 3000 Mb
Time per PED run <0.5 day 4 days 5 days
Read length (update) 250-400 bp 35, 75 and 100 bp 35 and 50 bp
Cost per run $ 8,438 USD $ 8,950 USD $ 17,447 USD
Cost per Mb $ 84.39 USD $ 5.97 USD $ 5.81 USD
37. Do not use PCR amplification for template preparation
Sequences single RNA molecule
Faster, Cheaper, much higher throughput than NGS.
Higher error rate.
can produce long reads averaging between 5,000bp to 15,000bp.
also known as long-read sequencing
The two commercially available third-generation DNA sequencing technologies are
Pacific Biosciences (PacBio) Single Molecule Real Time (SMRT) sequencing,
the Oxford Nanopore Technologies sequencing platform.
THIRD GENERATION SEQUENCING
38. Commercially introduced in 2010.
Sequences DNA using sequencing-by-synthesis,
Based on monitoring polymerase activity while
incorporating differently labelled nucleotides into the
DNA strand
Read lengths of up to ~100,000 bp
Greatest throughput (~8GB / day).
Error rate -10% to 15%
Cost ~ $500-2000/Gb ($100/run)
PacBio SMRT technology
Limitation
PacBio sequencing is the cost relative to second-
generation approaches, which has limited its
application.
For analyzing large numbers of genomes.
39. Fig. In Single Molecule Real-Time (SMRT) sequencing the emission spectra of fluorescent
labelled nucleotides are detected while being incorporated by the polymerase.
Source: Pacific Biosciences.
40. Released in 2014.
Sequences RNA by electronically measuring the minute disruptions
to electric current as RNA molecules pass through a nanopore.
150 bases/sec/pore
125 Gb/ day
20-100.000 bases reads
4% error rate
Cost $10/Gb
Oxford Nanopore Technologies
Fig. ONTs MinION sequencing device
attached to a laptop computer.
41. Some mRNA-Seq Applications
Differential gene expression analysis
Transcriptional profiling
Identification of splice variants
Novel gene identification
Transcriptome assembly
SNP finding
RNA editing
42. Some questions that can be addressed by transcriptomic methods:
a) How much transcript is there from each gene (expression level)?
b) How does expression level change over development (expression profile)?
c) How does expression differ among different tissues ?
d) How does environment/treatment affect gene expression?
e) How much variation is there in gene expression levels within natural populations?
44. Molecular Markers Developed by Means of High-throughput Transcriptomics
Techniques for the Breeding of Important Crops
45. In this study, it is revealed that PB could significantly enhance rice seedling survival by retaining a higher level of
chlorophyll content and alcohol dehydrogenase activity.
Transcriptomic analysis identified 3936 differentially expressed genes (DEGs) among the GA- and PB-treated
samples and control, which are extensively involved in the submergence and other abiotic stress responses,
phytohormone biosynthesis and signaling, photosynthesis, and nutrient metabolism.
The results suggested that PB enhances rice survival under submergence through maintaining the photosynthesis
capacity and reducing nutrient metabolism.
46. Comparative analysis of wheat anther transcriptomes for male fertile wheat and SQ-1–induced male sterile wheat was carried
out using next-generation sequencing technology
In all, 42,634,12 sequence reads were generated and were assembled into 82,356 high quality unigenes with an average length
of 724 bp.
1,088 unigenes were significantly differentially expressed in the fertile and sterile wheat anthers, including 643 up-regulated
unigenes and 445 down-regulated unigenes.
This study is the first to provide a systematic overview comparing wheat anther transcriptomes of male fertile wheat with
those of SQ-1–induced male sterile wheat and is a valuable source of data for future research in SQ-1–induced wheat male
sterility.