RNA-seq

RNA-seq: whole transcriptome analysis
Noncoding RNA:
RNA functions directly, based on its own
shape
Messenger RNA:
Codes for proteins, which function based
on their shape
Types of noncoding RNA:
• tRNA (transfer RNA)
• rRNA (ribosomal RNA)
• ribozymes (RNA enzymes)
• miRNA (micro RNA)
• snRNA (small nuclear RNA)
• siRNA (small interfering RNA)
• piRNA (Piwi-interacting RNA)
• Xist
• Many more

Exons, introns, and isoforms from NGS data
Alternative splicing can generate different isoforms from the same
primary transcript of a gene:
* Noncoding RNAs can have introns, too! Examples: Xist, HOTAIR, and other lincRNAs

Why do a whole transcriptome analysis?
• Unknown disease correlations
Finding what you’re looking for when you don’t know exactly what you are
looking for: a “hypothesis-free” approach.
Example: 70% treatment efficacy means 30% poor response or no response
– Whole transcriptome analysis: compare responders and non-responders
– A computer can identify differences, even in the absence of a hypothesis
– A computer can present unexpected results that a researcher would not look for
due to preconceptions about the disease biology
• Disease correlations with post-transcription events
e.g. gene fusions and alternative splicing
• Species without a reference genome or annotation (GTF)
unsequenced species, poorly annotated genomes, environmental sequencing
• Power: can outperform microarrays

Microarray vs. RNA-seq
• Microarrays: can only detect sequences the
array was designed to detect (must know in
advance what to put on the chip)
• Certain analyses are not possible with
microarrays, such as:
– Distinguishing mature mRNA from unspliced
RNA, as well as different isoforms/splice
variants
– Strandedness
– Single-cell analysis
• RNA-seq: "fuzzy" overview; facilitates novel
transcript discovery
• RNA-seq lends itself to further and
confirmatory analyses
• Lower error rate, and problems like
cross-hybridization are avoided in RNA-seq

NGS
Steps:
1. Fragment RNA
2. Reverse transcribe => cDNA
3. High-throughput sequencing
Read length: longer reads = more information,
but more errors and higher cost
Variety of machines:
- choose based on experimental
design and cost
- output: 7.5 Gb to 1800 Gb
- max reads/run: 25 million to 6
billion
- max read length: 2 x 150 bp to 2
x 300 bp
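The output figures above follow from simple arithmetic: output is roughly reads times read length. The sketch below is only an illustration, assuming that "reads/run" counts sequenced fragments and that each fragment is read from both ends (2 x 150 bp); the function name and parameters are hypothetical.

```python
def run_output_gb(fragments: int, read_length_bp: int, paired_end: bool = True) -> float:
    """Approximate sequencing output in gigabases (1 Gb = 1e9 bases).

    Assumption: each fragment yields two reads when paired_end is True.
    """
    reads_per_fragment = 2 if paired_end else 1
    total_bases = fragments * reads_per_fragment * read_length_bp
    return total_bases / 1e9


# Low and high ends of the range quoted on the slide, both at 2 x 150 bp.
small_run = run_output_gb(25_000_000, 150)
large_run = run_output_gb(6_000_000_000, 150)

print(f"25 million reads at 2 x 150 bp: {small_run:.1f} Gb")
print(f"6 billion reads at 2 x 150 bp: {large_run:.0f} Gb")
```

Under these assumptions the two ends of the range reproduce the 7.5 Gb and 1800 Gb quoted above.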

RNA-seq overview
de novo
Step 1:
Preparation of raw RNA reads
-Primers cleaned from library (library of
fragments)
-Length: computation vs. sequencing power
-Single-end vs. Paired-end
Sequences of fragments (reads) will be aligned to a reference genome
with a GTF file

Align RNA-seq library to genome
For today’s analysis, we will be mapping to a genome using an existing GTF file
• Genes
• Isoforms
Step 2:
Mapping on Transcriptome
Step 3:
Generating expression tables
Genes and isoforms
For our purposes, mapping (aligning) reads to a transcriptome is
just mapping to a genome, but with expression levels of each
transcript
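To make "expression levels of each transcript" concrete, here is a minimal sketch of one common expression unit, transcripts per million (TPM), followed by summing isoforms into a gene-level table. The read counts, transcript lengths, and the "gene-isoform" naming scheme are hypothetical, and this only illustrates the unit; it is not how the pipeline's quantification tool estimates abundance.

```python
def tpm(counts: dict, lengths_bp: dict) -> dict:
    """Convert per-transcript read counts to transcripts per million (TPM)."""
    # Reads per base: longer transcripts collect more reads at equal abundance.
    rates = {name: counts[name] / lengths_bp[name] for name in counts}
    total = sum(rates.values())
    if total == 0:
        return {name: 0.0 for name in counts}
    return {name: rate / total * 1e6 for name, rate in rates.items()}


# Hypothetical isoform-level counts and lengths (names are "gene-isoform").
counts = {"geneA-iso1": 500, "geneA-iso2": 100, "geneB-iso1": 400}
lengths = {"geneA-iso1": 2000, "geneA-iso2": 1000, "geneB-iso1": 4000}

isoform_tpm = tpm(counts, lengths)

# Gene table: sum the TPM of all isoforms that belong to the same gene.
gene_tpm = {}
for transcript, value in isoform_tpm.items():
    gene = transcript.split("-")[0]
    gene_tpm[gene] = gene_tpm.get(gene, 0.0) + value

for name, value in isoform_tpm.items():
    print(f"{name}: {value:.1f} TPM")
for name, value in gene_tpm.items():
    print(f"{name}: {value:.1f} TPM")
```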

Building pipelines in the T-BioInfo platform

The T-BioInfo pipeline we will be building in
today’s workshop

So, the pipeline will give us a table of transcripts.
Now what?
• Normalization: Methods for overcoming variance due to
technical issues or other issues not related to the experiment
• Post-processing:
• Principal Component Analysis (PCA): provides visual overview
of the data
• Statistical analysis (e.g. T-test)
• Machine learning techniques
• Biological interpretation of results: use databases to find out
more about the identified genes, e.g. publications,
correlations

PCA
Dimension reduction technique for reducing a lot of data into a small set of components that captures the essence
of the original data.
Output you will see (Excel table):
The first two components (“principal components”) can be plotted on a 2D graph to detect clustering.
“Shadow”: the 2D plot does not show the whole picture.
Benchmark: 40% of variability
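As a rough illustration of how the first two components and their share of variability can be computed, here is a minimal PCA sketch using NumPy's singular value decomposition. The expression matrix is randomly generated, hypothetical data (10 samples, 50 genes, with ten genes shifted in one group); the platform's own PCA implementation may differ.

```python
import numpy as np


def pca(expression: np.ndarray, n_components: int = 2):
    """PCA on a samples x genes matrix.

    Returns the sample coordinates on the first components and the
    fraction of total variance each of those components explains.
    """
    centered = expression - expression.mean(axis=0)
    u, s, _ = np.linalg.svd(centered, full_matrices=False)
    scores = u[:, :n_components] * s[:n_components]
    explained = s**2 / np.sum(s**2)
    return scores, explained[:n_components]


# Hypothetical data: two groups of five samples, ten genes differ between them.
rng = np.random.default_rng(0)
group_a = rng.normal(size=(5, 50))
group_b = rng.normal(size=(5, 50))
group_b[:, :10] += 4.0
expression = np.vstack([group_a, group_b])

scores, explained = pca(expression)
print("PC1 and PC2 coordinates per sample:")
print(np.round(scores, 2))
print("Fraction of variance explained:", np.round(explained, 3))
```

Plotting the two columns of `scores` against each other gives the 2D scatter plot described above; the sum of `explained` is the share of variability that the plot actually shows.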

A brief explanation of machine learning
Using a training set to teach a computer to categorize
Duck vs. Not Duck:

Three subtypes of breast cancer
1. ER+: Positive for the estrogen receptor; treatment includes hormone therapy and drug
treatments targeting the estrogen receptor. The most common subtype of diagnosed breast
cancer. Positive outlook in the short term.
2. HER2+: Overexpresses human epidermal growth factor receptor 2 (HER2/neu), a growth-promoting protein.
This type of cancer tends to be more aggressive than ER+ or PR+ breast cancer. Cannot be
treated with hormone therapy, but there are targeted drug treatments.
3. Triple Negative: Negative for estrogen receptor and progesterone receptor, and does not
overexpress HER2/neu. Most cancers with mutated BRCA1 genes are triple negative. This type
responds to surgery/chemotherapy, but tends to recur later. No targeted therapy, although some
treatments are in development. Survival rates are lower than for other breast cancer subtypes. This
cancer type occurs in 15-20% of those diagnosed with breast cancer in the United States.
Patient-Derived Xenograft (PDX) mouse models
Each model represents a different way of being immunocompromised. Examples:
Athymic Nude: Lacks the thymus, unable to produce T cells.
NOD/CB17 SCID: Combined immunodeficiency, no mature T cells or B cells. Functional natural
killer cells, macrophages, and granulocytes.
Tumor = human, Stroma = mouse (original transplant had human stroma)

Whole Transcriptome Profiling of
Cancer Tumors in Mouse PDX Models
http://www.impactjournals.com/oncotarget/index.php?journal=oncotarget&page=article&op=view&path%5B%5D=8014
Based on breast cancer samples taken from the publication “Whole transcriptome profiling
of patient-derived xenograft models as a tool to identify both tumor and stromal specific
biomarkers” (James R. Bradford et al.; DOI: 10.18632/oncotarget.8014)

Introduction
• Dataset: 21 samples from 3 subtypes of breast cancer in 3 different mouse models.
• Goals: identify a clear signal showing transcriptional differences between cancer subtypes
1) Identify differences in expression between cancer subtypes and between mouse models
2) Select representative genes that could be considered as biomarker candidates
PDX Mouse Strains
XID: Characterized by the absence of the thymus, mutant B lymphocytes, and no T-cell function.
NOD SCID: Severe combined immunodeficiency, with no mature T cells and B cells.
Athymic Nude: Lacks the thymus and is unable to produce T cells.
Breast Cancer Subtypes
Breast TN: Survival rates are lower for this cancer than for ER+ cancer types.
Breast ER+: Treatment often includes hormone therapy, and the outlook is more positive in the short term.
Breast HER2+: Tends to be a more aggressive cancer type than ER+.

Sample Summary
For More information: http://www.cancer.org/cancer/breastcancer/detailedguide/breast-cancer-classifying
Biological Data Repositories

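What is a FASTQ file?
FASTQ format: text-based file that stores each read's sequence together with per-base quality scores
Project Accession Number
FASTA format:
Text-based file without the quality scores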
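The relationship between the two formats can be shown with a short sketch: a FASTQ record has four lines (an "@" header, the sequence, a "+" separator, and a quality string), and dropping the quality string gives FASTA. The two reads below are made up, and the quality decoding assumes the common Phred+33 encoding.

```python
import io


def read_fastq(handle):
    """Yield (read_id, sequence, quality) from a four-line-per-record FASTQ stream."""
    while True:
        header = handle.readline().rstrip()
        if not header:
            break  # end of file
        sequence = handle.readline().rstrip()
        handle.readline()  # '+' separator line, ignored
        quality = handle.readline().rstrip()
        if not header.startswith("@") or len(sequence) != len(quality):
            raise ValueError(f"Malformed FASTQ record: {header}")
        yield header[1:], sequence, quality


def phred_scores(quality, offset=33):
    """Decode a quality string into per-base Phred scores (Phred+33 assumed)."""
    return [ord(character) - offset for character in quality]


# Two hypothetical reads in FASTQ format.
example = "@read1\nGATTACA\n+\nIIIIHH#\n@read2\nACGT\n+\n!!II\n"
records = list(read_fastq(io.StringIO(example)))

# FASTA keeps only the identifier and the sequence.
fasta = "".join(f">{read_id}\n{sequence}\n" for read_id, sequence, _ in records)
print(fasta)
print(phred_scores(records[0][2]))
```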

Visualization of T-BioInfo Bioinformatics Functions
Step 1: RNA-seq pipeline estimates expression levels for all annotated and
non-annotated genomic elements
1. Mapping by Bowtie2 using the original GTF
(Mouse and Human Genome Combined)
2. RSEM Expression Table: Quantification of Gene
and Isoform Level Abundance
3. Outputs include Genes Table and Isoform Table
Step 2: RSEM output tables of genes and isoforms are
prepared for Machine Learning Analysis
Removing genomic elements that did not have any expression (all
zeros) in the RSEM table. This includes both the isoform and gene
tables.
Quantile Normalization
Principal Component Analysis
Factor Regression Analysis
Let's First Build Our RNA-seq Pipeline!

When your RNA-seq pipeline is complete….

Before Normalization
After Normalization
Gene Name Sample Names
Multi-sample normalization is considered a standard and necessary part of RNA-seq analysis.
- Removes unwanted technical variation
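A minimal sketch of two preparation steps named in this workshop, removing all-zero rows and quantile normalization, is shown below with NumPy. The expression values are hypothetical and chosen without ties; this simple version breaks ties arbitrarily, which a production implementation would handle more carefully.

```python
import numpy as np


def drop_all_zero_rows(table: np.ndarray) -> np.ndarray:
    """Remove genomic elements (rows) that have zero expression in every sample."""
    return table[np.any(table != 0, axis=1)]


def quantile_normalize(table: np.ndarray) -> np.ndarray:
    """Quantile-normalize a genes x samples table.

    Each sample's values are replaced by the mean of the values that share
    the same rank across all samples, so every column ends up with the
    same distribution.
    """
    order = np.argsort(table, axis=0)
    sorted_values = np.take_along_axis(table, order, axis=0)
    rank_means = sorted_values.mean(axis=1)
    normalized = np.empty_like(table, dtype=float)
    np.put_along_axis(normalized, order, rank_means[:, None], axis=0)
    return normalized


# Hypothetical expression table: 5 genes (rows) x 3 samples (columns).
raw = np.array([
    [5.0, 4.0, 3.0],
    [2.0, 1.0, 4.0],
    [3.0, 6.0, 6.0],
    [4.0, 2.0, 8.0],
    [0.0, 0.0, 0.0],  # never expressed, removed before normalization
])

filtered = drop_all_zero_rows(raw)
normalized = quantile_normalize(filtered)
print(np.round(normalized, 3))
```

After normalization, sorting any column gives the same list of values, which is what the "after normalization" view of the data reflects.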

Biological Databases: Great for
Annotation!
https://david.ncifcrf.gov/
http://www.ensembl.org

Now back to the T-BioInfo Platform!
1. Start a PCA Pipeline
2. Create a Scatter Plot Image from our Results
3. Utilize DAVID and ENSEMBL to investigate Biological Meaning
4. Learn about other Machine Learning Methods
5. Understand a “real” RNA-seq project timeline
T-Bio.Info Platform: http://tbioinfopb1.pine-biotech.com:3000

PCA of Human Tumor By Samples and By
Genes
Link: https://pinebio.shinyapps.io/app_genes/
Link: https://pinebio.shinyapps.io/app_samples/
PC1: 22.16%, PC2: 9.22%

• Extracellular Matrix Remodeling
• Cell Migration
• Tumor Growth
• Angiogenesis
[Figure: Matrix Metalloprotease 14 Expression in Breast Cancer Samples; y-axis: Level of Expression, x-axis: Breast Cancer Samples]
Upregulated in Triple Negative Cancer Samples
Defining the Breast Cancer Subtypes

• Estrogen-Regulated Proteins
• Oncogenic
• Bone Metastasis
TFF3 is a promoter of angiogenesis in breast cancer. This protein is secreted from
mammary carcinoma cells to promote angiogenesis.
TFF3 also promotes angiogenesis through direct functional effects on endothelial
cell processes.
TFF3 stimulates angiogenesis in coordination with its growth-promoting and
metastatic actions in mammary carcinoma, enhancing tumor progression
and dissemination.
[Figure: Trefoil Factor 3 in Breast Cancer; y-axis: Level of Expression]

Upregulated in Estrogen Receptor+ Samples
Significance of Hormones to Breast Cancer: Endocrine Therapy
[Figure: Estrogen Receptor Expression in Breast Cancer Samples; y-axis: Level of Expression]
Estrogen stimulates the proliferation of breast cancer cells

Progesterone receptor testing is a standard part of testing for breast cancer diagnosis
[Figure: Progesterone Receptor Expression in Breast Cancer Samples; y-axis: Level of Expression, x-axis: Breast Cancer Sample]
Progesterone receptors, when activated by progesterone,
attach to the estrogen receptors, which stops the estrogen
receptors from turning on cancer-promoting genes.
They then turn on genes that promote the death
of cancer cells (apoptosis) and the growth of
healthy cells.
Upregulated in Estrogen Receptor+ Cancer

Estrogen Receptor, HER2, Triple Negative
Expression Profile 1:
High Estrogen Receptor
High Progesterone Receptor
Low Matrix Metalloprotease 14
Low Estrogen Receptor
No Progesterone Receptor
High Matrix Metalloprotease 14
Low Estrogen Receptor
Low Progesterone Receptor
High Matrix Metalloprotease 14
HER2 Breast Cancer
Luminal B- Estrogen Positive Breast Cancer
Basal-Triple Negative Breast Cancer
[Figure: bar charts of Estrogen, MMP14, and Progesterone expression levels; first panel: Breast Cancer Sample 1]

Factor Regression Analysis
Factor Table (2 factors, 2 levels each)
A0B0 Triple Neg / Athymic Nude
A0B1 Triple Neg / SCID
A1B0 ER+ / Athymic Nude
A1B1 ER+ / SCID
Factor A: Triple Negative vs. ER+
Factor B: Athymic Nude vs. SCID
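The slides do not spell out how the platform fits this two-factor design, so the following is only a stand-in: an ordinary least-squares fit of one gene's expression on the two 0/1 factors from the table (A0/A1 for subtype, B0/B1 for mouse model). The expression values are invented for illustration.

```python
import numpy as np

# Factor coding from the factor table: (A, B) per group.
# A: 0 = triple negative, 1 = ER+; B: 0 = athymic nude, 1 = SCID.
groups = {"A0B0": (0, 0), "A0B1": (0, 1), "A1B0": (1, 0), "A1B1": (1, 1)}

# Hypothetical log-expression values for one gene, two samples per group.
samples = [
    ("A0B0", 2.0), ("A0B0", 2.2),
    ("A0B1", 2.1), ("A0B1", 1.9),
    ("A1B0", 8.0), ("A1B0", 8.4),
    ("A1B1", 8.1), ("A1B1", 8.3),
]

# Design matrix columns: intercept, factor A, factor B.
design = np.array([[1.0, *groups[group]] for group, _ in samples])
expression = np.array([value for _, value in samples])

coefficients, *_ = np.linalg.lstsq(design, expression, rcond=None)
intercept, effect_a, effect_b = coefficients

print(f"Baseline (A0B0): {intercept:.2f}")
print(f"Effect of factor A (ER+ vs. triple negative): {effect_a:.2f}")
print(f"Effect of factor B (SCID vs. athymic nude): {effect_b:.2f}")
```

A gene like this one, with a large factor A effect and a near-zero factor B effect, would be a candidate for a subtype-driven gene rather than one driven by the mouse model.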

RNA-Seq Experiment Overview
Based on Breast Cancer Samples taken from the publication “Whole transcriptome profiling of patient-derived xenograft models
as a tool to identify both tumor and stromal specific biomarkers” (James R. Bradford et al.; DOI: 10.18632/oncotarget.8014)
Cancer Subtypes: HER2, ER+, TNBC
Mouse Strains: NOD SCID, XID, Athymic, CB17 SCID
1. Ribosomal-Depleted RNA
2. Fragment RNA
3. TruSeq RNA Sample
Preparation Kit
4. Concatenated Genome
(Mouse/Human)
5. Indexed with STAR aligner
Secondary Analysis
Tertiary Analysis
Gene Summary and Ontology Report
1. Mapping using TopHat
2. Finding Isoforms using Cufflinks
3. GTF file of isoforms using Cuffmerge
4. Mapping with Bowtie2 on the new transcriptome

Thanks for Listening!
Any Questions?
Contact: Info@pine-biotech.com
T-Bioinfo Platform : http://tbioinfopb1.pine-biotech.com:3000
Pine Biotech Website: http://pine-biotech.com
Pine Biotech Education Website: http://edu.t-bio.info

Factor Regression Analysis
Triple Negative Samples ER+ Samples
Selecting Human Genes Under the Influence of
Either Triple Negative Breast Cancer or Estrogen
Positive Breast Cancer
Gene Expression Key
*No Significant Mouse Genes
Link: https://pinebio.shinyapps.io/app_faca/
