This document summarizes an RNA-seq analysis that compared gene expression between wild-type and fibrosis samples from mouse muscle tissue. The analysis used the DESeq2 package to normalize counts and to perform quality control, unsupervised clustering, and differential expression analysis. It identified around 6,000 genes with a log2 fold change not equal to 0, then subset those with a Benjamini-Hochberg adjusted p-value less than 0.05 as significantly differentially expressed between the two conditions. The next steps mentioned are further analysis of these significant genes for systems biology networks, gene ontology terms, and pathway enrichment.
2. RNA-seq Technology
Differential Gene Expression Analysis
In this study, I used the DESeq2 package, which models counts with a negative binomial distribution.
FASTQ files of reads from NGS tools were collected, with quality scores in the range 33-40 (ASCII characters B to I).
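The link between the quality scores and the ASCII characters can be checked with a small sketch. It assumes the common Phred+33 encoding, which is consistent with scores 33-40 mapping to the characters B to I; the function names and the example quality string are hypothetical.

```python
def phred33_scores(quality_string):
    """Decode a Phred+33 quality string into integer quality scores."""
    # Assumption: Phred+33 encoding, so score = ASCII code - 33.
    return [ord(ch) - 33 for ch in quality_string]


def error_probability(score):
    """Probability that a base call with this Phred score is wrong."""
    return 10 ** (-score / 10)


# Hypothetical quality string covering the characters B through I.
scores = phred33_scores("BCDEFGHI")
print(scores)                      # [33, 34, 35, 36, 37, 38, 39, 40]
print(error_probability(40))       # about 1 error in 10,000 base calls
```

A score of 33 or higher means an estimated error rate below roughly 1 in 2,000 per base.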
R Cloud
The counts for each sample group are merged together.
The files are read from this directory in R Cloud
and merged into one CSV file.
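The merge step can be sketched as follows. This is only an illustration in Python, not the R code used in the analysis; the sample names, gene names, and the choice to fill a missing gene with 0 are all assumptions.

```python
import csv
import io


def merge_counts(per_sample):
    """Merge {sample: {gene: count}} into a header and gene-by-sample rows."""
    samples = sorted(per_sample)
    genes = sorted(set().union(*(per_sample[s].keys() for s in samples)))
    header = ["gene_id"] + samples
    # Assumption: a gene absent from a sample's table has a count of 0.
    rows = [[g] + [per_sample[s].get(g, 0) for s in samples] for g in genes]
    return header, rows


def to_csv(header, rows):
    """Serialize the merged table as CSV text."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(header)
    writer.writerows(rows)
    return buf.getvalue()


# Hypothetical per-sample count tables.
per_sample = {
    "normal_1": {"geneA": 10, "geneB": 0},
    "fibrosis_1": {"geneA": 25, "geneC": 3},
}
header, rows = merge_counts(per_sample)
print(to_csv(header, rows))
```

The result is one table with genes as rows and samples as columns, which is the layout DESeq2 expects for its count matrix.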
Treatment: SMOC2 knockdown
Wild type vs. fibrosis
Species: Mus musculus
Gene Expression Omnibus
5. RNA-seq Workflow
47,729 genes
• Normalization: data normalization based on library depth, RNA construction, and read length.
• Quality Control (Unsupervised Clustering): quality check and filtering of outliers that cause variance in the data.
• DE Analysis
• Shrinkage of Log2 Fold Change: shrinkage of the data from the fitted model, using a threshold in the MA plot.
• Significant Differentially Expressed Genes: subsetting genes with a Benjamini-Hochberg adjusted p-value < 0.05.
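The normalization step in the workflow above can be illustrated with a minimal sketch of the median-of-ratios method, which is DESeq2's default way of estimating size factors and corrects for library depth and RNA composition. The code below is a plain Python re-implementation for illustration, not the DESeq2 code itself; the toy counts are made up.

```python
import math
from statistics import median


def size_factors(counts):
    """Median-of-ratios size factors; counts is a list of per-gene rows."""
    n_samples = len(counts[0])
    # Use only genes with a nonzero count in every sample.
    log_rows = [[math.log(c) for c in row]
                for row in counts if all(c > 0 for c in row)]
    # Log of the geometric mean of each gene across samples.
    log_geo_means = [sum(row) / n_samples for row in log_rows]
    factors = []
    for j in range(n_samples):
        log_ratios = [row[j] - g for row, g in zip(log_rows, log_geo_means)]
        factors.append(math.exp(median(log_ratios)))
    return factors


def normalize(counts, factors):
    """Divide each raw count by its sample's size factor."""
    return [[c / f for c, f in zip(row, factors)] for row in counts]


# Toy data: sample 2 was sequenced twice as deeply as sample 1.
counts = [[10, 20], [100, 200], [5, 10], [0, 3]]
factors = size_factors(counts)
print(factors)
print(normalize(counts, factors))
```

After normalization, the first three genes have equal values in both samples, so the difference in sequencing depth no longer looks like a difference in expression.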
11. Unsupervised Clustering
In the heatmap, all biological
replicates appear to cluster
together. Moreover, no outliers are
found in the PCA.
This validates treating condition in
the metadata as the major source of
variation.
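A clustering heatmap of this kind is typically drawn from a sample-to-sample correlation matrix. The sketch below assumes Pearson correlation on log2(count + 1) values as a simple stand-in for the transformation used in the analysis, which the slides do not specify; the sample order and counts are hypothetical.

```python
import math


def log2_plus_one(column):
    """Log-transform one sample's counts to dampen the effect of large values."""
    return [math.log2(c + 1) for c in column]


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length vectors."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


def sample_correlations(counts):
    """Correlation matrix between samples; counts rows are genes."""
    columns = [log2_plus_one(col) for col in zip(*counts)]
    return [[pearson(a, b) for b in columns] for a in columns]


# Hypothetical columns: normal_1, normal_2, fibrosis_1, fibrosis_2.
counts = [
    [10, 12, 100, 110],
    [200, 190, 20, 25],
    [50, 55, 52, 48],
    [5, 6, 60, 70],
]
corr = sample_correlations(counts)
for row in corr:
    print([round(v, 3) for v in row])
```

Replicates of the same condition correlate more strongly with each other than with samples from the other condition, which is the pattern a heatmap would show as two blocks.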
12. Fit Model
• On the next slide, the original
data points (black dots)
appear to follow the fitted
model, and an inverse relation
between mean and
dispersion is seen, as
expected in RNA-seq
analysis.
• A Wald test is performed.
The normal condition is taken as
the baseline. The significance
level (alpha) is set to 0.05, as
is standard.
• Still, about 10k genes were filtered
out, so I set a log2 fold change
threshold of 0.32 to cut off
more insignificant genes.
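The Wald test and the effect of the 0.32 threshold can be sketched as below. This is a simplified illustration that assumes the log2 fold change estimate is approximately normal with a known standard error; the threshold form (testing whether the absolute fold change exceeds the threshold) is an assumption about how the threshold was applied, and the numbers are made up.

```python
import math


def upper_tail(z):
    """Upper-tail probability of the standard normal distribution."""
    return 0.5 * math.erfc(z / math.sqrt(2))


def wald_pvalue(lfc, se, threshold=0.0):
    """Two-sided Wald p-value for |log2 fold change| > threshold."""
    z = (abs(lfc) - threshold) / se
    return min(1.0, 2.0 * upper_tail(z))


# A log2 threshold of 0.32 corresponds to roughly a 1.25-fold change.
print(2 ** 0.32)

# Hypothetical gene: log2 fold change 1.0 with standard error 0.2.
print(wald_pvalue(1.0, 0.2))                  # tested against 0
print(wald_pvalue(1.0, 0.2, threshold=0.32))  # tested against 0.32
```

With the threshold, the same gene gets a larger p-value, so genes with small fold changes are less likely to be called significant.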
13.
14. MA Plot
Around 6,000 genes are found
with a log2 fold change ≠ 0.
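Each point on an MA plot is a gene, with mean normalized expression on the x-axis and log2 fold change on the y-axis. The sketch below computes one such point from normalized counts. It uses a plain ratio of group means with a pseudocount, which is an assumption made for simplicity and not the shrunken estimate DESeq2 reports; the counts are hypothetical.

```python
import math


def ma_point(normal, fibrosis, pseudocount=1.0):
    """Return (mean expression, log2 fold change) for one gene."""
    all_counts = normal + fibrosis
    base_mean = sum(all_counts) / len(all_counts)
    mean_normal = sum(normal) / len(normal)
    mean_fibrosis = sum(fibrosis) / len(fibrosis)
    # Assumption: the pseudocount avoids division by zero for unexpressed genes.
    lfc = math.log2((mean_fibrosis + pseudocount) / (mean_normal + pseudocount))
    return base_mean, lfc


# Hypothetical normalized counts for one gene, normal as the baseline.
print(ma_point([7, 7], [15, 15]))  # (11.0, 1.0)
```

A positive value means higher expression in fibrosis than in the normal baseline.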
Md. Tabassum Hossain Emon
15.
16. Significant DE Genes
Genes were subset by Benjamini-Hochberg
adjusted p-value and stored in a
data frame.
The genes were then sorted by padj value.
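The Benjamini-Hochberg adjustment, subsetting, and sorting can be sketched as follows. This is a plain Python illustration with made-up gene names and p-values; DESeq2 also applies independent filtering before adjusting, which this sketch leaves out.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    n = len(pvalues)
    order = sorted(range(n), key=lambda i: pvalues[i])
    adjusted = [0.0] * n
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(n, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * n / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical genes and raw p-values.
genes = ["geneA", "geneB", "geneC", "geneD"]
pvalues = [0.01, 0.04, 0.03, 0.20]
padj = benjamini_hochberg(pvalues)

# Keep genes with padj < 0.05 and sort them by padj.
significant = sorted(
    ((g, q) for g, q in zip(genes, padj) if q < 0.05),
    key=lambda pair: pair[1],
)
print(padj)
print(significant)
```

In this toy example only geneA stays below 0.05 after adjustment, even though three raw p-values were below 0.05.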
Next
Significant genes can be further
analyzed for systems biology networks,
gene ontology terms for
• Molecular function
• Biological process
• Cellular component
and pathway enrichment analysis.