Biogroup
1. An Integrated Approach for
Mining Precise RNA based
Diagnostic and Prognostic
Cervical Cancer Biomarkers
PRESENTED BY
DR. SATARUPA BANERJEE (BT17IPF01), INSTITUTE POSTDOCTORAL FELLOW
UNDER GUIDANCE OF PROF. D KARUNAGARAN
IIT MADRAS
2. Cervical Cancer Definition and Statistics
• Cervical cancer is cancer that begins in the uterine
cervix, the lower end of the uterus that contacts the
upper vagina
3. FIGO Staging of Cervical Carcinoma
FIGO staging is a classification system of the International
Federation of Gynecology and Obstetrics (in French, Fédération
Internationale de Gynécologie et d'Obstétrique).
4. Long non-coding RNAs (lncRNAs) are a subgroup of
non-coding RNAs that are >200 nucleotides in length
and may be implicated in various types of gene
regulation, including transcriptional, post-transcriptional
or epigenetic regulation
The biological roles of lncRNAs have largely been
underestimated.
They modulate gene expression but their expression is
not yet considered for diagnostics in cervical cancer
staging because of low GC content (low expression
level)
They have been identified to be associated with multiple
types of cancer, including CC
Dysregulation and Functional Roles of long non-
coding RNAs in Cervical Cancer
Peng, L., et al., LncRNAs: key players and novel
insights into cervical cancer. Tumor Biology,
2016. 37(3): p. 2779-2788.
5. Global Challenge
Because personalized medicine is still very costly, implementation of
genome wide biomarker selection is still in its infancy in
low and middle income group countries.
FIGO staging of cervical cancer is based on visual clinical
assessment of cervical cancer progression by the physician in
different anatomical locations of the tumour.
Since this is completely subjective, a thorough understanding of
the molecular deregulation of signalling pathways and devising
objective methods for staging of cervical cancer are needed
Previous studies have integrated genomic and molecular
information, including HPV status, to understand cervical cancer, but
analysis of stage specific changes in the transcriptome and lncRNA
profile is yet to be performed
Most of the studies performed to date are ethnicity specific.
6. Working Hypothesis
Global selection of minimal number of lncRNAs along with mRNAs for differentiating stages
during cervical cancer progression followed by their survival analysis may be important towards
cost minimization of healthcare diagnostics and increasing its prognostic ability.
Role of lncRNAs during cervical cancer progression can be identified using co-expression lncRNA-
mRNA network analysis for pathway enrichment
Role of miRNA families during cervical cancer progression can be identified using co-expression
mRNA-miRNA network analysis using selected mRNAs
Identification of a biomarker panel having both diagnostic and prognostic value with a minimal number of
genes will help the community.
Integration of molecular information with FIGO staging can be
more helpful for deciding the therapeutic intervention strategy further.
7. Objectives
Identification of minimal number of lncRNAs and mRNAs from publicly available RNAseq data
which can delineate progressing stages of cervical cancer as two class disease conditions with
high sensitivity and specificity.
Identification of common mRNAs differentially expressed in more than one microarray data
during FIGO stage based cervical cancer progression in ethnicity independent manner
Identification of role of miRNA families during cervical cancer progression from mRNA-miRNA
co-expression network analysis using selected mRNAs
Survival analysis of the selected lncRNAs, mRNAs and enriched miRNA candidates as well as
oncoprint analysis of selected lncRNA and mRNAs .
Pathway and Gene Ontology analysis of selected mRNAs
Identification of role of identified lncRNAs during cervical cancer progression from lncRNA-mRNA co-
expression network analysis for pathway enrichment
8. Methodology
Biomarker selection for FIGO based stage specific classification of cervical cancer
(Using Supervised Machine Learning Classifiers, after SVM weight based Feature Ranking followed by Sequential Feature Reduction)
(mRNAs having RSEM values more than 1000 in minimum 100 cases were considered)

Find common DE mRNAs from each dataset
(Using GEO2R, select DE mRNAs with p value <0.05 and log FC ± 2)
(For stage I and stage II, stage II and stage III) (For stage III and stage IV)

Oncoprint Analysis
Pathway and GO Analysis
(Information mining of mRNAs from Genecards)
miRNA Family Enrichment
(Survival analysis of enriched miRNAs)

Patient Number
            TANRIC  TCGA
Stage I     120     162
Stage II    35      70
Stage III   30      46
Stage IV    7       21
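The GEO2R selection step and the search for common DE mRNAs can be sketched as follows. This is only a minimal illustration, assuming each microarray dataset has been exported as rows of (gene, log FC, p value) and reading the cut-off as p < 0.05 with |log FC| ≥ 2. The function names `select_de` and `common_de` and the `min_datasets` parameter are hypothetical and do not come from the slides.

```python
def select_de(rows, p_cutoff=0.05, lfc_cutoff=2.0):
    """Split one dataset's rows into up- and down-regulated gene sets.

    rows: iterable of (gene, log_fc, p_value) tuples (assumed export format).
    """
    up, down = set(), set()
    for gene, log_fc, p_value in rows:
        if p_value >= p_cutoff:
            continue  # not significant at the chosen p value cut-off
        if log_fc >= lfc_cutoff:
            up.add(gene)
        elif log_fc <= -lfc_cutoff:
            down.add(gene)
    return up, down


def common_de(datasets, min_datasets=2):
    """Genes changed in the same direction in at least min_datasets datasets."""
    up_counts, down_counts = {}, {}
    for rows in datasets:
        up, down = select_de(rows)
        for gene in up:
            up_counts[gene] = up_counts.get(gene, 0) + 1
        for gene in down:
            down_counts[gene] = down_counts.get(gene, 0) + 1
    common_up = {g for g, n in up_counts.items() if n >= min_datasets}
    common_down = {g for g, n in down_counts.items() if n >= min_datasets}
    return common_up, common_down
```

Setting `min_datasets=2` mirrors the objective of keeping mRNAs that are differentially expressed in more than one microarray dataset; a stricter intersection across all datasets would use `min_datasets=len(datasets)`.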
9. Results
Selected mRNA and lncRNAs

Stage I and Stage II
Microarray Common Up Genes: KRT5, SLC2A1, GJA1, ACTG2, CD24, MYH11, ADM, MMP7, HBA2, KRT16, ADH7, SOX4, PTK7, RARRES3, S100A3, CXCL10, NKG7, CD247, COL7A1, LAG3, FBLN1, LAMC2, SLC6A8, PMEPA1, MXRA5, MT1X, PTHLH, LAMP3, PDPN, DST, HLA-DPB1
Microarray Common Down Genes: EPCAM, TSPAN8, CD55, TMC5, LIMCH1, ANK2, BBS7, BAMBI, COL8A1, RAB11FIP2, PF4
TCGA mRNA: ARL2BP, BLVRB, CDK18, CREG1, DTX2, EIF2AK4, HDHD1A, HMGCS1, HSPA8, ITPA, KIAA0430, KRT10, LDHA, NES, NRIP1, PAK2, PNPLA6, POLR2A, RBM3, TRIM25, WASL, WLS and CHURC1
TCGA lncRNA: ENSG00000270462.1, ENSG00000250057.1, ENSG00000250751.1, ENSG00000247516.3, ENSG00000250433.1, ENSG00000254762.1, ENSG00000258609.1, ENSG00000258658.1, ENSG00000230427.1, ENSG00000225234.1, ENSG00000267992.1, ENSG00000260510.1, ENSG00000265094.1, ENSG00000264421.1, ENSG00000230866.1, ENSG00000263316.1

Stage II and Stage III
Microarray Common Up Genes: GAGE5, GAGE2B, GAGE4
Microarray Common Down Genes: SCGB2A1, PIGR, TFF3, KCNMA1, SGCD, HOXA11
TCGA mRNA: BAIAP2L1, CARHSP1, CD68, DUSP9, FLYWCH1, FOS, HMGCS1, IFI6, PITPNB, THAP4, TMPRSS11D, TRAF4, WBP5, ZDHHC3, ERAP1, TGM2
TCGA lncRNA: ENSG00000250328.1, ENSG00000235215.2, ENSG00000232325.3, ENSG00000250850.2, ENSG00000261298.1, ENSG00000268066.1, ENSG00000254975.1, ENSG00000258609.1, ENSG00000268095.1, ENSG00000254900.1

Stage III and Stage IV
Microarray Common Genes: AKR1C2
TCGA mRNA: ANXA5, ASH2L, BARX2, DDR1, EMP2, ESRRA, FOXM1, ITPRIP, NR1D1, PER3, PITPNA, STRAP, TCF19, TM7SF3, TOM1L2, TRIB1, XRCC1, ZC3HAV1, MYCBP, SQLE
TCGA lncRNA: ENSG00000232287.2, ENSG00000234076.1, ENSG00000227487.3, ENSG00000234584.1, ENSG00000273287.1, ENSG00000236262.1
10. Classification Result Table
Stage I and Stage II

For mRNA classification
Classifier    CA      Sens    Spec    AUC     Prec    Recall  Brier
SVM           0.866   0.8951  0.8     0.8842  0.911   0.8951  0.2383
kNN           0.7072  0.8704  0.3286  0.7371  0.75    0.8704  0.4877
Naïve Bayes   0.7629  0.858   0.5429  0.8135  0.8129  0.858   0.347

For lncRNA classification
Classifier    CA      Sens    Spec    AUC     Prec    Recall  Brier
SVM           0.8583  0.9167  0.6571  0.8542  0.9016  0.9167  0.2253
kNN           0.7413  0.85    0.3714  0.7424  0.8226  0.85    0.4083
Naïve Bayes   0.8833  0.9083  0.8     0.8576  0.9397  0.9083  0.2142

Stage II and Stage III

For mRNA classification
Classifier    CA      Sens    Spec    AUC     Prec    Recall  Brier
Naïve Bayes   0.7235  0.8     0.6087  0.7564  0.7568  0.8     0.4078
kNN           0.7167  0.7714  0.6304  0.8407  0.7606  0.7714  0.462
SVM           0.871   0.8571  0.8913  0.8907  0.923   0.8571  0.2483

For lncRNA classification
Classifier    CA      Sens    Spec    AUC     Prec    Recall  Brier
SVM           0.8952  0.8857  0.9     0.9528  0.9118  0.8857  0.1828
kNN           0.75    0.7714  0.7333  0.8861  0.7714  0.7714  0.3886
Naïve Bayes   0.8929  0.9143  0.8667  0.9167  0.8889  0.9143  0.2409

Stage III and Stage IV

For mRNA classification
Classifier    CA      Sens    Spec    AUC     Prec    Recall  Brier
Naïve Bayes   0.7786  0.8261  0.6667  0.8858  0.8444  0.8261  0.3424
kNN           0.8071  0.913   0.5714  0.8133  0.8235  0.913   0.3397
SVM           0.9405  0.9348  0.9524  0.9717  0.977   0.9348  0.1348

For lncRNA classification
Classifier    CA      Sens    Spec    AUC     Prec    Recall  Brier
SVM           0.975   1       0.8571  0.9619  0.9677  1       0.0959
kNN           0.9     0.9667  0.5714  0.8976  0.9062  0.9667  0.1686
Naïve Bayes   0.975   1       0.8571  0.9714  0.9677  1       0.087
11. mRNA – miRNA Enrichment Analysis
• miR-30 (Yellow), miR-17 (Green), let-7 (pink), miR-
130 (Cyan) families were found to be enriched
• miR-30 (Yellow) and miR-17 (Green) families were
found to be enriched
mRNA selected from
microarray
mRNA selected from TCGA
12. Reactome 2016 Pathway Analysis
• when cut-off was considered to be Z Score
< 1.95, p value <0.05 and combined score
<10 in best 10 enriched pathways
mRNA selected from microarray
mRNA selected from TCGA
13. GO Molecular Function 2017b Analysis
• when cut-off was considered to be Z
Score < 2, p value <0.05 and combined
score < 10 in best 10 enriched pathway
mRNA selected from microarray
mRNA selected from TCGA
14. GO BP, MF 2017b and Reactome 2016
Analysis of all selected mRNAs (panels a, b, c)
mRNAs shown in royal blue colour
with yellow border are associated
with R-HSA-2022090
• Receptor tyrosine kinase signalling is
prolonged due to E6 oncoprotein,
where EGFR internalization is caused by
GRB2 (Sprangle et al, 2013).
• Independently and synergistically with
estrogen HPV oncogenes also
dysregulate associated collagen and
ECM dynamics via transcriptional
regulation (Spurgeon et al, 2017)
15. Oncoprint Analysis of selected mRNAs
mRNA selected from TCGA
mRNA selected from microarray
• mRNAs selected to differentiate stage I and stage II are found to be
altered in 87 (46%) of 191 sequenced cases/patients, of
which PAK2 (18%), HMGCS1 (7%) and HSPA8 (7%) possessed more
than or equal to five percent of genetic alteration.
• mRNAs selected to differentiate stage II and stage III are found to be
altered in 58 (30%) of patients, of which HMGCS1 (7%) and
THAP4 (5%) possessed more than or equal to five percent of
genetic alteration.
• mRNAs selected to differentiate stage III and stage IV are found to be
altered in 59 (31%) of patients, of which BARX2
(6%), PER3 (5%) and SQLE (5%) possessed more than or equal to five
percent of genetic alteration.
• Oncoprint Analysis showed that all selected mRNAs are altered in
116 (61%) of 191 sequenced cases/patients.
• ANK2 (7%), DST (10%), LAMP3 (17%), MXRA5 (8%), SLC6A8
(6%), COL7A1 (7%) and MMP7 (10%) possessed more than five
percent of genetic alteration.
16. Wiki-pathway Enrichment of Selected lncRNAs and their Co-
expressed mRNA via Integrated Statistical Analysis
• Cytoplasmic Ribosomal Proteins pathway and Electron
Transport chain pathway were found to be the most enriched
pathways associated with selected lncRNAs from TCGA data.
• Oxidative phosphorylation, proteasome degradation and TCR
Signalling Pathway were also enriched in cervical cancer
progression, with lesser significance.
How can we validate the result?
Associated literature mining suggested that
• E6 was found to activate genes associated with electron transport
chain and oxidative phosphorylation pathway (Evans et al, 2016).
• E6 and E7 oncoproteins are known to inactivate p53 through
proteasomal degradation in cervical cancer (Yim et al, 2005).
• Immunity pathways are activated by HPV, which modulates toll like
receptor (TLR) signalling pathways, and the associated inflammatory
response promotes carcinogenesis (Yang et al, 2017)
• Red Edge - positively correlated
• Blue Edge- negatively correlated
• Thickness of the edge is
proportional to the
enrichment score
• GO term nodes, coloured on a
yellow to red scale,
according to the GO term cumulative enrichment value.
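The red and blue edge colouring in the legend can be illustrated with a small sketch. It assumes that co-expression between a lncRNA and an mRNA is scored by the Pearson correlation of their expression values across matched samples, which the slides do not state explicitly. The helper names `pearson` and `edge_colour` are hypothetical, and the sketch does not model the enrichment score that sets edge thickness.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length expression vectors."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def edge_colour(r):
    """Map the sign of a correlation to the legend's edge colour."""
    if r > 0:
        return "red"   # positively correlated
    if r < 0:
        return "blue"  # negatively correlated
    return "none"      # no linear association
```

For example, two expression profiles that rise together across samples give a positive coefficient and a red edge, while profiles that move in opposite directions give a blue edge.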
17. Survival Plots
HMGCS1
HSPA8
SHANK-AS1 hsa-miR-30e-3p
• From microarray, out of 52 DE genes, 10 genes were
found to be prognostic markers, namely
CD24, ADM, RARRES3, NKG7, CD247, LAG3, LAMC2,
PMEPA1, KCNMA1 and SLC2A1.
• mRNAs enriched in collagen assembly pathway in
Reactome (LAMC2, MMP7, DST, COL7A1 and
COL8A1) in combination can act as a prognostic
marker (p value = 0.040)
• HMGCS1 and HSPA8 were found to be the
prognostic biomarkers having more than or equal to
five percent of genetic alteration.
• From the selected candidates of enriched miRNA
families, hsa-miR-30e-3p (p=0.029) was found to
be a survival marker.
• One selected lncRNA, ENSG00000236262.1, also
known as SHANK-AS1 (p=0.0007) was also found
to be a survival marker.
Combined plot for LAMC2,
MMP7, DST,COL7A1,
COL8A1
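The survival comparisons behind these plots can be sketched with a Kaplan-Meier estimate per expression group. This is a minimal illustration only: it assumes patients are split at the median expression of a gene, which the slides do not specify, and the function names `split_by_median` and `kaplan_meier` are hypothetical. It does not compute the p values reported above, which would need a test such as the log-rank test.

```python
from statistics import median


def split_by_median(expression):
    """Split patient IDs into high and low groups at the median expression.

    expression: dict mapping patient ID to an expression value.
    """
    cut = median(expression.values())
    high = [p for p, v in expression.items() if v > cut]
    low = [p for p, v in expression.items() if v <= cut]
    return high, low


def kaplan_meier(times, events):
    """Return [(time, survival probability)] at each distinct event time.

    times: follow-up time per patient.
    events: 1 if the event (death) was observed, 0 if censored.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        leaving = 0
        # Collect every patient whose follow-up ends at time t.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            leaving += 1
            i += 1
        if deaths:
            survival *= 1 - deaths / at_risk
            curve.append((t, survival))
        at_risk -= leaving  # censored patients leave the risk set too
    return curve
```

Running `kaplan_meier` once for the high group and once for the low group gives the two step curves that a survival plot compares.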
18. Conclusions
mRNAs identified from microarray can be used as biomarkers for differentiating FIGO specific cervical cancer stages in
ethnicity independent manner.
Minimal number of mRNAs and lncRNAs identified from TCGA can be used as biomarkers for differentiating FIGO
specific cervical cancer stages with more than 85% accuracy.
SVM outperformed the other classifiers for mRNA based classification, while Naïve Bayes did so for lncRNA based classification.
miR-30 and miR-17 families were found to be enriched in the mRNA-miRNA co-expression networks built using mRNAs
selected from both TCGA and microarrays.
Cytoplasmic Ribosomal Proteins pathway and Electron Transport chain pathway were found to be the most enriched
pathways associated with selected lncRNAs from lncRNA-mRNA co-expression network analysis for pathway
enrichment .
HMGCS1 and HSPA8 were found to be the prognostic biomarkers from selected diagnostic biomarkers having more than
or equal to five percent of genetic alteration.
Non-coding RNAs, miR-30e-3p as miRNAs and ENSG00000236262.1 (SHANK-AS1) as lncRNA were also found to be
important prognostic biomarkers in FIGO based cervical cancer progression.
19. Future Work
Identification of grade specific, HPV status specific and age specific markers for cervical cancer.
Validation of protein expression pattern of identified biomarkers in tissue microarray.
Validation of mRNA expression in tissue samples using qRT-PCR and lncRNA expression using in situ hybridization
Implementation of deep learning algorithms for improving classification efficacy during marker selection
Implementation of proposed pipeline in other cancer models
20. References
1. The Cancer Genome Atlas Research Network, Integrated genomic and molecular characterization of cervical cancer. Nature, 2017. 543(7645): p. 378-384.
2. Richard Boland, C., Non-coding RNA: It’s Not Junk. Digestive Diseases and Sciences, 2017. 62(5): p. 1107-1109.
3. Deng, S.P., L. Zhu, and D.S. Huang, Predicting Hub Genes Associated with Cervical Cancer through Gene Co-Expression Networks. IEEE/ACM Transactions on Computational Biology and Bioinformatics, 2016. 13(1): p. 27-35.
4. Li, J., et al., TANRIC: An Interactive Open Platform to Explore the Function of lncRNAs in Cancer. Cancer Res, 2015. 75(18): p. 3728-37.
5. Demšar, J., et al., Orange: data mining toolbox in Python. J. Mach. Learn. Res., 2013. 14(1): p. 2349-2353.
6. Steinfeld, I., et al., ENViz: a Cytoscape App for integrated statistical analysis and visualization of sample-matched data with multiple data types. Bioinformatics, 2015. 31(10): p. 1683-5.
7. Anaya, J., OncoLnc: linking TCGA survival data to mRNAs, miRNAs, and lncRNAs. PeerJ Computer Science, 2016. 2: p. e67.
8. Xu, Z., et al., Investigation of differentially-expressed microRNAs and genes in cervical cancer using an integrated bioinformatics analysis. Oncol Lett, 2017. 13(4): p. 2784-2790.
9. Dong, J., et al., Long non-coding RNAs on the stage of cervical cancer (Review). Oncol Rep, 2017. 38(4): p. 1923-1931.
10. Huang, J., et al., Identification of lncRNAs by microarray analysis reveals the potential role of lncRNAs in cervical cancer pathogenesis. Oncol Lett, 2018. 15(4): p. 5584-5592.
11. Zhu, H., et al., Long non-coding RNA expression profile in cervical cancer tissues. Oncology Letters, 2017. 14(2): p. 1379-1386.
12. Peng, L., et al., LncRNAs: key players and novel insights into cervical cancer. Tumor Biology, 2016. 37(3): p. 2779-2788.
13. Yang, X., Y. Cheng, and C. Li, The role of TLRs in cervical cancer with HPV infection: a review. Signal Transduction and Targeted Therapy, 2017. 2: p. 17055
14. Evans, W., et al., Overexpression of HPV16 E6* Alters β-Integrin and Mitochondrial Dysfunction Pathways in Cervical Cancer Cells. Cancer Genomics - Proteomics, 2016. 13(4): p. 259-273.
15. Yim, E.-K. and J.-S. Park, The Role of HPV E6 and E7 Oncoproteins in HPV-associated Cervical Carcinogenesis. Cancer Research and Treatment: Official Journal of Korean Cancer Association, 2005. 37(6): p. 319-324.
21. Thank You
Prof D Karunagaran, my mentor and Head of Department of Biotechnology
Prof Karthik Raman for his meaningful insights to improve the work.
All the faculty members of “Bio-Group”
Finally….
22. Confusion Matrix
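TCGA mRNA classification (rows: actual stage, columns: predicted stage)

            Stage I   Stage II   Total
Stage I     145       17         162
Stage II    13        57         70
Total       158       74         232

            Stage II  Stage III  Total
Stage II    60        10         70
Stage III   5         41         46
Total       65        51         116

            Stage III Stage IV   Total
Stage III   43        3          46
Stage IV    1         20         21
Total       44        23         67

TANRIC lncRNA classification (rows: actual stage, columns: predicted stage)

            Stage I   Stage II   Total
Stage I     109       11         120
Stage II    7         28         35
Total       116       39         155

            Stage II  Stage III  Total
Stage II    31        4          35
Stage III   3         27         30
Total       34        31         65

            Stage III Stage IV   Total
Stage III   30        0          30
Stage IV    1         6          7
Total       31        6          37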
Sensitivity = true positives/(true positives + false negatives)
Specificity = true negatives/(true negatives + false positives)
ACC = (TP + TN)/(TP + FP + FN + TN)
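The formulas can be checked against the first Stage I and Stage II matrix on this slide (145 and 17 in the Stage I row, 13 and 57 in the Stage II row). The sketch below assumes that rows are actual stages, columns are predicted stages, and the earlier stage is treated as the positive class; the function name `binary_metrics` is hypothetical.

```python
def binary_metrics(tp, fn, fp, tn):
    """Sensitivity, specificity, accuracy and precision from the four counts."""
    total = tp + fn + fp + tn
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "accuracy": (tp + tn) / total,
        "precision": tp / (tp + fp),
    }


# Stage I taken as the positive class (an assumption):
# 145 Stage I cases predicted Stage I, 17 predicted Stage II,
# 13 Stage II cases predicted Stage I, 57 predicted Stage II.
metrics = binary_metrics(tp=145, fn=17, fp=13, tn=57)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

Under these assumptions the sensitivity works out to 145/162, which rounds to 0.8951, the value reported for the SVM mRNA classifier in the classification result table.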
23. Classifiers Used
Naive Bayes methods are a set of supervised learning algorithms based on
applying Bayes’ theorem with the “naive” assumption of independence between
every pair of features.
A Support Vector Machine (SVM) is a discriminative classifier formally
defined by a separating hyperplane. In other words, given labeled training data
(supervised learning), the algorithm outputs an optimal hyperplane which
categorizes new examples.
K nearest neighbors (KNN) is a simple algorithm that stores all available cases
and classifies new cases based on a similarity measure (e.g., distance
functions).
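As an illustration of the similarity-based idea behind kNN, here is a minimal sketch using Euclidean distance and majority voting. It is not the classifier configuration used in the study: the function name `knn_predict`, the choice of `k=3` and the two-feature example values are all hypothetical.

```python
import math
from collections import Counter


def knn_predict(train_x, train_y, query, k=3):
    """Label a query point by majority vote among its k nearest training cases."""
    distances = sorted(
        (math.dist(x, query), label) for x, label in zip(train_x, train_y)
    )
    votes = Counter(label for _, label in distances[:k])
    return votes.most_common(1)[0][0]


# Hypothetical expression values for two features per sample.
train_x = [(1.0, 1.2), (0.8, 1.0), (1.1, 0.9), (3.0, 3.2), (3.1, 2.9), (2.8, 3.0)]
train_y = ["Stage I", "Stage I", "Stage I", "Stage II", "Stage II", "Stage II"]
print(knn_predict(train_x, train_y, (1.0, 1.0)))
```

An odd `k` avoids tied votes in a two-class problem such as the stage pairs compared here.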
26. Survival
Gene       EXPRESSION  SURVIVAL  P value
Microarray
SLC2A1     LOW         HIGH      0.044
CD24       LOW         HIGH      0.018
ADM        LOW         HIGH      0.040
RARRES3    HIGH        HIGH      0.001
NKG7       HIGH        HIGH      0.019
CD247      HIGH        HIGH      0.004
LAG3       HIGH        HIGH      0.002
LAMC2      LOW         HIGH      0.015
PMEPA1     LOW         HIGH      0.000
KCNMA1     HIGH        HIGH      0.009
TCGA
HMGCS1     LOW         HIGH      0.039
HSPA8      LOW         HIGH      0.015
LDHA       LOW         HIGH      0.004
RBM3       HIGH        HIGH      0.034
WASL       LOW         HIGH      0.048
27. RELAPSE
Gene       EXPRESSION  SURVIVAL  P value
TCGA
ARL2BP     LOW         HIGH      0.022
LDHA       HIGH        HIGH      0.018
CD68       HIGH        HIGH      0.017
IFI6       LOW         HIGH      0.043
TRAF4      LOW         HIGH      0.050
WBP5       HIGH        HIGH      0.010
MICROARRAY
ADH7       HIGH        HIGH      0.036
SLC6A8     LOW         HIGH      0.016
CD55       LOW         HIGH      0.004
PF4        LOW         HIGH      0.005