• An increasing number of long noncoding RNAs (lncRNAs) have been identified and
found to play key roles in mammary tumor development and may provide new
biomarkers and potential targets for future therapies.
• Our previous work demonstrates that insulin-like growth factor 1 (IGF1) signaling, which
is implicated in the initiation and progression of breast cancer, regulates the expression
levels of lncRNAs including SNHG7.
• We also demonstrated that SNHG7 controls proliferation through the regulation of a
large number of transcripts; however, the molecular mechanism by which SNHG7
regulates transcript levels is unknown.
• In TCGA data, overexpression of SNHG7 correlates with poor overall survival; however,
TCGA clinical data have only short-term patient follow-up.
• We aimed to determine if we could extract clinical information from the rich METABRIC
dataset that lacks RNAseq data—necessary to determine expression of previously
unknown lncRNAs—by using a Guilt-by-Association (GBA) model.
The lncRNA SNHG7 Regulates Transcript Levels and Cell Proliferation of Breast Cancer Cells Through Association
with HNRNPK and a Guilt-by-Association Method to Determine lncRNA Expression in METABRIC Data
Andrew Warburton1,2 and David Boone, Ph.D2,3
University of Pittsburgh1, Magee Women’s Research Institute2, University of Pittsburgh Medical Center3
Background Lab Results Cont.
Conclusion
• Wet lab results confirmed that SNHG7 interacts with HNRNPK.
• HNRNPK regulates proliferation.
• SNHG7 regulates RNA stability of certain IGF/SNHG7 target genes.
• With every GBA method tested, predicted gene-of-interest expression correlated
significantly with actual expression across all patient sets.
I would like to thank Dr. David Boone for guiding me along with these experiments. I
would also like to thank Dr. Adrian Lee and Women’s Cancer Institute Center for
providing the lab space and equipment to conduct all of these experiments and the
Department of Biomedical Informatics for their continued support of these experiments.
A special thank you to the Komen Foundation for providing funds to conduct this
experiment and the University of Pittsburgh for providing the opportunity for me to
conduct undergraduate research.
• Determine if HNRNPK regulation of transcript levels is global through RNAseq.
• Refine computational methods to assign gene signatures and scores for patient
tumors.
• Compare PPV and NPV of different computational methods and test on other
genes.
• Determine if GBA patient scores can be used to stratify patients to determine the
effects of lncRNAs on overall survival and disease-free survival.
Computational Modeling
[Figure (image not recovered). Labels: 1% Input, beads, Cd8-igf1r, SNHG7-NI, SNHG7-I. Marker values: 25, 37, 50, 75, 100. Band annotations: LDHA, RL6, HNRNH3?; HNRNPK, KPYM, CPNE1?]
http://www.sigmaaldrich.com/life-science/epigenetics/imprint-rna.html
[Figure: bar chart. Y-axis: Fold enrichment over IgG (0 to 10). Groups: SNHG7, RPL19. Series: IgG, HNRNPK.]
Computational Results
[Figure: proliferation time course. Y-axis: Proliferation Units (RFU), 0 to 160,000. X-axis: Days, 0 to 9. Series: siCtl, siSNHG7, siSNHG15, siHnRPK, siHnRPQ, siLDHA.]
• TCGA RNA sequence data
• Define genes to be used in gene signature
• Extract genes common to TCGA and METABRIC, plus gene of interest
• Logistic regression / differential gene expression
• Calculate gene signature score
• Compare positive predictive values (PPV) and negative predictive values (NPV)
• Compare all values against the METABRIC study to validate our predictive model
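The PPV/NPV comparison can be illustrated with a minimal Python sketch. It assumes each patient has a measured high/low label for the gene of interest and a predicted high/low label from the signature score. The function name `ppv_npv` and the toy labels are hypothetical and are not taken from the poster.

```python
def ppv_npv(actual, predicted):
    """Return (PPV, NPV) for paired boolean labels (True = high expression)."""
    pairs = list(zip(actual, predicted))
    tp = sum(1 for a, p in pairs if a and p)          # true positives
    fp = sum(1 for a, p in pairs if not a and p)      # false positives
    tn = sum(1 for a, p in pairs if not a and not p)  # true negatives
    fn = sum(1 for a, p in pairs if a and not p)      # false negatives
    ppv = tp / (tp + fp) if (tp + fp) else float("nan")
    npv = tn / (tn + fn) if (tn + fn) else float("nan")
    return ppv, npv


# Toy labels (hypothetical, for illustration only)
actual = [True, True, False, False, True, False]
predicted = [True, False, False, False, True, True]
ppv, npv = ppv_npv(actual, predicted)
print(f"PPV = {ppv:.3f}, NPV = {npv:.3f}")
```

PPV is the fraction of predicted-high patients who are truly high; NPV is the fraction of predicted-low patients who are truly low.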
DESeq2 OPTIMIZATION PROCESS
• Determining patients with high and low expression: High = more than 1 standard deviation above the mean; Low = more than 1 standard deviation below the mean.
• Determining cutoff for FDR values: FDR loop stepping values by factors of 10; FDR = 0.001.
• Determining fold change cutoff: fold change loop stepping values by 0.1; FC = 1.5.
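The high/low split and the two cutoff loops can be sketched in Python as below. This is an illustration only: the poster's differential expression step uses DESeq2, which is not reproduced here. The sketch assumes a results table of (gene, fold change, FDR) tuples, a symmetric fold change rule (up or down), and the sample standard deviation. All function names, grid ranges, and toy values are hypothetical.

```python
import statistics


def classify_expression(values):
    """Label each value 'high', 'low', or 'mid' using mean +/- 1 SD."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)  # sample SD (an assumption)
    labels = []
    for v in values:
        if v > mean + sd:
            labels.append("high")
        elif v < mean - sd:
            labels.append("low")
        else:
            labels.append("mid")
    return labels


def count_signature_genes(results, fdr_cutoff, fc_cutoff):
    """Count genes passing both cutoffs; fold change may be up or down."""
    return sum(
        1
        for _gene, fc, fdr in results
        if fdr <= fdr_cutoff and (fc >= fc_cutoff or fc <= 1 / fc_cutoff)
    )


# Toy differential expression results: (gene, fold change, FDR), hypothetical
results = [("g1", 2.0, 0.0005), ("g2", 1.2, 0.0005),
           ("g3", 0.4, 0.02), ("g4", 1.6, 0.2)]

fdr_grid = [10 ** -k for k in range(1, 6)]               # steps by factors of 10
fc_grid = [round(1.0 + 0.1 * i, 1) for i in range(1, 11)]  # steps of 0.1

# Signature size at every (FDR, FC) combination
grid = [(fdr, fc, count_signature_genes(results, fdr, fc))
        for fdr in fdr_grid for fc in fc_grid]

labels = classify_expression([0.0, 5.0, 5.0, 5.0, 10.0])
```

Scanning the grid shows how signature size changes as each cutoff tightens, which is one way to settle on values such as FDR = 0.001 and FC = 1.5.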
Building a Patient Classifier
Assigning Appropriate Patient Score
• Creighton Method: Assign a ±1 value to each gene according to whether it is positively or
negatively correlated, and create a Patient Score by correlating the ±1 values
with patient log2tpm values.
• ssGSEA: Using the ssGSEA package and GSVA analysis, create a GSEA Score
for each patient to plot against log2tpm.
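One reading of the Creighton-style score is sketched below in Python: the patient score is the Pearson correlation between the signature's ±1 vector and that patient's log2tpm values over the signature genes. The gene names, expression values, and function names are hypothetical, and any normalization the authors applied beforehand is not shown.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (nan if undefined)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return float("nan")
    return cov / math.sqrt(sxx * syy)


def creighton_score(signature, patient_log2tpm):
    """Correlate the +1/-1 signature pattern with one patient's expression.

    signature: dict gene -> +1 or -1
    patient_log2tpm: dict gene -> log2(TPM) value for this patient
    """
    genes = [g for g in signature if g in patient_log2tpm]
    signs = [signature[g] for g in genes]
    values = [patient_log2tpm[g] for g in genes]
    return pearson(signs, values)


# Hypothetical four-gene signature and two toy patients
signature = {"A": 1, "B": 1, "C": -1, "D": -1}
patient_high = {"A": 8.0, "B": 7.0, "C": 2.0, "D": 1.0}
patient_low = {"A": 1.0, "B": 2.0, "C": 7.0, "D": 8.0}

score_high = creighton_score(signature, patient_high)
score_low = creighton_score(signature, patient_low)
```

A patient whose expression follows the signature pattern scores near +1, and a patient with the opposite pattern scores near -1.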
Future Research
Acknowledgements
Goals
SNHG7 and HNRNPK are Necessary for Proliferation
Lab Results
1. Determine the mechanism by which SNHG7 regulates transcript levels and cell proliferation.
2. Determine if we can predict expression of lncRNAs from protein-coding genes using a
guilt-by-association method, so that we can use the rich METABRIC clinical data set.
Choosing Genes for Gene Signature
• Logistic Regression: Choose genes based on their Pearson scores and take the top
2000 genes according to absolute Pearson value.
• Differential Expression: Determine genes differentially expressed between tumors
with high levels of the gene of interest (GOI) vs low levels of GOI.
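The Pearson-based gene selection can be sketched in Python as follows. The sketch ranks every gene by the absolute value of its Pearson correlation with the gene of interest across patients and keeps the top n. The expression matrix, gene names, and the function `top_correlated_genes` are hypothetical. The sign of each correlation is kept because it supplies the ±1 value used in the Creighton-style score.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (nan if undefined)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return float("nan")
    return cov / math.sqrt(sxx * syy)


def top_correlated_genes(expr, goi, n=2000):
    """Return [(gene, r)] for the n genes with the largest |r| against goi.

    expr: dict gene -> list of expression values, one per patient
    """
    target = expr[goi]
    scores = {}
    for gene, values in expr.items():
        if gene == goi:
            continue
        r = pearson(values, target)
        if not math.isnan(r):  # skip genes with no variance
            scores[gene] = r
    ranked = sorted(scores, key=lambda g: abs(scores[g]), reverse=True)
    return [(g, scores[g]) for g in ranked[:n]]


# Toy expression matrix across four patients (hypothetical values)
expr = {
    "SNHG7": [1.0, 2.0, 3.0, 4.0],
    "G1": [2.0, 4.0, 6.0, 8.0],  # strongly positive
    "G2": [4.0, 3.0, 2.0, 0.0],  # strongly negative
    "G3": [5.0, 1.0, 4.0, 2.0],  # weak
}
top = top_correlated_genes(expr, "SNHG7", n=2)
```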
[Figure panels: method for assigning patient score (Creighton, ssGSEA) for TCGA patients and METABRIC patients. R values as extracted, panel assignment not recoverable: R = 0.5512439, 0.6565751, 0.5619235, 0.6026541, 0.5061578, 0.6176028, 0.5911964, 0.4844419]
SNHG7 Associates with HNRNPK
SNHG7 Regulates Transcript Stability