Invited presentation at Fred Hutch Cancer Epigenetics Affinity Group (CEAG) meeting in Fall 2017, on the topic of esophageal adenocarcinoma and Barrett's esophagus methylation subtype discovery and characterization.
Cancer Epigenetics Affinity Group Meeting Presentation
1. Identification of Novel Molecular Characteristics of
Methylation Subtypes in Esophageal
Adenocarcinoma by Integrated Analysis
Sean Maden
Research Data Analyst Assistant
Grady Lab, CRD
Fred Hutch
Cancer Epigenetics Affinity Group Meeting, Nov 7th 2017
2. Acronyms and Definitions
Tissue Types
• BE : Barrett’s Esophagus, mucinous columnar epithelium that develops in
response to chronic bile and acid exposure (gastroesophageal
reflux), often includes goblet cells
• HGD/LGD : high-/low-grade dysplasia
• EAC : Esophageal adenocarcinoma, rare but with recent rapid
incidence increases and poor prognosis
Methylation subtypes
• CIMP : CpG Island Methylator Phenotype
• HM / IM / LM / MM : high / intermediate / low / minimal methylator
subtype
3. Barrett’s Esophagus Progression Sequence
• Histologically distinct
progression sequence from
Barrett’s to dysplasia to
Esophageal Adenocarcinoma
• Challenges with identification
and prognosis for high vs. low
grade dysplasia (HGD/LGD)
• Barrett’s patients at greatly
increased risk of Esophageal
Adenocarcinoma [1]
Img source: Jain and Dhingra. “Pathology of esophageal cancer and Barrett’s esophagus”. 2017 Ann Cardiothorac Surg Mar; 6(2): 99-109
1. Runge et al. “Epidemiology of Barrett’s Esophagus and Esophageal Adenocarcinoma”. 2015 Gastroenterol. Clin. North. Am. 44(2): 203-231
A. Nondysplastic Barrett’s
B. Barrett’s with low grade dysplasia
C. Barrett’s with high grade dysplasia
D. Barrett’s with intra-mucosal adenoma
4. Risk Factors in Barrett’s and Esophageal
Adenocarcinoma
• Risk factors for Barrett’s: male, Caucasian, middle/old-age,
gastroesophageal reflux disease (GERD)
• Barrett’s and High Grade Dysplasia are the most important risk factors
for Esophageal Adenocarcinoma (virtually by definition)
1. Runge et al. “Epidemiology of Barrett’s Esophagus and Esophageal Adenocarcinoma”. 2015 Gastroenterol. Clin. North. Am. 44(2): 203-231
5. Goals For Improved Screening
• Barrett’s and Esophageal adenocarcinoma can both be asymptomatic,
and thus not detected until they have considerably progressed
• Esophageal adenocarcinoma often found at progressed stages, and
thus hard to treat
• To improve screening efficacy, improve power to predict risk of
progression from BE to EAC, and enable stratification of patients and
treatment plans based on this risk
• This background motivates efforts to characterize subtypes in BE and
EAC that are of relevance to biological understanding and treatment in
the clinic.
6. Molecular Subtypes in EAC: The
Importance of Methylation
• Methylation is the best-studied epigenetic signature. Promoter
methylation can suppress gene expression
• Illumina HM450K BeadChip is a microarray that assays ~480,000
CpG loci (CG-dinucleotides) enriched for regions of interest (Islands,
genes, etc.)
• Array methylation is measured by the Beta-value: the ratio of methylated
signal to total signal (methylated + unmethylated signal)
=> Note Beta-value scale for methylation heatmaps
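The Beta-value definition above can be written out as a short calculation. This is a minimal sketch, not the array software's implementation: the function name and the `offset` parameter are illustrative assumptions. The default offset of 0 matches the ratio as stated on the slide; Illumina-style calculations commonly add a small constant (100) to the denominator to stabilize low-intensity probes.

```python
def beta_value(methylated, unmethylated, offset=0.0):
    """Beta = methylated / (methylated + unmethylated + offset).

    `offset` is an illustrative assumption; 0 reproduces the plain ratio
    of methylated signal to total signal.
    """
    total = methylated + unmethylated + offset
    if total <= 0:
        raise ValueError("total signal must be positive")
    return methylated / total


# A probe with 3000 units of methylated and 1000 units of unmethylated signal.
print(beta_value(3000, 1000))
```

Beta-values fall between 0 (unmethylated) and 1 (fully methylated), which is the scale used on the methylation heatmaps.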
7. Recent literature 1 of 2: molecular subtypes
in pan-gastroesophageal cancer analysis
• Location-specific molecular
heterogeneity throughout
esophagus and stomach
• Importance of CpG Island
Methylator Phenotype (CIMP)
in lower-esophagus
• EAC distinguishable from
esophageal squamous cell
carcinoma and non-
chromosome-unstable gastric
cancers
Image modified from:
The Cancer Genome Atlas Research Network. “Integrated genomic characterization of oesophageal carcinoma” (2017) Nature 541, 169-175.
8. Recent literature 2 of 2: EAC subtypes in Krause et al. 2016
[Figure axes: hypervariable methylation array probes; samples]
9. Present work, background
• Discovery cohort of Barrett’s and Esophageal Adenocarcinoma
patients
• Methylation arrays for nondysplastic Barrett’s, Esophageal
Adenocarcinoma, Normal Squamous tissues from Barrett’s
patients, and fundus and cardia from Barrett’s patients
• Goal: Identify subtypes using methylation arrays, then validate
and characterize with other platforms
10. Determining Methylation Subtypes: Workflow*
• Array prep, preprocessing, normalization, filtering, batch correction
• Most variable CpG probes (MVPs) in Esophageal Adenocarcinoma
• Recursive partition mixture model (RPMM) clustering on MVPs (no
predetermined k / number of clusters!)
• Cluster subtypes (overall methylation level differences): High = HM,
Intermediate = IM, Low = LM, Minimal = MM
• Integrative and Orthogonal Analyses
*Strategy adapted from Hinoue et al. 2012
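The MVP selection step can be illustrated with a small sketch. It assumes that "most variable" means highest variance of Beta-values across tumor samples and that a fixed number of top probes is kept; both the variability measure and the cutoff `n_top` are assumptions, and the probe IDs and values are made up. The RPMM clustering step itself is not reproduced here.

```python
from statistics import pvariance


def most_variable_probes(betas, n_top):
    """Return the n_top probe IDs with the highest Beta-value variance.

    betas: dict mapping probe ID -> list of Beta-values across tumor samples.
    """
    ranked = sorted(betas, key=lambda probe: pvariance(betas[probe]), reverse=True)
    return ranked[:n_top]


# Hypothetical probes measured in four tumor samples.
betas = {
    "cg_a": [0.10, 0.90, 0.15, 0.85],  # strongly variable
    "cg_b": [0.50, 0.52, 0.49, 0.51],  # nearly constant
    "cg_c": [0.20, 0.60, 0.30, 0.70],  # moderately variable
}
top = most_variable_probes(betas, n_top=2)
print(top)
```

Restricting clustering to variable probes removes loci that carry little information for separating samples.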
11. Present work context: methylation subtypes
in Barrett’s and Esophageal Adenocarcinoma
cont.
• Illumina HM450K
BeadChip Arrays
• Clustering on
Most Variable
CpG probes
derived in cancer
patients
• Four subtypes:
HM, IM, LM, and
MM
12. Validated Methylator Subtypes in The Cancer
Genome Atlas (TCGA) Samples
• N = 87
Esophageal
Adenocarcinoma
patients from
The Cancer
Genome Atlas
(TCGA)
• Cancer patients
have matched
tumor and
normal samples
13. What are the molecular characteristics of
the methylator subtypes?
• Methylation is a molecular signature – are there independent
signatures of clinical and biological relevance?
• Signatures of interest include protein levels, expression of
mRNA and noncoding RNA, alteration frequencies
• Next: integrated analysis of gene expression and methylation
data in The Cancer Genome Atlas (TCGA) samples
14. Integrated Analysis: Differential epigenetic
repression (promoter methylation)
Workflow diagram (steps numbered 1-6 on the slide):
• Data types: Methylation, Expression, Integrated
• PREPROCESSING: RNAseq (via Firehose); Methylation (in-house)
• PREP/CONVERSION: Log2FC conversion; Mean promoter CpG methylation
• PRE-FILTERS: Mean Methyl.T-subtype - Mean Methyl.Normal >= 0.2;
Mean Expr.T-subtype <= -1
• CORRELATIONS: Spearman correlation; Filter correlation #1 (Rho < 0);
Filter correlation #2 (p-adj < 0.1)
• Loci for testing
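The filter chain in this workflow can be sketched on per-gene summary values. This is an illustration under stated assumptions, not the analysis code used for the talk: the dictionary keys and gene names are hypothetical, the example numbers are made up, and the slide does not name the p-value adjustment method, so Benjamini-Hochberg is assumed here and applied to the genes that survive the earlier filters.

```python
def bh_adjust(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def select_repressed(genes, min_delta_beta=0.2, max_log2fc=-1.0, max_padj=0.1):
    """Apply the pre-filters, then the two correlation filters."""
    # Pre-filters: promoter methylation gain and expression loss in the subtype.
    pre = [g for g in genes
           if g["meth_tumor"] - g["meth_normal"] >= min_delta_beta
           and g["log2fc"] <= max_log2fc]
    # Correlation filter #1: keep anti-correlated loci only.
    anti = [g for g in pre if g["rho"] < 0]
    # Correlation filter #2: adjusted p-value threshold (BH is an assumption).
    padj = bh_adjust([g["p"] for g in anti])
    return [g["gene"] for g, q in zip(anti, padj) if q < max_padj]


# Hypothetical per-gene summaries for one tumor subtype.
genes = [
    {"gene": "G1", "meth_tumor": 0.70, "meth_normal": 0.20, "log2fc": -2.1, "rho": -0.62, "p": 0.001},
    {"gene": "G2", "meth_tumor": 0.35, "meth_normal": 0.30, "log2fc": -1.5, "rho": -0.40, "p": 0.020},
    {"gene": "G3", "meth_tumor": 0.60, "meth_normal": 0.25, "log2fc": -1.2, "rho": 0.10, "p": 0.600},
    {"gene": "G4", "meth_tumor": 0.80, "meth_normal": 0.30, "log2fc": -1.8, "rho": -0.30, "p": 0.300},
    {"gene": "G5", "meth_tumor": 0.55, "meth_normal": 0.30, "log2fc": -0.4, "rho": -0.50, "p": 0.010},
]
print(select_repressed(genes))
```

In this toy input, G2 fails the methylation pre-filter, G5 fails the expression pre-filter, G3 is positively correlated, and G4 fails the adjusted p-value threshold.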
16. Data Types for Integrated Analysis
• Illumina HM450K BeadChip methylation arrays, preprocessed
by lab (version 1 data obtained)
• Illumina HiSeq V2 RNAseq and miRNAseq data, prepared by Firehose***, for
mRNA/ncRNA and miRNA/miR, respectively
***https://gdac.broadinstitute.org/
17. Data Access: Version 1 Methylation Arrays
via GDC website and GDC client
• GitHub tutorial repository:
metamaden/gdc_download_tools
18. Preparation and Analysis of Methylation
Arrays
• GitHub profile:
metamaden
• methyPre library for array
preprocessing
20. Mapping CpGs to Genome Regions, GitHub resources
• Map CpGs to Regions:
metamaden/methyIntegratoR
• Map CpGs to Regions:
metamaden/cgmappeR
23. Differential epigenetic repression in tumor
subtypes: mRNA, hypotheses
• Expect differences in
dynamic range
(methylation and
expression) and in anti-
correlation (Spearman
test, Rho, pvalue)
[Figure axes: Expression (Log2FC); Methylation (mean promoter CpG, Beta-value)]
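The anti-correlation measure named above, Spearman's Rho, is the Pearson correlation of the ranks of the two variables. The sketch below computes Rho only (no p-value) using average ranks for ties; the function names and the example values are illustrative assumptions.

```python
def average_ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the run while the next sorted value ties with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared_rank = (start + end) / 2 + 1
        for k in range(start, end + 1):
            ranks[order[k]] = shared_rank
        start = end + 1
    return ranks


def spearman_rho(x, y):
    """Spearman's Rho: Pearson correlation computed on ranks."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mean_x, mean_y = sum(rx) / n, sum(ry) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    var_x = sum((a - mean_x) ** 2 for a in rx)
    var_y = sum((b - mean_y) ** 2 for b in ry)
    if var_x == 0 or var_y == 0:
        raise ValueError("Rho is undefined when one variable is constant")
    return cov / (var_x * var_y) ** 0.5


# Hypothetical gene: expression falls as promoter methylation rises.
promoter_beta = [0.1, 0.3, 0.5, 0.7, 0.9]
expression_log2fc = [1.2, 0.4, -0.3, -1.1, -2.0]
rho = spearman_rho(promoter_beta, expression_log2fc)
print(rho)
```

A perfectly monotone decreasing relationship, as in this toy input, gives a Rho at the negative end of the range, which is the pattern expected for epigenetic repression.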
24. Differential epigenetic repression in tumor
subtypes: mRNA, findings
• Both common
(subtype-
independent) and
unique genes are
epigenetically
repressed
25. Differential epigenetic repression in tumor
subtypes: mRNA, summary
• HM shows more epigenetically repressed genes, including HUNK1,
PTPN13, TUSC1, RGS6: tumor suppressors and cell growth genes
• Low (LM) and minimal (MM) subtypes show little or no substantial
epigenetic repression
• Some shared repressed genes, but most are subtype-specific
26. Differential epigenetic repression in tumor
subtypes: ncRNA, notes
• Use “NR” filter (non-protein-coding) on RefSeq Accession
Pros:
1. Queries multiple non-coding
transcript classes (linc-RNA,
pseudogenes, alternate transcripts,
etc.);
2. Accession is readily identified from
the platform annotation
Cons:
1. Filter out alternate transcripts due to
ambiguous mapping to RNAseq data
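The “NR” filter relies on the RefSeq accession prefix: NR_ marks non-coding RNA records, while NM_ marks protein-coding mRNA records. A minimal sketch of that prefix filter follows; the accession strings are made-up placeholders, not real annotation entries, and the function name is an assumption.

```python
def is_noncoding_refseq(accession):
    """True for RefSeq accessions with the NR_ (non-coding RNA) prefix."""
    return accession.startswith("NR_")


# Placeholder accessions for illustration only (not real records).
accessions = ["NM_000001.1", "NR_000002.1", "NR_000003.2", "NM_000004.3"]
noncoding = [acc for acc in accessions if is_noncoding_refseq(acc)]
print(noncoding)
```

Because the prefix is part of the accession itself, the filter needs no lookup beyond the platform annotation.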
27. Differential epigenetic repression in tumor
subtypes: ncRNA, findings
• High methylator: 4
lincRNAs (1 shared
with IM), 2
pseudogenes
• C6orf155 (lincRNA in
breast, ovarian, and
pancreatic cancer)
• PLAC2/TINCR
(lincRNA in liver and
gastric cancer)
[Figure axes: Expression (log2FC); Promoter Methylation (Beta-value)]
28. Differential epigenetic repression in tumor
subtypes: ncRNA, summary
• Most differentially regulated “NR” transcripts in high methylator
(HM) subtype (lincRNA and pseudogenes)
• One shared repressed locus between high and intermediate
subtypes
• No differentially regulated loci in minimal methylator subtype
29. Differential epitranscriptomic regulation:
miRNA/miR workflow
Workflow diagram (steps numbered 1-5 on the slide):
• Data types: Expression (miR), Expression (mRNA)
• PREPROCESSING: miRNAseq (via Firehose); mRNAseq (via Firehose)
• PREP AND PRE-FILTER: Log2FC; Mean Expr.T-subtype <= -1
• Differential Expression Testing: Differentially Expressed miR
Sequences; Differentially Expressed Targets (DETGs)
• miR-mRNA Target Interactions (Data Mining, DEmiR database)
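The Log2FC conversion and the -1 pre-filter can be sketched as follows. The slide does not state the reference group or how zero counts are handled, so comparing against normal samples and adding a pseudocount of 1 are both assumptions; the function name and values are illustrative.

```python
import math


def log2_fold_change(tumor_values, normal_values, pseudocount=1.0):
    """log2 of mean tumor-subtype expression over mean reference expression.

    The pseudocount (an assumption) avoids taking the log of zero.
    """
    mean_tumor = sum(tumor_values) / len(tumor_values)
    mean_normal = sum(normal_values) / len(normal_values)
    return math.log2((mean_tumor + pseudocount) / (mean_normal + pseudocount))


# Hypothetical expression values for one transcript.
lfc = log2_fold_change([3, 5, 4], [20, 18, 19])
passes_prefilter = lfc <= -1  # at least a two-fold decrease
print(lfc, passes_prefilter)
```

A Log2FC of -1 corresponds to expression halving relative to the reference, so the pre-filter keeps transcripts with at least a two-fold decrease.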
30. Differential epitranscriptomic regulation:
miRNA/miR, notes
• Review: miRNAs regulate mRNA post-transcriptionally, pre-
translationally
• miRNAs are processed from a “precursor” stem-loop
structure/sequence into “mature” miR sequences that target mRNA
sequences
• Here, focus on miR and mRNA expression
35. miR-mRNA Target Compromise: PrImiR
• Consensus of miR-mRNA
interactions from 5
databases:
metamaden/PrImiR
Source and background: https://github.com/metamaden/PrImiR
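One way to form a consensus across several interaction databases is to keep the miR-mRNA pairs reported by at least a minimum number of sources. The sketch below shows that idea only; it is not the PrImiR implementation, and the database labels, pairs, and the `min_support` threshold are all hypothetical.

```python
from collections import Counter


def consensus_interactions(databases, min_support):
    """Return (miR, mRNA) pairs reported by at least min_support databases.

    databases: dict mapping a database label -> iterable of (miR, mRNA) pairs.
    """
    counts = Counter()
    for pairs in databases.values():
        counts.update(set(pairs))  # count each pair once per database
    return {pair for pair, n in counts.items() if n >= min_support}


# Hypothetical contents of five interaction databases.
databases = {
    "db1": [("miR-A", "GENE1"), ("miR-A", "GENE2")],
    "db2": [("miR-A", "GENE1"), ("miR-B", "GENE3")],
    "db3": [("miR-A", "GENE1"), ("miR-A", "GENE2")],
    "db4": [("miR-B", "GENE3")],
    "db5": [("miR-A", "GENE1"), ("miR-A", "GENE1")],
}
consensus = consensus_interactions(databases, min_support=3)
print(consensus)
```

Raising `min_support` trades sensitivity for confidence: fewer pairs survive, but each is backed by more independent sources.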
36. Differentially Expressed miRs Continued
• hsa-miR-134-5p and hsa-miR-200a-3p show increased
expression coupled with reduced expression of EGFR (proto-
oncogene, and a mutual target mRNA) in low and minimal
methylator subtypes
• Gene Overrepresentation analysis of ontological terms with
PANTHER, performed on the consensus mRNA targets (N = 40
mRNA targets for miR-134-5p; N = 1180 targets for miR-200a-
3p)
• Enrichment for developmental pathways (both miRs),
regulation of metabolism and differentiation (miR-200a-3p)
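Overrepresentation of an ontology term among a target list can be framed as a hypergeometric tail probability: the chance of seeing at least the observed overlap if targets were drawn at random from the background. The sketch below shows that calculation with made-up counts. It is an assumption-laden illustration; PANTHER's own test and multiple-testing correction may differ.

```python
from math import comb


def overrepresentation_pvalue(n_background, n_term, n_targets, n_overlap):
    """P(X >= n_overlap) for a hypergeometric X.

    n_background: genes in the background set
    n_term:       background genes annotated with the term
    n_targets:    genes in the target list
    n_overlap:    target genes annotated with the term
    """
    upper = min(n_term, n_targets)
    total = comb(n_background, n_targets)
    tail = sum(
        comb(n_term, k) * comb(n_background - n_term, n_targets - k)
        for k in range(n_overlap, upper + 1)
    )
    return tail / total


# Hypothetical counts: 40 targets, 5 of which carry a term found in
# 200 of 20000 background genes.
p = overrepresentation_pvalue(20000, 200, 40, 5)
print(p)
```

With many terms tested at once, the resulting p-values would still need a multiple-testing correction before interpretation.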
40. Differentially Expressed Target Genes
[Figure axes: Mean Expression Difference, HM-MM (Log2FC); -log(p-value)]
• Differential target mRNA
expression across subtypes
41. Differential miRNA expression summary
• Greater differential miR expression between high (HM) and
low/minimal methylator (LM, MM) subtypes than between high
(HM) and intermediate (IM) subtypes.
• Differential miR-mRNA target correlations across subtypes
(DEmiRs and DETGs).
• Evidence of EGFR suppression in LM/MM by multiple miRs, and
evidence of additional repressive forces acting on TUSC1 and
PTPN13 (both epigenetically repressed by promoter
methylation in HM).
42. Additional Subtype Molecular Data
• Platforms: whole exome sequencing, immunohistochemical
assay, and targeted alteration microarray
• High methylator: Higher frequency of ERBB2+ and ARID1A-,
higher frequencies of genome-wide mutations (particularly
small insertions/deletions), and upregulation of HER2
• Intermediate methylator: Increased frequency of CDK6+ and
co-amplification with ERBB2
• Low methylator: Lowest mutational frequency genome-wide
• Minimal methylator: Increased MDM2+
43. What’s Next?
• Data mine the Cancer Cell Line Encyclopedia (CCLE) database
• Perform wet lab investigation of clinical relevance of adenocarcinoma
subtypes (Dx sensitivity, knock out/in gene mRNA, etc.)
44. Acknowledgements
• Thanks for the support of current and former members of Grady Lab, and to Dr. Grady
for the invitation to speak
• Dr. Bill Grady
• Dr. Ming Yu
• Kelly Carter
• Tai Heinzerling
• Yuna Guo
• Thanks to our collaborators at Fred Hutch, Harvard, the Broad Institute, Case
Western, and UW
• Dr. Matt Stachler
• Dr. Adam Bass
• Dr. Georg Luebeck
• Dr. Bill Hazelton
• Thanks to facilitators of the BETRNet consortium
• Dr. Amitabh Chak
• Dr. Joe Willis
• Dr. Andrew Kaz
GitHub groups: GradyLab, Fred Hutch
Username: metamaden