1) The document discusses how synonymous mutations, which do not change the protein sequence, are frequently found to act as drivers in carcinogenesis.
2) Using data from thousands of human cancer genomes, the author finds that oncogenes are enriched for synonymous mutations, showing synonymous mutations can be selected for during cancer development.
3) Further analysis suggests some synonymous mutations may impact mRNA folding, splicing, or microRNA binding sites and thus influence gene expression, providing a means for them to act as drivers.
Synonymous mutations as drivers in human cancer genomes.
1. Synonymous mutations - from
bacterial evolution to somatic
changes in human cancer
Fran Supek
1) Lehner group, CRG/EMBL Systems Biology Unit, Barcelona
2) Division of Electronics, RBI, Zagreb, Croatia
XXI Jornades de Biologia Molecular
Barcelona, 11.6.2014
Part 2: Synonymous mutations frequently act as drivers in carcinogenesis.
3. Synonymous mutations
• (some) synonymous mutations are subject to evolutionary pressures
• clearly shown for many bacteria and yeasts
• likely also higher Eukarya (but weaker signal)
• how does selection for/against synonymous changes relate to gene
function in (a) evolution of bacteria and (b) in carcinogenesis?
(a) evolutionary trace across ~1000 bacterial genomes: adaptation to diverse environments
(b) somatic mutations in ~4000 human cancers: malignant transformation
( plush microbes in photos are from http://www.giantmicrobes.com/ )
4. A deluge of human cancer genomic data
3851 cancer exomes from 11 tissues (>200 samples each)
292,405 missense and 123,193 synonymous somatic mutations
ARE THE SYNONYMOUS MUTATIONS SELECTED FOR IN
CARCINOGENESIS?
5. from Lawrence et al (2013) Nature. Mutation rate varies widely across the genome
and correlates with DNA replication time and expression level.
7. Drivers vs. passengers
•many somatic mutations in cancer = „passengers”
•a driver = a gene whose mutations confer a selective advantage.
Recurrently mutated (i.e. more often than expected)
1. For missense mutations, this could be measured using dN/dS
2. Intronic rates as a baseline: InVEx test, Hodis et al. (Cell 2012)
3. commonly: find background mutation frequencies for the patient from the
entire exome, then see if a gene is above that background
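The "above background" idea can be illustrated with a one-sided binomial test. This is a minimal sketch, not the test used in the talk or in InVEx: it assumes a single per-base background rate for the patient and treats every base of the gene as an independent trial. The function names and the numbers in the example are hypothetical.

```python
from math import comb

def binomial_upper_tail(k: int, n: int, p: float) -> float:
    """P(X >= k) for X ~ Binomial(n, p), computed as 1 - P(X <= k - 1)."""
    lower = sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k))
    return 1.0 - lower

def gene_excess(observed: int, gene_length: int, background_rate: float):
    """Expected mutation count in the gene and the P value for seeing
    at least `observed` mutations under the background rate."""
    expected = gene_length * background_rate
    p_value = binomial_upper_tail(observed, gene_length, background_rate)
    return expected, p_value

# Hypothetical numbers: 6 mutations in a 3000 bp gene, background 5e-4 per base.
expected, p_value = gene_excess(observed=6, gene_length=3000, background_rate=5e-4)
print(f"expected {expected:.2f}, observed 6, P = {p_value:.4f}")
```

A real analysis would use a background that varies with the regional mutation rate, which is why a single exome-wide rate is only a starting point.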
8. [Figure: the sets of cancer genes analyzed.
„classical” cancer genes (Cancer Gene Census): oncogenes activated by translocation (217), missense mutation (40) or copy number change (12); tumor suppressors, all mechanisms (84).
newly discovered, from cancer genomes: recurrently mutated genes (self-reported in literature), split into known cancer genes in the Census and others (counts shown: 336, 39, 38).
Overlap between missense-activated oncogenes and recurrently mutated (from literature) oncogenes (counts shown: 19, 18, 21).
Further panels: distributions of # mutations per 200 kb (110 cancers, pooled tissues) and of heterochromatin (H3K9me3 levels in 1 Mb windows), annotated with K-S statistics (D+, D-) and P values.]
Oncogenes get activated by missense mutations, duplications, translocations....
Tumor suppressors get inactivated by missense/nonsense mutations, deletions,
promoter methylation...
9. Detecting positive selection on synonymous mutations in cancer
[Figure: principal component analysis of gene features; axes are correlation to PC1 (30.4% variance) and correlation to PC2 (24.3%). Features: regional mutation rates (carcinoma, 1 Mb; non-carcinoma, 1 Mb; pooled, 200 kb; liver, 200 kb; liver, 1 Mb; breast, 1 Mb), H3K9me3 (1 Mb), RepliSeq (1 Mb), GC3, and mRNA levels (hypothalamus; liver; skeletal & heart muscle; 6 tissues).]
• create „matched sets” of genes closely following the oncogenes in:
• regional mutation rates
• In 1 Mb and 200 kb windows
• expression levels in different tissues
• Heterochromatin, replication timing
• G+C content
10. How to find a good set of genes?
A genetic algorithm. An optimization technique that can (relatively)
easily handle many criteria at once. Quite efficient. Many parameters.
Operators:
...crossover
...random mutation
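A toy version of such a genetic algorithm is sketched below. It is an illustration only, not the implementation used in the talk. It assumes that each gene is a tuple of feature values, that a candidate solution is a fixed-size subset of the pool of non-cancer genes, and that the objective is the summed difference in feature means (a simple stand-in). The population size, number of generations and mutation probability are arbitrary.

```python
import random

def mean(xs):
    return sum(xs) / len(xs)

def mismatch(subset, pool, target):
    """Sum over features of |mean(selected pool genes) - mean(target genes)|."""
    n_features = len(target[0])
    return sum(
        abs(mean([pool[i][f] for i in subset]) - mean([t[f] for t in target]))
        for f in range(n_features)
    )

def crossover(a, b, k, rng):
    """Child = k genes drawn from the union of both parents."""
    return frozenset(rng.sample(sorted(a | b), k))

def mutate(subset, pool_size, rng):
    """Swap one selected gene for a random gene outside the subset."""
    outside = [i for i in range(pool_size) if i not in subset]
    drop = rng.choice(sorted(subset))
    return frozenset((subset - {drop}) | {rng.choice(outside)})

def evolve(pool, target, k, pop_size=30, generations=60, seed=0):
    rng = random.Random(seed)
    population = [frozenset(rng.sample(range(len(pool)), k)) for _ in range(pop_size)]
    for _ in range(generations):
        population.sort(key=lambda s: mismatch(s, pool, target))
        parents = population[: pop_size // 2]  # the better half survives
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            child = crossover(a, b, k, rng)
            if rng.random() < 0.3:  # arbitrary mutation probability
                child = mutate(child, len(pool), rng)
            children.append(child)
        population = parents + children
    return min(population, key=lambda s: mismatch(s, pool, target))

# Made-up data: 200 background genes and 10 "cancer genes", 2 features each.
data_rng = random.Random(1)
pool = [(data_rng.random(), data_rng.random()) for _ in range(200)]
target = [(0.7 + 0.1 * data_rng.random(), 0.3 + 0.1 * data_rng.random()) for _ in range(10)]
best = evolve(pool, target, k=20)
print(len(best), round(mismatch(best, pool, target), 4))
```

Because the better half of each generation is kept unchanged, the best solution never gets worse from one generation to the next. Handling many criteria at once only requires adding terms to the objective.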
11. [Figure: distributions of regional mutation rates (# mutations per 200 kb; 110 cancers, pooled tissues), heterochromatin (H3K9me3 levels in 1 Mb windows) and other features, shown separately for oncogenes and for tumor suppressors; panels are annotated with one-sided K-S statistics (D+, D-) and P values.]
Distributions of regional mutation rates (1Mb and 200 kb), heterochromatin,
etc. in the optimized sets of non-cancer genes closely match the cancer genes.
Genetic algorithm tries to minimize the K-S statistic.
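The K-S statistic measures the largest gap between two empirical cumulative distributions. The sketch below computes the two one-sided versions; which direction is called D+ and which D- is an assumption here, as is the function name. A matched set is good when both values are close to 0 for every feature.

```python
def ks_one_sided(sample_a, sample_b):
    """Return (d_plus, d_minus): the largest amount by which the empirical
    CDF of sample_a lies above, and below, the empirical CDF of sample_b."""
    points = sorted(set(sample_a) | set(sample_b))
    d_plus = d_minus = 0.0
    for x in points:
        cdf_a = sum(v <= x for v in sample_a) / len(sample_a)
        cdf_b = sum(v <= x for v in sample_b) / len(sample_b)
        d_plus = max(d_plus, cdf_a - cdf_b)
        d_minus = max(d_minus, cdf_b - cdf_a)
    return d_plus, d_minus

# Made-up feature values: sample_a is shifted towards lower values.
d_plus, d_minus = ks_one_sided([1, 2, 3, 4], [3, 4, 5, 6])
print(d_plus, d_minus)
```

The usual two-sided statistic is the larger of the two values.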
12. [Figure: the PCA of gene features and the gene sets from the earlier slides, with distributions of # mutations per 200 kb (110 cancers, pooled tissues), heterochromatin (H3K9me3 levels in 1 Mb windows), replication timing (RepliSeq signal in 1 Mb windows; early vs. late) and mRNA levels (avg. of 6 tissues).]
39 oncogenes (recurrently mutated)
38 tumor suppressors (recurrently mutated)
matched sets of non-cancer genes:
1517 genes (for oncogenes)
693 genes (for tumor suppressors)
Expected: the oncogenes and the tumor
suppressors are highly enriched with
missense mutations (~1.5 - 2.5x).
However, the oncogenes are also enriched
with synonymous mutations over their
matched sets, ~1.2x.
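An enrichment such as "~1.2x" is a ratio of mutation densities. The sketch below shows the arithmetic under the assumption that density is counted per base of coding sequence. The counts and lengths are invented for illustration and are not data from the talk.

```python
def mutations_per_base(n_mutations: int, coding_length_bp: int) -> float:
    return n_mutations / coding_length_bp

def enrichment(n_cancer, len_cancer, n_matched, len_matched) -> float:
    """Mutation density in the cancer genes relative to the matched set."""
    return mutations_per_base(n_cancer, len_cancer) / mutations_per_base(n_matched, len_matched)

# Invented numbers: 120 synonymous mutations in 100 kb of oncogene coding
# sequence vs. 1000 in 1 Mb of matched-set coding sequence.
ratio = enrichment(n_cancer=120, len_cancer=100_000, n_matched=1_000, len_matched=1_000_000)
print(f"{ratio:.2f}x")
```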
13. Introns of oncogenes (from whole-
genome sequencing) are not enriched
with SNVs, compared to matched sets.
The matched sets method agrees with
InVEx, and with simply using
neighboring genes as a baseline.
14. Tissue-specific oncogenes are more enriched
with synonymous mutations in the
corresponding tissue.
This effect is not due to mutation
showers/clustered mutations, as the same
cancer samples don't tend to contain both a
synonymous and a missense mutation in the same
gene.
Synonymous enrichment in oncogenes
is detectable across cancer types.
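One way to check the mutation-shower explanation is to ask whether synonymous and missense mutations in a gene co-occur in the same samples more often than chance would predict. The sketch below uses a hypergeometric upper tail (a one-sided Fisher's exact test). It is an assumed way of framing the check, not necessarily the one used in the talk, and the counts are invented.

```python
from math import comb

def cooccurrence_p(n_samples: int, n_syn: int, n_mis: int, n_both: int) -> float:
    """P(overlap >= n_both) when n_syn samples carry a synonymous mutation,
    n_mis carry a missense mutation, and the two are independent."""
    total = comb(n_samples, n_mis)
    upper = min(n_syn, n_mis)
    favourable = sum(
        comb(n_syn, k) * comb(n_samples - n_syn, n_mis - k)
        for k in range(n_both, upper + 1)
    )
    return favourable / total

# Invented counts: 500 samples, 10 with a synonymous and 30 with a missense
# mutation in the gene; 5 samples carrying both would be a surprising excess.
p_excess = cooccurrence_p(500, 10, 30, 5)
print(f"P(overlap >= 5) = {p_excess:.5f}")
```

A large P value for the observed overlap is what one expects if the two kinds of mutation arise independently rather than in a single clustered event.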
15. Some oncogenes are more highly
enriched with synonymous mutations
than others, e.g. PDGFRA, EGFR, GATA1,
ELN, NTRK1, JAK3, ALK and others (n=16).
The synonymous SNV enrichment in
these genes is not paralleled by
intronic SNV enrichment.
16. The synonymous mutations tend to cluster together to a similar extent as the
missense mutations in the affected oncogenes. They also (less prominently)
cluster with missense mutations.
17. What could the synonymous mutations do?
Use of „optimal codons”, miRNA binding sites, secondary structures in mRNA
[Figure panels:
• % of synonymous mutations leading to optimal codon gain, optimal codon loss or no change (n.s.)
• mRNA folding free energy around mutated sites (kcal/mol), w.t. vs. mut. mRNA, in 50 nt and 100 nt windows
• net # of gained miRNA seed sites per syn. mutation (whole cDNA; sites w/ phyloP > 1.0), 16 oncogenes vs. matched set
• distance of synonymous mutations to splice sites (≤30 nt, 31-70 nt, >70 nt), p < 10^-4
• % syn. mutations (within 30 nt of splice site) leading to enhancer/silencer gain or loss: Ke et al. 2012 hexamers (p = 0.02), RESCUE-ESE (p = 0.003), FAS-hex2 (p = 3·10^-4)
• EDNRB gene, colorectal cancer: log2 RPKM of exon vs. exon # in transcript ENST00000334286; 30 random samples w/o point mutations vs. 6 samples w/ synonymous exonic mutations
• normalized difference (Glass' delta) between properties of mutated positions in oncogenes vs. matched set (t-test, FDR)
• actual synonymous mutations vs. randomized mutation positions]
18. Use of „optimal codons”, miRNA binding sites, secondary structures in mRNA
[Figure: the same panels as on slide 17.]
No general effect was detected in any of these cases (although they may still
be important in specific examples).
19. Exonic Splicing Enhancer (ESE) and Exonic Splicing Silencer (ESS)
From Cartegni, Chew & Krainer. Nat Rev Genet. 2002
3(4),285-98.
AGAAGA enh
GAAGAT enh
GACGTC enh
GAAGAC enh
....
CTTTTA sil
CTTTAA sil
TAGGTA sil
TAGTAG sil
20. Synonymous SNVs tend
to be closer to splice
sites in oncogenes.
They also tend to cause gains of known exonic
splicing enhancer motifs, and losses of exonic
splicing silencer motifs.
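Gains and losses of splicing motifs can be found by comparing the hexamers that overlap a mutated position before and after the change. The sketch below uses only the eight example hexamers listed on the earlier slide (the real enhancer and silencer sets are far larger). The function names and the test sequences are made up for illustration.

```python
ENHANCERS = {"AGAAGA", "GAAGAT", "GACGTC", "GAAGAC"}
SILENCERS = {"CTTTTA", "CTTTAA", "TAGGTA", "TAGTAG"}

def hexamers_covering(seq: str, pos: int, k: int = 6) -> set:
    """All k-mers of seq that overlap the 0-based position pos."""
    start = max(0, pos - k + 1)
    end = min(pos, len(seq) - k)
    return {seq[i:i + k] for i in range(start, end + 1)}

def motif_changes(seq: str, pos: int, alt: str) -> dict:
    """Enhancer/silencer hexamers created or destroyed by changing seq[pos] to alt."""
    mutant = seq[:pos] + alt + seq[pos + 1:]
    before = hexamers_covering(seq, pos)
    after = hexamers_covering(mutant, pos)
    return {
        "enhancer_gain": sorted((after - before) & ENHANCERS),
        "enhancer_loss": sorted((before - after) & ENHANCERS),
        "silencer_gain": sorted((after - before) & SILENCERS),
        "silencer_loss": sorted((before - after) & SILENCERS),
    }

# Made-up sequences: a T>A change that creates enhancer hexamers,
# and a G>C change that destroys the silencer hexamer TAGGTA.
gain = motif_changes("CCAGAAGTCC", pos=7, alt="A")
loss = motif_changes("GGTAGGTAGG", pos=4, alt="C")
print(gain["enhancer_gain"], loss["silencer_loss"])
```

The comparison works on sets of hexamers, so a motif that is destroyed in one window but still present in another overlapping window is not reported as lost.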
21. They more often affect
exons with weaker
(noncanonical) splice
sites.
The exonic splicing
enhancers created may
resemble SF2/ASF motifs.
The ESS sites that are lost
upon mutation
sometimes resemble
hnRNP A2/B1, H2 and A1
motifs.
22. Roughly ½ of the putatively causal synonymous mutations alter
splicing, as evidenced by examining RNA-seq data from cancer.
We don't (yet) know what the other ½ is doing. One possibility
is that they affect protein folding.
23. In yeast: Pechmann & Frydman, Nature Struct Mol Biol 2013
[Figure panels:
• actual synonymous mutations vs. randomized mutation positions, by secondary structure: α-helix, 1st a.a.; α-helix, middle; α-helix, last a.a.; coil (annotated values: 1.43, p = 0.05; 1.12, n.s.; 0.79, n.s.)
• α-helix parts: middle; next to coil only; next to β-sheet (annotated values: 0.97, 1.01, 2.60; p = 4·10^-5)
• positions N'', N', Ncap, Ccap, C', C'' along an α-helix and the adjacent turn
• normalized difference (Glass' delta) between mutated sites in oncogenes vs. matched set, FDR <10%: relative preference value at C-cap; normalized frequency of turn in all-α class; α-helix indices for α-proteins; relative preference value at N'; relative preference value at N''; normalized frequency of α-helix in all-α class]
...also in cancer: we observe an
enrichment of synonymous
mutations at N-termini of alpha-
helices, esp. if close to beta-sheets.
Suggestive of effects on folding.
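The figure on this slide reports normalized differences as Glass' delta: the difference between two group means divided by the standard deviation of the control group. In the sketch below the matched set is taken to be the control group, which is an assumption. The values are invented.

```python
from statistics import mean, stdev

def glass_delta(treatment, control) -> float:
    """(mean(treatment) - mean(control)) / sample SD of the control group."""
    return (mean(treatment) - mean(control)) / stdev(control)

# Invented property values at mutated positions.
delta = glass_delta(treatment=[2, 3, 4], control=[1, 2, 3])
print(delta)
```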
24. The TP53 gene has a large excess of synonymous mutations, which are
always near splice sites.
We found three examples of recurrent SNVs that inactivate the nearby splice
site.
[Figure: TP53 synonymous mutations near splice sites, labeled as known or novel; one is annotated "causes a frameshift".]
26. Take-home messages:
• oncogenes contain an excess of synonymous mutations in human
cancers
• a subset of synonymous mutations target splicing motifs
• 1/5 to 1/2 of the synonymous mutations in oncogenes reported to date are
acting as driver mutations
• ~6-8% of all driver mutations due to single nucleotide changes are likely to be
synonymous mutations
• TP53 has recurrent synonymous mutations that disrupt splice sites
• an excess of mutations in 3’ UTRs of dosage-sensitive genes
published in: Supek et al. (2014) Cell. http://dx.doi.org/10.1016/j.cell.2014.01.051
28. Thank you!
End of Part 2. Part 1 deals with inferring microbial gene function from
evolutionary change in codon biases, and is available separately.