Synonymous mutations - from
bacterial evolution to somatic
changes in human cancer
Fran Supek
1) Lehner group, CRG/EMBL Systems Biology Unit, Barcelona
2) Division of Electronics, RBI, Zagreb, Croatia
XXI Jornades de Biologia Molecular
Barcelona, 11.6.2014
Part 2: Synonymous mutations frequently act as drivers in carcinogenesis.
synonymous mutations =
changes in the gene sequence
that don’t alter the protein sequence
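The definition above can be illustrated with a minimal Python sketch, assuming the standard genetic code and a DNA coding sequence already split into in-frame codons. The function name `is_synonymous` and its arguments are hypothetical and serve only as an illustration.

```python
from itertools import product

# Standard genetic code, listed in the conventional T, C, A, G ordering.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def is_synonymous(codon, pos, new_base):
    """Return True if changing codon[pos] to new_base keeps the amino acid.

    `codon` is an in-frame DNA triplet, `pos` is 0, 1 or 2.
    """
    mutated = codon[:pos] + new_base + codon[pos + 1:]
    return CODON_TABLE[codon] == CODON_TABLE[mutated]
```

For example, `is_synonymous("GAA", 2, "G")` compares GAA with GAG (both glutamate), whereas `is_synonymous("GAA", 0, "A")` compares GAA with AAA (glutamate vs. lysine).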
Synonymous mutations
• (some) synonymous mutations are subject to evolutionary pressures
• clearly shown for many bacteria and yeasts
• likely also higher Eukarya (but weaker signal)
• how does selection for/against synonymous changes relate to gene
function in (a) evolution of bacteria and (b) in carcinogenesis?
(a) evolutionary trace across ~1000 bacterial genomes: adaptation to diverse environments
(b) somatic mutations in ~4000 human cancers: malignant transformation
( plush microbes in photos are from http://www.giantmicrobes.com/ )
A deluge of human cancer genomic data
3851 cancer exomes from 11 tissues (>200 samples each)
292,405 missense and 123,193 synonymous somatic mutations
ARE THE SYNONYMOUS MUTATIONS SELECTED FOR IN
CARCINOGENESIS?
from Lawrence et al (2013) Nature. Mutation rate varies widely across the genome
and correlates with DNA replication time and expression level.
from Schuster-Böckler and Lehner (2012):
heterochromatin correlates with SNV rates
Drivers vs. passengers
• many somatic mutations in cancer = „passengers”
• a driver = a gene whose mutations confer a selective advantage;
recurrently mutated (i.e. more often than expected)
1. For missense mutations, could be measured using dN/dS
2. Intronic rates as a baseline: the INVEX test, Hodis et al. (Cell 2012)
3. Commonly: find background mutation frequencies for the patient from the
entire exome → see if a gene is above that background
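The "above background" idea can be sketched in Python. This assumes a simple Poisson model with one uniform background rate, which is an illustrative simplification and not necessarily what INVEX or any published tool does. The function names, the rate, and the sample count are all hypothetical.

```python
import math


def poisson_upper_tail(k, lam):
    """P(X >= k) for X ~ Poisson(lam), summed term by term from k upward."""
    if k <= 0:
        return 1.0
    if lam <= 0:
        return 0.0
    # First term P(X = k), computed in log space to avoid overflow.
    term = math.exp(-lam + k * math.log(lam) - math.lgamma(k + 1))
    total = 0.0
    i = k
    while term > 0.0 and term > 1e-17 * total:
        total += term
        i += 1
        term *= lam / i  # P(X = i) from P(X = i - 1)
    return min(total, 1.0)


def gene_excess_pvalue(observed, gene_length_bp, rate_per_bp, n_samples):
    """P-value for seeing >= observed mutations under a uniform background.

    rate_per_bp is a hypothetical per-sample, per-base background rate
    estimated from the whole exome.
    """
    expected = rate_per_bp * gene_length_bp * n_samples
    return poisson_upper_tail(observed, expected)


if __name__ == "__main__":
    # Hypothetical gene: 3000 coding bases, 12 mutations across 500 samples.
    print(gene_excess_pvalue(12, 3000, 2e-6, 500))
```

A single exome-wide rate ignores the regional variation in mutation rate shown on the preceding slides, which is why a locally matched baseline is preferable.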
[Figure: gene sets used in the analysis]
A. „Classical” cancer genes, from the Cancer Gene Census: oncogenes: translocation (217), missense (40), copy number (12); tumor suppressors: all mechanisms (84).
B. Newly discovered, from cancer genomes: recurrently mutated genes (self-reported in literature), split into known cancer genes in the Census and others (counts shown: 336, 39, 38).
D. Missense-activated oncogenes vs. recurrently mutated oncogenes from literature (counts shown: 19, 18, 21).
Oncogenes get activated by missense mutations, duplications, translocations....
Tumor suppressors get inactivated by missense/nonsense mutations, deletions,
promoter methylation...
[Figure: genomic features plotted by their correlation to PC1 (30.4% of variance) and PC2 (24.3%)]
Regional mutation rates: carcinoma, 1 Mb; non-carcinoma, 1 Mb; pooled, 200 kb; liver, 200 kb; liver, 1 Mb; breast, 1 Mb.
Other features: H3K9me3, 1 Mb; GC3; RepliSeq, 1 Mb.
mRNA levels: hypothalamus; liver; skeletal & heart muscle; 6 tissues.
Detecting positive selection on
synonymous mutations in cancer
• create „matched sets” of genes closely following the oncogenes in:
  • regional mutation rates (in 1 Mb and 200 kb windows)
  • expression levels in different tissues
  • heterochromatin, replication timing
  • G+C content
How to find a good set of genes?
A genetic algorithm. An optimization technique that can (relatively)
easily handle many criteria at once. Quite efficient. Many parameters.
Operators:
...crossover
...random mutation
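A toy genetic algorithm with the two operators named above might look like the following Python sketch. It is not the implementation used in the study. It matches on a single feature, whereas the talk describes many criteria at once. The population size, mutation probability, and all function names are hypothetical. The fitness is a two-sample K-S statistic, which the talk names as the quantity being minimized.

```python
import random
from bisect import bisect_right


def ks_statistic(a, b):
    """Two-sample K-S statistic: largest gap between the two empirical CDFs."""
    a, b = sorted(a), sorted(b)
    return max(
        abs(bisect_right(a, x) / len(a) - bisect_right(b, x) / len(b))
        for x in a + b
    )


def evolve_matched_set(target, pool, set_size, generations=60,
                       pop_size=30, mutation_prob=0.3, seed=0):
    """Pick set_size indices into pool whose values best match target."""
    rng = random.Random(seed)
    indices = range(len(pool))

    def fitness(ind):
        return ks_statistic(target, [pool[i] for i in ind])

    def crossover(p1, p2):
        # Child draws its genes from the union of both parents.
        return rng.sample(list(set(p1) | set(p2)), set_size)

    def mutate(ind):
        # Swap one randomly chosen gene for a gene not yet in the set.
        members = set(ind)
        outsiders = [i for i in indices if i not in members]
        ind = list(ind)
        ind[rng.randrange(set_size)] = rng.choice(outsiders)
        return ind

    population = [rng.sample(indices, set_size) for _ in range(pop_size)]
    for _ in range(generations):
        population.sort(key=fitness)
        parents = population[: pop_size // 2]  # the better half survives
        children = []
        while len(parents) + len(children) < pop_size:
            child = crossover(*rng.sample(parents, 2))
            if rng.random() < mutation_prob:
                child = mutate(child)
            children.append(child)
        population = parents + children
    best = min(population, key=fitness)
    return best, fitness(best)


if __name__ == "__main__":
    demo = random.Random(1)
    target = [demo.gauss(1.0, 0.5) for _ in range(40)]   # made-up "oncogene" values
    pool = [demo.gauss(0.0, 1.0) for _ in range(300)]    # made-up candidate genes
    print(evolve_matched_set(target, pool, set_size=60)[1])
```

Keeping the better half of each generation unchanged means the best K-S value found so far can never get worse. Extending the fitness to several features, for instance by taking the worst K-S value across features, would be one possible design and is likewise an assumption.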
[Figure: distributions of # mutations per 200 kb (110 cancers, pooled tissues) and of heterochromatin (H3K9me3 levels in 1 Mb windows), in separate columns for oncogenes and tumor suppressors]
K-S annotations on the panels: D+ = 0.313, P = 0.0004; D- = 0.256, P = 0.005; D+ = 0.211, P = 0.026; D- = 0.199, P = 0.043; D+ = 0.215, P = 0.025; D- = 0.185, P = 0.061.
Distributions of regional mutation rates (1Mb and 200 kb), heterochromatin,
etc. in the optimized sets of non-cancer genes closely match the cancer genes.
Genetic algorithm tries to minimize the K-S statistic.
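The D+ and D- values annotated on these figures look like one-sided two-sample K-S statistics. The Python sketch below shows how such values could be computed, assuming that D+ measures how far the cancer-gene ECDF rises above the comparison ECDF and D- the reverse. That sign convention, the function names, and the use of the large-sample P-value approximation are assumptions, not taken from the slides.

```python
import math
from bisect import bisect_right


def one_sided_ks(sample, reference):
    """Return (D+, D-) between the empirical CDFs of two samples.

    D+ = sup(F_sample - F_reference), D- = sup(F_reference - F_sample).
    """
    s, r = sorted(sample), sorted(reference)
    n, m = len(s), len(r)
    # Both ECDFs are step functions, so the extremes occur at data points.
    diffs = [bisect_right(s, x) / n - bisect_right(r, x) / m for x in s + r]
    d_plus = max(0.0, max(diffs))
    d_minus = max(0.0, -min(diffs))
    return d_plus, d_minus


def asymptotic_p(d, n, m):
    """Large-sample approximation to the one-sided K-S P-value."""
    return math.exp(-2.0 * d * d * n * m / (n + m))
```

A matched set that tracks the cancer genes well should give small D+ and D- for every feature, and therefore P-values that are not significant.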
[Figure: cancer genes compared with their matched sets]
Cancer genes: 39 oncogenes and 38 tumor suppressors (recurrently mutated).
Matched sets of noncancer genes: 1517 genes (for oncogenes), 693 genes (for tumor suppressors).
Features shown: # mutations per 200 kb (110 cancers, pooled tissues); heterochromatin (H3K9me3 levels in 1 Mb windows); replication timing (RepliSeq signal in 1 Mb windows, early to late); mRNA levels, avg. of 6 tissues.
K-S annotations on the panels: D- = 0.224, P = 0.017; D- = 0.256, P = 0.005; D+ = 0.211, P = 0.026; D- = 0.185, P = 0.061.
Expected: the oncogenes and the tumor
suppressors are highly enriched with
missense mutations (~1.5-2.5x).
However, the oncogenes are also enriched
with synonymous mutations over their
matched sets, ~1.2x.
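An enrichment of this kind can be expressed as a ratio of mutation densities. The Python sketch below assumes each gene is summarized as a (mutation count, coding length) pair and adds a gene-level bootstrap for a rough interval. The data layout, function names, and bootstrap are illustrative assumptions, not the procedure used in the study.

```python
import random


def mutation_density(genes):
    """Mutations per base over a list of (n_mutations, length_bp) pairs."""
    return sum(n for n, _ in genes) / sum(length for _, length in genes)


def enrichment(cancer_genes, matched_genes):
    """Fold enrichment of mutation density in cancer genes over matched genes."""
    return mutation_density(cancer_genes) / mutation_density(matched_genes)


def bootstrap_interval(cancer_genes, matched_genes,
                       n_boot=2000, alpha=0.05, seed=0):
    """Percentile interval for the enrichment, resampling genes with replacement.

    Assumes every resample of the matched set contains at least one mutation.
    """
    rng = random.Random(seed)
    ratios = []
    for _ in range(n_boot):
        c = rng.choices(cancer_genes, k=len(cancer_genes))
        m = rng.choices(matched_genes, k=len(matched_genes))
        ratios.append(enrichment(c, m))
    ratios.sort()
    lower = ratios[int(alpha / 2 * n_boot)]
    upper = ratios[int((1 - alpha / 2) * n_boot) - 1]
    return lower, upper
```

Resampling whole genes rather than individual mutations keeps the gene-to-gene variability in the interval, which matters when a few genes carry most of the mutations.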
Introns of oncogenes (from whole-
genome sequencing) are not enriched
with SNVs, compared to matched sets.
The matched sets method agrees with
INVEX, and with simply using
neighboring genes as a baseline.
Tissue-specific oncogenes are more enriched
with synonymous mutations in the
corresponding tissue.
This effect is not due to mutation
showers/clustered mutations, as the same
cancer samples don't tend to contain both a
synonymous and a missense mutation in the same
gene.
Synonymous enrichment in oncogenes
is detectable across cancer types.
Some oncogenes are more highly
enriched with synonymous mutations
than others, e.g. PDGFRA, EGFR, GATA1,
ELN, NTRK1, JAK3, ALK and others (n=16).
The synonymous SNV enrichment in
these genes is not paralleled by
intronic SNV enrichment.
The synonymous mutations tend to cluster together to a similar extent as the
missense mutations in the affected oncogenes. They also (less prominently)
cluster with missense mutations.
What could the synonymous mutations do?
[Figure, panels A-H]
  % of synonymous mutations leading to optimal codon gain, optimal codon loss or no change (n.s.)
  mRNA folding free energy around mutated sites (kcal/mol), w.t. vs. mut. mRNA, in 50 nt and 100 nt windows
  net # of gained miRNA seed sites per syn. mutation (whole cDNA; sites w/ phyloP > 1.0), 16 oncogenes vs. matched set
  bins of ≤30 nt, 31-70 nt and >70 nt (values shown: 1.75, 1.26, 0.45; p < 10^-4)
  log2 RPKM of exon vs. exon # in transcript ENST00000334286 (EDNRB gene, colorectal cancer): 30 random samples w/o point mutations vs. 6 samples w/ synonymous exonic mutations
  % syn. mutations (within 30 nt of splice site) leading to enhancer or silencer gain or loss: Ke et al. 2012 hexamers (values shown: 1.53, 0.83, 0.60, 1.90; p = 0.02); RESCUE-ESE (1.90, 0.53; p = 0.003); FAS-hex2 (0.37, 2.73; p = 3·10^-4)
  normalized difference (Glass' delta) between properties of mutated positions in oncogenes vs. matched set: relative preference value at C-cap (of α helices); normalized frequency of turn in all-α class; α-helix indices for α-proteins (t-test, FDR)
  actual synonymous mutations vs. randomized mutation (values shown: 1.43, 1.12, 0.79; p = 0.05, n.s., n.s.)
Candidate mechanisms:
• use of „optimal codons”
• miRNA binding sites
• secondary structures in mRNA
No general effect was detected in any of these cases (although they may still
be important in specific examples).
Exonic Splicing Enhancer (ESE) and Exonic Splicing Silencer (ESS)
From Cartegni, Chew & Krainer. Nat Rev Genet. 2002; 3(4): 285-98.
AGAAGA enh
GAAGAT enh
GACGTC enh
GAAGAC enh
....
CTTTTA sil
CTTTAA sil
TAGGTA sil
TAGTAG sil
Synonymous SNVs tend
to be closer to splice
sites in oncogenes.
They also tend to cause gains of known exonic
splicing enhancer motifs, and losses of exonic
splicing silencer motifs.
They more often affect
exons with weaker
(noncanonical) splice
sites.
The exonic splicing
enhancers created may
resemble SF2/ASF motifs.
The ESS sites that are lost
upon mutation
sometimes resemble
hnRNP A2/B1, H2 and A1
motifs.
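The enhancer gains and silencer losses described above can be scored with a hexamer lookup, sketched here in Python. The motif sets contain only the eight example hexamers listed on the earlier slide, so they are toy sets. Real analyses would use full collections such as RESCUE-ESE or FAS-hex2. The function name and return format are hypothetical, and the sketch ignores reading frame, so it does not check that the change is synonymous.

```python
# Toy motif sets: only the example hexamers shown on the ESE/ESS slide.
ENHANCERS = {"AGAAGA", "GAAGAT", "GACGTC", "GAAGAC"}
SILENCERS = {"CTTTTA", "CTTTAA", "TAGGTA", "TAGTAG"}


def motif_events(seq, pos, new_base, k=6):
    """Report motif gains and losses caused by changing seq[pos] to new_base.

    Every k-mer window overlapping the mutated position is compared
    before and after the change.
    """
    mutant = seq[:pos] + new_base + seq[pos + 1:]
    events = {"enh_gain": False, "enh_loss": False,
              "sil_gain": False, "sil_loss": False}
    first = max(0, pos - k + 1)
    last = min(pos, len(seq) - k)
    for i in range(first, last + 1):
        wt, mut = seq[i:i + k], mutant[i:i + k]
        events["enh_gain"] |= mut in ENHANCERS and wt not in ENHANCERS
        events["enh_loss"] |= wt in ENHANCERS and mut not in ENHANCERS
        events["sil_gain"] |= mut in SILENCERS and wt not in SILENCERS
        events["sil_loss"] |= wt in SILENCERS and mut not in SILENCERS
    return events
```

For instance, `motif_events("CCAGAAGGCC", 7, "A")` turns AGAAGG into the enhancer hexamer AGAAGA, and `motif_events("GGTAGGTAGG", 4, "C")` destroys the silencer hexamer TAGGTA.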
Roughly ½ of the putatively causal synonymous mutations alter
splicing, as evidenced by examining RNA-seq data from cancer.
We don't (yet) know what the other ½ is doing. One possibility
is that they affect protein folding.
In yeast: Pechmann & Frydman, Nature Struct Mol Biol 2013
[Figure, panels F-H]
  % of mutations at α-helix 1st a.a., middle and last a.a., and in coil; actual synonymous mutations vs. randomized mutation positions (values shown: 1.43, 1.12, 0.79; p = 0.05, n.s., n.s.)
  α-helix parts: middle, next to coil only, next to β-sheet (values shown: 0.97, 1.01, 2.60; p = 4·10^-5)
  helix positions N'', N', Ncap, Ccap, C', C''; α-helix and turn
  normalized difference (Glass' delta) between mutated sites in oncogenes vs. matched set: relative preference value at C-cap; normalized frequency of turn in all-α class; α-helix indices for α-proteins; relative preference value at N'; relative preference value at N''; normalized frequency of α-helix in all-α class (FDR < 10%)
...also in cancer: we observe an
enrichment of synonymous
mutations at N-termini of alpha-
helices, esp. if close to beta-sheets.
Suggestive of effects on folding.
The TP53 gene has a large
excess of synonymous
mutations, which are
always near splice sites.
We found three examples
of recurrent SNVs that
inactivate the nearby splice
site.
[Figure labels: known, novel; „causes a frameshift”]
Dosage-sensitive oncogenes have many point mutations in their 3' UTRs
Take-home messages:
• oncogenes contain an excess of synonymous mutations in human
cancers
• a subset of synonymous mutations target splicing motifs
• 1/5 to 1/2 of the synonymous mutations in oncogenes reported to date are
acting as driver mutations
• ~6-8% of all driver mutations due to single-nucleotide changes are likely to be
synonymous mutations
• TP53 has recurrent synonymous mutations that disrupt splice sites
• there is an excess of mutations in the 3’ UTRs of dosage-sensitive genes
published in: Supek et al. (2014) Cell. http://dx.doi.org/10.1016/j.cell.2014.01.051
Thank you!
End of Part 2. Part 1 deals with inferring microbial gene function from
evolutionary change in codon biases, and is available separately.

Synonymous mutations as drivers in human cancer genomes.
