1. Cancer samples with high
APOBEC-signature
enrichment are especially
enriched for TCA mutations.
Enrichment is a measure of how much more frequently
mutations occur at a target motif (or its reverse
complement) vs. at C:G base pairs in general:
$$E_{TC} = \frac{\mathrm{mut}_{TC} \div \mathrm{con}_{TC}}{\mathrm{mut}_{C} \div \mathrm{con}_{C}}$$
where mut = number of mutations and con = number of
contexts.
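As a minimal sketch of the enrichment ratio defined above, the function below divides the mutation rate at a motif by the mutation rate at C:G base pairs in general. The function name, argument names, and the counts in the example are hypothetical and are not taken from the poster's data.

```python
def enrichment(mut_motif, con_motif, mut_c, con_c):
    """Enrichment E = (mut_motif / con_motif) / (mut_c / con_c).

    mut_* are mutation counts and con_* are context counts, with the
    motif and its reverse complement assumed to be pooled already.
    """
    if con_motif <= 0 or con_c <= 0 or mut_c <= 0:
        raise ValueError("context counts and mut_c must be positive")
    return (mut_motif / con_motif) / (mut_c / con_c)


# Hypothetical counts: 300 mutations in 1200 TC contexts,
# versus 500 mutations in 4000 C contexts overall.
e_tc = enrichment(mut_motif=300, con_motif=1200, mut_c=500, con_c=4000)
print(e_tc)  # 2.0
```

A value above 1 means the motif is mutated more often than C:G base pairs in general; a value of 1 means no enrichment.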
Enrichment for TCW (W = A or T) mutations was used
previously to identify cancer samples with significant
APOBEC mutagenesis [4].
However, the pLogo results from Section 3 suggest that
cancer samples mutated by either A3A or A3B should
be even more enriched for TCA mutations.
All samples from 14 cancer types with WGS data [9,10]
were analyzed and binned by TCW enrichment to
compute a χ² test for trend.
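The poster does not state which formulation of the χ² test for trend was used. The sketch below assumes the common Cochran-Armitage form (1 degree of freedom) with equally spaced bin scores. What counts as a "hit" in each bin, and the counts in the example, are hypothetical.

```python
import math


def chi2_trend(hits, totals, scores=None):
    """Cochran-Armitage chi-square test for trend across ordered bins.

    hits[i] / totals[i] is the proportion in bin i. Returns the
    chi-square statistic (1 df) and its p-value.
    """
    k = len(hits)
    if scores is None:
        scores = list(range(1, k + 1))  # assumed equally spaced scores
    n_all = sum(totals)
    p = sum(hits) / n_all
    if p in (0.0, 1.0):
        raise ValueError("trend is undefined when all outcomes are identical")
    # Score-weighted deviation of observed hits from expectation.
    t = sum(s * (r - n * p) for s, r, n in zip(scores, hits, totals))
    var = p * (1 - p) * (
        sum(n * s * s for s, n in zip(scores, totals))
        - sum(n * s for s, n in zip(scores, totals)) ** 2 / n_all
    )
    chi2 = t * t / var
    # Survival function of a chi-square variable with 1 df.
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# Hypothetical bins ordered by increasing TCW enrichment.
chi2, p_value = chi2_trend(hits=[10, 20, 30, 40], totals=[100, 100, 100, 100])
```

A small p-value indicates that the proportion changes monotonically across the ordered bins, which is the kind of skew described for the high APOBEC mutagenesis cohorts.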
Data from six high APOBEC mutagenesis cohorts are
shown here: (a) bladder urothelial, (b) breast invasive,
(d) head & neck squamous cell, (e) lung adeno-, (f)
lung squamous cell carcinomas, all from The Cancer
Genome Atlas (TCGA); and (c) breast carcinoma from
the International Cancer Genome Consortium (ICGC).
All high APOBEC mutagenesis cohorts showed
significant skewing toward increasing TCA
enrichment as TCW enrichment increased,
consistent with the possibility that A3A and/or A3B were
acting as mutators in these cancer types.
The mutation signatures of
A3A and A3B are statistically
distinguishable.
pLogos [8] were used to identify nucleotides within an
extended APOBEC motif that were statistically over- or
under-represented within 41-mers centered at each C
→ T substitution, with 20 base flanks on either side.
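The 41-mer windows described above can be sketched as follows. This is an illustration under stated assumptions, not the pLogo tool itself: the function names are hypothetical, positions are 0-based, and a mutated G is assumed to be reported on the reference strand and reverse-complemented so that the deaminated C always sits at the center.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq):
    """Reverse complement of an uppercase DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def centered_window(genome, pos, flank=20):
    """Return the (2 * flank + 1)-mer centered on 0-based position pos.

    The window is oriented so that its center is C. Returns None if the
    window would run off either end or the center base is not C or G.
    """
    start, end = pos - flank, pos + flank + 1
    if start < 0 or end > len(genome):
        return None
    window = genome[start:end].upper()
    center = window[flank]
    if center == "C":
        return window
    if center == "G":
        return revcomp(window)
    return None


# Toy example with a 2-base flank: the G at index 2 is reoriented to C.
print(centered_window("TTGAA", 2, flank=2))  # TTCAA
```

With the default `flank=20`, each returned string is a 41-mer in which index 20 is the mutated C, index 19 is the -1 position, and index 21 is the +1 position.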
The height of each letter above or below the horizontal
axis denotes magnitude of over- or under-
representation, respectively.
n(fg) denotes number of C, TC, or TCA mutations.
n(bg) denotes number of C, TC, or TCA contexts.
(a & b) When the deaminated C was fixed (highlighted
by boxes), T at -1 (one base 5′) was over-represented
for both A3A (a) and A3B (b).
(c & d) When TC was fixed, A was over-represented at
+1 (one base 3′) for both APOBECs.
(e) When TCA was fixed for A3A, C and T
(pyrimidines = Y) were over-represented at -2.
(f) Conversely when TCA was fixed for A3B, purines (=
R) and especially A were over-represented at -2.
APOBEC3A is the primary mutagenic cytidine deaminase in human cancers.
Kin Chan1, Steven A. Roberts2, Leszek J. Klimczak3, Joan F. Sterling1, Natalie Saini1, Ewa P. Malc4, Jaegil Kim5,
David J. Kwiatkowski5,6, David C. Fargo3, Piotr A. Mieczkowski4, Gad Getz5,7, and Dmitry A. Gordenin1
1Genome Integrity & Structural Biology Laboratory, National Institute of Environmental Health Sciences, NIH, Research Triangle Park, NC
2School of Molecular Biosciences, Washington State University, Pullman, WA
3Integrative Bioinformatics Support Group, National Institute of Environmental Health Sciences, NIH, Research Triangle Park, NC
4Department of Genetics, Lineberger Comprehensive Cancer Center, University of North Carolina, Chapel Hill, NC
5Broad Institute of MIT and Harvard, Cambridge, MA
6Brigham and Women's Hospital, Harvard Medical School, Boston, MA
7Massachusetts General Hospital, Harvard Medical School, Boston, MA
Abstract
Background: The elucidation of mutagenic processes
that shape cancer genomes is a fundamental problem
whose solution promises insights into new treatment,
diagnostic, and prevention strategies [1]. We and
others previously identified mutation clusters bearing
the single-strand DNA (ssDNA)-specific APOBEC cytidine
deaminase signature at 5′-TC-3′ motifs across many
cancer types [2-4], but the APOBEC enzyme(s)
responsible remain unidentified. APOBEC3B (A3B) is
thought to be the most likely major mutator in
cancers [5,6], while the role of APOBEC3A (A3A) is
considered less certain [3].
Methods: We expressed A3A or A3B in a cdc13-1 yeast
system, which allowed controlled generation of long
stretches of genomic ssDNA substrate upon
temperature shift [7]. We identified selectable mutation
clusters, which inactivated at least two genes within a
three-gene reporter cassette. All mutations were
determined by whole-genome sequencing (WGS) of 94
isolates with clusters. We then evaluated statistical
over- or under-representation of specific nucleotides
within an extended APOBEC motif among unique
mutations at TC. Finally, we applied statistical analyses
to mutation catalogues from 14 types of cancer with
available WGS data, to determine if each individual
sample exhibited a mutation signature matching either
APOBEC.
Results: We found that the A3A mutation signature
in yeast was statistically distinguishable from that
of A3B by preferences at the -2 position relative to the
deamination site, i.e., two bases 5′ of the mutated C.
Cancer genomes with statistically significant, but low,
enrichment for APOBEC signature mutations usually had
an A3B-like signature. Strikingly, cancer genomes with
high enrichment for APOBEC mutations almost always
had an A3A-like signature.
Conclusions: We propose that there is a background
level of A3B-mediated mutagenesis in many cancers,
but in strongly mutated cancers, A3A-mediated
mutagenesis apparently dwarfs the A3B background.
The A3B background is detectable only in samples
without significant A3A signature mutagenesis. While
A3B is likely to be acting in the background of more
cancers, A3A is by far the more prolific deaminase
in terms of sheer numbers of mutations induced
(>11-fold difference when comparing medians).
As such, A3A should be considered a more important
target for development of novel diagnostic and
treatment strategies.
References
[1] Stratton, M.R. Exploring the Genomes of Cancer Cells:
Progress and Promise. Science 331, 1553-1558 (2011).
[2] Alexandrov, L.B. et al. Signatures of mutational processes
in human cancer. Nature 500, 415-421 (2013).
[3] Burns, M.B., Temiz, N.A. & Harris, R.S. Evidence for
APOBEC3B mutagenesis in multiple human cancers. Nat
Genet 45, 977-983 (2013).
[4] Roberts, S.A. et al. An APOBEC cytidine deaminase
mutagenesis pattern is widespread in human cancers. Nat
Genet 45, 970-976 (2013).
[5] Burns, M., Leonard, B. & Harris, R. APOBEC3B: Pathological
consequences of an innate immune DNA mutator. Biomed
J 38, 102-110 (2015).
[6] Harris, R.S. Molecular mechanism and clinical impact of
APOBEC3B-catalyzed mutagenesis in breast cancer.
Breast Cancer Res 17, 8 (2015).
[7] Chan, K. et al. Base Damage within Single-Strand DNA
Underlies In Vivo Hypermutability Induced by a Ubiquitous
Environmental Agent. PLoS Genet 8, e1003149 (2012).
[8] O'Shea, J.P. et al. pLogo: a probabilistic approach to
visualizing sequence motifs. Nat Meth 10, 1211-1212 (2013).
[9] Nik-Zainal, S. et al. Association of a germline copy number
polymorphism of APOBEC3A and APOBEC3B with burden
of putative APOBEC-dependent mutations in breast
cancer. Nat Genet 46, 487-491 (2014).
[10] Fredriksson, N.J. et al. Systematic analysis of noncoding
somatic mutations and gene expression alterations across
14 tumor types. Nat Genet 46, 1258-1263 (2014).
[11] Landry, S. et al. APOBEC3A can activate the DNA
damage response and cause cell‐cycle arrest. EMBO Rep
12, 444-450 (2011).
[12] Burns, M.B. et al. APOBEC3B is an enzymatic source of
mutation in breast cancer. Nature 494, 366-370 (2013).
[13] Mussil, B. et al. Human APOBEC3A Isoforms Translocate to
the Nucleus and Induce DNA Double Strand Breaks
Leading to Cell Stress and Death. PLoS ONE 8, e73641
(2013).
[14] Mimitou, E.P. & Symington, L.S. DNA end resection—
Unraveling the tail. DNA Repair 10, 344-348 (2011).
[15] Sakofsky, C.J. et al. Break-Induced Replication Is a
Source of Mutation Clusters Underlying Kataegis. Cell
Rep 7, 1640-1648 (2014).
Acknowledgments
This work was made possible by the following NIH
funding: Z1AES103266 (DAG); U24CA143845 (GG);
R01GM052319 (PAM); P01CA120964 (DJK);
R00ES022633 (SAR); and K99ES024424 (KC).
Subtelomeric ssDNA
Yeast Model System
(a) Three reporter genes (ADE2, URA3, and CAN1) were
deleted from their native loci and re-introduced near a
de novo telomere on Chromosome V, in cdc13-1
ung1∆ haploid yeast [7]. The scale denotes kilobases
into unique subtelomeric DNA sequence.
(b) Shifting to 37 °C results in telomere uncapping,
followed by 5′ to 3′ enzymatic resection, which exposes
a long 3′ overhang.
(c) Expression of either A3A or A3B induces multiple
cytosine deaminations in the exposed ssDNA.
(d) When cells are returned to 23 °C, the subtelomeric
DNA is restored to a double-stranded state. A’s are
incorporated opposite U’s, resulting in numerous C → T
substitutions and selectable mutation clusters.
High enrichment samples
are A3A-like, while low
enrichment samples are
A3B-like.
To distinguish A3A- from A3B-like samples, we
compared enrichment for YTCA vs. for RTCA mutations.
Only samples with significant TCA enrichment and a
YTCA to RTCA mutation ratio statistically different from a
random mutagenesis model were included in the χ² test for
trend calculations. Samples were binned by quartile of
TCA enrichment.
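The YTCA vs. RTCA comparison can be sketched as a tally of the -2 base among TCA mutation contexts, followed by a comparison of the two enrichments. The function names are hypothetical, and the statistical test against a random mutagenesis model that the poster applies before classification is not reproduced here, so the labels below are only a simplified illustration.

```python
def count_ytca_rtca(contexts):
    """Tally NTCA 4-mers (oriented on the C strand) by the -2 base.

    Y = pyrimidine (C or T), R = purine (A or G). Contexts that are not
    a recognizable NTCA 4-mer are skipped.
    """
    counts = {"YTCA": 0, "RTCA": 0}
    for ctx in contexts:
        ctx = ctx.upper()
        if len(ctx) != 4 or ctx[1:] != "TCA":
            continue
        if ctx[0] in "CT":
            counts["YTCA"] += 1
        elif ctx[0] in "AG":
            counts["RTCA"] += 1
    return counts


def signature_label(e_ytca, e_rtca):
    """Simplified label from YTCA vs. RTCA enrichment (no significance test)."""
    if e_ytca > e_rtca:
        return "A3A-like"
    if e_rtca > e_ytca:
        return "A3B-like"
    return "indeterminate"


# Hypothetical contexts: two YTCA, one RTCA, two that are skipped.
counts = count_ytca_rtca(["CTCA", "TTCA", "ATCA", "GTCG", "NTCA"])
print(counts)  # {'YTCA': 2, 'RTCA': 1}
```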
(a-f) For each high APOBEC mutagenesis cohort, there
was significant skewing toward A3A-like signatures
as a function of increasing TCA enrichment.
(g) A minimal estimate for the number of TCA
mutations attributable to an APOBEC (minTCA) was
computed for each A3A- or A3B-like sample:
$$\mathrm{minTCA} = \mathrm{mut}_{TCA} \times \frac{E_{TCA} - 1}{E_{TCA}}$$
where $\mathrm{mut}_{TCA}$ = total number of TCA mutations and $E_{TCA}$
= enrichment for TCA mutations.
The median minTCA for A3A-like cancers was over
11-fold greater than for A3B-like cancers (1480 vs.
133).
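A short sketch of the minTCA estimate, assuming the formula above is applied per sample. The function name and the example sample are hypothetical. The fold difference at the end uses the two medians reported on the poster (1480 and 133).

```python
def min_tca(mut_tca, e_tca):
    """Minimal estimate of TCA mutations attributable to an APOBEC.

    minTCA = mut_TCA * (E_TCA - 1) / E_TCA. The result is negative when
    E_TCA < 1; the poster only reports it for samples with significant
    TCA enrichment, so no clamping is applied here.
    """
    if e_tca <= 0:
        raise ValueError("enrichment must be positive")
    return mut_tca * (e_tca - 1) / e_tca


# Hypothetical sample: 2000 TCA mutations at 4-fold enrichment.
print(min_tca(2000, 4.0))  # 1500.0

# Fold difference between the reported medians.
fold = 1480 / 133
print(round(fold, 1))  # 11.1
```

Intuitively, if TCA sites are mutated 4 times more often than expected, about one quarter of the observed TCA mutations could be background, leaving three quarters as the minimum attributable to the enzyme.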
Root mean square deviation
(RMSD) & pLogo analyses
confirm YTCA vs. RTCA
enrichment results.
RMSDs between each cancer sample & yeast model
were computed:
$$\mathrm{RMSD} = \sqrt{\frac{1}{4} \sum_{N \in \{A, C, G, T\}} \left( NE_{NTCA} - yNE_{NTCA} \right)^{2}}$$
where $NE_{NTCA}$ and $yNE_{NTCA}$ denote the normalized
enrichment for one of the four NTCA’s from a given
cancer sample and yeast model, respectively.
Normalized enrichment for ATCA, as an example:
$$NE_{ATCA} = \frac{E_{ATCA}}{E_{ATCA} + E_{CTCA} + E_{GTCA} + E_{TTCA}}$$
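The normalized enrichment and RMSD can be sketched together as below. Each profile is assumed to be a dictionary keyed by the -2 base (A, C, G, T) holding the enrichment for the corresponding NTCA motif. The function names and the example values are hypothetical.

```python
import math

BASES = "ACGT"


def normalized_enrichment(e_ntca):
    """Divide each NTCA enrichment by the sum over the four -2 bases."""
    total = sum(e_ntca[b] for b in BASES)
    if total <= 0:
        raise ValueError("enrichments must sum to a positive value")
    return {b: e_ntca[b] / total for b in BASES}


def rmsd(sample_e, yeast_e):
    """Root mean square deviation between two normalized NTCA profiles."""
    ne = normalized_enrichment(sample_e)
    y_ne = normalized_enrichment(yeast_e)
    return math.sqrt(sum((ne[b] - y_ne[b]) ** 2 for b in BASES) / 4)


# Hypothetical profiles: a flat sample vs. a purely pyrimidine-preferring model.
sample = {"A": 1.0, "C": 1.0, "G": 1.0, "T": 1.0}
model = {"A": 0.0, "C": 2.0, "G": 0.0, "T": 2.0}
print(rmsd(sample, model))  # 0.25
```

Under this sketch, a cancer sample would be compared against both the A3A and the A3B yeast profiles, and a smaller RMSD indicates a closer match to that enzyme's -2 preference.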
Results from the BRCA ICGC cohort (panels a-d, lowest
to highest quartiles of TCA enrichment) show
concordance among the YTCA vs. RTCA (left panels),
RMSD (middle), and pLogo (right) methodologies,
confirming that high TCA enrichment correlates
strongly with A3A-like signatures.
Similar three-way concordance among analytical
methods was observed for the five TCGA high APOBEC
mutagenesis cohorts (not shown).
Summary of Findings
and Conclusions
Expression of A3A (a) and A3B (b) in yeast revealed
opposite preferences at the -2 nucleotide (YTCA vs.
RTCA, respectively).
(c) Among 243 cancer genomes with significant TCA
enrichment, by YTCA vs. RTCA comparison, 101 were
A3A-like, 63 were A3B-like, and 79 were indeterminate.
(d) By the RMSD approach, 124 genomes were A3A-
like, 75 were A3B-like, and 44 were indeterminate.
We propose that in A3B-like cancers (e), background
mutagenesis by A3B results in weak, but detectable
TCA signatures.
In A3A-like cancers (f), the A3B background is dwarfed
by the mutagenic & genotoxic activity of A3A.
A3A is much more proficient than A3B at inducing
derived DNA double-strand breaks [11-13], whose
repair would likely involve single-strand intermediates
from homologous recombination [14] or break-induced
replication [15]. Thus, A3A is probably triggering the
generation of its own hypermutation substrates.