Abstract
Certain mutagens, including the APOBEC3 (A3) cytosine deaminase enzymes, can create multiple genetic changes in a single event. Activity of A3s results in striking ‘mutation showers’ occurring near DNA breakpoints; however, less is known about the mechanisms underlying the majority of A3 mutations. We classified the diverse patterns of clustered mutagenesis in tumor genomes, which identified a new A3 pattern: nonrecurrent, diffuse hypermutation (omikli). This mechanism occurs independently of the known focal hypermutation (kataegis), and is associated with activity of the DNA mismatch-repair pathway, which can provide the single-stranded DNA substrate needed by A3, and contributes to a substantial proportion of A3 mutations genome wide. Because mismatch repair is directed towards early-replicating, gene-rich chromosomal domains, A3 mutagenesis has a high propensity to generate impactful mutations, which exceeds that of other common carcinogens such as tobacco smoke and ultraviolet exposure. Cells direct their DNA repair capacity towards more important genomic regions; thus, carcinogens that subvert DNA repair can be remarkably potent.
Data availability
Whole-genome sequences from the TCGA project were available through the Cancer Genomics Hub repository (now superseded by the NCI Genomic Data Commons; https://gdc.cancer.gov/). Corresponding SNP array data were downloaded from the GDC legacy portal (https://portal.gdc.cancer.gov/legacy-archive). WGS data from the Hartwig Medical Foundation are available at https://www.hartwigmedicalfoundation.nl/en. The whole-exome sequencing data of the TCGA cohort are available through the MC3 dataset at https://gdc.cancer.gov/about-data/publications/mc3-2017. Data generated by the analyses in this study are available in the Supplementary Tables.
Code availability
Code to generate clustered mutation calls was implemented in Python (version 3.6) and R (version 3.6). Relevant packages are biopython (version 1.73) and numpy (version 1.15.4) for Python, and Biostrings (2.52.0), VariantAnnotation (1.30.1) and GenomicRanges (1.36.0) for R. Code is available at https://github.com/davidmasp/hyperclust. Statistical analysis of the data was performed using custom scripts in R (version 3.6). Relevant packages are mclust (version 5.4.4), mixtools (version 1.1.0), MASS (version 7.3-51.4) and flexmix (version 2.3-15).
References
Harris, K. & Nielsen, R. Error-prone polymerase activity causes multinucleotide mutations in humans. Genome Res. 24, 1445–1454 (2014).
Rogozin, I. B. et al. DNA polymerase η mutational signatures are found in a variety of different types of cancer. Cell Cycle 17, 348–355 (2018).
Seplyarskiy, V. B. et al. Error-prone bypass of DNA lesions during lagging-strand replication is a common source of germline and cancer mutations. Nat. Genet. 51, 36–41 (2019).
Supek, F. & Lehner, B. Clustered mutation signatures reveal that error-prone DNA repair targets mutations to active genes. Cell 170, 534–547.e23 (2017).
Moris, A., Murray, S. & Cardinaud, S. AID and APOBECs span the gap between innate and adaptive immunity. Front. Microbiol. 5, 534 (2014).
Alexandrov, L. B. et al. Signatures of mutational processes in human cancer. Nature 500, 415–421 (2013).
Burns, M. B., Temiz, N. A. & Harris, R. S. Evidence for APOBEC3B mutagenesis in multiple human cancers. Nat. Genet. 45, 977–983 (2013).
Nik-Zainal, S. et al. Mutational processes molding the genomes of 21 breast cancers. Cell 149, 979–993 (2012).
Roberts, S. A. et al. An APOBEC cytidine deaminase mutagenesis pattern is widespread in human cancers. Nat. Genet. 45, 970–976 (2013).
Roberts, S. A. et al. Clustered mutations in yeast and in human cancers can arise from damaged long single-strand DNA regions. Mol. Cell 46, 424–435 (2012).
Landry, S., Narvaiza, I., Linfesty, D. C. & Weitzman, M. D. APOBEC3A can activate the DNA damage response and cause cell-cycle arrest. EMBO Rep. 12, 444–450 (2011).
Suspène, R. et al. Somatic hypermutation of human mitochondrial and nuclear DNA by APOBEC3 cytidine deaminases, a pathway for DNA catabolism. Proc. Natl Acad. Sci. USA 108, 4858–4863 (2011).
Byeon, I.-J. L. et al. NMR structure of human restriction factor APOBEC3A reveals substrate binding and enzyme specificity. Nat. Commun. 4, 1890 (2013).
Holtz, C. M., Sadler, H. A. & Mansky, L. M. APOBEC3G cytosine deamination hotspots are defined by both sequence context and single-stranded DNA secondary structure. Nucleic Acids Res. 41, 6139–6148 (2013).
Nik-Zainal, S. et al. Association of a germline copy number polymorphism of APOBEC3A and APOBEC3B with burden of putative APOBEC-dependent mutations in breast cancer. Nat. Genet. 46, 487–491 (2014).
Glaser, A. P. et al. APOBEC-mediated mutagenesis in urothelial carcinoma is associated with improved survival, mutations in DNA damage response genes, and immune response. Oncotarget 9, 4537–4548 (2017).
Cortez, L. M. et al. APOBEC3A is a prominent cytidine deaminase in breast cancer. PLoS Genet. 15, e1008545 (2019).
Sakofsky, C. J. et al. Break-induced replication is a source of mutation clusters underlying kataegis. Cell Rep. 7, 1640–1648 (2014).
Sakofsky, C. J. et al. Repair of multiple simultaneous double-strand breaks causes bursts of genome-wide clustered hypermutation. PLoS Biol. 17, e3000464 (2019).
Kazanov, M. D. et al. APOBEC-induced cancer mutations are uniquely enriched in early-replicating, gene-dense, and active chromatin regions. Cell Rep. 13, 1103–1109 (2015).
Buisson, R. et al. Passenger hotspot mutations in cancer driven by APOBEC3A and mesoscale genomic features. Science 364, eaaw2872 (2019).
Supek, F. & Lehner, B. Differential DNA mismatch repair underlies mutation rate variation across the human genome. Nature 521, 81–84 (2015).
Zheng, C. L. et al. Transcription restores DNA repair to heterochromatin, determining regional mutation rates in cancer genomes. Cell Rep. 9, 1228–1234 (2014).
Haradhvala, N. J. et al. Mutational strand asymmetries in cancer genomes reveal mechanisms of DNA damage and repair. Cell 164, 538–549 (2016).
Morganella, S. et al. The topography of mutational processes in breast cancer genomes. Nat. Commun. 7, 11383 (2016).
Seplyarskiy, V. B. et al. APOBEC-induced mutations in human cancers are strongly enriched on the lagging DNA strand during replication. Genome Res. 26, 174–182 (2016).
Green, A. M. et al. APOBEC3A damages the cellular genome during DNA replication. Cell Cycle 15, 998–1008 (2016).
Kanu, N. et al. DNA replication stress mediates APOBEC3 family mutagenesis in breast cancer. Genome Biol. 17, 185 (2016).
Nikkilä, J. et al. Elevated APOBEC3B expression drives a kataegic-like mutation signature and replication stress-related therapeutic vulnerabilities in p53-defective cells. Br. J. Cancer 117, 113–123 (2017).
Bhagwat, A. S. et al. Strand-biased cytosine deamination at the replication fork causes cytosine to thymine mutations in Escherichia coli. Proc. Natl Acad. Sci. USA 113, 2176–2181 (2016).
Hoopes, J. I. et al. APOBEC3A and APOBEC3B preferentially deaminate the lagging strand template during DNA replication. Cell Rep. 14, 1273–1282 (2016).
Chen, J., Miller, B. F. & Furano, A. V. Repair of naturally occurring mismatches can induce mutations in flanking DNA. eLife 3, e02001 (2014).
Cannataro, V. L. et al. APOBEC-induced mutations and their cancer effect size in head and neck squamous cell carcinoma. Oncogene 38, 3475–3487 (2019).
Henderson, S., Chakravarthy, A., Su, X., Boshoff, C. & Fenton, T. R. APOBEC-mediated cytosine deamination links PIK3CA helical domain mutations to human papillomavirus-driven tumor development. Cell Rep. 7, 1833–1841 (2014).
Li, Z. et al. APOBEC signature mutation generates an oncogenic enhancer that drives LMO1 expression in T-ALL. Leukemia 31, 2057–2064 (2017).
De Bruin, E. C. et al. Spatial and temporal diversity in genomic instability processes defines lung cancer evolution. Science 346, 251–256 (2014).
McGranahan, N. et al. Clonal status of actionable driver events and the timing of mutational processes in cancer evolution. Sci. Transl. Med. 7, 283ra54 (2015).
Ullah, I. et al. Evolutionary history of metastatic breast cancer reveals minimal seeding from axillary lymph nodes. J. Clin. Invest. 128, 1355–1370 (2018).
Reijns, M. A. M. et al. Lagging strand replication shapes the mutational landscape of the genome. Nature 518, 502–506 (2015).
Taylor, B. J. et al. DNA deaminases induce break-associated mutation showers with implication of APOBEC3B and 3A in breast cancer kataegis. eLife 2, e00534 (2013).
D’Antonio, M., Tamayo, P., Mesirov, J. P. & Frazer, K. A. Kataegis expression signature in breast cancer is associated with late onset, better prognosis, and higher HER2 levels. Cell Rep. 16, 672–683 (2016).
Petljak, M. et al. Characterizing mutational signatures in human cancer cell lines reveals episodic APOBEC mutagenesis. Cell 176, 1282–1294.e20 (2019).
Zhang, Y. et al. A pan-cancer compendium of genes deregulated by somatic genomic rearrangement across more than 1,400 cases. Cell Rep. 24, 515–527 (2018).
Yang, Y., Sterling, J., Storici, F., Resnick, M. A. & Gordenin, D. A. Hypermutability of damaged single-strand DNA formed at double-strand breaks and uncapped telomeres in yeast Saccharomyces cerevisiae. PLoS Genet. 4, e1000264 (2008).
Chan, K. et al. An APOBEC3A hypermutation signature is distinguishable from the signature of background mutagenesis by APOBEC3B in human cancers. Nat. Genet. 47, 1067–1072 (2015).
De, S. & Michor, F. DNA replication timing and long-range DNA interactions predict mutational landscapes of cancer genomes. Nat. Biotechnol. 29, 1103–1108 (2011).
Tomkova, M., Tomek, J., Kriaucionis, S. & Schuster-Böckler, B. Mutational signature distribution varies with DNA replication timing and strand asymmetry. Genome Biol. 19, 129 (2018).
Woo, Y. H. & Li, W.-H. DNA replication timing and selection shape the landscape of nucleotide variation in cancer genomes. Nat. Commun. 3, 1004 (2012).
Zou, X. et al. Validating the concept of mutational signatures with isogenic cell models. Nat. Commun. 9, 1744 (2018).
Li, F. et al. The histone mark H3K36me3 regulates human DNA mismatch repair through its interaction with MutSα. Cell 153, 590–600 (2013).
Barski, A. et al. High-resolution profiling of histone methylations in the human genome. Cell 129, 823–837 (2007).
Vavouri, T. & Lehner, B. Human genes with CpG island promoters have a distinct transcription-associated chromatin organization. Genome Biol. 13, R110 (2012).
Huang, Y., Gu, L. & Li, G.-M. H3K36me3-mediated mismatch repair preferentially protects actively transcribed genes from mutation. J. Biol. Chem. 293, 7811–7823 (2018).
Mugal, C. F., von Grünberg, H.-H. & Peifer, M. Transcription-induced mutational strand bias and its effect on substitution rates in human genes. Mol. Biol. Evol. 26, 131–142 (2009).
Pfister, S. X. et al. SETD2-dependent histone H3K36 trimethylation is required for homologous recombination repair and genome stability. Cell Rep. 7, 2006–2018 (2014).
Chen, J. & Furano, A. V. Breaking bad: the mutagenic effect of DNA repair. DNA Repair 32, 43–51 (2015).
Andrianova, M. A., Bazykin, G. A., Nikolaev, S. I. & Seplyarskiy, V. B. Human mismatch repair system balances mutation rates between strands by removing more mismatches from the lagging strand. Genome Res. 27, 1336–1343 (2017).
Shinbrot, E. et al. Exonuclease mutations in DNA polymerase epsilon reveal replication strand specific mutation patterns and human origins of replication. Genome Res. 24, 1740–1750 (2014).
Jiricny, J. The multifaceted mismatch-repair system. Nat. Rev. Mol. Cell Biol. 7, 335–346 (2006).
Tran, P. T., Erdeniz, N., Symington, L. S. & Liskay, R. M. EXO1-A multi-tasking eukaryotic nuclease. DNA Repair 3, 1549–1559 (2004).
Cortes-Ciriano, I., Lee, S., Park, W.-Y., Kim, T.-M. & Park, P. J. A molecular portrait of microsatellite instability across multiple cancers. Nat. Commun. 8, 15180 (2017).
Hause, R. J., Pritchard, C. C., Shendure, J. & Salipante, S. J. Classification and characterization of microsatellite instability across 18 cancer types. Nat. Med. 22, 1342–1350 (2016).
Maruvka, Y. E. et al. Analysis of somatic microsatellite indels identifies driver events in human tumors. Nat. Biotechnol. 35, 951–959 (2017).
Hombauer, H., Srivatsan, A., Putnam, C. D. & Kolodner, R. D. Mismatch repair, but not heteroduplex rejection, is temporally coupled to DNA replication. Science 334, 1713–1716 (2011).
Hombauer, H., Campbell, C. S., Smith, C. E., Desai, A. & Kolodner, R. D. Visualization of eukaryotic DNA mismatch repair reveals distinct recognition and repair intermediates. Cell 147, 1040–1053 (2011).
Jeon, Y. et al. Dynamic control of strand excision during human DNA mismatch repair. Proc. Natl Acad. Sci. USA 113, 3281–3286 (2016).
Smith, D. J. & Whitehouse, I. Intrinsic coupling of lagging-strand synthesis to chromatin assembly. Nature 483, 434–438 (2012).
Bowen, N. et al. Reconstitution of long and short patch mismatch repair reactions using Saccharomyces cerevisiae proteins. Proc. Natl Acad. Sci. USA 110, 18472–18477 (2013).
Brosey, C. A. et al. A new structural framework for integrating replication protein A into DNA processing machinery. Nucleic Acids Res. 41, 2313–2327 (2013).
Fan, J. & Pavletich, N. P. Structure and conformational change of a replication protein A heterotrimer bound to ssDNA. Genes Dev. 26, 2337–2347 (2012).
Supek, F. & Lehner, B. Scales and mechanisms of somatic mutation rate variation across the human genome. DNA Repair 81, 102647 (2019).
Bailey, M. H. et al. Comprehensive characterization of cancer driver genes and mutations. Cell 173, 371–385.e18 (2018).
Pich, O. et al. The mutational footprints of cancer therapies. Nat. Genet. 51, 1732–1740 (2019).
Hodis, E. et al. A landscape of driver mutations in melanoma. Cell 150, 251–263 (2012).
Drost, J. et al. Use of CRISPR-modified human stem cell organoids to study the origin of mutational signatures in cancer. Science 358, 234–238 (2017).
Lodato, M. A. et al. Aging and neurodegeneration are associated with increased mutations in single human neurons. Science 359, 555–559 (2018).
Verheijen, B. M., Vermulst, M. & van Leeuwen, F. W. Somatic mutations in neurons during aging and neurodegeneration. Acta Neuropathol. 135, 811–826 (2018).
Lei, L. et al. APOBEC3 induces mutations during repair of CRISPR–Cas9-generated DNA breaks. Nat. Struct. Mol. Biol. 25, 45–52 (2018).
Belfield, E. J. et al. DNA mismatch repair preferentially protects genes from mutation. Genome Res. 28, 66–74 (2018).
Lujan, S. A. et al. Heterogeneous polymerase fidelity and mismatch repair bias genome variation and composition. Genome Res. 24, 1751–1764 (2014).
Peña-Diaz, J. et al. Noncanonical mismatch repair as a source of genomic instability in human cells. Mol. Cell 47, 669–680 (2012).
Zlatanou, A. et al. The hMSH2–hMSH6 complex acts in concert with monoubiquitinated PCNA and pol η in response to oxidative DNA damage in human cells. Mol. Cell 43, 649–662 (2011).
Saunders, C. T. et al. Strelka: accurate somatic small-variant calling from sequenced tumor–normal sample pairs. Bioinformatics 28, 1811–1817 (2012).
Priestley, P. et al. Pan-cancer whole-genome analyses of metastatic solid tumours. Nature 575, 210–216 (2019).
Huang, M. N. et al. MSIseq: software for assessing microsatellite instability from catalogs of somatic mutations. Sci. Rep. 5, 13321 (2015).
Wang, J. et al. Clonal evolution of glioblastoma under therapy. Nat. Genet. 48, 768–776 (2016).
Hayward, N. K. et al. Whole-genome landscapes of major melanoma subtypes. Nature 545, 175–180 (2017).
Campbell, P. J. et al. Pan-cancer analysis of whole genomes. Nature 578, 82–93 (2020).
Ellrott, K. et al. Scalable open science approach for mutation calling of tumor exomes using multiple genomic pipelines. Cell Syst. 6, 271–281.e7 (2018).
Grün, B. & Leisch, F. FlexMix version 2: finite mixtures with concomitant variables and varying and constant parameters. J. Stat. Softw. 28, 1–35 (2008).
Khodabakhshi, A. H. et al. Recurrent targets of aberrant somatic hypermutation in lymphoma. Oncotarget 3, 1308–1319 (2012).
Krüger, S. et al. Rare variants in neurodegeneration associated genes revealed by targeted panel sequencing in a German ALS cohort. Front. Mol. Neurosci. 9, 92 (2016).
Hart, T. et al. Evaluation and design of genome-wide CRISPR/SpCas9 knockout screens. G3 (Bethesda) 7, 2719–2727 (2017).
Liu, J. et al. An integrated TCGA pan-cancer clinical data resource to drive high-quality survival outcome analytics. Cell 173, 400–416.e11 (2018).
Acknowledgements
We thank the members of the Genome Data Science group and B. Lehner for comments and discussions. This work was funded by the ERC Starting Grant HYPER-INSIGHT (757700) and the Spanish Ministry of Economy and Competitiveness (REGIOMUT, grant number BFU2017-89833-P). The results published here are in whole or part based on data generated by the TCGA Research Network (https://www.cancer.gov/tcga). This publication and the underlying research are partly facilitated by the Hartwig Medical Foundation and Center for Personalized Cancer Treatment (CPCT), which have generated, analyzed and made available data for this research. D.M.P. was funded by a Severo Ochoa FPI fellowship (MCIU/Fondo Social Europeo; BES-2017-079820). F.S. was funded by the ICREA Research Professor program and is a member of the EMBO Young Investigator Program. The authors acknowledge support from the Severo Ochoa Centre of Excellence program to IRB Barcelona.
Author information
Contributions
F.S. and D.M.-P. conceptualized the study and devised the methodology. D.M.-P. carried out the formal analysis and the investigation, operated the software and performed data visualization. D.M.-P. and F.S. wrote and edited the draft manuscript. F.S. acquired the funding and supervised the study.
Ethics declarations
Competing interests
The authors declare no competing interests.
Additional information
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Extended data
Extended Data Fig. 1 Detecting clustered mutations and simulating processes that generate clustered mutations.
a, Method to determine significant mutation clustering using HyperClust. A baseline distribution is generated by shuffling mutations within 1 Mbp windows multiple times (R1, R2, …, Rn) to loci with matching trinucleotide contexts. For every mutation, the observed intermutational distance to its nearest neighbour (nIMD) is compared with distributions of expected IMDs (from randomized data) to determine a local FDR (lfdr). Thresholding by lfdr yields clustered mutation calls (blue). b, Overview of the study. c, Precision-recall curves for models in Fig. 1a, derived from simulated data with spiked-in mutation clusters: kataegis (top; five mutations per cluster at an average pairwise distance of 600 bp) or omikli_M (bottom; two mutations at 101 bp). Two examples of high mutation burden tumors (TCGA-AP-A0LD, TCGA-AP-A0LE) were used to generate the background mutation distributions. d, e, Testing accuracy of mutation cluster calling methods using simulated data. Points represent randomized tumor samples into which spiked-in mutation clusters were introduced. Samples are ordered according to total mutation burden (panel d). Columns show different performance metrics: F1 score, precision and recall, all at lfdr = 20%. Rows represent different types of spiked-in mutation clusters (IMD distributions are plotted in panel e, where kataegis have five mutations and omikli_K/M/O two mutations). Boxplots compare cluster calling methods, including implementations of some previous methodologies (details in Methods). The “strand-clonality-lfdr” method (blue) is the HyperClust method used throughout our work. f, g, Poisson mixture modelling (related to Fig. 1d) of the number of mutations per cluster, showing the relative likelihood (panel f) of models with increasing numbers of components and the density functions (panel g) of a model with two Poisson components. The solid line represents the mean and dashed lines the 95% C.I.
h, Number of mutation events per tumor sample (x axis, n) per local hypermutation type (rows), either the A3 context TCW>K mutations, or the remaining mutations (columns).
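The nIMD-versus-randomization comparison in panel a can be illustrated with a minimal Python sketch that makes several simplifying assumptions. It is not the HyperClust implementation, which is available at the GitHub link under Code availability. Here the randomization moves each mutation to a uniform position inside its own 1 Mbp window and does not match trinucleotide context. The lfdr is approximated as the ratio of randomized to observed nIMD counts within log10-distance bins, with the null proportion assumed to be 1. The function names and the bin width are hypothetical.

```python
import math
import random


def nearest_imd(positions):
    """Return sorted positions and each mutation's nearest-neighbour IMD (nIMD)."""
    pos = sorted(positions)
    nimd = []
    for i, p in enumerate(pos):
        left = p - pos[i - 1] if i > 0 else math.inf
        right = pos[i + 1] - p if i + 1 < len(pos) else math.inf
        nimd.append(min(left, right))
    return pos, nimd


def randomize(positions, rng, window=1_000_000):
    """Move each mutation to a uniform random locus within its own window.

    Simplification: the real baseline also matches trinucleotide context.
    """
    return [(p // window) * window + rng.randrange(window) for p in positions]


def local_fdr(positions, n_random=20, bin_width=0.25, seed=0):
    """Approximate lfdr(d) = f0(d) / f(d) on log10-binned nIMDs, assuming pi0 = 1.

    Requires at least two mutations so that every nIMD is finite.
    """
    rng = random.Random(seed)
    pos, observed = nearest_imd(positions)

    def bin_of(d):
        return int(math.log10(d + 1) / bin_width)

    obs_counts = {}
    for d in observed:
        b = bin_of(d)
        obs_counts[b] = obs_counts.get(b, 0) + 1

    null_counts = {}
    for _ in range(n_random):
        _, expected = nearest_imd(randomize(positions, rng))
        for d in expected:
            b = bin_of(d)
            null_counts[b] = null_counts.get(b, 0) + 1

    lfdr = []
    for d in observed:
        b = bin_of(d)
        f0 = null_counts.get(b, 0) / n_random  # mean count per randomized genome
        lfdr.append(min(1.0, f0 / obs_counts[b]))
    return pos, observed, lfdr
```

Calling clusters as positions with lfdr ≤ 0.2 corresponds to the 20% threshold quoted for panels d and e. Binning distances and fixing the null proportion are shortcuts chosen for brevity.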
Extended Data Fig. 2 Tetranucleotide context suggests a role for the A3A enzyme in generating omikli and A3B in kataegis mutations.
a, c, Ratios of the YTCA (A3A-like) and RTCA (A3B-like) mutation frequencies suggest differential mutagenic activity of A3A versus A3B enzymes in cancer samples. The C>T and the C>G changes in the two A3 contexts are shown in a pan-cancer analysis (panel a) and broken down by cancer type (panel c). At least 100 TCW mutations of a certain type across all tumor samples in a tissue were required to perform analyses on that tissue (number of mutations in brackets). Error bars are the bootstrap 95% C.I. of the ratio. KICH and THCA cancer types are not shown due to the low overall number of A3-context mutations. b, Across multiple cancer types, omikli shows a tendency towards A3A-like, lower RTCA/YTCA ratios than does kataegis. The difference was tested by two-tailed Fisher's exact test (per tumor type); p-values were adjusted for multiple testing. The dashed line is FDR = 20%. Lower odds ratios (<1) denote relative enrichment of YTCA (A3A-like) mutations in omikli compared to kataegis; see schematic above plot.
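Panels a and c report bootstrap 95% C.I.s for a ratio of mutation counts. A percentile bootstrap over tumor samples is one plausible way to obtain such an interval. The sketch below assumes that each sample contributes a pair of (RTCA, YTCA) mutation counts (R, purine; Y, pyrimidine) and that the ratio is computed from pooled counts. Both choices and the function names are illustrative assumptions, not the study's exact procedure.

```python
import random


def ratio_of_sums(samples):
    """RTCA/YTCA ratio from (rtca_count, ytca_count) pairs pooled over samples."""
    rtca = sum(r for r, _ in samples)
    ytca = sum(y for _, y in samples)
    return rtca / ytca if ytca else float("nan")


def bootstrap_ratio_ci(samples, n_boot=2000, level=0.95, seed=0):
    """Percentile bootstrap CI, resampling tumor samples with replacement."""
    rng = random.Random(seed)
    point = ratio_of_sums(samples)
    boots = []
    for _ in range(n_boot):
        ratio = ratio_of_sums(rng.choices(samples, k=len(samples)))
        if ratio == ratio:  # skip NaN (a resample with zero YTCA mutations)
            boots.append(ratio)
    boots.sort()
    tail = (1 - level) / 2

    def percentile(q):
        return boots[round(q * (len(boots) - 1))]

    return point, percentile(tail), percentile(1 - tail)
```

Resampling whole tumors, rather than individual mutations, keeps between-sample heterogeneity in the interval. That choice is also an assumption.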
Extended Data Fig. 3 Association of clustered mutation rates with replication time (RT).
a, RT association per cancer type. Number of mutations per RT bin: A3 context (top row) and the non-A3 control context at C:G nucleotide pairs (bottom row). RT bins are ordered from the latest-replicating quartile to the earliest-replicating quartile; mutation rates are shown relative to the latest RT bin. Enrichments are not shown when the mutation count was lower than 10. b, Trinucleotide composition of the human reference genome in four RT bins, normalized to the latest RT quartile (leftmost point). The A3 trinucleotide contexts (TCW, green) are similarly abundant in the late and in the early-replicating regions of the genome. c, d, Enrichment of A3-context kataegis clusters, considering only RT (c), or jointly considering RT, mRNA levels and the H3K36me3 histone mark levels (d); points are coefficients from negative binomial regression, and error bars are 95% C.I. e, Mutation rates in genomic bins with different CpG density (determined per 10 kb segment), stratified by RT quartiles. y axis shows mutation densities relative to the first bin (‘t1’, lowest tertile by CpG content). f, Spearman correlation between mRNA expression of A3A, A3B and MMR genes, and the TCW context enrichment of clustered mutations in a tumor. Error bars are 95% C.I. from the Fisher transformation of the correlation coefficient. g, Association of A3 mutation burden (clustered and unclustered) with copy number alterations of MMR genes. Significance by a two-tailed Mann-Whitney test, comparing tumor samples with neutral (0) versus gain/amplification (+1 and +2) states (blue stars, showing p-values according to legend), and independently, comparing samples with neutral (0) versus loss (−1 and −2) states (purple stars). P-values were not adjusted.
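Panel f reports Spearman correlations with 95% C.I.s derived from the Fisher transformation. The following is a minimal standard-library sketch of that calculation. It assumes the textbook standard error of 1/sqrt(n - 3) on the z = atanh(rho) scale; some authors inflate this slightly for Spearman's rho. The helper names are hypothetical.

```python
import math
from statistics import NormalDist


def rank(values):
    """Ranks starting at 1; tied values receive their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def pearson(x, y):
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def spearman_ci(x, y, level=0.95):
    """Spearman rho with a Fisher z-transform CI (undefined if |rho| == 1)."""
    rho = pearson(rank(x), rank(y))  # Spearman = Pearson on ranks
    z = math.atanh(rho)
    se = 1 / math.sqrt(len(x) - 3)
    q = NormalDist().inv_cdf(0.5 + level / 2)
    return rho, math.tanh(z - q * se), math.tanh(z + q * se)
```

Because the interval is computed on the z scale and then back-transformed, it is asymmetric around rho and stays within (-1, 1).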
Extended Data Fig. 4 Simulations estimate power to detect mutation clusters and deconvolute their IMD distributions.
a, b, An analysis of somatic hypermutation (SHM) events in lymphoid cancers suggests the length of MMR excision tracts in human cells. The distance from the initiating AID mutation (here, WNCYN>N context) to the flanking mutation introduced by error-prone MMR (here, any mutation at an A:T pair) is plotted in known SHM off-target regions (blue) and, as a control, in intergenic regions (red) (panel a). A statistically significant enrichment is seen in the bins of distance to the central AID mutation (x axis) between 400 and 1,000 nt (panel b). Numbers above/below bars are p-values by chi-squared test on the standardized residuals. c, Gamma mixture modelling of the IMD distributions. Log-likelihood values for different numbers of components when modelling the IMD of the A3 kataegis and omikli mutations. d, The alpha and beta parameters of the three fitted gamma distributions (‘comp.1’, ‘comp.2’ and ‘comp.3’) approximately match the alpha and beta parameters expected from simulated distributions with IMD at 30 bp, 800 bp and 200 bp, respectively. e, f, Simulations that spike clustered mutations into genomes obtained by randomizing and subsampling mutations from MSI-H hypermutated tumors (panel e) and other hypermutators (panel f), with the goal of determining the recall (or sensitivity; y axis) of recovering mutation clusters at various global mutation burdens (x axis). The dashed line is a loess fit and the shaded area is its 95% C.I. Vertical lines are residuals of the fit. g, Difference between MSI and MSS tumor samples in the absolute burden of clustered A3 omikli mutations; significance by two-tailed Mann-Whitney test.
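The p-values in panel b come from a chi-squared test on standardized residuals. One common formulation is sketched below as an assumption. It builds a contingency table of flanking-mutation counts, with rows for SHM off-target versus intergenic regions and columns for distance bins. It then computes an adjusted standardized residual for each cell and converts it to a two-sided normal p-value, which equals the chi-squared p-value with one degree of freedom. The table layout and function names are hypothetical.

```python
import math
from statistics import NormalDist


def adjusted_residuals(table):
    """Adjusted standardized residuals for an r x c table of counts.

    Assumes at least two rows and two columns with nonzero totals.
    """
    n = sum(map(sum, table))
    row_tot = [sum(row) for row in table]
    col_tot = [sum(col) for col in zip(*table)]
    residuals = []
    for i, row in enumerate(table):
        out = []
        for j, obs in enumerate(row):
            exp = row_tot[i] * col_tot[j] / n
            var = exp * (1 - row_tot[i] / n) * (1 - col_tot[j] / n)
            out.append((obs - exp) / math.sqrt(var))
        residuals.append(out)
    return residuals


def residual_pvalues(table):
    """Two-sided p-value per cell (equivalently, chi-squared with 1 df)."""
    nd = NormalDist()
    return [[2 * (1 - nd.cdf(abs(r))) for r in row] for row in adjusted_residuals(table)]
```

A positive residual in a distance bin for the SHM-region row indicates more flanking mutations at that distance than expected from the intergenic control.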
Extended Data Fig. 5 Validation analyses using independent genomic data sets.
a–c, Fitting a Poisson distribution mixture to the number of mutations per cluster in the Hartwig Medical Foundation (HMF) dataset. The near-maximum log likelihood (LL) is obtained with two components (panel c), and the increase to three components is not statistically supported; p-values are from a two-sided bootstrap test. d, e, In the HMF data, the relative density of A3-context clustered mutations (left column) is higher in MSS (MMR-proficient) than in MSI (MMR-deficient) samples of the same tumor type. The difference is smaller for the non-A3, control context (right column). Significance by two-tailed Mann-Whitney test; n is the number of samples; *** denotes p < 0.001. Numbers show the fold difference between MSS and MSI samples. The ‘other A3 tissues’ are lung, head-and-neck, skin, pancreas and bladder cancer. f, In HMF data, the A3-context omikli clustered mutations are enriched in tumors with amplified MMR genes; significance by two-tailed Mann-Whitney test comparing the neutral (0) versus the gain states (+1 and +2, considered jointly); n is the number of samples. g, In HMF data, A3-context omikli are enriched in early-replicating, H3K36me3-marked genomic regions; error bars are 95% C.I. h, Intermutational distance distributions for kataegis (top) and omikli (bottom) A3-context mutations in the HMF data. Dashed lines show peaks of the simulated distributions (Fig. 2) with segment lengths of 25 bp (green), 200 bp (purple) and 800 bp (orange). i, j, Whole-exome sequences in the TCGA data show an excess of A3-context (TCW) mutation fraction in MSS compared to MSI cancers (panel i), and an excess of TCW mutations at distances <1,000 bp, normalized to longer distances, in MSS over MSI samples (panel j). ‘MSI-exp’ (n = 152) denotes experimentally established MSI-H status, ‘MSI-pred’ (n = 18) denotes MSI status predicted using machine learning (ref. 61), and ‘nonMSI’ (n = 5,661) denotes neither.
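Panels a–c, like Extended Data Fig. 1f, g, compare Poisson mixtures with different numbers of components. The sketch below fits such a mixture with a plain expectation-maximization (EM) loop and returns the log-likelihood, so that one- and two-component fits can be compared. It does not implement the two-sided bootstrap test used for the figure. The initialization scheme and function names are assumptions for illustration.

```python
import math


def poisson_logpmf(k, lam):
    return k * math.log(lam) - lam - math.lgamma(k + 1)


def _component_logs(k, weights, lams):
    """Per-component log(weight * pmf) and their log-sum-exp."""
    logs = [math.log(w) + poisson_logpmf(k, lam) for w, lam in zip(weights, lams)]
    top = max(logs)
    return logs, top + math.log(sum(math.exp(v - top) for v in logs))


def fit_poisson_mixture(counts, n_comp, n_iter=1000, tol=1e-9):
    """EM fit of a finite Poisson mixture; returns (weights, means, log-likelihood)."""
    lo, hi = min(counts), max(counts)
    # Spread initial means over the data range so components start distinct.
    lams = [max(lo + (hi - lo) * (c + 1) / (n_comp + 1), 0.1) for c in range(n_comp)]
    weights = [1.0 / n_comp] * n_comp
    prev_ll = -math.inf
    for _ in range(n_iter):
        # E-step: responsibilities and log-likelihood of the current parameters.
        resp, ll = [], 0.0
        for k in counts:
            logs, total = _component_logs(k, weights, lams)
            resp.append([math.exp(v - total) for v in logs])
            ll += total
        if ll - prev_ll < tol:
            break
        prev_ll = ll
        # M-step: update mixing weights and component means.
        for c in range(n_comp):
            rc = max(sum(r[c] for r in resp), 1e-12)
            weights[c] = rc / len(counts)
            lams[c] = max(sum(r[c] * k for r, k in zip(resp, counts)) / rc, 1e-6)
    ll = sum(_component_logs(k, weights, lams)[1] for k in counts)
    return weights, lams, ll
```

Adding a component never lowers the maximized log-likelihood, so a raw LL comparison always favours more components. This is why a formal test, such as the bootstrap mentioned in the legend, is needed to decide whether a third component is supported.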
Extended Data Fig. 6 Contribution of the omikli and the kataegis mechanisms to the unclustered A3 mutation burden in various tissues.
a, The omikli mechanism generates many unclustered mutations (‘A3-O’) in various cancer types. b, The kataegis mechanism generates comparatively few unclustered mutations (‘A3-K’). Panels show the fit (red line) of the unclustered A3 burden (y axis) to the clustered A3 burden (x axis) (see Methods). Error bars are 95% prediction intervals at x = 0 and at x = the mean burden of A3 clustered mutations for that cancer type. Horizontal dashed lines are the predicted numbers of unclustered A3 mutations at those two points (for clarity, also shown as blue/green bars next to each plot). Fits use robust regression (rlm function in R). For visual clarity, only the part of the plot up to the mean unclustered mutation burden plus a margin is shown; however, the fit uses all data points (that is, tumor samples), including ones not visualized.
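The fits in this figure use robust regression via R's rlm function. As a rough Python analogue, the sketch below computes a Huber M-estimate of a straight line by iteratively reweighted least squares. It approximates rlm's defaults (Huber psi with k = 1.345 and a MAD-based residual scale) and then reads off predictions at x = 0 and at the mean clustered burden, the two points marked in the panels. It is a hypothetical stand-in, not the code used for the figure.

```python
import statistics


def huber_line(x, y, k=1.345, n_iter=100, tol=1e-10):
    """Huber M-estimate of y = a + b*x via iteratively reweighted least squares."""
    w = [1.0] * len(x)
    a = b = 0.0
    for _ in range(n_iter):
        # Weighted least-squares fit with the current weights.
        sw = sum(w)
        mx = sum(wi * xi for wi, xi in zip(w, x)) / sw
        my = sum(wi * yi for wi, yi in zip(w, y)) / sw
        sxy = sum(wi * (xi - mx) * (yi - my) for wi, xi, yi in zip(w, x, y))
        sxx = sum(wi * (xi - mx) ** 2 for wi, xi in zip(w, x))
        b_new = sxy / sxx
        a_new = my - b_new * mx
        converged = abs(a_new - a) < tol and abs(b_new - b) < tol
        a, b = a_new, b_new
        if converged:
            break
        # Robust residual scale (MAD) and Huber weights for the next pass.
        resid = [yi - (a + b * xi) for xi, yi in zip(x, y)]
        scale = statistics.median([abs(r) for r in resid]) / 0.6745
        if scale == 0:
            break  # at least half of the points are fitted exactly
        w = [1.0 if abs(r) <= k * scale else k * scale / abs(r) for r in resid]
    return a, b


def predictions_at_zero_and_mean(clustered, unclustered):
    """Predicted unclustered burden at zero and at the mean clustered burden."""
    a, b = huber_line(clustered, unclustered)
    mean_clustered = sum(clustered) / len(clustered)
    return a, a + b * mean_clustered
```

Huber weights shrink the influence of tumors with extreme burdens without discarding them, in keeping with the legend's note that all samples enter the fit.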
Extended Data Fig. 7 Mechanisms underlying A3 clustered mutations generate many impactful changes, affecting disease genes.
a, Coding regions in the human genome are enriched for CpG dinucleotides (NCG), but not for the A3-context TCW trinucleotides, compared to random expectation. b, Enrichment of mutations in exons versus introns (an estimate of selection strength; x axis) and enrichment in intergenic regions versus introns (an estimate of the redistribution of mutations towards regions containing genic DNA; y axis, flipped). The comparison of mutagenic agents against APOBEC was performed for selected tissues, matching the relevant tissue with the particular mutagen (tumor samples listed in Supplementary Table 7). Error bars are 95% C.I. from negative binomial regression; numbers in parentheses are the tally of mutations. c, The differential functional impact of the tested mutagens across replication time (RT) bins. Left: total length of coding sequences (CDS) in the late and early RT bins, shaded by the RT sextiles that were merged to create the two bins (where 1 is the latest and 6 is the earliest RT). Middle: expected number of cancer gene CDS-affecting mutations in an average tumor sample (same sets of samples, genes and mutations as in Fig. 5a; y axis) for the late versus early RT bin (x axis), for various mutagens (colors); error bars are s.e.m. Right: fold difference between the functional impact at the late versus early bin, for various mutagen types. d, e, The functional impact density (FID) of various mutational processes in a set of cell-essential genes (panel d) and neurodegenerative disease-associated genes (panel e). The slope shows the fraction of impactful genetic changes, i.e., those affecting the CDS of at least one gene in the set. Points show the expected number of impactful changes resulting from a mutational process, on average, in a tumor genome affected by that mutational process. Error bars are s.e.m. ‘APOBEC-O4’ is A3 mutagenesis in omikli-rich tumors. ‘APOBEC-K2’ is A3 mutagenesis in kataegis-rich tumors.
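Panels d and e rest on counting mutations that fall within the coding sequence of at least one gene in a set. The following minimal sketch shows that bookkeeping under several assumptions: mutations are given as (chromosome, position) pairs, CDS intervals as half-open (chromosome, start, end) tuples, and there is one mutation list per tumor. These names and the coordinate convention are all assumptions. The sketch returns the fraction of impactful mutations (the slope described above) and the mean number of impactful mutations per tumor.

```python
import bisect
from collections import defaultdict


def merge_cds(intervals):
    """Merge (chrom, start, end) half-open intervals into sorted start/end lists."""
    by_chrom = defaultdict(list)
    for chrom, start, end in intervals:
        by_chrom[chrom].append((start, end))
    merged = {}
    for chrom, spans in by_chrom.items():
        spans.sort()
        out = []
        for start, end in spans:
            if out and start <= out[-1][1]:
                out[-1][1] = max(out[-1][1], end)  # overlapping or touching: extend
            else:
                out.append([start, end])
        merged[chrom] = ([s for s, _ in out], [e for _, e in out])
    return merged


def in_cds(merged, chrom, pos):
    """True if pos falls inside any merged interval on chrom."""
    starts, ends = merged.get(chrom, ([], []))
    i = bisect.bisect_right(starts, pos) - 1
    return i >= 0 and pos < ends[i]


def functional_impact(tumors, cds_intervals):
    """Fraction of mutations hitting the gene-set CDS, and hits per tumor."""
    merged = merge_cds(cds_intervals)
    total = hits = 0
    for mutations in tumors:
        for chrom, pos in mutations:
            total += 1
            hits += in_cds(merged, chrom, pos)
    return hits / total, hits / len(tumors)
```

Merging overlapping CDS intervals first ensures that a mutation hitting two overlapping transcripts is counted once, matching the "at least one gene" definition.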
Extended Data Fig. 8 Associations between genic mutations and global burden of clustered mutations.
a, Associations between A3-context TCW>K mutations in coding regions of each cancer gene, and the global burden of A3 kataegis (top left) or omikli (middle left) and their interaction term (bottom left). Right panel is same as middle-left panel, but showing only the significant genes, with labels. Volcano plots show logistic regression coefficients (transformed to odds ratio) on the x axis and the log FDR on the y axis. Genes that bore coding mutations in at least three tumor samples were tested. b, Number of TCW sites in a gene coding sequence (CDS; x axis) predicts the association of cancer gene mutations (y axis) with A3 omikli burden (bottom) but not with A3 kataegis burden (top). Error bands are 95% C.I. of the linear fit. c, Same association analysis as panel a but for the control, non-A3 context VCN>K mutations in the gene CDS. d, Early RT cancer genes are more affected by A3 mutagenesis. Cancer genes were stratified into RT quartiles (x axis) and logistic regression coefficient (log odds ratio, y axis) linking A3 omikli burden with the presence of a mutation in the CDS of any cancer gene in that RT bin was determined. Error bars are 95% C.I. from logistic regression (on n=593 tumor samples).
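Panel a links the presence of a coding mutation in a gene to a tumor's global omikli or kataegis burden by logistic regression. The sketch below fits a single-predictor logistic model by Newton-Raphson and reports the odds ratio with a Wald 95% C.I. It omits the interaction term and the FDR correction shown in the figure. Applying it to a log-transformed burden is an assumption made here to keep the predictor on a moderate scale, and all names are hypothetical.

```python
import math


def _score_and_information(b0, b1, x, y):
    """Score vector and Fisher information of the logistic log-likelihood."""
    g0 = g1 = h00 = h01 = h11 = 0.0
    for xi, yi in zip(x, y):
        p = 1.0 / (1.0 + math.exp(-(b0 + b1 * xi)))
        w = p * (1.0 - p)
        g0 += yi - p
        g1 += (yi - p) * xi
        h00 += w
        h01 += w * xi
        h11 += w * xi * xi
    return (g0, g1), (h00, h01, h11)


def logistic_or(x, y, n_iter=100, tol=1e-10):
    """Newton-Raphson fit of logit P(y=1) = b0 + b1*x.

    Returns (b0, b1, odds_ratio, ci_low, ci_high) with a Wald 95% CI for exp(b1).
    Assumes the classes are not perfectly separated by x.
    """
    b0 = b1 = 0.0
    for _ in range(n_iter):
        (g0, g1), (h00, h01, h11) = _score_and_information(b0, b1, x, y)
        det = h00 * h11 - h01 * h01
        step0 = (h11 * g0 - h01 * g1) / det
        step1 = (h00 * g1 - h01 * g0) / det
        b0, b1 = b0 + step0, b1 + step1
        if max(abs(step0), abs(step1)) < tol:
            break
    _, (h00, h01, h11) = _score_and_information(b0, b1, x, y)
    se1 = math.sqrt(h00 / (h00 * h11 - h01 * h01))  # from the inverse information
    z = 1.959964  # two-sided 95% normal quantile
    return b0, b1, math.exp(b1), math.exp(b1 - z * se1), math.exp(b1 + z * se1)
```

For example, x could be math.log1p of each tumor's omikli burden and y a 0/1 indicator of a TCW>K coding mutation in one gene. Repeating the fit per gene would then give one odds ratio per point in the volcano plots.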
Supplementary information
Supplementary Information
Supplementary Note
Supplementary Tables
Supplementary Tables 1–10
About this article
Cite this article
Mas-Ponte, D., Supek, F. DNA mismatch repair promotes APOBEC3-mediated diffuse hypermutation in human cancers. Nat Genet 52, 958–968 (2020). https://doi.org/10.1038/s41588-020-0674-6
DOI: https://doi.org/10.1038/s41588-020-0674-6