  • Analysis
  • Published:

DNA mismatch repair promotes APOBEC3-mediated diffuse hypermutation in human cancers

Abstract

Certain mutagens, including the APOBEC3 (A3) cytosine deaminase enzymes, can create multiple genetic changes in a single event. Activity of A3s results in striking ‘mutation showers’ occurring near DNA breakpoints; however, less is known about the mechanisms underlying the majority of A3 mutations. We classified the diverse patterns of clustered mutagenesis in tumor genomes, which identified a new A3 pattern: nonrecurrent, diffuse hypermutation (omikli). This mechanism occurs independently of the known focal hypermutation (kataegis), and is associated with activity of the DNA mismatch-repair pathway, which can provide the single-stranded DNA substrate needed by A3, and contributes to a substantial proportion of A3 mutations genome wide. Because mismatch repair is directed towards early-replicating, gene-rich chromosomal domains, A3 mutagenesis has a high propensity to generate impactful mutations, which exceeds that of other common carcinogens such as tobacco smoke and ultraviolet exposure. Cells direct their DNA repair capacity towards more important genomic regions; thus, carcinogens that subvert DNA repair can be remarkably potent.

Fig. 1: Two types of local hypermutation in human tumors.
Fig. 2: Association of A3 clustered mutation density with genomic features.
Fig. 3: MMR activity in tumors is associated with APOBEC mutagenesis.
Fig. 4: The omikli process generates the majority of unclustered A3 mutations across tissues.
Fig. 5: APOBEC mutagenesis generates many impactful mutations.

Data availability

Whole-genome sequences from the TCGA project were available through the Cancer Genomics Hub repository (now superseded by the NCI Genomic Data Commons; https://gdc.cancer.gov/). Corresponding SNP array data were downloaded from the GDC legacy portal (https://portal.gdc.cancer.gov/legacy-archive). WGS data from the Hartwig Medical Foundation are available at https://www.hartwigmedicalfoundation.nl/en. The whole-exome sequencing data of TCGA cohort are available through the MC3 dataset at https://gdc.cancer.gov/about-data/publications/mc3-2017. Data generated by the analyses in this study are available in the Supplementary Tables.

Code availability

Code to generate clustered mutation calls was implemented in Python (version 3.6) and R environments (version 3.6). Relevant packages are biopython (version 1.73) and numpy (version 1.15.4) for Python, and Biostrings (2.52.0), VariantAnnotation (1.30.1) and GenomicRanges (1.36.0) for R. Code is available at https://github.com/davidmasp/hyperclust. Statistical analysis of the data was performed using custom scripts in R (version 3.6). Relevant packages are mclust (version 5.4.4), mixtools (version 1.1.0), MASS (version 7.3-51.4) and flexmix (version 2.3-15).

References

  1. Harris, K. & Nielsen, R. Error-prone polymerase activity causes multinucleotide mutations in humans. Genome Res. 24, 1445–1454 (2014).

  2. Rogozin, I. B. et al. DNA polymerase η mutational signatures are found in a variety of different types of cancer. Cell Cycle 17, 348–355 (2018).

  3. Seplyarskiy, V. B. et al. Error-prone bypass of DNA lesions during lagging-strand replication is a common source of germline and cancer mutations. Nat. Genet. 51, 36–41 (2019).

  4. Supek, F. & Lehner, B. Clustered mutation signatures reveal that error-prone DNA repair targets mutations to active genes. Cell 170, 534–547.e23 (2017).

  5. Moris, A., Murray, S. & Cardinaud, S. AID and APOBECs span the gap between innate and adaptive immunity. Front. Microbiol. 5, 534 (2014).

  6. Alexandrov, L. B. et al. Signatures of mutational processes in human cancer. Nature 500, 415–421 (2013).

  7. Burns, M. B., Temiz, N. A. & Harris, R. S. Evidence for APOBEC3B mutagenesis in multiple human cancers. Nat. Genet. 45, 977–983 (2013).

  8. Nik-Zainal, S. et al. Mutational processes molding the genomes of 21 breast cancers. Cell 149, 979–993 (2012).

  9. Roberts, S. A. et al. An APOBEC cytidine deaminase mutagenesis pattern is widespread in human cancers. Nat. Genet. 45, 970–976 (2013).

  10. Roberts, S. A. et al. Clustered mutations in yeast and in human cancers can arise from damaged long single-strand DNA regions. Mol. Cell 46, 424–435 (2012).

  11. Landry, S., Narvaiza, I., Linfesty, D. C. & Weitzman, M. D. APOBEC3A can activate the DNA damage response and cause cell‐cycle arrest. EMBO Rep. 12, 444–450 (2011).

  12. Suspène, R. et al. Somatic hypermutation of human mitochondrial and nuclear DNA by APOBEC3 cytidine deaminases, a pathway for DNA catabolism. Proc. Natl Acad. Sci. USA 108, 4858–4863 (2011).

  13. Byeon, I.-J. L. et al. NMR structure of human restriction factor APOBEC3A reveals substrate binding and enzyme specificity. Nat. Commun. 4, 1890 (2013).

  14. Holtz, C. M., Sadler, H. A. & Mansky, L. M. APOBEC3G cytosine deamination hotspots are defined by both sequence context and single-stranded DNA secondary structure. Nucleic Acids Res. 41, 6139–6148 (2013).

  15. Nik-Zainal, S. et al. Association of a germline copy number polymorphism of APOBEC3A and APOBEC3B with burden of putative APOBEC-dependent mutations in breast cancer. Nat. Genet. 46, 487–491 (2014).

  16. Glaser, A. P. et al. APOBEC-mediated mutagenesis in urothelial carcinoma is associated with improved survival, mutations in DNA damage response genes, and immune response. Oncotarget 9, 4537–4548 (2017).

  17. Cortez, L. M. et al. APOBEC3A is a prominent cytidine deaminase in breast cancer. PLoS Genet. 15, e1008545 (2019).

  18. Sakofsky, C. J. et al. Break-induced replication is a source of mutation clusters underlying kataegis. Cell Rep. 7, 1640–1648 (2014).

  19. Sakofsky, C. J. et al. Repair of multiple simultaneous double-strand breaks causes bursts of genome-wide clustered hypermutation. PLoS Biol. 17, e3000464 (2019).

  20. Kazanov, M. D. et al. APOBEC-induced cancer mutations are uniquely enriched in early-replicating, gene-dense, and active chromatin regions. Cell Rep. 13, 1103–1109 (2015).

  21. Buisson, R. et al. Passenger hotspot mutations in cancer driven by APOBEC3A and mesoscale genomic features. Science 364, eaaw2872 (2019).

  22. Supek, F. & Lehner, B. Differential DNA mismatch repair underlies mutation rate variation across the human genome. Nature 521, 81–84 (2015).

  23. Zheng, C. L. et al. Transcription restores DNA repair to heterochromatin, determining regional mutation rates in cancer genomes. Cell Rep. 9, 1228–1234 (2014).

  24. Haradhvala, N. J. et al. Mutational strand asymmetries in cancer genomes reveal mechanisms of DNA damage and repair. Cell 164, 538–549 (2016).

  25. Morganella, S. et al. The topography of mutational processes in breast cancer genomes. Nat. Commun. 7, 11383 (2016).

  26. Seplyarskiy, V. B. et al. APOBEC-induced mutations in human cancers are strongly enriched on the lagging DNA strand during replication. Genome Res. 26, 174–182 (2016).

  27. Green, A. M. et al. APOBEC3A damages the cellular genome during DNA replication. Cell Cycle 15, 998–1008 (2016).

  28. Kanu, N. et al. DNA replication stress mediates APOBEC3 family mutagenesis in breast cancer. Genome Biol. 17, 185 (2016).

  29. Nikkilä, J. et al. Elevated APOBEC3B expression drives a kataegic-like mutation signature and replication stress-related therapeutic vulnerabilities in p53-defective cells. Br. J. Cancer 117, 113–123 (2017).

  30. Bhagwat, A. S. et al. Strand-biased cytosine deamination at the replication fork causes cytosine to thymine mutations in Escherichia coli. Proc. Natl Acad. Sci. USA 113, 2176–2181 (2016).

  31. Hoopes, J. I. et al. APOBEC3A and APOBEC3B preferentially deaminate the lagging strand template during DNA replication. Cell Rep. 14, 1273–1282 (2016).

  32. Chen, J., Miller, B. F. & Furano, A. V. Repair of naturally occurring mismatches can induce mutations in flanking DNA. eLife 3, e02001 (2014).

  33. Cannataro, V. L. et al. APOBEC-induced mutations and their cancer effect size in head and neck squamous cell carcinoma. Oncogene 38, 3475–3487 (2019).

  34. Henderson, S., Chakravarthy, A., Su, X., Boshoff, C. & Fenton, T. R. APOBEC-mediated cytosine deamination links PIK3CA helical domain mutations to human papillomavirus-driven tumor development. Cell Rep. 7, 1833–1841 (2014).

  35. Li, Z. et al. APOBEC signature mutation generates an oncogenic enhancer that drives LMO1 expression in T-ALL. Leukemia 31, 2057–2064 (2017).

  36. De Bruin, E. C. et al. Spatial and temporal diversity in genomic instability processes defines lung cancer evolution. Science 346, 251–256 (2014).

  37. McGranahan, N. et al. Clonal status of actionable driver events and the timing of mutational processes in cancer evolution. Sci. Transl. Med. 7, 283ra54 (2015).

  38. Ullah, I. et al. Evolutionary history of metastatic breast cancer reveals minimal seeding from axillary lymph nodes. J. Clin. Invest. 128, 1355–1370 (2018).

  39. Reijns, M. A. M. et al. Lagging strand replication shapes the mutational landscape of the genome. Nature 518, 502–506 (2015).

  40. Taylor, B. J. et al. DNA deaminases induce break-associated mutation showers with implication of APOBEC3B and 3A in breast cancer kataegis. eLife 2, e00534 (2013).

  41. D’Antonio, M., Tamayo, P., Mesirov, J. P. & Frazer, K. A. Kataegis expression signature in breast cancer is associated with late onset, better prognosis, and higher HER2 levels. Cell Rep. 16, 672–683 (2016).

  42. Petljak, M. et al. Characterizing mutational signatures in human cancer cell lines reveals episodic APOBEC mutagenesis. Cell 176, 1282–1294.e20 (2019).

  43. Zhang, Y. et al. A pan-cancer compendium of genes deregulated by somatic genomic rearrangement across more than 1,400 cases. Cell Rep. 24, 515–527 (2018).

  44. Yang, Y., Sterling, J., Storici, F., Resnick, M. A. & Gordenin, D. A. Hypermutability of damaged single-strand DNA formed at double-strand breaks and uncapped telomeres in yeast Saccharomyces cerevisiae. PLoS Genet. 4, e1000264 (2008).

  45. Chan, K. et al. An APOBEC3A hypermutation signature is distinguishable from the signature of background mutagenesis by APOBEC3B in human cancers. Nat. Genet. 47, 1067–1072 (2015).

  46. De, S. & Michor, F. DNA replication timing and long-range DNA interactions predict mutational landscapes of cancer genomes. Nat. Biotechnol. 29, 1103–1108 (2011).

  47. Tomkova, M., Tomek, J., Kriaucionis, S. & Schuster-Böckler, B. Mutational signature distribution varies with DNA replication timing and strand asymmetry. Genome Biol. 19, 129 (2018).

  48. Woo, Y. H. & Li, W.-H. DNA replication timing and selection shape the landscape of nucleotide variation in cancer genomes. Nat. Commun. 3, 1004 (2012).

  49. Zou, X. et al. Validating the concept of mutational signatures with isogenic cell models. Nat. Commun. 9, 1744 (2018).

  50. Li, F. et al. The histone mark H3K36me3 regulates human DNA mismatch repair through its interaction with MutSα. Cell 153, 590–600 (2013).

  51. Barski, A. et al. High-resolution profiling of histone methylations in the human genome. Cell 129, 823–837 (2007).

  52. Vavouri, T. & Lehner, B. Human genes with CpG island promoters have a distinct transcription-associated chromatin organization. Genome Biol. 13, R110 (2012).

  53. Huang, Y., Gu, L. & Li, G.-M. H3K36me3-mediated mismatch repair preferentially protects actively transcribed genes from mutation. J. Biol. Chem. 293, 7811–7823 (2018).

  54. Mugal, C. F., von Grünberg, H.-H. & Peifer, M. Transcription-induced mutational strand bias and its effect on substitution rates in human genes. Mol. Biol. Evol. 26, 131–142 (2009).

  55. Pfister, S. X. et al. SETD2-dependent histone H3K36 trimethylation is required for homologous recombination repair and genome stability. Cell Rep. 7, 2006–2018 (2014).

  56. Chen, J. & Furano, A. V. Breaking bad: the mutagenic effect of DNA repair. DNA Repair 32, 43–51 (2015).

  57. Andrianova, M. A., Bazykin, G. A., Nikolaev, S. I. & Seplyarskiy, V. B. Human mismatch repair system balances mutation rates between strands by removing more mismatches from the lagging strand. Genome Res. 27, 1336–1343 (2017).

  58. Shinbrot, E. et al. Exonuclease mutations in DNA polymerase epsilon reveal replication strand specific mutation patterns and human origins of replication. Genome Res. 24, 1740–1750 (2014).

  59. Jiricny, J. The multifaceted mismatch-repair system. Nat. Rev. Mol. Cell Biol. 7, 335–346 (2006).

  60. Tran, P. T., Erdeniz, N., Symington, L. S. & Liskay, R. M. EXO1-A multi-tasking eukaryotic nuclease. DNA Repair 3, 1549–1559 (2004).

  61. Cortes-Ciriano, I., Lee, S., Park, W.-Y., Kim, T.-M. & Park, P. J. A molecular portrait of microsatellite instability across multiple cancers. Nat. Commun. 8, 15180 (2017).

  62. Hause, R. J., Pritchard, C. C., Shendure, J. & Salipante, S. J. Classification and characterization of microsatellite instability across 18 cancer types. Nat. Med. 22, 1342–1350 (2016).

  63. Maruvka, Y. E. et al. Analysis of somatic microsatellite indels identifies driver events in human tumors. Nat. Biotechnol. 35, 951–959 (2017).

  64. Hombauer, H., Srivatsan, A., Putnam, C. D. & Kolodner, R. D. Mismatch repair, but not heteroduplex rejection, is temporally coupled to DNA replication. Science 334, 1713–1716 (2011).

  65. Hombauer, H., Campbell, C. S., Smith, C. E., Desai, A. & Kolodner, R. D. Visualization of eukaryotic DNA mismatch repair reveals distinct recognition and repair intermediates. Cell 147, 1040–1053 (2011).

  66. Jeon, Y. et al. Dynamic control of strand excision during human DNA mismatch repair. Proc. Natl Acad. Sci. USA 113, 3281–3286 (2016).

  67. Smith, D. J. & Whitehouse, I. Intrinsic coupling of lagging-strand synthesis to chromatin assembly. Nature 483, 434–438 (2012).

  68. Bowen, N. et al. Reconstitution of long and short patch mismatch repair reactions using Saccharomyces cerevisiae proteins. Proc. Natl Acad. Sci. USA 110, 18472–18477 (2013).

  69. Brosey, C. A. et al. A new structural framework for integrating replication protein A into DNA processing machinery. Nucleic Acids Res. 41, 2313–2327 (2013).

  70. Fan, J. & Pavletich, N. P. Structure and conformational change of a replication protein A heterotrimer bound to ssDNA. Genes Dev. 26, 2337–2347 (2012).

  71. Supek, F. & Lehner, B. Scales and mechanisms of somatic mutation rate variation across the human genome. DNA Repair 81, 102647 (2019).

  72. Bailey, M. H. et al. Comprehensive characterization of cancer driver genes and mutations. Cell 173, 371–385.e18 (2018).

  73. Pich, O. et al. The mutational footprints of cancer therapies. Nat. Genet. 51, 1732–1740 (2019).

  74. Hodis, E. et al. A landscape of driver mutations in melanoma. Cell 150, 251–263 (2012).

  75. Drost, J. et al. Use of CRISPR-modified human stem cell organoids to study the origin of mutational signatures in cancer. Science 358, 234–238 (2017).

  76. Lodato, M. A. et al. Aging and neurodegeneration are associated with increased mutations in single human neurons. Science 359, 555–559 (2018).

  77. Verheijen, B. M., Vermulst, M. & van Leeuwen, F. W. Somatic mutations in neurons during aging and neurodegeneration. Acta Neuropathol. 135, 811–826 (2018).

  78. Lei, L. et al. APOBEC3 induces mutations during repair of CRISPR–Cas9-generated DNA breaks. Nat. Struct. Mol. Biol. 25, 45–52 (2018).

  79. Belfield, E. J. et al. DNA mismatch repair preferentially protects genes from mutation. Genome Res. 28, 66–74 (2018).

  80. Lujan, S. A. et al. Heterogeneous polymerase fidelity and mismatch repair bias genome variation and composition. Genome Res. 24, 1751–1764 (2014).

  81. Peña-Diaz, J. et al. Noncanonical mismatch repair as a source of genomic instability in human cells. Mol. Cell 47, 669–680 (2012).

  82. Zlatanou, A. et al. The hMSH2–hMSH6 complex acts in concert with monoubiquitinated PCNA and pol η in response to oxidative DNA damage in human cells. Mol. Cell 43, 649–662 (2011).

  83. Saunders, C. T. et al. Strelka: accurate somatic small-variant calling from sequenced tumor–normal sample pairs. Bioinformatics 28, 1811–1817 (2012).

  84. Priestley, P. et al. Pan-cancer whole-genome analyses of metastatic solid tumours. Nature 575, 210–216 (2019).

  85. Huang, M. N. et al. MSIseq: software for assessing microsatellite instability from catalogs of somatic mutations. Sci. Rep. 5, 13321 (2015).

  86. Wang, J. et al. Clonal evolution of glioblastoma under therapy. Nat. Genet. 48, 768–776 (2016).

  87. Hayward, N. K. et al. Whole-genome landscapes of major melanoma subtypes. Nature 545, 175–180 (2017).

  88. Campbell, P. J. et al. Pan-cancer analysis of whole genomes. Nature 578, 82–93 (2020).

  89. Ellrott, K. et al. Scalable open science approach for mutation calling of tumor exomes using multiple genomic pipelines. Cell Syst. 6, 271–281.e7 (2018).

  90. Grün, B. & Leisch, F. FlexMix version 2: finite mixtures with concomitant variables and varying and constant parameters. J. Stat. Softw. 28, 1–35 (2008).

  91. Khodabakhshi, A. H. et al. Recurrent targets of aberrant somatic hypermutation in lymphoma. Oncotarget 3, 1308–1319 (2012).

  92. Krüger, S. et al. Rare variants in neurodegeneration associated genes revealed by targeted panel sequencing in a German ALS cohort. Front. Mol. Neurosci. 9, 92 (2016).

  93. Hart, T. et al. Evaluation and design of genome-wide CRISPR/SpCas9 knockout screens. G3 (Bethesda) 7, 2719–2727 (2017).

  94. Liu, J. et al. An integrated TCGA pan-cancer clinical data resource to drive high-quality survival outcome analytics. Cell 173, 400–416.e11 (2018).

Acknowledgements

We thank the members of the Genome Data Science group and B. Lehner for comments and discussions. This work was funded by the ERC Starting Grant HYPER-INSIGHT (757700) and the Spanish Ministry of Economy and Competitiveness (REGIOMUT, grant number BFU2017-89833-P). The results published here are in whole or part based on data generated by the TCGA Research Network (https://www.cancer.gov/tcga). This publication and the underlying research are partly facilitated by the Hartwig Medical Foundation and Center for Personalized Cancer Treatment (CPCT), which have generated, analyzed and made available data for this research. D.M.P. was funded by a Severo Ochoa FPI fellowship (MCIU/Fondo Social Europeo; BES-2017-079820). F.S. was funded by the ICREA Research Professor program and is a member of the EMBO Young Investigator Program. The authors acknowledge support from the Severo Ochoa Centre of Excellence program to IRB Barcelona.

Author information

Contributions

F.S. and D.M.-P. conceptualized the study and devised the methodology. D.M.-P. carried out the formal analysis and the investigation, operated the software and performed data visualization. D.M.-P. and F.S. wrote and edited the draft manuscript. F.S. acquired the funding and supervised the study.

Corresponding author

Correspondence to Fran Supek.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Extended data

Extended Data Fig. 1 Detecting clustered mutations and simulating processes that generate clustered mutations.

a, Method to determine significant mutation clustering using HyperClust. A baseline distribution is generated by shuffling mutations within 1 Mbp windows multiple times (R1, R2, …, Rn) to loci with matching trinucleotide contexts. For every mutation, the observed intermutational distance to its nearest neighbour (nIMD) is compared with distributions of expected IMDs (from randomized data) to determine a local FDR (lfdr). Thresholding by lfdr yields clustered mutation calls (blue). b, Overview of study. c, Precision-recall curves for models in Fig. 1a, derived from simulated data with spiked-in mutation clusters: kataegis (top; with five mutations per cluster at an average 600 bp pairwise distance) or omikli_M (bottom; two mutations at 101 bp). Two examples of high mutation burden tumors (TCGA-AP-A0LD, TCGA-AP-A0LE) were used to generate the background mutation distributions. d, e, Testing accuracy of mutation cluster calling methods using simulated data. Points represent randomized tumor samples into which spiked-in mutation clusters were introduced. Samples are ordered according to total mutation burden (panel d). Columns show different performance metrics: F1 score, precision, and recall, all at lfdr=20%. Rows represent different types of spiked-in mutation clusters (IMD distributions plotted in panel e, where kataegis have five mutations and omikli_K/M/O two mutations). Boxplots compare cluster calling methods, including implementations of some previous methodologies (details in Methods). The “strand-clonality-lfdr” (blue) is the HyperClust method used throughout our work. f, g, Poisson mixture modelling (related to Fig. 1d) of the number of mutations per cluster, showing relative likelihood (panel f) of models with increasing number of components and the density functions (panel g) of a model with two Poisson components. Solid line represents the mean and dashed lines the 95% C.I. h, Number of mutation events per tumor sample (x axis, n) per local hypermutation type (rows), either the A3 context TCW>K mutations, or the remaining mutations (columns).
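
The nIMD comparison described in panel a can be illustrated with a minimal Python sketch. It is not the HyperClust implementation: the function names are hypothetical, the shuffling ignores trinucleotide context, positions are assumed to lie on a single chromosome, and a simple tail-area FDR stands in for the local FDR.

```python
import bisect
import random


def nearest_imd(positions):
    """Return sorted positions and each mutation's distance to its nearest neighbour (nIMD)."""
    pos = sorted(positions)
    nimd = []
    for i, p in enumerate(pos):
        left = p - pos[i - 1] if i > 0 else float("inf")
        right = pos[i + 1] - p if i < len(pos) - 1 else float("inf")
        nimd.append(min(left, right))
    return pos, nimd


def shuffled_nimd(positions, window=1_000_000, n_rounds=20, seed=0):
    """Pool nIMDs over randomized copies of the sample.

    Each mutation is moved uniformly within its own window.
    Simplification (assumption): trinucleotide context is NOT matched here.
    """
    rng = random.Random(seed)
    pooled = []
    for _ in range(n_rounds):
        moved = [(p // window) * window + rng.randrange(window) for p in positions]
        pooled.extend(nearest_imd(moved)[1])
    return sorted(pooled)


def tail_fdr(observed_nimd, baseline_sorted, max_distance):
    """Tail-area FDR estimate for calling mutations with nIMD <= max_distance as clustered."""
    observed = sum(1 for d in observed_nimd if d <= max_distance) / len(observed_nimd)
    expected = bisect.bisect_right(baseline_sorted, max_distance) / len(baseline_sorted)
    if observed == 0:
        return 1.0
    return min(1.0, expected / observed)
```

In this sketch, a low value from `tail_fdr` at a given distance cutoff indicates that closely spaced mutations are more frequent in the observed sample than in the randomized baseline.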

Extended Data Fig. 2 Tetranucleotide context suggests a role for the A3A enzyme in generating omikli and A3B in kataegis mutations.

a, c, Ratios of the YTCA (A3A-like) and RTCA (A3B-like) mutation frequencies suggest differential mutagenic activity of A3A versus A3B enzymes in cancer samples. The C>T and the C>G changes in the two A3 contexts are shown in a pan-cancer analysis (panel a) and broken down by cancer type (panel c). At least 100 TCW mutations of a certain type across all tumor samples in a tissue were required to perform analyses on that tissue (number of mutations in brackets). Error bars are the bootstrap 95% C.I. of the ratio. KICH and THCA cancer types are not shown due to low overall number of A3-context mutations. b, Across multiple cancer types, omikli shows a tendency towards A3A-like, lower RTCA/YTCA-ratios than does kataegis. Difference tested by Fisher’s exact test (per tumor type), two-tailed; p-values were adjusted for multiple testing. Dashed line is FDR=20%. Lower odds ratios (<1) denote relative enrichment of YTCA (A3A-like) mutations in omikli compared to kataegis; see schematic above plot.
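
The RTCA versus YTCA comparison between omikli and kataegis can be sketched as a 2x2 contingency test. The Python below is an illustrative assumption rather than the authors' code: the helper names are hypothetical, the context is taken as the 4-nt string whose third base is the mutated cytosine, and the two-tailed Fisher's exact test is written out with the standard library.

```python
from math import comb

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def classify_tetranucleotide(context):
    """Return 'YTCA' (A3A-like), 'RTCA' (A3B-like), or None for a 4-nt context."""
    context = context.upper()
    if len(context) != 4 or context[1:] != "TCA":
        return None
    if context[0] in PYRIMIDINES:
        return "YTCA"
    if context[0] in PURINES:
        return "RTCA"
    return None


def odds_ratio(omikli_rtca, omikli_ytca, kataegis_rtca, kataegis_ytca):
    """Odds ratio < 1 means relatively more YTCA mutations in omikli than in kataegis."""
    denominator = omikli_ytca * kataegis_rtca
    if denominator == 0:
        return float("inf")
    return (omikli_rtca * kataegis_ytca) / denominator


def fisher_exact_two_sided(a, b, c, d):
    """Two-tailed Fisher's exact p-value for the table [[a, b], [c, d]]."""
    row1, row2, col1, total = a + b, c + d, a + c, a + b + c + d

    def prob(x):
        # Hypergeometric probability of x counts in the top-left cell.
        return comb(row1, x) * comb(row2, col1 - x) / comb(total, col1)

    lowest, highest = max(0, col1 - row2), min(row1, col1)
    p_observed = prob(a)
    # Sum all tables that are at least as unlikely as the observed one.
    p_value = sum(
        prob(x) for x in range(lowest, highest + 1)
        if prob(x) <= p_observed * (1 + 1e-9)
    )
    return min(1.0, p_value)
```

Per-tumor-type p-values obtained this way would still need adjustment for multiple testing, as the caption notes.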

Extended Data Fig. 3 Association of clustered mutation rates with replication time (RT).

a, RT association per cancer type. Number of mutations per RT bin: A3 context (top row) and the non-A3 control context at C:G nucleotide pairs (bottom row). RT bins are ordered from the latest-replicating quartile to the earliest-replicating quartile; mutation rates are shown relative to the latest RT bin. Enrichments are not shown when the mutation count was lower than 10. b, Trinucleotide composition of the human reference genome in four RT bins, normalized to the latest RT quartile (leftmost point). The A3 trinucleotide contexts (TCW, green) are similarly abundant in the late and in the early-replicating regions of the genome. c, d, Enrichment of A3-context kataegis clusters, considering only RT (c), or jointly considering RT, mRNA levels and the H3K36me3 histone mark levels (d); points are coefficients from negative binomial regression, and error bars are 95% C.I. e, Mutation rates in genomic bins with different CpG density (determined per 10 kb segment), stratified by RT quartiles. y axis shows mutation densities relative to the first bin (‘t1’, lowest tertile by CpG content). f, Spearman correlation between mRNA expression of A3A, A3B and MMR genes, and the TCW context enrichment of clustered mutations in a tumor. Error bars are 95% C.I. from the Fisher transformation of the correlation coefficient. g, Association of A3 mutation burden (clustered and unclustered) with copy number alterations of MMR genes. Significance by a two-tailed Mann-Whitney test, comparing tumor samples with neutral (0) versus gain/amplification (+1 and +2) states (blue stars, showing p-values according to legend), and independently, comparing samples with neutral (0) versus loss (−1 and −2) states (purple stars). P-values were not adjusted.
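
The correlation analysis in panel f (Spearman coefficient with a confidence interval from the Fisher transformation) can be sketched in plain Python. This is a minimal illustration under stated assumptions: the function names are hypothetical, the standard error is taken as 1/sqrt(n - 3), and the input needs more than three paired observations.

```python
import math


def average_ranks(values):
    """1-based ranks, with tied values sharing their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def spearman_with_ci(x, y, z_crit=1.959964):
    """Spearman rho and an approximate 95% C.I. via the Fisher z-transformation."""
    rho = pearson(average_ranks(x), average_ranks(y))
    # Clip so that atanh stays finite when |rho| equals 1.
    clipped = max(min(rho, 1 - 1e-12), -1 + 1e-12)
    z = math.atanh(clipped)
    se = 1 / math.sqrt(len(x) - 3)  # assumed standard error; requires n > 3
    return rho, math.tanh(z - z_crit * se), math.tanh(z + z_crit * se)
```

Here `x` could hold per-tumor mRNA levels of a gene and `y` the TCW enrichment of clustered mutations in the same tumors.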

Extended Data Fig. 4 Simulations estimate power to detect mutation clusters and deconvolute their IMD distributions.

a, b, An analysis of somatic hypermutation (SHM) events in lymphoid cancers suggests the length of MMR excision tracts in human cells. The distance from the initiating AID mutation (here, WNCYN>N context) to the flanking mutation introduced by error-prone MMR (here, any mutation at an A:T pair) is plotted, in known SHM off-target regions (blue) and, as a control, in intergenic regions (red) (panel a). A statistically significant enrichment is seen in the bins of the distance to the central AID mutation (x axis) between 400–1000 nt (panel b). Numbers above/below bars are p-values by Chi-square test on the standardized residuals. c, Gamma mixture modelling of the IMD distributions. Log-likelihood values for different numbers of components when modelling IMD of the A3 kataegis and omikli mutations. d, The alpha and beta parameters of the three fitted gamma distributions (‘comp.1’, ‘comp.2’ and ‘comp.3’) approximately match the alpha and beta parameters expected from simulated distributions with IMD at 30 bp, 800 bp and 200 bp, respectively. e, f, Simulations using spiked-in clustered mutations into genomes obtained by randomizing and subsampling mutations from MSI-H hypermutated tumors (panel e) and other hypermutators (panel f), with the goal of determining the recall (or sensitivity; y axis) of recovering mutation clusters at various global mutation burdens (x axis). Dashed line is a loess fit and shaded area is its 95% C.I. Vertical lines are residuals of the fit. g, Difference between MSI and MSS tumor samples in the absolute burden of clustered A3 omikli mutations; significance by Mann-Whitney test (two-tailed).

Extended Data Fig. 5 Validation analyses using independent genomic data sets.

a–c, Fitting a Poisson distribution mixture to the number of mutations per cluster in the Hartwig Medical Foundation (HMF) dataset. The near-maximum log likelihood (LL) is obtained with two components (panel c) and the increase to three components is not statistically supported; p-values are from a two-sided bootstrap test. d, e, The relative density of A3 context (left) clustered mutations is higher in MSS (MMR-proficient) than in MSI (MMR-deficient) samples of the same tumor type (left column) in the HMF data. The difference is smaller for the non-A3, control context (right). Significance by Mann-Whitney (two-tailed), n is the number of samples, *** is p < 0.001. Numbers show fold-difference between MSS and MSI samples. The ‘other A3 tissues’ are lung, head-and-neck, skin, pancreas and bladder cancer. f, In HMF data, the A3-context omikli clustered mutations are enriched in tumors with amplified MMR genes; significance by Mann-Whitney test (two-tailed) comparing the neutral (0) versus the gain states (+1 and +2, considered jointly); n is the number of samples. g, In HMF data, A3-context omikli are enriched in early replicating, H3K36me3-marked genomic regions; error bars are 95% C.I. h, Intermutational distance distributions for kataegis (top) and omikli (bottom) A3 context mutations in the HMF data. Dashed lines show peaks of the simulated distributions (Fig. 2) with segment lengths of 25 bp (green), 200 bp (purple) and 800 bp (orange). i, j, Whole-exome sequences in the TCGA data show an excess of A3 context (TCW) mutation fraction in MSS compared to MSI cancers (panel i), and an excess of TCW mutations at distances <1000 bp, normalized to longer distances, in MSS over MSI samples (panel j). ‘MSI-exp’ (n = 152) denotes the experimentally established MSI-H status while ‘MSI-pred’ (n = 18) is the MSI status predicted using machine learning (ref. 61), ‘nonMSI’ (n = 5,661) is neither of these cases.
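
The Poisson mixture fit to the number of mutations per cluster (panels a–c) can be illustrated with a small expectation-maximization routine. The code availability statement lists R packages such as flexmix and mixtools for the statistical analysis; the Python below is only a hypothetical stand-in with a deterministic initialisation, and it does not reproduce the bootstrap test used to compare the number of components.

```python
import math


def poisson_logpmf(k, rate):
    """Log probability of observing count k under a Poisson with the given rate."""
    return k * math.log(rate) - rate - math.lgamma(k + 1)


def mixture_log_likelihood(counts, weights, rates):
    """Total log likelihood of the counts under a Poisson mixture."""
    total = 0.0
    for k in counts:
        logs = [math.log(w) + poisson_logpmf(k, r) for w, r in zip(weights, rates)]
        peak = max(logs)
        total += peak + math.log(sum(math.exp(v - peak) for v in logs))
    return total


def fit_poisson_mixture(counts, n_components, n_iter=200):
    """EM for a Poisson mixture; returns (weights, rates, log_likelihood)."""
    low, high = min(counts), max(counts)
    # Deterministic start (assumption): rates spread evenly over the data range.
    rates = [low + (high - low) * (j + 0.5) / n_components + 1e-3
             for j in range(n_components)]
    weights = [1.0 / n_components] * n_components
    for _ in range(n_iter):
        # E-step: responsibility of each component for each count.
        resp = []
        for k in counts:
            logs = [math.log(w) + poisson_logpmf(k, r) for w, r in zip(weights, rates)]
            peak = max(logs)
            probs = [math.exp(v - peak) for v in logs]
            norm = sum(probs)
            resp.append([p / norm for p in probs])
        # M-step: update mixing weights and rates from the responsibilities.
        for j in range(n_components):
            mass = sum(r[j] for r in resp)
            weights[j] = max(mass / len(counts), 1e-12)
            weighted_sum = sum(r[j] * k for r, k in zip(resp, counts))
            rates[j] = max(weighted_sum / max(mass, 1e-12), 1e-6)
    return weights, rates, mixture_log_likelihood(counts, weights, rates)
```

Fitting with one, two and three components and comparing the returned log likelihoods gives the kind of curve described for panel c; whether an extra component is warranted would still need a formal test.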

Extended Data Fig. 6 Contribution of the omikli and the kataegis mechanisms to the unclustered A3 mutation burden in various tissues.

a, The omikli mechanism generates many unclustered mutations (‘A3-O’) in various cancer types. b, The kataegis mechanism generates comparatively few unclustered mutations (‘A3-K’). Panels show the fit (red line) of the unclustered A3 burden (y axis) to the clustered A3 burden (x axis; see Methods). Error bars are 95% prediction intervals at x = 0 and at x = mean burden of A3 clustered mutations for that cancer type. Horizontal dashed lines are the predicted numbers of unclustered A3 mutations at those two points (for clarity, also shown in blue/green bars next to each plot). Fits use robust regression (rlm function in R). For visual clarity, only the part of the plot up to the mean of the unclustered mutation burden plus a margin is shown; however, the fit uses all data points (that is, tumor samples), including ones not visualized.
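The robust fit and the two predictions described in this legend can be sketched as follows. The legend names R's rlm function; the Python code below is a simplified stand-in that uses iteratively reweighted least squares with Huber weights and a MAD-based scale, which to my understanding matches rlm's default settings but is not the authors' code. The helper names (`weighted_line`, `huber_fit`, `predictions_at_zero_and_mean`) and the toy data are hypothetical, and prediction intervals are not computed.

```python
import statistics


def weighted_line(xs, ys, ws):
    """Weighted least-squares line y = a + b * x; returns (a, b)."""
    sw = sum(ws)
    mx = sum(w * x for w, x in zip(ws, xs)) / sw
    my = sum(w * y for w, y in zip(ws, ys)) / sw
    sxx = sum(w * (x - mx) ** 2 for w, x in zip(ws, xs))
    sxy = sum(w * (x - mx) * (y - my) for w, x, y in zip(ws, xs, ys))
    slope = sxy / sxx
    return my - slope * mx, slope


def huber_fit(xs, ys, k=1.345, n_iter=50):
    """Huber M-estimate of a line via iteratively reweighted least squares."""
    ws = [1.0] * len(xs)
    a, b = weighted_line(xs, ys, ws)
    for _ in range(n_iter):
        res = [y - (a + b * x) for x, y in zip(xs, ys)]
        # Robust residual scale: median absolute deviation, rescaled.
        scale = statistics.median(abs(r) for r in res) / 0.6745
        if scale == 0:
            break  # at least half of the points lie exactly on the line
        # Points with large residuals are down-weighted, not discarded.
        ws = [1.0 if abs(r) <= k * scale else k * scale / abs(r) for r in res]
        a, b = weighted_line(xs, ys, ws)
    return a, b


def predictions_at_zero_and_mean(clustered, unclustered):
    """Predicted unclustered burden at x = 0 and at x = mean clustered burden."""
    a, b = huber_fit(clustered, unclustered)
    mean_x = sum(clustered) / len(clustered)
    return a, a + b * mean_x


if __name__ == "__main__":
    # Toy data (not from the paper): one sample with an extreme unclustered burden.
    clustered = list(range(10))
    unclustered = [100 + 5 * x for x in clustered]
    unclustered[9] = 1000
    print(predictions_at_zero_and_mean(clustered, unclustered))
```

A robust estimator suits this setting because a few hypermutated tumors could otherwise dominate an ordinary least-squares line. One reading of the plots is that the gap between the two predictions approximates the unclustered burden that accompanies the clustered mechanism, although the Methods remain the authority on how the authors derived that quantity.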

Extended Data Fig. 7 Mechanisms underlying A3 clustered mutations generate many impactful changes, affecting disease genes.

a, Coding regions in the human genome are enriched for CpG dinucleotides (NCG), but not for the A3-context TCW trinucleotides, compared to random expectation. b, Enrichment of mutations in exons versus introns (estimate of selection strength, x axis) and the enrichment in intergenic regions versus introns (estimate of redistribution of mutations towards regions containing genic DNA, y axis; flipped). The comparison of mutagenic agents against APOBEC was performed for selected tissues, matching the relevant tissue with the particular mutagen (tumor samples listed in Supplementary Table 7). Error bars are 95% C.I. from negative binomial regression; numbers in parentheses are the tally of mutations. c, The differential functional impact of the tested mutagens across replication time (RT) bins. Left: total length of coding sequences (CDS) in the late and early RT bins, shaded by the RT sextiles that were merged to create the two bins (where 1 is the latest and 6 is the earliest RT). Middle: expected number of cancer gene CDS-affecting mutations in an average tumor sample (same sets of samples, genes and mutations as in Fig. 5a; y axis) for the late versus early RT bin (x axis), for various mutagens (colors); error bars are s.e.m. Right: fold-difference between the functional impact at the late versus early bin, for various mutagen types. d, e, The functional impact density (FID) of various mutational processes in a set of cell-essential genes (panel d) and neurodegenerative disease-associated genes (panel e). Slope shows the fraction of impactful genetic changes, that is, those affecting the CDS of at least one gene in the set. Points show the expected number of impactful changes resulting from a mutational process, on average, in a tumor genome affected by that mutational process. Error bars are s.e.m. ‘APOBEC-O4’ is A3 mutagenesis in omikli-rich tumors. ‘APOBEC-K2’ is A3 mutagenesis in kataegis-rich tumors.

Extended Data Fig. 8 Associations between genic mutations and global burden of clustered mutations.

a, Associations between A3-context TCW>K mutations in coding regions of each cancer gene, and the global burden of A3 kataegis (top left) or omikli (middle left) and their interaction term (bottom left). The right panel is the same as the middle-left panel, but shows only the significant genes, with labels. Volcano plots show logistic regression coefficients (transformed to odds ratio) on the x axis and the log FDR on the y axis. Genes that bore coding mutations in at least three tumor samples were tested. b, The number of TCW sites in a gene coding sequence (CDS; x axis) predicts the association of cancer gene mutations (y axis) with A3 omikli burden (bottom) but not with A3 kataegis burden (top). Error bands are 95% C.I. of the linear fit. c, Same association analysis as panel a but for the control, non-A3 context VCN>K mutations in the gene CDS. d, Early RT cancer genes are more affected by A3 mutagenesis. Cancer genes were stratified into RT quartiles (x axis), and the logistic regression coefficient (log odds ratio, y axis) linking A3 omikli burden with the presence of a mutation in the CDS of any cancer gene in that RT bin was determined. Error bars are 95% C.I. from logistic regression (on n = 593 tumor samples).

Supplementary information

Supplementary Information

Supplementary Note

Reporting Summary

Supplementary Tables

Supplementary Tables 1–10

About this article

Cite this article

Mas-Ponte, D., Supek, F. DNA mismatch repair promotes APOBEC3-mediated diffuse hypermutation in human cancers. Nat Genet 52, 958–968 (2020). https://doi.org/10.1038/s41588-020-0674-6

  • DOI: https://doi.org/10.1038/s41588-020-0674-6
