Main

Chromothripsis is characterized by massive genomic rearrangements that are often generated in a single catastrophic event and localized to isolated chromosomal regions1,2,3,4. In contrast to the traditional view of tumorigenesis as the gradual process of the accumulation of mutations, chromothripsis provides a mechanism for the rapid accrual of hundreds of rearrangements in a few cell divisions. This phenomenon has been studied in primary tumors of diverse histological origins5,6,7,8,9,10, but similar random joining of chromosomal fragments has also been observed in the germline11. There has been considerable progress in elucidating the mechanisms by which chromothripsis may arise, including fragmentation and subsequent reassembly of a single chromatid in aberrant nuclear structures called micronuclei2,12 and the fragmentation of dicentric chromosomes during telomere crisis13,14. Chromothripsis is not specific to cancer as it can cause rare congenital human disease and can be transmitted through the germline11,15; it has also been described in plants, where it has been linked to micronucleation16. However, despite the recent rapid progress on elucidating the mechanisms of chromothripsis, much remains to be discovered regarding its cause, prevalence and consequences.

A hallmark of chromothripsis is multiple oscillations between two or three copy-number (CN) states1,6. Applying this criterion to CN profiles inferred from SNP arrays, chromothripsis was initially estimated to occur in at least 2–3% of human cancers1. Subsequent studies of large array-based datasets gave similar frequencies: 1.5% (124 out of 8,227 tumors across 30 cancer types)17 and 5% (918 out of 18,394 tumors)18, with the highest frequencies detected for soft-tissue tumors (54% for liposarcomas, 24% for fibrosarcomas and 23% for sarcomas)18. These estimates relied on the detection of CN oscillations that are more-densely clustered than expected by chance8.

Whole-genome sequencing (WGS) data provide a greatly enhanced view of structural variations (SVs) in the genome19, allowing us to generate a more nuanced set of criteria for chromothripsis and enhance detection specificity3. Our previous analysis of WGS data from cutaneous melanomas already found chromothripsis-like rearrangements in 38% of these tumors (45 out of 117)10; other studies using WGS data found 60–65% for pancreatic cancer5 and 32% for esophageal adenocarcinomas7. Whether these examples are outliers that reflect the unique biology of these tumors or whether they suggest a more general underestimation of the frequency of chromothripsis remained unclear.

Motivated by the importance of chromothripsis during tumor evolution and the need for more-comprehensive analyses, we determined the frequency and spectrum of chromothripsis events in the WGS data for 2,658 patients with cancer comprising 38 cancer types generated by the ICGC and TCGA projects, and aggregated by the PCAWG Consortium. These sequencing data were re-analyzed with standardized pipelines to align to the human genome (reference build hs37d5) and to identify germline variants and somatic mutations20. In addition to deriving more-accurate estimates of the per-tumor type prevalence of chromothripsis, we determined the size and genomic distribution of such events, examined their role in the amplification of oncogenes or loss of tumor-suppressor genes, described their relationship to genome ploidy and investigated whether their presence is correlated with patient survival. Our chromothripsis calls can be browsed at the accompanying website (http://compbio.med.harvard.edu/chromothripsis/).

Results

Prevalence of chromothripsis across cancer types

We first sought to formulate a set of criteria for identifying chromothripsis events with varying complexities (Fig. 1a). The acknowledged model of chromothripsis posits that some of the DNA fragments generated by the shattering of the DNA are lost; thus, CN oscillations between two or three states1,6 are an obvious first criterion (Fig. 1a). Such deletions also lead to interspersed loss of heterozygosity (LOH) or altered haplotype ratios if there is only a single copy of the parental homolog of the fragmented chromatid. Although chromosome shattering and reassembly has been experimentally demonstrated to generate chromothripsis2, template-switching DNA-replication errors can generate a similar pattern21. Indeed, shattering and replication error models are not mutually exclusive and could co-occur2. Therefore, for the discussion below we will refer generally to ‘chromothripsis’ as encompassing both classes of models.

Fig. 1: Overview of the chromothripsis-calling method and the frequency of events across 37 cancer types.

a, Example of a region displaying the characteristic features of chromothripsis: a cluster of interleaved SVs with equal proportions of SV types (that is, fragment joins), a CN profile that oscillates between two states and interspersed LOH. Details of the criteria are described in the Methods. Both the color scheme and the abbreviations shown in this figure are used throughout the manuscript. b, Classification of chromothripsis events. In a canonical event, more than 60% of the segments oscillate between two CN states; a tumor is classified as canonical if it showed at least one canonical chromothripsis event. c, Percentage of patients with chromothripsis events across the entire cohort. The fractions at the top of the bars are the number of tumors that showed high-confidence chromothripsis out of the total number of tumors of that type. The cancer type abbreviations used across the manuscript are as follows: Biliary-AdenoCA, biliary adenocarcinoma; Bladder-TCC, bladder transitional cell carcinoma; Bone-Benign, bone cartilaginous neoplasm, osteoblastoma and bone osteofibrous dysplasia; Bone-Epith, bone neoplasm, epithelioid; Bone-Osteosarc, sarcoma, bone; Breast-AdenoCA, breast adenocarcinoma; Breast-DCIS, breast ductal carcinoma in situ; Breast-LobularCA, breast lobular carcinoma; Cervix-AdenoCA, cervix adenocarcinoma; Cervix-SCC, cervix squamous cell carcinoma; CNS-GBM, central nervous system glioblastoma; CNS-Oligo, CNS oligodendroglioma; CNS-Medullo, CNS medulloblastoma; CNS-PiloAstro, CNS pilocytic astrocytoma; ColoRect-AdenoCA, colorectal adenocarcinoma; Eso-AdenoCA, esophagus adenocarcinoma; Head-SCC, head-and-neck squamous cell carcinoma; Kidney-ChRCC, kidney chromophobe renal cell carcinoma; Kidney-RCC, kidney renal cell carcinoma; Liver-HCC, liver hepatocellular carcinoma; Lung-AdenoCA, lung adenocarcinoma; Lung-SCC, lung squamous cell carcinoma; Lymph-CLL, lymphoid chronic lymphocytic leukemia; Lymph-BNHL, lymphoid mature B-cell lymphoma; Lymph-NOS, lymphoid not otherwise specified; Myeloid-AML, myeloid acute myeloid leukemia; Myeloid-MDS, myeloid myelodysplastic syndrome; Myeloid-MPN, myeloid myeloproliferative neoplasm; Ovary-AdenoCA, ovary adenocarcinoma; Panc-AdenoCA, pancreatic adenocarcinoma; Panc-Endocrine, pancreatic neuroendocrine tumor; Prost-AdenoCA, prostate adenocarcinoma; Skin-Melanoma, skin melanoma; SoftTissue-Leiomyo, leiomyosarcoma, soft tissue; SoftTissue-Liposarc, liposarcoma, soft tissue; Stomach-AdenoCA, stomach adenocarcinoma; Thy-AdenoCA, thyroid low-grade adenocarcinoma; and Uterus-AdenoCA, uterus adenocarcinoma.

To detect chromothripsis in WGS data, we developed ShatterSeek (Methods and Supplementary Note). A key feature of our method is to identify clusters of breakpoints belonging to SVs that are interleaved—that is, the regions bridged by their breakpoints overlap instead of being nested (Fig. 1)—as is expected from random joining of genomic fragments. This encompasses the many cases that do not display simple oscillations (for example, partially oscillating CN profiles with interspersed amplifications) and oscillations that span multiple CN levels due to aneuploidy5,22. Rearrangements in chromothripsis should also follow a roughly even distribution for the different types of fragment joins (duplication-like, deletion-like, head-to-head and tail-to-tail inversions, which are shown in blue, orange, black and green, respectively, in Fig. 1a and throughout) and have breakpoints that are randomly distributed across the affected region1,2,3. Finally, we use interchromosomal SVs to identify chromothripsis events that involve multiple chromosomes. In the Supplementary Note, we have compiled the criteria that have been used in 27 major chromothripsis-related studies to date.
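The distinction between interleaved and nested SVs can be illustrated with a minimal Python sketch. This is not the ShatterSeek implementation; the function names, the representation of an intrachromosomal SV as a pair of breakpoint coordinates and the example coordinates are all assumptions made for illustration.

```python
from itertools import combinations


def is_interleaved(sv_a, sv_b):
    """Return True if the regions bridged by two SVs overlap without nesting.

    Each SV is a (breakpoint_1, breakpoint_2) pair on the same chromosome
    (an assumed, simplified representation).
    """
    # Order each SV by coordinate, then order the two SVs by start position.
    (a1, a2), (b1, b2) = sorted([tuple(sorted(sv_a)), tuple(sorted(sv_b))])
    # Interleaved: the second SV starts inside the first and ends outside it.
    return a1 < b1 < a2 < b2


def interleaved_fraction(svs):
    """Fraction of SV pairs that are interleaved (0.0 if fewer than two SVs)."""
    pairs = list(combinations(svs, 2))
    if not pairs:
        return 0.0
    return sum(is_interleaved(a, b) for a, b in pairs) / len(pairs)


# Hypothetical coordinates: the first two SVs interleave, the third is nested
# inside both of the others.
example_svs = [(100, 500), (300, 800), (350, 400)]
print(interleaved_fraction(example_svs))
```

A cluster produced by random joining of fragments would be expected to contain many interleaved pairs, whereas a region dominated by nested or disjoint SVs would not.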

After removing low-quality samples using stringent quality control, we applied ShatterSeek to 2,543 tumor–normal pairs of 37 cancer types (Methods and Supplementary Table 1). Of those 2,543 pairs, 2,428 cases had SVs and were analyzed further. To tune the parameters in our method, we used statistical thresholds and visual inspection. For the minimum number of oscillating CN segments, we used two thresholds: high-confidence calls display oscillations between two states in at least seven adjacent segments, whereas low-confidence calls involve between four and six segments (Fig. 1b and Supplementary Note). The analyses described in the subsequent sections were performed using the high-confidence call set unless noted otherwise.
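As a rough illustration of the two oscillation thresholds, the following Python sketch counts the longest run of adjacent segments that alternate between two CN states and assigns a confidence tier. It covers only the oscillation criterion; the full method applies further criteria. The function names and the list-of-integers input are assumptions, not part of ShatterSeek.

```python
def longest_oscillating_run(cn_states):
    """Length of the longest run of adjacent segments alternating between two CN states."""
    if not cn_states:
        return 0
    best = run = 1
    for k in range(1, len(cn_states)):
        if cn_states[k] == cn_states[k - 1]:
            run = 1  # no CN change between neighbors: the oscillation is broken
        elif run >= 2 and cn_states[k] != cn_states[k - 2]:
            run = 2  # a third state appears: restart from the previous segment
        else:
            run += 1
        best = max(best, run)
    return best


def confidence_tier(cn_states, high=7, low=4):
    """Tier by oscillation length only: 'high' (>= 7), 'low' (4-6) or 'none'."""
    run = longest_oscillating_run(cn_states)
    if run >= high:
        return "high"
    if run >= low:
        return "low"
    return "none"


print(confidence_tier([2, 1, 2, 1, 2, 1, 2]))  # seven alternating segments
print(confidence_tier([2, 1, 2, 1, 2, 3]))     # five alternating segments
```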

We first focused on the 1,427 nearly diploid genomes (ploidy ≤ 2.1; Supplementary Table 1), in which detection of chromothripsis is more straightforward. We defined as ‘canonical’ those events in which more than 60% of the CN segments in the affected region oscillated between two states (canonical events in polyploid tumors are described later). The frequency of canonical chromothripsis events is at least 40% for multiple cancer types, such as glioblastomas (50%) and lung adenocarcinomas (40%). These frequencies are much higher than previous estimates17,18.

When we extend our analysis to the entire cohort, we identify high-confidence events in 29% of the samples (734 out of 2,543), affecting 3.2% of all chromosomes (Fig. 1c and Supplementary Dataset 1). When low-confidence calls are included, the percentages increase to 40% and 5.3%, respectively (Supplementary Dataset 2).

The frequency varies markedly across cancer types. At the high end, we find that 100% of liposarcomas and 77% of osteosarcomas exhibit high-confidence chromothripsis (Fig. 1c and Supplementary Fig. 1). Although a higher susceptibility of these cancer types to chromothripsis has been described1,22, our estimated frequencies are substantially higher. Melanomas, glioblastomas and lung adenocarcinomas showed evidence of chromothripsis in more than 50% of cases (Fig. 1c). By contrast, the frequencies were lowest in thyroid adenocarcinomas (3.3%, n = 30), chronic lymphocytic leukemia (1.2%, n = 86) and pilocytic astrocytomas (0%, n = 78); in the other tumor types with low incidence, the sample sizes were too small to give meaningful estimates. Consistent with previous reports23,24, we find that chromothripsis is enriched in chromosomes 3 and 5 in kidney renal cell carcinomas and chromosome 12 in liposarcomas (Supplementary Fig. 1a). Overall, these results indicate a much greater prevalence of chromothripsis in a majority of human cancers than previously estimated10,17,18.

Understanding the difference between our frequency estimates and previous ones

Our estimates are in accordance with recent analyses in specific tumor types5,7; however, they are considerably higher than those described in previous pan-cancer studies that used array-based platforms. With higher resolution from sequencing data, improved SV algorithms and refined criteria, we are able to provide more-accurate estimates.

To better understand the discrepancy between WGS-based studies, we carried out a detailed comparison using previously analyzed datasets. For 109 previously described prostate adenocarcinomas25, the authors used ShatterProof26 and found chromothripsis in 21% (23 out of 109). When we applied the same algorithm (with the same parameters) but using our CN and SV calls, the percentage more than doubled to 45% (49 out of 109). This indicates that the lower sensitivity of previous SV-detection methods is one of the main reasons for the discrepancy. Accurate SV detection remains challenging, especially for low-purity tumors. The SV calls that we used were generated by the PCAWG Structural Variation Working Group of the ICGC; each variant was required to be called by at least two of the four algorithms used in this analysis27.

Using ShatterSeek, we identified 11 additional cases for a total of 55% (60 out of 109). Of the 23 previously reported cases25, we missed four. The missed events are focal events comprising fewer than six SVs, which is the lowest number allowed in our criteria; the detected regions appear to be hypermutated regions characterized by tandem duplications or deletions. For the cases that we detect but that were missed previously25, visual inspection reveals that the differences are mostly due to the lower sensitivity of their SV calls (Supplementary Note). ShatterSeek has increased sensitivity by incorporating more complex patterns of oscillations and interchromosomal SVs while keeping the specificity high by imposing additional criteria on breakpoint homology to remove tandem duplications and those arising from breakage–fusion–bridge (BFB) cycles. Furthermore, we also compared our method against ChromAL5 for 76 pancreatic tumors. Both ChromAL and ShatterSeek detect chromothripsis in the same 41 tumors (54%).

Therefore, our estimates for the frequency of chromothripsis events are supported by the following: some tumor types such as thyroid adenocarcinoma, chronic lymphocytic leukemia and pilocytic astrocytomas have few or no events; diploid tumors, which have simpler configurations that are easier to reconstruct or verify visually, have high frequencies; the high-confidence cases were used for final estimates; more sensitive CN and SV calls result in higher frequencies for the same datasets; our estimates are in agreement with very recent analysis in specific tumor types; and our chromothripsis calls do not overlap with regions affected by chromoplexy (Supplementary Note).

Frequent involvement of interchromosomal SVs

An important feature of our approach is the incorporation of interchromosomal SVs to detect those events that involve multiple chromosomes. Chromothripsis affects only a single chromosome in 40% of the tumors with chromothripsis (Fig. 2a–c and Supplementary Figs. 1–3). A large number of chromosomes is frequently affected in some tumor types; for example, at least five chromosomes are affected in 61% of osteosarcomas (Supplementary Figs. 1–4). In one extreme case, we found a single chromothripsis event that affected six chromosomes (Fig. 2b), with only seven of the 110 SVs on chromosome 5 being intrachromosomal. In another example (Supplementary Fig. 4d), an approximately 5-Mb region on chromosome 12 did not display CN oscillations, but it could be linked by interchromosomal SVs to another region that does show a clear chromothripsis pattern, suggesting that the amplification of CCND2 on chromosome 12 may have originated from chromothripsis. Chromothripsis involving multiple chromosomes is likely to have arisen either from simultaneous fragmentation of multiple chromosomes (for example, in a micronucleus or in a chromosome bridge) or from fragmentation of a chromosome that had previously undergone a non-reciprocal translocation.

Fig. 2: Heterogeneity of chromothripsis events.

a–c, Examples of massive chromothripsis events on the background of quiescent genomes in samples from patients DO17373 (a), DO52622 (b) and DO45249 (c). d, The fraction of SVs involved in chromothripsis in each sample against the maximum number of contiguous oscillating CN segments for the high-confidence (circles) and low-confidence (squares) chromothripsis calls. e, Distribution of patients showing high-confidence chromothripsis, deleterious TP53 mutations and MDM2 amplification (CN ≥ 4). WT, wild-type allele.

Size and complexity of chromothripsis events are highly variable

Chromothripsis events span a wide range of genomic scales, with the number of breakpoints involved varying by two orders of magnitude within some tumor types (Supplementary Fig. 1c). We found that some tumors had relatively focal chromothripsis events (usually a few megabases in size) that took place within an otherwise quiet genome (bottom-right quadrant in Fig. 2d). Although focal, these events can lead to the simultaneous amplification of multiple oncogenes located in different chromosomes (Supplementary Figs. 4c–e, 5a–c). Other focal events co-localize with other complex events in highly rearranged genomes (bottom-left quadrant in Fig. 2d). Overall, our analysis reveals that there is greater heterogeneity in chromothripsis patterns than previously appreciated, both in terms of the number of SVs and chromosomes involved.

Relationship between chromothripsis and aneuploidy

Newly established polyploid cells have high rates of mitotic errors that generate lagging chromosomes28,29, which have been linked to chromothripsis in medulloblastomas and in vitro2,12,14. However, a causal relationship or even the frequency of association between polyploidy and chromothripsis has not been assessed in detail. To examine the sequence of events clearly, we focused on the canonical cases, for which we can infer whether chromothripsis occurred before or after polyploidization30. For example, if the CN oscillates between two and four copies in a tetraploid tumor, we infer that polyploidization occurred after chromothripsis; on the other hand, if the oscillation occurs between three and four copies, we infer that polyploidization occurred first30 (Supplementary Figs. 1, 2, 5d, 6 and Supplementary Note). Of the 194 cases in which we can distinguish the sequence of events, 74% show chromothripsis after polyploidization. This suggests that a large fraction of the canonical chromothripsis events in polyploid tumors are late events.
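The timing rule described above can be written as a small Python sketch. It assumes the simplest scenario only: a tetraploid tumor, a canonical event oscillating between exactly two CN states, and no further CN changes in the region. The function name and return labels are hypothetical.

```python
TETRAPLOID_CN = 4  # baseline copy number assumed for a tetraploid tumor


def order_of_events(low_cn, high_cn):
    """Infer the order of chromothripsis and polyploidization from two CN states.

    Doubling a diploid 1/2 oscillation yields 2/4, so a 2/4 pattern implies
    chromothripsis came first. Losing fragments from one of four copies yields
    3/4, so a 3/4 pattern implies polyploidization came first.
    """
    if high_cn != TETRAPLOID_CN or not 0 <= low_cn < high_cn:
        return "undetermined"
    if low_cn == 2:
        return "chromothripsis before polyploidization"
    if low_cn == 3:
        return "polyploidization before chromothripsis"
    return "undetermined"


print(order_of_events(2, 4))
print(order_of_events(3, 4))
```

Any other combination of states is left as undetermined, which reflects why only a subset of canonical events can be ordered in this way.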

We observed canonical chromothripsis events in 26% of diploid-ranged tumors (431 out of 1,648) and in 40% of polyploid-ranged tumors (298 out of 748). After correcting for tumor type using logistic regression, we estimate that, on average, the odds of chromothripsis occurring in a polyploid tumor (cases with ploidy ≥ 2.5) are 1.5 times larger than those in a diploid tumor (95% confidence interval, 1.20–1.85; P < 10⁻³). This increase may be due to the presence of more genomic material in polyploids, although polyploidy also reduces the sensitivity of CN and SV detection (due to a lower sequence coverage per copy) and makes it easier for the cell to lose the highly rearranged copy when intact copies are present31.
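For orientation, the unadjusted odds ratio can be computed directly from the counts quoted above. The Python sketch below uses a standard Woolf (log-scale) interval and does not correct for tumor type, so it is not the logistic-regression estimate reported in the text; the crude value it prints is somewhat higher than the adjusted odds ratio of 1.5. The function name is an assumption.

```python
import math


def crude_odds_ratio(events_a, total_a, events_b, total_b, z=1.96):
    """Unadjusted odds ratio of group A versus group B with a Woolf 95% interval."""
    a, b = events_a, total_a - events_a  # group A: with / without the event
    c, d = events_b, total_b - events_b  # group B: with / without the event
    odds_ratio = (a * d) / (b * c)
    se_log = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log)
    upper = math.exp(math.log(odds_ratio) + z * se_log)
    return odds_ratio, lower, upper


# Polyploid-ranged: 298 of 748; diploid-ranged: 431 of 1,648 (counts from the text).
or_, lo, hi = crude_odds_ratio(298, 748, 431, 1648)
print(f"crude OR = {or_:.2f} (95% CI {lo:.2f}-{hi:.2f})")
```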

Frequent co-localization of chromothripsis with other complex events

About half of the chromothripsis events co-localize with other genomic alterations (Fig. 1 and Supplementary Figs. 1, 2). There is evidence across multiple tumor types that chromothripsis might occur before or after additional layers of rearrangements6,7,8,13,14,23. For instance, BFB cycles have been mechanistically linked to chromothripsis, and telomere attrition, which results in the formation of BFB cycles, has been identified as a predisposing factor for chromothripsis6,13,32.

Co-localization of APOBEC-mediated clustered hypermutation (kataegis) and rearrangements has been reported for multiple cancer types33,34, and has been linked to single-stranded DNA intermediates during break-induced replication35. To study the relationship between kataegis and chromothripsis, we examined the presence of clusters of APOBEC-induced mutations within the chromothripsis regions (Methods). Excluding melanoma samples (due to the overlap between the APOBEC and ultraviolet-light signatures36), we find that 28% of the 734 tumors with chromothripsis show at least five clustered APOBEC-induced mutations, and 9.3% display kataegis comprising more than 20 mutations. Previous analysis of liposarcomas has suggested that multiple BFB cycles on a derivative chromosome generated by chromothripsis underlie the formation of neochromosomes23. In agreement with this model, we observe variant allele fractions of 0.01–0.1 for APOBEC-induced mutations in chromothripsis regions that have high-level CN amplifications in soft-tissue liposarcomas, suggesting that they occurred at the late stages of tumor development, likely after chromothripsis (Supplementary Fig. 4e). Overall, although kataegis can co-occur with chromothripsis, this co-occurrence is not common. This is consistent with recent data that chromothriptic derivative chromosomes are mostly assembled by end-joining mechanisms that do not involve extensive DNA-end resection37.

TP53 mutation status and chromothripsis

Inactivating TP53 mutations have been associated with chromothripsis in medulloblastomas8 and in pediatric cancers38,39, and TP53-deficient cells have been used as a model to generate chromothripsis in vitro2,14. Nevertheless, the relationship between deleterious TP53 mutations and chromothripsis has not been examined comprehensively. In our data, 38% of the samples with inactivating TP53 mutations show chromothripsis, whereas 24% of those with wild-type TP53 have chromothripsis (Fig. 2e). After correcting for cancer type, this translates to an odds ratio of 1.54 (95% confidence interval, 1.21–1.95; P < 10⁻³) for chromothripsis in those with TP53 mutations compared with TP53 wild-type cancers. However, we note that 60% of the chromothripsis cases show neither TP53 mutations nor amplification of MDM2 (a regulator of TP53 by ubiquitination40), including tumors with massive chromothripsis in diploid genomes (for example, DO25622 in Fig. 2b). This indicates that, although p53 malfunction and polyploidy are predisposing factors for chromothripsis, it still occurs frequently in diploid tumors with proficient p53.

Signatures of repair mechanisms in chromothripsis regions

Although the inference is imprecise, the predominant repair mechanisms responsible for a chromothripsis event can be inferred from the sequence homology at the breakpoints41,42. Previously, non-homologous end joining (NHEJ) has been implicated in the reassembly of DNA fragments generated by chromothripsis2,37, whereas alternative end joining (alt-EJ) has been proposed in constitutional chromothripsis and in glioblastomas15,43. In addition, short templated insertions suggestive of microhomology-mediated break-induced replication (MMBIR) or alt-EJ associated with polymerase theta have been detected in chromothripsis events that originated from DNA fragmentation in micronuclei2,44,45,46.

We analyzed the breakpoints involved in canonical chromothripsis events with interspersed LOH, as most SVs in such cases are related to chromothripsis (Fig. 1b). In 55% of these events, we only detected repair signatures that were concordant with NHEJ or alt-EJ (Supplementary Fig. 7). In 32%, we identified stretches of microhomology at two or more breakpoint junctions (mostly comprising 0–6 bp) and short insertions of 10–500 bp that map to distant locations within the affected region (Supplementary Fig. 7). For example, in the massive chromothripsis in Fig. 2a (1,394 SVs, hundreds of uninterrupted CN oscillations and interspersed LOH), we detect small nonrandom insertions of 10–379 bp at 60 breakpoints. Thus, NHEJ has a principal role in DNA repair, with partial contributions from MMBIR or alt-EJ.

By contrast, approximately 5% of the canonical events detected in diploid genomes show no evidence of LOH in part or all of the affected region; these events instead display oscillations between two and three CN, long stretches of microhomology and frequent evidence of template switching27 (Figs. 3, 4). For instance, in the case shown in Fig. 3b, both the size of the segments at CN 3 (mean of 45 kb) and the orientation of the breakpoints at their edges suggest that these are templated insertions27. In addition, multiple breakpoint junctions show features concordant with MMBIR. In this case, we could manually reconstruct part of the amplicon by following the polymerase trajectory across 42 template-switching events (Fig. 3c–f). This type of event might be more appropriately called chromoanasynthesis21, but systematically distinguishing chromoanasynthesis from chromothripsis is challenging due to their partially overlapping features (template-switching events can generate LOH if the polymerase skips over segments of the template, and LOH might not be present in chromothripsis events that occur in aneuploid genomes; Supplementary Note).

Fig. 3: Examples of canonical chromothripsis events displaying templated insertions and evidence of MMBIR.

a, Evidence of chromothripsis in chromosome 1 in a skin-melanoma tumor with CN oscillations that span 3 CN levels and LOH. b, Example of a chromothripsis event in chromosome 4 involving low-level CN gains and absence of LOH in an ovarian adenocarcinoma. Segments at CN 3 correspond to templated insertions, as evidenced by their size, and breakpoint orientations at their edges. Breakpoints corresponding to interchromosomal SVs are depicted as colored dots in the SV profile, whereas intrachromosomal SVs are represented with black dots and colored arcs following the representation shown in Fig. 1. c, Reconstruction of the amplicon generated by the chromoanasynthesis event detected in chromosome 4 in tumor DO46329 (see b). Inverted segments are depicted in green. Red arrows highlight breakpoints with short microhomology tracts, whereas blue arrows indicate the presence of small insertions at the breakpoints. The CN for all segments is 3 unless otherwise indicated. d, Size distribution for the templated insertions forming the amplicon depicted in c. e, CN step plot for chromosome 4 indicating that most of the SVs mapped to chromosome 4 link genomic regions at CN 3. The x and y axes correspond to the CN level of the segments linked by a given SV. The color of the bars corresponds to the four types of SVs (that is, deletion-like, duplication-like, and head-to-head and tail-to-tail inversions) indicated in Fig. 1a and considered throughout the manuscript. f, Trajectory of the polymerase across chromosome 4 estimated from the template-switching events shown in c.

We also find features associated with replication-associated mechanisms in more-complex rearrangements involving multiple chromosomes. In an illustrative case (Fig. 4a), LOH is observed in some chromosomes (Fig. 4b) but absent in others, where the oscillations occur at higher CN states without LOH (Fig. 4c,d). There is evidence of templated insertions in chromosomes 5 and 13, which are linked to a chromothripsis event showing LOH in chromosome 1. Notably, the minor CN for the templated insertions in chromosome 13 is 1, whereas it is 0 for the rest of the chromosome. This suggests that one parental chromosome served as a template and was later lost.

Fig. 4: Example of a multichromosomal chromothripsis event in a soft-tissue liposarcoma co-localized with other complex events involving templated insertions.

a, Scaled circos plot of the entire genome for this tumor except for chromosome Y. b–d, SV and CN profiles for chromosomes 1 (b), 5 (c) and 13 (d). Tens of CN oscillations and LOH in chromosome 1 are co-localized with additional rearrangements. The size, minor CN (from the allele with the lower number of copies) and orientation of the breakpoint junctions associated with the segments at CN 3 indicate that these are templated insertions. c, Inset: orientation of the breakpoint junctions at the edges of low-level CN gains that originated from template switching (that is, − and + according to the annotation that we use in the manuscript).

Overall, these results indicate the involvement of template-switching events in the generation or repair of complex rearrangements, consistent with the observations of replication-associated processes in the formation of clustered rearrangements in congenital disorders and cancer15,21,27,41,47. Although further experimental evidence will be necessary, we suggest that the involvement of replication-associated mechanisms in the assembly of derivative chromosomes in chromothripsis might be substantial.

Oncogene amplification and loss of tumor-suppressor genes in chromothripsis regions

Evidence of oncogene amplification in extrachromosomal circular DNA elements, termed double minutes, generated as a consequence of chromothripsis has been reported for selected cancer types1,2,8,43. However, the extent to which chromothripsis contributes to double-minute formation has not been examined on a pan-cancer scale. Although reconstruction of a double-minute structure with discordant reads would present clear evidence for its extrachromosomal nature, this proves to be too difficult in general. Therefore, we rely on CN to make our inferences. We find that 15 patients (2% of tumors with chromothripsis) show CN oscillations between one low (CN ≤ 4) and one very high (CN ≥ 10) state, consistent with the presence of a double minute8,43. We detect known cancer drivers in these putative double minutes, including MDM2 (four samples; Supplementary Figs. 4e, 5a and Supplementary Table 2) and CDK4 (four samples). These amplifications lead to increased mRNA levels of, for example, MDM2, NUP107 and CDK4 in a glioblastoma sample (DO14049) compared to other glioblastoma tumors. In chromothripsis regions subject to additional rearrangements, it is difficult to discern, using bulk-sequencing data, whether highly amplified segments are part of double minutes or correspond to intrachromosomal amplification48. Furthermore, once a double minute has formed, the derivative chromosome showing chromothripsis may be lost if it has no other tumor-promoting mutations. Therefore, the contribution of chromothripsis to the formation of extrachromosomal DNA bodies is likely to be higher than estimated here.
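The CN-based heuristic for putative double minutes can be sketched in Python as follows. The function name and the list-of-integers input are assumptions, and the check captures only the two thresholds stated above (CN ≤ 4 and CN ≥ 10), not any further filtering that may have been applied.

```python
def consistent_with_double_minute(cn_states, low_max=4, high_min=10):
    """True if segments take exactly two CN states, one low and one very high."""
    states = set(cn_states)
    if len(states) != 2:
        return False
    low, high = sorted(states)
    return low <= low_max and high >= high_min


print(consistent_with_double_minute([2, 14, 2, 14, 2]))  # low/very high oscillation
print(consistent_with_double_minute([2, 3, 2, 3, 2]))    # low-level oscillation only
```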

Further analysis of focal amplifications, defined as regions with CN ≥ 4 and smaller than 6 Mb (ref. 49), in 1,268 tumors and 162 normal tissue samples with RNA-sequencing data reveals that 6,310 focal amplifications encompassing oncogenes (11.1%; or 20.5% when including low-confidence calls) localize to chromothripsis regions, often leading to increased expression (Supplementary Table 2). These include well-known cancer-associated genes, such as CCND1 (25 tumors), CDK4 (25 tumors), MDM2 (23 tumors), SETDB1 (23 tumors), ERBB3 (11 tumors), ERBB2 (11 tumors), MYC (10 tumors) and MYCN (five tumors). Therefore, chromothripsis—perhaps together with associated replication-based CN gains22,50—may make a substantial contribution to small-scale focal amplifications.
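The definition of a focal amplification and its localization to a chromothripsis region can be expressed as a short Python sketch. The `Interval` class, the half-open coordinate convention, the function names and the example coordinates are hypothetical; only the thresholds (CN ≥ 4, size below 6 Mb) come from the text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Interval:
    chrom: str
    start: int   # bp, inclusive
    end: int     # bp, exclusive
    cn: int = 0  # total copy number (ignored for chromothripsis regions)


def is_focal_amplification(seg, min_cn=4, max_size=6_000_000):
    """CN >= 4 and smaller than 6 Mb."""
    return seg.cn >= min_cn and (seg.end - seg.start) < max_size


def overlaps(a, b):
    """True if two intervals on the same chromosome share at least one base."""
    return a.chrom == b.chrom and a.start < b.end and b.start < a.end


def focal_amps_in_chromothripsis(segments, regions):
    """Focal amplifications that overlap any chromothripsis region."""
    return [
        s for s in segments
        if is_focal_amplification(s) and any(overlaps(s, r) for r in regions)
    ]


# Hypothetical segments and one hypothetical chromothripsis region.
segments = [
    Interval("12", 58_000_000, 58_500_000, cn=12),  # focal, inside the region
    Interval("12", 69_000_000, 69_400_000, cn=3),   # CN too low
    Interval("8", 100_000_000, 110_000_000, cn=6),  # too large, other chromosome
]
regions = [Interval("12", 55_000_000, 72_000_000)]
print(focal_amps_in_chromothripsis(segments, regions))
```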

Expanding previous analyses5,24, we examined the extent to which chromothripsis contributes to the loss of tumor-suppressor genes across tumor types. We find that chromothripsis underlies 2.1% and 1.9% of the losses of tumor-suppressor and DNA-repair genes, respectively. These include MLH1 (9 out of 301 tumors with MLH1 deletions), PTEN (12 out of 358), BRCA1 (8 out of 154), BRCA2 (7 out of 270), APC (9 out of 201), SMAD4 (10 out of 403) and TP53 (8 out of 614) (Supplementary Fig. 8 and Supplementary Table 2). In 28 samples, both alleles were inactivated, one due to chromothripsis and the other due to a point mutation, including in SMAD4, APC, TP53 and CDKN2A. In a biliary adenocarcinoma (Fig. 5), for instance, one MLH1 allele was lost due to chromothripsis and the other allele was likely silenced due to promoter hypermethylation, as evidenced by low expression of MLH1 and the microsatellite-instability phenotype in an otherwise mismatch repair (MMR)-proficient tumor51. Overall, these data illustrate the way in which chromothripsis can confer tumorigenic potential through the loss of key tumor-suppressor and DNA-repair genes. See Supplementary Note for additional analysis of the genes recurrently targeted by chromothripsis breakpoints, their role in the formation of gene fusions, enrichment of chromothripsis breakpoints in epigenomic marks and survival analyses.

Fig. 5: Chromothripsis-mediated depletion of MLH1.

a, Chromothripsis event and expression levels of DNA MMR genes in the sample of patient DO45299 (biliary adenocarcinoma). b, Mean expression of DNA MMR genes in a panel of 16 biliary adenocarcinomas and 16 normal liver samples. Box plots in b show median, first and third quartiles (boxes), and the whiskers encompass observations within a distance of 1.5× the interquartile range from the first and third quartiles. AR, allelic ratio computed for heterozygous SNPs.

Discussion

Our analysis has revealed that chromothripsis plays a major part in shaping the architecture of cancer genomes across diverse cancers. We found that the prevalence and heterogeneity of chromothripsis were much higher than previously appreciated. Our approach enabled us to define more-nuanced criteria to detect chromothripsis events, including those that involve multiple chromosomes and those that were hard to detect previously due to the presence of other co-localized rearrangements.

We note that the estimated frequencies of chromothripsis depend on statistical thresholds. Although we chose conservative thresholds, we cannot exclude the possibility that some chromothripsis-like patterns might have arisen due to other sources of genomic instability. Conversely, it is also possible that we missed true chromothripsis events that have fewer than the required number of rearrangements; it is worth noting that such small-scale events are seen in experimentally generated chromothripsis2. Cases in which chromothripsis is followed by other complex rearrangements that mask the canonical CN pattern are especially difficult to detect, requiring additional criteria and in-depth manual inspection. Despite these limitations, we believe that our statistical approach is more sensitive than the reassembly-based approach in which one attempts to reconstruct the steps that led to the observed SV pattern. Most complex events are too complicated for reconstruction, especially when many breakpoints are undetected and some are incorrectly identified due to inherent limitations of short-read data, imperfect SV algorithms and insufficient sequencing coverage.

Given the pervasiveness of chromothripsis in human cancers and its association with poorer prognosis, another question that arises is whether chromothripsis itself constitutes an actionable molecular event that is amenable to therapy. This is of particular interest given the link between aneuploidy, depleted immune infiltration and reduced response to immunotherapy52. As more WGS data are linked to other data types including clinical information, it will become feasible to understand the influence of chromothripsis on tumorigenesis and its potential as a biomarker for diagnosis or treatment.

Methods

PCAWG whole-genome sequencing dataset

We integrated, using a common processing pipeline, whole-genome sequencing data from the TCGA and ICGC consortia for 2,658 tumor and matched normal pairs across 38 cancer types, of which 2,543 pairs from 37 cancer types that passed our quality-control criteria were selected for further analysis53. The list of samples is provided in Supplementary Table 1. Further information for all tumor samples and patients is provided in a separate study20. Sequencing reads were aligned using BWA-MEM v.0.7.8-r455, whereas BioBamBam v.0.0.138 was used to extract unpaired reads and mark duplicates54,55.

Mutation calling

We used the consensus SNV and indel (insertions and deletions) call sets released by the PCAWG project (Supplementary Table 3). We used HaplotypeCaller v.3.4-46-gbc0262554 to call SNPs in both tumor and matched normal samples following the GATK best-practice guidelines. We retained only SNPs supported by at least ten reads. We processed a total of 210,021 nonsynonymous somatic mutations, of which 43,548 were predicted to be deleterious using the MetaLR score as implemented in Annovar56. To identify APOBEC mutagenesis, we followed a previously described procedure36. In brief, we considered as APOBEC-associated mutations those involving a change of (1) G within the sequence motif wGa to a C or A (where w is A or T) or (2) C in the sequence motif tCw to G or T (where w is A or T).
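The APOBEC motif rule described above can be expressed as a small classifier. This is a minimal sketch under stated assumptions: the function name and its interface (the mutated base plus its two flanking bases, all given on the reference strand) are hypothetical and are not taken from the cited procedure.

```python
W = {"A", "T"}  # IUPAC code "w" as used in the text: A or T


def is_apobec_mutation(upstream: str, ref: str, downstream: str, alt: str) -> bool:
    """Return True if a substitution matches the tCw / wGa rule.

    upstream and downstream are the reference bases immediately 5' and 3'
    of the mutated base, on the same strand as ref and alt.
    """
    up, ref, down, alt = (b.upper() for b in (upstream, ref, downstream, alt))
    # Case (2): C within tCw changed to G or T.
    if ref == "C" and up == "T" and down in W and alt in {"G", "T"}:
        return True
    # Case (1): G within wGa changed to C or A (reverse complement of case 2).
    if ref == "G" and up in W and down == "A" and alt in {"C", "A"}:
        return True
    return False
```

For example, a C-to-T change in the context TCA and a G-to-A change in the context AGA both satisfy the rule, whereas a C-to-A change in TCA does not, because only changes to G or T are considered for the tCw motif.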

Detection of SVs and CN alterations

The SVs were identified by the PCAWG Structural Variation Working Group, which applied four algorithms and selected those SVs found by at least two algorithms20,27. We used the consensus SV, CN, purity and ploidy call-sets generated by the PCAWG project (Supplementary Table 3). The calling pipelines are described in detail in associated papers27,57.
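The "found by at least two algorithms" rule can be illustrated with a toy consensus filter. This is only a sketch: the actual PCAWG merging procedure is described in the associated papers and is more involved. The `SVCall` type, the generic caller names, the greedy clustering and the 200-bp matching window are all assumptions introduced here.

```python
from typing import NamedTuple


class SVCall(NamedTuple):
    """Hypothetical SV call: one caller, two breakpoints."""
    caller: str
    chrom1: str
    pos1: int
    chrom2: str
    pos2: int


TOLERANCE_BP = 200  # assumed window for treating two calls as the same event


def same_event(a: SVCall, b: SVCall, tol: int = TOLERANCE_BP) -> bool:
    """Two calls match if both breakpoints agree within the tolerance."""
    return (a.chrom1 == b.chrom1 and a.chrom2 == b.chrom2
            and abs(a.pos1 - b.pos1) <= tol and abs(a.pos2 - b.pos2) <= tol)


def consensus_svs(calls: list[SVCall], min_callers: int = 2) -> list[SVCall]:
    """Greedily cluster calls; keep clusters seen by >= min_callers callers."""
    clusters: list[list[SVCall]] = []
    for call in calls:
        for cluster in clusters:
            if same_event(cluster[0], call):
                cluster.append(call)
                break
        else:
            clusters.append([call])
    # Report the first call of each supported cluster as its representative.
    return [c[0] for c in clusters
            if len({x.caller for x in c}) >= min_callers]


calls = [
    SVCall("caller_a", "chr1", 1_000_000, "chr1", 2_000_000),
    SVCall("caller_b", "chr1", 1_000_050, "chr1", 1_999_900),  # same event
    SVCall("caller_c", "chr5", 3_000_000, "chr9", 4_000_000),  # single caller
    SVCall("caller_a", "chr7", 5_000_000, "chr7", 5_500_000),
    SVCall("caller_a", "chr7", 5_000_010, "chr7", 5_500_010),  # same caller twice
]
kept = consensus_svs(calls)
```

Support is counted over distinct callers, so two nearby calls from the same algorithm do not qualify on their own.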

RNA-seq data analysis

We processed RNA-seq data for a total of 162 normal and 1,268 tumor samples. Sequencing reads were aligned using TopHat2 and STAR58,59. HTseq-count was subsequently used to calculate read counts for the genes encompassed in the PCAWG reference GTF set, namely Gencode v.19. Counts were normalized to UQ-FPKM (upper-quartile-normalized fragments per kb per million mapped reads) values. The expression values were averaged across the two alignments. The set of oncogenes was downloaded and curated from COSMIC (dominant genes) and IntOGen databases60,61, whereas the set of tumor suppressors was downloaded from TSGene v.2.0, COSMIC (recessive genes) and previous studies62,63. DNA-repair genes were extracted from a previous study64.
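The normalization and averaging steps can be sketched as follows. The formula used here, count × 10^9 / (upper-quartile count × gene length), is one common formulation of upper-quartile FPKM and is an assumption; the exact PCAWG implementation may differ, for example in which genes enter the quartile. Taking the quartile over genes with non-zero counts, the function names and the toy counts are likewise illustrative choices.

```python
from statistics import quantiles


def uq_fpkm(counts: dict[str, int], lengths_bp: dict[str, int]) -> dict[str, float]:
    """Upper-quartile-normalized FPKM for one sample and one alignment."""
    nonzero = sorted(c for c in counts.values() if c > 0)
    if len(nonzero) < 2:
        raise ValueError("need at least two expressed genes")
    # 75th percentile of the non-zero gene counts (assumed convention).
    upper_quartile = quantiles(nonzero, n=4, method="inclusive")[2]
    return {gene: count * 1e9 / (upper_quartile * lengths_bp[gene])
            for gene, count in counts.items()}


def average_alignments(a: dict[str, float], b: dict[str, float]) -> dict[str, float]:
    """Average expression values from two aligners, gene by gene."""
    return {gene: (a[gene] + b[gene]) / 2 for gene in a.keys() & b.keys()}


# Toy counts for five hypothetical genes, each 1 kb long.
lengths = {g: 1000 for g in ("g1", "g2", "g3", "g4", "g5")}
tophat_counts = {"g1": 10, "g2": 20, "g3": 30, "g4": 40, "g5": 50}
star_counts = {"g1": 12, "g2": 20, "g3": 30, "g4": 40, "g5": 48}

expression = average_alignments(uq_fpkm(tophat_counts, lengths),
                                uq_fpkm(star_counts, lengths))
```

Normalizing each alignment separately before averaging mirrors the order of the steps in the text; averaging raw counts first would be a different design choice.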

Characterization of chromothripsis events using ShatterSeek

To identify and visualize chromothripsis-like patterns in the cancer genomes by using CN and SV data, we adapted the previously proposed set of statistical criteria3. The ShatterSeek code, the package documentation and a detailed tutorial are available at https://github.com/parklab/ShatterSeek. Interactive circos plots for all tumors in the PCAWG cohort analyzed in this study are provided at http://compbio.med.harvard.edu/chromothripsis/.

The values for the statistical criteria for all chromosomes across all samples are provided in Supplementary Table 1. Visual depictions of the high-confidence and low-confidence calls are provided in Supplementary Datasets 1 and 2. Visual depictions for the two sets of SV clusters not identified as chromothripsis by our method, namely (1) those involving clusters of duplications or deletions leading to CN oscillations, as well as oscillating CN profiles with few or no SVs mapped, and (2) large clusters of interleaved SVs that did not display chromothripsis, are provided in Supplementary Datasets 3 and 4, respectively. In Supplementary Datasets 1–4 and in the main text (Figs. 1a, 3a,b, 4b–d and 5a), intrachromosomal SVs are depicted as arcs with the breakpoints represented by black points, whereas the breakpoints corresponding to interchromosomal SVs are depicted as colored points. Duplication-like SVs, deletion-like SVs, head-to-head inversions and tail-to-tail inversions are depicted in blue, orange, black and green, respectively. The values of the statistical criteria described above are provided underneath the representation of each event.

Reporting Summary

Further information on research design is available in the Nature Research Reporting Summary linked to this article.