Abstract
The development of deep learning (DL) models to predict the consensus molecular subtypes (CMS) from histopathology images (imCMS) is a promising and cost-effective strategy to support patient stratification. Here, we investigate whether imCMS calls generated from whole slide histopathology images (WSIs) of rectal cancer (RC) pre-treatment biopsies are associated with pathological complete response (pCR) to neoadjuvant long course chemoradiotherapy (LCRT) with single agent fluoropyrimidine. DL models were trained to classify WSIs of colorectal cancers stained with hematoxylin and eosin into one of the four CMS classes using a multi-centric dataset of resection and biopsy specimens (n = 1057 WSIs) with paired transcriptional data. Classifiers were tested on a held out RC biopsy cohort (ARISTOTLE) and correlated with pCR to LCRT in an independent dataset merging two RC cohorts (ARISTOTLE, n = 114 and SALZBURG, n = 55 patients). DL models predicted CMS with high classification performance in multiple comparative analyses. In the independent cohorts (ARISTOTLE, SALZBURG), cases with WSIs classified as imCMS1 had a significantly higher likelihood of achieving pCR (OR = 2.69, 95% CI 1.01–7.17, p = 0.048). Conversely, imCMS4 was associated with lack of pCR (OR = 0.25, 95% CI 0.07–0.88, p = 0.031). Classification maps demonstrated pathologist-interpretable associations with high stromal content in imCMS4 cases, associated with poor outcome. No significant association was found for imCMS2 or imCMS3. imCMS classification of pre-treatment biopsies is a fast and inexpensive solution to identify patient groups that could benefit from neoadjuvant LCRT. The significant associations of imCMS1 and imCMS4 with pCR suggest the existence of predictive morphological features that could enhance standard pathological assessment.
Introduction
Important progress has been made in the treatment of high-risk rectal cancer (RC) patients in the past decades1. Prospective clinical trials of neoadjuvant chemoradiotherapy (CRT) have shown that approximately 15% of patients reach pathological complete response (pCR)2 and that response rates can be further increased to about 30% pCR by total neoadjuvant treatment (TNT)3,4,5,6. Organ sparing approaches may be realistic for patients with pCR, and can have a major impact on long-term quality of life after definitive treatment. However, intensified neoadjuvant treatment is associated with increased frequency and severity of adverse events that can present during or after CRT7. At the same time, responses are heterogeneous, with 30–40% of patients presenting with tumor regression of different grades while 7–30% of patients classify as non-responders8,9. Interpretable predictive biomarkers to guide personalized treatment are therefore of utmost importance to further improve patient selection for intensified treatment regimens and treatment outcomes10. The setting for the identification of predictive markers in the clinical pathway of RC patients remains highly challenging: turnaround time needs to be short and only scarce material from diagnostic biopsies is available for study, limiting the utility of molecular and functional analysis methods.
The Consensus Molecular Subtypes (CMS) define four distinct subtypes of colorectal cancer (CRC) by common patterns of gene expression11. The prognostic value of CMS classes has been reported in several studies12,13, yet their association with treatment outcome is an open research topic11. Recently, Domingo et al.14 have described an association between pCR to neoadjuvant chemoradiotherapy and transcriptional CMS signatures in diagnostic RC biopsies, suggesting that molecular classification may also be used as a predictive biomarker for treatment stratification. Overall, early profiling of CMS classification in the clinical pathway promises to be highly relevant for personalized treatment decisions15. Limitations of transcriptional CMS classification include a high cost and time requirement for RNA sequencing, a high failure rate due to the small amount of tumor material available and difficulty to standardize single samples13,16. In addition, even successful transcriptome generation shows higher frequencies of unclassified cases in biopsies than resections13,17.
Machine learning has been shown to be a promising alternative solution for predicting molecular signatures across multiple cancer types from diagnostic histopathology sections18. In particular, Sirinukunwattana et al.13 showed that deep learning models can predict image-based CMS (imCMS) classes that match transcriptional CMS calls directly from hematoxylin-and-eosin (H&E) stained histopathology whole slide images (WSIs) of CRC tumor specimens, using three independent cohorts. This prior work also highlighted the potential of imCMS to stratify patients for whom transcriptional CMS failed. Yet, evidence of the generalization of imCMS on biopsy cohorts was limited by the need for domain-adversarial training and fine-tuning, requiring labeled data from the target cohort, which is a shortcoming for the implementation of imCMS in the clinic.
Here, we set out to show whether imCMS classifiers can be trained without domain adaptation and perform reproducibly in independent pre-treatment biopsy datasets. By investigation of real-world cohorts sourced from the Medical Research Council (MRC) and Cancer Research UK (CRUK) Stratification in Colorectal cancer (S:CORT), we estimated via simulation experiments how many cancer biopsies are sufficient to achieve near-optimal imCMS classification performance. We then assessed whether imCMS classification is associated with pCR to neoadjuvant long course chemoradiotherapy (LCRT) with single agent fluoropyrimidine treatment in two independent held out diagnostic biopsy cohorts, comprising a total of 169 advanced RC patients with outcome data.
Results
Effect of multi-cohort training on imCMS performance
With the goal of developing a generic imCMS classifier that can perform well on both resection and biopsy specimens (without the need for a domain adaptation procedure as required in imCMSv113), we first assessed the effect of combining resection and biopsy cohorts for training the models, while keeping independent resection and biopsy cohorts held out for evaluation purposes. We designed five experiments with different combinations of cohorts (Fig. 1b). For each experiment, the development datasets were split into five bins (with stratified cohorts and classes at the patient level) to build five folds of training/validation splits by using each bin once as an internal validation set (for model selection purposes), while the remaining four bins are used for training. Classification performances were separately measured for each trained model using the corresponding held out test sets, as well as for the ensemble model that combines the output of the five models (Fig. 1c).
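The five-fold partition described above can be sketched as follows. This is a minimal illustration, not the pipeline used in the study: the function names, the round-robin assignment, and the representation of each patient as a (cohort, CMS class) pair are assumptions made for the example.

```python
import random
from collections import defaultdict


def stratified_patient_folds(patients, n_folds=5, seed=0):
    """Assign patients to n_folds bins, stratified by (cohort, CMS class).

    patients: dict mapping a patient ID to a (cohort, cms_class) tuple.
    All slides of one patient share that patient's bin, so the split is
    made at the patient level (hypothetical helper, not from the paper).
    """
    rng = random.Random(seed)
    strata = defaultdict(list)
    for pid, key in sorted(patients.items()):
        strata[key].append(pid)
    bins = [[] for _ in range(n_folds)]
    offset = 0
    for key in sorted(strata):
        members = strata[key]
        rng.shuffle(members)
        # Round-robin assignment keeps each stratum balanced across bins.
        for i, pid in enumerate(members):
            bins[(offset + i) % n_folds].append(pid)
        offset += len(members)
    return bins


def train_val_splits(bins):
    """Use each bin once for validation and the remaining bins for training."""
    splits = []
    for k, val in enumerate(bins):
        train = [pid for j, b in enumerate(bins) if j != k for pid in b]
        splits.append((train, list(val)))
    return splits
```

Each of the five resulting (training, validation) pairs would yield one trained model, and the five models together would form the ensemble.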
The macro-average areas under the receiver operating characteristic curve (AUROC) of the trained models are reported in Fig. 2; the detailed ROC curves and confusion matrices of the best performing trained models are reported in Supplementary Fig. 2. For each test set, we achieved the best performance when both resection and biopsy images were used jointly for training. The ensemble model resulting from experiment [E3a] (TCGA: macro-average AUROC 0.813; ARISTOTLE: macro-average AUROC 0.798) was used for subsequent analysis of outcome association. To ensure a valid assessment of performance, all the datasets in this study were curated by excluding WSIs with poor overall quality (e.g., tissue folding, out-of-focus images), and image processing was restricted to regions of tumor and microenvironment that were delineated by a board-certified pathologist specialized in gastrointestinal pathology. The exclusion criteria used in this study are listed in Supplementary Fig. 1. Robustness to heterogeneity of image appearance and to staining variability was addressed by the application of standard data augmentation policies during training. Details on the training procedure are provided in Supplementary Notes.
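For readers unfamiliar with the metric, the macro-average AUROC can be illustrated with a short sketch. It assumes a one-vs-rest formulation with an unweighted mean over classes, which is a common convention but is not confirmed by the text; the function names are hypothetical.

```python
def binary_auroc(labels, scores):
    """AUROC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y]
    neg = [s for y, s in zip(labels, scores) if not y]
    if not pos or not neg:
        raise ValueError("both positive and negative samples are required")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def macro_auroc(true_classes, prob_rows, classes):
    """Unweighted mean of the one-vs-rest AUROC of every class.

    prob_rows[i][k] is the predicted probability of classes[k] for sample i.
    """
    aucs = []
    for k, c in enumerate(classes):
        labels = [t == c for t in true_classes]
        scores = [row[k] for row in prob_rows]
        aucs.append(binary_auroc(labels, scores))
    return sum(aucs) / len(aucs)
```

The pairwise formulation is quadratic in the number of samples, which is acceptable for cohorts of a few hundred slides; a rank-based implementation would be preferable for larger sets.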
Association between imCMS and pCR to neoadjuvant LCRT
To investigate the association between imCMS calls and pCR to neoadjuvant LCRT with single agent fluoropyrimidine, we identified two RC cohorts treated with the same treatment regimen (pelvic irradiation 45–50.4Gy in 25 fractions over 5 weeks, combined with single agent Capecitabine on treatment days) (see Supplementary Notes). We applied the imCMS ensemble model resulting from experiment [E3a] to WSIs of all pre-treatment biopsies of the ARISTOTLE and SALZBURG cohorts with available pCR and pre-treatment T/N stage data. imCMS calls at the case level were defined as the majority vote of the calls of the five different trained models that constitute the ensemble model. Cases with undecided majority were classified as “mixed” (n = 2). The analysis of imCMS calls with clinicopathological data and treatment outcomes was conducted with 114 patients of the ARISTOTLE cohort including 24 patients with pCR to neoadjuvant LCRT, and 55 patients of the SALZBURG cohort with 6 patients with pCR to neoadjuvant LCRT. Patients were split into groups based on their predicted imCMS class. Odds ratios for pCR were calculated separately for each group and adjusted by T-stage, N-stage and cohort. Significant associations (p-value < 0.05) were found for both imCMS4 with lack of pCR (OR 0.25, 95% CI 0.07–0.88, p = 0.03) and imCMS1 with pCR (OR 2.69, 95% CI 1.01–7.17, p = 0.048) (Fig. 3), suggesting that imCMS calls generated from pre-operative biopsy material associate with response of the primary tumor to combined chemoradiotherapy.
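The case-level voting rule can be sketched as below. The text does not define "undecided majority" precisely; this example assumes it means a tie for the most frequent call among the five models, and that a plurality (e.g., two votes against three singletons) is sufficient. Both points, and the function name, are assumptions.

```python
from collections import Counter


def ensemble_call(model_calls):
    """Return the most frequent imCMS call, or "mixed" if the top count is tied.

    model_calls: list of class labels, one per trained model in the ensemble.
    """
    counts = Counter(model_calls).most_common()
    if len(counts) > 1 and counts[0][1] == counts[1][1]:
        return "mixed"
    return counts[0][0]
```

Under these assumptions, a 3:2 split returns the class with three votes, whereas a 2:2:1 split is reported as "mixed".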
Simulation: effect of biopsy sampling on imCMS classification
To assess how reliable biopsy-based imCMS predictions are in comparison to resection-based imCMS predictions in a control scenario, we estimated the change in classification performance of imCMS as a function of the number of biopsy fragments in a sample. It is theoretically possible to conduct an experiment in which we would generate a dataset with many tissue fragments biopsied from each patient’s tumor site along with paired resection specimens and associated CMS calls, then comparing imCMS classification performance when combining different numbers of fragments. However, since repeated sampling cannot be ethically justified, we argue that such an experimental protocol can be approximated via the generation of virtual biopsy fragments randomly sampled from existing resection images.
We generated 26 simulations of biopsy datasets, each with a specified number m of biopsy fragments and sourced from either of the two datasets of resection specimens, FOCUS or SPINAL. For every WSI in a given resection dataset, we collected 10,000 subsets of m non-overlapping biopsy-fragment-shaped images randomly cropped from the annotated tumor regions in the resection. Cropping was performed by sourcing real shapes of tumor biopsy fragments (Fig. 4a, b) that had been manually annotated in the GRAMPIAN and ARISTOTLE cohorts (total of 1580 possible shapes). Examples of simulated biopsy samples from a source resection specimen are illustrated in (Fig. 4c, d). Thus, each random subset of m fragments simulates a single random biopsy sampling event that an endoscopist could potentially perform, including possibly both superficial and deep samples. Then, all the random subsets generated across all the WSIs of a resection dataset were concatenated to represent a set of random biopsy sampling events (Fig. 4c, d). By restricting the cropping procedure to annotated regions of tumor and microenvironment we aimed at having simulated biopsies that are a good representation of real-world biopsies. We designed these simulated datasets to approximate the characteristics of actual biopsy datasets under the assumption that, (a) sampling locations are uniformly distributed in a tumor region, (b) the shape of the biopsies is random and independent of the sampling location, (c) sampling locations of consecutive biopsy fragments are random and non-overlapping. Consequently, we argue that the distribution of images of such a simulated biopsy from a single resection image closely approximates the distribution of images of actual random biopsy samples excised from the same tumor specimen.
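A single simulated sampling event under assumptions (a) to (c) can be sketched with rejection sampling on a binary tumor mask. This is a simplified illustration only: the mask is a small grid of booleans, fragment shapes are given as pixel offsets, fragments must lie entirely inside the annotated region, and all names are hypothetical rather than taken from the study's code.

```python
import random


def sample_virtual_biopsy(tumor_mask, shapes, m, rng, max_tries=10000):
    """Draw m non-overlapping fragments located inside the tumor mask.

    tumor_mask: 2D list of booleans (True = annotated tumor region).
    shapes: sequence of fragment shapes, each a sequence of (d_row, d_col)
            offsets relative to the fragment origin.
    Returns a list of m sets of (row, col) pixels.
    """
    rows, cols = len(tumor_mask), len(tumor_mask[0])
    taken = set()
    fragments = []
    tries = 0
    while len(fragments) < m:
        tries += 1
        if tries > max_tries:
            raise RuntimeError("could not place all fragments")
        # (b) shape drawn independently of location; (a) uniform location.
        shape = rng.choice(shapes)
        r0, c0 = rng.randrange(rows), rng.randrange(cols)
        pixels = {(r0 + dr, c0 + dc) for dr, dc in shape}
        inside = all(
            0 <= r < rows and 0 <= c < cols and tumor_mask[r][c]
            for r, c in pixels
        )
        # (c) reject candidates that overlap an already placed fragment.
        if inside and not (pixels & taken):
            fragments.append(pixels)
            taken |= pixels
    return fragments
```

Repeating this draw many times per resection image, for each value of m, would give a collection of simulated sampling events comparable in spirit to the one described above.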
To assess the performance of imCMS in the simulated biopsy datasets, we made sure to evaluate models on data not seen during training by using the training and test partitions described in Fig. 1e. Each trained imCMS model resulting from the training procedures [E4a] and [E4b] was applied to all its respective simulated test sets. We observed a rapid increase in classification performance as the number of fragments in a biopsy sample increased, until convergence to the performance level that the models achieved when using the resection data without sampling (Fig. 4e). For both test conditions, we report an AUROC difference lower than 3% of the AUROC achieved with the original resection data when the number of simulated tumor biopsy fragments in a sample is above five, suggesting that reliable imCMS classification can be achieved at diagnosis in a large fraction of cases19.
Stability of distribution of cell types between resection and biopsy samples
As endoscopists excise biopsies in specific targeted regions of tumors, this procedure is subject to variance between operators as well as patient-specific factors. This can potentially induce a variation of morphological information in images of biopsies in comparison to resection specimens. This may particularly affect heterogeneously distributed factors in the tumor microenvironment such as cancer, immune and stromal cell populations. We therefore investigated whether there are systematic differences in microenvironment composition between biopsies and resection specimens of CRC specimens. Specifically, we confirmed the absence of confounding factors with real-world biopsies and investigated whether systematic differences of cell distribution exist between biopsy samples and resection specimens by deconvolution of stromal, immune and tumor-related signatures from transcriptomic data in 529 biopsies and 565 resections from four S:CORT cohorts (FOCUS, SPINAL, GRAMPIAN and ARISTOTLE). First, we generated absolute abundance estimates for key immune and stromal cell types with two different tools (MCPcounter20 and xCell21) using bulk transcriptomic data. With both methods, levels of key immune lineages (T-cells including CD8+ and differentiated cytotoxic T-cell subpopulations, B-cells, NK-cells, monocytes, myeloid dendritic cells and neutrophils) as well as endothelial and stromal cell populations in biopsies and resections were statistically similar across CMS subtypes (all p-values > 0.25, ANOVA) (Fig. 5). This result suggests that biopsies may be representative of the broad abundance of cell types in the tumor microenvironment compared to resection specimens, according to their transcriptomic CMS classes.
Tissue microenvironment patterns related to CMS classes
CMS classification is driven by microenvironment and tumor-related factors. To investigate consistency of imCMS classification in RC biopsy material with underlying biological patterns, we visualized image tiles with the highest predicted probability score for each imCMS class in biopsy material (Fig. 6, see Supplementary Fig. 3 for examples of slide-level classification maps). This approach provides an overall idea of the morphology associated with each CMS class in the absence of an established definition of transcriptional CMS at the scale of image patches. In agreement with prior studies, we found a consistent pattern of high stromal content and dissociative tumor growth (tumor budding) in tumor tiles classified as imCMS4 from all three cohorts. imCMS1 tiles more frequently contained lymphocytic infiltration and focal mucinous differentiation, although this feature was less frequent than previously observed in CRC resection specimens (compare TCGA, bottom of Fig. 6), which is consistent with a lower representation of mucinous differentiation in RC. imCMS2 and imCMS3 features were consistent with previously described patterns, with imCMS2 showing epithelial-rich glandular and cribriform tumor growth with focal comedo-like necrosis, while imCMS3 was characterized by glandular differentiation with tubular growth, focal mucin and a minor villiform component. Bioinformatic deconvolution of cell composition supported the observed tile-level associations. In biopsy specimens classified as imCMS4, a significantly higher frequency of fibroblast signatures was observed, while imCMS1 samples showed a tendency towards increased immune signatures, supporting biological interpretability of imCMS predictions at the case level.
Discussion
There are currently no clinically established predictive markers for response to neoadjuvant chemoradiotherapy in RC. Decision making for RC patients and strategies for assignment to neoadjuvant treatment protocols are still taken at the cohort level and it is challenging to predict which patients will respond to neoadjuvant chemoradiotherapy. Current pathology assessment of pre-operative biopsies is limited to the confirmation of cancer diagnosis and a limited panel of molecular studies such as testing for mismatch-repair deficiency (MMRd) and the testing for common driver mutations if clinically desired. Recent studies have demonstrated a strong benefit of neoadjuvant treatment of patients with MMRd RC with PD-1 Blockade, but with a frequency of approximately 2-3%, this genotype is infrequent in RC22. Other studies have suggested that the absence of tumor budding23,24, low stromal content24 or the quantification of cytotoxic T-cells25 may aid in identifying patients with favorable prognosis, but these methods are based on subjective visual features and incompletely capture the complex biology related to neoadjuvant chemoradiotherapy response. Better methods to supply a biologically informed and clinically relevant classification of RC biology from pre-operative biopsy material are therefore needed.
Consensus molecular subtyping may aid transcriptome-based staging in both colon and rectal cancer. Based on their distinct biology and clinical behavior, CMS1 and CMS4 subgroups play a key role for prognostication and may support the development and assignment of patients to biologically informed precision treatments11,12. Specifically, the CMS1 subgroup (immune subtype) is highly enriched for MSI cases and the CpG island methylator phenotype-high (CIMP-high) subgroup of colorectal cancers. Gene enrichment analysis has shown high expression of immune response genes, in particular interferon signaling as well as the wound healing signature11. Analysis of the tumor microenvironment has demonstrated dense infiltration by anti-tumoral immune populations, in particular CD8+ cytotoxic T-lymphocytes, which play a key role in mediating the anti-tumoral treatment effect of radiotherapy. These biological features support an improved prognosis and an increased propensity for response to radiotherapy, and may indicate an increased likelihood of response to immune checkpoint inhibitors. In contrast, CMS4 (mesenchymal subtype) comprises stroma-rich tumors with an activation of TGF-β signaling pathways, evidence of epithelial-mesenchymal transition and invasive growth pattern11,12. These cases are characterized by a poor prognosis and increased resistance to radiotherapy treatment as well as conventional chemotherapy protocols26. Studies have demonstrated an improved response and prolonged survival of CMS4 patients with irinotecan as compared with oxaliplatin-based chemotherapy, but overall improvements were limited26. Novel treatments, in particular TGF-β-targeting anticancer agents, will be needed for subtype-specific interventions in CMS4 and to further improve outcomes.
In an earlier study13, we developed an image-based approach to predict the consensus molecular subtypes from WSIs of clinical CRC specimens and provided the proof of principle for generating image-based molecular calls even with minimum input material. In a clinical setting, such image-based morpho-molecular classifiers have the benefit of offering a level of interpretability linked to known molecular profiles, as opposed to black-box classifiers whose interpretation is limited. The predictive ability of image-based molecular calls for response to chemoradiotherapy remained an open question. For the current study, we re-implemented the imCMS analysis pipeline and trained classifiers using a combination of CRC resection and biopsy cohorts, across different stages. We found that combining these datasets was a viable option to enable generalization to external biopsy cohorts without having to rely on domain adversarial training or fine-tuning as previously required13. This incremental improvement showed that such computational pipelines can directly and accurately classify images in new cohorts without the need for additional data or re-training. Although distributional shifts between training cohorts (e.g., biopsy sampling artefacts, relative proportions of different tissue morphology) can lead models to learn cohort-specific features that limit classification performance, we observed a consistent improvement of performance when combining resection and biopsy modalities for training. This result is most striking when comparing the performances of the models tested using TCGA: models trained solely with biopsy data (GRAMPIAN or ARISTOTLE) were not able to generalize unless resection data (FOCUS and SPINAL) was used. Furthermore, we conjecture that combining these datasets enables learning of modality-agnostic features that are invariant to inter-cohort variations.
This hypothesis is supported by our visual assessment of the image patches with highest probability scores from different patients across the TCGA, ARISTOTLE and SALZBURG cohorts, which illustrates the stability of morphology of top-contributing tiles across independent cohorts. Yet, this visual assessment and the suggested associations observed between local morphology and CMS classes should be further studied in bottom-up approaches and further confirmed in conjunction with the establishment of the CMS classes in small fields of view.
The significant associations between imCMS1 and pCR to neoadjuvant LCRT and between imCMS4 and lack of pCR to neoadjuvant LCRT suggest the existence of predictive morphological features that our models were able to capture. This result paves the way for future work on the validation of such a tool as a support for clinical decision making in the neoadjuvant treatment setting. Specifically, imCMS1 calls could identify patients for total neoadjuvant treatment with favorable prognosis. Due to an enrichment for immune-activated cases and MMRd cases, imCMS1 cases could also have a greater likelihood to achieve complete remission with immune-checkpoint blockade27,28, but this association requires further study. In contrast, patients classified as imCMS4 could be selected for clinical trials adding further chemotherapy or biological agents to CRT, as the likelihood of response to standard LCRT protocols with single agent fluoropyrimidine is low. This result is aligned with current understanding of CMS4-associated biology: based on the high stromal content and TGF-β signatures in this subset, resistance to cytotoxic treatment is common29. For imCMS4 patients, tumor-stroma-based therapeutic targeting may therefore offer a potentially efficacious and biologically informed treatment alternative27,28. Ten Hoorn et al. suggest that prospective studies are required to further establish the CMS taxonomy and confirm treatment efficacy by CMS subtype in clinical practice12. However, CRC subtyping via current RNA sequencing technologies is deficient in scenarios with low tumor material; other modalities such as biopsy imaging thus constitute a promising, fast and cost-effective alternative. Using an image-based approach, we demonstrate successful imCMS classification for all samples in the present study, compared to technical failure rates of up to 35% using state-of-the-art panel sequencing approaches13.
Current ESGE guidelines19 recommend sampling a minimum of six biopsies for the diagnosis of colorectal carcinomas, to ensure sampling of tumor fragments, and to reliably represent the overall tumor phenotype. Yet whether this number of biopsies is optimal to determine the classification of CRC according to biological subtypes remains an open question. We thus undertook comprehensive simulation experiments on existing fully digitized clinical cohorts to capture the morphology related to transcriptional CMS calls and to address whether clinically established sampling protocols are sufficient to describe tumor heterogeneity at the gene-expression level. As expected and due to the spatially heterogeneous nature of CRC tumors30,31, our results suggest that sampling less than five tumor biopsies is not sufficient to properly predict the CMS call that would be obtained from an equivalent resection specimen of the same tumor. Our results corroborate the 2021 ESGE recommendation as our experiments show on two independent external test datasets of 147 and 266 patients, that five or more standard tumorous biopsy fragments are sufficient to reliably capture the global tumor phenotype needed to achieve CMS classification performance with fidelity close to full resection specimens. Thus, the current sampling protocol established in clinical practice is likely sufficient to capture tumor biology from several cancerous fragments, and to enable informed patient stratification by computational analysis.
A limitation of the current imCMS pipeline is the reliance on manual annotations of tumor regions as a pre-processing step, making the whole pipeline semi-automated. We suggest that future work should investigate strategies to either fully automate this step or to accurately classify WSIs independently of localized tumor regions. The pipeline of imCMSv1.5 was designed as an incremental extension of the work of Sirinukunwattana et al.13, yet with the goal of increasing classification and generalization performance. Although model performance was reported in held out test sets in full transparency, the variation of performance observed across the different test sets suggests the existence of hidden dataset-related factors impacting model classification. The identification of these factors and the improvement of model robustness remain a priority for the development of future versions of imCMS. Other deep learning architectures and training procedures, such as recently proposed solutions to biopsy image classification problems32,33, and consideration of model uncertainty beyond ensemble majority voting should be explored in future work. A weakness of our simulation experiments is the strong assumption of randomness of the sampling distribution of biopsy fragments, and we concede that the distribution of true biopsy fragments may be different from our simulation, e.g., tumor fragments from the lumen are more likely to be sampled in a real-world scenario. The effect of these degrees of approximation should be investigated in future work and across cancer types. We also recognize that the guideline requiring six biopsies per case is designed to ensure there is a high chance of identifying invasive cancer, rather than slough or non-invasive malignancy. Here, our approach shows that five or more biopsy fragments containing invasive tumor are required to achieve optimal imCMS classification.
Although we did not observe any significant difference in terms of distribution of cell types between the studied resection and biopsy cohorts, such sampling information should be accounted for in future work. Beyond prediction of CMS classes, other molecular signatures of CRC proposed in the literature34,35 are relevant candidates to identify associations with treatment outcome.
To conclude, we found that deep learning models can automatically capture the morphology associated with transcriptional CMS in imaged biopsies from unseen cohorts. We found that patients stratified according to biopsy-based imCMS respond differently to neoadjuvant LCRT. The results of this study therefore support the development of an inexpensive clinical tool to assign patients to subtype-targeted biological interventions in future clinical trials. Beyond CMS in CRC, the on-going development of morpho-molecular classification models across cancer types and molecular signatures offers a new type of cost-effective computational tool to support clinical decision making18,36. This surrogate for transcriptional analysis can also be of use for research studies with restricted funding, or to revisit existing trial cohorts by studying associations between image-based CMS classification and clinical variables of interest without the need for extra tissue material.
Methods
Study design
The study design, cohorts and aims are outlined in Fig. 1 and detailed methods for all experiments and statistical analysis are provided in the Supplementary Notes.
All samples in the S:CORT cohorts were obtained following individual informed consent and ethical approval by the National Research Ethics Service in the United Kingdom (ref 15/EE/0241; IRAS reference 169363) consistent with the principles set out in the Declaration of Helsinki. The SALZBURG cohort was reviewed by the ethical board of the provincial government of Salzburg, Austria (415-E/2343/5-2018), although under Austrian law informed consent is not needed for research use and is therefore not available for all cases.
Rectal cancer LCRT treatment protocol
All patients from the GRAMPIAN, ARISTOTLE and SALZBURG cohorts included in the study received a “standard” treatment protocol for advanced RC by pelvic irradiation (45–50.4 Gy in 25 fractions over 5 weeks) combined with Capecitabine (825 mg ⋅ m−2 BD on treatment days). Detailed information on these cohorts is available in the Supplementary Notes. The primary endpoint for all cases was pCR after completion of LCRT as assessed by histopathological analysis of the surgical RC resection specimen by an expert gastrointestinal pathologist according to established guidelines.
Processing of tissue samples and slide scanning
For the four S:CORT cohorts, serial 5-μm sections were cut from one pathologist-selected representative tumor block for H&E staining followed by up to nine unstained sections for RNA extraction. H&E slides were reviewed by an expert gastrointestinal pathologist and invasive cancer regions were annotated to guide RNA and DNA extraction by macrodissection. Regions of extensive necrosis and non-tumor tissue were excluded according to standard practice for molecular tumor profiling. All H&E slides were scanned on an Aperio scanner at magnification 20 × (0.5 μm ⋅ px−1). All digital slides were re-reviewed and a board-certified pathologist annotated tumor regions while excluding areas containing folds or debris. The data filtering procedure, cohort sizes and additional details for each cohort are summarized in (Supplementary Fig. 1).
Transcriptional CMS classification
For the S:CORT cohorts, RNA expression was obtained by microarray (Xcel, Affymetrix), and raw CEL files underwent the robust multiarray average normalization with the Affymetrix package (v1.56.0)37 in R38. Batch-corrected transcriptional CMS calls were derived for each sample with CMSclassifier using the protocol described in the Supplementary Notes. Any variation in the number of patients and slides in the same cohorts used in the study by Sirinukunwattana et al.13 stems from the updated procedure for transcriptional CMS. For TCGA39, the same transcriptional CMS calls were used as previously reported by Sirinukunwattana et al.13.
Transcriptomic immune profiling
The batch-corrected version of the S:CORT transcriptome was used to derive estimates of immune and stromal cell types with MCPcounter20 and xCell21 by applying the original R packages. Although the scores of both signatures are not comparable between cell types, they were scaled from 0 to 1 to facilitate visualization.
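The 0-to-1 scaling used for visualization can be illustrated with a minimal sketch. It assumes linear min-max scaling applied independently to each cell-type signature across samples; the text does not specify the scaling method, and the function names and toy values below are hypothetical.

```python
def min_max_scale(scores):
    """Rescale a list of scores linearly to the [0, 1] range."""
    lo, hi = min(scores), max(scores)
    if hi == lo:
        # A constant signature carries no contrast; map it to 0.0 by convention.
        return [0.0 for _ in scores]
    return [(s - lo) / (hi - lo) for s in scores]


def scale_signatures(table):
    """Scale each cell-type signature independently across samples.

    `table` maps a cell-type name to its per-sample scores (assumed layout).
    """
    return {cell_type: min_max_scale(values) for cell_type, values in table.items()}


# Toy values for illustration only, not data from the study.
toy_scores = {
    "T cells": [2.0, 4.0, 6.0],
    "Fibroblasts": [10.0, 30.0, 20.0],
}
scaled = scale_signatures(toy_scores)
print(scaled)
```

Because each signature is rescaled on its own, the scaled values only support comparisons between samples within one cell type, which is consistent with the caveat that scores are not comparable between cell types.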
Deep learning imCMS classification
To develop a WSI-based CMS classifier, we took the model proposed by Sirinukunwattana et al.13 as a baseline (imCMSv1) and implemented a new version (imCMSv1.5) that facilitates the training procedure and the reproducibility of our experiments while retaining high performance (test macro-average AUROC of 0.813 on held-out TCGA) and the same capabilities for the interpretation of classification results.
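For readers unfamiliar with the metric, a macro-average AUROC can be computed as the unweighted mean of one-vs-rest AUROCs over the four CMS classes. The following is a minimal sketch under that assumption; the evaluation code of the study is not available, and all names and toy values are hypothetical.

```python
def binary_auroc(labels, scores):
    """AUROC as the probability that a positive case outranks a negative one.

    Ties between a positive and a negative score count as 0.5.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUROC needs at least one positive and one negative case")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


def macro_auroc(true_classes, class_scores, classes):
    """Unweighted mean of the one-vs-rest AUROCs over `classes`.

    `class_scores[i][c]` is the predicted score of class `c` for slide `i`.
    """
    per_class = []
    for c in classes:
        labels = [1 if t == c else 0 for t in true_classes]
        scores = [row[c] for row in class_scores]
        per_class.append(binary_auroc(labels, scores))
    return sum(per_class) / len(per_class)


CLASSES = ["CMS1", "CMS2", "CMS3", "CMS4"]

# Toy slide-level scores for illustration only.
toy_truth = ["CMS1", "CMS2", "CMS3", "CMS4"]
toy_scores = [
    {"CMS1": 0.7, "CMS2": 0.1, "CMS3": 0.1, "CMS4": 0.1},
    {"CMS1": 0.2, "CMS2": 0.5, "CMS3": 0.2, "CMS4": 0.1},
    {"CMS1": 0.1, "CMS2": 0.3, "CMS3": 0.4, "CMS4": 0.2},
    {"CMS1": 0.3, "CMS2": 0.1, "CMS3": 0.1, "CMS4": 0.5},
]
print(macro_auroc(toy_truth, toy_scores, CLASSES))
```

Macro-averaging weights each class equally regardless of its prevalence, so rare subtypes influence the summary value as much as common ones.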
Version 1.5 of imCMS is based on the three-stage process illustrated in Fig. 1d. First, manually annotated tumor regions of an input WSI are tiled with patches of size 318 × 318 px at magnification 5× (~2 μm ⋅ px−1) with 50% overlap, and all tiles with less than 50% overlap with the annotated tumor regions are excluded. Second, all the extracted patches are fed as input to a trained deep learning model that outputs probability scores for each target CMS class. Third, all tile-level probability scores are averaged to produce slide-level probability scores, and the class with the highest score is taken as the imCMS call for the input WSI.
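The first and third stages of this process can be sketched as follows. This is an illustrative sketch, not the released implementation (the source code is not available): it assumes a binary tumor mask given as rows of 0/1 at the tiling resolution, interprets the 50% criterion as the fraction of tile pixels inside the mask, and takes the tile-level probabilities of the second stage as given. All function names are hypothetical.

```python
CLASSES = ("CMS1", "CMS2", "CMS3", "CMS4")


def tile_origins(width, height, tile=318, overlap=0.5):
    """Top-left corners of square tiles on a regular grid with the given overlap."""
    if width < tile or height < tile:
        return []
    stride = max(1, int(tile * (1 - overlap)))  # 159 px for the values in the text
    xs = range(0, width - tile + 1, stride)
    ys = range(0, height - tile + 1, stride)
    return [(x, y) for y in ys for x in xs]


def tumor_fraction(mask, x, y, tile):
    """Fraction of the tile's pixels that lie inside the annotated tumor mask."""
    inside = sum(sum(row[x:x + tile]) for row in mask[y:y + tile])
    return inside / float(tile * tile)


def keep_tiles(mask, tile=318, overlap=0.5, min_fraction=0.5):
    """Stage 1: keep tiles with at least `min_fraction` tumor coverage."""
    height, width = len(mask), len(mask[0])
    return [(x, y) for x, y in tile_origins(width, height, tile, overlap)
            if tumor_fraction(mask, x, y, tile) >= min_fraction]


def slide_call(tile_probs):
    """Stage 3: average tile-level probabilities and pick the top class."""
    n = len(tile_probs)
    means = [sum(p[i] for p in tile_probs) / n for i in range(len(CLASSES))]
    best = max(range(len(CLASSES)), key=lambda i: means[i])
    return CLASSES[best], dict(zip(CLASSES, means))


# Toy 8 x 8 mask whose left half is tumor, tiled with 4 x 4 tiles.
toy_mask = [[1, 1, 1, 1, 0, 0, 0, 0] for _ in range(8)]
kept = keep_tiles(toy_mask, tile=4)

# Toy tile-level probabilities standing in for the output of stage 2.
toy_probs = [
    [0.6, 0.2, 0.1, 0.1],
    [0.2, 0.5, 0.2, 0.1],
    [0.5, 0.1, 0.2, 0.2],
]
call, scores = slide_call(toy_probs)
print(len(kept), call)
```

At ~2 μm ⋅ px−1, a 318 × 318 px tile covers a field of view of roughly 636 × 636 μm, and a 50% overlap corresponds to a stride of 159 px between neighboring tiles.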
We kept the first stage identical to imCMSv1 but opted for a fixed magnification of 5× for tile extraction, based on the results of Sirinukunwattana et al.13, which suggested optimal performance in both resection and biopsy images. With the last two stages, we moved from a count-based assumption to a collective assumption for the determination of imCMS calls40, thus enabling more fine-grained contributions of each tile to the slide-level predictions. Further, for imCMSv1.5, we replaced the pre-trained InceptionV3 backbone architecture used in imCMSv1 with a randomly initialized, customized ResNet architecture to ensure that our approach can be re-implemented without relying on a pre-training procedure. See Supplementary Notes for more details about the implemented model architecture and training procedure.
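The practical difference between the two aggregation assumptions can be illustrated with a toy comparison. The sketch below assumes that the count-based rule amounts to a majority vote over per-tile top classes and that the collective rule is mean pooling of tile probabilities; the exact rule used in imCMSv1 is not restated here, so the first function is an assumption made for illustration, and all names are hypothetical.

```python
from collections import Counter

CLASSES = ("CMS1", "CMS2", "CMS3", "CMS4")


def count_based_call(tile_probs):
    """Majority vote: each tile casts one vote for its top-scoring class."""
    votes = Counter(
        max(range(len(CLASSES)), key=lambda i: p[i]) for p in tile_probs
    )
    return CLASSES[votes.most_common(1)[0][0]]


def collective_call(tile_probs):
    """Mean pooling: every tile contributes its full probability vector."""
    n = len(tile_probs)
    means = [sum(p[i] for p in tile_probs) / n for i in range(len(CLASSES))]
    return CLASSES[max(range(len(CLASSES)), key=lambda i: means[i])]


# Toy tile-level probabilities for illustration only.
toy_tiles = [
    [0.40, 0.35, 0.15, 0.10],  # weak preference for CMS1
    [0.40, 0.35, 0.15, 0.10],  # weak preference for CMS1
    [0.05, 0.05, 0.05, 0.85],  # confident preference for CMS4
]
print(count_based_call(toy_tiles), collective_call(toy_tiles))
```

In this toy case, the vote ignores how confident each tile is, whereas mean pooling lets a single confident tile outweigh two uncertain ones, which is one way to read the "more fine-grained contributions" of tiles described above.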
Reporting summary
Further information on research design is available in the Nature Research Reporting Summary linked to this article.
Data availability
The datasets generated during and/or analyzed during the current study are available from the corresponding authors upon reasonable request in accordance with the S:CORT data access policy. The TCGA datasets and images analyzed in this study are openly and publicly available at https://portal.gdc.cancer.gov/.
Code availability
The source code of the underlying (trained) models is not available due to proprietary reasons.
References
Roeder, F. et al. Recent advances in (chemo-)radiation therapy for rectal cancer: a comprehensive review. Radiat. Oncol. 15, 1–21 (2020).
Maas, M. et al. Long-term outcome in patients with a pathological complete response after chemoradiation for rectal cancer: a pooled analysis of individual patient data. Lancet Oncol. 11, 835–844 (2010).
Conroy, T. et al. Unicancer Gastrointestinal Group and Partenariat de Recherche en Oncologie Digestive (PRODIGE) Group. Neoadjuvant chemotherapy with FOLFIRINOX and preoperative chemoradiotherapy for patients with locally advanced rectal cancer (UNICANCER-PRODIGE 23): a multicentre, randomised, open-label, phase 3 trial. Lancet Oncol. 22, 702–715 (2021).
Bahadoer, R. et al. RAPIDO collaborative investigators. Short-course radiotherapy followed by chemotherapy before total mesorectal excision (TME) versus preoperative chemoradiotherapy, TME, and optional adjuvant chemotherapy in locally advanced rectal cancer (RAPIDO). Lancet Oncol. 22, 29–42 (2021).
Jin, J. et al. Multicenter, randomized, phase III trial of short-term radiotherapy plus chemotherapy versus long-term chemoradiotherapy in locally advanced rectal cancer (STELLAR). J. Clin. Oncol. 40, 1681–1692 (2022).
Petrelli, F. et al. Total neoadjuvant therapy in rectal cancer: a systematic review and meta-analysis of treatment outcomes. Ann. Surg. 271, 440–448 (2020).
Liu, S. et al. Total neoadjuvant therapy (TNT) versus standard neoadjuvant chemoradiotherapy for locally advanced rectal cancer: a systematic review and meta-analysis. Oncologist 26, e1555–e1566 (2021).
Petresc, B. et al. Pre-treatment T2-WI based radiomics features for prediction of locally advanced rectal cancer non-response to neoadjuvant chemoradiotherapy: a preliminary study. Cancers 12, 1894 (2020).
Wang, H. et al. Serum metabolic traits reveal therapeutic toxicities and responses of neoadjuvant chemoradiotherapy in patients with rectal cancer. Nat. Commun. 13, 7802 (2022).
Li, M. et al. Predicting response to neoadjuvant chemoradiotherapy in rectal cancer: from biomarkers to tumor models. Adv. Med. Oncol. 14, 1613 (2022).
Guinney, J. et al. The consensus molecular subtypes of colorectal cancer. Nat. Med. 21, 1350–1356 (2015).
ten Hoorn, S. et al. Clinical value of consensus molecular subtypes in colorectal cancer: a systematic review and meta-analysis. JNCI 114, 503–516 (2022).
Sirinukunwattana, K. et al. Image-based consensus molecular subtype (imCMS) classification of colorectal cancer using deep learning. Gut 70, 544–554 (2021).
Domingo, E., Rathee, S., Blake, A. et al. Learning model of complete response to radiation in rectal cancer reveals immune infiltrate and TGFβ signalling as key predictors. https://papers.ssrn.com/sol3/papers.cfm?abstract_id=4267509 (2022).
Stintzing, S. et al. Consensus molecular subgroups (CMS) of colorectal cancer (CRC) and first-line efficacy of FOLFIRI plus cetuximab or bevacizumab in the FIRE3 (AIO KRK-0306) trial. Ann. Oncol. 30, 1796–1803 (2019).
Dunne, P. et al. Challenging the cancer molecular stratification dogma: intratumoral heterogeneity undermines consensus molecular subtypes and potential diagnostic value in colorectal cancer. Clin. Cancer Res. 22, 4095–4104 (2016).
Alderdice, M. et al. Prospective patient stratification into robust cancer-cell intrinsic subtypes from colorectal cancer biopsies. J. Pathol. 245, 19–28 (2018).
Lafarge, M. & Koelzer, V. Towards computationally efficient prediction of molecular signatures from routine histology images. Lancet Digital Health 3, e752–e753 (2021).
Pouw, R. et al. European Society of Gastrointestinal Endoscopy (ESGE) Guideline. Endoscopy 53, 1261–1273 (2021).
Becht, E. et al. Estimating the population abundance of tissue-infiltrating immune and stromal cell populations using gene expression. Genome Biol. 17, 1–20 (2016).
Aran, D., Hu, Z. & Butte, A. xCell: digitally portraying the tissue cellular heterogeneity landscape. Genome Biol. 18, 1–14 (2017).
Papke, D. et al. Prevalence of mismatch-repair deficiency in rectal adenocarcinomas. N. Engl. J. Med. 387, 1714–1716 (2022).
Zlobec, I. et al. Intratumoural budding (ITB) in preoperative biopsies predicts the presence of lymph node and distant metastases in colon and rectal cancer patients. Brit. J. Cancer 110, 1008–1013 (2014).
Rogers, A. et al. Prognostic significance of tumor budding in rectal cancer biopsies before neoadjuvant therapy. Modern Pathol. 27, 156–162 (2014).
Koelzer, V. et al. CD8/CD45RO T-cell infiltration in endoscopic biopsies of colorectal cancer predicts nodal metastasis and survival. J. Transl. Med. 12, 1–11 (2014).
Okita, A. et al. Consensus molecular subtypes classification of colorectal cancer as a predictive factor for chemotherapeutic efficacy against metastatic colorectal cancer. Oncotarget 9, 18698 (2018).
Fridman, W. et al. Therapeutic targeting of the colorectal tumor stroma. Gastroenterology 158, 303–321 (2020).
Xu, M. et al. Targeting the tumor stroma for cancer therapy. Mol. Cancer 21, 208 (2022).
Linnekamp, J. et al. Consensus molecular subtypes of colorectal cancer are recapitulated in in vitro and in vivo models. Cell Death Differ. 25, 616–633 (2018).
Marisa, L. et al. Intratumor CMS heterogeneity impacts patient prognosis in localized colon cancer. Clin. Cancer Res. 27, 4768–4780 (2021).
Valdeolivas, A. et al. Charting the heterogeneity of colorectal cancer consensus molecular subtypes using spatial transcriptomics. [preprint] bioRxiv https://doi.org/10.1101/2023.01.23.525135 (2023).
Lu, M. et al. Data-efficient and weakly supervised computational pathology on whole-slide images. Nat. Biomed. Eng. 5, 555–570 (2021).
Wood, R., Sirinukunwattana, K., Domingo, E. et al. Enhancing local context of histology features in vision transformers. In Proceedings of the MICCAI Workshop on Medical Image Assisted Biomarkers’ Discovery (MICCAI, 2022).
Isella, C. et al. Selective analysis of cancer-cell intrinsic transcriptional traits defines novel clinically relevant subtypes of colorectal cancer. Nat. Commun. 8, 15107 (2017).
Joanito, I. et al. Single-cell and bulk transcriptome sequencing identifies two epithelial tumor cell states and refines the consensus molecular classification of colorectal cancer. Nat. Genet. 54, 963–975 (2022).
Malla, S. et al. Pathway level subtyping identifies a slow-cycling biological phenotype associated with poor clinical outcomes in colorectal cancer. Nat. Genet. 1–15 (2024).
Gautier, L. et al. affy – analysis of affymetrix genechip data at the probe level. Bioinformatics 20, 307–315 (2004).
R Foundation for Statistical Computing, Vienna, Austria. R: a language and environment for statistical computing. https://www.R-project.org (2021).
Cancer Genome Atlas Network. Comprehensive molecular portraits of human breast tumours. Nature 490, 61–70 (2012).
Foulds, J. & Frank, E. A review of multi-instance learning assumptions. Knowl. Eng. Rev. 25, 1–25 (2010).
Acknowledgements
The authors thank Aurelien de Reynies for advice on CMS calling in FFPE blocks, Claire Butler and Michael Youdell for excellent managing in S:CORT and the MRC Clinical Trials Unit who provided the clinical data from the FOCUS trial with permission from the FOCUS trial steering group. The S:CORT consortium is a Medical Research Council stratified medicine consortium jointly funded by the MRC and CRUK (MR/M016587/1). The ARISTOTLE trial was funded by Cancer Research UK (CRUK/08/032). This work was supported by the National Institute for Health Research (NIHR) Oxford Biomedical Research Centre. This work was supported by the Research Fund of the Paracelsus Medical University Salzburg, Austria (PMU-FFF R-17/03/090-HUW). Computation used the CTP Lab core resources at the University of Zurich and the Oxford Biomedical Research Computing (BMRC) facility, a joint development between the Wellcome Centre for Human Genetics and the Big Data Institute supported by Health Data Research UK and the NIHR Oxford Biomedical Research Centre. RW is supported through the EPSRC Center for Doctoral Training in Health Data Science (EP/S02428X/1), Oxford CRUK Cancer Centre. JR is supported through the NIHR Oxford Biomedical Research Centre, the Oxford CRUK Cancer Center, and holds an adjunct appointment at the Ludwig Institute of Cancer Research at the University of Oxford. TM gratefully acknowledges funding by the Medical Research Council and Cancer Research UK. VHK gratefully acknowledges funding by the Swiss National Science Foundation (P2SKP3_168322/1 and P2SKP3_168322/2), and the Promedica Foundation (F-87701-41-01). The results published or shown here are based in part upon data generated by the TCGA Research Network established by the NCI and NHGRI. Information about TCGA and the investigators and institutions who constitute the TCGA research network can be found at http://cancergenome.nih.gov. The funders played no role in the analyses performed or the results presented. 
The views expressed are those of the author(s) and not necessarily those of the NHS, the NIHR or the Department of Health.
Author information
Contributions
M.W.L., E.D., K.S., T.M., J.R. and V.H.K. jointly conceived the study; M.W.L., E.D., K.S., T.M., J.R. and V.H.K. designed the study; M.W.L., E.D., K.S., T.M., J.R. and V.H.K. drafted the manuscript; M.W.L. developed and analyzed deep learning models; E.D. performed bioinformatic and statistical analysis; M.W.L., E.D. and V.H.K. performed data interpretation and analyzed the experiments; L.S., G.M., S.D.R., A.B., D.S.M., S.G., E.K., D.N., F.H., R.G., P.D., P.Q., L.W., V.H.K. and T.M. obtained and categorized clinicopathological and molecular data; K.S., R.W., J.R., L.S., G.M., S.D.R., A.B., D.S.M., S.G., E.K., D.N., F.H., R.G., P.D., P.Q., L.W., V.H.K. and T.M. provided important intellectual input, provided critical resources or funding, and critically reviewed the study design. All authors have read and approved the final manuscript.
Ethics declarations
Competing interests
V.H.K.: invited speaker for Sharing Progress in Cancer Care (SPCC) and Indica Labs; advisory board of Takeda; sponsored research agreements with Roche and IAG, all unrelated to the current study; participant in a patent application on the assessment of cancer immunotherapy biomarkers by digital pathology, a patent application on multimodal deep learning for the prediction of recurrence risk in cancer patients, and a patent application on predicting the efficacy of cancer treatment using deep learning. J.R. and K.S.: co-founders of the Oxford University spinout company Ground Truth Labs (GTL); GTL uses computational pathology to provide biopharma services. T.M.: consultant for GTL. F.H.: honoraria from Pierre Fabre, Amgen, Servier, Daiichi Sankyo, BMS, Merck, Sanofi; travel support from Servier, BMS, Roche, Merck, Pharmamar, Pfizer, Pierre Fabre, Sanofi, Daiichi Sankyo, Gilead; scientific advisory role for Servier, Daiichi Sankyo, BMS; holds stock options in Guardant Health. D.N.: consultant for Boehringer Ingelheim, Lilly. All other authors have no relevant affiliations or financial involvement with any organization or entity with a financial interest in or financial conflict with the subject matter or materials discussed in the manuscript. This includes employment, consultancies, honoraria, stock ownership or options, expert testimony, grants or patents received or pending, or royalties.
Additional information
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
About this article
Cite this article
Lafarge, M.W., Domingo, E., Sirinukunwattana, K. et al. Image-based consensus molecular subtyping in rectal cancer biopsies and response to neoadjuvant chemoradiotherapy. npj Precis. Onc. 8, 89 (2024). https://doi.org/10.1038/s41698-024-00580-3