Abstract
The diagnostic spectrum for AML patients is increasingly based on genetic abnormalities due to their prognostic and predictive value. However, information on the maturational arrest of the AML blast phenotype has started to regain importance due to its predictive power for drug responses. Here, we deconvolute 1350 bulk RNA-seq samples from five independent AML cohorts on a single-cell healthy BM reference and demonstrate that the morphological differentiation stages (FAB) can be faithfully reconstituted using estimated cell compositions (ECCs). Moreover, we show that the ECCs reliably predict ex-vivo drug resistances, as demonstrated for resistance to the BCL-2 inhibitor Venetoclax, specifically in AML with a CD14+ monocyte phenotype. We validate these predictions using LUMC proteomics data by showing that BCL-2 protein abundance splits into two distinct clusters for NPM1-mutated AML at the extremes of CD14+ monocyte percentages, which could be crucial for Venetoclax dosing in these patients. Our results suggest that Venetoclax resistance predictions can also be extended to AML without recurrent genetic abnormalities and possibly to MDS-related and secondary AML. Lastly, we show that Ven/Aza-treated patients with a CD14+ monocyte-dominated phenotype have significantly lower overall survival. Collectively, we propose a framework for joint modeling of mutation and maturation stage that could be used as a blueprint for testing sensitivity to new agents across the various subtypes of AML.
Introduction
Acute myeloid leukemia (AML) is an aggressive hematological cancer of the myeloid lineage. AML is caused by a combination of relatively few genetic alterations that are predominantly somatically acquired and cooperatively induce a maturation arrest in combination with rapid uncontrolled proliferation of immature myeloid precursor cells. The prognosis of AML is highly dependent on the presence of such recurrent genetic alterations and varies from >90% cure rates to <10%1. Therefore, the current WHO classification primarily defines AML subtypes according to the presence of eleven recurrent genetic aberrations (RGAs) and an added heterogeneous umbrella subtype composed of a highly diverse set of relatively rare RGAs2,3,4,5. Only AML cases that lack any detected RGA are characterized by their maturation stage according to the French-American-British (FAB) cytomorphological/cytochemical classification6.
Recently, the maturation and differentiation stage of AML blasts has gained importance due to a striking association with sensitivity or resistance to new drugs7,8,9. The AML differentiation stage can be assessed by diagnostic flow cytometry more accurately and objectively than by cytomorphological examination alone. Current standardized flow cytometry for AML diagnosis analyzes a few carefully selected differentiation markers, sufficient for accurate immunological AML classification10. However, gene expression profiling by bulk RNA-sequencing (RNA-seq) analyzes many more markers (several thousands) and is an attractive alternative technique, as it allows both calling of genetic aberrations and estimation of cell subsets, i.e. the estimated cell composition (ECC), from the same sample11,12,13. Reported attempts to estimate ECCs by deconvoluting bulk AML samples with single-cell RNA-seq as an in-silico reference mostly focused on detecting survival differences without thoroughly validating the ECCs and/or used leukemic samples as reference14,15,16, which prevents assessing whether ECCs are tissue- or sample-specific.
Here, we perform deconvolution, a technique that estimates the contribution of each cell type to the total gene expression profile of a bulk mixture, on 1350 AML transcriptomic samples using a healthy single-cell reference, while validating our findings with our in-house (LUMC) flow cytometry data. Of note, we demonstrate that the ECCs recapitulate the entire FAB landscape (M0-M7). Then, using these ECCs, we predict ex-vivo drug resistance data from the literature and show that these results agree at the protein level with LUMC proteomics data from AML patients for whom we had also acquired gene expression data. To conclude, we propose a transcription-based, single-cell guided deconvolution framework to assess drug effectiveness at different maturational stages of AML. We also provide our framework as a CRAN R package available at https://github.com/eonurk/seAMLess.
Results
Deconvolution pipeline recapitulates healthy and malignant hematopoiesis
We initially created a healthy bone marrow (BM) reference atlas from single-cell transcriptomics data (Fig. 1a). For this purpose, we integrated data from 69 101 cells covering 439 genes from three publicly available datasets (two full-transcriptome and one targeted) from two studies17,18 (Fig. 1b, Methods). T-cells and B-cells formed separate clusters in the UMAP plot, whereas the myeloid lineage cells clustered together. To differentiate early myelopoiesis, we distinguished 3 349 hematopoietic stem cells (HSC), 1 432 erythro-myeloid progenitors (EMP), 2 939 lymphoid-primed multipotent progenitors (LMPP) and 3 508 granulocyte-monocyte progenitors (GMP) (Fig. 1c). Inter-individual variability, possibly due to age differences or technical differences, mostly affected T- and B-cells and did not influence overall clustering (Fig. S1A). Homogeneous distribution of the studies on the UMAP plot showed successful integration of the different datasets (Fig. S1B).
We next performed in silico experiments to validate our deconvolution set-up. We first used the healthy BM reference to create in silico mixed bulk samples with known cell compositions. Pseudobulk profiles were simulated with one abundant cell type (80% of the cells) with the remaining cells being a mix created by random selection (Methods). Then, MuSiC13 with the default settings was used to deconvolute the simulated pseudobulk profiles into their respective cell types (Fig. S1C). To validate on an independent dataset, we also simulated pseudobulks from 40 000 healthy bone marrow cells across 8 donors of the human cell atlas (HCA)19 (Fig. S1D) and performed deconvolution via MuSiC. All simulated pseudobulk profiles were successfully deconvoluted into their respective cell types (matching to the most abundant cell type) apart from EMP (Fig. 1d). For this profile, the annotations were shared mostly among EMP (14%) as well as HSC (25%) and Early Erythrocytes (23%). This discrepancy might be explained by the transcriptional similarity of these cell types (Fig. 1b).
Next, we analyzed our LUMC diagnostic flow cytometry data (EuroFlow10 panels; see Methods) for 22 AML samples with matched bulk RNA-seq data. An overview of mean fluorescence intensity (MFI) values for these samples’ abnormal cells after staining with antibodies for 33 markers distributed over 7 tubes (1 Orientation + 6 AML assignment tubes) is shown in Fig. S1E (Supplementary Table S1, Fig. S7). In line with our expectations, the most abundant cell types for all samples were in the myeloid lineage for both flow cytometry analyses and estimated cell compositions (ECCs) (Fig. 1e, Supplementary Table S4). To quantify whether monocytic AML can be accurately distinguished from AML with more stem cell-like phenotypes, we plotted CD14+ monocyte percentages as determined by deconvolution against MFI values of antibodies of monocytic markers (CD11b, CD64, IREM2, and CD14) on all BM cells without gating, and observed statistically significant correlations for 3 markers (CD11b, CD64, IREM2), with CD64 being most significant (R2 = 0.43, P < 0.001) (Fig. S1F). Also, percentages of AML cells assigned to the monocytic subset by EuroFlow panels and ECCs showed statistically significant correlations (R2 = 0.64, P < 0.001) (Fig. 1f).
Lastly, we downloaded publicly available TARGET AML and ALL (B-ALL and T-ALL) data and deconvoluted these samples (n = 719) with the healthy BM reference (Fig. 1g, Supplementary Table S1) to show that different leukemic phenotypes could be captured by deconvolution. The ECC of the matching cell type of origin was significantly higher for the different acute leukemias (Fig. S1G), most prominent for B-ALL, confirming the ECCs’ ability to capture the patients’ immune phenotypes at major cell type levels. Together, these benchmarking and validation analyses demonstrated that given a healthy single cell BM transcriptomic atlas, the cell type proportions can be recapitulated faithfully via deconvolution from bulk transcriptomic leukemia cases.
Deconvolution of bulk AML transcriptomics reveals the dominating immune fraction
To investigate heterogeneity in cell composition of AML, we next applied our framework to deconvolute five independent bulk transcriptomic AML studies, i.e. TCGA-LAML20 (n = 151), BEAT-AML21 (n = 460), TARGET-AML22 (n = 187), LEUCEGENE23 (n = 452) and our cohort LUMC11 (n = 100), totaling 1350 samples from 1267 patients (Fig. S2A). The results are shown in Fig. 2a as a heatmap where each sample was decomposed into the 22 cell types from the healthy BM reference. We also added information on patients’ clinical blast counts, FAB6, WHO 20163, and ELN 20175 classes for an overarching picture of the AML landscape, and we calculated the stemness score24, which is a gene expression signature for patient prognosis trained on engraftment capacity of AML in immunodeficient mice (Supplementary Table S2). Each sample was assigned to the most abundant cell type as determined by deconvolution (AML phenotype) and arranged according to their fractions within each phenotype. The data showed a clear dominance of one immune cell type for most of the cases, indicating that maturational arrests and lineage skewing are leukemic properties that can be readily assessed using transcriptome sequencing. The majority of AML cases was estimated to be dominated by myeloid cells (Fig. 2a, S2A, S2B, S3A, S3D). As notable exceptions, a few pediatric AML samples from the TARGET cohort showed ECC profiles dominated by T-cells or B-cells. As previously reported, these cases can nevertheless be considered a special subgroup of pediatric AML, and specifically T-cell dominated patients were recognized for poor survival25. Furthermore, in line with previous reports17, patients with acute promyelocytic leukemia (APL), classified as AML-M3 by FAB, were correctly assigned to have a cell composition dominated by granulocyte-monocyte progenitors (GMPs).
Besides APL, there are also AML cases dominated by GMP and large groups of AML assigned as monocytic AML or AML with an earlier HSC or EMP phenotype, again demonstrating that bulk transcriptomics can be used to capture the stage of arrest of AML during hematopoiesis by deconvolution.
Deconvolution of bulk AML transcriptomics agrees with stemness score
To comprehensively visualize changes in ECCs in relation to metadata, ECCs of AML samples were visualized using a UMAP plot. As expected, samples with similar ECCs clustered, as shown after annotating the samples for their most abundant cell type (Fig. 2b, left panel). When superimposing the deconvoluted percentages of 6 major cell types on the same UMAP plot (Fig. 2b, right panels), gradual shifts in cell composition became apparent; in particular, APL samples populated the extreme extension of the GMP cluster. Moreover, a gradient towards monocytic outgrowth and, conversely, towards high stemness was also observed. Comparisons with the previously published stemness score showed an overall agreement with ECCs, with patients dominated by HSCs and EMPs having a statistically significantly higher stemness score than patients dominated by CD14+ Monocytes and GMPs (Fig. S3B). Patients dominated by Late Erythrocytes also had a high stemness score, albeit with large variation. By dividing AML into cases with high and low stemness scores, we observed similar compositional changes with abundance of HSCs and EMPs in AML with high stemness scores in all cohorts. Late erythrocytes, however, were not consistently enriched in all cohorts. Furthermore, we noticed that in TARGET, T cells are abundant in AML with high stemness scores (Fig. S3C). This raises the question whether the stemness score, which has been trained on adult AML, is useful to predict prognosis of pediatric patients, particularly considering that large subtypes of adult AML such as APL and AML with mutated NPM1 are rare in pediatric AML. The consistent enrichment of HSCs and EMPs in AML with high stemness scores in all cohorts, however, further supports correct determination of the cell composition of AML by deconvoluting bulk transcriptomics.
Estimated cell composition recapitulates FAB classes
Clear associations were also observed between cell type assignments by ECC and FAB classification status (left panel of Fig. 2c). Furthermore, distinct distributions of cell type-defining gene expression profiles became evident for each FAB AML type by plotting continuous values of deconvolutions rather than categorical assignments: M0 (minimally differentiated AML) cases had high levels of the HSC-defining signature, whereas these levels decreased and EMP- and GMP signatures appeared in M1 (AML without maturation) and M2 (AML with maturation). As expected, M3 (APL) cases were almost completely dominated by the GMP signature, while M4 (Acute Myelomonocytic Leukemia) and M5 (Acute monoblastic/monocytic leukemia) samples resembled CD14+ monocyte cells and GMP. Noticeably, a few samples that were dominated by the GMP signature were assigned to AML M4 or M5 by morphological assessment (top left corner). Further inquiry in LEUCEGENE, which used more detailed FAB annotations, revealed that these AML samples were enriched for M5a (Fisher exact p = 3.08 × 10−8) and M4Eo (Fisher exact p = 2.06 × 10−6) (Fig. S3E). AML M5a are dominated by monoblastic cells6 and AML M4Eo are characterized by myelomonocytic marrow infiltration with eosinophils containing abnormal immature granules26. Both AML subtypes are more differentiated than GMP, but less differentiated and clearly distinct from AML M4 and M5b, which contain more mature promonocytic or monocytic cells. The small groups of M6 (Acute Erythroleukemia) and M7 (Acute Megakaryoblastic Leukemia) AML were dominated by the signatures of Late Erythrocytes and Megakaryocyte Progenitors, respectively. These data demonstrated and confirmed that our single cell guided deconvolution strategy successfully captures the maturational arrest of AML cells at different differentiation stages of hematopoiesis.
Estimated cell composition captures genetic subtype-specific resistances to various drugs
To explore whether ECCs convey information on drug resistances, ex vivo drug response data of 122 small molecule inhibitors provided as area under the curves (AUC) for 363 AML samples from BEAT were downloaded. A higher AUC indicates that cancer cells are relatively resistant since higher drug concentrations are needed to induce cell death. To understand whether drug resistance of AML samples can be predicted by ECCs, we trained random forest (RF) models per drug via a leave-one-out cross validation (LOOCV) setting. For each RF model Spearman ρ values were calculated (Supplementary Table S5-S7). The strongest correlation was observed for the BCL2 inhibitor Venetoclax27 (ABT-199) drug (Spearman ρ = 0.509). Drugs like EGF-R inhibitor Erlotinib (Spearman ρ = 0.376) and the mTOR pathway inhibitor Rapamycin (Spearman ρ = 0.368) are amongst the top 10 drugs for which resistance could be best predicted (Fig. S4A).
Next, we asked how changes in each cell type affect drug responses and therefore univariately associated ECC to drug resistance (Fig. 3a, Supplementary Table S8). This analysis revealed that most change in resistance occurs across maturational states. For instance, CD14+ Monocytic AML are more resistant to Venetoclax, whereas more immature cell subsets are more sensitive. This trend was also clear after overlaying the AUC values of Venetoclax onto the UMAP plot (Fig. 3b) or when comparing AUC values across states (Fig. S5A) (one-way ANOVA p = 6.31 × 10−13; Supplementary Table S9).
We also stratified RF-based predictions according to WHO classifications for all drugs (Fig. S4B, S5B, Supplementary Table S10). Notably, drug responses to Erlotinib and Rapamycin showed statistically significant correlations within AML with mutated NPM1 (n = 77, R2 = 0.29, P = 3.7 × 10−7) and inv(3) (n = 6, R2 = 0.91, P = 3.2 × 10−3), respectively. Also, responses to Flavopiridol (Spearman ρ = 0.325), a CDK inhibitor, showed significant correlations with ECC within the group of AML-NOS (n = 77, R2 = 0.31, P = 3.7 × 10−7), with AML with more stem cell-like phenotypes showing higher resistance (Fig. S5C, S5D). These annotations also revealed that PML-RARA-carrying APL samples (n = 8), which are dominated by GMPs, are sensitive to Venetoclax, whereas AML with CBFB-MYH11 (n = 7) had the best fit (R2 = 0.78, P = 0.009) (Fig. 3c), although the small sample size of CBFB-MYH11 cases prohibits drawing robust conclusions. Larger groups of AML, however, such as NPM1 mutated cases (n = 55, R2 = 0.33, P < 0.001) and AML-NOS (n = 42, R2 = 0.17, P = 0.007), together accounting for ~58% of AML, also showed a clear trend for Venetoclax resistance (Fig. 3c). Also, within the group of NPM1 mutated samples, which is the largest class of AML, CD14+ Monocyte dominated cases were most resistant to Venetoclax (Fig. S5E, S5F; one-way ANOVA p = 7.6 × 10−4; Supplementary Table S11). In summary, these findings suggest that information on the ECCs of AML will yield therapeutic implications even within one genetic subtype.
Estimated CD14+ Monocyte percentages predict Venetoclax resistance better than BCL-2 mRNA expression
Since Venetoclax targets the anti-apoptotic BCL-2 protein, we next checked BCL-2 mRNA expression levels in AML and overlaid the CD14+ Monocyte percentages for these cases (Fig. 3d). The data showed that low BCL-2 expression indeed correlated with strong resistance to Venetoclax. However, a few samples were resistant despite relatively high BCL-2 expression (samples at upper right corner in Fig. 3d). As the majority of these cases had a CD14+ Monocyte phenotype, we univariately associated BCL-2 expression and CD14+ Monocyte percentage with the Venetoclax response to investigate which factor explains it best (Fig. 3e). We also associated metadata such as reported blast percentages, ELN status and primary diagnosis to determine their effect sizes and significance (Supplementary Table S12). This analysis demonstrated that both CD14+ Monocyte percentages (adjusted P = 1.7 × 10−18) and BCL-2 mRNA expression (adjusted P = 4.8 × 10−14) were significantly associated with the Venetoclax response. Furthermore, we investigated whether the previously reported MAC Score28 (albeit at the gene expression level) was more strongly associated with Venetoclax resistance than BCL-2 gene expression alone (R = -0.54 and R = -0.53), but the results were similar, and the estimated CD14+ Monocyte percentages had a stronger correlation (R = 0.60) (Fig. S6E). Next, we created a multivariate model to investigate whether the significance of CD14+ Monocyte percentages diminishes when BCL-2 expression is present in the same model (Fig. 3f, Multivariate Venetoclax Tab in Supplementary Table S13). In this model, CD14+ Monocyte percentages remained significantly associated with Venetoclax resistance (P = 1.92 × 10−5), while BCL-2 expression did not reach the significance threshold of 0.01 (P = 0.015). In conclusion, these results indicate that cellular composition is a more robust marker than BCL-2 mRNA expression to predict Venetoclax resistance, specifically for AML from NOS and NPM1 mutated patients.
Estimated CD14+ Monocyte percentages associate with BCL-2 protein abundance
Next, we compared the effects of BCL-2 gene expression and CD14+ Monocyte percentages with BCL-2 protein abundance within each sample. For this purpose, we used proteomics data produced at the LUMC for matched AML cases (n = 39) and, after batch correction (Fig. S6A, S6B, see Methods), correlated the abundance of the apoptosis regulator BCL-2 protein with gene expression (Fig. S6C) and CD14+ Monocyte percentages (Fig. S6D). The data showed that BCL-2 gene expression and CD14+ Monocyte percentages both correlated with BCL-2 protein abundance to a similar extent (R2 = 0.45, P < 0.001). We also overlaid plots for BCL-2 expression and CD14+ Monocyte percentages with information on patients’ genetic abnormalities (Fig. 4a, b). In line with their low AUC for Venetoclax response (Fig. 3b), two AML cases with PML-RARA had high BCL-2 protein expression (n = 2, dark green). We also confirmed the strong variability in AUC for Venetoclax response within the group of AML with mutated NPM1 (n = 18, green) by showing two distinct groups at the extremes of CD14+ Monocyte percentages, which correlated with BCL-2 protein expression (Fig. 4b). Within the group of AML patients with mutated NPM1, CD14+ Monocyte percentage associated more strongly with BCL-2 protein abundance (R2 = 0.56, P < 0.001) than BCL-2 gene expression did (R2 = 0.52) (Fig. 4c, d). In conclusion, the data demonstrate the relevance of our deconvolution approach on bulk RNA-Seq to separate AML, especially those with mutated NPM1, into cases with high and low monocyte percentages to predict the patient’s response to Venetoclax.
Estimated CD14+ Monocyte phenotype captures Venetoclax treated patient prognosis
To test whether the deconvolution approach associates with clinical outcome, we used the BEAT-AML study and selected Venetoclax/Azacitidine (Ven/Aza) treated patients with overall survival and RNA-seq data (n = 20). Deconvolution revealed that 5 of these patients had a CD14+ monocytic phenotype, and these patients had a significantly worse overall survival (p = 0.047) compared to the others (Fig. 4e; Supplementary Table S14). In contrast, the stemness score24 (p = 0.11) and ELN classification (p = 0.14) were not able to fully split these patients, with the notable exception of ELN favorable patients. In summary, this analysis highlights the clinical significance of employing ECC-based annotation for Ven/Aza treated patients.
Discussion
In this work, we used single cell guided deconvolution to decompose bulk transcriptomics data from five independent AML cohorts and showed that the obtained estimated cell compositions (ECCs) faithfully reconstitute the FAB landscape (M0-M7). Moreover, using same-sample flow cytometry, we were able to validate our deconvolution framework. Hence, following our previous work using deep transcriptomics to call various types of leukemia-defining genetic aberrations11,29, our current findings further underpin the power of transcriptomic-based approaches as a comprehensive and versatile platform for AML diagnosis. We next illustrated the potential use of our deconvolution framework for precision medicine applications by correlating the ECCs to the results of an ex vivo drug resistance screening of 122 small-molecule inhibitors in the BEAT-AML study. For the BCL-2 inhibitor Venetoclax, we show that higher levels of the ECC subset ‘CD14+ Monocyte’ correspond to a higher resistance and, intriguingly, that estimated CD14+ Monocyte levels are a better explanatory variable of resistance to Venetoclax than BCL-2 expression alone. Moreover, using same-sample LUMC proteomics data in 39 patients, we show that the estimated CD14+ Monocyte levels accurately mark BCL-2 protein expression, and that for NPM1-mutated patients the presence or absence of a CD14+ monocytic outgrowth corresponds with a distinct BCL-2 protein abundance. Lastly, we show that the CD14+ monocyte phenotype correlates with poor survival outcome. Our findings may have important implications for drug use, especially for genetically uncharacterized patients (AML, NOS), currently accounting for ~40% of all AML, as well as for other well-characterized patients such as NPM1 mutated samples.
Kuusanmaki et al. reported that monocytic differentiation of AML reduced sensitivity to Venetoclax ex vivo8, and in a recent paper they showed that ex-vivo drug responses correlate with AML response in the clinic30. Similarly, Pei et al. reported that different monocytic subclones created resistance to Venetoclax treatment in vivo9, and recently White et al. showed that BCL-2 inhibitor resistance could be predicted via genes associated with monocytes. Using flow cytometry data from untreated MDS patients, Ganan-Gomez et al. recently showed that, among patients who relapsed and progressed to secondary AML (sAML), those with less mature cell types (EMP) before treatment reached complete remission (CR) faster and remained relapse-free longer under Venetoclax treatment than those with more mature cell types (GMP), supporting our hypothesis that ECCs could also predict Venetoclax resistance for MDS and sAML patients31. Collectively, these studies suggest that resistance to Venetoclax differs between maturational stages and is especially high for patients with a CD14+ Monocyte dominated phenotype, as we have shown in this manuscript. We provide an open-source framework, seAMLess, for replicating our results or applying the approach to other clinically relevant datasets.
A limitation of our deconvolution strategy is that it cannot distinguish the cancerous cell types, as it uses healthy bone marrow as a reference. Although a cancerous reference is conceptually appealing, we had three reasons for not using one. First, without mutation calling for all cells, one cannot be sure whether a cell is cancerous or not. Strategies such as predicting cancer cells based on their transcriptional similarity to cells with called mutations, proposed by Van Galen et al.17, add another level of ambiguity to already imperfect deconvolution pipelines. Secondly, and more importantly, heterogeneity of AML causes further sub-clustering within individual AML cases (e.g. UMAP plots of Triana et al.18), which yields patient-specific clusters rather than a well-characterized cell type signature10,18. Lastly, using healthy subsets as reference allows our framework to provide more interpretable and intuitive results for clinicians, as it reports immune phenotypes and percentages, in contrast to score-based prognostic values16,24,29. To summarize, we believe our proposed pipeline could be a blueprint for assessing resistance to new drugs across different cell types of AML and that, along with our framework, it may provide better insights for clinicians and help pave the way to precision medicine in AML.
Methods
Creating the healthy BM single-cell reference
We downloaded three different healthy BM datasets from two different studies, namely Van Galen et al.17 (full-transcriptome, n = 6915), and Triana et al.18 (full-transcriptome, n = 13,165; 462 targeted mRNA, n = 49,057). Then, all cell labels were transferred via the default query annotation pipeline of the Seurat package32 (v4.0.3) to match Triana’s full-transcriptome cell labels, as these were the most recent and detailed. Next, Seurat’s integration pipeline with CCA was run, and the cells that were labeled as doublets/multiplets were removed from the downstream analyses. This yielded a healthy BM atlas of 69,130 cells in total, covering 439 genes. We also used the 40,000 cell subset of HCA provided by the SeuratObject package (v4.0.2) for the validation analyses (Fig. 1c, d).
Different schemes of creating pseudobulks
To create a cancerous-like pseudobulk profile from the healthy BM reference and the HCA subset, we selected a total of 1000 cells for each profile. The majority (80%) came from one over-abundant cell type, and the rest were distributed over the remaining cell types in inverse proportion to their numbers of cells. To achieve this, integrated counts were first exponentiated to bring them to non-log scale, and the slice_sample function from the dplyr package (v1.0.7) was used with the replacement option. Then, these non-log scale cell counts were summed up to create the pseudobulk profiles. For individual-based pseudobulks, AggregateExpression from the Seurat package was used.
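The sampling scheme described above can be sketched in a few lines. This is only an illustration in plain Python, not the authors' R code (which used dplyr and Seurat): the function names `pseudobulk_composition` and `make_pseudobulk` are hypothetical, the rounding rule is an assumption, and counts are assumed to be on non-log scale already.

```python
import random

def pseudobulk_composition(cell_counts, dominant, n_cells=1000, dominant_frac=0.8):
    """Number of cells to draw per cell type for one pseudobulk profile.

    cell_counts: dict mapping cell type -> number of cells in the reference.
    The dominant type gets dominant_frac of the cells; the rest is split
    with weights inversely proportional to each type's reference size.
    """
    n_dominant = round(n_cells * dominant_frac)
    remaining = n_cells - n_dominant
    inverse = {ct: 1.0 / n for ct, n in cell_counts.items() if ct != dominant}
    total = sum(inverse.values())
    alloc = {ct: int(remaining * w / total) for ct, w in inverse.items()}
    # Assumption: cells lost to truncation go to the types with the largest weights.
    leftover = remaining - sum(alloc.values())
    for ct in sorted(inverse, key=inverse.get, reverse=True)[:leftover]:
        alloc[ct] += 1
    alloc[dominant] = n_dominant
    return alloc

def make_pseudobulk(cells_by_type, alloc, seed=0):
    """Sample cells with replacement per type and sum their count vectors."""
    rng = random.Random(seed)
    n_genes = len(next(iter(cells_by_type.values()))[0])
    bulk = [0.0] * n_genes
    for cell_type, n in alloc.items():
        for cell in rng.choices(cells_by_type[cell_type], k=n):
            for g, value in enumerate(cell):
                bulk[g] += value
    return bulk

# Toy usage with made-up reference sizes and two-gene count vectors.
reference_sizes = {"HSC": 3000, "EMP": 1500, "LMPP": 3000, "GMP": 3500}
composition = pseudobulk_composition(reference_sizes, dominant="GMP")
toy_cells = {ct: [[1.0, 0.0], [0.0, 2.0]] for ct in reference_sizes}
profile = make_pseudobulk(toy_cells, composition)
```

Rare cell types receive more of the non-dominant 20% than abundant ones under this reading of "inverse proportion"; other readings of the Methods text are possible.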
Flow cytometry data
AML cases were stained with fluorescent antibodies and analyzed by flow cytometry for diagnosis, prognosis, and disease monitoring of AML in the diagnostic laboratory of the department of Hematology in the Leiden University Medical Center. The flow cytometric test has been developed and performed according to EuroFlow standard operating procedures (www.euroflow.org)10. EuroFlow antibody combinations have been tested against reference databases of normal cells from healthy individuals and allow multidimensional identification and distinction of aberrant cells from normal cell populations. The flow cytometric test includes 8 tubes with different antibody combinations, i.e. one ALOT tube (acute leukemia orientation tube) and 7 AML tubes (AML1-7). The ALOT tube contains antibodies against CD3, CD45, MPO, CD79, CD19 and CD7. The AML tubes contain antibodies against CD16, CD13, CD34, CD117, CD11b, CD10, HLA-DR and CD45 (AML-1), CD35, CD64, CD34, CD117, CD300e/IREM2, CD14, HLA-DR and CD45 (AML-2), CD36, CD105, CD34, CD117, CD33, CD71, HLA-DR and CD45 (AML-3), NuTdT, CD56, CD34, CD117, CD7, CD19, HLA-DR and CD45 (AML-4), CD15, NG2, CD34, CD117, CD22, CD38, HLA-DR and CD45 (AML-5), CD42a and CD61, CD203c, CD34, CD117, CD123, CD4, HLA-DR and CD45 (AML-6). AML tube 7 contains antibodies against CD41, CD25, CD34, CD117, CD42b, CD9, HLA-DR and CD45, but this tube has not been used to stain AML cases analyzed in this study.
Obtaining bulk transcriptomic data and their preprocessing
We downloaded the non-normalized count matrices (htseq-counts) and the meta files of the four discovery cohorts (TCGA-LAML, BEAT-AML, TARGET-AML and TARGET-ALL) from https://portal.gdc.cancer.gov. For LEUCEGENE, count data were downloaded from their dedicated site (https://data.leucegene.iric.ca/) along with the provided meta data. All meta/count data were pre-processed using R (v4.1.0). For the meta data, genomic aberration labels were relabeled to the main AML WHO 2016 classes, non-AML samples were removed from the downstream analyses, and ELN classes were relabeled according to ELN 2017 recommendations. For the count data, ERCC spike-ins and mitochondrial genes were removed. The count matrix was then sorted according to the genes' standard deviation in order to remove duplicated genes that had less variation and thus provided less information, and lastly the gene Ensembl IDs were converted to gene symbols. Before converting Ensembl IDs into gene symbols, the stemness score for each patient was calculated from count-per-million (cpm) normalized libraries, which were obtained using the cpm function from the edgeR package33 (v.3.34.1).
Our 100 AML samples (LUMC) were deposited to EGA with accession number EGAS00001003096 and are accessible upon request. One hundred cryopreserved AML samples were selected from the Hematology Biobank of Leiden University Medical Center (LUMC) with approval by the institutional review board (no. B 18.047), and written informed consent was obtained according to the Declaration of Helsinki. QC benchmark analyses for these samples were done in our previous paper11. Therefore, we ran the default HT-SEQ pipeline (v0.11.2) with the paired-end option, aligning fastq files to hg38, to obtain the count matrix. All above-mentioned preprocessing steps (filtering, gene name conversion) were also conducted for these samples before deconvolution.
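Two of the preprocessing steps described in this section, counts-per-million normalization and keeping the most variable row among duplicated gene symbols, can be illustrated as follows. This is a minimal plain-Python sketch under stated assumptions, not the authors' R code: `cpm` here is a simple library-size scaling (edgeR's cpm function has further options that are not reproduced), and `collapse_duplicates` is a hypothetical helper name.

```python
import statistics

def cpm(counts):
    """Counts per million for a dict gene -> list of raw counts per sample.

    Assumes every sample has a non-zero library size.
    """
    n_samples = len(next(iter(counts.values())))
    lib_sizes = [sum(row[j] for row in counts.values()) for j in range(n_samples)]
    return {
        gene: [1e6 * c / lib for c, lib in zip(row, lib_sizes)]
        for gene, row in counts.items()
    }

def collapse_duplicates(rows):
    """Keep, for each gene symbol, the row with the largest standard deviation.

    rows: list of (symbol, counts) pairs; several rows may share one symbol.
    """
    best = {}
    for symbol, values in rows:
        sd = statistics.pstdev(values)
        if symbol not in best or sd > best[symbol][0]:
            best[symbol] = (sd, values)
    return {symbol: values for symbol, (sd, values) in best.items()}

# Toy usage: two samples, and a symbol that maps to two rows.
normalized = cpm({"GENE_A": [1, 3], "GENE_B": [3, 1]})
unique = collapse_duplicates([
    ("GENE_X", [1, 1, 1]),
    ("GENE_X", [0, 5, 10]),
    ("GENE_Y", [2, 3, 4]),
])
```

Sorting the matrix by standard deviation and dropping later duplicates, as the text describes, gives the same result as this per-symbol maximum.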
Deconvolution pipeline
To deconvolute the simulated pseudobulks and bulk RNA-seq AML patients, we used MuSiC13 package (v0.1.1) as it benchmarked highly and consistently across different cell types at various settings12 and had an easy-to-use open-source (GPL-3) implementation (https://github.com/xuranw/MuSiC). We used non-log scaled count values as inputs and set the normalization option to false. Patients were assigned to the groups (e.g., GMP, CD14+ Monocyte etc.) according to their most abundant deconvoluted ECCs. In the heatmap (Fig. 2a), patients were re-ordered according to their ECCs within each assignment.
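The assignment and ordering rule above is simple enough to state in code. The following plain-Python sketch is only an illustration: the function names are hypothetical, the input is assumed to be the per-sample fractions returned by a deconvolution tool, and sorting by decreasing dominant fraction within each group is an assumption about how the heatmap was ordered.

```python
def assign_phenotypes(eccs):
    """Map each sample to its most abundant estimated cell type.

    eccs: dict sample -> dict cell type -> estimated fraction.
    """
    return {sample: max(fractions, key=fractions.get)
            for sample, fractions in eccs.items()}

def heatmap_order(eccs):
    """Order samples by phenotype, then by decreasing dominant fraction."""
    phenotype = assign_phenotypes(eccs)
    return sorted(eccs, key=lambda s: (phenotype[s], -eccs[s][phenotype[s]]))

# Toy usage with made-up fractions for three samples.
toy_eccs = {
    "s1": {"GMP": 0.7, "HSC": 0.2, "CD14+ Monocyte": 0.1},
    "s2": {"GMP": 0.1, "HSC": 0.3, "CD14+ Monocyte": 0.6},
    "s3": {"GMP": 0.9, "HSC": 0.05, "CD14+ Monocyte": 0.05},
}
labels = assign_phenotypes(toy_eccs)
order = heatmap_order(toy_eccs)
```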
UMAP of estimated cell compositions
First, to obtain reproducible UMAP plots, we set the seed to 2, as the UMAP procedure involves random initialization. Then, we ran the umap function from the umap package (v0.2.7) with default parameters and used the first two reduced dimensions to create the plots. All related figures were plotted using the ggplot2 package (v3.3.5).
Drug resistance predictions via random forest
BEAT-AML provides drug resistance data for 122 small-molecule inhibitors, which we downloaded from the corresponding manuscript (Supplementary Table S5). Each drug response was min-max normalized and then matched with the available RNA-seq samples. Next, drug resistances were predicted from the deconvoluted ECCs using the random forest algorithm from the randomForest package (v4.6-14) with default parameters. Each sample within each drug was predicted under leave-one-out cross-validation. Then, for each drug, Spearman ρ values were calculated between the predicted and actual drug resistance values. To stratify the drugs by the primary diagnosis of the WHO classification, the samples within each diagnosis were selected, the predicted and normalized drug resistances of each drug were associated, and the correlations and p-values were calculated using the summary function in base R.
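The evaluation loop (min-max normalization, leave-one-out prediction, Spearman ρ) can be sketched in plain Python. To keep the example self-contained, a nearest-neighbour regressor stands in for the random forest; it is not the model used in the study, and all function names are hypothetical.

```python
from math import sqrt


def min_max(values):
    """Rescale values to [0, 1]; assumes they are not all identical."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]


def ranks(values):
    """1-based ranks; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    out = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            out[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return out


def spearman(x, y):
    """Spearman rho = Pearson correlation of the ranks."""
    rx, ry = ranks(x), ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    return cov / sqrt(vx * vy)


def nearest_neighbour(train_x, train_y, query):
    """Stand-in regressor (NOT a random forest): copy the closest sample's response."""
    def dist(a, b):
        return sum((p - q) ** 2 for p, q in zip(a, b))
    best = min(range(len(train_x)), key=lambda i: dist(train_x[i], query))
    return train_y[best]


def leave_one_out(x, y, predict=nearest_neighbour):
    """Predict each sample from a model that never saw that sample."""
    return [
        predict(x[:i] + x[i + 1:], y[:i] + y[i + 1:], x[i])
        for i in range(len(x))
    ]
```

With ECC vectors in `x` and normalized resistances in `y`, `spearman(leave_one_out(x, y), y)` gives the per-drug agreement between predicted and observed values.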
Venetoclax association analysis
First, each attribute was associated with the standardized (min-max normalized) Venetoclax resistance from the BEAT-AML study in a univariate setting using the lm function in the R environment (v4.1.0) (Supplementary Table S12). P-values and coefficients were obtained with the summary function, and the p-values were corrected for multiple hypothesis testing using the Benjamini-Hochberg procedure. For the multivariate models, FAB classification and ECC levels (except CD14+ Monocytes) were excluded, because only 76 out of 460 BEAT-AML samples had FAB classifications and because the other ECCs are not independent of the CD14+ Monocyte percentages (the question being whether CD14+ Monocyte levels are independently predictive of Venetoclax resistance given BCL-2 expression in the same model). Again, p-values and coefficients were calculated with the summary function and plotted in a volcano plot (Fig. 3f, Supplementary Table S13).
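For readers unfamiliar with the Benjamini-Hochberg procedure, the adjustment can be written in a few lines. The Python sketch below uses a hypothetical function name and toy p-values; it follows the standard step-up definition, in which each sorted p-value is multiplied by m/rank and monotonicity is enforced from the largest p-value downwards.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])  # indices, smallest p first
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value to the smallest so that the
    # adjusted values never increase as the raw p-value decreases.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Toy p-values, for illustration only.
print(benjamini_hochberg([0.01, 0.04, 0.03, 0.5]))
```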
Survival analysis
The ggsurvival and survminer R packages were used to produce Kaplan–Meier curves. P-values were calculated with the log-rank test. For the ECC graph, patients were annotated with their most abundant cell type. The stemness score was split into low and high categories at the median value.
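The median split and the Kaplan–Meier estimate can be illustrated with a small Python sketch. It is not the R code used for the figures and omits the log-rank test; function names are hypothetical, and assigning scores equal to the median to the low group is an assumption.

```python
from statistics import median


def median_split(scores):
    """Label each patient 'high' or 'low' relative to the median score.

    scores: dict mapping patient -> score. Ties with the median go to 'low' (assumed).
    """
    cut = median(scores.values())
    return {p: ("high" if s > cut else "low") for p, s in scores.items()}


def kaplan_meier(times, events):
    """Kaplan-Meier survival estimate.

    times:  follow-up time per patient.
    events: 1 if the event (death) was observed, 0 if censored.
    Returns a list of (time, survival probability) at each distinct event time.
    """
    curve = []
    survival = 1.0
    event_times = sorted({t for t, e in zip(times, events) if e})
    for t in event_times:
        at_risk = sum(1 for u in times if u >= t)
        deaths = sum(1 for u, e in zip(times, events) if u == t and e)
        survival *= 1 - deaths / at_risk
        curve.append((t, survival))
    return curve
```

Running `kaplan_meier` separately on the "low" and "high" groups gives the two step curves that a Kaplan–Meier plot overlays.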
Proteomics sample preparation
Cell lysis, digestion and TMT labeling were performed as described in Paulo et al.34. Cell lysis was performed using 5% SDS lysis buffer (100 mM Tris-HCl pH 7.6) and 5 U benzonase nuclease (Thermo Scientific) with incubation at 95 °C for 4 min. Protein concentration was determined using the Pierce BCA Gold protein assay (Thermo Fisher Scientific). For each sample, 100 µg of protein was then reduced with 5 mM TCEP. Reduced disulfide bonds were alkylated using 15 mM iodoacetamide. Excess iodoacetamide was quenched using 10 mM DTT. Protein lysates were precipitated using chloroform/methanol; the resulting pellets were re-solubilized in 40 mM HEPES pH 8.4 and digested using TPCK-treated trypsin (1:12.5 enzyme/protein ratio) overnight at 37 °C. Peptide concentration was then determined using the Pierce BCA Gold protein assay.
The different samples, and reference samples, were arranged into five TMTpro 16plex sets. The peptides were labeled with TMTpro Label Reagents (Thermo Fisher Scientific) in a 1:4 ratio by mass (peptides/TMT reagents), total volume was 35 µL, for 1 h at RT. Excess TMT reagent was quenched with 5 µL 6% hydroxylamine for 15 min at RT. Samples corresponding to a TMT set were pooled and lyophilized.
Each TMT sample (80 µg) was fractionated by high-pH reverse-phase chromatography on a Zorbax RRHD Eclipse Plus C18 2.1 × 150 mm 1.8-micron column, at 800 µL/min using an Agilent 1200 binary HPLC system equipped with a UV detector. The mobile phases were 10 mM Ambic pH 8.4 (A) and 10 mM Ambic/Acetonitrile 20/80 pH 8.4 (B). The gradient was from 2% to 90% B in 35 min. Twenty fractions were collected in a circular fashion, i.e., eluate was collected in each vial for 20 s before moving to the next collection vial. After collection in the last vial, collection continued in the first vial. Fractions were subsequently freeze-dried.
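The circular collection scheme means that each fraction pools eluate from several, evenly spaced points of the gradient. The short Python calculation below makes this concrete. It assumes that collection starts at time zero and spans exactly the 35 min gradient, which is not stated explicitly in the text; the function names are hypothetical.

```python
def vial_at(seconds, n_vials=20, dwell_s=20):
    """1-based vial number receiving eluate at a given time after start."""
    return int(seconds // dwell_s) % n_vials + 1


def visits_per_vial(total_s, n_vials=20, dwell_s=20):
    """How many collection windows each vial receives over the whole run."""
    windows = -(-total_s // dwell_s)  # ceiling division
    full_rounds, remainder = divmod(windows, n_vials)
    return [full_rounds + (1 if v < remainder else 0) for v in range(n_vials)]


gradient_s = 35 * 60  # 35 min gradient, assumed to equal the collection time
print(vial_at(0), vial_at(20), vial_at(400))  # first vial, second vial, first vial again
print(visits_per_vial(gradient_s))
```

Under these assumptions the run contains 105 windows of 20 s, so the first five vials are visited six times and the remaining fifteen vials five times.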
Mass spectrometry
TMT-labeled peptide fractions were dissolved in water/formic acid (100/0.1 v/v) and analyzed by on-line C18 nanoHPLC MS/MS with a system consisting of an Ultimate3000nano gradient HPLC system (Thermo, Bremen, Germany) and an Exploris480 mass spectrometer (Thermo) as in Rossi et al.35. Fractions were injected onto a cartridge precolumn (300 μm × 5 mm, C18 PepMap, 5 μm, 100 Å) and eluted via a homemade analytical nano-HPLC column (50 cm × 75 μm; Reprosil-Pur C18-AQ 1.9 μm, 120 Å (Dr. Maisch, Ammerbuch, Germany)). Solvent A was water/formic acid 100/0.1 (v/v). The gradient was run from 2% to 40% solvent B (20/80/0.1 water/acetonitrile/formic acid (FA) v/v) in 120 min. The nano-HPLC column was drawn to a tip of ~10 μm and acted as the electrospray needle of the MS source. The mass spectrometer was operated in data-dependent MS/MS mode with a cycle time of 3 s, with the HCD collision energy at 36 V and recording of the MS2 spectrum in the orbitrap, with a quadrupole isolation width of 1.2 Da. In the master scan (MS1), the resolution was 120,000 and the scan range 350–1200, at the standard AGC target with a maximum fill time of 50 ms. A lock mass correction on the background ion m/z = 445.12003 was used. Precursors were dynamically excluded after n = 1 with an exclusion duration of 45 s and a precursor range of 20 ppm. Charge states 2–5 were included. For MS2, the first mass was set to 110 Da, and the MS2 scan resolution was 45,000 at an AGC target of 200% with a maximum fill time of 60 ms.
Proteomics data processing and down-stream analysis
In a post-analysis process, raw data were first converted to peak lists using Proteome Discoverer version 2.4 (Thermo Electron), and submitted to the Uniprot database (Homo sapiens, 20596 entries), using Mascot v. 2.2.07 (www.matrixscience.com) for protein identification. Mascot searches were performed with 10 ppm and 0.02 Da deviation for precursor and fragment mass, respectively, and the enzyme trypsin was specified. Up to two missed cleavages were allowed. Methionine oxidation and acetyl on protein N-terminus were set as variable modifications. Carbamidomethyl on Cys and TMTpro on Lys and N-terminus were set as fixed modifications. Protein and peptide FDR were set to 1%. Normalization was on the total peptide amount. The 5 TMT-16plex analyses were normalized to each other by the bridge samples.
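To give a sense of scale for the precursor tolerance, a relative tolerance in ppm translates into an absolute mass window as sketched below. This is a generic Python illustration with hypothetical function names; it does not reproduce Mascot's internal matching logic.

```python
def ppm_window(mass, ppm):
    """Absolute (low, high) mass window for a relative tolerance in ppm."""
    delta = mass * ppm / 1e6
    return mass - delta, mass + delta


def within_ppm(observed, theoretical, ppm):
    """True if the observed mass deviates from the theoretical one by at most ppm."""
    error_ppm = abs(observed - theoretical) / theoretical * 1e6
    return error_ppm <= ppm


# A 10 ppm tolerance on a 1000 Da precursor is a window of +/- 0.01 Da.
low, high = ppm_window(1000.0, 10)
print(round(low, 4), round(high, 4))
print(within_ppm(1000.005, 1000.0, 10), within_ppm(1000.02, 1000.0, 10))
```

Because the window scales with mass, the same 10 ppm setting is tighter in absolute terms for light peptides than for heavy ones, whereas the 0.02 Da fragment tolerance is a fixed absolute window.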
First, the abundance data were log-cpm transformed to stabilize the variance among samples. To account for technical batch effects, we then ran the removeBatchEffect function from the limma package (v3.48.3), providing the batch information. We overlaid the transformed protein abundances onto a PCA plot to confirm that there was no batch-related clustering, as can be observed from the positioning of the TMT control samples before (Fig. S6A) and after correction (Fig. S6B). The downstream analyses were then performed using only the primary AML samples (n = 39). Next, the transformed BCL-2 protein abundance was associated with the BCL-2 expression from the LUMC cohort (Fig. 4a) and with the CD14+ Monocyte percentage within each sample (Fig. 4b) (Supplementary Table S14). R² and p-values were calculated using the stat_poly_eq function from the ggpmisc package (v0.4.5), and lines were drawn with the geom_smooth function from the ggplot2 package (v3.3.5) with the method option set to linear model.
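The R² of a straight-line fit, as used for these associations, is the fraction of variance in the response explained by an ordinary least-squares line. A minimal Python sketch with a hypothetical function name and toy data is given below; it does not compute the p-value and is not the ggpmisc implementation.

```python
def linear_fit(x, y):
    """Ordinary least-squares line y = intercept + slope * x.

    Returns (slope, intercept, r_squared). Assumes x and y are not constant.
    """
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # R^2 = 1 - residual sum of squares / total sum of squares
    ss_res = sum((b - (intercept + slope * a)) ** 2 for a, b in zip(x, y))
    ss_tot = sum((b - mean_y) ** 2 for b in y)
    return slope, intercept, 1 - ss_res / ss_tot


# Toy data, for illustration only.
slope, intercept, r2 = linear_fit([0, 1, 2, 3], [1, 2, 2, 3])
print(round(slope, 3), round(intercept, 3), round(r2, 3))
```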
Data availability
Our 100 AML samples (LUMC) have been deposited to EGA with accession number EGAS00001003096 and are accessible upon request. These samples were deposited to EGA for Arindrarto et al.11. For further inquiries, please contact: e.b.van_den_akker [at] lumc.nl.
Code availability
We have implemented a GPL-3 licensed R package at https://github.com/eonurk/seAMLess, which deconvolutes a given bulk RNA-seq count matrix to 22 cell types from our single-cell reference and predicts drug resistances via the RF algorithm. All scripts related to this project can be found at https://github.com/eonurk/lumc-sc-aml.
References
Siegel, R. L., Miller, K. D., Fuchs, H. E. & Jemal, A. Cancer statistics, 2022. CA Cancer J. Clin. 72, 7–33 (2022).
Vardiman, J. W. et al. The 2008 revision of the world health organization (WHO) classification of myeloid neoplasms and acute leukemia: rationale and important changes. Blood 114, 937–951 (2009).
Arber, D. A. et al. The 2016 revision to the world health organization classification of myeloid neoplasms and acute leukemia. Blood 127, 2391–2405 (2016).
Khoury, J. D. et al. The 5th edition of the world health organization classification of haematolymphoid tumours: myeloid and histiocytic/dendritic neoplasms. Leukemia 36, 1703–1719 (2022).
Döhner, H. et al. Diagnosis and management of AML in adults: 2017 ELN recommendations from an international expert panel. Blood 129, 424–447 (2017).
Bennett, J. M. et al. Proposals for the classification of the acute leukaemias French-American-British (FAB) co-operative group. Br. J. Haematol. 33, 451–458 (1976).
Canaani, J. et al. Impact of FAB classification on predicting outcome in acute myeloid leukemia, not otherwise specified, patients undergoing allogeneic stem cell transplantation in CR1: an analysis of 1690 patients from the acute leukemia working party of EBMT. Am. J. Hematol. 92, 344–350 (2017).
Kuusanmäki, H. et al. Phenotype-based drug screening reveals association between venetoclax response and differentiation stage in acute myeloid leukemia. Haematologica 105, 708–720 (2020).
Pei, S. et al. Monocytic subclones confer resistance to venetoclax-based therapy in patients with acute myeloid leukemia. Cancer Discov. 10, 536–551 (2020).
EuroFlow Consortium (EU-FP6, LSHB-CT-2006-018708). et al. EuroFlow antibody panels for standardized n-dimensional flow cytometric immunophenotyping of normal, reactive and malignant leukocytes. Leukemia 26, 1908–1975 (2012).
Arindrarto, W. et al. Comprehensive diagnostics of acute myeloid leukemia by whole transcriptome RNA sequencing. Leukemia 35, 47–61 (2021).
Avila Cobos, F., Alquicira-Hernandez, J., Powell, J. E., Mestdagh, P. & De Preter, K. Benchmarking of cell type deconvolution pipelines for transcriptomics data. Nat. Commun. 11, 5650 (2020).
Wang, X., Park, J., Susztak, K., Zhang, N. R. & Li, M. Bulk tissue cell type deconvolution with multi-subject single-cell expression reference. Nat. Commun. 10, 380 (2019).
Dai, C., Chen, M., Wang, C. & Hao, X. Deconvolution of bulk gene expression profiles with single-cell transcriptomics to develop a cell type composition-based prognostic model for acute myeloid leukemia. Front. Cell Dev. Biol. 9, 762260 (2021).
Li, H., Sharma, A., Ming, W., Sun, X. & Liu, H. A deconvolution method and its application in analyzing the cellular fractions in acute myeloid leukemia samples. BMC Genomics 21, 652 (2020).
Zeng, A. G. X. et al. A cellular hierarchy framework for understanding heterogeneity and predicting drug response in acute myeloid leukemia. Nat. Med. 28, 1212–1223 (2022).
van Galen, P. et al. Single-cell RNA-seq reveals AML hierarchies relevant to disease progression and immunity. Cell 176, 1265–1281.e24 (2019).
Triana, S. et al. Single-cell proteo-genomic reference maps of the hematopoietic system enable the purification and massive profiling of precisely defined cell states. Nat. Immunol. 22, 1577–1589 (2021).
Hay, S. B., Ferchen, K., Chetal, K., Grimes, H. L. & Salomonis, N. The human cell atlas bone marrow single-cell interactive web portal. Exp. Hematol. 68, 51–61 (2018).
The Cancer Genome Atlas Research Network. Genomic and epigenomic landscapes of adult de novo acute myeloid leukemia. N. Engl. J. Med. 368, 2059–2074 (2013).
Tyner, J. W. et al. Functional genomic landscape of acute myeloid leukaemia. Nature 562, 526–531 (2018).
Farrar, J. E. et al. Genomic profiling of pediatric acute myeloid leukemia reveals a changing mutational landscape from disease diagnosis to relapse. Cancer Res. 76, 2197–2205 (2016).
MacRae, T. et al. RNA-Seq reveals spliceosome and proteasome genes as most consistent transcripts in human cancer cells. PLoS ONE 8, e72884 (2013).
Ng, S. W. K. et al. A 17-gene stemness score for rapid determination of risk in acute leukaemia. Nature 540, 433–437 (2016).
Jiang, F. et al. An immune checkpoint-related gene signature for predicting survival of pediatric acute myeloid leukemia. J. Oncol. 2021, 1–14 (2021).
Haferlach, T. et al. Clinical aspects of acute myeloid leukemias of the FAB types M3 and M4Eo. Ann. Hematol. 66, 165–170 (1993).
DiNardo, C. D. et al. Azacitidine and venetoclax in previously untreated acute myeloid leukemia. N. Engl. J. Med. 383, 617–629 (2020).
Waclawiczek, A. et al. Combinatorial BCL2 family expression in acute myeloid leukemia stem cells predicts clinical response to azacitidine/venetoclax. Cancer Discov. 13, 1408–1427 (2023).
Docking, T. R. et al. A clinical transcriptome approach to patient stratification and therapy selection in acute myeloid leukemia. Nat. Commun. 12, 2474 (2021).
Kuusanmäki, H. et al. Ex vivo venetoclax sensitivity testing predicts treatment response in acute myeloid leukemia. Haematologica 108, 1768–1781 (2023).
Ganan-Gomez, I. et al. Stem cell architecture drives myelodysplastic syndrome progression and predicts response to venetoclax-based therapy. Nat. Med. 28, 557–567 (2022).
Hao, Y. et al. Integrated analysis of multimodal single-cell data. Cell 184, 3573–3587.e29 (2021).
Robinson, M. D., McCarthy, D. J. & Smyth, G. K. edgeR: a Bioconductor package for differential expression analysis of digital gene expression data. Bioinformatics 26, 139–140 (2010).
Paulo, J. A. & Gygi, S. P. Nicotine-induced protein expression profiling reveals mutually altered proteins across four human cell lines. Proteomics https://doi.org/10.1002/pmic.201600319 (2017).
Rossi, M. et al. PHGDH heterogeneity potentiates cancer cell dissemination and metastasis. Nature 605, 747–753 (2022).
Acknowledgements
The authors thank Daniele Bizzarri for his constructive criticism of the project. This study was funded by a strategic investment of the Leiden University Medical Center, embedded within the Leiden Oncology Center, and executed within the Leiden Center for Computational Oncology. EvdA was funded by a personal grant of the Dutch Research Council (NWO; VENI: 09150161810095). The funding bodies had no role in the study design, the collection, analysis, and interpretation of data, the writing of the manuscript, or the decision to submit the manuscript for publication.
Author information
Contributions
Conceptualization: M.J.R., E.v.d.A., M.G. Resources: E.v.d.A. Methodology: E.O.K., E.v.d.A., P.v.A., A.M.O. Investigation: E.O.K., J.S., M.Z., M.G., E.v.d.A., P.v.A. Visualization: E.O.K. Funding acquisition: E.v.d.A. Project administration: E.v.d.A. Supervision: M.J.R., M.G., E.v.d.A. Writing—original draft: E.O.K., M.G., E.v.d.A. Writing—review and editing: All authors.
Ethics declarations
Competing interests
J.J.M. van Dongen reports to be chairman of the EuroFlow scientific foundation, which receives royalties from licensed patents, which are collectively owned by the participants of the EuroFlow Foundation. These royalties are exclusively used for continuation of the EuroFlow collaboration and sustainability of the EuroFlow consortium. J.J.M. van Dongen also reports an Educational Services Agreement and a Scientific Advisor Agreement with BD Biosciences San José; the related fees and honoraria are for the University of Salamanca. The other authors declare no competing interests.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
About this article
Cite this article
Karakaslar, E.O., Severens, J.F., Sánchez-López, E. et al. A transcriptomic based deconvolution framework for assessing differentiation stages and drug responses of AML. npj Precis. Onc. 8, 105 (2024). https://doi.org/10.1038/s41698-024-00596-9