Introduction

Acute myeloid leukemia (AML) is predominantly a disease of older patients, for whom prognosis remains poor [1, 2]. Intensive chemotherapy, usually consisting of an anthracycline and cytarabine, induces remission in about 50% of older fit patients, but most patients relapse and succumb to their disease. Beyond patient-associated factors, such as increasing age, comorbidities and poor performance status, disease-related factors, particularly an unfavorable genetic profile, predict resistance to current standard therapy [3, 4]. In line with this, the proportion of patients with an unfavorable disease profile, such as intermediate-2 or high risk according to the 2010 European LeukemiaNet (ELN) recommendations [5], increases with age from about one third in patients below the age of 60 years to nearly 60% in patients 70 years or older [6].

Epigenetic changes, such as mutations of epigenetic modifiers and aberrant DNA methylation, are frequent in AML [7, 8]. Furthermore, DNA methylation has emerged as an attractive therapeutic target in AML [9, 10], particularly in patients with unfavorable cytogenetics and/or a TP53 mutation [11, 12]. The high failure rate of intensive induction therapy in AML with an unfavorable genetic profile may be a result of cytarabine resistance [13]. In contrast, patients with a favorable genetic profile such as core-binding factor (CBF) AML or AML with mutated NPM1 are sensitive to standard induction therapy [3, 4]. The improved understanding of the molecular pathogenesis has spurred new treatment strategies targeting specific molecular defects. So far, this concept has proven successful with the use of all-trans retinoic acid (ATRA) and/or arsenic trioxide in the therapy of acute promyelocytic leukemia (APL) [14], as well as with the introduction of midostaurin and enasidenib in the therapy of AML with FLT3 and IDH2 mutations, respectively [15, 16].

In this trial, patients with CBF-AML [17], AML with mutated NPM1 [18], and AML with FLT3 internal tandem duplication (ITD) [19] were excluded because competing trials were active at the same time, resulting in a selection of patients with more high-risk disease features. The hypothesis was that these patients may particularly benefit from incorporation of the hypomethylating agent azacitidine in induction therapy. Thus, the aim of our study was to evaluate the impact of substituting cytarabine with azacitidine, administered sequentially or concurrently with idarubicin and etoposide, on response rate and survival endpoints. The AMLSG 12–09 trial was a prospective, randomized, multi-institutional, controlled phase-II trial.

Patients and methods

Patients

Between October 2010 and March 2012, 277 adult patients 18–82 years of age with newly diagnosed AML were enrolled; diagnoses included de novo AML, secondary AML with a preceding history of myelodysplastic syndrome or myeloproliferative neoplasm (s-AML), and therapy-related AML following treatment of a primary malignancy (t-AML), as defined by the WHO 2008 classification [20]. Excluded were APL, CBF-AML, AML with FLT3-ITD, and AML with NPM1 mutation; further exclusion criteria were concomitant renal (creatinine > 1.5 x upper normal serum level), liver (AST or ALP > 2.5 x upper normal serum level) or cardiac dysfunction (New York Heart Association III/IV), uncontrolled infectious disease, primary coagulation disturbance or ECOG performance status > 2. Written informed consent was obtained from all patients. The protocol was approved by the lead Ethics Committee and registered at clinicaltrialsregister.eu (EudraCT Number: 2009-016142-44) and clinicaltrials.gov (ClinicalTrials.gov Identifier: NCT01180322).

Cyto- and molecular genetics

Chromosome banding analysis was performed centrally in the two AMLSG Laboratories for Cytogenetics (Hannover, Ulm). Karyotypes were designated according to the International System for Human Cytogenetic Nomenclature [21]. Leukemia samples were analyzed for mutations in FLT3 (ITDs and tyrosine kinase domain [TKD] mutations at codons D835/I836), CEBPA, NPM1, IDH1/2, RUNX1, ASXL1, TP53, and DNMT3A as previously described [22,23,24,25,26,27].

Study design

Patients were randomized into 4 arms in a 1:1:1:1 manner. Induction therapy regimens comprised: a) STANDARD, cytarabine 100 mg/m²/day by continuous intravenous (iv) infusion on days 1–7, idarubicin 12 mg/m²/day by iv push on days 1, 3, 5 (in patients > 65 years on days 1 + 3 only), etoposide 100 mg/m²/day by 1-hour iv infusion on days 1, 2, 3 (in patients > 65 years on days 1 + 3 only); b) Azacitidine PRIOR, azacitidine 100 mg/m²/day by subcutaneous (sc) injection on days −5 to −1, idarubicin and etoposide as in STANDARD; c) Azacitidine CONCURRENT, azacitidine 100 mg/m²/day by sc injection on days 1–5 concurrently with idarubicin and etoposide as in STANDARD; d) Azacitidine AFTER, azacitidine 100 mg/m²/day by sc injection on days 4–8, idarubicin and etoposide as in STANDARD. Patients in complete remission (CR), CR with incomplete hematological recovery (CRi) or partial remission (PR) after first induction therapy received a second cycle with a dose reduction of idarubicin (administered on days 1 + 3 only).

Consolidation therapy

Patients in CR/CRi following induction therapy were assigned to consolidation therapy with either allogeneic hematopoietic-cell transplantation (HCT) from a matched related or unrelated donor (one consolidation cycle before allogeneic HCT was optional), or, in second priority, three cycles of high-dose cytarabine (HiDAC). Cytarabine was administered by iv infusion in a dose of 3 g/m² bid on days 1, 2, 3 [28]. For patients > 65 years of age, dose of cytarabine was reduced to 1 g/m². Lenograstim (34 × 10⁶ IU/ml) was applied sc daily beginning on day 10 until neutrophil count > 0.5 × 10⁹/l.

Maintenance therapy

Maintenance therapy with azacitidine was intended for all patients who were randomized to one of the azacitidine-containing induction therapy arms. Maintenance therapy was scheduled for a total duration of 2 years. Azacitidine was administered in a dose of 50 mg/m² per day by sc injection on days 1–5 every 4 weeks.

Definition of response criteria, survival endpoints and hematologic recovery

In accordance with standard criteria, CR was defined as < 5 % bone marrow blasts, an absolute neutrophil count of ≥ 1.0 G/L, a platelet count of ≥ 100 G/L, no blasts in the peripheral blood and no extramedullary leukemia; CR with incomplete blood count recovery (CRi) was characterized as CR except for residual neutropenia (neutrophils < 1.0 G/L) or thrombocytopenia (platelets < 100 G/L) [5]. Relapse was defined as > 5% bone marrow blasts or new extramedullary leukemia in patients with previously documented CR/CRi.

Event-free survival (EFS), relapse-free survival (RFS) and overall survival (OS) were defined as recommended [5]. Times to leukocyte, neutrophil and platelet recovery were measured from the first day of chemotherapy of each cycle until the first day with values ≥ 1, ≥ 0.5 and ≥ 20 G/L for white blood cells (WBC), neutrophils and platelets, respectively. Toxicities were defined and graded according to the National Cancer Institute (NCI) Common Toxicity Criteria, version 3.0.

Sample size planning and statistical analysis

An optimal two-stage design of Simon was used to evaluate each arm of the study separately [29]. The null hypothesis in each arm was H0: π ≤ 0.40, where π denotes the true CR/CRi rate of the induction therapy. In contrast, an effective therapy was expected to achieve a CR/CRi rate of at least 55%. The sample size was calculated to detect an effective therapy with a power of 80%. The level of significance was fixed at α = 5% for each treatment arm. Based on these assumptions, efficacy of the corresponding therapy was rejected in the first stage of 26 treated patients if 11 or fewer patients achieved a CR/CRi. If 12 or more patients achieved a CR/CRi during this first stage, the trial proceeded to the second stage with a total sample size of 84 patients per treatment arm. Randomization after completion of the first stage was continued until first-stage results were available. In the second stage, efficacy was rejected if no more than 40 patients achieved a CR/CRi.
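The operating characteristics of the stopping rule described above can be checked numerically. The following is a minimal sketch, not the authors' code (the analyses in this paper were run in R); the function names are illustrative assumptions. It computes, for a given true CR/CRi rate, the probability of stopping after the first stage, the probability of declaring the arm effective, and the expected number of patients per arm, using the design parameters stated in the text (26 patients and a cut-off of 11 in stage one, 84 patients and a cut-off of 40 overall).

```python
from math import comb


def binom_pmf(k, n, p):
    """Probability of exactly k responses among n patients with response rate p."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def binom_cdf(k, n, p):
    """Probability of at most k responses among n patients."""
    if k < 0:
        return 0.0
    return sum(binom_pmf(i, n, p) for i in range(min(k, n) + 1))


def simon_two_stage(p, n1, r1, n, r):
    """Operating characteristics of a two-stage design at true response rate p.

    Stage 1: stop if responses <= r1 among n1 patients.
    Stage 2: declare the arm effective if total responses > r among n patients.
    Returns (probability of early stop, probability of declaring effective,
    expected sample size).
    """
    n2 = n - n1
    early_stop = binom_cdf(r1, n1, p)
    effective = 0.0
    for x1 in range(r1 + 1, n1 + 1):
        # Stage 2 must contribute more than (r - x1) further responses.
        tail = 1.0 - binom_cdf(r - x1, n2, p)
        effective += binom_pmf(x1, n1, p) * tail
    expected_n = n1 + (1.0 - early_stop) * n2
    return early_stop, effective, expected_n


# Design parameters taken from the text above.
for label, rate in (("H0, rate 0.40", 0.40), ("alternative, rate 0.55", 0.55)):
    stop, eff, exp_n = simon_two_stage(rate, n1=26, r1=11, n=84, r=40)
    print(f"{label}: P(early stop) = {stop:.3f}, "
          f"P(declared effective) = {eff:.3f}, expected n = {exp_n:.1f}")
```

Under the null rate, the probability of declaring an arm effective corresponds to the type I error of the design; under the alternative rate it corresponds to the power.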

Pairwise comparisons between patient subgroups were performed by the Mann–Whitney or Kruskal-Wallis test for continuous variables and by Fisher’s exact test for categorical variables. Univariable and multivariable logistic regression models were applied to investigate the influence of covariates (age, sex, CEBPA, DNMT3A, RUNX1, ASXL1, IDH1, IDH2, TP53, ELN high-risk category) on response to induction therapy. Secondary endpoints of the study were OS, RFS, EFS, therapy-related toxicity and their correlation with the study drug. The median duration of follow-up was calculated by the reverse Kaplan–Meier estimate [30]; the Kaplan–Meier method was used to estimate the distributions of EFS, RFS and OS. Survival distributions were compared using the log-rank test. Multivariable Andersen-Gill regression models were used to evaluate the same prognostic variables as for response to induction therapy as well as alloHCT as a time-dependent covariable [31]. Missing data were replaced by 50 imputations using multivariate imputations by chained equations applying predictive mean matching [32]. Backward selection applying a stopping rule based on a p-value of 0.50 was used in multivariable regression models to exclude redundant or unnecessary variables [32].
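To illustrate the Kaplan–Meier and reverse Kaplan–Meier estimates mentioned above, here is a minimal sketch on hypothetical toy data (the follow-up times and event indicators below are invented for illustration and are not trial data; the function names are likewise assumptions). The reverse estimate simply swaps the roles of events and censoring, so that the "event" becomes the end of observation without death.

```python
def kaplan_meier(times, events):
    """Kaplan-Meier curve as a list of (time, survival) at each event time.

    times:  follow-up time per patient
    events: 1 if the event was observed at that time, 0 if censored
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        n_events = 0
        n_leaving = 0
        # Group all patients sharing the same time point.
        while i < len(data) and data[i][0] == t:
            n_events += data[i][1]
            n_leaving += 1
            i += 1
        if n_events > 0:
            survival *= 1.0 - n_events / at_risk
            curve.append((t, survival))
        at_risk -= n_leaving
    return curve


def median_time(curve):
    """First time at which the estimated survival drops to 0.5 or below."""
    for t, s in curve:
        if s <= 0.5:
            return t
    return None  # median not reached


def reverse_km_median_followup(times, events):
    """Median follow-up: Kaplan-Meier with event and censoring roles swapped."""
    flipped = [1 - e for e in events]
    return median_time(kaplan_meier(times, flipped))


# Hypothetical toy data (months); NOT taken from the trial.
toy_times = [2, 5, 8, 12, 20, 30]
toy_events = [1, 1, 0, 1, 0, 0]

print("median survival:", median_time(kaplan_meier(toy_times, toy_events)))
print("median follow-up:", reverse_km_median_followup(toy_times, toy_events))
```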

All statistical analyses were performed with the statistical software environment R, version 3.2.1, using the R packages rms, version 4.3-1, and cmprsk, version 2.2-2 [33].

Results

Patients and baseline characteristics

Of 277 patients, 9 (3%) were excluded due to violation of inclusion/exclusion criteria: no diagnosis of AML, n = 3; AML with NPM1 mutation, n = 1; presence of Philadelphia chromosome, n = 1; organ insufficiency (renal failure), n = 1; withdrawal of informed consent, n = 2; other reason (extramedullary manifestation of AML in spleen), n = 1. Thus, overall 268 patients were randomized (Fig. 1).

Fig. 1

CONSORT Diagram. Abbreviations: STANDARD, Cytarabine 100 mg/m²/day by continuous iv infusion on days 1–7, idarubicin 12 mg/m²/day by iv push on days 1, 3, 5 (in patients > 65 years on days 1 + 3 only), etoposide 100 mg/m²/day by 1-hour iv infusion on days 1, 2, 3 (in patients > 65 years on days 1 + 3 only); PRIOR, Azacitidine 100 mg/m²/day by sc injection on days −5 to −1, idarubicin and etoposide as in STANDARD; CONCURRENT, Azacitidine 100 mg/m²/day by sc injection on days 1–5, idarubicin and etoposide as in STANDARD; AFTER, Azacitidine 100 mg/m²/day by sc injection on days 4–8, idarubicin and etoposide as in STANDARD; EOT, end of trial; RD, refractory disease; Rel, relapse; AE, adverse event; WD, withdrawal; alloHCT, allogeneic hematopoietic-cell transplantation; Cons, consolidation

On the basis of the two-stage design of the study, initially 104 patients were randomized in the first stage between October 2010 and September 2011 with equal distribution into the 4 arms (n = 26 each). The baseline characteristics of all randomized patients as well as those randomized during the first stage (data not shown) were equally distributed except for the frequency of inv(3)/t(3;3) (Table 1).

Table 1 Patient and disease characteristics according to randomization

Induction therapy

Of 104 patients treated during the first stage of the study, 49 (47%) achieved CR/CRi, 49 (47%) had refractory disease (RD), and 6 (6%) died. The numbers of patients achieving CR/CRi in the treatment arms PRIOR and CONCURRENT were 11 and 10, respectively. Therefore, both arms were stopped, effective 16 September 2011. The treatment arms STANDARD and AFTER were continued, with 14 patients achieving CR/CRi in each arm. After recruitment of 168 patients into the treatment arms STANDARD and AFTER, 45/84 (54%) and 37/84 (44%) patients achieved CR/CRi, respectively. Thus, only the STANDARD arm of the study was identified as effective according to the predefined criteria.
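The decisions reported above follow directly from the predefined thresholds of the two-stage design (at least 12 responses among the first 26 patients to continue, and more than 40 responses among 84 patients to be considered effective). A minimal sketch of this bookkeeping, using the response counts reported in the text, is shown below; the function and variable names are illustrative assumptions.

```python
# Thresholds as stated in the statistical methods.
STAGE1_MIN_RESPONSES = 12  # continue if at least 12 of 26 achieve CR/CRi
STAGE2_MIN_RESPONSES = 41  # effective if more than 40 of 84 achieve CR/CRi


def classify_arm(stage1_responses, total_responses=None):
    """Apply the two-stage decision rule to one treatment arm."""
    if stage1_responses < STAGE1_MIN_RESPONSES:
        return "stopped after stage 1"
    if total_responses is None:
        return "continued to stage 2"
    if total_responses >= STAGE2_MIN_RESPONSES:
        return "effective"
    return "not effective"


# (CR/CRi in first 26 patients, CR/CRi in all 84 patients) as reported above.
reported = {
    "STANDARD": (14, 45),
    "PRIOR": (11, None),
    "CONCURRENT": (10, None),
    "AFTER": (14, 37),
}

decisions = {arm: classify_arm(*counts) for arm, counts in reported.items()}
for arm, decision in decisions.items():
    print(f"{arm}: {decision}")
```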

Overall, 268 patients received induction therapy; 126 (47%) patients achieved CR/CRi, 130 (49%) had RD, and 12 (4%) died during induction therapy. When salvage therapy outside the protocol was taken into account, CR/CRi was achieved in 161 patients (60%), 93 patients had RD (35%), and 14 died (5%), with all treatment arms showing similar increases in response.

A logistic regression model revealed biallelic CEBPA mutation as a favorable parameter (Odds Ratio [OR], 7.35; 95%-Confidence Interval [CI], 1.43–27.3) and adverse risk according to the 2010 ELN risk classification as an unfavorable parameter (OR, 0.48; 95%-CI, 0.26–0.87) for CR/CRi achievement. Within the final model, the estimates for the azacitidine-containing treatment arms compared to STANDARD were as follows: PRIOR, OR, 0.59 (95%-CI, 0.25–1.37); CONCURRENT, OR, 0.44 (95%-CI, 0.19–1.05); AFTER, OR, 0.71 (95%-CI, 0.39–1.30).

We also explored the impact of genetics as a predictive factor for the treatment effect on response. Patients with IDH2-mutated AML had a higher CR/CRi rate with STANDARD than with azacitidine-containing regimens (8/16 [50%] and 1/16 [6%], respectively; p = 0.02); similarly, the CR/CRi rate in patients with RUNX1-mutated AML tended to be higher with STANDARD (10/15, 67%) than with azacitidine-containing regimens (11/31, 35%; p = 0.06). Twenty-seven patients had AML with TP53 mutation; the 13 patients treated in AFTER achieved a CR/CRi rate of 46%, whereas the CR/CRi rate in the remaining patients was only 21% (p = 0.23). In patients with monosomal, complex (≥3 aberrations), or myelodysplasia-related karyotypes there was no difference in CR/CRi rates between STANDARD and all other arms (p = 0.66, p = 0.99, p = 0.46, respectively).
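As a worked example, the p-value reported for the IDH2 comparison can be cross-checked with Fisher's exact test, the test named in the statistical methods for categorical variables. The sketch below is an illustrative standard-library implementation, not the authors' R code; it uses the common two-sided definition that sums the probabilities of all tables no more likely than the observed one, which is an assumption about the variant that was applied.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Sums the hypergeometric probabilities of all tables with the same
    margins that are no more likely than the observed table.
    """
    row1, row2 = a + b, c + d
    col1 = a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Probability that the top-left cell equals x given fixed margins.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    p_observed = prob(a)
    tolerance = 1 + 1e-9  # guard against floating-point ties
    return sum(
        prob(x)
        for x in range(lowest, highest + 1)
        if prob(x) <= p_observed * tolerance
    )


# IDH2-mutated AML: CR/CRi in 8 of 16 (STANDARD) versus 1 of 16 (azacitidine arms).
p_idh2 = fisher_exact_two_sided(8, 8, 1, 15)
print(f"IDH2: p = {p_idh2:.4f}")  # about 0.0155, i.e. 0.02 after rounding
```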

No differences in adverse events were observed among the four treatment arms, except that laboratory abnormalities of all grades were more frequent in STANDARD and CONCURRENT, and vascular abnormalities of all grades and of grade ≥ 3 were predominantly observed in PRIOR (Table 2).

Table 2 Adverse events occurring in first induction therapy according to treatment arm and CTCAE category

Consolidation therapy

Consolidation with HiDAC was administered for one cycle in 61 patients, for two cycles in 45 patients, and for three cycles in 37 patients. Within the protocol, 49 patients proceeded to allogeneic HCT in first CR/CRi; overall, 88 patients received allogeneic HCT in first CR/CRi, 45 patients with RD, and 21 patients after relapse. Forty-six patients received a matched-related donor transplant, 107 a matched-unrelated donor transplant, and one patient a transplant from a haploidentical donor.

Maintenance therapy

Maintenance therapy with azacitidine was started in 15 patients. Median number of applied cycles was 5 (range, 1–24), with 2 patients receiving the intended 24 cycles. In 13 patients maintenance was terminated early (relapse, n = 12; toxicity, n = 1).

Survival analysis

Median follow-up was 56 months (95%-CI, 54–57 months). Overall, median EFS, RFS, and OS were 3.5 months, 15 months, and 16 months, respectively; the corresponding 4-year rates were 16% (95%-CI, 12–21%), 30% (95%-CI, 23–39%), and 29% (95%-CI, 24–35%). EFS (Fig. 2a) was significantly different among the four arms (p = 0.008), with inferior EFS in all three azacitidine arms compared to STANDARD (p < 0.001). RFS and OS (Fig. 2b, c) were not significantly different among the four study arms (p = 0.18 and p = 0.12, respectively), but were inferior when the three azacitidine arms were compared to STANDARD (p = 0.04 and p = 0.03, respectively). Even in patients proceeding to an allogeneic HCT in first CR/CRi, RFS tended to be inferior in the three azacitidine arms compared to STANDARD (p = 0.07). In an Andersen-Gill regression model including allogeneic HCT performed in first CR/CRi as a time-dependent covariable, all azacitidine arms showed worse outcome; further unfavorable factors were higher age, male sex, presence of a TP53 mutation, and ELN high-risk. Favorable factors were biallelic CEBPA mutations, female sex, and allogeneic HCT in first CR/CRi (Table 3).

Fig. 2

Kaplan-Meier plots illustrating the influence of upfront randomization on event-free (EFS) (a), relapse-free (RFS) (b), and overall survival (OS) (c)

Table 3 Andersen-Gill regression model on the endpoint overall survival including allogeneic hematopoietic-cell transplantation performed in first CR/CRi as a time-dependent covariable

Discussion

Based on the improved understanding of the molecular pathogenesis of AML, new treatment strategies targeting specific molecular defects have been implemented within treatment trials of the German-Austrian AML Study Group (AMLSG), such as FLT3 inhibition in AML with FLT3-ITD (ClinicalTrials.gov Identifier: NCT01477606) [19], KIT-inhibition in CBF-AML (NCT00850382) [17], and the use of gemtuzumab ozogamicin in AML with NPM1 mutations (NCT00893399, EudraCT 2009-011889-28) [18].

The remaining patients not eligible for these targeted approaches were mainly patients with intermediate-2 or high-risk disease according to the 2010 ELN categorization [5]. Furthermore, 20% of patients had RUNX1-mutated AML, 15% ASXL1-mutated AML, and 13% TP53-mutated AML, all markers that are categorized within the adverse-risk group in the 2017 ELN risk stratification [34]; in addition, 25% of patients had IDH1/IDH2-mutated AML. Based on previous observations that hypomethylating agents may be particularly active in AML with poor-risk disease features, such as adverse-risk genetics, myelodysplasia-related changes, or specific gene mutations (e.g., TP53) [10,11,12, 35,36,37,38], our hypothesis was that these patients would benefit from incorporation of azacitidine within a regimen of intensive induction chemotherapy. We opted to substitute cytarabine with azacitidine within the commonly used ICE regimen because the two drugs share common chemical and biological characteristics as well as the same metabolic pathways of incorporation into DNA. Since different sequences of azacitidine administration may affect efficacy, we employed three different investigational regimens, with azacitidine given prior to, concomitantly with, or after chemotherapy.

On the basis of the optimal two-stage design of Simon, two arms of the study, PRIOR and CONCURRENT, had to be stopped early due to insufficient response rates. The study arm AFTER, with azacitidine given after idarubicin and etoposide, was as effective as STANDARD in the first stage, but also failed in the second stage with inferior induction results. Thus, all three investigational treatment arms were associated with an inferior response rate compared to STANDARD. Although comparable CR/CRi rates were achieved in all arms if high-dose cytarabine-based salvage therapy was included in the analysis, the inferior initial response in all azacitidine-containing arms translated into inferior EFS, RFS, and OS. Thus, the results of this study suggest that cytarabine remains an important component of induction therapy even in patients with adverse risk. Furthermore, our results are comparable to those reported by Müller-Tidow et al., who added azacitidine prior to intensive induction therapy [39]. In contrast to that study, we did not identify additive toxicity due to azacitidine, probably because of the omission of cytarabine in the azacitidine-containing treatment arms.

In exploratory analyses we looked at the impact of genetics on response to therapy. Of note, in patients exhibiting a complex, monosomal or myelodysplasia-related karyotype there was no beneficial effect of azacitidine-containing induction therapy. Azacitidine was associated with a significantly inferior response rate in patients with IDH2-mutated AML and with a trend toward an inferior response rate in patients with RUNX1-mutated AML. In the AZA-AML-001 trial evaluating azacitidine versus conventional care regimens [10], mutations in two genes, FLT3 and TET2, were shown to negatively impact OS within the azacitidine treatment arm [12]. Of the 27 patients with TP53-mutated AML in our trial, 13 patients were treated in AFTER and achieved a CR/CRi rate of 46%, whereas the CR/CRi rate in the remaining patients was only 21%, but this difference was not statistically significant (p = 0.23). Activity of hypomethylating agents in patients with TP53-mutated AML has been demonstrated in two previous trials. In a study of decitabine in patients with AML or MDS, those with TP53 mutations had a 100% response rate compared with a 41% response rate in patients with wild-type TP53; however, responses were not durable [11]. In the AZA-AML-001 trial, median OS was prolonged by almost 5 months in patients with TP53 mutations receiving azacitidine compared with patients receiving conventional care regimens [12].

In conclusion, in this study of patients with AML exhibiting predominantly higher risk disease features the substitution of cytarabine by azacitidine within an intensive chemotherapy regimen of idarubicin and etoposide failed to improve response rates. On the contrary, two investigational arms had to be stopped early, and all three investigational arms were associated with poorer outcome compared to the standard ICE regimen.