Introduction

From the end of the twentieth century on, the ectoparasitic mite Varroa destructor switched host from the Asian honey bee Apis cerana to the Western honey bee Apis mellifera by globalization of apiculture1. Due to a lack of co-evolution, the mite has since threatened the survival of A. mellifera colonies worldwide by impairing the brood, weakening adult bees’ capabilities and acting as a vector for several honey bee viruses2,3,4,5,6,7. As an obligate parasite, V. destructor can only reproduce in the honey bees’ brood and host sensing occurs predominantly pheromonically2,8,9,10,11,12. The mature mite is attracted by volatile semiochemicals expressed by the 4th and 5th instar larvae and prefers drone brood over worker brood as 5th instar drone larvae express higher levels of these chemicals compared to worker larvae2,9,11,12,13,14,15. Also non-chemical factors play crucial roles in the preference of the mite for drone brood, such as bigger cell size16,17,18 and brood temperature19. After cell invasion by Varroa and capping, the foundress mite may produce up to three mature fertilized daughters in worker cells20,21 and four in drone cells20,22 before emergence of the newborn bee. Higher numbers of viable offspring mites have repeatedly been reported upon emergence in drone brood compared to worker brood, despite equal percentages of non-reproducing mites21,22,23,24,25,26,27. As such, presence of drone brood in the colony may positively impact the Varroa population growth, increasing the risk for colony death28.

Already since the early spread of Varroa on the Western honey bee, numerous reports described the natural survival of untreated A. mellifera colonies under V. destructor mite infestations on all continents, except Antarctica (reviewed in29,30). In these colonies, various behavioral and individual resistance traits have been characterized that lower the V. destructor mite burden29,30. Cases of ‘suppressed mite reproduction’ (SMR) being a prominent defensive trait were reported in Brazil31,32, Sweden33,34,35, France35,36, Norway37 and South Africa38,39. SMR is nowadays defined as a heritable trait in which cell-invading Varroa mites fail to produce (mated) mature offspring, caused by i.a. yet unknown brood-related traits. Reported h2 values range from 0.18 up to 0.46, mainly dependent on the population or subspecies screened for heritability estimation29,40,41,42,43,44. For example, a h2 of 0.44 has been described in Carniolan bees (Apis mellifera carnica), which is the preferred subspecies of A. mellifera by beekeepers in Flanders (Belgium) due to its gentleness and productivity43. Current hypotheses state that brood-intrinsic SMR may be the result of derangements in the synchronized host-parasite signal interaction45,46,47. Colony-level SMR is currently scored by microscopically phenotyping capped worker- or drone brood for fecundity-based reduction in mite reproduction (number of offspring in relation to pupal age) and/or fertility-based reduction in mite reproduction (presence or absence of at least one offspring mite)48,49,50,51. Phenotyping the trait is however time-consuming and requires sufficient data points for accurate and precise scoring48,52. Consequently, brood-intrinsic SMR is limitedly implemented in selection- or breeding programs aiming to increase resilience towards V. destructor, strengthening a top-down selection approach in which a few beekeepers provide further breeding materials, thereby narrowing genetic diversity.

A possible solution for these shortcomings is the transition from traditional phenotyping to genotyping methods, aiming to breed with or reconstruct genetic profiles associated with the protective trait of interest (= marker-assisted selection or MAS). Quantitative trait loci (QTL) not only support scientific research to identify the molecular basis behind a quantitative trait, but also allow bottom-up selective breeding approaches based on only a few genetic markers53. For instance, cost-effective high-throughput genotyping of genetic markers associated with a trait of interest permits more candidate populations in the breeding program and widens the genetic diversity from which further breeding may be accomplished54. In A. mellifera, numerous QTL (mapping) analyses and associations have already been performed for Varroa-sensitive hygiene (VSH)55,56,57,58,59,60, grooming behavior61 and mite non-reproduction (MNR)29,62,63,64,65. The current study builds further on an earlier published joint eight-variant model, in which eight single nucleotide polymorphisms (SNPs) in seven different genes were associated with drone brood resistance (DBR, i.e. MNR in drone brood) in one hybrid Dutch Varroa-resistant/sensitive Darwinian Black Bee Box (DBBB) colony66,67. For the first time, this study reported the use of Whole Exome Sequencing (WES) for high resolution phenotype-associated variant detection in A. mellifera. More specifically, hybrid Varroa-resistant/sensitive (VR/VS) colonies were established by artificially crossing virgin Varroa-resistant (VR/VR) queens from The Netherlands (DBBB program) with local Varroa-sensitive (VS) drones through single drone insemination (SDI)37,66,68,69. The created hybrid VR/VS queens were then mated naturally, and their offspring drone brood was phenotyped for the DBR trait66. Only the VR/VS hybrid queen originating from the Amsterdam Water Dunes population in The Netherlands (DBBB) showed a percentage of non-reproducing mites in the drone brood that differed significantly from a local control strain66. Thirty-five out of 69 drones (51%) of this queen contained a single non-reproducing mother mite (DBR phenotype). Next, 32 DBR positive- and 32 DBR negative drones were subjected to Illumina WES analysis, single-marker Fisher exact tests and elastic net penalized regression66. The resulting joint eight-variant model, comprising six risk and two protective mutations, classified 56 of the 64 drone phenotypes correctly solely based on the genotypes of the drones (88%)66.

As this joint eight-variant model (together with its effect sizes) was derived from only one hybrid colony from the Amsterdam Water Dunes (The Netherlands)66, it was uncertain whether it held true in a different subspecies on a population-wide scale in the northern part of Belgium (Flanders), with varying genetics, environmental conditions and beekeeping practices. Therefore, we conducted a population-wide genotype–phenotype association study for the eight described SNPs through Flemish population-wide sampling of Apis mellifera carnica honey bee colonies, DBR scoring of Varroa-infested drone brood and variant genotyping of the phenotyped drone pupae using qPCR’s with dual-labelled probes. Novel eight-variant mixed-effect modelling was performed on the obtained genotype–phenotype data set with the eight variants as fixed effects and the beekeeper as random effect, and classification performance of the eight SNPs was evaluated by Receiver Operating Characteristic (ROC) curve construction. Besides, a reduced three-variant model was built containing only three of the eight genetic variants (significant in single variant tests) as fixed effects and the beekeeper as random effect, and classification performance was compared with the new eight-variant mixed-effect model. To determine the spread of beneficial alleles of the reduced three-variant model’s SNP markers in A. mellifera carnica queens from Flanders, we investigated the allelic frequencies by drone leg pooling and qPCR’s with dual-labelled probes. Using the reduced three-variant model’s estimates, we predicted the DBR probability distribution in the screened Flemish honey bee populations. As such, this research paper reveals prospects of marker-assisted selection for Varroa drone brood resistance in honey bees.

Results

Population-wide drone brood sampling, drone pupae phenotyping and genotyping

Out of 162 different drone brood samples from 43 Flemish beekeepers, a total of 842 drone pupae with an age between 15 and 19 days were singly infested and phenotyped according to the reproduction status of the included foundress mite in the capped brood cell. Six hundred and ten pupae with a single reproducing foundress mite were categorized as phenotype ‘1’ (= no DBR), whereas 232 pupae with a single non-reproducing foundress mite were categorized as phenotype ‘0’ (= DBR). All drones in the data set were descendants from the 162 different queens, with at most 28 and at least 1 drone(s) from the same queen in the reproducing phenotype (‘1’) group and at most 9 and at least 1 drone(s) from the same queen in the non-reproducing phenotype (‘0’) group (S1 Fig). All 842 drone pupae were genotyped for the eight genetic SNP variants associated with mite non-reproduction in drone brood (DBR) by qPCR genotyping assays with dual-labeled probes54.

Genotype–phenotype association, modelling and probability calculation

Novel eight-variant mixed-effect modelling on population-wide data set (model M1)

The earlier published joint eight-variant model and its estimates was derived from only one hybrid colony from the Amsterdam Water Dunes in the Netherlands (DBBB colony)66. As such, this model could not directly be applied to our new population-overarching dataset containing offspring from another subspecies (carnica ssp.) and intra-colony and -beekeeper relationships between the sampled drones. Therefore, novel eight-variant mixed-effect modelling was performed on the obtained genotype–phenotype data set with all eight genotyped variants as fixed effects, the beekeeper as random effect and the reproduction status (non-reproduction = 0, reproduction = 1) of the included single foundress mite as outcome variable, and classification performance was tested by receiver operating characteristic (ROC) curve construction (Tables 12, Fig. 1).

Table 1 New eight-variant logistic generalized linear mixed-effect model 1 (M1).
Table 2 Contingency table of the true phenotype cases vs the predicted phenotype cases by the new eight-variant logistic generalized linear mixed-effect model 1 (M1).
Figure 1
figure 1

Evaluation of classification performance and determination of optimal cut-off by Receiver Operating Characteristic (ROC) curve analysis for the novel eight-variant logistic generalized linear mixed-effect model 1 (M1). ROC curve analysis resulted in optimal model cut-off at 1.198165.

Single variant tests for reduced three-variant model construction

Eight different single variant tests were performed with the single genetic variant as fixed effect (SNP1-8), the beekeeper as random effect and the reproduction status (non-reproduction = 0, reproduction = 1) of the included single foundress mite as outcome variable. These single variant tests were significant for SNP2, T > C (p ≤ 0.01**); SNP4, T > C (p ≤ 0.05*) and SNP6, C > T (p ≤ 0.05*) at the 5% significance level (Table 3).

Table 3  Single variant tests with an individual genetic variant as fixed effect (SNP1-8), the beekeeper as random effect and the reproduction status (0 or 1) of the included single foundress mite as outcome variable.

Logistic generalized linear mixed-effect modelling for reduced three-variant model construction on population-wide data set (model M2)

In this study, two logistic generalized linear mixed-effect models have been constructed to test the classification performance of the genetic variants associated with Varroa non-reproduction in drone brood for the actual Varroa reproduction status. The previously described model M1 contained all eight genetic variants as fixed effects and the beekeeper as random effect (cfr. Table 1) and classified 63% of the phenotyped drones correctly (cfr. Table 2). Here, we describe a new reduced model, M2, containing only the three genetic variants that were significant in the single variant tests as fixed effects (i.e. SNP2, SNP4 and SNP6), the beekeeper as random effect and the reproduction status (non-reproduction = 0, reproduction = 1) of the included single foundress mite as outcome variable (Tables 45, Fig. 2).

Table 4 Reduced logistic generalized linear mixed-effect model 2 (M2).
Table 5 Contingency table of the true phenotype cases vs the predicted phenotype cases by the new reduced logistic generalized linear mixed-effect model 2 (M2).
Figure 2
figure 2

Evaluation of classification performance and determination of optimal cut-off by Receiver Operating Characteristic (ROC) curve analysis for the reduced three-variant model M2. ROC curve analysis resulted in optimal model cut-off at 0.630785.

There was no significant difference in classification performance between M1 and M2 at the 5% significance level (DeLong’s test for two correlated ROC curves; p > 0.05). The area under the ROC curve (AUC) for M1 was 0.7955059, while the AUC for M2 equaled 0.7909023 (cfr. Figs. 1 and 2, Table 6). The threshold, sensitivity and specificity of the reduced three-variant model (M2) was 0.630785, 0.487 and 0.866 respectively (Fig. 2, Table 6). This reduced logistic generalized linear model with three genetic variants as predictors classified 641 of the 842 phenotyped drones (76%) correctly (Tables 5 and 6).

Based on the reduced three-variant model M2 and its estimates, the probability on mite non-reproduction or DBR for each of the eight possible drone genotypes could be calculated (Table 7).

Table 6 Comparison of classification performance of model M1 and M2.
Table 7 Probability on DBR for each of the eight possible drone genotypes according to the reduced three-variant model (M2).

Allelic frequency analysis by genotyping honey bee queens from Flanders

Over two consecutive years (2021–2022), 292 different honey bee queens from 54 Flemish beekeepers were genotyped for the three genetic variants of reduced model M2 through gDNA extraction from 30 pooled drone hind legs and qPCR genotyping assays with dual-labeled probes (Tables S3 and 8, Fig. 3). Table 8 shows the individual distribution of genotypes and allelic frequencies of the variant-type alleles for the three genetic variants of M2 in the screened queens. Figure 3 shows the absolute number of queens with each of the 27 possible genotypes (N = 292).

Figure 3
figure 3

Absolute number of screened queens with each of the 27 possible genotypes for the three variants of reduced model M2 (N = 292). WT = homozygous wild type; VT = homozygous variant type; HZ = heterozygous. Queen genotypes are coded as X–Y–Z with X the homo-/heterozygosity for SNP2; Y the homo-/heterozygosity for SNP4 and Z the homo-/heterozygosity for SNP6. Most screened queens were heterozygous for the three genetic variants of model M2.

Table 8 Distribution of genotypes and allelic frequencies of the variant-type alleles for the three genetic variants from model M2 in the 292 screened queens from Flanders.

Predicted colony-level DBR probabilities in drone brood from honey bee queens from Flanders

For each of the 292 genotyped honey bee queens, we calculated the probability on mite non-reproduction in the respective colony’s total drone brood using reduced model M2 (Fig. 4a). The mean and median predicted colony-level DBR probability in this screening was 0.26 and 0.28 respectively. Ranking the queens only based on the number of favorable alleles (out of six alleles) ranks them for predicted colony-level DBR probabilities as well (Fig. 4b). For instance, if we want to select only colonies that have colony-level DBR probabilities higher than or equal to 0.3, we may simply select all queens with four or more favorable alleles, independent of which genetic variants are favorable and which are not. In this screening, a total of 64 queens complied with this condition (Fig. 4c).

Figure 4
figure 4

(a) Distribution of predicted colony-level DBR probabilities over all 292 genotyped queens from Flanders, (b) distributions of colony-level DBR probabilities in function of the number of favorable alleles in the queen and (c) absolute number of queens in function of the number of favorable alleles. (a) The mean and median colony-level DBR probability over all 292 screened queens was 0.26 and 0.28 respectively. (b) A queen with N favorable alleles will always have a lower colony-level DBR probability than a queen with N + 1 favorable alleles, independent of which genetic variants are favorable and which are not. Means are indicated in red, outliers in blue. (c) Absolute number of screened queens in function of the number of favorable alleles (N = 292). Only one queen was homozygous favorable for all three genetic variants of model M2.

Discussion

As previously mentioned, the original joint eight-variant model published by66 was derived from only one hybrid colony from the Amsterdam Water Dunes in The Netherlands. Of the four F1 hybrid Varroa-resistant/Varroa-sensitive queens generated, only the queen derived from the Amsterdam Water Dune stock produced drone brood offspring in which the percentage of DBR differed significantly from a Varroa-sensitive control queen (that is 51 vs. 19% resp.)66. Sixty-four phenotyped drones from this hybrid queen were subsequently used for comparative WES analysis and variant model construction by elastic-net penalized regression. Thus, contrary to our new colony-wide data set, the drones in this preceding study were all descendants from a single honey bee queen, skipping representation of ancestrally diverse population offspring and divergent genetic background in the data set for genotype–phenotype association analysis. Therefore, it was unknown whether the reported eight-variant model remained valid on a population-wide scale and a different subspecies or, with other words, was colony-specific. As such, this new study aimed for the validation of the previously discovered eight-variant model on an independently obtained, representative and population-encompassing genotype–phenotype data set. By novel eight-variant modelling with random effects to account for relatedness of samples in the population-wide data set, we found that only three of the eight previously discovered variants were significant predictors in the new eight-variant model (i.e. SNP2, SNP4 and SNP8). Moreover, the sign of the estimated effect size for SNP2 and SNP8 switched from positive in the previously published joint eight-variant model to negative in the new model, while the one for SNP4 stayed positive in both. Consequently, SNP2 and SNP8 should now be considered as protective mutations according to the new eight-variant model, while SNP4 remains a risk mutation. SNP6 was almost significant as predictor in the novel eight-variant model (p = 0.06), and showed up as a significant fixed effect in the single variant tests (p value < 0.05*) and reduced three-variant model (p value < 0.05*). The sign of the estimate of SNP6 switched from negative in the original eight-variant model to positive in the reduced three-variant model, changing it from a protective variant to a risk mutation.

The concept of ‘genetic background’ in an organism is defined as the set of so-called ‘modifier genes’ and their genotypes that interact with the gene or variation of interest, influencing the phenotype of interest too. Background effects manifest when genetically different individuals exhibit varied phenotypic outcomes despite comprising the same genotype for certain residing or induced mutation(s), which could be explained by the epistasis effect70,71,72,73. This latter effect refers to the phenomenon in which the impact of a gene mutation relies on the genotype of one or more other modifier genes interacting with the considered gene. In mice for instance, mutations induced on different ancestral backgrounds or strains have already resulted in several confounding outcomes. One example is that of the diabetes mutation Leprdb, that results in obesity with transient diabetes when introduced in a C57BL/6J strain, but obesity with overt diabetes when introduced in a C57BLKS/J strain74. Regarding our findings, it is not inconceivable that genetic background may underlie the sign switches of the estimates between the previously published model and the new models described in the current study. More specifically, the originally published joint eight-variant model was derived from a single colony arising from a Darwinian Black Bee Box (DBBB) selection program, while the Flemish population-wide sampling strategy applied in the current study included diverse genetic backgrounds from multiple A. mellifera carnica colonies66,67,75. DBBB selection involves leveraging natural selection in honey bees, in this case the European dark bee or Apis mellifera mellifera, to enhance resistance by discontinuing mite treatment in managed colonies stationed on isolated locations67. Its core principles revolve around mating within the population, specifically between the colonies' own virgin queens and drones, and selecting colonies based on their survival and prolific development67. As such, the genetic background of the Amsterdam Water Dunes population and the therefrom derived hybrid queen used for the original eight-variant model construction may be considered very niche and different from these in our new population-wide sample set.

As the new models described in the current study are derived from and thus, up to now, are only applicable on Carniolan bees, we should note that similar population-wide genotype–phenotype association studies should be performed for other subspecies in Europe (such as Apis mellifera mellifera, Apis mellifera ligustica and Apis mellifera iberiensis) to verify the predictive value of the eight variants for the DBR trait76,77. However, due to the evolutionary relatedness between Apis mellifera carnica and Apis mellifera ligustica (C lineage), we expect similar results for the Italian subspecies as described in this study for Carniolan bees77. Although Apis mellifera iberiensis was morphologically and predominantly categorized in the M lineage together with Apis mellifera mellifera (European dark bee), other studies showed that the Spanish genotype rather finds its evolutionary origin in the African A lineage, especially when considering bees from the south-western part of the Iberian Peninsula76,77,78,79,80. Thus, both subspecies should be handled separately in future equivalent studies.

No significant difference in classification performance between the new eight-variant model M1 and reduced three-variant model M2 could be found at the 5% significance level, although M2 did classify 13% more drones correctly. In addition, when focusing on the DBR phenotype only, i.e. phenotype “0”, the positive predictive value (PPV) or fraction of true cases among the predicted cases of M2 was much higher than the PPV of M1 (58 vs. 39% respectively). Given this, together with the fact that screening and selecting for three genetic variants is practically more feasible than eight, future MAS-programs can benefit from the reduced three-variant model reported in this study. Using the reduced three-variant model’s estimations and distribution of the 292 Flemish queen genotypes, we predicted an average and median colony-level DBR probability over all screened colonies of 0.26 and 0.28 respectively (Fig. 4). Due to very low drone brood sample infestation levels, accurate DBR phenotypic data on colony level is lacking to compare the predicted colony-level DBR probabilities of the screened queens with their actual DBR scores48,52. Although colony-level DBR calculations are independent from Varroa infestation levels, a sufficient amount of single infested cells (SICs) in the brood samples are required for accurate and precise estimation of this phenotype48,52. However, the range of our predicted colony-level DBR probabilities in drone brood, i.e. from 15.4% up to 42.9%, is comparable to those reported by phenotyping in other studies49,81,82,83,84.

Due to the haplodiploid system in honey bees, colonies can be screened for supply of paternal lines by pooling the respective queens’ drones’ hind legs and determining the homo-or heterozygosity of each queen for the three genetic variants in model M2, as performed in this study (Table 8, Fig. 3). Input of beneficial alleles via the patriline guarantees input of beneficial alleles in the F1 fertilized eggs, which is beneficial for queen rearing, and also guarantees beneficial alleles in that F1 queen’s drones (F2). Our results show that thirty pooled drone legs are sufficient for accurate queen SNP-genotyping by qPCR’s with dual-labelled probes, which is similar to the sample size used in another study using a 100K SNP chip for queen genotyping by pooling drones85. When pooling only ten legs from drones descending from a heterozygous queen, the probability of proportions of wild-type and variant-type drones being in the interval of [30–70%] is 0.891. With 30 pooled legs, this probability increases up to 0.984. Lab tests with post-genotyped pooled individuals showed that proportions of wild-type and variant-type drones in the interval of [30–70%] are easily capable of distinguishing heterozygous queens from homozygous queens for a specific variant of interest when using the qPCR’s with dual-labeled probes (S2 Fig). Moreover, the probability of sampling 30 drones with the same allele from a heterozygous queen is about 9.3–10 (0.530), while this probability is 0.00098 (0.510) when sampling only ten drones. Instead of pooling drones, non-destructive queen genotyping can also be performed through gDNA extraction from the queen’s feces, exuviae or queen cell, which can be performed immediately after queen emergence85,86. On the other hand, genetically screening colonies for queen rearing can be fulfilled by pooling the queens’ worker bees’ hind legs and determining the percentages of beneficial alleles of the three genetic variants in the gDNA pool by running intraplate calibration curves with known percentages of alleles. Colonies with high percentages of beneficial alleles in their worker gDNA pools highlight high probabilities on beneficial alleles in queens reared therefrom. Once paternal and maternal lines are selected, mating may be controlled by means of artificial insemination or isolated mating stations. However, the speed of allele fixation in honey bee populations being framed in an organized selection program is difficult to predict and simulate, as this highly depends on many other factors, such as the consistency of the starting population, number of participating colonies in the breeding program, undesired allele input and other traits of interest. It is hard to predict to what extent marker-assisted selection (MAS) based on the three genetic variants reported here will contribute to overall holistic colony resistance, when ‘holistic colony resistance’ is being defined as the combination of all Varroa-resistance traits (e.g. MNR, VSH, grooming behavior, etc.). In this case, it is important to thoroughly consider and ‘weight’ the colony’s performance on these traits, as well as the presence of the beneficial alleles reported here.

The main advantages of the qPCR with dual-labeled probes and the above proposed protocols for selection of paternal and maternal lines are their accessibility, ease of use and implementation, rapidity and cost. Although comparative Genome-Wide Association Studies (GWAS) or high-density SNP chip analyses between colony-encompassing drone pupae expressing the DBR trait and non-DBR drone pupae could potentially reveal more and/or other significant phenotype-associated markers, the application of Whole Genome Sequencing (WGS) or SNP chips for high rate routine screening in honey bee sciences is less accessible, more complicated, more expensive and more time-consuming compared to qPCR’s with dual-labeled probes, especially for small research institutes87. For example, the 100K High-Density Honey Bee (HDHB) SNP chip from Jones et al. contains over 100K honey bee SNPs, including 498 SNPs associated with Varroa-sensitive hygiene and 8478 SNPs associated with Varroa resistance85. Population-wide association studies similar to the one reported here could be performed using this SNP chip and reveal more SNP’s associated with the trait of interest, although it should be noted that the eight DBR-associated SNP’s targeted in the current study resulted from an elastic-net penalized regression on more than 140.000 SNP’s that were not significant on genome-wide scale in the previous comparative WES study66. Same issues could be encountered when performing comparative SNP chip—or GWAS analyses, demanding for a joint variant modelling as well, which will considerably reduce the number of phenotype-associated variants too.

The use of genetic markers in MAS has emerged as a powerful tool in countless selective breeding programs. By identifying specific regions or variants in the honey bee genome associated with desirable traits, such as Varroa resistance, genetic markers enable breeders to make efficient targeted selections and allow for a high-throughput bottom-up selective breeding approach. It is important to note that the successful implementation of genetic markers in MAS relies on a comprehensive understanding of the genetic basis of the targeted trait. In the case of the three genetic variants associated with DBR in Flanders reported in this study, further research should clarify the role of the genetic markers in Worker Brood Resistance (WBR, i.e. MNR in worker brood), but also the molecular basis of the marker-associated genes resulting in or influencing the DBR phenotype.

Materials and methods

Phenotyping for genotype–phenotype association study

Over a time period of 2 years (2020 and 2021), 162 different A. mellifera colonies from 43 Flemish carnica ssp. beekeepers were sampled for at least two square decimeter of capped drone brood with an age between 15 and 19 days (S2 Table). Each brood sample was frozen immediately after collection by the beekeeper and kept in a cold chain (− 20 °C) for transport to the laboratory. During brood analysis, capped cells were opened under a microscope and inspected for the presence of foundress mites, male mites and/or daughter mites. Pupae with a single reproducing foundress mite (i.e. at least one offspring mite) were categorized as phenotype ‘1’ (= no DBR), whereas pupae with a single non-reproducing foundress mite (i.e. total absence of offspring) were categorized as phenotype ‘0’ (= DBR). Per brood sample, analysis was ceased when 200 cells were dissected. All sampled colonies were managed according to standard beekeeping practices.

Single thorax gDNA extractions

From each of the 842 phenotyped drone pupae, the thorax was dissected and individually homogenized with 0.5 mL lysis buffer (100 mM NaCl; 20 mM Tris–HCl, pH 8; 25 mM EDTA, pH 8; 0.5% SDS), metal- and zirconium beads during 1 min at 30 Hz with a PowerLyzer® 24 Homogenizer. After incubation with 10 µL proteinase K (20 mg/mL) at 56 °C for 4 h, gDNA was extracted by addition of an equal volume of phenol:chloroform:isoamylalcohol and centrifugation at 12000g for 30′ at 4 °C, followed by transfer and extraction of the supernatant with an equal volume of chloroform and centrifugation at 12000g for 15′ at 4 °C. The gDNA in the transferred supernatant was precipitated by addition of two volumes of ice cold 100% ethanol and overnight incubation at − 20 °C. After centrifugation at 12000 g for 30′, the DNA pellet was washed with 70% ethanol, air-dried and resuspended in 100 µL DNase/RNase free water.

qPCR assays with dual-labeled probes for variant genotyping

The qPCR assays with dual-labeled probes of Bouuaert et al. (2021) were used for genotyping54. Briefly, for each gDNA sample, genotyping assays were performed in a total volume of 10 µL with 1 × KEY buffer, 250 nM of each primer, 250 nM of each dual-labeled probe, 200 µM of each dNTP, 0.5 U TEMPase Hot Start DNA Polymerase (VWR) and 20 ng gDNA. Primer and probe sequences can be found in54. The Bio-Rad C1000™ Thermal Cycler with CFX96™ Real-Time System was set at one cycle of 95 °C for 14′40″, followed by 60 cycles of [95 °C for 20″ followed by 40″ of the assay-specific annealing/elongation/signal detection temperature]54. Data analysis and allelic discrimination plot construction was done with the Bio-Rad CFX Manager 3.1 Software.

Genotype–phenotype association and modelling

RStudio version 4.2.2 was used for data exploration and—visualization. All statistical tests were checked for and complied with the required assumptions. For modelling, analyses were conducted in R version 4.1.3 (“One Push-Up”), with the packages “multcomp”, “cutpointr”, “lme4”, “lmerTest” and “pROC”. For the single variant tests, a logistic mixed-effect model with the reproduction status of the included single foundress mite (non-reproduction (DBR) = 0, reproduction (no DBR) = 1) as binary outcome variable, the beekeeper as random effect and a single variant as predictor was compared to a logistic mixed-effect model with the reproduction status of the foundress mite as outcome variable and the beekeeper as random effect by means of ANOVA testing. When significant, a post hoc test was used with a Tukey correction for multiple testing. To evaluate the classification performance of the variants for Varroa reproduction, the prediction of a logistic generalized linear mixed-effect model containing (1) all eight genetic variants as fixed effects and the beekeeper as random effect and (2) the subset of significant genetic variants from the single variant tests as fixed effects and the beekeeper as random effect was compared with the real reproduction status using a leave-one-out cross validation strategy (LOOCV). Optimal cutoff points for classification were based on the Youden index. Classification performance was analyzed using Delong’s test. None of the constructed models did converge with ‘colony’ as random variable (due to too few drones per colony). As colonies from the same beekeeper are genetically more related, beekeeper was used as a simple approximation. Throughout all analyses, significance was set at α ≤ 0.05.

DBR probability calculations based on reduced three-variant model M2

Based on the reduced three-variant model M2 and its estimates (cfr. Table 4), the probability on mite non-reproduction or DBR for each of the eight possible drone genotypes could be calculated with formulas (1) and (2):

$${\text{P}}\left( {{\text{DBR}}\;{\text{for}}\;{\text{drone}}\;{\text{genotype}}\;{\text{ABC}}} \right) = 1 - \left( {\frac{{\exp \left( {\text{Y}} \right)}}{{1 + \exp \left( {\text{Y}} \right)}}} \right)$$
(1)
$$\begin{gathered} {\text{with Y}} = 0.8461 + {\text{A*}}\left( { - 0.5599} \right) + {\text{B*}}0.4581 + {\text{C*}}0.3995 \hfill \\ \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad {\text{A}},{\text{B}},{\text{C}} = 0\;{\text{if}}\;{\text{Wt}}\;{\text{for}}\;{\text{SNP}}2;\;{\text{SNP}}4\;{\text{and}}\;{\text{SNP}}6\;{\text{resp}}. \hfill \\ \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad {\text{A}},{\text{B}},{\text{C}} = 1\;{\text{if}}\;{\text{Vt}}\;{\text{for}}\;{\text{SNP}}2;\;{\text{SNP}}4\;{\text{and}}\;{\text{SNP}}6\;{\text{resp}}. \hfill \\ \end{gathered}$$
(2)

Pooled drone leg gDNA extractions for queen genotyping and allelic frequency analysis

For each of the 292 bee queens screened in the allelic frequency analysis, 30 hind legs from 30 different drone pupae were pooled in an Eppendorf tube containing 180 µL ATL buffer from the QIAamp® DNA Micro Kit (Qiagen). After overnight incubation at 56 °C with 20 µL proteinase K, gDNA was extracted according to the manufacturer’s instructions. gDNA was eluted in 50 µL DNase/RNase free water. Queen genotyping for SNP2; SNP4 and SNP6 was performed as previously described by qPCR assays with dual labelled probes using the pooled drone gDNA as template. For each variant, allelic discrimination plots were constructed by plotting the end-point Relative Fluorescence Units (end-RFU) values of FAM (fluorescein; wild-type fluorophore) against the end-RFU values of TR (Texas Red; variant-type fluorophore) for all pooled drone leg samples. Based on these allelic discrimination plots, the genotypes of the respective queens were reconstructed. For each of the three variants, the frequency of the variant type allele was calculated as (number of heterozygous queens + 2*number of homozygous variant type queens)/(2*total number of screened queens).

Predicted colony-level DBR probabilities in drone brood from honey bee queens from Flanders

Based on the genotype of a queen, the probability on mite non-reproduction in that queen’s offspring drone brood could be predicted by using model M2. First, the probability on each of the eight possible drone genotypes was calculated for each queen. Next, these probabilities on the different drone genotypes were multiplied with the corresponding probabilities on mite-non reproduction for the respective drone genotypes. The probability on mite non-reproduction in the colony is the sum of these multiplied probabilities:

$${\text{P}}\left( {{\text{colony}} - {\text{level}}\;{\text{DBR}}} \right) = \sum \left( {{\text{P}}\left( {{\text{drone}}\;{\text{genotype}}\;{\text{ABC}}} \right){\text{*P}}\left( {{\text{DBR}}\;{\text{for}}\;{\text{drone}}\;{\text{genotype}}\;{\text{ABC}}} \right)} \right)$$