Main

The transcriptional response of a cell to genetic perturbation reveals fundamental insights into how the cell functions. Transcriptional responses can describe diverse functionality ranging from how gene regulatory machinery helps maintain cellular identity to how modulating gene expression can reverse disease phenotypes1,2,3. This has implications for biomedical research, especially in developing personalized therapeutics. For instance, validating drug targets through genetic perturbation studies increases the likelihood of successful clinical trials4. Additionally, identifying synergistic gene pairs can enhance the effectiveness of combination therapies5,6,7,8. Because complex cellular phenotypes are known to be produced by genetic interactions between small sets of genes, identifying such interactions could facilitate precise cell engineering9,10,11,12,13,14. While recent advancements have enabled scientists to more rapidly sample perturbation outcomes experimentally9,15,16,17,18,19, computational approaches that predict perturbation effects are indispensable for prioritizing experimental perturbations due to the combinatorial explosion of potential multigene combinations.

However, existing computational methods for predicting perturbational outcomes present their own limitations. The predominant approach for single-gene perturbation outcome prediction relies on inferring transcriptional relationships between genes in the form of a gene regulatory network20,21,22,23. This is limited either by the difficulty in accurately inferring a network from gene expression datasets24 or by the incompleteness of networks derived from public databases25,26,27. Moreover, existing predictive models built using such networks linearly combine the effects of individual perturbations, which renders them incapable of predicting non-additive effects of multigene perturbations, such as synergy22. More recent work uses deep neural networks trained on data from large perturbational screens to skip the network inference step and directly map genetic relationships into a latent space for perturbation outcome prediction28,29. However, these methods still require that each gene in the combination be experimentally perturbed before the effect of perturbing the combination can be predicted.

Here, we present graph-enhanced gene activation and repression simulator (GEARS), a computational method that integrates deep learning with a knowledge graph of gene–gene relationships to simulate the effects of a genetic perturbation. The incorporation of biological knowledge gives GEARS the ability to predict the outcomes of perturbing single genes or combinations of genes for which there are no prior experimental perturbation data. GEARS outperformed existing approaches in predicting the outcomes of both one-gene and two-gene perturbations drawn from seven distinct datasets. GEARS could also detect five different genetic interaction subtypes and generalize to new regions of perturbational space by predicting phenotypes that were unlike what was seen during training. Thus, GEARS can directly impact the design of future perturbational experiments.

Results

Knowledge-informed deep learning of perturbation effects

GEARS is a deep learning-based model that predicts the gene expression outcome of combinatorially perturbing a set of one or more genes (perturbation set). Given unperturbed single-cell gene expression along with the perturbation set being applied (Fig. 1a), the output is the transcriptional state of the cell following the perturbation (Methods).

Fig. 1: GEARS combines prior knowledge with deep learning to predict postperturbation gene expression.
figure 1

a, Problem formulation: given unperturbed gene expression (green) and applied perturbation (red), predict the gene expression outcome (purple). Each box corresponds to an individual gene. Arrows indicate change in expression. b, GEARS model architecture. (i) For each gene in the unperturbed state, GEARS initializes a gene embedding vector (green) and a gene perturbation embedding vector (red) (ii). These embedding vectors are assigned as node features in the gene relationship graph and the perturbation relationship graph (iii). A GNN is used to combine information between neighbors in each graph. Each resulting gene embedding is summed with the perturbation embedding of each perturbation in the perturbation set (iv). The output is combined across all genes using the cross-gene layer and fed into gene-specific output layers (v). The final result is postperturbation gene expression; MLP, multilayer perceptron.

GEARS introduces a new approach of representing each gene and its perturbation using distinct multidimensional embeddings (arbitrary vectors of numbers used to represent a meaningful concept; Fig. 1b and Supplementary Note 1)30,31. Each gene’s embedding is tuned through the course of training to represent key traits of that gene. Splitting the representation into two multidimensional components gives GEARS additional expressivity for capturing gene-specific heterogeneity of perturbation response. Each gene’s embedding is sequentially combined with the perturbation embedding of each gene in the perturbation set and finally used to predict the postperturbation state for that gene. This prediction is conditioned on a single ‘cross-gene’ embedding vector that captures transcriptome-wide information for each cell.

GEARS is uniquely able to predict the outcomes of perturbation sets that involve one or more genes for which there are no experimental perturbation data. GEARS does this by incorporating prior knowledge of gene–gene relationships using a gene coexpression knowledge graph when learning gene embeddings and a Gene Ontology (GO)-derived knowledge graph when learning gene perturbation embeddings (Methods). This relies on two biological intuitions: (i) genes that share similar expression patterns should likely respond similarly to external perturbations, and (ii) genes that are involved in similar pathways should impact the expression of similar genes after perturbation (Fig. 1b). Different knowledge graphs, such as large context-specific networks, may prove more suitable depending on the gene set of interest32 (Supplementary Note 2). GEARS functionalizes this graph-based inductive bias using a graph neural network (GNN) architecture33.

Predicting single-gene perturbation transcriptional responses

In the case of single-gene perturbations, GEARS was evaluated on the perturbation of genes whose data had been held out at the time of training, and thus those genes had not been seen experimentally perturbed during training (Fig. 2a). We used data from two different genetic perturbation screens consisting of 1,543 (RPE-1 cells) and 1,092 (K562 cells) perturbations, respectively, with each measuring over 170,000 cells (Replogle et al.34; Supplementary Notes 3 and 4). The screens were run using the Perturb-seq assay, which combines a pooled screen with a single-cell RNA-sequencing readout of the entire transcriptome for each cell16. GEARS was trained separately on each dataset. In addition to an existing deep learning-based model (CPA), we designed two alternative baseline models for evaluation of performance. One baseline model (no perturbation) assumes that the perturbation does not result in any change in gene expression. The other baseline model first infers a gene regulatory network20 and then linearly propagates the effects of perturbing a gene along this network (adapted from CellOracle22; Supplementary Notes 6 and 7).

Fig. 2: GEARS outperforms alternative approaches in predicting postperturbation gene expression.
figure 2

a, Train–test data split for single-gene perturbations. b, The m.s.e. in predicted postperturbation gene expression for single-gene perturbations normalized to the no perturbation case. For each perturbation, the 20 most differentially expressed (DE) genes were considered; perturb, perturbation; GRN, gene regulatory network. c, Pearson correlation between mean predicted postperturbation differential gene expression over control and true values across all genes. d, Fraction of the top 20 differentially expressed genes where the predicted postperturbation differential expression is in the opposite direction of the ground truth. e, Train–test data split categories for two-gene perturbations. f, Normalized m.s.e. in predicted postperturbation gene expression for two-gene perturbations. g, Boxes indicate experimentally measured differential gene expression after perturbing the gene combination FOSB and CEBPB (n = 85). The red symbol shows the mean change in gene expression predicted by GEARS when it has only seen FOSB experimentally perturbed at the time of training. The green dotted line shows mean unperturbed control gene expression. Whiskers represent the last data point within 1.5× interquartile range. h, Jaccard similarity between model-predicted differentially expressed genes and true differentially expressed genes. Throughout the figure, markers correspond to the mean and error bars correspond to 95% confidence intervals computed over predictions made by five models trained using different data splits (n = 5).

We tested model performance by measuring the mean squared error (m.s.e.; Fig. 2b) and Pearson correlation (Fig. 2c) between the predicted postperturbation gene expression and true postperturbation expression for the held-out set (Supplementary Table 1). Because the vast majority of genes do not show substantial variation between unperturbed and perturbed states, we restricted our m.s.e. analysis to the harder task of only considering the top 20 most differentially expressed genes (Supplementary Note 8). GEARS significantly outperformed all baselines on both datasets with an m.s.e. improvement of 30–50% (Fig. 2b). When considering all genes using Pearson correlation, GEARS exhibited more than two times better performance in the case of both cell lines (Fig. 2c). Additionally, GEARS displayed a clear improvement in capturing the right direction of change in expression following perturbation (Fig. 2d), which reflects a more accurate representation of regulatory relationships. We consistently observed superior performance of GEARS over baselines across metrics (Supplementary Fig. 1) and across five additional datasets, including a genome-wide perturbation screen16,18,34,35,36 (Supplementary Table 2 and Supplementary Figs. 2 and 3). Furthermore, GEARS scaled to large datasets more effectively than conventional gene regulatory network-based methods (Supplementary Table 3). Beyond transcription levels, GEARS also identified groups of genes that induced similar transcriptional responses to perturbation, even when data for their perturbation had not been seen during training (Extended Data Fig. 1 and Supplementary Note 9).

Predicting multigene perturbation outcomes

GEARS is designed to predict transcriptional outcomes for perturbation sets consisting of multiple genes. We evaluated performance using a Perturb-seq dataset (Norman et al.9) containing 131 two-gene perturbations. When evaluating GEARS on two-gene perturbations, we defined three generalization classes based on how many of the genes we see experimentally perturbed at the time of training (Fig. 2e). The first case is when the model has seen each of the two genes in the combination individually experimentally perturbed in the training data (two-gene perturbation, zero of two unseen). The other cases, which are progressively harder to predict, are when either one of the two perturbed genes (one of two unseen) or both genes (two of two unseen) have not been seen individually perturbed at the time of training (Supplementary Fig. 4 and Supplementary Note 10). GEARS improves performance by more than 30% across all cases (Fig. 2f), with the highest improvement of 53% observed when both perturbed genes in the combination are unseen. Improvements were also observed across other metrics (Supplementary Fig. 5) and on a different dataset (Supplementary Tables 2 and 4)37.

Model performance was also analyzed on a gene-by-gene basis. In the case of predicting the outcome of perturbing FOSB with CEBPB, GEARS correctly captured both the right trend and the magnitude of perturbation across all 20 differentially expressed genes (Fig. 2g) even though one of the perturbed genes (CEBPB) had not been seen experimentally perturbed during training. Moreover, the predictions were different from the transcriptional state observed in the case of the single-gene perturbation (FOSB) that was seen at the time of training the model (Supplementary Fig. 6). Similar performance was observed for several other examples across generalization categories (Supplementary Fig. 7). We also measured 50% greater enrichment in the most significant differentially expressed genes as predicted by GEARS than observed with baseline methods (Fig. 2h, Extended Data Fig. 2 and Supplementary Note 11).

Although the incorporation of knowledge graphs was instrumental in enabling these predictions (Extended Data Fig. 3 and Supplementary Fig. 8), it also limits the ability of GEARS to predict outcomes for perturbing previously unperturbed genes that are not well connected in this graph (Extended Data Fig. 4 and Supplementary Note 12). GEARS makes use of a Bayesian formulation to overcome this challenge by outputting an uncertainty metric that is inversely correlated with model performance (Supplementary Fig. 9).

Predicting non-additive combinatorial perturbation effects

In the case of a two-gene perturbation, if the outcomes of perturbing the two genes independently are already known, then a naive model could simply add the perturbational effects to estimate the effect of the combinatorial perturbation (Fig. 3a,b). However, genes are known to interact with one another to produce non-additive genetic interactions after perturbation. For example, two genes that independently cause a minor loss in cell growth could synergistically interact with one another following combinatorial perturbation to cause cell death.

Fig. 3: GEARS accurately predicts non-additive combinatorial effects and genetic interaction subtypes.
figure 3

a, Illustration of an additive interaction between two genes after perturbation. X and Y represent change over the unperturbed state caused by single-gene perturbations. Z is a combinatorial perturbation of both genes. b, Definition of genetic interaction subtypes. c, Mean precision@10 in predicting genetic interactions from 131 two-gene combinations (error bars represent s.d.). A random model performs 1,000 random draws; other models perform three predictions (n = 3). d, Change in gene expression after perturbing the combination PTPN12 and ZBTB25. The gray bars show the true mean postperturbation gene expression change (n = 257). The hatched gray bars show the true change for each of the two single-gene perturbations performed individually (PTPN12 n = 164 and ZBTB25 n = 247), which are summed by the naive additive model. The red bar indicates the prediction made by GEARS (n = 3 trained models). Error bars correspond to 95% confidence interval.

We defined five types of genetic interactions (Supplementary Note 15): synergy, suppression, neomorphism, redundancy and epistasis (Supplementary Note 16). When both genes in a two-gene combination had been individually perturbed, the genetic interaction scores predicted by GEARS showed a stronger correlation with the ground truth scores calculated using true expression than existing methods. For instance, the correlation coefficient (R2) was approximately 0.4 for synergy, neomorphism and redundancy, whereas it was only around 0.0 for the same interactions when predicted by CPA (Extended Data Fig. 5).

To identify new genetic interactions, GEARS can recommend pairs of genes that are predicted to have strong genetic interactions. To assess the real-world application of GEARS where the recommended pairs are then experimentally validated, we calculated performance metrics based on the top-ranked predictions. Precision@10 measures the fraction of predicted combinations in the top ten that truly exhibit a specific genetic interaction subtype, as determined by experimentally measured gene expression after perturbation (Supplementary Note 17). When compared to baseline methods, GEARS improved precision@10 by more than 40% for four of five genetic interaction subtypes, and the improvement exceeded 90% for redundancy and epistasis (Fig. 3c). Additionally, GEARS demonstrated a twofold increase in accuracy when predicting the ten strongest interactions for a specific genetic interaction subtype (top ten accuracy; Extended Data Fig. 6b). Further validation using an additional dataset confirmed the effectiveness of GEARS, showing a 20% increase in accuracy across four genetic interaction subtypes. Moreover, the precision–recall curves for all observed genetic interaction subtypes exhibited a higher area under the curve than other methods (Supplementary Fig. 12)37. In scenarios where only one gene had been perturbed previously, GEARS successfully detected synergistic and suppressive interactions (Supplementary Fig. 13).

Different types of genetic interactions can also be evaluated at the level of individual genes. For this, the 20 most affected genes were identified for each two-gene combination (Supplementary Note 18). Based on the m.s.e. for these genes, GEARS was able to capture the effects of different types of genetic interactions more than 40% better than existing methods across three of the five genetic interaction subtypes (Extended Data Fig. 6a). As an example, GEARS predicted the correct non-additive effects across almost all of the top ten non-additively expressed genes following the perturbation of PTPN12 and ZBTB25 (Fig. 3d). This was also observed across other examples belonging to different genetic interaction subtypes (Supplementary Fig. 14).

Predicting new biologically meaningful phenotypes

We applied GEARS to the discovery of new phenotypes by predicting the outcomes of all pairwise combinatorial perturbations of 102 genes from the Norman et al. dataset9 (Fig. 4a). To make this prediction, GEARS was trained using the postperturbational gene expression profiles for both one-gene perturbation outcomes and 128 two-gene perturbation outcomes (Fig. 4b and Supplementary Note 13). The predicted postperturbation expression captured many distinct phenotypic clusters, including those previously identified in Norman et al.9 (Fig. 4c and Supplementary Note 13). Additionally, GEARS predicts a few new phenotypes, including one cluster showing high expression of erythroid markers.

Fig. 4: GEARS can predict new biologically meaningful phenotypes.
figure 4

a, Workflow for predicting all pairwise combinatorial perturbation outcomes of a set of genes. b, Low-dimensional representation of postperturbation gene expression for 102 one-gene perturbations and 128 two-gene perturbations used to train GEARS. A random selection is labeled. c, GEARS predicts postperturbation gene expression for all 5,151 pairwise combinations of the 102 single genes seen experimentally perturbed. Predicted postperturbation phenotypes (non-black symbols) are often different from phenotypes seen experimentally (black symbols). Colors indicate Leiden clusters labeled using marker gene expression (Supplementary Information).

To ascertain the biological relevance of this newly predicted phenotype, which was not observed in the training data, we compared it with data for proerythroblasts from the Tabula Sapiens cell atlas (Supplementary Fig. 10 and Supplementary Note 14). While this cluster’s distinct high erythroid marker expression has still not been experimentally validated, its identification demonstrates the ability of GEARS to expand the space of postperturbation phenotypes beyond what is observed in perturbational experiments. Moreover, we validated the robustness of this prediction by excluding all phenotypically similar postperturbation outcomes during training (Supplementary Fig. 11).

Mapping combinatorial space of diverse genetic interactions

We extended our analysis to predict genetic interactions among all possible pairwise combinations of 102 genes (Fig. 5a), following CRISPRa-based combinatorial gene activation9. By leveraging the predicted postperturbation gene expression for each of the 5,151 pairwise combinatorial perturbations, we constructed a genetic interaction map that could simultaneously represent five distinct types of genetic interactions: synergy, suppression, neomorphism, redundancy and epistasis. The genetic interaction map revealed a rich and diverse landscape of genetic interactions, with many genes exhibiting strong tendencies toward specific genetic interaction subtypes (Fig. 5b). This effect is most evident in the interactions between functionally related genes, which is in line with previous experimental results15,16,38. For instance, genes involved in early erythroid differentiation pathways (PTPN12, IKZF3 and LHX1) show a consistent trend of strong synergistic interactions with one another. Moreover, the uniqueness of this genetic interaction map is in how it captures a much broader range of interactions than a conventional genetic interaction map, which focuses primarily on synergistic or buffering interactions (Supplementary Fig. 15)15.

Fig. 5: GEARS can search perturbational space for novel genetic interactions of different subtypes.
figure 5

a, Workflow for predicting genetic interaction (GI) scores. b, Multidimensional genetic interaction map generated by GEARS for all pairwise combinations of 102 single genes perturbed in Norman et al.9. For each combination, GEARS predicted genetic interaction scores for five different genetic interactions: synergy and suppression (red to blue), neomorphism (green), redundancy (orange) and epistasis (purple).

To validate some of these predictions, we used data from a cell fitness screen that perturbed all pairwise combinations of 92 genes9 (Supplementary Note 19). GEARS performed comparably to a real Perturb-seq experiment in capturing the strong interaction effects observed in the cell fitness screen (Extended Data Fig. 7). The distribution of GEARS-predicted genetic interaction scores was significantly higher for perturbations showing synergistic cell fitness effects (P < 0.0013, n = 123; data were analyzed by one-sided t-test comparing the means) and lower for those showing buffering effects (P < 4 × 10−5, n = 69) than those showing approximately additive cell fitness effects. These findings increase our confidence that several strong interactions captured in the genetic interaction map are biologically meaningful even though not all predictions have been experimentally validated. When trained to directly predict cell fitness, GEARS also showed strong performance (R2 between 0.64 and 0.93; Supplementary Figs. 16 and 17 and Supplementary Note 20).

Discussion

Recent advancements in high-throughput perturbational screens have enhanced both the precision with which genes can be targeted39,40 and the scale of information generated17,34. However, their scalability is limited due to cost. As CRISPR-based perturbational screens become more widely used in drug discovery, GEARS can serve as a valuable complement to these experiments. GEARS has the unique ability to infer a broader range of multigene perturbation outcomes using the same experimental data as existing methods19,41. Furthermore, GEARS can guide the design of new screens by identifying perturbations that maximize information gained and minimize experimental costs (Extended Data Fig. 4).

However, for reliable predictions, GEARS must be trained on the same cell type or experimental condition. Moreover, training GEARS using combinatorial perturbation data is essential for accurate prediction of multigene perturbations. Various confounding factors in the data can also influence the accuracy of predictions, including cell cycle effects, the assumed success of gene editing experiments and heterogeneity in postperturbation distribution (Supplementary Note 21).

One of the important strengths of GEARS is detecting emergent interactions between pairs of genes. This feature enhances the discovery of feasible routes for engineering cell identity, where cells are guided between transcriptional states that may be significantly different from one another. For example, GEARS can aid in the precise reengineering of immune cells to prevent exhaustion when targeting cancer14,42 or in the reversal of phenotypes linked to aging43,44,45. Moreover, models like GEARS could predict effective cocktails of transcription factors for reprogramming induced pluripotent stem cells into individual-specific in vitro models46,47,48,49,50. Therefore, GEARS holds promise to not only impact the discovery of novel small molecules for targeting disease but also aid in designing the next generation of cell- and gene-based therapeutics.

Methods

Overview of GEARS

GEARS considers a perturbation dataset of N cells \({{{\mathcal{D}}}}={\{({{{{\bf{g}}}}}^{i},{{{{\mathcal{P}}}}}^{i})\}}_{i = 1}^{N}\), where \({{{{\bf{g}}}}}^{i}\in {{\mathbb{R}}}^{K}\) is the gene expression vector of cell i with K genes, and \({{{{\mathcal{P}}}}}^{i}=({P}_{1}^{i},\cdots \,,{P}_{M}^{i})\) is the set of perturbations of size M performed on cell i. M = 0 corresponds to an unperturbed cell. Each perturbation Pk in the set corresponds to the index of a gene. The goal of GEARS is to learn a function f that maps a novel perturbation set \({{{\mathcal{P}}}}\) to its postperturbation outcome, which is a gene expression vector g.

Specifically, given a perturbation set \({{{\mathcal{P}}}}=({P}_{1},\cdots \,,{P}_{M})\), GEARS first applies a GNN encoder \({f}_{{{\mbox{pert}}}}:{\mathbb{Z}}\longrightarrow {{\mathbb{R}}}^{d}\) that maps each genetic perturbation \(P\in {{{\mathcal{P}}}}\) to a d-dimensional gene perturbation embedding. Another GNN-based encoder \({f}_{{{\mbox{gene}}}}:{\mathbb{Z}}\longrightarrow {{\mathbb{R}}}^{d}\) maps each gene into a gene embedding. GEARS then combines the set of perturbation embeddings with each of the gene embeddings using a compositional module. A cross-gene decoder \({f}_{{{\mbox{dec}}}}:{\{{{\mathbb{R}}}_{i}^{d}\}}_{i = 1}^{K}\longrightarrow {{\mathbb{R}}}^{K}\) then takes in the set of perturbed gene embeddings and maps them to the postperturbation gene expression vector. The entire network is trained end to end with an autofocus direction-aware loss (Supplementary Note 22).

Gene coexpression graph encoder

To capture the relative heterogeneity of perturbational response for each gene, GEARS represents each gene \(u\in {\mathbb{Z}}\) as a learnable embedding \({{{{\bf{x}}}}}^{{{\mbox{gene}}}}\in {{\mathbb{R}}}^{d}\) instead of a scalar. GEARS first obtains a representation for each gene that captures coexpression patterns in the cell. For this, we apply a GNN on a gene coexpression graph \({{{{\mathcal{G}}}}}_{{{\mbox{gene}}}}\), where edges link coexpressed genes (nodes). GEARS calculates Pearson correlations ρu,v among genes u,v in the training dataset. For each gene u, we connect it to the top Hgene genes that have the highest ρu,v and are above a threshold δ. Next, we apply a GNN parameterized by θg that augments every gene u’s embedding \({{{{\bf{x}}}}}_{u}^{\,{{\mbox{gene}}}\,}\) by integrating information from the embeddings of its coexpressed genes: \({{{{\bf{h}}}}}_{u}^{\,{{\mbox{gene}}}\,}={{{{\rm{GNN}}}}}_{{\theta }_{g}}\left({{{{\bf{x}}}}}_{u}^{\,{{\mbox{gene}}}\,},{{{{\mathcal{G}}}}}_{{{\mbox{gene}}}}\right)\in {{\mathbb{R}}}^{d}\).

Incorporating prior knowledge of gene–gene relationships using the GO graph

GEARS predicts the outcome of perturbing genes never seen perturbed before by constructing a gene perturbation similarity graph \({{{{\mathcal{G}}}}}_{{{\mbox{pert}}}}\), leveraging the pathway information contained in GO51. We first define \({{{{\mathcal{G}}}}}_{{{\mbox{GO}}}}\) as a bipartite graph where an edge links a gene to a pathway GO term. We denote \({{{{\mathcal{N}}}}}_{u}\) as the set of pathways for a gene u. We compute the Jaccard index between a pair of genes u,v as \({J}_{u,v}=\frac{| {{{{\mathcal{N}}}}}_{u}\cap {{{{\mathcal{N}}}}}_{v}| }{| {{{{\mathcal{N}}}}}_{u}\cup {{{{\mathcal{N}}}}}_{v}| }\); this measures the fraction of shared pathways between the two genes. For each gene u, we then select the top Hpert gene v with the highest Ju,v to construct \({{{{\mathcal{G}}}}}_{{{\mbox{pert}}}}\). Next, we initialize all possible gene perturbations (P1,,PK) with learnable embeddings \(({{{{\bf{x}}}}}_{1}^{\,{{\mbox{pert}}}\,},\cdots \,,{{{{\bf{x}}}}}_{K}^{\,{{\mbox{pert}}}\,})\). We then feed them into a GNN parameterized by θp to augment every perturbation v’s embedding \({{{{\bf{x}}}}}_{v}^{\,{{\mbox{pert}}}\,}\) by integrating information from neighboring perturbations in \({{{{\mathcal{G}}}}}_{{{\mbox{pert}}}}\): \({{{{\bf{h}}}}}_{v}^{\,{{\mbox{pert}}}\,}={{{{\rm{GNN}}}}}_{{\theta }_{p}}({{{{\bf{x}}}}}_{v}^{\,{{\mbox{pert}}}\,},{{{{\mathcal{G}}}}}_{{{\mbox{pert}}}})\in {{\mathbb{R}}}^{d}\).

Modeling combinatorial perturbations across genes

Given a perturbation set \({{{\mathcal{P}}}}=({P}_{1},\cdots \,,{P}_{M})\), GEARS looks up the perturbation embedding of each element of that set \(({{{{\bf{h}}}}}_{{P}_{1}}^{\,{{\mbox{pert}}}\,},\cdots \,,{{{{\bf{h}}}}}_{{P}_{M}}^{\,{{\mbox{pert}}}\,})\). To model multigene perturbations, we use the ‘sum’ compositional operator followed by an MLP: \({{{{\bf{h}}}}}^{{{{\mathcal{P}}}}}={{{{\rm{MLP}}}}}_{{\theta }_{c}}\left(\mathop{\sum }\nolimits_{i = 1}^{M}{{{{\bf{h}}}}}_{{P}_{i}}^{\,{{\mbox{pert}}}\,}\right)\). The ‘sum’ operator allows extendability to perturbations of any size. Thus, each perturbation embedding from \(({{{{\bf{h}}}}}_{{P}_{1}}^{\,{{\mbox{pert}}}\,},\cdots \,,{{{{\bf{h}}}}}_{{P}_{M}}^{\,{{\mbox{pert}}}\,})\) is applied to every gene embedding to obtain a postperturbation gene embedding. For gene u, we have \({{{{\bf{h}}}}}_{u}^{\,{{\mbox{post-pert}}}\,}={{{{\rm{MLP}}}}}_{{\theta }_{pp}}\left({{{{\bf{h}}}}}_{u}^{\,{{\mbox{gene}}}\,}+{{{{\bf{h}}}}}^{{{{\mathcal{P}}}}}\right)\).

Cross-gene effects and gene-specific decoder

Following application of the perturbations in the embedding space, GEARS maps the postperturbation gene embedding to its corresponding postperturbation gene expression vector. Because each gene has its own perturbation pattern, for every gene u, we apply a gene-specific linear layer parameterized by \({{{{\bf{w}}}}}_{u}\in {{\mathbb{R}}}^{d},{b}_{u}\in {\mathbb{R}}\) to map it to a scalar of perturbation gene expression effect \({{{{\bf{z}}}}}_{u}={{{{\bf{w}}}}}_{u}{{{{\bf{h}}}}}_{u}^{\,{{\mbox{post-pert}}}\,}+{b}_{u}\in {\mathbb{R}}\). We then concatenate the individual effect to a single perturbation effect vector \({{{\bf{z}}}}\in {{\mathbb{R}}}^{K}\) for the cell. Because the perturbational effect on a gene can incur secondary effects on other genes, we wanted to use the transcriptome-wide ‘cross-gene’ information for the cell when predicting final gene expression for each gene. Thus, we added an additional MLP that generates a cross-gene embedding for the cell \({{{{\bf{h}}}}}^{{{\mbox{cg}}}}={{{{\rm{MLP}}}}}_{{\theta }_{cg}}\left({{{\bf{z}}}}\right)\in {{\mathbb{R}}}^{d}\). Conditioned on this cross-gene state, for every gene u, a gene-specific decoder parameterized by \({{{{\bf{w}}}}}_{u}^{\,{{\mbox{cg}}}\,}\in {{\mathbb{R}}}^{d+1},{b}_{u}^{\,{{\mbox{cg}}}\,}\in {\mathbb{R}}\) augments zu to \({\hat{{{{\bf{z}}}}}}_{u}={{{{\bf{w}}}}}_{u}^{\,{{\mbox{cg}}}\,}\left({{{{\bf{z}}}}}_{u}\parallel {{{{\bf{h}}}}}^{{{\mbox{cg}}}}\right)+{b}_{u}^{{{\mbox{cg}}}\,}\in {\mathbb{R}}\), where the double bar notation () refers to the vector concatenation operation. Finally, the predicted perturbation effect vector \(\hat{{{{\bf{z}}}}}\in {{\mathbb{R}}}^{K}\) is added to the gene expression of a randomly sampled unperturbed control cell (gctrl) to arrive at the predicted postperturbation gene expression vector for that cell \(\hat{{{{\bf{g}}}}}=\hat{{{{\bf{z}}}}}+{{{{\bf{g}}}}}_{{{{\rm{ctrl}}}}}\). This allows GEARS to focus only on learning perturbation effects.

Autofocus direction-aware loss

GEARS optimizes model parameters to fit the predicted \(\hat{{{{\bf{g}}}}}\) postperturbation gene expression to true postperturbation gene expression g using stochastic gradient descent. We designed an autofocus loss that automatically gives a higher weight to differentially expressed genes by elevating the exponent of the error. Given a minibatch of T perturbations, where each perturbation k has Tk cells and each cell has K genes with predicted postperturbation gene expression \(\hat{{{{\bf{g}}}}}\) and true expression g, the loss is defined as

$${L}_{{{{\rm{autofocus}}}}}=\frac{1}{T}\mathop{\sum }\limits_{k=1}^{T}\frac{1}{{T}_{k}}\mathop{\sum }\limits_{l=1}^{{T}_{k}}\frac{1}{K}\mathop{\sum }\limits_{u=1}^{K}{({{{{\bf{g}}}}}_{u}-{\hat{{{{\bf{g}}}}}}_{u})}^{(2+\gamma )}.$$

However, this loss is insensitive to directionality. To address this, GEARS incorporates an additional direction-aware loss

$${L}_{{{{\rm{direction}}}}}=\frac{1}{T}\mathop{\sum }\limits_{k=1}^{T}\frac{1}{{T}_{k}}\mathop{\sum }\limits_{l=1}^{{T}_{k}}\frac{1}{G}\mathop{\sum }\limits_{u=1}^{K}{\left[{{{\rm{sign}}}}\left({{{{\bf{g}}}}}_{u}-{{{{\bf{g}}}}}_{u}^{{{{\rm{ctrl}}}}}\right)-{{{\rm{sign}}}}\left({\hat{{{{\bf{g}}}}}}_{u}-{{{{\bf{g}}}}}_{u}^{{{{\rm{ctrl}}}}}\right)\right]}^{2}.$$

The prediction loss function is L = Lautofocus + λLdirection, where λ adjusts the weight for the directionality loss.

Uncertainty

GEARS generates an uncertainty score to measure the confidence of model prediction on a novel perturbation. A Gaussian likelihood \({{{\mathcal{N}}}}({\hat{{{{\bf{g}}}}}}_{u},{\hat{\sigma }}_{u}^{2})\) is used to model the postperturbation gene expression value for gene u under perturbation \({{{\mathcal{P}}}}\), where \({\hat{{{{\bf{g}}}}}}_{u}\) is the predicted postperturbation scalar and \({\hat{\sigma }}_{u}^{2}\) is the variance52. We add an additional gene-specific layer to predict the log variance term \({s}_{u}=\log {\hat{\sigma }}_{u}^{2}={{{{\bf{w}}}}}_{u}^{{{{\rm{unc}}}}}{{{{\bf{h}}}}}_{u}^{\,{{\mbox{post-pert}}}}+{b}_{u}^{{{\mbox{unc}}}\,}\) for each gene u and learn it through a modified Bayesian neural network loss52

$${L}_{{{{\rm{unc}}}}}=\frac{1}{T}\mathop{\sum }\limits_{k=1}^{T}\frac{1}{{T}_{k}}\mathop{\sum }\limits_{l=1}^{{T}_{k}}\frac{1}{G}\mathop{\sum }\limits_{u=1}^{K}\exp (-{s}_{u}){({{{{\bf{g}}}}}_{u}-{\hat{{{{\bf{g}}}}}}_{u})}^{(2+\gamma )}.$$

By encouraging log variance to be large when the error is large, the log variance is learned to be a proxy of model uncertainty.

Reporting summary

Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.