Introduction

Disordered materials have attracted much attention in recent years due to their exotic structural and electronic properties, such as Anderson localization and Mott-like conduction1,2,3, novel phonon scattering channels and lattice dynamics4,5,6, enhanced ductility and mechanical strength over a wide temperature range in medium- and high-entropy alloys7,8,9,10,11, and regulated electronic states and atomic sites for catalysts12,13, endowing them with promising applications in electronic devices and energy materials. Depending on their chemical nature, disorder effects in materials can be classified into structural disorder, characterized by a disrupted chemical bonding network (such as vacancies, dislocations, and dangling bonds)14, and configurational (compositional) disorder, characterized by crystallographic sites being occupied by irregular atomic species15. In this work, we focus on the latter type of disorder.

To numerically access configurational disorder properties, such as the order-disorder phase transition temperature16,17 and the configurational entropy18,19, Monte Carlo (MC) simulations are often carried out with Metropolis sampling20 or Wang-Landau sampling21,22. With Metropolis sampling, the free energy is obtained by numerical thermodynamic integration over the average energies at each temperature. With Wang-Landau sampling, the density of states, rather than the average energies, is estimated, from which the configurational entropy and the heat capacity at any temperature can be evaluated. However, both sampling methods require efficient and accurate evaluation of a large number of supercell configurational energies to achieve convergence, which makes them impractical to combine directly with first-principles methods, especially when the cell size is large.
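For concreteness, a single Metropolis step for configurational sampling amounts to proposing an atom swap and accepting it with the Boltzmann probability. The following is a minimal sketch, where `energy_fn` stands for any configurational energy evaluator (the expense of this callable is exactly what limits first-principles approaches):

```python
import numpy as np

def metropolis_step(config, energy_fn, beta, rng):
    """One swap-move Metropolis step; config is an array of species labels."""
    i, j = rng.choice(len(config), size=2, replace=False)
    new = config.copy()
    new[i], new[j] = new[j], new[i]          # propose swapping two atoms
    dE = energy_fn(new) - energy_fn(config)  # energy change of the swap
    if dE <= 0 or rng.random() < np.exp(-beta * dE):
        return new                           # accept
    return config                            # reject
```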

One commonly used approach to this problem is the cluster expansion (CE) method23,24,25,26,27,28, in which the cell is decomposed into atomic clusters up to a cutoff size and the total energy of the cell is expanded in terms of the effective cluster interactions. This method has been applied to evaluate the total energies of different configurations and, in turn, to calculate disorder properties effectively. However, it suffers from several limitations. Because the cluster size is limited in practice, it cannot capture long-range order in materials, which may affect the phase stability and electronic structure29. Moreover, the cluster definitions depend strongly on the atomic positions, which prevents the method from adapting to lattice distortions and local atomic displacements induced by atomic relaxations or thermal effects30.

Since 2018, graph neural networks (GNNs) have been applied to study structure-property correlations in complex solid-state materials. Instead of relying on human-selected descriptors, GNNs can autonomously learn latent representations of materials and make fast and accurate atom-, bond-, and material-level predictions31,32,33. In this work, we therefore employ GNNs to evaluate configurational energies and to access the disorder properties of configurationally complex compound materials, exploiting their high representation capability and versatile adaptability to realistic simulation scenarios, including lattice distortions, atomic displacements, and various types of defects32,33. Specifically, we construct attention-based GNN models from Transformer neural networks34, leveraging masked self-attentional layers to obtain configurational energies efficiently. The model is trained on a face-centered tetragonal (fct) gold copper (AuxCu1−x) alloy dataset obtained from density functional theory (DFT) calculations. The trained model reproduces the DFT configurational energies well, with a mean absolute error (MAE) of 2.76 meV/atom, which leads to a predicted order-disorder phase transition temperature comparable to experimental observations. When random atomic displacements are introduced, which is beyond the capability of the CE method, the GNN model can still evaluate the configurational energies accurately (with an MAE of 5.02 meV/atom). The predicted phase transition temperature is slightly lower than in the undistorted case, suggesting that structural disorder can enhance configurational disorder. Furthermore, we reveal the connection between the variance of the energy deviations among configurations and the accuracy of the configurational entropy and heat capacity predictions, providing guidance for future data-driven studies of configurational disorder properties in materials.

Results

Pristine AuxCu1−x structure dataset

We choose fct AuCu to construct our dataset. Although both experimental and numerical results on the configurational entropy and the phase transition temperature (683 K) have been reported27,35, those results are based on the face-centered cubic (fcc) structure, which is higher in energy than the ground-state fct structure by 0.016 eV per formula unit and is only stable under pressure. However, the two structures differ only in their lattice constants (Δa/a = 6.9% and Δc/c = 15%), and we therefore expect their disorder properties, especially the phase transition temperature, to be similar. Using first-principles calculations, we construct a dataset for pristine AuxCu1−x containing 4500 configurations in a 5 × 5 × 4 supercell with 200 atoms, covering all chemical compositions x ∈ [0, 1]. The concentration distribution of the dataset is shown in Fig. 1a.

Fig. 1: The performance of the Transformer-based GNN model on the pristine AuxCu1−x structure dataset.
figure 1

a The histogram of the number of configurations for each Au concentration. b The comparison between the configurational energies (per atom) predicted from DFT (EDFT) and the optimal GNN model (EGNN). The color of each point represents the concentration x of Au atoms in the configuration. c The calculated normalized density of states of pristine AuCu during the MC simulation process (per cell). The total number of MC steps is 4 × 10⁷. d The calculated configurational entropy (black curve) and the heat capacity (red curve) at different temperatures.

Our GNN model features a typical global graph feature regression architecture, as shown in Fig. 2. It consists of a few stacked graph convolution layers, which aggregate information between adjacent atoms/nodes, a pooling layer, which extracts the global feature as the mean of all node features, and a multilayer perceptron network, which transforms the global feature into the energy (per atom) of the crystal (see Methods section). Therefore, given each atomic configuration as a crystal graph, our GNN model can efficiently output its energy. To train the GNN models, we randomly shuffle the dataset and divide it into training, validation, and testing sets in a 60:20:20 ratio. We implement early stopping to prevent overfitting on the training set and choose the GNN model with the minimum validation loss as our final model for testing and for the subsequent MC simulations. In addition, we use the Bayesian optimization method implemented in Optuna36 to select the best set of hyperparameters for the GNN model (the full list of hyperparameters can be found in the Methods section).
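As an illustration of this architecture, the following is a minimal sketch using PyTorch Geometric; the layer widths and the graph attribute names (data.z for species indices, data.edge_attr for interatomic distances) are illustrative assumptions, not the hyperparameters used in this work:

```python
import torch
import torch.nn as nn
from torch_geometric.nn import TransformerConv, global_mean_pool

class EnergyGNN(nn.Module):
    """Sketch: stacked Transformer convolutions, per-layer global mean
    pooling summed as shortcut connections, and an MLP readout."""
    def __init__(self, num_species=2, hidden=64, heads=4, num_layers=3):
        super().__init__()
        self.embed = nn.Embedding(num_species, hidden)   # species -> node feature
        self.convs = nn.ModuleList([
            TransformerConv(hidden, hidden, heads=heads, concat=False, edge_dim=1)
            for _ in range(num_layers)
        ])
        # two fully connected layers map the global feature to energy per atom
        self.head = nn.Sequential(nn.Linear(hidden, hidden), nn.ReLU(),
                                  nn.Linear(hidden, 1))

    def forward(self, data):
        x = self.embed(data.z)                            # node features
        g = 0
        for conv in self.convs:
            x = conv(x, data.edge_index, data.edge_attr).relu()
            g = g + global_mean_pool(x, data.batch)       # summed shortcuts
        return self.head(g).squeeze(-1)                   # energy per atom
```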

Fig. 2: A schematic plot of the workflow to use GNN to calculate the disorder properties of compound materials.
figure 2

The input crystal structure is converted to a graph, whose nodes and edges represent the atomic species and the interatomic distances, respectively. Through several attention-based convolution layers, where the node features aggregate and interact with each other, global features are extracted and further processed by linear layers to predict the energy. The well-trained GNN model is subsequently utilized in Monte Carlo simulations to acquire the final disorder-related properties, such as the configurational density of states, configurational entropy, and heat capacity.

The MAE of the best GNN model built upon Transformer convolution layers on the testing set of our fct AuxCu1−x dataset is 2.76 meV/atom, comparable to the error previously reported for CE methods on fcc-structure AuCu (4.49 meV/atom)27. The comparison between the predicted energies and the DFT energies is shown in Fig. 1b, demonstrating the capability of our attention-based GNN model to evaluate configurational energies accurately for all compositions of AuxCu1−x.

We compare this result with other commonly used GNNs with an attention mechanism, such as the graph attention network (GAT)37,38, and those without an attention mechanism, including the crystal graph neural network (CGNN)39 and the edge-conditioned neural network (ECNN)40,41 (details about these convolution layers can be found in Supplementary Note 1). The MAEs of the best GNN model for each type of convolution layer are summarized in Table 1, suggesting that GNN models with the attention mechanism outperform those without it when evaluating configurational energies. In attention-based neural networks, node features are updated by summing adjacent node features weighted by attention coefficients, which are calculated from the node and edge features and thus encode the similarity between the two nodes. In Transformer convolution layers specifically, the central node and the adjacent nodes are treated as queries and keys, respectively, and the overlap between queries and keys reflects the similarity between the two nodes (atoms). This attention mechanism updates the central node features with a linear combination of adjacent node features weighted by the overlap between the query vector and the key vectors in the latent space, allowing the network to effectively capture the chemical distinctions between neighboring atoms and increasing its efficacy in predicting properties related to the crystal structure. Moreover, since the attention coefficients depend on the node features, they are dynamic and can vary across convolution layers, in contrast to the static weight matrices that are fixed across convolution layers in models without the attention mechanism. We therefore expect models with the attention mechanism to generally outperform those without it. Finally, we also observe the superior performance of the ECNN model over the CGNN model, possibly because the ECNN model processes edge features with a multilayer perceptron, whereas the CGNN model applies only a linear transformation to them. This distinction makes the former more adaptable in evaluating configurational energies.
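Schematically, a single-head Transformer convolution update takes the form below (our paraphrase of the Transformer convolution34; the W's are learned projections, eij are edge features, and d is the latent dimension):

$$\mathbf{x}_i^{\prime}=W_1\mathbf{x}_i+\sum_{j\in \mathcal{N}(i)}\alpha_{ij}\left(W_2\mathbf{x}_j+W_3\mathbf{e}_{ij}\right),\qquad \alpha_{ij}=\mathrm{softmax}_j\left(\frac{\left(W_4\mathbf{x}_i\right)^{\top}\left(W_5\mathbf{x}_j+W_3\mathbf{e}_{ij}\right)}{\sqrt{d}}\right)$$

The softmax over the query-key overlaps is what makes the aggregation weights depend on the features themselves, in contrast to the fixed weight matrices of non-attention models.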

Table 1 The MAE (meV/atom) and the variance (meV²/atom) on the pristine AuxCu1−x dataset using different types of graph neural network models

We then apply the optimal GNN model based on the Transformer attention mechanism in MC simulations to obtain the configurational properties of stoichiometric AuCu systems, as shown in Fig. 2. We use the Wang-Landau sampling method to obtain the configurational density of states g(E) (see Methods section), from which we can calculate the configurational entropy S and the configurational heat capacity Cv at any temperature T according to

$$S=\frac{\langle E\rangle -F}{T},\quad {C}_{v}=\frac{\langle {E}^{2}\rangle -{\langle E\rangle }^{2}}{{k}_{{{{\rm{B}}}}}{T}^{2}}$$
(1)

where the partition function is \(Z=\int g(E)\,{e}^{-\beta E}\,dE\), the free energy is \(F=-{k}_{{{{\rm{B}}}}}T\ln Z\), the expected value for a physical quantity Q is \(\langle Q\rangle =\frac{1}{Z}\int g(E)\,Q(E)\,{e}^{-\beta E}\,dE\), and \(\beta =\frac{1}{{k}_{{{{\rm{B}}}}}T}\).
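In practice, g(E) is estimated on discrete energy bins, so the integrals reduce to sums. A minimal numerical sketch of Eq. (1), assuming bin-center energies in eV and an (unnormalized) log density of states:

```python
import numpy as np

kB = 8.617333e-5  # Boltzmann constant in eV/K

def entropy_and_cv(E, ln_g, T):
    """Entropy and heat capacity from a discretized density of states.
    E: bin-center energies (eV); ln_g: log density of states; T: K."""
    beta = 1.0 / (kB * T)
    w = ln_g - beta * E              # log Boltzmann weights (unnormalized)
    wmax = w.max()                   # log-sum-exp trick for stability
    Z = np.exp(w - wmax).sum()
    lnZ = wmax + np.log(Z)
    p = np.exp(w - wmax) / Z         # normalized Boltzmann probabilities
    E_avg = (p * E).sum()
    F = -kB * T * lnZ
    S = (E_avg - F) / T                              # Eq. (1), entropy
    Cv = ((p * E**2).sum() - E_avg**2) / (kB * T**2) # Eq. (1), heat capacity
    return S, Cv
```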

Because the configurational space is complicated for such large supercells, we choose a bin size of 0.3 eV in the Wang-Landau sampling method to expedite convergence. Since the MAE of the trained GNN model is around 2.76 meV/atom, we anticipate that each configuration has an energy deviation of about 0.6 eV/cell on average, justifying this choice of bin size. With this bin size, the MC simulation takes around 4 × 10⁷ steps to converge, beyond the capability of DFT methods. The intermediate and final densities of states, fitted to Gaussian distributions, are shown in Fig. 1c. The density of states is normalized such that it sums to one; the normalization constant is the total number of configurations of stoichiometric AuCu in a 200-atom supercell, i.e., \({\Omega }_{\max }=\left(\begin{array}{c}200\\ 100\end{array}\right)\). The peak of the density of states lies 3.2 eV/cell above the ground state. From the density of states, we calculate the configurational entropy and the heat capacity according to Eq. (1), shown in Fig. 1d. The position of the heat capacity peak, which indicates the order-disorder phase transition temperature, is at 870 K (for fct AuCu), comparable to the experimental values27,35. As the temperature increases further, the configurational entropy gradually approaches the theoretical limit for the fully disordered phase (within a 200-atom supercell), \({S}_{\max }={k}_{{{{\rm{B}}}}}\ln {\Omega }_{\max }=1.17\times 1{0}^{-2}\) eV/K.

We also note that, when training and selecting the best GNN model for MC simulations, batch normalization layers can significantly affect the predictions of disorder properties and should be avoided. When training GNN models, configurations are grouped into batches of a reasonable size to accelerate the training process. In general, batches that are too small can destabilize the gradients and the optimization process, while batches that are too large can lead to huge memory requirements and possible overfitting. In MC simulations, however, energies are evaluated one configuration at a time. When a model trained with batch-normalized features is used to predict the energy of a single configuration, incorrect predictions may arise because the batch mean and batch variance are then dominated by the specific configuration in the batch and do not reflect the feature distribution of the whole dataset. In Supplementary Note 2, we show benchmark calculations for the GNN model with batch normalization layers. When the configurations are evaluated in batches (containing 58 configurations), the MAE of the best model is 3.52 meV/atom, but when they are evaluated one at a time, the MAE increases drastically to 2.76 eV/atom, and the corresponding prediction of the order-disorder phase transition temperature is incorrect (183 K).
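The underlying hazard can be reproduced in a few lines: in training mode, a batch normalization layer normalizes with the statistics of whatever batch it receives, so the same configuration yields different features depending on its companions. This is a toy illustration, not our production model:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
bn = nn.BatchNorm1d(4).train()  # training mode: statistics come from the batch
x = torch.randn(58, 4)          # 58 feature vectors, as in a training batch

a = bn(x)[0]                    # sample 0 normalized within a batch of 58
b = bn(x[:2])[0]                # the same sample normalized within a batch of 2
print((a - b).abs().max())      # large: features depend on batch composition
```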

Distorted AuxCu1−x structure dataset

To showcase the adaptability of GNN models to realistic simulation scenarios, such as atomic relaxations, where CE methods face convergence challenges, we construct another dataset containing 4500 configurations with random atomic displacements. The maximum displacement of each atom is chosen to be 0.35 Å.

The MAE on the testing set of this distorted-structure dataset is 6.43 meV/atom, and the comparison between the DFT and GNN energies is shown in Fig. 3a. Larger deviations from the DFT energies are primarily observed in configurations with nearly equal amounts of Au and Cu atoms, whose energies are more difficult to capture owing to their large configuration space compared with configurations containing mostly Au or mostly Cu atoms. We further apply the trained model to calculate the disorder properties, shown in Fig. 3b. We generate one supercell with random atomic displacements and use it in the MC simulations while keeping all atomic positions fixed. The calculated phase transition temperature is 788 K, lower than that of the pristine structure, suggesting that structural disorder can enhance configurational disorder: introducing structural disorder increases the likelihood of the material being in the configurationally disordered phase.

Fig. 3: The performance on the dataset with random atomic displacements, using MSE as the target loss function.
figure 3

a The comparison between the configurational energies (per atom) obtained from DFT (EDFT) and the optimal GNN model (EGNN). b The calculated configurational entropy (black curve) and the heat capacity (red curve) at different temperatures. c, d Same as (a, b), but on a dataset without displacements and using variance as the target loss function.

A potential avenue for enhancing the performance of GNN models for energy predictions of disordered systems with distorted structures lies in modifying how crystal structures are represented as graphs. In our current method, the nodes represent the atomic species and the edges contain only bond-distance information. However, other local chemical information can also be included in the graphs, such as directional information (for instance, bond directions expanded in spherical harmonic coefficients) and multiplet interactions, which can be encoded in hypergraphs42 or higher-order graphs43. These graph structures may capture the much more complex chemical environment of each atom after random atomic displacements are introduced. An alternative avenue for improvement is to modify the network structure. For example, the Au concentration can be treated as a global feature of each configuration, concatenated with the global features pooled from the nodes and passed jointly through the linear layers to determine the total energy of the configuration, as sketched below. Despite the room for these improvements, our current model already suffices for effective predictions of the disorder properties of complex alloy systems with distorted structures.
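A hypothetical readout implementing this proposed modification could look as follows; the per-graph attribute name data.x_au is an assumption for illustration, and the MLP input width must be widened by one accordingly:

```python
import torch
from torch_geometric.nn import global_mean_pool

def readout(node_feats, data, mlp):
    """Pool node features and append the Au concentration (assumed stored
    per graph as data.x_au) before the final linear layers."""
    g = global_mean_pool(node_feats, data.batch)   # [num_graphs, hidden_dim]
    conc = data.x_au.view(-1, 1)                   # [num_graphs, 1]
    return mlp(torch.cat([g, conc], dim=-1)).squeeze(-1)
```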

Discussion

Next, we draw the reader's attention to the correlation between disorder properties and the optimization process of GNN models. From Eq. (1), it can be shown that neither the heat capacity nor the configurational entropy is affected by an overall energy shift ΔE applied to all configurations (see Supplementary Note 3). Based on this observation, we postulate that, when training and selecting the best GNN model to predict the configurational entropy and heat capacity, it is the variance of the differences between EDFT and EGNN among all configurations, rather than the mean of the energy differences, that controls the accuracy of the predictions and can thus be chosen as the target quantity for optimization.

In the following, we denote \({\hat{E}}_{i}\) as the energy predicted by GNN models and Ei as the “ground-truth” energy obtained by DFT for configuration i, and their difference as \({e}_{i}={\hat{E}}_{i}-{E}_{i}\) (in the following, thermodynamic quantities with hats are calculated from \({\hat{E}}_{i}\), and those without hats are from Ei). We assume that those energy differences are independent and identically distributed random variables, following the normal distribution \({{{\mathcal{N}}}}(\mu ,{\sigma }^{2})\), with mean μ and variance σ2; thus we have \({\mathbb{E}}[{e}_{i}]=\mu\), \({\mathbb{E}}[{e}_{i}^{2}]={\mu }^{2}+{\sigma }^{2}\), and \({\mathbb{E}}[{e}_{i}{e}_{j}]={\mu }^{2}\) for all i and j ≠ i.

To derive the expected deviations of the configurational entropy and the heat capacity due to the ei's, we first focus on the difference of the log-partition-function, defined as \(\Delta (\ln Z)\equiv \ln \hat{Z}-\ln Z=\ln {\sum }_{i}{e}^{-\beta {\hat{E}}_{i}}-\ln {\sum }_{i}{e}^{-\beta {E}_{i}}\). Assuming that the errors ei are small compared with the energies Ei, we can Taylor expand the log-sum-exp function \(\ln {\sum }_{i}{e}^{-\beta {\hat{E}}_{i}}=\ln {\sum }_{i}{e}^{-\beta ({E}_{i}+{e}_{i})}\) with respect to ei. As shown in Supplementary Note 3, after taking the expectation value with respect to the random variables ei, the first-order term reduces to a uniform shift that cancels in the entropy and heat capacity derived below; up to this shift, we have

$${\mathbb{E}}[\Delta (\ln Z)]\,\approx \,\frac{1}{2}{\sigma }^{2}{\beta }^{2}(1-\alpha )$$
(2)

where \(\alpha =\frac{{\sum }_{i}{e}^{-2\beta {E}_{i}}}{{({\sum }_{i}{e}^{-\beta {E}_{i}})}^{2}}\). Based on this result, according to Eq. (1), the expected deviation of the configurational entropy due to the random energy deviations ei, defined as \({\mathbb{E}}[\Delta S]\equiv {\mathbb{E}}[\hat{S}-S]\), is given by

$${\mathbb{E}}[\Delta S]=-\frac{1}{T}\frac{\partial }{\partial \beta }{\mathbb{E}}[\Delta (\ln Z)]+{k}_{{{{\rm{B}}}}}{\mathbb{E}}[\Delta (\ln Z)]\,\approx -\frac{{\sigma }^{2}}{2{k}_{{{{\rm{B}}}}}{T}^{2}}(1-\alpha )$$
(3)

where we use the fact that α, although it contains β through the Boltzmann factors, is treated as independent of β in the differentiation. Similarly, the expected deviation of the heat capacity is given by

$${\mathbb{E}}[\Delta {C}_{v}]=\frac{1}{{k}_{{{{\rm{B}}}}}{T}^{2}}\frac{{\partial }^{2}}{\partial {\beta }^{2}}{\mathbb{E}}[\Delta (\ln Z)]\,\approx \,\frac{{\sigma }^{2}}{{k}_{{{{\rm{B}}}}}{T}^{2}}(1-\alpha )$$
(4)

The derivation details can be found in Supplementary Note 3. Therefore, both \({\mathbb{E}}[\Delta S]\) and \({\mathbb{E}}[\Delta {C}_{v}]\) depend linearly on the variance σ2 but are independent of the mean μ, indicating that the variance σ2 of the energy deviations ei among the configurations must be minimized to accurately predict the configurational entropy and heat capacity.
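For orientation, the expansion behind Eq. (2) can be sketched as follows (our sketch, writing \({p}_{i}={e}^{-\beta {E}_{i}}/Z\) for the Boltzmann weights; full details are in Supplementary Note 3):

$$\Delta (\ln Z)=\ln \sum_i p_i\,{e}^{-\beta {e}_{i}}\approx -\beta \sum_i p_i e_i+\frac{{\beta }^{2}}{2}\left[\sum_i p_i e_i^{2}-{\Big(\sum_i p_i e_i\Big)}^{2}\right]$$

Taking the expectation with \({\mathbb{E}}[{e}_{i}]=\mu\), \({\mathbb{E}}[{e}_{i}^{2}]={\mu }^{2}+{\sigma }^{2}\), and \({\mathbb{E}}[{e}_{i}{e}_{j}]={\mu }^{2}\) for i ≠ j gives \(-\beta \mu +\frac{1}{2}{\sigma }^{2}{\beta }^{2}(1-\alpha )\), where \(\alpha ={\sum }_{i}{p}_{i}^{2}\); the −βμ term is exactly the uniform energy shift that drops out of both Eqs. (3) and (4), consistent with the observation above.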

To verify this numerically, we train our Transformer-based GNN model on the pristine AuxCu1−x dataset using the variance as the target loss function. The variance of the optimal model on the testing set is 15.82 meV²/atom, but the MAE is 4.89 eV/atom; for comparison, the variance of our previous model trained with the MSE loss is 13.2 meV²/atom, with an MAE of 2.76 meV/atom. In Fig. 3c, we show the comparison between the configurational energies predicted by GNN and DFT, which reveals a global energy shift across all configurations. We then apply this model in MC simulations, and the predicted order-disorder phase transition temperature is 837 K, in good agreement with the previous result using the MSE loss (870 K). Therefore, as long as the variance is minimized, the predicted configurational entropy and heat capacity are reliable.
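Such a shift-invariant loss is straightforward to implement; a minimal PyTorch sketch:

```python
import torch

def variance_loss(pred, target):
    """Variance of the per-configuration deviations e_i = Ê_i - E_i over a
    batch; invariant under a uniform shift of all predictions."""
    e = pred - target
    return ((e - e.mean()) ** 2).mean()
```

Because a uniform offset of all predictions leaves this loss unchanged, the optimizer has no incentive to remove it, which is consistent with the global energy shift observed in Fig. 3c.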

In conclusion, we demonstrate the capability of GNNs with an attention mechanism to accurately predict configurational disorder properties in compound materials, including the configurational entropy and the order-disorder phase transition temperature. Using the face-centered tetragonal AuxCu1−x dataset as an example, the phase transition temperature predicted from attention-based GNN models and MC simulations is close to that obtained in experiments. Even when random atomic displacements are introduced, reliable predictions remain achievable. Furthermore, we show that the variance of the configurational energy deviations between GNN and DFT controls the prediction accuracy of these disorder-related properties. These results provide perspectives on the efficient and accurate evaluation of disorder properties in configurationally complex materials, contributing to future research on the phase stability of such materials and advancing the exploration of medium- and high-entropy alloys and related material systems.

Methods

Density functional theory calculations

The first-principles calculations are performed with DFT, as implemented in the Vienna Ab initio Simulation Package44,45. We use projector augmented wave pseudopotentials46,47, in which the 5d and 6s electrons are treated as valence electrons for Au and the 3d and 4s electrons as valence electrons for Cu. We use the GGA-PBE functional for all calculations48 and a kinetic energy cutoff of 550 eV for the plane-wave basis sets. The k-point grid density is taken to be 0.03 × 2π/Å, and the energy convergence threshold is 10⁻⁷ eV.

After variable-cell relaxation, the ground state structure of stoichiometric AuCu is the face-centered-tetragonal structure. The relaxed lattice constants are a = 2.86 Å and c = 3.55 Å. From the relaxed unit cell, we generate the 5 × 5 × 4 supercell, such that the lattice constant along each direction is close to 15 Å. Using the supercell, we generate the AuxCu1−x dataset covering the whole concentration range 0 < x < 1.

Constructing and training GNN models

In each layer of a GNN, the node features are updated using the features of neighboring nodes according to \({{{{\bf{x}}}}}_{i}^{{\prime} }=\phi ({{{{\bf{x}}}}}_{i},{{{{\bf{x}}}}}_{{{{\mathcal{N}}}}(i)})\), where xi is the feature of the ith node and \({{{\mathcal{N}}}}(i)\) denotes the nodes adjacent to node i. This message aggregation process is repeated as the convolution layers are stacked, allowing GNNs to capture long-range interactions in the crystal. Depending on the aggregation function ϕ, various types of GNNs have been proposed. In this work, we choose the attention-based Transformer network34 to construct our GNN and to generate the main results in the manuscript.

For each layer, we use global mean pooling to extract the global graph feature from all nodes, since we choose the total energy per atom as the target quantity to predict. These per-layer global features are added together, forming shortcut connections that allow the gradient to flow more easily during training (benchmark results using only the global features from the last convolution layer can be found in Supplementary Note 2). Finally, the summed global features are passed through two fully connected linear layers to predict the total energy per atom.

For training the GNN models, unless otherwise stated, the target loss function is the MSE, and we use the Adam algorithm to minimize it, with both the learning rate and the weight decay treated as hyperparameters. The validation set is used to prevent overfitting on the training set. While training the GNN models, we track the validation loss and use the model with the smallest validation loss as our final model for testing and for the subsequent MC simulations.
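A schematic training loop following this recipe is shown below; the hyperparameter values and the batch attribute batch.y (target energies per atom) are placeholders:

```python
import copy
import torch

def train(model, train_loader, val_loader, lr=1e-3, weight_decay=1e-5,
          max_epochs=500):
    """Adam + MSE training with the best-on-validation model retained."""
    opt = torch.optim.Adam(model.parameters(), lr=lr,
                           weight_decay=weight_decay)
    loss_fn = torch.nn.MSELoss()
    best_val, best_state = float("inf"), None
    for epoch in range(max_epochs):
        model.train()
        for batch in train_loader:
            opt.zero_grad()
            loss = loss_fn(model(batch), batch.y)
            loss.backward()
            opt.step()
        model.eval()
        with torch.no_grad():
            val = sum(loss_fn(model(b), b.y).item() for b in val_loader)
        if val < best_val:               # keep the best model on validation
            best_val, best_state = val, copy.deepcopy(model.state_dict())
    model.load_state_dict(best_state)
    return model
```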

To choose the optimal set of hyperparameters, we use the Bayesian optimization method implemented in Optuna36, which estimates the expected improvement of a candidate set of hyperparameters from previous trial runs using the Tree-structured Parzen Estimator method49. Convergence is reached when no significantly different hyperparameter values are proposed for ten consecutive trial runs. The hyperparameter set includes the number of convolution layers, the number of hidden channels, the number of attention heads (used only in GNN models with attention mechanisms), the learning rate, the weight decay, and the batch size.
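A minimal Optuna sketch of this search over the listed hyperparameters; the search ranges and the helpers build_model and train_and_validate are illustrative assumptions:

```python
import optuna

def objective(trial):
    params = {
        "num_layers": trial.suggest_int("num_layers", 2, 6),
        "hidden": trial.suggest_int("hidden", 32, 256),
        "heads": trial.suggest_int("heads", 1, 8),
        "lr": trial.suggest_float("lr", 1e-4, 1e-2, log=True),
        "weight_decay": trial.suggest_float("weight_decay", 1e-6, 1e-3,
                                            log=True),
        "batch_size": trial.suggest_categorical("batch_size", [16, 32, 64]),
    }
    model = build_model(params)                 # hypothetical constructor
    return train_and_validate(model, params)   # best validation loss

study = optuna.create_study(direction="minimize",
                            sampler=optuna.samplers.TPESampler())
study.optimize(objective, n_trials=100)
```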

Monte Carlo simulations

MC simulations with the Wang-Landau sampling method21 are then carried out to obtain the density of states g(E), with the configurational energies evaluated by the trained GNN models. The flatness criterion is met when the minimum value of the histogram is no smaller than 80% of its mean value, and convergence is reached when the modification factor satisfies \(\ln f < 1{0}^{-7}\). Since the configurational space of the supercell is complicated, at least around 10⁷ configurations are needed for the MC simulation to converge.
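A schematic Wang-Landau loop implementing the criteria quoted above; energy_fn would wrap the trained GNN, propose_swap exchanges two atoms, and the starting configuration is assumed to lie inside the energy window:

```python
import numpy as np

def wang_landau(energy_fn, propose_swap, config, e_min, e_max,
                bin_w=0.3, flatness=0.8, ln_f_min=1e-7, check_every=10000):
    """Sketch: estimate ln g(E) on bins of width bin_w (eV)."""
    nbins = int(np.ceil((e_max - e_min) / bin_w))
    ln_g = np.zeros(nbins)          # running estimate of ln g(E)
    hist = np.zeros(nbins)
    ln_f = 1.0                      # initial modification factor
    e = energy_fn(config)
    b = int((e - e_min) / bin_w)
    step = 0
    while ln_f > ln_f_min:
        new = propose_swap(config)
        e_new = energy_fn(new)
        b_new = int((e_new - e_min) / bin_w)
        # accept with probability min(1, g(E)/g(E_new));
        # moves outside the energy window are rejected
        if 0 <= b_new < nbins and \
           np.log(np.random.rand()) < ln_g[b] - ln_g[b_new]:
            config, e, b = new, e_new, b_new
        ln_g[b] += ln_f             # update the visited bin
        hist[b] += 1
        step += 1
        # flatness check: min(H) >= 80% of mean(H)
        if step % check_every == 0 and hist.min() >= flatness * hist.mean():
            hist[:] = 0             # flat histogram reached: reduce ln f
            ln_f /= 2.0
    return ln_g
```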