As promising next-generation candidates for applications in aeroengines, L1_{2}-strengthened cobalt (Co)-based superalloys have attracted extensive attention. However, the L1_{2} strengthening phase in first-generation Co-Al-W-based superalloys is metastable, and both its solvus temperature and mechanical properties still need improvement. It is therefore necessary to discover new L1_{2}-strengthened Co-based superalloy systems with a stable L1_{2} phase by exploring the effect of alloying elements on its stability. Traditional first-principles calculations can provide the crystal structure and mechanical properties of the L1_{2} phase doped with transition metals, but they suffer from low efficiency and relatively high computational costs. The present study combines machine learning (ML) with first-principles calculations to accelerate crystal structure and mechanical property predictions, with the latter providing both the training and validation datasets. Three ML models are established and trained to predict the occupancy of alloying elements in the supercell and the stability and mechanical properties of the L1_{2} phase. The ML predictions are evaluated using first-principles calculations, and the accompanying data are used to further refine the ML models. Our ML-accelerated first-principles approach predicts the crystal structure and mechanical properties of Co-V-Ta and Co-Al-V-based systems more efficiently than its traditional counterpart. The approach can also be applied to expedite crystal structure and mechanical property calculations, and thus the design and discovery, of other advanced materials beyond Co-based superalloys.
Ni-based superalloys have been widely used in the aviation, aerospace and petrochemical industries due to their superior combination of highly desirable properties, such as microstructural stability, mechanical properties, and oxidation and hot corrosion resistance at elevated temperatures^{[1,2]}. The signature coherent γ/γ' two-phase precipitate microstructure maintains the strength of the superalloys under high-temperature conditions^{[3]}. However, due to the limitation of the melting temperature of elemental Ni
Nevertheless, exploring the high-dimensional composition and temperature space through alloying with traditional trial-and-error experimental methods is labor-intensive and time-consuming. To guide the design and discovery of new L1_{2}-strengthened Co-based superalloys with enhanced mechanical properties, basic information on the L1_{2} phase, such as its crystal structure and atomic occupancy (defined as the site occupied by a doped transition metal (TM)), is highly desirable. Through structural optimization and static calculations based on first-principles calculations, the ground-state static energy of the L1_{2} phase at 0 K can be accurately calculated, from which the stable formation enthalpy and reaction energy of the L1_{2} phase can be derived^{[22,23]}. First-principles calculations can also be combined with Hooke's law to predict the elastic constants of the supercell of Co-based superalloys, which allows mechanical properties such as the bulk, shear and elastic moduli to be predicted^{[24,25]}. However, traditional first-principles calculation procedures are tedious and require significant computational resources. For a system with more than four elements, the number of non-equivalent sites for each element in the supercell increases dramatically with the number of element types, which significantly raises the computational cost and reduces the computational efficiency. Therefore, an alternative approach is required to improve the computational efficiency and speed up alloy discovery^{[26]}.
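As an illustration of the last step, the sketch below converts a set of elastic constants into bulk, shear and elastic (Young's) moduli. It assumes cubic symmetry (only C11, C12 and C44 are needed) and the Voigt-Reuss-Hill averaging scheme; both are common choices but are not confirmed here to be the exact scheme used in this work, and the function name `cubic_vrh_moduli` is hypothetical.

```python
def cubic_vrh_moduli(c11: float, c12: float, c44: float) -> dict:
    """Bulk (B), shear (G) and Young's (E) moduli of a cubic crystal, Voigt-Reuss-Hill average.

    Inputs and outputs share the same unit (e.g. GPa).
    """
    # Born mechanical-stability criteria for cubic symmetry
    if not (c11 - c12 > 0 and c44 > 0 and c11 + 2 * c12 > 0):
        raise ValueError("elastic constants violate the cubic Born stability criteria")
    bulk = (c11 + 2 * c12) / 3                      # Voigt and Reuss bounds coincide for cubic B
    shear_voigt = (c11 - c12 + 3 * c44) / 5
    shear_reuss = 5 * (c11 - c12) * c44 / (4 * c44 + 3 * (c11 - c12))
    shear = (shear_voigt + shear_reuss) / 2         # Hill average of the two bounds
    young = 9 * bulk * shear / (3 * bulk + shear)
    return {"B": bulk, "G": shear, "E": young}


print(cubic_vrh_moduli(c11=300.0, c12=100.0, c44=100.0))
```

For an elastically isotropic input such as C11 = 300, C12 = 100 and C44 = 100 GPa, the Voigt and Reuss shear bounds coincide, giving B ≈ 166.7 GPa, G = 100 GPa and E = 250 GPa.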
To date, there has been a push towards big data and artificial intelligence in materials research^{[27,28]}. Machine learning (ML) refers to algorithms that can “automatically” acquire new knowledge, as humans do: they mine existing data, extract key information, establish a predictive model that describes the relationship between influencing factors and a target property, and use the model to predict new materials in unknown systems^{[26]}. ML-based methods have been widely used to assist the design and discovery of a broad range of materials, including alloys, ceramics and composites, polymers, two-dimensional materials, organic-inorganic hybrids, and so on^{[29,30]}. Using ML algorithms, new materials with excellent performance have been developed successfully and efficiently. However, most of the data used to train the models are collected from experimental studies^{[31-36]}. Only a few studies have relied on data from first-principles calculations to train ML algorithms. For example, Guo
To overcome the inherently low efficiency of predicting the crystal structure and mechanical properties of the L1_{2} phase with conventional first-principles calculations, an ML-accelerated first-principles approach is proposed in the present work. First, ML algorithms are established and trained using data provided by conventional density functional theory (DFT) calculations. A small number of predictions made by these ML models are then validated by first-principles calculations, and the resulting data are used to improve the ML models if necessary. Finally, the models are employed to predict the crystal structure and mechanical properties of the L1_{2} phase. These predictions may provide a theoretical basis for the design and discovery of new L1_{2}-strengthened Co-based superalloys. In particular, the ML-assisted method is found to be more than twice as efficient as conventional first-principles calculations alone.
To obtain the crystal structure and mechanical properties of new L1_{2}-strengthened Co-based superalloys more efficiently, ML algorithms are combined with first-principles calculations to predict these properties in three steps.
Before applying ML algorithms, the first-principles calculation procedure is analyzed in detail to determine how the ML models should be established, as shown in
Schematic workflow of ML-assisted first-principles calculations for designing L1_{2}-strengthened Co-based superalloys.
In this study, we propose an approach based on ML algorithms for predicting the crystal structure and mechanical properties of the L1_{2} phase in new Co-based superalloys. It consists of three steps, namely, site occupancy prediction, stability prediction and mechanical property prediction, mirroring the first-principles calculation procedure described above. Because the reaction energies and formation enthalpies of different superalloy systems are not numerically comparable, classification algorithms are selected to make qualitative judgments, rather than quantitative predictions, of the occupancy of the doped TM atoms and of the relative stability of the doped L1_{2} and D0_{19} phases.
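The sketch below shows one plausible way to turn first-principles energies into class labels, which is why classification sidesteps the comparability problem: only the ordering of energies within one system matters. The lowest-energy rule for occupancy and the L1_{2}-versus-D0_{19} enthalpy comparison for stability are assumptions for illustration, not the exact expressions used in this work; the function names and energies are hypothetical.

```python
from typing import Mapping


def occupancy_label(reaction_energies: Mapping[str, float]) -> str:
    """Return the site ('#1', '#2' or '#3') with the lowest reaction energy (eV).

    Only the ordering of energies within one alloy system is used, so the
    label is comparable across systems even when the raw energies are not.
    """
    return min(reaction_energies, key=reaction_energies.__getitem__)


def stability_label(dh_l12: float, dh_d019: float) -> int:
    """Return 1 if the doped L1_2 structure has the lower formation enthalpy
    (i.e. it is favoured over the competing D0_19 structure), otherwise 0."""
    return int(dh_l12 < dh_d019)


# Hypothetical energies (eV), for illustration only
print(occupancy_label({"#1": 0.31, "#2": -0.12, "#3": 0.05}))  # prints #2
print(stability_label(-0.205, -0.198))                          # prints 1
```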
First-principles calculations are employed to generate data for training the ML models and for verifying their predictions, so that the ML models can be improved iteratively. The details of the first-principles calculations are briefly summarized below. In general, first-principles calculations can only deal with completely ordered phases. If a completely ordered structure can be found whose correlation functions are close to those of a disordered alloy, the structure is considered to reflect the configuration of the disordered alloy and is used as its cell model in the calculation. The essence of the special quasirandom structure (SQS) method is to find such a completely ordered structure that represents the disordered structure by matching the correlation functions^{[38,39]}. We therefore use the SQS method to construct 2 × 2 × 2 supercells of the Co-based superalloys and consider two types of structures for the Co-Al-W, Co-V-Ti, Co-V-Ir, Co-V-Ta and Co-Al-V-based systems, namely, the AuCu_{3} and Ni_{3}Sn prototype structures, corresponding to the L1_{2} and D0_{19} phases, respectively^{[39,40]} (see
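For orientation, the sketch below enumerates the sites of a 2 × 2 × 2 supercell built from the AuCu_{3} (L1_{2}) prototype: 32 atoms, of which 24 are Co sites and 8 belong to the X/Y sublattice. It then spreads X and Y randomly over that sublattice. This random decoration is only a stand-in: a genuine SQS additionally searches for the decoration whose correlation functions best match the disordered alloy (for example, with the `mcsqs` code of the ATAT package), which this sketch does not do. The function names and the 4 Al + 4 W example are hypothetical.

```python
import numpy as np

# AuCu3 (L1_2) prototype in fractional coordinates: the X/Y sublattice ("B") sits at the
# cube corner and Co occupies the three face centres.
L12_BASIS = {
    "B": [(0.0, 0.0, 0.0)],
    "Co": [(0.5, 0.5, 0.0), (0.5, 0.0, 0.5), (0.0, 0.5, 0.5)],
}


def l12_supercell(n=2):
    """Return species labels and fractional coordinates of an n x n x n L1_2 supercell."""
    species, frac = [], []
    for i in range(n):
        for j in range(n):
            for k in range(n):
                shift = np.array([i, j, k], dtype=float)
                for label, sites in L12_BASIS.items():
                    for site in sites:
                        species.append(label)
                        frac.append((np.array(site) + shift) / n)
    return species, np.array(frac)


def decorate_b_sublattice(species, n_x, n_y, x="X", y="Y", seed=0):
    """Randomly place n_x atoms of x and n_y atoms of y on the 'B' sites (not an SQS search)."""
    b_sites = [i for i, label in enumerate(species) if label == "B"]
    if n_x + n_y != len(b_sites):
        raise ValueError(f"need exactly {len(b_sites)} X + Y atoms, got {n_x + n_y}")
    labels = [x] * n_x + [y] * n_y
    order = np.random.default_rng(seed).permutation(len(labels))
    decorated = list(species)
    for index, j in zip(b_sites, order):
        decorated[index] = labels[j]
    return decorated


species, frac = l12_supercell(2)
cell = decorate_b_sublattice(species, 4, 4, x="Al", y="W")
print(len(cell), cell.count("Co"), cell.count("Al"), cell.count("W"))  # 32 24 4 4
```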
Crystal structures of (A) Co_{3}(Al, W); (B) Co_{3}(V, Ti); (C) Co_{3}(V, Ir); (D) Co_{3}(V, Ta) and (E) Co_{3}(Al, V) of L1_{2}-ordered γ'-Co_{3}(X, Y); and (F) Co_{3}(Al, W); (G) Co_{3}(V, Ti); (H) Co_{3}(V, Ir); (I) Co_{3}(V, Ta) and (J) Co_{3}(Al, V) of D0_{19}-ordered γ'-Co_{3}(X, Y). Sites #1, #2 and #3 are occupied by Co, X and Y, respectively.
The Vienna Ab initio Simulation Package (VASP) is used to perform all first-principles calculations with the projector augmented wave (PAW) method^{[42-46]} and the Perdew-Burke-Ernzerhof (PBE) exchange-correlation functional within the generalized gradient approximation (GGA)^{[23]}. During structural relaxation, the convergence criteria for the energy and the maximum force are set to 10^{-5} eV/atom and 10^{-3} eV/Å, respectively. The kinetic energy cutoff is set to 450 eV. Spin polarization is considered in the calculations because of the presence of ferromagnetic Co. The Brillouin zones are sampled using
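The settings above could be written to a VASP INCAR file roughly as follows. This is a minimal sketch under stated assumptions: VASP's `EDIFF` tag is a tolerance on the total energy of the cell rather than per atom, so the per-atom value is scaled by the number of atoms here; a negative `EDIFFG` selects a force criterion in eV/Å; `ISPIN = 2` switches on spin polarization. The helper name `render_incar` is hypothetical, and tags the text does not report (such as `ISMEAR`, `IBRION` or `MAGMOM`) are deliberately left out.

```python
def render_incar(n_atoms: int, ediff_per_atom: float = 1e-5,
                 max_force: float = 1e-3, encut: float = 450) -> str:
    """Render a minimal INCAR string from the convergence settings reported in the text."""
    tags = {
        "ENCUT": f"{encut:g}",                          # plane-wave kinetic energy cutoff (eV)
        "EDIFF": f"{ediff_per_atom * n_atoms:.1E}",     # assumed: per-atom criterion scaled to the cell
        "EDIFFG": f"{-abs(max_force):.1E}",             # negative => stop when all forces < |EDIFFG| (eV/Å)
        "ISPIN": "2",                                   # spin-polarized, because Co is ferromagnetic
    }
    return "".join(f"{key} = {value}\n" for key, value in tags.items())


print(render_incar(n_atoms=32))  # 2 x 2 x 2 L1_2 supercell
```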
Determining the occupancy of the TM dopants in the L1_{2} phase is a vital prerequisite for obtaining an accurate atomic configuration. The occupancy of an alloying element can be evaluated using the binding energy^{[23]} and the impurity formation energy^{[47]}. Each calculated system contains three main elements, designated according to the name of the alloy system. For instance, Co, Al and W are main elements #1, #2 and #3 in the Co-Al-W system, respectively. To determine the role played by each TM element, the reaction energy
where
The stability of the L1_{2} phase is then evaluated by comparing the stable formation enthalpy
where
Elastic properties, such as the bulk (
The data for the L1_{2} phase in the new Co-based superalloys with TM alloying elements are first generated by first-principles calculations. A total of 61 data points from the Co-Al-W, Co-V-Ti and Co-V-Ir-based systems are collected to construct the training set; all of them are listed in Supplementary Table 1^{[49,57]}. The data are characterized as follows:
(1) The names of the main and doping elements are replaced by their microscopic characteristics, namely, the melting point, boiling point, density, atomic weight, atomic radius, covalent radius, electronegativity and first ionization energy;
(2) For the occupancy prediction model, the microscopic characteristics of the main and doping elements are set as
(3) For the L1_{2} phase stability prediction model, the microscopic characteristics of the main and doping elements and the occupancy of the doping elements are set as
(4) For the L1_{2} phase mechanical property prediction model, the microscopic characteristics of the main and doping elements, the occupancy of the doping elements and the L1_{2} phase stability are set as
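Items (1)-(4) above amount to building one numeric vector per doped configuration, with each downstream model receiving the upstream labels as extra inputs. The sketch below shows this for three of the eight listed descriptors. The values are approximate handbook figures for illustration only (not the paper's dataset), the integer encoding of occupancy (1, 2, 3 for sites #1-#3) is an assumption, and `featurize` is a hypothetical helper.

```python
import numpy as np

# Approximate handbook values, for illustration only:
# (atomic weight, Pauling electronegativity, melting point in K)
DESCRIPTORS = {
    "Co": (58.933, 1.88, 1768.0),
    "Al": (26.982, 1.61, 933.5),
    "W": (183.84, 2.36, 3695.0),
    "V": (50.942, 1.63, 2183.0),
    "Ta": (180.95, 1.50, 3290.0),
}


def featurize(main_elements, dopant, occupancy=None, stable=None):
    """Replace element names by descriptor values and append optional upstream labels.

    main_elements: names of main elements #1, #2, #3, e.g. ("Co", "Al", "W")
    occupancy: dopant site encoded as 1, 2 or 3 (input of the stability model)
    stable: 1/0 stability label of the L1_2 phase (input of the mechanical-property model)
    """
    vector = [value for element in (*main_elements, dopant) for value in DESCRIPTORS[element]]
    if occupancy is not None:
        vector.append(float(occupancy))
    if stable is not None:
        vector.append(float(stable))
    return np.array(vector)


x_occ = featurize(("Co", "Al", "W"), "Ta")                           # occupancy model input
x_stab = featurize(("Co", "Al", "W"), "Ta", occupancy=2)             # stability model input
x_mech = featurize(("Co", "Al", "W"), "Ta", occupancy=2, stable=1)   # mechanical model input
print(x_occ.shape, x_stab.shape, x_mech.shape)  # (12,) (13,) (14,)
```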
Two research routes are available:
Route
Route
According to the “no free lunch” theorem^{[58]}, no single algorithm is best in all situations: if one algorithm (algorithm A) outperforms another (algorithm B) on a specific dataset, there will be another dataset on which algorithm A is inferior to algorithm B. Therefore, a variety of ML algorithms are first employed to predict the crystal structure and mechanical properties of the L1_{2} phase, and their performance is then evaluated and compared. The algorithm with the best performance is selected to make predictions.
Random forest classification, gradient boosting classification (GBC), AdaBoost classification, a support vector machine, an artificial neural network (ANN), K-nearest neighbor classification and Gaussian process classification are selected to establish the classification models. The regression models are established using random forest regression, gradient boosting regression, AdaBoost regression, support vector regression, an ANN, K-nearest neighbor regression and Gaussian process regression.
All ML algorithms are run in Python 3 using the scikit-learn (sklearn) package. All calculations are performed on a PC (Microsoft Windows 10, Intel Core i7-10875H CPU at 2.30 GHz, 16 GB of RAM).
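A minimal sketch of this screening with scikit-learn is shown below, using the seven classifier families listed above and 10-fold stratified cross-validation. The data are synthetic (`make_classification` with 61 samples, 25 features and 3 classes standing in for sites #1-#3), default hyperparameters are used, and feature standardization inside the pipeline is an assumption; the accuracies it prints say nothing about the superalloy data.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import (AdaBoostClassifier, GradientBoostingClassifier,
                              RandomForestClassifier)
from sklearn.gaussian_process import GaussianProcessClassifier
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC


def rank_classifiers(X, y, n_splits=10, seed=0):
    """Return {model name: mean cross-validated accuracy}, best model first."""
    candidates = {
        "random forest": RandomForestClassifier(random_state=seed),
        "gradient boosting (GBC)": GradientBoostingClassifier(random_state=seed),
        "AdaBoost": AdaBoostClassifier(random_state=seed),
        "support vector machine": SVC(),
        "ANN (MLP)": MLPClassifier(max_iter=2000, random_state=seed),
        "K-nearest neighbors": KNeighborsClassifier(),
        "Gaussian process": GaussianProcessClassifier(random_state=seed),
    }
    cv = StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=seed)
    scores = {}
    for name, estimator in candidates.items():
        # Scaling inside the pipeline: each fold is standardized with its own training statistics
        pipeline = make_pipeline(StandardScaler(), estimator)
        scores[name] = cross_val_score(pipeline, X, y, cv=cv, scoring="accuracy").mean()
    return dict(sorted(scores.items(), key=lambda item: item[1], reverse=True))


# Synthetic stand-in for the 61-sample training set (3 classes ~ sites #1, #2, #3)
X_demo, y_demo = make_classification(n_samples=61, n_features=25, n_informative=5,
                                     n_classes=3, random_state=0)
ranking = rank_classifiers(X_demo, y_demo)
for name, acc in ranking.items():
    print(f"{name:>25s}: {acc:.3f}")
```

Fixing `random_state` in both the splitter and the models makes the ranking reproducible, which matters when the differences between models are only a few percent.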
The performance of the various ML algorithms mentioned above is compared using the
The performance of a classification model is quantified by the so-called “
where
In this study, a principal component analysis (PCA) algorithm is also employed to reduce the dimensionality of the data. PCA is a statistical procedure that uses an orthogonal transformation to convert a set of observations of possibly correlated variables into a set of linearly uncorrelated variables, referred to as principal components. A new feature vector
where
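The dimensionality reduction used for the PCA plots can be sketched directly with NumPy: center the feature matrix, take its singular value decomposition and project onto the leading right singular vectors. The 61 × 25 random matrix below is a placeholder for the real feature table, and `pca_project` is a hypothetical name; up to the sign of each component, the result should agree with scikit-learn's `PCA`.

```python
import numpy as np


def pca_project(X, n_components=3):
    """Project the rows of X onto its first n_components principal components.

    Returns (scores, components, explained_variance).
    """
    X = np.asarray(X, dtype=float)
    Xc = X - X.mean(axis=0)                          # PCA operates on centred data
    # Rows of vt are orthonormal principal directions, sorted by decreasing singular value
    _, s, vt = np.linalg.svd(Xc, full_matrices=False)
    components = vt[:n_components]
    scores = Xc @ components.T                       # coordinates in the new, uncorrelated basis
    explained_variance = s[:n_components] ** 2 / (X.shape[0] - 1)
    return scores, components, explained_variance


rng = np.random.default_rng(0)
features = rng.normal(size=(61, 25))                 # placeholder for the 61 x 25 feature table
scores, components, explained_variance = pca_project(features, n_components=3)
print(scores.shape, explained_variance.round(3))
```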
Several accuracy metrics, such as the coefficient of determination
where
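Besides the coefficient of determination, the error metrics are not spelled out in this section; the sketch below assumes the common pair of mean absolute error (MAE) and root-mean-square error (RMSE) and computes all three with NumPy. `regression_metrics` is a hypothetical helper.

```python
import numpy as np


def regression_metrics(y_true, y_pred):
    """Coefficient of determination (R^2), mean absolute error and root-mean-square error."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    residuals = y_true - y_pred
    ss_res = np.sum(residuals ** 2)                  # residual sum of squares
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)   # total sum of squares about the mean
    return {
        "R2": float(1.0 - ss_res / ss_tot),
        "MAE": float(np.mean(np.abs(residuals))),
        "RMSE": float(np.sqrt(np.mean(residuals ** 2))),
    }


metrics = regression_metrics([1, 2, 3, 4], [1.1, 1.9, 3.2, 3.8])
print(metrics)  # R2 ≈ 0.98, MAE ≈ 0.15, RMSE ≈ 0.158
```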
We evaluated the importance of the features with the relative importance (
where
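The relative importance can be read from the fitted tree ensembles. The sketch below assumes it corresponds to the impurity-based `feature_importances_` attribute of scikit-learn's `GradientBoostingClassifier` and `AdaBoostRegressor`, rescaled to percentages; the paper's exact formula is not shown here. The data are synthetic, with both targets driven mainly by the first column, and the feature names are placeholders.

```python
import numpy as np
from sklearn.ensemble import AdaBoostRegressor, GradientBoostingClassifier


def relative_importance(fitted_model, feature_names):
    """Return (feature, importance in %) pairs sorted from most to least important."""
    imp = np.asarray(fitted_model.feature_importances_, dtype=float)
    rel = 100.0 * imp / imp.sum()                   # rescale so the importances sum to 100 %
    order = np.argsort(rel)[::-1]
    return [(feature_names[i], float(rel[i])) for i in order]


names = ["first ionization energy", "electronegativity", "covalent radius", "melting point"]
rng = np.random.default_rng(0)
X = rng.normal(size=(61, len(names)))
y_site = (X[:, 0] > 0).astype(int)                  # synthetic class label driven by column 0
y_modulus = 3.0 * X[:, 0] + 0.1 * X[:, 1]           # synthetic regression target

gbc = GradientBoostingClassifier(random_state=0).fit(X, y_site)
ada = AdaBoostRegressor(random_state=0).fit(X, y_modulus)
for label, model in [("GBC", gbc), ("AdaBoost", ada)]:
    print(label, relative_importance(model, names)[:2])
```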
The performance of the selected ML algorithms is then iteratively improved through interaction with first-principles calculations. First, the selected algorithm is used to predict the target properties for a small, randomly chosen subset of the input data. Second, the predictions are verified using first-principles calculations. Third, if the accuracy of the models does not meet the requirements, the newly calculated data are added to the training set to retrain the ML model. These procedures are repeated until the predefined precision is met. The improved models are then employed to predict all the remaining data (the workflow is schematically shown in
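The three-step loop above can be sketched as follows. The `oracle` argument is a hypothetical stand-in for a first-principles calculation, the batch size of three mirrors the rule stated later in the text for model modification, and the stopping rule (stop once a verified batch meets the target accuracy, otherwise add the batch to the training set and refit) is one reading of the procedure rather than the authors' code. Retraining from scratch with `sklearn.base.clone` is a design choice, and the toy problem is synthetic.

```python
import numpy as np
from sklearn.base import clone
from sklearn.ensemble import GradientBoostingClassifier


def refine_with_oracle(model, X_train, y_train, X_pool, oracle,
                       batch=3, target_acc=1.0, max_rounds=10, seed=0):
    """Iteratively verify random predictions with an expensive oracle and retrain.

    Returns the final fitted model and the accuracy of each verified batch.
    """
    rng = np.random.default_rng(seed)
    X_train, y_train = np.asarray(X_train, dtype=float), np.asarray(y_train)
    X_pool = np.asarray(X_pool, dtype=float)
    unverified = np.arange(len(X_pool))
    fitted = clone(model).fit(X_train, y_train)
    history = []
    for _ in range(max_rounds):
        if unverified.size == 0:
            break
        # Step 1: predict a small random batch of not-yet-verified inputs
        pick = rng.choice(unverified, size=min(batch, unverified.size), replace=False)
        # Step 2: verify the batch with the expensive calculation
        y_new = np.array([oracle(x) for x in X_pool[pick]])
        accuracy = float(np.mean(fitted.predict(X_pool[pick]) == y_new))
        history.append(accuracy)
        unverified = np.setdiff1d(unverified, pick)
        if accuracy >= target_acc:
            break
        # Step 3: precision not met, so add the verified batch to the training set and retrain
        X_train = np.vstack([X_train, X_pool[pick]])
        y_train = np.concatenate([y_train, y_new])
        fitted = clone(model).fit(X_train, y_train)
    return fitted, history


def toy_oracle(x):
    """Hypothetical stand-in for a first-principles calculation."""
    return int(x[0] > 0)


rng = np.random.default_rng(1)
X_known = rng.normal(size=(20, 4))
X_known[:2, 0] = [1.0, -1.0]                       # make sure both classes are present
y_known = np.array([toy_oracle(x) for x in X_known])
X_new_system = rng.normal(loc=0.5, size=(20, 4))   # shifted distribution ~ "unknown" system
model, history = refine_with_oracle(GradientBoostingClassifier(random_state=0),
                                    X_known, y_known, X_new_system, toy_oracle)
print("batch accuracies:", history)
```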
The occupancy of a TM dopant may significantly influence both the stability and mechanical properties of the L1_{2} phase in Co-based superalloys^{[62]}. In new Co-based superalloys, the D0_{19} phase usually competes with the L1_{2} phase^{[49]}. The performance of various ML algorithms in predicting the dopant occupancy and the stability of the L1_{2} structures is evaluated using 10-fold cross-validation, and the results are shown in
Ranking of prediction accuracies of (A) dopant occupancy and (B) L1_{2} phase stability by different models. The GBC model has the highest accuracy (88.52% and 93.44%, respectively). Prediction results of (C) occupied sites and (D) L1_{2} phase stability from the GBC-based model on the training set. For visualization, the 25 features are projected onto three principal components (main features #1, #2 and #3) using PCA (accuracy is 88.52%).
The mechanical properties of the L1_{2} phase are the most important indicators of the properties of the new Co-based superalloys. There are two routes for predicting them, as shown in
Route I: We start by presenting the results using route
Route II
Model performance of each regression model in terms of
Selection between two routes:
Comparison of the model performance of the two routes based on AdaBoost regression models. The warm colors (vermilion, red and orange bars) represent the model performance of route
The relative importance of different features for the dopant occupancy, the stability of the L1_{2} structures and the mechanical properties of the L1_{2} phase is extracted from the gradient boosting classification and AdaBoost regression models, as shown in
Calculated relative importance of different features for (A) dopant occupancy prediction based on the gradient boosting classification model; (B) L1_{2} structure stability prediction based on the gradient boosting classification model; (C) bulk modulus prediction based on the AdaBoost regression model; (D) shear modulus prediction based on the AdaBoost regression model and (E) elastic modulus prediction based on the AdaBoost regression model. The ranking of the features is consistent with the related references.
The first ionization energy and electronegativity quantify the attraction between atoms and affect the distortion of the supercell; they can therefore be used to evaluate the occupancy of a dopant in the supercell^{[63]}. The covalent radius of a dopant affects the stability of the supercell^{[62]}. The melting and boiling points of a dopant are correlated with the mechanical properties (such as the bulk, shear and elastic moduli^{[64]}). It can be seen from
The L1_{2} phase exists at high temperatures in the Co-Al-W, Co-V-Ti and Co-V-Ir-based systems^{[1,6,65]}. Building new alloy systems based on the properties of the major alloying elements is highly desirable. Ta can increase the L1_{2} solvus temperature, while V can improve the strength of the alloy^{[66-68]}. Herein, the trained ML models are employed to predict the crystal structure and mechanical properties of the L1_{2} phase in new alloy systems containing V and Ta, namely, the Co-V-Ta and Co-Al-V-based systems. Because the ML models are trained without any data on the Co-V-Ta and Co-Al-V-based systems, their prediction precision for these systems is usually low, so the models need to be modified. The precision standard for ML model modification is shown in
Precision standard of ML model modification
Dopant occupancy models | 100%
L1_{2} phase stability prediction models | 100%
Mechanical property prediction models | > 0.9 | < 5 | < 5
A rule is established whereby, in each round, three randomly selected data points are verified by first-principles calculations to evaluate the model performance. To verify the prediction capability of the models for an unknown system, the calculated results of the Co-V-Ta-based system are added to the training set of the previously trained models, and the optimized models are used to predict the new Co-Al-V-based system. After one round of iteration, the accuracy of the ML model in predicting dopant occupancy in the Co-V-Ta-based system is improved from 66.67% to 100%. The prediction accuracy for the Co-Al-V-based system reaches 100%, i.e., the model does not need to be modified. In addition, to verify the generalization ability of the ML model, we use first-principles calculations to compute the remaining data that have not yet been verified and compare the results with those predicted by the improved ML model. The prediction accuracy for the Co-V-Ta-based system is improved from 80.00% to 95.00% after only a single round of model optimization, and the accuracy for the Co-Al-V-based system is 95.24%. The PCA visualization of the classification results of the model is shown in
PCA classification results of the occupied site prediction model based on the GBC algorithm after one round of modification: (A) original Co-V-Ta-based system (accuracy of 80.00%); (B) modified Co-V-Ta-based system (accuracy of 95.00%); (C) original Co-Al-V-based system (accuracy of 95.24%).
After one round of iteration, the accuracy of the ML model in predicting the L1_{2} phase stability in the Co-V-Ta-based system is improved from 66.67% to 100%. The prediction accuracy for the Co-Al-V-based system reaches 100%, i.e., the model does not need to be modified. As before, we use first-principles calculations to compute the remaining data that have not yet been verified. The verification shows that the prediction accuracy for the Co-V-Ta-based system is improved from 70.00% to 95.00% after one round of iteration, and that the model predictions for the Co-Al-V-based system are all correct. The PCA visualization of the classification results of the models is shown in
PCA classification results of the L1_{2} phase stability prediction model based on the GBC algorithm after one round of modification: (A) original Co-V-Ta-based system (accuracy of 70.00%); (B) modified Co-V-Ta-based system (accuracy of 95.00%); (C) original Co-Al-V-based system (accuracy of 100%).
The iterative processes for improving the accuracy of the ML models in predicting the mechanical properties of the L1_{2} phase are shown in
The optimization processes of the ML models for predicting the mechanical properties of the L1_{2} phase in the Co-V-Ta and Co-Al-V-based systems are shown in
Overall prediction results of the modified mechanical property models: (A)
Model performance of optimized ML models for predicting mechanical properties of L1_{2} phase. Accuracy metrics (A)
Traditional first-principles calculations take about two days to compute one data point, while establishing the ML models requires five days. However, the trained ML models take less than a minute to make their predictions. Comparing the amount of calculation and the time required by the modified ML models with those of traditional first-principles calculations shows that the ML-based prediction method more than doubles the computational efficiency, as shown in
Comparison of time costs for first-principles calculations alone and ML-accelerated first-principles calculations
Method | Step | Time cost
Traditional DFT method | First-principles calculations | 92 days
ML-accelerated method | First-principles calculations | 22 days
ML-accelerated method | Establishing ML models | 5 days
ML-accelerated method | ML prediction | 1 minute
ML-accelerated method | Total | 27 days
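The totals in the time-cost table can be checked with a few lines of arithmetic; at roughly two days per first-principles data point, 92 and 22 days correspond to about 46 and 11 calculations, respectively. The step names and durations below are copied from the table, and converting 1 minute into days is the only added step.

```python
MINUTES_PER_DAY = 24 * 60

# Time cost of each step in days, taken from the table
dft_only = {"first-principles calculations": 92.0}
ml_accelerated = {
    "first-principles calculations": 22.0,
    "establishing ML models": 5.0,
    "ML prediction": 1.0 / MINUTES_PER_DAY,   # 1 minute expressed in days
}

total_dft = sum(dft_only.values())
total_ml = sum(ml_accelerated.values())
speedup = total_dft / total_ml
print(f"ML-accelerated total: {total_ml:.2f} days; speed-up: {speedup:.1f}x")
# prints "ML-accelerated total: 27.00 days; speed-up: 3.4x"
```

A ratio of about 3.4 is consistent with the statement that the ML-accelerated approach more than doubles the efficiency.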
Comparison of the predicted
Mechanical property comparison of the predicted Co-V-Ta-X and Co-Al-V-X systems with data from Ref.^{[57]} for the Co-Al-W-X system and from Ref.^{[49]} for the Co-V-Ti-X system: (A) bulk modulus; (B) shear modulus; (C) elastic modulus.
This work aims to address the challenges encountered by traditional experimental approaches and first-principles calculation methods in the discovery of new Co-based superalloys strengthened by L1_{2}-ordered precipitates; both methods are inefficient, time-consuming and labor-intensive when used alone.
A new approach is proposed that combines machine learning (ML) and first-principles calculations to speed up the prediction of crystal structure, phase stability and mechanical properties for systems such as Co-V-Ta and Co-Al-V-based alloys. This information is critical for developing new Co-based superalloys with superior properties at elevated temperatures. ML models are established and trained to predict the site occupancy, phase stability and mechanical properties. Through iterative interactions between model predictions and validations using first-principles calculations, the ML models are further improved. Finally, the refined models are used to make accurate predictions of the crystal structure and mechanical properties of Co-V-Ta and Co-Al-V-based systems.
The combination of ML and first-principles calculations may shed light on the rapid prediction of the crystal structure and mechanical properties of other advanced materials beyond Co-based alloys.
Project conception: Liu X, Wang C
Calculation task: Xi S, Yu J
Analysis: Xi S, Yu J, Bao L
Investigation: Xi S, Yu J, Bao L, Chen L, Li Z, Shi R
Draft Preparation: Xi S, Yu J, Shi R
Supervision: Liu X
Not applicable.
All authors declare that there are no conflicts of interest.
This work was supported by the National Key R&D Program of China (No. 2020YFB0704503), the National Natural Science Foundation of China (Grant No. 52001098 and Grant No. 51831007), and the Key-Area Research and Development Program of GuangDong Province (Grant No. 2019B010943001), as well as the open research fund of Songshan Lake Materials Laboratory (2021SLABFK06).
Not applicable.
Not applicable.
© The author(s) 2022.
Supplementary Materials