With rapid developments in big data and artificial intelligence technologies, materials informatics has become a new paradigm of materials science and engineering. In this review, the progress of modeling studies of phase stability in alloys is presented, with particular attention given to the development of the paradigm from traditional computational materials science (CMS) to materials informatics. The features of CMS models for phase stability studies are compared with those of data-driven approaches. The advantages of data-driven modeling in the framework of materials informatics are revealed. Approaches for developing interpretable machine learning, mainly by integrating it with established CMS models and materials science theories, are also discussed. Finally, the prospects for data-driven materials design based on the stability control of the dominant phases with regard to performance are outlined.

The mechanical and functional properties of materials are largely dependent on their phase constitutions and microstructures, which are governed by the design of components and preparation processes^{[1,2]}. The stability of the dominant phases is essential both for achieving excellent material properties and for maintaining them, since a stabilized phase structure preserves these properties. Studies of phase stability and transformation behavior facilitate our understanding of the correlation between phase constitution and materials performance and the development of materials with excellent properties and high-stability dominant phases^{[3]}. Essentially, phase stability is mainly determined by the composition, structural features and energy state of the phase^{[4]}.

Phase stability can be investigated by computational methods. Computational materials science (CMS), an interdisciplinary subject spanning materials science and computer science, uses computation and simulation technologies to study the composition, structure and properties of materials^{[5]}. Since the rise of CMS theories and methods in the 1980s^{[6]}, the development of materials design has been greatly promoted. CMS methods, including first-principles calculations^{[7]}, molecular dynamics^{[8]}, Monte Carlo simulations^{[9]}, phase-field simulations^{[10]}, CALPHAD (CALculation of PHAse Diagrams)^{[11,12]} and the finite element method^{[13]}, have been well developed and play important roles for various materials and at various microstructural scales. However, traditional CMS methods are generally applied at a certain length scale of the material microstructure, and their computation time may increase greatly with the number of element types and the complexity of the phase constitutions and crystal structures in the material.

With the accumulation of experimental and calculation data from research, the development of database technology and the integration of computing and big data techniques, the field of materials informatics has rapidly developed in recent years^{[14-28]}. As proposed by Zhang et al.^{[14,15]}, materials informatics integrates techniques, tools and theories drawn from a variety of fields, such as data science, the internet, computer science and engineering, as well as digital technologies applied to materials science and engineering to accelerate materials, products and manufacturing innovations. Materials informatics has now become a new paradigm for materials science research^{[29-31]}. In the framework of Materials Genome Engineering (MGE)^{[32-34]}, which seeks to boost high-efficiency materials design, high-throughput calculations and big data technologies continue to be demonstrated for applications in energy materials^{[35]}, biomedical materials^{[36]}, rare-earth functional materials^{[37]}, catalytic materials^{[38]}, superalloys^{[39]} and other material systems. A number of MGE technologies have achieved important breakthroughs, for example, in the establishment of models and algorithms^{[40,41]}, the development of specific databases^{[42]} and the invention of high-throughput experimental techniques^{[43,44]}.

In this review, we provide an overview of modeling studies of phase stability in alloys, including theories, models and methods. The Sm-Co alloy system, as one of the two most famous families of rare-earth permanent magnetic materials^{[45]}, is taken as an example, with the consideration that this system is rich in various stable and metastable phases under the conditions of binary compositions^{[46-53]}. Efforts are made to introduce the progress in the studies of the phase stability and transformation behavior of Sm-Co-based alloys, in which the development of the research approaches from traditional CMS methods to strategies based on materials informatics is particularly demonstrated. Moreover, the advantages and prospects of extending new approaches for studying phase stability to additional research challenges are discussed, including shortening the research and development period, increasing the modeling accuracy, reducing the costs of experimental trial-and-error processes and accelerating the development of new high-performance materials.

In recent decades, significant progress has been made in the field of rare-earth permanent magnetic materials. Sm-Co alloys have irreplaceable advantages in this regard, including high Curie temperatures, good thermal stability and excellent magnetic performance at high temperatures^{[54]}. Thus, Sm-Co alloys are the most promising candidates for applications in critical fields such as aerospace, marine power plants and military apparatus^{[55]}. The Sm-Co system is rich in phases with various crystal structures^{[56]}, as shown in the binary phase diagram below, including stable phases such as Sm_{3}Co, Sm_{9}Co_{4}, SmCo_{2}, SmCo_{3}, Sm_{2}Co_{7}, SmCo_{5} and Sm_{2}Co_{17}, and metastable phases such as Sm_{5}Co_{19}, SmCo_{7} and SmCo_{9.8}^{[57-62]}. These phases have different thermodynamic stabilities and the transformations between them influence the magnetic properties of Sm-Co alloys.

Binary phase diagram of Sm-Co system^{[56]}.

Increasing attention has been devoted to the research and development of nanoscale rare-earth permanent magnetic materials in recent years^{[46,63]}, as these exhibit superior properties compared with their conventional coarse-grained counterparts. A high coercivity can be obtained in a nanostructured alloy because the magnetization reversal is strongly pinned by the grain boundaries, whose density increases greatly as the grain size decreases in nanocrystalline materials^{[46]}. In addition, some Sm-Co phases that are metastable in the conventional coarse-grained alloy system can be stabilized in the nanocrystalline system due to the nanoscale effect on the phase stability^{[64]}. Thus, tuning phase stabilities by tailoring the grain size in the Sm-Co system is a very promising way to improve the magnetic performance of the material^{[49]}. In the following sections, progress in the studies of phase stability in Sm-Co alloys is reviewed with regard to modeling approaches, from traditional CMS calculations to innovative materials informatics.

A number of investigations have revealed that nanocrystalline materials may have abnormal phase stabilities that result in different phase structures^{[65-74]} compared with their coarse-grained counterparts with the same composition. An earlier thermodynamic calculation reported that for the typical γ-Fe (FCC) ↔ α-Fe (BCC) transformation, if the grain size is sufficiently small, e.g., in nanocrystalline Fe, the γ-Fe phase, which is normally stable at high temperatures, will be stabilized at room temperature^{[75]}. With decreasing grain size, grain boundaries play an increasingly important role in the phase stability of nanocrystalline materials^{[76]}. In a universal model proposed for nanocrystalline metals and single-phased alloys^{[77]}, the total Gibbs free energy of the system was calculated as the summation of the energies of the crystalline and interfacial components. Based on a dilated crystal model^{[78,79]}, the state of the interfaces in a nanocrystalline system was described in terms of its excess volume and thickness. Similarly, the decrease in the melting and evaporation temperatures with the particle size of the metals was also attributed to the nanoscale effect on the phase transformations. It was considered that the allotropic transformation of Ag nanoparticles is size dependent and the total Gibbs free energy is greatly affected by the surface energy and stress^{[80]}. Therefore, thermodynamics developed for nanoscale materials are required to fill the gap between macroscopic classical thermodynamics and microscopic quantum thermodynamics and thus to interpret abnormal phase transformation phenomena in nanocrystalline materials.

From the view of thermodynamics, a large number of disordered atoms at the grain boundaries significantly influence the entropy, enthalpy and Gibbs free energy of a nanocrystalline material, leading to different thermodynamic functions of the system compared to conventional coarse-grained materials. It is known that the Gibbs-Thomson equation relates the chemical potential to the radius of curvature and energy of the interface^{[81]}, which describes the state of a given interface. To derive the thermodynamic functions of the enthalpy, entropy and Gibbs free energy of a nanocrystalline bulk system, a model that considers the effects of grain size and temperature should be developed. In a classic thermodynamic model for nanocrystalline materials^{[77]}, an “excess volume”, ^{[77]}:

where _{i} and _{b} are the spatial distribution densities of atoms in the grain interior and at the grain boundary, respectively,

The ratio between the atomic densities at the grain boundary and in the grain interior can be expressed as^{[48]}:

where _{b}, can then be expressed as:

where _{0} is the unit cell volume in a perfect crystal. The fundamental thermodynamic functions, i.e., enthalpy, entropy and Gibbs free energy, of the grain boundaries are then given by^{[48]}:

where ^{[69]}, _{V} is the specific heat capacity at constant volume, γ is the Grüneisen parameter of the dilated crystal at the grain boundary^{[79]} and _{R} is the reference temperature. Thus, the thermodynamic functions of a nanocrystalline alloy can be described as the weighted average of the corresponding functions of the grain boundary and interior components^{[48]}:

where _{b} is the volume fraction of grain boundaries, _{A} is the Avogadro constant, _{i}, _{i} and _{i} are the enthalpy, entropy and Gibbs free energy of the grain interior, respectively. From the model, the thermodynamic properties and hence the phase stability and phase transformation characteristics are deterministic functions of grain size and temperature.
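The weighted-average construction described above can be illustrated with a minimal Python sketch. The geometric estimate of the grain-boundary fraction (a boundary layer of thickness `delta_nm` around a cubic grain of size `d_nm`), the function names and the default thickness are assumptions made for illustration; they are not taken from the cited model, which also involves the excess volume and per-mole conversion.

```python
def boundary_fraction(d_nm: float, delta_nm: float = 1.0) -> float:
    """Volume fraction of grain boundaries for grain size d_nm.

    Assumed geometric estimate: a cubic grain of edge d_nm whose outer
    layer of total thickness delta_nm belongs to the grain boundary.
    """
    if d_nm <= delta_nm:
        return 1.0  # the whole grain counts as boundary in this toy picture
    return 1.0 - ((d_nm - delta_nm) / d_nm) ** 3


def mix(x_b: float, boundary_value: float, interior_value: float) -> float:
    """Weighted average of a thermodynamic function (H, S or G)."""
    return x_b * boundary_value + (1.0 - x_b) * interior_value


def nanocrystal_gibbs(d_nm: float, g_boundary: float, g_interior: float,
                      delta_nm: float = 1.0) -> float:
    """Gibbs free energy of the nanocrystalline system as a weighted average."""
    return mix(boundary_fraction(d_nm, delta_nm), g_boundary, g_interior)
```

In this sketch the boundary contribution grows quickly as the grain size approaches the boundary thickness, which is the qualitative reason why grain size enters the thermodynamic functions.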

By applying the above model, Xu et al.^{[64,67,71-74]} calculated the phase stabilities of a series of nanocrystalline Sm-Co binary alloys and predicted the tendency of phase transformations. As an example, in coarse-grained alloys, Sm_{2}Co_{17} has a stable rhombohedral 2:17R phase at room temperature, while the hexagonal 2:17H phase is only stable at temperatures higher than 1520 K^{[74]}. However, as predicted by the model calculations shown in panel (A) of the figure below^{[74]}, the Gibbs free energy difference between hexagonal and rhombohedral Sm_{2}Co_{17} (ΔG_{H-R}) decreases with decreasing grain size at each temperature considered. There exists a critical grain size corresponding to ΔG_{H-R} = 0, which defines the critical condition of grain size for the phase stability at a given temperature. In other words, if the grain size is smaller than the critical value, the 2:17H phase can be stabilized at a lower temperature than the phase transformation point of the conventional coarse-grained Sm_{2}Co_{17} alloy. Moreover, 2:17H may become the dominant phase at room temperature when the grain size is sufficiently small.
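The critical grain size is the root of the free energy difference as a function of grain size, so it can be located numerically once that function is available. The sketch below uses plain bisection; `toy_delta_g` is a hypothetical stand-in whose functional form and coefficients are invented for illustration (chosen so that its root falls at 30 nm), not the model of the cited work.

```python
from typing import Callable


def critical_grain_size(delta_g: Callable[[float], float],
                        d_lo: float, d_hi: float, tol: float = 1e-6) -> float:
    """Find the grain size where delta_g changes sign, by bisection."""
    f_lo, f_hi = delta_g(d_lo), delta_g(d_hi)
    if f_lo * f_hi > 0:
        raise ValueError("delta_g does not change sign on [d_lo, d_hi]")
    while d_hi - d_lo > tol:
        mid = 0.5 * (d_lo + d_hi)
        f_mid = delta_g(mid)
        if f_lo * f_mid <= 0:
            d_hi, f_hi = mid, f_mid   # root lies in the lower half
        else:
            d_lo, f_lo = mid, f_mid   # root lies in the upper half
    return 0.5 * (d_lo + d_hi)


def toy_delta_g(d_nm: float) -> float:
    """Hypothetical G_H - G_R: a constant bulk term minus a 1/d boundary term."""
    return 1.0 - 30.0 / d_nm


d_crit = critical_grain_size(toy_delta_g, 5.0, 200.0)
```

Below `d_crit` the toy difference is negative (hexagonal phase favored); above it the difference is positive (rhombohedral phase favored).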

Thermodynamic calculations and experimental verifications of Sm_{2}Co_{17} alloys with different grain sizes^{[74]}. (A) Calculated Gibbs free energy differences between hexagonal (2:17H) and rhombohedral (2:17R) phases as a function of grain size at different temperatures. (B) Calculated Gibbs free energies of nanocrystalline 2:17H and 2:17R phases as a function of grain size at 1000 K. (C) Transmission electron microscopy images, electron diffraction patterns and indexing of nanocrystalline Sm_{2}Co_{17} bulk samples in the as-prepared state (C1), annealed at 973 K for 1 h (C2) and annealed at 1073 K for 1 h (C3), respectively.

Experiments were carried out to verify the model calculations in Refs.^{[48,74]}. As shown in panel (C) of the figure above, for the as-prepared sample with a composition of Sm_{2}Co_{17} and an average grain size of 15 nm, which was smaller than the model-predicted critical value of 30 nm, the sample had a stable 2:17H phase instead of 2:17R at room temperature, as indicated by transmission electron microscopy^{[74]}. For the sample annealed at 973 K for 1 h, the average grain size increased to 45 nm and the sample had a mixture of the 2:17H and 2:17R phases at room temperature. When the sample was annealed at 1073 K for 1 h, the average grain size was 80 nm and the alloy had a single 2:17R phase at room temperature. Therefore, the experimental results confirmed the model calculations for the grain size-dependent phase constitution at room temperature in the Sm_{2}Co_{17} alloy system. This model has been applied to a series of equilibrium (thermodynamically stable in the binary phase diagram) phases in the Sm-Co system, e.g., SmCo_{5}, SmCo_{2}, Sm_{2}Co_{7}, SmCo_{3} and Sm_{9}Co_{4} alloys^{[64,72,73]}, to evaluate their phase stabilities and transformation behavior at both the micro- and nanoscales.

Metastable phases in the Sm-Co system, such as SmCo_{7}, Sm_{5}Co_{19} and SmCo_{9.8}, have been successfully prepared by non-equilibrium techniques developed in recent years^{[47,61,62,82,83]}. Alloys dominated by metastable phases can have special and outstanding functional properties, which may be difficult to obtain in alloys with equilibrium phases. However, the metastable phases are generally destabilized at higher temperatures due to their thermodynamically non-equilibrium state. In conventional coarse-grained alloy systems, various doping elements are often used in experiments to stabilize the metastable phases^{[84-92]}. A few theoretical studies have been reported on modeling the phase stability in alloys with metastable phases^{[93,94]}. Unfortunately, the alloy systems and microstructural scales covered by modeling are still limited.

In a developed thermodynamic model^{[64]}, it was proposed that the degree of the stability, or the relative stability, could be described by a parameter of phase activity, which had a value from 0 to 1. The activity of a phase, _{i}, was introduced to characterize the chemical potential difference between a component in the alloy system (_{i}) and the component at the reference state ^{[95]}:

where _{i} = 1 means the component

The model calculations based on phase activity take SmCo_{7} as an example^{[95]}. The grain sizes considered in the calculation ranged from the nanocrystalline to the conventional coarse-grained regime. In a coarse-grained structure, the phase activity of SmCo_{7} is much lower than those of Sm_{2}Co_{17} and SmCo_{5}. The SmCo_{7} phase therefore cannot exist stably in the coarse-grained system and at room temperature it decomposes into the Sm_{2}Co_{17} and SmCo_{5} phases. The calculations of the metastable phase model based on the concept of phase activity thus explain the thermodynamic nature of the instability of the SmCo_{7} phase in coarse-grained alloys.

Model calculations based on phase activity^{[95]}. (A) Phase activities in the SmCo_{7} system at room temperature as a function of grain size. (B) Mole fractions of different phases in the SmCo_{7} system at room temperature.

For a nanocrystalline system, the phase activity of the SmCo_{7} phase increases rapidly with decreasing grain size. When the grain size is reduced to below 51 nm^{[95]}, the phase activity changes from < 1 to 1, indicating that nano-grained SmCo_{7} can exist stably as a single phase at grain sizes smaller than 51 nm. Experiments were performed on nanocrystalline SmCo_{7} alloys to verify these predictions^{[49,96]}. The average grain sizes of the as-prepared nanocrystalline alloy and the sample annealed at 873 K for 0.5 h were 20 and 35 nm, respectively^{[95]}, both smaller than the predicted critical grain size (51 nm). Accordingly, the SmCo_{7} phase was found to exist stably as a single phase in these samples. In the sample annealed at 973 K for 0.5 h, with an average grain size of 161 nm, the SmCo_{7} phase was partially decomposed into SmCo_{5} and Sm_{2}Co_{17}, which can be distinguished by the very fine nanoparticles and microtwins in the transmission electron microscopy images^{[96]}. This confirms that when the grain size is larger than the critical value, the SmCo_{7} phase is destabilized and transforms to the equilibrium 2:17R phase. Therefore, with this experimental confirmation, the thermodynamic model based on the phase activity can evaluate the phase stability and transformation tendency of metastable phases.

Experimental investigations of the phase constitutions in SmCo_{7} samples with different grain sizes^{[95]}: (A) as-prepared state with an average grain size of 20 nm; (B) sample annealed at 873 K for 0.5 h with an average grain size of 35 nm; and (C) sample annealed at 973 K for 0.5 h with an average grain size of 161 nm.

Transmission electron microscopy images and diffraction analysis of a grain with a size of 166 nm^{[96]}: (A) bright-field image; (B) corresponding diffraction pattern and indexing, triangles indicating the 2:17R superstructure reflections; and (C) dark-field image reflecting the twin variants (displayed as red and green), indicating the single 2:17R phase of the whole grain.

It is known that doping is an alternative method to stabilize metastable phases in alloys^{[84-92]}. The dopants can change the interactions between the matrix elements, the chemical environment and the system energy and thus influence the phase stability significantly. Considering the different types and contents of the doping elements, as well as the complex interactions between the dopants and the matrix elements, the selection of appropriate doping elements is challenging in materials design. Meanwhile, it is usually difficult to explain the mechanisms of the doping effect on the phase stability only by experimental investigations. In an earlier theoretical study^{[87]}, parameters such as formation enthalpy, difference in atomic radius and electronegativity were used to estimate the stability of Sm(Co,M)_{7} (M is the doping element). However, in the calculations where more empirical or semi-empirical parameters were used, the modeling results were seldom verified by experimental tests.

In recent years, first-principles calculations based on density functional theory were carried out to study the structural stability of multicomponent alloys and the interactions between the electrons of the matrix and doping elements. Compared with thermodynamic models, first-principles calculations can be used to not only evaluate the crystal structure stability of a certain phase, but also to visualize the interactions between different atoms, so as to explain the mechanisms of doping at the atomic scale. In first-principles models, the optimization of the unit cell structure of the crystal is generally first performed and then some specific energies, such as the formation, interface and Gibbs free energies, are calculated. For example, the formation energy of a Sm(Co,M)_{x} phase can be defined as^{[97,98]}:

where _{1}, _{2} and _{3} are the numbers of atoms of the corresponding elements in the supercell.
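A formation energy of this kind is usually evaluated as the total energy of the supercell minus the energies of its constituent elements, weighted by the numbers of atoms. The sketch below assumes a per-atom normalization and elemental reference energies per atom; both conventions, the function name and the numbers in the usage lines are assumptions for illustration rather than the definition used in the cited references.

```python
from typing import Dict


def formation_energy_per_atom(e_total: float,
                              counts: Dict[str, int],
                              e_ref: Dict[str, float]) -> float:
    """Formation energy per atom of a supercell.

    e_total : total energy of the supercell
    counts  : number of atoms of each element in the supercell
    e_ref   : reference energy per atom of each element
    All energies must share one unit (e.g. eV).
    """
    n_atoms = sum(counts.values())
    e_elements = sum(n * e_ref[element] for element, n in counts.items())
    return (e_total - e_elements) / n_atoms


# Placeholder numbers, not calculated or measured values.
example = formation_energy_per_atom(
    e_total=-54.6,
    counts={"Sm": 1, "Co": 7},
    e_ref={"Sm": -4.0, "Co": -7.0},
)
```

A more negative result indicates a more stable structure relative to the chosen references, which is how the doped and binary phases are compared in the text.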

The formation energies of the binary and Hf-doped SmCo_{7} phases are shown in panel (A) of the figure below. For the SmCo_{6.75}Hf_{0.25} phase, different doping sites were considered in the calculations^{[97]}. The results show that when the Hf atoms occupy the Co-2e site, the Hf-doped SmCo_{7} phase is the most stable. Compared with the binary SmCo_{7} phase, the decrease in the formation energy of the Hf-doped SmCo_{7} phase indicates that the doping of Hf may stabilize the metastable SmCo_{7} phase. In addition to the energy state, temperature also influences the site occupation of the doping elements. The site occupation probability is a function of temperature based on the Maxwell-Boltzmann statistical distribution and can be expressed as^{[97]}:

First-principles calculations for the SmCo_{7} alloy with Hf doping^{[97]}. (A) Formation energies calculated for SmCo_{7} and SmCo_{6.75}Hf_{0.25} with different doping sites. (B) Calculated occupation probability as a function of temperature for the SmCo_{6.75}Hf_{0.25} phase. (C) Total charge density distributions of SmCo_{7} (C1) and SmCo_{6.75}Hf_{0.25} (C2) on (100) plane. (D) Total density of states and partial density of states of SmCo_{6.75}Hf_{0.25}.

where g_{i} is the multiplicity of configuration i, ΔG_{i} is the change of the Gibbs free energy and k_{B} is the Boltzmann constant.
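A site occupation probability of this type can be sketched in Python using the standard Maxwell-Boltzmann form, in which each configuration is weighted by its multiplicity times exp(-ΔG/(k_B T)) and the weights are normalized to sum to one. That this is the exact expression of the cited work is an assumption; the function name and the energies in the usage lines are placeholders.

```python
import math
from typing import List, Sequence

K_B_EV = 8.617333262e-5  # Boltzmann constant in eV/K


def occupation_probabilities(multiplicities: Sequence[float],
                             delta_g_ev: Sequence[float],
                             temperature_k: float) -> List[float]:
    """Boltzmann-weighted occupation probability of each configuration."""
    g_min = min(delta_g_ev)  # shift energies to avoid overflow in exp()
    weights = [
        g * math.exp(-(dg - g_min) / (K_B_EV * temperature_k))
        for g, dg in zip(multiplicities, delta_g_ev)
    ]
    total = sum(weights)
    return [w / total for w in weights]


# Two hypothetical doping sites; energies are placeholders, not results.
p_low_t = occupation_probabilities([2, 3], [0.00, 0.10], 300.0)
p_high_t = occupation_probabilities([2, 3], [0.00, 0.10], 1500.0)
```

With these placeholder inputs the lower-energy site dominates at low temperature, while the higher-multiplicity site gains weight as the temperature rises, which is the kind of temperature dependence discussed in the text.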

The calculations showed that Hf occupation at the 2e site has the highest probability at temperatures below 700 °C. The total charge density distributions of SmCo_{7} and SmCo_{6.75}Hf_{0.25} on the (100) plane were also calculated^{[97]}. It can be seen that the electrons accumulate around Co atoms when the Co-2e atom is partially replaced by a Hf atom, which implies a strong electronic interaction between Co atoms. Comparing the two distributions indicates that the stability of the SmCo_{7} phase structure can be enhanced by Hf doping.

Due to the spin polarization, the peak of the density of states splits into spin-up and spin-down states, as shown in the density of states plots^{[97]}. The Fermi energy is set as the zero of the energy scale. The ferromagnetic behavior is considered to originate from the asymmetry of the spin-up and spin-down densities of states in the upper valence band near the Fermi level^{[97]}. The magnetic moments are mainly contributed by the 3d electrons of Co atoms and the 4s and 4f electrons of Sm atoms. The doping of Hf leads to an increase in the symmetry of the total density of states

Experiments confirmed the first-principles calculations. The temperature limits for the stability of the SmCo_{7} phase in the SmCo_{7}, SmCo_{6.85}Hf_{0.15} and SmCo_{6.75}Hf_{0.25} alloys were 700, 800 and 850 °C, respectively^{[49,99]}. Clearly, the stability of the SmCo_{7} phase was improved by Hf doping. From modeling and experimental verification, it can be considered that first-principles calculations are effective for evaluating the interactions between the doping elements and the matrix and are thus able to analyze the structural stability of different phases.

Experimental results of phase constitutions of (A) SmCo_{7}, (B) SmCo_{6.85}Hf_{0.15} and (C) SmCo_{6.75}Hf_{0.25} alloys after annealing at different temperatures^{[49,99]}.

Based on first-principles calculations, Das et al.^{[100]} found that the formation energy of SmCo_{5} increased with increasing Fe doping content, while the magnetocrystalline anisotropy energy exhibited the opposite tendency. Moreover, doping with Ni can reduce the formation energy of SmCo_{5} and the co-doping of Fe and Ni causes the formation of the SmCoNiFe_{3} phase, which has a high saturation magnetization and stable CaCu_{5}-type crystal structure^{[101]}. In addition, a negative correlation was found between the formation energy and the number of 3d electrons. Gavrikov et al.^{[102]} performed thermodynamic calculations of the heat of formation of the Sm(Co_{1-x-y}Fe_{x}Ni_{y})_{5} phase using homemade software in the framework of Miedema’s model, in relation to the stability of the SmCo_{5} phase.

First-principles calculation results and analysis from Das et al.^{[100]}, Söderlind et al.^{[101]} and Gavrikov et al.^{[102]}. (A) Calculated magnetocrystalline anisotropy and formation energies of the SmCo_{5-x}Fe_{x} phase as a function of Fe content. (B) Calculation and experimental results of formation energies as a function of the number of 3d electrons. (C) Concentration diagram of the heat of formation (ΔH_{f}) of Sm(Co_{1-x-y}Fe_{x}Ni_{y})_{5} for equiprobably distributed 3d ions at 2c and 3g sites. (D) ΔH_{f} of Sm(Co_{1-x-y}Fe_{x}Ni_{y})_{5} for selectively distributed Co/Ni (2c) and Fe (3g) 3d ions. The graded blue regions in (C) and (D) correspond to the negative values of the heat of formation.

As introduced in the previous section, first-principles calculations can be used to evaluate the atomic interactions and hence the effect of doping elements on the structural stability and even material properties. However, it is generally difficult in first-principles calculations to build an accurate model for disordered crystal structures that are often formed due to doping^{[93]}. In this respect, some methods based on interatomic potentials, which are able to deal with millions of atoms, have been developed. A concise inverse method was proposed by Chen^{[103,104]} based on the modified Möbius inverse transformation in number theory. It assumes that the total cohesive energy per atom in a perfect crystal can be expressed as a sum of pair potentials^{[105]}:

where _{i} is the lattice vector of the _{0}(_{0}(_{0}(^{[105]}:

where

In order to obtain the necessary interatomic potentials, some virtual structures were designed, e.g., bcc Co was proposed as a B2 or CsCl structure with two simple cubic (SC) sub-lattices, Co1 and Co2. Thus, the cohesive energy of Co-Co was obtained as^{[106]}:

where the Co-Co interatomic potential can be obtained directly using Chen’s lattice inversion technique. In the same way, all other kinds of interatomic potentials can be obtained, as shown in panels (A) and (B) of the figure below^{[105]}. The interatomic pair potential can be fitted by the Morse function^{[105]}:

Pair potentials for Sm(Co,M)_{12} as a function of interatomic distance, M = (A) Ti or (B) V. Average energy of SmCo_{12-x}M_{x} with different doping sites as a function of the content of doping element, M = (C) Cr, (D) V, (E) Nb and (F) Ti^{[105]}.

where _{0} is the depth of the potential, _{0} is the equilibrium distance and γ is a parameter. The pair potential is a function of interatomic distance _{12} lattice, the M atom is mostly surrounded by the Co atoms. It is the difference between _{Co-M}(_{Co-Co}(_{Co-M}(_{Co-Co}(^{[105]}.
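For readers who want to experiment with such a fit, the sketch below evaluates a Morse-type pair potential in the form often used together with lattice inversion, Φ(r) = D0 [exp(-γ(r/R0 - 1)) - 2 exp(-(γ/2)(r/R0 - 1))], where D0 is the potential depth, R0 the equilibrium distance and γ a dimensionless parameter. That this is exactly the form fitted in the cited work is an assumption, and the parameter values in the usage lines are placeholders.

```python
import math


def morse_potential(r: float, d0: float, r0: float, gamma: float) -> float:
    """Morse-type pair potential with depth d0 at equilibrium distance r0."""
    s = r / r0 - 1.0
    return d0 * (math.exp(-gamma * s) - 2.0 * math.exp(-0.5 * gamma * s))


# Placeholder parameters (energy in eV, distance in angstrom).
D0, R0, GAMMA = 0.5, 2.5, 8.0
at_minimum = morse_potential(R0, D0, R0, GAMMA)
compressed = morse_potential(0.9 * R0, D0, R0, GAMMA)
stretched = morse_potential(1.1 * R0, D0, R0, GAMMA)
```

By construction the potential equals -D0 at r = R0 and rises on both sides of that distance, so the three parameters can be read directly off a fitted curve.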

As shown in panels (C)-(F) of the figure above^{[105]}, the preferential occupation of the doping elements Cr, V, Nb and Ti at the 8i site results in the largest energy decrease in SmCo_{12}-based alloys, so that the ThMn_{12} structure can be stabilized. This model was also applied successfully to the evaluation of the preferential occupation and stabilization of the doped SmCo_{7} alloy. The calculations predicted that Ti, Zr and Hf prefer to occupy the 2e site and Ga, Si and Cu prefer to occupy the 3g site^{[93]}.

As described above, thermodynamic, first-principles and numerical computations are all modeling methods for individual microstructural scales and specific compositions and crystal structures. It is very difficult for these methods to perform high-throughput calculations and predictions for material systems covering a broad range of compositions and phase structures, or for the phase stabilities and transformation characteristics of systems with various doping elements. Due to the complexity of the cell structure and the diversity of the doping sites, the computation period is usually long. For permanent magnetic materials, to obtain sufficient information on the cell structure of a certain phase in the doped system and the related electronic and magnetic structures, the all-electron method should be used to solve the wavefunction^{[100]}, which is extraordinarily time-consuming. Other first-principles methods, meanwhile, generally converge poorly when dealing with 4f electrons. As a result, first-principles and thermodynamic calculations of permanent magnetic materials are limited to some specific systems and are difficult to extend to large-scale calculations that cover more factors with a high computational efficiency.

The features of materials informatics enable it to bridge the gap between multiscale models in the study of phase stability. In particular, data-driven models and methods play a significant role in the development of new materials and are attracting increasing attention in materials science and engineering^{[107-109]}. The prerequisite for data-driven materials design is to build high-quality datasets. The sources of the datasets can be divided into two categories: high-throughput computing or experiments, and published research studies. The first type of data depends on the development of high-throughput experimental approaches and computation methods and at present is limited to a narrow range of materials. The datasets obtained from scattered research studies can make full use of the massive data generated in previous work to address a comprehensive range of existing scientific issues. With the help of active learning algorithms, data-driven materials design methods can greatly accelerate the development of new materials.

In the field of Sm-Co-type permanent magnetic materials, the authors have spent over ten years building up a specific database, which contains both experimental and computational data for ~1050 alloys published in the literature over a period of decades. The construction of this database includes three main parts: database models, information management system and database applications. The information management system, known as the “Material Knowledge Information Analysis, Association and Management (MKI-AAM) System”^{[110]}, was established to cooperate with the constructed database to improve the efficiency of data collection and the quality of data and also to avoid repetitive data collection. It therefore accelerates the generation of high-quality datasets for machine learning and data-driven materials design.
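One way a data management layer can avoid repetitive data collection is to derive a normalized key from each record and reject entries whose key already exists. The sketch below illustrates the idea only; the class names, fields and key definition are hypothetical and do not describe the actual design of the MKI-AAM System.

```python
from dataclasses import dataclass
from typing import Dict, Tuple


@dataclass(frozen=True)
class AlloyRecord:
    """A single literature data point (all field names are hypothetical)."""
    composition: str
    source: str
    property_name: str
    value: float
    unit: str

    def key(self) -> Tuple[str, str, str]:
        # Normalize whitespace and case so trivially different entries collide.
        return (
            self.composition.replace(" ", "").lower(),
            self.source.strip().lower(),
            self.property_name.strip().lower(),
        )


class RecordStore:
    """In-memory store that refuses duplicate records."""

    def __init__(self) -> None:
        self._records: Dict[Tuple[str, str, str], AlloyRecord] = {}

    def add(self, record: AlloyRecord) -> bool:
        """Return True if stored, False if an equivalent record exists."""
        k = record.key()
        if k in self._records:
            return False
        self._records[k] = record
        return True

    def __len__(self) -> int:
        return len(self._records)
```

For example, adding a record for `"SmCo7"` from a given source and then adding `" smco7"` from the same source with the same property would store only the first entry.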

As shown in ^{[111]} was used to obtain datasets for data-driven materials design. In addition, through the development of the application programming interface^{[112]}, the information management system can be docked with data analysis software or database exchange platforms of MGE.

Layout of the structure and functions of the “Material Knowledge Information Analysis, Association and Management (MKI-AAM) System” constructed by the authors.

Furthermore, as shown in

Example of modules in the construction of the Material Knowledge Information Analysis, Association and Management (MKI-AAM) System.

^{[31,113]} for scientific data. Through the above treatment, the data items are closely related and the data system is highly structured, which therefore facilitates the rapid retrieval of different kinds of data and collects them with a standard format.

Construction and inner correlations of a Materials Genome Engineering (MGE)-oriented dataset generated by the Material Knowledge Information Analysis, Association and Management (MKI-AAM) System using Sm-Co materials as an example^{[110]}.

The storage status of data in the MGE-oriented database is shown in the tree diagram below.

A tree diagram for the data volume and dataset constituents in the Materials Genome Engineering (MGE)-oriented database using the Sm-Co-based systems as an example to show the storage status of data^{[110]}.

The data collected in a database inevitably have multi-source heterogeneity. After data preprocessing, a dataset containing sufficient qualified data can be used for data mining and machine learning aimed at materials design. In the constructed MGE-oriented database of Sm-Co type alloys, in addition to basic materials information, a number of factors that influence the composition, phase constitution, microstructure and properties are also included. Many empirical and semi-empirical formulas used in the modeling studies of materials have been obtained through “trial and error” experimental processes. Although empirical and semi-empirical methods are limited by lower accuracy and are time consuming, these methods and the resultant criteria provide important references for researchers when setting up material features for machine learning. The concept of “feature engineering”^{[114]} in machine learning has provided materials design with a more comprehensive and accurate approach in a massive and multi-dimensional feature space.

Here, machine learning studies on phase stability are demonstrated using the SmCo_{7}-type alloys as an example, which is the most representative group for metastable phase systems in Sm-Co-based permanent magnetic materials. In the modeling process, the correlation between machine learning and materials informatics is analyzed. We first set up five features of the SmCo_{7-x}M_{x} (M is the doping element) alloys for the study of phase stability, i.e., preparation process (_{proc}), material form (_{form}), grain size (_{sup}). Thus, the function of SmCo_{7} phase stability can be described as:

The fundamental features of the elements are then embedded into machine learning models, as listed in

Fundamental features of elements for machine learning

Feature name | Symbol | Feature name | Symbol |
---|---|---|---|
Atomic number | | Electrical conductivity | |
Atomic radius | _{a} | Heat of fusion | _{fus} |
The 1st ionization energy | _{i,1st} | Heat of vaporization | _{vap} |
Standard atomic weight | _{r} | Thermal conductivity | |
Melting point | _{m} | Work function | |
Boiling point | _{b} | Electron density | _{WS} |
Electronegativity | | Atomic volume | _{a} |

Comprehensive chart of element features: data distributions of each element (histograms in the diagonal), data relationships (lower-left part) and Pearson correlation coefficients (upper-right part).
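The Pearson coefficients summarized in such a chart can be used to screen out redundant element features before training. The following is a minimal sketch of one possible screening rule, not the procedure used in this work: the greedy strategy, the threshold of 0.9 and the names `pearson` and `drop_correlated` are assumptions made for illustration, and the element values are approximate handbook numbers for Ti, Co, Cu, Zr and Hf.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def drop_correlated(features, threshold=0.9):
    """Greedily keep a feature only if |r| with every kept feature is below threshold.

    The threshold is an assumed value, not one taken from the source.
    """
    kept = []
    for name in features:
        if all(abs(pearson(features[name], features[k])) < threshold for k in kept):
            kept.append(name)
    return kept


# Approximate handbook values for Ti, Co, Cu, Zr and Hf (illustrative only).
element_features = {
    "atomic_number": [22, 27, 29, 40, 72],
    "atomic_weight": [47.867, 58.933, 63.546, 91.224, 178.49],
    "melting_point_K": [1941, 1768, 1358, 2128, 2506],
}

selected = drop_correlated(element_features)
print(selected)
```

In this toy set the atomic weight is almost perfectly correlated with the atomic number, so only one of the two is retained, while the melting point carries largely independent information and is kept.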

Finally, seven parameters, including the atomic radius (_{a}), relative atomic mass (_{r}), melting enthalpy (ΔH_{fus}), melting point (_{m}), conductivity, the first ionization energy (_{i,1st}) and electronegativity (

For SmCo_{7-x}M_{x} alloys, the composition feature

The features for machine learning are constructed by three types of parameters, as given in

Classification of features for machine learning

Formula | Meaning |
---|---|
_{sub}· | Product of doping content and element features |
_{sub}· | Absolute difference of element features of M and Co, combined with the doping content |
_{sub}· | Absolute difference of element features of M and Sm, combined with the doping content |
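The three feature types in the table can be illustrated with a short sketch. It assumes that "combined with the doping content" means multiplication by the doping content, as the leading factor in each formula suggests. The function name `composition_features`, the dictionary keys and the choice of melting point as the element feature are hypothetical, and the melting points are approximate handbook values in kelvin.

```python
# Approximate handbook melting points in kelvin, used only as an
# illustrative element feature F.
MELTING_POINT_K = {
    "Sm": 1345, "Co": 1768, "Ti": 1941, "Zr": 2128, "Hf": 2506, "Cu": 1358,
}


def composition_features(dopant, x_sub, table):
    """Build the three composition features for a SmCo_{7-x}M_{x} alloy.

    dopant : symbol of the doping element M
    x_sub  : doping content x
    table  : mapping from element symbol to the element feature F
    """
    f_m, f_co, f_sm = table[dopant], table["Co"], table["Sm"]
    return {
        "x*F_M": x_sub * f_m,                     # doping content times feature of M
        "x*|F_M-F_Co|": x_sub * abs(f_m - f_co),  # contrast with Co (assumed product)
        "x*|F_M-F_Sm|": x_sub * abs(f_m - f_sm),  # contrast with Sm (assumed product)
    }


features = composition_features("Zr", 0.5, MELTING_POINT_K)
print(features)
```

Repeating the call for each selected element feature would give three composition features per element feature for a given dopant and doping content.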

The processing of a material (_{proc}) and its form (_{form}) are categorical variables and should be converted to numeric (dummy) variables. Thus, the stability of the SmCo_{7} phase (_{1:7}) can be expressed as a function of the processing of the material, the form of the material, grain size and compositional feature:

A support vector machine with a radial basis function kernel, which has good extrapolation performance, is selected for machine learning. If no element feature is selected as input, the model uses only the processing, grain size and material form to evaluate the phase stability, and its area under the curve (AUC) value is 0.72. If element features are selected, the AUC value is greatly improved, i.e., the accuracy of the prediction is increased. When _{sup}_{m} and _{sup}_{Co}_{M}

Machine learning results of phase stability of doped SmCo_{7} alloys. (A) Average area under the curve (AUC) value as a function of the number of composition features, where the highest AUC corresponding to each number of features is marked in red. (B) Data distribution and classification of phase constitutions of SmCo_{7-x}M_{x} alloys in the dataset based on the two selected features of doping elements. (C) Prediction of 1:7 single phase probability with various doping elements for SmCo_{7-x}M_{x} ribbons. (D) Prediction of 1:7 single phase probability with various doping elements for SmCo_{7-x}M_{x} sintered bulks. In (C) and (D), different grain sizes and doping contents are considered for various doping elements.
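Two steps of this workflow can be sketched in a few lines: converting the categorical processing and form variables into dummy variables, and computing the AUC used to compare feature sets. This is a rough illustration under stated assumptions rather than the implementation used in this work. The category names, the record layout and the helper names `one_hot`, `encode` and `auc` are hypothetical, and the labels and scores are made-up placeholders standing in for the output of a trained classifier such as the support vector machine.

```python
def one_hot(value, categories):
    """Return a 0/1 dummy vector marking which category `value` belongs to."""
    if value not in categories:
        raise ValueError(f"unknown category: {value!r}")
    return [1.0 if value == c else 0.0 for c in categories]


def auc(labels, scores):
    """Area under the ROC curve, computed as the fraction of
    (positive, negative) pairs in which the positive sample receives
    the higher score; ties count as one half."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical category lists; the real dataset may use other levels.
PROCESSES = ["melt_spinning", "sintering"]
FORMS = ["ribbon", "sintered_bulk"]


def encode(record):
    """Concatenate dummy variables, grain size and composition features."""
    return (one_hot(record["proc"], PROCESSES)
            + one_hot(record["form"], FORMS)
            + [record["grain_size_nm"]]
            + list(record["composition"]))


record = {"proc": "melt_spinning", "form": "ribbon",
          "grain_size_nm": 30.0, "composition": [180.0]}
x = encode(record)

# Placeholder labels (1 = single 1:7 phase) and classifier scores.
labels = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.6, 0.7, 0.8, 0.2, 0.4]
print(x, auc(labels, scores))
```

An AUC of 0.5 corresponds to random ranking and 1.0 to perfect separation, which gives a scale for reading the value of 0.72 obtained without element features.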

_{7}-type alloys. Nearly all the 1:7 single-phase alloys are located in the lower left corner in _{sup} and the Y-axis is the grain size

As demonstrated above, each modeling method has its own strengths and limitations in characterizing phase stability and transformation criteria. First-principles calculations can predict many intrinsic properties of ideal crystals, regardless of the microstructure and processing of the material. In particular, they can calculate the formation energy and electronic interactions to evaluate the phase stability. However, for phases with complex crystal structures or multicomponent systems with dopants, first-principles calculations are usually difficult to perform with high computational efficiency and reliability. For thermodynamic calculations, the stoichiometric phase is generally the fundamental component in the model; thus, phases with varying composition, such as solid solutions and doped matrices, cannot be calculated accurately. Numerical computations based on the cohesive energy are not limited by the type and number of doping elements, but it is difficult to describe the interactions between atoms of different elements in a compound using the pair potentials of independent atoms, especially for multicomponent systems. Data-driven approaches are effective tools for dealing with high-dimensional and massive data, but the multi-source heterogeneity of the data and the weak physical background of machine learning models may be significant obstacles to their application. Therefore, we believe that CMS modeling cannot be replaced by data-driven approaches; instead, the integration of CMS and materials informatics will strengthen and accelerate progress in the modeling studies of materials science.

As reviewed in the above sections, modeling studies of phase stability have developed from traditional computational materials science to materials informatics. By combining traditional computational methods, database technology and machine learning, significantly accelerated element screening and materials design are expected to be realized through high-throughput simulations.

In the framework of materials informatics^{[15]}, data-driven modeling in materials science essentially uses artificial intelligence to reveal the relationships among composition, structure, processing and properties. The laws behind the data may provide new approaches and perspectives for high-efficiency materials design. Data-driven modeling is a powerful complement to and extension of the traditional cognitive paradigm in materials science. To improve the accuracy and efficiency of the modeling, domain knowledge should be introduced into machine learning algorithms. In this respect, the following prospects are proposed.

Sufficient materials data are the basic premise for the implementation of data-driven modeling. The database from which the dataset is extracted should follow the FAIR principles. As data quality improves, data sharing will become easier and the total social cost will be reduced. At present, MGE-oriented databases cover many types of materials, such as superalloys, energy materials, steels, light alloys, composites and rare-earth permanent magnets. They can support data-driven modeling studies of phase stability in these material systems and promote the high-efficiency development of new materials. With the progress of computational models and methods, large amounts of computational data can be obtained by high-throughput calculations. However, experimental data are usually scattered, incomplete or not expressed in a unified unit system. In particular, the multi-source heterogeneity of experimental data may cause a serious decline in data quality, which is a major obstacle to data-driven modeling and materials design. Therefore, it is highly desirable to integrate materials science and data processing technologies to acquire knowledge and techniques for data collection, storage, transfer, fusion and application. At the same time, it is essential to set up data standards that are suitable for data-driven modeling and materials design. The construction of a data-related information system should be incorporated into the establishment of a database for data-driven applications.

With the improvement of computing capabilities and the functions of integrated computational software, high-throughput calculations will continue to develop at an increasing pace. High-throughput calculations excel at multi-channel, multi-target, multi-task and high-concurrency computation and data management, and can generate a large amount of data for subsequent machine learning. For example, the Materials Project database, which is well known for its structural data and properties of inorganic compounds, has collected data from high-throughput first-principles calculations of ~65,000 materials in the Inorganic Crystal Structure Database. Taking advantage of high-performance computing clusters and dedicated programs for high-throughput calculations, a variety of new materials have been designed and prepared in experiments^{[115-117]}. However, for some material systems with complex crystal structures and multicomponent atomic interactions, even a single-task calculation is very time consuming and high-throughput calculations are challenging to carry out. Therefore, it is highly desirable to develop and optimize models and algorithms for multi-target and multi-task calculations, as well as to combine multi-scale and multi-stage simulation tools, to realize high-efficiency high-throughput calculations.

Recently, great successes have been achieved in machine learning studies of a large variety of materials^{[118-120]}. However, the lack of interpretability of machine learning algorithms and results limits the application of machine learning in practice, especially in stability-sensitive tasks. In fact, as the expressive capacity of machine learning algorithms grows, the models become more complicated and their interpretability may become even weaker. At present, the interpretability of machine learning is still a difficult problem and is particularly challenging for researchers in materials science. The models obtained by training machine learning algorithms can be regarded as a “black box”, whose inner working mechanisms are difficult for users to understand.

As machine learning is applied more deeply to materials design, more attention will be paid to the interpretability of machine learning results. Some valuable attempts have been reported very recently^{[121,122]} that combine machine learning and first-principles calculations to disclose the effects of alloying elements and propose strategies for multicomponent materials design. Data-driven modeling of materials phenomena, processing and the relationships among composition, structure and properties is essentially an effective route to investigating the interpretability of machine learning methods and results. The integration of machine learning with established CMS models and materials science theories is considered to be the future trend for developing interpretable machine learning.

The progress of modeling studies of phase stability in alloys has been reviewed in this article using Sm-Co permanent magnetic materials as examples. The features of various traditional CMS methods for the modeling of phase stability were analyzed and compared, and machine learning studies in the framework of materials informatics were also demonstrated. Both the theories and techniques of materials informatics will develop rapidly through the integration of database technologies, high-throughput calculation models and machine learning methods around the core of materials science and engineering. The modeling of phase stability will be significantly advanced by the development of materials informatics, which will in turn accelerate materials design and processing and further advance the manufacturing of high-performance products.

The authors would like to thank Dr. H. Li (Faculty of Materials and Manufacturing, Beijing University of Technology) for providing access to CASTEP for part of the calculations.

Made substantial contributions to conception and design of this review, writing and editing: Song X

Made substantial contributions to literature collation, figure preparation and writing: Guo K

Performed data analysis, discussion and writing-review: Lu H, Tang F

Performed data acquisition and interpretation: Liu D

Not applicable.

This work was supported by the National Key Program of Research and Development (2018YFB0703902) and the National Natural Science Foundation of China (51631002).

All authors declared that there are no conflicts of interest.

Not applicable.

Not applicable.

© The Author(s) 2021.
