Composition genes in materials

High-performance materials always possess specific chemical compositions. The present work points out that the composition genes, which are the basic structural units that serve as the composition carriers, are actually the molecule-like chemical units. Friedel oscillations, in combination with the cluster-plus-glue-atom model, are fully presented to show how to uncover the composition genes hidden in chemical short-range orders in any material. Examples are given in three categories of materials, i.e., metallic alloys including solid solutions and metallic glasses, inorganic compounds as well as relevant glasses, and polymers. Furthermore, materials can be classified into single-, dual-, and multi-gene types. The proposition of composition genes facilitates the understanding of prevailing materials and can be a useful tool to guide the exploration of new composition space.


INTRODUCTION
Solids originate from strong chemical bonding between atoms and make a rich material world [1]. The commonly used materials are conventionally classified into metallic alloys, inorganic compounds (as well as relevant glasses), and polymers (as well as their hybrids). Their chemical compositions are expressed in different ways. Metallic alloys, for the convenience of large-scale industrial production, are generally expressed in mass percentages of elements, such as 304 stainless steel 06Cr19Ni10 (the percent is placed after the element) and brass 70Cu-30Zn (the percent is placed before the element), which are summarized into various industrial grades. Inorganic compounds are mostly represented by atomic fractions or chemical formulas, as exemplified by silica SiO 2 and ceramic TiN x (x represents nonstoichiometry). Polymers are composed of macromolecules formed by combining many monomers (repeating subunits), such as polyethylene (C 2 H 4 ) n and polyvinyl chloride (C 2 H 3 Cl) n . However, the structural origin of the materials' chemistry remains a long-standing mystery, which is especially important for metals and inorganic compounds that contain multiple elements, or alloyed materials. Even for polymers, the same question persists: is the monomer the composition unit of the macromolecule? In other words, what are the composition genes of materials? Here, the gene refers to the smallest structural unit that serves as the composition carrier of the material. Different from molecular substances, such as ice composed of H 2 O molecules connected via relatively weak inter-molecular bonding, solid materials are dominated by inter-atomic chemical bonding. In metals and inorganic compounds, inter-molecular bonding is completely missing, and it is impossible to define molecules.
The unit cell or atomic motif in crystallography, despite carrying the composition information, cannot be universally accepted as the chemical unit, because it carries no information on structural stability and cannot deal with short-range ordering. For example, for a Cu 3 Au structure, ordered or disordered alike, the unit cell always contains three Cu atoms and one Au atom, which is too small in comparison with the short-range ordering that extends to a few atomic shells. In polymers, the basic units within the macromolecules are still to be defined.
The composition genes of materials should represent the local short-range ordering, just as molecules do. Specifically, a composition gene should possess typical molecule-like characteristics, such as the chemical composition of the entire structure, charge balance (or stable electronic structure), local atomic configuration, and mean atomic density. We developed a new structural model for short-range order structures, the so-called cluster-plus-glue-atom model [2,3]. This model regards any structure from the viewpoint of a local unit composed of a central atom, its shell (the cluster part), plus a few atoms located at the next-neighbor sites (the glue part). In the following, we show that such local units are exactly the composition genes of materials. Termed chemical units, they are molecule-like and meet nearly all the conditions of molecules.
The method of using nearest-neighbor clusters as the basic structural units to characterize phase structures has a long history. As early as the 1970s, Mackay [4,5] proposed that nearest-neighbor coordination polyhedral clusters (hereinafter referred to as clusters) can be introduced into the structure description of complex phases, instead of the traditional crystallographic method using space group and atomic position information; thus, the characteristics of the material structures can be more effectively reflected, and even the relationship between the compositions and structures can be established. By the 1990s, it was pointed out that quasicrystals and their corresponding crystalline phases have the same electron concentrations [6]. Subsequently, a great deal of research on the electron behavior formed the basis for the proposition of the cluster-plus-glue-atom model, mainly intended for amorphous structures [7-9]. Based on the cluster-plus-glue-atom model, the composition genes of materials are the molecule-like chemical units, expressed as the cluster formula [cluster](glue atom) x , where x is the number of glue atoms. The cluster part is the local structure with the strongest interactions between atoms, and the glue atoms serve to maintain the charge neutrality of the structural unit and the average atomic density. Therefore, this structural picture contains structural stability information, which makes it fundamentally different from the unit cell concept in conventional crystallography. These composition genes exist in liquids, amorphous solids, solid solutions, and crystalline states. The composition design is made simple once the composition gene is known.
In the following, the molecule-like chemical units defined via the cluster-plus-glue-atom model are first presented, stressing their role as the composition genes in materials. Then, examples of the composition genes in various materials are given. Finally, a new classification method for materials according to the composition genes is proposed.

MOLECULE-LIKE CHEMICAL UNITS FROM SHORT-RANGE ORDERS
In a classical particle system composed of a large number of atoms, the most obvious manifestation of ordering is positional order, which means that the positions of atoms in different places are related [10] . If there is no correlation at all, the distribution of atomic positions is completely random, that is, the system is in a completely disordered state, such as the ideal gas; if the correlation range of atomic positions is limited to neighboring atoms, the system exhibits short-range order (SRO); and, if the correlation range reaches infinity, the system shows long-range order (LRO) [11] . Many materials we commonly use today are characterized by locally disordered states, or SROs, such as those based on solid solutions (e.g., steels) or glasses (e.g., amorphous silicates), in which local symmetry is partially destroyed. The traditional crystallography is no longer applicable in dealing with such locally disordered states. There is an urgent need to establish a theoretical model to describe SRO structures and their composition formulas, in order to define the composition genes hidden in them.
To describe the phenomenon of ordering in alloys, various theoretical approaches have been proposed. Bragg and Williams [12] introduced the LRO parameter, S, and devised a simple theory giving S as a function of temperature. Subsequently, Bethe [13] proposed the "order of neighbors" σ to express the difference between the probabilities of finding an unequal and an equal neighbor beside a given atom and determined the long-range and nearest-neighbor orders for the case of AB alloys. Later, Peierls [14] extended this theory to the case of the face-centered cubic (FCC) A 3 B alloys such as Cu 3 Au, and the results agree better with experiments than the theory of Bragg and Williams does. Similarly, the theory of Kirkwood [15] describes the order and disorder in solid solutions based on a direct evaluation of the crystalline partition function. With the advances in diffraction techniques, the quantitative determination of LRO and SRO parameters became possible. Cowley [16] measured the SRO parameters for the first ten shells of neighbors in Cu 3 Au using the single-crystal diffuse scattering technique. Furthermore, he proposed the SRO parameters α i to express the interaction of a given atom in an alloy with the atoms at the ith shell surrounding it, which are still widely used today [17-19].
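As an illustration of the shell-wise SRO parameters, the following minimal Python sketch evaluates them around an Au atom in perfectly ordered Cu 3 Au. It assumes the standard Warren-Cowley form α i = 1 - P i (Cu|Au)/c Cu , which the text does not spell out, and the function names (`species`, `neighbor_shells`, `warren_cowley`) are hypothetical helpers introduced only for this example.

```python
from itertools import product


def species(site):
    """Species on a perfectly ordered L1_2 Cu3Au lattice.

    Sites are integer triples in units of a/2; FCC sites have an even
    coordinate sum. Cube corners (all coordinates even) hold Au, and
    face centers hold Cu.
    """
    return "Au" if all(c % 2 == 0 for c in site) else "Cu"


def neighbor_shells(n_shells, max_index=4):
    """Return FCC neighbor vectors grouped into shells of equal distance.

    With max_index=4 every shell up to squared distance 16 is complete,
    which is more than enough for the first few shells used here.
    """
    by_d2 = {}
    for v in product(range(-max_index, max_index + 1), repeat=3):
        if any(v) and sum(v) % 2 == 0:
            by_d2.setdefault(sum(c * c for c in v), []).append(v)
    return [by_d2[d2] for d2 in sorted(by_d2)][:n_shells]


def warren_cowley(center, c_cu=0.75, n_shells=3):
    """alpha_i = 1 - P_i(Cu | Au) / c_Cu around the Au atom at `center`."""
    alphas = []
    for shell in neighbor_shells(n_shells):
        n_cu = sum(
            species(tuple(c + d for c, d in zip(center, v))) == "Cu"
            for v in shell
        )
        alphas.append(1.0 - (n_cu / len(shell)) / c_cu)
    return alphas


alphas = warren_cowley(center=(0, 0, 0))
for i, a in enumerate(alphas, start=1):
    print(f"alpha_{i} = {a:+.3f}")
```

In this fully ordered limit the first shell contains only unlike neighbors (negative α 1 ) and the second shell only like neighbors (α 2 = 1); in a random solid solution all α i would be zero.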
The physical root of ordering in classical particle systems lies in the interaction between particles, while this interaction itself is a quantum mechanics problem. With the development of quantum mechanics, people began to pay more attention to the electron fluctuation in the process of studying the structures of metals and alloys. By the 1950s, the advent of Friedel oscillation theory [20][21][22] had revolutionized the understanding of the structure of matter. This theory is derived from Friedel's early research on the distribution of electrons around impurities in monovalent metals [20] . Subsequently, Friedel [21] extended his research into the electronic structure of primary solid solutions in metals and proposed that it is simply a question of solving a Schrödinger equation. He specifically considered the charge screening phenomenon generated by the introduction of small and strong perturbations in a uniform potential field and described the collective behavior of conduction electrons in metallic alloys [22] . Due to this collective behavior, when an impurity charge is introduced into the uniform potential field, the electron cloud around the impurity charge is polarized, thereby shielding the disturbance of the impurity charge to the whole system. However, this shielding is incomplete. The disturbance in the short range is not completely shielded, and the electron density around the impurity shows an oscillating distribution. This phenomenon, similar to water waves, is known as Friedel oscillations.
Following Friedel, Langer and Vosko [23], Heine and Weaire [24], Harrison [25], and Ziman [26] also conducted specific studies on the shielding potential energy generated by the introduction of impurity charges, and their calculations corroborate one another: the effective pair potential $\varphi_{\mathrm{eff}}(r)$ at distance $r$ is proportional to a damped cosine function, $\varphi_{\mathrm{eff}}(r) \propto \cos(2k_F r + \theta)/r^3$. Here, $\theta$ is the phase shift, which ranges from zero to $\pi/2$ and is related to the scattering amplitude at the Fermi level [24], and $k_F$ is the Fermi wave vector. For liquid and amorphous states, at short and medium distances, Häussler [27] experimentally verified that the phase shift $\theta$ is equal to $\pi/2$, and Kroha et al. [28] also obtained $\theta = \pi/2$ through theoretical calculations. Therefore, the effective potential of Friedel oscillations is written as $\varphi_{\mathrm{eff}}(r) \propto -\sin(2k_F r)/r^3$, as shown in Figure 1 [29]. To reach a stable state, atoms tend to be located near the mid-points of the negative potential zones in Friedel oscillations, at the distances $r_n = (1/4 + n)\lambda_{Fr}$, where $\lambda_{Fr} = \pi/k_F$ is the Friedel wavelength and $n = 1, 2, 3, \ldots$ is the shell sequence, so as to minimize the total energy of the system [29]. This arrangement of atoms in shells spaced by one Friedel wavelength is the so-called spherical-periodic order, which occurs due to the resonance between the electronic and static atomic structures.
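The shell positions quoted above can be checked numerically. The sketch below is only an illustration: the value of the Fermi wave vector is a hypothetical placeholder, the potential is evaluated up to an unknown constant prefactor, and the function names are not taken from the cited works.

```python
import math


def friedel_wavelength(k_f):
    """Friedel wavelength, lambda_Fr = pi / k_F."""
    return math.pi / k_f


def phi_eff(r, k_f):
    """Effective pair potential up to a constant factor: -sin(2 k_F r) / r^3."""
    return -math.sin(2.0 * k_f * r) / r ** 3


def shell_radii(k_f, n_shells=3):
    """Shell positions r_n = (1/4 + n) * lambda_Fr for n = 1, 2, ..."""
    lam = friedel_wavelength(k_f)
    return [(0.25 + n) * lam for n in range(1, n_shells + 1)]


k_f = 1.5  # hypothetical Fermi wave vector, in inverse length units
lam = friedel_wavelength(k_f)
for n, r in enumerate(shell_radii(k_f), start=1):
    # At r_n the oscillating factor -sin(2 k_F r) equals -1, i.e. the
    # mid-point of a negative (attractive) zone of the potential.
    print(f"n={n}: r_n = {r / lam:.2f} lambda_Fr, phi_eff = {phi_eff(r, k_f):+.5f}")
```

Because of the 1/r^3 envelope, the true minima of the potential lie slightly inside these mid-points, which is consistent with the statement that atoms sit near, rather than exactly at, the mid-points.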
The compositions of materials come from chemical SRO, which is related to Friedel oscillations. According to the Friedel-oscillation-based spherical-periodic resonance theory, the molecule-like chemical unit covering only the nearest-neighbor cluster plus a few next-neighbor glue atoms can be rationalized [3]. Since the function forms of the charge distribution and atomic density distribution are consistent with the effective potential function, it is easy to obtain the charge-neutral and mean-density radial distances by integrating the function $-\sin(2k_F r)/r^3$. The first of such positions falls near $1.76\lambda_{Fr}$, close to the mid-point $1.75\lambda_{Fr}$ in the first positive potential zone of Friedel oscillations [30], defining the size of the corresponding composition gene. Generally, the cluster part can be derived from the homologous crystalline phase structure, and the glue atom part can be further calculated according to the chemical unit as well as the actual composition, from which the composition gene can be ultimately determined. Figure 2 exhibits a typical cluster-plus-glue-atom composition gene relevant to FCC structure, whose cluster configuration is a cuboctahedron, and a few glue atoms are located outside the cluster.
For binary FCC-based solid solution alloys, the coordination number of the cluster is 12, and the glue-atom shell in the next neighborhood contains 1-5 atoms. Thus, the chemical unit of a binary system can be expressed as [A-B 12 ]A x B y , containing 14-18 atoms, where the integer x + y represents the number of glue atoms, 0 < x + y < 6. It is assumed that the volume of each element remains unchanged after mixing [31], so the chemical unit volume is the sum of all atomic volumes [the volume equation and the steps leading from it are missing in the source]. The result is x + y = 3 for an FCC solid solution alloy composed of solute and solvent atoms with equal atomic radii, which is equivalent to a single-element FCC structure or a completely disordered FCC structure composed of one average atom; the composition gene then consists of 16 atoms (one central atom, 12 shell atoms, and three glue atoms).
In addition to solid solution alloys, the composition genes of other materials can also be identified, such as amorphous alloys, inorganic compounds, glasses, polymers, etc. First, the clusters are derived from relevant phases of known structures (for amorphous alloys and glasses, they are devitrification phases); then, combined with the cluster-resonance model [32,33], the electronic factor is introduced to address the structural stability (the electron number per unit e/u being a multiple of 8, conforming to the octet rule), so as to determine the glue atoms. In the calculation process of the composition genes, the chemical compositions, spatial configurations, charge neutrality, and average atomic densities are taken into account.

COMPOSITION GENES IN MATERIALS
In this section, the molecule-like chemical units as the composition genes in various materials, covering metallic alloys, inorganic compounds and relevant glasses, and polymers, are examined via examples.

Metallic alloys
Metallic alloys generally consist of solid-solution-based industrial alloys and metallic glasses.

Solid-solution-based industrial alloys
Austenitic stainless steels are the most widely used type of stainless steels, most of which are derived from the 304 grade, or 06Cr19Ni10, whose composition (in weight percent, wt.%) is C ≤ 0.08, Si ≤ 1.00, Mn ≤ 2.00, P ≤ 0.045, S ≤ 0.030, Cr: 18.0-20.0, Ni: 8.0-11.0 [34]. Due to the austenite structure, the cluster is a cuboctahedron with the coordination number of 12 (Figure 2). The elements in substitutional solution of austenite have approximately equal atomic radii, so the number of glue atoms can be determined as 3 by using the calculation method for composition genes mentioned above. Thus, the composition gene of 304 stainless steel can be determined to be (Cr,Si) 3-3.5 -(Ni,Mn) 1.25-1.75 -Fe 10.75-11.75 , which covers the composition zones as specified in different industrial standards. This formula contains at least three Cr atoms, which signifies that Cr 3 is the minimum quantity required to guarantee passivation of the steel. Similarly, the composition gene of 316 stainless steel is (Cr,Mo,Si) 3-3.5 -(Ni,Mn) 1.75-2.25 -Fe 10.25-11.25 . The major difference from that of 304 lies in higher Ni content and hence lower C content, to minimize sensitization (the tendency of Cr-rich carbide precipitation that deteriorates corrosion resistance). The complex chemistries of stainless steels are now made clear and concise. We can also see that these steels have large composition tolerance, as each steel actually corresponds to two cluster formulas, which is a significant advantage of steels. This is in clear contrast to Ni-based superalloys whose formulas are usually unique, which makes their preparation intrinsically difficult.
Figure 2. Configuration of the cluster-plus-glue-atom composition gene relevant to FCC structure, with the cuboctahedron representing the nearest-neighbor cluster and the scattered yellow atoms representing the glue atoms located at the next neighbors.
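Since industrial grades are specified in weight percent while the composition gene is an atomic formula, a conversion between the two is useful. The following is a minimal sketch under stated simplifications: the (Cr,Si) and (Ni,Mn) sites are treated as pure Cr and pure Ni, the two 16-atom units `low_cr` and `high_cr` are hypothetical picks within the quoted formula ranges, and the atomic masses are standard rounded values.

```python
ATOMIC_MASS = {"Cr": 51.996, "Ni": 58.693, "Fe": 55.845}  # g/mol


def weight_percent(formula):
    """Convert atom counts per chemical unit into weight percentages."""
    masses = {el: n * ATOMIC_MASS[el] for el, n in formula.items()}
    total = sum(masses.values())
    return {el: 100.0 * m / total for el, m in masses.items()}


# Two hypothetical 16-atom units at the ends of the Cr range of the
# 304 formula (Si and Mn are ignored in this simplified illustration).
low_cr = {"Cr": 3.0, "Ni": 1.5, "Fe": 11.5}
high_cr = {"Cr": 3.5, "Ni": 1.5, "Fe": 11.0}

for unit in (low_cr, high_cr):
    wt = weight_percent(unit)
    print(", ".join(f"{el} {w:.1f} wt.%" for el, w in wt.items()))
```

Under these simplifications the two units give roughly 17.6 and 20.6 wt.% Cr with about 9.9 wt.% Ni, so the formula range brackets the 18.0-20.0 wt.% Cr window of the 304 specification.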

Metallic glasses
Metallic glasses are characterized by short-range ordering. Good glass-formers conform to well-specified compositions. The steps of composition analysis and alloy design based on the cluster-plus-glue-atom model have been summarized [40,41], and the resulting cluster formulas are exactly the composition genes. Here, the metallic glasses in Cu-Zr and Ni-(Nb,Ta) systems are taken as examples to illustrate the calculations of composition genes, and the calculated results are listed in Table 1. The three Cu-Zr bulk metallic glass compositions with high glass-forming ability are Cu 64 Zr 36 , Cu 56 Zr 44 , and Cu 50 Zr 50 [42]. To interpret the composition of Cu 64 Zr 36 , the deep eutectic point Cu 0.618 Zr 0.382 is selected according to the phase diagram [43]. Then, the crystalline phase Cu 8 Zr 3 is determined to be the devitrification phase due to the structural homology [44,45], which generates the icosahedral cluster [Cu-Cu 7 Zr 5 ]. Finally, one Cu atom is added as the glue atom, so the cluster formula [Cu-Cu 7 Zr 5 ]Cu ≈ Cu 64.3 Zr 35.7 is obtained, with e/u ≈ 24. For the bulk metallic glass Cu 56 Zr 44 (exactly the eutectic point), two eutectic phases, Cu 10 Zr 7 and CuZr, are related. Thus, the dual-cluster formulation is introduced to decipher this alloy [46], and its composition gene is calculated to be [Zr-Cu 10 … [the remainder of this formula and the text that follows it are missing in the source]. Besides, the multi-element eutectic-type bulk metallic glasses are always derived from binary eutectic alloys; thus, their composition genes usually stem from multiple phases [49].
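The step from a cluster formula to an atomic-percent composition is simple bookkeeping, sketched below for the [Cu-Cu 7 Zr 5 ]Cu unit. The representation of a unit as a center, a shell, and a glue part, and the helper names `unit_composition` and `atomic_percent`, are assumptions made for this illustration only.

```python
from collections import Counter


def unit_composition(center, shell, glue):
    """Total atom counts of a [center-shell]glue chemical unit."""
    total = Counter({center: 1})
    total.update(shell)  # Counter.update adds counts
    total.update(glue)
    return total


def atomic_percent(counts):
    """Atomic percentages from atom counts per unit."""
    n_atoms = sum(counts.values())
    return {el: 100.0 * n / n_atoms for el, n in counts.items()}


# [Cu-Cu7Zr5]Cu: central Cu, icosahedral shell Cu7Zr5, one Cu glue atom
unit = unit_composition("Cu", {"Cu": 7, "Zr": 5}, {"Cu": 1})
at = atomic_percent(unit)
print(sum(unit.values()), {el: round(p, 1) for el, p in at.items()})
# prints: 14 {'Cu': 64.3, 'Zr': 35.7}
```

The 14-atom unit Cu 9 Zr 5 reproduces the Cu 64.3 Zr 35.7 composition quoted above, close to the experimental glass former Cu 64 Zr 36 .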

Inorganic compounds
A compound is a pure substance composed of two or more different elements, usually in simple proportions. Compounds are generally classified as stoichiometric and non-stoichiometric ones. In this part, SiO 2 and TiN x are chosen as typical examples to unveil the composition genes hidden in inorganic compounds.
Silica SiO 2 is the most popular inorganic material. The outer electron configuration of Si is 3s 2 3p 2 , so that this atom provides four valence electrons to form chemical bonding. The O atom has an outer electron configuration of 2s 2 2p 4 , giving six valence electrons. Therefore, one Si atom attracts four O atoms through four covalent sp 3 -type bonds, forming a tetrahedron with the Si atom as the center and O atoms as the nearest neighbors, i.e., the cluster [Si-O 4 ]. However, [Si-O 4 ] itself does not satisfy charge balance, and four more electrons are needed to reach the octet state. The [Si-O 4 ] cluster should be linked to one more Si atom as the glue atom to obtain enough electrons, thus maintaining the charge neutrality. Therefore, the molecule-like chemical unit as the composition gene in SiO 2 is determined to be [Si-O 4 ]Si, with e/u = 32 [50], as shown in Figure 3. Actually, this is also the composition gene of all silicate glasses. Obviously, the composition gene [Si-O 4 ]Si is different from the chemical formula SiO 2 .
The compounds TiN x are typical examples of non-stoichiometry. The structure can be viewed as N atoms in interstitial solution in an FCC Ti lattice, forming the nearest-neighbor coordination octahedral cluster [N-Ti 6 ]. The composition gene for the stoichiometric TiN compound is [N-Ti 6 ]N 5 . However, theoretically, it is revealed that the hardest (or the most stable) compound is TiN 0.8 [50], which corresponds to an N-deficient formula [N-Ti 6 ]N 3.8 , with e/u = 48.
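The e/u values quoted for these two compounds can be reproduced by summing valence electrons over the chemical unit. The sketch below assumes the valence electron numbers Si = 4 and O = 6 given in the text, plus Ti = 4 and N = 5, which are assumptions chosen to be consistent with the quoted e/u = 48; the helper `electrons_per_unit` is hypothetical.

```python
# Valence electrons per atom (Ti and N values are assumptions, see lead-in)
VALENCE = {"Si": 4, "O": 6, "Ti": 4, "N": 5}


def electrons_per_unit(counts):
    """Sum of valence electrons over all atoms in one chemical unit (e/u)."""
    return sum(VALENCE[el] * n for el, n in counts.items())


units = {
    "[Si-O4]Si": {"Si": 2, "O": 4},      # central Si + 4 O + 1 glue Si
    "[N-Ti6]N3.8": {"Ti": 6, "N": 4.8},  # central N + 6 Ti + 3.8 glue N
}

for name, counts in units.items():
    print(f"{name}: e/u = {electrons_per_unit(counts):.0f}")
```

Both values are multiples of 8, in line with the octet-rule criterion used to fix the number of glue atoms; the N-deficient unit also gives the ratio N/Ti = 4.8/6 = 0.8, i.e., TiN 0.8 .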

Inorganic glasses
The advent of silicate glasses is traced back to Egyptian times, when people accidentally found that silica mixed with soda easily forms glasses. Soda-lime silicate glasses remain the most widely used category for their easy production and low cost, occupying about 90% of the glass market. However, as with industrial metals, their compositions were developed through tedious trial-and-error attempts. The silicate glass compositions should be hidden in the amorphous structure, of which the modeling is a big challenge for the scientific community even today. The widely recognized random network model of silicate glasses was proposed by Zachariasen [51] in 1932. The major assumption is that the coordination number centered by a [the remainder of this passage is missing in the source]
It is also noted that the commercial glasses all have cation valences slightly above three; this is to facilitate industrial-scale production control, as any composition with a cation valence below three would suffer from low structural stability and thus from easy crystallization.
Cr-doped amorphous carbon has a diamond-like structure and is widely used in protective coatings. Diamond-like amorphous carbon is mainly composed of sp 2 and sp 3 bonds. Cr-doping is generally adopted to increase sp 2 content and hence the electrical conductivity. However, the optimal Cr-doping content is not known. According to the cluster-plus-glue-atom model, the composition gene of stable Cr-doped diamond-like carbon is [Cr-C 6 ]CrC or [Cr-C 6 ]Cr 3 [50], which provides a theoretical direction for the coating industry.

Polymers
Polymers are basically molecular materials so that in principle their compositions are not of major concern for the researchers: the materials' chemistry is identical to that of the macromolecules composing the polymers. For example, polyethylene, the most common plastic in use today, is made of the monomer ethylene C 2 H 4 connected in a chain, so that the composition of polyethylene is also that of the monomer. However, the macromolecules are so huge that there is a need to further identify the basic units, which is similar to the identification of chemical units in well-ordered inorganic compounds. Here, polyethylene is taken as the example to define the composition gene in a long-chained macromolecule. Polymerization of ethylene to polyethylene is described by n CH 2 =CH 2 (gas) → [-CH 2 -CH 2 -] n (solid). From this equation, one may have the impression that the repeating unit is a pair of methylene groups. However, in terms of the cluster-plus-glue-atom model, the composition gene of this long chain (C 2 H 4 ) n is [C-H 2 C 2 ]H 4 = C 3 H 6 , with e/u = 18. Likewise, the macromolecules of polypropylene (C 3 H 6 ) n and polyvinyl chloride (C 2 H 3 Cl) n , the world's second- and third-most widely produced synthetic plastic polymers, respectively, are deciphered into the composition genes [C-H 1 C 3 ]H 7 = C 4 H 8 and [C-H 1 Cl 1 C 2 ]H 4 = C 3 H 5 Cl 1 , both having e/u = 24.
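The e/u values quoted for the three polymer genes follow from a simple valence-electron count. The worked arithmetic below assumes the usual valence electron numbers C = 4, H = 1, and Cl = 7, which the text does not state explicitly.

```latex
\begin{align*}
\mathrm{C_3H_6}:   &\quad e/u = 3 \times 4 + 6 \times 1 = 18 \\
\mathrm{C_4H_8}:   &\quad e/u = 4 \times 4 + 8 \times 1 = 24 \\
\mathrm{C_3H_5Cl}: &\quad e/u = 3 \times 4 + 5 \times 1 + 1 \times 7 = 24
\end{align*}
```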
In terms of composition genes, materials can be simply classified into single-, dual-, and multi-gene types. Single-gene materials refer to the materials consisting of only one composition gene, such as single-crystal Ni-based superalloys, most binary bulk metallic glasses, single-phase inorganic compounds, and single-monomer polymers. Dual-gene materials are the ones composed of two composition genes, including eutectic alloys, most industrial alloys, amorphous-crystalline dual-phase alloys, etc. Multi-gene materials are made up of three or more composition genes, such as glasses that are mixtures of several units and multi-monomer polymers.
The concept of composition gene proposed here can guide material design, as the genes directly provide the composition formulas, which can greatly improve the efficiency of material research and development. The applications of such an approach in alloy optimization and development have been exemplified in a number of alloys (cf. Ti alloys [30,39], maraging stainless steels [35], high-entropy alloys [36,37], Cu alloys [38], metallic glasses [40,41,47], eutectics [46,48]).

CONCLUSIONS
In summary, molecule-like chemical units as composition genes of materials are proposed based on the cluster-plus-glue-atom model, reflecting the chemistries, characteristic short-range ordering, electronic structure stability, and overall atomic densities of the materials. The compositions of various materials, including metallic alloys, inorganic compounds, and polymers, are analyzed and deciphered into different kinds of genes in terms of cluster formulas, expressed as the nearest-neighbor clusters plus a few next-neighbor glue atoms. Materials are thus classified into single-, dual-, and multi-gene types.

Authors' contributions
Conducted the calculations, analyzed the data, and wrote the paper: Zhang S
Participated in the discussions about the cluster models: Wang Q
Conceived the ideas, proposed the cluster model theory, and revised the manuscript: Dong C