Neural network to predict 23 Na NMR spectra of Na n clusters

spectra


INTRODUCTION
Lithium-ion batteries (LIBs) are essential and used in various devices.The demand for rechargeable batteries is expected to increase as electric vehicles become common nowadays; however, owing to the risk of instability in the supply of raw materials, it is crucial to develop battery technologies that can be applied as an alternative to LIBs.
The abundance of sodium in the earth's crust is 23,600 ppm, which is significantly higher than that of lithium (20 ppm) [1] , making sodium-ion batteries (SIBs) an attractive alternative to LIBs.Hard carbon (HC) exhibits potential for application in the SIB anode.The charge-discharge process needs to be further explored [2,3] .Therefore, we used nuclear magnetic resonance (NMR) spectroscopy, one of the primary techniques to obtain information on the molecular structure of compounds, to improve the performance of HC anodes.The effect of HC preparation conditions on the amount of sodium inserted into HC anodes using solid 23 Na NMR chemical shifts has been experimentally analyzed [2][3][4] .The correlation of NMR chemical shifts of sodium clusters based on pore size and structure is a relevant technique to evaluate the size distribution of closed pores in HCs and other amorphous carbon materials.However, with regard to the computational cost, it becomes difficult to predict NMR chemical shifts of large sodium clusters based on the density functional theory (DFT).
In recent years, graph convolutional networks (GCNs) have been developed to predict the physical properties of various molecules by considering the atomic information on the vicinity of the bond based on convolution [4] .For example, NMR chemical shifts of small molecules, e.g., 1 H and 13 C, can be predicted using the message passing neural network, which is an enhanced version of the GCN model [5] .However, it is difficult to use the model that utilizes graphical information for sodium clusters since the graph information is not sufficient for them.On the contrary, SchNet [6,7] proposed by Schütt et al., is a deep learning model that uses the distance between atoms instead of the connection information of the graph.
This study investigates the possibility of using SchNet to predict the shielding constant of 23 Na NMR based on the structure of sodium clusters.First, a baseline model is created to verify the performance of the SchNet neural network model using Lasso regression, in which coordination numbers are learned as structural descriptors.The results of the DFT calculations are added to the coordination numbers as descriptors of the gross population to train the Lasso regression.Although SchNet has generally been used to predict molecular energies and forces on nuclei, formally, it can also be adapted to cluster models.Therefore, in this study, we use SchNet to create a model that predicts NMR shielding constants from the structure of Na clusters and evaluate the performance of this model by comparing it to Lasso regression.

Data set
The second-order bond length distribution algorithm (S-BLDA) [8] , an algorithm based on bond length distribution analysis, was used to generate random Na n (n = 14, 16, ..., 28, 30, 40) cluster structures.Parameters of S-BLDA were set to μ 1 = 3.50 Å, μ 2 = 3.60 Å, and σ 1 = σ 2 = 0.1 Å. S-BLDA creates clusters by adding one atom to the current structure each time.Atoms are added so that the distribution of the distance between each atom and its neighbors and the distribution of the distance between each atom and its next neighbor can approximate a normal distribution (mean μ 1 , standard deviation σ 1 ) and a normal distribution (mean μ 2 , standard deviation σ 2 ), respectively.These parameters were determined to reproduce the distribution of distances between atoms in the clusters that were structurally optimized by DFT calculations.
The NMR isotropic shielding constant σ iso for the generated structures was determined based on DFT calculations using Gaussian16 [9] .B3LYP [10,11] and 6-31G(d) were used for a functional and a basis function, respectively.The electronic wavefunction was described by an open-shell singlet state, and structures with eigenvalues of the spin operator S 2 less than 0.05 were used to avoid spin contamination, and the structure generation was repeated until 200 structures were obtained for each n.Na n (n = 14, 16, ..., 26), Na n (n = 28, 30), and Na n (n = 40) were used as the training, validation, and test sets.The neural network was trained to minimize the loss function of the validation set.

Lasso regression
With the atomic shielding constant as the objective variable [12] , Lasso regression was used.The running coordination number and/or gross orbital population were used as the explanatory variables.The running coordination number is the number of atoms located from the center of an atom to the radius r max .The delta function used to calculate the running coordination number is approximated by a normally distributed density function with σ = 0.01 Å.Therefore, the running coordination number can be a fraction.The values used for r max were r max = 3.1, 3.2, ..., 12.0 Å.These explanatory variables are highly correlated, and we expect that the use of Lasso regression will improve model stability and robustness by automatically performing feature selection, providing a sparse representation of the model and preventing over-training and multicollinearity.
The functions, 1S, 2S, 2P X , 2P Y , 2P Z , 3S, 3P X , 3P Y , 3P Z , 4S, 4P X , 4P Y , 4P Z , 5D XX , 5D YY , 5D ZZ , 5D XY , 5D XZ , and 5D YZ , in the 6-31G(d) basis were used as the gross orbital population.The explanatory variables were standardized such that the mean value of the shielding constant of the training set and the variance were 0 and 1, respectively.The α parameters of Lasso regression were each fitted to the training set with a model that had α = 10 -5 , 10 -4 ,...,10 4 , 10 5 and selected to minimize the RMSE against the validation set.

Neural network
SchNet, as implemented in SchNetPack 1.0.0 [13] , was used.It predicts the properties of molecules by combining the contributions of individual atoms; however, instead of combining contributions, this study uses the value of the contribution of each atom to predict the value of the atomic shielding constant.The "Number of features to describe atomic environments", "Number of filters used in continuous-filter convolution", and "Number of Gaussian functions used to expand atomic distance" were set to SchNet default values of 128, 128, and 25, respectively.The "Number of interaction blocks" and "Cutoff radii" were 2 and 6.0 Å, respectively.Shifted Softplus was used as the activation function for hidden NNs, and Adam optimizer [14] was used at a learning rate of 0.01.The mini-batch size was set to 100.A maximum of 2,000 epochs of training was carried out.As results significantly depend on the initial weights of the neural network, twenty different seeds to generate pseudo-random numbers were used, and the model with the best performance against validation was selected.

Loss function
Shielding constants exhibited a generalized logistic distribution irrespective of the number of atoms in the clusters [Figure 1].As this distribution is skewed to the left, if neural networks were trained with the mean squared error (MSE) as a loss function, the neural networks would focus on low values and neglect values around the most frequent value.Therefore, the shielding constant was used for the cumulative distribution function (CDF) of the generalized logistic distribution and the percent point function of the standard normal distribution.In other words, the probability distribution was transformed by adapting the inverse CDF of the standard normal distribution to the CDF of the generalized logistic distribution.
The generalized logistic distribution used parameters fitted to the distribution of shielding constants in the training set.The loss function was set as follows, with a term added to the MSE to reproduce the distribution.
where y is the shielding constant calculated using the DFT and is the shielding constant predicted by the neural network, .In addition, are the values of shielding constants sorted in the mini-batch.In this study, a = 3 was used.
In order to reproduce the shape of the distribution, it is necessary to output values corresponding to the tails of the distribution.However, since it is difficult to predict these values accurately, the predicted values for some samples may have large errors, which may worsen the RMSE.Additionally, when minimizing RMSE, samples that show values at the base of the distribution tend to be less important because they are few relative to other samples.Thus, there is a trade-off between reproducing the shape of the distribution and RMSE.

Lasso regression
We confirmed the relationship between the shielding constant and the gross orbital population.Regarding relations to the shielding constant, 3S was positively correlated, and 3P was negatively correlated [Figure 2A  and B].In contrast, 4S, which constitutes the 3s orbital, and 4P, which constitutes the 3p orbital, as well as 3S and 3P, show no correlation with the shielding constant [Figure 2C and D].Based on the relation between the coordination number and the gross orbital population [Figure 2E and F], when the coordination number is small, only 3S orbitals are found.When the coordination number starts to increase, the number of 3S orbitals decreases while that of 3P increases.However, when the coordination number increases further, the number of 3S starts to increase as well.In other words, the atoms near the surface have mainly 3S orbitals with high shielding constants; the interior atoms have both 3S and 3P orbitals with large shielding constants, and the atoms in between the coordination numbers tend to have 3P orbitals with smaller shielding constants.
We predicted the shielding constant using Lasso regression with the coordination number and gross orbital population as the explanatory variables [Figure 3].The RMSE was 37.6 ppm when using the coordination number [Figure 3A], and it was 35.1 ppm when using the gross orbital population [Figure 3C].As the NMR shielding constant is strongly influenced by the electrons in the s orbitals, it is assumed that these results reflect that the gross orbital population is effective in predicting the shielding constant.However, the coordination number can be calculated only from the structure, whereas the gross orbital population can be obtained only using DFT calculation.The RMSE of the model using these two types of explanatory variables simultaneously was 34.9 ppm, which hardly differed from that using the gross orbital population alone.Next, by transforming the shielding constant to a normal distribution, the peak position shifted to the right and approached the distribution peak provided by the DFT analysis.However, the RMSE deteriorated considerably.It is assumed that this is because the machine learning model learns to minimize the error in the shielding constant after the conversion by converting the shielding constant to a normal distribution.
From the analysis of the gross orbital population, the atoms near the surface have mainly 3S orbitals with high shielding constants, and the atoms in the interior have both 3S and 3P orbitals with large shielding constants; the atoms with coordination numbers in between tend to have 3P orbitals with small shielding constants.

Neural network
The neural network was trained, and its prediction performance was verified for the test set.First, the distribution of shielding constants calculated using the DFT for the test set Na 40 [Figure 4A] peaks around 600 ppm, which is similar to a generalized logistic distribution as illustrated by the Quantile-Quantile plot (Q-Q plot) [Figure 4B].The predicted values of the neural network were compared with the DFT-calculated values [Figure 5].The model trained with MSE as a loss function [Figure 5A and C] reproduced the average value but failed to output the most frequent value and shielding constant below 500 ppm.On the contrary, the loss function of Equation ( 1), introduced in this study [Figure 5B and D], deteriorated the root mean square error (RMSE); however, it reproduced the shape of the distribution of the shielding constant, and the position of the peak shifted to the right.No significant change was observed in the distribution of the predicted values when the shielding constant was converted to generate a normal distribution.
There were issues in both the Lasso regression and neural network models, as the peaks in the distribution of the predicted shielding constant did not coincide with those given by DFT.Therefore, the shielding constants were converted to a normal distribution, and the shielding constants were predicted after the conversion.As a result, in the Lasso regression model, the peak position shifted to the right and approached the distribution peak given by DFT.However, the peak position did not shift in the neural network model, showing that it had no effect.Figure 3A and B predict shielding constants by Lasso regression using only the simplest descriptor, coordination numbers.Figure 5A-D uses SchNet to predict shielding constants from information on more complex structures.Comparing these results, the RMSE of Figure 5A and C is smaller than that of the Lasso regression, indicating that the prediction performance is improved.Moreover, Figure 3C has the smallest RMSE in this case.This is due to the use of the gross orbital population from the DFT calculation as the descriptor.However, SchNet, which predicts shielding constants from structural information alone without using the results of DFT calculations, succeeded in predicting RMSE comparable to the smallest RMSE (see Figure 5A and C).
The large RMSE of these neural networks is expected to be caused by the difficulty in predicting shielding constants for low values, i.e., below 500 ppm.The relationship between the shielding constants and running coordination number at a radius of 4.0 Å indicates that such low values of shielding constants are obtained for Na atoms with intermediate coordination numbers ranging from 4 to 9 [Figure 6].The running coordination number represents the number of Na atoms located at radius r max from the center of a Na atom as the delta function used in the calculation is approximated using a normally distributed density function with σ = 0.01 Å, the running coordination number can be small.7A] [15] and the particle swarm optimization (CALYPSO) method [16] have been reported, both possessing a Na atom with coordination number 12 in the center.However, structures generated using the proposed method rarely exhibit high coordination numbers [Figure 6].Such an unstable cluster structure reduces the shielding constant, i.e., below 500 ppm.To solve this problem, the neural network can be trained using stable cluster structures as data are obtained using structural optimization.One cluster structure, which includes outliers, is extracted and analyzed.The cluster structures and their running coordination numbers are shown in Figure 7B, where Na atoms with shielding constants of 368.1 and 386.4 ppm are located in the range of 4-7.The cluster structures appear to include 9-10 coordinated Na atoms at most.The optimized cluster structure is shown in Figure 7C.The cluster structure is round (almost spherical), and one Na atom exhibits a coordination number of 12.The minimum value of the shielding constant of the cluster structure is 581.9 ppm, which indicates that the cluster structure does not include any Na atoms with a shielding constant outlier.Thus, the number of shielding constant outliers can be reduced using structural optimization.The reason for the wider hem in Figure 1 can be attributed to the fact that the clusters have not undergone structural optimization calculations.

Structures of Na 20 clusters optimized by Murrell-Mottram potential [Figure
If the neural network is trained based on the cluster structure, which was structurally optimized based on the DFT level, this neural network for unknown structures is required to calculate the DFT level every time.Therefore, structural optimization using the self-consistent-charge density-functional tight-binding method [17,18] , which incurs a low cost, can help reduce computational costs.DFTB is effective in generating structures for gold and silver clusters [19] .Moreover, Gupta potential [15,20,21] or Murrell-Mottram potential [15] , which have been globally used to optimize Na clusters, can be an alternative.The CALYPSO method has successfully explored the structures of medium-sized metal-doped boron clusters [22][23][24][25] and silicon clusters [26] .

CONCLUSION
This study used a SchNet-based neural network to predict shielding constants of Na clusters and structures generated using the S-BLDA method.
The RMSE of the model using the neural network was 35.2 ppm, which was almost the same as the 34.9 and 35.1 ppm obtained in the Lasso regression model that respectively used the running coordination number and gross orbital population, or only the gross orbital population as the explanatory variables.While the neural network model used only the structure data, the Lasso regression model used the gross orbital population predicted by the DFT calculation as an explanatory variable.Therefore, it is considered that this neural network model, using only the structure data, achieved comparable performance to that of the Lasso model, which uses gross orbital population correlated with the shielding constant.
The S-BLDA method exhibited a skewed distribution for shielding constants, yielding low shielding constants.This may be because the randomly generated non-equilibrium cluster structure tends to destabilize electronic states, resulting in unrealistic values of the shielding constant, which are extremely low.SchNet and Lasso were unable to reproduce such a distribution.However, by adding a penalty term to reproduce the original distribution as a loss function, the distorted distribution can be handled without changing the structure of the neural network.It became possible to output an extremely low shielding constant.
To further improve the prediction performance, it would be necessary to stabilize each structure to reduce the anomalies in the shielding constants and improve the performance of each system.Future research will be conducted in this direction.

Figure 1 .
Figure 1.Distribution of the shielding constant σ iso for cluster size n and the fitting curve of the generalized logistic distribution.

Figure 3 .
Figure 3. Distribution of the shielding constants predicted by Lasso, and scatter plots of the shielding constants predicted by Lasso vs. those calculated by DFT for the test set Na 40 .To handle shielding constants that follow a generalized logistic distribution, in (B) and (D), the shielding constants are once transformed to a normal distribution to train the Lasso model.When making predictions, the prediction values of the Lasso model were back-transformed into shielding constants based on the original generalized logistic distribution.On the other hand, in (A) and (C), the shielding constants are predicted directly.In terms of the explanatory variables, (A) and (B) use the coordination numbers calculated from the structure, and (C) and (D) use the gross orbital population calculated by DFT calculations.

Figure 4 .
Figure 4. Q-Q plot for (A) the distribution of shielding constants and (B) generalized logistic distribution for the test set Na 40 .

Figure 5 .
Figure 5. Distribution of shielding constants predicted by the neural network and the scatter plot of the shielding constants predicted by the neural network vs. the shielding constant calculated using the DFT based on the test set Na 40 .Two approaches were employed for handling the generalized logistic distribution of shielding constants.One approach is to convert the shielding constant to a normal distribution before training the neural network in (C) and (D).However, before plotting, the neural network predictions are converted to the original generalized logistic distribution.In contrast, in (A) and (B), the shielding constants are learned directly.The other approach used Equation (1) as the loss function in (B) and (D).In contrast, (A) and (C) employ only MSE as the loss function.

Figure 6 .
Figure 6.Relationship between the shielding constant and running coordination number at a radius of 4.0 Å for the entire data set.

Figure 7 .
Figure 7. Line plot of the structure of Na 20 and the running coordination number of each atom (represented by a different colored line) in the structure.(A) Stable structure predicted based on Murrell-Mottram potential [15] ; (B) structure generated in this study; (C) structure-optimized version of (B).(A) and (B) are outlier atoms with shielding constants of 368.1 and 386.4 ppm, respectively.