Inversion of soil properties with hyperspectral reflectance in construction areas of high-standard farmland

High-standard farmland construction is an important process that can enhance food security and accelerate new-style agricultural modernization. Hyperspectral remote sensing can provide data and technical support for this type of construction and a reference when optimizing high-standard farmland construction areas. This study was performed in Xinzheng City, a primary grain-producing area in Henan Province. Field sampling and indoor hyperspectral spectroscopy (350~2500 nm) were combined; spectral transformations such as continuum removal (CR) were performed after Savitzky-Golay (SG) convolution smoothing; and the best hyperspectral bands were selected as the common index of the soil properties by correlation analysis and the fuzzy clustering maximum tree method. A hyperspectral inversion model was built as a fixed-effects variable-coefficient panel data model estimated by ordinary least squares (OLS), using panel data describing pH, organic matter, nitrogen, phosphorus, potassium, iron, chromium, cadmium, zinc, copper, and lead of 116 samples in Xinzheng City. Results show that the panel data model is of good quality overall, with a high goodness of fit ($\bar{R}^2$ = 0.9991, F = 2195.67). The precision test results indicate that the models performed well at both description and prediction, including accurate quantification, with an RPD above 2.5. Thus, the proposed model provides an important basis for soil information management and resource evaluation, and a reference when optimizing high-standard farmland construction processes.


INTRODUCTION
Core areas for grain production have become a national strategy in China. Henan Province is a critical area of grain production, and an important bottleneck that restricts sustainable agricultural development in Henan is its high population density, which produces a serious shortage of reserve cultivated land resources and an overall low quality of cultivated land. Therefore, strengthening high-standard farmland construction is important to implement China's national strategy and promote agricultural production.
High-standard farmland refers to basic farmland with centralized contiguity, supporting facilities, high and stable yields, good ecology, and strong disaster resistance formed through rural land consolidation and construction in a certain period, which is compatible with modern agricultural production and operation modes (Ministry of Land and Resources of the People's Republic of China, 2012). To date, most studies of high-standard farmland construction address construction demarcation (Zhang et al., 2018; Li et al., 2019; Dong et al., 2020), potential evaluation (Cai et al., 2019; Li et al., 2018), suitability evaluation (Tang et al., 2019; Chen et al., 2019; Zhang et al., 2020), construction sequence and mode zoning (Li et al., 2020; Wang et al., 2021; Zeng et al., 2018), and project implementation and effect evaluation (Ma et al., 2018; Xiong et al., 2019; Wang et al., 2018). The standard for high-standard farmland construction requires, among other things, that "the quality of cultivated land after completion reaches the higher level of the county"; therefore, field projects in high-standard farmland construction require that the soil quality reach the higher level of the region. Thus, the amount of fertilizer required should be determined from the soil nutrient status, including soil nitrogen, phosphorus, potassium, medium and trace elements, organic matter content, soil acidification, and salinity. The soil should also be regularly monitored for other conditions, and the fertilization should be constantly adjusted based on real-world measurements (The Ministry of Agriculture of the People's Republic of China, 2012).
During high-standard farmland construction, in addition to land levelling and the improvement of supporting facilities such as roads and ditches, the rapid acquisition and real-time monitoring of basic soil information must be strengthened. However, most research methods use quantitative inversion models for single soil properties, e.g., partial least squares regression (PLSR) (Wei et al., 2020; Bian et al., 2021; Li et al., 2021; Yin et al., 2021), back propagation neural networks (BPNN) (Yu et al., 2021), random forest (RF) (Schreiner et al., 2021; Chen et al., 2021), and support vector machine (SVM) (Shi et al., 2021; Xu et al., 2020; Selvakumar et al., 2021). With these methods, inversion models of soil properties and spectral reflectance must be built one by one, and their calculation is complex and time-consuming.
A panel data model can build a three-dimensional data model that concurrently considers the various soil properties, the multiple sample points and the hyperspectral characteristic band values. With such an inversion model, multiple soil properties are considered at once, model calculation is simpler, and both the relationships among soil properties and the influence of hyperspectral characteristic band values on each soil property can be determined. Therefore, it is necessary to study the rapid and nondestructive testing of soil attribute information in high-standard farmland construction areas to provide technical support for the rapid acquisition and real-time monitoring of soil properties, and to support the optimization of high-standard farmland construction areas.
This study considered the high-standard farmland construction area of Xinzheng City and attempted to establish a comprehensive hyperspectral inversion model of cultivated land soil properties using a panel data model, estimate the influence of hyperspectral characteristic band values on each soil attribute, and predict the content of each soil property, in order to provide theoretical and technical support for the rapid acquisition and real-time monitoring of soil properties in high-standard farmland construction areas.

Overview of the researched area
Xinzheng City is located in the central part of Henan Province in China, which is a transition zone from the North China Plain and western Henan Mountains to the eastern Henan Plain; and is the core of the Central Plains economic zone under Zhengzhou City, which is located at 34° 16'~34° 39' N, 113° 30'~ 113° 54' E. Xinzheng City has a total population of 653,000 and, in 2019, had jurisdiction over 9 towns, 27 townships, 3 streets, 253 administrative villages, 921 natural villages and 24 residential areas.
According to a survey of land-use status in 2013, the total land area of Xinzheng City is 884.5915 km², and the cultivated land is 521.7641 km², accounting for 58.59 % of the total land area. The total annual grain output is 273148 Mg. According to the Integrated Land-use Planning of Xinzheng City (2010-2020), the protection index of prime farmland in Xinzheng City is 427.73 km². There are various soil types, primarily cinnamon soil, tidal soil and aeolian sand soil. The terrain is high in altitude in the west and low in the east, with shallow hills in the west, plains in the east and hills in the northwest.

Field collection of soil samples
According to the soil type, topographic characteristics and spatial variation of the study area, and considering the integrity of administrative units (towns or villages as units), sampling points were laid out in a 2 × 2 km regular grid pattern, and the sampling depth was 0.00-0.30 m from the soil surface. A total of 154 soil samples were collected, and foreign bodies, such as plant roots, stem residues, and brick and tile fragments, were removed, as shown in figure 1. After natural air drying, grinding and passing through a 1-mm sieve, the samples were divided into four parts using the quartering method, in duplicate: one part was used to determine each sample's physical and chemical properties in the laboratory, and the other part was used to determine the soil spectrum.
Soil properties measured in this study were soil pH, organic matter, nitrogen, phosphorus, potassium, iron, chromium, cadmium, zinc, copper, and lead. They were measured using the regional geochemical sample analysis method (Ministry of Land and Resources of the People's Republic of China, 2012). To ensure a high-quality analysis, national geochemical standard samples were used for quality control.

Laboratory test of sample spectra
This study measured soil spectral reflectance with an ASD spectrometer on treated soil samples under indoor conditions. The spectrum measurement instrument is an ASD FieldSpec 3 spectrometer produced by ASD, USA. The spectral range was 350-2500 nm with a sampling interval of 1.4 nm for 350-1000 nm, a sampling interval of 2 nm for 1000-2500 nm, and a resampling interval of 1 nm. Before spectral measurement, the soil sample was placed in a sample dish set on a black rubber pad with a reflectivity of approximately 0, and the soil surface was scraped flat in one direction along the edge of the dish with a ruler. A halogen lamp with a power of 50 W was used as the light source, and a probe with a view angle of 25° and a light incidence angle of 45° was used. The distance to the light source was 15 cm, and the distance to the probe was 15 cm. To reduce the influence of anisotropy on the soil sample spectra, the sample plate was turned three times during measurement, each time with a rotation angle of approximately 90°, so that the soil sample spectra were measured in four directions. Reference plate calibration was performed before and after each target spectrum acquisition and repeated five times for 20 times. ViewSpec Pro software was then used to obtain the average spectral reflectance from the original reflectance values. Because the spectral data near the two ends of the measured range are unstable, we removed the data at 350-399 and 2401-2500 nm, which are strongly affected by external noise.

Fuzzy clustering maximum tree method
Fuzzy theory was developed on the mathematical basis of fuzzy set theory established by American cybernetics expert Professor L.A. Zadeh in 1965 and has been widely used in mathematics and many other fields (Liu et al., 2004). Fuzzy clustering analysis is a multivariate technique that classifies objective things using fuzzy mathematics, establishing similarity relations according to the characteristics, similarity and degree of affinity of the objects (Li et al., 1994; Wang et al., 1983). Because the classification of reality is often accompanied by fuzziness, fuzzy clustering theory is more consistent with objective reality.
The basic steps of fuzzy clustering analysis using the maximum tree method are as follows:

(1) Establish the sample set matrix
Assuming that the sample set is $X = \{x_1, x_2, \ldots, x_n\}$, in which n is the number of samples, and each sample has an m-dimensional vector representation $x_i = (x_{i1}, x_{i2}, \ldots, x_{im})$ (i.e., each sample has m indicators), then:

$$X = \begin{pmatrix} x_{11} & x_{12} & \cdots & x_{1m} \\ x_{21} & x_{22} & \cdots & x_{2m} \\ \vdots & \vdots & \ddots & \vdots \\ x_{n1} & x_{n2} & \cdots & x_{nm} \end{pmatrix}$$

Rev Bras Cienc Solo 2023;47:e0230033

(2) Establish the fuzzy similarity matrix
According to the given sample characteristic data, the correlation coefficient method is used to establish the fuzzy similarity matrix $R = (r_{ij})_{n \times n}$, in which $r_{ij}$ is the similarity coefficient between samples $x_i$ and $x_j$.

(3) Maximum tree generation
With the points $x_i$ of the set of classified objects as vertices and $r_{ij}$ in the fuzzy similarity matrix R as edge weights, edges are added in descending order of weight, requiring that no loop (i.e., circle) is formed, until all vertices are connected. The resulting special graph is called the maximum tree, which may not be unique.

(4) Clustering
We then select the appropriate threshold λ, cut off the branches with weight $r_{ij} < \lambda$, and obtain an unconnected graph. Each connected branch constitutes one class at level λ, and the number of branches is the number of categories.
In this study, the fuzzy similarity coefficient between the correlation coefficient curves of soil properties and spectral indices was calculated by a systematic clustering method, and a fuzzy similarity matrix was constructed to determine the similarity of the correlation coefficients between different soil properties and spectral indices. On this basis, the common hyperspectral inversion bands of different soil properties were determined by the maximum tree classification method.
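The four steps above can be sketched in a few lines of Python. This is only an illustrative sketch, not the authors' implementation: it assumes that the similarity coefficient is the plain Pearson correlation between two sample vectors, and the function names `pearson`, `find_root` and `max_tree_clusters` are hypothetical.

```python
import math


def pearson(u, v):
    """Pearson correlation, used here as the similarity r_ij (an assumption)."""
    mu, mv = sum(u) / len(u), sum(v) / len(v)
    num = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    den = math.sqrt(sum((a - mu) ** 2 for a in u)) * math.sqrt(sum((b - mv) ** 2 for b in v))
    return num / den


def find_root(parent, i):
    """Union-find lookup with path halving."""
    while parent[i] != i:
        parent[i] = parent[parent[i]]
        i = parent[i]
    return i


def max_tree_clusters(samples, lam):
    n = len(samples)
    # Step 2: similarity of every pair, sorted in descending order of weight.
    edges = sorted(
        ((pearson(samples[i], samples[j]), i, j) for i in range(n) for j in range(i + 1, n)),
        reverse=True,
    )
    # Step 3: add edges from heaviest to lightest, skipping any that would close a loop.
    parent = list(range(n))
    tree = []
    for w, i, j in edges:
        ri, rj = find_root(parent, i), find_root(parent, j)
        if ri != rj:
            parent[ri] = rj
            tree.append((w, i, j))
    # Step 4: cut branches with weight below lam; connected pieces are the classes.
    parent = list(range(n))
    for w, i, j in tree:
        if w >= lam:
            parent[find_root(parent, i)] = find_root(parent, j)
    groups = {}
    for i in range(n):
        groups.setdefault(find_root(parent, i), []).append(i)
    return sorted(groups.values())


curves = [
    [1, 2, 3, 4, 5],
    [1.1, 2.0, 3.2, 3.9, 5.1],
    [5, 4, 3, 2, 1],
    [5.2, 3.9, 3.1, 2.0, 0.9],
]
clusters = max_tree_clusters(curves, 0.9)
```

In the setting of this study, each row of `curves` would stand for one correlation coefficient curve; raising the threshold splits the tree into more, smaller classes.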

Panel data model
Panel data are also called parallel data, time series and cross-section data, or pool data, and refer to sample data formed by taking multiple cross-sections on a time series and simultaneously selecting sample observations on these cross-sections (Sun et al., 2010). Viewed along a cross-section, panel data are the observations of several individuals at a certain moment; viewed along a longitudinal section, they describe a time series. According to these characteristics, the hyperspectral characteristic band values of soil properties of multiple samples can be regarded as panel data: the hyperspectral characteristic band values of the soil properties at one sample point form the cross-section, and the sequence of sample points forms the longitudinal section. Using a panel data model, a comprehensive inversion model of soil properties can also be developed without an individual inversion of each index, which reduces the tedious process of multi-index inversion (Zhang et al., 2021).
Because there are many sample points T and few cross-sections N, a fixed-effects model was adopted, and ordinary least squares (OLS) estimation was used to build the panel data model. The panel data model type (invariant coefficient model, variable intercept model or variable coefficient model) was then determined via an analysis of covariance. To reduce the impact of heteroscedasticity, the natural logarithm of the variables was taken on both sides of the panel data model equation, and the panel data model was described by equation 1:

$$\ln y_{it} = a_i + \sum_{j=1}^{k} b_{ji} \ln x_{jit} + u_{it} \quad (i = 1, 2, \ldots, N;\ t = 1, 2, \ldots, T) \qquad \text{Eq. 1}$$

in which: $y_{it}$ is the value of the explained variable on cross-section i and sample t (soil heavy metal element content); $a_i$ is a constant or intercept term that represents cross-section i (the influence of individual i); $b_{ji}$ is the model parameter of the jth explanatory variable on the ith cross-section; $x_{jit}$ is the value of the jth explanatory variable on cross-section i and sample t (the reflectance of the hyperspectral characteristic band of soil heavy metals); $u_{it}$ is the random error term on cross-section i and sample t; and k is the number of explanatory variables.
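As a rough illustration of how a fixed-effects variable-coefficient model can be estimated by OLS, the sketch below fits a separate intercept and slope on log-transformed data for each cross-section. It is a simplified assumption-laden example, not the authors' procedure: it uses a single explanatory variable (k = 1), synthetic numbers, and the hypothetical function names `fit_variable_coefficient` and `predict`.

```python
import math


def fit_variable_coefficient(panel):
    """Fit ln(y) = a_i + b_i * ln(x) by OLS, separately for each cross-section i.

    `panel` maps a cross-section name to a list of (x, y) pairs with x, y > 0.
    """
    coeffs = {}
    for name, observations in panel.items():
        lx = [math.log(x) for x, _ in observations]
        ly = [math.log(y) for _, y in observations]
        mean_x, mean_y = sum(lx) / len(lx), sum(ly) / len(ly)
        sxx = sum((v - mean_x) ** 2 for v in lx)
        sxy = sum((u - mean_x) * (v - mean_y) for u, v in zip(lx, ly))
        slope = sxy / sxx                      # b_i for this cross-section
        intercept = mean_y - slope * mean_x    # a_i for this cross-section
        coeffs[name] = (intercept, slope)
    return coeffs


def predict(coeffs, name, x):
    """Back-transform the log-scale prediction to the original scale."""
    intercept, slope = coeffs[name]
    return math.exp(intercept + slope * math.log(x))


# Synthetic panel: two cross-sections with different intercepts and slopes.
band_values = [0.2, 0.3, 0.4, 0.5, 0.6]
panel = {
    "property_A": [(x, 2.0 * x ** 1.5) for x in band_values],
    "property_B": [(x, 3.0 * x ** -0.5) for x in band_values],
}
coeffs = fit_variable_coefficient(panel)
```

Because every cross-section has its own intercept and slope, the joint OLS problem separates into one small regression per cross-section, which is what the loop does.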

Accuracy test method of the inversion model
The calibration set determination coefficient $\bar{R}^2$ and root mean square error (RMSEC) are used to verify the modeling accuracy. The validation set test is based on the validation set determination coefficient $\bar{R}_v^2$, root mean square error (RMSEP) and relative percent deviation (RPD). The relative percent deviation is the ratio between the standard deviation of the validation set and its RMSEP. When RPD > 2.5, the model is shown to have excellent predictive ability. When 2.0 < RPD ≤ 2.5, the model has good quantitative prediction ability. When 1.8 < RPD ≤ 2.0, the model has some quantitative prediction ability. When 1.40 < RPD ≤ 1.80, the model has only general quantitative prediction ability. When 1.00 < RPD ≤ 1.40, the model has the ability to distinguish a high value from a low value. When RPD ≤ 1.00, the model has no predictive ability (Rossel et al., 2006). For the calibration set, the larger $\bar{R}^2$ and the smaller RMSEC are, the higher the modelling accuracy is and the more stable the model is. For the verification set, the larger $\bar{R}_v^2$ and RPD and the smaller RMSEP are, the higher the prediction accuracy is.
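The accuracy measures described above can be written out as a short sketch. The function names are hypothetical, and the use of the sample standard deviation (n - 1 in the denominator) is an assumption, since the text does not state which form was used.

```python
import math


def rmse(measured, predicted):
    """Root mean square error between measured and predicted values."""
    return math.sqrt(sum((m - p) ** 2 for m, p in zip(measured, predicted)) / len(measured))


def rpd(measured, predicted):
    """Ratio of the standard deviation of the measured values to the RMSE."""
    n = len(measured)
    mean = sum(measured) / n
    sd = math.sqrt(sum((m - mean) ** 2 for m in measured) / (n - 1))  # sample SD (assumed)
    return sd / rmse(measured, predicted)


def rpd_class(value):
    """Map an RPD value to the ability classes listed in the text."""
    if value > 2.5:
        return "excellent"
    if value > 2.0:
        return "good"
    if value > 1.8:
        return "some"
    if value > 1.4:
        return "general"
    if value > 1.0:
        return "high/low discrimination only"
    return "none"


measured = [1.0, 2.0, 3.0, 4.0, 5.0]
predicted = [1.1, 1.9, 3.1, 3.9, 5.1]
error = rmse(measured, predicted)
ratio = rpd(measured, predicted)
```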

Spectral pretreatment
During the acquisition and transmission of spectral signals by the ASD spectrometer, the recorded signal contains not only the spectral information of the soil itself but also the effects of the spectrometer and interference from external factors, so there may be many "burr" noises in the spectral curves, and the signal-to-noise ratio is reduced. To obtain a stable spectrum and improve the signal-to-noise ratio, it is necessary to smooth the spectral data. The Savitzky-Golay (SG) convolution smoothing method was proposed by Savitzky and Golay (Savitzhy et al., 2021) in 1964 and is a weighted average method that obtains smoothed data points by a least squares polynomial fit of the measured data within a moving window. Convolution smoothing is widely used currently, and during SG filtering, an appropriate number of smoothing points and polynomial order must be selected. The more smoothing points that are considered, the smoother the spectral curve will be, but some information will also be lost. Therefore, SG smoothing based on a 9-point quadratic polynomial is used. The tool used for smoothing and denoising was Unscrambler 9.7, as shown in figure 2.
To describe the smoothing effect more accurately, the band curves at 2000~2400 nm were amplified (Figure 2b).By comparing the details before and after SG smoothing, SG smoothing can effectively remove noise and better preserve the overall characteristics of spectral curves.
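A 9-point quadratic SG filter can be illustrated with the classical tabulated convolution weights (-21, 14, 39, 54, 59, 54, 39, 14, -21)/231. The sketch below is not the Unscrambler implementation; in particular, leaving the first and last four points unchanged is an assumption made here for simplicity, and `sg_smooth` is a hypothetical name.

```python
# Convolution weights of the 9-point quadratic Savitzky-Golay smoothing filter.
SG_9_QUADRATIC = [-21, 14, 39, 54, 59, 54, 39, 14, -21]
SG_NORM = 231  # sum of the weights


def sg_smooth(values):
    """Smooth a reflectance curve; the 4 points at each end are left as measured."""
    half = len(SG_9_QUADRATIC) // 2
    smoothed = list(values)
    for i in range(half, len(values) - half):
        window = values[i - half:i + half + 1]
        smoothed[i] = sum(c * v for c, v in zip(SG_9_QUADRATIC, window)) / SG_NORM
    return smoothed


# A flat curve with a single "burr" at index 10.
noisy = [1.0] * 21
noisy[10] = 2.0
denoised = sg_smooth(noisy)

# A purely quadratic curve passes through the filter unchanged.
quadratic = [0.01 * i * i + 0.5 for i in range(20)]
quadratic_smoothed = sg_smooth(quadratic)
```

The second example shows why a quadratic fit preserves broad absorption features: any locally quadratic shape is reproduced exactly, while isolated spikes are damped.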

Continuum removal
To describe the sensitive relationship between soil heavy metal content and spectral reflectance, a continuum removal (CR) spectral transformation is applied after SG smoothing. This spectral analysis method was first proposed by Clark and Roush in 1984 (Clark et al., 2021). The envelope (continuum) is formed by connecting, with straight line segments, the protruding "peak" points of the reflectance curve as it changes with wavelength, such that the exterior angle of the polyline at each "peak" is greater than 180° (Tong et al., 2006); the CR value is the ratio of the actual spectral reflectance to the envelope reflectance at the corresponding band. By normalizing the spectral value to 0~1 (Li et al., 2021), the absorption and reflection characteristics of the spectral curve can be effectively highlighted, and the characteristic bands can be extracted. With a proper spectral transformation, the influence of various noises can be reduced or even eliminated, the spectral sensitivity can be improved, and the prediction ability and stability of the calibration model can be improved. We performed the envelope removal by constructing the Popper database in ENVI 4.8, as shown in figure 3.
Reflectance curve of CR enhances the spectral characteristics of the original spectral curve at 1400, 1900 and 2200 nm, and also highlights the weak absorption characteristics at 410, 500 and 700 nm.These results show that the weak absorption characteristic information of the original spectral curve is enhanced, and the signal-to-noise ratio is improved by de-enveloping spectral transformation, which improves the extraction of effective characteristic bands.
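The envelope described above corresponds to the upper convex hull of the reflectance curve, so continuum removal can be sketched as follows. This is an illustrative sketch rather than the ENVI routine; it assumes strictly increasing wavelengths and positive reflectance, and `continuum_removed` is a hypothetical name.

```python
def continuum_removed(wavelengths, reflectance):
    """Divide each reflectance value by the upper convex hull (the envelope)."""
    hull = []  # indices of the points that lie on the envelope
    for i in range(len(wavelengths)):
        while len(hull) >= 2:
            o, a = hull[-2], hull[-1]
            cross = ((wavelengths[a] - wavelengths[o]) * (reflectance[i] - reflectance[o])
                     - (reflectance[a] - reflectance[o]) * (wavelengths[i] - wavelengths[o]))
            if cross >= 0:
                hull.pop()  # point a lies on or below the segment o -> i
            else:
                break
        hull.append(i)

    result = [1.0] * len(wavelengths)
    for a, b in zip(hull, hull[1:]):
        for i in range(a, b + 1):
            # Linear interpolation of the envelope between two hull points.
            frac = (wavelengths[i] - wavelengths[a]) / (wavelengths[b] - wavelengths[a])
            envelope = reflectance[a] + frac * (reflectance[b] - reflectance[a])
            result[i] = reflectance[i] / envelope
    return result


wavelengths = [1, 2, 3, 4, 5]
spectrum = [0.5, 0.4, 0.5, 0.3, 0.5]
cr = continuum_removed(wavelengths, spectrum)
```

The output lies between 0 and 1: points on the envelope map to 1, and absorption features appear as dips below 1 whose depth no longer depends on the overall brightness of the spectrum.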

Selection of common spectral characteristic bands for soil properties
Based on the selection of significant bands for each soil property in the Xinzheng high-standard farmland construction area, and considering the needs of spectral inversion for different soil properties, we combined the similarity and inflection points of the correlation coefficient curves and used the fuzzy clustering maximum tree method to determine the best shared bands for hyperspectral inversion of soil properties in the area.
A comparative analysis of the correlation coefficient curves between the 11 soil properties and the SG-CR transformed spectra of the Xinzheng City high-standard farmland area (Figure 4) shows that the correlation coefficient curves of the same spectral transformation have similar inflection points, indicating good similarity.

Construction of panel data model
Based on soil types, the sample set was divided into a calibration set and a verification set using the Rank-KS (Liu et al., 2014) (i.e., the content gradient method-Kennard-Stone) method. The 154 samples in the study area were divided into two groups: a calibration set and a validation set. The calibration set included 116 samples for the construction of the soil property inversion model, and the validation set included 38 samples for testing the prediction accuracy of the model.
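For orientation, the Kennard-Stone step that Rank-KS builds on can be sketched as below. This shows only the plain Kennard-Stone selection under Euclidean distance; the content-gradient ranking of Rank-KS is not reproduced, and the function name `kennard_stone` and the toy data are hypothetical.

```python
import math


def kennard_stone(points, n_select):
    """Return indices of n_select points chosen by the Kennard-Stone rule."""
    n = len(points)

    def dist(i, j):
        return math.dist(points[i], points[j])

    # Start with the two points that are farthest apart.
    first, second = max(
        ((i, j) for i in range(n) for j in range(i + 1, n)),
        key=lambda pair: dist(*pair),
    )
    selected = [first, second]
    remaining = [i for i in range(n) if i not in selected]

    # Repeatedly add the point whose nearest selected neighbour is farthest away.
    while len(selected) < n_select and remaining:
        nxt = max(remaining, key=lambda i: min(dist(i, s) for s in selected))
        selected.append(nxt)
        remaining.remove(nxt)
    return selected


toy_samples = [(0.0,), (1.0,), (2.0,), (10.0,)]
calibration_idx = kennard_stone(toy_samples, 3)
validation_idx = [i for i in range(len(toy_samples)) if i not in calibration_idx]
```

The selected indices form the calibration set and the remainder the validation set, so the calibration samples spread evenly over the range of the data.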
Using the common spectral feature bands selected from SG-CR spectral transformation as independent variables of the soil property inversion model, panel data were constructed based on ordinary least squares estimation (OLS) for the soil property content of 116 soil samples in Xinzheng city, as shown in table 1.
Results show that the regression coefficients differ significantly from 0, and the adjusted sample determination coefficient $\bar{R}^2$ is 0.9991, indicating that the goodness of fit of the model is high. A large F statistic indicates that the regression coefficients are significant, and the regression model is significant as a whole.

Model accuracy check
This study also tested the accuracy of soil pH, SOM, AN, AP, AK, Fe, Cr, Cd, Zn, Cu and Pb predicted by the constructed panel data model. Results are shown in table 2. As shown in table 2, each soil property in the calibration set has a high determination coefficient $\bar{R}^2$, greater than 0.95. The highest is that of alkali-hydrolysed nitrogen (0.998), with a corresponding root mean square error of 0.67. The lowest is that of pH (0.95), with a corresponding root mean square error of 0.05. Both errors are particularly low, indicating that the inversion model constructed by the panel data model can simultaneously realize the inversion of 11 soil properties with good modelling accuracy.
According to the prediction results of the validation set, the $\bar{R}_v^2$ of pH, organic matter, alkali-hydrolysed nitrogen, Cd and Cu was lower than in the modeling set, whereas that of the other soil properties increased. Except for Pb, the root mean square error was lower than that in the modeling set. The RPD values are all greater than 2.5, indicating that the model has good quantitative prediction ability for pH, SOM, AN, AP, AK, Fe, Cr, Cd, Zn, Cu and Pb.
To more clearly demonstrate the modeling accuracy of the fixed-effects variable-coefficient panel data model, the soil property content diagram (Figure 5) and scatter diagram (Figure 6) of the measured value and the inversion value are shown.
Comparing figures 5 and 6 shows that, except for a few sample differences, the measured and predicted values of most samples are concentrated near y = x (i.e., the 1:1 line).Correlation coefficients r between the measured values and predicted values all pass the significance test at the p=0.01 level, which indicates that the panel data model with SG-CR spectral transformation as an independent variable has good predictive ability and can be used to invert multiple soil properties simultaneously.

DISCUSSION
Hyperspectral remote sensing technology has multiple bands, high resolution and rich spectral information, which can provide spatial information on soil surface conditions and properties, and evaluate and detect subtle differences in soil properties, providing conditions to determine basic soil information quickly, accurately and efficiently. Therefore, introducing hyperspectral data to the acquisition of soil information in high-standard farmland construction areas is helpful for real-time monitoring of basic soil information and provides a new technique for acquiring soil information. However, most research on hyperspectral inversion of soil properties only applies quantitative inversions to single soil properties. This study found that the panel data model achieves good predictions and is thus feasible in high-standard farmland construction areas, providing technical support for the acquisition of soil property data in these areas. Comparing the calibration and validation set results of the hyperspectral inversion model of soil properties, each soil property of the calibration set has a high determination coefficient, with $\bar{R}^2$ greater than 0.95. Also, the RPD values of the validation set are greater than 2.5. Although a few samples differ from the measured values, most measured and predicted values of the samples are concentrated near y = x (i.e., the 1:1 line), and the correlation coefficient r is significant at the p = 0.01 level, indicating that the panel data model has good predictive ability when using a CR spectral transformation as the independent variable. The proposed method can thus be used for simultaneous inversion of multiple soil properties in high-standard farmland construction areas.
Soil spectrum provides a comprehensive description of a soil sample's properties. In high-standard farmland construction processes, basic information such as soil organic matter and NPK nutrients, soil moisture, soil texture and oxides is vital. Although this study uses a panel data model and concurrently solves the problem of spectral inversion of various soil properties in the high-standard farmland construction area of Xinzheng City, the universality of the proposed method still requires more verification. In the future, the applicability of this method should be tested in different regions or different soil types to provide a data basis and technical support to achieve high-standard farmland construction. Due to the regional uniqueness of soil and the uncertainty of the field environment, whether indoor spectral data models can be applied to field spectra and hyperspectral images should be a focus of future research.

CONCLUSION
In the construction of high-standard farmland, the soil pH, SOM, soil nutrient, and soil pollution characteristics are indispensable basic information for cultivated land quality evaluation.They are also important guarantees for the growing environment of crops.Therefore, the fast acquisition and real-time monitoring of basic soil information is the premise and foundation for constructing high-standard farmland.
The SG convolution smoothing of spectral reflectance can effectively remove noise while preserving the overall characteristics of spectral curves and improving the signal-to-noise ratio. Through the continuum removal (CR) spectral transformation, the absorption and reflection characteristics of the spectral curve are effectively highlighted, the sensitivity of the spectrum is improved, and the useful information of the spectrum is enhanced.
After the correlation analysis between soil properties and the SG-CR transformed spectra, the fuzzy clustering maximum tree method, combined with the similar inflection points of the correlation coefficient curves of soil properties and spectral reflectance, was used to select the common significant bands of different soil properties as the best hyperspectral characteristic bands, which are concentrated at 405~431 nm, 781~831 nm, 1044~1087 nm, 1251~1410 nm, 1836~1898 nm, 2080~2201 nm and 2324~2395 nm.
Based on the panel data model, we constructed the comprehensive inversion model with the common spectral characteristic bands of the SG-CR spectral transformation as the independent variables. The model was significant as a whole, and the goodness of fit was high ($\bar{R}^2$ = 0.9991, DW = 2.1899, F = 2195.67). The RPD values of the soil properties for the SG-CR spectral transformation model are all greater than 2.5, the measured and predicted values of most samples are concentrated near the 1:1 line, and the correlation coefficients r all pass the significance test at p = 0.01.
These results indicate that panel data models with the SG-CR spectral transformation as an independent variable are capable of comprehensive inversion of soil properties with high prediction precision.

Figure 2. Spectral reflectance before and after SG smoothing (a) and enlarged view of the spectral reflectance at 2000~2400 nm (b).

Figure 5. Inversion of soil property elements on the validation set.

Figure 6. Inversion of soil property elements on the validation set.

Table 1. Panel data models for soil property elements

Table 2. Calibration and validation for soil property elements using the panel data model