Pedotransfer functions: the role of soil chemical properties units conversion for soil classification

Chemical soil analysis data can be expressed by weight (i.e., gravimetric basis) or volume (i.e., volumetric basis) of the fine earth (sieved <2 mm), resulting in different units, cmolc kg⁻¹ and cmolc dm⁻³, respectively. The research problem is that the difference between methods to express the same soil properties hinders the comparison of results and the standardization of databases or datasets. This paper aims to develop pedotransfer functions (PTF) to obtain the density of fine earth, which will then be used to convert data expressed on a volumetric basis to a gravimetric basis, or vice versa, so that results can be compared and databases with different units can be standardized. Soil samples, including profiles of the main soil orders in Brazil such as Latossolos (Ferralsols or Oxisols) and Argissolos (Acrisols or Ultisols), from the states of Rondônia, Roraima, and Mato Grosso do Sul (132 horizons) were selected and weighed (in triplicate) to obtain the fine earth mass contained in a volume of 10 cm³. The mass values were used to calculate the fine earth density. Spearman’s correlation analysis was performed between the density and nine soil properties (coarse sand, fine sand, total sand, silt, clay, clay dispersed in water, clay dispersion, particle density, and organic carbon). Total sand, clay, and organic carbon showed the strongest correlations and were therefore selected to construct the pedotransfer functions. Nonlinear regression techniques were used to obtain the models (PTFs) to predict density, which was used for unit conversion. The residual standard error (RSE) statistics of the models were 0.0920, 0.1231, and 0.1633 g cm⁻³ for PTF1 (using total sand as a predictor), PTF2 (using clay), and PTF3 (using organic carbon), respectively. Independent data were used to evaluate the accuracy of the models by residual analysis and the RSE. In the validation, PTF1 had the lowest RSE and therefore the best performance.
Thus, to convert values of the chemical properties from a volumetric to a gravimetric basis, the value must be divided by the predicted density. Conversely, the conversion from a gravimetric to a volumetric basis requires that the value be multiplied by the predicted density. The PTFs using total sand, clay, and organic carbon as predictor variables allowed the conversion of analytical data of soil samples expressed on a volumetric basis to a gravimetric basis and vice versa, which can be used for dataset or database standardization.


INTRODUCTION
The majority of soil chemical and physical analyses start with the preparation of samples in the laboratory, through drying and maceration, followed by sieving (<2 mm) to obtain the so-called fine earth (Teixeira et al., 2017). To carry out the chemical analysis, a certain weight or volume of fine earth is taken. In general, there are two ways to obtain the sample: (i) using the fine earth mass (e.g., 10 g), thus expressing results on a gravimetric basis, that is, cmolc kg⁻¹, g kg⁻¹, or mg kg⁻¹ (Teixeira et al., 2017); and (ii) using the fine earth volume (e.g., 10 cm³) and expressing the results on a volumetric basis, that is, cmolc dm⁻³ or mg dm⁻³ (Silva et al., 2009).
The most common procedure to evaluate soil fertility in Brazil and in countries such as the U.S.A. is to sample the fine earth volumetrically by using a device that consists of a small cylinder with a volume of 10 cm³ (Silva et al., 2009; Soil Survey Staff, 2014a). This device makes sample preparation for chemical analysis convenient and fast. The adoption of this method in Brazil and Latin America began in 1966 with the support of North Carolina State University (USA), to evaluate more samples in less time and return fertility recommendations to farmers faster (Soil Science Department, 1966). Therefore, the manuals for recommendations of lime and fertilizers adopted in Brazil express the chemical properties on a volumetric basis (e.g., Freire et al., 2013; Prezotti et al., 2013; CQFS-RS/SC, 2016). However, in soil survey analyses for pedological purposes, the results are commonly expressed on a gravimetric basis. Details about the chemical properties for soil classification and the units of chemical analyses are presented in IUSS Working Group WRB (2015), Santos et al. (2018), Soil Survey Staff (2014b), the Kellogg Soil Survey Laboratory Methods Manual (Soil Survey Staff, 2014a), and the Manual for soil and water analysis (Buurman et al., 1996).
These two soil sampling procedures, which result in data expressed in different units, make data comparisons and the standardization of soil datasets or databases difficult. Thus, the conversion of data analyzed on a mass basis to a volumetric basis, or vice versa, is necessary to evaluate data more accurately. The variation between results expressed by mass or volume depends on the density of the fine earth fraction (Mehlich, 1972), which is related to the granulometric composition and organic carbon in the sample, as pointed out by Stewart et al. (1970), Qiao et al. (2019), and Patton et al. (2019).
The problems of reporting analytical soil results in different units were discussed by Mehlich (1972), as were the differences between results expressed by soil mass or volume. The author reports the importance of the volume weight (i.e., the fine earth density) for the conversion of units obtained on a volumetric or gravimetric basis and emphasizes that volumetric data will only be equal to the gravimetric data if the fine earth density of the soil sample is precisely 1.0 g cm⁻³. Further, multiplying the gravimetric data by the fine earth density converts the data to the volumetric unit. For soils whose density is not 1.0 g cm⁻³, it is necessary to obtain the fine earth density to convert the units.
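The relation described by Mehlich (1972) can be written compactly. In the sketch below, the symbols are notation introduced here for illustration and do not come from the original text: C_v is a value on a volumetric basis, C_g the same value on a gravimetric basis, and ρ_fe the fine earth density. The numbers in the second line are hypothetical.

```latex
\[
C_{v} = C_{g}\,\rho_{fe}
\qquad\Longleftrightarrow\qquad
C_{g} = \frac{C_{v}}{\rho_{fe}}
\]
\[
2.0\ \mathrm{cmol_c\,kg^{-1}} \times 1.25\ \mathrm{kg\,dm^{-3}} = 2.5\ \mathrm{cmol_c\,dm^{-3}}
\]
```

The units cancel because a density of 1 g cm⁻³ is numerically equal to 1 kg dm⁻³, so the two bases coincide only when the fine earth density equals that value.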
An option for the conversion of these units is the use of pedotransfer functions (PTF), which involves the application of statistical models. Soil properties that are difficult, expensive, and laborious to obtain can be predicted by PTFs using other easily accessible and economically affordable soil properties (McBratney et al., 2002).
PTFs have been widely used around the world. Qiao et al. (2019) used PTFs to estimate soil density on the Loess Plateau in China, formed from deep deposits of sediments. The authors used organic carbon, texture, and soil depth as explanatory variables for the prediction of soil densities. Ottoni et al. (2018) developed PTFs to estimate soil hydraulic conductivity for a database of Brazilian tropical soils and European temperate soils, using soil texture and effective porosity as predictor variables. Dobarco et al. (2019) developed PTFs to estimate available soil water from a French soil dataset, with contents of sand, clay, organic carbon, and bulk density as predictor variables. Patton et al. (2019) used PTFs to estimate the bulk density of the fine fraction of soils from the Reynolds Creek Critical Zone Observatory, a USDA Agricultural Research Service site. The percentage of organic carbon measured in soils derived from felsic and mafic lithologies and the particle size distribution were used as predictor variables. The authors point to the importance of incorporating these predictor variables to provide a reliable fine fraction density and, consequently, to estimate soil carbon stocks.
Rev Bras Cienc Solo 2020;44:e0190086
Studies using PTFs in Brazilian soils show consistent results for several soil properties. Nascimento et al. (2015) developed PTFs using different variables such as sand, silt, clay, soil density, and organic carbon to estimate water content at 33 and 1500 kPa in Xanthic Oxisols and Ultisols from a coastal tableland landscape database covering different locations in Brazil. Other studies used soil predictors, such as granulometry, organic carbon content, soil density, texture, and moisture, to estimate the water retention curve in different Brazilian soils (van Den Berg et al., 1997; Gaiser et al., 2000). Beutler et al. (2017) used organic carbon and clay content in Histosols and in other soils with high organic matter content to predict the soil bulk density obtained from undisturbed soil samples. Likewise, Benites et al. (2007) predicted soil bulk density by using organic carbon, clay, total nitrogen, and the sum of bases for Brazilian soils in general.
Considering that access to soil samples already analyzed by fine earth mass or fine earth volume may be impracticable, the use of PTFs helps to obtain the fine earth density and then to convert the units. To date, there are no studies proposing PTFs for the prediction of the fine earth density to convert units expressed on a volumetric basis to a gravimetric basis, or vice versa. Therefore, this paper aims to propose PTFs to obtain the fine earth density (hereinafter called just density) of mineral soils, which will be used for the conversion of soil chemical properties expressed on a volumetric basis to a gravimetric basis, or vice versa.

MATERIALS AND METHODS
According to the Köppen classification system, the climate is tropical Aw for all profiles located in Rondônia and Mato Grosso do Sul, and tropical Af and Am for the profiles from Roraima. Samples of surface and subsurface horizons were selected to cover most of the diagnostic horizons defined in the Brazilian Soil Classification System (SiBCS). Hence, 3 profiles from Rondônia, 16 profiles from Mato Grosso do Sul, and 16 profiles from Roraima were included, totaling 132 samples. The organic horizons (O and H) were not included due to their particularities regarding the predictive properties (e.g., sand, silt, and clay) and their high levels of organic matter.
To measure the density, the mass of the soil sample contained in a 10 cm³ device was weighed in triplicate (precision scale ± 0.001 g). Then, the density was calculated from the corresponding mass values of each soil sample and the volume of the volumetric device.
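The density calculation described above can be sketched as follows. This is a minimal illustration that assumes the triplicate masses are averaged before dividing by the device volume; the function name and the mass readings are hypothetical and not taken from the study.

```python
from statistics import mean

DEVICE_VOLUME_CM3 = 10.0  # volume of the volumetric device described in the text


def fine_earth_density(masses_g, volume_cm3=DEVICE_VOLUME_CM3):
    """Return the fine earth density (g cm-3) from replicate masses of one sample.

    Assumption: replicates are averaged first, then divided by the volume.
    """
    if not masses_g:
        raise ValueError("at least one mass reading is required")
    return mean(masses_g) / volume_cm3


# Hypothetical triplicate weighings (g) of one horizon, for illustration only.
triplicate = [12.310, 12.275, 12.345]
density = fine_earth_density(triplicate)
print(f"{density:.3f} g cm-3")
```

Averaging the masses or averaging three separate densities gives the same result here, because the volume is constant across replicates.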

Selection of the prediction variables
The following properties were selected from the available analyses of the soil samples: coarse sand, fine sand, total sand, silt, clay, water-dispersed clay, clay dispersion, particle density, and organic carbon. The analyses of these properties were done in the laboratory following the methods described in Donagemma et al. (2011). Briefly, bulk soil samples were ground and passed through a 2 mm mesh sieve to prepare fine earth samples, which were used for the following physical analyses: granulometry (with sodium hexametaphosphate as a dispersant for total clay) and clay dispersed in water, both using the densimeter method, with sieves for fractioning the sand particle classes and the silt obtained by difference. The particle density was obtained using the volumetric flask method with alcohol. Total organic carbon was determined by the wet combustion method.
The properties chosen are associated with the particle size (granulometry) and the amount of soil organic matter, which are the main components of the solid soil phase. Additionally, the particle density partially reflects the mineralogy of the sample, since it is related to the amount of iron oxides, silicate clays, and quartz in the soil, as well as the amount of soil organic matter. Thus, they correspond to the possible predictive variables of density and are also properties routinely measured in soil analysis laboratories. Boxplots (Figure 2) were used to graphically summarize the descriptive statistics of the properties for density prediction.
Spearman correlation (Spearman, 1904) was used to support our prior pedological knowledge in identifying which properties were most strongly associated with density, and a correlogram was then developed (Figure 3). The correlation values, ranging from -1 to 1, are presented at the intersections between the rows and columns of the variables. The diameter of each circle is proportional to the correlation value. Blank intersections indicate that the correlation was not significant (p>0.05), while colored intersections (i.e., red or blue) indicate a significant correlation between the respective variables.
Based on the correlogram (Figure 3), the properties with the highest significant correlations (negative or positive) with density were selected as predictors to obtain the pedotransfer functions: total sand (TS), with a correlation of 0.85; clay, with -0.76; and organic carbon (OC), with -0.55. Since the original structure and porosity of the soil samples are modified in the preparation process, the most prominent factors influencing the density were the total sand, clay, and organic carbon contents. The relation of these predictors with the fine fraction density was also discussed by Stewart et al. (1970), Qiao et al. (2019), and Patton et al. (2019), and it is corroborated by the correlation results obtained here.
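The screening step above rests on Spearman's rank correlation, which is Pearson's correlation computed on ranks. The sketch below implements it with the standard library only; the function names and the data are hypothetical, and the significance test used for the correlogram (p>0.05) is not reproduced.

```python
def average_ranks(values):
    """Rank values from 1 to n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical values, for illustration only (not data from the study).
total_sand = [150, 320, 480, 610, 790, 880]       # g kg-1
density = [0.95, 1.10, 1.02, 1.21, 1.38, 1.45]    # g cm-3
rho = spearman(total_sand, density)
print(round(rho, 3))
```

Working on ranks makes the coefficient sensitive to monotonic rather than strictly linear association, which suits predictors that relate to density along a curve.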

Pedotransfer function models
To construct the pedotransfer functions, three linear models were first fitted using TS, clay, and OC as explanatory variables. The three models were evaluated using a fitted vs. residual plot, which helps to identify whether the relationship between the variables is linear (Ritz and Streibig, 2008). All models exhibited a clear U-shaped pattern (see supplementary materials - S1), and therefore nonlinear models were fitted. We looked for equations that could be associated with the pedological/environmental field of study. We thus considered the reciprocal yield-density functions, related to the fundamental relationship between crop yield and plant population (Farazdaghi and Harris, 1968), and plant growth model equations (Paine et al., 2012). All these equations resemble the phenomena modeled here, that is, soil variables in relation to density. Among the four equations (i.e., Farazdaghi equations, rational model equations, reciprocal quadratic, and exponential models), the exponential models (Equation 1) were selected to represent the increase or decrease in density as a function of total sand, clay, or organic carbon.
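The linearity check described above can be illustrated numerically. The sketch below fits a straight line by ordinary least squares to hypothetical curved data and lists fitted values against residuals; the data-generating curve and its coefficients are invented for illustration and are not the fitted PTF.

```python
import math


def fit_line(x, y):
    """Ordinary least squares for y = a + b * x; returns (a, b)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


# Hypothetical data: total sand (g kg-1) and a density that follows a curve.
total_sand = [100, 200, 300, 400, 500, 600, 700, 800, 900]
density = [0.85 + 0.05 * math.exp(0.003 * ts) for ts in total_sand]

intercept, slope = fit_line(total_sand, density)
fitted = [intercept + slope * ts for ts in total_sand]
residuals = [obs - fit for obs, fit in zip(density, fitted)]

# Positive residuals at both ends and negative ones in between form the
# U-shaped pattern that signals a nonlinear relationship.
for fit, res in zip(fitted, residuals):
    print(f"fitted={fit:.3f}  residual={res:+.3f}")
```

A residual pattern with no systematic sign structure would instead support keeping the linear model.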
The equation has three parameters for describing the curve: β₀, the intercept; β₁, the relative increase or decrease rate; and β₂, the exponential rate. The exponential model (Equation 1) was fitted using TS, clay, and OC as explanatory variables, totaling three PTFs. The parameters for each PTF were obtained using previous information to allow an adequate guess of starting values, which were evaluated graphically to get plausible candidate model parameter values. For this purpose, nlstools provides a graphical function called preview, which can be used to assess the suitability of the chosen starting values before fitting the model. This same approach was used by Ritz and Streibig (2008). The graphical examination was used jointly with the residual sum of squares. Additionally, self-start functions were used to support the previous choice, except for β₀. The choice of good starting values is essential here because of the novelty of developing PTFs for this predictive purpose. Next, the three PTFs were fitted (i.e., for TS, clay, and OC), and the residual standard error (RSE) of each model was obtained; the RSE (Equation 2) was then used to compare the exponential models and to select the best one (Dalgaard, 2002). The RSE was calculated as the square root of the residual sum of squares divided by the degrees of freedom, and the lowest values indicate a better fit. A normal Q-Q plot of the standardized residuals and a plot of the fitted values versus the standardized residuals were used to evaluate the normality of the residuals of the nonlinear models. The normal Q-Q plot compares the distribution of the standardized residuals with the standard normal distribution. If the residuals are normally distributed, the points are expected to fall on a straight line with an intercept of zero and a slope of 1 (Dalgaard, 2002). All analyses were performed using the software R (R Core Team, 2018).
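The RSE definition given above (square root of the residual sum of squares divided by the degrees of freedom) can be sketched as follows. Two points are assumptions rather than statements from the text: the degrees of freedom are taken as the number of observations minus the number of fitted parameters, and the three-parameter exponential form shown is only one plausible reading of the parameter description, since Equation 1 is not reproduced here. All numbers are hypothetical.

```python
import math


def exponential_ptf(x, b0, b1, b2):
    """Assumed form: density = b0 + b1 * exp(b2 * x).

    This form is an assumption for illustration, not the published equation.
    """
    return b0 + b1 * math.exp(b2 * x)


def residual_standard_error(observed, predicted, n_params):
    """Return sqrt(RSS / df), assuming df = n - number of fitted parameters."""
    df = len(observed) - n_params
    if df <= 0:
        raise ValueError("need more observations than fitted parameters")
    rss = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    return math.sqrt(rss / df)


# Hypothetical observed and predicted densities (g cm-3), illustration only.
observed = [1.00, 1.10, 1.25, 1.40, 1.50]
predicted = [0.98, 1.11, 1.22, 1.42, 1.50]
rse = residual_standard_error(observed, predicted, n_params=3)
print(f"RSE = {rse:.4f} g cm-3")
```

Because the RSE carries the units of the response, values from models fitted to the same density data can be compared directly.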

Models validation
To evaluate the accuracy of the models, the PTFs were validated using data on exchangeable potassium (K⁺), available in both mg dm⁻³ and mg kg⁻¹, from 88 samples taken from the 0.00-0.05 and 0.10-0.15 m layers of a Latossolo Amarelo (Xanthic Oxisol) in the National Forest of Tapajós, Pará State, Brazil (Cesário, 2018). The validation was assessed through predicted vs. measured graphs and the residual standard error (RSE) for each PTF.

RESULTS
The nonlinear models between TS and density show an increasing exponential pattern (Figure 4a), indicating that density increases with increasing sand content. On the other hand, the clay and OC nonlinear models show a decreasing exponential pattern (Figures 4b and 4c), indicating that density increases as clay or OC decreases.
Three different models were developed to offer options for conversion, since not all data (TS, clay, or OC) might be available. The models were: i) PTF1, which uses TS as the predictor variable; ii) PTF2, using clay as a predictor; and iii) PTF3, with OC as a predictor (Table 2). The residual standard error (RSE) was calculated for each PTF. PTF1 had the lowest RSE, 0.0920 g cm⁻³, followed by PTF2 and PTF3 with 0.1231 and 0.1633 g cm⁻³, respectively. However, the dataset used for the development of PTF2 includes some low clay contents, which can lead to high density predictions, so it is recommended to use it with caution. The sequence for RSE is PTF1 < PTF2 < PTF3. This result shows the importance of TS in the density prediction, considering that the RSE of PTF1 was about 44 % smaller than that of PTF3 (i.e., about 56 % of its value).
The residuals of the three PTFs were analyzed graphically, and figure 4 shows the residual plots (i.e., fitted values vs. standardized residuals) and the Q-Q plots for the three PTFs. For PTF1 and PTF3, no specific patterns were observed in the distribution of points, which supports the assumption of normality. These results are also corroborated by the normal Q-Q plots of the same PTFs. In general, the residuals of PTF1 behaved slightly better than those of PTF3. In contrast, PTF2 presented a deviation from normality in the Q-Q plot, and its residual plot showed a trend in the distribution of points, a poorer result than for PTF1 and PTF3. This indicates the lower accuracy of PTF2 in predicting density compared to the other PTFs.
The descriptive statistics of the residuals of each function (i.e., PTF1, PTF2, and PTF3) are presented in table 3. The maximum, minimum, and mean values (tending to zero), as well as the low standard deviation, show that the accuracy of the models is satisfactory. PTF1 and PTF3 show low kurtosis values, indicating a normal distribution of the residuals. PTF2 had high kurtosis values and a non-normal distribution of the residuals, which was corroborated by the Q-Q plot for the same function (Figure 4).
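The residual summary described above can be sketched with the standard library. The kurtosis convention is an assumption: excess kurtosis (fourth central moment over squared variance, minus 3) is used here, so that a normal distribution gives a value near zero; the text does not state which convention was applied. The residual values are hypothetical.

```python
from statistics import mean, stdev


def excess_kurtosis(values):
    """Population excess kurtosis: m4 / m2**2 - 3 (about 0 for normal data)."""
    centre = mean(values)
    n = len(values)
    m2 = sum((v - centre) ** 2 for v in values) / n
    m4 = sum((v - centre) ** 4 for v in values) / n
    return m4 / m2 ** 2 - 3.0


def summarize_residuals(residuals):
    """Return minimum, maximum, mean, sample standard deviation, and kurtosis."""
    return {
        "min": min(residuals),
        "max": max(residuals),
        "mean": mean(residuals),
        "sd": stdev(residuals),
        "kurtosis": excess_kurtosis(residuals),
    }


# Hypothetical residuals (g cm-3), for illustration only.
residuals = [-0.10, -0.05, 0.0, 0.05, 0.10]
summary = summarize_residuals(residuals)
for name, value in summary.items():
    print(f"{name}: {value:.4f}")
```

A mean close to zero indicates no systematic over- or under-prediction, while a large kurtosis points to heavy tails, that is, a few large prediction errors.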

Models validation
The three PTFs were applied to independent data, samples from the Latossolo Amarelo (Xanthic Oxisol), to validate the models for density prediction. The descriptive statistics of the TS, clay, and OC values are shown in table 4.
Each PTF generated predicted values of density. The exchangeable potassium (K⁺) data of the National Forest of Tapajós samples range from 12 to 51 mg dm⁻³, with a median of 27 mg dm⁻³. The conversion to mg kg⁻¹ was calculated as the ratio of the volumetric data (mg dm⁻³) to the density predicted by each PTF. The RSE values for PTF1, PTF2, and PTF3 were 2.74, 2.78, and 2.79 mg kg⁻¹, respectively.
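The validation logic above can be sketched as: convert each volumetric K⁺ value with the predicted density, then compare the result with the value measured on a gravimetric basis. The error summary below divides the squared residuals by the number of samples, which is an assumption, since the degrees of freedom used in the validation are not stated. All values are hypothetical.

```python
import math


def volumetric_to_gravimetric(value_mg_dm3, density_g_cm3):
    """Convert mg dm-3 to mg kg-1 by dividing by the fine earth density."""
    return value_mg_dm3 / density_g_cm3


def validation_error(measured, converted):
    """Root of the mean squared residual (assumes division by n)."""
    n = len(measured)
    squared = sum((m - c) ** 2 for m, c in zip(measured, converted))
    return math.sqrt(squared / n)


# Hypothetical validation samples, for illustration only.
k_volumetric = [12.0, 27.0, 51.0]            # K+ in mg dm-3
predicted_density = [1.20, 1.35, 1.50]       # g cm-3, from a PTF
k_gravimetric_measured = [10.5, 19.0, 35.0]  # K+ in mg kg-1

k_converted = [
    volumetric_to_gravimetric(value, rho)
    for value, rho in zip(k_volumetric, predicted_density)
]
error = validation_error(k_gravimetric_measured, k_converted)
print([round(v, 2) for v in k_converted], round(error, 3))
```

The error is expressed in mg kg⁻¹, so it can be judged against the range of the measured data.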
Considering the range of the original data and the RSE obtained, PTF1 is the model that presents the best fit, corroborated by its lower RSE, followed by PTF2 and PTF3, although these two PTFs also presented low error values and can be used for conversion. Likewise, the graphs of the predicted vs. measured values (Figure 6) show a similar linear distribution of the values for all PTFs and high values of R², corroborating the suitability of the functions for data conversion.
Table 2. Pedotransfer functions developed to predict the fine earth density and the respective model indexes. Density: fine earth density (g cm⁻³); TS: total sand (g kg⁻¹); OC: organic carbon (g kg⁻¹); RSE: residual standard error; SE: standard error of parameters; β₀, β₁, β₂: parameters of the function. All PTF show the same degrees of freedom (

DISCUSSION
As discussed by Mehlich (1972), volumetric data will only be equal to the gravimetric data if the fine earth density of the soil sample is precisely 1.0 g cm⁻³; for other soils, it is necessary to convert units. Using PTF1, the density from our data ranged from 0.86 to 1.68 g cm⁻³, which corresponds to a fine earth mass of 8.6 to 16.8 g within 10 cm³ of soil.
The US Soil Survey Laboratory Methods Manual (Soil Survey Staff, 2014a) uses a fixed factor (1.45 g cm⁻³) for the conversion of a volume estimate to a weight estimate of the bulk density for the fine earth. This means that no account is taken of the diversity of soil mass contained in a given volume, which varies according to granulometry and the amount of organic carbon.
The use of PTFs contributes to the standardization of the units of chemical properties in a dataset or database, facilitating the conversion of units from a volumetric basis to a gravimetric basis or vice versa. In this sense, we used a dataset of soils with anthropic horizons, including data from the literature with different units for the chemical properties (unpublished data). This database was created to propose taxonomic criteria for the anthropic horizon in the Brazilian Soil Classification System - SiBCS (Santos et al., 2018) and the pretic horizon in the World Reference Base for Soil Resources - WRB (IUSS Working Group WRB, 2015). Some of the data sources compiled (Corrêa, 2007; Martins et al., 2007; Souza, 2011; Macedo, 2012; Silva et al., 2013; Miranda, 2018; Macedo et al., 2019) expressed the chemical properties of horizons on a volumetric basis (as recommended for fertility purposes, which rely on composite soil samples), while others used a gravimetric basis for the expression of the results (the unit recommended for soil classification purposes). The conversion was essential to standardize the various chemical properties data compiled from the literature in this database that had units on a volumetric basis.
Thus, for the dataset standardization, the most accurate pedotransfer function obtained (i.e., PTF1) was applied to convert the calcium plus magnesium (Ca²⁺ + Mg²⁺) data from a volumetric to a gravimetric basis. Table 5 shows the results of the conversion.
As shown in table 5, the largest differences between the converted values (i.e., gravimetric basis) and the original values (i.e., volumetric basis) were found for samples with high total sand contents. This agrees with the fact that these samples had a predicted density greater than 1.0 g cm⁻³ (Table 5). On the other hand, data with low total sand contents, such as IDs 3, 4, 5, 17, and 18 (Table 5), had a predicted density of approximately 1.0 g cm⁻³; thus, when the Ca²⁺ + Mg²⁺ data were converted to a gravimetric basis, there was practically no difference. This was expected, since mathematically the closer the density of the fine earth is to 1.0 g cm⁻³, the closer the values of the chemical properties on a volumetric and a gravimetric basis will be. However, the predicted fine earth density will not always be equal or close to 1.0 g cm⁻³; thus, chemical properties data on a volumetric basis differ from data on a gravimetric basis, requiring either the functions to convert these data or a new measurement. Therefore, the use of PTFs assists in the standardization of the database, which can later be used for various purposes, such as soil classification.
The main chemical properties used as criteria for the identification of pretic horizons in the WRB (IUSS Working Group WRB, 2015) are exchangeable calcium and magnesium, available phosphorus, and organic carbon content. These properties are expressed on a gravimetric basis: "exchangeable calcium plus magnesium content must be greater than 2.0 cmolc kg⁻¹ of fine earth, organic carbon content must be greater than or equal to 10 g kg⁻¹, and available phosphorus levels must be greater than or equal to 30 mg kg⁻¹ soil". For the soil samples with IDs 2, 9, 12, and 13, the Ca²⁺ + Mg²⁺ values are greater than 2.0 cmolc dm⁻³. When the data are converted to a gravimetric basis (the unit used in the WRB), the resulting values decrease. Thus, according to the WRB, after the conversion these samples would not fit the concept of a pretic horizon, since their Ca²⁺ + Mg²⁺ values are less than 2.0 cmolc kg⁻¹.
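The effect of the unit conversion on the classification criterion can be sketched as below. The threshold follows the wording quoted above (greater than 2.0 cmolc kg⁻¹); the sample labels, Ca²⁺ + Mg²⁺ values, and densities are hypothetical and do not reproduce Table 5.

```python
CA_MG_THRESHOLD = 2.0  # cmolc kg-1, criterion as quoted in the text


def to_gravimetric(value_volumetric, density_g_cm3):
    """Convert cmolc dm-3 to cmolc kg-1 by dividing by the fine earth density."""
    return value_volumetric / density_g_cm3


def meets_ca_mg_criterion(value_volumetric, density_g_cm3):
    """Check the Ca + Mg criterion after conversion to a gravimetric basis."""
    return to_gravimetric(value_volumetric, density_g_cm3) > CA_MG_THRESHOLD


# Hypothetical samples: (Ca + Mg in cmolc dm-3, predicted density in g cm-3).
samples = {"sandy": (2.3, 1.35), "clayey": (2.3, 1.02)}

for label, (ca_mg, rho) in samples.items():
    converted = to_gravimetric(ca_mg, rho)
    verdict = "meets" if meets_ca_mg_criterion(ca_mg, rho) else "fails"
    print(f"{label}: {converted:.2f} cmolc kg-1 -> {verdict} the criterion")
```

Both hypothetical samples exceed 2.0 on a volumetric basis, yet only the one with a density near 1.0 g cm⁻³ still does so after conversion, which is the situation described for the sandier samples.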
The same situation can be observed for the phosphorus criteria in both the WRB and the SiBCS. Macedo et al. (2019), for example, evaluated the chemical properties of anthropic soils from naturally fertile floodplain areas of the Solimões River in the Brazilian Central Amazon. Phosphorus data were obtained on a volumetric basis (i.e., mg dm⁻³), and the authors classified these soils according to the WRB. In that work, specifically, the phosphorus values are very high and considerably exceed the WRB classification criterion (P above 30 mg kg⁻¹). However, if the phosphorus levels were near 30 mg dm⁻³, the conversion of these data to a gravimetric basis would be paramount for the proper classification of the horizons. That is, for soil profile analysis and classification, it is recommended to obtain the data on a gravimetric basis, while the volumetric basis properties are used for soil fertility purposes.
The use of PTFs for the standardization of the anthropic soil database was just one application example. It can be extended to other data standardization tasks in which the units of chemical properties must be changed from a volumetric to a gravimetric basis, or vice versa, and access to the samples is restricted or redoing the analyses is impracticable.

Simplified steps to the conversion of data
A. Selection of available data: total sand, clay, or organic carbon;
B. Prediction of the density using the pedotransfer function for the selected property (Table 2);
C. Divide the value of the property expressed on a volumetric basis by the density predicted in the previous item to obtain the value converted to a gravimetric basis;
D. Alternatively, multiply the value of the property expressed on a gravimetric basis by the predicted density to obtain the value converted to a volumetric basis.
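The steps above can be sketched as a small workflow. The density models in this example are placeholders with invented coefficients, used only so the code runs; they are not the fitted PTFs, whose parameters are given in Table 2 and are not reproduced here. The preference order (total sand, then clay, then organic carbon) follows the RSE ranking reported in the results, and all function names are hypothetical.

```python
PREFERENCE = ("total_sand", "clay", "organic_carbon")  # PTF1, PTF2, PTF3


def select_predictor(sample):
    """Step A: return the first available predictor name and its value."""
    for name in PREFERENCE:
        if sample.get(name) is not None:
            return name, sample[name]
    raise ValueError("sample has no total sand, clay, or organic carbon value")


def convert(value, density, target):
    """Steps C and D: divide for gravimetric, multiply for volumetric."""
    if density <= 0:
        raise ValueError("density must be positive")
    if target == "gravimetric":
        return value / density
    if target == "volumetric":
        return value * density
    raise ValueError(f"unknown target basis: {target!r}")


# Placeholder density models (g cm-3) with invented coefficients.
# They stand in for the fitted PTFs and must be replaced before real use.
placeholder_ptfs = {
    "total_sand": lambda ts: 0.90 + 0.0007 * ts,
    "clay": lambda clay: 1.50 - 0.0005 * clay,
    "organic_carbon": lambda oc: 1.40 - 0.01 * oc,
}

# Hypothetical sample without a total sand value (contents in g kg-1).
sample = {"total_sand": None, "clay": 400.0, "organic_carbon": 12.0}

name, x = select_predictor(sample)                   # step A
density = placeholder_ptfs[name](x)                  # step B
gravimetric = convert(2.6, density, "gravimetric")   # step C: cmolc dm-3 -> cmolc kg-1
back = convert(gravimetric, density, "volumetric")   # step D: cmolc kg-1 -> cmolc dm-3
print(name, round(density, 2), round(gravimetric, 2), round(back, 2))
```

Keeping the density prediction separate from the conversion means the same two-line conversion works regardless of which predictor was available for a given sample.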

CONCLUSIONS
The pedotransfer functions obtained by nonlinear regression, using total sand, clay, or organic carbon as predictor variables, allowed the conversion of soil chemical properties obtained on a gravimetric basis to a volumetric basis and vice versa. The pedotransfer function that most precisely predicted the fine earth density, which is essential for data conversion, is the one with total sand as a predictor (PTF1), with the smallest RSE of 0.0920 g cm⁻³.
The proposed pedotransfer functions can be used to standardize soil datasets or databases and convert chemical soil data expressed in different units.
This paper also highlights the importance of using the proper method of taking fine earth samples (either by weighing or by using the volumetric device) for laboratory analysis.
The selection of samples for the development of the functions did not cover the entire Brazilian territory. Thus, a proposal for future work is to expand the selection to validate the PTFs in other regions and for all mineral soil orders.